Merge branch 'trunk'
diff --git a/.gitignore b/.gitignore
index 6eceb39..583a485 100644
--- a/.gitignore
+++ b/.gitignore
@@ -28,6 +28,10 @@
 build
 build32
 dbuild
+rbuild
+vktrace/src/vktrace_extensions/vktracevulkan/codegen_vktrace_utils
+vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/codegen
+vktrace/src/vktrace_extensions/vktracevulkan/vulkan/codegen_utils
 external
 *.config
 *.creator
diff --git a/BUILD.md b/BUILD.md
deleted file mode 100644
index fb98aa9..0000000
--- a/BUILD.md
+++ /dev/null
@@ -1,320 +0,0 @@
-# Build Instructions
-This document contains the instructions for building this repository on Linux and Windows.
-
-This repository does not contain a Vulkan-capable driver.
-Before proceeding, it is strongly recommended that you obtain a Vulkan driver from your graphics hardware vendor
-and install it.
-
-Note: The sample Vulkan Intel driver for Linux (ICD) is being deprecated in favor of other driver options from Intel.
-This driver has been moved to the [VulkanTools repo](https://github.com/LunarG/VulkanTools).
-Further instructions regarding this ICD are available there.
-
-## Contributing
-
-If you intend to contribute, the preferred work flow is for you to develop your contribution
-in a fork of this repo in your GitHub account and then submit a pull request.
-Please see the [CONTRIBUTING](CONTRIBUTING.md) file in this repository for more details.
-
-## Git the Bits
-
-To create your local git repository:
-```
-git clone https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers 
-```
-
-## Linux Build
-
-The build process uses CMake to generate makefiles for this project.
-The build generates the loader, layers, and tests.
-
-This repo has been built and tested on Ubuntu 14.04.3 LTS, 14.10, 15.04, 15.10, and 16.04 LTS.
-It should be straightforward to use it on other Linux distros.
-
-These packages are needed to build this repository: 
-```
-sudo apt-get install git cmake build-essential bison libx11-dev libxcb1-dev libxkbcommon-dev
-```
-
-Example debug build (Note that the update\_external\_sources script used below builds external tools into predefined locations. See **Loader and Validation Layer Dependencies** for more information and other options):
-```
-cd Vulkan-LoaderAndValidationLayers  # cd to the root of the cloned git repository
-./update_external_sources.sh
-cmake -H. -Bdbuild -DCMAKE_BUILD_TYPE=Debug
-cd dbuild
-make
-```
-
-If you have installed a Vulkan driver obtained from your graphics hardware vendor, the install process should
-have configured the driver so that the Vulkan loader can find and load it.
-
-If you want to use the loader and layers that you have just built:
-```
-export LD_LIBRARY_PATH=<path to your repository root>/dbuild/loader
-export VK_LAYER_PATH=<path to your repository root>/dbuild/layers
-```
-You can run the `vulkaninfo` application to see which driver, loader and layers are being used.
-
-The `LoaderAndLayerInterface` document in the `loader` folder in this repository is a specification that
-describes both how ICDs and layers should be properly
-packaged, and how developers can point to ICDs and layers within their builds.
-
-### Linux Install to System Directories
-
-Installing the files resulting from your build to the system directories is optional since
-environment variables can usually be used instead to locate the binaries.
-There is also a risk of interfering with binaries installed by packages.
-If you are certain that you would like to install your binaries to system directories,
-you can proceed with these instructions.
-
-Assuming that you've built the code as described above and the current directory is still `dbuild`,
-you can execute:
-
-```
-sudo make install
-```
-
-This command installs files to:
-
-* `/usr/local/include/vulkan`:  Vulkan include files
-* `/usr/local/lib`:  Vulkan loader and layers shared objects
-* `/usr/local/bin`:  vulkaninfo application
-* `/usr/local/etc/vulkan/explicit_layer.d`:  Layer JSON files
-
-You may need to run `ldconfig` in order to refresh the system loader search cache on some Linux systems.
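-
-For example:
-```
-sudo ldconfig
-```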
-
-The list of installed files appears in the build directory in a file named `install_manifest.txt`.
-You can easily remove the installed files with:
-
-```
-cat install_manifest.txt | sudo xargs rm
-```
-
-You can further customize the installation location by setting additional CMake variables
-to override their defaults.
-For example, if you would like to install to `/tmp/build` instead of `/usr/local`, specify:
-
-```
--DCMAKE_INSTALL_PREFIX=/tmp/build
--DDEST_DIR=/tmp/build
-```
-
-on your CMake command line and run `make install` as before.
-The install step places the files in `/tmp/build`.
-
-Using the `CMAKE_INSTALL_PREFIX` to customize the install location also modifies the
-loader search paths to include searching for layers in the specified install location.
-In this example, setting `CMAKE_INSTALL_PREFIX` to `/tmp/build` causes the loader to
-search `/tmp/build/etc/vulkan/explicit_layer.d` and `/tmp/build/share/vulkan/explicit_layer.d`
-for the layer JSON files.
-The loader also searches the "standard" system locations of `/etc/vulkan/explicit_layer.d`
-and `/usr/share/vulkan/explicit_layer.d` after searching the two locations under `/tmp/build`.
-
-You can further customize the installation directories by using the CMake variables
-`CMAKE_INSTALL_SYSCONFDIR` to rename the `etc` directory and `CMAKE_INSTALL_DATADIR`
-to rename the `share` directory.
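-
-For example, adding the following to the CMake command line (the `config` name is purely illustrative) would place the layer JSON files under `/tmp/build/config/vulkan/explicit_layer.d` instead:
-
-```
--DCMAKE_INSTALL_PREFIX=/tmp/build
--DCMAKE_INSTALL_SYSCONFDIR=config
-```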
-
-See the CMake documentation for more details on using these variables
-to further customize your installation.
-
-Also see the `LoaderAndLayerInterface` document in the `loader` folder in this repository for more
-information about loader operation.
-
-Note that some executables in this repository (e.g., `cube`) use the "rpath" linker directive
-to load the Vulkan loader from the build directory, `dbuild` in this example.
-This means that even after installing the loader to the system directories, these executables
-still use the loader from the build directory.
-
-## Validation Test
-
-The test executables can be found in the dbuild/tests directory.
-Some of the tests that are available:
-- vk\_layer\_validation\_tests: Test Vulkan layers.
-
-There are also a few shell and Python scripts that run test collections (e.g.,
-`run_all_tests.sh`).
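-
-For example, to run the layer validation tests directly from the build tree (assuming the `LD_LIBRARY_PATH` and `VK_LAYER_PATH` settings from the Linux Build section above):
-```
-cd dbuild/tests
-./vk_layer_validation_tests
-```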
-
-## Linux Demos
-
-Some demos that can be found in the dbuild/demos directory are:
-- vulkaninfo: report GPU properties
-- cube: a textured spinning cube
-- smoke/smoke: A "smoke" test using a more complex Vulkan demo
-
-## Windows System Requirements
-
-Windows 7+ with additional required software packages:
-
-- Microsoft Visual Studio 2013 Professional.  Note: it is possible that lesser/older versions may work, but that has not been tested.
-- [CMake](http://www.cmake.org/download/).  Notes:
-  - Tell the installer to "Add CMake to the system PATH" environment variable.
-- [Python 3](https://www.python.org/downloads).  Notes:
-  - Select to install the optional sub-package to add Python to the system PATH environment variable.
-  - Ensure the pip module is installed (it should be by default)
-  - Python 3.3 or later is needed to get the Windows py.exe launcher, which is used to run python3 rather than python2 when both are installed on Windows
-  - 32-bit Python works
-- [Git](http://git-scm.com/download/win).
-  - Note: If you use Cygwin, you can normally use Cygwin's "git.exe".  However, in order to use the "update\_external\_sources.bat" script, you must have this version.
-  - Tell the installer to allow it to be used for "Developer Prompt" as well as "Git Bash".
-  - Tell the installer to treat line endings "as is" (i.e. both DOS and Unix-style line endings).
-  - Install both a 32-bit and a 64-bit version, as the 64-bit installer does not install the 32-bit libraries and tools.
-- glslang is required for demos and tests.
-  - [You can download and configure it (in a peer directory) here](https://github.com/KhronosGroup/glslang/blob/master/README.md)
-  - A Windows batch file has been included that will pull and build the correct version.  Run it from Developer Command Prompt for VS2013 like so:
-    - update\_external\_sources.bat --build-glslang (Note: see **Loader and Validation Layer Dependencies** below for other options)
-
-## Windows Build - MSVC
-
-Before building on Windows, you may want to modify the customize section in loader/loader.rc so as to
-set the version numbers and build description for your build. Doing so sets the information displayed
-on the Properties->Details tab of the vulkan-1.dll loader file that is built.
-
-Build all Windows targets after installing required software and cloning the Loader and Validation Layer repo as described above by completing the following steps in a "Developer Command Prompt for VS2013" window (Note that the update\_external\_sources script used below builds external tools into predefined locations. See **Loader and Validation Layer Dependencies** for more information and other options):
-```
-cd Vulkan-LoaderAndValidationLayers  # cd to the root of the cloned git repository
-update_external_sources.bat --all
-build_windows_targets.bat 
-```
-
-At this point, you can use Windows Explorer to launch Visual Studio by double-clicking on the "VULKAN.sln" file in the \build folder.  Once Visual Studio comes up, you can select "Debug" or "Release" from a drop-down list.  You can start a build with either the menu (Build->Build Solution), or a keyboard shortcut (Ctrl+Shift+B).  As part of the build process, Python scripts will create additional Visual Studio files and projects, along with additional source files.  All of these auto-generated files are under the "build" folder.
-
-Vulkan programs must be able to find and use the vulkan-1.dll library. Make sure it is either installed in the C:\Windows\System32 folder, or the PATH environment variable includes the folder that it is located in.
-
-To run Vulkan programs you must tell the icd loader where to find the libraries.
-This is described in a `LoaderAndLayerInterface` document in the `loader` folder in this repository.
-This specification describes both how ICDs and layers should be properly
-packaged, and how developers can point to ICDs and layers within their builds.
-
-## Android Build
-Install the required tools for Linux and Windows covered above, then add the
-following.
-### Android Studio
-- Install [Android Studio 2.1](http://tools.android.com/download/studio/canary), latest Preview (tested with 4):
-- From the "Welcome to Android Studio" splash screen, add the following components using Configure > SDK Manager:
-  - SDK Platforms > Android N Preview
-  - SDK Tools > Android NDK
-
-#### Add NDK to path
-
-On Linux:
-```
-export PATH=$HOME/Android/sdk/ndk-bundle:$PATH
-```
-On Windows:
-```
-set PATH=%LOCALAPPDATA%\Android\sdk\ndk-bundle;%PATH%
-```
-On OSX:
-```
-export PATH=$HOME/Library/Android/sdk/ndk-bundle:$PATH
-```
-### Additional OSX System Requirements
-Tested on OSX version 10.11.4
-
-Set up Homebrew and components:
-- Follow instructions on [brew.sh](http://brew.sh) to get homebrew installed.
-```
-/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
-```
-- Ensure Homebrew is at the beginning of your PATH:
-```
-export PATH=/usr/local/bin:$PATH
-```
-- Add packages with the following (may need refinement)
-```
-brew install cmake python python3 git
-```
-### Build steps for Android
-Use the following to ensure the Android build works.
-#### Linux and OSX
-Follow the setup steps for Linux or OSX above, then from your terminal:
-```
-cd build-android
-./update_external_sources_android.sh
-./android-generate.sh
-ndk-build -j $(sysctl -n hw.ncpu)
-```
-#### Windows
-Follow the setup steps for Windows above, then from Developer Command Prompt for VS2013:
-```
-cd build-android
-update_external_sources_android.bat
-android-generate.bat
-ndk-build
-```
-#### Android demos
-Use the following steps to build, install, and run Cube and Tri for Android:
-```
-cd demos/android
-android update project -s -p . -t "android-23"
-ndk-build
-ant -buildfile cube debug
-adb install ./cube/bin/NativeActivity-debug.apk
-adb shell am start com.example.Cube/android.app.NativeActivity
-```
-To build, install, and run Cube with validation layers, first build layers using steps above, then run:
-```
-cd demos/android
-android update project -s -p . -t "android-23"
-ndk-build -j
-ant -buildfile cube-with-layers debug
-adb install ./cube-with-layers/bin/NativeActivity-debug.apk
-adb shell am start -a android.intent.action.MAIN -c android.intent.category.LAUNCHER -n com.example.CubeWithLayers/android.app.NativeActivity --es args "--validate"
-```
-
-To build, install, and run the Smoke demo for Android, run the following, and answer any
-prompts that come back from the script:
-```
-./update_external_sources.sh --glslang
-cd demos/smoke/android
-export ANDROID_SDK_HOME=<path to Android/Sdk>
-export ANDROID_NDK_HOME=<path to Android/Sdk/ndk-bundle>
-./build-and-install
-adb shell am start com.example.Smoke/android.app.NativeActivity
-```
-
-## Ninja Builds - All Platforms
-The [Qt Creator IDE](https://qt.io/download-open-source/#section-2) can open a root CMakeLists.txt as a project directly, and it provides tools within Creator to configure and generate Vulkan SDK build files for one to many targets concurrently, resolving configuration issues as needed. Alternatively, when invoking CMake, use the `-G "CodeBlocks - Ninja"` option to generate Ninja build files to be used as project files for QtCreator.
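-
-For example (a minimal sketch; the build directory name is arbitrary):
-```
-cmake -H. -Bbuild -G "CodeBlocks - Ninja"
-ninja -C build
-```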
-
-- Follow the steps defined elsewhere for the OS using the update\_external\_sources script or as shown in **Loader and Validation Layer Dependencies** below
-- Open, configure, and build the glslang and spirv-tools CMakeLists.txt files
-- Then do the same with the Vulkan-LoaderAndValidationLayers CMakeLists.txt file.
-- In order to debug with QtCreator, a [Microsoft WDK: e.g., WDK 10](http://go.microsoft.com/fwlink/p/?LinkId=526733) is required. Note that installing the WDK breaks the MSVC vcvarsall.bat build scripts provided by MSVC, requiring that the LIB, INCLUDE, and PATH env variables be set to the WDK paths by some other means
-
-## Loader and Validation Layer Dependencies 
-The glslang and SPIRV-Tools repos are required to build and run Loader and Validation Layer components. They are not git sub-modules of Vulkan-LoaderAndValidationLayers, but Vulkan-LoaderAndValidationLayers is linked to specific revisions of glslang and SPIRV-Tools. These can be automatically cloned and built to predefined locations with the update\_external\_sources scripts. If a custom configuration is required, do the following steps:
-
-1) clone the repos:
-
-    git clone https://github.com/KhronosGroup/glslang.git
-    git clone https://github.com/KhronosGroup/SPIRV-Tools.git
-
-
-2) checkout the correct version of each tree based on the contents of the glslang\_revision and spirv-tools\_revision files at the root of the Vulkan-LoaderAndValidationLayers tree (do the same anytime that Vulkan-LoaderAndValidationLayers is updated from remote)
-
-_on windows_
-
-    git checkout < [path to Vulkan-LoaderAndValidationLayers]\glslang_revision [in glslang repo]
-    git checkout < [path to Vulkan-LoaderAndValidationLayers]\spirv-tools_revision [in spirv-tools repo]
-
-*non windows*
-
-    git checkout `cat [path to Vulkan-LoaderAndValidationLayers]/glslang_revision` [in glslang repo]
-    git checkout `cat [path to Vulkan-LoaderAndValidationLayers]/spirv-tools_revision` [in spirv-tools repo]
-
-3) Configure the glslang and SPIRV-Tools source trees with cmake and build them with your IDE of choice
-
-4) Enable the CUSTOM\_GLSLANG\_BIN\_ROOT and CUSTOM\_SPIRV\_TOOLS\_BIN\_ROOT options in the Vulkan-LoaderAndValidationLayers cmake configuration and point the GLSLANG\_BINARY\_ROOT and SPIRV\_TOOLS\_BINARY\_ROOT variables to the correct location
-
-5) If building on Windows with MSVC, set DISABLE\_BUILDTGT\_DIR\_DECORATION to _On_. If building on Windows but without MSVC, set DISABLE\_BUILD\_PATH\_DECORATION to _On_
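-
-As an illustration (the paths are placeholders for your own build locations), such a custom configuration on Linux might look like:
-```
-cmake -H. -Bdbuild -DCMAKE_BUILD_TYPE=Debug \
-      -DCUSTOM_GLSLANG_BIN_ROOT=ON -DCUSTOM_SPIRV_TOOLS_BIN_ROOT=ON \
-      -DGLSLANG_BINARY_ROOT=<path to glslang build> \
-      -DSPIRV_TOOLS_BINARY_ROOT=<path to spirv-tools build>
-```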
-
-## Optional software packages:
-
-- [Cygwin for windows](https://www.cygwin.com/).  Notes:
-  - Cygwin provides some Linux-like tools, which are valuable for obtaining the source code, and running CMake.
-    Especially valuable are the BASH shell and git packages.
-  - If you don't want to use Cygwin, there are other shells and environments that can be used.
-    You can also use a Git package that doesn't come from Cygwin.
-	
-- [Ninja on all platforms](https://github.com/ninja-build/ninja/releases). [The Ninja-build project](https://ninja-build.org). [Ninja Users Manual](https://ninja-build.org/manual.html)
-
-- [QtCreator as IDE for CMake builds on all platforms](https://qt.io/download-open-source/#section-2)
diff --git a/BUILDVT.md b/BUILDVT.md
new file mode 100644
index 0000000..e879399
--- /dev/null
+++ b/BUILDVT.md
@@ -0,0 +1,293 @@
+# Build Instructions
+This document contains the instructions for building this repository on Linux and Windows.
+
+This repository contains additional layers and the VkTrace trace/replay tools, supplementing the
+loader and validation layer core components found at https://github.com/KhronosGroup.
+
+For Linux, this repository also contains a sample Intel Vulkan driver that is being deprecated.
+Instead of using this driver, it is suggested that you contact your graphics device manufacturer
+to obtain a Vulkan driver for your GPU hardware.
+
+## Git the Bits
+
+The public repository for the LunarG VulkanTools is hosted at https://github.com/LunarG.
+
+If you intend to contribute, the preferred work flow is to fork the repo,
+create a branch in your forked repo, do the work,
+and create a pull request on GitHub to integrate that work back into the repo.
+
+## Linux System Requirements
+Ubuntu 14.04.3 LTS, 14.10, 15.04, 15.10, and 16.04 LTS have been tested with this repo.
+
+These additional packages are needed for building the components in this repo.
+```
+# Dependencies from the LoaderAndValidationLayers repo:
+sudo apt-get install git cmake build-essential bison libx11-dev libxcb1-dev
+# Additional dependencies for this repo:
+sudo apt-get install libudev-dev libpciaccess-dev libxcb-dri3-dev libxcb-present-dev libgl1-mesa-dev wget autotools-dev
+```
+
+If you are using the sample Intel Vulkan driver in this repo, you will have to ensure that
+the DRI3 extension is active in your X Server.
+Follow the
+[DRI3 Instructions](dri3.md)
+to prepare your X Server.
+
+## Clone the Repository
+
+Note that the Vulkan-LoaderAndValidationLayers repo content is included within the VulkanTools repo.
+
+To create your local git repository of VulkanTools:
+```
+cd YOUR_DEV_DIRECTORY
+git clone git@github.com:LunarG/VulkanTools.git
+cd VulkanTools
+# This will fetch and build glslang and spirv-tools
+# On Linux, it will also fetch and build LunarGLASS
+./update_external_sources.sh         # linux
+./update_external_sources.bat --all  # windows
+```
+
+## Linux Build
+
+This build process builds the icd, vktrace and the LVL tests.
+
+Example debug build:
+```
+cd YOUR_DEV_DIRECTORY/VulkanTools  # cd to the root of the VulkanTools git repository
+cmake -H. -Bdbuild -DCMAKE_BUILD_TYPE=Debug
+cd dbuild
+make
+```
+
+## Windows System Requirements
+
+Windows 7+ with additional required software packages:
+
+- Microsoft Visual Studio 2013 Professional.  Note: it is possible that lesser/older versions may work, but that has not been tested.
+- CMake (from http://www.cmake.org/download/).  Notes:
+  - In order to build the VkTrace tools, you need at least version 3.0.
+  - Tell the installer to "Add CMake to the system PATH" environment variable.
+- Python 3 (from https://www.python.org/downloads).  Notes:
+  - Select to install the optional sub-package to add Python to the system PATH environment variable.
+  - Need python3.3 or later to get the Windows py.exe launcher that is used to get python3 rather than python2 if both are installed on Windows
+- Git (from http://git-scm.com/download/win).
+  - Note: If you use Cygwin, you can normally use Cygwin's "git.exe".  However, in order to use the "update_external_sources.bat" script, you must have this version.
+  - Tell the installer to allow it to be used for "Developer Prompt" as well as "Git Bash".
+  - Tell the installer to treat line endings "as is" (i.e. both DOS and Unix-style line endings).
+- glslang is required for tests.
+  - You can download and configure it (in a peer directory) here: https://github.com/KhronosGroup/glslang/blob/master/README.md
+  - A Windows batch file has been included that will pull and build the correct version.  Run it from Developer Command Prompt for VS2013 like so:
+    - update_external_sources.bat --build-glslang
+
+Optional software packages:
+
+- Cygwin (from https://www.cygwin.com/).  Notes:
+  - Cygwin provides some Linux-like tools, which are valuable for obtaining the source code, and running CMake.
+    Especially valuable are the BASH shell and git packages.
+  - If you don't want to use Cygwin, there are other shells and environments that can be used.
+    You can also use a Git package that doesn't come from Cygwin.
+
+## Windows Build
+
+Cygwin is used in order to obtain a local copy of the Git repository, and to run the CMake command that creates Visual Studio files.  Visual Studio is used to build the software, and will re-run CMake as appropriate.
+
+To build all Windows targets (e.g. in a "Developer Command Prompt for VS2013" window):
+```
+cd VulkanTools  # cd to the root of the VulkanTools git repository
+mkdir build
+cd build
+cmake -G "Visual Studio 12 Win64" ..
+```
+
+At this point, you can use Windows Explorer to launch Visual Studio by double-clicking on the "VULKAN.sln" file in the \build folder.  
+Once Visual Studio comes up, you can select "Debug" or "Release" from a drop-down list.  
+You can start a build with either the menu (Build->Build Solution), or a keyboard shortcut (Ctrl+Shift+B).
+As part of the build process, Python scripts will create additional Visual Studio files and projects,
+along with additional source files.  
+All of these auto-generated files are under the "build" folder.
+
+Vulkan programs must be able to find and use the Vulkan-1.dll library.
+Make sure it is either installed in the C:\Windows\System32 folder,
+or the PATH environment variable includes the folder that it is located in.
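+
+For example, to use the loader DLL straight from a debug build without installing it (the output folder shown is illustrative; it depends on your generator and configuration):
+```
+set PATH=C:\YOUR_DEV_DIRECTORY\VulkanTools\build\loader\Debug;%PATH%
+```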
+
+### Windows 64-bit Installation Notes
+If you plan on creating a Windows Install file (done in the windowsRuntimeInstaller sub-directory), you will need to build for both 32-bit and 64-bit Windows, since both versions of EXEs and DLLs exist simultaneously on Windows 64.
+
+To do this, simply create and build the release versions of each target:
+```
+cd VulkanTools  # cd to the root of the VulkanTools git repository
+mkdir build
+cd build
+cmake -G "Visual Studio 12 Win64" ..
+msbuild ALL_BUILD.vcxproj /p:Platform=x64 /p:Configuration=Release
+mkdir build32
+cd build32
+cmake -G "Visual Studio 12" ..
+msbuild ALL_BUILD.vcxproj /p:Platform=x86 /p:Configuration=Release
+```
+## Android Build
+Install the required tools for Linux and Windows covered above, then add the
+following.
+### Android Studio
+- Install version 2.1 or later of [Android Studio](http://tools.android.com/download/studio/stable)
+- From the "Welcome to Android Studio" splash screen, add the following components using Configure > SDK Manager:
+  - SDK Tools > Android NDK
+
+#### Add NDK to path
+
+On Linux:
+```
+export PATH=$HOME/Android/sdk/ndk-bundle:$PATH
+```
+On Windows:
+```
+set PATH=%LOCALAPPDATA%\Android\sdk\ndk-bundle;%PATH%
+```
+On OSX:
+```
+export PATH=$HOME/Library/Android/sdk/ndk-bundle:$PATH
+```
+### Additional OSX System Requirements
+Tested on OSX version 10.11.4
+
+Set up Homebrew and components:
+- Follow instructions on [brew.sh](http://brew.sh) to get homebrew installed.
+```
+/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
+```
+- Ensure Homebrew is at the beginning of your PATH:
+```
+export PATH=/usr/local/bin:$PATH
+```
+- Add packages with the following (may need refinement)
+```
+brew install cmake python python3 git
+```
+### Build steps for Android
+Use the following to ensure the Android build works.
+#### Linux
+Follow the setup steps for Linux, then from your terminal:
+```
+cd build-android
+./update_external_sources_android.sh
+./android-generate.sh
+ndk-build -j $(nproc)
+```
+#### OSX
+Follow the setup steps for OSX above, then from your terminal:
+```
+cd build-android
+./update_external_sources_android.sh
+./android-generate.sh
+ndk-build -j $(sysctl -n hw.ncpu)
+```
+#### Windows
+Follow the setup steps for Windows above, then from Developer Command Prompt for VS2013:
+```
+cd build-android
+update_external_sources_android.bat
+android-generate.bat
+ndk-build
+```
+
+## Android usage
+This documentation is preliminary and needs to be beefed up.
+
+See the [vktracereplay.sh](https://github.com/LunarG/VulkanTools/blob/master/build-android/vktracereplay.sh) file for a working example of how to use vktrace/vkreplay and screenshot layers.
+
+Two additional scripts have been added to facilitate tracing and replaying any APK.  Note that these two scripts do not install anything for you, so make sure your target APK, vktrace, vktrace_layer, and vkreplay all use the same ABI.
+```
+./create_trace.sh --serial 0123456789 --abi armeabi-v7a --package com.example.CubeWithLayers  --activity android.app.NativeActivity
+adb install --abi armeabi-v7a <path to the target APK>
+./replay_trace.sh --serial 0123456789 --tracefile com.example.CubeWithLayers0.vktrace
+```
+An example of using the scripts on Linux and macOS:
+```
+./build_vktracereplay.sh
+./vktracereplay.sh \
+ --serial 12345678 \
+ --abi armeabi-v7a \
+ --apk ../demos/android/cube-with-layers/bin/NativeActivity-debug.apk \
+ --package com.example.CubeWithLayers \
+ --frame 50
+```
+And on Windows:
+```
+build_vktracereplay.bat
+vktracereplay.bat ^
+ --serial 12345678 ^
+ --abi armeabi-v7a ^
+ --apk ..\demos\android\cube-with-layers\bin\NativeActivity-debug.apk ^
+ --package com.example.CubeWithLayers ^
+ --frame 50
+```
+### api_dump
+To enable, make the following changes to vk_layer_settings.txt
+```
+-lunarg_api_dump.file = FALSE
++lunarg_api_dump.file = TRUE
+
+-lunarg_api_dump.log_filename = stdout
++lunarg_api_dump.log_filename = /sdcard/Android/vk_apidump.txt
+```
+Then:
+```
+adb push vk_layer_settings.txt /sdcard/Android
+```
+And run your application with the following layer enabled:
+```
+VK_LAYER_LUNARG_api_dump
+```
+### screenshot
+To enable, set a property that contains target frame:
+```
+adb shell setprop debug.vulkan.screenshot <framenumber>
+```
+For production builds, be sure your application has access to read and write to external storage by adding the following to AndroidManifest.xml:
+```
+<!-- This allows writing log files to sdcard -->
+<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/>
+<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
+```
+You may also need to grant it access with package manager:
+```
+adb shell pm grant com.example.Cube android.permission.READ_EXTERNAL_STORAGE
+adb shell pm grant com.example.Cube android.permission.WRITE_EXTERNAL_STORAGE
+```
+Run your application with the following layer enabled:
+```
+VK_LAYER_LUNARG_screenshot
+```
+The resulting screenshot will be in:
+```
+/sdcard/Android/<framenumber>.ppm
+```
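+To view it, pull the image back to the host, e.g. (frame number shown is hypothetical):
+```
+adb pull /sdcard/Android/1.ppm
+```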
+### vktrace
+To record a trace on Android, enable port forwarding from the device to the host:
+```
+adb reverse localabstract:vktrace tcp:34201
+```
+Start up vktrace on the host in server mode:
+```
+vktrace -v full -o cube.vktrace
+```
+Run your application with the following layer enabled:
+```
+VK_LAYER_LUNARG_vktrace
+```
+The trace will be recorded on the host.
+### vkreplay
+To replay a trace, push the trace to your device:
+```
+adb push cube.vktrace /sdcard/cube.vktrace
+```
+Grant vkreplay the ability to read it:
+```
+adb shell pm grant com.example.vkreplay android.permission.READ_EXTERNAL_STORAGE
+adb shell pm grant com.example.vkreplay android.permission.WRITE_EXTERNAL_STORAGE
+```
+And start the native activity:
+```
+adb shell am start -a android.intent.action.MAIN -c android.intent.category.LAUNCHER -n com.example.vkreplay/android.app.NativeActivity --es args "-v\ full\ -t\ /sdcard/cube.vktrace"
+```
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 8176116..d4b3738 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -2,7 +2,7 @@
 # refer to the root source directory of the project as ${VULKAN_SOURCE_DIR} and
 # to the root binary directory of the project as ${VULKAN_BINARY_DIR}.
 cmake_minimum_required(VERSION 2.8.11)
-project (VULKAN)
+project (VULKAN_TOOLS)
 # set (CMAKE_VERBOSE_MAKEFILE 1)
 
 # The API_NAME allows renaming builds to avoid conflicts with installed SDKs
@@ -51,6 +51,8 @@
     if (NOT BUILD_WSI_XCB_SUPPORT AND NOT BUILD_WSI_XLIB_SUPPORT AND NOT BUILD_WSI_WAYLAND_SUPPORT AND NOT BUILD_WSI_MIR_SUPPORT)
         set(DisplayServer Display)
     endif()
+elseif(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
+    # Only vktrace is supported on macOS
 else()
     message(FATAL_ERROR "Unsupported Platform!")
 endif()
@@ -90,7 +92,6 @@
 else()
 option(DISABLE_BUILD_PATH_DECORATION "Disable the decoration of the glslang and SPIRV-Tools build path with MSVC build type info" OFF)
 option(DISABLE_BUILDTGT_DIR_DECORATION "Disable the decoration of the glslang and SPIRV-Tools build path with target info" OFF)
-
     # For Windows, since 32-bit and 64-bit items can co-exist, we build each in its own build directory.
     # 32-bit target data goes in build32, and 64-bit target data goes into build.  So, include/link the
     # appropriate data at build time.
@@ -109,13 +110,43 @@
     endif()
 endif()
 
-option(BUILD_LOADER "Build loader" ON)
-option(BUILD_TESTS "Build tests" ON)
-option(BUILD_LAYERS "Build layers" ON)
-option(BUILD_DEMOS "Build demos" ON)
-option(BUILD_VKJSON "Build vkjson" ON)
-option(CUSTOM_GLSLANG_BIN_ROOT "Use the user defined GLSLANG_BINARY_ROOT" OFF)
-option(CUSTOM_SPIRV_TOOLS_BIN_ROOT "Use the user defined SPIRV_TOOLS_BINARY_ROOT" OFF)
+if (CMAKE_SYSTEM_NAME STREQUAL "Windows" OR
+    CMAKE_SYSTEM_NAME STREQUAL "Linux")
+
+    # These are unchanged from upstream file
+    option(BUILD_LOADER "Build loader" ON)
+    if(WIN32)
+        option(BUILD_ICD "Build LunarG intel icd" OFF)
+    else()
+        option(BUILD_ICD "Build LunarG intel icd" ON)
+    endif()
+    option(BUILD_TESTS "Build tests" ON)
+    option(BUILD_LAYERS "Build layers" ON)
+    option(BUILD_LAYERSVT "Build layersvt" ON)
+    option(BUILD_DEMOS "Build demos" ON)
+    option(BUILD_VKTRACE "Build VkTrace" ON)
+    option(BUILD_VKJSON "Build vkjson" ON)
+    option(BUILD_VIA "Build via" ON)
+    option(CUSTOM_GLSLANG_BIN_ROOT "Use the user defined GLSLANG_BINARY_ROOT" OFF)
+    option(CUSTOM_SPIRV_TOOLS_BIN_ROOT "Use the user defined SPIRV_TOOLS_BINARY_ROOT" OFF)
+
+elseif (CMAKE_SYSTEM_NAME STREQUAL "Darwin")
+
+    # Only vktrace is enabled for macOS
+    option(BUILD_VKTRACE "Build VkTrace" ON)
+    option(BUILD_LOADER "Build loader" OFF)
+    option(BUILD_ICD "Build LunarG intel icd" OFF)
+    option(BUILD_TESTS "Build tests" OFF)
+    option(BUILD_LAYERS "Build layers" OFF)
+    option(BUILD_LAYERSVT "Build layersvt" OFF)
+    option(BUILD_VKTRACEVIEWER "Build VkTraceViewer" OFF)
+    option(BUILD_DEMOS "Build demos" OFF)
+    option(BUILD_VKJSON "Build vkjson" OFF)
+    option(BUILD_VIA "Build via" OFF)
+    option(BUILD_VKTRACE_LAYER "Build vktrace layer" OFF)
+    option(BUILD_VKTRACE_REPLAY "Build vkreplay" OFF)
+
+endif()
 
 #Choose natural default paths for glslang and SPIRV-Tools binaries to support custom definition by the user on the CMake command line or in the GUI
 set(GLSLANG_BINARY_ROOT "${CMAKE_BINARY_DIR}/../glslang" CACHE STRING "User defined path to the glslang binaries for this project")
@@ -197,27 +228,44 @@
                                                    "${EXTERNAL_SOURCE_ROOT}/source/spirv-tools/external/include"
                                              DOC "Path to spirv-tools/libspirv.h")
 
-find_library(GLSLANG_LIB NAMES glslang
+find_path(JSONCPP_INCLUDE_DIR json/json.h HINTS "${EXTERNAL_SOURCE_ROOT}/jsoncpp/dist"
+                                                "${EXTERNAL_SOURCE_ROOT}/JsonCpp/dist"
+                                                "${EXTERNAL_SOURCE_ROOT}/JsonCPP/dist"
+                                                "${EXTERNAL_SOURCE_ROOT}/JSONCPP/dist"
+                                                "${CMAKE_SOURCE_DIR}/../jsoncpp/dist"
+                                          DOC "Path to jsoncpp/dist/json/json.h")
+
+find_path(JSONCPP_SOURCE_DIR jsoncpp.cpp HINTS "${EXTERNAL_SOURCE_ROOT}/jsoncpp/dist"
+                                               "${EXTERNAL_SOURCE_ROOT}/JsonCpp/dist"
+                                               "${EXTERNAL_SOURCE_ROOT}/JsonCPP/dist"
+                                               "${EXTERNAL_SOURCE_ROOT}/JSONCPP/dist"
+                                               "${CMAKE_SOURCE_DIR}/../jsoncpp/dist"
+                                         DOC "Path to jsoncpp/dist/jsoncpp.cpp")
+
+    find_library(GLSLANG_LIB NAMES glslang
+        HINTS ${GLSLANG_SEARCH_PATH} )
+
+    find_library(OGLCompiler_LIB NAMES OGLCompiler
+        HINTS ${GLSLANG_SEARCH_PATH} )
+
+    find_library(OSDependent_LIB NAMES OSDependent
+        HINTS ${GLSLANG_SEARCH_PATH} )
+
+    find_library(HLSL_LIB NAMES HLSL
+        HINTS ${GLSLANG_SEARCH_PATH} )
+
+    find_library(SPIRV_LIB NAMES SPIRV
+        HINTS ${GLSLANG_SEARCH_PATH} )
+
+    find_library(SPIRV_REMAPPER_LIB NAMES SPVRemapper
              HINTS ${GLSLANG_SEARCH_PATH} )
 
-find_library(OGLCompiler_LIB NAMES OGLCompiler
-             HINTS ${GLSLANG_SEARCH_PATH} )
-
-find_library(OSDependent_LIB NAMES OSDependent
-             HINTS ${GLSLANG_SEARCH_PATH} )
-
-find_library(HLSL_LIB NAMES HLSL
-             HINTS ${GLSLANG_SEARCH_PATH} )
-
-find_library(SPIRV_LIB NAMES SPIRV
-             HINTS ${GLSLANG_SEARCH_PATH} )
-
-find_library(SPIRV_REMAPPER_LIB NAMES SPVRemapper
-             HINTS ${GLSLANG_SEARCH_PATH} )
-
-find_library(SPIRV_TOOLS_LIB NAMES SPIRV-Tools
+    find_library(SPIRV_TOOLS_LIB NAMES SPIRV-Tools
              HINTS ${SPIRV_TOOLS_SEARCH_PATH} )
 
+    find_library(JSONCPP_LIB NAMES jsoncpp
+             HINTS ${JSONCPP_SEARCH_PATH} )
+
 if (WIN32)
     add_library(glslang     STATIC IMPORTED)
     add_library(OGLCompiler STATIC IMPORTED)
@@ -227,6 +275,7 @@
     add_library(SPVRemapper       STATIC IMPORTED)
     add_library(Loader      STATIC IMPORTED)
     add_library(SPIRV-Tools STATIC IMPORTED)
+    add_library(jsoncpp     STATIC IMPORTED)
 
     find_library(GLSLANG_DLIB NAMES glslangd
                  HINTS ${GLSLANG_DEBUG_SEARCH_PATH} )
@@ -242,19 +291,21 @@
                  HINTS ${GLSLANG_DEBUG_SEARCH_PATH} )
     find_library(SPIRV_TOOLS_DLIB NAMES SPIRV-Tools
                  HINTS ${SPIRV_TOOLS_DEBUG_SEARCH_PATH} )
+    find_library(JSONCPP_DLIB NAMES jsoncpp
+                 HINTS ${JSONCPP_DEBUG_SEARCH_PATH} )
 
     set_target_properties(glslang PROPERTIES
-                         IMPORTED_LOCATION       "${GLSLANG_LIB}"
-                         IMPORTED_LOCATION_DEBUG "${GLSLANG_DLIB}")
+                          IMPORTED_LOCATION       "${GLSLANG_LIB}"
+                          IMPORTED_LOCATION_DEBUG "${GLSLANG_DLIB}")
     set_target_properties(OGLCompiler PROPERTIES
-                         IMPORTED_LOCATION       "${OGLCompiler_LIB}"
-                         IMPORTED_LOCATION_DEBUG "${OGLCompiler_DLIB}")
+                          IMPORTED_LOCATION       "${OGLCompiler_LIB}"
+                          IMPORTED_LOCATION_DEBUG "${OGLCompiler_DLIB}")
     set_target_properties(OSDependent PROPERTIES
-                         IMPORTED_LOCATION       "${OSDependent_LIB}"
-                         IMPORTED_LOCATION_DEBUG "${OSDependent_DLIB}")
+                          IMPORTED_LOCATION       "${OSDependent_LIB}"
+                          IMPORTED_LOCATION_DEBUG "${OSDependent_DLIB}")
     set_target_properties(HLSL PROPERTIES
-                         IMPORTED_LOCATION       "${HLSL_LIB}"
-                         IMPORTED_LOCATION_DEBUG "${HLSL_DLIB}")
+                          IMPORTED_LOCATION       "${HLSL_LIB}"
+                          IMPORTED_LOCATION_DEBUG "${HLSL_DLIB}")
     set_target_properties(SPIRV PROPERTIES
                          IMPORTED_LOCATION       "${SPIRV_LIB}"
                          IMPORTED_LOCATION_DEBUG "${SPIRV_DLIB}")
@@ -262,8 +313,11 @@
                          IMPORTED_LOCATION       "${SPIRV_REMAPPER_LIB}"
                          IMPORTED_LOCATION_DEBUG "${SPIRV_REMAPPER_DLIB}")
     set_target_properties(SPIRV-Tools PROPERTIES
-                         IMPORTED_LOCATION       "${SPIRV_TOOLS_LIB}"
-                         IMPORTED_LOCATION_DEBUG "${SPIRV_TOOLS_DLIB}")
+                          IMPORTED_LOCATION       "${SPIRV_TOOLS_LIB}"
+                          IMPORTED_LOCATION_DEBUG "${SPIRV_TOOLS_DLIB}")
+    set_target_properties(jsoncpp PROPERTIES
+                         IMPORTED_LOCATION       "${JSONCPP_LIB}"
+                         IMPORTED_LOCATION_DEBUG "${JSONCPP_DLIB}")
 
     set (GLSLANG_LIBRARIES glslang OGLCompiler OSDependent HLSL SPIRV SPVRemapper)
     set (SPIRV_TOOLS_LIBRARIES SPIRV-Tools)
@@ -272,6 +326,15 @@
     set (SPIRV_TOOLS_LIBRARIES ${SPIRV_TOOLS_LIB})
 endif()
 
+if (BUILD_ICD)
+    # Hard code our LunarGLASS path for now
+    get_filename_component(LUNARGLASS_PREFIX external/LunarGLASS ABSOLUTE)
+
+    if(NOT EXISTS ${LUNARGLASS_PREFIX})
+        message(FATAL_ERROR "Necessary LunarGLASS components do not exist: " ${LUNARGLASS_PREFIX})
+    endif()
+endif()
+
 set (PYTHON_CMD ${PYTHON_EXECUTABLE})
 
 if(NOT WIN32)
@@ -296,11 +359,16 @@
 endif()
 
 # loader: Generic VULKAN ICD loader
+# icd: Device dependent (DD) VULKAN components
 # tests: VULKAN tests
 if(BUILD_LOADER)
     add_subdirectory(loader)
 endif()
 
+if(BUILD_ICD)
+    add_subdirectory(icd)
+endif()
+
 if(BUILD_TESTS)
     add_subdirectory(tests)
 endif()
@@ -309,10 +377,22 @@
     add_subdirectory(layers)
 endif()
 
+if(BUILD_LAYERSVT)
+    add_subdirectory(layersvt)
+endif()
+
 if(BUILD_DEMOS)
     add_subdirectory(demos)
 endif()
 
+if(BUILD_VKTRACE)
+    add_subdirectory(vktrace)
+endif()
+
 if(BUILD_VKJSON)
     add_subdirectory(libs/vkjson)
 endif()
+
+if(BUILD_VIA)
+    add_subdirectory(via)
+endif()
diff --git a/COPYRIGHT.txt b/COPYRIGHT.txt
index a30ee9e..0c9484f 100644
--- a/COPYRIGHT.txt
+++ b/COPYRIGHT.txt
@@ -1,4 +1,4 @@
-This file contains other licenses and their copyrights that appear in this
+This file contains other licenses and their copyrights that appear in this
 repository besides Apache 2.0 license.
 
 ===================================================
@@ -129,3 +129,62 @@
 /// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 /// THE SOFTWARE.
 ///
+
+===================================================
+The JsonCpp library's source code, including accompanying documentation,
+tests and demonstration applications, are licensed under the following
+conditions...
+
+The author (Baptiste Lepilleur) explicitly disclaims copyright in all
+jurisdictions which recognize such a disclaimer. In such jurisdictions,
+this software is released into the Public Domain.
+
+In jurisdictions which do not recognize Public Domain property (e.g. Germany as of
+2010), this software is Copyright (c) 2007-2010 by Baptiste Lepilleur, and is
+released under the terms of the MIT License (see below).
+
+In jurisdictions which recognize Public Domain property, the user of this
+software may choose to accept it either as 1) Public Domain, 2) under the
+conditions of the MIT License (see below), or 3) under the terms of dual
+Public Domain/MIT License conditions described here, as they choose.
+
+The MIT License is about as close to Public Domain as a license can get, and is
+described in clear, concise terms at:
+
+   http://en.wikipedia.org/wiki/MIT_License
+
+The full text of the MIT License follows:
+
+========================================================================
+Copyright (c) 2007-2010 Baptiste Lepilleur
+
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation
+files (the "Software"), to deal in the Software without
+restriction, including without limitation the rights to use, copy,
+modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+========================================================================
+(END LICENSE TEXT)
+
+The MIT license is compatible with both the GPL and commercial
+software, affording one all of the rights of Public Domain with the
+minor nuisance of being required to keep the above copyright notice
+and license text in the source code. Note also that by accepting the
+Public Domain "license" you can re-license your copy using whatever
+license you like.
+
+===================================================
diff --git a/LunarGLASS/Core/TopToBottom.cpp b/LunarGLASS/Core/TopToBottom.cpp
new file mode 100644
index 0000000..778a205
--- /dev/null
+++ b/LunarGLASS/Core/TopToBottom.cpp
@@ -0,0 +1,352 @@
+//===- TopToBottom.cpp - Translate Top IR to Bottom IR --------------------===//
+//
+// LunarGLASS: An Open Modular Shader Compiler Architecture
+// Copyright (C) 2010-2014 LunarG, Inc.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// 
+//     Redistributions of source code must retain the above copyright
+//     notice, this list of conditions and the following disclaimer.
+// 
+//     Redistributions in binary form must reproduce the above
+//     copyright notice, this list of conditions and the following
+//     disclaimer in the documentation and/or other materials provided
+//     with the distribution.
+// 
+//     Neither the name of LunarG Inc. nor the names of its
+//     contributors may be used to endorse or promote products derived
+//     from this software without specific prior written permission.
+// 
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+// COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
+//===----------------------------------------------------------------------===//
+//
+// Author: John Kessenich, LunarG
+//
+// Translate Top IR to Bottom IR
+//
+//===----------------------------------------------------------------------===//
+
+// LLVM includes
+#pragma warning(push, 1)
+#include "llvm/Assembly/PrintModulePass.h"
+#include "llvm/IR/DerivedTypes.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/DataLayout.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/PassManager.h"
+#include "llvm/Analysis/Passes.h"
+#include "llvm/Analysis/Verifier.h"
+#include "llvm/Transforms/IPO.h"
+#include "llvm/Transforms/Scalar.h"
+#include "llvm/Support/raw_ostream.h"
+#include <cstdio>
+#include <string>
+#include <map>
+#include <vector>
+#pragma warning(pop)
+
+// LunarGLASS includes
+#include "Exceptions.h"
+#include "Backend.h"
+#include "PrivateManager.h"
+#include "Options.h"
+
+// LunarGLASS Passes
+#include "Passes/Passes.h"
+
+
+void gla::PrivateManager::translateTopToBottom()
+{
+#ifdef _WIN32
+    unsigned int oldFormat = _set_output_format(_TWO_DIGIT_EXPONENT);
+#endif
+
+    runLLVMOptimizations1();
+
+#ifdef _WIN32
+    _set_output_format(oldFormat);
+#endif
+
+    int innerAoS, outerSoA;
+    backEnd->getRegisterForm(outerSoA, innerAoS);
+    if (outerSoA != 1)
+        UnsupportedFunctionality("SoA in middle end: ", outerSoA);
+    if (innerAoS != 4 && innerAoS != 1)
+        UnsupportedFunctionality("AoS other than size 4 or 1 in middle end: ", innerAoS);
+}
+
+void gla::PrivateManager::dump(const char *heading)
+{
+    llvm::errs() << heading;
+    module->dump();
+}
+
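+// Run the given function pass manager over every function in the module.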
+static inline void RunOnModule(llvm::FunctionPassManager& pm, llvm::Module* m)
+{
+    pm.doInitialization();
+    for (llvm::Module::iterator f = m->begin(), e = m->end(); f != e; ++f)
+        pm.run(*f);
+    pm.doFinalization();
+}
+
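+// Return true if any basic block in the module ends with more than one
+// successor, i.e., the module contains real control flow.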
+static bool HasControlFlow(llvm::Module* m)
+{
+    for (llvm::Module::iterator f = m->begin(), e = m->end(); f != e; ++f) {
+        for (llvm::Function::iterator b = f->begin(), be = f->end(); b != be; ++b) {
+            if (b->getTerminator()->getNumSuccessors() > 1) {
+                return true;
+            }
+        }
+    }
+    return false;
+}
+
+// Verify each function
+static inline void VerifyModule(llvm::Module* module)
+{
+#ifndef NDEBUG
+
+    llvm::FunctionPassManager verifier(module);
+    verifier.add(llvm::createVerifierPass());
+    RunOnModule(verifier, module);
+
+#endif // NDEBUG
+}
+
+void gla::PrivateManager::runLLVMOptimizations1()
+{
+    VerifyModule(module);
+
+    // TODO: generated code performance: When we have backend support for shuffles, or we canonicalize
+    // shuffles into multiinserts, we can replace the InstSimplify passes with
+    // InstCombine passes.
+
+    // First, do some global (module-level) optimizations, which can free up
+    // function passes to do more.
+    llvm::PassManager globalPM;
+    globalPM.add(llvm::createGlobalOptimizerPass());
+    globalPM.add(llvm::createIPSCCPPass());
+    globalPM.add(llvm::createConstantMergePass());
+    globalPM.add(llvm::createInstructionSimplifierPass());
+    if (options.optimizations.inlineThreshold)
+        globalPM.add(llvm::createAlwaysInlinerPass());
+    globalPM.add(llvm::createPromoteMemoryToRegisterPass());
+    globalPM.run(*module);
+
+    // Next, do interprocedural passes
+    // TODO: generated code performance: If we ever have non-inlined functions, we'll want to add some interprocedural passes
+
+    VerifyModule(module);
+
+    // Set up the function-level optimizations we want
+    // TODO: generated code performance: explore ordering of passes more
+    llvm::FunctionPassManager passManager(module);
+
+
+    // Add target data to unblock optimizations that require it
+    // This matches default except for endianness (little) and pointer size/alignment (32)
+    llvm::DataLayout* DL = new llvm::DataLayout("e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64");
+    passManager.add(DL);
+
+    // Create immutable passes once
+    passManager.add(llvm::createBasicAliasAnalysisPass());
+    passManager.add(llvm::createTypeBasedAliasAnalysisPass());
+
+    // Provide the backend queries
+    passManager.add(gla_llvm::createBackEndPointerPass(backEnd));
+
+    // TODO: explore SimplifyLibCalls
+    // TODO: compile-time performance: see if we can avoid running gvn/sccp multiple times
+
+    // Early, simple optimizations to enable others/make others more efficient
+    //passManager.add(llvm::createScalarReplAggregatesPass());
+    passManager.add(llvm::createInstructionCombiningPass());
+    passManager.add(llvm::createEarlyCSEPass());
+    passManager.add(llvm::createCorrelatedValuePropagationPass());
+
+    bool hasCf = HasControlFlow(module);
+
+    if (hasCf) {
+        passManager.add(llvm::createCFGSimplificationPass());
+        passManager.add(llvm::createLoopSimplifyPass());
+        passManager.add(gla_llvm::createCanonicalizeCFGPass());
+        passManager.add(gla_llvm::createDecomposeInstsPass());
+        passManager.add(gla_llvm::createCanonicalizeCFGPass());
+
+        // TODO: Compile-time performance: something goes stale in FlattenConditionalAssignments (dom trees?).
+        //       Running it multiple times here catches more, whereas running it multiple times internally does not help.
+        //       Once that's fixed, most at this level can be eliminated.
+        passManager.add(gla_llvm::createFlattenConditionalAssignmentsPass(options.optimizations.flattenHoistThreshold));
+        passManager.add(gla_llvm::createFlattenConditionalAssignmentsPass(options.optimizations.flattenHoistThreshold));
+        passManager.add(gla_llvm::createFlattenConditionalAssignmentsPass(options.optimizations.flattenHoistThreshold));
+
+        passManager.add(gla_llvm::createCanonicalizeCFGPass());
+    } else        
+        passManager.add(gla_llvm::createDecomposeInstsPass());
+
+    int innerAoS, outerSoA;
+    backEnd->getRegisterForm(outerSoA, innerAoS);
+    if (innerAoS == 1) {
+        passManager.add(gla_llvm::createScalarizePass());
+    }
+
+    if (options.optimizations.reassociate)
+        passManager.add(llvm::createReassociatePass());
+    passManager.add(llvm::createInstructionCombiningPass());
+
+    //if (options.optimizations.gvn)
+    //    passManager.add(llvm::createGVNPass());
+    passManager.add(llvm::createSCCPPass());
+
+    if (hasCf) {
+        passManager.add(llvm::createLoopSimplifyPass());
+        passManager.add(gla_llvm::createCanonicalizeCFGPass());
+        passManager.add(gla_llvm::createFlattenConditionalAssignmentsPass(options.optimizations.flattenHoistThreshold));
+        passManager.add(gla_llvm::createFlattenConditionalAssignmentsPass(options.optimizations.flattenHoistThreshold));
+        passManager.add(gla_llvm::createCanonicalizeCFGPass());
+    }
+
+    // Make multiinsert intrinsics, and clean up afterwards
+    passManager.add(llvm::createInstructionCombiningPass());
+    if (options.optimizations.coalesce)
+        passManager.add(gla_llvm::createCoalesceInsertsPass());
+    if (options.optimizations.adce)
+        passManager.add(llvm::createAggressiveDCEPass());
+    passManager.add(llvm::createInstructionCombiningPass());
+
+    if (hasCf) {
+        // Loop optimizations, and clean up afterwards
+        passManager.add(llvm::createLICMPass());
+        passManager.add(llvm::createIndVarSimplifyPass());
+        if (options.optimizations.loopUnrollThreshold) {
+            // Loop rotation creates a less desirable loop form for loops that do not get unrolled,
+            // but is needed if a loop will be unrolled.
+            passManager.add(llvm::createLoopRotatePass(options.optimizations.loopUnrollThreshold));
+            passManager.add(llvm::createIndVarSimplifyPass());
+            passManager.add(llvm::createLoopUnrollPass(options.optimizations.loopUnrollThreshold));
+        }
+        passManager.add(llvm::createLoopStrengthReducePass());
+        if (options.optimizations.adce)
+            passManager.add(llvm::createAggressiveDCEPass());
+
+        passManager.add(llvm::createInstructionCombiningPass());
+        //if (options.optimizations.gvn)
+        //    passManager.add(llvm::createGVNPass());
+        passManager.add(llvm::createSCCPPass());
+    }
+
+    // Run intrinsic combining
+    passManager.add(gla_llvm::createCanonicalizeCFGPass());
+    passManager.add(llvm::createInstructionCombiningPass());
+    passManager.add(gla_llvm::createIntrinsicCombinePass());
+    passManager.add(gla_llvm::createCanonicalizeCFGPass());
+
+    //if (options.optimizations.gvn)
+    //    passManager.add(llvm::createGVNPass());
+    passManager.add(llvm::createSCCPPass());
+
+    // TODO: generated code: Consider if we really want it. For some reason StandardPasses.h
+    // doesn't have it listed.
+    // passManager.add(llvm::createSinkingPass());
+
+    // Run some post-redundancy-elimination passes
+    //passManager.add(llvm::createScalarReplAggregatesPass());
+    passManager.add(llvm::createInstructionCombiningPass());
+    passManager.add(llvm::createCorrelatedValuePropagationPass());
+    if (options.optimizations.adce)
+        passManager.add(llvm::createAggressiveDCEPass());
+
+    if (hasCf) {
+        // LunarGLASS CFG optimizations
+        passManager.add(llvm::createLoopSimplifyPass());
+        passManager.add(gla_llvm::createCanonicalizeCFGPass());
+        passManager.add(gla_llvm::createFlattenConditionalAssignmentsPass(options.optimizations.flattenHoistThreshold));
+        passManager.add(gla_llvm::createCanonicalizeCFGPass());
+
+        passManager.add(llvm::createInstructionCombiningPass());
+        if (options.optimizations.adce)
+            passManager.add(llvm::createAggressiveDCEPass());
+    }
+
+    RunOnModule(passManager, module);
+
+    VerifyModule(module);
+
+    // Post Function passes cleanup
+    llvm::PassManager pm;
+    pm.add(llvm::createInstructionCombiningPass());
+    pm.add(llvm::createDeadStoreEliminationPass());
+    if (options.optimizations.adce)
+        pm.add(llvm::createAggressiveDCEPass());
+    pm.add(llvm::createStripDeadPrototypesPass());
+    
+    // TODO: function-call functionality: Consider using the below in the presence of functions
+    // pm.add(llvm::createGlobalDCEPass());
+
+    pm.run(*module);
+
+    VerifyModule(module);
+
+    // TODO: Refactor the below use of GlobalOpt. Perhaps we want to repeat
+    // some of our function passes?
+
+    llvm::PassManager modulePassManager;
+    modulePassManager.add(llvm::createGlobalOptimizerPass());
+
+    // Optimize the whole module
+    bool changed = modulePassManager.run(*module);
+
+    if (changed) {
+        // removing globals created stack allocations we want to eliminate
+        llvm::FunctionPassManager postGlobalManager(module);
+        postGlobalManager.add(llvm::createPromoteMemoryToRegisterPass());
+
+        // run across all functions
+        postGlobalManager.doInitialization();
+        for (llvm::Module::iterator function = module->begin(), lastFunction = module->end(); function != lastFunction; ++function) {
+            postGlobalManager.run(*function);
+        }
+        postGlobalManager.doFinalization();
+    }
+
+    if (! backEnd->preferRegistersOverMemory()) {
+        llvm::FunctionPassManager memoryPassManager(module);
+        memoryPassManager.add(llvm::createDemoteRegisterToMemoryPass());
+
+        memoryPassManager.doInitialization();
+        for (llvm::Module::iterator function = module->begin(), lastFunction = module->end(); function != lastFunction; ++function) {
+            memoryPassManager.run(*function);
+        }
+        memoryPassManager.doFinalization();
+    }
+
+    VerifyModule(module);
+
+    // Put the IR into a canonical form for BottomTranslator.
+    llvm::PassManager canonicalize;
+
+    canonicalize.add(llvm::createIndVarSimplifyPass());
+    canonicalize.add(gla_llvm::createCanonicalizeCFGPass());
+    canonicalize.add(gla_llvm::createBackEndPointerPass(backEnd));
+    canonicalize.add(gla_llvm::createGatherInstsPass());
+    canonicalize.add(gla_llvm::createCanonicalizeInstsPass());
+    canonicalize.add(llvm::createStripDeadPrototypesPass());
+    canonicalize.run(*module);
+
+    VerifyModule(module);
+}
diff --git a/LunarGLASS/Core/metadata.h b/LunarGLASS/Core/metadata.h
new file mode 100644
index 0000000..4022296
--- /dev/null
+++ b/LunarGLASS/Core/metadata.h
@@ -0,0 +1,966 @@
+//===- metadata.h - LLVM Metadata for LunarGLASS -============================//
+//
+// LunarGLASS: An Open Modular Shader Compiler Architecture
+// Copyright (C) 2010-2014 LunarG, Inc.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+//
+//     Redistributions of source code must retain the above copyright
+//     notice, this list of conditions and the following disclaimer.
+//
+//     Redistributions in binary form must reproduce the above
+//     copyright notice, this list of conditions and the following
+//     disclaimer in the documentation and/or other materials provided
+//     with the distribution.
+//
+//     Neither the name of LunarG Inc. nor the names of its
+//     contributors may be used to endorse or promote products derived
+//     from this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+// COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
+//===----------------------------------------------------------------------===//
+//
+// Author: John Kessenich, LunarG
+//
+//===----------------------------------------------------------------------===//
+
+#pragma once
+#ifndef metadata_H
+#define metadata_H
+
+// LunarGLASS includes
+#include "LunarGLASSTopIR.h"
+
+// LLVM includes
+#pragma warning(push, 1)
+#include "llvm/IR/Module.h"
+#include "llvm/IR/Metadata.h"
+#pragma warning(pop)
+
+namespace gla {
+
+// Forms of metadata nodes, pointed to by instructions, by named metadata, or by other metadata.
+//
+// Only the names starting with "!gla." actually appear in the IR; the other names here are
+// for ease of understanding the linkage between the nodes.
+//
+// NOTE: There are *two* forms the recursive type-walking metadata can appear in:
+// - Single-Walk form: A single self-recursive node, the same one used to root
+//   !gla.input/output/uniform nodes.  This enables a single type tree walker
+//   through the metadata nodes.
+// - Dual-Walk form: The type is rooted by a !gla.input/output/uniform node, which
+//   points to a recursive !aggregate node.  This requires walking the LLVM type
+//   in parallel with walking the metadata nodes.
+// Only one of these forms should be used in a given module.
+//
+// Node forms:
+//
+//     !gla.entrypoint -> { name, EMIoEntrypoint }
+//     Notes:
+//         - contains the name of a source entry point into the shader; e.g., "main"
+//
+//     !gla.precision -> { EMdPrecision }
+//
+//     !gla.io is shorthand for one of !gla.input, !gla.output, !gla.uniform
+//
+//     !gla.io -> { name, EMdInputOutput, Value*, !typeLayout, !aggregate }
+//     This is Dual-Walk form:
+//         - the name is the name of the object (instance-name for a block)
+//              * for a block with no instance name, the name here will be empty ("")
+//         - Value* is a proxy for getting the LLVM type of the root of the type
+//         - !aggregate is for blocks and structures
+//              * it's a block when EMdInputOutput is EMio*Block*, !typeLayout will say how the block is laid out
+//              * it's a structure when the EMdTypeLayout is EMtlAggregate
+//         - for blocks, the instance name is the name above, while the interface name is the name in the !aggregate
+//
+//     !gla.io -> { instanceName, EMdInputOutput, Value*, !typeLayout, typeName, !gla.io, !gla.io, !gla.io, ... }
+//     This is Single-Walk form
+//         - the instanceName is the name of the object instance (instance-name for a block)
+//              * for a block with no instance name, the name here will be empty ("")
+//         - Value* is a proxy for getting the LLVM type of this root or intermediate type in the tree
+//              * this must be looked at to get the current level's arrayness information
+//         - typeName is for block interface name or struct type name
+//              * will be empty ("") if this level is not a struct or block
+//              * it's a block when EMdInputOutput is EMio*Block*, !typeLayout will say how the block is laid out
+//              * it's a structure when the EMdTypeLayout is EMtlAggregate
+//              * the !gla.io operands are the child members, in order, of the type
+//         - for blocks, the interface name is typeName
+//
+//     !sampler -> { EMdSampler, Value*, EMdSamplerDim, array, shadow, EMdSamplerBaseType }
+//     Notes:
+//         - EMdSampler says whether it is an image or texture, and if an image what its format is
+//         - texel return value has the same type as Value*, modified by EMdSamplerBaseType (e.g., unsigned int)
+//         - array is bool, true if it is a samplerArray
+//         - shadow is bool, true if it is a samplerShadow
+//
+//     !typeLayout -> { EMdTypeLayout, EMdPrecision, location, !sampler, interpMode, EMdBuiltIn, binding, EMdQualifierShift mask, offset }
+//     Notes:
+//         - the EMdTypeLayout is how it is known if something is a matrix or unsigned integer,
+//           because this is not in the LLVM type
+//         - 'location'
+//           - the *first* location of the variable, which can span many slots/locations
+//           - is >= MaxUserLayoutLocation for non-user specified locations, to be patched later by a linker
+//           - is < MaxUserLayoutLocation for user-assigned locations
+//         - the intrinsic's slot can be different when reading a single slot out of the middle of a large input/output
+//         - interpMode is present as an integer encoded for MakeInterpolationMode() and CrackInterpolationMode();
+//           it is also present whenever there is an EMdBuiltIn, and will be -1 if there is no interpolation mode
+//         - EMdBuiltIn, if present, says what built-in variable is being represented.  It is optional.
+//         - binding, if present, is the binding
+//             - a binding of -1 means the source specified no binding
+//         - EMdQualifierShift, if present, mask is a collection of additional qualifiers
+//             - a value of 0 means no qualifiers were specified in the source
+//         - offset, if present, is an integer offset for the object (as per GLSL)
+//             - an offset of -1 means the source specified no offset
+//
+//     !aggregate -> { name, !typeLayout, list of members: name1, !aggregate1, name2, !aggregate2, ... }
+//     Notes:
+//         - this recursively represents nested types, paralleling the LLVM type, but complementing it
+//           (e.g., an llvm array of array might have EMdTypeLayout of EMtlRowMajorMatrix)
+//         - the name in operand 0 is the name of the type (interface name for a block)
+//         - each contained member has a pair of operands, one for the member's name, one for the member's type
+//
+//     When aggregates are used, the starting point is always a !gla.uniform, !gla.output, or !gla.input node, which will
+//     supply overall information and then point to a !aggregate for the hierarchical information.
+//
+//     Blocks can have two names:
+//        1) the instance name used in a reference, which could be missing
+//        2) the interface name used only in the declaration, which must be present
+//     The instance name will be in the !gla.uniform/input/output node, while the interface name will be in the
+//     !aggregate node pointed to.
+//
+// Forms of named metadata
+//
+//     !gla.inputs = !{ list of all pipeline !input }
+//     !gla.outputs = !{ list of all pipeline !output }
+//     !gla.uniforms = !{ list of all !uniform }
+//     !gla.invariant = !{ subset list of output that were declared as invariant }
+//     !gla.entrypoint = !{ list of all entry points }
+//     !gla.noStaticUse = !{ subset of input/output/uniform that were not statically referenced in the source shader }
+//     !gla.shared = !{ list of all workgroup-shared globals (the Value* of the global variables that are storage-qualified shared) }
+//
+//     !gla.inputPrimitive     = !{ !M } where !M = !{ i32 EMdLayoutGeometry }
+//     !gla.outputPrimitive    = !{ !M } where !M = !{ i32 EMdLayoutGeometry }
+//     !gla.xfbMode            = !{ !M } where !M = !{ i32 bool }
+//     !gla.numVertices        = !{ !M } where !M = !{ i32 int }
+//     !gla.vertexSpacing      = !{ !M } where !M = !{ i32 EMdVertexSpacing }
+//     !gla.vertexOrder        = !{ !M } where !M = !{ i32 EMdVertexOrder }
+//     !gla.pointMode          = !{ !M } where !M = !{ i32 bool }
+//     !gla.invocations        = !{ !M } where !M = !{ i32 int }
+//     !gla.pixelCenterInteger = !{ !M } where !M = !{ i32 bool }
+//     !gla.originUpperLeft    = !{ !M } where !M = !{ i32 bool }
+//     !gla.blendEquations     = !{ !M } where !M = !{ i32 mask of bits shifted by EMdBlendEquationShift amounts }
+//     !gla.localSize          = !{ !M } where !M = !{ i32 x-size, i32 y-size, i32 z-size }
+//
+
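+// For illustration only, a hand-written sketch (not necessarily byte-for-byte
+// what any particular front end emits) of a scalar default uniform "alpha" in
+// Dual-Walk form, using the enum values defined below:
+//
+//     !gla.uniforms = !{!0}
+//     !0 = metadata !{metadata !"alpha", i32 14, float* @alpha, metadata !1}  ; i32 14 is EMioDefaultUniform
+//     !1 = metadata !{i32 0, i32 3, i32 0, null}                              ; EMtlNone, EMpHigh, location 0, no sampler
+//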
+// Operand names
+const char* const InputMdName     = "gla.input";
+const char* const OutputMdName    = "gla.output";
+const char* const UniformMdName   = "gla.uniform";
+const char* const PrecisionMdName = "gla.precision";
+
+// Named nodes
+const char* const InputListMdName          = "gla.inputs";
+const char* const OutputListMdName         = "gla.outputs";
+const char* const UniformListMdName        = "gla.uniforms";
+const char* const InvariantListMdName      = "gla.invariant";
+const char* const EntrypointListMdName     = "gla.entrypoint";
+const char* const NoStaticUseMdName        = "gla.noStaticUse";
+const char* const WorkgroupSharedMdName    = "gla.shared";              // storage qualifier 'shared'
+
+const char* const InputPrimitiveMdName     = "gla.inputPrimitive";
+const char* const OutputPrimitiveMdName    = "gla.outputPrimitive";
+const char* const XfbModeMdName            = "gla.xfbMode";
+const char* const NumVerticesMdName        = "gla.numVertices";
+const char* const VertexSpacingMdName      = "gla.vertexSpacing";
+const char* const VertexOrderMdName        = "gla.vertexOrder";
+const char* const PointModeMdName          = "gla.pointMode";
+const char* const InvocationsMdName        = "gla.invocations";
+const char* const PixelCenterIntegerMdName = "gla.pixelCenterInteger";
+const char* const OriginUpperLeftMdName    = "gla.originUpperLeft";
+const char* const BlendEquationMdName      = "gla.blendEquations";
+const char* const LocalSizeMdName          = "gla.localSize";
+
+// what kind of I/O:
+enum EMdInputOutput {
+    EMioNone,               // for something that is not I/O, or already had its EMdInputOutput in md pointing here
+
+    // inputs
+    EMioPipeIn,             // normal user-streamed input data: attributes, varyings, etc.
+    EMioVertexId,
+    EMioInstanceId,
+    EMioVertexIndex,
+    EMioInstanceIndex,
+    EMioFragmentFace,
+    EMioFragmentCoord,
+    EMioPointCoord,
+
+    // outputs
+    EMioPipeOut,            // normal user-streamed output data, including fragment color
+    EMioVertexPosition,
+    EMioPointSize,
+    EMioClipVertex,
+    EMioFragmentDepth,
+
+    // uniforms
+    EMioDefaultUniform,      // a uniform variable not in a block
+    EMioUniformBlockMember,  // uniform buffer (uniform block)
+    EMioBufferBlockMember,   // shader storage buffer object (buffer block), with no run-time sized array, see also EMioBufferBlockMemberArrayed
+
+    // Entry point into shader
+    EMioEntrypoint,
+
+    // in & out blocks
+    EMioPipeOutBlock,         // output block
+    EMioPipeInBlock,          // input block
+
+    // uniforms
+    EMioBufferBlockMemberArrayed, // EMioBufferBlockMember but with run-time sized array as the last member
+
+    EMioCount,
+};
+
+// How the *interior* of a single, non-aggregate entity is laid out, supplemental to the "Value* for type"
+// Also, how a block or structure is laid out, if applied to a block or structure
+enum EMdTypeLayout {
+    EMtlNone,
+
+    // single-entity layouts
+    EMtlUnsigned,           // unsigned integer type, other type information comes from the LLVM value->type
+    EMtlRowMajorMatrix,
+    EMtlColMajorMatrix,
+    EMtlAggregate,
+    EMtlSampler,
+
+    // aggregate layouts
+    EMtlShared,             // layout-qualifier identifier 'shared'
+    EMtlStd140,
+    EMtlStd430,
+    EMtlPacked,
+
+    // Atomic counter
+    EMtlAtomicUint,
+
+    EMtlCount,
+};
+
+// What kind of sampler
+enum EMdSampler {
+    EMsTexture,
+
+    // Image with no format
+    EMsImage,
+
+    // Floating-point format image
+    EMsRgba32f,
+    EMsRgba16f,
+    EMsR32f,
+    EMsRgba8,
+    EMsRgba8Snorm,
+    EMsRg32f,
+    EMsRg16f,
+    EMsR11fG11fB10f,
+    EMsR16f,
+    EMsRgba16,
+    EMsRgb10A2,
+    EMsRg16,
+    EMsRg8,
+    EMsR16,
+    EMsR8,
+    EMsRgba16Snorm,
+    EMsRg16Snorm,
+    EMsRg8Snorm,
+    EMsR16Snorm,
+    EMsR8Snorm,
+
+    // signed-int format image
+    EMsRgba32i,
+    EMsRgba16i,
+    EMsRgba8i,
+    EMsR32i,
+    EMsRg32i,
+    EMsRg16i,
+    EMsRg8i,
+    EMsR16i,
+    EMsR8i,
+
+    // unsigned-int format image
+    EMsRgba32ui,
+    EMsRgba16ui,
+    EMsRgba8ui,
+    EMsR32ui,
+    EMsRg32ui,
+    EMsRg16ui,
+    EMsRg8ui,
+    EMsR16ui,
+    EMsR8ui,
+
+    EMsCount,
+};
+
+// Dimensionality of sampler
+enum EMdSamplerDim {
+    EMsd1D,
+    EMsd2D,
+    EMsd3D,
+    EMsdCube,
+    EMsdRect,
+    EMsdBuffer,
+    EMsd2DMS,
+    EMsdCount,
+};
+
+// Return type of sampler
+enum EMdSamplerBaseType {
+    EMsbFloat,
+    EMsbInt,
+    EMsbUint,
+    EMsbCount,
+};
+
+// ESSL precision qualifier
+enum EMdPrecision {
+    EMpNone,
+    EMpLow,
+    EMpMedium,
+    EMpHigh,
+    EMpCount,
+};
+
+// For input/output primitives
+enum EMdLayoutGeometry {
+    EMlgNone,
+    EMlgPoints,
+    EMlgLines,
+    EMlgLinesAdjacency,
+    EMlgLineStrip,
+    EMlgTriangles,
+    EMlgTrianglesAdjacency,
+    EMlgTriangleStrip,
+    EMlgQuads,
+    EMlgIsolines,
+    EMlgCount,
+};
+
+enum EMdVertexSpacing {
+    EMvsNone,
+    EMvsEqual,
+    EMvsFractionalEven,
+    EMvsFractionalOdd,
+    EMvsCount,
+};
+
+enum EMdVertexOrder {
+    EMvoNone,
+    EMvoCw,
+    EMvoCcw,
+    EMvoCount,
+};
+
+enum EMdBuiltIn {
+    EmbNone,
+    EmbNumWorkGroups,
+    EmbWorkGroupSize,
+    EmbWorkGroupId,
+    EmbLocalInvocationId,
+    EmbGlobalInvocationId,
+    EmbLocalInvocationIndex,
+    EmbVertexId,
+    EmbInstanceId,
+    EmbPosition,
+    EmbPointSize,
+    EmbClipVertex,
+    EmbClipDistance,
+    EmbCullDistance,
+    EmbNormal,
+    EmbVertex,
+    EmbMultiTexCoord0,
+    EmbMultiTexCoord1,
+    EmbMultiTexCoord2,
+    EmbMultiTexCoord3,
+    EmbMultiTexCoord4,
+    EmbMultiTexCoord5,
+    EmbMultiTexCoord6,
+    EmbMultiTexCoord7,
+    EmbFrontColor,
+    EmbBackColor,
+    EmbFrontSecondaryColor,
+    EmbBackSecondaryColor,
+    EmbTexCoord,
+    EmbFogFragCoord,
+    EmbInvocationId,
+    EmbPrimitiveId,
+    EmbLayer,
+    EmbViewportIndex,
+    EmbPatchVertices,
+    EmbTessLevelOuter,
+    EmbTessLevelInner,
+    EmbTessCoord,
+    EmbColor,
+    EmbSecondaryColor,
+    EmbFace,
+    EmbFragCoord,
+    EmbPointCoord,
+    EmbFragColor,
+    EmbFragData,
+    EmbFragDepth,
+    EmbSampleId,
+    EmbSamplePosition,
+    EmbSampleMask,
+    EmbHelperInvocation,
+    EmbBoundingBox,
+    EmbVertexIndex,
+    EmbInstanceIndex,
+    EmbCount
+};
+
+// These are bit-shift amounts for various additional qualifiers.
+enum EMdQualifierShift {
+    EmqNonreadable,
+    EmqNonwritable,
+    EmqVolatile,
+    EmqRestrict,
+    EmqCoherent,
+
+    EmqCount
+};
+
+enum EMdBlendEquationShift {
+    // No 'EMeNone':
+    // These are used as bit-shift amounts.  A mask of such shifts will have type 'int',
+    // and in that space, 0 means no bits set, or none.  In this enum, 0 means (1 << 0), a bit is set.
+    EmeMultiply,
+    EmeScreen,
+    EmeOverlay,
+    EmeDarken,
+    EmeLighten,
+    EmeColordodge,
+    EmeColorburn,
+    EmeHardlight,
+    EmeSoftlight,
+    EmeDifference,
+    EmeExclusion,
+    EmeHslHue,
+    EmeHslSaturation,
+    EmeHslColor,
+    EmeHslLuminosity,
+    EmeAllEquations,
+
+    EmeCount
+};
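+
+// For example, a !gla.blendEquations mask advertising support for just multiply
+// and screen would be (1 << EmeMultiply) | (1 << EmeScreen).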
+
+//
+// Crackers are for the consumer of the IR.
+// They take an MD node, or instruction that might point to one, and decode it, as per the enums above.
+//
+
+inline bool CrackTypeLayout(const llvm::MDNode* md, EMdTypeLayout& layout, EMdPrecision& precision, int& location, const llvm::MDNode*& sampler, int& interpMode, EMdBuiltIn& builtIn)
+{
+    const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(0));
+    if (! constInt)
+        return false;
+    layout = (EMdTypeLayout)constInt->getSExtValue();
+
+    constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(1));
+    if (! constInt)
+        return false;
+    precision = (EMdPrecision)constInt->getSExtValue();
+
+    constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(2));
+    if (! constInt)
+        return false;
+    location = (int)constInt->getSExtValue();
+
+    llvm::Value* speculativeSampler = md->getOperand(3);
+    if (speculativeSampler)
+        sampler = llvm::dyn_cast<llvm::MDNode>(speculativeSampler);
+    else
+        sampler = 0;
+
+    if (md->getNumOperands() >= 5) {
+        const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(4));
+        if (constInt)
+            interpMode = (int)constInt->getZExtValue();
+    } else
+        interpMode = 0;
+
+    if (md->getNumOperands() >= 6) {
+        const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(5));
+        if (constInt)
+            builtIn = (EMdBuiltIn)constInt->getZExtValue();
+    } else
+        builtIn = EmbNone;
+
+    return true;
+}
+
+inline bool CrackTypeLayout(const llvm::MDNode* md, EMdTypeLayout& layout, EMdPrecision& precision, int& location, const llvm::MDNode*& sampler, int& interpMode, EMdBuiltIn& builtIn,
+                            int& binding, unsigned int& qualifiers, int& offset)
+{
+    CrackTypeLayout(md, layout, precision, location, sampler, interpMode, builtIn);
+    if (md->getNumOperands() >= 7) {
+        const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(6));
+        if (constInt)
+            binding = (int)constInt->getSExtValue();
+    } else
+        binding = -1;
+
+    if (md->getNumOperands() >= 8) {
+        const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(7));
+        if (constInt)
+            qualifiers = (unsigned int)constInt->getZExtValue();
+    } else
+        qualifiers = 0;
+
+    if (md->getNumOperands() >= 9) {
+        const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(8));
+        if (constInt)
+            offset = (int)constInt->getSExtValue();
+    } else
+        offset = 0;
+
+    return true;
+}
+
+inline bool CrackIOMdType(const llvm::MDNode* md, llvm::Type*& type)
+{
+    if (! md->getOperand(2))
+        return false;
+    type = md->getOperand(2)->getType();
+
+    return true;
+}
+
+inline bool CrackIOMd(const llvm::MDNode* md, std::string& symbolName, EMdInputOutput& qualifier, llvm::Type*& type,
+                      EMdTypeLayout& layout, EMdPrecision& precision, int& location, const llvm::MDNode*& sampler, const llvm::MDNode*& aggregate,
+                      int& interpMode, EMdBuiltIn& builtIn)
+{
+    symbolName = md->getOperand(0)->getName();
+
+    const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(1));
+    if (! constInt)
+        return false;
+    qualifier = (EMdInputOutput)constInt->getSExtValue();
+
+    if (! md->getOperand(2))
+        return false;
+    type = md->getOperand(2)->getType();
+
+    llvm::MDNode* layoutMd = llvm::dyn_cast<llvm::MDNode>(md->getOperand(3));
+    if (! layoutMd)
+        return false;
+    if (! CrackTypeLayout(layoutMd, layout, precision, location, sampler, interpMode, builtIn))
+        return false;
+
+    if (md->getNumOperands() >= 5)
+        aggregate = llvm::dyn_cast<llvm::MDNode>(md->getOperand(4));
+    else
+        aggregate = 0;
+
+    return true;
+}
+
+inline bool CrackIOMd(const llvm::MDNode* md, std::string& symbolName, EMdInputOutput& qualifier, llvm::Type*& type,
+                      EMdTypeLayout& layout, EMdPrecision& precision, int& location, const llvm::MDNode*& sampler, const llvm::MDNode*& aggregate,
+                      int& interpMode, EMdBuiltIn& builtIn, int& binding, unsigned int& qualifiers, int& offset)
+{
+    symbolName = md->getOperand(0)->getName();
+
+    const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(1));
+    if (! constInt)
+        return false;
+    qualifier = (EMdInputOutput)constInt->getSExtValue();
+
+    if (! md->getOperand(2))
+        return false;
+    type = md->getOperand(2)->getType();
+
+    llvm::MDNode* layoutMd = llvm::dyn_cast<llvm::MDNode>(md->getOperand(3));
+    if (! layoutMd)
+        return false;
+    if (! CrackTypeLayout(layoutMd, layout, precision, location, sampler, interpMode, builtIn, binding, qualifiers, offset))
+        return false;
+
+    if (md->getNumOperands() >= 5)
+        aggregate = llvm::dyn_cast<llvm::MDNode>(md->getOperand(4));
+    else
+        aggregate = 0;
+
+    return true;
+}
+
+inline bool CrackAggregateMd(const llvm::MDNode* md, std::string& symbolName,
+                             EMdTypeLayout& layout, EMdPrecision& precision, int& location, const llvm::MDNode*& sampler, EMdBuiltIn& builtIn)
+{
+    symbolName = md->getOperand(0)->getName();
+    int dummyInterpMode;
+
+    llvm::MDNode* layoutMd = llvm::dyn_cast<llvm::MDNode>(md->getOperand(1));
+    if (! layoutMd)
+        return false;
+    if (! CrackTypeLayout(layoutMd, layout, precision, location, sampler, dummyInterpMode, builtIn))
+        return false;
+
+    return true;
+}
+
+inline bool CrackAggregateMd(const llvm::MDNode* md, std::string& symbolName,
+                             EMdTypeLayout& layout, EMdPrecision& precision, int& location, const llvm::MDNode*& sampler, EMdBuiltIn& builtIn,
+                             int& binding, unsigned int& qualifiers, int& offset)
+{
+    symbolName = md->getOperand(0)->getName();
+    int dummyInterpMode;
+
+    llvm::MDNode* layoutMd = llvm::dyn_cast<llvm::MDNode>(md->getOperand(1));
+    if (! layoutMd)
+        return false;
+    if (! CrackTypeLayout(layoutMd, layout, precision, location, sampler, dummyInterpMode, builtIn, binding, qualifiers, offset))
+        return false;
+
+    return true;
+}
+
+inline bool CrackIOMd(const llvm::Instruction* instruction, llvm::StringRef kind, std::string& symbolName, EMdInputOutput& qualifier, llvm::Type*& type,
+                      EMdTypeLayout& layout, EMdPrecision& precision, int& location, const llvm::MDNode*& sampler, const llvm::MDNode*& aggregate,
+                      int& interpMode, EMdBuiltIn& builtIn)
+{
+    const llvm::MDNode* md = instruction->getMetadata(kind);
+    if (! md)
+        return false;
+
+    return CrackIOMd(md, symbolName, qualifier, type, layout, precision, location, sampler, aggregate, interpMode, builtIn);
+}
+
+inline bool CrackIOMd(const llvm::Instruction* instruction, llvm::StringRef kind, std::string& symbolName, EMdInputOutput& qualifier, llvm::Type*& type,
+                      EMdTypeLayout& layout, EMdPrecision& precision, int& location, const llvm::MDNode*& sampler, const llvm::MDNode*& aggregate,
+                      int& interpMode, EMdBuiltIn& builtIn, int& binding, unsigned int& qualifiers, int& offset)
+{
+    const llvm::MDNode* md = instruction->getMetadata(kind);
+    if (! md)
+        return false;
+
+    return CrackIOMd(md, symbolName, qualifier, type, layout, precision, location, sampler, aggregate, interpMode, builtIn, binding, qualifiers, offset);
+}
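+
+// A minimal usage sketch of the instruction-based cracker above ('inst' is a
+// hypothetical instruction carrying !gla.uniform metadata):
+//
+//     std::string name;
+//     gla::EMdInputOutput qualifier;
+//     llvm::Type* type;
+//     gla::EMdTypeLayout layout;
+//     gla::EMdPrecision precision;
+//     int location, interpMode;
+//     const llvm::MDNode* sampler;
+//     const llvm::MDNode* aggregate;
+//     gla::EMdBuiltIn builtIn;
+//     if (gla::CrackIOMd(inst, gla::UniformMdName, name, qualifier, type, layout,
+//                        precision, location, sampler, aggregate, interpMode, builtIn)) {
+//         // decoded successfully; consume the uniform's description
+//     }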
+
+inline bool CrackSamplerMd(const llvm::MDNode* md, EMdSampler& sampler, llvm::Type*& type, EMdSamplerDim& dim, bool& isArray, bool& isShadow, EMdSamplerBaseType& baseType)
+{
+    const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(0));
+    if (! constInt)
+        return false;
+    sampler = (EMdSampler)constInt->getSExtValue();
+
+    if (! md->getOperand(1))
+        return false;
+    type = md->getOperand(1)->getType();
+
+    constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(2));
+    if (! constInt)
+        return false;
+    dim = (EMdSamplerDim)constInt->getSExtValue();
+
+    constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(3));
+    if (! constInt)
+        return false;
+    isArray = constInt->getSExtValue() != 0;
+
+    constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(4));
+    if (! constInt)
+        return false;
+    isShadow = constInt->getSExtValue() != 0;
+
+    constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(5));
+    if (! constInt)
+        return false;
+    baseType = (EMdSamplerBaseType)constInt->getSExtValue();
+
+    return true;
+}
+
+inline bool CrackPrecisionMd(const llvm::Instruction* instruction, EMdPrecision& precision)
+{
+    precision = EMpNone;
+
+    const llvm::MDNode* md = instruction->getMetadata(PrecisionMdName);
+    if (! md)
+        return false;
+
+    const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(0));
+    if (! constInt)
+        return false;
+    precision = (EMdPrecision)constInt->getSExtValue();
+
+    return true;
+}
+
+// translate from 0-based counting to aggregate operand number
+inline int GetAggregateMdNameOp(int index)
+{
+    return 2 + 2 * index;
+}
+
+inline int GetAggregateMdSubAggregateOp(int index)
+{
+    return 2 + 2 * index + 1;
+}
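+
+// For example, member 0 of an !aggregate occupies operands 2 (its name) and
+// 3 (its type), member 1 occupies operands 4 and 5, and so on.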
+
+// Just a shortcut for when only the type layout is cared about, rather than cracking all the operands.
+// It checks both forms: a dereferenced aggregate member and a top-level interface md.
+inline EMdTypeLayout GetMdTypeLayout(const llvm::MDNode *md)
+{
+    if (! md)
+        return EMtlNone;
+
+    // check for aggregate member form first
+    llvm::MDNode* layoutMd = llvm::dyn_cast<llvm::MDNode>(md->getOperand(1));
+    if (! layoutMd) {
+        // check for top-level interface
+        layoutMd = llvm::dyn_cast<llvm::MDNode>(md->getOperand(3));
+        if (! layoutMd)
+            return EMtlNone;
+    }
+
+    const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(layoutMd->getOperand(0));
+    if (! constInt)
+        return EMtlNone;
+
+    return (EMdTypeLayout)constInt->getSExtValue();
+}
+
+// A shortcut for just getting the type of a sampler (int, uint, float) from an
+// instruction's metadata
+inline EMdSamplerBaseType GetMdSamplerBaseType(const llvm::MDNode* md)
+{
+    if (! md || md->getNumOperands() < 4)
+        return EMsbFloat;
+
+    md = llvm::dyn_cast<llvm::MDNode>(md->getOperand(3));
+    if (! md || md->getNumOperands() < 4 || md->getOperand(3) == 0)
+        return EMsbFloat;
+
+    md = llvm::dyn_cast<llvm::MDNode>(md->getOperand(3));
+    if (! md || md->getNumOperands() < 6 || md->getOperand(5) == 0)
+        return EMsbFloat;
+
+    const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(5));
+    if (! constInt)
+        return EMsbFloat;
+
+    return (EMdSamplerBaseType)constInt->getSExtValue();
+}
+
+// Return the value of the integer metadata operand to the named metadata node.
+// Return 0 if the named node is missing, or if its metadata operand was 0 or false.
+inline int GetMdNamedInt(llvm::Module& module, const char* name)
+{
+    llvm::NamedMDNode* namedNode = module.getNamedMetadata(name);
+    if (namedNode == nullptr)
+        return 0;
+    const llvm::MDNode* md = namedNode->getOperand(0);
+    const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(0));
+    if (! constInt)
+        return 0;
+
+    return (int)constInt->getSExtValue();
+}
+
+// Build the set of values of the integer metadata operands to the named metadata node.
+// Return false if the named node is missing, or if any of its metadata operands is not an integer.
+inline bool GetMdNamedInts(llvm::Module& module, const char* name, std::vector<int>& ints)
+{
+    llvm::NamedMDNode* namedNode = module.getNamedMetadata(name);
+    if (namedNode == nullptr)
+        return false;
+    const llvm::MDNode* md = namedNode->getOperand(0);
+    if (md == nullptr)
+        return false;
+    for (unsigned int op = 0; op < md->getNumOperands(); ++op) {
+        const llvm::ConstantInt* constInt = llvm::dyn_cast<llvm::ConstantInt>(md->getOperand(op));
+        if (! constInt)
+            return false;
+        ints.push_back((int)constInt->getSExtValue());
+    }
+
+    return true;
+}
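+
+// For example, GetMdNamedInt(module, NumVerticesMdName) recovers the integer
+// stored by a matching makeMdNamedInt() call (see the Metadata class below),
+// and returns 0 when the shader never declared it.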
+
+//
+// The Metadata class is just an adapter to use while building the IR.
+//
+class Metadata {
+public:
+    Metadata(llvm::LLVMContext& c, llvm::Module* m) : context(c), module(m)
+    {
+        // cache the precision qualifier nodes, there are only 4 to reuse
+        for (int i = 0; i < EMpCount; ++i) {
+            llvm::Value* args[] = {
+                gla::MakeIntConstant(context, i),
+            };
+            precisionMd[i] = llvm::MDNode::get(context, args);
+        }
+    }
+
+    // "!gla.input/output/uniform ->" as per comment at top of file
+    llvm::MDNode* makeMdInputOutput(llvm::StringRef symbolName, llvm::StringRef sectionName, EMdInputOutput qualifier,
+                                    llvm::Value* typeProxy, EMdTypeLayout layout, EMdPrecision precision, int location,
+                                    llvm::MDNode* sampler = 0, llvm::MDNode* aggregate = 0, int interpMode = -1, EMdBuiltIn builtIn = EmbNone,
+                                    int binding = -1, int qualifiers = 0, int offset = -1)
+    {
+        llvm::MDNode* layoutMd = makeMdTypeLayout(layout, precision, location, sampler, interpMode, builtIn, binding, qualifiers, offset);
+
+        llvm::MDNode* md;
+        if (aggregate) {
+            llvm::Value* args[] = {
+                llvm::MDString::get(context, symbolName),
+                gla::MakeIntConstant(context, qualifier),
+                typeProxy,
+                layoutMd,
+                aggregate
+            };
+            md = llvm::MDNode::get(context, args);
+        } else  {
+            llvm::Value* args[] = {
+                llvm::MDString::get(context, symbolName),
+                gla::MakeIntConstant(context, qualifier),
+                typeProxy,
+                layoutMd
+            };
+            md = llvm::MDNode::get(context, args);
+        }
+
+        llvm::NamedMDNode* namedNode = module->getOrInsertNamedMetadata(sectionName);
+        namedNode->addOperand(md);
+
+        return md;
+    }
+
+    // "!gla.input/output/uniform ->" as per comment at top of file, for Single-Walk form
+    llvm::MDNode* makeMdSingleTypeIo(llvm::StringRef symbolName, const char* typeName, EMdInputOutput qualifier,
+                                     llvm::Value* typeProxy, llvm::MDNode* layoutMd, llvm::ArrayRef<llvm::MDNode*> members)
+    {
+        llvm::MDNode* md;
+        llvm::SmallVector<llvm::Value*, 10> args;
+        args.push_back(llvm::MDString::get(context, symbolName));
+        args.push_back(gla::MakeIntConstant(context, qualifier));
+        args.push_back(typeProxy);
+        args.push_back(layoutMd);
+        args.push_back(llvm::MDString::get(context, typeName));
+        if (members.size() > 0) {
+            args.append(members.begin(), members.end());
+        }
+        md = llvm::MDNode::get(context, args);
+
+        return md;
+    }
+
+    // "!sampler ->" as per comment at top of file
+    llvm::MDNode* makeMdSampler(EMdSampler sampler, llvm::Value* typeProxy, EMdSamplerDim dim, bool isArray, bool isShadow, EMdSamplerBaseType baseType)
+    {
+        llvm::Value* args[] = {
+            gla::MakeIntConstant(context, sampler),
+            typeProxy,
+            gla::MakeIntConstant(context, dim),
+            gla::MakeBoolConstant(context, isArray),
+            gla::MakeBoolConstant(context, isShadow),
+            gla::MakeIntConstant(context, baseType),
+        };
+        llvm::MDNode* md = llvm::MDNode::get(context, args);
+
+        return md;
+    }
+
+    llvm::MDNode* makeMdPrecision(EMdPrecision precision)
+    {
+        // just use our precision cache
+
+        return precisionMd[precision];
+    }
+
+    llvm::MDNode* makeMdTypeLayout(EMdTypeLayout layout, EMdPrecision precision, int location, llvm::MDNode* sampler, int interpMode = -1, EMdBuiltIn builtIn = EmbNone,
+                                   int binding = -1, unsigned int qualifiers = 0, int offset = -1)
+    {
+        llvm::Value* args[] = {
+            gla::MakeIntConstant(context, layout),
+            gla::MakeIntConstant(context, precision),
+            gla::MakeIntConstant(context, location),
+            sampler,
+            gla::MakeIntConstant(context, interpMode),
+            gla::MakeIntConstant(context, builtIn),
+            gla::MakeIntConstant(context, binding),
+            gla::MakeUnsignedConstant(context, qualifiers),
+            gla::MakeIntConstant(context, offset),
+        };
+
+        return llvm::MDNode::get(context, args);
+    }
+
+    void addMdEntrypoint(const char* name)
+    {
+        llvm::MDNode* md;
+        llvm::Value* args[] = {
+            llvm::MDString::get(context, name),
+            gla::MakeIntConstant(context, EMioEntrypoint),
+        };
+        md = llvm::MDNode::get(context, args);
+
+        llvm::NamedMDNode* namedNode = module->getOrInsertNamedMetadata(EntrypointListMdName);
+        namedNode->addOperand(md);
+    }
+
+    void addNoStaticUse(llvm::MDNode* md)
+    {
+        llvm::NamedMDNode* namedNode = module->getOrInsertNamedMetadata(NoStaticUseMdName);
+        namedNode->addOperand(md);
+    }
+
+    void addShared(llvm::Value* shared)
+    {
+        llvm::Value* args[] = {
+            shared
+        };
+        llvm::MDNode* md = llvm::MDNode::get(context, args);
+        llvm::NamedMDNode* namedNode = module->getOrInsertNamedMetadata(WorkgroupSharedMdName);
+        namedNode->addOperand(md);
+    }
+
+    // Add a named metadata node of the form:
+    //     !gla.name        = !{ int }
+    // Where 'int' could also be considered a bool or enum.  See comment at top for long list
+    // of examples where this can be used.
+    void makeMdNamedInt(const char* name, int i)
+    {
+        llvm::NamedMDNode* namedNode = module->getOrInsertNamedMetadata(name);
+        llvm::Value* layoutArgs[] = { gla::MakeIntConstant(context, i) };
+        namedNode->addOperand(llvm::MDNode::get(context, layoutArgs));
+    }
+
+    void makeMdNamedInt(const char* name, int i1, int i2, int i3)
+    {
+        llvm::NamedMDNode* namedNode = module->getOrInsertNamedMetadata(name);
+        llvm::Value* layoutArgs[] = { gla::MakeIntConstant(context, i1),
+                                      gla::MakeIntConstant(context, i2),
+                                      gla::MakeIntConstant(context, i3), };
+        namedNode->addOperand(llvm::MDNode::get(context, layoutArgs));
+    }
+
+protected:
+    llvm::LLVMContext& context;
+    llvm::Module* module;
+    llvm::MDNode* precisionMd[EMpCount];
+};
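+
+// A minimal usage sketch of the adapter above ('context', 'module', and
+// 'typeProxy' are assumptions standing in for the caller's real IR objects):
+//
+//     gla::Metadata md(context, module);
+//
+//     // declare a pipeline output at user location 5
+//     md.makeMdInputOutput("color", gla::OutputListMdName, gla::EMioPipeOut,
+//                          typeProxy, gla::EMtlNone, gla::EMpNone, 5);
+//
+//     // record a geometry-shader output primitive
+//     md.makeMdNamedInt(gla::OutputPrimitiveMdName, gla::EMlgTriangleStrip);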
+
+};
+
+#endif // metadata_H
diff --git a/LunarGLASS/Frontends/SPIRV/SpvToTop.cpp b/LunarGLASS/Frontends/SPIRV/SpvToTop.cpp
new file mode 100644
index 0000000..d519e8d
--- /dev/null
+++ b/LunarGLASS/Frontends/SPIRV/SpvToTop.cpp
@@ -0,0 +1,2913 @@
+//
+//Copyright (C) 2014 LunarG, Inc.
+//
+//All rights reserved.
+//
+//Redistribution and use in source and binary forms, with or without
+//modification, are permitted provided that the following conditions
+//are met:
+//
+//    Redistributions of source code must retain the above copyright
+//    notice, this list of conditions and the following disclaimer.
+//
+//    Redistributions in binary form must reproduce the above
+//    copyright notice, this list of conditions and the following
+//    disclaimer in the documentation and/or other materials provided
+//    with the distribution.
+//
+//    Neither the name of 3Dlabs Inc. Ltd. nor the names of its
+//    contributors may be used to endorse or promote products derived
+//    from this software without specific prior written permission.
+//
+//THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+//"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+//LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+//FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+//COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+//INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+//BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+//LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+//CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+//LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+//ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+//POSSIBILITY OF SUCH DAMAGE.
+
+//
+// Author: John Kessenich, LunarG
+//
+
+//
+// Translate SPIR-V to Top IR.
+//
+
+#define _CRT_SECURE_NO_WARNINGS
+#ifdef _WIN32
+#define snprintf _snprintf
+#endif
+
+// LLVM includes
+#pragma warning(push, 1)
+#include "llvm/Support/CFG.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/IR/DerivedTypes.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Transforms/Scalar.h"
+#include "llvm/ADT/SmallVector.h"
+#pragma warning(pop)
+
+#include <cstdio>
+#include <iostream>
+#include <string>
+#include <map>
+#include <vector>
+#include <iomanip>
+#include <stack>
+#include <list>
+
+// Glslang includes
+#include "SPIRV/spirv.hpp"
+namespace spv {
+    #include "SPIRV/GLSL.std.450.h"
+}
+
+// LunarGLASS includes
+#include "LunarGLASSTopIR.h"
+#include "LunarGLASSManager.h"
+#include "Exceptions.h"
+#include "TopBuilder.h"
+#include "metadata.h"
+#include "Util.h"
+
+// Adapter includes
+#include "SpvToTop.h"
+
+// Glslang includes, just because the GLSL backend reuses some enums for stage and version
+#include "glslang/Public/ShaderLang.h"
+#include "glslang/MachineIndependent/Versions.h"
+
+namespace {
+
+const unsigned int BadValue = 0xFFFFFFFF;
+
+spv::Op GetOpCode(unsigned int word)
+{
+    return (spv::Op)(word & spv::OpCodeMask);
+}
+
+int GetWordCount(unsigned int word)
+{
+    return (int)(word >> spv::WordCountShift);
+}
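+
+// For example, the first word of a 3-word OpTypeFloat instruction is
+// (3 << spv::WordCountShift) | spv::OpTypeFloat, so GetWordCount() yields 3
+// and GetOpCode() yields spv::OpTypeFloat.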
+
+gla::EMdBuiltIn GetMdBuiltIn(spv::BuiltIn builtIn)
+{
+    switch (builtIn) {
+    case spv::BuiltInNumWorkgroups:        return gla::EmbNumWorkGroups;
+    case spv::BuiltInWorkgroupSize:        return gla::EmbWorkGroupSize;
+    case spv::BuiltInWorkgroupId:          return gla::EmbWorkGroupId;
+    case spv::BuiltInLocalInvocationId:    return gla::EmbLocalInvocationId;
+    case spv::BuiltInGlobalInvocationId:   return gla::EmbGlobalInvocationId;
+    case spv::BuiltInLocalInvocationIndex: return gla::EmbLocalInvocationIndex;
+    case spv::BuiltInVertexId:             return gla::EmbVertexId;
+    case spv::BuiltInVertexIndex:          return gla::EmbVertexIndex;
+    case spv::BuiltInInstanceId:           return gla::EmbInstanceId;
+    case spv::BuiltInInstanceIndex:        return gla::EmbInstanceIndex;
+    case spv::BuiltInPosition:             return gla::EmbPosition;
+    case spv::BuiltInPointSize:            return gla::EmbPointSize;
+    case spv::BuiltInClipDistance:         return gla::EmbClipDistance;
+    case spv::BuiltInCullDistance:         return gla::EmbCullDistance;
+    case spv::BuiltInInvocationId:         return gla::EmbInvocationId;
+    case spv::BuiltInPrimitiveId:          return gla::EmbPrimitiveId;
+    case spv::BuiltInLayer:                return gla::EmbLayer;
+    case spv::BuiltInViewportIndex:        return gla::EmbViewportIndex;
+    case spv::BuiltInPatchVertices:        return gla::EmbPatchVertices;
+    case spv::BuiltInTessLevelOuter:       return gla::EmbTessLevelOuter;
+    case spv::BuiltInTessLevelInner:       return gla::EmbTessLevelInner;
+    case spv::BuiltInTessCoord:            return gla::EmbTessCoord;
+    case spv::BuiltInFrontFacing:          return gla::EmbFace;
+    case spv::BuiltInFragCoord:            return gla::EmbFragCoord;
+    case spv::BuiltInPointCoord:           return gla::EmbPointCoord;
+    case spv::BuiltInFragDepth:            return gla::EmbFragDepth;
+    case spv::BuiltInSampleId:             return gla::EmbSampleId;
+    case spv::BuiltInSamplePosition:       return gla::EmbSamplePosition;
+    case spv::BuiltInSampleMask:           return gla::EmbSampleMask;
+    case spv::BuiltInHelperInvocation:     return gla::EmbHelperInvocation;
+    default:
+        gla::UnsupportedFunctionality("built in variable", gla::EATContinue);
+        return gla::EmbNone;
+    }
+}
+
+const char* NonNullName(const char* name)
+{
+    return name ? name : "";
+}
+
+//
+// Translator instance for translating a SPIR-V stream to LunarGLASS's LLVM-based Top IR.
+//
+class SpvToTopTranslator {
+public:
+    SpvToTopTranslator(const std::vector<unsigned int>& spirv, gla::Manager& manager);
+    virtual ~SpvToTopTranslator();
+
+    void makeTop();
+
+protected:
+    // a bag to hold type information that's lost going to LLVM (without metadata)
+    struct MetaType {
+        MetaType() : layout(gla::EMtlNone), combinedImageSampler(false), precision(gla::EMpNone), builtIn(gla::EmbNone), set(-1), matrixStride(0), arrayStride(0),
+                     binding(-1), location(gla::MaxUserLayoutLocation),
+                     interpolationMethod((spv::Decoration)BadValue), interpolateTo((spv::Decoration)BadValue),
+                     invariant(false), name(0) { }
+
+        // set of things needed for !typeLayout metadata node
+        gla::EMdTypeLayout layout;              // includes matrix information
+        bool combinedImageSampler;
+        gla::EMdPrecision precision;
+        gla::EMdBuiltIn builtIn;
+        short set;
+        short matrixStride;
+        int arrayStride;
+        int binding;
+        int location;
+        spv::Decoration interpolationMethod;
+        spv::Decoration interpolateTo;
+
+        // metadata indicated in some other way
+        bool invariant;
+        const char* name;
+        static const int bufSize = 12;
+        char buf[bufSize];
+    };
+    void bumpMemberMetaData(std::vector<MetaType>*& memberMetaData, int member);
+
+    // SPIR-V instruction processors
+    void setEntryPoint(spv::ExecutionModel, spv::Id entryId);
+    void setExecutionMode(spv::Id entryPoint, spv::ExecutionMode);
+    void addDecoration(spv::Id, spv::Decoration);
+    void addMemberDecoration(spv::Id, unsigned int member, spv::Decoration);
+    void addMetaTypeDecoration(spv::Decoration decoration, MetaType& metaType);
+    void addType(spv::Op, spv::Id, int numOperands);
+    void addVariable(spv::Id resultId, spv::Id typeId, spv::StorageClass);
+    gla::Builder::EStorageQualifier mapStorageClass(spv::StorageClass, bool isBuffer);
+    void addConstant(spv::Op, spv::Id resultId, spv::Id typeId, int numOperands);
+    void addConstantAggregate(spv::Id resultId, spv::Id typeId, int numOperands);
+    int assignSlot(spv::Id resultId, int& numSlots);
+    void decodeResult(bool type, spv::Id& typeId, spv::Id& resultId);
+    const char* findAName(spv::Id choice1, spv::Id choice2 = 0);
+    void translateInstruction(spv::Op opCode, int numOperands);
+    llvm::Value* createUnaryOperation(spv::Op, gla::EMdPrecision, llvm::Type* resultType, llvm::Value* operand, bool hasSign, bool reduceComparison);
+    llvm::Value* createBinaryOperation(spv::Op, gla::EMdPrecision, llvm::Value* left, llvm::Value* right, bool hasSign, bool reduceComparison, const char* name = 0);
+    gla::ESamplerType convertImageType(spv::Id imageTypeId);
+    llvm::Value* createSamplingCall(spv::Op, spv::Id type, spv::Id result, int numOperands);
+    llvm::Value* createTextureQueryCall(spv::Op, spv::Id type, spv::Id result, int numOperands);
+    llvm::Value* createConstructor(spv::Id resultId, spv::Id typeId, std::vector<llvm::Value*> constituents);
+    void handleOpFunction(spv::Id& typeId, spv::Id& resultId);
+    llvm::Function* makeFunction(spv::Id functionId, spv::Id returnTypeId, std::vector<spv::Id>& argTypeIds);
+    spv::Id getNextLabelId();
+    llvm::Value* createFunctionCall(gla::EMdPrecision precision, spv::Id typeId, spv::Id resultId, int numOperands);
+    llvm::Value* createExternalInstruction(gla::EMdPrecision precision, spv::Id typeId, spv::Id resultId, int numOperands, const char* name);
+    spv::Op getOpCode(spv::Id) const;
+    spv::Id dereferenceTypeId(spv::Id typeId) const;
+    spv::Id getArrayElementTypeId(spv::Id typeId) const;
+    spv::Id getStructMemberTypeId(spv::Id typeId, int member) const;
+    spv::Id getImageTypeId(spv::Id sampledImageTypeId) const;
+    spv::Id getImageSampledType(spv::Id typeId) const;
+    spv::Dim getImageDim(spv::Id typeId) const;
+    bool isImageArrayed(spv::Id typeId) const;
+    bool isImageDepth(spv::Id typeId) const;
+    bool isImageMS(spv::Id typeId) const;
+
+    bool inEntryPoint();
+    bool isMatrix(spv::Id typeId) { return commonMap[typeId].metaType.layout == gla::EMtlColMajorMatrix || 
+                                           commonMap[typeId].metaType.layout == gla::EMtlRowMajorMatrix; }
+    void makeLabelBlock(spv::Id labelId);
+    void createSwitch(int numOperands);
+
+    // metadata parameter translators
+    gla::EMdInputOutput     getMdQualifier      (spv::Id resultId) const;
+    gla::EMdSampler         getMdSampler        (spv::Id typeId)   const;
+    gla::EMdSamplerDim      getMdSamplerDim     (spv::Id typeId)   const;
+    gla::EMdSamplerBaseType getMdSamplerBaseType(spv::Id typeId)   const;
+    void getInterpolationLocationMethod(spv::Id id, gla::EInterpolationMethod& method, gla::EInterpolationLocation& location);
+
+    // metadata creators
+
+    // bias the set by 1, so that we can simultaneously
+    //  - tell the difference between "set=0" and nothing having been said, and
+    //  - not be using the upper bits at all for all the common cases where there is no set
+    int packSetBinding(MetaType& metaType) { return ((metaType.set + 1) << 16) | metaType.binding; }
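+    // e.g., set 0 with binding 3 packs to 0x00010003, while "no set" (set == -1)
+    // with binding 3 packs to plain 0x00000003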
+
+    llvm::Value* makePermanentTypeProxy(llvm::Value*);
+    llvm::MDNode* declareUniformMetadata(spv::Id resultId);
+    llvm::MDNode* declareMdDefaultUniform(spv::Id resultId);
+    llvm::MDNode* makeMdSampler(spv::Id typeId, llvm::Value*);
+    llvm::MDNode* declareMdUniformBlock(gla::EMdInputOutput, spv::Id resultId);
+    llvm::MDNode* declareMdType(spv::Id typeId, MetaType&);
+    llvm::MDNode* makeInputOutputMetadata(spv::Id resultId, int slot, const char* kind);
+    void makeOutputMetadata(spv::Id resultId, int slot, int numSlots);
+    llvm::MDNode* makeInputMetadata(spv::Id resultId, int slot);
+
+    // class data
+    const std::vector<unsigned int>& spirv; // the SPIR-V stream of words
+    int word;                               // next word to read from spirv
+    int nextInst;                           // beginning of the next instruction
+    gla::Manager& manager;                  // LunarGLASS manager
+    llvm::LLVMContext &context;
+    llvm::BasicBlock* shaderEntry;
+    llvm::IRBuilder<> llvmBuilder;
+    llvm::Module* module;
+    gla::Metadata metadata;
+    gla::Builder* glaBuilder;
+    int version;
+    int generator;
+    spv::ExecutionModel currentModel;        // ...this is the 'one' currently being operated on
+    spv::Id currentFunction;                 // function we are currently inside of
+    llvm::Function::arg_iterator currentArg;   // the current argument for processing the function declaration
+    int nextSlot;
+
+    // map each <id> to the set of things commonly needed
+    // TODO: optimize space down to what really is commonly needed
+    unsigned int numIds;
+    struct CommonAnnotations {
+        CommonAnnotations() : instructionIndex(0), typeId(0), value(0),
+                              isBlock(false), isBuffer(false), storageClass((spv::StorageClass)BadValue),
+                              entryPoint((spv::ExecutionModel)BadValue), memberMetaData(0) { }
+        int instructionIndex;                   // the location in the spirv of this instruction
+        union {
+            spv::Id typeId;                     // typeId is valid if indexed with a resultId
+            llvm::Type* type;                   // type is valid if indexed with a typeId
+        };
+        union {
+            llvm::Value* value;                 // for things using a value
+            llvm::Function* function;           // for function id
+            llvm::BasicBlock* block;            // for label id
+        };
+        MetaType metaType;
+        bool isBlock;
+        bool isBuffer;        // SSBO
+        spv::StorageClass storageClass;
+        spv::ExecutionModel entryPoint;
+        std::vector<MetaType> *memberMetaData;
+    };
+    std::vector<CommonAnnotations> commonMap;
+};
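+
+// A minimal usage sketch ('spirvWords' and 'manager' are assumptions standing
+// in for the caller's SPIR-V word stream and LunarGLASS manager):
+//
+//     SpvToTopTranslator translator(spirvWords, manager);
+//     translator.makeTop();   // populates the manager's module with Top IR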
+
+SpvToTopTranslator::SpvToTopTranslator(const std::vector<unsigned int>& spirv, gla::Manager& manager)
+    : spirv(spirv), word(0),
+      manager(manager), context(manager.getModule()->getContext()),
+      shaderEntry(0), llvmBuilder(context),
+      module(manager.getModule()), metadata(context, module),
+      version(0), generator(0), currentModel((spv::ExecutionModel)BadValue), currentFunction(0),
+      nextSlot(gla::MaxUserLayoutLocation)
+{
+    glaBuilder = new gla::Builder(llvmBuilder, &manager, metadata);
+    glaBuilder->setNoPredecessorBlocks(false);
+    glaBuilder->clearAccessChain();
+    glaBuilder->setAccessChainDirectionRightToLeft(false);
+}
+
+SpvToTopTranslator::~SpvToTopTranslator()
+{
+}
+
+// Make sure the memberMetaData is big enough to hold 0-based 'member'
+void SpvToTopTranslator::bumpMemberMetaData(std::vector<MetaType>*& memberMetaData, int member)
+{
+    // Specification issue: it would be much better to know up front the number of members to decorate
+    if (memberMetaData == 0)
+        memberMetaData = new std::vector<MetaType>;
+    if ((int)memberMetaData->size() < member + 1)
+        memberMetaData->resize(member + 1);
+}
+
+//
+// The top-level algorithm for translating SPIR-V to the LLVM-based Top IR.
+//
+// Goes sequentially through the entire SPIR-V module, one instruction at a time,
+// handing out the work to helper methods.
+//
+// The result <id> and type <id> of each instruction are decoded here, as well
+// as the number of operands.  But the operands themselves are decoded by the
+// called methods.
+//
+void SpvToTopTranslator::makeTop()
+{
+    int size = (int)spirv.size();
+
+    // Sanity check size
+    if (size < 5)
+        gla::UnsupportedFunctionality("SPIR-V is too short");
+
+    // Magic number
+    if (spirv[word++] != spv::MagicNumber)
+        gla::UnsupportedFunctionality("Bad magic number for SPIR-V");
+
+    version = spirv[word++];
+    generator = spirv[word++];
+
+    numIds = spirv[word++];
+    commonMap.resize(numIds);
+
+    if (spirv[word++] != 0)
+        gla::UnsupportedFunctionality("Non-0 schema");
+
+    // Walk the instructions
+    while (word < size) {
+        // First word
+        unsigned int instructionStart = word;
+        unsigned int firstWord = spirv[word];
+
+        // OpCode
+        spv::Op opCode = GetOpCode(firstWord);
+
+        // Instruction size
+        int instrSize = (int)(firstWord >> 16);
+        nextInst = word + instrSize;
+
+        word++;
+        if (nextInst > size)
+            gla::UnsupportedFunctionality("SPIR-V instruction terminated too early");
+
+        // Hand off each instruction
+        translateInstruction(opCode, instrSize - 1);
+
+        // go to the next instruction, in a way stable with respect to incorrect operand parsing
+        word = nextInst;
+    }
+}
+
+// Set an entry point for a model.
+// Note:  currently only one entry point is supported.
+void SpvToTopTranslator::setEntryPoint(spv::ExecutionModel model, spv::Id entryId)
+{
+    commonMap[entryId].entryPoint = model;
+    if (currentModel != BadValue)
+        gla::UnsupportedFunctionality("Multiple execution models in the same SPIR-V module (setting entry point)");
+    currentModel = model;
+
+    // Lock ourselves into a single model for now...
+    switch (currentModel) {
+    case spv::ExecutionModelVertex:                  manager.setStage(EShLangVertex);         break;
+    case spv::ExecutionModelTessellationControl:     manager.setStage(EShLangTessControl);    break;
+    case spv::ExecutionModelTessellationEvaluation:  manager.setStage(EShLangTessEvaluation); break;
+    case spv::ExecutionModelGeometry:                manager.setStage(EShLangGeometry);       break;
+    case spv::ExecutionModelFragment:                manager.setStage(EShLangFragment);       break;
+    case spv::ExecutionModelGLCompute:               manager.setStage(EShLangCompute);        break;
+    default:
+        gla::UnsupportedFunctionality("SPIR-V execution model", model, gla::EATAbort);
+    }
+}
+
+// Process an OpExecutionMode instruction.
+// On entry, 'word' is the index of the first operand.
+void SpvToTopTranslator::setExecutionMode(spv::Id entryPoint, spv::ExecutionMode mode)
+{
+    spv::ExecutionModel model = commonMap[entryPoint].entryPoint;
+    assert(model != BadValue);
+
+    if (currentModel != model)
+        gla::UnsupportedFunctionality("Multiple execution models in the same SPIR-V module (setting input mode)", gla::EATContinue);
+
+    switch (mode) {
+    case spv::ExecutionModeInvocations:
+        metadata.makeMdNamedInt(gla::InvocationsMdName, spirv[word++]);
+        break;
+    case spv::ExecutionModeSpacingEqual:
+        metadata.makeMdNamedInt(gla::VertexSpacingMdName, gla::EMvsEqual);
+        break;
+    case spv::ExecutionModeSpacingFractionalEven:
+        metadata.makeMdNamedInt(gla::VertexSpacingMdName, gla::EMvsFractionalEven);
+        break;
+    case spv::ExecutionModeSpacingFractionalOdd:
+        metadata.makeMdNamedInt(gla::VertexSpacingMdName, gla::EMvsFractionalOdd);
+        break;
+    case spv::ExecutionModeVertexOrderCw:
+        metadata.makeMdNamedInt(gla::VertexOrderMdName, gla::EMvoCw);
+        break;
+    case spv::ExecutionModeVertexOrderCcw:
+        metadata.makeMdNamedInt(gla::VertexOrderMdName, gla::EMvoCcw);
+        break;
+    case spv::ExecutionModePixelCenterInteger:
+        metadata.makeMdNamedInt(gla::PixelCenterIntegerMdName, 1);
+        break;
+    case spv::ExecutionModeOriginUpperLeft:
+        metadata.makeMdNamedInt(gla::OriginUpperLeftMdName, 1);
+        break;
+    case spv::ExecutionModeOriginLowerLeft:
+        break;
+    case spv::ExecutionModePointMode:
+        metadata.makeMdNamedInt(gla::PointModeMdName, 1);
+        break;
+    case spv::ExecutionModeInputPoints:
+        metadata.makeMdNamedInt(gla::InputPrimitiveMdName, gla::EMlgPoints);
+        break;
+    case spv::ExecutionModeInputLines:
+        metadata.makeMdNamedInt(gla::InputPrimitiveMdName, gla::EMlgLines);
+        break;
+    case spv::ExecutionModeInputLinesAdjacency:
+        metadata.makeMdNamedInt(gla::InputPrimitiveMdName, gla::EMlgLinesAdjacency);
+        break;
+    case spv::ExecutionModeTriangles:
+        metadata.makeMdNamedInt(gla::InputPrimitiveMdName, gla::EMlgTriangles);
+        break;
+    case spv::ExecutionModeInputTrianglesAdjacency:
+        metadata.makeMdNamedInt(gla::InputPrimitiveMdName, gla::EMlgTrianglesAdjacency);
+        break;
+    case spv::ExecutionModeQuads:
+        metadata.makeMdNamedInt(gla::InputPrimitiveMdName, gla::EMlgQuads);
+        break;
+    case spv::ExecutionModeIsolines:
+        metadata.makeMdNamedInt(gla::InputPrimitiveMdName, gla::EMlgIsolines);
+        break;
+    case spv::ExecutionModeXfb:
+        metadata.makeMdNamedInt(gla::XfbModeMdName, 1);
+        break;
+    case spv::ExecutionModeOutputVertices:
+        metadata.makeMdNamedInt(gla::NumVerticesMdName, spirv[word++]);
+        break;
+    case spv::ExecutionModeOutputPoints:
+        metadata.makeMdNamedInt(gla::OutputPrimitiveMdName, gla::EMlgPoints);
+        break;
+    case spv::ExecutionModeOutputLineStrip:
+        metadata.makeMdNamedInt(gla::OutputPrimitiveMdName, gla::EMlgLineStrip);
+        break;
+    case spv::ExecutionModeOutputTriangleStrip:
+        metadata.makeMdNamedInt(gla::OutputPrimitiveMdName, gla::EMlgTriangleStrip);
+        break;
+
+    case spv::ExecutionModeLocalSize:
+    case spv::ExecutionModeEarlyFragmentTests:
+    case spv::ExecutionModeDepthGreater:
+    case spv::ExecutionModeDepthLess:
+    case spv::ExecutionModeDepthUnchanged:
+    case spv::ExecutionModeDepthReplacing:
+
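+    // The modes above are recognized but not yet translated; report and continue.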
+    default:
+        gla::UnsupportedFunctionality("execution mode", gla::EATContinue);
+        break;
+    }
+}
+
+// Process an OpDecorate instruction.
+// It's been decoded up to but not including optional operands.
+void SpvToTopTranslator::addDecoration(spv::Id id, spv::Decoration decoration)
+{
+    switch (decoration) {
+    case spv::DecorationBlock:
+        commonMap[id].isBlock = true;
+        break;
+    case spv::DecorationBufferBlock:
+        commonMap[id].isBlock = true;
+        commonMap[id].isBuffer = true;
+        break;
+    default:
+        addMetaTypeDecoration(decoration, commonMap[id].metaType);
+        break;
+    }
+}
+
+// Process an OpMemberDecorate instruction.
+// It's been decoded up to but not including optional operands.
+void SpvToTopTranslator::addMemberDecoration(spv::Id structTypeId, unsigned int member, spv::Decoration decoration)
+{
+    bumpMemberMetaData(commonMap[structTypeId].memberMetaData, member);
+    addMetaTypeDecoration(decoration, (*commonMap[structTypeId].memberMetaData)[member]);
+}
+
+// Process a decoration that applies to a MetaType; shared by OpDecorate and
+// OpMemberDecorate handling. Optional operands are read from 'word'.
+void SpvToTopTranslator::addMetaTypeDecoration(spv::Decoration decoration, MetaType& metaType)
+{
+    unsigned int num;
+
+    switch (decoration) {
+    case spv::DecorationRelaxedPrecision:
+        metaType.precision = gla::EMpMedium;
+        break;
+
+    case spv::DecorationNoPerspective:
+    case spv::DecorationFlat:
+    case spv::DecorationPatch:
+        metaType.interpolationMethod = decoration;
+        break;
+
+    case spv::DecorationGLSLShared:
+        metaType.layout = gla::EMtlShared;
+        break;
+    case spv::DecorationGLSLPacked:
+        metaType.layout = gla::EMtlPacked;
+        break;
+    case spv::DecorationRowMajor:
+        metaType.layout = gla::EMtlRowMajorMatrix;
+        break;
+    case spv::DecorationColMajor:
+        metaType.layout = gla::EMtlColMajorMatrix;
+        break;
+    case spv::DecorationMatrixStride:
+        num = spirv[word++];
+        metaType.matrixStride = (short)num;
+        break;
+    case spv::DecorationArrayStride:
+        num = spirv[word++];
+        metaType.arrayStride = (short)num;
+        break;
+
+    case spv::DecorationInvariant:
+        metaType.invariant = true;
+        break;
+
+    case spv::DecorationDescriptorSet:
+        num = spirv[word++];
+        metaType.set = num;
+        break;
+    case spv::DecorationBinding:
+        num = spirv[word++];
+        metaType.binding = num;
+        break;
+    case spv::DecorationLocation:
+        num = spirv[word++];
+        metaType.location = num;
+        break;
+    case spv::DecorationBuiltIn:
+        num = spirv[word++];
+        metaType.builtIn = GetMdBuiltIn((spv::BuiltIn)num);
+        break;
+    case spv::DecorationOffset:
+    {
+        static bool once = false;
+        if (! once) {
+            gla::UnsupportedFunctionality("member offset", gla::EATContinue);
+            once = true;
+        }
+        break;
+    }
+
+    case spv::DecorationStream:
+    case spv::DecorationComponent:
+    case spv::DecorationIndex:
+    case spv::DecorationXfbBuffer:
+
+    case spv::DecorationCentroid:
+    case spv::DecorationSample:
+    case spv::DecorationUniform:
+    default:
+        gla::UnsupportedFunctionality("metaType decoration ", decoration, gla::EATContinue);
+        break;
+    }
+}
+
+// Process an OpType instruction.
+// On entry, 'word' is the index of the first operand.
+void SpvToTopTranslator::addType(spv::Op typeClass, spv::Id resultId, int numOperands)
+{
+    unsigned int width;
+    switch (typeClass) {
+
+    // void
+    case spv::OpTypeVoid:
+        commonMap[resultId].type = gla::GetVoidType(context);
+        break;
+
+    // bool
+    case spv::OpTypeBool:        
+        commonMap[resultId].type = gla::GetBoolType(context);
+        break;
+
+    // float
+    case spv::OpTypeFloat:
+        width = spirv[word++];
+        if (width != 32)
+            gla::UnsupportedFunctionality("non-32-bit width OpType");
+        commonMap[resultId].type = gla::GetFloatType(context);
+        break;
+
+    // int
+    case spv::OpTypeInt:
+        width = spirv[word++];
+        if (width == 32) {
+            if (spirv[word++] == 0u) {
+                commonMap[resultId].metaType.layout = gla::EMtlUnsigned;
+                commonMap[resultId].type = gla::GetUintType(context);
+            } else
+                commonMap[resultId].type = gla::GetIntType(context);                
+        } else
+            gla::UnsupportedFunctionality("non-32-bit int for OpType");
+
+        break;
+
+    // vector
+    case spv::OpTypeVector:
+    {
+        llvm::Type* componentType = commonMap[spirv[word++]].type;
+        unsigned int vectorSize = spirv[word++];
+        commonMap[resultId].type = llvm::VectorType::get(componentType, vectorSize);
+        break;
+    }
+
+    // array
+    case spv::OpTypeArray:
+    {
+        llvm::Type* elementType = commonMap[spirv[word++]].type;
+        unsigned int arraySizeId = spirv[word++];
+        commonMap[resultId].type = llvm::ArrayType::get(elementType, gla::GetConstantInt(commonMap[arraySizeId].value));
+        break;
+    }
+
+    // matrix
+    case spv::OpTypeMatrix:
+    {
+        llvm::Type* columnType = commonMap[spirv[word++]].type;
+        int cols = spirv[word++];
+        int rows = gla::GetComponentCount(columnType);
+        commonMap[resultId].type = glaBuilder->getMatrixType(columnType->getContainedType(0), cols, rows);
+        if (commonMap[resultId].metaType.layout == gla::EMtlNone)
+            commonMap[resultId].metaType.layout = gla::EMtlColMajorMatrix;
+        break;
+    }
+
+    // images
+    case spv::OpTypeImage:
+        commonMap[resultId].type = gla::GetIntType(context);
+        commonMap[resultId].metaType.layout = gla::EMtlSampler;
+        break;
+    case spv::OpTypeSampledImage:
+        commonMap[resultId].type = gla::GetIntType(context);
+        commonMap[resultId].metaType.layout = gla::EMtlSampler;
+        commonMap[resultId].metaType.combinedImageSampler = true;
+        break;
+    case spv::OpTypeSampler:
+        gla::UnsupportedFunctionality("OpTypeSampler");
+        break;
+
+    // run-time array
+    case spv::OpTypeRuntimeArray:
+        gla::UnsupportedFunctionality("run time array OpType");
+        break;
+
+    // structure
+    case spv::OpTypeStruct:
+    {
+        std::vector<llvm::Type*> memberTypes;
+        memberTypes.resize(numOperands);
+        for (int m = 0; m < numOperands; ++m)
+            memberTypes[m] = commonMap[spirv[word++]].type;
+        commonMap[resultId].type = commonMap[resultId].metaType.name ? llvm::StructType::create(context, memberTypes, commonMap[resultId].metaType.name)
+                                                                     : llvm::StructType::create(context, memberTypes);
+        break;
+    }
+
+    // pointer
+    case spv::OpTypePointer:
+    {
+        spv::StorageClass storageClass = (spv::StorageClass)spirv[word++];
+        spv::Id pointee = spirv[word++];
+        commonMap[resultId].type = glaBuilder->getPointerType(commonMap[pointee].type, mapStorageClass(storageClass, commonMap[pointee].isBuffer), 0);
+        break;
+    }
+
+    // function
+    case spv::OpTypeFunction:
+        // Don't make an LLVM type for a function type, just use it for 
+        // reference when a function uses it.  Maybe this will change when
+        // function pointers need to be supported.
+        break;
+
+    default:
+        gla::UnsupportedFunctionality("OpType type\n");
+        break;
+    }
+}
+
+// Process an OpVariable instruction.
+// It has already been fully decoded.
+void SpvToTopTranslator::addVariable(spv::Id resultId, spv::Id typeId, spv::StorageClass storageClass)
+{
+    // typeId should be the type of the result of the OpVariable instruction.
+    // That is, a pointer, meaning the type of variable to create is a 
+    // dereference of typeId.
+    assert(llvm::isa<llvm::PointerType>(commonMap[typeId].type));
+    llvm::Type* variableType = commonMap[typeId].type->getContainedType(0);
+
+    commonMap[resultId].storageClass = storageClass;
+
+    llvm::Constant* initializer = 0;
+    int constantBuffer = 0;
+    gla::Builder::EStorageQualifier glaQualifier = mapStorageClass(storageClass, commonMap[resultId].isBuffer);
+
+    const char* name = commonMap[resultId].metaType.name;
+    if (name) {
+        if (name[0] == 0)
+            name = "anon@";
+    } else {
+        if (commonMap[resultId].metaType.builtIn != gla::EmbNone) {
+            snprintf(&commonMap[resultId].metaType.buf[0], MetaType::bufSize, "__glab%d_", commonMap[resultId].metaType.builtIn);
+            name = &commonMap[resultId].metaType.buf[0];
+            commonMap[resultId].metaType.name = name;
+        } else if (storageClass != spv::StorageClassFunction)
+            name = "nn";   // no name, but LLVM treats I/O as dead when there is no name
+        else
+            name = "";
+    }
+
+    commonMap[resultId].value = glaBuilder->createVariable(glaQualifier, constantBuffer, variableType, initializer, 0, name);
+
+    // Set up any IO metadata
+    int numSlots;
+    int slot;
+    switch (storageClass) {
+    case spv::StorageClassOutput:
+        slot = assignSlot(resultId, numSlots);
+        makeOutputMetadata(resultId, slot, numSlots);
+        break;
+    case spv::StorageClassInput:
+        slot = assignSlot(resultId, numSlots);
+        makeInputMetadata(resultId, slot);
+        break;
+    case spv::StorageClassUniformConstant:
+    case spv::StorageClassUniform:
+        declareUniformMetadata(resultId);
+        break;
+    default:
+        break;
+    }
+}
+
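+// Map a SPIR-V storage class to the corresponding Top IR storage qualifier.
+// Uniform storage maps to a buffer qualifier when the variable was decorated as a buffer block.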
+gla::Builder::EStorageQualifier SpvToTopTranslator::mapStorageClass(spv::StorageClass storageClass, bool isBuffer)
+{
+    switch (storageClass) {
+    case spv::StorageClassFunction:
+        return gla::Builder::ESQLocal;
+    case spv::StorageClassUniformConstant:
+    case spv::StorageClassUniform:
+        return isBuffer ? gla::Builder::ESQBuffer : gla::Builder::ESQUniform;
+    case spv::StorageClassInput:
+        return gla::Builder::ESQInput;
+    case spv::StorageClassOutput:
+        return gla::Builder::ESQOutput;
+    case spv::StorageClassPrivate:
+        return gla::Builder::ESQGlobal;
+    case spv::StorageClassWorkgroup:
+        return gla::Builder::ESQShared;
+
+    default:
+        gla::UnsupportedFunctionality("storage class");
+        break;
+    }
+
+    // should not normally be executed:
+    return gla::Builder::ESQGlobal;
+}
+
+// Build a literal constant.
+void SpvToTopTranslator::addConstant(spv::Op opCode, spv::Id resultId, spv::Id typeId, int numOperands)
+{
+    // vector of constants for LLVM
+    std::vector<llvm::Constant*> llvmConsts;
+
+    switch (opCode) {
+    case spv::OpConstantTrue:
+        llvmConsts.push_back(gla::MakeBoolConstant(context, 1));
+        break;
+    case spv::OpConstantFalse:
+        llvmConsts.push_back(gla::MakeBoolConstant(context, 0));
+        break;
+    case spv::OpConstant:
+        switch (getOpCode(typeId)) {
+        case spv::OpTypeFloat:
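+            // a one-word literal is the float's IEEE-754 bit pattern; reinterpret the word's bits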
+            if (numOperands > 1)
+                gla::UnsupportedFunctionality("non-single-precision constant");
+            else
+                llvmConsts.push_back(gla::MakeFloatConstant(context, *(const float*)(&spirv[word])));
+            break;
+        case spv::OpTypeInt:
+            if (commonMap[typeId].type == llvm::Type::getInt1Ty(context))
+                gla::UnsupportedFunctionality("1-bit integer");
+            else if (commonMap[resultId].metaType.layout == gla::EMtlUnsigned)
+                llvmConsts.push_back(gla::MakeUnsignedConstant(context, spirv[word]));
+            else
+                llvmConsts.push_back(gla::MakeIntConstant(context, (int)spirv[word]));
+            break;
+        default:
+            gla::UnsupportedFunctionality("literal constant type");
+            break;
+        }
+    }
+
+    commonMap[resultId].value = glaBuilder->getConstant(llvmConsts, commonMap[typeId].type);
+}
+
+// Build up a hierarchical constant.
+void SpvToTopTranslator::addConstantAggregate(spv::Id resultId, spv::Id typeId, int numOperands)
+{
+    // vector of constants for LLVM
+    std::vector<llvm::Constant*> llvmConsts;
+    for (int op = 0; op < numOperands; ++op)
+        llvmConsts.push_back(llvm::dyn_cast<llvm::Constant>(commonMap[spirv[word++]].value));
+
+    commonMap[resultId].value = glaBuilder->getConstant(llvmConsts, commonMap[typeId].type);
+}
+
+//
+// Find and use the user-specified location as a slot, or if a location was not
+// specified, pick the next non-user available slot. User-specified locations
+// directly use the location specified, while non-user-specified will use locations
+// starting after MaxUserLayoutLocation to avoid collisions.
+//
+// Ensure enough slots are consumed to cover the size of the data represented by the node symbol.
+//
+// 'numSlots' means number of GLSL locations when using logical IO.
+//
+int SpvToTopTranslator::assignSlot(spv::Id resultId, int& numSlots)
+{
+    // Base the number of slots on the front-end's computation if possible; otherwise, estimate it.
+    numSlots = 1;
+
+    //??
+    //if (type.isArray() && ! type.getQualifier().isArrayedIo(glslangIntermediate->getStage()))
+    //    numSlots = type.getArraySize();
+    //if (type.isStruct() || type.isMatrix() || type.getBasicType() == spv::EbtDouble)
+    //    gla::UnsupportedFunctionality("complex I/O type; use new glslang C++ interface", gla::EATContinue);
+    //}
+
+    int slot = commonMap[resultId].metaType.location;
+    if (slot == gla::MaxUserLayoutLocation) {
+        slot = nextSlot;
+        nextSlot += numSlots;
+    }
+
+    return slot;
+}
+
+//
+// The following is a set of helper functions for translating SPIR-V to metadata, 
+// so that information not representable in LLVM does not get lost.
+//
+
+// Translate a SPIR-V variable to the kind of input/output qualifier needed for it in metadata.
+gla::EMdInputOutput SpvToTopTranslator::getMdQualifier(spv::Id resultId) const
+{
+    gla::EMdInputOutput mdQualifier = gla::EMioNone;
+
+    spv::Id typeId = dereferenceTypeId(commonMap[resultId].typeId);
+    typeId = getArrayElementTypeId(typeId);
+    if (commonMap[typeId].isBlock) {
+        // storage class comes from the variable
+        switch (commonMap[resultId].storageClass) {
+        case spv::StorageClassInput:   mdQualifier = gla::EMioPipeInBlock;        break;
+        case spv::StorageClassOutput:  mdQualifier = gla::EMioPipeOutBlock;       break;
+        case spv::StorageClassUniform: mdQualifier = gla::EMioUniformBlockMember; break;
+        //case spv::Storage  mdQualifier = gla::EMioBufferBlockMember; break;
+        default:                                                                  break;
+        }
+
+        return mdQualifier;
+    }
+
+    // non-blocks...
+
+    switch (commonMap[resultId].metaType.builtIn) {
+    case gla::EmbPosition:    mdQualifier = gla::EMioVertexPosition; break;
+    case gla::EmbPointSize:   mdQualifier = gla::EMioPointSize;      break;
+    case gla::EmbClipVertex:  mdQualifier = gla::EMioClipVertex;     break;
+    case gla::EmbInstanceId:  mdQualifier = gla::EMioInstanceId;     break;
+    case gla::EmbVertexId:    mdQualifier = gla::EMioVertexId;       break;
+    case gla::EmbVertexIndex: mdQualifier = gla::EMioVertexIndex;    break;
+    case gla::EmbFragCoord:   mdQualifier = gla::EMioFragmentCoord;  break;
+    case gla::EmbPointCoord:  mdQualifier = gla::EMioPointCoord;     break;
+    case gla::EmbFace:        mdQualifier = gla::EMioFragmentFace;   break;
+    case gla::EmbFragColor:   mdQualifier = gla::EMioPipeOut;        break;
+    case gla::EmbFragDepth:   mdQualifier = gla::EMioFragmentDepth;  break;
+
+    default:
+        switch (commonMap[resultId].storageClass) {
+        case spv::StorageClassUniformConstant: mdQualifier = gla::EMioDefaultUniform;     break;
+        case spv::StorageClassInput:           mdQualifier = gla::EMioPipeIn;             break;
+        case spv::StorageClassUniform:         mdQualifier = gla::EMioUniformBlockMember; break;
+        case spv::StorageClassOutput:          mdQualifier = gla::EMioPipeOut;            break;
+        default:
+            gla::UnsupportedFunctionality("metadata storage class", commonMap[resultId].storageClass, gla::EATContinue);
+            break;
+        }
+    }
+
+    return mdQualifier;
+}
+
+// Translate a SPIR-V sampler type to the kind of image/texture needed for it in metadata.
+gla::EMdSampler SpvToTopTranslator::getMdSampler(spv::Id typeId) const
+{
+    if (commonMap[typeId].metaType.combinedImageSampler)
+        return gla::EMsTexture;
+    else
+        return gla::EMsImage;
+}
+
+// Translate a SPIR-V sampler type to the metadata's dimensionality.
+gla::EMdSamplerDim SpvToTopTranslator::getMdSamplerDim(spv::Id typeId) const
+{
+    switch (getImageDim(typeId)) {
+    case spv::Dim1D:     return gla::EMsd1D;
+    case spv::Dim2D:     return isImageMS(typeId) ? gla::EMsd2DMS : gla::EMsd2D;
+    case spv::Dim3D:     return gla::EMsd3D;
+    case spv::DimCube:   return gla::EMsdCube;
+    case spv::DimRect:   return gla::EMsdRect;
+    case spv::DimBuffer: return gla::EMsdBuffer;
+    default:
+        gla::UnsupportedFunctionality("unknown sampler dimension", gla::EATContinue);
+        return gla::EMsd2D;
+    }
+}
+
+// Translate a SPIR-V sampler type to the return (sampled) type needed for it in metadata.
+gla::EMdSamplerBaseType SpvToTopTranslator::getMdSamplerBaseType(spv::Id typeId) const
+{
+    spv::Id sampledType = getImageSampledType(typeId);
+    if (commonMap[sampledType].type->getTypeID() == llvm::Type::FloatTyID)
+        return gla::EMsbFloat;
+    else if (commonMap[sampledType].metaType.layout == gla::EMtlUnsigned)
+        return gla::EMsbUint;
+    else
+        return gla::EMsbInt;
+}
+
+// Translate SPIR-V descriptions for type of varying/interpolation into Top IR's description.
+void SpvToTopTranslator::getInterpolationLocationMethod(spv::Id id, gla::EInterpolationMethod& method, gla::EInterpolationLocation& location)
+{
+    switch (commonMap[id].metaType.interpolationMethod) {
+    case spv::DecorationNoPerspective: method = gla::EIMNoperspective;  break;
+    case spv::DecorationPatch:         method = gla::EIMPatch;          break;
+    default:                           method = gla::EIMNone;           break;
+    }
+
+    switch (commonMap[id].metaType.interpolateTo) {
+    case spv::DecorationSample:        location = gla::EILSample;       break;
+    case spv::DecorationCentroid:      location = gla::EILCentroid;     break;
+    default:                           location = gla::EILFragment;     break;
+    }
+}
+
+// For making a global that won't be optimized away, so that metadata can have a type
+// that's not statically consumed.
+llvm::Value* SpvToTopTranslator::makePermanentTypeProxy(llvm::Value* value)
+{
+    // Make a type proxy that won't be optimized away (we still want the real llvm::Value to get optimized away when it can)
+    llvm::Type* type = value->getType();
+    while (type->getTypeID() == llvm::Type::PointerTyID)
+        type = llvm::dyn_cast<llvm::PointerType>(type)->getContainedType(0);
+
+    // Don't hook this global into the module, that will cause LLVM to optimize it away.
+    llvm::Value* typeProxy = new llvm::GlobalVariable(type, true, llvm::GlobalVariable::ExternalLinkage, 0, value->getName() + "_typeProxy");
+    manager.addToFreeList(typeProxy);
+
+    return typeProxy;
+}
+
+// Create all the metadata needed to back up a uniform or block.
+llvm::MDNode* SpvToTopTranslator::declareUniformMetadata(spv::Id resultId)
+{
+    llvm::MDNode* md;
+
+    gla::EMdInputOutput ioType = getMdQualifier(resultId);
+    switch (ioType) {
+    case gla::EMioDefaultUniform:
+        md = declareMdDefaultUniform(resultId);
+        break;
+    case gla::EMioUniformBlockMember:
+    case gla::EMioBufferBlockMember:
+    case gla::EMioBufferBlockMemberArrayed:
+        md = declareMdUniformBlock(ioType, resultId);
+        break;
+    default:
+        return 0;
+    }
+
+    return md;
+}
+
+// Make a !gla.uniform node, as per metadata.h, for a default uniform
+// Assumes the resultId is for a variable declaration, meaning the typeId
+// is a pointer to the actual type.
+llvm::MDNode* SpvToTopTranslator::declareMdDefaultUniform(spv::Id resultId)
+{
+    // Dereference the type
+    spv::Id typeId = dereferenceTypeId(commonMap[resultId].typeId);
+    typeId = getArrayElementTypeId(typeId);
+    llvm::MDNode* samplerMd = makeMdSampler(typeId, commonMap[resultId].value);
+
+    // Create hierarchical type information if it's an aggregate
+    gla::EMdTypeLayout layout = commonMap[typeId].metaType.layout;
+    llvm::MDNode* structure = 0;
+    if (commonMap[typeId].type->getTypeID() == llvm::Type::StructTyID)
+        structure = declareMdType(typeId, commonMap[resultId].metaType);
+
+    // Make the main node
+    return metadata.makeMdInputOutput(NonNullName(commonMap[resultId].metaType.name), gla::UniformListMdName, gla::EMioDefaultUniform,
+                                      makePermanentTypeProxy(commonMap[resultId].value),
+                                      layout, commonMap[resultId].metaType.precision, commonMap[resultId].metaType.location, samplerMd, structure,
+                                      -1, commonMap[resultId].metaType.builtIn, packSetBinding(commonMap[resultId].metaType));
+}
+
+// Make a metadata description of a sampler's type.
+llvm::MDNode* SpvToTopTranslator::makeMdSampler(spv::Id typeId, llvm::Value* value)
+{
+    // Figure out sampler information, if it's a sampler
+    if (commonMap[typeId].metaType.layout != gla::EMtlSampler)
+        return 0;
+
+    llvm::Value* typeProxy = 0;
+    if (! value) {
+        // Don't hook this global into the module, that will cause LLVM to optimize it away.
+        typeProxy = new llvm::GlobalVariable(commonMap[typeId].type, true, llvm::GlobalVariable::ExternalLinkage, 0, "sampler_typeProxy");
+        manager.addToFreeList(typeProxy);
+    } else
+        typeProxy = makePermanentTypeProxy(value);
+
+    return metadata.makeMdSampler(getMdSampler(typeId), typeProxy, getMdSamplerDim(typeId), isImageArrayed(typeId),
+                                  isImageDepth(typeId), getMdSamplerBaseType(typeId));
+}
+
+// Make a !gla.uniform node, as per metadata.h, for a uniform block or buffer block
+// (depending on ioType).
+// Assumes the resultId is for a variable declaration, meaning the typeId
+// is a pointer to the actual type.
+llvm::MDNode* SpvToTopTranslator::declareMdUniformBlock(gla::EMdInputOutput ioType, spv::Id resultId)
+{
+    // Dereference the type
+    spv::Id typeId = dereferenceTypeId(commonMap[resultId].typeId);
+
+    // Make hierarchical type information
+    llvm::MDNode* block = declareMdType(typeId, commonMap[resultId].metaType);
+
+    // Make the main node
+    return metadata.makeMdInputOutput(NonNullName(commonMap[resultId].metaType.name), gla::UniformListMdName, ioType, makePermanentTypeProxy(commonMap[resultId].value),
+                                      commonMap[typeId].metaType.layout, commonMap[resultId].metaType.precision, commonMap[resultId].metaType.location, 0, block, -1,
+                                      gla::EmbNone, packSetBinding(commonMap[resultId].metaType));
+}
+
+// Make a !aggregate node for the object, as per metadata.h, calling declareMdType with the type
+// to recursively finish for hierarchical types.
+llvm::MDNode* SpvToTopTranslator::declareMdType(spv::Id typeId, MetaType& metaType)
+{
+    // if contained type is an array, we actually need the type of the elements
+    // (we need metadata for the element type, not the array itself)
+    typeId = getArrayElementTypeId(typeId);
+
+    // Figure out sampler information if it's a sampler
+    llvm::MDNode* samplerMd = makeMdSampler(typeId, 0);
+    std::vector<llvm::Value*> mdArgs;
+
+    // name of aggregate, if an aggregate (struct or block)
+    gla::EMdTypeLayout typeLayout;
+    if (commonMap[typeId].type->getTypeID() == llvm::Type::StructTyID) {
+        mdArgs.push_back(llvm::MDString::get(context, NonNullName(commonMap[typeId].metaType.name)));
+        typeLayout = commonMap[typeId].metaType.layout;
+    } else {
+        mdArgs.push_back(llvm::MDString::get(context, ""));
+        typeLayout = metaType.layout;
+    }
+
+    // !typeLayout
+    mdArgs.push_back(metadata.makeMdTypeLayout(typeLayout, metaType.precision, metaType.location, samplerMd, -1, metaType.builtIn, packSetBinding(metaType)));
+
+    if (commonMap[typeId].type->getTypeID() == llvm::Type::StructTyID) {
+        int numMembers = (int)commonMap[typeId].type->getNumContainedTypes();
+
+        // make sure there is enough member metadata; name/decoration instructions may not have created it all
+        bumpMemberMetaData(commonMap[typeId].memberMetaData, numMembers - 1);
+
+        std::vector<MetaType>& memberMetaData = *commonMap[typeId].memberMetaData;
+        for (int t = 0; t < numMembers; ++t) {
+            spv::Id containedTypeId = getStructMemberTypeId(typeId, t);
+
+            // name of member
+            mdArgs.push_back(llvm::MDString::get(context, memberMetaData[t].name));
+
+            // type of member
+            llvm::MDNode* mdType = declareMdType(containedTypeId, memberMetaData[t]);
+            mdArgs.push_back(mdType);
+        }
+    }
+
+    return llvm::MDNode::get(context, mdArgs);
+}
+
+// Make metadata node for either an 'in' or an 'out' variable/block.
+// Assumes the resultId is for a variable declaration, meaning the typeId
+// is a pointer to the actual type.
+llvm::MDNode* SpvToTopTranslator::makeInputOutputMetadata(spv::Id resultId, int slot, const char* kind)
+{
+    // Dereference the type
+    spv::Id typeId = dereferenceTypeId(commonMap[resultId].typeId);
+    typeId = getArrayElementTypeId(typeId);
+
+    llvm::MDNode* aggregate = 0;
+    if (commonMap[typeId].type->getTypeID() == llvm::Type::StructTyID) {
+        // Make hierarchical type information, for the dereferenced type
+        aggregate = declareMdType(typeId, commonMap[resultId].metaType);
+    }
+
+    gla::EInterpolationMethod interpMethod = gla::EIMNone;
+    gla::EInterpolationLocation interpLocation = gla::EILFragment;
+    getInterpolationLocationMethod(resultId, interpMethod, interpLocation);
+
+    return metadata.makeMdInputOutput(NonNullName(commonMap[resultId].metaType.name), kind, getMdQualifier(resultId), makePermanentTypeProxy(commonMap[resultId].value),
+                                      commonMap[typeId].metaType.layout, commonMap[resultId].metaType.precision, slot, 0, aggregate,
+                                      gla::MakeInterpolationMode(interpMethod, interpLocation), commonMap[resultId].metaType.builtIn);
+}
+
+// Make metadata node for an 'out' variable/block and, if the variable is
+// marked invariant, add it to the module's invariant list.
+void SpvToTopTranslator::makeOutputMetadata(spv::Id resultId, int slot, int numSlots)
+{
+    llvm::MDNode* md = makeInputOutputMetadata(resultId, slot, gla::OutputListMdName);
+
+    if (commonMap[resultId].metaType.invariant)
+        module->getOrInsertNamedMetadata(gla::InvariantListMdName)->addOperand(md);
+}
+
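+// Make metadata node for an 'in' variable/block.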
+llvm::MDNode* SpvToTopTranslator::makeInputMetadata(spv::Id resultId, int slot)
+{
+    llvm::MDNode* mdNode = makeInputOutputMetadata(resultId, slot, gla::InputListMdName);
+
+    return mdNode;
+}
+
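+// Decode the optional type id and the result id of the current instruction,
+// advancing 'word' past them and recording where the instruction started.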
+void SpvToTopTranslator::decodeResult(bool type, spv::Id& typeId, spv::Id& resultId)
+{
+    unsigned int instructionStart = word - 1;
+
+    if (type)
+        typeId = spirv[word++];
+    
+    resultId = spirv[word++];
+
+    if (type && commonMap[resultId].typeId == 0)
+        commonMap[resultId].typeId = typeId;
+
+    // map instruction for future reference
+    commonMap[resultId].instructionIndex = instructionStart;
+}
+
+const char* SpvToTopTranslator::findAName(spv::Id choice1, spv::Id choice2)
+{
+    if (commonMap[choice1].metaType.name)
+        return commonMap[choice1].metaType.name;
+    else if (choice2 != 0 && commonMap[choice2].metaType.name)
+        return commonMap[choice2].metaType.name;
+    else {
+        // Look ahead to see if we're in a potential chain of instructions leading 
+        // to a store that has the result named.  This is just an approximation.
+        spv::Id chain = choice1;
+        int trialInst = nextInst;
+        for (int i = 0; i < 200; i++) {
+            if (trialInst + 5 >= (int)spirv.size())
+                break;
+            spv::Op opcode = GetOpCode(spirv[trialInst]);
+            switch (opcode) {
+            case spv::OpStore:
+                if (spirv[trialInst + 2] == chain ||
+                    (chain == 0 && commonMap[spirv[trialInst + 1]].metaType.name))
+                    chain = spirv[trialInst + 1];
+                break;
+
+            // up against something not worth crossing?
+            case spv::OpFunctionEnd:
+            case spv::OpSwitch:
+            case spv::OpReturn:
+            case spv::OpReturnValue:
+            case spv::OpUnreachable:
+                return 0;
+
+            // something worth ignoring?
+            case spv::OpLoopMerge:
+            case spv::OpSelectionMerge:
+            case spv::OpBranch:
+            case spv::OpLabel:
+            case spv::OpKill:
+                break;
+
+            // propagate access-chain base name into pointer name?
+            case spv::OpAccessChain:
+            {
+                spv::Id baseId = spirv[trialInst + 3];
+                spv::Id resultId = spirv[trialInst + 2];
+                if (commonMap[resultId].metaType.name == 0)
+                    commonMap[resultId].metaType.name = commonMap[baseId].metaType.name;
+                break;
+            }
+
+            // Does this instruction consume it but not have a result?
+            case spv::OpBranchConditional:  // probably "?:"
+                if (spirv[trialInst + 1] == chain)
+                    chain = 0;
+                break;
+
+            default:
+                // hopefully a normal data-flow processing instruction, where the instruction has a result
+                if (chain != 0 && spirv[trialInst + 2] < numIds) {
+                    for (int op = 3; op < GetWordCount(spirv[trialInst]); ++op) {
+                        if (spirv[trialInst + op] == chain) {
+                            chain = spirv[trialInst + 2];
+                            break;
+                        }
+                    }
+                }
+                break;
+            }
+            const char* name = commonMap[chain].metaType.name;
+            if (name != 0)
+                return name;
+            trialInst += GetWordCount(spirv[trialInst]);
+        }
+    }
+
+    return 0;
+}
+
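+// Translate one SPIR-V instruction to Top IR.
+// On entry, 'word' is the index of the instruction's first operand.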
+void SpvToTopTranslator::translateInstruction(spv::Op opCode, int numOperands)
+{
+    spv::Id resultId;
+    spv::Id typeId;
+
+    switch (opCode) {
+    case spv::OpSource:
+    {
+        spv::SourceLanguage source = (spv::SourceLanguage)spirv[word++];
+        unsigned int sourceVersion = spirv[word++];
+        manager.setVersion(sourceVersion);
+        switch (source) {
+        case spv::SourceLanguageESSL:
+            manager.setProfile(EEsProfile);
+            break;
+        case spv::SourceLanguageGLSL:
+            manager.setProfile(ECoreProfile);
+            break;
+        case spv::SourceLanguageOpenCL_C:
+        case spv::SourceLanguageOpenCL_CPP:
+        default:
+            gla::UnsupportedFunctionality("non-GL profile", gla::EATContinue);
+            manager.setProfile(ECoreProfile);
+            break;
+        }
+        break;
+    }
+    case spv::OpSourceExtension:
+        manager.addExtension((const char*)&spirv[word]);
+        break;
+    case spv::OpExtInstImport:
+        break;
+    case spv::OpMemoryModel:
+        break;
+    case spv::OpEntryPoint:
+    {
+        spv::ExecutionModel model = (spv::ExecutionModel)spirv[word++];
+        spv::Id entry = spirv[word++];
+        setEntryPoint(model, entry);
+        break;
+    }
+    case spv::OpExecutionMode:
+    {
+        spv::Id entryPoint = spirv[word++];
+        spv::ExecutionMode mode = (spv::ExecutionMode)spirv[word++];
+        setExecutionMode(entryPoint, mode);
+        if (numOperands > 2)
+            gla::UnsupportedFunctionality("OpExecutionMode with more than two operands", gla::EATContinue);
+        break;
+    }
+    case spv::OpCapability:
+        break;
+    case spv::OpTypeVoid:
+    case spv::OpTypeBool:
+    case spv::OpTypeInt:
+    case spv::OpTypeFloat:
+    case spv::OpTypeVector:
+    case spv::OpTypeMatrix:
+    case spv::OpTypeImage:
+    case spv::OpTypeSampler:
+    case spv::OpTypeSampledImage:
+    case spv::OpTypeArray:
+    case spv::OpTypeRuntimeArray:
+    case spv::OpTypeStruct:
+    case spv::OpTypePointer:
+    case spv::OpTypeFunction:
+        decodeResult(false, typeId, resultId);
+        numOperands -= 1;
+        addType(opCode, resultId, numOperands);
+        break;
+    case spv::OpConstantTrue:
+    case spv::OpConstantFalse:
+    case spv::OpConstant:
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        addConstant(opCode, resultId, typeId, numOperands);
+        break;
+    case spv::OpConstantComposite:
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        addConstantAggregate(resultId, typeId, numOperands);
+        break;
+    case spv::OpVariable:
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        addVariable(resultId, typeId, (spv::StorageClass)spirv[word++]);
+        break;
+    case spv::OpDecorate:
+    {
+        spv::Id id = spirv[word++];
+        spv::Decoration decoration = (spv::Decoration)spirv[word++];
+        addDecoration(id, decoration);
+        break;
+    }
+    case spv::OpMemberDecorate:
+    {
+        spv::Id structTypeId = spirv[word++];
+        unsigned int member = spirv[word++];
+        spv::Decoration decoration = (spv::Decoration)spirv[word++];
+        addMemberDecoration(structTypeId, member, decoration);
+        break;
+    }
+    case spv::OpName:
+    {
+        spv::Id id = spirv[word++];
+        commonMap[id].metaType.name = (const char *)&spirv[word];
+        break;
+    }
+    case spv::OpMemberName:
+    {
+        spv::Id id = spirv[word++];
+        unsigned int memberNumber = spirv[word++];
+        const char* name = (const char *)&spirv[word];
+        bumpMemberMetaData(commonMap[id].memberMetaData, memberNumber);
+        (*commonMap[id].memberMetaData)[memberNumber].name = name;
+        break;
+    }
+
+    case spv::OpString:
+        gla::UnsupportedFunctionality("OpString", gla::EATContinue);
+        break;
+    case spv::OpLine:
+        gla::UnsupportedFunctionality("OpLine", gla::EATContinue);
+        break;
+
+    case spv::OpLoad:
+        decodeResult(true, typeId, resultId);
+        commonMap[resultId].value = glaBuilder->createLoad(commonMap[spirv[word++]].value);
+        break;
+    case spv::OpStore:
+    {
+        llvm::Value* lvalue = commonMap[spirv[word++]].value;
+        llvm::Value* rvalue = commonMap[spirv[word++]].value;
+        glaBuilder->createStore(rvalue, lvalue);
+        break;
+    }
+
+    case spv::OpFunction:
+        decodeResult(true, typeId, resultId);
+        handleOpFunction(typeId, resultId);
+        currentArg = commonMap[currentFunction].function->arg_begin();
+        break;
+    case spv::OpFunctionParameter:
+        decodeResult(true, typeId, resultId);
+        commonMap[resultId].value = &(*currentArg);
+        ++currentArg;
+        break;
+    case spv::OpFunctionEnd:
+        glaBuilder->leaveFunction(inEntryPoint());
+        currentFunction = 0;
+        break;
+
+    case spv::OpIAdd:
+    case spv::OpFAdd:
+    case spv::OpISub:
+    case spv::OpFSub:
+    case spv::OpIMul:
+    case spv::OpFMul:
+    case spv::OpUDiv:
+    case spv::OpSDiv:
+    case spv::OpFDiv:
+    case spv::OpUMod:
+    case spv::OpSRem:
+    case spv::OpSMod:
+    case spv::OpFRem:
+    case spv::OpFMod:
+    case spv::OpVectorTimesScalar:
+    case spv::OpMatrixTimesScalar:
+    case spv::OpVectorTimesMatrix:
+    case spv::OpMatrixTimesVector:
+    case spv::OpMatrixTimesMatrix:
+    case spv::OpOuterProduct:
+    case spv::OpShiftRightLogical:
+    case spv::OpShiftRightArithmetic:
+    case spv::OpShiftLeftLogical:
+    case spv::OpLogicalOr:
+    case spv::OpLogicalNotEqual:
+    case spv::OpLogicalAnd:
+    case spv::OpBitwiseOr:
+    case spv::OpBitwiseXor:
+    case spv::OpBitwiseAnd:
+    case spv::OpIEqual:
+    case spv::OpFOrdEqual:
+    case spv::OpFUnordEqual:
+    case spv::OpINotEqual:
+    case spv::OpFOrdNotEqual:
+    case spv::OpFUnordNotEqual:
+    case spv::OpULessThan:
+    case spv::OpSLessThan:
+    case spv::OpFOrdLessThan:
+    case spv::OpFUnordLessThan:
+    case spv::OpUGreaterThan:
+    case spv::OpSGreaterThan:
+    case spv::OpFOrdGreaterThan:
+    case spv::OpFUnordGreaterThan:
+    case spv::OpULessThanEqual:
+    case spv::OpSLessThanEqual:
+    case spv::OpFOrdLessThanEqual:
+    case spv::OpFUnordLessThanEqual:
+    case spv::OpUGreaterThanEqual:
+    case spv::OpSGreaterThanEqual:
+    case spv::OpFOrdGreaterThanEqual:
+    case spv::OpFUnordGreaterThanEqual:
+    case spv::OpDot:
+    {
+        decodeResult(true, typeId, resultId);
+        unsigned int left = spirv[word++];
+        unsigned int right = spirv[word++];
+        commonMap[resultId].value = createBinaryOperation(opCode, commonMap[resultId].metaType.precision, commonMap[left].value, commonMap[right].value, 
+                                                          commonMap[left].metaType.layout == gla::EMtlNone, false, findAName(resultId));
+        if (commonMap[resultId].value == 0)
+            gla::UnsupportedFunctionality("binary operation");
+        break;
+    }
+    case spv::OpSelect:
+    {
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        spv::Id conditionId = spirv[word++];
+        spv::Id trueId = spirv[word++];
+        spv::Id falseId = spirv[word++];
+        commonMap[resultId].value = llvmBuilder.CreateSelect(commonMap[conditionId].value, commonMap[trueId].value, commonMap[falseId].value);
+        break;
+    }
+    case spv::OpAccessChain:
+    {
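+        // An access chain maps directly onto an LLVM GEP; the leading constant 0
+        // dereferences the base pointer before applying the member/element indexes.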
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        spv::Id baseId = spirv[word++];
+
+        llvm::SmallVector<llvm::Value*, 5> chain;
+        chain.push_back(gla::MakeIntConstant(context, 0));
+        llvm::Value* base = commonMap[baseId].value;
+        for (int op = 1; op < numOperands; ++op)
+            chain.push_back(commonMap[spirv[word++]].value);
+        commonMap[resultId].value = llvmBuilder.CreateGEP(base, chain);
+        if (commonMap[resultId].metaType.name == 0)
+            commonMap[resultId].metaType.name = commonMap[baseId].metaType.name;
+        break;
+    }
+    case spv::OpVectorShuffle:
+    {
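+        // Three shapes are handled: a swizzle of a single vector, an update of
+        // vector1 with selected components of vector2, and (unsupported) general shuffles.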
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        llvm::Value* vector1 = commonMap[spirv[word++]].value;
+        llvm::Value* vector2 = commonMap[spirv[word++]].value;
+        numOperands -= 2;
+        unsigned numTargetComponents = gla::GetComponentCount(commonMap[typeId].type);
+        assert(numOperands == numTargetComponents);
+
+        if (vector1 == vector2) {
+            llvm::SmallVector<int, 4> channels;
+            for (int op = 0; op < numOperands; ++op)
+                channels.push_back(spirv[word++]);
+            commonMap[resultId].value = glaBuilder->createSwizzle(commonMap[resultId].metaType.precision, vector1, channels, commonMap[typeId].type);
+        } else if (gla::GetComponentCount(vector1) == numTargetComponents) {
+            // See if this is just updating vector1 with parts of vector2
+            bool justUpdate = true;
+            for (int i = 0; i < numOperands; ++i) {
+                if (spirv[word + i] < numTargetComponents && spirv[word + i] != i)
+                    justUpdate = false;
+            }
+            if (! justUpdate)
+                gla::UnsupportedFunctionality("generalized shuffle with matching sizes");
+
+            // Now, we know anything from vector1 is an identity shuffle,
+            // so, just collect from vector2 and put into the first one.
+            for (int i = 0; i < numOperands; ++i) {
+                if (spirv[word + i] >= numTargetComponents) {
+                    llvm::Value* comp = llvmBuilder.CreateExtractElement(vector2, gla::MakeIntConstant(context, spirv[word + i] - numTargetComponents));
+                    vector1 = llvmBuilder.CreateInsertElement(vector1, comp, gla::MakeIntConstant(context, i));
+                }
+            }
+            commonMap[resultId].value = vector1;
+        } else {
+            printf("%d %d %d %d \n", gla::GetComponentCount(commonMap[typeId].type), gla::GetComponentCount(vector1), gla::GetComponentCount(vector2), numOperands);
+            gla::UnsupportedFunctionality("generalized shuffle with unmatching sizes");
+        }
+
+        break;
+    }
+    case spv::OpCompositeConstruct:
+    {
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        std::vector<llvm::Value*> constituents;
+        for (int i = 0; i < numOperands; ++i)
+            constituents.push_back(commonMap[spirv[word++]].value);
+        commonMap[resultId].value = createConstructor(resultId, typeId, constituents);
+        break;
+    }
+    case spv::OpCompositeExtract:
+    {
+        decodeResult(true, typeId, resultId);
+        llvm::Value* composite = commonMap[spirv[word++]].value;
+        numOperands -= 3;
+
+        // Build the indexes...
+        // SPIR-V can go down to components, but LLVM stops at vectors, so track the type
+        // to break out early.
+        const llvm::Type* currentType = composite->getType();
+        llvm::SmallVector<unsigned int, 4> indexes;
+        int vectorIndex = -1;
+        for (int i = 0; i < numOperands; ++i) {
+            if (currentType->getTypeID() == llvm::Type::VectorTyID) {
+                vectorIndex = spirv[word++];
+                break;
+            }
+            indexes.push_back(spirv[word++]);
+            currentType = currentType->getContainedType(std::min(indexes.back(), currentType->getNumContainedTypes() - 1));
+        }
+
+        // Do the operation
+        if (vectorIndex == -1)
+            commonMap[resultId].value = llvmBuilder.CreateExtractValue(composite, indexes);
+        else {
+            // The final index was into a vector.
+            // If the vector is a subpart of the composite, first extract the vector.
+            llvm::Value* vector;
+            if (indexes.size() > 0)
+                vector = llvmBuilder.CreateExtractValue(composite, indexes);
+            else
+                vector = composite;
+            commonMap[resultId].value = llvmBuilder.CreateExtractElement(vector, gla::MakeIntConstant(context, vectorIndex));
+        }
+
+        break;
+    }
+    case spv::OpCompositeInsert:
+    {
+        decodeResult(true, typeId, resultId);
+        llvm::Value* object = commonMap[spirv[word++]].value;
+        llvm::Value* composite = commonMap[spirv[word++]].value;
+        numOperands -= 4;
+
+        // Build the indexes...
+        // SPIR-V can go down to components, but LLVM stops at vectors, so track the type
+        // to break out early.
+        const llvm::Type* currentType = composite->getType();
+        llvm::SmallVector<unsigned int, 4> indexes;
+        int vectorIndex = -1;
+        for (int i = 0; i < numOperands; ++i) {
+            if (currentType->getTypeID() == llvm::Type::VectorTyID) {
+                vectorIndex = spirv[word++];
+                break;
+            }
+            indexes.push_back(spirv[word++]);
+            currentType = currentType->getContainedType(std::min(indexes.back(), currentType->getNumContainedTypes() - 1));
+        }
+
+        // Do the operation
+        if (vectorIndex == -1)
+            commonMap[resultId].value = llvmBuilder.CreateInsertValue(composite, object, indexes);
+        else {
+            // The final index was into a vector.
+            // If the vector is a subpart of the composite; extract the vector, do the insert, then put the vector back.
+            // If the vector is the composite, deal with it more directly.
+            if (indexes.size() > 0) {
+                llvm::Value* vector = llvmBuilder.CreateExtractValue(composite, indexes);
+                vector = llvmBuilder.CreateInsertElement(vector, object, gla::MakeIntConstant(context, vectorIndex));
+                commonMap[resultId].value = llvmBuilder.CreateInsertValue(composite, vector, indexes);
+            } else
+                commonMap[resultId].value = llvmBuilder.CreateInsertElement(composite, object, gla::MakeIntConstant(context, vectorIndex));
+        }
+
+        break;
+    }
+    case spv::OpVectorExtractDynamic:
+    {
+        decodeResult(true, typeId, resultId);
+        llvm::Value* source = commonMap[spirv[word++]].value;
+        llvm::Value* component = commonMap[spirv[word++]].value;
+        commonMap[resultId].value = llvmBuilder.CreateExtractElement(source, component);
+        break;
+    }
+    case spv::OpVectorInsertDynamic:
+    {
+        decodeResult(true, typeId, resultId);
+        llvm::Value* target = commonMap[spirv[word++]].value;
+        llvm::Value* source = commonMap[spirv[word++]].value;
+        llvm::Value* component = commonMap[spirv[word++]].value;
+        commonMap[resultId].value = llvmBuilder.CreateInsertElement(target, source, component);
+        break;
+    }
+    case spv::OpUndef:
+        decodeResult(true, typeId, resultId);
+        commonMap[resultId].value = llvm::UndefValue::get(commonMap[typeId].type);
+        break;
+    case spv::OpPhi:
+    {
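+        // Each (variable, parent-label) operand pair becomes an incoming edge of the LLVM phi.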
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        llvm::PHINode* phi = llvmBuilder.CreatePHI(commonMap[typeId].type, numOperands);
+        while (numOperands >= 2) {
+            spv::Id variable = spirv[word++];
+            spv::Id parent = spirv[word++];
+            makeLabelBlock(parent);
+            numOperands -= 2;
+            phi->addIncoming(commonMap[variable].value, commonMap[parent].block);
+        }
+        commonMap[resultId].value = phi;
+        break;
+    }
+    case spv::OpSampledImage:
+        gla::UnsupportedFunctionality("OpSampler");
+        break;
+
+    case spv::OpImage:
+    {
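+        // Here a combined image-sampler and its underlying image share the same value,
+        // so extracting the image is a pass-through.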
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        unsigned int operand = spirv[word++];
+        commonMap[resultId].value = commonMap[operand].value;
+        break;
+    }
+
+    case spv::OpImageSampleImplicitLod:
+    case spv::OpImageSampleExplicitLod:
+    case spv::OpImageSampleDrefImplicitLod:
+    case spv::OpImageSampleDrefExplicitLod:
+    case spv::OpImageSampleProjImplicitLod:
+    case spv::OpImageSampleProjExplicitLod:
+    case spv::OpImageSampleProjDrefImplicitLod:
+    case spv::OpImageSampleProjDrefExplicitLod:
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        commonMap[resultId].value = createSamplingCall(opCode, typeId, resultId, numOperands);
+        break;
+
+    case spv::OpImageFetch:
+        gla::UnsupportedFunctionality("OpTextureFetch instruction");
+        break;
+
+    case spv::OpImageGather:
+    case spv::OpImageDrefGather:
+        gla::UnsupportedFunctionality("OpTextureGather instruction");
+        break;
+
+    case spv::OpImageQuerySizeLod:
+    case spv::OpImageQuerySize:
+    case spv::OpImageQueryLod:
+    case spv::OpImageQueryLevels:
+    case spv::OpImageQuerySamples:
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        commonMap[resultId].value = createTextureQueryCall(opCode, typeId, resultId, numOperands);
+        break;
+
+    case spv::OpSNegate:
+    case spv::OpFNegate:
+    case spv::OpNot:
+    case spv::OpLogicalNot:
+    case spv::OpAny:
+    case spv::OpAll:
+    case spv::OpConvertFToU:
+    case spv::OpConvertFToS:
+    case spv::OpConvertSToF:
+    case spv::OpConvertUToF:
+    case spv::OpUConvert:
+    case spv::OpSConvert:
+    case spv::OpFConvert:
+    case spv::OpBitcast:
+    case spv::OpTranspose:
+    case spv::OpIsNan:
+    case spv::OpIsInf:
+    case spv::OpArrayLength:
+    case spv::OpDPdx:
+    case spv::OpDPdy:
+    case spv::OpFwidth:
+    case spv::OpDPdxFine:
+    case spv::OpDPdyFine:
+    case spv::OpFwidthFine:
+    case spv::OpDPdxCoarse:
+    case spv::OpDPdyCoarse:
+    case spv::OpFwidthCoarse:
+    {
+        decodeResult(true, typeId, resultId);
+        unsigned int operand = spirv[word++];
+        commonMap[resultId].value = createUnaryOperation(opCode, commonMap[resultId].metaType.precision, commonMap[typeId].type, commonMap[operand].value, 
+                                                         commonMap[operand].metaType.layout == gla::EMtlNone, false);
+        if (commonMap[resultId].value == 0)
+            gla::UnsupportedFunctionality("unary operation ", opCode);
+        break;
+    }
+
+    case spv::OpEmitStreamVertex:
+        glaBuilder->createIntrinsicCall(gla::EMpNone, llvm::Intrinsic::gla_emitStreamVertex, commonMap[spirv[word++]].value);
+        break;
+    case spv::OpEndStreamPrimitive:
+        glaBuilder->createIntrinsicCall(gla::EMpNone, llvm::Intrinsic::gla_endStreamPrimitive, commonMap[spirv[word++]].value);
+        break;
+
+    case spv::OpEmitVertex:
+        glaBuilder->createIntrinsicCall(llvm::Intrinsic::gla_emitVertex);
+        break;
+    case spv::OpEndPrimitive:
+        glaBuilder->createIntrinsicCall(llvm::Intrinsic::gla_endPrimitive);
+        break;
+    case spv::OpControlBarrier:
+        glaBuilder->createIntrinsicCall(llvm::Intrinsic::gla_barrier);
+        break;
+    case spv::OpMemoryBarrier:
+        // TODO: handle all the different kinds
+        glaBuilder->createIntrinsicCall(llvm::Intrinsic::gla_memoryBarrier);
+        break;
+
+    case spv::OpFunctionCall:
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        commonMap[resultId].value = createFunctionCall(commonMap[resultId].metaType.precision, typeId, resultId, numOperands);
+        break;
+
+    case spv::OpExtInst:
+        decodeResult(true, typeId, resultId);
+        numOperands -= 2;
+        commonMap[resultId].value = createExternalInstruction(commonMap[resultId].metaType.precision, typeId, resultId, numOperands, findAName(resultId));
+        break;
+
+    case spv::OpLabel:
+        decodeResult(false, typeId, resultId);
+        makeLabelBlock(resultId);
+        llvmBuilder.SetInsertPoint(commonMap[resultId].block);
+        break;
+
+    case spv::OpLoopMerge:
+        // going to LunarGLASS does not require preserving structured flow control
+        break;
+    case spv::OpSelectionMerge:
+        // going to LunarGLASS does not require preserving structured flow control
+        break;
+    case spv::OpBranch:
+    {
+        spv::Id labelId = spirv[word++];
+        makeLabelBlock(labelId);
+        llvmBuilder.CreateBr(commonMap[labelId].block);
+        break;
+    }
+    case spv::OpBranchConditional:
+    {
+        spv::Id conditionId = spirv[word++];
+        spv::Id trueLabelId = spirv[word++];
+        spv::Id falseLabelId = spirv[word++];
+        makeLabelBlock(trueLabelId);
+        makeLabelBlock(falseLabelId);
+        llvmBuilder.CreateCondBr(commonMap[conditionId].value, commonMap[trueLabelId].block, commonMap[falseLabelId].block);
+        break;
+    }
+    case spv::OpSwitch:
+        createSwitch(numOperands);
+        break;
+    case spv::OpKill:
+        glaBuilder->makeDiscard(inEntryPoint());
+        // we might be missing a termination instruction in the now current block
+        if (llvmBuilder.GetInsertBlock()->getTerminator() == 0 && GetOpCode(spirv[word]) == spv::OpLabel) {
+            spv::Id labelId = spirv[word + 1];
+            makeLabelBlock(labelId);
+            llvmBuilder.CreateBr(commonMap[labelId].block);
+        }
+        break;
+    case spv::OpReturn:
+        if (inEntryPoint())
+            glaBuilder->makeMainReturn();
+        else
+            glaBuilder->makeReturn();
+        break;
+    case spv::OpReturnValue:
+        glaBuilder->makeReturn(false, commonMap[spirv[word++]].value);
+        break;
+    default:
+        gla::UnsupportedFunctionality("OpCode ", opCode);
+        break;
+    }
+}
+
+// Handle an OpFunction: get the information needed to make the gla function
+// from its OpTypeFunction. The OpFunctionParameter instructions that follow it are not visited here.
+void SpvToTopTranslator::handleOpFunction(spv::Id& typeId, spv::Id& resultId)
+{
+    currentFunction = resultId;
+
+    spv::FunctionControlMask controlMask = (spv::FunctionControlMask)spirv[word++];
+    spv::Id functionTypeId = spirv[word++];
+
+    // peek ahead at the first label, as we're making the entry block early
+    spv::Id firstLabelId = getNextLabelId();
+
+    if (inEntryPoint()) {
+        // Make the entry point function in LLVM.
+        shaderEntry = glaBuilder->makeMain();
+        const char* entryName = commonMap[resultId].metaType.name;
+        if (entryName == 0)
+            entryName = "main";
+        metadata.addMdEntrypoint(entryName);
+        commonMap[currentFunction].function = shaderEntry->getParent();
+        commonMap[firstLabelId].block = shaderEntry;
+        llvmBuilder.SetInsertPoint(shaderEntry);
+    } else {
+        // The function may have already been created by a forward call.
+        // If not, make it now.
+        llvm::Function* function;
+        const unsigned int typeInstrIndex = commonMap[functionTypeId].instructionIndex;
+        const int numParams = GetWordCount(spirv[typeInstrIndex]) - 3;
+        const unsigned int returnTypeIndex = typeInstrIndex + 2;
+        const unsigned int param0TypeIndex = typeInstrIndex + 3;
+        if (commonMap[currentFunction].function == 0) {
+            // Build up the formal parameter type list
+            std::vector<spv::Id> paramTypeIds;
+            for (int p = 0; p < numParams; ++p)
+                paramTypeIds.push_back(spirv[param0TypeIndex + p]);                
+            function = makeFunction(currentFunction, typeId, paramTypeIds);
+            commonMap[currentFunction].function = function;
+        } else
+            function = commonMap[currentFunction].function;
+        commonMap[firstLabelId].block = &function->getEntryBlock();
+
+        llvmBuilder.SetInsertPoint(&function->getBasicBlockList().front());
+    }
+}
+
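+// Make an LLVM function for the given function id, with the given return type and formal-parameter type ids.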
+llvm::Function* SpvToTopTranslator::makeFunction(spv::Id functionId, spv::Id returnTypeId, std::vector<spv::Id>& argTypeIds)
+{
+    llvm::Type* retType = commonMap[returnTypeId].type;
+    std::vector<llvm::Type*> paramTypes;
+    for (int p = 0; p < (int)argTypeIds.size(); ++p)
+        paramTypes.push_back(commonMap[argTypeIds[p]].type);
+
+    // Make the function
+    llvm::BasicBlock* entryBlock;  // seems this is just needed as a flag now
+    llvm::Function* function = glaBuilder->makeFunctionEntry(retType, commonMap[functionId].metaType.name, paramTypes, &entryBlock);
+    function->addFnAttr(llvm::Attribute::AlwaysInline);
+    
+    return function;
+}
+
+// Peek ahead in the spirv until an OpLabel is found, and return its id.
+spv::Id SpvToTopTranslator::getNextLabelId()
+{
+    unsigned searchWord = word;
+    do {
+        if ((spirv[searchWord] & spv::OpCodeMask) == spv::OpLabel)
+            return spirv[searchWord + 1];
+        searchWord += spirv[searchWord] >> spv::WordCountShift;
+        assert(searchWord < spirv.size());
+    } while (true);
+}
+
+// Translate a SPIR-V OpFunctionCall to a Top IR call.
+llvm::Value* SpvToTopTranslator::createFunctionCall(gla::EMdPrecision precision, spv::Id typeId, spv::Id resultId, int numOperands)
+{
+    // get the function operand
+    spv::Id functionId = spirv[word++];
+    numOperands -= 1;
+
+    // If it's a forward reference, we need to create the function based on the 
+    // signature of the call.  It is supposed to be validated to an exact match
+    // of the function type.
+    llvm::Function* function;
+    if (commonMap[functionId].value == 0) {
+        std::vector<spv::Id> argTypeIds;
+        for (int a = 0; a < numOperands; ++a)
+            argTypeIds.push_back(commonMap[spirv[word + a]].typeId);
+        function = makeFunction(functionId, typeId, argTypeIds);
+        commonMap[functionId].value = function;
+    } else {
+        // Grab the function's pointer from the previously created function
+        function = llvm::dyn_cast<llvm::Function>(commonMap[functionId].value);
+    }
+
+    if (! function)
+        gla::UnsupportedFunctionality("Call to undefined function");
+
+    // Note: All the difficult semantics for the various kinds of argument passing
+    // should have been handled when generating the SPIR-V, so at this point
+    // everything should simply be pass-by-copy.
+
+    // get the calling argument operands
+    llvm::SmallVector<llvm::Value*, 4> llvmArgs;
+    for (int op = 0; op < numOperands; ++op) {
+        spv::Id argId = spirv[word++];
+        llvmArgs.push_back(commonMap[argId].value);
+    }
+
+    // Make the call
+    return llvmBuilder.Insert(llvm::CallInst::Create(function, llvmArgs));
+}
+
+// Translate a SPIR-V OpExtInst to a Top IR call.
+llvm::Value* SpvToTopTranslator::createExternalInstruction(gla::EMdPrecision precision, spv::Id typeId, spv::Id resultId, int numOperands, const char* name)
+{
+    // get the set and instruction number
+    spv::Id extInstSet = spirv[word++];
+    unsigned int instEnum = spirv[word++];
+    numOperands -= 2;
+
+    // get the calling argument operands
+    bool firstHasSign = false;
+    bool firstIsFloat = false;
+    std::vector<llvm::Value*> operands;
+    for (int op = 0; op < numOperands; ++op) {
+        spv::Id argId = spirv[word++];
+        operands.push_back(commonMap[argId].value);
+        if (op == 0) {
+            firstIsFloat = gla::GetBasicTypeID(operands.front()) == llvm::Type::FloatTyID;
+            if (! firstIsFloat)
+                firstHasSign = commonMap[argId].metaType.layout == gla::EMtlNone;
+        }
+    }
+
+    llvm::Intrinsic::ID intrinsicID = llvm::Intrinsic::ID(0);
+    llvm::Value* result = 0;
+
+    switch (instEnum) {
+    case spv::GLSLstd450Round:
+        intrinsicID = llvm::Intrinsic::gla_fRoundFast;
+        break;
+    case spv::GLSLstd450RoundEven:
+        intrinsicID = llvm::Intrinsic::gla_fRoundEven;
+        break;
+    case spv::GLSLstd450Trunc:
+        intrinsicID = llvm::Intrinsic::gla_fRoundZero;
+        break;
+    case spv::GLSLstd450FAbs:
+        intrinsicID = llvm::Intrinsic::gla_fAbs;
+        break;
+    case spv::GLSLstd450SAbs:
+        intrinsicID = llvm::Intrinsic::gla_abs;
+        break;
+    case spv::GLSLstd450FSign:
+        intrinsicID = llvm::Intrinsic::gla_fSign;
+        break;
+    case spv::GLSLstd450SSign:
+        intrinsicID = llvm::Intrinsic::gla_sign;
+        break;
+    case spv::GLSLstd450Floor:
+        intrinsicID = llvm::Intrinsic::gla_fFloor;
+        break;
+    case spv::GLSLstd450Ceil:
+        intrinsicID = llvm::Intrinsic::gla_fCeiling;
+        break;
+    case spv::GLSLstd450Fract:
+        intrinsicID = llvm::Intrinsic::gla_fFraction;
+        break;
+    case spv::GLSLstd450Radians:
+        intrinsicID = llvm::Intrinsic::gla_fRadians;
+        break;
+    case spv::GLSLstd450Degrees:
+        intrinsicID = llvm::Intrinsic::gla_fDegrees;
+        break;
+    case spv::GLSLstd450Sin:
+        intrinsicID = llvm::Intrinsic::gla_fSin;
+        break;
+    case spv::GLSLstd450Cos:
+        intrinsicID = llvm::Intrinsic::gla_fCos;
+        break;
+    case spv::GLSLstd450Tan:
+        intrinsicID = llvm::Intrinsic::gla_fTan;
+        break;
+    case spv::GLSLstd450Asin:
+        intrinsicID = llvm::Intrinsic::gla_fAsin;
+        break;
+    case spv::GLSLstd450Acos:
+        intrinsicID = llvm::Intrinsic::gla_fAcos;
+        break;
+    case spv::GLSLstd450Atan:
+        intrinsicID = llvm::Intrinsic::gla_fAtan;
+        break;
+    case spv::GLSLstd450Sinh:
+        intrinsicID = llvm::Intrinsic::gla_fSinh;
+        break;
+    case spv::GLSLstd450Cosh:
+        intrinsicID = llvm::Intrinsic::gla_fCosh;
+        break;
+    case spv::GLSLstd450Tanh:
+        intrinsicID = llvm::Intrinsic::gla_fTanh;
+        break;
+    case spv::GLSLstd450Asinh:
+        intrinsicID = llvm::Intrinsic::gla_fAsinh;
+        break;
+    case spv::GLSLstd450Acosh:
+        intrinsicID = llvm::Intrinsic::gla_fAcosh;
+        break;
+    case spv::GLSLstd450Atanh:
+        intrinsicID = llvm::Intrinsic::gla_fAtanh;
+        break;
+    case spv::GLSLstd450Atan2:
+        intrinsicID = llvm::Intrinsic::gla_fAtan2;
+        break;
+    case spv::GLSLstd450Pow:
+        if (firstIsFloat)
+            intrinsicID = llvm::Intrinsic::gla_fPow;
+        else
+            intrinsicID = llvm::Intrinsic::gla_fPowi;
+        break;
+    case spv::GLSLstd450Exp:
+        intrinsicID = llvm::Intrinsic::gla_fExp;
+        break;
+    case spv::GLSLstd450Log:
+        intrinsicID = llvm::Intrinsic::gla_fLog;
+        break;
+    case spv::GLSLstd450Exp2:
+        intrinsicID = llvm::Intrinsic::gla_fExp2;
+        break;
+    case spv::GLSLstd450Log2:
+        intrinsicID = llvm::Intrinsic::gla_fLog2;
+        break;
+    case spv::GLSLstd450Sqrt:
+        intrinsicID = llvm::Intrinsic::gla_fSqrt;
+        break;
+    case spv::GLSLstd450InverseSqrt:
+        intrinsicID = llvm::Intrinsic::gla_fInverseSqrt;
+        break;
+    case spv::GLSLstd450Determinant:
+        return glaBuilder->createMatrixDeterminant(precision, operands.front());
+    case spv::GLSLstd450MatrixInverse:
+        return glaBuilder->createMatrixInverse(precision, operands.front());
+    case spv::GLSLstd450Modf:
+    {
+        // modf()'s second operand is an l-value that receives the integer part (the second return value)
+
+        // use a unary intrinsic form to make the call and get back the returned struct
+        llvm::Value* structure = glaBuilder->createIntrinsicCall(precision, llvm::Intrinsic::gla_fModF, operands.front());
+
+        // store integer part into second operand
+        llvm::Value* intPart = llvmBuilder.CreateExtractValue(structure, 1);
+        llvmBuilder.CreateStore(intPart, operands[1]);
+
+        // leave the first part as the function-call's value
+        return llvmBuilder.CreateExtractValue(structure, 0);
+    }
+    case spv::GLSLstd450FMin:
+        intrinsicID = llvm::Intrinsic::gla_fMin;
+        break;
+    case spv::GLSLstd450SMin:
+        intrinsicID = llvm::Intrinsic::gla_sMin;
+        break;
+    case spv::GLSLstd450UMin:
+        intrinsicID = llvm::Intrinsic::gla_uMin;
+        break;
+    case spv::GLSLstd450FMax:
+        intrinsicID = llvm::Intrinsic::gla_fMax;
+        break;
+    case spv::GLSLstd450SMax:
+        intrinsicID = llvm::Intrinsic::gla_sMax;
+        break;
+    case spv::GLSLstd450UMax:
+        intrinsicID = llvm::Intrinsic::gla_uMax;
+        break;
+    case spv::GLSLstd450FClamp:
+        intrinsicID = llvm::Intrinsic::gla_fClamp;
+        break;
+    case spv::GLSLstd450SClamp:
+        intrinsicID = llvm::Intrinsic::gla_sClamp;
+        break;
+    case spv::GLSLstd450UClamp:
+        intrinsicID = llvm::Intrinsic::gla_uClamp;
+        break;
+    case spv::GLSLstd450FMix:
+        intrinsicID = llvm::Intrinsic::gla_fMix;
+        break;
+    case spv::GLSLstd450IMix:
+        gla::UnsupportedFunctionality("integer mix");
+        break;
+    case spv::GLSLstd450Step:
+        intrinsicID = llvm::Intrinsic::gla_fStep;
+        break;
+    case spv::GLSLstd450SmoothStep:
+        intrinsicID = llvm::Intrinsic::gla_fSmoothStep;
+        break;
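+
+    // The cases below that only break (Fma, Frexp, Ldexp, the pack/unpack
+    // forms, etc.) are not yet implemented; leaving intrinsicID unset makes
+    // them fall through to the UnsupportedFunctionality("built-in call")
+    // report at the end of this function.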
+    case spv::GLSLstd450Fma:
+        break;
+    case spv::GLSLstd450Frexp:
+        break;
+    case spv::GLSLstd450Ldexp:
+        break;
+    case spv::GLSLstd450PackSnorm4x8:
+        break;
+    case spv::GLSLstd450PackUnorm4x8:
+        break;
+    case spv::GLSLstd450PackSnorm2x16:
+        break;
+    case spv::GLSLstd450PackUnorm2x16:
+        break;
+    case spv::GLSLstd450PackHalf2x16:
+        break;
+    case spv::GLSLstd450PackDouble2x32:
+        break;
+    case spv::GLSLstd450UnpackSnorm2x16:
+        break;
+    case spv::GLSLstd450UnpackUnorm2x16:
+        break;
+    case spv::GLSLstd450UnpackHalf2x16:
+        break;
+    case spv::GLSLstd450UnpackSnorm4x8:
+        break;
+    case spv::GLSLstd450UnpackUnorm4x8:
+        break;
+    case spv::GLSLstd450UnpackDouble2x32:
+        break;
+    case spv::GLSLstd450Length:
+        intrinsicID = llvm::Intrinsic::gla_fLength;
+        break;
+    case spv::GLSLstd450Distance:
+        intrinsicID = llvm::Intrinsic::gla_fDistance;
+        break;
+    case spv::GLSLstd450Cross:
+        intrinsicID = llvm::Intrinsic::gla_fCross;
+        break;
+    case spv::GLSLstd450Normalize:
+        intrinsicID = llvm::Intrinsic::gla_fNormalize;
+        break;
+    case spv::GLSLstd450FaceForward:
+        intrinsicID = llvm::Intrinsic::gla_fFaceForward;
+        break;
+    case spv::GLSLstd450Reflect:
+        intrinsicID = llvm::Intrinsic::gla_fReflect;
+        break;
+    case spv::GLSLstd450Refract:
+        intrinsicID = llvm::Intrinsic::gla_fRefract;
+        break;
+    case spv::GLSLstd450FindILsb:
+        break;
+    case spv::GLSLstd450FindSMsb:
+        break;
+    case spv::GLSLstd450FindUMsb:
+        break;
+    case spv::GLSLstd450InterpolateAtCentroid:
+        break;
+    case spv::GLSLstd450InterpolateAtSample:
+        break;
+    case spv::GLSLstd450InterpolateAtOffset:
+        break;
+    default:
+        break;
+    }
+
+    // If an intrinsic was assigned, call it and return the result.
+    if (intrinsicID != llvm::Intrinsic::ID(0)) {
+        switch (operands.size()) {
+        case 0:
+            result = glaBuilder->createIntrinsicCall(precision, intrinsicID);
+            break;
+        case 1:
+            result = glaBuilder->createIntrinsicCall(precision, intrinsicID, operands[0], name);
+            break;
+        case 2:
+            result = glaBuilder->createIntrinsicCall(precision, intrinsicID, operands[0], operands[1], name ? name : "misc2a");
+            break;
+        case 3:
+            result = glaBuilder->createIntrinsicCall(precision, intrinsicID, operands[0], operands[1], operands[2], name ? name : "misc3a");
+            break;
+        default:
+            // These do not exist yet
+            assert(0 && "intrinsic with more than 3 operands");
+            break;
+        }
+    }
+
+    if (result == 0)
+        gla::UnsupportedFunctionality("built-in call");
+
+    return result;
+}
+
+llvm::Value* SpvToTopTranslator::createUnaryOperation(spv::Op op, gla::EMdPrecision precision, llvm::Type* resultType, llvm::Value* operand, bool hasSign, bool reduceComparison)
+{
+    llvm::Instruction::CastOps castOp = llvm::Instruction::CastOps(0);
+    bool isFloat = gla::GetBasicTypeID(operand) == llvm::Type::FloatTyID;
+    bool comparison = false;
+
+    llvm::Intrinsic::ID intrinsicID = llvm::Intrinsic::ID(0);
+    llvm::Value* result = 0;
+
+    switch (op) {
+    case spv::OpSNegate:
+        result = llvmBuilder.CreateNeg(operand);
+        glaBuilder->setInstructionPrecision(result, precision);
+        return result;
+
+    case spv::OpFNegate:
+        if (gla::IsAggregate(operand)) {
+            // emulate by subtracting from 0.0
+            llvm::Value* zero = gla::MakeFloatConstant(context, 0.0);
+
+            return glaBuilder->createMatrixOp(precision, llvm::Instruction::FSub, zero, operand);
+        }
+
+        if (gla::GetBasicTypeID(operand) == llvm::Type::FloatTyID)
+            result = llvmBuilder.CreateFNeg(operand);
+        else
+            result = llvmBuilder.CreateNeg (operand);
+        glaBuilder->setInstructionPrecision(result, precision);
+        return result;
+
+    case spv::OpNot:
+    case spv::OpLogicalNot:
+        return llvmBuilder.CreateNot(operand);
+    case spv::OpAny:
+        intrinsicID = llvm::Intrinsic::gla_any;
+        break;
+    case spv::OpAll:
+        intrinsicID = llvm::Intrinsic::gla_all;
+        break;
+    case spv::OpConvertFToU:
+        castOp = llvm::Instruction::FPToUI;
+        break;
+    case spv::OpConvertFToS:
+        castOp = llvm::Instruction::FPToSI;
+        break;
+    case spv::OpConvertSToF:
+        castOp = llvm::Instruction::SIToFP;
+        break;
+    case spv::OpConvertUToF:
+        castOp = llvm::Instruction::UIToFP;
+        break;
+    case spv::OpUConvert:
+        castOp = llvm::Instruction::UIToFP;  // TODO: wrong; OpUConvert changes unsigned integer width (ZExt/Trunc), not int-to-float
+        break;
+    case spv::OpSConvert:
+        // TODO
+        break;
+    case spv::OpFConvert:
+        // TODO
+        break;
+    case spv::OpBitcast:
+        return llvmBuilder.CreateBitCast(operand, resultType);
+        break;
+    case spv::OpTranspose:
+        return glaBuilder->createMatrixTranspose(precision, operand);
+    case spv::OpIsNan:
+        intrinsicID = llvm::Intrinsic::gla_fIsNan;
+        break;
+    case spv::OpIsInf:
+        intrinsicID = llvm::Intrinsic::gla_fIsInf;
+        break;
+    case spv::OpArrayLength:
+        break;
+    case spv::OpDPdx:
+        intrinsicID = llvm::Intrinsic::gla_fDFdx;
+        break;
+    case spv::OpDPdy:
+        intrinsicID = llvm::Intrinsic::gla_fDFdy;
+        break;
+    case spv::OpFwidth:
+        intrinsicID = llvm::Intrinsic::gla_fFilterWidth;
+        break;
+    case spv::OpDPdxFine:
+        // TODO
+        break;
+    case spv::OpDPdyFine:
+        // TODO
+        break;
+    case spv::OpFwidthFine:
+        // TODO
+        break;
+    case spv::OpDPdxCoarse:
+        // TODO
+        break;
+    case spv::OpDPdyCoarse:
+        // TODO
+        break;
+    case spv::OpFwidthCoarse:
+        // TODO
+        break;
+    default:
+        break;
+    }
+
+    // If an intrinsic was assigned, call the Top IR intrinsic and return.
+    if (intrinsicID != llvm::Intrinsic::ID(0))
+        return glaBuilder->createIntrinsicCall(precision, intrinsicID, operand);
+
+    if (castOp) {
+        result = llvmBuilder.CreateCast(castOp, operand, resultType);
+        glaBuilder->setInstructionPrecision(result, precision);
+        return result;
+    }
+
+    return 0;
+}
+
+llvm::Value* SpvToTopTranslator::createBinaryOperation(spv::Op op, gla::EMdPrecision precision, llvm::Value* left, llvm::Value* right, bool hasSign, bool reduceComparison, const char* name)
+{
+    llvm::Instruction::BinaryOps binOp = llvm::Instruction::BinaryOps(0);
+    bool leftIsFloat = (gla::GetBasicTypeID(left) == llvm::Type::FloatTyID);
+    bool comparison = false;
+
+    llvm::Intrinsic::ID intrinsicID = llvm::Intrinsic::ID(0);
+
+    switch (op) {
+    case spv::OpFAdd:
+        binOp = llvm::Instruction::FAdd;
+        break;
+    case spv::OpIAdd:
+        binOp = llvm::Instruction::Add;
+        break;
+    case spv::OpFSub:
+        binOp = llvm::Instruction::FSub;
+        break;
+    case spv::OpISub:
+        binOp = llvm::Instruction::Sub;
+        break;
+    case spv::OpFMul:
+    case spv::OpVectorTimesScalar:
+        binOp = llvm::Instruction::FMul;
+        break;
+    case spv::OpIMul:
+        binOp = llvm::Instruction::Mul;
+        break;
+    case spv::OpMatrixTimesScalar:
+    case spv::OpVectorTimesMatrix:
+    case spv::OpMatrixTimesVector:
+    case spv::OpMatrixTimesMatrix:
+        return glaBuilder->createMatrixMultiply(precision, left, right);
+
+    case spv::OpDot:
+        switch (gla::GetComponentCount(left)) {
+        case 2:
+            intrinsicID = llvm::Intrinsic::gla_fDot2;
+            break;
+        case 3:
+            intrinsicID = llvm::Intrinsic::gla_fDot3;
+            break;
+        case 4:
+            intrinsicID = llvm::Intrinsic::gla_fDot4;
+            break;
+        default:
+            assert(! "bad component count for dot");
+            break;
+        }
+        break;
+
+    case spv::OpUDiv:
+        binOp = llvm::Instruction::UDiv;
+        break;
+    case spv::OpSDiv:
+        binOp = llvm::Instruction::SDiv;
+        break;
+    case spv::OpFDiv:
+        binOp = llvm::Instruction::FDiv;
+        break;
+    case spv::OpUMod:
+        binOp = llvm::Instruction::URem;
+        break;
+    case spv::OpSRem:
+        binOp = llvm::Instruction::SRem;
+        break;
+    case spv::OpSMod:
+        binOp = llvm::Instruction::SRem;  // TODO is this correct?
+        break;
+    case spv::OpFRem:
+        binOp = llvm::Instruction::FRem;
+        break;
+    case spv::OpFMod:
+        binOp = llvm::Instruction::FRem;  // TODO is this correct?
+        break;
+    case spv::OpShiftRightLogical:
+        binOp = llvm::Instruction::LShr;
+        break;
+    case spv::OpShiftRightArithmetic:
+        binOp = llvm::Instruction::AShr;
+        break;
+    case spv::OpShiftLeftLogical:
+        binOp = llvm::Instruction::Shl;
+        break;
+    case spv::OpLogicalAnd:
+    case spv::OpBitwiseAnd:
+        binOp = llvm::Instruction::And;
+        break;
+    case spv::OpLogicalOr:
+    case spv::OpBitwiseOr:
+        binOp = llvm::Instruction::Or;
+        break;
+    case spv::OpLogicalNotEqual:
+    case spv::OpBitwiseXor:
+        binOp = llvm::Instruction::Xor;
+        break;
+
+    case spv::OpIEqual:
+    case spv::OpFOrdEqual:
+    case spv::OpFUnordEqual:
+    case spv::OpINotEqual:
+    case spv::OpFOrdNotEqual:
+    case spv::OpFUnordNotEqual:
+    case spv::OpULessThan:
+    case spv::OpSLessThan:
+    case spv::OpFOrdLessThan:
+    case spv::OpFUnordLessThan:
+    case spv::OpUGreaterThan:
+    case spv::OpSGreaterThan:
+    case spv::OpFOrdGreaterThan:
+    case spv::OpFUnordGreaterThan:
+    case spv::OpULessThanEqual:
+    case spv::OpSLessThanEqual:
+    case spv::OpFOrdLessThanEqual:
+    case spv::OpFUnordLessThanEqual:
+    case spv::OpUGreaterThanEqual:
+    case spv::OpSGreaterThanEqual:
+    case spv::OpFOrdGreaterThanEqual:
+    case spv::OpFUnordGreaterThanEqual:
+        comparison = true;
+        break;
+    default:
+        break;
+    }
+
+    // If an intrinsic was assigned, call the Top IR intrinsic and return.
+    if (intrinsicID != llvm::Intrinsic::ID(0))
+        return glaBuilder->createIntrinsicCall(precision, intrinsicID, left, right, name ? name : "misc2a");
+
+    // If binOp was assigned, then make the LLVM instruction.
+    if (binOp != 0) {
+        if (gla::IsAggregate(left) || gla::IsAggregate(right))
+            return glaBuilder->createMatrixOp(precision, binOp, left, right);
+
+        glaBuilder->promoteScalar(precision, left, right);
+        llvm::Value* value = llvmBuilder.CreateBinOp(binOp, left, right);
+        glaBuilder->setInstructionPrecision(value, precision);
+
+        return value;
+    }
+
+    if (! comparison)
+        return 0;
+
+    // Comparison instructions
+
+    if (reduceComparison && (gla::IsVector(left) || gla::IsAggregate(left))) {
+        bool equal = false;
+        switch (op) {
+        case spv::OpIEqual:
+        case spv::OpFOrdEqual:
+        case spv::OpFUnordEqual:
+            equal = true;
+            break;
+        case spv::OpINotEqual:
+        case spv::OpFOrdNotEqual:
+        case spv::OpFUnordNotEqual:
+            equal = false;
+            break;
+        default:
+            gla::UnsupportedFunctionality("Comparison reduction");
+            break;
+        }
+
+        return glaBuilder->createCompare(precision, left, right, equal);
+    }
+        
+    llvm::FCmpInst::Predicate fpred = llvm::FCmpInst::Predicate(0);
+    llvm::ICmpInst::Predicate ipred = llvm::ICmpInst::Predicate(0);
+    switch (op) {
+    case spv::OpIEqual:
+        ipred = llvm::ICmpInst::ICMP_EQ;
+        break;
+    case spv::OpFOrdEqual:
+        fpred = llvm::FCmpInst::FCMP_OEQ;
+        break;
+    case spv::OpINotEqual:
+        ipred = llvm::ICmpInst::ICMP_NE;
+        break;
+    case spv::OpFOrdNotEqual:
+        fpred = llvm::FCmpInst::FCMP_ONE;
+        break;
+    case spv::OpULessThan:
+        ipred = llvm::ICmpInst::ICMP_ULT;
+        break;
+    case spv::OpSLessThan:
+        ipred = llvm::ICmpInst::ICMP_SLT;
+        break;
+    case spv::OpFOrdLessThan:
+        fpred = llvm::FCmpInst::FCMP_OLT;
+        break;
+    case spv::OpUGreaterThan:
+        ipred = llvm::ICmpInst::ICMP_UGT;
+        break;
+    case spv::OpSGreaterThan:
+        ipred = llvm::ICmpInst::ICMP_SGT;
+        break;
+    case spv::OpFOrdGreaterThan:
+        fpred = llvm::FCmpInst::FCMP_OGT;
+        break;
+    case spv::OpULessThanEqual:
+        ipred = llvm::ICmpInst::ICMP_ULE;
+        break;
+    case spv::OpSLessThanEqual:
+        ipred = llvm::ICmpInst::ICMP_SLE;
+        break;
+    case spv::OpFOrdLessThanEqual:
+        fpred = llvm::FCmpInst::FCMP_OLE;
+        break;
+    case spv::OpUGreaterThanEqual:
+        ipred = llvm::ICmpInst::ICMP_UGE;
+        break;
+    case spv::OpSGreaterThanEqual:
+        ipred = llvm::ICmpInst::ICMP_SGE;
+        break;
+    case spv::OpFOrdGreaterThanEqual:
+        fpred = llvm::FCmpInst::FCMP_OGE;
+        break;
+
+    case spv::OpFUnordEqual:
+    case spv::OpFUnordNotEqual:
+    case spv::OpFUnordLessThan:
+    case spv::OpFUnordGreaterThan:
+    case spv::OpFUnordLessThanEqual:
+    case spv::OpFUnordGreaterThanEqual:
+        gla::UnsupportedFunctionality("Unordered compare");
+        break;
+    default:
+        break;
+    }
+
+    if (fpred) {
+        llvm::Value* result = llvmBuilder.CreateFCmp(fpred, left, right);
+        glaBuilder->setInstructionPrecision(result, precision);
+
+        return result;
+    } else if (ipred != 0) {
+        llvm::Value* result = llvmBuilder.CreateICmp(ipred, left, right);
+        glaBuilder->setInstructionPrecision(result, precision);
+
+        return result;
+    }
+
+    return 0;
+}
+
+// Convert a SPIR-V image type to the corresponding LunarGLASS sampler type.
+gla::ESamplerType SpvToTopTranslator::convertImageType(spv::Id imageTypeId)
+{
+    switch (getImageDim(imageTypeId)) {
+    case spv::Dim1D:
+        return gla::ESampler1D;
+    case spv::Dim2D:
+        if (isImageMS(imageTypeId))
+            return gla::ESampler2DMS;
+        else
+            return gla::ESampler2D;
+    case spv::Dim3D:
+        return gla::ESampler3D;
+    case spv::DimCube:
+        return gla::ESamplerCube;
+    case spv::DimRect:
+        return gla::ESampler2DRect;
+    case spv::DimBuffer:        
+        return gla::ESamplerBuffer;
+    default:
+        gla::UnsupportedFunctionality("SPIR-V sampler dimensionality");
+        return gla::ESampler2D;
+    }
+}
+
+// Turn a SPIR-V OpImageSample* instruction into a gla texturing intrinsic.
+llvm::Value* SpvToTopTranslator::createSamplingCall(spv::Op opcode, spv::Id typeId, spv::Id resultId, int numOperands)
+{
+    // First, decode the instruction
+
+    // invariant fixed operands, sampled-image and coord
+    spv::Id sampledImageId = spirv[word++];
+    spv::Id coord = spirv[word++];
+    numOperands -= 2;
+
+    // Dref fixed operand
+    spv::Id dref = BadValue;
+    switch (opcode) {
+        case spv::OpImageSampleDrefImplicitLod:
+        case spv::OpImageSampleDrefExplicitLod:
+        case spv::OpImageSampleProjDrefImplicitLod:
+        case spv::OpImageSampleProjDrefExplicitLod:
+            dref = spirv[word++];
+            --numOperands;
+            break;
+        default:
+            break;
+    }
+
+    // optional SPV image operands
+    spv::Id bias = BadValue;
+    spv::Id lod = BadValue;
+    spv::Id gradx = BadValue;
+    spv::Id grady = BadValue;
+    spv::Id offset = BadValue;
+    spv::Id offsets = BadValue;
+    spv::Id sample = BadValue;
+    if (numOperands > 0) {
+        spv::ImageOperandsMask imageOperands = (spv::ImageOperandsMask)spirv[word++];
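+        // Per the SPIR-V spec, any selected optional operands follow the mask in increasing bit order.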
+        if (imageOperands & spv::ImageOperandsBiasMask)
+            bias = spirv[word++];
+        if (imageOperands & spv::ImageOperandsLodMask)
+            lod = spirv[word++];
+        if (imageOperands & spv::ImageOperandsGradMask) {
+            gradx = spirv[word++];
+            grady = spirv[word++];
+        }
+        if (imageOperands & spv::ImageOperandsOffsetMask ||
+            imageOperands & spv::ImageOperandsConstOffsetMask)
+            offset = spirv[word++];
+        if (imageOperands & spv::ImageOperandsConstOffsetsMask)
+            offsets = spirv[word++];
+        if (imageOperands & spv::ImageOperandsSampleMask)
+            sample = spirv[word++];
+    }
+
+    // Second, set up the information to pass to the builder:
+
+    // bias
+    int flags = 0;
+    gla::Builder::TextureParameters parameters = {};
+    if (bias != BadValue) {
+        flags |= gla::ETFBias;
+        parameters.ETPBiasLod = commonMap[bias].value;
+    }
+
+    // sampler and gla dimensionality
+    spv::Id sampledImageTypeId = commonMap[sampledImageId].typeId;
+    gla::ESamplerType glaSamplerType = convertImageType(getImageTypeId(sampledImageTypeId));
+    parameters.ETPSampler = commonMap[sampledImageId].value;
+
+    // coord
+    parameters.ETPCoords = commonMap[coord].value;
+
+    // lod
+    if (lod != BadValue) {
+        flags |= gla::ETFBiasLodArg;
+        flags |= gla::ETFLod;
+        parameters.ETPBiasLod = commonMap[lod].value;
+    }
+
+    // offset
+    if (offset != BadValue) {
+        flags |= gla::ETFOffsetArg;
+        parameters.ETPOffset = commonMap[offset].value;
+    }
+
+    // offsets
+    if (offsets != BadValue)
+        gla::UnsupportedFunctionality("image offsets");
+
+    // gradient
+    if (gradx != BadValue) {
+        parameters.ETPGradX = commonMap[gradx].value;
+        parameters.ETPGradY = commonMap[grady].value;
+    }
+
+    // proj
+    switch (opcode) {
+        case spv::OpImageSampleProjImplicitLod:
+        case spv::OpImageSampleProjExplicitLod:
+        case spv::OpImageSampleProjDrefImplicitLod:
+        case spv::OpImageSampleProjDrefExplicitLod:
+            flags |= gla::ETFProjected;
+            break;
+        default:
+            break;
+    }
+
+    // Dref
+    if (dref != BadValue) {
+        // Note: in most cases the reference value needs to be packed back into
+        // the coordinate, and the coordinate coming from the glslang front end
+        // already holds it.
+        // TODO: do this correctly, in the general case.
+
+        if (false /*! cube-arrayed*/) {
+            flags |= gla::ETFRefZArg;
+            parameters.ETPShadowRef = commonMap[dref].value;
+        }
+    }
+
+    return glaBuilder->createTextureCall(commonMap[resultId].metaType.precision, commonMap[typeId].type, glaSamplerType, flags, parameters, findAName(resultId));
+}
+
+// Turn a SPIR-V OpImageQuery* instruction into a gla texture-query intrinsic.
+llvm::Value* SpvToTopTranslator::createTextureQueryCall(spv::Op opcode, spv::Id typeId, spv::Id resultId, int numOperands)
+{
+    // first operand is always the sampler
+    spv::Id image = spirv[word++];
+
+    // sometimes there is another operand, either lod or coords
+    spv::Id extraArg = 0;
+    if (numOperands > 1)
+        extraArg = spirv[word++];
+
+    llvm::Intrinsic::ID intrinsicID = llvm::Intrinsic::ID(0);
+    switch (opcode) {
+    case spv::OpImageQuerySizeLod:
+        intrinsicID = llvm::Intrinsic::gla_queryTextureSize;
+        break;
+    case spv::OpImageQuerySize:
+        intrinsicID = llvm::Intrinsic::gla_queryTextureSizeNoLod;
+        break;
+    case spv::OpImageQueryLod:
+        intrinsicID = llvm::Intrinsic::gla_fQueryTextureLod;
+        break;
+    case spv::OpImageQueryLevels:
+    case spv::OpImageQuerySamples:
+    default:
+        gla::UnsupportedFunctionality("SPIR-V texture query op");
+        break;
+    }
+
+    gla::ESamplerType glaSamplerType = convertImageType(commonMap[image].typeId);
+
+    return glaBuilder->createTextureQueryCall(gla::EMpNone, intrinsicID, commonMap[typeId].type, gla::MakeIntConstant(context, glaSamplerType),
+                                              commonMap[image].value, commonMap[extraArg].value, findAName(resultId));
+}
+
+// Translate a SPIR-V OpCompositeConstruct into Top IR construction/insert operations.
+llvm::Value* SpvToTopTranslator::createConstructor(spv::Id resultId, spv::Id typeId, std::vector<llvm::Value*> constituents)
+{
+    llvm::Value* result;
+
+    if (gla::IsVector(commonMap[typeId].type)) {
+        // handle vectors differently, as they don't have a 1:1 mapping between constituents and components
+        result = llvm::UndefValue::get(commonMap[typeId].type);
+        result = glaBuilder->createConstructor(commonMap[resultId].metaType.precision, constituents, result);
+    } else {
+        // everything else (matrices, arrays, structs) should have a 1:1 mapping between constituents and components
+        result = llvm::UndefValue::get(commonMap[typeId].type);
+        for (unsigned i = 0; i < constituents.size(); ++i)
+            result = llvmBuilder.CreateInsertValue(result, constituents[i], i);
+    }
+
+    return result;
+}
+
+spv::Op SpvToTopTranslator::getOpCode(spv::Id id) const
+{
+    return GetOpCode(spirv[commonMap[id].instructionIndex]);
+}
+
+spv::Id SpvToTopTranslator::dereferenceTypeId(spv::Id typeId) const
+{
+    // Look back at the OpTypePointer instruction
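+    // (word 1 is the result id, word 2 the storage class, word 3 the pointee type)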
+    const int OpTypePointerIdOffset = 3;
+    return spirv[commonMap[typeId].instructionIndex + OpTypePointerIdOffset];
+}
+
+spv::Id SpvToTopTranslator::getArrayElementTypeId(spv::Id typeId) const
+{
+    // Look back at the OpType* instruction
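+    // (for OpTypeArray, word 1 is the result id and word 2 the element type)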
+    const int OpTypeArrayElementTypeOffset = 2;
+    while (getOpCode(typeId) == spv::OpTypeArray)
+        typeId = spirv[commonMap[typeId].instructionIndex + OpTypeArrayElementTypeOffset];
+
+    return typeId;
+}
+
+spv::Id SpvToTopTranslator::getStructMemberTypeId(spv::Id typeId, int member) const
+{
+    // Look back at the OpType* instruction
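+    // (for OpTypeStruct, word 1 is the result id and the member types start at word 2)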
+    const int OpTypeStructMember0Offset = 2;
+    return spirv[commonMap[typeId].instructionIndex + OpTypeStructMember0Offset + member];
+}
+
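+// The image-type accessors below rely on the fixed SPIR-V operand layout:
+// OpTypeSampledImage holds its underlying image type in word 2, while
+// OpTypeImage holds 'Sampled Type' in word 2, 'Dim' in word 3, 'Depth' in
+// word 4, 'Arrayed' in word 5, and 'MS' in word 6, hence the constant offsets.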
+spv::Id SpvToTopTranslator::getImageTypeId(spv::Id sampledImageTypeId) const
+{
+    return spirv[commonMap[sampledImageTypeId].instructionIndex + 2];
+}
+
+// Lookup the 'Sampled Type' operand in the image type
+spv::Id SpvToTopTranslator::getImageSampledType(spv::Id typeId) const
+{
+    if (commonMap[typeId].metaType.combinedImageSampler)
+        typeId = getImageTypeId(typeId);
+
+    return spirv[commonMap[typeId].instructionIndex + 2];
+}
+
+// Lookup the 'Dim' operand in the image type
+spv::Dim SpvToTopTranslator::getImageDim(spv::Id typeId) const
+{
+    if (commonMap[typeId].metaType.combinedImageSampler)
+        typeId = getImageTypeId(typeId);
+
+    return (spv::Dim)spirv[commonMap[typeId].instructionIndex + 3];
+}
+
+// Lookup the 'depth' operand in the image type
+bool SpvToTopTranslator::isImageDepth(spv::Id typeId) const
+{
+    if (commonMap[typeId].metaType.combinedImageSampler)
+        typeId = getImageTypeId(typeId);
+
+    return spirv[commonMap[typeId].instructionIndex + 4] != 0;
+}
+
+// Lookup the 'arrayed' operand in the image type
+bool SpvToTopTranslator::isImageArrayed(spv::Id typeId) const
+{
+    if (commonMap[typeId].metaType.combinedImageSampler)
+        typeId = getImageTypeId(typeId);
+
+    return spirv[commonMap[typeId].instructionIndex + 5] != 0;
+}
+
+// Lookup the 'MS' operand in the image type
+bool SpvToTopTranslator::isImageMS(spv::Id typeId) const
+{
+    if (commonMap[typeId].metaType.combinedImageSampler)
+        typeId = getImageTypeId(typeId);
+
+    return spirv[commonMap[typeId].instructionIndex + 6] != 0;
+}
+
+bool SpvToTopTranslator::inEntryPoint()
+{
+    return commonMap[currentFunction].entryPoint != BadValue;
+}
+
+void SpvToTopTranslator::makeLabelBlock(spv::Id labelId)
+{
+    if (commonMap[labelId].block == 0)
+        commonMap[labelId].block = llvm::BasicBlock::Create(context, "L", commonMap[currentFunction].function);
+}
+
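+// Translate a SPIR-V OpSwitch: after the selector and the default label, the
+// remaining operands are (literal value, target label) pairs.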
+void SpvToTopTranslator::createSwitch(int numOperands)
+{
+    spv::Id selectorId = spirv[word++];
+    spv::Id defaultId = spirv[word++];
+    numOperands -= 2;
+
+    assert((numOperands & 1) == 0);
+    int numSegments = numOperands / 2;
+
+    makeLabelBlock(defaultId);
+    llvm::SwitchInst* switchInst = llvmBuilder.CreateSwitch(commonMap[selectorId].value, commonMap[defaultId].block, numSegments);
+    for (int c = 0; c < numSegments; ++c) {
+        unsigned int value = spirv[word++];
+        spv::Id labelId = spirv[word++];
+        makeLabelBlock(labelId);
+        switchInst->addCase(llvm::ConstantInt::get(llvm::Type::getInt32Ty(context), value), commonMap[labelId].block);
+    }
+}
+
+};  // end anonymous namespace
+
+namespace gla {
+
+// Translate SPIR-V to LunarGLASS Top IR
+void SpvToTop(const std::vector<unsigned int>& spirv, gla::Manager& manager)
+{
+    manager.createContext();
+    llvm::Module* topModule = new llvm::Module("SPIR-V", manager.getContext());
+    manager.setModule(topModule);
+
+    SpvToTopTranslator translator(spirv, manager);
+    translator.makeTop();
+}
+
+}; // end gla namespace
diff --git a/LunarGLASS_revision b/LunarGLASS_revision
new file mode 100644
index 0000000..9018d3b
--- /dev/null
+++ b/LunarGLASS_revision
@@ -0,0 +1 @@
+31a3529ce61152a35d283b55abc5e06614fdcb5d
diff --git a/README.md b/README.md
index deaa4f1..845d761 100644
--- a/README.md
+++ b/README.md
@@ -1,36 +1,31 @@
 # Vulkan Ecosystem Components
 
-This project provides Khronos official ICD loader and validation layers for Vulkan developers on Windows and Linux.
+This project provides the vktrace capture/replay tool, the Intel Ilo sample driver, and other layer tools and driver tests.
 
 ## Introduction
 
-Vulkan is an Explicit API, enabling direct control over how GPUs actually work. No (or very little) validation
-or error checking is done inside a Vulkan driver. Applications have full control and responsibility. Any errors in
-how Vulkan is used often result in a crash. This project provides standard validation layers that can be enabled
-to ease development by helping developers verify their applications correctly use the Vulkan API.
+Branches within this repository include the Vulkan loader, validation layers, header files, and associated tests. These pieces are mirrored from the following GitHub repository:
+https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers
+These pieces are required so that this repository can be built standalone, that is, without having to clone the Vulkan-LoaderAndValidationLayers repository.
 
-Vulkan supports multiple GPUs and multiple global contexts (VkInstance). The ICD loader is necessary to
-support multiple GPUs  and the VkInstance level Vulkan commands.  Additionally, the loader manages inserting
-Vulkan layer libraries, including validation layers between the application and the ICD.
-
-The following components are available in this repository:
-- Vulkan header files
-- [*ICD Loader*](loader/)
-- [*Validation Layers*](layers/)
-- Demos and tests for the loader and validation layers
+The following components are available in this repository over and above what is mirrored from the Vulkan-LoaderAndValidationLayers repository:
+- Api_dump, screenshot, and example layers (layersvt/)
+- Intel sample driver and null driver (icd/)
+- Tests for the Intel Ilo sample driver (tests/)
+- vktrace and vkreplay, API capture and replay (vktrace/)
 
 ## Contributing
 
 If you intend to contribute, the preferred work flow is for you to develop your contribution
 in a fork of this repo in your GitHub account and then submit a pull request.
-Please see the [CONTRIBUTING](CONTRIBUTING.md) file in this respository for more details
+Please see the [CONTRIBUTING](CONTRIBUTING.md) file in this repository for more details.
 
 ## How to Build and Run
 
-[BUILD.md](BUILD.md)
-includes directions for building all the components, running the validation tests and running the demo applications.
+[BUILDVT.md](BUILDVT.md)
+includes directions for building all the components, running the tests, and running the demo applications.
 
-Information on how to enable the various Validation layers is in
+Information on how to enable the various layers is in
 [layers/README.md](layers/README.md).
 
 Architecture and interface information for the loader is in
@@ -55,4 +50,3 @@
 project development; Google providing significant contributions to the validation layers;
 Khronos providing oversight and hosting of the project.
 
-
diff --git a/api_dump_generator.py b/api_dump_generator.py
new file mode 100644
index 0000000..9f0381d
--- /dev/null
+++ b/api_dump_generator.py
@@ -0,0 +1,1370 @@
+#!/usr/bin/python3 -i
+#
+# Copyright (c) 2015-2016 Valve Corporation
+# Copyright (c) 2015-2016 LunarG, Inc.
+# Copyright (c) 2015-2016 Google Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Author: Lenny Komow <lenny@lunarg.com>
+#
+# The API dump layer works by passing custom format strings to the ApiDumpGenerator. These format
+# strings are C++ code, with 3-ish exceptions:
+#   * Anything beginning with @ will be expanded by the ApiDumpGenerator. These are used to allow
+#       iteration over various items within the Vulkan spec, such as functions, enums, etc.
+#   * Anything surrounded by { and } will be substituted when the ApiDumpGenerator expands the @
+#       directives. This gives a way to get things like data types or names for anything that can
+#       be iterated over in an @ directive.
+#   * Curly braces must be doubled like {{ for a single curly brace to appear in the output code.
+#
+# The API dump uses separate format strings for each output file, but passes them to a common
+# generator. This allows greater flexibility, as changing the output codegen means just changing
+# the corresponding format string.
+#
+# Currently, the API dump layer generates the following files from the following strings:
+#   * api_dump.cpp: COMMON_CODEGEN - Provides all entrypoints for functions and dispatches the calls
+#       to the proper back end
+#   * api_dump_text.h: TEXT_CODEGEN - Provides the back end for dumping to a text file
+#
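+# As a minimal illustration (a hypothetical snippet, not one of the real
+# templates below), the format string:
+#
+#   @foreach function where('{funcReturn}' != 'void')
+#   {funcReturn} {funcName}({funcTypedParams});
+#   @end function
+#
+# would be expanded once per non-void function in the registry, substituting
+# each function's return type, name, and typed parameter list. A literal curly
+# brace in the generated code would be written as {{ or }}.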
+
+import generator as gen
+import re
+import sys
+import xml.etree
+
+COMMON_CODEGEN = """
+/* Copyright (c) 2015-2016 Valve Corporation
+ * Copyright (c) 2015-2016 LunarG, Inc.
+ * Copyright (c) 2015-2016 Google Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Lenny Komow <lenny@lunarg.com>
+ */
+ 
+/*
+ * This file is generated from the Khronos Vulkan XML API Registry.
+ */
+ 
+#include "api_dump_text.h"
+
+//============================= Dump Functions ==============================//
+
+@foreach function where('{funcReturn}' != 'void')
+inline void dump_{funcName}(ApiDumpInstance& dump_inst, {funcReturn} result, {funcTypedParams})
+{{
+    loader_platform_thread_lock_mutex(dump_inst.outputMutex());
+    switch(dump_inst.settings().format())
+    {{
+    case ApiDumpFormat::Text:
+        dump_text_{funcName}(dump_inst, result, {funcNamedParams});
+        break;
+    }}
+    loader_platform_thread_unlock_mutex(dump_inst.outputMutex());
+}}
+@end function
+
+@foreach function where('{funcReturn}' == 'void')
+inline void dump_{funcName}(ApiDumpInstance& dump_inst, {funcTypedParams})
+{{
+    loader_platform_thread_lock_mutex(dump_inst.outputMutex());
+    switch(dump_inst.settings().format())
+    {{
+    case ApiDumpFormat::Text:
+        dump_text_{funcName}(dump_inst, {funcNamedParams});
+        break;
+    }}
+    loader_platform_thread_unlock_mutex(dump_inst.outputMutex());
+}}
+@end function
+
+//============================= API EntryPoints =============================//
+
+// Specifically implemented functions
+
+@foreach function where('{funcName}' == 'vkCreateInstance')
+VK_LAYER_EXPORT VKAPI_ATTR {funcReturn} VKAPI_CALL {funcName}({funcTypedParams})
+{{
+    // Get the function pointer
+    VkLayerInstanceCreateInfo* chain_info = get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);
+    assert(chain_info->u.pLayerInfo != 0);
+    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr = chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;
+    assert(fpGetInstanceProcAddr != 0);
+    PFN_vkCreateInstance fpCreateInstance = (PFN_vkCreateInstance) fpGetInstanceProcAddr(NULL, "vkCreateInstance");
+    if(fpCreateInstance == NULL) {{
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }}
+    
+    // Call the function and create the dispatch table
+    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;
+    {funcReturn} result = fpCreateInstance({funcNamedParams});
+    if(result == VK_SUCCESS) {{
+        initInstanceTable(*pInstance, fpGetInstanceProcAddr);
+    }}
+
+    // Output the API dump
+    dump_{funcName}(ApiDumpInstance::current(), result, {funcNamedParams});
+    return result;
+}}
+@end function
+
+@foreach function where('{funcName}' == 'vkDestroyInstance')
+VK_LAYER_EXPORT VKAPI_ATTR {funcReturn} VKAPI_CALL {funcName}({funcTypedParams})
+{{
+    // Destroy the dispatch table
+    dispatch_key key = get_dispatch_key({funcDispatchParam});
+    instance_dispatch_table({funcDispatchParam})->DestroyInstance({funcNamedParams});
+    destroy_instance_dispatch_table(key);
+    
+    // Output the API dump
+    dump_{funcName}(ApiDumpInstance::current(), {funcNamedParams});
+}}
+@end function
+
+@foreach function where('{funcName}' == 'vkCreateDevice')
+VK_LAYER_EXPORT VKAPI_ATTR {funcReturn} VKAPI_CALL {funcName}({funcTypedParams})
+{{
+    // Get the function pointer
+    VkLayerDeviceCreateInfo* chain_info = get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);
+    assert(chain_info->u.pLayerInfo != 0);
+    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr = chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;
+    PFN_vkGetDeviceProcAddr fpGetDeviceProcAddr = chain_info->u.pLayerInfo->pfnNextGetDeviceProcAddr;
+    PFN_vkCreateDevice fpCreateDevice = (PFN_vkCreateDevice) fpGetInstanceProcAddr(NULL, "vkCreateDevice");
+    if(fpCreateDevice == NULL) {{
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }}
+    
+    // Call the function and create the dispatch table
+    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;
+    {funcReturn} result = fpCreateDevice({funcNamedParams});
+    if(result == VK_SUCCESS) {{
+        initDeviceTable(*pDevice, fpGetDeviceProcAddr);
+    }}
+    
+    // Output the API dump
+    dump_{funcName}(ApiDumpInstance::current(), result, {funcNamedParams});
+    return result;
+}}
+@end function
+
+@foreach function where('{funcName}' == 'vkDestroyDevice')
+VK_LAYER_EXPORT VKAPI_ATTR {funcReturn} VKAPI_CALL {funcName}({funcTypedParams})
+{{
+    // Destroy the dispatch table
+    dispatch_key key = get_dispatch_key({funcDispatchParam});
+    device_dispatch_table({funcDispatchParam})->DestroyDevice({funcNamedParams});
+    destroy_device_dispatch_table(key);
+    
+    // Output the API dump
+    dump_{funcName}(ApiDumpInstance::current(), {funcNamedParams});
+}}
+@end function
+
+@foreach function where('{funcName}' == 'vkEnumerateInstanceExtensionProperties')
+VK_LAYER_EXPORT VKAPI_ATTR {funcReturn} VKAPI_CALL {funcName}({funcTypedParams})
+{{
+    return util_GetExtensionProperties(0, NULL, pPropertyCount, pProperties);
+}}
+@end function
+
+@foreach function where('{funcName}' == 'vkEnumerateInstanceLayerProperties')
+VK_LAYER_EXPORT VKAPI_ATTR {funcReturn} VKAPI_CALL {funcName}({funcTypedParams})
+{{
+    static const VkLayerProperties layerProperties[] = {{
+        {{
+            "VK_LAYER_LUNARG_api_dump",
+            VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION), // specVersion
+            VK_MAKE_VERSION(0, 2, 0), // implementationVersion
+            "layer: api_dump",
+        }}
+    }};
+    
+    return util_GetLayerProperties(ARRAY_SIZE(layerProperties), layerProperties, pPropertyCount, pProperties);
+}}
+@end function
+
+@foreach function where('{funcName}' == 'vkEnumerateDeviceLayerProperties')
+VK_LAYER_EXPORT VKAPI_ATTR {funcReturn} VKAPI_CALL {funcName}({funcTypedParams})
+{{
+    static const VkLayerProperties layerProperties[] = {{
+        {{
+            "VK_LAYER_LUNARG_api_dump",
+            VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION),
+            VK_MAKE_VERSION(0, 2, 0),
+            "layer: api_dump",
+        }}
+    }};
+    
+    return util_GetLayerProperties(ARRAY_SIZE(layerProperties), layerProperties, pPropertyCount, pProperties);
+}}
+@end function
+
+@foreach function where('{funcName}' == 'vkQueuePresentKHR')
+VK_LAYER_EXPORT VKAPI_ATTR {funcReturn} VKAPI_CALL {funcName}({funcTypedParams})
+{{
+    {funcReturn} result = device_dispatch_table({funcDispatchParam})->{funcShortName}({funcNamedParams});
+    dump_{funcName}(ApiDumpInstance::current(), result, {funcNamedParams});
+    ApiDumpInstance::current().nextFrame();
+    return result;
+}}
+@end function
+
+// Autogen instance functions
+
+@foreach function where('{funcType}' == 'instance' and '{funcReturn}' != 'void' and '{funcName}' not in ['vkCreateInstance', 'vkDestroyInstance', 'vkCreateDevice', 'vkGetInstanceProcAddr', 'vkEnumerateDeviceExtensionProperties', 'vkEnumerateDeviceLayerProperties'])
+VK_LAYER_EXPORT VKAPI_ATTR {funcReturn} VKAPI_CALL {funcName}({funcTypedParams})
+{{
+    {funcReturn} result = instance_dispatch_table({funcDispatchParam})->{funcShortName}({funcNamedParams});
+    dump_{funcName}(ApiDumpInstance::current(), result, {funcNamedParams});
+    return result;
+}}
+@end function
+
+@foreach function where('{funcType}' == 'instance' and '{funcReturn}' == 'void' and '{funcName}' not in ['vkCreateInstance', 'vkDestroyInstance', 'vkCreateDevice', 'vkGetInstanceProcAddr', 'vkEnumerateDeviceExtensionProperties', 'vkEnumerateDeviceLayerProperties'])
+VK_LAYER_EXPORT VKAPI_ATTR {funcReturn} VKAPI_CALL {funcName}({funcTypedParams})
+{{
+    instance_dispatch_table({funcDispatchParam})->{funcShortName}({funcNamedParams});
+    dump_{funcName}(ApiDumpInstance::current(), {funcNamedParams});
+}}
+@end function
+
+// Autogen device functions
+
+@foreach function where('{funcType}' == 'device' and '{funcReturn}' != 'void' and '{funcName}' not in ['vkDestroyDevice', 'vkEnumerateInstanceExtensionProperties', 'vkEnumerateInstanceLayerProperties', 'vkQueuePresentKHR', 'vkGetDeviceProcAddr'])
+VK_LAYER_EXPORT VKAPI_ATTR {funcReturn} VKAPI_CALL {funcName}({funcTypedParams})
+{{
+    {funcReturn} result = device_dispatch_table({funcDispatchParam})->{funcShortName}({funcNamedParams});
+    dump_{funcName}(ApiDumpInstance::current(), result, {funcNamedParams});
+    return result;
+}}
+@end function
+
+@foreach function where('{funcType}' == 'device' and '{funcReturn}' == 'void' and '{funcName}' not in ['vkDestroyDevice', 'vkEnumerateInstanceExtensionProperties', 'vkEnumerateInstanceLayerProperties', 'vkGetDeviceProcAddr'])
+VK_LAYER_EXPORT VKAPI_ATTR {funcReturn} VKAPI_CALL {funcName}({funcTypedParams})
+{{
+    device_dispatch_table({funcDispatchParam})->{funcShortName}({funcNamedParams});
+    dump_{funcName}(ApiDumpInstance::current(), {funcNamedParams});
+}}
+@end function
+
+VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL vkGetInstanceProcAddr(VkInstance instance, const char* pName)
+{{
+    @foreach function where('{funcType}' == 'instance'  and '{funcName}' not in [ 'vkEnumerateDeviceExtensionProperties' ])
+    if(strcmp(pName, "{funcName}") == 0)
+        return reinterpret_cast<PFN_vkVoidFunction>({funcName});
+    @end function
+    
+    if(instance_dispatch_table(instance)->GetInstanceProcAddr == NULL)
+        return NULL;
+    return instance_dispatch_table(instance)->GetInstanceProcAddr(instance, pName);
+}}
+
+VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL vkGetDeviceProcAddr(VkDevice device, const char* pName)
+{{
+    @foreach function where('{funcType}' == 'device')
+    if(strcmp(pName, "{funcName}") == 0)
+        return reinterpret_cast<PFN_vkVoidFunction>({funcName});
+    @end function
+    
+    if(device_dispatch_table(device)->GetDeviceProcAddr == NULL)
+        return NULL;
+    return device_dispatch_table(device)->GetDeviceProcAddr(device, pName);
+}}
+"""
+
+TEXT_CODEGEN = """
+/* Copyright (c) 2015-2016 Valve Corporation
+ * Copyright (c) 2015-2016 LunarG, Inc.
+ * Copyright (c) 2015-2016 Google Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Lenny Komow <lenny@lunarg.com>
+ */
+ 
+/*
+ * This file is generated from the Khronos Vulkan XML API Registry.
+ */
+ 
+#pragma once
+ 
+#include "api_dump.h"
+
+@foreach struct
+std::ostream& dump_text_{sctName}(const {sctName}& object, const ApiDumpSettings& settings, int indents);
+@end struct
+@foreach union
+std::ostream& dump_text_{unName}(const {unName}& object, const ApiDumpSettings& settings, int indents);
+@end union
+
+//=========================== Type Implementations ==========================//
+
+@foreach type where('{etyName}' != 'void')
+inline std::ostream& dump_text_{etyName}({etyName} object, const ApiDumpSettings& settings, int indents)
+{{
+    @if('{etyName}' != 'uint8_t')
+    return settings.stream() << object;
+    @end if
+    @if('{etyName}' == 'uint8_t')
+    return settings.stream() << (uint32_t) object;
+    @end if
+}}
+@end type
+
+//========================= Basetype Implementations ========================//
+
+@foreach basetype
+inline std::ostream& dump_text_{baseName}({baseName} object, const ApiDumpSettings& settings, int indents)
+{{
+    return settings.stream() << object;
+}}
+@end basetype
+
+//======================= System Type Implementations =======================//
+
+@foreach systype
+inline std::ostream& dump_text_{sysName}(const {sysType} object, const ApiDumpSettings& settings, int indents)
+{{
+    return settings.stream() << object;
+}}
+@end systype
+
+//========================== Handle Implementations =========================//
+
+@foreach handle
+inline std::ostream& dump_text_{hdlName}(const {hdlName} object, const ApiDumpSettings& settings, int indents)
+{{
+    if(settings.showAddress())
+        return settings.stream() << object;
+    else
+        return settings.stream() << "address";
+}}
+@end handle
+
+//=========================== Enum Implementations ==========================//
+
+@foreach enum
+std::ostream& dump_text_{enumName}({enumName} object, const ApiDumpSettings& settings, int indents)
+{{
+    switch((int64_t) object)
+    {{
+    @foreach option
+    case {optValue}:
+        settings.stream() << "{optName} (";
+        break;
+    @end option
+    default:
+        settings.stream() << "UNKNOWN (";
+    }}
+    return settings.stream() << object << ")";
+}}
+@end enum
+
+//========================= Bitmask Implementations =========================//
+
+@foreach bitmask
+std::ostream& dump_text_{bitName}({bitName} object, const ApiDumpSettings& settings, int indents)
+{{
+    bool is_first = true;
+    //settings.formatNameType(stream, indents, name, type_string) << object;
+    settings.stream() << object;
+    @foreach option
+    if(object & {optValue})
+        is_first = dump_text_bitmaskOption("{optName}", settings.stream(), is_first);
+    @end option
+    if(!is_first)
+        settings.stream() << ")";
+    return settings.stream();
+}}
+@end bitmask
+
+//=========================== Flag Implementations ==========================//
+
+@foreach flag where('{flagEnum}' != 'None')
+inline std::ostream& dump_text_{flagName}({flagName} object, const ApiDumpSettings& settings, int indents)
+{{
+    return dump_text_{flagEnum}(({flagEnum}) object, settings, indents);
+}}
+@end flag
+@foreach flag where('{flagEnum}' == 'None')
+inline std::ostream& dump_text_{flagName}({flagName} object, const ApiDumpSettings& settings, int indents)
+{{
+    return settings.stream() << object;
+}}
+@end flag
+
+//======================= Func Pointer Implementations ======================//
+
+@foreach funcpointer
+inline std::ostream& dump_text_{pfnName}({pfnName} object, const ApiDumpSettings& settings, int indents)
+{{
+    if(settings.showAddress())
+        return settings.stream() << object;
+    else
+        return settings.stream() << "address";
+}}
+@end funcpointer
+
+//========================== Struct Implementations =========================//
+
+@foreach struct where('{sctName}' != 'VkShaderModuleCreateInfo')
+std::ostream& dump_text_{sctName}(const {sctName}& object, const ApiDumpSettings& settings, int indents)
+{{
+    if(settings.showAddress())
+        settings.stream() << &object << ":\\n";
+    else
+        settings.stream() << "address:\\n";
+    
+    @foreach member
+    @if('{memCondition}' != 'None')
+    if({memCondition})
+    @end if
+    
+    @if({memPtrLevel} == 0)
+    dump_text_value<const {memBaseType}>(object.{memName}, settings, "{memType}", "{memName}", indents + 1, dump_text_{memTypeID});
+    @end if
+    @if({memPtrLevel} == 1 and '{memLength}' == 'None')
+    dump_text_pointer<const {memBaseType}>(object.{memName}, settings, "{memType}", "{memName}", indents + 1, dump_text_{memTypeID});
+    @end if
+    @if({memPtrLevel} == 1 and '{memLength}' != 'None' and not {memLengthIsMember})
+    dump_text_array<const {memBaseType}>(object.{memName}, {memLength}, settings, "{memType}", "{memChildType}", "{memName}", indents + 1, dump_text_{memTypeID});
+    @end if
+    @if({memPtrLevel} == 1 and '{memLength}' != 'None' and {memLengthIsMember})
+    dump_text_array<const {memBaseType}>(object.{memName}, object.{memLength}, settings, "{memType}", "{memChildType}", "{memName}", indents + 1, dump_text_{memTypeID});
+    @end if
+    
+    @if('{memCondition}' != 'None')
+    else
+        dump_text_special("UNUSED", settings, "{memType}", "{memName}", indents + 1);
+    @end if
+    @end member
+    return settings.stream();
+}}
+@end struct
+
+@foreach struct where('{sctName}' == 'VkShaderModuleCreateInfo')
+std::ostream& dump_text_{sctName}(const {sctName}& object, const ApiDumpSettings& settings, int indents)
+{{
+    if(settings.showAddress())
+        settings.stream() << &object << ":\\n";
+    else
+        settings.stream() << "address:\\n";
+        
+    @foreach member
+    @if('{memCondition}' != 'None')
+    if({memCondition})
+    @end if
+    
+    @if({memPtrLevel} == 0)
+    dump_text_value<const {memBaseType}>(object.{memName}, settings, "{memType}", "{memName}", indents + 1, dump_text_{memTypeID});
+    @end if
+    @if({memPtrLevel} == 1 and '{memLength}' == 'None')
+    dump_text_pointer<const {memBaseType}>(object.{memName}, settings, "{memType}", "{memName}", indents + 1, dump_text_{memTypeID});
+    @end if
+    @if({memPtrLevel} == 1 and '{memLength}' != 'None' and not {memLengthIsMember} and '{memName}' != 'pCode')
+    dump_text_array<const {memBaseType}>(object.{memName}, {memLength}, settings, "{memType}", "{memChildType}", "{memName}", indents + 1, dump_text_{memTypeID});
+    @end if
+    @if({memPtrLevel} == 1 and '{memLength}' != 'None' and {memLengthIsMember} and '{memName}' != 'pCode')
+    dump_text_array<const {memBaseType}>(object.{memName}, object.{memLength}, settings, "{memType}", "{memChildType}", "{memName}", indents + 1, dump_text_{memTypeID});
+    @end if
+    @if('{memName}' == 'pCode')
+    if(settings.showShader())
+        dump_text_array<const {memBaseType}>(object.{memName}, object.{memLength}, settings, "{memType}", "{memChildType}", "{memName}", indents + 1, dump_text_{memTypeID});
+    else
+        dump_text_special("SHADER DATA", settings, "{memType}", "{memName}", indents + 1);
+    @end if
+    
+    @if('{memCondition}' != 'None')
+    else
+        dump_text_special("UNUSED", settings, "{memType}", "{memName}", indents + 1);
+    @end if
+    @end member
+    return settings.stream();
+}}
+@end struct
+
+//========================== Union Implementations ==========================//
+
+@foreach union
+std::ostream& dump_text_{unName}(const {unName}& object, const ApiDumpSettings& settings, int indents)
+{{
+    if(settings.showAddress())
+        settings.stream() << &object << " (Union):\\n";
+    else
+        settings.stream() << "address (Union):\\n";
+        
+    @foreach choice
+    @if({chcPtrLevel} == 0)
+    dump_text_value<const {chcBaseType}>(object.{chcName}, settings, "{chcType}", "{chcName}", indents + 1, dump_text_{chcTypeID});
+    @end if
+    @if({chcPtrLevel} == 1 and '{chcLength}' == 'None')
+    dump_text_pointer<const {chcBaseType}>(object.{chcName}, settings, "{chcType}", "{chcName}", indents + 1, dump_text_{chcTypeID});
+    @end if
+    @if({chcPtrLevel} == 1 and '{chcLength}' != 'None')
+    dump_text_array<const {chcBaseType}>(object.{chcName}, {chcLength}, settings, "{chcType}", "{chcChildType}", "{chcName}", indents + 1, dump_text_{chcTypeID});
+    @end if
+    @end choice
+    return settings.stream();
+}}
+@end union
+
+//========================= Function Implementations ========================//
+
+@foreach function where('{funcReturn}' != 'void')
+std::ostream& dump_text_{funcName}(ApiDumpInstance& dump_inst, {funcReturn} result, {funcTypedParams})
+{{
+    const ApiDumpSettings& settings(dump_inst.settings());
+    settings.stream() << "Thread " << dump_inst.threadID() << ", Frame " << dump_inst.frameCount() << ":\\n";
+    settings.stream() << "{funcName}({funcNamedParams}) returns {funcReturn} ";
+    dump_text_{funcReturn}(result, settings, 0) << ":\\n";
+    
+    if(settings.showParams())
+    {{
+        @foreach parameter
+        @if({prmPtrLevel} == 0)
+        dump_text_value<const {prmBaseType}>({prmName}, settings, "{prmType}", "{prmName}", 1, dump_text_{prmTypeID});
+        @end if
+        @if({prmPtrLevel} == 1 and '{prmLength}' == 'None')
+        dump_text_pointer<const {prmBaseType}>({prmName}, settings, "{prmType}", "{prmName}", 1, dump_text_{prmTypeID});
+        @end if
+        @if({prmPtrLevel} == 1 and '{prmLength}' != 'None')
+        dump_text_array<const {prmBaseType}>({prmName}, {prmLength}, settings, "{prmType}", "{prmChildType}", "{prmName}", 1, dump_text_{prmTypeID});
+        @end if
+        @end parameter
+    }}
+    settings.shouldFlush() ? settings.stream() << std::endl : settings.stream() << "\\n";
+    
+    return settings.stream();
+}}
+@end function
+
+@foreach function where('{funcReturn}' == 'void')
+std::ostream& dump_text_{funcName}(ApiDumpInstance& dump_inst, {funcTypedParams})
+{{
+    const ApiDumpSettings& settings(dump_inst.settings());
+    settings.stream() << "Thread " << dump_inst.threadID() << ", Frame " << dump_inst.frameCount() << ":\\n";
+    settings.stream() << "{funcName}({funcNamedParams}) returns {funcReturn}:\\n";
+    
+    if(settings.showParams())
+    {{
+        @foreach parameter
+        @if({prmPtrLevel} == 0)
+        dump_text_value<const {prmBaseType}>({prmName}, settings, "{prmType}", "{prmName}", 1, dump_text_{prmTypeID});
+        @end if
+        @if({prmPtrLevel} == 1 and '{prmLength}' == 'None')
+        dump_text_pointer<const {prmBaseType}>({prmName}, settings, "{prmType}", "{prmName}", 1, dump_text_{prmTypeID});
+        @end if
+        @if({prmPtrLevel} == 1 and '{prmLength}' != 'None')
+        dump_text_array<const {prmBaseType}>({prmName}, {prmLength}, settings, "{prmType}", "{prmChildType}", "{prmName}", 1, dump_text_{prmTypeID});
+        @end if
+        @end parameter
+    }}
+    settings.shouldFlush() ? settings.stream() << std::endl : settings.stream() << "\\n";
+    
+    return settings.stream();
+}}
+@end function
+"""
+
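+# The string above is a template in a small ad-hoc macro language: '@foreach'
+# repeats its body once per registry item of the named kind (struct, union,
+# function, member, ...), '@if' guards a span on a condition, and each
+# '{placeholder}' is filled from the current item's values() dict via
+# str.format. A minimal sketch of the substitution step (hypothetical
+# one-line template, not the real expander):
+#
+#     template = 'std::ostream& dump_text_{sctName}(const {sctName}& object);'
+#     print(template.format(**{'sctName': 'VkExtent2D'}))
+#     # -> std::ostream& dump_text_VkExtent2D(const VkExtent2D& object);
+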
+POINTER_TYPES = ['void', 'xcb_connection_t', 'Display', 'SECURITY_ATTRIBUTES', 'ANativeWindow']
+VALIDITY_CHECKS = {
+    'VkBufferCreateInfo': {
+        'pQueueFamilyIndices': 'object.sharingMode == VK_SHARING_MODE_CONCURRENT',
+    },
+    'VkCommandBufferBeginInfo': {
+        'pInheritanceInfo': 'false',    # No easy way to tell if this is a primary command buffer
+    },
+    'VkDescriptorSetLayoutBinding': {
+        'pImmutableSamplers':
+            '(object.descriptorType == VK_DESCRIPTOR_TYPE_SAMPLER) || ' +
+            '(object.descriptorType == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER)',
+    },
+    'VkImageCreateInfo': {
+        'pQueueFamilyIndices': 'object.sharingMode == VK_SHARING_MODE_CONCURRENT',
+    },
+    'VkPipelineViewportStateCreateInfo': {
+        'pViewports': 'false',      # No easy way to check if viewport state is dynamic
+        'pScissors': 'false',       # No easy way to check if scissor state is dynamic
+    },
+    'VkSwapchainCreateInfoKHR': {
+        'pQueueFamilyIndices': 'object.imageSharingMode == VK_SHARING_MODE_CONCURRENT',
+    },
+    'VkWriteDescriptorSet': {
+        'pImageInfo':
+            '(object.descriptorType == VK_DESCRIPTOR_TYPE_SAMPLER) || ' +
+            '(object.descriptorType == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER) || ' +
+            '(object.descriptorType == VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE) || ' +
+            '(object.descriptorType == VK_DESCRIPTOR_TYPE_STORAGE_IMAGE)',
+        'pBufferInfo':
+            '(object.descriptorType == VK_DESCRIPTOR_TYPE_STORAGE_BUFFER) || ' +
+            '(object.descriptorType == VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER) || ' +
+            '(object.descriptorType == VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC) || ' +
+            '(object.descriptorType == VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC)',
+        'pTexelBufferView':
+            '(object.descriptorType == VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER) || ' +
+            '(object.descriptorType == VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER)',
+    },
+}
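+
+# Each condition above is substituted into the matching member's
+# '{memCondition}' placeholder, so the generated C++ only walks a pointer
+# when the registry says it is meaningful. Illustrative shape of the output
+# for VkBufferCreateInfo::pQueueFamilyIndices (a sketch, not a captured run):
+#
+#     if(object.sharingMode == VK_SHARING_MODE_CONCURRENT)
+#         dump_text_array<const uint32_t>(object.pQueueFamilyIndices, ...);
+#     else
+#         dump_text_special("UNUSED", settings, ...);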
+
+class ApiDumpGeneratorOptions(gen.GeneratorOptions):
+    
+    def __init__(self,
+                 input = None,
+                 filename = None,
+                 directory = '.',
+                 apiname = None,
+                 profile = None,
+                 versions = '.*',
+                 emitversions = '.*',
+                 defaultExtensions = None,
+                 addExtensions = None,
+                 removeExtensions = None,
+                 sortProcedure = None,
+                 prefixText = "",
+                 genFuncPointers = True,
+                 protectFile = True,
+                 protectFeature = True,
+                 protectProto = None,
+                 protectProtoStr = None,
+                 apicall = '',
+                 apientry = '',
+                 apientryp = '',
+                 indentFuncProto = True,
+                 indentFuncPointer = False,
+                 alignFuncParam = 0):
+        gen.GeneratorOptions.__init__(self, filename, directory, apiname, profile,
+            versions, emitversions, defaultExtensions,
+            addExtensions, removeExtensions, sortProcedure)
+        self.input           = input
+        self.prefixText      = prefixText
+        self.genFuncPointers = genFuncPointers
+        self.protectFile     = protectFile
+        self.protectFeature  = protectFeature
+        self.protectProto    = protectProto
+        self.protectProtoStr = protectProtoStr
+        self.apicall         = apicall
+        self.apientry        = apientry
+        self.apientryp       = apientryp
+        self.indentFuncProto = indentFuncProto
+        self.indentFuncPointer = indentFuncPointer
+        self.alignFuncParam  = alignFuncParam
+
+
+class ApiDumpOutputGenerator(gen.OutputGenerator):
+    
+    def __init__(self,
+                 errFile = sys.stderr,
+                 warnFile = sys.stderr,
+                 diagFile = sys.stdout,
+                 registryFile = None):
+        gen.OutputGenerator.__init__(self, errFile, warnFile, diagFile)
+        self.format = None
+        
+        self.constants = {}
+        self.extensions = set()
+        self.extFuncs = {}
+        self.extTypes = {}
+        self.includes = {}
+        
+        self.basetypes = set()
+        self.bitmasks = set()
+        self.enums = set()
+        self.externalTypes = set()
+        self.flags = set()
+        self.funcPointers = set()
+        self.functions = set()
+        self.handles = set()
+        self.structs = set()
+        self.unions = set()
+        
+        self.registryFile = registryFile
+        
+    def beginFile(self, genOpts):
+        gen.OutputGenerator.beginFile(self, genOpts)
+        self.format = genOpts.input
+
+        if self.registryFile != None:
+            root = xml.etree.ElementTree.parse(self.registryFile)
+        else:
+            root = self.registry.reg
+            
+        for node in root.find('extensions').findall('extension'):
+            ext = VulkanExtension(node)
+            self.extensions.add(ext)
+            for item in ext.vktypes:
+                self.extTypes[item] = ext
+            for item in ext.vkfuncs:
+                self.extFuncs[item] = ext
+                
+        for node in self.registry.reg.findall('enums'):
+            if node.get('name') == 'API Constants':
+                for item in node.findall('enum'):
+                    self.constants[item.get('name')] = item.get('value')
+                    
+        for node in self.registry.reg.find('types').findall('type'):
+            if node.get('category') == 'include':
+                self.includes[node.get('name')] = ''.join(node.itertext())
+                
+        
+    def endFile(self):
+        # Find all of the extensions that use the system types
+        self.sysTypes = set()
+        for node in self.registry.reg.find('types').findall('type'):
+            if node.get('category') == None and node.get('requires') in self.includes and node.get('requires') != 'vk_platform':
+                for extension in self.extTypes:
+                    for structName in self.extTypes[extension].vktypes:
+                        for struct in self.structs:
+                            if struct.name == structName:
+                                for member in struct.members:
+                                    if node.get('name') == member.baseType or node.get('name') + '*' == member.baseType:
+                                        sysType = VulkanSystemType(node.get('name'), self.extTypes[structName])
+                                        if sysType not in self.sysTypes:
+                                            self.sysTypes.add(sysType)
+                    for funcName in self.extTypes[extension].vkfuncs:
+                        for func in self.functions:
+                            if func.name == funcName:
+                                for param in func.parameters:
+                                    if node.get('name') == param.baseType or node.get('name') + '*' == param.baseType:
+                                        sysType = VulkanSystemType(node.get('name'), self.extFuncs[funcName])
+                                        if sysType not in self.sysTypes:
+                                            self.sysTypes.add(sysType)
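+        # self.sysTypes now pairs each platform-specific type (Display,
+        # xcb_connection_t, ANativeWindow, ...) with the WSI extension that
+        # drags it in, so expand() can wrap its dump helpers in that
+        # extension's #if defined(...) guard.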
+        
+        # Find every @foreach, @if, and @end
+        forIter = re.finditer('(^\\s*\\@foreach\\s+[a-z]+(\\s+where\\(.*\\))?\\s*^)|(\\@foreach [a-z]+(\\s+where\\(.*\\))?\\b)', self.format, flags=re.MULTILINE)
+        ifIter = re.finditer('(^\\s*\\@if\\(.*\\)\\s*^)|(\\@if\\(.*\\))', self.format, flags=re.MULTILINE)
+        endIter = re.finditer('(^\\s*\\@end\\s+[a-z]+\\s*^)|(\\@end [a-z]+\\b)', self.format, flags=re.MULTILINE)
+        try:
+            nextFor = next(forIter)
+        except StopIteration:
+            nextFor = None
+        try:
+            nextIf = next(ifIter)
+        except StopIteration:
+            nextIf = None
+        try:
+            nextEnd = next(endIter)
+        except StopIteration:
+            nextEnd = None
+        
+        # Match the beginnings to the ends
+        loops = []
+        unassignedControls = []
+        depth = 0
+        while nextFor != None or nextIf != None or nextEnd != None:
+            # If this is a @foreach
+            if nextFor != None and ((nextIf == None or nextFor.start() < nextIf.start()) and nextFor.start() < nextEnd.start()):
+                depth += 1
+                forType = re.search('(?<=\\s)[a-z]+', self.format[nextFor.start():nextFor.end()])
+                text = self.format[forType.start()+nextFor.start():forType.end()+nextFor.start()]
+                whereMatch = re.search('(?<=where\\().*(?=\\))', self.format[nextFor.start():nextFor.end()])
+                condition = None if whereMatch == None else self.format[whereMatch.start()+nextFor.start():whereMatch.end()+nextFor.start()]
+                unassignedControls.append((nextFor.start(), nextFor.end(), text, condition))
+                
+                try:
+                    nextFor = next(forIter)
+                except StopIteration:
+                    nextFor = None
+            
+            # If this is an @if    
+            elif nextIf != None and nextIf.start() < nextEnd.start():
+                depth += 1
+                condMatch = re.search('(?<=if\\().*(?=\\))', self.format[nextIf.start():nextIf.end()])
+                condition = None if condMatch == None else self.format[condMatch.start()+nextIf.start():condMatch.end()+nextIf.start()]
+                unassignedControls.append((nextIf.start(), nextIf.end(), 'if', condition))
+                
+                try:
+                    nextIf = next(ifIter)
+                except StopIteration:
+                    nextIf = None
+                    
+            # Else this is an @end
+            else:
+                depth -= 1
+                endType = re.search('(?<=\\s)[a-z]+', self.format[nextEnd.start():nextEnd.end()])
+                text = self.format[endType.start()+nextEnd.start():endType.end()+nextEnd.start()]
+
+                start = unassignedControls.pop(-1)
+                assert(start[2] == text)
+                
+                item = Control(self.format, start[0:2], (nextEnd.start(), nextEnd.end()), text, start[3])
+                if len(loops) < 1 or depth < loops[-1][0]:
+                    while len(loops) > 0 and depth < loops[-1][0]:
+                        item.children.insert(0, loops.pop(-1)[1])
+                    loops.append((depth, item))
+                else:
+                    loops.append((depth, item))
+
+                try:
+                    nextEnd = next(endIter)
+                except StopIteration:
+                    nextEnd = None                
+        
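+        # Every '@foreach'/'@if' has now been paired with its '@end' as a
+        # Control node; nesting such as '@foreach struct ... @foreach member
+        # ... @end member ... @end struct' leaves the inner loop in the outer
+        # Control's children list.
+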
+        # Expand each loop into its full form
+        lastIndex = 0
+        for _, loop in loops:
+            gen.write(self.format[lastIndex:loop.startPos[0]].format(**{}), file=self.outFile)
+            gen.write(self.expand(loop), file=self.outFile)
+            lastIndex = loop.endPos[1]
+        gen.write(self.format[lastIndex:-1].format(**{}), file=self.outFile)
+        
+        gen.OutputGenerator.endFile(self)
+        
+    def genCmd(self, cmd, name):
+        gen.OutputGenerator.genCmd(self, cmd, name)
+        self.functions.add(VulkanFunction(cmd.elem, self.constants))
+        
+    # These are actually constants
+    def genEnum(self, enuminfo, name):
+        gen.OutputGenerator.genEnum(self, enuminfo, name)
+    
+    # These are actually enums
+    def genGroup(self, groupinfo, groupName):
+        gen.OutputGenerator.genGroup(self, groupinfo, groupName)
+        
+        if groupinfo.elem.get('type') == 'bitmask':
+            self.bitmasks.add(VulkanBitmask(groupinfo.elem, self.extensions))
+        elif groupinfo.elem.get('type') == 'enum':
+            self.enums.add(VulkanEnum(groupinfo.elem, self.extensions))
+
+    def genType(self, typeinfo, name):
+        gen.OutputGenerator.genType(self, typeinfo, name)
+        
+        if typeinfo.elem.get('category') == 'struct':
+            self.structs.add(VulkanStruct(typeinfo.elem, self.constants))
+        elif typeinfo.elem.get('category') == 'basetype':
+            self.basetypes.add(VulkanBasetype(typeinfo.elem))
+        elif typeinfo.elem.get('category') == None and typeinfo.elem.get('requires') == 'vk_platform':
+            self.externalTypes.add(VulkanExternalType(typeinfo.elem))
+        elif typeinfo.elem.get('category') == 'handle':
+            self.handles.add(VulkanHandle(typeinfo.elem))
+        elif typeinfo.elem.get('category') == 'union':
+            self.unions.add(VulkanUnion(typeinfo.elem, self.constants))
+        elif typeinfo.elem.get('category') == 'bitmask':
+            self.flags.add(VulkanFlags(typeinfo.elem))
+        elif typeinfo.elem.get('category') == 'funcpointer':
+            self.funcPointers.add(VulkanFunctionPointer(typeinfo.elem))
+            
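+    # expand() renders one Control node: for each registry item the loop
+    # matches, it formats the literal spans with the item's (and its
+    # parents') values(), recursively expands nested child Controls, and
+    # brackets the result with '#if defined(...)' when the item comes from
+    # a protected extension.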
+    def expand(self, loop, parents=[]):
+        # Figure out what we're dealing with
+        if loop.text == 'if':
+            subjects = [ Control.IfDummy() ]
+        elif loop.text == 'basetype':
+            subjects = self.basetypes
+        elif loop.text == 'bitmask':
+            subjects = self.bitmasks
+        elif loop.text == 'choice':
+            subjects = self.findByType([VulkanUnion], parents).choices
+        elif loop.text == 'enum':
+            subjects = self.enums
+        elif loop.text == 'extension':
+            subjects = self.extensions
+        elif loop.text == 'flag':
+            subjects = self.flags
+        elif loop.text == 'funcpointer':
+            subjects = self.funcPointers
+        elif loop.text == 'function':
+            subjects = self.functions
+        elif loop.text == 'handle':
+            subjects = self.handles
+        elif loop.text == 'option':
+            subjects = self.findByType([VulkanEnum, VulkanBitmask], parents).options
+        elif loop.text == 'member':
+            subjects = self.findByType([VulkanStruct], parents).members
+        elif loop.text == 'parameter':
+            subjects = self.findByType([VulkanFunction], parents).parameters
+        elif loop.text == 'struct':
+            subjects = self.structs
+        elif loop.text == 'systype':
+            subjects = self.sysTypes
+        elif loop.text == 'type':
+            subjects = self.externalTypes
+        elif loop.text == 'union':
+            subjects = self.unions
+        else:
+            assert(False)
+            
+        # Generate the output string
+        out = ''
+        for item in subjects:
+            
+            # Merge the values and the parent values
+            values = item.values().copy()
+            for parent in parents:
+                values.update(parent.values())
+        
+            # Check if the condition is met
+            if loop.condition != None:
+                cond = eval(loop.condition.format(**values))
+                assert(cond == True or cond == False)
+                if not cond:
+                    continue
+                    
+            # Check if an ifdef is needed
+            if item.name in self.extFuncs:
+                ext = self.extFuncs[item.name]
+            elif item.name in self.extTypes:
+                ext = self.extTypes[item.name]
+            elif item in self.sysTypes:
+                ext = item.ext
+            else:
+                ext = None
+            if ext != None and ext.guard != None:
+                out += '#if defined({})\n'.format(ext.guard)
+                    
+            # Format the string
+            lastIndex = loop.startPos[1]
+            for child in loop.children:
+                out += loop.fullString[lastIndex:child.startPos[0]].format(**values)
+                out += self.expand(child, parents=[item]+parents)
+                lastIndex = child.endPos[1]
+            out += loop.fullString[lastIndex:loop.endPos[0]].format(**values)
+            
+            # Close the ifdef
+            if ext != None and ext.guard != None:
+                out += '#endif // {}\n'.format(ext.guard)
+            
+        return out
+        
+    def findByType(self, types, objects):
+        value = None
+        for item in objects:
+            for ty in types:
+                if isinstance(item, ty):
+                    value = item
+                    break
+        assert(value != None)
+        return value
+            
+class Control:
+    
+    class IfDummy:
+        
+        def __init__(self):
+            self.name = 'ifdummy'
+        
+        def values(self):
+            return {}
+    
+    def __init__(self, fullString, start, end, text, condition):
+        self.fullString = fullString
+        self.startPos = start
+        self.endPos = end
+        self.text = text
+        self.condition = condition
+        self.children = []
+        
+# Base class for VulkanStruct.Member and VulkanStruct.Parameter
+class VulkanVariable:
+    
+    def __init__(self, rootNode, constants):
+        # Set basic properties
+        self.name = rootNode.find('name').text      # Variable name
+        self.typeID = rootNode.find('type').text    # Typename, dereferenced and converted to a usable C++ token
+        self.baseType = self.typeID                 # Type, dereferenced to the non-pointer type
+        self.childType = None                       # Type, dereferenced to the non-pointer type (None if it isn't a pointer)
+        self.arrayLength = None                     # Length of the array, or None if it isn't an array
+        
+        typeMatch = re.search('.+(?=' + self.name + ')', ''.join(rootNode.itertext()))
+        self.type = typeMatch.string[typeMatch.start():typeMatch.end()]
+        self.type = ' '.join(self.type.split())
+        bracketMatch = re.search('(?<=\\[)[a-zA-Z0-9_]+(?=\\])', ''.join(rootNode.itertext()))
+        if bracketMatch != None:
+            matchText = bracketMatch.string[bracketMatch.start():bracketMatch.end()]
+            self.childType = self.type
+            self.type += '[' + matchText + ']'
+            if matchText in constants:
+                self.arrayLength = constants[matchText]
+            else:
+                self.arrayLength = matchText
+        
+        self.lengthMember = False
+        lengthString = rootNode.get('len')
+        lengths = []
+        if lengthString != None:
+            lengths = re.split(',', lengthString)
+            lengths = list(filter(('null-terminated').__ne__, lengths))
+        assert(len(lengths) <= 1)
+        if self.arrayLength == None and len(lengths) > 0:
+            self.childType = '*'.join(self.type.split('*')[0:-1])
+            self.arrayLength = lengths[0]
+            self.lengthMember = True
+        if self.arrayLength != None and self.arrayLength.startswith('latexmath'):
+            code = self.arrayLength[10:len(self.arrayLength)]
+            code = re.sub('\\[\\$', '', code)
+            code = re.sub('\\$\\]', '', code)
+            code = re.sub('\\\\(lceil|rceil)', '', code)
+            code = re.sub('{|}', '', code)
+            code = re.sub('\\\\mathit', '', code)
+            code = re.sub('\\\\over', '/', code)
+            self.arrayLength = code
+            
+        # Dereference if necessary and handle members of variables
+        if self.arrayLength != None:
+            self.arrayLength = re.sub('::', '->', self.arrayLength)
+            sections = self.arrayLength.split('->')
+            if sections[-1][0] == 'p' and sections[0][1].isupper():
+                self.arrayLength = '*' + self.arrayLength
+        
+        self.pointerLevels = len(re.findall('\\*|\\[', ''.join(rootNode.itertext())))
+        if self.typeID == 'char' and self.pointerLevels > 0:
+            self.baseType += '*'
+            self.pointerLevels -= 1
+            self.typeID = 'cstring'
+        elif self.typeID in POINTER_TYPES:
+            self.baseType += '*'
+            self.pointerLevels -= 1
+        assert(self.pointerLevels >= 0)
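+        # Worked example of the length normalization above: a registry 'len'
+        # such as latexmath:[$\lceil{\mathit{rasterizationSamples} \over 32}\rceil$]
+        # comes out of the re.sub chain as 'rasterizationSamples / 32'.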
+        
+class VulkanBasetype:
+    
+    def __init__(self, rootNode):
+        self.name = rootNode.get('name')
+        self.type = rootNode.get('type')
+        
+    def values(self):
+        return {
+            'baseName': self.name,
+            'baseType': self.type,
+        }
+        
+class VulkanBitmask:
+    
+    def __init__(self, rootNode, extensions):
+        self.name = rootNode.get('name')
+        self.type = rootNode.get('type')
+        
+        # Read each value that the enum contains
+        self.options = []
+        for child in rootNode:
+            childName = child.get('name')
+            childValue = child.get('value')
+            childBitpos = child.get('bitpos')
+            childComment = child.get('comment')
+            if childName == None or (childValue == None and childBitpos == None):
+                continue
+                
+            self.options.append(VulkanEnum.Option(childName, childValue, childBitpos, childComment))
+            
+        for ext in extensions:
+            if self.name in ext.enumValues:
+                childName, childValue = ext.enumValues[self.name]
+                self.options.append(VulkanEnum.Option(childName, childValue, None, None))
+            
+    def values(self):
+        return {
+            'bitName': self.name,
+            'bitType': self.type,
+        }
+        
+        
+class VulkanEnum:
+    
+    class Option:
+        
+        def __init__(self, name, value, bitpos, comment):
+            self.name = name
+            self.comment = comment
+            
+            if value == 0 or value == None:
+                value = 1 << int(bitpos)
+            self.value = value
+            
+        def values(self):
+            return {
+                'optName': self.name,
+                'optValue': self.value,
+                'optComment': self.comment,
+            }
+    
+    def __init__(self, rootNode, extensions):
+        self.name = rootNode.get('name')
+        self.type = rootNode.get('type')
+        
+        # Read each value that the enum contains
+        self.options = []
+        for child in rootNode:
+            childName = child.get('name')
+            childValue = child.get('value')
+            childBitpos = child.get('bitpos')
+            childComment = child.get('comment')
+            if childName == None or (childValue == None and childBitpos == None):
+                continue
+                
+            self.options.append(VulkanEnum.Option(childName, childValue, childBitpos, childComment))
+            
+        for ext in extensions:
+            if self.name in ext.enumValues:
+                childName, childValue = ext.enumValues[self.name]
+                self.options.append(VulkanEnum.Option(childName, childValue, None, None))
+        
+    def values(self):
+        return {
+            'enumName': self.name,
+            'enumType': self.type,
+        }
+            
+class VulkanExtension:
+    
+    def __init__(self, rootNode):
+        self.name = rootNode.get('name')
+        self.number = int(rootNode.get('number'))
+        self.type = rootNode.get('type')
+        self.dependency = rootNode.get('requires')
+        self.guard = rootNode.get('protect')
+        self.supported = rootNode.get('supported')
+        
+        self.vktypes = []
+        for ty in rootNode.find('require').findall('type'):
+            self.vktypes.append(ty.get('name'))
+        self.vkfuncs = []
+        for func in rootNode.find('require').findall('command'):
+            self.vkfuncs.append(func.get('name'))
+            
+        self.constants = {}
+        self.enumValues = {}
+        for enum in rootNode.find('require').findall('enum'):
+            base = enum.get('extends')
+            name = enum.get('name')
+            value = enum.get('value')
+            bitpos = enum.get('bitpos')
+            offset = enum.get('offset')
+            
+            if value == None and bitpos != None:
+                value = 1 << int(bitpos)
+            
+            if offset != None:
+                offset = int(offset)
+            if base != None and offset != None:
+                enumValue = 1000000000 + 1000*(self.number - 1) + offset
+                if enum.get('dir') == '-':
+                    enumValue = -enumValue
+                self.enumValues[base] = (name, enumValue)
+            else:
+                self.constants[name] = value
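+        # Extension-added enum values follow the registry numbering scheme
+        # 1000000000 + 1000 * (extNumber - 1) + offset: e.g. extension
+        # number 2 contributing offset 3 yields 1000001003, negated when
+        # dir="-" (typically extension error codes).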
+        
+    def values(self):
+        return {
+            'extName': self.name,
+            'extNumber': self.number,
+            'extType': self.type,
+            'extDependency': self.dependency,
+            'extGuard': self.guard,
+            'extSupported': self.supported,
+        }
+        
+class VulkanExternalType:
+    
+    def __init__(self, rootNode):
+        self.name = rootNode.get('name')
+        self.dependency = rootNode.get('requires')
+        
+    def values(self):
+        return {
+            'etyName': self.name,
+            'etyDependency': self.dependency,
+        }
+        
+class VulkanFlags:
+    
+    def __init__(self, rootNode):
+        self.name = rootNode.get('name')
+        self.type = rootNode.get('type')
+        self.enum = rootNode.get('requires')
+        
+    def values(self):
+        return {
+            'flagName': self.name,
+            'flagType': self.type,
+            'flagEnum': self.enum,
+        }
+        
+class VulkanFunction:
+    
+    class Parameter(VulkanVariable):
+        
+        def __init__(self, rootNode, constants):
+            VulkanVariable.__init__(self, rootNode, constants)
+            self.text = ''.join(rootNode.itertext())
+
+        def values(self):
+            return {
+                'prmName': self.name,
+                'prmBaseType': self.baseType,
+                'prmTypeID': self.typeID,
+                'prmType': self.type,
+                'prmChildType': self.childType,
+                'prmPtrLevel': self.pointerLevels,
+                'prmLength': self.arrayLength,
+            }
+    
+    def __init__(self, rootNode, constants):
+        self.name = rootNode.find('proto').find('name').text
+        self.returnType = rootNode.find('proto').find('type').text
+
+        self.parameters = []
+        self.namedParams = ''
+        self.typedParams = ''
+        for node in rootNode.findall('param'):
+            self.parameters.append(VulkanFunction.Parameter(node, constants))
+            self.namedParams += self.parameters[-1].name + ', '
+            self.typedParams += self.parameters[-1].text + ', '
+        if len(self.parameters) > 0:
+            self.namedParams = self.namedParams[0:-2]
+            self.typedParams = self.typedParams[0:-2]
+            
+        if self.parameters[0].type in ['VkInstance', 'VkPhysicalDevice'] or self.name == 'vkCreateInstance':
+            self.type = 'instance'
+        else:
+            self.type = 'device' 
+            
+    def values(self):
+        return {
+            'funcName': self.name,
+            'funcShortName': self.name[2:len(self.name)],
+            'funcType': self.type,
+            'funcReturn': self.returnType,
+            'funcNamedParams': self.namedParams,
+            'funcTypedParams': self.typedParams,
+            'funcDispatchParam': self.parameters[0].name
+        }
+        
+class VulkanFunctionPointer:
+    
+    def __init__(self, rootNode):
+        self.name = rootNode.get('name')
+        
+    def values(self):
+        return {
+            'pfnName': self.name,
+        }
+            
+class VulkanHandle:
+    
+    def __init__(self, rootNode):
+        self.name = rootNode.get('name')
+        self.type = rootNode.get('type')
+        self.parent = rootNode.get('parent')
+        
+    def values(self):
+        return {
+            'hdlName': self.name,
+            'hdlType': self.type,
+            'hdlParent': self.parent,
+        }
+            
+class VulkanStruct:
+    
+    class Member(VulkanVariable):
+        
+        def __init__(self, rootNode, constants, parentName):
+            VulkanVariable.__init__(self, rootNode, constants)
+            
+            # Search for a member condition
+            self.condition = None
+            if rootNode.get('noautovalidity') == 'true' and parentName in VALIDITY_CHECKS and self.name in VALIDITY_CHECKS[parentName]:
+                self.condition = VALIDITY_CHECKS[parentName][self.name]
+            
+        def values(self):
+            return {
+                'memName': self.name,
+                'memBaseType': self.baseType,
+                'memTypeID': self.typeID,
+                'memType': self.type,
+                'memChildType': self.childType,
+                'memPtrLevel': self.pointerLevels,
+                'memLength': self.arrayLength,
+                'memLengthIsMember': self.lengthMember,
+                'memCondition': self.condition,
+            }
+            
+    
+    def __init__(self, rootNode, constants):
+        self.name = rootNode.get('name')
+        self.members = []
+        for node in rootNode.findall('member'):
+            self.members.append(VulkanStruct.Member(node, constants, self.name))
+            
+    def values(self):
+        return {
+            'sctName': self.name,
+        }
+            
+class VulkanSystemType:
+    
+    def __init__(self, name, ext):
+        self.name = name
+        self.type = self.name if name not in POINTER_TYPES else self.name + '*'
+        self.ext = ext
+        
+    def __eq__(self, that):
+        return self.name == that.name and self.type == that.type
+        
+    def __hash__(self):
+        return hash(self.name) | hash(self.type)
+        
+    def values(self):
+        return {
+            'sysName': self.name,
+            'sysType': self.type,
+        }
+            
+class VulkanUnion:
+    
+    class Choice(VulkanVariable):
+        
+        def __init__(self, rootNode, constants):
+            VulkanVariable.__init__(self, rootNode, constants)
+            
+        def values(self):
+            return {
+                'chcName': self.name,
+                'chcBaseType': self.baseType,
+                'chcTypeID': self.typeID,
+                'chcType': self.type,
+                'chcChildType': self.childType,
+                'chcPtrLevel': self.pointerLevels,
+                'chcLength': self.arrayLength,
+                #'chcLengthIsMember': self.lengthMember,
+            }
+    
+    def __init__(self, rootNode, constants):
+        self.name = rootNode.get('name')
+        self.choices = []
+        for node in rootNode.findall('member'):
+            self.choices.append(VulkanUnion.Choice(node, constants))
+        
+    def values(self):
+        return {
+            'unName': self.name,
+        }
+            
\ No newline at end of file
diff --git a/build-android/android-generate.bat b/build-android/android-generate.bat
index cfe0ed3..b58159c 100644
--- a/build-android/android-generate.bat
+++ b/build-android/android-generate.bat
@@ -28,6 +28,20 @@
 python ../../../lvl_genvk.py -registry ../../../vk.xml thread_check.h
 python ../../../lvl_genvk.py -registry ../../../vk.xml parameter_validation.h
 python ../../../lvl_genvk.py -registry ../../../vk.xml unique_objects_wrappers.h
+python ../../../vt_genvk.py -registry ../../../vk.xml api_dump.cpp
+python ../../../vt_genvk.py -registry ../../../vk.xml api_dump_text.h
+
+REM vktrace
+python ../../../vktrace/vktrace_generate.py AllPlatforms vktrace-trace-h vk_version_1_0 > vktrace_vk_vk.h
+python ../../../vktrace/vktrace_generate.py AllPlatforms vktrace-trace-c vk_version_1_0 > vktrace_vk_vk.cpp
+python ../../../vktrace/vktrace_generate.py AllPlatforms vktrace-core-trace-packets vk_version_1_0 > vktrace_vk_vk_packets.h
+python ../../../vktrace/vktrace_generate.py AllPlatforms vktrace-packet-id vk_version_1_0 > vktrace_vk_packet_id.h
+
+REM vkreplay
+python ../../../vktrace/vktrace_generate.py AllPlatforms vktrace-replay-vk-funcs vk_version_1_0 > vkreplay_vk_func_ptrs.h
+python ../../../vktrace/vktrace_generate.py AllPlatforms vktrace-replay-c vk_version_1_0 > vkreplay_vk_replay_gen.cpp
+python ../../../vktrace/vktrace_generate.py AllPlatforms vktrace-replay-obj-mapper-h vk_version_1_0 > vkreplay_vk_objmapper.h
+
 cd ../..
 
 copy /Y ..\layers\vk_layer_config.cpp   generated\common\
@@ -39,18 +53,25 @@
 REM create build-script root directory
 mkdir generated\gradle-build
 cd generated\gradle-build
-mkdir  core_validation image object_tracker parameter_validation swapchain threading unique_objects
+mkdir  core_validation image object_tracker parameter_validation swapchain threading unique_objects api_dump screenshot
 cd ..\..
 mkdir generated\layer-src
 cd generated\layer-src
-mkdir  core_validation image object_tracker parameter_validation swapchain threading unique_objects
+mkdir  core_validation image object_tracker parameter_validation swapchain threading unique_objects api_dump screenshot
 cd ..\..
 xcopy /s gradle-templates\*   generated\gradle-build\
 for %%G in (core_validation image object_tracker parameter_validation swapchain threading unique_objects) Do (
     copy ..\layers\%%G.cpp   generated\layer-src\%%G
     echo apply from: "../common.gradle"  > generated\gradle-build\%%G\build.gradle
 )
+for %%G in (screenshot) Do (
+    copy ..\layersvt\%%G.cpp   generated\layer-src\%%G
+    echo apply from: "../common.gradle"  > generated\gradle-build\%%G\build.gradle
+)
+copy generated\include\api_dump.cpp   generated\layer-src\api_dump
 copy generated\common\descriptor_sets.cpp generated\layer-src\core_validation\descriptor_sets.cpp
 copy generated\include\vk_safe_struct.cpp generated\layer-src\core_validation\vk_safe_struct.cpp
 move generated\include\vk_safe_struct.cpp generated\layer-src\unique_objects\vk_safe_struct.cpp
 echo apply from: "../common.gradle"  > generated\gradle-build\unique_objects\build.gradle
+
+del  /f /q generated\include\api_dump.cpp
diff --git a/build-android/android-generate.sh b/build-android/android-generate.sh
index 2fd432d..14a96ed 100755
--- a/build-android/android-generate.sh
+++ b/build-android/android-generate.sh
@@ -30,6 +30,20 @@
 ( cd generated/include; python ../../../lvl_genvk.py -registry ../../../vk.xml parameter_validation.h )
 ( cd generated/include; python ../../../lvl_genvk.py -registry ../../../vk.xml unique_objects_wrappers.h )
 
+( cd generated/include; python ../../../vt_genvk.py -registry ../../../vk.xml api_dump.cpp )
+( cd generated/include; python ../../../vt_genvk.py -registry ../../../vk.xml api_dump_text.h )
+
+# vktrace
+python ../vktrace/vktrace_generate.py AllPlatforms vktrace-trace-h vk_version_1_0 > generated/include/vktrace_vk_vk.h
+python ../vktrace/vktrace_generate.py AllPlatforms vktrace-trace-c vk_version_1_0 > generated/include/vktrace_vk_vk.cpp
+python ../vktrace/vktrace_generate.py AllPlatforms vktrace-core-trace-packets vk_version_1_0 > generated/include/vktrace_vk_vk_packets.h
+python ../vktrace/vktrace_generate.py AllPlatforms vktrace-packet-id vk_version_1_0 > generated/include/vktrace_vk_packet_id.h
+
+# vkreplay
+python ../vktrace/vktrace_generate.py AllPlatforms vktrace-replay-vk-funcs vk_version_1_0 > generated/include/vkreplay_vk_func_ptrs.h
+python ../vktrace/vktrace_generate.py AllPlatforms vktrace-replay-c vk_version_1_0 > generated/include/vkreplay_vk_replay_gen.cpp
+python ../vktrace/vktrace_generate.py AllPlatforms vktrace-replay-obj-mapper-h vk_version_1_0 > generated/include/vkreplay_vk_objmapper.h
+
 cp -f ../layers/vk_layer_config.cpp   generated/common/
 cp -f ../layers/vk_layer_extension_utils.cpp  generated/common/
 cp -f ../layers/vk_layer_utils.cpp    generated/common/
@@ -39,8 +53,8 @@
 # layer names and their original source files directory
 # 1 to 1 correspondence -- one layer one source file; additional files are copied
 # at fixup step
-declare layers=(core_validation image object_tracker parameter_validation swapchain threading unique_objects)
-declare src_dirs=(../layers ../layers ../layers ../layers ../layers ../layers ../layers)
+declare layers=(core_validation image object_tracker parameter_validation swapchain threading unique_objects api_dump screenshot)
+declare src_dirs=(../layers ../layers ../layers ../layers ../layers ../layers ../layers generated/include ../layersvt)
 
 SRC_ROOT=generated/layer-src
 BUILD_ROOT=generated/gradle-build
@@ -63,4 +77,7 @@
 cp  generated/include/vk_safe_struct.cpp ${SRC_ROOT}/core_validation/vk_safe_struct.cpp
 mv  generated/include/vk_safe_struct.cpp ${SRC_ROOT}/unique_objects/vk_safe_struct.cpp
 
+# fixup - remove copied files from generated/include
+rm  generated/include/api_dump.cpp
+
 exit 0
diff --git a/build-android/build_vktracereplay.bat b/build-android/build_vktracereplay.bat
new file mode 100644
index 0000000..71b6451
--- /dev/null
+++ b/build-android/build_vktracereplay.bat
@@ -0,0 +1,57 @@
+@echo off
+REM # Copyright 2016 The Android Open Source Project
+REM # Copyright (C) 2015 Valve Corporation
+REM
+REM # Licensed under the Apache License, Version 2.0 (the "License");
+REM # you may not use this file except in compliance with the License.
+REM # You may obtain a copy of the License at
+REM
+REM #      http://www.apache.org/licenses/LICENSE-2.0
+REM
+REM # Unless required by applicable law or agreed to in writing, software
+REM # distributed under the License is distributed on an "AS IS" BASIS,
+REM # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+REM # See the License for the specific language governing permissions and
+REM # limitations under the License.
+
+set num_cpus=%NUMBER_OF_PROCESSORS%
+
+REM
+REM build layers
+REM
+call update_external_sources_android.bat
+call android-generate.bat
+call ndk-build -j %num_cpus%
+
+if %errorlevel% neq 0 exit /b %errorlevel%
+
+REM
+REM create vkreplay APK
+REM
+call android update project -s -p . -t "android-23"
+call ant -buildfile vkreplay debug
+
+if %errorlevel% neq 0 exit /b %errorlevel%
+
+REM
+REM build cube-with-layers
+REM
+pushd ..\demos\android
+call android update project -s -p . -t "android-23"
+call ndk-build -j  %num_cpus%
+call ant -buildfile cube-with-layers debug
+popd
+
+if %errorlevel% neq 0 exit /b %errorlevel%
+
+REM
+REM build vktrace
+REM
+pushd ..
+call update_external_sources.bat --all
+call build_windows_targets.bat
+popd
+
+if %errorlevel% neq 0 exit /b %errorlevel%
+
+exit /b 0
\ No newline at end of file
diff --git a/build-android/build_vktracereplay.sh b/build-android/build_vktracereplay.sh
new file mode 100755
index 0000000..2ee5745
--- /dev/null
+++ b/build-android/build_vktracereplay.sh
@@ -0,0 +1,60 @@
+#!/bin/bash
+
+set -ev
+
+if [[ $(uname) == "Linux" ]]; then
+    cores=$(nproc)
+elif [[ $(uname) == "Darwin" ]]; then
+    cores=$(sysctl -n hw.ncpu)
+fi
+
+#
+# build layers
+#
+./update_external_sources_android.sh
+./android-generate.sh
+ndk-build -j $cores
+
+#
+# create vkreplay APK
+#
+android update project -s -p . -t "android-23"
+ant -buildfile vkreplay debug
+
+#
+# build cube-with-layers
+#
+(
+pushd ../demos/android
+android update project -s -p . -t "android-23"
+ndk-build -j $cores
+ant -buildfile cube-with-layers debug
+popd
+)
+
+#
+# build vktrace
+#
+(
+pushd ..
+./update_external_sources.sh -g -s
+mkdir -p build
+cd build
+cmake -DCMAKE_BUILD_TYPE=Debug -DBUILD_LOADER=Off -DBUILD_ICD=Off -DBUILD_TESTS=Off -DBUILD_LAYERS=Off -DBUILD_VKTRACEVIEWER=Off -DBUILD_LAYERSVT=Off -DBUILD_DEMOS=Off -DBUILD_VKJSON=Off -DBUILD_VIA=Off -DBUILD_VKTRACE_LAYER=Off -DBUILD_VKTRACE_REPLAY=Off -DBUILD_VKTRACE=On ..
+make -j $cores vktrace
+popd
+)
+
+#
+# build vktrace32
+#
+(
+pushd ..
+mkdir -p build32
+cd build32
+cmake -DCMAKE_BUILD_TYPE=Debug -DBUILD_LOADER=Off -DBUILD_ICD=Off -DBUILD_TESTS=Off -DBUILD_LAYERS=Off -DBUILD_VKTRACEVIEWER=Off -DBUILD_LAYERSVT=Off -DBUILD_DEMOS=Off -DBUILD_VKJSON=Off -DBUILD_VIA=Off -DBUILD_VKTRACE_LAYER=Off -DBUILD_VKTRACE_REPLAY=Off -DBUILD_X64=Off -DBUILD_VKTRACE=On ..
+make -j $cores vktrace
+popd
+)
+
+exit 0
diff --git a/build-android/create_trace.sh b/build-android/create_trace.sh
new file mode 100755
index 0000000..512ed0f
--- /dev/null
+++ b/build-android/create_trace.sh
@@ -0,0 +1,259 @@
+#!/bin/bash
+
+#set -vex
+
+script_start_time=$(date +%s)
+
+default_vktrace_exe=../build/vktrace/vktrace
+default_vktrace32_exe=../build32/vktrace/vktrace32
+default_target_abi=$(adb shell getprop ro.product.cpu.abi)
+default_activity=android.app.NativeActivity
+
+#
+# Parse parameters
+#
+
+function printUsage {
+   echo "Supported parameters are:"
+   echo "    --serial <target device serial number>"
+   echo "    --abi <abi to install>"
+   echo "    --vktrace <full path to vktrace on host> (optional)"
+   echo "    --package <package name>"
+   echo "    --activity <launchable-activity name>"
+   echo "    --frame <frame number to capture> (optional)"
+   echo
+   echo "i.e. ${0##*/} --serial 01234567 \\"
+   echo "              --abi arm64-v8a \\"
+   echo "              --vktrace ../build/vktrace/vktrace \\"
+   echo "              --package com.example.foo \\"
+   echo "              --activity android.app.NativeActivity \\"
+   echo "              --frame 100"
+   exit 1
+}
+
+if [[ $(($# % 2)) -ne 0 ]]
+then
+    echo Parameters must be provided in pairs.
+    echo parameter count = $#
+    echo
+    printUsage
+    exit 1
+fi
+
+while [[ $# -gt 0 ]]
+do
+    case $1 in
+        --serial)
+            # include the flag, because we need to leave it off if not provided
+            serial="$2"
+            serialFlag="-s $serial"
+            shift 2
+            ;;
+        --abi)
+            target_abi="$2"
+            shift 2
+            ;;
+        --vktrace)
+            vktrace_exe="$2"
+            shift 2
+            ;;
+        --package)
+            package="$2"
+            shift 2
+            ;;
+        --activity)
+            activity="$2"
+            shift 2
+            ;;
+        --frame)
+            frame="$2"
+            shift 2
+            ;;
+        -*)
+            # unknown option
+            echo Unknown option: $1
+            echo
+            printUsage
+            exit 1
+            ;;
+    esac
+done
+
+echo serial = $serial
+
+if [[ -z $serial ]]
+then
+    echo Please provide a serial number.
+    echo
+    printUsage
+    exit 1
+fi
+
+if [[ $(adb devices) != *"$serial"* ]];
+then
+    echo Device not found: $serial
+    echo
+    printUsage
+    exit 1
+fi
+
+if [[ -z $target_abi ]];
+then
+    echo Using default target_abi
+    target_abi=$default_target_abi
+fi
+echo target_abi = $target_abi
+
+if [[ $target_abi == "armeabi-v7a" ]] ||
+   [[ $target_abi == "mips" ]] ||
+   [[ $target_abi == "x86" ]];
+then
+    echo Targeting 32-bit abi $target_abi
+    target_32bit_abi=1
+fi
+
+if [[ -z $vktrace_exe ]];
+then
+    echo Using default vktrace_exe
+    vktrace_exe=$default_vktrace_exe
+    if [[ $target_32bit_abi ]]
+    then
+        vktrace_exe=$default_vktrace32_exe
+    fi
+else
+    if [[ $target_32bit_abi ]]
+    then
+       echo Ensure your vktrace is 32-bit, i.e. vktrace32
+    fi
+fi
+echo vktrace_exe = $vktrace_exe
+
+if [[ -z $package ]];
+then
+    echo target package name required
+    exit 1
+fi
+echo package = $package
+
+if [[ -z $activity ]];
+then
+    echo Using default activity
+    activity=$default_activity
+fi
+echo activity = $activity
+
+if [[ -z $frame ]];
+then
+    echo Not attempting to record any screenshot
+else
+    echo Attempting to record screenshot of frame $frame
+fi
+
+function printLayerBuild() {
+    echo "To build layers:"
+    echo "./update_external_sources_android.sh"
+    echo "./android-generate.sh"
+    echo "ndk-build -j"
+}
+
+function printvktraceBuild() {
+    echo "To build vktrace"
+    echo "pushd .."
+    echo "mkdir build"
+    echo "cd build"
+    echo "cmake -DCMAKE_BUILD_TYPE=Debug .."
+    echo "make -j"
+}
+
+#
+# If any parameter not found, print how to build it
+#
+
+if [ ! -f $vktrace_exe ]; then
+    echo "$vktrace_exe not found!"
+    printvktraceBuild
+    exit 1
+fi
+
+#
+# Check for required tools
+#
+
+adb_path=$(which adb)
+if [[ $? == 0 ]];
+then
+    echo using $adb_path
+else
+    echo adb not found, exiting
+    echo check your NDK for it and add to path
+    exit 1
+fi
+aapt_path=$(which aapt)
+if [[ $? == 0 ]];
+then
+    echo using $aapt_path
+else
+    echo aapt not found, exiting
+    echo check your NDK for it and add to path
+    exit 1
+fi
+jar_path=$(which jar)
+if [[ $? == 0 ]];
+then
+    echo using $jar_path
+else
+    echo jar not found, exiting
+    exit 1
+fi
+
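+# NOTE: $apk is expected to name the APK under test; it is never assigned in
+# this script, so set it (or export it) before relying on this layer check.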
+apk_contents=$(jar tvf $apk)
+if [[ $apk_contents != *"libVkLayer_screenshot.so"* ]] ||
+   [[ $apk_contents != *"libVkLayer_vktrace_layer.so"* ]];
+then
+    echo Your APK does not contain the following layers:
+    echo     libVkLayer_screenshot.so
+    echo     libVkLayer_vktrace_layer.so
+    echo You\'ll need to provide them another way.
+    echo Continuing...
+fi
+
+#
+# Start up
+#
+
+# We want to halt on errors here
+set -e
+
+# Wake up the device
+adb $serialFlag shell input keyevent "KEYCODE_MENU"
+adb $serialFlag shell input keyevent "KEYCODE_HOME"
+
+# clean up anything lingering from previous runs
+adb $serialFlag shell am force-stop $package
+
+# Ensure vktrace wasn't already running
+let "script_run_time=$(date +%s)-$script_start_time"
+killall --older-than "$script_run_time"s vktrace || echo continuing...
+
+# Enable trace layer
+adb $serialFlag shell setprop debug.vulkan.layer.1 VK_LAYER_LUNARG_vktrace
+
+# Start vktrace
+adb $serialFlag reverse localabstract:vktrace tcp:34201
+sleep 1
+$vktrace_exe -v full -o $package.vktrace &
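+# (The adb reverse above lets the vktrace layer inside the app connect back
+# to this host-side vktrace server through the abstract socket.)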
+
+# Start the target activity
+adb $serialFlag shell am start $package/$activity
+
+# wait for a keystroke, indicating when tracing should stop
+read -rsn1
+
+# stop our background vktrace
+kill $!
+
+# clean up
+adb $serialFlag shell am force-stop $package
+adb $serialFlag shell setprop debug.vulkan.layer.1 '""'
+
+exit 0
diff --git a/build-android/jni/Android.mk b/build-android/jni/Android.mk
index 09d0ceb..9e27c5f 100644
--- a/build-android/jni/Android.mk
+++ b/build-android/jni/Android.mk
@@ -137,6 +137,35 @@
 LOCAL_LDFLAGS   += -Wl,--exclude-libs,ALL
 include $(BUILD_SHARED_LIBRARY)
 
+include $(CLEAR_VARS)
+LOCAL_MODULE := VkLayer_api_dump
+LOCAL_SRC_FILES += $(LAYER_DIR)/layer-src/api_dump/api_dump.cpp
+LOCAL_SRC_FILES += $(LAYER_DIR)/common/vk_layer_table.cpp
+LOCAL_C_INCLUDES += $(SRC_DIR)/include \
+                    $(SRC_DIR)/layers \
+                    $(SRC_DIR)/layersvt \
+                    $(LAYER_DIR)/include \
+                    $(SRC_DIR)/loader
+LOCAL_STATIC_LIBRARIES += layer_utils
+LOCAL_CPPFLAGS += -DVK_USE_PLATFORM_ANDROID_KHR
+LOCAL_LDLIBS    := -llog
+include $(BUILD_SHARED_LIBRARY)
+
+include $(CLEAR_VARS)
+LOCAL_MODULE := VkLayer_screenshot
+LOCAL_SRC_FILES += $(LAYER_DIR)/layer-src/screenshot/screenshot.cpp
+LOCAL_SRC_FILES += $(LAYER_DIR)/common/vk_layer_table.cpp
+LOCAL_C_INCLUDES += $(SRC_DIR)/include \
+                    $(SRC_DIR)/layers \
+                    $(SRC_DIR)/layersvt \
+                    $(LAYER_DIR)/include \
+                    $(SRC_DIR)/loader
+LOCAL_STATIC_LIBRARIES += layer_utils
+LOCAL_WHOLE_STATIC_LIBRARIES += libcutils
+LOCAL_CPPFLAGS += -DVK_USE_PLATFORM_ANDROID_KHR
+LOCAL_LDLIBS    := -llog
+include $(BUILD_SHARED_LIBRARY)
+
 # Pull in prebuilt shaderc
 include $(CLEAR_VARS)
 LOCAL_MODULE := shaderc-prebuilt
@@ -227,5 +256,71 @@
 LOCAL_LDLIBS := -llog -landroid
 include $(BUILD_SHARED_LIBRARY)
 
+include $(CLEAR_VARS)
+LOCAL_MODULE := VkLayer_vktrace_layer
+LOCAL_SRC_FILES += $(LAYER_DIR)/include/vktrace_vk_vk.cpp
+LOCAL_SRC_FILES += $(LAYER_DIR)/include/vk_struct_size_helper.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_trace_packet_utils.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_filelike.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_interconnect.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_platform.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_process.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_settings.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_tracelog.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_layer/vktrace_lib_trace.cpp
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_layer/vktrace_vk_exts.cpp
+LOCAL_C_INCLUDES += $(SRC_DIR)/vktrace/include \
+		    $(SRC_DIR)/include \
+                    $(SRC_DIR)/layers \
+                    $(LAYER_DIR)/include \
+                    $(SRC_DIR)/vktrace/src/vktrace_common \
+                    $(SRC_DIR)/vktrace/src/vktrace_layer \
+                    $(SRC_DIR)/loader
+LOCAL_STATIC_LIBRARIES += layer_utils
+LOCAL_CPPFLAGS += -DVK_USE_PLATFORM_ANDROID_KHR
+LOCAL_CPPFLAGS += -DPLATFORM_LINUX=1
+LOCAL_CFLAGS += -DPLATFORM_LINUX=1
+LOCAL_CFLAGS += -DPLATFORM_POSIX=1
+LOCAL_LDLIBS    := -llog
+include $(BUILD_SHARED_LIBRARY)
+
+include $(CLEAR_VARS)
+LOCAL_MODULE := vkreplay
+LOCAL_SRC_FILES += $(LAYER_DIR)/include/vk_struct_size_helper.c
+LOCAL_SRC_FILES += $(LAYER_DIR)/include/vkreplay_vk_replay_gen.cpp
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_trace_packet_utils.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_filelike.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_interconnect.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_platform.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_process.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_settings.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_common/vktrace_tracelog.c
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_replay/vkreplay_factory.cpp
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_replay/vkreplay_main.cpp
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_replay/vkreplay_seq.cpp
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay.cpp
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_settings.cpp
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkdisplay.cpp
+LOCAL_SRC_FILES += $(SRC_DIR)/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkreplay.cpp
+LOCAL_SRC_FILES += $(SRC_DIR)/common/vulkan_wrapper.cpp
+LOCAL_C_INCLUDES += $(SRC_DIR)/vktrace/include \
+                    $(SRC_DIR)/include \
+                    $(SRC_DIR)/include/vulkan \
+                    $(SRC_DIR)/layers \
+                    $(LAYER_DIR)/include \
+                    $(SRC_DIR)/vktrace/src/vktrace_common \
+                    $(SRC_DIR)/vktrace/src/vktrace_layer \
+                    $(SRC_DIR)/vktrace/src/vktrace_replay \
+                    $(SRC_DIR)/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay \
+                    $(SRC_DIR)/loader
+LOCAL_STATIC_LIBRARIES += layer_utils android_native_app_glue
+LOCAL_SHARED_LIBRARIES += VkLayer_vktrace_layer
+LOCAL_CPPFLAGS += -DVK_USE_PLATFORM_ANDROID_KHR --include=$(SRC_DIR)/common/vulkan_wrapper.h -fexceptions
+LOCAL_CPPFLAGS += -DPLATFORM_LINUX=1
+LOCAL_CFLAGS += -DPLATFORM_LINUX=1
+LOCAL_CFLAGS += -DPLATFORM_POSIX=1
+LOCAL_LDLIBS    := -llog -landroid
+include $(BUILD_SHARED_LIBRARY)
+
 $(call import-module,android/native_app_glue)
 $(call import-module,third_party/googletest)
diff --git a/build-android/jni/Application.mk b/build-android/jni/Application.mk
index 8b4fb09..a7f19db 100644
--- a/build-android/jni/Application.mk
+++ b/build-android/jni/Application.mk
@@ -16,6 +16,6 @@
 APP_ABI := armeabi-v7a arm64-v8a x86 x86_64 mips mips64
 APP_PLATFORM := android-22
 APP_STL := gnustl_static
-APP_MODULES := layer_utils VkLayer_core_validation VkLayer_image VkLayer_parameter_validation VkLayer_object_tracker VkLayer_threading VkLayer_swapchain VkLayer_unique_objects VkLayerValidationTests VulkanLayerValidationTests
+APP_MODULES := layer_utils VkLayer_core_validation VkLayer_image VkLayer_parameter_validation VkLayer_object_tracker VkLayer_threading VkLayer_swapchain VkLayer_unique_objects VkLayer_api_dump VkLayer_screenshot VkLayerValidationTests VkLayer_vktrace_layer vkreplay
 APP_CPPFLAGS += -std=c++11 -DVK_PROTOTYPES -Wall -Werror -Wno-unused-function -Wno-unused-const-variable -mxgot
 NDK_TOOLCHAIN_VERSION := clang
diff --git a/build-android/replay_trace.sh b/build-android/replay_trace.sh
new file mode 100755
index 0000000..b7e30e1
--- /dev/null
+++ b/build-android/replay_trace.sh
@@ -0,0 +1,122 @@
+#!/bin/bash
+
+#
+# Parse parameters
+#
+
+function printUsage {
+   echo "Supported parameters are:"
+   echo "    --serial <target device serial number>"
+   echo "    --tracefile <path to tracefile>"
+   echo
+   echo "i.e. ${0##*/} --serial 01234567 \\"
+   echo "              --tracefile cube.vktrace \\"
+   exit 1
+}
+
+if [[ $(($# % 2)) -ne 0 ]]
+then
+    echo Parameters must be provided in pairs.
+    echo parameter count = $#
+    echo
+    printUsage
+    exit 1
+fi
+
+while [[ $# -gt 0 ]]
+do
+    case $1 in
+        --serial)
+            # include the flag, because we need to leave it off if not provided
+            serial="$2"
+            serialFlag="-s $serial"
+            shift 2
+            ;;
+        --tracefile)
+            tracefile="$2"
+            shift 2
+            ;;
+        -*)
+            # unknown option
+            echo Unknown option: $1
+            echo
+            printUsage
+            exit 1
+            ;;
+    esac
+done
+
+echo serial = $serial
+
+if [[ -z $serial ]]
+then
+    echo Please provide a serial number.
+    echo
+    printUsage
+    exit 1
+fi
+
+if [[ $(adb devices) != *"$serial"* ]];
+then
+    echo Device not found: $serial
+    echo
+    printUsage
+    exit 1
+fi
+
+function printvkreplayBuild() {
+    echo "To build vkreplay apk"
+    echo "android update project -s -p . -t \"android-23\""
+    echo "ndk-build -j"
+    echo "ant -buildfile vkreplay debug"
+}
+
+#
+# Ensure the trace file exists
+#
+
+if [ ! -f "$tracefile" ]; then
+    echo "$tracefile not found!"
+    exit 1
+fi
+
+#
+# Check for required tools
+#
+
+adb_path=$(which adb)
+if [[ $? == 0 ]];
+then
+    echo using $adb_path
+else
+    echo adb not found, exiting
+    echo check your Android SDK for it and add it to your path
+    exit 1
+fi
+
+#
+# Start up
+#
+
+# We want to halt on errors here
+set -e
+
+# clean up anything lingering from previous runs
+adb $serialFlag shell am force-stop com.example.vkreplay
+
+# push the trace to the device
+adb $serialFlag push $tracefile /sdcard/$tracefile
+
+# replay and screenshot
+adb $serialFlag shell pm grant com.example.vkreplay android.permission.READ_EXTERNAL_STORAGE
+adb $serialFlag shell pm grant com.example.vkreplay android.permission.WRITE_EXTERNAL_STORAGE
+sleep 1 # small pause to allow the permissions to take effect
+
+# Wake up the device
+adb $serialFlag shell input keyevent "KEYCODE_MENU"
+adb $serialFlag shell input keyevent "KEYCODE_HOME"
+
+# Start the replay
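+# Note: the spaces inside the --es extra are escaped so the whole argument
+# string reaches the NativeActivity as a single intent extra.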
+adb $serialFlag shell am start -a android.intent.action.MAIN -c android.intent.category.LAUNCHER -n com.example.vkreplay/android.app.NativeActivity --es args "-v\ full\ -t\ /sdcard/$tracefile"
+
+exit 0
diff --git a/build-android/vkreplay/AndroidManifest.xml b/build-android/vkreplay/AndroidManifest.xml
new file mode 100644
index 0000000..bdd315e
--- /dev/null
+++ b/build-android/vkreplay/AndroidManifest.xml
@@ -0,0 +1,26 @@
+<?xml version="1.0"?>
+<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.vkreplay" android:versionCode="1" android:versionName="1.0">
+
+    <!-- Vulkan is supported beginning with API 24, so require at least that. -->
+    <uses-sdk android:minSdkVersion="24" android:targetSdkVersion="23"/>
+
+    <!-- This allows reading the trace file from the sdcard -->
+    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/>
+    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
+
+    <!-- This .apk has no Java code itself, so set hasCode to false. -->
+    <application android:label="@string/app_name" android:hasCode="false" android:debuggable="true">
+
+        <!-- Our activity is the built-in NativeActivity framework class.
+             This will take care of integrating with our NDK code. -->
+        <activity android:name="android.app.NativeActivity" android:label="@string/app_name" android:exported="true">
+            <!-- Tell NativeActivity the name of our .so -->
+            <meta-data android:name="android.app.lib_name" android:value="vkreplay"/>
+            <intent-filter>
+                <action android:name="android.intent.action.MAIN"/>
+                <category android:name="android.intent.category.LAUNCHER"/>
+            </intent-filter>
+        </activity>
+    </application>
+
+</manifest>
diff --git a/build-android/vkreplay/custom_rules.xml b/build-android/vkreplay/custom_rules.xml
new file mode 100644
index 0000000..697cb8c
--- /dev/null
+++ b/build-android/vkreplay/custom_rules.xml
@@ -0,0 +1,6 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project name="NativeActivity" default="help">
+<!-- Point ndk-build at the libs created in build-android -->
+<echo>vkreplay: Overriding native.libs.absolute.dir with ../libs</echo>
+<property name="native.libs.absolute.dir" location="../libs" />
+</project>
diff --git a/build-android/vkreplay/res/values/strings.xml b/build-android/vkreplay/res/values/strings.xml
new file mode 100644
index 0000000..70897c3
--- /dev/null
+++ b/build-android/vkreplay/res/values/strings.xml
@@ -0,0 +1,24 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!-- Copyright 2016 The Android Open Source Project
+
+     Licensed under the Apache License, Version 2.0 (the "License");
+     you may not use this file except in compliance with the License.
+     You may obtain a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+     Unless required by applicable law or agreed to in writing, software
+     distributed under the License is distributed on an "AS IS" BASIS,
+     WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+     See the License for the specific language governing permissions and
+     limitations under the License.
+-->
+
+<!-- This file contains resource definitions for displayed strings, allowing
+     them to be changed based on the locale and options. -->
+
+<resources>
+    <!-- Simple strings. -->
+    <string name="app_name">vkreplay</string>
+
+</resources>
diff --git a/build-android/vktracereplay.bat b/build-android/vktracereplay.bat
new file mode 100644
index 0000000..c068611
--- /dev/null
+++ b/build-android/vktracereplay.bat
@@ -0,0 +1,234 @@
+@echo off
+REM # Copyright 2016 The Android Open Source Project
+REM # Copyright (C) 2016 Valve Corporation
+REM
+REM # Licensed under the Apache License, Version 2.0 (the "License");
+REM # you may not use this file except in compliance with the License.
+REM # You may obtain a copy of the License at
+REM
+REM #      http://www.apache.org/licenses/LICENSE-2.0
+REM
+REM # Unless required by applicable law or agreed to in writing, software
+REM # distributed under the License is distributed on an "AS IS" BASIS,
+REM # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+REM # See the License for the specific language governing permissions and
+REM # limitations under the License.
+
+setlocal EnableDelayedExpansion
+
+set vktrace_exe=..\build\vktrace\Debug\vktrace.exe
+set vktrace32_exe=..\build\vktrace\Debug\vktrace32.exe
+set vkreplay_apk=.\vkreplay\bin\NativeActivity-debug.apk
+set activity=android.app.NativeActivity
+set frame=1
+for /f "delims=" %%i in ('adb shell getprop ro.product.cpu.abi') do set target_abi=%%i
+
+REM // ======== Parameter parsing ======== //
+
+   if "%1" == "" (
+       echo Supported parameters are:
+       echo     --serial ^<target device serial number^>
+       echo     --abi ^<abi to install^>
+       echo     --vktrace ^<full path to vktrace on host^> ^(optional^)
+       echo     --vkreplay ^<full path to vkreplay APK^> ^(optional^)
+       echo     --apk ^<full path to APK^>
+       echo     --package ^<package name^>
+       echo     --activity ^<launchable-activity name^>
+       echo     --frame ^<frame number to capture^>
+       echo.
+       echo i.e. %~nx0 --serial 01234567
+       echo                        --abi arm64-v8a
+       echo                        --vktrace ../build/vktrace/vktrace
+       echo                        --vkreplay ./vkreplay/bin/NativeActivity-debug.apk
+       echo                        --apk ~/Downloads/foo.apk
+       echo                        --package com.example.foo
+       echo                        --activity android.app.NativeActivity
+       echo                        --frame 1
+      goto:finish
+   )
+
+   REM Check for even parameter count
+   set arg_count=0
+   for %%x in (%*) do set /A arg_count+=1
+   set /A even_args=arg_count%%2
+   if %even_args% neq 0 (
+       echo Arguments must be provided in pairs.
+       goto:finish
+   )
+
+   :parameterLoop
+
+      if "%1"=="" goto:parameterContinue
+
+      if "%1" == "--serial" (
+         set serial=%2
+         shift
+         shift
+         goto:parameterLoop
+      )
+
+      if "%1" == "--abi" (
+         set target_abi=%2
+         shift
+         shift
+         goto:parameterLoop
+      )
+
+      if "%1" == "--vktrace" (
+         set vktrace_exe=%2
+         shift
+         shift
+         goto:parameterLoop
+      )
+
+      if "%1" == "--vkreplay" (
+         set vkreplay_apk=%2
+         shift
+         shift
+         goto:parameterLoop
+      )
+
+      if "%1" == "--apk" (
+         set apk=%2
+         shift
+         shift
+         goto:parameterLoop
+      )
+
+      if "%1" == "--package" (
+         set package=%2
+         shift
+         shift
+         goto:parameterLoop
+      )
+
+      if "%1" == "--activity" (
+         set activity=%2
+         shift
+         shift
+         goto:parameterLoop
+      )
+
+      if "%1" == "--frame" (
+         set frame=%2
+         shift
+         shift
+         goto:parameterLoop
+      )
+
+      echo Unrecognized option "%1"
+      goto:error
+
+   :parameterContinue
+
+REM // ======== end Parameter parsing ======== //
+
+echo serial = %serial%
+echo target_abi = %target_abi%
+echo vktrace_exe = %vktrace_exe%
+echo vkreplay_apk = %vkreplay_apk%
+echo apk = %apk%
+echo package = %package%
+echo activity = %activity%
+echo frame = %frame%
+
+
+REM Wake up the device
+adb -s %serial% shell input keyevent "KEYCODE_MENU"
+adb -s %serial% shell input keyevent "KEYCODE_HOME"
+
+REM clean up anything lingering from previous runs
+adb -s %serial% shell am force-stop %package%
+adb -s %serial% shell am force-stop com.example.vkreplay
+
+adb -s %serial% uninstall %package% && echo continuing...
+adb -s %serial% uninstall com.example.vkreplay && echo continuing...
+
+adb -s %serial% shell rm -f /sdcard/Android/%frame%.ppm
+adb -s %serial% shell rm -f /sdcard/Android/%package%.%frame%.vktrace.ppm
+adb -s %serial% shell rm -f /sdcard/Android/%package%.%frame%.vkreplay.ppm
+del /Q /F %package%.%frame%.vktrace.ppm
+del /Q /F %package%.%frame%.vkreplay.ppm
+
+REM install the latest APK, possibly packaged with vktrace and screenshot
+adb -s %serial% install --abi %target_abi% %apk%
+
+REM install vkreplay APK
+adb -s %serial% install --abi %target_abi% %vkreplay_apk%
+
+REM trace and screenshot
+adb -s %serial% shell setprop debug.vulkan.layer.1 VK_LAYER_LUNARG_vktrace
+adb -s %serial% shell setprop debug.vulkan.layer.2 VK_LAYER_LUNARG_screenshot
+adb -s %serial% shell pm grant %package% android.permission.READ_EXTERNAL_STORAGE
+adb -s %serial% shell pm grant %package% android.permission.WRITE_EXTERNAL_STORAGE
+adb -s %serial% shell setprop debug.vulkan.screenshot %frame%
+
+REM vktrace
+adb -s %serial% reverse tcp:34201 tcp:34201
+start /b %vktrace_exe% -v full -o %package%.vktrace
+adb -s %serial% shell am start %package%/%activity%
+
+
+REM WAIT HERE FOR SCREENSHOT
+REM TODO THIS SHOULD BE A LOOP
+timeout /t 5
+
+taskkill /f /im vktrace.exe
+taskkill /f /im vktrace32.exe
+
+REM set up for vkreplay
+adb -s %serial% shell am force-stop %package%
+adb -s %serial% push %package%0.vktrace /sdcard/%package%.vktrace
+
+REM grab the screenshot
+adb -s %serial% pull /sdcard/Android/%frame%.ppm %package%.%frame%.vktrace.ppm
+adb -s %serial% shell mv /sdcard/Android/%frame%.ppm /sdcard/Android/%package%.%frame%.vktrace.ppm
+
+REM replay and screenshot
+adb -s %serial% shell setprop debug.vulkan.layer.1 VK_LAYER_LUNARG_screenshot
+adb -s %serial% shell setprop debug.vulkan.layer.2 '""'
+adb -s %serial% shell setprop debug.vulkan.screenshot %frame%
+adb -s %serial% shell pm grant com.example.vkreplay android.permission.READ_EXTERNAL_STORAGE
+adb -s %serial% shell pm grant com.example.vkreplay android.permission.WRITE_EXTERNAL_STORAGE
+REM small pause to allow the permissions to take effect
+timeout /t 5
+
+REM Wake up the device
+adb -s %serial% shell input keyevent "KEYCODE_MENU"
+adb -s %serial% shell input keyevent "KEYCODE_HOME"
+
+adb -s %serial% shell am start -a android.intent.action.MAIN -c android.intent.category.LAUNCHER -n com.example.vkreplay/android.app.NativeActivity --es args "-v\ full\ -t\ /sdcard/%package%.vktrace"
+
+REM WAIT HERE FOR SCREENSHOT
+REM TODO THIS SHOULD BE A LOOP
+timeout /t 5
+
+REM grab the screenshot
+adb -s %serial% pull /sdcard/Android/%frame%.ppm %package%.%frame%.vkreplay.ppm
+adb -s %serial% shell mv /sdcard/Android/%frame%.ppm /sdcard/Android/%package%.%frame%.vkreplay.ppm
+
+REM clean up
+adb -s %serial% shell am force-stop com.example.vkreplay
+adb -s %serial% shell setprop debug.vulkan.layer.1 '""'
+
+REM compare screenshots
+fc.exe /b %package%.%frame%.vktrace.ppm %package%.%frame%.vkreplay.ppm
+
+if %errorlevel% neq 0 (
+    echo Images DO NOT match - Failed
+    exit /b 1
+)
+
+if %errorlevel% equ 0 (
+    echo Images match - PASSED
+    exit /b 0
+)
+
+:error
+echo.
+echo Halting due to error
+goto:finish
+
+:finish
+endlocal
+goto:eof
diff --git a/build-android/vktracereplay.sh b/build-android/vktracereplay.sh
new file mode 100755
index 0000000..bf9a0c5
--- /dev/null
+++ b/build-android/vktracereplay.sh
@@ -0,0 +1,446 @@
+#!/bin/bash
+
+#set -vex
+
+script_start_time=$(date +%s)
+
+default_vkreplay_apk=./vkreplay/bin/NativeActivity-debug.apk
+default_vktrace_exe=../build/vktrace/vktrace
+default_vktrace32_exe=../build32/vktrace/vktrace32
+default_target_abi=$(adb shell getprop ro.product.cpu.abi)
+default_activity=android.app.NativeActivity
+default_frame=1
+
+#if [[ $(basename "$PWD") != "build-android" ]]
+#then
+#    echo "Please run this script from the build-android directory"
+#    exit 1
+#fi
+
+#
+# Parse parameters
+#
+
+function printUsage {
+   echo "Supported parameters are:"
+   echo "    --serial <target device serial number>"
+   echo "    --abi <abi to install>"
+   echo "    --vktrace <full path to vktrace on host> (optional)"
+   echo "    --vkreplay <full path to vkreplay APK> (optional)"
+   echo "    --apk <full path to APK>"
+   echo "    --package <package name>"
+   echo "    --actvity <launchable-activity name>"
+   echo "    --frame <frame number to capture>"
+   echo
+   echo "i.e. ${0##*/} --serial 01234567 \\"
+   echo "              --abi arm64-v8a \\"
+   echo "              --vktrace ../build/vktrace/vktrace \\"
+   echo "              --vkreplay ./vkreplay/bin/NativeActivity-debug.apk \\"
+   echo "              --apk ~/Downloads/foo.apk.apk \\"
+   echo "              --package com.example.foo \\"
+   echo "              --actvity android.app.NativeActivity \\"
+   echo "              --frame 1"
+   exit 1
+}
+
+if [[ $(($# % 2)) -ne 0 ]]
+then
+    echo Parameters must be provided in pairs.
+    echo parameter count = $#
+    echo
+    printUsage
+    exit 1
+fi
+
+while [[ $# -gt 0 ]]
+do
+    case $1 in
+        --serial)
+            # include the flag, because we need to leave it off if not provided
+            serial="$2"
+            serialFlag="-s $serial"
+            shift 2
+            ;;
+        --abi)
+            target_abi="$2"
+            shift 2
+            ;;
+        --vktrace)
+            vktrace_exe="$2"
+            shift 2
+            ;;
+        --vkreplay)
+            vkreplay_apk="$2"
+            shift 2
+            ;;
+        --apk)
+            apk="$2"
+            shift 2
+            ;;
+        --package)
+            package="$2"
+            shift 2
+            ;;
+        --activity)
+            activity="$2"
+            shift 2
+            ;;
+        --frame)
+            frame="$2"
+            shift 2
+            ;;
+        -*)
+            # unknown option
+            echo Unknown option: $1
+            echo
+            printUsage
+            exit 1
+            ;;
+    esac
+done
+
+echo serial = $serial
+
+if [[ -z $serial ]]
+then
+    echo Please provide a serial number.
+    echo
+    printUsage
+    exit 1
+fi
+
+if [[ $(adb devices) != *"$serial"* ]];
+then
+    echo Device not found: $serial
+    echo
+    printUsage
+    exit 1
+fi
+
+if [[ -z $target_abi ]];
+then
+    echo Using default target_abi
+    target_abi=$default_target_abi
+fi
+echo target_abi = $target_abi
+
+if [[ $target_abi == "armeabi-v7a" ]] ||
+   [[ $target_abi == "mips" ]] ||
+   [[ $target_abi == "x86" ]];
+then
+    echo Targeting 32-bit abi $target_abi
+    target_32bit_abi=1
+fi
+
+if [[ -z $vktrace_exe ]];
+then
+    echo Using default vktrace_exe
+    vktrace_exe=$default_vktrace_exe
+    if [[ $target_32bit_abi ]]
+    then
+        vktrace_exe=$default_vktrace32_exe
+    fi
+else
+    if [[ $target_32bit_abi ]]
+    then
+       echo Ensure your vktrace is 32-bit, i.e. vktrace32
+    fi
+fi
+echo vktrace_exe = $vktrace_exe
+
+if [[ -z $vkreplay_apk ]];
+then
+    echo Using default vkreplay_apk
+    vkreplay_apk=$default_vkreplay_apk
+fi
+echo vkreplay_apk = $vkreplay_apk
+
+if [[ -z $apk ]];
+then
+    echo target APK required
+    exit 1
+fi
+echo apk = $apk
+
+if [[ -z $package ]];
+then
+    echo target package name required
+    exit 1
+fi
+echo package = $package
+
+if [[ -z $activity ]];
+then
+    echo Using default activity
+    activity=$default_activity
+fi
+echo activity = $activity
+
+if [[ -z $frame ]];
+then
+    echo Using default screenshot frame
+    frame=$default_frame
+fi
+echo frame = $frame
+
+function printLayerBuild() {
+    echo "To build layers:"
+    echo "./update_external_sources_android.sh"
+    echo "./android-generate.sh"
+    echo "ndk-build -j"
+}
+
+function printvkreplayBuild() {
+    echo "To build vkreplay apk"
+    echo "android update project -s -p . -t \"android-23\""
+    echo "ndk-build -j"
+    echo "ant -buildfile vkreplay debug"
+}
+
+function printvktraceBuild() {
+    echo "To build vktrace"
+    echo "pushd .."
+    echo "mkdir build"
+    echo "cd build"
+    echo "cmake -DCMAKE_BUILD_TYPE=Debug .."
+    echo "make -j"
+}
+
+#
+# If any parameter not found, print how to build it
+#
+
+if [ ! -f $vkreplay_apk ]; then
+    echo "$vkreplay_apk not found!"
+    printvkreplayBuild
+    exit 1
+fi
+
+if [ ! -f $vktrace_exe ]; then
+    echo "$vktrace_exe not found!"
+    printvktraceBuild
+    exit 1
+fi
+
+#
+# Check for required tools
+#
+
+adb_path=$(which adb)
+if [[ $? == 0 ]];
+then
+    echo using $adb_path
+else
+    echo adb not found, exiting
+    echo check your Android SDK for it and add it to your path
+    exit 1
+fi
+aapt_path=$(which aapt)
+if [[ $? == 0 ]];
+then
+    echo using $aapt_path
+else
+    echo aapt not found, exiting
+    echo check your Android SDK for it and add it to your path
+    exit 1
+fi
+jar_path=$(which jar)
+if [[ $? == 0 ]];
+then
+    echo using $jar_path
+else
+    echo jar not found, exiting
+    exit 1
+fi
+
+#
+# Ensure APKs can be traced
+#
+
+apk_badging=$(aapt dump badging $apk)
+if [[ $apk_badging != *"uses-permission: name='android.permission.READ_EXTERNAL_STORAGE'"* ]] ||
+   [[ $apk_badging != *"uses-permission: name='android.permission.WRITE_EXTERNAL_STORAGE'"* ]];
+then
+    echo Please package APK with the following permissions:
+    echo     android.permission.READ_EXTERNAL_STORAGE
+    echo     android.permission.WRITE_EXTERNAL_STORAGE
+    exit 1
+fi
+apk_contents=$(jar tvf $apk)
+if [[ $apk_contents != *"libVkLayer_screenshot.so"* ]] ||
+   [[ $apk_contents != *"libVkLayer_vktrace_layer.so"* ]];
+then
+    echo Your APK does not contain the following layers:
+    echo     libVkLayer_screenshot.so
+    echo     libVkLayer_vktrace_layer.so
+    echo You\'ll need to provide them another way.
+    echo Continuing...
+fi
+
+#
+# Start up
+#
+
+# We want to halt on errors here
+set -e
+
+# Wake up the device
+adb $serialFlag shell input keyevent "KEYCODE_MENU"
+adb $serialFlag shell input keyevent "KEYCODE_HOME"
+
+# clean up anything lingering from previous runs
+adb $serialFlag shell am force-stop $package
+adb $serialFlag shell am force-stop com.example.vkreplay
+if [[ $(adb $serialFlag shell pm list packages $package) == "package:$package" ]]
+then
+    adb $serialFlag uninstall $package && echo continuing...
+fi
+if [[ $(adb $serialFlag shell pm list packages com.example.vkreplay) == "package:com.example.vkreplay" ]]
+then
+    adb $serialFlag uninstall com.example.vkreplay && echo continuing...
+fi
+adb $serialFlag shell rm -f /sdcard/Android/$frame.ppm
+adb $serialFlag shell rm -f /sdcard/Android/$package.$frame.vktrace.ppm
+adb $serialFlag shell rm -f /sdcard/Android/$package.$frame.vkreplay.ppm
+rm -f $package.$frame.vktrace.ppm
+rm -f $package.$frame.vkreplay.ppm
+rm -f $package.vktrace
+rm -f $package\0.vktrace
+
+# Ensure vktrace wasn't already running
+let "script_run_time=$(date +%s)-$script_start_time"
+killall --older-than "$script_run_time"s vktrace || echo continuing...
+
+# install the latest APK, possibly packaged with vktrace and screenshot
+adb $serialFlag install --abi $target_abi $apk
+
+# install vkreplay APK
+adb $serialFlag install --abi $target_abi $vkreplay_apk
+
+# trace and screenshot
+adb $serialFlag shell setprop debug.vulkan.layer.1 VK_LAYER_LUNARG_vktrace
+adb $serialFlag shell setprop debug.vulkan.layer.2 VK_LAYER_LUNARG_screenshot
+adb $serialFlag shell pm grant $package android.permission.READ_EXTERNAL_STORAGE
+adb $serialFlag shell pm grant $package android.permission.WRITE_EXTERNAL_STORAGE
+adb $serialFlag shell setprop debug.vulkan.screenshot $frame
+
+# vktrace
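+# Route the device-side trace layer's abstract socket to the host trace
+# server listening on tcp:34201, so the capture is written on the host.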
+adb $serialFlag reverse localabstract:vktrace tcp:34201
+$vktrace_exe -v full -o $package.vktrace &
+adb $serialFlag shell am start $package/$activity
+
+# don't halt on error for this loop
+set +e
+
+# wait until trace screenshot arrives, or a timeout
+vktrace_seconds=300                                    # Duration in seconds.
+vktrace_end_time=$(( $(date +%s) + vktrace_seconds ))  # Calculate end time.
+sleep 5 # pause to let the screenshot write finish
+until adb $serialFlag shell ls -la /sdcard/Android/$frame.ppm
+do
+    echo "Waiting for $package.vktrace screenshot on $serial"
+
+    if [ $(date +%s) -gt $vktrace_end_time ]
+    then
+        echo "vktrace timeout reached: $vktrace_seconds seconds"
+        echo "No vktrace screenshot, closing down"
+        exit 1
+    fi
+
+    sleep 5
+done
+
+# stop our background vktrace
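+# ($! expands to the PID of the most recently backgrounded job, i.e. vktrace)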
+kill $!
+
+# pause for a moment to let our trace file finish writing
+sleep 1
+
+# halt on errors here
+set -e
+
+# set up for vkreplay
+adb $serialFlag shell am force-stop $package
+if [ -f $package.vktrace ]; then
+    adb $serialFlag push $package.vktrace /sdcard/$package.vktrace
+fi
+if [ -f $package\0.vktrace ]; then
+    adb $serialFlag push $package\0.vktrace /sdcard/$package.vktrace
+fi
+
+# grab the screenshot
+adb $serialFlag pull /sdcard/Android/$frame.ppm $package.$frame.vktrace.ppm
+adb $serialFlag shell mv /sdcard/Android/$frame.ppm /sdcard/Android/$package.$frame.vktrace.ppm
+
+# replay and screenshot
+adb $serialFlag shell setprop debug.vulkan.layer.1 VK_LAYER_LUNARG_screenshot
+adb $serialFlag shell setprop debug.vulkan.layer.2 '""'
+adb $serialFlag shell setprop debug.vulkan.screenshot $frame
+adb $serialFlag shell pm grant com.example.vkreplay android.permission.READ_EXTERNAL_STORAGE
+adb $serialFlag shell pm grant com.example.vkreplay android.permission.WRITE_EXTERNAL_STORAGE
+sleep 5 # small pause to allow the permissions to take effect
+
+# Wake up the device
+adb $serialFlag shell input keyevent "KEYCODE_MENU"
+adb $serialFlag shell input keyevent "KEYCODE_HOME"
+
+adb $serialFlag shell am start -a android.intent.action.MAIN -c android.intent.category.LAUNCHER -n com.example.vkreplay/android.app.NativeActivity --es args "-v\ full\ -t\ /sdcard/$package.vktrace"
+
+# don't halt on the errors in this loop
+set +e
+
+# wait until vkreplay screenshot arrives, or a timeout
+vkreplay_seconds=300                                     # Duration in seconds.
+vkreplay_end_time=$(( $(date +%s) + vkreplay_seconds ))  # Calculate end time.
+sleep 5 # pause to let the screenshot write finish
+until adb $serialFlag shell ls -la /sdcard/Android/$frame.ppm
+do
+    echo "Waiting for vkreplay screenshot on $serial"
+
+    if [ $(date +%s) -gt $vkreplay_end_time ]
+    then
+        echo "vkreplay timeout reached: $vkreplay_seconds seconds"
+        echo "No vkreplay screenshot, closing down"
+        exit 1
+    fi
+    sleep 5
+done
+
+# halt on any errors here
+set -e
+
+# grab the screenshot
+adb $serialFlag pull /sdcard/Android/$frame.ppm $package.$frame.vkreplay.ppm
+adb $serialFlag shell mv /sdcard/Android/$frame.ppm /sdcard/Android/$package.$frame.vkreplay.ppm
+
+# clean up
+adb $serialFlag shell am force-stop com.example.vkreplay
+adb $serialFlag shell setprop debug.vulkan.layer.1 '""'
+
+# don't halt in the exit code below
+set +e
+
+# the rest is a quick port from the desktop vktracereplay.sh
+
+if [ -t 1 ] ; then
+    RED='\033[0;31m'
+    GREEN='\033[0;32m'
+    NC='\033[0m' # No Color
+else
+    RED=''
+    GREEN=''
+    NC=''
+fi
+
+cmp -s $package.$frame.vktrace.ppm $package.$frame.vkreplay.ppm
+
+if [ $? -eq 0 ] ; then
+    printf "$GREEN[  PASSED  ]$NC {$apk-$package}\n"
+else
+    printf "$RED[  FAILED  ]$NC screenshot file compare failed\n"
+    printf "$RED[  FAILED  ]$NC {$apk-$package}\n"
+    printf "TEST FAILED\n"
+    exit 1
+fi
+
+exit 0
diff --git a/cmake/FindPCIAccess.cmake b/cmake/FindPCIAccess.cmake
index 65f7d5c..d3c288c 100644
--- a/cmake/FindPCIAccess.cmake
+++ b/cmake/FindPCIAccess.cmake
@@ -1,6 +1,7 @@
 # - FindPCIAccess
 #
-# Copyright 2015 Valve Corporation
+# Copyright (C) 2015-2016 Valve Corporation
+# Copyright (C) 2015-2016 LunarG, Inc.
 
 find_package(PkgConfig)
 
diff --git a/cmake/FindPthreadStubs.cmake b/cmake/FindPthreadStubs.cmake
index 063bbe5..9fc33d5 100644
--- a/cmake/FindPthreadStubs.cmake
+++ b/cmake/FindPthreadStubs.cmake
@@ -1,6 +1,7 @@
 # - FindPthreadStubs
 #
-# Copyright (C) 2015 Valve Corporation
+# Copyright (C) 2015-2016 Valve Corporation
+# Copyright (C) 2015-2016 LunarG, Inc.
 
 find_package(PkgConfig)
 
diff --git a/cmake/FindUDev.cmake b/cmake/FindUDev.cmake
index e3d1699..d2727ad 100644
--- a/cmake/FindUDev.cmake
+++ b/cmake/FindUDev.cmake
@@ -1,6 +1,7 @@
 # - FindUDev
 #
-# Copyright (C) 2015 Valve Corporation
+# Copyright (C) 2015-2016 Valve Corporation
+# Copyright (C) 2015-2016 LunarG, Inc.
 
 find_package(PkgConfig)
 
diff --git a/cmake/FindValgrind.cmake b/cmake/FindValgrind.cmake
index 5c1fb56..fdc4d7a 100644
--- a/cmake/FindValgrind.cmake
+++ b/cmake/FindValgrind.cmake
@@ -1,6 +1,7 @@
 # - FindValgrind
 #
-# Copyright (C) 2015 Valve Corporation
+# Copyright (C) 2015-2016 Valve Corporation
+# Copyright (C) 2015-2016 LunarG, Inc.
 
 find_package(PkgConfig)
 
diff --git a/cmake/FindXCB.cmake b/cmake/FindXCB.cmake
index 2311591..360b8ee 100644
--- a/cmake/FindXCB.cmake
+++ b/cmake/FindXCB.cmake
@@ -1,6 +1,7 @@
 # - FindXCB
 #
-# Copyright (C) 2015 Valve Corporation
+# Copyright (C) 2015-2016 Valve Corporation
+# Copyright (C) 2015-2016 LunarG, Inc.
 
 find_package(PkgConfig)
 
diff --git a/dri3.md b/dri3.md
new file mode 100644
index 0000000..dade002
--- /dev/null
+++ b/dri3.md
@@ -0,0 +1,250 @@
+# Additional Linux Configuration for the Sample Intel Vulkan Driver
+
+The sample Intel Vulkan driver in this repo uses DRI3 for its window system interface.
+This requires extra configuration of Ubuntu systems.
+
+You may need to install the following packages:
+```
+sudo apt-get install git subversion cmake libgl1-mesa-dev freeglut3-dev libglm-dev libmagickwand-dev qt5-default libpciaccess-dev libpthread-stubs0-dev libudev-dev bison graphviz libpng-dev python3-lxml
+sudo apt-get build-dep mesa
+```
+
+## Linux Render Nodes
+
+The render tests depend on access to DRM render nodes.
+To make them available, two config files need to be created: one to set a module option
+and one to make the device files accessible.
+The system will need to be rebooted with these files in place to complete initialization.
+These commands will create the config files.
+
+```
+sudo tee /etc/modprobe.d/drm.conf << EOF
+# Enable render nodes
+options drm rnodes=1
+EOF
+# this will add the rnodes=1 option into the boot environment
+sudo update-initramfs -k all -u
+```
+```
+sudo tee /etc/udev/rules.d/drm.rules << EOF
+# Add permissions to render nodes
+SUBSYSTEM=="drm", ACTION=="add", DEVPATH=="/devices/*/renderD*", MODE="0666"
+EOF
+```
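+
+After rebooting, it is worth confirming that the render nodes exist and are
+accessible before running the tests. A minimal check, assuming a single GPU
+that exposes renderD128, might look like:
+```
+# render nodes appear under /dev/dri once the udev rule and module option apply
+ls -l /dev/dri/renderD*
+# confirm the drm module picked up rnodes=1 after the reboot
+cat /sys/module/drm/parameters/rnodes
+```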
+## DRI 3
+Find your Ubuntu release below:
+
+### Ubuntu 14.04.3 LTS support of DRI 3
+
+Ubuntu 14.04.3 LTS does not ship an xserver-xorg-video-intel package with DRI 3 support enabled on Intel graphics.
+The xserver-xorg-video-intel package can be built from source with DRI 3 enabled.
+Use the following commands to enable DRI3 on Ubuntu 14.04.3 LTS.
+
+- Install packages used to build:
+```
+sudo apt-get update
+sudo apt-get dist-upgrade
+sudo apt-get install devscripts
+sudo apt-get build-dep xserver-xorg-video-intel-lts-vivid
+```
+
+- Get the source code for xserver-xorg-video-intel-lts-vivid
+```
+mkdir xserver-xorg-video-intel-lts-vivid_source
+cd xserver-xorg-video-intel-lts-vivid_source
+apt-get source xserver-xorg-video-intel-lts-vivid
+cd xserver-xorg-video-intel-lts-vivid-2.99.917
+debian/rules patch
+quilt new 'enable-DRI3'
+quilt edit configure.ac
+```
+
+- Use the editor to make these changes.
+```
+--- a/configure.ac
++++ b/configure.ac
+@@ -340,9 +340,9 @@
+ 	      [DRI2=yes])
+ AC_ARG_ENABLE(dri3,
+ 	      AS_HELP_STRING([--enable-dri3],
+-			     [Enable DRI3 support [[default=no]]]),
++			     [Enable DRI3 support [[default=yes]]]),
+ 	      [DRI3=$enableval],
+-	      [DRI3=no])
++	      [DRI3=yes])
+ AC_ARG_ENABLE(xvmc, AS_HELP_STRING([--disable-xvmc],
+                                   [Disable XvMC support [[default=yes]]]),
+```
+- Build and install xserver-xorg-video-intel-lts-vivid
+```
+quilt refresh
+debian/rules clean
+debuild -us -uc
+sudo dpkg -i ../xserver-xorg-video-intel-lts-vivid_2.99.917-1~exp1ubuntu2.2~trusty1_amd64.deb
+```
+- Prevent updates from replacing this version of the package.
+```
+sudo bash -c 'echo xserver-xorg-video-intel-lts-vivid hold | dpkg --set-selections'
+```
+- Save your work, then restart the X server with the next command.
+```
+sudo service lightdm restart
+```
+- After logging in again, check for success with this command and look for DRI3.
+```
+xdpyinfo | grep DRI
+```
+
+### Ubuntu 14.10 support of DRI 3
+
+Warning: Recent versions of 14.10 have **REMOVED** DRI 3.
+Version: 2:2.99.914-1~exp1ubuntu4.1 is known to work.
+To see the status of this package:
+```
+dpkg -s xserver-xorg-video-intel
+```
+
+Note: Version 2:2.99.914-1~exp1ubuntu4.2 no longer works.
+To install the working driver from launchpadlibrarian.net:
+- Remove the current driver:
+```
+sudo apt-get purge xserver-xorg-video-intel
+```
+- Download the old driver:
+```
+wget http://launchpadlibrarian.net/189418339/xserver-xorg-video-intel_2.99.914-1%7Eexp1ubuntu4.1_amd64.deb
+```
+- Install the driver:
+```
+sudo dpkg -i xserver-xorg-video-intel_2.99.914-1~exp1ubuntu4.1_amd64.deb
+```
+- Pin the package to prevent updates:
+```
+sudo bash -c "echo $'Package: xserver-xorg-video-intel\nPin: version 2:2.99.914-1~exp1ubuntu4.1\nPin-Priority: 1001' > /etc/apt/preferences.d/xserver-xorg-video-intel"
+```
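+- To verify the pin took effect, `apt-cache policy` should report version
+2:2.99.914-1~exp1ubuntu4.1 with a pin priority of 1001:
+```
+apt-cache policy xserver-xorg-video-intel
+```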
+
+- Either restart Ubuntu or just X11.
+
+
+### Ubuntu 15.04 support of DRI 3
+
+Ubuntu 15.04 has never shipped an xserver-xorg-video-intel package with DRI 3 support enabled on Intel graphics.
+The xserver-xorg-video-intel package can be built from source with DRI 3 enabled.
+Use the following commands to enable DRI3 on Ubuntu 15.04.
+
+- Install packages used to build:
+```
+sudo apt-get update
+sudo apt-get dist-upgrade
+sudo apt-get install devscripts
+sudo apt-get build-dep xserver-xorg-video-intel
+```
+
+- Get the source code for xserver-xorg-video-intel
+```
+mkdir xserver-xorg-video-intel_source
+cd xserver-xorg-video-intel_source
+apt-get source xserver-xorg-video-intel
+cd xserver-xorg-video-intel-2.99.917
+debian/rules patch
+quilt new 'enable-DRI3'
+quilt edit configure.ac
+```
+
+- Use the editor to make these changes.
+```
+--- a/configure.ac
++++ b/configure.ac
+@@ -340,9 +340,9 @@
+ 	      [DRI2=yes])
+ AC_ARG_ENABLE(dri3,
+ 	      AS_HELP_STRING([--enable-dri3],
+-			     [Enable DRI3 support [[default=no]]]),
++			     [Enable DRI3 support [[default=yes]]]),
+ 	      [DRI3=$enableval],
+-	      [DRI3=no])
++	      [DRI3=yes])
+ AC_ARG_ENABLE(xvmc, AS_HELP_STRING([--disable-xvmc],
+                                   [Disable XvMC support [[default=yes]]]),
+```
+- Build and install xserver-xorg-video-intel
+```
+quilt refresh
+debian/rules clean
+debuild -us -uc
+sudo dpkg -i ../xserver-xorg-video-intel_2.99.917-1~exp1ubuntu2.2_amd64.deb
+```
+- Prevent updates from replacing this version of the package.
+```
+sudo bash -c 'echo xserver-xorg-video-intel hold | dpkg --set-selections'
+```
+- Save your work, then restart the X server with the next command.
+```
+sudo service lightdm restart
+```
+- After logging in again, check for success with this command and look for DRI3.
+```
+xdpyinfo | grep DRI
+```
+### Ubuntu 15.10 support of DRI 3
+
+Ubuntu 15.10 has never shipped an xserver-xorg-video-intel package with DRI 3 support enabled on Intel graphics.
+The xserver-xorg-video-intel package can be built from source with DRI 3 enabled.
+Use the following commands to enable DRI3 on Ubuntu 15.10.
+
+- Install packages used to build:
+```
+sudo apt-get update
+sudo apt-get dist-upgrade
+sudo apt-get install devscripts
+sudo apt-get build-dep xserver-xorg-video-intel
+```
+
+- Get the source code for xserver-xorg-video-intel
+```
+mkdir xserver-xorg-video-intel_source
+cd xserver-xorg-video-intel_source
+apt-get source xserver-xorg-video-intel
+cd xserver-xorg-video-intel-2.99.917+git20150808
+debian/rules patch
+quilt new 'enable-DRI3'
+quilt edit configure.ac
+```
+
+- Use the editor to make these changes.
+```
+Index: xserver-xorg-video-intel-2.99.917+git20150808/configure.ac
+===================================================================
+--- xserver-xorg-video-intel-2.99.917+git20150808.orig/configure.ac
++++ xserver-xorg-video-intel-2.99.917+git20150808/configure.ac
+@@ -356,7 +356,7 @@ AC_ARG_WITH(default-dri,
+            AS_HELP_STRING([--with-default-dri],
+                           [Select the default maximum DRI level [default 2]]),
+              [DRI_DEFAULT=$withval],
+-             [DRI_DEFAULT=2])
++             [DRI_DEFAULT=3])
+ if test "x$DRI_DEFAULT" = "x0"; then
+        AC_DEFINE(DEFAULT_DRI_LEVEL, 0,[Default DRI level])
+ else
+```
+- Build and install xserver-xorg-video-intel
+```
+quilt refresh
+debian/rules clean
+debuild -us -uc
+sudo dpkg -i ../xserver-xorg-video-intel_2.99.917+git20150808-0ubuntu4_amd64.deb
+```
+- Prevent updates from replacing this version of the package.
+```
+sudo bash -c 'echo xserver-xorg-video-intel hold | dpkg --set-selections'
+```
+- Save your work, then restart the X server with the next command.
+```
+sudo service lightdm restart
+```
+- After logging in again, check for success with this command and look for DRI3.
+```
+xdpyinfo | grep DRI
+```
+
diff --git a/icd/CMakeLists.txt b/icd/CMakeLists.txt
new file mode 100644
index 0000000..ecc2bb6
--- /dev/null
+++ b/icd/CMakeLists.txt
@@ -0,0 +1,19 @@
+function(add_compiler_flag flag)
+    set(CMAKE_C_FLAGS    "${CMAKE_C_FLAGS}   ${flag}" PARENT_SCOPE)
+    set(CMAKE_CXX_FLAGS  "${CMAKE_CXX_FLAGS} ${flag}" PARENT_SCOPE)
+endfunction()
+
+if (${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+    add_compiler_flag("-DPLATFORM_LINUX=1")
+    add_compiler_flag("-DPLATFORM_POSIX=1")
+elseif (${CMAKE_SYSTEM_NAME} MATCHES "Windows")
+    add_compiler_flag("-DPLATFORM_WINDOWS=1")
+else()
+    message(FATAL_ERROR "Platform unset, build will fail--stopping at CMake time.")
+endif()
+
+add_subdirectory(common)
+add_subdirectory(nulldrv)
+if (NOT WIN32)
+    add_subdirectory(intel)
+endif()
diff --git a/icd/README.md b/icd/README.md
new file mode 100644
index 0000000..616f978
--- /dev/null
+++ b/icd/README.md
@@ -0,0 +1,26 @@
+This sample driver implementation provides multiple subcomponents required to build and test an Installable Client Driver (ICD):
+- [Common Infrastructure](common)
+- [Implementation for Intel GPUs](intel)
+- [Null driver](nulldrv)
+- [*Sample Driver Tests*](../tests)
+    - Now includes Golden images to verify vk_render_tests rendering.
+
+common/ provides helper and utility functions, as well as all VK entry points
+except vkInitAndEnumerateGpus. Hardware drivers are required to provide that
+function, and to embed a "VkLayerDispatchTable *" as the first member of
+VkPhysicalDevice and of every VkBaseObject.
+
+## Thread safety
+
+We have these static variables:
+
+  - common/icd.c: static struct icd icd;
+  - intel/gpu.c: static struct intel_gpu *intel_gpus;
+
+They require that no other thread is calling the ICD when these
+functions are called:
+
+  - vkInitAndEnumerateGpus
+  - vkDbgRegisterMsgCallback
+  - vkDbgUnregisterMsgCallback
+  - vkDbgSetGlobalOption
diff --git a/icd/common/CMakeLists.txt b/icd/common/CMakeLists.txt
new file mode 100644
index 0000000..4fec186
--- /dev/null
+++ b/icd/common/CMakeLists.txt
@@ -0,0 +1,25 @@
+if (WIN32)
+    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D_CRT_SECURE_NO_WARNINGS")
+endif()
+
+set(sources
+    icd-format.c
+    icd-instance.c
+    icd-utils.c)
+
+set(include_dirs "")
+set(libraries "")
+
+if(UNIX)
+    find_package(UDev REQUIRED)
+    list(APPEND include_dirs ${UDEV_INCLUDE_DIRS})
+    list(APPEND libraries ${UDEV_LIBRARIES})
+    list(APPEND sources icd-enumerate-drm.c)
+endif()
+
+add_library(icd STATIC ${sources})
+target_include_directories(icd
+    PRIVATE ${include_dirs}
+    INTERFACE ${CMAKE_CURRENT_SOURCE_DIR})
+target_link_libraries(icd ${libraries})
+set_target_properties(icd PROPERTIES POSITION_INDEPENDENT_CODE ON)
diff --git a/icd/common/icd-enumerate-drm.c b/icd/common/icd-enumerate-drm.c
new file mode 100644
index 0000000..6b4d137
--- /dev/null
+++ b/icd/common/icd-enumerate-drm.c
@@ -0,0 +1,205 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <libudev.h>
+
+#include "icd-instance.h"
+#include "icd-utils.h"
+#include "icd-enumerate-drm.h"
+
+static enum icd_drm_minor_type get_minor_type(struct udev_device *minor_dev)
+{
+    const char *minor;
+
+    minor = udev_device_get_property_value(minor_dev, "MINOR");
+    if (!minor)
+        return ICD_DRM_MINOR_INVALID;
+
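+    /*
+     * DRM minor numbers are allocated in blocks of 64: 0-63 are the legacy
+     * (card*) nodes, 64-127 are control nodes, and 128-191 are the render
+     * (renderD*) nodes, so shifting by 6 yields 0 and 2 respectively.
+     */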
+    switch (atoi(minor) >> 6) {
+    case 0:
+        return ICD_DRM_MINOR_LEGACY;
+    case 2:
+        return ICD_DRM_MINOR_RENDER;
+    default:
+        return ICD_DRM_MINOR_INVALID;
+    }
+}
+
+static void get_pci_id(struct udev_device *pci_dev, int *vendor, int *devid)
+{
+    const char *pci_id;
+
+    pci_id = udev_device_get_property_value(pci_dev, "PCI_ID");
+    /* guard against a missing PCI_ID property before parsing */
+    if (!pci_id || sscanf(pci_id, "%x:%x", vendor, devid) != 2) {
+        *vendor = 0;
+        *devid = 0;
+    }
+}
+
+static struct icd_drm_device *find_dev(struct icd_drm_device *devices,
+                                       const char *parent_syspath)
+{
+    struct icd_drm_device *dev = devices;
+
+    while (dev) {
+        if (!strcmp((const char *) dev->id, parent_syspath))
+            break;
+        dev = dev->next;
+    }
+
+    return dev;
+}
+
+static struct icd_drm_device *probe_syspath(const struct icd_instance *instance,
+                                            struct icd_drm_device *devices,
+                                            struct udev *udev, const char *syspath,
+                                            int vendor_id_match)
+{
+    struct udev_device *minor, *parent;
+    enum icd_drm_minor_type type;
+    const char *parent_syspath;
+    struct icd_drm_device *dev;
+    int vendor, devid;
+
+    minor = udev_device_new_from_syspath(udev, syspath);
+    if (!minor)
+        return devices;
+
+    type = get_minor_type(minor);
+    if (type == ICD_DRM_MINOR_INVALID) {
+        udev_device_unref(minor);
+        return devices;
+    }
+
+    parent = udev_device_get_parent(minor);
+    if (!parent) {
+        udev_device_unref(minor);
+        return devices;
+    }
+
+    get_pci_id(parent, &vendor, &devid);
+    if (vendor_id_match && vendor != vendor_id_match) {
+        udev_device_unref(minor);
+        return devices;
+    }
+
+    parent_syspath = udev_device_get_syspath(parent);
+
+    dev = find_dev(devices, parent_syspath);
+    if (dev) {
+        assert(dev->devid == devid);
+
+        assert(!dev->minors[type]);
+        if (dev->minors[type])
+            udev_device_unref((struct udev_device *) dev->minors[type]);
+
+        dev->minors[type] = (void *) minor;
+
+        return devices;
+    } else {
+        dev = icd_instance_alloc(instance, sizeof(*dev), sizeof(int),
+                VK_SYSTEM_ALLOCATION_SCOPE_COMMAND);
+        if (!dev)
+            return devices;
+
+        memset(dev, 0, sizeof(*dev));
+
+        dev->id = (const void *) parent_syspath;
+        dev->devid = devid;
+        dev->minors[type] = (void *) minor;
+
+        dev->next = devices;
+
+        return dev;
+    }
+}
+
+struct icd_drm_device *icd_drm_enumerate(const struct icd_instance *instance,
+                                         int vendor_id)
+{
+    struct icd_drm_device *devices = NULL;
+    struct udev *udev;
+    struct udev_enumerate *e;
+    struct udev_list_entry *entry;
+
+    udev = udev_new();
+    if (udev == NULL) {
+        icd_instance_log(instance, VK_DEBUG_REPORT_ERROR_BIT_EXT,
+                         0, VK_NULL_HANDLE,     /* obj_type, object */
+                         0, 0,                  /* location, msg_code */
+                         "failed to initialize udev context");
+
+        return NULL;
+    }
+
+    e = udev_enumerate_new(udev);
+    if (e == NULL) {
+        icd_instance_log(instance, VK_DEBUG_REPORT_ERROR_BIT_EXT,
+                         0, VK_NULL_HANDLE,     /* obj_type, object */
+                         0, 0,                  /* location, msg_code */
+                         "failed to initialize udev enumerate context");
+        udev_unref(udev);
+
+        return NULL;
+    }
+
+    /* we are interested in DRM minors */
+    udev_enumerate_add_match_subsystem(e, "drm");
+    udev_enumerate_add_match_property(e, "DEVTYPE", "drm_minor");
+    udev_enumerate_scan_devices(e);
+
+    udev_list_entry_foreach(entry, udev_enumerate_get_list_entry(e)) {
+        devices = probe_syspath(instance, devices, udev,
+                udev_list_entry_get_name(entry), vendor_id);
+    }
+
+    udev_enumerate_unref(e);
+    udev_unref(udev);
+    return devices;
+}
+
+void icd_drm_release(const struct icd_instance *instance,
+                     struct icd_drm_device *devices)
+{
+    struct icd_drm_device *dev = devices;
+
+    while (dev) {
+        struct icd_drm_device *next = dev->next;
+        size_t i;
+
+        for (i = 0; i < ARRAY_SIZE(dev->minors); i++)
+            udev_device_unref((struct udev_device *) dev->minors[i]);
+
+        icd_instance_free(instance, dev);
+        dev = next;
+    }
+}
+
+const char *icd_drm_get_devnode(struct icd_drm_device *dev,
+                                enum icd_drm_minor_type minor)
+{
+    return (dev->minors[minor]) ?
+        udev_device_get_devnode((struct udev_device *) dev->minors[minor]) :
+        NULL;
+}
diff --git a/icd/common/icd-enumerate-drm.h b/icd/common/icd-enumerate-drm.h
new file mode 100644
index 0000000..ef0ea3b
--- /dev/null
+++ b/icd/common/icd-enumerate-drm.h
@@ -0,0 +1,53 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#ifndef ICD_ENUMERATE_DRM_H
+#define ICD_ENUMERATE_DRM_H
+
+enum icd_drm_minor_type {
+    ICD_DRM_MINOR_LEGACY,
+    ICD_DRM_MINOR_RENDER,
+
+    ICD_DRM_MINOR_COUNT,
+    ICD_DRM_MINOR_INVALID,
+};
+
+struct icd_drm_device {
+    const void *id;
+    int devid;
+
+    void *minors[ICD_DRM_MINOR_COUNT];
+
+    struct icd_drm_device *next;
+};
+
+struct icd_instance;
+
+struct icd_drm_device *icd_drm_enumerate(const struct icd_instance *instance,
+                                         int vendor_id);
+void icd_drm_release(const struct icd_instance *instance,
+                     struct icd_drm_device *devices);
+
+const char *icd_drm_get_devnode(struct icd_drm_device *dev,
+                                enum icd_drm_minor_type minor);
+
+#endif /* ICD_ENUMERATE_DRM_H */
diff --git a/icd/common/icd-format.c b/icd/common/icd-format.c
new file mode 100644
index 0000000..b4fefca
--- /dev/null
+++ b/icd/common/icd-format.c
@@ -0,0 +1,746 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Jeremy Hayes <jeremy@lunarg.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ *
+ */
+
+#include <string.h> /* for memcpy */
+#include "icd-utils.h"
+#include "icd-format.h"
+
+static const struct icd_format_info {
+    size_t size;
+    uint32_t channel_count;
+} icd_format_table[VK_FORMAT_RANGE_SIZE] = {
+    [VK_FORMAT_UNDEFINED]            = { 0,  0 },
+    [VK_FORMAT_R4G4_UNORM_PACK8]           = { 1,  2 },
+    [VK_FORMAT_R4G4B4A4_UNORM_PACK16]       = { 2,  4 },
+    [VK_FORMAT_B4G4R4A4_UNORM_PACK16]       = { 2,  4 },
+    [VK_FORMAT_R5G6B5_UNORM_PACK16]         = { 2,  3 },
+    [VK_FORMAT_B5G6R5_UNORM_PACK16]         = { 2, 3 },
+    [VK_FORMAT_R5G5B5A1_UNORM_PACK16]       = { 2,  4 },
+    [VK_FORMAT_B5G5R5A1_UNORM_PACK16]       = { 2,  4 },
+    [VK_FORMAT_A1R5G5B5_UNORM_PACK16]       = { 2,  4 },
+    [VK_FORMAT_R8_UNORM]             = { 1,  1 },
+    [VK_FORMAT_R8_SNORM]             = { 1,  1 },
+    [VK_FORMAT_R8_USCALED]           = { 1,  1 },
+    [VK_FORMAT_R8_SSCALED]           = { 1,  1 },
+    [VK_FORMAT_R8_UINT]              = { 1,  1 },
+    [VK_FORMAT_R8_SINT]              = { 1,  1 },
+    [VK_FORMAT_R8_SRGB]              = { 1,  1 },
+    [VK_FORMAT_R8G8_UNORM]           = { 2,  2 },
+    [VK_FORMAT_R8G8_SNORM]           = { 2,  2 },
+    [VK_FORMAT_R8G8_USCALED]         = { 2,  2 },
+    [VK_FORMAT_R8G8_SSCALED]         = { 2,  2 },
+    [VK_FORMAT_R8G8_UINT]            = { 2,  2 },
+    [VK_FORMAT_R8G8_SINT]            = { 2,  2 },
+    [VK_FORMAT_R8G8_SRGB]            = { 2,  2 },
+    [VK_FORMAT_R8G8B8_UNORM]         = { 3,  3 },
+    [VK_FORMAT_R8G8B8_SNORM]         = { 3,  3 },
+    [VK_FORMAT_R8G8B8_USCALED]       = { 3,  3 },
+    [VK_FORMAT_R8G8B8_SSCALED]       = { 3,  3 },
+    [VK_FORMAT_R8G8B8_UINT]          = { 3,  3 },
+    [VK_FORMAT_R8G8B8_SINT]          = { 3,  3 },
+    [VK_FORMAT_R8G8B8_SRGB]          = { 3,  3 },
+    [VK_FORMAT_B8G8R8_UNORM]         = { 3, 3 },
+    [VK_FORMAT_B8G8R8_SNORM]         = { 3, 3 },
+    [VK_FORMAT_B8G8R8_USCALED]       = { 3, 3 },
+    [VK_FORMAT_B8G8R8_SSCALED]       = { 3, 3 },
+    [VK_FORMAT_B8G8R8_UINT]          = { 3, 3 },
+    [VK_FORMAT_B8G8R8_SINT]          = { 3, 3 },
+    [VK_FORMAT_B8G8R8_SRGB]          = { 3, 3 },
+    [VK_FORMAT_R8G8B8A8_UNORM]       = { 4,  4 },
+    [VK_FORMAT_R8G8B8A8_SNORM]       = { 4,  4 },
+    [VK_FORMAT_R8G8B8A8_USCALED]     = { 4,  4 },
+    [VK_FORMAT_R8G8B8A8_SSCALED]     = { 4,  4 },
+    [VK_FORMAT_R8G8B8A8_UINT]        = { 4,  4 },
+    [VK_FORMAT_R8G8B8A8_SINT]        = { 4,  4 },
+    [VK_FORMAT_R8G8B8A8_SRGB]        = { 4,  4 },
+    [VK_FORMAT_B8G8R8A8_UNORM]       = { 4, 4 },
+    [VK_FORMAT_B8G8R8A8_SNORM]       = { 4, 4 },
+    [VK_FORMAT_B8G8R8A8_USCALED]     = { 4, 4 },
+    [VK_FORMAT_B8G8R8A8_SSCALED]     = { 4, 4 },
+    [VK_FORMAT_B8G8R8A8_UINT]        = { 4, 4 },
+    [VK_FORMAT_B8G8R8A8_SINT]        = { 4, 4 },
+    [VK_FORMAT_B8G8R8A8_SRGB]        = { 4, 4 },
+    [VK_FORMAT_A8B8G8R8_UNORM_PACK32]       = { 4, 4 },
+    [VK_FORMAT_A8B8G8R8_SNORM_PACK32]       = { 4, 4 },
+    [VK_FORMAT_A8B8G8R8_USCALED_PACK32]     = { 4, 4 },
+    [VK_FORMAT_A8B8G8R8_SSCALED_PACK32]     = { 4, 4 },
+    [VK_FORMAT_A8B8G8R8_UINT_PACK32]        = { 4, 4 },
+    [VK_FORMAT_A8B8G8R8_SINT_PACK32]        = { 4, 4 },
+    [VK_FORMAT_A8B8G8R8_SRGB_PACK32]        = { 4, 4 },
+    [VK_FORMAT_A2R10G10B10_UNORM_PACK32]    = { 4, 4 },
+    [VK_FORMAT_A2R10G10B10_SNORM_PACK32]    = { 4, 4 },
+    [VK_FORMAT_A2R10G10B10_USCALED_PACK32]  = { 4, 4 },
+    [VK_FORMAT_A2R10G10B10_SSCALED_PACK32]  = { 4, 4 },
+    [VK_FORMAT_A2R10G10B10_UINT_PACK32]     = { 4, 4 },
+    [VK_FORMAT_A2R10G10B10_SINT_PACK32]     = { 4, 4 },
+    [VK_FORMAT_A2B10G10R10_UNORM_PACK32]    = { 4,  4 },
+    [VK_FORMAT_A2B10G10R10_SNORM_PACK32]    = { 4,  4 },
+    [VK_FORMAT_A2B10G10R10_USCALED_PACK32]  = { 4,  4 },
+    [VK_FORMAT_A2B10G10R10_SSCALED_PACK32]  = { 4,  4 },
+    [VK_FORMAT_A2B10G10R10_UINT_PACK32]     = { 4,  4 },
+    [VK_FORMAT_A2B10G10R10_SINT_PACK32]     = { 4,  4 },
+    [VK_FORMAT_R16_UNORM]            = { 2,  1 },
+    [VK_FORMAT_R16_SNORM]            = { 2,  1 },
+    [VK_FORMAT_R16_USCALED]          = { 2,  1 },
+    [VK_FORMAT_R16_SSCALED]          = { 2,  1 },
+    [VK_FORMAT_R16_UINT]             = { 2,  1 },
+    [VK_FORMAT_R16_SINT]             = { 2,  1 },
+    [VK_FORMAT_R16_SFLOAT]           = { 2,  1 },
+    [VK_FORMAT_R16G16_UNORM]         = { 4,  2 },
+    [VK_FORMAT_R16G16_SNORM]         = { 4,  2 },
+    [VK_FORMAT_R16G16_USCALED]       = { 4,  2 },
+    [VK_FORMAT_R16G16_SSCALED]       = { 4,  2 },
+    [VK_FORMAT_R16G16_UINT]          = { 4,  2 },
+    [VK_FORMAT_R16G16_SINT]          = { 4,  2 },
+    [VK_FORMAT_R16G16_SFLOAT]        = { 4,  2 },
+    [VK_FORMAT_R16G16B16_UNORM]      = { 6,  3 },
+    [VK_FORMAT_R16G16B16_SNORM]      = { 6,  3 },
+    [VK_FORMAT_R16G16B16_USCALED]    = { 6,  3 },
+    [VK_FORMAT_R16G16B16_SSCALED]    = { 6,  3 },
+    [VK_FORMAT_R16G16B16_UINT]       = { 6,  3 },
+    [VK_FORMAT_R16G16B16_SINT]       = { 6,  3 },
+    [VK_FORMAT_R16G16B16_SFLOAT]     = { 6,  3 },
+    [VK_FORMAT_R16G16B16A16_UNORM]   = { 8,  4 },
+    [VK_FORMAT_R16G16B16A16_SNORM]   = { 8,  4 },
+    [VK_FORMAT_R16G16B16A16_USCALED] = { 8,  4 },
+    [VK_FORMAT_R16G16B16A16_SSCALED] = { 8,  4 },
+    [VK_FORMAT_R16G16B16A16_UINT]    = { 8,  4 },
+    [VK_FORMAT_R16G16B16A16_SINT]    = { 8,  4 },
+    [VK_FORMAT_R16G16B16A16_SFLOAT]  = { 8,  4 },
+    [VK_FORMAT_R32_UINT]             = { 4,  1 },
+    [VK_FORMAT_R32_SINT]             = { 4,  1 },
+    [VK_FORMAT_R32_SFLOAT]           = { 4,  1 },
+    [VK_FORMAT_R32G32_UINT]          = { 8,  2 },
+    [VK_FORMAT_R32G32_SINT]          = { 8,  2 },
+    [VK_FORMAT_R32G32_SFLOAT]        = { 8,  2 },
+    [VK_FORMAT_R32G32B32_UINT]       = { 12, 3 },
+    [VK_FORMAT_R32G32B32_SINT]       = { 12, 3 },
+    [VK_FORMAT_R32G32B32_SFLOAT]     = { 12, 3 },
+    [VK_FORMAT_R32G32B32A32_UINT]    = { 16, 4 },
+    [VK_FORMAT_R32G32B32A32_SINT]    = { 16, 4 },
+    [VK_FORMAT_R32G32B32A32_SFLOAT]  = { 16, 4 },
+    [VK_FORMAT_R64_UINT]             = { 8,  1 },
+    [VK_FORMAT_R64_SINT]             = { 8,  1 },
+    [VK_FORMAT_R64_SFLOAT]           = { 8,  1 },
+    [VK_FORMAT_R64G64_UINT]          = { 16, 2 },
+    [VK_FORMAT_R64G64_SINT]          = { 16, 2 },
+    [VK_FORMAT_R64G64_SFLOAT]        = { 16, 2 },
+    [VK_FORMAT_R64G64B64_UINT]       = { 24, 3 },
+    [VK_FORMAT_R64G64B64_SINT]       = { 24, 3 },
+    [VK_FORMAT_R64G64B64_SFLOAT]     = { 24, 3 },
+    [VK_FORMAT_R64G64B64A64_UINT]    = { 32, 4 },
+    [VK_FORMAT_R64G64B64A64_SINT]    = { 32, 4 },
+    [VK_FORMAT_R64G64B64A64_SFLOAT]  = { 32, 4 },
+    [VK_FORMAT_B10G11R11_UFLOAT_PACK32]     = { 4,  3 },
+    [VK_FORMAT_E5B9G9R9_UFLOAT_PACK32]      = { 4,  3 },
+    [VK_FORMAT_D16_UNORM]            = { 2,  1 },
+    [VK_FORMAT_X8_D24_UNORM_PACK32]         = { 3,  1 },
+    [VK_FORMAT_D32_SFLOAT]           = { 4,  1 },
+    [VK_FORMAT_S8_UINT]              = { 1,  1 },
+    [VK_FORMAT_D16_UNORM_S8_UINT]    = { 3,  2 },
+    [VK_FORMAT_D24_UNORM_S8_UINT]    = { 4,  2 },
+    [VK_FORMAT_D32_SFLOAT_S8_UINT]   = { 4,  2 },
+    [VK_FORMAT_BC1_RGB_UNORM_BLOCK]        = { 8,  4 },
+    [VK_FORMAT_BC1_RGB_SRGB_BLOCK]         = { 8,  4 },
+    [VK_FORMAT_BC1_RGBA_UNORM_BLOCK]       = { 8,  4 },
+    [VK_FORMAT_BC1_RGBA_SRGB_BLOCK]        = { 8,  4 },
+    [VK_FORMAT_BC2_UNORM_BLOCK]            = { 16, 4 },
+    [VK_FORMAT_BC2_SRGB_BLOCK]             = { 16, 4 },
+    [VK_FORMAT_BC3_UNORM_BLOCK]            = { 16, 4 },
+    [VK_FORMAT_BC3_SRGB_BLOCK]             = { 16, 4 },
+    [VK_FORMAT_BC4_UNORM_BLOCK]            = { 8,  4 },
+    [VK_FORMAT_BC4_SNORM_BLOCK]            = { 8,  4 },
+    [VK_FORMAT_BC5_UNORM_BLOCK]            = { 16, 4 },
+    [VK_FORMAT_BC5_SNORM_BLOCK]            = { 16, 4 },
+    [VK_FORMAT_BC6H_UFLOAT_BLOCK]          = { 16, 4 },
+    [VK_FORMAT_BC6H_SFLOAT_BLOCK]          = { 16, 4 },
+    [VK_FORMAT_BC7_UNORM_BLOCK]            = { 16, 4 },
+    [VK_FORMAT_BC7_SRGB_BLOCK]             = { 16, 4 },
+    /* TODO: Initialize remaining compressed formats. */
+    [VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK]    = { 0, 0 },
+    [VK_FORMAT_ETC2_R8G8B8A1_UNORM_BLOCK]  = { 0, 0 },
+    [VK_FORMAT_ETC2_R8G8B8A8_UNORM_BLOCK]  = { 0, 0 },
+    [VK_FORMAT_EAC_R11_UNORM_BLOCK]        = { 0, 0 },
+    [VK_FORMAT_EAC_R11_SNORM_BLOCK]        = { 0, 0 },
+    [VK_FORMAT_EAC_R11G11_UNORM_BLOCK]     = { 0, 0 },
+    [VK_FORMAT_EAC_R11G11_SNORM_BLOCK]     = { 0, 0 },
+    [VK_FORMAT_ASTC_4x4_UNORM_BLOCK]       = { 0, 0 },
+    [VK_FORMAT_ASTC_4x4_SRGB_BLOCK]        = { 0, 0 },
+    [VK_FORMAT_ASTC_5x4_UNORM_BLOCK]       = { 0, 0 },
+    [VK_FORMAT_ASTC_5x4_SRGB_BLOCK]        = { 0, 0 },
+    [VK_FORMAT_ASTC_5x5_UNORM_BLOCK]       = { 0, 0 },
+    [VK_FORMAT_ASTC_5x5_SRGB_BLOCK]        = { 0, 0 },
+    [VK_FORMAT_ASTC_6x5_UNORM_BLOCK]       = { 0, 0 },
+    [VK_FORMAT_ASTC_6x5_SRGB_BLOCK]        = { 0, 0 },
+    [VK_FORMAT_ASTC_6x6_UNORM_BLOCK]       = { 0, 0 },
+    [VK_FORMAT_ASTC_6x6_SRGB_BLOCK]        = { 0, 0 },
+    [VK_FORMAT_ASTC_8x5_UNORM_BLOCK]       = { 0, 0 },
+    [VK_FORMAT_ASTC_8x5_SRGB_BLOCK]        = { 0, 0 },
+    [VK_FORMAT_ASTC_8x6_UNORM_BLOCK]       = { 0, 0 },
+    [VK_FORMAT_ASTC_8x6_SRGB_BLOCK]        = { 0, 0 },
+    [VK_FORMAT_ASTC_8x8_UNORM_BLOCK]       = { 0, 0 },
+    [VK_FORMAT_ASTC_8x8_SRGB_BLOCK]        = { 0, 0 },
+    [VK_FORMAT_ASTC_10x5_UNORM_BLOCK]      = { 0, 0 },
+    [VK_FORMAT_ASTC_10x5_SRGB_BLOCK]       = { 0, 0 },
+    [VK_FORMAT_ASTC_10x6_UNORM_BLOCK]      = { 0, 0 },
+    [VK_FORMAT_ASTC_10x6_SRGB_BLOCK]       = { 0, 0 },
+    [VK_FORMAT_ASTC_10x8_UNORM_BLOCK]      = { 0, 0 },
+    [VK_FORMAT_ASTC_10x8_SRGB_BLOCK]       = { 0, 0 },
+    [VK_FORMAT_ASTC_10x10_UNORM_BLOCK]     = { 0, 0 },
+    [VK_FORMAT_ASTC_10x10_SRGB_BLOCK]      = { 0, 0 },
+    [VK_FORMAT_ASTC_12x10_UNORM_BLOCK]     = { 0, 0 },
+    [VK_FORMAT_ASTC_12x10_SRGB_BLOCK]      = { 0, 0 },
+    [VK_FORMAT_ASTC_12x12_UNORM_BLOCK]     = { 0, 0 },
+    [VK_FORMAT_ASTC_12x12_SRGB_BLOCK]      = { 0, 0 },
+};
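+
+/*
+ * Illustrative sketch (not part of the original change): each table entry
+ * maps a VkFormat to its per-texel byte size and channel count, so lookups
+ * are plain array indexing, e.g.:
+ *
+ *     assert(icd_format_table[VK_FORMAT_R8G8B8A8_UNORM].size == 4);
+ *     assert(icd_format_table[VK_FORMAT_R8G8B8A8_UNORM].channel_count == 4);
+ */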
+
+bool icd_format_is_ds(VkFormat format)
+{
+    bool is_ds = false;
+
+    switch (format) {
+    case VK_FORMAT_D16_UNORM:
+    case VK_FORMAT_X8_D24_UNORM_PACK32:
+    case VK_FORMAT_D32_SFLOAT:
+    case VK_FORMAT_S8_UINT:
+    case VK_FORMAT_D16_UNORM_S8_UINT:
+    case VK_FORMAT_D24_UNORM_S8_UINT:
+    case VK_FORMAT_D32_SFLOAT_S8_UINT:
+        is_ds = true;
+        break;
+    default:
+        break;
+    }
+
+    return is_ds;
+}
+
+bool icd_format_is_norm(VkFormat format)
+{
+    bool is_norm = false;
+
+    switch (format) {
+    case VK_FORMAT_R4G4_UNORM_PACK8:
+    case VK_FORMAT_R4G4B4A4_UNORM_PACK16:
+    case VK_FORMAT_R5G6B5_UNORM_PACK16:
+    case VK_FORMAT_R5G5B5A1_UNORM_PACK16:
+    case VK_FORMAT_A1R5G5B5_UNORM_PACK16:
+    case VK_FORMAT_R8_UNORM:
+    case VK_FORMAT_R8_SNORM:
+    case VK_FORMAT_R8G8_UNORM:
+    case VK_FORMAT_R8G8_SNORM:
+    case VK_FORMAT_R8G8B8_UNORM:
+    case VK_FORMAT_R8G8B8_SNORM:
+    case VK_FORMAT_R8G8B8A8_UNORM:
+    case VK_FORMAT_R8G8B8A8_SNORM:
+    case VK_FORMAT_A8B8G8R8_UNORM_PACK32:
+    case VK_FORMAT_A8B8G8R8_SNORM_PACK32:
+    case VK_FORMAT_A2B10G10R10_UNORM_PACK32:
+    case VK_FORMAT_A2B10G10R10_SNORM_PACK32:
+    case VK_FORMAT_R16_UNORM:
+    case VK_FORMAT_R16_SNORM:
+    case VK_FORMAT_R16G16_UNORM:
+    case VK_FORMAT_R16G16_SNORM:
+    case VK_FORMAT_R16G16B16_UNORM:
+    case VK_FORMAT_R16G16B16_SNORM:
+    case VK_FORMAT_R16G16B16A16_UNORM:
+    case VK_FORMAT_R16G16B16A16_SNORM:
+    case VK_FORMAT_BC1_RGB_UNORM_BLOCK:
+    case VK_FORMAT_BC2_UNORM_BLOCK:
+    case VK_FORMAT_BC3_UNORM_BLOCK:
+    case VK_FORMAT_BC4_UNORM_BLOCK:
+    case VK_FORMAT_BC4_SNORM_BLOCK:
+    case VK_FORMAT_BC5_UNORM_BLOCK:
+    case VK_FORMAT_BC5_SNORM_BLOCK:
+    case VK_FORMAT_BC7_UNORM_BLOCK:
+    case VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK:
+    case VK_FORMAT_ETC2_R8G8B8A1_UNORM_BLOCK:
+    case VK_FORMAT_ETC2_R8G8B8A8_UNORM_BLOCK:
+    case VK_FORMAT_EAC_R11_UNORM_BLOCK:
+    case VK_FORMAT_EAC_R11_SNORM_BLOCK:
+    case VK_FORMAT_EAC_R11G11_UNORM_BLOCK:
+    case VK_FORMAT_EAC_R11G11_SNORM_BLOCK:
+    case VK_FORMAT_ASTC_4x4_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_5x4_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_5x5_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_6x5_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_6x6_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_8x5_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_8x6_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_8x8_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_10x5_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_10x6_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_10x8_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_10x10_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_12x10_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_12x12_UNORM_BLOCK:
+    case VK_FORMAT_B5G6R5_UNORM_PACK16:
+    case VK_FORMAT_B8G8R8_UNORM:
+    case VK_FORMAT_B8G8R8_SNORM:
+    case VK_FORMAT_B8G8R8A8_UNORM:
+    case VK_FORMAT_B8G8R8A8_SNORM:
+    case VK_FORMAT_A2R10G10B10_UNORM_PACK32:
+    case VK_FORMAT_A2R10G10B10_SNORM_PACK32:
+        is_norm = true;
+        break;
+    default:
+        break;
+    }
+
+    return is_norm;
+}
+
+bool icd_format_is_int(VkFormat format)
+{
+    bool is_int = false;
+
+    switch (format) {
+    case VK_FORMAT_R8_UINT:
+    case VK_FORMAT_R8_SINT:
+    case VK_FORMAT_R8G8_UINT:
+    case VK_FORMAT_R8G8_SINT:
+    case VK_FORMAT_R8G8B8_UINT:
+    case VK_FORMAT_R8G8B8_SINT:
+    case VK_FORMAT_R8G8B8A8_UINT:
+    case VK_FORMAT_R8G8B8A8_SINT:
+    case VK_FORMAT_A8B8G8R8_UINT_PACK32:
+    case VK_FORMAT_A8B8G8R8_SINT_PACK32:
+    case VK_FORMAT_A2B10G10R10_UINT_PACK32:
+    case VK_FORMAT_A2B10G10R10_SINT_PACK32:
+    case VK_FORMAT_R16_UINT:
+    case VK_FORMAT_R16_SINT:
+    case VK_FORMAT_R16G16_UINT:
+    case VK_FORMAT_R16G16_SINT:
+    case VK_FORMAT_R16G16B16_UINT:
+    case VK_FORMAT_R16G16B16_SINT:
+    case VK_FORMAT_R16G16B16A16_UINT:
+    case VK_FORMAT_R16G16B16A16_SINT:
+    case VK_FORMAT_R32_UINT:
+    case VK_FORMAT_R32_SINT:
+    case VK_FORMAT_R32G32_UINT:
+    case VK_FORMAT_R32G32_SINT:
+    case VK_FORMAT_R32G32B32_UINT:
+    case VK_FORMAT_R32G32B32_SINT:
+    case VK_FORMAT_R32G32B32A32_UINT:
+    case VK_FORMAT_R32G32B32A32_SINT:
+    case VK_FORMAT_R64_UINT:
+    case VK_FORMAT_R64_SINT:
+    case VK_FORMAT_R64G64_UINT:
+    case VK_FORMAT_R64G64_SINT:
+    case VK_FORMAT_R64G64B64_UINT:
+    case VK_FORMAT_R64G64B64_SINT:
+    case VK_FORMAT_R64G64B64A64_UINT:
+    case VK_FORMAT_R64G64B64A64_SINT:
+    case VK_FORMAT_B8G8R8_UINT:
+    case VK_FORMAT_B8G8R8_SINT:
+    case VK_FORMAT_B8G8R8A8_UINT:
+    case VK_FORMAT_B8G8R8A8_SINT:
+    case VK_FORMAT_A2R10G10B10_UINT_PACK32:
+    case VK_FORMAT_A2R10G10B10_SINT_PACK32:
+        is_int = true;
+        break;
+    default:
+        break;
+    }
+
+    return is_int;
+}
+
+bool icd_format_is_float(VkFormat format)
+{
+    bool is_float = false;
+
+    switch (format) {
+    case VK_FORMAT_R16_SFLOAT:
+    case VK_FORMAT_R16G16_SFLOAT:
+    case VK_FORMAT_R16G16B16_SFLOAT:
+    case VK_FORMAT_R16G16B16A16_SFLOAT:
+    case VK_FORMAT_R32_SFLOAT:
+    case VK_FORMAT_R32G32_SFLOAT:
+    case VK_FORMAT_R32G32B32_SFLOAT:
+    case VK_FORMAT_R32G32B32A32_SFLOAT:
+    case VK_FORMAT_R64_SFLOAT:
+    case VK_FORMAT_R64G64_SFLOAT:
+    case VK_FORMAT_R64G64B64_SFLOAT:
+    case VK_FORMAT_R64G64B64A64_SFLOAT:
+    case VK_FORMAT_B10G11R11_UFLOAT_PACK32:
+    case VK_FORMAT_E5B9G9R9_UFLOAT_PACK32:
+    case VK_FORMAT_BC6H_UFLOAT_BLOCK:
+    case VK_FORMAT_BC6H_SFLOAT_BLOCK:
+        is_float = true;
+        break;
+    default:
+        break;
+    }
+
+    return is_float;
+}
+
+bool icd_format_is_srgb(VkFormat format)
+{
+    bool is_srgb = false;
+
+    switch (format) {
+    case VK_FORMAT_R8_SRGB:
+    case VK_FORMAT_R8G8_SRGB:
+    case VK_FORMAT_R8G8B8_SRGB:
+    case VK_FORMAT_R8G8B8A8_SRGB:
+    case VK_FORMAT_A8B8G8R8_SRGB_PACK32:
+    case VK_FORMAT_BC1_RGB_SRGB_BLOCK:
+    case VK_FORMAT_BC2_SRGB_BLOCK:
+    case VK_FORMAT_BC3_SRGB_BLOCK:
+    case VK_FORMAT_BC7_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_4x4_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_5x4_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_5x5_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_6x5_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_6x6_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_8x5_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_8x6_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_8x8_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_10x5_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_10x6_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_10x8_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_10x10_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_12x10_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_12x12_SRGB_BLOCK:
+    case VK_FORMAT_B8G8R8_SRGB:
+    case VK_FORMAT_B8G8R8A8_SRGB:
+        is_srgb = true;
+        break;
+    default:
+        break;
+    }
+
+    return is_srgb;
+}
+
+bool icd_format_is_compressed(VkFormat format)
+{
+    switch (format) {
+    case VK_FORMAT_BC1_RGB_UNORM_BLOCK:
+    case VK_FORMAT_BC1_RGB_SRGB_BLOCK:
+    case VK_FORMAT_BC2_UNORM_BLOCK:
+    case VK_FORMAT_BC2_SRGB_BLOCK:
+    case VK_FORMAT_BC3_UNORM_BLOCK:
+    case VK_FORMAT_BC3_SRGB_BLOCK:
+    case VK_FORMAT_BC4_UNORM_BLOCK:
+    case VK_FORMAT_BC4_SNORM_BLOCK:
+    case VK_FORMAT_BC5_UNORM_BLOCK:
+    case VK_FORMAT_BC5_SNORM_BLOCK:
+    case VK_FORMAT_BC6H_UFLOAT_BLOCK:
+    case VK_FORMAT_BC6H_SFLOAT_BLOCK:
+    case VK_FORMAT_BC7_UNORM_BLOCK:
+    case VK_FORMAT_BC7_SRGB_BLOCK:
+    case VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK:
+    case VK_FORMAT_ETC2_R8G8B8A1_UNORM_BLOCK:
+    case VK_FORMAT_ETC2_R8G8B8A8_UNORM_BLOCK:
+    case VK_FORMAT_EAC_R11_UNORM_BLOCK:
+    case VK_FORMAT_EAC_R11_SNORM_BLOCK:
+    case VK_FORMAT_EAC_R11G11_UNORM_BLOCK:
+    case VK_FORMAT_EAC_R11G11_SNORM_BLOCK:
+    case VK_FORMAT_ASTC_4x4_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_4x4_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_5x4_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_5x4_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_5x5_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_5x5_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_6x5_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_6x5_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_6x6_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_6x6_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_8x5_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_8x5_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_8x6_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_8x6_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_8x8_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_8x8_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_10x5_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_10x5_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_10x6_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_10x6_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_10x8_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_10x8_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_10x10_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_10x10_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_12x10_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_12x10_SRGB_BLOCK:
+    case VK_FORMAT_ASTC_12x12_UNORM_BLOCK:
+    case VK_FORMAT_ASTC_12x12_SRGB_BLOCK:
+        return true;
+    default:
+        return false;
+    }
+}
+
+size_t icd_format_get_size(VkFormat format)
+{
+    return icd_format_table[format].size;
+}
+
+unsigned int icd_format_get_channel_count(VkFormat format)
+{
+    return icd_format_table[format].channel_count;
+}
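+
+/*
+ * Usage sketch (illustrative only): a tightly packed row of an uncompressed
+ * image is just width times the texel size, e.g.:
+ *
+ *     size_t row_size =
+ *         width * icd_format_get_size(VK_FORMAT_R32G32B32A32_SFLOAT);
+ *     // 16 bytes per texel for this format (see the table above)
+ */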
+
+/**
+ * Pack a raw RGBA color into the in-memory representation of \p format.
+ * \p value must point to at least icd_format_get_size(format) bytes.
+ */
+void icd_format_get_raw_value(VkFormat format,
+                              const uint32_t color[4],
+                              void *value)
+{
+    /* assume little-endian */
+    switch (format) {
+    case VK_FORMAT_UNDEFINED:
+        break;
+    case VK_FORMAT_R4G4_UNORM_PACK8:
+        ((uint8_t *) value)[0]  = (color[0] & 0xf) << 0   |
+                                  (color[1] & 0xf) << 4;
+        break;
+    case VK_FORMAT_R4G4B4A4_UNORM_PACK16:
+        ((uint16_t *) value)[0] = (color[0] & 0xf) << 0   |
+                                  (color[1] & 0xf) << 4   |
+                                  (color[2] & 0xf) << 8   |
+                                  (color[3] & 0xf) << 12;
+        break;
+    case VK_FORMAT_R5G6B5_UNORM_PACK16:
+        ((uint16_t *) value)[0] = (color[0] & 0x1f) << 0  |
+                                  (color[1] & 0x3f) << 5  |
+                                  (color[2] & 0x1f) << 11;
+        break;
+    case VK_FORMAT_B5G6R5_UNORM_PACK16:
+        ((uint16_t *) value)[0] = (color[2] & 0x1f) << 0  |
+                                  (color[1] & 0x3f) << 5  |
+                                  (color[0] & 0x1f) << 11;
+        break;
+    case VK_FORMAT_R5G5B5A1_UNORM_PACK16:
+        ((uint16_t *) value)[0] = (color[0] & 0x1f) << 0  |
+                                  (color[1] & 0x1f) << 5  |
+                                  (color[2] & 0x1f) << 10 |
+                                  (color[3] & 0x1)  << 15;
+        break;
+    case VK_FORMAT_R8_UNORM:
+    case VK_FORMAT_R8_SNORM:
+    case VK_FORMAT_R8_USCALED:
+    case VK_FORMAT_R8_SSCALED:
+    case VK_FORMAT_R8_UINT:
+    case VK_FORMAT_R8_SINT:
+    case VK_FORMAT_R8_SRGB:
+        ((uint8_t *) value)[0]  = (uint8_t) color[0];
+        break;
+    case VK_FORMAT_R8G8_UNORM:
+    case VK_FORMAT_R8G8_SNORM:
+    case VK_FORMAT_R8G8_USCALED:
+    case VK_FORMAT_R8G8_SSCALED:
+    case VK_FORMAT_R8G8_UINT:
+    case VK_FORMAT_R8G8_SINT:
+    case VK_FORMAT_R8G8_SRGB:
+        ((uint8_t *) value)[0]  = (uint8_t) color[0];
+        ((uint8_t *) value)[1]  = (uint8_t) color[1];
+        break;
+    case VK_FORMAT_R8G8B8A8_UNORM:
+    case VK_FORMAT_R8G8B8A8_SNORM:
+    case VK_FORMAT_R8G8B8A8_USCALED:
+    case VK_FORMAT_R8G8B8A8_SSCALED:
+    case VK_FORMAT_R8G8B8A8_UINT:
+    case VK_FORMAT_R8G8B8A8_SINT:
+    case VK_FORMAT_R8G8B8A8_SRGB:
+        ((uint8_t *) value)[0]  = (uint8_t) color[0];
+        ((uint8_t *) value)[1]  = (uint8_t) color[1];
+        ((uint8_t *) value)[2]  = (uint8_t) color[2];
+        ((uint8_t *) value)[3]  = (uint8_t) color[3];
+        break;
+    case VK_FORMAT_B8G8R8A8_UNORM:
+    case VK_FORMAT_B8G8R8A8_SRGB:
+        ((uint8_t *) value)[0]  = (uint8_t) color[2];
+        ((uint8_t *) value)[1]  = (uint8_t) color[1];
+        ((uint8_t *) value)[2]  = (uint8_t) color[0];
+        ((uint8_t *) value)[3]  = (uint8_t) color[3];
+        break;
+    case VK_FORMAT_B10G11R11_UFLOAT_PACK32:
+        ((uint32_t *) value)[0] = (color[0] & 0x7ff) << 0  |
+                                  (color[1] & 0x7ff) << 11 |
+                                  (color[2] & 0x3ff) << 22;
+        break;
+    case VK_FORMAT_A2B10G10R10_UNORM_PACK32:
+    case VK_FORMAT_A2B10G10R10_SNORM_PACK32:
+    case VK_FORMAT_A2B10G10R10_USCALED_PACK32:
+    case VK_FORMAT_A2B10G10R10_SSCALED_PACK32:
+    case VK_FORMAT_A2B10G10R10_UINT_PACK32:
+    case VK_FORMAT_A2B10G10R10_SINT_PACK32:
+        ((uint32_t *) value)[0] = (color[0] & 0x3ff) << 0  |
+                                  (color[1] & 0x3ff) << 10 |
+                                  (color[2] & 0x3ff) << 20 |
+                                  (color[3] & 0x3)   << 30;
+        break;
+    case VK_FORMAT_R16_UNORM:
+    case VK_FORMAT_R16_SNORM:
+    case VK_FORMAT_R16_USCALED:
+    case VK_FORMAT_R16_SSCALED:
+    case VK_FORMAT_R16_UINT:
+    case VK_FORMAT_R16_SINT:
+    case VK_FORMAT_R16_SFLOAT:
+        ((uint16_t *) value)[0] = (uint16_t) color[0];
+        break;
+    case VK_FORMAT_R16G16_UNORM:
+    case VK_FORMAT_R16G16_SNORM:
+    case VK_FORMAT_R16G16_USCALED:
+    case VK_FORMAT_R16G16_SSCALED:
+    case VK_FORMAT_R16G16_UINT:
+    case VK_FORMAT_R16G16_SINT:
+    case VK_FORMAT_R16G16_SFLOAT:
+        ((uint16_t *) value)[0] = (uint16_t) color[0];
+        ((uint16_t *) value)[1] = (uint16_t) color[1];
+        break;
+    case VK_FORMAT_R16G16B16A16_UNORM:
+    case VK_FORMAT_R16G16B16A16_SNORM:
+    case VK_FORMAT_R16G16B16A16_USCALED:
+    case VK_FORMAT_R16G16B16A16_SSCALED:
+    case VK_FORMAT_R16G16B16A16_UINT:
+    case VK_FORMAT_R16G16B16A16_SINT:
+    case VK_FORMAT_R16G16B16A16_SFLOAT:
+        ((uint16_t *) value)[0] = (uint16_t) color[0];
+        ((uint16_t *) value)[1] = (uint16_t) color[1];
+        ((uint16_t *) value)[2] = (uint16_t) color[2];
+        ((uint16_t *) value)[3] = (uint16_t) color[3];
+        break;
+    case VK_FORMAT_R32_UINT:
+    case VK_FORMAT_R32_SINT:
+    case VK_FORMAT_R32_SFLOAT:
+        ((uint32_t *) value)[0] = color[0];
+        break;
+    case VK_FORMAT_R32G32_UINT:
+    case VK_FORMAT_R32G32_SINT:
+    case VK_FORMAT_R32G32_SFLOAT:
+        ((uint32_t *) value)[0] = color[0];
+        ((uint32_t *) value)[1] = color[1];
+        break;
+    case VK_FORMAT_R32G32B32_UINT:
+    case VK_FORMAT_R32G32B32_SINT:
+    case VK_FORMAT_R32G32B32_SFLOAT:
+        ((uint32_t *) value)[0] = color[0];
+        ((uint32_t *) value)[1] = color[1];
+        ((uint32_t *) value)[2] = color[2];
+        break;
+    case VK_FORMAT_R32G32B32A32_UINT:
+    case VK_FORMAT_R32G32B32A32_SINT:
+    case VK_FORMAT_R32G32B32A32_SFLOAT:
+        ((uint32_t *) value)[0] = color[0];
+        ((uint32_t *) value)[1] = color[1];
+        ((uint32_t *) value)[2] = color[2];
+        ((uint32_t *) value)[3] = color[3];
+        break;
+    case VK_FORMAT_D16_UNORM_S8_UINT:
+        ((uint16_t *) value)[0] = (uint16_t) color[0];
+        ((uint8_t *) value)[2] = (uint8_t) color[1];
+        break;
+    case VK_FORMAT_D32_SFLOAT_S8_UINT:
+        ((uint32_t *) value)[0] = (uint32_t) color[0];
+        ((uint8_t *) value)[4] = (uint8_t) color[1];
+        break;
+    case VK_FORMAT_E5B9G9R9_UFLOAT_PACK32:
+        ((uint32_t *) value)[0] = (color[0] & 0x1ff) << 0  |
+                                  (color[1] & 0x1ff) << 9  |
+                                  (color[2] & 0x1ff) << 18 |
+                                  (color[3] & 0x1f)  << 27;
+        break;
+    case VK_FORMAT_BC1_RGB_UNORM_BLOCK:
+    case VK_FORMAT_BC1_RGB_SRGB_BLOCK:
+    case VK_FORMAT_BC1_RGBA_UNORM_BLOCK:
+    case VK_FORMAT_BC1_RGBA_SRGB_BLOCK:
+    case VK_FORMAT_BC4_UNORM_BLOCK:
+    case VK_FORMAT_BC4_SNORM_BLOCK:
+        memcpy(value, color, 8);
+        break;
+    case VK_FORMAT_BC2_UNORM_BLOCK:
+    case VK_FORMAT_BC2_SRGB_BLOCK:
+    case VK_FORMAT_BC3_UNORM_BLOCK:
+    case VK_FORMAT_BC3_SRGB_BLOCK:
+    case VK_FORMAT_BC5_UNORM_BLOCK:
+    case VK_FORMAT_BC5_SNORM_BLOCK:
+    case VK_FORMAT_BC6H_UFLOAT_BLOCK:
+    case VK_FORMAT_BC6H_SFLOAT_BLOCK:
+    case VK_FORMAT_BC7_UNORM_BLOCK:
+    case VK_FORMAT_BC7_SRGB_BLOCK:
+        memcpy(value, color, 16);
+        break;
+    case VK_FORMAT_R8G8B8_UNORM:
+    case VK_FORMAT_R8G8B8_SNORM:
+    case VK_FORMAT_R8G8B8_USCALED:
+    case VK_FORMAT_R8G8B8_SSCALED:
+    case VK_FORMAT_R8G8B8_UINT:
+    case VK_FORMAT_R8G8B8_SINT:
+    case VK_FORMAT_R8G8B8_SRGB:
+        ((uint8_t *) value)[0]  = (uint8_t) color[0];
+        ((uint8_t *) value)[1]  = (uint8_t) color[1];
+        ((uint8_t *) value)[2]  = (uint8_t) color[2];
+        break;
+    case VK_FORMAT_R16G16B16_UNORM:
+    case VK_FORMAT_R16G16B16_SNORM:
+    case VK_FORMAT_R16G16B16_USCALED:
+    case VK_FORMAT_R16G16B16_SSCALED:
+    case VK_FORMAT_R16G16B16_UINT:
+    case VK_FORMAT_R16G16B16_SINT:
+    case VK_FORMAT_R16G16B16_SFLOAT:
+        ((uint16_t *) value)[0] = (uint16_t) color[0];
+        ((uint16_t *) value)[1] = (uint16_t) color[1];
+        ((uint16_t *) value)[2] = (uint16_t) color[2];
+        break;
+    case VK_FORMAT_A2R10G10B10_UNORM_PACK32:
+    case VK_FORMAT_A2R10G10B10_SNORM_PACK32:
+    case VK_FORMAT_A2R10G10B10_USCALED_PACK32:
+    case VK_FORMAT_A2R10G10B10_SSCALED_PACK32:
+    case VK_FORMAT_A2R10G10B10_UINT_PACK32:
+    case VK_FORMAT_A2R10G10B10_SINT_PACK32:
+        ((uint32_t *) value)[0] = (color[2] & 0x3ff) << 0  |
+                                  (color[1] & 0x3ff) << 10 |
+                                  (color[0] & 0x3ff) << 20 |
+                                  (color[3] & 0x3)   << 30;
+        break;
+    case VK_FORMAT_R64_SFLOAT:
+        /* upper 32 bits are always 0 since color[] is 32-bit */
+        ((uint64_t *) value)[0] = color[0];
+        break;
+    case VK_FORMAT_R64G64_SFLOAT:
+        ((uint64_t *) value)[0] = color[0];
+        ((uint64_t *) value)[1] = color[1];
+        break;
+    case VK_FORMAT_R64G64B64_SFLOAT:
+        ((uint64_t *) value)[0] = color[0];
+        ((uint64_t *) value)[1] = color[1];
+        ((uint64_t *) value)[2] = color[2];
+        break;
+    case VK_FORMAT_R64G64B64A64_SFLOAT:
+        ((uint64_t *) value)[0] = color[0];
+        ((uint64_t *) value)[1] = color[1];
+        ((uint64_t *) value)[2] = color[2];
+        ((uint64_t *) value)[3] = color[3];
+        break;
+    default:
+        assert(!"unknown format");
+        break;
+    }
+}
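+
+/*
+ * Worked example (illustrative only, little-endian as assumed above):
+ * packing full-intensity green into VK_FORMAT_R5G6B5_UNORM_PACK16:
+ *
+ *     const uint32_t color[4] = { 0, 0x3f, 0, 0 };
+ *     uint16_t raw;
+ *     icd_format_get_raw_value(VK_FORMAT_R5G6B5_UNORM_PACK16, color, &raw);
+ *     // raw == 0x3f << 5 == 0x07e0
+ */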
diff --git a/icd/common/icd-format.h b/icd/common/icd-format.h
new file mode 100644
index 0000000..c16de60
--- /dev/null
+++ b/icd/common/icd-format.h
@@ -0,0 +1,85 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ *
+ */
+
+#ifndef ICD_FORMAT_H
+#define ICD_FORMAT_H
+
+#include <stdbool.h>
+#include "icd.h"
+
+static inline bool icd_format_is_undef(VkFormat format)
+{
+    return (format == VK_FORMAT_UNDEFINED);
+}
+
+bool icd_format_is_ds(VkFormat format);
+
+static inline bool icd_format_is_color(VkFormat format)
+{
+    return !(icd_format_is_undef(format) || icd_format_is_ds(format));
+}
+
+bool icd_format_is_norm(VkFormat format);
+
+bool icd_format_is_int(VkFormat format);
+
+bool icd_format_is_float(VkFormat format);
+
+bool icd_format_is_srgb(VkFormat format);
+
+bool icd_format_is_compressed(VkFormat format);
+
+static inline int icd_format_get_block_width(VkFormat format)
+{
+    /* all compressed formats implemented here use 4x4 blocks
+     * (ASTC block dimensions vary, but ASTC is not yet implemented) */
+    return (icd_format_is_compressed(format)) ? 4 : 1;
+}
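+
+/*
+ * Usage sketch (illustrative only): callers can derive a mip level's width
+ * in blocks by rounding up to the block width, e.g.:
+ *
+ *     const int bw = icd_format_get_block_width(format);
+ *     const int w_in_blocks = (width + bw - 1) / bw;
+ */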
+
+static inline bool icd_blend_mode_is_dual_src(VkBlendFactor mode)
+{
+    return (mode == VK_BLEND_FACTOR_SRC1_COLOR) ||
+           (mode == VK_BLEND_FACTOR_SRC1_ALPHA) ||
+           (mode == VK_BLEND_FACTOR_ONE_MINUS_SRC1_COLOR) ||
+           (mode == VK_BLEND_FACTOR_ONE_MINUS_SRC1_ALPHA);
+}
+
+static inline bool icd_pipeline_cb_att_needs_dual_source_blending(const VkPipelineColorBlendAttachmentState *att)
+{
+    if (icd_blend_mode_is_dual_src(att->srcColorBlendFactor) ||
+        icd_blend_mode_is_dual_src(att->srcAlphaBlendFactor) ||
+        icd_blend_mode_is_dual_src(att->dstColorBlendFactor) ||
+        icd_blend_mode_is_dual_src(att->dstAlphaBlendFactor)) {
+        return true;
+    }
+    return false;
+}
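+
+/*
+ * Example (illustrative only): an attachment whose blend factors read the
+ * second fragment shader output triggers the check above:
+ *
+ *     VkPipelineColorBlendAttachmentState att = {
+ *         .blendEnable = VK_TRUE,
+ *         .srcColorBlendFactor = VK_BLEND_FACTOR_SRC1_COLOR,
+ *         .dstColorBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_SRC1_COLOR,
+ *     };
+ *     // icd_pipeline_cb_att_needs_dual_source_blending(&att) == true
+ */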
+
+size_t icd_format_get_size(VkFormat format);
+
+unsigned int icd_format_get_channel_count(VkFormat format);
+
+void icd_format_get_raw_value(VkFormat format,
+                              const uint32_t color[4],
+                              void *value);
+
+#endif /* ICD_FORMAT_H */
diff --git a/icd/common/icd-instance.c b/icd/common/icd-instance.c
new file mode 100644
index 0000000..f630881
--- /dev/null
+++ b/icd/common/icd-instance.c
@@ -0,0 +1,193 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#define _ISOC11_SOURCE /* for aligned_alloc() */
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include "icd-instance.h"
+
+static VKAPI_ATTR void * VKAPI_CALL default_alloc(void *user_data, size_t size,
+                                   size_t alignment,
+                                   VkSystemAllocationScope allocationScope)
+{
+    if (alignment <= 1) {
+        return malloc(size);
+    } else if (u_is_pow2((unsigned int) alignment)) {
+        if (alignment < sizeof(void *)) {
+            assert(u_is_pow2(sizeof(void*)));
+            alignment = sizeof(void *);
+        }
+
+        size = (size + alignment - 1) & ~(alignment - 1);
+
+#if defined(_WIN32)
+        /* MSVC's _aligned_malloc() takes size first, then alignment */
+        return _aligned_malloc(size, alignment);
+#else
+        return aligned_alloc(alignment, size);
+#endif
+    } else {
+        return NULL;
+    }
+}
+
+static VKAPI_ATTR void VKAPI_CALL default_free(void *user_data, void *ptr)
+{
+#if defined(_WIN32)
+    _aligned_free(ptr);
+#else
+    free(ptr);
+#endif
+}
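+
+/*
+ * Usage sketch (illustrative only): these defaults back icd_instance_create()
+ * when the application passes no allocator:
+ *
+ *     void *p = default_alloc(NULL, 100, 64,
+ *                             VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
+ *     // p is 64-byte aligned; the size is rounded up to 128 for aligned_alloc()
+ *     default_free(NULL, p);
+ */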
+
+struct icd_instance *icd_instance_create(const VkApplicationInfo *app_info,
+                                         const VkAllocationCallbacks *alloc_cb)
+{
+    static const VkAllocationCallbacks default_alloc_cb = {
+        .pfnAllocation = default_alloc,
+        .pfnFree = default_free,
+    };
+    struct icd_instance *instance;
+    const char *name;
+    size_t len;
+
+    if (!alloc_cb)
+        alloc_cb = &default_alloc_cb;
+
+    instance = alloc_cb->pfnAllocation(alloc_cb->pUserData, sizeof(*instance), sizeof(int),
+            VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
+    if (!instance)
+        return NULL;
+
+    memset(instance, 0, sizeof(*instance));
+
+    name = (app_info && app_info->pApplicationName) ? app_info->pApplicationName : "unnamed";
+    len = strlen(name);
+    instance->name = alloc_cb->pfnAllocation(alloc_cb->pUserData, len + 1, sizeof(int),
+            VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
+    if (!instance->name) {
+        alloc_cb->pfnFree(alloc_cb->pUserData, instance);
+        return NULL;
+    }
+
+    memcpy(instance->name, name, len);
+    instance->name[len] = '\0';
+
+    instance->alloc_cb = *alloc_cb;
+
+    return instance;
+}
+
+void icd_instance_destroy(struct icd_instance *instance)
+{
+    struct icd_instance_logger *logger;
+
+    for (logger = instance->loggers; logger; ) {
+        struct icd_instance_logger *next = logger->next;
+
+        icd_instance_free(instance, logger);
+        logger = next;
+    }
+
+    icd_instance_free(instance, instance->name);
+    icd_instance_free(instance, instance);
+}
+
+VkResult icd_instance_create_logger(
+        struct icd_instance *instance,
+        const VkDebugReportCallbackCreateInfoEXT *pCreateInfo,
+        const VkAllocationCallbacks *pAllocator,
+        VkDebugReportCallbackEXT *msg_obj)
+{
+    struct icd_instance_logger *logger;
+
+    /* TODOVV: Move this test to a validation layer */
+//    if (msg_obj == NULL) {
+//        return VK_ERROR_INVALID_POINTER;
+//    }
+
+    logger = icd_instance_alloc(instance, sizeof(*logger), sizeof(int),
+            VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!logger)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    logger->func = pCreateInfo->pfnCallback;
+    logger->flags = pCreateInfo->flags;
+    logger->next = instance->loggers;
+    instance->loggers = logger;
+
+    logger->user_data = pCreateInfo->pUserData;
+
+    *(struct icd_instance_logger **) msg_obj = logger;
+
+    return VK_SUCCESS;
+}
+
+void icd_instance_destroy_logger(
+        struct icd_instance *instance,
+        const VkDebugReportCallbackEXT msg_obj,
+        const VkAllocationCallbacks *pAllocator)
+{
+    struct icd_instance_logger *logger, *prev;
+    VkDebugReportCallbackEXT local_msg_obj = msg_obj;
+
+    for (prev = NULL, logger = instance->loggers; logger;
+         prev = logger, logger = logger->next) {
+        if (logger == *(struct icd_instance_logger **) &local_msg_obj)
+            break;
+    }
+
+    /* TODOVV: Move this to validation layer */
+//    if (!logger)
+//        return VK_ERROR_INVALID_POINTER;
+
+    if (prev)
+        prev->next = logger->next;
+    else
+        instance->loggers = logger->next;
+
+    icd_instance_free(instance, logger);
+}
+
+void icd_instance_log(const struct icd_instance *instance,
+                      VkFlags msg_flags,
+                      VkDebugReportObjectTypeEXT obj_type,
+                      uint64_t src_object,
+                      size_t location,
+                      int32_t msg_code,
+                      const char *msg)
+{
+    const struct icd_instance_logger *logger;
+
+    if (!instance->loggers) {
+        fputs(msg, stderr);
+        fputc('\n', stderr);
+        return;
+    }
+
+    for (logger = instance->loggers; logger; logger = logger->next) {
+        if (msg_flags & logger->flags) {
+            logger->func(msg_flags, obj_type, (uint64_t)src_object, location,
+                         msg_code, instance->name, msg, logger->user_data);
+        }
+    }
+}
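+
+/*
+ * Sketch of a callback this would dispatch to (illustrative only; the real
+ * signature is PFN_vkDebugReportCallbackEXT):
+ *
+ *     static VKAPI_ATTR VkBool32 VKAPI_CALL my_logger(
+ *             VkDebugReportFlagsEXT flags, VkDebugReportObjectTypeEXT type,
+ *             uint64_t object, size_t location, int32_t code,
+ *             const char *layer_prefix, const char *msg, void *user_data)
+ *     {
+ *         fprintf(stderr, "[%s] %s\n", layer_prefix, msg);
+ *         return VK_FALSE; // do not abort the call that triggered the report
+ *     }
+ */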
diff --git a/icd/common/icd-instance.h b/icd/common/icd-instance.h
new file mode 100644
index 0000000..d85e921
--- /dev/null
+++ b/icd/common/icd-instance.h
@@ -0,0 +1,85 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#ifndef ICD_INSTANCE_H
+#define ICD_INSTANCE_H
+
+#include "icd-utils.h"
+#include "icd.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct icd_instance_logger {
+    PFN_vkDebugReportCallbackEXT func;
+    void *user_data;
+    VkFlags flags;
+
+    struct icd_instance_logger *next;
+};
+
+struct icd_instance {
+    char *name;
+
+    VkAllocationCallbacks alloc_cb;
+
+    struct icd_instance_logger *loggers;
+};
+
+struct icd_instance *icd_instance_create(const VkApplicationInfo *app_info,
+                                         const VkAllocationCallbacks *alloc_cb);
+void icd_instance_destroy(struct icd_instance *instance);
+
+static inline void *icd_instance_alloc(const struct icd_instance *instance,
+                                       size_t size, size_t alignment,
+                                       VkSystemAllocationScope scope)
+{
+    return instance->alloc_cb.pfnAllocation(instance->alloc_cb.pUserData,
+            size, alignment, scope);
+}
+
+static inline void icd_instance_free(const struct icd_instance *instance,
+                                     void *ptr)
+{
+    instance->alloc_cb.pfnFree(instance->alloc_cb.pUserData, ptr);
+}
+
+VkResult icd_instance_create_logger(struct icd_instance *instance,
+        const VkDebugReportCallbackCreateInfoEXT *pCreateInfo,
+        const VkAllocationCallbacks *pAllocator,
+        VkDebugReportCallbackEXT *msg_obj);
+
+void icd_instance_destroy_logger(struct icd_instance *instance,
+        const VkDebugReportCallbackEXT msg_obj, const VkAllocationCallbacks *pAllocator);
+
+void icd_instance_log(const struct icd_instance *instance,
+                      VkFlags msg_flags,
+                      VkDebugReportObjectTypeEXT obj_type,
+                      uint64_t src_object,
+                      size_t location, int32_t msg_code,
+                      const char *msg);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* ICD_INSTANCE_H */
diff --git a/icd/common/icd-spv.h b/icd/common/icd-spv.h
new file mode 100644
index 0000000..cc65284
--- /dev/null
+++ b/icd/common/icd-spv.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright (c) 2015-2016 Valve Corporation
+ * Copyright (c) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Cody Northrop <cody@lunarg.com>
+ */
+
+#ifndef ICD_SPV_H
+#define ICD_SPV_H
+
+#include <stdint.h>
+
+#define ICD_SPV_MAGIC   0x07230203
+#define ICD_SPV_VERSION 99
+
+struct icd_spv_header {
+    uint32_t magic;
+    uint32_t version;
+    uint32_t gen_magic;  // Generator's magic number
+};
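+
+/*
+ * Usage sketch (illustrative only): a shader blob can be sanity-checked by
+ * comparing its leading word against ICD_SPV_MAGIC:
+ *
+ *     const struct icd_spv_header *h = (const struct icd_spv_header *) code;
+ *     if (code_size < sizeof(*h) || h->magic != ICD_SPV_MAGIC)
+ *         return false; // not a SPIR-V module
+ */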
+
+#endif /* ICD_SPV_H */
diff --git a/icd/common/icd-utils.c b/icd/common/icd-utils.c
new file mode 100644
index 0000000..6b4b6b6
--- /dev/null
+++ b/icd/common/icd-utils.c
@@ -0,0 +1,80 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ *
+ */
+
+#include "icd-utils.h"
+
+/* stolen from Mesa */
+uint16_t u_float_to_half(float f)
+{
+   union fi {
+      float f;
+      uint32_t ui;
+   };
+
+   uint32_t sign_mask  = 0x80000000;
+   uint32_t round_mask = ~0xfff;
+   uint32_t f32inf = 0xff << 23;
+   uint32_t f16inf = 0x1f << 23;
+   uint32_t sign;
+   union fi magic;
+   union fi f32;
+   uint16_t f16;
+
+   magic.ui = 0xf << 23;
+
+   f32.f = f;
+
+   /* Sign */
+   sign = f32.ui & sign_mask;
+   f32.ui ^= sign;
+
+   if (f32.ui == f32inf) {
+      /* Inf */
+      f16 = 0x7c00;
+   } else if (f32.ui > f32inf) {
+      /* NaN */
+      f16 = 0x7e00;
+   } else {
+      /* Number */
+      f32.ui &= round_mask;
+      f32.f  *= magic.f;
+      f32.ui -= round_mask;
+
+      /*
+       * Clamp to max finite value if overflowed.
+       * OpenGL has completely undefined rounding behavior for float to
+       * half-float conversions, and this matches what is mandated for float
+       * to fp11/fp10, which recommend round-to-nearest-finite too.
+       * (d3d10 is deeply unhappy about flushing such values to infinity, and
+       * while it also mandates round-to-zero it doesn't care nearly as much
+       * about that.)
+       */
+      if (f32.ui > f16inf)
+         f32.ui = f16inf - 1;
+
+      f16 = f32.ui >> 13;
+   }
+
+   /* Sign */
+   f16 |= sign >> 16;
+
+   return f16;
+}
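+
+/*
+ * Worked example (illustrative only): u_float_to_half(1.0f) == 0x3c00.
+ * 1.0f is 0x3f800000; the round mask leaves it unchanged, multiplying by
+ * magic (2^-112) rebiases the exponent from 127 to 15, subtracting
+ * round_mask adds the rounding bit, and the final shift by 13 drops the
+ * extra mantissa bits: sign 0, exponent 0x0f, mantissa 0.
+ */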
diff --git a/icd/common/icd-utils.h b/icd/common/icd-utils.h
new file mode 100644
index 0000000..6dc923a
--- /dev/null
+++ b/icd/common/icd-utils.h
@@ -0,0 +1,119 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: David Pinedo <david@lunarg.com>
+ *
+ */
+
+#ifndef ICD_UTILS_H
+#define ICD_UTILS_H
+
+#include <stdbool.h>
+#include <stdint.h>
+#include <assert.h>
+#if defined(PLATFORM_LINUX)
+#include <strings.h> /* for ffs() */
+#else
+#include <intrin.h> /* for _BitScanForward() */
+#endif
+#include "icd.h"
+
+#if defined(NDEBUG) && defined(__GNUC__)
+#define U_ASSERT_ONLY __attribute__((unused))
+#else
+#define U_ASSERT_ONLY
+#endif
+
+#if defined(__GNUC__)
+#define likely(x)   __builtin_expect(!!(x), 1)
+#define unlikely(x) __builtin_expect(!!(x), 0)
+#else
+#define likely(x)   (x)
+#define unlikely(x) (x)
+#endif
+
+#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
+
+#define u_popcount(u) __builtin_popcount(u)
+#define u_popcountll(u) __builtin_popcountll(u)
+
+#define STATIC_ASSERT(expr) do {            \
+    (void) sizeof(char[1 - 2 * !(expr)]);   \
+} while (0)
+
+#define U_CLAMP(val, min, max) \
+    ((val) < (min) ? (min) : (val) > (max) ? (max) : (val))
+
+/**
+ * Return true if val is power of two, or zero.
+ */
+static inline bool u_is_pow2(unsigned int val)
+{
+    return ((val & (val - 1)) == 0);
+}
+
+static inline int u_ffs(int val)
+{
+#if defined(PLATFORM_LINUX)
+    return ffs(val);
+#else
+    /* match ffs(): 1-based index of the least significant set bit, 0 if none */
+    unsigned long bit;
+    return _BitScanForward(&bit, (unsigned long) val) ? (int) bit + 1 : 0;
+#endif
+}
+
+static inline unsigned int u_align(unsigned int val, unsigned alignment)
+{
+    assert(alignment && u_is_pow2(alignment));
+    return (val + alignment - 1) & ~(alignment - 1);
+}
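+
+/* e.g. (illustrative): u_align(1000, 256) == 1024; u_align(4096, 4096) == 4096 */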
+
+static inline unsigned int u_minify(unsigned int val, unsigned level)
+{
+    return (val >> level) ? val >> level : 1;
+}
+
+static inline uint32_t u_fui(float f)
+{
+    union {
+        float f;
+        uint32_t ui;
+    } u = { .f = f };
+
+    return u.ui;
+}
+
+static inline float u_uif(uint32_t ui)
+{
+    union {
+        float f;
+        uint32_t ui;
+    } u = { .ui = ui };
+
+    return u.f;
+}
+
+static inline int u_iround(float f)
+{
+    if (f >= 0.0f)
+        return (int) (f + 0.5f);
+    else
+        return (int) (f - 0.5f);
+}
+
+uint16_t u_float_to_half(float f);
+
+#endif /* ICD_UTILS_H */
diff --git a/icd/common/icd.h b/icd/common/icd.h
new file mode 100644
index 0000000..7ccb67a
--- /dev/null
+++ b/icd/common/icd.h
@@ -0,0 +1,45 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: David Pinedo <david@lunarg.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ *
+ */
+
+#ifndef ICD_H
+#define ICD_H
+
+#include <vulkan/vulkan.h>
+#include <vulkan/vk_platform.h>
+#include "vulkan/vk_sdk_platform.h"
+
+#if defined(__GNUC__) && __GNUC__ >= 4
+#  define ICD_EXPORT __attribute__((visibility("default")))
+#elif defined(__SUNPRO_C) && (__SUNPRO_C >= 0x590)
+#  define ICD_EXPORT __attribute__((visibility("default")))
+#else
+#  define ICD_EXPORT
+#endif
+
+VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL vk_icdGetInstanceProcAddr(
+    VkInstance                                  instance,
+    const char*                                 pName);
+VKAPI_ATTR VkResult VKAPI_CALL vk_icdNegotiateLoaderICDInterfaceVersion(
+    uint32_t                                    *pVersion);
+#endif /* ICD_H */
diff --git a/icd/intel/CMakeLists.txt b/icd/intel/CMakeLists.txt
new file mode 100644
index 0000000..88cbe64
--- /dev/null
+++ b/icd/intel/CMakeLists.txt
@@ -0,0 +1,100 @@
+# Create the i965 Vulkan DRI library
+
+set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-sign-compare")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-sign-compare")
+
+add_subdirectory(kmd)
+add_subdirectory(compiler)
+
+add_custom_command(OUTPUT gpa.c
+    COMMAND ${PROJECT_SOURCE_DIR}/vk-vtgenerate.py Xcb icd-get-proc-addr > gpa.c
+    DEPENDS ${PROJECT_SOURCE_DIR}/vk-vtgenerate.py ${PROJECT_SOURCE_DIR}/vulkan.py)
+
+set(sources
+    buf.c
+    cmd.c
+    cmd_decode.c
+    cmd_meta.c
+    cmd_mi.c
+    cmd_barrier.c
+    cmd_pipeline.c
+    desc.c
+    dev.c
+    instance.c
+    event.c
+    extension_utils.c
+    fb.c
+    fence.c
+    format.c
+    gpa.c
+    gpu.c
+    img.c
+    layout.c
+    mem.c
+    obj.c
+    pipeline.c
+    query.c
+    queue.c
+    sampler.c
+    shader.c
+    state.c
+    view.c
+    )
+
+set(definitions "")
+set(include_dirs "")
+
+set(libraries
+    m
+    icd
+    intelkmd
+    intelcompiler)
+
+find_package(XCB COMPONENTS xcb xcb-dri3 xcb-present)
+if(XCB_FOUND)
+    list(APPEND include_dirs ${XCB_INCLUDE_DIRS})
+    list(APPEND libraries ${XCB_LIBRARIES})
+
+    if (DisplayServer MATCHES "Xcb")
+        find_package(XCB COMPONENTS xcb)
+        if(XCB_FOUND)
+            # Intentionally no include directories.
+            list(APPEND libraries ${XCB_LIBRARIES})
+        endif()
+    endif()
+
+    if (DisplayServer MATCHES "Xlib")
+        find_package(X11_XCB COMPONENTS X11-xcb)
+        if(X11_XCB_FOUND)
+            # Intentionally no include directories.
+            list(APPEND libraries ${X11_XCB_LIBRARIES})
+        endif()
+    endif()
+
+    list(APPEND sources wsi_x11.c)
+
+    set_source_files_properties(wsi_x11.c PROPERTIES COMPILE_FLAGS "-I${PROJECT_SOURCE_DIR}/icd/intel/kmd/libdrm/include/drm")
+else()
+    list(APPEND sources wsi_null.c)
+endif()
+
+if (NOT WIN32)
+    # intel ICD not built on Windows
+    # extra setup for out-of-tree builds
+    if (NOT (CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_CURRENT_BINARY_DIR))
+            add_custom_target(intel_icd-json ALL
+                COMMAND ln -sf ${CMAKE_CURRENT_SOURCE_DIR}/intel_icd.json
+                VERBATIM
+                )
+    endif()
+endif()
+
+add_library(VK_i965 SHARED ${sources})
+target_compile_definitions(VK_i965 PRIVATE ${definitions})
+target_include_directories(VK_i965 PRIVATE ${include_dirs})
+target_link_libraries(VK_i965 ${libraries})
+
+# set -Bsymbolic for vkGetProcAddr()
+set_target_properties(VK_i965 PROPERTIES
+    COMPILE_FLAGS "-Wmissing-declarations"
+    LINK_FLAGS "-Wl,-Bsymbolic -Wl,-no-undefined -Wl,--exclude-libs,ALL")
diff --git a/icd/intel/README.md b/icd/intel/README.md
new file mode 100644
index 0000000..d259f2f
--- /dev/null
+++ b/icd/intel/README.md
@@ -0,0 +1,7 @@
+# Intel Sample Driver
+
+This directory provides support for the Intel Haswell, Ivy Bridge and Sandy Bridge GPUs:
+- The top-level directory contains the OS-independent API support
+- [compiler](compiler) contains the BIL->Intel ISA compiler
+- [kmd](kmd) contains the OS kernel mode driver abstraction
+- [genhw](genhw) contains the autogenerated HW interface
diff --git a/icd/intel/buf.c b/icd/intel/buf.c
new file mode 100644
index 0000000..bfd4381
--- /dev/null
+++ b/icd/intel/buf.c
@@ -0,0 +1,105 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ *
+ */
+
+#include "dev.h"
+#include "obj.h"
+#include "buf.h"
+
+static void buf_destroy(struct intel_obj *obj)
+{
+    struct intel_buf *buf = intel_buf_from_obj(obj);
+
+    intel_buf_destroy(buf);
+}
+
+static VkResult buf_get_memory_requirements(struct intel_base *base,
+                               VkMemoryRequirements *pRequirements)
+{
+    struct intel_buf *buf = intel_buf_from_base(base);
+
+    /*
+     * From the Sandy Bridge PRM, volume 1 part 1, page 118:
+     *
+     *     "For buffers, which have no inherent "height," padding
+     *      requirements are different. A buffer must be padded to the
+     *      next multiple of 256 array elements, with an additional 16
+     *      bytes added beyond that to account for the L1 cache line."
+    */
+    pRequirements->size = buf->size;
+    if (buf->usage & (VK_BUFFER_USAGE_UNIFORM_TEXEL_BUFFER_BIT |
+                      VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT)) {
+        pRequirements->size = u_align(pRequirements->size, 256) + 16;
+    }
+
+    pRequirements->alignment      = 4096;
+    pRequirements->memoryTypeBits = (1 << INTEL_MEMORY_TYPE_COUNT) - 1;
+
+    return VK_SUCCESS;
+}
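+
+/*
+ * Worked example (illustrative only): a 1000-byte buffer created with
+ * VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT reports
+ * u_align(1000, 256) + 16 == 1024 + 16 == 1040 bytes, following the PRM
+ * padding rule quoted above.
+ */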
+
+VkResult intel_buf_create(struct intel_dev *dev,
+                            const VkBufferCreateInfo *info,
+                            struct intel_buf **buf_ret)
+{
+    struct intel_buf *buf;
+
+    buf = (struct intel_buf *) intel_base_create(&dev->base.handle,
+            sizeof(*buf), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_BUFFER_EXT, info, 0);
+    if (!buf)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    buf->size = info->size;
+    buf->usage = info->usage;
+
+    buf->obj.destroy = buf_destroy;
+    buf->obj.base.get_memory_requirements = buf_get_memory_requirements;
+
+    *buf_ret = buf;
+
+    return VK_SUCCESS;
+}
+
+void intel_buf_destroy(struct intel_buf *buf)
+{
+    intel_base_destroy(&buf->obj.base);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateBuffer(
+    VkDevice                                  device,
+    const VkBufferCreateInfo*               pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkBuffer*                                 pBuffer)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_buf_create(dev, pCreateInfo, (struct intel_buf **) pBuffer);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyBuffer(
+    VkDevice                                device,
+    VkBuffer                                buffer,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    struct intel_obj *obj = intel_obj(buffer);
+
+    obj->destroy(obj);
+}
diff --git a/icd/intel/buf.h b/icd/intel/buf.h
new file mode 100644
index 0000000..a40d160
--- /dev/null
+++ b/icd/intel/buf.h
@@ -0,0 +1,56 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#ifndef BUF_H
+#define BUF_H
+
+#include "intel.h"
+#include "obj.h"
+
+struct intel_buf {
+    struct intel_obj obj;
+
+    VkDeviceSize size;
+    VkFlags usage;
+};
+
+static inline struct intel_buf *intel_buf(VkBuffer buf)
+{
+    return *(struct intel_buf **) &buf;
+}
+
+static inline struct intel_buf *intel_buf_from_base(struct intel_base *base)
+{
+    return (struct intel_buf *) base;
+}
+
+static inline struct intel_buf *intel_buf_from_obj(struct intel_obj *obj)
+{
+    return intel_buf_from_base(&obj->base);
+}
+
+VkResult intel_buf_create(struct intel_dev *dev,
+                            const VkBufferCreateInfo *info,
+                            struct intel_buf **buf_ret);
+
+void intel_buf_destroy(struct intel_buf *buf);
+
+#endif /* BUF_H */
diff --git a/icd/intel/cmd.c b/icd/intel/cmd.c
new file mode 100644
index 0000000..32a0d36
--- /dev/null
+++ b/icd/intel/cmd.c
@@ -0,0 +1,568 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ *
+ */
+
+#include "genhw/genhw.h"
+#include "kmd/winsys.h"
+#include "dev.h"
+#include "mem.h"
+#include "obj.h"
+#include "cmd_priv.h"
+#include "fb.h"
+
+/**
+ * Free all resources used by a writer.  Note that the initial size is not
+ * reset.
+ */
+static void cmd_writer_reset(struct intel_cmd *cmd,
+                             enum intel_cmd_writer_type which)
+{
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+
+    if (writer->ptr) {
+        intel_bo_unmap(writer->bo);
+        writer->ptr = NULL;
+    }
+
+    intel_bo_unref(writer->bo);
+    writer->bo = NULL;
+
+    writer->used = 0;
+
+    writer->sba_offset = 0;
+
+    if (writer->items) {
+        intel_free(cmd, writer->items);
+        writer->items = NULL;
+        writer->item_alloc = 0;
+        writer->item_used = 0;
+    }
+}
+
+/**
+ * Discard everything written so far.
+ */
+static void cmd_writer_discard(struct intel_cmd *cmd,
+                               enum intel_cmd_writer_type which)
+{
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+
+    intel_bo_truncate_relocs(writer->bo, 0);
+    writer->used = 0;
+    writer->item_used = 0;
+}
+
+static struct intel_bo *alloc_writer_bo(struct intel_winsys *winsys,
+                                        enum intel_cmd_writer_type which,
+                                        size_t size)
+{
+    static const char *writer_names[INTEL_CMD_WRITER_COUNT] = {
+        [INTEL_CMD_WRITER_BATCH] = "batch",
+        [INTEL_CMD_WRITER_SURFACE] = "surface",
+        [INTEL_CMD_WRITER_STATE] = "state",
+        [INTEL_CMD_WRITER_INSTRUCTION] = "instruction",
+    };
+
+    return intel_winsys_alloc_bo(winsys, writer_names[which], size, true);
+}
+
+/**
+ * Allocate and map the buffer for writing.
+ */
+static VkResult cmd_writer_alloc_and_map(struct intel_cmd *cmd,
+                                           enum intel_cmd_writer_type which)
+{
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+    struct intel_bo *bo;
+
+    bo = alloc_writer_bo(cmd->dev->winsys, which, writer->size);
+    if (bo) {
+        intel_bo_unref(writer->bo);
+        writer->bo = bo;
+    } else if (writer->bo) {
+        /* reuse the old bo */
+        cmd_writer_discard(cmd, which);
+    } else {
+        return VK_ERROR_OUT_OF_DEVICE_MEMORY;
+    }
+
+    writer->used = 0;
+    writer->item_used = 0;
+
+    writer->ptr = intel_bo_map(writer->bo, true);
+    if (!writer->ptr) {
+        return VK_ERROR_MEMORY_MAP_FAILED;
+    }
+
+    return VK_SUCCESS;
+}
+
+/**
+ * Unmap the buffer for submission.
+ */
+static void cmd_writer_unmap(struct intel_cmd *cmd,
+                             enum intel_cmd_writer_type which)
+{
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+
+    intel_bo_unmap(writer->bo);
+    writer->ptr = NULL;
+}
+
+/**
+ * Grow a mapped writer to at least \p new_size.  Failures are handled
+ * silently.
+ */
+void cmd_writer_grow(struct intel_cmd *cmd,
+                     enum intel_cmd_writer_type which,
+                     size_t new_size)
+{
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+    struct intel_bo *new_bo;
+    void *new_ptr;
+
+    if (new_size < writer->size << 1)
+        new_size = writer->size << 1;
+    /* STATE_BASE_ADDRESS requires page-aligned buffers */
+    new_size = u_align(new_size, 4096);
+
+    new_bo = alloc_writer_bo(cmd->dev->winsys, which, new_size);
+    if (!new_bo) {
+        cmd_writer_discard(cmd, which);
+        cmd_fail(cmd, VK_ERROR_OUT_OF_DEVICE_MEMORY);
+        return;
+    }
+
+    /* map and copy the data over */
+    new_ptr = intel_bo_map(new_bo, true);
+    if (!new_ptr) {
+        intel_bo_unref(new_bo);
+        cmd_writer_discard(cmd, which);
+        cmd_fail(cmd, VK_ERROR_VALIDATION_FAILED_EXT);
+        return;
+    }
+
+    memcpy(new_ptr, writer->ptr, writer->used);
+
+    intel_bo_unmap(writer->bo);
+    intel_bo_unref(writer->bo);
+
+    writer->size = new_size;
+    writer->bo = new_bo;
+    writer->ptr = new_ptr;
+}
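+
+/*
+ * Note (illustrative): growth is geometric -- a writer asked to grow from
+ * 8 KB to 9 KB is doubled to 16 KB first, then page-aligned, so repeated
+ * small appends cost amortized O(1) copies.
+ */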
+
+/**
+ * Record an item for later decoding.
+ */
+void cmd_writer_record(struct intel_cmd *cmd,
+                       enum intel_cmd_writer_type which,
+                       enum intel_cmd_item_type type,
+                       size_t offset, size_t size)
+{
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+    struct intel_cmd_item *item;
+
+    if (writer->item_used == writer->item_alloc) {
+        const unsigned new_alloc = (writer->item_alloc) ?
+            writer->item_alloc << 1 : 256;
+        struct intel_cmd_item *items;
+
+        items = intel_alloc(cmd, sizeof(writer->items[0]) * new_alloc,
+                sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+        if (!items) {
+            writer->item_used = 0;
+            cmd_fail(cmd, VK_ERROR_OUT_OF_HOST_MEMORY);
+            return;
+        }
+
+        memcpy(items, writer->items,
+                sizeof(writer->items[0]) * writer->item_alloc);
+
+        intel_free(cmd, writer->items);
+
+        writer->items = items;
+        writer->item_alloc = new_alloc;
+    }
+
+    item = &writer->items[writer->item_used++];
+    item->type = type;
+    item->offset = offset;
+    item->size = size;
+}
+
+static void cmd_writer_patch(struct intel_cmd *cmd,
+                             enum intel_cmd_writer_type which,
+                             size_t offset, uint32_t val)
+{
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+
+    assert(offset + sizeof(val) <= writer->used);
+    *((uint32_t *) ((char *) writer->ptr + offset)) = val;
+}
+
+static void cmd_reset(struct intel_cmd *cmd)
+{
+    uint32_t i;
+
+    for (i = 0; i < INTEL_CMD_WRITER_COUNT; i++)
+        cmd_writer_reset(cmd, i);
+
+    if (cmd->bind.shader_cache.entries)
+        intel_free(cmd, cmd->bind.shader_cache.entries);
+
+    if (cmd->bind.dset.graphics_data.set_offsets)
+        intel_free(cmd, cmd->bind.dset.graphics_data.set_offsets);
+    if (cmd->bind.dset.graphics_data.dynamic_offsets)
+        intel_free(cmd, cmd->bind.dset.graphics_data.dynamic_offsets);
+    if (cmd->bind.dset.compute_data.set_offsets)
+        intel_free(cmd, cmd->bind.dset.compute_data.set_offsets);
+    if (cmd->bind.dset.compute_data.dynamic_offsets)
+        intel_free(cmd, cmd->bind.dset.compute_data.dynamic_offsets);
+
+    memset(&cmd->bind, 0, sizeof(cmd->bind));
+
+    cmd->reloc_used = 0;
+    cmd->result = VK_SUCCESS;
+}
+
+static void cmd_destroy(struct intel_obj *obj)
+{
+    struct intel_cmd *cmd = intel_cmd_from_obj(obj);
+
+    intel_cmd_destroy(cmd);
+}
+
+VkResult intel_cmd_create(struct intel_dev *dev,
+                            const VkCommandBufferAllocateInfo *info,
+                            struct intel_cmd **cmd_ret)
+{
+    int pipeline_select;
+    struct intel_cmd *cmd;
+    struct intel_cmd_pool *pool = intel_cmd_pool(info->commandPool);
+
+    switch (pool->queue_family_index) {
+    case INTEL_GPU_ENGINE_3D:
+        pipeline_select = GEN6_PIPELINE_SELECT_DW0_SELECT_3D;
+        break;
+    default:
+        /* TODOVV: Add validation check for this */
+        assert(0 && "icd: Invalid queue_family_index");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    cmd = (struct intel_cmd *) intel_base_create(&dev->base.handle,
+            sizeof(*cmd), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT, info, 0);
+    if (!cmd)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    cmd->obj.destroy = cmd_destroy;
+
+    cmd->dev = dev;
+    cmd->scratch_bo = dev->cmd_scratch_bo;
+    cmd->primary = (info->level == VK_COMMAND_BUFFER_LEVEL_PRIMARY);
+    cmd->pipeline_select = pipeline_select;
+
+    /*
+     * XXX This is not quite right.  intel_gpu sets maxMemReferences to
+     * batch_buffer_reloc_count, but we may emit up to two relocs, for start
+     * and end offsets, for each referenced memory.
+     */
+    cmd->reloc_count = dev->gpu->batch_buffer_reloc_count;
+    cmd->relocs = intel_alloc(cmd, sizeof(cmd->relocs[0]) * cmd->reloc_count,
+            4096, VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!cmd->relocs) {
+        intel_cmd_destroy(cmd);
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    }
+
+    *cmd_ret = cmd;
+
+    return VK_SUCCESS;
+}
+
+void intel_cmd_destroy(struct intel_cmd *cmd)
+{
+    cmd_reset(cmd);
+
+    intel_free(cmd, cmd->relocs);
+    intel_base_destroy(&cmd->obj.base);
+}
+
+VkResult intel_cmd_begin(struct intel_cmd *cmd, const VkCommandBufferBeginInfo *info)
+{
+    VkResult ret;
+    uint32_t i;
+
+    cmd_reset(cmd);
+
+    /* TODOVV: Check that render pass is defined */
+    const VkCommandBufferInheritanceInfo *hinfo = info->pInheritanceInfo;
+    if (!cmd->primary) {
+        cmd_begin_render_pass(cmd,
+                intel_render_pass(hinfo->renderPass),
+                intel_fb(hinfo->framebuffer),
+                hinfo->subpass,
+                VK_SUBPASS_CONTENTS_INLINE);
+    }
+
+    if (cmd->flags != info->flags) {
+        cmd->flags = info->flags;
+        cmd->writers[INTEL_CMD_WRITER_BATCH].size = 0;
+    }
+
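+    /*
+     * Writer sizing below: the batch writer gets half of the GPU's maximum
+     * batch buffer size, the surface and dynamic-state writers a quarter
+     * each, and the instruction (kernel) writer a fixed 16 KiB.  E.g. an
+     * 8 MiB maximum yields a 4 MiB batch and 2 MiB surface/state buffers.
+     */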
+    if (!cmd->writers[INTEL_CMD_WRITER_BATCH].size) {
+        const uint32_t size = cmd->dev->gpu->max_batch_buffer_size / 2;
+
+        cmd->writers[INTEL_CMD_WRITER_BATCH].size = size;
+        cmd->writers[INTEL_CMD_WRITER_SURFACE].size = size / 2;
+        cmd->writers[INTEL_CMD_WRITER_STATE].size = size / 2;
+        cmd->writers[INTEL_CMD_WRITER_INSTRUCTION].size = 16384;
+    }
+
+    for (i = 0; i < INTEL_CMD_WRITER_COUNT; i++) {
+        ret = cmd_writer_alloc_and_map(cmd, i);
+        if (ret != VK_SUCCESS) {
+            cmd_reset(cmd);
+            return ret;
+        }
+    }
+
+    cmd_batch_begin(cmd);
+
+    return VK_SUCCESS;
+}
+
+VkResult intel_cmd_end(struct intel_cmd *cmd)
+{
+    struct intel_winsys *winsys = cmd->dev->winsys;
+    uint32_t i;
+
+    /* there must have been a matching intel_cmd_begin() */
+    assert(cmd->writers[INTEL_CMD_WRITER_BATCH].ptr && "icd: no matching intel_cmd_begin");
+
+    cmd_batch_end(cmd);
+
+    /* TODO we need a more "explicit" winsys */
+    for (i = 0; i < cmd->reloc_used; i++) {
+        const struct intel_cmd_reloc *reloc = &cmd->relocs[i];
+        const struct intel_cmd_writer *writer = &cmd->writers[reloc->which];
+        uint64_t presumed_offset;
+        int err;
+
+        /*
+         * Once a bo is used as a reloc target, libdrm_intel disallows more
+         * relocs to be added to it.  That may happen when
+         * INTEL_CMD_RELOC_TARGET_IS_WRITER is set.  We have to process them
+         * in another pass.
+         */
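+        /*
+         * Pass 1 (this loop) handles relocs whose target is a plain bo;
+         * pass 2 (the loop below) handles relocs targeting another
+         * writer's bo, deferred so that no further relocs are added to a
+         * bo after it has become a reloc target.
+         */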
+        if (reloc->flags & INTEL_CMD_RELOC_TARGET_IS_WRITER)
+            continue;
+
+        err = intel_bo_add_reloc(writer->bo, reloc->offset,
+                (struct intel_bo *) reloc->target, reloc->target_offset,
+                reloc->flags, &presumed_offset);
+        if (err) {
+            cmd_fail(cmd, VK_ERROR_OUT_OF_DEVICE_MEMORY);
+            break;
+        }
+
+        assert(presumed_offset == (uint64_t) (uint32_t) presumed_offset);
+        cmd_writer_patch(cmd, reloc->which, reloc->offset,
+                (uint32_t) presumed_offset);
+    }
+    for (i = 0; i < cmd->reloc_used; i++) {
+        const struct intel_cmd_reloc *reloc = &cmd->relocs[i];
+        const struct intel_cmd_writer *writer = &cmd->writers[reloc->which];
+        uint64_t presumed_offset;
+        int err;
+
+        if (!(reloc->flags & INTEL_CMD_RELOC_TARGET_IS_WRITER))
+            continue;
+
+        err = intel_bo_add_reloc(writer->bo, reloc->offset,
+                cmd->writers[reloc->target].bo, reloc->target_offset,
+                reloc->flags & ~INTEL_CMD_RELOC_TARGET_IS_WRITER,
+                &presumed_offset);
+        if (err) {
+            cmd_fail(cmd, VK_ERROR_OUT_OF_DEVICE_MEMORY);
+            break;
+        }
+
+        assert(presumed_offset == (uint64_t) (uint32_t) presumed_offset);
+        cmd_writer_patch(cmd, reloc->which, reloc->offset,
+                (uint32_t) presumed_offset);
+    }
+
+    for (i = 0; i < INTEL_CMD_WRITER_COUNT; i++)
+        cmd_writer_unmap(cmd, i);
+
+    if (cmd->result != VK_SUCCESS)
+        return cmd->result;
+
+    if (!intel_winsys_can_submit_bo(winsys,
+                &cmd->writers[INTEL_CMD_WRITER_BATCH].bo, 1)) {
+        assert(0 && "intel_winsys_can_submit_bo failed");
+        return VK_ERROR_DEVICE_LOST;
+    }
+
+    return VK_SUCCESS;
+}
+
+static void pool_destroy(struct intel_obj *obj)
+{
+    struct intel_cmd_pool *cmd_pool = intel_cmd_pool_from_obj(obj);
+
+    intel_cmd_pool_destroy(cmd_pool);
+}
+
+VkResult intel_cmd_pool_create(struct intel_dev *dev,
+                            const VkCommandPoolCreateInfo *info,
+                            struct intel_cmd_pool **cmd_pool_ret)
+{
+    struct intel_cmd_pool *cmd_pool;
+
+    cmd_pool = (struct intel_cmd_pool *) intel_base_create(&dev->base.handle,
+            sizeof(*cmd_pool), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_POOL_EXT, info, 0);
+    if (!cmd_pool)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    cmd_pool->obj.destroy = pool_destroy;
+
+    cmd_pool->dev = dev;
+    cmd_pool->queue_family_index = info->queueFamilyIndex;
+    cmd_pool->create_flags = info->flags;
+
+    *cmd_pool_ret = cmd_pool;
+
+    return VK_SUCCESS;
+}
+
+void intel_cmd_pool_destroy(struct intel_cmd_pool *cmd_pool)
+{
+    /* TODO: destroy the command buffers allocated from this pool
+     * (cmd_reset(), free the relocs, intel_base_destroy()).
+     */
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateCommandPool(
+    VkDevice                                    device,
+    const VkCommandPoolCreateInfo*              pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkCommandPool*                              pCommandPool)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_cmd_pool_create(dev, pCreateInfo,
+            (struct intel_cmd_pool **) pCommandPool);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyCommandPool(
+    VkDevice                                        device,
+    VkCommandPool                                   commandPool,
+    const VkAllocationCallbacks*                    pAllocator)
+{
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkResetCommandPool(
+    VkDevice                                        device,
+    VkCommandPool                                   commandPool,
+    VkCommandPoolResetFlags                         flags)
+{
+    // TODO
+    return VK_SUCCESS;
+}
+
+void intel_free_cmd_buffers(
+    struct intel_cmd_pool              *cmd_pool,
+    uint32_t                            count,
+    const VkCommandBuffer                  *cmd_bufs)
+{
+    for (uint32_t i = 0; i < count; i++) {
+        struct intel_obj *obj = intel_obj(cmd_bufs[i]);
+
+        obj->destroy(obj);
+    }
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkAllocateCommandBuffers(
+    VkDevice                                   device,
+    const VkCommandBufferAllocateInfo*         pAllocateInfo,
+    VkCommandBuffer*                           pCommandBuffers)
+{
+    struct intel_dev *dev = intel_dev(device);
+    struct intel_cmd_pool *pool = intel_cmd_pool(pAllocateInfo->commandPool);
+    uint32_t num_allocated = 0;
+    VkResult res;
+
+    for (uint32_t i = 0; i < pAllocateInfo->commandBufferCount; i++) {
+        res = intel_cmd_create(dev, pAllocateInfo,
+            (struct intel_cmd **) &pCommandBuffers[i]);
+        if (res != VK_SUCCESS) {
+            intel_free_cmd_buffers(pool,
+                                   num_allocated,
+                                   pCommandBuffers);
+            return res;
+        }
+        num_allocated++;
+    }
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkFreeCommandBuffers(
+    VkDevice                                device,
+    VkCommandPool                           commandPool,
+    uint32_t                                commandBufferCount,
+    const VkCommandBuffer*                  pCommandBuffers)
+{
+    intel_free_cmd_buffers(intel_cmd_pool(commandPool), commandBufferCount, pCommandBuffers);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkBeginCommandBuffer(
+    VkCommandBuffer                              commandBuffer,
+    const VkCommandBufferBeginInfo            *info)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    return intel_cmd_begin(cmd, info);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEndCommandBuffer(
+    VkCommandBuffer                              commandBuffer)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    return intel_cmd_end(cmd);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkResetCommandBuffer(
+    VkCommandBuffer                              commandBuffer,
+    VkCommandBufferResetFlags                    flags)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    cmd_reset(cmd);
+
+    return VK_SUCCESS;
+}
diff --git a/icd/intel/cmd.h b/icd/intel/cmd.h
new file mode 100644
index 0000000..4c55e28
--- /dev/null
+++ b/icd/intel/cmd.h
@@ -0,0 +1,311 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ *
+ */
+
+#ifndef CMD_H
+#define CMD_H
+
+#include "intel.h"
+#include "obj.h"
+#include "view.h"
+#include "state.h"
+
+struct intel_pipeline;
+struct intel_pipeline_shader;
+struct intel_viewport_state;
+struct intel_raster_state;
+struct intel_msaa_state;
+struct intel_blend_state;
+struct intel_ds_state;
+struct intel_desc_set;
+struct intel_render_pass;
+
+struct intel_cmd_item;
+struct intel_cmd_reloc;
+struct intel_cmd_meta;
+
+/*
+ * Hardware workarounds known to be needed for intel_pipeline.  These are
+ * mostly for pipeline derivatives.
+ */
+enum intel_cmd_wa_flags {
+    /*
+     * From the Sandy Bridge PRM, volume 2 part 1, page 60:
+     *
+     *     "Before any depth stall flush (including those produced by
+     *      non-pipelined state commands), software needs to first send a
+     *      PIPE_CONTROL with no bits set except Post-Sync Operation != 0."
+     */
+    INTEL_CMD_WA_GEN6_PRE_DEPTH_STALL_WRITE = 1 << 0,
+
+    /*
+     * From the Sandy Bridge PRM, volume 2 part 1, page 274:
+     *
+     *     "A PIPE_CONTROL command, with only the Stall At Pixel Scoreboard
+     *      field set (DW1 Bit 1), must be issued prior to any change to the
+     *      value in this field (Maximum Number of Threads in 3DSTATE_WM)"
+     *
+     * From the Ivy Bridge PRM, volume 2 part 1, page 286:
+     *
+     *     "If this field (Maximum Number of Threads in 3DSTATE_PS) is changed
+     *      between 3DPRIMITIVE commands, a PIPE_CONTROL command with Stall at
+     *      Pixel Scoreboard set is required to be issued."
+     */
+    INTEL_CMD_WA_GEN6_PRE_COMMAND_SCOREBOARD_STALL = 1 << 1,
+
+    /*
+     * From the Ivy Bridge PRM, volume 2 part 1, page 106:
+     *
+     *     "A PIPE_CONTROL with Post-Sync Operation set to 1h and a depth
+     *      stall needs to be sent just prior to any 3DSTATE_VS,
+     *      3DSTATE_URB_VS, 3DSTATE_CONSTANT_VS,
+     *      3DSTATE_BINDING_TABLE_POINTER_VS, 3DSTATE_SAMPLER_STATE_POINTER_VS
+     *      command.  Only one PIPE_CONTROL needs to be sent before any
+     *      combination of VS associated 3DSTATE."
+     */
+    INTEL_CMD_WA_GEN7_PRE_VS_DEPTH_STALL_WRITE = 1 << 2,
+
+    /*
+     * From the Ivy Bridge PRM, volume 2 part 1, page 258:
+     *
+     *     "Due to an HW issue driver needs to send a pipe control with stall
+     *      when ever there is state change in depth bias related state"
+     */
+    INTEL_CMD_WA_GEN7_POST_COMMAND_CS_STALL = 1 << 3,
+
+    /*
+     * From the Ivy Bridge PRM, volume 2 part 1, page 276:
+     *
+     *     "The driver must make sure a PIPE_CONTROL with the Depth Stall
+     *      Enable bit set after all the following states are programmed:
+     *
+     *       - 3DSTATE_PS
+     *       - 3DSTATE_VIEWPORT_STATE_POINTERS_CC
+     *       - 3DSTATE_CONSTANT_PS
+     *       - 3DSTATE_BINDING_TABLE_POINTERS_PS
+     *       - 3DSTATE_SAMPLER_STATE_POINTERS_PS
+     *       - 3DSTATE_CC_STATE_POINTERS
+     *       - 3DSTATE_BLEND_STATE_POINTERS
+     *       - 3DSTATE_DEPTH_STENCIL_STATE_POINTERS"
+     */
+    INTEL_CMD_WA_GEN7_POST_COMMAND_DEPTH_STALL = 1 << 4,
+};
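+
+/*
+ * A hypothetical Gen7 pipeline bind might accumulate several of these in
+ * cmd->bind.wa_flags (illustrative only):
+ *
+ *   wa_flags |= INTEL_CMD_WA_GEN7_PRE_VS_DEPTH_STALL_WRITE |
+ *               INTEL_CMD_WA_GEN7_POST_COMMAND_DEPTH_STALL;
+ */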
+
+enum intel_cmd_writer_type {
+    INTEL_CMD_WRITER_BATCH,
+    INTEL_CMD_WRITER_SURFACE,
+    INTEL_CMD_WRITER_STATE,
+    INTEL_CMD_WRITER_INSTRUCTION,
+
+    INTEL_CMD_WRITER_COUNT,
+};
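+
+/*
+ * One writer per hardware buffer: BATCH holds the command stream, SURFACE
+ * holds SURFACE_STATE and binding tables, STATE holds dynamic state such
+ * as viewports, and INSTRUCTION holds compiled shader kernels.
+ */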
+
+enum intel_use_pipeline_dynamic_state {
+    INTEL_USE_PIPELINE_DYNAMIC_VIEWPORT                  = (1 << 0),
+    INTEL_USE_PIPELINE_DYNAMIC_SCISSOR                   = (1 << 1),
+    INTEL_USE_PIPELINE_DYNAMIC_LINE_WIDTH                = (1 << 2),
+    INTEL_USE_PIPELINE_DYNAMIC_DEPTH_BIAS                = (1 << 3),
+    INTEL_USE_PIPELINE_DYNAMIC_BLEND_CONSTANTS           = (1 << 4),
+    INTEL_USE_PIPELINE_DYNAMIC_DEPTH_BOUNDS              = (1 << 5),
+    INTEL_USE_PIPELINE_DYNAMIC_STENCIL_COMPARE_MASK      = (1 << 6),
+    INTEL_USE_PIPELINE_DYNAMIC_STENCIL_WRITE_MASK        = (1 << 7),
+    INTEL_USE_PIPELINE_DYNAMIC_STENCIL_REFERENCE         = (1 << 8)
+};
+
+struct intel_cmd_shader_cache {
+    struct {
+        const void *shader;
+        uint32_t kernel_offset;
+    } *entries;
+
+    uint32_t count;
+    uint32_t used;
+};
+
+struct intel_cmd_dset_data {
+    struct intel_desc_offset *set_offsets;
+    uint32_t set_offset_count;
+
+    uint32_t *dynamic_offsets;
+    uint32_t dynamic_offset_count;
+};
+
+/*
+ * States bound to the command buffer.  We want to write states directly to
+ * the command buffer when possible, and shrink this struct accordingly.
+ */
+struct intel_cmd_bind {
+    const struct intel_cmd_meta *meta;
+
+    struct intel_cmd_shader_cache shader_cache;
+
+    struct {
+        const struct intel_pipeline *graphics;
+        const struct intel_pipeline *compute;
+
+        uint32_t vs_offset;
+        uint32_t tcs_offset;
+        uint32_t tes_offset;
+        uint32_t gs_offset;
+        uint32_t fs_offset;
+        uint32_t cs_offset;
+    } pipeline;
+
+    struct {
+        VkFlags use_pipeline_dynamic_state;
+        struct intel_dynamic_viewport viewport;
+        struct intel_dynamic_line_width line_width;
+        struct intel_dynamic_depth_bias depth_bias;
+        struct intel_dynamic_blend blend;
+        struct intel_dynamic_depth_bounds depth_bounds;
+        struct intel_dynamic_stencil stencil;
+    } state;
+
+    struct {
+        struct intel_cmd_dset_data graphics_data;
+        struct intel_cmd_dset_data compute_data;
+    } dset;
+
+    struct {
+        const struct intel_buf *buf[INTEL_MAX_VERTEX_BINDING_COUNT];
+        VkDeviceSize offset[INTEL_MAX_VERTEX_BINDING_COUNT];
+    } vertex;
+
+    struct {
+        const struct intel_buf *buf;
+        VkDeviceSize offset;
+        VkIndexType type;
+    } index;
+
+
+    bool render_pass_changed;
+    const struct intel_render_pass *render_pass;
+    const struct intel_render_pass_subpass *render_pass_subpass;
+    const struct intel_fb *fb;
+    VkSubpassContents render_pass_contents;
+
+    uint32_t draw_count;
+    uint32_t wa_flags;
+};
+
+struct intel_cmd_writer {
+    size_t size;
+    struct intel_bo *bo;
+    void *ptr;
+
+    size_t used;
+
+    uint32_t sba_offset;
+
+    /* for decoding */
+    struct intel_cmd_item *items;
+    uint32_t item_alloc;
+    uint32_t item_used;
+};
+
+struct intel_cmd_pool {
+    struct intel_obj obj;
+    struct intel_dev *dev;
+
+    uint32_t queue_family_index;
+    uint32_t create_flags;
+};
+
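+/*
+ * VkCommandPool is a non-dispatchable handle whose value is the
+ * intel_cmd_pool pointer itself, hence the reinterpretation below.
+ */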
+static inline struct intel_cmd_pool *intel_cmd_pool(VkCommandPool pool)
+{
+    return *(struct intel_cmd_pool **) &pool;
+}
+
+static inline struct intel_cmd_pool *intel_cmd_pool_from_base(struct intel_base *base)
+{
+    return (struct intel_cmd_pool *) base;
+}
+
+static inline struct intel_cmd_pool *intel_cmd_pool_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_cmd_pool *) &obj->base;
+}
+
+VkResult intel_cmd_pool_create(struct intel_dev *dev,
+                            const VkCommandPoolCreateInfo *info,
+                            struct intel_cmd_pool **cmd_pool_ret);
+void intel_cmd_pool_destroy(struct intel_cmd_pool *pool);
+
+void intel_free_cmd_buffers(
+        struct intel_cmd_pool              *cmd_pool,
+        uint32_t                            count,
+        const VkCommandBuffer                  *cmd_bufs);
+
+struct intel_cmd {
+    struct intel_obj obj;
+
+    struct intel_dev *dev;
+    struct intel_bo *scratch_bo;
+    bool primary;
+    int pipeline_select;
+
+    struct intel_cmd_reloc *relocs;
+    uint32_t reloc_count;
+
+    VkFlags flags;
+
+    struct intel_cmd_writer writers[INTEL_CMD_WRITER_COUNT];
+
+    uint32_t reloc_used;
+    VkResult result;
+
+    struct intel_cmd_bind bind;
+};
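+
+/*
+ * Lifecycle (see cmd.c): intel_cmd_create() allocates the reloc array,
+ * intel_cmd_begin() resets state and maps the four writers, recording
+ * fills them, and intel_cmd_end() resolves relocs and unmaps the writers
+ * before the batch can be submitted.
+ */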
+
+static inline struct intel_cmd *intel_cmd(VkCommandBuffer cmd)
+{
+    return (struct intel_cmd *) cmd;
+}
+
+static inline struct intel_cmd *intel_cmd_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_cmd *) obj;
+}
+
+VkResult intel_cmd_create(struct intel_dev *dev,
+                            const VkCommandBufferAllocateInfo *info,
+                            struct intel_cmd **cmd_ret);
+void intel_cmd_destroy(struct intel_cmd *cmd);
+
+VkResult intel_cmd_begin(struct intel_cmd *cmd, const VkCommandBufferBeginInfo* pBeginInfo);
+VkResult intel_cmd_end(struct intel_cmd *cmd);
+
+void intel_cmd_decode(struct intel_cmd *cmd, bool decode_inst_writer);
+
+static inline struct intel_bo *intel_cmd_get_batch(const struct intel_cmd *cmd,
+                                                   VkDeviceSize *used)
+{
+    const struct intel_cmd_writer *writer =
+        &cmd->writers[INTEL_CMD_WRITER_BATCH];
+
+    if (used)
+        *used = writer->used;
+
+    return writer->bo;
+}
+
+#endif /* CMD_H */
diff --git a/icd/intel/cmd_barrier.c b/icd/intel/cmd_barrier.c
new file mode 100644
index 0000000..afd325a
--- /dev/null
+++ b/icd/intel/cmd_barrier.c
@@ -0,0 +1,328 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#include "genhw/genhw.h"
+#include "img.h"
+#include "buf.h"
+#include "cmd_priv.h"
+
+enum {
+    READ_OP          = 1 << 0,
+    WRITE_OP         = 1 << 1,
+    HIZ_OP           = 1 << 2,
+};
+
+enum {
+    MEM_CACHE        = 1 << 0,
+    DATA_READ_CACHE  = 1 << 1,
+    DATA_WRITE_CACHE = 1 << 2,
+    RENDER_CACHE     = 1 << 3,
+    SAMPLER_CACHE    = 1 << 4,
+};
+
+static uint32_t img_get_layout_ops(const struct intel_img *img,
+                                   VkImageLayout layout)
+{
+    uint32_t ops;
+
+    switch ((int) layout) {
+    case VK_IMAGE_LAYOUT_GENERAL:
+    case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR:
+        ops = READ_OP | WRITE_OP;
+        break;
+    case VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL:
+        ops = READ_OP | WRITE_OP;
+        break;
+    case VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL:
+        ops = READ_OP | WRITE_OP | HIZ_OP;
+        break;
+    case VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL:
+        ops = READ_OP | HIZ_OP;
+        break;
+    case VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL:
+        ops = READ_OP;
+        break;
+    case VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL:
+        ops = READ_OP;
+        break;
+    case VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL:
+        ops = WRITE_OP;
+        break;
+    case VK_IMAGE_LAYOUT_UNDEFINED:
+    default:
+        ops = 0;
+        break;
+    }
+
+    return ops;
+}
+
+static uint32_t img_get_layout_caches(const struct intel_img *img,
+                                     VkImageLayout layout)
+{
+    uint32_t caches;
+
+    switch ((int) layout) {
+    case VK_IMAGE_LAYOUT_GENERAL:
+    case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR:
+        // General layout when image can be used for any kind of access
+        caches = MEM_CACHE | DATA_READ_CACHE | DATA_WRITE_CACHE | RENDER_CACHE | SAMPLER_CACHE;
+        break;
+    case VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL:
+        // Optimal layout when image is only used for color attachment read/write
+        caches = DATA_WRITE_CACHE | RENDER_CACHE;
+        break;
+    case VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL:
+        // Optimal layout when image is only used for depth/stencil attachment read/write
+        caches = DATA_WRITE_CACHE | RENDER_CACHE;
+        break;
+    case VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL:
+        // Optimal layout when image is used for read only depth/stencil attachment and shader access
+        caches = RENDER_CACHE;
+        break;
+    case VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL:
+        // Optimal layout when image is used for read only shader access
+        caches = DATA_READ_CACHE | SAMPLER_CACHE;
+        break;
+    case VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL:
+        // Optimal layout when image is used only as source of transfer operations
+        caches = MEM_CACHE | DATA_READ_CACHE | RENDER_CACHE | SAMPLER_CACHE;
+        break;
+    case VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL:
+        // Optimal layout when image is used only as destination of transfer operations
+        caches = MEM_CACHE | DATA_WRITE_CACHE | RENDER_CACHE;
+        break;
+    default:
+        caches = 0;
+        break;
+    }
+
+    return caches;
+}
+
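+/*
+ * When the old layout allowed writes, leaving a HiZ-capable layout needs a
+ * depth resolve, and entering one needs a HiZ resolve.
+ */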
+static void cmd_resolve_depth(struct intel_cmd *cmd,
+                              struct intel_img *img,
+                              VkImageLayout old_layout,
+                              VkImageLayout new_layout,
+                              const VkImageSubresourceRange *range)
+{
+    const uint32_t old_ops = img_get_layout_ops(img, old_layout);
+    const uint32_t new_ops = img_get_layout_ops(img, new_layout);
+
+    if (old_ops & WRITE_OP) {
+        if ((old_ops & HIZ_OP) && !(new_ops & HIZ_OP))
+            cmd_meta_ds_op(cmd, INTEL_CMD_META_DS_RESOLVE, img, range);
+        else if (!(old_ops & HIZ_OP) && (new_ops & HIZ_OP))
+            cmd_meta_ds_op(cmd, INTEL_CMD_META_DS_HIZ_RESOLVE, img, range);
+    }
+}
+
+static uint32_t cmd_get_flush_flags(const struct intel_cmd *cmd,
+                                    uint32_t old_caches,
+                                    uint32_t new_caches,
+                                    bool is_ds)
+{
+    uint32_t flags = 0;
+
+    /* not dirty */
+    if (!(old_caches & (MEM_CACHE | RENDER_CACHE | DATA_WRITE_CACHE)))
+        return 0;
+
+    if ((old_caches & RENDER_CACHE) && (new_caches & ~RENDER_CACHE)) {
+        if (is_ds)
+            flags |= GEN6_PIPE_CONTROL_DEPTH_CACHE_FLUSH;
+        else
+            flags |= GEN6_PIPE_CONTROL_RENDER_CACHE_FLUSH;
+    }
+
+    if ((old_caches & DATA_WRITE_CACHE) &&
+        (new_caches & ~(DATA_READ_CACHE | DATA_WRITE_CACHE))) {
+        if (cmd_gen(cmd) >= INTEL_GEN(7))
+            flags |= GEN7_PIPE_CONTROL_DC_FLUSH;
+    }
+
+    if (new_caches & SAMPLER_CACHE)
+        flags |= GEN6_PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
+
+    if ((new_caches & DATA_READ_CACHE) && old_caches != DATA_WRITE_CACHE)
+        flags |= GEN6_PIPE_CONTROL_CONSTANT_CACHE_INVALIDATE;
+
+    if (!flags)
+        return 0;
+
+    flags |= GEN6_PIPE_CONTROL_CS_STALL;
+
+    return flags;
+}
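+
+/*
+ * Worked example (illustrative): transitioning from
+ * COLOR_ATTACHMENT_OPTIMAL (RENDER_CACHE | DATA_WRITE_CACHE) to
+ * SHADER_READ_ONLY_OPTIMAL (DATA_READ_CACHE | SAMPLER_CACHE) yields
+ * GEN6_PIPE_CONTROL_RENDER_CACHE_FLUSH, GEN7_PIPE_CONTROL_DC_FLUSH on
+ * Gen7+, GEN6_PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE,
+ * GEN6_PIPE_CONTROL_CONSTANT_CACHE_INVALIDATE, and the mandatory
+ * GEN6_PIPE_CONTROL_CS_STALL.
+ */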
+
+static void cmd_memory_barriers(struct intel_cmd *cmd,
+                                uint32_t flush_flags,
+                                uint32_t mem_barrier_count,
+                                const VkMemoryBarrier* mem_barriers,
+                                uint32_t buf_mem_barrier_count,
+                                const VkBufferMemoryBarrier* buf_mem_barriers,
+                                uint32_t image_mem_barrier_count,
+                                const VkImageMemoryBarrier* image_mem_barriers)
+{
+    uint32_t i;
+    VkFlags input_mask = 0;
+    VkFlags output_mask = 0;
+
+    for (i = 0; i < mem_barrier_count; i++) {
+        const VkMemoryBarrier *b = &mem_barriers[i];
+        assert(b->sType == VK_STRUCTURE_TYPE_MEMORY_BARRIER);
+        output_mask |= b->srcAccessMask;
+        input_mask  |= b->dstAccessMask;
+    }
+
+    for (i = 0; i < buf_mem_barrier_count; i++) {
+        const VkBufferMemoryBarrier *b = &buf_mem_barriers[i];
+        assert(b->sType == VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER);
+        output_mask |= b->srcAccessMask;
+        input_mask  |= b->dstAccessMask;
+    }
+
+    for (i = 0; i < image_mem_barrier_count; i++) {
+        const VkImageMemoryBarrier *b = &image_mem_barriers[i];
+        assert(b->sType == VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER);
+        output_mask |= b->srcAccessMask;
+        input_mask  |= b->dstAccessMask;
+        {
+            struct intel_img *img = intel_img(b->image);
+
+            cmd_resolve_depth(cmd, img, b->oldLayout,
+                        b->newLayout, &b->subresourceRange);
+
+            flush_flags |= cmd_get_flush_flags(cmd,
+                            img_get_layout_caches(img, b->oldLayout),
+                            img_get_layout_caches(img, b->newLayout),
+                            icd_format_is_ds(img->layout.format));
+        }
+    }
+
+    if (output_mask & VK_ACCESS_SHADER_WRITE_BIT) {
+        flush_flags |= GEN7_PIPE_CONTROL_DC_FLUSH;
+    }
+    if (output_mask & VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT) {
+        flush_flags |= GEN6_PIPE_CONTROL_RENDER_CACHE_FLUSH;
+    }
+    if (output_mask & VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT) {
+        flush_flags |= GEN6_PIPE_CONTROL_DEPTH_CACHE_FLUSH;
+    }
+
+    /* CPU write is cache coherent, so VK_ACCESS_HOST_WRITE_BIT needs no flush. */
+    /* Meta handles flushes, so VK_ACCESS_TRANSFER_WRITE_BIT needs no flush. */
+
+    if (input_mask & (VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_UNIFORM_READ_BIT)) {
+        flush_flags |= GEN6_PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
+    }
+
+    if (input_mask & VK_ACCESS_UNIFORM_READ_BIT) {
+        flush_flags |= GEN6_PIPE_CONTROL_CONSTANT_CACHE_INVALIDATE;
+    }
+
+    if (input_mask & VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT) {
+        flush_flags |= GEN6_PIPE_CONTROL_VF_CACHE_INVALIDATE;
+    }
+
+    /* These bits have no corresponding cache invalidate operation.
+     * VK_ACCESS_HOST_READ_BIT
+     * VK_ACCESS_INDIRECT_COMMAND_READ_BIT
+     * VK_ACCESS_INDEX_READ_BIT
+     * VK_ACCESS_COLOR_ATTACHMENT_READ_BIT
+     * VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT
+     * VK_ACCESS_TRANSFER_READ_BIT
+     */
+
+    cmd_batch_flush(cmd, flush_flags);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdWaitEvents(
+    VkCommandBuffer                             commandBuffer,
+    uint32_t                                    eventCount,
+    const VkEvent*                              pEvents,
+    VkPipelineStageFlags                        sourceStageMask,
+    VkPipelineStageFlags                        dstStageMask,
+    uint32_t                                    memoryBarrierCount,
+    const VkMemoryBarrier*                      pMemoryBarriers,
+    uint32_t                                    bufferMemoryBarrierCount,
+    const VkBufferMemoryBarrier*                pBufferMemoryBarriers,
+    uint32_t                                    imageMemoryBarrierCount,
+    const VkImageMemoryBarrier*                 pImageMemoryBarriers)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    /* This hardware will always wait at VK_PIPELINE_STAGE_TOP_OF_PIPE.
+     * Passing a stageMask specifying other stages
+     * does not change that.
+     */
+
+    /* Because the command buffer is serialized, reaching a pipelined wait
+     * always happens after completion of prior events, so pEvents need not
+     * be examined.  vkCmdWaitEvents is thus equivalent to the memory
+     * barrier part of vkCmdPipelineBarrier: cmd_memory_barriers will wait
+     * via GEN6_PIPE_CONTROL_CS_STALL and perform appropriate cache control.
+     */
+    cmd_memory_barriers(cmd, GEN6_PIPE_CONTROL_CS_STALL,
+                        memoryBarrierCount, pMemoryBarriers,
+                        bufferMemoryBarrierCount, pBufferMemoryBarriers,
+                        imageMemoryBarrierCount, pImageMemoryBarriers);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdPipelineBarrier(
+        VkCommandBuffer                             commandBuffer,
+        VkPipelineStageFlags                        srcStageMask,
+        VkPipelineStageFlags                        dstStageMask,
+        VkDependencyFlags                           dependencyFlags,
+        uint32_t                                    memoryBarrierCount,
+        const VkMemoryBarrier*                      pMemoryBarriers,
+        uint32_t                                    bufferMemoryBarrierCount,
+        const VkBufferMemoryBarrier*                pBufferMemoryBarriers,
+        uint32_t                                    imageMemoryBarrierCount,
+        const VkImageMemoryBarrier*                 pImageMemoryBarriers)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    uint32_t pipe_control_flags = 0;
+
+    /* This hardware will always wait at VK_PIPELINE_STAGE_TOP_OF_PIPE.
+     * Passing a stageMask specifying other stages
+     * does not change that.
+     */
+
+    /* Cache control is done with PIPE_CONTROL flags.
+     * Without GEN6_PIPE_CONTROL_CS_STALL, the PIPE_CONTROL takes effect at
+     * the top of the pipe; with it, the hardware first waits for prior
+     * commands to complete.
+     */
+
+    if ((srcStageMask & ~VK_PIPELINE_STAGE_HOST_BIT) ||
+        (dstStageMask & ~VK_PIPELINE_STAGE_HOST_BIT)) {
+        pipe_control_flags = GEN6_PIPE_CONTROL_CS_STALL;
+    }
+
+    /* cmd_memory_barriers can wait for GEN6_PIPE_CONTROL_CS_STALL and perform
+     * appropriate cache control.
+     */
+    cmd_memory_barriers(cmd, pipe_control_flags,
+                        memoryBarrierCount, pMemoryBarriers,
+                        bufferMemoryBarrierCount, pBufferMemoryBarriers,
+                        imageMemoryBarrierCount, pImageMemoryBarriers);
+}
diff --git a/icd/intel/cmd_decode.c b/icd/intel/cmd_decode.c
new file mode 100644
index 0000000..dcb5cad
--- /dev/null
+++ b/icd/intel/cmd_decode.c
@@ -0,0 +1,607 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#include <stdio.h>
+#include <stdarg.h>
+#include "compiler/pipeline/pipeline_compiler_interface.h"
+#include "genhw/genhw.h"
+#include "kmd/winsys.h"
+#include "cmd_priv.h"
+
+static const uint32_t *
+writer_pointer(const struct intel_cmd *cmd,
+               enum intel_cmd_writer_type which,
+               unsigned offset)
+{
+    const struct intel_cmd_writer *writer = &cmd->writers[which];
+    return (const uint32_t *) ((const char *) writer->ptr + offset);
+}
+
+static uint32_t
+writer_dw(const struct intel_cmd *cmd,
+          enum intel_cmd_writer_type which,
+          unsigned offset, unsigned dw_index,
+          const char *format, ...)
+{
+    const uint32_t *dw = writer_pointer(cmd, which, offset);
+    va_list ap;
+    char desc[16];
+    int len;
+
+    fprintf(stderr, "0x%08x:      0x%08x: ",
+            offset + (dw_index << 2), dw[dw_index]);
+
+    va_start(ap, format);
+    len = vsnprintf(desc, sizeof(desc), format, ap);
+    va_end(ap);
+
+    if (len >= sizeof(desc)) {
+        len = sizeof(desc) - 1;
+        desc[len] = '\0';
+    }
+
+    if (desc[len - 1] == '\n') {
+        desc[len - 1] = '\0';
+        fprintf(stderr, "%8s: \n", desc);
+    } else {
+        fprintf(stderr, "%8s: ", desc);
+    }
+
+    return dw[dw_index];
+}
+
+static void
+writer_decode_blob(const struct intel_cmd *cmd,
+                   enum intel_cmd_writer_type which,
+                   const struct intel_cmd_item *item)
+{
+    const unsigned state_size = sizeof(uint32_t);
+    const unsigned count = item->size / state_size;
+    unsigned offset = item->offset;
+    unsigned i;
+
+    for (i = 0; i < count; i += 4) {
+        const uint32_t *dw = writer_pointer(cmd, which, offset);
+
+        writer_dw(cmd, which, offset, 0, "BLOB%d", i / 4);
+
+        switch (count - i) {
+        case 1:
+            fprintf(stderr, "(%10.4f, %10c, %10c, %10c) "
+                            "(0x%08x, %10c, %10c, %10c)\n",
+                            u_uif(dw[0]), 'X', 'X', 'X',
+                            dw[0],        'X', 'X', 'X');
+            break;
+        case 2:
+            fprintf(stderr, "(%10.4f, %10.4f, %10c, %10c) "
+                            "(0x%08x, 0x%08x, %10c, %10c)\n",
+                            u_uif(dw[0]), u_uif(dw[1]), 'X', 'X',
+                                  dw[0],        dw[1],  'X', 'X');
+            break;
+        case 3:
+            fprintf(stderr, "(%10.4f, %10.4f, %10.4f, %10c) "
+                            "(0x%08x, 0x%08x, 0x%08x, %10c)\n",
+                            u_uif(dw[0]), u_uif(dw[1]), u_uif(dw[2]), 'X',
+                                  dw[0],        dw[1],        dw[2],  'X');
+            break;
+        default:
+            fprintf(stderr, "(%10.4f, %10.4f, %10.4f, %10.4f) "
+                            "(0x%08x, 0x%08x, 0x%08x, 0x%08x)\n",
+                            u_uif(dw[0]), u_uif(dw[1]), u_uif(dw[2]), u_uif(dw[3]),
+                                  dw[0],        dw[1],        dw[2],        dw[3]);
+            break;
+        }
+
+        offset += state_size * 4;
+    }
+}
+
+static void
+writer_decode_clip_viewport(const struct intel_cmd *cmd,
+                            enum intel_cmd_writer_type which,
+                            const struct intel_cmd_item *item)
+{
+    const unsigned state_size = sizeof(uint32_t) * 4;
+    const unsigned count = item->size / state_size;
+    unsigned offset = item->offset;
+    unsigned i;
+
+    for (i = 0; i < count; i++) {
+        uint32_t dw;
+
+        dw = writer_dw(cmd, which, offset, 0, "CLIP VP%d", i);
+        fprintf(stderr, "xmin = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 1, "CLIP VP%d", i);
+        fprintf(stderr, "xmax = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 2, "CLIP VP%d", i);
+        fprintf(stderr, "ymin = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 3, "CLIP VP%d", i);
+        fprintf(stderr, "ymax = %f\n", u_uif(dw));
+
+        offset += state_size;
+    }
+}
+
+static void
+writer_decode_sf_clip_viewport_gen7(const struct intel_cmd *cmd,
+                                    enum intel_cmd_writer_type which,
+                                    const struct intel_cmd_item *item)
+{
+    const unsigned state_size = sizeof(uint32_t) * 16;
+    const unsigned count = item->size / state_size;
+    unsigned offset = item->offset;
+    unsigned i;
+
+    for (i = 0; i < count; i++) {
+        uint32_t dw;
+
+        dw = writer_dw(cmd, which, offset, 0, "SF_CLIP VP%d", i);
+        fprintf(stderr, "m00 = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 1, "SF_CLIP VP%d", i);
+        fprintf(stderr, "m11 = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 2, "SF_CLIP VP%d", i);
+        fprintf(stderr, "m22 = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 3, "SF_CLIP VP%d", i);
+        fprintf(stderr, "m30 = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 4, "SF_CLIP VP%d", i);
+        fprintf(stderr, "m31 = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 5, "SF_CLIP VP%d", i);
+        fprintf(stderr, "m32 = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 8, "SF_CLIP VP%d", i);
+        fprintf(stderr, "guardband xmin = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 9, "SF_CLIP VP%d", i);
+        fprintf(stderr, "guardband xmax = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 10, "SF_CLIP VP%d", i);
+        fprintf(stderr, "guardband ymin = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 11, "SF_CLIP VP%d", i);
+        fprintf(stderr, "guardband ymax = %f\n", u_uif(dw));
+
+        offset += state_size;
+    }
+}
+
+static void
+writer_decode_sf_viewport_gen6(const struct intel_cmd *cmd,
+                               enum intel_cmd_writer_type which,
+                               const struct intel_cmd_item *item)
+{
+    const unsigned state_size = sizeof(uint32_t) * 8;
+    const unsigned count = item->size / state_size;
+    unsigned offset = item->offset;
+    unsigned i;
+
+    for (i = 0; i < count; i++) {
+        uint32_t dw;
+
+        dw = writer_dw(cmd, which, offset, 0, "SF VP%d", i);
+        fprintf(stderr, "m00 = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 1, "SF VP%d", i);
+        fprintf(stderr, "m11 = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 2, "SF VP%d", i);
+        fprintf(stderr, "m22 = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 3, "SF VP%d", i);
+        fprintf(stderr, "m30 = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 4, "SF VP%d", i);
+        fprintf(stderr, "m31 = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 5, "SF VP%d", i);
+        fprintf(stderr, "m32 = %f\n", u_uif(dw));
+
+        offset += state_size;
+    }
+}
+
+static void
+writer_decode_sf_viewport(const struct intel_cmd *cmd,
+                          enum intel_cmd_writer_type which,
+                          const struct intel_cmd_item *item)
+{
+    if (cmd_gen(cmd) >= INTEL_GEN(7))
+        writer_decode_sf_clip_viewport_gen7(cmd, which, item);
+    else
+        writer_decode_sf_viewport_gen6(cmd, which, item);
+}
+
+static void
+writer_decode_scissor_rect(const struct intel_cmd *cmd,
+                           enum intel_cmd_writer_type which,
+                           const struct intel_cmd_item *item)
+{
+    const unsigned state_size = sizeof(uint32_t) * 2;
+    const unsigned count = item->size / state_size;
+    unsigned offset = item->offset;
+    unsigned i;
+
+    for (i = 0; i < count; i++) {
+        uint32_t dw;
+
+        dw = writer_dw(cmd, which, offset, 0, "SCISSOR%d", i);
+        fprintf(stderr, "xmin %d, ymin %d\n",
+                GEN_EXTRACT(dw, GEN6_SCISSOR_DW0_MIN_X),
+                GEN_EXTRACT(dw, GEN6_SCISSOR_DW0_MIN_Y));
+
+        dw = writer_dw(cmd, which, offset, 1, "SCISSOR%d", i);
+        fprintf(stderr, "xmax %d, ymax %d\n",
+                GEN_EXTRACT(dw, GEN6_SCISSOR_DW1_MAX_X),
+                GEN_EXTRACT(dw, GEN6_SCISSOR_DW1_MAX_Y));
+
+        offset += state_size;
+    }
+}
+
+static void
+writer_decode_cc_viewport(const struct intel_cmd *cmd,
+                          enum intel_cmd_writer_type which,
+                          const struct intel_cmd_item *item)
+{
+    const unsigned state_size = sizeof(uint32_t) * 2;
+    const unsigned count = item->size / state_size;
+    unsigned offset = item->offset;
+    unsigned i;
+
+    for (i = 0; i < count; i++) {
+        uint32_t dw;
+
+        dw = writer_dw(cmd, which, offset, 0, "CC VP%d", i);
+        fprintf(stderr, "min_depth = %f\n", u_uif(dw));
+
+        dw = writer_dw(cmd, which, offset, 1, "CC VP%d", i);
+        fprintf(stderr, "max_depth = %f\n", u_uif(dw));
+
+        offset += state_size;
+    }
+}
+
+static void
+writer_decode_color_calc(const struct intel_cmd *cmd,
+                         enum intel_cmd_writer_type which,
+                         const struct intel_cmd_item *item)
+{
+    uint32_t dw;
+
+    dw = writer_dw(cmd, which, item->offset, 0, "CC");
+    fprintf(stderr, "alpha test format %s, round disable %d, "
+            "stencil ref %d, bf stencil ref %d\n",
+            GEN_EXTRACT(dw, GEN6_CC_DW0_ALPHATEST) ? "FLOAT32" : "UNORM8",
+            (bool) (dw & GEN6_CC_DW0_ROUND_DISABLE_DISABLE),
+            GEN_EXTRACT(dw, GEN6_CC_DW0_STENCIL0_REF),
+            GEN_EXTRACT(dw, GEN6_CC_DW0_STENCIL1_REF));
+
+    writer_dw(cmd, which, item->offset, 1, "CC\n");
+
+    dw = writer_dw(cmd, which, item->offset, 2, "CC");
+    fprintf(stderr, "constant red %f\n", u_uif(dw));
+
+    dw = writer_dw(cmd, which, item->offset, 3, "CC");
+    fprintf(stderr, "constant green %f\n", u_uif(dw));
+
+    dw = writer_dw(cmd, which, item->offset, 4, "CC");
+    fprintf(stderr, "constant blue %f\n", u_uif(dw));
+
+    dw = writer_dw(cmd, which, item->offset, 5, "CC");
+    fprintf(stderr, "constant alpha %f\n", u_uif(dw));
+}
+
+static void
+writer_decode_depth_stencil(const struct intel_cmd *cmd,
+                            enum intel_cmd_writer_type which,
+                            const struct intel_cmd_item *item)
+{
+    uint32_t dw;
+
+    dw = writer_dw(cmd, which, item->offset, 0, "D_S");
+    fprintf(stderr, "stencil %sable, func %d, write %sable\n",
+            (dw & GEN6_ZS_DW0_STENCIL_TEST_ENABLE) ? "en" : "dis",
+            GEN_EXTRACT(dw, GEN6_ZS_DW0_STENCIL0_FUNC),
+            (dw & GEN6_ZS_DW0_STENCIL_WRITE_ENABLE) ? "en" : "dis");
+
+    dw = writer_dw(cmd, which, item->offset, 1, "D_S");
+    fprintf(stderr, "stencil test mask 0x%x, write mask 0x%x\n",
+            GEN_EXTRACT(dw, GEN6_ZS_DW1_STENCIL0_VALUEMASK),
+            GEN_EXTRACT(dw, GEN6_ZS_DW1_STENCIL0_WRITEMASK));
+
+    dw = writer_dw(cmd, which, item->offset, 2, "D_S");
+    fprintf(stderr, "depth test %sable, func %d, write %sable\n",
+            (dw & GEN6_ZS_DW2_DEPTH_TEST_ENABLE) ? "en" : "dis",
+            GEN_EXTRACT(dw, GEN6_ZS_DW2_DEPTH_FUNC),
+            (dw & GEN6_ZS_DW2_DEPTH_WRITE_ENABLE) ? "en" : "dis");
+}
+
+static void
+writer_decode_blend(const struct intel_cmd *cmd,
+                    enum intel_cmd_writer_type which,
+                    const struct intel_cmd_item *item)
+{
+    const unsigned state_size = sizeof(uint32_t) * 2;
+    const unsigned count = item->size / state_size;
+    unsigned offset = item->offset;
+    unsigned i;
+
+    for (i = 0; i < count; i++) {
+        writer_dw(cmd, which, offset, 0, "BLEND%d\n", i);
+        writer_dw(cmd, which, offset, 1, "BLEND%d\n", i);
+
+        offset += state_size;
+    }
+}
+
+static void
+writer_decode_sampler(const struct intel_cmd *cmd,
+                      enum intel_cmd_writer_type which,
+                      const struct intel_cmd_item *item)
+{
+    const unsigned state_size = sizeof(uint32_t) * 4;
+    const unsigned count = item->size / state_size;
+    unsigned offset = item->offset;
+    unsigned i;
+
+    for (i = 0; i < count; i++) {
+        writer_dw(cmd, which, offset, 0, "WM SAMP%d", i);
+        fprintf(stderr, "filtering\n");
+
+        writer_dw(cmd, which, offset, 1, "WM SAMP%d", i);
+        fprintf(stderr, "wrapping, lod\n");
+
+        writer_dw(cmd, which, offset, 2, "WM SAMP%d", i);
+        fprintf(stderr, "default color pointer\n");
+
+        writer_dw(cmd, which, offset, 3, "WM SAMP%d", i);
+        fprintf(stderr, "chroma key, aniso\n");
+
+        offset += state_size;
+    }
+}
+
+static void
+writer_decode_surface_gen7(const struct intel_cmd *cmd,
+                           enum intel_cmd_writer_type which,
+                           const struct intel_cmd_item *item)
+{
+    uint32_t dw;
+
+    dw = writer_dw(cmd, which, item->offset, 0, "SURF");
+    fprintf(stderr, "type 0x%x, format 0x%x, tiling %d, %s array\n",
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW0_TYPE),
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW0_FORMAT),
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW0_TILING),
+            (dw & GEN7_SURFACE_DW0_IS_ARRAY) ? "is" : "not");
+
+    writer_dw(cmd, which, item->offset, 1, "SURF");
+    fprintf(stderr, "offset\n");
+
+    dw = writer_dw(cmd, which, item->offset, 2, "SURF");
+    fprintf(stderr, "%dx%d size\n",
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW2_WIDTH),
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW2_HEIGHT));
+
+    dw = writer_dw(cmd, which, item->offset, 3, "SURF");
+    fprintf(stderr, "depth %d, pitch %d\n",
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW3_DEPTH),
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW3_PITCH));
+
+    dw = writer_dw(cmd, which, item->offset, 4, "SURF");
+    fprintf(stderr, "min array element %d, array extent %d\n",
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW4_MIN_ARRAY_ELEMENT),
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW4_RT_VIEW_EXTENT));
+
+    dw = writer_dw(cmd, which, item->offset, 5, "SURF");
+    fprintf(stderr, "mip base %d, mips %d, x,y offset: %d,%d\n",
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW5_MIN_LOD),
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW5_MIP_COUNT_LOD),
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW5_X_OFFSET),
+            GEN_EXTRACT(dw, GEN7_SURFACE_DW5_Y_OFFSET));
+
+    writer_dw(cmd, which, item->offset, 6, "SURF\n");
+    writer_dw(cmd, which, item->offset, 7, "SURF\n");
+}
+
+static void
+writer_decode_surface_gen6(const struct intel_cmd *cmd,
+                           enum intel_cmd_writer_type which,
+                           const struct intel_cmd_item *item)
+{
+    uint32_t dw;
+
+    dw = writer_dw(cmd, which, item->offset, 0, "SURF");
+    fprintf(stderr, "type 0x%x, format 0x%x\n",
+            GEN_EXTRACT(dw, GEN6_SURFACE_DW0_TYPE),
+            GEN_EXTRACT(dw, GEN6_SURFACE_DW0_FORMAT));
+
+    writer_dw(cmd, which, item->offset, 1, "SURF");
+    fprintf(stderr, "offset\n");
+
+    dw = writer_dw(cmd, which, item->offset, 2, "SURF");
+    fprintf(stderr, "%dx%d size, %d mips\n",
+            GEN_EXTRACT(dw, GEN6_SURFACE_DW2_WIDTH),
+            GEN_EXTRACT(dw, GEN6_SURFACE_DW2_HEIGHT),
+            GEN_EXTRACT(dw, GEN6_SURFACE_DW2_MIP_COUNT_LOD));
+
+    dw = writer_dw(cmd, which, item->offset, 3, "SURF");
+    fprintf(stderr, "pitch %d, tiling %d\n",
+            GEN_EXTRACT(dw, GEN6_SURFACE_DW3_PITCH),
+            GEN_EXTRACT(dw, GEN6_SURFACE_DW3_TILING));
+
+    dw = writer_dw(cmd, which, item->offset, 4, "SURF");
+    fprintf(stderr, "mip base %d\n",
+            GEN_EXTRACT(dw, GEN6_SURFACE_DW4_MIN_LOD));
+
+    dw = writer_dw(cmd, which, item->offset, 5, "SURF");
+    fprintf(stderr, "x,y offset: %d,%d\n",
+            GEN_EXTRACT(dw, GEN6_SURFACE_DW5_X_OFFSET),
+            GEN_EXTRACT(dw, GEN6_SURFACE_DW5_Y_OFFSET));
+}
+
+static void
+writer_decode_surface(const struct intel_cmd *cmd,
+                      enum intel_cmd_writer_type which,
+                      const struct intel_cmd_item *item)
+{
+    if (cmd_gen(cmd) >= INTEL_GEN(7))
+        writer_decode_surface_gen7(cmd, which, item);
+    else
+        writer_decode_surface_gen6(cmd, which, item);
+}
+
+static void
+writer_decode_binding_table(const struct intel_cmd *cmd,
+                            enum intel_cmd_writer_type which,
+                            const struct intel_cmd_item *item)
+{
+    const unsigned state_size = sizeof(uint32_t) * 1;
+    const unsigned count = item->size / state_size;
+    unsigned offset = item->offset;
+    unsigned i;
+
+    for (i = 0; i < count; i++) {
+        writer_dw(cmd, which, offset, 0, "BIND");
+        fprintf(stderr, "BINDING_TABLE_STATE[%d]\n", i);
+
+        offset += state_size;
+    }
+}
+
+static void
+writer_decode_kernel(const struct intel_cmd *cmd,
+                     enum intel_cmd_writer_type which,
+                     const struct intel_cmd_item *item)
+{
+    const void *kernel = (const void *)
+        writer_pointer(cmd, which, item->offset);
+
+    fprintf(stderr, "0x%08zx:\n", item->offset);
+    intel_disassemble_kernel(cmd->dev->gpu, kernel, item->size);
+}
+
+static const struct {
+    void (*func)(const struct intel_cmd *cmd,
+                 enum intel_cmd_writer_type which,
+                 const struct intel_cmd_item *item);
+} writer_decode_table[INTEL_CMD_ITEM_COUNT] = {
+    [INTEL_CMD_ITEM_BLOB]                = { writer_decode_blob },
+    [INTEL_CMD_ITEM_CLIP_VIEWPORT]       = { writer_decode_clip_viewport },
+    [INTEL_CMD_ITEM_SF_VIEWPORT]         = { writer_decode_sf_viewport },
+    [INTEL_CMD_ITEM_SCISSOR_RECT]        = { writer_decode_scissor_rect },
+    [INTEL_CMD_ITEM_CC_VIEWPORT]         = { writer_decode_cc_viewport },
+    [INTEL_CMD_ITEM_COLOR_CALC]          = { writer_decode_color_calc },
+    [INTEL_CMD_ITEM_DEPTH_STENCIL]       = { writer_decode_depth_stencil },
+    [INTEL_CMD_ITEM_BLEND]               = { writer_decode_blend },
+    [INTEL_CMD_ITEM_SAMPLER]             = { writer_decode_sampler },
+    [INTEL_CMD_ITEM_SURFACE]             = { writer_decode_surface },
+    [INTEL_CMD_ITEM_BINDING_TABLE]       = { writer_decode_binding_table },
+    [INTEL_CMD_ITEM_KERNEL]              = { writer_decode_kernel },
+};
+
+static void cmd_writer_decode_items(struct intel_cmd *cmd,
+                                    enum intel_cmd_writer_type which)
+{
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+    int i;
+
+    if (!writer->item_used)
+        return;
+
+    writer->ptr = intel_bo_map(writer->bo, false);
+    if (!writer->ptr)
+        return;
+
+    for (i = 0; i < writer->item_used; i++) {
+        const struct intel_cmd_item *item = &writer->items[i];
+
+        writer_decode_table[item->type].func(cmd, which, item);
+    }
+
+    intel_bo_unmap(writer->bo);
+    writer->ptr = NULL;
+}
+
+static void cmd_writer_decode(struct intel_cmd *cmd,
+                              enum intel_cmd_writer_type which,
+                              bool decode_inst_writer)
+{
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+
+    assert(writer->bo && !writer->ptr);
+
+    switch (which) {
+    case INTEL_CMD_WRITER_BATCH:
+        fprintf(stderr, "decoding batch buffer: %zu bytes\n", writer->used);
+        if (writer->used) {
+            intel_winsys_decode_bo(cmd->dev->winsys,
+                    writer->bo, writer->used);
+        }
+        break;
+    case INTEL_CMD_WRITER_SURFACE:
+        fprintf(stderr, "decoding surface state buffer: %d states\n",
+                writer->item_used);
+        cmd_writer_decode_items(cmd, which);
+        break;
+    case INTEL_CMD_WRITER_STATE:
+        fprintf(stderr, "decoding dynamic state buffer: %d states\n",
+                writer->item_used);
+        cmd_writer_decode_items(cmd, which);
+        break;
+    case INTEL_CMD_WRITER_INSTRUCTION:
+        if (decode_inst_writer) {
+            fprintf(stderr, "decoding instruction buffer: %d kernels\n",
+                    writer->item_used);
+
+            cmd_writer_decode_items(cmd, which);
+        } else {
+            fprintf(stderr, "skipping instruction buffer: %d kernels\n",
+                    writer->item_used);
+        }
+        break;
+    default:
+        break;
+    }
+}
+
+/**
+ * Decode according to the recorded items.  This can be called only after a
+ * successful intel_cmd_end().
+ */
+void intel_cmd_decode(struct intel_cmd *cmd, bool decode_inst_writer)
+{
+    int i;
+
+    assert(cmd->result == VK_SUCCESS);
+
+    for (i = 0; i < INTEL_CMD_WRITER_COUNT; i++)
+        cmd_writer_decode(cmd, i, decode_inst_writer);
+}
diff --git a/icd/intel/cmd_meta.c b/icd/intel/cmd_meta.c
new file mode 100644
index 0000000..61044f9
--- /dev/null
+++ b/icd/intel/cmd_meta.c
@@ -0,0 +1,1120 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Chris Forbes <chrisf@ijw.co.nz>
+ *
+ */
+
+#include "buf.h"
+#include "img.h"
+#include "mem.h"
+#include "state.h"
+#include "cmd_priv.h"
+#include "fb.h"
+
+static VkResult cmd_meta_create_buf_view(struct intel_cmd *cmd,
+                                           VkBuffer buf,
+                                           VkDeviceSize range,
+                                           VkFormat format,
+                                           struct intel_buf_view **view)
+{
+    VkBufferViewCreateInfo info;
+    VkDeviceSize stride;
+
+    memset(&info, 0, sizeof(info));
+    info.sType = VK_STRUCTURE_TYPE_BUFFER_VIEW_CREATE_INFO;
+    info.buffer = buf;
+    info.format = format;
+    info.range = range;
+
+    /*
+     * We do not rely on the hardware to avoid out-of-bound access.  But we do
+     * not want the hardware to ignore the last element either.
+     */
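+    /* E.g. (illustrative) a range of 10 bytes with a 4-byte element stride
+     * is padded to 12 so that the third element is fully addressable.
+     */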
+    stride = icd_format_get_size(format);
+    if (info.range % stride)
+        info.range += stride - (info.range % stride);
+
+    return intel_buf_view_create(cmd->dev, &info, view);
+}
+
+static void cmd_meta_set_src_for_buf(struct intel_cmd *cmd,
+                                     const struct intel_buf *buf,
+                                     VkFormat format,
+                                     struct intel_cmd_meta *meta)
+{
+    struct intel_buf_view *view;
+    VkResult res;
+    VkBuffer localbuf = (VkBuffer) buf;
+
+    res = cmd_meta_create_buf_view(cmd, localbuf,
+            buf->size, format, &view);
+    if (res != VK_SUCCESS) {
+        cmd_fail(cmd, res);
+        return;
+    }
+
+    meta->src.valid = true;
+
+    memcpy(meta->src.surface, view->cmd,
+            sizeof(view->cmd[0]) * view->cmd_len);
+    meta->src.surface_len = view->cmd_len;
+
+    intel_buf_view_destroy(view);
+
+    meta->src.reloc_target = (intptr_t) buf->obj.mem->bo;
+    meta->src.reloc_offset = 0;
+    meta->src.reloc_flags = 0;
+}
+
+static void cmd_meta_set_dst_for_buf(struct intel_cmd *cmd,
+                                     const struct intel_buf *buf,
+                                     VkFormat format,
+                                     struct intel_cmd_meta *meta)
+{
+    struct intel_buf_view *view;
+    VkResult res;
+    VkBuffer localbuf = (VkBuffer) buf;
+
+    res = cmd_meta_create_buf_view(cmd, localbuf,
+            buf->size, format, &view);
+    if (res != VK_SUCCESS) {
+        cmd_fail(cmd, res);
+        return;
+    }
+
+    meta->dst.valid = true;
+
+    memcpy(meta->dst.surface, view->cmd,
+            sizeof(view->cmd[0]) * view->cmd_len);
+    meta->dst.surface_len = view->cmd_len;
+
+    intel_buf_view_destroy(view);
+
+    meta->dst.reloc_target = (intptr_t) buf->obj.mem->bo;
+    meta->dst.reloc_offset = 0;
+    meta->dst.reloc_flags = INTEL_RELOC_WRITE;
+}
+
+static void cmd_meta_set_src_for_img(struct intel_cmd *cmd,
+                                     const struct intel_img *img,
+                                     VkFormat format,
+                                     VkImageAspectFlagBits aspect,
+                                     struct intel_cmd_meta *meta)
+{
+    VkImageViewCreateInfo info;
+    struct intel_img_view tmp_view;
+    struct intel_img_view *view = &tmp_view;
+
+    memset(&tmp_view, 0, sizeof(tmp_view));
+
+    memset(&info, 0, sizeof(info));
+    info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
+    info.image = (VkImage) img;
+
+    if (img->array_size == 1) {
+        switch (img->type) {
+        case VK_IMAGE_TYPE_1D:
+            info.viewType = VK_IMAGE_VIEW_TYPE_1D;
+            break;
+        case VK_IMAGE_TYPE_2D:
+            info.viewType = VK_IMAGE_VIEW_TYPE_2D;
+            break;
+        case VK_IMAGE_TYPE_3D:
+            info.viewType = VK_IMAGE_VIEW_TYPE_3D;
+            break;
+        default:
+            break;
+        }
+    } else {
+        switch (img->type) {
+        case VK_IMAGE_TYPE_1D:
+            info.viewType = VK_IMAGE_VIEW_TYPE_1D_ARRAY;
+            break;
+        case VK_IMAGE_TYPE_2D:
+            info.viewType = VK_IMAGE_VIEW_TYPE_2D_ARRAY;
+            break;
+        case VK_IMAGE_TYPE_3D:
+            info.viewType = VK_IMAGE_VIEW_TYPE_3D;
+            break;
+        default:
+            break;
+        }
+    }
+
+    info.format = format;
+    info.components.r = VK_COMPONENT_SWIZZLE_R;
+    info.components.g = VK_COMPONENT_SWIZZLE_G;
+    info.components.b = VK_COMPONENT_SWIZZLE_B;
+    info.components.a = VK_COMPONENT_SWIZZLE_A;
+    info.subresourceRange.aspectMask = aspect;
+    info.subresourceRange.baseMipLevel = 0;
+    info.subresourceRange.levelCount = VK_REMAINING_MIP_LEVELS;
+    info.subresourceRange.baseArrayLayer = 0;
+    info.subresourceRange.layerCount = VK_REMAINING_ARRAY_LAYERS;
+
+    intel_img_view_init(cmd->dev, &info, view);
+
+    meta->src.valid = true;
+
+    memcpy(meta->src.surface, view->cmd,
+            sizeof(view->cmd[0]) * view->cmd_len);
+    meta->src.surface_len = view->cmd_len;
+
+    meta->src.reloc_target = (intptr_t) img->obj.mem->bo;
+    meta->src.reloc_offset = 0;
+    meta->src.reloc_flags = 0;
+
+    /* view->cmd has been copied out; tmp_view is on the stack and needs no teardown */
+}
+
+static void cmd_meta_adjust_compressed_dst(struct intel_cmd *cmd,
+                                           const struct intel_img *img,
+                                           struct intel_cmd_meta *meta)
+{
+    int32_t w, h, layer;
+    unsigned x_offset, y_offset;
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        w = GEN_EXTRACT(meta->dst.surface[2], GEN7_SURFACE_DW2_WIDTH);
+        h = GEN_EXTRACT(meta->dst.surface[2], GEN7_SURFACE_DW2_HEIGHT);
+        layer = GEN_EXTRACT(meta->dst.surface[4],
+                GEN7_SURFACE_DW4_MIN_ARRAY_ELEMENT);
+    } else {
+        w = GEN_EXTRACT(meta->dst.surface[2], GEN6_SURFACE_DW2_WIDTH);
+        h = GEN_EXTRACT(meta->dst.surface[2], GEN6_SURFACE_DW2_HEIGHT);
+        layer = GEN_EXTRACT(meta->dst.surface[4],
+                GEN6_SURFACE_DW4_MIN_ARRAY_ELEMENT);
+    }
+
+    /* note that the width/height fields have the real values minus 1 */
+    w = (w + img->layout.block_width) / img->layout.block_width - 1;
+    h = (h + img->layout.block_height) / img->layout.block_height - 1;
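+    /*
+     * For example, a stored width of 126 (real width 127 texels) with
+     * 4-texel blocks becomes (126 + 4) / 4 - 1 = 31, i.e. 32 blocks minus
+     * one.
+     */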
+
+    /* adjust width and height */
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        meta->dst.surface[2] &= ~(GEN7_SURFACE_DW2_WIDTH__MASK |
+                                  GEN7_SURFACE_DW2_HEIGHT__MASK);
+        meta->dst.surface[2] |= GEN_SHIFT32(w, GEN7_SURFACE_DW2_WIDTH) |
+                                GEN_SHIFT32(h, GEN7_SURFACE_DW2_HEIGHT);
+    } else {
+        meta->dst.surface[2] &= ~(GEN6_SURFACE_DW2_WIDTH__MASK |
+                                  GEN6_SURFACE_DW2_HEIGHT__MASK);
+        meta->dst.surface[2] |= GEN_SHIFT32(w, GEN6_SURFACE_DW2_WIDTH) |
+                                GEN_SHIFT32(h, GEN6_SURFACE_DW2_HEIGHT);
+    }
+
+    if (!layer)
+        return;
+
+    meta->dst.reloc_offset = intel_layout_get_slice_tile_offset(&img->layout,
+            0, layer, &x_offset, &y_offset);
+
+    /*
+     * The lower 2 bits (or 1 bit for Y) are missing.  This may be a problem
+     * for small images (16x16 or smaller).  We will need to adjust the
+     * drawing rectangle instead.
+     */
+    x_offset = (x_offset / img->layout.block_width) >> 2;
+    y_offset = (y_offset / img->layout.block_height) >> 1;
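+    /*
+     * For example, a 64-texel x_offset on a 4-texel-wide block format is 16
+     * blocks; the field stores 16 >> 2 = 4, so only offsets that are a
+     * multiple of 4 blocks (2 blocks for Y) survive intact.
+     */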
+
+    /* adjust min array element and X/Y offsets */
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        meta->dst.surface[4] &= ~GEN7_SURFACE_DW4_MIN_ARRAY_ELEMENT__MASK;
+        meta->dst.surface[5] |= GEN_SHIFT32(x_offset, GEN7_SURFACE_DW5_X_OFFSET) |
+                                GEN_SHIFT32(y_offset, GEN7_SURFACE_DW5_Y_OFFSET);
+    } else {
+        meta->dst.surface[4] &= ~GEN6_SURFACE_DW4_MIN_ARRAY_ELEMENT__MASK;
+        meta->dst.surface[5] |= GEN_SHIFT32(x_offset, GEN6_SURFACE_DW5_X_OFFSET) |
+                                GEN_SHIFT32(y_offset, GEN6_SURFACE_DW5_Y_OFFSET);
+    }
+}
+
+static void cmd_meta_set_dst_for_img(struct intel_cmd *cmd,
+                                     const struct intel_img *img,
+                                     VkFormat format,
+                                     uint32_t lod, uint32_t layer,
+                                     struct intel_cmd_meta *meta)
+{
+    struct intel_att_view tmp_view;
+    struct intel_att_view *view = &tmp_view;
+    VkImageViewCreateInfo info;
+
+    memset(&info, 0, sizeof(info));
+    info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
+    info.image = (VkImage) img;
+    info.format = format;
+    info.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+    info.subresourceRange.baseMipLevel = lod;
+    info.subresourceRange.levelCount = 1;
+    info.subresourceRange.baseArrayLayer = layer;
+    info.subresourceRange.layerCount = 1;
+
+    intel_att_view_init(cmd->dev, &info, view);
+
+    meta->dst.valid = true;
+
+    memcpy(meta->dst.surface, view->att_cmd,
+            sizeof(view->att_cmd[0]) * view->cmd_len);
+    meta->dst.surface_len = view->cmd_len;
+
+    meta->dst.reloc_target = (intptr_t) img->obj.mem->bo;
+    meta->dst.reloc_offset = 0;
+    meta->dst.reloc_flags = INTEL_RELOC_WRITE;
+
+    if (icd_format_is_compressed(img->layout.format))
+        cmd_meta_adjust_compressed_dst(cmd, img, meta);
+}
+
+static void cmd_meta_set_src_for_writer(struct intel_cmd *cmd,
+                                        enum intel_cmd_writer_type writer,
+                                        VkDeviceSize size,
+                                        VkFormat format,
+                                        struct intel_cmd_meta *meta)
+{
+    struct intel_buf_view *view;
+    VkResult res;
+    VkBuffer localbuf = VK_NULL_HANDLE;
+
+    res = cmd_meta_create_buf_view(cmd, localbuf,
+            size, format, &view);
+    if (res != VK_SUCCESS) {
+        cmd_fail(cmd, res);
+        return;
+    }
+
+    meta->src.valid = true;
+
+    memcpy(meta->src.surface, view->cmd,
+            sizeof(view->cmd[0]) * view->cmd_len);
+    meta->src.surface_len = view->cmd_len;
+
+    intel_buf_view_destroy(view);
+
+    meta->src.reloc_target = (intptr_t) writer;
+    meta->src.reloc_offset = 0;
+    meta->src.reloc_flags = INTEL_CMD_RELOC_TARGET_IS_WRITER;
+}
+
+static void cmd_meta_set_ds_view(struct intel_cmd *cmd,
+                                 const struct intel_img *img,
+                                 uint32_t lod, uint32_t layer,
+                                 struct intel_cmd_meta *meta)
+{
+    VkImageViewCreateInfo info;
+
+    memset(&info, 0, sizeof(info));
+    info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
+    info.image = (VkImage) img;
+    info.subresourceRange.baseMipLevel = lod;
+    info.subresourceRange.levelCount = 1;
+    info.subresourceRange.baseArrayLayer = layer;
+    info.subresourceRange.layerCount = 1;
+
+    intel_att_view_init(cmd->dev, &info, &meta->ds.view);
+}
+
+static void cmd_meta_set_ds_state(struct intel_cmd *cmd,
+                                  VkImageAspectFlagBits aspect,
+                                  uint32_t stencil_ref,
+                                  struct intel_cmd_meta *meta)
+{
+    meta->ds.stencil_ref = stencil_ref;
+    meta->ds.aspect = aspect;
+}
+
+static enum intel_dev_meta_shader get_shader_id(const struct intel_dev *dev,
+                                                const struct intel_img *img,
+                                                bool copy_array)
+{
+    enum intel_dev_meta_shader shader_id;
+
+    switch (img->type) {
+    case VK_IMAGE_TYPE_1D:
+        shader_id = (copy_array) ?
+            INTEL_DEV_META_FS_COPY_1D_ARRAY : INTEL_DEV_META_FS_COPY_1D;
+        break;
+    case VK_IMAGE_TYPE_2D:
+        shader_id = (img->sample_count > 1) ? INTEL_DEV_META_FS_COPY_2D_MS :
+                    (copy_array) ? INTEL_DEV_META_FS_COPY_2D_ARRAY :
+                    INTEL_DEV_META_FS_COPY_2D;
+        break;
+    case VK_IMAGE_TYPE_3D:
+    default:
+        shader_id = INTEL_DEV_META_FS_COPY_2D_ARRAY;
+        break;
+    }
+
+    return shader_id;
+}
+
+static bool cmd_meta_mem_dword_aligned(const struct intel_cmd *cmd,
+                                       VkDeviceSize src_offset,
+                                       VkDeviceSize dst_offset,
+                                       VkDeviceSize size)
+{
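+    /*
+     * Worked example: src 0x10, dst 0x20, and size 0x40 OR together to
+     * 0x70, whose low two bits are clear, so the copy is dword-aligned;
+     * any odd byte offset or size sets a low bit and fails the test.
+     */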
+    return !((src_offset | dst_offset | size) & 0x3);
+}
+
+static VkFormat cmd_meta_img_raw_format(const struct intel_cmd *cmd,
+                                          VkFormat format)
+{
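+    /*
+     * The mapping below is purely size-based: e.g. a 4-byte
+     * VK_FORMAT_B8G8R8A8_UNORM source is treated as VK_FORMAT_R32_UINT, so
+     * texels are copied bit-for-bit with no channel reordering or filtering.
+     */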
+    switch (icd_format_get_size(format)) {
+    case 1:
+        format = VK_FORMAT_R8_UINT;
+        break;
+    case 2:
+        format = VK_FORMAT_R16_UINT;
+        break;
+    case 4:
+        format = VK_FORMAT_R32_UINT;
+        break;
+    case 8:
+        format = VK_FORMAT_R32G32_UINT;
+        break;
+    case 16:
+        format = VK_FORMAT_R32G32B32A32_UINT;
+        break;
+    default:
+        assert(!"unsupported image format for raw blit op");
+        format = VK_FORMAT_UNDEFINED;
+        break;
+    }
+
+    return format;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdCopyBuffer(
+    VkCommandBuffer                 commandBuffer,
+    VkBuffer                    srcBuffer,
+    VkBuffer                    dstBuffer,
+    uint32_t                    regionCount,
+    const VkBufferCopy*         pRegions)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_buf *src = intel_buf(srcBuffer);
+    struct intel_buf *dst = intel_buf(dstBuffer);
+    struct intel_cmd_meta meta;
+    VkFormat format;
+    uint32_t i;
+
+    memset(&meta, 0, sizeof(meta));
+    meta.mode = INTEL_CMD_META_VS_POINTS;
+
+    meta.height = 1;
+    meta.sample_count = 1;
+
+    format = VK_FORMAT_UNDEFINED;
+
+    for (i = 0; i < regionCount; i++) {
+        const VkBufferCopy *region = &pRegions[i];
+        VkFormat fmt;
+
+        meta.src.x = region->srcOffset;
+        meta.dst.x = region->dstOffset;
+        meta.width = region->size;
+
+        if (cmd_meta_mem_dword_aligned(cmd, region->srcOffset,
+                    region->dstOffset, region->size)) {
+            meta.shader_id = INTEL_DEV_META_VS_COPY_MEM;
+            meta.src.x /= 4;
+            meta.dst.x /= 4;
+            meta.width /= 4;
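+            /* e.g. a 256-byte region at src offset 16 becomes meta.src.x = 4
+             * and meta.width = 64, both in dword units */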
+
+            /*
+             * INTEL_DEV_META_VS_COPY_MEM is untyped but expects the stride to
+             * be 16
+             */
+            fmt = VK_FORMAT_R32G32B32A32_UINT;
+        } else {
+            if (cmd_gen(cmd) == INTEL_GEN(6)) {
+                intel_dev_log(cmd->dev, VK_DEBUG_REPORT_ERROR_BIT_EXT,
+                        &cmd->obj.base, 0, 0,
+                        "unaligned vkCmdCopyBuffer unsupported");
+                cmd_fail(cmd, VK_ERROR_VALIDATION_FAILED_EXT);
+                continue;
+            }
+
+            meta.shader_id = INTEL_DEV_META_VS_COPY_MEM_UNALIGNED;
+
+            /*
+             * INTEL_DEV_META_VS_COPY_MEM_UNALIGNED is untyped but expects the
+             * stride to be 4
+             */
+            fmt = VK_FORMAT_R8G8B8A8_UINT;
+        }
+
+        if (format != fmt) {
+            format = fmt;
+
+            cmd_meta_set_src_for_buf(cmd, src, format, &meta);
+            cmd_meta_set_dst_for_buf(cmd, dst, format, &meta);
+        }
+
+        cmd_draw_meta(cmd, &meta);
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdCopyImage(
+    VkCommandBuffer                              commandBuffer,
+    VkImage                                   srcImage,
+    VkImageLayout                            srcImageLayout,
+    VkImage                                   dstImage,
+    VkImageLayout                            dstImageLayout,
+    uint32_t                                    regionCount,
+    const VkImageCopy*                       pRegions)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_img *src = intel_img(srcImage);
+    struct intel_img *dst = intel_img(dstImage);
+    struct intel_cmd_meta meta;
+    VkFormat raw_format;
+    bool raw_copy = false;
+    uint32_t i;
+
+    if (src->layout.format == dst->layout.format) {
+        raw_copy = true;
+        raw_format = cmd_meta_img_raw_format(cmd, src->layout.format);
+    } else {
+        assert(!(icd_format_is_compressed(src->layout.format) ||
+               icd_format_is_compressed(dst->layout.format)) && "Compressed formats not supported");
+    }
+
+    memset(&meta, 0, sizeof(meta));
+    meta.mode = INTEL_CMD_META_FS_RECT;
+
+    cmd_meta_set_src_for_img(cmd, src,
+            (raw_copy) ? raw_format : src->layout.format,
+            VK_IMAGE_ASPECT_COLOR_BIT, &meta);
+
+    meta.sample_count = dst->sample_count;
+
+    for (i = 0; i < regionCount; i++) {
+        const VkImageCopy *region = &pRegions[i];
+        uint32_t j;
+
+        meta.shader_id = get_shader_id(cmd->dev, src,
+                (region->extent.depth > 1));
+
+        meta.src.lod = region->srcSubresource.mipLevel;
+        meta.src.layer = region->srcSubresource.baseArrayLayer +
+            region->srcOffset.z;
+        meta.src.x = region->srcOffset.x;
+        meta.src.y = region->srcOffset.y;
+
+        meta.dst.lod = region->dstSubresource.mipLevel;
+        meta.dst.layer = region->dstSubresource.baseArrayLayer +
+                region->dstOffset.z;
+        meta.dst.x = region->dstOffset.x;
+        meta.dst.y = region->dstOffset.y;
+
+        meta.width = region->extent.width;
+        meta.height = region->extent.height;
+
+        if (raw_copy) {
+            const uint32_t block_width =
+                icd_format_get_block_width(raw_format);
+
+            meta.src.x /= block_width;
+            meta.src.y /= block_width;
+            meta.dst.x /= block_width;
+            meta.dst.y /= block_width;
+            meta.width /= block_width;
+            meta.height /= block_width;
+        }
+
+        for (j = 0; j < region->extent.depth; j++) {
+            cmd_meta_set_dst_for_img(cmd, dst,
+                    (raw_copy) ? raw_format : dst->layout.format,
+                    meta.dst.lod, meta.dst.layer, &meta);
+
+            cmd_draw_meta(cmd, &meta);
+
+            meta.src.layer++;
+            meta.dst.layer++;
+        }
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBlitImage(
+    VkCommandBuffer                              commandBuffer,
+    VkImage                                  srcImage,
+    VkImageLayout                            srcImageLayout,
+    VkImage                                  dstImage,
+    VkImageLayout                            dstImageLayout,
+    uint32_t                                 regionCount,
+    const VkImageBlit*                       pRegions,
+    VkFilter                                 filter)
+{
+    /*
+     * TODO: Implement actual blit function.
+     */
+    assert(0 && "vkCmdBlitImage not implemented");
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdCopyBufferToImage(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  srcBuffer,
+    VkImage                                   dstImage,
+    VkImageLayout                            dstImageLayout,
+    uint32_t                                    regionCount,
+    const VkBufferImageCopy*                pRegions)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_buf *buf = intel_buf(srcBuffer);
+    struct intel_img *img = intel_img(dstImage);
+    struct intel_cmd_meta meta;
+    VkFormat format;
+    uint32_t block_width, i;
+
+    memset(&meta, 0, sizeof(meta));
+    meta.mode = INTEL_CMD_META_FS_RECT;
+
+    meta.shader_id = INTEL_DEV_META_FS_COPY_MEM_TO_IMG;
+    meta.sample_count = img->sample_count;
+
+    format = cmd_meta_img_raw_format(cmd, img->layout.format);
+    block_width = icd_format_get_block_width(img->layout.format);
+    cmd_meta_set_src_for_buf(cmd, buf, format, &meta);
+
+    for (i = 0; i < regionCount; i++) {
+        const VkBufferImageCopy *region = &pRegions[i];
+        uint32_t j;
+
+        meta.src.x = region->bufferOffset / icd_format_get_size(format);
+
+        meta.dst.lod = region->imageSubresource.mipLevel;
+        meta.dst.layer = region->imageSubresource.baseArrayLayer +
+            region->imageOffset.z;
+        meta.dst.x = region->imageOffset.x / block_width;
+        meta.dst.y = region->imageOffset.y / block_width;
+
+        meta.width = region->imageExtent.width / block_width;
+        meta.height = region->imageExtent.height / block_width;
+
+        for (j = 0; j < region->imageExtent.depth; j++) {
+            cmd_meta_set_dst_for_img(cmd, img, format,
+                    meta.dst.lod, meta.dst.layer, &meta);
+
+            cmd_draw_meta(cmd, &meta);
+
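+            /* advance the buffer read position by one full slice of texels */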
+            meta.src.x += meta.width * meta.height;
+            meta.dst.layer++;
+        }
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdCopyImageToBuffer(
+    VkCommandBuffer                              commandBuffer,
+    VkImage                                   srcImage,
+    VkImageLayout                            srcImageLayout,
+    VkBuffer                                  dstBuffer,
+    uint32_t                                    regionCount,
+    const VkBufferImageCopy*                pRegions)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_img *img = intel_img(srcImage);
+    struct intel_buf *buf = intel_buf(dstBuffer);
+    struct intel_cmd_meta meta;
+    VkFormat img_format, buf_format;
+    uint32_t block_width, i;
+
+    memset(&meta, 0, sizeof(meta));
+    meta.mode = INTEL_CMD_META_VS_POINTS;
+
+    img_format = cmd_meta_img_raw_format(cmd, img->layout.format);
+    block_width = icd_format_get_block_width(img_format);
+
+    /* buf_format is ignored by hw, but we derive stride from it */
+    switch (img_format) {
+    case VK_FORMAT_R8_UINT:
+        meta.shader_id = INTEL_DEV_META_VS_COPY_R8_TO_MEM;
+        buf_format = VK_FORMAT_R8G8B8A8_UINT;
+        break;
+    case VK_FORMAT_R16_UINT:
+        meta.shader_id = INTEL_DEV_META_VS_COPY_R16_TO_MEM;
+        buf_format = VK_FORMAT_R8G8B8A8_UINT;
+        break;
+    case VK_FORMAT_R32_UINT:
+        meta.shader_id = INTEL_DEV_META_VS_COPY_R32_TO_MEM;
+        buf_format = VK_FORMAT_R32G32B32A32_UINT;
+        break;
+    case VK_FORMAT_R32G32_UINT:
+        meta.shader_id = INTEL_DEV_META_VS_COPY_R32G32_TO_MEM;
+        buf_format = VK_FORMAT_R32G32B32A32_UINT;
+        break;
+    case VK_FORMAT_R32G32B32A32_UINT:
+        meta.shader_id = INTEL_DEV_META_VS_COPY_R32G32B32A32_TO_MEM;
+        buf_format = VK_FORMAT_R32G32B32A32_UINT;
+        break;
+    default:
+        img_format = VK_FORMAT_UNDEFINED;
+        buf_format = VK_FORMAT_UNDEFINED;
+        break;
+    }
+
+    if (img_format == VK_FORMAT_UNDEFINED ||
+        (cmd_gen(cmd) == INTEL_GEN(6) &&
+         icd_format_get_size(img_format) < 4)) {
+        intel_dev_log(cmd->dev, VK_DEBUG_REPORT_ERROR_BIT_EXT,
+                      &cmd->obj.base, 0, 0,
+                      "vkCmdCopyImageToBuffer with bpp %d unsupported",
+                      icd_format_get_size(img->layout.format));
+        return;
+    }
+
+    cmd_meta_set_src_for_img(cmd, img, img_format,
+            VK_IMAGE_ASPECT_COLOR_BIT, &meta);
+    cmd_meta_set_dst_for_buf(cmd, buf, buf_format, &meta);
+
+    meta.sample_count = 1;
+
+    for (i = 0; i < regionCount; i++) {
+        const VkBufferImageCopy *region = &pRegions[i];
+        uint32_t j;
+
+        meta.src.lod = region->imageSubresource.mipLevel;
+        meta.src.layer = region->imageSubresource.baseArrayLayer +
+            region->imageOffset.z;
+        meta.src.x = region->imageOffset.x / block_width;
+        meta.src.y = region->imageOffset.y / block_width;
+
+        meta.dst.x = region->bufferOffset / icd_format_get_size(img_format);
+        meta.width = region->imageExtent.width / block_width;
+        meta.height = region->imageExtent.height / block_width;
+
+        for (j = 0; j < region->imageExtent.depth; j++) {
+            cmd_draw_meta(cmd, &meta);
+
+            meta.src.layer++;
+            meta.dst.x += meta.width * meta.height;
+        }
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdUpdateBuffer(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  dstBuffer,
+    VkDeviceSize                                dstOffset,
+    VkDeviceSize                                dataSize,
+    const void*                                 pData)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_buf *dst = intel_buf(dstBuffer);
+    struct intel_cmd_meta meta;
+    VkFormat format;
+    uint32_t *ptr;
+    uint32_t offset;
+
+    /* write to dynamic state writer first */
+    offset = cmd_state_pointer(cmd, INTEL_CMD_ITEM_BLOB, 32,
+            (dataSize + 3) / 4, &ptr);
+    memcpy(ptr, pData, dataSize);
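+    /*
+     * Per the Vulkan spec, dstOffset and dataSize must be multiples of 4,
+     * so the dword arithmetic below never truncates.
+     */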
+
+    memset(&meta, 0, sizeof(meta));
+    meta.mode = INTEL_CMD_META_VS_POINTS;
+
+    meta.shader_id = INTEL_DEV_META_VS_COPY_MEM;
+
+    meta.src.x = offset / 4;
+    meta.dst.x = dstOffset / 4;
+    meta.width = dataSize / 4;
+    meta.height = 1;
+    meta.sample_count = 1;
+
+    /*
+     * INTEL_DEV_META_VS_COPY_MEM is untyped but expects the stride to be 16
+     */
+    format = VK_FORMAT_R32G32B32A32_UINT;
+
+    cmd_meta_set_src_for_writer(cmd, INTEL_CMD_WRITER_STATE,
+            offset + dataSize, format, &meta);
+    cmd_meta_set_dst_for_buf(cmd, dst, format, &meta);
+
+    cmd_draw_meta(cmd, &meta);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdFillBuffer(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  dstBuffer,
+    VkDeviceSize                                dstOffset,
+    VkDeviceSize                                size,
+    uint32_t                                    data)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_buf *dst = intel_buf(dstBuffer);
+    struct intel_cmd_meta meta;
+    VkFormat format;
+
+    memset(&meta, 0, sizeof(meta));
+    meta.mode = INTEL_CMD_META_VS_POINTS;
+
+    meta.shader_id = INTEL_DEV_META_VS_FILL_MEM;
+
+    meta.clear_val[0] = data;
+
+    meta.dst.x = dstOffset / 4;
+    meta.width = size / 4;
+    meta.height = 1;
+    meta.sample_count = 1;
+
+    /*
+     * INTEL_DEV_META_VS_FILL_MEM is untyped but expects the stride to be 16
+     */
+    format = VK_FORMAT_R32G32B32A32_UINT;
+
+    cmd_meta_set_dst_for_buf(cmd, dst, format, &meta);
+
+    cmd_draw_meta(cmd, &meta);
+}
+
+static void cmd_meta_clear_image(struct intel_cmd *cmd,
+                                 struct intel_img *img,
+                                 VkFormat format,
+                                 struct intel_cmd_meta *meta,
+                                 const VkImageSubresourceRange *range)
+{
+    uint32_t mip_levels, array_size;
+    uint32_t i, j;
+
+    if (range->baseMipLevel >= img->mip_levels ||
+        range->baseArrayLayer >= img->array_size)
+        return;
+
+    mip_levels = img->mip_levels - range->baseMipLevel;
+    if (mip_levels > range->levelCount)
+        mip_levels = range->levelCount;
+
+    array_size = img->array_size - range->baseArrayLayer;
+    if (array_size > range->layerCount)
+        array_size = range->layerCount;
+
+    for (i = 0; i < mip_levels; i++) {
+        meta->dst.lod = range->baseMipLevel + i;
+        meta->dst.layer = range->baseArrayLayer;
+
+        /* TODO INTEL_CMD_META_DS_HIZ_CLEAR requires 8x4 aligned rectangle */
+        meta->width = u_minify(img->layout.width0, meta->dst.lod);
+        meta->height = u_minify(img->layout.height0, meta->dst.lod);
+
+        if (meta->ds.op != INTEL_CMD_META_DS_NOP &&
+            !intel_img_can_enable_hiz(img, meta->dst.lod))
+            continue;
+
+        for (j = 0; j < array_size; j++) {
+            if (range->aspectMask == VK_IMAGE_ASPECT_COLOR_BIT) {
+                cmd_meta_set_dst_for_img(cmd, img, format,
+                        meta->dst.lod, meta->dst.layer, meta);
+
+                cmd_draw_meta(cmd, meta);
+            } else {
+                cmd_meta_set_ds_view(cmd, img, meta->dst.lod,
+                        meta->dst.layer, meta);
+                if (range->aspectMask & VK_IMAGE_ASPECT_DEPTH_BIT) {
+                    cmd_meta_set_ds_state(cmd, VK_IMAGE_ASPECT_DEPTH_BIT,
+                            meta->clear_val[1], meta);
+
+                    cmd_draw_meta(cmd, meta);
+                }
+                if (range->aspectMask & VK_IMAGE_ASPECT_STENCIL_BIT) {
+                    cmd_meta_set_ds_state(cmd, VK_IMAGE_ASPECT_STENCIL_BIT,
+                            meta->clear_val[1], meta);
+
+                    cmd_draw_meta(cmd, meta);
+                }
+            }
+
+            meta->dst.layer++;
+        }
+    }
+}
+
+void cmd_meta_ds_op(struct intel_cmd *cmd,
+                    enum intel_cmd_meta_ds_op op,
+                    struct intel_img *img,
+                    const VkImageSubresourceRange *range)
+{
+    struct intel_cmd_meta meta;
+
+    if (img->layout.aux != INTEL_LAYOUT_AUX_HIZ)
+        return;
+    if (!(range->aspectMask & (VK_IMAGE_ASPECT_DEPTH_BIT | VK_IMAGE_ASPECT_STENCIL_BIT)))
+        return;
+
+    memset(&meta, 0, sizeof(meta));
+    meta.mode = INTEL_CMD_META_DEPTH_STENCIL_RECT;
+    meta.sample_count = img->sample_count;
+
+    meta.ds.op = op;
+    meta.ds.optimal = true;
+
+    cmd_meta_clear_image(cmd, img, img->layout.format, &meta, range);
+}
+
+void cmd_meta_clear_color_image(
+    VkCommandBuffer                         commandBuffer,
+    struct intel_img                   *img,
+    VkImageLayout                       imageLayout,
+    const VkClearColorValue            *pClearColor,
+    uint32_t                            rangeCount,
+    const VkImageSubresourceRange      *pRanges)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_cmd_meta meta;
+    VkFormat format;
+    uint32_t i;
+
+    memset(&meta, 0, sizeof(meta));
+    meta.mode = INTEL_CMD_META_FS_RECT;
+
+    meta.shader_id = INTEL_DEV_META_FS_CLEAR_COLOR;
+    meta.sample_count = img->sample_count;
+
+    meta.clear_val[0] = pClearColor->uint32[0];
+    meta.clear_val[1] = pClearColor->uint32[1];
+    meta.clear_val[2] = pClearColor->uint32[2];
+    meta.clear_val[3] = pClearColor->uint32[3];
+    format = img->layout.format;
+
+    for (i = 0; i < rangeCount; i++) {
+        cmd_meta_clear_image(cmd, img, format, &meta, &pRanges[i]);
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdClearColorImage(
+    VkCommandBuffer                         commandBuffer,
+    VkImage                             image,
+    VkImageLayout                       imageLayout,
+    const VkClearColorValue            *pClearColor,
+    uint32_t                            rangeCount,
+    const VkImageSubresourceRange      *pRanges)
+{
+    struct intel_img *img = intel_img(image);
+    cmd_meta_clear_color_image(commandBuffer, img, imageLayout, pClearColor, rangeCount, pRanges);
+}
+
+void cmd_meta_clear_depth_stencil_image(
+    VkCommandBuffer                              commandBuffer,
+    struct intel_img*                        img,
+    VkImageLayout                            imageLayout,
+    float                                       depth,
+    uint32_t                                    stencil,
+    uint32_t                                    rangeCount,
+    const VkImageSubresourceRange*          pRanges)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_cmd_meta meta;
+    uint32_t i;
+
+    memset(&meta, 0, sizeof(meta));
+    meta.mode = INTEL_CMD_META_DEPTH_STENCIL_RECT;
+
+    meta.shader_id = INTEL_DEV_META_FS_CLEAR_DEPTH;
+    meta.sample_count = img->sample_count;
+
+    meta.clear_val[0] = u_fui(depth);
+    meta.clear_val[1] = stencil;
+
+    if (imageLayout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL ||
+        imageLayout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL) {
+        meta.ds.optimal = true;
+    }
+
+    for (i = 0; i < rangeCount; i++) {
+        const VkImageSubresourceRange *range = &pRanges[i];
+
+        cmd_meta_clear_image(cmd, img, img->layout.format,
+                &meta, range);
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdClearDepthStencilImage(
+    VkCommandBuffer                                 commandBuffer,
+    VkImage                                     image,
+    VkImageLayout                               imageLayout,
+    const VkClearDepthStencilValue*             pDepthStencil,
+    uint32_t                                    rangeCount,
+    const VkImageSubresourceRange*              pRanges)
+{
+    struct intel_img *img = intel_img(image);
+    cmd_meta_clear_depth_stencil_image(commandBuffer, img, imageLayout, pDepthStencil->depth, pDepthStencil->stencil, rangeCount, pRanges);
+}
+
+static void cmd_clear_color_attachment(
+    VkCommandBuffer                             commandBuffer,
+    uint32_t                                colorAttachment,
+    VkImageLayout                           imageLayout,
+    const VkClearColorValue                *pColor,
+    uint32_t                                rectCount,
+    const VkClearRect                      *pRects)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    const struct intel_render_pass_subpass *subpass =
+        cmd->bind.render_pass_subpass;
+    const struct intel_fb *fb = cmd->bind.fb;
+    const struct intel_att_view *view =
+        fb->views[subpass->color_indices[colorAttachment]];
+
+    /* Convert each clear rect into a subresource clear.
+     * TODO: this currently only supports full-layer clears --
+     * cmd_meta_clear_color_image does not provide a way to
+     * specify the x/y bounds.
+     */
+    for (uint32_t i = 0; i < rectCount; i++) {
+        VkImageSubresourceRange range = {
+            VK_IMAGE_ASPECT_COLOR_BIT,
+            view->mipLevel,
+            1,
+            0,
+            1
+        };
+
+        cmd_meta_clear_color_image(commandBuffer, view->img,
+                                   imageLayout,
+                                   pColor,
+                                   1,
+                                   &range);
+    }
+}
+
+static void cmd_clear_depth_stencil_attachment(
+    VkCommandBuffer                             commandBuffer,
+    VkImageAspectFlags                      aspectMask,
+    VkImageLayout                           imageLayout,
+    const VkClearDepthStencilValue*         pDepthStencil,
+    uint32_t                                rectCount,
+    const VkClearRect                      *pRects)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    const struct intel_render_pass_subpass *subpass =
+        cmd->bind.render_pass_subpass;
+    const struct intel_fb *fb = cmd->bind.fb;
+    const struct intel_att_view *view = fb->views[subpass->ds_index];
+
+    /* Convert each clear rect into a subresource clear.
+     * TODO: this currently only supports full-layer clears --
+     * cmd_meta_clear_depth_stencil_image does not provide a way to
+     * specify the x/y bounds.
+     */
+    for (uint32_t i = 0; i < rectCount; i++) {
+        VkImageSubresourceRange range = {
+            aspectMask,
+            view->mipLevel,
+            1,
+            0,
+            1
+        };
+
+        cmd_meta_clear_depth_stencil_image(commandBuffer,
+                view->img, imageLayout,
+                pDepthStencil->depth, pDepthStencil->stencil, 1, &range);
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdClearAttachments(
+    VkCommandBuffer                                 commandBuffer,
+    uint32_t                                    attachmentCount,
+    const VkClearAttachment*                    pAttachments,
+    uint32_t                                    rectCount,
+    const VkClearRect*                          pRects)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    for (uint32_t i = 0; i < attachmentCount; i++) {
+        if (pAttachments[i].aspectMask == VK_IMAGE_ASPECT_COLOR_BIT) {
+            cmd_clear_color_attachment(
+                        commandBuffer,
+                        pAttachments[i].colorAttachment,
+                        cmd->bind.render_pass->attachments[i].final_layout,
+                        &pAttachments[i].clearValue.color,
+                        rectCount,
+                        pRects);
+        } else {
+            cmd_clear_depth_stencil_attachment(
+                        commandBuffer,
+                        pAttachments[i].aspectMask,
+                        cmd->bind.render_pass_subpass->ds_layout,
+                        &pAttachments[i].clearValue.depthStencil,
+                        rectCount,
+                        pRects);
+        }
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdResolveImage(
+    VkCommandBuffer                              commandBuffer,
+    VkImage                                   srcImage,
+    VkImageLayout                            srcImageLayout,
+    VkImage                                   dstImage,
+    VkImageLayout                            dstImageLayout,
+    uint32_t                                    regionCount,
+    const VkImageResolve*                    pRegions)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_img *src = intel_img(srcImage);
+    struct intel_img *dst = intel_img(dstImage);
+    struct intel_cmd_meta meta;
+    VkFormat format;
+    uint32_t i;
+
+    memset(&meta, 0, sizeof(meta));
+    meta.mode = INTEL_CMD_META_FS_RECT;
+
+    switch (src->sample_count) {
+    case 2:
+    default:
+        meta.shader_id = INTEL_DEV_META_FS_RESOLVE_2X;
+        break;
+    case 4:
+        meta.shader_id = INTEL_DEV_META_FS_RESOLVE_4X;
+        break;
+    case 8:
+        meta.shader_id = INTEL_DEV_META_FS_RESOLVE_8X;
+        break;
+    case 16:
+        meta.shader_id = INTEL_DEV_META_FS_RESOLVE_16X;
+        break;
+    }
+
+    meta.sample_count = 1;
+
+    format = cmd_meta_img_raw_format(cmd, src->layout.format);
+    cmd_meta_set_src_for_img(cmd, src, format, VK_IMAGE_ASPECT_COLOR_BIT, &meta);
+
+    for (i = 0; i < regionCount; i++) {
+        const VkImageResolve *region = &pRegions[i];
+        uint32_t arrayLayer;
+
+        for (arrayLayer = 0; arrayLayer < region->extent.depth; arrayLayer++) {
+            meta.src.lod = region->srcSubresource.mipLevel;
+            meta.src.layer = region->srcSubresource.baseArrayLayer + arrayLayer;
+            meta.src.x = region->srcOffset.x;
+            meta.src.y = region->srcOffset.y;
+
+            meta.dst.lod = region->dstSubresource.mipLevel;
+            meta.dst.layer = region->dstSubresource.baseArrayLayer + arrayLayer;
+            meta.dst.x = region->dstOffset.x;
+            meta.dst.y = region->dstOffset.y;
+
+            meta.width = region->extent.width;
+            meta.height = region->extent.height;
+
+            cmd_meta_set_dst_for_img(cmd, dst, format,
+                    meta.dst.lod, meta.dst.layer, &meta);
+
+            cmd_draw_meta(cmd, &meta);
+        }
+    }
+}
diff --git a/icd/intel/cmd_mi.c b/icd/intel/cmd_mi.c
new file mode 100644
index 0000000..a2a1ad4
--- /dev/null
+++ b/icd/intel/cmd_mi.c
@@ -0,0 +1,235 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Mike Stroyan <mike@LunarG.com>
+ *
+ */
+
+#include "kmd/winsys.h"
+#include "buf.h"
+#include "event.h"
+#include "mem.h"
+#include "obj.h"
+#include "query.h"
+#include "cmd_priv.h"
+
+static void gen6_MI_STORE_REGISTER_MEM(struct intel_cmd *cmd,
+                                       struct intel_bo *bo,
+                                       uint32_t offset,
+                                       uint32_t reg)
+{
+    const uint8_t cmd_len = 3;
+    uint32_t dw0 = GEN6_MI_CMD(MI_STORE_REGISTER_MEM) |
+                   (cmd_len - 2);
+    uint32_t reloc_flags = INTEL_RELOC_WRITE;
+    uint32_t *dw;
+    uint32_t pos;
+
+    if (cmd_gen(cmd) == INTEL_GEN(6)) {
+        dw0 |= GEN6_MI_STORE_REGISTER_MEM_DW0_USE_GGTT;
+        reloc_flags |= INTEL_RELOC_GGTT;
+    }
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = reg;
+
+    cmd_reserve_reloc(cmd, 1);
+    cmd_batch_reloc(cmd, pos + 2, bo, offset, reloc_flags);
+}
+
+static void gen6_MI_STORE_DATA_IMM(struct intel_cmd *cmd,
+                                   struct intel_bo *bo,
+                                   uint32_t offset,
+                                   uint64_t val)
+{
+    const uint8_t cmd_len = 5;
+    uint32_t dw0 = GEN6_MI_CMD(MI_STORE_DATA_IMM) |
+                   (cmd_len - 2);
+    uint32_t reloc_flags = INTEL_RELOC_WRITE;
+    uint32_t *dw;
+    uint32_t pos;
+
+    if (cmd_gen(cmd) == INTEL_GEN(6)) {
+        dw0 |= GEN6_MI_STORE_DATA_IMM_DW0_USE_GGTT;
+        reloc_flags |= INTEL_RELOC_GGTT;
+    }
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = 0;
+    dw[3] = (uint32_t) val;
+    dw[4] = (uint32_t) (val >> 32);
+
+    cmd_reserve_reloc(cmd, 1);
+    cmd_batch_reloc(cmd, pos + 2, bo, offset, reloc_flags);
+}
+
+static void cmd_query_pipeline_statistics(struct intel_cmd *cmd,
+                                          const struct intel_query *query,
+                                          struct intel_bo *bo,
+                                          VkDeviceSize offset)
+{
+    uint32_t i;
+
+    cmd_batch_flush(cmd, GEN6_PIPE_CONTROL_CS_STALL);
+
+    for (i = 0; i < query->reg_count; i++) {
+        if (query->regs[i]) {
+            /* store lower 32 bits */
+            gen6_MI_STORE_REGISTER_MEM(cmd, bo, offset, query->regs[i]);
+            /* store higher 32 bits */
+            gen6_MI_STORE_REGISTER_MEM(cmd, bo, offset + 4, query->regs[i] + 4);
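+            /*
+             * Note that the two stores are separate MI commands, so the
+             * 64-bit counter is not captured atomically; the high half may
+             * tick between them.
+             */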
+        } else {
+            gen6_MI_STORE_DATA_IMM(cmd, bo, offset, 0);
+        }
+
+        offset += sizeof(uint64_t);
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBeginQuery(
+    VkCommandBuffer                                 commandBuffer,
+    VkQueryPool                                 queryPool,
+    uint32_t                                    slot,
+    VkQueryControlFlags                         flags)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_query *query = intel_query(queryPool);
+    struct intel_bo *bo = query->obj.mem->bo;
+    const VkDeviceSize offset = query->slot_stride * slot;
+
+    switch (query->type) {
+    case VK_QUERY_TYPE_OCCLUSION:
+        cmd_batch_depth_count(cmd, bo, offset);
+        break;
+    case VK_QUERY_TYPE_PIPELINE_STATISTICS:
+        cmd_query_pipeline_statistics(cmd, query, bo, offset);
+        break;
+    default:
+        /* TODOVV: validate */
+        cmd_fail(cmd, VK_ERROR_VALIDATION_FAILED_EXT);
+        break;
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdEndQuery(
+    VkCommandBuffer                                 commandBuffer,
+    VkQueryPool                                 queryPool,
+    uint32_t                                    slot)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_query *query = intel_query(queryPool);
+    struct intel_bo *bo = query->obj.mem->bo;
+    const VkDeviceSize offset = query->slot_stride * slot;
+
+    switch (query->type) {
+    case VK_QUERY_TYPE_OCCLUSION:
+        cmd_batch_depth_count(cmd, bo, offset + sizeof(uint64_t));
+        break;
+    case VK_QUERY_TYPE_PIPELINE_STATISTICS:
+        cmd_query_pipeline_statistics(cmd, query, bo, offset + query->slot_stride);
+        break;
+    default:
+        /* TODOVV: validate */
+        cmd_fail(cmd, VK_ERROR_VALIDATION_FAILED_EXT);
+        break;
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdResetQueryPool(
+    VkCommandBuffer                                 commandBuffer,
+    VkQueryPool                                 queryPool,
+    uint32_t                                    firstQuery,
+    uint32_t                                    queryCount)
+{
+    /* no-op */
+}
+
+static void cmd_write_event_value(struct intel_cmd *cmd, struct intel_event *event,
+                            VkPipelineStageFlags stageMask, uint32_t value)
+{
+    uint32_t pipe_control_flags = 0;
+
+    /* Event setting is done with a PIPE_CONTROL post-sync immediate write.
+     * With no other PIPE_CONTROL flags set, the write happens at the top of
+     * the pipe; for any stage other than HOST we add a CS stall so the write
+     * lands only after preceding commands complete.
+     */
+    if (stageMask & ~VK_PIPELINE_STAGE_HOST_BIT) {
+        pipe_control_flags = GEN6_PIPE_CONTROL_CS_STALL;
+    }
+
+    cmd_batch_immediate(cmd, pipe_control_flags, event->obj.mem->bo, 0, value);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetEvent(
+    VkCommandBuffer                              commandBuffer,
+    VkEvent                                  event_,
+    VkPipelineStageFlags                     stageMask)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_event *event = intel_event(event_);
+
+    cmd_write_event_value(cmd, event, stageMask, 1);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdResetEvent(
+    VkCommandBuffer                              commandBuffer,
+    VkEvent                                  event_,
+    VkPipelineStageFlags                     stageMask)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_event *event = intel_event(event_);
+
+    cmd_write_event_value(cmd, event, stageMask, 0);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdCopyQueryPoolResults(
+    VkCommandBuffer                                 commandBuffer,
+    VkQueryPool                                 queryPool,
+    uint32_t                                    firstQuery,
+    uint32_t                                    queryCount,
+    VkBuffer                                    dstBuffer,
+    VkDeviceSize                                dstOffset,
+    VkDeviceSize                                stride,
+    VkFlags                                     flags)
+{
+    /* TODO: Fill in functionality here */
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdWriteTimestamp(
+    VkCommandBuffer                              commandBuffer,
+    VkPipelineStageFlagBits                     pipelineStage,
+    VkQueryPool                                 queryPool,
+    uint32_t                                    slot)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_query *query = intel_query(queryPool);
+    struct intel_bo *bo = query->obj.mem->bo;
+    const VkDeviceSize offset = query->slot_stride * slot;
+
+    if ((pipelineStage & ~VK_PIPELINE_STAGE_HOST_BIT) &&
+        pipelineStage != VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT) {
+        cmd_batch_timestamp(cmd, bo, offset);
+    } else {
+        /* XXX we are not supposed to use two commands... */
+        gen6_MI_STORE_REGISTER_MEM(cmd, bo, offset, GEN6_REG_TIMESTAMP);
+        gen6_MI_STORE_REGISTER_MEM(cmd, bo, offset + 4,
+                GEN6_REG_TIMESTAMP + 4);
+    }
+}
diff --git a/icd/intel/cmd_pipeline.c b/icd/intel/cmd_pipeline.c
new file mode 100644
index 0000000..7dfe676
--- /dev/null
+++ b/icd/intel/cmd_pipeline.c
@@ -0,0 +1,3919 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ *
+ */
+
+#include <math.h>
+#include "genhw/genhw.h"
+#include "buf.h"
+#include "desc.h"
+#include "img.h"
+#include "mem.h"
+#include "pipeline.h"
+#include "sampler.h"
+#include "shader.h"
+#include "state.h"
+#include "view.h"
+#include "cmd_priv.h"
+#include "fb.h"
+
+static void gen6_3DPRIMITIVE(struct intel_cmd *cmd,
+                             int prim_type, bool indexed,
+                             uint32_t vertex_count,
+                             uint32_t vertex_start,
+                             uint32_t instance_count,
+                             uint32_t instance_start,
+                             uint32_t vertex_base)
+{
+    const uint8_t cmd_len = 6;
+    uint32_t dw0, *dw;
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    dw0 = GEN6_RENDER_CMD(3D, 3DPRIMITIVE) |
+          prim_type << GEN6_3DPRIM_DW0_TYPE__SHIFT |
+          (cmd_len - 2);
+
+    if (indexed)
+        dw0 |= GEN6_3DPRIM_DW0_ACCESS_RANDOM;
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = vertex_count;
+    dw[2] = vertex_start;
+    dw[3] = instance_count;
+    dw[4] = instance_start;
+    dw[5] = vertex_base;
+}
+
+static void gen7_3DPRIMITIVE(struct intel_cmd *cmd,
+                             int prim_type, bool indexed,
+                             uint32_t vertex_count,
+                             uint32_t vertex_start,
+                             uint32_t instance_count,
+                             uint32_t instance_start,
+                             uint32_t vertex_base)
+{
+    const uint8_t cmd_len = 7;
+    uint32_t dw0, dw1, *dw;
+
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    dw0 = GEN6_RENDER_CMD(3D, 3DPRIMITIVE) | (cmd_len - 2);
+    dw1 = prim_type << GEN7_3DPRIM_DW1_TYPE__SHIFT;
+
+    if (indexed)
+        dw1 |= GEN7_3DPRIM_DW1_ACCESS_RANDOM;
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = dw1;
+    dw[2] = vertex_count;
+    dw[3] = vertex_start;
+    dw[4] = instance_count;
+    dw[5] = instance_start;
+    dw[6] = vertex_base;
+}
+
+static void gen6_PIPE_CONTROL(struct intel_cmd *cmd, uint32_t dw1,
+                              struct intel_bo *bo, uint32_t bo_offset,
+                              uint64_t imm)
+{
+   const uint8_t cmd_len = 5;
+   const uint32_t dw0 = GEN6_RENDER_CMD(3D, PIPE_CONTROL) |
+                        (cmd_len - 2);
+   uint32_t reloc_flags = INTEL_RELOC_WRITE;
+   uint32_t *dw;
+   uint32_t pos;
+
+   CMD_ASSERT(cmd, 6, 7.5);
+
+   assert(bo_offset % 8 == 0);
+
+   if (dw1 & GEN6_PIPE_CONTROL_CS_STALL) {
+      /*
+       * From the Sandy Bridge PRM, volume 2 part 1, page 73:
+       *
+       *     "1 of the following must also be set (when CS stall is set):
+       *
+       *       * Depth Cache Flush Enable ([0] of DW1)
+       *       * Stall at Pixel Scoreboard ([1] of DW1)
+       *       * Depth Stall ([13] of DW1)
+       *       * Post-Sync Operation ([13] of DW1)
+       *       * Render Target Cache Flush Enable ([12] of DW1)
+       *       * Notify Enable ([8] of DW1)"
+       *
+       * From the Ivy Bridge PRM, volume 2 part 1, page 61:
+       *
+       *     "One of the following must also be set (when CS stall is set):
+       *
+       *       * Render Target Cache Flush Enable ([12] of DW1)
+       *       * Depth Cache Flush Enable ([0] of DW1)
+       *       * Stall at Pixel Scoreboard ([1] of DW1)
+       *       * Depth Stall ([13] of DW1)
+       *       * Post-Sync Operation ([13] of DW1)"
+       */
+      uint32_t bit_test = GEN6_PIPE_CONTROL_RENDER_CACHE_FLUSH |
+                          GEN6_PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+                          GEN6_PIPE_CONTROL_PIXEL_SCOREBOARD_STALL |
+                          GEN6_PIPE_CONTROL_DEPTH_STALL;
+
+      /* post-sync op */
+      bit_test |= GEN6_PIPE_CONTROL_WRITE_IMM |
+                  GEN6_PIPE_CONTROL_WRITE_PS_DEPTH_COUNT |
+                  GEN6_PIPE_CONTROL_WRITE_TIMESTAMP;
+
+      if (cmd_gen(cmd) == INTEL_GEN(6))
+         bit_test |= GEN6_PIPE_CONTROL_NOTIFY_ENABLE;
+
+      assert(dw1 & bit_test);
+   }
+
+   if (dw1 & GEN6_PIPE_CONTROL_DEPTH_STALL) {
+      /*
+       * From the Sandy Bridge PRM, volume 2 part 1, page 73:
+       *
+       *     "Following bits must be clear (when Depth Stall is set):
+       *
+       *       * Render Target Cache Flush Enable ([12] of DW1)
+       *       * Depth Cache Flush Enable ([0] of DW1)"
+       */
+      assert(!(dw1 & (GEN6_PIPE_CONTROL_RENDER_CACHE_FLUSH |
+                      GEN6_PIPE_CONTROL_DEPTH_CACHE_FLUSH)));
+   }
+
+   /*
+    * From the Sandy Bridge PRM, volume 1 part 3, page 19:
+    *
+    *     "[DevSNB] PPGTT memory writes by MI_* (such as MI_STORE_DATA_IMM)
+    *      and PIPE_CONTROL are not supported."
+    *
+    * The kernel will add the mapping automatically (when write domain is
+    * INTEL_DOMAIN_INSTRUCTION).
+    */
+   if (cmd_gen(cmd) == INTEL_GEN(6) && bo) {
+      bo_offset |= GEN6_PIPE_CONTROL_DW2_USE_GGTT;
+      reloc_flags |= INTEL_RELOC_GGTT;
+   }
+
+   pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+   dw[0] = dw0;
+   dw[1] = dw1;
+   dw[2] = 0;
+   dw[3] = (uint32_t) imm;
+   dw[4] = (uint32_t) (imm >> 32);
+
+   if (bo) {
+       cmd_reserve_reloc(cmd, 1);
+       cmd_batch_reloc(cmd, pos + 2, bo, bo_offset, reloc_flags);
+   }
+}
+
+static bool gen6_can_primitive_restart(const struct intel_cmd *cmd)
+{
+    const struct intel_pipeline *p = cmd->bind.pipeline.graphics;
+    bool supported;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7.5))
+        return (p->prim_type != GEN6_3DPRIM_RECTLIST);
+
+    switch (p->prim_type) {
+    case GEN6_3DPRIM_POINTLIST:
+    case GEN6_3DPRIM_LINELIST:
+    case GEN6_3DPRIM_LINESTRIP:
+    case GEN6_3DPRIM_TRILIST:
+    case GEN6_3DPRIM_TRISTRIP:
+        supported = true;
+        break;
+    default:
+        supported = false;
+        break;
+    }
+
+    if (!supported)
+        return false;
+
+    switch (cmd->bind.index.type) {
+    case VK_INDEX_TYPE_UINT16:
+        supported = (p->primitive_restart_index != 0xffffu);
+        break;
+    case VK_INDEX_TYPE_UINT32:
+        supported = (p->primitive_restart_index != 0xffffffffu);
+        break;
+    default:
+        supported = false;
+        break;
+    }
+
+    return supported;
+}
+
+static void gen6_3DSTATE_INDEX_BUFFER(struct intel_cmd *cmd,
+                                      const struct intel_buf *buf,
+                                      VkDeviceSize offset,
+                                      VkIndexType type,
+                                      bool enable_cut_index)
+{
+    const uint8_t cmd_len = 3;
+    uint32_t dw0, end_offset, *dw;
+    unsigned offset_align;
+    uint32_t pos;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_INDEX_BUFFER) | (cmd_len - 2);
+
+    /* on Gen7.5+ the cut-index enable bit lives in 3DSTATE_VF instead */
+    if (cmd_gen(cmd) >= INTEL_GEN(7.5))
+        assert(!enable_cut_index);
+    if (enable_cut_index)
+        dw0 |= GEN6_IB_DW0_CUT_INDEX_ENABLE;
+
+    switch (type) {
+    case VK_INDEX_TYPE_UINT16:
+        dw0 |= GEN6_IB_DW0_FORMAT_WORD;
+        offset_align = 2;
+        break;
+    case VK_INDEX_TYPE_UINT32:
+        dw0 |= GEN6_IB_DW0_FORMAT_DWORD;
+        offset_align = 4;
+        break;
+    default:
+        assert(!"unsupported index type");
+        /* keep offset_align defined so release builds do not read garbage */
+        offset_align = 4;
+        break;
+    }
+
+    /* the end address is rounded down to the index size and is inclusive */
+    end_offset = buf->size - (buf->size % offset_align) - 1;
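+    /*
+     * For example, a 10-byte buffer of 32-bit indices yields
+     * end_offset = 10 - (10 % 4) - 1 = 7, the last byte of the final
+     * complete index.
+     */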
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+
+    cmd_reserve_reloc(cmd, 2);
+    cmd_batch_reloc(cmd, pos + 1, buf->obj.mem->bo, offset, 0);
+    cmd_batch_reloc(cmd, pos + 2, buf->obj.mem->bo, end_offset, 0);
+}
+
+static void gen75_3DSTATE_VF(struct intel_cmd *cmd,
+                             bool enable_cut_index,
+                             uint32_t cut_index)
+{
+    const uint8_t cmd_len = 2;
+    uint32_t dw0, *dw;
+
+    CMD_ASSERT(cmd, 7.5, 7.5);
+
+    dw0 = GEN75_RENDER_CMD(3D, 3DSTATE_VF) | (cmd_len - 2);
+    if (enable_cut_index)
+        dw0 |= GEN75_VF_DW0_CUT_INDEX_ENABLE;
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = cut_index;
+}
+
+static void gen6_add_scratch_space(struct intel_cmd *cmd,
+                                   uint32_t batch_pos,
+                                   const struct intel_pipeline *pipeline,
+                                   const struct intel_pipeline_shader *sh)
+{
+    int scratch_space;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    assert(sh->per_thread_scratch_size &&
+           sh->per_thread_scratch_size % 1024 == 0 &&
+           u_is_pow2(sh->per_thread_scratch_size) &&
+           sh->scratch_offset % 1024 == 0);
+    scratch_space = u_ffs(sh->per_thread_scratch_size) - 11;
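+    /*
+     * The hardware encodes per-thread scratch space as a power of two in KB,
+     * e.g. 1024 bytes -> u_ffs(1024) - 11 = 0 and 4096 bytes -> 2.
+     */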
+
+    cmd_reserve_reloc(cmd, 1);
+    cmd_batch_reloc(cmd, batch_pos, pipeline->obj.mem->bo,
+            sh->scratch_offset | scratch_space, INTEL_RELOC_WRITE);
+}
+
+static void gen6_3DSTATE_GS(struct intel_cmd *cmd)
+{
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+    const struct intel_pipeline_shader *gs = &pipeline->gs;
+    const uint8_t cmd_len = 7;
+    uint32_t dw0, dw2, dw4, dw5, dw6, *dw;
+    CMD_ASSERT(cmd, 6, 6);
+    int vue_read_len = 0;
+    int pos = 0;
+
+    dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_GS) | (cmd_len - 2);
+
+    if (pipeline->active_shaders & SHADER_GEOMETRY_FLAG) {
+
+        // based on ilo_gpe_init_gs_cso_gen6
+        vue_read_len = (gs->in_count + 1) / 2;
+        if (!vue_read_len)
+            vue_read_len = 1;
+
+        dw2 = (gs->sampler_count + 3) / 4 << GEN6_THREADDISP_SAMPLER_COUNT__SHIFT |
+              gs->surface_count << GEN6_THREADDISP_BINDING_TABLE_SIZE__SHIFT |
+              GEN6_THREADDISP_SPF;
+
+        dw4 = vue_read_len << GEN6_GS_DW4_URB_READ_LEN__SHIFT |
+              0 << GEN6_GS_DW4_URB_READ_OFFSET__SHIFT |
+              gs->urb_grf_start << GEN6_GS_DW4_URB_GRF_START__SHIFT;
+
+        dw5 = (gs->max_threads - 1) << GEN6_GS_DW5_MAX_THREADS__SHIFT |
+              GEN6_GS_DW5_STATISTICS |
+              GEN6_GS_DW5_RENDER_ENABLE;
+
+        dw6 = GEN6_GS_DW6_GS_ENABLE;
+
+        if (gs->discard_adj)
+            dw6 |= GEN6_GS_DW6_DISCARD_ADJACENCY;
+
+    } else {
+        dw2 = 0;
+        dw4 = 0;
+        dw5 = GEN6_GS_DW5_STATISTICS;
+        dw6 = 0;
+    }
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = cmd->bind.pipeline.gs_offset;
+    dw[2] = dw2;
+    dw[3] = 0;
+    dw[4] = dw4;
+    dw[5] = dw5;
+    dw[6] = dw6;
+
+    if (gs->per_thread_scratch_size)
+        gen6_add_scratch_space(cmd, pos + 3, pipeline, gs);
+}
+
+static void gen7_3DSTATE_GS(struct intel_cmd *cmd)
+{
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+    const struct intel_pipeline_shader *gs = &pipeline->gs;
+    const uint8_t cmd_len = 7;
+    uint32_t dw0, dw2, dw4, dw5, dw6, *dw;
+    CMD_ASSERT(cmd, 7, 7.5);
+    int vue_read_len = 0;
+    int pos = 0;
+
+    dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_GS) | (cmd_len - 2);
+
+    if (pipeline->active_shaders & SHADER_GEOMETRY_FLAG) {
+
+        // based on upload_gs_state
+        dw2 = (gs->sampler_count + 3) / 4 << GEN6_THREADDISP_SAMPLER_COUNT__SHIFT |
+              gs->surface_count << GEN6_THREADDISP_BINDING_TABLE_SIZE__SHIFT;
+
+        vue_read_len = (gs->in_count + 1) / 2;
+        if (!vue_read_len)
+            vue_read_len = 1;
+
+        dw4 = (gs->output_size_hwords * 2 - 1) << GEN7_GS_DW4_OUTPUT_SIZE__SHIFT |
+               gs->output_topology << GEN7_GS_DW4_OUTPUT_TOPO__SHIFT |
+               vue_read_len << GEN7_GS_DW4_URB_READ_LEN__SHIFT |
+               0 << GEN7_GS_DW4_URB_READ_OFFSET__SHIFT |
+               gs->urb_grf_start << GEN7_GS_DW4_URB_GRF_START__SHIFT;
+
+        dw5 = gs->control_data_header_size_hwords << GEN7_GS_DW5_CONTROL_DATA_HEADER_SIZE__SHIFT |
+              (gs->invocations - 1) << GEN7_GS_DW5_INSTANCE_CONTROL__SHIFT |
+              GEN7_GS_DW5_STATISTICS |
+              GEN7_GS_DW5_GS_ENABLE;
+
+        dw5 |= (gs->dual_instanced_dispatch) ? GEN7_GS_DW5_DISPATCH_MODE_DUAL_INSTANCE
+                                             : GEN7_GS_DW5_DISPATCH_MODE_DUAL_OBJECT;
+
+        if (gs->include_primitive_id)
+            dw5 |= GEN7_GS_DW5_INCLUDE_PRIMITIVE_ID;
+
+        if (cmd_gen(cmd) >= INTEL_GEN(7.5)) {
+            dw5 |= (gs->max_threads - 1) << GEN75_GS_DW5_MAX_THREADS__SHIFT;
+            dw5 |= GEN75_GS_DW5_REORDER_TRAILING;
+            dw6  = gs->control_data_format << GEN75_GS_DW6_GSCTRL__SHIFT;
+        } else {
+            dw5 |= (gs->max_threads - 1) << GEN7_GS_DW5_MAX_THREADS__SHIFT;
+            dw5 |= gs->control_data_format << GEN7_GS_DW5_GSCTRL__SHIFT;
+            dw6  = 0;
+        }
+    } else {
+        dw2 = 0;
+        dw4 = 0;
+        dw5 = GEN7_GS_DW5_STATISTICS;
+        dw6 = 0;
+    }
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = cmd->bind.pipeline.gs_offset;
+    dw[2] = dw2;
+    dw[3] = 0;
+    dw[4] = dw4;
+    dw[5] = dw5;
+    dw[6] = dw6;
+
+    if (gs->per_thread_scratch_size)
+        gen6_add_scratch_space(cmd, pos + 3, pipeline, gs);
+}
+
+static void gen6_3DSTATE_DRAWING_RECTANGLE(struct intel_cmd *cmd,
+                                           uint32_t width, uint32_t height)
+{
+    const uint8_t cmd_len = 4;
+    const uint32_t dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_DRAWING_RECTANGLE) |
+                         (cmd_len - 2);
+    uint32_t *dw;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+
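+    /* a zero-sized rectangle is programmed with min coordinates (dw[1])
+     * greater than max coordinates (dw[2]) so that nothing passes the
+     * drawing-rectangle test */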
+    if (width && height) {
+        dw[1] = 0;
+        dw[2] = (height - 1) << 16 |
+                (width - 1);
+    } else {
+        dw[1] = 1;
+        dw[2] = 0;
+    }
+
+    dw[3] = 0;
+}
+
+static void gen7_fill_3DSTATE_SF_body(const struct intel_cmd *cmd,
+                                      uint32_t body[6])
+{
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+    const struct intel_render_pass *rp = cmd->bind.render_pass;
+    const struct intel_render_pass_subpass *subpass =
+        cmd->bind.render_pass_subpass;
+    const struct intel_dynamic_line_width *line_width = &cmd->bind.state.line_width;
+    const struct intel_dynamic_depth_bias *depth_bias = &cmd->bind.state.depth_bias;
+    uint32_t dw1, dw2, dw3, dw4, dw5, dw6;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    dw1 = GEN7_SF_DW1_STATISTICS |
+          GEN7_SF_DW1_DEPTH_OFFSET_SOLID |
+          GEN7_SF_DW1_DEPTH_OFFSET_WIREFRAME |
+          GEN7_SF_DW1_DEPTH_OFFSET_POINT |
+          GEN7_SF_DW1_VIEWPORT_ENABLE |
+          pipeline->cmd_sf_fill;
+
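+    /* gen7 moved the depth buffer format into 3DSTATE_SF, presumably so
+     * that the depth offset (bias) units match the bound depth buffer */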
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        int format = GEN6_ZFORMAT_D32_FLOAT;
+
+        if (subpass->ds_index < rp->attachment_count) {
+            switch (rp->attachments[subpass->ds_index].format) {
+            case VK_FORMAT_D16_UNORM:
+                format = GEN6_ZFORMAT_D16_UNORM;
+                break;
+            case VK_FORMAT_D32_SFLOAT:
+            case VK_FORMAT_D32_SFLOAT_S8_UINT:
+                format = GEN6_ZFORMAT_D32_FLOAT;
+                break;
+            default:
+                assert(!"unsupported depth/stencil format");
+                break;
+            }
+        }
+
+        dw1 |= format << GEN7_SF_DW1_DEPTH_FORMAT__SHIFT;
+    }
+
+    dw2 = pipeline->cmd_sf_cull;
+
+    /* Scissor is always enabled */
+    dw2 |= GEN7_SF_DW2_SCISSOR_ENABLE;
+
+    // TODO: line width support
+    (void) line_width;
+
+    if (pipeline->sample_count > 1) {
+        dw2 |= 128 << GEN7_SF_DW2_LINE_WIDTH__SHIFT |
+               GEN7_SF_DW2_MSRASTMODE_ON_PATTERN;
+    } else {
+        dw2 |= 0 << GEN7_SF_DW2_LINE_WIDTH__SHIFT |
+               GEN7_SF_DW2_MSRASTMODE_OFF_PIXEL;
+    }
+
+    dw3 = 2 << GEN7_SF_DW3_TRI_PROVOKE__SHIFT |
+          1 << GEN7_SF_DW3_LINE_PROVOKE__SHIFT |
+          2 << GEN7_SF_DW3_TRIFAN_PROVOKE__SHIFT |
+          GEN7_SF_DW3_SUBPIXEL_8BITS;
+
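+    /* the constant bias term is doubled below; this mirrors i965, which
+     * presumably compensates for how the hardware scales the global
+     * depth offset constant */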
+    if (pipeline->depthBiasEnable) {
+        dw4 = u_fui((float) depth_bias->depth_bias * 2.0f);
+        dw5 = u_fui(depth_bias->slope_scaled_depth_bias);
+        dw6 = u_fui(depth_bias->depth_bias_clamp);
+    } else {
+        dw4 = 0;
+        dw5 = 0;
+        dw6 = 0;
+    }
+
+    body[0] = dw1;
+    body[1] = dw2;
+    body[2] = dw3;
+    body[3] = dw4;
+    body[4] = dw5;
+    body[5] = dw6;
+}
+
+static void gen6_3DSTATE_SF(struct intel_cmd *cmd)
+{
+    const uint8_t cmd_len = 20;
+    const uint32_t dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_SF) |
+                         (cmd_len - 2);
+    const uint32_t *sbe = cmd->bind.pipeline.graphics->cmd_3dstate_sbe;
+    uint32_t sf[6];
+    uint32_t *dw;
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    gen7_fill_3DSTATE_SF_body(cmd, sf);
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = sbe[1];
+    memcpy(&dw[2], sf, sizeof(sf));
+    memcpy(&dw[8], &sbe[2], 12);
+}
+
+static void gen7_3DSTATE_SF(struct intel_cmd *cmd)
+{
+    const uint8_t cmd_len = 7;
+    uint32_t *dw;
+
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_SF) |
+            (cmd_len - 2);
+    gen7_fill_3DSTATE_SF_body(cmd, &dw[1]);
+}
+
+static void gen6_3DSTATE_CLIP(struct intel_cmd *cmd)
+{
+    const uint8_t cmd_len = 4;
+    const uint32_t dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_CLIP) |
+                         (cmd_len - 2);
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+    const struct intel_pipeline_shader *vs = &pipeline->vs;
+    const struct intel_pipeline_shader *fs = &pipeline->fs;
+    const struct intel_dynamic_viewport *viewport = &cmd->bind.state.viewport;
+    uint32_t dw1, dw2, dw3, *dw;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    dw1 = GEN6_CLIP_DW1_STATISTICS;
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        dw1 |= GEN7_CLIP_DW1_SUBPIXEL_8BITS |
+               GEN7_CLIP_DW1_EARLY_CULL_ENABLE |
+               pipeline->cmd_clip_cull;
+    }
+
+    dw2 = GEN6_CLIP_DW2_CLIP_ENABLE |
+          GEN6_CLIP_DW2_APIMODE_D3D | /* depth range [0, 1] */
+          GEN6_CLIP_DW2_XY_TEST_ENABLE |
+          (vs->enable_user_clip ? 1 : 0) << GEN6_CLIP_DW2_UCP_CLIP_ENABLES__SHIFT |
+          2 << GEN6_CLIP_DW2_TRI_PROVOKE__SHIFT |
+          1 << GEN6_CLIP_DW2_LINE_PROVOKE__SHIFT |
+          2 << GEN6_CLIP_DW2_TRIFAN_PROVOKE__SHIFT;
+
+    if (pipeline->rasterizerDiscardEnable)
+        dw2 |= GEN6_CLIP_DW2_CLIPMODE_REJECT_ALL;
+    else
+        dw2 |= GEN6_CLIP_DW2_CLIPMODE_NORMAL;
+
+    if (pipeline->depthClipEnable)
+        dw2 |= GEN6_CLIP_DW2_Z_TEST_ENABLE;
+
+    if (fs->barycentric_interps & (GEN6_INTERP_NONPERSPECTIVE_PIXEL |
+                                   GEN6_INTERP_NONPERSPECTIVE_CENTROID |
+                                   GEN6_INTERP_NONPERSPECTIVE_SAMPLE))
+        dw2 |= GEN6_CLIP_DW2_NONPERSPECTIVE_BARYCENTRIC_ENABLE;
+
+    dw3 = 0x1 << GEN6_CLIP_DW3_MIN_POINT_WIDTH__SHIFT |
+          0x7ff << GEN6_CLIP_DW3_MAX_POINT_WIDTH__SHIFT |
+          (viewport->viewport_count - 1);
+
+    /* TODO: handle framebuffers that request layer_count > 1 */
+    if (cmd->bind.fb->array_size == 1) {
+        dw3 |= GEN6_CLIP_DW3_RTAINDEX_FORCED_ZERO;
+    }
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = dw1;
+    dw[2] = dw2;
+    dw[3] = dw3;
+}
+
+static void gen6_3DSTATE_WM(struct intel_cmd *cmd)
+{
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+    const struct intel_pipeline_shader *fs = &pipeline->fs;
+    const uint8_t cmd_len = 9;
+    uint32_t pos;
+    uint32_t dw0, dw2, dw4, dw5, dw6, dw8, *dw;
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_WM) | (cmd_len - 2);
+
+    dw2 = (fs->sampler_count + 3) / 4 << GEN6_THREADDISP_SAMPLER_COUNT__SHIFT |
+          fs->surface_count << GEN6_THREADDISP_BINDING_TABLE_SIZE__SHIFT;
+
+    dw4 = GEN6_WM_DW4_STATISTICS |
+          fs->urb_grf_start << GEN6_WM_DW4_URB_GRF_START0__SHIFT |
+          0 << GEN6_WM_DW4_URB_GRF_START1__SHIFT |
+          fs->urb_grf_start_16 << GEN6_WM_DW4_URB_GRF_START2__SHIFT;
+
+    dw5 = (fs->max_threads - 1) << GEN6_WM_DW5_MAX_THREADS__SHIFT |
+          GEN6_WM_DW5_PS_DISPATCH_ENABLE |
+          GEN6_PS_DISPATCH_8 << GEN6_WM_DW5_PS_DISPATCH_MODE__SHIFT;
+
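+    /* additionally enable SIMD16 dispatch when a 16-wide kernel variant
+     * exists; its entry point goes into DW8 ("kernel 2") below */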
+    if (fs->offset_16)
+        dw5 |= GEN6_PS_DISPATCH_16 << GEN6_WM_DW5_PS_DISPATCH_MODE__SHIFT;
+
+    if (fs->uses & INTEL_SHADER_USE_KILL ||
+        pipeline->alphaToCoverageEnable)
+        dw5 |= GEN6_WM_DW5_PS_KILL_PIXEL;
+
+    if (fs->computed_depth_mode)
+        dw5 |= GEN6_WM_DW5_PS_COMPUTE_DEPTH;
+    if (fs->uses & INTEL_SHADER_USE_DEPTH)
+        dw5 |= GEN6_WM_DW5_PS_USE_DEPTH;
+    if (fs->uses & INTEL_SHADER_USE_W)
+        dw5 |= GEN6_WM_DW5_PS_USE_W;
+
+    if (pipeline->dual_source_blend_enable)
+        dw5 |= GEN6_WM_DW5_PS_DUAL_SOURCE_BLEND;
+
+    dw6 = fs->in_count << GEN6_WM_DW6_SF_ATTR_COUNT__SHIFT |
+          GEN6_WM_DW6_PS_POSOFFSET_NONE |
+          GEN6_WM_DW6_ZW_INTERP_PIXEL |
+          fs->barycentric_interps << GEN6_WM_DW6_BARYCENTRIC_INTERP__SHIFT |
+          GEN6_WM_DW6_POINT_RASTRULE_UPPER_RIGHT;
+
+    if (pipeline->sample_count > 1) {
+        dw6 |= GEN6_WM_DW6_MSRASTMODE_ON_PATTERN |
+               GEN6_WM_DW6_MSDISPMODE_PERPIXEL;
+    } else {
+        dw6 |= GEN6_WM_DW6_MSRASTMODE_OFF_PIXEL |
+               GEN6_WM_DW6_MSDISPMODE_PERSAMPLE;
+    }
+
+    dw8 = (fs->offset_16) ? cmd->bind.pipeline.fs_offset + fs->offset_16 : 0;
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = cmd->bind.pipeline.fs_offset;
+    dw[2] = dw2;
+    dw[3] = 0; /* scratch */
+    dw[4] = dw4;
+    dw[5] = dw5;
+    dw[6] = dw6;
+    dw[7] = 0; /* kernel 1 */
+    dw[8] = dw8; /* kernel 2 */
+
+    if (fs->per_thread_scratch_size)
+        gen6_add_scratch_space(cmd, pos + 3, pipeline, fs);
+}
+
+static void gen7_3DSTATE_WM(struct intel_cmd *cmd)
+{
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+    const struct intel_pipeline_shader *fs = &pipeline->fs;
+    const uint8_t cmd_len = 3;
+    uint32_t dw0, dw1, dw2, *dw;
+
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_WM) | (cmd_len - 2);
+
+    dw1 = GEN7_WM_DW1_STATISTICS |
+          GEN7_WM_DW1_PS_DISPATCH_ENABLE |
+          GEN7_WM_DW1_ZW_INTERP_PIXEL |
+          fs->barycentric_interps << GEN7_WM_DW1_BARYCENTRIC_INTERP__SHIFT |
+          GEN7_WM_DW1_POINT_RASTRULE_UPPER_RIGHT;
+
+    if (fs->uses & INTEL_SHADER_USE_KILL ||
+        pipeline->alphaToCoverageEnable)
+        dw1 |= GEN7_WM_DW1_PS_KILL_PIXEL;
+
+    dw1 |= fs->computed_depth_mode << GEN7_WM_DW1_PSCDEPTH__SHIFT;
+
+    if (fs->uses & INTEL_SHADER_USE_DEPTH)
+        dw1 |= GEN7_WM_DW1_PS_USE_DEPTH;
+    if (fs->uses & INTEL_SHADER_USE_W)
+        dw1 |= GEN7_WM_DW1_PS_USE_W;
+
+    dw2 = 0;
+
+    if (pipeline->sample_count > 1) {
+        dw1 |= GEN7_WM_DW1_MSRASTMODE_ON_PATTERN;
+        dw2 |= GEN7_WM_DW2_MSDISPMODE_PERPIXEL;
+    } else {
+        dw1 |= GEN7_WM_DW1_MSRASTMODE_OFF_PIXEL;
+        dw2 |= GEN7_WM_DW2_MSDISPMODE_PERSAMPLE;
+    }
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = dw1;
+    dw[2] = dw2;
+}
+
+static void gen7_3DSTATE_PS(struct intel_cmd *cmd)
+{
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+    const struct intel_pipeline_shader *fs = &pipeline->fs;
+    const uint8_t cmd_len = 8;
+    uint32_t dw0, dw2, dw4, dw5, dw7, *dw;
+    uint32_t pos;
+
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    dw0 = GEN7_RENDER_CMD(3D, 3DSTATE_PS) | (cmd_len - 2);
+
+    dw2 = (fs->sampler_count + 3) / 4 << GEN6_THREADDISP_SAMPLER_COUNT__SHIFT |
+          fs->surface_count << GEN6_THREADDISP_BINDING_TABLE_SIZE__SHIFT;
+
+    dw4 = GEN7_PS_DW4_POSOFFSET_NONE |
+          GEN6_PS_DISPATCH_8 << GEN7_PS_DW4_DISPATCH_MODE__SHIFT;
+
+    if (fs->offset_16)
+        dw4 |= GEN6_PS_DISPATCH_16 << GEN7_PS_DW4_DISPATCH_MODE__SHIFT;
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7.5)) {
+        dw4 |= (fs->max_threads - 1) << GEN75_PS_DW4_MAX_THREADS__SHIFT;
+        dw4 |= pipeline->cmd_sample_mask << GEN75_PS_DW4_SAMPLE_MASK__SHIFT;
+    } else {
+        dw4 |= (fs->max_threads - 1) << GEN7_PS_DW4_MAX_THREADS__SHIFT;
+    }
+
+    if (fs->in_count)
+        dw4 |= GEN7_PS_DW4_ATTR_ENABLE;
+
+    if (pipeline->dual_source_blend_enable)
+        dw4 |= GEN7_PS_DW4_DUAL_SOURCE_BLEND;
+
+    dw5 = fs->urb_grf_start << GEN7_PS_DW5_URB_GRF_START0__SHIFT |
+          0 << GEN7_PS_DW5_URB_GRF_START1__SHIFT |
+          fs->urb_grf_start_16 << GEN7_PS_DW5_URB_GRF_START2__SHIFT;
+
+    dw7 = (fs->offset_16) ? cmd->bind.pipeline.fs_offset + fs->offset_16 : 0;
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = cmd->bind.pipeline.fs_offset;
+    dw[2] = dw2;
+    dw[3] = 0; /* scratch */
+    dw[4] = dw4;
+    dw[5] = dw5;
+    dw[6] = 0; /* kernel 1 */
+    dw[7] = dw7; /* kernel 2 */
+
+    if (fs->per_thread_scratch_size)
+        gen6_add_scratch_space(cmd, pos + 3, pipeline, fs);
+}
+
+static void gen6_3DSTATE_MULTISAMPLE(struct intel_cmd *cmd,
+                                     uint32_t sample_count)
+{
+    const uint8_t cmd_len = (cmd_gen(cmd) >= INTEL_GEN(7)) ? 4 : 3;
+    uint32_t dw1, dw2, dw3, *dw;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
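+    /* gen7+ grew a fourth DWord for the second half of the 8x sample
+     * pattern; 8x MSAA itself requires gen7+ */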
+    switch (sample_count) {
+    case 4:
+        dw1 = GEN6_MULTISAMPLE_DW1_NUMSAMPLES_4;
+        dw2 = cmd->dev->sample_pattern_4x;
+        dw3 = 0;
+        break;
+    case 8:
+        assert(cmd_gen(cmd) >= INTEL_GEN(7));
+        dw1 = GEN7_MULTISAMPLE_DW1_NUMSAMPLES_8;
+        dw2 = cmd->dev->sample_pattern_8x[0];
+        dw3 = cmd->dev->sample_pattern_8x[1];
+        break;
+    default:
+        assert(sample_count <= 1);
+        dw1 = GEN6_MULTISAMPLE_DW1_NUMSAMPLES_1;
+        dw2 = 0;
+        dw3 = 0;
+        break;
+    }
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_MULTISAMPLE) | (cmd_len - 2);
+    dw[1] = dw1;
+    dw[2] = dw2;
+    if (cmd_gen(cmd) >= INTEL_GEN(7))
+        dw[3] = dw3;
+}
+
+static void gen6_3DSTATE_DEPTH_BUFFER(struct intel_cmd *cmd,
+                                      const struct intel_att_view *view,
+                                      bool optimal_ds)
+{
+    const uint8_t cmd_len = 7;
+    uint32_t dw0, *dw;
+    uint32_t pos;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    dw0 = (cmd_gen(cmd) >= INTEL_GEN(7)) ?
+        GEN7_RENDER_CMD(3D, 3DSTATE_DEPTH_BUFFER) :
+        GEN6_RENDER_CMD(3D, 3DSTATE_DEPTH_BUFFER);
+    dw0 |= (cmd_len - 2);
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+
+    dw[1] = view->att_cmd[0];
+    /* note that we only enable HiZ on Gen7+ */
+    if (!optimal_ds)
+        dw[1] &= ~GEN7_DEPTH_DW1_HIZ_ENABLE;
+
+    dw[2] = 0;
+    dw[3] = view->att_cmd[2];
+    dw[4] = view->att_cmd[3];
+    dw[5] = view->att_cmd[4];
+    dw[6] = view->att_cmd[5];
+
+    if (view->img) {
+        cmd_reserve_reloc(cmd, 1);
+        cmd_batch_reloc(cmd, pos + 2, view->img->obj.mem->bo,
+                view->att_cmd[1], INTEL_RELOC_WRITE);
+    }
+}
+
+static void gen6_3DSTATE_STENCIL_BUFFER(struct intel_cmd *cmd,
+                                        const struct intel_att_view *view,
+                                        bool optimal_ds)
+{
+    const uint8_t cmd_len = 3;
+    uint32_t dw0, *dw;
+    uint32_t pos;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    dw0 = (cmd_gen(cmd) >= INTEL_GEN(7)) ?
+        GEN7_RENDER_CMD(3D, 3DSTATE_STENCIL_BUFFER) :
+        GEN6_RENDER_CMD(3D, 3DSTATE_STENCIL_BUFFER);
+    dw0 |= (cmd_len - 2);
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+
+    if (view->has_stencil) {
+        dw[1] = view->att_cmd[6];
+
+        cmd_reserve_reloc(cmd, 1);
+        cmd_batch_reloc(cmd, pos + 2, view->img->obj.mem->bo,
+                view->att_cmd[7], INTEL_RELOC_WRITE);
+    } else {
+        dw[1] = 0;
+        dw[2] = 0;
+    }
+}
+
+static void gen6_3DSTATE_HIER_DEPTH_BUFFER(struct intel_cmd *cmd,
+                                           const struct intel_att_view *view,
+                                           bool optimal_ds)
+{
+    const uint8_t cmd_len = 3;
+    uint32_t dw0, *dw;
+    uint32_t pos;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    dw0 = (cmd_gen(cmd) >= INTEL_GEN(7)) ?
+        GEN7_RENDER_CMD(3D, 3DSTATE_HIER_DEPTH_BUFFER) :
+        GEN6_RENDER_CMD(3D, 3DSTATE_HIER_DEPTH_BUFFER);
+    dw0 |= (cmd_len - 2);
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+
+    if (view->has_hiz && optimal_ds) {
+        dw[1] = view->att_cmd[8];
+
+        cmd_reserve_reloc(cmd, 1);
+        cmd_batch_reloc(cmd, pos + 2, view->img->obj.mem->bo,
+                view->att_cmd[9], INTEL_RELOC_WRITE);
+    } else {
+        dw[1] = 0;
+        dw[2] = 0;
+    }
+}
+
+static void gen6_3DSTATE_CLEAR_PARAMS(struct intel_cmd *cmd,
+                                      uint32_t clear_val)
+{
+    const uint8_t cmd_len = 2;
+    const uint32_t dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_CLEAR_PARAMS) |
+                         GEN6_CLEAR_PARAMS_DW0_VALID |
+                         (cmd_len - 2);
+    uint32_t *dw;
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = clear_val;
+}
+
+static void gen7_3DSTATE_CLEAR_PARAMS(struct intel_cmd *cmd,
+                                      uint32_t clear_val)
+{
+    const uint8_t cmd_len = 3;
+    const uint32_t dw0 = GEN7_RENDER_CMD(3D, 3DSTATE_CLEAR_PARAMS) |
+                         (cmd_len - 2);
+    uint32_t *dw;
+
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = clear_val;
+    dw[2] = 1;
+}
+
+static void gen6_3DSTATE_CC_STATE_POINTERS(struct intel_cmd *cmd,
+                                           uint32_t blend_offset,
+                                           uint32_t ds_offset,
+                                           uint32_t cc_offset)
+{
+    const uint8_t cmd_len = 4;
+    uint32_t dw0, *dw;
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_CC_STATE_POINTERS) |
+          (cmd_len - 2);
+
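+    /* bit 0 of each pointer is the modify-enable; without it the hardware
+     * keeps the previously programmed state */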
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = blend_offset | 1;
+    dw[2] = ds_offset | 1;
+    dw[3] = cc_offset | 1;
+}
+
+static void gen6_3DSTATE_VIEWPORT_STATE_POINTERS(struct intel_cmd *cmd,
+                                                 uint32_t clip_offset,
+                                                 uint32_t sf_offset,
+                                                 uint32_t cc_offset)
+{
+    const uint8_t cmd_len = 4;
+    uint32_t dw0, *dw;
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_VIEWPORT_STATE_POINTERS) |
+          GEN6_VP_PTR_DW0_CLIP_CHANGED |
+          GEN6_VP_PTR_DW0_SF_CHANGED |
+          GEN6_VP_PTR_DW0_CC_CHANGED |
+          (cmd_len - 2);
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = clip_offset;
+    dw[2] = sf_offset;
+    dw[3] = cc_offset;
+}
+
+static void gen6_3DSTATE_SCISSOR_STATE_POINTERS(struct intel_cmd *cmd,
+                                                uint32_t scissor_offset)
+{
+    const uint8_t cmd_len = 2;
+    uint32_t dw0, *dw;
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_SCISSOR_STATE_POINTERS) |
+          (cmd_len - 2);
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = scissor_offset;
+}
+
+static void gen6_3DSTATE_BINDING_TABLE_POINTERS(struct intel_cmd *cmd,
+                                                uint32_t vs_offset,
+                                                uint32_t gs_offset,
+                                                uint32_t ps_offset)
+{
+    const uint8_t cmd_len = 4;
+    uint32_t dw0, *dw;
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_BINDING_TABLE_POINTERS) |
+          GEN6_BINDING_TABLE_PTR_DW0_VS_CHANGED |
+          GEN6_BINDING_TABLE_PTR_DW0_GS_CHANGED |
+          GEN6_BINDING_TABLE_PTR_DW0_PS_CHANGED |
+          (cmd_len - 2);
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = vs_offset;
+    dw[2] = gs_offset;
+    dw[3] = ps_offset;
+}
+
+static void gen6_3DSTATE_SAMPLER_STATE_POINTERS(struct intel_cmd *cmd,
+                                                uint32_t vs_offset,
+                                                uint32_t gs_offset,
+                                                uint32_t ps_offset)
+{
+    const uint8_t cmd_len = 4;
+    uint32_t dw0, *dw;
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_SAMPLER_STATE_POINTERS) |
+          GEN6_SAMPLER_PTR_DW0_VS_CHANGED |
+          GEN6_SAMPLER_PTR_DW0_GS_CHANGED |
+          GEN6_SAMPLER_PTR_DW0_PS_CHANGED |
+          (cmd_len - 2);
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = vs_offset;
+    dw[2] = gs_offset;
+    dw[3] = ps_offset;
+}
+
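+/*
+ * Gen7 replaced the combined gen6 *_STATE_POINTERS commands with
+ * individual two-DWord commands; this helper emits any of them given
+ * the subopcode.
+ */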
+static void gen7_3dstate_pointer(struct intel_cmd *cmd,
+                                 int subop, uint32_t offset)
+{
+    const uint8_t cmd_len = 2;
+    const uint32_t dw0 = GEN6_RENDER_TYPE_RENDER |
+                         GEN6_RENDER_SUBTYPE_3D |
+                         subop | (cmd_len - 2);
+    uint32_t *dw;
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = offset;
+}
+
+static uint32_t gen6_BLEND_STATE(struct intel_cmd *cmd)
+{
+    const uint8_t cmd_align = GEN6_ALIGNMENT_BLEND_STATE;
+    const uint8_t cmd_len = INTEL_MAX_RENDER_TARGETS * 2;
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+    STATIC_ASSERT(ARRAY_SIZE(pipeline->cmd_cb) >= INTEL_MAX_RENDER_TARGETS);
+
+    return cmd_state_write(cmd, INTEL_CMD_ITEM_BLEND, cmd_align, cmd_len, pipeline->cmd_cb);
+}
+
+static uint32_t gen6_DEPTH_STENCIL_STATE(struct intel_cmd *cmd,
+                                         const struct intel_dynamic_stencil *stencil_state)
+{
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+    const uint8_t cmd_align = GEN6_ALIGNMENT_DEPTH_STENCIL_STATE;
+    const uint8_t cmd_len = 3;
+    uint32_t dw[3];
+
+    dw[0] = pipeline->cmd_depth_stencil;
+
+    /* TODO: enable back facing stencil state */
+    /* same read and write masks for both front and back faces */
+    dw[1] = (stencil_state->front.stencil_compare_mask & 0xff) << 24 |
+            (stencil_state->front.stencil_write_mask & 0xff) << 16 |
+            (stencil_state->front.stencil_compare_mask & 0xff) << 8 |
+            (stencil_state->front.stencil_write_mask & 0xff);
+    dw[2] = pipeline->cmd_depth_test;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
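+    /* bit 18 of DW0 appears to be the stencil buffer write enable */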
+    if (stencil_state->front.stencil_write_mask && pipeline->stencilTestEnable)
+       dw[0] |= 1 << 18;
+
+    return cmd_state_write(cmd, INTEL_CMD_ITEM_DEPTH_STENCIL,
+            cmd_align, cmd_len, dw);
+}
+
+static uint32_t gen6_COLOR_CALC_STATE(struct intel_cmd *cmd,
+                                      uint32_t stencil_ref,
+                                      const uint32_t blend_color[4])
+{
+    const uint8_t cmd_align = GEN6_ALIGNMENT_COLOR_CALC_STATE;
+    const uint8_t cmd_len = 6;
+    uint32_t offset, *dw;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    offset = cmd_state_pointer(cmd, INTEL_CMD_ITEM_COLOR_CALC,
+            cmd_align, cmd_len, &dw);
+    dw[0] = stencil_ref;
+    dw[1] = 0;
+    dw[2] = blend_color[0];
+    dw[3] = blend_color[1];
+    dw[4] = blend_color[2];
+    dw[5] = blend_color[3];
+
+    return offset;
+}
+
+static void cmd_wa_gen6_pre_depth_stall_write(struct intel_cmd *cmd)
+{
+    CMD_ASSERT(cmd, 6, 7.5);
+
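+    /* nothing has been drawn in this command buffer yet, so there is
+     * nothing to stall on */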
+    if (!cmd->bind.draw_count)
+        return;
+
+    if (cmd->bind.wa_flags & INTEL_CMD_WA_GEN6_PRE_DEPTH_STALL_WRITE)
+        return;
+
+    cmd->bind.wa_flags |= INTEL_CMD_WA_GEN6_PRE_DEPTH_STALL_WRITE;
+
+    /*
+     * From the Sandy Bridge PRM, volume 2 part 1, page 60:
+     *
+     *     "Pipe-control with CS-stall bit set must be sent BEFORE the
+     *      pipe-control with a post-sync op and no write-cache flushes."
+     *
+     * The post-sync write emitted below is itself a workaround, and it
+     * requires that this CS-stall PIPE_CONTROL be sent first.
+     */
+    gen6_PIPE_CONTROL(cmd,
+            GEN6_PIPE_CONTROL_CS_STALL |
+            GEN6_PIPE_CONTROL_PIXEL_SCOREBOARD_STALL,
+            NULL, 0, 0);
+
+    gen6_PIPE_CONTROL(cmd, GEN6_PIPE_CONTROL_WRITE_IMM,
+            cmd->scratch_bo, 0, 0);
+}
+
+static void cmd_wa_gen6_pre_command_scoreboard_stall(struct intel_cmd *cmd)
+{
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    if (!cmd->bind.draw_count)
+        return;
+
+    gen6_PIPE_CONTROL(cmd, GEN6_PIPE_CONTROL_PIXEL_SCOREBOARD_STALL,
+            NULL, 0, 0);
+}
+
+static void cmd_wa_gen7_pre_vs_depth_stall_write(struct intel_cmd *cmd)
+{
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    if (!cmd->bind.draw_count)
+        return;
+
+    cmd_wa_gen6_pre_depth_stall_write(cmd);
+
+    gen6_PIPE_CONTROL(cmd,
+            GEN6_PIPE_CONTROL_DEPTH_STALL | GEN6_PIPE_CONTROL_WRITE_IMM,
+            cmd->scratch_bo, 0, 0);
+}
+
+static void cmd_wa_gen7_post_command_cs_stall(struct intel_cmd *cmd)
+{
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    /*
+     * From the Ivy Bridge PRM, volume 2 part 1, page 61:
+     *
+     *     "One of the following must also be set (when CS stall is set):
+     *
+     *       * Render Target Cache Flush Enable ([12] of DW1)
+     *       * Depth Cache Flush Enable ([0] of DW1)
+     *       * Stall at Pixel Scoreboard ([1] of DW1)
+     *       * Depth Stall ([13] of DW1)
+     *       * Post-Sync Operation ([13] of DW1)"
+     */
+    gen6_PIPE_CONTROL(cmd,
+            GEN6_PIPE_CONTROL_CS_STALL |
+            GEN6_PIPE_CONTROL_PIXEL_SCOREBOARD_STALL,
+            NULL, 0, 0);
+}
+
+static void cmd_wa_gen7_post_command_depth_stall(struct intel_cmd *cmd)
+{
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    cmd_wa_gen6_pre_depth_stall_write(cmd);
+
+    gen6_PIPE_CONTROL(cmd, GEN6_PIPE_CONTROL_DEPTH_STALL, NULL, 0, 0);
+}
+
+static void cmd_wa_gen6_pre_multisample_depth_flush(struct intel_cmd *cmd)
+{
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    if (!cmd->bind.draw_count)
+        return;
+
+    /*
+     * From the Sandy Bridge PRM, volume 2 part 1, page 305:
+     *
+     *     "Driver must guarantee that all the caches in the depth pipe are
+     *      flushed before this command (3DSTATE_MULTISAMPLE) is parsed. This
+     *      requires driver to send a PIPE_CONTROL with a CS stall along with
+     *      a Depth Flush prior to this command."
+     *
+     * From the Ivy Bridge PRM, volume 2 part 1, page 304:
+     *
+     *     "Driver must ierarchi that all the caches in the depth pipe are
+     *      flushed before this command (3DSTATE_MULTISAMPLE) is parsed. This
+     *      requires driver to send a PIPE_CONTROL with a CS stall along with
+     *      a Depth Flush prior to this command.
+     */
+    gen6_PIPE_CONTROL(cmd,
+            GEN6_PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+            GEN6_PIPE_CONTROL_CS_STALL,
+            NULL, 0, 0);
+}
+
+static void cmd_wa_gen6_pre_ds_flush(struct intel_cmd *cmd)
+{
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    if (!cmd->bind.draw_count)
+        return;
+
+    /*
+     * From the Ivy Bridge PRM, volume 2 part 1, page 315:
+     *
+     *     "Driver must send a least one PIPE_CONTROL command with CS Stall
+     *      and a post sync operation prior to the group of depth
+     *      commands(3DSTATE_DEPTH_BUFFER, 3DSTATE_CLEAR_PARAMS,
+     *      3DSTATE_STENCIL_BUFFER, and 3DSTATE_HIER_DEPTH_BUFFER)."
+     *
+     * This workaround satisfies all the conditions.
+     */
+    cmd_wa_gen6_pre_depth_stall_write(cmd);
+
+    /*
+     * From the Ivy Bridge PRM, volume 2 part 1, page 315:
+     *
+     *     "Restriction: Prior to changing Depth/Stencil Buffer state (i.e.,
+     *      any combination of 3DSTATE_DEPTH_BUFFER, 3DSTATE_CLEAR_PARAMS,
+     *      3DSTATE_STENCIL_BUFFER, 3DSTATE_HIER_DEPTH_BUFFER) SW must first
+     *      issue a pipelined depth stall (PIPE_CONTROL with Depth Stall bit
+     *      set), followed by a pipelined depth cache flush (PIPE_CONTROL with
+     *      Depth Flush Bit set, followed by another pipelined depth stall
+     *      (PIPE_CONTROL with Depth Stall Bit set), unless SW can otherwise
+     *      guarantee that the pipeline from WM onwards is already flushed
+     *      (e.g., via a preceding MI_FLUSH)."
+     */
+    gen6_PIPE_CONTROL(cmd, GEN6_PIPE_CONTROL_DEPTH_STALL, NULL, 0, 0);
+    gen6_PIPE_CONTROL(cmd, GEN6_PIPE_CONTROL_DEPTH_CACHE_FLUSH, NULL, 0, 0);
+    gen6_PIPE_CONTROL(cmd, GEN6_PIPE_CONTROL_DEPTH_STALL, NULL, 0, 0);
+}
+
+void cmd_batch_state_base_address(struct intel_cmd *cmd)
+{
+    const uint8_t cmd_len = 10;
+    const uint32_t dw0 = GEN6_RENDER_CMD(COMMON, STATE_BASE_ADDRESS) |
+                         (cmd_len - 2);
+    const uint32_t mocs = (cmd_gen(cmd) >= INTEL_GEN(7)) ?
+        (GEN7_MOCS_L3_WB << 8 | GEN7_MOCS_L3_WB << 4) : 0;
+    uint32_t pos;
+    uint32_t *dw;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+
+    dw[0] = dw0;
+    /* start offsets */
+    dw[1] = mocs | 1;
+    dw[2] = 1;
+    dw[3] = 1;
+    dw[4] = 1;
+    dw[5] = 1;
+    /* end offsets */
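+    /* an upper bound of zero (just the modify-enable bit) appears to
+     * disable the bounds check; 0xfffff000 is the highest 4KB-aligned
+     * bound */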
+    dw[6] = 1;
+    dw[7] = 1 + 0xfffff000;
+    dw[8] = 1 + 0xfffff000;
+    dw[9] = 1;
+
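+    /* patch in the surface, dynamic, and instruction state base
+     * addresses; the general and indirect object bases stay zero */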
+    cmd_reserve_reloc(cmd, 3);
+    cmd_batch_reloc_writer(cmd, pos + 2, INTEL_CMD_WRITER_SURFACE,
+            cmd->writers[INTEL_CMD_WRITER_SURFACE].sba_offset + 1);
+    cmd_batch_reloc_writer(cmd, pos + 3, INTEL_CMD_WRITER_STATE,
+            cmd->writers[INTEL_CMD_WRITER_STATE].sba_offset + 1);
+    cmd_batch_reloc_writer(cmd, pos + 5, INTEL_CMD_WRITER_INSTRUCTION,
+            cmd->writers[INTEL_CMD_WRITER_INSTRUCTION].sba_offset + 1);
+}
+
+void cmd_batch_push_const_alloc(struct intel_cmd *cmd)
+{
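+    /* the size field appears to be in 2KB units, giving 32KB of push
+     * constant space on GT3 and 16KB elsewhere */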
+    const uint32_t size = (cmd->dev->gpu->gt == 3) ? 16 : 8;
+    const uint8_t cmd_len = 2;
+    uint32_t offset = 0;
+    uint32_t *dw;
+
+    if (cmd_gen(cmd) <= INTEL_GEN(6))
+        return;
+
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    /* 3DSTATE_PUSH_CONSTANT_ALLOC_x */
+    cmd_batch_pointer(cmd, cmd_len * 5, &dw);
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_PUSH_CONSTANT_ALLOC_VS) | (cmd_len - 2);
+    dw[1] = offset << GEN7_PCB_ALLOC_DW1_OFFSET__SHIFT |
+            size << GEN7_PCB_ALLOC_DW1_SIZE__SHIFT;
+    offset += size;
+
+    dw += 2;
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_PUSH_CONSTANT_ALLOC_PS) | (cmd_len - 2);
+    dw[1] = offset << GEN7_PCB_ALLOC_DW1_OFFSET__SHIFT |
+            size << GEN7_PCB_ALLOC_DW1_SIZE__SHIFT;
+
+    dw += 2;
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_PUSH_CONSTANT_ALLOC_HS) | (cmd_len - 2);
+    dw[1] = 0 << GEN7_PCB_ALLOC_DW1_OFFSET__SHIFT |
+            0 << GEN7_PCB_ALLOC_DW1_SIZE__SHIFT;
+
+    dw += 2;
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_PUSH_CONSTANT_ALLOC_DS) | (cmd_len - 2);
+    dw[1] = 0 << GEN7_PCB_ALLOC_DW1_OFFSET__SHIFT |
+            0 << GEN7_PCB_ALLOC_DW1_SIZE__SHIFT;
+
+    dw += 2;
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_PUSH_CONSTANT_ALLOC_GS) | (cmd_len - 2);
+    dw[1] = 0 << GEN7_PCB_ALLOC_DW1_OFFSET__SHIFT |
+            0 << GEN7_PCB_ALLOC_DW1_SIZE__SHIFT;
+
+    /*
+     * From the Ivy Bridge PRM, volume 2 part 1, page 292:
+     *
+     *     "A PIPE_CONTROL command with the CS Stall bit set must be programmed
+     *      in the ring after this instruction
+     *      (3DSTATE_PUSH_CONSTANT_ALLOC_PS)."
+     */
+    cmd_wa_gen7_post_command_cs_stall(cmd);
+}
+
+void cmd_batch_flush(struct intel_cmd *cmd, uint32_t pipe_control_dw0)
+{
+    if (pipe_control_dw0 == 0)
+        return;
+
+    if (!cmd->bind.draw_count)
+        return;
+
+    assert(!(pipe_control_dw0 & GEN6_PIPE_CONTROL_WRITE__MASK));
+
+    /*
+     * From the Sandy Bridge PRM, volume 2 part 1, page 60:
+     *
+     *     "Before a PIPE_CONTROL with Write Cache Flush Enable =1, a
+     *      PIPE_CONTROL with any non-zero post-sync-op is required."
+     */
+    if (pipe_control_dw0 & GEN6_PIPE_CONTROL_RENDER_CACHE_FLUSH)
+        cmd_wa_gen6_pre_depth_stall_write(cmd);
+
+    /*
+     * From the Ivy Bridge PRM, volume 2 part 1, page 61:
+     *
+     *     "One of the following must also be set (when CS stall is set):
+     *
+     *       * Render Target Cache Flush Enable ([12] of DW1)
+     *       * Depth Cache Flush Enable ([0] of DW1)
+     *       * Stall at Pixel Scoreboard ([1] of DW1)
+     *       * Depth Stall ([13] of DW1)
+     *       * Post-Sync Operation ([13] of DW1)"
+     */
+    if ((pipe_control_dw0 & GEN6_PIPE_CONTROL_CS_STALL) &&
+        !(pipe_control_dw0 & (GEN6_PIPE_CONTROL_RENDER_CACHE_FLUSH |
+                              GEN6_PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+                              GEN6_PIPE_CONTROL_PIXEL_SCOREBOARD_STALL |
+                              GEN6_PIPE_CONTROL_DEPTH_STALL)))
+        pipe_control_dw0 |= GEN6_PIPE_CONTROL_PIXEL_SCOREBOARD_STALL;
+
+    gen6_PIPE_CONTROL(cmd, pipe_control_dw0, NULL, 0, 0);
+}
+
+void cmd_batch_flush_all(struct intel_cmd *cmd)
+{
+    cmd_batch_flush(cmd, GEN6_PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
+                         GEN6_PIPE_CONTROL_RENDER_CACHE_FLUSH |
+                         GEN6_PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+                         GEN6_PIPE_CONTROL_VF_CACHE_INVALIDATE |
+                         GEN6_PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
+                         GEN6_PIPE_CONTROL_CS_STALL);
+}
+
+void cmd_batch_depth_count(struct intel_cmd *cmd,
+                           struct intel_bo *bo,
+                           VkDeviceSize offset)
+{
+    cmd_wa_gen6_pre_depth_stall_write(cmd);
+
+    gen6_PIPE_CONTROL(cmd,
+            GEN6_PIPE_CONTROL_DEPTH_STALL |
+            GEN6_PIPE_CONTROL_WRITE_PS_DEPTH_COUNT,
+            bo, offset, 0);
+}
+
+void cmd_batch_timestamp(struct intel_cmd *cmd,
+                         struct intel_bo *bo,
+                         VkDeviceSize offset)
+{
+    /* need any WA or stall? */
+    gen6_PIPE_CONTROL(cmd, GEN6_PIPE_CONTROL_WRITE_TIMESTAMP, bo, offset, 0);
+}
+
+void cmd_batch_immediate(struct intel_cmd *cmd,
+                         uint32_t pipe_control_flags,
+                         struct intel_bo *bo,
+                         VkDeviceSize offset,
+                         uint64_t val)
+{
+    /* need any WA or stall? */
+    gen6_PIPE_CONTROL(cmd,
+            GEN6_PIPE_CONTROL_WRITE_IMM | pipe_control_flags,
+            bo, offset, val);
+}
+
+static void gen6_cc_states(struct intel_cmd *cmd)
+{
+    const struct intel_dynamic_blend *blend = &cmd->bind.state.blend;
+    const struct intel_dynamic_stencil *ss = &cmd->bind.state.stencil;
+    uint32_t blend_offset, ds_offset, cc_offset;
+    uint32_t stencil_ref;
+    uint32_t blend_color[4];
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    blend_offset = gen6_BLEND_STATE(cmd);
+
+    if (blend)
+        memcpy(blend_color, blend->blend_const, sizeof(blend_color));
+    else
+        memset(blend_color, 0, sizeof(blend_color));
+
+    if (ss) {
+        ds_offset = gen6_DEPTH_STENCIL_STATE(cmd, ss);
+        /* TODO: enable back facing stencil state */
+        /* same reference for both front and back faces */
+        stencil_ref = (ss->front.stencil_reference & 0xff) << 24 |
+                      (ss->front.stencil_reference & 0xff) << 16;
+    } else {
+        ds_offset = 0;
+        stencil_ref = 0;
+    }
+
+    cc_offset = gen6_COLOR_CALC_STATE(cmd, stencil_ref, blend_color);
+
+    gen6_3DSTATE_CC_STATE_POINTERS(cmd, blend_offset, ds_offset, cc_offset);
+}
+
+static void gen6_viewport_states(struct intel_cmd *cmd)
+{
+    const struct intel_dynamic_viewport *viewport = &cmd->bind.state.viewport;
+    uint32_t sf_offset, clip_offset, cc_offset, scissor_offset;
+
+    if (!viewport)
+        return;
+
+    /* 8 DWords of SF viewport, 4 of clip viewport, 2 of CC viewport, and
+     * 2 DWords of scissor rect per viewport */
+    assert(viewport->cmd_len ==
+            (8 + 4 + 2 + 2) * viewport->viewport_count);
+
+    sf_offset = cmd_state_write(cmd, INTEL_CMD_ITEM_SF_VIEWPORT,
+            GEN6_ALIGNMENT_SF_VIEWPORT, 8 * viewport->viewport_count,
+            viewport->cmd);
+
+    clip_offset = cmd_state_write(cmd, INTEL_CMD_ITEM_CLIP_VIEWPORT,
+            GEN6_ALIGNMENT_CLIP_VIEWPORT, 4 * viewport->viewport_count,
+            &viewport->cmd[viewport->cmd_clip_pos]);
+
+    cc_offset = cmd_state_write(cmd, INTEL_CMD_ITEM_CC_VIEWPORT,
+            GEN6_ALIGNMENT_CC_VIEWPORT, 2 * viewport->viewport_count,
+            &viewport->cmd[viewport->cmd_cc_pos]);
+
+    scissor_offset = cmd_state_write(cmd, INTEL_CMD_ITEM_SCISSOR_RECT,
+            GEN6_ALIGNMENT_SCISSOR_RECT, 2 * viewport->viewport_count,
+            &viewport->cmd[viewport->cmd_scissor_rect_pos]);
+
+    gen6_3DSTATE_VIEWPORT_STATE_POINTERS(cmd,
+            clip_offset, sf_offset, cc_offset);
+
+    gen6_3DSTATE_SCISSOR_STATE_POINTERS(cmd, scissor_offset);
+}
+
+static void gen7_cc_states(struct intel_cmd *cmd)
+{
+    const struct intel_dynamic_blend *blend = &cmd->bind.state.blend;
+    const struct intel_dynamic_depth_bounds *ds = &cmd->bind.state.depth_bounds;
+    const struct intel_dynamic_stencil *ss = &cmd->bind.state.stencil;
+    uint32_t stencil_ref;
+    uint32_t blend_color[4];
+    uint32_t offset;
+
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    if (!blend && !ds)
+        return;
+
+    offset = gen6_BLEND_STATE(cmd);
+    gen7_3dstate_pointer(cmd,
+            GEN7_RENDER_OPCODE_3DSTATE_BLEND_STATE_POINTERS, offset);
+
+    if (blend)
+        memcpy(blend_color, blend->blend_const, sizeof(blend_color));
+    else
+        memset(blend_color, 0, sizeof(blend_color));
+
+    if (ss) {
+        offset = gen6_DEPTH_STENCIL_STATE(cmd, ss);
+        /* TODO: enable back facing stencil state */
+        /* same reference for both front and back faces */
+        stencil_ref = (ss->front.stencil_reference & 0xff) << 24 |
+                      (ss->front.stencil_reference & 0xff) << 16;
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_DEPTH_STENCIL_STATE_POINTERS,
+                offset);
+    } else {
+        stencil_ref = 0;
+    }
+
+    offset = gen6_COLOR_CALC_STATE(cmd, stencil_ref, blend_color);
+    gen7_3dstate_pointer(cmd,
+            GEN6_RENDER_OPCODE_3DSTATE_CC_STATE_POINTERS, offset);
+}
+
+static void gen7_viewport_states(struct intel_cmd *cmd)
+{
+    const struct intel_dynamic_viewport *viewport = &cmd->bind.state.viewport;
+    uint32_t offset;
+
+    if (!viewport)
+        return;
+
+    assert(viewport->cmd_len == (16 + 2 + 2) * viewport->viewport_count);
+
+    offset = cmd_state_write(cmd, INTEL_CMD_ITEM_SF_VIEWPORT,
+            GEN7_ALIGNMENT_SF_CLIP_VIEWPORT, 16 * viewport->viewport_count,
+            viewport->cmd);
+    gen7_3dstate_pointer(cmd,
+            GEN7_RENDER_OPCODE_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP,
+            offset);
+
+    offset = cmd_state_write(cmd, INTEL_CMD_ITEM_CC_VIEWPORT,
+            GEN6_ALIGNMENT_CC_VIEWPORT, 2 * viewport->viewport_count,
+            &viewport->cmd[viewport->cmd_cc_pos]);
+    gen7_3dstate_pointer(cmd,
+            GEN7_RENDER_OPCODE_3DSTATE_VIEWPORT_STATE_POINTERS_CC,
+            offset);
+
+    offset = cmd_state_write(cmd, INTEL_CMD_ITEM_SCISSOR_RECT,
+                             GEN6_ALIGNMENT_SCISSOR_RECT, 2 * viewport->viewport_count,
+                             &viewport->cmd[viewport->cmd_scissor_rect_pos]);
+    gen7_3dstate_pointer(cmd,
+                         GEN6_RENDER_OPCODE_3DSTATE_SCISSOR_STATE_POINTERS,
+                         offset);
+}
+
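+/*
+ * Emit a zeroed 3DSTATE_CONSTANT_* command for the given subopcode,
+ * disabling push constants for that stage (the shader argument is
+ * currently unused).
+ */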
+static void gen6_pcb(struct intel_cmd *cmd, int subop,
+                     const struct intel_pipeline_shader *sh)
+{
+    const uint8_t cmd_len = 5;
+    uint32_t *dw;
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+
+    dw[0] = GEN6_RENDER_TYPE_RENDER |
+            GEN6_RENDER_SUBTYPE_3D |
+            subop | (cmd_len - 2);
+    dw[1] = 0;
+    dw[2] = 0;
+    dw[3] = 0;
+    dw[4] = 0;
+}
+
+static void gen7_pcb(struct intel_cmd *cmd, int subop,
+                     const struct intel_pipeline_shader *sh)
+{
+    const uint8_t cmd_len = 7;
+    uint32_t *dw;
+
+    cmd_batch_pointer(cmd, cmd_len, &dw);
+
+    dw[0] = GEN6_RENDER_TYPE_RENDER |
+            GEN6_RENDER_SUBTYPE_3D |
+            subop | (cmd_len - 2);
+    dw[1] = 0;
+    dw[2] = 0;
+    dw[3] = 0;
+    dw[4] = 0;
+    dw[5] = 0;
+    dw[6] = 0;
+}
+
+static uint32_t emit_samplers(struct intel_cmd *cmd,
+                              const struct intel_pipeline_rmap *rmap)
+{
+    const struct intel_desc_region *region = cmd->dev->desc_region;
+    const struct intel_cmd_dset_data *data = &cmd->bind.dset.graphics_data;
+    const uint32_t border_len = (cmd_gen(cmd) >= INTEL_GEN(7)) ? 4 : 12;
+    const uint32_t border_stride =
+        u_align(border_len, GEN6_ALIGNMENT_SAMPLER_BORDER_COLOR_STATE / 4);
+    uint32_t border_offset, *border_dw, sampler_offset, *sampler_dw;
+    uint32_t surface_count;
+    uint32_t i;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    if (!rmap || !rmap->sampler_count)
+        return 0;
+
+    surface_count = rmap->rt_count + rmap->texture_resource_count + rmap->resource_count + rmap->uav_count;
+
+    /*
+     * Note that we cannot use cmd_state_pointer() for the border colors:
+     * the cmd_state_pointer() call below could invalidate the returned
+     * pointer.  Reserve the space first and fetch the pointer afterwards
+     * with cmd_state_update().
+     */
+    border_offset = cmd_state_reserve(cmd, INTEL_CMD_ITEM_BLOB,
+            GEN6_ALIGNMENT_SAMPLER_BORDER_COLOR_STATE,
+            border_stride * rmap->sampler_count);
+
+    sampler_offset = cmd_state_pointer(cmd, INTEL_CMD_ITEM_SAMPLER,
+            GEN6_ALIGNMENT_SAMPLER_STATE,
+            4 * rmap->sampler_count, &sampler_dw);
+
+    cmd_state_update(cmd, border_offset,
+            border_stride * rmap->sampler_count, &border_dw);
+
+    for (i = 0; i < rmap->sampler_count; i++) {
+        const struct intel_pipeline_rmap_slot *slot =
+            &rmap->slots[surface_count + i];
+        struct intel_desc_offset desc_offset;
+        const struct intel_sampler *sampler;
+
+        switch (slot->type) {
+        case INTEL_PIPELINE_RMAP_SAMPLER:
+            intel_desc_offset_add(&desc_offset, &slot->u.sampler,
+                    &data->set_offsets[slot->index]);
+            intel_desc_region_read_sampler(region, &desc_offset, &sampler);
+            break;
+        case INTEL_PIPELINE_RMAP_UNUSED:
+            sampler = NULL;
+            break;
+        default:
+            assert(!"unexpected rmap type");
+            sampler = NULL;
+            break;
+        }
+
+        if (sampler) {
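+            /* sampler->cmd[] holds SAMPLER_STATE DWords 0, 1, and 3
+             * followed by the border color; DW2 is patched to point at
+             * the border color written above */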
+            memcpy(border_dw, &sampler->cmd[3], border_len * 4);
+
+            sampler_dw[0] = sampler->cmd[0];
+            sampler_dw[1] = sampler->cmd[1];
+            sampler_dw[2] = border_offset;
+            sampler_dw[3] = sampler->cmd[2];
+        } else {
+            sampler_dw[0] = GEN6_SAMPLER_DW0_DISABLE;
+            sampler_dw[1] = 0;
+            sampler_dw[2] = 0;
+            sampler_dw[3] = 0;
+        }
+
+        border_offset += border_stride * 4;
+        border_dw += border_stride;
+        sampler_dw += 4;
+    }
+
+    return sampler_offset;
+}
+
+static uint32_t emit_binding_table(struct intel_cmd *cmd,
+                                   const struct intel_pipeline_rmap *rmap,
+                                   const VkShaderStageFlagBits stage)
+{
+    const struct intel_desc_region *region = cmd->dev->desc_region;
+    const struct intel_cmd_dset_data *data = &cmd->bind.dset.graphics_data;
+    const uint32_t sba_offset =
+        cmd->writers[INTEL_CMD_WRITER_SURFACE].sba_offset;
+    uint32_t binding_table[256], offset;
+    uint32_t surface_count, i;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    surface_count = (rmap) ?
+        rmap->rt_count + rmap->texture_resource_count + rmap->resource_count + rmap->uav_count : 0;
+    if (!surface_count)
+        return 0;
+
+    assert(surface_count <= ARRAY_SIZE(binding_table));
+
+    for (i = 0; i < surface_count; i++) {
+        const struct intel_pipeline_rmap_slot *slot = &rmap->slots[i];
+        struct intel_null_view null_view;
+        bool need_null_view = false;
+
+        switch (slot->type) {
+        case INTEL_PIPELINE_RMAP_RT:
+            {
+                const struct intel_render_pass_subpass *subpass =
+                    cmd->bind.render_pass_subpass;
+                const struct intel_fb *fb = cmd->bind.fb;
+                const struct intel_att_view *view =
+                    (slot->index < subpass->color_count &&
+                     subpass->color_indices[slot->index] < fb->view_count) ?
+                    fb->views[subpass->color_indices[slot->index]] : NULL;
+
+                if (view) {
+                    offset = cmd_surface_write(cmd, INTEL_CMD_ITEM_SURFACE,
+                            GEN6_ALIGNMENT_SURFACE_STATE,
+                            view->cmd_len, view->att_cmd);
+
+                    cmd_reserve_reloc(cmd, 1);
+                    cmd_surface_reloc(cmd, offset, 1, view->img->obj.mem->bo,
+                            view->att_cmd[1], INTEL_RELOC_WRITE);
+                } else {
+                    need_null_view = true;
+                }
+            }
+            break;
+        case INTEL_PIPELINE_RMAP_SURFACE:
+            {
+                const struct intel_pipeline_layout U_ASSERT_ONLY *pipeline_layout =
+                    cmd->bind.pipeline.graphics->pipeline_layout;
+                const int32_t dyn_idx = slot->u.surface.dynamic_offset_index;
+                struct intel_desc_offset desc_offset;
+                const struct intel_mem *mem;
+                bool read_only;
+                const uint32_t *cmd_data;
+                uint32_t cmd_len;
+
+                assert(dyn_idx < 0 ||
+                        dyn_idx < pipeline_layout->total_dynamic_desc_count);
+
+                intel_desc_offset_add(&desc_offset, &slot->u.surface.offset,
+                        &data->set_offsets[slot->index]);
+
+                intel_desc_region_read_surface(region, &desc_offset, stage,
+                        &mem, &read_only, &cmd_data, &cmd_len);
+                if (mem) {
+                    const uint32_t dynamic_offset = (dyn_idx >= 0) ?
+                        data->dynamic_offsets[dyn_idx] : 0;
+                    const uint32_t reloc_flags =
+                        (read_only) ? 0 : INTEL_RELOC_WRITE;
+
+                    offset = cmd_surface_write(cmd, INTEL_CMD_ITEM_SURFACE,
+                            GEN6_ALIGNMENT_SURFACE_STATE,
+                            cmd_len, cmd_data);
+
+                    cmd_reserve_reloc(cmd, 1);
+                    cmd_surface_reloc(cmd, offset, 1, mem->bo,
+                            cmd_data[1] + dynamic_offset, reloc_flags);
+                } else {
+                    need_null_view = true;
+                }
+            }
+            break;
+        case INTEL_PIPELINE_RMAP_UNUSED:
+            need_null_view = true;
+            break;
+        default:
+            assert(!"unexpected rmap type");
+            need_null_view = true;
+            break;
+        }
+
+        if (need_null_view) {
+            intel_null_view_init(&null_view, cmd->dev);
+            offset = cmd_surface_write(cmd, INTEL_CMD_ITEM_SURFACE,
+                    GEN6_ALIGNMENT_SURFACE_STATE,
+                    null_view.cmd_len, null_view.cmd);
+        }
+
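+        /* binding table entries are offsets from Surface State Base
+         * Address, hence the sba_offset subtraction */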
+        binding_table[i] = offset - sba_offset;
+    }
+
+    offset = cmd_surface_write(cmd, INTEL_CMD_ITEM_BINDING_TABLE,
+            GEN6_ALIGNMENT_BINDING_TABLE_STATE,
+            surface_count, binding_table) - sba_offset;
+
+    /* there is a 64KB limit on BINDING_TABLE_STATEs */
+    assert(offset + sizeof(uint32_t) * surface_count <= 64 * 1024);
+
+    return offset;
+}
+
+static void gen6_3DSTATE_VERTEX_BUFFERS(struct intel_cmd *cmd)
+{
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+    const uint8_t cmd_len = 1 + 4 * pipeline->vb_count;
+    uint32_t *dw;
+    uint32_t pos, i;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    if (!pipeline->vb_count)
+        return;
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_VERTEX_BUFFERS) | (cmd_len - 2);
+    dw++;
+    pos++;
+
+    for (i = 0; i < pipeline->vb_count; i++) {
+        assert(pipeline->vb[i].stride <= 2048);
+
+        dw[0] = i << GEN6_VB_DW0_INDEX__SHIFT |
+                pipeline->vb[i].stride;
+
+        if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+            dw[0] |= GEN7_MOCS_L3_WB << GEN6_VB_DW0_MOCS__SHIFT |
+                     GEN7_VB_DW0_ADDR_MODIFIED;
+        }
+
+        switch (pipeline->vb[i].inputRate) {
+        case VK_VERTEX_INPUT_RATE_VERTEX:
+            dw[0] |= GEN6_VB_DW0_ACCESS_VERTEXDATA;
+            dw[3] = 0;
+            break;
+        case VK_VERTEX_INPUT_RATE_INSTANCE:
+            dw[0] |= GEN6_VB_DW0_ACCESS_INSTANCEDATA;
+            dw[3] = 1;
+            break;
+        default:
+            assert(!"unknown step rate");
+            dw[0] |= GEN6_VB_DW0_ACCESS_VERTEXDATA;
+            dw[3] = 0;
+            break;
+        }
+
+        if (cmd->bind.vertex.buf[i]) {
+            const struct intel_buf *buf = cmd->bind.vertex.buf[i];
+            const VkDeviceSize offset = cmd->bind.vertex.offset[i];
+
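+            /* DW1 and DW2 receive the buffer start address and the
+             * (inclusive) end address via relocations */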
+            cmd_reserve_reloc(cmd, 2);
+            cmd_batch_reloc(cmd, pos + 1, buf->obj.mem->bo, offset, 0);
+            cmd_batch_reloc(cmd, pos + 2, buf->obj.mem->bo, buf->size - 1, 0);
+        } else {
+            dw[0] |= GEN6_VB_DW0_IS_NULL;
+            dw[1] = 0;
+            dw[2] = 0;
+        }
+
+        dw += 4;
+        pos += 4;
+    }
+}
+
+static void gen6_3DSTATE_VS(struct intel_cmd *cmd)
+{
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+    const struct intel_pipeline_shader *vs = &pipeline->vs;
+    const uint8_t cmd_len = 6;
+    const uint32_t dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_VS) | (cmd_len - 2);
+    uint32_t dw2, dw4, dw5, *dw;
+    uint32_t pos;
+    int vue_read_len;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    /*
+     * From the Sandy Bridge PRM, volume 2 part 1, page 135:
+     *
+     *     "(Vertex URB Entry Read Length) Specifies the number of pairs of
+     *      128-bit vertex elements to be passed into the payload for each
+     *      vertex."
+     *
+     *     "It is UNDEFINED to set this field to 0 indicating no Vertex URB
+     *      data to be read and passed to the thread."
+     */
+    vue_read_len = (vs->in_count + 1) / 2;
+    if (!vue_read_len)
+        vue_read_len = 1;
+
+    dw2 = (vs->sampler_count + 3) / 4 << GEN6_THREADDISP_SAMPLER_COUNT__SHIFT |
+          vs->surface_count << GEN6_THREADDISP_BINDING_TABLE_SIZE__SHIFT;
+
+    dw4 = vs->urb_grf_start << GEN6_VS_DW4_URB_GRF_START__SHIFT |
+          vue_read_len << GEN6_VS_DW4_URB_READ_LEN__SHIFT |
+          0 << GEN6_VS_DW4_URB_READ_OFFSET__SHIFT;
+
+    dw5 = GEN6_VS_DW5_STATISTICS |
+          GEN6_VS_DW5_VS_ENABLE;
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7.5))
+        dw5 |= (vs->max_threads - 1) << GEN75_VS_DW5_MAX_THREADS__SHIFT;
+    else
+        dw5 |= (vs->max_threads - 1) << GEN6_VS_DW5_MAX_THREADS__SHIFT;
+
+    if (pipeline->disable_vs_cache)
+        dw5 |= GEN6_VS_DW5_CACHE_DISABLE;
+
+    pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+    dw[0] = dw0;
+    dw[1] = cmd->bind.pipeline.vs_offset;
+    dw[2] = dw2;
+    dw[3] = 0; /* scratch */
+    dw[4] = dw4;
+    dw[5] = dw5;
+
+    if (vs->per_thread_scratch_size)
+        gen6_add_scratch_space(cmd, pos + 3, pipeline, vs);
+}
+
+static void emit_shader_resources(struct intel_cmd *cmd)
+{
+    /* five HW shader stages */
+    uint32_t binding_tables[5], samplers[5];
+
+    binding_tables[0] = emit_binding_table(cmd,
+            cmd->bind.pipeline.graphics->vs.rmap,
+            VK_SHADER_STAGE_VERTEX_BIT);
+    binding_tables[1] = emit_binding_table(cmd,
+            cmd->bind.pipeline.graphics->tcs.rmap,
+            VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT);
+    binding_tables[2] = emit_binding_table(cmd,
+            cmd->bind.pipeline.graphics->tes.rmap,
+            VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT);
+    binding_tables[3] = emit_binding_table(cmd,
+            cmd->bind.pipeline.graphics->gs.rmap,
+            VK_SHADER_STAGE_GEOMETRY_BIT);
+    binding_tables[4] = emit_binding_table(cmd,
+            cmd->bind.pipeline.graphics->fs.rmap,
+            VK_SHADER_STAGE_FRAGMENT_BIT);
+
+    samplers[0] = emit_samplers(cmd, cmd->bind.pipeline.graphics->vs.rmap);
+    samplers[1] = emit_samplers(cmd, cmd->bind.pipeline.graphics->tcs.rmap);
+    samplers[2] = emit_samplers(cmd, cmd->bind.pipeline.graphics->tes.rmap);
+    samplers[3] = emit_samplers(cmd, cmd->bind.pipeline.graphics->gs.rmap);
+    samplers[4] = emit_samplers(cmd, cmd->bind.pipeline.graphics->fs.rmap);
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS_VS,
+                binding_tables[0]);
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS_HS,
+                binding_tables[1]);
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS_DS,
+                binding_tables[2]);
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS_GS,
+                binding_tables[3]);
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS_PS,
+                binding_tables[4]);
+
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_SAMPLER_STATE_POINTERS_VS,
+                samplers[0]);
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_SAMPLER_STATE_POINTERS_HS,
+                samplers[1]);
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_SAMPLER_STATE_POINTERS_DS,
+                samplers[2]);
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_SAMPLER_STATE_POINTERS_GS,
+                samplers[3]);
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_SAMPLER_STATE_POINTERS_PS,
+                samplers[4]);
+    } else {
+        assert(!binding_tables[1] && !binding_tables[2]);
+        gen6_3DSTATE_BINDING_TABLE_POINTERS(cmd,
+                binding_tables[0], binding_tables[3], binding_tables[4]);
+
+        assert(!samplers[1] && !samplers[2]);
+        gen6_3DSTATE_SAMPLER_STATE_POINTERS(cmd,
+                samplers[0], samplers[3], samplers[4]);
+    }
+}
+
+static void emit_msaa(struct intel_cmd *cmd)
+{
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+
+    if (!cmd->bind.render_pass_changed)
+        return;
+
+    cmd_wa_gen6_pre_multisample_depth_flush(cmd);
+    gen6_3DSTATE_MULTISAMPLE(cmd, pipeline->sample_count);
+}
+
+static void emit_rt(struct intel_cmd *cmd)
+{
+    const struct intel_fb *fb = cmd->bind.fb;
+
+    if (!cmd->bind.render_pass_changed)
+        return;
+
+    cmd_wa_gen6_pre_depth_stall_write(cmd);
+    gen6_3DSTATE_DRAWING_RECTANGLE(cmd, fb->width,
+            fb->height);
+}
+
+static void emit_ds(struct intel_cmd *cmd)
+{
+    const struct intel_render_pass *rp = cmd->bind.render_pass;
+    const struct intel_render_pass_subpass *subpass =
+        cmd->bind.render_pass_subpass;
+    const struct intel_fb *fb = cmd->bind.fb;
+    const struct intel_att_view *view =
+        (subpass->ds_index < rp->attachment_count) ?
+        fb->views[subpass->ds_index] : NULL;
+
+    if (!cmd->bind.render_pass_changed)
+        return;
+
+    if (!view) {
+        /* all zeros */
+        static const struct intel_att_view null_view;
+        view = &null_view;
+    }
+
+    cmd_wa_gen6_pre_ds_flush(cmd);
+    gen6_3DSTATE_DEPTH_BUFFER(cmd, view, subpass->ds_optimal);
+    gen6_3DSTATE_STENCIL_BUFFER(cmd, view, subpass->ds_optimal);
+    gen6_3DSTATE_HIER_DEPTH_BUFFER(cmd, view, subpass->ds_optimal);
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7))
+        gen7_3DSTATE_CLEAR_PARAMS(cmd, 0);
+    else
+        gen6_3DSTATE_CLEAR_PARAMS(cmd, 0);
+}
+
+static uint32_t emit_shader(struct intel_cmd *cmd,
+                            const struct intel_pipeline_shader *shader)
+{
+    struct intel_cmd_shader_cache *cache = &cmd->bind.shader_cache;
+    uint32_t offset;
+    uint32_t i;
+
+    /* see if the shader is already in the cache */
+    for (i = 0; i < cache->used; i++) {
+        if (cache->entries[i].shader == (const void *) shader)
+            return cache->entries[i].kernel_offset;
+    }
+
+    offset = cmd_instruction_write(cmd, shader->codeSize, shader->pCode);
+
+    /* grow the cache if full */
+    if (cache->used >= cache->count) {
+        const uint32_t count = cache->count + 16;
+        void *entries;
+
+        entries = intel_alloc(cmd, sizeof(cache->entries[0]) * count, sizeof(int),
+                VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+        if (entries) {
+            if (cache->entries) {
+                memcpy(entries, cache->entries,
+                        sizeof(cache->entries[0]) * cache->used);
+                intel_free(cmd, cache->entries);
+            }
+
+            cache->entries = entries;
+            cache->count = count;
+        }
+    }
+
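+    /* if the cache could not be grown, the shader is still emitted, just
+     * not cached; the next call will simply write it again */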
+    /* add the shader to the cache */
+    if (cache->used < cache->count) {
+        cache->entries[cache->used].shader = (const void *) shader;
+        cache->entries[cache->used].kernel_offset = offset;
+        cache->used++;
+    }
+
+    return offset;
+}
+
+static void emit_graphics_pipeline(struct intel_cmd *cmd)
+{
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+
+    if (pipeline->wa_flags & INTEL_CMD_WA_GEN6_PRE_DEPTH_STALL_WRITE)
+        cmd_wa_gen6_pre_depth_stall_write(cmd);
+    if (pipeline->wa_flags & INTEL_CMD_WA_GEN6_PRE_COMMAND_SCOREBOARD_STALL)
+        cmd_wa_gen6_pre_command_scoreboard_stall(cmd);
+    if (pipeline->wa_flags & INTEL_CMD_WA_GEN7_PRE_VS_DEPTH_STALL_WRITE)
+        cmd_wa_gen7_pre_vs_depth_stall_write(cmd);
+
+    /* 3DSTATE_URB_VS, etc. */
+    assert(pipeline->cmd_len);
+    cmd_batch_write(cmd, pipeline->cmd_len, pipeline->cmds);
+
+    if (pipeline->active_shaders & SHADER_VERTEX_FLAG) {
+        cmd->bind.pipeline.vs_offset = emit_shader(cmd, &pipeline->vs);
+    }
+    if (pipeline->active_shaders & SHADER_TESS_CONTROL_FLAG) {
+        cmd->bind.pipeline.tcs_offset = emit_shader(cmd, &pipeline->tcs);
+    }
+    if (pipeline->active_shaders & SHADER_TESS_EVAL_FLAG) {
+        cmd->bind.pipeline.tes_offset = emit_shader(cmd, &pipeline->tes);
+    }
+    if (pipeline->active_shaders & SHADER_GEOMETRY_FLAG) {
+        cmd->bind.pipeline.gs_offset = emit_shader(cmd, &pipeline->gs);
+    }
+    if (pipeline->active_shaders & SHADER_FRAGMENT_FLAG) {
+        cmd->bind.pipeline.fs_offset = emit_shader(cmd, &pipeline->fs);
+    }
+
+    if (pipeline->wa_flags & INTEL_CMD_WA_GEN7_POST_COMMAND_CS_STALL)
+        cmd_wa_gen7_post_command_cs_stall(cmd);
+    if (pipeline->wa_flags & INTEL_CMD_WA_GEN7_POST_COMMAND_DEPTH_STALL)
+        cmd_wa_gen7_post_command_depth_stall(cmd);
+}
+
+static void
+viewport_get_guardband(const struct intel_gpu *gpu,
+                       int center_x, int center_y,
+                       int *min_gbx, int *max_gbx,
+                       int *min_gby, int *max_gby)
+{
+   /*
+    * From the Sandy Bridge PRM, volume 2 part 1, page 234:
+    *
+    *     "Per-Device Guardband Extents
+    *
+    *       - Supported X,Y ScreenSpace "Guardband" Extent: [-16K,16K-1]
+    *       - Maximum Post-Clamp Delta (X or Y): 16K"
+    *
+    *     "In addition, in order to be correctly rendered, objects must have a
+    *      screenspace bounding box not exceeding 8K in the X or Y direction.
+    *      This additional restriction must also be comprehended by software,
+    *      i.e., enforced by use of clipping."
+    *
+    * From the Ivy Bridge PRM, volume 2 part 1, page 248:
+    *
+    *     "Per-Device Guardband Extents
+    *
+    *       - Supported X,Y ScreenSpace "Guardband" Extent: [-32K,32K-1]
+    *       - Maximum Post-Clamp Delta (X or Y): N/A"
+    *
+    *     "In addition, in order to be correctly rendered, objects must have a
+    *      screenspace bounding box not exceeding 8K in the X or Y direction.
+    *      This additional restriction must also be comprehended by software,
+    *      i.e., enforced by use of clipping."
+    *
+    * Combined, the bounding box of any object may not exceed 8K in either
+    * width or height.
+    *
+    * Below we set the guardband to an 8K-by-8K square centered at the
+    * viewport.  This makes sure all objects passing the GB test are
+    * valid to the renderer, and those failing the XY clipping have a
+    * better chance of passing the GB test.
+    */
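+   /*
+    * Illustrative example: on Gen7 (max_extent = 32768), a viewport
+    * centered at x = 31000 would put the guardband edge past the limit,
+    * so the center is clamped to 32768 - 4096 = 28672 and the X guardband
+    * becomes [24576, 32768].
+    */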
+   const int max_extent = (intel_gpu_gen(gpu) >= INTEL_GEN(7)) ? 32768 : 16384;
+   const int half_len = 8192 / 2;
+
+   /* make sure the guardband is within the valid range */
+   if (center_x - half_len < -max_extent)
+      center_x = -max_extent + half_len;
+   else if (center_x + half_len > max_extent - 1)
+      center_x = max_extent - half_len;
+
+   if (center_y - half_len < -max_extent)
+      center_y = -max_extent + half_len;
+   else if (center_y + half_len > max_extent - 1)
+      center_y = max_extent - half_len;
+
+   *min_gbx = center_x - half_len;
+   *max_gbx = center_x + half_len;
+   *min_gby = center_y - half_len;
+   *max_gby = center_y + half_len;
+}
+
+static void
+viewport_state_cmd(struct intel_dynamic_viewport *state,
+                   const struct intel_gpu *gpu,
+                   uint32_t count)
+{
+    INTEL_GPU_ASSERT(gpu, 6, 7.5);
+
+    state->viewport_count = count;
+
+    assert(count <= INTEL_MAX_VIEWPORTS);
+
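+    /*
+     * Layout sketch for count = 2 (illustrative): on Gen7, SF and CLIP
+     * viewports share interleaved 16-dword SF_CLIP entries (CLIP at dword
+     * 8 of each entry), so cc_pos = 32 and scissor_rect_pos = 36, for 40
+     * dwords total; on Gen6 the arrays are back to back: SF at [0,16),
+     * CLIP at [16,24), CC at [24,28), and scissors at [28,32).
+     */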
+    if (intel_gpu_gen(gpu) >= INTEL_GEN(7)) {
+        state->cmd_len = 16 * count;
+
+        state->cmd_clip_pos = 8;
+    } else {
+        state->cmd_len = 8 * count;
+
+        state->cmd_clip_pos = state->cmd_len;
+        state->cmd_len += 4 * count;
+    }
+
+    state->cmd_cc_pos = state->cmd_len;
+    state->cmd_len += 2 * count;
+
+    state->cmd_scissor_rect_pos = state->cmd_len;
+    state->cmd_len += 2 * count;
+
+    assert(sizeof(uint32_t) * state->cmd_len <= sizeof(state->cmd));
+}
+
+static void
+set_viewport_state(struct intel_cmd *cmd)
+{
+    const struct intel_gpu *gpu = cmd->dev->gpu;
+    struct intel_dynamic_viewport *state = &cmd->bind.state.viewport;
+    const uint32_t sf_stride = (intel_gpu_gen(gpu) >= INTEL_GEN(7)) ? 16 : 8;
+    const uint32_t clip_stride = (intel_gpu_gen(gpu) >= INTEL_GEN(7)) ? 16 : 4;
+    uint32_t *sf_viewport, *clip_viewport, *cc_viewport, *scissor_rect;
+    uint32_t i;
+
+    INTEL_GPU_ASSERT(gpu, 6, 7.5);
+
+    viewport_state_cmd(state, gpu, cmd->bind.state.viewport.viewport_count);
+
+    sf_viewport = state->cmd;
+    clip_viewport = state->cmd + state->cmd_clip_pos;
+    cc_viewport = state->cmd + state->cmd_cc_pos;
+    scissor_rect = state->cmd + state->cmd_scissor_rect_pos;
+
+    for (i = 0; i < cmd->bind.state.viewport.viewport_count; i++) {
+        const VkViewport *viewport = &cmd->bind.state.viewport.viewports[i];
+        uint32_t *dw = NULL;
+        float translate[3], scale[3];
+        int min_gbx, max_gbx, min_gby, max_gby;
+
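+        /* illustrative: a viewport at (0, 0) sized 800x600 with depth
+         * range [0, 1] yields scale = (400, 300, 1) and
+         * translate = (400, 300, 0) */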
+        scale[0] = viewport->width / 2.0f;
+        scale[1] = viewport->height / 2.0f;
+        scale[2] = viewport->maxDepth - viewport->minDepth;
+        translate[0] = viewport->x + scale[0];
+        translate[1] = viewport->y + scale[1];
+        translate[2] = viewport->minDepth;
+
+        viewport_get_guardband(gpu, (int) translate[0], (int) translate[1],
+                &min_gbx, &max_gbx, &min_gby, &max_gby);
+
+        /* SF_VIEWPORT */
+        dw = sf_viewport;
+        dw[0] = u_fui(scale[0]);
+        dw[1] = u_fui(scale[1]);
+        dw[2] = u_fui(scale[2]);
+        dw[3] = u_fui(translate[0]);
+        dw[4] = u_fui(translate[1]);
+        dw[5] = u_fui(translate[2]);
+        dw[6] = 0;
+        dw[7] = 0;
+        sf_viewport += sf_stride;
+
+        /* CLIP_VIEWPORT */
+        dw = clip_viewport;
+        dw[0] = u_fui(((float) min_gbx - translate[0]) / fabsf(scale[0]));
+        dw[1] = u_fui(((float) max_gbx - translate[0]) / fabsf(scale[0]));
+        dw[2] = u_fui(((float) min_gby - translate[1]) / fabsf(scale[1]));
+        dw[3] = u_fui(((float) max_gby - translate[1]) / fabsf(scale[1]));
+        clip_viewport += clip_stride;
+
+        /* CC_VIEWPORT */
+        dw = cc_viewport;
+        dw[0] = u_fui(viewport->minDepth);
+        dw[1] = u_fui(viewport->maxDepth);
+        cc_viewport += 2;
+    }
+
+    for (i = 0; i < cmd->bind.state.viewport.viewport_count; i++) {
+        const VkRect2D *scissor = &cmd->bind.state.viewport.scissors[i];
+        /* SCISSOR_RECT */
+        int16_t max_x, max_y;
+        uint32_t *dw = NULL;
+
+        max_x = (scissor->offset.x + scissor->extent.width - 1) & 0xffff;
+        max_y = (scissor->offset.y + scissor->extent.height - 1) & 0xffff;
+
+        dw = scissor_rect;
+        if (scissor->extent.width && scissor->extent.height) {
+            dw[0] = (scissor->offset.y & 0xffff) << 16 |
+                                                    (scissor->offset.x & 0xffff);
+            dw[1] = max_y << 16 | max_x;
+        } else {
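+            /* an empty scissor: min (1, 1) > max (0, 0), so every pixel
+             * is rejected */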
+            dw[0] = 1 << 16 | 1;
+            dw[1] = 0;
+        }
+        scissor_rect += 2;
+    }
+}
+
+static void emit_bounded_states(struct intel_cmd *cmd)
+{
+    set_viewport_state(cmd);
+
+    emit_msaa(cmd);
+
+    emit_graphics_pipeline(cmd);
+
+    emit_rt(cmd);
+    emit_ds(cmd);
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        gen7_cc_states(cmd);
+        gen7_viewport_states(cmd);
+
+        gen7_pcb(cmd, GEN6_RENDER_OPCODE_3DSTATE_CONSTANT_VS,
+                &cmd->bind.pipeline.graphics->vs);
+        gen7_pcb(cmd, GEN6_RENDER_OPCODE_3DSTATE_CONSTANT_GS,
+                &cmd->bind.pipeline.graphics->gs);
+        gen7_pcb(cmd, GEN6_RENDER_OPCODE_3DSTATE_CONSTANT_PS,
+                &cmd->bind.pipeline.graphics->fs);
+
+        gen7_3DSTATE_GS(cmd);
+        gen6_3DSTATE_CLIP(cmd);
+        gen7_3DSTATE_SF(cmd);
+        gen7_3DSTATE_WM(cmd);
+        gen7_3DSTATE_PS(cmd);
+    } else {
+        gen6_cc_states(cmd);
+        gen6_viewport_states(cmd);
+
+        gen6_pcb(cmd, GEN6_RENDER_OPCODE_3DSTATE_CONSTANT_VS,
+                &cmd->bind.pipeline.graphics->vs);
+        gen6_pcb(cmd, GEN6_RENDER_OPCODE_3DSTATE_CONSTANT_GS,
+                &cmd->bind.pipeline.graphics->gs);
+        gen6_pcb(cmd, GEN6_RENDER_OPCODE_3DSTATE_CONSTANT_PS,
+                &cmd->bind.pipeline.graphics->fs);
+
+        gen6_3DSTATE_GS(cmd);
+        gen6_3DSTATE_CLIP(cmd);
+        gen6_3DSTATE_SF(cmd);
+        gen6_3DSTATE_WM(cmd);
+    }
+
+    emit_shader_resources(cmd);
+
+    cmd_wa_gen6_pre_depth_stall_write(cmd);
+
+    gen6_3DSTATE_VERTEX_BUFFERS(cmd);
+    gen6_3DSTATE_VS(cmd);
+}
+
+static uint32_t gen6_meta_DEPTH_STENCIL_STATE(struct intel_cmd *cmd,
+                                              const struct intel_cmd_meta *meta)
+{
+    const uint8_t cmd_align = GEN6_ALIGNMENT_DEPTH_STENCIL_STATE;
+    const uint8_t cmd_len = 3;
+    uint32_t dw[3];
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    /* TODO: aspect is now a mask, can you do both? */
+    if (meta->ds.aspect == VK_IMAGE_ASPECT_DEPTH_BIT) {
+        dw[0] = 0;
+        dw[1] = 0;
+
+        if (meta->ds.op == INTEL_CMD_META_DS_RESOLVE) {
+            dw[2] = GEN6_ZS_DW2_DEPTH_TEST_ENABLE |
+                    GEN6_COMPAREFUNCTION_NEVER << 27 |
+                    GEN6_ZS_DW2_DEPTH_WRITE_ENABLE;
+        } else {
+            dw[2] = GEN6_COMPAREFUNCTION_ALWAYS << 27 |
+                    GEN6_ZS_DW2_DEPTH_WRITE_ENABLE;
+        }
+    } else if (meta->ds.aspect == VK_IMAGE_ASPECT_STENCIL_BIT) {
+        dw[0] = GEN6_ZS_DW0_STENCIL_TEST_ENABLE |
+                (GEN6_COMPAREFUNCTION_ALWAYS) << 28 |
+                (GEN6_STENCILOP_KEEP) << 25 |
+                (GEN6_STENCILOP_KEEP) << 22 |
+                (GEN6_STENCILOP_REPLACE) << 19 |
+                GEN6_ZS_DW0_STENCIL_WRITE_ENABLE |
+                GEN6_ZS_DW0_STENCIL1_ENABLE |
+                (GEN6_COMPAREFUNCTION_ALWAYS) << 12 |
+                (GEN6_STENCILOP_KEEP) << 9 |
+                (GEN6_STENCILOP_KEEP) << 6 |
+                (GEN6_STENCILOP_REPLACE) << 3;
+
+        dw[1] = 0xff << GEN6_ZS_DW1_STENCIL0_VALUEMASK__SHIFT |
+                0xff << GEN6_ZS_DW1_STENCIL0_WRITEMASK__SHIFT |
+                0xff << GEN6_ZS_DW1_STENCIL1_VALUEMASK__SHIFT |
+                0xff << GEN6_ZS_DW1_STENCIL1_WRITEMASK__SHIFT;
+        dw[2] = 0;
+    }
+
+    return cmd_state_write(cmd, INTEL_CMD_ITEM_DEPTH_STENCIL,
+            cmd_align, cmd_len, dw);
+}
+
+static void gen6_meta_dynamic_states(struct intel_cmd *cmd)
+{
+    const struct intel_cmd_meta *meta = cmd->bind.meta;
+    uint32_t blend_offset, ds_offset, cc_offset, cc_vp_offset, *dw;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    blend_offset = 0;
+    ds_offset = 0;
+    cc_offset = 0;
+    cc_vp_offset = 0;
+
+    if (meta->mode == INTEL_CMD_META_FS_RECT) {
+        /* BLEND_STATE */
+        blend_offset = cmd_state_pointer(cmd, INTEL_CMD_ITEM_BLEND,
+                GEN6_ALIGNMENT_BLEND_STATE, 2, &dw);
+        dw[0] = 0;
+        dw[1] = GEN6_RT_DW1_COLORCLAMP_RTFORMAT | 0x3;
+    }
+
+    if (meta->mode != INTEL_CMD_META_VS_POINTS) {
+        if (meta->ds.aspect == VK_IMAGE_ASPECT_DEPTH_BIT ||
+            meta->ds.aspect == VK_IMAGE_ASPECT_STENCIL_BIT) {
+            const uint32_t blend_color[4] = { 0, 0, 0, 0 };
+            uint32_t stencil_ref = (meta->ds.stencil_ref & 0xff) << 24 |
+                                   (meta->ds.stencil_ref & 0xff) << 16;
+
+            /* DEPTH_STENCIL_STATE */
+            ds_offset = gen6_meta_DEPTH_STENCIL_STATE(cmd, meta);
+
+            /* COLOR_CALC_STATE */
+            cc_offset = gen6_COLOR_CALC_STATE(cmd,
+                    stencil_ref, blend_color);
+
+            /* CC_VIEWPORT */
+            cc_vp_offset = cmd_state_pointer(cmd, INTEL_CMD_ITEM_CC_VIEWPORT,
+                    GEN6_ALIGNMENT_CC_VIEWPORT, 2, &dw);
+            dw[0] = u_fui(0.0f);
+            dw[1] = u_fui(1.0f);
+        } else {
+            /* DEPTH_STENCIL_STATE */
+            ds_offset = cmd_state_pointer(cmd, INTEL_CMD_ITEM_DEPTH_STENCIL,
+                    GEN6_ALIGNMENT_DEPTH_STENCIL_STATE,
+                    GEN6_DEPTH_STENCIL_STATE__SIZE, &dw);
+            memset(dw, 0, sizeof(*dw) * GEN6_DEPTH_STENCIL_STATE__SIZE);
+        }
+    }
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_BLEND_STATE_POINTERS,
+                blend_offset);
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_DEPTH_STENCIL_STATE_POINTERS,
+                ds_offset);
+        gen7_3dstate_pointer(cmd,
+                GEN6_RENDER_OPCODE_3DSTATE_CC_STATE_POINTERS, cc_offset);
+
+        gen7_3dstate_pointer(cmd,
+                GEN7_RENDER_OPCODE_3DSTATE_VIEWPORT_STATE_POINTERS_CC,
+                cc_vp_offset);
+    } else {
+        /* 3DSTATE_CC_STATE_POINTERS */
+        gen6_3DSTATE_CC_STATE_POINTERS(cmd, blend_offset, ds_offset, cc_offset);
+
+        /* 3DSTATE_VIEWPORT_STATE_POINTERS */
+        cmd_batch_pointer(cmd, 4, &dw);
+        dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_VIEWPORT_STATE_POINTERS) | (4 - 2) |
+                GEN6_VP_PTR_DW0_CC_CHANGED;
+        dw[1] = 0;
+        dw[2] = 0;
+        dw[3] = cc_vp_offset;
+    }
+}
+
+static void gen6_meta_surface_states(struct intel_cmd *cmd)
+{
+    const struct intel_cmd_meta *meta = cmd->bind.meta;
+    uint32_t binding_table[2] = { 0, 0 };
+    uint32_t offset;
+    const uint32_t sba_offset =
+        cmd->writers[INTEL_CMD_WRITER_SURFACE].sba_offset;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    if (meta->mode == INTEL_CMD_META_DEPTH_STENCIL_RECT)
+        return;
+
+    /* SURFACE_STATEs */
+    if (meta->src.valid) {
+        offset = cmd_surface_write(cmd, INTEL_CMD_ITEM_SURFACE,
+                GEN6_ALIGNMENT_SURFACE_STATE,
+                meta->src.surface_len, meta->src.surface);
+
+        cmd_reserve_reloc(cmd, 1);
+        if (meta->src.reloc_flags & INTEL_CMD_RELOC_TARGET_IS_WRITER) {
+            cmd_surface_reloc_writer(cmd, offset, 1,
+                    meta->src.reloc_target, meta->src.reloc_offset);
+        } else {
+            cmd_surface_reloc(cmd, offset, 1,
+                    (struct intel_bo *) meta->src.reloc_target,
+                    meta->src.reloc_offset, meta->src.reloc_flags);
+        }
+
+        binding_table[0] = offset - sba_offset;
+    }
+    if (meta->dst.valid) {
+        offset = cmd_surface_write(cmd, INTEL_CMD_ITEM_SURFACE,
+                GEN6_ALIGNMENT_SURFACE_STATE,
+                meta->dst.surface_len, meta->dst.surface);
+
+        cmd_reserve_reloc(cmd, 1);
+        cmd_surface_reloc(cmd, offset, 1,
+                (struct intel_bo *) meta->dst.reloc_target,
+                meta->dst.reloc_offset, meta->dst.reloc_flags);
+
+        binding_table[1] = offset - sba_offset;
+    }
+
+    /* BINDING_TABLE */
+    offset = cmd_surface_write(cmd, INTEL_CMD_ITEM_BINDING_TABLE,
+            GEN6_ALIGNMENT_BINDING_TABLE_STATE,
+            2, binding_table);
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        const int subop = (meta->mode == INTEL_CMD_META_VS_POINTS) ?
+            GEN7_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS_VS :
+            GEN7_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS_PS;
+        gen7_3dstate_pointer(cmd, subop, offset - sba_offset);
+    } else {
+        /* 3DSTATE_BINDING_TABLE_POINTERS */
+        if (meta->mode == INTEL_CMD_META_VS_POINTS)
+            gen6_3DSTATE_BINDING_TABLE_POINTERS(cmd, offset - sba_offset, 0, 0);
+        else
+            gen6_3DSTATE_BINDING_TABLE_POINTERS(cmd, 0, 0, offset - sba_offset);
+    }
+}
+
+static void gen6_meta_urb(struct intel_cmd *cmd)
+{
+    const int vs_entry_count = (cmd->dev->gpu->gt == 2) ? 256 : 128;
+    uint32_t *dw;
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    /* 3DSTATE_URB */
+    cmd_batch_pointer(cmd, 3, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_URB) | (3 - 2);
+    dw[1] = vs_entry_count << GEN6_URB_DW1_VS_ENTRY_COUNT__SHIFT;
+    dw[2] = 0;
+}
+
+static void gen7_meta_urb(struct intel_cmd *cmd)
+{
+    const int pcb_alloc = (cmd->dev->gpu->gt == 3) ? 16 : 8;
+    const int urb_offset = pcb_alloc / 8;
+    int vs_entry_count;
+    uint32_t *dw;
+
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    cmd_wa_gen7_pre_vs_depth_stall_write(cmd);
+
+    switch (cmd_gen(cmd)) {
+    case INTEL_GEN(7.5):
+        vs_entry_count = (cmd->dev->gpu->gt >= 2) ? 1664 : 640;
+        break;
+    case INTEL_GEN(7):
+    default:
+        vs_entry_count = (cmd->dev->gpu->gt == 2) ? 704 : 512;
+        break;
+    }
+
+    /* 3DSTATE_URB_x */
+    cmd_batch_pointer(cmd, 8, &dw);
+
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_URB_VS) | (2 - 2);
+    dw[1] = urb_offset << GEN7_URB_DW1_OFFSET__SHIFT |
+            vs_entry_count;
+    dw += 2;
+
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_URB_HS) | (2 - 2);
+    dw[1] = urb_offset << GEN7_URB_DW1_OFFSET__SHIFT;
+    dw += 2;
+
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_URB_DS) | (2 - 2);
+    dw[1] = urb_offset << GEN7_URB_DW1_OFFSET__SHIFT;
+    dw += 2;
+
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_URB_GS) | (2 - 2);
+    dw[1] = urb_offset << GEN7_URB_DW1_OFFSET__SHIFT;
+    dw += 2;
+}
+
+static void gen6_meta_vf(struct intel_cmd *cmd)
+{
+    const struct intel_cmd_meta *meta = cmd->bind.meta;
+    uint32_t vb_start, vb_end, vb_stride;
+    int ve_format, ve_z_source;
+    uint32_t *dw;
+    uint32_t pos;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    switch (meta->mode) {
+    case INTEL_CMD_META_VS_POINTS:
+        cmd_batch_pointer(cmd, 3, &dw);
+        dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_VERTEX_ELEMENTS) | (3 - 2);
+        dw[1] = GEN6_VE_DW0_VALID;
+        dw[2] = GEN6_VFCOMP_STORE_VID << GEN6_VE_DW1_COMP0__SHIFT |
+                GEN6_VFCOMP_NOSTORE << GEN6_VE_DW1_COMP1__SHIFT |
+                GEN6_VFCOMP_NOSTORE << GEN6_VE_DW1_COMP2__SHIFT |
+                GEN6_VFCOMP_NOSTORE << GEN6_VE_DW1_COMP3__SHIFT;
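+        /* no vertex buffer is bound in this mode; the meta VS consumes
+         * only the vertex ID */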
+        return;
+    case INTEL_CMD_META_FS_RECT:
+        {
+            uint32_t vertices[3][2];
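+            /* RECTLIST takes only three corners; the hardware derives
+             * the fourth */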
+
+            vertices[0][0] = meta->dst.x + meta->width;
+            vertices[0][1] = meta->dst.y + meta->height;
+            vertices[1][0] = meta->dst.x;
+            vertices[1][1] = meta->dst.y + meta->height;
+            vertices[2][0] = meta->dst.x;
+            vertices[2][1] = meta->dst.y;
+
+            vb_start = cmd_state_write(cmd, INTEL_CMD_ITEM_BLOB, 32,
+                    sizeof(vertices) / 4, (const uint32_t *) vertices);
+
+            vb_end = vb_start + sizeof(vertices) - 1;
+            vb_stride = sizeof(vertices[0]);
+            ve_z_source = GEN6_VFCOMP_STORE_0;
+            ve_format = GEN6_FORMAT_R32G32_USCALED;
+        }
+        break;
+    case INTEL_CMD_META_DEPTH_STENCIL_RECT:
+        {
+            float vertices[3][3];
+
+            vertices[0][0] = (float) (meta->dst.x + meta->width);
+            vertices[0][1] = (float) (meta->dst.y + meta->height);
+            vertices[0][2] = u_uif(meta->clear_val[0]);
+            vertices[1][0] = (float) meta->dst.x;
+            vertices[1][1] = (float) (meta->dst.y + meta->height);
+            vertices[1][2] = u_uif(meta->clear_val[0]);
+            vertices[2][0] = (float) meta->dst.x;
+            vertices[2][1] = (float) meta->dst.y;
+            vertices[2][2] = u_uif(meta->clear_val[0]);
+
+            vb_start = cmd_state_write(cmd, INTEL_CMD_ITEM_BLOB, 32,
+                    sizeof(vertices) / 4, (const uint32_t *) vertices);
+
+            vb_end = vb_start + sizeof(vertices) - 1;
+            vb_stride = sizeof(vertices[0]);
+            ve_z_source = GEN6_VFCOMP_STORE_SRC;
+            ve_format = GEN6_FORMAT_R32G32B32_FLOAT;
+        }
+        break;
+    default:
+        assert(!"unknown meta mode");
+        return;
+    }
+
+    /* 3DSTATE_VERTEX_BUFFERS */
+    pos = cmd_batch_pointer(cmd, 5, &dw);
+
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_VERTEX_BUFFERS) | (5 - 2);
+    dw[1] = vb_stride;
+    if (cmd_gen(cmd) >= INTEL_GEN(7))
+        dw[1] |= GEN7_VB_DW0_ADDR_MODIFIED;
+
+    cmd_reserve_reloc(cmd, 2);
+    cmd_batch_reloc_writer(cmd, pos + 2, INTEL_CMD_WRITER_STATE, vb_start);
+    cmd_batch_reloc_writer(cmd, pos + 3, INTEL_CMD_WRITER_STATE, vb_end);
+
+    dw[4] = 0;
+
+    /* 3DSTATE_VERTEX_ELEMENTS */
+    cmd_batch_pointer(cmd, 5, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_VERTEX_ELEMENTS) | (5 - 2);
+    dw[1] = GEN6_VE_DW0_VALID;
+    dw[2] = GEN6_VFCOMP_STORE_0 << GEN6_VE_DW1_COMP0__SHIFT | /* Reserved */
+            GEN6_VFCOMP_STORE_0 << GEN6_VE_DW1_COMP1__SHIFT | /* Render Target Array Index */
+            GEN6_VFCOMP_STORE_0 << GEN6_VE_DW1_COMP2__SHIFT | /* Viewport Index */
+            GEN6_VFCOMP_STORE_0 << GEN6_VE_DW1_COMP3__SHIFT;  /* Point Width */
+    dw[3] = GEN6_VE_DW0_VALID |
+            ve_format << GEN6_VE_DW0_FORMAT__SHIFT;
+    dw[4] = GEN6_VFCOMP_STORE_SRC  << GEN6_VE_DW1_COMP0__SHIFT |
+            GEN6_VFCOMP_STORE_SRC  << GEN6_VE_DW1_COMP1__SHIFT |
+            ve_z_source            << GEN6_VE_DW1_COMP2__SHIFT |
+            GEN6_VFCOMP_STORE_1_FP << GEN6_VE_DW1_COMP3__SHIFT;
+}
+
+static uint32_t gen6_meta_vs_constants(struct intel_cmd *cmd)
+{
+    const struct intel_cmd_meta *meta = cmd->bind.meta;
+    /* one GPR: a 256-bit GRF register holds eight dwords */
+    uint32_t consts[8];
+    uint32_t const_count;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    switch (meta->shader_id) {
+    case INTEL_DEV_META_VS_FILL_MEM:
+        consts[0] = meta->dst.x;
+        consts[1] = meta->clear_val[0];
+        const_count = 2;
+        break;
+    case INTEL_DEV_META_VS_COPY_MEM:
+    case INTEL_DEV_META_VS_COPY_MEM_UNALIGNED:
+        consts[0] = meta->dst.x;
+        consts[1] = meta->src.x;
+        const_count = 2;
+        break;
+    case INTEL_DEV_META_VS_COPY_R8_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R16_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32G32_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32G32B32A32_TO_MEM:
+        consts[0] = meta->src.x;
+        consts[1] = meta->src.y;
+        consts[2] = meta->width;
+        consts[3] = meta->dst.x;
+        const_count = 4;
+        break;
+    default:
+        assert(!"unknown meta shader id");
+        const_count = 0;
+        break;
+    }
+
+    /* this can be skipped but it makes state dumping prettier */
+    memset(&consts[const_count], 0, sizeof(consts[0]) * (8 - const_count));
+
+    return cmd_state_write(cmd, INTEL_CMD_ITEM_BLOB, 32, 8, consts);
+}
+
+static void gen6_meta_vs(struct intel_cmd *cmd)
+{
+    const struct intel_cmd_meta *meta = cmd->bind.meta;
+    const struct intel_pipeline_shader *sh =
+        intel_dev_get_meta_shader(cmd->dev, meta->shader_id);
+    uint32_t offset, *dw;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    if (meta->mode != INTEL_CMD_META_VS_POINTS) {
+        uint32_t cmd_len;
+
+        /* 3DSTATE_CONSTANT_VS */
+        cmd_len = (cmd_gen(cmd) >= INTEL_GEN(7)) ? 7 : 5;
+        cmd_batch_pointer(cmd, cmd_len, &dw);
+        dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_CONSTANT_VS) | (cmd_len - 2);
+        memset(&dw[1], 0, sizeof(*dw) * (cmd_len - 1));
+
+        /* 3DSTATE_VS */
+        cmd_batch_pointer(cmd, 6, &dw);
+        dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_VS) | (6 - 2);
+        memset(&dw[1], 0, sizeof(*dw) * (6 - 1));
+
+        return;
+    }
+
+    assert(meta->dst.valid && sh->uses == INTEL_SHADER_USE_VID);
+
+    /* 3DSTATE_CONSTANT_VS */
+    offset = gen6_meta_vs_constants(cmd);
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        cmd_batch_pointer(cmd, 7, &dw);
+        dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_CONSTANT_VS) | (7 - 2);
+        dw[1] = 1 << GEN7_CONSTANT_DW1_BUFFER0_READ_LEN__SHIFT;
+        dw[2] = 0;
+        dw[3] = offset | GEN7_MOCS_L3_WB;
+        dw[4] = 0;
+        dw[5] = 0;
+        dw[6] = 0;
+    } else {
+        cmd_batch_pointer(cmd, 5, &dw);
+        dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_CONSTANT_VS) | (5 - 2) |
+                1 << GEN6_CONSTANT_DW0_BUFFER_ENABLES__SHIFT;
+        dw[1] = offset;
+        dw[2] = 0;
+        dw[3] = 0;
+        dw[4] = 0;
+    }
+
+    /* 3DSTATE_VS */
+    offset = emit_shader(cmd, sh);
+    cmd_batch_pointer(cmd, 6, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_VS) | (6 - 2);
+    dw[1] = offset;
+    dw[2] = GEN6_THREADDISP_SPF |
+            (sh->sampler_count + 3) / 4 << GEN6_THREADDISP_SAMPLER_COUNT__SHIFT |
+             sh->surface_count << GEN6_THREADDISP_BINDING_TABLE_SIZE__SHIFT;
+    dw[3] = 0; /* scratch */
+    dw[4] = sh->urb_grf_start << GEN6_VS_DW4_URB_GRF_START__SHIFT |
+            1 << GEN6_VS_DW4_URB_READ_LEN__SHIFT;
+
+    dw[5] = GEN6_VS_DW5_CACHE_DISABLE |
+            GEN6_VS_DW5_VS_ENABLE;
+    if (cmd_gen(cmd) >= INTEL_GEN(7.5))
+        dw[5] |= (sh->max_threads - 1) << GEN75_VS_DW5_MAX_THREADS__SHIFT;
+    else
+        dw[5] |= (sh->max_threads - 1) << GEN6_VS_DW5_MAX_THREADS__SHIFT;
+
+    assert(!sh->per_thread_scratch_size);
+}
+
+static void gen6_meta_disabled(struct intel_cmd *cmd)
+{
+    uint32_t *dw;
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    /* 3DSTATE_CONSTANT_GS */
+    cmd_batch_pointer(cmd, 5, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_CONSTANT_GS) | (5 - 2);
+    dw[1] = 0;
+    dw[2] = 0;
+    dw[3] = 0;
+    dw[4] = 0;
+
+    /* 3DSTATE_GS */
+    cmd_batch_pointer(cmd, 7, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_GS) | (7 - 2);
+    dw[1] = 0;
+    dw[2] = 0;
+    dw[3] = 0;
+    dw[4] = 1 << GEN6_GS_DW4_URB_READ_LEN__SHIFT;
+    dw[5] = GEN6_GS_DW5_STATISTICS;
+    dw[6] = 0;
+
+    /* 3DSTATE_SF */
+    cmd_batch_pointer(cmd, 20, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_SF) | (20 - 2);
+    dw[1] = 1 << GEN7_SBE_DW1_URB_READ_LEN__SHIFT;
+    memset(&dw[2], 0, 18 * sizeof(*dw));
+}
+
+static void gen7_meta_disabled(struct intel_cmd *cmd)
+{
+    uint32_t *dw;
+
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    /* 3DSTATE_CONSTANT_HS */
+    cmd_batch_pointer(cmd, 7, &dw);
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_CONSTANT_HS) | (7 - 2);
+    memset(&dw[1], 0, sizeof(*dw) * (7 - 1));
+
+    /* 3DSTATE_HS */
+    cmd_batch_pointer(cmd, 7, &dw);
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_HS) | (7 - 2);
+    memset(&dw[1], 0, sizeof(*dw) * (7 - 1));
+
+    /* 3DSTATE_TE */
+    cmd_batch_pointer(cmd, 4, &dw);
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_TE) | (4 - 2);
+    memset(&dw[1], 0, sizeof(*dw) * (4 - 1));
+
+    /* 3DSTATE_CONSTANT_DS */
+    cmd_batch_pointer(cmd, 7, &dw);
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_CONSTANT_DS) | (7 - 2);
+    memset(&dw[1], 0, sizeof(*dw) * (7 - 1));
+
+    /* 3DSTATE_DS */
+    cmd_batch_pointer(cmd, 6, &dw);
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_DS) | (6 - 2);
+    memset(&dw[1], 0, sizeof(*dw) * (6 - 1));
+
+    /* 3DSTATE_CONSTANT_GS */
+    cmd_batch_pointer(cmd, 7, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_CONSTANT_GS) | (7 - 2);
+    memset(&dw[1], 0, sizeof(*dw) * (7 - 1));
+
+    /* 3DSTATE_GS */
+    cmd_batch_pointer(cmd, 7, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_GS) | (7 - 2);
+    memset(&dw[1], 0, sizeof(*dw) * (7 - 1));
+
+    /* 3DSTATE_STREAMOUT */
+    cmd_batch_pointer(cmd, 3, &dw);
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_STREAMOUT) | (3 - 2);
+    memset(&dw[1], 0, sizeof(*dw) * (3 - 1));
+
+    /* 3DSTATE_SF */
+    cmd_batch_pointer(cmd, 7, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_SF) | (7 - 2);
+    memset(&dw[1], 0, sizeof(*dw) * (7 - 1));
+
+    /* 3DSTATE_SBE */
+    cmd_batch_pointer(cmd, 14, &dw);
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_SBE) | (14 - 2);
+    dw[1] = 1 << GEN7_SBE_DW1_URB_READ_LEN__SHIFT;
+    memset(&dw[2], 0, sizeof(*dw) * (14 - 2));
+}
+
+static void gen6_meta_clip(struct intel_cmd *cmd)
+{
+    const struct intel_cmd_meta *meta = cmd->bind.meta;
+    uint32_t *dw;
+
+    /* 3DSTATE_CLIP */
+    cmd_batch_pointer(cmd, 4, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_CLIP) | (4 - 2);
+    dw[1] = 0;
+    if (meta->mode == INTEL_CMD_META_VS_POINTS) {
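+        /* VS_POINTS meta shaders do their work through memory writes in
+         * the VS, so nothing needs to reach the rasterizer; reject all
+         * primitives */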
+        dw[2] = GEN6_CLIP_DW2_CLIP_ENABLE |
+                GEN6_CLIP_DW2_CLIPMODE_REJECT_ALL;
+    } else {
+        dw[2] = 0;
+    }
+    dw[3] = 0;
+}
+
+static void gen6_meta_wm(struct intel_cmd *cmd)
+{
+    const struct intel_cmd_meta *meta = cmd->bind.meta;
+    uint32_t *dw;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    cmd_wa_gen6_pre_multisample_depth_flush(cmd);
+
+    /* 3DSTATE_MULTISAMPLE */
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        cmd_batch_pointer(cmd, 4, &dw);
+        dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_MULTISAMPLE) | (4 - 2);
+        dw[1] =
+            (meta->sample_count <= 1) ? GEN6_MULTISAMPLE_DW1_NUMSAMPLES_1 :
+            (meta->sample_count <= 4) ? GEN6_MULTISAMPLE_DW1_NUMSAMPLES_4 :
+                                        GEN7_MULTISAMPLE_DW1_NUMSAMPLES_8;
+        dw[2] = 0;
+        dw[3] = 0;
+    } else {
+        cmd_batch_pointer(cmd, 3, &dw);
+        dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_MULTISAMPLE) | (3 - 2);
+        dw[1] = (meta->sample_count <= 1) ? GEN6_MULTISAMPLE_DW1_NUMSAMPLES_1 :
+                                       GEN6_MULTISAMPLE_DW1_NUMSAMPLES_4;
+        dw[2] = 0;
+    }
+
+    /* 3DSTATE_SAMPLE_MASK */
+    cmd_batch_pointer(cmd, 2, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_SAMPLE_MASK) | (2 - 2);
+    dw[1] = (1 << meta->sample_count) - 1;
+
+    /* 3DSTATE_DRAWING_RECTANGLE */
+    cmd_batch_pointer(cmd, 4, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_DRAWING_RECTANGLE) | (4 - 2);
+    if (meta->mode == INTEL_CMD_META_VS_POINTS) {
+        /* unused */
+        dw[1] = 0;
+        dw[2] = 0;
+    } else {
+        dw[1] = meta->dst.y << 16 | meta->dst.x;
+        dw[2] = (meta->dst.y + meta->height - 1) << 16 |
+                (meta->dst.x + meta->width - 1);
+    }
+    dw[3] = 0;
+}
+
+static uint32_t gen6_meta_ps_constants(struct intel_cmd *cmd)
+{
+    const struct intel_cmd_meta *meta = cmd->bind.meta;
+    uint32_t offset_x, offset_y;
+    /* one GPR */
+    uint32_t consts[8];
+    uint32_t const_count;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    /* underflow is fine here */
+    offset_x = meta->src.x - meta->dst.x;
+    offset_y = meta->src.y - meta->dst.y;
+
+    switch (meta->shader_id) {
+    case INTEL_DEV_META_FS_COPY_MEM:
+    case INTEL_DEV_META_FS_COPY_1D:
+    case INTEL_DEV_META_FS_COPY_1D_ARRAY:
+    case INTEL_DEV_META_FS_COPY_2D:
+    case INTEL_DEV_META_FS_COPY_2D_ARRAY:
+    case INTEL_DEV_META_FS_COPY_2D_MS:
+        consts[0] = offset_x;
+        consts[1] = offset_y;
+        consts[2] = meta->src.layer;
+        consts[3] = meta->src.lod;
+        const_count = 4;
+        break;
+    case INTEL_DEV_META_FS_COPY_1D_TO_MEM:
+    case INTEL_DEV_META_FS_COPY_1D_ARRAY_TO_MEM:
+    case INTEL_DEV_META_FS_COPY_2D_TO_MEM:
+    case INTEL_DEV_META_FS_COPY_2D_ARRAY_TO_MEM:
+    case INTEL_DEV_META_FS_COPY_2D_MS_TO_MEM:
+        consts[0] = offset_x;
+        consts[1] = offset_y;
+        consts[2] = meta->src.layer;
+        consts[3] = meta->src.lod;
+        consts[4] = meta->src.x;
+        consts[5] = meta->width;
+        const_count = 6;
+        break;
+    case INTEL_DEV_META_FS_COPY_MEM_TO_IMG:
+        consts[0] = offset_x;
+        consts[1] = offset_y;
+        consts[2] = meta->width;
+        const_count = 3;
+        break;
+    case INTEL_DEV_META_FS_CLEAR_COLOR:
+        consts[0] = meta->clear_val[0];
+        consts[1] = meta->clear_val[1];
+        consts[2] = meta->clear_val[2];
+        consts[3] = meta->clear_val[3];
+        const_count = 4;
+        break;
+    case INTEL_DEV_META_FS_CLEAR_DEPTH:
+        consts[0] = meta->clear_val[0];
+        consts[1] = meta->clear_val[1];
+        const_count = 2;
+        break;
+    case INTEL_DEV_META_FS_RESOLVE_2X:
+    case INTEL_DEV_META_FS_RESOLVE_4X:
+    case INTEL_DEV_META_FS_RESOLVE_8X:
+    case INTEL_DEV_META_FS_RESOLVE_16X:
+        consts[0] = offset_x;
+        consts[1] = offset_y;
+        const_count = 2;
+        break;
+    default:
+        assert(!"unknown meta shader id");
+        const_count = 0;
+        break;
+    }
+
+    /* this can be skipped but it makes state dumping prettier */
+    memset(&consts[const_count], 0, sizeof(consts[0]) * (8 - const_count));
+
+    return cmd_state_write(cmd, INTEL_CMD_ITEM_BLOB, 32, 8, consts);
+}
+
+static void gen6_meta_ps(struct intel_cmd *cmd)
+{
+    const struct intel_cmd_meta *meta = cmd->bind.meta;
+    const struct intel_pipeline_shader *sh =
+        intel_dev_get_meta_shader(cmd->dev, meta->shader_id);
+    uint32_t offset, *dw;
+
+    CMD_ASSERT(cmd, 6, 6);
+
+    if (meta->mode != INTEL_CMD_META_FS_RECT) {
+        /* 3DSTATE_CONSTANT_PS */
+        cmd_batch_pointer(cmd, 5, &dw);
+        dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_CONSTANT_PS) | (5 - 2);
+        dw[1] = 0;
+        dw[2] = 0;
+        dw[3] = 0;
+        dw[4] = 0;
+
+        /* 3DSTATE_WM */
+        cmd_batch_pointer(cmd, 9, &dw);
+        dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_WM) | (9 - 2);
+        dw[1] = 0;
+        dw[2] = 0;
+        dw[3] = 0;
+
+        switch (meta->ds.op) {
+        case INTEL_CMD_META_DS_HIZ_CLEAR:
+            dw[4] = GEN6_WM_DW4_DEPTH_CLEAR;
+            break;
+        case INTEL_CMD_META_DS_HIZ_RESOLVE:
+            dw[4] = GEN6_WM_DW4_HIZ_RESOLVE;
+            break;
+        case INTEL_CMD_META_DS_RESOLVE:
+            dw[4] = GEN6_WM_DW4_DEPTH_RESOLVE;
+            break;
+        default:
+            dw[4] = 0;
+            break;
+        }
+
+        dw[5] = (sh->max_threads - 1) << GEN6_WM_DW5_MAX_THREADS__SHIFT;
+        dw[6] = 0;
+        dw[7] = 0;
+        dw[8] = 0;
+
+        return;
+    }
+
+    /* a normal color write */
+    assert(meta->dst.valid && !sh->uses);
+
+    /* 3DSTATE_CONSTANT_PS */
+    offset = gen6_meta_ps_constants(cmd);
+    cmd_batch_pointer(cmd, 5, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_CONSTANT_PS) | (5 - 2) |
+            1 << GEN6_CONSTANT_DW0_BUFFER_ENABLES__SHIFT;
+    dw[1] = offset;
+    dw[2] = 0;
+    dw[3] = 0;
+    dw[4] = 0;
+
+    /* 3DSTATE_WM */
+    offset = emit_shader(cmd, sh);
+    cmd_batch_pointer(cmd, 9, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_WM) | (9 - 2);
+    dw[1] = offset;
+    dw[2] = (sh->sampler_count + 3) / 4 << GEN6_THREADDISP_SAMPLER_COUNT__SHIFT |
+             sh->surface_count << GEN6_THREADDISP_BINDING_TABLE_SIZE__SHIFT;
+    dw[3] = 0; /* scratch */
+    dw[4] = sh->urb_grf_start << GEN6_WM_DW4_URB_GRF_START0__SHIFT;
+    dw[5] = (sh->max_threads - 1) << GEN6_WM_DW5_MAX_THREADS__SHIFT |
+            GEN6_WM_DW5_PS_DISPATCH_ENABLE |
+            GEN6_PS_DISPATCH_16 << GEN6_WM_DW5_PS_DISPATCH_MODE__SHIFT;
+
+    dw[6] = sh->in_count << GEN6_WM_DW6_SF_ATTR_COUNT__SHIFT |
+            GEN6_WM_DW6_PS_POSOFFSET_NONE |
+            GEN6_WM_DW6_ZW_INTERP_PIXEL |
+            sh->barycentric_interps << GEN6_WM_DW6_BARYCENTRIC_INTERP__SHIFT |
+            GEN6_WM_DW6_POINT_RASTRULE_UPPER_RIGHT;
+    if (meta->sample_count > 1) {
+        dw[6] |= GEN6_WM_DW6_MSRASTMODE_ON_PATTERN |
+                 GEN6_WM_DW6_MSDISPMODE_PERPIXEL;
+    } else {
+        dw[6] |= GEN6_WM_DW6_MSRASTMODE_OFF_PIXEL |
+                 GEN6_WM_DW6_MSDISPMODE_PERSAMPLE;
+    }
+    dw[7] = 0;
+    dw[8] = 0;
+
+    assert(!sh->per_thread_scratch_size);
+}
+
+static void gen7_meta_ps(struct intel_cmd *cmd)
+{
+    const struct intel_cmd_meta *meta = cmd->bind.meta;
+    const struct intel_pipeline_shader *sh =
+        intel_dev_get_meta_shader(cmd->dev, meta->shader_id);
+    uint32_t offset, *dw;
+
+    CMD_ASSERT(cmd, 7, 7.5);
+
+    if (meta->mode != INTEL_CMD_META_FS_RECT) {
+        /* 3DSTATE_WM */
+        cmd_batch_pointer(cmd, 3, &dw);
+        dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_WM) | (3 - 2);
+
+        switch (meta->ds.op) {
+        case INTEL_CMD_META_DS_HIZ_CLEAR:
+            dw[1] = GEN7_WM_DW1_DEPTH_CLEAR;
+            break;
+        case INTEL_CMD_META_DS_HIZ_RESOLVE:
+            dw[1] = GEN7_WM_DW1_HIZ_RESOLVE;
+            break;
+        case INTEL_CMD_META_DS_RESOLVE:
+            dw[1] = GEN7_WM_DW1_DEPTH_RESOLVE;
+            break;
+        default:
+            dw[1] = 0;
+            break;
+        }
+
+        dw[2] = 0;
+
+        /* 3DSTATE_CONSTANT_PS */
+        cmd_batch_pointer(cmd, 7, &dw);
+        dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_CONSTANT_PS) | (7 - 2);
+        memset(&dw[1], 0, sizeof(*dw) * (7 - 1));
+
+        /* 3DSTATE_PS */
+        cmd_batch_pointer(cmd, 8, &dw);
+        dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_PS) | (8 - 2);
+        dw[1] = 0;
+        dw[2] = 0;
+        dw[3] = 0;
+        /* required to avoid hangs */
+        dw[4] = GEN6_PS_DISPATCH_8 << GEN7_PS_DW4_DISPATCH_MODE__SHIFT |
+                (sh->max_threads - 1) << GEN7_PS_DW4_MAX_THREADS__SHIFT;
+        dw[5] = 0;
+        dw[6] = 0;
+        dw[7] = 0;
+
+        return;
+    }
+
+    /* a normal color write */
+    assert(meta->dst.valid && !sh->uses);
+
+    /* 3DSTATE_WM */
+    cmd_batch_pointer(cmd, 3, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_WM) | (3 - 2);
+    dw[1] = GEN7_WM_DW1_PS_DISPATCH_ENABLE |
+            GEN7_WM_DW1_ZW_INTERP_PIXEL |
+            sh->barycentric_interps << GEN7_WM_DW1_BARYCENTRIC_INTERP__SHIFT |
+            GEN7_WM_DW1_POINT_RASTRULE_UPPER_RIGHT;
+    dw[2] = 0;
+
+    /* 3DSTATE_CONSTANT_PS */
+    offset = gen6_meta_ps_constants(cmd);
+    cmd_batch_pointer(cmd, 7, &dw);
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_CONSTANT_PS) | (7 - 2);
+    dw[1] = 1 << GEN7_CONSTANT_DW1_BUFFER0_READ_LEN__SHIFT;
+    dw[2] = 0;
+    dw[3] = offset | GEN7_MOCS_L3_WB;
+    dw[4] = 0;
+    dw[5] = 0;
+    dw[6] = 0;
+
+    /* 3DSTATE_PS */
+    offset = emit_shader(cmd, sh);
+    cmd_batch_pointer(cmd, 8, &dw);
+    dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_PS) | (8 - 2);
+    dw[1] = offset;
+    dw[2] = (sh->sampler_count + 3) / 4 << GEN6_THREADDISP_SAMPLER_COUNT__SHIFT |
+             sh->surface_count << GEN6_THREADDISP_BINDING_TABLE_SIZE__SHIFT;
+    dw[3] = 0; /* scratch */
+
+    dw[4] = GEN7_PS_DW4_PUSH_CONSTANT_ENABLE |
+            GEN7_PS_DW4_POSOFFSET_NONE |
+            GEN6_PS_DISPATCH_16 << GEN7_PS_DW4_DISPATCH_MODE__SHIFT;
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7.5)) {
+        dw[4] |= (sh->max_threads - 1) << GEN75_PS_DW4_MAX_THREADS__SHIFT;
+        dw[4] |= ((1 << meta->sample_count) - 1) <<
+            GEN75_PS_DW4_SAMPLE_MASK__SHIFT;
+    } else {
+        dw[4] |= (sh->max_threads - 1) << GEN7_PS_DW4_MAX_THREADS__SHIFT;
+    }
+
+    dw[5] = sh->urb_grf_start << GEN7_PS_DW5_URB_GRF_START0__SHIFT;
+    dw[6] = 0;
+    dw[7] = 0;
+
+    assert(!sh->per_thread_scratch_size);
+}
+
+static void gen6_meta_depth_buffer(struct intel_cmd *cmd)
+{
+    const struct intel_cmd_meta *meta = cmd->bind.meta;
+    const struct intel_att_view *view = &meta->ds.view;
+
+    CMD_ASSERT(cmd, 6, 7.5);
+
+    cmd_wa_gen6_pre_ds_flush(cmd);
+    gen6_3DSTATE_DEPTH_BUFFER(cmd, view, meta->ds.optimal);
+    gen6_3DSTATE_STENCIL_BUFFER(cmd, view, meta->ds.optimal);
+    gen6_3DSTATE_HIER_DEPTH_BUFFER(cmd, view, meta->ds.optimal);
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7))
+        gen7_3DSTATE_CLEAR_PARAMS(cmd, 0);
+    else
+        gen6_3DSTATE_CLEAR_PARAMS(cmd, 0);
+}
+
+static bool cmd_alloc_dset_data(struct intel_cmd *cmd,
+                                struct intel_cmd_dset_data *data,
+                                const struct intel_pipeline_layout *pipeline_layout)
+{
+    if (data->set_offset_count < pipeline_layout->layout_count) {
+        if (data->set_offsets)
+            intel_free(cmd, data->set_offsets);
+
+        data->set_offsets = intel_alloc(cmd,
+                sizeof(data->set_offsets[0]) * pipeline_layout->layout_count,
+                sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+        if (!data->set_offsets) {
+            cmd_fail(cmd, VK_ERROR_OUT_OF_HOST_MEMORY);
+            data->set_offset_count = 0;
+            return false;
+        }
+
+        data->set_offset_count = pipeline_layout->layout_count;
+    }
+
+    if (data->dynamic_offset_count < pipeline_layout->total_dynamic_desc_count) {
+        if (data->dynamic_offsets)
+            intel_free(cmd, data->dynamic_offsets);
+
+        data->dynamic_offsets = intel_alloc(cmd,
+                sizeof(data->dynamic_offsets[0]) * pipeline_layout->total_dynamic_desc_count,
+                sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+        if (!data->dynamic_offsets) {
+            cmd_fail(cmd, VK_ERROR_OUT_OF_HOST_MEMORY);
+            data->dynamic_offset_count = 0;
+            return false;
+        }
+
+        data->dynamic_offset_count = pipeline_layout->total_dynamic_desc_count;
+    }
+
+    return true;
+}
+
+static void cmd_bind_dynamic_state(struct intel_cmd *cmd,
+                                   const struct intel_pipeline *pipeline)
+{
+    VkFlags use_flags = pipeline->state.use_pipeline_dynamic_state;
+    if (!use_flags) {
+        return;
+    }
+    cmd->bind.state.use_pipeline_dynamic_state = use_flags;
+    if (use_flags & INTEL_USE_PIPELINE_DYNAMIC_VIEWPORT) {
+        const struct intel_dynamic_viewport *viewport = &pipeline->state.viewport;
+        intel_set_viewport(cmd, viewport->first_viewport, viewport->viewport_count, viewport->viewports);
+    }
+    if (use_flags & INTEL_USE_PIPELINE_DYNAMIC_SCISSOR) {
+        const struct intel_dynamic_viewport *viewport = &pipeline->state.viewport;
+        intel_set_scissor(cmd, viewport->first_scissor, viewport->scissor_count, viewport->scissors);
+    }
+    if (use_flags & INTEL_USE_PIPELINE_DYNAMIC_LINE_WIDTH) {
+        intel_set_line_width(cmd, pipeline->state.line_width.line_width);
+    }
+    if (use_flags & INTEL_USE_PIPELINE_DYNAMIC_DEPTH_BIAS) {
+        const struct intel_dynamic_depth_bias *s = &pipeline->state.depth_bias;
+        intel_set_depth_bias(cmd, s->depth_bias, s->depth_bias_clamp, s->slope_scaled_depth_bias);
+    }
+    if (use_flags & INTEL_USE_PIPELINE_DYNAMIC_BLEND_CONSTANTS) {
+        const struct intel_dynamic_blend *s = &pipeline->state.blend;
+        intel_set_blend_constants(cmd, s->blend_const);
+    }
+    if (use_flags & INTEL_USE_PIPELINE_DYNAMIC_DEPTH_BOUNDS) {
+        const struct intel_dynamic_depth_bounds *s = &pipeline->state.depth_bounds;
+        intel_set_depth_bounds(cmd, s->min_depth_bounds, s->max_depth_bounds);
+    }
+    if (use_flags & INTEL_USE_PIPELINE_DYNAMIC_STENCIL_COMPARE_MASK) {
+        const struct intel_dynamic_stencil *s = &pipeline->state.stencil;
+        intel_set_stencil_compare_mask(cmd, VK_STENCIL_FACE_FRONT_BIT, s->front.stencil_compare_mask);
+        intel_set_stencil_compare_mask(cmd, VK_STENCIL_FACE_BACK_BIT, s->back.stencil_compare_mask);
+    }
+    if (use_flags & INTEL_USE_PIPELINE_DYNAMIC_STENCIL_WRITE_MASK) {
+        const struct intel_dynamic_stencil *s = &pipeline->state.stencil;
+        intel_set_stencil_write_mask(cmd, VK_STENCIL_FACE_FRONT_BIT, s->front.stencil_write_mask);
+        intel_set_stencil_write_mask(cmd, VK_STENCIL_FACE_BACK_BIT, s->back.stencil_write_mask);
+    }
+    if (use_flags & INTEL_USE_PIPELINE_DYNAMIC_STENCIL_REFERENCE) {
+        const struct intel_dynamic_stencil *s = &pipeline->state.stencil;
+        intel_set_stencil_reference(cmd, VK_STENCIL_FACE_FRONT_BIT, s->front.stencil_reference);
+        intel_set_stencil_reference(cmd, VK_STENCIL_FACE_BACK_BIT, s->back.stencil_reference);
+    }
+}
+
+static void cmd_bind_graphics_pipeline(struct intel_cmd *cmd,
+                                       const struct intel_pipeline *pipeline)
+{
+    cmd->bind.pipeline.graphics = pipeline;
+
+    cmd_bind_dynamic_state(cmd, pipeline);
+
+    cmd_alloc_dset_data(cmd, &cmd->bind.dset.graphics_data,
+            pipeline->pipeline_layout);
+}
+
+static void cmd_bind_compute_pipeline(struct intel_cmd *cmd,
+                                      const struct intel_pipeline *pipeline)
+{
+    cmd->bind.pipeline.compute = pipeline;
+
+    cmd_alloc_dset_data(cmd, &cmd->bind.dset.compute_data,
+            pipeline->pipeline_layout);
+}
+
+static void cmd_copy_dset_data(struct intel_cmd *cmd,
+                               struct intel_cmd_dset_data *data,
+                               const struct intel_pipeline_layout *pipeline_layout,
+                               uint32_t index,
+                               const struct intel_desc_set *set,
+                               const uint32_t *dynamic_offsets)
+{
+    const struct intel_desc_layout *layout = pipeline_layout->layouts[index];
+
+    assert(index < data->set_offset_count);
+    data->set_offsets[index] = set->region_begin;
+
+    if (layout->dynamic_desc_count) {
+        assert(pipeline_layout->dynamic_desc_indices[index] +
+                layout->dynamic_desc_count - 1 < data->dynamic_offset_count);
+
+        memcpy(&data->dynamic_offsets[pipeline_layout->dynamic_desc_indices[index]],
+                dynamic_offsets,
+                sizeof(dynamic_offsets[0]) * layout->dynamic_desc_count);
+    }
+}
+
+static void cmd_bind_vertex_data(struct intel_cmd *cmd,
+                                 const struct intel_buf *buf,
+                                 VkDeviceSize offset, uint32_t binding)
+{
+    /* TODOVV: verify */
+    assert(binding < ARRAY_SIZE(cmd->bind.vertex.buf) && "binding exceeds buf size");
+
+    cmd->bind.vertex.buf[binding] = buf;
+    cmd->bind.vertex.offset[binding] = offset;
+}
+
+static void cmd_bind_index_data(struct intel_cmd *cmd,
+                                const struct intel_buf *buf,
+                                VkDeviceSize offset, VkIndexType type)
+{
+    cmd->bind.index.buf = buf;
+    cmd->bind.index.offset = offset;
+    cmd->bind.index.type = type;
+}
+
+static uint32_t cmd_get_max_surface_write(const struct intel_cmd *cmd)
+{
+    const struct intel_pipeline *pipeline = cmd->bind.pipeline.graphics;
+    struct intel_pipeline_rmap *rmaps[5] = {
+        pipeline->vs.rmap,
+        pipeline->tcs.rmap,
+        pipeline->tes.rmap,
+        pipeline->gs.rmap,
+        pipeline->fs.rmap,
+    };
+    uint32_t max_write;
+    int i;
+
+    STATIC_ASSERT(GEN6_ALIGNMENT_SURFACE_STATE >= GEN6_SURFACE_STATE__SIZE);
+    STATIC_ASSERT(GEN6_ALIGNMENT_SURFACE_STATE >=
+            GEN6_ALIGNMENT_BINDING_TABLE_STATE);
+
+    /* pad first */
+    max_write = GEN6_ALIGNMENT_SURFACE_STATE;
+
+    for (i = 0; i < ARRAY_SIZE(rmaps); i++) {
+        const struct intel_pipeline_rmap *rmap = rmaps[i];
+        const uint32_t surface_count = (rmap) ?
+            rmap->rt_count + rmap->texture_resource_count +
+            rmap->resource_count + rmap->uav_count : 0;
+
+        if (surface_count) {
+            /* SURFACE_STATEs */
+            max_write += GEN6_ALIGNMENT_SURFACE_STATE * surface_count;
+
+            /* BINDING_TABLE_STATE */
+            max_write += u_align(sizeof(uint32_t) * surface_count,
+                    GEN6_ALIGNMENT_SURFACE_STATE);
+        }
+    }
+
+    return max_write;
+}
+
+static void cmd_adjust_state_base_address(struct intel_cmd *cmd)
+{
+    struct intel_cmd_writer *writer = &cmd->writers[INTEL_CMD_WRITER_SURFACE];
+    const uint32_t cur_surface_offset = writer->used - writer->sba_offset;
+    uint32_t max_surface_write;
+
+    /* enough for src and dst SURFACE_STATEs plus BINDING_TABLE_STATE */
+    if (cmd->bind.meta)
+        max_surface_write = 64 * sizeof(uint32_t);
+    else
+        max_surface_write = cmd_get_max_surface_write(cmd);
+
+    /* there is a 64KB limit on BINDING_TABLE_STATEs */
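+    /* (the binding table pointers programmed by the
+     * 3DSTATE_BINDING_TABLE_POINTERS commands are 16-bit offsets from
+     * Surface State Base Address) */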
+    if (cur_surface_offset + max_surface_write > 64 * 1024) {
+        /* SBA expects page-aligned addresses */
+        writer->sba_offset = writer->used & ~0xfff;
+
+        assert((writer->used & 0xfff) + max_surface_write <= 64 * 1024);
+
+        cmd_batch_state_base_address(cmd);
+    }
+}
+
+static void cmd_draw(struct intel_cmd *cmd,
+                     uint32_t vertex_start,
+                     uint32_t vertex_count,
+                     uint32_t instance_start,
+                     uint32_t instance_count,
+                     bool indexed,
+                     uint32_t vertex_base)
+{
+    const struct intel_pipeline *p = cmd->bind.pipeline.graphics;
+    const uint32_t surface_writer_used U_ASSERT_ONLY =
+        cmd->writers[INTEL_CMD_WRITER_SURFACE].used;
+
+    cmd_adjust_state_base_address(cmd);
+
+    emit_bounded_states(cmd);
+
+    /* sanity check on cmd_get_max_surface_write() */
+    assert(cmd->writers[INTEL_CMD_WRITER_SURFACE].used -
+            surface_writer_used <= cmd_get_max_surface_write(cmd));
+
+    if (indexed) {
+        assert((!p->primitive_restart || gen6_can_primitive_restart(cmd)) && "Primitive restart unsupported on this device");
+
+        if (cmd_gen(cmd) >= INTEL_GEN(7.5)) {
+            gen75_3DSTATE_VF(cmd, p->primitive_restart,
+                    p->primitive_restart_index);
+            gen6_3DSTATE_INDEX_BUFFER(cmd, cmd->bind.index.buf,
+                    cmd->bind.index.offset, cmd->bind.index.type,
+                    false);
+        } else {
+            gen6_3DSTATE_INDEX_BUFFER(cmd, cmd->bind.index.buf,
+                    cmd->bind.index.offset, cmd->bind.index.type,
+                    p->primitive_restart);
+        }
+    } else {
+        assert(!vertex_base);
+    }
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        gen7_3DPRIMITIVE(cmd, p->prim_type, indexed, vertex_count,
+                vertex_start, instance_count, instance_start, vertex_base);
+    } else {
+        gen6_3DPRIMITIVE(cmd, p->prim_type, indexed, vertex_count,
+                vertex_start, instance_count, instance_start, vertex_base);
+    }
+
+    cmd->bind.draw_count++;
+    cmd->bind.render_pass_changed = false;
+    /* need to re-emit all workarounds */
+    cmd->bind.wa_flags = 0;
+
+    if (intel_debug & INTEL_DEBUG_NOCACHE)
+        cmd_batch_flush_all(cmd);
+}
+
+void cmd_draw_meta(struct intel_cmd *cmd, const struct intel_cmd_meta *meta)
+{
+    cmd->bind.meta = meta;
+
+    cmd_adjust_state_base_address(cmd);
+
+    cmd_wa_gen6_pre_depth_stall_write(cmd);
+    cmd_wa_gen6_pre_command_scoreboard_stall(cmd);
+
+    gen6_meta_dynamic_states(cmd);
+    gen6_meta_surface_states(cmd);
+
+    if (cmd_gen(cmd) >= INTEL_GEN(7)) {
+        gen7_meta_urb(cmd);
+        gen6_meta_vf(cmd);
+        gen6_meta_vs(cmd);
+        gen7_meta_disabled(cmd);
+        gen6_meta_clip(cmd);
+        gen6_meta_wm(cmd);
+        gen7_meta_ps(cmd);
+        gen6_meta_depth_buffer(cmd);
+
+        cmd_wa_gen7_post_command_cs_stall(cmd);
+        cmd_wa_gen7_post_command_depth_stall(cmd);
+
+        if (meta->mode == INTEL_CMD_META_VS_POINTS) {
+            gen7_3DPRIMITIVE(cmd, GEN6_3DPRIM_POINTLIST, false,
+                    meta->width * meta->height, 0, 1, 0, 0);
+        } else {
+            gen7_3DPRIMITIVE(cmd, GEN6_3DPRIM_RECTLIST, false, 3, 0, 1, 0, 0);
+        }
+    } else {
+        gen6_meta_urb(cmd);
+        gen6_meta_vf(cmd);
+        gen6_meta_vs(cmd);
+        gen6_meta_disabled(cmd);
+        gen6_meta_clip(cmd);
+        gen6_meta_wm(cmd);
+        gen6_meta_ps(cmd);
+        gen6_meta_depth_buffer(cmd);
+
+        if (meta->mode == INTEL_CMD_META_VS_POINTS) {
+            gen6_3DPRIMITIVE(cmd, GEN6_3DPRIM_POINTLIST, false,
+                    meta->width * meta->height, 0, 1, 0, 0);
+        } else {
+            gen6_3DPRIMITIVE(cmd, GEN6_3DPRIM_RECTLIST, false, 3, 0, 1, 0, 0);
+        }
+    }
+
+    cmd->bind.draw_count++;
+    /* need to re-emit all workarounds */
+    cmd->bind.wa_flags = 0;
+
+    cmd->bind.meta = NULL;
+
+    /* make the normal path believe the render pass has changed */
+    cmd->bind.render_pass_changed = true;
+
+    if (intel_debug & INTEL_DEBUG_NOCACHE)
+        cmd_batch_flush_all(cmd);
+}
+
+static void cmd_exec(struct intel_cmd *cmd, struct intel_bo *bo)
+{
+   const uint8_t cmd_len = 2;
+   uint32_t *dw;
+   uint32_t pos;
+
+   assert(cmd_gen(cmd) >= INTEL_GEN(7.5) && "Invalid GPU version");
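+   /* the second-level bit used below is a Haswell (Gen7.5) addition,
+    * hence the version check above */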
+
+   pos = cmd_batch_pointer(cmd, cmd_len, &dw);
+   dw[0] = GEN6_MI_CMD(MI_BATCH_BUFFER_START) | (cmd_len - 2) |
+           GEN75_MI_BATCH_BUFFER_START_DW0_SECOND_LEVEL |
+           GEN75_MI_BATCH_BUFFER_START_DW0_NON_PRIVILEGED |
+           GEN6_MI_BATCH_BUFFER_START_DW0_USE_PPGTT;
+
+   cmd_batch_reloc(cmd, pos + 1, bo, 0, 0);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBindPipeline(
+    VkCommandBuffer                             commandBuffer,
+    VkPipelineBindPoint                         pipelineBindPoint,
+    VkPipeline                                  pipeline)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    switch (pipelineBindPoint) {
+    case VK_PIPELINE_BIND_POINT_COMPUTE:
+        cmd_bind_compute_pipeline(cmd, intel_pipeline(pipeline));
+        break;
+    case VK_PIPELINE_BIND_POINT_GRAPHICS:
+        cmd_bind_graphics_pipeline(cmd, intel_pipeline(pipeline));
+        break;
+    default:
+        assert(!"unsupported pipelineBindPoint");
+        break;
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBindDescriptorSets(
+    VkCommandBuffer                         commandBuffer,
+    VkPipelineBindPoint                     pipelineBindPoint,
+    VkPipelineLayout                        layout,
+    uint32_t                                firstSet,
+    uint32_t                                descriptorSetCount,
+    const VkDescriptorSet*                  pDescriptorSets,
+    uint32_t                                dynamicOffsetCount,
+    const uint32_t*                         pDynamicOffsets)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    const struct intel_pipeline_layout *pipeline_layout;
+    struct intel_cmd_dset_data *data = NULL;
+    uint32_t offset_count = 0;
+    uint32_t i;
+
+    pipeline_layout = intel_pipeline_layout(layout);
+
+    switch (pipelineBindPoint) {
+    case VK_PIPELINE_BIND_POINT_COMPUTE:
+        data = &cmd->bind.dset.compute_data;
+        break;
+    case VK_PIPELINE_BIND_POINT_GRAPHICS:
+        data = &cmd->bind.dset.graphics_data;
+        break;
+    default:
+        assert(!"unsupported pipelineBindPoint");
+        break;
+    }
+
+    cmd_alloc_dset_data(cmd, data, pipeline_layout);
+
+    for (i = 0; i < descriptorSetCount; i++) {
+        struct intel_desc_set *dset = intel_desc_set(pDescriptorSets[i]);
+
+        offset_count += pipeline_layout->layouts[firstSet + i]->dynamic_desc_count;
+        if (offset_count <= dynamicOffsetCount) {
+            cmd_copy_dset_data(cmd, data, pipeline_layout, firstSet + i,
+                    dset, pDynamicOffsets);
+            pDynamicOffsets += pipeline_layout->layouts[firstSet + i]->dynamic_desc_count;
+        }
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBindVertexBuffers(
+    VkCommandBuffer                                 commandBuffer,
+    uint32_t                                        firstBinding,
+    uint32_t                                        bindingCount,
+    const VkBuffer*                                 pBuffers,
+    const VkDeviceSize*                             pOffsets)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    for (uint32_t i = 0; i < bindingCount; i++) {
+        struct intel_buf *buf = intel_buf(pBuffers[i]);
+        cmd_bind_vertex_data(cmd, buf, pOffsets[i], firstBinding + i);
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBindIndexBuffer(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  buffer,
+    VkDeviceSize                                offset,
+    VkIndexType                              indexType)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    struct intel_buf *buf = intel_buf(buffer);
+
+    cmd_bind_index_data(cmd, buf, offset, indexType);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdDraw(
+    VkCommandBuffer                                 commandBuffer,
+    uint32_t                                    vertexCount,
+    uint32_t                                    instanceCount,
+    uint32_t                                    firstVertex,
+    uint32_t                                    firstInstance)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    cmd_draw(cmd, firstVertex, vertexCount,
+            firstInstance, instanceCount, false, 0);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdDrawIndexed(
+    VkCommandBuffer                                 commandBuffer,
+    uint32_t                                    indexCount,
+    uint32_t                                    instanceCount,
+    uint32_t                                    firstIndex,
+    int32_t                                     vertexOffset,
+    uint32_t                                    firstInstance)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    cmd_draw(cmd, firstIndex, indexCount,
+            firstInstance, instanceCount, true, vertexOffset);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdDrawIndirect(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  buffer,
+    VkDeviceSize                                offset,
+    uint32_t                                    drawCount,
+    uint32_t                                    stride)
+{
+    assert(0 && "vkCmdDrawIndirect not implemented");
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdDrawIndexedIndirect(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  buffer,
+    VkDeviceSize                                offset,
+    uint32_t                                    drawCount,
+    uint32_t                                    stride)
+{
+    assert(0 && "vkCmdDrawIndexedIndirect not implemented");
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdDispatch(
+    VkCommandBuffer                              commandBuffer,
+    uint32_t                                    x,
+    uint32_t                                    y,
+    uint32_t                                    z)
+{
+    assert(0 && "vkCmdDispatch not implemented");
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdDispatchIndirect(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  buffer,
+    VkDeviceSize                                offset)
+{
+    assert(0 && "vkCmdDispatchIndirect not implemented");
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdPushConstants(
+    VkCommandBuffer                                 commandBuffer,
+    VkPipelineLayout                            layout,
+    VkShaderStageFlags                          stageFlags,
+    uint32_t                                    offset,
+    uint32_t                                    size,
+    const void*                                 pValues)
+{
+    /* TODO: Implement */
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetRenderAreaGranularity(
+    VkDevice                                    device,
+    VkRenderPass                                renderPass,
+    VkExtent2D*                                 pGranularity)
+{
+    pGranularity->height = 1;
+    pGranularity->width = 1;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBeginRenderPass(
+    VkCommandBuffer                                 commandBuffer,
+    const VkRenderPassBeginInfo*                pRenderPassBegin,
+    VkSubpassContents                        contents)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    const struct intel_render_pass *rp =
+        intel_render_pass(pRenderPassBegin->renderPass);
+    const struct intel_fb *fb = intel_fb(pRenderPassBegin->framebuffer);
+    const struct intel_att_view *view;
+    uint32_t i;
+
+    /* TODOVV: */
+    assert(!(!cmd->primary || rp->attachment_count != fb->view_count) && "Invalid RenderPass");
+
+    cmd_begin_render_pass(cmd, rp, fb, 0, contents);
+
+    for (i = 0; i < rp->attachment_count; i++) {
+        const struct intel_render_pass_attachment *att = &rp->attachments[i];
+        const VkClearValue *clear_val =
+            &pRenderPassBegin->pClearValues[i];
+        VkImageSubresourceRange range;
+
+        view = fb->views[i];
+        range.baseMipLevel = view->mipLevel;
+        range.levelCount = 1;
+        range.baseArrayLayer = view->baseArrayLayer;
+        range.layerCount = view->array_size;
+        range.aspectMask = 0;
+
+        if (view->is_rt) {
+            /* color */
+            if (att->clear_on_load) {
+                range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+
+                cmd_meta_clear_color_image(commandBuffer, view->img,
+                        att->initial_layout, &clear_val->color, 1, &range);
+            }
+        } else {
+            /* depth/stencil */
+            if (att->clear_on_load) {
+                range.aspectMask |= VK_IMAGE_ASPECT_DEPTH_BIT;
+            }
+
+            if (att->stencil_clear_on_load) {
+                range.aspectMask |= VK_IMAGE_ASPECT_STENCIL_BIT;
+            }
+
+            if (range.aspectMask) {
+                cmd_meta_clear_depth_stencil_image(commandBuffer,
+                        view->img, att->initial_layout,
+                        clear_val->depthStencil.depth, clear_val->depthStencil.stencil,
+                        1, &range);
+            }
+        }
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdNextSubpass(
+    VkCommandBuffer                                 commandBuffer,
+    VkSubpassContents                        contents)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+    const struct intel_render_pass U_ASSERT_ONLY *rp = cmd->bind.render_pass;
+
+    /* TODOVV */
+   assert(!(cmd->bind.render_pass_subpass >= rp->subpasses +
+           rp->subpass_count - 1) && "Invalid RenderPassContents");
+
+   cmd->bind.render_pass_changed = true;
+   cmd->bind.render_pass_subpass++;
+   cmd->bind.render_pass_contents = contents;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdEndRenderPass(
+    VkCommandBuffer                              commandBuffer)
+{
+   struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+   cmd_end_render_pass(cmd);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdExecuteCommands(
+    VkCommandBuffer                                 commandBuffer,
+    uint32_t                                    commandBuffersCount,
+    const VkCommandBuffer*                          pCommandBuffers)
+{
+   struct intel_cmd *cmd = intel_cmd(commandBuffer);
+   uint32_t i;
+
+   /* TODOVV */
+   assert(!(!cmd->bind.render_pass || cmd->bind.render_pass_contents !=
+           VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS) && "Invalid RenderPass");
+
+   for (i = 0; i < commandBuffersCount; i++) {
+       const struct intel_cmd *secondary = intel_cmd(pCommandBuffers[i]);
+
+       /* TODOVV: Move test to validation layer */
+       assert(!(secondary->primary) && "Cannot be primary command buffer");
+
+       cmd_exec(cmd, intel_cmd_get_batch(secondary, NULL));
+   }
+
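+   /* the secondary batches may have changed the state base addresses;
+    * re-emit them if anything was executed */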
+   if (i)
+       cmd_batch_state_base_address(cmd);
+}
diff --git a/icd/intel/cmd_priv.h b/icd/intel/cmd_priv.h
new file mode 100644
index 0000000..e3a7cba
--- /dev/null
+++ b/icd/intel/cmd_priv.h
@@ -0,0 +1,549 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Chris Forbes <chrisf@ijw.co.nz>
+ *
+ */
+
+#ifndef CMD_PRIV_H
+#define CMD_PRIV_H
+
+#include "genhw/genhw.h"
+#include "dev.h"
+#include "fb.h"
+#include "gpu.h"
+#include "cmd.h"
+
+#define CMD_ASSERT(cmd, min_gen, max_gen) \
+    INTEL_GPU_ASSERT((cmd)->dev->gpu, (min_gen), (max_gen))
+
+enum intel_cmd_item_type {
+    /* for state buffer */
+    INTEL_CMD_ITEM_BLOB,
+    INTEL_CMD_ITEM_CLIP_VIEWPORT,
+    INTEL_CMD_ITEM_SF_VIEWPORT,
+    INTEL_CMD_ITEM_SCISSOR_RECT,
+    INTEL_CMD_ITEM_CC_VIEWPORT,
+    INTEL_CMD_ITEM_COLOR_CALC,
+    INTEL_CMD_ITEM_DEPTH_STENCIL,
+    INTEL_CMD_ITEM_BLEND,
+    INTEL_CMD_ITEM_SAMPLER,
+
+    /* for surface buffer */
+    INTEL_CMD_ITEM_SURFACE,
+    INTEL_CMD_ITEM_BINDING_TABLE,
+
+    /* for instruction buffer */
+    INTEL_CMD_ITEM_KERNEL,
+
+    INTEL_CMD_ITEM_COUNT,
+};
+
+struct intel_cmd_item {
+    enum intel_cmd_item_type type;
+    size_t offset;
+    size_t size;
+};
+
+#define INTEL_CMD_RELOC_TARGET_IS_WRITER (1u << 31)
+struct intel_cmd_reloc {
+    enum intel_cmd_writer_type which;
+    size_t offset;
+
+    intptr_t target;
+    uint32_t target_offset;
+
+    uint32_t flags;
+};
+
+struct intel_att_view;
+
+enum intel_cmd_meta_mode {
+    /*
+     * Draw POINTLIST of (width * height) vertices with only VS enabled.  The
+     * vertex id is from 0 to (width * height - 1).
+     */
+    INTEL_CMD_META_VS_POINTS,
+
+    /*
+     * Draw a RECTLIST from (dst.x, dst.y) to (dst.x + width, dst.y + height)
+     * with only FS enabled.
+     */
+    INTEL_CMD_META_FS_RECT,
+
+    /*
+     * Draw a RECTLIST from (dst.x, dst.y) to (dst.x + width, dst.y + height)
+     * with only depth/stencil enabled.
+     */
+    INTEL_CMD_META_DEPTH_STENCIL_RECT,
+};
+
+enum intel_cmd_meta_ds_op {
+    INTEL_CMD_META_DS_NOP,
+    INTEL_CMD_META_DS_HIZ_CLEAR,
+    INTEL_CMD_META_DS_HIZ_RESOLVE,
+    INTEL_CMD_META_DS_RESOLVE,
+};
+
+struct intel_cmd_meta {
+    enum intel_cmd_meta_mode mode;
+    enum intel_dev_meta_shader shader_id;
+
+    struct {
+        bool valid;
+
+        uint32_t surface[8];
+        uint32_t surface_len;
+
+        intptr_t reloc_target;
+        uint32_t reloc_offset;
+        uint32_t reloc_flags;
+
+        uint32_t lod, layer;
+        uint32_t x, y;
+    } src, dst;
+
+    struct {
+        struct intel_att_view view;
+        uint32_t stencil_ref;
+        /* Using VkImageAspectFlagBits as that means we
+         * are expecting only one bit to be set at a time */
+        VkImageAspectFlagBits aspect;
+
+        enum intel_cmd_meta_ds_op op;
+        bool optimal;
+    } ds;
+
+    uint32_t clear_val[4];
+
+    uint32_t width, height;
+    uint32_t sample_count;
+};
+
+static inline int cmd_gen(const struct intel_cmd *cmd)
+{
+    return intel_gpu_gen(cmd->dev->gpu);
+}
+
+static inline void cmd_fail(struct intel_cmd *cmd, VkResult result)
+{
+    intel_dev_log(cmd->dev, VK_DEBUG_REPORT_ERROR_BIT_EXT,
+                  &cmd->obj.base, 0, 0,
+                  "command building error");
+
+    cmd->result = result;
+}
+
+static inline void cmd_reserve_reloc(struct intel_cmd *cmd,
+                                     uint32_t reloc_len)
+{
+    /* do not overflow the reloc array; mark the command buffer as failed */
+    if (cmd->reloc_used + reloc_len > cmd->reloc_count) {
+        cmd->reloc_used = 0;
+        cmd_fail(cmd, VK_ERROR_VALIDATION_FAILED_EXT);
+    }
+    assert(cmd->reloc_used + reloc_len <= cmd->reloc_count);
+}
+
+void cmd_writer_grow(struct intel_cmd *cmd,
+                     enum intel_cmd_writer_type which,
+                     size_t new_size);
+
+void cmd_writer_record(struct intel_cmd *cmd,
+                       enum intel_cmd_writer_type which,
+                       enum intel_cmd_item_type type,
+                       size_t offset, size_t size);
+
+/**
+ * Return an offset to a region that is aligned to \p alignment and has at
+ * least \p size bytes.
+ */
+static inline size_t cmd_writer_reserve(struct intel_cmd *cmd,
+                                        enum intel_cmd_writer_type which,
+                                        size_t alignment, size_t size)
+{
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+    size_t offset;
+
+    assert(alignment && u_is_pow2(alignment));
+    offset = u_align(writer->used, alignment);
+
+    if (offset + size > writer->size) {
+        cmd_writer_grow(cmd, which, offset + size);
+        /* align again in case of errors */
+        offset = u_align(writer->used, alignment);
+
+        assert(offset + size <= writer->size);
+    }
+
+    return offset;
+}
+
+/**
+ * Add a reloc at \p offset.  No error checking.
+ */
+static inline void cmd_writer_reloc(struct intel_cmd *cmd,
+                                    enum intel_cmd_writer_type which,
+                                    size_t offset, intptr_t target,
+                                    uint32_t target_offset, uint32_t flags)
+{
+    struct intel_cmd_reloc *reloc = &cmd->relocs[cmd->reloc_used];
+
+    assert(cmd->reloc_used < cmd->reloc_count);
+
+    reloc->which = which;
+    reloc->offset = offset;
+    reloc->target = target;
+    reloc->target_offset = target_offset;
+    reloc->flags = flags;
+
+    cmd->reloc_used++;
+}
+
+/**
+ * Reserve a region from the state buffer.  The offset, in bytes, to the
+ * reserved region is returned.
+ *
+ * Note that \p alignment is in bytes and \p len is in DWords.
+ */
+static inline uint32_t cmd_state_reserve(struct intel_cmd *cmd,
+                                         enum intel_cmd_item_type item,
+                                         size_t alignment, uint32_t len)
+{
+    const enum intel_cmd_writer_type which = INTEL_CMD_WRITER_STATE;
+    const size_t size = len << 2;
+    const size_t offset = cmd_writer_reserve(cmd, which, alignment, size);
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+
+    /* all states are at least aligned to 32-bytes */
+    assert(alignment % 32 == 0);
+
+    writer->used = offset + size;
+
+    if (intel_debug & (INTEL_DEBUG_BATCH | INTEL_DEBUG_HANG))
+        cmd_writer_record(cmd, which, item, offset, size);
+
+    return offset;
+}
+
+/**
+ * Get the pointer to a reserved region for updating.  The pointer is only
+ * valid until the next reserve call.
+ */
+static inline void cmd_state_update(struct intel_cmd *cmd,
+                                    uint32_t offset, uint32_t len,
+                                    uint32_t **dw)
+{
+    const enum intel_cmd_writer_type which = INTEL_CMD_WRITER_STATE;
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+
+    assert(offset + (len << 2) <= writer->used);
+
+    *dw = (uint32_t *) ((char *) writer->ptr + offset);
+}
+
+/**
+ * Reserve a region from the state buffer.  Both the offset, in bytes, and the
+ * pointer to the reserved region are returned.  The pointer is only valid
+ * until the next reserve call.
+ *
+ * Note that \p alignment is in bytes and \p len is in DWords.
+ */
+static inline uint32_t cmd_state_pointer(struct intel_cmd *cmd,
+                                         enum intel_cmd_item_type item,
+                                         size_t alignment, uint32_t len,
+                                         uint32_t **dw)
+{
+    const uint32_t offset = cmd_state_reserve(cmd, item, alignment, len);
+
+    cmd_state_update(cmd, offset, len, dw);
+
+    return offset;
+}
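+
+/*
+ * Usage sketch (hypothetical values): reserve four DWords of color-calc
+ * state at 64-byte alignment and fill them in place:
+ *
+ *     uint32_t *dw;
+ *     uint32_t offset =
+ *         cmd_state_pointer(cmd, INTEL_CMD_ITEM_COLOR_CALC, 64, 4, &dw);
+ *     dw[0] = dw[1] = dw[2] = dw[3] = 0;
+ */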
+
+/**
+ * Write a dynamic state to the state buffer.
+ */
+static inline uint32_t cmd_state_write(struct intel_cmd *cmd,
+                                       enum intel_cmd_item_type item,
+                                       size_t alignment, uint32_t len,
+                                       const uint32_t *dw)
+{
+    uint32_t offset, *dst;
+
+    offset = cmd_state_pointer(cmd, item, alignment, len, &dst);
+    memcpy(dst, dw, len << 2);
+
+    return offset;
+}
+
+/**
+ * Write a surface state to the surface buffer.  The offset, in bytes, of the
+ * state is returned.
+ *
+ * Note that \p alignment is in bytes and \p len is in DWords.
+ */
+static inline uint32_t cmd_surface_write(struct intel_cmd *cmd,
+                                         enum intel_cmd_item_type item,
+                                         size_t alignment, uint32_t len,
+                                         const uint32_t *dw)
+{
+    const enum intel_cmd_writer_type which = INTEL_CMD_WRITER_SURFACE;
+    const size_t size = len << 2;
+    const uint32_t offset = cmd_writer_reserve(cmd, which, alignment, size);
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+    uint32_t *dst;
+
+    assert(item == INTEL_CMD_ITEM_SURFACE ||
+           item == INTEL_CMD_ITEM_BINDING_TABLE);
+
+    /* all states are at least aligned to 32-bytes */
+    assert(alignment % 32 == 0);
+
+    writer->used = offset + size;
+
+    if (intel_debug & INTEL_DEBUG_BATCH)
+        cmd_writer_record(cmd, which, item, offset, size);
+
+    dst = (uint32_t *) ((char *) writer->ptr + offset);
+    memcpy(dst, dw, size);
+
+    return offset;
+}
+
+/**
+ * Add a relocation entry for a DWord of a surface state.
+ */
+static inline void cmd_surface_reloc(struct intel_cmd *cmd,
+                                     uint32_t offset, uint32_t dw_index,
+                                     struct intel_bo *bo,
+                                     uint32_t bo_offset, uint32_t reloc_flags)
+{
+    const enum intel_cmd_writer_type which = INTEL_CMD_WRITER_SURFACE;
+
+    cmd_writer_reloc(cmd, which, offset + (dw_index << 2),
+            (intptr_t) bo, bo_offset, reloc_flags);
+}
+
+static inline void cmd_surface_reloc_writer(struct intel_cmd *cmd,
+                                            uint32_t offset, uint32_t dw_index,
+                                            enum intel_cmd_writer_type writer,
+                                            uint32_t writer_offset)
+{
+    const enum intel_cmd_writer_type which = INTEL_CMD_WRITER_SURFACE;
+
+    cmd_writer_reloc(cmd, which, offset + (dw_index << 2),
+            (intptr_t) writer, writer_offset,
+            INTEL_CMD_RELOC_TARGET_IS_WRITER);
+}
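+
+/*
+ * Usage sketch (assuming surf_dw holds a complete surface state whose
+ * base address lives in DWord 1 on this gen): write the state, then
+ * relocate that DWord against the backing bo:
+ *
+ *     uint32_t offset = cmd_surface_write(cmd, INTEL_CMD_ITEM_SURFACE,
+ *             32, surf_len, surf_dw);
+ *     cmd_surface_reloc(cmd, offset, 1, bo, 0, 0);
+ */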
+
+/**
+ * Write a kernel to the instruction buffer.  The offset, in bytes, of the
+ * kernel is returned.
+ */
+static inline uint32_t cmd_instruction_write(struct intel_cmd *cmd,
+                                             size_t size,
+                                             const void *kernel)
+{
+    const enum intel_cmd_writer_type which = INTEL_CMD_WRITER_INSTRUCTION;
+    /*
+     * From the Sandy Bridge PRM, volume 4 part 2, page 112:
+     *
+     *     "Due to prefetch of the instruction stream, the EUs may attempt to
+     *      access up to 8 instructions (128 bytes) beyond the end of the
+     *      kernel program - possibly into the next memory page.  Although
+     *      these instructions will not be executed, software must account for
+     *      the prefetch in order to avoid invalid page access faults."
+     */
+    const size_t reserved_size = size + 128;
+    /* kernels are aligned to 64 bytes */
+    const size_t alignment = 64;
+    const size_t offset = cmd_writer_reserve(cmd,
+            which, alignment, reserved_size);
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+
+    memcpy((char *) writer->ptr + offset, kernel, size);
+
+    writer->used = offset + size;
+
+    if (intel_debug & (INTEL_DEBUG_BATCH | INTEL_DEBUG_HANG))
+        cmd_writer_record(cmd, which, INTEL_CMD_ITEM_KERNEL, offset, size);
+
+    return offset;
+}
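+
+/*
+ * Usage sketch (hypothetical kernel blob and batch position): write the
+ * kernel, then relocate a previously reserved batch DWord at pos against
+ * the instruction writer:
+ *
+ *     uint32_t kernel_offset =
+ *         cmd_instruction_write(cmd, sizeof(kernel_code), kernel_code);
+ *     cmd_batch_reloc_writer(cmd, pos, INTEL_CMD_WRITER_INSTRUCTION,
+ *             kernel_offset);
+ */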
+
+/**
+ * Reserve a region from the batch buffer.  Both the offset, in DWords, and
+ * the pointer to the reserved region are returned.  The pointer is only valid
+ * until the next reserve call.
+ *
+ * Note that \p len is in DWords.
+ */
+static inline uint32_t cmd_batch_pointer(struct intel_cmd *cmd,
+                                         uint32_t len, uint32_t **dw)
+{
+    const enum intel_cmd_writer_type which = INTEL_CMD_WRITER_BATCH;
+    /*
+     * We know the batch bo is always aligned.  Using 1 here should allow the
+     * compiler to optimize away aligning.
+     */
+    const size_t alignment = 1;
+    const size_t size = len << 2;
+    const size_t offset = cmd_writer_reserve(cmd, which, alignment, size);
+    struct intel_cmd_writer *writer = &cmd->writers[which];
+
+    assert(offset % 4 == 0);
+    *dw = (uint32_t *) ((char *) writer->ptr + offset);
+
+    writer->used = offset + size;
+
+    return offset >> 2;
+}
+
+/**
+ * Write a command to the batch buffer.
+ */
+static inline uint32_t cmd_batch_write(struct intel_cmd *cmd,
+                                       uint32_t len, const uint32_t *dw)
+{
+    uint32_t pos;
+    uint32_t *dst;
+
+    pos = cmd_batch_pointer(cmd, len, &dst);
+    memcpy(dst, dw, len << 2);
+
+    return pos;
+}
+
+/**
+ * Add a relocation entry for a DWord of a command.
+ */
+static inline void cmd_batch_reloc(struct intel_cmd *cmd, uint32_t pos,
+                                   struct intel_bo *bo,
+                                   uint32_t bo_offset, uint32_t reloc_flags)
+{
+    const enum intel_cmd_writer_type which = INTEL_CMD_WRITER_BATCH;
+
+    cmd_writer_reloc(cmd, which, pos << 2, (intptr_t) bo, bo_offset, reloc_flags);
+}
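+
+/*
+ * Usage sketch (mirroring cmd_exec() in cmd.c): reserve two DWords, write
+ * the command header, and attach a reloc to the second DWord:
+ *
+ *     uint32_t *dw;
+ *     uint32_t pos = cmd_batch_pointer(cmd, 2, &dw);
+ *     dw[0] = GEN6_MI_CMD(MI_BATCH_BUFFER_START) | (2 - 2);
+ *     cmd_batch_reloc(cmd, pos + 1, bo, 0, 0);
+ */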
+
+static inline void cmd_batch_reloc_writer(struct intel_cmd *cmd, uint32_t pos,
+                                          enum intel_cmd_writer_type writer,
+                                          uint32_t writer_offset)
+{
+    const enum intel_cmd_writer_type which = INTEL_CMD_WRITER_BATCH;
+
+    cmd_writer_reloc(cmd, which, pos << 2, (intptr_t) writer, writer_offset,
+            INTEL_CMD_RELOC_TARGET_IS_WRITER);
+}
+
+void cmd_batch_state_base_address(struct intel_cmd *cmd);
+void cmd_batch_push_const_alloc(struct intel_cmd *cmd);
+
+/**
+ * Begin the batch buffer.
+ */
+static inline void cmd_batch_begin(struct intel_cmd *cmd)
+{
+    cmd_batch_state_base_address(cmd);
+    cmd_batch_push_const_alloc(cmd);
+}
+
+/**
+ * End the batch buffer.
+ */
+static inline void cmd_batch_end(struct intel_cmd *cmd)
+{
+    struct intel_cmd_writer *writer = &cmd->writers[INTEL_CMD_WRITER_BATCH];
+    uint32_t *dw;
+
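+    /*
+     * The batch must end on a QWord boundary.  writer->used advances in
+     * DWords, so if it is not currently QWord-aligned a lone
+     * MI_BATCH_BUFFER_END realigns it; otherwise pad with an MI_NOOP.
+     */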
+    if (writer->used & 0x7) {
+        cmd_batch_pointer(cmd, 1, &dw);
+        dw[0] = GEN6_MI_CMD(MI_BATCH_BUFFER_END);
+    } else {
+        cmd_batch_pointer(cmd, 2, &dw);
+        dw[0] = GEN6_MI_CMD(MI_BATCH_BUFFER_END);
+        dw[1] = GEN6_MI_CMD(MI_NOOP);
+    }
+}
+
+static inline void cmd_begin_render_pass(struct intel_cmd *cmd,
+                                         const struct intel_render_pass *rp,
+                                         const struct intel_fb *fb,
+                                         const uint32_t sp,
+                                         VkSubpassContents contents)
+{
+    assert(sp < rp->subpass_count);
+
+    cmd->bind.render_pass_changed = true;
+    cmd->bind.render_pass = rp;
+    cmd->bind.render_pass_subpass = &rp->subpasses[sp];
+    cmd->bind.fb = fb;
+    cmd->bind.render_pass_contents = contents;
+}
+
+static inline void cmd_end_render_pass(struct intel_cmd *cmd)
+{
+    /* TODO: decide what to do if rp does not match the bound render pass */
+    cmd->bind.render_pass = 0;
+    cmd->bind.fb = 0;
+}
+
+void cmd_batch_flush(struct intel_cmd *cmd, uint32_t pipe_control_dw0);
+void cmd_batch_flush_all(struct intel_cmd *cmd);
+
+void cmd_batch_depth_count(struct intel_cmd *cmd,
+                           struct intel_bo *bo,
+                           VkDeviceSize offset);
+
+void cmd_batch_timestamp(struct intel_cmd *cmd,
+                         struct intel_bo *bo,
+                         VkDeviceSize offset);
+
+void cmd_batch_immediate(struct intel_cmd *cmd,
+                         uint32_t pipe_control_flags,
+                         struct intel_bo *bo,
+                         VkDeviceSize offset,
+                         uint64_t val);
+
+void cmd_draw_meta(struct intel_cmd *cmd, const struct intel_cmd_meta *meta);
+
+void cmd_meta_ds_op(struct intel_cmd *cmd,
+                    enum intel_cmd_meta_ds_op op,
+                    struct intel_img *img,
+                    const VkImageSubresourceRange *range);
+
+void cmd_meta_clear_color_image(
+    VkCommandBuffer                         commandBuffer,
+    struct intel_img                   *img,
+    VkImageLayout                       imageLayout,
+    const VkClearColorValue            *pClearColor,
+    uint32_t                            rangeCount,
+    const VkImageSubresourceRange      *pRanges);
+
+void cmd_meta_clear_depth_stencil_image(
+    VkCommandBuffer                              commandBuffer,
+    struct intel_img*                        img,
+    VkImageLayout                            imageLayout,
+    float                                       depth,
+    uint32_t                                    stencil,
+    uint32_t                                    rangeCount,
+    const VkImageSubresourceRange*          pRanges);
+
+#endif /* CMD_PRIV_H */
diff --git a/icd/intel/compiler/CMakeLists.txt b/icd/intel/compiler/CMakeLists.txt
new file mode 100644
index 0000000..968b501
--- /dev/null
+++ b/icd/intel/compiler/CMakeLists.txt
@@ -0,0 +1,398 @@
+# Create the i965 compiler library
+
+# Mesa required defines
+add_definitions(-D_GNU_SOURCE -DHAVE_PTHREAD -D__NOT_HAVE_DRM_H)
+# LLVM required defines
+add_definitions(-D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS)
+
+# DEBUG and NDEBUG flags are important for proper mesa behavior
+set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -DDEBUG")
+set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} -DNDEBUG")
+set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -DDEBUG -std=c++11")
+set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -DNDEBUG -std=c++11")
+
+if (WIN32)
+    # For Windows, since 32-bit and 64-bit items can co-exist, we build each in its own build directory.
+    # 32-bit target data goes in build32, and 64-bit target data goes into build.  So, include/link the
+    # appropriate data at build time.
+    if (CMAKE_CL_64)
+        set (BUILDTGT_DIR build)
+    else ()
+        set (BUILDTGT_DIR build32)
+    endif()
+else()
+    set (BUILDTGT_DIR build)
+endif()
+
+# LunarG TODO:  Get the llvm-config flags hooked up correctly and remove extra definitions from above
+
+execute_process(COMMAND ${LUNARGLASS_PREFIX}/Core/LLVM/llvm-3.4/${BUILDTGT_DIR}/install/usr/local/bin/llvm-config --libs engine bitwriter
+  OUTPUT_VARIABLE LLVM_LIBS_ALL_1
+  RESULT_VARIABLE LLVM_LIBS_RESULT)
+
+string(REPLACE "\n" "" LLVM_LIBS_ALL ${LLVM_LIBS_ALL_1})
+message(STATUS "llvm-config lib results")
+message(STATUS ${LLVM_LIBS_ALL})
+
+if(NOT "${LLVM_LIBS_RESULT}" EQUAL "0")
+  message(FATAL_ERROR "llvm-config failed: " ${LLVM_LIBS_RESULT})
+endif()
+
+# Expect libraries to be in either the build (release build) or dbuild (debug) directories
+if(EXISTS ${LUNARGLASS_PREFIX}/${BUILDTGT_DIR}/install/lib)
+    set(LUNARGLASS_BUILD ${LUNARGLASS_PREFIX}/${BUILDTGT_DIR})
+elseif(EXISTS ${LUNARGLASS_PREFIX}/dbuild/install/lib)
+    set(LUNARGLASS_BUILD ${LUNARGLASS_PREFIX}/dbuild)
+else()
+    message(FATAL_ERROR "Necessary LunarGLASS libraries cannot be found: " ${LUNARGLASS_PREFIX})
+endif()
+
+execute_process(COMMAND ${LUNARGLASS_PREFIX}/Core/LLVM/llvm-3.4/${BUILDTGT_DIR}/install/usr/local/bin/llvm-config --cxxflags
+  OUTPUT_VARIABLE LLVM_CXX_CONFIG_ALL_1
+  RESULT_VARIABLE LLVM_CXX_CONFIG_RESULT)
+
+string(REPLACE "\n" "" LLVM_CXX_CONFIG_ALL ${LLVM_CXX_CONFIG_ALL_1})
+string(REPLACE "-Woverloaded-virtual" "" LLVM_CXX_CONFIG_1 ${LLVM_CXX_CONFIG_ALL})
+string(REPLACE "-fvisibility-inlines-hidden" "" LLVM_CXX_CONFIG ${LLVM_CXX_CONFIG_1})
+message(STATUS "llvm-config cxxflags results")
+message(STATUS ${LLVM_CXX_CONFIG})
+
+# if(NOT "${LLVM_CXX_CONFIG_RESULT}" EQUAL "0")
+#   message(FATAL_ERROR "llvm-config failed: " ${LLVM_CXX_CONFIG_RESULT})
+# endif()
+
+
+set_target_properties(icd
+      PROPERTIES
+      COMPILE_FLAGS "${LLVM_CXX_CONFIG}")
+
+SET(COMPILER_LINK_DIRS
+    -L${PROJECT_SOURCE_DIR}/external/glslang/build/install/lib
+    -L${LUNARGLASS_PREFIX}/Core/LLVM/llvm-3.4/${BUILDTGT_DIR}/install/usr/local/lib
+    -L${LUNARGLASS_PREFIX}/${BUILDTGT_DIR}/Core
+    -L${LUNARGLASS_PREFIX}/${BUILDTGT_DIR}/Frontends/glslang
+    -L${LUNARGLASS_PREFIX}/${BUILDTGT_DIR}/Frontends/SPIRV
+    -L${LUNARGLASS_PREFIX}/${BUILDTGT_DIR}/Core/Passes/Transforms
+    -L${LUNARGLASS_PREFIX}/${BUILDTGT_DIR}/Core/Passes/Immutable
+    -L${LUNARGLASS_PREFIX}/${BUILDTGT_DIR}/Core/Passes/Analysis
+    -L${LUNARGLASS_PREFIX}/${BUILDTGT_DIR}/Core/Passes/Util
+)
+
+SET(COMPILER_LIBS
+   glslangFrontend
+   SpvFrontend
+   core
+   LLVMipo
+   glslang
+   SPIRV
+   SPVRemapper
+   HLSL
+   OGLCompiler
+   ${LLVM_LIBS_ALL}
+)
+
+# Ensure we re-link if our external libraries change
+# Every entry of COMPILER_LIBS should be included here
+string(REPLACE "-l" "" LLVM_LIBS ${LLVM_LIBS_ALL})
+separate_arguments(LLVM_LIBS)
+foreach(LIB ${LLVM_LIBS})
+    #message(STATUS "adding the following lib")
+    #message(STATUS ${LIB})
+    add_library(${LIB} STATIC IMPORTED)
+    #message(STATUS "adding the following path")
+    #message(STATUS ${LUNARGLASS_PREFIX}/Core/LLVM/llvm-3.4/${BUILDTGT_DIR}/install/usr/local/lib/lib${LIB}.a)
+    set_target_properties(${LIB} PROPERTIES IMPORTED_LOCATION ${LUNARGLASS_PREFIX}/Core/LLVM/llvm-3.4/${BUILDTGT_DIR}/install/usr/local/lib/lib${LIB}.a)
+endforeach(LIB)
+add_library(glslangFrontend STATIC IMPORTED)
+add_library(SpvFrontend     STATIC IMPORTED)
+add_library(core            STATIC IMPORTED)
+add_library(LLVMipo         STATIC IMPORTED)
+add_library(glslang         STATIC IMPORTED)
+add_library(SPIRV           STATIC IMPORTED)
+add_library(SPVRemapper     STATIC IMPORTED)
+add_library(HLSL            STATIC IMPORTED)
+add_library(OGLCompiler     STATIC IMPORTED)
+
+set_target_properties(glslangFrontend PROPERTIES IMPORTED_LOCATION ${LUNARGLASS_PREFIX}/${BUILDTGT_DIR}/Frontends/glslang/libglslangFrontend.a)
+set_target_properties(SpvFrontend     PROPERTIES IMPORTED_LOCATION ${LUNARGLASS_PREFIX}/${BUILDTGT_DIR}/Frontends/SPIRV/libSpvFrontend.a)
+set_target_properties(core            PROPERTIES IMPORTED_LOCATION ${LUNARGLASS_PREFIX}/${BUILDTGT_DIR}/Core/libcore.a)
+set_target_properties(LLVMipo         PROPERTIES IMPORTED_LOCATION ${LUNARGLASS_PREFIX}/Core/LLVM/llvm-3.4/${BUILDTGT_DIR}/install/usr/local/lib/libLLVMipo.a)
+set_target_properties(glslang         PROPERTIES IMPORTED_LOCATION ${GLSLANG_LIB})
+set_target_properties(SPIRV           PROPERTIES IMPORTED_LOCATION ${SPIRV_LIB})
+set_target_properties(SPVRemapper     PROPERTIES IMPORTED_LOCATION ${SPIRV_REMAPPER_LIB})
+set_target_properties(HLSL            PROPERTIES IMPORTED_LOCATION ${HLSL_LIB})
+set_target_properties(OGLCompiler     PROPERTIES IMPORTED_LOCATION ${OGLCompiler_LIB})
+
+SET(COMPILER_INCLUDE_DIRS
+    ${GLSLANG_SPIRV_INCLUDE_DIR}
+    ${LUNARGLASS_PREFIX}/Core/LLVM/llvm-3.4/${BUILDTGT_DIR}/install/usr/local/include
+    ${LUNARGLASS_PREFIX}
+    shader
+    pipeline
+    mesa-utils/include
+    mesa-utils/src
+    mesa-utils/src/glsl
+    mesa-utils/src/mesa
+    mesa-utils/src/mesa/program
+    mesa-utils/src/mapi
+)
+
+SET(CREATE_SHADER_SOURCES
+    shader/ast_array_index.cpp
+    shader/ast_expr.cpp
+    shader/ast_function.cpp
+    shader/ast_to_hir.cpp
+    shader/ast_type.cpp
+    shader/builtin_functions.cpp
+    shader/builtin_types.cpp
+    shader/builtin_variables.cpp
+    shader/ir.cpp
+    shader/ir_basic_block.cpp
+    shader/ir_builder.cpp
+    shader/ir_clone.cpp
+    shader/ir_constant_expression.cpp
+    shader/ir_deserializer.cpp
+    shader/ir_equals.cpp
+    shader/ir_expression_flattening.cpp
+    shader/ir_function_can_inline.cpp
+    shader/ir_function.cpp
+    shader/ir_function_detect_recursion.cpp
+    shader/ir_hierarchical_visitor.cpp
+    shader/ir_hv_accept.cpp
+    shader/ir_import_prototypes.cpp
+    shader/ir_print_visitor.cpp
+    shader/ir_reader.cpp
+    shader/ir_rvalue_visitor.cpp
+    shader/ir_serialize.cpp
+    shader/ir_set_program_inouts.cpp
+    shader/ir_validate.cpp
+    shader/ir_variable_refcount.cpp
+    shader/link_atomics.cpp
+    shader/linker.cpp
+    shader/link_functions.cpp
+    shader/link_interface_blocks.cpp
+    shader/link_uniform_block_active_visitor.cpp
+    shader/link_uniform_blocks.cpp
+    shader/link_uniform_initializers.cpp
+    shader/link_uniforms.cpp
+    shader/link_varyings.cpp
+    shader/loop_analysis.cpp
+    shader/loop_controls.cpp
+    shader/loop_unroll.cpp
+    shader/lower_clip_distance.cpp
+    shader/lower_discard.cpp
+    shader/lower_discard_flow.cpp
+    shader/lower_if_to_cond_assign.cpp
+    shader/lower_instructions.cpp
+    shader/lower_jumps.cpp
+    shader/lower_mat_op_to_vec.cpp
+    shader/lower_named_interface_blocks.cpp
+    shader/lower_noise.cpp
+    shader/lower_offset_array.cpp
+    shader/lower_output_reads.cpp
+    shader/lower_packed_varyings.cpp
+    shader/lower_packing_builtins.cpp
+    shader/lower_texture_projection.cpp
+    shader/lower_ubo_reference.cpp
+    shader/lower_variable_index_to_cond_assign.cpp
+    shader/lower_vec_index_to_cond_assign.cpp
+    shader/lower_vec_index_to_swizzle.cpp
+    shader/lower_vector.cpp
+    shader/lower_vector_insert.cpp
+    shader/opt_algebraic.cpp
+    shader/opt_array_splitting.cpp
+    shader/opt_constant_folding.cpp
+    shader/opt_constant_propagation.cpp
+    shader/opt_constant_variable.cpp
+    shader/opt_copy_propagation.cpp
+    shader/opt_copy_propagation_elements.cpp
+    shader/opt_cse.cpp
+    shader/opt_dead_builtin_varyings.cpp
+    shader/opt_dead_code.cpp
+    shader/opt_dead_code_local.cpp
+    shader/opt_dead_functions.cpp
+    shader/opt_flatten_nested_if_blocks.cpp
+    shader/opt_flip_matrices.cpp
+    shader/opt_function_inlining.cpp
+    shader/opt_if_simplification.cpp
+    shader/opt_noop_swizzle.cpp
+    shader/opt_redundant_jumps.cpp
+    shader/opt_structure_splitting.cpp
+    shader/opt_swizzle_swizzle.cpp
+    shader/opt_tree_grafting.cpp
+    shader/opt_vectorize.cpp
+    shader/s_expression.cpp
+#    shader/shader_deserialize.cpp
+#    shader/shader_serialize.cpp
+#    shader/standalone_scaffolding.cpp
+    shader/strtod.cpp
+
+    mesa-utils/src/glsl/ralloc.c
+    mesa-utils/src/mesa/program/program.c
+#    mesa-utils/src/mesa/program/prog_execute.c
+    # mesa-utils/src/mesa/program/prog_noise.c
+    mesa-utils/src/mesa/program/prog_statevars.c
+    # mesa-utils/src/mesa/program/prog_opt_constant_fold.c
+    mesa-utils/src/mesa/program/symbol_table.c
+#    mesa-utils/src/mesa/program/prog_cache.c
+    mesa-utils/src/mesa/program/prog_instruction.c
+    # mesa-utils/src/mesa/program/prog_optimize.c
+    # mesa-utils/src/mesa/program/arbprogparse.c
+    mesa-utils/src/mesa/program/prog_hash_table.c
+    mesa-utils/src/mesa/program/prog_parameter.c
+    # mesa-utils/src/mesa/program/prog_diskcache.c
+    # mesa-utils/src/mesa/program/program_parse.tab.c
+    # mesa-utils/src/mesa/program/programopt.c
+    # mesa-utils/src/mesa/program/prog_print.c
+    # mesa-utils/src/mesa/program/program_parse_extra.c
+    # mesa-utils/src/mesa/program/prog_parameter_layout.c
+     mesa-utils/src/mesa/program/register_allocate.c
+    # mesa-utils/src/mesa/math/m_matrix.c
+    # mesa-utils/src/mesa/main/enums.c
+    # mesa-utils/src/mesa/main/imports.c
+    mesa-utils/src/mesa/main/hash.c
+    mesa-utils/src/mesa/main/hash_table.c
+    # mesa-utils/src/mesa/main/errors.c
+    # mesa-utils/src/mesa/main/formats.c
+
+    mesa-utils/src/mesa/main/errors.c
+    # mesa-utils/src/mesa/main/context.c
+    mesa-utils/src/mesa/main/enums.c
+    mesa-utils/src/mesa/main/imports.c
+    mesa-utils/src/mesa/main/version.c
+    mesa-utils/src/mesa/main/uniforms.c
+
+    #mesa-utils/src/mesa/main/shaderobj.c
+
+    mesa-utils/src/mesa/program/sampler.cpp
+
+    shader/glsl_glass_manager.cpp
+    shader/glsl_glass_backend_translator.cpp
+    shader/glsl_glass_backend.cpp
+
+    shader/glsl_parser_extras.cpp
+    shader/standalone_scaffolding.cpp
+    shader/glsl_types.cpp
+    shader/glsl_symbol_table.cpp
+    shader/hir_field_selection.cpp
+
+)
+
+
+SET(CREATE_PIPELINE_SOURCES
+    # File required for backend compiler
+    pipeline/brw_blorp_blit_eu.cpp
+    pipeline/brw_shader.cpp
+    pipeline/brw_fs.cpp
+    pipeline/brw_fs_visitor.cpp
+    pipeline/brw_fs_live_variables.cpp
+    pipeline/brw_cfg.cpp
+    pipeline/brw_fs_cse.cpp
+    pipeline/brw_fs_copy_propagation.cpp
+    pipeline/brw_fs_peephole_predicated_break.cpp
+    pipeline/brw_fs_dead_code_eliminate.cpp
+    pipeline/brw_fs_sel_peephole.cpp
+    pipeline/brw_dead_control_flow.cpp
+    pipeline/brw_fs_saturate_propagation.cpp
+    pipeline/brw_fs_register_coalesce.cpp
+    pipeline/brw_schedule_instructions.cpp
+    pipeline/brw_fs_reg_allocate.cpp
+    pipeline/brw_fs_generator.cpp
+    pipeline/brw_lower_texture_gradients.cpp
+    pipeline/brw_cubemap_normalize.cpp
+    pipeline/brw_lower_unnormalized_offset.cpp
+    pipeline/brw_fs_channel_expressions.cpp
+    pipeline/brw_fs_vector_splitting.cpp
+
+    pipeline/brw_disasm.c
+    pipeline/brw_device_info.c
+    pipeline/brw_eu.c
+    pipeline/brw_program.c
+    pipeline/brw_wm.c
+    pipeline/brw_eu_emit.c
+    pipeline/brw_eu_compact.c
+    pipeline/intel_debug.c
+
+    pipeline/brw_vs.c
+    pipeline/brw_vec4.cpp
+    pipeline/brw_vec4_visitor.cpp
+    pipeline/brw_vec4_vs_visitor.cpp
+    pipeline/brw_vec4_live_variables.cpp
+    pipeline/brw_vec4_copy_propagation.cpp
+    pipeline/brw_vec4_reg_allocate.cpp
+    pipeline/brw_vec4_generator.cpp
+
+    pipeline/brw_vec4_gs.c
+    pipeline/brw_vec4_gs_visitor.cpp
+
+    # pipeline/gen8_vec4_generator.cpp
+    )
+
+set_source_files_properties(
+    shader/glsl_glass_manager.cpp
+    shader/glsl_glass_backend_translator.cpp
+    shader/glsl_glass_backend.cpp
+    shader/glsl_parser_extras.cpp
+    PROPERTIES COMPILE_FLAGS "-Wno-unknown-pragmas -Wno-ignored-qualifiers")
+
+# needed by libglslang.a
+add_library(intelcompiler-os STATIC shader/ossource.cpp)
+target_include_directories(intelcompiler-os PRIVATE ${COMPILER_INCLUDE_DIRS})
+set_target_properties(intelcompiler-os PROPERTIES POSITION_INDEPENDENT_CODE ON)
+
+add_library(intelcompiler STATIC
+            shader/compiler_interface.cpp
+            pipeline/pipeline_compiler_interface.cpp
+            pipeline/pipeline_compiler_interface_meta.cpp
+            ${CREATE_SHADER_SOURCES}
+            ${CREATE_PIPELINE_SOURCES}
+)
+
+target_include_directories(intelcompiler
+    PRIVATE ${COMPILER_INCLUDE_DIRS}
+    PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/..)
+
+target_link_libraries(intelcompiler
+    icd
+    ${COMPILER_LINK_DIRS}
+    ${COMPILER_LIBS}
+    intelcompiler-os
+    m
+    pthread
+    dl
+)
+
+set_target_properties(intelcompiler PROPERTIES POSITION_INDEPENDENT_CODE ON)
+
+# The following generates the standalone_compiler executable
+
+SET(STANDALONE_COMPILER_SOURCES
+    shader/main.cpp
+    shader/standalone_utils.c
+    shader/compiler_interface.cpp
+    pipeline/pipeline_compiler_interface.cpp
+    ${CREATE_SHADER_SOURCES}
+    ${CREATE_PIPELINE_SOURCES}
+)
+
+add_executable(standalone_compiler
+   ${STANDALONE_COMPILER_SOURCES}
+)
+
+target_include_directories(standalone_compiler
+    PRIVATE ${COMPILER_INCLUDE_DIRS}
+    PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/..
+)
+
+# This define is used to turn off driver portions of the compiler/pipeline interface
+target_compile_definitions(standalone_compiler PRIVATE "-DSTANDALONE_SHADER_COMPILER")
+
+target_link_libraries(standalone_compiler
+    icd
+    ${COMPILER_LINK_DIRS}
+    ${COMPILER_LIBS}
+    intelcompiler-os
+    m
+    pthread
+    dl
+)
diff --git a/icd/intel/compiler/README.md b/icd/intel/compiler/README.md
new file mode 100644
index 0000000..d4cd9e6
--- /dev/null
+++ b/icd/intel/compiler/README.md
@@ -0,0 +1,32 @@
+# Sample BIL to Intel ISA Compiler
+
+This compiler stack was brought over from the GlassyMesa driver LunarG created for Valve.
+It uses the following tools:
+- BIL support and the LunarGLASS middle-end optimizer (pulled in via the
+[update_external_sources.sh](../../../update_external_sources.sh) script)
+(mesa-utils/src/glsl)
+- [GlassyMesa's GLSLIR and supporting infrastructure](shader)
+- [GlassyMesa's DRI i965 backend](pipeline)
+
+For vkCreateShader, we primarily used the existing standalone, device-independent front end, which can consume GLSL or BIL and produces a separately linked shader object.
+
+For vkCreateGraphicsPipeline, we pulled over only the files needed to lower the shader object to ISA and its supporting metadata.  Much of the i965 DRI driver was removed or commented out for future use; the backend is still being actively bootstrapped.
+
+Currently only Vertex and Fragment shaders are supported.  Any shader that fits within the IO parameters tested in compiler_render_tests.cpp should work.  Buffers with bindings, samplers with bindings, and interstage IO with locations are all working.  Vertex input locations work if they are sequential and start from 0.  Fragment output locations only work for location 0.
+
+We recommend using only buffers with bindings for uniforms; avoid global, non-block uniforms.
+
+Below are the design decisions we made to get this stack working with the currently specified VK and BIL.  We know these are active areas of discussion, and we'll update when decisions are made:
+- Samplers:
+  - GLSL sampler bindings equate to a sampler/texture pair of the same number, as set up by the VK application.  i.e. the following sampler:
+```
+    layout (binding = 2) uniform sampler2D surface;
+```
+will read from VK_SLOT_SHADER_SAMPLER entity 2 and VK_SLOT_SHADER_RESOURCE entity 2.
+
+- Buffers:
+  - GLSL buffer bindings equate to the buffer bound at the same slot. i.e. the following uniform buffer:
+```
+    layout (std140, binding = 2) uniform foo { vec4 bar; } myBuffer;
+```
+will be read from VK_SHADER_RESOURCE entity 2.
diff --git a/icd/intel/compiler/mesa-utils/include/c11/threads.h b/icd/intel/compiler/mesa-utils/include/c11/threads.h
new file mode 100644
index 0000000..45823df
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/include/c11/threads.h
@@ -0,0 +1,79 @@
+/*
+ * C11 <threads.h> emulation library
+ *
+ * (C) Copyright yohhoy 2012.
+ * Distributed under the Boost Software License, Version 1.0.
+ *
+ * Permission is hereby granted, free of charge, to any person or organization
+ * obtaining a copy of the software and accompanying documentation covered by
+ * this license (the "Software") to use, reproduce, display, distribute,
+ * execute, and transmit the Software, and to prepare [[derivative work]]s of the
+ * Software, and to permit third-parties to whom the Software is furnished to
+ * do so, all subject to the following:
+ *
+ * The copyright notices in the Software and this entire statement, including
+ * the above license grant, this restriction and the following disclaimer,
+ * must be included in all copies of the Software, in whole or in part, and
+ * all derivative works of the Software, unless such copies or derivative
+ * works are solely in the form of machine-executable object code generated by
+ * a source language processor.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
+ * SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
+ * FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#ifndef EMULATED_THREADS_H_INCLUDED_
+#define EMULATED_THREADS_H_INCLUDED_
+
+#include <time.h>
+
+#ifndef TIME_UTC
+#define TIME_UTC 1
+#endif
+
+#include "c99_compat.h" /* for `inline` */
+
+/*---------------------------- types ----------------------------*/
+typedef void (*tss_dtor_t)(void*);
+typedef int (*thrd_start_t)(void*);
+
+struct xtime {
+    time_t sec;
+    long nsec;
+};
+typedef struct xtime xtime;
+
+
+/*-------------------- enumeration constants --------------------*/
+enum {
+    mtx_plain     = 0,
+    mtx_try       = 1,
+    mtx_timed     = 2,
+    mtx_recursive = 4
+};
+
+enum {
+    thrd_success = 0, // succeeded
+    thrd_timeout,     // timeout
+    thrd_error,       // failed
+    thrd_busy,        // resource busy
+    thrd_nomem        // out of memory
+};
+
+/*-------------------------- functions --------------------------*/
+
+#if defined(_WIN32) && !defined(__CYGWIN__)
+#include "threads_win32.h"
+#elif defined(HAVE_PTHREAD)
+#include "threads_posix.h"
+#else
+#error Not supported on this platform.
+#endif
+
+
+
+#endif /* EMULATED_THREADS_H_INCLUDED_ */
diff --git a/icd/intel/compiler/mesa-utils/include/c11/threads_posix.h b/icd/intel/compiler/mesa-utils/include/c11/threads_posix.h
new file mode 100644
index 0000000..f9c165d
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/include/c11/threads_posix.h
@@ -0,0 +1,375 @@
+/*
+ * C11 <threads.h> emulation library
+ *
+ * (C) Copyright yohhoy 2012.
+ * Distributed under the Boost Software License, Version 1.0.
+ *
+ * Permission is hereby granted, free of charge, to any person or organization
+ * obtaining a copy of the software and accompanying documentation covered by
+ * this license (the "Software") to use, reproduce, display, distribute,
+ * execute, and transmit the Software, and to prepare [[derivative work]]s of the
+ * Software, and to permit third-parties to whom the Software is furnished to
+ * do so, all subject to the following:
+ *
+ * The copyright notices in the Software and this entire statement, including
+ * the above license grant, this restriction and the following disclaimer,
+ * must be included in all copies of the Software, in whole or in part, and
+ * all derivative works of the Software, unless such copies or derivative
+ * works are solely in the form of machine-executable object code generated by
+ * a source language processor.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
+ * SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
+ * FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <stdlib.h>
+#ifndef assert
+#include <assert.h>
+#endif
+#include <limits.h>
+#include <errno.h>
+#include <unistd.h>
+#include <sched.h>
+#include <stdint.h> /* for intptr_t */
+
+/*
+Configuration macro:
+
+  EMULATED_THREADS_USE_NATIVE_TIMEDLOCK
+    Use pthread_mutex_timedlock() for `mtx_timedlock()'
+    Otherwise use mtx_trylock() + *busy loop* emulation.
+*/
+#if !defined(__CYGWIN__) && !defined(__APPLE__) && !defined(__NetBSD__)
+#define EMULATED_THREADS_USE_NATIVE_TIMEDLOCK
+#endif
+
+
+#include <pthread.h>
+
+/*---------------------------- macros ----------------------------*/
+#define ONCE_FLAG_INIT PTHREAD_ONCE_INIT
+#ifdef PTHREAD_DESTRUCTOR_ITERATIONS
+#define TSS_DTOR_ITERATIONS PTHREAD_DESTRUCTOR_ITERATIONS
+#else
+#define TSS_DTOR_ITERATIONS 1  // assume TSS dtor MAY be called at least once.
+#endif
+
+// FIXME: temporary non-standard hack to ease transition
+#define _MTX_INITIALIZER_NP PTHREAD_MUTEX_INITIALIZER
+
+/*---------------------------- types ----------------------------*/
+typedef pthread_cond_t  cnd_t;
+typedef pthread_t       thrd_t;
+typedef pthread_key_t   tss_t;
+typedef pthread_mutex_t mtx_t;
+typedef pthread_once_t  once_flag;
+
+
+/*
+Implementation limits:
+  - Conditionally emulation for "mutex with timeout"
+    (see EMULATED_THREADS_USE_NATIVE_TIMEDLOCK macro)
+*/
+struct impl_thrd_param {
+    thrd_start_t func;
+    void *arg;
+};
+
+static inline void *
+impl_thrd_routine(void *p)
+{
+    struct impl_thrd_param pack = *((struct impl_thrd_param *)p);
+    free(p);
+    return (void*)(intptr_t)pack.func(pack.arg);
+}
+
+
+/*--------------- 7.25.2 Initialization functions ---------------*/
+// 7.25.2.1
+static inline void
+call_once(once_flag *flag, void (*func)(void))
+{
+    pthread_once(flag, func);
+}
+
+
+/*------------- 7.25.3 Condition variable functions -------------*/
+// 7.25.3.1
+static inline int
+cnd_broadcast(cnd_t *cond)
+{
+    if (!cond) return thrd_error;
+    pthread_cond_broadcast(cond);
+    return thrd_success;
+}
+
+// 7.25.3.2
+static inline void
+cnd_destroy(cnd_t *cond)
+{
+    assert(cond);
+    pthread_cond_destroy(cond);
+}
+
+// 7.25.3.3
+static inline int
+cnd_init(cnd_t *cond)
+{
+    if (!cond) return thrd_error;
+    pthread_cond_init(cond, NULL);
+    return thrd_success;
+}
+
+// 7.25.3.4
+static inline int
+cnd_signal(cnd_t *cond)
+{
+    if (!cond) return thrd_error;
+    pthread_cond_signal(cond);
+    return thrd_success;
+}
+
+// 7.25.3.5
+static inline int
+cnd_timedwait(cnd_t *cond, mtx_t *mtx, const xtime *xt)
+{
+    struct timespec abs_time;
+    int rt;
+    if (!cond || !mtx || !xt) return thrd_error;
+    abs_time.tv_sec = xt->sec;
+    abs_time.tv_nsec = xt->nsec;
+    rt = pthread_cond_timedwait(cond, mtx, &abs_time);
+    if (rt == ETIMEDOUT)
+        return thrd_busy;
+    return (rt == 0) ? thrd_success : thrd_error;
+}
+
+// 7.25.3.6
+static inline int
+cnd_wait(cnd_t *cond, mtx_t *mtx)
+{
+    if (!cond || !mtx) return thrd_error;
+    pthread_cond_wait(cond, mtx);
+    return thrd_success;
+}
+
+
+/*-------------------- 7.25.4 Mutex functions --------------------*/
+// 7.25.4.1
+static inline void
+mtx_destroy(mtx_t *mtx)
+{
+    assert(mtx);
+    pthread_mutex_destroy(mtx);
+}
+
+// 7.25.4.2
+static inline int
+mtx_init(mtx_t *mtx, int type)
+{
+    pthread_mutexattr_t attr;
+    if (!mtx) return thrd_error;
+    if (type != mtx_plain && type != mtx_timed && type != mtx_try
+      && type != (mtx_plain|mtx_recursive)
+      && type != (mtx_timed|mtx_recursive)
+      && type != (mtx_try|mtx_recursive))
+        return thrd_error;
+    pthread_mutexattr_init(&attr);
+    if ((type & mtx_recursive) != 0) {
+#if defined(__linux__) || defined(__linux)
+        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE_NP);
+#else
+        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
+#endif
+    }
+    pthread_mutex_init(mtx, &attr);
+    pthread_mutexattr_destroy(&attr);
+    return thrd_success;
+}
+
+// 7.25.4.3
+static inline int
+mtx_lock(mtx_t *mtx)
+{
+    if (!mtx) return thrd_error;
+    pthread_mutex_lock(mtx);
+    return thrd_success;
+}
+
+static inline int
+mtx_trylock(mtx_t *mtx);
+
+static inline void
+thrd_yield(void);
+
+// 7.25.4.4
+static inline int
+mtx_timedlock(mtx_t *mtx, const xtime *xt)
+{
+    if (!mtx || !xt) return thrd_error;
+    {
+#ifdef EMULATED_THREADS_USE_NATIVE_TIMEDLOCK
+    struct timespec ts;
+    int rt;
+    ts.tv_sec = xt->sec;
+    ts.tv_nsec = xt->nsec;
+    rt = pthread_mutex_timedlock(mtx, &ts);
+    if (rt == 0)
+        return thrd_success;
+    return (rt == ETIMEDOUT) ? thrd_busy : thrd_error;
+#else
+    time_t expire = time(NULL);
+    expire += xt->sec;
+    while (mtx_trylock(mtx) != thrd_success) {
+        time_t now = time(NULL);
+        if (expire < now)
+            return thrd_busy;
+        // busy loop!
+        thrd_yield();
+    }
+    return thrd_success;
+#endif
+    }
+}
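+
+/*
+ * Usage sketch: attempt to take a mutex for up to two seconds using the
+ * xtime helpers from this emulation layer:
+ *
+ *     xtime xt;
+ *     xtime_get(&xt, TIME_UTC);
+ *     xt.sec += 2;
+ *     if (mtx_timedlock(&mtx, &xt) == thrd_success) { ... }
+ */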
+
+// 7.25.4.5
+static inline int
+mtx_trylock(mtx_t *mtx)
+{
+    if (!mtx) return thrd_error;
+    return (pthread_mutex_trylock(mtx) == 0) ? thrd_success : thrd_busy;
+}
+
+// 7.25.4.6
+static inline int
+mtx_unlock(mtx_t *mtx)
+{
+    if (!mtx) return thrd_error;
+    pthread_mutex_unlock(mtx);
+    return thrd_success;
+}
+
+
+/*------------------- 7.25.5 Thread functions -------------------*/
+// 7.25.5.1
+static inline int
+thrd_create(thrd_t *thr, thrd_start_t func, void *arg)
+{
+    struct impl_thrd_param *pack;
+    if (!thr) return thrd_error;
+    pack = (struct impl_thrd_param *)malloc(sizeof(struct impl_thrd_param));
+    if (!pack) return thrd_nomem;
+    pack->func = func;
+    pack->arg = arg;
+    if (pthread_create(thr, NULL, impl_thrd_routine, pack) != 0) {
+        free(pack);
+        return thrd_error;
+    }
+    return thrd_success;
+}
+
+// 7.25.5.2
+static inline thrd_t
+thrd_current(void)
+{
+    return pthread_self();
+}
+
+// 7.25.5.3
+static inline int
+thrd_detach(thrd_t thr)
+{
+    return (pthread_detach(thr) == 0) ? thrd_success : thrd_error;
+}
+
+// 7.25.5.4
+static inline int
+thrd_equal(thrd_t thr0, thrd_t thr1)
+{
+    return pthread_equal(thr0, thr1);
+}
+
+// 7.25.5.5
+static inline void
+thrd_exit(int res)
+{
+    pthread_exit((void*)(intptr_t)res);
+}
+
+// 7.25.5.6
+static inline int
+thrd_join(thrd_t thr, int *res)
+{
+    void *code;
+    if (pthread_join(thr, &code) != 0)
+        return thrd_error;
+    if (res)
+        *res = (int)(intptr_t)code;
+    return thrd_success;
+}
+
+// 7.25.5.7
+static inline void
+thrd_sleep(const xtime *xt)
+{
+    struct timespec req;
+    assert(xt);
+    req.tv_sec = xt->sec;
+    req.tv_nsec = xt->nsec;
+    nanosleep(&req, NULL);
+}
+
+// 7.25.5.8
+static inline void
+thrd_yield(void)
+{
+    sched_yield();
+}
+
+
+/*----------- 7.25.6 Thread-specific storage functions -----------*/
+// 7.25.6.1
+static inline int
+tss_create(tss_t *key, tss_dtor_t dtor)
+{
+    if (!key) return thrd_error;
+    return (pthread_key_create(key, dtor) == 0) ? thrd_success : thrd_error;
+}
+
+// 7.25.6.2
+static inline void
+tss_delete(tss_t key)
+{
+    pthread_key_delete(key);
+}
+
+// 7.25.6.3
+static inline void *
+tss_get(tss_t key)
+{
+    return pthread_getspecific(key);
+}
+
+// 7.25.6.4
+static inline int
+tss_set(tss_t key, void *val)
+{
+    return (pthread_setspecific(key, val) == 0) ? thrd_success : thrd_error;
+}
+
+
+/*-------------------- 7.25.7 Time functions --------------------*/
+// 7.25.7.1
+static inline int
+xtime_get(xtime *xt, int base)
+{
+    if (!xt) return 0;
+    if (base == TIME_UTC) {
+        xt->sec = time(NULL);
+        xt->nsec = 0;
+        return base;
+    }
+    return 0;
+}
diff --git a/icd/intel/compiler/mesa-utils/include/c99_compat.h b/icd/intel/compiler/mesa-utils/include/c99_compat.h
new file mode 100644
index 0000000..429c601
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/include/c99_compat.h
@@ -0,0 +1,145 @@
+/**************************************************************************
+ *
+ * Copyright 2007-2013 VMware, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **************************************************************************/
+
+#ifndef _C99_COMPAT_H_
+#define _C99_COMPAT_H_
+
+
+/*
+ * MSVC hacks.
+ */
+#if defined(_MSC_VER)
+   /*
+    * Visual Studio 2012 will complain if we define the `inline` keyword, but
+    * actually it only supports the keyword on C++.
+    *
+    * To avoid this the _ALLOW_KEYWORD_MACROS must be set.
+    */
+#  if (_MSC_VER >= 1700) && !defined(_ALLOW_KEYWORD_MACROS)
+#    define _ALLOW_KEYWORD_MACROS
+#  endif
+
+   /*
+    * XXX: MSVC has a `__restrict` keyword, but it also has a
+    * `__declspec(restrict)` modifier, so it is impossible to define a
+    * `restrict` macro without interfering with the latter.  Furthermore the
+    * MSVC standard library uses __declspec(restrict) under the _CRTRESTRICT
+    * macro.  For now resolve this issue by redefining _CRTRESTRICT, but going
+    * forward we should probably stop using restrict, especially
+    * considering that our code does not obey strict aliasing rules anyway.
+    */
+#  include <crtdefs.h>
+#  undef _CRTRESTRICT
+#  define _CRTRESTRICT
+#endif
+
+
+/*
+ * C99 inline keyword
+ */
+#ifndef inline
+#  ifdef __cplusplus
+     /* C++ supports inline keyword */
+#  elif defined(__GNUC__)
+#    define inline __inline__
+#  elif defined(_MSC_VER)
+#    define inline __inline
+#  elif defined(__ICL)
+#    define inline __inline
+#  elif defined(__INTEL_COMPILER)
+     /* Intel compiler supports inline keyword */
+#  elif defined(__WATCOMC__) && (__WATCOMC__ >= 1100)
+#    define inline __inline
+#  elif defined(__SUNPRO_C) && defined(__C99FEATURES__)
+     /* C99 supports inline keyword */
+#  elif (__STDC_VERSION__ >= 199901L)
+     /* C99 supports inline keyword */
+#  else
+#    define inline
+#  endif
+#endif
+
+
+/*
+ * C99 restrict keyword
+ *
+ * See also:
+ * - http://cellperformance.beyond3d.com/articles/2006/05/demystifying-the-restrict-keyword.html
+ */
+#ifndef restrict
+#  if (__STDC_VERSION__ >= 199901L)
+     /* C99 */
+#  elif defined(__SUNPRO_C) && defined(__C99FEATURES__)
+     /* C99 */
+#  elif defined(__GNUC__)
+#    define restrict __restrict__
+#  elif defined(_MSC_VER)
+#    define restrict __restrict
+#  else
+#    define restrict /* */
+#  endif
+#endif
+
+
+/*
+ * C99 __func__ macro
+ */
+#ifndef __func__
+#  if (__STDC_VERSION__ >= 199901L)
+     /* C99 */
+#  elif defined(__SUNPRO_C) && defined(__C99FEATURES__)
+     /* C99 */
+#  elif defined(__GNUC__)
+#    if __GNUC__ >= 2
+#      define __func__ __FUNCTION__
+#    else
+#      define __func__ "<unknown>"
+#    endif
+#  elif defined(_MSC_VER)
+#    if _MSC_VER >= 1300
+#      define __func__ __FUNCTION__
+#    else
+#      define __func__ "<unknown>"
+#    endif
+#  else
+#    define __func__ "<unknown>"
+#  endif
+#endif
+
+
+/* Simple test case for debugging */
+#if 0
+static inline const char *
+test_c99_compat_h(const void * restrict a,
+                  const void * restrict b)
+{
+   return __func__;
+}
+#endif
+
+
+#endif /* _C99_COMPAT_H_ */
diff --git a/icd/intel/compiler/mesa-utils/include/pci_ids/i965_pci_ids.h b/icd/intel/compiler/mesa-utils/include/pci_ids/i965_pci_ids.h
new file mode 100644
index 0000000..2e04301
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/include/pci_ids/i965_pci_ids.h
@@ -0,0 +1,115 @@
+CHIPSET(0x29A2, i965,    "Intel(R) 965G")
+CHIPSET(0x2992, i965,    "Intel(R) 965Q")
+CHIPSET(0x2982, i965,    "Intel(R) 965G")
+CHIPSET(0x2972, i965,    "Intel(R) 946GZ")
+CHIPSET(0x2A02, i965,    "Intel(R) 965GM")
+CHIPSET(0x2A12, i965,    "Intel(R) 965GME/GLE")
+CHIPSET(0x2A42, g4x,     "Mobile Intel® GM45 Express Chipset")
+CHIPSET(0x2E02, g4x,     "Intel(R) Integrated Graphics Device")
+CHIPSET(0x2E12, g4x,     "Intel(R) Q45/Q43")
+CHIPSET(0x2E22, g4x,     "Intel(R) G45/G43")
+CHIPSET(0x2E32, g4x,     "Intel(R) G41")
+CHIPSET(0x2E42, g4x,     "Intel(R) B43")
+CHIPSET(0x2E92, g4x,     "Intel(R) B43")
+CHIPSET(0x0042, ilk,     "Intel(R) Ironlake Desktop")
+CHIPSET(0x0046, ilk,     "Intel(R) Ironlake Mobile")
+CHIPSET(0x0102, snb_gt1, "Intel(R) Sandybridge Desktop")
+CHIPSET(0x0112, snb_gt2, "Intel(R) Sandybridge Desktop")
+CHIPSET(0x0122, snb_gt2, "Intel(R) Sandybridge Desktop")
+CHIPSET(0x0106, snb_gt1, "Intel(R) Sandybridge Mobile")
+CHIPSET(0x0116, snb_gt2, "Intel(R) Sandybridge Mobile")
+CHIPSET(0x0126, snb_gt2, "Intel(R) Sandybridge Mobile")
+CHIPSET(0x010A, snb_gt1, "Intel(R) Sandybridge Server")
+CHIPSET(0x0152, ivb_gt1, "Intel(R) Ivybridge Desktop")
+CHIPSET(0x0162, ivb_gt2, "Intel(R) Ivybridge Desktop")
+CHIPSET(0x0156, ivb_gt1, "Intel(R) Ivybridge Mobile")
+CHIPSET(0x0166, ivb_gt2, "Intel(R) Ivybridge Mobile")
+CHIPSET(0x015a, ivb_gt1, "Intel(R) Ivybridge Server")
+CHIPSET(0x016a, ivb_gt2, "Intel(R) Ivybridge Server")
+CHIPSET(0x0402, hsw_gt1, "Intel(R) Haswell Desktop")
+CHIPSET(0x0412, hsw_gt2, "Intel(R) Haswell Desktop")
+CHIPSET(0x0422, hsw_gt3, "Intel(R) Haswell Desktop")
+CHIPSET(0x0406, hsw_gt1, "Intel(R) Haswell Mobile")
+CHIPSET(0x0416, hsw_gt2, "Intel(R) Haswell Mobile")
+CHIPSET(0x0426, hsw_gt3, "Intel(R) Haswell Mobile")
+CHIPSET(0x040A, hsw_gt1, "Intel(R) Haswell Server")
+CHIPSET(0x041A, hsw_gt2, "Intel(R) Haswell Server")
+CHIPSET(0x042A, hsw_gt3, "Intel(R) Haswell Server")
+CHIPSET(0x040B, hsw_gt1, "Intel(R) Haswell")
+CHIPSET(0x041B, hsw_gt2, "Intel(R) Haswell")
+CHIPSET(0x042B, hsw_gt3, "Intel(R) Haswell")
+CHIPSET(0x040E, hsw_gt1, "Intel(R) Haswell")
+CHIPSET(0x041E, hsw_gt2, "Intel(R) Haswell")
+CHIPSET(0x042E, hsw_gt3, "Intel(R) Haswell")
+CHIPSET(0x0C02, hsw_gt1, "Intel(R) Haswell Desktop")
+CHIPSET(0x0C12, hsw_gt2, "Intel(R) Haswell Desktop")
+CHIPSET(0x0C22, hsw_gt3, "Intel(R) Haswell Desktop")
+CHIPSET(0x0C06, hsw_gt1, "Intel(R) Haswell Mobile")
+CHIPSET(0x0C16, hsw_gt2, "Intel(R) Haswell Mobile")
+CHIPSET(0x0C26, hsw_gt3, "Intel(R) Haswell Mobile")
+CHIPSET(0x0C0A, hsw_gt1, "Intel(R) Haswell Server")
+CHIPSET(0x0C1A, hsw_gt2, "Intel(R) Haswell Server")
+CHIPSET(0x0C2A, hsw_gt3, "Intel(R) Haswell Server")
+CHIPSET(0x0C0B, hsw_gt1, "Intel(R) Haswell")
+CHIPSET(0x0C1B, hsw_gt2, "Intel(R) Haswell")
+CHIPSET(0x0C2B, hsw_gt3, "Intel(R) Haswell")
+CHIPSET(0x0C0E, hsw_gt1, "Intel(R) Haswell")
+CHIPSET(0x0C1E, hsw_gt2, "Intel(R) Haswell")
+CHIPSET(0x0C2E, hsw_gt3, "Intel(R) Haswell")
+CHIPSET(0x0A02, hsw_gt1, "Intel(R) Haswell Desktop")
+CHIPSET(0x0A12, hsw_gt2, "Intel(R) Haswell Desktop")
+CHIPSET(0x0A22, hsw_gt3, "Intel(R) Haswell Desktop")
+CHIPSET(0x0A06, hsw_gt1, "Intel(R) Haswell Mobile")
+CHIPSET(0x0A16, hsw_gt2, "Intel(R) Haswell Mobile")
+CHIPSET(0x0A26, hsw_gt3, "Intel(R) Haswell Mobile")
+CHIPSET(0x0A0A, hsw_gt1, "Intel(R) Haswell Server")
+CHIPSET(0x0A1A, hsw_gt2, "Intel(R) Haswell Server")
+CHIPSET(0x0A2A, hsw_gt3, "Intel(R) Haswell Server")
+CHIPSET(0x0A0B, hsw_gt1, "Intel(R) Haswell")
+CHIPSET(0x0A1B, hsw_gt2, "Intel(R) Haswell")
+CHIPSET(0x0A2B, hsw_gt3, "Intel(R) Haswell")
+CHIPSET(0x0A0E, hsw_gt1, "Intel(R) Haswell")
+CHIPSET(0x0A1E, hsw_gt2, "Intel(R) Haswell")
+CHIPSET(0x0A2E, hsw_gt3, "Intel(R) Haswell")
+CHIPSET(0x0D02, hsw_gt1, "Intel(R) Haswell Desktop")
+CHIPSET(0x0D12, hsw_gt2, "Intel(R) Haswell Desktop")
+CHIPSET(0x0D22, hsw_gt3, "Intel(R) Haswell Desktop")
+CHIPSET(0x0D06, hsw_gt1, "Intel(R) Haswell Mobile")
+CHIPSET(0x0D16, hsw_gt2, "Intel(R) Haswell Mobile")
+CHIPSET(0x0D26, hsw_gt3, "Intel(R) Haswell Mobile")
+CHIPSET(0x0D0A, hsw_gt1, "Intel(R) Haswell Server")
+CHIPSET(0x0D1A, hsw_gt2, "Intel(R) Haswell Server")
+CHIPSET(0x0D2A, hsw_gt3, "Intel(R) Haswell Server")
+CHIPSET(0x0D0B, hsw_gt1, "Intel(R) Haswell")
+CHIPSET(0x0D1B, hsw_gt2, "Intel(R) Haswell")
+CHIPSET(0x0D2B, hsw_gt3, "Intel(R) Haswell")
+CHIPSET(0x0D0E, hsw_gt1, "Intel(R) Haswell")
+CHIPSET(0x0D1E, hsw_gt2, "Intel(R) Haswell")
+CHIPSET(0x0D2E, hsw_gt3, "Intel(R) Haswell")
+CHIPSET(0x0F31, byt,     "Intel(R) Bay Trail")
+CHIPSET(0x0F32, byt,     "Intel(R) Bay Trail")
+CHIPSET(0x0F33, byt,     "Intel(R) Bay Trail")
+CHIPSET(0x0157, byt,     "Intel(R) Bay Trail")
+CHIPSET(0x0155, byt,     "Intel(R) Bay Trail")
+CHIPSET(0x1602, bdw_gt1, "Intel(R) Broadwell GT1")
+CHIPSET(0x1606, bdw_gt1, "Intel(R) Broadwell GT1")
+CHIPSET(0x160A, bdw_gt1, "Intel(R) Broadwell GT1")
+CHIPSET(0x160B, bdw_gt1, "Intel(R) Broadwell GT1")
+CHIPSET(0x160D, bdw_gt1, "Intel(R) Broadwell GT1")
+CHIPSET(0x160E, bdw_gt1, "Intel(R) Broadwell GT1")
+CHIPSET(0x1612, bdw_gt2, "Intel(R) HD Graphics 5600 (Broadwell GT2)")
+CHIPSET(0x1616, bdw_gt2, "Intel(R) HD Graphics 5500 (Broadwell GT2)")
+CHIPSET(0x161A, bdw_gt2, "Intel(R) Broadwell GT2")
+CHIPSET(0x161B, bdw_gt2, "Intel(R) Broadwell GT2")
+CHIPSET(0x161D, bdw_gt2, "Intel(R) Broadwell GT2")
+CHIPSET(0x161E, bdw_gt2, "Intel(R) HD Graphics 5300 (Broadwell GT2)")
+CHIPSET(0x1622, bdw_gt3, "Intel(R) Iris Pro 6200 (Broadwell GT3e)")
+CHIPSET(0x1626, bdw_gt3, "Intel(R) HD Graphics 6000 (Broadwell GT3)")
+CHIPSET(0x162A, bdw_gt3, "Intel(R) Iris Pro P6300 (Broadwell GT3e)")
+CHIPSET(0x162B, bdw_gt3, "Intel(R) Iris 6100 (Broadwell GT3)")
+CHIPSET(0x162D, bdw_gt3, "Intel(R) Broadwell GT3")
+CHIPSET(0x162E, bdw_gt3, "Intel(R) Broadwell GT3")
+CHIPSET(0x22B0, chv,     "Intel(R) Cherryview")
+CHIPSET(0x22B1, chv,     "Intel(R) Cherryview")
+CHIPSET(0x22B2, chv,     "Intel(R) Cherryview")
+CHIPSET(0x22B3, chv,     "Intel(R) Cherryview")
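+
+/* Illustrative sketch of how this X-macro table is typically consumed (the
+ * CHIPSET() macro itself is defined by the including file; the expansion
+ * shown below is hypothetical):
+ *
+ *   #define CHIPSET(id, family, name) case id: return name;
+ *   #include "pci_ids/i965_pci_ids.h"
+ *   #undef CHIPSET
+ */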
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/builtin_type_macros.h b/icd/intel/compiler/mesa-utils/src/glsl/builtin_type_macros.h
new file mode 100644
index 0000000..236e1ce
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/builtin_type_macros.h
@@ -0,0 +1,156 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file builtin_type_macros.h
+ *
+ * This contains definitions for all GLSL built-in types, regardless of what
+ * language version or extension might provide them.
+ */
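+
+/*
+ * Note: includers are expected to #define DECL_TYPE() and STRUCT_TYPE()
+ * before including this header; glsl_types.h, for example, expands the
+ * table below into the glsl_type singleton declarations.
+ */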
+
+#include "glsl_types.h"
+
+DECL_TYPE(error,  GL_INVALID_ENUM, GLSL_TYPE_ERROR, 0, 0)
+DECL_TYPE(void,   GL_INVALID_ENUM, GLSL_TYPE_VOID,  0, 0)
+
+DECL_TYPE(bool,   GL_BOOL,         GLSL_TYPE_BOOL,  1, 1)
+DECL_TYPE(bvec2,  GL_BOOL_VEC2,    GLSL_TYPE_BOOL,  2, 1)
+DECL_TYPE(bvec3,  GL_BOOL_VEC3,    GLSL_TYPE_BOOL,  3, 1)
+DECL_TYPE(bvec4,  GL_BOOL_VEC4,    GLSL_TYPE_BOOL,  4, 1)
+
+DECL_TYPE(int,    GL_INT,          GLSL_TYPE_INT,   1, 1)
+DECL_TYPE(ivec2,  GL_INT_VEC2,     GLSL_TYPE_INT,   2, 1)
+DECL_TYPE(ivec3,  GL_INT_VEC3,     GLSL_TYPE_INT,   3, 1)
+DECL_TYPE(ivec4,  GL_INT_VEC4,     GLSL_TYPE_INT,   4, 1)
+
+DECL_TYPE(uint,   GL_UNSIGNED_INT,      GLSL_TYPE_UINT, 1, 1)
+DECL_TYPE(uvec2,  GL_UNSIGNED_INT_VEC2, GLSL_TYPE_UINT, 2, 1)
+DECL_TYPE(uvec3,  GL_UNSIGNED_INT_VEC3, GLSL_TYPE_UINT, 3, 1)
+DECL_TYPE(uvec4,  GL_UNSIGNED_INT_VEC4, GLSL_TYPE_UINT, 4, 1)
+
+DECL_TYPE(float,  GL_FLOAT,        GLSL_TYPE_FLOAT, 1, 1)
+DECL_TYPE(vec2,   GL_FLOAT_VEC2,   GLSL_TYPE_FLOAT, 2, 1)
+DECL_TYPE(vec3,   GL_FLOAT_VEC3,   GLSL_TYPE_FLOAT, 3, 1)
+DECL_TYPE(vec4,   GL_FLOAT_VEC4,   GLSL_TYPE_FLOAT, 4, 1)
+
+DECL_TYPE(mat2,   GL_FLOAT_MAT2,   GLSL_TYPE_FLOAT, 2, 2)
+DECL_TYPE(mat3,   GL_FLOAT_MAT3,   GLSL_TYPE_FLOAT, 3, 3)
+DECL_TYPE(mat4,   GL_FLOAT_MAT4,   GLSL_TYPE_FLOAT, 4, 4)
+
+DECL_TYPE(mat2x3, GL_FLOAT_MAT2x3, GLSL_TYPE_FLOAT, 3, 2)
+DECL_TYPE(mat2x4, GL_FLOAT_MAT2x4, GLSL_TYPE_FLOAT, 4, 2)
+DECL_TYPE(mat3x2, GL_FLOAT_MAT3x2, GLSL_TYPE_FLOAT, 2, 3)
+DECL_TYPE(mat3x4, GL_FLOAT_MAT3x4, GLSL_TYPE_FLOAT, 4, 3)
+DECL_TYPE(mat4x2, GL_FLOAT_MAT4x2, GLSL_TYPE_FLOAT, 2, 4)
+DECL_TYPE(mat4x3, GL_FLOAT_MAT4x3, GLSL_TYPE_FLOAT, 3, 4)
+
+DECL_TYPE(sampler1D,         GL_SAMPLER_1D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2D,         GL_SAMPLER_2D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler3D,         GL_SAMPLER_3D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_3D,   0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(samplerCube,       GL_SAMPLER_CUBE,                 GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE, 0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler1DArray,    GL_SAMPLER_1D_ARRAY,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DArray,    GL_SAMPLER_2D_ARRAY,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(samplerCubeArray,  GL_SAMPLER_CUBE_MAP_ARRAY,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE, 0, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DRect,     GL_SAMPLER_2D_RECT,              GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_RECT, 0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(samplerBuffer,     GL_SAMPLER_BUFFER,               GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_BUF,  0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DMS,       GL_SAMPLER_2D_MULTISAMPLE,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_MS,   0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DMSArray,  GL_SAMPLER_2D_MULTISAMPLE_ARRAY, GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_MS,   0, 1, GLSL_TYPE_FLOAT)
+
+DECL_TYPE(isampler1D,        GL_INT_SAMPLER_1D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isampler2D,        GL_INT_SAMPLER_2D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isampler3D,        GL_INT_SAMPLER_3D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_3D,   0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isamplerCube,      GL_INT_SAMPLER_CUBE,                 GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE, 0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isampler1DArray,   GL_INT_SAMPLER_1D_ARRAY,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 1, GLSL_TYPE_INT)
+DECL_TYPE(isampler2DArray,   GL_INT_SAMPLER_2D_ARRAY,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 1, GLSL_TYPE_INT)
+DECL_TYPE(isamplerCubeArray, GL_INT_SAMPLER_CUBE_MAP_ARRAY,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE, 0, 1, GLSL_TYPE_INT)
+DECL_TYPE(isampler2DRect,    GL_INT_SAMPLER_2D_RECT,              GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_RECT, 0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isamplerBuffer,    GL_INT_SAMPLER_BUFFER,               GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_BUF,  0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isampler2DMS,      GL_INT_SAMPLER_2D_MULTISAMPLE,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_MS,   0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isampler2DMSArray, GL_INT_SAMPLER_2D_MULTISAMPLE_ARRAY, GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_MS,   0, 1, GLSL_TYPE_INT)
+
+DECL_TYPE(usampler1D,        GL_UNSIGNED_INT_SAMPLER_1D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usampler2D,        GL_UNSIGNED_INT_SAMPLER_2D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usampler3D,        GL_UNSIGNED_INT_SAMPLER_3D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_3D,   0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usamplerCube,      GL_UNSIGNED_INT_SAMPLER_CUBE,                 GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE, 0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usampler1DArray,   GL_UNSIGNED_INT_SAMPLER_1D_ARRAY,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 1, GLSL_TYPE_UINT)
+DECL_TYPE(usampler2DArray,   GL_UNSIGNED_INT_SAMPLER_2D_ARRAY,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 1, GLSL_TYPE_UINT)
+DECL_TYPE(usamplerCubeArray, GL_UNSIGNED_INT_SAMPLER_CUBE_MAP_ARRAY,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE, 0, 1, GLSL_TYPE_UINT)
+DECL_TYPE(usampler2DRect,    GL_UNSIGNED_INT_SAMPLER_2D_RECT,              GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_RECT, 0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usamplerBuffer,    GL_UNSIGNED_INT_SAMPLER_BUFFER,               GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_BUF,  0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usampler2DMS,      GL_UNSIGNED_INT_SAMPLER_2D_MULTISAMPLE,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_MS,   0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usampler2DMSArray, GL_UNSIGNED_INT_SAMPLER_2D_MULTISAMPLE_ARRAY, GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_MS,   0, 1, GLSL_TYPE_UINT)
+
+DECL_TYPE(sampler1DShadow,        GL_SAMPLER_1D_SHADOW,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,       1, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DShadow,        GL_SAMPLER_2D_SHADOW,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,       1, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(samplerCubeShadow,      GL_SAMPLER_CUBE_SHADOW,           GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE,     1, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler1DArrayShadow,   GL_SAMPLER_1D_ARRAY_SHADOW,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,       1, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DArrayShadow,   GL_SAMPLER_2D_ARRAY_SHADOW,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,       1, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(samplerCubeArrayShadow, GL_SAMPLER_CUBE_MAP_ARRAY_SHADOW, GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE,     1, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DRectShadow,    GL_SAMPLER_2D_RECT_SHADOW,        GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_RECT,     1, 0, GLSL_TYPE_FLOAT)
+
+DECL_TYPE(samplerExternalOES,     GL_SAMPLER_EXTERNAL_OES,          GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_EXTERNAL, 0, 0, GLSL_TYPE_FLOAT)
+
+DECL_TYPE(image1D,         GL_IMAGE_1D,                                GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_1D,     0, 0, GLSL_TYPE_FLOAT);
+DECL_TYPE(image2D,         GL_IMAGE_2D,                                GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_2D,     0, 0, GLSL_TYPE_FLOAT);
+DECL_TYPE(image3D,         GL_IMAGE_3D,                                GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_3D,     0, 0, GLSL_TYPE_FLOAT);
+DECL_TYPE(image2DRect,     GL_IMAGE_2D_RECT,                           GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_RECT,   0, 0, GLSL_TYPE_FLOAT);
+DECL_TYPE(imageCube,       GL_IMAGE_CUBE,                              GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_CUBE,   0, 0, GLSL_TYPE_FLOAT);
+DECL_TYPE(imageBuffer,     GL_IMAGE_BUFFER,                            GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_BUF,    0, 0, GLSL_TYPE_FLOAT);
+DECL_TYPE(image1DArray,    GL_IMAGE_1D_ARRAY,                          GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_1D,     0, 1, GLSL_TYPE_FLOAT);
+DECL_TYPE(image2DArray,    GL_IMAGE_2D_ARRAY,                          GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_2D,     0, 1, GLSL_TYPE_FLOAT);
+DECL_TYPE(imageCubeArray,  GL_IMAGE_CUBE_MAP_ARRAY,                    GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_CUBE,   0, 1, GLSL_TYPE_FLOAT);
+DECL_TYPE(image2DMS,       GL_IMAGE_2D_MULTISAMPLE,                    GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_MS,     0, 0, GLSL_TYPE_FLOAT);
+DECL_TYPE(image2DMSArray,  GL_IMAGE_2D_MULTISAMPLE_ARRAY,              GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_MS,     0, 1, GLSL_TYPE_FLOAT);
+DECL_TYPE(iimage1D,        GL_INT_IMAGE_1D,                            GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_1D,     0, 0, GLSL_TYPE_INT);
+DECL_TYPE(iimage2D,        GL_INT_IMAGE_2D,                            GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_2D,     0, 0, GLSL_TYPE_INT);
+DECL_TYPE(iimage3D,        GL_INT_IMAGE_3D,                            GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_3D,     0, 0, GLSL_TYPE_INT);
+DECL_TYPE(iimage2DRect,    GL_INT_IMAGE_2D_RECT,                       GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_RECT,   0, 0, GLSL_TYPE_INT);
+DECL_TYPE(iimageCube,      GL_INT_IMAGE_CUBE,                          GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_CUBE,   0, 0, GLSL_TYPE_INT);
+DECL_TYPE(iimageBuffer,    GL_INT_IMAGE_BUFFER,                        GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_BUF,    0, 0, GLSL_TYPE_INT);
+DECL_TYPE(iimage1DArray,   GL_INT_IMAGE_1D_ARRAY,                      GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_1D,     0, 1, GLSL_TYPE_INT);
+DECL_TYPE(iimage2DArray,   GL_INT_IMAGE_2D_ARRAY,                      GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_2D,     0, 1, GLSL_TYPE_INT);
+DECL_TYPE(iimageCubeArray, GL_INT_IMAGE_CUBE_MAP_ARRAY,                GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_CUBE,   0, 1, GLSL_TYPE_INT);
+DECL_TYPE(iimage2DMS,      GL_INT_IMAGE_2D_MULTISAMPLE,                GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_MS,     0, 0, GLSL_TYPE_INT);
+DECL_TYPE(iimage2DMSArray, GL_INT_IMAGE_2D_MULTISAMPLE_ARRAY,          GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_MS,     0, 1, GLSL_TYPE_INT);
+DECL_TYPE(uimage1D,        GL_UNSIGNED_INT_IMAGE_1D,                   GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_1D,     0, 0, GLSL_TYPE_UINT);
+DECL_TYPE(uimage2D,        GL_UNSIGNED_INT_IMAGE_2D,                   GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_2D,     0, 0, GLSL_TYPE_UINT);
+DECL_TYPE(uimage3D,        GL_UNSIGNED_INT_IMAGE_3D,                   GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_3D,     0, 0, GLSL_TYPE_UINT);
+DECL_TYPE(uimage2DRect,    GL_UNSIGNED_INT_IMAGE_2D_RECT,              GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_RECT,   0, 0, GLSL_TYPE_UINT);
+DECL_TYPE(uimageCube,      GL_UNSIGNED_INT_IMAGE_CUBE,                 GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_CUBE,   0, 0, GLSL_TYPE_UINT);
+DECL_TYPE(uimageBuffer,    GL_UNSIGNED_INT_IMAGE_BUFFER,               GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_BUF,    0, 0, GLSL_TYPE_UINT);
+DECL_TYPE(uimage1DArray,   GL_UNSIGNED_INT_IMAGE_1D_ARRAY,             GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_1D,     0, 1, GLSL_TYPE_UINT);
+DECL_TYPE(uimage2DArray,   GL_UNSIGNED_INT_IMAGE_2D_ARRAY,             GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_2D,     0, 1, GLSL_TYPE_UINT);
+DECL_TYPE(uimageCubeArray, GL_UNSIGNED_INT_IMAGE_CUBE_MAP_ARRAY,       GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_CUBE,   0, 1, GLSL_TYPE_UINT);
+DECL_TYPE(uimage2DMS,      GL_UNSIGNED_INT_IMAGE_2D_MULTISAMPLE,       GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_MS,     0, 0, GLSL_TYPE_UINT);
+DECL_TYPE(uimage2DMSArray, GL_UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_ARRAY, GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_MS,     0, 1, GLSL_TYPE_UINT);
+
+DECL_TYPE(atomic_uint, GL_UNSIGNED_INT_ATOMIC_COUNTER, GLSL_TYPE_ATOMIC_UINT, 1, 1)
+
+STRUCT_TYPE(gl_DepthRangeParameters)
+STRUCT_TYPE(gl_PointParameters)
+STRUCT_TYPE(gl_MaterialParameters)
+STRUCT_TYPE(gl_LightSourceParameters)
+STRUCT_TYPE(gl_LightModelParameters)
+STRUCT_TYPE(gl_LightModelProducts)
+STRUCT_TYPE(gl_LightProducts)
+STRUCT_TYPE(gl_FogParameters)
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/glsl_parser_extras.h b/icd/intel/compiler/mesa-utils/src/glsl/glsl_parser_extras.h
new file mode 100644
index 0000000..ac203f9
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/glsl_parser_extras.h
@@ -0,0 +1,536 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef GLSL_PARSER_EXTRAS_H
+#define GLSL_PARSER_EXTRAS_H
+
+/*
+ * Most of the definitions here only apply to C++
+ */
+#ifdef __cplusplus
+
+
+#include <stdlib.h>
+#include "glsl_symbol_table.h"
+
+struct gl_context;
+
+struct glsl_switch_state {
+   /** Temporary variables needed for switch statement. */
+   ir_variable *test_var;
+   ir_variable *is_fallthru_var;
+   ir_variable *is_break_var;
+   class ast_switch_statement *switch_nesting_ast;
+
+   /** Table of constant values already used in case labels */
+   struct hash_table *labels_ht;
+   class ast_case_label *previous_default;
+
+   bool is_switch_innermost; // true if this switch is the innermost statement enclosing a break
+};
+
+const char *
+glsl_compute_version_string(void *mem_ctx, bool is_es, unsigned version);
+
+typedef struct YYLTYPE {
+   int first_line;
+   int first_column;
+   int last_line;
+   int last_column;
+   unsigned source;
+} YYLTYPE;
+# define YYLTYPE_IS_DECLARED 1
+# define YYLTYPE_IS_TRIVIAL 1
+
+extern void _mesa_glsl_error(YYLTYPE *locp, _mesa_glsl_parse_state *state,
+			     const char *fmt, ...);
+
+
+struct _mesa_glsl_parse_state {
+   _mesa_glsl_parse_state(struct gl_context *_ctx, gl_shader_stage stage,
+			  void *mem_ctx);
+
+   DECLARE_RALLOC_CXX_OPERATORS(_mesa_glsl_parse_state);
+
+   /**
+    * Generate a string representing the GLSL version currently being compiled
+    * (useful for error messages).
+    */
+   const char *get_version_string()
+   {
+      return glsl_compute_version_string(this, this->es_shader,
+                                         this->language_version);
+   }
+
+   /**
+    * Determine whether the current GLSL version is sufficiently high to
+    * support a certain feature.
+    *
+    * \param required_glsl_version is the desktop GLSL version that is
+    * required to support the feature, or 0 if no version of desktop GLSL
+    * supports the feature.
+    *
+    * \param required_glsl_es_version is the GLSL ES version that is required
+    * to support the feature, or 0 if no version of GLSL ES supports the
+    * feature.
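+    *
+    * For example, is_version(140, 300) checks for GLSL 1.40 on desktop or
+    * GLSL ES 3.00, the versions required by has_uniform_buffer_objects()
+    * below.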
+    */
+   bool is_version(unsigned required_glsl_version,
+                   unsigned required_glsl_es_version) const
+   {
+      unsigned required_version = this->es_shader ?
+         required_glsl_es_version : required_glsl_version;
+      return required_version != 0
+         && this->language_version >= required_version;
+   }
+
+   bool check_version(unsigned required_glsl_version,
+                      unsigned required_glsl_es_version,
+                      YYLTYPE *locp, const char *fmt, ...) PRINTFLIKE(5, 6);
+
+   bool check_precision_qualifiers_allowed(YYLTYPE *locp)
+   {
+      return check_version(130, 100, locp,
+                           "precision qualifiers are forbidden");
+   }
+
+   bool check_bitwise_operations_allowed(YYLTYPE *locp)
+   {
+      return check_version(130, 300, locp, "bit-wise operations are forbidden");
+   }
+
+   bool check_explicit_attrib_location_allowed(YYLTYPE *locp,
+                                               const ir_variable *var)
+   {
+      if (!this->has_explicit_attrib_location()) {
+         const char *const requirement = this->es_shader
+            ? "GLSL ES 300"
+            : "GL_ARB_explicit_attrib_location extension or GLSL 330";
+
+         _mesa_glsl_error(locp, this, "%s explicit location requires %s",
+                          mode_string(var), requirement);
+         return false;
+      }
+
+      return true;
+   }
+
+   bool check_separate_shader_objects_allowed(YYLTYPE *locp,
+                                              const ir_variable *var)
+   {
+      if (!this->has_separate_shader_objects()) {
+         const char *const requirement = this->es_shader
+            ? "GL_EXT_separate_shader_objects extension"
+            : "GL_ARB_separate_shader_objects extension or GLSL 420";
+
+         _mesa_glsl_error(locp, this, "%s explicit location requires %s",
+                          mode_string(var), requirement);
+         return false;
+      }
+
+      return true;
+   }
+
+   bool has_explicit_attrib_location() const
+   {
+      return ARB_explicit_attrib_location_enable || is_version(330, 300);
+   }
+
+   bool has_uniform_buffer_objects() const
+   {
+      return ARB_uniform_buffer_object_enable || is_version(140, 300);
+   }
+
+   bool has_separate_shader_objects() const
+   {
+      return ARB_separate_shader_objects_enable || is_version(410, 0)
+         || EXT_separate_shader_objects_enable;
+   }
+
+   void process_version_directive(YYLTYPE *locp, int version,
+                                  const char *ident);
+
+   struct gl_context *const ctx;
+   void *scanner;
+   exec_list translation_unit;
+   glsl_symbol_table *symbols;
+
+   unsigned num_supported_versions;
+   struct {
+      unsigned ver;
+      bool es;
+   } supported_versions[12];
+
+   bool es_shader;
+   unsigned language_version;
+   gl_shader_stage stage;
+
+   /**
+    * Number of nested struct_specifier levels
+    *
+    * Outside a struct_specifier, this is zero.
+    */
+   unsigned struct_specifier_depth;
+
+   /**
+    * Default uniform layout qualifiers tracked during parsing.
+    * Currently affects uniform blocks and uniform buffer variables in
+    * those blocks.
+    */
+   struct ast_type_qualifier *default_uniform_qualifier;
+
+   /**
+    * Variables to track different cases if a fragment shader redeclares
+    * built-in variable gl_FragCoord.
+    *
+    * Note: These values are computed at ast_to_hir time rather than at parse
+    * time.
+    */
+   bool fs_redeclares_gl_fragcoord;
+   bool fs_origin_upper_left;
+   bool fs_pixel_center_integer;
+   bool fs_redeclares_gl_fragcoord_with_no_layout_qualifiers;
+
+   /**
+    * True if a geometry shader input primitive type was specified using a
+    * layout directive.
+    *
+    * Note: this value is computed at ast_to_hir time rather than at parse
+    * time.
+    */
+   bool gs_input_prim_type_specified;
+
+   /** Input layout qualifiers from GLSL 1.50 (geometry shader controls). */
+   struct ast_type_qualifier *in_qualifier;
+
+   /**
+    * True if a compute shader input local size was specified using a layout
+    * directive.
+    *
+    * Note: this value is computed at ast_to_hir time rather than at parse
+    * time.
+    */
+   bool cs_input_local_size_specified;
+
+   /**
+    * If cs_input_local_size_specified is true, the local size that was
+    * specified.  Otherwise ignored.
+    */
+   unsigned cs_input_local_size[3];
+
+   /** Output layout qualifiers from GLSL 1.50 (geometry shader controls). */
+   struct ast_type_qualifier *out_qualifier;
+
+   /**
+    * Printable list of GLSL versions supported by the current context
+    *
+    * \note
+    * This string should probably be generated per-context instead of per
+    * invocation of the compiler.  This should be changed when the method of
+    * tracking supported GLSL versions changes.
+    */
+   const char *supported_version_string;
+
+   /**
+    * Implementation defined limits that affect built-in variables, etc.
+    *
+    * \sa struct gl_constants (in mtypes.h)
+    */
+   struct {
+      /* 1.10 */
+      unsigned MaxLights;
+      unsigned MaxClipPlanes;
+      unsigned MaxTextureUnits;
+      unsigned MaxTextureCoords;
+      unsigned MaxVertexAttribs;
+      unsigned MaxVertexUniformComponents;
+      unsigned MaxVertexTextureImageUnits;
+      unsigned MaxCombinedTextureImageUnits;
+      unsigned MaxTextureImageUnits;
+      unsigned MaxFragmentUniformComponents;
+
+      /* ARB_draw_buffers */
+      unsigned MaxDrawBuffers;
+
+      /* 3.00 ES */
+      int MinProgramTexelOffset;
+      int MaxProgramTexelOffset;
+
+      /* 1.50 */
+      unsigned MaxVertexOutputComponents;
+      unsigned MaxGeometryInputComponents;
+      unsigned MaxGeometryOutputComponents;
+      unsigned MaxFragmentInputComponents;
+      unsigned MaxGeometryTextureImageUnits;
+      unsigned MaxGeometryOutputVertices;
+      unsigned MaxGeometryTotalOutputComponents;
+      unsigned MaxGeometryUniformComponents;
+
+      /* ARB_shader_atomic_counters */
+      unsigned MaxVertexAtomicCounters;
+      unsigned MaxGeometryAtomicCounters;
+      unsigned MaxFragmentAtomicCounters;
+      unsigned MaxCombinedAtomicCounters;
+      unsigned MaxAtomicBufferBindings;
+
+      /* ARB_compute_shader */
+      unsigned MaxComputeWorkGroupCount[3];
+      unsigned MaxComputeWorkGroupSize[3];
+
+      /* ARB_shader_image_load_store */
+      unsigned MaxImageUnits;
+      unsigned MaxCombinedImageUnitsAndFragmentOutputs;
+      unsigned MaxImageSamples;
+      unsigned MaxVertexImageUniforms;
+      unsigned MaxGeometryImageUniforms;
+      unsigned MaxFragmentImageUniforms;
+      unsigned MaxCombinedImageUniforms;
+   } Const;
+
+   /**
+    * During AST to IR conversion, pointer to current IR function
+    *
+    * Will be \c NULL whenever the AST to IR conversion is not inside a
+    * function definition.
+    */
+   class ir_function_signature *current_function;
+
+   /**
+    * During AST to IR conversion, pointer to the toplevel IR
+    * instruction list being generated.
+    */
+   exec_list *toplevel_ir;
+
+   /** Have we found a return statement in this function? */
+   bool found_return;
+
+   /** Was there an error during compilation? */
+   bool error;
+
+   /**
+    * Are all shader inputs / outputs invariant?
+    *
+    * This is set when the 'STDGL invariant(all)' pragma is used.
+    */
+   bool all_invariant;
+
+   /** Loop or switch statement containing the current instructions. */
+   class ast_iteration_statement *loop_nesting_ast;
+
+   struct glsl_switch_state switch_state;
+
+   /** List of structures defined in user code. */
+   const glsl_type **user_structures;
+   unsigned num_user_structures;
+
+   char *info_log;
+
+   /**
+    * \name Enable bits for GLSL extensions
+    */
+   /*@{*/
+   /* ARB extensions go here, sorted alphabetically.
+    */
+   bool ARB_arrays_of_arrays_enable;
+   bool ARB_arrays_of_arrays_warn;
+   bool ARB_compute_shader_enable;
+   bool ARB_compute_shader_warn;
+   bool ARB_conservative_depth_enable;
+   bool ARB_conservative_depth_warn;
+   bool ARB_draw_buffers_enable;
+   bool ARB_draw_buffers_warn;
+   bool ARB_draw_instanced_enable;
+   bool ARB_draw_instanced_warn;
+   bool ARB_explicit_attrib_location_enable;
+   bool ARB_explicit_attrib_location_warn;
+   bool ARB_fragment_coord_conventions_enable;
+   bool ARB_fragment_coord_conventions_warn;
+   bool ARB_gpu_shader5_enable;
+   bool ARB_gpu_shader5_warn;
+   bool ARB_sample_shading_enable;
+   bool ARB_sample_shading_warn;
+   bool ARB_separate_shader_objects_enable;
+   bool ARB_separate_shader_objects_warn;
+   bool ARB_shader_atomic_counters_enable;
+   bool ARB_shader_atomic_counters_warn;
+   bool ARB_shader_bit_encoding_enable;
+   bool ARB_shader_bit_encoding_warn;
+   bool ARB_shader_image_load_store_enable;
+   bool ARB_shader_image_load_store_warn;
+   bool ARB_shader_stencil_export_enable;
+   bool ARB_shader_stencil_export_warn;
+   bool ARB_shader_texture_lod_enable;
+   bool ARB_shader_texture_lod_warn;
+   bool ARB_shading_language_420pack_enable;
+   bool ARB_shading_language_420pack_warn;
+   bool ARB_shading_language_packing_enable;
+   bool ARB_shading_language_packing_warn;
+   bool ARB_texture_cube_map_array_enable;
+   bool ARB_texture_cube_map_array_warn;
+   bool ARB_texture_gather_enable;
+   bool ARB_texture_gather_warn;
+   bool ARB_texture_multisample_enable;
+   bool ARB_texture_multisample_warn;
+   bool ARB_texture_query_levels_enable;
+   bool ARB_texture_query_levels_warn;
+   bool ARB_texture_query_lod_enable;
+   bool ARB_texture_query_lod_warn;
+   bool ARB_texture_rectangle_enable;
+   bool ARB_texture_rectangle_warn;
+   bool ARB_uniform_buffer_object_enable;
+   bool ARB_uniform_buffer_object_warn;
+   bool ARB_viewport_array_enable;
+   bool ARB_viewport_array_warn;
+
+   /* KHR extensions go here, sorted alphabetically.
+    */
+
+   /* OES extensions go here, sorted alphabetically.
+    */
+   bool OES_EGL_image_external_enable;
+   bool OES_EGL_image_external_warn;
+   bool OES_standard_derivatives_enable;
+   bool OES_standard_derivatives_warn;
+   bool OES_texture_3D_enable;
+   bool OES_texture_3D_warn;
+
+   /* All other extensions go here, sorted alphabetically.
+    */
+   bool AMD_conservative_depth_enable;
+   bool AMD_conservative_depth_warn;
+   bool AMD_shader_stencil_export_enable;
+   bool AMD_shader_stencil_export_warn;
+   bool AMD_shader_trinary_minmax_enable;
+   bool AMD_shader_trinary_minmax_warn;
+   bool AMD_vertex_shader_layer_enable;
+   bool AMD_vertex_shader_layer_warn;
+   bool EXT_separate_shader_objects_enable;
+   bool EXT_separate_shader_objects_warn;
+   bool EXT_shader_integer_mix_enable;
+   bool EXT_shader_integer_mix_warn;
+   bool EXT_texture_array_enable;
+   bool EXT_texture_array_warn;
+   /*@}*/
+
+   /** Extensions supported by the OpenGL implementation. */
+   const struct gl_extensions *extensions;
+
+   bool uses_builtin_functions;
+   bool fs_uses_gl_fragcoord;
+
+   /**
+    * For geometry shaders, size of the most recently seen input declaration
+    * that was a sized array, or 0 if no sized input array declarations have
+    * been seen.
+    *
+    * Unused for other shader types.
+    */
+   unsigned gs_input_size;
+
+   bool early_fragment_tests;
+
+   /** Atomic counter offsets by binding */
+   unsigned atomic_counter_offsets[MAX_COMBINED_ATOMIC_BUFFERS];
+};
+
+# define YYLLOC_DEFAULT(Current, Rhs, N)			\
+do {								\
+   if (N)							\
+   {								\
+      (Current).first_line   = YYRHSLOC(Rhs, 1).first_line;	\
+      (Current).first_column = YYRHSLOC(Rhs, 1).first_column;	\
+      (Current).last_line    = YYRHSLOC(Rhs, N).last_line;	\
+      (Current).last_column  = YYRHSLOC(Rhs, N).last_column;	\
+   }								\
+   else								\
+   {								\
+      (Current).first_line   = (Current).last_line =		\
+	 YYRHSLOC(Rhs, 0).last_line;				\
+      (Current).first_column = (Current).last_column =		\
+	 YYRHSLOC(Rhs, 0).last_column;				\
+   }								\
+   (Current).source = 0;					\
+} while (0)
+
+/**
+ * Emit a warning to the shader log
+ *
+ * \sa _mesa_glsl_error
+ */
+extern void _mesa_glsl_warning(const YYLTYPE *locp,
+			       _mesa_glsl_parse_state *state,
+			       const char *fmt, ...);
+
+extern void _mesa_glsl_lexer_ctor(struct _mesa_glsl_parse_state *state,
+				  const char *string);
+
+extern void _mesa_glsl_lexer_dtor(struct _mesa_glsl_parse_state *state);
+
+union YYSTYPE;
+extern int _mesa_glsl_lexer_lex(union YYSTYPE *yylval, YYLTYPE *yylloc,
+                                void *scanner);
+
+extern int _mesa_glsl_parse(struct _mesa_glsl_parse_state *);
+
+/**
+ * Process elements of the #extension directive
+ *
+ * \return
+ * If \c name and \c behavior are valid, \c true is returned.  Otherwise
+ * \c false is returned.
+ */
+extern bool _mesa_glsl_process_extension(const char *name, YYLTYPE *name_locp,
+					 const char *behavior,
+					 YYLTYPE *behavior_locp,
+					 _mesa_glsl_parse_state *state);
+
+#endif /* __cplusplus */
+
+/*
+ * These definitions apply to C and C++
+ */
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Get the textual name of the specified shader stage (which is a
+ * gl_shader_stage).
+ */
+extern const char *
+_mesa_shader_stage_to_string(unsigned stage);
+
+extern int glcpp_preprocess(void *ctx, const char **shader, char **info_log,
+                      const struct gl_extensions *extensions, struct gl_context *gl_ctx);
+
+extern void _mesa_create_shader_compiler(void);
+extern void _mesa_destroy_shader_compiler(void);
+extern void _mesa_destroy_shader_compiler_caches(void);
+
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* GLSL_PARSER_EXTRAS_H */
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/glsl_types.h b/icd/intel/compiler/mesa-utils/src/glsl/glsl_types.h
new file mode 100644
index 0000000..57ace59
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/glsl_types.h
@@ -0,0 +1,711 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef GLSL_TYPES_H
+#define GLSL_TYPES_H
+
+#include <string.h>
+#include <assert.h>
+#include "main/mtypes.h" /* for gl_texture_index, C++'s enum rules are broken */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct _mesa_glsl_parse_state;
+struct glsl_symbol_table;
+
+extern void
+_mesa_glsl_initialize_types(struct _mesa_glsl_parse_state *state);
+
+extern void
+_mesa_glsl_release_types(void);
+
+#ifdef __cplusplus
+}
+#endif
+
+enum glsl_base_type {
+   GLSL_TYPE_UINT = 0,
+   GLSL_TYPE_INT,
+   GLSL_TYPE_FLOAT,
+   GLSL_TYPE_BOOL,
+   GLSL_TYPE_SAMPLER,
+   GLSL_TYPE_IMAGE,
+   GLSL_TYPE_ATOMIC_UINT,
+   GLSL_TYPE_STRUCT,
+   GLSL_TYPE_INTERFACE,
+   GLSL_TYPE_ARRAY,
+   GLSL_TYPE_VOID,
+   GLSL_TYPE_ERROR
+};
+
+enum glsl_sampler_dim {
+   GLSL_SAMPLER_DIM_1D = 0,
+   GLSL_SAMPLER_DIM_2D,
+   GLSL_SAMPLER_DIM_3D,
+   GLSL_SAMPLER_DIM_CUBE,
+   GLSL_SAMPLER_DIM_RECT,
+   GLSL_SAMPLER_DIM_BUF,
+   GLSL_SAMPLER_DIM_EXTERNAL,
+   GLSL_SAMPLER_DIM_MS
+};
+
+enum glsl_interface_packing {
+   GLSL_INTERFACE_PACKING_STD140,
+   GLSL_INTERFACE_PACKING_SHARED,
+   GLSL_INTERFACE_PACKING_PACKED
+};
+
+#ifdef __cplusplus
+#include "GL/gl.h"
+#include "ralloc.h"
+#include "memory_writer.h"
+#include "memory_map.h"
+
+struct glsl_type {
+   GLenum gl_type;
+   glsl_base_type base_type;
+
+   unsigned sampler_dimensionality:3; /**< \see glsl_sampler_dim */
+   unsigned sampler_shadow:1;
+   unsigned sampler_array:1;
+   unsigned sampler_type:2;    /**< Type of data returned using this
+				* sampler or image.  Only \c
+				* GLSL_TYPE_FLOAT, \c GLSL_TYPE_INT,
+				* and \c GLSL_TYPE_UINT are valid.
+				*/
+   unsigned interface_packing:2;
+
+   /* Callers of this ralloc-based new need not call delete. It's
+    * easier to just ralloc_free 'mem_ctx' (or any of its ancestors). */
+   static void* operator new(size_t size)
+   {
+      mtx_lock(&glsl_type::mutex);
+
+      /* mem_ctx should have been created by the static members */
+      assert(glsl_type::mem_ctx != NULL);
+
+      void *type;
+
+      type = ralloc_size(glsl_type::mem_ctx, size);
+      assert(type != NULL);
+
+      mtx_unlock(&glsl_type::mutex);
+
+      return type;
+   }
+
+   /* If the user *does* call delete, that's OK, we will just
+    * ralloc_free in that case. */
+   static void operator delete(void *type)
+   {
+      mtx_lock(&glsl_type::mutex);
+      ralloc_free(type);
+      mtx_unlock(&glsl_type::mutex);
+   }
+
+   /**
+    * Serialization functionality used by binary shaders.
+    */
+   void serialize(memory_writer &mem) const;
+
+   /**
+    * Deserialization functionality used by binary shaders,
+    * state and type_hash are helper structures managed by the
+    * ir_deserializer class.
+    */
+   const glsl_type *deserialize(memory_map *map,
+                                struct _mesa_glsl_parse_state *state,
+                                struct hash_table *type_hash,
+                                uint32_t hash_value);
+
+   /**
+    * \name Vector and matrix element counts
+    *
+    * For scalars, each of these values will be 1.  For non-numeric types
+    * these will be 0.
+    */
+   /*@{*/
+   unsigned vector_elements:3; /**< 1, 2, 3, or 4 vector elements. */
+   unsigned matrix_columns:3;  /**< 1, 2, 3, or 4 matrix columns. */
+   /*@}*/
+
+   /**
+    * Name of the data type
+    *
+    * Will never be \c NULL.
+    */
+   const char *name;
+
+   /**
+    * For \c GLSL_TYPE_ARRAY, this is the length of the array.  For
+    * \c GLSL_TYPE_STRUCT or \c GLSL_TYPE_INTERFACE, it is the number of
+    * elements in the structure and the number of values pointed to by
+    * \c fields.structure (below).
+    */
+   unsigned length;
+
+   /**
+    * Subtype of composite data types.
+    */
+   union {
+      const struct glsl_type *array;            /**< Type of array elements. */
+      const struct glsl_type *parameters;       /**< Parameters to function. */
+      struct glsl_struct_field *structure;      /**< List of struct fields. */
+   } fields;
+
+   /**
+    * \name Pointers to various public type singletons
+    */
+   /*@{*/
+#undef  DECL_TYPE
+#define DECL_TYPE(NAME, ...) \
+   static const glsl_type *const NAME##_type;
+#undef  STRUCT_TYPE
+#define STRUCT_TYPE(NAME) \
+   static const glsl_type *const struct_##NAME##_type;
+#include "builtin_type_macros.h"
+   /*@}*/
+
+   /**
+    * Convenience accessors for vector types (shorter than get_instance()).
+    * @{
+    */
+   static const glsl_type *vec(unsigned components);
+   static const glsl_type *ivec(unsigned components);
+   static const glsl_type *uvec(unsigned components);
+   static const glsl_type *bvec(unsigned components);
+   /**@}*/
+
+   /**
+    * For numeric and boolean derived types returns the basic scalar type
+    *
+    * If the type is a numeric or boolean scalar, vector, or matrix type,
+    * this function gets the scalar type of the individual components.  For
+    * all other types, including arrays of numeric or boolean types, the
+    * error type is returned.
+    */
+   const glsl_type *get_base_type() const;
+
+   /**
+    * Get the basic scalar type which this type aggregates.
+    *
+    * If the type is a numeric or boolean scalar, vector, or matrix, or an
+    * array of any of those, this function gets the scalar type of the
+    * individual components.  For structs and arrays of structs, this function
+    * returns the struct type.  For samplers and arrays of samplers, this
+    * function returns the sampler type.
+    */
+   const glsl_type *get_scalar_type() const;
+
+   /**
+    * Query the type of elements in an array
+    *
+    * \return
+    * Pointer to the type of elements in the array for array types, or \c NULL
+    * for non-array types.
+    */
+   const glsl_type *element_type() const
+   {
+      return is_array() ? fields.array : NULL;
+   }
+
+   /**
+    * Get the instance of a built-in scalar, vector, or matrix type
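+    *
+    * For example, get_instance(GLSL_TYPE_FLOAT, 4, 1) yields the vec4 type
+    * and get_instance(GLSL_TYPE_FLOAT, 3, 2) the mat2x3 type (rows map to
+    * vector_elements, columns to matrix_columns).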
+    */
+   static const glsl_type *get_instance(unsigned base_type, unsigned rows,
+					unsigned columns);
+
+   /**
+    * Get the instance of an array type
+    */
+   static const glsl_type *get_array_instance(const glsl_type *base,
+					      unsigned elements);
+
+   /**
+    * Get the instance of a record type
+    */
+   static const glsl_type *get_record_instance(const glsl_struct_field *fields,
+					       unsigned num_fields,
+					       const char *name);
+
+   /**
+    * Get the instance of an interface block type
+    */
+   static const glsl_type *get_interface_instance(const glsl_struct_field *fields,
+						  unsigned num_fields,
+						  enum glsl_interface_packing packing,
+						  const char *block_name);
+
+   /**
+    * Query the total number of scalars that make up a scalar, vector or matrix
+    */
+   unsigned components() const
+   {
+      return vector_elements * matrix_columns;
+   }
+
+   /**
+    * Calculate the number of components slots required to hold this type
+    *
+    * This is used to determine how many uniform or varying locations a type
+    * might occupy.
+    */
+   unsigned component_slots() const;
+
+   /**
+    * Calculate the number of attribute slots required to hold this type
+    *
+    * This implements the language rules of GLSL 1.50 for counting the number
+    * of slots used by a vertex attribute.  It also determines the number of
+    * varying slots the type will use up in the absence of varying packing
+    * (and thus, it can be used to measure the number of varying slots used by
+    * the varyings that are generated by lower_packed_varyings).
+    */
+   unsigned count_attribute_slots() const;
+
+
+   /**
+    * Alignment in bytes of the start of this type in a std140 uniform
+    * block.
+    */
+   unsigned std140_base_alignment(bool row_major) const;
+
+   /** Size in bytes of this type in a std140 uniform block.
+    *
+    * Note that this is not GL_UNIFORM_SIZE (which is the number of
+    * elements in the array)
+    */
+   unsigned std140_size(bool row_major) const;
+
+   /**
+    * \brief Can this type be implicitly converted to another?
+    *
+    * \return True if the types are identical or if this type can be converted
+    *         to \c desired according to Section 4.1.10 of the GLSL spec.
+    *
+    * \verbatim
+    * From page 25 (31 of the pdf) of the GLSL 1.50 spec, Section 4.1.10
+    * Implicit Conversions:
+    *
+    *     In some situations, an expression and its type will be implicitly
+    *     converted to a different type. The following table shows all allowed
+    *     implicit conversions:
+    *
+    *     Type of expression | Can be implicitly converted to
+    *     --------------------------------------------------
+    *     int                  float
+    *     uint
+    *
+    *     ivec2                vec2
+    *     uvec2
+    *
+    *     ivec3                vec3
+    *     uvec3
+    *
+    *     ivec4                vec4
+    *     uvec4
+    *
+    *     There are no implicit array or structure conversions. For example,
+    *     an array of int cannot be implicitly converted to an array of float.
+    *     There are no implicit conversions between signed and unsigned
+    *     integers.
+    * \endverbatim
+    */
+   bool can_implicitly_convert_to(const glsl_type *desired) const;
+
+   /**
+    * Query whether or not a type is a scalar (non-vector and non-matrix).
+    */
+   bool is_scalar() const
+   {
+      return (vector_elements == 1)
+	 && (base_type >= GLSL_TYPE_UINT)
+	 && (base_type <= GLSL_TYPE_BOOL);
+   }
+
+   /**
+    * Query whether or not a type is a vector
+    */
+   bool is_vector() const
+   {
+      return (vector_elements > 1)
+	 && (matrix_columns == 1)
+	 && (base_type >= GLSL_TYPE_UINT)
+	 && (base_type <= GLSL_TYPE_BOOL);
+   }
+
+   /**
+    * Query whether or not a type is a matrix
+    */
+   bool is_matrix() const
+   {
+      /* GLSL only has float matrices. */
+      return (matrix_columns > 1) && (base_type == GLSL_TYPE_FLOAT);
+   }
+
+   /**
+    * Query whether or not a type is a non-array numeric type
+    */
+   bool is_numeric() const
+   {
+      return (base_type >= GLSL_TYPE_UINT) && (base_type <= GLSL_TYPE_FLOAT);
+   }
+
+   /**
+    * Query whether or not a type is an integral type
+    */
+   bool is_integer() const
+   {
+      return (base_type == GLSL_TYPE_UINT) || (base_type == GLSL_TYPE_INT);
+   }
+
+   /**
+    * Query whether or not type is an integral type, or for struct and array
+    * types, contains an integral type.
+    */
+   bool contains_integer() const;
+
+   /**
+    * Query whether or not a type is a float type
+    */
+   bool is_float() const
+   {
+      return base_type == GLSL_TYPE_FLOAT;
+   }
+
+   /**
+    * Query whether or not a type is a non-array boolean type
+    */
+   bool is_boolean() const
+   {
+      return base_type == GLSL_TYPE_BOOL;
+   }
+
+   /**
+    * Query whether or not a type is a sampler
+    */
+   bool is_sampler() const
+   {
+      return base_type == GLSL_TYPE_SAMPLER;
+   }
+
+   /**
+    * Query whether or not type is a sampler, or for struct and array
+    * types, contains a sampler.
+    */
+   bool contains_sampler() const;
+
+   /**
+    * Get the Mesa texture target index for a sampler type.
+    */
+   gl_texture_index sampler_index() const;
+
+   /**
+    * Query whether or not type is an image, or for struct and array
+    * types, contains an image.
+    */
+   bool contains_image() const;
+
+   /**
+    * Query whether or not a type is an image
+    */
+   bool is_image() const
+   {
+      return base_type == GLSL_TYPE_IMAGE;
+   }
+
+   /**
+    * Query whether or not a type is an array
+    */
+   bool is_array() const
+   {
+      return base_type == GLSL_TYPE_ARRAY;
+   }
+
+   /**
+    * Query whether or not a type is a record
+    */
+   bool is_record() const
+   {
+      return base_type == GLSL_TYPE_STRUCT;
+   }
+
+   /**
+    * Query whether or not a type is an interface
+    */
+   bool is_interface() const
+   {
+      return base_type == GLSL_TYPE_INTERFACE;
+   }
+
+   /**
+    * Query whether or not a type is the void type singleton.
+    */
+   bool is_void() const
+   {
+      return base_type == GLSL_TYPE_VOID;
+   }
+
+   /**
+    * Query whether or not a type is the error type singleton.
+    */
+   bool is_error() const
+   {
+      return base_type == GLSL_TYPE_ERROR;
+   }
+
+   /**
+    * Return the amount of atomic counter storage required for a type.
+    */
+   unsigned atomic_size() const
+   {
+      if (base_type == GLSL_TYPE_ATOMIC_UINT)
+         return ATOMIC_COUNTER_SIZE;
+      else if (is_array())
+         return length * element_type()->atomic_size();
+      else
+         return 0;
+   }
+
+   /**
+    * Return whether a type contains any atomic counters.
+    */
+   bool contains_atomic() const
+   {
+      return atomic_size() > 0;
+   }
+
+   /**
+    * Return whether a type contains any opaque types.
+    */
+   bool contains_opaque() const;
+
+   /**
+    * Query the full type of a matrix row
+    *
+    * \return
+    * If the type is not a matrix, \c glsl_type::error_type is returned.
+    * Otherwise a type matching the rows of the matrix is returned.
+    */
+   const glsl_type *row_type() const
+   {
+      return is_matrix()
+	 ? get_instance(base_type, matrix_columns, 1)
+	 : error_type;
+   }
+
+   /**
+    * Query the full type of a matrix column
+    *
+    * \return
+    * If the type is not a matrix, \c glsl_type::error_type is returned.
+    * Otherwise a type matching the columns of the matrix is returned.
+    */
+   const glsl_type *column_type() const
+   {
+      return is_matrix()
+	 ? get_instance(base_type, vector_elements, 1)
+	 : error_type;
+   }
+
+   /**
+    * Get the type of a structure field
+    *
+    * \return
+    * Pointer to the type of the named field.  If the type is not a structure
+    * or the named field does not exist, \c glsl_type::error_type is returned.
+    */
+   const glsl_type *field_type(const char *name) const;
+
+   /**
+    * Get the location of a field within a record type
+    */
+   int field_index(const char *name) const;
+
+   /**
+    * Query the number of elements in an array type
+    *
+    * \return
+    * The number of elements in the array for array types or -1 for non-array
+    * types.  If the number of elements in the array has not yet been declared,
+    * zero is returned.
+    */
+   int array_size() const
+   {
+      return is_array() ? length : -1;
+   }
+
+   /**
+    * Query whether the array size for all dimensions has been declared.
+    */
+   bool is_unsized_array() const
+   {
+      return is_array() && length == 0;
+   }
+
+   /**
+    * Return the number of coordinate components needed for this
+    * sampler or image type.
+    *
+    * This is based purely on the sampler's dimensionality.  For example, this
+    * returns 1 for sampler1D, and 3 for sampler2DArray.
+    *
+    * Note that this is often different from the actual coordinate type used
+    * in a texturing built-in function, since those pack additional values
+    * (such as the shadow comparator or projector) into the coordinate type.
+    */
+   int coordinate_components() const;
+
+   /**
+    * Compare a record type against another record type.
+    *
+    * This is useful for matching record types declared across shader stages.
+    */
+   bool record_compare(const glsl_type *b) const;
+
+private:
+
+   static mtx_t mutex;
+
+   /**
+    * ralloc context for all glsl_type allocations
+    *
+    * Set on the first call to \c glsl_type::new.
+    */
+   static void *mem_ctx;
+
+   void init_ralloc_type_ctx(void);
+
+   /** Constructor for vector and matrix types */
+   glsl_type(GLenum gl_type,
+	     glsl_base_type base_type, unsigned vector_elements,
+	     unsigned matrix_columns, const char *name);
+
+   /** Constructor for sampler or image types */
+   glsl_type(GLenum gl_type, glsl_base_type base_type,
+	     enum glsl_sampler_dim dim, bool shadow, bool array,
+	     unsigned type, const char *name);
+
+   /** Constructor for record types */
+   glsl_type(const glsl_struct_field *fields, unsigned num_fields,
+	     const char *name);
+
+   /** Constructor for interface types */
+   glsl_type(const glsl_struct_field *fields, unsigned num_fields,
+	     enum glsl_interface_packing packing, const char *name);
+
+   /** Constructor for array types */
+   glsl_type(const glsl_type *array, unsigned length);
+
+   /** Hash table containing the known array types. */
+   static struct hash_table *array_types;
+
+   /** Hash table containing the known record types. */
+   static struct hash_table *record_types;
+
+   /** Hash table containing the known interface types. */
+   static struct hash_table *interface_types;
+
+   static int record_key_compare(const void *a, const void *b);
+   static unsigned record_key_hash(const void *key);
+
+   /**
+    * \name Built-in type flyweights
+    */
+   /*@{*/
+#undef  DECL_TYPE
+#define DECL_TYPE(NAME, ...) static const glsl_type _##NAME##_type;
+#undef  STRUCT_TYPE
+#define STRUCT_TYPE(NAME)        static const glsl_type _struct_##NAME##_type;
+#include "builtin_type_macros.h"
+   /*@}*/
+
+   /**
+    * \name Friend functions.
+    *
+    * These functions are friends because they must have C linkage and they
+    * need to call various private methods or access various private static
+    * data.
+    */
+   /*@{*/
+   friend void _mesa_glsl_initialize_types(struct _mesa_glsl_parse_state *);
+   friend void _mesa_glsl_release_types(void);
+   /*@}*/
+};
+
+struct glsl_struct_field {
+   const struct glsl_type *type;
+   const char *name;
+   bool row_major;
+
+   /**
+    * For interface blocks, gl_varying_slot corresponding to the input/output
+    * if this is a built-in input/output (i.e. a member of the built-in
+    * gl_PerVertex interface block); -1 otherwise.
+    *
+    * Ignored for structs.
+    */
+   int location;
+
+   /**
+    * For interface blocks, the interpolation mode (as in
+    * ir_variable::interpolation).  0 otherwise.
+    */
+   unsigned interpolation:2;
+
+   /**
+    * For interface blocks, 1 if this variable uses centroid interpolation (as
+    * in ir_variable::centroid).  0 otherwise.
+    */
+   unsigned centroid:1;
+
+   /**
+    * For interface blocks, 1 if this variable uses sample interpolation (as
+    * in ir_variable::sample). 0 otherwise.
+    */
+   unsigned sample:1;
+};
+
+/**
+ * Deserialization utility function used by the binary shaders.  The \c state
+ * and \c type_hash arguments are mandatory helper structures managed by the
+ * caller.
+ */
+const glsl_type *
+deserialize_glsl_type(memory_map *map, struct _mesa_glsl_parse_state *state,
+                      struct hash_table *type_hash);
+
+static inline unsigned int
+glsl_align(unsigned int a, unsigned int align)
+{
+   return (a + align - 1) / align * align;
+}
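+
+/* Illustrative examples: glsl_align() rounds 'a' up to the next multiple of
+ * 'align', so glsl_align(9, 4) == 12 and glsl_align(16, 16) == 16.
+ */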
+
+#undef DECL_TYPE
+#undef STRUCT_TYPE
+#endif /* __cplusplus */
+
+#endif /* GLSL_TYPES_H */
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/ir.h b/icd/intel/compiler/mesa-utils/src/glsl/ir.h
new file mode 100644
index 0000000..e45b41b
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/ir.h
@@ -0,0 +1,2403 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef IR_H
+#define IR_H
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "ralloc.h"
+#include "glsl_types.h"
+#include "list.h"
+#include "ir_visitor.h"
+#include "ir_hierarchical_visitor.h"
+#include "main/mtypes.h"
+#include "memory_writer.h"
+
+#ifdef __cplusplus
+
+/**
+ * \defgroup IR Intermediate representation nodes
+ *
+ * @{
+ */
+
+/**
+ * Class tags
+ *
+ * Each concrete class derived from \c ir_instruction has a value in this
+ * enumeration.  The value for the type is stored in \c ir_instruction::ir_type
+ * by the constructor.  While using type tags is not very C++, it is extremely
+ * convenient.  For example, during debugging you can simply inspect
+ * \c ir_instruction::ir_type to find out the actual type of the object.
+ *
+ * In addition, it is possible to use a switch-statement based on
+ * \c ir_instruction::ir_type to select different behavior for different object
+ * types.  For functions that have only slight differences for several object
+ * types, this allows writing very straightforward, readable code.
+ */
+enum ir_node_type {
+   /**
+    * Zero is unused so that the IR validator can detect cases where
+    * \c ir_instruction::ir_type has not been initialized.
+    */
+   ir_type_unset,
+   ir_type_variable,
+   ir_type_assignment,
+   ir_type_call,
+   ir_type_constant,
+   ir_type_dereference_array,
+   ir_type_dereference_record,
+   ir_type_dereference_variable,
+   ir_type_discard,
+   ir_type_expression,
+   ir_type_function,
+   ir_type_function_signature,
+   ir_type_if,
+   ir_type_loop,
+   ir_type_loop_jump,
+   ir_type_return,
+   ir_type_swizzle,
+   ir_type_texture,
+   ir_type_emit_vertex,
+   ir_type_end_primitive,
+   ir_type_max /**< maximum ir_type enum number, for validation */
+};
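+
+/* A minimal sketch of dispatching on the class tag (illustrative only;
+ * handle_variable() and handle_assignment() are hypothetical helpers):
+ *
+ *    switch (ir->ir_type) {
+ *    case ir_type_variable:   handle_variable();   break;
+ *    case ir_type_assignment: handle_assignment(); break;
+ *    default:                 break;
+ *    }
+ */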
+
+
+/**
+ * Base class of all IR instructions
+ */
+class ir_instruction : public exec_node {
+public:
+   enum ir_node_type ir_type;
+
+   /**
+    * GCC 4.7+ and clang warn when deleting an ir_instruction unless
+    * there's a virtual destructor present.  Because we almost
+    * universally use ralloc for our memory management of
+    * ir_instructions, the destructor doesn't need to do any work.
+    */
+   virtual ~ir_instruction()
+   {
+   }
+
+   /** ir_print_visitor helper for debugging. */
+   void print(void) const;
+   void fprint(FILE *f) const;
+
+   /* serialization */
+   void serialize(memory_writer &mem);
+   virtual void serialize_data(memory_writer &mem) = 0;
+
+   virtual void accept(ir_visitor *) = 0;
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *) = 0;
+   virtual ir_instruction *clone(void *mem_ctx,
+				 struct hash_table *ht) const = 0;
+
+   /**
+    * \name IR instruction downcast functions
+    *
+    * These functions either cast the object to a derived class or return
+    * \c NULL if the object's type does not match the specified derived class.
+    * Additional downcast functions will be added as needed.
+    */
+   /*@{*/
+   virtual class ir_variable *          as_variable()         { return NULL; }
+   virtual class ir_function *          as_function()         { return NULL; }
+   virtual class ir_dereference *       as_dereference()      { return NULL; }
+   virtual class ir_dereference_array *	as_dereference_array() { return NULL; }
+   virtual class ir_dereference_variable *as_dereference_variable() { return NULL; }
+   virtual class ir_dereference_record *as_dereference_record() { return NULL; }
+   virtual class ir_expression *        as_expression()       { return NULL; }
+   virtual class ir_rvalue *            as_rvalue()           { return NULL; }
+   virtual class ir_loop *              as_loop()             { return NULL; }
+   virtual class ir_assignment *        as_assignment()       { return NULL; }
+   virtual class ir_call *              as_call()             { return NULL; }
+   virtual class ir_return *            as_return()           { return NULL; }
+   virtual class ir_if *                as_if()               { return NULL; }
+   virtual class ir_swizzle *           as_swizzle()          { return NULL; }
+   virtual class ir_texture *           as_texture()          { return NULL; }
+   virtual class ir_constant *          as_constant()         { return NULL; }
+   virtual class ir_discard *           as_discard()          { return NULL; }
+   virtual class ir_jump *              as_jump()             { return NULL; }
+   /*@}*/
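+
+   /* Typical downcast usage (illustrative): each as_*() method returns NULL
+    * when the instruction is not of the requested class, so callers test the
+    * result:
+    *
+    *    ir_variable *var = ir->as_variable();
+    *    if (var != NULL)
+    *       process_variable(var);   // process_variable() is hypothetical
+    */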
+
+   /**
+    * IR equality method: Return true if the referenced instruction would
+    * return the same value as this one.
+    *
+    * This is intended to be used for CSE and algebraic optimizations, on
+    * rvalues in particular.  No support for other instruction types
+    * (assignments, jumps, calls, etc.) is planned.
+    */
+   virtual bool equals(ir_instruction *ir, enum ir_node_type ignore = ir_type_unset);
+
+protected:
+   ir_instruction()
+   {
+      ir_type = ir_type_unset;
+   }
+};
+
+
+/**
+ * The base class for all "values"/expression trees.
+ */
+class ir_rvalue : public ir_instruction {
+public:
+   const struct glsl_type *type;
+
+   virtual ir_rvalue *clone(void *mem_ctx, struct hash_table *) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   virtual ir_constant *constant_expression_value(struct hash_table *variable_context = NULL);
+
+   virtual ir_rvalue * as_rvalue()
+   {
+      return this;
+   }
+
+   ir_rvalue *as_rvalue_to_saturate();
+
+   virtual bool is_lvalue() const
+   {
+      return false;
+   }
+
+   /**
+    * Get the variable that is ultimately referenced by an r-value
+    */
+   virtual ir_variable *variable_referenced() const
+   {
+      return NULL;
+   }
+
+
+   /**
+    * If an r-value is a reference to a whole variable, get that variable
+    *
+    * \return
+    * Pointer to a variable that is completely dereferenced by the r-value.  If
+    * the r-value is not a dereference or the dereference does not access the
+    * entire variable (i.e., it accesses just one array element or struct
+    * field), \c NULL is returned.
+    */
+   virtual ir_variable *whole_variable_referenced()
+   {
+      return NULL;
+   }
+
+   /**
+    * Determine if an r-value has the value zero
+    *
+    * The base implementation of this function always returns \c false.  The
+    * \c ir_constant class overrides this function to return \c true \b only
+    * for vector and scalar types that have all elements set to the value
+    * zero (or \c false for booleans).
+    *
+    * \sa ir_constant::has_value, ir_rvalue::is_one, ir_rvalue::is_negative_one,
+    *     ir_constant::is_basis
+    */
+   virtual bool is_zero() const;
+
+   /**
+    * Determine if an r-value has the value one
+    *
+    * The base implementation of this function always returns \c false.  The
+    * \c ir_constant class overrides this function to return \c true \b only
+    * for vector and scalar types that have all elements set to the value
+    * one (or \c true for booleans).
+    *
+    * \sa ir_constant::has_value, ir_rvalue::is_zero, ir_rvalue::is_negative_one,
+    *     ir_constant::is_basis
+    */
+   virtual bool is_one() const;
+
+   /**
+    * Determine if an r-value has the value negative one
+    *
+    * The base implementation of this function always returns \c false.  The
+    * \c ir_constant class overrides this function to return \c true \b only
+    * for vector and scalar types that have all elements set to the value
+    * negative one.  For boolean types, the result is always \c false.
+    *
+    * \sa ir_constant::has_value, ir_rvalue::is_zero, ir_rvalue::is_one,
+    *     ir_constant::is_basis
+    */
+   virtual bool is_negative_one() const;
+
+   /**
+    * Determine if an r-value is a basis vector
+    *
+    * The base implementation of this function always returns \c false.  The
+    * \c ir_constant class overrides this function to return \c true \b only
+    * for vector and scalar types that have one element set to the value one,
+    * and the other elements set to the value zero.  For boolean types, the
+    * result is always \c false.
+    *
+    * \sa ir_constant::has_value, ir_rvalue::is_zero, ir_rvalue::is_one,
+    *     ir_constant::is_negative_one
+    */
+   virtual bool is_basis() const;
+
+   /**
+    * Determine if an r-value is an unsigned integer constant which can be
+    * stored in 16 bits.
+    *
+    * \sa ir_constant::is_uint16_constant.
+    */
+   virtual bool is_uint16_constant() const { return false; }
+
+   /**
+    * Return a generic value of error_type.
+    *
+    * Allocation will be performed with 'mem_ctx' as ralloc owner.
+    */
+   static ir_rvalue *error_value(void *mem_ctx);
+
+protected:
+   ir_rvalue();
+};
+
+
+/**
+ * Variable storage classes
+ */
+enum ir_variable_mode {
+   ir_var_auto = 0,     /**< Function local variables and globals. */
+   ir_var_uniform,      /**< Variable declared as a uniform. */
+   ir_var_shader_in,
+   ir_var_shader_out,
+   ir_var_function_in,
+   ir_var_function_out,
+   ir_var_function_inout,
+   ir_var_const_in,	/**< "in" param that must be a constant expression */
+   ir_var_system_value, /**< Ex: front-face, instance-id, etc. */
+   ir_var_temporary,	/**< Temporary variable generated during compilation. */
+   ir_var_mode_count	/**< Number of variable modes */
+};
+
+/**
+ * Enum keeping track of how a variable was declared.  For error checking of
+ * the gl_PerVertex redeclaration rules.
+ */
+enum ir_var_declaration_type {
+   /**
+    * Normal declaration (for most variables, this means an explicit
+    * declaration; the exception is temporaries, which are always implicitly
+    * declared but still use ir_var_declared_normally).
+    *
+    * Note: an ir_variable that represents a named interface block uses
+    * ir_var_declared_normally.
+    */
+   ir_var_declared_normally = 0,
+
+   /**
+    * Variable was explicitly declared (or re-declared) in an unnamed
+    * interface block.
+    */
+   ir_var_declared_in_block,
+
+   /**
+    * Variable is an implicitly declared built-in that has not been explicitly
+    * re-declared by the shader.
+    */
+   ir_var_declared_implicitly,
+};
+
+/**
+ * \brief Layout qualifiers for gl_FragDepth.
+ *
+ * The AMD/ARB_conservative_depth extensions allow gl_FragDepth to be redeclared
+ * with a layout qualifier.
+ */
+enum ir_depth_layout {
+    ir_depth_layout_none, /**< No depth layout is specified. */
+    ir_depth_layout_any,
+    ir_depth_layout_greater,
+    ir_depth_layout_less,
+    ir_depth_layout_unchanged
+};
+
+/**
+ * \brief Convert depth layout qualifier to string.
+ */
+const char*
+depth_layout_string(ir_depth_layout layout);
+
+/**
+ * Description of built-in state associated with a uniform
+ *
+ * \sa ir_variable::state_slots
+ */
+struct ir_state_slot {
+   int tokens[5];
+   int swizzle;
+};
+
+
+/**
+ * Get the string value for an interpolation qualifier
+ *
+ * \return The string that would be used in a shader to specify \c mode.
+ *
+ * This function is used to generate error messages of the form "shader
+ * uses %s interpolation qualifier", so in the case where there is no
+ * interpolation qualifier, it returns "no".
+ *
+ * This function should only be used on a shader input or output variable.
+ */
+const char *interpolation_string(unsigned interpolation);
+
+
+class ir_variable : public ir_instruction {
+public:
+   ir_variable(const struct glsl_type *, const char *, ir_variable_mode);
+
+   virtual ir_variable *clone(void *mem_ctx, struct hash_table *ht) const;
+
+   virtual ir_variable *as_variable()
+   {
+      return this;
+   }
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+
+   /**
+    * Determine how this variable should be interpolated based on its
+    * interpolation qualifier (if present), whether it is gl_Color or
+    * gl_SecondaryColor, and whether flatshading is enabled in the current GL
+    * state.
+    *
+    * The return value will always be either INTERP_QUALIFIER_SMOOTH,
+    * INTERP_QUALIFIER_NOPERSPECTIVE, or INTERP_QUALIFIER_FLAT.
+    */
+   glsl_interp_qualifier determine_interpolation_mode(bool flat_shade);
+
+   /**
+    * Determine whether or not a variable is part of a uniform block.
+    */
+   inline bool is_in_uniform_block() const
+   {
+      return this->data.mode == ir_var_uniform && this->interface_type != NULL;
+   }
+
+   /**
+    * Determine whether or not a variable is the declaration of an interface
+    * block
+    *
+    * For the first declaration below, there will be an \c ir_variable named
+    * "instance" whose type and whose interface_type will be the same
+    * \c glsl_type.  For the second declaration, there will be an \c ir_variable
+    * named "f" whose type is float and whose interface_type is B2.
+    *
+    * "instance" is an interface instance variable, but "f" is not.
+    *
+    * uniform B1 {
+    *     float f;
+    * } instance;
+    *
+    * uniform B2 {
+    *     float f;
+    * };
+    */
+   inline bool is_interface_instance() const
+   {
+      const glsl_type *const t = this->type;
+
+      return (t == this->interface_type)
+         || (t->is_array() && t->fields.array == this->interface_type);
+   }
+
+   /**
+    * Set this->interface_type on a newly created variable.
+    */
+   void init_interface_type(const struct glsl_type *type)
+   {
+      assert(this->interface_type == NULL);
+      this->interface_type = type;
+      if (this->is_interface_instance()) {
+         this->max_ifc_array_access =
+            rzalloc_array(this, unsigned, type->length);
+      }
+   }
+
+   /**
+    * Change this->interface_type on a variable that previously had a
+    * different, but compatible, interface_type.  This is used during linking
+    * to set the size of arrays in interface blocks.
+    */
+   void change_interface_type(const struct glsl_type *type)
+   {
+      if (this->max_ifc_array_access != NULL) {
+         /* max_ifc_array_access has already been allocated, so make sure the
+          * new interface has the same number of fields as the old one.
+          */
+         assert(this->interface_type->length == type->length);
+      }
+      this->interface_type = type;
+   }
+
+   /**
+    * Change this->interface_type on a variable that previously had a
+    * different, and incompatible, interface_type. This is used during
+    * compilation to handle redeclaration of the built-in gl_PerVertex
+    * interface block.
+    */
+   void reinit_interface_type(const struct glsl_type *type)
+   {
+      if (this->max_ifc_array_access != NULL) {
+#ifndef NDEBUG
+         /* Redeclaring gl_PerVertex is only allowed if none of the built-ins
+          * it defines have been accessed yet; so it's safe to throw away the
+          * old max_ifc_array_access pointer, since all of its values are
+          * zero.
+          */
+         for (unsigned i = 0; i < this->interface_type->length; i++)
+            assert(this->max_ifc_array_access[i] == 0);
+#endif
+         ralloc_free(this->max_ifc_array_access);
+         this->max_ifc_array_access = NULL;
+      }
+      this->interface_type = NULL;
+      init_interface_type(type);
+   }
+
+   const glsl_type *get_interface_type() const
+   {
+      return this->interface_type;
+   }
+
+   /**
+    * Declared type of the variable
+    */
+   const struct glsl_type *type;
+
+   /**
+    * Declared name of the variable
+    */
+   const char *name;
+
+   /**
+    * For variables which satisfy the is_interface_instance() predicate, this
+    * points to an array of integers such that if the ith member of the
+    * interface block is an array, max_ifc_array_access[i] is the maximum
+    * array element of that member that has been accessed.  If the ith member
+    * of the interface block is not an array, max_ifc_array_access[i] is
+    * unused.
+    *
+    * For variables whose type is not an interface block, this pointer is
+    * NULL.
+    */
+   unsigned *max_ifc_array_access;
+
+   struct ir_variable_data {
+
+      /**
+       * Is the variable read-only?
+       *
+       * This is set for variables declared as \c const, shader inputs,
+       * and uniforms.
+       */
+      unsigned read_only:1;
+      unsigned centroid:1;
+      unsigned sample:1;
+      unsigned invariant:1;
+
+      /**
+       * Has this variable been used for reading or writing?
+       *
+       * Several GLSL semantic checks require knowledge of whether or not a
+       * variable has been used.  For example, it is an error to redeclare a
+       * variable as invariant after it has been used.
+       *
+       * This is only maintained in the ast_to_hir.cpp path, not in
+       * Mesa's fixed function or ARB program paths.
+       */
+      unsigned used:1;
+
+      /**
+       * Has this variable been statically assigned?
+       *
+       * This answers whether the variable was assigned in any path of
+       * the shader during ast_to_hir.  This doesn't answer whether it is
+       * still written after dead code removal, nor is it maintained in
+       * non-ast_to_hir.cpp (GLSL parsing) paths.
+       */
+      unsigned assigned:1;
+
+      /**
+       * Enum indicating how the variable was declared.  See
+       * ir_var_declaration_type.
+       *
+       * This is used to detect certain kinds of illegal variable redeclarations.
+       */
+      unsigned how_declared:2;
+
+      /**
+       * Storage class of the variable.
+       *
+       * \sa ir_variable_mode
+       */
+      unsigned mode:4;
+
+      /**
+       * Interpolation mode for shader inputs / outputs
+       *
+       * \sa ir_variable_interpolation
+       */
+      unsigned interpolation:2;
+
+      /**
+       * \name ARB_fragment_coord_conventions
+       * @{
+       */
+      unsigned origin_upper_left:1;
+      unsigned pixel_center_integer:1;
+      /*@}*/
+
+      /**
+       * Was the location explicitly set in the shader?
+       *
+       * If the location is explicitly set in the shader, it \b cannot be changed
+       * by the linker or by the API (e.g., calls to \c glBindAttribLocation have
+       * no effect).
+       */
+      unsigned explicit_location:1;
+      unsigned explicit_index:1;
+
+      /**
+       * Was an initial binding explicitly set in the shader?
+       *
+       * If so, constant_value contains an integer ir_constant representing the
+       * initial binding point.
+       */
+      unsigned explicit_binding:1;
+
+      /**
+       * Does this variable have an initializer?
+       *
+       * This is used by the linker to cross-validate initializers of global
+       * variables.
+       */
+      unsigned has_initializer:1;
+
+      /**
+       * Is this variable a generic output or input that has not yet been matched
+       * up to a variable in another stage of the pipeline?
+       *
+       * This is used by the linker as scratch storage while assigning locations
+       * to generic inputs and outputs.
+       */
+      unsigned is_unmatched_generic_inout:1;
+
+      /**
+       * If non-zero, then this variable may be packed along with other variables
+       * into a single varying slot, so this offset should be applied when
+       * accessing components.  For example, an offset of 1 means that the x
+       * component of this variable is actually stored in component y of the
+       * location specified by \c location.
+       */
+      unsigned location_frac:2;
+
+      /**
+       * Non-zero if this variable was created by lowering a named interface
+       * block which was not an array.
+       *
+       * Note that this variable and \c from_named_ifc_block_array will never
+       * both be non-zero.
+       */
+      unsigned from_named_ifc_block_nonarray:1;
+
+      /**
+       * Non-zero if this variable was created by lowering a named interface
+       * block which was an array.
+       *
+       * Note that this variable and \c from_named_ifc_block_nonarray will never
+       * both be non-zero.
+       */
+      unsigned from_named_ifc_block_array:1;
+
+      /**
+       * \brief Layout qualifier for gl_FragDepth.
+       *
+       * This is not equal to \c ir_depth_layout_none if and only if this
+       * variable is \c gl_FragDepth and a layout qualifier is specified.
+       */
+      ir_depth_layout depth_layout;
+
+      /**
+       * Storage location of the base of this variable
+       *
+       * The precise meaning of this field depends on the nature of the variable.
+       *
+       *   - Vertex shader input: one of the values from \c gl_vert_attrib.
+       *   - Vertex shader output: one of the values from \c gl_varying_slot.
+       *   - Geometry shader input: one of the values from \c gl_varying_slot.
+       *   - Geometry shader output: one of the values from \c gl_varying_slot.
+       *   - Fragment shader input: one of the values from \c gl_varying_slot.
+       *   - Fragment shader output: one of the values from \c gl_frag_result.
+       *   - Uniforms: Per-stage uniform slot number for default uniform block.
+       *   - Uniforms: Index within the uniform block definition for UBO members.
+       *   - Other: This field is not currently used.
+       *
+       * If the variable is a uniform, shader input, or shader output, and the
+       * slot has not been assigned, the value will be -1.
+       */
+      int location;
+
+      /**
+       * output index for dual source blending.
+       */
+      int index;
+
+      /**
+       * Initial binding point for a sampler or UBO.
+       *
+       * For array types, this represents the binding point for the first element.
+       */
+      int binding;
+
+      /**
+       * Location an atomic counter is stored at.
+       */
+      struct {
+         unsigned buffer_index;
+         unsigned offset;
+      } atomic;
+
+      /**
+       * ARB_shader_image_load_store qualifiers.
+       */
+      struct {
+         bool read_only; /**< "readonly" qualifier. */
+         bool write_only; /**< "writeonly" qualifier. */
+         bool coherent;
+         bool _volatile;
+         bool restrict_flag;
+
+         /** Image internal format if specified explicitly, otherwise GL_NONE. */
+         GLenum format;
+      } image;
+
+      /**
+       * Highest element accessed with a constant expression array index
+       *
+       * Not used for non-array variables.
+       */
+      unsigned max_array_access;
+
+   } data;
+
+   /**
+    * Built-in state that backs this uniform
+    *
+    * Once set at variable creation, \c state_slots must remain invariant.
+    * This is because, ideally, this array would be shared by all clones of
+    * this variable in the IR tree.  In other words, we'd really like for it
+    * to be a fly-weight.
+    *
+    * If the variable is not a uniform, \c num_state_slots will be zero and
+    * \c state_slots will be \c NULL.
+    */
+   /*@{*/
+   unsigned num_state_slots;    /**< Number of state slots used */
+   ir_state_slot *state_slots;  /**< State descriptors. */
+   /*@}*/
+
+   /**
+    * Emit a warning if this variable is accessed.
+    */
+   const char *warn_extension;
+
+   /**
+    * Value assigned in the initializer of a variable declared "const"
+    */
+   ir_constant *constant_value;
+
+   /**
+    * Constant expression assigned in the initializer of the variable
+    *
+    * \warning
+    * This field and \c ::constant_value are distinct.  Even if the two fields
+    * refer to constants with the same value, they must point to separate
+    * objects.
+    */
+   ir_constant *constant_initializer;
+
+private:
+   /**
+    * For variables that are in an interface block or are an instance of an
+    * interface block, this is the \c GLSL_TYPE_INTERFACE type for that block.
+    *
+    * \sa ir_variable::location
+    */
+   const glsl_type *interface_type;
+};
+
+/**
+ * A function that returns whether a built-in function is available in the
+ * current shading language (based on version, ES or desktop, and extensions).
+ */
+typedef bool (*builtin_available_predicate)(const _mesa_glsl_parse_state *);
+
+/*@{*/
+/**
+ * The representation of a function instance; may be the full definition or
+ * simply a prototype.
+ */
+class ir_function_signature : public ir_instruction {
+   /* An ir_function_signature will be part of the list of signatures in
+    * an ir_function.
+    */
+public:
+   ir_function_signature(const glsl_type *return_type,
+                         builtin_available_predicate builtin_avail = NULL);
+
+   virtual ir_function_signature *clone(void *mem_ctx,
+					struct hash_table *ht) const;
+   ir_function_signature *clone_prototype(void *mem_ctx,
+					  struct hash_table *ht) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   /**
+    * Attempt to evaluate this function as a constant expression,
+    * given a list of the actual parameters and the variable context.
+    * Returns NULL for non-built-ins.
+    */
+   ir_constant *constant_expression_value(exec_list *actual_parameters, struct hash_table *variable_context);
+
+   /**
+    * Get the name of the function for which this is a signature
+    */
+   const char *function_name() const;
+
+   /**
+    * Get a handle to the function for which this is a signature
+    *
+    * There is no setter function, this function returns a \c const pointer,
+    * and \c ir_function_signature::_function is private for a reason.  The
+    * only way to make a connection between a function and function signature
+    * is via \c ir_function::add_signature.  This helps ensure that certain
+    * invariants (i.e., a function signature is in the list of signatures for
+    * its \c _function) are met.
+    *
+    * \sa ir_function::add_signature
+    */
+   inline const class ir_function *function() const
+   {
+      return this->_function;
+   }
+
+   /**
+    * Check whether the qualifiers match between this signature's parameters
+    * and the supplied parameter list.  If not, returns the name of the first
+    * parameter with mismatched qualifiers (for use in error messages).
+    */
+   const char *qualifiers_match(exec_list *params);
+
+   /**
+    * Replace the current parameter list with the given one.  This is useful
+    * if the current information came from a prototype, and either has invalid
+    * or missing parameter names.
+    */
+   void replace_parameters(exec_list *new_params);
+
+   /**
+    * Function return type.
+    *
+    * \note This discards the optional precision qualifier.
+    */
+   const struct glsl_type *return_type;
+
+   /**
+    * List of ir_variable of function parameters.
+    *
+    * This represents the storage.  The parameters passed in a particular
+    * call will be in ir_call::actual_parameters.
+    */
+   struct exec_list parameters;
+
+   /** Whether or not this function has a body (which may be empty). */
+   unsigned is_defined:1;
+
+   /** Whether or not this function signature is a built-in. */
+   bool is_builtin() const;
+
+   /**
+    * Whether or not this function is an intrinsic to be implemented
+    * by the driver.
+    */
+   bool is_intrinsic;
+
+   /** Whether or not a built-in is available for this shader. */
+   bool is_builtin_available(const _mesa_glsl_parse_state *state) const;
+
+   /** Body of instructions in the function. */
+   struct exec_list body;
+
+private:
+   /**
+    * A function pointer to a predicate that answers whether a built-in
+    * function is available in the current shader.  NULL if not a built-in.
+    */
+   builtin_available_predicate builtin_avail;
+
+   /** Function of which this signature is one overload. */
+   class ir_function *_function;
+
+   /** Function signature of which this one is a prototype clone */
+   const ir_function_signature *origin;
+
+   friend class ir_function;
+
+   /**
+    * Helper function to run a list of instructions for constant
+    * expression evaluation.
+    *
+    * The hash table represents the values of the visible variables.
+    * There are no scoping issues because the table is indexed on
+    * ir_variable pointers, not variable names.
+    *
+    * Returns false if the expression is not constant, true otherwise,
+    * and the value in *result if result is non-NULL.
+    */
+   bool constant_expression_evaluate_expression_list(const struct exec_list &body,
+						     struct hash_table *variable_context,
+						     ir_constant **result);
+};
+
+
+/**
+ * Header for tracking multiple overloaded functions with the same name.
+ * Contains a list of ir_function_signatures representing each of the
+ * actual functions.
+ */
+class ir_function : public ir_instruction {
+public:
+   ir_function(const char *name);
+
+   virtual ir_function *clone(void *mem_ctx, struct hash_table *ht) const;
+
+   virtual ir_function *as_function()
+   {
+      return this;
+   }
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   void add_signature(ir_function_signature *sig)
+   {
+      sig->_function = this;
+      this->signatures.push_tail(sig);
+   }
+
+   /**
+    * Find a signature that matches a set of actual parameters, taking implicit
+    * conversions into account.  Also flags whether the match was exact.
+    */
+   ir_function_signature *matching_signature(_mesa_glsl_parse_state *state,
+                                             const exec_list *actual_param,
+					     bool *match_is_exact);
+
+   /**
+    * Find a signature that matches a set of actual parameters, taking implicit
+    * conversions into account.
+    */
+   ir_function_signature *matching_signature(_mesa_glsl_parse_state *state,
+                                             const exec_list *actual_param);
+
+   /**
+    * Find a signature that exactly matches a set of actual parameters without
+    * any implicit type conversions.
+    */
+   ir_function_signature *exact_matching_signature(_mesa_glsl_parse_state *state,
+                                                   const exec_list *actual_ps);
+
+   /**
+    * Name of the function.
+    */
+   const char *name;
+
+   /** Whether or not this function has a signature that isn't a built-in. */
+   bool has_user_signature();
+
+   /**
+    * List of ir_function_signature for each overloaded function with this name.
+    */
+   struct exec_list signatures;
+};
+
+inline const char *ir_function_signature::function_name() const
+{
+   return this->_function->name;
+}
+/*@}*/
+
+
+/**
+ * IR instruction representing high-level if-statements
+ */
+class ir_if : public ir_instruction {
+public:
+   ir_if(ir_rvalue *condition)
+      : condition(condition)
+   {
+      ir_type = ir_type_if;
+   }
+
+   virtual ir_if *clone(void *mem_ctx, struct hash_table *ht) const;
+
+   virtual ir_if *as_if()
+   {
+      return this;
+   }
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+   virtual void serialize_data(memory_writer &mem);
+
+
+   ir_rvalue *condition;
+   /** List of ir_instruction for the body of the then branch */
+   exec_list  then_instructions;
+   /** List of ir_instruction for the body of the else branch */
+   exec_list  else_instructions;
+};
+
+
+/**
+ * IR instruction representing a high-level loop structure.
+ */
+class ir_loop : public ir_instruction {
+public:
+   ir_loop();
+
+   virtual ir_loop *clone(void *mem_ctx, struct hash_table *ht) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   virtual ir_loop *as_loop()
+   {
+      return this;
+   }
+
+   /** List of ir_instruction that make up the body of the loop. */
+   exec_list body_instructions;
+};
+
+
+class ir_assignment : public ir_instruction {
+public:
+   ir_assignment(ir_rvalue *lhs, ir_rvalue *rhs, ir_rvalue *condition = NULL);
+
+   /**
+    * Construct an assignment with an explicit write mask
+    *
+    * \note
+    * Since a write mask is supplied, the LHS must already be a bare
+    * \c ir_dereference.  There cannot be any swizzles in the LHS.
+    */
+   ir_assignment(ir_dereference *lhs, ir_rvalue *rhs, ir_rvalue *condition,
+		 unsigned write_mask);
+
+   virtual ir_assignment *clone(void *mem_ctx, struct hash_table *ht) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual ir_constant *constant_expression_value(struct hash_table *variable_context = NULL);
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   virtual ir_assignment * as_assignment()
+   {
+      return this;
+   }
+
+   /**
+    * Get a whole variable written by an assignment
+    *
+    * If the LHS of the assignment writes a whole variable, the variable is
+    * returned.  Otherwise \c NULL is returned.  Examples of whole-variable
+    * assignment are:
+    *
+    *  - Assigning to a scalar
+    *  - Assigning to all components of a vector
+    *  - Whole array (or matrix) assignment
+    *  - Whole structure assignment
+    */
+   ir_variable *whole_variable_written();
+
+   /**
+    * Set the LHS of an assignment
+    */
+   void set_lhs(ir_rvalue *lhs);
+
+   /**
+    * Left-hand side of the assignment.
+    *
+    * This should be treated as read only.  If you need to set the LHS of an
+    * assignment, use \c ir_assignment::set_lhs.
+    */
+   ir_dereference *lhs;
+
+   /**
+    * Value being assigned
+    */
+   ir_rvalue *rhs;
+
+   /**
+    * Optional condition for the assignment.
+    */
+   ir_rvalue *condition;
+
+
+   /**
+    * Component mask written
+    *
+    * For non-vector types in the LHS, this field will be zero.  For vector
+    * types, a bit will be set for each component that is written.  Note that
+    * for \c vec2 and \c vec3 types only the lower bits will ever be set.
+    *
+    * A partially-set write mask means that each enabled channel gets
+    * the value from a consecutive channel of the rhs.  For example,
+    * to write just .xyw of gl_FrontColor with color:
+    *
+    * (assign (constant bool (1)) (xyw)
+    *     (var_ref gl_FragColor)
+    *     (swiz xyw (var_ref color)))
+    */
+   unsigned write_mask:4;
+};
+
+/* Update ir_expression::get_num_operands() and operator_strs when
+ * updating this list.
+ */
+enum ir_expression_operation {
+   ir_unop_bit_not,
+   ir_unop_logic_not,
+   ir_unop_neg,
+   ir_unop_abs,
+   ir_unop_sign,
+   ir_unop_rcp,
+   ir_unop_rsq,
+   ir_unop_sqrt,
+   ir_unop_exp,         /**< Log base e on gentype */
+   ir_unop_log,	        /**< Natural log on gentype */
+   ir_unop_exp2,
+   ir_unop_log2,
+   ir_unop_f2i,         /**< Float-to-integer conversion. */
+   ir_unop_f2u,         /**< Float-to-unsigned conversion. */
+   ir_unop_i2f,         /**< Integer-to-float conversion. */
+   ir_unop_f2b,         /**< Float-to-boolean conversion */
+   ir_unop_b2f,         /**< Boolean-to-float conversion */
+   ir_unop_i2b,         /**< int-to-boolean conversion */
+   ir_unop_b2i,         /**< Boolean-to-int conversion */
+   ir_unop_u2f,         /**< Unsigned-to-float conversion. */
+   ir_unop_i2u,         /**< Integer-to-unsigned conversion. */
+   ir_unop_u2i,         /**< Unsigned-to-integer conversion. */
+   ir_unop_bitcast_i2f, /**< Bit-identical int-to-float "conversion" */
+   ir_unop_bitcast_f2i, /**< Bit-identical float-to-int "conversion" */
+   ir_unop_bitcast_u2f, /**< Bit-identical uint-to-float "conversion" */
+   ir_unop_bitcast_f2u, /**< Bit-identical float-to-uint "conversion" */
+   ir_unop_any,
+
+   /**
+    * \name Unary floating-point rounding operations.
+    */
+   /*@{*/
+   ir_unop_trunc,
+   ir_unop_ceil,
+   ir_unop_floor,
+   ir_unop_fract,
+   ir_unop_round_even,
+   /*@}*/
+
+   /**
+    * \name Trigonometric operations.
+    */
+   /*@{*/
+   ir_unop_sin,
+   ir_unop_cos,
+   ir_unop_sin_reduced,    /**< Reduced range sin. [-pi, pi] */
+   ir_unop_cos_reduced,    /**< Reduced range cos. [-pi, pi] */
+   /*@}*/
+
+   /**
+    * \name Partial derivatives.
+    */
+   /*@{*/
+   ir_unop_dFdx,
+   ir_unop_dFdy,
+   /*@}*/
+
+   /**
+    * \name Floating point pack and unpack operations.
+    */
+   /*@{*/
+   ir_unop_pack_snorm_2x16,
+   ir_unop_pack_snorm_4x8,
+   ir_unop_pack_unorm_2x16,
+   ir_unop_pack_unorm_4x8,
+   ir_unop_pack_half_2x16,
+   ir_unop_unpack_snorm_2x16,
+   ir_unop_unpack_snorm_4x8,
+   ir_unop_unpack_unorm_2x16,
+   ir_unop_unpack_unorm_4x8,
+   ir_unop_unpack_half_2x16,
+   /*@}*/
+
+   /**
+    * \name Lowered floating point unpacking operations.
+    *
+    * \see lower_packing_builtins_visitor::split_unpack_half_2x16
+    */
+   /*@{*/
+   ir_unop_unpack_half_2x16_split_x,
+   ir_unop_unpack_half_2x16_split_y,
+   /*@}*/
+
+   /**
+    * \name Bit operations, part of ARB_gpu_shader5.
+    */
+   /*@{*/
+   ir_unop_bitfield_reverse,
+   ir_unop_bit_count,
+   ir_unop_find_msb,
+   ir_unop_find_lsb,
+   /*@}*/
+
+   ir_unop_noise,
+
+   /**
+    * A sentinel marking the last of the unary operations.
+    */
+   ir_last_unop = ir_unop_noise,
+
+   ir_binop_add,
+   ir_binop_sub,
+   ir_binop_mul,       /**< Floating-point or low 32-bit integer multiply. */
+   ir_binop_imul_high, /**< Calculates the high 32-bits of a 64-bit multiply. */
+   ir_binop_div,
+
+   /**
+    * Returns the carry resulting from the addition of the two arguments.
+    */
+   /*@{*/
+   ir_binop_carry,
+   /*@}*/
+
+   /**
+    * Returns the borrow resulting from the subtraction of the second argument
+    * from the first argument.
+    */
+   /*@{*/
+   ir_binop_borrow,
+   /*@}*/
+
+   /**
+    * Takes one of two combinations of arguments:
+    *
+    * - mod(vecN, vecN)
+    * - mod(vecN, float)
+    *
+    * Does not take integer types.
+    */
+   ir_binop_mod,
+
+   /**
+    * \name Binary comparison operators which return a boolean vector.
+    * The type of both operands must be equal.
+    */
+   /*@{*/
+   ir_binop_less,
+   ir_binop_greater,
+   ir_binop_lequal,
+   ir_binop_gequal,
+   ir_binop_equal,
+   ir_binop_nequal,
+   /**
+    * Returns single boolean for whether all components of operands[0]
+    * equal the components of operands[1].
+    */
+   ir_binop_all_equal,
+   /**
+    * Returns single boolean for whether any component of operands[0]
+    * is not equal to the corresponding component of operands[1].
+    */
+   ir_binop_any_nequal,
+   /*@}*/
+
+   /**
+    * \name Bit-wise binary operations.
+    */
+   /*@{*/
+   ir_binop_lshift,
+   ir_binop_rshift,
+   ir_binop_bit_and,
+   ir_binop_bit_xor,
+   ir_binop_bit_or,
+   /*@}*/
+
+   ir_binop_logic_and,
+   ir_binop_logic_xor,
+   ir_binop_logic_or,
+
+   ir_binop_dot,
+   ir_binop_min,
+   ir_binop_max,
+
+   ir_binop_pow,
+
+   /**
+    * \name Lowered floating point packing operations.
+    *
+    * \see lower_packing_builtins_visitor::split_pack_half_2x16
+    */
+   /*@{*/
+   ir_binop_pack_half_2x16_split,
+   /*@}*/
+
+   /**
+    * \name First half of a lowered bitfieldInsert() operation.
+    *
+    * \see lower_instructions::bitfield_insert_to_bfm_bfi
+    */
+   /*@{*/
+   ir_binop_bfm,
+   /*@}*/
+
+   /**
+    * Load a value the size of a given GLSL type from a uniform block.
+    *
+    * operand0 is the ir_constant uniform block index in the linked shader.
+    * operand1 is a byte offset within the uniform block.
+    */
+   ir_binop_ubo_load,
+
+   /**
+    * \name Multiplies a number by two to a power, part of ARB_gpu_shader5.
+    */
+   /*@{*/
+   ir_binop_ldexp,
+   /*@}*/
+
+   /**
+    * Extract a scalar from a vector
+    *
+    * operand0 is the vector
+    * operand1 is the index of the field to read from operand0
+    */
+   ir_binop_vector_extract,
+
+   /**
+    * A sentinel marking the last of the binary operations.
+    */
+   ir_last_binop = ir_binop_vector_extract,
+
+   /**
+    * \name Fused floating-point multiply-add, part of ARB_gpu_shader5.
+    */
+   /*@{*/
+   ir_triop_fma,
+   /*@}*/
+
+   ir_triop_lrp,
+
+   /**
+    * \name Conditional Select
+    *
+    * A vector conditional select instruction (like ?:, but operating per-
+    * component on vectors).
+    */
+   /*@{*/
+   ir_triop_csel,
+   /*@}*/
+
+   /**
+    * \name Second half of a lowered bitfieldInsert() operation.
+    *
+    * \see lower_instructions::bitfield_insert_to_bfm_bfi
+    */
+   /*@{*/
+   ir_triop_bfi,
+   /*@}*/
+
+   ir_triop_bitfield_extract,
+
+   /**
+    * Generate a value with one field of a vector changed
+    *
+    * operand0 is the vector
+    * operand1 is the value to write into the vector result
+    * operand2 is the index in operand0 to be modified
+    */
+   ir_triop_vector_insert,
+
+   /**
+    * A sentinel marking the last of the ternary operations.
+    */
+   ir_last_triop = ir_triop_vector_insert,
+
+   ir_quadop_bitfield_insert,
+
+   ir_quadop_vector,
+
+   /**
+    * A sentinel marking the last of the quaternary operations.
+    */
+   ir_last_quadop = ir_quadop_vector,
+
+   /**
+    * A sentinel marking the last of all operations.
+    */
+   ir_last_opcode = ir_quadop_vector
+};
+
+class ir_expression : public ir_rvalue {
+public:
+   ir_expression(int op, const struct glsl_type *type,
+                 ir_rvalue *op0, ir_rvalue *op1 = NULL,
+                 ir_rvalue *op2 = NULL, ir_rvalue *op3 = NULL);
+
+   /**
+    * Constructor for unary operation expressions
+    */
+   ir_expression(int op, ir_rvalue *);
+
+   /**
+    * Constructor for binary operation expressions
+    */
+   ir_expression(int op, ir_rvalue *op0, ir_rvalue *op1);
+
+   /**
+    * Constructor for ternary operation expressions
+    */
+   ir_expression(int op, ir_rvalue *op0, ir_rvalue *op1, ir_rvalue *op2);
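+
+   /* Illustrative example (a sketch, assuming 'a' and 'b' are float-typed
+    * rvalues allocated from 'mem_ctx'):
+    *
+    *    ir_expression *sum = new(mem_ctx) ir_expression(ir_binop_add, a, b);
+    *
+    * The binary constructor derives the result type from its operands, so no
+    * explicit type argument is needed.
+    */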
+
+   virtual ir_expression *as_expression()
+   {
+      return this;
+   }
+
+   virtual bool equals(ir_instruction *ir, enum ir_node_type ignore = ir_type_unset);
+
+   virtual ir_expression *clone(void *mem_ctx, struct hash_table *ht) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   /**
+    * Attempt to constant-fold the expression
+    *
+    * The "variable_context" hash table links ir_variable * to ir_constant *
+    * that represent the variables' values.  \c NULL represents an empty
+    * context.
+    *
+    * If the expression cannot be constant folded, this method will return
+    * \c NULL.
+    */
+   virtual ir_constant *constant_expression_value(struct hash_table *variable_context = NULL);
+
+   /**
+    * Determine the number of operands used by an expression
+    */
+   static unsigned int get_num_operands(ir_expression_operation);
+
+   /**
+    * Determine the number of operands used by an expression
+    */
+   unsigned int get_num_operands() const
+   {
+      return (this->operation == ir_quadop_vector)
+	 ? this->type->vector_elements : get_num_operands(operation);
+   }
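+
+   /* Illustrative example: for an ir_quadop_vector expression that builds a
+    * vec3, get_num_operands() returns 3; for ir_binop_add it returns 2.
+    */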
+
+   /**
+    * Return whether the expression operates on vectors horizontally.
+    */
+   bool is_horizontal() const
+   {
+      return operation == ir_binop_all_equal ||
+             operation == ir_binop_any_nequal ||
+             operation == ir_unop_any ||
+             operation == ir_binop_dot ||
+             operation == ir_quadop_vector;
+   }
+
+   /**
+    * Return a string representing this expression's operator.
+    */
+   const char *operator_string();
+
+   /**
+    * Return a string representing this expression's operator.
+    */
+   static const char *operator_string(ir_expression_operation);
+
+
+   /**
+    * Do a reverse-lookup to translate the given string into an operator.
+    */
+   static ir_expression_operation get_operator(const char *);
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   ir_expression_operation operation;
+   ir_rvalue *operands[4];
+};
+
+
+/**
+ * HIR instruction representing a high-level function call, containing a list
+ * of parameters and returning a value in the supplied temporary.
+ */
+class ir_call : public ir_instruction {
+public:
+   ir_call(ir_function_signature *callee,
+	   ir_dereference_variable *return_deref,
+	   exec_list *actual_parameters)
+      : return_deref(return_deref), callee(callee)
+   {
+      ir_type = ir_type_call;
+      assert(callee->return_type != NULL);
+      actual_parameters->move_nodes_to(& this->actual_parameters);
+      this->use_builtin = callee->is_builtin();
+   }
+
+   virtual ir_call *clone(void *mem_ctx, struct hash_table *ht) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual ir_constant *constant_expression_value(struct hash_table *variable_context = NULL);
+
+   virtual ir_call *as_call()
+   {
+      return this;
+   }
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   /**
+    * Get the name of the function being called.
+    */
+   const char *callee_name() const
+   {
+      return callee->function_name();
+   }
+
+   /**
+    * Generates an inline version of the function before @ir,
+    * storing the return value in return_deref.
+    */
+   void generate_inline(ir_instruction *ir);
+
+   /**
+    * Storage for the function's return value.
+    * This must be NULL if the return type is void.
+    */
+   ir_dereference_variable *return_deref;
+
+   /**
+    * The specific function signature being called.
+    */
+   ir_function_signature *callee;
+
+   /* List of ir_rvalue parameters passed in this call. */
+   exec_list actual_parameters;
+
+   /** Should this call only bind to a built-in function? */
+   bool use_builtin;
+};
+
+
+/**
+ * \name Jump-like IR instructions.
+ *
+ * These include \c break, \c continue, \c return, and \c discard.
+ */
+/*@{*/
+class ir_jump : public ir_instruction {
+protected:
+   ir_jump()
+   {
+      ir_type = ir_type_unset;
+   }
+
+public:
+   virtual ir_jump *as_jump()
+   {
+      return this;
+   }
+};
+
+class ir_return : public ir_jump {
+public:
+   ir_return()
+      : value(NULL)
+   {
+      this->ir_type = ir_type_return;
+   }
+
+   ir_return(ir_rvalue *value)
+      : value(value)
+   {
+      this->ir_type = ir_type_return;
+   }
+
+   virtual ir_return *clone(void *mem_ctx, struct hash_table *) const;
+
+   virtual ir_return *as_return()
+   {
+      return this;
+   }
+
+   ir_rvalue *get_value() const
+   {
+      return value;
+   }
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   ir_rvalue *value;
+   virtual void serialize_data(memory_writer &mem);
+
+};
+
+
+/**
+ * Jump instructions used inside loops
+ *
+ * These include \c break and \c continue.  The \c break within a loop is
+ * different from the \c break within a switch-statement.
+ *
+ * \sa ir_switch_jump
+ */
+class ir_loop_jump : public ir_jump {
+public:
+   enum jump_mode {
+      jump_break,
+      jump_continue
+   };
+
+   ir_loop_jump(jump_mode mode)
+   {
+      this->ir_type = ir_type_loop_jump;
+      this->mode = mode;
+   }
+
+   virtual ir_loop_jump *clone(void *mem_ctx, struct hash_table *) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   bool is_break() const
+   {
+      return mode == jump_break;
+   }
+
+   bool is_continue() const
+   {
+      return mode == jump_continue;
+   }
+
+   /** Mode selector for the jump instruction. */
+   enum jump_mode mode;
+};
+
+/**
+ * IR instruction representing discard statements.
+ */
+class ir_discard : public ir_jump {
+public:
+   ir_discard()
+   {
+      this->ir_type = ir_type_discard;
+      this->condition = NULL;
+   }
+
+   ir_discard(ir_rvalue *cond)
+   {
+      this->ir_type = ir_type_discard;
+      this->condition = cond;
+   }
+
+   virtual ir_discard *clone(void *mem_ctx, struct hash_table *ht) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   virtual ir_discard *as_discard()
+   {
+      return this;
+   }
+
+   ir_rvalue *condition;
+};
+/*@}*/
+
+
+/**
+ * Texture sampling opcodes used in ir_texture
+ */
+enum ir_texture_opcode {
+   ir_tex,		/**< Regular texture look-up */
+   ir_txb,		/**< Texture look-up with LOD bias */
+   ir_txl,		/**< Texture look-up with explicit LOD */
+   ir_txd,		/**< Texture look-up with partial derivatives */
+   ir_txf,		/**< Texel fetch with explicit LOD */
+   ir_txf_ms,           /**< Multisample texture fetch */
+   ir_txs,		/**< Texture size */
+   ir_lod,		/**< Texture lod query */
+   ir_tg4,		/**< Texture gather */
+   ir_query_levels      /**< Texture levels query */
+};
+
+
+/**
+ * IR instruction to sample a texture
+ *
+ * The specific form of the IR instruction depends on the \c mode value
+ * selected from \c ir_texture_opcodes.  In the printed IR, these will
+ * appear as:
+ *
+ *                                    Texel offset (0 or an expression)
+ *                                    | Projection divisor
+ *                                    | |  Shadow comparator
+ *                                    | |  |
+ *                                    v v  v
+ * (tex <type> <sampler> <coordinate> 0 1 ( ))
+ * (txb <type> <sampler> <coordinate> 0 1 ( ) <bias>)
+ * (txl <type> <sampler> <coordinate> 0 1 ( ) <lod>)
+ * (txd <type> <sampler> <coordinate> 0 1 ( ) (dPdx dPdy))
+ * (txf <type> <sampler> <coordinate> 0       <lod>)
+ * (txf_ms
+ *      <type> <sampler> <coordinate>         <sample_index>)
+ * (txs <type> <sampler> <lod>)
+ * (lod <type> <sampler> <coordinate>)
+ * (tg4 <type> <sampler> <coordinate> <offset> <component>)
+ * (query_levels <type> <sampler>)
+ */
+class ir_texture : public ir_rvalue {
+public:
+   ir_texture(enum ir_texture_opcode op)
+      : op(op), sampler(NULL), coordinate(NULL), projector(NULL),
+        shadow_comparitor(NULL), offset(NULL)
+   {
+      this->ir_type = ir_type_texture;
+      memset(&lod_info, 0, sizeof(lod_info));
+   }
+
+   virtual ir_texture *clone(void *mem_ctx, struct hash_table *) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual ir_constant *constant_expression_value(struct hash_table *variable_context = NULL);
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_texture *as_texture()
+   {
+      return this;
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   virtual bool equals(ir_instruction *ir, enum ir_node_type ignore = ir_type_unset);
+
+   /**
+    * Return a string representing the ir_texture_opcode.
+    */
+   const char *opcode_string();
+
+   /** Set the sampler and type. */
+   void set_sampler(ir_dereference *sampler, const glsl_type *type);
+
+   /**
+    * Do a reverse-lookup to translate a string into an ir_texture_opcode.
+    */
+   static ir_texture_opcode get_opcode(const char *);
+
+   enum ir_texture_opcode op;
+
+   /** Sampler to use for the texture access. */
+   ir_dereference *sampler;
+
+   /** Texture coordinate to sample */
+   ir_rvalue *coordinate;
+
+   /**
+    * Value used for projective divide.
+    *
+    * If there is no projective divide (the common case), this will be
+    * \c NULL.  Optimization passes should check whether this points to a
+    * constant of 1.0 and, if so, replace it with \c NULL.
+    */
+   ir_rvalue *projector;
+
+   /**
+    * Coordinate used for comparison on shadow look-ups.
+    *
+    * If there is no shadow comparison, this will be \c NULL.  For the
+    * \c ir_txf opcode, this *must* be \c NULL.
+    */
+   ir_rvalue *shadow_comparitor;
+
+   /** Texel offset. */
+   ir_rvalue *offset;
+
+   union {
+      ir_rvalue *lod;		/**< Floating point LOD */
+      ir_rvalue *bias;		/**< Floating point LOD bias */
+      ir_rvalue *sample_index;  /**< MSAA sample index */
+      ir_rvalue *component;     /**< Gather component selector */
+      struct {
+	 ir_rvalue *dPdx;	/**< Partial derivative of coordinate wrt X */
+	 ir_rvalue *dPdy;	/**< Partial derivative of coordinate wrt Y */
+      } grad;
+   } lod_info;
+};
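+
+/* Illustrative sketch (not part of the original header): building the
+ * explicit-LOD lookup shown in the (txl ...) form above.  Here `mem_ctx`,
+ * `sampler_var` and `coord` are assumed to be an existing ralloc context,
+ * ir_variable and ir_rvalue provided by the caller.
+ *
+ *    ir_texture *tex = new(mem_ctx) ir_texture(ir_txl);
+ *    tex->set_sampler(new(mem_ctx) ir_dereference_variable(sampler_var),
+ *                     glsl_type::vec4_type);
+ *    tex->coordinate = coord;
+ *    tex->lod_info.lod = new(mem_ctx) ir_constant(0.0f);
+ *
+ * With no offset, projection, or shadow comparison, those slots print as
+ * 0, 1, and ( ) in the form above.
+ */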
+
+
+struct ir_swizzle_mask {
+   unsigned x:2;
+   unsigned y:2;
+   unsigned z:2;
+   unsigned w:2;
+
+   /**
+    * Number of components in the swizzle.
+    */
+   unsigned num_components:3;
+
+   /**
+    * Does the swizzle contain duplicate components?
+    *
+    * L-value swizzles cannot contain duplicate components.
+    */
+   unsigned has_duplicates:1;
+};
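+
+/* Illustrative examples (not part of the original header): each field above
+ * holds the source component index (0-3) selected for that position.  For a
+ * vec4 value v:
+ *
+ *    v.wzyx  ->  x=3, y=2, z=1, w=0, num_components=4, has_duplicates=0
+ *    v.xxy   ->  x=0, y=0, z=1,      num_components=3, has_duplicates=1
+ *
+ * The second swizzle repeats component 0, so it cannot appear on the
+ * left-hand side of an assignment (see ir_swizzle::is_lvalue() below).
+ */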
+
+
+class ir_swizzle : public ir_rvalue {
+public:
+   ir_swizzle(ir_rvalue *, unsigned x, unsigned y, unsigned z, unsigned w,
+              unsigned count);
+
+   ir_swizzle(ir_rvalue *val, const unsigned *components, unsigned count);
+
+   ir_swizzle(ir_rvalue *val, ir_swizzle_mask mask);
+
+   virtual ir_swizzle *clone(void *mem_ctx, struct hash_table *) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual ir_constant *constant_expression_value(struct hash_table *variable_context = NULL);
+
+   virtual ir_swizzle *as_swizzle()
+   {
+      return this;
+   }
+
+   /**
+    * Construct an ir_swizzle from the textual representation.  Can fail.
+    */
+   static ir_swizzle *create(ir_rvalue *, const char *, unsigned vector_length);
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   virtual bool equals(ir_instruction *ir, enum ir_node_type ignore = ir_type_unset);
+
+   bool is_lvalue() const
+   {
+      return val->is_lvalue() && !mask.has_duplicates;
+   }
+
+   /**
+    * Get the variable that is ultimately referenced by an r-value
+    */
+   virtual ir_variable *variable_referenced() const;
+
+   ir_rvalue *val;
+   ir_swizzle_mask mask;
+
+private:
+   /**
+    * Initialize the mask component of a swizzle
+    *
+    * This is used by the \c ir_swizzle constructors.
+    */
+   void init_mask(const unsigned *components, unsigned count);
+};
+
+
+class ir_dereference : public ir_rvalue {
+public:
+   virtual ir_dereference *clone(void *mem_ctx, struct hash_table *) const = 0;
+
+   virtual ir_dereference *as_dereference()
+   {
+      return this;
+   }
+
+   bool is_lvalue() const;
+
+   /**
+    * Get the variable that is ultimately referenced by an r-value
+    */
+   virtual ir_variable *variable_referenced() const = 0;
+};
+
+
+class ir_dereference_variable : public ir_dereference {
+public:
+   ir_dereference_variable(ir_variable *var);
+
+   virtual ir_dereference_variable *clone(void *mem_ctx,
+					  struct hash_table *) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual ir_constant *constant_expression_value(struct hash_table *variable_context = NULL);
+
+   virtual ir_dereference_variable *as_dereference_variable()
+   {
+      return this;
+   }
+
+   virtual bool equals(ir_instruction *ir, enum ir_node_type ignore = ir_type_unset);
+
+   /**
+    * Get the variable that is ultimately referenced by an r-value
+    */
+   virtual ir_variable *variable_referenced() const
+   {
+      return this->var;
+   }
+
+   virtual ir_variable *whole_variable_referenced()
+   {
+      /* ir_dereference_variable objects always dereference the entire
+       * variable.  However, if this dereference is dereferenced by anything
+       * else, the complete dereference chain is not a whole-variable
+       * dereference.  This method should only be called on the topmost
+       * ir_rvalue in a dereference chain.
+       */
+      return this->var;
+   }
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   /**
+    * Object being dereferenced.
+    */
+   ir_variable *var;
+};
+
+
+class ir_dereference_array : public ir_dereference {
+public:
+   ir_dereference_array(ir_rvalue *value, ir_rvalue *array_index);
+
+   ir_dereference_array(ir_variable *var, ir_rvalue *array_index);
+
+   virtual ir_dereference_array *clone(void *mem_ctx,
+				       struct hash_table *) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual ir_constant *constant_expression_value(struct hash_table *variable_context = NULL);
+
+   virtual ir_dereference_array *as_dereference_array()
+   {
+      return this;
+   }
+
+   virtual bool equals(ir_instruction *ir, enum ir_node_type ignore = ir_type_unset);
+
+   /**
+    * Get the variable that is ultimately referenced by an r-value
+    */
+   virtual ir_variable *variable_referenced() const
+   {
+      return this->array->variable_referenced();
+   }
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   ir_rvalue *array;
+   ir_rvalue *array_index;
+
+private:
+   void set_array(ir_rvalue *value);
+};
+
+
+class ir_dereference_record : public ir_dereference {
+public:
+   ir_dereference_record(ir_rvalue *value, const char *field);
+
+   ir_dereference_record(ir_variable *var, const char *field);
+
+   virtual ir_dereference_record *clone(void *mem_ctx,
+					struct hash_table *) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual ir_constant *constant_expression_value(struct hash_table *variable_context = NULL);
+
+   virtual ir_dereference_record *as_dereference_record()
+   {
+      return this;
+   }
+
+   /**
+    * Get the variable that is ultimately referenced by an r-value
+    */
+   virtual ir_variable *variable_referenced() const
+   {
+      return this->record->variable_referenced();
+   }
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   ir_rvalue *record;
+   const char *field;
+};
+
+
+/**
+ * Data stored in an ir_constant
+ */
+union ir_constant_data {
+      unsigned u[16];
+      int i[16];
+      float f[16];
+      bool b[16];
+};
+
+
+class ir_constant : public ir_rvalue {
+public:
+   ir_constant(const struct glsl_type *type, const ir_constant_data *data);
+   ir_constant(bool b, unsigned vector_elements=1);
+   ir_constant(unsigned int u, unsigned vector_elements=1);
+   ir_constant(int i, unsigned vector_elements=1);
+   ir_constant(float f, unsigned vector_elements=1);
+
+   /**
+    * Construct an ir_constant from a list of ir_constant values
+    */
+   ir_constant(const struct glsl_type *type, exec_list *values);
+
+   /**
+    * Construct an ir_constant from a scalar component of another ir_constant
+    *
+    * The new \c ir_constant inherits the type of the component from the
+    * source constant.
+    *
+    * \note
+    * In the case of a matrix constant, the new constant is a scalar, \b not
+    * a vector.
+    */
+   ir_constant(const ir_constant *c, unsigned i);
+
+   /**
+    * Return a new ir_constant of the specified type containing all zeros.
+    */
+   static ir_constant *zero(void *mem_ctx, const glsl_type *type);
+
+   virtual ir_constant *clone(void *mem_ctx, struct hash_table *) const;
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual ir_constant *constant_expression_value(struct hash_table *variable_context = NULL);
+
+   virtual ir_constant *as_constant()
+   {
+      return this;
+   }
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   virtual bool equals(ir_instruction *ir, enum ir_node_type ignore = ir_type_unset);
+
+   /**
+    * Get a particular component of a constant as a specific type
+    *
+    * This is useful, for example, to get a value from an integer constant
+    * as a float or bool.  This appears frequently when constructors are
+    * called with all constant parameters.
+    */
+   /*@{*/
+   bool get_bool_component(unsigned i) const;
+   float get_float_component(unsigned i) const;
+   int get_int_component(unsigned i) const;
+   unsigned get_uint_component(unsigned i) const;
+   /*@}*/
+
+   ir_constant *get_array_element(unsigned i) const;
+
+   ir_constant *get_record_field(const char *name);
+
+   /**
+    * Copy the values from another constant at a given offset.
+    *
+    * The offset is ignored for array or struct copies; it is only used when
+    * copying scalars or vectors into vectors or matrices.
+    *
+    * With identical types on both sides and a zero offset, this behaves like
+    * clone() without creating a new object.
+    */
+   void copy_offset(ir_constant *src, int offset);
+
+   /**
+    * Copy the values from another constant at a given offset, following an
+    * assign-like write mask.
+    *
+    * The mask is ignored for scalars.
+    *
+    * Note that this function only handles what assign can handle: at most a
+    * vector as the source and a column of a matrix as the destination.
+    */
+   void copy_masked_offset(ir_constant *src, int offset, unsigned int mask);
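+
+   /* Illustrative example (not part of the original header), assuming the
+    * usual assignment semantics where set mask bits consume successive
+    * source components: with a vec4 destination, offset 0 and mask 0x5
+    * (x and z), src.x lands in dst.x and src.y lands in dst.z.
+    */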
+
+   /**
+    * Determine whether a constant has the same value as another constant
+    *
+    * \sa ir_constant::is_zero, ir_constant::is_one,
+    * ir_constant::is_negative_one, ir_constant::is_basis
+    */
+   bool has_value(const ir_constant *) const;
+
+   /**
+    * Return true if this ir_constant represents the given value.
+    *
+    * For vectors, this checks that each component is the given value.
+    */
+   virtual bool is_value(float f, int i) const;
+   virtual bool is_zero() const;
+   virtual bool is_one() const;
+   virtual bool is_negative_one() const;
+   virtual bool is_basis() const;
+
+   /**
+    * Return true for constants that could be stored as 16-bit unsigned values.
+    *
+    * Note that this will return true even for signed integer ir_constants, as
+    * long as the value is non-negative and fits in 16-bits.
+    */
+   virtual bool is_uint16_constant() const;
+
+   /**
+    * Value of the constant.
+    *
+    * The field used to back the values supplied by the constant is determined
+    * by the type associated with the \c ir_instruction.  Constants may be
+    * scalars, vectors, or matrices.
+    */
+   union ir_constant_data value;
+
+   /* Array elements */
+   ir_constant **array_elements;
+
+   /* Structure fields */
+   exec_list components;
+
+private:
+   /**
+    * Parameterless constructor only used by the clone method
+    */
+   ir_constant(void);
+};
+
+/**
+ * IR instruction to emit a vertex in a geometry shader.
+ */
+class ir_emit_vertex : public ir_instruction {
+public:
+   ir_emit_vertex()
+   {
+      ir_type = ir_type_emit_vertex;
+   }
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_emit_vertex *clone(void *mem_ctx, struct hash_table *) const
+   {
+      return new(mem_ctx) ir_emit_vertex();
+   }
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+};
+
+/**
+ * IR instruction to complete the current primitive and start a new one in a
+ * geometry shader.
+ */
+class ir_end_primitive : public ir_instruction {
+public:
+   ir_end_primitive()
+   {
+      ir_type = ir_type_end_primitive;
+   }
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_end_primitive *clone(void *mem_ctx, struct hash_table *) const
+   {
+      return new(mem_ctx) ir_end_primitive();
+   }
+
+   virtual void serialize_data(memory_writer &mem);
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+};
+
+/*@}*/
+
+/**
+ * Apply a visitor to each IR node in a list
+ */
+void
+visit_exec_list(exec_list *list, ir_visitor *visitor);
+
+/**
+ * Validate invariants on each IR node in a list
+ */
+void validate_ir_tree(exec_list *instructions);
+
+struct _mesa_glsl_parse_state;
+struct gl_shader_program;
+
+/**
+ * Detect whether an unlinked shader contains static recursion
+ *
+ * If the list of instructions is determined to contain static recursion,
+ * \c _mesa_glsl_error will be called to emit error messages for each function
+ * that is in the recursion cycle.
+ */
+void
+detect_recursion_unlinked(struct _mesa_glsl_parse_state *state,
+			  exec_list *instructions);
+
+/**
+ * Detect whether a linked shader contains static recursion
+ *
+ * If the list of instructions is determined to contain static recursion,
+ * \c link_error_printf will be called to emit error messages for each function
+ * that is in the recursion cycle.  In addition,
+ * \c gl_shader_program::LinkStatus will be set to false.
+ */
+void
+detect_recursion_linked(struct gl_shader_program *prog,
+			exec_list *instructions);
+
+/**
+ * Make a clone of each IR instruction in a list
+ *
+ * \param in   List of IR instructions that are to be cloned
+ * \param out  List to hold the cloned instructions
+ */
+void
+clone_ir_list(void *mem_ctx, exec_list *out, const exec_list *in);
+
+extern void
+_mesa_glsl_initialize_variables(exec_list *instructions,
+				struct _mesa_glsl_parse_state *state);
+
+extern void
+_mesa_glsl_initialize_functions(_mesa_glsl_parse_state *state);
+
+extern void
+_mesa_glsl_initialize_builtin_functions();
+
+extern ir_function_signature *
+_mesa_glsl_find_builtin_function(_mesa_glsl_parse_state *state,
+                                 const char *name, exec_list *actual_parameters);
+
+extern gl_shader *
+_mesa_glsl_get_builtin_function_shader(void);
+
+extern void
+_mesa_glsl_release_functions(void);
+
+extern void
+_mesa_glsl_release_builtin_functions(void);
+
+extern void
+reparent_ir(exec_list *list, void *mem_ctx);
+
+struct glsl_symbol_table;
+
+extern void
+import_prototypes(const exec_list *source, exec_list *dest,
+		  struct glsl_symbol_table *symbols, void *mem_ctx);
+
+extern bool
+ir_has_call(ir_instruction *ir);
+
+extern void
+do_set_program_inouts(exec_list *instructions, struct gl_program *prog,
+                      gl_shader_stage shader_stage);
+
+extern char *
+prototype_string(const glsl_type *return_type, const char *name,
+		 exec_list *parameters);
+
+const char *
+mode_string(const ir_variable *var);
+
+extern "C" {
+#endif /* __cplusplus */
+
+extern void _mesa_print_ir(FILE *f, struct exec_list *instructions,
+                           struct _mesa_glsl_parse_state *state);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+unsigned
+vertices_per_prim(GLenum prim);
+
+#endif /* IR_H */
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/ir_builder.h b/icd/intel/compiler/mesa-utils/src/glsl/ir_builder.h
new file mode 100644
index 0000000..108b53a
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/ir_builder.h
@@ -0,0 +1,218 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "ir.h"
+
+namespace ir_builder {
+
+#ifndef WRITEMASK_X
+enum writemask {
+   WRITEMASK_X = 0x1,
+   WRITEMASK_Y = 0x2,
+   WRITEMASK_Z = 0x4,
+   WRITEMASK_W = 0x8,
+};
+#endif
+
+/**
+ * This little class exists to let the helper expression generators
+ * take either an ir_rvalue * or an ir_variable * to be automatically
+ * dereferenced, while still providing compile-time type checking.
+ *
+ * You don't have to explicitly call the constructor -- C++ will see
+ * that you passed an ir_variable, and silently call the
+ * operand(ir_variable *var) constructor behind your back.
+ */
+class operand {
+public:
+   operand(ir_rvalue *val)
+      : val(val)
+   {
+   }
+
+   operand(ir_variable *var)
+   {
+      void *mem_ctx = ralloc_parent(var);
+      val = new(mem_ctx) ir_dereference_variable(var);
+   }
+
+   ir_rvalue *val;
+};
+
+/** Automatic generator for ir_dereference_variable on assignment LHS.
+ *
+ * \sa operand
+ */
+class deref {
+public:
+   deref(ir_dereference *val)
+      : val(val)
+   {
+   }
+
+   deref(ir_variable *var)
+   {
+      void *mem_ctx = ralloc_parent(var);
+      val = new(mem_ctx) ir_dereference_variable(var);
+   }
+
+
+   ir_dereference *val;
+};
+
+class ir_factory {
+public:
+   ir_factory(exec_list *instructions = NULL, void *mem_ctx = NULL)
+      : instructions(instructions),
+        mem_ctx(mem_ctx)
+   {
+      return;
+   }
+
+   void emit(ir_instruction *ir);
+   ir_variable *make_temp(const glsl_type *type, const char *name);
+
+   ir_constant*
+   constant(float f)
+   {
+      return new(mem_ctx) ir_constant(f);
+   }
+
+   ir_constant*
+   constant(int i)
+   {
+      return new(mem_ctx) ir_constant(i);
+   }
+
+   ir_constant*
+   constant(unsigned u)
+   {
+      return new(mem_ctx) ir_constant(u);
+   }
+
+   ir_constant*
+   constant(bool b)
+   {
+      return new(mem_ctx) ir_constant(b);
+   }
+
+   exec_list *instructions;
+   void *mem_ctx;
+};
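+
+/* Illustrative sketch (not part of the original header): emitting
+ * `y = a + b * 2.0` with the factory and the expression helpers declared
+ * below.  `instructions` and `mem_ctx` come from the calling pass; `a`, `b`
+ * and `y` are existing ir_variable pointers.
+ *
+ *    ir_factory body(instructions, mem_ctx);
+ *    body.emit(assign(y, add(a, mul(b, body.constant(2.0f)))));
+ *
+ * The operand and deref wrappers above convert the raw ir_variable pointers
+ * into dereferences implicitly, which is what keeps builder code this terse.
+ */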
+
+ir_assignment *assign(deref lhs, operand rhs);
+ir_assignment *assign(deref lhs, operand rhs, int writemask);
+ir_assignment *assign(deref lhs, operand rhs, operand condition);
+ir_assignment *assign(deref lhs, operand rhs, operand condition, int writemask);
+
+ir_return *ret(operand retval);
+
+ir_expression *expr(ir_expression_operation op, operand a);
+ir_expression *expr(ir_expression_operation op, operand a, operand b);
+ir_expression *expr(ir_expression_operation op, operand a, operand b, operand c);
+ir_expression *add(operand a, operand b);
+ir_expression *sub(operand a, operand b);
+ir_expression *mul(operand a, operand b);
+ir_expression *imul_high(operand a, operand b);
+ir_expression *div(operand a, operand b);
+ir_expression *carry(operand a, operand b);
+ir_expression *borrow(operand a, operand b);
+ir_expression *round_even(operand a);
+ir_expression *dot(operand a, operand b);
+ir_expression *clamp(operand a, operand b, operand c);
+ir_expression *saturate(operand a);
+ir_expression *abs(operand a);
+ir_expression *neg(operand a);
+ir_expression *sin(operand a);
+ir_expression *cos(operand a);
+ir_expression *exp(operand a);
+ir_expression *rsq(operand a);
+ir_expression *sqrt(operand a);
+ir_expression *log(operand a);
+ir_expression *sign(operand a);
+
+ir_expression *equal(operand a, operand b);
+ir_expression *nequal(operand a, operand b);
+ir_expression *less(operand a, operand b);
+ir_expression *greater(operand a, operand b);
+ir_expression *lequal(operand a, operand b);
+ir_expression *gequal(operand a, operand b);
+
+ir_expression *logic_not(operand a);
+ir_expression *logic_and(operand a, operand b);
+ir_expression *logic_or(operand a, operand b);
+
+ir_expression *bit_not(operand a);
+ir_expression *bit_or(operand a, operand b);
+ir_expression *bit_and(operand a, operand b);
+ir_expression *lshift(operand a, operand b);
+ir_expression *rshift(operand a, operand b);
+
+ir_expression *f2i(operand a);
+ir_expression *bitcast_f2i(operand a);
+ir_expression *i2f(operand a);
+ir_expression *bitcast_i2f(operand a);
+ir_expression *f2u(operand a);
+ir_expression *bitcast_f2u(operand a);
+ir_expression *u2f(operand a);
+ir_expression *bitcast_u2f(operand a);
+ir_expression *i2u(operand a);
+ir_expression *u2i(operand a);
+ir_expression *b2i(operand a);
+ir_expression *i2b(operand a);
+ir_expression *f2b(operand a);
+ir_expression *b2f(operand a);
+
+ir_expression *min2(operand a, operand b);
+ir_expression *max2(operand a, operand b);
+
+ir_expression *fma(operand a, operand b, operand c);
+ir_expression *lrp(operand x, operand y, operand a);
+ir_expression *csel(operand a, operand b, operand c);
+ir_expression *bitfield_insert(operand a, operand b, operand c, operand d);
+
+ir_swizzle *swizzle(operand a, int swizzle, int components);
+/**
+ * Swizzle away later components, but preserve the ordering.
+ */
+ir_swizzle *swizzle_for_size(operand a, unsigned components);
+
+ir_swizzle *swizzle_xxxx(operand a);
+ir_swizzle *swizzle_yyyy(operand a);
+ir_swizzle *swizzle_zzzz(operand a);
+ir_swizzle *swizzle_wwww(operand a);
+ir_swizzle *swizzle_x(operand a);
+ir_swizzle *swizzle_y(operand a);
+ir_swizzle *swizzle_z(operand a);
+ir_swizzle *swizzle_w(operand a);
+ir_swizzle *swizzle_xy(operand a);
+ir_swizzle *swizzle_xyz(operand a);
+ir_swizzle *swizzle_xyzw(operand a);
+
+ir_if *if_tree(operand condition,
+               ir_instruction *then_branch);
+ir_if *if_tree(operand condition,
+               ir_instruction *then_branch,
+               ir_instruction *else_branch);
+
+} /* namespace ir_builder */
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/ir_expression_flattening.h b/icd/intel/compiler/mesa-utils/src/glsl/ir_expression_flattening.h
new file mode 100644
index 0000000..2eda159
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/ir_expression_flattening.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+
+/**
+ * \file ir_expression_flattening.h
+ *
+ * Takes the leaves of expression trees and makes them dereferences of
+ * assignments of the leaves to temporaries, according to a predicate.
+ *
+ * This is used for automatic function inlining, where we want to take
+ * an expression containing a call and move the call out to its own
+ * assignment so that we can inline it at the appropriate place in the
+ * instruction stream.
+ */
+
+void do_expression_flattening(exec_list *instructions,
+			      bool (*predicate)(ir_instruction *ir));
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/ir_optimization.h b/icd/intel/compiler/mesa-utils/src/glsl/ir_optimization.h
new file mode 100644
index 0000000..c63921c
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/ir_optimization.h
@@ -0,0 +1,128 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+
+/**
+ * \file ir_optimization.h
+ *
+ * Prototypes for optimization passes to be called by the compiler and drivers.
+ */
+
+/* Operations for lower_instructions() */
+#define SUB_TO_ADD_NEG     0x01
+#define DIV_TO_MUL_RCP     0x02
+#define EXP_TO_EXP2        0x04
+#define POW_TO_EXP2        0x08
+#define LOG_TO_LOG2        0x10
+#define MOD_TO_FRACT       0x20
+#define INT_DIV_TO_MUL_RCP 0x40
+#define BITFIELD_INSERT_TO_BFM_BFI 0x80
+#define LDEXP_TO_ARITH     0x100
+#define CARRY_TO_ARITH     0x200
+#define BORROW_TO_ARITH    0x400
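+
+/* Illustrative usage (not part of the original header): the operations above
+ * form a bitmask, so a driver requesting several lowerings passes them OR'd
+ * together, e.g.
+ *
+ *    lower_instructions(ir, SUB_TO_ADD_NEG | EXP_TO_EXP2 | LOG_TO_LOG2);
+ */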
+
+/**
+ * \see class lower_packing_builtins_visitor
+ */
+enum lower_packing_builtins_op {
+   LOWER_PACK_UNPACK_NONE               = 0x0000,
+
+   LOWER_PACK_SNORM_2x16                = 0x0001,
+   LOWER_UNPACK_SNORM_2x16              = 0x0002,
+
+   LOWER_PACK_UNORM_2x16                = 0x0004,
+   LOWER_UNPACK_UNORM_2x16              = 0x0008,
+
+   LOWER_PACK_HALF_2x16                 = 0x0010,
+   LOWER_UNPACK_HALF_2x16               = 0x0020,
+
+   LOWER_PACK_HALF_2x16_TO_SPLIT        = 0x0040,
+   LOWER_UNPACK_HALF_2x16_TO_SPLIT      = 0x0080,
+
+   LOWER_PACK_SNORM_4x8                 = 0x0100,
+   LOWER_UNPACK_SNORM_4x8               = 0x0200,
+
+   LOWER_PACK_UNORM_4x8                 = 0x0400,
+   LOWER_UNPACK_UNORM_4x8               = 0x0800
+};
+
+bool do_common_optimization(exec_list *ir, bool linked,
+			    bool uniform_locations_assigned,
+                            const struct gl_shader_compiler_options *options,
+                            bool native_integers);
+
+bool do_algebraic(exec_list *instructions, bool native_integers);
+bool do_constant_folding(exec_list *instructions);
+bool do_constant_variable(exec_list *instructions);
+bool do_constant_variable_unlinked(exec_list *instructions);
+bool do_copy_propagation(exec_list *instructions);
+bool do_copy_propagation_elements(exec_list *instructions);
+bool do_constant_propagation(exec_list *instructions);
+bool do_cse(exec_list *instructions);
+void do_dead_builtin_varyings(struct gl_context *ctx,
+                              gl_shader *producer, gl_shader *consumer,
+                              unsigned num_tfeedback_decls,
+                              class tfeedback_decl *tfeedback_decls);
+bool do_dead_code(exec_list *instructions, bool uniform_locations_assigned);
+bool do_dead_code_local(exec_list *instructions);
+bool do_dead_code_unlinked(exec_list *instructions);
+bool do_dead_functions(exec_list *instructions);
+bool opt_flip_matrices(exec_list *instructions);
+bool do_function_inlining(exec_list *instructions);
+bool do_lower_jumps(exec_list *instructions, bool pull_out_jumps = true,
+                    bool lower_sub_return = true,
+                    bool lower_main_return = false,
+                    bool lower_continue = false,
+                    bool lower_break = false);
+bool do_lower_texture_projection(exec_list *instructions);
+bool do_if_simplification(exec_list *instructions);
+bool opt_flatten_nested_if_blocks(exec_list *instructions);
+bool do_discard_simplification(exec_list *instructions);
+bool lower_if_to_cond_assign(exec_list *instructions, unsigned max_depth = 0);
+bool do_mat_op_to_vec(exec_list *instructions);
+bool do_noop_swizzle(exec_list *instructions);
+bool do_structure_splitting(exec_list *instructions);
+bool do_swizzle_swizzle(exec_list *instructions);
+bool do_vectorize(exec_list *instructions);
+bool do_tree_grafting(exec_list *instructions);
+bool do_vec_index_to_cond_assign(exec_list *instructions);
+bool do_vec_index_to_swizzle(exec_list *instructions);
+bool lower_discard(exec_list *instructions);
+void lower_discard_flow(exec_list *instructions);
+bool lower_instructions(exec_list *instructions, unsigned what_to_lower);
+bool lower_noise(exec_list *instructions);
+bool lower_variable_index_to_cond_assign(exec_list *instructions,
+    bool lower_input, bool lower_output, bool lower_temp, bool lower_uniform);
+bool lower_quadop_vector(exec_list *instructions, bool dont_lower_swz);
+bool lower_clip_distance(gl_shader *shader);
+void lower_output_reads(exec_list *instructions);
+bool lower_packing_builtins(exec_list *instructions, int op_mask);
+void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions);
+void lower_packed_varyings(void *mem_ctx,
+                           unsigned locations_used, ir_variable_mode mode,
+                           unsigned gs_input_vertices, gl_shader *shader);
+bool lower_vector_insert(exec_list *instructions, bool lower_nonconstant_index);
+void lower_named_interface_blocks(void *mem_ctx, gl_shader *shader);
+bool optimize_redundant_jumps(exec_list *instructions);
+bool optimize_split_arrays(exec_list *instructions, bool linked);
+bool lower_offset_arrays(exec_list *instructions);
+
+ir_rvalue *
+compare_index_block(exec_list *instructions, ir_variable *index,
+		    unsigned base, unsigned components, void *mem_ctx);
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/ir_rvalue_visitor.h b/icd/intel/compiler/mesa-utils/src/glsl/ir_rvalue_visitor.h
new file mode 100644
index 0000000..2179fa5
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/ir_rvalue_visitor.h
@@ -0,0 +1,74 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_rvalue_visitor.h
+ *
+ * Generic class to implement the common pattern we have of wanting to
+ * visit each ir_rvalue * and possibly change that node to a different
+ * class.  Just implement handle_rvalue() and you will be called with
+ * a pointer to each rvalue in the tree.
+ */
+
+class ir_rvalue_base_visitor : public ir_hierarchical_visitor {
+public:
+   ir_visitor_status rvalue_visit(ir_assignment *);
+   ir_visitor_status rvalue_visit(ir_call *);
+   ir_visitor_status rvalue_visit(ir_dereference_array *);
+   ir_visitor_status rvalue_visit(ir_dereference_record *);
+   ir_visitor_status rvalue_visit(ir_expression *);
+   ir_visitor_status rvalue_visit(ir_if *);
+   ir_visitor_status rvalue_visit(ir_return *);
+   ir_visitor_status rvalue_visit(ir_swizzle *);
+   ir_visitor_status rvalue_visit(ir_texture *);
+
+   virtual void handle_rvalue(ir_rvalue **rvalue) = 0;
+};
+
+class ir_rvalue_visitor : public ir_rvalue_base_visitor {
+public:
+
+   virtual ir_visitor_status visit_leave(ir_assignment *);
+   virtual ir_visitor_status visit_leave(ir_call *);
+   virtual ir_visitor_status visit_leave(ir_dereference_array *);
+   virtual ir_visitor_status visit_leave(ir_dereference_record *);
+   virtual ir_visitor_status visit_leave(ir_expression *);
+   virtual ir_visitor_status visit_leave(ir_if *);
+   virtual ir_visitor_status visit_leave(ir_return *);
+   virtual ir_visitor_status visit_leave(ir_swizzle *);
+   virtual ir_visitor_status visit_leave(ir_texture *);
+};
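+
+/* Illustrative sketch (not part of the original header): a pass that replaces
+ * every float-typed constant with zero could subclass ir_rvalue_visitor as
+ *
+ *    class zero_float_constants : public ir_rvalue_visitor {
+ *       void handle_rvalue(ir_rvalue **rvalue)
+ *       {
+ *          if (*rvalue == NULL)
+ *             return;
+ *          ir_constant *c = (*rvalue)->as_constant();
+ *          if (c != NULL && c->type->base_type == GLSL_TYPE_FLOAT)
+ *             *rvalue = ir_constant::zero(ralloc_parent(c), c->type);
+ *       }
+ *    };
+ *
+ * Writing through the ir_rvalue ** is what lets a pass swap a node for a
+ * different one.
+ */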
+
+class ir_rvalue_enter_visitor : public ir_rvalue_base_visitor {
+public:
+
+   virtual ir_visitor_status visit_enter(ir_assignment *);
+   virtual ir_visitor_status visit_enter(ir_call *);
+   virtual ir_visitor_status visit_enter(ir_dereference_array *);
+   virtual ir_visitor_status visit_enter(ir_dereference_record *);
+   virtual ir_visitor_status visit_enter(ir_expression *);
+   virtual ir_visitor_status visit_enter(ir_if *);
+   virtual ir_visitor_status visit_enter(ir_return *);
+   virtual ir_visitor_status visit_enter(ir_swizzle *);
+   virtual ir_visitor_status visit_enter(ir_texture *);
+};
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/ir_uniform.h b/icd/intel/compiler/mesa-utils/src/glsl/ir_uniform.h
new file mode 100644
index 0000000..3508509
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/ir_uniform.h
@@ -0,0 +1,193 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef IR_UNIFORM_H
+#define IR_UNIFORM_H
+
+
+/* stdbool.h is necessary because this file is included in both C and C++ code.
+ */
+#include <stdbool.h>
+
+#include "program/prog_parameter.h"  /* For union gl_constant_value. */
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum gl_uniform_driver_format {
+   uniform_native = 0,          /**< Store data in the native format. */
+   uniform_int_float,           /**< Store integer data as floats. */
+   uniform_bool_float,          /**< Store boolean data as floats. */
+
+   /**
+    * Store boolean data as integer using 1 for \c true.
+    */
+   uniform_bool_int_0_1,
+
+   /**
+    * Store boolean data as integer using ~0 for \c true.
+    */
+   uniform_bool_int_0_not0
+};
+
+struct gl_uniform_driver_storage {
+   /**
+    * Number of bytes from one array element to the next.
+    */
+   uint8_t element_stride;
+
+   /**
+    * Number of bytes from one vector in a matrix to the next.
+    */
+   uint8_t vector_stride;
+
+   /**
+    * Base format of the stored data.
+    *
+    * This field must have a value from \c GLSL_TYPE_UINT through \c
+    * GLSL_TYPE_SAMPLER.
+    */
+   uint8_t format;
+
+   /**
+    * Pointer to the base of the data.
+    */
+   void *data;
+};
+
+struct gl_opaque_uniform_index {
+   /**
+    * Base opaque uniform index
+    *
+    * If \c gl_uniform_storage::base_type is an opaque type, this
+    * represents its uniform index.  If \c
+    * gl_uniform_storage::array_elements is not zero, the array will
+    * use opaque uniform indices \c index through \c index + \c
+    * gl_uniform_storage::array_elements - 1, inclusive.
+    *
+    * Note that the index may be different in each shader stage.
+    */
+   uint8_t index;
+
+   /**
+    * Whether this opaque uniform is used in this shader stage.
+    */
+   bool active;
+};
+
+struct gl_uniform_storage {
+   char *name;
+   /** Type of this uniform data stored.
+    *
+    * In the case of an array, it's the type of a single array element.
+    */
+   const struct glsl_type *type;
+
+   /**
+    * The number of elements in this uniform.
+    *
+    * For non-arrays, this is always 0.  For arrays, the value is the size of
+    * the array.
+    */
+   unsigned array_elements;
+
+   /**
+    * Has this uniform ever been set?
+    */
+   bool initialized;
+
+   struct gl_opaque_uniform_index sampler[MESA_SHADER_STAGES];
+
+   struct gl_opaque_uniform_index image[MESA_SHADER_STAGES];
+
+   /**
+    * Storage used by the driver for the uniform
+    */
+   unsigned num_driver_storage;
+   struct gl_uniform_driver_storage *driver_storage;
+
+   /**
+    * Storage used by Mesa for the uniform
+    *
+    * This form of the uniform is used by Mesa's implementation of \c
+    * glGetUniform.  It can also be used by drivers to obtain the value of the
+    * uniform if the \c ::driver_storage interface is not used.
+    */
+   union gl_constant_value *storage;
+
+   /** Fields for GL_ARB_uniform_buffer_object
+    * @{
+    */
+
+   /**
+    * GL_UNIFORM_BLOCK_INDEX: index of the uniform block containing
+    * the uniform, or -1 for the default uniform block.  Note that the
+    * index is into the linked program's UniformBlocks[] array, not
+    * the linked shader's.
+    */
+   int block_index;
+
+   /** GL_UNIFORM_OFFSET: byte offset within the uniform block, or -1. */
+   int offset;
+
+   /**
+    * GL_UNIFORM_MATRIX_STRIDE: byte stride between columns or rows of
+    * a matrix.  Set to 0 for non-matrices in UBOs, or -1 for uniforms
+    * in the default uniform block.
+    */
+   int matrix_stride;
+
+   /**
+    * GL_UNIFORM_ARRAY_STRIDE: byte stride between elements of the
+    * array.  Set to zero for non-arrays in UBOs, or -1 for uniforms
+    * in the default uniform block.
+    */
+   int array_stride;
+
+   /** GL_UNIFORM_ROW_MAJOR: true iff it's a row-major matrix in a UBO */
+   bool row_major;
+
+   /** @} */
+
+   /**
+    * Index within gl_shader_program::AtomicBuffers[] of the atomic
+    * counter buffer this uniform is stored in, or -1 if this is not
+    * an atomic counter.
+    */
+   int atomic_buffer_index;
+
+   /**
+    * The 'base location' for this uniform in the uniform remap table. For
+    * arrays this is the first element in the array.
+    */
+   unsigned remap_location;
+};
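+
+/* Illustrative note (not part of the original header): for a uniform stored
+ * in a uniform block, a driver would typically locate array element i,
+ * matrix column c at
+ *
+ *    block_base + offset + i * array_stride + c * matrix_stride
+ *
+ * (for a column-major matrix; when row_major is set, matrix_stride is the
+ * stride between rows instead).
+ */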
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* IR_UNIFORM_H */
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/ir_visitor.h b/icd/intel/compiler/mesa-utils/src/glsl/ir_visitor.h
new file mode 100644
index 0000000..40f96ff
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/ir_visitor.h
@@ -0,0 +1,91 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef IR_VISITOR_H
+#define IR_VISITOR_H
+
+#ifdef __cplusplus
+/**
+ * Abstract base class of visitors of IR instruction trees
+ */
+class ir_visitor {
+public:
+   virtual ~ir_visitor()
+   {
+      /* empty */
+   }
+
+   /**
+    * \name Visit methods
+    *
+    * As typical for the visitor pattern, there must be one \c visit method for
+    * each concrete subclass of \c ir_instruction.  Virtual base classes within
+    * the hierarchy should not have \c visit methods.
+    */
+   /*@{*/
+   virtual void visit(class ir_rvalue *) { assert(!"unhandled error_type"); }
+   virtual void visit(class ir_variable *) = 0;
+   virtual void visit(class ir_function_signature *) = 0;
+   virtual void visit(class ir_function *) = 0;
+   virtual void visit(class ir_expression *) = 0;
+   virtual void visit(class ir_texture *) = 0;
+   virtual void visit(class ir_swizzle *) = 0;
+   virtual void visit(class ir_dereference_variable *) = 0;
+   virtual void visit(class ir_dereference_array *) = 0;
+   virtual void visit(class ir_dereference_record *) = 0;
+   virtual void visit(class ir_assignment *) = 0;
+   virtual void visit(class ir_constant *) = 0;
+   virtual void visit(class ir_call *) = 0;
+   virtual void visit(class ir_return *) = 0;
+   virtual void visit(class ir_discard *) = 0;
+   virtual void visit(class ir_if *) = 0;
+   virtual void visit(class ir_loop *) = 0;
+   virtual void visit(class ir_loop_jump *) = 0;
+   virtual void visit(class ir_emit_vertex *) = 0;
+   virtual void visit(class ir_end_primitive *) = 0;
+   /*@}*/
+};
+
+/* NOTE: a function call may never return because of a discard inside it.
+ * This is usually not an issue, but keep it in mind when it matters.
+ */
+class ir_control_flow_visitor : public ir_visitor {
+public:
+   virtual void visit(class ir_variable *) {}
+   virtual void visit(class ir_expression *) {}
+   virtual void visit(class ir_texture *) {}
+   virtual void visit(class ir_swizzle *) {}
+   virtual void visit(class ir_dereference_variable *) {}
+   virtual void visit(class ir_dereference_array *) {}
+   virtual void visit(class ir_dereference_record *) {}
+   virtual void visit(class ir_assignment *) {}
+   virtual void visit(class ir_constant *) {}
+   virtual void visit(class ir_call *) {}
+   virtual void visit(class ir_emit_vertex *) {}
+   virtual void visit(class ir_end_primitive *) {}
+};
+#endif /* __cplusplus */
+
+#endif /* IR_VISITOR_H */
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/memory_map.h b/icd/intel/compiler/mesa-utils/src/glsl/memory_map.h
new file mode 100644
index 0000000..fc13134
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/memory_map.h
@@ -0,0 +1,237 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef MEMORY_MAP_H
+#define MEMORY_MAP_H
+
+#include <fcntl.h>
+#include <unistd.h>
+
+#ifdef _POSIX_MAPPED_FILES
+#include <sys/mman.h>
+#include <sys/stat.h>
+#endif
+
+#include <stdint.h>
+#include <string.h>
+#include "ralloc.h"
+
+#ifdef __cplusplus
+
+/**
+ * Helper class to read data
+ *
+ * The class can read either from user-supplied memory or from a file.  On
+ * Linux, file reading wraps the POSIX functions for mapping a file into the
+ * process's address space; other operating systems may need a different
+ * implementation.
+ */
+class memory_map
+{
+public:
+   memory_map() :
+      error(false),
+      mode(memory_map::READ_MEM),
+      cache_size(0),
+      cache_mmap(NULL),
+      cache_mmap_p(NULL)
+   {
+      mem_ctx = ralloc_context(NULL);
+   }
+
+   /* read from disk */
+   int map(const char *path)
+   {
+#ifdef _POSIX_MAPPED_FILES
+      struct stat stat_info;
+      if (stat(path, &stat_info) != 0)
+         return -1;
+
+      mode = memory_map::READ_MAP;
+      cache_size = stat_info.st_size;
+
+      int fd = open(path, O_RDONLY);
+      /* open() returns -1 on failure; 0 is a valid descriptor */
+      if (fd >= 0) {
+         cache_mmap_p = cache_mmap = (char *)
+            mmap(NULL, cache_size, PROT_READ, MAP_PRIVATE, fd, 0);
+         close(fd);
+         return (cache_mmap == MAP_FAILED) ? -1 : 0;
+      }
+#else
+      /* Implementation for systems without mmap(). */
+      FILE *in = fopen(path, "r");
+      if (in) {
+         fseek(in, 0, SEEK_END);
+         cache_size = ftell(in);
+         rewind(in);
+
+         cache_mmap = ralloc_array(mem_ctx, char, cache_size);
+
+         if (!cache_mmap)
+            return -1;
+
+         if (fread(cache_mmap, cache_size, 1, in) != 1) {
+            ralloc_free(cache_mmap);
+            cache_mmap = NULL;
+         }
+         cache_mmap_p = cache_mmap;
+         fclose(in);
+
+         return (cache_mmap == NULL) ? -1 : 0;
+      }
+#endif
+      return -1;
+   }
+
+   /* read from memory */
+   void map(const void *memory, size_t size)
+   {
+      cache_mmap_p = cache_mmap = (char *) memory;
+      cache_size = size;
+   }
+
+   ~memory_map() {
+#ifdef _POSIX_MAPPED_FILES
+      if (cache_mmap && mode == READ_MAP) {
+         munmap(cache_mmap, cache_size);
+      }
+#endif
+      ralloc_free(mem_ctx);
+   }
+
+   /* move read pointer forward */
+   inline void ffwd(int len)
+   {
+      cache_mmap_p += len;
+   }
+
+   inline void jump(unsigned pos)
+   {
+      cache_mmap_p = cache_mmap + pos;
+   }
+
+   /**
+    * Safety check to avoid reading past cache_size;
+    * returns true if it is safe to continue reading.
+    */
+   bool safe_read(unsigned size)
+   {
+      if (position() + size > cache_size)
+         error = true;
+      return !error;
+   }
+
+   /* position of read pointer */
+   inline uint32_t position()
+   {
+      return cache_mmap_p - cache_mmap;
+   }
+
+   inline char *read_string()
+   {
+      uint32_t len = read_uint32_t();
+
+      /* NULL pointer is supported */
+      if (len == 0)
+         return NULL;
+
+      /* don't read off the end of cache */
+      /* TODO: Understand how this can happen and fix */
+      if (len + position() > cache_size) {
+         error = true;
+         return NULL;
+      }
+
+      /* verify that last character is terminator */
+      if (*(cache_mmap_p + len - 1) != '\0') {
+         error = true;
+         return NULL;
+      }
+
+      char *str = ralloc_array(mem_ctx, char, len);
+      memcpy(str, cache_mmap_p, len);
+      ffwd(len);
+      return str;
+   }
+
+/**
+ * read functions per type
+ */
+#define DECL_READER(type) type read_ ##type () {\
+   if (!safe_read(sizeof(type)))\
+      return 0;\
+   ffwd(sizeof(type));\
+   return *(type *) (cache_mmap_p - sizeof(type));\
+}
+
+   DECL_READER(int32_t);
+   DECL_READER(int64_t);
+   DECL_READER(uint8_t);
+   DECL_READER(uint32_t);
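+
+   /* For example, DECL_READER(uint32_t) above expands to:
+    *
+    *    uint32_t read_uint32_t() {
+    *       if (!safe_read(sizeof(uint32_t)))
+    *          return 0;
+    *       ffwd(sizeof(uint32_t));
+    *       return *(uint32_t *) (cache_mmap_p - sizeof(uint32_t));
+    *    }
+    */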
+
+   inline uint8_t read_bool()
+   {
+      return read_uint8_t();
+   }
+
+   inline void read(void *dst, size_t size)
+   {
+      if (!safe_read(size))
+         return;
+      memcpy(dst, cache_mmap_p, size);
+      ffwd(size);
+   }
+
+   /* total size of mapped memory */
+   inline int32_t size()
+   {
+      return cache_size;
+   }
+
+   inline bool errors()
+   {
+      return error;
+   }
+
+private:
+
+   void *mem_ctx;
+
+   /* if errors have occurred during reading */
+   bool error;
+
+   /* specifies if we are reading mapped memory or user passed mem */
+   enum read_mode {
+      READ_MEM = 0,
+      READ_MAP
+   };
+
+   int32_t mode;
+   unsigned cache_size;
+   char *cache_mmap;
+   char *cache_mmap_p;
+};
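+
+/* Illustrative sketch (not part of the original header): reading back data
+ * serialized by memory_writer (see memory_writer.h).  The path below is
+ * hypothetical.
+ *
+ *    memory_map map;
+ *    if (map.map("/tmp/shader.cache") == 0) {
+ *       char *name = map.read_string();
+ *       uint32_t count = map.read_uint32_t();
+ *       if (!map.errors()) {
+ *          // consume name and count ...
+ *       }
+ *    }
+ */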
+#endif /* ifdef __cplusplus */
+
+#endif /* MEMORY_MAP_H */
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/memory_writer.h b/icd/intel/compiler/mesa-utils/src/glsl/memory_writer.h
new file mode 100644
index 0000000..f98d118
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/memory_writer.h
@@ -0,0 +1,204 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef MEMORY_WRITER_H
+#define MEMORY_WRITER_H
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "main/hash_table.h"
+
+#ifdef __cplusplus
+/**
+ * Helper class for writing data to memory
+ *
+ * This class maintains a dynamically-sized memory buffer and allows
+ * for data to be efficiently appended to it with automatic resizing.
+ */
+class memory_writer
+{
+public:
+   memory_writer() :
+      memory(NULL),
+      curr_size(0),
+      pos(0),
+      unique_id_counter(0)
+   {
+      data_hash = _mesa_hash_table_create(0, int_equal);
+      hash_value = _mesa_hash_data(this, sizeof(memory_writer));
+   }
+
+   ~memory_writer()
+   {
+      free(memory);
+      _mesa_hash_table_destroy(data_hash, NULL);
+   }
+
+   /* user wants to claim the memory */
+   char *release_memory(size_t *size)
+   {
+      /* final realloc to free allocated but unused memory */
+      char *result = (char *) realloc(memory, pos);
+      *size = pos;
+      memory = NULL;
+      curr_size = 0;
+      pos = 0;
+      return result;
+   }
+
+/**
+ * write functions per type
+ */
+#define DECL_WRITER(type) void write_ ##type (const type data) {\
+   write(&data, sizeof(type));\
+}
+
+   DECL_WRITER(int32_t);
+   DECL_WRITER(int64_t);
+   DECL_WRITER(uint8_t);
+   DECL_WRITER(uint32_t);
+
+   void write_bool(bool data)
+   {
+      uint8_t val = data;
+      write_uint8_t(val);
+   }
+
+   /* write function that reallocates more memory if required */
+   void write(const void *data, int size)
+   {
+      if (!memory || pos > (curr_size - size)) {
+         if (!grow(size)) {
+            assert(!"Out of memory while serializing a shader");
+            return;
+         }
+      }
+
+      memcpy(memory + pos, data, size);
+
+      pos += size;
+   }
+
+   void overwrite(const void *data, int size, int offset)
+   {
+      if (offset < 0 || offset + size > pos) {
+         assert(!"Attempt to write out of bounds while serializing a shader");
+         return;
+      }
+
+      memcpy(memory + offset, data, size);
+   }
+
+   /* The length is written to make reading safe; we write len + 1 to be
+    * able to distinguish between "" and NULL.
+    */
+   void write_string(const char *str)
+   {
+      uint32_t len = str ? strlen(str) + 1 : 0;
+      write_uint32_t(len);
+
+      /* serialize the string plus its terminator for more convenient parsing. */
+      if (str)
+         write(str, len);
+   }
+
+   unsigned position()
+   {
+      return pos;
+   }
+
+   /**
+    * Convert the given pointer into a small integer unique ID.  In other
+    * words, if make_unique_id() has previously been called with this pointer,
+    * return the same ID that was returned last time.  If this is the first
+    * call to make_unique_id() with this pointer, return a fresh ID.
+    *
+    * Return value is true if the pointer has been seen before, false
+    * otherwise.
+    */
+   bool make_unique_id(const void *ptr, uint32_t *id_out)
+   {
+      hash_entry *entry =
+         _mesa_hash_table_search(data_hash, _mesa_hash_pointer(ptr), ptr);
+      if (entry != NULL) {
+         *id_out = (uint32_t) (intptr_t) entry->data;
+         return true;
+      } else {
+         /* Note: hashtable uses 0 to represent "entry not found" so our
+          * unique ID's need to start at 1.  Hence, preincrement
+          * unique_id_counter.
+          */
+         *id_out = ++this->unique_id_counter;
+         _mesa_hash_table_insert(data_hash, _mesa_hash_pointer(ptr), ptr,
+                                 (void *) (intptr_t) *id_out);
+         return false;
+      }
+   }
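+
+   /* Illustrative sketch (not part of the original header): typical use when
+    * serializing a pointer graph, so shared data is written in full only
+    * once.
+    *
+    *    uint32_t id;
+    *    if (writer.make_unique_id(type, &id)) {
+    *       writer.write_uint32_t(id);   // seen before: back-reference only
+    *    } else {
+    *       writer.write_uint32_t(id);   // fresh id ...
+    *       // ... followed by the full serialized data
+    *    }
+    */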
+
+private:
+
+   /* reallocate more memory */
+   bool grow(int size)
+   {
+      unsigned new_size = 2 * (curr_size + size);
+      char *more_mem = (char *) realloc(memory, new_size);
+      if (more_mem == NULL) {
+         free(memory);
+         memory = NULL;
+         return false;
+      } else {
+         memory = more_mem;
+         curr_size = new_size;
+         return true;
+      }
+   }
+
+   /* allocated memory */
+   char *memory;
+
+   /* current size of the whole allocation */
+   int curr_size;
+
+   /* write position / size of the data written */
+   int pos;
+
+   /* this hash can be used to refer to data already written
+    * to skip sequential writes of the same data
+    */
+   struct hash_table *data_hash;
+   uint32_t hash_value;
+   unsigned unique_id_counter;
+
+   static bool int_equal(const void *a, const void *b)
+   {
+      return a == b;
+   }
+
+};
+
+#endif /* ifdef __cplusplus */
+
+#endif /* MEMORY_WRITER_H */
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/ralloc.c b/icd/intel/compiler/mesa-utils/src/glsl/ralloc.c
new file mode 100644
index 0000000..36bc61f
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/ralloc.c
@@ -0,0 +1,492 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <assert.h>
+#include <stdlib.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdint.h>
+
+/* Android defines SIZE_MAX in limits.h, instead of the standard stdint.h */
+#ifdef ANDROID
+#include <limits.h>
+#endif
+
+/* Some versions of MinGW are missing _vscprintf's declaration, although they
+ * still provide the symbol in the import library. */
+#ifdef __MINGW32__
+_CRTIMP int _vscprintf(const char *format, va_list argptr);
+#endif
+
+#include "ralloc.h"
+
+#ifndef va_copy
+#ifdef __va_copy
+#define va_copy(dest, src) __va_copy((dest), (src))
+#else
+#define va_copy(dest, src) (dest) = (src)
+#endif
+#endif
+
+#define CANARY 0x5A1106
+
+struct ralloc_header
+{
+#ifdef DEBUG
+   /* A canary value used to determine whether a pointer is ralloc'd. */
+   unsigned canary;
+#endif
+
+   struct ralloc_header *parent;
+
+   /* The first child (head of a linked list) */
+   struct ralloc_header *child;
+
+   /* Linked list of siblings */
+   struct ralloc_header *prev;
+   struct ralloc_header *next;
+
+   void (*destructor)(void *);
+};
+
+typedef struct ralloc_header ralloc_header;
+
+static void unlink_block(ralloc_header *info);
+static void unsafe_free(ralloc_header *info);
+
+static ralloc_header *
+get_header(const void *ptr)
+{
+   ralloc_header *info = (ralloc_header *) (((char *) ptr) -
+					    sizeof(ralloc_header));
+#ifdef DEBUG
+   assert(info->canary == CANARY);
+#endif
+   return info;
+}
+
+#define PTR_FROM_HEADER(info) (((char *) info) + sizeof(ralloc_header))
+
+static void
+add_child(ralloc_header *parent, ralloc_header *info)
+{
+   if (parent != NULL) {
+      info->parent = parent;
+      info->next = parent->child;
+      parent->child = info;
+
+      if (info->next != NULL)
+	 info->next->prev = info;
+   }
+}
+
+void *
+ralloc_context(const void *ctx)
+{
+   return ralloc_size(ctx, 0);
+}
+
+void *
+ralloc_size(const void *ctx, size_t size)
+{
+   void *block = calloc(1, size + sizeof(ralloc_header));
+   ralloc_header *info;
+   ralloc_header *parent;
+
+   if (unlikely(block == NULL))
+      return NULL;
+   info = (ralloc_header *) block;
+   parent = ctx != NULL ? get_header(ctx) : NULL;
+
+   add_child(parent, info);
+
+#ifdef DEBUG
+   info->canary = CANARY;
+#endif
+
+   return PTR_FROM_HEADER(info);
+}
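+
+/* A minimal usage sketch (illustration only, not part of this file): every
+ * allocation is chained to a parent, so freeing the root frees the whole
+ * tree.  The struct name is hypothetical.
+ *
+ *    void *ctx = ralloc_context(NULL);
+ *    struct node *n = ralloc(ctx, struct node);
+ *    char *name = ralloc_strdup(n, "leaf");   // chained under n
+ *    ralloc_free(ctx);                        // frees ctx, n, and name
+ */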
+
+void *
+rzalloc_size(const void *ctx, size_t size)
+{
+   void *ptr = ralloc_size(ctx, size);
+   if (likely(ptr != NULL))
+      memset(ptr, 0, size);
+   return ptr;
+}
+
+/* helper function - assumes ptr != NULL */
+static void *
+resize(void *ptr, size_t size)
+{
+   ralloc_header *child, *old, *info;
+
+   old = get_header(ptr);
+   info = realloc(old, size + sizeof(ralloc_header));
+
+   if (info == NULL)
+      return NULL;
+
+   /* Update parent and sibling's links to the reallocated node. */
+   if (info != old && info->parent != NULL) {
+      if (info->parent->child == old)
+	 info->parent->child = info;
+
+      if (info->prev != NULL)
+	 info->prev->next = info;
+
+      if (info->next != NULL)
+	 info->next->prev = info;
+   }
+
+   /* Update child->parent links for all children */
+   for (child = info->child; child != NULL; child = child->next)
+      child->parent = info;
+
+   return PTR_FROM_HEADER(info);
+}
+
+void *
+reralloc_size(const void *ctx, void *ptr, size_t size)
+{
+   if (unlikely(ptr == NULL))
+      return ralloc_size(ctx, size);
+
+   assert(ralloc_parent(ptr) == ctx);
+   return resize(ptr, size);
+}
+
+void *
+ralloc_array_size(const void *ctx, size_t size, unsigned count)
+{
+   if (count > SIZE_MAX/size)
+      return NULL;
+
+   return ralloc_size(ctx, size * count);
+}
+
+void *
+rzalloc_array_size(const void *ctx, size_t size, unsigned count)
+{
+   if (count > SIZE_MAX/size)
+      return NULL;
+
+   return rzalloc_size(ctx, size * count);
+}
+
+void *
+reralloc_array_size(const void *ctx, void *ptr, size_t size, unsigned count)
+{
+   if (count > SIZE_MAX/size)
+      return NULL;
+
+   return reralloc_size(ctx, ptr, size * count);
+}
+
+void
+ralloc_free(void *ptr)
+{
+   ralloc_header *info;
+
+   if (ptr == NULL)
+      return;
+
+   info = get_header(ptr);
+   unlink_block(info);
+   unsafe_free(info);
+}
+
+static void
+unlink_block(ralloc_header *info)
+{
+   /* Unlink from parent & siblings */
+   if (info->parent != NULL) {
+      if (info->parent->child == info)
+	 info->parent->child = info->next;
+
+      if (info->prev != NULL)
+	 info->prev->next = info->next;
+
+      if (info->next != NULL)
+	 info->next->prev = info->prev;
+   }
+   info->parent = NULL;
+   info->prev = NULL;
+   info->next = NULL;
+}
+
+static void
+unsafe_free(ralloc_header *info)
+{
+   /* Recursively free any children...don't waste time unlinking them. */
+   ralloc_header *temp;
+   while (info->child != NULL) {
+      temp = info->child;
+      info->child = temp->next;
+      unsafe_free(temp);
+   }
+
+   /* Free the block itself.  Call the destructor first, if any. */
+   if (info->destructor != NULL)
+      info->destructor(PTR_FROM_HEADER(info));
+
+   free(info);
+}
+
+void
+ralloc_steal(const void *new_ctx, void *ptr)
+{
+   ralloc_header *info, *parent;
+
+   if (unlikely(ptr == NULL))
+      return;
+
+   info = get_header(ptr);
+   parent = get_header(new_ctx);
+
+   unlink_block(info);
+
+   add_child(parent, info);
+}
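+
+/* Sketch of the common "build on a throwaway context" pattern (illustration
+ * only): allocate under a temporary parent, then reparent the result on
+ * success so a single ralloc_free() cleans up every failure path.
+ * `compile_ok` and `permanent_ctx` are hypothetical.
+ *
+ *    void *temp = ralloc_context(NULL);
+ *    struct ir *result = ralloc(temp, struct ir);
+ *    if (compile_ok)
+ *       ralloc_steal(permanent_ctx, result);
+ *    ralloc_free(temp);                  // result survives if stolen
+ */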
+
+void *
+ralloc_parent(const void *ptr)
+{
+   ralloc_header *info;
+
+   if (unlikely(ptr == NULL))
+      return NULL;
+
+   info = get_header(ptr);
+   return info->parent ? PTR_FROM_HEADER(info->parent) : NULL;
+}
+
+static void *autofree_context = NULL;
+
+static void
+autofree(void)
+{
+   ralloc_free(autofree_context);
+}
+
+void *
+ralloc_autofree_context(void)
+{
+   if (unlikely(autofree_context == NULL)) {
+      autofree_context = ralloc_context(NULL);
+      atexit(autofree);
+   }
+   return autofree_context;
+}
+
+void
+ralloc_set_destructor(const void *ptr, void(*destructor)(void *))
+{
+   ralloc_header *info = get_header(ptr);
+   info->destructor = destructor;
+}
+
+char *
+ralloc_strdup(const void *ctx, const char *str)
+{
+   size_t n;
+   char *ptr;
+
+   if (unlikely(str == NULL))
+      return NULL;
+
+   n = strlen(str);
+   ptr = ralloc_array(ctx, char, n + 1);
+   memcpy(ptr, str, n);
+   ptr[n] = '\0';
+   return ptr;
+}
+
+char *
+ralloc_strndup(const void *ctx, const char *str, size_t max)
+{
+   size_t n;
+   char *ptr;
+
+   if (unlikely(str == NULL))
+      return NULL;
+
+   n = strlen(str);
+   if (n > max)
+      n = max;
+
+   ptr = ralloc_array(ctx, char, n + 1);
+   memcpy(ptr, str, n);
+   ptr[n] = '\0';
+   return ptr;
+}
+
+/* helper routine for strcat/strncat - n is the exact amount to copy */
+static bool
+cat(char **dest, const char *str, size_t n)
+{
+   char *both;
+   size_t existing_length;
+   assert(dest != NULL && *dest != NULL);
+
+   existing_length = strlen(*dest);
+   both = resize(*dest, existing_length + n + 1);
+   if (unlikely(both == NULL))
+      return false;
+
+   memcpy(both + existing_length, str, n);
+   both[existing_length + n] = '\0';
+
+   *dest = both;
+   return true;
+}
+
+
+bool
+ralloc_strcat(char **dest, const char *str)
+{
+   return cat(dest, str, strlen(str));
+}
+
+bool
+ralloc_strncat(char **dest, const char *str, size_t n)
+{
+   /* Clamp n to the string length */
+   size_t str_length = strlen(str);
+   if (str_length < n)
+      n = str_length;
+
+   return cat(dest, str, n);
+}
+
+char *
+ralloc_asprintf(const void *ctx, const char *fmt, ...)
+{
+   char *ptr;
+   va_list args;
+   va_start(args, fmt);
+   ptr = ralloc_vasprintf(ctx, fmt, args);
+   va_end(args);
+   return ptr;
+}
+
+/* Return the length of the string that would be generated by a printf-style
+ * format and argument list, not including the \0 byte.
+ */
+static size_t
+printf_length(const char *fmt, va_list untouched_args)
+{
+   int size;
+   char junk;
+
+   /* Make a copy of the va_list so the original caller can still use it */
+   va_list args;
+   va_copy(args, untouched_args);
+
+#ifdef _WIN32
+   /* We need to use _vscprintf to calculate the size as vsnprintf returns -1
+    * if the number of characters to write is greater than count.
+    */
+   size = _vscprintf(fmt, args);
+   (void)junk;
+#else
+   size = vsnprintf(&junk, 1, fmt, args);
+#endif
+   assert(size >= 0);
+
+   va_end(args);
+
+   return size;
+}
+
+char *
+ralloc_vasprintf(const void *ctx, const char *fmt, va_list args)
+{
+   size_t size = printf_length(fmt, args) + 1;
+
+   char *ptr = ralloc_size(ctx, size);
+   if (ptr != NULL)
+      vsnprintf(ptr, size, fmt, args);
+
+   return ptr;
+}
+
+bool
+ralloc_asprintf_append(char **str, const char *fmt, ...)
+{
+   bool success;
+   va_list args;
+   va_start(args, fmt);
+   success = ralloc_vasprintf_append(str, fmt, args);
+   va_end(args);
+   return success;
+}
+
+bool
+ralloc_vasprintf_append(char **str, const char *fmt, va_list args)
+{
+   size_t existing_length;
+   assert(str != NULL);
+   existing_length = *str ? strlen(*str) : 0;
+   return ralloc_vasprintf_rewrite_tail(str, &existing_length, fmt, args);
+}
+
+bool
+ralloc_asprintf_rewrite_tail(char **str, size_t *start, const char *fmt, ...)
+{
+   bool success;
+   va_list args;
+   va_start(args, fmt);
+   success = ralloc_vasprintf_rewrite_tail(str, start, fmt, args);
+   va_end(args);
+   return success;
+}
+
+bool
+ralloc_vasprintf_rewrite_tail(char **str, size_t *start, const char *fmt,
+			      va_list args)
+{
+   size_t new_length;
+   char *ptr;
+
+   assert(str != NULL);
+
+   if (unlikely(*str == NULL)) {
+      /* Allocating with a NULL context here is questionable, but it is the
+       * expected behavior.
+       */
+      *str = ralloc_vasprintf(NULL, fmt, args);
+      return true;
+   }
+
+   new_length = printf_length(fmt, args);
+
+   ptr = resize(*str, *start + new_length + 1);
+   if (unlikely(ptr == NULL))
+      return false;
+
+   vsnprintf(ptr + *start, new_length + 1, fmt, args);
+   *str = ptr;
+   *start += new_length;
+   return true;
+}
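+
+/* Sketch of incremental string building with the append helpers
+ * (illustration only; `n` and `lines` are hypothetical):
+ *
+ *    char *msg = ralloc_strdup(ctx, "errors:");
+ *    for (unsigned i = 0; i < n; i++)
+ *       ralloc_asprintf_append(&msg, " line %u", lines[i]);
+ *    // msg is reallocated as needed and stays chained to ctx
+ */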
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/ralloc.h b/icd/intel/compiler/mesa-utils/src/glsl/ralloc.h
new file mode 100644
index 0000000..4581a7a
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/ralloc.h
@@ -0,0 +1,445 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ralloc.h
+ *
+ * ralloc: a recursive memory allocator
+ *
+ * The ralloc memory allocator creates a hierarchy of allocated
+ * objects. Every allocation is in reference to some parent, and
+ * every allocated object can in turn be used as the parent of a
+ * subsequent allocation. This allows for extremely convenient
+ * discarding of an entire tree/sub-tree of allocations by calling
+ * ralloc_free on any particular object to free it and all of its
+ * children.
+ *
+ * The conceptual working of ralloc was directly inspired by Andrew
+ * Tridgell's talloc, but ralloc is an independent implementation
+ * released under the MIT license and tuned for Mesa.
+ *
+ * The talloc implementation is available under the GNU Lesser
+ * General Public License (GNU LGPL), version 3 or later. It is
+ * more sophisticated than ralloc in that it includes reference
+ * counting and debugging features. See: http://talloc.samba.org/
+ */
+
+#ifndef RALLOC_H
+#define RALLOC_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stddef.h>
+#include <stdarg.h>
+#include <stdbool.h>
+#include "main/compiler.h"
+
+/**
+ * \def ralloc(ctx, type)
+ * Allocate a new object chained off of the given context.
+ *
+ * This is equivalent to:
+ * \code
+ * ((type *) ralloc_size(ctx, sizeof(type)))
+ * \endcode
+ */
+#define ralloc(ctx, type)  ((type *) ralloc_size(ctx, sizeof(type)))
+
+/**
+ * \def rzalloc(ctx, type)
+ * Allocate a new object out of the given context and initialize it to zero.
+ *
+ * This is equivalent to:
+ * \code
+ * ((type *) rzalloc_size(ctx, sizeof(type)))
+ * \endcode
+ */
+#define rzalloc(ctx, type) ((type *) rzalloc_size(ctx, sizeof(type)))
+
+/**
+ * Allocate a new ralloc context.
+ *
+ * While any ralloc'd pointer can be used as a context, sometimes it is useful
+ * to simply allocate a context with no associated memory.
+ *
+ * It is equivalent to:
+ * \code
+ * ralloc_size(ctx, 0)
+ * \endcode
+ */
+void *ralloc_context(const void *ctx);
+
+/**
+ * Allocate memory chained off of the given context.
+ *
+ * This is the core allocation routine which is used by all others.  It
+ * simply allocates storage for \p size bytes and returns the pointer,
+ * similar to \c malloc.
+ */
+void *ralloc_size(const void *ctx, size_t size);
+
+/**
+ * Allocate zero-initialized memory chained off of the given context.
+ *
+ * This is similar to \c calloc with a size of 1.
+ */
+void *rzalloc_size(const void *ctx, size_t size);
+
+/**
+ * Resize a piece of ralloc-managed memory, preserving data.
+ *
+ * Similar to \c realloc.  Unlike C89, passing 0 for \p size does not free the
+ * memory.  Instead, it resizes it to a 0-byte ralloc context, just like
+ * calling ralloc_size(ctx, 0).  This is different from talloc.
+ *
+ * \param ctx  The context to use for new allocation.  If \p ptr != NULL,
+ *             it must be the same as ralloc_parent(\p ptr).
+ * \param ptr  Pointer to the memory to be resized.  May be NULL.
+ * \param size The amount of memory to allocate, in bytes.
+ */
+void *reralloc_size(const void *ctx, void *ptr, size_t size);
+
+/// \defgroup array Array Allocators @{
+
+/**
+ * \def ralloc_array(ctx, type, count)
+ * Allocate an array of objects chained off the given context.
+ *
+ * Similar to \c calloc, but does not initialize the memory to zero.
+ *
+ * More than a convenience function, this also checks for integer overflow when
+ * multiplying \c sizeof(type) and \p count.  This is necessary for security.
+ *
+ * This is equivalent to:
+ * \code
+ * ((type *) ralloc_array_size(ctx, sizeof(type), count))
+ * \endcode
+ */
+#define ralloc_array(ctx, type, count) \
+   ((type *) ralloc_array_size(ctx, sizeof(type), count))
+
+/**
+ * \def rzalloc_array(ctx, type, count)
+ * Allocate a zero-initialized array chained off the given context.
+ *
+ * Similar to \c calloc.
+ *
+ * More than a convenience function, this also checks for integer overflow when
+ * multiplying \c sizeof(type) and \p count.  This is necessary for security.
+ *
+ * This is equivalent to:
+ * \code
+ * ((type *) rzalloc_array_size(ctx, sizeof(type), count))
+ * \endcode
+ */
+#define rzalloc_array(ctx, type, count) \
+   ((type *) rzalloc_array_size(ctx, sizeof(type), count))
+
+/**
+ * \def reralloc(ctx, ptr, type, count)
+ * Resize a ralloc-managed array, preserving data.
+ *
+ * Similar to \c realloc.  Unlike C89, passing 0 for \p count does not free the
+ * memory.  Instead, it resizes it to a 0-byte ralloc context, just like
+ * calling ralloc_size(ctx, 0).  This is different from talloc.
+ *
+ * More than a convenience function, this also checks for integer overflow when
+ * multiplying \c sizeof(type) and \p count.  This is necessary for security.
+ *
+ * \param ctx   The context to use for new allocation.  If \p ptr != NULL,
+ *              it must be the same as ralloc_parent(\p ptr).
+ * \param ptr   Pointer to the array to be resized.  May be NULL.
+ * \param type  The element type.
+ * \param count The number of elements to allocate.
+ */
+#define reralloc(ctx, ptr, type, count) \
+   ((type *) reralloc_array_size(ctx, ptr, sizeof(type), count))
+
+/**
+ * Allocate memory for an array chained off the given context.
+ *
+ * Similar to \c calloc, but does not initialize the memory to zero.
+ *
+ * More than a convenience function, this also checks for integer overflow when
+ * multiplying \p size and \p count.  This is necessary for security.
+ */
+void *ralloc_array_size(const void *ctx, size_t size, unsigned count);
+
+/**
+ * Allocate a zero-initialized array chained off the given context.
+ *
+ * Similar to \c calloc.
+ *
+ * More than a convenience function, this also checks for integer overflow when
+ * multiplying \p size and \p count.  This is necessary for security.
+ */
+void *rzalloc_array_size(const void *ctx, size_t size, unsigned count);
+
+/**
+ * Resize a ralloc-managed array, preserving data.
+ *
+ * Similar to \c realloc.  Unlike C89, passing 0 for \p size does not free the
+ * memory.  Instead, it resizes it to a 0-byte ralloc context, just like
+ * calling ralloc_size(ctx, 0).  This is different from talloc.
+ *
+ * More than a convenience function, this also checks for integer overflow when
+ * multiplying \p size and \p count.  This is necessary for security.
+ *
+ * \param ctx   The context to use for new allocation.  If \p ptr != NULL,
+ *              it must be the same as ralloc_parent(\p ptr).
+ * \param ptr   Pointer to the array to be resized.  May be NULL.
+ * \param size  The size of an individual element.
+ * \param count The number of elements to allocate.
+ *
+ * \return The resized array, or NULL if allocation failed.
+ */
+void *reralloc_array_size(const void *ctx, void *ptr, size_t size,
+			  unsigned count);
+/// @}
+
+/**
+ * Free a piece of ralloc-managed memory.
+ *
+ * This will also free the memory of any children allocated to this context.
+ */
+void ralloc_free(void *ptr);
+
+/**
+ * "Steal" memory from one context, changing it to another.
+ *
+ * This changes \p ptr's context to \p new_ctx.  This is quite useful if
+ * memory is allocated out of a temporary context.
+ */
+void ralloc_steal(const void *new_ctx, void *ptr);
+
+/**
+ * Return the given pointer's ralloc context.
+ */
+void *ralloc_parent(const void *ptr);
+
+/**
+ * Return a context whose memory will be automatically freed at program exit.
+ *
+ * The first call to this function creates a context and registers a handler
+ * to free it using \c atexit.  This may cause trouble if used in a library
+ * loaded with \c dlopen.
+ */
+void *ralloc_autofree_context(void);
+
+/**
+ * Set a callback to occur just before an object is freed.
+ */
+void ralloc_set_destructor(const void *ptr, void(*destructor)(void *));
+
+/// \defgroup string String Functions @{
+/**
+ * Duplicate a string, allocating the memory from the given context.
+ */
+char *ralloc_strdup(const void *ctx, const char *str);
+
+/**
+ * Duplicate a string, allocating the memory from the given context.
+ *
+ * Like \c strndup, at most \p n characters are copied.  If \p str is longer
+ * than \p n characters, \p n are copied, and a terminating \c '\0' byte is added.
+ */
+char *ralloc_strndup(const void *ctx, const char *str, size_t n);
+
+/**
+ * Concatenate two strings, allocating the necessary space.
+ *
+ * This appends \p str to \p *dest, similar to \c strcat, using ralloc_resize
+ * to expand \p *dest to the appropriate size.  \p dest will be updated to the
+ * new pointer unless allocation fails.
+ *
+ * The result will always be null-terminated.
+ *
+ * \return True unless allocation failed.
+ */
+bool ralloc_strcat(char **dest, const char *str);
+
+/**
+ * Concatenate two strings, allocating the necessary space.
+ *
+ * This appends at most \p n bytes of \p str to \p *dest, using ralloc_resize
+ * to expand \p *dest to the appropriate size.  \p dest will be updated to the
+ * new pointer unless allocation fails.
+ *
+ * The result will always be null-terminated; \p str does not need to be null
+ * terminated if it is longer than \p n.
+ *
+ * \return True unless allocation failed.
+ */
+bool ralloc_strncat(char **dest, const char *str, size_t n);
+
+/**
+ * Print to a string.
+ *
+ * This is analogous to \c sprintf, but allocates enough space (using \p ctx
+ * as the context) for the resulting string.
+ *
+ * \return The newly allocated string.
+ */
+char *ralloc_asprintf (const void *ctx, const char *fmt, ...) PRINTFLIKE(2, 3);
+
+/**
+ * Print to a string, given a va_list.
+ *
+ * This is analogous to \c vsprintf, but allocates enough space (using \p ctx
+ * as the context) for the resulting string.
+ *
+ * \return The newly allocated string.
+ */
+char *ralloc_vasprintf(const void *ctx, const char *fmt, va_list args);
+
+/**
+ * Rewrite the tail of an existing string, starting at a given index.
+ *
+ * Overwrites the contents of *str starting at \p start with newly formatted
+ * text, including a new null-terminator.  Allocates more memory as necessary.
+ *
+ * This can be used to append formatted text when the length of the existing
+ * string is already known, saving a strlen() call.
+ *
+ * \sa ralloc_asprintf_append
+ *
+ * \param str   The string to be updated.
+ * \param start The index to start appending new data at.
+ * \param fmt   A printf-style formatting string
+ *
+ * \p str will be updated to the new pointer unless allocation fails.
+ * \p start will be increased by the length of the newly formatted text.
+ *
+ * \return True unless allocation failed.
+ */
+bool ralloc_asprintf_rewrite_tail(char **str, size_t *start,
+				  const char *fmt, ...)
+				  PRINTFLIKE(3, 4);
+
+/**
+ * Rewrite the tail of an existing string, starting at a given index.
+ *
+ * Overwrites the contents of *str starting at \p start with newly formatted
+ * text, including a new null-terminator.  Allocates more memory as necessary.
+ *
+ * This can be used to append formatted text when the length of the existing
+ * string is already known, saving a strlen() call.
+ *
+ * \sa ralloc_vasprintf_append
+ *
+ * \param str   The string to be updated.
+ * \param start The index to start appending new data at.
+ * \param fmt   A printf-style formatting string
+ * \param args  A va_list containing the data to be formatted
+ *
+ * \p str will be updated to the new pointer unless allocation fails.
+ * \p start will be increased by the length of the newly formatted text.
+ *
+ * \return True unless allocation failed.
+ */
+bool ralloc_vasprintf_rewrite_tail(char **str, size_t *start, const char *fmt,
+				   va_list args);
+
+/**
+ * Append formatted text to the supplied string.
+ *
+ * This is equivalent to
+ * \code
+ * ralloc_asprintf_rewrite_tail(str, strlen(*str), fmt, ...)
+ * \endcode
+ *
+ * \sa ralloc_asprintf
+ * \sa ralloc_asprintf_rewrite_tail
+ * \sa ralloc_strcat
+ *
+ * \p str will be updated to the new pointer unless allocation fails.
+ *
+ * \return True unless allocation failed.
+ */
+bool ralloc_asprintf_append (char **str, const char *fmt, ...)
+			     PRINTFLIKE(2, 3);
+
+/**
+ * Append formatted text to the supplied string, given a va_list.
+ *
+ * This is equivalent to
+ * \code
+ * ralloc_vasprintf_rewrite_tail(str, strlen(*str), fmt, args)
+ * \endcode
+ *
+ * \sa ralloc_vasprintf
+ * \sa ralloc_vasprintf_rewrite_tail
+ * \sa ralloc_strcat
+ *
+ * \p str will be updated to the new pointer unless allocation fails.
+ *
+ * \return True unless allocation failed.
+ */
+bool ralloc_vasprintf_append(char **str, const char *fmt, va_list args);
+/// @}
+
+#ifdef __cplusplus
+} /* end of extern "C" */
+#endif
+
+/**
+ * Declare C++ new and delete operators which use ralloc.
+ *
+ * Placing this macro in the body of a class makes it possible to do:
+ *
+ * TYPE *var = new(mem_ctx) TYPE(...);
+ * delete var;
+ *
+ * which is more idiomatic in C++ than calling ralloc.
+ */
+#define DECLARE_RALLOC_CXX_OPERATORS(TYPE)                               \
+private:                                                                 \
+   static void _ralloc_destructor(void *p)                               \
+   {                                                                     \
+      reinterpret_cast<TYPE *>(p)->~TYPE();                              \
+   }                                                                     \
+public:                                                                  \
+   static void* operator new(size_t size, void *mem_ctx)                 \
+   {                                                                     \
+      void *p = ralloc_size(mem_ctx, size);                              \
+      assert(p != NULL);                                                 \
+      if (!HAS_TRIVIAL_DESTRUCTOR(TYPE))                                 \
+         ralloc_set_destructor(p, _ralloc_destructor);                   \
+      return p;                                                          \
+   }                                                                     \
+                                                                         \
+   static void operator delete(void *p)                                  \
+   {                                                                     \
+      /* The object's destructor is guaranteed to have already been      \
+       * called by the delete operator at this point -- Make sure it's   \
+       * not called again.                                               \
+       */                                                                \
+      if (!HAS_TRIVIAL_DESTRUCTOR(TYPE))                                 \
+         ralloc_set_destructor(p, NULL);                                 \
+      ralloc_free(p);                                                    \
+   }
+
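+/* A minimal usage sketch (illustration only): the macro gives a class
+ * ralloc-backed new/delete, so instances join the context hierarchy.  The
+ * class name and member are hypothetical.
+ *
+ *    class ir_instruction {
+ *    public:
+ *       DECLARE_RALLOC_CXX_OPERATORS(ir_instruction)
+ *       int op;
+ *    };
+ *
+ *    ir_instruction *inst = new(mem_ctx) ir_instruction();
+ *    delete inst;          // or let ralloc_free(mem_ctx) reclaim it
+ */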
+
+#endif
diff --git a/icd/intel/compiler/mesa-utils/src/glsl/threadpool.h b/icd/intel/compiler/mesa-utils/src/glsl/threadpool.h
new file mode 100644
index 0000000..b1a8ea8
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/glsl/threadpool.h
@@ -0,0 +1,76 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2014  LunarG, Inc.   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef THREADPOOL_H
+#define THREADPOOL_H
+
+#include <stdbool.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct _mesa_threadpool;
+struct _mesa_threadpool_task;
+
+struct _mesa_threadpool *
+_mesa_threadpool_create(int max_threads);
+
+struct _mesa_threadpool *
+_mesa_threadpool_ref(struct _mesa_threadpool *pool);
+
+void
+_mesa_threadpool_unref(struct _mesa_threadpool *pool);
+
+void
+_mesa_threadpool_join(struct _mesa_threadpool *pool, bool graceful);
+
+struct _mesa_threadpool_task *
+_mesa_threadpool_queue_task(struct _mesa_threadpool *pool,
+                            void (*func)(void *), void *data);
+
+bool
+_mesa_threadpool_complete_tasks(struct _mesa_threadpool *pool,
+                                struct _mesa_threadpool_task **tasks,
+                                int num_tasks);
+
+bool
+_mesa_threadpool_complete_task(struct _mesa_threadpool *pool,
+                               struct _mesa_threadpool_task *task);
+
+struct _mesa_threadpool *
+_mesa_glsl_get_threadpool(int max_threads);
+
+void
+_mesa_glsl_wait_threadpool(void);
+
+void
+_mesa_glsl_destroy_threadpool(void);
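+
+/* A minimal usage sketch (illustration only; task_func and task_data are
+ * hypothetical): queue work, then block until the queued task completes.
+ *
+ *    struct _mesa_threadpool *pool = _mesa_threadpool_create(4);
+ *    struct _mesa_threadpool_task *task =
+ *       _mesa_threadpool_queue_task(pool, task_func, task_data);
+ *    _mesa_threadpool_complete_task(pool, task);
+ *    _mesa_threadpool_unref(pool);
+ */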
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* THREADPOOL_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mapi/glapi/glapi.h b/icd/intel/compiler/mesa-utils/src/mapi/glapi/glapi.h
new file mode 100644
index 0000000..e2fa925
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mapi/glapi/glapi.h
@@ -0,0 +1,191 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+/**
+ * \mainpage Mesa GL API Module
+ *
+ * \section GLAPIIntroduction Introduction
+ *
+ * The Mesa GL API module is responsible for dispatching all the
+ * gl*() functions.  All GL functions are dispatched by jumping through
+ * the current dispatch table (basically a struct full of function
+ * pointers.)
+ *
+ * A per-thread current dispatch table and per-thread current context
+ * pointer are managed by this module too.
+ *
+ * This module is intended to be non-Mesa-specific so it can be used
+ * with the X/DRI libGL also.
+ */
+
+
+#ifndef _GLAPI_H
+#define _GLAPI_H
+
+#include "u_thread.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+#ifdef _GLAPI_NO_EXPORTS
+#  define _GLAPI_EXPORT
+#else /* _GLAPI_NO_EXPORTS */
+#  ifdef _WIN32
+#    ifdef _GLAPI_DLL_EXPORTS
+#      define _GLAPI_EXPORT __declspec(dllexport)
+#    else
+#      define _GLAPI_EXPORT __declspec(dllimport)
+#    endif
+#  elif defined(__GNUC__) || (defined(__SUNPRO_C) && (__SUNPRO_C >= 0x590))
+#    define _GLAPI_EXPORT __attribute__((visibility("default")))
+#  else
+#    define _GLAPI_EXPORT
+#  endif
+#endif /* _GLAPI_NO_EXPORTS */
+
+
+/* Is this needed?  It is incomplete anyway. */
+#ifdef USE_MGL_NAMESPACE
+#define _glapi_set_dispatch _mglapi_set_dispatch
+#define _glapi_get_dispatch _mglapi_get_dispatch
+#define _glapi_set_context _mglapi_set_context
+#define _glapi_get_context _mglapi_get_context
+#define _glapi_Dispatch _mglapi_Dispatch
+#define _glapi_Context _mglapi_Context
+#endif
+
+typedef void (*_glapi_proc)(void);
+struct _glapi_table;
+
+
+#if defined (GLX_USE_TLS)
+
+_GLAPI_EXPORT extern __thread struct _glapi_table * _glapi_tls_Dispatch
+    __attribute__((tls_model("initial-exec")));
+
+_GLAPI_EXPORT extern __thread void * _glapi_tls_Context
+    __attribute__((tls_model("initial-exec")));
+
+_GLAPI_EXPORT extern const struct _glapi_table *_glapi_Dispatch;
+_GLAPI_EXPORT extern const void *_glapi_Context;
+
+# define GET_DISPATCH() _glapi_tls_Dispatch
+# define GET_CURRENT_CONTEXT(C)  struct gl_context *C = (struct gl_context *) _glapi_tls_Context
+
+#else
+
+_GLAPI_EXPORT extern struct _glapi_table *_glapi_Dispatch;
+_GLAPI_EXPORT extern void *_glapi_Context;
+
+# ifdef THREADS
+
+#  define GET_DISPATCH() \
+     (likely(_glapi_Dispatch) ? _glapi_Dispatch : _glapi_get_dispatch())
+
+#  define GET_CURRENT_CONTEXT(C)  struct gl_context *C = (struct gl_context *) \
+     (likely(_glapi_Context) ? _glapi_Context : _glapi_get_context())
+
+# else
+
+#  define GET_DISPATCH() _glapi_Dispatch
+#  define GET_CURRENT_CONTEXT(C)  struct gl_context *C = (struct gl_context *) _glapi_Context
+
+# endif
+
+#endif /* defined (GLX_USE_TLS) */
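+
+/* Sketch of how an entry point typically uses these macros (illustration
+ * only): GET_CURRENT_CONTEXT expands to a per-thread context lookup in
+ * threaded builds and a plain global read otherwise.  _mesa_Foo and
+ * do_something are hypothetical.
+ *
+ *    void GLAPIENTRY _mesa_Foo(GLenum pname)
+ *    {
+ *       GET_CURRENT_CONTEXT(ctx);    // declares struct gl_context *ctx
+ *       if (!ctx)
+ *          return;
+ *       do_something(ctx, pname);
+ *    }
+ */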
+
+
+void
+_glapi_destroy_multithread(void);
+
+
+_GLAPI_EXPORT void
+_glapi_check_multithread(void);
+
+
+_GLAPI_EXPORT void
+_glapi_set_context(void *context);
+
+
+_GLAPI_EXPORT void *
+_glapi_get_context(void);
+
+
+_GLAPI_EXPORT void
+_glapi_set_dispatch(struct _glapi_table *dispatch);
+
+
+_GLAPI_EXPORT struct _glapi_table *
+_glapi_get_dispatch(void);
+
+
+_GLAPI_EXPORT unsigned int
+_glapi_get_dispatch_table_size(void);
+
+
+_GLAPI_EXPORT int
+_glapi_add_dispatch( const char * const * function_names,
+		     const char * parameter_signature );
+
+_GLAPI_EXPORT int
+_glapi_get_proc_offset(const char *funcName);
+
+
+_GLAPI_EXPORT _glapi_proc
+_glapi_get_proc_address(const char *funcName);
+
+
+_GLAPI_EXPORT const char *
+_glapi_get_proc_name(unsigned int offset);
+
+
+_GLAPI_EXPORT struct _glapi_table *
+_glapi_create_table_from_handle(void *handle, const char *symbol_prefix);
+
+
+/** Deprecated function */
+_GLAPI_EXPORT unsigned long
+_glthread_GetID(void);
+
+
+/*
+ * These stubs are kept so that the old DRI drivers still load.
+ */
+_GLAPI_EXPORT void
+_glapi_noop_enable_warnings(unsigned char enable);
+
+
+_GLAPI_EXPORT void
+_glapi_set_warning_func(_glapi_proc func);
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _GLAPI_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mapi/u_compiler.h b/icd/intel/compiler/mesa-utils/src/mapi/u_compiler.h
new file mode 100644
index 0000000..f376e97
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mapi/u_compiler.h
@@ -0,0 +1,33 @@
+#ifndef _U_COMPILER_H_
+#define _U_COMPILER_H_
+
+#include "c99_compat.h" /* inline, __func__, etc. */
+
+
+/* XXX: Use standard `inline` keyword instead */
+#ifndef INLINE
+#  define INLINE inline
+#endif
+
+/* Function visibility */
+#ifndef PUBLIC
+#  if defined(__GNUC__) || (defined(__SUNPRO_C) && (__SUNPRO_C >= 0x590))
+#    define PUBLIC __attribute__((visibility("default")))
+#  elif defined(_MSC_VER)
+#    define PUBLIC __declspec(dllexport)
+#  else
+#    define PUBLIC
+#  endif
+#endif
+
+#ifndef likely
+#  if defined(__GNUC__)
+#    define likely(x)   __builtin_expect(!!(x), 1)
+#    define unlikely(x) __builtin_expect(!!(x), 0)
+#  else
+#    define likely(x)   (x)
+#    define unlikely(x) (x)
+#  endif
+#endif
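+
+/* Usage sketch (illustration only): wrap branch conditions whose outcome is
+ * heavily biased, e.g. allocation-failure checks.
+ *
+ *    void *p = malloc(size);
+ *    if (unlikely(p == NULL))
+ *       return NULL;
+ */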
+
+#endif /* _U_COMPILER_H_ */
diff --git a/icd/intel/compiler/mesa-utils/src/mapi/u_thread.h b/icd/intel/compiler/mesa-utils/src/mapi/u_thread.h
new file mode 100644
index 0000000..57c3b07
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mapi/u_thread.h
@@ -0,0 +1,156 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2006  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+/*
+ * Thread support for gl dispatch.
+ *
+ * Initial version by John Stone (j.stone@acm.org) (johns@cs.umr.edu)
+ *                and Christoph Poliwoda (poliwoda@volumegraphics.com)
+ * Revised by Keith Whitwell
+ * Adapted for new gl dispatcher by Brian Paul
+ * Modified for use in mapi by Chia-I Wu
+ */
+
+/*
+ * If this file is accidentally included by a non-threaded build,
+ * it should not cause the build to fail or otherwise cause problems.
+ * In general, however, it should only be included when needed.
+ */
+
+#ifndef _U_THREAD_H_
+#define _U_THREAD_H_
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "u_compiler.h"
+
+#include "c11/threads.h"
+
+#if defined(HAVE_PTHREAD) || defined(_WIN32)
+#ifndef THREADS
+#define THREADS
+#endif
+#endif
+
+/*
+ * Error messages
+ */
+#define INIT_TSD_ERROR "Mesa: failed to allocate key for thread specific data"
+#define GET_TSD_ERROR "Mesa: failed to get thread specific data"
+#define SET_TSD_ERROR "Mesa: thread failed to set thread specific data"
+
+
+/*
+ * Magic number to determine if a TSD object has been initialized.
+ * Kind of a hack but there doesn't appear to be a better cross-platform
+ * solution.
+ */
+#define INIT_MAGIC 0xff8adc98
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+struct u_tsd {
+   tss_t key;
+   unsigned initMagic;
+};
+
+
+static INLINE unsigned long
+u_thread_self(void)
+{
+   /*
+    * XXX: Callers of u_thread_self assume it is a lightweight function,
+    * returning a numeric value.  But unfortunately C11's thrd_current() gives
+    * no such guarantees.  In fact, it's pretty hard to have a compliant
+    * implementation of thrd_current() on Windows with such characteristics.
+    * So for now, we side-step this mess and use Windows thread primitives
+    * directly here.
+    *
+    * FIXME: On the other hand, u_thread_self() is a bad
+    * abstraction.  Even with pthreads, there is no guarantee that
+    * pthread_self() will return numeric IDs -- we should be using
+    * pthread_equal() instead of assuming we can compare thread ids...
+    */
+#ifdef _WIN32
+   return GetCurrentThreadId();
+#else
+   return (unsigned long) (uintptr_t) thrd_current();
+#endif
+}
+
+
+static INLINE void
+u_tsd_init(struct u_tsd *tsd)
+{
+   if (tss_create(&tsd->key, NULL/*free*/) != 0) {
+      perror(INIT_TSD_ERROR);
+      exit(-1);
+   }
+   tsd->initMagic = INIT_MAGIC;
+}
+
+
+static INLINE void *
+u_tsd_get(struct u_tsd *tsd)
+{
+   if (tsd->initMagic != INIT_MAGIC) {
+      u_tsd_init(tsd);
+   }
+   return tss_get(tsd->key);
+}
+
+
+static INLINE void
+u_tsd_set(struct u_tsd *tsd, void *ptr)
+{
+   if (tsd->initMagic != INIT_MAGIC) {
+      u_tsd_init(tsd);
+   }
+   if (tss_set(tsd->key, ptr) != 0) {
+      perror(SET_TSD_ERROR);
+      exit(-1);
+   }
+}
+
+
+static INLINE void
+u_tsd_destroy(struct u_tsd *tsd)
+{
+   if (tsd->initMagic != INIT_MAGIC) {
+      return;
+   }
+   tss_delete(tsd->key);
+   tsd->initMagic = 0x0;
+}
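+
+/* A minimal usage sketch (illustration only): a statically zeroed u_tsd is
+ * initialized lazily on first get/set, and holds one value per thread.
+ *
+ *    static struct u_tsd context_tsd;     // initMagic starts as 0
+ *
+ *    u_tsd_set(&context_tsd, ctx);        // affects the calling thread only
+ *    struct gl_context *c =
+ *       (struct gl_context *) u_tsd_get(&context_tsd);
+ */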
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _U_THREAD_H_ */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/accum.c b/icd/intel/compiler/mesa-utils/src/mesa/main/accum.c
new file mode 100644
index 0000000..ef74468
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/accum.c
@@ -0,0 +1,491 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2005  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "glheader.h"
+#include "accum.h"
+#include "condrender.h"
+#include "context.h"
+#include "format_unpack.h"
+#include "format_pack.h"
+#include "imports.h"
+#include "macros.h"
+#include "state.h"
+#include "mtypes.h"
+#include "main/dispatch.h"
+
+
+void GLAPIENTRY
+_mesa_ClearAccum( GLfloat red, GLfloat green, GLfloat blue, GLfloat alpha )
+{
+   GLfloat tmp[4];
+   GET_CURRENT_CONTEXT(ctx);
+
+   tmp[0] = CLAMP( red,   -1.0F, 1.0F );
+   tmp[1] = CLAMP( green, -1.0F, 1.0F );
+   tmp[2] = CLAMP( blue,  -1.0F, 1.0F );
+   tmp[3] = CLAMP( alpha, -1.0F, 1.0F );
+
+   if (TEST_EQ_4V(tmp, ctx->Accum.ClearColor))
+      return;
+
+   COPY_4FV( ctx->Accum.ClearColor, tmp );
+}
+
+
+void GLAPIENTRY
+_mesa_Accum( GLenum op, GLfloat value )
+{
+   GET_CURRENT_CONTEXT(ctx);
+   FLUSH_VERTICES(ctx, 0);
+
+   switch (op) {
+   case GL_ADD:
+   case GL_MULT:
+   case GL_ACCUM:
+   case GL_LOAD:
+   case GL_RETURN:
+      /* OK */
+      break;
+   default:
+      _mesa_error(ctx, GL_INVALID_ENUM, "glAccum(op)");
+      return;
+   }
+
+   if (ctx->DrawBuffer->Visual.haveAccumBuffer == 0) {
+      _mesa_error(ctx, GL_INVALID_OPERATION, "glAccum(no accum buffer)");
+      return;
+   }
+
+   if (ctx->DrawBuffer != ctx->ReadBuffer) {
+      /* See GLX_SGI_make_current_read or WGL_ARB_make_current_read,
+       * or GL_EXT_framebuffer_blit.
+       */
+      _mesa_error(ctx, GL_INVALID_OPERATION,
+                  "glAccum(different read/draw buffers)");
+      return;
+   }
+
+   if (ctx->NewState)
+      _mesa_update_state(ctx);
+
+   if (ctx->DrawBuffer->_Status != GL_FRAMEBUFFER_COMPLETE_EXT) {
+      _mesa_error(ctx, GL_INVALID_FRAMEBUFFER_OPERATION_EXT,
+                  "glAccum(incomplete framebuffer)");
+      return;
+   }
+
+   if (ctx->RasterDiscard)
+      return;
+
+   if (ctx->RenderMode == GL_RENDER) {
+      _mesa_accum(ctx, op, value);
+   }
+}
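+
+/* Client-side sketch of the classic accumulation pattern this entry point
+ * serves (illustration only; draw_pass() is hypothetical): average N frames
+ * for full-scene antialiasing or motion blur.
+ *
+ *    glClear(GL_ACCUM_BUFFER_BIT);
+ *    for (int i = 0; i < N; i++) {
+ *       draw_pass(i);
+ *       glAccum(GL_ACCUM, 1.0f / N);   // Accum += ColorBuf * 1/N
+ *    }
+ *    glAccum(GL_RETURN, 1.0f);         // ColorBuf = Accum
+ */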
+
+
+/**
+ * Clear the accumulation buffer by mapping the renderbuffer and
+ * writing the clear color to it.  Called by the driver's implementation
+ * of the glClear function.
+ */
+void
+_mesa_clear_accum_buffer(struct gl_context *ctx)
+{
+   GLuint x, y, width, height;
+   GLubyte *accMap;
+   GLint accRowStride;
+   struct gl_renderbuffer *accRb;
+
+   if (!ctx->DrawBuffer)
+      return;
+
+   accRb = ctx->DrawBuffer->Attachment[BUFFER_ACCUM].Renderbuffer;
+   if (!accRb)
+      return;   /* missing accum buffer, not an error */
+
+   /* bounds, with scissor */
+   x = ctx->DrawBuffer->_Xmin;
+   y = ctx->DrawBuffer->_Ymin;
+   width = ctx->DrawBuffer->_Xmax - ctx->DrawBuffer->_Xmin;
+   height = ctx->DrawBuffer->_Ymax - ctx->DrawBuffer->_Ymin;
+
+   ctx->Driver.MapRenderbuffer(ctx, accRb, x, y, width, height,
+                               GL_MAP_WRITE_BIT, &accMap, &accRowStride);
+
+   if (!accMap) {
+      _mesa_error(ctx, GL_OUT_OF_MEMORY, "glAccum");
+      return;
+   }
+
+   if (accRb->Format == MESA_FORMAT_RGBA_SNORM16) {
+      const GLshort clearR = FLOAT_TO_SHORT(ctx->Accum.ClearColor[0]);
+      const GLshort clearG = FLOAT_TO_SHORT(ctx->Accum.ClearColor[1]);
+      const GLshort clearB = FLOAT_TO_SHORT(ctx->Accum.ClearColor[2]);
+      const GLshort clearA = FLOAT_TO_SHORT(ctx->Accum.ClearColor[3]);
+      GLuint i, j;
+
+      for (j = 0; j < height; j++) {
+         GLshort *row = (GLshort *) accMap;
+         
+
+            row[i * 4 + 0] = clearR;
+            row[i * 4 + 1] = clearG;
+            row[i * 4 + 2] = clearB;
+            row[i * 4 + 3] = clearA;
+         }
+         accMap += accRowStride;
+      }
+   }
+   else {
+      /* other types someday? */
+      _mesa_warning(ctx, "unexpected accum buffer type");
+   }
+
+   ctx->Driver.UnmapRenderbuffer(ctx, accRb);
+}
+
+
+/**
+ * if (bias)
+ *    Accum += value
+ * else
+ *    Accum *= value
+ */
+static void
+accum_scale_or_bias(struct gl_context *ctx, GLfloat value,
+                    GLint xpos, GLint ypos, GLint width, GLint height,
+                    GLboolean bias)
+{
+   struct gl_renderbuffer *accRb =
+      ctx->DrawBuffer->Attachment[BUFFER_ACCUM].Renderbuffer;
+   GLubyte *accMap;
+   GLint accRowStride;
+
+   assert(accRb);
+
+   ctx->Driver.MapRenderbuffer(ctx, accRb, xpos, ypos, width, height,
+                               GL_MAP_READ_BIT | GL_MAP_WRITE_BIT,
+                               &accMap, &accRowStride);
+
+   if (!accMap) {
+      _mesa_error(ctx, GL_OUT_OF_MEMORY, "glAccum");
+      return;
+   }
+
+   if (accRb->Format == MESA_FORMAT_RGBA_SNORM16) {
+      const GLshort incr = (GLshort) (value * 32767.0f);
+      GLint i, j;
+      if (bias) {
+         for (j = 0; j < height; j++) {
+            GLshort *acc = (GLshort *) accMap;
+            for (i = 0; i < 4 * width; i++) {
+               acc[i] += incr;
+            }
+            accMap += accRowStride;
+         }
+      }
+      else {
+         /* scale */
+         for (j = 0; j < height; j++) {
+            GLshort *acc = (GLshort *) accMap;
+            for (i = 0; i < 4 * width; i++) {
+               acc[i] = (GLshort) (acc[i] * value);
+            }
+            accMap += accRowStride;
+         }
+      }
+   }
+   else {
+      /* other types someday? */
+   }
+
+   ctx->Driver.UnmapRenderbuffer(ctx, accRb);
+}
+
+
+/**
+ * if (load)
+ *    Accum = ColorBuf * value
+ * else
+ *    Accum += ColorBuf * value
+ */
+static void
+accum_or_load(struct gl_context *ctx, GLfloat value,
+              GLint xpos, GLint ypos, GLint width, GLint height,
+              GLboolean load)
+{
+   struct gl_renderbuffer *accRb =
+      ctx->DrawBuffer->Attachment[BUFFER_ACCUM].Renderbuffer;
+   struct gl_renderbuffer *colorRb = ctx->ReadBuffer->_ColorReadBuffer;
+   GLubyte *accMap, *colorMap;
+   GLint accRowStride, colorRowStride;
+   GLbitfield mappingFlags;
+
+   if (!colorRb) {
+      /* no read buffer - OK */
+      return;
+   }
+
+   assert(accRb);
+
+   mappingFlags = GL_MAP_WRITE_BIT;
+   if (!load) /* if we're accumulating */
+      mappingFlags |= GL_MAP_READ_BIT;
+
+   /* Map accum buffer */
+   ctx->Driver.MapRenderbuffer(ctx, accRb, xpos, ypos, width, height,
+                               mappingFlags, &accMap, &accRowStride);
+   if (!accMap) {
+      _mesa_error(ctx, GL_OUT_OF_MEMORY, "glAccum");
+      return;
+   }
+
+   /* Map color buffer */
+   ctx->Driver.MapRenderbuffer(ctx, colorRb, xpos, ypos, width, height,
+                               GL_MAP_READ_BIT,
+                               &colorMap, &colorRowStride);
+   if (!colorMap) {
+      ctx->Driver.UnmapRenderbuffer(ctx, accRb);
+      _mesa_error(ctx, GL_OUT_OF_MEMORY, "glAccum");
+      return;
+   }
+
+   if (accRb->Format == MESA_FORMAT_RGBA_SNORM16) {
+      const GLfloat scale = value * 32767.0f;
+      GLint i, j;
+      GLfloat (*rgba)[4];
+
+      rgba = malloc(width * 4 * sizeof(GLfloat));
+      if (rgba) {
+         for (j = 0; j < height; j++) {
+            GLshort *acc = (GLshort *) accMap;
+
+            /* read colors from source color buffer */
+            _mesa_unpack_rgba_row(colorRb->Format, width, colorMap, rgba);
+
+            if (load) {
+               for (i = 0; i < width; i++) {
+                  acc[i * 4 + 0] = (GLshort) (rgba[i][RCOMP] * scale);
+                  acc[i * 4 + 1] = (GLshort) (rgba[i][GCOMP] * scale);
+                  acc[i * 4 + 2] = (GLshort) (rgba[i][BCOMP] * scale);
+                  acc[i * 4 + 3] = (GLshort) (rgba[i][ACOMP] * scale);
+               }
+            }
+            else {
+               /* accumulate */
+               for (i = 0; i < width; i++) {
+                  acc[i * 4 + 0] += (GLshort) (rgba[i][RCOMP] * scale);
+                  acc[i * 4 + 1] += (GLshort) (rgba[i][GCOMP] * scale);
+                  acc[i * 4 + 2] += (GLshort) (rgba[i][BCOMP] * scale);
+                  acc[i * 4 + 3] += (GLshort) (rgba[i][ACOMP] * scale);
+               }
+            }
+
+            colorMap += colorRowStride;
+            accMap += accRowStride;
+         }
+
+         free(rgba);
+      }
+      else {
+         _mesa_error(ctx, GL_OUT_OF_MEMORY, "glAccum");
+      }
+   }
+   else {
+      /* other types someday? */
+   }
+
+   ctx->Driver.UnmapRenderbuffer(ctx, accRb);
+   ctx->Driver.UnmapRenderbuffer(ctx, colorRb);
+}
+
+
+/**
+ * ColorBuffer = Accum * value
+ */
+static void
+accum_return(struct gl_context *ctx, GLfloat value,
+             GLint xpos, GLint ypos, GLint width, GLint height)
+{
+   struct gl_framebuffer *fb = ctx->DrawBuffer;
+   struct gl_renderbuffer *accRb = fb->Attachment[BUFFER_ACCUM].Renderbuffer;
+   GLubyte *accMap, *colorMap;
+   GLint accRowStride, colorRowStride;
+   GLuint buffer;
+
+   /* Map accum buffer */
+   ctx->Driver.MapRenderbuffer(ctx, accRb, xpos, ypos, width, height,
+                               GL_MAP_READ_BIT,
+                               &accMap, &accRowStride);
+   if (!accMap) {
+      _mesa_error(ctx, GL_OUT_OF_MEMORY, "glAccum");
+      return;
+   }
+
+   /* Loop over destination buffers */
+   for (buffer = 0; buffer < fb->_NumColorDrawBuffers; buffer++) {
+      struct gl_renderbuffer *colorRb = fb->_ColorDrawBuffers[buffer];
+      const GLboolean masking = (!ctx->Color.ColorMask[buffer][RCOMP] ||
+                                 !ctx->Color.ColorMask[buffer][GCOMP] ||
+                                 !ctx->Color.ColorMask[buffer][BCOMP] ||
+                                 !ctx->Color.ColorMask[buffer][ACOMP]);
+      GLbitfield mappingFlags = GL_MAP_WRITE_BIT;
+
+      if (masking)
+         mappingFlags |= GL_MAP_READ_BIT;
+
+      /* Map color buffer */
+      ctx->Driver.MapRenderbuffer(ctx, colorRb, xpos, ypos, width, height,
+                                  mappingFlags, &colorMap, &colorRowStride);
+      if (!colorMap) {
+         _mesa_error(ctx, GL_OUT_OF_MEMORY, "glAccum");
+         continue;
+      }
+
+      if (accRb->Format == MESA_FORMAT_RGBA_SNORM16) {
+         const GLfloat scale = value / 32767.0f;
+         GLint i, j;
+         GLfloat (*rgba)[4], (*dest)[4];
+
+         rgba = malloc(width * 4 * sizeof(GLfloat));
+         dest = malloc(width * 4 * sizeof(GLfloat));
+
+         if (rgba && dest) {
+            for (j = 0; j < height; j++) {
+               GLshort *acc = (GLshort *) accMap;
+
+               for (i = 0; i < width; i++) {
+                  rgba[i][0] = acc[i * 4 + 0] * scale;
+                  rgba[i][1] = acc[i * 4 + 1] * scale;
+                  rgba[i][2] = acc[i * 4 + 2] * scale;
+                  rgba[i][3] = acc[i * 4 + 3] * scale;
+               }
+
+               if (masking) {
+
+                  /* get existing colors from dest buffer */
+                  _mesa_unpack_rgba_row(colorRb->Format, width, colorMap, dest);
+
+                  /* use the dest colors where mask[channel] = 0 */
+                  if (ctx->Color.ColorMask[buffer][RCOMP] == 0) {
+                     for (i = 0; i < width; i++)
+                        rgba[i][RCOMP] = dest[i][RCOMP];
+                  }
+                  if (ctx->Color.ColorMask[buffer][GCOMP] == 0) {
+                     for (i = 0; i < width; i++)
+                        rgba[i][GCOMP] = dest[i][GCOMP];
+                  }
+                  if (ctx->Color.ColorMask[buffer][BCOMP] == 0) {
+                     for (i = 0; i < width; i++)
+                        rgba[i][BCOMP] = dest[i][BCOMP];
+                  }
+                  if (ctx->Color.ColorMask[buffer][ACOMP] == 0) {
+                     for (i = 0; i < width; i++)
+                        rgba[i][ACOMP] = dest[i][ACOMP];
+                  }
+               }
+
+               _mesa_pack_float_rgba_row(colorRb->Format, width,
+                                         (const GLfloat (*)[4]) rgba, colorMap);
+
+               accMap += accRowStride;
+               colorMap += colorRowStride;
+            }
+         }
+         else {
+            _mesa_error(ctx, GL_OUT_OF_MEMORY, "glAccum");
+         }
+         free(rgba);
+         free(dest);
+      }
+      else {
+         /* other types someday? */
+      }
+
+      ctx->Driver.UnmapRenderbuffer(ctx, colorRb);
+   }
+
+   ctx->Driver.UnmapRenderbuffer(ctx, accRb);
+}
+
+
+
+/**
+ * Software fallback for glAccum.  A hardware driver that supports
+ * signed 16-bit color channels could implement hardware accumulation
+ * operations, but no driver does so at this time.
+ */
+void
+_mesa_accum(struct gl_context *ctx, GLenum op, GLfloat value)
+{
+   GLint xpos, ypos, width, height;
+
+   if (!ctx->DrawBuffer->Attachment[BUFFER_ACCUM].Renderbuffer) {
+      _mesa_warning(ctx, "Calling glAccum() without an accumulation buffer");
+      return;
+   }
+
+   if (!_mesa_check_conditional_render(ctx))
+      return;
+
+   xpos = ctx->DrawBuffer->_Xmin;
+   ypos = ctx->DrawBuffer->_Ymin;
+   width =  ctx->DrawBuffer->_Xmax - ctx->DrawBuffer->_Xmin;
+   height = ctx->DrawBuffer->_Ymax - ctx->DrawBuffer->_Ymin;
+
+   switch (op) {
+   case GL_ADD:
+      if (value != 0.0F) {
+         accum_scale_or_bias(ctx, value, xpos, ypos, width, height, GL_TRUE);
+      }
+      break;
+   case GL_MULT:
+      if (value != 1.0F) {
+         accum_scale_or_bias(ctx, value, xpos, ypos, width, height, GL_FALSE);
+      }
+      break;
+   case GL_ACCUM:
+      if (value != 0.0F) {
+         accum_or_load(ctx, value, xpos, ypos, width, height, GL_FALSE);
+      }
+      break;
+   case GL_LOAD:
+      accum_or_load(ctx, value, xpos, ypos, width, height, GL_TRUE);
+      break;
+   case GL_RETURN:
+      accum_return(ctx, value, xpos, ypos, width, height);
+      break;
+   default:
+      _mesa_problem(ctx, "invalid mode in _mesa_accum()");
+      break;
+   }
+}
+
+
+void 
+_mesa_init_accum( struct gl_context *ctx )
+{
+   /* Accumulate buffer group */
+   ASSIGN_4V( ctx->Accum.ClearColor, 0.0, 0.0, 0.0, 0.0 );
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/accum.h b/icd/intel/compiler/mesa-utils/src/mesa/main/accum.h
new file mode 100644
index 0000000..a5665c7
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/accum.h
@@ -0,0 +1,60 @@
+/**
+ * \file accum.h
+ * Accumulation buffer operations.
+ * 
+ * \if subset
+ * (No-op)
+ *
+ * \endif
+ */
+
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2001  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+
+#ifndef ACCUM_H
+#define ACCUM_H
+
+#include "main/glheader.h"
+
+struct _glapi_table;
+struct gl_context;
+struct gl_renderbuffer;
+
+extern void GLAPIENTRY
+_mesa_ClearAccum( GLfloat red, GLfloat green, GLfloat blue, GLfloat alpha );
+void GLAPIENTRY
+_mesa_Accum( GLenum op, GLfloat value );
+
+extern void
+_mesa_accum(struct gl_context *ctx, GLenum op, GLfloat value);
+
+extern void
+_mesa_clear_accum_buffer(struct gl_context *ctx);
+
+extern void
+_mesa_init_accum( struct gl_context *ctx );
+
+#endif /* ACCUM_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/bitset.h b/icd/intel/compiler/mesa-utils/src/mesa/main/bitset.h
new file mode 100644
index 0000000..601fd0e
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/bitset.h
@@ -0,0 +1,99 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2006  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file bitset.h
+ * \brief Bitset of arbitrary size definitions.
+ * \author Michal Krol
+ */
+
+#ifndef BITSET_H
+#define BITSET_H
+
+#include "imports.h"
+
+/****************************************************************************
+ * generic bitset implementation
+ */
+
+#define BITSET_WORD GLuint
+#define BITSET_WORDBITS (sizeof (BITSET_WORD) * 8)
+
+/* bitset declarations
+ */
+#define BITSET_WORDS(bits) (ALIGN(bits, BITSET_WORDBITS) / BITSET_WORDBITS)
+#define BITSET_DECLARE(name, bits) BITSET_WORD name[BITSET_WORDS(bits)]
+
+/* bitset operations
+ */
+#define BITSET_COPY(x, y) memcpy( (x), (y), sizeof (x) )
+#define BITSET_EQUAL(x, y) (memcmp( (x), (y), sizeof (x) ) == 0)
+#define BITSET_ZERO(x) memset( (x), 0, sizeof (x) )
+#define BITSET_ONES(x) memset( (x), 0xff, sizeof (x) )
+
+#define BITSET_BITWORD(b) ((b) / BITSET_WORDBITS)
+#define BITSET_BIT(b) (1 << ((b) % BITSET_WORDBITS))
+
+/* single bit operations
+ */
+#define BITSET_TEST(x, b) ((x)[BITSET_BITWORD(b)] & BITSET_BIT(b))
+#define BITSET_SET(x, b) ((x)[BITSET_BITWORD(b)] |= BITSET_BIT(b))
+#define BITSET_CLEAR(x, b) ((x)[BITSET_BITWORD(b)] &= ~BITSET_BIT(b))
+
+#define BITSET_MASK(b) ((b) == BITSET_WORDBITS ? ~0 : BITSET_BIT(b) - 1)
+#define BITSET_RANGE(b, e) (BITSET_MASK((e) + 1) & ~BITSET_MASK(b))
+
+/* bit range operations
+ */
+#define BITSET_TEST_RANGE(x, b, e) \
+   (BITSET_BITWORD(b) == BITSET_BITWORD(e) ? \
+   ((x)[BITSET_BITWORD(b)] & BITSET_RANGE(b, e)) : \
+   (assert (!"BITSET_TEST_RANGE: bit range crosses word boundary"), 0))
+#define BITSET_SET_RANGE(x, b, e) \
+   (BITSET_BITWORD(b) == BITSET_BITWORD(e) ? \
+   ((x)[BITSET_BITWORD(b)] |= BITSET_RANGE(b, e)) : \
+   (assert (!"BITSET_SET_RANGE: bit range crosses word boundary"), 0))
+#define BITSET_CLEAR_RANGE(x, b, e) \
+   (BITSET_BITWORD(b) == BITSET_BITWORD(e) ? \
+   ((x)[BITSET_BITWORD(b)] &= ~BITSET_RANGE(b, e)) : \
+   (assert (!"BITSET_CLEAR_RANGE: bit range crosses word boundary"), 0))
+
+/* Get first bit set in a bitset.
+ */
+static inline int
+__bitset_ffs(const BITSET_WORD *x, int n)
+{
+   int i;
+
+   for (i = 0; i < n; i++) {
+      if (x[i])
+	 return ffs(x[i]) + BITSET_WORDBITS * i;
+   }
+
+   return 0;
+}
+
+#define BITSET_FFS(x) __bitset_ffs(x, Elements(x))
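+
+/* Usage sketch (illustrative, not part of the original header): declare a
+ * 100-bit set (four 32-bit words here), set one bit, then query it.
+ * Note that BITSET_FFS returns a 1-based position, with 0 meaning empty.
+ *
+ *    BITSET_DECLARE(mask, 100);
+ *    BITSET_ZERO(mask);
+ *    BITSET_SET(mask, 42);
+ *    assert(BITSET_TEST(mask, 42));
+ *    assert(BITSET_FFS(mask) == 43);
+ */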
+
+#endif
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/compiler.h b/icd/intel/compiler/mesa-utils/src/mesa/main/compiler.h
new file mode 100644
index 0000000..c4c2d95
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/compiler.h
@@ -0,0 +1,446 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ * Copyright (C) 2009  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+/**
+ * \file compiler.h
+ * Compiler-related stuff.
+ */
+
+
+#ifndef COMPILER_H
+#define COMPILER_H
+
+
+#include <assert.h>
+#include <ctype.h>
+#include <math.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <float.h>
+#include <stdarg.h>
+
+#include "c99_compat.h" /* inline, __func__, etc. */
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/**
+ * Get standard integer types
+ */
+#include <stdint.h>
+
+
+/**
+ * Sun compilers define __i386 instead of the gcc-style __i386__.
+ */
+#ifdef __SUNPRO_C
+# if !defined(__i386__) && defined(__i386)
+#  define __i386__
+# elif !defined(__amd64__) && defined(__amd64)
+#  define __amd64__
+# elif !defined(__sparc__) && defined(__sparc)
+#  define __sparc__
+# endif
+# if !defined(__volatile)
+#  define __volatile volatile
+# endif
+#endif
+
+
+/**
+ * finite macro.
+ */
+#if defined(_MSC_VER)
+#  define finite _finite
+#endif
+
+
+/**
+ * Disable assorted warnings
+ */
+#if defined(_WIN32) && !defined(__CYGWIN__)
+#  if !defined(__GNUC__) /* mingw environment */
+#    pragma warning( disable : 4068 ) /* unknown pragma */
+#    pragma warning( disable : 4710 ) /* function 'foo' not inlined */
+#    pragma warning( disable : 4711 ) /* function 'foo' selected for automatic inline expansion */
+#    pragma warning( disable : 4127 ) /* conditional expression is constant */
+#    if defined(MESA_MINWARN)
+#      pragma warning( disable : 4244 ) /* '=' : conversion from 'const double ' to 'float ', possible loss of data */
+#      pragma warning( disable : 4018 ) /* '<' : signed/unsigned mismatch */
+#      pragma warning( disable : 4305 ) /* '=' : truncation from 'const double ' to 'float ' */
+#      pragma warning( disable : 4550 ) /* 'function' undefined; assuming extern returning int */
+#      pragma warning( disable : 4761 ) /* integral size mismatch in argument; conversion supplied */
+#    endif
+#  endif
+#endif
+
+
+
+/* XXX: Use standard `inline` keyword instead */
+#ifndef INLINE
+#  define INLINE inline
+#endif
+
+
+/**
+ * PUBLIC/USED macros
+ *
+ * If we build the library with gcc's -fvisibility=hidden flag, we'll
+ * use the PUBLIC macro to mark functions that are to be exported.
+ *
+ * We also need to define a USED attribute, so the optimizer doesn't 
+ * inline a static function that we later use in an alias. - ajax
+ */
+#ifndef PUBLIC
+#  if (defined(__GNUC__) && __GNUC__ >= 4) || (defined(__SUNPRO_C) && (__SUNPRO_C >= 0x590))
+#    define PUBLIC __attribute__((visibility("default")))
+#    define USED __attribute__((used))
+#  else
+#    define PUBLIC
+#    define USED
+#  endif
+#endif
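+
+/* Usage sketch (illustrative; _mesa_some_entry_point is a hypothetical
+ * name): export a symbol even when building with -fvisibility=hidden.
+ *
+ *    PUBLIC void _mesa_some_entry_point(void);
+ */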
+
+
+/**
+ * __builtin_expect macros
+ */
+#if !defined(__GNUC__)
+#  define __builtin_expect(x, y) (x)
+#endif
+
+#ifndef likely
+#  ifdef __GNUC__
+#    define likely(x)   __builtin_expect(!!(x), 1)
+#    define unlikely(x) __builtin_expect(!!(x), 0)
+#  else
+#    define likely(x)   (x)
+#    define unlikely(x) (x)
+#  endif
+#endif
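+
+/* Usage sketch (illustrative): steer block layout toward the common case
+ * by annotating the branch the code rarely takes.
+ *
+ *    if (unlikely(buf == NULL))
+ *       return GL_FALSE;
+ */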
+
+/* XXX: Use standard `__func__` instead */
+#ifndef __FUNCTION__
+#  define __FUNCTION__ __func__
+#endif
+
+/**
+ * Either define MESA_BIG_ENDIAN or MESA_LITTLE_ENDIAN, and CPU_TO_LE32.
+ * Do not use these unless absolutely necessary!
+ * Try to use a runtime test instead.
+ * For now, only used by some DRI hardware drivers for color/texel packing.
+ */
+#if defined(BYTE_ORDER) && defined(BIG_ENDIAN) && BYTE_ORDER == BIG_ENDIAN
+#if defined(__linux__)
+#include <byteswap.h>
+#define CPU_TO_LE32( x )	bswap_32( x )
+#elif defined(__APPLE__)
+#include <CoreFoundation/CFByteOrder.h>
+#define CPU_TO_LE32( x )	CFSwapInt32HostToLittle( x )
+#elif (defined(_AIX) || defined(__blrts))
+static INLINE GLuint CPU_TO_LE32(GLuint x)
+{
+   return (((x & 0x000000ff) << 24) |
+           ((x & 0x0000ff00) <<  8) |
+           ((x & 0x00ff0000) >>  8) |
+           ((x & 0xff000000) >> 24));
+}
+#elif defined(__OpenBSD__)
+#include <sys/types.h>
+#define CPU_TO_LE32( x )	htole32( x )
+#else /*__linux__ */
+#include <sys/endian.h>
+#define CPU_TO_LE32( x )	bswap32( x )
+#endif /*__linux__*/
+#define MESA_BIG_ENDIAN 1
+#else
+#define CPU_TO_LE32( x )	( x )
+#define MESA_LITTLE_ENDIAN 1
+#endif
+#define LE32_TO_CPU( x )	CPU_TO_LE32( x )
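+
+/* Illustrative: on a big-endian host CPU_TO_LE32(0x11223344) yields
+ * 0x44332211; on a little-endian host it is the identity.  Byte swapping
+ * is an involution, so the same macro serves LE32_TO_CPU as well. */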
+
+
+
+#if !defined(CAPI) && defined(_WIN32)
+#define CAPI _cdecl
+#endif
+
+
+/**
+ * Create a macro so that asm functions can be linked into compilers other
+ * than GNU C
+ */
+#ifndef _ASMAPI
+#if defined(_WIN32)
+#define _ASMAPI __cdecl
+#else
+#define _ASMAPI
+#endif
+#ifdef	PTR_DECL_IN_FRONT
+#define	_ASMAPIP * _ASMAPI
+#else
+#define	_ASMAPIP _ASMAPI *
+#endif
+#endif
+
+#ifdef USE_X86_ASM
+#define _NORMAPI _ASMAPI
+#define _NORMAPIP _ASMAPIP
+#else
+#define _NORMAPI
+#define _NORMAPIP *
+#endif
+
+
+/* Turn off macro checking systems used by other libraries */
+#ifdef CHECK
+#undef CHECK
+#endif
+
+
+/**
+ * ASSERT macro
+ */
+#if !defined(_WIN32_WCE)
+#if defined(DEBUG)
+#  define ASSERT(X)   assert(X)
+#else
+#  define ASSERT(X)
+#endif
+#endif
+
+
+/**
+ * Static (compile-time) assertion.
+ * Basically, use COND to dimension an array.  If COND is false/zero the
+ * array size will be -1 and we'll get a compilation error.
+ */
+
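+/* A minimal sketch of the technique the comment above describes (no such
+ * macro is defined in this header; the name is illustrative):
+ *
+ *    #define STATIC_ASSERT(COND) \
+ *       { typedef int sa_check[(COND) ? 1 : -1]; }
+ */
+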
+/**
+ * Unreachable macro. Useful for suppressing "control reaches end of non-void
+ * function" warnings.
+ */
+#if __GNUC__ >= 4 && __GNUC_MINOR__ >= 5
+#define unreachable() __builtin_unreachable()
+#elif (defined(__clang__) && defined(__has_builtin))
+# if __has_builtin(__builtin_unreachable)
+#  define unreachable() __builtin_unreachable()
+# endif
+#endif
+
+#ifndef unreachable
+#define unreachable()
+#endif
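+
+/* Usage sketch (illustrative; the helpers are hypothetical): placed after
+ * an exhaustive switch whose cases all return.
+ *
+ *    switch (op) {
+ *    case GL_ADD:  return do_add();
+ *    case GL_MULT: return do_mult();
+ *    }
+ *    unreachable();
+ */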
+
+#if (__GNUC__ >= 3)
+#define PRINTFLIKE(f, a) __attribute__ ((format(__printf__, f, a)))
+#else
+#define PRINTFLIKE(f, a)
+#endif
+
+#ifndef NULL
+#define NULL 0
+#endif
+
+/* Used to optionally mark structures with misaligned elements or size as
+ * packed, to trade off performance for space.
+ */
+#if (__GNUC__ >= 3)
+#define PACKED __attribute__((__packed__))
+#else
+#define PACKED
+#endif
+
+
+/**
+ * LONGSTRING macro
+ * gcc -pedantic warns about long string literals; LONGSTRING silences that.
+ */
+#if !defined(__GNUC__)
+# define LONGSTRING
+#else
+# define LONGSTRING __extension__
+#endif
+
+
+#ifndef M_PI
+#define M_PI (3.14159265358979323846)
+#endif
+
+#ifndef M_E
+#define M_E (2.7182818284590452354)
+#endif
+
+#ifndef M_LOG2E
+#define M_LOG2E     (1.4426950408889634074)
+#endif
+
+#ifndef ONE_DIV_SQRT_LN2
+#define ONE_DIV_SQRT_LN2 (1.201122408786449815)
+#endif
+
+#ifndef FLT_MAX_EXP
+#define FLT_MAX_EXP 128
+#endif
+
+
+/**
+ * USE_IEEE: Determine if we're using IEEE floating point
+ */
+#if defined(__i386__) || defined(__386__) || defined(__sparc__) || \
+    defined(__s390__) || defined(__s390x__) || defined(__powerpc__) || \
+    defined(__x86_64__) || \
+    defined(__m68k__) || \
+    defined(ia64) || defined(__ia64__) || \
+    defined(__hppa__) || defined(hpux) || \
+    defined(__mips) || defined(_MIPS_ARCH) || \
+    defined(__arm__) || defined(__aarch64__) || \
+    defined(__sh__) || defined(__m32r__) || \
+    (defined(__sun) && defined(_IEEE_754)) || \
+    defined(__alpha__)
+#define USE_IEEE
+#define IEEE_ONE 0x3f800000
+#endif
+
+
+/**
+ * START/END_FAST_MATH macros:
+ *
+ * START_FAST_MATH: Set x86 FPU to faster, 32-bit precision mode (and save
+ *                  original mode to a temporary).
+ * END_FAST_MATH: Restore x86 FPU to original mode.
+ */
+#if defined(__GNUC__) && defined(__i386__)
+/*
+ * Set the x86 FPU control word to guarantee only 32 bits of precision
+ * are stored in registers.  Allowing the FPU to store more introduces
+ * differences between situations where numbers are pulled out of memory
+ * vs. situations where the compiler is able to optimize register usage.
+ *
+ * In the worst case, we force the compiler to use a memory access to
+ * truncate the float, by specifying the 'volatile' keyword.
+ */
+/* Hardware default: All exceptions masked, extended double precision,
+ * round to nearest (IEEE compliant):
+ */
+#define DEFAULT_X86_FPU		0x037f
+/* All exceptions masked, single precision, round to nearest:
+ */
+#define FAST_X86_FPU		0x003f
+/* The fldcw instruction will cause any pending FP exceptions to be
+ * raised prior to entering the block, and we clear any pending
+ * exceptions before exiting the block.  Hence, asm code has free
+ * rein over the FPU while in the fast math block.
+ */
+#if defined(NO_FAST_MATH)
+#define START_FAST_MATH(x)						\
+do {									\
+   static GLuint mask = DEFAULT_X86_FPU;				\
+   __asm__ ( "fnstcw %0" : "=m" (*&(x)) );				\
+   __asm__ ( "fldcw %0" : : "m" (mask) );				\
+} while (0)
+#else
+#define START_FAST_MATH(x)						\
+do {									\
+   static GLuint mask = FAST_X86_FPU;					\
+   __asm__ ( "fnstcw %0" : "=m" (*&(x)) );				\
+   __asm__ ( "fldcw %0" : : "m" (mask) );				\
+} while (0)
+#endif
+/* Restore original FPU mode, and clear any exceptions that may have
+ * occurred in the FAST_MATH block.
+ */
+#define END_FAST_MATH(x)						\
+do {									\
+   __asm__ ( "fnclex ; fldcw %0" : : "m" (*&(x)) );			\
+} while (0)
+
+#elif defined(_MSC_VER) && defined(_M_IX86)
+#define DEFAULT_X86_FPU		0x037f /* See GCC comments above */
+#define FAST_X86_FPU		0x003f /* See GCC comments above */
+#if defined(NO_FAST_MATH)
+#define START_FAST_MATH(x) do {\
+	static GLuint mask = DEFAULT_X86_FPU;\
+	__asm fnstcw word ptr [x]\
+	__asm fldcw word ptr [mask]\
+} while(0)
+#else
+#define START_FAST_MATH(x) do {\
+	static GLuint mask = FAST_X86_FPU;\
+	__asm fnstcw word ptr [x]\
+	__asm fldcw word ptr [mask]\
+} while(0)
+#endif
+#define END_FAST_MATH(x) do {\
+	__asm fnclex\
+	__asm fldcw word ptr [x]\
+} while(0)
+
+#else
+#define START_FAST_MATH(x)  x = 0
+#define END_FAST_MATH(x)  (void)(x)
+#endif
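+
+/* Usage sketch (illustrative): bracket a float-heavy inner loop.
+ *
+ *    GLuint fpu_save;
+ *    START_FAST_MATH(fpu_save);
+ *    ... rasterization loop ...
+ *    END_FAST_MATH(fpu_save);
+ */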
+
+
+#ifndef Elements
+#define Elements(x) (sizeof(x)/sizeof(*(x)))
+#endif
+
+#ifdef __cplusplus
+/**
+ * Macro function that evaluates to true if T is a trivially
+ * destructible type -- that is, if its (non-virtual) destructor
+ * performs no action and all member variables and base classes are
+ * trivially destructible themselves.
+ */
+#   if defined(__GNUC__)
+#      if ((__GNUC__ > 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 3)))
+#         define HAS_TRIVIAL_DESTRUCTOR(T) __has_trivial_destructor(T)
+#      endif
+#   elif (defined(__clang__) && defined(__has_feature))
+#      if __has_feature(has_trivial_destructor)
+#         define HAS_TRIVIAL_DESTRUCTOR(T) __has_trivial_destructor(T)
+#      endif
+#   endif
+#   ifndef HAS_TRIVIAL_DESTRUCTOR
+       /* It's always safe (if inefficient) to assume that a
+        * destructor is non-trivial.
+        */
+#      define HAS_TRIVIAL_DESTRUCTOR(T) (false)
+#   endif
+#endif
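+
+/* Illustrative: a pool allocator can consult HAS_TRIVIAL_DESTRUCTOR(T) to
+ * skip registering a per-object destructor callback for types whose
+ * destructor does nothing. */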
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* COMPILER_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/config.h b/icd/intel/compiler/mesa-utils/src/mesa/main/config.h
new file mode 100644
index 0000000..a4b0afc
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/config.h
@@ -0,0 +1,315 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
+ * Copyright (C) 2008  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file config.h
+ * Tunable configuration parameters.
+ */
+
+#ifndef MESA_CONFIG_H_INCLUDED
+#define MESA_CONFIG_H_INCLUDED
+
+
+/**
+ * \name OpenGL implementation limits
+ */
+/*@{*/
+
+/** Maximum modelview matrix stack depth */
+#define MAX_MODELVIEW_STACK_DEPTH 32
+
+/** Maximum projection matrix stack depth */
+#define MAX_PROJECTION_STACK_DEPTH 32
+
+/** Maximum texture matrix stack depth */
+#define MAX_TEXTURE_STACK_DEPTH 10
+
+/** Maximum attribute stack depth */
+#define MAX_ATTRIB_STACK_DEPTH 16
+
+/** Maximum client attribute stack depth */
+#define MAX_CLIENT_ATTRIB_STACK_DEPTH 16
+
+/** Maximum recursion depth of display list calls */
+#define MAX_LIST_NESTING 64
+
+/** Maximum number of lights */
+#define MAX_LIGHTS 8
+
+/**
+ * Maximum number of user-defined clipping planes supported by any driver in
+ * Mesa.  This is used to size arrays.
+ */
+#define MAX_CLIP_PLANES 8
+
+/** Maximum pixel map lookup table size */
+#define MAX_PIXEL_MAP_TABLE 256
+
+/** Maximum number of auxiliary color buffers */
+#define MAX_AUX_BUFFERS 1
+
+/** Maximum order (degree) of curves */
+#define MAX_EVAL_ORDER 30
+
+/** Maximum Name stack depth */
+#define MAX_NAME_STACK_DEPTH 64
+
+/** Minimum point size */
+#define MIN_POINT_SIZE 1.0
+/** Maximum point size */
+#define MAX_POINT_SIZE 60.0
+/** Point size granularity */
+#define POINT_SIZE_GRANULARITY 0.1
+
+/** Minimum line width */
+#define MIN_LINE_WIDTH 1.0
+/** Maximum line width */
+#define MAX_LINE_WIDTH 10.0
+/** Line width granularity */
+#define LINE_WIDTH_GRANULARITY 0.1
+
+/** Max memory to allow for a single texture image (in megabytes) */
+#define MAX_TEXTURE_MBYTES 1024
+
+/** Number of 1D/2D texture mipmap levels */
+#define MAX_TEXTURE_LEVELS 15
+
+/** Number of 3D texture mipmap levels */
+#define MAX_3D_TEXTURE_LEVELS 15
+
+/** Number of cube texture mipmap levels - GL_ARB_texture_cube_map */
+#define MAX_CUBE_TEXTURE_LEVELS 15
+
+/** Maximum rectangular texture size - GL_NV_texture_rectangle */
+#define MAX_TEXTURE_RECT_SIZE 16384
+
+/**
+ * Maximum number of layers in a 1D or 2D array texture - GL_MESA_texture_array
+ */
+#define MAX_ARRAY_TEXTURE_LAYERS 64
+
+/**
+ * Max number of texture coordinate units.  This mainly just applies to
+ * the fixed-function vertex code.  This will be difficult to raise above
+ * eight because of various vertex attribute bitvectors.
+ */
+#define MAX_TEXTURE_COORD_UNITS 8
+
+/**
+ * Max number of texture image units.  Also determines number of texture
+ * samplers in shaders.
+ */
+#define MAX_TEXTURE_IMAGE_UNITS 32
+
+/**
+ * Larger of MAX_TEXTURE_COORD_UNITS and MAX_TEXTURE_IMAGE_UNITS.
+ * This value is only used for dimensioning arrays.
+ * Either MAX_TEXTURE_COORD_UNITS or MAX_TEXTURE_IMAGE_UNITS (or the
+ * corresponding ctx->Const.MaxTextureCoord/ImageUnits fields) should be
+ * used almost everywhere else.
+ */
+#define MAX_TEXTURE_UNITS ((MAX_TEXTURE_COORD_UNITS > MAX_TEXTURE_IMAGE_UNITS) ? MAX_TEXTURE_COORD_UNITS : MAX_TEXTURE_IMAGE_UNITS)
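+/* Illustrative: with the defaults above (8 coord units, 32 image units)
+ * MAX_TEXTURE_UNITS evaluates to 32. */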
+
+
+/** Maximum viewport size */
+#define MAX_VIEWPORT_WIDTH 16384
+#define MAX_VIEWPORT_HEIGHT 16384
+
+/** Maximum number of viewports supported with ARB_viewport_array */
+#define MAX_VIEWPORTS 16
+
+/** Maximum size for CVA (compiled vertex arrays).  May be overridden by the drivers.  */
+#define MAX_ARRAY_LOCK_SIZE 3000
+
+/** Subpixel precision for antialiasing, window coordinate snapping */
+#define SUB_PIXEL_BITS 4
+
+/** For GL_ARB_texture_compression */
+#define MAX_COMPRESSED_TEXTURE_FORMATS 25
+
+/** For GL_EXT_texture_filter_anisotropic */
+#define MAX_TEXTURE_MAX_ANISOTROPY 16.0
+
+/** For GL_EXT_texture_lod_bias (typically MAX_TEXTURE_LEVELS - 1) */
+#define MAX_TEXTURE_LOD_BIAS 14.0
+
+/** For any program target/extension */
+/*@{*/
+#define MAX_PROGRAM_INSTRUCTIONS       (16 * 1024)
+
+/**
+ * Per-program constants (power of two)
+ *
+ * \c MAX_PROGRAM_LOCAL_PARAMS and \c MAX_UNIFORMS are just the assembly shader
+ * and GLSL shader names for the same thing.  They should \b always have the
+ * same value.  Each refers to the number of vec4 values supplied as
+ * per-program parameters.
+ */
+/*@{*/
+#define MAX_PROGRAM_LOCAL_PARAMS       4096
+#define MAX_UNIFORMS                   4096
+#define MAX_UNIFORM_BUFFERS            15 /* + 1 default uniform buffer */
+/* 6 is for vertex, hull, domain, geometry, fragment, and compute shader. */
+#define MAX_COMBINED_UNIFORM_BUFFERS   (MAX_UNIFORM_BUFFERS * 6)
+#define MAX_ATOMIC_COUNTERS            4096
+/* 6 is for vertex, hull, domain, geometry, fragment, and compute shader. */
+#define MAX_COMBINED_ATOMIC_BUFFERS    (MAX_UNIFORM_BUFFERS * 6)
+/* Size of an atomic counter in bytes according to ARB_shader_atomic_counters */
+#define ATOMIC_COUNTER_SIZE            4
+#define MAX_IMAGE_UNIFORMS             16
+/* 6 is for vertex, hull, domain, geometry, fragment, and compute shader. */
+#define MAX_IMAGE_UNITS                (MAX_IMAGE_UNIFORMS * 6)
+/*@}*/
+
+/**
+ * Per-context constants (power of two)
+ *
+ * \note
+ * This value should always be less than or equal to \c MAX_PROGRAM_LOCAL_PARAMS
+ * and \c MAX_VERTEX_PROGRAM_PARAMS.  Otherwise some applications will make
+ * incorrect assumptions.
+ */
+#define MAX_PROGRAM_ENV_PARAMS         256
+
+#define MAX_PROGRAM_MATRICES           8
+#define MAX_PROGRAM_MATRIX_STACK_DEPTH 4
+#define MAX_PROGRAM_CALL_DEPTH         8
+#define MAX_PROGRAM_TEMPS              256
+#define MAX_PROGRAM_ADDRESS_REGS       1
+#define MAX_VARYING                    32    /**< number of float[4] vectors */
+#define MAX_SAMPLERS                   MAX_TEXTURE_IMAGE_UNITS
+#define MAX_PROGRAM_INPUTS             32
+#define MAX_PROGRAM_OUTPUTS            64
+/*@}*/
+
+/** For GL_ARB_vertex_program */
+/*@{*/
+#define MAX_VERTEX_PROGRAM_ADDRESS_REGS 1
+#define MAX_VERTEX_PROGRAM_PARAMS       MAX_UNIFORMS
+/*@}*/
+
+/** For GL_ARB_fragment_program */
+/*@{*/
+#define MAX_FRAGMENT_PROGRAM_ADDRESS_REGS 0
+/*@}*/
+
+/** For GL_NV_fragment_program */
+/*@{*/
+#define MAX_NV_FRAGMENT_PROGRAM_INSTRUCTIONS 1024 /* 72 for GL_ARB_f_p */
+#define MAX_NV_FRAGMENT_PROGRAM_TEMPS         96
+#define MAX_NV_FRAGMENT_PROGRAM_PARAMS        64
+#define MAX_NV_FRAGMENT_PROGRAM_INPUTS        12
+#define MAX_NV_FRAGMENT_PROGRAM_OUTPUTS        3
+#define MAX_NV_FRAGMENT_PROGRAM_WRITE_ONLYS    2
+/*@}*/
+
+
+/** For GL_ARB_vertex_shader */
+/*@{*/
+#define MAX_VERTEX_GENERIC_ATTRIBS 16
+/* 6 is for vertex, hull, domain, geometry, fragment, and compute shader. */
+#define MAX_COMBINED_TEXTURE_IMAGE_UNITS (MAX_TEXTURE_IMAGE_UNITS * 6)
+/*@}*/
+
+
+/** For GL_ARB_draw_buffers */
+/*@{*/
+#define MAX_DRAW_BUFFERS 8
+/*@}*/
+
+
+/** For GL_EXT_framebuffer_object */
+/*@{*/
+#define MAX_COLOR_ATTACHMENTS 8
+#define MAX_RENDERBUFFER_SIZE 16384
+/*@}*/
+
+/** For GL_ATI_envmap_bump - support bump mapping on first 8 units */
+#define SUPPORTED_ATI_BUMP_UNITS 0xff
+
+/** For GL_EXT_transform_feedback */
+#define MAX_FEEDBACK_BUFFERS 4
+#define MAX_FEEDBACK_ATTRIBS 32
+
+/** For GL_ARB_geometry_shader4 */
+/*@{*/
+#define MAX_GEOMETRY_UNIFORM_COMPONENTS              512
+#define MAX_GEOMETRY_OUTPUT_VERTICES                 256
+#define MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS         1024
+/*@}*/
+
+/** For GL_ARB_debug_output and GL_KHR_debug */
+/*@{*/
+#define MAX_DEBUG_LOGGED_MESSAGES   10
+#define MAX_DEBUG_MESSAGE_LENGTH    4096
+/*@}*/
+
+/** For GL_KHR_debug */
+/*@{*/
+#define MAX_LABEL_LENGTH 256
+#define MAX_DEBUG_GROUP_STACK_DEPTH 64
+/*@}*/
+
+/** For GL_ARB_gpu_shader5 */
+/*@{*/
+#define MAX_GEOMETRY_SHADER_INVOCATIONS     32
+#define MIN_FRAGMENT_INTERPOLATION_OFFSET   -0.5
+#define MAX_FRAGMENT_INTERPOLATION_OFFSET   0.5
+#define FRAGMENT_INTERPOLATION_OFFSET_BITS  4
+#define MAX_VERTEX_STREAMS                  4
+/*@}*/
+
+/** For GL_INTEL_performance_query */
+/*@{*/
+#define MAX_PERFQUERY_QUERY_NAME_LENGTH     256
+#define MAX_PERFQUERY_COUNTER_NAME_LENGTH   256
+#define MAX_PERFQUERY_COUNTER_DESC_LENGTH   1024
+#define PERFQUERY_HAVE_GPA_EXTENDED_COUNTERS 0
+/*@}*/
+
+/*
+ * Color channel component order
+ * 
+ * \note Changes will almost certainly cause problems at this time.
+ */
+#define RCOMP 0
+#define GCOMP 1
+#define BCOMP 2
+#define ACOMP 3
+
+
+/**
+ * Maximum number of temporary vertices required for clipping.  
+ *
+ * Used in array_cache and tnl modules.
+ */
+#define MAX_CLIPPED_VERTICES ((2 * (6 + MAX_CLIP_PLANES))+1)
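+/* With the default MAX_CLIP_PLANES of 8 this evaluates to
+ * 2 * (6 + 8) + 1 == 29 temporary vertices. */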
+
+/**
+ * Maximum number of built-in state slots backing a uniform variable.
+ */
+#define MAX_NUM_STATE_SLOTS 96
+
+#endif /* MESA_CONFIG_H_INCLUDED */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/context.c b/icd/intel/compiler/mesa-utils/src/mesa/main/context.c
new file mode 100644
index 0000000..190ed86
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/context.c
@@ -0,0 +1,1988 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
+ * Copyright (C) 2008  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file context.c
+ * Mesa context/visual/framebuffer management functions.
+ * \author Brian Paul
+ */
+
+/**
+ * \mainpage Mesa Main Module
+ *
+ * \section MainIntroduction Introduction
+ *
+ * The Mesa Main module consists of all the files in the main/ directory.
+ * Among the features of this module are:
+ * <UL>
+ * <LI> Structures to represent most GL state </LI>
+ * <LI> State set/get functions </LI>
+ * <LI> Display lists </LI>
+ * <LI> Texture unit, object and image handling </LI>
+ * <LI> Matrix and attribute stacks </LI>
+ * </UL>
+ *
+ * Other modules are responsible for API dispatch, vertex transformation,
+ * point/line/triangle setup, rasterization, vertex array caching,
+ * vertex/fragment programs/shaders, etc.
+ *
+ *
+ * \section AboutDoxygen About Doxygen
+ *
+ * If you're viewing this information as Doxygen-generated HTML you'll
+ * see the documentation index at the top of this page.
+ *
+ * The first line lists the Mesa source code modules.
+ * The second line lists the indexes available for viewing the documentation
+ * for each module.
+ *
+ * Selecting the <b>Main page</b> link will display a summary of the module
+ * (this page).
+ *
+ * Selecting <b>Data Structures</b> will list all C structures.
+ *
+ * Selecting the <b>File List</b> link will list all the source files in
+ * the module.
+ * Selecting a filename will show a list of all functions defined in that file.
+ *
+ * Selecting the <b>Data Fields</b> link will display a list of all
+ * documented structure members.
+ *
+ * Selecting the <b>Globals</b> link will display a list
+ * of all functions, structures, global variables and macros in the module.
+ *
+ */
+
+
+#include "glheader.h"
+#include "imports.h"
+#include "accum.h"
+#include "api_exec.h"
+#include "api_loopback.h"
+#include "arrayobj.h"
+#include "attrib.h"
+#include "blend.h"
+#include "buffers.h"
+#include "bufferobj.h"
+#include "context.h"
+#include "cpuinfo.h"
+#include "debug.h"
+#include "depth.h"
+#include "dlist.h"
+#include "eval.h"
+#include "extensions.h"
+#include "fbobject.h"
+#include "feedback.h"
+#include "fog.h"
+#include "formats.h"
+#include "framebuffer.h"
+#include "hint.h"
+#include "hash.h"
+#include "light.h"
+#include "lines.h"
+#include "macros.h"
+#include "matrix.h"
+#include "multisample.h"
+#include "performance_monitor.h"
+#include "pipelineobj.h"
+#include "pixel.h"
+#include "pixelstore.h"
+#include "points.h"
+#include "polygon.h"
+#include "queryobj.h"
+#include "shaderapi.h"
+#include "syncobj.h"
+#include "rastpos.h"
+#include "remap.h"
+#include "scissor.h"
+#include "shared.h"
+#include "shaderobj.h"
+#include "simple_list.h"
+#include "state.h"
+#include "stencil.h"
+#include "texcompress_s3tc.h"
+#include "texstate.h"
+#include "transformfeedback.h"
+#include "mtypes.h"
+#include "varray.h"
+#include "version.h"
+#include "viewport.h"
+#include "vtxfmt.h"
+#include "program/program.h"
+#include "program/prog_print.h"
+#include "program/prog_diskcache.h"
+#include "math/m_matrix.h"
+#include "main/dispatch.h" /* for _gloffset_COUNT */
+
+#ifdef USE_SPARC_ASM
+#include "sparc/sparc.h"
+#endif
+
+#include "glsl_parser_extras.h"
+#include "threadpool.h"
+#include <stdbool.h>
+
+
+#ifndef MESA_VERBOSE
+int MESA_VERBOSE = 0;
+#endif
+
+#ifndef MESA_DEBUG_FLAGS
+int MESA_DEBUG_FLAGS = 0;
+#endif
+
+
+/* ubyte -> float conversion */
+GLfloat _mesa_ubyte_to_float_color_tab[256];
+
+
+
+/**
+ * Swap buffers notification callback.
+ * 
+ * \param ctx GL context.
+ *
+ * Called by window system just before swapping buffers.
+ * We have to finish any pending rendering.
+ */
+void
+_mesa_notifySwapBuffers(struct gl_context *ctx)
+{
+   if (MESA_VERBOSE & VERBOSE_SWAPBUFFERS)
+      _mesa_debug(ctx, "SwapBuffers\n");
+   FLUSH_CURRENT( ctx, 0 );
+   if (ctx->Driver.Flush) {
+      ctx->Driver.Flush(ctx);
+   }
+}
+
+
+/**********************************************************************/
+/** \name GL Visual allocation/destruction                            */
+/**********************************************************************/
+/*@{*/
+
+/**
+ * Allocates a struct gl_config structure and initializes it via
+ * _mesa_initialize_visual().
+ * 
+ * \param dbFlag double buffering
+ * \param stereoFlag stereo buffer
+ * \param depthBits requested bits per depth buffer value. Any value in [0, 32]
+ * is acceptable but the actual depth type will be GLushort or GLuint as
+ * needed.
+ * \param stencilBits requested minimum bits per stencil buffer value
+ * \param accumRedBits, accumGreenBits, accumBlueBits, accumAlphaBits number
+ * of bits per color component in accum buffer.
+ * \param indexBits number of bits per pixel if \p rgbFlag is GL_FALSE
+ * \param redBits number of bits per color component in frame buffer for RGB(A)
+ * mode.  We always use 8 in core Mesa though.
+ * \param greenBits same as above.
+ * \param blueBits same as above.
+ * \param alphaBits same as above.
+ * \param numSamples not really used.
+ * 
+ * \return pointer to new struct gl_config or NULL if requested parameters
+ * can't be met.
+ *
+ * \note Need to add params for level and numAuxBuffers (at least)
+ */
+struct gl_config *
+_mesa_create_visual( GLboolean dbFlag,
+                     GLboolean stereoFlag,
+                     GLint redBits,
+                     GLint greenBits,
+                     GLint blueBits,
+                     GLint alphaBits,
+                     GLint depthBits,
+                     GLint stencilBits,
+                     GLint accumRedBits,
+                     GLint accumGreenBits,
+                     GLint accumBlueBits,
+                     GLint accumAlphaBits,
+                     GLint numSamples )
+{
+   struct gl_config *vis = CALLOC_STRUCT(gl_config);
+   if (vis) {
+      if (!_mesa_initialize_visual(vis, dbFlag, stereoFlag,
+                                   redBits, greenBits, blueBits, alphaBits,
+                                   depthBits, stencilBits,
+                                   accumRedBits, accumGreenBits,
+                                   accumBlueBits, accumAlphaBits,
+                                   numSamples)) {
+         free(vis);
+         return NULL;
+      }
+   }
+   return vis;
+}
+
+
+/**
+ * Performs some sanity checks and fills in the fields of the struct
+ * gl_config object with the given parameters.  If the caller needs to
+ * set additional fields, it should probably just initialize the whole
+ * gl_config object itself.
+ *
+ * \return GL_TRUE on success, or GL_FALSE on failure.
+ *
+ * \sa _mesa_create_visual() above for the parameter description.
+ */
+GLboolean
+_mesa_initialize_visual( struct gl_config *vis,
+                         GLboolean dbFlag,
+                         GLboolean stereoFlag,
+                         GLint redBits,
+                         GLint greenBits,
+                         GLint blueBits,
+                         GLint alphaBits,
+                         GLint depthBits,
+                         GLint stencilBits,
+                         GLint accumRedBits,
+                         GLint accumGreenBits,
+                         GLint accumBlueBits,
+                         GLint accumAlphaBits,
+                         GLint numSamples )
+{
+   assert(vis);
+
+   if (depthBits < 0 || depthBits > 32) {
+      return GL_FALSE;
+   }
+   if (stencilBits < 0 || stencilBits > 8) {
+      return GL_FALSE;
+   }
+   assert(accumRedBits >= 0);
+   assert(accumGreenBits >= 0);
+   assert(accumBlueBits >= 0);
+   assert(accumAlphaBits >= 0);
+
+   vis->rgbMode          = GL_TRUE;
+   vis->doubleBufferMode = dbFlag;
+   vis->stereoMode       = stereoFlag;
+
+   vis->redBits          = redBits;
+   vis->greenBits        = greenBits;
+   vis->blueBits         = blueBits;
+   vis->alphaBits        = alphaBits;
+   vis->rgbBits          = redBits + greenBits + blueBits;
+
+   vis->indexBits      = 0;
+   vis->depthBits      = depthBits;
+   vis->stencilBits    = stencilBits;
+
+   vis->accumRedBits   = accumRedBits;
+   vis->accumGreenBits = accumGreenBits;
+   vis->accumBlueBits  = accumBlueBits;
+   vis->accumAlphaBits = accumAlphaBits;
+
+   vis->haveAccumBuffer   = accumRedBits > 0;
+   vis->haveDepthBuffer   = depthBits > 0;
+   vis->haveStencilBuffer = stencilBits > 0;
+
+   vis->numAuxBuffers = 0;
+   vis->level = 0;
+   vis->sampleBuffers = numSamples > 0 ? 1 : 0;
+   vis->samples = numSamples;
+
+   return GL_TRUE;
+}
+
+
+/**
+ * Destroy a visual and free its memory.
+ *
+ * \param vis visual.
+ * 
+ * Frees the visual structure.
+ */
+void
+_mesa_destroy_visual( struct gl_config *vis )
+{
+   free(vis);
+}
+
+/*@}*/
+
+
+/**********************************************************************/
+/** \name Context allocation, initialization, destroying
+ *
+ * The purpose of most of the initialization functions here is to provide
+ * the default state values required by the OpenGL specification.
+ */
+/**********************************************************************/
+/*@{*/
+
+
+/**
+ * This is lame.  gdb only seems to recognize enum types that are
+ * actually used somewhere.  We want to be able to print/use enum
+ * values such as TEXTURE_2D_INDEX in gdb.  But we don't actually use
+ * the gl_texture_index type anywhere.  Thus, this lame function.
+ */
+static void
+dummy_enum_func(void)
+{
+   gl_buffer_index bi = BUFFER_FRONT_LEFT;
+   gl_face_index fi = FACE_POS_X;
+   gl_frag_result fr = FRAG_RESULT_DEPTH;
+   gl_texture_index ti = TEXTURE_2D_ARRAY_INDEX;
+   gl_vert_attrib va = VERT_ATTRIB_POS;
+   gl_varying_slot vs = VARYING_SLOT_POS;
+
+   (void) bi;
+   (void) fi;
+   (void) fr;
+   (void) ti;
+   (void) va;
+   (void) vs;
+}
+
+
+/**
+ * One-time initialization mutex lock.
+ *
+ * \sa Used by one_time_init().
+ */
+mtx_t OneTimeLock = _MTX_INITIALIZER_NP;
+
+
+
+/**
+ * Calls all the various one-time-init functions in Mesa.
+ *
+ * While holding a global mutex lock, calls several initialization functions,
+ * and sets the glapi callbacks if the \c MESA_DEBUG environment variable is
+ * defined.
+ *
+ * \sa _math_init().
+ */
+static void
+one_time_init( struct gl_context *ctx )
+{
+   static GLbitfield api_init_mask = 0x0;
+
+   mtx_lock(&OneTimeLock);
+
+   /* truly one-time init */
+   if (!api_init_mask) {
+      GLuint i;
+
+      /* do some implementation tests */
+      assert( sizeof(GLbyte) == 1 );
+      assert( sizeof(GLubyte) == 1 );
+      assert( sizeof(GLshort) == 2 );
+      assert( sizeof(GLushort) == 2 );
+      assert( sizeof(GLint) == 4 );
+      assert( sizeof(GLuint) == 4 );
+
+      _mesa_get_cpu_features();
+
+      for (i = 0; i < 256; i++) {
+         _mesa_ubyte_to_float_color_tab[i] = (float) i / 255.0F;
+      }
+
+#if defined(DEBUG) && defined(__DATE__) && defined(__TIME__)
+      if (MESA_VERBOSE != 0) {
+	 _mesa_debug(ctx, "Mesa %s DEBUG build %s %s\n",
+		     PACKAGE_VERSION, __DATE__, __TIME__);
+      }
+#endif
+
+#ifdef DEBUG
+      _mesa_test_formats();
+#endif
+
+      _mesa_create_shader_compiler();
+   }
+
+   /* per-API one-time init */
+   if (!(api_init_mask & (1 << ctx->API))) {
+      _mesa_init_get_hash(ctx);
+
+      _mesa_init_remap_table();
+   }
+
+   api_init_mask |= 1 << ctx->API;
+
+   mtx_unlock(&OneTimeLock);
+
+   /* Hopefully atexit() is widely available.  If not, we may need some
+    * #ifdef tests here.
+    */
+   atexit(_mesa_destroy_shader_compiler);
+
+   dummy_enum_func();
+}
+
+
+/**
+ * Initialize fields of gl_current_attrib (aka ctx->Current.*)
+ */
+static void
+_mesa_init_current(struct gl_context *ctx)
+{
+   GLuint i;
+
+   /* Init all to (0,0,0,1) */
+   for (i = 0; i < Elements(ctx->Current.Attrib); i++) {
+      ASSIGN_4V( ctx->Current.Attrib[i], 0.0, 0.0, 0.0, 1.0 );
+   }
+
+   /* redo special cases: */
+   ASSIGN_4V( ctx->Current.Attrib[VERT_ATTRIB_WEIGHT], 1.0, 0.0, 0.0, 0.0 );
+   ASSIGN_4V( ctx->Current.Attrib[VERT_ATTRIB_NORMAL], 0.0, 0.0, 1.0, 1.0 );
+   ASSIGN_4V( ctx->Current.Attrib[VERT_ATTRIB_COLOR0], 1.0, 1.0, 1.0, 1.0 );
+   ASSIGN_4V( ctx->Current.Attrib[VERT_ATTRIB_COLOR1], 0.0, 0.0, 0.0, 1.0 );
+   ASSIGN_4V( ctx->Current.Attrib[VERT_ATTRIB_COLOR_INDEX], 1.0, 0.0, 0.0, 1.0 );
+   ASSIGN_4V( ctx->Current.Attrib[VERT_ATTRIB_EDGEFLAG], 1.0, 0.0, 0.0, 1.0 );
+}
+
+
+/**
+ * Init vertex/fragment/geometry program limits.
+ * Important: drivers should override these with actual limits.
+ */
+static void
+init_program_limits(struct gl_context *ctx, gl_shader_stage stage,
+                    struct gl_program_constants *prog)
+{
+   prog->MaxInstructions = MAX_PROGRAM_INSTRUCTIONS;
+   prog->MaxAluInstructions = MAX_PROGRAM_INSTRUCTIONS;
+   prog->MaxTexInstructions = MAX_PROGRAM_INSTRUCTIONS;
+   prog->MaxTexIndirections = MAX_PROGRAM_INSTRUCTIONS;
+   prog->MaxTemps = MAX_PROGRAM_TEMPS;
+   prog->MaxEnvParams = MAX_PROGRAM_ENV_PARAMS;
+   prog->MaxLocalParams = MAX_PROGRAM_LOCAL_PARAMS;
+   prog->MaxAddressOffset = MAX_PROGRAM_LOCAL_PARAMS;
+
+   switch (stage) {
+   case MESA_SHADER_VERTEX:
+      prog->MaxParameters = MAX_VERTEX_PROGRAM_PARAMS;
+      prog->MaxAttribs = MAX_VERTEX_GENERIC_ATTRIBS;
+      prog->MaxAddressRegs = MAX_VERTEX_PROGRAM_ADDRESS_REGS;
+      prog->MaxUniformComponents = 4 * MAX_UNIFORMS;
+      prog->MaxInputComponents = 0; /* value not used */
+      prog->MaxOutputComponents = 16 * 4; /* old limit not to break tnl and swrast */
+      break;
+   case MESA_SHADER_FRAGMENT:
+      prog->MaxParameters = MAX_NV_FRAGMENT_PROGRAM_PARAMS;
+      prog->MaxAttribs = MAX_NV_FRAGMENT_PROGRAM_INPUTS;
+      prog->MaxAddressRegs = MAX_FRAGMENT_PROGRAM_ADDRESS_REGS;
+      prog->MaxUniformComponents = 4 * MAX_UNIFORMS;
+      prog->MaxInputComponents = 16 * 4; /* old limit not to break tnl and swrast */
+      prog->MaxOutputComponents = 0; /* value not used */
+      break;
+   case MESA_SHADER_GEOMETRY:
+      prog->MaxParameters = MAX_VERTEX_PROGRAM_PARAMS;
+      prog->MaxAttribs = MAX_VERTEX_GENERIC_ATTRIBS;
+      prog->MaxAddressRegs = MAX_VERTEX_PROGRAM_ADDRESS_REGS;
+      prog->MaxUniformComponents = 4 * MAX_UNIFORMS;
+      prog->MaxInputComponents = 16 * 4; /* old limit not to break tnl and swrast */
+      prog->MaxOutputComponents = 16 * 4; /* old limit not to break tnl and swrast */
+      break;
+   case MESA_SHADER_COMPUTE:
+      prog->MaxParameters = 0; /* not meaningful for compute shaders */
+      prog->MaxAttribs = 0; /* not meaningful for compute shaders */
+      prog->MaxAddressRegs = 0; /* not meaningful for compute shaders */
+      prog->MaxUniformComponents = 4 * MAX_UNIFORMS;
+      prog->MaxInputComponents = 0; /* not meaningful for compute shaders */
+      prog->MaxOutputComponents = 0; /* not meaningful for compute shaders */
+      break;
+   default:
+      assert(0 && "Bad shader stage in init_program_limits()");
+   }
+
+   /* Set the native limits to zero.  This implies that there is no native
+    * support for shaders.  Let the drivers fill in the actual values.
+    */
+   prog->MaxNativeInstructions = 0;
+   prog->MaxNativeAluInstructions = 0;
+   prog->MaxNativeTexInstructions = 0;
+   prog->MaxNativeTexIndirections = 0;
+   prog->MaxNativeAttribs = 0;
+   prog->MaxNativeTemps = 0;
+   prog->MaxNativeAddressRegs = 0;
+   prog->MaxNativeParameters = 0;
+
+   /* Set GLSL datatype range/precision info assuming IEEE float values.
+    * Drivers should override these defaults as needed.
+    */
+   prog->MediumFloat.RangeMin = 127;
+   prog->MediumFloat.RangeMax = 127;
+   prog->MediumFloat.Precision = 23;
+   prog->LowFloat = prog->HighFloat = prog->MediumFloat;
+
+   /* Assume ints are stored as floats for now, since this is the least-common
+    * denominator.  The OpenGL ES spec implies (page 132) that the precision
+    * of integer types should be 0.  Practically speaking, IEEE
+    * single-precision floating point values can only store integers in the
+    * range [-0x01000000, 0x01000000] without loss of precision.
+    */
+   prog->MediumInt.RangeMin = 24;
+   prog->MediumInt.RangeMax = 24;
+   prog->MediumInt.Precision = 0;
+   prog->LowInt = prog->HighInt = prog->MediumInt;
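+   /* Illustrative: 0x01000000 == 2^24; a float's significand holds 24
+    * effective bits (23 stored plus the implicit leading 1), so every
+    * integer of magnitude <= 2^24 is represented exactly. */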
+
+   prog->MaxUniformBlocks = 12;
+   prog->MaxCombinedUniformComponents = (prog->MaxUniformComponents +
+                                         ctx->Const.MaxUniformBlockSize / 4 *
+                                         prog->MaxUniformBlocks);
+
+   prog->MaxAtomicBuffers = 0;
+   prog->MaxAtomicCounters = 0;
+}
+
+
+/**
+ * Initialize fields of gl_constants (aka ctx->Const.*).
+ * Use defaults from config.h.  The device drivers will often override
+ * some of these values (such as number of texture units).
+ */
+static void 
+_mesa_init_constants(struct gl_context *ctx)
+{
+   int i;
+   assert(ctx);
+
+   /* Constants; may be overridden (usually only reduced) by device drivers */
+   ctx->Const.MaxTextureMbytes = MAX_TEXTURE_MBYTES;
+   ctx->Const.MaxTextureLevels = MAX_TEXTURE_LEVELS;
+   ctx->Const.Max3DTextureLevels = MAX_3D_TEXTURE_LEVELS;
+   ctx->Const.MaxCubeTextureLevels = MAX_CUBE_TEXTURE_LEVELS;
+   ctx->Const.MaxTextureRectSize = MAX_TEXTURE_RECT_SIZE;
+   ctx->Const.MaxArrayTextureLayers = MAX_ARRAY_TEXTURE_LAYERS;
+   ctx->Const.MaxTextureCoordUnits = MAX_TEXTURE_COORD_UNITS;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits = MAX_TEXTURE_IMAGE_UNITS;
+   ctx->Const.MaxTextureUnits = MIN2(ctx->Const.MaxTextureCoordUnits,
+                                     ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits);
+   ctx->Const.MaxTextureMaxAnisotropy = MAX_TEXTURE_MAX_ANISOTROPY;
+   ctx->Const.MaxTextureLodBias = MAX_TEXTURE_LOD_BIAS;
+   ctx->Const.MaxTextureBufferSize = 65536;
+   ctx->Const.TextureBufferOffsetAlignment = 1;
+   ctx->Const.MaxArrayLockSize = MAX_ARRAY_LOCK_SIZE;
+   ctx->Const.SubPixelBits = SUB_PIXEL_BITS;
+   ctx->Const.MinPointSize = MIN_POINT_SIZE;
+   ctx->Const.MaxPointSize = MAX_POINT_SIZE;
+   ctx->Const.MinPointSizeAA = MIN_POINT_SIZE;
+   ctx->Const.MaxPointSizeAA = MAX_POINT_SIZE;
+   ctx->Const.PointSizeGranularity = (GLfloat) POINT_SIZE_GRANULARITY;
+   ctx->Const.MinLineWidth = MIN_LINE_WIDTH;
+   ctx->Const.MaxLineWidth = MAX_LINE_WIDTH;
+   ctx->Const.MinLineWidthAA = MIN_LINE_WIDTH;
+   ctx->Const.MaxLineWidthAA = MAX_LINE_WIDTH;
+   ctx->Const.LineWidthGranularity = (GLfloat) LINE_WIDTH_GRANULARITY;
+   ctx->Const.MaxClipPlanes = 6;
+   ctx->Const.MaxLights = MAX_LIGHTS;
+   ctx->Const.MaxShininess = 128.0;
+   ctx->Const.MaxSpotExponent = 128.0;
+   ctx->Const.MaxViewportWidth = MAX_VIEWPORT_WIDTH;
+   ctx->Const.MaxViewportHeight = MAX_VIEWPORT_HEIGHT;
+   ctx->Const.MinMapBufferAlignment = 64;
+
+   /* Driver must override these values if ARB_viewport_array is supported. */
+   ctx->Const.MaxViewports = 1;
+   ctx->Const.ViewportSubpixelBits = 0;
+   ctx->Const.ViewportBounds.Min = 0;
+   ctx->Const.ViewportBounds.Max = 0;
+
+   /** GL_ARB_uniform_buffer_object */
+   ctx->Const.MaxCombinedUniformBlocks = 36;
+   ctx->Const.MaxUniformBufferBindings = 36;
+   ctx->Const.MaxUniformBlockSize = 16384;
+   ctx->Const.UniformBufferOffsetAlignment = 1;
+
+   for (i = 0; i < MESA_SHADER_STAGES; i++)
+      init_program_limits(ctx, i, &ctx->Const.Program[i]);
+
+   ctx->Const.MaxProgramMatrices = MAX_PROGRAM_MATRICES;
+   ctx->Const.MaxProgramMatrixStackDepth = MAX_PROGRAM_MATRIX_STACK_DEPTH;
+
+   /* CheckArrayBounds is overridden by drivers/x11 for the X server */
+   ctx->Const.CheckArrayBounds = GL_FALSE;
+
+   /* GL_ARB_draw_buffers */
+   ctx->Const.MaxDrawBuffers = MAX_DRAW_BUFFERS;
+
+   ctx->Const.MaxColorAttachments = MAX_COLOR_ATTACHMENTS;
+   ctx->Const.MaxRenderbufferSize = MAX_RENDERBUFFER_SIZE;
+
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxTextureImageUnits = MAX_TEXTURE_IMAGE_UNITS;
+   ctx->Const.MaxCombinedTextureImageUnits = MAX_COMBINED_TEXTURE_IMAGE_UNITS;
+   ctx->Const.MaxVarying = 16; /* old limit not to break tnl and swrast */
+   ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits = MAX_TEXTURE_IMAGE_UNITS;
+   ctx->Const.MaxGeometryOutputVertices = MAX_GEOMETRY_OUTPUT_VERTICES;
+   ctx->Const.MaxGeometryTotalOutputComponents = MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS;
+
+   /* Shading language version */
+   if (_mesa_is_desktop_gl(ctx)) {
+      ctx->Const.GLSLVersion = 120;
+      _mesa_override_glsl_version(ctx);
+   }
+   else if (ctx->API == API_OPENGLES2) {
+      ctx->Const.GLSLVersion = 100;
+   }
+   else if (ctx->API == API_OPENGLES) {
+      ctx->Const.GLSLVersion = 0; /* GLSL not supported */
+   }
+
+   /* GL_ARB_framebuffer_object */
+   ctx->Const.MaxSamples = 0;
+
+   /* GL_ARB_sync */
+   ctx->Const.MaxServerWaitTimeout = 0x1fff7fffffffULL;
+
+   /* GL_ATI_envmap_bumpmap */
+   ctx->Const.SupportedBumpUnits = SUPPORTED_ATI_BUMP_UNITS;
+
+   /* GL_EXT_provoking_vertex */
+   ctx->Const.QuadsFollowProvokingVertexConvention = GL_TRUE;
+
+   /* GL_EXT_transform_feedback */
+   ctx->Const.MaxTransformFeedbackBuffers = MAX_FEEDBACK_BUFFERS;
+   ctx->Const.MaxTransformFeedbackSeparateComponents = 4 * MAX_FEEDBACK_ATTRIBS;
+   ctx->Const.MaxTransformFeedbackInterleavedComponents = 4 * MAX_FEEDBACK_ATTRIBS;
+   ctx->Const.MaxVertexStreams = 1;
+
+   /* GL 3.2  */
+   ctx->Const.ProfileMask = ctx->API == API_OPENGL_CORE
+                          ? GL_CONTEXT_CORE_PROFILE_BIT
+                          : GL_CONTEXT_COMPATIBILITY_PROFILE_BIT;
+
+   /** GL_EXT_gpu_shader4 */
+   ctx->Const.MinProgramTexelOffset = -8;
+   ctx->Const.MaxProgramTexelOffset = 7;
+
+   /* GL_ARB_texture_gather */
+   ctx->Const.MinProgramTextureGatherOffset = -8;
+   ctx->Const.MaxProgramTextureGatherOffset = 7;
+
+   /* GL_ARB_robustness */
+   ctx->Const.ResetStrategy = GL_NO_RESET_NOTIFICATION_ARB;
+
+   /* PrimitiveRestart */
+   ctx->Const.PrimitiveRestartInSoftware = GL_FALSE;
+
+   /* ES 3.0 or ARB_ES3_compatibility */
+   ctx->Const.MaxElementIndex = 0xffffffffu;
+
+   /* GL_ARB_texture_multisample */
+   ctx->Const.MaxColorTextureSamples = 1;
+   ctx->Const.MaxDepthTextureSamples = 1;
+   ctx->Const.MaxIntegerSamples = 1;
+
+   /* GL_ARB_shader_atomic_counters */
+   ctx->Const.MaxAtomicBufferBindings = MAX_COMBINED_ATOMIC_BUFFERS;
+   ctx->Const.MaxAtomicBufferSize = MAX_ATOMIC_COUNTERS * ATOMIC_COUNTER_SIZE;
+   ctx->Const.MaxCombinedAtomicBuffers = MAX_COMBINED_ATOMIC_BUFFERS;
+   ctx->Const.MaxCombinedAtomicCounters = MAX_ATOMIC_COUNTERS;
+
+   /* GL_ARB_vertex_attrib_binding */
+   ctx->Const.MaxVertexAttribRelativeOffset = 2047;
+   ctx->Const.MaxVertexAttribBindings = MAX_VERTEX_GENERIC_ATTRIBS;
+
+   /* GL_ARB_compute_shader */
+   ctx->Const.MaxComputeWorkGroupCount[0] = 65535;
+   ctx->Const.MaxComputeWorkGroupCount[1] = 65535;
+   ctx->Const.MaxComputeWorkGroupCount[2] = 65535;
+   ctx->Const.MaxComputeWorkGroupSize[0] = 1024;
+   ctx->Const.MaxComputeWorkGroupSize[1] = 1024;
+   ctx->Const.MaxComputeWorkGroupSize[2] = 64;
+   ctx->Const.MaxComputeWorkGroupInvocations = 1024;
+
+   /** GL_ARB_gpu_shader5 */
+   ctx->Const.MinFragmentInterpolationOffset = MIN_FRAGMENT_INTERPOLATION_OFFSET;
+   ctx->Const.MaxFragmentInterpolationOffset = MAX_FRAGMENT_INTERPOLATION_OFFSET;
+
+   ctx->Const.GlassMode = 0;
+}
+
+
+/**
+ * Do some sanity checks on the limits/constants for the given context.
+ * Only called the first time a context is bound.
+ */
+static void
+check_context_limits(struct gl_context *ctx)
+{
+   /* check that we don't exceed the size of various bitfields */
+   assert(VARYING_SLOT_MAX <=
+	  (8 * sizeof(ctx->VertexProgram._Current->Base.OutputsWritten)));
+   assert(VARYING_SLOT_MAX <=
+	  (8 * sizeof(ctx->FragmentProgram._Current->Base.InputsRead)));
+
+   /* shader-related checks */
+   assert(ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxLocalParams <= MAX_PROGRAM_LOCAL_PARAMS);
+   assert(ctx->Const.Program[MESA_SHADER_VERTEX].MaxLocalParams <= MAX_PROGRAM_LOCAL_PARAMS);
+
+   /* Texture unit checks */
+   assert(ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits > 0);
+   assert(ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits <= MAX_TEXTURE_IMAGE_UNITS);
+   assert(ctx->Const.MaxTextureCoordUnits > 0);
+   assert(ctx->Const.MaxTextureCoordUnits <= MAX_TEXTURE_COORD_UNITS);
+   assert(ctx->Const.MaxTextureUnits > 0);
+   assert(ctx->Const.MaxTextureUnits <= MAX_TEXTURE_IMAGE_UNITS);
+   assert(ctx->Const.MaxTextureUnits <= MAX_TEXTURE_COORD_UNITS);
+   assert(ctx->Const.MaxTextureUnits == MIN2(ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits,
+                                             ctx->Const.MaxTextureCoordUnits));
+   assert(ctx->Const.MaxCombinedTextureImageUnits > 0);
+   assert(ctx->Const.MaxCombinedTextureImageUnits <= MAX_COMBINED_TEXTURE_IMAGE_UNITS);
+   assert(ctx->Const.MaxTextureCoordUnits <= MAX_COMBINED_TEXTURE_IMAGE_UNITS);
+   /* number of coord units cannot be greater than number of image units */
+   assert(ctx->Const.MaxTextureCoordUnits <= ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits);
+
+
+   /* Texture size checks */
+   assert(ctx->Const.MaxTextureLevels <= MAX_TEXTURE_LEVELS);
+   assert(ctx->Const.Max3DTextureLevels <= MAX_3D_TEXTURE_LEVELS);
+   assert(ctx->Const.MaxCubeTextureLevels <= MAX_CUBE_TEXTURE_LEVELS);
+   assert(ctx->Const.MaxTextureRectSize <= MAX_TEXTURE_RECT_SIZE);
+
+   /* Texture level checks */
+   assert(MAX_TEXTURE_LEVELS >= MAX_3D_TEXTURE_LEVELS);
+   assert(MAX_TEXTURE_LEVELS >= MAX_CUBE_TEXTURE_LEVELS);
+
+   /* Max texture size should be <= max viewport size (render to texture) */
+   assert((1U << (ctx->Const.MaxTextureLevels - 1))
+          <= ctx->Const.MaxViewportWidth);
+   assert((1U << (ctx->Const.MaxTextureLevels - 1))
+          <= ctx->Const.MaxViewportHeight);
+
+   assert(ctx->Const.MaxDrawBuffers <= MAX_DRAW_BUFFERS);
+
+   /* if this fails, add more enum values to gl_buffer_index */
+   assert(BUFFER_COLOR0 + MAX_DRAW_BUFFERS <= BUFFER_COUNT);
+
+   /* XXX probably add more tests */
+}
+
+
+/**
+ * Initialize the attribute groups in a GL context.
+ *
+ * \param ctx GL context.
+ *
+ * Initializes all the attributes, calling the respective <tt>init*</tt>
+ * functions for the more complex data structures.
+ */
+static GLboolean
+init_attrib_groups(struct gl_context *ctx)
+{
+   assert(ctx);
+
+   /* Constants */
+   _mesa_init_constants( ctx );
+
+   /* Extensions */
+   _mesa_init_extensions( ctx );
+
+   /* Attribute Groups */
+   _mesa_init_accum( ctx );
+   _mesa_init_attrib( ctx );
+   _mesa_init_buffer_objects( ctx );
+   _mesa_init_color( ctx );
+   _mesa_init_current( ctx );
+   _mesa_init_depth( ctx );
+   _mesa_init_debug( ctx );
+   _mesa_init_display_list( ctx );
+   _mesa_init_errors( ctx );
+   _mesa_init_eval( ctx );
+   _mesa_init_fbobjects( ctx );
+   _mesa_init_feedback( ctx );
+   _mesa_init_fog( ctx );
+   _mesa_init_hint( ctx );
+   _mesa_init_line( ctx );
+   _mesa_init_lighting( ctx );
+   _mesa_init_matrix( ctx );
+   _mesa_init_multisample( ctx );
+   _mesa_init_performance_monitors( ctx );
+   _mesa_init_pipeline( ctx );
+   _mesa_init_pixel( ctx );
+   _mesa_init_pixelstore( ctx );
+   _mesa_init_point( ctx );
+   _mesa_init_polygon( ctx );
+   _mesa_init_program( ctx );
+   _mesa_init_queryobj( ctx );
+   _mesa_init_sync( ctx );
+   _mesa_init_rastpos( ctx );
+   _mesa_init_scissor( ctx );
+   _mesa_init_shader_state( ctx );
+   _mesa_init_stencil( ctx );
+   _mesa_init_transform( ctx );
+   _mesa_init_transform_feedback( ctx );
+   _mesa_init_varray( ctx );
+   _mesa_init_viewport( ctx );
+
+   if (!_mesa_init_texture( ctx ))
+      return GL_FALSE;
+
+   _mesa_init_texture_s3tc( ctx );
+
+   /* Miscellaneous */
+   ctx->NewState = _NEW_ALL;
+   ctx->NewDriverState = ~0;
+   ctx->ErrorValue = GL_NO_ERROR;
+   ctx->ShareGroupReset = false;
+   ctx->varying_vp_inputs = VERT_BIT_ALL;
+
+   return GL_TRUE;
+}
+
+
+/**
+ * Update default objects in a GL context with respect to shared state.
+ *
+ * \param ctx GL context.
+ *
+ * Removes references to old default objects, (texture objects, program
+ * objects, etc.) and changes to reference those from the current shared
+ * state.
+ */
+static GLboolean
+update_default_objects(struct gl_context *ctx)
+{
+   assert(ctx);
+
+   _mesa_update_default_objects_program(ctx);
+   _mesa_update_default_objects_texture(ctx);
+   _mesa_update_default_objects_buffer_objects(ctx);
+
+   return GL_TRUE;
+}
+
+
+/**
+ * This is the default function we plug into all dispatch table slots.
+ * It helps prevent a segfault when someone calls a GL function without
+ * first checking whether the extension is supported.
+ */
+int
+_mesa_generic_nop(void)
+{
+   GET_CURRENT_CONTEXT(ctx);
+   _mesa_error(ctx, GL_INVALID_OPERATION,
+               "unsupported function called "
+               "(unsupported extension or deprecated function?)");
+   return 0;
+}
+
+
+/**
+ * Allocate and initialize a new dispatch table.
+ */
+struct _glapi_table *
+_mesa_alloc_dispatch_table()
+{
+   /* Find the larger of Mesa's dispatch table and libGL's dispatch table.
+    * In practice, this'll be the same for stand-alone Mesa.  But for DRI
+    * Mesa we do this to accommodate different versions of libGL and various
+    * DRI drivers.
+    */
+   GLint numEntries = MAX2(_glapi_get_dispatch_table_size(), _gloffset_COUNT);
+   struct _glapi_table *table;
+
+   table = malloc(numEntries * sizeof(_glapi_proc));
+   if (table) {
+      _glapi_proc *entry = (_glapi_proc *) table;
+      GLint i;
+      for (i = 0; i < numEntries; i++) {
+         entry[i] = (_glapi_proc) _mesa_generic_nop;
+      }
+   }
+   return table;
+}
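+
+/* Illustrative note (not part of the original source): every slot of a
+ * freshly allocated table points at _mesa_generic_nop, so a GL call made
+ * before the real entry points are installed records GL_INVALID_OPERATION
+ * rather than crashing.
+ */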
+
+/**
+ * Creates a minimal dispatch table for use within glBegin()/glEnd().
+ *
+ * This ensures that we generate GL_INVALID_OPERATION errors from most
+ * functions, since the set of functions that are valid within Begin/End is
+ * very small.
+ *
+ * From the GL 1.0 specification section 2.6.3, "GL Commands within
+ * Begin/End"
+ *
+ *     "The only GL commands that are allowed within any Begin/End pairs are
+ *      the commands for specifying vertex coordinates, vertex color, normal
+ *      coordinates, and texture coordinates (Vertex, Color, Index, Normal,
+ *      TexCoord), EvalCoord and EvalPoint commands (see section 5.1),
+ *      commands for specifying lighting material parameters (Material
+ *      commands see section 2.12.2), display list invocation commands
+ *      (CallList and CallLists see section 5.4), and the EdgeFlag
+ *      command. Executing Begin after Begin has already been executed but
+ *      before an End is issued generates the INVALID OPERATION error, as does
+ *      executing End without a previous corresponding Begin. Executing any
+ *      other GL command within Begin/End results in the error INVALID
+ *      OPERATION."
+ *
+ * The table entries for specifying vertex attributes are set up by
+ * install_vtxfmt() and _mesa_loopback_init_api_table(), and End() and dlists
+ * are set by install_vtxfmt() as well.
+ */
+static struct _glapi_table *
+create_beginend_table(const struct gl_context *ctx)
+{
+   struct _glapi_table *table;
+
+   table = _mesa_alloc_dispatch_table();
+   if (!table)
+      return NULL;
+
+   /* Fill in the functions which return a value, since they should return
+    * some specific value even when they emit a GL_INVALID_OPERATION error
+    * for being called within glBegin()/glEnd().
+    */
+#define COPY_DISPATCH(func) SET_##func(table, GET_##func(ctx->Exec))
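+/* For example, COPY_DISPATCH(GenLists) expands to
+ *    SET_GenLists(table, GET_GenLists(ctx->Exec));
+ * i.e. the Begin/End table reuses the regular Exec entry for that function.
+ */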
+
+   COPY_DISPATCH(GenLists);
+   COPY_DISPATCH(IsProgram);
+   COPY_DISPATCH(IsVertexArray);
+   COPY_DISPATCH(IsBuffer);
+   COPY_DISPATCH(IsEnabled);
+   COPY_DISPATCH(IsEnabledi);
+   COPY_DISPATCH(IsRenderbuffer);
+   COPY_DISPATCH(IsFramebuffer);
+   COPY_DISPATCH(CheckFramebufferStatus);
+   COPY_DISPATCH(RenderMode);
+   COPY_DISPATCH(GetString);
+   COPY_DISPATCH(GetStringi);
+   COPY_DISPATCH(GetPointerv);
+   COPY_DISPATCH(IsQuery);
+   COPY_DISPATCH(IsSampler);
+   COPY_DISPATCH(IsSync);
+   COPY_DISPATCH(IsTexture);
+   COPY_DISPATCH(IsTransformFeedback);
+   COPY_DISPATCH(DeleteQueries);
+   COPY_DISPATCH(AreTexturesResident);
+   COPY_DISPATCH(FenceSync);
+   COPY_DISPATCH(ClientWaitSync);
+   COPY_DISPATCH(MapBuffer);
+   COPY_DISPATCH(UnmapBuffer);
+   COPY_DISPATCH(MapBufferRange);
+   COPY_DISPATCH(ObjectPurgeableAPPLE);
+   COPY_DISPATCH(ObjectUnpurgeableAPPLE);
+
+   _mesa_loopback_init_api_table(ctx, table);
+
+   return table;
+}
+
+void
+_mesa_initialize_dispatch_tables(struct gl_context *ctx)
+{
+   /* Do the code-generated setup of the exec table in api_exec.c. */
+   _mesa_initialize_exec_table(ctx);
+
+   if (ctx->Save)
+      _mesa_initialize_save_table(ctx);
+}
+
+
+/**
+ * Initialize a struct gl_context struct (rendering context).
+ *
+ * This includes allocating all the other structs and arrays which hang off of
+ * the context by pointers.
+ * Note that the driver needs to pass in its dd_function_table here since
+ * we need to at least call driverFunctions->NewTextureObject to create the
+ * default texture objects.
+ * 
+ * Called by _mesa_create_context().
+ *
+ * Performs the imports and exports callback table initialization and
+ * miscellaneous one-time initializations.  If no shared context is supplied,
+ * one is allocated and its reference count is increased.  Sets up the GL API
+ * dispatch tables.  Initializes the TNL module.  Sets the maximum Z buffer
+ * depth.
+ * Finally queries the \c MESA_DEBUG and \c MESA_VERBOSE environment variables
+ * for debug flags.
+ *
+ * \param ctx the context to initialize
+ * \param api the GL API type to create the context for
+ * \param visual describes the visual attributes for this context or NULL to
+ *               create a configless context
+ * \param share_list points to context to share textures, display lists,
+ *        etc with, or NULL
+ * \param driverFunctions table of device driver functions for this context
+ *        to use
+ */
+GLboolean
+_mesa_initialize_context(struct gl_context *ctx,
+                         gl_api api,
+                         const struct gl_config *visual,
+                         struct gl_context *share_list,
+                         const struct dd_function_table *driverFunctions)
+{
+   struct gl_shared_state *shared;
+   int i;
+
+   assert(driverFunctions->NewTextureObject);
+   assert(driverFunctions->FreeTextureImageBuffer);
+
+   ctx->API = api;
+   ctx->DrawBuffer = NULL;
+   ctx->ReadBuffer = NULL;
+   ctx->WinSysDrawBuffer = NULL;
+   ctx->WinSysReadBuffer = NULL;
+
+   if (visual) {
+      ctx->Visual = *visual;
+      ctx->HasConfig = GL_TRUE;
+   }
+   else {
+      memset(&ctx->Visual, 0, sizeof ctx->Visual);
+      ctx->HasConfig = GL_FALSE;
+   }
+
+   if (_mesa_is_desktop_gl(ctx)) {
+      _mesa_override_gl_version(ctx);
+   }
+
+   /* misc one-time initializations */
+   one_time_init(ctx);
+
+   /* Plug in driver functions and context pointer here.
+    * This is important because when we call alloc_shared_state() below
+    * we'll call ctx->Driver.NewTextureObject() to create the default
+    * textures.
+    */
+   ctx->Driver = *driverFunctions;
+
+   if (share_list) {
+      /* share state with another context */
+      shared = share_list->Shared;
+   }
+   else {
+      /* allocate new, unshared state */
+      shared = _mesa_alloc_shared_state(ctx);
+      if (!shared)
+         return GL_FALSE;
+   }
+
+   _mesa_reference_shared_state(ctx, &ctx->Shared, shared);
+
+   if (!init_attrib_groups( ctx ))
+      goto fail;
+
+   /* setup the API dispatch tables with all nop functions */
+   ctx->OutsideBeginEnd = _mesa_alloc_dispatch_table();
+   if (!ctx->OutsideBeginEnd)
+      goto fail;
+   ctx->Exec = ctx->OutsideBeginEnd;
+   ctx->CurrentDispatch = ctx->OutsideBeginEnd;
+
+   ctx->FragmentProgram._MaintainTexEnvProgram
+      = (_mesa_getenv("MESA_TEX_PROG") != NULL);
+
+   ctx->VertexProgram._MaintainTnlProgram
+      = (_mesa_getenv("MESA_TNL_PROG") != NULL);
+   if (ctx->VertexProgram._MaintainTnlProgram) {
+      /* this is required... */
+      ctx->FragmentProgram._MaintainTexEnvProgram = GL_TRUE;
+   }
+
+   /* Mesa core handles all the formats that it knows about.
+    * Drivers will want to override this list with just the formats
+    * they can handle, and confirm that appropriate fallbacks exist in
+    * _mesa_choose_tex_format().
+    */
+   memset(&ctx->TextureFormatSupported, GL_TRUE,
+	  sizeof(ctx->TextureFormatSupported));
+
+   switch (ctx->API) {
+   case API_OPENGL_COMPAT:
+      ctx->BeginEnd = create_beginend_table(ctx);
+      ctx->Save = _mesa_alloc_dispatch_table();
+      if (!ctx->BeginEnd || !ctx->Save)
+         goto fail;
+
+      /* fall-through */
+   case API_OPENGL_CORE:
+      break;
+   case API_OPENGLES:
+      /**
+       * GL_OES_texture_cube_map says
+       * "Initially all texture generation modes are set to REFLECTION_MAP_OES"
+       */
+      for (i = 0; i < MAX_TEXTURE_UNITS; i++) {
+	 struct gl_texture_unit *texUnit = &ctx->Texture.Unit[i];
+	 texUnit->GenS.Mode = GL_REFLECTION_MAP_NV;
+	 texUnit->GenT.Mode = GL_REFLECTION_MAP_NV;
+	 texUnit->GenR.Mode = GL_REFLECTION_MAP_NV;
+	 texUnit->GenS._ModeBit = TEXGEN_REFLECTION_MAP_NV;
+	 texUnit->GenT._ModeBit = TEXGEN_REFLECTION_MAP_NV;
+	 texUnit->GenR._ModeBit = TEXGEN_REFLECTION_MAP_NV;
+      }
+      break;
+   case API_OPENGLES2:
+      ctx->FragmentProgram._MaintainTexEnvProgram = GL_TRUE;
+      ctx->VertexProgram._MaintainTnlProgram = GL_TRUE;
+      break;
+   }
+
+   ctx->FirstTimeCurrent = GL_TRUE;
+
+   ctx->GlslFlags = _mesa_get_shader_flags();
+
+   return GL_TRUE;
+
+fail:
+   _mesa_reference_shared_state(ctx, &ctx->Shared, NULL);
+   free(ctx->BeginEnd);
+   free(ctx->OutsideBeginEnd);
+   free(ctx->Save);
+   return GL_FALSE;
+}
+
+
+/**
+ * Allocate and initialize a struct gl_context structure.
+ * Note that the driver needs to pass in its dd_function_table here since
+ * we need to at least call driverFunctions->NewTextureObject to initialize
+ * the rendering context.
+ *
+ * \param api the GL API type to create the context for
+ * \param visual a struct gl_config pointer (we copy the struct contents) or
+ *               NULL to create a configless context
+ * \param share_list another context to share display lists with or NULL
+ * \param driverFunctions points to the dd_function_table into which the
+ *        driver has plugged in all its special functions.
+ * 
+ * \return pointer to a new __struct gl_contextRec or NULL on error.
+ */
+struct gl_context *
+_mesa_create_context(gl_api api,
+                     const struct gl_config *visual,
+                     struct gl_context *share_list,
+                     const struct dd_function_table *driverFunctions)
+{
+   struct gl_context *ctx;
+
+   ctx = calloc(1, sizeof(struct gl_context));
+   if (!ctx)
+      return NULL;
+
+   if (_mesa_initialize_context(ctx, api, visual, share_list,
+                                driverFunctions)) {
+      return ctx;
+   }
+   else {
+      free(ctx);
+      return NULL;
+   }
+}
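+
+/* A minimal usage sketch (illustrative only; assumes the driver has filled
+ * in "driverFuncs" and "vis"):
+ *
+ *    struct gl_context *c =
+ *       _mesa_create_context(API_OPENGL_COMPAT, &vis, NULL, &driverFuncs);
+ *    if (c) {
+ *       ...
+ *       _mesa_destroy_context(c);
+ *    }
+ */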
+
+void
+_mesa_enable_glsl_threadpool(struct gl_context *ctx, int max_threads)
+{
+   if (!ctx->ThreadPool)
+      ctx->ThreadPool = _mesa_glsl_get_threadpool(max_threads);
+}
+
+static void
+wait_shader_object_cb(GLuint id, void *data, void *userData)
+{
+   struct gl_context *ctx = (struct gl_context *) userData;
+   struct gl_shader *sh = (struct gl_shader *) data;
+
+   if (_mesa_validate_shader_target(ctx, sh->Type)) {
+      _mesa_wait_shaders(ctx, &sh, 1);
+   }
+   else {
+      struct gl_shader_program *shProg = (struct gl_shader_program *) data;
+      _mesa_wait_shader_program(ctx, shProg);
+   }
+}
+
+/**
+ * Free the data associated with the given context.
+ * 
+ * But doesn't free the struct gl_context struct itself.
+ *
+ * \sa _mesa_initialize_context() and init_attrib_groups().
+ */
+void
+_mesa_free_context_data( struct gl_context *ctx )
+{
+   if (!_mesa_get_current_context()) {
+      /* No current context, but we may need one in order to delete
+       * texture objs, etc.  So temporarily bind the context now.
+       */
+      _mesa_make_current(ctx, NULL, NULL);
+   }
+
+   if (ctx->ThreadPool) {
+      _mesa_HashWalk(ctx->Shared->ShaderObjects, wait_shader_object_cb, ctx);
+      _mesa_threadpool_unref(ctx->ThreadPool);
+      ctx->ThreadPool = NULL;
+   }
+
+   /* unreference WinSysDraw/Read buffers */
+   _mesa_reference_framebuffer(&ctx->WinSysDrawBuffer, NULL);
+   _mesa_reference_framebuffer(&ctx->WinSysReadBuffer, NULL);
+   _mesa_reference_framebuffer(&ctx->DrawBuffer, NULL);
+   _mesa_reference_framebuffer(&ctx->ReadBuffer, NULL);
+
+   _mesa_reference_vertprog(ctx, &ctx->VertexProgram.Current, NULL);
+   _mesa_reference_vertprog(ctx, &ctx->VertexProgram._Current, NULL);
+   _mesa_reference_vertprog(ctx, &ctx->VertexProgram._TnlProgram, NULL);
+
+   _mesa_reference_geomprog(ctx, &ctx->GeometryProgram.Current, NULL);
+   _mesa_reference_geomprog(ctx, &ctx->GeometryProgram._Current, NULL);
+
+   _mesa_reference_fragprog(ctx, &ctx->FragmentProgram.Current, NULL);
+   _mesa_reference_fragprog(ctx, &ctx->FragmentProgram._Current, NULL);
+   _mesa_reference_fragprog(ctx, &ctx->FragmentProgram._TexEnvProgram, NULL);
+
+   _mesa_reference_vao(ctx, &ctx->Array.VAO, NULL);
+   _mesa_reference_vao(ctx, &ctx->Array.DefaultVAO, NULL);
+
+   _mesa_free_attrib_data(ctx);
+   _mesa_free_buffer_objects(ctx);
+   _mesa_free_lighting_data( ctx );
+   _mesa_free_eval_data( ctx );
+   _mesa_free_texture_data( ctx );
+   _mesa_free_matrix_data( ctx );
+   _mesa_free_viewport_data( ctx );
+   _mesa_free_pipeline_data(ctx);
+   _mesa_free_program_data(ctx);
+   _mesa_free_shader_state(ctx);
+   _mesa_free_queryobj_data(ctx);
+   _mesa_free_sync_data(ctx);
+   _mesa_free_varray_data(ctx);
+   _mesa_free_transform_feedback(ctx);
+   _mesa_free_performance_monitors(ctx);
+
+   _mesa_reference_buffer_object(ctx, &ctx->Pack.BufferObj, NULL);
+   _mesa_reference_buffer_object(ctx, &ctx->Unpack.BufferObj, NULL);
+   _mesa_reference_buffer_object(ctx, &ctx->DefaultPacking.BufferObj, NULL);
+   _mesa_reference_buffer_object(ctx, &ctx->Array.ArrayBufferObj, NULL);
+
+   /* free dispatch tables */
+   free(ctx->BeginEnd);
+   free(ctx->OutsideBeginEnd);
+   free(ctx->Save);
+
+   /* Shared context state (display lists, textures, etc) */
+   _mesa_reference_shared_state(ctx, &ctx->Shared, NULL);
+
+   /* needs to be after freeing shared state */
+   _mesa_free_display_list_data(ctx);
+
+   _mesa_free_errors_data(ctx);
+
+   free((void *)ctx->Extensions.String);
+
+   free(ctx->VersionString);
+
+   /* unbind the context if it's currently bound */
+   if (ctx == _mesa_get_current_context()) {
+      _mesa_make_current(NULL, NULL, NULL);
+   }
+}
+
+
+/**
+ * Destroy a struct gl_context structure.
+ *
+ * \param ctx GL context.
+ * 
+ * Calls _mesa_free_context_data() and frees the gl_context object itself.
+ */
+void
+_mesa_destroy_context( struct gl_context *ctx )
+{
+   if (ctx) {
+      _mesa_free_context_data(ctx);
+      free( (void *) ctx );
+   }
+}
+
+
+/**
+ * Copy attribute groups from one context to another.
+ * 
+ * \param src source context
+ * \param dst destination context
+ * \param mask bitwise OR of GL_*_BIT flags
+ *
+ * According to the bits specified in \p mask, copies the corresponding
+ * attributes from \p src into \p dst.  For many of the attributes a simple \c
+ * memcpy is not enough due to the existence of internal pointers in their data
+ * structures.
+ */
+void
+_mesa_copy_context( const struct gl_context *src, struct gl_context *dst,
+                    GLuint mask )
+{
+   if (mask & GL_ACCUM_BUFFER_BIT) {
+      /* OK to memcpy */
+      dst->Accum = src->Accum;
+   }
+   if (mask & GL_COLOR_BUFFER_BIT) {
+      /* OK to memcpy */
+      dst->Color = src->Color;
+   }
+   if (mask & GL_CURRENT_BIT) {
+      /* OK to memcpy */
+      dst->Current = src->Current;
+   }
+   if (mask & GL_DEPTH_BUFFER_BIT) {
+      /* OK to memcpy */
+      dst->Depth = src->Depth;
+   }
+   if (mask & GL_ENABLE_BIT) {
+      /* no op */
+   }
+   if (mask & GL_EVAL_BIT) {
+      /* OK to memcpy */
+      dst->Eval = src->Eval;
+   }
+   if (mask & GL_FOG_BIT) {
+      /* OK to memcpy */
+      dst->Fog = src->Fog;
+   }
+   if (mask & GL_HINT_BIT) {
+      /* OK to memcpy */
+      dst->Hint = src->Hint;
+   }
+   if (mask & GL_LIGHTING_BIT) {
+      GLuint i;
+      /* begin with memcpy */
+      dst->Light = src->Light;
+      /* fixup linked lists to prevent pointer insanity */
+      make_empty_list( &(dst->Light.EnabledList) );
+      for (i = 0; i < MAX_LIGHTS; i++) {
+         if (dst->Light.Light[i].Enabled) {
+            insert_at_tail(&(dst->Light.EnabledList), &(dst->Light.Light[i]));
+         }
+      }
+   }
+   if (mask & GL_LINE_BIT) {
+      /* OK to memcpy */
+      dst->Line = src->Line;
+   }
+   if (mask & GL_LIST_BIT) {
+      /* OK to memcpy */
+      dst->List = src->List;
+   }
+   if (mask & GL_PIXEL_MODE_BIT) {
+      /* OK to memcpy */
+      dst->Pixel = src->Pixel;
+   }
+   if (mask & GL_POINT_BIT) {
+      /* OK to memcpy */
+      dst->Point = src->Point;
+   }
+   if (mask & GL_POLYGON_BIT) {
+      /* OK to memcpy */
+      dst->Polygon = src->Polygon;
+   }
+   if (mask & GL_POLYGON_STIPPLE_BIT) {
+      /* Use loop instead of memcpy due to problem with Portland Group's
+       * C compiler.  Reported by John Stone.
+       */
+      GLuint i;
+      for (i = 0; i < 32; i++) {
+         dst->PolygonStipple[i] = src->PolygonStipple[i];
+      }
+   }
+   if (mask & GL_SCISSOR_BIT) {
+      /* OK to memcpy */
+      dst->Scissor = src->Scissor;
+   }
+   if (mask & GL_STENCIL_BUFFER_BIT) {
+      /* OK to memcpy */
+      dst->Stencil = src->Stencil;
+   }
+   if (mask & GL_TEXTURE_BIT) {
+      /* Cannot memcpy because of pointers */
+      _mesa_copy_texture_state(src, dst);
+   }
+   if (mask & GL_TRANSFORM_BIT) {
+      /* OK to memcpy */
+      dst->Transform = src->Transform;
+   }
+   if (mask & GL_VIEWPORT_BIT) {
+      /* Cannot use memcpy, because of pointers in GLmatrix _WindowMap */
+      unsigned i;
+      for (i = 0; i < src->Const.MaxViewports; i++) {
+         dst->ViewportArray[i].X = src->ViewportArray[i].X;
+         dst->ViewportArray[i].Y = src->ViewportArray[i].Y;
+         dst->ViewportArray[i].Width = src->ViewportArray[i].Width;
+         dst->ViewportArray[i].Height = src->ViewportArray[i].Height;
+         dst->ViewportArray[i].Near = src->ViewportArray[i].Near;
+         dst->ViewportArray[i].Far = src->ViewportArray[i].Far;
+         _math_matrix_copy(&dst->ViewportArray[i]._WindowMap,
+                           &src->ViewportArray[i]._WindowMap);
+      }
+   }
+
+   /* XXX FIXME:  Call callbacks?
+    */
+   dst->NewState = _NEW_ALL;
+   dst->NewDriverState = ~0;
+}
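+
+/* Example (illustrative): copy only the color and depth attribute groups
+ * from one context to another:
+ *
+ *    _mesa_copy_context(src, dst, GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
+ */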
+
+
+/**
+ * Check if the given context can render into the given framebuffer
+ * by checking visual attributes.
+ *
+ * Most of these tests could go away because Mesa is now pretty flexible
+ * in terms of mixing rendering contexts with framebuffers.  As long
+ * as RGB vs. CI mode agree, we're probably good.
+ *
+ * \return GL_TRUE if compatible, GL_FALSE otherwise.
+ */
+static GLboolean 
+check_compatible(const struct gl_context *ctx,
+                 const struct gl_framebuffer *buffer)
+{
+   const struct gl_config *ctxvis = &ctx->Visual;
+   const struct gl_config *bufvis = &buffer->Visual;
+
+   if (buffer == _mesa_get_incomplete_framebuffer())
+      return GL_TRUE;
+
+#if 0
+   /* disabling this fixes the fgl_glxgears pbuffer demo */
+   if (ctxvis->doubleBufferMode && !bufvis->doubleBufferMode)
+      return GL_FALSE;
+#endif
+   if (ctxvis->stereoMode && !bufvis->stereoMode)
+      return GL_FALSE;
+   if (ctxvis->haveAccumBuffer && !bufvis->haveAccumBuffer)
+      return GL_FALSE;
+   if (ctxvis->haveDepthBuffer && !bufvis->haveDepthBuffer)
+      return GL_FALSE;
+   if (ctxvis->haveStencilBuffer && !bufvis->haveStencilBuffer)
+      return GL_FALSE;
+   if (ctxvis->redMask && ctxvis->redMask != bufvis->redMask)
+      return GL_FALSE;
+   if (ctxvis->greenMask && ctxvis->greenMask != bufvis->greenMask)
+      return GL_FALSE;
+   if (ctxvis->blueMask && ctxvis->blueMask != bufvis->blueMask)
+      return GL_FALSE;
+#if 0
+   /* disabled (see bug 11161) */
+   if (ctxvis->depthBits && ctxvis->depthBits != bufvis->depthBits)
+      return GL_FALSE;
+#endif
+   if (ctxvis->stencilBits && ctxvis->stencilBits != bufvis->stencilBits)
+      return GL_FALSE;
+
+   return GL_TRUE;
+}
+
+
+/**
+ * Check if the viewport/scissor size has not yet been initialized.
+ * Initialize the size if the given width and height are non-zero.
+ */
+void
+_mesa_check_init_viewport(struct gl_context *ctx, GLuint width, GLuint height)
+{
+   if (!ctx->ViewportInitialized && width > 0 && height > 0) {
+      unsigned i;
+
+      /* Note: set flag here, before calling _mesa_set_viewport(), to prevent
+       * potential infinite recursion.
+       */
+      ctx->ViewportInitialized = GL_TRUE;
+
+      /* Note: ctx->Const.MaxViewports may not have been set by the driver
+       * yet, so just initialize all of them.
+       */
+      for (i = 0; i < MAX_VIEWPORTS; i++) {
+         _mesa_set_viewport(ctx, i, 0, 0, width, height);
+         _mesa_set_scissor(ctx, i, 0, 0, width, height);
+      }
+   }
+}
+
+static void
+handle_first_current(struct gl_context *ctx)
+{
+   GLenum buffer;
+   GLint bufferIndex;
+
+   assert(ctx->Version > 0);
+
+   ctx->Extensions.String = _mesa_make_extension_string(ctx);
+
+   check_context_limits(ctx);
+
+   /* According to GL_MESA_configless_context, the default value of
+    * glDrawBuffers depends on the config of the first surface it is bound to.
+    * For GLES it is always GL_BACK, which has a magic interpretation. */
+   if (!ctx->HasConfig && _mesa_is_desktop_gl(ctx)) {
+      if (ctx->DrawBuffer != _mesa_get_incomplete_framebuffer()) {
+         if (ctx->DrawBuffer->Visual.doubleBufferMode)
+            buffer = GL_BACK;
+         else
+            buffer = GL_FRONT;
+
+         _mesa_drawbuffers(ctx, 1, &buffer, NULL /* destMask */);
+      }
+
+      if (ctx->ReadBuffer != _mesa_get_incomplete_framebuffer()) {
+         if (ctx->ReadBuffer->Visual.doubleBufferMode) {
+            buffer = GL_BACK;
+            bufferIndex = BUFFER_BACK_LEFT;
+         }
+         else {
+            buffer = GL_FRONT;
+            bufferIndex = BUFFER_FRONT_LEFT;
+         }
+
+         _mesa_readbuffer(ctx, buffer, bufferIndex);
+      }
+   }
+
+   /* We can use this to help debug users' problems.  Tell them to set
+    * the MESA_INFO env variable before running their app.  Then the
+    * first time each context is made current we'll print some useful
+    * information.
+    */
+   if (_mesa_getenv("MESA_INFO")) {
+      _mesa_print_info(ctx);
+   }
+}
+
+/**
+ * Bind the given context to the given drawBuffer and readBuffer and
+ * make it the current context for the calling thread.
+ * We'll render into the drawBuffer and read pixels from the
+ * readBuffer (i.e. glRead/CopyPixels, glCopyTexImage, etc).
+ *
+ * We check that the context's and framebuffer's visuals are compatible
+ * and return immediately if they're not.
+ *
+ * \param newCtx  the new GL context. If NULL then there will be no current GL
+ *                context.
+ * \param drawBuffer  the drawing framebuffer
+ * \param readBuffer  the reading framebuffer
+ */
+GLboolean
+_mesa_make_current( struct gl_context *newCtx,
+                    struct gl_framebuffer *drawBuffer,
+                    struct gl_framebuffer *readBuffer )
+{
+   GET_CURRENT_CONTEXT(curCtx);
+
+   if (MESA_VERBOSE & VERBOSE_API)
+      _mesa_debug(newCtx, "_mesa_make_current()\n");
+
+   /* Check that the context's and framebuffer's visuals are compatible.
+    */
+   if (newCtx && drawBuffer && newCtx->WinSysDrawBuffer != drawBuffer) {
+      if (!check_compatible(newCtx, drawBuffer)) {
+         _mesa_warning(newCtx,
+              "MakeCurrent: incompatible visuals for context and drawbuffer");
+         return GL_FALSE;
+      }
+   }
+   if (newCtx && readBuffer && newCtx->WinSysReadBuffer != readBuffer) {
+      if (!check_compatible(newCtx, readBuffer)) {
+         _mesa_warning(newCtx,
+              "MakeCurrent: incompatible visuals for context and readbuffer");
+         return GL_FALSE;
+      }
+   }
+
+   if (curCtx &&
+       (curCtx->WinSysDrawBuffer || curCtx->WinSysReadBuffer) &&
+       /* make sure this context is valid for flushing */
+       curCtx != newCtx)
+      _mesa_flush(curCtx);
+
+   /* We used to call _glapi_check_multithread() here.  Now do it in drivers */
+   _glapi_set_context((void *) newCtx);
+   ASSERT(_mesa_get_current_context() == newCtx);
+
+   if (!newCtx) {
+      _glapi_set_dispatch(NULL);  /* none current */
+   }
+   else {
+      _glapi_set_dispatch(newCtx->CurrentDispatch);
+
+      if (drawBuffer && readBuffer) {
+         ASSERT(_mesa_is_winsys_fbo(drawBuffer));
+         ASSERT(_mesa_is_winsys_fbo(readBuffer));
+         _mesa_reference_framebuffer(&newCtx->WinSysDrawBuffer, drawBuffer);
+         _mesa_reference_framebuffer(&newCtx->WinSysReadBuffer, readBuffer);
+
+         /*
+          * Only set the context's Draw/ReadBuffer fields if they're NULL
+          * or not bound to a user-created FBO.
+          */
+         if (!newCtx->DrawBuffer || _mesa_is_winsys_fbo(newCtx->DrawBuffer)) {
+            _mesa_reference_framebuffer(&newCtx->DrawBuffer, drawBuffer);
+            /* Update the FBO's list of drawbuffers/renderbuffers.
+             * For winsys FBOs this comes from the GL state (which may have
+             * changed since the last time this FBO was bound).
+             */
+            _mesa_update_draw_buffers(newCtx);
+         }
+         if (!newCtx->ReadBuffer || _mesa_is_winsys_fbo(newCtx->ReadBuffer)) {
+            _mesa_reference_framebuffer(&newCtx->ReadBuffer, readBuffer);
+         }
+
+         /* XXX only set this flag if we're really changing the draw/read
+          * framebuffer bindings.
+          */
+	 newCtx->NewState |= _NEW_BUFFERS;
+
+         if (drawBuffer) {
+            _mesa_check_init_viewport(newCtx,
+                                      drawBuffer->Width, drawBuffer->Height);
+         }
+      }
+
+      if (newCtx->FirstTimeCurrent) {
+         handle_first_current(newCtx);
+	 newCtx->FirstTimeCurrent = GL_FALSE;
+      }
+   }
+   
+   return GL_TRUE;
+}
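+
+/* Typical call patterns (illustrative): a window-system layer binds a
+ * context and its buffers with
+ *
+ *    _mesa_make_current(ctx, drawFb, readFb);
+ *
+ * and releases the current context with
+ *
+ *    _mesa_make_current(NULL, NULL, NULL);
+ */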
+
+
+/**
+ * Make context 'ctx' share the display lists, textures and programs
+ * that are associated with 'ctxToShare'.
+ * Any display lists, textures or programs associated with 'ctx' will
+ * be deleted if nobody else is sharing them.
+ */
+GLboolean
+_mesa_share_state(struct gl_context *ctx, struct gl_context *ctxToShare)
+{
+   if (ctx && ctxToShare && ctx->Shared && ctxToShare->Shared) {
+      struct gl_shared_state *oldShared = NULL;
+
+      /* save ref to old state to prevent it from being deleted immediately */
+      _mesa_reference_shared_state(ctx, &oldShared, ctx->Shared);
+
+      /* update ctx's Shared pointer */
+      _mesa_reference_shared_state(ctx, &ctx->Shared, ctxToShare->Shared);
+
+      update_default_objects(ctx);
+
+      /* release the old shared state */
+      _mesa_reference_shared_state(ctx, &oldShared, NULL);
+
+      return GL_TRUE;
+   }
+   else {
+      return GL_FALSE;
+   }
+}
+
+
+
+/**
+ * \return pointer to the current GL context for this thread.
+ * 
+ * Calls _glapi_get_context(). This isn't the fastest way to get the current
+ * context.  If you need speed, see the #GET_CURRENT_CONTEXT macro in
+ * context.h.
+ */
+struct gl_context *
+_mesa_get_current_context( void )
+{
+   return (struct gl_context *) _glapi_get_context();
+}
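+
+/* Note (illustrative): hot paths in this file use the macro form instead,
+ * e.g.
+ *
+ *    GET_CURRENT_CONTEXT(ctx);
+ *
+ * which declares and initializes a local "ctx" variable in one step.
+ */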
+
+
+/**
+ * Get context's current API dispatch table.
+ *
+ * It'll either be the immediate-mode execute dispatcher or the display list
+ * compile dispatcher.
+ * 
+ * \param ctx GL context.
+ *
+ * \return pointer to dispatch_table.
+ *
+ * Simply returns __struct gl_contextRec::CurrentDispatch.
+ */
+struct _glapi_table *
+_mesa_get_dispatch(struct gl_context *ctx)
+{
+   return ctx->CurrentDispatch;
+}
+
+/*@}*/
+
+
+/**********************************************************************/
+/** \name Miscellaneous functions                                     */
+/**********************************************************************/
+/*@{*/
+
+/**
+ * Record an error.
+ *
+ * \param ctx GL context.
+ * \param error error code.
+ * 
+ * Records the given error code and calls the driver's dd_function_table::Error
+ * function if defined.
+ *
+ * \sa
+ * This is called via _mesa_error().
+ */
+void
+_mesa_record_error(struct gl_context *ctx, GLenum error)
+{
+   if (!ctx)
+      return;
+
+   if (ctx->ErrorValue == GL_NO_ERROR) {
+      ctx->ErrorValue = error;
+   }
+}
+
+
+/**
+ * Flush commands and wait for completion.
+ */
+void
+_mesa_finish(struct gl_context *ctx)
+{
+   FLUSH_VERTICES( ctx, 0 );
+   FLUSH_CURRENT( ctx, 0 );
+   if (ctx->Driver.Finish) {
+      ctx->Driver.Finish(ctx);
+   }
+}
+
+
+/**
+ * Flush commands.
+ */
+void
+_mesa_flush(struct gl_context *ctx)
+{
+   FLUSH_VERTICES( ctx, 0 );
+   FLUSH_CURRENT( ctx, 0 );
+   if (ctx->Driver.Flush) {
+      ctx->Driver.Flush(ctx);
+   }
+}
+
+
+
+/**
+ * Execute glFinish().
+ *
+ * Calls the #ASSERT_OUTSIDE_BEGIN_END_AND_FLUSH macro and the
+ * dd_function_table::Finish driver callback, if not NULL.
+ */
+void GLAPIENTRY
+_mesa_Finish(void)
+{
+   GET_CURRENT_CONTEXT(ctx);
+   ASSERT_OUTSIDE_BEGIN_END(ctx);
+   _mesa_finish(ctx);
+}
+
+
+/**
+ * Execute glFlush().
+ *
+ * Calls the #ASSERT_OUTSIDE_BEGIN_END_AND_FLUSH macro and the
+ * dd_function_table::Flush driver callback, if not NULL.
+ */
+void GLAPIENTRY
+_mesa_Flush(void)
+{
+   GET_CURRENT_CONTEXT(ctx);
+   ASSERT_OUTSIDE_BEGIN_END(ctx);
+   _mesa_flush(ctx);
+}
+
+
+/*
+ * ARB_blend_func_extended - ERRORS section
+ * "The error INVALID_OPERATION is generated by Begin or any procedure that
+ *  implicitly calls Begin if any draw buffer has a blend function requiring the
+ *  second color input (SRC1_COLOR, ONE_MINUS_SRC1_COLOR, SRC1_ALPHA or
+ *  ONE_MINUS_SRC1_ALPHA), and a framebuffer is bound that has more than
+ *  the value of MAX_DUAL_SOURCE_DRAW_BUFFERS-1 active color attachments."
+ */
+static GLboolean
+_mesa_check_blend_func_error(struct gl_context *ctx)
+{
+   GLuint i;
+   for (i = ctx->Const.MaxDualSourceDrawBuffers;
+	i < ctx->DrawBuffer->_NumColorDrawBuffers;
+	i++) {
+      if (ctx->Color.Blend[i]._UsesDualSrc) {
+	 _mesa_error(ctx, GL_INVALID_OPERATION,
+		     "dual source blend on illegal attachment");
+	 return GL_FALSE;
+      }
+   }
+   return GL_TRUE;
+}
+
+static bool
+shader_linked_or_absent(struct gl_context *ctx,
+                        const struct gl_shader_program *shProg,
+                        bool *shader_present, const char *where)
+{
+   if (shProg) {
+      *shader_present = true;
+
+      if (!shProg->LinkStatus) {
+         _mesa_error(ctx, GL_INVALID_OPERATION, "%s(shader not linked)", where);
+         return false;
+      }
+#if 0 /* not normally enabled */
+      {
+         char errMsg[100];
+         if (!_mesa_validate_shader_program(ctx, shProg, errMsg)) {
+            _mesa_warning(ctx, "Shader program %u is invalid: %s",
+                          shProg->Name, errMsg);
+         }
+      }
+#endif
+   }
+
+   return true;
+}
+
+/**
+ * Prior to drawing anything with glBegin, glDrawArrays, etc., this function
+ * is called to see if it's valid to render.  This involves checking that
+ * the current shader is valid and the framebuffer is complete.
+ * It also checks that the current pipeline object, if any, is valid.
+ * If an error is detected it'll be recorded here.
+ * \return GL_TRUE if OK to render, GL_FALSE if not
+ */
+GLboolean
+_mesa_valid_to_render(struct gl_context *ctx, const char *where)
+{
+   bool from_glsl_shader[MESA_SHADER_COMPUTE] = { false };
+   unsigned i;
+
+   /* This depends on having up to date derived state (shaders) */
+   if (ctx->NewState)
+      _mesa_update_state(ctx);
+
+   for (i = 0; i < MESA_SHADER_COMPUTE; i++) {
+      if (!shader_linked_or_absent(ctx, ctx->_Shader->CurrentProgram[i],
+                                   &from_glsl_shader[i], where))
+         return GL_FALSE;
+   }
+
+   /* Any shader stages that are not supplied by the GLSL shader and have
+    * assembly shaders enabled must now be validated.
+    */
+   if (!from_glsl_shader[MESA_SHADER_VERTEX]
+       && ctx->VertexProgram.Enabled && !ctx->VertexProgram._Enabled) {
+      _mesa_error(ctx, GL_INVALID_OPERATION,
+		  "%s(vertex program not valid)", where);
+      return GL_FALSE;
+   }
+
+   /* FINISHME: If GL_NV_geometry_program4 is ever supported, the current
+    * FINISHME: geometry program should be validated here.
+    */
+   (void) from_glsl_shader[MESA_SHADER_GEOMETRY];
+
+   if (!from_glsl_shader[MESA_SHADER_FRAGMENT]) {
+      if (ctx->FragmentProgram.Enabled && !ctx->FragmentProgram._Enabled) {
+	 _mesa_error(ctx, GL_INVALID_OPERATION,
+		     "%s(fragment program not valid)", where);
+	 return GL_FALSE;
+      }
+
+      /* If drawing to integer-valued color buffers, there must be an
+       * active fragment shader (GL_EXT_texture_integer).
+       */
+      if (ctx->DrawBuffer && ctx->DrawBuffer->_IntegerColor) {
+         _mesa_error(ctx, GL_INVALID_OPERATION,
+                     "%s(integer format but no fragment shader)", where);
+         return GL_FALSE;
+      }
+   }
+
+   /* A pipeline object is bound */
+   if (ctx->_Shader->Name && !ctx->_Shader->Validated) {
+      /* Error message will be printed inside _mesa_validate_program_pipeline.
+       */
+      if (!_mesa_validate_program_pipeline(ctx, ctx->_Shader, GL_TRUE)) {
+         return GL_FALSE;
+      }
+   }
+
+   if (ctx->DrawBuffer->_Status != GL_FRAMEBUFFER_COMPLETE_EXT) {
+      _mesa_error(ctx, GL_INVALID_FRAMEBUFFER_OPERATION_EXT,
+                  "%s(incomplete framebuffer)", where);
+      return GL_FALSE;
+   }
+
+   if (_mesa_check_blend_func_error(ctx) == GL_FALSE) {
+      return GL_FALSE;
+   }
+
+#ifdef DEBUG
+   if (ctx->GlslFlags & GLSL_LOG) {
+      struct gl_shader_program **shProg = ctx->_Shader->CurrentProgram;
+      gl_shader_stage i;
+
+      for (i = 0; i < MESA_SHADER_STAGES; i++) {
+	 if (shProg[i] == NULL || shProg[i]->_Used
+	     || shProg[i]->_LinkedShaders[i] == NULL)
+	    continue;
+
+	 /* This is the first time this shader is being used.
+	  * Append shader's constants/uniforms to log file.
+	  *
+	  * Only log data for the program target that matches the shader
+	  * target.  It's possible to have a program bound to the vertex
+	  * shader target that also supplied a fragment shader.  If that
+	  * program isn't also bound to the fragment shader target we don't
+	  * want to log its fragment data.
+	  */
+	 _mesa_append_uniforms_to_file(shProg[i]->_LinkedShaders[i]);
+      }
+
+      for (i = 0; i < MESA_SHADER_STAGES; i++) {
+	 if (shProg[i] != NULL)
+	    shProg[i]->_Used = GL_TRUE;
+      }
+   }
+#endif
+
+   return GL_TRUE;
+}
+
+
+/*@}*/
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/context.h b/icd/intel/compiler/mesa-utils/src/mesa/main/context.h
new file mode 100644
index 0000000..b23f9fa
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/context.h
@@ -0,0 +1,334 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2006  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+/**
+ * \file context.h
+ * Mesa context and visual-related functions.
+ *
+ * There are three large Mesa data types/classes which are meant to be
+ * used by device drivers:
+ * - struct gl_context: this contains the Mesa rendering state
+ * - struct gl_config:  this describes the color buffer (RGB vs. CI), whether
+ *   or not there's a depth buffer, stencil buffer, etc.
+ * - struct gl_framebuffer:  contains pointers to the depth buffer, stencil
+ *   buffer, accum buffer and alpha buffers.
+ *
+ * These types should be encapsulated by corresponding device driver
+ * data types.  See xmesa.h and xmesaP.h for an example.
+ *
+ * In OOP terms, struct gl_context, struct gl_config, and struct gl_framebuffer
+ * are base classes which the device driver must derive from.
+ *
+ * The following functions create and destroy these data types.
+ */
+
+
+#ifndef CONTEXT_H
+#define CONTEXT_H
+
+
+#include "imports.h"
+#include "mtypes.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+struct _glapi_table;
+
+
+/** \name Visual-related functions */
+/*@{*/
+ 
+extern struct gl_config *
+_mesa_create_visual( GLboolean dbFlag,
+                     GLboolean stereoFlag,
+                     GLint redBits,
+                     GLint greenBits,
+                     GLint blueBits,
+                     GLint alphaBits,
+                     GLint depthBits,
+                     GLint stencilBits,
+                     GLint accumRedBits,
+                     GLint accumGreenBits,
+                     GLint accumBlueBits,
+                     GLint accumAlphaBits,
+                     GLint numSamples );
+
+extern GLboolean
+_mesa_initialize_visual( struct gl_config *v,
+                         GLboolean dbFlag,
+                         GLboolean stereoFlag,
+                         GLint redBits,
+                         GLint greenBits,
+                         GLint blueBits,
+                         GLint alphaBits,
+                         GLint depthBits,
+                         GLint stencilBits,
+                         GLint accumRedBits,
+                         GLint accumGreenBits,
+                         GLint accumBlueBits,
+                         GLint accumAlphaBits,
+                         GLint numSamples );
+
+extern void
+_mesa_destroy_visual( struct gl_config *vis );
+
+/*@}*/
+
+
+/** \name Context-related functions */
+/*@{*/
+
+extern GLboolean
+_mesa_initialize_context( struct gl_context *ctx,
+                          gl_api api,
+                          const struct gl_config *visual,
+                          struct gl_context *share_list,
+                          const struct dd_function_table *driverFunctions);
+
+extern struct gl_context *
+_mesa_create_context(gl_api api,
+                     const struct gl_config *visual,
+                     struct gl_context *share_list,
+                     const struct dd_function_table *driverFunctions);
+
+extern void
+_mesa_enable_glsl_threadpool(struct gl_context *ctx, int max_threads);
+
+extern void
+_mesa_free_context_data( struct gl_context *ctx );
+
+extern void
+_mesa_destroy_context( struct gl_context *ctx );
+
+
+extern void
+_mesa_copy_context(const struct gl_context *src, struct gl_context *dst, GLuint mask);
+
+
+extern void
+_mesa_check_init_viewport(struct gl_context *ctx, GLuint width, GLuint height);
+
+extern GLboolean
+_mesa_make_current( struct gl_context *ctx, struct gl_framebuffer *drawBuffer,
+                    struct gl_framebuffer *readBuffer );
+
+extern GLboolean
+_mesa_share_state(struct gl_context *ctx, struct gl_context *ctxToShare);
+
+extern struct gl_context *
+_mesa_get_current_context(void);
+
+/*@}*/
+
+extern void
+_mesa_init_get_hash(struct gl_context *ctx);
+
+extern void
+_mesa_notifySwapBuffers(struct gl_context *gc);
+
+
+extern struct _glapi_table *
+_mesa_get_dispatch(struct gl_context *ctx);
+
+
+extern GLboolean
+_mesa_valid_to_render(struct gl_context *ctx, const char *where);
+
+
+
+/** \name Miscellaneous */
+/*@{*/
+
+extern void
+_mesa_record_error( struct gl_context *ctx, GLenum error );
+
+
+extern void
+_mesa_finish(struct gl_context *ctx);
+
+extern void
+_mesa_flush(struct gl_context *ctx);
+
+extern int
+_mesa_generic_nop(void);
+
+extern void GLAPIENTRY
+_mesa_Finish( void );
+
+extern void GLAPIENTRY
+_mesa_Flush( void );
+
+/*@}*/
+
+
+/**
+ * Are we currently between glBegin and glEnd?
+ * During execution, not display list compilation.
+ */
+static inline GLboolean
+_mesa_inside_begin_end(const struct gl_context *ctx)
+{
+   return ctx->Driver.CurrentExecPrimitive != PRIM_OUTSIDE_BEGIN_END;
+}
+
+
+/**
+ * Are we currently between glBegin and glEnd in a display list?
+ */
+static inline GLboolean
+_mesa_inside_dlist_begin_end(const struct gl_context *ctx)
+{
+   return ctx->Driver.CurrentSavePrimitive <= PRIM_MAX;
+}
+
+
+
+/**
+ * \name Macros for flushing buffered rendering commands before state changes,
+ * checking if inside glBegin/glEnd, etc.
+ */
+/*@{*/
+
+/**
+ * Flush vertices.
+ *
+ * \param ctx GL context.
+ * \param newstate new state.
+ *
+ * Checks if dd_function_table::NeedFlush is marked to flush stored vertices,
+ * and calls dd_function_table::FlushVertices if so. Marks
+ * __struct gl_contextRec::NewState with \p newstate.
+ */
+#define FLUSH_VERTICES(ctx, newstate)				\
+do {								\
+   if (MESA_VERBOSE & VERBOSE_STATE)				\
+      _mesa_debug(ctx, "FLUSH_VERTICES in %s\n", MESA_FUNCTION);\
+   if (ctx->Driver.NeedFlush & FLUSH_STORED_VERTICES)		\
+      ctx->Driver.FlushVertices(ctx, FLUSH_STORED_VERTICES);	\
+   ctx->NewState |= newstate;					\
+} while (0)
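+
+/* Example (illustrative; _NEW_COLOR stands in for whatever state bit the
+ * caller is about to dirty): a state-changing entry point typically does
+ *
+ *    FLUSH_VERTICES(ctx, _NEW_COLOR);
+ *    ctx->Color.SomeField = value;   // hypothetical field update
+ */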
+
+/**
+ * Flush current state.
+ *
+ * \param ctx GL context.
+ * \param newstate new state.
+ *
+ * Checks if dd_function_table::NeedFlush is marked to flush current state,
+ * and calls dd_function_table::FlushVertices if so. Marks
+ * __struct gl_contextRec::NewState with \p newstate.
+ */
+#define FLUSH_CURRENT(ctx, newstate)				\
+do {								\
+   if (MESA_VERBOSE & VERBOSE_STATE)				\
+      _mesa_debug(ctx, "FLUSH_CURRENT in %s\n", MESA_FUNCTION);	\
+   if (ctx->Driver.NeedFlush & FLUSH_UPDATE_CURRENT)		\
+      ctx->Driver.FlushVertices(ctx, FLUSH_UPDATE_CURRENT);	\
+   ctx->NewState |= newstate;					\
+} while (0)
+
+/**
+ * Macro to assert that the API call was made outside the
+ * glBegin()/glEnd() pair, with return value.
+ * 
+ * \param ctx GL context.
+ * \param retval value to return in case the assertion fails.
+ */
+#define ASSERT_OUTSIDE_BEGIN_END_WITH_RETVAL(ctx, retval)		\
+do {									\
+   if (_mesa_inside_begin_end(ctx)) {					\
+      _mesa_error(ctx, GL_INVALID_OPERATION, "Inside glBegin/glEnd");	\
+      return retval;							\
+   }									\
+} while (0)
+
+/**
+ * Macro to assert that the API call was made outside the
+ * glBegin()/glEnd() pair.
+ * 
+ * \param ctx GL context.
+ */
+#define ASSERT_OUTSIDE_BEGIN_END(ctx)					\
+do {									\
+   if (_mesa_inside_begin_end(ctx)) {					\
+      _mesa_error(ctx, GL_INVALID_OPERATION, "Inside glBegin/glEnd");	\
+      return;								\
+   }									\
+} while (0)
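+
+/* Example: _mesa_Finish() in context.c follows exactly this pattern:
+ *
+ *    GET_CURRENT_CONTEXT(ctx);
+ *    ASSERT_OUTSIDE_BEGIN_END(ctx);
+ *    _mesa_finish(ctx);
+ */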
+
+/*@}*/
+
+
+/**
+ * Checks if the context is for Desktop GL (Compatibility or Core)
+ */
+static inline GLboolean
+_mesa_is_desktop_gl(const struct gl_context *ctx)
+{
+   return ctx->API == API_OPENGL_COMPAT || ctx->API == API_OPENGL_CORE;
+}
+
+
+/**
+ * Checks if the context is for any GLES version
+ */
+static inline GLboolean
+_mesa_is_gles(const struct gl_context *ctx)
+{
+   return ctx->API == API_OPENGLES || ctx->API == API_OPENGLES2;
+}
+
+
+/**
+ * Checks if the context is for GLES 3.x
+ */
+static inline GLboolean
+_mesa_is_gles3(const struct gl_context *ctx)
+{
+   return ctx->API == API_OPENGLES2 && ctx->Version >= 30;
+}
+
+
+/**
+ * Checks if the context supports geometry shaders.
+ */
+static inline GLboolean
+_mesa_has_geometry_shaders(const struct gl_context *ctx)
+{
+   return _mesa_is_desktop_gl(ctx) &&
+      (ctx->Version >= 32 || ctx->Extensions.ARB_geometry_shader4);
+}
+
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* CONTEXT_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/dd.h b/icd/intel/compiler/mesa-utils/src/mesa/main/dd.h
new file mode 100644
index 0000000..07e0ad5
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/dd.h
@@ -0,0 +1,1138 @@
+/**
+ * \file dd.h
+ * Device driver interfaces.
+ */
+
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2006  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef DD_INCLUDED
+#define DD_INCLUDED
+
+/* THIS FILE ONLY INCLUDED BY mtypes.h !!!!! */
+
+#include "glheader.h"
+
+struct gl_buffer_object;
+struct gl_context;
+struct gl_display_list;
+struct gl_framebuffer;
+struct gl_image_unit;
+struct gl_pixelstore_attrib;
+struct gl_program;
+struct gl_renderbuffer;
+struct gl_renderbuffer_attachment;
+struct gl_shader;
+struct gl_shader_program;
+struct gl_texture_image;
+struct gl_texture_object;
+
+/* GL_ARB_vertex_buffer_object */
+/* Modifies GL_MAP_UNSYNCHRONIZED_BIT to allow driver to fail (return
+ * NULL) if buffer is unavailable for immediate mapping.
+ *
+ * Does GL_MAP_INVALIDATE_RANGE_BIT do this?  It seems so, but it
+ * would require more book-keeping in the driver than seems necessary
+ * at this point.
+ *
+ * Does GL_MAP_INVALIDATE_BUFFER_BIT do this?  Not really -- we don't
+ * want to provoke the driver to throw away the old storage; we will
+ * respect the contents of already referenced data.
+ */
+#define MESA_MAP_NOWAIT_BIT       0x0040
+
+
+/**
+ * Device driver function table.
+ * Core Mesa uses these function pointers to call into device drivers.
+ * Most of these functions directly correspond to OpenGL state commands.
+ * Core Mesa will call these functions after error checking has been done
+ * so that the drivers don't have to worry about error testing.
+ *
+ * Vertex transformation/clipping/lighting is patched into the T&L module.
+ * Rasterization functions are patched into the swrast module.
+ *
+ * Note: when new functions are added here, the drivers/common/driverfuncs.c
+ * file should be updated too!!!
+ */
+struct dd_function_table {
+   /**
+    * Return a string as needed by glGetString().
+    * Only the GL_RENDERER query must be implemented.  Otherwise, NULL can be
+    * returned.
+    */
+   const GLubyte * (*GetString)( struct gl_context *ctx, GLenum name );
+
+   /**
+    * Notify the driver after Mesa has made some internal state changes.  
+    *
+    * This is in addition to any state change callbacks Mesa may already have
+    * made.
+    */
+   void (*UpdateState)( struct gl_context *ctx, GLbitfield new_state );
+
+   /**
+    * Resize the given framebuffer to the given size.
+    * XXX OBSOLETE: this function will be removed in the future.
+    */
+   void (*ResizeBuffers)( struct gl_context *ctx, struct gl_framebuffer *fb,
+                          GLuint width, GLuint height);
+
+   /**
+    * This is called whenever glFinish() is called.
+    */
+   void (*Finish)( struct gl_context *ctx );
+
+   /**
+    * This is called whenever glFlush() is called.
+    */
+   void (*Flush)( struct gl_context *ctx );
+
+   /**
+    * Clear the color/depth/stencil/accum buffer(s).
+    * \param buffers  a bitmask of BUFFER_BIT_* flags indicating which
+    *                 renderbuffers need to be cleared.
+    */
+   void (*Clear)( struct gl_context *ctx, GLbitfield buffers );
+
+   /**
+    * Execute glAccum command.
+    */
+   void (*Accum)( struct gl_context *ctx, GLenum op, GLfloat value );
+
+
+   /**
+    * Execute glRasterPos, updating the ctx->Current.Raster fields
+    */
+   void (*RasterPos)( struct gl_context *ctx, const GLfloat v[4] );
+
+   /**
+    * \name Image-related functions
+    */
+   /*@{*/
+
+   /**
+    * Called by glDrawPixels().
+    * \p unpack describes how to unpack the source image data.
+    */
+   void (*DrawPixels)( struct gl_context *ctx,
+		       GLint x, GLint y, GLsizei width, GLsizei height,
+		       GLenum format, GLenum type,
+		       const struct gl_pixelstore_attrib *unpack,
+		       const GLvoid *pixels );
+
+   /**
+    * Called by glReadPixels().
+    */
+   void (*ReadPixels)( struct gl_context *ctx,
+		       GLint x, GLint y, GLsizei width, GLsizei height,
+		       GLenum format, GLenum type,
+		       const struct gl_pixelstore_attrib *unpack,
+		       GLvoid *dest );
+
+   /**
+    * Called by glCopyPixels().  
+    */
+   void (*CopyPixels)( struct gl_context *ctx, GLint srcx, GLint srcy,
+                       GLsizei width, GLsizei height,
+                       GLint dstx, GLint dsty, GLenum type );
+
+   /**
+    * Called by glBitmap().  
+    */
+   void (*Bitmap)( struct gl_context *ctx,
+		   GLint x, GLint y, GLsizei width, GLsizei height,
+		   const struct gl_pixelstore_attrib *unpack,
+		   const GLubyte *bitmap );
+   /*@}*/
+
+   
+   /**
+    * \name Texture image functions
+    */
+   /*@{*/
+
+   /**
+    * Choose actual hardware texture format given the texture target, the
+    * user-provided source image format and type and the desired internal
+    * format.  In some cases, srcFormat and srcType can be GL_NONE.
+    * Note:  target may be GL_TEXTURE_CUBE_MAP, but never
+    * GL_TEXTURE_CUBE_MAP_[POSITIVE/NEGATIVE]_[XYZ].
+    * Called by glTexImage(), etc.
+    */
+   mesa_format (*ChooseTextureFormat)(struct gl_context *ctx,
+                                      GLenum target, GLint internalFormat,
+                                      GLenum srcFormat, GLenum srcType );
+
+   /**
+    * Determine sample counts support for a particular target and format
+    *
+    * \param ctx            GL context
+    * \param target         GL target enum
+    * \param internalFormat GL format enum
+    * \param samples        Buffer to hold the returned sample counts.
+    *                       Drivers \b must \b not return more than 16 counts.
+    *
+    * \returns
+    * The number of sample counts actually written to \c samples.  If
+    * \c internalFormat is not renderable, zero is returned.
+    */
+   size_t (*QuerySamplesForFormat)(struct gl_context *ctx,
+                                   GLenum target,
+                                   GLenum internalFormat,
+                                   int samples[16]);
+
+   /**
+    * Called by glTexImage[123]D() and glCopyTexImage[12]D()
+    * Allocate texture memory and copy the user's image to the buffer.
+    * The gl_texture_image fields, etc. will be fully initialized.
+    * The parameters are the same as glTexImage3D(), plus:
+    * \param dims  1, 2, or 3 indicating glTexImage1/2/3D()
+    * \param packing describes how to unpack the source data.
+    * \param texImage is the destination texture image.
+    */
+   void (*TexImage)(struct gl_context *ctx, GLuint dims,
+                    struct gl_texture_image *texImage,
+                    GLenum format, GLenum type, const GLvoid *pixels,
+                    const struct gl_pixelstore_attrib *packing);
+
+   /**
+    * Called by glTexSubImage[123]D().
+    * Replace a subset of the target texture with new texel data.
+    */
+   void (*TexSubImage)(struct gl_context *ctx, GLuint dims,
+                       struct gl_texture_image *texImage,
+                       GLint xoffset, GLint yoffset, GLint zoffset,
+                       GLsizei width, GLsizei height, GLint depth,
+                       GLenum format, GLenum type,
+                       const GLvoid *pixels,
+                       const struct gl_pixelstore_attrib *packing);
+
+
+   /**
+    * Called by glGetTexImage().
+    */
+   void (*GetTexImage)( struct gl_context *ctx,
+                        GLenum format, GLenum type, GLvoid *pixels,
+                        struct gl_texture_image *texImage );
+
+   /**
+    * Called by glCopyTex[Sub]Image[123]D().
+    *
+    * This function should copy a rectangular region in the rb to a single
+    * destination slice, specified by @slice.  In the case of 1D array
+    * textures (where one GL call can potentially affect multiple destination
+    * slices), core mesa takes care of calling this function multiple times,
+    * once for each scanline to be copied.
+    */
+   void (*CopyTexSubImage)(struct gl_context *ctx, GLuint dims,
+                           struct gl_texture_image *texImage,
+                           GLint xoffset, GLint yoffset, GLint slice,
+                           struct gl_renderbuffer *rb,
+                           GLint x, GLint y,
+                           GLsizei width, GLsizei height);
+
+   /**
+    * Called by glGenerateMipmap() or when GL_GENERATE_MIPMAP_SGIS is enabled.
+    * Note that if the texture is a cube map, the <target> parameter will
+    * indicate which cube face to generate (GL_POSITIVE/NEGATIVE_X/Y/Z).
+    * texObj->BaseLevel is the level from which to generate the remaining
+    * mipmap levels.
+    */
+   void (*GenerateMipmap)(struct gl_context *ctx, GLenum target,
+                          struct gl_texture_object *texObj);
+
+   /**
+    * Called by glTexImage, glCompressedTexImage, glCopyTexImage
+    * and glTexStorage to check if the dimensions of the texture image
+    * are too large.
+    * \param target  any GL_PROXY_TEXTURE_x target
+    * \return GL_TRUE if the image is OK, GL_FALSE if too large
+    */
+   GLboolean (*TestProxyTexImage)(struct gl_context *ctx, GLenum target,
+                                  GLint level, mesa_format format,
+                                  GLint width, GLint height,
+                                  GLint depth, GLint border);
+   /*@}*/
+
+   
+   /**
+    * \name Compressed texture functions
+    */
+   /*@{*/
+
+   /**
+    * Called by glCompressedTexImage[123]D().
+    */
+   void (*CompressedTexImage)(struct gl_context *ctx, GLuint dims,
+                              struct gl_texture_image *texImage,
+                              GLsizei imageSize, const GLvoid *data);
+
+   /**
+    * Called by glCompressedTexSubImage[123]D().
+    */
+   void (*CompressedTexSubImage)(struct gl_context *ctx, GLuint dims,
+                                 struct gl_texture_image *texImage,
+                                 GLint xoffset, GLint yoffset, GLint zoffset,
+                                 GLsizei width, GLint height, GLint depth,
+                                 GLenum format,
+                                 GLsizei imageSize, const GLvoid *data);
+
+   /**
+    * Called by glGetCompressedTexImage.
+    */
+   void (*GetCompressedTexImage)(struct gl_context *ctx,
+                                 struct gl_texture_image *texImage,
+                                 GLvoid *data);
+   /*@}*/
+
+   /**
+    * \name Texture object / image functions
+    */
+   /*@{*/
+
+   /**
+    * Called by glBindTexture() and glBindTextures().
+    */
+   void (*BindTexture)( struct gl_context *ctx, GLuint texUnit,
+                        GLenum target, struct gl_texture_object *tObj );
+
+   /**
+    * Called to allocate a new texture object.  Drivers will usually
+    * allocate/return a subclass of gl_texture_object.
+    */
+   struct gl_texture_object * (*NewTextureObject)(struct gl_context *ctx,
+                                                  GLuint name, GLenum target);
+   /**
+    * Called to delete/free a texture object.  Drivers should free the
+    * object and any image data it contains.
+    */
+   void (*DeleteTexture)(struct gl_context *ctx,
+                         struct gl_texture_object *texObj);
+
+   /** Called to allocate a new texture image object. */
+   struct gl_texture_image * (*NewTextureImage)(struct gl_context *ctx);
+
+   /** Called to free a texture image object returned by NewTextureImage() */
+   void (*DeleteTextureImage)(struct gl_context *ctx,
+                              struct gl_texture_image *);
+
+   /** Called to allocate memory for a single texture image */
+   GLboolean (*AllocTextureImageBuffer)(struct gl_context *ctx,
+                                        struct gl_texture_image *texImage);
+
+   /** Free the memory for a single texture image */
+   void (*FreeTextureImageBuffer)(struct gl_context *ctx,
+                                  struct gl_texture_image *texImage);
+
+   /** Map a slice of a texture image into user space.
+    * Note: for GL_TEXTURE_1D_ARRAY, height must be 1, y must be 0 and slice
+    * indicates the 1D array index.
+    * \param texImage  the texture image
+    * \param slice  the 3D image slice or array texture slice
+    * \param x, y, w, h  region of interest
+    * \param mode  bitmask of GL_MAP_READ_BIT, GL_MAP_WRITE_BIT and
+    *              GL_MAP_INVALIDATE_RANGE_BIT (if writing)
+    * \param mapOut  returns start of mapping of region of interest
+    * \param rowStrideOut returns row stride (in bytes).  In the case of a
+    * compressed texture, this is the byte stride between one row of blocks
+    * and another.
+    */
+   void (*MapTextureImage)(struct gl_context *ctx,
+			   struct gl_texture_image *texImage,
+			   GLuint slice,
+			   GLuint x, GLuint y, GLuint w, GLuint h,
+			   GLbitfield mode,
+			   GLubyte **mapOut, GLint *rowStrideOut);
+
+   void (*UnmapTextureImage)(struct gl_context *ctx,
+			     struct gl_texture_image *texImage,
+			     GLuint slice);
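+
+   /*
+    * A sketch of the expected Map/Unmap pairing (hypothetical caller code):
+    *
+    * \code
+    *    GLubyte *map;
+    *    GLint rowStride;
+    *    ctx->Driver.MapTextureImage(ctx, texImage, slice, x, y, w, h,
+    *                                GL_MAP_READ_BIT, &map, &rowStride);
+    *    if (map) {
+    *       // Row i of the region of interest starts at map + i * rowStride.
+    *       ctx->Driver.UnmapTextureImage(ctx, texImage, slice);
+    *    }
+    * \endcode
+    */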
+
+   /** For GL_ARB_texture_storage.  Allocate memory for whole mipmap stack.
+    * All the gl_texture_images in the texture object will have their
+    * dimensions, format, etc. initialized already.
+    */
+   GLboolean (*AllocTextureStorage)(struct gl_context *ctx,
+                                    struct gl_texture_object *texObj,
+                                    GLsizei levels, GLsizei width,
+                                    GLsizei height, GLsizei depth);
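+
+   /*
+    * For example (a sketch of the glTexStorage2D path, not verbatim core
+    * code): after initializing the gl_texture_image for every level, the
+    * front end makes a single allocation call:
+    *
+    * \code
+    *    if (!ctx->Driver.AllocTextureStorage(ctx, texObj, levels,
+    *                                         width, height, 1)) {
+    *       // allocation failed -> report GL_OUT_OF_MEMORY
+    *    }
+    * \endcode
+    */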
+
+   /** Called as part of glTextureView to add views to origTexObj */
+   GLboolean (*TextureView)(struct gl_context *ctx,
+                            struct gl_texture_object *texObj,
+                            struct gl_texture_object *origTexObj);
+
+   /**
+    * Map a renderbuffer into user space.
+    * \param mode  bitmask of GL_MAP_READ_BIT, GL_MAP_WRITE_BIT and
+    *              GL_MAP_INVALIDATE_RANGE_BIT (if writing)
+    */
+   void (*MapRenderbuffer)(struct gl_context *ctx,
+			   struct gl_renderbuffer *rb,
+			   GLuint x, GLuint y, GLuint w, GLuint h,
+			   GLbitfield mode,
+			   GLubyte **mapOut, GLint *rowStrideOut);
+
+   void (*UnmapRenderbuffer)(struct gl_context *ctx,
+			     struct gl_renderbuffer *rb);
+
+   /**
+    * Optional driver entrypoint that binds a non-texture renderbuffer's
+    * contents to a texture image.
+    */
+   GLboolean (*BindRenderbufferTexImage)(struct gl_context *ctx,
+                                         struct gl_renderbuffer *rb,
+                                         struct gl_texture_image *texImage);
+   /*@}*/
+
+
+   /**
+    * \name Vertex/fragment program functions
+    */
+   /*@{*/
+   /** Bind a vertex/fragment program */
+   void (*BindProgram)(struct gl_context *ctx, GLenum target,
+                       struct gl_program *prog);
+   /** Allocate a new program */
+   struct gl_program * (*NewProgram)(struct gl_context *ctx, GLenum target,
+                                     GLuint id);
+   /** Construct gl_program for a shader. */
+   struct gl_program * (*GetProgram) (struct gl_context *ctx,
+                                      struct gl_shader_program *shader_program,
+                                      struct gl_shader *shader);
+   /** Delete a program */
+   void (*DeleteProgram)(struct gl_context *ctx, struct gl_program *prog);   
+   /**
+    * Notify driver that a program string (and GPU code) has been specified
+    * or modified.  Return GL_TRUE or GL_FALSE to indicate if the program is
+    * supported by the driver.
+    */
+   GLboolean (*ProgramStringNotify)(struct gl_context *ctx, GLenum target, 
+                                    struct gl_program *prog);
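+
+   /*
+    * Illustrative caller-side check (a sketch; exact error handling varies):
+    *
+    * \code
+    *    if (!ctx->Driver.ProgramStringNotify(ctx, GL_FRAGMENT_PROGRAM_ARB,
+    *                                         prog)) {
+    *       // The driver rejected the program string.
+    *       _mesa_error(ctx, GL_INVALID_OPERATION, "glProgramStringARB");
+    *    }
+    * \endcode
+    */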
+
+   /**
+    * Notify driver that the sampler uniforms for the current program have
+    * changed.  On some drivers, this may require shader recompiles.
+    */
+   void (*SamplerUniformChange)(struct gl_context *ctx, GLenum target,
+                                struct gl_program *prog);
+
+   /** Query if program can be loaded onto hardware */
+   GLboolean (*IsProgramNative)(struct gl_context *ctx, GLenum target, 
+				struct gl_program *prog);
+   
+   /*@}*/
+
+   /**
+    * \name GLSL shader/program functions.
+    */
+   /*@{*/
+   /**
+    * Called when a shader program is to be linked.
+    *
+    * This is optional and gives drivers an opportunity to inspect the context
+    * and prepare for LinkShader, which may be deferred to another thread.
+    */
+   void (*NotifyLinkShader)(struct gl_context *ctx,
+                            struct gl_shader_program *shader);
+   /**
+    * Called when a shader program is linked.
+    *
+    * This gives drivers an opportunity to clone the IR and make their
+    * own transformations on it for the purposes of code generation.
+    */
+   GLboolean (*LinkShader)(struct gl_context *ctx,
+                           struct gl_shader_program *shader);
+   /*@}*/
+
+   /**
+    * \name State-changing functions.
+    *
+    * \note drawing functions are above.
+    *
+    * These functions are called by their corresponding OpenGL API functions.
+    * They are \e also called by the gl_PopAttrib() function.
+    * More functions like these may be added to the device driver in the future.
+    */
+   /*@{*/
+   /** Specify the alpha test function */
+   void (*AlphaFunc)(struct gl_context *ctx, GLenum func, GLfloat ref);
+   /** Set the blend color */
+   void (*BlendColor)(struct gl_context *ctx, const GLfloat color[4]);
+   /** Set the blend equation */
+   void (*BlendEquationSeparate)(struct gl_context *ctx,
+                                 GLenum modeRGB, GLenum modeA);
+   void (*BlendEquationSeparatei)(struct gl_context *ctx, GLuint buffer,
+                                  GLenum modeRGB, GLenum modeA);
+   /** Specify pixel arithmetic */
+   void (*BlendFuncSeparate)(struct gl_context *ctx,
+                             GLenum sfactorRGB, GLenum dfactorRGB,
+                             GLenum sfactorA, GLenum dfactorA);
+   void (*BlendFuncSeparatei)(struct gl_context *ctx, GLuint buffer,
+                              GLenum sfactorRGB, GLenum dfactorRGB,
+                              GLenum sfactorA, GLenum dfactorA);
+   /** Specify a plane against which all geometry is clipped */
+   void (*ClipPlane)(struct gl_context *ctx, GLenum plane, const GLfloat *eq);
+   /** Enable and disable writing of frame buffer color components */
+   void (*ColorMask)(struct gl_context *ctx, GLboolean rmask, GLboolean gmask,
+                     GLboolean bmask, GLboolean amask );
+   void (*ColorMaskIndexed)(struct gl_context *ctx, GLuint buf, GLboolean rmask,
+                            GLboolean gmask, GLboolean bmask, GLboolean amask);
+   /** Cause a material color to track the current color */
+   void (*ColorMaterial)(struct gl_context *ctx, GLenum face, GLenum mode);
+   /** Specify whether front- or back-facing facets can be culled */
+   void (*CullFace)(struct gl_context *ctx, GLenum mode);
+   /** Define front- and back-facing polygons */
+   void (*FrontFace)(struct gl_context *ctx, GLenum mode);
+   /** Specify the value used for depth buffer comparisons */
+   void (*DepthFunc)(struct gl_context *ctx, GLenum func);
+   /** Enable or disable writing into the depth buffer */
+   void (*DepthMask)(struct gl_context *ctx, GLboolean flag);
+   /** Specify mapping of depth values from NDC to window coordinates */
+   void (*DepthRange)(struct gl_context *ctx);
+   /** Specify the current buffer for writing */
+   void (*DrawBuffer)( struct gl_context *ctx, GLenum buffer );
+   /** Specify the buffers for writing for fragment programs */
+   void (*DrawBuffers)(struct gl_context *ctx, GLsizei n, const GLenum *buffers);
+   /** Enable or disable server-side gl capabilities */
+   void (*Enable)(struct gl_context *ctx, GLenum cap, GLboolean state);
+   /** Specify fog parameters */
+   void (*Fogfv)(struct gl_context *ctx, GLenum pname, const GLfloat *params);
+   /** Specify implementation-specific hints */
+   void (*Hint)(struct gl_context *ctx, GLenum target, GLenum mode);
+   /** Set light source parameters.
+    * Note: for GL_POSITION and GL_SPOT_DIRECTION, params will have already
+    * been transformed to eye-space.
+    */
+   void (*Lightfv)(struct gl_context *ctx, GLenum light,
+		   GLenum pname, const GLfloat *params );
+   /** Set the lighting model parameters */
+   void (*LightModelfv)(struct gl_context *ctx, GLenum pname,
+                        const GLfloat *params);
+   /** Specify the line stipple pattern */
+   void (*LineStipple)(struct gl_context *ctx, GLint factor, GLushort pattern );
+   /** Specify the width of rasterized lines */
+   void (*LineWidth)(struct gl_context *ctx, GLfloat width);
+   /** Specify a logical pixel operation for color index rendering */
+   void (*LogicOpcode)(struct gl_context *ctx, GLenum opcode);
+   void (*PointParameterfv)(struct gl_context *ctx, GLenum pname,
+                            const GLfloat *params);
+   /** Specify the diameter of rasterized points */
+   void (*PointSize)(struct gl_context *ctx, GLfloat size);
+   /** Select a polygon rasterization mode */
+   void (*PolygonMode)(struct gl_context *ctx, GLenum face, GLenum mode);
+   /** Set the scale and units used to calculate depth values */
+   void (*PolygonOffset)(struct gl_context *ctx, GLfloat factor, GLfloat units);
+   /** Set the polygon stippling pattern */
+   void (*PolygonStipple)(struct gl_context *ctx, const GLubyte *mask );
+   /** Specify the current buffer for reading */
+   void (*ReadBuffer)( struct gl_context *ctx, GLenum buffer );
+   /** Set rasterization mode */
+   void (*RenderMode)(struct gl_context *ctx, GLenum mode );
+   /** Define the scissor box */
+   void (*Scissor)(struct gl_context *ctx);
+   /** Select flat or smooth shading */
+   void (*ShadeModel)(struct gl_context *ctx, GLenum mode);
+   /** OpenGL 2.0 two-sided StencilFunc */
+   void (*StencilFuncSeparate)(struct gl_context *ctx, GLenum face, GLenum func,
+                               GLint ref, GLuint mask);
+   /** OpenGL 2.0 two-sided StencilMask */
+   void (*StencilMaskSeparate)(struct gl_context *ctx, GLenum face, GLuint mask);
+   /** OpenGL 2.0 two-sided StencilOp */
+   void (*StencilOpSeparate)(struct gl_context *ctx, GLenum face, GLenum fail,
+                             GLenum zfail, GLenum zpass);
+   /** Control the generation of texture coordinates */
+   void (*TexGen)(struct gl_context *ctx, GLenum coord, GLenum pname,
+		  const GLfloat *params);
+   /** Set texture environment parameters */
+   void (*TexEnv)(struct gl_context *ctx, GLenum target, GLenum pname,
+                  const GLfloat *param);
+   /** Set texture parameters */
+   void (*TexParameter)(struct gl_context *ctx,
+                        struct gl_texture_object *texObj,
+                        GLenum pname, const GLfloat *params);
+   /** Set the viewport */
+   void (*Viewport)(struct gl_context *ctx);
+   /*@}*/
+
+
+   /**
+    * \name Vertex/pixel buffer object functions
+    */
+   /*@{*/
+   struct gl_buffer_object * (*NewBufferObject)(struct gl_context *ctx,
+                                                GLuint buffer, GLenum target);
+   
+   void (*DeleteBuffer)( struct gl_context *ctx, struct gl_buffer_object *obj );
+
+   GLboolean (*BufferData)(struct gl_context *ctx, GLenum target,
+                           GLsizeiptrARB size, const GLvoid *data, GLenum usage,
+                           GLenum storageFlags, struct gl_buffer_object *obj);
+
+   void (*BufferSubData)( struct gl_context *ctx, GLintptrARB offset,
+			  GLsizeiptrARB size, const GLvoid *data,
+			  struct gl_buffer_object *obj );
+
+   void (*GetBufferSubData)( struct gl_context *ctx,
+			     GLintptrARB offset, GLsizeiptrARB size,
+			     GLvoid *data, struct gl_buffer_object *obj );
+
+   void (*ClearBufferSubData)( struct gl_context *ctx,
+                               GLintptr offset, GLsizeiptr size,
+                               const GLvoid *clearValue,
+                               GLsizeiptr clearValueSize,
+                               struct gl_buffer_object *obj );
+
+   void (*CopyBufferSubData)( struct gl_context *ctx,
+                              struct gl_buffer_object *src,
+                              struct gl_buffer_object *dst,
+                              GLintptr readOffset, GLintptr writeOffset,
+                              GLsizeiptr size );
+
+   /**
+    * Returns a pointer to the start of the mapped range.
+    * May return NULL if MESA_MAP_NOWAIT_BIT is set in \p access.
+    */
+   void * (*MapBufferRange)( struct gl_context *ctx, GLintptr offset,
+                             GLsizeiptr length, GLbitfield access,
+                             struct gl_buffer_object *obj,
+                             gl_map_buffer_index index);
+
+   void (*FlushMappedBufferRange)(struct gl_context *ctx,
+                                  GLintptr offset, GLsizeiptr length,
+                                  struct gl_buffer_object *obj,
+                                  gl_map_buffer_index index);
+
+   GLboolean (*UnmapBuffer)( struct gl_context *ctx,
+			     struct gl_buffer_object *obj,
+                             gl_map_buffer_index index);
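+
+   /*
+    * A sketch of a typical map/write/unmap sequence (hypothetical caller
+    * code; MAP_USER is assumed to be one of the gl_map_buffer_index values):
+    *
+    * \code
+    *    void *ptr = ctx->Driver.MapBufferRange(ctx, offset, size,
+    *                                           GL_MAP_WRITE_BIT, obj,
+    *                                           MAP_USER);
+    *    if (ptr) {
+    *       memcpy(ptr, data, size);
+    *       ctx->Driver.UnmapBuffer(ctx, obj, MAP_USER);
+    *    }
+    * \endcode
+    */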
+   /*@}*/
+
+   /**
+    * \name Functions for GL_APPLE_object_purgeable
+    */
+   /*@{*/
+   /* variations on ObjectPurgeable */
+   GLenum (*BufferObjectPurgeable)(struct gl_context *ctx,
+                                   struct gl_buffer_object *obj, GLenum option);
+   GLenum (*RenderObjectPurgeable)(struct gl_context *ctx,
+                                   struct gl_renderbuffer *obj, GLenum option);
+   GLenum (*TextureObjectPurgeable)(struct gl_context *ctx,
+                                    struct gl_texture_object *obj,
+                                    GLenum option);
+
+   /* variations on ObjectUnpurgeable */
+   GLenum (*BufferObjectUnpurgeable)(struct gl_context *ctx,
+                                     struct gl_buffer_object *obj,
+                                     GLenum option);
+   GLenum (*RenderObjectUnpurgeable)(struct gl_context *ctx,
+                                     struct gl_renderbuffer *obj,
+                                     GLenum option);
+   GLenum (*TextureObjectUnpurgeable)(struct gl_context *ctx,
+                                      struct gl_texture_object *obj,
+                                      GLenum option);
+   /*@}*/
+
+   /**
+    * \name Functions for GL_EXT_framebuffer_{object,blit,discard}.
+    */
+   /*@{*/
+   struct gl_framebuffer * (*NewFramebuffer)(struct gl_context *ctx,
+                                             GLuint name);
+   struct gl_renderbuffer * (*NewRenderbuffer)(struct gl_context *ctx,
+                                               GLuint name);
+   void (*BindFramebuffer)(struct gl_context *ctx, GLenum target,
+                           struct gl_framebuffer *drawFb,
+                           struct gl_framebuffer *readFb);
+   void (*FramebufferRenderbuffer)(struct gl_context *ctx, 
+                                   struct gl_framebuffer *fb,
+                                   GLenum attachment,
+                                   struct gl_renderbuffer *rb);
+   void (*RenderTexture)(struct gl_context *ctx,
+                         struct gl_framebuffer *fb,
+                         struct gl_renderbuffer_attachment *att);
+   void (*FinishRenderTexture)(struct gl_context *ctx,
+                               struct gl_renderbuffer *rb);
+   void (*ValidateFramebuffer)(struct gl_context *ctx,
+                               struct gl_framebuffer *fb);
+   /*@}*/
+   void (*BlitFramebuffer)(struct gl_context *ctx,
+                           GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1,
+                           GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1,
+                           GLbitfield mask, GLenum filter);
+   void (*DiscardFramebuffer)(struct gl_context *ctx,
+                              GLenum target, GLsizei numAttachments,
+                              const GLenum *attachments);
+
+   /**
+    * \name Query objects
+    */
+   /*@{*/
+   struct gl_query_object * (*NewQueryObject)(struct gl_context *ctx, GLuint id);
+   void (*DeleteQuery)(struct gl_context *ctx, struct gl_query_object *q);
+   void (*BeginQuery)(struct gl_context *ctx, struct gl_query_object *q);
+   void (*QueryCounter)(struct gl_context *ctx, struct gl_query_object *q);
+   void (*EndQuery)(struct gl_context *ctx, struct gl_query_object *q);
+   void (*CheckQuery)(struct gl_context *ctx, struct gl_query_object *q);
+   void (*WaitQuery)(struct gl_context *ctx, struct gl_query_object *q);
+   /*@}*/
+
+   /**
+    * \name Performance monitors
+    */
+   /*@{*/
+   struct gl_perf_monitor_object * (*NewPerfMonitor)(struct gl_context *ctx);
+   void (*DeletePerfMonitor)(struct gl_context *ctx,
+                             struct gl_perf_monitor_object *m);
+   GLboolean (*BeginPerfMonitor)(struct gl_context *ctx,
+                                 struct gl_perf_monitor_object *m);
+
+   /** Stop an active performance monitor, discarding results. */
+   void (*ResetPerfMonitor)(struct gl_context *ctx,
+                            struct gl_perf_monitor_object *m);
+   void (*EndPerfMonitor)(struct gl_context *ctx,
+                          struct gl_perf_monitor_object *m);
+   GLboolean (*IsPerfMonitorResultAvailable)(struct gl_context *ctx,
+                                             struct gl_perf_monitor_object *m);
+   void (*GetPerfMonitorResult)(struct gl_context *ctx,
+                                struct gl_perf_monitor_object *m,
+                                GLsizei dataSize,
+                                GLuint *data,
+                                GLint *bytesWritten);
+   /*@}*/
+
+
+   /**
+    * \name Vertex Array objects
+    */
+   /*@{*/
+   struct gl_vertex_array_object * (*NewArrayObject)(struct gl_context *ctx, GLuint id);
+   void (*DeleteArrayObject)(struct gl_context *ctx, struct gl_vertex_array_object *);
+   void (*BindArrayObject)(struct gl_context *ctx, struct gl_vertex_array_object *);
+   /*@}*/
+
+   /**
+    * \name GLSL-related functions (ARB extensions and OpenGL 2.x)
+    */
+   /*@{*/
+   struct gl_shader *(*NewShader)(struct gl_context *ctx,
+                                  GLuint name, GLenum type);
+   void (*DeleteShader)(struct gl_context *ctx, struct gl_shader *shader);
+   struct gl_shader_program *(*NewShaderProgram)(struct gl_context *ctx,
+                                                 GLuint name);
+   void (*DeleteShaderProgram)(struct gl_context *ctx,
+                               struct gl_shader_program *shProg);
+   void (*UseProgram)(struct gl_context *ctx, struct gl_shader_program *shProg);
+   /*@}*/
+
+
+   /**
+    * \name Support for multiple T&L engines
+    */
+   /*@{*/
+
+   /**
+    * Set by the driver-supplied T&L engine.  
+    *
+    * Set to PRIM_OUTSIDE_BEGIN_END when outside glBegin()/glEnd().
+    */
+   GLuint CurrentExecPrimitive;
+
+   /**
+    * Current glBegin state of an in-progress compilation.  May be
+    * GL_POINTS, GL_TRIANGLE_STRIP, etc. or PRIM_OUTSIDE_BEGIN_END
+    * or PRIM_UNKNOWN.
+    */
+   GLuint CurrentSavePrimitive;
+
+
+#define FLUSH_STORED_VERTICES 0x1
+#define FLUSH_UPDATE_CURRENT  0x2
+   /**
+    * Set by the driver-supplied T&L engine whenever vertices are buffered
+    * between glBegin()/glEnd() pairs or __struct gl_contextRec::Current
+    * is not updated.  A bitmask of the FLUSH_x values above.
+    *
+    * The dd_function_table::FlushVertices call below may be used to resolve
+    * these conditions.
+    */
+   GLbitfield NeedFlush;
+
+   /** Need to call SaveFlushVertices() upon state change? */
+   GLboolean SaveNeedFlush;
+
+   /**
+    * Called prior to any of the GLvertexformat functions being called.
+    * Paired with Driver.FlushVertices().
+    */
+   void (*BeginVertices)( struct gl_context *ctx );
+
+   /**
+    * If called inside glBegin()/glEnd(), this should ASSERT(0).  Otherwise,
+    * if the FLUSH_STORED_VERTICES bit in \p flags is set, flush any buffered
+    * vertices; if the FLUSH_UPDATE_CURRENT bit is set, update
+    * __struct gl_contextRec::Current and gl_light_attrib::Material.
+    *
+    * Note that the default T&L engine never clears the
+    * FLUSH_UPDATE_CURRENT bit, even after performing the update.
+    */
+   void (*FlushVertices)( struct gl_context *ctx, GLuint flags );
+   void (*SaveFlushVertices)( struct gl_context *ctx );
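+
+   /*
+    * Typical resolution of the NeedFlush conditions above (a sketch of the
+    * pattern core Mesa wraps in a macro):
+    *
+    * \code
+    *    if (ctx->Driver.NeedFlush & FLUSH_STORED_VERTICES)
+    *       ctx->Driver.FlushVertices(ctx, FLUSH_STORED_VERTICES);
+    * \endcode
+    */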
+
+   /**
+    * Give the driver the opportunity to hook in its own vtxfmt for
+    * compiling optimized display lists.  This is called on each valid
+    * glBegin() during list compilation.
+    */
+   GLboolean (*NotifySaveBegin)( struct gl_context *ctx, GLenum mode );
+
+   /**
+    * Notify driver that the special derived value _NeedEyeCoords has
+    * changed.
+    */
+   void (*LightingSpaceChange)( struct gl_context *ctx );
+
+   /**
+    * Called by glNewList().
+    *
+    * Let the T&L component know what is going on with display lists
+    * in time to make changes to dispatch tables, etc.
+    */
+   void (*NewList)( struct gl_context *ctx, GLuint list, GLenum mode );
+   /**
+    * Called by glEndList().
+    *
+    * \sa dd_function_table::NewList.
+    */
+   void (*EndList)( struct gl_context *ctx );
+
+   /**
+    * Called by glCallList(s).
+    *
+    * Notify the T&L component before and after calling a display list.
+    */
+   void (*BeginCallList)( struct gl_context *ctx, 
+			  struct gl_display_list *dlist );
+   /**
+    * Called by glEndCallList().
+    *
+    * \sa dd_function_table::BeginCallList.
+    */
+   void (*EndCallList)( struct gl_context *ctx );
+
+   /**@}*/
+
+   /**
+    * \name GL_ARB_sync interfaces
+    */
+   /*@{*/
+   struct gl_sync_object * (*NewSyncObject)(struct gl_context *, GLenum);
+   void (*FenceSync)(struct gl_context *, struct gl_sync_object *,
+                     GLenum, GLbitfield);
+   void (*DeleteSyncObject)(struct gl_context *, struct gl_sync_object *);
+   void (*CheckSync)(struct gl_context *, struct gl_sync_object *);
+   void (*ClientWaitSync)(struct gl_context *, struct gl_sync_object *,
+			  GLbitfield, GLuint64);
+   void (*ServerWaitSync)(struct gl_context *, struct gl_sync_object *,
+			  GLbitfield, GLuint64);
+   /*@}*/
+
+   /** GL_NV_conditional_render */
+   void (*BeginConditionalRender)(struct gl_context *ctx,
+                                  struct gl_query_object *q,
+                                  GLenum mode);
+   void (*EndConditionalRender)(struct gl_context *ctx,
+                                struct gl_query_object *q);
+
+   /**
+    * \name GL_OES_draw_texture interface
+    */
+   /*@{*/
+   void (*DrawTex)(struct gl_context *ctx, GLfloat x, GLfloat y, GLfloat z,
+                   GLfloat width, GLfloat height);
+   /*@}*/
+
+   /**
+    * \name GL_OES_EGL_image interface
+    */
+   void (*EGLImageTargetTexture2D)(struct gl_context *ctx, GLenum target,
+				   struct gl_texture_object *texObj,
+				   struct gl_texture_image *texImage,
+				   GLeglImageOES image_handle);
+   void (*EGLImageTargetRenderbufferStorage)(struct gl_context *ctx,
+					     struct gl_renderbuffer *rb,
+					     void *image_handle);
+
+   /**
+    * \name GL_EXT_transform_feedback interface
+    */
+   struct gl_transform_feedback_object *
+        (*NewTransformFeedback)(struct gl_context *ctx, GLuint name);
+   void (*DeleteTransformFeedback)(struct gl_context *ctx,
+                                   struct gl_transform_feedback_object *obj);
+   void (*BeginTransformFeedback)(struct gl_context *ctx, GLenum mode,
+                                  struct gl_transform_feedback_object *obj);
+   void (*EndTransformFeedback)(struct gl_context *ctx,
+                                struct gl_transform_feedback_object *obj);
+   void (*PauseTransformFeedback)(struct gl_context *ctx,
+                                  struct gl_transform_feedback_object *obj);
+   void (*ResumeTransformFeedback)(struct gl_context *ctx,
+                                   struct gl_transform_feedback_object *obj);
+
+   /**
+    * Return the number of vertices written to a stream during the last
+    * Begin/EndTransformFeedback block.
+    */
+   GLsizei (*GetTransformFeedbackVertexCount)(struct gl_context *ctx,
+                                       struct gl_transform_feedback_object *obj,
+                                       GLuint stream);
+
+   /**
+    * \name GL_NV_texture_barrier interface
+    */
+   void (*TextureBarrier)(struct gl_context *ctx);
+
+   /**
+    * \name GL_ARB_sampler_objects
+    */
+   struct gl_sampler_object * (*NewSamplerObject)(struct gl_context *ctx,
+                                                  GLuint name);
+   void (*DeleteSamplerObject)(struct gl_context *ctx,
+                               struct gl_sampler_object *samp);
+
+   /**
+    * \name GL_ARB_timer_query
+    *
+    * Return a timestamp in nanoseconds.  This should be equivalent to the
+    * value returned via glGetInteger64v(GL_TIMESTAMP, &result).
+    */
+   uint64_t (*GetTimestamp)(struct gl_context *ctx);
+
+   /**
+    * \name GL_ARB_texture_multisample
+    */
+   void (*GetSamplePosition)(struct gl_context *ctx,
+                             struct gl_framebuffer *fb,
+                             GLuint index,
+                             GLfloat *outValue);
+
+   /**
+    * \name NV_vdpau_interop interface
+    */
+   void (*VDPAUMapSurface)(struct gl_context *ctx, GLenum target,
+                           GLenum access, GLboolean output,
+                           struct gl_texture_object *texObj,
+                           struct gl_texture_image *texImage,
+                           const GLvoid *vdpSurface, GLuint index);
+   void (*VDPAUUnmapSurface)(struct gl_context *ctx, GLenum target,
+                             GLenum access, GLboolean output,
+                             struct gl_texture_object *texObj,
+                             struct gl_texture_image *texImage,
+                             const GLvoid *vdpSurface, GLuint index);
+
+   /**
+    * Query reset status for GL_ARB_robustness
+    *
+    * Per \c glGetGraphicsResetStatusARB, this function should return a
+    * non-zero value once after a reset.  If a reset is non-atomic, the
+    * non-zero status should be returned for the duration of the reset.
+    */
+   GLenum (*GetGraphicsResetStatus)(struct gl_context *ctx);
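+
+   /*
+    * A trivial implementation for hardware that never loses the context
+    * might be (hypothetical example):
+    *
+    * \code
+    *    static GLenum exampleGetGraphicsResetStatus(struct gl_context *ctx)
+    *    {
+    *       return GL_NO_ERROR;   // zero: no reset has occurred
+    *    }
+    * \endcode
+    */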
+
+   /**
+    * \name GL_ARB_shader_image_load_store interface.
+    */
+   /** @{ */
+   void (*BindImageTexture)(struct gl_context *ctx,
+                            struct gl_image_unit *unit,
+                            struct gl_texture_object *texObj,
+                            GLint level, GLboolean layered, GLint layer,
+                            GLenum access, GLenum format);
+
+   void (*MemoryBarrier)(struct gl_context *ctx, GLbitfield barriers);
+   /** @} */
+};
+
+
+/**
+ * Per-vertex functions.
+ *
+ * These are the functions which can appear between glBegin and glEnd.
+ * Depending on whether we're inside or outside a glBegin/End pair
+ * and whether we're in immediate mode or building a display list, these
+ * functions behave differently.  This structure allows us to switch
+ * between those modes more easily.
+ *
+ * Generally, these pointers point to functions in the VBO module.
+ */
+typedef struct {
+   void (GLAPIENTRYP ArrayElement)( GLint );
+   void (GLAPIENTRYP Color3f)( GLfloat, GLfloat, GLfloat );
+   void (GLAPIENTRYP Color3fv)( const GLfloat * );
+   void (GLAPIENTRYP Color4f)( GLfloat, GLfloat, GLfloat, GLfloat );
+   void (GLAPIENTRYP Color4fv)( const GLfloat * );
+   void (GLAPIENTRYP EdgeFlag)( GLboolean );
+   void (GLAPIENTRYP EvalCoord1f)( GLfloat );
+   void (GLAPIENTRYP EvalCoord1fv)( const GLfloat * );
+   void (GLAPIENTRYP EvalCoord2f)( GLfloat, GLfloat );
+   void (GLAPIENTRYP EvalCoord2fv)( const GLfloat * );
+   void (GLAPIENTRYP EvalPoint1)( GLint );
+   void (GLAPIENTRYP EvalPoint2)( GLint, GLint );
+   void (GLAPIENTRYP FogCoordfEXT)( GLfloat );
+   void (GLAPIENTRYP FogCoordfvEXT)( const GLfloat * );
+   void (GLAPIENTRYP Indexf)( GLfloat );
+   void (GLAPIENTRYP Indexfv)( const GLfloat * );
+   void (GLAPIENTRYP Materialfv)( GLenum face, GLenum pname, const GLfloat * );
+   void (GLAPIENTRYP MultiTexCoord1fARB)( GLenum, GLfloat );
+   void (GLAPIENTRYP MultiTexCoord1fvARB)( GLenum, const GLfloat * );
+   void (GLAPIENTRYP MultiTexCoord2fARB)( GLenum, GLfloat, GLfloat );
+   void (GLAPIENTRYP MultiTexCoord2fvARB)( GLenum, const GLfloat * );
+   void (GLAPIENTRYP MultiTexCoord3fARB)( GLenum, GLfloat, GLfloat, GLfloat );
+   void (GLAPIENTRYP MultiTexCoord3fvARB)( GLenum, const GLfloat * );
+   void (GLAPIENTRYP MultiTexCoord4fARB)( GLenum, GLfloat, GLfloat, GLfloat, GLfloat );
+   void (GLAPIENTRYP MultiTexCoord4fvARB)( GLenum, const GLfloat * );
+   void (GLAPIENTRYP Normal3f)( GLfloat, GLfloat, GLfloat );
+   void (GLAPIENTRYP Normal3fv)( const GLfloat * );
+   void (GLAPIENTRYP SecondaryColor3fEXT)( GLfloat, GLfloat, GLfloat );
+   void (GLAPIENTRYP SecondaryColor3fvEXT)( const GLfloat * );
+   void (GLAPIENTRYP TexCoord1f)( GLfloat );
+   void (GLAPIENTRYP TexCoord1fv)( const GLfloat * );
+   void (GLAPIENTRYP TexCoord2f)( GLfloat, GLfloat );
+   void (GLAPIENTRYP TexCoord2fv)( const GLfloat * );
+   void (GLAPIENTRYP TexCoord3f)( GLfloat, GLfloat, GLfloat );
+   void (GLAPIENTRYP TexCoord3fv)( const GLfloat * );
+   void (GLAPIENTRYP TexCoord4f)( GLfloat, GLfloat, GLfloat, GLfloat );
+   void (GLAPIENTRYP TexCoord4fv)( const GLfloat * );
+   void (GLAPIENTRYP Vertex2f)( GLfloat, GLfloat );
+   void (GLAPIENTRYP Vertex2fv)( const GLfloat * );
+   void (GLAPIENTRYP Vertex3f)( GLfloat, GLfloat, GLfloat );
+   void (GLAPIENTRYP Vertex3fv)( const GLfloat * );
+   void (GLAPIENTRYP Vertex4f)( GLfloat, GLfloat, GLfloat, GLfloat );
+   void (GLAPIENTRYP Vertex4fv)( const GLfloat * );
+   void (GLAPIENTRYP CallList)( GLuint );
+   void (GLAPIENTRYP CallLists)( GLsizei, GLenum, const GLvoid * );
+   void (GLAPIENTRYP Begin)( GLenum );
+   void (GLAPIENTRYP End)( void );
+   void (GLAPIENTRYP PrimitiveRestartNV)( void );
+   /* Originally for GL_NV_vertex_program, now used only by dlist.c and friends */
+   void (GLAPIENTRYP VertexAttrib1fNV)( GLuint index, GLfloat x );
+   void (GLAPIENTRYP VertexAttrib1fvNV)( GLuint index, const GLfloat *v );
+   void (GLAPIENTRYP VertexAttrib2fNV)( GLuint index, GLfloat x, GLfloat y );
+   void (GLAPIENTRYP VertexAttrib2fvNV)( GLuint index, const GLfloat *v );
+   void (GLAPIENTRYP VertexAttrib3fNV)( GLuint index, GLfloat x, GLfloat y, GLfloat z );
+   void (GLAPIENTRYP VertexAttrib3fvNV)( GLuint index, const GLfloat *v );
+   void (GLAPIENTRYP VertexAttrib4fNV)( GLuint index, GLfloat x, GLfloat y, GLfloat z, GLfloat w );
+   void (GLAPIENTRYP VertexAttrib4fvNV)( GLuint index, const GLfloat *v );
+   /* GL_ARB_vertex_program */
+   void (GLAPIENTRYP VertexAttrib1fARB)( GLuint index, GLfloat x );
+   void (GLAPIENTRYP VertexAttrib1fvARB)( GLuint index, const GLfloat *v );
+   void (GLAPIENTRYP VertexAttrib2fARB)( GLuint index, GLfloat x, GLfloat y );
+   void (GLAPIENTRYP VertexAttrib2fvARB)( GLuint index, const GLfloat *v );
+   void (GLAPIENTRYP VertexAttrib3fARB)( GLuint index, GLfloat x, GLfloat y, GLfloat z );
+   void (GLAPIENTRYP VertexAttrib3fvARB)( GLuint index, const GLfloat *v );
+   void (GLAPIENTRYP VertexAttrib4fARB)( GLuint index, GLfloat x, GLfloat y, GLfloat z, GLfloat w );
+   void (GLAPIENTRYP VertexAttrib4fvARB)( GLuint index, const GLfloat *v );
+
+   /* GL_EXT_gpu_shader4 / GL 3.0 */
+   void (GLAPIENTRYP VertexAttribI1i)( GLuint index, GLint x);
+   void (GLAPIENTRYP VertexAttribI2i)( GLuint index, GLint x, GLint y);
+   void (GLAPIENTRYP VertexAttribI3i)( GLuint index, GLint x, GLint y, GLint z);
+   void (GLAPIENTRYP VertexAttribI4i)( GLuint index, GLint x, GLint y, GLint z, GLint w);
+   void (GLAPIENTRYP VertexAttribI2iv)( GLuint index, const GLint *v);
+   void (GLAPIENTRYP VertexAttribI3iv)( GLuint index, const GLint *v);
+   void (GLAPIENTRYP VertexAttribI4iv)( GLuint index, const GLint *v);
+
+   void (GLAPIENTRYP VertexAttribI1ui)( GLuint index, GLuint x);
+   void (GLAPIENTRYP VertexAttribI2ui)( GLuint index, GLuint x, GLuint y);
+   void (GLAPIENTRYP VertexAttribI3ui)( GLuint index, GLuint x, GLuint y, GLuint z);
+   void (GLAPIENTRYP VertexAttribI4ui)( GLuint index, GLuint x, GLuint y, GLuint z, GLuint w);
+   void (GLAPIENTRYP VertexAttribI2uiv)( GLuint index, const GLuint *v);
+   void (GLAPIENTRYP VertexAttribI3uiv)( GLuint index, const GLuint *v);
+   void (GLAPIENTRYP VertexAttribI4uiv)( GLuint index, const GLuint *v);
+
+   /* GL_ARB_vertex_type_2_10_10_10_rev / GL 3.3 */
+   void (GLAPIENTRYP VertexP2ui)( GLenum type, GLuint value );
+   void (GLAPIENTRYP VertexP2uiv)( GLenum type, const GLuint *value);
+
+   void (GLAPIENTRYP VertexP3ui)( GLenum type, GLuint value );
+   void (GLAPIENTRYP VertexP3uiv)( GLenum type, const GLuint *value);
+
+   void (GLAPIENTRYP VertexP4ui)( GLenum type, GLuint value );
+   void (GLAPIENTRYP VertexP4uiv)( GLenum type, const GLuint *value);
+
+   void (GLAPIENTRYP TexCoordP1ui)( GLenum type, GLuint coords );
+   void (GLAPIENTRYP TexCoordP1uiv)( GLenum type, const GLuint *coords );
+
+   void (GLAPIENTRYP TexCoordP2ui)( GLenum type, GLuint coords );
+   void (GLAPIENTRYP TexCoordP2uiv)( GLenum type, const GLuint *coords );
+
+   void (GLAPIENTRYP TexCoordP3ui)( GLenum type, GLuint coords );
+   void (GLAPIENTRYP TexCoordP3uiv)( GLenum type, const GLuint *coords );
+
+   void (GLAPIENTRYP TexCoordP4ui)( GLenum type, GLuint coords );
+   void (GLAPIENTRYP TexCoordP4uiv)( GLenum type, const GLuint *coords );
+
+   void (GLAPIENTRYP MultiTexCoordP1ui)( GLenum texture, GLenum type, GLuint coords );
+   void (GLAPIENTRYP MultiTexCoordP1uiv)( GLenum texture, GLenum type, const GLuint *coords );
+   void (GLAPIENTRYP MultiTexCoordP2ui)( GLenum texture, GLenum type, GLuint coords );
+   void (GLAPIENTRYP MultiTexCoordP2uiv)( GLenum texture, GLenum type, const GLuint *coords );
+   void (GLAPIENTRYP MultiTexCoordP3ui)( GLenum texture, GLenum type, GLuint coords );
+   void (GLAPIENTRYP MultiTexCoordP3uiv)( GLenum texture, GLenum type, const GLuint *coords );
+   void (GLAPIENTRYP MultiTexCoordP4ui)( GLenum texture, GLenum type, GLuint coords );
+   void (GLAPIENTRYP MultiTexCoordP4uiv)( GLenum texture, GLenum type, const GLuint *coords );
+
+   void (GLAPIENTRYP NormalP3ui)( GLenum type, GLuint coords );
+   void (GLAPIENTRYP NormalP3uiv)( GLenum type, const GLuint *coords );
+
+   void (GLAPIENTRYP ColorP3ui)( GLenum type, GLuint color );
+   void (GLAPIENTRYP ColorP3uiv)( GLenum type, const GLuint *color );
+
+   void (GLAPIENTRYP ColorP4ui)( GLenum type, GLuint color );
+   void (GLAPIENTRYP ColorP4uiv)( GLenum type, const GLuint *color );
+
+   void (GLAPIENTRYP SecondaryColorP3ui)( GLenum type, GLuint color );
+   void (GLAPIENTRYP SecondaryColorP3uiv)( GLenum type, const GLuint *color );
+
+   void (GLAPIENTRYP VertexAttribP1ui)( GLuint index, GLenum type,
+                                        GLboolean normalized, GLuint value);
+   void (GLAPIENTRYP VertexAttribP2ui)( GLuint index, GLenum type,
+                                        GLboolean normalized, GLuint value);
+   void (GLAPIENTRYP VertexAttribP3ui)( GLuint index, GLenum type,
+                                        GLboolean normalized, GLuint value);
+   void (GLAPIENTRYP VertexAttribP4ui)( GLuint index, GLenum type,
+                                        GLboolean normalized, GLuint value);
+   void (GLAPIENTRYP VertexAttribP1uiv)( GLuint index, GLenum type,
+                                         GLboolean normalized,
+                                         const GLuint *value);
+   void (GLAPIENTRYP VertexAttribP2uiv)( GLuint index, GLenum type,
+                                         GLboolean normalized,
+                                         const GLuint *value);
+   void (GLAPIENTRYP VertexAttribP3uiv)( GLuint index, GLenum type,
+                                         GLboolean normalized,
+                                         const GLuint *value);
+   void (GLAPIENTRYP VertexAttribP4uiv)( GLuint index, GLenum type,
+                                         GLboolean normalized,
+                                         const GLuint *value);
+} GLvertexformat;
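+
+/*
+ * Illustrative use of the table (a sketch, not Mesa's actual dispatch code;
+ * save_vtxfmt and exec_vtxfmt are hypothetical names): because every
+ * per-vertex entry point goes through one of these pointers, switching
+ * between immediate-mode execution and display-list compilation is just a
+ * matter of installing a different table.
+ *
+ * \code
+ *    const GLvertexformat *vfmt = compiling_list ? &save_vtxfmt  // record
+ *                                                : &exec_vtxfmt; // execute
+ *    vfmt->Vertex3f(x, y, z);
+ * \endcode
+ */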
+
+
+#endif /* DD_INCLUDED */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/enums.c b/icd/intel/compiler/mesa-utils/src/mesa/main/enums.c
new file mode 100644
index 0000000..50e4409
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/enums.c
@@ -0,0 +1,4139 @@
+/* DO NOT EDIT - This file was generated automatically by the gl_enums.py script (from Mesa) */
+
+/*
+ * Copyright (C) 1999-2005 Brian Paul All Rights Reserved.
+ * All Rights Reserved.
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sub license,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ * 
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.  IN NO EVENT SHALL
+ * BRIAN PAUL,
+ * AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
+ * OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "icd-utils.h" // LunarG: ADD
+#include "main/glheader.h"
+#include "main/enums.h"
+#include "main/imports.h"
+#include "main/mtypes.h"
+
+typedef struct PACKED {
+   uint16_t offset;
+   int n;
+} enum_elt;
+
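+/*
+ * Each enum_elt pairs a GLenum value (n) with a byte offset into the string
+ * table below, so translating a value into its name is just pointer
+ * arithmetic.  A minimal lookup sketch (name_of() is a hypothetical helper;
+ * the real lookup code appears later in the generated file):
+ *
+ *    const char *name_of(const enum_elt *elts, size_t count, int nr)
+ *    {
+ *       size_t i;
+ *       for (i = 0; i < count; i++)
+ *          if (elts[i].n == nr)
+ *             return enum_string_table + elts[i].offset;
+ *       return NULL;
+ *    }
+ */
+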
+LONGSTRING static const char enum_string_table[] = 
+   "GL_FALSE\0"
+   "GL_LINES\0"
+   "GL_LINE_LOOP\0"
+   "GL_LINE_STRIP\0"
+   "GL_TRIANGLES\0"
+   "GL_TRIANGLE_STRIP\0"
+   "GL_TRIANGLE_FAN\0"
+   "GL_QUADS\0"
+   "GL_QUAD_STRIP\0"
+   "GL_POLYGON\0"
+   "GL_LINES_ADJACENCY\0"
+   "GL_LINE_STRIP_ADJACENCY\0"
+   "GL_TRIANGLES_ADJACENCY\0"
+   "GL_TRIANGLE_STRIP_ADJACENCY\0"
+   "GL_POLYGON_STIPPLE_BIT\0"
+   "GL_PIXEL_MODE_BIT\0"
+   "GL_LIGHTING_BIT\0"
+   "GL_FOG_BIT\0"
+   "GL_ACCUM\0"
+   "GL_LOAD\0"
+   "GL_RETURN\0"
+   "GL_MULT\0"
+   "GL_ADD\0"
+   "GL_NEVER\0"
+   "GL_LESS\0"
+   "GL_EQUAL\0"
+   "GL_LEQUAL\0"
+   "GL_GREATER\0"
+   "GL_NOTEQUAL\0"
+   "GL_GEQUAL\0"
+   "GL_ALWAYS\0"
+   "GL_SRC_COLOR\0"
+   "GL_ONE_MINUS_SRC_COLOR\0"
+   "GL_SRC_ALPHA\0"
+   "GL_ONE_MINUS_SRC_ALPHA\0"
+   "GL_DST_ALPHA\0"
+   "GL_ONE_MINUS_DST_ALPHA\0"
+   "GL_DST_COLOR\0"
+   "GL_ONE_MINUS_DST_COLOR\0"
+   "GL_SRC_ALPHA_SATURATE\0"
+   "GL_FRONT_LEFT\0"
+   "GL_FRONT_RIGHT\0"
+   "GL_BACK_LEFT\0"
+   "GL_BACK_RIGHT\0"
+   "GL_FRONT\0"
+   "GL_BACK\0"
+   "GL_LEFT\0"
+   "GL_RIGHT\0"
+   "GL_FRONT_AND_BACK\0"
+   "GL_AUX0\0"
+   "GL_AUX1\0"
+   "GL_AUX2\0"
+   "GL_AUX3\0"
+   "GL_INVALID_ENUM\0"
+   "GL_INVALID_VALUE\0"
+   "GL_INVALID_OPERATION\0"
+   "GL_STACK_OVERFLOW\0"
+   "GL_STACK_UNDERFLOW\0"
+   "GL_OUT_OF_MEMORY\0"
+   "GL_INVALID_FRAMEBUFFER_OPERATION\0"
+   "GL_2D\0"
+   "GL_3D\0"
+   "GL_3D_COLOR\0"
+   "GL_3D_COLOR_TEXTURE\0"
+   "GL_4D_COLOR_TEXTURE\0"
+   "GL_PASS_THROUGH_TOKEN\0"
+   "GL_POINT_TOKEN\0"
+   "GL_LINE_TOKEN\0"
+   "GL_POLYGON_TOKEN\0"
+   "GL_BITMAP_TOKEN\0"
+   "GL_DRAW_PIXEL_TOKEN\0"
+   "GL_COPY_PIXEL_TOKEN\0"
+   "GL_LINE_RESET_TOKEN\0"
+   "GL_EXP\0"
+   "GL_EXP2\0"
+   "GL_CW\0"
+   "GL_CCW\0"
+   "GL_COEFF\0"
+   "GL_ORDER\0"
+   "GL_DOMAIN\0"
+   "GL_CURRENT_COLOR\0"
+   "GL_CURRENT_INDEX\0"
+   "GL_CURRENT_NORMAL\0"
+   "GL_CURRENT_TEXTURE_COORDS\0"
+   "GL_CURRENT_RASTER_COLOR\0"
+   "GL_CURRENT_RASTER_INDEX\0"
+   "GL_CURRENT_RASTER_TEXTURE_COORDS\0"
+   "GL_CURRENT_RASTER_POSITION\0"
+   "GL_CURRENT_RASTER_POSITION_VALID\0"
+   "GL_CURRENT_RASTER_DISTANCE\0"
+   "GL_POINT_SMOOTH\0"
+   "GL_POINT_SIZE\0"
+   "GL_POINT_SIZE_RANGE\0"
+   "GL_POINT_SIZE_GRANULARITY\0"
+   "GL_LINE_SMOOTH\0"
+   "GL_LINE_WIDTH\0"
+   "GL_LINE_WIDTH_RANGE\0"
+   "GL_LINE_WIDTH_GRANULARITY\0"
+   "GL_LINE_STIPPLE\0"
+   "GL_LINE_STIPPLE_PATTERN\0"
+   "GL_LINE_STIPPLE_REPEAT\0"
+   "GL_LIST_MODE\0"
+   "GL_MAX_LIST_NESTING\0"
+   "GL_LIST_BASE\0"
+   "GL_LIST_INDEX\0"
+   "GL_POLYGON_MODE\0"
+   "GL_POLYGON_SMOOTH\0"
+   "GL_POLYGON_STIPPLE\0"
+   "GL_EDGE_FLAG\0"
+   "GL_CULL_FACE\0"
+   "GL_CULL_FACE_MODE\0"
+   "GL_FRONT_FACE\0"
+   "GL_LIGHTING\0"
+   "GL_LIGHT_MODEL_LOCAL_VIEWER\0"
+   "GL_LIGHT_MODEL_TWO_SIDE\0"
+   "GL_LIGHT_MODEL_AMBIENT\0"
+   "GL_SHADE_MODEL\0"
+   "GL_COLOR_MATERIAL_FACE\0"
+   "GL_COLOR_MATERIAL_PARAMETER\0"
+   "GL_COLOR_MATERIAL\0"
+   "GL_FOG\0"
+   "GL_FOG_INDEX\0"
+   "GL_FOG_DENSITY\0"
+   "GL_FOG_START\0"
+   "GL_FOG_END\0"
+   "GL_FOG_MODE\0"
+   "GL_FOG_COLOR\0"
+   "GL_DEPTH_RANGE\0"
+   "GL_DEPTH_TEST\0"
+   "GL_DEPTH_WRITEMASK\0"
+   "GL_DEPTH_CLEAR_VALUE\0"
+   "GL_DEPTH_FUNC\0"
+   "GL_ACCUM_CLEAR_VALUE\0"
+   "GL_STENCIL_TEST\0"
+   "GL_STENCIL_CLEAR_VALUE\0"
+   "GL_STENCIL_FUNC\0"
+   "GL_STENCIL_VALUE_MASK\0"
+   "GL_STENCIL_FAIL\0"
+   "GL_STENCIL_PASS_DEPTH_FAIL\0"
+   "GL_STENCIL_PASS_DEPTH_PASS\0"
+   "GL_STENCIL_REF\0"
+   "GL_STENCIL_WRITEMASK\0"
+   "GL_MATRIX_MODE\0"
+   "GL_NORMALIZE\0"
+   "GL_VIEWPORT\0"
+   "GL_MODELVIEW_STACK_DEPTH\0"
+   "GL_PROJECTION_STACK_DEPTH\0"
+   "GL_TEXTURE_STACK_DEPTH\0"
+   "GL_MODELVIEW_MATRIX\0"
+   "GL_PROJECTION_MATRIX\0"
+   "GL_TEXTURE_MATRIX\0"
+   "GL_ATTRIB_STACK_DEPTH\0"
+   "GL_CLIENT_ATTRIB_STACK_DEPTH\0"
+   "GL_ALPHA_TEST\0"
+   "GL_ALPHA_TEST_FUNC\0"
+   "GL_ALPHA_TEST_REF\0"
+   "GL_DITHER\0"
+   "GL_BLEND_DST\0"
+   "GL_BLEND_SRC\0"
+   "GL_BLEND\0"
+   "GL_LOGIC_OP_MODE\0"
+   "GL_INDEX_LOGIC_OP\0"
+   "GL_COLOR_LOGIC_OP\0"
+   "GL_AUX_BUFFERS\0"
+   "GL_DRAW_BUFFER\0"
+   "GL_READ_BUFFER\0"
+   "GL_SCISSOR_BOX\0"
+   "GL_SCISSOR_TEST\0"
+   "GL_INDEX_CLEAR_VALUE\0"
+   "GL_INDEX_WRITEMASK\0"
+   "GL_COLOR_CLEAR_VALUE\0"
+   "GL_COLOR_WRITEMASK\0"
+   "GL_INDEX_MODE\0"
+   "GL_RGBA_MODE\0"
+   "GL_DOUBLEBUFFER\0"
+   "GL_STEREO\0"
+   "GL_RENDER_MODE\0"
+   "GL_PERSPECTIVE_CORRECTION_HINT\0"
+   "GL_POINT_SMOOTH_HINT\0"
+   "GL_LINE_SMOOTH_HINT\0"
+   "GL_POLYGON_SMOOTH_HINT\0"
+   "GL_FOG_HINT\0"
+   "GL_TEXTURE_GEN_S\0"
+   "GL_TEXTURE_GEN_T\0"
+   "GL_TEXTURE_GEN_R\0"
+   "GL_TEXTURE_GEN_Q\0"
+   "GL_PIXEL_MAP_I_TO_I\0"
+   "GL_PIXEL_MAP_S_TO_S\0"
+   "GL_PIXEL_MAP_I_TO_R\0"
+   "GL_PIXEL_MAP_I_TO_G\0"
+   "GL_PIXEL_MAP_I_TO_B\0"
+   "GL_PIXEL_MAP_I_TO_A\0"
+   "GL_PIXEL_MAP_R_TO_R\0"
+   "GL_PIXEL_MAP_G_TO_G\0"
+   "GL_PIXEL_MAP_B_TO_B\0"
+   "GL_PIXEL_MAP_A_TO_A\0"
+   "GL_PIXEL_MAP_I_TO_I_SIZE\0"
+   "GL_PIXEL_MAP_S_TO_S_SIZE\0"
+   "GL_PIXEL_MAP_I_TO_R_SIZE\0"
+   "GL_PIXEL_MAP_I_TO_G_SIZE\0"
+   "GL_PIXEL_MAP_I_TO_B_SIZE\0"
+   "GL_PIXEL_MAP_I_TO_A_SIZE\0"
+   "GL_PIXEL_MAP_R_TO_R_SIZE\0"
+   "GL_PIXEL_MAP_G_TO_G_SIZE\0"
+   "GL_PIXEL_MAP_B_TO_B_SIZE\0"
+   "GL_PIXEL_MAP_A_TO_A_SIZE\0"
+   "GL_UNPACK_SWAP_BYTES\0"
+   "GL_UNPACK_LSB_FIRST\0"
+   "GL_UNPACK_ROW_LENGTH\0"
+   "GL_UNPACK_SKIP_ROWS\0"
+   "GL_UNPACK_SKIP_PIXELS\0"
+   "GL_UNPACK_ALIGNMENT\0"
+   "GL_PACK_SWAP_BYTES\0"
+   "GL_PACK_LSB_FIRST\0"
+   "GL_PACK_ROW_LENGTH\0"
+   "GL_PACK_SKIP_ROWS\0"
+   "GL_PACK_SKIP_PIXELS\0"
+   "GL_PACK_ALIGNMENT\0"
+   "GL_MAP_COLOR\0"
+   "GL_MAP_STENCIL\0"
+   "GL_INDEX_SHIFT\0"
+   "GL_INDEX_OFFSET\0"
+   "GL_RED_SCALE\0"
+   "GL_RED_BIAS\0"
+   "GL_ZOOM_X\0"
+   "GL_ZOOM_Y\0"
+   "GL_GREEN_SCALE\0"
+   "GL_GREEN_BIAS\0"
+   "GL_BLUE_SCALE\0"
+   "GL_BLUE_BIAS\0"
+   "GL_ALPHA_SCALE\0"
+   "GL_ALPHA_BIAS\0"
+   "GL_DEPTH_SCALE\0"
+   "GL_DEPTH_BIAS\0"
+   "GL_MAX_EVAL_ORDER\0"
+   "GL_MAX_LIGHTS\0"
+   "GL_MAX_CLIP_DISTANCES\0"
+   "GL_MAX_TEXTURE_SIZE\0"
+   "GL_MAX_PIXEL_MAP_TABLE\0"
+   "GL_MAX_ATTRIB_STACK_DEPTH\0"
+   "GL_MAX_MODELVIEW_STACK_DEPTH\0"
+   "GL_MAX_NAME_STACK_DEPTH\0"
+   "GL_MAX_PROJECTION_STACK_DEPTH\0"
+   "GL_MAX_TEXTURE_STACK_DEPTH\0"
+   "GL_MAX_VIEWPORT_DIMS\0"
+   "GL_MAX_CLIENT_ATTRIB_STACK_DEPTH\0"
+   "GL_SUBPIXEL_BITS\0"
+   "GL_INDEX_BITS\0"
+   "GL_RED_BITS\0"
+   "GL_GREEN_BITS\0"
+   "GL_BLUE_BITS\0"
+   "GL_ALPHA_BITS\0"
+   "GL_DEPTH_BITS\0"
+   "GL_STENCIL_BITS\0"
+   "GL_ACCUM_RED_BITS\0"
+   "GL_ACCUM_GREEN_BITS\0"
+   "GL_ACCUM_BLUE_BITS\0"
+   "GL_ACCUM_ALPHA_BITS\0"
+   "GL_NAME_STACK_DEPTH\0"
+   "GL_AUTO_NORMAL\0"
+   "GL_MAP1_COLOR_4\0"
+   "GL_MAP1_INDEX\0"
+   "GL_MAP1_NORMAL\0"
+   "GL_MAP1_TEXTURE_COORD_1\0"
+   "GL_MAP1_TEXTURE_COORD_2\0"
+   "GL_MAP1_TEXTURE_COORD_3\0"
+   "GL_MAP1_TEXTURE_COORD_4\0"
+   "GL_MAP1_VERTEX_3\0"
+   "GL_MAP1_VERTEX_4\0"
+   "GL_MAP2_COLOR_4\0"
+   "GL_MAP2_INDEX\0"
+   "GL_MAP2_NORMAL\0"
+   "GL_MAP2_TEXTURE_COORD_1\0"
+   "GL_MAP2_TEXTURE_COORD_2\0"
+   "GL_MAP2_TEXTURE_COORD_3\0"
+   "GL_MAP2_TEXTURE_COORD_4\0"
+   "GL_MAP2_VERTEX_3\0"
+   "GL_MAP2_VERTEX_4\0"
+   "GL_MAP1_GRID_DOMAIN\0"
+   "GL_MAP1_GRID_SEGMENTS\0"
+   "GL_MAP2_GRID_DOMAIN\0"
+   "GL_MAP2_GRID_SEGMENTS\0"
+   "GL_TEXTURE_1D\0"
+   "GL_TEXTURE_2D\0"
+   "GL_FEEDBACK_BUFFER_POINTER\0"
+   "GL_FEEDBACK_BUFFER_SIZE\0"
+   "GL_FEEDBACK_BUFFER_TYPE\0"
+   "GL_SELECTION_BUFFER_POINTER\0"
+   "GL_SELECTION_BUFFER_SIZE\0"
+   "GL_TEXTURE_WIDTH\0"
+   "GL_TEXTURE_HEIGHT\0"
+   "GL_TEXTURE_COMPONENTS\0"
+   "GL_TEXTURE_BORDER_COLOR\0"
+   "GL_TEXTURE_BORDER\0"
+   "GL_DONT_CARE\0"
+   "GL_FASTEST\0"
+   "GL_NICEST\0"
+   "GL_AMBIENT\0"
+   "GL_DIFFUSE\0"
+   "GL_SPECULAR\0"
+   "GL_POSITION\0"
+   "GL_SPOT_DIRECTION\0"
+   "GL_SPOT_EXPONENT\0"
+   "GL_SPOT_CUTOFF\0"
+   "GL_CONSTANT_ATTENUATION\0"
+   "GL_LINEAR_ATTENUATION\0"
+   "GL_QUADRATIC_ATTENUATION\0"
+   "GL_COMPILE\0"
+   "GL_COMPILE_AND_EXECUTE\0"
+   "GL_BYTE\0"
+   "GL_UNSIGNED_BYTE\0"
+   "GL_SHORT\0"
+   "GL_UNSIGNED_SHORT\0"
+   "GL_INT\0"
+   "GL_UNSIGNED_INT\0"
+   "GL_FLOAT\0"
+   "GL_2_BYTES\0"
+   "GL_3_BYTES\0"
+   "GL_4_BYTES\0"
+   "GL_DOUBLE\0"
+   "GL_HALF_FLOAT\0"
+   "GL_FIXED\0"
+   "GL_CLEAR\0"
+   "GL_AND\0"
+   "GL_AND_REVERSE\0"
+   "GL_COPY\0"
+   "GL_AND_INVERTED\0"
+   "GL_NOOP\0"
+   "GL_XOR\0"
+   "GL_OR\0"
+   "GL_NOR\0"
+   "GL_EQUIV\0"
+   "GL_INVERT\0"
+   "GL_OR_REVERSE\0"
+   "GL_COPY_INVERTED\0"
+   "GL_OR_INVERTED\0"
+   "GL_NAND\0"
+   "GL_SET\0"
+   "GL_EMISSION\0"
+   "GL_SHININESS\0"
+   "GL_AMBIENT_AND_DIFFUSE\0"
+   "GL_COLOR_INDEXES\0"
+   "GL_MODELVIEW\0"
+   "GL_PROJECTION\0"
+   "GL_TEXTURE\0"
+   "GL_COLOR\0"
+   "GL_DEPTH\0"
+   "GL_STENCIL\0"
+   "GL_COLOR_INDEX\0"
+   "GL_STENCIL_INDEX\0"
+   "GL_DEPTH_COMPONENT\0"
+   "GL_RED\0"
+   "GL_GREEN\0"
+   "GL_BLUE\0"
+   "GL_ALPHA\0"
+   "GL_RGB\0"
+   "GL_RGBA\0"
+   "GL_LUMINANCE\0"
+   "GL_LUMINANCE_ALPHA\0"
+   "GL_BITMAP\0"
+   "GL_POINT\0"
+   "GL_LINE\0"
+   "GL_FILL\0"
+   "GL_RENDER\0"
+   "GL_FEEDBACK\0"
+   "GL_SELECT\0"
+   "GL_FLAT\0"
+   "GL_SMOOTH\0"
+   "GL_KEEP\0"
+   "GL_REPLACE\0"
+   "GL_INCR\0"
+   "GL_DECR\0"
+   "GL_VENDOR\0"
+   "GL_RENDERER\0"
+   "GL_VERSION\0"
+   "GL_EXTENSIONS\0"
+   "GL_S\0"
+   "GL_T\0"
+   "GL_R\0"
+   "GL_Q\0"
+   "GL_MODULATE\0"
+   "GL_DECAL\0"
+   "GL_TEXTURE_ENV_MODE\0"
+   "GL_TEXTURE_ENV_COLOR\0"
+   "GL_TEXTURE_ENV\0"
+   "GL_EYE_LINEAR\0"
+   "GL_OBJECT_LINEAR\0"
+   "GL_SPHERE_MAP\0"
+   "GL_TEXTURE_GEN_MODE\0"
+   "GL_OBJECT_PLANE\0"
+   "GL_EYE_PLANE\0"
+   "GL_NEAREST\0"
+   "GL_LINEAR\0"
+   "GL_NEAREST_MIPMAP_NEAREST\0"
+   "GL_LINEAR_MIPMAP_NEAREST\0"
+   "GL_NEAREST_MIPMAP_LINEAR\0"
+   "GL_LINEAR_MIPMAP_LINEAR\0"
+   "GL_TEXTURE_MAG_FILTER\0"
+   "GL_TEXTURE_MIN_FILTER\0"
+   "GL_TEXTURE_WRAP_S\0"
+   "GL_TEXTURE_WRAP_T\0"
+   "GL_CLAMP\0"
+   "GL_REPEAT\0"
+   "GL_POLYGON_OFFSET_UNITS\0"
+   "GL_POLYGON_OFFSET_POINT\0"
+   "GL_POLYGON_OFFSET_LINE\0"
+   "GL_R3_G3_B2\0"
+   "GL_V2F\0"
+   "GL_V3F\0"
+   "GL_C4UB_V2F\0"
+   "GL_C4UB_V3F\0"
+   "GL_C3F_V3F\0"
+   "GL_N3F_V3F\0"
+   "GL_C4F_N3F_V3F\0"
+   "GL_T2F_V3F\0"
+   "GL_T4F_V4F\0"
+   "GL_T2F_C4UB_V3F\0"
+   "GL_T2F_C3F_V3F\0"
+   "GL_T2F_N3F_V3F\0"
+   "GL_T2F_C4F_N3F_V3F\0"
+   "GL_T4F_C4F_N3F_V4F\0"
+   "GL_CLIP_DISTANCE0\0"
+   "GL_CLIP_DISTANCE1\0"
+   "GL_CLIP_DISTANCE2\0"
+   "GL_CLIP_DISTANCE3\0"
+   "GL_CLIP_DISTANCE4\0"
+   "GL_CLIP_DISTANCE5\0"
+   "GL_CLIP_DISTANCE6\0"
+   "GL_CLIP_DISTANCE7\0"
+   "GL_LIGHT0\0"
+   "GL_LIGHT1\0"
+   "GL_LIGHT2\0"
+   "GL_LIGHT3\0"
+   "GL_LIGHT4\0"
+   "GL_LIGHT5\0"
+   "GL_LIGHT6\0"
+   "GL_LIGHT7\0"
+   "GL_HINT_BIT\0"
+   "GL_CONSTANT_COLOR\0"
+   "GL_ONE_MINUS_CONSTANT_COLOR\0"
+   "GL_CONSTANT_ALPHA\0"
+   "GL_ONE_MINUS_CONSTANT_ALPHA\0"
+   "GL_BLEND_COLOR\0"
+   "GL_FUNC_ADD\0"
+   "GL_MIN\0"
+   "GL_MAX\0"
+   "GL_BLEND_EQUATION\0"
+   "GL_FUNC_SUBTRACT\0"
+   "GL_FUNC_REVERSE_SUBTRACT\0"
+   "GL_CONVOLUTION_1D\0"
+   "GL_CONVOLUTION_2D\0"
+   "GL_SEPARABLE_2D\0"
+   "GL_CONVOLUTION_BORDER_MODE\0"
+   "GL_CONVOLUTION_FILTER_SCALE\0"
+   "GL_CONVOLUTION_FILTER_BIAS\0"
+   "GL_REDUCE\0"
+   "GL_CONVOLUTION_FORMAT\0"
+   "GL_CONVOLUTION_WIDTH\0"
+   "GL_CONVOLUTION_HEIGHT\0"
+   "GL_MAX_CONVOLUTION_WIDTH\0"
+   "GL_MAX_CONVOLUTION_HEIGHT\0"
+   "GL_POST_CONVOLUTION_RED_SCALE\0"
+   "GL_POST_CONVOLUTION_GREEN_SCALE\0"
+   "GL_POST_CONVOLUTION_BLUE_SCALE\0"
+   "GL_POST_CONVOLUTION_ALPHA_SCALE\0"
+   "GL_POST_CONVOLUTION_RED_BIAS\0"
+   "GL_POST_CONVOLUTION_GREEN_BIAS\0"
+   "GL_POST_CONVOLUTION_BLUE_BIAS\0"
+   "GL_POST_CONVOLUTION_ALPHA_BIAS\0"
+   "GL_HISTOGRAM\0"
+   "GL_PROXY_HISTOGRAM\0"
+   "GL_HISTOGRAM_WIDTH\0"
+   "GL_HISTOGRAM_FORMAT\0"
+   "GL_HISTOGRAM_RED_SIZE\0"
+   "GL_HISTOGRAM_GREEN_SIZE\0"
+   "GL_HISTOGRAM_BLUE_SIZE\0"
+   "GL_HISTOGRAM_ALPHA_SIZE\0"
+   "GL_HISTOGRAM_LUMINANCE_SIZE\0"
+   "GL_HISTOGRAM_SINK\0"
+   "GL_MINMAX\0"
+   "GL_MINMAX_FORMAT\0"
+   "GL_MINMAX_SINK\0"
+   "GL_TABLE_TOO_LARGE_EXT\0"
+   "GL_UNSIGNED_BYTE_3_3_2\0"
+   "GL_UNSIGNED_SHORT_4_4_4_4\0"
+   "GL_UNSIGNED_SHORT_5_5_5_1\0"
+   "GL_UNSIGNED_INT_8_8_8_8\0"
+   "GL_UNSIGNED_INT_10_10_10_2\0"
+   "GL_POLYGON_OFFSET_FILL\0"
+   "GL_POLYGON_OFFSET_FACTOR\0"
+   "GL_POLYGON_OFFSET_BIAS_EXT\0"
+   "GL_RESCALE_NORMAL\0"
+   "GL_ALPHA4\0"
+   "GL_ALPHA8\0"
+   "GL_ALPHA12\0"
+   "GL_ALPHA16\0"
+   "GL_LUMINANCE4\0"
+   "GL_LUMINANCE8\0"
+   "GL_LUMINANCE12\0"
+   "GL_LUMINANCE16\0"
+   "GL_LUMINANCE4_ALPHA4\0"
+   "GL_LUMINANCE6_ALPHA2\0"
+   "GL_LUMINANCE8_ALPHA8\0"
+   "GL_LUMINANCE12_ALPHA4\0"
+   "GL_LUMINANCE12_ALPHA12\0"
+   "GL_LUMINANCE16_ALPHA16\0"
+   "GL_INTENSITY\0"
+   "GL_INTENSITY4\0"
+   "GL_INTENSITY8\0"
+   "GL_INTENSITY12\0"
+   "GL_INTENSITY16\0"
+   "GL_RGB2_EXT\0"
+   "GL_RGB4\0"
+   "GL_RGB5\0"
+   "GL_RGB8\0"
+   "GL_RGB10\0"
+   "GL_RGB12\0"
+   "GL_RGB16\0"
+   "GL_RGBA2\0"
+   "GL_RGBA4\0"
+   "GL_RGB5_A1\0"
+   "GL_RGBA8\0"
+   "GL_RGB10_A2\0"
+   "GL_RGBA12\0"
+   "GL_RGBA16\0"
+   "GL_TEXTURE_RED_SIZE\0"
+   "GL_TEXTURE_GREEN_SIZE\0"
+   "GL_TEXTURE_BLUE_SIZE\0"
+   "GL_TEXTURE_ALPHA_SIZE\0"
+   "GL_TEXTURE_LUMINANCE_SIZE\0"
+   "GL_TEXTURE_INTENSITY_SIZE\0"
+   "GL_REPLACE_EXT\0"
+   "GL_PROXY_TEXTURE_1D\0"
+   "GL_PROXY_TEXTURE_2D\0"
+   "GL_TEXTURE_TOO_LARGE_EXT\0"
+   "GL_TEXTURE_PRIORITY\0"
+   "GL_TEXTURE_RESIDENT\0"
+   "GL_TEXTURE_BINDING_1D\0"
+   "GL_TEXTURE_BINDING_2D\0"
+   "GL_TEXTURE_BINDING_3D\0"
+   "GL_PACK_SKIP_IMAGES\0"
+   "GL_PACK_IMAGE_HEIGHT\0"
+   "GL_UNPACK_SKIP_IMAGES\0"
+   "GL_UNPACK_IMAGE_HEIGHT\0"
+   "GL_TEXTURE_3D\0"
+   "GL_PROXY_TEXTURE_3D\0"
+   "GL_TEXTURE_DEPTH\0"
+   "GL_TEXTURE_WRAP_R\0"
+   "GL_MAX_3D_TEXTURE_SIZE\0"
+   "GL_VERTEX_ARRAY\0"
+   "GL_NORMAL_ARRAY\0"
+   "GL_COLOR_ARRAY\0"
+   "GL_INDEX_ARRAY\0"
+   "GL_TEXTURE_COORD_ARRAY\0"
+   "GL_EDGE_FLAG_ARRAY\0"
+   "GL_VERTEX_ARRAY_SIZE\0"
+   "GL_VERTEX_ARRAY_TYPE\0"
+   "GL_VERTEX_ARRAY_STRIDE\0"
+   "GL_VERTEX_ARRAY_COUNT_EXT\0"
+   "GL_NORMAL_ARRAY_TYPE\0"
+   "GL_NORMAL_ARRAY_STRIDE\0"
+   "GL_NORMAL_ARRAY_COUNT_EXT\0"
+   "GL_COLOR_ARRAY_SIZE\0"
+   "GL_COLOR_ARRAY_TYPE\0"
+   "GL_COLOR_ARRAY_STRIDE\0"
+   "GL_COLOR_ARRAY_COUNT_EXT\0"
+   "GL_INDEX_ARRAY_TYPE\0"
+   "GL_INDEX_ARRAY_STRIDE\0"
+   "GL_INDEX_ARRAY_COUNT_EXT\0"
+   "GL_TEXTURE_COORD_ARRAY_SIZE\0"
+   "GL_TEXTURE_COORD_ARRAY_TYPE\0"
+   "GL_TEXTURE_COORD_ARRAY_STRIDE\0"
+   "GL_TEXTURE_COORD_ARRAY_COUNT_EXT\0"
+   "GL_EDGE_FLAG_ARRAY_STRIDE\0"
+   "GL_EDGE_FLAG_ARRAY_COUNT_EXT\0"
+   "GL_VERTEX_ARRAY_POINTER\0"
+   "GL_NORMAL_ARRAY_POINTER\0"
+   "GL_COLOR_ARRAY_POINTER\0"
+   "GL_INDEX_ARRAY_POINTER\0"
+   "GL_TEXTURE_COORD_ARRAY_POINTER\0"
+   "GL_EDGE_FLAG_ARRAY_POINTER\0"
+   "GL_MULTISAMPLE\0"
+   "GL_SAMPLE_ALPHA_TO_COVERAGE\0"
+   "GL_SAMPLE_ALPHA_TO_ONE\0"
+   "GL_SAMPLE_COVERAGE\0"
+   "GL_SAMPLE_BUFFERS\0"
+   "GL_SAMPLES\0"
+   "GL_SAMPLE_COVERAGE_VALUE\0"
+   "GL_SAMPLE_COVERAGE_INVERT\0"
+   "GL_COLOR_MATRIX\0"
+   "GL_COLOR_MATRIX_STACK_DEPTH\0"
+   "GL_MAX_COLOR_MATRIX_STACK_DEPTH\0"
+   "GL_POST_COLOR_MATRIX_RED_SCALE\0"
+   "GL_POST_COLOR_MATRIX_GREEN_SCALE\0"
+   "GL_POST_COLOR_MATRIX_BLUE_SCALE\0"
+   "GL_POST_COLOR_MATRIX_ALPHA_SCALE\0"
+   "GL_POST_COLOR_MATRIX_RED_BIAS\0"
+   "GL_POST_COLOR_MATRIX_GREEN_BIAS\0"
+   "GL_POST_COLOR_MATRIX_BLUE_BIAS\0"
+   "GL_POST_COLOR_MATRIX_ALPHA_BIAS\0"
+   "GL_TEXTURE_COLOR_TABLE_SGI\0"
+   "GL_PROXY_TEXTURE_COLOR_TABLE_SGI\0"
+   "GL_TEXTURE_COMPARE_FAIL_VALUE_ARB\0"
+   "GL_BLEND_DST_RGB\0"
+   "GL_BLEND_SRC_RGB\0"
+   "GL_BLEND_DST_ALPHA\0"
+   "GL_BLEND_SRC_ALPHA\0"
+   "GL_COLOR_TABLE\0"
+   "GL_POST_CONVOLUTION_COLOR_TABLE\0"
+   "GL_POST_COLOR_MATRIX_COLOR_TABLE\0"
+   "GL_PROXY_COLOR_TABLE\0"
+   "GL_PROXY_POST_CONVOLUTION_COLOR_TABLE\0"
+   "GL_PROXY_POST_COLOR_MATRIX_COLOR_TABLE\0"
+   "GL_COLOR_TABLE_SCALE\0"
+   "GL_COLOR_TABLE_BIAS\0"
+   "GL_COLOR_TABLE_FORMAT\0"
+   "GL_COLOR_TABLE_WIDTH\0"
+   "GL_COLOR_TABLE_RED_SIZE\0"
+   "GL_COLOR_TABLE_GREEN_SIZE\0"
+   "GL_COLOR_TABLE_BLUE_SIZE\0"
+   "GL_COLOR_TABLE_ALPHA_SIZE\0"
+   "GL_COLOR_TABLE_LUMINANCE_SIZE\0"
+   "GL_COLOR_TABLE_INTENSITY_SIZE\0"
+   "GL_BGR\0"
+   "GL_BGRA\0"
+   "GL_MAX_ELEMENTS_VERTICES\0"
+   "GL_MAX_ELEMENTS_INDICES\0"
+   "GL_TEXTURE_INDEX_SIZE_EXT\0"
+   "GL_CLIP_VOLUME_CLIPPING_HINT_EXT\0"
+   "GL_POINT_SIZE_MIN\0"
+   "GL_POINT_SIZE_MAX\0"
+   "GL_POINT_FADE_THRESHOLD_SIZE\0"
+   "GL_POINT_DISTANCE_ATTENUATION\0"
+   "GL_CLAMP_TO_BORDER\0"
+   "GL_CLAMP_TO_EDGE\0"
+   "GL_TEXTURE_MIN_LOD\0"
+   "GL_TEXTURE_MAX_LOD\0"
+   "GL_TEXTURE_BASE_LEVEL\0"
+   "GL_TEXTURE_MAX_LEVEL\0"
+   "GL_IGNORE_BORDER_HP\0"
+   "GL_CONSTANT_BORDER_HP\0"
+   "GL_REPLICATE_BORDER_HP\0"
+   "GL_CONVOLUTION_BORDER_COLOR\0"
+   "GL_OCCLUSION_TEST_HP\0"
+   "GL_OCCLUSION_TEST_RESULT_HP\0"
+   "GL_LINEAR_CLIPMAP_LINEAR_SGIX\0"
+   "GL_TEXTURE_CLIPMAP_CENTER_SGIX\0"
+   "GL_TEXTURE_CLIPMAP_FRAME_SGIX\0"
+   "GL_TEXTURE_CLIPMAP_OFFSET_SGIX\0"
+   "GL_TEXTURE_CLIPMAP_VIRTUAL_DEPTH_SGIX\0"
+   "GL_TEXTURE_CLIPMAP_LOD_OFFSET_SGIX\0"
+   "GL_TEXTURE_CLIPMAP_DEPTH_SGIX\0"
+   "GL_MAX_CLIPMAP_DEPTH_SGIX\0"
+   "GL_MAX_CLIPMAP_VIRTUAL_DEPTH_SGIX\0"
+   "GL_POST_TEXTURE_FILTER_BIAS_SGIX\0"
+   "GL_POST_TEXTURE_FILTER_SCALE_SGIX\0"
+   "GL_POST_TEXTURE_FILTER_BIAS_RANGE_SGIX\0"
+   "GL_POST_TEXTURE_FILTER_SCALE_RANGE_SGIX\0"
+   "GL_TEXTURE_LOD_BIAS_S_SGIX\0"
+   "GL_TEXTURE_LOD_BIAS_T_SGIX\0"
+   "GL_TEXTURE_LOD_BIAS_R_SGIX\0"
+   "GL_GENERATE_MIPMAP\0"
+   "GL_GENERATE_MIPMAP_HINT\0"
+   "GL_FOG_OFFSET_SGIX\0"
+   "GL_FOG_OFFSET_VALUE_SGIX\0"
+   "GL_TEXTURE_COMPARE_SGIX\0"
+   "GL_TEXTURE_COMPARE_OPERATOR_SGIX\0"
+   "GL_TEXTURE_LEQUAL_R_SGIX\0"
+   "GL_TEXTURE_GEQUAL_R_SGIX\0"
+   "GL_DEPTH_COMPONENT16\0"
+   "GL_DEPTH_COMPONENT24\0"
+   "GL_DEPTH_COMPONENT32\0"
+   "GL_ARRAY_ELEMENT_LOCK_FIRST_EXT\0"
+   "GL_ARRAY_ELEMENT_LOCK_COUNT_EXT\0"
+   "GL_CULL_VERTEX_EXT\0"
+   "GL_CULL_VERTEX_OBJECT_POSITION_EXT\0"
+   "GL_CULL_VERTEX_EYE_POSITION_EXT\0"
+   "GL_WRAP_BORDER_SUN\0"
+   "GL_TEXTURE_COLOR_WRITEMASK_SGIS\0"
+   "GL_LIGHT_MODEL_COLOR_CONTROL\0"
+   "GL_SINGLE_COLOR\0"
+   "GL_SEPARATE_SPECULAR_COLOR\0"
+   "GL_SHARED_TEXTURE_PALETTE_EXT\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_RED_SIZE\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_BLUE_SIZE\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_STENCIL_SIZE\0"
+   "GL_FRAMEBUFFER_DEFAULT\0"
+   "GL_FRAMEBUFFER_UNDEFINED\0"
+   "GL_DEPTH_STENCIL_ATTACHMENT\0"
+   "GL_MAJOR_VERSION\0"
+   "GL_MINOR_VERSION\0"
+   "GL_NUM_EXTENSIONS\0"
+   "GL_CONTEXT_FLAGS\0"
+   "GL_BUFFER_IMMUTABLE_STORAGE\0"
+   "GL_BUFFER_STORAGE_FLAGS\0"
+   "GL_INDEX\0"
+   "GL_DEPTH_BUFFER\0"
+   "GL_STENCIL_BUFFER\0"
+   "GL_COMPRESSED_RED\0"
+   "GL_COMPRESSED_RG\0"
+   "GL_RG\0"
+   "GL_RG_INTEGER\0"
+   "GL_R8\0"
+   "GL_R16\0"
+   "GL_RG8\0"
+   "GL_RG16\0"
+   "GL_R16F\0"
+   "GL_R32F\0"
+   "GL_RG16F\0"
+   "GL_RG32F\0"
+   "GL_R8I\0"
+   "GL_R8UI\0"
+   "GL_R16I\0"
+   "GL_R16UI\0"
+   "GL_R32I\0"
+   "GL_R32UI\0"
+   "GL_RG8I\0"
+   "GL_RG8UI\0"
+   "GL_RG16I\0"
+   "GL_RG16UI\0"
+   "GL_RG32I\0"
+   "GL_RG32UI\0"
+   "GL_DEBUG_OUTPUT_SYNCHRONOUS_ARB\0"
+   "GL_DEBUG_NEXT_LOGGED_MESSAGE_LENGTH_ARB\0"
+   "GL_DEBUG_CALLBACK_FUNCTION_ARB\0"
+   "GL_DEBUG_CALLBACK_USER_PARAM_ARB\0"
+   "GL_DEBUG_SOURCE_API_ARB\0"
+   "GL_DEBUG_SOURCE_WINDOW_SYSTEM_ARB\0"
+   "GL_DEBUG_SOURCE_SHADER_COMPILER_ARB\0"
+   "GL_DEBUG_SOURCE_THIRD_PARTY_ARB\0"
+   "GL_DEBUG_SOURCE_APPLICATION_ARB\0"
+   "GL_DEBUG_SOURCE_OTHER_ARB\0"
+   "GL_DEBUG_TYPE_ERROR_ARB\0"
+   "GL_DEBUG_TYPE_DEPRECATED_BEHAVIOR_ARB\0"
+   "GL_DEBUG_TYPE_UNDEFINED_BEHAVIOR_ARB\0"
+   "GL_DEBUG_TYPE_PORTABILITY_ARB\0"
+   "GL_DEBUG_TYPE_PERFORMANCE_ARB\0"
+   "GL_DEBUG_TYPE_OTHER_ARB\0"
+   "GL_LOSE_CONTEXT_ON_RESET_ARB\0"
+   "GL_GUILTY_CONTEXT_RESET_ARB\0"
+   "GL_INNOCENT_CONTEXT_RESET_ARB\0"
+   "GL_UNKNOWN_CONTEXT_RESET_ARB\0"
+   "GL_RESET_NOTIFICATION_STRATEGY_ARB\0"
+   "GL_PROGRAM_BINARY_RETRIEVABLE_HINT\0"
+   "GL_PROGRAM_SEPARABLE_EXT\0"
+   "GL_ACTIVE_PROGRAM_EXT\0"
+   "GL_PROGRAM_PIPELINE_BINDING_EXT\0"
+   "GL_MAX_VIEWPORTS\0"
+   "GL_VIEWPORT_SUBPIXEL_BITS\0"
+   "GL_VIEWPORT_BOUNDS_RANGE\0"
+   "GL_LAYER_PROVOKING_VERTEX\0"
+   "GL_VIEWPORT_INDEX_PROVOKING_VERTEX\0"
+   "GL_UNDEFINED_VERTEX\0"
+   "GL_NO_RESET_NOTIFICATION_ARB\0"
+   "GL_MAX_COMPUTE_SHARED_MEMORY_SIZE\0"
+   "GL_MAX_COMPUTE_UNIFORM_COMPONENTS\0"
+   "GL_MAX_COMPUTE_ATOMIC_COUNTER_BUFFERS\0"
+   "GL_MAX_COMPUTE_ATOMIC_COUNTERS\0"
+   "GL_MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS\0"
+   "GL_COMPUTE_WORK_GROUP_SIZE\0"
+   "GL_DEBUG_TYPE_MARKER\0"
+   "GL_DEBUG_TYPE_PUSH_GROUP\0"
+   "GL_DEBUG_TYPE_POP_GROUP\0"
+   "GL_DEBUG_SEVERITY_NOTIFICATION\0"
+   "GL_MAX_DEBUG_GROUP_STACK_DEPTH\0"
+   "GL_DEBUG_GROUP_STACK_DEPTH\0"
+   "GL_VERTEX_ATTRIB_BINDING\0"
+   "GL_VERTEX_ATTRIB_RELATIVE_OFFSET\0"
+   "GL_VERTEX_BINDING_DIVISOR\0"
+   "GL_VERTEX_BINDING_OFFSET\0"
+   "GL_VERTEX_BINDING_STRIDE\0"
+   "GL_MAX_VERTEX_ATTRIB_RELATIVE_OFFSET\0"
+   "GL_MAX_VERTEX_ATTRIB_BINDINGS\0"
+   "GL_TEXTURE_IMMUTABLE_LEVELS\0"
+   "GL_BUFFER\0"
+   "GL_SHADER\0"
+   "GL_PROGRAM\0"
+   "GL_QUERY\0"
+   "GL_PROGRAM_PIPELINE\0"
+   "GL_SAMPLER\0"
+   "GL_DISPLAY_LIST\0"
+   "GL_MAX_LABEL_LENGTH\0"
+   "GL_UNSIGNED_BYTE_2_3_3_REV\0"
+   "GL_UNSIGNED_SHORT_5_6_5\0"
+   "GL_UNSIGNED_SHORT_5_6_5_REV\0"
+   "GL_UNSIGNED_SHORT_4_4_4_4_REV\0"
+   "GL_UNSIGNED_SHORT_1_5_5_5_REV\0"
+   "GL_UNSIGNED_INT_8_8_8_8_REV\0"
+   "GL_UNSIGNED_INT_2_10_10_10_REV\0"
+   "GL_TEXTURE_MAX_CLAMP_S_SGIX\0"
+   "GL_TEXTURE_MAX_CLAMP_T_SGIX\0"
+   "GL_TEXTURE_MAX_CLAMP_R_SGIX\0"
+   "GL_MIRRORED_REPEAT\0"
+   "GL_RGB_S3TC\0"
+   "GL_RGB4_S3TC\0"
+   "GL_RGBA_S3TC\0"
+   "GL_RGBA4_S3TC\0"
+   "GL_RGBA_DXT5_S3TC\0"
+   "GL_RGBA4_DXT5_S3TC\0"
+   "GL_COMPRESSED_RGB_S3TC_DXT1_EXT\0"
+   "GL_COMPRESSED_RGBA_S3TC_DXT1_EXT\0"
+   "GL_COMPRESSED_RGBA_S3TC_DXT3_ANGLE\0"
+   "GL_COMPRESSED_RGBA_S3TC_DXT5_ANGLE\0"
+   "GL_PERFQUERY_DONOT_FLUSH_INTEL\0"
+   "GL_PERFQUERY_FLUSH_INTEL\0"
+   "GL_PERFQUERY_WAIT_INTEL\0"
+   "GL_NEAREST_CLIPMAP_NEAREST_SGIX\0"
+   "GL_NEAREST_CLIPMAP_LINEAR_SGIX\0"
+   "GL_LINEAR_CLIPMAP_NEAREST_SGIX\0"
+   "GL_FOG_COORDINATE_SOURCE\0"
+   "GL_FOG_COORD\0"
+   "GL_FRAGMENT_DEPTH\0"
+   "GL_CURRENT_FOG_COORD\0"
+   "GL_FOG_COORDINATE_ARRAY_TYPE\0"
+   "GL_FOG_COORDINATE_ARRAY_STRIDE\0"
+   "GL_FOG_COORDINATE_ARRAY_POINTER\0"
+   "GL_FOG_COORDINATE_ARRAY\0"
+   "GL_COLOR_SUM\0"
+   "GL_CURRENT_SECONDARY_COLOR\0"
+   "GL_SECONDARY_COLOR_ARRAY_SIZE\0"
+   "GL_SECONDARY_COLOR_ARRAY_TYPE\0"
+   "GL_SECONDARY_COLOR_ARRAY_STRIDE\0"
+   "GL_SECONDARY_COLOR_ARRAY_POINTER\0"
+   "GL_SECONDARY_COLOR_ARRAY\0"
+   "GL_CURRENT_RASTER_SECONDARY_COLOR\0"
+   "GL_ALIASED_POINT_SIZE_RANGE\0"
+   "GL_ALIASED_LINE_WIDTH_RANGE\0"
+   "GL_TEXTURE0\0"
+   "GL_TEXTURE1\0"
+   "GL_TEXTURE2\0"
+   "GL_TEXTURE3\0"
+   "GL_TEXTURE4\0"
+   "GL_TEXTURE5\0"
+   "GL_TEXTURE6\0"
+   "GL_TEXTURE7\0"
+   "GL_TEXTURE8\0"
+   "GL_TEXTURE9\0"
+   "GL_TEXTURE10\0"
+   "GL_TEXTURE11\0"
+   "GL_TEXTURE12\0"
+   "GL_TEXTURE13\0"
+   "GL_TEXTURE14\0"
+   "GL_TEXTURE15\0"
+   "GL_TEXTURE16\0"
+   "GL_TEXTURE17\0"
+   "GL_TEXTURE18\0"
+   "GL_TEXTURE19\0"
+   "GL_TEXTURE20\0"
+   "GL_TEXTURE21\0"
+   "GL_TEXTURE22\0"
+   "GL_TEXTURE23\0"
+   "GL_TEXTURE24\0"
+   "GL_TEXTURE25\0"
+   "GL_TEXTURE26\0"
+   "GL_TEXTURE27\0"
+   "GL_TEXTURE28\0"
+   "GL_TEXTURE29\0"
+   "GL_TEXTURE30\0"
+   "GL_TEXTURE31\0"
+   "GL_ACTIVE_TEXTURE\0"
+   "GL_CLIENT_ACTIVE_TEXTURE\0"
+   "GL_MAX_TEXTURE_UNITS\0"
+   "GL_TRANSPOSE_MODELVIEW_MATRIX\0"
+   "GL_TRANSPOSE_PROJECTION_MATRIX\0"
+   "GL_TRANSPOSE_TEXTURE_MATRIX\0"
+   "GL_TRANSPOSE_COLOR_MATRIX\0"
+   "GL_SUBTRACT\0"
+   "GL_MAX_RENDERBUFFER_SIZE\0"
+   "GL_COMPRESSED_ALPHA\0"
+   "GL_COMPRESSED_LUMINANCE\0"
+   "GL_COMPRESSED_LUMINANCE_ALPHA\0"
+   "GL_COMPRESSED_INTENSITY\0"
+   "GL_COMPRESSED_RGB\0"
+   "GL_COMPRESSED_RGBA\0"
+   "GL_TEXTURE_COMPRESSION_HINT\0"
+   "GL_TEXTURE_RECTANGLE\0"
+   "GL_TEXTURE_BINDING_RECTANGLE\0"
+   "GL_PROXY_TEXTURE_RECTANGLE\0"
+   "GL_MAX_RECTANGLE_TEXTURE_SIZE\0"
+   "GL_DEPTH_STENCIL\0"
+   "GL_UNSIGNED_INT_24_8\0"
+   "GL_MAX_TEXTURE_LOD_BIAS\0"
+   "GL_TEXTURE_MAX_ANISOTROPY_EXT\0"
+   "GL_MAX_TEXTURE_MAX_ANISOTROPY_EXT\0"
+   "GL_TEXTURE_FILTER_CONTROL\0"
+   "GL_TEXTURE_LOD_BIAS\0"
+   "GL_COMBINE4_NV\0"
+   "GL_MAX_SHININESS_NV\0"
+   "GL_MAX_SPOT_EXPONENT_NV\0"
+   "GL_INCR_WRAP\0"
+   "GL_DECR_WRAP\0"
+   "GL_MODELVIEW1_ARB\0"
+   "GL_NORMAL_MAP\0"
+   "GL_REFLECTION_MAP\0"
+   "GL_TEXTURE_CUBE_MAP\0"
+   "GL_TEXTURE_BINDING_CUBE_MAP\0"
+   "GL_TEXTURE_CUBE_MAP_POSITIVE_X\0"
+   "GL_TEXTURE_CUBE_MAP_NEGATIVE_X\0"
+   "GL_TEXTURE_CUBE_MAP_POSITIVE_Y\0"
+   "GL_TEXTURE_CUBE_MAP_NEGATIVE_Y\0"
+   "GL_TEXTURE_CUBE_MAP_POSITIVE_Z\0"
+   "GL_TEXTURE_CUBE_MAP_NEGATIVE_Z\0"
+   "GL_PROXY_TEXTURE_CUBE_MAP\0"
+   "GL_MAX_CUBE_MAP_TEXTURE_SIZE\0"
+   "GL_MULTISAMPLE_FILTER_HINT_NV\0"
+   "GL_PRIMITIVE_RESTART_NV\0"
+   "GL_PRIMITIVE_RESTART_INDEX_NV\0"
+   "GL_FOG_DISTANCE_MODE_NV\0"
+   "GL_EYE_RADIAL_NV\0"
+   "GL_EYE_PLANE_ABSOLUTE_NV\0"
+   "GL_COMBINE\0"
+   "GL_COMBINE_RGB\0"
+   "GL_COMBINE_ALPHA\0"
+   "GL_RGB_SCALE\0"
+   "GL_ADD_SIGNED\0"
+   "GL_INTERPOLATE\0"
+   "GL_CONSTANT\0"
+   "GL_PRIMARY_COLOR\0"
+   "GL_PREVIOUS\0"
+   "GL_SOURCE0_RGB\0"
+   "GL_SOURCE1_RGB\0"
+   "GL_SOURCE2_RGB\0"
+   "GL_SOURCE3_RGB_NV\0"
+   "GL_SOURCE0_ALPHA\0"
+   "GL_SOURCE1_ALPHA\0"
+   "GL_SOURCE2_ALPHA\0"
+   "GL_SOURCE3_ALPHA_NV\0"
+   "GL_OPERAND0_RGB\0"
+   "GL_OPERAND1_RGB\0"
+   "GL_OPERAND2_RGB\0"
+   "GL_OPERAND3_RGB_NV\0"
+   "GL_OPERAND0_ALPHA\0"
+   "GL_OPERAND1_ALPHA\0"
+   "GL_OPERAND2_ALPHA\0"
+   "GL_OPERAND3_ALPHA_NV\0"
+   "GL_BUFFER_OBJECT_APPLE\0"
+   "GL_VERTEX_ARRAY_BINDING\0"
+   "GL_TEXTURE_RANGE_LENGTH_APPLE\0"
+   "GL_TEXTURE_RANGE_POINTER_APPLE\0"
+   "GL_YCBCR_422_APPLE\0"
+   "GL_UNSIGNED_SHORT_8_8_APPLE\0"
+   "GL_UNSIGNED_SHORT_8_8_REV_APPLE\0"
+   "GL_TEXTURE_STORAGE_HINT_APPLE\0"
+   "GL_STORAGE_PRIVATE_APPLE\0"
+   "GL_STORAGE_CACHED_APPLE\0"
+   "GL_STORAGE_SHARED_APPLE\0"
+   "GL_SLICE_ACCUM_SUN\0"
+   "GL_QUAD_MESH_SUN\0"
+   "GL_TRIANGLE_MESH_SUN\0"
+   "GL_VERTEX_PROGRAM_ARB\0"
+   "GL_VERTEX_STATE_PROGRAM_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY_ENABLED\0"
+   "GL_VERTEX_ATTRIB_ARRAY_SIZE\0"
+   "GL_VERTEX_ATTRIB_ARRAY_STRIDE\0"
+   "GL_VERTEX_ATTRIB_ARRAY_TYPE\0"
+   "GL_CURRENT_VERTEX_ATTRIB\0"
+   "GL_PROGRAM_LENGTH_ARB\0"
+   "GL_PROGRAM_STRING_ARB\0"
+   "GL_MODELVIEW_PROJECTION_NV\0"
+   "GL_IDENTITY_NV\0"
+   "GL_INVERSE_NV\0"
+   "GL_TRANSPOSE_NV\0"
+   "GL_INVERSE_TRANSPOSE_NV\0"
+   "GL_MAX_PROGRAM_MATRIX_STACK_DEPTH_ARB\0"
+   "GL_MAX_PROGRAM_MATRICES_ARB\0"
+   "GL_MATRIX0_NV\0"
+   "GL_MATRIX1_NV\0"
+   "GL_MATRIX2_NV\0"
+   "GL_MATRIX3_NV\0"
+   "GL_MATRIX4_NV\0"
+   "GL_MATRIX5_NV\0"
+   "GL_MATRIX6_NV\0"
+   "GL_MATRIX7_NV\0"
+   "GL_CURRENT_MATRIX_STACK_DEPTH_ARB\0"
+   "GL_CURRENT_MATRIX_ARB\0"
+   "GL_PROGRAM_POINT_SIZE\0"
+   "GL_VERTEX_PROGRAM_TWO_SIDE\0"
+   "GL_PROGRAM_PARAMETER_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY_POINTER\0"
+   "GL_PROGRAM_TARGET_NV\0"
+   "GL_PROGRAM_RESIDENT_NV\0"
+   "GL_TRACK_MATRIX_NV\0"
+   "GL_TRACK_MATRIX_TRANSFORM_NV\0"
+   "GL_VERTEX_PROGRAM_BINDING_NV\0"
+   "GL_PROGRAM_ERROR_POSITION_ARB\0"
+   "GL_DEPTH_CLAMP\0"
+   "GL_VERTEX_ATTRIB_ARRAY0_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY1_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY2_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY3_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY4_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY5_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY6_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY7_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY8_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY9_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY10_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY11_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY12_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY13_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY14_NV\0"
+   "GL_VERTEX_ATTRIB_ARRAY15_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB0_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB1_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB2_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB3_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB4_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB5_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB6_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB7_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB8_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB9_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB10_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB11_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB12_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB13_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB14_4_NV\0"
+   "GL_MAP1_VERTEX_ATTRIB15_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB0_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB1_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB2_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB3_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB4_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB5_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB6_4_NV\0"
+   "GL_PROGRAM_BINDING_ARB\0"
+   "GL_MAP2_VERTEX_ATTRIB8_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB9_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB10_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB11_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB12_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB13_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB14_4_NV\0"
+   "GL_MAP2_VERTEX_ATTRIB15_4_NV\0"
+   "GL_TEXTURE_COMPRESSED_IMAGE_SIZE\0"
+   "GL_TEXTURE_COMPRESSED\0"
+   "GL_NUM_COMPRESSED_TEXTURE_FORMATS\0"
+   "GL_COMPRESSED_TEXTURE_FORMATS\0"
+   "GL_MAX_VERTEX_UNITS_ARB\0"
+   "GL_ACTIVE_VERTEX_UNITS_ARB\0"
+   "GL_WEIGHT_SUM_UNITY_ARB\0"
+   "GL_VERTEX_BLEND_ARB\0"
+   "GL_CURRENT_WEIGHT_ARB\0"
+   "GL_WEIGHT_ARRAY_TYPE_ARB\0"
+   "GL_WEIGHT_ARRAY_STRIDE_ARB\0"
+   "GL_WEIGHT_ARRAY_SIZE_ARB\0"
+   "GL_WEIGHT_ARRAY_POINTER_ARB\0"
+   "GL_WEIGHT_ARRAY_ARB\0"
+   "GL_DOT3_RGB\0"
+   "GL_DOT3_RGBA\0"
+   "GL_COMPRESSED_RGB_FXT1_3DFX\0"
+   "GL_COMPRESSED_RGBA_FXT1_3DFX\0"
+   "GL_MULTISAMPLE_3DFX\0"
+   "GL_SAMPLE_BUFFERS_3DFX\0"
+   "GL_SAMPLES_3DFX\0"
+   "GL_SURFACE_STATE_NV\0"
+   "GL_SURFACE_REGISTERED_NV\0"
+   "GL_SURFACE_MAPPED_NV\0"
+   "GL_MODELVIEW2_ARB\0"
+   "GL_MODELVIEW3_ARB\0"
+   "GL_MODELVIEW4_ARB\0"
+   "GL_MODELVIEW5_ARB\0"
+   "GL_MODELVIEW6_ARB\0"
+   "GL_MODELVIEW7_ARB\0"
+   "GL_MODELVIEW8_ARB\0"
+   "GL_MODELVIEW9_ARB\0"
+   "GL_MODELVIEW10_ARB\0"
+   "GL_MODELVIEW11_ARB\0"
+   "GL_MODELVIEW12_ARB\0"
+   "GL_MODELVIEW13_ARB\0"
+   "GL_MODELVIEW14_ARB\0"
+   "GL_MODELVIEW15_ARB\0"
+   "GL_MODELVIEW16_ARB\0"
+   "GL_MODELVIEW17_ARB\0"
+   "GL_MODELVIEW18_ARB\0"
+   "GL_MODELVIEW19_ARB\0"
+   "GL_MODELVIEW20_ARB\0"
+   "GL_MODELVIEW21_ARB\0"
+   "GL_MODELVIEW22_ARB\0"
+   "GL_MODELVIEW23_ARB\0"
+   "GL_MODELVIEW24_ARB\0"
+   "GL_MODELVIEW25_ARB\0"
+   "GL_MODELVIEW26_ARB\0"
+   "GL_MODELVIEW27_ARB\0"
+   "GL_MODELVIEW28_ARB\0"
+   "GL_MODELVIEW29_ARB\0"
+   "GL_MODELVIEW30_ARB\0"
+   "GL_MODELVIEW31_ARB\0"
+   "GL_DOT3_RGB_EXT\0"
+   "GL_PROGRAM_BINARY_LENGTH\0"
+   "GL_MIRROR_CLAMP_EXT\0"
+   "GL_MIRROR_CLAMP_TO_EDGE_EXT\0"
+   "GL_MODULATE_ADD_ATI\0"
+   "GL_MODULATE_SIGNED_ADD_ATI\0"
+   "GL_MODULATE_SUBTRACT_ATI\0"
+   "GL_YCBCR_MESA\0"
+   "GL_PACK_INVERT_MESA\0"
+   "GL_BUFFER_SIZE\0"
+   "GL_BUFFER_USAGE\0"
+   "GL_BUMP_ROT_MATRIX_ATI\0"
+   "GL_BUMP_ROT_MATRIX_SIZE_ATI\0"
+   "GL_BUMP_NUM_TEX_UNITS_ATI\0"
+   "GL_BUMP_TEX_UNITS_ATI\0"
+   "GL_DUDV_ATI\0"
+   "GL_DU8DV8_ATI\0"
+   "GL_BUMP_ENVMAP_ATI\0"
+   "GL_BUMP_TARGET_ATI\0"
+   "GL_NUM_PROGRAM_BINARY_FORMATS\0"
+   "GL_PROGRAM_BINARY_FORMATS\0"
+   "GL_STENCIL_BACK_FUNC\0"
+   "GL_STENCIL_BACK_FAIL\0"
+   "GL_STENCIL_BACK_PASS_DEPTH_FAIL\0"
+   "GL_STENCIL_BACK_PASS_DEPTH_PASS\0"
+   "GL_FRAGMENT_PROGRAM_ARB\0"
+   "GL_PROGRAM_ALU_INSTRUCTIONS_ARB\0"
+   "GL_PROGRAM_TEX_INSTRUCTIONS_ARB\0"
+   "GL_PROGRAM_TEX_INDIRECTIONS_ARB\0"
+   "GL_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB\0"
+   "GL_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB\0"
+   "GL_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB\0"
+   "GL_MAX_PROGRAM_ALU_INSTRUCTIONS_ARB\0"
+   "GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB\0"
+   "GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB\0"
+   "GL_MAX_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB\0"
+   "GL_MAX_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB\0"
+   "GL_MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB\0"
+   "GL_RGBA32F\0"
+   "GL_RGB32F\0"
+   "GL_ALPHA32F_ARB\0"
+   "GL_INTENSITY32F_ARB\0"
+   "GL_LUMINANCE32F_ARB\0"
+   "GL_LUMINANCE_ALPHA32F_ARB\0"
+   "GL_RGBA16F\0"
+   "GL_RGB16F\0"
+   "GL_ALPHA16F_ARB\0"
+   "GL_INTENSITY16F_ARB\0"
+   "GL_LUMINANCE16F_ARB\0"
+   "GL_LUMINANCE_ALPHA16F_ARB\0"
+   "GL_RGBA_FLOAT_MODE_ARB\0"
+   "GL_MAX_DRAW_BUFFERS\0"
+   "GL_DRAW_BUFFER0\0"
+   "GL_DRAW_BUFFER1\0"
+   "GL_DRAW_BUFFER2\0"
+   "GL_DRAW_BUFFER3\0"
+   "GL_DRAW_BUFFER4\0"
+   "GL_DRAW_BUFFER5\0"
+   "GL_DRAW_BUFFER6\0"
+   "GL_DRAW_BUFFER7\0"
+   "GL_DRAW_BUFFER8\0"
+   "GL_DRAW_BUFFER9\0"
+   "GL_DRAW_BUFFER10\0"
+   "GL_DRAW_BUFFER11\0"
+   "GL_DRAW_BUFFER12\0"
+   "GL_DRAW_BUFFER13\0"
+   "GL_DRAW_BUFFER14\0"
+   "GL_DRAW_BUFFER15\0"
+   "GL_BLEND_EQUATION_ALPHA\0"
+   "GL_MATRIX_PALETTE_ARB\0"
+   "GL_MAX_MATRIX_PALETTE_STACK_DEPTH_ARB\0"
+   "GL_MAX_PALETTE_MATRICES_ARB\0"
+   "GL_CURRENT_PALETTE_MATRIX_ARB\0"
+   "GL_MATRIX_INDEX_ARRAY_ARB\0"
+   "GL_CURRENT_MATRIX_INDEX_ARB\0"
+   "GL_MATRIX_INDEX_ARRAY_SIZE_ARB\0"
+   "GL_MATRIX_INDEX_ARRAY_TYPE_ARB\0"
+   "GL_MATRIX_INDEX_ARRAY_STRIDE_ARB\0"
+   "GL_MATRIX_INDEX_ARRAY_POINTER_ARB\0"
+   "GL_TEXTURE_DEPTH_SIZE\0"
+   "GL_DEPTH_TEXTURE_MODE\0"
+   "GL_TEXTURE_COMPARE_MODE\0"
+   "GL_TEXTURE_COMPARE_FUNC\0"
+   "GL_COMPARE_REF_TO_TEXTURE\0"
+   "GL_TEXTURE_CUBE_MAP_SEAMLESS\0"
+   "GL_POINT_SPRITE\0"
+   "GL_COORD_REPLACE\0"
+   "GL_POINT_SPRITE_R_MODE_NV\0"
+   "GL_QUERY_COUNTER_BITS\0"
+   "GL_CURRENT_QUERY\0"
+   "GL_QUERY_RESULT\0"
+   "GL_QUERY_RESULT_AVAILABLE\0"
+   "GL_MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV\0"
+   "GL_MAX_VERTEX_ATTRIBS\0"
+   "GL_VERTEX_ATTRIB_ARRAY_NORMALIZED\0"
+   "GL_DEPTH_STENCIL_TO_RGBA_NV\0"
+   "GL_DEPTH_STENCIL_TO_BGRA_NV\0"
+   "GL_FRAGMENT_PROGRAM_NV\0"
+   "GL_MAX_TEXTURE_COORDS\0"
+   "GL_MAX_TEXTURE_IMAGE_UNITS\0"
+   "GL_FRAGMENT_PROGRAM_BINDING_NV\0"
+   "GL_PROGRAM_ERROR_STRING_ARB\0"
+   "GL_PROGRAM_FORMAT_ASCII_ARB\0"
+   "GL_PROGRAM_FORMAT_ARB\0"
+   "GL_GEOMETRY_SHADER_INVOCATIONS\0"
+   "GL_TEXTURE_UNSIGNED_REMAP_MODE_NV\0"
+   "GL_DEPTH_BOUNDS_TEST_EXT\0"
+   "GL_DEPTH_BOUNDS_EXT\0"
+   "GL_ARRAY_BUFFER\0"
+   "GL_ELEMENT_ARRAY_BUFFER\0"
+   "GL_ARRAY_BUFFER_BINDING\0"
+   "GL_ELEMENT_ARRAY_BUFFER_BINDING\0"
+   "GL_VERTEX_ARRAY_BUFFER_BINDING\0"
+   "GL_NORMAL_ARRAY_BUFFER_BINDING\0"
+   "GL_COLOR_ARRAY_BUFFER_BINDING\0"
+   "GL_INDEX_ARRAY_BUFFER_BINDING\0"
+   "GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING\0"
+   "GL_EDGE_FLAG_ARRAY_BUFFER_BINDING\0"
+   "GL_SECONDARY_COLOR_ARRAY_BUFFER_BINDING\0"
+   "GL_FOG_COORDINATE_ARRAY_BUFFER_BINDING\0"
+   "GL_WEIGHT_ARRAY_BUFFER_BINDING\0"
+   "GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING\0"
+   "GL_PROGRAM_INSTRUCTIONS_ARB\0"
+   "GL_MAX_PROGRAM_INSTRUCTIONS_ARB\0"
+   "GL_PROGRAM_NATIVE_INSTRUCTIONS_ARB\0"
+   "GL_MAX_PROGRAM_NATIVE_INSTRUCTIONS_ARB\0"
+   "GL_PROGRAM_TEMPORARIES_ARB\0"
+   "GL_MAX_PROGRAM_TEMPORARIES_ARB\0"
+   "GL_PROGRAM_NATIVE_TEMPORARIES_ARB\0"
+   "GL_MAX_PROGRAM_NATIVE_TEMPORARIES_ARB\0"
+   "GL_PROGRAM_PARAMETERS_ARB\0"
+   "GL_MAX_PROGRAM_PARAMETERS_ARB\0"
+   "GL_PROGRAM_NATIVE_PARAMETERS_ARB\0"
+   "GL_MAX_PROGRAM_NATIVE_PARAMETERS_ARB\0"
+   "GL_PROGRAM_ATTRIBS_ARB\0"
+   "GL_MAX_PROGRAM_ATTRIBS_ARB\0"
+   "GL_PROGRAM_NATIVE_ATTRIBS_ARB\0"
+   "GL_MAX_PROGRAM_NATIVE_ATTRIBS_ARB\0"
+   "GL_PROGRAM_ADDRESS_REGISTERS_ARB\0"
+   "GL_MAX_PROGRAM_ADDRESS_REGISTERS_ARB\0"
+   "GL_PROGRAM_NATIVE_ADDRESS_REGISTERS_ARB\0"
+   "GL_MAX_PROGRAM_NATIVE_ADDRESS_REGISTERS_ARB\0"
+   "GL_MAX_PROGRAM_LOCAL_PARAMETERS_ARB\0"
+   "GL_MAX_PROGRAM_ENV_PARAMETERS_ARB\0"
+   "GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB\0"
+   "GL_TRANSPOSE_CURRENT_MATRIX_ARB\0"
+   "GL_READ_ONLY\0"
+   "GL_WRITE_ONLY\0"
+   "GL_READ_WRITE\0"
+   "GL_BUFFER_ACCESS\0"
+   "GL_BUFFER_MAPPED\0"
+   "GL_BUFFER_MAP_POINTER\0"
+   "GL_WRITE_DISCARD_NV\0"
+   "GL_TIME_ELAPSED\0"
+   "GL_MATRIX0_ARB\0"
+   "GL_MATRIX1_ARB\0"
+   "GL_MATRIX2_ARB\0"
+   "GL_MATRIX3_ARB\0"
+   "GL_MATRIX4_ARB\0"
+   "GL_MATRIX5_ARB\0"
+   "GL_MATRIX6_ARB\0"
+   "GL_MATRIX7_ARB\0"
+   "GL_MATRIX8_ARB\0"
+   "GL_MATRIX9_ARB\0"
+   "GL_MATRIX10_ARB\0"
+   "GL_MATRIX11_ARB\0"
+   "GL_MATRIX12_ARB\0"
+   "GL_MATRIX13_ARB\0"
+   "GL_MATRIX14_ARB\0"
+   "GL_MATRIX15_ARB\0"
+   "GL_MATRIX16_ARB\0"
+   "GL_MATRIX17_ARB\0"
+   "GL_MATRIX18_ARB\0"
+   "GL_MATRIX19_ARB\0"
+   "GL_MATRIX20_ARB\0"
+   "GL_MATRIX21_ARB\0"
+   "GL_MATRIX22_ARB\0"
+   "GL_MATRIX23_ARB\0"
+   "GL_MATRIX24_ARB\0"
+   "GL_MATRIX25_ARB\0"
+   "GL_MATRIX26_ARB\0"
+   "GL_MATRIX27_ARB\0"
+   "GL_MATRIX28_ARB\0"
+   "GL_MATRIX29_ARB\0"
+   "GL_MATRIX30_ARB\0"
+   "GL_MATRIX31_ARB\0"
+   "GL_STREAM_DRAW\0"
+   "GL_STREAM_READ\0"
+   "GL_STREAM_COPY\0"
+   "GL_STATIC_DRAW\0"
+   "GL_STATIC_READ\0"
+   "GL_STATIC_COPY\0"
+   "GL_DYNAMIC_DRAW\0"
+   "GL_DYNAMIC_READ\0"
+   "GL_DYNAMIC_COPY\0"
+   "GL_PIXEL_PACK_BUFFER\0"
+   "GL_PIXEL_UNPACK_BUFFER\0"
+   "GL_PIXEL_PACK_BUFFER_BINDING\0"
+   "GL_PIXEL_UNPACK_BUFFER_BINDING\0"
+   "GL_DEPTH24_STENCIL8\0"
+   "GL_TEXTURE_STENCIL_SIZE\0"
+   "GL_MAX_PROGRAM_EXEC_INSTRUCTIONS_NV\0"
+   "GL_MAX_PROGRAM_CALL_DEPTH_NV\0"
+   "GL_MAX_PROGRAM_IF_DEPTH_NV\0"
+   "GL_MAX_PROGRAM_LOOP_DEPTH_NV\0"
+   "GL_MAX_PROGRAM_LOOP_COUNT_NV\0"
+   "GL_SRC1_COLOR\0"
+   "GL_ONE_MINUS_SRC1_COLOR\0"
+   "GL_ONE_MINUS_SRC1_ALPHA\0"
+   "GL_MAX_DUAL_SOURCE_DRAW_BUFFERS\0"
+   "GL_VERTEX_ATTRIB_ARRAY_INTEGER\0"
+   "GL_VERTEX_ATTRIB_ARRAY_DIVISOR_ARB\0"
+   "GL_MAX_ARRAY_TEXTURE_LAYERS\0"
+   "GL_MIN_PROGRAM_TEXEL_OFFSET\0"
+   "GL_MAX_PROGRAM_TEXEL_OFFSET\0"
+   "GL_STENCIL_TEST_TWO_SIDE_EXT\0"
+   "GL_ACTIVE_STENCIL_FACE_EXT\0"
+   "GL_MIRROR_CLAMP_TO_BORDER_EXT\0"
+   "GL_SAMPLES_PASSED\0"
+   "GL_GEOMETRY_VERTICES_OUT\0"
+   "GL_GEOMETRY_INPUT_TYPE\0"
+   "GL_GEOMETRY_OUTPUT_TYPE\0"
+   "GL_SAMPLER_BINDING\0"
+   "GL_CLAMP_VERTEX_COLOR\0"
+   "GL_CLAMP_FRAGMENT_COLOR\0"
+   "GL_CLAMP_READ_COLOR\0"
+   "GL_FIXED_ONLY\0"
+   "GL_FRAGMENT_SHADER_ATI\0"
+   "GL_REG_0_ATI\0"
+   "GL_REG_1_ATI\0"
+   "GL_REG_2_ATI\0"
+   "GL_REG_3_ATI\0"
+   "GL_REG_4_ATI\0"
+   "GL_REG_5_ATI\0"
+   "GL_REG_6_ATI\0"
+   "GL_REG_7_ATI\0"
+   "GL_REG_8_ATI\0"
+   "GL_REG_9_ATI\0"
+   "GL_REG_10_ATI\0"
+   "GL_REG_11_ATI\0"
+   "GL_REG_12_ATI\0"
+   "GL_REG_13_ATI\0"
+   "GL_REG_14_ATI\0"
+   "GL_REG_15_ATI\0"
+   "GL_REG_16_ATI\0"
+   "GL_REG_17_ATI\0"
+   "GL_REG_18_ATI\0"
+   "GL_REG_19_ATI\0"
+   "GL_REG_20_ATI\0"
+   "GL_REG_21_ATI\0"
+   "GL_REG_22_ATI\0"
+   "GL_REG_23_ATI\0"
+   "GL_REG_24_ATI\0"
+   "GL_REG_25_ATI\0"
+   "GL_REG_26_ATI\0"
+   "GL_REG_27_ATI\0"
+   "GL_REG_28_ATI\0"
+   "GL_REG_29_ATI\0"
+   "GL_REG_30_ATI\0"
+   "GL_REG_31_ATI\0"
+   "GL_CON_0_ATI\0"
+   "GL_CON_1_ATI\0"
+   "GL_CON_2_ATI\0"
+   "GL_CON_3_ATI\0"
+   "GL_CON_4_ATI\0"
+   "GL_CON_5_ATI\0"
+   "GL_CON_6_ATI\0"
+   "GL_CON_7_ATI\0"
+   "GL_CON_8_ATI\0"
+   "GL_CON_9_ATI\0"
+   "GL_CON_10_ATI\0"
+   "GL_CON_11_ATI\0"
+   "GL_CON_12_ATI\0"
+   "GL_CON_13_ATI\0"
+   "GL_CON_14_ATI\0"
+   "GL_CON_15_ATI\0"
+   "GL_CON_16_ATI\0"
+   "GL_CON_17_ATI\0"
+   "GL_CON_18_ATI\0"
+   "GL_CON_19_ATI\0"
+   "GL_CON_20_ATI\0"
+   "GL_CON_21_ATI\0"
+   "GL_CON_22_ATI\0"
+   "GL_CON_23_ATI\0"
+   "GL_CON_24_ATI\0"
+   "GL_CON_25_ATI\0"
+   "GL_CON_26_ATI\0"
+   "GL_CON_27_ATI\0"
+   "GL_CON_28_ATI\0"
+   "GL_CON_29_ATI\0"
+   "GL_CON_30_ATI\0"
+   "GL_CON_31_ATI\0"
+   "GL_MOV_ATI\0"
+   "GL_ADD_ATI\0"
+   "GL_MUL_ATI\0"
+   "GL_SUB_ATI\0"
+   "GL_DOT3_ATI\0"
+   "GL_DOT4_ATI\0"
+   "GL_MAD_ATI\0"
+   "GL_LERP_ATI\0"
+   "GL_CND_ATI\0"
+   "GL_CND0_ATI\0"
+   "GL_DOT2_ADD_ATI\0"
+   "GL_SECONDARY_INTERPOLATOR_ATI\0"
+   "GL_NUM_FRAGMENT_REGISTERS_ATI\0"
+   "GL_NUM_FRAGMENT_CONSTANTS_ATI\0"
+   "GL_NUM_PASSES_ATI\0"
+   "GL_NUM_INSTRUCTIONS_PER_PASS_ATI\0"
+   "GL_NUM_INSTRUCTIONS_TOTAL_ATI\0"
+   "GL_NUM_INPUT_INTERPOLATOR_COMPONENTS_ATI\0"
+   "GL_NUM_LOOPBACK_COMPONENTS_ATI\0"
+   "GL_COLOR_ALPHA_PAIRING_ATI\0"
+   "GL_SWIZZLE_STR_ATI\0"
+   "GL_SWIZZLE_STQ_ATI\0"
+   "GL_SWIZZLE_STR_DR_ATI\0"
+   "GL_SWIZZLE_STQ_DQ_ATI\0"
+   "GL_SWIZZLE_STRQ_ATI\0"
+   "GL_SWIZZLE_STRQ_DQ_ATI\0"
+   "GL_POINT_SIZE_ARRAY_TYPE_OES\0"
+   "GL_POINT_SIZE_ARRAY_STRIDE_OES\0"
+   "GL_POINT_SIZE_ARRAY_POINTER_OES\0"
+   "GL_MODELVIEW_MATRIX_FLOAT_AS_INT_BITS_OES\0"
+   "GL_PROJECTION_MATRIX_FLOAT_AS_INT_BITS_OES\0"
+   "GL_TEXTURE_MATRIX_FLOAT_AS_INT_BITS_OES\0"
+   "GL_UNIFORM_BUFFER\0"
+   "GL_BUFFER_SERIALIZED_MODIFY_APPLE\0"
+   "GL_BUFFER_FLUSHING_UNMAP_APPLE\0"
+   "GL_RELEASED_APPLE\0"
+   "GL_VOLATILE_APPLE\0"
+   "GL_RETAINED_APPLE\0"
+   "GL_UNDEFINED_APPLE\0"
+   "GL_PURGEABLE_APPLE\0"
+   "GL_UNIFORM_BUFFER_BINDING\0"
+   "GL_UNIFORM_BUFFER_START\0"
+   "GL_UNIFORM_BUFFER_SIZE\0"
+   "GL_MAX_VERTEX_UNIFORM_BLOCKS\0"
+   "GL_MAX_GEOMETRY_UNIFORM_BLOCKS\0"
+   "GL_MAX_FRAGMENT_UNIFORM_BLOCKS\0"
+   "GL_MAX_COMBINED_UNIFORM_BLOCKS\0"
+   "GL_MAX_UNIFORM_BUFFER_BINDINGS\0"
+   "GL_MAX_UNIFORM_BLOCK_SIZE\0"
+   "GL_MAX_COMBINED_VERTEX_UNIFORM_COMPONENTS\0"
+   "GL_MAX_COMBINED_GEOMETRY_UNIFORM_COMPONENTS\0"
+   "GL_MAX_COMBINED_FRAGMENT_UNIFORM_COMPONENTS\0"
+   "GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT\0"
+   "GL_ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH\0"
+   "GL_ACTIVE_UNIFORM_BLOCKS\0"
+   "GL_UNIFORM_TYPE\0"
+   "GL_UNIFORM_SIZE\0"
+   "GL_UNIFORM_NAME_LENGTH\0"
+   "GL_UNIFORM_BLOCK_INDEX\0"
+   "GL_UNIFORM_OFFSET\0"
+   "GL_UNIFORM_ARRAY_STRIDE\0"
+   "GL_UNIFORM_MATRIX_STRIDE\0"
+   "GL_UNIFORM_IS_ROW_MAJOR\0"
+   "GL_UNIFORM_BLOCK_BINDING\0"
+   "GL_UNIFORM_BLOCK_DATA_SIZE\0"
+   "GL_UNIFORM_BLOCK_NAME_LENGTH\0"
+   "GL_UNIFORM_BLOCK_ACTIVE_UNIFORMS\0"
+   "GL_UNIFORM_BLOCK_ACTIVE_UNIFORM_INDICES\0"
+   "GL_UNIFORM_BLOCK_REFERENCED_BY_VERTEX_SHADER\0"
+   "GL_UNIFORM_BLOCK_REFERENCED_BY_GEOMETRY_SHADER\0"
+   "GL_UNIFORM_BLOCK_REFERENCED_BY_FRAGMENT_SHADER\0"
+   "GL_TEXTURE_SRGB_DECODE_EXT\0"
+   "GL_DECODE_EXT\0"
+   "GL_SKIP_DECODE_EXT\0"
+   "GL_FRAGMENT_SHADER\0"
+   "GL_VERTEX_SHADER\0"
+   "GL_PROGRAM_OBJECT_ARB\0"
+   "GL_SHADER_OBJECT_ARB\0"
+   "GL_MAX_FRAGMENT_UNIFORM_COMPONENTS\0"
+   "GL_MAX_VERTEX_UNIFORM_COMPONENTS\0"
+   "GL_MAX_VARYING_COMPONENTS\0"
+   "GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS\0"
+   "GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS\0"
+   "GL_OBJECT_TYPE_ARB\0"
+   "GL_SHADER_TYPE\0"
+   "GL_FLOAT_VEC2\0"
+   "GL_FLOAT_VEC3\0"
+   "GL_FLOAT_VEC4\0"
+   "GL_INT_VEC2\0"
+   "GL_INT_VEC3\0"
+   "GL_INT_VEC4\0"
+   "GL_BOOL\0"
+   "GL_BOOL_VEC2\0"
+   "GL_BOOL_VEC3\0"
+   "GL_BOOL_VEC4\0"
+   "GL_FLOAT_MAT2\0"
+   "GL_FLOAT_MAT3\0"
+   "GL_FLOAT_MAT4\0"
+   "GL_SAMPLER_1D\0"
+   "GL_SAMPLER_2D\0"
+   "GL_SAMPLER_3D\0"
+   "GL_SAMPLER_CUBE\0"
+   "GL_SAMPLER_1D_SHADOW\0"
+   "GL_SAMPLER_2D_SHADOW\0"
+   "GL_SAMPLER_2D_RECT\0"
+   "GL_SAMPLER_2D_RECT_SHADOW\0"
+   "GL_FLOAT_MAT2x3\0"
+   "GL_FLOAT_MAT2x4\0"
+   "GL_FLOAT_MAT3x2\0"
+   "GL_FLOAT_MAT3x4\0"
+   "GL_FLOAT_MAT4x2\0"
+   "GL_FLOAT_MAT4x3\0"
+   "GL_DELETE_STATUS\0"
+   "GL_COMPILE_STATUS\0"
+   "GL_LINK_STATUS\0"
+   "GL_VALIDATE_STATUS\0"
+   "GL_INFO_LOG_LENGTH\0"
+   "GL_ATTACHED_SHADERS\0"
+   "GL_ACTIVE_UNIFORMS\0"
+   "GL_ACTIVE_UNIFORM_MAX_LENGTH\0"
+   "GL_SHADER_SOURCE_LENGTH\0"
+   "GL_ACTIVE_ATTRIBUTES\0"
+   "GL_ACTIVE_ATTRIBUTE_MAX_LENGTH\0"
+   "GL_FRAGMENT_SHADER_DERIVATIVE_HINT\0"
+   "GL_SHADING_LANGUAGE_VERSION\0"
+   "GL_CURRENT_PROGRAM\0"
+   "GL_PALETTE4_RGB8_OES\0"
+   "GL_PALETTE4_RGBA8_OES\0"
+   "GL_PALETTE4_R5_G6_B5_OES\0"
+   "GL_PALETTE4_RGBA4_OES\0"
+   "GL_PALETTE4_RGB5_A1_OES\0"
+   "GL_PALETTE8_RGB8_OES\0"
+   "GL_PALETTE8_RGBA8_OES\0"
+   "GL_PALETTE8_R5_G6_B5_OES\0"
+   "GL_PALETTE8_RGBA4_OES\0"
+   "GL_PALETTE8_RGB5_A1_OES\0"
+   "GL_IMPLEMENTATION_COLOR_READ_TYPE\0"
+   "GL_IMPLEMENTATION_COLOR_READ_FORMAT\0"
+   "GL_POINT_SIZE_ARRAY_OES\0"
+   "GL_TEXTURE_CROP_RECT_OES\0"
+   "GL_MATRIX_INDEX_ARRAY_BUFFER_BINDING_OES\0"
+   "GL_POINT_SIZE_ARRAY_BUFFER_BINDING_OES\0"
+   "GL_COUNTER_TYPE_AMD\0"
+   "GL_COUNTER_RANGE_AMD\0"
+   "GL_UNSIGNED_INT64_AMD\0"
+   "GL_PECENTAGE_AMD\0"
+   "GL_PERFMON_RESULT_AVAILABLE_AMD\0"
+   "GL_PERFMON_RESULT_SIZE_AMD\0"
+   "GL_PERFMON_RESULT_AMD\0"
+   "GL_TEXTURE_RED_TYPE\0"
+   "GL_TEXTURE_GREEN_TYPE\0"
+   "GL_TEXTURE_BLUE_TYPE\0"
+   "GL_TEXTURE_ALPHA_TYPE\0"
+   "GL_TEXTURE_LUMINANCE_TYPE\0"
+   "GL_TEXTURE_INTENSITY_TYPE\0"
+   "GL_TEXTURE_DEPTH_TYPE\0"
+   "GL_UNSIGNED_NORMALIZED\0"
+   "GL_TEXTURE_1D_ARRAY\0"
+   "GL_PROXY_TEXTURE_1D_ARRAY\0"
+   "GL_TEXTURE_2D_ARRAY\0"
+   "GL_PROXY_TEXTURE_2D_ARRAY\0"
+   "GL_TEXTURE_BINDING_1D_ARRAY\0"
+   "GL_TEXTURE_BINDING_2D_ARRAY\0"
+   "GL_MAX_GEOMETRY_TEXTURE_IMAGE_UNITS\0"
+   "GL_TEXTURE_BUFFER\0"
+   "GL_MAX_TEXTURE_BUFFER_SIZE\0"
+   "GL_TEXTURE_BINDING_BUFFER\0"
+   "GL_TEXTURE_BUFFER_DATA_STORE_BINDING\0"
+   "GL_TEXTURE_BUFFER_FORMAT\0"
+   "GL_ANY_SAMPLES_PASSED\0"
+   "GL_SAMPLE_SHADING\0"
+   "GL_MIN_SAMPLE_SHADING_VALUE\0"
+   "GL_R11F_G11F_B10F\0"
+   "GL_UNSIGNED_INT_10F_11F_11F_REV\0"
+   "GL_RGBA_SIGNED_COMPONENTS_EXT\0"
+   "GL_RGB9_E5\0"
+   "GL_UNSIGNED_INT_5_9_9_9_REV\0"
+   "GL_TEXTURE_SHARED_SIZE\0"
+   "GL_SRGB\0"
+   "GL_SRGB8\0"
+   "GL_SRGB_ALPHA\0"
+   "GL_SRGB8_ALPHA8\0"
+   "GL_SLUMINANCE_ALPHA\0"
+   "GL_SLUMINANCE8_ALPHA8\0"
+   "GL_SLUMINANCE\0"
+   "GL_SLUMINANCE8\0"
+   "GL_COMPRESSED_SRGB\0"
+   "GL_COMPRESSED_SRGB_ALPHA\0"
+   "GL_COMPRESSED_SLUMINANCE\0"
+   "GL_COMPRESSED_SLUMINANCE_ALPHA\0"
+   "GL_TRANSFORM_FEEDBACK_VARYING_MAX_LENGTH\0"
+   "GL_TRANSFORM_FEEDBACK_BUFFER_MODE\0"
+   "GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS\0"
+   "GL_TRANSFORM_FEEDBACK_VARYINGS\0"
+   "GL_TRANSFORM_FEEDBACK_BUFFER_START\0"
+   "GL_TRANSFORM_FEEDBACK_BUFFER_SIZE\0"
+   "GL_PRIMITIVES_GENERATED\0"
+   "GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN\0"
+   "GL_RASTERIZER_DISCARD\0"
+   "GL_MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS\0"
+   "GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS\0"
+   "GL_INTERLEAVED_ATTRIBS\0"
+   "GL_SEPARATE_ATTRIBS\0"
+   "GL_TRANSFORM_FEEDBACK_BUFFER\0"
+   "GL_TRANSFORM_FEEDBACK_BUFFER_BINDING\0"
+   "GL_POINT_SPRITE_COORD_ORIGIN\0"
+   "GL_LOWER_LEFT\0"
+   "GL_UPPER_LEFT\0"
+   "GL_STENCIL_BACK_REF\0"
+   "GL_STENCIL_BACK_VALUE_MASK\0"
+   "GL_STENCIL_BACK_WRITEMASK\0"
+   "GL_DRAW_FRAMEBUFFER_BINDING\0"
+   "GL_RENDERBUFFER_BINDING\0"
+   "GL_READ_FRAMEBUFFER\0"
+   "GL_DRAW_FRAMEBUFFER\0"
+   "GL_READ_FRAMEBUFFER_BINDING\0"
+   "GL_RENDERBUFFER_SAMPLES\0"
+   "GL_DEPTH_COMPONENT32F\0"
+   "GL_DEPTH32F_STENCIL8\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LEVEL\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_CUBE_MAP_FACE\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER\0"
+   "GL_FRAMEBUFFER_COMPLETE\0"
+   "GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT\0"
+   "GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT\0"
+   "GL_FRAMEBUFFER_INCOMPLETE_DUPLICATE_ATTACHMENT_EXT\0"
+   "GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS_EXT\0"
+   "GL_FRAMEBUFFER_INCOMPLETE_FORMATS_EXT\0"
+   "GL_FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER\0"
+   "GL_FRAMEBUFFER_INCOMPLETE_READ_BUFFER\0"
+   "GL_FRAMEBUFFER_UNSUPPORTED\0"
+   "GL_FRAMEBUFFER_STATUS_ERROR_EXT\0"
+   "GL_MAX_COLOR_ATTACHMENTS\0"
+   "GL_COLOR_ATTACHMENT0\0"
+   "GL_COLOR_ATTACHMENT1\0"
+   "GL_COLOR_ATTACHMENT2\0"
+   "GL_COLOR_ATTACHMENT3\0"
+   "GL_COLOR_ATTACHMENT4\0"
+   "GL_COLOR_ATTACHMENT5\0"
+   "GL_COLOR_ATTACHMENT6\0"
+   "GL_COLOR_ATTACHMENT7\0"
+   "GL_COLOR_ATTACHMENT8\0"
+   "GL_COLOR_ATTACHMENT9\0"
+   "GL_COLOR_ATTACHMENT10\0"
+   "GL_COLOR_ATTACHMENT11\0"
+   "GL_COLOR_ATTACHMENT12\0"
+   "GL_COLOR_ATTACHMENT13\0"
+   "GL_COLOR_ATTACHMENT14\0"
+   "GL_COLOR_ATTACHMENT15\0"
+   "GL_DEPTH_ATTACHMENT\0"
+   "GL_STENCIL_ATTACHMENT\0"
+   "GL_FRAMEBUFFER\0"
+   "GL_RENDERBUFFER\0"
+   "GL_RENDERBUFFER_WIDTH\0"
+   "GL_RENDERBUFFER_HEIGHT\0"
+   "GL_RENDERBUFFER_INTERNAL_FORMAT\0"
+   "GL_STENCIL_INDEX_EXT\0"
+   "GL_STENCIL_INDEX1\0"
+   "GL_STENCIL_INDEX4\0"
+   "GL_STENCIL_INDEX8\0"
+   "GL_STENCIL_INDEX16\0"
+   "GL_RENDERBUFFER_RED_SIZE\0"
+   "GL_RENDERBUFFER_GREEN_SIZE\0"
+   "GL_RENDERBUFFER_BLUE_SIZE\0"
+   "GL_RENDERBUFFER_ALPHA_SIZE\0"
+   "GL_RENDERBUFFER_DEPTH_SIZE\0"
+   "GL_RENDERBUFFER_STENCIL_SIZE\0"
+   "GL_FRAMEBUFFER_INCOMPLETE_MULTISAMPLE\0"
+   "GL_MAX_SAMPLES\0"
+   "GL_TEXTURE_GEN_STR_OES\0"
+   "GL_HALF_FLOAT_OES\0"
+   "GL_RGB565\0"
+   "GL_ETC1_RGB8_OES\0"
+   "GL_TEXTURE_EXTERNAL_OES\0"
+   "GL_SAMPLER_EXTERNAL_OES\0"
+   "GL_TEXTURE_BINDING_EXTERNAL_OES\0"
+   "GL_REQUIRED_TEXTURE_IMAGE_UNITS_OES\0"
+   "GL_PRIMITIVE_RESTART_FIXED_INDEX\0"
+   "GL_ANY_SAMPLES_PASSED_CONSERVATIVE\0"
+   "GL_MAX_ELEMENT_INDEX\0"
+   "GL_RGBA32UI\0"
+   "GL_RGB32UI\0"
+   "GL_ALPHA32UI_EXT\0"
+   "GL_INTENSITY32UI_EXT\0"
+   "GL_LUMINANCE32UI_EXT\0"
+   "GL_LUMINANCE_ALPHA32UI_EXT\0"
+   "GL_RGBA16UI\0"
+   "GL_RGB16UI\0"
+   "GL_ALPHA16UI_EXT\0"
+   "GL_INTENSITY16UI_EXT\0"
+   "GL_LUMINANCE16UI_EXT\0"
+   "GL_LUMINANCE_ALPHA16UI_EXT\0"
+   "GL_RGBA8UI\0"
+   "GL_RGB8UI\0"
+   "GL_ALPHA8UI_EXT\0"
+   "GL_INTENSITY8UI_EXT\0"
+   "GL_LUMINANCE8UI_EXT\0"
+   "GL_LUMINANCE_ALPHA8UI_EXT\0"
+   "GL_RGBA32I\0"
+   "GL_RGB32I\0"
+   "GL_ALPHA32I_EXT\0"
+   "GL_INTENSITY32I_EXT\0"
+   "GL_LUMINANCE32I_EXT\0"
+   "GL_LUMINANCE_ALPHA32I_EXT\0"
+   "GL_RGBA16I\0"
+   "GL_RGB16I\0"
+   "GL_ALPHA16I_EXT\0"
+   "GL_INTENSITY16I_EXT\0"
+   "GL_LUMINANCE16I_EXT\0"
+   "GL_LUMINANCE_ALPHA16I_EXT\0"
+   "GL_RGBA8I\0"
+   "GL_RGB8I\0"
+   "GL_ALPHA8I_EXT\0"
+   "GL_INTENSITY8I_EXT\0"
+   "GL_LUMINANCE8I_EXT\0"
+   "GL_LUMINANCE_ALPHA8I_EXT\0"
+   "GL_RED_INTEGER\0"
+   "GL_GREEN_INTEGER\0"
+   "GL_BLUE_INTEGER\0"
+   "GL_ALPHA_INTEGER_EXT\0"
+   "GL_RGB_INTEGER\0"
+   "GL_RGBA_INTEGER\0"
+   "GL_BGR_INTEGER\0"
+   "GL_BGRA_INTEGER\0"
+   "GL_LUMINANCE_INTEGER_EXT\0"
+   "GL_LUMINANCE_ALPHA_INTEGER_EXT\0"
+   "GL_RGBA_INTEGER_MODE_EXT\0"
+   "GL_INT_2_10_10_10_REV\0"
+   "GL_FRAMEBUFFER_ATTACHMENT_LAYERED\0"
+   "GL_FRAMEBUFFER_INCOMPLETE_LAYER_TARGETS\0"
+   "GL_FRAMEBUFFER_INCOMPLETE_LAYER_COUNT_ARB\0"
+   "GL_FLOAT_32_UNSIGNED_INT_24_8_REV\0"
+   "GL_FRAMEBUFFER_SRGB\0"
+   "GL_FRAMEBUFFER_SRGB_CAPABLE_EXT\0"
+   "GL_COMPRESSED_RED_RGTC1\0"
+   "GL_COMPRESSED_SIGNED_RED_RGTC1\0"
+   "GL_COMPRESSED_RG_RGTC2\0"
+   "GL_COMPRESSED_SIGNED_RG_RGTC2\0"
+   "GL_SAMPLER_1D_ARRAY\0"
+   "GL_SAMPLER_2D_ARRAY\0"
+   "GL_SAMPLER_BUFFER\0"
+   "GL_SAMPLER_1D_ARRAY_SHADOW\0"
+   "GL_SAMPLER_2D_ARRAY_SHADOW\0"
+   "GL_SAMPLER_CUBE_SHADOW\0"
+   "GL_UNSIGNED_INT_VEC2\0"
+   "GL_UNSIGNED_INT_VEC3\0"
+   "GL_UNSIGNED_INT_VEC4\0"
+   "GL_INT_SAMPLER_1D\0"
+   "GL_INT_SAMPLER_2D\0"
+   "GL_INT_SAMPLER_3D\0"
+   "GL_INT_SAMPLER_CUBE\0"
+   "GL_INT_SAMPLER_2D_RECT\0"
+   "GL_INT_SAMPLER_1D_ARRAY\0"
+   "GL_INT_SAMPLER_2D_ARRAY\0"
+   "GL_INT_SAMPLER_BUFFER\0"
+   "GL_UNSIGNED_INT_SAMPLER_1D\0"
+   "GL_UNSIGNED_INT_SAMPLER_2D\0"
+   "GL_UNSIGNED_INT_SAMPLER_3D\0"
+   "GL_UNSIGNED_INT_SAMPLER_CUBE\0"
+   "GL_UNSIGNED_INT_SAMPLER_2D_RECT\0"
+   "GL_UNSIGNED_INT_SAMPLER_1D_ARRAY\0"
+   "GL_UNSIGNED_INT_SAMPLER_2D_ARRAY\0"
+   "GL_UNSIGNED_INT_SAMPLER_BUFFER\0"
+   "GL_GEOMETRY_SHADER\0"
+   "GL_GEOMETRY_VERTICES_OUT_ARB\0"
+   "GL_GEOMETRY_INPUT_TYPE_ARB\0"
+   "GL_GEOMETRY_OUTPUT_TYPE_ARB\0"
+   "GL_MAX_GEOMETRY_VARYING_COMPONENTS_ARB\0"
+   "GL_MAX_VERTEX_VARYING_COMPONENTS_ARB\0"
+   "GL_MAX_GEOMETRY_UNIFORM_COMPONENTS\0"
+   "GL_MAX_GEOMETRY_OUTPUT_VERTICES\0"
+   "GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS\0"
+   "GL_LOW_FLOAT\0"
+   "GL_MEDIUM_FLOAT\0"
+   "GL_HIGH_FLOAT\0"
+   "GL_LOW_INT\0"
+   "GL_MEDIUM_INT\0"
+   "GL_HIGH_INT\0"
+   "GL_UNSIGNED_INT_10_10_10_2_OES\0"
+   "GL_INT_10_10_10_2_OES\0"
+   "GL_SHADER_BINARY_FORMATS\0"
+   "GL_NUM_SHADER_BINARY_FORMATS\0"
+   "GL_SHADER_COMPILER\0"
+   "GL_MAX_VERTEX_UNIFORM_VECTORS\0"
+   "GL_MAX_VARYING_VECTORS\0"
+   "GL_MAX_FRAGMENT_UNIFORM_VECTORS\0"
+   "GL_QUERY_WAIT\0"
+   "GL_QUERY_NO_WAIT\0"
+   "GL_QUERY_BY_REGION_WAIT\0"
+   "GL_QUERY_BY_REGION_NO_WAIT\0"
+   "GL_TRANSFORM_FEEDBACK\0"
+   "GL_TRANSFORM_FEEDBACK_BUFFER_PAUSED\0"
+   "GL_TRANSFORM_FEEDBACK_BUFFER_ACTIVE\0"
+   "GL_TRANSFORM_FEEDBACK_BINDING\0"
+   "GL_TIMESTAMP\0"
+   "GL_TEXTURE_SWIZZLE_R\0"
+   "GL_TEXTURE_SWIZZLE_G\0"
+   "GL_TEXTURE_SWIZZLE_B\0"
+   "GL_TEXTURE_SWIZZLE_A\0"
+   "GL_TEXTURE_SWIZZLE_RGBA\0"
+   "GL_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION\0"
+   "GL_FIRST_VERTEX_CONVENTION\0"
+   "GL_LAST_VERTEX_CONVENTION\0"
+   "GL_PROVOKING_VERTEX\0"
+   "GL_SAMPLE_POSITION\0"
+   "GL_SAMPLE_MASK\0"
+   "GL_SAMPLE_MASK_VALUE\0"
+   "GL_MAX_SAMPLE_MASK_WORDS\0"
+   "GL_MAX_GEOMETRY_SHADER_INVOCATIONS\0"
+   "GL_MIN_FRAGMENT_INTERPOLATION_OFFSET\0"
+   "GL_MAX_FRAGMENT_INTERPOLATION_OFFSET\0"
+   "GL_FRAGMENT_INTERPOLATION_OFFSET_BITS\0"
+   "GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET\0"
+   "GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET\0"
+   "GL_MAX_TRANSFORM_FEEDBACK_BUFFERS\0"
+   "GL_MAX_VERTEX_STREAMS\0"
+   "GL_COPY_READ_BUFFER\0"
+   "GL_COPY_WRITE_BUFFER\0"
+   "GL_MAX_IMAGE_UNITS\0"
+   "GL_MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS\0"
+   "GL_IMAGE_BINDING_NAME\0"
+   "GL_IMAGE_BINDING_LEVEL\0"
+   "GL_IMAGE_BINDING_LAYERED\0"
+   "GL_IMAGE_BINDING_LAYER\0"
+   "GL_IMAGE_BINDING_ACCESS\0"
+   "GL_DRAW_INDIRECT_BUFFER\0"
+   "GL_DRAW_INDIRECT_BUFFER_BINDING\0"
+   "GL_RED_SNORM\0"
+   "GL_RG_SNORM\0"
+   "GL_RGB_SNORM\0"
+   "GL_RGBA_SNORM\0"
+   "GL_R8_SNORM\0"
+   "GL_RG8_SNORM\0"
+   "GL_RGB8_SNORM\0"
+   "GL_RGBA8_SNORM\0"
+   "GL_R16_SNORM\0"
+   "GL_RG16_SNORM\0"
+   "GL_RGB16_SNORM\0"
+   "GL_RGBA16_SNORM\0"
+   "GL_SIGNED_NORMALIZED\0"
+   "GL_PRIMITIVE_RESTART\0"
+   "GL_PRIMITIVE_RESTART_INDEX\0"
+   "GL_MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB\0"
+   "GL_TEXTURE_CUBE_MAP_ARRAY_ARB\0"
+   "GL_TEXTURE_BINDING_CUBE_MAP_ARRAY_ARB\0"
+   "GL_PROXY_TEXTURE_CUBE_MAP_ARRAY_ARB\0"
+   "GL_SAMPLER_CUBE_MAP_ARRAY_ARB\0"
+   "GL_SAMPLER_CUBE_MAP_ARRAY_SHADOW_ARB\0"
+   "GL_INT_SAMPLER_CUBE_MAP_ARRAY_ARB\0"
+   "GL_UNSIGNED_INT_SAMPLER_CUBE_MAP_ARRAY_ARB\0"
+   "GL_IMAGE_1D\0"
+   "GL_IMAGE_2D\0"
+   "GL_IMAGE_3D\0"
+   "GL_IMAGE_2D_RECT\0"
+   "GL_IMAGE_CUBE\0"
+   "GL_IMAGE_BUFFER\0"
+   "GL_IMAGE_1D_ARRAY\0"
+   "GL_IMAGE_2D_ARRAY\0"
+   "GL_IMAGE_CUBE_MAP_ARRAY\0"
+   "GL_IMAGE_2D_MULTISAMPLE\0"
+   "GL_IMAGE_2D_MULTISAMPLE_ARRAY\0"
+   "GL_INT_IMAGE_1D\0"
+   "GL_INT_IMAGE_2D\0"
+   "GL_INT_IMAGE_3D\0"
+   "GL_INT_IMAGE_2D_RECT\0"
+   "GL_INT_IMAGE_CUBE\0"
+   "GL_INT_IMAGE_BUFFER\0"
+   "GL_INT_IMAGE_1D_ARRAY\0"
+   "GL_INT_IMAGE_2D_ARRAY\0"
+   "GL_INT_IMAGE_CUBE_MAP_ARRAY\0"
+   "GL_INT_IMAGE_2D_MULTISAMPLE\0"
+   "GL_INT_IMAGE_2D_MULTISAMPLE_ARRAY\0"
+   "GL_UNSIGNED_INT_IMAGE_1D\0"
+   "GL_UNSIGNED_INT_IMAGE_2D\0"
+   "GL_UNSIGNED_INT_IMAGE_3D\0"
+   "GL_UNSIGNED_INT_IMAGE_2D_RECT\0"
+   "GL_UNSIGNED_INT_IMAGE_CUBE\0"
+   "GL_UNSIGNED_INT_IMAGE_BUFFER\0"
+   "GL_UNSIGNED_INT_IMAGE_1D_ARRAY\0"
+   "GL_UNSIGNED_INT_IMAGE_2D_ARRAY\0"
+   "GL_UNSIGNED_INT_IMAGE_CUBE_MAP_ARRAY\0"
+   "GL_UNSIGNED_INT_IMAGE_2D_MULTISAMPLE\0"
+   "GL_UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_ARRAY\0"
+   "GL_MAX_IMAGE_SAMPLES\0"
+   "GL_IMAGE_BINDING_FORMAT\0"
+   "GL_RGB10_A2UI\0"
+   "GL_MIN_MAP_BUFFER_ALIGNMENT\0"
+   "GL_IMAGE_FORMAT_COMPATIBILITY_TYPE\0"
+   "GL_IMAGE_FORMAT_COMPATIBILITY_BY_SIZE\0"
+   "GL_IMAGE_FORMAT_COMPATIBILITY_BY_CLASS\0"
+   "GL_MAX_VERTEX_IMAGE_UNIFORMS\0"
+   "GL_MAX_TESS_CONTROL_IMAGE_UNIFORMS\0"
+   "GL_MAX_TESS_EVALUATION_IMAGE_UNIFORMS\0"
+   "GL_MAX_GEOMETRY_IMAGE_UNIFORMS\0"
+   "GL_MAX_FRAGMENT_IMAGE_UNIFORMS\0"
+   "GL_MAX_COMBINED_IMAGE_UNIFORMS\0"
+   "GL_DEPTH_STENCIL_TEXTURE_MODE\0"
+   "GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS\0"
+   "GL_UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER\0"
+   "GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER\0"
+   "GL_DISPATCH_INDIRECT_BUFFER\0"
+   "GL_DISPATCH_INDIRECT_BUFFER_BINDING\0"
+   "GL_TEXTURE_2D_MULTISAMPLE\0"
+   "GL_PROXY_TEXTURE_2D_MULTISAMPLE\0"
+   "GL_TEXTURE_2D_MULTISAMPLE_ARRAY\0"
+   "GL_PROXY_TEXTURE_2D_MULTISAMPLE_ARRAY\0"
+   "GL_TEXTURE_BINDING_2D_MULTISAMPLE\0"
+   "GL_TEXTURE_BINDING_2D_MULTISAMPLE_ARRAY\0"
+   "GL_TEXTURE_SAMPLES\0"
+   "GL_TEXTURE_FIXED_SAMPLE_LOCATIONS\0"
+   "GL_SAMPLER_2D_MULTISAMPLE\0"
+   "GL_INT_SAMPLER_2D_MULTISAMPLE\0"
+   "GL_UNSIGNED_INT_SAMPLER_2D_MULTISAMPLE\0"
+   "GL_SAMPLER_2D_MULTISAMPLE_ARRAY\0"
+   "GL_INT_SAMPLER_2D_MULTISAMPLE_ARRAY\0"
+   "GL_UNSIGNED_INT_SAMPLER_2D_MULTISAMPLE_ARRAY\0"
+   "GL_MAX_COLOR_TEXTURE_SAMPLES\0"
+   "GL_MAX_DEPTH_TEXTURE_SAMPLES\0"
+   "GL_MAX_INTEGER_SAMPLES\0"
+   "GL_MAX_SERVER_WAIT_TIMEOUT\0"
+   "GL_OBJECT_TYPE\0"
+   "GL_SYNC_CONDITION\0"
+   "GL_SYNC_STATUS\0"
+   "GL_SYNC_FLAGS\0"
+   "GL_SYNC_FENCE\0"
+   "GL_SYNC_GPU_COMMANDS_COMPLETE\0"
+   "GL_UNSIGNALED\0"
+   "GL_SIGNALED\0"
+   "GL_ALREADY_SIGNALED\0"
+   "GL_TIMEOUT_EXPIRED\0"
+   "GL_CONDITION_SATISFIED\0"
+   "GL_WAIT_FAILED\0"
+   "GL_BUFFER_ACCESS_FLAGS\0"
+   "GL_BUFFER_MAP_LENGTH\0"
+   "GL_BUFFER_MAP_OFFSET\0"
+   "GL_MAX_VERTEX_OUTPUT_COMPONENTS\0"
+   "GL_MAX_GEOMETRY_INPUT_COMPONENTS\0"
+   "GL_MAX_GEOMETRY_OUTPUT_COMPONENTS\0"
+   "GL_MAX_FRAGMENT_INPUT_COMPONENTS\0"
+   "GL_CONTEXT_PROFILE_MASK\0"
+   "GL_TEXTURE_IMMUTABLE_FORMAT\0"
+   "GL_MAX_DEBUG_MESSAGE_LENGTH_ARB\0"
+   "GL_MAX_DEBUG_LOGGED_MESSAGES_ARB\0"
+   "GL_DEBUG_LOGGED_MESSAGES_ARB\0"
+   "GL_DEBUG_SEVERITY_HIGH_ARB\0"
+   "GL_DEBUG_SEVERITY_MEDIUM_ARB\0"
+   "GL_DEBUG_SEVERITY_LOW_ARB\0"
+   "GL_TEXTURE_BUFFER_OFFSET\0"
+   "GL_TEXTURE_BUFFER_SIZE\0"
+   "GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT\0"
+   "GL_COMPUTE_SHADER\0"
+   "GL_MAX_COMPUTE_UNIFORM_BLOCKS\0"
+   "GL_MAX_COMPUTE_TEXTURE_IMAGE_UNITS\0"
+   "GL_MAX_COMPUTE_IMAGE_UNIFORMS\0"
+   "GL_MAX_COMPUTE_WORK_GROUP_COUNT\0"
+   "GL_MAX_COMPUTE_WORK_GROUP_SIZE\0"
+   "GL_COMPRESSED_R11_EAC\0"
+   "GL_COMPRESSED_SIGNED_R11_EAC\0"
+   "GL_COMPRESSED_RG11_EAC\0"
+   "GL_COMPRESSED_SIGNED_RG11_EAC\0"
+   "GL_COMPRESSED_RGB8_ETC2\0"
+   "GL_COMPRESSED_SRGB8_ETC2\0"
+   "GL_COMPRESSED_RGB8_PUNCHTHROUGH_ALPHA1_ETC2\0"
+   "GL_COMPRESSED_SRGB8_PUNCHTHROUGH_ALPHA1_ETC2\0"
+   "GL_COMPRESSED_RGBA8_ETC2_EAC\0"
+   "GL_COMPRESSED_SRGB8_ALPHA8_ETC2_EAC\0"
+   "GL_ATOMIC_COUNTER_BUFFER\0"
+   "GL_ATOMIC_COUNTER_BUFFER_BINDING\0"
+   "GL_ATOMIC_COUNTER_BUFFER_START\0"
+   "GL_ATOMIC_COUNTER_BUFFER_SIZE\0"
+   "GL_ATOMIC_COUNTER_BUFFER_DATA_SIZE\0"
+   "GL_ATOMIC_COUNTER_BUFFER_ACTIVE_ATOMIC_COUNTERS\0"
+   "GL_ATOMIC_COUNTER_BUFFER_ACTIVE_ATOMIC_COUNTER_INDICES\0"
+   "GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_VERTEX_SHADER\0"
+   "GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_TESS_CONTROL_SHADER\0"
+   "GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_TESS_EVALUATION_SHADER\0"
+   "GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_GEOMETRY_SHADER\0"
+   "GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_FRAGMENT_SHADER\0"
+   "GL_MAX_VERTEX_ATOMIC_COUNTER_BUFFERS\0"
+   "GL_MAX_TESS_CONTROL_ATOMIC_COUNTER_BUFFERS\0"
+   "GL_MAX_TESS_EVALUATION_ATOMIC_COUNTER_BUFFERS\0"
+   "GL_MAX_GEOMETRY_ATOMIC_COUNTER_BUFFERS\0"
+   "GL_MAX_FRAGMENT_ATOMIC_COUNTER_BUFFERS\0"
+   "GL_MAX_COMBINED_ATOMIC_COUNTER_BUFFERS\0"
+   "GL_MAX_VERTEX_ATOMIC_COUNTERS\0"
+   "GL_MAX_TESS_CONTROL_ATOMIC_COUNTERS\0"
+   "GL_MAX_TESS_EVALUATION_ATOMIC_COUNTERS\0"
+   "GL_MAX_GEOMETRY_ATOMIC_COUNTERS\0"
+   "GL_MAX_FRAGMENT_ATOMIC_COUNTERS\0"
+   "GL_MAX_COMBINED_ATOMIC_COUNTERS\0"
+   "GL_MAX_ATOMIC_COUNTER_BUFFER_SIZE\0"
+   "GL_ACTIVE_ATOMIC_COUNTER_BUFFERS\0"
+   "GL_UNIFORM_ATOMIC_COUNTER_BUFFER_INDEX\0"
+   "GL_UNSIGNED_INT_ATOMIC_COUNTER\0"
+   "GL_MAX_ATOMIC_COUNTER_BUFFER_BINDINGS\0"
+   "GL_DEBUG_OUTPUT\0"
+   "GL_NUM_SAMPLE_COUNTS\0"
+   "GL_PERFQUERY_COUNTER_EVENT_INTEL\0"
+   "GL_PERFQUERY_COUNTER_DURATION_NORM_INTEL\0"
+   "GL_PERFQUERY_COUNTER_DURATION_RAW_INTEL\0"
+   "GL_PERFQUERY_COUNTER_THROUGHPUT_INTEL\0"
+   "GL_PERFQUERY_COUNTER_RAW_INTEL\0"
+   "GL_PERFQUERY_COUNTER_TIMESTAMP_INTEL\0"
+   "GL_PERFQUERY_COUNTER_DATA_UINT32_INTEL\0"
+   "GL_PERFQUERY_COUNTER_DATA_UINT64_INTEL\0"
+   "GL_PERFQUERY_COUNTER_DATA_FLOAT_INTEL\0"
+   "GL_PERFQUERY_COUNTER_DATA_DOUBLE_INTEL\0"
+   "GL_PERFQUERY_COUNTER_DATA_BOOL32_INTEL\0"
+   "GL_PERFQUERY_QUERY_NAME_LENGTH_MAX_INTEL\0"
+   "GL_PERFQUERY_COUNTER_NAME_LENGTH_MAX_INTEL\0"
+   "GL_PERFQUERY_COUNTER_DESC_LENGTH_MAX_INTEL\0"
+   "GL_PERFQUERY_GPA_EXTENDED_COUNTERS_INTEL\0"
+   "GL_EVAL_BIT\0"
+   "GL_RASTER_POSITION_UNCLIPPED_IBM\0"
+   "GL_LIST_BIT\0"
+   "GL_TEXTURE_BIT\0"
+   "GL_SCISSOR_BIT\0"
+   "GL_ALL_ATTRIB_BITS\0"
+   "GL_MULTISAMPLE_BIT\0"
+   "GL_ALL_CLIENT_ATTRIB_BITS\0"
+   ;
+
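+/*
+ * Illustrative lookup sketch (not part of the generated tables): each
+ * enum_elt below pairs a byte offset into the preceding string table
+ * with the numeric GL enum value, and the entries are sorted by enum
+ * value, so a name can be recovered with a binary search.  The field
+ * names (offset, n), the string table name enum_string_table, and the
+ * helper name are assumptions about definitions earlier in this file;
+ * the sketch is kept under #if 0 so it cannot affect the build.
+ */
+#if 0
+static const char *
+lookup_enum_name(unsigned nr)
+{
+   size_t lo = 0, hi = 2007; /* entry count from the declaration below */
+   while (lo < hi) {
+      const size_t mid = lo + (hi - lo) / 2;
+      if (enum_string_table_offsets[mid].n < nr)
+         lo = mid + 1;
+      else
+         hi = mid;
+   }
+   if (lo < 2007 && enum_string_table_offsets[lo].n == nr)
+      return enum_string_table + enum_string_table_offsets[lo].offset;
+   return NULL; /* value not present in the table */
+}
+#endif
+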
+static const enum_elt enum_string_table_offsets[2007] =
+{
+   {     0, 0x00000000 }, /* GL_FALSE */
+   {     9, 0x00000001 }, /* GL_LINES */
+   {    18, 0x00000002 }, /* GL_LINE_LOOP */
+   {    31, 0x00000003 }, /* GL_LINE_STRIP */
+   {    45, 0x00000004 }, /* GL_TRIANGLES */
+   {    58, 0x00000005 }, /* GL_TRIANGLE_STRIP */
+   {    76, 0x00000006 }, /* GL_TRIANGLE_FAN */
+   {    92, 0x00000007 }, /* GL_QUADS */
+   {   101, 0x00000008 }, /* GL_QUAD_STRIP */
+   {   115, 0x00000009 }, /* GL_POLYGON */
+   {   126, 0x0000000A }, /* GL_LINES_ADJACENCY */
+   {   145, 0x0000000B }, /* GL_LINE_STRIP_ADJACENCY */
+   {   169, 0x0000000C }, /* GL_TRIANGLES_ADJACENCY */
+   {   192, 0x0000000D }, /* GL_TRIANGLE_STRIP_ADJACENCY */
+   {   220, 0x00000010 }, /* GL_POLYGON_STIPPLE_BIT */
+   {   243, 0x00000020 }, /* GL_PIXEL_MODE_BIT */
+   {   261, 0x00000040 }, /* GL_LIGHTING_BIT */
+   {   277, 0x00000080 }, /* GL_FOG_BIT */
+   {   288, 0x00000100 }, /* GL_ACCUM */
+   {   297, 0x00000101 }, /* GL_LOAD */
+   {   305, 0x00000102 }, /* GL_RETURN */
+   {   315, 0x00000103 }, /* GL_MULT */
+   {   323, 0x00000104 }, /* GL_ADD */
+   {   330, 0x00000200 }, /* GL_NEVER */
+   {   339, 0x00000201 }, /* GL_LESS */
+   {   347, 0x00000202 }, /* GL_EQUAL */
+   {   356, 0x00000203 }, /* GL_LEQUAL */
+   {   366, 0x00000204 }, /* GL_GREATER */
+   {   377, 0x00000205 }, /* GL_NOTEQUAL */
+   {   389, 0x00000206 }, /* GL_GEQUAL */
+   {   399, 0x00000207 }, /* GL_ALWAYS */
+   {   409, 0x00000300 }, /* GL_SRC_COLOR */
+   {   422, 0x00000301 }, /* GL_ONE_MINUS_SRC_COLOR */
+   {   445, 0x00000302 }, /* GL_SRC_ALPHA */
+   {   458, 0x00000303 }, /* GL_ONE_MINUS_SRC_ALPHA */
+   {   481, 0x00000304 }, /* GL_DST_ALPHA */
+   {   494, 0x00000305 }, /* GL_ONE_MINUS_DST_ALPHA */
+   {   517, 0x00000306 }, /* GL_DST_COLOR */
+   {   530, 0x00000307 }, /* GL_ONE_MINUS_DST_COLOR */
+   {   553, 0x00000308 }, /* GL_SRC_ALPHA_SATURATE */
+   {   575, 0x00000400 }, /* GL_FRONT_LEFT */
+   {   589, 0x00000401 }, /* GL_FRONT_RIGHT */
+   {   604, 0x00000402 }, /* GL_BACK_LEFT */
+   {   617, 0x00000403 }, /* GL_BACK_RIGHT */
+   {   631, 0x00000404 }, /* GL_FRONT */
+   {   640, 0x00000405 }, /* GL_BACK */
+   {   648, 0x00000406 }, /* GL_LEFT */
+   {   656, 0x00000407 }, /* GL_RIGHT */
+   {   665, 0x00000408 }, /* GL_FRONT_AND_BACK */
+   {   683, 0x00000409 }, /* GL_AUX0 */
+   {   691, 0x0000040A }, /* GL_AUX1 */
+   {   699, 0x0000040B }, /* GL_AUX2 */
+   {   707, 0x0000040C }, /* GL_AUX3 */
+   {   715, 0x00000500 }, /* GL_INVALID_ENUM */
+   {   731, 0x00000501 }, /* GL_INVALID_VALUE */
+   {   748, 0x00000502 }, /* GL_INVALID_OPERATION */
+   {   769, 0x00000503 }, /* GL_STACK_OVERFLOW */
+   {   787, 0x00000504 }, /* GL_STACK_UNDERFLOW */
+   {   806, 0x00000505 }, /* GL_OUT_OF_MEMORY */
+   {   823, 0x00000506 }, /* GL_INVALID_FRAMEBUFFER_OPERATION */
+   {   856, 0x00000600 }, /* GL_2D */
+   {   862, 0x00000601 }, /* GL_3D */
+   {   868, 0x00000602 }, /* GL_3D_COLOR */
+   {   880, 0x00000603 }, /* GL_3D_COLOR_TEXTURE */
+   {   900, 0x00000604 }, /* GL_4D_COLOR_TEXTURE */
+   {   920, 0x00000700 }, /* GL_PASS_THROUGH_TOKEN */
+   {   942, 0x00000701 }, /* GL_POINT_TOKEN */
+   {   957, 0x00000702 }, /* GL_LINE_TOKEN */
+   {   971, 0x00000703 }, /* GL_POLYGON_TOKEN */
+   {   988, 0x00000704 }, /* GL_BITMAP_TOKEN */
+   {  1004, 0x00000705 }, /* GL_DRAW_PIXEL_TOKEN */
+   {  1024, 0x00000706 }, /* GL_COPY_PIXEL_TOKEN */
+   {  1044, 0x00000707 }, /* GL_LINE_RESET_TOKEN */
+   {  1064, 0x00000800 }, /* GL_EXP */
+   {  1071, 0x00000801 }, /* GL_EXP2 */
+   {  1079, 0x00000900 }, /* GL_CW */
+   {  1085, 0x00000901 }, /* GL_CCW */
+   {  1092, 0x00000A00 }, /* GL_COEFF */
+   {  1101, 0x00000A01 }, /* GL_ORDER */
+   {  1110, 0x00000A02 }, /* GL_DOMAIN */
+   {  1120, 0x00000B00 }, /* GL_CURRENT_COLOR */
+   {  1137, 0x00000B01 }, /* GL_CURRENT_INDEX */
+   {  1154, 0x00000B02 }, /* GL_CURRENT_NORMAL */
+   {  1172, 0x00000B03 }, /* GL_CURRENT_TEXTURE_COORDS */
+   {  1198, 0x00000B04 }, /* GL_CURRENT_RASTER_COLOR */
+   {  1222, 0x00000B05 }, /* GL_CURRENT_RASTER_INDEX */
+   {  1246, 0x00000B06 }, /* GL_CURRENT_RASTER_TEXTURE_COORDS */
+   {  1279, 0x00000B07 }, /* GL_CURRENT_RASTER_POSITION */
+   {  1306, 0x00000B08 }, /* GL_CURRENT_RASTER_POSITION_VALID */
+   {  1339, 0x00000B09 }, /* GL_CURRENT_RASTER_DISTANCE */
+   {  1366, 0x00000B10 }, /* GL_POINT_SMOOTH */
+   {  1382, 0x00000B11 }, /* GL_POINT_SIZE */
+   {  1396, 0x00000B12 }, /* GL_POINT_SIZE_RANGE */
+   {  1416, 0x00000B13 }, /* GL_POINT_SIZE_GRANULARITY */
+   {  1442, 0x00000B20 }, /* GL_LINE_SMOOTH */
+   {  1457, 0x00000B21 }, /* GL_LINE_WIDTH */
+   {  1471, 0x00000B22 }, /* GL_LINE_WIDTH_RANGE */
+   {  1491, 0x00000B23 }, /* GL_LINE_WIDTH_GRANULARITY */
+   {  1517, 0x00000B24 }, /* GL_LINE_STIPPLE */
+   {  1533, 0x00000B25 }, /* GL_LINE_STIPPLE_PATTERN */
+   {  1557, 0x00000B26 }, /* GL_LINE_STIPPLE_REPEAT */
+   {  1580, 0x00000B30 }, /* GL_LIST_MODE */
+   {  1593, 0x00000B31 }, /* GL_MAX_LIST_NESTING */
+   {  1613, 0x00000B32 }, /* GL_LIST_BASE */
+   {  1626, 0x00000B33 }, /* GL_LIST_INDEX */
+   {  1640, 0x00000B40 }, /* GL_POLYGON_MODE */
+   {  1656, 0x00000B41 }, /* GL_POLYGON_SMOOTH */
+   {  1674, 0x00000B42 }, /* GL_POLYGON_STIPPLE */
+   {  1693, 0x00000B43 }, /* GL_EDGE_FLAG */
+   {  1706, 0x00000B44 }, /* GL_CULL_FACE */
+   {  1719, 0x00000B45 }, /* GL_CULL_FACE_MODE */
+   {  1737, 0x00000B46 }, /* GL_FRONT_FACE */
+   {  1751, 0x00000B50 }, /* GL_LIGHTING */
+   {  1763, 0x00000B51 }, /* GL_LIGHT_MODEL_LOCAL_VIEWER */
+   {  1791, 0x00000B52 }, /* GL_LIGHT_MODEL_TWO_SIDE */
+   {  1815, 0x00000B53 }, /* GL_LIGHT_MODEL_AMBIENT */
+   {  1838, 0x00000B54 }, /* GL_SHADE_MODEL */
+   {  1853, 0x00000B55 }, /* GL_COLOR_MATERIAL_FACE */
+   {  1876, 0x00000B56 }, /* GL_COLOR_MATERIAL_PARAMETER */
+   {  1904, 0x00000B57 }, /* GL_COLOR_MATERIAL */
+   {  1922, 0x00000B60 }, /* GL_FOG */
+   {  1929, 0x00000B61 }, /* GL_FOG_INDEX */
+   {  1942, 0x00000B62 }, /* GL_FOG_DENSITY */
+   {  1957, 0x00000B63 }, /* GL_FOG_START */
+   {  1970, 0x00000B64 }, /* GL_FOG_END */
+   {  1981, 0x00000B65 }, /* GL_FOG_MODE */
+   {  1993, 0x00000B66 }, /* GL_FOG_COLOR */
+   {  2006, 0x00000B70 }, /* GL_DEPTH_RANGE */
+   {  2021, 0x00000B71 }, /* GL_DEPTH_TEST */
+   {  2035, 0x00000B72 }, /* GL_DEPTH_WRITEMASK */
+   {  2054, 0x00000B73 }, /* GL_DEPTH_CLEAR_VALUE */
+   {  2075, 0x00000B74 }, /* GL_DEPTH_FUNC */
+   {  2089, 0x00000B80 }, /* GL_ACCUM_CLEAR_VALUE */
+   {  2110, 0x00000B90 }, /* GL_STENCIL_TEST */
+   {  2126, 0x00000B91 }, /* GL_STENCIL_CLEAR_VALUE */
+   {  2149, 0x00000B92 }, /* GL_STENCIL_FUNC */
+   {  2165, 0x00000B93 }, /* GL_STENCIL_VALUE_MASK */
+   {  2187, 0x00000B94 }, /* GL_STENCIL_FAIL */
+   {  2203, 0x00000B95 }, /* GL_STENCIL_PASS_DEPTH_FAIL */
+   {  2230, 0x00000B96 }, /* GL_STENCIL_PASS_DEPTH_PASS */
+   {  2257, 0x00000B97 }, /* GL_STENCIL_REF */
+   {  2272, 0x00000B98 }, /* GL_STENCIL_WRITEMASK */
+   {  2293, 0x00000BA0 }, /* GL_MATRIX_MODE */
+   {  2308, 0x00000BA1 }, /* GL_NORMALIZE */
+   {  2321, 0x00000BA2 }, /* GL_VIEWPORT */
+   {  2333, 0x00000BA3 }, /* GL_MODELVIEW_STACK_DEPTH */
+   {  2358, 0x00000BA4 }, /* GL_PROJECTION_STACK_DEPTH */
+   {  2384, 0x00000BA5 }, /* GL_TEXTURE_STACK_DEPTH */
+   {  2407, 0x00000BA6 }, /* GL_MODELVIEW_MATRIX */
+   {  2427, 0x00000BA7 }, /* GL_PROJECTION_MATRIX */
+   {  2448, 0x00000BA8 }, /* GL_TEXTURE_MATRIX */
+   {  2466, 0x00000BB0 }, /* GL_ATTRIB_STACK_DEPTH */
+   {  2488, 0x00000BB1 }, /* GL_CLIENT_ATTRIB_STACK_DEPTH */
+   {  2517, 0x00000BC0 }, /* GL_ALPHA_TEST */
+   {  2531, 0x00000BC1 }, /* GL_ALPHA_TEST_FUNC */
+   {  2550, 0x00000BC2 }, /* GL_ALPHA_TEST_REF */
+   {  2568, 0x00000BD0 }, /* GL_DITHER */
+   {  2578, 0x00000BE0 }, /* GL_BLEND_DST */
+   {  2591, 0x00000BE1 }, /* GL_BLEND_SRC */
+   {  2604, 0x00000BE2 }, /* GL_BLEND */
+   {  2613, 0x00000BF0 }, /* GL_LOGIC_OP_MODE */
+   {  2630, 0x00000BF1 }, /* GL_INDEX_LOGIC_OP */
+   {  2648, 0x00000BF2 }, /* GL_COLOR_LOGIC_OP */
+   {  2666, 0x00000C00 }, /* GL_AUX_BUFFERS */
+   {  2681, 0x00000C01 }, /* GL_DRAW_BUFFER */
+   {  2696, 0x00000C02 }, /* GL_READ_BUFFER */
+   {  2711, 0x00000C10 }, /* GL_SCISSOR_BOX */
+   {  2726, 0x00000C11 }, /* GL_SCISSOR_TEST */
+   {  2742, 0x00000C20 }, /* GL_INDEX_CLEAR_VALUE */
+   {  2763, 0x00000C21 }, /* GL_INDEX_WRITEMASK */
+   {  2782, 0x00000C22 }, /* GL_COLOR_CLEAR_VALUE */
+   {  2803, 0x00000C23 }, /* GL_COLOR_WRITEMASK */
+   {  2822, 0x00000C30 }, /* GL_INDEX_MODE */
+   {  2836, 0x00000C31 }, /* GL_RGBA_MODE */
+   {  2849, 0x00000C32 }, /* GL_DOUBLEBUFFER */
+   {  2865, 0x00000C33 }, /* GL_STEREO */
+   {  2875, 0x00000C40 }, /* GL_RENDER_MODE */
+   {  2890, 0x00000C50 }, /* GL_PERSPECTIVE_CORRECTION_HINT */
+   {  2921, 0x00000C51 }, /* GL_POINT_SMOOTH_HINT */
+   {  2942, 0x00000C52 }, /* GL_LINE_SMOOTH_HINT */
+   {  2962, 0x00000C53 }, /* GL_POLYGON_SMOOTH_HINT */
+   {  2985, 0x00000C54 }, /* GL_FOG_HINT */
+   {  2997, 0x00000C60 }, /* GL_TEXTURE_GEN_S */
+   {  3014, 0x00000C61 }, /* GL_TEXTURE_GEN_T */
+   {  3031, 0x00000C62 }, /* GL_TEXTURE_GEN_R */
+   {  3048, 0x00000C63 }, /* GL_TEXTURE_GEN_Q */
+   {  3065, 0x00000C70 }, /* GL_PIXEL_MAP_I_TO_I */
+   {  3085, 0x00000C71 }, /* GL_PIXEL_MAP_S_TO_S */
+   {  3105, 0x00000C72 }, /* GL_PIXEL_MAP_I_TO_R */
+   {  3125, 0x00000C73 }, /* GL_PIXEL_MAP_I_TO_G */
+   {  3145, 0x00000C74 }, /* GL_PIXEL_MAP_I_TO_B */
+   {  3165, 0x00000C75 }, /* GL_PIXEL_MAP_I_TO_A */
+   {  3185, 0x00000C76 }, /* GL_PIXEL_MAP_R_TO_R */
+   {  3205, 0x00000C77 }, /* GL_PIXEL_MAP_G_TO_G */
+   {  3225, 0x00000C78 }, /* GL_PIXEL_MAP_B_TO_B */
+   {  3245, 0x00000C79 }, /* GL_PIXEL_MAP_A_TO_A */
+   {  3265, 0x00000CB0 }, /* GL_PIXEL_MAP_I_TO_I_SIZE */
+   {  3290, 0x00000CB1 }, /* GL_PIXEL_MAP_S_TO_S_SIZE */
+   {  3315, 0x00000CB2 }, /* GL_PIXEL_MAP_I_TO_R_SIZE */
+   {  3340, 0x00000CB3 }, /* GL_PIXEL_MAP_I_TO_G_SIZE */
+   {  3365, 0x00000CB4 }, /* GL_PIXEL_MAP_I_TO_B_SIZE */
+   {  3390, 0x00000CB5 }, /* GL_PIXEL_MAP_I_TO_A_SIZE */
+   {  3415, 0x00000CB6 }, /* GL_PIXEL_MAP_R_TO_R_SIZE */
+   {  3440, 0x00000CB7 }, /* GL_PIXEL_MAP_G_TO_G_SIZE */
+   {  3465, 0x00000CB8 }, /* GL_PIXEL_MAP_B_TO_B_SIZE */
+   {  3490, 0x00000CB9 }, /* GL_PIXEL_MAP_A_TO_A_SIZE */
+   {  3515, 0x00000CF0 }, /* GL_UNPACK_SWAP_BYTES */
+   {  3536, 0x00000CF1 }, /* GL_UNPACK_LSB_FIRST */
+   {  3556, 0x00000CF2 }, /* GL_UNPACK_ROW_LENGTH */
+   {  3577, 0x00000CF3 }, /* GL_UNPACK_SKIP_ROWS */
+   {  3597, 0x00000CF4 }, /* GL_UNPACK_SKIP_PIXELS */
+   {  3619, 0x00000CF5 }, /* GL_UNPACK_ALIGNMENT */
+   {  3639, 0x00000D00 }, /* GL_PACK_SWAP_BYTES */
+   {  3658, 0x00000D01 }, /* GL_PACK_LSB_FIRST */
+   {  3676, 0x00000D02 }, /* GL_PACK_ROW_LENGTH */
+   {  3695, 0x00000D03 }, /* GL_PACK_SKIP_ROWS */
+   {  3713, 0x00000D04 }, /* GL_PACK_SKIP_PIXELS */
+   {  3733, 0x00000D05 }, /* GL_PACK_ALIGNMENT */
+   {  3751, 0x00000D10 }, /* GL_MAP_COLOR */
+   {  3764, 0x00000D11 }, /* GL_MAP_STENCIL */
+   {  3779, 0x00000D12 }, /* GL_INDEX_SHIFT */
+   {  3794, 0x00000D13 }, /* GL_INDEX_OFFSET */
+   {  3810, 0x00000D14 }, /* GL_RED_SCALE */
+   {  3823, 0x00000D15 }, /* GL_RED_BIAS */
+   {  3835, 0x00000D16 }, /* GL_ZOOM_X */
+   {  3845, 0x00000D17 }, /* GL_ZOOM_Y */
+   {  3855, 0x00000D18 }, /* GL_GREEN_SCALE */
+   {  3870, 0x00000D19 }, /* GL_GREEN_BIAS */
+   {  3884, 0x00000D1A }, /* GL_BLUE_SCALE */
+   {  3898, 0x00000D1B }, /* GL_BLUE_BIAS */
+   {  3911, 0x00000D1C }, /* GL_ALPHA_SCALE */
+   {  3926, 0x00000D1D }, /* GL_ALPHA_BIAS */
+   {  3940, 0x00000D1E }, /* GL_DEPTH_SCALE */
+   {  3955, 0x00000D1F }, /* GL_DEPTH_BIAS */
+   {  3969, 0x00000D30 }, /* GL_MAX_EVAL_ORDER */
+   {  3987, 0x00000D31 }, /* GL_MAX_LIGHTS */
+   {  4001, 0x00000D32 }, /* GL_MAX_CLIP_DISTANCES */
+   {  4023, 0x00000D33 }, /* GL_MAX_TEXTURE_SIZE */
+   {  4043, 0x00000D34 }, /* GL_MAX_PIXEL_MAP_TABLE */
+   {  4066, 0x00000D35 }, /* GL_MAX_ATTRIB_STACK_DEPTH */
+   {  4092, 0x00000D36 }, /* GL_MAX_MODELVIEW_STACK_DEPTH */
+   {  4121, 0x00000D37 }, /* GL_MAX_NAME_STACK_DEPTH */
+   {  4145, 0x00000D38 }, /* GL_MAX_PROJECTION_STACK_DEPTH */
+   {  4175, 0x00000D39 }, /* GL_MAX_TEXTURE_STACK_DEPTH */
+   {  4202, 0x00000D3A }, /* GL_MAX_VIEWPORT_DIMS */
+   {  4223, 0x00000D3B }, /* GL_MAX_CLIENT_ATTRIB_STACK_DEPTH */
+   {  4256, 0x00000D50 }, /* GL_SUBPIXEL_BITS */
+   {  4273, 0x00000D51 }, /* GL_INDEX_BITS */
+   {  4287, 0x00000D52 }, /* GL_RED_BITS */
+   {  4299, 0x00000D53 }, /* GL_GREEN_BITS */
+   {  4313, 0x00000D54 }, /* GL_BLUE_BITS */
+   {  4326, 0x00000D55 }, /* GL_ALPHA_BITS */
+   {  4340, 0x00000D56 }, /* GL_DEPTH_BITS */
+   {  4354, 0x00000D57 }, /* GL_STENCIL_BITS */
+   {  4370, 0x00000D58 }, /* GL_ACCUM_RED_BITS */
+   {  4388, 0x00000D59 }, /* GL_ACCUM_GREEN_BITS */
+   {  4408, 0x00000D5A }, /* GL_ACCUM_BLUE_BITS */
+   {  4427, 0x00000D5B }, /* GL_ACCUM_ALPHA_BITS */
+   {  4447, 0x00000D70 }, /* GL_NAME_STACK_DEPTH */
+   {  4467, 0x00000D80 }, /* GL_AUTO_NORMAL */
+   {  4482, 0x00000D90 }, /* GL_MAP1_COLOR_4 */
+   {  4498, 0x00000D91 }, /* GL_MAP1_INDEX */
+   {  4512, 0x00000D92 }, /* GL_MAP1_NORMAL */
+   {  4527, 0x00000D93 }, /* GL_MAP1_TEXTURE_COORD_1 */
+   {  4551, 0x00000D94 }, /* GL_MAP1_TEXTURE_COORD_2 */
+   {  4575, 0x00000D95 }, /* GL_MAP1_TEXTURE_COORD_3 */
+   {  4599, 0x00000D96 }, /* GL_MAP1_TEXTURE_COORD_4 */
+   {  4623, 0x00000D97 }, /* GL_MAP1_VERTEX_3 */
+   {  4640, 0x00000D98 }, /* GL_MAP1_VERTEX_4 */
+   {  4657, 0x00000DB0 }, /* GL_MAP2_COLOR_4 */
+   {  4673, 0x00000DB1 }, /* GL_MAP2_INDEX */
+   {  4687, 0x00000DB2 }, /* GL_MAP2_NORMAL */
+   {  4702, 0x00000DB3 }, /* GL_MAP2_TEXTURE_COORD_1 */
+   {  4726, 0x00000DB4 }, /* GL_MAP2_TEXTURE_COORD_2 */
+   {  4750, 0x00000DB5 }, /* GL_MAP2_TEXTURE_COORD_3 */
+   {  4774, 0x00000DB6 }, /* GL_MAP2_TEXTURE_COORD_4 */
+   {  4798, 0x00000DB7 }, /* GL_MAP2_VERTEX_3 */
+   {  4815, 0x00000DB8 }, /* GL_MAP2_VERTEX_4 */
+   {  4832, 0x00000DD0 }, /* GL_MAP1_GRID_DOMAIN */
+   {  4852, 0x00000DD1 }, /* GL_MAP1_GRID_SEGMENTS */
+   {  4874, 0x00000DD2 }, /* GL_MAP2_GRID_DOMAIN */
+   {  4894, 0x00000DD3 }, /* GL_MAP2_GRID_SEGMENTS */
+   {  4916, 0x00000DE0 }, /* GL_TEXTURE_1D */
+   {  4930, 0x00000DE1 }, /* GL_TEXTURE_2D */
+   {  4944, 0x00000DF0 }, /* GL_FEEDBACK_BUFFER_POINTER */
+   {  4971, 0x00000DF1 }, /* GL_FEEDBACK_BUFFER_SIZE */
+   {  4995, 0x00000DF2 }, /* GL_FEEDBACK_BUFFER_TYPE */
+   {  5019, 0x00000DF3 }, /* GL_SELECTION_BUFFER_POINTER */
+   {  5047, 0x00000DF4 }, /* GL_SELECTION_BUFFER_SIZE */
+   {  5072, 0x00001000 }, /* GL_TEXTURE_WIDTH */
+   {  5089, 0x00001001 }, /* GL_TEXTURE_HEIGHT */
+   {  5107, 0x00001003 }, /* GL_TEXTURE_COMPONENTS */
+   {  5129, 0x00001004 }, /* GL_TEXTURE_BORDER_COLOR */
+   {  5153, 0x00001005 }, /* GL_TEXTURE_BORDER */
+   {  5171, 0x00001100 }, /* GL_DONT_CARE */
+   {  5184, 0x00001101 }, /* GL_FASTEST */
+   {  5195, 0x00001102 }, /* GL_NICEST */
+   {  5205, 0x00001200 }, /* GL_AMBIENT */
+   {  5216, 0x00001201 }, /* GL_DIFFUSE */
+   {  5227, 0x00001202 }, /* GL_SPECULAR */
+   {  5239, 0x00001203 }, /* GL_POSITION */
+   {  5251, 0x00001204 }, /* GL_SPOT_DIRECTION */
+   {  5269, 0x00001205 }, /* GL_SPOT_EXPONENT */
+   {  5286, 0x00001206 }, /* GL_SPOT_CUTOFF */
+   {  5301, 0x00001207 }, /* GL_CONSTANT_ATTENUATION */
+   {  5325, 0x00001208 }, /* GL_LINEAR_ATTENUATION */
+   {  5347, 0x00001209 }, /* GL_QUADRATIC_ATTENUATION */
+   {  5372, 0x00001300 }, /* GL_COMPILE */
+   {  5383, 0x00001301 }, /* GL_COMPILE_AND_EXECUTE */
+   {  5406, 0x00001400 }, /* GL_BYTE */
+   {  5414, 0x00001401 }, /* GL_UNSIGNED_BYTE */
+   {  5431, 0x00001402 }, /* GL_SHORT */
+   {  5440, 0x00001403 }, /* GL_UNSIGNED_SHORT */
+   {  5458, 0x00001404 }, /* GL_INT */
+   {  5465, 0x00001405 }, /* GL_UNSIGNED_INT */
+   {  5481, 0x00001406 }, /* GL_FLOAT */
+   {  5490, 0x00001407 }, /* GL_2_BYTES */
+   {  5501, 0x00001408 }, /* GL_3_BYTES */
+   {  5512, 0x00001409 }, /* GL_4_BYTES */
+   {  5523, 0x0000140A }, /* GL_DOUBLE */
+   {  5533, 0x0000140B }, /* GL_HALF_FLOAT */
+   {  5547, 0x0000140C }, /* GL_FIXED */
+   {  5556, 0x00001500 }, /* GL_CLEAR */
+   {  5565, 0x00001501 }, /* GL_AND */
+   {  5572, 0x00001502 }, /* GL_AND_REVERSE */
+   {  5587, 0x00001503 }, /* GL_COPY */
+   {  5595, 0x00001504 }, /* GL_AND_INVERTED */
+   {  5611, 0x00001505 }, /* GL_NOOP */
+   {  5619, 0x00001506 }, /* GL_XOR */
+   {  5626, 0x00001507 }, /* GL_OR */
+   {  5632, 0x00001508 }, /* GL_NOR */
+   {  5639, 0x00001509 }, /* GL_EQUIV */
+   {  5648, 0x0000150A }, /* GL_INVERT */
+   {  5658, 0x0000150B }, /* GL_OR_REVERSE */
+   {  5672, 0x0000150C }, /* GL_COPY_INVERTED */
+   {  5689, 0x0000150D }, /* GL_OR_INVERTED */
+   {  5704, 0x0000150E }, /* GL_NAND */
+   {  5712, 0x0000150F }, /* GL_SET */
+   {  5719, 0x00001600 }, /* GL_EMISSION */
+   {  5731, 0x00001601 }, /* GL_SHININESS */
+   {  5744, 0x00001602 }, /* GL_AMBIENT_AND_DIFFUSE */
+   {  5767, 0x00001603 }, /* GL_COLOR_INDEXES */
+   {  5784, 0x00001700 }, /* GL_MODELVIEW */
+   {  5797, 0x00001701 }, /* GL_PROJECTION */
+   {  5811, 0x00001702 }, /* GL_TEXTURE */
+   {  5822, 0x00001800 }, /* GL_COLOR */
+   {  5831, 0x00001801 }, /* GL_DEPTH */
+   {  5840, 0x00001802 }, /* GL_STENCIL */
+   {  5851, 0x00001900 }, /* GL_COLOR_INDEX */
+   {  5866, 0x00001901 }, /* GL_STENCIL_INDEX */
+   {  5883, 0x00001902 }, /* GL_DEPTH_COMPONENT */
+   {  5902, 0x00001903 }, /* GL_RED */
+   {  5909, 0x00001904 }, /* GL_GREEN */
+   {  5918, 0x00001905 }, /* GL_BLUE */
+   {  5926, 0x00001906 }, /* GL_ALPHA */
+   {  5935, 0x00001907 }, /* GL_RGB */
+   {  5942, 0x00001908 }, /* GL_RGBA */
+   {  5950, 0x00001909 }, /* GL_LUMINANCE */
+   {  5963, 0x0000190A }, /* GL_LUMINANCE_ALPHA */
+   {  5982, 0x00001A00 }, /* GL_BITMAP */
+   {  5992, 0x00001B00 }, /* GL_POINT */
+   {  6001, 0x00001B01 }, /* GL_LINE */
+   {  6009, 0x00001B02 }, /* GL_FILL */
+   {  6017, 0x00001C00 }, /* GL_RENDER */
+   {  6027, 0x00001C01 }, /* GL_FEEDBACK */
+   {  6039, 0x00001C02 }, /* GL_SELECT */
+   {  6049, 0x00001D00 }, /* GL_FLAT */
+   {  6057, 0x00001D01 }, /* GL_SMOOTH */
+   {  6067, 0x00001E00 }, /* GL_KEEP */
+   {  6075, 0x00001E01 }, /* GL_REPLACE */
+   {  6086, 0x00001E02 }, /* GL_INCR */
+   {  6094, 0x00001E03 }, /* GL_DECR */
+   {  6102, 0x00001F00 }, /* GL_VENDOR */
+   {  6112, 0x00001F01 }, /* GL_RENDERER */
+   {  6124, 0x00001F02 }, /* GL_VERSION */
+   {  6135, 0x00001F03 }, /* GL_EXTENSIONS */
+   {  6149, 0x00002000 }, /* GL_S */
+   {  6154, 0x00002001 }, /* GL_T */
+   {  6159, 0x00002002 }, /* GL_R */
+   {  6164, 0x00002003 }, /* GL_Q */
+   {  6169, 0x00002100 }, /* GL_MODULATE */
+   {  6181, 0x00002101 }, /* GL_DECAL */
+   {  6190, 0x00002200 }, /* GL_TEXTURE_ENV_MODE */
+   {  6210, 0x00002201 }, /* GL_TEXTURE_ENV_COLOR */
+   {  6231, 0x00002300 }, /* GL_TEXTURE_ENV */
+   {  6246, 0x00002400 }, /* GL_EYE_LINEAR */
+   {  6260, 0x00002401 }, /* GL_OBJECT_LINEAR */
+   {  6277, 0x00002402 }, /* GL_SPHERE_MAP */
+   {  6291, 0x00002500 }, /* GL_TEXTURE_GEN_MODE */
+   {  6311, 0x00002501 }, /* GL_OBJECT_PLANE */
+   {  6327, 0x00002502 }, /* GL_EYE_PLANE */
+   {  6340, 0x00002600 }, /* GL_NEAREST */
+   {  6351, 0x00002601 }, /* GL_LINEAR */
+   {  6361, 0x00002700 }, /* GL_NEAREST_MIPMAP_NEAREST */
+   {  6387, 0x00002701 }, /* GL_LINEAR_MIPMAP_NEAREST */
+   {  6412, 0x00002702 }, /* GL_NEAREST_MIPMAP_LINEAR */
+   {  6437, 0x00002703 }, /* GL_LINEAR_MIPMAP_LINEAR */
+   {  6461, 0x00002800 }, /* GL_TEXTURE_MAG_FILTER */
+   {  6483, 0x00002801 }, /* GL_TEXTURE_MIN_FILTER */
+   {  6505, 0x00002802 }, /* GL_TEXTURE_WRAP_S */
+   {  6523, 0x00002803 }, /* GL_TEXTURE_WRAP_T */
+   {  6541, 0x00002900 }, /* GL_CLAMP */
+   {  6550, 0x00002901 }, /* GL_REPEAT */
+   {  6560, 0x00002A00 }, /* GL_POLYGON_OFFSET_UNITS */
+   {  6584, 0x00002A01 }, /* GL_POLYGON_OFFSET_POINT */
+   {  6608, 0x00002A02 }, /* GL_POLYGON_OFFSET_LINE */
+   {  6631, 0x00002A10 }, /* GL_R3_G3_B2 */
+   {  6643, 0x00002A20 }, /* GL_V2F */
+   {  6650, 0x00002A21 }, /* GL_V3F */
+   {  6657, 0x00002A22 }, /* GL_C4UB_V2F */
+   {  6669, 0x00002A23 }, /* GL_C4UB_V3F */
+   {  6681, 0x00002A24 }, /* GL_C3F_V3F */
+   {  6692, 0x00002A25 }, /* GL_N3F_V3F */
+   {  6703, 0x00002A26 }, /* GL_C4F_N3F_V3F */
+   {  6718, 0x00002A27 }, /* GL_T2F_V3F */
+   {  6729, 0x00002A28 }, /* GL_T4F_V4F */
+   {  6740, 0x00002A29 }, /* GL_T2F_C4UB_V3F */
+   {  6756, 0x00002A2A }, /* GL_T2F_C3F_V3F */
+   {  6771, 0x00002A2B }, /* GL_T2F_N3F_V3F */
+   {  6786, 0x00002A2C }, /* GL_T2F_C4F_N3F_V3F */
+   {  6805, 0x00002A2D }, /* GL_T4F_C4F_N3F_V4F */
+   {  6824, 0x00003000 }, /* GL_CLIP_DISTANCE0 */
+   {  6842, 0x00003001 }, /* GL_CLIP_DISTANCE1 */
+   {  6860, 0x00003002 }, /* GL_CLIP_DISTANCE2 */
+   {  6878, 0x00003003 }, /* GL_CLIP_DISTANCE3 */
+   {  6896, 0x00003004 }, /* GL_CLIP_DISTANCE4 */
+   {  6914, 0x00003005 }, /* GL_CLIP_DISTANCE5 */
+   {  6932, 0x00003006 }, /* GL_CLIP_DISTANCE6 */
+   {  6950, 0x00003007 }, /* GL_CLIP_DISTANCE7 */
+   {  6968, 0x00004000 }, /* GL_LIGHT0 */
+   {  6978, 0x00004001 }, /* GL_LIGHT1 */
+   {  6988, 0x00004002 }, /* GL_LIGHT2 */
+   {  6998, 0x00004003 }, /* GL_LIGHT3 */
+   {  7008, 0x00004004 }, /* GL_LIGHT4 */
+   {  7018, 0x00004005 }, /* GL_LIGHT5 */
+   {  7028, 0x00004006 }, /* GL_LIGHT6 */
+   {  7038, 0x00004007 }, /* GL_LIGHT7 */
+   {  7048, 0x00008000 }, /* GL_HINT_BIT */
+   {  7060, 0x00008001 }, /* GL_CONSTANT_COLOR */
+   {  7078, 0x00008002 }, /* GL_ONE_MINUS_CONSTANT_COLOR */
+   {  7106, 0x00008003 }, /* GL_CONSTANT_ALPHA */
+   {  7124, 0x00008004 }, /* GL_ONE_MINUS_CONSTANT_ALPHA */
+   {  7152, 0x00008005 }, /* GL_BLEND_COLOR */
+   {  7167, 0x00008006 }, /* GL_FUNC_ADD */
+   {  7179, 0x00008007 }, /* GL_MIN */
+   {  7186, 0x00008008 }, /* GL_MAX */
+   {  7193, 0x00008009 }, /* GL_BLEND_EQUATION */
+   {  7211, 0x0000800A }, /* GL_FUNC_SUBTRACT */
+   {  7228, 0x0000800B }, /* GL_FUNC_REVERSE_SUBTRACT */
+   {  7253, 0x00008010 }, /* GL_CONVOLUTION_1D */
+   {  7271, 0x00008011 }, /* GL_CONVOLUTION_2D */
+   {  7289, 0x00008012 }, /* GL_SEPARABLE_2D */
+   {  7305, 0x00008013 }, /* GL_CONVOLUTION_BORDER_MODE */
+   {  7332, 0x00008014 }, /* GL_CONVOLUTION_FILTER_SCALE */
+   {  7360, 0x00008015 }, /* GL_CONVOLUTION_FILTER_BIAS */
+   {  7387, 0x00008016 }, /* GL_REDUCE */
+   {  7397, 0x00008017 }, /* GL_CONVOLUTION_FORMAT */
+   {  7419, 0x00008018 }, /* GL_CONVOLUTION_WIDTH */
+   {  7440, 0x00008019 }, /* GL_CONVOLUTION_HEIGHT */
+   {  7462, 0x0000801A }, /* GL_MAX_CONVOLUTION_WIDTH */
+   {  7487, 0x0000801B }, /* GL_MAX_CONVOLUTION_HEIGHT */
+   {  7513, 0x0000801C }, /* GL_POST_CONVOLUTION_RED_SCALE */
+   {  7543, 0x0000801D }, /* GL_POST_CONVOLUTION_GREEN_SCALE */
+   {  7575, 0x0000801E }, /* GL_POST_CONVOLUTION_BLUE_SCALE */
+   {  7606, 0x0000801F }, /* GL_POST_CONVOLUTION_ALPHA_SCALE */
+   {  7638, 0x00008020 }, /* GL_POST_CONVOLUTION_RED_BIAS */
+   {  7667, 0x00008021 }, /* GL_POST_CONVOLUTION_GREEN_BIAS */
+   {  7698, 0x00008022 }, /* GL_POST_CONVOLUTION_BLUE_BIAS */
+   {  7728, 0x00008023 }, /* GL_POST_CONVOLUTION_ALPHA_BIAS */
+   {  7759, 0x00008024 }, /* GL_HISTOGRAM */
+   {  7772, 0x00008025 }, /* GL_PROXY_HISTOGRAM */
+   {  7791, 0x00008026 }, /* GL_HISTOGRAM_WIDTH */
+   {  7810, 0x00008027 }, /* GL_HISTOGRAM_FORMAT */
+   {  7830, 0x00008028 }, /* GL_HISTOGRAM_RED_SIZE */
+   {  7852, 0x00008029 }, /* GL_HISTOGRAM_GREEN_SIZE */
+   {  7876, 0x0000802A }, /* GL_HISTOGRAM_BLUE_SIZE */
+   {  7899, 0x0000802B }, /* GL_HISTOGRAM_ALPHA_SIZE */
+   {  7923, 0x0000802C }, /* GL_HISTOGRAM_LUMINANCE_SIZE */
+   {  7951, 0x0000802D }, /* GL_HISTOGRAM_SINK */
+   {  7969, 0x0000802E }, /* GL_MINMAX */
+   {  7979, 0x0000802F }, /* GL_MINMAX_FORMAT */
+   {  7996, 0x00008030 }, /* GL_MINMAX_SINK */
+   {  8011, 0x00008031 }, /* GL_TABLE_TOO_LARGE_EXT */
+   {  8034, 0x00008032 }, /* GL_UNSIGNED_BYTE_3_3_2 */
+   {  8057, 0x00008033 }, /* GL_UNSIGNED_SHORT_4_4_4_4 */
+   {  8083, 0x00008034 }, /* GL_UNSIGNED_SHORT_5_5_5_1 */
+   {  8109, 0x00008035 }, /* GL_UNSIGNED_INT_8_8_8_8 */
+   {  8133, 0x00008036 }, /* GL_UNSIGNED_INT_10_10_10_2 */
+   {  8160, 0x00008037 }, /* GL_POLYGON_OFFSET_FILL */
+   {  8183, 0x00008038 }, /* GL_POLYGON_OFFSET_FACTOR */
+   {  8208, 0x00008039 }, /* GL_POLYGON_OFFSET_BIAS_EXT */
+   {  8235, 0x0000803A }, /* GL_RESCALE_NORMAL */
+   {  8253, 0x0000803B }, /* GL_ALPHA4 */
+   {  8263, 0x0000803C }, /* GL_ALPHA8 */
+   {  8273, 0x0000803D }, /* GL_ALPHA12 */
+   {  8284, 0x0000803E }, /* GL_ALPHA16 */
+   {  8295, 0x0000803F }, /* GL_LUMINANCE4 */
+   {  8309, 0x00008040 }, /* GL_LUMINANCE8 */
+   {  8323, 0x00008041 }, /* GL_LUMINANCE12 */
+   {  8338, 0x00008042 }, /* GL_LUMINANCE16 */
+   {  8353, 0x00008043 }, /* GL_LUMINANCE4_ALPHA4 */
+   {  8374, 0x00008044 }, /* GL_LUMINANCE6_ALPHA2 */
+   {  8395, 0x00008045 }, /* GL_LUMINANCE8_ALPHA8 */
+   {  8416, 0x00008046 }, /* GL_LUMINANCE12_ALPHA4 */
+   {  8438, 0x00008047 }, /* GL_LUMINANCE12_ALPHA12 */
+   {  8461, 0x00008048 }, /* GL_LUMINANCE16_ALPHA16 */
+   {  8484, 0x00008049 }, /* GL_INTENSITY */
+   {  8497, 0x0000804A }, /* GL_INTENSITY4 */
+   {  8511, 0x0000804B }, /* GL_INTENSITY8 */
+   {  8525, 0x0000804C }, /* GL_INTENSITY12 */
+   {  8540, 0x0000804D }, /* GL_INTENSITY16 */
+   {  8555, 0x0000804E }, /* GL_RGB2_EXT */
+   {  8567, 0x0000804F }, /* GL_RGB4 */
+   {  8575, 0x00008050 }, /* GL_RGB5 */
+   {  8583, 0x00008051 }, /* GL_RGB8 */
+   {  8591, 0x00008052 }, /* GL_RGB10 */
+   {  8600, 0x00008053 }, /* GL_RGB12 */
+   {  8609, 0x00008054 }, /* GL_RGB16 */
+   {  8618, 0x00008055 }, /* GL_RGBA2 */
+   {  8627, 0x00008056 }, /* GL_RGBA4 */
+   {  8636, 0x00008057 }, /* GL_RGB5_A1 */
+   {  8647, 0x00008058 }, /* GL_RGBA8 */
+   {  8656, 0x00008059 }, /* GL_RGB10_A2 */
+   {  8668, 0x0000805A }, /* GL_RGBA12 */
+   {  8678, 0x0000805B }, /* GL_RGBA16 */
+   {  8688, 0x0000805C }, /* GL_TEXTURE_RED_SIZE */
+   {  8708, 0x0000805D }, /* GL_TEXTURE_GREEN_SIZE */
+   {  8730, 0x0000805E }, /* GL_TEXTURE_BLUE_SIZE */
+   {  8751, 0x0000805F }, /* GL_TEXTURE_ALPHA_SIZE */
+   {  8773, 0x00008060 }, /* GL_TEXTURE_LUMINANCE_SIZE */
+   {  8799, 0x00008061 }, /* GL_TEXTURE_INTENSITY_SIZE */
+   {  8825, 0x00008062 }, /* GL_REPLACE_EXT */
+   {  8840, 0x00008063 }, /* GL_PROXY_TEXTURE_1D */
+   {  8860, 0x00008064 }, /* GL_PROXY_TEXTURE_2D */
+   {  8880, 0x00008065 }, /* GL_TEXTURE_TOO_LARGE_EXT */
+   {  8905, 0x00008066 }, /* GL_TEXTURE_PRIORITY */
+   {  8925, 0x00008067 }, /* GL_TEXTURE_RESIDENT */
+   {  8945, 0x00008068 }, /* GL_TEXTURE_BINDING_1D */
+   {  8967, 0x00008069 }, /* GL_TEXTURE_BINDING_2D */
+   {  8989, 0x0000806A }, /* GL_TEXTURE_BINDING_3D */
+   {  9011, 0x0000806B }, /* GL_PACK_SKIP_IMAGES */
+   {  9031, 0x0000806C }, /* GL_PACK_IMAGE_HEIGHT */
+   {  9052, 0x0000806D }, /* GL_UNPACK_SKIP_IMAGES */
+   {  9074, 0x0000806E }, /* GL_UNPACK_IMAGE_HEIGHT */
+   {  9097, 0x0000806F }, /* GL_TEXTURE_3D */
+   {  9111, 0x00008070 }, /* GL_PROXY_TEXTURE_3D */
+   {  9131, 0x00008071 }, /* GL_TEXTURE_DEPTH */
+   {  9148, 0x00008072 }, /* GL_TEXTURE_WRAP_R */
+   {  9166, 0x00008073 }, /* GL_MAX_3D_TEXTURE_SIZE */
+   {  9189, 0x00008074 }, /* GL_VERTEX_ARRAY */
+   {  9205, 0x00008075 }, /* GL_NORMAL_ARRAY */
+   {  9221, 0x00008076 }, /* GL_COLOR_ARRAY */
+   {  9236, 0x00008077 }, /* GL_INDEX_ARRAY */
+   {  9251, 0x00008078 }, /* GL_TEXTURE_COORD_ARRAY */
+   {  9274, 0x00008079 }, /* GL_EDGE_FLAG_ARRAY */
+   {  9293, 0x0000807A }, /* GL_VERTEX_ARRAY_SIZE */
+   {  9314, 0x0000807B }, /* GL_VERTEX_ARRAY_TYPE */
+   {  9335, 0x0000807C }, /* GL_VERTEX_ARRAY_STRIDE */
+   {  9358, 0x0000807D }, /* GL_VERTEX_ARRAY_COUNT_EXT */
+   {  9384, 0x0000807E }, /* GL_NORMAL_ARRAY_TYPE */
+   {  9405, 0x0000807F }, /* GL_NORMAL_ARRAY_STRIDE */
+   {  9428, 0x00008080 }, /* GL_NORMAL_ARRAY_COUNT_EXT */
+   {  9454, 0x00008081 }, /* GL_COLOR_ARRAY_SIZE */
+   {  9474, 0x00008082 }, /* GL_COLOR_ARRAY_TYPE */
+   {  9494, 0x00008083 }, /* GL_COLOR_ARRAY_STRIDE */
+   {  9516, 0x00008084 }, /* GL_COLOR_ARRAY_COUNT_EXT */
+   {  9541, 0x00008085 }, /* GL_INDEX_ARRAY_TYPE */
+   {  9561, 0x00008086 }, /* GL_INDEX_ARRAY_STRIDE */
+   {  9583, 0x00008087 }, /* GL_INDEX_ARRAY_COUNT_EXT */
+   {  9608, 0x00008088 }, /* GL_TEXTURE_COORD_ARRAY_SIZE */
+   {  9636, 0x00008089 }, /* GL_TEXTURE_COORD_ARRAY_TYPE */
+   {  9664, 0x0000808A }, /* GL_TEXTURE_COORD_ARRAY_STRIDE */
+   {  9694, 0x0000808B }, /* GL_TEXTURE_COORD_ARRAY_COUNT_EXT */
+   {  9727, 0x0000808C }, /* GL_EDGE_FLAG_ARRAY_STRIDE */
+   {  9753, 0x0000808D }, /* GL_EDGE_FLAG_ARRAY_COUNT_EXT */
+   {  9782, 0x0000808E }, /* GL_VERTEX_ARRAY_POINTER */
+   {  9806, 0x0000808F }, /* GL_NORMAL_ARRAY_POINTER */
+   {  9830, 0x00008090 }, /* GL_COLOR_ARRAY_POINTER */
+   {  9853, 0x00008091 }, /* GL_INDEX_ARRAY_POINTER */
+   {  9876, 0x00008092 }, /* GL_TEXTURE_COORD_ARRAY_POINTER */
+   {  9907, 0x00008093 }, /* GL_EDGE_FLAG_ARRAY_POINTER */
+   {  9934, 0x0000809D }, /* GL_MULTISAMPLE */
+   {  9949, 0x0000809E }, /* GL_SAMPLE_ALPHA_TO_COVERAGE */
+   {  9977, 0x0000809F }, /* GL_SAMPLE_ALPHA_TO_ONE */
+   { 10000, 0x000080A0 }, /* GL_SAMPLE_COVERAGE */
+   { 10019, 0x000080A8 }, /* GL_SAMPLE_BUFFERS */
+   { 10037, 0x000080A9 }, /* GL_SAMPLES */
+   { 10048, 0x000080AA }, /* GL_SAMPLE_COVERAGE_VALUE */
+   { 10073, 0x000080AB }, /* GL_SAMPLE_COVERAGE_INVERT */
+   { 10099, 0x000080B1 }, /* GL_COLOR_MATRIX */
+   { 10115, 0x000080B2 }, /* GL_COLOR_MATRIX_STACK_DEPTH */
+   { 10143, 0x000080B3 }, /* GL_MAX_COLOR_MATRIX_STACK_DEPTH */
+   { 10175, 0x000080B4 }, /* GL_POST_COLOR_MATRIX_RED_SCALE */
+   { 10206, 0x000080B5 }, /* GL_POST_COLOR_MATRIX_GREEN_SCALE */
+   { 10239, 0x000080B6 }, /* GL_POST_COLOR_MATRIX_BLUE_SCALE */
+   { 10271, 0x000080B7 }, /* GL_POST_COLOR_MATRIX_ALPHA_SCALE */
+   { 10304, 0x000080B8 }, /* GL_POST_COLOR_MATRIX_RED_BIAS */
+   { 10334, 0x000080B9 }, /* GL_POST_COLOR_MATRIX_GREEN_BIAS */
+   { 10366, 0x000080BA }, /* GL_POST_COLOR_MATRIX_BLUE_BIAS */
+   { 10397, 0x000080BB }, /* GL_POST_COLOR_MATRIX_ALPHA_BIAS */
+   { 10429, 0x000080BC }, /* GL_TEXTURE_COLOR_TABLE_SGI */
+   { 10456, 0x000080BD }, /* GL_PROXY_TEXTURE_COLOR_TABLE_SGI */
+   { 10489, 0x000080BF }, /* GL_TEXTURE_COMPARE_FAIL_VALUE_ARB */
+   { 10523, 0x000080C8 }, /* GL_BLEND_DST_RGB */
+   { 10540, 0x000080C9 }, /* GL_BLEND_SRC_RGB */
+   { 10557, 0x000080CA }, /* GL_BLEND_DST_ALPHA */
+   { 10576, 0x000080CB }, /* GL_BLEND_SRC_ALPHA */
+   { 10595, 0x000080D0 }, /* GL_COLOR_TABLE */
+   { 10610, 0x000080D1 }, /* GL_POST_CONVOLUTION_COLOR_TABLE */
+   { 10642, 0x000080D2 }, /* GL_POST_COLOR_MATRIX_COLOR_TABLE */
+   { 10675, 0x000080D3 }, /* GL_PROXY_COLOR_TABLE */
+   { 10696, 0x000080D4 }, /* GL_PROXY_POST_CONVOLUTION_COLOR_TABLE */
+   { 10734, 0x000080D5 }, /* GL_PROXY_POST_COLOR_MATRIX_COLOR_TABLE */
+   { 10773, 0x000080D6 }, /* GL_COLOR_TABLE_SCALE */
+   { 10794, 0x000080D7 }, /* GL_COLOR_TABLE_BIAS */
+   { 10814, 0x000080D8 }, /* GL_COLOR_TABLE_FORMAT */
+   { 10836, 0x000080D9 }, /* GL_COLOR_TABLE_WIDTH */
+   { 10857, 0x000080DA }, /* GL_COLOR_TABLE_RED_SIZE */
+   { 10881, 0x000080DB }, /* GL_COLOR_TABLE_GREEN_SIZE */
+   { 10907, 0x000080DC }, /* GL_COLOR_TABLE_BLUE_SIZE */
+   { 10932, 0x000080DD }, /* GL_COLOR_TABLE_ALPHA_SIZE */
+   { 10958, 0x000080DE }, /* GL_COLOR_TABLE_LUMINANCE_SIZE */
+   { 10988, 0x000080DF }, /* GL_COLOR_TABLE_INTENSITY_SIZE */
+   { 11018, 0x000080E0 }, /* GL_BGR */
+   { 11025, 0x000080E1 }, /* GL_BGRA */
+   { 11033, 0x000080E8 }, /* GL_MAX_ELEMENTS_VERTICES */
+   { 11058, 0x000080E9 }, /* GL_MAX_ELEMENTS_INDICES */
+   { 11082, 0x000080ED }, /* GL_TEXTURE_INDEX_SIZE_EXT */
+   { 11108, 0x000080F0 }, /* GL_CLIP_VOLUME_CLIPPING_HINT_EXT */
+   { 11141, 0x00008126 }, /* GL_POINT_SIZE_MIN */
+   { 11159, 0x00008127 }, /* GL_POINT_SIZE_MAX */
+   { 11177, 0x00008128 }, /* GL_POINT_FADE_THRESHOLD_SIZE */
+   { 11206, 0x00008129 }, /* GL_POINT_DISTANCE_ATTENUATION */
+   { 11236, 0x0000812D }, /* GL_CLAMP_TO_BORDER */
+   { 11255, 0x0000812F }, /* GL_CLAMP_TO_EDGE */
+   { 11272, 0x0000813A }, /* GL_TEXTURE_MIN_LOD */
+   { 11291, 0x0000813B }, /* GL_TEXTURE_MAX_LOD */
+   { 11310, 0x0000813C }, /* GL_TEXTURE_BASE_LEVEL */
+   { 11332, 0x0000813D }, /* GL_TEXTURE_MAX_LEVEL */
+   { 11353, 0x00008150 }, /* GL_IGNORE_BORDER_HP */
+   { 11373, 0x00008151 }, /* GL_CONSTANT_BORDER_HP */
+   { 11395, 0x00008153 }, /* GL_REPLICATE_BORDER_HP */
+   { 11418, 0x00008154 }, /* GL_CONVOLUTION_BORDER_COLOR */
+   { 11446, 0x00008165 }, /* GL_OCCLUSION_TEST_HP */
+   { 11467, 0x00008166 }, /* GL_OCCLUSION_TEST_RESULT_HP */
+   { 11495, 0x00008170 }, /* GL_LINEAR_CLIPMAP_LINEAR_SGIX */
+   { 11525, 0x00008171 }, /* GL_TEXTURE_CLIPMAP_CENTER_SGIX */
+   { 11556, 0x00008172 }, /* GL_TEXTURE_CLIPMAP_FRAME_SGIX */
+   { 11586, 0x00008173 }, /* GL_TEXTURE_CLIPMAP_OFFSET_SGIX */
+   { 11617, 0x00008174 }, /* GL_TEXTURE_CLIPMAP_VIRTUAL_DEPTH_SGIX */
+   { 11655, 0x00008175 }, /* GL_TEXTURE_CLIPMAP_LOD_OFFSET_SGIX */
+   { 11690, 0x00008176 }, /* GL_TEXTURE_CLIPMAP_DEPTH_SGIX */
+   { 11720, 0x00008177 }, /* GL_MAX_CLIPMAP_DEPTH_SGIX */
+   { 11746, 0x00008178 }, /* GL_MAX_CLIPMAP_VIRTUAL_DEPTH_SGIX */
+   { 11780, 0x00008179 }, /* GL_POST_TEXTURE_FILTER_BIAS_SGIX */
+   { 11813, 0x0000817A }, /* GL_POST_TEXTURE_FILTER_SCALE_SGIX */
+   { 11847, 0x0000817B }, /* GL_POST_TEXTURE_FILTER_BIAS_RANGE_SGIX */
+   { 11886, 0x0000817C }, /* GL_POST_TEXTURE_FILTER_SCALE_RANGE_SGIX */
+   { 11926, 0x0000818E }, /* GL_TEXTURE_LOD_BIAS_S_SGIX */
+   { 11953, 0x0000818F }, /* GL_TEXTURE_LOD_BIAS_T_SGIX */
+   { 11980, 0x00008190 }, /* GL_TEXTURE_LOD_BIAS_R_SGIX */
+   { 12007, 0x00008191 }, /* GL_GENERATE_MIPMAP */
+   { 12026, 0x00008192 }, /* GL_GENERATE_MIPMAP_HINT */
+   { 12050, 0x00008198 }, /* GL_FOG_OFFSET_SGIX */
+   { 12069, 0x00008199 }, /* GL_FOG_OFFSET_VALUE_SGIX */
+   { 12094, 0x0000819A }, /* GL_TEXTURE_COMPARE_SGIX */
+   { 12118, 0x0000819B }, /* GL_TEXTURE_COMPARE_OPERATOR_SGIX */
+   { 12151, 0x0000819C }, /* GL_TEXTURE_LEQUAL_R_SGIX */
+   { 12176, 0x0000819D }, /* GL_TEXTURE_GEQUAL_R_SGIX */
+   { 12201, 0x000081A5 }, /* GL_DEPTH_COMPONENT16 */
+   { 12222, 0x000081A6 }, /* GL_DEPTH_COMPONENT24 */
+   { 12243, 0x000081A7 }, /* GL_DEPTH_COMPONENT32 */
+   { 12264, 0x000081A8 }, /* GL_ARRAY_ELEMENT_LOCK_FIRST_EXT */
+   { 12296, 0x000081A9 }, /* GL_ARRAY_ELEMENT_LOCK_COUNT_EXT */
+   { 12328, 0x000081AA }, /* GL_CULL_VERTEX_EXT */
+   { 12347, 0x000081AB }, /* GL_CULL_VERTEX_OBJECT_POSITION_EXT */
+   { 12382, 0x000081AC }, /* GL_CULL_VERTEX_EYE_POSITION_EXT */
+   { 12414, 0x000081D4 }, /* GL_WRAP_BORDER_SUN */
+   { 12433, 0x000081EF }, /* GL_TEXTURE_COLOR_WRITEMASK_SGIS */
+   { 12465, 0x000081F8 }, /* GL_LIGHT_MODEL_COLOR_CONTROL */
+   { 12494, 0x000081F9 }, /* GL_SINGLE_COLOR */
+   { 12510, 0x000081FA }, /* GL_SEPARATE_SPECULAR_COLOR */
+   { 12537, 0x000081FB }, /* GL_SHARED_TEXTURE_PALETTE_EXT */
+   { 12567, 0x00008210 }, /* GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING */
+   { 12608, 0x00008211 }, /* GL_FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE */
+   { 12649, 0x00008212 }, /* GL_FRAMEBUFFER_ATTACHMENT_RED_SIZE */
+   { 12684, 0x00008213 }, /* GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE */
+   { 12721, 0x00008214 }, /* GL_FRAMEBUFFER_ATTACHMENT_BLUE_SIZE */
+   { 12757, 0x00008215 }, /* GL_FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE */
+   { 12794, 0x00008216 }, /* GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE */
+   { 12831, 0x00008217 }, /* GL_FRAMEBUFFER_ATTACHMENT_STENCIL_SIZE */
+   { 12870, 0x00008218 }, /* GL_FRAMEBUFFER_DEFAULT */
+   { 12893, 0x00008219 }, /* GL_FRAMEBUFFER_UNDEFINED */
+   { 12918, 0x0000821A }, /* GL_DEPTH_STENCIL_ATTACHMENT */
+   { 12946, 0x0000821B }, /* GL_MAJOR_VERSION */
+   { 12963, 0x0000821C }, /* GL_MINOR_VERSION */
+   { 12980, 0x0000821D }, /* GL_NUM_EXTENSIONS */
+   { 12998, 0x0000821E }, /* GL_CONTEXT_FLAGS */
+   { 13015, 0x0000821F }, /* GL_BUFFER_IMMUTABLE_STORAGE */
+   { 13043, 0x00008220 }, /* GL_BUFFER_STORAGE_FLAGS */
+   { 13067, 0x00008222 }, /* GL_INDEX */
+   { 13076, 0x00008223 }, /* GL_DEPTH_BUFFER */
+   { 13092, 0x00008224 }, /* GL_STENCIL_BUFFER */
+   { 13110, 0x00008225 }, /* GL_COMPRESSED_RED */
+   { 13128, 0x00008226 }, /* GL_COMPRESSED_RG */
+   { 13145, 0x00008227 }, /* GL_RG */
+   { 13151, 0x00008228 }, /* GL_RG_INTEGER */
+   { 13165, 0x00008229 }, /* GL_R8 */
+   { 13171, 0x0000822A }, /* GL_R16 */
+   { 13178, 0x0000822B }, /* GL_RG8 */
+   { 13185, 0x0000822C }, /* GL_RG16 */
+   { 13193, 0x0000822D }, /* GL_R16F */
+   { 13201, 0x0000822E }, /* GL_R32F */
+   { 13209, 0x0000822F }, /* GL_RG16F */
+   { 13218, 0x00008230 }, /* GL_RG32F */
+   { 13227, 0x00008231 }, /* GL_R8I */
+   { 13234, 0x00008232 }, /* GL_R8UI */
+   { 13242, 0x00008233 }, /* GL_R16I */
+   { 13250, 0x00008234 }, /* GL_R16UI */
+   { 13259, 0x00008235 }, /* GL_R32I */
+   { 13267, 0x00008236 }, /* GL_R32UI */
+   { 13276, 0x00008237 }, /* GL_RG8I */
+   { 13284, 0x00008238 }, /* GL_RG8UI */
+   { 13293, 0x00008239 }, /* GL_RG16I */
+   { 13302, 0x0000823A }, /* GL_RG16UI */
+   { 13312, 0x0000823B }, /* GL_RG32I */
+   { 13321, 0x0000823C }, /* GL_RG32UI */
+   { 13331, 0x00008242 }, /* GL_DEBUG_OUTPUT_SYNCHRONOUS_ARB */
+   { 13363, 0x00008243 }, /* GL_DEBUG_NEXT_LOGGED_MESSAGE_LENGTH_ARB */
+   { 13403, 0x00008244 }, /* GL_DEBUG_CALLBACK_FUNCTION_ARB */
+   { 13434, 0x00008245 }, /* GL_DEBUG_CALLBACK_USER_PARAM_ARB */
+   { 13467, 0x00008246 }, /* GL_DEBUG_SOURCE_API_ARB */
+   { 13491, 0x00008247 }, /* GL_DEBUG_SOURCE_WINDOW_SYSTEM_ARB */
+   { 13525, 0x00008248 }, /* GL_DEBUG_SOURCE_SHADER_COMPILER_ARB */
+   { 13561, 0x00008249 }, /* GL_DEBUG_SOURCE_THIRD_PARTY_ARB */
+   { 13593, 0x0000824A }, /* GL_DEBUG_SOURCE_APPLICATION_ARB */
+   { 13625, 0x0000824B }, /* GL_DEBUG_SOURCE_OTHER_ARB */
+   { 13651, 0x0000824C }, /* GL_DEBUG_TYPE_ERROR_ARB */
+   { 13675, 0x0000824D }, /* GL_DEBUG_TYPE_DEPRECATED_BEHAVIOR_ARB */
+   { 13713, 0x0000824E }, /* GL_DEBUG_TYPE_UNDEFINED_BEHAVIOR_ARB */
+   { 13750, 0x0000824F }, /* GL_DEBUG_TYPE_PORTABILITY_ARB */
+   { 13780, 0x00008250 }, /* GL_DEBUG_TYPE_PERFORMANCE_ARB */
+   { 13810, 0x00008251 }, /* GL_DEBUG_TYPE_OTHER_ARB */
+   { 13834, 0x00008252 }, /* GL_LOSE_CONTEXT_ON_RESET_ARB */
+   { 13863, 0x00008253 }, /* GL_GUILTY_CONTEXT_RESET_ARB */
+   { 13891, 0x00008254 }, /* GL_INNOCENT_CONTEXT_RESET_ARB */
+   { 13921, 0x00008255 }, /* GL_UNKNOWN_CONTEXT_RESET_ARB */
+   { 13950, 0x00008256 }, /* GL_RESET_NOTIFICATION_STRATEGY_ARB */
+   { 13985, 0x00008257 }, /* GL_PROGRAM_BINARY_RETRIEVABLE_HINT */
+   { 14020, 0x00008258 }, /* GL_PROGRAM_SEPARABLE_EXT */
+   { 14045, 0x00008259 }, /* GL_ACTIVE_PROGRAM_EXT */
+   { 14067, 0x0000825A }, /* GL_PROGRAM_PIPELINE_BINDING_EXT */
+   { 14099, 0x0000825B }, /* GL_MAX_VIEWPORTS */
+   { 14116, 0x0000825C }, /* GL_VIEWPORT_SUBPIXEL_BITS */
+   { 14142, 0x0000825D }, /* GL_VIEWPORT_BOUNDS_RANGE */
+   { 14167, 0x0000825E }, /* GL_LAYER_PROVOKING_VERTEX */
+   { 14193, 0x0000825F }, /* GL_VIEWPORT_INDEX_PROVOKING_VERTEX */
+   { 14228, 0x00008260 }, /* GL_UNDEFINED_VERTEX */
+   { 14248, 0x00008261 }, /* GL_NO_RESET_NOTIFICATION_ARB */
+   { 14277, 0x00008262 }, /* GL_MAX_COMPUTE_SHARED_MEMORY_SIZE */
+   { 14311, 0x00008263 }, /* GL_MAX_COMPUTE_UNIFORM_COMPONENTS */
+   { 14345, 0x00008264 }, /* GL_MAX_COMPUTE_ATOMIC_COUNTER_BUFFERS */
+   { 14383, 0x00008265 }, /* GL_MAX_COMPUTE_ATOMIC_COUNTERS */
+   { 14414, 0x00008266 }, /* GL_MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS */
+   { 14457, 0x00008267 }, /* GL_COMPUTE_WORK_GROUP_SIZE */
+   { 14484, 0x00008268 }, /* GL_DEBUG_TYPE_MARKER */
+   { 14505, 0x00008269 }, /* GL_DEBUG_TYPE_PUSH_GROUP */
+   { 14530, 0x0000826A }, /* GL_DEBUG_TYPE_POP_GROUP */
+   { 14554, 0x0000826B }, /* GL_DEBUG_SEVERITY_NOTIFICATION */
+   { 14585, 0x0000826C }, /* GL_MAX_DEBUG_GROUP_STACK_DEPTH */
+   { 14616, 0x0000826D }, /* GL_DEBUG_GROUP_STACK_DEPTH */
+   { 14643, 0x000082D4 }, /* GL_VERTEX_ATTRIB_BINDING */
+   { 14668, 0x000082D5 }, /* GL_VERTEX_ATTRIB_RELATIVE_OFFSET */
+   { 14701, 0x000082D6 }, /* GL_VERTEX_BINDING_DIVISOR */
+   { 14727, 0x000082D7 }, /* GL_VERTEX_BINDING_OFFSET */
+   { 14752, 0x000082D8 }, /* GL_VERTEX_BINDING_STRIDE */
+   { 14777, 0x000082D9 }, /* GL_MAX_VERTEX_ATTRIB_RELATIVE_OFFSET */
+   { 14814, 0x000082DA }, /* GL_MAX_VERTEX_ATTRIB_BINDINGS */
+   { 14844, 0x000082DF }, /* GL_TEXTURE_IMMUTABLE_LEVELS */
+   { 14872, 0x000082E0 }, /* GL_BUFFER */
+   { 14882, 0x000082E1 }, /* GL_SHADER */
+   { 14892, 0x000082E2 }, /* GL_PROGRAM */
+   { 14903, 0x000082E3 }, /* GL_QUERY */
+   { 14912, 0x000082E4 }, /* GL_PROGRAM_PIPELINE */
+   { 14932, 0x000082E6 }, /* GL_SAMPLER */
+   { 14943, 0x000082E7 }, /* GL_DISPLAY_LIST */
+   { 14959, 0x000082E8 }, /* GL_MAX_LABEL_LENGTH */
+   { 14979, 0x00008362 }, /* GL_UNSIGNED_BYTE_2_3_3_REV */
+   { 15006, 0x00008363 }, /* GL_UNSIGNED_SHORT_5_6_5 */
+   { 15030, 0x00008364 }, /* GL_UNSIGNED_SHORT_5_6_5_REV */
+   { 15058, 0x00008365 }, /* GL_UNSIGNED_SHORT_4_4_4_4_REV */
+   { 15088, 0x00008366 }, /* GL_UNSIGNED_SHORT_1_5_5_5_REV */
+   { 15118, 0x00008367 }, /* GL_UNSIGNED_INT_8_8_8_8_REV */
+   { 15146, 0x00008368 }, /* GL_UNSIGNED_INT_2_10_10_10_REV */
+   { 15177, 0x00008369 }, /* GL_TEXTURE_MAX_CLAMP_S_SGIX */
+   { 15205, 0x0000836A }, /* GL_TEXTURE_MAX_CLAMP_T_SGIX */
+   { 15233, 0x0000836B }, /* GL_TEXTURE_MAX_CLAMP_R_SGIX */
+   { 15261, 0x00008370 }, /* GL_MIRRORED_REPEAT */
+   { 15280, 0x000083A0 }, /* GL_RGB_S3TC */
+   { 15292, 0x000083A1 }, /* GL_RGB4_S3TC */
+   { 15305, 0x000083A2 }, /* GL_RGBA_S3TC */
+   { 15318, 0x000083A3 }, /* GL_RGBA4_S3TC */
+   { 15332, 0x000083A4 }, /* GL_RGBA_DXT5_S3TC */
+   { 15350, 0x000083A5 }, /* GL_RGBA4_DXT5_S3TC */
+   { 15369, 0x000083F0 }, /* GL_COMPRESSED_RGB_S3TC_DXT1_EXT */
+   { 15401, 0x000083F1 }, /* GL_COMPRESSED_RGBA_S3TC_DXT1_EXT */
+   { 15434, 0x000083F2 }, /* GL_COMPRESSED_RGBA_S3TC_DXT3_ANGLE */
+   { 15469, 0x000083F3 }, /* GL_COMPRESSED_RGBA_S3TC_DXT5_ANGLE */
+   { 15504, 0x000083F9 }, /* GL_PERFQUERY_DONOT_FLUSH_INTEL */
+   { 15535, 0x000083FA }, /* GL_PERFQUERY_FLUSH_INTEL */
+   { 15560, 0x000083FB }, /* GL_PERFQUERY_WAIT_INTEL */
+   { 15584, 0x0000844D }, /* GL_NEAREST_CLIPMAP_NEAREST_SGIX */
+   { 15616, 0x0000844E }, /* GL_NEAREST_CLIPMAP_LINEAR_SGIX */
+   { 15647, 0x0000844F }, /* GL_LINEAR_CLIPMAP_NEAREST_SGIX */
+   { 15678, 0x00008450 }, /* GL_FOG_COORDINATE_SOURCE */
+   { 15703, 0x00008451 }, /* GL_FOG_COORD */
+   { 15716, 0x00008452 }, /* GL_FRAGMENT_DEPTH */
+   { 15734, 0x00008453 }, /* GL_CURRENT_FOG_COORD */
+   { 15755, 0x00008454 }, /* GL_FOG_COORDINATE_ARRAY_TYPE */
+   { 15784, 0x00008455 }, /* GL_FOG_COORDINATE_ARRAY_STRIDE */
+   { 15815, 0x00008456 }, /* GL_FOG_COORDINATE_ARRAY_POINTER */
+   { 15847, 0x00008457 }, /* GL_FOG_COORDINATE_ARRAY */
+   { 15871, 0x00008458 }, /* GL_COLOR_SUM */
+   { 15884, 0x00008459 }, /* GL_CURRENT_SECONDARY_COLOR */
+   { 15911, 0x0000845A }, /* GL_SECONDARY_COLOR_ARRAY_SIZE */
+   { 15941, 0x0000845B }, /* GL_SECONDARY_COLOR_ARRAY_TYPE */
+   { 15971, 0x0000845C }, /* GL_SECONDARY_COLOR_ARRAY_STRIDE */
+   { 16003, 0x0000845D }, /* GL_SECONDARY_COLOR_ARRAY_POINTER */
+   { 16036, 0x0000845E }, /* GL_SECONDARY_COLOR_ARRAY */
+   { 16061, 0x0000845F }, /* GL_CURRENT_RASTER_SECONDARY_COLOR */
+   { 16095, 0x0000846D }, /* GL_ALIASED_POINT_SIZE_RANGE */
+   { 16123, 0x0000846E }, /* GL_ALIASED_LINE_WIDTH_RANGE */
+   { 16151, 0x000084C0 }, /* GL_TEXTURE0 */
+   { 16163, 0x000084C1 }, /* GL_TEXTURE1 */
+   { 16175, 0x000084C2 }, /* GL_TEXTURE2 */
+   { 16187, 0x000084C3 }, /* GL_TEXTURE3 */
+   { 16199, 0x000084C4 }, /* GL_TEXTURE4 */
+   { 16211, 0x000084C5 }, /* GL_TEXTURE5 */
+   { 16223, 0x000084C6 }, /* GL_TEXTURE6 */
+   { 16235, 0x000084C7 }, /* GL_TEXTURE7 */
+   { 16247, 0x000084C8 }, /* GL_TEXTURE8 */
+   { 16259, 0x000084C9 }, /* GL_TEXTURE9 */
+   { 16271, 0x000084CA }, /* GL_TEXTURE10 */
+   { 16284, 0x000084CB }, /* GL_TEXTURE11 */
+   { 16297, 0x000084CC }, /* GL_TEXTURE12 */
+   { 16310, 0x000084CD }, /* GL_TEXTURE13 */
+   { 16323, 0x000084CE }, /* GL_TEXTURE14 */
+   { 16336, 0x000084CF }, /* GL_TEXTURE15 */
+   { 16349, 0x000084D0 }, /* GL_TEXTURE16 */
+   { 16362, 0x000084D1 }, /* GL_TEXTURE17 */
+   { 16375, 0x000084D2 }, /* GL_TEXTURE18 */
+   { 16388, 0x000084D3 }, /* GL_TEXTURE19 */
+   { 16401, 0x000084D4 }, /* GL_TEXTURE20 */
+   { 16414, 0x000084D5 }, /* GL_TEXTURE21 */
+   { 16427, 0x000084D6 }, /* GL_TEXTURE22 */
+   { 16440, 0x000084D7 }, /* GL_TEXTURE23 */
+   { 16453, 0x000084D8 }, /* GL_TEXTURE24 */
+   { 16466, 0x000084D9 }, /* GL_TEXTURE25 */
+   { 16479, 0x000084DA }, /* GL_TEXTURE26 */
+   { 16492, 0x000084DB }, /* GL_TEXTURE27 */
+   { 16505, 0x000084DC }, /* GL_TEXTURE28 */
+   { 16518, 0x000084DD }, /* GL_TEXTURE29 */
+   { 16531, 0x000084DE }, /* GL_TEXTURE30 */
+   { 16544, 0x000084DF }, /* GL_TEXTURE31 */
+   { 16557, 0x000084E0 }, /* GL_ACTIVE_TEXTURE */
+   { 16575, 0x000084E1 }, /* GL_CLIENT_ACTIVE_TEXTURE */
+   { 16600, 0x000084E2 }, /* GL_MAX_TEXTURE_UNITS */
+   { 16621, 0x000084E3 }, /* GL_TRANSPOSE_MODELVIEW_MATRIX */
+   { 16651, 0x000084E4 }, /* GL_TRANSPOSE_PROJECTION_MATRIX */
+   { 16682, 0x000084E5 }, /* GL_TRANSPOSE_TEXTURE_MATRIX */
+   { 16710, 0x000084E6 }, /* GL_TRANSPOSE_COLOR_MATRIX */
+   { 16736, 0x000084E7 }, /* GL_SUBTRACT */
+   { 16748, 0x000084E8 }, /* GL_MAX_RENDERBUFFER_SIZE */
+   { 16773, 0x000084E9 }, /* GL_COMPRESSED_ALPHA */
+   { 16793, 0x000084EA }, /* GL_COMPRESSED_LUMINANCE */
+   { 16817, 0x000084EB }, /* GL_COMPRESSED_LUMINANCE_ALPHA */
+   { 16847, 0x000084EC }, /* GL_COMPRESSED_INTENSITY */
+   { 16871, 0x000084ED }, /* GL_COMPRESSED_RGB */
+   { 16889, 0x000084EE }, /* GL_COMPRESSED_RGBA */
+   { 16908, 0x000084EF }, /* GL_TEXTURE_COMPRESSION_HINT */
+   { 16936, 0x000084F5 }, /* GL_TEXTURE_RECTANGLE */
+   { 16957, 0x000084F6 }, /* GL_TEXTURE_BINDING_RECTANGLE */
+   { 16986, 0x000084F7 }, /* GL_PROXY_TEXTURE_RECTANGLE */
+   { 17013, 0x000084F8 }, /* GL_MAX_RECTANGLE_TEXTURE_SIZE */
+   { 17043, 0x000084F9 }, /* GL_DEPTH_STENCIL */
+   { 17060, 0x000084FA }, /* GL_UNSIGNED_INT_24_8 */
+   { 17081, 0x000084FD }, /* GL_MAX_TEXTURE_LOD_BIAS */
+   { 17105, 0x000084FE }, /* GL_TEXTURE_MAX_ANISOTROPY_EXT */
+   { 17135, 0x000084FF }, /* GL_MAX_TEXTURE_MAX_ANISOTROPY_EXT */
+   { 17169, 0x00008500 }, /* GL_TEXTURE_FILTER_CONTROL */
+   { 17195, 0x00008501 }, /* GL_TEXTURE_LOD_BIAS */
+   { 17215, 0x00008503 }, /* GL_COMBINE4_NV */
+   { 17230, 0x00008504 }, /* GL_MAX_SHININESS_NV */
+   { 17250, 0x00008505 }, /* GL_MAX_SPOT_EXPONENT_NV */
+   { 17274, 0x00008507 }, /* GL_INCR_WRAP */
+   { 17287, 0x00008508 }, /* GL_DECR_WRAP */
+   { 17300, 0x0000850A }, /* GL_MODELVIEW1_ARB */
+   { 17318, 0x00008511 }, /* GL_NORMAL_MAP */
+   { 17332, 0x00008512 }, /* GL_REFLECTION_MAP */
+   { 17350, 0x00008513 }, /* GL_TEXTURE_CUBE_MAP */
+   { 17370, 0x00008514 }, /* GL_TEXTURE_BINDING_CUBE_MAP */
+   { 17398, 0x00008515 }, /* GL_TEXTURE_CUBE_MAP_POSITIVE_X */
+   { 17429, 0x00008516 }, /* GL_TEXTURE_CUBE_MAP_NEGATIVE_X */
+   { 17460, 0x00008517 }, /* GL_TEXTURE_CUBE_MAP_POSITIVE_Y */
+   { 17491, 0x00008518 }, /* GL_TEXTURE_CUBE_MAP_NEGATIVE_Y */
+   { 17522, 0x00008519 }, /* GL_TEXTURE_CUBE_MAP_POSITIVE_Z */
+   { 17553, 0x0000851A }, /* GL_TEXTURE_CUBE_MAP_NEGATIVE_Z */
+   { 17584, 0x0000851B }, /* GL_PROXY_TEXTURE_CUBE_MAP */
+   { 17610, 0x0000851C }, /* GL_MAX_CUBE_MAP_TEXTURE_SIZE */
+   { 17639, 0x00008534 }, /* GL_MULTISAMPLE_FILTER_HINT_NV */
+   { 17669, 0x00008558 }, /* GL_PRIMITIVE_RESTART_NV */
+   { 17693, 0x00008559 }, /* GL_PRIMITIVE_RESTART_INDEX_NV */
+   { 17723, 0x0000855A }, /* GL_FOG_DISTANCE_MODE_NV */
+   { 17747, 0x0000855B }, /* GL_EYE_RADIAL_NV */
+   { 17764, 0x0000855C }, /* GL_EYE_PLANE_ABSOLUTE_NV */
+   { 17789, 0x00008570 }, /* GL_COMBINE */
+   { 17800, 0x00008571 }, /* GL_COMBINE_RGB */
+   { 17815, 0x00008572 }, /* GL_COMBINE_ALPHA */
+   { 17832, 0x00008573 }, /* GL_RGB_SCALE */
+   { 17845, 0x00008574 }, /* GL_ADD_SIGNED */
+   { 17859, 0x00008575 }, /* GL_INTERPOLATE */
+   { 17874, 0x00008576 }, /* GL_CONSTANT */
+   { 17886, 0x00008577 }, /* GL_PRIMARY_COLOR */
+   { 17903, 0x00008578 }, /* GL_PREVIOUS */
+   { 17915, 0x00008580 }, /* GL_SOURCE0_RGB */
+   { 17930, 0x00008581 }, /* GL_SOURCE1_RGB */
+   { 17945, 0x00008582 }, /* GL_SOURCE2_RGB */
+   { 17960, 0x00008583 }, /* GL_SOURCE3_RGB_NV */
+   { 17978, 0x00008588 }, /* GL_SOURCE0_ALPHA */
+   { 17995, 0x00008589 }, /* GL_SOURCE1_ALPHA */
+   { 18012, 0x0000858A }, /* GL_SOURCE2_ALPHA */
+   { 18029, 0x0000858B }, /* GL_SOURCE3_ALPHA_NV */
+   { 18049, 0x00008590 }, /* GL_OPERAND0_RGB */
+   { 18065, 0x00008591 }, /* GL_OPERAND1_RGB */
+   { 18081, 0x00008592 }, /* GL_OPERAND2_RGB */
+   { 18097, 0x00008593 }, /* GL_OPERAND3_RGB_NV */
+   { 18116, 0x00008598 }, /* GL_OPERAND0_ALPHA */
+   { 18134, 0x00008599 }, /* GL_OPERAND1_ALPHA */
+   { 18152, 0x0000859A }, /* GL_OPERAND2_ALPHA */
+   { 18170, 0x0000859B }, /* GL_OPERAND3_ALPHA_NV */
+   { 18191, 0x000085B3 }, /* GL_BUFFER_OBJECT_APPLE */
+   { 18214, 0x000085B5 }, /* GL_VERTEX_ARRAY_BINDING */
+   { 18238, 0x000085B7 }, /* GL_TEXTURE_RANGE_LENGTH_APPLE */
+   { 18268, 0x000085B8 }, /* GL_TEXTURE_RANGE_POINTER_APPLE */
+   { 18299, 0x000085B9 }, /* GL_YCBCR_422_APPLE */
+   { 18318, 0x000085BA }, /* GL_UNSIGNED_SHORT_8_8_APPLE */
+   { 18346, 0x000085BB }, /* GL_UNSIGNED_SHORT_8_8_REV_APPLE */
+   { 18378, 0x000085BC }, /* GL_TEXTURE_STORAGE_HINT_APPLE */
+   { 18408, 0x000085BD }, /* GL_STORAGE_PRIVATE_APPLE */
+   { 18433, 0x000085BE }, /* GL_STORAGE_CACHED_APPLE */
+   { 18457, 0x000085BF }, /* GL_STORAGE_SHARED_APPLE */
+   { 18481, 0x000085CC }, /* GL_SLICE_ACCUM_SUN */
+   { 18500, 0x00008614 }, /* GL_QUAD_MESH_SUN */
+   { 18517, 0x00008615 }, /* GL_TRIANGLE_MESH_SUN */
+   { 18538, 0x00008620 }, /* GL_VERTEX_PROGRAM_ARB */
+   { 18560, 0x00008621 }, /* GL_VERTEX_STATE_PROGRAM_NV */
+   { 18587, 0x00008622 }, /* GL_VERTEX_ATTRIB_ARRAY_ENABLED */
+   { 18618, 0x00008623 }, /* GL_VERTEX_ATTRIB_ARRAY_SIZE */
+   { 18646, 0x00008624 }, /* GL_VERTEX_ATTRIB_ARRAY_STRIDE */
+   { 18676, 0x00008625 }, /* GL_VERTEX_ATTRIB_ARRAY_TYPE */
+   { 18704, 0x00008626 }, /* GL_CURRENT_VERTEX_ATTRIB */
+   { 18729, 0x00008627 }, /* GL_PROGRAM_LENGTH_ARB */
+   { 18751, 0x00008628 }, /* GL_PROGRAM_STRING_ARB */
+   { 18773, 0x00008629 }, /* GL_MODELVIEW_PROJECTION_NV */
+   { 18800, 0x0000862A }, /* GL_IDENTITY_NV */
+   { 18815, 0x0000862B }, /* GL_INVERSE_NV */
+   { 18829, 0x0000862C }, /* GL_TRANSPOSE_NV */
+   { 18845, 0x0000862D }, /* GL_INVERSE_TRANSPOSE_NV */
+   { 18869, 0x0000862E }, /* GL_MAX_PROGRAM_MATRIX_STACK_DEPTH_ARB */
+   { 18907, 0x0000862F }, /* GL_MAX_PROGRAM_MATRICES_ARB */
+   { 18935, 0x00008630 }, /* GL_MATRIX0_NV */
+   { 18949, 0x00008631 }, /* GL_MATRIX1_NV */
+   { 18963, 0x00008632 }, /* GL_MATRIX2_NV */
+   { 18977, 0x00008633 }, /* GL_MATRIX3_NV */
+   { 18991, 0x00008634 }, /* GL_MATRIX4_NV */
+   { 19005, 0x00008635 }, /* GL_MATRIX5_NV */
+   { 19019, 0x00008636 }, /* GL_MATRIX6_NV */
+   { 19033, 0x00008637 }, /* GL_MATRIX7_NV */
+   { 19047, 0x00008640 }, /* GL_CURRENT_MATRIX_STACK_DEPTH_ARB */
+   { 19081, 0x00008641 }, /* GL_CURRENT_MATRIX_ARB */
+   { 19103, 0x00008642 }, /* GL_PROGRAM_POINT_SIZE */
+   { 19125, 0x00008643 }, /* GL_VERTEX_PROGRAM_TWO_SIDE */
+   { 19152, 0x00008644 }, /* GL_PROGRAM_PARAMETER_NV */
+   { 19176, 0x00008645 }, /* GL_VERTEX_ATTRIB_ARRAY_POINTER */
+   { 19207, 0x00008646 }, /* GL_PROGRAM_TARGET_NV */
+   { 19228, 0x00008647 }, /* GL_PROGRAM_RESIDENT_NV */
+   { 19251, 0x00008648 }, /* GL_TRACK_MATRIX_NV */
+   { 19270, 0x00008649 }, /* GL_TRACK_MATRIX_TRANSFORM_NV */
+   { 19299, 0x0000864A }, /* GL_VERTEX_PROGRAM_BINDING_NV */
+   { 19328, 0x0000864B }, /* GL_PROGRAM_ERROR_POSITION_ARB */
+   { 19358, 0x0000864F }, /* GL_DEPTH_CLAMP */
+   { 19373, 0x00008650 }, /* GL_VERTEX_ATTRIB_ARRAY0_NV */
+   { 19400, 0x00008651 }, /* GL_VERTEX_ATTRIB_ARRAY1_NV */
+   { 19427, 0x00008652 }, /* GL_VERTEX_ATTRIB_ARRAY2_NV */
+   { 19454, 0x00008653 }, /* GL_VERTEX_ATTRIB_ARRAY3_NV */
+   { 19481, 0x00008654 }, /* GL_VERTEX_ATTRIB_ARRAY4_NV */
+   { 19508, 0x00008655 }, /* GL_VERTEX_ATTRIB_ARRAY5_NV */
+   { 19535, 0x00008656 }, /* GL_VERTEX_ATTRIB_ARRAY6_NV */
+   { 19562, 0x00008657 }, /* GL_VERTEX_ATTRIB_ARRAY7_NV */
+   { 19589, 0x00008658 }, /* GL_VERTEX_ATTRIB_ARRAY8_NV */
+   { 19616, 0x00008659 }, /* GL_VERTEX_ATTRIB_ARRAY9_NV */
+   { 19643, 0x0000865A }, /* GL_VERTEX_ATTRIB_ARRAY10_NV */
+   { 19671, 0x0000865B }, /* GL_VERTEX_ATTRIB_ARRAY11_NV */
+   { 19699, 0x0000865C }, /* GL_VERTEX_ATTRIB_ARRAY12_NV */
+   { 19727, 0x0000865D }, /* GL_VERTEX_ATTRIB_ARRAY13_NV */
+   { 19755, 0x0000865E }, /* GL_VERTEX_ATTRIB_ARRAY14_NV */
+   { 19783, 0x0000865F }, /* GL_VERTEX_ATTRIB_ARRAY15_NV */
+   { 19811, 0x00008660 }, /* GL_MAP1_VERTEX_ATTRIB0_4_NV */
+   { 19839, 0x00008661 }, /* GL_MAP1_VERTEX_ATTRIB1_4_NV */
+   { 19867, 0x00008662 }, /* GL_MAP1_VERTEX_ATTRIB2_4_NV */
+   { 19895, 0x00008663 }, /* GL_MAP1_VERTEX_ATTRIB3_4_NV */
+   { 19923, 0x00008664 }, /* GL_MAP1_VERTEX_ATTRIB4_4_NV */
+   { 19951, 0x00008665 }, /* GL_MAP1_VERTEX_ATTRIB5_4_NV */
+   { 19979, 0x00008666 }, /* GL_MAP1_VERTEX_ATTRIB6_4_NV */
+   { 20007, 0x00008667 }, /* GL_MAP1_VERTEX_ATTRIB7_4_NV */
+   { 20035, 0x00008668 }, /* GL_MAP1_VERTEX_ATTRIB8_4_NV */
+   { 20063, 0x00008669 }, /* GL_MAP1_VERTEX_ATTRIB9_4_NV */
+   { 20091, 0x0000866A }, /* GL_MAP1_VERTEX_ATTRIB10_4_NV */
+   { 20120, 0x0000866B }, /* GL_MAP1_VERTEX_ATTRIB11_4_NV */
+   { 20149, 0x0000866C }, /* GL_MAP1_VERTEX_ATTRIB12_4_NV */
+   { 20178, 0x0000866D }, /* GL_MAP1_VERTEX_ATTRIB13_4_NV */
+   { 20207, 0x0000866E }, /* GL_MAP1_VERTEX_ATTRIB14_4_NV */
+   { 20236, 0x0000866F }, /* GL_MAP1_VERTEX_ATTRIB15_4_NV */
+   { 20265, 0x00008670 }, /* GL_MAP2_VERTEX_ATTRIB0_4_NV */
+   { 20293, 0x00008671 }, /* GL_MAP2_VERTEX_ATTRIB1_4_NV */
+   { 20321, 0x00008672 }, /* GL_MAP2_VERTEX_ATTRIB2_4_NV */
+   { 20349, 0x00008673 }, /* GL_MAP2_VERTEX_ATTRIB3_4_NV */
+   { 20377, 0x00008674 }, /* GL_MAP2_VERTEX_ATTRIB4_4_NV */
+   { 20405, 0x00008675 }, /* GL_MAP2_VERTEX_ATTRIB5_4_NV */
+   { 20433, 0x00008676 }, /* GL_MAP2_VERTEX_ATTRIB6_4_NV */
+   { 20461, 0x00008677 }, /* GL_PROGRAM_BINDING_ARB */
+   { 20484, 0x00008678 }, /* GL_MAP2_VERTEX_ATTRIB8_4_NV */
+   { 20512, 0x00008679 }, /* GL_MAP2_VERTEX_ATTRIB9_4_NV */
+   { 20540, 0x0000867A }, /* GL_MAP2_VERTEX_ATTRIB10_4_NV */
+   { 20569, 0x0000867B }, /* GL_MAP2_VERTEX_ATTRIB11_4_NV */
+   { 20598, 0x0000867C }, /* GL_MAP2_VERTEX_ATTRIB12_4_NV */
+   { 20627, 0x0000867D }, /* GL_MAP2_VERTEX_ATTRIB13_4_NV */
+   { 20656, 0x0000867E }, /* GL_MAP2_VERTEX_ATTRIB14_4_NV */
+   { 20685, 0x0000867F }, /* GL_MAP2_VERTEX_ATTRIB15_4_NV */
+   { 20714, 0x000086A0 }, /* GL_TEXTURE_COMPRESSED_IMAGE_SIZE */
+   { 20747, 0x000086A1 }, /* GL_TEXTURE_COMPRESSED */
+   { 20769, 0x000086A2 }, /* GL_NUM_COMPRESSED_TEXTURE_FORMATS */
+   { 20803, 0x000086A3 }, /* GL_COMPRESSED_TEXTURE_FORMATS */
+   { 20833, 0x000086A4 }, /* GL_MAX_VERTEX_UNITS_ARB */
+   { 20857, 0x000086A5 }, /* GL_ACTIVE_VERTEX_UNITS_ARB */
+   { 20884, 0x000086A6 }, /* GL_WEIGHT_SUM_UNITY_ARB */
+   { 20908, 0x000086A7 }, /* GL_VERTEX_BLEND_ARB */
+   { 20928, 0x000086A8 }, /* GL_CURRENT_WEIGHT_ARB */
+   { 20950, 0x000086A9 }, /* GL_WEIGHT_ARRAY_TYPE_ARB */
+   { 20975, 0x000086AA }, /* GL_WEIGHT_ARRAY_STRIDE_ARB */
+   { 21002, 0x000086AB }, /* GL_WEIGHT_ARRAY_SIZE_ARB */
+   { 21027, 0x000086AC }, /* GL_WEIGHT_ARRAY_POINTER_ARB */
+   { 21055, 0x000086AD }, /* GL_WEIGHT_ARRAY_ARB */
+   { 21075, 0x000086AE }, /* GL_DOT3_RGB */
+   { 21087, 0x000086AF }, /* GL_DOT3_RGBA */
+   { 21100, 0x000086B0 }, /* GL_COMPRESSED_RGB_FXT1_3DFX */
+   { 21128, 0x000086B1 }, /* GL_COMPRESSED_RGBA_FXT1_3DFX */
+   { 21157, 0x000086B2 }, /* GL_MULTISAMPLE_3DFX */
+   { 21177, 0x000086B3 }, /* GL_SAMPLE_BUFFERS_3DFX */
+   { 21200, 0x000086B4 }, /* GL_SAMPLES_3DFX */
+   { 21216, 0x000086EB }, /* GL_SURFACE_STATE_NV */
+   { 21236, 0x000086FD }, /* GL_SURFACE_REGISTERED_NV */
+   { 21261, 0x00008700 }, /* GL_SURFACE_MAPPED_NV */
+   { 21282, 0x00008722 }, /* GL_MODELVIEW2_ARB */
+   { 21300, 0x00008723 }, /* GL_MODELVIEW3_ARB */
+   { 21318, 0x00008724 }, /* GL_MODELVIEW4_ARB */
+   { 21336, 0x00008725 }, /* GL_MODELVIEW5_ARB */
+   { 21354, 0x00008726 }, /* GL_MODELVIEW6_ARB */
+   { 21372, 0x00008727 }, /* GL_MODELVIEW7_ARB */
+   { 21390, 0x00008728 }, /* GL_MODELVIEW8_ARB */
+   { 21408, 0x00008729 }, /* GL_MODELVIEW9_ARB */
+   { 21426, 0x0000872A }, /* GL_MODELVIEW10_ARB */
+   { 21445, 0x0000872B }, /* GL_MODELVIEW11_ARB */
+   { 21464, 0x0000872C }, /* GL_MODELVIEW12_ARB */
+   { 21483, 0x0000872D }, /* GL_MODELVIEW13_ARB */
+   { 21502, 0x0000872E }, /* GL_MODELVIEW14_ARB */
+   { 21521, 0x0000872F }, /* GL_MODELVIEW15_ARB */
+   { 21540, 0x00008730 }, /* GL_MODELVIEW16_ARB */
+   { 21559, 0x00008731 }, /* GL_MODELVIEW17_ARB */
+   { 21578, 0x00008732 }, /* GL_MODELVIEW18_ARB */
+   { 21597, 0x00008733 }, /* GL_MODELVIEW19_ARB */
+   { 21616, 0x00008734 }, /* GL_MODELVIEW20_ARB */
+   { 21635, 0x00008735 }, /* GL_MODELVIEW21_ARB */
+   { 21654, 0x00008736 }, /* GL_MODELVIEW22_ARB */
+   { 21673, 0x00008737 }, /* GL_MODELVIEW23_ARB */
+   { 21692, 0x00008738 }, /* GL_MODELVIEW24_ARB */
+   { 21711, 0x00008739 }, /* GL_MODELVIEW25_ARB */
+   { 21730, 0x0000873A }, /* GL_MODELVIEW26_ARB */
+   { 21749, 0x0000873B }, /* GL_MODELVIEW27_ARB */
+   { 21768, 0x0000873C }, /* GL_MODELVIEW28_ARB */
+   { 21787, 0x0000873D }, /* GL_MODELVIEW29_ARB */
+   { 21806, 0x0000873E }, /* GL_MODELVIEW30_ARB */
+   { 21825, 0x0000873F }, /* GL_MODELVIEW31_ARB */
+   { 21844, 0x00008740 }, /* GL_DOT3_RGB_EXT */
+   { 21860, 0x00008741 }, /* GL_PROGRAM_BINARY_LENGTH */
+   { 21885, 0x00008742 }, /* GL_MIRROR_CLAMP_EXT */
+   { 21905, 0x00008743 }, /* GL_MIRROR_CLAMP_TO_EDGE_EXT */
+   { 21933, 0x00008744 }, /* GL_MODULATE_ADD_ATI */
+   { 21953, 0x00008745 }, /* GL_MODULATE_SIGNED_ADD_ATI */
+   { 21980, 0x00008746 }, /* GL_MODULATE_SUBTRACT_ATI */
+   { 22005, 0x00008757 }, /* GL_YCBCR_MESA */
+   { 22019, 0x00008758 }, /* GL_PACK_INVERT_MESA */
+   { 22039, 0x00008764 }, /* GL_BUFFER_SIZE */
+   { 22054, 0x00008765 }, /* GL_BUFFER_USAGE */
+   { 22070, 0x00008775 }, /* GL_BUMP_ROT_MATRIX_ATI */
+   { 22093, 0x00008776 }, /* GL_BUMP_ROT_MATRIX_SIZE_ATI */
+   { 22121, 0x00008777 }, /* GL_BUMP_NUM_TEX_UNITS_ATI */
+   { 22147, 0x00008778 }, /* GL_BUMP_TEX_UNITS_ATI */
+   { 22169, 0x00008779 }, /* GL_DUDV_ATI */
+   { 22181, 0x0000877A }, /* GL_DU8DV8_ATI */
+   { 22195, 0x0000877B }, /* GL_BUMP_ENVMAP_ATI */
+   { 22214, 0x0000877C }, /* GL_BUMP_TARGET_ATI */
+   { 22233, 0x000087FE }, /* GL_NUM_PROGRAM_BINARY_FORMATS */
+   { 22263, 0x000087FF }, /* GL_PROGRAM_BINARY_FORMATS */
+   { 22289, 0x00008800 }, /* GL_STENCIL_BACK_FUNC */
+   { 22310, 0x00008801 }, /* GL_STENCIL_BACK_FAIL */
+   { 22331, 0x00008802 }, /* GL_STENCIL_BACK_PASS_DEPTH_FAIL */
+   { 22363, 0x00008803 }, /* GL_STENCIL_BACK_PASS_DEPTH_PASS */
+   { 22395, 0x00008804 }, /* GL_FRAGMENT_PROGRAM_ARB */
+   { 22419, 0x00008805 }, /* GL_PROGRAM_ALU_INSTRUCTIONS_ARB */
+   { 22451, 0x00008806 }, /* GL_PROGRAM_TEX_INSTRUCTIONS_ARB */
+   { 22483, 0x00008807 }, /* GL_PROGRAM_TEX_INDIRECTIONS_ARB */
+   { 22515, 0x00008808 }, /* GL_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB */
+   { 22554, 0x00008809 }, /* GL_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB */
+   { 22593, 0x0000880A }, /* GL_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB */
+   { 22632, 0x0000880B }, /* GL_MAX_PROGRAM_ALU_INSTRUCTIONS_ARB */
+   { 22668, 0x0000880C }, /* GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB */
+   { 22704, 0x0000880D }, /* GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB */
+   { 22740, 0x0000880E }, /* GL_MAX_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB */
+   { 22783, 0x0000880F }, /* GL_MAX_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB */
+   { 22826, 0x00008810 }, /* GL_MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB */
+   { 22869, 0x00008814 }, /* GL_RGBA32F */
+   { 22880, 0x00008815 }, /* GL_RGB32F */
+   { 22890, 0x00008816 }, /* GL_ALPHA32F_ARB */
+   { 22906, 0x00008817 }, /* GL_INTENSITY32F_ARB */
+   { 22926, 0x00008818 }, /* GL_LUMINANCE32F_ARB */
+   { 22946, 0x00008819 }, /* GL_LUMINANCE_ALPHA32F_ARB */
+   { 22972, 0x0000881A }, /* GL_RGBA16F */
+   { 22983, 0x0000881B }, /* GL_RGB16F */
+   { 22993, 0x0000881C }, /* GL_ALPHA16F_ARB */
+   { 23009, 0x0000881D }, /* GL_INTENSITY16F_ARB */
+   { 23029, 0x0000881E }, /* GL_LUMINANCE16F_ARB */
+   { 23049, 0x0000881F }, /* GL_LUMINANCE_ALPHA16F_ARB */
+   { 23075, 0x00008820 }, /* GL_RGBA_FLOAT_MODE_ARB */
+   { 23098, 0x00008824 }, /* GL_MAX_DRAW_BUFFERS */
+   { 23118, 0x00008825 }, /* GL_DRAW_BUFFER0 */
+   { 23134, 0x00008826 }, /* GL_DRAW_BUFFER1 */
+   { 23150, 0x00008827 }, /* GL_DRAW_BUFFER2 */
+   { 23166, 0x00008828 }, /* GL_DRAW_BUFFER3 */
+   { 23182, 0x00008829 }, /* GL_DRAW_BUFFER4 */
+   { 23198, 0x0000882A }, /* GL_DRAW_BUFFER5 */
+   { 23214, 0x0000882B }, /* GL_DRAW_BUFFER6 */
+   { 23230, 0x0000882C }, /* GL_DRAW_BUFFER7 */
+   { 23246, 0x0000882D }, /* GL_DRAW_BUFFER8 */
+   { 23262, 0x0000882E }, /* GL_DRAW_BUFFER9 */
+   { 23278, 0x0000882F }, /* GL_DRAW_BUFFER10 */
+   { 23295, 0x00008830 }, /* GL_DRAW_BUFFER11 */
+   { 23312, 0x00008831 }, /* GL_DRAW_BUFFER12 */
+   { 23329, 0x00008832 }, /* GL_DRAW_BUFFER13 */
+   { 23346, 0x00008833 }, /* GL_DRAW_BUFFER14 */
+   { 23363, 0x00008834 }, /* GL_DRAW_BUFFER15 */
+   { 23380, 0x0000883D }, /* GL_BLEND_EQUATION_ALPHA */
+   { 23404, 0x00008840 }, /* GL_MATRIX_PALETTE_ARB */
+   { 23426, 0x00008841 }, /* GL_MAX_MATRIX_PALETTE_STACK_DEPTH_ARB */
+   { 23464, 0x00008842 }, /* GL_MAX_PALETTE_MATRICES_ARB */
+   { 23492, 0x00008843 }, /* GL_CURRENT_PALETTE_MATRIX_ARB */
+   { 23522, 0x00008844 }, /* GL_MATRIX_INDEX_ARRAY_ARB */
+   { 23548, 0x00008845 }, /* GL_CURRENT_MATRIX_INDEX_ARB */
+   { 23576, 0x00008846 }, /* GL_MATRIX_INDEX_ARRAY_SIZE_ARB */
+   { 23607, 0x00008847 }, /* GL_MATRIX_INDEX_ARRAY_TYPE_ARB */
+   { 23638, 0x00008848 }, /* GL_MATRIX_INDEX_ARRAY_STRIDE_ARB */
+   { 23671, 0x00008849 }, /* GL_MATRIX_INDEX_ARRAY_POINTER_ARB */
+   { 23705, 0x0000884A }, /* GL_TEXTURE_DEPTH_SIZE */
+   { 23727, 0x0000884B }, /* GL_DEPTH_TEXTURE_MODE */
+   { 23749, 0x0000884C }, /* GL_TEXTURE_COMPARE_MODE */
+   { 23773, 0x0000884D }, /* GL_TEXTURE_COMPARE_FUNC */
+   { 23797, 0x0000884E }, /* GL_COMPARE_REF_TO_TEXTURE */
+   { 23823, 0x0000884F }, /* GL_TEXTURE_CUBE_MAP_SEAMLESS */
+   { 23852, 0x00008861 }, /* GL_POINT_SPRITE */
+   { 23868, 0x00008862 }, /* GL_COORD_REPLACE */
+   { 23885, 0x00008863 }, /* GL_POINT_SPRITE_R_MODE_NV */
+   { 23911, 0x00008864 }, /* GL_QUERY_COUNTER_BITS */
+   { 23933, 0x00008865 }, /* GL_CURRENT_QUERY */
+   { 23950, 0x00008866 }, /* GL_QUERY_RESULT */
+   { 23966, 0x00008867 }, /* GL_QUERY_RESULT_AVAILABLE */
+   { 23992, 0x00008868 }, /* GL_MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV */
+   { 24036, 0x00008869 }, /* GL_MAX_VERTEX_ATTRIBS */
+   { 24058, 0x0000886A }, /* GL_VERTEX_ATTRIB_ARRAY_NORMALIZED */
+   { 24092, 0x0000886E }, /* GL_DEPTH_STENCIL_TO_RGBA_NV */
+   { 24120, 0x0000886F }, /* GL_DEPTH_STENCIL_TO_BGRA_NV */
+   { 24148, 0x00008870 }, /* GL_FRAGMENT_PROGRAM_NV */
+   { 24171, 0x00008871 }, /* GL_MAX_TEXTURE_COORDS */
+   { 24193, 0x00008872 }, /* GL_MAX_TEXTURE_IMAGE_UNITS */
+   { 24220, 0x00008873 }, /* GL_FRAGMENT_PROGRAM_BINDING_NV */
+   { 24251, 0x00008874 }, /* GL_PROGRAM_ERROR_STRING_ARB */
+   { 24279, 0x00008875 }, /* GL_PROGRAM_FORMAT_ASCII_ARB */
+   { 24307, 0x00008876 }, /* GL_PROGRAM_FORMAT_ARB */
+   { 24329, 0x0000887F }, /* GL_GEOMETRY_SHADER_INVOCATIONS */
+   { 24360, 0x0000888F }, /* GL_TEXTURE_UNSIGNED_REMAP_MODE_NV */
+   { 24394, 0x00008890 }, /* GL_DEPTH_BOUNDS_TEST_EXT */
+   { 24419, 0x00008891 }, /* GL_DEPTH_BOUNDS_EXT */
+   { 24439, 0x00008892 }, /* GL_ARRAY_BUFFER */
+   { 24455, 0x00008893 }, /* GL_ELEMENT_ARRAY_BUFFER */
+   { 24479, 0x00008894 }, /* GL_ARRAY_BUFFER_BINDING */
+   { 24503, 0x00008895 }, /* GL_ELEMENT_ARRAY_BUFFER_BINDING */
+   { 24535, 0x00008896 }, /* GL_VERTEX_ARRAY_BUFFER_BINDING */
+   { 24566, 0x00008897 }, /* GL_NORMAL_ARRAY_BUFFER_BINDING */
+   { 24597, 0x00008898 }, /* GL_COLOR_ARRAY_BUFFER_BINDING */
+   { 24627, 0x00008899 }, /* GL_INDEX_ARRAY_BUFFER_BINDING */
+   { 24657, 0x0000889A }, /* GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING */
+   { 24695, 0x0000889B }, /* GL_EDGE_FLAG_ARRAY_BUFFER_BINDING */
+   { 24729, 0x0000889C }, /* GL_SECONDARY_COLOR_ARRAY_BUFFER_BINDING */
+   { 24769, 0x0000889D }, /* GL_FOG_COORDINATE_ARRAY_BUFFER_BINDING */
+   { 24808, 0x0000889E }, /* GL_WEIGHT_ARRAY_BUFFER_BINDING */
+   { 24839, 0x0000889F }, /* GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING */
+   { 24877, 0x000088A0 }, /* GL_PROGRAM_INSTRUCTIONS_ARB */
+   { 24905, 0x000088A1 }, /* GL_MAX_PROGRAM_INSTRUCTIONS_ARB */
+   { 24937, 0x000088A2 }, /* GL_PROGRAM_NATIVE_INSTRUCTIONS_ARB */
+   { 24972, 0x000088A3 }, /* GL_MAX_PROGRAM_NATIVE_INSTRUCTIONS_ARB */
+   { 25011, 0x000088A4 }, /* GL_PROGRAM_TEMPORARIES_ARB */
+   { 25038, 0x000088A5 }, /* GL_MAX_PROGRAM_TEMPORARIES_ARB */
+   { 25069, 0x000088A6 }, /* GL_PROGRAM_NATIVE_TEMPORARIES_ARB */
+   { 25103, 0x000088A7 }, /* GL_MAX_PROGRAM_NATIVE_TEMPORARIES_ARB */
+   { 25141, 0x000088A8 }, /* GL_PROGRAM_PARAMETERS_ARB */
+   { 25167, 0x000088A9 }, /* GL_MAX_PROGRAM_PARAMETERS_ARB */
+   { 25197, 0x000088AA }, /* GL_PROGRAM_NATIVE_PARAMETERS_ARB */
+   { 25230, 0x000088AB }, /* GL_MAX_PROGRAM_NATIVE_PARAMETERS_ARB */
+   { 25267, 0x000088AC }, /* GL_PROGRAM_ATTRIBS_ARB */
+   { 25290, 0x000088AD }, /* GL_MAX_PROGRAM_ATTRIBS_ARB */
+   { 25317, 0x000088AE }, /* GL_PROGRAM_NATIVE_ATTRIBS_ARB */
+   { 25347, 0x000088AF }, /* GL_MAX_PROGRAM_NATIVE_ATTRIBS_ARB */
+   { 25381, 0x000088B0 }, /* GL_PROGRAM_ADDRESS_REGISTERS_ARB */
+   { 25414, 0x000088B1 }, /* GL_MAX_PROGRAM_ADDRESS_REGISTERS_ARB */
+   { 25451, 0x000088B2 }, /* GL_PROGRAM_NATIVE_ADDRESS_REGISTERS_ARB */
+   { 25491, 0x000088B3 }, /* GL_MAX_PROGRAM_NATIVE_ADDRESS_REGISTERS_ARB */
+   { 25535, 0x000088B4 }, /* GL_MAX_PROGRAM_LOCAL_PARAMETERS_ARB */
+   { 25571, 0x000088B5 }, /* GL_MAX_PROGRAM_ENV_PARAMETERS_ARB */
+   { 25605, 0x000088B6 }, /* GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB */
+   { 25640, 0x000088B7 }, /* GL_TRANSPOSE_CURRENT_MATRIX_ARB */
+   { 25672, 0x000088B8 }, /* GL_READ_ONLY */
+   { 25685, 0x000088B9 }, /* GL_WRITE_ONLY */
+   { 25699, 0x000088BA }, /* GL_READ_WRITE */
+   { 25713, 0x000088BB }, /* GL_BUFFER_ACCESS */
+   { 25730, 0x000088BC }, /* GL_BUFFER_MAPPED */
+   { 25747, 0x000088BD }, /* GL_BUFFER_MAP_POINTER */
+   { 25769, 0x000088BE }, /* GL_WRITE_DISCARD_NV */
+   { 25789, 0x000088BF }, /* GL_TIME_ELAPSED */
+   { 25805, 0x000088C0 }, /* GL_MATRIX0_ARB */
+   { 25820, 0x000088C1 }, /* GL_MATRIX1_ARB */
+   { 25835, 0x000088C2 }, /* GL_MATRIX2_ARB */
+   { 25850, 0x000088C3 }, /* GL_MATRIX3_ARB */
+   { 25865, 0x000088C4 }, /* GL_MATRIX4_ARB */
+   { 25880, 0x000088C5 }, /* GL_MATRIX5_ARB */
+   { 25895, 0x000088C6 }, /* GL_MATRIX6_ARB */
+   { 25910, 0x000088C7 }, /* GL_MATRIX7_ARB */
+   { 25925, 0x000088C8 }, /* GL_MATRIX8_ARB */
+   { 25940, 0x000088C9 }, /* GL_MATRIX9_ARB */
+   { 25955, 0x000088CA }, /* GL_MATRIX10_ARB */
+   { 25971, 0x000088CB }, /* GL_MATRIX11_ARB */
+   { 25987, 0x000088CC }, /* GL_MATRIX12_ARB */
+   { 26003, 0x000088CD }, /* GL_MATRIX13_ARB */
+   { 26019, 0x000088CE }, /* GL_MATRIX14_ARB */
+   { 26035, 0x000088CF }, /* GL_MATRIX15_ARB */
+   { 26051, 0x000088D0 }, /* GL_MATRIX16_ARB */
+   { 26067, 0x000088D1 }, /* GL_MATRIX17_ARB */
+   { 26083, 0x000088D2 }, /* GL_MATRIX18_ARB */
+   { 26099, 0x000088D3 }, /* GL_MATRIX19_ARB */
+   { 26115, 0x000088D4 }, /* GL_MATRIX20_ARB */
+   { 26131, 0x000088D5 }, /* GL_MATRIX21_ARB */
+   { 26147, 0x000088D6 }, /* GL_MATRIX22_ARB */
+   { 26163, 0x000088D7 }, /* GL_MATRIX23_ARB */
+   { 26179, 0x000088D8 }, /* GL_MATRIX24_ARB */
+   { 26195, 0x000088D9 }, /* GL_MATRIX25_ARB */
+   { 26211, 0x000088DA }, /* GL_MATRIX26_ARB */
+   { 26227, 0x000088DB }, /* GL_MATRIX27_ARB */
+   { 26243, 0x000088DC }, /* GL_MATRIX28_ARB */
+   { 26259, 0x000088DD }, /* GL_MATRIX29_ARB */
+   { 26275, 0x000088DE }, /* GL_MATRIX30_ARB */
+   { 26291, 0x000088DF }, /* GL_MATRIX31_ARB */
+   { 26307, 0x000088E0 }, /* GL_STREAM_DRAW */
+   { 26322, 0x000088E1 }, /* GL_STREAM_READ */
+   { 26337, 0x000088E2 }, /* GL_STREAM_COPY */
+   { 26352, 0x000088E4 }, /* GL_STATIC_DRAW */
+   { 26367, 0x000088E5 }, /* GL_STATIC_READ */
+   { 26382, 0x000088E6 }, /* GL_STATIC_COPY */
+   { 26397, 0x000088E8 }, /* GL_DYNAMIC_DRAW */
+   { 26413, 0x000088E9 }, /* GL_DYNAMIC_READ */
+   { 26429, 0x000088EA }, /* GL_DYNAMIC_COPY */
+   { 26445, 0x000088EB }, /* GL_PIXEL_PACK_BUFFER */
+   { 26466, 0x000088EC }, /* GL_PIXEL_UNPACK_BUFFER */
+   { 26489, 0x000088ED }, /* GL_PIXEL_PACK_BUFFER_BINDING */
+   { 26518, 0x000088EF }, /* GL_PIXEL_UNPACK_BUFFER_BINDING */
+   { 26549, 0x000088F0 }, /* GL_DEPTH24_STENCIL8 */
+   { 26569, 0x000088F1 }, /* GL_TEXTURE_STENCIL_SIZE */
+   { 26593, 0x000088F4 }, /* GL_MAX_PROGRAM_EXEC_INSTRUCTIONS_NV */
+   { 26629, 0x000088F5 }, /* GL_MAX_PROGRAM_CALL_DEPTH_NV */
+   { 26658, 0x000088F6 }, /* GL_MAX_PROGRAM_IF_DEPTH_NV */
+   { 26685, 0x000088F7 }, /* GL_MAX_PROGRAM_LOOP_DEPTH_NV */
+   { 26714, 0x000088F8 }, /* GL_MAX_PROGRAM_LOOP_COUNT_NV */
+   { 26743, 0x000088F9 }, /* GL_SRC1_COLOR */
+   { 26757, 0x000088FA }, /* GL_ONE_MINUS_SRC1_COLOR */
+   { 26781, 0x000088FB }, /* GL_ONE_MINUS_SRC1_ALPHA */
+   { 26805, 0x000088FC }, /* GL_MAX_DUAL_SOURCE_DRAW_BUFFERS */
+   { 26837, 0x000088FD }, /* GL_VERTEX_ATTRIB_ARRAY_INTEGER */
+   { 26868, 0x000088FE }, /* GL_VERTEX_ATTRIB_ARRAY_DIVISOR_ARB */
+   { 26903, 0x000088FF }, /* GL_MAX_ARRAY_TEXTURE_LAYERS */
+   { 26931, 0x00008904 }, /* GL_MIN_PROGRAM_TEXEL_OFFSET */
+   { 26959, 0x00008905 }, /* GL_MAX_PROGRAM_TEXEL_OFFSET */
+   { 26987, 0x00008910 }, /* GL_STENCIL_TEST_TWO_SIDE_EXT */
+   { 27016, 0x00008911 }, /* GL_ACTIVE_STENCIL_FACE_EXT */
+   { 27043, 0x00008912 }, /* GL_MIRROR_CLAMP_TO_BORDER_EXT */
+   { 27073, 0x00008914 }, /* GL_SAMPLES_PASSED */
+   { 27091, 0x00008916 }, /* GL_GEOMETRY_VERTICES_OUT */
+   { 27116, 0x00008917 }, /* GL_GEOMETRY_INPUT_TYPE */
+   { 27139, 0x00008918 }, /* GL_GEOMETRY_OUTPUT_TYPE */
+   { 27163, 0x00008919 }, /* GL_SAMPLER_BINDING */
+   { 27182, 0x0000891A }, /* GL_CLAMP_VERTEX_COLOR */
+   { 27204, 0x0000891B }, /* GL_CLAMP_FRAGMENT_COLOR */
+   { 27228, 0x0000891C }, /* GL_CLAMP_READ_COLOR */
+   { 27248, 0x0000891D }, /* GL_FIXED_ONLY */
+   { 27262, 0x00008920 }, /* GL_FRAGMENT_SHADER_ATI */
+   { 27285, 0x00008921 }, /* GL_REG_0_ATI */
+   { 27298, 0x00008922 }, /* GL_REG_1_ATI */
+   { 27311, 0x00008923 }, /* GL_REG_2_ATI */
+   { 27324, 0x00008924 }, /* GL_REG_3_ATI */
+   { 27337, 0x00008925 }, /* GL_REG_4_ATI */
+   { 27350, 0x00008926 }, /* GL_REG_5_ATI */
+   { 27363, 0x00008927 }, /* GL_REG_6_ATI */
+   { 27376, 0x00008928 }, /* GL_REG_7_ATI */
+   { 27389, 0x00008929 }, /* GL_REG_8_ATI */
+   { 27402, 0x0000892A }, /* GL_REG_9_ATI */
+   { 27415, 0x0000892B }, /* GL_REG_10_ATI */
+   { 27429, 0x0000892C }, /* GL_REG_11_ATI */
+   { 27443, 0x0000892D }, /* GL_REG_12_ATI */
+   { 27457, 0x0000892E }, /* GL_REG_13_ATI */
+   { 27471, 0x0000892F }, /* GL_REG_14_ATI */
+   { 27485, 0x00008930 }, /* GL_REG_15_ATI */
+   { 27499, 0x00008931 }, /* GL_REG_16_ATI */
+   { 27513, 0x00008932 }, /* GL_REG_17_ATI */
+   { 27527, 0x00008933 }, /* GL_REG_18_ATI */
+   { 27541, 0x00008934 }, /* GL_REG_19_ATI */
+   { 27555, 0x00008935 }, /* GL_REG_20_ATI */
+   { 27569, 0x00008936 }, /* GL_REG_21_ATI */
+   { 27583, 0x00008937 }, /* GL_REG_22_ATI */
+   { 27597, 0x00008938 }, /* GL_REG_23_ATI */
+   { 27611, 0x00008939 }, /* GL_REG_24_ATI */
+   { 27625, 0x0000893A }, /* GL_REG_25_ATI */
+   { 27639, 0x0000893B }, /* GL_REG_26_ATI */
+   { 27653, 0x0000893C }, /* GL_REG_27_ATI */
+   { 27667, 0x0000893D }, /* GL_REG_28_ATI */
+   { 27681, 0x0000893E }, /* GL_REG_29_ATI */
+   { 27695, 0x0000893F }, /* GL_REG_30_ATI */
+   { 27709, 0x00008940 }, /* GL_REG_31_ATI */
+   { 27723, 0x00008941 }, /* GL_CON_0_ATI */
+   { 27736, 0x00008942 }, /* GL_CON_1_ATI */
+   { 27749, 0x00008943 }, /* GL_CON_2_ATI */
+   { 27762, 0x00008944 }, /* GL_CON_3_ATI */
+   { 27775, 0x00008945 }, /* GL_CON_4_ATI */
+   { 27788, 0x00008946 }, /* GL_CON_5_ATI */
+   { 27801, 0x00008947 }, /* GL_CON_6_ATI */
+   { 27814, 0x00008948 }, /* GL_CON_7_ATI */
+   { 27827, 0x00008949 }, /* GL_CON_8_ATI */
+   { 27840, 0x0000894A }, /* GL_CON_9_ATI */
+   { 27853, 0x0000894B }, /* GL_CON_10_ATI */
+   { 27867, 0x0000894C }, /* GL_CON_11_ATI */
+   { 27881, 0x0000894D }, /* GL_CON_12_ATI */
+   { 27895, 0x0000894E }, /* GL_CON_13_ATI */
+   { 27909, 0x0000894F }, /* GL_CON_14_ATI */
+   { 27923, 0x00008950 }, /* GL_CON_15_ATI */
+   { 27937, 0x00008951 }, /* GL_CON_16_ATI */
+   { 27951, 0x00008952 }, /* GL_CON_17_ATI */
+   { 27965, 0x00008953 }, /* GL_CON_18_ATI */
+   { 27979, 0x00008954 }, /* GL_CON_19_ATI */
+   { 27993, 0x00008955 }, /* GL_CON_20_ATI */
+   { 28007, 0x00008956 }, /* GL_CON_21_ATI */
+   { 28021, 0x00008957 }, /* GL_CON_22_ATI */
+   { 28035, 0x00008958 }, /* GL_CON_23_ATI */
+   { 28049, 0x00008959 }, /* GL_CON_24_ATI */
+   { 28063, 0x0000895A }, /* GL_CON_25_ATI */
+   { 28077, 0x0000895B }, /* GL_CON_26_ATI */
+   { 28091, 0x0000895C }, /* GL_CON_27_ATI */
+   { 28105, 0x0000895D }, /* GL_CON_28_ATI */
+   { 28119, 0x0000895E }, /* GL_CON_29_ATI */
+   { 28133, 0x0000895F }, /* GL_CON_30_ATI */
+   { 28147, 0x00008960 }, /* GL_CON_31_ATI */
+   { 28161, 0x00008961 }, /* GL_MOV_ATI */
+   { 28172, 0x00008963 }, /* GL_ADD_ATI */
+   { 28183, 0x00008964 }, /* GL_MUL_ATI */
+   { 28194, 0x00008965 }, /* GL_SUB_ATI */
+   { 28205, 0x00008966 }, /* GL_DOT3_ATI */
+   { 28217, 0x00008967 }, /* GL_DOT4_ATI */
+   { 28229, 0x00008968 }, /* GL_MAD_ATI */
+   { 28240, 0x00008969 }, /* GL_LERP_ATI */
+   { 28252, 0x0000896A }, /* GL_CND_ATI */
+   { 28263, 0x0000896B }, /* GL_CND0_ATI */
+   { 28275, 0x0000896C }, /* GL_DOT2_ADD_ATI */
+   { 28291, 0x0000896D }, /* GL_SECONDARY_INTERPOLATOR_ATI */
+   { 28321, 0x0000896E }, /* GL_NUM_FRAGMENT_REGISTERS_ATI */
+   { 28351, 0x0000896F }, /* GL_NUM_FRAGMENT_CONSTANTS_ATI */
+   { 28381, 0x00008970 }, /* GL_NUM_PASSES_ATI */
+   { 28399, 0x00008971 }, /* GL_NUM_INSTRUCTIONS_PER_PASS_ATI */
+   { 28432, 0x00008972 }, /* GL_NUM_INSTRUCTIONS_TOTAL_ATI */
+   { 28462, 0x00008973 }, /* GL_NUM_INPUT_INTERPOLATOR_COMPONENTS_ATI */
+   { 28503, 0x00008974 }, /* GL_NUM_LOOPBACK_COMPONENTS_ATI */
+   { 28534, 0x00008975 }, /* GL_COLOR_ALPHA_PAIRING_ATI */
+   { 28561, 0x00008976 }, /* GL_SWIZZLE_STR_ATI */
+   { 28580, 0x00008977 }, /* GL_SWIZZLE_STQ_ATI */
+   { 28599, 0x00008978 }, /* GL_SWIZZLE_STR_DR_ATI */
+   { 28621, 0x00008979 }, /* GL_SWIZZLE_STQ_DQ_ATI */
+   { 28643, 0x0000897A }, /* GL_SWIZZLE_STRQ_ATI */
+   { 28663, 0x0000897B }, /* GL_SWIZZLE_STRQ_DQ_ATI */
+   { 28686, 0x0000898A }, /* GL_POINT_SIZE_ARRAY_TYPE_OES */
+   { 28715, 0x0000898B }, /* GL_POINT_SIZE_ARRAY_STRIDE_OES */
+   { 28746, 0x0000898C }, /* GL_POINT_SIZE_ARRAY_POINTER_OES */
+   { 28778, 0x0000898D }, /* GL_MODELVIEW_MATRIX_FLOAT_AS_INT_BITS_OES */
+   { 28820, 0x0000898E }, /* GL_PROJECTION_MATRIX_FLOAT_AS_INT_BITS_OES */
+   { 28863, 0x0000898F }, /* GL_TEXTURE_MATRIX_FLOAT_AS_INT_BITS_OES */
+   { 28903, 0x00008A11 }, /* GL_UNIFORM_BUFFER */
+   { 28921, 0x00008A12 }, /* GL_BUFFER_SERIALIZED_MODIFY_APPLE */
+   { 28955, 0x00008A13 }, /* GL_BUFFER_FLUSHING_UNMAP_APPLE */
+   { 28986, 0x00008A19 }, /* GL_RELEASED_APPLE */
+   { 29004, 0x00008A1A }, /* GL_VOLATILE_APPLE */
+   { 29022, 0x00008A1B }, /* GL_RETAINED_APPLE */
+   { 29040, 0x00008A1C }, /* GL_UNDEFINED_APPLE */
+   { 29059, 0x00008A1D }, /* GL_PURGEABLE_APPLE */
+   { 29078, 0x00008A28 }, /* GL_UNIFORM_BUFFER_BINDING */
+   { 29104, 0x00008A29 }, /* GL_UNIFORM_BUFFER_START */
+   { 29128, 0x00008A2A }, /* GL_UNIFORM_BUFFER_SIZE */
+   { 29151, 0x00008A2B }, /* GL_MAX_VERTEX_UNIFORM_BLOCKS */
+   { 29180, 0x00008A2C }, /* GL_MAX_GEOMETRY_UNIFORM_BLOCKS */
+   { 29211, 0x00008A2D }, /* GL_MAX_FRAGMENT_UNIFORM_BLOCKS */
+   { 29242, 0x00008A2E }, /* GL_MAX_COMBINED_UNIFORM_BLOCKS */
+   { 29273, 0x00008A2F }, /* GL_MAX_UNIFORM_BUFFER_BINDINGS */
+   { 29304, 0x00008A30 }, /* GL_MAX_UNIFORM_BLOCK_SIZE */
+   { 29330, 0x00008A31 }, /* GL_MAX_COMBINED_VERTEX_UNIFORM_COMPONENTS */
+   { 29372, 0x00008A32 }, /* GL_MAX_COMBINED_GEOMETRY_UNIFORM_COMPONENTS */
+   { 29416, 0x00008A33 }, /* GL_MAX_COMBINED_FRAGMENT_UNIFORM_COMPONENTS */
+   { 29460, 0x00008A34 }, /* GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT */
+   { 29495, 0x00008A35 }, /* GL_ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH */
+   { 29535, 0x00008A36 }, /* GL_ACTIVE_UNIFORM_BLOCKS */
+   { 29560, 0x00008A37 }, /* GL_UNIFORM_TYPE */
+   { 29576, 0x00008A38 }, /* GL_UNIFORM_SIZE */
+   { 29592, 0x00008A39 }, /* GL_UNIFORM_NAME_LENGTH */
+   { 29615, 0x00008A3A }, /* GL_UNIFORM_BLOCK_INDEX */
+   { 29638, 0x00008A3B }, /* GL_UNIFORM_OFFSET */
+   { 29656, 0x00008A3C }, /* GL_UNIFORM_ARRAY_STRIDE */
+   { 29680, 0x00008A3D }, /* GL_UNIFORM_MATRIX_STRIDE */
+   { 29705, 0x00008A3E }, /* GL_UNIFORM_IS_ROW_MAJOR */
+   { 29729, 0x00008A3F }, /* GL_UNIFORM_BLOCK_BINDING */
+   { 29754, 0x00008A40 }, /* GL_UNIFORM_BLOCK_DATA_SIZE */
+   { 29781, 0x00008A41 }, /* GL_UNIFORM_BLOCK_NAME_LENGTH */
+   { 29810, 0x00008A42 }, /* GL_UNIFORM_BLOCK_ACTIVE_UNIFORMS */
+   { 29843, 0x00008A43 }, /* GL_UNIFORM_BLOCK_ACTIVE_UNIFORM_INDICES */
+   { 29883, 0x00008A44 }, /* GL_UNIFORM_BLOCK_REFERENCED_BY_VERTEX_SHADER */
+   { 29928, 0x00008A45 }, /* GL_UNIFORM_BLOCK_REFERENCED_BY_GEOMETRY_SHADER */
+   { 29975, 0x00008A46 }, /* GL_UNIFORM_BLOCK_REFERENCED_BY_FRAGMENT_SHADER */
+   { 30022, 0x00008A48 }, /* GL_TEXTURE_SRGB_DECODE_EXT */
+   { 30049, 0x00008A49 }, /* GL_DECODE_EXT */
+   { 30063, 0x00008A4A }, /* GL_SKIP_DECODE_EXT */
+   { 30082, 0x00008B30 }, /* GL_FRAGMENT_SHADER */
+   { 30101, 0x00008B31 }, /* GL_VERTEX_SHADER */
+   { 30118, 0x00008B40 }, /* GL_PROGRAM_OBJECT_ARB */
+   { 30140, 0x00008B48 }, /* GL_SHADER_OBJECT_ARB */
+   { 30161, 0x00008B49 }, /* GL_MAX_FRAGMENT_UNIFORM_COMPONENTS */
+   { 30196, 0x00008B4A }, /* GL_MAX_VERTEX_UNIFORM_COMPONENTS */
+   { 30229, 0x00008B4B }, /* GL_MAX_VARYING_COMPONENTS */
+   { 30255, 0x00008B4C }, /* GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS */
+   { 30289, 0x00008B4D }, /* GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS */
+   { 30325, 0x00008B4E }, /* GL_OBJECT_TYPE_ARB */
+   { 30344, 0x00008B4F }, /* GL_SHADER_TYPE */
+   { 30359, 0x00008B50 }, /* GL_FLOAT_VEC2 */
+   { 30373, 0x00008B51 }, /* GL_FLOAT_VEC3 */
+   { 30387, 0x00008B52 }, /* GL_FLOAT_VEC4 */
+   { 30401, 0x00008B53 }, /* GL_INT_VEC2 */
+   { 30413, 0x00008B54 }, /* GL_INT_VEC3 */
+   { 30425, 0x00008B55 }, /* GL_INT_VEC4 */
+   { 30437, 0x00008B56 }, /* GL_BOOL */
+   { 30445, 0x00008B57 }, /* GL_BOOL_VEC2 */
+   { 30458, 0x00008B58 }, /* GL_BOOL_VEC3 */
+   { 30471, 0x00008B59 }, /* GL_BOOL_VEC4 */
+   { 30484, 0x00008B5A }, /* GL_FLOAT_MAT2 */
+   { 30498, 0x00008B5B }, /* GL_FLOAT_MAT3 */
+   { 30512, 0x00008B5C }, /* GL_FLOAT_MAT4 */
+   { 30526, 0x00008B5D }, /* GL_SAMPLER_1D */
+   { 30540, 0x00008B5E }, /* GL_SAMPLER_2D */
+   { 30554, 0x00008B5F }, /* GL_SAMPLER_3D */
+   { 30568, 0x00008B60 }, /* GL_SAMPLER_CUBE */
+   { 30584, 0x00008B61 }, /* GL_SAMPLER_1D_SHADOW */
+   { 30605, 0x00008B62 }, /* GL_SAMPLER_2D_SHADOW */
+   { 30626, 0x00008B63 }, /* GL_SAMPLER_2D_RECT */
+   { 30645, 0x00008B64 }, /* GL_SAMPLER_2D_RECT_SHADOW */
+   { 30671, 0x00008B65 }, /* GL_FLOAT_MAT2x3 */
+   { 30687, 0x00008B66 }, /* GL_FLOAT_MAT2x4 */
+   { 30703, 0x00008B67 }, /* GL_FLOAT_MAT3x2 */
+   { 30719, 0x00008B68 }, /* GL_FLOAT_MAT3x4 */
+   { 30735, 0x00008B69 }, /* GL_FLOAT_MAT4x2 */
+   { 30751, 0x00008B6A }, /* GL_FLOAT_MAT4x3 */
+   { 30767, 0x00008B80 }, /* GL_DELETE_STATUS */
+   { 30784, 0x00008B81 }, /* GL_COMPILE_STATUS */
+   { 30802, 0x00008B82 }, /* GL_LINK_STATUS */
+   { 30817, 0x00008B83 }, /* GL_VALIDATE_STATUS */
+   { 30836, 0x00008B84 }, /* GL_INFO_LOG_LENGTH */
+   { 30855, 0x00008B85 }, /* GL_ATTACHED_SHADERS */
+   { 30875, 0x00008B86 }, /* GL_ACTIVE_UNIFORMS */
+   { 30894, 0x00008B87 }, /* GL_ACTIVE_UNIFORM_MAX_LENGTH */
+   { 30923, 0x00008B88 }, /* GL_SHADER_SOURCE_LENGTH */
+   { 30947, 0x00008B89 }, /* GL_ACTIVE_ATTRIBUTES */
+   { 30968, 0x00008B8A }, /* GL_ACTIVE_ATTRIBUTE_MAX_LENGTH */
+   { 30999, 0x00008B8B }, /* GL_FRAGMENT_SHADER_DERIVATIVE_HINT */
+   { 31034, 0x00008B8C }, /* GL_SHADING_LANGUAGE_VERSION */
+   { 31062, 0x00008B8D }, /* GL_CURRENT_PROGRAM */
+   { 31081, 0x00008B90 }, /* GL_PALETTE4_RGB8_OES */
+   { 31102, 0x00008B91 }, /* GL_PALETTE4_RGBA8_OES */
+   { 31124, 0x00008B92 }, /* GL_PALETTE4_R5_G6_B5_OES */
+   { 31149, 0x00008B93 }, /* GL_PALETTE4_RGBA4_OES */
+   { 31171, 0x00008B94 }, /* GL_PALETTE4_RGB5_A1_OES */
+   { 31195, 0x00008B95 }, /* GL_PALETTE8_RGB8_OES */
+   { 31216, 0x00008B96 }, /* GL_PALETTE8_RGBA8_OES */
+   { 31238, 0x00008B97 }, /* GL_PALETTE8_R5_G6_B5_OES */
+   { 31263, 0x00008B98 }, /* GL_PALETTE8_RGBA4_OES */
+   { 31285, 0x00008B99 }, /* GL_PALETTE8_RGB5_A1_OES */
+   { 31309, 0x00008B9A }, /* GL_IMPLEMENTATION_COLOR_READ_TYPE */
+   { 31343, 0x00008B9B }, /* GL_IMPLEMENTATION_COLOR_READ_FORMAT */
+   { 31379, 0x00008B9C }, /* GL_POINT_SIZE_ARRAY_OES */
+   { 31403, 0x00008B9D }, /* GL_TEXTURE_CROP_RECT_OES */
+   { 31428, 0x00008B9E }, /* GL_MATRIX_INDEX_ARRAY_BUFFER_BINDING_OES */
+   { 31469, 0x00008B9F }, /* GL_POINT_SIZE_ARRAY_BUFFER_BINDING_OES */
+   { 31508, 0x00008BC0 }, /* GL_COUNTER_TYPE_AMD */
+   { 31528, 0x00008BC1 }, /* GL_COUNTER_RANGE_AMD */
+   { 31549, 0x00008BC2 }, /* GL_UNSIGNED_INT64_AMD */
+   { 31571, 0x00008BC3 }, /* GL_PECENTAGE_AMD */
+   { 31588, 0x00008BC4 }, /* GL_PERFMON_RESULT_AVAILABLE_AMD */
+   { 31620, 0x00008BC5 }, /* GL_PERFMON_RESULT_SIZE_AMD */
+   { 31647, 0x00008BC6 }, /* GL_PERFMON_RESULT_AMD */
+   { 31669, 0x00008C10 }, /* GL_TEXTURE_RED_TYPE */
+   { 31689, 0x00008C11 }, /* GL_TEXTURE_GREEN_TYPE */
+   { 31711, 0x00008C12 }, /* GL_TEXTURE_BLUE_TYPE */
+   { 31732, 0x00008C13 }, /* GL_TEXTURE_ALPHA_TYPE */
+   { 31754, 0x00008C14 }, /* GL_TEXTURE_LUMINANCE_TYPE */
+   { 31780, 0x00008C15 }, /* GL_TEXTURE_INTENSITY_TYPE */
+   { 31806, 0x00008C16 }, /* GL_TEXTURE_DEPTH_TYPE */
+   { 31828, 0x00008C17 }, /* GL_UNSIGNED_NORMALIZED */
+   { 31851, 0x00008C18 }, /* GL_TEXTURE_1D_ARRAY */
+   { 31871, 0x00008C19 }, /* GL_PROXY_TEXTURE_1D_ARRAY */
+   { 31897, 0x00008C1A }, /* GL_TEXTURE_2D_ARRAY */
+   { 31917, 0x00008C1B }, /* GL_PROXY_TEXTURE_2D_ARRAY */
+   { 31943, 0x00008C1C }, /* GL_TEXTURE_BINDING_1D_ARRAY */
+   { 31971, 0x00008C1D }, /* GL_TEXTURE_BINDING_2D_ARRAY */
+   { 31999, 0x00008C29 }, /* GL_MAX_GEOMETRY_TEXTURE_IMAGE_UNITS */
+   { 32035, 0x00008C2A }, /* GL_TEXTURE_BUFFER */
+   { 32053, 0x00008C2B }, /* GL_MAX_TEXTURE_BUFFER_SIZE */
+   { 32080, 0x00008C2C }, /* GL_TEXTURE_BINDING_BUFFER */
+   { 32106, 0x00008C2D }, /* GL_TEXTURE_BUFFER_DATA_STORE_BINDING */
+   { 32143, 0x00008C2E }, /* GL_TEXTURE_BUFFER_FORMAT */
+   { 32168, 0x00008C2F }, /* GL_ANY_SAMPLES_PASSED */
+   { 32190, 0x00008C36 }, /* GL_SAMPLE_SHADING */
+   { 32208, 0x00008C37 }, /* GL_MIN_SAMPLE_SHADING_VALUE */
+   { 32236, 0x00008C3A }, /* GL_R11F_G11F_B10F */
+   { 32254, 0x00008C3B }, /* GL_UNSIGNED_INT_10F_11F_11F_REV */
+   { 32286, 0x00008C3C }, /* GL_RGBA_SIGNED_COMPONENTS_EXT */
+   { 32316, 0x00008C3D }, /* GL_RGB9_E5 */
+   { 32327, 0x00008C3E }, /* GL_UNSIGNED_INT_5_9_9_9_REV */
+   { 32355, 0x00008C3F }, /* GL_TEXTURE_SHARED_SIZE */
+   { 32378, 0x00008C40 }, /* GL_SRGB */
+   { 32386, 0x00008C41 }, /* GL_SRGB8 */
+   { 32395, 0x00008C42 }, /* GL_SRGB_ALPHA */
+   { 32409, 0x00008C43 }, /* GL_SRGB8_ALPHA8 */
+   { 32425, 0x00008C44 }, /* GL_SLUMINANCE_ALPHA */
+   { 32445, 0x00008C45 }, /* GL_SLUMINANCE8_ALPHA8 */
+   { 32467, 0x00008C46 }, /* GL_SLUMINANCE */
+   { 32481, 0x00008C47 }, /* GL_SLUMINANCE8 */
+   { 32496, 0x00008C48 }, /* GL_COMPRESSED_SRGB */
+   { 32515, 0x00008C49 }, /* GL_COMPRESSED_SRGB_ALPHA */
+   { 32540, 0x00008C4A }, /* GL_COMPRESSED_SLUMINANCE */
+   { 32565, 0x00008C4B }, /* GL_COMPRESSED_SLUMINANCE_ALPHA */
+   { 32596, 0x00008C76 }, /* GL_TRANSFORM_FEEDBACK_VARYING_MAX_LENGTH */
+   { 32637, 0x00008C7F }, /* GL_TRANSFORM_FEEDBACK_BUFFER_MODE */
+   { 32671, 0x00008C80 }, /* GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS */
+   { 32717, 0x00008C83 }, /* GL_TRANSFORM_FEEDBACK_VARYINGS */
+   { 32748, 0x00008C84 }, /* GL_TRANSFORM_FEEDBACK_BUFFER_START */
+   { 32783, 0x00008C85 }, /* GL_TRANSFORM_FEEDBACK_BUFFER_SIZE */
+   { 32817, 0x00008C87 }, /* GL_PRIMITIVES_GENERATED */
+   { 32841, 0x00008C88 }, /* GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN */
+   { 32882, 0x00008C89 }, /* GL_RASTERIZER_DISCARD */
+   { 32904, 0x00008C8A }, /* GL_MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS */
+   { 32953, 0x00008C8B }, /* GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS */
+   { 32996, 0x00008C8C }, /* GL_INTERLEAVED_ATTRIBS */
+   { 33019, 0x00008C8D }, /* GL_SEPARATE_ATTRIBS */
+   { 33039, 0x00008C8E }, /* GL_TRANSFORM_FEEDBACK_BUFFER */
+   { 33068, 0x00008C8F }, /* GL_TRANSFORM_FEEDBACK_BUFFER_BINDING */
+   { 33105, 0x00008CA0 }, /* GL_POINT_SPRITE_COORD_ORIGIN */
+   { 33134, 0x00008CA1 }, /* GL_LOWER_LEFT */
+   { 33148, 0x00008CA2 }, /* GL_UPPER_LEFT */
+   { 33162, 0x00008CA3 }, /* GL_STENCIL_BACK_REF */
+   { 33182, 0x00008CA4 }, /* GL_STENCIL_BACK_VALUE_MASK */
+   { 33209, 0x00008CA5 }, /* GL_STENCIL_BACK_WRITEMASK */
+   { 33235, 0x00008CA6 }, /* GL_DRAW_FRAMEBUFFER_BINDING */
+   { 33263, 0x00008CA7 }, /* GL_RENDERBUFFER_BINDING */
+   { 33287, 0x00008CA8 }, /* GL_READ_FRAMEBUFFER */
+   { 33307, 0x00008CA9 }, /* GL_DRAW_FRAMEBUFFER */
+   { 33327, 0x00008CAA }, /* GL_READ_FRAMEBUFFER_BINDING */
+   { 33355, 0x00008CAB }, /* GL_RENDERBUFFER_SAMPLES */
+   { 33379, 0x00008CAC }, /* GL_DEPTH_COMPONENT32F */
+   { 33401, 0x00008CAD }, /* GL_DEPTH32F_STENCIL8 */
+   { 33422, 0x00008CD0 }, /* GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE */
+   { 33460, 0x00008CD1 }, /* GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME */
+   { 33498, 0x00008CD2 }, /* GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LEVEL */
+   { 33538, 0x00008CD3 }, /* GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_CUBE_MAP_FACE */
+   { 33586, 0x00008CD4 }, /* GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER */
+   { 33626, 0x00008CD5 }, /* GL_FRAMEBUFFER_COMPLETE */
+   { 33650, 0x00008CD6 }, /* GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT */
+   { 33687, 0x00008CD7 }, /* GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT */
+   { 33732, 0x00008CD8 }, /* GL_FRAMEBUFFER_INCOMPLETE_DUPLICATE_ATTACHMENT_EXT */
+   { 33783, 0x00008CD9 }, /* GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS_EXT */
+   { 33824, 0x00008CDA }, /* GL_FRAMEBUFFER_INCOMPLETE_FORMATS_EXT */
+   { 33862, 0x00008CDB }, /* GL_FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER */
+   { 33900, 0x00008CDC }, /* GL_FRAMEBUFFER_INCOMPLETE_READ_BUFFER */
+   { 33938, 0x00008CDD }, /* GL_FRAMEBUFFER_UNSUPPORTED */
+   { 33965, 0x00008CDE }, /* GL_FRAMEBUFFER_STATUS_ERROR_EXT */
+   { 33997, 0x00008CDF }, /* GL_MAX_COLOR_ATTACHMENTS */
+   { 34022, 0x00008CE0 }, /* GL_COLOR_ATTACHMENT0 */
+   { 34043, 0x00008CE1 }, /* GL_COLOR_ATTACHMENT1 */
+   { 34064, 0x00008CE2 }, /* GL_COLOR_ATTACHMENT2 */
+   { 34085, 0x00008CE3 }, /* GL_COLOR_ATTACHMENT3 */
+   { 34106, 0x00008CE4 }, /* GL_COLOR_ATTACHMENT4 */
+   { 34127, 0x00008CE5 }, /* GL_COLOR_ATTACHMENT5 */
+   { 34148, 0x00008CE6 }, /* GL_COLOR_ATTACHMENT6 */
+   { 34169, 0x00008CE7 }, /* GL_COLOR_ATTACHMENT7 */
+   { 34190, 0x00008CE8 }, /* GL_COLOR_ATTACHMENT8 */
+   { 34211, 0x00008CE9 }, /* GL_COLOR_ATTACHMENT9 */
+   { 34232, 0x00008CEA }, /* GL_COLOR_ATTACHMENT10 */
+   { 34254, 0x00008CEB }, /* GL_COLOR_ATTACHMENT11 */
+   { 34276, 0x00008CEC }, /* GL_COLOR_ATTACHMENT12 */
+   { 34298, 0x00008CED }, /* GL_COLOR_ATTACHMENT13 */
+   { 34320, 0x00008CEE }, /* GL_COLOR_ATTACHMENT14 */
+   { 34342, 0x00008CEF }, /* GL_COLOR_ATTACHMENT15 */
+   { 34364, 0x00008D00 }, /* GL_DEPTH_ATTACHMENT */
+   { 34384, 0x00008D20 }, /* GL_STENCIL_ATTACHMENT */
+   { 34406, 0x00008D40 }, /* GL_FRAMEBUFFER */
+   { 34421, 0x00008D41 }, /* GL_RENDERBUFFER */
+   { 34437, 0x00008D42 }, /* GL_RENDERBUFFER_WIDTH */
+   { 34459, 0x00008D43 }, /* GL_RENDERBUFFER_HEIGHT */
+   { 34482, 0x00008D44 }, /* GL_RENDERBUFFER_INTERNAL_FORMAT */
+   { 34514, 0x00008D45 }, /* GL_STENCIL_INDEX_EXT */
+   { 34535, 0x00008D46 }, /* GL_STENCIL_INDEX1 */
+   { 34553, 0x00008D47 }, /* GL_STENCIL_INDEX4 */
+   { 34571, 0x00008D48 }, /* GL_STENCIL_INDEX8 */
+   { 34589, 0x00008D49 }, /* GL_STENCIL_INDEX16 */
+   { 34608, 0x00008D50 }, /* GL_RENDERBUFFER_RED_SIZE */
+   { 34633, 0x00008D51 }, /* GL_RENDERBUFFER_GREEN_SIZE */
+   { 34660, 0x00008D52 }, /* GL_RENDERBUFFER_BLUE_SIZE */
+   { 34686, 0x00008D53 }, /* GL_RENDERBUFFER_ALPHA_SIZE */
+   { 34713, 0x00008D54 }, /* GL_RENDERBUFFER_DEPTH_SIZE */
+   { 34740, 0x00008D55 }, /* GL_RENDERBUFFER_STENCIL_SIZE */
+   { 34769, 0x00008D56 }, /* GL_FRAMEBUFFER_INCOMPLETE_MULTISAMPLE */
+   { 34807, 0x00008D57 }, /* GL_MAX_SAMPLES */
+   { 34822, 0x00008D60 }, /* GL_TEXTURE_GEN_STR_OES */
+   { 34845, 0x00008D61 }, /* GL_HALF_FLOAT_OES */
+   { 34863, 0x00008D62 }, /* GL_RGB565 */
+   { 34873, 0x00008D64 }, /* GL_ETC1_RGB8_OES */
+   { 34890, 0x00008D65 }, /* GL_TEXTURE_EXTERNAL_OES */
+   { 34914, 0x00008D66 }, /* GL_SAMPLER_EXTERNAL_OES */
+   { 34938, 0x00008D67 }, /* GL_TEXTURE_BINDING_EXTERNAL_OES */
+   { 34970, 0x00008D68 }, /* GL_REQUIRED_TEXTURE_IMAGE_UNITS_OES */
+   { 35006, 0x00008D69 }, /* GL_PRIMITIVE_RESTART_FIXED_INDEX */
+   { 35039, 0x00008D6A }, /* GL_ANY_SAMPLES_PASSED_CONSERVATIVE */
+   { 35074, 0x00008D6B }, /* GL_MAX_ELEMENT_INDEX */
+   { 35095, 0x00008D70 }, /* GL_RGBA32UI */
+   { 35107, 0x00008D71 }, /* GL_RGB32UI */
+   { 35118, 0x00008D72 }, /* GL_ALPHA32UI_EXT */
+   { 35135, 0x00008D73 }, /* GL_INTENSITY32UI_EXT */
+   { 35156, 0x00008D74 }, /* GL_LUMINANCE32UI_EXT */
+   { 35177, 0x00008D75 }, /* GL_LUMINANCE_ALPHA32UI_EXT */
+   { 35204, 0x00008D76 }, /* GL_RGBA16UI */
+   { 35216, 0x00008D77 }, /* GL_RGB16UI */
+   { 35227, 0x00008D78 }, /* GL_ALPHA16UI_EXT */
+   { 35244, 0x00008D79 }, /* GL_INTENSITY16UI_EXT */
+   { 35265, 0x00008D7A }, /* GL_LUMINANCE16UI_EXT */
+   { 35286, 0x00008D7B }, /* GL_LUMINANCE_ALPHA16UI_EXT */
+   { 35313, 0x00008D7C }, /* GL_RGBA8UI */
+   { 35324, 0x00008D7D }, /* GL_RGB8UI */
+   { 35334, 0x00008D7E }, /* GL_ALPHA8UI_EXT */
+   { 35350, 0x00008D7F }, /* GL_INTENSITY8UI_EXT */
+   { 35370, 0x00008D80 }, /* GL_LUMINANCE8UI_EXT */
+   { 35390, 0x00008D81 }, /* GL_LUMINANCE_ALPHA8UI_EXT */
+   { 35416, 0x00008D82 }, /* GL_RGBA32I */
+   { 35427, 0x00008D83 }, /* GL_RGB32I */
+   { 35437, 0x00008D84 }, /* GL_ALPHA32I_EXT */
+   { 35453, 0x00008D85 }, /* GL_INTENSITY32I_EXT */
+   { 35473, 0x00008D86 }, /* GL_LUMINANCE32I_EXT */
+   { 35493, 0x00008D87 }, /* GL_LUMINANCE_ALPHA32I_EXT */
+   { 35519, 0x00008D88 }, /* GL_RGBA16I */
+   { 35530, 0x00008D89 }, /* GL_RGB16I */
+   { 35540, 0x00008D8A }, /* GL_ALPHA16I_EXT */
+   { 35556, 0x00008D8B }, /* GL_INTENSITY16I_EXT */
+   { 35576, 0x00008D8C }, /* GL_LUMINANCE16I_EXT */
+   { 35596, 0x00008D8D }, /* GL_LUMINANCE_ALPHA16I_EXT */
+   { 35622, 0x00008D8E }, /* GL_RGBA8I */
+   { 35632, 0x00008D8F }, /* GL_RGB8I */
+   { 35641, 0x00008D90 }, /* GL_ALPHA8I_EXT */
+   { 35656, 0x00008D91 }, /* GL_INTENSITY8I_EXT */
+   { 35675, 0x00008D92 }, /* GL_LUMINANCE8I_EXT */
+   { 35694, 0x00008D93 }, /* GL_LUMINANCE_ALPHA8I_EXT */
+   { 35719, 0x00008D94 }, /* GL_RED_INTEGER */
+   { 35734, 0x00008D95 }, /* GL_GREEN_INTEGER */
+   { 35751, 0x00008D96 }, /* GL_BLUE_INTEGER */
+   { 35767, 0x00008D97 }, /* GL_ALPHA_INTEGER_EXT */
+   { 35788, 0x00008D98 }, /* GL_RGB_INTEGER */
+   { 35803, 0x00008D99 }, /* GL_RGBA_INTEGER */
+   { 35819, 0x00008D9A }, /* GL_BGR_INTEGER */
+   { 35834, 0x00008D9B }, /* GL_BGRA_INTEGER */
+   { 35850, 0x00008D9C }, /* GL_LUMINANCE_INTEGER_EXT */
+   { 35875, 0x00008D9D }, /* GL_LUMINANCE_ALPHA_INTEGER_EXT */
+   { 35906, 0x00008D9E }, /* GL_RGBA_INTEGER_MODE_EXT */
+   { 35931, 0x00008D9F }, /* GL_INT_2_10_10_10_REV */
+   { 35953, 0x00008DA7 }, /* GL_FRAMEBUFFER_ATTACHMENT_LAYERED */
+   { 35987, 0x00008DA8 }, /* GL_FRAMEBUFFER_INCOMPLETE_LAYER_TARGETS */
+   { 36027, 0x00008DA9 }, /* GL_FRAMEBUFFER_INCOMPLETE_LAYER_COUNT_ARB */
+   { 36069, 0x00008DAD }, /* GL_FLOAT_32_UNSIGNED_INT_24_8_REV */
+   { 36103, 0x00008DB9 }, /* GL_FRAMEBUFFER_SRGB */
+   { 36123, 0x00008DBA }, /* GL_FRAMEBUFFER_SRGB_CAPABLE_EXT */
+   { 36155, 0x00008DBB }, /* GL_COMPRESSED_RED_RGTC1 */
+   { 36179, 0x00008DBC }, /* GL_COMPRESSED_SIGNED_RED_RGTC1 */
+   { 36210, 0x00008DBD }, /* GL_COMPRESSED_RG_RGTC2 */
+   { 36233, 0x00008DBE }, /* GL_COMPRESSED_SIGNED_RG_RGTC2 */
+   { 36263, 0x00008DC0 }, /* GL_SAMPLER_1D_ARRAY */
+   { 36283, 0x00008DC1 }, /* GL_SAMPLER_2D_ARRAY */
+   { 36303, 0x00008DC2 }, /* GL_SAMPLER_BUFFER */
+   { 36321, 0x00008DC3 }, /* GL_SAMPLER_1D_ARRAY_SHADOW */
+   { 36348, 0x00008DC4 }, /* GL_SAMPLER_2D_ARRAY_SHADOW */
+   { 36375, 0x00008DC5 }, /* GL_SAMPLER_CUBE_SHADOW */
+   { 36398, 0x00008DC6 }, /* GL_UNSIGNED_INT_VEC2 */
+   { 36419, 0x00008DC7 }, /* GL_UNSIGNED_INT_VEC3 */
+   { 36440, 0x00008DC8 }, /* GL_UNSIGNED_INT_VEC4 */
+   { 36461, 0x00008DC9 }, /* GL_INT_SAMPLER_1D */
+   { 36479, 0x00008DCA }, /* GL_INT_SAMPLER_2D */
+   { 36497, 0x00008DCB }, /* GL_INT_SAMPLER_3D */
+   { 36515, 0x00008DCC }, /* GL_INT_SAMPLER_CUBE */
+   { 36535, 0x00008DCD }, /* GL_INT_SAMPLER_2D_RECT */
+   { 36558, 0x00008DCE }, /* GL_INT_SAMPLER_1D_ARRAY */
+   { 36582, 0x00008DCF }, /* GL_INT_SAMPLER_2D_ARRAY */
+   { 36606, 0x00008DD0 }, /* GL_INT_SAMPLER_BUFFER */
+   { 36628, 0x00008DD1 }, /* GL_UNSIGNED_INT_SAMPLER_1D */
+   { 36655, 0x00008DD2 }, /* GL_UNSIGNED_INT_SAMPLER_2D */
+   { 36682, 0x00008DD3 }, /* GL_UNSIGNED_INT_SAMPLER_3D */
+   { 36709, 0x00008DD4 }, /* GL_UNSIGNED_INT_SAMPLER_CUBE */
+   { 36738, 0x00008DD5 }, /* GL_UNSIGNED_INT_SAMPLER_2D_RECT */
+   { 36770, 0x00008DD6 }, /* GL_UNSIGNED_INT_SAMPLER_1D_ARRAY */
+   { 36803, 0x00008DD7 }, /* GL_UNSIGNED_INT_SAMPLER_2D_ARRAY */
+   { 36836, 0x00008DD8 }, /* GL_UNSIGNED_INT_SAMPLER_BUFFER */
+   { 36867, 0x00008DD9 }, /* GL_GEOMETRY_SHADER */
+   { 36886, 0x00008DDA }, /* GL_GEOMETRY_VERTICES_OUT_ARB */
+   { 36915, 0x00008DDB }, /* GL_GEOMETRY_INPUT_TYPE_ARB */
+   { 36942, 0x00008DDC }, /* GL_GEOMETRY_OUTPUT_TYPE_ARB */
+   { 36970, 0x00008DDD }, /* GL_MAX_GEOMETRY_VARYING_COMPONENTS_ARB */
+   { 37009, 0x00008DDE }, /* GL_MAX_VERTEX_VARYING_COMPONENTS_ARB */
+   { 37046, 0x00008DDF }, /* GL_MAX_GEOMETRY_UNIFORM_COMPONENTS */
+   { 37081, 0x00008DE0 }, /* GL_MAX_GEOMETRY_OUTPUT_VERTICES */
+   { 37113, 0x00008DE1 }, /* GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS */
+   { 37153, 0x00008DF0 }, /* GL_LOW_FLOAT */
+   { 37166, 0x00008DF1 }, /* GL_MEDIUM_FLOAT */
+   { 37182, 0x00008DF2 }, /* GL_HIGH_FLOAT */
+   { 37196, 0x00008DF3 }, /* GL_LOW_INT */
+   { 37207, 0x00008DF4 }, /* GL_MEDIUM_INT */
+   { 37221, 0x00008DF5 }, /* GL_HIGH_INT */
+   { 37233, 0x00008DF6 }, /* GL_UNSIGNED_INT_10_10_10_2_OES */
+   { 37264, 0x00008DF7 }, /* GL_INT_10_10_10_2_OES */
+   { 37286, 0x00008DF8 }, /* GL_SHADER_BINARY_FORMATS */
+   { 37311, 0x00008DF9 }, /* GL_NUM_SHADER_BINARY_FORMATS */
+   { 37340, 0x00008DFA }, /* GL_SHADER_COMPILER */
+   { 37359, 0x00008DFB }, /* GL_MAX_VERTEX_UNIFORM_VECTORS */
+   { 37389, 0x00008DFC }, /* GL_MAX_VARYING_VECTORS */
+   { 37412, 0x00008DFD }, /* GL_MAX_FRAGMENT_UNIFORM_VECTORS */
+   { 37444, 0x00008E13 }, /* GL_QUERY_WAIT */
+   { 37458, 0x00008E14 }, /* GL_QUERY_NO_WAIT */
+   { 37475, 0x00008E15 }, /* GL_QUERY_BY_REGION_WAIT */
+   { 37499, 0x00008E16 }, /* GL_QUERY_BY_REGION_NO_WAIT */
+   { 37526, 0x00008E22 }, /* GL_TRANSFORM_FEEDBACK */
+   { 37548, 0x00008E23 }, /* GL_TRANSFORM_FEEDBACK_BUFFER_PAUSED */
+   { 37584, 0x00008E24 }, /* GL_TRANSFORM_FEEDBACK_BUFFER_ACTIVE */
+   { 37620, 0x00008E25 }, /* GL_TRANSFORM_FEEDBACK_BINDING */
+   { 37650, 0x00008E28 }, /* GL_TIMESTAMP */
+   { 37663, 0x00008E42 }, /* GL_TEXTURE_SWIZZLE_R */
+   { 37684, 0x00008E43 }, /* GL_TEXTURE_SWIZZLE_G */
+   { 37705, 0x00008E44 }, /* GL_TEXTURE_SWIZZLE_B */
+   { 37726, 0x00008E45 }, /* GL_TEXTURE_SWIZZLE_A */
+   { 37747, 0x00008E46 }, /* GL_TEXTURE_SWIZZLE_RGBA */
+   { 37771, 0x00008E4C }, /* GL_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION */
+   { 37815, 0x00008E4D }, /* GL_FIRST_VERTEX_CONVENTION */
+   { 37842, 0x00008E4E }, /* GL_LAST_VERTEX_CONVENTION */
+   { 37868, 0x00008E4F }, /* GL_PROVOKING_VERTEX */
+   { 37888, 0x00008E50 }, /* GL_SAMPLE_POSITION */
+   { 37907, 0x00008E51 }, /* GL_SAMPLE_MASK */
+   { 37922, 0x00008E52 }, /* GL_SAMPLE_MASK_VALUE */
+   { 37943, 0x00008E59 }, /* GL_MAX_SAMPLE_MASK_WORDS */
+   { 37968, 0x00008E5A }, /* GL_MAX_GEOMETRY_SHADER_INVOCATIONS */
+   { 38003, 0x00008E5B }, /* GL_MIN_FRAGMENT_INTERPOLATION_OFFSET */
+   { 38040, 0x00008E5C }, /* GL_MAX_FRAGMENT_INTERPOLATION_OFFSET */
+   { 38077, 0x00008E5D }, /* GL_FRAGMENT_INTERPOLATION_OFFSET_BITS */
+   { 38115, 0x00008E5E }, /* GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET */
+   { 38152, 0x00008E5F }, /* GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET */
+   { 38189, 0x00008E70 }, /* GL_MAX_TRANSFORM_FEEDBACK_BUFFERS */
+   { 38223, 0x00008E71 }, /* GL_MAX_VERTEX_STREAMS */
+   { 38245, 0x00008F36 }, /* GL_COPY_READ_BUFFER */
+   { 38265, 0x00008F37 }, /* GL_COPY_WRITE_BUFFER */
+   { 38286, 0x00008F38 }, /* GL_MAX_IMAGE_UNITS */
+   { 38305, 0x00008F39 }, /* GL_MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS */
+   { 38354, 0x00008F3A }, /* GL_IMAGE_BINDING_NAME */
+   { 38376, 0x00008F3B }, /* GL_IMAGE_BINDING_LEVEL */
+   { 38399, 0x00008F3C }, /* GL_IMAGE_BINDING_LAYERED */
+   { 38424, 0x00008F3D }, /* GL_IMAGE_BINDING_LAYER */
+   { 38447, 0x00008F3E }, /* GL_IMAGE_BINDING_ACCESS */
+   { 38471, 0x00008F3F }, /* GL_DRAW_INDIRECT_BUFFER */
+   { 38495, 0x00008F43 }, /* GL_DRAW_INDIRECT_BUFFER_BINDING */
+   { 38527, 0x00008F90 }, /* GL_RED_SNORM */
+   { 38540, 0x00008F91 }, /* GL_RG_SNORM */
+   { 38552, 0x00008F92 }, /* GL_RGB_SNORM */
+   { 38565, 0x00008F93 }, /* GL_RGBA_SNORM */
+   { 38579, 0x00008F94 }, /* GL_R8_SNORM */
+   { 38591, 0x00008F95 }, /* GL_RG8_SNORM */
+   { 38604, 0x00008F96 }, /* GL_RGB8_SNORM */
+   { 38618, 0x00008F97 }, /* GL_RGBA8_SNORM */
+   { 38633, 0x00008F98 }, /* GL_R16_SNORM */
+   { 38646, 0x00008F99 }, /* GL_RG16_SNORM */
+   { 38660, 0x00008F9A }, /* GL_RGB16_SNORM */
+   { 38675, 0x00008F9B }, /* GL_RGBA16_SNORM */
+   { 38691, 0x00008F9C }, /* GL_SIGNED_NORMALIZED */
+   { 38712, 0x00008F9D }, /* GL_PRIMITIVE_RESTART */
+   { 38733, 0x00008F9E }, /* GL_PRIMITIVE_RESTART_INDEX */
+   { 38760, 0x00008F9F }, /* GL_MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB */
+   { 38805, 0x00009009 }, /* GL_TEXTURE_CUBE_MAP_ARRAY_ARB */
+   { 38835, 0x0000900A }, /* GL_TEXTURE_BINDING_CUBE_MAP_ARRAY_ARB */
+   { 38873, 0x0000900B }, /* GL_PROXY_TEXTURE_CUBE_MAP_ARRAY_ARB */
+   { 38909, 0x0000900C }, /* GL_SAMPLER_CUBE_MAP_ARRAY_ARB */
+   { 38939, 0x0000900D }, /* GL_SAMPLER_CUBE_MAP_ARRAY_SHADOW_ARB */
+   { 38976, 0x0000900E }, /* GL_INT_SAMPLER_CUBE_MAP_ARRAY_ARB */
+   { 39010, 0x0000900F }, /* GL_UNSIGNED_INT_SAMPLER_CUBE_MAP_ARRAY_ARB */
+   { 39053, 0x0000904C }, /* GL_IMAGE_1D */
+   { 39065, 0x0000904D }, /* GL_IMAGE_2D */
+   { 39077, 0x0000904E }, /* GL_IMAGE_3D */
+   { 39089, 0x0000904F }, /* GL_IMAGE_2D_RECT */
+   { 39106, 0x00009050 }, /* GL_IMAGE_CUBE */
+   { 39120, 0x00009051 }, /* GL_IMAGE_BUFFER */
+   { 39136, 0x00009052 }, /* GL_IMAGE_1D_ARRAY */
+   { 39154, 0x00009053 }, /* GL_IMAGE_2D_ARRAY */
+   { 39172, 0x00009054 }, /* GL_IMAGE_CUBE_MAP_ARRAY */
+   { 39196, 0x00009055 }, /* GL_IMAGE_2D_MULTISAMPLE */
+   { 39220, 0x00009056 }, /* GL_IMAGE_2D_MULTISAMPLE_ARRAY */
+   { 39250, 0x00009057 }, /* GL_INT_IMAGE_1D */
+   { 39266, 0x00009058 }, /* GL_INT_IMAGE_2D */
+   { 39282, 0x00009059 }, /* GL_INT_IMAGE_3D */
+   { 39298, 0x0000905A }, /* GL_INT_IMAGE_2D_RECT */
+   { 39319, 0x0000905B }, /* GL_INT_IMAGE_CUBE */
+   { 39337, 0x0000905C }, /* GL_INT_IMAGE_BUFFER */
+   { 39357, 0x0000905D }, /* GL_INT_IMAGE_1D_ARRAY */
+   { 39379, 0x0000905E }, /* GL_INT_IMAGE_2D_ARRAY */
+   { 39401, 0x0000905F }, /* GL_INT_IMAGE_CUBE_MAP_ARRAY */
+   { 39429, 0x00009060 }, /* GL_INT_IMAGE_2D_MULTISAMPLE */
+   { 39457, 0x00009061 }, /* GL_INT_IMAGE_2D_MULTISAMPLE_ARRAY */
+   { 39491, 0x00009062 }, /* GL_UNSIGNED_INT_IMAGE_1D */
+   { 39516, 0x00009063 }, /* GL_UNSIGNED_INT_IMAGE_2D */
+   { 39541, 0x00009064 }, /* GL_UNSIGNED_INT_IMAGE_3D */
+   { 39566, 0x00009065 }, /* GL_UNSIGNED_INT_IMAGE_2D_RECT */
+   { 39596, 0x00009066 }, /* GL_UNSIGNED_INT_IMAGE_CUBE */
+   { 39623, 0x00009067 }, /* GL_UNSIGNED_INT_IMAGE_BUFFER */
+   { 39652, 0x00009068 }, /* GL_UNSIGNED_INT_IMAGE_1D_ARRAY */
+   { 39683, 0x00009069 }, /* GL_UNSIGNED_INT_IMAGE_2D_ARRAY */
+   { 39714, 0x0000906A }, /* GL_UNSIGNED_INT_IMAGE_CUBE_MAP_ARRAY */
+   { 39751, 0x0000906B }, /* GL_UNSIGNED_INT_IMAGE_2D_MULTISAMPLE */
+   { 39788, 0x0000906C }, /* GL_UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_ARRAY */
+   { 39831, 0x0000906D }, /* GL_MAX_IMAGE_SAMPLES */
+   { 39852, 0x0000906E }, /* GL_IMAGE_BINDING_FORMAT */
+   { 39876, 0x0000906F }, /* GL_RGB10_A2UI */
+   { 39890, 0x000090BC }, /* GL_MIN_MAP_BUFFER_ALIGNMENT */
+   { 39918, 0x000090C7 }, /* GL_IMAGE_FORMAT_COMPATIBILITY_TYPE */
+   { 39953, 0x000090C8 }, /* GL_IMAGE_FORMAT_COMPATIBILITY_BY_SIZE */
+   { 39991, 0x000090C9 }, /* GL_IMAGE_FORMAT_COMPATIBILITY_BY_CLASS */
+   { 40030, 0x000090CA }, /* GL_MAX_VERTEX_IMAGE_UNIFORMS */
+   { 40059, 0x000090CB }, /* GL_MAX_TESS_CONTROL_IMAGE_UNIFORMS */
+   { 40094, 0x000090CC }, /* GL_MAX_TESS_EVALUATION_IMAGE_UNIFORMS */
+   { 40132, 0x000090CD }, /* GL_MAX_GEOMETRY_IMAGE_UNIFORMS */
+   { 40163, 0x000090CE }, /* GL_MAX_FRAGMENT_IMAGE_UNIFORMS */
+   { 40194, 0x000090CF }, /* GL_MAX_COMBINED_IMAGE_UNIFORMS */
+   { 40225, 0x000090EA }, /* GL_DEPTH_STENCIL_TEXTURE_MODE */
+   { 40255, 0x000090EB }, /* GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS */
+   { 40293, 0x000090EC }, /* GL_UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER */
+   { 40339, 0x000090ED }, /* GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER */
+   { 40393, 0x000090EE }, /* GL_DISPATCH_INDIRECT_BUFFER */
+   { 40421, 0x000090EF }, /* GL_DISPATCH_INDIRECT_BUFFER_BINDING */
+   { 40457, 0x00009100 }, /* GL_TEXTURE_2D_MULTISAMPLE */
+   { 40483, 0x00009101 }, /* GL_PROXY_TEXTURE_2D_MULTISAMPLE */
+   { 40515, 0x00009102 }, /* GL_TEXTURE_2D_MULTISAMPLE_ARRAY */
+   { 40547, 0x00009103 }, /* GL_PROXY_TEXTURE_2D_MULTISAMPLE_ARRAY */
+   { 40585, 0x00009104 }, /* GL_TEXTURE_BINDING_2D_MULTISAMPLE */
+   { 40619, 0x00009105 }, /* GL_TEXTURE_BINDING_2D_MULTISAMPLE_ARRAY */
+   { 40659, 0x00009106 }, /* GL_TEXTURE_SAMPLES */
+   { 40678, 0x00009107 }, /* GL_TEXTURE_FIXED_SAMPLE_LOCATIONS */
+   { 40712, 0x00009108 }, /* GL_SAMPLER_2D_MULTISAMPLE */
+   { 40738, 0x00009109 }, /* GL_INT_SAMPLER_2D_MULTISAMPLE */
+   { 40768, 0x0000910A }, /* GL_UNSIGNED_INT_SAMPLER_2D_MULTISAMPLE */
+   { 40807, 0x0000910B }, /* GL_SAMPLER_2D_MULTISAMPLE_ARRAY */
+   { 40839, 0x0000910C }, /* GL_INT_SAMPLER_2D_MULTISAMPLE_ARRAY */
+   { 40875, 0x0000910D }, /* GL_UNSIGNED_INT_SAMPLER_2D_MULTISAMPLE_ARRAY */
+   { 40920, 0x0000910E }, /* GL_MAX_COLOR_TEXTURE_SAMPLES */
+   { 40949, 0x0000910F }, /* GL_MAX_DEPTH_TEXTURE_SAMPLES */
+   { 40978, 0x00009110 }, /* GL_MAX_INTEGER_SAMPLES */
+   { 41001, 0x00009111 }, /* GL_MAX_SERVER_WAIT_TIMEOUT */
+   { 41028, 0x00009112 }, /* GL_OBJECT_TYPE */
+   { 41043, 0x00009113 }, /* GL_SYNC_CONDITION */
+   { 41061, 0x00009114 }, /* GL_SYNC_STATUS */
+   { 41076, 0x00009115 }, /* GL_SYNC_FLAGS */
+   { 41090, 0x00009116 }, /* GL_SYNC_FENCE */
+   { 41104, 0x00009117 }, /* GL_SYNC_GPU_COMMANDS_COMPLETE */
+   { 41134, 0x00009118 }, /* GL_UNSIGNALED */
+   { 41148, 0x00009119 }, /* GL_SIGNALED */
+   { 41160, 0x0000911A }, /* GL_ALREADY_SIGNALED */
+   { 41180, 0x0000911B }, /* GL_TIMEOUT_EXPIRED */
+   { 41199, 0x0000911C }, /* GL_CONDITION_SATISFIED */
+   { 41222, 0x0000911D }, /* GL_WAIT_FAILED */
+   { 41237, 0x0000911F }, /* GL_BUFFER_ACCESS_FLAGS */
+   { 41260, 0x00009120 }, /* GL_BUFFER_MAP_LENGTH */
+   { 41281, 0x00009121 }, /* GL_BUFFER_MAP_OFFSET */
+   { 41302, 0x00009122 }, /* GL_MAX_VERTEX_OUTPUT_COMPONENTS */
+   { 41334, 0x00009123 }, /* GL_MAX_GEOMETRY_INPUT_COMPONENTS */
+   { 41367, 0x00009124 }, /* GL_MAX_GEOMETRY_OUTPUT_COMPONENTS */
+   { 41401, 0x00009125 }, /* GL_MAX_FRAGMENT_INPUT_COMPONENTS */
+   { 41434, 0x00009126 }, /* GL_CONTEXT_PROFILE_MASK */
+   { 41458, 0x0000912F }, /* GL_TEXTURE_IMMUTABLE_FORMAT */
+   { 41486, 0x00009143 }, /* GL_MAX_DEBUG_MESSAGE_LENGTH_ARB */
+   { 41518, 0x00009144 }, /* GL_MAX_DEBUG_LOGGED_MESSAGES_ARB */
+   { 41551, 0x00009145 }, /* GL_DEBUG_LOGGED_MESSAGES_ARB */
+   { 41580, 0x00009146 }, /* GL_DEBUG_SEVERITY_HIGH_ARB */
+   { 41607, 0x00009147 }, /* GL_DEBUG_SEVERITY_MEDIUM_ARB */
+   { 41636, 0x00009148 }, /* GL_DEBUG_SEVERITY_LOW_ARB */
+   { 41662, 0x0000919D }, /* GL_TEXTURE_BUFFER_OFFSET */
+   { 41687, 0x0000919E }, /* GL_TEXTURE_BUFFER_SIZE */
+   { 41710, 0x0000919F }, /* GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT */
+   { 41745, 0x000091B9 }, /* GL_COMPUTE_SHADER */
+   { 41763, 0x000091BB }, /* GL_MAX_COMPUTE_UNIFORM_BLOCKS */
+   { 41793, 0x000091BC }, /* GL_MAX_COMPUTE_TEXTURE_IMAGE_UNITS */
+   { 41828, 0x000091BD }, /* GL_MAX_COMPUTE_IMAGE_UNIFORMS */
+   { 41858, 0x000091BE }, /* GL_MAX_COMPUTE_WORK_GROUP_COUNT */
+   { 41890, 0x000091BF }, /* GL_MAX_COMPUTE_WORK_GROUP_SIZE */
+   { 41921, 0x00009270 }, /* GL_COMPRESSED_R11_EAC */
+   { 41943, 0x00009271 }, /* GL_COMPRESSED_SIGNED_R11_EAC */
+   { 41972, 0x00009272 }, /* GL_COMPRESSED_RG11_EAC */
+   { 41995, 0x00009273 }, /* GL_COMPRESSED_SIGNED_RG11_EAC */
+   { 42025, 0x00009274 }, /* GL_COMPRESSED_RGB8_ETC2 */
+   { 42049, 0x00009275 }, /* GL_COMPRESSED_SRGB8_ETC2 */
+   { 42074, 0x00009276 }, /* GL_COMPRESSED_RGB8_PUNCHTHROUGH_ALPHA1_ETC2 */
+   { 42118, 0x00009277 }, /* GL_COMPRESSED_SRGB8_PUNCHTHROUGH_ALPHA1_ETC2 */
+   { 42163, 0x00009278 }, /* GL_COMPRESSED_RGBA8_ETC2_EAC */
+   { 42192, 0x00009279 }, /* GL_COMPRESSED_SRGB8_ALPHA8_ETC2_EAC */
+   { 42228, 0x000092C0 }, /* GL_ATOMIC_COUNTER_BUFFER */
+   { 42253, 0x000092C1 }, /* GL_ATOMIC_COUNTER_BUFFER_BINDING */
+   { 42286, 0x000092C2 }, /* GL_ATOMIC_COUNTER_BUFFER_START */
+   { 42317, 0x000092C3 }, /* GL_ATOMIC_COUNTER_BUFFER_SIZE */
+   { 42347, 0x000092C4 }, /* GL_ATOMIC_COUNTER_BUFFER_DATA_SIZE */
+   { 42382, 0x000092C5 }, /* GL_ATOMIC_COUNTER_BUFFER_ACTIVE_ATOMIC_COUNTERS */
+   { 42430, 0x000092C6 }, /* GL_ATOMIC_COUNTER_BUFFER_ACTIVE_ATOMIC_COUNTER_INDICES */
+   { 42485, 0x000092C7 }, /* GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_VERTEX_SHADER */
+   { 42538, 0x000092C8 }, /* GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_TESS_CONTROL_SHADER */
+   { 42597, 0x000092C9 }, /* GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_TESS_EVALUATION_SHADER */
+   { 42659, 0x000092CA }, /* GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_GEOMETRY_SHADER */
+   { 42714, 0x000092CB }, /* GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_FRAGMENT_SHADER */
+   { 42769, 0x000092CC }, /* GL_MAX_VERTEX_ATOMIC_COUNTER_BUFFERS */
+   { 42806, 0x000092CD }, /* GL_MAX_TESS_CONTROL_ATOMIC_COUNTER_BUFFERS */
+   { 42849, 0x000092CE }, /* GL_MAX_TESS_EVALUATION_ATOMIC_COUNTER_BUFFERS */
+   { 42895, 0x000092CF }, /* GL_MAX_GEOMETRY_ATOMIC_COUNTER_BUFFERS */
+   { 42934, 0x000092D0 }, /* GL_MAX_FRAGMENT_ATOMIC_COUNTER_BUFFERS */
+   { 42973, 0x000092D1 }, /* GL_MAX_COMBINED_ATOMIC_COUNTER_BUFFERS */
+   { 43012, 0x000092D2 }, /* GL_MAX_VERTEX_ATOMIC_COUNTERS */
+   { 43042, 0x000092D3 }, /* GL_MAX_TESS_CONTROL_ATOMIC_COUNTERS */
+   { 43078, 0x000092D4 }, /* GL_MAX_TESS_EVALUATION_ATOMIC_COUNTERS */
+   { 43117, 0x000092D5 }, /* GL_MAX_GEOMETRY_ATOMIC_COUNTERS */
+   { 43149, 0x000092D6 }, /* GL_MAX_FRAGMENT_ATOMIC_COUNTERS */
+   { 43181, 0x000092D7 }, /* GL_MAX_COMBINED_ATOMIC_COUNTERS */
+   { 43213, 0x000092D8 }, /* GL_MAX_ATOMIC_COUNTER_BUFFER_SIZE */
+   { 43247, 0x000092D9 }, /* GL_ACTIVE_ATOMIC_COUNTER_BUFFERS */
+   { 43280, 0x000092DA }, /* GL_UNIFORM_ATOMIC_COUNTER_BUFFER_INDEX */
+   { 43319, 0x000092DB }, /* GL_UNSIGNED_INT_ATOMIC_COUNTER */
+   { 43350, 0x000092DC }, /* GL_MAX_ATOMIC_COUNTER_BUFFER_BINDINGS */
+   { 43388, 0x000092E0 }, /* GL_DEBUG_OUTPUT */
+   { 43404, 0x00009380 }, /* GL_NUM_SAMPLE_COUNTS */
+   { 43425, 0x000094F0 }, /* GL_PERFQUERY_COUNTER_EVENT_INTEL */
+   { 43458, 0x000094F1 }, /* GL_PERFQUERY_COUNTER_DURATION_NORM_INTEL */
+   { 43499, 0x000094F2 }, /* GL_PERFQUERY_COUNTER_DURATION_RAW_INTEL */
+   { 43539, 0x000094F3 }, /* GL_PERFQUERY_COUNTER_THROUGHPUT_INTEL */
+   { 43577, 0x000094F4 }, /* GL_PERFQUERY_COUNTER_RAW_INTEL */
+   { 43608, 0x000094F5 }, /* GL_PERFQUERY_COUNTER_TIMESTAMP_INTEL */
+   { 43645, 0x000094F8 }, /* GL_PERFQUERY_COUNTER_DATA_UINT32_INTEL */
+   { 43684, 0x000094F9 }, /* GL_PERFQUERY_COUNTER_DATA_UINT64_INTEL */
+   { 43723, 0x000094FA }, /* GL_PERFQUERY_COUNTER_DATA_FLOAT_INTEL */
+   { 43761, 0x000094FB }, /* GL_PERFQUERY_COUNTER_DATA_DOUBLE_INTEL */
+   { 43800, 0x000094FC }, /* GL_PERFQUERY_COUNTER_DATA_BOOL32_INTEL */
+   { 43839, 0x000094FD }, /* GL_PERFQUERY_QUERY_NAME_LENGTH_MAX_INTEL */
+   { 43880, 0x000094FE }, /* GL_PERFQUERY_COUNTER_NAME_LENGTH_MAX_INTEL */
+   { 43923, 0x000094FF }, /* GL_PERFQUERY_COUNTER_DESC_LENGTH_MAX_INTEL */
+   { 43966, 0x00009500 }, /* GL_PERFQUERY_GPA_EXTENDED_COUNTERS_INTEL */
+   { 44007, 0x00010000 }, /* GL_EVAL_BIT */
+   { 44019, 0x00019262 }, /* GL_RASTER_POSITION_UNCLIPPED_IBM */
+   { 44052, 0x00020000 }, /* GL_LIST_BIT */
+   { 44064, 0x00040000 }, /* GL_TEXTURE_BIT */
+   { 44079, 0x00080000 }, /* GL_SCISSOR_BIT */
+   { 44094, 0x000FFFFF }, /* GL_ALL_ATTRIB_BITS */
+   { 44113, 0x20000000 }, /* GL_MULTISAMPLE_BIT */
+   { 44132, 0xFFFFFFFF }, /* GL_ALL_CLIENT_ATTRIB_BITS */
+};
+
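+/* Each entry above pairs a 16-bit byte offset into enum_string_table with
+ * the numeric GL token; the array is kept sorted by token value so that the
+ * binary search in _mesa_lookup_enum_by_nr() below can find a name in
+ * O(log n).  A sketch of the element layout (the exact field types are an
+ * assumption; only the .offset and .n members are used in this file):
+ *
+ *    typedef struct enum_elt {
+ *       unsigned short offset;   // start of the name in enum_string_table
+ *       int n;                   // the GLenum value
+ *    } enum_elt;
+ */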
+
+typedef int (*cfunc)(const void *, const void *);
+
+/**
+ * Compare a key enum value to an element in the \c enum_string_table_offsets array.
+ *
+ * \c bsearch always passes the key as the first parameter and the pointer
+ * to the array element as the second parameter.  We can eliminate some
+ * extra work by taking advantage of that fact.
+ *
+ * \param a  Pointer to the desired enum name.
+ * \param b  Pointer into the \c enum_string_table_offsets array.
+ */
+static int compar_nr( const int *a, enum_elt *b )
+{
+   return a[0] - b->n;
+}
+
+
+static char token_tmp[20];
+
+const char *_mesa_lookup_enum_by_nr( int nr )
+{
+   enum_elt *elt;
+
+   STATIC_ASSERT(sizeof(enum_string_table) < (1 << 16));
+
+   elt = _mesa_bsearch(& nr, enum_string_table_offsets,
+                       Elements(enum_string_table_offsets),
+                       sizeof(enum_string_table_offsets[0]),
+                       (cfunc) compar_nr);
+
+   if (elt != NULL) {
+      return &enum_string_table[elt->offset];
+   }
+   else {
+      /* not re-entrant safe, but that is acceptable for debug output */
+      _mesa_snprintf(token_tmp, sizeof(token_tmp) - 1, "0x%x", nr);
+      token_tmp[sizeof(token_tmp) - 1] = '\0';
+      return token_tmp;
+   }
+}
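+
+/* Usage sketch (the first value is taken from the table above; the second is
+ * assumed not to appear anywhere in the table):
+ *
+ *    _mesa_lookup_enum_by_nr(0x8CD5);      // -> "GL_FRAMEBUFFER_COMPLETE"
+ *    _mesa_lookup_enum_by_nr(0xDEADBEEF);  // -> "0xdeadbeef", formatted into
+ *                                          //    token_tmp (not re-entrant)
+ */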
+
+/**
+ * Primitive names
+ */
+static const char *prim_names[PRIM_MAX+3] = {
+   "GL_POINTS",
+   "GL_LINES",
+   "GL_LINE_LOOP",
+   "GL_LINE_STRIP",
+   "GL_TRIANGLES",
+   "GL_TRIANGLE_STRIP",
+   "GL_TRIANGLE_FAN",
+   "GL_QUADS",
+   "GL_QUAD_STRIP",
+   "GL_POLYGON",
+   "GL_LINES_ADJACENCY",
+   "GL_LINE_STRIP_ADJACENCY",
+   "GL_TRIANGLES_ADJACENCY",
+   "GL_TRIANGLE_STRIP_ADJACENCY",
+   "outside begin/end",
+   "unknown state"
+};
+
+
+/* Get the name of an enum given that it is a primitive type.  Avoids
+ * GL_FALSE/GL_POINTS ambiguity and others.
+ */
+const char *
+_mesa_lookup_prim_by_nr(GLuint nr)
+{
+   if (nr < Elements(prim_names))
+      return prim_names[nr];
+   else
+      return "invalid mode";
+}
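+
+/* Example: the core GL primitive tokens GL_POINTS..GL_TRIANGLE_STRIP_ADJACENCY
+ * are the consecutive values 0x0 through 0xD, matching the indices of
+ * prim_names[] above, so for instance:
+ *
+ *    _mesa_lookup_prim_by_nr(4);        // GL_TRIANGLES -> "GL_TRIANGLES"
+ *    _mesa_lookup_prim_by_nr(0xFFFF);   // out of range  -> "invalid mode"
+ */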
+
+
+
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/enums.h b/icd/intel/compiler/mesa-utils/src/mesa/main/enums.h
new file mode 100644
index 0000000..36c053d
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/enums.h
@@ -0,0 +1,47 @@
+/**
+ * \file enums.h
+ * Enumeration name/number lookup functions.
+ * 
+ * \if subset
+ * (No-op)
+ *
+ * \endif
+ */
+
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2006  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef _ENUMS_H_
+#define _ENUMS_H_
+
+
+extern const char *_mesa_lookup_enum_by_nr( int nr );
+
+/* Get the name of an enum given that it is a primitive type.  Avoids
+ * GL_FALSE/GL_POINTS ambiguity and others.
+ */
+const char *_mesa_lookup_prim_by_nr( unsigned nr );
+
+#endif
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/errors.c b/icd/intel/compiler/mesa-utils/src/mesa/main/errors.c
new file mode 100644
index 0000000..6227ef5
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/errors.c
@@ -0,0 +1,731 @@
+/**
+ * \file errors.c
+ * Mesa debugging and error handling functions.
+ */
+
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#include "errors.h"
+#include "enums.h"
+#include "imports.h"
+#include "context.h"
+// #include "dispatch.h"  // LunarG REM:
+#include "hash.h"
+#include "mtypes.h"
+#include "version.h"
+#include "hash_table.h"
+
+static mtx_t DynamicIDMutex = _MTX_INITIALIZER_NP;
+static GLuint NextDynamicID = 1;
+
+/**
+ * A namespace element.
+ */
+struct gl_debug_element
+{
+   struct simple_node link;
+
+   GLuint ID;
+   /* at which severity levels (mesa_debug_severity) is the message enabled */
+   GLbitfield State;
+};
+
+struct gl_debug_namespace
+{
+   struct simple_node Elements;
+   GLbitfield DefaultState;
+};
+
+struct gl_debug_group {
+   struct gl_debug_namespace Namespaces[MESA_DEBUG_SOURCE_COUNT][MESA_DEBUG_TYPE_COUNT];
+};
+
+/**
+ * An error, warning, or other piece of debug information for an application
+ * to consume via GL_ARB_debug_output/GL_KHR_debug.
+ */
+struct gl_debug_message
+{
+   enum mesa_debug_source source;
+   enum mesa_debug_type type;
+   GLuint id;
+   enum mesa_debug_severity severity;
+   GLsizei length;
+   GLcharARB *message;
+};
+
+/**
+ * Debug message log.  It works like a ring buffer.
+ */
+struct gl_debug_log {
+   struct gl_debug_message Messages[MAX_DEBUG_LOGGED_MESSAGES];
+   GLint NextMessage;
+   GLint NumMessages;
+};
+
+struct gl_debug_state
+{
+   GLDEBUGPROC Callback;
+   const void *CallbackData;
+   GLboolean SyncOutput;
+   GLboolean DebugOutput;
+
+   struct gl_debug_group *Groups[MAX_DEBUG_GROUP_STACK_DEPTH];
+   struct gl_debug_message GroupMessages[MAX_DEBUG_GROUP_STACK_DEPTH];
+   GLint GroupStackDepth;
+
+   struct gl_debug_log Log;
+};
+
+static char out_of_memory[] = "Debugging error: out of memory";
+
+static const GLenum debug_source_enums[] = {
+   GL_DEBUG_SOURCE_API,
+   GL_DEBUG_SOURCE_WINDOW_SYSTEM,
+   GL_DEBUG_SOURCE_SHADER_COMPILER,
+   GL_DEBUG_SOURCE_THIRD_PARTY,
+   GL_DEBUG_SOURCE_APPLICATION,
+   GL_DEBUG_SOURCE_OTHER,
+};
+
+static const GLenum debug_type_enums[] = {
+   GL_DEBUG_TYPE_ERROR,
+   GL_DEBUG_TYPE_DEPRECATED_BEHAVIOR,
+   GL_DEBUG_TYPE_UNDEFINED_BEHAVIOR,
+   GL_DEBUG_TYPE_PORTABILITY,
+   GL_DEBUG_TYPE_PERFORMANCE,
+   GL_DEBUG_TYPE_OTHER,
+   GL_DEBUG_TYPE_MARKER,
+   GL_DEBUG_TYPE_PUSH_GROUP,
+   GL_DEBUG_TYPE_POP_GROUP,
+};
+
+static const GLenum debug_severity_enums[] = {
+   GL_DEBUG_SEVERITY_LOW,
+   GL_DEBUG_SEVERITY_MEDIUM,
+   GL_DEBUG_SEVERITY_HIGH,
+   GL_DEBUG_SEVERITY_NOTIFICATION,
+};
+
+/**
+ * Handles generating a GL_ARB_debug_output message ID generated by the GL or
+ * GLSL compiler.
+ *
+ * The GL API has this "ID" mechanism, where the intention is to allow a
+ * client to filter in/out messages based on source, type, and ID.  Of course,
+ * building a giant enum list of all debug output messages that Mesa might
+ * generate is ridiculous, so instead we have our caller pass us a pointer to
+ * static storage where the ID should get stored.  This ID will be shared
+ * across all contexts for that message (which seems like a desirable
+ * property, even if it's not expected by the spec), but note that it won't be
+ * the same between executions if messages aren't generated in the same order.
+ */
+static void
+debug_get_id(GLuint *id)
+{
+   if (!(*id)) {
+      mtx_lock(&DynamicIDMutex);
+      if (!(*id))
+         *id = NextDynamicID++;
+      mtx_unlock(&DynamicIDMutex);
+   }
+}
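+
+/* Expected calling pattern, as the comment above describes: the caller owns
+ * static storage for the ID, and the double-checked lock assigns it at most
+ * once (see the out-of-memory path in debug_message_store() below for a real
+ * instance):
+ *
+ *    static GLuint my_msg_id = 0;   // hypothetical message ID, shared
+ *    debug_get_id(&my_msg_id);      // non-zero and stable from here on
+ */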
+
+static void
+debug_message_store(struct gl_debug_message *msg,
+                    enum mesa_debug_source source,
+                    enum mesa_debug_type type, GLuint id,
+                    enum mesa_debug_severity severity,
+                    GLsizei len, const char *buf)
+{
+   assert(!msg->message && !msg->length);
+
+   msg->message = malloc(len+1);
+   if (msg->message) {
+      (void) strncpy(msg->message, buf, (size_t)len);
+      msg->message[len] = '\0';
+
+      msg->length = len+1;
+      msg->source = source;
+      msg->type = type;
+      msg->id = id;
+      msg->severity = severity;
+   } else {
+      static GLuint oom_msg_id = 0;
+      debug_get_id(&oom_msg_id);
+
+      /* malloc failed! */
+      msg->message = out_of_memory;
+      msg->length = strlen(out_of_memory)+1;
+      msg->source = MESA_DEBUG_SOURCE_OTHER;
+      msg->type = MESA_DEBUG_TYPE_ERROR;
+      msg->id = oom_msg_id;
+      msg->severity = MESA_DEBUG_SEVERITY_HIGH;
+   }
+}
+
+static void
+debug_namespace_clear(struct gl_debug_namespace *ns)
+{
+   struct simple_node *node, *tmp;
+
+   foreach_s(node, tmp, &ns->Elements)
+      free(node);
+}
+
+/**
+ * Get the state of \p id in the namespace.
+ */
+static bool
+debug_namespace_get(const struct gl_debug_namespace *ns, GLuint id,
+                    enum mesa_debug_severity severity)
+{
+   struct simple_node *node;
+   uint32_t state;
+
+   state = ns->DefaultState;
+   foreach(node, &ns->Elements) {
+      struct gl_debug_element *elem = (struct gl_debug_element *) node;
+
+      if (elem->ID == id) {
+         state = elem->State;
+         break;
+      }
+   }
+
+   return (state & (1 << severity));
+}
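+
+/* The State/DefaultState words are per-severity enable masks: bit i is set
+ * when messages of severity i (a mesa_debug_severity value, assumed to be a
+ * small integer) are enabled.  For example:
+ *
+ *    GLbitfield state = 0;
+ *    state |= (1 << MESA_DEBUG_SEVERITY_HIGH);   // enable only HIGH
+ *    // debug_namespace_get() would now return true just for that severity
+ */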
+
+/**
+ * Return true if the top debug group points to the group below it.
+ */
+static bool
+debug_is_group_read_only(const struct gl_debug_state *debug)
+{
+   const GLint gstack = debug->GroupStackDepth;
+   return (gstack > 0 && debug->Groups[gstack] == debug->Groups[gstack - 1]);
+}
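+
+/* Inference from this file: a newly pushed group can alias the group below
+ * it, and debug_clear_group() uses this check so the shared allocation is
+ * only freed once.
+ */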
+
+/**
+ * Free the top debug group.
+ */
+static void
+debug_clear_group(struct gl_debug_state *debug)
+{
+   const GLint gstack = debug->GroupStackDepth;
+
+   if (!debug_is_group_read_only(debug)) {
+      struct gl_debug_group *grp = debug->Groups[gstack];
+      int s, t;
+
+      for (s = 0; s < MESA_DEBUG_SOURCE_COUNT; s++) {
+         for (t = 0; t < MESA_DEBUG_TYPE_COUNT; t++)
+            debug_namespace_clear(&grp->Namespaces[s][t]);
+      }
+
+      free(grp);
+   }
+
+   debug->Groups[gstack] = NULL;
+}
+
+/**
+ * Loop through debug group stack tearing down states for
+ * filtering debug messages.  Then free debug output state.
+ */
+static void
+debug_destroy(struct gl_debug_state *debug)
+{
+   while (debug->GroupStackDepth > 0) {
+      debug_clear_group(debug);
+      debug->GroupStackDepth--;
+   }
+
+   debug_clear_group(debug);
+   free(debug);
+}
+
+/**
+ * Returns whether the given message source/type/ID tuple is enabled.
+ */
+static bool
+debug_is_message_enabled(const struct gl_debug_state *debug,
+                         enum mesa_debug_source source,
+                         enum mesa_debug_type type,
+                         GLuint id,
+                         enum mesa_debug_severity severity)
+{
+   const GLint gstack = debug->GroupStackDepth;
+   struct gl_debug_group *grp = debug->Groups[gstack];
+   struct gl_debug_namespace *nspace = &grp->Namespaces[source][type];
+
+   if (!debug->DebugOutput)
+      return false;
+
+   return debug_namespace_get(nspace, id, severity);
+}
+
+/**
+ * 'buf' is not necessarily a null-terminated string. When logging, copy
+ * 'len' characters from it, store them in a new, null-terminated string,
+ * and remember the number of bytes used by that string, *including*
+ * the null terminator this time.
+ */
+static void
+debug_log_message(struct gl_debug_state *debug,
+                  enum mesa_debug_source source,
+                  enum mesa_debug_type type, GLuint id,
+                  enum mesa_debug_severity severity,
+                  GLsizei len, const char *buf)
+{
+   struct gl_debug_log *log = &debug->Log;
+   GLint nextEmpty;
+   struct gl_debug_message *emptySlot;
+
+   assert(len >= 0 && len < MAX_DEBUG_MESSAGE_LENGTH);
+
+   if (log->NumMessages == MAX_DEBUG_LOGGED_MESSAGES)
+      return;
+
+   nextEmpty = (log->NextMessage + log->NumMessages)
+      % MAX_DEBUG_LOGGED_MESSAGES;
+   emptySlot = &log->Messages[nextEmpty];
+
+   debug_message_store(emptySlot, source, type,
+                       id, severity, len, buf);
+
+   log->NumMessages++;
+}
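+
+/* Worked example of the ring-buffer arithmetic above: if
+ * MAX_DEBUG_LOGGED_MESSAGES were 10 (a hypothetical value), NextMessage == 8
+ * (the oldest stored message) and NumMessages == 3, the next free slot is
+ * (8 + 3) % 10 == 1, i.e. the log wraps around the end of Messages[].
+ */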
+
+/**
+ * Lock and return debug state for the context.  The debug state would
+ * normally be allocated and initialized upon the first call; when NULL is
+ * returned, the debug state is not locked.  In this build the function is
+ * stubbed out and always returns NULL.
+ */
+static struct gl_debug_state *
+_mesa_lock_debug_state(struct gl_context *ctx)
+{
+   return NULL;
+}
+
+static void
+_mesa_unlock_debug_state(struct gl_context *ctx)
+{
+}
+
+/**
+ * Set the integer debug state specified by \p pname.  This can be called from
+ * _mesa_set_enable for example.
+ */
+bool
+_mesa_set_debug_state_int(struct gl_context *ctx, GLenum pname, GLint val)
+{
+   struct gl_debug_state *debug = _mesa_lock_debug_state(ctx);
+
+   if (!debug)
+      return false;
+
+   switch (pname) {
+   case GL_DEBUG_OUTPUT:
+      debug->DebugOutput = (val != 0);
+      break;
+   case GL_DEBUG_OUTPUT_SYNCHRONOUS_ARB:
+      debug->SyncOutput = (val != 0);
+      break;
+   default:
+      assert(!"unknown debug output param");
+      break;
+   }
+
+   _mesa_unlock_debug_state(ctx);
+
+   return true;
+}
+
+/**
+ * Query the integer debug state specified by \p pname.  This can be called
+ * from _mesa_GetIntegerv for example.
+ */
+GLint
+_mesa_get_debug_state_int(struct gl_context *ctx, GLenum pname)
+{
+   struct gl_debug_state *debug;
+   GLint val;
+
+   mtx_lock(&ctx->DebugMutex);
+   debug = ctx->Debug;
+   if (!debug) {
+      mtx_unlock(&ctx->DebugMutex);
+      return 0;
+   }
+
+   switch (pname) {
+   case GL_DEBUG_OUTPUT:
+      val = debug->DebugOutput;
+      break;
+   case GL_DEBUG_OUTPUT_SYNCHRONOUS_ARB:
+      val = debug->SyncOutput;
+      break;
+   case GL_DEBUG_LOGGED_MESSAGES:
+      val = debug->Log.NumMessages;
+      break;
+   case GL_DEBUG_NEXT_LOGGED_MESSAGE_LENGTH:
+      val = (debug->Log.NumMessages) ?
+         debug->Log.Messages[debug->Log.NextMessage].length : 0;
+      break;
+   case GL_DEBUG_GROUP_STACK_DEPTH:
+      val = debug->GroupStackDepth;
+      break;
+   default:
+      assert(!"unknown debug output param");
+      val = 0;
+      break;
+   }
+
+   mtx_unlock(&ctx->DebugMutex);
+
+   return val;
+}
+
+/**
+ * Query the pointer debug state specified by \p pname.  This can be called
+ * from _mesa_GetPointerv for example.
+ */
+void *
+_mesa_get_debug_state_ptr(struct gl_context *ctx, GLenum pname)
+{
+   struct gl_debug_state *debug;
+   void *val;
+
+   mtx_lock(&ctx->DebugMutex);
+   debug = ctx->Debug;
+   if (!debug) {
+      mtx_unlock(&ctx->DebugMutex);
+      return NULL;
+   }
+
+   switch (pname) {
+   case GL_DEBUG_CALLBACK_FUNCTION_ARB:
+      val = (void *) debug->Callback;
+      break;
+   case GL_DEBUG_CALLBACK_USER_PARAM_ARB:
+      val = (void *) debug->CallbackData;
+      break;
+   default:
+      assert(!"unknown debug output param");
+      val = NULL;
+      break;
+   }
+
+   mtx_unlock(&ctx->DebugMutex);
+
+   return val;
+}
+
+/**
+ * Insert a debug message.  The mutex is assumed to be locked, and will be
+ * unlocked by this call.
+ */
+static void
+log_msg_locked_and_unlock(struct gl_context *ctx,
+                          enum mesa_debug_source source,
+                          enum mesa_debug_type type, GLuint id,
+                          enum mesa_debug_severity severity,
+                          GLint len, const char *buf)
+{
+   struct gl_debug_state *debug = ctx->Debug;
+
+   if (!debug_is_message_enabled(debug, source, type, id, severity)) {
+      _mesa_unlock_debug_state(ctx);
+      return;
+   }
+
+   if (ctx->Debug->Callback) {
+      GLenum gl_source = debug_source_enums[source];
+      GLenum gl_type = debug_type_enums[type];
+      GLenum gl_severity = debug_severity_enums[severity];
+      GLDEBUGPROC callback = ctx->Debug->Callback;
+      const void *data = ctx->Debug->CallbackData;
+
+      /*
+       * When ctx->Debug->SyncOutput is GL_FALSE, the client is prepared for
+       * asynchronous calls.  When it is GL_TRUE, we will not spawn threads.
+       * In either case, we can call the callback unlocked.
+       */
+      _mesa_unlock_debug_state(ctx);
+      callback(gl_source, gl_type, id, gl_severity, len, buf, data);
+   }
+   else {
+      debug_log_message(ctx->Debug, source, type, id, severity, len, buf);
+      _mesa_unlock_debug_state(ctx);
+   }
+}
+
+/**
+ * Log a client or driver debug message.
+ */
+static void
+log_msg(struct gl_context *ctx, enum mesa_debug_source source,
+        enum mesa_debug_type type, GLuint id,
+        enum mesa_debug_severity severity, GLint len, const char *buf)
+{
+   struct gl_debug_state *debug = _mesa_lock_debug_state(ctx);
+
+   if (!debug)
+      return;
+
+   log_msg_locked_and_unlock(ctx, source, type, id, severity, len, buf);
+}
+
+
+void
+_mesa_init_errors(struct gl_context *ctx)
+{
+   mtx_init(&ctx->DebugMutex, mtx_plain);
+}
+
+
+void
+_mesa_free_errors_data(struct gl_context *ctx)
+{
+   if (ctx->Debug) {
+      debug_destroy(ctx->Debug);
+      /* set to NULL just in case it is used before context is completely gone. */
+      ctx->Debug = NULL;
+   }
+
+   mtx_destroy(&ctx->DebugMutex);
+}
+
+
+/**********************************************************************/
+/** \name Diagnostics */
+/*@{*/
+
+static void
+output_if_debug(const char *prefixString, const char *outputString,
+                GLboolean newline)
+{
+   static int debug = -1;
+   static FILE *fout = NULL;
+
+   /* Init the local 'debug' var once.
+    * Note: the _mesa_init_debug() function should have been called
+    * by now so MESA_DEBUG_FLAGS will be initialized.
+    */
+   if (debug == -1) {
+      /* If MESA_LOG_FILE env var is set, log Mesa errors, warnings,
+       * etc to the named file.  Otherwise, output to stderr.
+       */
+      const char *logFile = _mesa_getenv("MESA_LOG_FILE");
+      if (logFile)
+         fout = fopen(logFile, "w");
+      if (!fout)
+         fout = stderr;
+#ifdef DEBUG
+      /* in debug builds, print messages unless MESA_DEBUG="silent" */
+      if (MESA_DEBUG_FLAGS & DEBUG_SILENT)
+         debug = 0;
+      else
+         debug = 1;
+#else
+      /* in release builds, be silent unless MESA_DEBUG is set */
+      debug = _mesa_getenv("MESA_DEBUG") != NULL;
+#endif
+   }
+
+   /* Now only print the string if we're required to do so. */
+   if (debug) {
+      fprintf(fout, "%s: %s", prefixString, outputString);
+      if (newline)
+         fprintf(fout, "\n");
+      fflush(fout);
+
+#if defined(_WIN32) && !defined(_WIN32_WCE)
+      /* stderr from Windows applications without a console is usually not
+       * visible, so send the message to the debugger instead. */
+      {
+         char buf[4096];
+         _mesa_snprintf(buf, sizeof(buf), "%s: %s%s", prefixString, outputString, newline ? "\n" : "");
+         OutputDebugStringA(buf);
+      }
+#endif
+   }
+}
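+
+/*
+ * Typical usage of the environment variables consulted above (shell
+ * syntax, illustrative):
+ *
+ *    MESA_LOG_FILE=/tmp/mesa.log ./app    # log to a file instead of stderr
+ *    MESA_DEBUG=1 ./app                   # enable output in release builds
+ *    MESA_DEBUG=silent ./app              # suppress output in debug builds
+ */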
+
+
+/**
+ * When a new type of error is recorded, print a message describing
+ * previous errors which were accumulated.
+ */
+static void
+flush_delayed_errors( struct gl_context *ctx )
+{
+   char s[MAX_DEBUG_MESSAGE_LENGTH];
+
+   if (ctx->ErrorDebugCount) {
+      _mesa_snprintf(s, MAX_DEBUG_MESSAGE_LENGTH, "%d similar %s errors", 
+                     ctx->ErrorDebugCount,
+                     _mesa_lookup_enum_by_nr(ctx->ErrorValue));
+
+      output_if_debug("Mesa", s, GL_TRUE);
+
+      ctx->ErrorDebugCount = 0;
+   }
+}
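+
+/*
+ * Example of the summary line this emits (given 37 accumulated
+ * GL_INVALID_ENUM errors):
+ *
+ *    Mesa: 37 similar GL_INVALID_ENUM errors
+ */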
+
+
+/**
+ * Report a warning (a recoverable error condition) to stderr if
+ * either DEBUG is defined or the MESA_DEBUG env var is set.
+ *
+ * \param ctx GL context.
+ * \param fmtString printf()-like format string.
+ */
+void
+_mesa_warning( struct gl_context *ctx, const char *fmtString, ... )
+{
+   char str[MAX_DEBUG_MESSAGE_LENGTH];
+   va_list args;
+   va_start( args, fmtString );  
+   (void) _mesa_vsnprintf( str, MAX_DEBUG_MESSAGE_LENGTH, fmtString, args );
+   va_end( args );
+   
+   if (ctx)
+      flush_delayed_errors( ctx );
+
+   output_if_debug("Mesa warning", str, GL_TRUE);
+}
+
+
+/**
+ * Report an internal implementation problem.
+ * Stubbed out in this build; see the LunarG deletion in the body.
+ *
+ * \param ctx GL context.
+ * \param fmtString problem description string.
+ */
+void
+_mesa_problem( const struct gl_context *ctx, const char *fmtString, ... )
+{
+   /* LunarG DEL: */
+}
+
+
+void
+_mesa_gl_debug(struct gl_context *ctx,
+               GLuint *id,
+               enum mesa_debug_type type,
+               enum mesa_debug_severity severity,
+               const char *fmtString, ...)
+{
+   char s[MAX_DEBUG_MESSAGE_LENGTH];
+   int len;
+   va_list args;
+
+   debug_get_id(id);
+
+   va_start(args, fmtString);
+   len = _mesa_vsnprintf(s, MAX_DEBUG_MESSAGE_LENGTH, fmtString, args);
+   va_end(args);
+
+   log_msg(ctx, MESA_DEBUG_SOURCE_API, type, *id, severity, len, s);
+}
+
+
+/**
+ * Record an OpenGL state error.  These usually occur when the user
+ * passes invalid parameters to a GL function.
+ *
+ * If debugging is enabled (either at compile-time via the DEBUG macro, or
+ * run-time via the MESA_DEBUG environment variable), report the error with
+ * _mesa_debug().
+ * 
+ * \param ctx the GL context.
+ * \param error the error value.
+ * \param fmtString printf() style format string, followed by optional args
+ */
+void
+_mesa_error( struct gl_context *ctx, GLenum error, const char *fmtString, ... )
+{
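+   /* Intentionally a no-op in this build (error recording removed). */
+   (void) ctx; (void) error; (void) fmtString;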
+}
+
+void
+_mesa_error_no_memory(const char *caller)
+{
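+   (void) caller;   /* intentionally a no-op in this build */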
+}
+
+/**
+ * Report debug information.  Prints the message to stderr via fprintf().
+ * No-op unless the DEBUG macro is defined.
+ * 
+ * \param ctx GL context.
+ * \param fmtString printf()-style format string, followed by optional args.
+ */
+void
+_mesa_debug( const struct gl_context *ctx, const char *fmtString, ... )
+{
+#ifdef DEBUG
+   char s[MAX_DEBUG_MESSAGE_LENGTH];
+   va_list args;
+   va_start(args, fmtString);
+   _mesa_vsnprintf(s, MAX_DEBUG_MESSAGE_LENGTH, fmtString, args);
+   va_end(args);
+   output_if_debug("Mesa", s, GL_FALSE);
+#endif /* DEBUG */
+   (void) ctx;
+   (void) fmtString;
+}
+
+
+/**
+ * Report debug information from the shader compiler via GL_ARB_debug_output.
+ *
+ * \param ctx GL context.
+ * \param type The namespace to which this message belongs.
+ * \param id The message ID within the given namespace.
+ * \param msg The message to output. Need not be null-terminated.
+ * \param len The length of 'msg'. If negative, 'msg' must be null-terminated.
+ */
+void
+_mesa_shader_debug( struct gl_context *ctx, GLenum type, GLuint *id,
+                    const char *msg, int len )
+{
+   enum mesa_debug_source source = MESA_DEBUG_SOURCE_SHADER_COMPILER;
+   enum mesa_debug_severity severity = MESA_DEBUG_SEVERITY_HIGH;
+
+   debug_get_id(id);
+
+   if (len < 0)
+      len = strlen(msg);
+
+   /* Truncate the message if necessary. */
+   if (len >= MAX_DEBUG_MESSAGE_LENGTH)
+      len = MAX_DEBUG_MESSAGE_LENGTH - 1;
+
+   log_msg(ctx, source, type, *id, severity, len, msg);
+}
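+
+/*
+ * Example call from a compiler diagnostic path (illustrative).  Note that
+ * log_msg() indexes its lookup tables with mesa_debug_type values, so those
+ * are what must be passed here despite the GLenum parameter type:
+ *
+ *    static GLuint msg_id = 0;
+ *    _mesa_shader_debug(ctx, MESA_DEBUG_TYPE_ERROR, &msg_id,
+ *                       "0:12(3): error: undeclared variable", -1);
+ */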
+
+/*@}*/
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/errors.h b/icd/intel/compiler/mesa-utils/src/mesa/main/errors.h
new file mode 100644
index 0000000..b388138
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/errors.h
@@ -0,0 +1,128 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+/**
+ * \file errors.h
+ * Mesa debugging and error handling functions.
+ *
+ * This file provides functions to record errors, warnings, and miscellaneous
+ * debug information.
+ */
+
+
+#ifndef ERRORS_H
+#define ERRORS_H
+
+
+#include "compiler.h"
+#include "glheader.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "mtypes.h"
+
+struct _glapi_table;
+
+extern void
+_mesa_init_errors( struct gl_context *ctx );
+
+extern void
+_mesa_free_errors_data( struct gl_context *ctx );
+
+extern void
+_mesa_warning( struct gl_context *gc, const char *fmtString, ... ) PRINTFLIKE(2, 3);
+
+extern void
+_mesa_problem( const struct gl_context *ctx, const char *fmtString, ... ) PRINTFLIKE(2, 3);
+
+extern void
+_mesa_error( struct gl_context *ctx, GLenum error, const char *fmtString, ... ) PRINTFLIKE(3, 4);
+
+extern void
+_mesa_error_no_memory(const char *caller);
+
+extern void
+_mesa_debug( const struct gl_context *ctx, const char *fmtString, ... ) PRINTFLIKE(2, 3);
+
+extern void
+_mesa_gl_debug(struct gl_context *ctx,
+               GLuint *id,
+               enum mesa_debug_type type,
+               enum mesa_debug_severity severity,
+               const char *fmtString, ...) PRINTFLIKE(5, 6);
+
+#define _mesa_perf_debug(ctx, sev, ...) do {                              \
+   static GLuint msg_id = 0;                                              \
+   if (unlikely(ctx->Const.ContextFlags & GL_CONTEXT_FLAG_DEBUG_BIT)) {   \
+      _mesa_gl_debug(ctx, &msg_id,                                        \
+                     MESA_DEBUG_TYPE_PERFORMANCE,                         \
+                     sev,                                                 \
+                     __VA_ARGS__);                                        \
+   }                                                                      \
+} while (0)
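+
+/* Example driver-side use (illustrative):
+ *
+ *    _mesa_perf_debug(ctx, MESA_DEBUG_SEVERITY_MEDIUM,
+ *                     "Falling back to software for %s", reason);
+ *
+ * The message is only emitted for contexts created with the debug flag;
+ * the static msg_id gives the message a stable ID across invocations.
+ */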
+
+bool
+_mesa_set_debug_state_int(struct gl_context *ctx, GLenum pname, GLint val);
+
+GLint
+_mesa_get_debug_state_int(struct gl_context *ctx, GLenum pname);
+
+void *
+_mesa_get_debug_state_ptr(struct gl_context *ctx, GLenum pname);
+
+extern void
+_mesa_shader_debug(struct gl_context *ctx, GLenum type, GLuint *id,
+                   const char *msg, int len);
+
+void GLAPIENTRY
+_mesa_DebugMessageInsert(GLenum source, GLenum type, GLuint id,
+                         GLenum severity, GLint length,
+                         const GLchar* buf);
+GLuint GLAPIENTRY
+_mesa_GetDebugMessageLog(GLuint count, GLsizei logSize, GLenum* sources,
+                         GLenum* types, GLuint* ids, GLenum* severities,
+                         GLsizei* lengths, GLchar* messageLog);
+void GLAPIENTRY
+_mesa_DebugMessageControl(GLenum source, GLenum type, GLenum severity,
+                          GLsizei count, const GLuint *ids,
+                          GLboolean enabled);
+void GLAPIENTRY
+_mesa_DebugMessageCallback(GLDEBUGPROC callback,
+                           const void *userParam);
+void GLAPIENTRY
+_mesa_PushDebugGroup(GLenum source, GLuint id, GLsizei length,
+                     const GLchar *message);
+void GLAPIENTRY
+_mesa_PopDebugGroup(void);
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* ERRORS_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/formats.c b/icd/intel/compiler/mesa-utils/src/mesa/main/formats.c
new file mode 100644
index 0000000..36aa292
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/formats.c
@@ -0,0 +1,3538 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ * Copyright (c) 2008-2009  VMware, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#include "imports.h"
+#include "formats.h"
+#include "macros.h"
+#include "glformats.h"
+
+
+/**
+ * Information about texture formats.
+ */
+struct gl_format_info
+{
+   mesa_format Name;
+
+   /** text name for debugging */
+   const char *StrName;
+
+   /**
+    * Base format is one of GL_RED, GL_RG, GL_RGB, GL_RGBA, GL_ALPHA,
+    * GL_LUMINANCE, GL_LUMINANCE_ALPHA, GL_INTENSITY, GL_YCBCR_MESA,
+    * GL_DEPTH_COMPONENT, GL_STENCIL_INDEX, GL_DEPTH_STENCIL, GL_DUDV_ATI.
+    */
+   GLenum BaseFormat;
+
+   /**
+    * Logical data type: one of  GL_UNSIGNED_NORMALIZED, GL_SIGNED_NORMALIZED,
+    * GL_UNSIGNED_INT, GL_INT, GL_FLOAT.
+    */
+   GLenum DataType;
+
+   GLubyte RedBits;
+   GLubyte GreenBits;
+   GLubyte BlueBits;
+   GLubyte AlphaBits;
+   GLubyte LuminanceBits;
+   GLubyte IntensityBits;
+   GLubyte IndexBits;
+   GLubyte DepthBits;
+   GLubyte StencilBits;
+
+   /**
+    * Block dimensions for compressed formats.  For uncompressed formats,
+    * BlockWidth = BlockHeight = 1.
+    */
+   GLubyte BlockWidth, BlockHeight;
+   GLubyte BytesPerBlock;
+};
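+
+/*
+ * Sizing an image from these fields (a sketch):
+ *
+ *    blocks_wide = (width  + BlockWidth  - 1) / BlockWidth;
+ *    blocks_high = (height + BlockHeight - 1) / BlockHeight;
+ *    image_bytes = blocks_wide * blocks_high * BytesPerBlock;
+ *
+ * For uncompressed formats this reduces to width * height * BytesPerBlock.
+ */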
+
+
+/**
+ * Info about each format.
+ * These must be in the same order as the MESA_FORMAT_* enums so that
+ * we can do lookups without searching.
+ */
+static struct gl_format_info format_info[MESA_FORMAT_COUNT] =
+{
+   /* Packed unorm formats */
+   {
+      MESA_FORMAT_NONE,            /* Name */
+      "MESA_FORMAT_NONE",          /* StrName */
+      GL_NONE,                     /* BaseFormat */
+      GL_NONE,                     /* DataType */
+      0, 0, 0, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      0, 0, 0                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_A8B8G8R8_UNORM,  /* Name */
+      "MESA_FORMAT_A8B8G8R8_UNORM",/* StrName */
+      GL_RGBA,                     /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      8, 8, 8, 8,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_X8B8G8R8_UNORM,  /* Name */
+      "MESA_FORMAT_X8B8G8R8_UNORM",/* StrName */
+      GL_RGB,                      /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      8, 8, 8, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_R8G8B8A8_UNORM,  /* Name */
+      "MESA_FORMAT_R8G8B8A8_UNORM",/* StrName */
+      GL_RGBA,                     /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      8, 8, 8, 8,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_R8G8B8X8_UNORM,  /* Name */
+      "MESA_FORMAT_R8G8B8X8_UNORM",/* StrName */
+      GL_RGB,                      /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      8, 8, 8, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_B8G8R8A8_UNORM,  /* Name */
+      "MESA_FORMAT_B8G8R8A8_UNORM",/* StrName */
+      GL_RGBA,                     /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      8, 8, 8, 8,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_B8G8R8X8_UNORM,  /* Name */
+      "MESA_FORMAT_B8G8R8X8_UNORM",/* StrName */
+      GL_RGB,                      /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      8, 8, 8, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_A8R8G8B8_UNORM,  /* Name */
+      "MESA_FORMAT_A8R8G8B8_UNORM",/* StrName */
+      GL_RGBA,                     /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      8, 8, 8, 8,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_X8R8G8B8_UNORM,  /* Name */
+      "MESA_FORMAT_X8R8G8B8_UNORM",/* StrName */
+      GL_RGB,                      /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      8, 8, 8, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_L16A16_UNORM,    /* Name */
+      "MESA_FORMAT_L16A16_UNORM",  /* StrName */
+      GL_LUMINANCE_ALPHA,          /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 16,                 /* Red/Green/Blue/AlphaBits */
+      16, 0, 0, 0, 0,              /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_A16L16_UNORM,    /* Name */
+      "MESA_FORMAT_A16L16_UNORM",  /* StrName */
+      GL_LUMINANCE_ALPHA,          /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 16,                 /* Red/Green/Blue/AlphaBits */
+      16, 0, 0, 0, 0,              /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_B5G6R5_UNORM,    /* Name */
+      "MESA_FORMAT_B5G6R5_UNORM",  /* StrName */
+      GL_RGB,                      /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      5, 6, 5, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_R5G6B5_UNORM,    /* Name */
+      "MESA_FORMAT_R5G6B5_UNORM",  /* StrName */
+      GL_RGB,                      /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      5, 6, 5, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_B4G4R4A4_UNORM,  /* Name */
+      "MESA_FORMAT_B4G4R4A4_UNORM",/* StrName */
+      GL_RGBA,                     /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      4, 4, 4, 4,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_B4G4R4X4_UNORM,
+      "MESA_FORMAT_B4G4R4X4_UNORM",
+      GL_RGB,
+      GL_UNSIGNED_NORMALIZED,
+      4, 4, 4, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_A4R4G4B4_UNORM,  /* Name */
+      "MESA_FORMAT_A4R4G4B4_UNORM",/* StrName */
+      GL_RGBA,                     /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      4, 4, 4, 4,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_A1B5G5R5_UNORM,  /* Name */
+      "MESA_FORMAT_A1B5G5R5_UNORM",/* StrName */
+      GL_RGBA,                     /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      5, 5, 5, 1,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_B5G5R5A1_UNORM,  /* Name */
+      "MESA_FORMAT_B5G5R5A1_UNORM",/* StrName */
+      GL_RGBA,                     /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      5, 5, 5, 1,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_B5G5R5X1_UNORM,
+      "MESA_FORMAT_B5G5R5X1_UNORM",
+      GL_RGB,
+      GL_UNSIGNED_NORMALIZED,
+      5, 5, 5, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_A1R5G5B5_UNORM,  /* Name */
+      "MESA_FORMAT_A1R5G5B5_UNORM",/* StrName */
+      GL_RGBA,                     /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      5, 5, 5, 1,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_L8A8_UNORM,      /* Name */
+      "MESA_FORMAT_L8A8_UNORM",    /* StrName */
+      GL_LUMINANCE_ALPHA,          /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 8,                  /* Red/Green/Blue/AlphaBits */
+      8, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_A8L8_UNORM,      /* Name */
+      "MESA_FORMAT_A8L8_UNORM",    /* StrName */
+      GL_LUMINANCE_ALPHA,          /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 8,                  /* Red/Green/Blue/AlphaBits */
+      8, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_R8G8_UNORM,
+      "MESA_FORMAT_R8G8_UNORM",
+      GL_RG,
+      GL_UNSIGNED_NORMALIZED,
+      8, 8, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_G8R8_UNORM,
+      "MESA_FORMAT_G8R8_UNORM",
+      GL_RG,
+      GL_UNSIGNED_NORMALIZED,
+      8, 8, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_L4A4_UNORM,      /* Name */
+      "MESA_FORMAT_L4A4_UNORM",    /* StrName */
+      GL_LUMINANCE_ALPHA,          /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 4,                  /* Red/Green/Blue/AlphaBits */
+      4, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 1                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_B2G3R3_UNORM,    /* Name */
+      "MESA_FORMAT_B2G3R3_UNORM",  /* StrName */
+      GL_RGB,                      /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      3, 3, 2, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 1                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_R16G16_UNORM,
+      "MESA_FORMAT_R16G16_UNORM",
+      GL_RG,
+      GL_UNSIGNED_NORMALIZED,
+      16, 16, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_G16R16_UNORM,
+      "MESA_FORMAT_G16R16_UNORM",
+      GL_RG,
+      GL_UNSIGNED_NORMALIZED,
+      16, 16, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_B10G10R10A2_UNORM,
+      "MESA_FORMAT_B10G10R10A2_UNORM",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,
+      10, 10, 10, 2,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_B10G10R10X2_UNORM,
+      "MESA_FORMAT_B10G10R10X2_UNORM",
+      GL_RGB,
+      GL_UNSIGNED_NORMALIZED,
+      10, 10, 10, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_R10G10B10A2_UNORM,
+      "MESA_FORMAT_R10G10B10A2_UNORM",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,
+      10, 10, 10, 2,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_S8_UINT_Z24_UNORM,   /* Name */
+      "MESA_FORMAT_S8_UINT_Z24_UNORM", /* StrName */
+      GL_DEPTH_STENCIL,                /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,          /* DataType */
+      0, 0, 0, 0,                      /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 24, 8,                  /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                          /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_X8_UINT_Z24_UNORM,   /* Name */
+      "MESA_FORMAT_X8_UINT_Z24_UNORM", /* StrName */
+      GL_DEPTH_COMPONENT,              /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,          /* DataType */
+      0, 0, 0, 0,                      /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 24, 0,                  /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                          /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_Z24_UNORM_S8_UINT,   /* Name */
+      "MESA_FORMAT_Z24_UNORM_S8_UINT", /* StrName */
+      GL_DEPTH_STENCIL,                /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,          /* DataType */
+      0, 0, 0, 0,                      /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 24, 8,                  /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                          /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_Z24_UNORM_X8_UINT,   /* Name */
+      "MESA_FORMAT_Z24_UNORM_X8_UINT", /* StrName */
+      GL_DEPTH_COMPONENT,              /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,          /* DataType */
+      0, 0, 0, 0,                      /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 24, 0,                  /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                          /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_YCBCR,           /* Name */
+      "MESA_FORMAT_YCBCR",         /* StrName */
+      GL_YCBCR_MESA,               /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_YCBCR_REV,       /* Name */
+      "MESA_FORMAT_YCBCR_REV",     /* StrName */
+      GL_YCBCR_MESA,               /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+
+   /* Array unorm formats */
+   {
+      MESA_FORMAT_DUDV8,
+      "MESA_FORMAT_DUDV8",
+      GL_DUDV_ATI,
+      GL_SIGNED_NORMALIZED,
+      0, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_A_UNORM8,        /* Name */
+      "MESA_FORMAT_A_UNORM8",      /* StrName */
+      GL_ALPHA,                    /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 8,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 1                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_A_UNORM16,       /* Name */
+      "MESA_FORMAT_A_UNORM16",     /* StrName */
+      GL_ALPHA,                    /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 16,                 /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_L_UNORM8,        /* Name */
+      "MESA_FORMAT_L_UNORM8",      /* StrName */
+      GL_LUMINANCE,                /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 0,                  /* Red/Green/Blue/AlphaBits */
+      8, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 1                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_L_UNORM16,       /* Name */
+      "MESA_FORMAT_L_UNORM16",     /* StrName */
+      GL_LUMINANCE,                /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 0,                  /* Red/Green/Blue/AlphaBits */
+      16, 0, 0, 0, 0,              /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_I_UNORM8,        /* Name */
+      "MESA_FORMAT_I_UNORM8",      /* StrName */
+      GL_INTENSITY,                /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 8, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 1                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_I_UNORM16,       /* Name */
+      "MESA_FORMAT_I_UNORM16",     /* StrName */
+      GL_INTENSITY,                /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 16, 0, 0, 0,              /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_R_UNORM8,
+      "MESA_FORMAT_R_UNORM8",
+      GL_RED,
+      GL_UNSIGNED_NORMALIZED,
+      8, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_R_UNORM16,
+      "MESA_FORMAT_R_UNORM16",
+      GL_RED,
+      GL_UNSIGNED_NORMALIZED,
+      16, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_BGR_UNORM8,      /* Name */
+      "MESA_FORMAT_BGR_UNORM8",    /* StrName */
+      GL_RGB,                      /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      8, 8, 8, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 3                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_RGB_UNORM8,      /* Name */
+      "MESA_FORMAT_RGB_UNORM8",    /* StrName */
+      GL_RGB,                      /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      8, 8, 8, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 3                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_RGBA_UNORM16,
+      "MESA_FORMAT_RGBA_UNORM16",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,
+      16, 16, 16, 16,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_RGBX_UNORM16,
+      "MESA_FORMAT_RGBX_UNORM16",
+      GL_RGB,
+      GL_UNSIGNED_NORMALIZED,
+      16, 16, 16, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_Z_UNORM16,       /* Name */
+      "MESA_FORMAT_Z_UNORM16",     /* StrName */
+      GL_DEPTH_COMPONENT,          /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 16, 0,              /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 2                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_Z_UNORM32,       /* Name */
+      "MESA_FORMAT_Z_UNORM32",     /* StrName */
+      GL_DEPTH_COMPONENT,          /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      0, 0, 0, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 32, 0,              /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                      /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_S_UINT8,         /* Name */
+      "MESA_FORMAT_S_UINT8",       /* StrName */
+      GL_STENCIL_INDEX,            /* BaseFormat */
+      GL_UNSIGNED_INT,             /* DataType */
+      0, 0, 0, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 8,               /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 1                      /* BlockWidth/Height,Bytes */
+   },
+
+   /* Packed signed/normalized formats */
+   {
+      MESA_FORMAT_A8B8G8R8_SNORM,
+      "MESA_FORMAT_A8B8G8R8_SNORM",
+      GL_RGBA,
+      GL_SIGNED_NORMALIZED,
+      8, 8, 8, 8,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_X8B8G8R8_SNORM,
+      "MESA_FORMAT_X8B8G8R8_SNORM",
+      GL_RGB,
+      GL_SIGNED_NORMALIZED,
+      8, 8, 8, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4                       /* 4 bpp, but no alpha */
+   },
+   {
+      MESA_FORMAT_R8G8B8A8_SNORM,
+      "MESA_FORMAT_R8G8B8A8_SNORM",
+      GL_RGBA,
+      GL_SIGNED_NORMALIZED,
+      8, 8, 8, 8,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_R8G8B8X8_SNORM,
+      "MESA_FORMAT_R8G8B8X8_SNORM",
+      GL_RGB,
+      GL_SIGNED_NORMALIZED,
+      8, 8, 8, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_R16G16_SNORM,
+      "MESA_FORMAT_R16G16_SNORM",
+      GL_RG,
+      GL_SIGNED_NORMALIZED,
+      16, 16, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_G16R16_SNORM,
+      "MESA_FORMAT_G16R16_SNORM",
+      GL_RG,
+      GL_SIGNED_NORMALIZED,
+      16, 16, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_R8G8_SNORM,
+      "MESA_FORMAT_R8G8_SNORM",
+      GL_RG,
+      GL_SIGNED_NORMALIZED,
+      8, 8, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_G8R8_SNORM,
+      "MESA_FORMAT_G8R8_SNORM",
+      GL_RG,
+      GL_SIGNED_NORMALIZED,
+      8, 8, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_L8A8_SNORM,
+      "MESA_FORMAT_L8A8_SNORM",
+      GL_LUMINANCE_ALPHA,
+      GL_SIGNED_NORMALIZED,
+      0, 0, 0, 8,
+      8, 0, 0, 0, 0,
+      1, 1, 2
+   },
+
+   /* Array signed/normalized formats */
+   {
+      MESA_FORMAT_A_SNORM8,
+      "MESA_FORMAT_A_SNORM8",
+      GL_ALPHA,
+      GL_SIGNED_NORMALIZED,
+      0, 0, 0, 8,
+      0, 0, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_A_SNORM16,
+      "MESA_FORMAT_A_SNORM16",
+      GL_ALPHA,
+      GL_SIGNED_NORMALIZED,
+      0, 0, 0, 16,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_L_SNORM8,
+      "MESA_FORMAT_L_SNORM8",
+      GL_LUMINANCE,
+      GL_SIGNED_NORMALIZED,
+      0, 0, 0, 0,
+      8, 0, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_L_SNORM16,
+      "MESA_FORMAT_L_SNORM16",
+      GL_LUMINANCE,
+      GL_SIGNED_NORMALIZED,
+      0, 0, 0, 0,
+      16, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_I_SNORM8,
+      "MESA_FORMAT_I_SNORM8",
+      GL_INTENSITY,
+      GL_SIGNED_NORMALIZED,
+      0, 0, 0, 0,
+      0, 8, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_I_SNORM16,
+      "MESA_FORMAT_I_SNORM16",
+      GL_INTENSITY,
+      GL_SIGNED_NORMALIZED,
+      0, 0, 0, 0,
+      0, 16, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_R_SNORM8,         /* Name */
+      "MESA_FORMAT_R_SNORM8",       /* StrName */
+      GL_RED,                       /* BaseFormat */
+      GL_SIGNED_NORMALIZED,         /* DataType */
+      8, 0, 0, 0,                   /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,                /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 1                       /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_R_SNORM16,
+      "MESA_FORMAT_R_SNORM16",
+      GL_RED,
+      GL_SIGNED_NORMALIZED,
+      16, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_LA_SNORM16,
+      "MESA_FORMAT_LA_SNORM16",
+      GL_LUMINANCE_ALPHA,
+      GL_SIGNED_NORMALIZED,
+      0, 0, 0, 16,
+      16, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_RGB_SNORM16,
+      "MESA_FORMAT_RGB_SNORM16",
+      GL_RGB,
+      GL_SIGNED_NORMALIZED,
+      16, 16, 16, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 6
+   },
+   {
+      MESA_FORMAT_RGBA_SNORM16,
+      "MESA_FORMAT_RGBA_SNORM16",
+      GL_RGBA,
+      GL_SIGNED_NORMALIZED,
+      16, 16, 16, 16,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_RGBX_SNORM16,
+      "MESA_FORMAT_RGBX_SNORM16",
+      GL_RGB,
+      GL_SIGNED_NORMALIZED,
+      16, 16, 16, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+
+   /* Packed sRGB formats */
+   {
+      MESA_FORMAT_A8B8G8R8_SRGB,
+      "MESA_FORMAT_A8B8G8R8_SRGB",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,    
+      8, 8, 8, 8,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_B8G8R8A8_SRGB,
+      "MESA_FORMAT_B8G8R8A8_SRGB",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,    
+      8, 8, 8, 8,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_B8G8R8X8_SRGB,
+      "MESA_FORMAT_B8G8R8X8_SRGB",
+      GL_RGB,
+      GL_UNSIGNED_NORMALIZED,
+      8, 8, 8, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_R8G8B8A8_SRGB,
+      "MESA_FORMAT_R8G8B8A8_SRGB",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,    
+      8, 8, 8, 8,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_R8G8B8X8_SRGB,
+      "MESA_FORMAT_R8G8B8X8_SRGB",
+      GL_RGB,
+      GL_UNSIGNED_NORMALIZED,
+      8, 8, 8, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_L8A8_SRGB,
+      "MESA_FORMAT_L8A8_SRGB",
+      GL_LUMINANCE_ALPHA,
+      GL_UNSIGNED_NORMALIZED,    
+      0, 0, 0, 8,
+      8, 0, 0, 0, 0,
+      1, 1, 2
+   },
+
+   /* Array sRGB formats */
+   {
+      MESA_FORMAT_L_SRGB8,
+      "MESA_FORMAT_L_SRGB8",
+      GL_LUMINANCE,
+      GL_UNSIGNED_NORMALIZED,    
+      0, 0, 0, 0,
+      8, 0, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_BGR_SRGB8,
+      "MESA_FORMAT_BGR_SRGB8",
+      GL_RGB,
+      GL_UNSIGNED_NORMALIZED,
+      8, 8, 8, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 3
+   },
+
+   /* Packed float formats */
+   {
+      MESA_FORMAT_R9G9B9E5_FLOAT,
+      "MESA_FORMAT_RGB9_E5",
+      GL_RGB,
+      GL_FLOAT,
+      9, 9, 9, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_R11G11B10_FLOAT,
+      "MESA_FORMAT_R11G11B10_FLOAT",
+      GL_RGB,
+      GL_FLOAT,
+      11, 11, 10, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_Z32_FLOAT_S8X24_UINT,   /* Name */
+      "MESA_FORMAT_Z32_FLOAT_S8X24_UINT", /* StrName */
+      GL_DEPTH_STENCIL,                   /* BaseFormat */
+      /* DataType here is used to answer GL_TEXTURE_DEPTH_TYPE queries, and is
+       * never used for stencil because stencil is always GL_UNSIGNED_INT.
+       */
+      GL_FLOAT,                    /* DataType */
+      0, 0, 0, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 32, 8,              /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 8                      /* BlockWidth/Height,Bytes */
+   },
+
+   /* Array float formats */
+   {
+      MESA_FORMAT_A_FLOAT16,
+      "MESA_FORMAT_A_FLOAT16",
+      GL_ALPHA,
+      GL_FLOAT,
+      0, 0, 0, 16,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_A_FLOAT32,
+      "MESA_FORMAT_A_FLOAT32",
+      GL_ALPHA,
+      GL_FLOAT,
+      0, 0, 0, 32,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_L_FLOAT16,
+      "MESA_FORMAT_L_FLOAT16",
+      GL_LUMINANCE,
+      GL_FLOAT,
+      0, 0, 0, 0,
+      16, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_L_FLOAT32,
+      "MESA_FORMAT_L_FLOAT32",
+      GL_LUMINANCE,
+      GL_FLOAT,
+      0, 0, 0, 0,
+      32, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_LA_FLOAT16,
+      "MESA_FORMAT_LA_FLOAT16",
+      GL_LUMINANCE_ALPHA,
+      GL_FLOAT,
+      0, 0, 0, 16,
+      16, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_LA_FLOAT32,
+      "MESA_FORMAT_LA_FLOAT32",
+      GL_LUMINANCE_ALPHA,
+      GL_FLOAT,
+      0, 0, 0, 32,
+      32, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_I_FLOAT16,
+      "MESA_FORMAT_I_FLOAT16",
+      GL_INTENSITY,
+      GL_FLOAT,
+      0, 0, 0, 0,
+      0, 16, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_I_FLOAT32,
+      "MESA_FORMAT_I_FLOAT32",
+      GL_INTENSITY,
+      GL_FLOAT,
+      0, 0, 0, 0,
+      0, 32, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_R_FLOAT16,
+      "MESA_FORMAT_R_FLOAT16",
+      GL_RED,
+      GL_FLOAT,
+      16, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_R_FLOAT32,
+      "MESA_FORMAT_R_FLOAT32",
+      GL_RED,
+      GL_FLOAT,
+      32, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_RG_FLOAT16,
+      "MESA_FORMAT_RG_FLOAT16",
+      GL_RG,
+      GL_FLOAT,
+      16, 16, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_RG_FLOAT32,
+      "MESA_FORMAT_RG_FLOAT32",
+      GL_RG,
+      GL_FLOAT,
+      32, 32, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_RGB_FLOAT16,
+      "MESA_FORMAT_RGB_FLOAT16",
+      GL_RGB,
+      GL_FLOAT,
+      16, 16, 16, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 6
+   },
+   {
+      MESA_FORMAT_RGB_FLOAT32,
+      "MESA_FORMAT_RGB_FLOAT32",
+      GL_RGB,
+      GL_FLOAT,
+      32, 32, 32, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 12
+   },
+   {
+      MESA_FORMAT_RGBA_FLOAT16,
+      "MESA_FORMAT_RGBA_FLOAT16",
+      GL_RGBA,
+      GL_FLOAT,
+      16, 16, 16, 16,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_RGBA_FLOAT32,
+      "MESA_FORMAT_RGBA_FLOAT32",
+      GL_RGBA,
+      GL_FLOAT,
+      32, 32, 32, 32,
+      0, 0, 0, 0, 0,
+      1, 1, 16
+   },
+   {
+      MESA_FORMAT_RGBX_FLOAT16,
+      "MESA_FORMAT_RGBX_FLOAT16",
+      GL_RGB,
+      GL_FLOAT,
+      16, 16, 16, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_RGBX_FLOAT32,
+      "MESA_FORMAT_RGBX_FLOAT32",
+      GL_RGB,
+      GL_FLOAT,
+      32, 32, 32, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 16
+   },
+   {
+      MESA_FORMAT_Z_FLOAT32,       /* Name */
+      "MESA_FORMAT_Z_FLOAT32",     /* StrName */
+      GL_DEPTH_COMPONENT,          /* BaseFormat */
+      GL_FLOAT,                    /* DataType */
+      0, 0, 0, 0,                  /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 32, 0,              /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                      /* BlockWidth/Height,Bytes */
+   },
+
+   /* Packed signed/unsigned non-normalized integer formats */
+   {
+      MESA_FORMAT_B10G10R10A2_UINT,
+      "MESA_FORMAT_B10G10R10A2_UINT",
+      GL_RGBA,
+      GL_UNSIGNED_INT,
+      10, 10, 10, 2,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_R10G10B10A2_UINT,
+      "MESA_FORMAT_R10G10B10A2_UINT",
+      GL_RGBA,
+      GL_UNSIGNED_INT,
+      10, 10, 10, 2,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+
+   /* Array signed/unsigned non-normalized integer formats */
+   {
+      MESA_FORMAT_A_UINT8,
+      "MESA_FORMAT_A_UINT8",
+      GL_ALPHA,
+      GL_UNSIGNED_INT,
+      0, 0, 0, 8,
+      0, 0, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_A_UINT16,
+      "MESA_FORMAT_A_UINT16",
+      GL_ALPHA,
+      GL_UNSIGNED_INT,
+      0, 0, 0, 16,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_A_UINT32,
+      "MESA_FORMAT_A_UINT32",
+      GL_ALPHA,
+      GL_UNSIGNED_INT,
+      0, 0, 0, 32,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_A_SINT8,
+      "MESA_FORMAT_A_SINT8",
+      GL_ALPHA,
+      GL_INT,
+      0, 0, 0, 8,
+      0, 0, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_A_SINT16,
+      "MESA_FORMAT_A_SINT16",
+      GL_ALPHA,
+      GL_INT,
+      0, 0, 0, 16,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_A_SINT32,
+      "MESA_FORMAT_A_SINT32",
+      GL_ALPHA,
+      GL_INT,
+      0, 0, 0, 32,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_I_UINT8,
+      "MESA_FORMAT_I_UINT8",
+      GL_INTENSITY,
+      GL_UNSIGNED_INT,
+      0, 0, 0, 0,
+      0, 8, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_I_UINT16,
+      "MESA_FORMAT_I_UINT16",
+      GL_INTENSITY,
+      GL_UNSIGNED_INT,
+      0, 0, 0, 0,
+      0, 16, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_I_UINT32,
+      "MESA_FORMAT_I_UINT32",
+      GL_INTENSITY,
+      GL_UNSIGNED_INT,
+      0, 0, 0, 0,
+      0, 32, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_I_SINT8,
+      "MESA_FORMAT_I_SINT8",
+      GL_INTENSITY,
+      GL_INT,
+      0, 0, 0, 0,
+      0, 8, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_I_SINT16,
+      "MESA_FORMAT_I_SINT16",
+      GL_INTENSITY,
+      GL_INT,
+      0, 0, 0, 0,
+      0, 16, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_I_SINT32,
+      "MESA_FORMAT_I_SINT32",
+      GL_INTENSITY,
+      GL_INT,
+      0, 0, 0, 0,
+      0, 32, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_L_UINT8,
+      "MESA_FORMAT_L_UINT8",
+      GL_LUMINANCE,
+      GL_UNSIGNED_INT,
+      0, 0, 0, 0,
+      8, 0, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_L_UINT16,
+      "MESA_FORMAT_L_UINT16",
+      GL_LUMINANCE,
+      GL_UNSIGNED_INT,
+      0, 0, 0, 0,
+      16, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_L_UINT32,
+      "MESA_FORMAT_L_UINT32",
+      GL_LUMINANCE,
+      GL_UNSIGNED_INT,
+      0, 0, 0, 0,
+      32, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_L_SINT8,
+      "MESA_FORMAT_L_SINT8",
+      GL_LUMINANCE,
+      GL_INT,
+      0, 0, 0, 0,
+      8, 0, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_L_SINT16,
+      "MESA_FORMAT_L_SINT16",
+      GL_LUMINANCE,
+      GL_INT,
+      0, 0, 0, 0,
+      16, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_L_SINT32,
+      "MESA_FORMAT_L_SINT32",
+      GL_LUMINANCE,
+      GL_INT,
+      0, 0, 0, 0,
+      32, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_LA_UINT8,
+      "MESA_FORMAT_LA_UINT8",
+      GL_LUMINANCE_ALPHA,
+      GL_UNSIGNED_INT,
+      0, 0, 0, 8,
+      8, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_LA_UINT16,
+      "MESA_FORMAT_LA_UINT16",
+      GL_LUMINANCE_ALPHA,
+      GL_UNSIGNED_INT,
+      0, 0, 0, 16,
+      16, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_LA_UINT32,
+      "MESA_FORMAT_LA_UINT32",
+      GL_LUMINANCE_ALPHA,
+      GL_UNSIGNED_INT,
+      0, 0, 0, 32,
+      32, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_LA_SINT8,
+      "MESA_FORMAT_LA_SINT8",
+      GL_LUMINANCE_ALPHA,
+      GL_INT,
+      0, 0, 0, 8,
+      8, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_LA_SINT16,
+      "MESA_FORMAT_LA_SINT16",
+      GL_LUMINANCE_ALPHA,
+      GL_INT,
+      0, 0, 0, 16,
+      16, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_LA_SINT32,
+      "MESA_FORMAT_LA_SINT32",
+      GL_LUMINANCE_ALPHA,
+      GL_INT,
+      0, 0, 0, 32,
+      32, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_R_UINT8,
+      "MESA_FORMAT_R_UINT8",
+      GL_RED,
+      GL_UNSIGNED_INT,
+      8, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_R_UINT16,
+      "MESA_FORMAT_R_UINT16",
+      GL_RED,
+      GL_UNSIGNED_INT,
+      16, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_R_UINT32,
+      "MESA_FORMAT_R_UINT32",
+      GL_RED,
+      GL_UNSIGNED_INT,
+      32, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_R_SINT8,
+      "MESA_FORMAT_R_SINT8",
+      GL_RED,
+      GL_INT,
+      8, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 1
+   },
+   {
+      MESA_FORMAT_R_SINT16,
+      "MESA_FORMAT_R_SINT16",
+      GL_RED,
+      GL_INT,
+      16, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_R_SINT32,
+      "MESA_FORMAT_R_SINT32",
+      GL_RED,
+      GL_INT,
+      32, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_RG_UINT8,
+      "MESA_FORMAT_RG_UINT8",
+      GL_RG,
+      GL_UNSIGNED_INT,
+      8, 8, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_RG_UINT16,
+      "MESA_FORMAT_RG_UINT16",
+      GL_RG,
+      GL_UNSIGNED_INT,
+      16, 16, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_RG_UINT32,
+      "MESA_FORMAT_RG_UINT32",
+      GL_RG,
+      GL_UNSIGNED_INT,
+      32, 32, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_RG_SINT8,
+      "MESA_FORMAT_RG_SINT8",
+      GL_RG,
+      GL_INT,
+      8, 8, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 2
+   },
+   {
+      MESA_FORMAT_RG_SINT16,
+      "MESA_FORMAT_RG_SINT16",
+      GL_RG,
+      GL_INT,
+      16, 16, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_RG_SINT32,
+      "MESA_FORMAT_RG_SINT32",
+      GL_RG,
+      GL_INT,
+      32, 32, 0, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_RGB_UINT8,
+      "MESA_FORMAT_RGB_UINT8",
+      GL_RGB,
+      GL_UNSIGNED_INT,
+      8, 8, 8, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 3
+   },
+   {
+      MESA_FORMAT_RGB_UINT16,
+      "MESA_FORMAT_RGB_UINT16",
+      GL_RGB,
+      GL_UNSIGNED_INT,
+      16, 16, 16, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 6
+   },
+   {
+      MESA_FORMAT_RGB_UINT32,
+      "MESA_FORMAT_RGB_UINT32",
+      GL_RGB,
+      GL_UNSIGNED_INT,
+      32, 32, 32, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 12
+   },
+   {
+      MESA_FORMAT_RGB_SINT8,
+      "MESA_FORMAT_RGB_SINT8",
+      GL_RGB,
+      GL_INT,
+      8, 8, 8, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 3
+   },
+   {
+      MESA_FORMAT_RGB_SINT16,
+      "MESA_FORMAT_RGB_SINT16",
+      GL_RGB,
+      GL_INT,
+      16, 16, 16, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 6
+   },
+   {
+      MESA_FORMAT_RGB_SINT32,
+      "MESA_FORMAT_RGB_SINT32",
+      GL_RGB,
+      GL_INT,
+      32, 32, 32, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 12
+   },
+   {
+      MESA_FORMAT_RGBA_UINT8,
+      "MESA_FORMAT_RGBA_UINT8",
+      GL_RGBA,
+      GL_UNSIGNED_INT,
+      8, 8, 8, 8,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_RGBA_UINT16,
+      "MESA_FORMAT_RGBA_UINT16",
+      GL_RGBA,
+      GL_UNSIGNED_INT,
+      16, 16, 16, 16,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_RGBA_UINT32,
+      "MESA_FORMAT_RGBA_UINT32",
+      GL_RGBA,
+      GL_UNSIGNED_INT,
+      32, 32, 32, 32,
+      0, 0, 0, 0, 0,
+      1, 1, 16
+   },
+   {
+      MESA_FORMAT_RGBA_SINT8,
+      "MESA_FORMAT_RGBA_SINT8",
+      GL_RGBA,
+      GL_INT,
+      8, 8, 8, 8,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_RGBA_SINT16,
+      "MESA_FORMAT_RGBA_SINT16",
+      GL_RGBA,
+      GL_INT,
+      16, 16, 16, 16,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_RGBA_SINT32,
+      "MESA_FORMAT_RGBA_SINT32",
+      GL_RGBA,
+      GL_INT,
+      32, 32, 32, 32,
+      0, 0, 0, 0, 0,
+      1, 1, 16
+   },
+   {
+      MESA_FORMAT_RGBX_UINT8,
+      "MESA_FORMAT_RGBX_UINT8",
+      GL_RGB,
+      GL_UNSIGNED_INT,
+      8, 8, 8, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_RGBX_UINT16,
+      "MESA_FORMAT_RGBX_UINT16",
+      GL_RGB,
+      GL_UNSIGNED_INT,
+      16, 16, 16, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_RGBX_UINT32,
+      "MESA_FORMAT_RGBX_UINT32",
+      GL_RGB,
+      GL_UNSIGNED_INT,
+      32, 32, 32, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 16
+   },
+   {
+      MESA_FORMAT_RGBX_SINT8,
+      "MESA_FORMAT_RGBX_SINT8",
+      GL_RGB,
+      GL_INT,
+      8, 8, 8, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 4
+   },
+   {
+      MESA_FORMAT_RGBX_SINT16,
+      "MESA_FORMAT_RGBX_SINT16",
+      GL_RGB,
+      GL_INT,
+      16, 16, 16, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 8
+   },
+   {
+      MESA_FORMAT_RGBX_SINT32,
+      "MESA_FORMAT_RGBX_SINT32",
+      GL_RGB,
+      GL_INT,
+      32, 32, 32, 0,
+      0, 0, 0, 0, 0,
+      1, 1, 16
+   },
+
+   /* DXT compressed formats */
+   {
+      MESA_FORMAT_RGB_DXT1,        /* Name */
+      "MESA_FORMAT_RGB_DXT1",      /* StrName */
+      GL_RGB,                      /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      4, 4, 4, 0,                  /* approx Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      4, 4, 8                      /* 8 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_RGBA_DXT1,
+      "MESA_FORMAT_RGBA_DXT1",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,    
+      4, 4, 4, 4,
+      0, 0, 0, 0, 0,
+      4, 4, 8                      /* 8 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_RGBA_DXT3,
+      "MESA_FORMAT_RGBA_DXT3",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,    
+      4, 4, 4, 4,
+      0, 0, 0, 0, 0,
+      4, 4, 16                     /* 16 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_RGBA_DXT5,
+      "MESA_FORMAT_RGBA_DXT5",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,    
+      4, 4, 4, 4,
+      0, 0, 0, 0, 0,
+      4, 4, 16                     /* 16 bytes per 4x4 block */
+   },
+
+   /* DXT sRGB compressed formats */
+   {
+      MESA_FORMAT_SRGB_DXT1,       /* Name */
+      "MESA_FORMAT_SRGB_DXT1",     /* StrName */
+      GL_RGB,                      /* BaseFormat */
+      GL_UNSIGNED_NORMALIZED,      /* DataType */
+      4, 4, 4, 0,                  /* approx Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,               /* Lum/Int/Index/Depth/StencilBits */
+      4, 4, 8                      /* 8 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_SRGBA_DXT1,
+      "MESA_FORMAT_SRGBA_DXT1",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,
+      4, 4, 4, 4,
+      0, 0, 0, 0, 0,
+      4, 4, 8                      /* 8 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_SRGBA_DXT3,
+      "MESA_FORMAT_SRGBA_DXT3",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,
+      4, 4, 4, 4,
+      0, 0, 0, 0, 0,
+      4, 4, 16                     /* 16 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_SRGBA_DXT5,
+      "MESA_FORMAT_SRGBA_DXT5",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,
+      4, 4, 4, 4,
+      0, 0, 0, 0, 0,
+      4, 4, 16                     /* 16 bytes per 4x4 block */
+   },
+
+   /* FXT1 compressed formats */
+   {
+      MESA_FORMAT_RGB_FXT1,
+      "MESA_FORMAT_RGB_FXT1",
+      GL_RGB,
+      GL_UNSIGNED_NORMALIZED,
+      4, 4, 4, 0,                  /* approx Red/Green/BlueBits */
+      0, 0, 0, 0, 0,
+      8, 4, 16                     /* 16 bytes per 8x4 block */
+   },
+   {
+      MESA_FORMAT_RGBA_FXT1,
+      "MESA_FORMAT_RGBA_FXT1",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,
+      4, 4, 4, 1,                  /* approx Red/Green/Blue/AlphaBits */
+      0, 0, 0, 0, 0,
+      8, 4, 16                     /* 16 bytes per 8x4 block */
+   },
+
+   /* RGTC compressed formats */
+   {
+     MESA_FORMAT_R_RGTC1_UNORM,
+     "MESA_FORMAT_R_RGTC1_UNORM",
+     GL_RED,
+     GL_UNSIGNED_NORMALIZED,
+     8, 0, 0, 0,
+     0, 0, 0, 0, 0,
+     4, 4, 8                     /* 8 bytes per 4x4 block */
+   },
+   {
+     MESA_FORMAT_R_RGTC1_SNORM,
+     "MESA_FORMAT_R_RGTC1_SNORM",
+     GL_RED,
+     GL_SIGNED_NORMALIZED,
+     8, 0, 0, 0,
+     0, 0, 0, 0, 0,
+     4, 4, 8                     /* 8 bytes per 4x4 block */
+   },
+   {
+     MESA_FORMAT_RG_RGTC2_UNORM,
+     "MESA_FORMAT_RG_RGTC2_UNORM",
+     GL_RG,
+     GL_UNSIGNED_NORMALIZED,
+     8, 8, 0, 0,
+     0, 0, 0, 0, 0,
+     4, 4, 16                     /* 16 bytes per 4x4 block */
+   },
+   {
+     MESA_FORMAT_RG_RGTC2_SNORM,
+     "MESA_FORMAT_RG_RGTC2_SNORM",
+     GL_RG,
+     GL_SIGNED_NORMALIZED,
+     8, 8, 0, 0,
+     0, 0, 0, 0, 0,
+     4, 4, 16                     /* 16 bytes per 4x4 block */
+   },
+
+   /* LATC1/2 compressed formats */
+   {
+     MESA_FORMAT_L_LATC1_UNORM,
+     "MESA_FORMAT_L_LATC1_UNORM",
+     GL_LUMINANCE,
+     GL_UNSIGNED_NORMALIZED,
+     0, 0, 0, 0,
+     4, 0, 0, 0, 0,
+     4, 4, 8                     /* 8 bytes per 4x4 block */
+   },
+   {
+     MESA_FORMAT_L_LATC1_SNORM,
+     "MESA_FORMAT_L_LATC1_SNORM",
+     GL_LUMINANCE,
+     GL_SIGNED_NORMALIZED,
+     0, 0, 0, 0,
+     4, 0, 0, 0, 0,
+     4, 4, 8                     /* 8 bytes per 4x4 block */
+   },
+   {
+     MESA_FORMAT_LA_LATC2_UNORM,
+     "MESA_FORMAT_LA_LATC2_UNORM",
+     GL_LUMINANCE_ALPHA,
+     GL_UNSIGNED_NORMALIZED,
+     0, 0, 0, 4,
+     4, 0, 0, 0, 0,
+     4, 4, 16                     /* 16 bytes per 4x4 block */
+   },
+   {
+     MESA_FORMAT_LA_LATC2_SNORM,
+     "MESA_FORMAT_LA_LATC2_SNORM",
+     GL_LUMINANCE_ALPHA,
+     GL_SIGNED_NORMALIZED,
+     0, 0, 0, 4,
+     4, 0, 0, 0, 0,
+     4, 4, 16                     /* 16 bytes per 4x4 block */
+   },
+
+   /* ETC1/2 compressed formats */
+   {
+      MESA_FORMAT_ETC1_RGB8,
+      "MESA_FORMAT_ETC1_RGB8",
+      GL_RGB,
+      GL_UNSIGNED_NORMALIZED,
+      8, 8, 8, 0,
+      0, 0, 0, 0, 0,
+      4, 4, 8                     /* 8 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_ETC2_RGB8,
+      "MESA_FORMAT_ETC2_RGB8",
+      GL_RGB,
+      GL_UNSIGNED_NORMALIZED,
+      8, 8, 8, 0,
+      0, 0, 0, 0, 0,
+      4, 4, 8                     /* 8 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_ETC2_SRGB8,
+      "MESA_FORMAT_ETC2_SRGB8",
+      GL_RGB,
+      GL_UNSIGNED_NORMALIZED,
+      8, 8, 8, 0,
+      0, 0, 0, 0, 0,
+      4, 4, 8                     /* 8 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_ETC2_RGBA8_EAC,
+      "MESA_FORMAT_ETC2_RGBA8_EAC",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,
+      8, 8, 8, 8,
+      0, 0, 0, 0, 0,
+      4, 4, 16                    /* 16 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC,
+      "MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,
+      8, 8, 8, 8,
+      0, 0, 0, 0, 0,
+      4, 4, 16                    /* 16 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_ETC2_R11_EAC,
+      "MESA_FORMAT_ETC2_R11_EAC",
+      GL_RED,
+      GL_UNSIGNED_NORMALIZED,
+      11, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      4, 4, 8                    /* 8 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_ETC2_RG11_EAC,
+      "MESA_FORMAT_ETC2_RG11_EAC",
+      GL_RG,
+      GL_UNSIGNED_NORMALIZED,
+      11, 11, 0, 0,
+      0, 0, 0, 0, 0,
+      4, 4, 16                    /* 16 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_ETC2_SIGNED_R11_EAC,
+      "MESA_FORMAT_ETC2_SIGNED_R11_EAC",
+      GL_RED,
+      GL_SIGNED_NORMALIZED,
+      11, 0, 0, 0,
+      0, 0, 0, 0, 0,
+      4, 4, 8                    /* 8 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_ETC2_SIGNED_RG11_EAC,
+      "MESA_FORMAT_ETC2_SIGNED_RG11_EAC",
+      GL_RG,
+      GL_SIGNED_NORMALIZED,
+      11, 11, 0, 0,
+      0, 0, 0, 0, 0,
+      4, 4, 16                    /* 16 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1,
+      "MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,
+      8, 8, 8, 1,
+      0, 0, 0, 0, 0,
+      4, 4, 8                     /* 8 bytes per 4x4 block */
+   },
+   {
+      MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1,
+      "MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1",
+      GL_RGBA,
+      GL_UNSIGNED_NORMALIZED,
+      8, 8, 8, 1,
+      0, 0, 0, 0, 0,
+      4, 4, 8                     /* 8 bytes per 4x4 block */
+   },
+};
+
+
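+/* Each row above initializes one gl_format_info entry.  The bare numeric
+ * columns are, in order: Red/Green/Blue/AlphaBits, then Luminance/
+ * Intensity/Index/Depth/StencilBits, and finally BlockWidth, BlockHeight
+ * and BytesPerBlock (ordering inferred from the accessors below).
+ */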
+
+static const struct gl_format_info *
+_mesa_get_format_info(mesa_format format)
+{
+   const struct gl_format_info *info = &format_info[format];
+   assert(info->Name == format);
+   return info;
+}
+
+
+/** Return string name of format (for debugging) */
+const char *
+_mesa_get_format_name(mesa_format format)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   return info->StrName;
+}
+
+
+
+/**
+ * Return bytes needed to store a block of pixels in the given format.
+ * Normally, a block is 1x1 (a single pixel).  But for compressed formats
+ * a block may be 4x4 or 8x4, etc.
+ *
+ * Note: not GLuint, so as not to coerce math to unsigned. cf. fdo #37351
+ */
+GLint
+_mesa_get_format_bytes(mesa_format format)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   ASSERT(info->BytesPerBlock);
+   ASSERT(info->BytesPerBlock <= MAX_PIXEL_BYTES ||
+          _mesa_is_format_compressed(format));
+   return info->BytesPerBlock;
+}
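+
+/* For example (illustrative): _mesa_get_format_bytes(MESA_FORMAT_RGBA_UNORM16)
+ * returns 8 (one 1x1 "block" of four ushorts), while
+ * _mesa_get_format_bytes(MESA_FORMAT_RGB_DXT1) also returns 8, but for a
+ * whole 4x4 block.
+ */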
+
+
+/**
+ * Return bits per component for the given format.
+ * \param format  one of MESA_FORMAT_x
+ * \param pname  the component, such as GL_RED_BITS, GL_TEXTURE_BLUE_SIZE, etc.
+ */
+GLint
+_mesa_get_format_bits(mesa_format format, GLenum pname)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+
+   switch (pname) {
+   case GL_RED_BITS:
+   case GL_TEXTURE_RED_SIZE:
+   case GL_RENDERBUFFER_RED_SIZE_EXT:
+   case GL_FRAMEBUFFER_ATTACHMENT_RED_SIZE:
+      return info->RedBits;
+   case GL_GREEN_BITS:
+   case GL_TEXTURE_GREEN_SIZE:
+   case GL_RENDERBUFFER_GREEN_SIZE_EXT:
+   case GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE:
+      return info->GreenBits;
+   case GL_BLUE_BITS:
+   case GL_TEXTURE_BLUE_SIZE:
+   case GL_RENDERBUFFER_BLUE_SIZE_EXT:
+   case GL_FRAMEBUFFER_ATTACHMENT_BLUE_SIZE:
+      return info->BlueBits;
+   case GL_ALPHA_BITS:
+   case GL_TEXTURE_ALPHA_SIZE:
+   case GL_RENDERBUFFER_ALPHA_SIZE_EXT:
+   case GL_FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE:
+      return info->AlphaBits;
+   case GL_TEXTURE_INTENSITY_SIZE:
+      return info->IntensityBits;
+   case GL_TEXTURE_LUMINANCE_SIZE:
+      return info->LuminanceBits;
+   case GL_INDEX_BITS:
+      return info->IndexBits;
+   case GL_DEPTH_BITS:
+   case GL_TEXTURE_DEPTH_SIZE_ARB:
+   case GL_RENDERBUFFER_DEPTH_SIZE_EXT:
+   case GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE:
+      return info->DepthBits;
+   case GL_STENCIL_BITS:
+   case GL_TEXTURE_STENCIL_SIZE_EXT:
+   case GL_RENDERBUFFER_STENCIL_SIZE_EXT:
+   case GL_FRAMEBUFFER_ATTACHMENT_STENCIL_SIZE:
+      return info->StencilBits;
+   default:
+      _mesa_problem(NULL, "bad pname in _mesa_get_format_bits()");
+      return 0;
+   }
+}
+
+
+GLuint
+_mesa_get_format_max_bits(mesa_format format)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   GLuint max = MAX2(info->RedBits, info->GreenBits);
+   max = MAX2(max, info->BlueBits);
+   max = MAX2(max, info->AlphaBits);
+   max = MAX2(max, info->LuminanceBits);
+   max = MAX2(max, info->IntensityBits);
+   max = MAX2(max, info->DepthBits);
+   max = MAX2(max, info->StencilBits);
+   return max;
+}
+
+
+/**
+ * Return the data type (or more specifically, the data representation)
+ * for the given format.
+ * The return value will be one of:
+ *    GL_UNSIGNED_NORMALIZED = unsigned int representing [0,1]
+ *    GL_SIGNED_NORMALIZED = signed int representing [-1, 1]
+ *    GL_UNSIGNED_INT = an ordinary unsigned integer
+ *    GL_INT = an ordinary signed integer
+ *    GL_FLOAT = an ordinary float
+ */
+GLenum
+_mesa_get_format_datatype(mesa_format format)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   return info->DataType;
+}
+
+
+/**
+ * Return the basic format for the given type.  The result will be one of
+ * GL_RGB, GL_RGBA, GL_ALPHA, GL_LUMINANCE, GL_LUMINANCE_ALPHA, GL_INTENSITY,
+ * GL_YCBCR_MESA, GL_DEPTH_COMPONENT, GL_STENCIL_INDEX, GL_DEPTH_STENCIL.
+ */
+GLenum
+_mesa_get_format_base_format(mesa_format format)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   return info->BaseFormat;
+}
+
+
+/**
+ * Return the block size (in pixels) for the given format.  Normally
+ * the block size is 1x1.  But compressed formats will have block sizes
+ * of 4x4 or 8x4 pixels, etc.
+ * \param bw  returns block width in pixels
+ * \param bh  returns block height in pixels
+ */
+void
+_mesa_get_format_block_size(mesa_format format, GLuint *bw, GLuint *bh)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   *bw = info->BlockWidth;
+   *bh = info->BlockHeight;
+}
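+
+/* Illustrative use, mirroring the arithmetic in _mesa_format_image_size()
+ * below (width/height here are hypothetical caller variables):
+ *
+ *    GLuint bw, bh;
+ *    _mesa_get_format_block_size(MESA_FORMAT_RGBA_DXT5, &bw, &bh);  // 4, 4
+ *    GLuint wblocks = (width + bw - 1) / bw;
+ *    GLuint hblocks = (height + bh - 1) / bh;
+ */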
+
+
+/** Is the given format a compressed format? */
+GLboolean
+_mesa_is_format_compressed(mesa_format format)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   return info->BlockWidth > 1 || info->BlockHeight > 1;
+}
+
+
+/**
+ * Determine if the given format represents a packed depth/stencil buffer.
+ */
+GLboolean
+_mesa_is_format_packed_depth_stencil(mesa_format format)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+
+   return info->BaseFormat == GL_DEPTH_STENCIL;
+}
+
+
+/**
+ * Is the given format a signed/unsigned integer color format?
+ */
+GLboolean
+_mesa_is_format_integer_color(mesa_format format)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   return (info->DataType == GL_INT || info->DataType == GL_UNSIGNED_INT) &&
+      info->BaseFormat != GL_DEPTH_COMPONENT &&
+      info->BaseFormat != GL_DEPTH_STENCIL &&
+      info->BaseFormat != GL_STENCIL_INDEX;
+}
+
+
+/**
+ * Is the given format an unsigned integer format?
+ */
+GLboolean
+_mesa_is_format_unsigned(mesa_format format)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   return _mesa_is_type_unsigned(info->DataType);
+}
+
+
+/**
+ * Does the given format store signed values?
+ */
+GLboolean
+_mesa_is_format_signed(mesa_format format)
+{
+   if (format == MESA_FORMAT_R11G11B10_FLOAT ||
+       format == MESA_FORMAT_R9G9B9E5_FLOAT) {
+      /* these packed float formats only store unsigned values */
+      return GL_FALSE;
+   }
+   else {
+      const struct gl_format_info *info = _mesa_get_format_info(format);
+      return (info->DataType == GL_SIGNED_NORMALIZED ||
+              info->DataType == GL_INT ||
+              info->DataType == GL_FLOAT);
+   }
+}
+
+/**
+ * Is the given format an integer format?
+ */
+GLboolean
+_mesa_is_format_integer(mesa_format format)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   return (info->DataType == GL_INT || info->DataType == GL_UNSIGNED_INT);
+}
+
+/**
+ * Return color encoding for given format.
+ * \return GL_LINEAR or GL_SRGB
+ */
+GLenum
+_mesa_get_format_color_encoding(mesa_format format)
+{
+   /* XXX this info should be encoded in gl_format_info */
+   switch (format) {
+   case MESA_FORMAT_BGR_SRGB8:
+   case MESA_FORMAT_A8B8G8R8_SRGB:
+   case MESA_FORMAT_B8G8R8A8_SRGB:
+   case MESA_FORMAT_R8G8B8A8_SRGB:
+   case MESA_FORMAT_L_SRGB8:
+   case MESA_FORMAT_L8A8_SRGB:
+   case MESA_FORMAT_SRGB_DXT1:
+   case MESA_FORMAT_SRGBA_DXT1:
+   case MESA_FORMAT_SRGBA_DXT3:
+   case MESA_FORMAT_SRGBA_DXT5:
+   case MESA_FORMAT_R8G8B8X8_SRGB:
+   case MESA_FORMAT_ETC2_SRGB8:
+   case MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC:
+   case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1:
+   case MESA_FORMAT_B8G8R8X8_SRGB:
+      return GL_SRGB;
+   default:
+      return GL_LINEAR;
+   }
+}
+
+
+/**
+ * For an sRGB format, return the corresponding linear color space format.
+ * For non-sRGB formats, return the format as-is.
+ */
+mesa_format
+_mesa_get_srgb_format_linear(mesa_format format)
+{
+   switch (format) {
+   case MESA_FORMAT_BGR_SRGB8:
+      format = MESA_FORMAT_BGR_UNORM8;
+      break;
+   case MESA_FORMAT_A8B8G8R8_SRGB:
+      format = MESA_FORMAT_A8B8G8R8_UNORM;
+      break;
+   case MESA_FORMAT_B8G8R8A8_SRGB:
+      format = MESA_FORMAT_B8G8R8A8_UNORM;
+      break;
+   case MESA_FORMAT_R8G8B8A8_SRGB:
+      format = MESA_FORMAT_R8G8B8A8_UNORM;
+      break;
+   case MESA_FORMAT_L_SRGB8:
+      format = MESA_FORMAT_L_UNORM8;
+      break;
+   case MESA_FORMAT_L8A8_SRGB:
+      format = MESA_FORMAT_L8A8_UNORM;
+      break;
+   case MESA_FORMAT_SRGB_DXT1:
+      format = MESA_FORMAT_RGB_DXT1;
+      break;
+   case MESA_FORMAT_SRGBA_DXT1:
+      format = MESA_FORMAT_RGBA_DXT1;
+      break;
+   case MESA_FORMAT_SRGBA_DXT3:
+      format = MESA_FORMAT_RGBA_DXT3;
+      break;
+   case MESA_FORMAT_SRGBA_DXT5:
+      format = MESA_FORMAT_RGBA_DXT5;
+      break;
+   case MESA_FORMAT_R8G8B8X8_SRGB:
+      format = MESA_FORMAT_R8G8B8X8_UNORM;
+      break;
+   case MESA_FORMAT_ETC2_SRGB8:
+      format = MESA_FORMAT_ETC2_RGB8;
+      break;
+   case MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC:
+      format = MESA_FORMAT_ETC2_RGBA8_EAC;
+      break;
+   case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1:
+      format = MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1;
+      break;
+   case MESA_FORMAT_B8G8R8X8_SRGB:
+      format = MESA_FORMAT_B8G8R8X8_UNORM;
+      break;
+   default:
+      break;
+   }
+   return format;
+}
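+
+/* Illustrative use: a caller that must filter or blend in linear space can
+ * normalize a surface format first (rb_format being the caller's surface
+ * format); this is a no-op for non-sRGB formats:
+ *
+ *    mesa_format linear = _mesa_get_srgb_format_linear(rb_format);
+ */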
+
+
+/**
+ * If the given format is a compressed format, return a corresponding
+ * uncompressed format.
+ */
+mesa_format
+_mesa_get_uncompressed_format(mesa_format format)
+{
+   switch (format) {
+   case MESA_FORMAT_RGB_FXT1:
+      return MESA_FORMAT_BGR_UNORM8;
+   case MESA_FORMAT_RGBA_FXT1:
+      return MESA_FORMAT_A8B8G8R8_UNORM;
+   case MESA_FORMAT_RGB_DXT1:
+   case MESA_FORMAT_SRGB_DXT1:
+      return MESA_FORMAT_BGR_UNORM8;
+   case MESA_FORMAT_RGBA_DXT1:
+   case MESA_FORMAT_SRGBA_DXT1:
+      return MESA_FORMAT_A8B8G8R8_UNORM;
+   case MESA_FORMAT_RGBA_DXT3:
+   case MESA_FORMAT_SRGBA_DXT3:
+      return MESA_FORMAT_A8B8G8R8_UNORM;
+   case MESA_FORMAT_RGBA_DXT5:
+   case MESA_FORMAT_SRGBA_DXT5:
+      return MESA_FORMAT_A8B8G8R8_UNORM;
+   case MESA_FORMAT_R_RGTC1_UNORM:
+      return MESA_FORMAT_R_UNORM8;
+   case MESA_FORMAT_R_RGTC1_SNORM:
+      return MESA_FORMAT_R_SNORM8;
+   case MESA_FORMAT_RG_RGTC2_UNORM:
+      return MESA_FORMAT_R8G8_UNORM;
+   case MESA_FORMAT_RG_RGTC2_SNORM:
+      return MESA_FORMAT_R8G8_SNORM;
+   case MESA_FORMAT_L_LATC1_UNORM:
+      return MESA_FORMAT_L_UNORM8;
+   case MESA_FORMAT_L_LATC1_SNORM:
+      return MESA_FORMAT_L_SNORM8;
+   case MESA_FORMAT_LA_LATC2_UNORM:
+      return MESA_FORMAT_L8A8_UNORM;
+   case MESA_FORMAT_LA_LATC2_SNORM:
+      return MESA_FORMAT_L8A8_SNORM;
+   case MESA_FORMAT_ETC1_RGB8:
+   case MESA_FORMAT_ETC2_RGB8:
+   case MESA_FORMAT_ETC2_SRGB8:
+      return MESA_FORMAT_BGR_UNORM8;
+   case MESA_FORMAT_ETC2_RGBA8_EAC:
+   case MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC:
+   case MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1:
+   case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1:
+      return MESA_FORMAT_A8B8G8R8_UNORM;
+   case MESA_FORMAT_ETC2_R11_EAC:
+   case MESA_FORMAT_ETC2_SIGNED_R11_EAC:
+      return MESA_FORMAT_R_UNORM16;
+   case MESA_FORMAT_ETC2_RG11_EAC:
+   case MESA_FORMAT_ETC2_SIGNED_RG11_EAC:
+      return MESA_FORMAT_R16G16_UNORM;
+   default:
+#ifdef DEBUG
+      assert(!_mesa_is_format_compressed(format));
+#endif
+      return format;
+   }
+}
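+
+/* Note: the substitutes above are sized to hold the decompressed texel
+ * values without loss; e.g. the 11-bit EAC channels widen to 16-bit UNORM
+ * rather than 8-bit.
+ */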
+
+
+GLuint
+_mesa_format_num_components(mesa_format format)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   return ((info->RedBits > 0) +
+           (info->GreenBits > 0) +
+           (info->BlueBits > 0) +
+           (info->AlphaBits > 0) +
+           (info->LuminanceBits > 0) +
+           (info->IntensityBits > 0) +
+           (info->DepthBits > 0) +
+           (info->StencilBits > 0));
+}
+
+
+/**
+ * Returns true if the color format stores data in the given channel,
+ * where component index 0..3 selects R/G/B/A respectively.
+ */
+bool
+_mesa_format_has_color_component(mesa_format format, int component)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+
+   assert(info->BaseFormat != GL_DEPTH_COMPONENT &&
+          info->BaseFormat != GL_DEPTH_STENCIL &&
+          info->BaseFormat != GL_STENCIL_INDEX);
+
+   switch (component) {
+   case 0:
+      return (info->RedBits + info->IntensityBits + info->LuminanceBits) > 0;
+   case 1:
+      return (info->GreenBits + info->IntensityBits + info->LuminanceBits) > 0;
+   case 2:
+      return (info->BlueBits + info->IntensityBits + info->LuminanceBits) > 0;
+   case 3:
+      return (info->AlphaBits + info->IntensityBits) > 0;
+   default:
+      assert(!"Invalid color component: must be 0..3");
+      return false;
+   }
+}
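+
+/* For example (illustrative): MESA_FORMAT_L_UNORM8 replicates luminance
+ * into R/G/B, so components 0..2 report true while component 3 (alpha)
+ * reports false.
+ */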
+
+
+/**
+ * Return number of bytes needed to store an image of the given size
+ * in the given format.
+ */
+GLuint
+_mesa_format_image_size(mesa_format format, GLsizei width,
+                        GLsizei height, GLsizei depth)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   /* Strictly speaking, a conditional isn't needed here */
+   if (info->BlockWidth > 1 || info->BlockHeight > 1) {
+      /* compressed format (2D only for now) */
+      const GLuint bw = info->BlockWidth, bh = info->BlockHeight;
+      const GLuint wblocks = (width + bw - 1) / bw;
+      const GLuint hblocks = (height + bh - 1) / bh;
+      const GLuint sz = wblocks * hblocks * info->BytesPerBlock;
+      return sz * depth;
+   }
+   else {
+      /* non-compressed */
+      const GLuint sz = width * height * depth * info->BytesPerBlock;
+      return sz;
+   }
+}
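+
+/* Worked example (illustrative): an 8x8x1 MESA_FORMAT_RGB_DXT1 image is
+ * 2x2 blocks of 8 bytes = 32 bytes; a 9x9x1 image rounds up to 3x3 blocks
+ * = 72 bytes.
+ */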
+
+
+/**
+ * Same as _mesa_format_image_size() but returns a 64-bit value to
+ * accommodate very large textures.
+ */
+uint64_t
+_mesa_format_image_size64(mesa_format format, GLsizei width,
+                          GLsizei height, GLsizei depth)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   /* Strictly speaking, a conditional isn't needed here */
+   if (info->BlockWidth > 1 || info->BlockHeight > 1) {
+      /* compressed format (2D only for now) */
+      const uint64_t bw = info->BlockWidth, bh = info->BlockHeight;
+      const uint64_t wblocks = (width + bw - 1) / bw;
+      const uint64_t hblocks = (height + bh - 1) / bh;
+      const uint64_t sz = wblocks * hblocks * info->BytesPerBlock;
+      return sz * depth;
+   }
+   else {
+      /* non-compressed */
+      const uint64_t sz = ((uint64_t) width *
+                           (uint64_t) height *
+                           (uint64_t) depth *
+                           info->BytesPerBlock);
+      return sz;
+   }
+}
+
+
+
+GLint
+_mesa_format_row_stride(mesa_format format, GLsizei width)
+{
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   /* Strictly speaking, a conditional isn't needed here */
+   if (info->BlockWidth > 1 || info->BlockHeight > 1) {
+      /* compressed format */
+      const GLuint bw = info->BlockWidth;
+      const GLuint wblocks = (width + bw - 1) / bw;
+      const GLint stride = wblocks * info->BytesPerBlock;
+      return stride;
+   }
+   else {
+      const GLint stride = width * info->BytesPerBlock;
+      return stride;
+   }
+}
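+
+/* For example (illustrative): a 10-pixel-wide row of MESA_FORMAT_RGBA_DXT5
+ * spans ceil(10/4) = 3 blocks of 16 bytes, so the stride is 48 bytes.
+ */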
+
+
+/**
+ * Debug/test: check that all formats are handled in the
+ * _mesa_format_to_type_and_comps() function.  When new pixel formats
+ * are added to Mesa, that function needs to be updated.
+ * Note: the scan runs on every call; there is no first-call guard.
+ */
+static void
+check_format_to_type_and_comps(void)
+{
+   mesa_format f;
+
+   for (f = MESA_FORMAT_NONE + 1; f < MESA_FORMAT_COUNT; f++) {
+      GLenum datatype = 0;
+      GLuint comps = 0;
+      /* This function will emit a problem/warning if the format is
+       * not handled.
+       */
+      _mesa_format_to_type_and_comps(f, &datatype, &comps);
+   }
+}
+
+
+/**
+ * Do sanity checking of the format info table.
+ */
+void
+_mesa_test_formats(void)
+{
+   GLuint i;
+
+   STATIC_ASSERT(Elements(format_info) == MESA_FORMAT_COUNT);
+
+   for (i = 0; i < MESA_FORMAT_COUNT; i++) {
+      const struct gl_format_info *info = _mesa_get_format_info(i);
+      assert(info);
+
+      assert(info->Name == i);
+
+      if (info->Name == MESA_FORMAT_NONE)
+         continue;
+
+      if (info->BlockWidth == 1 && info->BlockHeight == 1) {
+         if (info->RedBits > 0) {
+            GLuint t = info->RedBits + info->GreenBits
+               + info->BlueBits + info->AlphaBits;
+            assert(t / 8 <= info->BytesPerBlock);
+            (void) t;
+         }
+      }
+
+      assert(info->DataType == GL_UNSIGNED_NORMALIZED ||
+             info->DataType == GL_SIGNED_NORMALIZED ||
+             info->DataType == GL_UNSIGNED_INT ||
+             info->DataType == GL_INT ||
+             info->DataType == GL_FLOAT ||
+             /* Z32_FLOAT_X24S8 has DataType of GL_NONE */
+             info->DataType == GL_NONE);
+
+      if (info->BaseFormat == GL_RGB) {
+         assert(info->RedBits > 0);
+         assert(info->GreenBits > 0);
+         assert(info->BlueBits > 0);
+         assert(info->AlphaBits == 0);
+         assert(info->LuminanceBits == 0);
+         assert(info->IntensityBits == 0);
+      }
+      else if (info->BaseFormat == GL_RGBA) {
+         assert(info->RedBits > 0);
+         assert(info->GreenBits > 0);
+         assert(info->BlueBits > 0);
+         assert(info->AlphaBits > 0);
+         assert(info->LuminanceBits == 0);
+         assert(info->IntensityBits == 0);
+      }
+      else if (info->BaseFormat == GL_RG) {
+         assert(info->RedBits > 0);
+         assert(info->GreenBits > 0);
+         assert(info->BlueBits == 0);
+         assert(info->AlphaBits == 0);
+         assert(info->LuminanceBits == 0);
+         assert(info->IntensityBits == 0);
+      }
+      else if (info->BaseFormat == GL_RED) {
+         assert(info->RedBits > 0);
+         assert(info->GreenBits == 0);
+         assert(info->BlueBits == 0);
+         assert(info->AlphaBits == 0);
+         assert(info->LuminanceBits == 0);
+         assert(info->IntensityBits == 0);
+      }
+      else if (info->BaseFormat == GL_LUMINANCE) {
+         assert(info->RedBits == 0);
+         assert(info->GreenBits == 0);
+         assert(info->BlueBits == 0);
+         assert(info->AlphaBits == 0);
+         assert(info->LuminanceBits > 0);
+         assert(info->IntensityBits == 0);
+      }
+      else if (info->BaseFormat == GL_INTENSITY) {
+         assert(info->RedBits == 0);
+         assert(info->GreenBits == 0);
+         assert(info->BlueBits == 0);
+         assert(info->AlphaBits == 0);
+         assert(info->LuminanceBits == 0);
+         assert(info->IntensityBits > 0);
+      }
+   }
+
+   check_format_to_type_and_comps();
+}
+
+
+
+/**
+ * Return datatype and number of components per texel for the given mesa_format.
+ * Only used for mipmap generation code.
+ */
+void
+_mesa_format_to_type_and_comps(mesa_format format,
+                               GLenum *datatype, GLuint *comps)
+{
+   switch (format) {
+   case MESA_FORMAT_A8B8G8R8_UNORM:
+   case MESA_FORMAT_R8G8B8A8_UNORM:
+   case MESA_FORMAT_B8G8R8A8_UNORM:
+   case MESA_FORMAT_A8R8G8B8_UNORM:
+   case MESA_FORMAT_X8B8G8R8_UNORM:
+   case MESA_FORMAT_R8G8B8X8_UNORM:
+   case MESA_FORMAT_B8G8R8X8_UNORM:
+   case MESA_FORMAT_X8R8G8B8_UNORM:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 4;
+      return;
+   case MESA_FORMAT_BGR_UNORM8:
+   case MESA_FORMAT_RGB_UNORM8:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 3;
+      return;
+   case MESA_FORMAT_B5G6R5_UNORM:
+   case MESA_FORMAT_R5G6B5_UNORM:
+      *datatype = GL_UNSIGNED_SHORT_5_6_5;
+      *comps = 3;
+      return;
+
+   case MESA_FORMAT_B4G4R4A4_UNORM:
+   case MESA_FORMAT_A4R4G4B4_UNORM:
+   case MESA_FORMAT_B4G4R4X4_UNORM:
+      *datatype = GL_UNSIGNED_SHORT_4_4_4_4;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_B5G5R5A1_UNORM:
+   case MESA_FORMAT_A1R5G5B5_UNORM:
+   case MESA_FORMAT_B5G5R5X1_UNORM:
+      *datatype = GL_UNSIGNED_SHORT_1_5_5_5_REV;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_B10G10R10A2_UNORM:
+      *datatype = GL_UNSIGNED_INT_2_10_10_10_REV;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_A1B5G5R5_UNORM:
+      *datatype = GL_UNSIGNED_SHORT_5_5_5_1;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_L4A4_UNORM:
+      *datatype = MESA_UNSIGNED_BYTE_4_4;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_L8A8_UNORM:
+   case MESA_FORMAT_A8L8_UNORM:
+   case MESA_FORMAT_R8G8_UNORM:
+   case MESA_FORMAT_G8R8_UNORM:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_L16A16_UNORM:
+   case MESA_FORMAT_A16L16_UNORM:
+   case MESA_FORMAT_R16G16_UNORM:
+   case MESA_FORMAT_G16R16_UNORM:
+      *datatype = GL_UNSIGNED_SHORT;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_R_UNORM16:
+   case MESA_FORMAT_A_UNORM16:
+   case MESA_FORMAT_L_UNORM16:
+   case MESA_FORMAT_I_UNORM16:
+      *datatype = GL_UNSIGNED_SHORT;
+      *comps = 1;
+      return;
+
+   case MESA_FORMAT_B2G3R3_UNORM:
+      *datatype = GL_UNSIGNED_BYTE_3_3_2;
+      *comps = 3;
+      return;
+
+   case MESA_FORMAT_A_UNORM8:
+   case MESA_FORMAT_L_UNORM8:
+   case MESA_FORMAT_I_UNORM8:
+   case MESA_FORMAT_R_UNORM8:
+   case MESA_FORMAT_S_UINT8:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 1;
+      return;
+
+   case MESA_FORMAT_YCBCR:
+   case MESA_FORMAT_YCBCR_REV:
+      *datatype = GL_UNSIGNED_SHORT;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_S8_UINT_Z24_UNORM:
+      *datatype = GL_UNSIGNED_INT_24_8_MESA;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_Z24_UNORM_S8_UINT:
+      *datatype = GL_UNSIGNED_INT_8_24_REV_MESA;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_Z_UNORM16:
+      *datatype = GL_UNSIGNED_SHORT;
+      *comps = 1;
+      return;
+
+   case MESA_FORMAT_Z24_UNORM_X8_UINT:
+      *datatype = GL_UNSIGNED_INT;
+      *comps = 1;
+      return;
+
+   case MESA_FORMAT_X8_UINT_Z24_UNORM:
+      *datatype = GL_UNSIGNED_INT;
+      *comps = 1;
+      return;
+
+   case MESA_FORMAT_Z_UNORM32:
+      *datatype = GL_UNSIGNED_INT;
+      *comps = 1;
+      return;
+
+   case MESA_FORMAT_Z_FLOAT32:
+      *datatype = GL_FLOAT;
+      *comps = 1;
+      return;
+
+   case MESA_FORMAT_Z32_FLOAT_S8X24_UINT:
+      *datatype = GL_FLOAT_32_UNSIGNED_INT_24_8_REV;
+      *comps = 1;
+      return;
+
+   case MESA_FORMAT_DUDV8:
+      *datatype = GL_BYTE;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_R_SNORM8:
+   case MESA_FORMAT_A_SNORM8:
+   case MESA_FORMAT_L_SNORM8:
+   case MESA_FORMAT_I_SNORM8:
+      *datatype = GL_BYTE;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_R8G8_SNORM:
+   case MESA_FORMAT_L8A8_SNORM:
+      *datatype = GL_BYTE;
+      *comps = 2;
+      return;
+   case MESA_FORMAT_A8B8G8R8_SNORM:
+   case MESA_FORMAT_R8G8B8A8_SNORM:
+   case MESA_FORMAT_X8B8G8R8_SNORM:
+      *datatype = GL_BYTE;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_RGBA_UNORM16:
+      *datatype = GL_UNSIGNED_SHORT;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_R_SNORM16:
+   case MESA_FORMAT_A_SNORM16:
+   case MESA_FORMAT_L_SNORM16:
+   case MESA_FORMAT_I_SNORM16:
+      *datatype = GL_SHORT;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_R16G16_SNORM:
+   case MESA_FORMAT_LA_SNORM16:
+      *datatype = GL_SHORT;
+      *comps = 2;
+      return;
+   case MESA_FORMAT_RGB_SNORM16:
+      *datatype = GL_SHORT;
+      *comps = 3;
+      return;
+   case MESA_FORMAT_RGBA_SNORM16:
+      *datatype = GL_SHORT;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_BGR_SRGB8:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 3;
+      return;
+   case MESA_FORMAT_A8B8G8R8_SRGB:
+   case MESA_FORMAT_B8G8R8A8_SRGB:
+   case MESA_FORMAT_R8G8B8A8_SRGB:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 4;
+      return;
+   case MESA_FORMAT_L_SRGB8:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_L8A8_SRGB:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_RGB_FXT1:
+   case MESA_FORMAT_RGBA_FXT1:
+   case MESA_FORMAT_RGB_DXT1:
+   case MESA_FORMAT_RGBA_DXT1:
+   case MESA_FORMAT_RGBA_DXT3:
+   case MESA_FORMAT_RGBA_DXT5:
+   case MESA_FORMAT_SRGB_DXT1:
+   case MESA_FORMAT_SRGBA_DXT1:
+   case MESA_FORMAT_SRGBA_DXT3:
+   case MESA_FORMAT_SRGBA_DXT5:
+   case MESA_FORMAT_R_RGTC1_UNORM:
+   case MESA_FORMAT_R_RGTC1_SNORM:
+   case MESA_FORMAT_RG_RGTC2_UNORM:
+   case MESA_FORMAT_RG_RGTC2_SNORM:
+   case MESA_FORMAT_L_LATC1_UNORM:
+   case MESA_FORMAT_L_LATC1_SNORM:
+   case MESA_FORMAT_LA_LATC2_UNORM:
+   case MESA_FORMAT_LA_LATC2_SNORM:
+   case MESA_FORMAT_ETC1_RGB8:
+   case MESA_FORMAT_ETC2_RGB8:
+   case MESA_FORMAT_ETC2_SRGB8:
+   case MESA_FORMAT_ETC2_RGBA8_EAC:
+   case MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC:
+   case MESA_FORMAT_ETC2_R11_EAC:
+   case MESA_FORMAT_ETC2_RG11_EAC:
+   case MESA_FORMAT_ETC2_SIGNED_R11_EAC:
+   case MESA_FORMAT_ETC2_SIGNED_RG11_EAC:
+   case MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1:
+   case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1:
+      /* XXX generate error instead? */
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 0;
+      return;
+
+   case MESA_FORMAT_RGBA_FLOAT32:
+      *datatype = GL_FLOAT;
+      *comps = 4;
+      return;
+   case MESA_FORMAT_RGBA_FLOAT16:
+      *datatype = GL_HALF_FLOAT_ARB;
+      *comps = 4;
+      return;
+   case MESA_FORMAT_RGB_FLOAT32:
+      *datatype = GL_FLOAT;
+      *comps = 3;
+      return;
+   case MESA_FORMAT_RGB_FLOAT16:
+      *datatype = GL_HALF_FLOAT_ARB;
+      *comps = 3;
+      return;
+   case MESA_FORMAT_LA_FLOAT32:
+   case MESA_FORMAT_RG_FLOAT32:
+      *datatype = GL_FLOAT;
+      *comps = 2;
+      return;
+   case MESA_FORMAT_LA_FLOAT16:
+   case MESA_FORMAT_RG_FLOAT16:
+      *datatype = GL_HALF_FLOAT_ARB;
+      *comps = 2;
+      return;
+   case MESA_FORMAT_A_FLOAT32:
+   case MESA_FORMAT_L_FLOAT32:
+   case MESA_FORMAT_I_FLOAT32:
+   case MESA_FORMAT_R_FLOAT32:
+      *datatype = GL_FLOAT;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_A_FLOAT16:
+   case MESA_FORMAT_L_FLOAT16:
+   case MESA_FORMAT_I_FLOAT16:
+   case MESA_FORMAT_R_FLOAT16:
+      *datatype = GL_HALF_FLOAT_ARB;
+      *comps = 1;
+      return;
+
+   case MESA_FORMAT_A_UINT8:
+   case MESA_FORMAT_L_UINT8:
+   case MESA_FORMAT_I_UINT8:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_LA_UINT8:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_A_UINT16:
+   case MESA_FORMAT_L_UINT16:
+   case MESA_FORMAT_I_UINT16:
+      *datatype = GL_UNSIGNED_SHORT;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_LA_UINT16:
+      *datatype = GL_UNSIGNED_SHORT;
+      *comps = 2;
+      return;
+   case MESA_FORMAT_A_UINT32:
+   case MESA_FORMAT_L_UINT32:
+   case MESA_FORMAT_I_UINT32:
+      *datatype = GL_UNSIGNED_INT;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_LA_UINT32:
+      *datatype = GL_UNSIGNED_INT;
+      *comps = 2;
+      return;
+   case MESA_FORMAT_A_SINT8:
+   case MESA_FORMAT_L_SINT8:
+   case MESA_FORMAT_I_SINT8:
+      *datatype = GL_BYTE;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_LA_SINT8:
+      *datatype = GL_BYTE;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_A_SINT16:
+   case MESA_FORMAT_L_SINT16:
+   case MESA_FORMAT_I_SINT16:
+      *datatype = GL_SHORT;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_LA_SINT16:
+      *datatype = GL_SHORT;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_A_SINT32:
+   case MESA_FORMAT_L_SINT32:
+   case MESA_FORMAT_I_SINT32:
+      *datatype = GL_INT;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_LA_SINT32:
+      *datatype = GL_INT;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_R_SINT8:
+      *datatype = GL_BYTE;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_RG_SINT8:
+      *datatype = GL_BYTE;
+      *comps = 2;
+      return;
+   case MESA_FORMAT_RGB_SINT8:
+      *datatype = GL_BYTE;
+      *comps = 3;
+      return;
+   case MESA_FORMAT_RGBA_SINT8:
+      *datatype = GL_BYTE;
+      *comps = 4;
+      return;
+   case MESA_FORMAT_R_SINT16:
+      *datatype = GL_SHORT;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_RG_SINT16:
+      *datatype = GL_SHORT;
+      *comps = 2;
+      return;
+   case MESA_FORMAT_RGB_SINT16:
+      *datatype = GL_SHORT;
+      *comps = 3;
+      return;
+   case MESA_FORMAT_RGBA_SINT16:
+      *datatype = GL_SHORT;
+      *comps = 4;
+      return;
+   case MESA_FORMAT_R_SINT32:
+      *datatype = GL_INT;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_RG_SINT32:
+      *datatype = GL_INT;
+      *comps = 2;
+      return;
+   case MESA_FORMAT_RGB_SINT32:
+      *datatype = GL_INT;
+      *comps = 3;
+      return;
+   case MESA_FORMAT_RGBA_SINT32:
+      *datatype = GL_INT;
+      *comps = 4;
+      return;
+
+   /**
+    * \name Non-normalized unsigned integer formats.
+    */
+   case MESA_FORMAT_R_UINT8:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_RG_UINT8:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 2;
+      return;
+   case MESA_FORMAT_RGB_UINT8:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 3;
+      return;
+   case MESA_FORMAT_RGBA_UINT8:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 4;
+      return;
+   case MESA_FORMAT_R_UINT16:
+      *datatype = GL_UNSIGNED_SHORT;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_RG_UINT16:
+      *datatype = GL_UNSIGNED_SHORT;
+      *comps = 2;
+      return;
+   case MESA_FORMAT_RGB_UINT16:
+      *datatype = GL_UNSIGNED_SHORT;
+      *comps = 3;
+      return;
+   case MESA_FORMAT_RGBA_UINT16:
+      *datatype = GL_UNSIGNED_SHORT;
+      *comps = 4;
+      return;
+   case MESA_FORMAT_R_UINT32:
+      *datatype = GL_UNSIGNED_INT;
+      *comps = 1;
+      return;
+   case MESA_FORMAT_RG_UINT32:
+      *datatype = GL_UNSIGNED_INT;
+      *comps = 2;
+      return;
+   case MESA_FORMAT_RGB_UINT32:
+      *datatype = GL_UNSIGNED_INT;
+      *comps = 3;
+      return;
+   case MESA_FORMAT_RGBA_UINT32:
+      *datatype = GL_UNSIGNED_INT;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_R9G9B9E5_FLOAT:
+      *datatype = GL_UNSIGNED_INT_5_9_9_9_REV;
+      *comps = 3;
+      return;
+
+   case MESA_FORMAT_R11G11B10_FLOAT:
+      *datatype = GL_UNSIGNED_INT_10F_11F_11F_REV;
+      *comps = 3;
+      return;
+
+   case MESA_FORMAT_B10G10R10A2_UINT:
+   case MESA_FORMAT_R10G10B10A2_UINT:
+      *datatype = GL_UNSIGNED_INT_2_10_10_10_REV;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_R8G8B8X8_SRGB:
+   case MESA_FORMAT_RGBX_UINT8:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_R8G8B8X8_SNORM:
+   case MESA_FORMAT_RGBX_SINT8:
+      *datatype = GL_BYTE;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_B10G10R10X2_UNORM:
+      *datatype = GL_UNSIGNED_INT_2_10_10_10_REV;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_RGBX_UNORM16:
+   case MESA_FORMAT_RGBX_UINT16:
+      *datatype = GL_UNSIGNED_SHORT;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_RGBX_SNORM16:
+   case MESA_FORMAT_RGBX_SINT16:
+      *datatype = GL_SHORT;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_RGBX_FLOAT16:
+      *datatype = GL_HALF_FLOAT;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_RGBX_FLOAT32:
+      *datatype = GL_FLOAT;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_RGBX_UINT32:
+      *datatype = GL_UNSIGNED_INT;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_RGBX_SINT32:
+      *datatype = GL_INT;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_R10G10B10A2_UNORM:
+      *datatype = GL_UNSIGNED_INT_2_10_10_10_REV;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_G8R8_SNORM:
+      *datatype = GL_BYTE;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_G16R16_SNORM:
+      *datatype = GL_SHORT;
+      *comps = 2;
+      return;
+
+   case MESA_FORMAT_B8G8R8X8_SRGB:
+      *datatype = GL_UNSIGNED_BYTE;
+      *comps = 4;
+      return;
+
+   case MESA_FORMAT_COUNT:
+      assert(0);
+      return;
+
+   case MESA_FORMAT_NONE:
+   /* For debug builds, warn if any formats are not handled */
+#ifdef DEBUG
+   default:
+#endif
+      _mesa_problem(NULL, "bad format %s in _mesa_format_to_type_and_comps",
+                    _mesa_get_format_name(format));
+      *datatype = 0;
+      *comps = 1;
+   }
+}
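+
+/* Illustrative use (e.g. by mipmap generation) to pick a temporary texel
+ * layout:
+ *
+ *    GLenum datatype;
+ *    GLuint comps;
+ *    _mesa_format_to_type_and_comps(MESA_FORMAT_R8G8B8A8_UNORM,
+ *                                   &datatype, &comps);
+ *    // datatype == GL_UNSIGNED_BYTE, comps == 4
+ */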
+
+/**
+ * Check if a mesa_format exactly matches a GL format/type combination
+ * such that we can use memcpy() from one to the other.
+ * \param mesa_format  a MESA_FORMAT_x value
+ * \param format  the user-specified image format
+ * \param type  the user-specified image datatype
+ * \param swapBytes  typically the current pixel pack/unpack byteswap state
+ * \return GL_TRUE if the formats match, GL_FALSE otherwise.
+ */
+GLboolean
+_mesa_format_matches_format_and_type(mesa_format mesa_format,
+				     GLenum format, GLenum type,
+                                     GLboolean swapBytes)
+{
+   const GLboolean littleEndian = _mesa_little_endian();
+
+   /* Note: When reading a GL format/type combination, the format lists channel
+    * assignments from most significant channel in the type to least
+    * significant.  A type with _REV indicates that the assignments are
+    * swapped, so they are listed from least significant to most significant.
+    *
+    * For sanity, please keep this switch statement ordered the same as the
+    * enums in formats.h.
+    */
+
+   switch (mesa_format) {
+
+   case MESA_FORMAT_NONE:
+   case MESA_FORMAT_COUNT:
+      return GL_FALSE;
+
+   case MESA_FORMAT_A8B8G8R8_UNORM:
+   case MESA_FORMAT_A8B8G8R8_SRGB:
+      if (format == GL_RGBA && type == GL_UNSIGNED_INT_8_8_8_8 && !swapBytes)
+         return GL_TRUE;
+
+      if (format == GL_RGBA && type == GL_UNSIGNED_INT_8_8_8_8_REV && swapBytes)
+         return GL_TRUE;
+
+      if (format == GL_RGBA && type == GL_UNSIGNED_BYTE && !littleEndian)
+         return GL_TRUE;
+
+      if (format == GL_ABGR_EXT && type == GL_UNSIGNED_INT_8_8_8_8_REV
+          && !swapBytes)
+         return GL_TRUE;
+
+      if (format == GL_ABGR_EXT && type == GL_UNSIGNED_INT_8_8_8_8
+          && swapBytes)
+         return GL_TRUE;
+
+      if (format == GL_ABGR_EXT && type == GL_UNSIGNED_BYTE && littleEndian)
+         return GL_TRUE;
+
+      return GL_FALSE;
+
+   case MESA_FORMAT_R8G8B8A8_UNORM:
+   case MESA_FORMAT_R8G8B8A8_SRGB:
+      if (format == GL_RGBA && type == GL_UNSIGNED_INT_8_8_8_8_REV &&
+          !swapBytes)
+         return GL_TRUE;
+
+      if (format == GL_RGBA && type == GL_UNSIGNED_INT_8_8_8_8 && swapBytes)
+         return GL_TRUE;
+
+      if (format == GL_RGBA && type == GL_UNSIGNED_BYTE && littleEndian)
+         return GL_TRUE;
+
+      if (format == GL_ABGR_EXT && type == GL_UNSIGNED_INT_8_8_8_8 &&
+          !swapBytes)
+         return GL_TRUE;
+
+      if (format == GL_ABGR_EXT && type == GL_UNSIGNED_INT_8_8_8_8_REV &&
+          swapBytes)
+         return GL_TRUE;
+
+      if (format == GL_ABGR_EXT && type == GL_UNSIGNED_BYTE && !littleEndian)
+         return GL_TRUE;
+
+      return GL_FALSE;
+
+   case MESA_FORMAT_B8G8R8A8_UNORM:
+   case MESA_FORMAT_B8G8R8A8_SRGB:
+      if (format == GL_BGRA && type == GL_UNSIGNED_INT_8_8_8_8_REV &&
+          !swapBytes)
+         return GL_TRUE;
+
+      if (format == GL_BGRA && type == GL_UNSIGNED_INT_8_8_8_8 && swapBytes)
+         return GL_TRUE;
+
+      if (format == GL_BGRA && type == GL_UNSIGNED_BYTE && littleEndian)
+         return GL_TRUE;
+
+      return GL_FALSE;
+
+   case MESA_FORMAT_A8R8G8B8_UNORM:
+      if (format == GL_BGRA && type == GL_UNSIGNED_INT_8_8_8_8 && !swapBytes)
+         return GL_TRUE;
+
+      if (format == GL_BGRA && type == GL_UNSIGNED_INT_8_8_8_8_REV &&
+          swapBytes)
+         return GL_TRUE;
+
+      if (format == GL_BGRA && type == GL_UNSIGNED_BYTE && !littleEndian)
+         return GL_TRUE;
+
+      return GL_FALSE;
+
+   case MESA_FORMAT_X8B8G8R8_UNORM:
+   case MESA_FORMAT_R8G8B8X8_UNORM:
+      return GL_FALSE;
+
+   case MESA_FORMAT_B8G8R8X8_UNORM:
+   case MESA_FORMAT_X8R8G8B8_UNORM:
+      return GL_FALSE;
+
+   case MESA_FORMAT_BGR_UNORM8:
+   case MESA_FORMAT_BGR_SRGB8:
+      return format == GL_BGR && type == GL_UNSIGNED_BYTE && littleEndian;
+
+   case MESA_FORMAT_RGB_UNORM8:
+      return format == GL_RGB && type == GL_UNSIGNED_BYTE && littleEndian;
+
+   case MESA_FORMAT_B5G6R5_UNORM:
+      return format == GL_RGB && type == GL_UNSIGNED_SHORT_5_6_5 && !swapBytes;
+
+   case MESA_FORMAT_R5G6B5_UNORM:
+      /* Some of the 16-bit MESA_FORMATs that would seem to correspond to
+       * GL_UNSIGNED_SHORT_* are byte-swapped instead of channel-reversed,
+       * according to formats.h, so they can't be matched.
+       */
+      return GL_FALSE;
+
+   case MESA_FORMAT_B4G4R4A4_UNORM:
+      return format == GL_BGRA && type == GL_UNSIGNED_SHORT_4_4_4_4_REV &&
+         !swapBytes;
+
+   case MESA_FORMAT_A4R4G4B4_UNORM:
+      return GL_FALSE;
+
+   case MESA_FORMAT_A1B5G5R5_UNORM:
+      return format == GL_RGBA && type == GL_UNSIGNED_SHORT_5_5_5_1 &&
+         !swapBytes;
+
+   case MESA_FORMAT_B5G5R5A1_UNORM:
+      return format == GL_BGRA && type == GL_UNSIGNED_SHORT_1_5_5_5_REV &&
+         !swapBytes;
+
+   case MESA_FORMAT_A1R5G5B5_UNORM:
+      return GL_FALSE;
+
+   case MESA_FORMAT_L4A4_UNORM:
+      return GL_FALSE;
+   case MESA_FORMAT_L8A8_UNORM:
+   case MESA_FORMAT_L8A8_SRGB:
+      return format == GL_LUMINANCE_ALPHA && type == GL_UNSIGNED_BYTE && littleEndian;
+   case MESA_FORMAT_A8L8_UNORM:
+      return GL_FALSE;
+
+   case MESA_FORMAT_L16A16_UNORM:
+      return format == GL_LUMINANCE_ALPHA && type == GL_UNSIGNED_SHORT && littleEndian && !swapBytes;
+   case MESA_FORMAT_A16L16_UNORM:
+      return GL_FALSE;
+
+   case MESA_FORMAT_B2G3R3_UNORM:
+      return format == GL_RGB && type == GL_UNSIGNED_BYTE_3_3_2;
+
+   case MESA_FORMAT_A_UNORM8:
+      return format == GL_ALPHA && type == GL_UNSIGNED_BYTE;
+   case MESA_FORMAT_A_UNORM16:
+      return format == GL_ALPHA && type == GL_UNSIGNED_SHORT && !swapBytes;
+   case MESA_FORMAT_L_UNORM8:
+   case MESA_FORMAT_L_SRGB8:
+      return format == GL_LUMINANCE && type == GL_UNSIGNED_BYTE;
+   case MESA_FORMAT_L_UNORM16:
+      return format == GL_LUMINANCE && type == GL_UNSIGNED_SHORT && !swapBytes;
+   case MESA_FORMAT_I_UNORM8:
+      return format == GL_RED && type == GL_UNSIGNED_BYTE;
+   case MESA_FORMAT_I_UNORM16:
+      return format == GL_RED && type == GL_UNSIGNED_SHORT && !swapBytes;
+
+   case MESA_FORMAT_YCBCR:
+      return format == GL_YCBCR_MESA &&
+             ((type == GL_UNSIGNED_SHORT_8_8_MESA && littleEndian != swapBytes) ||
+              (type == GL_UNSIGNED_SHORT_8_8_REV_MESA && littleEndian == swapBytes));
+   case MESA_FORMAT_YCBCR_REV:
+      return format == GL_YCBCR_MESA &&
+             ((type == GL_UNSIGNED_SHORT_8_8_MESA && littleEndian == swapBytes) ||
+              (type == GL_UNSIGNED_SHORT_8_8_REV_MESA && littleEndian != swapBytes));
+
+   case MESA_FORMAT_R_UNORM8:
+      return format == GL_RED && type == GL_UNSIGNED_BYTE;
+   case MESA_FORMAT_R8G8_UNORM:
+      return format == GL_RG && type == GL_UNSIGNED_BYTE && littleEndian;
+   case MESA_FORMAT_G8R8_UNORM:
+      return GL_FALSE;
+
+   case MESA_FORMAT_R_UNORM16:
+      return format == GL_RED && type == GL_UNSIGNED_SHORT &&
+         !swapBytes;
+   case MESA_FORMAT_R16G16_UNORM:
+      return format == GL_RG && type == GL_UNSIGNED_SHORT && littleEndian &&
+         !swapBytes;
+   case MESA_FORMAT_G16R16_UNORM:
+      return GL_FALSE;
+
+   case MESA_FORMAT_B10G10R10A2_UNORM:
+      return format == GL_BGRA && type == GL_UNSIGNED_INT_2_10_10_10_REV &&
+         !swapBytes;
+
+   case MESA_FORMAT_S8_UINT_Z24_UNORM:
+      return format == GL_DEPTH_STENCIL && type == GL_UNSIGNED_INT_24_8 &&
+         !swapBytes;
+   case MESA_FORMAT_X8_UINT_Z24_UNORM:
+   case MESA_FORMAT_Z24_UNORM_S8_UINT:
+      return GL_FALSE;
+
+   case MESA_FORMAT_Z_UNORM16:
+      return format == GL_DEPTH_COMPONENT && type == GL_UNSIGNED_SHORT &&
+         !swapBytes;
+
+   case MESA_FORMAT_Z24_UNORM_X8_UINT:
+      return GL_FALSE;
+
+   case MESA_FORMAT_Z_UNORM32:
+      return format == GL_DEPTH_COMPONENT && type == GL_UNSIGNED_INT &&
+         !swapBytes;
+
+   case MESA_FORMAT_S_UINT8:
+      return format == GL_STENCIL_INDEX && type == GL_UNSIGNED_BYTE;
+
+   case MESA_FORMAT_SRGB_DXT1:
+   case MESA_FORMAT_SRGBA_DXT1:
+   case MESA_FORMAT_SRGBA_DXT3:
+   case MESA_FORMAT_SRGBA_DXT5:
+      return GL_FALSE;
+
+   case MESA_FORMAT_RGB_FXT1:
+   case MESA_FORMAT_RGBA_FXT1:
+   case MESA_FORMAT_RGB_DXT1:
+   case MESA_FORMAT_RGBA_DXT1:
+   case MESA_FORMAT_RGBA_DXT3:
+   case MESA_FORMAT_RGBA_DXT5:
+      return GL_FALSE;
+
+   case MESA_FORMAT_RGBA_FLOAT32:
+      return format == GL_RGBA && type == GL_FLOAT && !swapBytes;
+   case MESA_FORMAT_RGBA_FLOAT16:
+      return format == GL_RGBA && type == GL_HALF_FLOAT && !swapBytes;
+
+   case MESA_FORMAT_RGB_FLOAT32:
+      return format == GL_RGB && type == GL_FLOAT && !swapBytes;
+   case MESA_FORMAT_RGB_FLOAT16:
+      return format == GL_RGB && type == GL_HALF_FLOAT && !swapBytes;
+
+   case MESA_FORMAT_A_FLOAT32:
+      return format == GL_ALPHA && type == GL_FLOAT && !swapBytes;
+   case MESA_FORMAT_A_FLOAT16:
+      return format == GL_ALPHA && type == GL_HALF_FLOAT && !swapBytes;
+
+   case MESA_FORMAT_L_FLOAT32:
+      return format == GL_LUMINANCE && type == GL_FLOAT && !swapBytes;
+   case MESA_FORMAT_L_FLOAT16:
+      return format == GL_LUMINANCE && type == GL_HALF_FLOAT && !swapBytes;
+
+   case MESA_FORMAT_LA_FLOAT32:
+      return format == GL_LUMINANCE_ALPHA && type == GL_FLOAT && !swapBytes;
+   case MESA_FORMAT_LA_FLOAT16:
+      return format == GL_LUMINANCE_ALPHA && type == GL_HALF_FLOAT && !swapBytes;
+
+   case MESA_FORMAT_I_FLOAT32:
+      return format == GL_RED && type == GL_FLOAT && !swapBytes;
+   case MESA_FORMAT_I_FLOAT16:
+      return format == GL_RED && type == GL_HALF_FLOAT && !swapBytes;
+
+   case MESA_FORMAT_R_FLOAT32:
+      return format == GL_RED && type == GL_FLOAT && !swapBytes;
+   case MESA_FORMAT_R_FLOAT16:
+      return format == GL_RED && type == GL_HALF_FLOAT && !swapBytes;
+
+   case MESA_FORMAT_RG_FLOAT32:
+      return format == GL_RG && type == GL_FLOAT && !swapBytes;
+   case MESA_FORMAT_RG_FLOAT16:
+      return format == GL_RG && type == GL_HALF_FLOAT && !swapBytes;
+
+   case MESA_FORMAT_A_UINT8:
+      return format == GL_ALPHA_INTEGER && type == GL_UNSIGNED_BYTE;
+   case MESA_FORMAT_A_UINT16:
+      return format == GL_ALPHA_INTEGER && type == GL_UNSIGNED_SHORT &&
+             !swapBytes;
+   case MESA_FORMAT_A_UINT32:
+      return format == GL_ALPHA_INTEGER && type == GL_UNSIGNED_INT &&
+             !swapBytes;
+   case MESA_FORMAT_A_SINT8:
+      return format == GL_ALPHA_INTEGER && type == GL_BYTE;
+   case MESA_FORMAT_A_SINT16:
+      return format == GL_ALPHA_INTEGER && type == GL_SHORT && !swapBytes;
+   case MESA_FORMAT_A_SINT32:
+      return format == GL_ALPHA_INTEGER && type == GL_INT && !swapBytes;
+
+   case MESA_FORMAT_I_UINT8:
+      return format == GL_RED_INTEGER && type == GL_UNSIGNED_BYTE;
+   case MESA_FORMAT_I_UINT16:
+      return format == GL_RED_INTEGER && type == GL_UNSIGNED_SHORT && !swapBytes;
+   case MESA_FORMAT_I_UINT32:
+      return format == GL_RED_INTEGER && type == GL_UNSIGNED_INT && !swapBytes;
+   case MESA_FORMAT_I_SINT8:
+      return format == GL_RED_INTEGER && type == GL_BYTE;
+   case MESA_FORMAT_I_SINT16:
+      return format == GL_RED_INTEGER && type == GL_SHORT && !swapBytes;
+   case MESA_FORMAT_I_SINT32:
+      return format == GL_RED_INTEGER && type == GL_INT && !swapBytes;
+
+   case MESA_FORMAT_L_UINT8:
+      return format == GL_LUMINANCE_INTEGER_EXT && type == GL_UNSIGNED_BYTE;
+   case MESA_FORMAT_L_UINT16:
+      return format == GL_LUMINANCE_INTEGER_EXT && type == GL_UNSIGNED_SHORT &&
+             !swapBytes;
+   case MESA_FORMAT_L_UINT32:
+      return format == GL_LUMINANCE_INTEGER_EXT && type == GL_UNSIGNED_INT &&
+             !swapBytes;
+   case MESA_FORMAT_L_SINT8:
+      return format == GL_LUMINANCE_INTEGER_EXT && type == GL_BYTE;
+   case MESA_FORMAT_L_SINT16:
+      return format == GL_LUMINANCE_INTEGER_EXT && type == GL_SHORT &&
+             !swapBytes;
+   case MESA_FORMAT_L_SINT32:
+      return format == GL_LUMINANCE_INTEGER_EXT && type == GL_INT && !swapBytes;
+
+   case MESA_FORMAT_LA_UINT8:
+      return format == GL_LUMINANCE_ALPHA_INTEGER_EXT &&
+             type == GL_UNSIGNED_BYTE && !swapBytes;
+   case MESA_FORMAT_LA_UINT16:
+      return format == GL_LUMINANCE_ALPHA_INTEGER_EXT &&
+             type == GL_UNSIGNED_SHORT && !swapBytes;
+   case MESA_FORMAT_LA_UINT32:
+      return format == GL_LUMINANCE_ALPHA_INTEGER_EXT &&
+             type == GL_UNSIGNED_INT && !swapBytes;
+   case MESA_FORMAT_LA_SINT8:
+      return format == GL_LUMINANCE_ALPHA_INTEGER_EXT && type == GL_BYTE &&
+             !swapBytes;
+   case MESA_FORMAT_LA_SINT16:
+      return format == GL_LUMINANCE_ALPHA_INTEGER_EXT && type == GL_SHORT &&
+             !swapBytes;
+   case MESA_FORMAT_LA_SINT32:
+      return format == GL_LUMINANCE_ALPHA_INTEGER_EXT && type == GL_INT &&
+             !swapBytes;
+
+   case MESA_FORMAT_R_SINT8:
+      return format == GL_RED_INTEGER && type == GL_BYTE;
+   case MESA_FORMAT_RG_SINT8:
+      return format == GL_RG_INTEGER && type == GL_BYTE && !swapBytes;
+   case MESA_FORMAT_RGB_SINT8:
+      return format == GL_RGB_INTEGER && type == GL_BYTE && !swapBytes;
+   case MESA_FORMAT_RGBA_SINT8:
+      return format == GL_RGBA_INTEGER && type == GL_BYTE && !swapBytes;
+   case MESA_FORMAT_R_SINT16:
+      return format == GL_RED_INTEGER && type == GL_SHORT && !swapBytes;
+   case MESA_FORMAT_RG_SINT16:
+      return format == GL_RG_INTEGER && type == GL_SHORT && !swapBytes;
+   case MESA_FORMAT_RGB_SINT16:
+      return format == GL_RGB_INTEGER && type == GL_SHORT && !swapBytes;
+   case MESA_FORMAT_RGBA_SINT16:
+      return format == GL_RGBA_INTEGER && type == GL_SHORT && !swapBytes;
+   case MESA_FORMAT_R_SINT32:
+      return format == GL_RED_INTEGER && type == GL_INT && !swapBytes;
+   case MESA_FORMAT_RG_SINT32:
+      return format == GL_RG_INTEGER && type == GL_INT && !swapBytes;
+   case MESA_FORMAT_RGB_SINT32:
+      return format == GL_RGB_INTEGER && type == GL_INT && !swapBytes;
+   case MESA_FORMAT_RGBA_SINT32:
+      return format == GL_RGBA_INTEGER && type == GL_INT && !swapBytes;
+
+   case MESA_FORMAT_R_UINT8:
+      return format == GL_RED_INTEGER && type == GL_UNSIGNED_BYTE;
+   case MESA_FORMAT_RG_UINT8:
+      return format == GL_RG_INTEGER && type == GL_UNSIGNED_BYTE && !swapBytes;
+   case MESA_FORMAT_RGB_UINT8:
+      return format == GL_RGB_INTEGER && type == GL_UNSIGNED_BYTE && !swapBytes;
+   case MESA_FORMAT_RGBA_UINT8:
+      return format == GL_RGBA_INTEGER && type == GL_UNSIGNED_BYTE &&
+             !swapBytes;
+   case MESA_FORMAT_R_UINT16:
+      return format == GL_RED_INTEGER && type == GL_UNSIGNED_SHORT &&
+             !swapBytes;
+   case MESA_FORMAT_RG_UINT16:
+      return format == GL_RG_INTEGER && type == GL_UNSIGNED_SHORT && !swapBytes;
+   case MESA_FORMAT_RGB_UINT16:
+      return format == GL_RGB_INTEGER && type == GL_UNSIGNED_SHORT &&
+             !swapBytes;
+   case MESA_FORMAT_RGBA_UINT16:
+      return format == GL_RGBA_INTEGER && type == GL_UNSIGNED_SHORT &&
+             !swapBytes;
+   case MESA_FORMAT_R_UINT32:
+      return format == GL_RED_INTEGER && type == GL_UNSIGNED_INT && !swapBytes;
+   case MESA_FORMAT_RG_UINT32:
+      return format == GL_RG_INTEGER && type == GL_UNSIGNED_INT && !swapBytes;
+   case MESA_FORMAT_RGB_UINT32:
+      return format == GL_RGB_INTEGER && type == GL_UNSIGNED_INT && !swapBytes;
+   case MESA_FORMAT_RGBA_UINT32:
+      return format == GL_RGBA_INTEGER && type == GL_UNSIGNED_INT && !swapBytes;
+
+   case MESA_FORMAT_DUDV8:
+      return (format == GL_DU8DV8_ATI || format == GL_DUDV_ATI) &&
+             type == GL_BYTE && littleEndian && !swapBytes;
+
+   case MESA_FORMAT_R_SNORM8:
+      return format == GL_RED && type == GL_BYTE;
+   case MESA_FORMAT_R8G8_SNORM:
+      return format == GL_RG && type == GL_BYTE && littleEndian &&
+             !swapBytes;
+   case MESA_FORMAT_X8B8G8R8_SNORM:
+      return GL_FALSE;
+
+   case MESA_FORMAT_A8B8G8R8_SNORM:
+      if (format == GL_RGBA && type == GL_BYTE && !littleEndian)
+         return GL_TRUE;
+
+      if (format == GL_ABGR_EXT && type == GL_BYTE && littleEndian)
+         return GL_TRUE;
+
+      return GL_FALSE;
+
+   case MESA_FORMAT_R8G8B8A8_SNORM:
+      if (format == GL_RGBA && type == GL_BYTE && littleEndian)
+         return GL_TRUE;
+
+      if (format == GL_ABGR_EXT && type == GL_BYTE && !littleEndian)
+         return GL_TRUE;
+
+      return GL_FALSE;
+
+   case MESA_FORMAT_R_SNORM16:
+      return format == GL_RED && type == GL_SHORT &&
+             !swapBytes;
+   case MESA_FORMAT_R16G16_SNORM:
+      return format == GL_RG && type == GL_SHORT && littleEndian && !swapBytes;
+   case MESA_FORMAT_RGB_SNORM16:
+      return format == GL_RGB && type == GL_SHORT && !swapBytes;
+   case MESA_FORMAT_RGBA_SNORM16:
+      return format == GL_RGBA && type == GL_SHORT && !swapBytes;
+   case MESA_FORMAT_RGBA_UNORM16:
+      return format == GL_RGBA && type == GL_UNSIGNED_SHORT &&
+             !swapBytes;
+
+   case MESA_FORMAT_R_RGTC1_UNORM:
+   case MESA_FORMAT_R_RGTC1_SNORM:
+   case MESA_FORMAT_RG_RGTC2_UNORM:
+   case MESA_FORMAT_RG_RGTC2_SNORM:
+      return GL_FALSE;
+
+   case MESA_FORMAT_L_LATC1_UNORM:
+   case MESA_FORMAT_L_LATC1_SNORM:
+   case MESA_FORMAT_LA_LATC2_UNORM:
+   case MESA_FORMAT_LA_LATC2_SNORM:
+      return GL_FALSE;
+
+   case MESA_FORMAT_ETC1_RGB8:
+   case MESA_FORMAT_ETC2_RGB8:
+   case MESA_FORMAT_ETC2_SRGB8:
+   case MESA_FORMAT_ETC2_RGBA8_EAC:
+   case MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC:
+   case MESA_FORMAT_ETC2_R11_EAC:
+   case MESA_FORMAT_ETC2_RG11_EAC:
+   case MESA_FORMAT_ETC2_SIGNED_R11_EAC:
+   case MESA_FORMAT_ETC2_SIGNED_RG11_EAC:
+   case MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1:
+   case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1:
+      return GL_FALSE;
+
+   case MESA_FORMAT_A_SNORM8:
+      return format == GL_ALPHA && type == GL_BYTE;
+   case MESA_FORMAT_L_SNORM8:
+      return format == GL_LUMINANCE && type == GL_BYTE;
+   case MESA_FORMAT_L8A8_SNORM:
+      return format == GL_LUMINANCE_ALPHA && type == GL_BYTE &&
+             littleEndian && !swapBytes;
+   case MESA_FORMAT_I_SNORM8:
+      return format == GL_RED && type == GL_BYTE;
+   case MESA_FORMAT_A_SNORM16:
+      return format == GL_ALPHA && type == GL_SHORT && !swapBytes;
+   case MESA_FORMAT_L_SNORM16:
+      return format == GL_LUMINANCE && type == GL_SHORT && !swapBytes;
+   case MESA_FORMAT_LA_SNORM16:
+      return format == GL_LUMINANCE_ALPHA && type == GL_SHORT &&
+             littleEndian && !swapBytes;
+   case MESA_FORMAT_I_SNORM16:
+      return format == GL_RED && type == GL_SHORT && littleEndian &&
+             !swapBytes;
+
+   case MESA_FORMAT_B10G10R10A2_UINT:
+      return (format == GL_BGRA_INTEGER_EXT &&
+              type == GL_UNSIGNED_INT_2_10_10_10_REV &&
+              !swapBytes);
+
+   case MESA_FORMAT_R10G10B10A2_UINT:
+      return (format == GL_RGBA_INTEGER_EXT &&
+              type == GL_UNSIGNED_INT_2_10_10_10_REV &&
+              !swapBytes);
+
+   case MESA_FORMAT_R9G9B9E5_FLOAT:
+      return format == GL_RGB && type == GL_UNSIGNED_INT_5_9_9_9_REV &&
+         !swapBytes;
+
+   case MESA_FORMAT_R11G11B10_FLOAT:
+      return format == GL_RGB && type == GL_UNSIGNED_INT_10F_11F_11F_REV &&
+         !swapBytes;
+
+   case MESA_FORMAT_Z_FLOAT32:
+      return format == GL_DEPTH_COMPONENT && type == GL_FLOAT && !swapBytes;
+
+   case MESA_FORMAT_Z32_FLOAT_S8X24_UINT:
+      return format == GL_DEPTH_STENCIL &&
+             type == GL_FLOAT_32_UNSIGNED_INT_24_8_REV && !swapBytes;
+
+   case MESA_FORMAT_B4G4R4X4_UNORM:
+   case MESA_FORMAT_B5G5R5X1_UNORM:
+   case MESA_FORMAT_R8G8B8X8_SNORM:
+   case MESA_FORMAT_R8G8B8X8_SRGB:
+   case MESA_FORMAT_RGBX_UINT8:
+   case MESA_FORMAT_RGBX_SINT8:
+   case MESA_FORMAT_B10G10R10X2_UNORM:
+   case MESA_FORMAT_RGBX_UNORM16:
+   case MESA_FORMAT_RGBX_SNORM16:
+   case MESA_FORMAT_RGBX_FLOAT16:
+   case MESA_FORMAT_RGBX_UINT16:
+   case MESA_FORMAT_RGBX_SINT16:
+   case MESA_FORMAT_RGBX_FLOAT32:
+   case MESA_FORMAT_RGBX_UINT32:
+   case MESA_FORMAT_RGBX_SINT32:
+      return GL_FALSE;
+
+   case MESA_FORMAT_R10G10B10A2_UNORM:
+      return format == GL_RGBA && type == GL_UNSIGNED_INT_2_10_10_10_REV &&
+         !swapBytes;
+
+   case MESA_FORMAT_G8R8_SNORM:
+      return format == GL_RG && type == GL_BYTE && !littleEndian &&
+         !swapBytes;
+
+   case MESA_FORMAT_G16R16_SNORM:
+      return format == GL_RG && type == GL_SHORT && !littleEndian &&
+         !swapBytes;
+
+   case MESA_FORMAT_B8G8R8X8_SRGB:
+      return GL_FALSE;
+   }
+
+   return GL_FALSE;
+}
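+
+/* Illustrative use: gating a memcpy() fast path for texture uploads.  The
+ * ctx->Unpack.SwapBytes field name is assumed from Mesa's pixel unpack
+ * state:
+ *
+ *    if (_mesa_format_matches_format_and_type(texFormat, format, type,
+ *                                             ctx->Unpack.SwapBytes)) {
+ *       // copy rows with memcpy() instead of per-texel conversion
+ *    }
+ */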
+
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/formats.h b/icd/intel/compiler/mesa-utils/src/mesa/main/formats.h
new file mode 100644
index 0000000..52151d0
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/formats.h
@@ -0,0 +1,493 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ * Copyright (c) 2008-2009  VMware, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/*
+ * Authors:
+ *   Brian Paul
+ */
+
+
+#ifndef FORMATS_H
+#define FORMATS_H
+
+
+#include <GL/gl.h>
+#include <stdbool.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/**
+ * OpenGL doesn't have GL_UNSIGNED_BYTE_4_4, so we must define our own type
+ * for GL_LUMINANCE4_ALPHA4.
+ */
+#define MESA_UNSIGNED_BYTE_4_4 (GL_UNSIGNED_BYTE<<1)
+
+
+/**
+ * Max number of bytes for any non-compressed pixel format below, or for
+ * intermediate pixel storage in Mesa.  This should never be less than
+ * 16.  Maybe 32 someday?
+ */
+#define MAX_PIXEL_BYTES 16
+
+
+/**
+ * Mesa texture/renderbuffer image formats.
+ */
+typedef enum
+{
+   MESA_FORMAT_NONE = 0,
+
+   /**
+    * \name Basic hardware formats
+    *
+    * The mesa format name specification is as follows:
+    *
+    *  There shall be 3 naming format base types: those for component array
+    *  formats (type A); those for compressed formats (type C); and those for
+    *  packed component formats (type P). With type A formats, color component
+    *  order does not change with endianness. Each format name shall begin with
+    *  MESA_FORMAT_, followed by a component label (from the Component Label
+    *  list below) for each component in the order that the component(s) occur
+    *  in the format, except for non-linear color formats where the first
+    *  letter shall be 'S'. For type P formats, each component label is
+    *  followed by the number of bits that represent it in the fundamental
+    *  data type used by the format.
+    *
+    *  Following the listing of the component labels shall be an underscore; a
+    *  compression type followed by an underscore for Type C formats only; a
+    *  storage type from the list below; and a bit width for type A formats,
+    *  which is the bit width for each array element.
+    *
+    *
+    *  ----------    Format Base Type A: Array ----------
+    *  MESA_FORMAT_[component list]_[storage type][array element bit width]
+    *
+    *  examples:
+    *  MESA_FORMAT_A_SNORM8     - uchar[i] = A
+    *  MESA_FORMAT_RGBA_UNORM16 - ushort[i * 4 + 0] = R, ushort[i * 4 + 1] = G,
+    *                             ushort[i * 4 + 2] = B, ushort[i * 4 + 3] = A
+    *  MESA_FORMAT_Z_UNORM32    - uint[i] = Z
+    *
+    *
+    *
+    *  ----------    Format Base Type C: Compressed ----------
+    *  MESA_FORMAT_[component list*][_*][compression type][storage type*]
+    *  * where required
+    *
+    *  examples:
+    *  MESA_FORMAT_RGB_ETC1
+    *  MESA_FORMAT_RGBA_ETC2
+    *  MESA_FORMAT_LATC1_UNORM
+    *  MESA_FORMAT_RGBA_FXT1
+    *
+    *
+    *
+    *  ----------    Format Base Type P: Packed  ----------
+    *  MESA_FORMAT_[[component list,bit width][storage type*][_]][_][storage type**]
+    *   * when type differs between component
+    *   ** when type applies to all components
+    *
+    *  examples:                   msb <------ TEXEL BITS -----------> lsb
+    *  MESA_FORMAT_A8B8G8R8_UNORM, AAAA AAAA BBBB BBBB GGGG GGGG RRRR RRRR
+    *  MESA_FORMAT_R5G6B5_UNORM                        RRRR RGGG GGGB BBBB
+    *  MESA_FORMAT_B4G4R4X4_UNORM                      BBBB GGGG RRRR XXXX
+    *  MESA_FORMAT_Z32_FLOAT_S8X24_UINT
+    *  MESA_FORMAT_R10G10B10A2_UINT
+    *  MESA_FORMAT_R9G9B9E5_FLOAT
+    *
+    *
+    *
+    *  ----------    Component Labels: ----------
+    *  A - Alpha
+    *  B - Blue
+    *  DU - Delta U
+    *  DV - Delta V
+    *  E - Shared Exponent
+    *  G - Green
+    *  I - Intensity
+    *  L - Luminance
+    *  R - Red
+    *  S - Stencil (when not followed by RGB or RGBA)
+    *  U - Chrominance
+    *  V - Chrominance
+    *  Y - Luma
+    *  X - Packing bits
+    *  Z - Depth
+    *
+    *
+    *
+    *  ----------    Type C Compression Types: ----------
+    *  DXT1 - Color component labels shall be given
+    *  DXT3 - Color component labels shall be given
+    *  DXT5 - Color component labels shall be given
+    *  ETC1 - No other information required
+    *  ETC2 - No other information required
+    *  FXT1 - Color component labels shall be given
+    *  FXT3 - Color component labels shall be given
+    *  LATC1 - Fundamental data type shall be given
+    *  LATC2 - Fundamental data type shall be given
+    *  RGTC1 - Color component labels and data type shall be given
+    *  RGTC2 - Color component labels and data type shall be given
+    *
+    *
+    *
+    *  ----------    Storage Types: ----------
+    *  FLOAT
+    *  SINT
+    *  UINT
+    *  SNORM
+    *  UNORM
+    *  SRGB - RGB or L components are UNORMs in sRGB color space.
+    *         Alpha, if present, is linear.
+    *
+    */
+
+   /* Packed unorm formats */    /* msb <------ TEXEL BITS -----------> lsb */
+                                 /* ---- ---- ---- ---- ---- ---- ---- ---- */
+   MESA_FORMAT_A8B8G8R8_UNORM,   /* RRRR RRRR GGGG GGGG BBBB BBBB AAAA AAAA */
+   MESA_FORMAT_X8B8G8R8_UNORM,   /* RRRR RRRR GGGG GGGG BBBB BBBB xxxx xxxx */
+   MESA_FORMAT_R8G8B8A8_UNORM,   /* AAAA AAAA BBBB BBBB GGGG GGGG RRRR RRRR */
+   MESA_FORMAT_R8G8B8X8_UNORM,   /* xxxx xxxx BBBB BBBB GGGG GGGG RRRR RRRR */
+   MESA_FORMAT_B8G8R8A8_UNORM,   /* AAAA AAAA RRRR RRRR GGGG GGGG BBBB BBBB */
+   MESA_FORMAT_B8G8R8X8_UNORM,   /* xxxx xxxx RRRR RRRR GGGG GGGG BBBB BBBB */
+   MESA_FORMAT_A8R8G8B8_UNORM,   /* BBBB BBBB GGGG GGGG RRRR RRRR AAAA AAAA */
+   MESA_FORMAT_X8R8G8B8_UNORM,   /* BBBB BBBB GGGG GGGG RRRR RRRR xxxx xxxx */
+   MESA_FORMAT_L16A16_UNORM,     /* AAAA AAAA AAAA AAAA LLLL LLLL LLLL LLLL */
+   MESA_FORMAT_A16L16_UNORM,     /* LLLL LLLL LLLL LLLL AAAA AAAA AAAA AAAA */
+   MESA_FORMAT_B5G6R5_UNORM,                         /* RRRR RGGG GGGB BBBB */
+   MESA_FORMAT_R5G6B5_UNORM,                         /* BBBB BGGG GGGR RRRR */
+   MESA_FORMAT_B4G4R4A4_UNORM,                       /* AAAA RRRR GGGG BBBB */
+   MESA_FORMAT_B4G4R4X4_UNORM,                       /* xxxx RRRR GGGG BBBB */
+   MESA_FORMAT_A4R4G4B4_UNORM,                       /* BBBB GGGG RRRR AAAA */
+   MESA_FORMAT_A1B5G5R5_UNORM,                       /* RRRR RGGG GGBB BBBA */
+   MESA_FORMAT_B5G5R5A1_UNORM,                       /* ARRR RRGG GGGB BBBB */
+   MESA_FORMAT_B5G5R5X1_UNORM,                       /* xRRR RRGG GGGB BBBB */
+   MESA_FORMAT_A1R5G5B5_UNORM,                       /* BBBB BGGG GGRR RRRA */
+   MESA_FORMAT_L8A8_UNORM,                           /* AAAA AAAA LLLL LLLL */
+   MESA_FORMAT_A8L8_UNORM,                           /* LLLL LLLL AAAA AAAA */
+   MESA_FORMAT_R8G8_UNORM,                           /* GGGG GGGG RRRR RRRR */
+   MESA_FORMAT_G8R8_UNORM,                           /* RRRR RRRR GGGG GGGG */
+   MESA_FORMAT_L4A4_UNORM,                                     /* AAAA LLLL */
+   MESA_FORMAT_B2G3R3_UNORM,                                   /* RRRG GGBB */
+
+   MESA_FORMAT_R16G16_UNORM,     /* GGGG GGGG GGGG GGGG RRRR RRRR RRRR RRRR */
+   MESA_FORMAT_G16R16_UNORM,     /* RRRR RRRR RRRR RRRR GGGG GGGG GGGG GGGG */
+   MESA_FORMAT_B10G10R10A2_UNORM,/* AARR RRRR RRRR GGGG GGGG GGBB BBBB BBBB */
+   MESA_FORMAT_B10G10R10X2_UNORM,/* xxRR RRRR RRRR GGGG GGGG GGBB BBBB BBBB */
+   MESA_FORMAT_R10G10B10A2_UNORM,/* AABB BBBB BBBB GGGG GGGG GGRR RRRR RRRR */
+
+   MESA_FORMAT_S8_UINT_Z24_UNORM,/* ZZZZ ZZZZ ZZZZ ZZZZ ZZZZ ZZZZ SSSS SSSS */
+   MESA_FORMAT_X8_UINT_Z24_UNORM,/* ZZZZ ZZZZ ZZZZ ZZZZ ZZZZ ZZZZ xxxx xxxx */
+   MESA_FORMAT_Z24_UNORM_S8_UINT,/* SSSS SSSS ZZZZ ZZZZ ZZZZ ZZZZ ZZZZ ZZZZ */
+   MESA_FORMAT_Z24_UNORM_X8_UINT,/* xxxx xxxx ZZZZ ZZZZ ZZZZ ZZZZ ZZZZ ZZZZ */
+
+   MESA_FORMAT_YCBCR,            /*                     YYYY YYYY UorV UorV */
+   MESA_FORMAT_YCBCR_REV,        /*                     UorV UorV YYYY YYYY */
+
+   MESA_FORMAT_DUDV8,            /*                     DUDU DUDU DVDV DVDV */
+
+   /* Array unorm formats */
+   MESA_FORMAT_A_UNORM8,      /* ubyte[i] = A */
+   MESA_FORMAT_A_UNORM16,     /* ushort[i] = A */
+   MESA_FORMAT_L_UNORM8,      /* ubyte[i] = L */
+   MESA_FORMAT_L_UNORM16,     /* ushort[i] = L */
+   MESA_FORMAT_I_UNORM8,      /* ubyte[i] = I */
+   MESA_FORMAT_I_UNORM16,     /* ushort[i] = I */
+   MESA_FORMAT_R_UNORM8,      /* ubyte[i] = R */
+   MESA_FORMAT_R_UNORM16,     /* ushort[i] = R */
+   MESA_FORMAT_BGR_UNORM8,    /* ubyte[i*3] = B, [i*3+1] = G, [i*3+2] = R */
+   MESA_FORMAT_RGB_UNORM8,    /* ubyte[i*3] = R, [i*3+1] = G, [i*3+2] = B */
+   MESA_FORMAT_RGBA_UNORM16,  /* ushort[i] = R, [1] = G, [2] = B, [3] = A */
+   MESA_FORMAT_RGBX_UNORM16,  /* ... */
+
+   MESA_FORMAT_Z_UNORM16,     /* ushort[i] = Z */
+   MESA_FORMAT_Z_UNORM32,     /* uint[i] = Z */
+   MESA_FORMAT_S_UINT8,       /* ubyte[i] = S */
+
+   /* Packed signed/normalized formats */
+                                 /* msb <------ TEXEL BITS -----------> lsb */
+                                 /* ---- ---- ---- ---- ---- ---- ---- ---- */
+   MESA_FORMAT_A8B8G8R8_SNORM,   /* RRRR RRRR GGGG GGGG BBBB BBBB AAAA AAAA */
+   MESA_FORMAT_X8B8G8R8_SNORM,   /* RRRR RRRR GGGG GGGG BBBB BBBB xxxx xxxx */
+   MESA_FORMAT_R8G8B8A8_SNORM,   /* AAAA AAAA BBBB BBBB GGGG GGGG RRRR RRRR */
+   MESA_FORMAT_R8G8B8X8_SNORM,   /* xxxx xxxx BBBB BBBB GGGG GGGG RRRR RRRR */
+   MESA_FORMAT_R16G16_SNORM,     /* GGGG GGGG GGGG GGGG RRRR RRRR RRRR RRRR */
+   MESA_FORMAT_G16R16_SNORM,     /* RRRR RRRR RRRR RRRR GGGG GGGG GGGG GGGG */
+   MESA_FORMAT_R8G8_SNORM,       /*                     GGGG GGGG RRRR RRRR */
+   MESA_FORMAT_G8R8_SNORM,       /*                     RRRR RRRR GGGG GGGG */
+   MESA_FORMAT_L8A8_SNORM,       /*                     AAAA AAAA LLLL LLLL */
+
+   /* Array signed/normalized formats */
+   MESA_FORMAT_A_SNORM8,      /* byte[i] = A */
+   MESA_FORMAT_A_SNORM16,     /* short[i] = A */
+   MESA_FORMAT_L_SNORM8,      /* byte[i] = L */
+   MESA_FORMAT_L_SNORM16,     /* short[i] = L */
+   MESA_FORMAT_I_SNORM8,      /* byte[i] = I */
+   MESA_FORMAT_I_SNORM16,     /* short[i] = I */
+   MESA_FORMAT_R_SNORM8,      /* byte[i] = R */
+   MESA_FORMAT_R_SNORM16,     /* short[i] = R */
+   MESA_FORMAT_LA_SNORM16,    /* short[i * 2] = L, [i * 2 + 1] = A */
+   MESA_FORMAT_RGB_SNORM16,   /* short[i*3] = R, [i*3+1] = G, [i*3+2] = B */
+   MESA_FORMAT_RGBA_SNORM16,  /* ... */
+   MESA_FORMAT_RGBX_SNORM16,  /* ... */
+
+   /* Packed sRGB formats */
+   MESA_FORMAT_A8B8G8R8_SRGB,    /* RRRR RRRR GGGG GGGG BBBB BBBB AAAA AAAA */
+   MESA_FORMAT_B8G8R8A8_SRGB,    /* AAAA AAAA RRRR RRRR GGGG GGGG BBBB BBBB */
+   MESA_FORMAT_B8G8R8X8_SRGB,    /* xxxx xxxx RRRR RRRR GGGG GGGG BBBB BBBB */
+   MESA_FORMAT_R8G8B8A8_SRGB,    /* AAAA AAAA BBBB BBBB GGGG GGGG RRRR RRRR */
+   MESA_FORMAT_R8G8B8X8_SRGB,    /* xxxx xxxx BBBB BBBB GGGG GGGG RRRR RRRR */
+   MESA_FORMAT_L8A8_SRGB,                            /* AAAA AAAA LLLL LLLL */
+
+   /* Array sRGB formats */
+   MESA_FORMAT_L_SRGB8,       /* ubyte[i] = L */
+   MESA_FORMAT_BGR_SRGB8,     /* ubyte[i*3] = B, [i*3+1] = G, [i*3+2] = R */
+
+   /* Packed float formats */
+   MESA_FORMAT_R9G9B9E5_FLOAT,
+   MESA_FORMAT_R11G11B10_FLOAT,   /* BBBB BBBB BBGG GGGG GGGG GRRR RRRR RRRR */
+   MESA_FORMAT_Z32_FLOAT_S8X24_UINT, /* (float, x24s8) */
+
+   /* Array float formats */
+   MESA_FORMAT_A_FLOAT16,
+   MESA_FORMAT_A_FLOAT32,
+   MESA_FORMAT_L_FLOAT16,
+   MESA_FORMAT_L_FLOAT32,
+   MESA_FORMAT_LA_FLOAT16,
+   MESA_FORMAT_LA_FLOAT32,
+   MESA_FORMAT_I_FLOAT16,
+   MESA_FORMAT_I_FLOAT32,
+   MESA_FORMAT_R_FLOAT16,
+   MESA_FORMAT_R_FLOAT32,
+   MESA_FORMAT_RG_FLOAT16,
+   MESA_FORMAT_RG_FLOAT32,
+   MESA_FORMAT_RGB_FLOAT16,
+   MESA_FORMAT_RGB_FLOAT32,
+   MESA_FORMAT_RGBA_FLOAT16,
+   MESA_FORMAT_RGBA_FLOAT32,  /* float[0] = R, [1] = G, [2] = B, [3] = A */
+   MESA_FORMAT_RGBX_FLOAT16,
+   MESA_FORMAT_RGBX_FLOAT32,
+   MESA_FORMAT_Z_FLOAT32,
+
+   /* Packed signed/unsigned non-normalized integer formats */
+   MESA_FORMAT_B10G10R10A2_UINT, /* AARR RRRR RRRR GGGG GGGG GGBB BBBB BBBB */
+   MESA_FORMAT_R10G10B10A2_UINT, /* AABB BBBB BBBB GGGG GGGG GGRR RRRR RRRR */
+
+   /* Array signed/unsigned non-normalized integer formats */
+   MESA_FORMAT_A_UINT8,
+   MESA_FORMAT_A_UINT16,
+   MESA_FORMAT_A_UINT32,
+   MESA_FORMAT_A_SINT8,
+   MESA_FORMAT_A_SINT16,
+   MESA_FORMAT_A_SINT32,
+
+   MESA_FORMAT_I_UINT8,
+   MESA_FORMAT_I_UINT16,
+   MESA_FORMAT_I_UINT32,
+   MESA_FORMAT_I_SINT8,
+   MESA_FORMAT_I_SINT16,
+   MESA_FORMAT_I_SINT32,
+
+   MESA_FORMAT_L_UINT8,
+   MESA_FORMAT_L_UINT16,
+   MESA_FORMAT_L_UINT32,
+   MESA_FORMAT_L_SINT8,
+   MESA_FORMAT_L_SINT16,
+   MESA_FORMAT_L_SINT32,
+
+   MESA_FORMAT_LA_UINT8,
+   MESA_FORMAT_LA_UINT16,
+   MESA_FORMAT_LA_UINT32,
+   MESA_FORMAT_LA_SINT8,
+   MESA_FORMAT_LA_SINT16,
+   MESA_FORMAT_LA_SINT32,
+
+   MESA_FORMAT_R_UINT8,
+   MESA_FORMAT_R_UINT16,
+   MESA_FORMAT_R_UINT32,
+   MESA_FORMAT_R_SINT8,
+   MESA_FORMAT_R_SINT16,
+   MESA_FORMAT_R_SINT32,
+
+   MESA_FORMAT_RG_UINT8,
+   MESA_FORMAT_RG_UINT16,
+   MESA_FORMAT_RG_UINT32,
+   MESA_FORMAT_RG_SINT8,
+   MESA_FORMAT_RG_SINT16,
+   MESA_FORMAT_RG_SINT32,
+
+   MESA_FORMAT_RGB_UINT8,
+   MESA_FORMAT_RGB_UINT16,
+   MESA_FORMAT_RGB_UINT32,
+   MESA_FORMAT_RGB_SINT8,
+   MESA_FORMAT_RGB_SINT16,
+   MESA_FORMAT_RGB_SINT32,
+
+   MESA_FORMAT_RGBA_UINT8,
+   MESA_FORMAT_RGBA_UINT16,
+   MESA_FORMAT_RGBA_UINT32,
+   MESA_FORMAT_RGBA_SINT8,
+   MESA_FORMAT_RGBA_SINT16,
+   MESA_FORMAT_RGBA_SINT32,
+
+   MESA_FORMAT_RGBX_UINT8,
+   MESA_FORMAT_RGBX_UINT16,
+   MESA_FORMAT_RGBX_UINT32,
+   MESA_FORMAT_RGBX_SINT8,
+   MESA_FORMAT_RGBX_SINT16,
+   MESA_FORMAT_RGBX_SINT32,
+
+   /* DXT compressed formats */
+   MESA_FORMAT_RGB_DXT1,
+   MESA_FORMAT_RGBA_DXT1,
+   MESA_FORMAT_RGBA_DXT3,
+   MESA_FORMAT_RGBA_DXT5,
+
+   /* DXT sRGB compressed formats */
+   MESA_FORMAT_SRGB_DXT1,
+   MESA_FORMAT_SRGBA_DXT1,
+   MESA_FORMAT_SRGBA_DXT3,
+   MESA_FORMAT_SRGBA_DXT5,
+
+   /* FXT1 compressed formats */
+   MESA_FORMAT_RGB_FXT1,
+   MESA_FORMAT_RGBA_FXT1,
+
+   /* RGTC compressed formats */
+   MESA_FORMAT_R_RGTC1_UNORM,
+   MESA_FORMAT_R_RGTC1_SNORM,
+   MESA_FORMAT_RG_RGTC2_UNORM,
+   MESA_FORMAT_RG_RGTC2_SNORM,
+
+   /* LATC1/2 compressed formats */
+   MESA_FORMAT_L_LATC1_UNORM,
+   MESA_FORMAT_L_LATC1_SNORM,
+   MESA_FORMAT_LA_LATC2_UNORM,
+   MESA_FORMAT_LA_LATC2_SNORM,
+
+   /* ETC1/2 compressed formats */
+   MESA_FORMAT_ETC1_RGB8,
+   MESA_FORMAT_ETC2_RGB8,
+   MESA_FORMAT_ETC2_SRGB8,
+   MESA_FORMAT_ETC2_RGBA8_EAC,
+   MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC,
+   MESA_FORMAT_ETC2_R11_EAC,
+   MESA_FORMAT_ETC2_RG11_EAC,
+   MESA_FORMAT_ETC2_SIGNED_R11_EAC,
+   MESA_FORMAT_ETC2_SIGNED_RG11_EAC,
+   MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1,
+   MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1,
+
+   MESA_FORMAT_COUNT
+} mesa_format;
+
+
+extern const char *
+_mesa_get_format_name(mesa_format format);
+
+extern GLint
+_mesa_get_format_bytes(mesa_format format);
+
+extern GLint
+_mesa_get_format_bits(mesa_format format, GLenum pname);
+
+extern GLuint
+_mesa_get_format_max_bits(mesa_format format);
+
+extern GLenum
+_mesa_get_format_datatype(mesa_format format);
+
+extern GLenum
+_mesa_get_format_base_format(mesa_format format);
+
+extern void
+_mesa_get_format_block_size(mesa_format format, GLuint *bw, GLuint *bh);
+
+extern GLboolean
+_mesa_is_format_compressed(mesa_format format);
+
+extern GLboolean
+_mesa_is_format_packed_depth_stencil(mesa_format format);
+
+extern GLboolean
+_mesa_is_format_integer_color(mesa_format format);
+
+extern GLboolean
+_mesa_is_format_unsigned(mesa_format format);
+
+extern GLboolean
+_mesa_is_format_signed(mesa_format format);
+
+extern GLboolean
+_mesa_is_format_integer(mesa_format format);
+
+extern GLenum
+_mesa_get_format_color_encoding(mesa_format format);
+
+extern GLuint
+_mesa_format_image_size(mesa_format format, GLsizei width,
+                        GLsizei height, GLsizei depth);
+
+extern uint64_t
+_mesa_format_image_size64(mesa_format format, GLsizei width,
+                          GLsizei height, GLsizei depth);
+
+extern GLint
+_mesa_format_row_stride(mesa_format format, GLsizei width);
+
+extern void
+_mesa_format_to_type_and_comps(mesa_format format,
+                               GLenum *datatype, GLuint *comps);
+
+extern void
+_mesa_test_formats(void);
+
+extern mesa_format
+_mesa_get_srgb_format_linear(mesa_format format);
+
+extern mesa_format
+_mesa_get_uncompressed_format(mesa_format format);
+
+extern GLuint
+_mesa_format_num_components(mesa_format format);
+
+extern bool
+_mesa_format_has_color_component(mesa_format format, int component);
+
+GLboolean
+_mesa_format_matches_format_and_type(mesa_format mesa_format,
+				     GLenum format, GLenum type,
+                                     GLboolean swapBytes);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* FORMATS_H */
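The query functions declared above are the intended way to reason about a format's layout; a hedged sketch of using them to size an image (the `describe` helper is illustrative only):

```c
#include <stdio.h>
#include <inttypes.h>
#include "main/formats.h"

/* Illustrative only: print the layout facts the queries expose.  The
 * row-stride and image-size queries account for compressed block
 * dimensions internally, so this works for DXT/ETC formats too. */
static void
describe(mesa_format fmt, GLsizei w, GLsizei h)
{
   printf("%s: %d bytes per block, row stride %d, image bytes %" PRIu64 "\n",
          _mesa_get_format_name(fmt),
          _mesa_get_format_bytes(fmt),
          _mesa_format_row_stride(fmt, w),
          _mesa_format_image_size64(fmt, w, h, 1));
}
```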
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/glheader.h b/icd/intel/compiler/mesa-utils/src/mesa/main/glheader.h
new file mode 100644
index 0000000..7f7f9a3
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/glheader.h
@@ -0,0 +1,170 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+/**
+ * \file glheader.h
+ * Wrapper for GL/gl.h and GL/glext.h
+ */
+
+
+#ifndef GLHEADER_H
+#define GLHEADER_H
+
+
+#define GL_GLEXT_PROTOTYPES
+#include "GL/gl.h"
+#include "GL/glext.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+typedef int GLclampx;
+
+
+#ifndef GL_OES_EGL_image
+typedef void *GLeglImageOES;
+#endif
+
+
+#ifndef GL_OES_EGL_image_external
+#define GL_TEXTURE_EXTERNAL_OES                                 0x8D65
+#define GL_SAMPLER_EXTERNAL_OES                                 0x8D66
+#define GL_TEXTURE_BINDING_EXTERNAL_OES                         0x8D67
+#define GL_REQUIRED_TEXTURE_IMAGE_UNITS_OES                     0x8D68
+#endif
+
+
+#ifndef GL_OES_point_size_array
+#define GL_POINT_SIZE_ARRAY_OES                                 0x8B9C
+#define GL_POINT_SIZE_ARRAY_TYPE_OES                            0x898A
+#define GL_POINT_SIZE_ARRAY_STRIDE_OES                          0x898B
+#define GL_POINT_SIZE_ARRAY_POINTER_OES                         0x898C
+#define GL_POINT_SIZE_ARRAY_BUFFER_BINDING_OES                  0x8B9F
+#endif
+
+
+#ifndef GL_OES_draw_texture
+#define GL_TEXTURE_CROP_RECT_OES  0x8B9D
+#endif
+
+
+#ifndef GL_PROGRAM_BINARY_LENGTH_OES
+#define GL_PROGRAM_BINARY_LENGTH_OES 0x8741
+#endif
+
+/* GLES 2.0 tokens */
+#ifndef GL_RGB565
+#define GL_RGB565 0x8D62
+#endif
+
+#ifndef GL_TEXTURE_GEN_STR_OES
+#define GL_TEXTURE_GEN_STR_OES                                  0x8D60
+#endif
+
+#ifndef GL_OES_compressed_paletted_texture
+#define GL_PALETTE4_RGB8_OES                                    0x8B90
+#define GL_PALETTE4_RGBA8_OES                                   0x8B91
+#define GL_PALETTE4_R5_G6_B5_OES                                0x8B92
+#define GL_PALETTE4_RGBA4_OES                                   0x8B93
+#define GL_PALETTE4_RGB5_A1_OES                                 0x8B94
+#define GL_PALETTE8_RGB8_OES                                    0x8B95
+#define GL_PALETTE8_RGBA8_OES                                   0x8B96
+#define GL_PALETTE8_R5_G6_B5_OES                                0x8B97
+#define GL_PALETTE8_RGBA4_OES                                   0x8B98
+#define GL_PALETTE8_RGB5_A1_OES                                 0x8B99
+#endif
+
+#ifndef GL_ES_VERSION_2_0
+#define GL_SHADER_BINARY_FORMATS            0x8DF8
+#define GL_NUM_SHADER_BINARY_FORMATS        0x8DF9
+#define GL_SHADER_COMPILER                  0x8DFA
+#define GL_MAX_VERTEX_UNIFORM_VECTORS       0x8DFB
+#define GL_MAX_VARYING_VECTORS              0x8DFC
+#define GL_MAX_FRAGMENT_UNIFORM_VECTORS     0x8DFD
+#endif
+
+#ifndef GL_ATI_texture_compression_3dc
+#define GL_ATI_texture_compression_3dc 1
+#define GL_COMPRESSED_LUMINANCE_ALPHA_3DC_ATI 0x8837
+#endif
+
+#ifndef GL_OES_compressed_ETC1_RGB8_texture
+#define GL_ETC1_RGB8_OES                                        0x8D64
+#endif
+
+
+/* Inexplicably, GL_HALF_FLOAT_OES has a different value than GL_HALF_FLOAT.
+ */
+#ifndef GL_HALF_FLOAT_OES
+#define GL_HALF_FLOAT_OES 0x8D61
+#endif
+
+
+/**
+ * Internal token to represent a GLSL shader program (a collection of
+ * one or more shaders that get linked together).  Note that GLSL
+ * shaders and shader programs share one name space (one hash table)
+ * so we need a value that's different from any of the
+ * GL_VERTEX/FRAGMENT/GEOMETRY_PROGRAM tokens.
+ */
+#define GL_SHADER_PROGRAM_MESA 0x9999
+
+
+/**
+ * Internal token for geometry programs.
+ * Use the value for GL_GEOMETRY_PROGRAM_NV for now.
+ */
+#define MESA_GEOMETRY_PROGRAM 0x8c26
+
+/* Several fields of struct gl_config can take these as values.  Since
+ * GLX header files may not be available everywhere they need to be used,
+ * redefine them here.
+ */
+#define GLX_NONE                           0x8000
+#define GLX_SLOW_CONFIG                    0x8001
+#define GLX_TRUE_COLOR                     0x8002
+#define GLX_DIRECT_COLOR                   0x8003
+#define GLX_PSEUDO_COLOR                   0x8004
+#define GLX_STATIC_COLOR                   0x8005
+#define GLX_GRAY_SCALE                     0x8006
+#define GLX_STATIC_GRAY                    0x8007
+#define GLX_TRANSPARENT_RGB                0x8008
+#define GLX_TRANSPARENT_INDEX              0x8009
+#define GLX_NON_CONFORMANT_CONFIG          0x800D
+#define GLX_SWAP_EXCHANGE_OML              0x8061
+#define GLX_SWAP_COPY_OML                  0x8062
+#define GLX_SWAP_UNDEFINED_OML             0x8063
+
+#define GLX_DONT_CARE                      0xFFFFFFFF
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* GLHEADER_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/hash.c b/icd/intel/compiler/mesa-utils/src/mesa/main/hash.c
new file mode 100644
index 0000000..23018e9
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/hash.c
@@ -0,0 +1,520 @@
+/**
+ * \file hash.c
+ * Generic hash table. 
+ *
+ * Used for display lists, texture objects, vertex/fragment programs,
+ * buffer objects, etc.  The hash functions are thread-safe.
+ * 
+ * \note key=0 is illegal.
+ *
+ * \author Brian Paul
+ */
+
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2006  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "glheader.h"
+#include "imports.h"
+#include "hash.h"
+#include "hash_table.h"
+
+/**
+ * Magic GLuint object name that gets stored outside of the struct hash_table.
+ *
+ * The hash table needs a particular pointer to be the marker for a key that
+ * was deleted from the table, along with NULL for the "never allocated in the
+ * table" marker.  Legacy GL allows any GLuint to be used as a GL object name,
+ * and we use a 1:1 mapping from GLuints to key pointers, so we need to be
+ * able to track a GLuint that happens to match the deleted key outside of
+ * struct hash_table.  We tell the hash table to use "1" as the deleted key
+ * value, so that we test the deleted-key-in-the-table path as best we can.
+ */
+#define DELETED_KEY_VALUE 1
+
+/**
+ * The hash table data structure.  
+ */
+struct _mesa_HashTable {
+   struct hash_table *ht;
+   GLuint MaxKey;                        /**< highest key inserted so far */
+   mtx_t Mutex;                /**< mutual exclusion lock */
+   mtx_t WalkMutex;            /**< for _mesa_HashWalk() */
+   GLboolean InDeleteAll;                /**< Debug check */
+   /** Value that would be in the table for DELETED_KEY_VALUE. */
+   void *deleted_key_data;
+};
+
+/** @{
+ * Mapping from our use of GLuint as both the key and the hash value to the
+ * hash_table.h API
+ *
+ * There exist many integer hash functions, designed to avoid collisions when
+ * the integers are spread across key space with some patterns.  In GL, the
+ * pattern (in the case of glGen*()ed object IDs) is that the keys are unique
+ * contiguous integers starting from 1.  Because of that, we just use the key
+ * as the hash value, to minimize the cost of the hash function.  If objects
+ * are never deleted, we will never see a collision in the table, because the
+ * table resizes itself when it approaches full, and thus key % table_size ==
+ * key.
+ *
+ * The case where we could have collisions for genned objects would be
+ * something like: glGenBuffers(100, &a); glDeleteBuffers(50, &a + 50);
+ * glGenBuffers(100, &b), because objects 1-50 and 101-200 are allocated at
+ * the end of that sequence, instead of 1-150.  So far it doesn't appear to be
+ * a problem.
+ */
+static bool
+uint_key_compare(const void *a, const void *b)
+{
+   return a == b;
+}
+
+static uint32_t
+uint_hash(GLuint id)
+{
+   return id;
+}
+
+static void *
+uint_key(GLuint id)
+{
+   return (void *)(uintptr_t) id;
+}
+/** @} */
+
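Because the hash is the identity and glGen*-style names are handed out contiguously, the collision scenario in the comment can be traced concretely. The sketch below assumes a fresh context and is illustrative only:

```c
#include "main/glheader.h"

/* Worked trace of the collision scenario described above. */
static void
trace_collisions(void)
{
   GLuint a[100], b[100];

   glGenBuffers(100, a);         /* names 1..100, MaxKey = 100 */
   glDeleteBuffers(50, a + 50);  /* frees names 51..100 */
   glGenBuffers(100, b);         /* quick path hands out 101..200 */

   /* Live keys are now 1..50 and 101..200 rather than 1..150, so two
    * keys can share key % table_size until the table grows past 200. */
}
```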
+/**
+ * Create a new hash table.
+ * 
+ * \return pointer to a new, empty hash table.
+ */
+struct _mesa_HashTable *
+_mesa_NewHashTable(void)
+{
+   struct _mesa_HashTable *table = CALLOC_STRUCT(_mesa_HashTable);
+
+   if (table) {
+      table->ht = _mesa_hash_table_create(NULL, uint_key_compare);
+      _mesa_hash_table_set_deleted_key(table->ht, uint_key(DELETED_KEY_VALUE));
+      mtx_init(&table->Mutex, mtx_plain);
+      mtx_init(&table->WalkMutex, mtx_plain);
+   }
+   return table;
+}
+
+
+
+/**
+ * Delete a hash table.
+ * Frees each entry on the hash table and then the hash table structure itself.
+ * Note that the caller should have already traversed the table and deleted
+ * the objects in the table (i.e. We don't free the entries' data pointer).
+ *
+ * \param table the hash table to delete.
+ */
+void
+_mesa_DeleteHashTable(struct _mesa_HashTable *table)
+{
+   assert(table);
+
+   if (_mesa_hash_table_next_entry(table->ht, NULL) != NULL) {
+      _mesa_problem(NULL, "In _mesa_DeleteHashTable, found non-freed data");
+   }
+
+   _mesa_hash_table_destroy(table->ht, NULL);
+
+   mtx_destroy(&table->Mutex);
+   mtx_destroy(&table->WalkMutex);
+   free(table);
+}
+
+
+
+/**
+ * Lookup an entry in the hash table, without locking.
+ * \sa _mesa_HashLookup
+ */
+static inline void *
+_mesa_HashLookup_unlocked(struct _mesa_HashTable *table, GLuint key)
+{
+   const struct hash_entry *entry;
+
+   assert(table);
+   assert(key);
+
+   if (key == DELETED_KEY_VALUE)
+      return table->deleted_key_data;
+
+   entry = _mesa_hash_table_search(table->ht, uint_hash(key), uint_key(key));
+   if (!entry)
+      return NULL;
+
+   return entry->data;
+}
+
+
+/**
+ * Lookup an entry in the hash table.
+ * 
+ * \param table the hash table.
+ * \param key the key.
+ * 
+ * \return pointer to user's data or NULL if key not in table
+ */
+void *
+_mesa_HashLookup(struct _mesa_HashTable *table, GLuint key)
+{
+   void *res;
+   assert(table);
+   mtx_lock(&table->Mutex);
+   res = _mesa_HashLookup_unlocked(table, key);
+   mtx_unlock(&table->Mutex);
+   return res;
+}
+
+
+/**
+ * Lookup an entry in the hash table without locking the mutex.
+ *
+ * The hash table mutex must be locked manually by calling
+ * _mesa_HashLockMutex() before calling this function.
+ *
+ * \param table the hash table.
+ * \param key the key.
+ *
+ * \return pointer to user's data or NULL if key not in table
+ */
+void *
+_mesa_HashLookupLocked(struct _mesa_HashTable *table, GLuint key)
+{
+   return _mesa_HashLookup_unlocked(table, key);
+}
+
+
+/**
+ * Lock the hash table mutex.
+ *
+ * This function should be used when multiple objects need
+ * to be looked up in the hash table, to avoid having to lock
+ * and unlock the mutex each time.
+ *
+ * \param table the hash table.
+ */
+void
+_mesa_HashLockMutex(struct _mesa_HashTable *table)
+{
+   assert(table);
+   mtx_lock(&table->Mutex);
+}
+
+
+/**
+ * Unlock the hash table mutex.
+ *
+ * \param table the hash table.
+ */
+void
+_mesa_HashUnlockMutex(struct _mesa_HashTable *table)
+{
+   assert(table);
+   mtx_unlock(&table->Mutex);
+}
+
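As the comments above spell out, the Locked variants exist so a batch of lookups pays for one lock round-trip instead of one per key. A minimal sketch of that pattern (the helper itself is hypothetical):

```c
/* Look up several object names under a single lock acquisition. */
static void
lookup_batch(struct _mesa_HashTable *table,
             const GLuint *keys, GLuint count, void **out)
{
   GLuint i;

   _mesa_HashLockMutex(table);
   for (i = 0; i < count; i++)
      out[i] = _mesa_HashLookupLocked(table, keys[i]);
   _mesa_HashUnlockMutex(table);
}
```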
+
+static inline void
+_mesa_HashInsert_unlocked(struct _mesa_HashTable *table, GLuint key, void *data)
+{
+   uint32_t hash = uint_hash(key);
+   struct hash_entry *entry;
+
+   assert(table);
+   assert(key);
+
+   if (key > table->MaxKey)
+      table->MaxKey = key;
+
+   if (key == DELETED_KEY_VALUE) {
+      table->deleted_key_data = data;
+   } else {
+      entry = _mesa_hash_table_search(table->ht, hash, uint_key(key));
+      if (entry) {
+         entry->data = data;
+      } else {
+         _mesa_hash_table_insert(table->ht, hash, uint_key(key), data);
+      }
+   }
+}
+
+
+/**
+ * Insert a key/pointer pair into the hash table without locking the mutex.
+ * If an entry with this key already exists we'll replace the existing entry.
+ *
+ * The hash table mutex must be locked manually by calling
+ * _mesa_HashLockMutex() before calling this function.
+ *
+ * \param table the hash table.
+ * \param key the key (not zero).
+ * \param data pointer to user data.
+ */
+void
+_mesa_HashInsertLocked(struct _mesa_HashTable *table, GLuint key, void *data)
+{
+   _mesa_HashInsert_unlocked(table, key, data);
+}
+
+
+/**
+ * Insert a key/pointer pair into the hash table.
+ * If an entry with this key already exists we'll replace the existing entry.
+ *
+ * \param table the hash table.
+ * \param key the key (not zero).
+ * \param data pointer to user data.
+ */
+void
+_mesa_HashInsert(struct _mesa_HashTable *table, GLuint key, void *data)
+{
+   assert(table);
+   mtx_lock(&table->Mutex);
+   _mesa_HashInsert_unlocked(table, key, data);
+   mtx_unlock(&table->Mutex);
+}
+
+
+/**
+ * Remove an entry from the hash table.
+ * 
+ * \param table the hash table.
+ * \param key key of entry to remove.
+ *
+ * While holding the hash table's lock, searches the entry with the matching
+ * key and unlinks it.
+ */
+void
+_mesa_HashRemove(struct _mesa_HashTable *table, GLuint key)
+{
+   struct hash_entry *entry;
+
+   assert(table);
+   assert(key);
+
+   /* have to check this outside of mutex lock */
+   if (table->InDeleteAll) {
+      _mesa_problem(NULL, "_mesa_HashRemove illegally called from "
+                    "_mesa_HashDeleteAll callback function");
+      return;
+   }
+
+   mtx_lock(&table->Mutex);
+   if (key == DELETED_KEY_VALUE) {
+      table->deleted_key_data = NULL;
+   } else {
+      entry = _mesa_hash_table_search(table->ht, uint_hash(key), uint_key(key));
+      _mesa_hash_table_remove(table->ht, entry);
+   }
+   mtx_unlock(&table->Mutex);
+}
+
+
+
+/**
+ * Delete all entries in a hash table, but don't delete the table itself.
+ * Invoke the given callback function for each table entry.
+ *
+ * \param table  the hash table to delete
+ * \param callback  the callback function
+ * \param userData  arbitrary pointer to pass along to the callback
+ *                  (this is typically a struct gl_context pointer)
+ */
+void
+_mesa_HashDeleteAll(struct _mesa_HashTable *table,
+                    void (*callback)(GLuint key, void *data, void *userData),
+                    void *userData)
+{
+   struct hash_entry *entry;
+
+   ASSERT(table);
+   ASSERT(callback);
+   mtx_lock(&table->Mutex);
+   table->InDeleteAll = GL_TRUE;
+   hash_table_foreach(table->ht, entry) {
+      callback((uintptr_t)entry->key, entry->data, userData);
+      _mesa_hash_table_remove(table->ht, entry);
+   }
+   if (table->deleted_key_data) {
+      callback(DELETED_KEY_VALUE, table->deleted_key_data, userData);
+      table->deleted_key_data = NULL;
+   }
+   table->InDeleteAll = GL_FALSE;
+   mtx_unlock(&table->Mutex);
+}
+
+
+/**
+ * Clone all entries in a hash table, into a new table.
+ *
+ * \param table  the hash table to clone
+ */
+struct _mesa_HashTable *
+_mesa_HashClone(const struct _mesa_HashTable *table)
+{
+   /* cast-away const */
+   struct _mesa_HashTable *table2 = (struct _mesa_HashTable *) table;
+   struct hash_entry *entry;
+   struct _mesa_HashTable *clonetable;
+
+   ASSERT(table);
+   mtx_lock(&table2->Mutex);
+
+   clonetable = _mesa_NewHashTable();
+   assert(clonetable);
+   hash_table_foreach(table->ht, entry) {
+      _mesa_HashInsert(clonetable, (GLuint)(uintptr_t)entry->key, entry->data);
+   }
+
+   mtx_unlock(&table2->Mutex);
+
+   return clonetable;
+}
+
+
+/**
+ * Walk over all entries in a hash table, calling callback function for each.
+ * Note: we use a separate mutex in this function to avoid a recursive
+ * locking deadlock (in case the callback calls _mesa_HashRemove()) and to
+ * prevent multiple threads/contexts from getting tangled up.
+ * A lock-less version of this function could be used when the table will
+ * not be modified.
+ * \param table  the hash table to walk
+ * \param callback  the callback function
+ * \param userData  arbitrary pointer to pass along to the callback
+ *                  (this is typically a struct gl_context pointer)
+ */
+void
+_mesa_HashWalk(const struct _mesa_HashTable *table,
+               void (*callback)(GLuint key, void *data, void *userData),
+               void *userData)
+{
+   /* cast-away const */
+   struct _mesa_HashTable *table2 = (struct _mesa_HashTable *) table;
+   struct hash_entry *entry;
+
+   ASSERT(table);
+   ASSERT(callback);
+   mtx_lock(&table2->WalkMutex);
+   hash_table_foreach(table->ht, entry) {
+      callback((uintptr_t)entry->key, entry->data, userData);
+   }
+   if (table->deleted_key_data)
+      callback(DELETED_KEY_VALUE, table->deleted_key_data, userData);
+   mtx_unlock(&table2->WalkMutex);
+}
+
+static void
+debug_print_entry(GLuint key, void *data, void *userData)
+{
+   _mesa_debug(NULL, "%u %p\n", key, data);
+}
+
+/**
+ * Dump contents of hash table for debugging.
+ * 
+ * \param table the hash table.
+ */
+void
+_mesa_HashPrint(const struct _mesa_HashTable *table)
+{
+   if (table->deleted_key_data)
+      debug_print_entry(DELETED_KEY_VALUE, table->deleted_key_data, NULL);
+   _mesa_HashWalk(table, debug_print_entry, NULL);
+}
+
+
+/**
+ * Find a block of adjacent unused hash keys.
+ * 
+ * \param table the hash table.
+ * \param numKeys number of keys needed.
+ * 
+ * \return Starting key of free block or 0 if failure.
+ *
+ * If there are enough free keys between the maximum key existing in the table
+ * (_mesa_HashTable::MaxKey) and the maximum key possible, then simply return
+ * the adjacent key. Otherwise do a full search for a free key block in the
+ * allowable key range.
+ */
+GLuint
+_mesa_HashFindFreeKeyBlock(struct _mesa_HashTable *table, GLuint numKeys)
+{
+   const GLuint maxKey = ~((GLuint) 0) - 1;
+   mtx_lock(&table->Mutex);
+   if (maxKey - numKeys > table->MaxKey) {
+      /* the quick solution */
+      mtx_unlock(&table->Mutex);
+      return table->MaxKey + 1;
+   }
+   else {
+      /* the slow solution */
+      GLuint freeCount = 0;
+      GLuint freeStart = 1;
+      GLuint key;
+      for (key = 1; key != maxKey; key++) {
+	 if (_mesa_HashLookup_unlocked(table, key)) {
+	    /* darn, this key is already in use */
+	    freeCount = 0;
+	    freeStart = key+1;
+	 }
+	 else {
+	    /* this key not in use, check if we've found enough */
+	    freeCount++;
+	    if (freeCount == numKeys) {
+               mtx_unlock(&table->Mutex);
+	       return freeStart;
+	    }
+	 }
+      }
+      /* cannot allocate a block of numKeys consecutive keys */
+      mtx_unlock(&table->Mutex);
+      return 0;
+   }
+}
+
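This block finder is the primitive behind glGen*-style name allocation: reserve a contiguous run, then mark each name as used. Note that it takes the table mutex itself, so the caller must not already hold it. A hedged sketch of such an allocator; the placeholder-object convention is an assumption, not part of this patch:

```c
/* Hypothetical glGen*-style allocator. */
static GLboolean
gen_names(struct _mesa_HashTable *table, GLuint n, GLuint *names,
          void *placeholder)
{
   GLuint first, i;

   first = _mesa_HashFindFreeKeyBlock(table, n);  /* locks internally */
   if (first == 0)
      return GL_FALSE;               /* no contiguous block available */

   for (i = 0; i < n; i++) {
      names[i] = first + i;
      _mesa_HashInsert(table, names[i], placeholder);
   }
   return GL_TRUE;
}
```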
+
+/**
+ * Return the number of entries in the hash table.
+ */
+GLuint
+_mesa_HashNumEntries(const struct _mesa_HashTable *table)
+{
+   struct hash_entry *entry;
+   GLuint count = 0;
+
+   if (table->deleted_key_data)
+      count++;
+
+   hash_table_foreach(table->ht, entry)
+      count++;
+
+   return count;
+}
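Taken together, the wrapper behaves like a plain GLuint-to-pointer map, with key 1 (DELETED_KEY_VALUE) transparently parked in deleted_key_data instead of the underlying table. A minimal lifecycle sketch, assuming the headers from this patch:

```c
#include <assert.h>
#include "main/hash.h"

static void
hash_demo(void)
{
   struct _mesa_HashTable *t = _mesa_NewHashTable();
   int obj = 42;

   _mesa_HashInsert(t, 1, &obj);    /* key 1 lands in deleted_key_data */
   _mesa_HashInsert(t, 2, &obj);    /* key 2 lands in the real table */
   assert(_mesa_HashLookup(t, 1) == &obj);
   assert(_mesa_HashNumEntries(t) == 2);

   _mesa_HashRemove(t, 1);
   _mesa_HashRemove(t, 2);
   _mesa_DeleteHashTable(t);        /* expects an emptied table */
}
```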
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/hash.h b/icd/intel/compiler/mesa-utils/src/mesa/main/hash.h
new file mode 100644
index 0000000..e3e8f49
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/hash.h
@@ -0,0 +1,80 @@
+/**
+ * \file hash.h
+ * Generic hash table. 
+ */
+
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2006  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef HASH_H
+#define HASH_H
+
+
+#include "glheader.h"
+
+
+extern struct _mesa_HashTable *_mesa_NewHashTable(void);
+
+extern void _mesa_DeleteHashTable(struct _mesa_HashTable *table);
+
+extern void *_mesa_HashLookup(struct _mesa_HashTable *table, GLuint key);
+
+extern void _mesa_HashInsert(struct _mesa_HashTable *table, GLuint key, void *data);
+
+extern void _mesa_HashRemove(struct _mesa_HashTable *table, GLuint key);
+
+extern void _mesa_HashLockMutex(struct _mesa_HashTable *table);
+
+extern void _mesa_HashUnlockMutex(struct _mesa_HashTable *table);
+
+extern void *_mesa_HashLookupLocked(struct _mesa_HashTable *table, GLuint key);
+
+extern void _mesa_HashInsertLocked(struct _mesa_HashTable *table,
+                                   GLuint key, void *data);
+
+extern void
+_mesa_HashDeleteAll(struct _mesa_HashTable *table,
+                    void (*callback)(GLuint key, void *data, void *userData),
+                    void *userData);
+
+extern struct _mesa_HashTable *
+_mesa_HashClone(const struct _mesa_HashTable *table);
+
+extern void
+_mesa_HashWalk(const struct _mesa_HashTable *table,
+               void (*callback)(GLuint key, void *data, void *userData),
+               void *userData);
+
+extern void _mesa_HashPrint(const struct _mesa_HashTable *table);
+
+extern GLuint _mesa_HashFindFreeKeyBlock(struct _mesa_HashTable *table, GLuint numKeys);
+
+extern GLuint
+_mesa_HashNumEntries(const struct _mesa_HashTable *table);
+
+extern void _mesa_test_hash_functions(void);
+
+
+#endif
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/hash_table.c b/icd/intel/compiler/mesa-utils/src/mesa/main/hash_table.c
new file mode 100644
index 0000000..d6c5851
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/hash_table.c
@@ -0,0 +1,441 @@
+/*
+ * Copyright © 2009,2012 Intel Corporation
+ * Copyright © 1988-2004 Keith Packard and Bart Massey.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Except as contained in this notice, the names of the authors
+ * or their institutions shall not be used in advertising or
+ * otherwise to promote the sale, use or other dealings in this
+ * Software without prior written authorization from the
+ * authors.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *    Keith Packard <keithp@keithp.com>
+ */
+
+/**
+ * Implements an open-addressing, linear-reprobing hash table.
+ *
+ * For more information, see:
+ *
+ * http://cgit.freedesktop.org/~anholt/hash_table/tree/README
+ */
+
+#include <stdlib.h>
+#include <string.h>
+
+#include "icd-utils.h" // LunarG: ADD
+#include "main/hash_table.h"
+#include "main/macros.h"
+#include "ralloc.h"
+
+static const uint32_t deleted_key_value;
+
+/**
+ * From Knuth -- a good choice for hash/rehash values is p, p-2 where
+ * p and p-2 are both prime.  These tables are sized to have an extra 10%
+ * free to avoid exponential performance degradation as the hash table fills
+ */
+static const struct {
+   uint32_t max_entries, size, rehash;
+} hash_sizes[] = {
+   { 2,			5,		3	  },
+   { 4,			7,		5	  },
+   { 8,			13,		11	  },
+   { 16,		19,		17	  },
+   { 32,		43,		41        },
+   { 64,		73,		71        },
+   { 128,		151,		149       },
+   { 256,		283,		281       },
+   { 512,		571,		569       },
+   { 1024,		1153,		1151      },
+   { 2048,		2269,		2267      },
+   { 4096,		4519,		4517      },
+   { 8192,		9013,		9011      },
+   { 16384,		18043,		18041     },
+   { 32768,		36109,		36107     },
+   { 65536,		72091,		72089     },
+   { 131072,		144409,		144407    },
+   { 262144,		288361,		288359    },
+   { 524288,		576883,		576881    },
+   { 1048576,		1153459,	1153457   },
+   { 2097152,		2307163,	2307161   },
+   { 4194304,		4613893,	4613891   },
+   { 8388608,		9227641,	9227639   },
+   { 16777216,		18455029,	18455027  },
+   { 33554432,		36911011,	36911009  },
+   { 67108864,		73819861,	73819859  },
+   { 134217728,		147639589,	147639587 },
+   { 268435456,		295279081,	295279079 },
+   { 536870912,		590559793,	590559791 },
+   { 1073741824,	1181116273,	1181116271},
+   { 2147483648ul,	2362232233ul,	2362232231ul}
+};
+
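Each row pairs a prime table size with a prime rehash value two below it. Search and insert both advance by a second hash derived from the rehash prime; since the step is in [1, rehash] and the size is prime, every probe sequence cycles through the whole table before repeating. A sketch of the step both loops share, factored out here for illustration only:

```c
#include <stdint.h>

/* Double-hashing probe step: gcd(step, size) == 1, so the walk visits
 * every slot before returning to its starting address. */
static uint32_t
next_probe(uint32_t address, uint32_t hash, uint32_t size, uint32_t rehash)
{
   uint32_t step = 1 + hash % rehash;
   return (address + step) % size;
}
```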
+static int
+entry_is_free(const struct hash_entry *entry)
+{
+   return entry->key == NULL;
+}
+
+static int
+entry_is_deleted(const struct hash_table *ht, struct hash_entry *entry)
+{
+   return entry->key == ht->deleted_key;
+}
+
+static int
+entry_is_present(const struct hash_table *ht, struct hash_entry *entry)
+{
+   return entry->key != NULL && entry->key != ht->deleted_key;
+}
+
+struct hash_table *
+_mesa_hash_table_create(void *mem_ctx,
+                        bool (*key_equals_function)(const void *a,
+                                                    const void *b))
+{
+   struct hash_table *ht;
+
+   ht = ralloc(mem_ctx, struct hash_table);
+   if (ht == NULL)
+      return NULL;
+
+   ht->size_index = 0;
+   ht->size = hash_sizes[ht->size_index].size;
+   ht->rehash = hash_sizes[ht->size_index].rehash;
+   ht->max_entries = hash_sizes[ht->size_index].max_entries;
+   ht->key_equals_function = key_equals_function;
+   ht->table = rzalloc_array(ht, struct hash_entry, ht->size);
+   ht->entries = 0;
+   ht->deleted_entries = 0;
+   ht->deleted_key = &deleted_key_value;
+
+   if (ht->table == NULL) {
+      ralloc_free(ht);
+      return NULL;
+   }
+
+   return ht;
+}
+
+/**
+ * Frees the given hash table.
+ *
+ * If delete_function is passed, it gets called on each entry present before
+ * freeing.
+ */
+void
+_mesa_hash_table_destroy(struct hash_table *ht,
+                         void (*delete_function)(struct hash_entry *entry))
+{
+   if (!ht)
+      return;
+
+   if (delete_function) {
+      struct hash_entry *entry;
+
+      hash_table_foreach(ht, entry) {
+         delete_function(entry);
+      }
+   }
+   ralloc_free(ht);
+}
+
+/** Sets the value of the key pointer used for deleted entries in the table.
+ *
+ * The assumption is that usually keys are actual pointers, so we use a
+ * default value of a pointer to an arbitrary piece of storage in the library.
+ * But in some cases a consumer wants to store some other sort of value in the
+ * table, like a uint32_t, in which case that pointer may conflict with one of
+ * their valid keys.  This lets that user select a safe value.
+ *
+ * This must be called before any keys are actually deleted from the table.
+ */
+void
+_mesa_hash_table_set_deleted_key(struct hash_table *ht, const void *deleted_key)
+{
+   ht->deleted_key = deleted_key;
+}
+
+/**
+ * Finds a hash table entry with the given key and hash of that key.
+ *
+ * Returns NULL if no entry is found.  Note that the data pointer may be
+ * modified by the user.
+ */
+struct hash_entry *
+_mesa_hash_table_search(struct hash_table *ht, uint32_t hash,
+                        const void *key)
+{
+   uint32_t start_hash_address = hash % ht->size;
+   uint32_t hash_address = start_hash_address;
+
+   do {
+      uint32_t double_hash;
+
+      struct hash_entry *entry = ht->table + hash_address;
+
+      if (entry_is_free(entry)) {
+         return NULL;
+      } else if (entry_is_present(ht, entry) && entry->hash == hash) {
+         if (ht->key_equals_function(key, entry->key)) {
+            return entry;
+         }
+      }
+
+      double_hash = 1 + hash % ht->rehash;
+
+      hash_address = (hash_address + double_hash) % ht->size;
+   } while (hash_address != start_hash_address);
+
+   return NULL;
+}
+
+static void
+_mesa_hash_table_rehash(struct hash_table *ht, int new_size_index)
+{
+   struct hash_table old_ht;
+   struct hash_entry *table, *entry;
+
+   if (new_size_index >= ARRAY_SIZE(hash_sizes))
+      return;
+
+   table = rzalloc_array(ht, struct hash_entry,
+                         hash_sizes[new_size_index].size);
+   if (table == NULL)
+      return;
+
+   old_ht = *ht;
+
+   ht->table = table;
+   ht->size_index = new_size_index;
+   ht->size = hash_sizes[ht->size_index].size;
+   ht->rehash = hash_sizes[ht->size_index].rehash;
+   ht->max_entries = hash_sizes[ht->size_index].max_entries;
+   ht->entries = 0;
+   ht->deleted_entries = 0;
+
+   hash_table_foreach(&old_ht, entry) {
+      _mesa_hash_table_insert(ht, entry->hash,
+                              entry->key, entry->data);
+   }
+
+   ralloc_free(old_ht.table);
+}
+
+/**
+ * Inserts the key with the given hash into the table.
+ *
+ * Note that insertion may rearrange the table on a resize or rehash,
+ * so previously found hash_entries are no longer valid after this function.
+ */
+struct hash_entry *
+_mesa_hash_table_insert(struct hash_table *ht, uint32_t hash,
+                        const void *key, void *data)
+{
+   uint32_t start_hash_address, hash_address;
+
+   if (ht->entries >= ht->max_entries) {
+      _mesa_hash_table_rehash(ht, ht->size_index + 1);
+   } else if (ht->deleted_entries + ht->entries >= ht->max_entries) {
+      _mesa_hash_table_rehash(ht, ht->size_index);
+   }
+
+   start_hash_address = hash % ht->size;
+   hash_address = start_hash_address;
+   do {
+      struct hash_entry *entry = ht->table + hash_address;
+      uint32_t double_hash;
+
+      if (!entry_is_present(ht, entry)) {
+         if (entry_is_deleted(ht, entry))
+            ht->deleted_entries--;
+         entry->hash = hash;
+         entry->key = key;
+         entry->data = data;
+         ht->entries++;
+         return entry;
+      }
+
+      /* Implement replacement when another insert happens
+       * with a matching key.  This is a relatively common
+       * feature of hash tables, with the alternative
+       * generally being "insert the new value as well, and
+       * return it first when the key is searched for".
+       *
+       * Note that the hash table doesn't have a delete
+       * callback.  If freeing of old data pointers is
+       * required to avoid memory leaks, perform a search
+       * before inserting.
+       */
+      if (entry->hash == hash &&
+          ht->key_equals_function(key, entry->key)) {
+         entry->key = key;
+         entry->data = data;
+         return entry;
+      }
+
+
+      double_hash = 1 + hash % ht->rehash;
+
+      hash_address = (hash_address + double_hash) % ht->size;
+   } while (hash_address != start_hash_address);
+
+   /* We could get here if a required resize failed. An unchecked-malloc
+    * application could ignore this result.
+    */
+   return NULL;
+}
+
+/**
+ * This function deletes the given hash table entry.
+ *
+ * Note that deletion doesn't otherwise modify the table, so an iteration over
+ * the table deleting entries is safe.
+ */
+void
+_mesa_hash_table_remove(struct hash_table *ht,
+                        struct hash_entry *entry)
+{
+   if (!entry)
+      return;
+
+   entry->key = ht->deleted_key;
+   ht->entries--;
+   ht->deleted_entries++;
+}
+
+/**
+ * This function is an iterator over the hash table.
+ *
+ * Pass in NULL for the first entry, as in the start of a for loop.  Note that
+ * an iteration over the table is O(table_size) not O(entries).
+ */
+struct hash_entry *
+_mesa_hash_table_next_entry(struct hash_table *ht,
+                            struct hash_entry *entry)
+{
+   if (entry == NULL)
+      entry = ht->table;
+   else
+      entry = entry + 1;
+
+   for (; entry != ht->table + ht->size; entry++) {
+      if (entry_is_present(ht, entry)) {
+         return entry;
+      }
+   }
+
+   return NULL;
+}
+
+/**
+ * Returns a random entry from the hash table.
+ *
+ * This may be useful in implementing random replacement (as opposed
+ * to just removing everything) in caches based on this hash table
+ * implementation.  @predicate may be used to filter entries, or may
+ * be set to NULL for no filtering.
+ */
+struct hash_entry *
+_mesa_hash_table_random_entry(struct hash_table *ht,
+                              bool (*predicate)(struct hash_entry *entry))
+{
+   struct hash_entry *entry;
+   uint32_t i = rand() % ht->size;
+
+   if (ht->entries == 0)
+      return NULL;
+
+   for (entry = ht->table + i; entry != ht->table + ht->size; entry++) {
+      if (entry_is_present(ht, entry) &&
+          (!predicate || predicate(entry))) {
+         return entry;
+      }
+   }
+
+   for (entry = ht->table; entry != ht->table + i; entry++) {
+      if (entry_is_present(ht, entry) &&
+          (!predicate || predicate(entry))) {
+         return entry;
+      }
+   }
+
+   return NULL;
+}
+
+
+/**
+ * Quick FNV-1a hash implementation (xor each byte, then multiply) based on:
+ * http://www.isthe.com/chongo/tech/comp/fnv/
+ *
+ * FNV-1a may not be the best hash out there -- Jenkins's lookup3 is supposed to
+ * be quite good, and it probably beats FNV.  But FNV has the advantage that
+ * it involves almost no code.  For an improvement on both, see Paul
+ * Hsieh's http://www.azillionmonkeys.com/qed/hash.html
+ */
+uint32_t
+_mesa_hash_data(const void *data, size_t size)
+{
+   uint32_t hash = 2166136261ul;
+   const uint8_t *bytes = data;
+
+   while (size-- != 0) {
+      hash ^= *bytes;
+      hash = hash * 0x01000193;
+      bytes++;
+   }
+
+   return hash;
+}
+
+/** FNV-1a string hash implementation */
+uint32_t
+_mesa_hash_string(const char *key)
+{
+   uint32_t hash = 2166136261ul;
+
+   while (*key != 0) {
+      hash ^= *key;
+      hash = hash * 0x01000193;
+      key++;
+   }
+
+   return hash;
+}
+
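The published 32-bit FNV-1a test vectors apply directly to these routines; for the one-byte string "a", the hash folds the offset basis with 0x61 and multiplies once by the prime:

```c
#include <assert.h>

/* (0x811c9dc5 ^ 0x61) * 0x01000193 mod 2^32 == 0xe40c292c, the
 * reference test vector from the FNV page cited above. */
static void
check_fnv(void)
{
   assert(_mesa_hash_string("a") == 0xe40c292c);
}
```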
+/**
+ * String compare function for use as the comparison callback in
+ * _mesa_hash_table_create().
+ */
+bool
+_mesa_key_string_equal(const void *a, const void *b)
+{
+   return strcmp(a, b) == 0;
+}
+
+bool
+_mesa_key_pointer_equal(const void *a, const void *b)
+{
+   return a == b;
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/hash_table.h b/icd/intel/compiler/mesa-utils/src/mesa/main/hash_table.h
new file mode 100644
index 0000000..f51131a
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/hash_table.h
@@ -0,0 +1,106 @@
+/*
+ * Copyright © 2009,2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#ifndef _HASH_TABLE_H
+#define _HASH_TABLE_H
+
+#include <inttypes.h>
+#include <stdbool.h>
+
+#include "compiler.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct hash_entry {
+   uint32_t hash;
+   const void *key;
+   void *data;
+};
+
+struct hash_table {
+   struct hash_entry *table;
+   bool (*key_equals_function)(const void *a, const void *b);
+   const void *deleted_key;
+   uint32_t size;
+   uint32_t rehash;
+   uint32_t max_entries;
+   uint32_t size_index;
+   uint32_t entries;
+   uint32_t deleted_entries;
+};
+
+struct hash_table *
+_mesa_hash_table_create(void *mem_ctx,
+                        bool (*key_equals_function)(const void *a,
+                                                    const void *b));
+void _mesa_hash_table_destroy(struct hash_table *ht,
+                              void (*delete_function)(struct hash_entry *entry));
+void _mesa_hash_table_set_deleted_key(struct hash_table *ht,
+                                      const void *deleted_key);
+
+struct hash_entry *
+_mesa_hash_table_insert(struct hash_table *ht, uint32_t hash,
+                        const void *key, void *data);
+struct hash_entry *
+_mesa_hash_table_search(struct hash_table *ht, uint32_t hash,
+                        const void *key);
+void _mesa_hash_table_remove(struct hash_table *ht,
+                             struct hash_entry *entry);
+
+struct hash_entry *_mesa_hash_table_next_entry(struct hash_table *ht,
+                                               struct hash_entry *entry);
+struct hash_entry *
+_mesa_hash_table_random_entry(struct hash_table *ht,
+                              bool (*predicate)(struct hash_entry *entry));
+
+uint32_t _mesa_hash_data(const void *data, size_t size);
+uint32_t _mesa_hash_string(const char *key);
+bool _mesa_key_string_equal(const void *a, const void *b);
+bool _mesa_key_pointer_equal(const void *a, const void *b);
+
+static inline uint32_t _mesa_hash_pointer(const void *pointer)
+{
+   return _mesa_hash_data(&pointer, sizeof(pointer));
+}
+
+/**
+ * This foreach function is safe against deletion (which just replaces
+ * an entry's data with the deleted marker), but not against insertion
+ * (which may rehash the table, making entry a dangling pointer).
+ */
+#define hash_table_foreach(ht, entry)                   \
+   for (entry = _mesa_hash_table_next_entry(ht, NULL);  \
+        entry != NULL;                                  \
+        entry = _mesa_hash_table_next_entry(ht, entry))
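+
+/* Usage sketch (ht and entry are hypothetical locals):
+ *
+ *    struct hash_entry *entry;
+ *    hash_table_foreach(ht, entry) {
+ *       process(entry->key, entry->data);
+ *    }
+ *
+ * Removing entries during the walk is safe; inserting is not, since an
+ * insert may rehash the table and leave entry dangling.
+ */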
+
+#ifdef __cplusplus
+} /* extern C */
+#endif
+
+#endif /* _HASH_TABLE_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/imports.c b/icd/intel/compiler/mesa-utils/src/mesa/main/imports.c
new file mode 100644
index 0000000..a82272a
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/imports.c
@@ -0,0 +1,622 @@
+/**
+ * \file imports.c
+ * Standard C library function wrappers.
+ * 
+ * Imports are services which the device driver or window system or
+ * operating system provides to the core renderer.  The core renderer (Mesa)
+ * will call these functions in order to do memory allocation, simple I/O,
+ * etc.
+ *
+ * Some drivers will want to override/replace this file with something
+ * specialized, but that'll be rare.
+ *
+ * Eventually, I want to roll the glheader.h file into this.
+ *
+ * \todo Functions still needed:
+ * - scanf
+ * - qsort
+ * - rand and RAND_MAX
+ */
+
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+
+#include "imports.h"
+#include "mtypes.h"
+#include "version.h"
+
+#ifdef _GNU_SOURCE
+#include <locale.h>
+#ifdef __APPLE__
+#include <xlocale.h>
+#endif
+#endif
+
+
+#ifdef _WIN32
+#define vsnprintf _vsnprintf
+#elif defined(__IBMC__) || defined(__IBMCPP__)
+extern int vsnprintf(char *str, size_t count, const char *fmt, va_list arg);
+#endif
+
+/**********************************************************************/
+/** \name Memory */
+/*@{*/
+
+/**
+ * Allocate aligned memory.
+ *
+ * \param bytes number of bytes to allocate.
+ * \param alignment alignment (must be greater than zero).
+ * 
+ * Allocates extra memory to accommodate rounding up the address for
+ * alignment and to record the real malloc address.
+ *
+ * \sa _mesa_align_free().
+ */
+void *
+_mesa_align_malloc(size_t bytes, unsigned long alignment)
+{
+#if defined(HAVE_POSIX_MEMALIGN)
+   void *mem;
+   int err = posix_memalign(& mem, alignment, bytes);
+   if (err)
+      return NULL;
+   return mem;
+#elif defined(_WIN32) && defined(_MSC_VER)
+   return _aligned_malloc(bytes, alignment);
+#else
+   uintptr_t ptr, buf;
+
+   ASSERT( alignment > 0 );
+
+   ptr = (uintptr_t)malloc(bytes + alignment + sizeof(void *));
+   if (!ptr)
+      return NULL;
+
+   buf = (ptr + alignment + sizeof(void *)) & ~(uintptr_t)(alignment - 1);
+   *(uintptr_t *)(buf - sizeof(void *)) = ptr;
+
+#ifdef DEBUG
+   /* mark the non-aligned area */
+   while ( ptr < buf - sizeof(void *) ) {
+      *(unsigned long *)ptr = 0xcdcdcdcd;
+      ptr += sizeof(unsigned long);
+   }
+#endif
+
+   return (void *) buf;
+#endif /* defined(HAVE_POSIX_MEMALIGN) */
+}
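+
+/* Usage sketch (buf is a hypothetical name):
+ *
+ *    void *buf = _mesa_align_malloc(1024, 64);   // 64-byte aligned, or NULL
+ *    ...
+ *    _mesa_align_free(buf);
+ *
+ * On the fallback path the real malloc() address is stashed in the word
+ * just before the returned pointer, so plain free() must never be used on
+ * memory from this allocator.
+ */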
+
+/**
+ * Same as _mesa_align_malloc(), but using calloc(1, ) instead of
+ * malloc()
+ */
+void *
+_mesa_align_calloc(size_t bytes, unsigned long alignment)
+{
+#if defined(HAVE_POSIX_MEMALIGN)
+   void *mem;
+   
+   mem = _mesa_align_malloc(bytes, alignment);
+   if (mem != NULL) {
+      (void) memset(mem, 0, bytes);
+   }
+
+   return mem;
+#elif defined(_WIN32) && defined(_MSC_VER)
+   void *mem;
+
+   mem = _aligned_malloc(bytes, alignment);
+   if (mem != NULL) {
+      (void) memset(mem, 0, bytes);
+   }
+
+   return mem;
+#else
+   uintptr_t ptr, buf;
+
+   ASSERT( alignment > 0 );
+
+   ptr = (uintptr_t)calloc(1, bytes + alignment + sizeof(void *));
+   if (!ptr)
+      return NULL;
+
+   buf = (ptr + alignment + sizeof(void *)) & ~(uintptr_t)(alignment - 1);
+   *(uintptr_t *)(buf - sizeof(void *)) = ptr;
+
+#ifdef DEBUG
+   /* mark the non-aligned area */
+   while ( ptr < buf - sizeof(void *) ) {
+      *(unsigned long *)ptr = 0xcdcdcdcd;
+      ptr += sizeof(unsigned long);
+   }
+#endif
+
+   return (void *)buf;
+#endif /* defined(HAVE_POSIX_MEMALIGN) */
+}
+
+/**
+ * Free memory which was allocated with either _mesa_align_malloc()
+ * or _mesa_align_calloc().
+ * \param ptr pointer to the memory to be freed.
+ * The actual address to free is stored in the word immediately before the
+ * address the client sees.
+ * Note that it is legal to pass a NULL pointer to this function; it will
+ * be handled accordingly.
+ */
+void
+_mesa_align_free(void *ptr)
+{
+#if defined(HAVE_POSIX_MEMALIGN)
+   free(ptr);
+#elif defined(_WIN32) && defined(_MSC_VER)
+   _aligned_free(ptr);
+#else
+   if (ptr) {
+      void **cubbyHole = (void **) ((char *) ptr - sizeof(void *));
+      void *realAddr = *cubbyHole;
+      free(realAddr);
+   }
+#endif /* defined(HAVE_POSIX_MEMALIGN) */
+}
+
+/**
+ * Reallocate memory, with alignment.
+ */
+void *
+_mesa_align_realloc(void *oldBuffer, size_t oldSize, size_t newSize,
+                    unsigned long alignment)
+{
+#if defined(_WIN32) && defined(_MSC_VER)
+   (void) oldSize;
+   return _aligned_realloc(oldBuffer, newSize, alignment);
+#else
+   const size_t copySize = (oldSize < newSize) ? oldSize : newSize;
+   void *newBuf = _mesa_align_malloc(newSize, alignment);
+   if (newBuf && oldBuffer && copySize > 0) {
+      memcpy(newBuf, oldBuffer, copySize);
+   }
+
+   _mesa_align_free(oldBuffer);
+   return newBuf;
+#endif
+}
+
+
+
+/** Reallocate memory */
+void *
+_mesa_realloc(void *oldBuffer, size_t oldSize, size_t newSize)
+{
+   const size_t copySize = (oldSize < newSize) ? oldSize : newSize;
+   void *newBuffer = malloc(newSize);
+   if (newBuffer && oldBuffer && copySize > 0)
+      memcpy(newBuffer, oldBuffer, copySize);
+   free(oldBuffer);
+   return newBuffer;
+}
+
+/*@}*/
+
+
+/**********************************************************************/
+/** \name Math */
+/*@{*/
+
+
+#ifndef __GNUC__
+/**
+ * Find the first bit set in a word.
+ */
+int
+ffs(int i)
+{
+   register int bit = 0;
+   if (i != 0) {
+      if ((i & 0xffff) == 0) {
+         bit += 16;
+         i >>= 16;
+      }
+      if ((i & 0xff) == 0) {
+         bit += 8;
+         i >>= 8;
+      }
+      if ((i & 0xf) == 0) {
+         bit += 4;
+         i >>= 4;
+      }
+      while ((i & 1) == 0) {
+         bit++;
+         i >>= 1;
+      }
+      bit++;
+   }
+   return bit;
+}
+
+
+/**
+ * Find position of first bit set in given value.
+ * XXX Warning: this function can only be used on 64-bit systems!
+ * \return  position of least-significant bit set, starting at 1, return zero
+ *          if no bits set.
+ */
+int
+ffsll(long long int val)
+{
+   int bit;
+
+   assert(sizeof(val) == 8);
+
+   bit = ffs((int) val);
+   if (bit != 0)
+      return bit;
+
+   bit = ffs((int) (val >> 32));
+   if (bit != 0)
+      return 32 + bit;
+
+   return 0;
+}
+#endif /* __GNUC__ */
+
+
+#if !defined(__GNUC__) ||\
+   ((__GNUC__ * 100 + __GNUC_MINOR__) < 304) /* Not gcc 3.4 or later */
+/**
+ * Return number of bits set in given GLuint.
+ */
+unsigned int
+_mesa_bitcount(unsigned int n)
+{
+   unsigned int bits;
+   for (bits = 0; n > 0; n = n >> 1) {
+      bits += (n & 1);
+   }
+   return bits;
+}
+
+/**
+ * Return number of bits set in given 64-bit uint.
+ */
+unsigned int
+_mesa_bitcount_64(uint64_t n)
+{
+   unsigned int bits;
+   for (bits = 0; n > 0; n = n >> 1) {
+      bits += (n & 1);
+   }
+   return bits;
+}
+#endif
+
+
+/* Implementing roundToEven() with the C99 rounding functions is
+ * difficult, because round(), rint(), and nearbyint() are affected by
+ * fesetenv(), which the application may have called for its own
+ * purposes.  Mesa's IROUND() is close to what we want, but it
+ * rounds away from 0 on n + 0.5.
+ */
+int
+_mesa_round_to_even(float val)
+{
+   int rounded = IROUND(val);
+
+   if (val - floor(val) == 0.5) {
+      if (rounded % 2 != 0)
+         rounded += val > 0 ? -1 : 1;
+   }
+
+   return rounded;
+}
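+
+/* Illustrative values (assuming default round-to-nearest float math):
+ *
+ *    _mesa_round_to_even(2.5f)  ==  2   (tie, rounds to even)
+ *    _mesa_round_to_even(3.5f)  ==  4   (tie, rounds to even)
+ *    _mesa_round_to_even(-2.5f) == -2
+ *    _mesa_round_to_even(2.4f)  ==  2   (non-tie, rounds normally)
+ */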
+
+
+/**
+ * Convert a 4-byte float to a 2-byte half float.
+ *
+ * Not all float32 values can be represented exactly as a float16 value. We
+ * round such intermediate float32 values to the nearest float16. When the
+ * float32 lies exactly between two float16 values, we round to the one with
+ * an even mantissa.
+ *
+ * This rounding behavior has several benefits:
+ *   - It has no sign bias.
+ *
+ *   - It reproduces the behavior of real hardware: opcode F32TO16 in Intel's
+ *     GPU ISA.
+ *
+ *   - By reproducing the behavior of the GPU (at least on Intel hardware),
+ *     compile-time evaluation of constant packHalf2x16 GLSL expressions will
+ *     result in the same value as if the expression were executed on the GPU.
+ */
+GLhalfARB
+_mesa_float_to_half(float val)
+{
+   const fi_type fi = {val};
+   const int flt_m = fi.i & 0x7fffff;
+   const int flt_e = (fi.i >> 23) & 0xff;
+   const int flt_s = (fi.i >> 31) & 0x1;
+   int s, e, m = 0;
+   GLhalfARB result;
+   
+   /* sign bit */
+   s = flt_s;
+
+   /* handle special cases */
+   if ((flt_e == 0) && (flt_m == 0)) {
+      /* zero */
+      /* m = 0; - already set */
+      e = 0;
+   }
+   else if ((flt_e == 0) && (flt_m != 0)) {
+      /* denorm -- denorm float maps to 0 half */
+      /* m = 0; - already set */
+      e = 0;
+   }
+   else if ((flt_e == 0xff) && (flt_m == 0)) {
+      /* infinity */
+      /* m = 0; - already set */
+      e = 31;
+   }
+   else if ((flt_e == 0xff) && (flt_m != 0)) {
+      /* NaN */
+      m = 1;
+      e = 31;
+   }
+   else {
+      /* regular number */
+      const int new_exp = flt_e - 127;
+      if (new_exp < -14) {
+         /* The float32 lies in the range (0.0, min_normal16) and is rounded
+          * to a nearby float16 value. The result will be either zero, subnormal,
+          * or normal.
+          */
+         e = 0;
+         m = _mesa_round_to_even((1 << 24) * fabsf(fi.f));
+      }
+      else if (new_exp > 15) {
+         /* map this value to infinity */
+         /* m = 0; - already set */
+         e = 31;
+      }
+      else {
+         /* The float32 lies in the range
+          *   [min_normal16, max_normal16 + max_step16)
+          * and is rounded to a nearby float16 value. The result will be
+          * either normal or infinite.
+          */
+         e = new_exp + 15;
+         m = _mesa_round_to_even(flt_m / (float) (1 << 13));
+      }
+   }
+
+   assert(0 <= m && m <= 1024);
+   if (m == 1024) {
+      /* The float32 was rounded upwards into the range of the next exponent,
+       * so bump the exponent. This correctly handles the case where f32
+       * should be rounded up to float16 infinity.
+       */
+      ++e;
+      m = 0;
+   }
+
+   result = (s << 15) | (e << 10) | m;
+   return result;
+}
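+
+/* Worked examples (IEEE-754 half encodings, shown for illustration):
+ *
+ *    _mesa_float_to_half(1.0f)     == 0x3C00   (s=0, e=15, m=0)
+ *    _mesa_float_to_half(65504.0f) == 0x7BFF   (largest finite half)
+ *    _mesa_float_to_half(65520.0f) == 0x7C00   (rounds up to infinity)
+ */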
+
+
+/**
+ * Convert a 2-byte half float to a 4-byte float.
+ * Based on code from:
+ * http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/008786.html
+ */
+float
+_mesa_half_to_float(GLhalfARB val)
+{
+   /* XXX could also use a 64K-entry lookup table */
+   const int m = val & 0x3ff;
+   const int e = (val >> 10) & 0x1f;
+   const int s = (val >> 15) & 0x1;
+   int flt_m, flt_e, flt_s;
+   fi_type fi;
+   float result;
+
+   /* sign bit */
+   flt_s = s;
+
+   /* handle special cases */
+   if ((e == 0) && (m == 0)) {
+      /* zero */
+      flt_m = 0;
+      flt_e = 0;
+   }
+   else if ((e == 0) && (m != 0)) {
+      /* denorm -- denorm half will fit in non-denorm single */
+      const float half_denorm = 1.0f / 16384.0f; /* 2^-14 */
+      float mantissa = ((float) (m)) / 1024.0f;
+      float sign = s ? -1.0f : 1.0f;
+      return sign * mantissa * half_denorm;
+   }
+   else if ((e == 31) && (m == 0)) {
+      /* infinity */
+      flt_e = 0xff;
+      flt_m = 0;
+   }
+   else if ((e == 31) && (m != 0)) {
+      /* NaN */
+      flt_e = 0xff;
+      flt_m = 1;
+   }
+   else {
+      /* regular */
+      flt_e = e + 112;
+      flt_m = m << 13;
+   }
+
+   fi.i = (flt_s << 31) | (flt_e << 23) | flt_m;
+   result = fi.f;
+   return result;
+}
+
+/*@}*/
+
+
+/**********************************************************************/
+/** \name Sort & Search */
+/*@{*/
+
+/**
+ * Wrapper for bsearch().
+ */
+void *
+_mesa_bsearch( const void *key, const void *base, size_t nmemb, size_t size, 
+               int (*compar)(const void *, const void *) )
+{
+#if defined(_WIN32_WCE)
+   void *mid;
+   int cmp;
+   while (nmemb) {
+      nmemb >>= 1;
+      mid = (char *)base + nmemb * size;
+      cmp = (*compar)(key, mid);
+      if (cmp == 0)
+	 return mid;
+      if (cmp > 0) {
+	 base = (char *)mid + size;
+	 --nmemb;
+      }
+   }
+   return NULL;
+#else
+   return bsearch(key, base, nmemb, size, compar);
+#endif
+}
+
+/*@}*/
+
+
+/**********************************************************************/
+/** \name Environment vars */
+/*@{*/
+
+/**
+ * Wrapper for getenv().
+ */
+char *
+_mesa_getenv( const char *var )
+{
+#if defined(_XBOX) || defined(_WIN32_WCE)
+   return NULL;
+#else
+   return getenv(var);
+#endif
+}
+
+/*@}*/
+
+
+/**********************************************************************/
+/** \name String */
+/*@{*/
+
+/**
+ * Duplicate a string.  Implemented using malloc() and strcpy().
+ * Note that a NULL input returns NULL.
+ */
+char *
+_mesa_strdup( const char *s )
+{
+   if (s) {
+      size_t l = strlen(s);
+      char *s2 = malloc(l + 1);
+      if (s2)
+         strcpy(s2, s);
+      return s2;
+   }
+   else {
+      return NULL;
+   }
+}
+
+/** Wrapper around strtof() */
+float
+_mesa_strtof( const char *s, char **end )
+{
+#if defined(_GNU_SOURCE) && !defined(__CYGWIN__) && !defined(__FreeBSD__) && \
+   !defined(ANDROID) && !defined(__HAIKU__) && !defined(__UCLIBC__) && \
+   !defined(__NetBSD__)
+   static locale_t loc = NULL;
+   if (!loc) {
+      loc = newlocale(LC_CTYPE_MASK, "C", NULL);
+   }
+   return strtof_l(s, end, loc);
+#elif defined(_ISOC99_SOURCE) || (defined(_XOPEN_SOURCE) && _XOPEN_SOURCE >= 600)
+   return strtof(s, end);
+#else
+   return (float)strtod(s, end);
+#endif
+}
+
+/** Compute simple checksum/hash for a string */
+unsigned int
+_mesa_str_checksum(const char *str)
+{
+   /* This could probably be much better */
+   unsigned int sum, i;
+   const char *c;
+   sum = i = 1;
+   for (c = str; *c; c++, i++)
+      sum += *c * (i % 100);
+   return sum + i;
+}
+
+
+/*@}*/
+
+
+/** Needed due to #ifdef's, above. */
+int
+_mesa_vsnprintf(char *str, size_t size, const char *fmt, va_list args)
+{
+   return vsnprintf( str, size, fmt, args);
+}
+
+/** Wrapper around vsnprintf() */
+int
+_mesa_snprintf( char *str, size_t size, const char *fmt, ... )
+{
+   int r;
+   va_list args;
+   va_start( args, fmt );  
+   r = vsnprintf( str, size, fmt, args );
+   va_end( args );
+   return r;
+}
+
+
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/imports.h b/icd/intel/compiler/mesa-utils/src/mesa/main/imports.h
new file mode 100644
index 0000000..aa8ddfd
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/imports.h
@@ -0,0 +1,603 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+/**
+ * \file imports.h
+ * Standard C library function wrappers.
+ *
+ * This file provides wrappers for all the standard C library functions
+ * like malloc(), free(), printf(), getenv(), etc.
+ */
+
+
+#ifndef IMPORTS_H
+#define IMPORTS_H
+
+
+#include "compiler.h"
+#include "glheader.h"
+#include "errors.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/**********************************************************************/
+/** Memory macros */
+/*@{*/
+
+/** Allocate a structure of type \p T */
+#define MALLOC_STRUCT(T)   (struct T *) malloc(sizeof(struct T))
+/** Allocate and zero a structure of type \p T */
+#define CALLOC_STRUCT(T)   (struct T *) calloc(1, sizeof(struct T))
+
+/*@}*/
+
+
+/*
+ * For GL_ARB_vertex_buffer_object we need to treat vertex array pointers
+ * as offsets into buffer stores.  Since the vertex array pointer and
+ * buffer store pointer are both pointers and we need to add them, we use
+ * this macro.
+ * Both pointers/offsets are expressed in bytes.
+ */
+#define ADD_POINTERS(A, B)  ( (GLubyte *) (A) + (uintptr_t) (B) )
+
+
+/**
+ * Sometimes we treat GLfloats as GLints.  On x86 systems, moving a float
+ * as an int (thereby using integer registers instead of FP registers) is
+ * a performance win.  Typically, this can be done with ordinary casts.
+ * But with gcc's -fstrict-aliasing flag (which defaults to on in gcc 3.0)
+ * these casts generate warnings.
+ * The following union typedef is used to solve that.
+ */
+typedef union { GLfloat f; GLint i; GLuint u; } fi_type;
+
+
+
+/**********************************************************************
+ * Math macros
+ */
+
+#define MAX_GLUSHORT	0xffff
+#define MAX_GLUINT	0xffffffff
+
+/* Degrees to radians conversion: */
+#define DEG2RAD (M_PI/180.0)
+
+
+/**
+ * \name Work-arounds for platforms that lack C99 math functions
+ */
+/*@{*/
+#if (!defined(_XOPEN_SOURCE) || (_XOPEN_SOURCE < 600)) && !defined(_ISOC99_SOURCE) \
+   && (!defined(__STDC_VERSION__) || (__STDC_VERSION__ < 199901L)) \
+   && (!defined(_MSC_VER) || (_MSC_VER < 1400))
+#define acosf(f) ((float) acos(f))
+#define asinf(f) ((float) asin(f))
+#define atan2f(x,y) ((float) atan2(x,y))
+#define atanf(f) ((float) atan(f))
+#define ceilf(f) ((float) ceil(f))
+#define cosf(f) ((float) cos(f))
+#define coshf(f) ((float) cosh(f))
+#define expf(f) ((float) exp(f))
+#define exp2f(f) ((float) exp2(f))
+#define floorf(f) ((float) floor(f))
+#define logf(f) ((float) log(f))
+
+#ifdef ANDROID
+#define log2f(f) (logf(f) * (float) (1.0 / M_LN2))
+#else
+#define log2f(f) ((float) log2(f))
+#endif
+
+#define powf(x,y) ((float) pow(x,y))
+#define sinf(f) ((float) sin(f))
+#define sinhf(f) ((float) sinh(f))
+#define sqrtf(f) ((float) sqrt(f))
+#define tanf(f) ((float) tan(f))
+#define tanhf(f) ((float) tanh(f))
+#define acoshf(f) ((float) acosh(f))
+#define asinhf(f) ((float) asinh(f))
+#define atanhf(f) ((float) atanh(f))
+#endif
+
+#if defined(_MSC_VER)
+#if _MSC_VER < 1800  /* Not req'd on VS2013 and above */
+static inline float truncf(float x) { return x < 0.0f ? ceilf(x) : floorf(x); }
+static inline float exp2f(float x) { return powf(2.0f, x); }
+static inline float log2f(float x) { return logf(x) * 1.442695041f; }
+static inline float asinhf(float x) { return logf(x + sqrtf(x * x + 1.0f)); }
+static inline float acoshf(float x) { return logf(x + sqrtf(x * x - 1.0f)); }
+static inline float atanhf(float x) { return (logf(1.0f + x) - logf(1.0f - x)) / 2.0f; }
+static inline int isblank(int ch) { return ch == ' ' || ch == '\t'; }
+#define strtoll(p, e, b) _strtoi64(p, e, b)
+#endif /* _MSC_VER < 1800 */
+#define strcasecmp(s1, s2) _stricmp(s1, s2)
+#endif
+/*@}*/
+
+
+/*
+ * signbit() is a macro on Linux.  Not available on Windows.
+ */
+#ifndef signbit
+#define signbit(x) ((x) < 0.0f)
+#endif
+
+
+/** single-precision inverse square root */
+static inline float
+INV_SQRTF(float x)
+{
+   /* XXX we could try Quake's fast inverse square root function here */
+   return 1.0F / sqrtf(x);
+}
+
+
+/***
+ *** LOG2: Log base 2 of float
+ ***/
+static inline GLfloat LOG2(GLfloat x)
+{
+#ifdef USE_IEEE
+#if 0
+   /* This is pretty fast, but not accurate enough (only 2 fractional bits).
+    * Based on code from http://www.stereopsis.com/log2.html
+    */
+   const GLfloat y = x * x * x * x;
+   const GLuint ix = *((GLuint *) &y);
+   const GLuint exp = (ix >> 23) & 0xFF;
+   const GLint log2 = ((GLint) exp) - 127;
+   return (GLfloat) log2 * (1.0 / 4.0);  /* 4, because of x^4 above */
+#endif
+   /* Pretty fast, and accurate.
+    * Based on code from http://www.flipcode.com/totd/
+    */
+   fi_type num;
+   GLint log_2;
+   num.f = x;
+   log_2 = ((num.i >> 23) & 255) - 128;
+   num.i &= ~(255 << 23);
+   num.i += 127 << 23;
+   num.f = ((-1.0f/3) * num.f + 2) * num.f - 2.0f/3;
+   return num.f + log_2;
+#else
+   /*
+    * NOTE: log_base_2(x) = log(x) / log(2)
+    * NOTE: 1.442695 = 1/log(2).
+    */
+   return (GLfloat) (log(x) * 1.442695F);
+#endif
+}
+
+
+
+/***
+ *** IS_INF_OR_NAN: test if float is infinite or NaN
+ ***/
+#ifdef USE_IEEE
+static inline int IS_INF_OR_NAN( float x )
+{
+   fi_type tmp;
+   tmp.f = x;
+   return !(int)((unsigned int)((tmp.i & 0x7fffffff)-0x7f800000) >> 31);
+}
+#elif defined(isfinite)
+#define IS_INF_OR_NAN(x)        (!isfinite(x))
+#elif defined(finite)
+#define IS_INF_OR_NAN(x)        (!finite(x))
+#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+#define IS_INF_OR_NAN(x)        (!isfinite(x))
+#else
+#define IS_INF_OR_NAN(x)        (!finite(x))
+#endif
+
+
+/***
+ *** CEILF: ceiling of float
+ *** FLOORF: floor of float
+ *** FABSF: absolute value of float
+ *** LOGF: the natural logarithm (base e) of the value
+ *** EXPF: raise e to the value
+ *** LDEXPF: multiply value by an integral power of two
+ *** FREXPF: extract mantissa and exponent from value
+ ***/
+#if defined(__gnu_linux__)
+/* C99 functions */
+#define CEILF(x)   ceilf(x)
+#define FLOORF(x)  floorf(x)
+#define FABSF(x)   fabsf(x)
+#define LOGF(x)    logf(x)
+#define EXPF(x)    expf(x)
+#define LDEXPF(x,y)  ldexpf(x,y)
+#define FREXPF(x,y)  frexpf(x,y)
+#else
+#define CEILF(x)   ((GLfloat) ceil(x))
+#define FLOORF(x)  ((GLfloat) floor(x))
+#define FABSF(x)   ((GLfloat) fabs(x))
+#define LOGF(x)    ((GLfloat) log(x))
+#define EXPF(x)    ((GLfloat) exp(x))
+#define LDEXPF(x,y)  ((GLfloat) ldexp(x,y))
+#define FREXPF(x,y)  ((GLfloat) frexp(x,y))
+#endif
+
+
+/**
+ * Convert float to int by rounding to nearest integer, away from zero.
+ */
+static inline int IROUND(float f)
+{
+   return (int) ((f >= 0.0F) ? (f + 0.5F) : (f - 0.5F));
+}
+
+
+/**
+ * Convert float to int64 by rounding to nearest integer.
+ */
+static inline GLint64 IROUND64(float f)
+{
+   return (GLint64) ((f >= 0.0F) ? (f + 0.5F) : (f - 0.5F));
+}
+
+
+/**
+ * Convert positive float to int by rounding to nearest integer.
+ */
+static inline int IROUND_POS(float f)
+{
+   assert(f >= 0.0F);
+   return (int) (f + 0.5F);
+}
+
+
+/**
+ * Convert float to int using a fast method.  The rounding mode may vary.
+ * XXX We could use an x86-64/SSE2 version here.
+ */
+static inline int F_TO_I(float f)
+{
+#if defined(USE_X86_ASM) && defined(__GNUC__) && defined(__i386__)
+   int r;
+   __asm__ ("fistpl %0" : "=m" (r) : "t" (f) : "st");
+   return r;
+#elif defined(USE_X86_ASM) && defined(_MSC_VER)
+   int r;
+   _asm {
+	 fld f
+	 fistp r
+	}
+   return r;
+#else
+   return IROUND(f);
+#endif
+}
+
+
+/** Return (as an integer) floor of float */
+static inline int IFLOOR(float f)
+{
+#if defined(USE_X86_ASM) && defined(__GNUC__) && defined(__i386__)
+   /*
+    * IEEE floor for computers that round to nearest or even.
+    * 'f' must be between -4194304 and 4194303.
+    * This floor operation is done by "(iround(f + .5) + iround(f - .5)) >> 1",
+    * but uses some IEEE specific tricks for better speed.
+    * Contributed by Josh Vanderhoof
+    */
+   int ai, bi;
+   double af, bf;
+   af = (3 << 22) + 0.5 + (double)f;
+   bf = (3 << 22) + 0.5 - (double)f;
+   /* GCC generates an extra fstp/fld without this. */
+   __asm__ ("fstps %0" : "=m" (ai) : "t" (af) : "st");
+   __asm__ ("fstps %0" : "=m" (bi) : "t" (bf) : "st");
+   return (ai - bi) >> 1;
+#elif defined(USE_IEEE)
+   int ai, bi;
+   double af, bf;
+   fi_type u;
+   af = (3 << 22) + 0.5 + (double)f;
+   bf = (3 << 22) + 0.5 - (double)f;
+   u.f = (float) af;  ai = u.i;
+   u.f = (float) bf;  bi = u.i;
+   return (ai - bi) >> 1;
+#else
+   int i = IROUND(f);
+   return (i > f) ? i - 1 : i;
+#endif
+}
+
+
+/** Return (as an integer) ceiling of float */
+static inline int ICEIL(float f)
+{
+#if defined(USE_X86_ASM) && defined(__GNUC__) && defined(__i386__)
+   /*
+    * IEEE ceil for computers that round to nearest or even.
+    * 'f' must be between -4194304 and 4194303.
+    * This ceil operation is done by "(iround(f + .5) + iround(f - .5) + 1) >> 1",
+    * but uses some IEEE specific tricks for better speed.
+    * Contributed by Josh Vanderhoof
+    */
+   int ai, bi;
+   double af, bf;
+   af = (3 << 22) + 0.5 + (double)f;
+   bf = (3 << 22) + 0.5 - (double)f;
+   /* GCC generates an extra fstp/fld without this. */
+   __asm__ ("fstps %0" : "=m" (ai) : "t" (af) : "st");
+   __asm__ ("fstps %0" : "=m" (bi) : "t" (bf) : "st");
+   return (ai - bi + 1) >> 1;
+#elif defined(USE_IEEE)
+   int ai, bi;
+   double af, bf;
+   fi_type u;
+   af = (3 << 22) + 0.5 + (double)f;
+   bf = (3 << 22) + 0.5 - (double)f;
+   u.f = (float) af; ai = u.i;
+   u.f = (float) bf; bi = u.i;
+   return (ai - bi + 1) >> 1;
+#else
+   int i = IROUND(f);
+   return (i < f) ? i + 1 : i;
+#endif
+}
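+
+/* Illustrative values for IFLOOR()/ICEIL() (inputs well inside the
+ * documented safe range):
+ *
+ *    IFLOOR(-1.5f) == -2      ICEIL(-1.5f) == -1
+ *    IFLOOR( 2.0f) ==  2      ICEIL( 2.0f) ==  2
+ */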
+
+
+/**
+ * Is x a power of two?
+ */
+static inline int
+_mesa_is_pow_two(int x)
+{
+   return !(x & (x - 1));
+}
+
+/**
+ * Round the given integer up to the next power of two.
+ * If x is zero the result is undefined.
+ *
+ * The source for the fallback implementation is
+ * Sean Eron Anderson's webpage "Bit Twiddling Hacks"
+ * http://graphics.stanford.edu/~seander/bithacks.html
+ *
+ * When using the builtin, some extra work is needed for the
+ * case where the passed value is 1, to avoid hitting the
+ * undefined result of __builtin_clz(0).  That undefined result
+ * would differ depending on the optimization level used for
+ * the build.
+ */
+static inline int32_t
+_mesa_next_pow_two_32(uint32_t x)
+{
+#if defined(__GNUC__) && \
+	((__GNUC__ * 100 + __GNUC_MINOR__) >= 304) /* gcc 3.4 or later */
+	uint32_t y = (x != 1);
+	return (1 + y) << ((__builtin_clz(x - y) ^ 31) );
+#else
+	x--;
+	x |= x >> 1;
+	x |= x >> 2;
+	x |= x >> 4;
+	x |= x >> 8;
+	x |= x >> 16;
+	x++;
+	return x;
+#endif
+}
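+
+/* Illustrative values:
+ *
+ *    _mesa_next_pow_two_32(1)  ==  1
+ *    _mesa_next_pow_two_32(17) == 32
+ *    _mesa_next_pow_two_32(64) == 64   (already a power of two)
+ */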
+
+static inline int64_t
+_mesa_next_pow_two_64(uint64_t x)
+{
+#if defined(__GNUC__) && \
+	((__GNUC__ * 100 + __GNUC_MINOR__) >= 304) /* gcc 3.4 or later */
+	uint64_t y = (x != 1);
+	if (sizeof(x) == sizeof(long))
+		return (1 + y) << ((__builtin_clzl(x - y) ^ 63));
+	else
+		return (1 + y) << ((__builtin_clzll(x - y) ^ 63));
+#else
+	x--;
+	x |= x >> 1;
+	x |= x >> 2;
+	x |= x >> 4;
+	x |= x >> 8;
+	x |= x >> 16;
+	x |= x >> 32;
+	x++;
+	return x;
+#endif
+}
+
+
+/*
+ * Return the floor of the base-2 logarithm of a 32-bit integer.
+ */
+static inline GLuint
+_mesa_logbase2(GLuint n)
+{
+#if defined(__GNUC__) && \
+   ((__GNUC__ * 100 + __GNUC_MINOR__) >= 304) /* gcc 3.4 or later */
+   return (31 - __builtin_clz(n | 1));
+#else
+   GLuint pos = 0;
+   if (n >= 1<<16) { n >>= 16; pos += 16; }
+   if (n >= 1<< 8) { n >>=  8; pos +=  8; }
+   if (n >= 1<< 4) { n >>=  4; pos +=  4; }
+   if (n >= 1<< 2) { n >>=  2; pos +=  2; }
+   if (n >= 1<< 1) {           pos +=  1; }
+   return pos;
+#endif
+}
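+
+/* Illustrative values (floor of log2):
+ *
+ *    _mesa_logbase2(1)    ==  0
+ *    _mesa_logbase2(1023) ==  9
+ *    _mesa_logbase2(1024) == 10
+ */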
+
+
+/**
+ * Return 1 if this is a little endian machine, 0 if big endian.
+ */
+static inline GLboolean
+_mesa_little_endian(void)
+{
+   const GLuint ui = 1; /* intentionally not static */
+   return *((const GLubyte *) &ui);
+}
+
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <errno.h>
+#ifdef _WIN32
+#include <direct.h>   /* for _mkdir() */
+#endif
+
+static inline int
+_mesa_mkdir(const char *path)
+{
+#ifdef _WIN32
+   if (_mkdir(path) != 0)
+      return errno;
+   return 0;
+#else
+   if (mkdir(path, 0775) != 0)
+      return errno;
+   return 0;
+#endif
+}
+
+
+
+/**********************************************************************
+ * Functions
+ */
+
+extern void *
+_mesa_align_malloc( size_t bytes, unsigned long alignment );
+
+extern void *
+_mesa_align_calloc( size_t bytes, unsigned long alignment );
+
+extern void
+_mesa_align_free( void *ptr );
+
+extern void *
+_mesa_align_realloc(void *oldBuffer, size_t oldSize, size_t newSize,
+                    unsigned long alignment);
+
+extern void *
+_mesa_exec_malloc( GLuint size );
+
+extern void 
+_mesa_exec_free( void *addr );
+
+extern void *
+_mesa_realloc( void *oldBuffer, size_t oldSize, size_t newSize );
+
+
+#ifndef FFS_DEFINED
+#define FFS_DEFINED 1
+#ifdef __GNUC__
+#define ffs __builtin_ffs
+#define ffsll __builtin_ffsll
+#else
+extern int ffs(int i);
+extern int ffsll(long long int i);
+#endif /*__ GNUC__ */
+#endif /* FFS_DEFINED */
+
+
+#if defined(__GNUC__) && ((__GNUC__ * 100 + __GNUC_MINOR__) >= 304) /* gcc 3.4 or later */
+#define _mesa_bitcount(i) __builtin_popcount(i)
+#define _mesa_bitcount_64(i) __builtin_popcountll(i)
+#else
+extern unsigned int
+_mesa_bitcount(unsigned int n);
+extern unsigned int
+_mesa_bitcount_64(uint64_t n);
+#endif
+
+/**
+ * Find the last (most significant) bit set in a word.
+ *
+ * Essentially ffs() in the reverse direction.
+ */
+static inline unsigned int
+_mesa_fls(unsigned int n)
+{
+#if defined(__GNUC__) && ((__GNUC__ * 100 + __GNUC_MINOR__) >= 304)
+   return n == 0 ? 0 : 32 - __builtin_clz(n);
+#else
+   unsigned int v = 1;
+
+   if (n == 0)
+      return 0;
+
+   while (n >>= 1)
+       v++;
+
+   return v;
+#endif
+}
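+
+/* Illustrative values (positions count from 1, matching ffs()):
+ *
+ *    _mesa_fls(0)    == 0
+ *    _mesa_fls(1)    == 1
+ *    _mesa_fls(0x10) == 5
+ */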
+
+extern int
+_mesa_round_to_even(float val);
+
+extern GLhalfARB
+_mesa_float_to_half(float f);
+
+extern float
+_mesa_half_to_float(GLhalfARB h);
+
+
+extern void *
+_mesa_bsearch( const void *key, const void *base, size_t nmemb, size_t size, 
+               int (*compar)(const void *, const void *) );
+
+extern char *
+_mesa_getenv( const char *var );
+
+extern char *
+_mesa_strdup( const char *s );
+
+extern float
+_mesa_strtof( const char *s, char **end );
+
+extern unsigned int
+_mesa_str_checksum(const char *str);
+
+extern int
+_mesa_snprintf( char *str, size_t size, const char *fmt, ... ) PRINTFLIKE(3, 4);
+
+extern int
+_mesa_vsnprintf(char *str, size_t size, const char *fmt, va_list arg);
+
+
+#if defined(_MSC_VER) && !defined(snprintf)
+#define snprintf _snprintf
+#endif
+
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* IMPORTS_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/macros.h b/icd/intel/compiler/mesa-utils/src/mesa/main/macros.h
new file mode 100644
index 0000000..abb5ef8
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/macros.h
@@ -0,0 +1,826 @@
+/**
+ * \file macros.h
+ * A collection of useful macros.
+ */
+
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2006  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef MACROS_H
+#define MACROS_H
+
+#include "imports.h"
+
+
+/**
+ * \name Integer / float conversion for colors, normals, etc.
+ */
+/*@{*/
+
+/** Convert GLubyte in [0,255] to GLfloat in [0.0,1.0] */
+extern GLfloat _mesa_ubyte_to_float_color_tab[256];
+#define UBYTE_TO_FLOAT(u) _mesa_ubyte_to_float_color_tab[(unsigned int)(u)]
+
+/** Convert GLfloat in [0.0,1.0] to GLubyte in [0,255] */
+#define FLOAT_TO_UBYTE(X)   ((GLubyte) (GLint) ((X) * 255.0F))
+
+
+/** Convert GLbyte in [-128,127] to GLfloat in [-1.0,1.0] */
+#define BYTE_TO_FLOAT(B)    ((2.0F * (B) + 1.0F) * (1.0F/255.0F))
+
+/** Convert GLfloat in [-1.0,1.0] to GLbyte in [-128,127] */
+#define FLOAT_TO_BYTE(X)    ( (((GLint) (255.0F * (X))) - 1) / 2 )
+
+
+/** Convert GLbyte to GLfloat while preserving zero */
+#define BYTE_TO_FLOATZ(B)   ((B) == 0 ? 0.0F : BYTE_TO_FLOAT(B))
+
+
+/** Convert GLbyte in [-128,127] to GLfloat in [-1.0,1.0], texture/fb data */
+#define BYTE_TO_FLOAT_TEX(B)    ((B) == -128 ? -1.0F : (B) * (1.0F/127.0F))
+
+/** Convert GLfloat in [-1.0,1.0] to GLbyte in [-128,127], texture/fb data */
+#define FLOAT_TO_BYTE_TEX(X)    CLAMP( (GLint) (127.0F * (X)), -128, 127 )
+
+/** Convert GLushort in [0,65535] to GLfloat in [0.0,1.0] */
+#define USHORT_TO_FLOAT(S)  ((GLfloat) (S) * (1.0F / 65535.0F))
+
+/** Convert GLfloat in [0.0,1.0] to GLushort in [0, 65535] */
+#define FLOAT_TO_USHORT(X)   ((GLuint) ((X) * 65535.0F))
+
+
+/** Convert GLshort in [-32768,32767] to GLfloat in [-1.0,1.0] */
+#define SHORT_TO_FLOAT(S)   ((2.0F * (S) + 1.0F) * (1.0F/65535.0F))
+
+/** Convert GLfloat in [-1.0,1.0] to GLshort in [-32768,32767] */
+#define FLOAT_TO_SHORT(X)   ( (((GLint) (65535.0F * (X))) - 1) / 2 )
+
+/** Convert GLshort to GLfloat while preserving zero */
+#define SHORT_TO_FLOATZ(S)   ((S) == 0 ? 0.0F : SHORT_TO_FLOAT(S))
+
+
+/** Convert GLshort in [-32768,32767] to GLfloat in [-1.0,1.0], texture/fb data */
+#define SHORT_TO_FLOAT_TEX(S)    ((S) == -32768 ? -1.0F : (S) * (1.0F/32767.0F))
+
+/** Convert GLfloat in [-1.0,1.0] to GLshort in [-32768,32767], texture/fb data */
+#define FLOAT_TO_SHORT_TEX(X)    ( (GLint) (32767.0F * (X)) )
+
+
+/** Convert GLuint in [0,4294967295] to GLfloat in [0.0,1.0] */
+#define UINT_TO_FLOAT(U)    ((GLfloat) ((U) * (1.0F / 4294967295.0)))
+
+/** Convert GLfloat in [0.0,1.0] to GLuint in [0,4294967295] */
+#define FLOAT_TO_UINT(X)    ((GLuint) ((X) * 4294967295.0))
+
+
+/** Convert GLint in [-2147483648,2147483647] to GLfloat in [-1.0,1.0] */
+#define INT_TO_FLOAT(I)     ((GLfloat) ((2.0F * (I) + 1.0F) * (1.0F/4294967294.0)))
+
+/** Convert GLfloat in [-1.0,1.0] to GLint in [-2147483648,2147483647] */
+/* causes overflow:
+#define FLOAT_TO_INT(X)     ( (((GLint) (4294967294.0 * (X))) - 1) / 2 )
+*/
+/* a close approximation: */
+#define FLOAT_TO_INT(X)     ( (GLint) (2147483647.0 * (X)) )
+
+/** Convert GLfloat in [-1.0,1.0] to GLint64 in [-(1<<63),(1 << 63) -1] */
+#define FLOAT_TO_INT64(X)     ( (GLint64) (9223372036854775807.0 * (double)(X)) )
+
+
+/** Convert GLint in [-2147483648,2147483647] to GLfloat in [-1.0,1.0], texture/fb data */
+#define INT_TO_FLOAT_TEX(I)    ((I) == -2147483648 ? -1.0F : (I) * (1.0F/2147483647.0))
+
+/** Convert GLfloat in [-1.0,1.0] to GLint in [-2147483648,2147483647], texture/fb data */
+#define FLOAT_TO_INT_TEX(X)    ( (GLint) (2147483647.0 * (X)) )
+
+
+#define BYTE_TO_UBYTE(b)   ((GLubyte) ((b) < 0 ? 0 : (GLubyte) (b)))
+#define SHORT_TO_UBYTE(s)  ((GLubyte) ((s) < 0 ? 0 : (GLubyte) ((s) >> 7)))
+#define USHORT_TO_UBYTE(s) ((GLubyte) ((s) >> 8))
+#define INT_TO_UBYTE(i)    ((GLubyte) ((i) < 0 ? 0 : (GLubyte) ((i) >> 23)))
+#define UINT_TO_UBYTE(i)   ((GLubyte) ((i) >> 24))
+
+
+#define BYTE_TO_USHORT(b)  ((b) < 0 ? 0 : ((GLushort) (((b) * 65535) / 255)))
+#define UBYTE_TO_USHORT(b) (((GLushort) (b) << 8) | (GLushort) (b))
+#define SHORT_TO_USHORT(s) ((s) < 0 ? 0 : ((GLushort) (((s) * 65535 / 32767))))
+#define INT_TO_USHORT(i)   ((i) < 0 ? 0 : ((GLushort) ((i) >> 15)))
+#define UINT_TO_USHORT(i)  ((i) < 0 ? 0 : ((GLushort) ((i) >> 16)))
+#define UNCLAMPED_FLOAT_TO_USHORT(us, f)  \
+        us = ( (GLushort) F_TO_I( CLAMP((f), 0.0F, 1.0F) * 65535.0F) )
+#define CLAMPED_FLOAT_TO_USHORT(us, f)  \
+        us = ( (GLushort) F_TO_I( (f) * 65535.0F) )
+
+#define UNCLAMPED_FLOAT_TO_SHORT(s, f)  \
+        s = ( (GLshort) F_TO_I( CLAMP((f), -1.0F, 1.0F) * 32767.0F) )
+
+/***
+ *** UNCLAMPED_FLOAT_TO_UBYTE: clamp float to [0,1] and map to ubyte in [0,255]
+ *** CLAMPED_FLOAT_TO_UBYTE: map float known to be in [0,1] to ubyte in [0,255]
+ ***/
+#if defined(USE_IEEE) && !defined(DEBUG)
+/* This function/macro is sensitive to precision.  Test very carefully
+ * if you change it!
+ */
+#define UNCLAMPED_FLOAT_TO_UBYTE(UB, F)					\
+        do {								\
+           fi_type __tmp;						\
+           __tmp.f = (F);						\
+           if (__tmp.i < 0)						\
+              UB = (GLubyte) 0;						\
+           else if (__tmp.i >= IEEE_ONE)				\
+              UB = (GLubyte) 255;					\
+           else {							\
+              __tmp.f = __tmp.f * (255.0F/256.0F) + 32768.0F;		\
+              UB = (GLubyte) __tmp.i;					\
+           }								\
+        } while (0)
+#define CLAMPED_FLOAT_TO_UBYTE(UB, F)					\
+        do {								\
+           fi_type __tmp;						\
+           __tmp.f = (F) * (255.0F/256.0F) + 32768.0F;			\
+           UB = (GLubyte) __tmp.i;					\
+        } while (0)
+#else
+#define UNCLAMPED_FLOAT_TO_UBYTE(ub, f) \
+	ub = ((GLubyte) F_TO_I(CLAMP((f), 0.0F, 1.0F) * 255.0F))
+#define CLAMPED_FLOAT_TO_UBYTE(ub, f) \
+	ub = ((GLubyte) F_TO_I((f) * 255.0F))
+#endif
+
+static inline GLfloat INT_AS_FLT(GLint i)
+{
+   fi_type tmp;
+   tmp.i = i;
+   return tmp.f;
+}
+
+static inline GLfloat UINT_AS_FLT(GLuint u)
+{
+   fi_type tmp;
+   tmp.u = u;
+   return tmp.f;
+}
+
+/**
+ * Convert a floating point value to an unsigned fixed point value.
+ *
+ * \param frac_bits   The number of bits used to store the fractional part.
+ */
+static INLINE uint32_t
+U_FIXED(float value, uint32_t frac_bits)
+{
+   value *= (1 << frac_bits);
+   return value < 0.0f ? 0 : (uint32_t) value;
+}
+
+/**
+ * Convert a floating point value to an signed fixed point value.
+ *
+ * \param frac_bits   The number of bits used to store the fractional part.
+ */
+static INLINE int32_t
+S_FIXED(float value, uint32_t frac_bits)
+{
+   return (int32_t) (value * (1 << frac_bits));
+}
+/*@}*/
+
+
+/** Stepping a GLfloat pointer by a byte stride */
+#define STRIDE_F(p, i)  (p = (GLfloat *)((GLubyte *)p + i))
+/** Stepping a GLuint pointer by a byte stride */
+#define STRIDE_UI(p, i)  (p = (GLuint *)((GLubyte *)p + i))
+/** Stepping a GLubyte[4] pointer by a byte stride */
+#define STRIDE_4UB(p, i)  (p = (GLubyte (*)[4])((GLubyte *)p + i))
+/** Stepping a GLfloat[4] pointer by a byte stride */
+#define STRIDE_4F(p, i)  (p = (GLfloat (*)[4])((GLubyte *)p + i))
+/** Stepping a \p t pointer by a byte stride */
+#define STRIDE_T(p, t, i)  (p = (t)((GLubyte *)p + i))
+
+
+/**********************************************************************/
+/** \name 4-element vector operations */
+/*@{*/
+
+/** Zero */
+#define ZERO_4V( DST )  (DST)[0] = (DST)[1] = (DST)[2] = (DST)[3] = 0
+
+/** Test for equality */
+#define TEST_EQ_4V(a,b)  ((a)[0] == (b)[0] &&   \
+              (a)[1] == (b)[1] &&   \
+              (a)[2] == (b)[2] &&   \
+              (a)[3] == (b)[3])
+
+/** Test for equality (unsigned bytes) */
+static inline GLboolean
+TEST_EQ_4UBV(const GLubyte a[4], const GLubyte b[4])
+{
+#if defined(__i386__)
+   return *((const GLuint *) a) == *((const GLuint *) b);
+#else
+   return TEST_EQ_4V(a, b);
+#endif
+}
+
+
+/** Copy a 4-element vector */
+#define COPY_4V( DST, SRC )         \
+do {                                \
+   (DST)[0] = (SRC)[0];             \
+   (DST)[1] = (SRC)[1];             \
+   (DST)[2] = (SRC)[2];             \
+   (DST)[3] = (SRC)[3];             \
+} while (0)
+
+/** Copy a 4-element unsigned byte vector */
+static inline void
+COPY_4UBV(GLubyte dst[4], const GLubyte src[4])
+{
+#if defined(__i386__)
+   *((GLuint *) dst) = *((GLuint *) src);
+#else
+   /* The GLuint cast might fail if DST or SRC are not dword-aligned (RISC) */
+   COPY_4V(dst, src);
+#endif
+}
+
+/** Copy a 4-element float vector */
+static inline void
+COPY_4FV(GLfloat dst[4], const GLfloat src[4])
+{
+   /* memcpy seems to be most efficient */
+   memcpy(dst, src, sizeof(GLfloat) * 4);
+}
+
+/** Copy \p SZ elements into a 4-element vector */
+#define COPY_SZ_4V(DST, SZ, SRC)  \
+do {                              \
+   switch (SZ) {                  \
+   case 4: (DST)[3] = (SRC)[3];   \
+   case 3: (DST)[2] = (SRC)[2];   \
+   case 2: (DST)[1] = (SRC)[1];   \
+   case 1: (DST)[0] = (SRC)[0];   \
+   }                              \
+} while(0)
+
+/** Copy \p SZ elements into a homogeneous (4-element) vector, giving
+ * default values to the remaining components */
+#define COPY_CLEAN_4V(DST, SZ, SRC)  \
+do {                                 \
+      ASSIGN_4V( DST, 0, 0, 0, 1 );  \
+      COPY_SZ_4V( DST, SZ, SRC );    \
+} while (0)
+
+/** Subtraction */
+#define SUB_4V( DST, SRCA, SRCB )           \
+do {                                        \
+      (DST)[0] = (SRCA)[0] - (SRCB)[0];     \
+      (DST)[1] = (SRCA)[1] - (SRCB)[1];     \
+      (DST)[2] = (SRCA)[2] - (SRCB)[2];     \
+      (DST)[3] = (SRCA)[3] - (SRCB)[3];     \
+} while (0)
+
+/** Addition */
+#define ADD_4V( DST, SRCA, SRCB )           \
+do {                                        \
+      (DST)[0] = (SRCA)[0] + (SRCB)[0];     \
+      (DST)[1] = (SRCA)[1] + (SRCB)[1];     \
+      (DST)[2] = (SRCA)[2] + (SRCB)[2];     \
+      (DST)[3] = (SRCA)[3] + (SRCB)[3];     \
+} while (0)
+
+/** Element-wise multiplication */
+#define SCALE_4V( DST, SRCA, SRCB )         \
+do {                                        \
+      (DST)[0] = (SRCA)[0] * (SRCB)[0];     \
+      (DST)[1] = (SRCA)[1] * (SRCB)[1];     \
+      (DST)[2] = (SRCA)[2] * (SRCB)[2];     \
+      (DST)[3] = (SRCA)[3] * (SRCB)[3];     \
+} while (0)
+
+/** In-place addition */
+#define ACC_4V( DST, SRC )          \
+do {                                \
+      (DST)[0] += (SRC)[0];         \
+      (DST)[1] += (SRC)[1];         \
+      (DST)[2] += (SRC)[2];         \
+      (DST)[3] += (SRC)[3];         \
+} while (0)
+
+/** Element-wise multiplication and addition */
+#define ACC_SCALE_4V( DST, SRCA, SRCB )     \
+do {                                        \
+      (DST)[0] += (SRCA)[0] * (SRCB)[0];    \
+      (DST)[1] += (SRCA)[1] * (SRCB)[1];    \
+      (DST)[2] += (SRCA)[2] * (SRCB)[2];    \
+      (DST)[3] += (SRCA)[3] * (SRCB)[3];    \
+} while (0)
+
+/** In-place scalar multiplication and addition */
+#define ACC_SCALE_SCALAR_4V( DST, S, SRCB ) \
+do {                                        \
+      (DST)[0] += S * (SRCB)[0];            \
+      (DST)[1] += S * (SRCB)[1];            \
+      (DST)[2] += S * (SRCB)[2];            \
+      (DST)[3] += S * (SRCB)[3];            \
+} while (0)
+
+/** Scalar multiplication */
+#define SCALE_SCALAR_4V( DST, S, SRCB ) \
+do {                                    \
+      (DST)[0] = S * (SRCB)[0];         \
+      (DST)[1] = S * (SRCB)[1];         \
+      (DST)[2] = S * (SRCB)[2];         \
+      (DST)[3] = S * (SRCB)[3];         \
+} while (0)
+
+/** In-place scalar multiplication */
+#define SELF_SCALE_SCALAR_4V( DST, S ) \
+do {                                   \
+      (DST)[0] *= S;                   \
+      (DST)[1] *= S;                   \
+      (DST)[2] *= S;                   \
+      (DST)[3] *= S;                   \
+} while (0)
+
+/** Assignment */
+#define ASSIGN_4V( V, V0, V1, V2, V3 )  \
+do {                                    \
+    V[0] = V0;                          \
+    V[1] = V1;                          \
+    V[2] = V2;                          \
+    V[3] = V3;                          \
+} while(0)
+
+/*@}*/
+
+
+/**********************************************************************/
+/** \name 3-element vector operations */
+/*@{*/
+
+/** Zero */
+#define ZERO_3V( DST )  (DST)[0] = (DST)[1] = (DST)[2] = 0
+
+/** Test for equality */
+#define TEST_EQ_3V(a,b)  \
+   ((a)[0] == (b)[0] &&  \
+    (a)[1] == (b)[1] &&  \
+    (a)[2] == (b)[2])
+
+/** Copy a 3-element vector */
+#define COPY_3V( DST, SRC )         \
+do {                                \
+   (DST)[0] = (SRC)[0];             \
+   (DST)[1] = (SRC)[1];             \
+   (DST)[2] = (SRC)[2];             \
+} while (0)
+
+/** Copy a 3-element vector with cast */
+#define COPY_3V_CAST( DST, SRC, CAST )  \
+do {                                    \
+   (DST)[0] = (CAST)(SRC)[0];           \
+   (DST)[1] = (CAST)(SRC)[1];           \
+   (DST)[2] = (CAST)(SRC)[2];           \
+} while (0)
+
+/** Copy a 3-element float vector */
+#define COPY_3FV( DST, SRC )        \
+do {                                \
+   const GLfloat *_tmp = (SRC);     \
+   (DST)[0] = _tmp[0];              \
+   (DST)[1] = _tmp[1];              \
+   (DST)[2] = _tmp[2];              \
+} while (0)
+
+/** Subtraction */
+#define SUB_3V( DST, SRCA, SRCB )        \
+do {                                     \
+      (DST)[0] = (SRCA)[0] - (SRCB)[0];  \
+      (DST)[1] = (SRCA)[1] - (SRCB)[1];  \
+      (DST)[2] = (SRCA)[2] - (SRCB)[2];  \
+} while (0)
+
+/** Addition */
+#define ADD_3V( DST, SRCA, SRCB )       \
+do {                                    \
+      (DST)[0] = (SRCA)[0] + (SRCB)[0]; \
+      (DST)[1] = (SRCA)[1] + (SRCB)[1]; \
+      (DST)[2] = (SRCA)[2] + (SRCB)[2]; \
+} while (0)
+
+/** Element-wise multiplication */
+#define SCALE_3V( DST, SRCA, SRCB )     \
+do {                                    \
+      (DST)[0] = (SRCA)[0] * (SRCB)[0]; \
+      (DST)[1] = (SRCA)[1] * (SRCB)[1]; \
+      (DST)[2] = (SRCA)[2] * (SRCB)[2]; \
+} while (0)
+
+/** In-place element-wise multiplication */
+#define SELF_SCALE_3V( DST, SRC )   \
+do {                                \
+      (DST)[0] *= (SRC)[0];         \
+      (DST)[1] *= (SRC)[1];         \
+      (DST)[2] *= (SRC)[2];         \
+} while (0)
+
+/** In-place addition */
+#define ACC_3V( DST, SRC )          \
+do {                                \
+      (DST)[0] += (SRC)[0];         \
+      (DST)[1] += (SRC)[1];         \
+      (DST)[2] += (SRC)[2];         \
+} while (0)
+
+/** Element-wise multiplication and addition */
+#define ACC_SCALE_3V( DST, SRCA, SRCB )     \
+do {                                        \
+      (DST)[0] += (SRCA)[0] * (SRCB)[0];    \
+      (DST)[1] += (SRCA)[1] * (SRCB)[1];    \
+      (DST)[2] += (SRCA)[2] * (SRCB)[2];    \
+} while (0)
+
+/** Scalar multiplication */
+#define SCALE_SCALAR_3V( DST, S, SRCB ) \
+do {                                    \
+      (DST)[0] = S * (SRCB)[0];         \
+      (DST)[1] = S * (SRCB)[1];         \
+      (DST)[2] = S * (SRCB)[2];         \
+} while (0)
+
+/** In-place scalar multiplication and addition */
+#define ACC_SCALE_SCALAR_3V( DST, S, SRCB ) \
+do {                                        \
+      (DST)[0] += S * (SRCB)[0];            \
+      (DST)[1] += S * (SRCB)[1];            \
+      (DST)[2] += S * (SRCB)[2];            \
+} while (0)
+
+/** In-place scalar multiplication */
+#define SELF_SCALE_SCALAR_3V( DST, S ) \
+do {                                   \
+      (DST)[0] *= S;                   \
+      (DST)[1] *= S;                   \
+      (DST)[2] *= S;                   \
+} while (0)
+
+/** In-place scalar addition */
+#define ACC_SCALAR_3V( DST, S )     \
+do {                                \
+      (DST)[0] += S;                \
+      (DST)[1] += S;                \
+      (DST)[2] += S;                \
+} while (0)
+
+/** Assignment */
+#define ASSIGN_3V( V, V0, V1, V2 )  \
+do {                                \
+    V[0] = V0;                      \
+    V[1] = V1;                      \
+    V[2] = V2;                      \
+} while(0)
+
+/*@}*/
+
+
+/**********************************************************************/
+/** \name 2-element vector operations */
+/*@{*/
+
+/** Zero */
+#define ZERO_2V( DST )  (DST)[0] = (DST)[1] = 0
+
+/** Copy a 2-element vector */
+#define COPY_2V( DST, SRC )         \
+do {                        \
+   (DST)[0] = (SRC)[0];             \
+   (DST)[1] = (SRC)[1];             \
+} while (0)
+
+/** Copy a 2-element vector with cast */
+#define COPY_2V_CAST( DST, SRC, CAST )      \
+do {                        \
+   (DST)[0] = (CAST)(SRC)[0];           \
+   (DST)[1] = (CAST)(SRC)[1];           \
+} while (0)
+
+/** Copy a 2-element float vector */
+#define COPY_2FV( DST, SRC )            \
+do {                        \
+   const GLfloat *_tmp = (SRC);         \
+   (DST)[0] = _tmp[0];              \
+   (DST)[1] = _tmp[1];              \
+} while (0)
+
+/** Subtraction */
+#define SUB_2V( DST, SRCA, SRCB )       \
+do {                        \
+      (DST)[0] = (SRCA)[0] - (SRCB)[0];     \
+      (DST)[1] = (SRCA)[1] - (SRCB)[1];     \
+} while (0)
+
+/** Addition */
+#define ADD_2V( DST, SRCA, SRCB )       \
+do {                        \
+      (DST)[0] = (SRCA)[0] + (SRCB)[0];     \
+      (DST)[1] = (SRCA)[1] + (SRCB)[1];     \
+} while (0)
+
+/** Element-wise multiplication */
+#define SCALE_2V( DST, SRCA, SRCB )     \
+do {                        \
+      (DST)[0] = (SRCA)[0] * (SRCB)[0];     \
+      (DST)[1] = (SRCA)[1] * (SRCB)[1];     \
+} while (0)
+
+/** In-place addition */
+#define ACC_2V( DST, SRC )          \
+do {                        \
+      (DST)[0] += (SRC)[0];         \
+      (DST)[1] += (SRC)[1];         \
+} while (0)
+
+/** Element-wise multiplication and addition */
+#define ACC_SCALE_2V( DST, SRCA, SRCB )     \
+do {                        \
+      (DST)[0] += (SRCA)[0] * (SRCB)[0];    \
+      (DST)[1] += (SRCA)[1] * (SRCB)[1];    \
+} while (0)
+
+/** Scalar multiplication */
+#define SCALE_SCALAR_2V( DST, S, SRCB )     \
+do {                        \
+      (DST)[0] = S * (SRCB)[0];         \
+      (DST)[1] = S * (SRCB)[1];         \
+} while (0)
+
+/** In-place scalar multiplication and addition */
+#define ACC_SCALE_SCALAR_2V( DST, S, SRCB ) \
+do {                        \
+      (DST)[0] += S * (SRCB)[0];        \
+      (DST)[1] += S * (SRCB)[1];        \
+} while (0)
+
+/** In-place scalar multiplication */
+#define SELF_SCALE_SCALAR_2V( DST, S )      \
+do {                        \
+      (DST)[0] *= S;                \
+      (DST)[1] *= S;                \
+} while (0)
+
+/** In-place scalar addition */
+#define ACC_SCALAR_2V( DST, S )         \
+do {                        \
+      (DST)[0] += S;                \
+      (DST)[1] += S;                \
+} while (0)
+
+/** Assign scalars to a 2-element vector */
+#define ASSIGN_2V( V, V0, V1 )	\
+do {				\
+    V[0] = V0;			\
+    V[1] = V1;			\
+} while(0)
+
+/*@}*/
+
+/** Copy \p sz elements into a homogeneous (4-element) vector, giving
+ * default values to the remaining components.
+ * The default values are chosen based on \p type.
+ */
+static inline void
+COPY_CLEAN_4V_TYPE_AS_FLOAT(GLfloat dst[4], int sz, const GLfloat src[4],
+                            GLenum type)
+{
+   switch (type) {
+   case GL_FLOAT:
+      ASSIGN_4V(dst, 0, 0, 0, 1);
+      break;
+   case GL_INT:
+      ASSIGN_4V(dst, INT_AS_FLT(0), INT_AS_FLT(0),
+                     INT_AS_FLT(0), INT_AS_FLT(1));
+      break;
+   case GL_UNSIGNED_INT:
+      ASSIGN_4V(dst, UINT_AS_FLT(0), UINT_AS_FLT(0),
+                     UINT_AS_FLT(0), UINT_AS_FLT(1));
+      break;
+   default:
+      ASSIGN_4V(dst, 0.0f, 0.0f, 0.0f, 1.0f); /* silence warnings */
+      ASSERT(!"Unexpected type in COPY_CLEAN_4V_TYPE_AS_FLOAT macro");
+   }
+   COPY_SZ_4V(dst, sz, src);
+}
+
+/** \name Linear interpolation functions */
+/*@{*/
+
+static inline GLfloat
+LINTERP(GLfloat t, GLfloat out, GLfloat in)
+{
+   return out + t * (in - out);
+}
+
+static inline void
+INTERP_3F(GLfloat t, GLfloat dst[3], const GLfloat out[3], const GLfloat in[3])
+{
+   dst[0] = LINTERP( t, out[0], in[0] );
+   dst[1] = LINTERP( t, out[1], in[1] );
+   dst[2] = LINTERP( t, out[2], in[2] );
+}
+
+static inline void
+INTERP_4F(GLfloat t, GLfloat dst[4], const GLfloat out[4], const GLfloat in[4])
+{
+   dst[0] = LINTERP( t, out[0], in[0] );
+   dst[1] = LINTERP( t, out[1], in[1] );
+   dst[2] = LINTERP( t, out[2], in[2] );
+   dst[3] = LINTERP( t, out[3], in[3] );
+}
+
+/*@}*/
+
+
+
+/** Clamp X to [MIN,MAX] */
+#define CLAMP( X, MIN, MAX )  ( (X)<(MIN) ? (MIN) : ((X)>(MAX) ? (MAX) : (X)) )
+
+/** Minimum of two values: */
+#define MIN2( A, B )   ( (A)<(B) ? (A) : (B) )
+
+/** Maximum of two values: */
+#define MAX2( A, B )   ( (A)>(B) ? (A) : (B) )
+
+/** Minimum and maximum of three values: */
+#define MIN3( A, B, C ) ((A) < (B) ? MIN2(A, C) : MIN2(B, C))
+#define MAX3( A, B, C ) ((A) > (B) ? MAX2(A, C) : MAX2(B, C))
+
+static inline unsigned
+minify(unsigned value, unsigned levels)
+{
+    return MAX2(1, value >> levels);
+}
+
+/**
+ * Return true if the given value is a power of two.
+ *
+ * Note that this considers 0 a power of two.
+ */
+static inline bool
+is_power_of_two(unsigned value)
+{
+   return (value & (value - 1)) == 0;
+}
+
+/**
+ * Align a value up to an alignment value
+ *
+ * If \c value is not already aligned to the requested alignment value, it
+ * will be rounded up.
+ *
+ * \param value  Value to be rounded
+ * \param alignment  Alignment value to be used.  This must be a power of two.
+ *
+ * \sa ROUND_DOWN_TO()
+ */
+#define ALIGN(value, alignment)  (((value) + (alignment) - 1) & ~((alignment) - 1))
+
+/**
+ * Align a value down to an alignment value
+ *
+ * If \c value is not already aligned to the requested alignment value, it
+ * will be rounded down.
+ *
+ * \param value  Value to be rounded
+ * \param alignment  Alignment value to be used.  This must be a power of two.
+ *
+ * \sa ALIGN()
+ */
+#define ROUND_DOWN_TO(value, alignment) ((value) & ~((alignment) - 1))
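+
+/* Illustrative values (alignment must be a power of two):
+ *
+ *    ALIGN(13, 8)         == 16
+ *    ALIGN(16, 8)         == 16   (already aligned)
+ *    ROUND_DOWN_TO(13, 8) ==  8
+ */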
+
+
+/** Cross product of two 3-element vectors */
+static inline void
+CROSS3(GLfloat n[3], const GLfloat u[3], const GLfloat v[3])
+{
+   n[0] = u[1] * v[2] - u[2] * v[1];
+   n[1] = u[2] * v[0] - u[0] * v[2];
+   n[2] = u[0] * v[1] - u[1] * v[0];
+}
+
+
+/** Dot product of two 2-element vectors */
+static inline GLfloat
+DOT2(const GLfloat a[2], const GLfloat b[2])
+{
+   return a[0] * b[0] + a[1] * b[1];
+}
+
+static inline GLfloat
+DOT3(const GLfloat a[3], const GLfloat b[3])
+{
+   return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
+}
+
+static inline GLfloat
+DOT4(const GLfloat a[4], const GLfloat b[4])
+{
+   return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
+}
+
+
+static inline GLfloat
+LEN_SQUARED_3FV(const GLfloat v[3])
+{
+   return DOT3(v, v);
+}
+
+static inline GLfloat
+LEN_SQUARED_2FV(const GLfloat v[2])
+{
+   return DOT2(v, v);
+}
+
+
+static inline GLfloat
+LEN_3FV(const GLfloat v[3])
+{
+   return sqrtf(LEN_SQUARED_3FV(v));
+}
+
+static inline GLfloat
+LEN_2FV(const GLfloat v[2])
+{
+   return sqrtf(LEN_SQUARED_2FV(v));
+}
+
+
+/* Normalize a 3-element vector to unit length. */
+static inline void
+NORMALIZE_3FV(GLfloat v[3])
+{
+   GLfloat len = (GLfloat) LEN_SQUARED_3FV(v);
+   if (len) {
+      len = INV_SQRTF(len);
+      v[0] *= len;
+      v[1] *= len;
+      v[2] *= len;
+   }
+}
+
+
+/** Is float value negative? */
+static inline GLboolean
+IS_NEGATIVE(float x)
+{
+   return signbit(x) != 0;
+}
+
+/** Test two floats have opposite signs */
+static inline GLboolean
+DIFFERENT_SIGNS(GLfloat x, GLfloat y)
+{
+   return signbit(x) != signbit(y);
+}
+
+
+/** Compute ceiling of integer quotient of A divided by B. */
+#define CEILING( A, B )  ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 )
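+
+/* Example (illustrative): CEILING(7, 4) == 2 and CEILING(8, 4) == 2, i.e.
+ * the number of 4-element groups needed to hold 7 or 8 elements.  Each
+ * argument is evaluated more than once, so avoid side effects.
+ */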
+
+
+/** casts to silence warnings with some compilers */
+#define ENUM_TO_INT(E)     ((GLint)(E))
+#define ENUM_TO_FLOAT(E)   ((GLfloat)(GLint)(E))
+#define ENUM_TO_DOUBLE(E)  ((GLdouble)(GLint)(E))
+#define ENUM_TO_BOOLEAN(E) ((E) ? GL_TRUE : GL_FALSE)
+
+/* Compute the size of an array */
+//#define ARRAY_SIZE(x) (sizeof(x) / sizeof(x[0]))
+
+/* Stringify */
+#define STRINGIFY(x) #x
+
+#endif
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/mtypes.h b/icd/intel/compiler/mesa-utils/src/mesa/main/mtypes.h
new file mode 100644
index 0000000..fb3a12e
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/mtypes.h
@@ -0,0 +1,2389 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ * Copyright (C) 2009  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file mtypes.h
+ * Main Mesa data structures.
+ *
+ * Please try to mark derived values with a leading underscore ('_').
+ */
+
+#ifndef MTYPES_H
+#define MTYPES_H
+
+
+#include <stdint.h>             /* uint32_t */
+#include <stdbool.h>
+
+#include "main/glheader.h"
+#include "main/config.h"
+#include "glapi/glapi.h"
+#include "math/m_matrix.h"	/* GLmatrix */
+#include "main/simple_list.h"	/* struct simple_node */
+#include "main/formats.h"       /* MESA_FORMAT_COUNT */
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/**
+ * \name 64-bit extension of GLbitfield.
+ */
+/*@{*/
+typedef GLuint64 GLbitfield64;
+
+/** Set a single bit */
+#define BITFIELD64_BIT(b)      ((GLbitfield64)1 << (b))
+/** Set all bits below bit b, i.e. bits 0 .. b-1 */
+#define BITFIELD64_MASK(b)      \
+   ((b) == 64 ? (~(GLbitfield64)0) : BITFIELD64_BIT(b) - 1)
+/** Set count bits starting from bit b  */
+#define BITFIELD64_RANGE(b, count) \
+   (BITFIELD64_MASK((b) + (count)) & ~BITFIELD64_MASK(b))
+/*@}*/
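+
+/* Worked example (added for illustration): BITFIELD64_BIT(3) == 0x8,
+ * BITFIELD64_MASK(3) == 0x7 (bits 0..2) and BITFIELD64_RANGE(2, 3) == 0x1c
+ * (bits 2..4).  The b == 64 special case in BITFIELD64_MASK avoids the
+ * undefined behavior of shifting a 64-bit value by 64.
+ */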
+
+
+/**
+ * \name Some forward type declarations
+ */
+/*@{*/
+struct _mesa_HashTable;
+struct _mesa_threadpool;
+struct _mesa_threadpool_task;
+struct gl_attrib_node;
+struct gl_list_extensions;
+struct gl_meta_state;
+struct gl_program_cache;
+struct gl_texture_object;
+struct gl_debug_state;
+struct gl_context;
+struct st_context;
+struct gl_uniform_storage;
+struct prog_instruction;
+struct gl_program_parameter_list;
+struct set;
+struct set_entry;
+struct vbo_context;
+/*@}*/
+
+
+/** Extra draw modes beyond GL_POINTS, GL_TRIANGLE_FAN, etc */
+#define PRIM_MAX                 GL_TRIANGLE_STRIP_ADJACENCY
+#define PRIM_OUTSIDE_BEGIN_END   (PRIM_MAX + 1)
+#define PRIM_UNKNOWN             (PRIM_MAX + 2)
+
+
+
+/**
+ * Indexes for vertex program attributes.
+ * GL_NV_vertex_program aliases generic attributes over the conventional
+ * attributes.  In GL_ARB_vertex_program the aliasing is optional.
+ * In GL_ARB_vertex_shader / OpenGL 2.0 the aliasing is disallowed (the
+ * generic attributes are distinct/separate).
+ */
+typedef enum
+{
+   VERT_ATTRIB_POS = 0,
+   VERT_ATTRIB_WEIGHT = 1,
+   VERT_ATTRIB_NORMAL = 2,
+   VERT_ATTRIB_COLOR0 = 3,
+   VERT_ATTRIB_COLOR1 = 4,
+   VERT_ATTRIB_FOG = 5,
+   VERT_ATTRIB_COLOR_INDEX = 6,
+   VERT_ATTRIB_EDGEFLAG = 7,
+   VERT_ATTRIB_TEX0 = 8,
+   VERT_ATTRIB_TEX1 = 9,
+   VERT_ATTRIB_TEX2 = 10,
+   VERT_ATTRIB_TEX3 = 11,
+   VERT_ATTRIB_TEX4 = 12,
+   VERT_ATTRIB_TEX5 = 13,
+   VERT_ATTRIB_TEX6 = 14,
+   VERT_ATTRIB_TEX7 = 15,
+   VERT_ATTRIB_POINT_SIZE = 16,
+   VERT_ATTRIB_GENERIC0 = 17,
+   VERT_ATTRIB_GENERIC1 = 18,
+   VERT_ATTRIB_GENERIC2 = 19,
+   VERT_ATTRIB_GENERIC3 = 20,
+   VERT_ATTRIB_GENERIC4 = 21,
+   VERT_ATTRIB_GENERIC5 = 22,
+   VERT_ATTRIB_GENERIC6 = 23,
+   VERT_ATTRIB_GENERIC7 = 24,
+   VERT_ATTRIB_GENERIC8 = 25,
+   VERT_ATTRIB_GENERIC9 = 26,
+   VERT_ATTRIB_GENERIC10 = 27,
+   VERT_ATTRIB_GENERIC11 = 28,
+   VERT_ATTRIB_GENERIC12 = 29,
+   VERT_ATTRIB_GENERIC13 = 30,
+   VERT_ATTRIB_GENERIC14 = 31,
+   VERT_ATTRIB_GENERIC15 = 32,
+   VERT_ATTRIB_MAX = 33
+} gl_vert_attrib;
+
+/**
+ * Symbolic constants to help iterate over
+ * specific blocks of vertex attributes.
+ *
+ * VERT_ATTRIB_FF
+ *   includes all fixed-function attributes as well as
+ *   the aliased GL_NV_vertex_program shader attributes.
+ * VERT_ATTRIB_TEX
+ *   includes the classic texture coordinate attributes
+ *   and is a subset of VERT_ATTRIB_FF.
+ * VERT_ATTRIB_GENERIC
+ *   includes the OpenGL 2.0+ GLSL generic shader attributes.
+ *   These alias the generic GL_ARB_vertex_shader attributes.
+ */
+#define VERT_ATTRIB_FF(i)           (VERT_ATTRIB_POS + (i))
+#define VERT_ATTRIB_FF_MAX          VERT_ATTRIB_GENERIC0
+
+#define VERT_ATTRIB_TEX(i)          (VERT_ATTRIB_TEX0 + (i))
+#define VERT_ATTRIB_TEX_MAX         MAX_TEXTURE_COORD_UNITS
+
+#define VERT_ATTRIB_GENERIC(i)      (VERT_ATTRIB_GENERIC0 + (i))
+#define VERT_ATTRIB_GENERIC_MAX     MAX_VERTEX_GENERIC_ATTRIBS
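+
+/* Hypothetical iteration sketch (not part of the original header): these
+ * helpers let code walk one block of attributes, e.g.
+ *
+ *    for (unsigned i = 0; i < VERT_ATTRIB_GENERIC_MAX; i++)
+ *       process_attrib(VERT_ATTRIB_GENERIC(i));
+ *
+ * where process_attrib() is an assumed placeholder, not a real Mesa
+ * function.
+ */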
+
+/**
+ * Bitflags for vertex attributes.
+ * These are used in bitfields in many places.
+ */
+/*@{*/
+#define VERT_BIT_POS             BITFIELD64_BIT(VERT_ATTRIB_POS)
+#define VERT_BIT_WEIGHT          BITFIELD64_BIT(VERT_ATTRIB_WEIGHT)
+#define VERT_BIT_NORMAL          BITFIELD64_BIT(VERT_ATTRIB_NORMAL)
+#define VERT_BIT_COLOR0          BITFIELD64_BIT(VERT_ATTRIB_COLOR0)
+#define VERT_BIT_COLOR1          BITFIELD64_BIT(VERT_ATTRIB_COLOR1)
+#define VERT_BIT_FOG             BITFIELD64_BIT(VERT_ATTRIB_FOG)
+#define VERT_BIT_COLOR_INDEX     BITFIELD64_BIT(VERT_ATTRIB_COLOR_INDEX)
+#define VERT_BIT_EDGEFLAG        BITFIELD64_BIT(VERT_ATTRIB_EDGEFLAG)
+#define VERT_BIT_TEX0            BITFIELD64_BIT(VERT_ATTRIB_TEX0)
+#define VERT_BIT_TEX1            BITFIELD64_BIT(VERT_ATTRIB_TEX1)
+#define VERT_BIT_TEX2            BITFIELD64_BIT(VERT_ATTRIB_TEX2)
+#define VERT_BIT_TEX3            BITFIELD64_BIT(VERT_ATTRIB_TEX3)
+#define VERT_BIT_TEX4            BITFIELD64_BIT(VERT_ATTRIB_TEX4)
+#define VERT_BIT_TEX5            BITFIELD64_BIT(VERT_ATTRIB_TEX5)
+#define VERT_BIT_TEX6            BITFIELD64_BIT(VERT_ATTRIB_TEX6)
+#define VERT_BIT_TEX7            BITFIELD64_BIT(VERT_ATTRIB_TEX7)
+#define VERT_BIT_POINT_SIZE      BITFIELD64_BIT(VERT_ATTRIB_POINT_SIZE)
+#define VERT_BIT_GENERIC0        BITFIELD64_BIT(VERT_ATTRIB_GENERIC0)
+
+#define VERT_BIT(i)              BITFIELD64_BIT(i)
+#define VERT_BIT_ALL             BITFIELD64_RANGE(0, VERT_ATTRIB_MAX)
+
+#define VERT_BIT_FF(i)           VERT_BIT(i)
+#define VERT_BIT_FF_ALL          BITFIELD64_RANGE(0, VERT_ATTRIB_FF_MAX)
+#define VERT_BIT_TEX(i)          VERT_BIT(VERT_ATTRIB_TEX(i))
+#define VERT_BIT_TEX_ALL         \
+   BITFIELD64_RANGE(VERT_ATTRIB_TEX(0), VERT_ATTRIB_TEX_MAX)
+
+#define VERT_BIT_GENERIC(i)      VERT_BIT(VERT_ATTRIB_GENERIC(i))
+#define VERT_BIT_GENERIC_ALL     \
+   BITFIELD64_RANGE(VERT_ATTRIB_GENERIC(0), VERT_ATTRIB_GENERIC_MAX)
+/*@}*/
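+
+/* Illustrative use of the bitflags (assumed example): a program reading
+ * only position and normal data could be detected with
+ *
+ *    GLbitfield64 mask = VERT_BIT_POS | VERT_BIT_NORMAL;
+ *    if ((prog->InputsRead & ~mask) == 0) { ... }
+ *
+ * where InputsRead is the input bitmask declared on gl_program later in
+ * this header.
+ */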
+
+
+/**
+ * Indexes for vertex shader outputs, geometry shader inputs/outputs, and
+ * fragment shader inputs.
+ *
+ * Note that some of these values are not available to all pipeline stages.
+ *
+ * When this enum is updated, the following code must be updated too:
+ * - vertResults (in prog_print.c's arb_output_attrib_string())
+ * - fragAttribs (in prog_print.c's arb_input_attrib_string())
+ * - _mesa_varying_slot_in_fs()
+ */
+typedef enum
+{
+   VARYING_SLOT_POS,
+   VARYING_SLOT_COL0, /* COL0 and COL1 must be contiguous */
+   VARYING_SLOT_COL1,
+   VARYING_SLOT_FOGC,
+   VARYING_SLOT_TEX0, /* TEX0-TEX7 must be contiguous */
+   VARYING_SLOT_TEX1,
+   VARYING_SLOT_TEX2,
+   VARYING_SLOT_TEX3,
+   VARYING_SLOT_TEX4,
+   VARYING_SLOT_TEX5,
+   VARYING_SLOT_TEX6,
+   VARYING_SLOT_TEX7,
+   VARYING_SLOT_PSIZ, /* Does not appear in FS */
+   VARYING_SLOT_BFC0, /* Does not appear in FS */
+   VARYING_SLOT_BFC1, /* Does not appear in FS */
+   VARYING_SLOT_EDGE, /* Does not appear in FS */
+   VARYING_SLOT_CLIP_VERTEX, /* Does not appear in FS */
+   VARYING_SLOT_CLIP_DIST0,
+   VARYING_SLOT_CLIP_DIST1,
+   VARYING_SLOT_PRIMITIVE_ID, /* Does not appear in VS */
+   VARYING_SLOT_LAYER, /* Appears as VS or GS output */
+   VARYING_SLOT_VIEWPORT, /* Appears as VS or GS output */
+   VARYING_SLOT_FACE, /* FS only */
+   VARYING_SLOT_PNTC, /* FS only */
+   VARYING_SLOT_VAR0, /* First generic varying slot */
+   VARYING_SLOT_MAX = VARYING_SLOT_VAR0 + MAX_VARYING
+} gl_varying_slot;
+
+
+/**
+ * Bitflags for varying slots.
+ */
+/*@{*/
+#define VARYING_BIT_POS BITFIELD64_BIT(VARYING_SLOT_POS)
+#define VARYING_BIT_COL0 BITFIELD64_BIT(VARYING_SLOT_COL0)
+#define VARYING_BIT_COL1 BITFIELD64_BIT(VARYING_SLOT_COL1)
+#define VARYING_BIT_FOGC BITFIELD64_BIT(VARYING_SLOT_FOGC)
+#define VARYING_BIT_TEX0 BITFIELD64_BIT(VARYING_SLOT_TEX0)
+#define VARYING_BIT_TEX1 BITFIELD64_BIT(VARYING_SLOT_TEX1)
+#define VARYING_BIT_TEX2 BITFIELD64_BIT(VARYING_SLOT_TEX2)
+#define VARYING_BIT_TEX3 BITFIELD64_BIT(VARYING_SLOT_TEX3)
+#define VARYING_BIT_TEX4 BITFIELD64_BIT(VARYING_SLOT_TEX4)
+#define VARYING_BIT_TEX5 BITFIELD64_BIT(VARYING_SLOT_TEX5)
+#define VARYING_BIT_TEX6 BITFIELD64_BIT(VARYING_SLOT_TEX6)
+#define VARYING_BIT_TEX7 BITFIELD64_BIT(VARYING_SLOT_TEX7)
+#define VARYING_BIT_TEX(U) BITFIELD64_BIT(VARYING_SLOT_TEX0 + (U))
+#define VARYING_BITS_TEX_ANY BITFIELD64_RANGE(VARYING_SLOT_TEX0, \
+                                              MAX_TEXTURE_COORD_UNITS)
+#define VARYING_BIT_PSIZ BITFIELD64_BIT(VARYING_SLOT_PSIZ)
+#define VARYING_BIT_BFC0 BITFIELD64_BIT(VARYING_SLOT_BFC0)
+#define VARYING_BIT_BFC1 BITFIELD64_BIT(VARYING_SLOT_BFC1)
+#define VARYING_BIT_EDGE BITFIELD64_BIT(VARYING_SLOT_EDGE)
+#define VARYING_BIT_CLIP_VERTEX BITFIELD64_BIT(VARYING_SLOT_CLIP_VERTEX)
+#define VARYING_BIT_CLIP_DIST0 BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST0)
+#define VARYING_BIT_CLIP_DIST1 BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST1)
+#define VARYING_BIT_PRIMITIVE_ID BITFIELD64_BIT(VARYING_SLOT_PRIMITIVE_ID)
+#define VARYING_BIT_LAYER BITFIELD64_BIT(VARYING_SLOT_LAYER)
+#define VARYING_BIT_VIEWPORT BITFIELD64_BIT(VARYING_SLOT_VIEWPORT)
+#define VARYING_BIT_FACE BITFIELD64_BIT(VARYING_SLOT_FACE)
+#define VARYING_BIT_PNTC BITFIELD64_BIT(VARYING_SLOT_PNTC)
+#define VARYING_BIT_VAR(V) BITFIELD64_BIT(VARYING_SLOT_VAR0 + (V))
+/*@}*/
+
+/**
+ * Bitflags for system values.
+ */
+#define SYSTEM_BIT_SAMPLE_ID BITFIELD64_BIT(SYSTEM_VALUE_SAMPLE_ID)
+#define SYSTEM_BIT_SAMPLE_POS BITFIELD64_BIT(SYSTEM_VALUE_SAMPLE_POS)
+#define SYSTEM_BIT_SAMPLE_MASK_IN BITFIELD64_BIT(SYSTEM_VALUE_SAMPLE_MASK_IN)
+
+/**
+ * Determine if the given gl_varying_slot appears in the fragment shader.
+ */
+static inline GLboolean
+_mesa_varying_slot_in_fs(gl_varying_slot slot)
+{
+   switch (slot) {
+   case VARYING_SLOT_PSIZ:
+   case VARYING_SLOT_BFC0:
+   case VARYING_SLOT_BFC1:
+   case VARYING_SLOT_EDGE:
+   case VARYING_SLOT_CLIP_VERTEX:
+   case VARYING_SLOT_LAYER:
+      return GL_FALSE;
+   default:
+      return GL_TRUE;
+   }
+}
+
+
+/**
+ * Fragment program results
+ */
+typedef enum
+{
+   FRAG_RESULT_DEPTH = 0,
+   FRAG_RESULT_STENCIL = 1,
+   /* If a single color should be written to all render targets, this
+    * register is written.  No FRAG_RESULT_DATAn will be written.
+    */
+   FRAG_RESULT_COLOR = 2,
+   FRAG_RESULT_SAMPLE_MASK = 3,
+
+   /* FRAG_RESULT_DATAn are the per-render-target (GLSL gl_FragData[n]
+    * or ARB_fragment_program fragment.color[n]) color results.  If
+    * any are written, FRAG_RESULT_COLOR will not be written.
+    */
+   FRAG_RESULT_DATA0 = 4,
+   FRAG_RESULT_MAX = (FRAG_RESULT_DATA0 + MAX_DRAW_BUFFERS)
+} gl_frag_result;
+
+
+/**
+ * Indexes for all renderbuffers
+ */
+typedef enum
+{
+   /* the four standard color buffers */
+   BUFFER_FRONT_LEFT,
+   BUFFER_BACK_LEFT,
+   BUFFER_FRONT_RIGHT,
+   BUFFER_BACK_RIGHT,
+   BUFFER_DEPTH,
+   BUFFER_STENCIL,
+   BUFFER_ACCUM,
+   /* optional aux buffer */
+   BUFFER_AUX0,
+   /* generic renderbuffers */
+   BUFFER_COLOR0,
+   BUFFER_COLOR1,
+   BUFFER_COLOR2,
+   BUFFER_COLOR3,
+   BUFFER_COLOR4,
+   BUFFER_COLOR5,
+   BUFFER_COLOR6,
+   BUFFER_COLOR7,
+   BUFFER_COUNT
+} gl_buffer_index;
+
+/**
+ * Bit flags for all renderbuffers
+ */
+#define BUFFER_BIT_FRONT_LEFT   (1 << BUFFER_FRONT_LEFT)
+#define BUFFER_BIT_BACK_LEFT    (1 << BUFFER_BACK_LEFT)
+#define BUFFER_BIT_FRONT_RIGHT  (1 << BUFFER_FRONT_RIGHT)
+#define BUFFER_BIT_BACK_RIGHT   (1 << BUFFER_BACK_RIGHT)
+#define BUFFER_BIT_AUX0         (1 << BUFFER_AUX0)
+#define BUFFER_BIT_DEPTH        (1 << BUFFER_DEPTH)
+#define BUFFER_BIT_STENCIL      (1 << BUFFER_STENCIL)
+#define BUFFER_BIT_ACCUM        (1 << BUFFER_ACCUM)
+#define BUFFER_BIT_COLOR0       (1 << BUFFER_COLOR0)
+#define BUFFER_BIT_COLOR1       (1 << BUFFER_COLOR1)
+#define BUFFER_BIT_COLOR2       (1 << BUFFER_COLOR2)
+#define BUFFER_BIT_COLOR3       (1 << BUFFER_COLOR3)
+#define BUFFER_BIT_COLOR4       (1 << BUFFER_COLOR4)
+#define BUFFER_BIT_COLOR5       (1 << BUFFER_COLOR5)
+#define BUFFER_BIT_COLOR6       (1 << BUFFER_COLOR6)
+#define BUFFER_BIT_COLOR7       (1 << BUFFER_COLOR7)
+
+/**
+ * Mask of all the color buffer bits (but not accum).
+ */
+#define BUFFER_BITS_COLOR  (BUFFER_BIT_FRONT_LEFT | \
+                            BUFFER_BIT_BACK_LEFT | \
+                            BUFFER_BIT_FRONT_RIGHT | \
+                            BUFFER_BIT_BACK_RIGHT | \
+                            BUFFER_BIT_AUX0 | \
+                            BUFFER_BIT_COLOR0 | \
+                            BUFFER_BIT_COLOR1 | \
+                            BUFFER_BIT_COLOR2 | \
+                            BUFFER_BIT_COLOR3 | \
+                            BUFFER_BIT_COLOR4 | \
+                            BUFFER_BIT_COLOR5 | \
+                            BUFFER_BIT_COLOR6 | \
+                            BUFFER_BIT_COLOR7)
+
+
+/**
+ * Shader stages.  Note that tessellation will add two more stages.
+ *
+ * The order must match how shaders are ordered in the pipeline.
+ * The GLSL linker assumes that if i<j, then the j-th shader is
+ * executed later than the i-th shader.
+ */
+typedef enum
+{
+   MESA_SHADER_VERTEX = 0,
+   MESA_SHADER_GEOMETRY = 1,
+   MESA_SHADER_FRAGMENT = 2,
+   MESA_SHADER_COMPUTE = 3,
+} gl_shader_stage;
+
+#define MESA_SHADER_STAGES (MESA_SHADER_COMPUTE + 1)
+
+
+
+/**
+ * \name Bit flags used for updating material values.
+ */
+/*@{*/
+#define MAT_ATTRIB_FRONT_AMBIENT           0 
+#define MAT_ATTRIB_BACK_AMBIENT            1
+#define MAT_ATTRIB_FRONT_DIFFUSE           2 
+#define MAT_ATTRIB_BACK_DIFFUSE            3
+#define MAT_ATTRIB_FRONT_SPECULAR          4 
+#define MAT_ATTRIB_BACK_SPECULAR           5
+#define MAT_ATTRIB_FRONT_EMISSION          6
+#define MAT_ATTRIB_BACK_EMISSION           7
+#define MAT_ATTRIB_FRONT_SHININESS         8
+#define MAT_ATTRIB_BACK_SHININESS          9
+#define MAT_ATTRIB_FRONT_INDEXES           10
+#define MAT_ATTRIB_BACK_INDEXES            11
+#define MAT_ATTRIB_MAX                     12
+
+#define MAT_ATTRIB_AMBIENT(f)  (MAT_ATTRIB_FRONT_AMBIENT+(f))  
+#define MAT_ATTRIB_DIFFUSE(f)  (MAT_ATTRIB_FRONT_DIFFUSE+(f))  
+#define MAT_ATTRIB_SPECULAR(f) (MAT_ATTRIB_FRONT_SPECULAR+(f)) 
+#define MAT_ATTRIB_EMISSION(f) (MAT_ATTRIB_FRONT_EMISSION+(f)) 
+#define MAT_ATTRIB_SHININESS(f) (MAT_ATTRIB_FRONT_SHININESS+(f))
+#define MAT_ATTRIB_INDEXES(f)  (MAT_ATTRIB_FRONT_INDEXES+(f))  
+
+#define MAT_INDEX_AMBIENT  0
+#define MAT_INDEX_DIFFUSE  1
+#define MAT_INDEX_SPECULAR 2
+
+#define MAT_BIT_FRONT_AMBIENT         (1<<MAT_ATTRIB_FRONT_AMBIENT)
+#define MAT_BIT_BACK_AMBIENT          (1<<MAT_ATTRIB_BACK_AMBIENT)
+#define MAT_BIT_FRONT_DIFFUSE         (1<<MAT_ATTRIB_FRONT_DIFFUSE)
+#define MAT_BIT_BACK_DIFFUSE          (1<<MAT_ATTRIB_BACK_DIFFUSE)
+#define MAT_BIT_FRONT_SPECULAR        (1<<MAT_ATTRIB_FRONT_SPECULAR)
+#define MAT_BIT_BACK_SPECULAR         (1<<MAT_ATTRIB_BACK_SPECULAR)
+#define MAT_BIT_FRONT_EMISSION        (1<<MAT_ATTRIB_FRONT_EMISSION)
+#define MAT_BIT_BACK_EMISSION         (1<<MAT_ATTRIB_BACK_EMISSION)
+#define MAT_BIT_FRONT_SHININESS       (1<<MAT_ATTRIB_FRONT_SHININESS)
+#define MAT_BIT_BACK_SHININESS        (1<<MAT_ATTRIB_BACK_SHININESS)
+#define MAT_BIT_FRONT_INDEXES         (1<<MAT_ATTRIB_FRONT_INDEXES)
+#define MAT_BIT_BACK_INDEXES          (1<<MAT_ATTRIB_BACK_INDEXES)
+
+
+#define FRONT_MATERIAL_BITS	(MAT_BIT_FRONT_EMISSION | 	\
+				 MAT_BIT_FRONT_AMBIENT |	\
+				 MAT_BIT_FRONT_DIFFUSE | 	\
+				 MAT_BIT_FRONT_SPECULAR |	\
+				 MAT_BIT_FRONT_SHININESS | 	\
+				 MAT_BIT_FRONT_INDEXES)
+
+#define BACK_MATERIAL_BITS	(MAT_BIT_BACK_EMISSION |	\
+				 MAT_BIT_BACK_AMBIENT |		\
+				 MAT_BIT_BACK_DIFFUSE |		\
+				 MAT_BIT_BACK_SPECULAR |	\
+				 MAT_BIT_BACK_SHININESS |	\
+				 MAT_BIT_BACK_INDEXES)
+
+#define ALL_MATERIAL_BITS	(FRONT_MATERIAL_BITS | BACK_MATERIAL_BITS)
+/*@}*/
+
+
+/**
+ * Light state flags.
+ */
+/*@{*/
+#define LIGHT_SPOT         0x1
+#define LIGHT_LOCAL_VIEWER 0x2
+#define LIGHT_POSITIONAL   0x4
+#define LIGHT_NEED_VERTICES (LIGHT_POSITIONAL|LIGHT_LOCAL_VIEWER)
+/*@}*/
+
+
+
+
+
+/**
+ * Multisample attribute group (GL_MULTISAMPLE_BIT).
+ */
+struct gl_multisample_attrib
+{
+   GLboolean Enabled;
+   GLboolean _Enabled;   /**< true if Enabled and multisample buffer */
+   GLboolean SampleAlphaToCoverage;
+   GLboolean SampleAlphaToOne;
+   GLboolean SampleCoverage;
+   GLfloat SampleCoverageValue;
+   GLboolean SampleCoverageInvert;
+   GLboolean SampleShading;
+   GLfloat MinSampleShadingValue;
+
+   /* ARB_texture_multisample / GL3.2 additions */
+   GLboolean SampleMask;
+   /** The GL spec defines this as an array but >32x MSAA is madness */
+   GLbitfield SampleMaskValue;
+};
+
+
+/**
+ * An index for each type of texture object.  These correspond to the GL
+ * texture target enums, such as GL_TEXTURE_2D, GL_TEXTURE_CUBE_MAP, etc.
+ * Note: the order is from highest priority to lowest priority.
+ */
+typedef enum
+{
+   TEXTURE_2D_MULTISAMPLE_INDEX,
+   TEXTURE_2D_MULTISAMPLE_ARRAY_INDEX,
+   TEXTURE_CUBE_ARRAY_INDEX,
+   TEXTURE_BUFFER_INDEX,
+   TEXTURE_2D_ARRAY_INDEX,
+   TEXTURE_1D_ARRAY_INDEX,
+   TEXTURE_EXTERNAL_INDEX,
+   TEXTURE_CUBE_INDEX,
+   TEXTURE_3D_INDEX,
+   TEXTURE_RECT_INDEX,
+   TEXTURE_2D_INDEX,
+   TEXTURE_1D_INDEX,
+   NUM_TEXTURE_TARGETS
+} gl_texture_index;
+
+
+/**
+ * Bit flags for each type of texture object
+ */
+/*@{*/
+#define TEXTURE_2D_MULTISAMPLE_BIT (1 << TEXTURE_2D_MULTISAMPLE_INDEX)
+#define TEXTURE_2D_MULTISAMPLE_ARRAY_BIT (1 << TEXTURE_2D_MULTISAMPLE_ARRAY_INDEX)
+#define TEXTURE_CUBE_ARRAY_BIT (1 << TEXTURE_CUBE_ARRAY_INDEX)
+#define TEXTURE_BUFFER_BIT   (1 << TEXTURE_BUFFER_INDEX)
+#define TEXTURE_2D_ARRAY_BIT (1 << TEXTURE_2D_ARRAY_INDEX)
+#define TEXTURE_1D_ARRAY_BIT (1 << TEXTURE_1D_ARRAY_INDEX)
+#define TEXTURE_EXTERNAL_BIT (1 << TEXTURE_EXTERNAL_INDEX)
+#define TEXTURE_CUBE_BIT     (1 << TEXTURE_CUBE_INDEX)
+#define TEXTURE_3D_BIT       (1 << TEXTURE_3D_INDEX)
+#define TEXTURE_RECT_BIT     (1 << TEXTURE_RECT_INDEX)
+#define TEXTURE_2D_BIT       (1 << TEXTURE_2D_INDEX)
+#define TEXTURE_1D_BIT       (1 << TEXTURE_1D_INDEX)
+/*@}*/
+
+
+/**
+ * Indexes for cube map faces.
+ */
+typedef enum
+{
+   FACE_POS_X = 0,
+   FACE_NEG_X = 1,
+   FACE_POS_Y = 2,
+   FACE_NEG_Y = 3,
+   FACE_POS_Z = 4,
+   FACE_NEG_Z = 5,
+   MAX_FACES = 6
+} gl_face_index;
+
+
+
+/** Up to four combiner sources are possible with GL_NV_texture_env_combine4 */
+#define MAX_COMBINER_TERMS 4
+
+
+/**
+ * Texture combine environment state.
+ */
+struct gl_tex_env_combine_state
+{
+   GLenum ModeRGB;       /**< GL_REPLACE, GL_DECAL, GL_ADD, etc. */
+   GLenum ModeA;         /**< GL_REPLACE, GL_DECAL, GL_ADD, etc. */
+   /** Source terms: GL_PRIMARY_COLOR, GL_TEXTURE, etc */
+   GLenum SourceRGB[MAX_COMBINER_TERMS];
+   GLenum SourceA[MAX_COMBINER_TERMS];
+   /** Source operands: GL_SRC_COLOR, GL_ONE_MINUS_SRC_COLOR, etc */
+   GLenum OperandRGB[MAX_COMBINER_TERMS];
+   GLenum OperandA[MAX_COMBINER_TERMS];
+   GLuint ScaleShiftRGB; /**< 0, 1 or 2 */
+   GLuint ScaleShiftA;   /**< 0, 1 or 2 */
+   GLuint _NumArgsRGB;   /**< Number of inputs used for the RGB combiner */
+   GLuint _NumArgsA;     /**< Number of inputs used for the A combiner */
+};
+
+
+/**
+ * TexGenEnabled flags.
+ */
+/*@{*/
+#define S_BIT 1
+#define T_BIT 2
+#define R_BIT 4
+#define Q_BIT 8
+#define STR_BITS (S_BIT | T_BIT | R_BIT)
+/*@}*/
+
+
+/**
+ * Bit flag versions of the corresponding GL_ constants.
+ */
+/*@{*/
+#define TEXGEN_SPHERE_MAP        0x1
+#define TEXGEN_OBJ_LINEAR        0x2
+#define TEXGEN_EYE_LINEAR        0x4
+#define TEXGEN_REFLECTION_MAP_NV 0x8
+#define TEXGEN_NORMAL_MAP_NV     0x10
+
+#define TEXGEN_NEED_NORMALS      (TEXGEN_SPHERE_MAP        | \
+				  TEXGEN_REFLECTION_MAP_NV | \
+				  TEXGEN_NORMAL_MAP_NV)
+#define TEXGEN_NEED_EYE_COORD    (TEXGEN_SPHERE_MAP        | \
+				  TEXGEN_REFLECTION_MAP_NV | \
+				  TEXGEN_NORMAL_MAP_NV     | \
+				  TEXGEN_EYE_LINEAR)
+/*@}*/
+
+
+
+/** Tex-gen enabled for texture unit? */
+#define ENABLE_TEXGEN(unit) (1 << (unit))
+
+/** Non-identity texture matrix for texture unit? */
+#define ENABLE_TEXMAT(unit) (1 << (unit))
+
+/**
+ * Data structure representing a single clip plane (e.g. one of the elements
+ * of the ctx->Transform.EyeUserPlane or ctx->Transform._ClipUserPlane array).
+ */
+typedef GLfloat gl_clip_plane[4];
+
+typedef enum {
+   MAP_USER,
+   MAP_INTERNAL,
+
+   MAP_COUNT
+} gl_map_buffer_index;
+
+
+/**
+ * Name, type and size of a transform feedback varying, as queried with
+ * glGetTransformFeedbackVarying().
+ */
+
+struct gl_transform_feedback_varying_info
+{
+   char *Name;
+   GLenum Type;
+   GLint Size;
+};
+
+
+/**
+ * Per-output info for vertex shaders with transform feedback.
+ */
+struct gl_transform_feedback_output
+{
+   unsigned OutputRegister;
+   unsigned OutputBuffer;
+   unsigned NumComponents;
+
+   /** offset (in DWORDs) of this output within the interleaved structure */
+   unsigned DstOffset;
+
+   /**
+    * Offset into the output register of the data to output.  For example,
+    * if NumComponents is 2 and ComponentOffset is 1, then the data to
+    * output is in the y and z components of the output register.
+    */
+   unsigned ComponentOffset;
+};
+
+
+/** Post-link transform feedback info. */
+struct gl_transform_feedback_info
+{
+   unsigned NumOutputs;
+
+   /**
+    * Number of transform feedback buffers in use by this program.
+    */
+   unsigned NumBuffers;
+
+   struct gl_transform_feedback_output *Outputs;
+
+   /** Transform feedback varyings used for the linking of this shader program.
+    *
+    * Used for glGetTransformFeedbackVarying().
+    */
+   struct gl_transform_feedback_varying_info *Varyings;
+   GLint NumVarying;
+
+   /**
+    * Total number of components stored in each buffer.  This may be used by
+    * hardware back-ends to determine the correct stride when interleaving
+    * multiple transform feedback outputs in the same buffer.
+    */
+   unsigned BufferStride[MAX_FEEDBACK_BUFFERS];
+};
+
+
+/**
+ * Names of the various vertex/fragment program register files, etc.
+ *
+ * NOTE: first four tokens must fit into 2 bits (see t_vb_arbprogram.c)
+ * All values should fit in a 4-bit field.
+ *
+ * NOTE: PROGRAM_STATE_VAR, PROGRAM_CONSTANT, and PROGRAM_UNIFORM can all be
+ * considered to be "uniform" variables since they can only be set outside
+ * glBegin/End.  They're also all stored in the same Parameters array.
+ */
+typedef enum
+{
+   PROGRAM_TEMPORARY,   /**< machine->Temporary[] */
+   PROGRAM_ARRAY,       /**< Arrays & Matrices */
+   PROGRAM_INPUT,       /**< machine->Inputs[] */
+   PROGRAM_OUTPUT,      /**< machine->Outputs[] */
+   PROGRAM_STATE_VAR,   /**< gl_program->Parameters[] */
+   PROGRAM_CONSTANT,    /**< gl_program->Parameters[] */
+   PROGRAM_UNIFORM,     /**< gl_program->Parameters[] */
+   PROGRAM_WRITE_ONLY,  /**< A dummy, write-only register */
+   PROGRAM_ADDRESS,     /**< machine->AddressReg */
+   PROGRAM_SAMPLER,     /**< for shader samplers, compile-time only */
+   PROGRAM_SYSTEM_VALUE,/**< InstanceId, PrimitiveID, etc. */
+   PROGRAM_UNDEFINED,   /**< Invalid/TBD value */
+   PROGRAM_FILE_MAX
+} gl_register_file;
+
+
+/**
+ * If the register file is PROGRAM_SYSTEM_VALUE, the register index will be
+ * one of these values.
+ */
+typedef enum
+{
+   SYSTEM_VALUE_FRONT_FACE,     /**< Fragment shader only (not done yet) */
+   SYSTEM_VALUE_VERTEX_ID,      /**< Vertex shader only */
+   SYSTEM_VALUE_INSTANCE_ID,    /**< Vertex shader only */
+   SYSTEM_VALUE_SAMPLE_ID,      /**< Fragment shader only */
+   SYSTEM_VALUE_SAMPLE_POS,     /**< Fragment shader only */
+   SYSTEM_VALUE_SAMPLE_MASK_IN, /**< Fragment shader only */
+   SYSTEM_VALUE_INVOCATION_ID,  /**< Geometry shader only */
+   SYSTEM_VALUE_MAX             /**< Number of values */
+} gl_system_value;
+
+
+/**
+ * The possible interpolation qualifiers that can be applied to a fragment
+ * shader input in GLSL.
+ *
+ * Note: INTERP_QUALIFIER_NONE must be 0 so that memsetting the
+ * gl_fragment_program data structure to 0 causes the default behavior.
+ */
+enum glsl_interp_qualifier
+{
+   INTERP_QUALIFIER_NONE = 0,
+   INTERP_QUALIFIER_SMOOTH,
+   INTERP_QUALIFIER_FLAT,
+   INTERP_QUALIFIER_NOPERSPECTIVE,
+   INTERP_QUALIFIER_COUNT /**< Number of interpolation qualifiers */
+};
+
+
+/**
+ * \brief Layout qualifiers for gl_FragDepth.
+ *
+ * Extension AMD_conservative_depth allows gl_FragDepth to be redeclared with
+ * a layout qualifier.
+ *
+ * \see enum ir_depth_layout
+ */
+enum gl_frag_depth_layout
+{
+   FRAG_DEPTH_LAYOUT_NONE, /**< No layout is specified. */
+   FRAG_DEPTH_LAYOUT_ANY,
+   FRAG_DEPTH_LAYOUT_GREATER,
+   FRAG_DEPTH_LAYOUT_LESS,
+   FRAG_DEPTH_LAYOUT_UNCHANGED
+};
+
+
+/**
+ * Base class for any kind of program object
+ */
+struct gl_program
+{
+   GLuint Id;
+   GLubyte *String;  /**< Null-terminated program text */
+   // LunarG: Remove - VK does not use reference counts
+   // GLint RefCount;
+   GLenum Target;    /**< GL_VERTEX/FRAGMENT_PROGRAM_ARB, GL_GEOMETRY_PROGRAM_NV */
+   GLenum Format;    /**< String encoding format */
+
+   struct prog_instruction *Instructions;
+
+   GLbitfield64 InputsRead;     /**< Bitmask of which input regs are read */
+   GLbitfield64 OutputsWritten; /**< Bitmask of which output regs are written */
+   GLbitfield SystemValuesRead;   /**< Bitmask of SYSTEM_VALUE_x inputs used */
+   GLbitfield InputFlags[MAX_PROGRAM_INPUTS];   /**< PROG_PARAM_BIT_x flags */
+   GLbitfield OutputFlags[MAX_PROGRAM_OUTPUTS]; /**< PROG_PARAM_BIT_x flags */
+   GLbitfield TexturesUsed[MAX_COMBINED_TEXTURE_IMAGE_UNITS];  /**< TEXTURE_x_BIT bitmask */
+   GLbitfield SamplersUsed;   /**< Bitfield of which samplers are used */
+   GLbitfield ShadowSamplers; /**< Texture units used for shadow sampling. */
+
+   GLboolean UsesGather; /**< Does this program use gather4 at all? */
+
+   /**
+    * For vertex and geometry shaders, true if the program uses the
+    * gl_ClipDistance output.  Ignored for fragment shaders.
+    */
+   GLboolean UsesClipDistanceOut;
+
+
+   /** Named parameters, constants, etc. from program text */
+   struct gl_program_parameter_list *Parameters;
+
+   /**
+    * Local parameters used by the program.
+    *
+    * It's dynamically allocated because it is rarely used (only by
+    * assembly-style programs), and it holds MAX_PROGRAM_LOCAL_PARAMS
+    * entries once allocated.
+    */
+   GLfloat (*LocalParams)[4];
+
+   /** Map from sampler unit to texture unit (set by glUniform1i()) */
+   // LunarG - Bump to 32 bits to hold binding and set
+   GLuint SamplerUnits[MAX_SAMPLERS];
+
+   /** Bitmask of which register files are read/written with indirect
+    * addressing.  Mask of (1 << PROGRAM_x) bits.
+    */
+   GLbitfield IndirectRegisterFiles;
+
+   /** Logical counts */
+   /*@{*/
+   GLuint NumInstructions;
+   GLuint NumTemporaries;
+   GLuint NumParameters;
+   GLuint NumAttributes;
+   GLuint NumAddressRegs;
+   GLuint NumAluInstructions;
+   GLuint NumTexInstructions;
+   GLuint NumTexIndirections;
+   /*@}*/
+   /** Native, actual h/w counts */
+   /*@{*/
+   GLuint NumNativeInstructions;
+   GLuint NumNativeTemporaries;
+   GLuint NumNativeParameters;
+   GLuint NumNativeAttributes;
+   GLuint NumNativeAddressRegs;
+   GLuint NumNativeAluInstructions;
+   GLuint NumNativeTexInstructions;
+   GLuint NumNativeTexIndirections;
+   /*@}*/
+};
+
+
+/** Vertex program object */
+struct gl_vertex_program
+{
+   struct gl_program Base;   /**< base class */
+   GLboolean IsPositionInvariant;
+};
+
+
+/** Geometry program object */
+struct gl_geometry_program
+{
+   struct gl_program Base;   /**< base class */
+
+   GLint VerticesIn;
+   GLint VerticesOut;
+   GLint Invocations;
+   GLenum InputType;  /**< GL_POINTS, GL_LINES, GL_LINES_ADJACENCY_ARB,
+                           GL_TRIANGLES, or GL_TRIANGLES_ADJACENCY_ARB */
+   GLenum OutputType; /**< GL_POINTS, GL_LINE_STRIP or GL_TRIANGLE_STRIP */
+   GLboolean UsesEndPrimitive;
+};
+
+
+/** Fragment program object */
+struct gl_fragment_program
+{
+   struct gl_program Base;   /**< base class */
+   GLboolean UsesKill;          /**< shader uses KIL instruction */
+   GLboolean UsesDFdy;          /**< shader uses DDY instruction */
+   GLboolean OriginUpperLeft;
+   GLboolean PixelCenterInteger;
+   enum gl_frag_depth_layout FragDepthLayout;
+
+   /**
+    * GLSL interpolation qualifier associated with each fragment shader input.
+    * For inputs that do not have an interpolation qualifier specified in
+    * GLSL, the value is INTERP_QUALIFIER_NONE.
+    */
+   enum glsl_interp_qualifier InterpQualifier[VARYING_SLOT_MAX];
+
+   /**
+    * Bitfield indicating, for each fragment shader input, 1 if that input
+    * uses centroid interpolation, 0 otherwise.  Unused inputs are 0.
+    */
+   GLbitfield64 IsCentroid;
+
+   /**
+    * Bitfield indicating, for each fragment shader input, 1 if that input
+    * uses sample interpolation, 0 otherwise.  Unused inputs are 0.
+    */
+   GLbitfield64 IsSample;
+};
+
+
+/** Compute program object */
+struct gl_compute_program
+{
+   struct gl_program Base;   /**< base class */
+
+   /**
+    * Size specified using local_size_{x,y,z}.
+    */
+   unsigned LocalSize[3];
+};
+
+
+/**
+ * State common to vertex and fragment programs.
+ */
+struct gl_program_state
+{
+   GLint ErrorPos;                       /* GL_PROGRAM_ERROR_POSITION_ARB/NV */
+   const char *ErrorString;              /* GL_PROGRAM_ERROR_STRING_ARB/NV */
+};
+
+
+/**
+ * Context state for vertex programs.
+ */
+struct gl_vertex_program_state
+{
+   GLboolean Enabled;            /**< User-set GL_VERTEX_PROGRAM_ARB/NV flag */
+   GLboolean _Enabled;           /**< Enabled and _valid_ user program? */
+   GLboolean PointSizeEnabled;   /**< GL_VERTEX_PROGRAM_POINT_SIZE_ARB/NV */
+   GLboolean TwoSideEnabled;     /**< GL_VERTEX_PROGRAM_TWO_SIDE_ARB/NV */
+   /** Computed two sided lighting for fixed function/programs. */
+   GLboolean _TwoSideEnabled;
+   struct gl_vertex_program *Current;  /**< User-bound vertex program */
+
+   /** Currently enabled and valid vertex program (including internal
+    * programs, user-defined vertex programs and GLSL vertex shaders).
+    * This is the program we must use when rendering.
+    */
+   struct gl_vertex_program *_Current;
+
+   GLfloat Parameters[MAX_PROGRAM_ENV_PARAMS][4]; /**< Env params */
+
+   /** Should fixed-function T&L be implemented with a vertex prog? */
+   GLboolean _MaintainTnlProgram;
+
+   /** Program to emulate fixed-function T&L (see above) */
+   struct gl_vertex_program *_TnlProgram;
+
+   /** Cache of fixed-function programs */
+   struct gl_program_cache *Cache;
+
+   GLboolean _Overriden;
+};
+
+
+/**
+ * Context state for geometry programs.
+ */
+struct gl_geometry_program_state
+{
+   GLboolean Enabled;               /**< GL_ARB_GEOMETRY_SHADER4 */
+   GLboolean _Enabled;              /**< Enabled and valid program? */
+   struct gl_geometry_program *Current;  /**< user-bound geometry program */
+
+   /** Currently enabled and valid program (including internal programs
+    * and compiled shader programs).
+    */
+   struct gl_geometry_program *_Current;
+
+   GLfloat Parameters[MAX_PROGRAM_ENV_PARAMS][4]; /**< Env params */
+
+   /** Cache of fixed-function programs */
+   struct gl_program_cache *Cache;
+};
+
+/**
+ * Context state for fragment programs.
+ */
+struct gl_fragment_program_state
+{
+   GLboolean Enabled;     /**< User-set fragment program enable flag */
+   GLboolean _Enabled;    /**< Enabled and _valid_ user program? */
+   struct gl_fragment_program *Current;  /**< User-bound fragment program */
+
+   /** Currently enabled and valid fragment program (including internal
+    * programs, user-defined fragment programs and GLSL fragment shaders).
+    * This is the program we must use when rendering.
+    */
+   struct gl_fragment_program *_Current;
+
+   GLfloat Parameters[MAX_PROGRAM_ENV_PARAMS][4]; /**< Env params */
+
+   /** Should fixed-function texturing be implemented with a fragment prog? */
+   GLboolean _MaintainTexEnvProgram;
+
+   /** Program to emulate fixed-function texture env/combine (see above) */
+   struct gl_fragment_program *_TexEnvProgram;
+
+   /** Cache of fixed-function programs */
+   struct gl_program_cache *Cache;
+};
+
+
+/**
+ * ATI_fragment_shader runtime state
+ */
+#define ATI_FS_INPUT_PRIMARY 0
+#define ATI_FS_INPUT_SECONDARY 1
+
+struct atifs_instruction;
+struct atifs_setupinst;
+
+/**
+ * ATI fragment shader
+ */
+struct ati_fragment_shader
+{
+   GLuint Id;
+   GLint RefCount;
+   struct atifs_instruction *Instructions[2];
+   struct atifs_setupinst *SetupInst[2];
+   GLfloat Constants[8][4];
+   GLbitfield LocalConstDef;  /**< Indicates which constants have been set */
+   GLubyte numArithInstr[2];
+   GLubyte regsAssigned[2];
+   GLubyte NumPasses;         /**< 1 or 2 */
+   GLubyte cur_pass;
+   GLubyte last_optype;
+   GLboolean interpinp1;
+   GLboolean isValid;
+   GLuint swizzlerq;
+};
+
+/**
+ * Context state for GL_ATI_fragment_shader
+ */
+struct gl_ati_fragment_shader_state
+{
+   GLboolean Enabled;
+   GLboolean _Enabled;                  /**< enabled and valid shader? */
+   GLboolean Compiling;
+   GLfloat GlobalConstants[8][4];
+   struct ati_fragment_shader *Current;
+};
+
+
+/** Set by #pragma directives */
+struct gl_sl_pragmas
+{
+   GLboolean IgnoreOptimize;  /**< ignore #pragma optimize(on/off) ? */
+   GLboolean IgnoreDebug;     /**< ignore #pragma debug(on/off) ? */
+   GLboolean Optimize;  /**< defaults on */
+   GLboolean Debug;     /**< defaults off */
+};
+
+
+/**
+ * A GLSL vertex or fragment shader object.
+ */
+struct gl_shader
+{
+   /** GL_FRAGMENT_SHADER || GL_VERTEX_SHADER || GL_GEOMETRY_SHADER_ARB.
+    * Must be the first field.
+    */
+   GLenum Type;
+   gl_shader_stage Stage;
+   GLuint Name;  /**< AKA the handle */
+   GLchar *Label;   /**< GL_KHR_debug */
+   // LunarG: Remove - VK does not use reference counts
+   // GLint RefCount;
+   GLboolean DeletePending;
+   GLboolean CompileStatus;
+   const GLchar *Source;  /**< SPV or GLSL source code */
+   GLuint Size;         /**< SPV size */
+   GLuint SourceChecksum;       /**< for debug/logging purposes */
+   struct gl_program *Program;  /**< Post-compile assembly code */
+   GLchar *InfoLog;
+   struct gl_sl_pragmas Pragmas;
+
+   unsigned Version;       /**< GLSL version used for linking */
+   GLboolean IsES;         /**< True if this shader uses GLSL ES */
+
+   /**
+    * \name Sampler tracking
+    *
+    * \note Each of these fields is only set post-linking.
+    */
+   /*@{*/
+   unsigned num_samplers;	/**< Number of samplers used by this shader. */
+   GLbitfield active_samplers;	/**< Bitfield of which samplers are used */
+   GLbitfield shadow_samplers;	/**< Samplers used for shadow sampling. */
+   /*@}*/
+
+   /**
+    * Map from sampler unit to texture unit (set by glUniform1i())
+    *
+    * A sampler unit is associated with each sampler uniform by the linker.
+    * The sampler unit associated with each uniform is stored in the
+    * \c gl_uniform_storage::sampler field.
+    */
+   // LunarG - Bump to 32 bits to hold binding and set
+   GLuint SamplerUnits[MAX_SAMPLERS];
+   /** Which texture target is being sampled (TEXTURE_1D/2D/3D/etc_INDEX) */
+   gl_texture_index SamplerTargets[MAX_SAMPLERS];
+
+   /**
+    * Number of default uniform block components used by this shader.
+    *
+    * This field is only set post-linking.
+    */
+   unsigned num_uniform_components;
+
+   /**
+    * Number of combined uniform components used by this shader.
+    *
+    * This field is only set post-linking.  It is the sum of the uniform block
+    * sizes divided by sizeof(float), and num_uniform_components.
+    */
+   unsigned num_combined_uniform_components;
+
+   /**
+    * This shader's uniform block information.
+    *
+    * These fields are only set post-linking.
+    */
+   struct gl_uniform_block *UniformBlocks;
+   unsigned NumUniformBlocks;
+
+   struct exec_list *ir;
+   struct glsl_symbol_table *symbols;
+
+   bool uses_builtin_functions;
+   bool uses_gl_fragcoord;
+   bool redeclares_gl_fragcoord;
+   bool ARB_fragment_coord_conventions_enable;
+
+   /**
+    * Fragment shader state from GLSL 1.50 layout qualifiers.
+    */
+   bool origin_upper_left;
+   bool pixel_center_integer;
+
+   /**
+    * Geometry shader state from GLSL 1.50 layout qualifiers.
+    */
+   struct {
+      GLint VerticesOut;
+      /**
+       * 0 - Invocations count not declared in shader, or
+       * 1 .. MAX_GEOMETRY_SHADER_INVOCATIONS
+       */
+      GLint Invocations;
+      /**
+       * GL_POINTS, GL_LINES, GL_LINES_ADJACENCY, GL_TRIANGLES, or
+       * GL_TRIANGLES_ADJACENCY, or PRIM_UNKNOWN if it's not set in this
+       * shader.
+       */
+      GLenum InputType;
+       /**
+        * GL_POINTS, GL_LINE_STRIP or GL_TRIANGLE_STRIP, or PRIM_UNKNOWN if
+        * it's not set in this shader.
+        */
+      GLenum OutputType;
+   } Geom;
+
+   /**
+    * Map from image uniform index to image unit (set by glUniform1i())
+    *
+    * An image uniform index is associated with each image uniform by
+    * the linker.  The image index associated with each uniform is
+    * stored in the \c gl_uniform_storage::image field.
+    */
+   GLubyte ImageUnits[MAX_IMAGE_UNIFORMS];
+
+   /**
+    * Access qualifier specified in the shader for each image uniform
+    * index.  Either \c GL_READ_ONLY, \c GL_WRITE_ONLY or \c
+    * GL_READ_WRITE.
+    *
+    * It may be different, though only more strict than the value of
+    * \c gl_image_unit::Access for the corresponding image unit.
+    */
+   GLenum ImageAccess[MAX_IMAGE_UNIFORMS];
+
+   /**
+    * Number of image uniforms defined in the shader.  It specifies
+    * the number of valid elements in the \c ImageUnits and \c
+    * ImageAccess arrays above.
+    */
+   GLuint NumImages;
+
+   /**
+    * Compute shader state from ARB_compute_shader layout qualifiers.
+    */
+   struct {
+      /**
+       * Size specified using local_size_{x,y,z}, or all 0's to indicate that
+       * it's not set in this shader.
+       */
+      unsigned LocalSize[3];
+   } Comp;
+
+   /**
+    * Deferred task of glCompileShader.  We should extend the mutex, not only
+    * to protect the deferred task, but to protect the entire gl_shader.
+    *
+    * MUST BE LAST FOR SHADER CACHE TO WORK
+    */
+   mtx_t Mutex;
+   struct _mesa_threadpool_task *Task;
+   void *TaskData;
+};
+
+
+struct gl_uniform_buffer_variable
+{
+   char *Name;
+
+   /**
+    * Name of the uniform as seen by glGetUniformIndices.
+    *
+    * glGetUniformIndices requires that the block instance index \b not be
+    * present in the name of queried uniforms.
+    *
+    * \note
+    * \c gl_uniform_buffer_variable::IndexName and
+    * \c gl_uniform_buffer_variable::Name may point to identical storage.
+    */
+   char *IndexName;
+
+   const struct glsl_type *Type;
+   unsigned int Offset;
+   GLboolean RowMajor;
+};
+
+
+enum gl_uniform_block_packing
+{
+   ubo_packing_std140,
+   ubo_packing_shared,
+   ubo_packing_packed
+};
+
+
+struct gl_uniform_block
+{
+   /** Declared name of the uniform block */
+   char *Name;
+
+   /** Array of supplemental information about UBO ir_variables. */
+   struct gl_uniform_buffer_variable *Uniforms;
+   GLuint NumUniforms;
+
+   /**
+    * Index (GL_UNIFORM_BLOCK_BINDING) into ctx->UniformBufferBindings[] to use
+    * with glBindBufferBase to bind a buffer object to this uniform block.  When
+    * updated in the program, _NEW_BUFFER_OBJECT will be set.
+    */
+   GLuint Binding;
+
+   /**
+    * Minimum size of a buffer object to back this uniform buffer
+    * (GL_UNIFORM_BLOCK_DATA_SIZE).
+    */
+   GLuint UniformBufferSize;
+
+   /**
+    * Layout specified in the shader
+    *
+    * This isn't accessible through the API, but it is used while
+    * cross-validating uniform blocks.
+    */
+   enum gl_uniform_block_packing _Packing;
+};
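+
+/* Usage sketch (illustrative, standard GL API rather than anything defined
+ * here): an application attaches a buffer object to Binding with
+ *
+ *    glBindBufferBase(GL_UNIFORM_BUFFER, block->Binding, buffer);
+ *
+ * and that buffer should be at least UniformBufferSize bytes
+ * (GL_UNIFORM_BLOCK_DATA_SIZE) for well-defined results.
+ */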
+
+/**
+ * Structure that represents a reference to an atomic buffer from some
+ * shader program.
+ */
+struct gl_active_atomic_buffer
+{
+   /** Uniform indices of the atomic counters declared within it. */
+   GLuint *Uniforms;
+   GLuint NumUniforms;
+
+   /** Binding point index associated with it. */
+   GLuint Binding;
+
+   /** Minimum reasonable size it is expected to have. */
+   GLuint MinimumSize;
+
+   /** Shader stages making use of it. */
+   GLboolean StageReferences[MESA_SHADER_STAGES];
+};
+
+/**
+ * A GLSL program object.
+ * Basically a linked collection of vertex and fragment shaders.
+ */
+struct gl_shader_program
+{
+   GLenum Type;  /**< Always GL_SHADER_PROGRAM (internal token) */
+   GLuint Name;  /**< aka handle or ID */
+   GLchar *Label;   /**< GL_KHR_debug */
+   // LunarG: Remove - VK does not use reference counts
+   // GLint RefCount;
+   GLboolean DeletePending;
+
+   /**
+    * Is the application intending to glGetProgramBinary this program?
+    */
+   GLboolean BinaryRetreivableHint;
+
+   /**
+    * Indicates whether program can be bound for individual pipeline stages
+    * using UseProgramStages after it is next linked.
+    */
+   GLboolean SeparateShader;
+
+   GLuint NumShaders;          /**< number of attached shaders */
+   struct gl_shader **Shaders; /**< List of the attached shaders */
+
+   /**
+    * User-defined attribute bindings
+    *
+    * These are set via \c glBindAttribLocation and are used to direct the
+    * GLSL linker.  These are \b not the values used in the compiled shader,
+    * and they are \b not the values returned by \c glGetAttribLocation.
+    */
+   struct string_to_uint_map *AttributeBindings;
+
+   /**
+    * User-defined fragment data bindings
+    *
+    * These are set via \c glBindFragDataLocation and are used to direct the
+    * GLSL linker.  These are \b not the values used in the compiled shader,
+    * and they are \b not the values returned by \c glGetFragDataLocation.
+    */
+   struct string_to_uint_map *FragDataBindings;
+   struct string_to_uint_map *FragDataIndexBindings;
+
+   /**
+    * Transform feedback varyings last specified by
+    * glTransformFeedbackVaryings().
+    *
+    * For the current set of transform feedback varyings used for transform
+    * feedback output, see LinkedTransformFeedback.
+    */
+   struct {
+      GLenum BufferMode;
+      GLuint NumVarying;
+      GLchar **VaryingNames;  /**< Array [NumVarying] of char * */
+   } TransformFeedback;
+
+   /** Post-link transform feedback info. */
+   struct gl_transform_feedback_info LinkedTransformFeedback;
+
+   /** Post-link gl_FragDepth layout for ARB_conservative_depth. */
+   enum gl_frag_depth_layout FragDepthLayout;
+
+   /**
+    * Geometry shader state - copied into gl_geometry_program by
+    * _mesa_copy_linked_program_data().
+    */
+   struct {
+      GLint VerticesIn;
+      GLint VerticesOut;
+      /**
+       * 1 .. MAX_GEOMETRY_SHADER_INVOCATIONS
+       */
+      GLint Invocations;
+      GLenum InputType;  /**< GL_POINTS, GL_LINES, GL_LINES_ADJACENCY_ARB,
+                              GL_TRIANGLES, or GL_TRIANGLES_ADJACENCY_ARB */
+      GLenum OutputType; /**< GL_POINTS, GL_LINE_STRIP or GL_TRIANGLE_STRIP */
+      /**
+       * True if gl_ClipDistance is written to.  Copied into
+       * gl_geometry_program by _mesa_copy_linked_program_data().
+       */
+      GLboolean UsesClipDistance;
+      GLuint ClipDistanceArraySize; /**< Size of the gl_ClipDistance array, or
+                                         0 if not present. */
+      GLboolean UsesEndPrimitive;
+   } Geom;
+
+   /** Vertex shader state */
+   struct {
+      /**
+       * True if gl_ClipDistance is written to.  Copied into gl_vertex_program
+       * by _mesa_copy_linked_program_data().
+       */
+      GLboolean UsesClipDistance;
+      GLuint ClipDistanceArraySize; /**< Size of the gl_ClipDistance array, or
+                                         0 if not present. */
+   } Vert;
+
+   /**
+    * Compute shader state - copied into gl_compute_program by
+    * _mesa_copy_linked_program_data().
+    */
+   struct {
+      /**
+       * If this shader contains a compute stage, size specified using
+       * local_size_{x,y,z}.  Otherwise undefined.
+       */
+      unsigned LocalSize[3];
+   } Comp;
+
+   /* post-link info: */
+   unsigned NumUserUniformStorage;
+   struct gl_uniform_storage *UniformStorage;
+
+   /**
+    * Mapping from GL uniform locations returned by \c glGetUniformLocation to
+    * UniformStorage entries. Arrays will have multiple contiguous slots
+    * in the UniformRemapTable, all pointing to the same UniformStorage entry.
+    */
+   unsigned NumUniformRemapTable;
+   struct gl_uniform_storage **UniformRemapTable;
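+
+   /* Example of the remap layout (explanatory note, not original text): a
+    * uniform declared as "vec4 u[3]" with location L occupies remap slots
+    * L, L+1 and L+2, and UniformRemapTable[L..L+2] all point at the single
+    * gl_uniform_storage record for "u".
+    */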
+
+   /**
+    * Size of the gl_ClipDistance array that is output from the last pipeline
+    * stage before the fragment shader.
+    */
+   unsigned LastClipDistanceArraySize;
+
+   struct gl_uniform_block *UniformBlocks;
+   unsigned NumUniformBlocks;
+
+   /**
+    * Indices into the _LinkedShaders's UniformBlocks[] array for each stage
+    * they're used in, or -1.
+    *
+    * This is used to maintain the Binding values of the stage's UniformBlocks[]
+    * and to answer the GL_UNIFORM_BLOCK_REFERENCED_BY_*_SHADER queries.
+    */
+   int *UniformBlockStageIndex[MESA_SHADER_STAGES];
+
+   /**
+    * Map of active uniform names to locations
+    *
+    * Maps any active uniform that is not an array element to a location.
+    * Each active uniform, including individual structure members will appear
+    * in this map.  This roughly corresponds to the set of names that would be
+    * enumerated by \c glGetActiveUniform.
+    */
+   struct string_to_uint_map *UniformHash;
+
+   struct gl_active_atomic_buffer *AtomicBuffers;
+   unsigned NumAtomicBuffers;
+
+   GLboolean LinkStatus;   /**< GL_LINK_STATUS */
+   GLboolean Validated;
+   GLboolean _Used;        /**< Ever used for drawing? */
+   GLboolean _Linked;      /**< Ever linked? */
+   GLchar *InfoLog;
+
+   unsigned Version;       /**< GLSL version used for linking */
+   GLboolean IsES;         /**< True if this program uses GLSL ES */
+
+   /**
+    * Per-stage shaders resulting from the first stage of linking.
+    *
+    * Set of linked shaders for this program.  The array is accessed using the
+    * \c MESA_SHADER_* defines.  Entries for non-existent stages will be
+    * \c NULL.
+    */
+   struct gl_shader *_LinkedShaders[MESA_SHADER_STAGES];
+
+   /* True if any of the fragment shaders attached to this program use:
+    * #extension ARB_fragment_coord_conventions: enable
+    */
+   GLboolean ARB_fragment_coord_conventions_enable;
+
+   /**
+    * Deferred task of glLinkProgram.  We should extend the mutex, not only
+    * to protect the deferred task, but to protect the entire
+    * gl_shader_program.
+    *
+    * MUST BE LAST FOR SHADER CACHE TO WORK
+    */
+   mtx_t Mutex;
+   struct _mesa_threadpool_task *Task;
+   void *TaskData;
+};
+
+
+#define GLSL_DUMP      0x1  /**< Dump shaders to stdout */
+#define GLSL_LOG       0x2  /**< Write shaders to files */
+#define GLSL_OPT       0x4  /**< Force optimizations (override pragmas) */
+#define GLSL_NO_OPT    0x8  /**< Force no optimizations (override pragmas) */
+#define GLSL_UNIFORMS 0x10  /**< Print glUniform calls */
+#define GLSL_NOP_VERT 0x20  /**< Force no-op vertex shaders */
+#define GLSL_NOP_FRAG 0x40  /**< Force no-op fragment shaders */
+#define GLSL_USE_PROG 0x80  /**< Log glUseProgram calls */
+#define GLSL_REPORT_ERRORS 0x100  /**< Print compilation errors */
+#define GLSL_DUMP_ON_ERROR 0x200 /**< Dump shaders to stderr on compile error */
+#define GLSL_USE_GLASS     0x400 /**< Use LunarGlass optimizer */
+
+/**
+ * Compiler options for a single GLSL shaders type
+ */
+struct gl_shader_compiler_options
+{
+   /** Driver-selectable options: */
+   GLboolean EmitCondCodes;             /**< Use condition codes? */
+   GLboolean EmitNoLoops;
+   GLboolean EmitNoFunctions;
+   GLboolean EmitNoCont;                  /**< Emit CONT opcode? */
+   GLboolean EmitNoMainReturn;            /**< Emit CONT/RET opcodes? */
+   GLboolean EmitNoNoise;                 /**< Emit NOISE opcodes? */
+   GLboolean EmitNoPow;                   /**< Emit POW opcodes? */
+   GLboolean LowerClipDistance; /**< Lower gl_ClipDistance from float[8] to vec4[2]? */
+
+   /**
+    * \name Forms of indirect addressing the driver cannot do.
+    */
+   /*@{*/
+   GLboolean EmitNoIndirectInput;   /**< No indirect addressing of inputs */
+   GLboolean EmitNoIndirectOutput;  /**< No indirect addressing of outputs */
+   GLboolean EmitNoIndirectTemp;    /**< No indirect addressing of temps */
+   GLboolean EmitNoIndirectUniform; /**< No indirect addressing of constants */
+   /*@}*/
+
+   GLuint MaxIfDepth;               /**< Maximum nested IF blocks */
+   GLuint MaxUnrollIterations;
+
+   /**
+    * Optimize code for array of structures backends.
+    *
+    * This is a proxy for:
+    *   - preferring DP4 instructions (rather than MUL/MAD) for
+    *     matrix * vector operations, such as position transformation.
+    */
+   GLboolean OptimizeForAOS;
+
+   struct gl_sl_pragmas DefaultPragmas; /**< Default #pragma settings */
+};
+
+/**
+ * State which can be shared by multiple contexts:
+ */
+struct gl_shared_state
+{
+   mtx_t Mutex;		   /**< for thread safety */
+   GLint RefCount;			   /**< Reference count */
+
+   /**
+    * \name Vertex/geometry/fragment programs
+    */
+   /*@{*/
+   struct _mesa_HashTable *Programs; /**< All vertex/fragment programs */
+   struct gl_vertex_program *DefaultVertexProgram;
+   struct gl_fragment_program *DefaultFragmentProgram;
+   struct gl_geometry_program *DefaultGeometryProgram;
+   /*@}*/
+
+   /* GL_ATI_fragment_shader */
+   struct _mesa_HashTable *ATIShaders;
+   struct ati_fragment_shader *DefaultFragmentShader;
+
+   /** Table of both gl_shader and gl_shader_program objects */
+   struct _mesa_HashTable *ShaderObjects;
+
+   /**
+    * Some context in this share group was affected by a GPU reset
+    *
+    * On the next call to \c glGetGraphicsResetStatus, contexts that have not
+    * been affected by a GPU reset must also return
+    * \c GL_INNOCENT_CONTEXT_RESET_ARB.
+    *
+    * Once this field becomes true, it is never reset to false.
+    */
+   bool ShareGroupReset;
+};
+
+
+
+/**
+ * Precision info for shader datatypes.  See glGetShaderPrecisionFormat().
+ */
+struct gl_precision
+{
+   GLushort RangeMin;   /**< min value exponent */
+   GLushort RangeMax;   /**< max value exponent */
+   GLushort Precision;  /**< number of mantissa bits */
+};
+
+
+/**
+ * Limits for vertex, geometry and fragment programs/shaders.
+ */
+struct gl_program_constants
+{
+   /* logical limits */
+   GLuint MaxInstructions;
+   GLuint MaxAluInstructions;
+   GLuint MaxTexInstructions;
+   GLuint MaxTexIndirections;
+   GLuint MaxAttribs;
+   GLuint MaxTemps;
+   GLuint MaxAddressRegs;
+   GLuint MaxAddressOffset;  /**< [-MaxAddressOffset, MaxAddressOffset-1] */
+   GLuint MaxParameters;
+   GLuint MaxLocalParams;
+   GLuint MaxEnvParams;
+   /* native/hardware limits */
+   GLuint MaxNativeInstructions;
+   GLuint MaxNativeAluInstructions;
+   GLuint MaxNativeTexInstructions;
+   GLuint MaxNativeTexIndirections;
+   GLuint MaxNativeAttribs;
+   GLuint MaxNativeTemps;
+   GLuint MaxNativeAddressRegs;
+   GLuint MaxNativeParameters;
+   /* For shaders */
+   GLuint MaxUniformComponents;  /**< Usually == MaxParameters * 4 */
+
+   /**
+    * \name Per-stage input / output limits
+    *
+    * Previous to OpenGL 3.2, the intrastage data limits were advertised with
+    * a single value: GL_MAX_VARYING_COMPONENTS (GL_MAX_VARYING_VECTORS in
+    * ES).  This is stored as \c gl_constants::MaxVarying.
+    *
+    * Starting with OpenGL 3.2, the limits are advertised with per-stage
+    * variables.  Each stage has a certain number of outputs that it can feed
+    * to the next stage and a certain number of inputs that it can consume from
+    * the previous stage.
+    *
+    * Vertex shader inputs do not participate in this accounting.
+    * These are tracked exclusively by \c gl_program_constants::MaxAttribs.
+    *
+    * Fragment shader outputs do not participate in this accounting.
+    * These are tracked exclusively by \c gl_constants::MaxDrawBuffers.
+    */
+   /*@{*/
+   GLuint MaxInputComponents;
+   GLuint MaxOutputComponents;
+   /*@}*/
+
+   /* ES 2.0 and GL_ARB_ES2_compatibility */
+   struct gl_precision LowFloat, MediumFloat, HighFloat;
+   struct gl_precision LowInt, MediumInt, HighInt;
+   /* GL_ARB_uniform_buffer_object */
+   GLuint MaxUniformBlocks;
+   GLuint MaxCombinedUniformComponents;
+   GLuint MaxTextureImageUnits;
+
+   /* GL_ARB_shader_atomic_counters */
+   GLuint MaxAtomicBuffers;
+   GLuint MaxAtomicCounters;
+
+   /* GL_ARB_shader_image_load_store */
+   GLuint MaxImageUniforms;
+};
+
+
+/**
+ * Constants which may be overridden by device driver during context creation
+ * but are never changed after that.
+ */
+struct gl_constants
+{
+   GLuint MaxTextureMbytes;      /**< Max memory per image, in MB */
+   GLuint MaxTextureLevels;      /**< Max mipmap levels. */ 
+   GLuint Max3DTextureLevels;    /**< Max mipmap levels for 3D textures */
+   GLuint MaxCubeTextureLevels;  /**< Max mipmap levels for cube textures */
+   GLuint MaxArrayTextureLayers; /**< Max layers in array textures */
+   GLuint MaxTextureRectSize;    /**< Max rectangle texture size, in pixels */
+   GLuint MaxTextureCoordUnits;
+   GLuint MaxCombinedTextureImageUnits;
+   GLuint MaxTextureUnits; /**< = MIN(CoordUnits, FragmentProgram.ImageUnits) */
+   GLfloat MaxTextureMaxAnisotropy;  /**< GL_EXT_texture_filter_anisotropic */
+   GLfloat MaxTextureLodBias;        /**< GL_EXT_texture_lod_bias */
+   GLuint MaxTextureBufferSize;      /**< GL_ARB_texture_buffer_object */
+
+   GLuint TextureBufferOffsetAlignment; /**< GL_ARB_texture_buffer_range */
+
+   GLuint MaxArrayLockSize;
+
+   GLint SubPixelBits;
+
+   GLfloat MinPointSize, MaxPointSize;	     /**< aliased */
+   GLfloat MinPointSizeAA, MaxPointSizeAA;   /**< antialiased */
+   GLfloat PointSizeGranularity;
+   GLfloat MinLineWidth, MaxLineWidth;       /**< aliased */
+   GLfloat MinLineWidthAA, MaxLineWidthAA;   /**< antialiased */
+   GLfloat LineWidthGranularity;
+
+   GLuint MaxClipPlanes;
+   GLuint MaxLights;
+   GLfloat MaxShininess;                     /**< GL_NV_light_max_exponent */
+   GLfloat MaxSpotExponent;                  /**< GL_NV_light_max_exponent */
+
+   GLuint MaxViewportWidth, MaxViewportHeight;
+   GLuint MaxViewports;                      /**< GL_ARB_viewport_array */
+   GLuint ViewportSubpixelBits;              /**< GL_ARB_viewport_array */
+   struct {
+      GLfloat Min;
+      GLfloat Max;
+   } ViewportBounds;                         /**< GL_ARB_viewport_array */
+
+   struct gl_program_constants Program[MESA_SHADER_STAGES];
+   GLuint MaxProgramMatrices;
+   GLuint MaxProgramMatrixStackDepth;
+
+   struct {
+      GLuint SamplesPassed;
+      GLuint TimeElapsed;
+      GLuint Timestamp;
+      GLuint PrimitivesGenerated;
+      GLuint PrimitivesWritten;
+   } QueryCounterBits;
+
+   /** vertex array / buffer object bounds checking */
+   GLboolean CheckArrayBounds;
+
+   GLuint MaxDrawBuffers;    /**< GL_ARB_draw_buffers */
+
+   GLuint MaxColorAttachments;   /**< GL_EXT_framebuffer_object */
+   GLuint MaxRenderbufferSize;   /**< GL_EXT_framebuffer_object */
+   GLuint MaxSamples;            /**< GL_ARB_framebuffer_object */
+
+   /** Number of varying vectors between any two shader stages. */
+   GLuint MaxVarying;
+
+   /** @{
+    * GL_ARB_uniform_buffer_object
+    */
+   GLuint MaxCombinedUniformBlocks;
+   GLuint MaxUniformBufferBindings;
+   GLuint MaxUniformBlockSize;
+   GLuint UniformBufferOffsetAlignment;
+   /** @} */
+
+   /** GL_ARB_geometry_shader4 */
+   GLuint MaxGeometryOutputVertices;
+   GLuint MaxGeometryTotalOutputComponents;
+
+   GLuint GLSLVersion;  /**< GLSL version supported (ex: 120 = 1.20) */
+
+   /**
+    * Changes default GLSL extension behavior from "error" to "warn".  It's out
+    * of spec, but it can make some apps work that otherwise wouldn't.
+    */
+   GLboolean ForceGLSLExtensionsWarn;
+
+   /**
+    * If non-zero, forces GLSL shaders without the #version directive to behave
+    * as if they began with "#version ForceGLSLVersion".
+    */
+   GLuint ForceGLSLVersion;
+
+   /**
+    * LunarGlass optimizer mode:
+    * 0 = never use (force disable)
+    * 1 = use driver whitelist
+    * 2 = always use (force enable)
+    */
+   GLuint GlassMode;
+
+   /**
+    * LunarGlass optimization flags:
+    * This is just one for now, but more should be added.
+    */
+   GLboolean GlassEnableReassociation;
+
+   /**
+    * Does the driver support real 32-bit integers?  (Otherwise, integers are
+    * simulated via floats.)
+    */
+   GLboolean NativeIntegers;
+
+   /**
+    * If the driver supports real 32-bit integers, what integer value should be
+    * used for boolean true in uniform uploads?  (Usually 1 or ~0.)
+    */
+   GLuint UniformBooleanTrue;
+
+   /** Which texture units support GL_ATI_envmap_bumpmap as targets */
+   GLbitfield SupportedBumpUnits;
+
+   /**
+    * Maximum amount of time, measured in nanoseconds, that the server can
+    * wait (GL_ARB_sync).
+    */
+   GLuint64 MaxServerWaitTimeout;
+
+   /** GL_EXT_provoking_vertex */
+   GLboolean QuadsFollowProvokingVertexConvention;
+
+   /** OpenGL version 3.0 */
+   GLbitfield ContextFlags;  /**< Ex: GL_CONTEXT_FLAG_FORWARD_COMPATIBLE_BIT */
+
+   /** OpenGL version 3.2 */
+   GLbitfield ProfileMask;   /**< Mask of CONTEXT_x_PROFILE_BIT */
+
+   /** GL_EXT_transform_feedback */
+   GLuint MaxTransformFeedbackBuffers;
+   GLuint MaxTransformFeedbackSeparateComponents;
+   GLuint MaxTransformFeedbackInterleavedComponents;
+   GLuint MaxVertexStreams;
+
+   /** GL_EXT_gpu_shader4 */
+   GLint MinProgramTexelOffset, MaxProgramTexelOffset;
+
+   /** GL_ARB_texture_gather */
+   GLuint MinProgramTextureGatherOffset;
+   GLuint MaxProgramTextureGatherOffset;
+   GLuint MaxProgramTextureGatherComponents;
+
+   /* GL_ARB_robustness */
+   GLenum ResetStrategy;
+
+   /* GL_ARB_blend_func_extended */
+   GLuint MaxDualSourceDrawBuffers;
+
+   /**
+    * Whether the implementation strips out and ignores texture borders.
+    *
+    * Many GPU hardware implementations don't support rendering with texture
+    * borders and mipmapped textures.  (Note: not static border color, but the
+    * old 1-pixel border around each edge).  Implementations then have to do
+    * slow fallbacks to be correct, or just ignore the border and be fast but
+    * wrong.  Setting the flag strips the border off of TexImage calls,
+    * providing "fast but wrong" at significantly reduced driver complexity.
+    *
+    * Texture borders are deprecated in GL 3.0.
+    **/
+   GLboolean StripTextureBorder;
+
+   /**
+    * For drivers which can do a better job at eliminating unused uniforms
+    * than the GLSL compiler.
+    *
+    * XXX Remove these as soon as a better solution is available.
+    */
+   GLboolean GLSLSkipStrictMaxUniformLimitCheck;
+
+   /**
+    * Force software support for primitive restart in the VBO module.
+    */
+   GLboolean PrimitiveRestartInSoftware;
+
+   /**
+    * Always use the GetTransformFeedbackVertexCount() driver hook, rather
+    * than passing the transform feedback object to the drawing function.
+    */
+   GLboolean AlwaysUseGetTransformFeedbackVertexCount;
+
+   /** GL_ARB_map_buffer_alignment */
+   GLuint MinMapBufferAlignment;
+
+   /**
+    * Disable varying packing.  This is out of spec, but potentially useful
+    * for older platforms that support a limited number of texture
+    * indirections--on these platforms, unpacking the varyings in the
+    * fragment shader increases the number of texture indirections by 1,
+    * which might make some shaders not executable at all.
+    *
+    * Drivers that support transform feedback must set this value to GL_FALSE.
+    */
+   GLboolean DisableVaryingPacking;
+
+   /**
+    * Maximum value supported for an index in DrawElements and friends.
+    *
+    * This must be at least (1ull<<24)-1.  The default value is
+    * (1ull<<32)-1.
+    *
+    * \since ES 3.0 or GL_ARB_ES3_compatibility
+    * \sa _mesa_init_constants
+    */
+   GLuint64 MaxElementIndex;
+
+   /**
+    * Disable interpretation of line continuations (lines ending with a
+    * backslash character ('\') in GLSL source).
+    */
+   GLboolean DisableGLSLLineContinuations;
+
+   /** GL_ARB_texture_multisample */
+   GLint MaxColorTextureSamples;
+   GLint MaxDepthTextureSamples;
+   GLint MaxIntegerSamples;
+
+   /** GL_ARB_shader_atomic_counters */
+   GLuint MaxAtomicBufferBindings;
+   GLuint MaxAtomicBufferSize;
+   GLuint MaxCombinedAtomicBuffers;
+   GLuint MaxCombinedAtomicCounters;
+
+   /** GL_ARB_vertex_attrib_binding */
+   GLint MaxVertexAttribRelativeOffset;
+   GLint MaxVertexAttribBindings;
+
+   /* GL_ARB_shader_image_load_store */
+   GLuint MaxImageUnits;
+   GLuint MaxCombinedImageUnitsAndFragmentOutputs;
+   GLuint MaxImageSamples;
+   GLuint MaxCombinedImageUniforms;
+
+   /** GL_ARB_compute_shader */
+   GLuint MaxComputeWorkGroupCount[3]; /* Array of x, y, z dimensions */
+   GLuint MaxComputeWorkGroupSize[3]; /* Array of x, y, z dimensions */
+   GLuint MaxComputeWorkGroupInvocations;
+
+   /** GL_ARB_gpu_shader5 */
+   GLfloat MinFragmentInterpolationOffset;
+   GLfloat MaxFragmentInterpolationOffset;
+
+   GLboolean FakeSWMSAA;
+
+   /*
+    * Defer certain operations to a thread pool.
+    *
+    * When DeferLinkProgram is set, these functions must be thread-safe:
+    *
+    *   ctx->Driver.NewShader
+    *   ctx->Driver.DeleteShader
+    *   ctx->Driver.LinkShader
+    */
+   GLboolean DeferCompileShader;
+   GLboolean DeferLinkProgram;
+
+   /* Limits both the program and shader cache sizes; 0 disables caching */
+   GLuint MaxShaderCacheSize;
+};
+
+
+/**
+ * Enable flag for each OpenGL extension.  Different device drivers will
+ * enable different extensions at runtime.
+ */
+struct gl_extensions
+{
+   GLboolean dummy;  /* don't remove this! */
+   GLboolean dummy_true;  /* Set true by _mesa_init_extensions(). */
+   GLboolean dummy_false; /* Set false by _mesa_init_extensions(). */
+   GLboolean ANGLE_texture_compression_dxt;
+   GLboolean ARB_ES2_compatibility;
+   GLboolean ARB_ES3_compatibility;
+   GLboolean ARB_arrays_of_arrays;
+   GLboolean ARB_base_instance;
+   GLboolean ARB_blend_func_extended;
+   GLboolean ARB_buffer_storage;
+   GLboolean ARB_color_buffer_float;
+   GLboolean ARB_compute_shader;
+   GLboolean ARB_conservative_depth;
+   GLboolean ARB_depth_buffer_float;
+   GLboolean ARB_depth_clamp;
+   GLboolean ARB_depth_texture;
+   GLboolean ARB_draw_buffers_blend;
+   GLboolean ARB_draw_elements_base_vertex;
+   GLboolean ARB_draw_indirect;
+   GLboolean ARB_draw_instanced;
+   GLboolean ARB_fragment_coord_conventions;
+   GLboolean ARB_fragment_program;
+   GLboolean ARB_fragment_program_shadow;
+   GLboolean ARB_fragment_shader;
+   GLboolean ARB_framebuffer_object;
+   GLboolean ARB_explicit_attrib_location;
+   GLboolean ARB_geometry_shader4;
+   GLboolean ARB_gpu_shader5;
+   GLboolean ARB_half_float_vertex;
+   GLboolean ARB_instanced_arrays;
+   GLboolean ARB_internalformat_query;
+   GLboolean ARB_map_buffer_range;
+   GLboolean ARB_occlusion_query;
+   GLboolean ARB_occlusion_query2;
+   GLboolean ARB_point_sprite;
+   GLboolean ARB_sample_shading;
+   GLboolean ARB_seamless_cube_map;
+   GLboolean ARB_shader_atomic_counters;
+   GLboolean ARB_shader_bit_encoding;
+   GLboolean ARB_shader_image_load_store;
+   GLboolean ARB_shader_stencil_export;
+   GLboolean ARB_shader_texture_lod;
+   GLboolean ARB_shading_language_packing;
+   GLboolean ARB_shading_language_420pack;
+   GLboolean ARB_shadow;
+   GLboolean ARB_stencil_texturing;
+   GLboolean ARB_sync;
+   GLboolean ARB_texture_border_clamp;
+   GLboolean ARB_texture_buffer_object;
+   GLboolean ARB_texture_buffer_object_rgb32;
+   GLboolean ARB_texture_buffer_range;
+   GLboolean ARB_texture_compression_rgtc;
+   GLboolean ARB_texture_cube_map;
+   GLboolean ARB_texture_cube_map_array;
+   GLboolean ARB_texture_env_combine;
+   GLboolean ARB_texture_env_crossbar;
+   GLboolean ARB_texture_env_dot3;
+   GLboolean ARB_texture_float;
+   GLboolean ARB_texture_gather;
+   GLboolean ARB_texture_mirror_clamp_to_edge;
+   GLboolean ARB_texture_multisample;
+   GLboolean ARB_texture_non_power_of_two;
+   GLboolean ARB_texture_stencil8;
+   GLboolean ARB_texture_query_levels;
+   GLboolean ARB_texture_query_lod;
+   GLboolean ARB_texture_rg;
+   GLboolean ARB_texture_rgb10_a2ui;
+   GLboolean ARB_texture_view;
+   GLboolean ARB_timer_query;
+   GLboolean ARB_transform_feedback2;
+   GLboolean ARB_transform_feedback3;
+   GLboolean ARB_transform_feedback_instanced;
+   GLboolean ARB_uniform_buffer_object;
+   GLboolean ARB_vertex_program;
+   GLboolean ARB_vertex_shader;
+   GLboolean ARB_vertex_type_10f_11f_11f_rev;
+   GLboolean ARB_vertex_type_2_10_10_10_rev;
+   GLboolean ARB_viewport_array;
+   GLboolean EXT_blend_color;
+   GLboolean EXT_blend_equation_separate;
+   GLboolean EXT_blend_func_separate;
+   GLboolean EXT_blend_minmax;
+   GLboolean EXT_depth_bounds_test;
+   GLboolean EXT_draw_buffers2;
+   GLboolean EXT_framebuffer_multisample;
+   GLboolean EXT_framebuffer_multisample_blit_scaled;
+   GLboolean EXT_framebuffer_sRGB;
+   GLboolean EXT_gpu_program_parameters;
+   GLboolean EXT_gpu_shader4;
+   GLboolean EXT_packed_float;
+   GLboolean EXT_pixel_buffer_object;
+   GLboolean EXT_point_parameters;
+   GLboolean EXT_provoking_vertex;
+   GLboolean EXT_shader_integer_mix;
+   GLboolean EXT_stencil_two_side;
+   GLboolean EXT_texture3D;
+   GLboolean EXT_texture_array;
+   GLboolean EXT_texture_compression_latc;
+   GLboolean EXT_texture_compression_s3tc;
+   GLboolean EXT_texture_env_dot3;
+   GLboolean EXT_texture_filter_anisotropic;
+   GLboolean EXT_texture_integer;
+   GLboolean EXT_texture_mirror_clamp;
+   GLboolean EXT_texture_shared_exponent;
+   GLboolean EXT_texture_snorm;
+   GLboolean EXT_texture_sRGB;
+   GLboolean EXT_texture_sRGB_decode;
+   GLboolean EXT_texture_swizzle;
+   GLboolean EXT_transform_feedback;
+   GLboolean EXT_timer_query;
+   GLboolean EXT_vertex_array_bgra;
+   GLboolean OES_standard_derivatives;
+   /* vendor extensions */
+   GLboolean AMD_performance_monitor;
+   GLboolean AMD_seamless_cubemap_per_texture;
+   GLboolean AMD_vertex_shader_layer;
+   GLboolean APPLE_object_purgeable;
+   GLboolean ATI_envmap_bumpmap;
+   GLboolean ATI_texture_compression_3dc;
+   GLboolean ATI_texture_mirror_once;
+   GLboolean ATI_texture_env_combine3;
+   GLboolean ATI_fragment_shader;
+   GLboolean ATI_separate_stencil;
+   GLboolean INTEL_performance_query;
+   GLboolean MESA_pack_invert;
+   GLboolean MESA_ycbcr_texture;
+   GLboolean NV_conditional_render;
+   GLboolean NV_fog_distance;
+   GLboolean NV_fragment_program_option;
+   GLboolean NV_point_sprite;
+   GLboolean NV_primitive_restart;
+   GLboolean NV_texture_barrier;
+   GLboolean NV_texture_env_combine4;
+   GLboolean NV_texture_rectangle;
+   GLboolean NV_vdpau_interop;
+   GLboolean TDFX_texture_compression_FXT1;
+   GLboolean OES_EGL_image;
+   GLboolean OES_draw_texture;
+   GLboolean OES_depth_texture_cube_map;
+   GLboolean OES_EGL_image_external;
+   GLboolean OES_compressed_ETC1_RGB8_texture;
+   GLboolean extension_sentinel;
+   /** The extension string */
+   const GLubyte *String;
+   /** Number of supported extensions */
+   GLuint Count;
+};
+
+
+/**
+ * A stack of matrices (projection, modelview, color, texture, etc).
+ */
+struct gl_matrix_stack
+{
+   GLmatrix *Top;      /**< points into Stack */
+   GLmatrix *Stack;    /**< array [MaxDepth] of GLmatrix */
+   GLuint Depth;       /**< 0 <= Depth < MaxDepth */
+   GLuint MaxDepth;    /**< size of Stack[] array */
+   GLuint DirtyFlag;   /**< _NEW_MODELVIEW or _NEW_PROJECTION, for example */
+};
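+/*
+ * Example (sketch, not an actual Mesa entry point): a push onto such a
+ * stack copies the current top matrix one slot up and reports the stack's
+ * dirty flag for the caller to accumulate.  _math_matrix_copy() is assumed
+ * from Mesa's math module; the bounds check mirrors the fields above.
+ *
+ * \code
+ *    static GLbitfield
+ *    matrix_stack_push(struct gl_matrix_stack *s)
+ *    {
+ *       if (s->Depth + 1 >= s->MaxDepth)
+ *          return 0;                        // overflow: no state change
+ *       _math_matrix_copy(&s->Stack[s->Depth + 1], s->Top);
+ *       s->Depth++;
+ *       s->Top = &s->Stack[s->Depth];
+ *       return s->DirtyFlag;                // e.g. _NEW_MODELVIEW
+ *    }
+ * \endcode
+ */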
+
+
+/**
+ * \name Bits for image transfer operations
+ * \sa gl_context::ImageTransferState.
+ */
+/*@{*/
+#define IMAGE_SCALE_BIAS_BIT                      0x1
+#define IMAGE_SHIFT_OFFSET_BIT                    0x2
+#define IMAGE_MAP_COLOR_BIT                       0x4
+#define IMAGE_CLAMP_BIT                           0x800
+
+
+/** Pixel Transfer ops */
+#define IMAGE_BITS (IMAGE_SCALE_BIAS_BIT |			\
+		    IMAGE_SHIFT_OFFSET_BIT |			\
+		    IMAGE_MAP_COLOR_BIT)
+/*@}*/
+
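+/*
+ * Example (sketch): the transfer bits are tested as a mask.  'transferOps'
+ * stands in for a hypothetical bitfield of currently active operations:
+ *
+ * \code
+ *    if (transferOps & IMAGE_SCALE_BIAS_BIT) {
+ *       // apply pixel scale and bias
+ *    }
+ *    if (transferOps & IMAGE_BITS) {
+ *       // at least one transfer op other than clamping is enabled
+ *    }
+ * \endcode
+ */
+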
+/**
+ * \name Bits to indicate what state has changed.  
+ */
+/*@{*/
+#define _NEW_MODELVIEW         (1 << 0)   /**< gl_context::ModelView */
+#define _NEW_PROJECTION        (1 << 1)   /**< gl_context::Projection */
+#define _NEW_TEXTURE_MATRIX    (1 << 2)   /**< gl_context::TextureMatrix */
+#define _NEW_COLOR             (1 << 3)   /**< gl_context::Color */
+#define _NEW_DEPTH             (1 << 4)   /**< gl_context::Depth */
+#define _NEW_EVAL              (1 << 5)   /**< gl_context::Eval, EvalMap */
+#define _NEW_FOG               (1 << 6)   /**< gl_context::Fog */
+#define _NEW_HINT              (1 << 7)   /**< gl_context::Hint */
+#define _NEW_LIGHT             (1 << 8)   /**< gl_context::Light */
+#define _NEW_LINE              (1 << 9)   /**< gl_context::Line */
+#define _NEW_PIXEL             (1 << 10)  /**< gl_context::Pixel */
+#define _NEW_POINT             (1 << 11)  /**< gl_context::Point */
+#define _NEW_POLYGON           (1 << 12)  /**< gl_context::Polygon */
+#define _NEW_POLYGONSTIPPLE    (1 << 13)  /**< gl_context::PolygonStipple */
+#define _NEW_SCISSOR           (1 << 14)  /**< gl_context::Scissor */
+#define _NEW_STENCIL           (1 << 15)  /**< gl_context::Stencil */
+#define _NEW_TEXTURE           (1 << 16)  /**< gl_context::Texture */
+#define _NEW_TRANSFORM         (1 << 17)  /**< gl_context::Transform */
+#define _NEW_VIEWPORT          (1 << 18)  /**< gl_context::Viewport */
+/* gap, re-use for core Mesa state only; use ctx->DriverFlags otherwise */
+#define _NEW_ARRAY             (1 << 20)  /**< gl_context::Array */
+#define _NEW_RENDERMODE        (1 << 21)  /**< gl_context::RenderMode, etc */
+#define _NEW_BUFFERS           (1 << 22)  /**< gl_context::Visual, DrawBuffer, */
+#define _NEW_CURRENT_ATTRIB    (1 << 23)  /**< gl_context::Current */
+#define _NEW_MULTISAMPLE       (1 << 24)  /**< gl_context::Multisample */
+#define _NEW_TRACK_MATRIX      (1 << 25)  /**< gl_context::VertexProgram */
+#define _NEW_PROGRAM           (1 << 26)  /**< New program/shader state */
+#define _NEW_PROGRAM_CONSTANTS (1 << 27)
+#define _NEW_BUFFER_OBJECT     (1 << 28)
+#define _NEW_FRAG_CLAMP        (1 << 29)
+/* gap, re-use for core Mesa state only; use ctx->DriverFlags otherwise */
+#define _NEW_VARYING_VP_INPUTS (1 << 31) /**< gl_context::varying_vp_inputs */
+#define _NEW_ALL ~0
+/*@}*/
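+
+/*
+ * Example (sketch): these flags form a dirty bitmask.  A state mutator ORs
+ * in the relevant flag and later validation tests it; 'newState' is a
+ * hypothetical accumulator of pending changes:
+ *
+ * \code
+ *    newState |= _NEW_VIEWPORT;      // e.g. after glViewport()
+ *
+ *    if (newState & (_NEW_VIEWPORT | _NEW_BUFFERS)) {
+ *       // revalidate framebuffer-dependent derived state
+ *    }
+ * \endcode
+ */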
+
+
+/**
+ * Composite state flags
+ */
+/*@{*/
+#define _MESA_NEW_NEED_EYE_COORDS         (_NEW_LIGHT |		\
+                                           _NEW_TEXTURE |	\
+                                           _NEW_POINT |		\
+                                           _NEW_PROGRAM |	\
+                                           _NEW_MODELVIEW)
+
+#define _MESA_NEW_SEPARATE_SPECULAR        (_NEW_LIGHT | \
+                                            _NEW_FOG | \
+                                            _NEW_PROGRAM)
+
+
+/*@}*/
+
+
+
+
+/* This has to be included here. */
+#include "dd.h"
+
+
+/**
+ * Display list flags.
+ * Strictly this is a tnl-private concept, but it doesn't seem
+ * worthwhile adding a tnl private structure just to hold this one bit
+ * of information:
+ */
+#define DLIST_DANGLING_REFS     0x1 
+
+/** @{
+ *
+ * These are a mapping of the GL_ARB_debug_output/GL_KHR_debug enums
+ * to small enums suitable for use as an array index.
+ */
+
+enum mesa_debug_source {
+   MESA_DEBUG_SOURCE_API,
+   MESA_DEBUG_SOURCE_WINDOW_SYSTEM,
+   MESA_DEBUG_SOURCE_SHADER_COMPILER,
+   MESA_DEBUG_SOURCE_THIRD_PARTY,
+   MESA_DEBUG_SOURCE_APPLICATION,
+   MESA_DEBUG_SOURCE_OTHER,
+   MESA_DEBUG_SOURCE_COUNT
+};
+
+enum mesa_debug_type {
+   MESA_DEBUG_TYPE_ERROR,
+   MESA_DEBUG_TYPE_DEPRECATED,
+   MESA_DEBUG_TYPE_UNDEFINED,
+   MESA_DEBUG_TYPE_PORTABILITY,
+   MESA_DEBUG_TYPE_PERFORMANCE,
+   MESA_DEBUG_TYPE_OTHER,
+   MESA_DEBUG_TYPE_MARKER,
+   MESA_DEBUG_TYPE_PUSH_GROUP,
+   MESA_DEBUG_TYPE_POP_GROUP,
+   MESA_DEBUG_TYPE_COUNT
+};
+
+enum mesa_debug_severity {
+   MESA_DEBUG_SEVERITY_LOW,
+   MESA_DEBUG_SEVERITY_MEDIUM,
+   MESA_DEBUG_SEVERITY_HIGH,
+   MESA_DEBUG_SEVERITY_NOTIFICATION,
+   MESA_DEBUG_SEVERITY_COUNT
+};
+
+/** @} */
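+
+/*
+ * Example (sketch): because these enums are small and dense, they can index
+ * per-category tables directly; 'counts' is a hypothetical tally:
+ *
+ * \code
+ *    GLuint counts[MESA_DEBUG_SOURCE_COUNT][MESA_DEBUG_TYPE_COUNT];
+ *    counts[MESA_DEBUG_SOURCE_API][MESA_DEBUG_TYPE_ERROR]++;
+ * \endcode
+ */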
+
+/**
+ * Enum for the OpenGL APIs we know about and may support.
+ *
+ * NOTE: This must match the api_enum table in
+ * src/mesa/main/get_hash_generator.py
+ */
+typedef enum
+{
+   API_OPENGL_COMPAT,      /* legacy / compatibility contexts */
+   API_OPENGLES,
+   API_OPENGLES2,
+   API_OPENGL_CORE,
+   API_VK,
+   API_OPENGL_LAST = API_OPENGL_CORE
+} gl_api;
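+
+/*
+ * Example (sketch): per-API behavior keys off gl_context::API.  Note that
+ * API_OPENGL_LAST equals API_OPENGL_CORE, so API_VK falls outside the
+ * OpenGL range by construction:
+ *
+ * \code
+ *    if (ctx->API == API_OPENGL_CORE || ctx->API == API_OPENGLES2) {
+ *       // no fixed-function paths in these APIs
+ *    }
+ * \endcode
+ */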
+
+
+/**
+ * Mesa rendering context.
+ *
+ * This is the central context data structure for Mesa.  Almost all
+ * OpenGL state is contained in this structure.
+ * Think of this as a base class from which device drivers will derive
+ * sub classes.
+ */
+struct gl_context
+{
+   /** State possibly shared with other contexts in the address space */
+   struct gl_shared_state *Shared;
+
+   /** \name API function pointer tables */
+   /*@{*/
+   gl_api API;
+
+   /**
+    * Device driver function pointer table
+    */
+   struct dd_function_table Driver;  // LunarG: MARKED_FOR_DEATH
+
+   /** Core/Driver constants */
+   struct gl_constants Const;
+
+   /** Extension information */
+   struct gl_extensions Extensions;
+
+   /** GL version integer, for example 31 for GL 3.1, or 20 for GLES 2.0. */
+   GLuint Version;
+   char *VersionString;
+
+   struct gl_multisample_attrib Multisample;
+
+   struct gl_program_state Program;  /**< general program state */
+   struct gl_vertex_program_state VertexProgram;
+   struct gl_fragment_program_state FragmentProgram;
+   struct gl_geometry_program_state GeometryProgram;
+   struct gl_ati_fragment_shader_state ATIFragmentShader;
+
+   struct gl_shader_compiler_options ShaderCompilerOptions[MESA_SHADER_STAGES];
+
+   GLenum ErrorValue;        /**< Last error code */
+
+   /**
+    * Recognize and silence repeated error debug messages in buggy apps.
+    */
+   const char *ErrorDebugFmtString;
+   GLuint ErrorDebugCount;
+
+   /* GL_ARB_debug_output/GL_KHR_debug */
+    // LunarG: MARKED_FOR_DEATH
+   mtx_t DebugMutex;
+   struct gl_debug_state *Debug;
+
+
+   struct gl_list_extensions *ListExt; /**< driver dlist extensions */
+
+   GLbitfield GlslFlags;                    /**< Mask of GLSL_x flags */
+
+   /* A thread pool for threaded shader compilation */
+   struct _mesa_threadpool *ThreadPool;
+};
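+
+/*
+ * Example (sketch): the "base class" pattern described above.  A driver
+ * embeds gl_context as the first member of its own context type so the two
+ * pointers convert freely; 'my_context' is hypothetical:
+ *
+ * \code
+ *    struct my_context {
+ *       struct gl_context Base;   // must be the first member
+ *       int private_state;
+ *    };
+ *
+ *    static inline struct my_context *
+ *    my_context(struct gl_context *ctx)
+ *    {
+ *       return (struct my_context *) ctx;
+ *    }
+ * \endcode
+ */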
+
+
+// LunarG - update this when merging compiler and driver debug flags
+#ifdef DEBUG
+//extern int MESA_VERBOSE;
+//extern int MESA_DEBUG_FLAGS;
+//# define MESA_FUNCTION __FUNCTION__
+# define MESA_VERBOSE 0
+# define MESA_DEBUG_FLAGS 0
+# define MESA_FUNCTION "a function"
+#else
+# define MESA_VERBOSE 0
+# define MESA_DEBUG_FLAGS 0
+# define MESA_FUNCTION "a function"
+# ifndef NDEBUG
+#  define NDEBUG
+# endif
+#endif
+
+
+/** The MESA_VERBOSE var is a bitmask of these flags */
+enum _verbose
+{
+   VERBOSE_VARRAY		= 0x0001,
+   VERBOSE_TEXTURE		= 0x0002,
+   VERBOSE_MATERIAL		= 0x0004,
+   VERBOSE_PIPELINE		= 0x0008,
+   VERBOSE_DRIVER		= 0x0010,
+   VERBOSE_STATE		= 0x0020,
+   VERBOSE_API			= 0x0040,
+   VERBOSE_DISPLAY_LIST		= 0x0100,
+   VERBOSE_LIGHTING		= 0x0200,
+   VERBOSE_PRIMS		= 0x0400,
+   VERBOSE_VERTS		= 0x0800,
+   VERBOSE_DISASSEM		= 0x1000,
+   VERBOSE_DRAW                 = 0x2000,
+   VERBOSE_SWAPBUFFERS          = 0x4000
+};
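+
+/*
+ * Example (sketch): verbose output is gated on a bitmask test.  With the
+ * defines above MESA_VERBOSE is the constant 0, so the branch disappears at
+ * compile time; _mesa_debug() is assumed from imports.h:
+ *
+ * \code
+ *    if (MESA_VERBOSE & VERBOSE_TEXTURE)
+ *       _mesa_debug(ctx, "texture state changed\n");
+ * \endcode
+ */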
+
+
+/** The MESA_DEBUG_FLAGS var is a bitmask of these flags */
+enum _debug
+{
+   DEBUG_SILENT                 = (1 << 0),
+   DEBUG_ALWAYS_FLUSH		= (1 << 1),
+   DEBUG_INCOMPLETE_TEXTURE     = (1 << 2),
+   DEBUG_INCOMPLETE_FBO         = (1 << 3)
+};
+
+
+#define DRI_CONF_GLASS_MODE_NEVER 0
+#define DRI_CONF_GLASS_MODE_WHITELIST 1
+#define DRI_CONF_GLASS_MODE_ALWAYS 2
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* MTYPES_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/shaderobj.h b/icd/intel/compiler/mesa-utils/src/mesa/main/shaderobj.h
new file mode 100644
index 0000000..c2f8a0d
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/shaderobj.h
@@ -0,0 +1,174 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2004-2007  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef SHADEROBJ_H
+#define SHADEROBJ_H
+
+
+#include "main/compiler.h"
+#include "main/glheader.h"
+#include "main/mtypes.h"
+#include "program/ir_to_mesa.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/**
+ * Internal functions
+ */
+
+extern void
+_mesa_init_shader_state(struct gl_context * ctx);
+
+extern void
+_mesa_free_shader_state(struct gl_context *ctx);
+
+
+extern void
+_mesa_reference_shader(struct gl_context *ctx, struct gl_shader **ptr,
+                       struct gl_shader *sh);
+
+extern void
+_mesa_wait_shaders(struct gl_context *ctx,
+                   struct gl_shader **shaders,
+                   int num_shaders);
+
+extern struct gl_shader *
+_mesa_lookup_shader_no_wait(struct gl_context *ctx, GLuint name);
+
+extern struct gl_shader *
+_mesa_lookup_shader_err_no_wait(struct gl_context *ctx, GLuint name,
+                                const char *caller);
+
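+/**
+ * Look up a shader by name, blocking until any deferred (threaded) compile
+ * of it has completed.  The _no_wait variants above return immediately;
+ * these wrappers add the synchronization (see
+ * gl_constants::DeferCompileShader and gl_context::ThreadPool).
+ */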
+static inline struct gl_shader *
+_mesa_lookup_shader(struct gl_context *ctx, GLuint name)
+{
+   struct gl_shader *sh = _mesa_lookup_shader_no_wait(ctx, name);
+   if (sh)
+      _mesa_wait_shaders(ctx, &sh, 1);
+   return sh;
+}
+
+static inline struct gl_shader *
+_mesa_lookup_shader_err(struct gl_context *ctx, GLuint name, const char *caller)
+{
+   struct gl_shader *sh = _mesa_lookup_shader_err_no_wait(ctx, name, caller);
+   if (sh)
+      _mesa_wait_shaders(ctx, &sh, 1);
+   return sh;
+}
+
+extern void
+_mesa_reference_shader_program(struct gl_context *ctx,
+                               struct gl_shader_program **ptr,
+                               struct gl_shader_program *shProg);
+extern void
+_mesa_init_shader(struct gl_context *ctx, struct gl_shader *shader);
+
+extern struct gl_shader *
+_mesa_new_shader(struct gl_context *ctx, GLuint name, GLenum type);
+
+extern void
+_mesa_init_shader_program(struct gl_context *ctx, struct gl_shader_program *prog);
+
+extern void
+_mesa_wait_shader_program(struct gl_context *ctx,
+                          struct gl_shader_program *shProg);
+
+extern struct gl_shader_program *
+_mesa_lookup_shader_program_no_wait(struct gl_context *ctx, GLuint name);
+
+extern struct gl_shader_program *
+_mesa_lookup_shader_program_err_no_wait(struct gl_context *ctx, GLuint name,
+                                        const char *caller);
+
+static inline struct gl_shader_program *
+_mesa_lookup_shader_program(struct gl_context *ctx, GLuint name)
+{
+   struct gl_shader_program *shProg =
+      _mesa_lookup_shader_program_no_wait(ctx, name);
+   if (shProg)
+      _mesa_wait_shader_program(ctx, shProg);
+   return shProg;
+}
+
+static inline struct gl_shader_program *
+_mesa_lookup_shader_program_err(struct gl_context *ctx, GLuint name,
+                                const char *caller)
+{
+   struct gl_shader_program *shProg =
+      _mesa_lookup_shader_program_err_no_wait(ctx, name, caller);
+   if (shProg)
+      _mesa_wait_shader_program(ctx, shProg);
+   return shProg;
+}
+
+extern void
+_mesa_clear_shader_program_data(struct gl_context *ctx,
+                                struct gl_shader_program *shProg);
+
+extern void
+_mesa_free_shader_program_data(struct gl_context *ctx,
+                               struct gl_shader_program *shProg);
+
+
+
+extern void
+_mesa_init_shader_object_functions(struct dd_function_table *driver);
+
+extern void
+_mesa_init_shader_state(struct gl_context *ctx);
+
+extern void
+_mesa_free_shader_state(struct gl_context *ctx);
+
+
+static inline gl_shader_stage
+_mesa_shader_enum_to_shader_stage(GLenum v)
+{
+   switch (v) {
+   case GL_VERTEX_SHADER:
+      return MESA_SHADER_VERTEX;
+   case GL_FRAGMENT_SHADER:
+      return MESA_SHADER_FRAGMENT;
+   case GL_GEOMETRY_SHADER:
+      return MESA_SHADER_GEOMETRY;
+   case GL_COMPUTE_SHADER:
+      return MESA_SHADER_COMPUTE;
+   default:
+      ASSERT(0 && "bad value in _mesa_shader_enum_to_shader_stage()");
+      return MESA_SHADER_VERTEX;
+   }
+}
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* SHADEROBJ_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/simple_list.h b/icd/intel/compiler/mesa-utils/src/mesa/main/simple_list.h
new file mode 100644
index 0000000..903432d
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/simple_list.h
@@ -0,0 +1,210 @@
+/**
+ * \file simple_list.h
+ * Simple macros for type-safe, intrusive lists.
+ *
+ *  Intended to work with a list sentinel which is created as an empty
+ *  list.  Insert & delete are O(1).
+ *
+ * \author
+ *  (C) 1997, Keith Whitwell
+ */
+
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2001  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef _SIMPLE_LIST_H
+#define _SIMPLE_LIST_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct simple_node {
+   struct simple_node *next;
+   struct simple_node *prev;
+};
+
+/**
+ * Remove an element from list.
+ *
+ * \param elem element to remove.
+ */
+#define remove_from_list(elem)			\
+do {						\
+   (elem)->next->prev = (elem)->prev;		\
+   (elem)->prev->next = (elem)->next;		\
+} while (0)
+
+/**
+ * Insert an element to the list head.
+ *
+ * \param list list.
+ * \param elem element to insert.
+ */
+#define insert_at_head(list, elem)		\
+do {						\
+   (elem)->prev = list;				\
+   (elem)->next = (list)->next;			\
+   (list)->next->prev = elem;			\
+   (list)->next = elem;				\
+} while(0)
+
+/**
+ * Insert an element to the list tail.
+ *
+ * \param list list.
+ * \param elem element to insert.
+ */
+#define insert_at_tail(list, elem)		\
+do {						\
+   (elem)->next = list;				\
+   (elem)->prev = (list)->prev;			\
+   (list)->prev->next = elem;			\
+   (list)->prev = elem;				\
+} while(0)
+
+/**
+ * Move an element to the list head.
+ *
+ * \param list list.
+ * \param elem element to move.
+ */
+#define move_to_head(list, elem)		\
+do {						\
+   remove_from_list(elem);			\
+   insert_at_head(list, elem);			\
+} while (0)
+
+/**
+ * Move an element to the list tail.
+ *
+ * \param list list.
+ * \param elem element to move.
+ */
+#define move_to_tail(list, elem)		\
+do {						\
+   remove_from_list(elem);			\
+   insert_at_tail(list, elem);			\
+} while (0)
+
+/**
+ * Initialize a list (sentinel node) to be empty.
+ *
+ * \param sentinel list (sentinel element).
+ */
+#define make_empty_list(sentinel)		\
+do {						\
+   (sentinel)->next = sentinel;			\
+   (sentinel)->prev = sentinel;			\
+} while (0)
+
+/**
+ * Get list first element.
+ *
+ * \param list list.
+ *
+ * \return pointer to first element.
+ */
+#define first_elem(list)       ((list)->next)
+
+/**
+ * Get list last element.
+ *
+ * \param list list.
+ *
+ * \return pointer to last element.
+ */
+#define last_elem(list)        ((list)->prev)
+
+/**
+ * Get next element.
+ *
+ * \param elem element.
+ *
+ * \return pointer to next element.
+ */
+#define next_elem(elem)        ((elem)->next)
+
+/**
+ * Get previous element.
+ *
+ * \param elem element.
+ *
+ * \return pointer to previous element.
+ */
+#define prev_elem(elem)        ((elem)->prev)
+
+/**
+ * Test whether element is at end of the list.
+ * 
+ * \param list list.
+ * \param elem element.
+ * 
+ * \return non-zero if element is at end of list, or zero otherwise.
+ */
+#define at_end(list, elem)     ((elem) == (list))
+
+/**
+ * Test if a list is empty.
+ * 
+ * \param list list.
+ * 
+ * \return non-zero if list empty, or zero otherwise.
+ */
+#define is_empty_list(list)    ((list)->next == (list))
+
+/**
+ * Walk through the elements of a list.
+ *
+ * \param ptr pointer to the current element.
+ * \param list list.
+ *
+ * \note It should be followed by a { } block or a single statement, as in a \c
+ * for loop.
+ */
+#define foreach(ptr, list)     \
+        for( ptr=(list)->next ;  ptr!=list ;  ptr=(ptr)->next )
+
+/**
+ * Walk through the elements of a list.
+ *
+ * Same as #foreach but lets you unlink the current value during a list
+ * traversal.  Useful for freeing a list, element by element.
+ * 
+ * \param ptr pointer to the current element.
+ * \param t temporary pointer.
+ * \param list list.
+ *
+ * \note It should be followed by a { } block or a single statement, as in a \c
+ * for loop.
+ */
+#define foreach_s(ptr, t, list)   \
+        for(ptr=(list)->next,t=(ptr)->next; list != ptr; ptr=t, t=(t)->next)
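+
+/*
+ * Example (sketch): freeing a whole list with foreach_s, which tolerates
+ * unlinking the element under iteration; free_node() is hypothetical:
+ *
+ * \code
+ *    struct simple_node list, *node, *temp;
+ *    make_empty_list(&list);
+ *    ...
+ *    foreach_s(node, temp, &list) {
+ *       remove_from_list(node);
+ *       free_node(node);
+ *    }
+ * \endcode
+ */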
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/uniforms.c b/icd/intel/compiler/mesa-utils/src/mesa/main/uniforms.c
new file mode 100644
index 0000000..fea05de
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/uniforms.c
@@ -0,0 +1,86 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2004-2008  Brian Paul   All Rights Reserved.
+ * Copyright (C) 2009-2010  VMware, Inc.  All Rights Reserved.
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file uniforms.c
+ * Functions related to GLSL uniform variables.
+ * \author Brian Paul
+ */
+
+/**
+ * XXX things to do:
+ * 1. Check that the right error code is generated for all _mesa_error() calls.
+ * 2. Insert FLUSH_VERTICES calls in various places
+ */
+
+#include "main/glheader.h"
+#include "main/context.h"
+#include "main/shaderobj.h"
+#include "main/uniforms.h"
+#include "main/enums.h"
+#include "ir_uniform.h"
+#include "glsl_types.h"
+#include "program/program.h"
+
+/**
+ * Update the vertex/fragment program's TexturesUsed array.
+ *
+ * This needs to be called after glUniform(set sampler var) is called.
+ * A call to glUniform(samplerVar, value) causes a sampler to point to a
+ * particular texture unit.  We know the sampler's texture target
+ * (1D/2D/3D/etc) from compile time but the sampler's texture unit is
+ * set by glUniform() calls.
+ *
+ * So, scan the program->SamplerUnits[] and program->SamplerTargets[]
+ * information to update the prog->TexturesUsed[] values.
+ * Each value of TexturesUsed[unit] is a bitmask with the (1 << TEXTURE_x_INDEX)
+ * bit set for each target (TEXTURE_1D_INDEX, TEXTURE_2D_INDEX, etc.) in use on
+ * that unit.
+ * We'll use that info for state validation before rendering.
+ */
+void
+_mesa_update_shader_textures_used(struct gl_shader_program *shProg,
+				  struct gl_program *prog)
+{
+   GLuint s;
+   struct gl_shader *shader =
+      shProg->_LinkedShaders[_mesa_program_enum_to_shader_stage(prog->Target)];
+
+   assert(shader);
+
+   memcpy(prog->SamplerUnits, shader->SamplerUnits, sizeof(prog->SamplerUnits));
+   memset(prog->TexturesUsed, 0, sizeof(prog->TexturesUsed));
+
+   for (s = 0; s < MAX_SAMPLERS; s++) {
+      if (prog->SamplersUsed & (1 << s)) {
+         // LunarG - mask out the set value, which resides in upper 16-bits
+         GLuint unit = shader->SamplerUnits[s] & 0xFFFF;
+         GLuint tgt = shader->SamplerTargets[s];
+         assert(unit < Elements(prog->TexturesUsed));
+         assert(tgt < NUM_TEXTURE_TARGETS);
+         prog->TexturesUsed[unit] |= (1 << tgt);
+      }
+   }
+}
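+
+/*
+ * Example (sketch): after the update above, state validation can test which
+ * targets a unit uses via the bitmask:
+ *
+ * \code
+ *    if (prog->TexturesUsed[unit] & (1 << TEXTURE_2D_INDEX)) {
+ *       // the unit must have a complete 2D texture bound
+ *    }
+ * \endcode
+ */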
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/uniforms.h b/icd/intel/compiler/mesa-utils/src/mesa/main/uniforms.h
new file mode 100644
index 0000000..0e93e15
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/uniforms.h
@@ -0,0 +1,392 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2010  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef UNIFORMS_H
+#define UNIFORMS_H
+
+#include "glheader.h"
+#include "program/prog_parameter.h"
+#include "../glsl/glsl_types.h"
+#include "../glsl/ir_uniform.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+struct gl_program;
+struct _glapi_table;
+
+void GLAPIENTRY
+_mesa_Uniform1f(GLint, GLfloat);
+void GLAPIENTRY
+_mesa_Uniform2f(GLint, GLfloat, GLfloat);
+void GLAPIENTRY
+_mesa_Uniform3f(GLint, GLfloat, GLfloat, GLfloat);
+void GLAPIENTRY
+_mesa_Uniform4f(GLint, GLfloat, GLfloat, GLfloat, GLfloat);
+void GLAPIENTRY
+_mesa_Uniform1i(GLint, GLint);
+void GLAPIENTRY
+_mesa_Uniform2i(GLint, GLint, GLint);
+void GLAPIENTRY
+_mesa_Uniform3i(GLint, GLint, GLint, GLint);
+void GLAPIENTRY
+_mesa_Uniform4i(GLint, GLint, GLint, GLint, GLint);
+void GLAPIENTRY
+_mesa_Uniform1fv(GLint, GLsizei, const GLfloat *);
+void GLAPIENTRY
+_mesa_Uniform2fv(GLint, GLsizei, const GLfloat *);
+void GLAPIENTRY
+_mesa_Uniform3fv(GLint, GLsizei, const GLfloat *);
+void GLAPIENTRY
+_mesa_Uniform4fv(GLint, GLsizei, const GLfloat *);
+void GLAPIENTRY
+_mesa_Uniform1iv(GLint, GLsizei, const GLint *);
+void GLAPIENTRY
+_mesa_Uniform2iv(GLint, GLsizei, const GLint *);
+void GLAPIENTRY
+_mesa_Uniform3iv(GLint, GLsizei, const GLint *);
+void GLAPIENTRY
+_mesa_Uniform4iv(GLint, GLsizei, const GLint *);
+void GLAPIENTRY
+_mesa_Uniform1ui(GLint location, GLuint v0);
+void GLAPIENTRY
+_mesa_Uniform2ui(GLint location, GLuint v0, GLuint v1);
+void GLAPIENTRY
+_mesa_Uniform3ui(GLint location, GLuint v0, GLuint v1, GLuint v2);
+void GLAPIENTRY
+_mesa_Uniform4ui(GLint location, GLuint v0, GLuint v1, GLuint v2, GLuint v3);
+void GLAPIENTRY
+_mesa_Uniform1uiv(GLint location, GLsizei count, const GLuint *value);
+void GLAPIENTRY
+_mesa_Uniform2uiv(GLint location, GLsizei count, const GLuint *value);
+void GLAPIENTRY
+_mesa_Uniform3uiv(GLint location, GLsizei count, const GLuint *value);
+void GLAPIENTRY
+_mesa_Uniform4uiv(GLint location, GLsizei count, const GLuint *value);
+void GLAPIENTRY
+_mesa_UniformMatrix2fv(GLint, GLsizei, GLboolean, const GLfloat *);
+void GLAPIENTRY
+_mesa_UniformMatrix3fv(GLint, GLsizei, GLboolean, const GLfloat *);
+void GLAPIENTRY
+_mesa_UniformMatrix4fv(GLint, GLsizei, GLboolean, const GLfloat *);
+void GLAPIENTRY
+_mesa_UniformMatrix2x3fv(GLint location, GLsizei count, GLboolean transpose,
+                         const GLfloat *value);
+void GLAPIENTRY
+_mesa_UniformMatrix3x2fv(GLint location, GLsizei count, GLboolean transpose,
+                         const GLfloat *value);
+void GLAPIENTRY
+_mesa_UniformMatrix2x4fv(GLint location, GLsizei count, GLboolean transpose,
+                         const GLfloat *value);
+void GLAPIENTRY
+_mesa_UniformMatrix4x2fv(GLint location, GLsizei count, GLboolean transpose,
+                         const GLfloat *value);
+void GLAPIENTRY
+_mesa_UniformMatrix3x4fv(GLint location, GLsizei count, GLboolean transpose,
+                         const GLfloat *value);
+void GLAPIENTRY
+_mesa_UniformMatrix4x3fv(GLint location, GLsizei count, GLboolean transpose,
+                         const GLfloat *value);
+
+void GLAPIENTRY
+_mesa_ProgramUniform1f(GLuint program, GLint, GLfloat);
+void GLAPIENTRY
+_mesa_ProgramUniform2f(GLuint program, GLint, GLfloat, GLfloat);
+void GLAPIENTRY
+_mesa_ProgramUniform3f(GLuint program, GLint, GLfloat, GLfloat, GLfloat);
+void GLAPIENTRY
+_mesa_ProgramUniform4f(GLuint program, GLint, GLfloat, GLfloat, GLfloat, GLfloat);
+void GLAPIENTRY
+_mesa_ProgramUniform1i(GLuint program, GLint, GLint);
+void GLAPIENTRY
+_mesa_ProgramUniform2i(GLuint program, GLint, GLint, GLint);
+void GLAPIENTRY
+_mesa_ProgramUniform3i(GLuint program, GLint, GLint, GLint, GLint);
+void GLAPIENTRY
+_mesa_ProgramUniform4i(GLuint program, GLint, GLint, GLint, GLint, GLint);
+void GLAPIENTRY
+_mesa_ProgramUniform1fv(GLuint program, GLint, GLsizei, const GLfloat *);
+void GLAPIENTRY
+_mesa_ProgramUniform2fv(GLuint program, GLint, GLsizei, const GLfloat *);
+void GLAPIENTRY
+_mesa_ProgramUniform3fv(GLuint program, GLint, GLsizei, const GLfloat *);
+void GLAPIENTRY
+_mesa_ProgramUniform4fv(GLuint program, GLint, GLsizei, const GLfloat *);
+void GLAPIENTRY
+_mesa_ProgramUniform1iv(GLuint program, GLint, GLsizei, const GLint *);
+void GLAPIENTRY
+_mesa_ProgramUniform2iv(GLuint program, GLint, GLsizei, const GLint *);
+void GLAPIENTRY
+_mesa_ProgramUniform3iv(GLuint program, GLint, GLsizei, const GLint *);
+void GLAPIENTRY
+_mesa_ProgramUniform4iv(GLuint program, GLint, GLsizei, const GLint *);
+void GLAPIENTRY
+_mesa_ProgramUniform1ui(GLuint program, GLint location, GLuint v0);
+void GLAPIENTRY
+_mesa_ProgramUniform2ui(GLuint program, GLint location, GLuint v0, GLuint v1);
+void GLAPIENTRY
+_mesa_ProgramUniform3ui(GLuint program, GLint location, GLuint v0, GLuint v1,
+                        GLuint v2);
+void GLAPIENTRY
+_mesa_ProgramUniform4ui(GLuint program, GLint location, GLuint v0, GLuint v1,
+                        GLuint v2, GLuint v3);
+void GLAPIENTRY
+_mesa_ProgramUniform1uiv(GLuint program, GLint location, GLsizei count,
+                         const GLuint *value);
+void GLAPIENTRY
+_mesa_ProgramUniform2uiv(GLuint program, GLint location, GLsizei count,
+                         const GLuint *value);
+void GLAPIENTRY
+_mesa_ProgramUniform3uiv(GLuint program, GLint location, GLsizei count,
+                         const GLuint *value);
+void GLAPIENTRY
+_mesa_ProgramUniform4uiv(GLuint program, GLint location, GLsizei count,
+                         const GLuint *value);
+void GLAPIENTRY
+_mesa_ProgramUniformMatrix2fv(GLuint program, GLint, GLsizei, GLboolean,
+                              const GLfloat *);
+void GLAPIENTRY
+_mesa_ProgramUniformMatrix3fv(GLuint program, GLint, GLsizei, GLboolean,
+                              const GLfloat *);
+void GLAPIENTRY
+_mesa_ProgramUniformMatrix4fv(GLuint program, GLint, GLsizei, GLboolean,
+                              const GLfloat *);
+void GLAPIENTRY
+_mesa_ProgramUniformMatrix2x3fv(GLuint program, GLint location, GLsizei count,
+                                GLboolean transpose, const GLfloat *value);
+void GLAPIENTRY
+_mesa_ProgramUniformMatrix3x2fv(GLuint program, GLint location, GLsizei count,
+                                GLboolean transpose, const GLfloat *value);
+void GLAPIENTRY
+_mesa_ProgramUniformMatrix2x4fv(GLuint program, GLint location, GLsizei count,
+                                GLboolean transpose, const GLfloat *value);
+void GLAPIENTRY
+_mesa_ProgramUniformMatrix4x2fv(GLuint program, GLint location, GLsizei count,
+                                GLboolean transpose, const GLfloat *value);
+void GLAPIENTRY
+_mesa_ProgramUniformMatrix3x4fv(GLuint program, GLint location, GLsizei count,
+                                GLboolean transpose, const GLfloat *value);
+void GLAPIENTRY
+_mesa_ProgramUniformMatrix4x3fv(GLuint program, GLint location, GLsizei count,
+                                GLboolean transpose, const GLfloat *value);
+
+void GLAPIENTRY
+_mesa_GetnUniformfvARB(GLuint, GLint, GLsizei, GLfloat *);
+void GLAPIENTRY
+_mesa_GetUniformfv(GLuint, GLint, GLfloat *);
+void GLAPIENTRY
+_mesa_GetnUniformivARB(GLuint, GLint, GLsizei, GLint *);
+void GLAPIENTRY
+_mesa_GetnUniformuivARB(GLuint, GLint, GLsizei, GLuint *);
+void GLAPIENTRY
+_mesa_GetUniformuiv(GLuint program, GLint location, GLuint *params);
+void GLAPIENTRY
+_mesa_GetnUniformdvARB(GLuint, GLint, GLsizei, GLdouble *);
+void GLAPIENTRY
+_mesa_GetUniformdv(GLuint, GLint, GLdouble *);
+GLint GLAPIENTRY
+_mesa_GetUniformLocation(GLuint, const GLcharARB *);
+GLuint GLAPIENTRY
+_mesa_GetUniformBlockIndex(GLuint program,
+			   const GLchar *uniformBlockName);
+void GLAPIENTRY
+_mesa_GetUniformIndices(GLuint program,
+			GLsizei uniformCount,
+			const GLchar * const *uniformNames,
+			GLuint *uniformIndices);
+void GLAPIENTRY
+_mesa_UniformBlockBinding(GLuint program,
+			  GLuint uniformBlockIndex,
+			  GLuint uniformBlockBinding);
+void GLAPIENTRY
+_mesa_GetActiveAtomicCounterBufferiv(GLuint program, GLuint bufferIndex,
+                                     GLenum pname, GLint *params);
+void GLAPIENTRY
+_mesa_GetActiveUniformBlockiv(GLuint program,
+			      GLuint uniformBlockIndex,
+			      GLenum pname,
+			      GLint *params);
+void GLAPIENTRY
+_mesa_GetActiveUniformBlockName(GLuint program,
+				GLuint uniformBlockIndex,
+				GLsizei bufSize,
+				GLsizei *length,
+				GLchar *uniformBlockName);
+void GLAPIENTRY
+_mesa_GetActiveUniformName(GLuint program, GLuint uniformIndex,
+			   GLsizei bufSize, GLsizei *length,
+			   GLchar *uniformName);
+void GLAPIENTRY
+_mesa_GetActiveUniform(GLuint, GLuint, GLsizei, GLsizei *,
+                       GLint *, GLenum *, GLcharARB *);
+void GLAPIENTRY
+_mesa_GetActiveUniformsiv(GLuint program,
+			  GLsizei uniformCount,
+			  const GLuint *uniformIndices,
+			  GLenum pname,
+			  GLint *params);
+void GLAPIENTRY
+_mesa_GetUniformiv(GLuint, GLint, GLint *);
+
+long
+_mesa_parse_program_resource_name(const GLchar *name,
+                                  const GLchar **out_base_name_end);
+
+unsigned
+_mesa_get_uniform_location(struct gl_context *ctx, struct gl_shader_program *shProg,
+			   const GLchar *name, unsigned *offset);
+
+void
+_mesa_uniform(struct gl_context *ctx, struct gl_shader_program *shader_program,
+	      GLint location, GLsizei count,
+              const GLvoid *values, GLenum type);
+
+void
+_mesa_uniform_matrix(struct gl_context *ctx, struct gl_shader_program *shProg,
+		     GLuint cols, GLuint rows,
+                     GLint location, GLsizei count,
+                     GLboolean transpose, const GLfloat *values);
+
+void
+_mesa_get_uniform(struct gl_context *ctx, GLuint program, GLint location,
+		  GLsizei bufSize, enum glsl_base_type returnType,
+		  GLvoid *paramsOut);
+
+extern void
+_mesa_uniform_attach_driver_storage(struct gl_uniform_storage *,
+				    unsigned element_stride,
+				    unsigned vector_stride,
+				    enum gl_uniform_driver_format format,
+				    void *data);
+
+extern void
+_mesa_uniform_detach_all_driver_storage(struct gl_uniform_storage *uni);
+
+extern void
+_mesa_propagate_uniforms_to_driver_storage(struct gl_uniform_storage *uni,
+					   unsigned array_index,
+					   unsigned count);
+
+extern void
+_mesa_update_shader_textures_used(struct gl_shader_program *shProg,
+				  struct gl_program *prog);
+
+extern bool
+_mesa_sampler_uniforms_are_valid(const struct gl_shader_program *shProg,
+				 char *errMsg, size_t errMsgLength);
+
+extern const struct gl_program_parameter *
+get_uniform_parameter(struct gl_shader_program *shProg, GLint index);
+
+extern void
+_mesa_get_uniform_name(const struct gl_uniform_storage *uni,
+                       GLsizei maxLength, GLsizei *length,
+                       GLchar *nameOut);
+
+struct gl_builtin_uniform_element {
+   const char *field;
+   int tokens[STATE_LENGTH];
+   int swizzle;
+};
+
+struct gl_builtin_uniform_desc {
+   const char *name;
+   const struct gl_builtin_uniform_element *elements;
+   unsigned int num_elements;
+};
+
+/**
+ * \name GLSL uniform arrays and structs require special handling.
+ *
+ * The GL_ARB_shader_objects spec says that if you use
+ * glGetUniformLocation to get the location of an array, you CANNOT
+ * access other elements of the array by adding an offset to the
+ * returned location.  For example, you must call
+ * glGetUniformLocation("foo[16]") if you want to set the 16th element
+ * of the array with glUniform().
+ *
+ * HOWEVER, some other OpenGL drivers allow accessing array elements
+ * by adding an offset to the returned array location.  And some apps
+ * seem to depend on that behaviour.
+ *
+ * Mesa's gl_uniform_list doesn't directly support this since each
+ * entry in the list describes one uniform variable, not one uniform
+ * element.  We could insert dummy entries in the list for each array
+ * element after [0] but that causes complications elsewhere.
+ *
+ * We solve this problem by creating multiple entries for uniform arrays
+ * in the UniformRemapTable so that their elements get sequential locations.
+ *
+ * The utility functions below split a UniformRemapTable location into the
+ * index of the uniform in UniformStorage plus an offset to the array
+ * element (0 if the uniform is not an array), and merge the two back into
+ * a UniformRemapTable location.
+ */
+/*@{*/
+/**
+ * Combine the uniform's storage index and the array index
+ */
+static inline GLint
+_mesa_uniform_merge_location_offset(const struct gl_shader_program *prog,
+                                    unsigned storage_index,
+                                    unsigned uniform_array_index)
+{
+   /* location in remap table + array element offset */
+   return prog->UniformStorage[storage_index].remap_location +
+      uniform_array_index;
+}
+
+/**
+ * Separate the uniform storage index and array index
+ */
+static inline void
+_mesa_uniform_split_location_offset(const struct gl_shader_program *prog,
+                                    GLint location, unsigned *storage_index,
+				    unsigned *uniform_array_index)
+{
+   *storage_index = prog->UniformRemapTable[location] - prog->UniformStorage;
+   *uniform_array_index = location -
+      prog->UniformRemapTable[location]->remap_location;
+
+   /* The gl_uniform_storage entry in UniformStorage at the calculated
+    * storage_index must match the entry in the remap table.
+    */
+   assert(&prog->UniformStorage[*storage_index] ==
+          prog->UniformRemapTable[location]);
+}
+/*@}*/
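+
+/*
+ * Example (sketch): the two helpers above are inverses.  Merging a storage
+ * index 's' with an element offset 'e' and splitting the result recovers
+ * both values:
+ *
+ * \code
+ *    GLint loc = _mesa_uniform_merge_location_offset(prog, s, e);
+ *    unsigned s2, e2;
+ *    _mesa_uniform_split_location_offset(prog, loc, &s2, &e2);
+ *    assert(s2 == s && e2 == e);
+ * \endcode
+ */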
+
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* UNIFORMS_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/version.c b/icd/intel/compiler/mesa-utils/src/mesa/main/version.c
new file mode 100644
index 0000000..520c7b4
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/version.c
@@ -0,0 +1,415 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2010  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#include "imports.h"
+#include "mtypes.h"
+#include "version.h"
+// #include "git_sha1.h" // LunarG DEL:
+
+/**
+ * Scans 'string' to see if it ends with 'ending'.
+ */
+static GLboolean
+check_for_ending(const char *string, const char *ending)
+{
+   int len1, len2;
+
+   len1 = strlen(string);
+   len2 = strlen(ending);
+
+   if (len2 > len1) {
+      return GL_FALSE;
+   }
+
+   if (strcmp(string + (len1 - len2), ending) == 0) {
+      return GL_TRUE;
+   } else {
+      return GL_FALSE;
+   }
+}
+
+/**
+ * Returns the gl override data
+ *
+ * version > 0 indicates there is an override requested
+ * fwd_context is only valid if version > 0
+ */
+static void
+get_gl_override(int *version, GLboolean *fwd_context)
+{
+   const char *env_var = "MESA_GL_VERSION_OVERRIDE";
+   const char *version_str;
+   int major, minor, n;
+   static int override_version = -1;
+   static GLboolean fc_suffix = GL_FALSE;
+
+   if (override_version < 0) {
+      override_version = 0;
+
+      version_str = getenv(env_var);
+      if (version_str) {
+         fc_suffix = check_for_ending(version_str, "FC");
+
+         n = sscanf(version_str, "%d.%d", &major, &minor);
+         if (n != 2) {
+            fprintf(stderr, "error: invalid value for %s: %s\n", env_var, version_str);
+            override_version = 0;
+         } else {
+            override_version = major * 10 + minor;
+            if (override_version < 30 && fc_suffix) {
+               fprintf(stderr, "error: invalid value for %s: %s\n", env_var, version_str);
+            }
+         }
+      }
+   }
+
+   *version = override_version;
+   *fwd_context = fc_suffix;
+}
+
+/**
+ * Builds the MESA version string.  In this stripped-down build the string
+ * is simply left empty; the \c prefix argument is unused.
+ */
+static void
+create_version_string(struct gl_context *ctx, const char *prefix)
+{
+   static const int max = 100;
+
+   ctx->VersionString = malloc(max);
+   if (ctx->VersionString) {
+      ctx->VersionString[0] = '\0';
+   }
+}
+
+/**
+ * Override the context's version and/or API type if the
+ * environment variable MESA_GL_VERSION_OVERRIDE is set.
+ *
+ * Example uses of MESA_GL_VERSION_OVERRIDE:
+ *
+ * 2.1: select a compatibility (non-Core) profile with GL version 2.1
+ * 3.0: select a compatibility (non-Core) profile with GL version 3.0
+ * 3.0FC: select a Core+Forward Compatible profile with GL version 3.0
+ * 3.1: select a Core profile with GL version 3.1
+ * 3.1FC: select a Core+Forward Compatible profile with GL version 3.1
+ */
+void
+_mesa_override_gl_version(struct gl_context *ctx)
+{
+   int version;
+   GLboolean fwd_context;
+
+   get_gl_override(&version, &fwd_context);
+
+   if (version > 0) {
+      ctx->Version = version;
+      if (version >= 30 && fwd_context) {
+         ctx->API = API_OPENGL_CORE;
+         ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_FORWARD_COMPATIBLE_BIT;
+      } else if (version >= 31) {
+         ctx->API = API_OPENGL_CORE;
+      } else {
+         ctx->API = API_OPENGL_COMPAT;
+      }
+      create_version_string(ctx, "");
+   }
+}
+
+/**
+ * Returns the gl override value
+ *
+ * version > 0 indicates there is an override requested
+ */
+int
+_mesa_get_gl_version_override(void)
+{
+   int version;
+   GLboolean fwd_context;
+
+   get_gl_override(&version, &fwd_context);
+
+   return version;
+}
+
+/**
+ * Override the context's GLSL version if the environment variable
+ * MESA_GLSL_VERSION_OVERRIDE is set. Valid values for
+ * MESA_GLSL_VERSION_OVERRIDE are integers, such as "130".
+ */
+void
+_mesa_override_glsl_version(struct gl_context *ctx)
+{
+   const char *env_var = "MESA_GLSL_VERSION_OVERRIDE";
+   const char *version;
+   int n;
+
+   version = getenv(env_var);
+   if (!version) {
+      return;
+   }
+
+   n = sscanf(version, "%u", &ctx->Const.GLSLVersion);
+   if (n != 1) {
+      fprintf(stderr, "error: invalid value for %s: %s\n", env_var, version);
+      return;
+   }
+}
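+
+/* Example: MESA_GLSL_VERSION_OVERRIDE=130 sets ctx->Const.GLSLVersion to
+ * 130; a non-numeric value is rejected with an error message and the
+ * previous value is left untouched.
+ */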
+
+/**
+ * Examine enabled GL extensions to determine GL version.
+ */
+static void
+compute_version(struct gl_context *ctx)
+{
+   GLuint major, minor;
+
+   const GLboolean ver_1_3 = (ctx->Extensions.ARB_texture_border_clamp &&
+                              ctx->Extensions.ARB_texture_cube_map &&
+                              ctx->Extensions.ARB_texture_env_combine &&
+                              ctx->Extensions.ARB_texture_env_dot3);
+   const GLboolean ver_1_4 = (ver_1_3 &&
+                              ctx->Extensions.ARB_depth_texture &&
+                              ctx->Extensions.ARB_shadow &&
+                              ctx->Extensions.ARB_texture_env_crossbar &&
+                              ctx->Extensions.EXT_blend_color &&
+                              ctx->Extensions.EXT_blend_func_separate &&
+                              ctx->Extensions.EXT_blend_minmax &&
+                              ctx->Extensions.EXT_point_parameters);
+   const GLboolean ver_1_5 = (ver_1_4 &&
+                              ctx->Extensions.ARB_occlusion_query);
+   const GLboolean ver_2_0 = (ver_1_5 &&
+                              ctx->Extensions.ARB_point_sprite &&
+                              ctx->Extensions.ARB_vertex_shader &&
+                              ctx->Extensions.ARB_fragment_shader &&
+                              ctx->Extensions.ARB_texture_non_power_of_two &&
+                              ctx->Extensions.EXT_blend_equation_separate &&
+
+			      /* Technically, 2.0 requires the functionality
+			       * of the EXT version.  Enable 2.0 if either
+			       * extension is available, and assume that a
+			       * driver that only exposes the ATI extension
+			       * will fallback to software when necessary.
+			       */
+			      (ctx->Extensions.EXT_stencil_two_side
+			       || ctx->Extensions.ATI_separate_stencil));
+   const GLboolean ver_2_1 = (ver_2_0 &&
+                              ctx->Extensions.EXT_pixel_buffer_object &&
+                              ctx->Extensions.EXT_texture_sRGB);
+   const GLboolean ver_3_0 = (ver_2_1 &&
+                              ctx->Const.GLSLVersion >= 130 &&
+                              (ctx->Const.MaxSamples >= 4 || ctx->Const.FakeSWMSAA) &&
+                              (ctx->API == API_OPENGL_CORE ||
+                               ctx->Extensions.ARB_color_buffer_float) &&
+                              ctx->Extensions.ARB_depth_buffer_float &&
+                              ctx->Extensions.ARB_half_float_vertex &&
+                              ctx->Extensions.ARB_map_buffer_range &&
+                              ctx->Extensions.ARB_shader_texture_lod &&
+                              ctx->Extensions.ARB_texture_float &&
+                              ctx->Extensions.ARB_texture_rg &&
+                              ctx->Extensions.ARB_texture_compression_rgtc &&
+                              ctx->Extensions.EXT_draw_buffers2 &&
+                              ctx->Extensions.ARB_framebuffer_object &&
+                              ctx->Extensions.EXT_framebuffer_sRGB &&
+                              ctx->Extensions.EXT_packed_float &&
+                              ctx->Extensions.EXT_texture_array &&
+                              ctx->Extensions.EXT_texture_shared_exponent &&
+                              ctx->Extensions.EXT_transform_feedback &&
+                              ctx->Extensions.NV_conditional_render);
+   const GLboolean ver_3_1 = (ver_3_0 &&
+                              ctx->Const.GLSLVersion >= 140 &&
+                              ctx->Extensions.ARB_draw_instanced &&
+                              ctx->Extensions.ARB_texture_buffer_object &&
+                              ctx->Extensions.ARB_uniform_buffer_object &&
+                              ctx->Extensions.EXT_texture_snorm &&
+                              ctx->Extensions.NV_primitive_restart &&
+                              ctx->Extensions.NV_texture_rectangle &&
+                              ctx->Const.Program[MESA_SHADER_VERTEX].MaxTextureImageUnits >= 16);
+   const GLboolean ver_3_2 = (ver_3_1 &&
+                              ctx->Const.GLSLVersion >= 150 &&
+                              ctx->Extensions.ARB_depth_clamp &&
+                              ctx->Extensions.ARB_draw_elements_base_vertex &&
+                              ctx->Extensions.ARB_fragment_coord_conventions &&
+                              ctx->Extensions.EXT_provoking_vertex &&
+                              ctx->Extensions.ARB_seamless_cube_map &&
+                              ctx->Extensions.ARB_sync &&
+                              ctx->Extensions.ARB_texture_multisample &&
+                              ctx->Extensions.EXT_vertex_array_bgra);
+   const GLboolean ver_3_3 = (ver_3_2 &&
+                              ctx->Const.GLSLVersion >= 330 &&
+                              ctx->Extensions.ARB_blend_func_extended &&
+                              ctx->Extensions.ARB_explicit_attrib_location &&
+                              ctx->Extensions.ARB_instanced_arrays &&
+                              ctx->Extensions.ARB_occlusion_query2 &&
+                              ctx->Extensions.ARB_shader_bit_encoding &&
+                              ctx->Extensions.ARB_texture_rgb10_a2ui &&
+                              ctx->Extensions.ARB_timer_query &&
+                              ctx->Extensions.ARB_vertex_type_2_10_10_10_rev &&
+                              ctx->Extensions.EXT_texture_swizzle);
+                              /* ARB_sampler_objects is always enabled in mesa */
+
+   if (ver_3_3) {
+      major = 3;
+      minor = 3;
+   }
+   else if (ver_3_2) {
+      major = 3;
+      minor = 2;
+   }
+   else if (ver_3_1) {
+      major = 3;
+      minor = 1;
+   }
+   else if (ver_3_0) {
+      major = 3;
+      minor = 0;
+   }
+   else if (ver_2_1) {
+      major = 2;
+      minor = 1;
+   }
+   else if (ver_2_0) {
+      major = 2;
+      minor = 0;
+   }
+   else if (ver_1_5) {
+      major = 1;
+      minor = 5;
+   }
+   else if (ver_1_4) {
+      major = 1;
+      minor = 4;
+   }
+   else if (ver_1_3) {
+      major = 1;
+      minor = 3;
+   }
+   else {
+      major = 1;
+      minor = 2;
+   }
+
+   ctx->Version = major * 10 + minor;
+
+   create_version_string(ctx, "");
+}
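+
+/* Note: versions are encoded as major * 10 + minor throughout this file,
+ * so a computed GL 3.3 context stores ctx->Version == 33.
+ */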
+
+static void
+compute_version_es1(struct gl_context *ctx)
+{
+   /* OpenGL ES 1.0 is derived from OpenGL 1.3 */
+   const GLboolean ver_1_0 = (ctx->Extensions.ARB_texture_env_combine &&
+                              ctx->Extensions.ARB_texture_env_dot3);
+   /* OpenGL ES 1.1 is derived from OpenGL 1.5 */
+   const GLboolean ver_1_1 = (ver_1_0 &&
+                              ctx->Extensions.EXT_point_parameters);
+
+   if (ver_1_1) {
+      ctx->Version = 11;
+   } else if (ver_1_0) {
+      ctx->Version = 10;
+   } else {
+      _mesa_problem(ctx, "Incomplete OpenGL ES 1.0 support.");
+   }
+
+   create_version_string(ctx, "OpenGL ES-CM ");
+}
+
+static void
+compute_version_es2(struct gl_context *ctx)
+{
+   /* OpenGL ES 2.0 is derived from OpenGL 2.0 */
+   const GLboolean ver_2_0 = (ctx->Extensions.ARB_texture_cube_map &&
+                              ctx->Extensions.EXT_blend_color &&
+                              ctx->Extensions.EXT_blend_func_separate &&
+                              ctx->Extensions.EXT_blend_minmax &&
+                              ctx->Extensions.ARB_vertex_shader &&
+                              ctx->Extensions.ARB_fragment_shader &&
+                              ctx->Extensions.ARB_texture_non_power_of_two &&
+                              ctx->Extensions.EXT_blend_equation_separate);
+   /* FINISHME: This list isn't quite right. */
+   const GLboolean ver_3_0 = (ctx->Extensions.ARB_half_float_vertex &&
+                              ctx->Extensions.ARB_internalformat_query &&
+                              ctx->Extensions.ARB_map_buffer_range &&
+                              ctx->Extensions.ARB_shader_texture_lod &&
+                              ctx->Extensions.ARB_texture_float &&
+                              ctx->Extensions.ARB_texture_rg &&
+                              ctx->Extensions.ARB_texture_compression_rgtc &&
+                              ctx->Extensions.EXT_draw_buffers2 &&
+                              /* ctx->Extensions.ARB_framebuffer_object && */
+                              ctx->Extensions.EXT_framebuffer_sRGB &&
+                              ctx->Extensions.EXT_packed_float &&
+                              ctx->Extensions.EXT_texture_array &&
+                              ctx->Extensions.EXT_texture_shared_exponent &&
+                              ctx->Extensions.EXT_transform_feedback &&
+                              ctx->Extensions.NV_conditional_render &&
+                              ctx->Extensions.ARB_draw_instanced &&
+                              ctx->Extensions.ARB_uniform_buffer_object &&
+                              ctx->Extensions.EXT_texture_snorm &&
+                              ctx->Extensions.NV_primitive_restart &&
+                              ctx->Extensions.OES_depth_texture_cube_map);
+   if (ver_3_0) {
+      ctx->Version = 30;
+   } else if (ver_2_0) {
+      ctx->Version = 20;
+   } else {
+      _mesa_problem(ctx, "Incomplete OpenGL ES 2.0 support.");
+   }
+
+   create_version_string(ctx, "OpenGL ES ");
+}
+
+/**
+ * Set the context's Version and VersionString fields.
+ * This should only be called once as part of context initialization
+ * or to perform version check for GLX_ARB_create_context_profile.
+ */
+void
+_mesa_compute_version(struct gl_context *ctx)
+{
+   if (ctx->Version)
+      return;
+
+   switch (ctx->API) {
+   case API_OPENGL_COMPAT:
+      /* Disable GLSL 1.40 and later for legacy contexts.
+       * This disallows creation of the GL 3.1 compatibility context. */
+      if (ctx->Const.GLSLVersion > 130) {
+         ctx->Const.GLSLVersion = 130;
+      }
+      /* fall through */
+   case API_OPENGL_CORE:
+      compute_version(ctx);
+      break;
+   case API_OPENGLES:
+      compute_version_es1(ctx);
+      break;
+   case API_OPENGLES2:
+      compute_version_es2(ctx);
+      break;
+   case API_VK:
+       break;
+   }
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/main/version.h b/icd/intel/compiler/mesa-utils/src/mesa/main/version.h
new file mode 100644
index 0000000..c78f87a
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/main/version.h
@@ -0,0 +1,46 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ * Copyright (C) 2009  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef VERSION_H
+#define VERSION_H
+
+
+struct gl_context;
+
+
+extern void
+_mesa_compute_version(struct gl_context *ctx);
+
+extern void
+_mesa_override_gl_version(struct gl_context *ctx);
+
+extern void
+_mesa_override_glsl_version(struct gl_context *ctx);
+
+extern int
+_mesa_get_gl_version_override(void);
+
+#endif /* VERSION_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/math/m_matrix.c b/icd/intel/compiler/mesa-utils/src/mesa/math/m_matrix.c
new file mode 100644
index 0000000..2d78765
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/math/m_matrix.c
@@ -0,0 +1,1609 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2005  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+/**
+ * \file m_matrix.c
+ * Matrix operations.
+ *
+ * \note
+ * -# 4x4 transformation matrices are stored in memory in column major order.
+ * -# Points/vertices are to be thought of as column vectors.
+ * -# Transformation of a point p by a matrix M is: p' = M * p
+ */
+
+
+#include "main/glheader.h"
+#include "main/imports.h"
+#include "main/macros.h"
+
+#include "m_matrix.h"
+
+
+/**
+ * \defgroup MatFlags MAT_FLAG_XXX-flags
+ *
+ * Bitmasks to indicate different kinds of 4x4 matrices in GLmatrix::flags
+ */
+/*@{*/
+#define MAT_FLAG_IDENTITY       0     /**< is an identity matrix flag.
+                                       *   (Not actually used - the identity
+                                       *   matrix is identified by the absence
+                                       *   of all other flags.)
+                                       */
+#define MAT_FLAG_GENERAL        0x1   /**< is a general matrix flag */
+#define MAT_FLAG_ROTATION       0x2   /**< is a rotation matrix flag */
+#define MAT_FLAG_TRANSLATION    0x4   /**< is a translation matrix flag */
+#define MAT_FLAG_UNIFORM_SCALE  0x8   /**< is a uniform scaling matrix flag */
+#define MAT_FLAG_GENERAL_SCALE  0x10  /**< is a general scaling matrix flag */
+#define MAT_FLAG_GENERAL_3D     0x20  /**< general 3D matrix flag */
+#define MAT_FLAG_PERSPECTIVE    0x40  /**< is a perspective proj matrix flag */
+#define MAT_FLAG_SINGULAR       0x80  /**< is a singular matrix flag */
+#define MAT_DIRTY_TYPE          0x100  /**< matrix type is dirty */
+#define MAT_DIRTY_FLAGS         0x200  /**< matrix flags are dirty */
+#define MAT_DIRTY_INVERSE       0x400  /**< matrix inverse is dirty */
+
+/** angle preserving matrix flags mask */
+#define MAT_FLAGS_ANGLE_PRESERVING (MAT_FLAG_ROTATION | \
+				    MAT_FLAG_TRANSLATION | \
+				    MAT_FLAG_UNIFORM_SCALE)
+
+/** geometry related matrix flags mask */
+#define MAT_FLAGS_GEOMETRY (MAT_FLAG_GENERAL | \
+			    MAT_FLAG_ROTATION | \
+			    MAT_FLAG_TRANSLATION | \
+			    MAT_FLAG_UNIFORM_SCALE | \
+			    MAT_FLAG_GENERAL_SCALE | \
+			    MAT_FLAG_GENERAL_3D | \
+			    MAT_FLAG_PERSPECTIVE | \
+	                    MAT_FLAG_SINGULAR)
+
+/** length preserving matrix flags mask */
+#define MAT_FLAGS_LENGTH_PRESERVING (MAT_FLAG_ROTATION | \
+				     MAT_FLAG_TRANSLATION)
+
+
+/** 3D (non-perspective) matrix flags mask */
+#define MAT_FLAGS_3D (MAT_FLAG_ROTATION | \
+		      MAT_FLAG_TRANSLATION | \
+		      MAT_FLAG_UNIFORM_SCALE | \
+		      MAT_FLAG_GENERAL_SCALE | \
+		      MAT_FLAG_GENERAL_3D)
+
+/** dirty matrix flags mask */
+#define MAT_DIRTY          (MAT_DIRTY_TYPE | \
+			    MAT_DIRTY_FLAGS | \
+			    MAT_DIRTY_INVERSE)
+
+/*@}*/
+
+
+/** 
+ * Test geometry related matrix flags.
+ * 
+ * \param mat a pointer to a GLmatrix structure.
+ * \param a flags mask.
+ *
+ * \returns non-zero if all geometry related matrix flags are contained within
+ * the mask, or zero otherwise.
+ */ 
+#define TEST_MAT_FLAGS(mat, a)  \
+    ((MAT_FLAGS_GEOMETRY & (~(a)) & ((mat)->flags) ) == 0)
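+
+/* Example: TEST_MAT_FLAGS(mat, MAT_FLAGS_3D) is non-zero exactly when no
+ * geometry flag outside MAT_FLAGS_3D (e.g. MAT_FLAG_PERSPECTIVE or
+ * MAT_FLAG_GENERAL) is set, i.e. when the matrix can be treated as a
+ * non-perspective 3D transform.
+ */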
+
+
+
+/**
+ * Names of the corresponding GLmatrixtype values.
+ */
+static const char *types[] = {
+   "MATRIX_GENERAL",
+   "MATRIX_IDENTITY",
+   "MATRIX_3D_NO_ROT",
+   "MATRIX_PERSPECTIVE",
+   "MATRIX_2D",
+   "MATRIX_2D_NO_ROT",
+   "MATRIX_3D"
+};
+
+
+/**
+ * Identity matrix.
+ */
+static GLfloat Identity[16] = {
+   1.0, 0.0, 0.0, 0.0,
+   0.0, 1.0, 0.0, 0.0,
+   0.0, 0.0, 1.0, 0.0,
+   0.0, 0.0, 0.0, 1.0
+};
+
+
+
+/**********************************************************************/
+/** \name Matrix multiplication */
+/*@{*/
+
+#define A(row,col)  a[(col<<2)+row]
+#define B(row,col)  b[(col<<2)+row]
+#define P(row,col)  product[(col<<2)+row]
+
+/**
+ * Perform a full 4x4 matrix multiplication.
+ *
+ * \param a matrix.
+ * \param b matrix.
+ * \param product will receive the product of \p a and \p b.
+ *
+ * \warning It is assumed that \p product != \p b; \p product == \p a is allowed.
+ *
+ * \note KW: 4*16 = 64 multiplications
+ * 
+ * \author This \c matmul was contributed by Thomas Malik
+ */
+static void matmul4( GLfloat *product, const GLfloat *a, const GLfloat *b )
+{
+   GLint i;
+   for (i = 0; i < 4; i++) {
+      const GLfloat ai0=A(i,0),  ai1=A(i,1),  ai2=A(i,2),  ai3=A(i,3);
+      P(i,0) = ai0 * B(0,0) + ai1 * B(1,0) + ai2 * B(2,0) + ai3 * B(3,0);
+      P(i,1) = ai0 * B(0,1) + ai1 * B(1,1) + ai2 * B(2,1) + ai3 * B(3,1);
+      P(i,2) = ai0 * B(0,2) + ai1 * B(1,2) + ai2 * B(2,2) + ai3 * B(3,2);
+      P(i,3) = ai0 * B(0,3) + ai1 * B(1,3) + ai2 * B(2,3) + ai3 * B(3,3);
+   }
+}
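+
+/* Note the column-major layout implied by A(row,col) == a[(col<<2)+row]:
+ * element (row 1, column 2) of a 4x4 array m lives at m[9].
+ */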
+
+/**
+ * Multiply two matrices known to occupy only the top three rows (i.e.
+ * whose bottom row is (0, 0, 0, 1)), such as typical model matrices and
+ * orthogonal matrices.
+ *
+ * \param a matrix.
+ * \param b matrix.
+ * \param product will receive the product of \p a and \p b.
+ */
+static void matmul34( GLfloat *product, const GLfloat *a, const GLfloat *b )
+{
+   GLint i;
+   for (i = 0; i < 3; i++) {
+      const GLfloat ai0=A(i,0),  ai1=A(i,1),  ai2=A(i,2),  ai3=A(i,3);
+      P(i,0) = ai0 * B(0,0) + ai1 * B(1,0) + ai2 * B(2,0);
+      P(i,1) = ai0 * B(0,1) + ai1 * B(1,1) + ai2 * B(2,1);
+      P(i,2) = ai0 * B(0,2) + ai1 * B(1,2) + ai2 * B(2,2);
+      P(i,3) = ai0 * B(0,3) + ai1 * B(1,3) + ai2 * B(2,3) + ai3;
+   }
+   P(3,0) = 0;
+   P(3,1) = 0;
+   P(3,2) = 0;
+   P(3,3) = 1;
+}
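+
+/* Because both inputs are assumed to have (0, 0, 0, 1) as their bottom
+ * row, the bottom row of the product is constant and each remaining dot
+ * product needs only three multiplies, saving nearly half the multiplies
+ * of matmul4().
+ */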
+
+#undef A
+#undef B
+#undef P
+
+/**
+ * Multiply a matrix by an array of floats with known properties.
+ *
+ * \param mat pointer to a GLmatrix structure containing the left multiplication
+ * matrix, and that will receive the product result.
+ * \param m right multiplication matrix array.
+ * \param flags flags of the matrix \p m.
+ * 
+ * Joins both flags and marks the type and inverse as dirty.  Calls matmul34()
+ * if both matrices are 3D, or matmul4() otherwise.
+ */
+static void matrix_multf( GLmatrix *mat, const GLfloat *m, GLuint flags )
+{
+   mat->flags |= (flags | MAT_DIRTY_TYPE | MAT_DIRTY_INVERSE);
+
+   if (TEST_MAT_FLAGS(mat, MAT_FLAGS_3D))
+      matmul34( mat->m, mat->m, m );
+   else
+      matmul4( mat->m, mat->m, m );
+}
+
+/**
+ * Matrix multiplication.
+ *
+ * \param dest destination matrix.
+ * \param a left matrix.
+ * \param b right matrix.
+ * 
+ * Joins both flags and marks the type and inverse as dirty.  Calls matmul34()
+ * if both matrices are 3D, or matmul4() otherwise.
+ */
+void
+_math_matrix_mul_matrix( GLmatrix *dest, const GLmatrix *a, const GLmatrix *b )
+{
+   dest->flags = (a->flags |
+		  b->flags |
+		  MAT_DIRTY_TYPE |
+		  MAT_DIRTY_INVERSE);
+
+   if (TEST_MAT_FLAGS(dest, MAT_FLAGS_3D))
+      matmul34( dest->m, a->m, b->m );
+   else
+      matmul4( dest->m, a->m, b->m );
+}
+
+/**
+ * Matrix multiplication.
+ *
+ * \param dest left and destination matrix.
+ * \param m right matrix array.
+ * 
+ * Marks the matrix flags with general flag, and type and inverse dirty flags.
+ * Calls matmul4() for the multiplication.
+ */
+void
+_math_matrix_mul_floats( GLmatrix *dest, const GLfloat *m )
+{
+   dest->flags |= (MAT_FLAG_GENERAL |
+		   MAT_DIRTY_TYPE |
+		   MAT_DIRTY_INVERSE |
+                   MAT_DIRTY_FLAGS);
+
+   matmul4( dest->m, dest->m, m );
+}
+
+/*@}*/
+
+
+/**********************************************************************/
+/** \name Matrix output */
+/*@{*/
+
+/**
+ * Print a matrix array.
+ *
+ * \param m matrix array.
+ *
+ * Called by _math_matrix_print() to print a matrix or its inverse.
+ */
+static void print_matrix_floats( const GLfloat m[16] )
+{
+   int i;
+   for (i=0;i<4;i++) {
+      _mesa_debug(NULL,"\t%f %f %f %f\n", m[i], m[4+i], m[8+i], m[12+i] );
+   }
+}
+
+/**
+ * Dumps the contents of a GLmatrix structure.
+ * 
+ * \param m pointer to the GLmatrix structure.
+ */
+void
+_math_matrix_print( const GLmatrix *m )
+{
+   GLfloat prod[16];
+
+   _mesa_debug(NULL, "Matrix type: %s, flags: %x\n", types[m->type], m->flags);
+   print_matrix_floats(m->m);
+   _mesa_debug(NULL, "Inverse: \n");
+   print_matrix_floats(m->inv);
+   matmul4(prod, m->m, m->inv);
+   _mesa_debug(NULL, "Mat * Inverse:\n");
+   print_matrix_floats(prod);
+}
+
+/*@}*/
+
+
+/**
+ * References an element of a 4x4 matrix.
+ *
+ * \param m matrix array.
+ * \param r row of the desired element.
+ * \param c column of the desired element.
+ *
+ * \return value of the desired element.
+ *
+ * Calculates the column-major linear storage index of the element and
+ * references it.
+ */
+#define MAT(m,r,c) (m)[(c)*4+(r)]
+
+
+/**********************************************************************/
+/** \name Matrix inversion */
+/*@{*/
+
+/**
+ * Swaps the values of two floating point variables.
+ *
+ * Used by invert_matrix_general() to swap the row pointers.
+ */
+#define SWAP_ROWS(a, b) { GLfloat *_tmp = a; (a)=(b); (b)=_tmp; }
+
+/**
+ * Compute inverse of 4x4 transformation matrix.
+ * 
+ * \param mat pointer to a GLmatrix structure. The matrix inverse will be
+ * stored in the GLmatrix::inv attribute.
+ * 
+ * \return GL_TRUE for success, GL_FALSE for failure (\p singular matrix).
+ * 
+ * \author
+ * Code contributed by Jacques Leroy jle@star.be
+ *
+ * Calculates the inverse matrix by performing Gaussian elimination on the
+ * augmented matrix [M | I] with partial pivoting, followed by
+ * back-substitution, with the loops manually unrolled.
+ */
+static GLboolean invert_matrix_general( GLmatrix *mat )
+{
+   const GLfloat *m = mat->m;
+   GLfloat *out = mat->inv;
+   GLfloat wtmp[4][8];
+   GLfloat m0, m1, m2, m3, s;
+   GLfloat *r0, *r1, *r2, *r3;
+
+   r0 = wtmp[0], r1 = wtmp[1], r2 = wtmp[2], r3 = wtmp[3];
+
+   r0[0] = MAT(m,0,0), r0[1] = MAT(m,0,1),
+   r0[2] = MAT(m,0,2), r0[3] = MAT(m,0,3),
+   r0[4] = 1.0, r0[5] = r0[6] = r0[7] = 0.0,
+
+   r1[0] = MAT(m,1,0), r1[1] = MAT(m,1,1),
+   r1[2] = MAT(m,1,2), r1[3] = MAT(m,1,3),
+   r1[5] = 1.0, r1[4] = r1[6] = r1[7] = 0.0,
+
+   r2[0] = MAT(m,2,0), r2[1] = MAT(m,2,1),
+   r2[2] = MAT(m,2,2), r2[3] = MAT(m,2,3),
+   r2[6] = 1.0, r2[4] = r2[5] = r2[7] = 0.0,
+
+   r3[0] = MAT(m,3,0), r3[1] = MAT(m,3,1),
+   r3[2] = MAT(m,3,2), r3[3] = MAT(m,3,3),
+   r3[7] = 1.0, r3[4] = r3[5] = r3[6] = 0.0;
+
+   /* choose pivot - or die */
+   if (FABSF(r3[0])>FABSF(r2[0])) SWAP_ROWS(r3, r2);
+   if (FABSF(r2[0])>FABSF(r1[0])) SWAP_ROWS(r2, r1);
+   if (FABSF(r1[0])>FABSF(r0[0])) SWAP_ROWS(r1, r0);
+   if (0.0 == r0[0])  return GL_FALSE;
+
+   /* eliminate first variable     */
+   m1 = r1[0]/r0[0]; m2 = r2[0]/r0[0]; m3 = r3[0]/r0[0];
+   s = r0[1]; r1[1] -= m1 * s; r2[1] -= m2 * s; r3[1] -= m3 * s;
+   s = r0[2]; r1[2] -= m1 * s; r2[2] -= m2 * s; r3[2] -= m3 * s;
+   s = r0[3]; r1[3] -= m1 * s; r2[3] -= m2 * s; r3[3] -= m3 * s;
+   s = r0[4];
+   if (s != 0.0) { r1[4] -= m1 * s; r2[4] -= m2 * s; r3[4] -= m3 * s; }
+   s = r0[5];
+   if (s != 0.0) { r1[5] -= m1 * s; r2[5] -= m2 * s; r3[5] -= m3 * s; }
+   s = r0[6];
+   if (s != 0.0) { r1[6] -= m1 * s; r2[6] -= m2 * s; r3[6] -= m3 * s; }
+   s = r0[7];
+   if (s != 0.0) { r1[7] -= m1 * s; r2[7] -= m2 * s; r3[7] -= m3 * s; }
+
+   /* choose pivot - or die */
+   if (FABSF(r3[1])>FABSF(r2[1])) SWAP_ROWS(r3, r2);
+   if (FABSF(r2[1])>FABSF(r1[1])) SWAP_ROWS(r2, r1);
+   if (0.0 == r1[1])  return GL_FALSE;
+
+   /* eliminate second variable */
+   m2 = r2[1]/r1[1]; m3 = r3[1]/r1[1];
+   r2[2] -= m2 * r1[2]; r3[2] -= m3 * r1[2];
+   r2[3] -= m2 * r1[3]; r3[3] -= m3 * r1[3];
+   s = r1[4]; if (0.0 != s) { r2[4] -= m2 * s; r3[4] -= m3 * s; }
+   s = r1[5]; if (0.0 != s) { r2[5] -= m2 * s; r3[5] -= m3 * s; }
+   s = r1[6]; if (0.0 != s) { r2[6] -= m2 * s; r3[6] -= m3 * s; }
+   s = r1[7]; if (0.0 != s) { r2[7] -= m2 * s; r3[7] -= m3 * s; }
+
+   /* choose pivot - or die */
+   if (FABSF(r3[2])>FABSF(r2[2])) SWAP_ROWS(r3, r2);
+   if (0.0 == r2[2])  return GL_FALSE;
+
+   /* eliminate third variable */
+   m3 = r3[2]/r2[2];
+   r3[3] -= m3 * r2[3], r3[4] -= m3 * r2[4],
+   r3[5] -= m3 * r2[5], r3[6] -= m3 * r2[6],
+   r3[7] -= m3 * r2[7];
+
+   /* last check */
+   if (0.0 == r3[3]) return GL_FALSE;
+
+   s = 1.0F/r3[3];             /* now back substitute row 3 */
+   r3[4] *= s; r3[5] *= s; r3[6] *= s; r3[7] *= s;
+
+   m2 = r2[3];                 /* now back substitute row 2 */
+   s  = 1.0F/r2[2];
+   r2[4] = s * (r2[4] - r3[4] * m2), r2[5] = s * (r2[5] - r3[5] * m2),
+   r2[6] = s * (r2[6] - r3[6] * m2), r2[7] = s * (r2[7] - r3[7] * m2);
+   m1 = r1[3];
+   r1[4] -= r3[4] * m1, r1[5] -= r3[5] * m1,
+   r1[6] -= r3[6] * m1, r1[7] -= r3[7] * m1;
+   m0 = r0[3];
+   r0[4] -= r3[4] * m0, r0[5] -= r3[5] * m0,
+   r0[6] -= r3[6] * m0, r0[7] -= r3[7] * m0;
+
+   m1 = r1[2];                 /* now back substitute row 1 */
+   s  = 1.0F/r1[1];
+   r1[4] = s * (r1[4] - r2[4] * m1), r1[5] = s * (r1[5] - r2[5] * m1),
+   r1[6] = s * (r1[6] - r2[6] * m1), r1[7] = s * (r1[7] - r2[7] * m1);
+   m0 = r0[2];
+   r0[4] -= r2[4] * m0, r0[5] -= r2[5] * m0,
+   r0[6] -= r2[6] * m0, r0[7] -= r2[7] * m0;
+
+   m0 = r0[1];                 /* now back substitute row 0 */
+   s  = 1.0F/r0[0];
+   r0[4] = s * (r0[4] - r1[4] * m0), r0[5] = s * (r0[5] - r1[5] * m0),
+   r0[6] = s * (r0[6] - r1[6] * m0), r0[7] = s * (r0[7] - r1[7] * m0);
+
+   MAT(out,0,0) = r0[4]; MAT(out,0,1) = r0[5],
+   MAT(out,0,2) = r0[6]; MAT(out,0,3) = r0[7],
+   MAT(out,1,0) = r1[4]; MAT(out,1,1) = r1[5],
+   MAT(out,1,2) = r1[6]; MAT(out,1,3) = r1[7],
+   MAT(out,2,0) = r2[4]; MAT(out,2,1) = r2[5],
+   MAT(out,2,2) = r2[6]; MAT(out,2,3) = r2[7],
+   MAT(out,3,0) = r3[4]; MAT(out,3,1) = r3[5],
+   MAT(out,3,2) = r3[6]; MAT(out,3,3) = r3[7];
+
+   return GL_TRUE;
+}
+#undef SWAP_ROWS
+
+/**
+ * Compute inverse of a general 3d transformation matrix.
+ * 
+ * \param mat pointer to a GLmatrix structure. The matrix inverse will be
+ * stored in the GLmatrix::inv attribute.
+ * 
+ * \return GL_TRUE for success, GL_FALSE for failure (\p singular matrix).
+ *
+ * \author Adapted from graphics gems II.
+ *
+ * Calculates the inverse of the upper-left 3x3 submatrix by computing its
+ * determinant and scaling each element's cofactor (the adjugate) by the
+ * reciprocal of that determinant.  Finally deals with the translation part
+ * by transforming the original translation vector with the calculated
+ * submatrix inverse.
+ */
+static GLboolean invert_matrix_3d_general( GLmatrix *mat )
+{
+   const GLfloat *in = mat->m;
+   GLfloat *out = mat->inv;
+   GLfloat pos, neg, t;
+   GLfloat det;
+
+   /* Calculate the determinant of upper left 3x3 submatrix and
+    * determine if the matrix is singular.
+    */
+   pos = neg = 0.0;
+   t =  MAT(in,0,0) * MAT(in,1,1) * MAT(in,2,2);
+   if (t >= 0.0) pos += t; else neg += t;
+
+   t =  MAT(in,1,0) * MAT(in,2,1) * MAT(in,0,2);
+   if (t >= 0.0) pos += t; else neg += t;
+
+   t =  MAT(in,2,0) * MAT(in,0,1) * MAT(in,1,2);
+   if (t >= 0.0) pos += t; else neg += t;
+
+   t = -MAT(in,2,0) * MAT(in,1,1) * MAT(in,0,2);
+   if (t >= 0.0) pos += t; else neg += t;
+
+   t = -MAT(in,1,0) * MAT(in,0,1) * MAT(in,2,2);
+   if (t >= 0.0) pos += t; else neg += t;
+
+   t = -MAT(in,0,0) * MAT(in,2,1) * MAT(in,1,2);
+   if (t >= 0.0) pos += t; else neg += t;
+
+   det = pos + neg;
+
+   if (FABSF(det) < 1e-25)
+      return GL_FALSE;
+
+   det = 1.0F / det;
+   MAT(out,0,0) = (  (MAT(in,1,1)*MAT(in,2,2) - MAT(in,2,1)*MAT(in,1,2) )*det);
+   MAT(out,0,1) = (- (MAT(in,0,1)*MAT(in,2,2) - MAT(in,2,1)*MAT(in,0,2) )*det);
+   MAT(out,0,2) = (  (MAT(in,0,1)*MAT(in,1,2) - MAT(in,1,1)*MAT(in,0,2) )*det);
+   MAT(out,1,0) = (- (MAT(in,1,0)*MAT(in,2,2) - MAT(in,2,0)*MAT(in,1,2) )*det);
+   MAT(out,1,1) = (  (MAT(in,0,0)*MAT(in,2,2) - MAT(in,2,0)*MAT(in,0,2) )*det);
+   MAT(out,1,2) = (- (MAT(in,0,0)*MAT(in,1,2) - MAT(in,1,0)*MAT(in,0,2) )*det);
+   MAT(out,2,0) = (  (MAT(in,1,0)*MAT(in,2,1) - MAT(in,2,0)*MAT(in,1,1) )*det);
+   MAT(out,2,1) = (- (MAT(in,0,0)*MAT(in,2,1) - MAT(in,2,0)*MAT(in,0,1) )*det);
+   MAT(out,2,2) = (  (MAT(in,0,0)*MAT(in,1,1) - MAT(in,1,0)*MAT(in,0,1) )*det);
+
+   /* Do the translation part */
+   MAT(out,0,3) = - (MAT(in,0,3) * MAT(out,0,0) +
+		     MAT(in,1,3) * MAT(out,0,1) +
+		     MAT(in,2,3) * MAT(out,0,2) );
+   MAT(out,1,3) = - (MAT(in,0,3) * MAT(out,1,0) +
+		     MAT(in,1,3) * MAT(out,1,1) +
+		     MAT(in,2,3) * MAT(out,1,2) );
+   MAT(out,2,3) = - (MAT(in,0,3) * MAT(out,2,0) +
+		     MAT(in,1,3) * MAT(out,2,1) +
+		     MAT(in,2,3) * MAT(out,2,2) );
+
+   return GL_TRUE;
+}
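+
+/* In matrix terms, a sketch of the computation above: writing the input as
+ * M = [ A  t; 0  1 ], the code forms A^-1 = adj(A) / det(A) for the
+ * upper-left 3x3 block and then fills in the translation column as
+ * t' = -(A^-1) * t.
+ */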
+
+/**
+ * Compute inverse of a 3d transformation matrix.
+ * 
+ * \param mat pointer to a GLmatrix structure. The matrix inverse will be
+ * stored in the GLmatrix::inv attribute.
+ * 
+ * \return GL_TRUE for success, GL_FALSE for failure (\p singular matrix).
+ *
+ * If the matrix is not an angle preserving matrix then calls
+ * invert_matrix_3d_general for the actual calculation. Otherwise calculates
+ * the inverse matrix analyzing and inverting each of the scaling, rotation and
+ * translation parts.
+ */
+static GLboolean invert_matrix_3d( GLmatrix *mat )
+{
+   const GLfloat *in = mat->m;
+   GLfloat *out = mat->inv;
+
+   if (!TEST_MAT_FLAGS(mat, MAT_FLAGS_ANGLE_PRESERVING)) {
+      return invert_matrix_3d_general( mat );
+   }
+
+   if (mat->flags & MAT_FLAG_UNIFORM_SCALE) {
+      GLfloat scale = (MAT(in,0,0) * MAT(in,0,0) +
+                       MAT(in,0,1) * MAT(in,0,1) +
+                       MAT(in,0,2) * MAT(in,0,2));
+
+      if (scale == 0.0)
+         return GL_FALSE;
+
+      scale = 1.0F / scale;
+
+      /* Transpose and scale the 3 by 3 upper-left submatrix. */
+      MAT(out,0,0) = scale * MAT(in,0,0);
+      MAT(out,1,0) = scale * MAT(in,0,1);
+      MAT(out,2,0) = scale * MAT(in,0,2);
+      MAT(out,0,1) = scale * MAT(in,1,0);
+      MAT(out,1,1) = scale * MAT(in,1,1);
+      MAT(out,2,1) = scale * MAT(in,1,2);
+      MAT(out,0,2) = scale * MAT(in,2,0);
+      MAT(out,1,2) = scale * MAT(in,2,1);
+      MAT(out,2,2) = scale * MAT(in,2,2);
+   }
+   else if (mat->flags & MAT_FLAG_ROTATION) {
+      /* Transpose the 3 by 3 upper-left submatrix. */
+      MAT(out,0,0) = MAT(in,0,0);
+      MAT(out,1,0) = MAT(in,0,1);
+      MAT(out,2,0) = MAT(in,0,2);
+      MAT(out,0,1) = MAT(in,1,0);
+      MAT(out,1,1) = MAT(in,1,1);
+      MAT(out,2,1) = MAT(in,1,2);
+      MAT(out,0,2) = MAT(in,2,0);
+      MAT(out,1,2) = MAT(in,2,1);
+      MAT(out,2,2) = MAT(in,2,2);
+   }
+   else {
+      /* pure translation */
+      memcpy( out, Identity, sizeof(Identity) );
+      MAT(out,0,3) = - MAT(in,0,3);
+      MAT(out,1,3) = - MAT(in,1,3);
+      MAT(out,2,3) = - MAT(in,2,3);
+      return GL_TRUE;
+   }
+
+   if (mat->flags & MAT_FLAG_TRANSLATION) {
+      /* Do the translation part */
+      MAT(out,0,3) = - (MAT(in,0,3) * MAT(out,0,0) +
+			MAT(in,1,3) * MAT(out,0,1) +
+			MAT(in,2,3) * MAT(out,0,2) );
+      MAT(out,1,3) = - (MAT(in,0,3) * MAT(out,1,0) +
+			MAT(in,1,3) * MAT(out,1,1) +
+			MAT(in,2,3) * MAT(out,1,2) );
+      MAT(out,2,3) = - (MAT(in,0,3) * MAT(out,2,0) +
+			MAT(in,1,3) * MAT(out,2,1) +
+			MAT(in,2,3) * MAT(out,2,2) );
+   }
+   else {
+      MAT(out,0,3) = MAT(out,1,3) = MAT(out,2,3) = 0.0;
+   }
+
+   return GL_TRUE;
+}
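+
+/* The uniform-scale branch above relies on (s*R)^-1 == (1/s^2) * (s*R)^T
+ * for a rotation R scaled by s: each row of the input has squared length
+ * s^2, which is exactly the 'scale' value the code divides by.
+ */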
+
+/**
+ * Compute inverse of an identity transformation matrix.
+ * 
+ * \param mat pointer to a GLmatrix structure. The matrix inverse will be
+ * stored in the GLmatrix::inv attribute.
+ * 
+ * \return always GL_TRUE.
+ *
+ * Simply copies Identity into GLmatrix::inv.
+ */
+static GLboolean invert_matrix_identity( GLmatrix *mat )
+{
+   memcpy( mat->inv, Identity, sizeof(Identity) );
+   return GL_TRUE;
+}
+
+/**
+ * Compute inverse of a no-rotation 3d transformation matrix.
+ * 
+ * \param mat pointer to a GLmatrix structure. The matrix inverse will be
+ * stored in the GLmatrix::inv attribute.
+ * 
+ * \return GL_TRUE for success, GL_FALSE for failure (\p singular matrix).
+ *
+ * Calculates the inverse by taking the reciprocal of each diagonal scale
+ * factor and, when the matrix carries a translation, applying the inverse
+ * translation as well.
+ */
+static GLboolean invert_matrix_3d_no_rot( GLmatrix *mat )
+{
+   const GLfloat *in = mat->m;
+   GLfloat *out = mat->inv;
+
+   if (MAT(in,0,0) == 0 || MAT(in,1,1) == 0 || MAT(in,2,2) == 0 )
+      return GL_FALSE;
+
+   memcpy( out, Identity, 16 * sizeof(GLfloat) );
+   MAT(out,0,0) = 1.0F / MAT(in,0,0);
+   MAT(out,1,1) = 1.0F / MAT(in,1,1);
+   MAT(out,2,2) = 1.0F / MAT(in,2,2);
+
+   if (mat->flags & MAT_FLAG_TRANSLATION) {
+      MAT(out,0,3) = - (MAT(in,0,3) * MAT(out,0,0));
+      MAT(out,1,3) = - (MAT(in,1,3) * MAT(out,1,1));
+      MAT(out,2,3) = - (MAT(in,2,3) * MAT(out,2,2));
+   }
+
+   return GL_TRUE;
+}
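+
+/* Example: a matrix scaling by (2, 4, 8) with translation (1, 0, 0)
+ * inverts to a scale of (0.5, 0.25, 0.125) with translation (-0.5, 0, 0).
+ */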
+
+/**
+ * Compute inverse of a no-rotation 2d transformation matrix.
+ * 
+ * \param mat pointer to a GLmatrix structure. The matrix inverse will be
+ * stored in the GLmatrix::inv attribute.
+ * 
+ * \return GL_TRUE for success, GL_FALSE for failure (\p singular matrix).
+ *
+ * Calculates the inverse matrix by applying the inverse scaling and
+ * translation to the identity matrix.
+ */
+static GLboolean invert_matrix_2d_no_rot( GLmatrix *mat )
+{
+   const GLfloat *in = mat->m;
+   GLfloat *out = mat->inv;
+
+   if (MAT(in,0,0) == 0 || MAT(in,1,1) == 0)
+      return GL_FALSE;
+
+   memcpy( out, Identity, 16 * sizeof(GLfloat) );
+   MAT(out,0,0) = 1.0F / MAT(in,0,0);
+   MAT(out,1,1) = 1.0F / MAT(in,1,1);
+
+   if (mat->flags & MAT_FLAG_TRANSLATION) {
+      MAT(out,0,3) = - (MAT(in,0,3) * MAT(out,0,0));
+      MAT(out,1,3) = - (MAT(in,1,3) * MAT(out,1,1));
+   }
+
+   return GL_TRUE;
+}
+
+#if 0
+/* broken */
+static GLboolean invert_matrix_perspective( GLmatrix *mat )
+{
+   const GLfloat *in = mat->m;
+   GLfloat *out = mat->inv;
+
+   if (MAT(in,2,3) == 0)
+      return GL_FALSE;
+
+   memcpy( out, Identity, 16 * sizeof(GLfloat) );
+
+   MAT(out,0,0) = 1.0F / MAT(in,0,0);
+   MAT(out,1,1) = 1.0F / MAT(in,1,1);
+
+   MAT(out,0,3) = MAT(in,0,2);
+   MAT(out,1,3) = MAT(in,1,2);
+
+   MAT(out,2,2) = 0;
+   MAT(out,2,3) = -1;
+
+   MAT(out,3,2) = 1.0F / MAT(in,2,3);
+   MAT(out,3,3) = MAT(in,2,2) * MAT(out,3,2);
+
+   return GL_TRUE;
+}
+#endif
+
+/**
+ * Matrix inversion function pointer type.
+ */
+typedef GLboolean (*inv_mat_func)( GLmatrix *mat );
+
+/**
+ * Table of the matrix inversion functions according to the matrix type.
+ */
+static inv_mat_func inv_mat_tab[7] = {
+   invert_matrix_general,
+   invert_matrix_identity,
+   invert_matrix_3d_no_rot,
+#if 0
+   /* Don't use this function for now - it fails when the projection matrix
+    * is premultiplied by a translation (ala Chromium's tilesort SPU).
+    */
+   invert_matrix_perspective,
+#else
+   invert_matrix_general,
+#endif
+   invert_matrix_3d,		/* lazy! */
+   invert_matrix_2d_no_rot,
+   invert_matrix_3d
+};
+
+/**
+ * Compute inverse of a transformation matrix.
+ * 
+ * \param mat pointer to a GLmatrix structure. The matrix inverse will be
+ * stored in the GLmatrix::inv attribute.
+ * 
+ * \return GL_TRUE for success, GL_FALSE for failure (\p singular matrix).
+ *
+ * Calls the matrix inversion function in inv_mat_tab corresponding to the
+ * given matrix type.  In case of failure, updates the MAT_FLAG_SINGULAR flag,
+ * and copies the identity matrix into GLmatrix::inv.
+ */
+static GLboolean matrix_invert( GLmatrix *mat )
+{
+   if (inv_mat_tab[mat->type](mat)) {
+      mat->flags &= ~MAT_FLAG_SINGULAR;
+      return GL_TRUE;
+   } else {
+      mat->flags |= MAT_FLAG_SINGULAR;
+      memcpy( mat->inv, Identity, sizeof(Identity) );
+      return GL_FALSE;
+   }
+}
+
+/*@}*/
+
+
+/**********************************************************************/
+/** \name Matrix generation */
+/*@{*/
+
+/**
+ * Generate a 4x4 transformation matrix from glRotate parameters, and
+ * post-multiply the input matrix by it.
+ *
+ * \author
+ * This function was contributed by Erich Boleyn (erich@uruk.org).
+ * Optimizations contributed by Rudolf Opalla (rudi@khm.de).
+ */
+void
+_math_matrix_rotate( GLmatrix *mat,
+		     GLfloat angle, GLfloat x, GLfloat y, GLfloat z )
+{
+   GLfloat xx, yy, zz, xy, yz, zx, xs, ys, zs, one_c, s, c;
+   GLfloat m[16];
+   GLboolean optimized;
+
+   s = (GLfloat) sin( angle * DEG2RAD );
+   c = (GLfloat) cos( angle * DEG2RAD );
+
+   memcpy(m, Identity, sizeof(GLfloat)*16);
+   optimized = GL_FALSE;
+
+#define M(row,col)  m[col*4+row]
+
+   if (x == 0.0F) {
+      if (y == 0.0F) {
+         if (z != 0.0F) {
+            optimized = GL_TRUE;
+            /* rotate only around z-axis */
+            M(0,0) = c;
+            M(1,1) = c;
+            if (z < 0.0F) {
+               M(0,1) = s;
+               M(1,0) = -s;
+            }
+            else {
+               M(0,1) = -s;
+               M(1,0) = s;
+            }
+         }
+      }
+      else if (z == 0.0F) {
+         optimized = GL_TRUE;
+         /* rotate only around y-axis */
+         M(0,0) = c;
+         M(2,2) = c;
+         if (y < 0.0F) {
+            M(0,2) = -s;
+            M(2,0) = s;
+         }
+         else {
+            M(0,2) = s;
+            M(2,0) = -s;
+         }
+      }
+   }
+   else if (y == 0.0F) {
+      if (z == 0.0F) {
+         optimized = GL_TRUE;
+         /* rotate only around x-axis */
+         M(1,1) = c;
+         M(2,2) = c;
+         if (x < 0.0F) {
+            M(1,2) = s;
+            M(2,1) = -s;
+         }
+         else {
+            M(1,2) = -s;
+            M(2,1) = s;
+         }
+      }
+   }
+
+   if (!optimized) {
+      const GLfloat mag = sqrtf(x * x + y * y + z * z);
+
+      if (mag <= 1.0e-4) {
+         /* no rotation, leave mat as-is */
+         return;
+      }
+
+      x /= mag;
+      y /= mag;
+      z /= mag;
+
+
+      /*
+       *     Arbitrary axis rotation matrix.
+       *
+       *  This is composed of 5 matrices, Rz, Ry, T, Ry', Rz', multiplied
+       *  like so:  Rz * Ry * T * Ry' * Rz'.  T is the final rotation
+       *  (which is about the X-axis), and the two composite transforms
+       *  Ry' * Rz' and Rz * Ry are (respectively) the rotations necessary
+       *  from the arbitrary axis to the X-axis then back.  They are
+       *  all elementary rotations.
+       *
+       *  Rz' is a rotation about the Z-axis, to bring the axis vector
+       *  into the x-z plane.  Then Ry' is applied, rotating about the
+       *  Y-axis to bring the axis vector parallel with the X-axis.  The
+       *  rotation about the X-axis is then performed.  Ry and Rz are
+       *  simply the respective inverse transforms to bring the arbitrary
+       *  axis back to its original orientation.  The first transforms
+       *  Rz' and Ry' are considered inverses, since the data from the
+       *  arbitrary axis gives you info on how to get to it, not how
+       *  to get away from it, and an inverse must be applied.
+       *
+       *  The basic calculation used is to recognize that the arbitrary
+       *  axis vector (x, y, z), since it is of unit length, actually
+       *  represents the sines and cosines of the angles to rotate the
+       *  X-axis to the same orientation, with theta being the angle about
+       *  Z and phi the angle about Y (in the order described above)
+       *  as follows:
+       *
+       *  cos ( theta ) = x / sqrt ( 1 - z^2 )
+       *  sin ( theta ) = y / sqrt ( 1 - z^2 )
+       *
+       *  cos ( phi ) = sqrt ( 1 - z^2 )
+       *  sin ( phi ) = z
+       *
+       *  Note that cos ( phi ) can further be substituted into the above
+       *  formulas:
+       *
+       *  cos ( theta ) = x / cos ( phi )
+       *  sin ( theta ) = y / cos ( phi )
+       *
+       *  ...etc.  Because of those relations and the standard trigonometric
+       *  relations, it is possible to reduce the transforms down to what
+       *  is used below.  It may be that any primary axis chosen will give the
+       *  same results (modulo a sign convention) using this method.
+       *
+       *  Particularly nice is to notice that all divisions that might
+       *  have caused trouble when parallel to certain planes or
+       *  axes go away with care paid to reducing the expressions.
+       *  After checking, it does perform correctly under all cases, since
+       *  in all the cases of division where the denominator would have
+       *  been zero, the numerator would have been zero as well, giving
+       *  the expected result.
+       */
+
+      xx = x * x;
+      yy = y * y;
+      zz = z * z;
+      xy = x * y;
+      yz = y * z;
+      zx = z * x;
+      xs = x * s;
+      ys = y * s;
+      zs = z * s;
+      one_c = 1.0F - c;
+
+      /* We already hold the identity-matrix so we can skip some statements */
+      M(0,0) = (one_c * xx) + c;
+      M(0,1) = (one_c * xy) - zs;
+      M(0,2) = (one_c * zx) + ys;
+/*    M(0,3) = 0.0F; */
+
+      M(1,0) = (one_c * xy) + zs;
+      M(1,1) = (one_c * yy) + c;
+      M(1,2) = (one_c * yz) - xs;
+/*    M(1,3) = 0.0F; */
+
+      M(2,0) = (one_c * zx) - ys;
+      M(2,1) = (one_c * yz) + xs;
+      M(2,2) = (one_c * zz) + c;
+/*    M(2,3) = 0.0F; */
+
+/*
+      M(3,0) = 0.0F;
+      M(3,1) = 0.0F;
+      M(3,2) = 0.0F;
+      M(3,3) = 1.0F;
+*/
+   }
+#undef M
+
+   matrix_multf( mat, m, MAT_FLAG_ROTATION );
+}
+
+/**
+ * Apply a perspective projection matrix.
+ *
+ * \param mat matrix to apply the projection.
+ * \param left left clipping plane coordinate.
+ * \param right right clipping plane coordinate.
+ * \param bottom bottom clipping plane coordinate.
+ * \param top top clipping plane coordinate.
+ * \param nearval distance to the near clipping plane.
+ * \param farval distance to the far clipping plane.
+ *
+ * Creates the projection matrix and multiplies it with \p mat, marking the
+ * MAT_FLAG_PERSPECTIVE flag.
+ */
+void
+_math_matrix_frustum( GLmatrix *mat,
+		      GLfloat left, GLfloat right,
+		      GLfloat bottom, GLfloat top,
+		      GLfloat nearval, GLfloat farval )
+{
+   GLfloat x, y, a, b, c, d;
+   GLfloat m[16];
+
+   x = (2.0F*nearval) / (right-left);
+   y = (2.0F*nearval) / (top-bottom);
+   a = (right+left) / (right-left);
+   b = (top+bottom) / (top-bottom);
+   c = -(farval+nearval) / ( farval-nearval);
+   d = -(2.0F*farval*nearval) / (farval-nearval);  /* matches the glFrustum spec */
+
+#define M(row,col)  m[col*4+row]
+   M(0,0) = x;     M(0,1) = 0.0F;  M(0,2) = a;      M(0,3) = 0.0F;
+   M(1,0) = 0.0F;  M(1,1) = y;     M(1,2) = b;      M(1,3) = 0.0F;
+   M(2,0) = 0.0F;  M(2,1) = 0.0F;  M(2,2) = c;      M(2,3) = d;
+   M(3,0) = 0.0F;  M(3,1) = 0.0F;  M(3,2) = -1.0F;  M(3,3) = 0.0F;
+#undef M
+
+   matrix_multf( mat, m, MAT_FLAG_PERSPECTIVE );
+}
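+
+/* The generated matrix is the standard glFrustum matrix; for a symmetric
+ * frustum (left == -right, bottom == -top) the terms a and b collapse to
+ * zero and only the x/y scales and the depth terms c and d remain.
+ */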
+
+/**
+ * Apply an orthographic projection matrix.
+ *
+ * \param mat matrix to apply the projection.
+ * \param left left clipping plane coordinate.
+ * \param right right clipping plane coordinate.
+ * \param bottom bottom clipping plane coordinate.
+ * \param top top clipping plane coordinate.
+ * \param nearval distance to the near clipping plane.
+ * \param farval distance to the far clipping plane.
+ *
+ * Creates the projection matrix and multiplies it with \p mat, marking the
+ * MAT_FLAG_GENERAL_SCALE and MAT_FLAG_TRANSLATION flags.
+ */
+void
+_math_matrix_ortho( GLmatrix *mat,
+		    GLfloat left, GLfloat right,
+		    GLfloat bottom, GLfloat top,
+		    GLfloat nearval, GLfloat farval )
+{
+   GLfloat m[16];
+
+#define M(row,col)  m[col*4+row]
+   M(0,0) = 2.0F / (right-left);
+   M(0,1) = 0.0F;
+   M(0,2) = 0.0F;
+   M(0,3) = -(right+left) / (right-left);
+
+   M(1,0) = 0.0F;
+   M(1,1) = 2.0F / (top-bottom);
+   M(1,2) = 0.0F;
+   M(1,3) = -(top+bottom) / (top-bottom);
+
+   M(2,0) = 0.0F;
+   M(2,1) = 0.0F;
+   M(2,2) = -2.0F / (farval-nearval);
+   M(2,3) = -(farval+nearval) / (farval-nearval);
+
+   M(3,0) = 0.0F;
+   M(3,1) = 0.0F;
+   M(3,2) = 0.0F;
+   M(3,3) = 1.0F;
+#undef M
+
+   matrix_multf( mat, m, (MAT_FLAG_GENERAL_SCALE|MAT_FLAG_TRANSLATION));
+}
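+
+/* Example: _math_matrix_ortho(mat, 0, w, 0, h, -1, 1) maps x from [0, w]
+ * and y from [0, h] onto the [-1, 1] NDC range, as glOrtho would.
+ */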
+
+/**
+ * Multiply a matrix with a general scaling matrix.
+ *
+ * \param mat matrix.
+ * \param x x axis scale factor.
+ * \param y y axis scale factor.
+ * \param z z axis scale factor.
+ *
+ * Multiplies in-place the elements of \p mat by the scale factors. Checks if
+ * the scales factors are roughly the same, marking the MAT_FLAG_UNIFORM_SCALE
+ * flag, or MAT_FLAG_GENERAL_SCALE. Marks the MAT_DIRTY_TYPE and
+ * MAT_DIRTY_INVERSE dirty flags.
+ */
+void
+_math_matrix_scale( GLmatrix *mat, GLfloat x, GLfloat y, GLfloat z )
+{
+   GLfloat *m = mat->m;
+   m[0] *= x;   m[4] *= y;   m[8]  *= z;
+   m[1] *= x;   m[5] *= y;   m[9]  *= z;
+   m[2] *= x;   m[6] *= y;   m[10] *= z;
+   m[3] *= x;   m[7] *= y;   m[11] *= z;
+
+   if (FABSF(x - y) < 1e-8 && FABSF(x - z) < 1e-8)
+      mat->flags |= MAT_FLAG_UNIFORM_SCALE;
+   else
+      mat->flags |= MAT_FLAG_GENERAL_SCALE;
+
+   mat->flags |= (MAT_DIRTY_TYPE |
+		  MAT_DIRTY_INVERSE);
+}
+
+/**
+ * Multiply a matrix with a translation matrix.
+ *
+ * \param mat matrix.
+ * \param x translation vector x coordinate.
+ * \param y translation vector y coordinate.
+ * \param z translation vector z coordinate.
+ *
+ * Adds the translation coordinates to the elements of \p mat in-place.  Marks
+ * the MAT_FLAG_TRANSLATION flag, and the MAT_DIRTY_TYPE and MAT_DIRTY_INVERSE
+ * dirty flags.
+ */
+void
+_math_matrix_translate( GLmatrix *mat, GLfloat x, GLfloat y, GLfloat z )
+{
+   GLfloat *m = mat->m;
+   m[12] = m[0] * x + m[4] * y + m[8]  * z + m[12];
+   m[13] = m[1] * x + m[5] * y + m[9]  * z + m[13];
+   m[14] = m[2] * x + m[6] * y + m[10] * z + m[14];
+   m[15] = m[3] * x + m[7] * y + m[11] * z + m[15];
+
+   mat->flags |= (MAT_FLAG_TRANSLATION |
+		  MAT_DIRTY_TYPE |
+		  MAT_DIRTY_INVERSE);
+}
+
+
+/**
+ * Set matrix to do viewport and depthrange mapping.
+ * Transforms Normalized Device Coords to window/Z values.
+ */
+void
+_math_matrix_viewport(GLmatrix *m, GLfloat x, GLfloat y,
+                      GLfloat width, GLfloat height,
+                      GLdouble zNear, GLdouble zFar, GLdouble depthMax)
+{
+   m->m[MAT_SX] = width / 2.0F;
+   m->m[MAT_TX] = m->m[MAT_SX] + x;
+   m->m[MAT_SY] = height / 2.0F;
+   m->m[MAT_TY] = m->m[MAT_SY] + y;
+   m->m[MAT_SZ] = (GLfloat) (depthMax * ((zFar - zNear) / 2.0));
+   m->m[MAT_TZ] = (GLfloat) (depthMax * ((zFar - zNear) / 2.0 + zNear));
+   m->flags = MAT_FLAG_GENERAL_SCALE | MAT_FLAG_TRANSLATION;
+   m->type = MATRIX_3D_NO_ROT;
+}
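+
+/* Example: with x == 0, y == 0, width == 640, height == 480, zNear == 0,
+ * zFar == 1 and depthMax == 65535, NDC x in [-1, 1] maps to window
+ * [0, 640] and NDC z in [-1, 1] maps to [0, 65535].
+ */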
+
+
+/**
+ * Set a matrix to the identity matrix.
+ *
+ * \param mat matrix.
+ *
+ * Copies ::Identity into GLmatrix::m and GLmatrix::inv.
+ * Sets the matrix type to identity, and clears the dirty flags.
+ */
+void
+_math_matrix_set_identity( GLmatrix *mat )
+{
+   memcpy( mat->m, Identity, 16*sizeof(GLfloat) );
+   memcpy( mat->inv, Identity, 16*sizeof(GLfloat) );
+
+   mat->type = MATRIX_IDENTITY;
+   mat->flags &= ~(MAT_DIRTY_FLAGS|
+		   MAT_DIRTY_TYPE|
+		   MAT_DIRTY_INVERSE);
+}
+
+/*@}*/
+
+
+/**********************************************************************/
+/** \name Matrix analysis */
+/*@{*/
+
+#define ZERO(x) (1<<x)
+#define ONE(x)  (1<<(x+16))
+
+#define MASK_NO_TRX      (ZERO(12) | ZERO(13) | ZERO(14))
+#define MASK_NO_2D_SCALE ( ONE(0)  | ONE(5))
+
+#define MASK_IDENTITY    ( ONE(0)  | ZERO(4)  | ZERO(8)  | ZERO(12) |\
+			  ZERO(1)  |  ONE(5)  | ZERO(9)  | ZERO(13) |\
+			  ZERO(2)  | ZERO(6)  |  ONE(10) | ZERO(14) |\
+			  ZERO(3)  | ZERO(7)  | ZERO(11) |  ONE(15) )
+
+#define MASK_2D_NO_ROT   (           ZERO(4)  | ZERO(8)  |           \
+			  ZERO(1)  |            ZERO(9)  |           \
+			  ZERO(2)  | ZERO(6)  |  ONE(10) | ZERO(14) |\
+			  ZERO(3)  | ZERO(7)  | ZERO(11) |  ONE(15) )
+
+#define MASK_2D          (                      ZERO(8)  |           \
+			                        ZERO(9)  |           \
+			  ZERO(2)  | ZERO(6)  |  ONE(10) | ZERO(14) |\
+			  ZERO(3)  | ZERO(7)  | ZERO(11) |  ONE(15) )
+
+
+#define MASK_3D_NO_ROT   (           ZERO(4)  | ZERO(8)  |           \
+			  ZERO(1)  |            ZERO(9)  |           \
+			  ZERO(2)  | ZERO(6)  |                      \
+			  ZERO(3)  | ZERO(7)  | ZERO(11) |  ONE(15) )
+
+#define MASK_3D          (                                           \
+			                                             \
+			                                             \
+			  ZERO(3)  | ZERO(7)  | ZERO(11) |  ONE(15) )
+
+
+#define MASK_PERSPECTIVE (           ZERO(4)  |            ZERO(12) |\
+			  ZERO(1)  |                       ZERO(13) |\
+			  ZERO(2)  | ZERO(6)  |                      \
+			  ZERO(3)  | ZERO(7)  |            ZERO(15) )
+
+#define SQ(x) ((x)*(x))
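+
+/* Encoding used by the masks above: bit i of the analysis mask is set when
+ * m[i] == 0.0 and bit (i + 16) when m[i] == 1.0, so MASK_IDENTITY requires
+ * ones on the diagonal (elements 0, 5, 10, 15) and zeros everywhere else.
+ */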
+
+/**
+ * Determine type and flags from scratch.  
+ *
+ * \param mat matrix.
+ * 
+ * This is expensive enough to only want to do it once.
+ */
+static void analyse_from_scratch( GLmatrix *mat )
+{
+   const GLfloat *m = mat->m;
+   GLuint mask = 0;
+   GLuint i;
+
+   for (i = 0 ; i < 16 ; i++) {
+      if (m[i] == 0.0) mask |= (1<<i);
+   }
+
+   if (m[0] == 1.0F) mask |= (1<<16);
+   if (m[5] == 1.0F) mask |= (1<<21);
+   if (m[10] == 1.0F) mask |= (1<<26);
+   if (m[15] == 1.0F) mask |= (1<<31);
+
+   mat->flags &= ~MAT_FLAGS_GEOMETRY;
+
+   /* Check for translation - no-one really cares
+    */
+   if ((mask & MASK_NO_TRX) != MASK_NO_TRX)
+      mat->flags |= MAT_FLAG_TRANSLATION;
+
+   /* Do the real work
+    */
+   if (mask == (GLuint) MASK_IDENTITY) {
+      mat->type = MATRIX_IDENTITY;
+   }
+   else if ((mask & MASK_2D_NO_ROT) == (GLuint) MASK_2D_NO_ROT) {
+      mat->type = MATRIX_2D_NO_ROT;
+
+      if ((mask & MASK_NO_2D_SCALE) != MASK_NO_2D_SCALE)
+	 mat->flags |= MAT_FLAG_GENERAL_SCALE;
+   }
+   else if ((mask & MASK_2D) == (GLuint) MASK_2D) {
+      GLfloat mm = DOT2(m, m);
+      GLfloat m4m4 = DOT2(m+4,m+4);
+      GLfloat mm4 = DOT2(m,m+4);
+
+      mat->type = MATRIX_2D;
+
+      /* Check for scale */
+      if (SQ(mm-1) > SQ(1e-6) ||
+	  SQ(m4m4-1) > SQ(1e-6))
+	 mat->flags |= MAT_FLAG_GENERAL_SCALE;
+
+      /* Check for rotation */
+      if (SQ(mm4) > SQ(1e-6))
+	 mat->flags |= MAT_FLAG_GENERAL_3D;
+      else
+	 mat->flags |= MAT_FLAG_ROTATION;
+
+   }
+   else if ((mask & MASK_3D_NO_ROT) == (GLuint) MASK_3D_NO_ROT) {
+      mat->type = MATRIX_3D_NO_ROT;
+
+      /* Check for scale */
+      if (SQ(m[0]-m[5]) < SQ(1e-6) &&
+	  SQ(m[0]-m[10]) < SQ(1e-6)) {
+	 if (SQ(m[0]-1.0) > SQ(1e-6)) {
+	    mat->flags |= MAT_FLAG_UNIFORM_SCALE;
+         }
+      }
+      else {
+	 mat->flags |= MAT_FLAG_GENERAL_SCALE;
+      }
+   }
+   else if ((mask & MASK_3D) == (GLuint) MASK_3D) {
+      GLfloat c1 = DOT3(m,m);
+      GLfloat c2 = DOT3(m+4,m+4);
+      GLfloat c3 = DOT3(m+8,m+8);
+      GLfloat d1 = DOT3(m, m+4);
+      GLfloat cp[3];
+
+      mat->type = MATRIX_3D;
+
+      /* Check for scale */
+      if (SQ(c1-c2) < SQ(1e-6) && SQ(c1-c3) < SQ(1e-6)) {
+	 if (SQ(c1-1.0) > SQ(1e-6))
+	    mat->flags |= MAT_FLAG_UNIFORM_SCALE;
+	 /* else no scale at all */
+      }
+      else {
+	 mat->flags |= MAT_FLAG_GENERAL_SCALE;
+      }
+
+      /* Check for rotation */
+      if (SQ(d1) < SQ(1e-6)) {
+	 CROSS3( cp, m, m+4 );
+	 SUB_3V( cp, cp, (m+8) );
+	 if (LEN_SQUARED_3FV(cp) < SQ(1e-6))
+	    mat->flags |= MAT_FLAG_ROTATION;
+	 else
+	    mat->flags |= MAT_FLAG_GENERAL_3D;
+      }
+      else {
+	 mat->flags |= MAT_FLAG_GENERAL_3D; /* shear, etc */
+      }
+   }
+   else if ((mask & MASK_PERSPECTIVE) == MASK_PERSPECTIVE && m[11]==-1.0F) {
+      mat->type = MATRIX_PERSPECTIVE;
+      mat->flags |= MAT_FLAG_GENERAL;
+   }
+   else {
+      mat->type = MATRIX_GENERAL;
+      mat->flags |= MAT_FLAG_GENERAL;
+   }
+}
+
+/**
+ * Analyze a matrix given that its flags are accurate.
+ * 
+ * This is the more common operation, hopefully.
+ */
+static void analyse_from_flags( GLmatrix *mat )
+{
+   const GLfloat *m = mat->m;
+
+   if (TEST_MAT_FLAGS(mat, 0)) {
+      mat->type = MATRIX_IDENTITY;
+   }
+   else if (TEST_MAT_FLAGS(mat, (MAT_FLAG_TRANSLATION |
+				 MAT_FLAG_UNIFORM_SCALE |
+				 MAT_FLAG_GENERAL_SCALE))) {
+      if ( m[10]==1.0F && m[14]==0.0F ) {
+	 mat->type = MATRIX_2D_NO_ROT;
+      }
+      else {
+	 mat->type = MATRIX_3D_NO_ROT;
+      }
+   }
+   else if (TEST_MAT_FLAGS(mat, MAT_FLAGS_3D)) {
+      if (                                 m[ 8]==0.0F
+            &&                             m[ 9]==0.0F
+            && m[2]==0.0F && m[6]==0.0F && m[10]==1.0F && m[14]==0.0F) {
+	 mat->type = MATRIX_2D;
+      }
+      else {
+	 mat->type = MATRIX_3D;
+      }
+   }
+   else if (                 m[4]==0.0F                 && m[12]==0.0F
+            && m[1]==0.0F                               && m[13]==0.0F
+            && m[2]==0.0F && m[6]==0.0F
+            && m[3]==0.0F && m[7]==0.0F && m[11]==-1.0F && m[15]==0.0F) {
+      mat->type = MATRIX_PERSPECTIVE;
+   }
+   else {
+      mat->type = MATRIX_GENERAL;
+   }
+}
+
+/**
+ * Analyze and update a matrix.
+ *
+ * \param mat matrix.
+ *
+ * If the matrix type is dirty then calls either analyse_from_scratch() or
+ * analyse_from_flags() to determine its type, according to whether the flags
+ * are dirty or not, respectively. If the matrix has an inverse and it's dirty
+ * then calls matrix_invert(). Finally clears the dirty flags.
+ */
+void
+_math_matrix_analyse( GLmatrix *mat )
+{
+   if (mat->flags & MAT_DIRTY_TYPE) {
+      if (mat->flags & MAT_DIRTY_FLAGS)
+	 analyse_from_scratch( mat );
+      else
+	 analyse_from_flags( mat );
+   }
+
+   if (mat->inv && (mat->flags & MAT_DIRTY_INVERSE)) {
+      matrix_invert( mat );
+      mat->flags &= ~MAT_DIRTY_INVERSE;
+   }
+
+   mat->flags &= ~(MAT_DIRTY_FLAGS | MAT_DIRTY_TYPE);
+}
+
+/*@}*/
+
+
+/**
+ * Test if the given matrix preserves vector lengths.
+ */
+GLboolean
+_math_matrix_is_length_preserving( const GLmatrix *m )
+{
+   return TEST_MAT_FLAGS( m, MAT_FLAGS_LENGTH_PRESERVING);
+}
+
+
+/**
+ * Test if the given matrix does any rotation.
+ * (or perhaps if the upper-left 3x3 is non-identity)
+ */
+GLboolean
+_math_matrix_has_rotation( const GLmatrix *m )
+{
+   if (m->flags & (MAT_FLAG_GENERAL |
+                   MAT_FLAG_ROTATION |
+                   MAT_FLAG_GENERAL_3D |
+                   MAT_FLAG_PERSPECTIVE))
+      return GL_TRUE;
+   else
+      return GL_FALSE;
+}
+
+
+GLboolean
+_math_matrix_is_general_scale( const GLmatrix *m )
+{
+   return (m->flags & MAT_FLAG_GENERAL_SCALE) ? GL_TRUE : GL_FALSE;
+}
+
+
+GLboolean
+_math_matrix_is_dirty( const GLmatrix *m )
+{
+   return (m->flags & MAT_DIRTY) ? GL_TRUE : GL_FALSE;
+}
+
+
+/**********************************************************************/
+/** \name Matrix setup */
+/*@{*/
+
+/**
+ * Copy a matrix.
+ *
+ * \param to destination matrix.
+ * \param from source matrix.
+ *
+ * Copies the matrix array, the inverse array, and the flag and type fields;
+ * both matrices are assumed to have been initialized with _math_matrix_ctr().
+ */
+void
+_math_matrix_copy( GLmatrix *to, const GLmatrix *from )
+{
+   memcpy( to->m, from->m, sizeof(Identity) );
+   memcpy(to->inv, from->inv, 16 * sizeof(GLfloat));
+   to->flags = from->flags;
+   to->type = from->type;
+}
+
+/**
+ * Loads a matrix array into GLmatrix.
+ * 
+ * \param m matrix array.
+ * \param mat matrix.
+ *
+ * Copies \p m into GLmatrix::m and marks the MAT_FLAG_GENERAL and MAT_DIRTY
+ * flags.
+ */
+void
+_math_matrix_loadf( GLmatrix *mat, const GLfloat *m )
+{
+   memcpy( mat->m, m, 16*sizeof(GLfloat) );
+   mat->flags = (MAT_FLAG_GENERAL | MAT_DIRTY);
+}
+
+/**
+ * Matrix constructor.
+ *
+ * \param m matrix.
+ *
+ * Initialize the GLmatrix fields.
+ */
+void
+_math_matrix_ctr( GLmatrix *m )
+{
+   m->m = _mesa_align_malloc( 16 * sizeof(GLfloat), 16 );
+   if (m->m)
+      memcpy( m->m, Identity, sizeof(Identity) );
+   m->inv = _mesa_align_malloc( 16 * sizeof(GLfloat), 16 );
+   if (m->inv)
+      memcpy( m->inv, Identity, sizeof(Identity) );
+   m->type = MATRIX_IDENTITY;
+   m->flags = 0;
+}
+
+/**
+ * Matrix destructor.
+ *
+ * \param m matrix.
+ *
+ * Frees the data in a GLmatrix.
+ */
+void
+_math_matrix_dtr( GLmatrix *m )
+{
+   _mesa_align_free( m->m );
+   m->m = NULL;
+
+   _mesa_align_free( m->inv );
+   m->inv = NULL;
+}
+
+/*@}*/
+
+
+/**********************************************************************/
+/** \name Matrix transpose */
+/*@{*/
+
+/**
+ * Transpose a GLfloat matrix.
+ *
+ * \param to destination array.
+ * \param from source array.
+ */
+void
+_math_transposef( GLfloat to[16], const GLfloat from[16] )
+{
+   to[0] = from[0];
+   to[1] = from[4];
+   to[2] = from[8];
+   to[3] = from[12];
+   to[4] = from[1];
+   to[5] = from[5];
+   to[6] = from[9];
+   to[7] = from[13];
+   to[8] = from[2];
+   to[9] = from[6];
+   to[10] = from[10];
+   to[11] = from[14];
+   to[12] = from[3];
+   to[13] = from[7];
+   to[14] = from[11];
+   to[15] = from[15];
+}
+
+/**
+ * Transpose a GLdouble matrix.
+ *
+ * \param to destination array.
+ * \param from source array.
+ */
+void
+_math_transposed( GLdouble to[16], const GLdouble from[16] )
+{
+   to[0] = from[0];
+   to[1] = from[4];
+   to[2] = from[8];
+   to[3] = from[12];
+   to[4] = from[1];
+   to[5] = from[5];
+   to[6] = from[9];
+   to[7] = from[13];
+   to[8] = from[2];
+   to[9] = from[6];
+   to[10] = from[10];
+   to[11] = from[14];
+   to[12] = from[3];
+   to[13] = from[7];
+   to[14] = from[11];
+   to[15] = from[15];
+}
+
+/**
+ * Transpose a GLdouble matrix and convert to GLfloat.
+ *
+ * \param to destination array.
+ * \param from source array.
+ */
+void
+_math_transposefd( GLfloat to[16], const GLdouble from[16] )
+{
+   to[0] = (GLfloat) from[0];
+   to[1] = (GLfloat) from[4];
+   to[2] = (GLfloat) from[8];
+   to[3] = (GLfloat) from[12];
+   to[4] = (GLfloat) from[1];
+   to[5] = (GLfloat) from[5];
+   to[6] = (GLfloat) from[9];
+   to[7] = (GLfloat) from[13];
+   to[8] = (GLfloat) from[2];
+   to[9] = (GLfloat) from[6];
+   to[10] = (GLfloat) from[10];
+   to[11] = (GLfloat) from[14];
+   to[12] = (GLfloat) from[3];
+   to[13] = (GLfloat) from[7];
+   to[14] = (GLfloat) from[11];
+   to[15] = (GLfloat) from[15];
+}
+
+/*@}*/
+
+
+/**
+ * Transform a 4-element row vector (1x4 matrix) by a 4x4 matrix.  This
+ * function is used for transforming clipping plane equations and spotlight
+ * directions.
+ * Mathematically,  u = v * m.
+ * Input:  v - input vector
+ *         m - transformation matrix
+ * Output:  u - transformed vector
+ */
+void
+_mesa_transform_vector( GLfloat u[4], const GLfloat v[4], const GLfloat m[16] )
+{
+   const GLfloat v0 = v[0], v1 = v[1], v2 = v[2], v3 = v[3];
+#define M(row,col)  m[row + col*4]
+   u[0] = v0 * M(0,0) + v1 * M(1,0) + v2 * M(2,0) + v3 * M(3,0);
+   u[1] = v0 * M(0,1) + v1 * M(1,1) + v2 * M(2,1) + v3 * M(3,1);
+   u[2] = v0 * M(0,2) + v1 * M(1,2) + v2 * M(2,2) + v3 * M(3,2);
+   u[3] = v0 * M(0,3) + v1 * M(1,3) + v2 * M(2,3) + v3 * M(3,3);
+#undef M
+}
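+
+/*
+ * Worked example (illustrative): with m a column-major translation by
+ * (tx, ty, tz), i.e. m[12] = tx, m[13] = ty, m[14] = tz, a plane
+ * v = (a, b, c, d) maps to
+ *
+ *    u = (a, b, c, a*tx + b*ty + c*tz + d)
+ *
+ * so only the distance term picks up the translation, as expected for a
+ * row-vector (plane-equation) transform.
+ */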
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/math/m_matrix.h b/icd/intel/compiler/mesa-utils/src/mesa/math/m_matrix.h
new file mode 100644
index 0000000..dddce70
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/math/m_matrix.h
@@ -0,0 +1,218 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2005  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+/**
+ * \file math/m_matrix.h
+ * Defines basic structures for matrix-handling.
+ */
+
+#ifndef _M_MATRIX_H
+#define _M_MATRIX_H
+
+
+#include "main/glheader.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/**
+ * \name Symbolic names to some of the entries in the matrix
+ *
+ * These are handy for the viewport mapping, which is expressed as a matrix.
+ */
+/*@{*/
+#define MAT_SX 0
+#define MAT_SY 5
+#define MAT_SZ 10
+#define MAT_TX 12
+#define MAT_TY 13
+#define MAT_TZ 14
+/*@}*/
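+
+/*
+ * Illustrative use of the names above (an assumption matching the usual GL
+ * viewport transform, for viewport (x, y, w, h) and depth range [n, f];
+ * _math_matrix_viewport() additionally scales the depth terms by depthMax):
+ *
+ *    m[MAT_SX] = w / 2;          m[MAT_TX] = x + w / 2;
+ *    m[MAT_SY] = h / 2;          m[MAT_TY] = y + h / 2;
+ *    m[MAT_SZ] = (f - n) / 2;    m[MAT_TZ] = (f + n) / 2;
+ */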
+
+
+/**
+ * Different kinds of 4x4 transformation matrices.
+ * We use these to select specific optimized vertex transformation routines.
+ */
+enum GLmatrixtype {
+   MATRIX_GENERAL,	/**< general 4x4 matrix */
+   MATRIX_IDENTITY,	/**< identity matrix */
+   MATRIX_3D_NO_ROT,	/**< orthogonal projection and others... */
+   MATRIX_PERSPECTIVE,	/**< perspective projection matrix */
+   MATRIX_2D,		/**< 2-D transformation */
+   MATRIX_2D_NO_ROT,	/**< 2-D scale & translate only */
+   MATRIX_3D		/**< 3-D transformation */
+};
+
+/**
+ * Matrix type to represent 4x4 transformation matrices.
+ */
+typedef struct {
+   GLfloat *m;		/**< 16 matrix elements (16-byte aligned) */
+   GLfloat *inv;	/**< 16-element inverse (16-byte aligned) */
+   GLuint flags;        /**< mask of \link MatFlags MAT_FLAG_* flags\endlink */
+   enum GLmatrixtype type;
+} GLmatrix;
+
+
+
+
+extern void
+_math_matrix_ctr( GLmatrix *m );
+
+extern void
+_math_matrix_dtr( GLmatrix *m );
+
+extern void
+_math_matrix_mul_matrix( GLmatrix *dest, const GLmatrix *a, const GLmatrix *b );
+
+extern void
+_math_matrix_mul_floats( GLmatrix *dest, const GLfloat *b );
+
+extern void
+_math_matrix_loadf( GLmatrix *mat, const GLfloat *m );
+
+extern void
+_math_matrix_translate( GLmatrix *mat, GLfloat x, GLfloat y, GLfloat z );
+
+extern void
+_math_matrix_rotate( GLmatrix *m, GLfloat angle,
+		     GLfloat x, GLfloat y, GLfloat z );
+
+extern void
+_math_matrix_scale( GLmatrix *mat, GLfloat x, GLfloat y, GLfloat z );
+
+extern void
+_math_matrix_ortho( GLmatrix *mat,
+		    GLfloat left, GLfloat right,
+		    GLfloat bottom, GLfloat top,
+		    GLfloat nearval, GLfloat farval );
+
+extern void
+_math_matrix_frustum( GLmatrix *mat,
+		      GLfloat left, GLfloat right,
+		      GLfloat bottom, GLfloat top,
+		      GLfloat nearval, GLfloat farval );
+
+extern void
+_math_matrix_viewport(GLmatrix *m, GLfloat x, GLfloat y, GLfloat width, GLfloat height,
+                      GLdouble zNear, GLdouble zFar, GLdouble depthMax);
+
+extern void
+_math_matrix_set_identity( GLmatrix *dest );
+
+extern void
+_math_matrix_copy( GLmatrix *to, const GLmatrix *from );
+
+extern void
+_math_matrix_analyse( GLmatrix *mat );
+
+extern void
+_math_matrix_print( const GLmatrix *m );
+
+extern GLboolean
+_math_matrix_is_length_preserving( const GLmatrix *m );
+
+extern GLboolean
+_math_matrix_has_rotation( const GLmatrix *m );
+
+extern GLboolean
+_math_matrix_is_general_scale( const GLmatrix *m );
+
+extern GLboolean
+_math_matrix_is_dirty( const GLmatrix *m );
+
+
+/**
+ * \name Related functions that don't actually operate on GLmatrix structs
+ */
+/*@{*/
+
+extern void
+_math_transposef( GLfloat to[16], const GLfloat from[16] );
+
+extern void
+_math_transposed( GLdouble to[16], const GLdouble from[16] );
+
+extern void
+_math_transposefd( GLfloat to[16], const GLdouble from[16] );
+
+
+/*
+ * Transform a point (column vector) by a matrix:   Q = M * P
+ */
+#define TRANSFORM_POINT( Q, M, P )					\
+   Q[0] = M[0] * P[0] + M[4] * P[1] + M[8] *  P[2] + M[12] * P[3];	\
+   Q[1] = M[1] * P[0] + M[5] * P[1] + M[9] *  P[2] + M[13] * P[3];	\
+   Q[2] = M[2] * P[0] + M[6] * P[1] + M[10] * P[2] + M[14] * P[3];	\
+   Q[3] = M[3] * P[0] + M[7] * P[1] + M[11] * P[2] + M[15] * P[3];
+
+
+#define TRANSFORM_POINT3( Q, M, P )				\
+   Q[0] = M[0] * P[0] + M[4] * P[1] + M[8] *  P[2] + M[12];	\
+   Q[1] = M[1] * P[0] + M[5] * P[1] + M[9] *  P[2] + M[13];	\
+   Q[2] = M[2] * P[0] + M[6] * P[1] + M[10] * P[2] + M[14];	\
+   Q[3] = M[3] * P[0] + M[7] * P[1] + M[11] * P[2] + M[15];
+
+
+/*
+ * Transform a normal (row vector) by a matrix:  [NX NY NZ] = N * MAT
+ */
+#define TRANSFORM_NORMAL( TO, N, MAT )				\
+do {								\
+   TO[0] = N[0] * MAT[0] + N[1] * MAT[1] + N[2] * MAT[2];	\
+   TO[1] = N[0] * MAT[4] + N[1] * MAT[5] + N[2] * MAT[6];	\
+   TO[2] = N[0] * MAT[8] + N[1] * MAT[9] + N[2] * MAT[10];	\
+} while (0)
+
+
+/**
+ * Transform a direction by a matrix.
+ */
+#define TRANSFORM_DIRECTION( TO, DIR, MAT )			\
+do {								\
+   TO[0] = DIR[0] * MAT[0] + DIR[1] * MAT[4] + DIR[2] * MAT[8];	\
+   TO[1] = DIR[0] * MAT[1] + DIR[1] * MAT[5] + DIR[2] * MAT[9];	\
+   TO[2] = DIR[0] * MAT[2] + DIR[1] * MAT[6] + DIR[2] * MAT[10];\
+} while (0)
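+
+/*
+ * Convention note (illustrative): TRANSFORM_POINT treats P as a column
+ * vector (Q = M * P with column-major M), while TRANSFORM_NORMAL and
+ * TRANSFORM_DIRECTION treat their input as a row vector, i.e. they
+ * multiply by the transpose of the upper-left 3x3.  For example:
+ *
+ *    GLfloat q[4];
+ *    const GLfloat p[4] = { 1.0F, 0.0F, 0.0F, 1.0F };
+ *    TRANSFORM_POINT(q, mat->m, p);  // q = column 0 + column 3 of mat->m
+ */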
+
+
+extern void
+_mesa_transform_vector(GLfloat u[4], const GLfloat v[4], const GLfloat m[16]);
+
+
+/*@}*/
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/Android.mk b/icd/intel/compiler/mesa-utils/src/mesa/program/Android.mk
new file mode 100644
index 0000000..e85afe6
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/Android.mk
@@ -0,0 +1,79 @@
+# Copyright 2012 Intel Corporation
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included
+# in all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+# DEALINGS IN THE SOFTWARE.
+
+LOCAL_PATH := $(call my-dir)
+
+define local-l-to-c
+	@mkdir -p $(dir $@)
+	@echo "Mesa Lex: $(PRIVATE_MODULE) <= $<"
+	$(hide) $(LEX) -o$@ $<
+endef
+
+define mesa_local-y-to-c-and-h
+	@mkdir -p $(dir $@)
+	@echo "Mesa Yacc: $(PRIVATE_MODULE) <= $<"
+	$(hide) $(YACC) -o $@ -p "_mesa_program_" $<
+endef
+
+# ----------------------------------------------------------------------
+# libmesa_program.a
+# ----------------------------------------------------------------------
+
+# Import the following variables:
+#     PROGRAM_FILES
+include $(MESA_TOP)/src/mesa/Makefile.sources
+
+include $(CLEAR_VARS)
+
+LOCAL_MODULE := libmesa_program
+LOCAL_MODULE_CLASS := STATIC_LIBRARIES
+
+intermediates := $(call local-intermediates-dir)
+
+# TODO(chadv): In Makefile.sources, move these vars to a different list so we can
+# remove this kludge.
+generated_sources_basenames := \
+	lex.yy.c \
+	program_parse.tab.c \
+	program_parse.tab.h
+
+LOCAL_SRC_FILES := \
+	$(filter-out $(generated_sources_basenames),$(subst program/,,$(PROGRAM_FILES)))
+
+LOCAL_GENERATED_SOURCES := \
+	$(addprefix $(intermediates)/program/,$(generated_sources_basenames))
+
+$(intermediates)/program/program_parse.tab.c: $(LOCAL_PATH)/program_parse.y
+	$(mesa_local-y-to-c-and-h)
+
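+# The .tab.h rule below only records a dependency: the header is produced
+# as a side effect of the yacc rule for .tab.c above, so its empty recipe
+# is intentional.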
+$(intermediates)/program/program_parse.tab.h: $(intermediates)/program/program_parse.tab.c
+	@
+
+$(intermediates)/program/lex.yy.c: $(LOCAL_PATH)/program_lexer.l
+	$(local-l-to-c)
+
+LOCAL_C_INCLUDES := \
+	$(intermediates) \
+	$(MESA_TOP)/src/mapi \
+	$(MESA_TOP)/src/mesa \
+	$(MESA_TOP)/src/glsl
+
+include $(MESA_COMMON_MK)
+include $(BUILD_STATIC_LIBRARY)
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/arbprogparse.c b/icd/intel/compiler/mesa-utils/src/mesa/program/arbprogparse.c
new file mode 100644
index 0000000..5b96650
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/arbprogparse.c
@@ -0,0 +1,213 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#define DEBUG_PARSING 0
+
+/**
+ * \file arbprogparse.c
+ * ARB_*_program parser core
+ * \author Karl Rasche
+ */
+
+/**
+Notes on program parameters, etc.
+
+The instructions we emit will use six kinds of source registers:
+
+  PROGRAM_INPUT      - input registers
+  PROGRAM_TEMPORARY  - temp registers
+  PROGRAM_ADDRESS    - address/indirect register
+  PROGRAM_SAMPLER    - texture sampler
+  PROGRAM_CONSTANT   - indexes into program->Parameters, a known constant/literal
+  PROGRAM_STATE_VAR  - indexes into program->Parameters, and may actually be:
+                       + a state variable, like "state.fog.color", or
+                       + a pointer to a "program.local[k]" parameter, or
+                       + a pointer to a "program.env[k]" parameter
+
+Basically, all the program.local[] and program.env[] values will get mapped
+into the unified gl_program->Parameters array.  This solves the problem of
+having three separate program parameter arrays.
+*/
+
+
+#include "main/glheader.h"
+#include "main/imports.h"
+#include "main/context.h"
+#include "main/mtypes.h"
+#include "arbprogparse.h"
+#include "programopt.h"
+#include "prog_parameter.h"
+#include "prog_statevars.h"
+#include "prog_instruction.h"
+#include "program_parser.h"
+
+
+void
+_mesa_parse_arb_fragment_program(struct gl_context* ctx, GLenum target,
+                                 const GLvoid *str, GLsizei len,
+                                 struct gl_fragment_program *program)
+{
+   struct gl_program prog;
+   struct asm_parser_state state;
+   GLuint i;
+
+   ASSERT(target == GL_FRAGMENT_PROGRAM_ARB);
+
+   memset(&prog, 0, sizeof(prog));
+   memset(&state, 0, sizeof(state));
+   state.prog = &prog;
+
+   if (!_mesa_parse_arb_program(ctx, target, (const GLubyte*) str, len,
+				&state)) {
+      /* Error in the program. Just return. */
+      return;
+   }
+
+   free(program->Base.String);
+
+   /* Copy the relevant contents of the arb_program struct into the
+    * fragment_program struct.
+    */
+   program->Base.String          = prog.String;
+   program->Base.NumInstructions = prog.NumInstructions;
+   program->Base.NumTemporaries  = prog.NumTemporaries;
+   program->Base.NumParameters   = prog.NumParameters;
+   program->Base.NumAttributes   = prog.NumAttributes;
+   program->Base.NumAddressRegs  = prog.NumAddressRegs;
+   program->Base.NumNativeInstructions = prog.NumNativeInstructions;
+   program->Base.NumNativeTemporaries = prog.NumNativeTemporaries;
+   program->Base.NumNativeParameters = prog.NumNativeParameters;
+   program->Base.NumNativeAttributes = prog.NumNativeAttributes;
+   program->Base.NumNativeAddressRegs = prog.NumNativeAddressRegs;
+   program->Base.NumAluInstructions   = prog.NumAluInstructions;
+   program->Base.NumTexInstructions   = prog.NumTexInstructions;
+   program->Base.NumTexIndirections   = prog.NumTexIndirections;
+   program->Base.NumNativeAluInstructions = prog.NumAluInstructions;
+   program->Base.NumNativeTexInstructions = prog.NumTexInstructions;
+   program->Base.NumNativeTexIndirections = prog.NumTexIndirections;
+   program->Base.InputsRead      = prog.InputsRead;
+   program->Base.OutputsWritten  = prog.OutputsWritten;
+   program->Base.IndirectRegisterFiles = prog.IndirectRegisterFiles;
+   for (i = 0; i < MAX_TEXTURE_IMAGE_UNITS; i++) {
+      program->Base.TexturesUsed[i] = prog.TexturesUsed[i];
+      if (prog.TexturesUsed[i])
+         program->Base.SamplersUsed |= (1 << i);
+   }
+   program->Base.ShadowSamplers = prog.ShadowSamplers;
+   program->OriginUpperLeft = state.option.OriginUpperLeft;
+   program->PixelCenterInteger = state.option.PixelCenterInteger;
+
+   program->UsesKill            = state.fragment.UsesKill;
+   program->UsesDFdy            = state.fragment.UsesDFdy;
+
+   free(program->Base.Instructions);
+   program->Base.Instructions = prog.Instructions;
+
+   if (program->Base.Parameters)
+      _mesa_free_parameter_list(program->Base.Parameters);
+   program->Base.Parameters    = prog.Parameters;
+
+   /* Append fog instructions now if the program has "OPTION ARB_fog_exp"
+    * or similar.  We used to leave this up to drivers, but it appears
+    * there's no hardware that wants to do fog in a discrete stage separate
+    * from the fragment shader.
+    */
+   if (state.option.Fog != OPTION_NONE) {
+      static const GLenum fog_modes[4] = {
+	 GL_NONE, GL_EXP, GL_EXP2, GL_LINEAR
+      };
+
+      /* XXX: we should somehow recompile this to remove clamping if disabled.
+       * On the ATI driver, this is unclamped if fragment clamping is disabled.
+       */
+      _mesa_append_fog_code(ctx, program, fog_modes[state.option.Fog], GL_TRUE);
+   }
+
+#if DEBUG_FP
+   printf("____________Fragment program %u ________\n", program->Base.Id);
+   _mesa_print_program(&program->Base);
+#endif
+}
+
+
+
+/**
+ * Parse the vertex program string.  If success, update the given
+ * vertex_program object with the new program.  Else, leave the vertex_program
+ * object unchanged.
+ */
+void
+_mesa_parse_arb_vertex_program(struct gl_context *ctx, GLenum target,
+			       const GLvoid *str, GLsizei len,
+			       struct gl_vertex_program *program)
+{
+   struct gl_program prog;
+   struct asm_parser_state state;
+
+   ASSERT(target == GL_VERTEX_PROGRAM_ARB);
+
+   memset(&prog, 0, sizeof(prog));
+   memset(&state, 0, sizeof(state));
+   state.prog = &prog;
+
+   if (!_mesa_parse_arb_program(ctx, target, (const GLubyte*) str, len,
+				&state)) {
+      _mesa_error(ctx, GL_INVALID_OPERATION, "glProgramString(bad program)");
+      return;
+   }
+
+   free(program->Base.String);
+
+   /* Copy the relevant contents of the arb_program struct into the 
+    * vertex_program struct.
+    */
+   program->Base.String          = prog.String;
+   program->Base.NumInstructions = prog.NumInstructions;
+   program->Base.NumTemporaries  = prog.NumTemporaries;
+   program->Base.NumParameters   = prog.NumParameters;
+   program->Base.NumAttributes   = prog.NumAttributes;
+   program->Base.NumAddressRegs  = prog.NumAddressRegs;
+   program->Base.NumNativeInstructions = prog.NumNativeInstructions;
+   program->Base.NumNativeTemporaries = prog.NumNativeTemporaries;
+   program->Base.NumNativeParameters = prog.NumNativeParameters;
+   program->Base.NumNativeAttributes = prog.NumNativeAttributes;
+   program->Base.NumNativeAddressRegs = prog.NumNativeAddressRegs;
+   program->Base.InputsRead     = prog.InputsRead;
+   program->Base.OutputsWritten = prog.OutputsWritten;
+   program->Base.IndirectRegisterFiles = prog.IndirectRegisterFiles;
+   program->IsPositionInvariant = (state.option.PositionInvariant)
+      ? GL_TRUE : GL_FALSE;
+
+   free(program->Base.Instructions);
+   program->Base.Instructions = prog.Instructions;
+
+   if (program->Base.Parameters)
+      _mesa_free_parameter_list(program->Base.Parameters);
+   program->Base.Parameters = prog.Parameters; 
+
+#if DEBUG_VP
+   printf("____________Vertex program %u __________\n", program->Base.Id);
+   _mesa_print_program(&program->Base);
+#endif
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/arbprogparse.h b/icd/intel/compiler/mesa-utils/src/mesa/program/arbprogparse.h
new file mode 100644
index 0000000..39d2116
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/arbprogparse.h
@@ -0,0 +1,45 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2005  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef ARBPROGPARSE_H
+#define ARBPROGPARSE_H
+
+#include "main/glheader.h"
+
+struct gl_context;
+struct gl_fragment_program;
+struct gl_vertex_program;
+
+extern void
+_mesa_parse_arb_vertex_program(struct gl_context *ctx, GLenum target,
+			       const GLvoid *str, GLsizei len,
+			       struct gl_vertex_program *program);
+
+extern void
+_mesa_parse_arb_fragment_program(struct gl_context *ctx, GLenum target,
+                                 const GLvoid *str, GLsizei len,
+                                 struct gl_fragment_program *program);
+
+#endif
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/hash_table.h b/icd/intel/compiler/mesa-utils/src/mesa/program/hash_table.h
new file mode 100644
index 0000000..c466bce
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/hash_table.h
@@ -0,0 +1,296 @@
+/*
+ * Copyright © 2008 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file hash_table.h
+ * \brief Implementation of a generic, opaque hash table data type.
+ *
+ * \author Ian Romanick <ian.d.romanick@intel.com>
+ */
+
+#ifndef HASH_TABLE_H
+#define HASH_TABLE_H
+
+#include <string.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <limits.h>
+#include <assert.h>
+
+struct string_to_uint_map;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct hash_table;
+
+typedef unsigned (*hash_func_t)(const void *key);
+typedef int (*hash_compare_func_t)(const void *key1, const void *key2);
+
+/**
+ * Hash table constructor
+ *
+ * Creates a hash table with the specified number of buckets.  The supplied
+ * \c hash and \c compare routines are used when adding elements to the table
+ * and when searching for elements in the table.
+ *
+ * \param num_buckets  Number of buckets (bins) in the hash table.
+ * \param hash         Function used to compute hash value of input keys.
+ * \param compare      Function used to compare keys.
+ */
+extern struct hash_table *hash_table_ctor(unsigned num_buckets,
+    hash_func_t hash, hash_compare_func_t compare);
+
+
+/**
+ * Release all memory associated with a hash table
+ *
+ * \warning
+ * This function does not release the memory occupied by keys or data.
+ */
+extern void hash_table_dtor(struct hash_table *ht);
+
+
+/**
+ * Flush all entries from a hash table
+ *
+ * \param ht  Table to be cleared of its entries.
+ */
+extern void hash_table_clear(struct hash_table *ht);
+
+
+/**
+ * Search a hash table for a specific element
+ *
+ * \param ht   Table to be searched
+ * \param key  Key of the desired element
+ *
+ * \return
+ * The \c data value supplied to \c hash_table_insert when the element with
+ * the matching key was added.  If no matching key exists in the table,
+ * \c NULL is returned.
+ */
+extern void *hash_table_find(struct hash_table *ht, const void *key);
+
+
+/**
+ * Add an element to a hash table
+ *
+ * \warning
+ * If \c key is already in the hash table, it will be added again.  Future
+ * calls to \c hash_table_find and \c hash_table_remove will return or remove,
+ * respectively, the most recently added instance of \c key.
+ *
+ * \warning
+ * The value passed by \c key is kept in the hash table and is used by later
+ * calls to \c hash_table_find.
+ *
+ * \sa hash_table_replace
+ */
+extern void hash_table_insert(struct hash_table *ht, void *data,
+    const void *key);
+
+/**
+ * Add an element to a hash table with replacement
+ *
+ * \return
+ * 1 if it did replace the value (in which case the old key is kept), 0 if
+ * it did not replace the value (in which case the new key is kept).
+ *
+ * \warning
+ * If \c key is already in the hash table, \c data will \b replace the most
+ * recently inserted \c data (see the warning in \c hash_table_insert) for
+ * that key.
+ *
+ * \sa hash_table_insert
+ */
+extern bool hash_table_replace(struct hash_table *ht, void *data,
+    const void *key);
+
+/**
+ * Remove a specific element from a hash table.
+ */
+extern void hash_table_remove(struct hash_table *ht, const void *key);
+
+/**
+ * Compute hash value of a string
+ *
+ * Computes the hash value of a string using the DJB2 algorithm developed by
+ * Professor Daniel J. Bernstein.  It was published on comp.lang.c once upon
+ * a time.  I was unable to find the original posting in the archives.
+ *
+ * \param key  Pointer to a NUL terminated string to be hashed.
+ *
+ * \sa hash_table_string_compare
+ */
+extern unsigned hash_table_string_hash(const void *key);
+
+
+/**
+ * Compare two strings used as keys
+ *
+ * This is just a macro wrapper around \c strcmp.
+ *
+ * \sa hash_table_string_hash
+ */
+#define hash_table_string_compare ((hash_compare_func_t) strcmp)
+
+
+/**
+ * Compute hash value of a pointer
+ *
+ * \param key  Pointer to be used as a hash key
+ *
+ * \note
+ * The memory pointed to by \c key is \b never accessed.  The value of \c key
+ * itself is used as the hash key.
+ *
+ * \sa hash_table_pointer_compare
+ */
+unsigned
+hash_table_pointer_hash(const void *key);
+
+
+/**
+ * Compare two pointers used as keys
+ *
+ * \sa hash_table_pointer_hash
+ */
+int
+hash_table_pointer_compare(const void *key1, const void *key2);
+
+void
+hash_table_call_foreach(struct hash_table *ht,
+			void (*callback)(const void *key,
+					 void *data,
+					 void *closure),
+			void *closure);
+
+struct string_to_uint_map *
+string_to_uint_map_ctor(void);
+
+void
+string_to_uint_map_dtor(struct string_to_uint_map *);
+
+
+#ifdef __cplusplus
+}
+
+/**
+ * Map from a string (name) to an unsigned integer value
+ *
+ * \note
+ * Because of the way this class interacts with the \c hash_table
+ * implementation, values of \c UINT_MAX cannot be stored in the map.
+ */
+struct string_to_uint_map {
+public:
+   string_to_uint_map()
+   {
+      this->ht = hash_table_ctor(0, hash_table_string_hash,
+				 hash_table_string_compare);
+   }
+
+   ~string_to_uint_map()
+   {
+      hash_table_call_foreach(this->ht, delete_key, NULL);
+      hash_table_dtor(this->ht);
+   }
+
+   /**
+    * Remove all mappings from this map.
+    */
+   void clear()
+   {
+      hash_table_call_foreach(this->ht, delete_key, NULL);
+      hash_table_clear(this->ht);
+   }
+
+   /**
+    * Runs the given callback on each (key, data) pair in the map.
+    */
+   void iterate(void (*func)(const void *, void *, void *), void *closure)
+   {
+      hash_table_call_foreach(this->ht, func, closure);
+   }
+
+   /**
+    * Get the value associated with a particular key
+    *
+    * \return
+    * If \c key is found in the map, \c true is returned.  Otherwise \c false
+    * is returned.
+    *
+    * \note
+    * If \c key is not found in the table, \c value is not modified.
+    */
+   bool get(unsigned &value, const char *key)
+   {
+      const intptr_t v =
+	 (intptr_t) hash_table_find(this->ht, (const void *) key);
+
+      if (v == 0)
+	 return false;
+
+      value = (unsigned)(v - 1);
+      return true;
+   }
+
+   void put(unsigned value, const char *key)
+   {
+      /* The low-level hash table structure returns NULL if key is not in the
+       * hash table.  However, users of this map might want to store zero as a
+       * valid value in the table.  Bias the value by +1 so that a
+       * user-specified zero is stored as 1.  This enables ::get to tell the
+       * difference between a user-specified zero (returned as 1 by
+       * hash_table_find) and the key not in the table (returned as 0 by
+       * hash_table_find).
+       *
+       * The net effect is that we can't store UINT_MAX in the table.  This is
+       * because UINT_MAX+1 = 0.
+       */
+      assert(value != UINT_MAX);
+      char *dup_key = strdup(key);
+      bool result = hash_table_replace(this->ht,
+				       (void *) (intptr_t) (value + 1),
+				       dup_key);
+      if (result)
+	 free(dup_key);
+   }
+
+private:
+   static void delete_key(const void *key, void *data, void *closure)
+   {
+      (void) data;
+      (void) closure;
+
+      free((char *)key);
+   }
+
+   struct hash_table *ht;
+};
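+
+/*
+ * Illustrative usage (not part of this change): the +1 bias applied in
+ * put() and undone in get() is invisible to callers, so zero is a valid
+ * stored value even though hash_table_find() returns NULL on a miss.
+ *
+ *    string_to_uint_map m;
+ *    m.put(0, "foo");        // stored internally as 1
+ *    unsigned v;
+ *    if (m.get(v, "foo")) {  // true; v == 0
+ *       // ...
+ *    }
+ */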
+
+#endif /* __cplusplus */
+#endif /* HASH_TABLE_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/ir_to_mesa.cpp b/icd/intel/compiler/mesa-utils/src/mesa/program/ir_to_mesa.cpp
new file mode 100644
index 0000000..6ebc0c9
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/ir_to_mesa.cpp
@@ -0,0 +1,3119 @@
+/*
+ * Copyright (C) 2005-2007  Brian Paul   All Rights Reserved.
+ * Copyright (C) 2008  VMware, Inc.   All Rights Reserved.
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_to_mesa.cpp
+ *
+ * Translate GLSL IR to Mesa's gl_program representation.
+ */
+
+#include <stdio.h>
+#include "main/compiler.h"
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_expression_flattening.h"
+#include "ir_uniform.h"
+#include "glsl_types.h"
+#include "glsl_parser_extras.h"
+//#include "../glsl/program.h" // LunarG : Removed
+#include "ir_optimization.h"
+#include "ast.h"
+#include "linker.h"
+
+#include "main/mtypes.h"
+#include "main/shaderobj.h"
+#include "main/uniforms.h"
+#include "program/hash_table.h"
+
+extern "C" {
+#include "main/shaderapi.h"
+#include "program/prog_instruction.h"
+#include "program/prog_optimize.h"
+#include "program/prog_print.h"
+#include "program/program.h"
+#include "program/prog_parameter.h"
+#include "program/sampler.h"
+#include "program/prog_diskcache.h"
+}
+
+static int swizzle_for_size(int size);
+
+namespace {
+
+class src_reg;
+class dst_reg;
+
+/**
+ * This struct is a corresponding struct to Mesa prog_src_register, with
+ * wider fields.
+ */
+class src_reg {
+public:
+   src_reg(gl_register_file file, int index, const glsl_type *type)
+   {
+      this->file = file;
+      this->index = index;
+      if (type && (type->is_scalar() || type->is_vector() || type->is_matrix()))
+	 this->swizzle = swizzle_for_size(type->vector_elements);
+      else
+	 this->swizzle = SWIZZLE_XYZW;
+      this->negate = 0;
+      this->reladdr = NULL;
+   }
+
+   src_reg()
+   {
+      this->file = PROGRAM_UNDEFINED;
+      this->index = 0;
+      this->swizzle = 0;
+      this->negate = 0;
+      this->reladdr = NULL;
+   }
+
+   explicit src_reg(dst_reg reg);
+
+   gl_register_file file; /**< PROGRAM_* from Mesa */
+   int index; /**< temporary index, VERT_ATTRIB_*, VARYING_SLOT_*, etc. */
+   GLuint swizzle; /**< Packed SWIZZLE_X/Y/Z/W/ZERO/ONE selects from Mesa. */
+   int negate; /**< NEGATE_XYZW mask from mesa */
+   /** Register index should be offset by the integer in this reg. */
+   src_reg *reladdr;
+};
+
+class dst_reg {
+public:
+   dst_reg(gl_register_file file, int writemask)
+   {
+      this->file = file;
+      this->index = 0;
+      this->writemask = writemask;
+      this->cond_mask = COND_TR;
+      this->reladdr = NULL;
+   }
+
+   dst_reg()
+   {
+      this->file = PROGRAM_UNDEFINED;
+      this->index = 0;
+      this->writemask = 0;
+      this->cond_mask = COND_TR;
+      this->reladdr = NULL;
+   }
+
+   explicit dst_reg(src_reg reg);
+
+   gl_register_file file; /**< PROGRAM_* from Mesa */
+   int index; /**< temporary index, VERT_ATTRIB_*, VARYING_SLOT_*, etc. */
+   int writemask; /**< Bitfield of WRITEMASK_[XYZW] */
+   GLuint cond_mask:4;
+   /** Register index should be offset by the integer in this reg. */
+   src_reg *reladdr;
+};
+
+} /* anonymous namespace */
+
+src_reg::src_reg(dst_reg reg)
+{
+   this->file = reg.file;
+   this->index = reg.index;
+   this->swizzle = SWIZZLE_XYZW;
+   this->negate = 0;
+   this->reladdr = reg.reladdr;
+}
+
+dst_reg::dst_reg(src_reg reg)
+{
+   this->file = reg.file;
+   this->index = reg.index;
+   this->writemask = WRITEMASK_XYZW;
+   this->cond_mask = COND_TR;
+   this->reladdr = reg.reladdr;
+}
+
+namespace {
+
+class ir_to_mesa_instruction : public exec_node {
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(ir_to_mesa_instruction)
+
+   enum prog_opcode op;
+   dst_reg dst;
+   src_reg src[3];
+   /** Pointer to the ir source this tree came from for debugging */
+   ir_instruction *ir;
+   GLboolean cond_update;
+   bool saturate;
+   int sampler; /**< sampler index */
+   int tex_target; /**< One of TEXTURE_*_INDEX */
+   GLboolean tex_shadow;
+};
+
+class variable_storage : public exec_node {
+public:
+   variable_storage(ir_variable *var, gl_register_file file, int index)
+      : file(file), index(index), var(var)
+   {
+      /* empty */
+   }
+
+   gl_register_file file;
+   int index;
+   ir_variable *var; /* variable that maps to this, if any */
+};
+
+class function_entry : public exec_node {
+public:
+   ir_function_signature *sig;
+
+   /**
+    * identifier of this function signature used by the program.
+    *
+    * At the point that Mesa instructions for function calls are
+    * generated, we don't know the address of the first instruction of
+    * the function body.  So we store a small integer as the call's
+    * BranchTarget and rewrite it during set_branchtargets().
+    */
+   int sig_id;
+
+   /**
+    * Pointer to first instruction of the function body.
+    *
+    * Set during function body emits after main() is processed.
+    */
+   ir_to_mesa_instruction *bgn_inst;
+
+   /**
+    * Index of the first instruction of the function body in actual
+    * Mesa IR.
+    *
+    * Set after conversion from ir_to_mesa_instruction to prog_instruction.
+    */
+   int inst;
+
+   /** Storage for the return value. */
+   src_reg return_reg;
+};
+
+class ir_to_mesa_visitor : public ir_visitor {
+public:
+   ir_to_mesa_visitor();
+   ~ir_to_mesa_visitor();
+
+   function_entry *current_function;
+
+   struct gl_context *ctx;
+   struct gl_program *prog;
+   struct gl_shader_program *shader_program;
+   struct gl_shader_compiler_options *options;
+
+   int next_temp;
+
+   variable_storage *find_variable_storage(const ir_variable *var);
+
+   src_reg get_temp(const glsl_type *type);
+   void reladdr_to_temp(ir_instruction *ir, src_reg *reg, int *num_reladdr);
+
+   src_reg src_reg_for_float(float val);
+
+   /**
+    * \name Visit methods
+    *
+    * As typical for the visitor pattern, there must be one \c visit method for
+    * each concrete subclass of \c ir_instruction.  Virtual base classes within
+    * the hierarchy should not have \c visit methods.
+    */
+   /*@{*/
+   virtual void visit(ir_variable *);
+   virtual void visit(ir_loop *);
+   virtual void visit(ir_loop_jump *);
+   virtual void visit(ir_function_signature *);
+   virtual void visit(ir_function *);
+   virtual void visit(ir_expression *);
+   virtual void visit(ir_swizzle *);
+   virtual void visit(ir_dereference_variable  *);
+   virtual void visit(ir_dereference_array *);
+   virtual void visit(ir_dereference_record *);
+   virtual void visit(ir_assignment *);
+   virtual void visit(ir_constant *);
+   virtual void visit(ir_call *);
+   virtual void visit(ir_return *);
+   virtual void visit(ir_discard *);
+   virtual void visit(ir_texture *);
+   virtual void visit(ir_if *);
+   virtual void visit(ir_emit_vertex *);
+   virtual void visit(ir_end_primitive *);
+   /*@}*/
+
+   src_reg result;
+
+   /** List of variable_storage */
+   exec_list variables;
+
+   /** List of function_entry */
+   exec_list function_signatures;
+   int next_signature_id;
+
+   /** List of ir_to_mesa_instruction */
+   exec_list instructions;
+
+   ir_to_mesa_instruction *emit(ir_instruction *ir, enum prog_opcode op);
+
+   ir_to_mesa_instruction *emit(ir_instruction *ir, enum prog_opcode op,
+			        dst_reg dst, src_reg src0);
+
+   ir_to_mesa_instruction *emit(ir_instruction *ir, enum prog_opcode op,
+			        dst_reg dst, src_reg src0, src_reg src1);
+
+   ir_to_mesa_instruction *emit(ir_instruction *ir, enum prog_opcode op,
+			        dst_reg dst,
+			        src_reg src0, src_reg src1, src_reg src2);
+
+   /**
+    * Emit the correct dot-product instruction for the type of arguments
+    */
+   ir_to_mesa_instruction * emit_dp(ir_instruction *ir,
+				    dst_reg dst,
+				    src_reg src0,
+				    src_reg src1,
+				    unsigned elements);
+
+   void emit_scalar(ir_instruction *ir, enum prog_opcode op,
+		    dst_reg dst, src_reg src0);
+
+   void emit_scalar(ir_instruction *ir, enum prog_opcode op,
+		    dst_reg dst, src_reg src0, src_reg src1);
+
+   void emit_scs(ir_instruction *ir, enum prog_opcode op,
+		 dst_reg dst, const src_reg &src);
+
+   bool try_emit_mad(ir_expression *ir,
+			  int mul_operand);
+   bool try_emit_mad_for_and_not(ir_expression *ir,
+				 int mul_operand);
+   bool try_emit_sat(ir_expression *ir);
+
+   void emit_swz(ir_expression *ir);
+
+   bool process_move_condition(ir_rvalue *ir);
+
+   void copy_propagate(void);
+
+   void *mem_ctx;
+};
+
+} /* anonymous namespace */
+
+static src_reg undef_src = src_reg(PROGRAM_UNDEFINED, 0, NULL);
+
+static dst_reg undef_dst = dst_reg(PROGRAM_UNDEFINED, SWIZZLE_NOOP);
+
+static dst_reg address_reg = dst_reg(PROGRAM_ADDRESS, WRITEMASK_X);
+
+static int
+swizzle_for_size(int size)
+{
+   static const int size_swizzles[4] = {
+      MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_X, SWIZZLE_X, SWIZZLE_X),
+      MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Y, SWIZZLE_Y),
+      MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_Z),
+      MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W),
+   };
+
+   assert((size >= 1) && (size <= 4));
+   return size_swizzles[size - 1];
+}
+
+ir_to_mesa_instruction *
+ir_to_mesa_visitor::emit(ir_instruction *ir, enum prog_opcode op,
+			 dst_reg dst,
+			 src_reg src0, src_reg src1, src_reg src2)
+{
+   ir_to_mesa_instruction *inst = new(mem_ctx) ir_to_mesa_instruction();
+   int num_reladdr = 0;
+
+   /* If we have to do relative addressing, we want to load the ARL
+    * reg directly for one of the regs, and preload the other reladdr
+    * sources into temps.
+    */
+   num_reladdr += dst.reladdr != NULL;
+   num_reladdr += src0.reladdr != NULL;
+   num_reladdr += src1.reladdr != NULL;
+   num_reladdr += src2.reladdr != NULL;
+
+   reladdr_to_temp(ir, &src2, &num_reladdr);
+   reladdr_to_temp(ir, &src1, &num_reladdr);
+   reladdr_to_temp(ir, &src0, &num_reladdr);
+
+   if (dst.reladdr) {
+      emit(ir, OPCODE_ARL, address_reg, *dst.reladdr);
+      num_reladdr--;
+   }
+   assert(num_reladdr == 0);
+
+   inst->op = op;
+   inst->dst = dst;
+   inst->src[0] = src0;
+   inst->src[1] = src1;
+   inst->src[2] = src2;
+   inst->ir = ir;
+
+   this->instructions.push_tail(inst);
+
+   return inst;
+}
+
+
+ir_to_mesa_instruction *
+ir_to_mesa_visitor::emit(ir_instruction *ir, enum prog_opcode op,
+			 dst_reg dst, src_reg src0, src_reg src1)
+{
+   return emit(ir, op, dst, src0, src1, undef_src);
+}
+
+ir_to_mesa_instruction *
+ir_to_mesa_visitor::emit(ir_instruction *ir, enum prog_opcode op,
+			 dst_reg dst, src_reg src0)
+{
+   assert(dst.writemask != 0);
+   return emit(ir, op, dst, src0, undef_src, undef_src);
+}
+
+ir_to_mesa_instruction *
+ir_to_mesa_visitor::emit(ir_instruction *ir, enum prog_opcode op)
+{
+   return emit(ir, op, undef_dst, undef_src, undef_src, undef_src);
+}
+
+ir_to_mesa_instruction *
+ir_to_mesa_visitor::emit_dp(ir_instruction *ir,
+			    dst_reg dst, src_reg src0, src_reg src1,
+			    unsigned elements)
+{
+   static const gl_inst_opcode dot_opcodes[] = {
+      OPCODE_DP2, OPCODE_DP3, OPCODE_DP4
+   };
+
+   return emit(ir, dot_opcodes[elements - 2], dst, src0, src1);
+}
+
+/**
+ * Emits Mesa scalar opcodes to produce unique answers across channels.
+ *
+ * Some Mesa opcodes are scalar-only, as in ARB_fp/vp: the source's X
+ * channel determines the result across all channels.  So to apply such
+ * an operation to a vec4, we emit one scalar instruction per distinct
+ * source channel used to produce the destination channels.
+ */
+void
+ir_to_mesa_visitor::emit_scalar(ir_instruction *ir, enum prog_opcode op,
+			        dst_reg dst,
+				src_reg orig_src0, src_reg orig_src1)
+{
+   int i, j;
+   int done_mask = ~dst.writemask;
+
+   /* Mesa RCP is a scalar operation splatting results to all channels,
+    * like ARB_fp/vp.  So emit as many RCPs as necessary to cover our
+    * dst channels.
+    */
+   for (i = 0; i < 4; i++) {
+      GLuint this_mask = (1 << i);
+      ir_to_mesa_instruction *inst;
+      src_reg src0 = orig_src0;
+      src_reg src1 = orig_src1;
+
+      if (done_mask & this_mask)
+	 continue;
+
+      GLuint src0_swiz = GET_SWZ(src0.swizzle, i);
+      GLuint src1_swiz = GET_SWZ(src1.swizzle, i);
+      for (j = i + 1; j < 4; j++) {
+	 /* If there is another enabled component in the destination that is
+	  * derived from the same inputs, generate its value on this pass as
+	  * well.
+	  */
+	 if (!(done_mask & (1 << j)) &&
+	     GET_SWZ(src0.swizzle, j) == src0_swiz &&
+	     GET_SWZ(src1.swizzle, j) == src1_swiz) {
+	    this_mask |= (1 << j);
+	 }
+      }
+      src0.swizzle = MAKE_SWIZZLE4(src0_swiz, src0_swiz,
+				   src0_swiz, src0_swiz);
+      src1.swizzle = MAKE_SWIZZLE4(src1_swiz, src1_swiz,
+				  src1_swiz, src1_swiz);
+
+      inst = emit(ir, op, dst, src0, src1);
+      inst->dst.writemask = this_mask;
+      done_mask |= this_mask;
+   }
+}
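+
+/*
+ * Illustrative expansion (assumption): for dst.writemask = XY and
+ * src0.swizzle = XYZW, the loop above emits
+ *
+ *    op dst.x, src0.xxxx
+ *    op dst.y, src0.yyyy
+ *
+ * whereas if both destination channels read the same source component
+ * (e.g. src0.swizzle = XXXX) they fold into one instruction with
+ * writemask XY.
+ */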
+
+void
+ir_to_mesa_visitor::emit_scalar(ir_instruction *ir, enum prog_opcode op,
+			        dst_reg dst, src_reg src0)
+{
+   src_reg undef = undef_src;
+
+   undef.swizzle = SWIZZLE_XXXX;
+
+   emit_scalar(ir, op, dst, src0, undef);
+}
+
+/**
+ * Emit an OPCODE_SCS instruction
+ *
+ * The \c SCS opcode functions a bit differently than the other Mesa (or
+ * ARB_fragment_program) opcodes.  Instead of splatting its result across all
+ * four components of the destination, it writes one value to the \c x
+ * component and another value to the \c y component.
+ *
+ * \param ir        IR instruction being processed
+ * \param op        Either \c OPCODE_SIN or \c OPCODE_COS depending on which
+ *                  value is desired.
+ * \param dst       Destination register
+ * \param src       Source register
+ */
+void
+ir_to_mesa_visitor::emit_scs(ir_instruction *ir, enum prog_opcode op,
+			     dst_reg dst,
+			     const src_reg &src)
+{
+   /* Vertex programs cannot use the SCS opcode.
+    */
+   if (this->prog->Target == GL_VERTEX_PROGRAM_ARB) {
+      emit_scalar(ir, op, dst, src);
+      return;
+   }
+
+   const unsigned component = (op == OPCODE_SIN) ? 0 : 1;
+   const unsigned scs_mask = (1U << component);
+   int done_mask = ~dst.writemask;
+   src_reg tmp;
+
+   assert(op == OPCODE_SIN || op == OPCODE_COS);
+
+   /* If there are components in the destination that differ from the component
+    * that will be written by the SCS instruction, we'll need a temporary.
+    */
+   if (scs_mask != unsigned(dst.writemask)) {
+      tmp = get_temp(glsl_type::vec4_type);
+   }
+
+   for (unsigned i = 0; i < 4; i++) {
+      unsigned this_mask = (1U << i);
+      src_reg src0 = src;
+
+      if ((done_mask & this_mask) != 0)
+	 continue;
+
+      /* The source swizzle specifies which component of the source generates
+       * sine / cosine for the current component in the destination.  The SCS
+       * instruction requires that this value be swizzled to the X component.
+       * Replace the current swizzle with a swizzle that puts the source in
+       * the X component.
+       */
+      unsigned src0_swiz = GET_SWZ(src.swizzle, i);
+
+      src0.swizzle = MAKE_SWIZZLE4(src0_swiz, src0_swiz,
+				   src0_swiz, src0_swiz);
+      for (unsigned j = i + 1; j < 4; j++) {
+	 /* If there is another enabled component in the destination that is
+	  * derived from the same inputs, generate its value on this pass as
+	  * well.
+	  */
+	 if (!(done_mask & (1 << j)) &&
+	     GET_SWZ(src0.swizzle, j) == src0_swiz) {
+	    this_mask |= (1 << j);
+	 }
+      }
+
+      if (this_mask != scs_mask) {
+	 ir_to_mesa_instruction *inst;
+	 dst_reg tmp_dst = dst_reg(tmp);
+
+	 /* Emit the SCS instruction.
+	  */
+	 inst = emit(ir, OPCODE_SCS, tmp_dst, src0);
+	 inst->dst.writemask = scs_mask;
+
+	 /* Move the result of the SCS instruction to the desired location in
+	  * the destination.
+	  */
+	 tmp.swizzle = MAKE_SWIZZLE4(component, component,
+				     component, component);
+	 inst = emit(ir, OPCODE_SCS, dst, tmp);
+	 inst->dst.writemask = this_mask;
+      } else {
+	 /* Emit the SCS instruction to write directly to the destination.
+	  */
+	 ir_to_mesa_instruction *inst = emit(ir, OPCODE_SCS, dst, src0);
+	 inst->dst.writemask = scs_mask;
+      }
+
+      done_mask |= this_mask;
+   }
+}
+
+src_reg
+ir_to_mesa_visitor::src_reg_for_float(float val)
+{
+   src_reg src(PROGRAM_CONSTANT, -1, NULL);
+
+   src.index = _mesa_add_unnamed_constant(this->prog->Parameters,
+					  (const gl_constant_value *)&val, 1, &src.swizzle);
+
+   return src;
+}
+
+static int
+type_size(const struct glsl_type *type)
+{
+   unsigned int i;
+   int size;
+
+   switch (type->base_type) {
+   case GLSL_TYPE_UINT:
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_BOOL:
+      if (type->is_matrix()) {
+	 return type->matrix_columns;
+      } else {
+	 /* Regardless of size of vector, it gets a vec4. This is bad
+	  * packing for things like floats, but otherwise arrays become a
+	  * mess.  Hopefully a later pass over the code can pack scalars
+	  * down if appropriate.
+	  */
+	 return 1;
+      }
+   case GLSL_TYPE_ARRAY:
+      assert(type->length > 0);
+      return type_size(type->fields.array) * type->length;
+   case GLSL_TYPE_STRUCT:
+      size = 0;
+      for (i = 0; i < type->length; i++) {
+	 size += type_size(type->fields.structure[i].type);
+      }
+      return size;
+   case GLSL_TYPE_SAMPLER:
+   case GLSL_TYPE_IMAGE:
+      /* Samplers take up one slot in UNIFORMS[], but they're baked in
+       * at link time.
+       */
+      return 1;
+   case GLSL_TYPE_ATOMIC_UINT:
+   case GLSL_TYPE_VOID:
+   case GLSL_TYPE_ERROR:
+   case GLSL_TYPE_INTERFACE:
+      assert(!"Invalid type in type_size");
+      break;
+   }
+
+   return 0;
+}
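+
+/*
+ * Illustrative sizes under the scheme above, in vec4 slots: float -> 1,
+ * vec3 -> 1, mat4 -> 4, float[10] -> 10, struct { vec3 a; float b; } -> 2.
+ */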
+
+/**
+ * In the initial pass of codegen, we assign temporary numbers to
+ * intermediate results.  (not SSA -- variable assignments will reuse
+ * storage).  Actual register allocation for the Mesa VM occurs in a
+ * pass over the Mesa IR later.
+ */
+src_reg
+ir_to_mesa_visitor::get_temp(const glsl_type *type)
+{
+   src_reg src;
+
+   src.file = PROGRAM_TEMPORARY;
+   src.index = next_temp;
+   src.reladdr = NULL;
+   next_temp += type_size(type);
+
+   if (type->is_array() || type->is_record()) {
+      src.swizzle = SWIZZLE_NOOP;
+   } else {
+      src.swizzle = swizzle_for_size(type->vector_elements);
+   }
+   src.negate = 0;
+
+   return src;
+}
+
+variable_storage *
+ir_to_mesa_visitor::find_variable_storage(const ir_variable *var)
+{
+   variable_storage *entry;
+
+   foreach_list(node, &this->variables) {
+      entry = (variable_storage *) node;
+
+      if (entry->var == var)
+	 return entry;
+   }
+
+   return NULL;
+}
+
+void
+ir_to_mesa_visitor::visit(ir_variable *ir)
+{
+   if (strcmp(ir->name, "gl_FragCoord") == 0) {
+      struct gl_fragment_program *fp = (struct gl_fragment_program *)this->prog;
+
+      fp->OriginUpperLeft = ir->data.origin_upper_left;
+      fp->PixelCenterInteger = ir->data.pixel_center_integer;
+   }
+
+   if (ir->data.mode == ir_var_uniform && strncmp(ir->name, "gl_", 3) == 0) {
+      unsigned int i;
+      const ir_state_slot *const slots = ir->state_slots;
+      assert(ir->state_slots != NULL);
+
+      /* Check if this statevar's setup in the STATE file exactly
+       * matches how we'll want to reference it as a
+       * struct/array/whatever.  If not, then we need to move it into
+       * temporary storage and hope that it'll get copy-propagated
+       * out.
+       */
+      for (i = 0; i < ir->num_state_slots; i++) {
+	 if (slots[i].swizzle != SWIZZLE_XYZW) {
+	    break;
+	 }
+      }
+
+      variable_storage *storage;
+      dst_reg dst;
+      if (i == ir->num_state_slots) {
+	 /* We'll set the index later. */
+	 storage = new(mem_ctx) variable_storage(ir, PROGRAM_STATE_VAR, -1);
+	 this->variables.push_tail(storage);
+
+	 dst = undef_dst;
+      } else {
+	 /* The variable_storage constructor allocates slots based on the size
+	  * of the type.  However, this had better match the number of state
+	  * elements that we're going to copy into the new temporary.
+	  */
+	 assert((int) ir->num_state_slots == type_size(ir->type));
+
+	 storage = new(mem_ctx) variable_storage(ir, PROGRAM_TEMPORARY,
+						 this->next_temp);
+	 this->variables.push_tail(storage);
+	 this->next_temp += type_size(ir->type);
+
+	 dst = dst_reg(src_reg(PROGRAM_TEMPORARY, storage->index, NULL));
+      }
+
+
+      for (unsigned int i = 0; i < ir->num_state_slots; i++) {
+	 int index = _mesa_add_state_reference(this->prog->Parameters,
+					       (gl_state_index *)slots[i].tokens);
+
+	 if (storage->file == PROGRAM_STATE_VAR) {
+	    if (storage->index == -1) {
+	       storage->index = index;
+	    } else {
+	       assert(index == storage->index + (int)i);
+	    }
+	 } else {
+	    src_reg src(PROGRAM_STATE_VAR, index, NULL);
+	    src.swizzle = slots[i].swizzle;
+	    emit(ir, OPCODE_MOV, dst, src);
+	    /* even a float takes up a whole vec4 reg in a struct/array. */
+	    dst.index++;
+	 }
+      }
+
+      if (storage->file == PROGRAM_TEMPORARY &&
+	  dst.index != storage->index + (int) ir->num_state_slots) {
+	 linker_error(this->shader_program,
+		      "failed to load builtin uniform `%s' "
+		      "(%d/%d regs loaded)\n",
+		      ir->name, dst.index - storage->index,
+		      type_size(ir->type));
+      }
+   }
+}
+
+void
+ir_to_mesa_visitor::visit(ir_loop *ir)
+{
+   emit(NULL, OPCODE_BGNLOOP);
+
+   visit_exec_list(&ir->body_instructions, this);
+
+   emit(NULL, OPCODE_ENDLOOP);
+}
+
+void
+ir_to_mesa_visitor::visit(ir_loop_jump *ir)
+{
+   switch (ir->mode) {
+   case ir_loop_jump::jump_break:
+      emit(NULL, OPCODE_BRK);
+      break;
+   case ir_loop_jump::jump_continue:
+      emit(NULL, OPCODE_CONT);
+      break;
+   }
+}
+
+
+void
+ir_to_mesa_visitor::visit(ir_function_signature *ir)
+{
+   assert(0);
+   (void)ir;
+}
+
+void
+ir_to_mesa_visitor::visit(ir_function *ir)
+{
+   /* Ignore function bodies other than main() -- we shouldn't see calls to
+    * them since they should all be inlined before we get to ir_to_mesa.
+    */
+   if (strcmp(ir->name, "main") == 0) {
+      const ir_function_signature *sig;
+      exec_list empty;
+
+      sig = ir->matching_signature(NULL, &empty);
+
+      assert(sig);
+
+      foreach_list(node, &sig->body) {
+	 ir_instruction *ir = (ir_instruction *) node;
+
+	 ir->accept(this);
+      }
+   }
+}
+
+bool
+ir_to_mesa_visitor::try_emit_mad(ir_expression *ir, int mul_operand)
+{
+   int nonmul_operand = 1 - mul_operand;
+   src_reg a, b, c;
+
+   ir_expression *expr = ir->operands[mul_operand]->as_expression();
+   if (!expr || expr->operation != ir_binop_mul)
+      return false;
+
+   expr->operands[0]->accept(this);
+   a = this->result;
+   expr->operands[1]->accept(this);
+   b = this->result;
+   ir->operands[nonmul_operand]->accept(this);
+   c = this->result;
+
+   this->result = get_temp(ir->type);
+   emit(ir, OPCODE_MAD, dst_reg(this->result), a, b, c);
+
+   return true;
+}
+
+/**
+ * Emit OPCODE_MAD(a, -b, a) instead of AND(a, NOT(b))
+ *
+ * The logic values are 1.0 for true and 0.0 for false.  Logical-and is
+ * implemented using multiplication, and logical-or is implemented using
+ * addition.  Logical-not can be implemented as (true - x), or (1.0 - x).
+ * As result, the logical expression (a & !b) can be rewritten as:
+ *
+ *     - a * !b
+ *     - a * (1 - b)
+ *     - (a * 1) - (a * b)
+ *     - a + -(a * b)
+ *     - a + (a * -b)
+ *
+ * This final expression can be implemented as a single MAD(a, -b, a)
+ * instruction.
+ */
+bool
+ir_to_mesa_visitor::try_emit_mad_for_and_not(ir_expression *ir, int try_operand)
+{
+   const int other_operand = 1 - try_operand;
+   src_reg a, b;
+
+   ir_expression *expr = ir->operands[try_operand]->as_expression();
+   if (!expr || expr->operation != ir_unop_logic_not)
+      return false;
+
+   ir->operands[other_operand]->accept(this);
+   a = this->result;
+   expr->operands[0]->accept(this);
+   b = this->result;
+
+   b.negate = ~b.negate;
+
+   this->result = get_temp(ir->type);
+   emit(ir, OPCODE_MAD, dst_reg(this->result), a, b, a);
+
+   return true;
+}
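+
+/* Illustrative check of the rewrite above (an added note, not part of the
+ * original code): with logic values 0.0/1.0, MAD(a, -b, a) computes
+ * a * (-b) + a.
+ * For a=1, b=1: 1 * -1 + 1 = 0, matching (a & !b) = 0.
+ * For a=1, b=0: 1 * -0 + 1 = 1, matching (a & !b) = 1.
+ * For a=0 both the product and the addend are 0, as expected.
+ */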
+
+bool
+ir_to_mesa_visitor::try_emit_sat(ir_expression *ir)
+{
+   /* Saturates were only introduced to vertex programs in
+    * NV_vertex_program3, so don't give them to drivers in the VP.
+    */
+   if (this->prog->Target == GL_VERTEX_PROGRAM_ARB)
+      return false;
+
+   ir_rvalue *sat_src = ir->as_rvalue_to_saturate();
+   if (!sat_src)
+      return false;
+
+   sat_src->accept(this);
+   src_reg src = this->result;
+
+   /* If we generated an expression instruction into a temporary in
+    * processing the saturate's operand, apply the saturate to that
+    * instruction.  Otherwise, generate a MOV to do the saturate.
+    *
+    * Note that we have to be careful to only do this optimization if
+    * the instruction in question was what generated src->result.  For
+    * example, ir_dereference_array might generate a MUL instruction
+    * to create the reladdr, and return us a src reg using that
+    * reladdr.  That MUL result is not the value we're trying to
+    * saturate.
+    */
+   ir_expression *sat_src_expr = sat_src->as_expression();
+   ir_to_mesa_instruction *new_inst;
+   new_inst = (ir_to_mesa_instruction *)this->instructions.get_tail();
+   if (sat_src_expr && (sat_src_expr->operation == ir_binop_mul ||
+			sat_src_expr->operation == ir_binop_add ||
+			sat_src_expr->operation == ir_binop_dot)) {
+      new_inst->saturate = true;
+   } else {
+      this->result = get_temp(ir->type);
+      ir_to_mesa_instruction *inst;
+      inst = emit(ir, OPCODE_MOV, dst_reg(this->result), src);
+      inst->saturate = true;
+   }
+
+   return true;
+}
+
+void
+ir_to_mesa_visitor::reladdr_to_temp(ir_instruction *ir,
+				    src_reg *reg, int *num_reladdr)
+{
+   if (!reg->reladdr)
+      return;
+
+   emit(ir, OPCODE_ARL, address_reg, *reg->reladdr);
+
+   if (*num_reladdr != 1) {
+      src_reg temp = get_temp(glsl_type::vec4_type);
+
+      emit(ir, OPCODE_MOV, dst_reg(temp), *reg);
+      *reg = temp;
+   }
+
+   (*num_reladdr)--;
+}
+
+void
+ir_to_mesa_visitor::emit_swz(ir_expression *ir)
+{
+   /* Assume that the vector operator is in a form compatible with OPCODE_SWZ.
+    * This means that each of the operands is either an immediate value of -1,
+    * 0, or 1, or is a component from one source register (possibly with
+    * negation).
+    */
+   uint8_t components[4] = { 0 };
+   bool negate[4] = { false };
+   ir_variable *var = NULL;
+
+   for (unsigned i = 0; i < ir->type->vector_elements; i++) {
+      ir_rvalue *op = ir->operands[i];
+
+      assert(op->type->is_scalar());
+
+      while (op != NULL) {
+	 switch (op->ir_type) {
+	 case ir_type_constant: {
+
+	    assert(op->type->is_scalar());
+
+	    const ir_constant *const c = op->as_constant();
+	    if (c->is_one()) {
+	       components[i] = SWIZZLE_ONE;
+	    } else if (c->is_zero()) {
+	       components[i] = SWIZZLE_ZERO;
+	    } else if (c->is_negative_one()) {
+	       components[i] = SWIZZLE_ONE;
+	       negate[i] = true;
+	    } else {
	       assert(!"SWZ constant must be 0.0, 1.0, or -1.0.");
+	    }
+
+	    op = NULL;
+	    break;
+	 }
+
+	 case ir_type_dereference_variable: {
+	    ir_dereference_variable *const deref =
+	       (ir_dereference_variable *) op;
+
+	    assert((var == NULL) || (deref->var == var));
+	    components[i] = SWIZZLE_X;
+	    var = deref->var;
+	    op = NULL;
+	    break;
+	 }
+
+	 case ir_type_expression: {
+	    ir_expression *const expr = (ir_expression *) op;
+
+	    assert(expr->operation == ir_unop_neg);
+	    negate[i] = true;
+
+	    op = expr->operands[0];
+	    break;
+	 }
+
+	 case ir_type_swizzle: {
+	    ir_swizzle *const swiz = (ir_swizzle *) op;
+
+	    components[i] = swiz->mask.x;
+	    op = swiz->val;
+	    break;
+	 }
+
+	 default:
+	    assert(!"Should not get here.");
+	    return;
+	 }
+      }
+   }
+
+   assert(var != NULL);
+
+   ir_dereference_variable *const deref =
+      new(mem_ctx) ir_dereference_variable(var);
+
+   this->result.file = PROGRAM_UNDEFINED;
+   deref->accept(this);
+   if (this->result.file == PROGRAM_UNDEFINED) {
+      printf("Failed to get tree for expression operand:\n");
+      deref->print();
+      printf("\n");
+      exit(1);
+   }
+
+   src_reg src;
+
+   src = this->result;
+   src.swizzle = MAKE_SWIZZLE4(components[0],
+			       components[1],
+			       components[2],
+			       components[3]);
+   src.negate = ((unsigned(negate[0]) << 0)
+		 | (unsigned(negate[1]) << 1)
+		 | (unsigned(negate[2]) << 2)
+		 | (unsigned(negate[3]) << 3));
+
+   /* Storage for our result.  Ideally for an assignment we'd be using the
+    * actual storage for the result here, instead.
+    */
+   const src_reg result_src = get_temp(ir->type);
+   dst_reg result_dst = dst_reg(result_src);
+
+   /* Limit writes to the channels that will be used by result_src later.
+    * This does limit this temp's use as a temporary for multi-instruction
+    * sequences.
+    */
+   result_dst.writemask = (1 << ir->type->vector_elements) - 1;
+
+   emit(ir, OPCODE_SWZ, result_dst, src);
+   this->result = result_src;
+}
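+
+/* Illustrative example of what emit_swz() produces (an added note, not
+ * part of the original code): for an ir_quadop_vector such as
+ * vec4(v.y, -v.z, 0.0, 1.0), where every operand is a constant or a
+ * (possibly negated) component of the single variable v, the loop above
+ * collapses the whole expression into one instruction:
+ *
+ *    SWZ result, v.yz01  (with negate mask 0x2, i.e. channel 1 negated)
+ *
+ * instead of emitting a MOV per channel.
+ */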
+
+void
+ir_to_mesa_visitor::visit(ir_expression *ir)
+{
+   unsigned int operand;
+   src_reg op[Elements(ir->operands)];
+   src_reg result_src;
+   dst_reg result_dst;
+
+   /* Quick peephole: Emit OPCODE_MAD(a, b, c) instead of ADD(MUL(a, b), c)
+    */
+   if (ir->operation == ir_binop_add) {
+      if (try_emit_mad(ir, 1))
+	 return;
+      if (try_emit_mad(ir, 0))
+	 return;
+   }
+
+   /* Quick peephole: Emit OPCODE_MAD(a, -b, a) instead of AND(a, NOT(b))
+    */
+   if (ir->operation == ir_binop_logic_and) {
+      if (try_emit_mad_for_and_not(ir, 1))
+	 return;
+      if (try_emit_mad_for_and_not(ir, 0))
+	 return;
+   }
+
+   if (try_emit_sat(ir))
+      return;
+
+   if (ir->operation == ir_quadop_vector) {
+      this->emit_swz(ir);
+      return;
+   }
+
+   for (operand = 0; operand < ir->get_num_operands(); operand++) {
+      this->result.file = PROGRAM_UNDEFINED;
+      ir->operands[operand]->accept(this);
+      if (this->result.file == PROGRAM_UNDEFINED) {
+	 printf("Failed to get tree for expression operand:\n");
+         ir->operands[operand]->print();
+         printf("\n");
+	 exit(1);
+      }
+      op[operand] = this->result;
+
+      /* Matrix expression operands should have been broken down to vector
+       * operations already.
+       */
+      assert(!ir->operands[operand]->type->is_matrix());
+   }
+
+   int vector_elements = ir->operands[0]->type->vector_elements;
+   if (ir->operands[1]) {
+      vector_elements = MAX2(vector_elements,
+			     ir->operands[1]->type->vector_elements);
+   }
+
+   this->result.file = PROGRAM_UNDEFINED;
+
+   /* Storage for our result.  Ideally for an assignment we'd be using
+    * the actual storage for the result here, instead.
+    */
+   result_src = get_temp(ir->type);
+   /* convenience for the emit functions below. */
+   result_dst = dst_reg(result_src);
+   /* Limit writes to the channels that will be used by result_src later.
+    * This does limit this temp's use as a temporary for multi-instruction
+    * sequences.
+    */
+   result_dst.writemask = (1 << ir->type->vector_elements) - 1;
+
+   switch (ir->operation) {
+   case ir_unop_logic_not:
+      /* Previously 'SEQ dst, src, 0.0' was used for this.  However, many
+       * older GPUs implement SEQ using multiple instructions (i915 uses two
+       * SGE instructions and a MUL instruction).  Since our logic values are
+       * 0.0 and 1.0, 1-x also implements !x.
+       */
+      op[0].negate = ~op[0].negate;
+      emit(ir, OPCODE_ADD, result_dst, op[0], src_reg_for_float(1.0));
+      break;
+   case ir_unop_neg:
+      op[0].negate = ~op[0].negate;
+      result_src = op[0];
+      break;
+   case ir_unop_abs:
+      emit(ir, OPCODE_ABS, result_dst, op[0]);
+      break;
+   case ir_unop_sign:
+      emit(ir, OPCODE_SSG, result_dst, op[0]);
+      break;
+   case ir_unop_rcp:
+      emit_scalar(ir, OPCODE_RCP, result_dst, op[0]);
+      break;
+
+   case ir_unop_exp2:
+      emit_scalar(ir, OPCODE_EX2, result_dst, op[0]);
+      break;
+   case ir_unop_exp:
+   case ir_unop_log:
+      assert(!"not reached: should be handled by ir_explog_to_explog2");
+      break;
+   case ir_unop_log2:
+      emit_scalar(ir, OPCODE_LG2, result_dst, op[0]);
+      break;
+   case ir_unop_sin:
+      emit_scalar(ir, OPCODE_SIN, result_dst, op[0]);
+      break;
+   case ir_unop_cos:
+      emit_scalar(ir, OPCODE_COS, result_dst, op[0]);
+      break;
+   case ir_unop_sin_reduced:
+      emit_scs(ir, OPCODE_SIN, result_dst, op[0]);
+      break;
+   case ir_unop_cos_reduced:
+      emit_scs(ir, OPCODE_COS, result_dst, op[0]);
+      break;
+
+   case ir_unop_dFdx:
+      emit(ir, OPCODE_DDX, result_dst, op[0]);
+      break;
+   case ir_unop_dFdy:
+      emit(ir, OPCODE_DDY, result_dst, op[0]);
+      break;
+
+   case ir_unop_noise: {
+      const enum prog_opcode opcode =
+	 prog_opcode(OPCODE_NOISE1
+		     + (ir->operands[0]->type->vector_elements) - 1);
+      assert((opcode >= OPCODE_NOISE1) && (opcode <= OPCODE_NOISE4));
+
+      emit(ir, opcode, result_dst, op[0]);
+      break;
+   }
+
+   case ir_binop_add:
+      emit(ir, OPCODE_ADD, result_dst, op[0], op[1]);
+      break;
+   case ir_binop_sub:
+      emit(ir, OPCODE_SUB, result_dst, op[0], op[1]);
+      break;
+
+   case ir_binop_mul:
+      emit(ir, OPCODE_MUL, result_dst, op[0], op[1]);
+      break;
+   case ir_binop_div:
+      assert(!"not reached: should be handled by ir_div_to_mul_rcp");
+      break;
+   case ir_binop_mod:
+      /* Floating point should be lowered by MOD_TO_FRACT in the compiler. */
+      assert(ir->type->is_integer());
+      emit(ir, OPCODE_MUL, result_dst, op[0], op[1]);
+      break;
+
+   case ir_binop_less:
+      emit(ir, OPCODE_SLT, result_dst, op[0], op[1]);
+      break;
+   case ir_binop_greater:
+      emit(ir, OPCODE_SGT, result_dst, op[0], op[1]);
+      break;
+   case ir_binop_lequal:
+      emit(ir, OPCODE_SLE, result_dst, op[0], op[1]);
+      break;
+   case ir_binop_gequal:
+      emit(ir, OPCODE_SGE, result_dst, op[0], op[1]);
+      break;
+   case ir_binop_equal:
+      emit(ir, OPCODE_SEQ, result_dst, op[0], op[1]);
+      break;
+   case ir_binop_nequal:
+      emit(ir, OPCODE_SNE, result_dst, op[0], op[1]);
+      break;
+   case ir_binop_all_equal:
+      /* "==" operator producing a scalar boolean. */
+      if (ir->operands[0]->type->is_vector() ||
+	  ir->operands[1]->type->is_vector()) {
+	 src_reg temp = get_temp(glsl_type::vec4_type);
+	 emit(ir, OPCODE_SNE, dst_reg(temp), op[0], op[1]);
+
+	 /* After the dot-product, the value will be an integer on the
+	  * range [0,4].  Zero becomes 1.0, and positive values become zero.
+	  */
+	 emit_dp(ir, result_dst, temp, temp, vector_elements);
+
+	 /* Negating the result of the dot-product gives values on the range
+	  * [-4, 0].  Zero becomes 1.0, and negative values become zero.  This
	  * is achieved using SGE.
+	  */
+	 src_reg sge_src = result_src;
+	 sge_src.negate = ~sge_src.negate;
+	 emit(ir, OPCODE_SGE, result_dst, sge_src, src_reg_for_float(0.0));
+      } else {
+	 emit(ir, OPCODE_SEQ, result_dst, op[0], op[1]);
+      }
+      break;
+   case ir_binop_any_nequal:
+      /* "!=" operator producing a scalar boolean. */
+      if (ir->operands[0]->type->is_vector() ||
+	  ir->operands[1]->type->is_vector()) {
+	 src_reg temp = get_temp(glsl_type::vec4_type);
+	 emit(ir, OPCODE_SNE, dst_reg(temp), op[0], op[1]);
+
+	 /* After the dot-product, the value will be an integer on the
+	  * range [0,4].  Zero stays zero, and positive values become 1.0.
+	  */
+	 ir_to_mesa_instruction *const dp =
+	    emit_dp(ir, result_dst, temp, temp, vector_elements);
+	 if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB) {
+	    /* The clamping to [0,1] can be done for free in the fragment
+	     * shader with a saturate.
+	     */
+	    dp->saturate = true;
+	 } else {
+	    /* Negating the result of the dot-product gives values on the range
+	     * [-4, 0].  Zero stays zero, and negative values become 1.0.  This
	     * is achieved using SLT.
+	     */
+	    src_reg slt_src = result_src;
+	    slt_src.negate = ~slt_src.negate;
+	    emit(ir, OPCODE_SLT, result_dst, slt_src, src_reg_for_float(0.0));
+	 }
+      } else {
+	 emit(ir, OPCODE_SNE, result_dst, op[0], op[1]);
+      }
+      break;
+
+   case ir_unop_any: {
+      assert(ir->operands[0]->type->is_vector());
+
+      /* After the dot-product, the value will be an integer on the
+       * range [0,4].  Zero stays zero, and positive values become 1.0.
+       */
+      ir_to_mesa_instruction *const dp =
+	 emit_dp(ir, result_dst, op[0], op[0],
+		 ir->operands[0]->type->vector_elements);
+      if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB) {
+	 /* The clamping to [0,1] can be done for free in the fragment
+	  * shader with a saturate.
+	  */
+	 dp->saturate = true;
+      } else {
+	 /* Negating the result of the dot-product gives values on the range
+	  * [-4, 0].  Zero stays zero, and negative values become 1.0.  This
+	  * is achieved using SLT.
+	  */
+	 src_reg slt_src = result_src;
+	 slt_src.negate = ~slt_src.negate;
+	 emit(ir, OPCODE_SLT, result_dst, slt_src, src_reg_for_float(0.0));
+      }
+      break;
+   }
+
+   case ir_binop_logic_xor:
+      emit(ir, OPCODE_SNE, result_dst, op[0], op[1]);
+      break;
+
+   case ir_binop_logic_or: {
+      /* After the addition, the value will be an integer on the
+       * range [0,2].  Zero stays zero, and positive values become 1.0.
+       */
+      ir_to_mesa_instruction *add =
+	 emit(ir, OPCODE_ADD, result_dst, op[0], op[1]);
+      if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB) {
+	 /* The clamping to [0,1] can be done for free in the fragment
+	  * shader with a saturate.
+	  */
+	 add->saturate = true;
+      } else {
+	 /* Negating the result of the addition gives values on the range
+	  * [-2, 0].  Zero stays zero, and negative values become 1.0.  This
+	  * is achieved using SLT.
+	  */
+	 src_reg slt_src = result_src;
+	 slt_src.negate = ~slt_src.negate;
+	 emit(ir, OPCODE_SLT, result_dst, slt_src, src_reg_for_float(0.0));
+      }
+      break;
+   }
+
+   case ir_binop_logic_and:
+      /* the bool args are stored as float 0.0 or 1.0, so "mul" gives us "and". */
+      emit(ir, OPCODE_MUL, result_dst, op[0], op[1]);
+      break;
+
+   case ir_binop_dot:
+      assert(ir->operands[0]->type->is_vector());
+      assert(ir->operands[0]->type == ir->operands[1]->type);
+      emit_dp(ir, result_dst, op[0], op[1],
+	      ir->operands[0]->type->vector_elements);
+      break;
+
+   case ir_unop_sqrt:
+      /* sqrt(x) = x * rsq(x). */
+      emit_scalar(ir, OPCODE_RSQ, result_dst, op[0]);
+      emit(ir, OPCODE_MUL, result_dst, result_src, op[0]);
+      /* For incoming channels <= 0, set the result to 0. */
+      op[0].negate = ~op[0].negate;
+      emit(ir, OPCODE_CMP, result_dst,
+			  op[0], result_src, src_reg_for_float(0.0));
+      break;
+   case ir_unop_rsq:
+      emit_scalar(ir, OPCODE_RSQ, result_dst, op[0]);
+      break;
+   case ir_unop_i2f:
+   case ir_unop_u2f:
+   case ir_unop_b2f:
+   case ir_unop_b2i:
+   case ir_unop_i2u:
+   case ir_unop_u2i:
+      /* Mesa IR lacks types, ints are stored as truncated floats. */
+      result_src = op[0];
+      break;
+   case ir_unop_f2i:
+   case ir_unop_f2u:
+      emit(ir, OPCODE_TRUNC, result_dst, op[0]);
+      break;
+   case ir_unop_f2b:
+   case ir_unop_i2b:
+      emit(ir, OPCODE_SNE, result_dst,
+			  op[0], src_reg_for_float(0.0));
+      break;
+   case ir_unop_bitcast_f2i: // Ignore these 4, they can't happen here anyway
+   case ir_unop_bitcast_f2u:
+   case ir_unop_bitcast_i2f:
+   case ir_unop_bitcast_u2f:
+      break;
+   case ir_unop_trunc:
+      emit(ir, OPCODE_TRUNC, result_dst, op[0]);
+      break;
+   case ir_unop_ceil:
+      op[0].negate = ~op[0].negate;
+      emit(ir, OPCODE_FLR, result_dst, op[0]);
+      result_src.negate = ~result_src.negate;
+      break;
+   case ir_unop_floor:
+      emit(ir, OPCODE_FLR, result_dst, op[0]);
+      break;
+   case ir_unop_fract:
+      emit(ir, OPCODE_FRC, result_dst, op[0]);
+      break;
+   case ir_unop_pack_snorm_2x16:
+   case ir_unop_pack_snorm_4x8:
+   case ir_unop_pack_unorm_2x16:
+   case ir_unop_pack_unorm_4x8:
+   case ir_unop_pack_half_2x16:
+   case ir_unop_unpack_snorm_2x16:
+   case ir_unop_unpack_snorm_4x8:
+   case ir_unop_unpack_unorm_2x16:
+   case ir_unop_unpack_unorm_4x8:
+   case ir_unop_unpack_half_2x16:
+   case ir_unop_unpack_half_2x16_split_x:
+   case ir_unop_unpack_half_2x16_split_y:
+   case ir_binop_pack_half_2x16_split:
+   case ir_unop_bitfield_reverse:
+   case ir_unop_bit_count:
+   case ir_unop_find_msb:
+   case ir_unop_find_lsb:
+      assert(!"not supported");
+      break;
+   case ir_binop_min:
+      emit(ir, OPCODE_MIN, result_dst, op[0], op[1]);
+      break;
+   case ir_binop_max:
+      emit(ir, OPCODE_MAX, result_dst, op[0], op[1]);
+      break;
+   case ir_binop_pow:
+      emit_scalar(ir, OPCODE_POW, result_dst, op[0], op[1]);
+      break;
+
+      /* GLSL 1.30 integer ops are unsupported in Mesa IR, but since
+       * hardware backends have no way to avoid Mesa IR generation
+       * even if they don't use it, we need to emit "something" and
+       * continue.
+       */
+   case ir_binop_lshift:
+   case ir_binop_rshift:
+   case ir_binop_bit_and:
+   case ir_binop_bit_xor:
+   case ir_binop_bit_or:
+      emit(ir, OPCODE_ADD, result_dst, op[0], op[1]);
+      break;
+
+   case ir_unop_bit_not:
+   case ir_unop_round_even:
+      emit(ir, OPCODE_MOV, result_dst, op[0]);
+      break;
+
+   case ir_binop_ubo_load:
+      assert(!"not supported");
+      break;
+
+   case ir_triop_lrp:
+      /* ir_triop_lrp operands are (x, y, a) while
+       * OPCODE_LRP operands are (a, y, x) to match ARB_fragment_program.
+       */
+      emit(ir, OPCODE_LRP, result_dst, op[2], op[1], op[0]);
+      break;
+
+   case ir_binop_vector_extract:
+   case ir_binop_bfm:
+   case ir_triop_fma:
+   case ir_triop_bfi:
+   case ir_triop_bitfield_extract:
+   case ir_triop_vector_insert:
+   case ir_quadop_bitfield_insert:
+   case ir_binop_ldexp:
+   case ir_triop_csel:
+   case ir_binop_carry:
+   case ir_binop_borrow:
+   case ir_binop_imul_high:
+      assert(!"not supported");
+      break;
+
+   case ir_quadop_vector:
+      /* This operation should have already been handled.
+       */
+      assert(!"Should not get here.");
+      break;
+   }
+
+   this->result = result_src;
+}
+
+
+void
+ir_to_mesa_visitor::visit(ir_swizzle *ir)
+{
+   src_reg src;
+   int i;
+   int swizzle[4];
+
+   /* Note that this is only swizzles in expressions, not those on the left
+    * hand side of an assignment, which do write masking.  See ir_assignment
+    * for that.
+    */
+
+   ir->val->accept(this);
+   src = this->result;
+   assert(src.file != PROGRAM_UNDEFINED);
+
+   for (i = 0; i < 4; i++) {
+      if (i < ir->type->vector_elements) {
+	 switch (i) {
+	 case 0:
+	    swizzle[i] = GET_SWZ(src.swizzle, ir->mask.x);
+	    break;
+	 case 1:
+	    swizzle[i] = GET_SWZ(src.swizzle, ir->mask.y);
+	    break;
+	 case 2:
+	    swizzle[i] = GET_SWZ(src.swizzle, ir->mask.z);
+	    break;
+	 case 3:
+	    swizzle[i] = GET_SWZ(src.swizzle, ir->mask.w);
+	    break;
+	 }
+      } else {
+	 /* If the type is smaller than a vec4, replicate the last
+	  * channel out.
+	  */
+	 swizzle[i] = swizzle[ir->type->vector_elements - 1];
+      }
+   }
+
+   src.swizzle = MAKE_SWIZZLE4(swizzle[0], swizzle[1], swizzle[2], swizzle[3]);
+
+   this->result = src;
+}
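+
+/* Illustrative example of the swizzle composition above (an added note,
+ * not part of the original code): if the swizzled value already carries
+ * src.swizzle = .yzww and ir is a .xz swizzle producing a vec2, channel 0
+ * becomes GET_SWZ(.yzww, x) = y and channel 1 becomes GET_SWZ(.yzww, z)
+ * = w; the last channel is then replicated, giving src.swizzle = .ywww.
+ */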
+
+void
+ir_to_mesa_visitor::visit(ir_dereference_variable *ir)
+{
+   variable_storage *entry = find_variable_storage(ir->var);
+   ir_variable *var = ir->var;
+
+   if (!entry) {
+      switch (var->data.mode) {
+      case ir_var_uniform:
+	 entry = new(mem_ctx) variable_storage(var, PROGRAM_UNIFORM,
+					       var->data.location);
+	 this->variables.push_tail(entry);
+	 break;
+      case ir_var_shader_in:
+	 /* The linker assigns locations for varyings and attributes,
+	  * including deprecated builtins (like gl_Color),
+	  * user-assigned generic attributes (glBindVertexLocation),
+	  * and user-defined varyings.
+	  */
+	 assert(var->data.location != -1);
+         entry = new(mem_ctx) variable_storage(var,
+                                               PROGRAM_INPUT,
+                                               var->data.location);
+         break;
+      case ir_var_shader_out:
+	 assert(var->data.location != -1);
+         entry = new(mem_ctx) variable_storage(var,
+                                               PROGRAM_OUTPUT,
+                                               var->data.location);
+	 break;
+      case ir_var_system_value:
+         entry = new(mem_ctx) variable_storage(var,
+                                               PROGRAM_SYSTEM_VALUE,
+                                               var->data.location);
+         break;
+      case ir_var_auto:
+      case ir_var_temporary:
+	 entry = new(mem_ctx) variable_storage(var, PROGRAM_TEMPORARY,
+					       this->next_temp);
+	 this->variables.push_tail(entry);
+
+	 next_temp += type_size(var->type);
+	 break;
+      }
+
+      if (!entry) {
+	 printf("Failed to make storage for %s\n", var->name);
+	 exit(1);
+      }
+   }
+
+   this->result = src_reg(entry->file, entry->index, var->type);
+}
+
+void
+ir_to_mesa_visitor::visit(ir_dereference_array *ir)
+{
+   ir_constant *index;
+   src_reg src;
+   int element_size = type_size(ir->type);
+
+   index = ir->array_index->constant_expression_value();
+
+   ir->array->accept(this);
+   src = this->result;
+
+   if (index) {
+      src.index += index->value.i[0] * element_size;
+   } else {
+      /* Variable index array dereference.  It eats the "vec4" of the
+       * base of the array and an index that offsets the Mesa register
+       * index.
+       */
+      ir->array_index->accept(this);
+
+      src_reg index_reg;
+
+      if (element_size == 1) {
+	 index_reg = this->result;
+      } else {
+	 index_reg = get_temp(glsl_type::float_type);
+
+	 emit(ir, OPCODE_MUL, dst_reg(index_reg),
+	      this->result, src_reg_for_float(element_size));
+      }
+
+      /* If there was already a relative address register involved, add the
+       * new and the old together to get the new offset.
+       */
+      if (src.reladdr != NULL)  {
+	 src_reg accum_reg = get_temp(glsl_type::float_type);
+
+	 emit(ir, OPCODE_ADD, dst_reg(accum_reg),
+	      index_reg, *src.reladdr);
+
+	 index_reg = accum_reg;
+      }
+
+      src.reladdr = ralloc(mem_ctx, src_reg);
+      memcpy(src.reladdr, &index_reg, sizeof(index_reg));
+   }
+
+   /* If the type is smaller than a vec4, replicate the last channel out. */
+   if (ir->type->is_scalar() || ir->type->is_vector())
+      src.swizzle = swizzle_for_size(ir->type->vector_elements);
+   else
+      src.swizzle = SWIZZLE_NOOP;
+
+   this->result = src;
+}
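+
+/* Illustrative example for the variable-index path above (an added note,
+ * not part of the original code): for "mat4 m[8]; ... m[i]",
+ * element_size = type_size(mat4) = 4, so the index is scaled with
+ * MUL tmp, i, 4.0 and stored in src.reladdr; reladdr_to_temp() later
+ * emits the ARL that loads it into the address register.
+ */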
+
+void
+ir_to_mesa_visitor::visit(ir_dereference_record *ir)
+{
+   unsigned int i;
+   const glsl_type *struct_type = ir->record->type;
+   int offset = 0;
+
+   ir->record->accept(this);
+
+   for (i = 0; i < struct_type->length; i++) {
+      if (strcmp(struct_type->fields.structure[i].name, ir->field) == 0)
+	 break;
+      offset += type_size(struct_type->fields.structure[i].type);
+   }
+
+   /* If the type is smaller than a vec4, replicate the last channel out. */
+   if (ir->type->is_scalar() || ir->type->is_vector())
+      this->result.swizzle = swizzle_for_size(ir->type->vector_elements);
+   else
+      this->result.swizzle = SWIZZLE_NOOP;
+
+   this->result.index += offset;
+}
+
+/**
+ * We want to be careful in assignment setup to hit the actual storage
+ * instead of potentially using a temporary like we might with the
+ * ir_dereference handler.
+ */
+static dst_reg
+get_assignment_lhs(ir_dereference *ir, ir_to_mesa_visitor *v)
+{
+   /* The LHS must be a dereference.  If the LHS is a variable indexed array
+    * access of a vector, it must be separated into a series of conditional moves
+    * before reaching this point (see ir_vec_index_to_cond_assign).
+    */
+   assert(ir->as_dereference());
+   ir_dereference_array *deref_array = ir->as_dereference_array();
+   if (deref_array) {
+      assert(!deref_array->array->type->is_vector());
+   }
+
+   /* Use the rvalue deref handler for the most part.  We'll ignore
+    * swizzles in it and write swizzles using writemask, though.
+    */
+   ir->accept(v);
+   return dst_reg(v->result);
+}
+
+/**
+ * Process the condition of a conditional assignment
+ *
+ * Examines the condition of a conditional assignment to generate the optimal
+ * first operand of a \c CMP instruction.  If the condition is a relational
+ * operator with 0 (e.g., \c ir_binop_less), the value being compared will be
+ * used as the source for the \c CMP instruction.  Otherwise the comparison
+ * is processed to a boolean result, and the boolean result is used as the
+ * operand to the CMP instruction.
+ */
+bool
+ir_to_mesa_visitor::process_move_condition(ir_rvalue *ir)
+{
+   ir_rvalue *src_ir = ir;
+   bool negate = true;
+   bool switch_order = false;
+
+   ir_expression *const expr = ir->as_expression();
+   if ((expr != NULL) && (expr->get_num_operands() == 2)) {
+      bool zero_on_left = false;
+
+      if (expr->operands[0]->is_zero()) {
+	 src_ir = expr->operands[1];
+	 zero_on_left = true;
+      } else if (expr->operands[1]->is_zero()) {
+	 src_ir = expr->operands[0];
+	 zero_on_left = false;
+      }
+
+      /*      a is -  0  +            -  0  +
+       * (a <  0)  T  F  F  ( a < 0)  T  F  F
+       * (0 <  a)  F  F  T  (-a < 0)  F  F  T
+       * (a <= 0)  T  T  F  (-a < 0)  F  F  T  (swap order of other operands)
+       * (0 <= a)  F  T  T  ( a < 0)  T  F  F  (swap order of other operands)
+       * (a >  0)  F  F  T  (-a < 0)  F  F  T
+       * (0 >  a)  T  F  F  ( a < 0)  T  F  F
+       * (a >= 0)  F  T  T  ( a < 0)  T  F  F  (swap order of other operands)
+       * (0 >= a)  T  T  F  (-a < 0)  F  F  T  (swap order of other operands)
+       *
+       * Note that exchanging the order of 0 and 'a' in the comparison simply
+       * means that the value of 'a' should be negated.
+       */
+      if (src_ir != ir) {
+	 switch (expr->operation) {
+	 case ir_binop_less:
+	    switch_order = false;
+	    negate = zero_on_left;
+	    break;
+
+	 case ir_binop_greater:
+	    switch_order = false;
+	    negate = !zero_on_left;
+	    break;
+
+	 case ir_binop_lequal:
+	    switch_order = true;
+	    negate = !zero_on_left;
+	    break;
+
+	 case ir_binop_gequal:
+	    switch_order = true;
+	    negate = zero_on_left;
+	    break;
+
+	 default:
	    /* This isn't the right kind of comparison after all, so make sure
+	     * the whole condition is visited.
+	     */
+	    src_ir = ir;
+	    break;
+	 }
+      }
+   }
+
+   src_ir->accept(this);
+
+   /* We use the OPCODE_CMP (a < 0 ? b : c) for conditional moves, and the
+    * condition we produced is 0.0 or 1.0.  By flipping the sign, we can
+    * choose which value OPCODE_CMP produces without an extra instruction
+    * computing the condition.
+    */
+   if (negate)
+      this->result.negate = ~this->result.negate;
+
+   return switch_order;
+}
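+
+/* Illustrative example of the mapping above (an added note, not part of
+ * the original code): for a conditional move guarded by (a <= 0), the
+ * table selects switch_order = true and negate = true, so the caller
+ * emits CMP dst, -a, dst, new.  Since -a < 0 exactly when a > 0,
+ * swapping the last two operands yields the (a <= 0) behavior without a
+ * separate comparison instruction.
+ */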
+
+void
+ir_to_mesa_visitor::visit(ir_assignment *ir)
+{
+   dst_reg l;
+   src_reg r;
+   int i;
+
+   ir->rhs->accept(this);
+   r = this->result;
+
+   l = get_assignment_lhs(ir->lhs, this);
+
+   /* FINISHME: This should really set to the correct maximal writemask for each
+    * FINISHME: component written (in the loops below).  This case can only
+    * FINISHME: occur for matrices, arrays, and structures.
+    */
+   if (ir->write_mask == 0) {
+      assert(!ir->lhs->type->is_scalar() && !ir->lhs->type->is_vector());
+      l.writemask = WRITEMASK_XYZW;
+   } else if (ir->lhs->type->is_scalar()) {
+      /* FINISHME: This hack makes writing to gl_FragDepth, which lives in the
+       * FINISHME: W component of fragment shader output zero, work correctly.
+       */
+      l.writemask = WRITEMASK_XYZW;
+   } else {
+      int swizzles[4];
+      int first_enabled_chan = 0;
+      int rhs_chan = 0;
+
+      assert(ir->lhs->type->is_vector());
+      l.writemask = ir->write_mask;
+
+      for (int i = 0; i < 4; i++) {
+	 if (l.writemask & (1 << i)) {
+	    first_enabled_chan = GET_SWZ(r.swizzle, i);
+	    break;
+	 }
+      }
+
+      /* Swizzle a small RHS vector into the channels being written.
+       *
+       * glsl ir treats write_mask as dictating how many channels are
+       * present on the RHS while Mesa IR treats write_mask as just
+       * showing which channels of the vec4 RHS get written.
+       */
+      for (int i = 0; i < 4; i++) {
+	 if (l.writemask & (1 << i))
+	    swizzles[i] = GET_SWZ(r.swizzle, rhs_chan++);
+	 else
+	    swizzles[i] = first_enabled_chan;
+      }
+      r.swizzle = MAKE_SWIZZLE4(swizzles[0], swizzles[1],
+				swizzles[2], swizzles[3]);
+   }
+
+   assert(l.file != PROGRAM_UNDEFINED);
+   assert(r.file != PROGRAM_UNDEFINED);
+
+   if (ir->condition) {
+      const bool switch_order = this->process_move_condition(ir->condition);
+      src_reg condition = this->result;
+
+      for (i = 0; i < type_size(ir->lhs->type); i++) {
+	 if (switch_order) {
+	    emit(ir, OPCODE_CMP, l, condition, src_reg(l), r);
+	 } else {
+	    emit(ir, OPCODE_CMP, l, condition, r, src_reg(l));
+	 }
+
+	 l.index++;
+	 r.index++;
+      }
+   } else {
+      for (i = 0; i < type_size(ir->lhs->type); i++) {
+	 emit(ir, OPCODE_MOV, l, r);
+	 l.index++;
+	 r.index++;
+      }
+   }
+}
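+
+/* Illustrative example of the RHS re-swizzling above (an added note, not
+ * part of the original code): for "v.yz = u" with a vec2 RHS (swizzle
+ * .xyyy), the write mask is YZ, first_enabled_chan = GET_SWZ(.xyyy, 1)
+ * = y, and the loop produces the swizzles { y, x, y, y }, i.e.
+ * r.swizzle = .yxyy.  The emitted MOV dst.yz, r.yxyy then writes r.x
+ * into dst.y and r.y into dst.z.
+ */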
+
+
+void
+ir_to_mesa_visitor::visit(ir_constant *ir)
+{
+   src_reg src;
+   GLfloat stack_vals[4] = { 0 };
+   GLfloat *values = stack_vals;
+   unsigned int i;
+
+   /* Unfortunately, 4 floats is all we can get into
+    * _mesa_add_unnamed_constant.  So, make a temp to store an
+    * aggregate constant and move each constant value into it.  If we
+    * get lucky, copy propagation will eliminate the extra moves.
+    */
+
+   if (ir->type->base_type == GLSL_TYPE_STRUCT) {
+      src_reg temp_base = get_temp(ir->type);
+      dst_reg temp = dst_reg(temp_base);
+
+      foreach_list(node, &ir->components) {
+	 ir_constant *field_value = (ir_constant *) node;
+	 int size = type_size(field_value->type);
+
+	 assert(size > 0);
+
+	 field_value->accept(this);
+	 src = this->result;
+
+	 for (i = 0; i < (unsigned int)size; i++) {
+	    emit(ir, OPCODE_MOV, temp, src);
+
+	    src.index++;
+	    temp.index++;
+	 }
+      }
+      this->result = temp_base;
+      return;
+   }
+
+   if (ir->type->is_array()) {
+      src_reg temp_base = get_temp(ir->type);
+      dst_reg temp = dst_reg(temp_base);
+      int size = type_size(ir->type->fields.array);
+
+      assert(size > 0);
+
+      for (i = 0; i < ir->type->length; i++) {
+	 ir->array_elements[i]->accept(this);
+	 src = this->result;
+	 for (int j = 0; j < size; j++) {
+	    emit(ir, OPCODE_MOV, temp, src);
+
+	    src.index++;
+	    temp.index++;
+	 }
+      }
+      this->result = temp_base;
+      return;
+   }
+
+   if (ir->type->is_matrix()) {
+      src_reg mat = get_temp(ir->type);
+      dst_reg mat_column = dst_reg(mat);
+
+      for (i = 0; i < ir->type->matrix_columns; i++) {
+	 assert(ir->type->base_type == GLSL_TYPE_FLOAT);
+	 values = &ir->value.f[i * ir->type->vector_elements];
+
+	 src = src_reg(PROGRAM_CONSTANT, -1, NULL);
+	 src.index = _mesa_add_unnamed_constant(this->prog->Parameters,
+						(gl_constant_value *) values,
+						ir->type->vector_elements,
+						&src.swizzle);
+	 emit(ir, OPCODE_MOV, mat_column, src);
+
+	 mat_column.index++;
+      }
+
+      this->result = mat;
+      return;
+   }
+
+   src.file = PROGRAM_CONSTANT;
+   switch (ir->type->base_type) {
+   case GLSL_TYPE_FLOAT:
+      values = &ir->value.f[0];
+      break;
+   case GLSL_TYPE_UINT:
+      for (i = 0; i < ir->type->vector_elements; i++) {
+	 values[i] = ir->value.u[i];
+      }
+      break;
+   case GLSL_TYPE_INT:
+      for (i = 0; i < ir->type->vector_elements; i++) {
+	 values[i] = ir->value.i[i];
+      }
+      break;
+   case GLSL_TYPE_BOOL:
+      for (i = 0; i < ir->type->vector_elements; i++) {
+	 values[i] = ir->value.b[i];
+      }
+      break;
+   default:
+      assert(!"Non-float/uint/int/bool constant");
+   }
+
+   this->result = src_reg(PROGRAM_CONSTANT, -1, ir->type);
+   this->result.index = _mesa_add_unnamed_constant(this->prog->Parameters,
+						   (gl_constant_value *) values,
+						   ir->type->vector_elements,
+						   &this->result.swizzle);
+}
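+
+/* Note on the scalar/vector path above (an added aside based on how the
+ * parameter list is understood to behave, not part of the original
+ * code): _mesa_add_unnamed_constant() may pack a small constant into
+ * spare channels of an existing PROGRAM_CONSTANT slot, which is why it
+ * hands back the swizzle needed to read the value rather than always
+ * allocating a fresh vec4.
+ */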
+
+void
+ir_to_mesa_visitor::visit(ir_call *)
+{
+   assert(!"ir_to_mesa: All function calls should have been inlined by now.");
+}
+
+void
+ir_to_mesa_visitor::visit(ir_texture *ir)
+{
+   src_reg result_src, coord, lod_info, projector, dx, dy;
+   dst_reg result_dst, coord_dst;
+   ir_to_mesa_instruction *inst = NULL;
+   prog_opcode opcode = OPCODE_NOP;
+
+   /* Neither opcode uses coordinate */
+   if (ir->op == ir_txs || ir->op == ir_query_levels)
+      this->result = src_reg_for_float(0.0);
+   else
+      ir->coordinate->accept(this);
+
+   /* Put our coords in a temp.  We'll need to modify them for shadow,
+    * projection, or LOD, so the only case we'd use it as-is is if
+    * we're doing plain old texturing.  Mesa IR optimization should
+    * handle cleaning up our mess in that case.
+    */
+   coord = get_temp(glsl_type::vec4_type);
+   coord_dst = dst_reg(coord);
+   emit(ir, OPCODE_MOV, coord_dst, this->result);
+
+   if (ir->projector) {
+      ir->projector->accept(this);
+      projector = this->result;
+   }
+
+   /* Storage for our result.  Ideally for an assignment we'd be using
+    * the actual storage for the result here, instead.
+    */
+   result_src = get_temp(glsl_type::vec4_type);
+   result_dst = dst_reg(result_src);
+
+   switch (ir->op) {
+   case ir_tex:
+   case ir_txs:
+      opcode = OPCODE_TEX;
+      break;
+   case ir_txb:
+      opcode = OPCODE_TXB;
+      ir->lod_info.bias->accept(this);
+      lod_info = this->result;
+      break;
+   case ir_txf:
+      /* Pretend to be TXL so the sampler, coordinate, and lod are available */
+   case ir_txl:
+      opcode = OPCODE_TXL;
+      ir->lod_info.lod->accept(this);
+      lod_info = this->result;
+      break;
+   case ir_txd:
+      opcode = OPCODE_TXD;
+      ir->lod_info.grad.dPdx->accept(this);
+      dx = this->result;
+      ir->lod_info.grad.dPdy->accept(this);
+      dy = this->result;
+      break;
+   case ir_txf_ms:
+      assert(!"Unexpected ir_txf_ms opcode");
+      break;
+   case ir_lod:
+      assert(!"Unexpected ir_lod opcode");
+      break;
+   case ir_tg4:
+      assert(!"Unexpected ir_tg4 opcode");
+      break;
+   case ir_query_levels:
+      assert(!"Unexpected ir_query_levels opcode");
+      break;
+   }
+
+   const glsl_type *sampler_type = ir->sampler->type;
+
+   if (ir->projector) {
+      if (opcode == OPCODE_TEX) {
+	 /* Slot the projector in as the last component of the coord. */
+	 coord_dst.writemask = WRITEMASK_W;
+	 emit(ir, OPCODE_MOV, coord_dst, projector);
+	 coord_dst.writemask = WRITEMASK_XYZW;
+	 opcode = OPCODE_TXP;
+      } else {
+	 src_reg coord_w = coord;
+	 coord_w.swizzle = SWIZZLE_WWWW;
+
+	 /* For the other TEX opcodes there's no projective version
+	  * since the last slot is taken up by lod info.  Do the
+	  * projective divide now.
+	  */
+	 coord_dst.writemask = WRITEMASK_W;
+	 emit(ir, OPCODE_RCP, coord_dst, projector);
+
+	 /* In the case where we have to project the coordinates "by hand,"
+	  * the shadow comparitor value must also be projected.
+	  */
+	 src_reg tmp_src = coord;
+	 if (ir->shadow_comparitor) {
+	    /* Slot the shadow value in as the second to last component of the
+	     * coord.
+	     */
+	    ir->shadow_comparitor->accept(this);
+
+	    tmp_src = get_temp(glsl_type::vec4_type);
+	    dst_reg tmp_dst = dst_reg(tmp_src);
+
+	    /* Projective division not allowed for array samplers. */
+	    assert(!sampler_type->sampler_array);
+
+	    tmp_dst.writemask = WRITEMASK_Z;
+	    emit(ir, OPCODE_MOV, tmp_dst, this->result);
+
+	    tmp_dst.writemask = WRITEMASK_XY;
+	    emit(ir, OPCODE_MOV, tmp_dst, coord);
+	 }
+
+	 coord_dst.writemask = WRITEMASK_XYZ;
+	 emit(ir, OPCODE_MUL, coord_dst, tmp_src, coord_w);
+
+	 coord_dst.writemask = WRITEMASK_XYZW;
+	 coord.swizzle = SWIZZLE_XYZW;
+      }
+   }
+
+   /* If projection is done and the opcode is not OPCODE_TXP, then the shadow
+    * comparitor was put in the correct place (and projected) by the code
+    * above that handles by-hand projection.
+    */
+   if (ir->shadow_comparitor && (!ir->projector || opcode == OPCODE_TXP)) {
+      /* Slot the shadow value in as the second to last component of the
+       * coord.
+       */
+      ir->shadow_comparitor->accept(this);
+
+      /* XXX This will need to be updated for cubemap array samplers. */
+      if (sampler_type->sampler_dimensionality == GLSL_SAMPLER_DIM_2D &&
+          sampler_type->sampler_array) {
+         coord_dst.writemask = WRITEMASK_W;
+      } else {
+         coord_dst.writemask = WRITEMASK_Z;
+      }
+
+      emit(ir, OPCODE_MOV, coord_dst, this->result);
+      coord_dst.writemask = WRITEMASK_XYZW;
+   }
+
+   if (opcode == OPCODE_TXL || opcode == OPCODE_TXB) {
+      /* Mesa IR stores lod or lod bias in the last channel of the coords. */
+      coord_dst.writemask = WRITEMASK_W;
+      emit(ir, OPCODE_MOV, coord_dst, lod_info);
+      coord_dst.writemask = WRITEMASK_XYZW;
+   }
+
+   if (opcode == OPCODE_TXD)
+      inst = emit(ir, opcode, result_dst, coord, dx, dy);
+   else
+      inst = emit(ir, opcode, result_dst, coord);
+
+   if (ir->shadow_comparitor)
+      inst->tex_shadow = GL_TRUE;
+
+   inst->sampler = _mesa_get_sampler_uniform_value(ir->sampler,
+						   this->shader_program,
+						   this->prog);
+
+   switch (sampler_type->sampler_dimensionality) {
+   case GLSL_SAMPLER_DIM_1D:
+      inst->tex_target = (sampler_type->sampler_array)
+	 ? TEXTURE_1D_ARRAY_INDEX : TEXTURE_1D_INDEX;
+      break;
+   case GLSL_SAMPLER_DIM_2D:
+      inst->tex_target = (sampler_type->sampler_array)
+	 ? TEXTURE_2D_ARRAY_INDEX : TEXTURE_2D_INDEX;
+      break;
+   case GLSL_SAMPLER_DIM_3D:
+      inst->tex_target = TEXTURE_3D_INDEX;
+      break;
+   case GLSL_SAMPLER_DIM_CUBE:
+      inst->tex_target = TEXTURE_CUBE_INDEX;
+      break;
+   case GLSL_SAMPLER_DIM_RECT:
+      inst->tex_target = TEXTURE_RECT_INDEX;
+      break;
+   case GLSL_SAMPLER_DIM_BUF:
+      assert(!"FINISHME: Implement ARB_texture_buffer_object");
+      break;
+   case GLSL_SAMPLER_DIM_EXTERNAL:
+      inst->tex_target = TEXTURE_EXTERNAL_INDEX;
+      break;
+   default:
+      assert(!"Should not get here.");
+   }
+
+   this->result = result_src;
+}
+
+void
+ir_to_mesa_visitor::visit(ir_return *ir)
+{
+   /* Non-void functions should have been inlined.  We may still emit RETs
+    * from main() unless the EmitNoMainReturn option is set.
+    */
+   assert(!ir->get_value());
+   emit(ir, OPCODE_RET);
+}
+
+void
+ir_to_mesa_visitor::visit(ir_discard *ir)
+{
+   if (ir->condition) {
+      ir->condition->accept(this);
+      this->result.negate = ~this->result.negate;
+      emit(ir, OPCODE_KIL, undef_dst, this->result);
+   } else {
+      emit(ir, OPCODE_KIL_NV);
+   }
+}
+
+void
+ir_to_mesa_visitor::visit(ir_if *ir)
+{
+   ir_to_mesa_instruction *cond_inst, *if_inst;
+   ir_to_mesa_instruction *prev_inst;
+
+   prev_inst = (ir_to_mesa_instruction *)this->instructions.get_tail();
+
+   ir->condition->accept(this);
+   assert(this->result.file != PROGRAM_UNDEFINED);
+
+   if (this->options->EmitCondCodes) {
+      cond_inst = (ir_to_mesa_instruction *)this->instructions.get_tail();
+
+      /* See if we actually generated any instruction for generating
+       * the condition.  If not, then cook up a move to a temp so we
+       * have something to set cond_update on.
+       */
+      if (cond_inst == prev_inst) {
+	 src_reg temp = get_temp(glsl_type::bool_type);
+	 cond_inst = emit(ir->condition, OPCODE_MOV, dst_reg(temp), result);
+      }
+      cond_inst->cond_update = GL_TRUE;
+
+      if_inst = emit(ir->condition, OPCODE_IF);
+      if_inst->dst.cond_mask = COND_NE;
+   } else {
+      if_inst = emit(ir->condition, OPCODE_IF, undef_dst, this->result);
+   }
+
+   this->instructions.push_tail(if_inst);
+
+   visit_exec_list(&ir->then_instructions, this);
+
+   if (!ir->else_instructions.is_empty()) {
+      emit(ir->condition, OPCODE_ELSE);
+      visit_exec_list(&ir->else_instructions, this);
+   }
+
+   emit(ir->condition, OPCODE_ENDIF);
+}
+
+void
+ir_to_mesa_visitor::visit(ir_emit_vertex *)
+{
+   assert(!"Geometry shaders not supported.");
+}
+
+void
+ir_to_mesa_visitor::visit(ir_end_primitive *)
+{
+   assert(!"Geometry shaders not supported.");
+}
+
+ir_to_mesa_visitor::ir_to_mesa_visitor()
+{
+   result.file = PROGRAM_UNDEFINED;
+   next_temp = 1;
+   next_signature_id = 1;
+   current_function = NULL;
+   mem_ctx = ralloc_context(NULL);
+}
+
+ir_to_mesa_visitor::~ir_to_mesa_visitor()
+{
+   ralloc_free(mem_ctx);
+}
+
+static struct prog_src_register
+mesa_src_reg_from_ir_src_reg(src_reg reg)
+{
+   struct prog_src_register mesa_reg;
+
+   mesa_reg.File = reg.file;
+   assert(reg.index < (1 << INST_INDEX_BITS));
+   mesa_reg.Index = reg.index;
+   mesa_reg.Swizzle = reg.swizzle;
+   mesa_reg.RelAddr = reg.reladdr != NULL;
+   mesa_reg.Negate = reg.negate;
+   mesa_reg.Abs = 0;
+   mesa_reg.HasIndex2 = GL_FALSE;
+   mesa_reg.RelAddr2 = 0;
+   mesa_reg.Index2 = 0;
+
+   return mesa_reg;
+}
+
+static void
+set_branchtargets(ir_to_mesa_visitor *v,
+		  struct prog_instruction *mesa_instructions,
+		  int num_instructions)
+{
+   int if_count = 0, loop_count = 0;
+   int *if_stack, *loop_stack;
+   int if_stack_pos = 0, loop_stack_pos = 0;
+   int i, j;
+
+   for (i = 0; i < num_instructions; i++) {
+      switch (mesa_instructions[i].Opcode) {
+      case OPCODE_IF:
+	 if_count++;
+	 break;
+      case OPCODE_BGNLOOP:
+	 loop_count++;
+	 break;
+      case OPCODE_BRK:
+      case OPCODE_CONT:
+	 mesa_instructions[i].BranchTarget = -1;
+	 break;
+      default:
+	 break;
+      }
+   }
+
+   if_stack = rzalloc_array(v->mem_ctx, int, if_count);
+   loop_stack = rzalloc_array(v->mem_ctx, int, loop_count);
+
+   for (i = 0; i < num_instructions; i++) {
+      switch (mesa_instructions[i].Opcode) {
+      case OPCODE_IF:
+	 if_stack[if_stack_pos] = i;
+	 if_stack_pos++;
+	 break;
+      case OPCODE_ELSE:
+	 mesa_instructions[if_stack[if_stack_pos - 1]].BranchTarget = i;
+	 if_stack[if_stack_pos - 1] = i;
+	 break;
+      case OPCODE_ENDIF:
+	 mesa_instructions[if_stack[if_stack_pos - 1]].BranchTarget = i;
+	 if_stack_pos--;
+	 break;
+      case OPCODE_BGNLOOP:
+	 loop_stack[loop_stack_pos] = i;
+	 loop_stack_pos++;
+	 break;
+      case OPCODE_ENDLOOP:
+	 loop_stack_pos--;
	 /* Rewrite any breaks/conts at this nesting level (that haven't
+	  * already had a BranchTarget assigned) to point to the end
+	  * of the loop.
+	  */
+	 for (j = loop_stack[loop_stack_pos]; j < i; j++) {
+	    if (mesa_instructions[j].Opcode == OPCODE_BRK ||
+		mesa_instructions[j].Opcode == OPCODE_CONT) {
+	       if (mesa_instructions[j].BranchTarget == -1) {
+		  mesa_instructions[j].BranchTarget = i;
+	       }
+	    }
+	 }
+	 /* The loop ends point at each other. */
+	 mesa_instructions[i].BranchTarget = loop_stack[loop_stack_pos];
+	 mesa_instructions[loop_stack[loop_stack_pos]].BranchTarget = i;
+	 break;
+      case OPCODE_CAL:
+	 foreach_list(n, &v->function_signatures) {
+	    function_entry *entry = (function_entry *) n;
+
+	    if (entry->sig_id == mesa_instructions[i].BranchTarget) {
+	       mesa_instructions[i].BranchTarget = entry->inst;
+	       break;
+	    }
+	 }
+	 break;
+      default:
+	 break;
+      }
+   }
+}
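+
+/* Illustrative trace of the wiring above (an added note, not part of the
+ * original code), for "0:IF 1:... 2:ELSE 3:... 4:ENDIF": the IF at 0
+ * gets BranchTarget = 2 (its ELSE) and the ELSE at 2 gets
+ * BranchTarget = 4 (the ENDIF).  For "5:BGNLOOP 6:BRK 7:ENDLOOP", the
+ * targets end up pointing across the loop: BRK -> 7, BGNLOOP -> 7, and
+ * ENDLOOP -> 5.
+ */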
+
+static void
+print_program(struct prog_instruction *mesa_instructions,
+	      ir_instruction **mesa_instruction_annotation,
+	      int num_instructions)
+{
+   ir_instruction *last_ir = NULL;
+   int i;
+   int indent = 0;
+
+   for (i = 0; i < num_instructions; i++) {
+      struct prog_instruction *mesa_inst = mesa_instructions + i;
+      ir_instruction *ir = mesa_instruction_annotation[i];
+
+      fprintf(stdout, "%3d: ", i);
+
+      if (last_ir != ir && ir) {
+	 int j;
+
+	 for (j = 0; j < indent; j++) {
+	    fprintf(stdout, " ");
+	 }
+	 ir->print();
+	 printf("\n");
+	 last_ir = ir;
+
+	 fprintf(stdout, "     "); /* line number spacing. */
+      }
+
+      indent = _mesa_fprint_instruction_opt(stdout, mesa_inst, indent,
+					    PROG_PRINT_DEBUG, NULL);
+   }
+}
+
+namespace {
+
+class add_uniform_to_shader : public program_resource_visitor {
+public:
+   add_uniform_to_shader(struct gl_shader_program *shader_program,
+			 struct gl_program_parameter_list *params,
+                         gl_shader_stage shader_type)
+      : shader_program(shader_program), params(params), idx(-1),
+        shader_type(shader_type)
+   {
+      /* empty */
+   }
+
+   void process(ir_variable *var)
+   {
+      this->idx = -1;
+      this->program_resource_visitor::process(var);
+
+      var->data.location = this->idx;
+   }
+
+private:
+   virtual void visit_field(const glsl_type *type, const char *name,
+                            bool row_major);
+
+   struct gl_shader_program *shader_program;
+   struct gl_program_parameter_list *params;
+   int idx;
+   gl_shader_stage shader_type;
+};
+
+} /* anonymous namespace */
+
+void
+add_uniform_to_shader::visit_field(const glsl_type *type, const char *name,
+                                   bool row_major)
+{
+   unsigned int size;
+
+   (void) row_major;
+
+   if (type->is_vector() || type->is_scalar()) {
+      size = type->vector_elements;
+   } else {
+      size = type_size(type) * 4;
+   }
+
+   gl_register_file file;
+   if (type->is_sampler() ||
+       (type->is_array() && type->fields.array->is_sampler())) {
+      file = PROGRAM_SAMPLER;
+   } else {
+      file = PROGRAM_UNIFORM;
+   }
+
+   int index = _mesa_lookup_parameter_index(params, -1, name);
+   if (index < 0) {
+      index = _mesa_add_parameter(params, file, name, size, type->gl_type,
+				  NULL, NULL);
+
+      /* Sampler uniform values are stored in prog->SamplerUnits,
+       * and the entry in that array is selected by this index we
+       * store in ParameterValues[].
+       */
+      if (file == PROGRAM_SAMPLER) {
+	 unsigned location;
+	 const bool found =
+	    this->shader_program->UniformHash->get(location,
+						   params->Parameters[index].Name);
+	 assert(found);
+
+	 if (!found)
+	    return;
+
+	 struct gl_uniform_storage *storage =
+	    &this->shader_program->UniformStorage[location];
+
+         assert(storage->sampler[shader_type].active);
+
+	 for (unsigned int j = 0; j < size / 4; j++)
+            params->ParameterValues[index + j][0].f =
+               storage->sampler[shader_type].index + j;
+      }
+   }
+
+   /* The first part of the uniform that's processed determines the base
+    * location of the whole uniform (for structures).
+    */
+   if (this->idx < 0)
+      this->idx = index;
+}
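+
+/* Illustrative sizes for the visit_field() logic above (an added note,
+ * not part of the original code): a vec3 uniform gets size = 3 (its
+ * component count), while a mat3 gets size = type_size(mat3) * 4 = 12,
+ * since type_size() counts whole vec4 registers (one per column) and
+ * the parameter list stores four floats per register.
+ */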
+
+/**
+ * Generate the program parameters list for the user uniforms in a shader
+ *
+ * \param shader_program Linked shader program.  This is only used to
+ *                       emit possible link errors to the info log.
+ * \param sh             Shader whose uniforms are to be processed.
+ * \param params         Parameter list to be filled in.
+ */
+void
+_mesa_generate_parameters_list_for_uniforms(struct gl_shader_program
+					    *shader_program,
+					    struct gl_shader *sh,
+					    struct gl_program_parameter_list
+					    *params)
+{
+   add_uniform_to_shader add(shader_program, params, sh->Stage);
+
+   foreach_list(node, sh->ir) {
+      ir_variable *var = ((ir_instruction *) node)->as_variable();
+
+      if ((var == NULL) || (var->data.mode != ir_var_uniform)
+	  || var->is_in_uniform_block() || (strncmp(var->name, "gl_", 3) == 0))
+	 continue;
+
+      add.process(var);
+   }
+}
+
+void
+_mesa_associate_uniform_storage(struct gl_context *ctx,
+				struct gl_shader_program *shader_program,
+				struct gl_program_parameter_list *params)
+{
+   /* After adding each uniform to the parameter list, connect the storage for
+    * the parameter with the tracking structure used by the API for the
+    * uniform.
+    */
+   unsigned last_location = unsigned(~0);
+   for (unsigned i = 0; i < params->NumParameters; i++) {
+      if (params->Parameters[i].Type != PROGRAM_UNIFORM)
+	 continue;
+
+      unsigned location;
+      const bool found =
+	 shader_program->UniformHash->get(location, params->Parameters[i].Name);
+      assert(found);
+
+      if (!found)
+	 continue;
+
+      if (location != last_location) {
+	 struct gl_uniform_storage *storage =
+	    &shader_program->UniformStorage[location];
+	 enum gl_uniform_driver_format format = uniform_native;
+
+	 unsigned columns = 0;
+	 switch (storage->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    assert(ctx->Const.NativeIntegers);
+	    format = uniform_native;
+	    columns = 1;
+	    break;
+	 case GLSL_TYPE_INT:
+	    format =
+	       (ctx->Const.NativeIntegers) ? uniform_native : uniform_int_float;
+	    columns = 1;
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    format = uniform_native;
+	    columns = storage->type->matrix_columns;
+	    break;
+	 case GLSL_TYPE_BOOL:
+	    if (ctx->Const.NativeIntegers) {
+	       format = (ctx->Const.UniformBooleanTrue == 1)
+		  ? uniform_bool_int_0_1 : uniform_bool_int_0_not0;
+	    } else {
+	       format = uniform_bool_float;
+	    }
+	    columns = 1;
+	    break;
+	 case GLSL_TYPE_SAMPLER:
+	 case GLSL_TYPE_IMAGE:
+	    format = uniform_native;
+	    columns = 1;
+	    break;
+         case GLSL_TYPE_ATOMIC_UINT:
+         case GLSL_TYPE_ARRAY:
+         case GLSL_TYPE_VOID:
+         case GLSL_TYPE_STRUCT:
+         case GLSL_TYPE_ERROR:
+         case GLSL_TYPE_INTERFACE:
+	    assert(!"Should not get here.");
+	    break;
+	 }
+
+	 _mesa_uniform_attach_driver_storage(storage,
+					     4 * sizeof(float) * columns,
+					     4 * sizeof(float),
+					     format,
+					     &params->ParameterValues[i]);
+
+	 /* After attaching the driver's storage to the uniform, propagate any
+	  * data from the linker's backing store.  This will cause values from
+	  * initializers in the source code to be copied over.
+	  */
+	 _mesa_propagate_uniforms_to_driver_storage(storage,
+						    0,
+						    MAX2(1, storage->array_elements));
+
+	 last_location = location;
+      }
+   }
+}
+
+/*
+ * On a basic block basis, tracks available PROGRAM_TEMPORARY register
+ * channels for copy propagation and updates following instructions to
+ * use the original versions.
+ *
+ * The ir_to_mesa_visitor lazily produces code assuming that this pass
+ * will occur.  As an example, a TXP production before this pass:
+ *
+ * 0: MOV TEMP[1], INPUT[4].xyyy;
+ * 1: MOV TEMP[1].w, INPUT[4].wwww;
+ * 2: TXP TEMP[2], TEMP[1], texture[0], 2D;
+ *
+ * and after:
+ *
+ * 0: MOV TEMP[1], INPUT[4].xyyy;
+ * 1: MOV TEMP[1].w, INPUT[4].wwww;
+ * 2: TXP TEMP[2], INPUT[4].xyyw, texture[0], 2D;
+ *
+ * which allows for dead code elimination on TEMP[1]'s writes.
+ */
+void
+ir_to_mesa_visitor::copy_propagate(void)
+{
+   ir_to_mesa_instruction **acp = rzalloc_array(mem_ctx,
+						    ir_to_mesa_instruction *,
+						    this->next_temp * 4);
+   int *acp_level = rzalloc_array(mem_ctx, int, this->next_temp * 4);
+   int level = 0;
+
+   foreach_list(node, &this->instructions) {
+      ir_to_mesa_instruction *inst = (ir_to_mesa_instruction *) node;
+
+      assert(inst->dst.file != PROGRAM_TEMPORARY
+	     || inst->dst.index < this->next_temp);
+
+      /* First, do any copy propagation possible into the src regs. */
+      for (int r = 0; r < 3; r++) {
+	 ir_to_mesa_instruction *first = NULL;
+	 bool good = true;
+	 int acp_base = inst->src[r].index * 4;
+
+	 if (inst->src[r].file != PROGRAM_TEMPORARY ||
+	     inst->src[r].reladdr)
+	    continue;
+
+	 /* See if we can find entries in the ACP consisting of MOVs
+	  * from the same src register for all the swizzled channels
+	  * of this src register reference.
+	  */
+	 for (int i = 0; i < 4; i++) {
+	    int src_chan = GET_SWZ(inst->src[r].swizzle, i);
+	    ir_to_mesa_instruction *copy_chan = acp[acp_base + src_chan];
+
+	    if (!copy_chan) {
+	       good = false;
+	       break;
+	    }
+
+	    assert(acp_level[acp_base + src_chan] <= level);
+
+	    if (!first) {
+	       first = copy_chan;
+	    } else {
+	       if (first->src[0].file != copy_chan->src[0].file ||
+		   first->src[0].index != copy_chan->src[0].index) {
+		  good = false;
+		  break;
+	       }
+	    }
+	 }
+
+	 if (good) {
+	    /* We've now validated that we can copy-propagate to
+	     * replace this src register reference.  Do it.
+	     */
+	    inst->src[r].file = first->src[0].file;
+	    inst->src[r].index = first->src[0].index;
+
+	    int swizzle = 0;
+	    for (int i = 0; i < 4; i++) {
+	       int src_chan = GET_SWZ(inst->src[r].swizzle, i);
+	       ir_to_mesa_instruction *copy_inst = acp[acp_base + src_chan];
+	       swizzle |= (GET_SWZ(copy_inst->src[0].swizzle, src_chan) <<
+			   (3 * i));
+	    }
+	    inst->src[r].swizzle = swizzle;
+	 }
+      }
+
+      switch (inst->op) {
+      case OPCODE_BGNLOOP:
+      case OPCODE_ENDLOOP:
+	 /* End of a basic block, clear the ACP entirely. */
+	 memset(acp, 0, sizeof(*acp) * this->next_temp * 4);
+	 break;
+
+      case OPCODE_IF:
+	 ++level;
+	 break;
+
+      case OPCODE_ENDIF:
+      case OPCODE_ELSE:
	 /* Clear all channels written inside the block from the ACP,
+	  * leaving those that were not touched.
+	  */
+	 for (int r = 0; r < this->next_temp; r++) {
+	    for (int c = 0; c < 4; c++) {
+	       if (!acp[4 * r + c])
+		  continue;
+
+	       if (acp_level[4 * r + c] >= level)
+		  acp[4 * r + c] = NULL;
+	    }
+	 }
+	 if (inst->op == OPCODE_ENDIF)
+	    --level;
+	 break;
+
+      default:
+	 /* Continuing the block, clear any written channels from
+	  * the ACP.
+	  */
+	 if (inst->dst.file == PROGRAM_TEMPORARY && inst->dst.reladdr) {
+	    /* Any temporary might be written, so no copy propagation
+	     * across this instruction.
+	     */
+	    memset(acp, 0, sizeof(*acp) * this->next_temp * 4);
+	 } else if (inst->dst.file == PROGRAM_OUTPUT &&
+		    inst->dst.reladdr) {
+	    /* Any output might be written, so no copy propagation
+	     * from outputs across this instruction.
+	     */
+	    for (int r = 0; r < this->next_temp; r++) {
+	       for (int c = 0; c < 4; c++) {
+		  if (!acp[4 * r + c])
+		     continue;
+
+		  if (acp[4 * r + c]->src[0].file == PROGRAM_OUTPUT)
+		     acp[4 * r + c] = NULL;
+	       }
+	    }
+	 } else if (inst->dst.file == PROGRAM_TEMPORARY ||
+		    inst->dst.file == PROGRAM_OUTPUT) {
+	    /* Clear where it's used as dst. */
+	    if (inst->dst.file == PROGRAM_TEMPORARY) {
+	       for (int c = 0; c < 4; c++) {
+		  if (inst->dst.writemask & (1 << c)) {
+		     acp[4 * inst->dst.index + c] = NULL;
+		  }
+	       }
+	    }
+
+	    /* Clear where it's used as src. */
+	    for (int r = 0; r < this->next_temp; r++) {
+	       for (int c = 0; c < 4; c++) {
+		  if (!acp[4 * r + c])
+		     continue;
+
+		  int src_chan = GET_SWZ(acp[4 * r + c]->src[0].swizzle, c);
+
+		  if (acp[4 * r + c]->src[0].file == inst->dst.file &&
+		      acp[4 * r + c]->src[0].index == inst->dst.index &&
+		      inst->dst.writemask & (1 << src_chan))
+		  {
+		     acp[4 * r + c] = NULL;
+		  }
+	       }
+	    }
+	 }
+	 break;
+      }
+
+      /* If this is a copy, add it to the ACP. */
+      if (inst->op == OPCODE_MOV &&
+	  inst->dst.file == PROGRAM_TEMPORARY &&
+	  !(inst->dst.file == inst->src[0].file &&
+	    inst->dst.index == inst->src[0].index) &&
+	  !inst->dst.reladdr &&
+	  !inst->saturate &&
+	  !inst->src[0].reladdr &&
+	  !inst->src[0].negate) {
+	 for (int i = 0; i < 4; i++) {
+	    if (inst->dst.writemask & (1 << i)) {
+	       acp[4 * inst->dst.index + i] = inst;
+	       acp_level[4 * inst->dst.index + i] = level;
+	    }
+	 }
+      }
+   }
+
+   ralloc_free(acp_level);
+   ralloc_free(acp);
+}
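+
+/* Note on the ACP bookkeeping above (an added note, not part of the
+ * original code): the acp array has one slot per temporary channel,
+ * indexed as 4 * reg + chan, and each non-NULL slot points at the MOV
+ * that last wrote that channel.  For the TXP example in the header
+ * comment, after instruction 1 the entries for TEMP[1].xyzw all point
+ * at MOVs whose source is INPUT[4], so the TXP's read of TEMP[1] is
+ * rewritten to INPUT[4].xyyw.
+ */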
+
+
+/**
+ * Convert a shader's GLSL IR into a Mesa gl_program.
+ */
+struct gl_program *
+_mesa_ir_get_program(struct gl_context *ctx,
+                     struct gl_shader_program *shader_program,
+                     struct gl_shader *shader)
+{
+   ir_to_mesa_visitor v;
+   struct prog_instruction *mesa_instructions, *mesa_inst;
+   ir_instruction **mesa_instruction_annotation;
+   int i;
+   struct gl_program *prog;
+   GLenum target = _mesa_shader_stage_to_program(shader->Stage);
+   const char *target_string = _mesa_shader_stage_to_string(shader->Stage);
+   struct gl_shader_compiler_options *options =
+         &ctx->ShaderCompilerOptions[shader->Stage];
+
+   validate_ir_tree(shader->ir);
+
+   prog = ctx->Driver.NewProgram(ctx, target, shader_program->Name);
+   if (!prog)
+      return NULL;
+   prog->Parameters = _mesa_new_parameter_list();
+   v.ctx = ctx;
+   v.prog = prog;
+   v.shader_program = shader_program;
+   v.options = options;
+
+   _mesa_generate_parameters_list_for_uniforms(shader_program, shader,
+					       prog->Parameters);
+
+   /* Emit Mesa IR for main(). */
+   visit_exec_list(shader->ir, &v);
+   v.emit(NULL, OPCODE_END);
+
+   prog->NumTemporaries = v.next_temp;
+
+   int num_instructions = 0;
+   foreach_list(node, &v.instructions) {
+      num_instructions++;
+   }
+
+   mesa_instructions =
+      (struct prog_instruction *)calloc(num_instructions,
+					sizeof(*mesa_instructions));
+   mesa_instruction_annotation = ralloc_array(v.mem_ctx, ir_instruction *,
+					      num_instructions);
+
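+   /* Run copy propagation on the ir_to_mesa_instruction list before it
+    * is flattened into prog_instructions below.
+    */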
+   v.copy_propagate();
+
+   /* Convert ir_to_mesa_instructions into prog_instructions.
+    */
+   mesa_inst = mesa_instructions;
+   i = 0;
+   foreach_list(node, &v.instructions) {
+      const ir_to_mesa_instruction *inst = (ir_to_mesa_instruction *) node;
+
+      mesa_inst->Opcode = inst->op;
+      mesa_inst->CondUpdate = inst->cond_update;
+      if (inst->saturate)
+	 mesa_inst->SaturateMode = SATURATE_ZERO_ONE;
+      mesa_inst->DstReg.File = inst->dst.file;
+      mesa_inst->DstReg.Index = inst->dst.index;
+      mesa_inst->DstReg.CondMask = inst->dst.cond_mask;
+      mesa_inst->DstReg.WriteMask = inst->dst.writemask;
+      mesa_inst->DstReg.RelAddr = inst->dst.reladdr != NULL;
+      mesa_inst->SrcReg[0] = mesa_src_reg_from_ir_src_reg(inst->src[0]);
+      mesa_inst->SrcReg[1] = mesa_src_reg_from_ir_src_reg(inst->src[1]);
+      mesa_inst->SrcReg[2] = mesa_src_reg_from_ir_src_reg(inst->src[2]);
+      mesa_inst->TexSrcUnit = inst->sampler;
+      mesa_inst->TexSrcTarget = inst->tex_target;
+      mesa_inst->TexShadow = inst->tex_shadow;
+      mesa_instruction_annotation[i] = inst->ir;
+
+      /* Update the program's bitmask of indirectly accessed register files. */
+      if (mesa_inst->DstReg.RelAddr)
+         prog->IndirectRegisterFiles |= 1 << mesa_inst->DstReg.File;
+
+      /* Do the same for each source register. */
+      for (unsigned src = 0; src < 3; src++)
+         if (mesa_inst->SrcReg[src].RelAddr)
+            prog->IndirectRegisterFiles |= 1 << mesa_inst->SrcReg[src].File;
+
+      switch (mesa_inst->Opcode) {
+      case OPCODE_IF:
+	 if (options->MaxIfDepth == 0) {
+	    linker_warning(shader_program,
+			   "Couldn't flatten if-statement.  "
+			   "This will likely result in software "
+			   "rasterization.\n");
+	 }
+	 break;
+      case OPCODE_BGNLOOP:
+	 if (options->EmitNoLoops) {
+	    linker_warning(shader_program,
+			   "Couldn't unroll loop.  "
+			   "This will likely result in software "
+			   "rasterization.\n");
+	 }
+	 break;
+      case OPCODE_CONT:
+	 if (options->EmitNoCont) {
+	    linker_warning(shader_program,
+			   "Couldn't lower continue-statement.  "
+			   "This will likely result in software "
+			   "rasterization.\n");
+	 }
+	 break;
+      case OPCODE_ARL:
+	 prog->NumAddressRegs = 1;
+	 break;
+      default:
+	 break;
+      }
+
+      mesa_inst++;
+      i++;
+
+      if (!shader_program->LinkStatus)
+         break;
+   }
+
+   if (!shader_program->LinkStatus) {
+      goto fail_exit;
+   }
+
+   set_branchtargets(&v, mesa_instructions, num_instructions);
+
+   if (ctx->GlslFlags & GLSL_DUMP) {
+      fprintf(stderr, "\n");
+      fprintf(stderr, "GLSL IR for linked %s program %d:\n", target_string,
+	      shader_program->Name);
+      _mesa_print_ir(stderr, shader->ir, NULL);
+      fprintf(stderr, "\n");
+      fprintf(stderr, "\n");
+      fprintf(stderr, "Mesa IR for linked %s program %d:\n", target_string,
+	      shader_program->Name);
+      print_program(mesa_instructions, mesa_instruction_annotation,
+		    num_instructions);
+      fflush(stderr);
+   }
+
+   prog->Instructions = mesa_instructions;
+   prog->NumInstructions = num_instructions;
+
+   /* Setting this to NULL prevents a possible double free in the fail_exit
+    * path (far below).
+    */
+   mesa_instructions = NULL;
+
+   do_set_program_inouts(shader->ir, prog, shader->Stage);
+
+   prog->SamplersUsed = shader->active_samplers;
+   prog->ShadowSamplers = shader->shadow_samplers;
+   _mesa_update_shader_textures_used(shader_program, prog);
+
+   /* Set the gl_FragDepth layout. */
+   if (target == GL_FRAGMENT_PROGRAM_ARB) {
+      struct gl_fragment_program *fp = (struct gl_fragment_program *)prog;
+      fp->FragDepthLayout = shader_program->FragDepthLayout;
+   }
+
+   _mesa_reference_program(ctx, &shader->Program, prog);
+
+   if ((ctx->GlslFlags & GLSL_NO_OPT) == 0) {
+      _mesa_optimize_program(ctx, prog);
+   }
+
+   /* This has to be done last.  Any operation that can cause
+    * prog->ParameterValues to get reallocated (e.g., anything that adds a
+    * program constant) has to happen before creating this linkage.
+    */
+   _mesa_associate_uniform_storage(ctx, shader_program, prog->Parameters);
+   if (!shader_program->LinkStatus) {
+      goto fail_exit;
+   }
+
+   return prog;
+
+fail_exit:
+   free(mesa_instructions);
+   _mesa_reference_program(ctx, &shader->Program, NULL);
+   return NULL;
+}
+
+extern "C" {
+
+/**
+ * Link a shader.
+ * Called via ctx->Driver.LinkShader().
+ * This converts the GLSL IR into Mesa gl_programs, applying code
+ * lowering and other optimizations along the way.
+ */
+GLboolean
+_mesa_ir_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
+{
+   assert(prog->LinkStatus);
+
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      if (prog->_LinkedShaders[i] == NULL)
+	 continue;
+
+      bool progress;
+      exec_list *ir = prog->_LinkedShaders[i]->ir;
+      const struct gl_shader_compiler_options *options =
+            &ctx->ShaderCompilerOptions[prog->_LinkedShaders[i]->Stage];
+
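+      /* Iterate the lowering and optimization passes until they reach a
+       * fixed point, i.e. a full pass over the IR makes no further
+       * progress.
+       */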
+      do {
+	 progress = false;
+
+	 /* Lowering */
+	 do_mat_op_to_vec(ir);
+	 lower_instructions(ir, (MOD_TO_FRACT | DIV_TO_MUL_RCP | EXP_TO_EXP2
+				 | LOG_TO_LOG2 | INT_DIV_TO_MUL_RCP
+				 | ((options->EmitNoPow) ? POW_TO_EXP2 : 0)));
+
+	 progress = do_lower_jumps(ir, true, true, options->EmitNoMainReturn,
+				   options->EmitNoCont, options->EmitNoLoops)
+	   || progress;
+
+	 progress = do_common_optimization(ir, true, true,
+                                           options, ctx->Const.NativeIntegers)
+	   || progress;
+
+	 progress = lower_quadop_vector(ir, true) || progress;
+
+	 if (options->MaxIfDepth == 0)
+	    progress = lower_discard(ir) || progress;
+
+	 progress = lower_if_to_cond_assign(ir, options->MaxIfDepth) || progress;
+
+	 if (options->EmitNoNoise)
+	    progress = lower_noise(ir) || progress;
+
+	 /* If there are forms of indirect addressing that the driver
+	  * cannot handle, perform the lowering pass.
+	  */
+	 if (options->EmitNoIndirectInput || options->EmitNoIndirectOutput
+	     || options->EmitNoIndirectTemp || options->EmitNoIndirectUniform)
+	   progress =
+	     lower_variable_index_to_cond_assign(ir,
+						 options->EmitNoIndirectInput,
+						 options->EmitNoIndirectOutput,
+						 options->EmitNoIndirectTemp,
+						 options->EmitNoIndirectUniform)
+	     || progress;
+
+	 progress = do_vec_index_to_cond_assign(ir) || progress;
+         progress = lower_vector_insert(ir, true) || progress;
+      } while (progress);
+
+      validate_ir_tree(ir);
+   }
+
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      struct gl_program *linked_prog;
+
+      if (prog->_LinkedShaders[i] == NULL)
+	 continue;
+
+      linked_prog = _mesa_ir_get_program(ctx, prog, prog->_LinkedShaders[i]);
+
+      if (linked_prog) {
+         _mesa_copy_linked_program_data((gl_shader_stage) i, prog, linked_prog);
+
+	 _mesa_reference_program(ctx, &prog->_LinkedShaders[i]->Program,
+				 linked_prog);
+         if (!ctx->Driver.ProgramStringNotify(ctx,
+                                              _mesa_shader_stage_to_program(i),
+                                              linked_prog)) {
+            return GL_FALSE;
+         }
+      }
+
+      _mesa_reference_program(ctx, &linked_prog, NULL);
+   }
+
+   return prog->LinkStatus;
+}
+
+/**
+ * Link a GLSL shader program.  Called via glLinkProgram().
+ */
+void
+_mesa_glsl_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
+{
+   unsigned int i;
+
+   _mesa_clear_shader_program_data(ctx, prog);
+
+   prog->LinkStatus = GL_TRUE;
+
+   for (i = 0; i < prog->NumShaders; i++) {
+      if (!prog->Shaders[i]->CompileStatus) {
+	 linker_error(prog, "linking with uncompiled shader");
+      }
+   }
+
+   /* Search program disk cache if active. */
+   if (ctx->BinaryProgramCacheActive &&
+       mesa_program_diskcache_find(ctx, prog) == 0)
+      return;
+
+   if (prog->LinkStatus) {
+      link_shaders(ctx, prog);
+   }
+
+   if (prog->LinkStatus) {
+      if (!ctx->Driver.LinkShader(ctx, prog)) {
+	 prog->LinkStatus = GL_FALSE;
+      } else {
+         if (ctx->BinaryProgramCacheActive)
+            mesa_program_diskcache_cache(ctx, prog);
+         prog->_Linked = GL_TRUE;
+      }
+   }
+
+   if (ctx->GlslFlags & GLSL_DUMP) {
+      if (!prog->LinkStatus) {
+	 fprintf(stderr, "GLSL shader program %d failed to link\n", prog->Name);
+      }
+
+      if (prog->InfoLog && prog->InfoLog[0] != 0) {
+	 fprintf(stderr, "GLSL shader program %d info log:\n", prog->Name);
+	 fprintf(stderr, "%s\n", prog->InfoLog);
+      }
+   }
+}
+
+} /* extern "C" */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/ir_to_mesa.h b/icd/intel/compiler/mesa-utils/src/mesa/program/ir_to_mesa.h
new file mode 100644
index 0000000..f450555
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/ir_to_mesa.h
@@ -0,0 +1,57 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+
+#include "main/glheader.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct gl_context;
+struct gl_shader;
+struct gl_shader_program;
+
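+/* Typical call flow: glLinkProgram() reaches _mesa_glsl_link_shader(),
+ * which links the GLSL shaders and then invokes the driver's
+ * ctx->Driver.LinkShader() hook; a driver that consumes Mesa IR can
+ * install _mesa_ir_link_shader() there.
+ */
+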
+void _mesa_glsl_link_shader(struct gl_context *ctx, struct gl_shader_program *prog);
+GLboolean _mesa_ir_compile_shader(struct gl_context *ctx, struct gl_shader *shader);
+GLboolean _mesa_ir_link_shader(struct gl_context *ctx, struct gl_shader_program *prog);
+struct gl_program *
+_mesa_ir_get_program(struct gl_context *ctx,
+                     struct gl_shader_program *shader_program,
+                     struct gl_shader *shader);
+
+void
+_mesa_generate_parameters_list_for_uniforms(struct gl_shader_program
+					    *shader_program,
+					    struct gl_shader *sh,
+					    struct gl_program_parameter_list
+					    *params);
+void
+_mesa_associate_uniform_storage(struct gl_context *ctx,
+				struct gl_shader_program *shader_program,
+				struct gl_program_parameter_list *params);
+
+#ifdef __cplusplus
+}
+#endif
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/lex.yy.c b/icd/intel/compiler/mesa-utils/src/mesa/program/lex.yy.c
new file mode 100644
index 0000000..7b058d3
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/lex.yy.c
@@ -0,0 +1,3682 @@
+#line 2 "program/lex.yy.c"
+
+#line 4 "program/lex.yy.c"
+
+#define  YY_INT_ALIGNED short int
+
+/* A lexical scanner generated by flex */
+
+#define FLEX_SCANNER
+#define YY_FLEX_MAJOR_VERSION 2
+#define YY_FLEX_MINOR_VERSION 5
+#define YY_FLEX_SUBMINOR_VERSION 35
+#if YY_FLEX_SUBMINOR_VERSION > 0
+#define FLEX_BETA
+#endif
+
+/* First, we deal with platform-specific or compiler-specific issues. */
+
+/* begin standard C headers. */
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdlib.h>
+
+/* end standard C headers. */
+
+/* flex integer type definitions */
+
+#ifndef FLEXINT_H
+#define FLEXINT_H
+
+/* C99 systems have <inttypes.h>. Non-C99 systems may or may not. */
+
+#if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+
+/* C99 says to define __STDC_LIMIT_MACROS before including stdint.h,
+ * if you want the limit (max/min) macros for int types. 
+ */
+#ifndef __STDC_LIMIT_MACROS
+#define __STDC_LIMIT_MACROS 1
+#endif
+
+#include <inttypes.h>
+typedef int8_t flex_int8_t;
+typedef uint8_t flex_uint8_t;
+typedef int16_t flex_int16_t;
+typedef uint16_t flex_uint16_t;
+typedef int32_t flex_int32_t;
+typedef uint32_t flex_uint32_t;
+#else
+typedef signed char flex_int8_t;
+typedef short int flex_int16_t;
+typedef int flex_int32_t;
+typedef unsigned char flex_uint8_t; 
+typedef unsigned short int flex_uint16_t;
+typedef unsigned int flex_uint32_t;
+
+/* Limits of integral types. */
+#ifndef INT8_MIN
+#define INT8_MIN               (-128)
+#endif
+#ifndef INT16_MIN
+#define INT16_MIN              (-32767-1)
+#endif
+#ifndef INT32_MIN
+#define INT32_MIN              (-2147483647-1)
+#endif
+#ifndef INT8_MAX
+#define INT8_MAX               (127)
+#endif
+#ifndef INT16_MAX
+#define INT16_MAX              (32767)
+#endif
+#ifndef INT32_MAX
+#define INT32_MAX              (2147483647)
+#endif
+#ifndef UINT8_MAX
+#define UINT8_MAX              (255U)
+#endif
+#ifndef UINT16_MAX
+#define UINT16_MAX             (65535U)
+#endif
+#ifndef UINT32_MAX
+#define UINT32_MAX             (4294967295U)
+#endif
+
+#endif /* ! C99 */
+
+#endif /* ! FLEXINT_H */
+
+#ifdef __cplusplus
+
+/* The "const" storage-class-modifier is valid. */
+#define YY_USE_CONST
+
+#else	/* ! __cplusplus */
+
+/* C99 requires __STDC__ to be defined as 1. */
+#if defined (__STDC__)
+
+#define YY_USE_CONST
+
+#endif	/* defined (__STDC__) */
+#endif	/* ! __cplusplus */
+
+#ifdef YY_USE_CONST
+#define yyconst const
+#else
+#define yyconst
+#endif
+
+/* Returned upon end-of-file. */
+#define YY_NULL 0
+
+/* Promotes a possibly negative, possibly signed char to an unsigned
+ * integer for use as an array index.  If the signed char is negative,
+ * we want to instead treat it as an 8-bit unsigned char, hence the
+ * double cast.
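+ * E.g. with signed chars, (char)0xe9 is -23; the double cast yields
+ * 233, a valid array index.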
+ */
+#define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c)
+
+/* An opaque pointer. */
+#ifndef YY_TYPEDEF_YY_SCANNER_T
+#define YY_TYPEDEF_YY_SCANNER_T
+typedef void* yyscan_t;
+#endif
+
+/* For convenience, these vars (plus the bison vars far below)
+   are macros in the reentrant scanner. */
+#define yyin yyg->yyin_r
+#define yyout yyg->yyout_r
+#define yyextra yyg->yyextra_r
+#define yyleng yyg->yyleng_r
+#define yytext yyg->yytext_r
+#define yylineno (YY_CURRENT_BUFFER_LVALUE->yy_bs_lineno)
+#define yycolumn (YY_CURRENT_BUFFER_LVALUE->yy_bs_column)
+#define yy_flex_debug yyg->yy_flex_debug_r
+
+/* Enter a start condition.  This macro really ought to take a parameter,
+ * but we do it the disgusting crufty way forced on us by the ()-less
+ * definition of BEGIN.
+ */
+#define BEGIN yyg->yy_start = 1 + 2 *
+
+/* Translate the current start state into a value that can be later handed
+ * to BEGIN to return to the state.  The YYSTATE alias is for lex
+ * compatibility.
+ */
+#define YY_START ((yyg->yy_start - 1) / 2)
+#define YYSTATE YY_START
+
+/* Action number for EOF rule of a given start state. */
+#define YY_STATE_EOF(state) (YY_END_OF_BUFFER + state + 1)
+
+/* Special action meaning "start processing a new file". */
+#define YY_NEW_FILE _mesa_program_lexer_restart(yyin ,yyscanner )
+
+#define YY_END_OF_BUFFER_CHAR 0
+
+/* Size of default input buffer. */
+#ifndef YY_BUF_SIZE
+#ifdef __ia64__
+/* On IA-64, the buffer size is 16k, not 8k.
+ * Moreover, YY_BUF_SIZE is 2*YY_READ_BUF_SIZE in the general case.
+ * The __ia64__ case is scaled accordingly.
+ */
+#define YY_BUF_SIZE 32768
+#else
+#define YY_BUF_SIZE 16384
+#endif /* __ia64__ */
+#endif
+
+/* The state buf must be large enough to hold one state per character in the main buffer.
+ */
+#define YY_STATE_BUF_SIZE   ((YY_BUF_SIZE + 2) * sizeof(yy_state_type))
+
+#ifndef YY_TYPEDEF_YY_BUFFER_STATE
+#define YY_TYPEDEF_YY_BUFFER_STATE
+typedef struct yy_buffer_state *YY_BUFFER_STATE;
+#endif
+
+#define EOB_ACT_CONTINUE_SCAN 0
+#define EOB_ACT_END_OF_FILE 1
+#define EOB_ACT_LAST_MATCH 2
+
+    #define YY_LESS_LINENO(n)
+    
+/* Return all but the first "n" matched characters back to the input stream. */
+#define yyless(n) \
+	do \
+		{ \
+		/* Undo effects of setting up yytext. */ \
+        int yyless_macro_arg = (n); \
+        YY_LESS_LINENO(yyless_macro_arg);\
+		*yy_cp = yyg->yy_hold_char; \
+		YY_RESTORE_YY_MORE_OFFSET \
+		yyg->yy_c_buf_p = yy_cp = yy_bp + yyless_macro_arg - YY_MORE_ADJ; \
+		YY_DO_BEFORE_ACTION; /* set up yytext again */ \
+		} \
+	while ( 0 )
+
+#define unput(c) yyunput( c, yyg->yytext_ptr , yyscanner )
+
+#ifndef YY_TYPEDEF_YY_SIZE_T
+#define YY_TYPEDEF_YY_SIZE_T
+typedef size_t yy_size_t;
+#endif
+
+#ifndef YY_STRUCT_YY_BUFFER_STATE
+#define YY_STRUCT_YY_BUFFER_STATE
+struct yy_buffer_state
+	{
+	FILE *yy_input_file;
+
+	char *yy_ch_buf;		/* input buffer */
+	char *yy_buf_pos;		/* current position in input buffer */
+
+	/* Size of input buffer in bytes, not including room for EOB
+	 * characters.
+	 */
+	yy_size_t yy_buf_size;
+
+	/* Number of characters read into yy_ch_buf, not including EOB
+	 * characters.
+	 */
+	int yy_n_chars;
+
+	/* Whether we "own" the buffer - i.e., we know we created it,
+	 * and can realloc() it to grow it, and should free() it to
+	 * delete it.
+	 */
+	int yy_is_our_buffer;
+
+	/* Whether this is an "interactive" input source; if so, and
+	 * if we're using stdio for input, then we want to use getc()
+	 * instead of fread(), to make sure we stop fetching input after
+	 * each newline.
+	 */
+	int yy_is_interactive;
+
+	/* Whether we're considered to be at the beginning of a line.
+	 * If so, '^' rules will be active on the next match, otherwise
+	 * not.
+	 */
+	int yy_at_bol;
+
+    int yy_bs_lineno; /**< The line count. */
+    int yy_bs_column; /**< The column count. */
+    
+	/* Whether to try to fill the input buffer when we reach the
+	 * end of it.
+	 */
+	int yy_fill_buffer;
+
+	int yy_buffer_status;
+
+#define YY_BUFFER_NEW 0
+#define YY_BUFFER_NORMAL 1
+	/* When an EOF has been seen but there is still some text to process,
+	 * we mark the buffer as YY_EOF_PENDING to indicate that we
+	 * shouldn't try reading from the input source any more.  We might
+	 * still have a bunch of tokens to match, though, because of
+	 * possible backing-up.
+	 *
+	 * When we actually see the EOF, we change the status to "new"
+	 * (via _mesa_program_lexer_restart()), so that the user can continue scanning by
+	 * just pointing yyin at a new input file.
+	 */
+#define YY_BUFFER_EOF_PENDING 2
+
+	};
+#endif /* !YY_STRUCT_YY_BUFFER_STATE */
+
+/* We provide macros for accessing buffer states in case in the
+ * future we want to put the buffer states in a more general
+ * "scanner state".
+ *
+ * Returns the top of the stack, or NULL.
+ */
+#define YY_CURRENT_BUFFER ( yyg->yy_buffer_stack \
+                          ? yyg->yy_buffer_stack[yyg->yy_buffer_stack_top] \
+                          : NULL)
+
+/* Same as previous macro, but useful when we know that the buffer stack is not
+ * NULL or when we need an lvalue. For internal use only.
+ */
+#define YY_CURRENT_BUFFER_LVALUE yyg->yy_buffer_stack[yyg->yy_buffer_stack_top]
+
+void _mesa_program_lexer_restart (FILE *input_file ,yyscan_t yyscanner );
+void _mesa_program_lexer__switch_to_buffer (YY_BUFFER_STATE new_buffer ,yyscan_t yyscanner );
+YY_BUFFER_STATE _mesa_program_lexer__create_buffer (FILE *file,int size ,yyscan_t yyscanner );
+void _mesa_program_lexer__delete_buffer (YY_BUFFER_STATE b ,yyscan_t yyscanner );
+void _mesa_program_lexer__flush_buffer (YY_BUFFER_STATE b ,yyscan_t yyscanner );
+void _mesa_program_lexer_push_buffer_state (YY_BUFFER_STATE new_buffer ,yyscan_t yyscanner );
+void _mesa_program_lexer_pop_buffer_state (yyscan_t yyscanner );
+
+static void _mesa_program_lexer_ensure_buffer_stack (yyscan_t yyscanner );
+static void _mesa_program_lexer__load_buffer_state (yyscan_t yyscanner );
+static void _mesa_program_lexer__init_buffer (YY_BUFFER_STATE b,FILE *file ,yyscan_t yyscanner );
+
+#define YY_FLUSH_BUFFER _mesa_program_lexer__flush_buffer(YY_CURRENT_BUFFER ,yyscanner)
+
+YY_BUFFER_STATE _mesa_program_lexer__scan_buffer (char *base,yy_size_t size ,yyscan_t yyscanner );
+YY_BUFFER_STATE _mesa_program_lexer__scan_string (yyconst char *yy_str ,yyscan_t yyscanner );
+YY_BUFFER_STATE _mesa_program_lexer__scan_bytes (yyconst char *bytes,int len ,yyscan_t yyscanner );
+
+void *_mesa_program_lexer_alloc (yy_size_t ,yyscan_t yyscanner );
+void *_mesa_program_lexer_realloc (void *,yy_size_t ,yyscan_t yyscanner );
+void _mesa_program_lexer_free (void * ,yyscan_t yyscanner );
+
+#define yy_new_buffer _mesa_program_lexer__create_buffer
+
+#define yy_set_interactive(is_interactive) \
+	{ \
+	if ( ! YY_CURRENT_BUFFER ){ \
+        _mesa_program_lexer_ensure_buffer_stack (yyscanner); \
+		YY_CURRENT_BUFFER_LVALUE =    \
+            _mesa_program_lexer__create_buffer(yyin,YY_BUF_SIZE ,yyscanner); \
+	} \
+	YY_CURRENT_BUFFER_LVALUE->yy_is_interactive = is_interactive; \
+	}
+
+#define yy_set_bol(at_bol) \
+	{ \
+	if ( ! YY_CURRENT_BUFFER ){\
+        _mesa_program_lexer_ensure_buffer_stack (yyscanner); \
+		YY_CURRENT_BUFFER_LVALUE =    \
+            _mesa_program_lexer__create_buffer(yyin,YY_BUF_SIZE ,yyscanner); \
+	} \
+	YY_CURRENT_BUFFER_LVALUE->yy_at_bol = at_bol; \
+	}
+
+#define YY_AT_BOL() (YY_CURRENT_BUFFER_LVALUE->yy_at_bol)
+
+/* Begin user sect3 */
+
+#define _mesa_program_lexer_wrap(n) 1
+#define YY_SKIP_YYWRAP
+
+typedef unsigned char YY_CHAR;
+
+typedef int yy_state_type;
+
+#define yytext_ptr yytext_r
+
+static yy_state_type yy_get_previous_state (yyscan_t yyscanner );
+static yy_state_type yy_try_NUL_trans (yy_state_type current_state  ,yyscan_t yyscanner);
+static int yy_get_next_buffer (yyscan_t yyscanner );
+static void yy_fatal_error (yyconst char msg[] ,yyscan_t yyscanner );
+
+/* Done after the current pattern has been matched and before the
+ * corresponding action - sets up yytext.
+ */
+#define YY_DO_BEFORE_ACTION \
+	yyg->yytext_ptr = yy_bp; \
+	yyleng = (size_t) (yy_cp - yy_bp); \
+	yyg->yy_hold_char = *yy_cp; \
+	*yy_cp = '\0'; \
+	yyg->yy_c_buf_p = yy_cp;
+
+#define YY_NUM_RULES 170
+#define YY_END_OF_BUFFER 171
+/* This struct is not used in this scanner,
+   but its presence is necessary. */
+struct yy_trans_info
+	{
+	flex_int32_t yy_verify;
+	flex_int32_t yy_nxt;
+	};
+static yyconst flex_int16_t yy_accept[850] =
+    {   0,
+        0,    0,  171,  169,  167,  166,  169,  169,  139,  165,
+      141,  141,  141,  141,  139,  139,  139,  139,  139,  139,
+      139,  139,  139,  139,  139,  139,  139,  139,  139,  139,
+      139,  139,  139,  139,  139,  167,    0,    0,  168,  139,
+        0,  140,  142,  162,  162,    0,    0,    0,    0,  162,
+        0,    0,    0,    0,    0,    0,    0,  119,  163,  120,
+      121,  153,  153,  153,  153,    0,  141,    0,  127,  128,
+      129,  139,  139,  139,  139,  139,  139,  139,  139,  139,
+      139,  139,  139,  139,  139,  139,  139,  139,  139,  139,
+      139,  139,  139,  139,  139,  139,  139,  139,  139,  139,
+
+      139,  139,  139,  139,  139,  139,  139,  139,  139,  139,
+      139,  139,  139,  139,  139,  139,  139,  139,  139,  139,
+      139,  139,  139,  139,  139,  139,    0,    0,    0,    0,
+        0,    0,    0,    0,    0,  161,    0,    0,    0,    0,
+        0,    0,    0,    0,    0,  160,  160,    0,    0,    0,
+        0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
+      159,  159,  159,    0,    0,    0,    0,    0,    0,    0,
+        0,    0,    0,  150,  150,  150,  151,  151,  152,  143,
+      142,  143,    0,  144,   11,   12,  139,   13,  139,  139,
+       14,   15,  139,   16,   17,   18,   19,   20,   21,    6,
+
+       22,   23,   24,   25,   26,   28,   27,   29,   30,   31,
+       32,   33,   34,   35,  139,  139,  139,  139,  139,   40,
+       41,  139,   42,   43,   44,   45,   46,   47,   48,  139,
+       49,   50,   51,   52,   53,   54,   55,  139,   56,   57,
+       58,   59,  139,  139,   64,   65,  139,  139,  139,  139,
+      139,  139,    0,    0,    0,    0,  142,    0,    0,    0,
+        0,    0,    0,    0,    0,    0,    0,   80,   81,   83,
+        0,  158,    0,    0,    0,    0,    0,    0,   97,    0,
+        0,    0,    0,    0,    0,    0,    0,    0,    0,  157,
+      156,  156,  109,    0,    0,    0,    0,    0,    0,    0,
+
+        0,    0,    0,  147,  147,  148,  149,    0,  145,   11,
+       11,  139,   12,   12,   12,  139,  139,  139,  139,  139,
+       15,   15,  139,  130,   16,   16,  139,   17,   17,  139,
+       18,   18,  139,   19,   19,  139,   20,   20,  139,   21,
+       21,  139,   22,   22,  139,   24,   24,  139,   25,   25,
+      139,   28,   28,  139,   27,   27,  139,   30,   30,  139,
+       31,   31,  139,   32,   32,  139,   33,   33,  139,   34,
+       34,  139,   35,   35,  139,  139,  139,  139,   36,  139,
+       38,  139,   40,   40,  139,   41,   41,  139,  131,   42,
+       42,  139,   43,   43,  139,  139,   45,   45,  139,   46,
+
+       46,  139,   47,   47,  139,   48,   48,  139,  139,   49,
+       49,  139,   50,   50,  139,   51,   51,  139,   52,   52,
+      139,   53,   53,  139,   54,   54,  139,  139,   10,   56,
+      139,   57,  139,   58,  139,   59,  139,   60,  139,   62,
+      139,   64,   64,  139,  139,  139,  139,  139,  139,  139,
+      139,    0,  164,    0,    0,    0,   73,   74,    0,    0,
+        0,    0,    0,    0,    0,   85,    0,    0,    0,    0,
+        0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
+        0,    0,    0,    0,  155,    0,    0,    0,  113,    0,
+      115,    0,    0,    0,    0,    0,    0,  154,  146,  139,
+
+      139,  139,    4,  139,  139,  139,  139,  139,  139,  139,
+      139,  139,  139,  139,  139,  139,  139,  139,  139,  139,
+      139,  139,  139,  139,  139,  139,    9,   37,   39,  139,
+      139,  139,  139,  139,  139,  139,  139,  139,  139,  139,
+      139,  139,  139,  139,  139,  139,  139,  139,  139,  139,
+       60,  139,   61,   62,  139,   63,  139,  139,  139,  139,
+      139,   69,  139,  139,    0,    0,    0,    0,    0,   75,
+       76,    0,    0,    0,    0,   84,    0,    0,   88,   91,
+        0,    0,    0,    0,    0,    0,    0,  102,  103,    0,
+        0,    0,    0,  108,    0,    0,    0,    0,    0,    0,
+
+        0,    0,    0,    0,  139,  139,  139,  139,  139,  139,
+        5,  139,  139,  139,  139,  139,  139,  139,  139,  139,
+      139,  139,  139,  139,  139,  139,  139,  139,  139,  139,
+        7,    8,  139,  139,  139,  139,  139,  139,  139,  139,
+      139,  139,  139,  139,  139,  139,  139,  139,  139,  139,
+      139,  139,  139,  139,   61,  139,  139,   63,  139,  139,
+      139,  139,  139,   70,  139,   66,    0,    0,    0,    0,
+      124,    0,    0,    0,    0,    0,    0,    0,    0,    0,
+       94,    0,   98,   99,    0,  101,    0,    0,    0,    0,
+        0,    0,    0,    0,    0,    0,  117,  118,    0,    0,
+
+      125,   11,    3,   12,  135,  136,  139,   14,   15,   16,
+       17,   18,   19,   20,   21,   22,   24,   25,   28,   27,
+       30,   31,   32,   33,   34,   35,   40,   41,   42,   43,
+       44,   45,   46,   47,   48,  139,  139,  139,   49,   50,
+       51,   52,   53,   54,   55,   56,   57,   58,   59,  139,
+      139,  139,  139,   64,   65,  139,   68,  126,    0,    0,
+       71,    0,   77,    0,    0,    0,   86,    0,    0,    0,
+        0,    0,    0,  100,    0,    0,  106,   93,    0,    0,
+        0,    0,    0,    0,  122,    0,  139,  132,  133,  139,
+       60,  139,   62,  139,   67,    0,    0,    0,    0,   79,
+
+       82,   87,    0,    0,   92,    0,    0,    0,  105,    0,
+        0,    0,    0,  114,  116,    0,  139,  139,   61,   63,
+        2,    1,    0,   78,    0,   90,    0,   96,  104,    0,
+        0,  111,  112,  123,  139,  134,    0,   89,    0,  107,
+      110,  139,   72,   95,  139,  139,  137,  138,    0
+    } ;
+
+static yyconst flex_int32_t yy_ec[256] =
+    {   0,
+        1,    1,    1,    1,    1,    1,    1,    1,    2,    3,
+        1,    1,    4,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    2,    5,    1,    6,    7,    1,    1,    1,    1,
+        1,    1,    8,    1,    8,    9,    1,   10,   11,   12,
+       13,   14,   15,   15,   15,   15,   15,    1,    1,    1,
+        1,    1,    1,    1,   16,   17,   18,   19,   20,   21,
+       22,   23,   24,    7,   25,   26,   27,   28,   29,   30,
+       31,   32,   33,   34,   35,   36,   37,   38,   39,   40,
+        1,    1,    1,    1,   41,    1,   42,   43,   44,   45,
+
+       46,   47,   48,   49,   50,   51,   52,   53,   54,   55,
+       56,   57,   58,   59,   60,   61,   62,   63,   64,   65,
+       66,   67,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1
+    } ;
+
+static yyconst flex_int32_t yy_meta[68] =
+    {   0,
+        1,    1,    1,    1,    1,    1,    2,    1,    3,    2,
+        2,    2,    2,    2,    2,    2,    2,    2,    2,    2,
+        2,    2,    2,    2,    2,    2,    2,    2,    2,    2,
+        2,    2,    2,    2,    2,    2,    2,    2,    2,    2,
+        2,    2,    2,    2,    2,    2,    2,    2,    2,    2,
+        2,    2,    2,    2,    2,    2,    2,    2,    2,    2,
+        2,    2,    2,    2,    2,    2,    2
+    } ;
+
+static yyconst flex_int16_t yy_base[853] =
+    {   0,
+        0,    0, 1299, 1300,   66, 1300, 1293, 1294,    0,   69,
+       85,  128,  140,  152,  151,   58,   56,   63,   76, 1272,
+      158,  160,   39,  163,  173,  189,   52, 1265,   76, 1235,
+     1234, 1246, 1230, 1244, 1243,  105, 1272, 1284, 1300,    0,
+      225, 1300,  218,  160,  157,   20,  123,   66,  119,  192,
+     1244, 1230,   54,  162, 1228, 1240,  194, 1300,  200,  195,
+       98,  227,  196,  231,  235,  293,  305,  316, 1300, 1300,
+     1300, 1249, 1262, 1256,  223, 1245, 1248, 1244, 1259,  107,
+      298, 1241, 1255,  246, 1241, 1254, 1245, 1258, 1235, 1246,
+     1237,  182, 1238, 1229, 1238, 1229, 1228, 1229,  144, 1223,
+
+     1229, 1240, 1231, 1225, 1222, 1223, 1227,  289, 1236, 1223,
+      302, 1230, 1217, 1231, 1207,   65,  315,  276, 1227, 1226,
+     1202, 1187, 1182, 1199, 1175, 1180, 1206,  279, 1195,  293,
+     1190,  342,  299, 1192, 1173,  317, 1183, 1179, 1174,  207,
+     1180, 1166, 1182, 1179, 1170,  320,  324, 1172, 1161, 1175,
+     1178, 1160, 1175, 1162, 1159, 1166,  284, 1174,  227,  288,
+      327,  342,  345, 1151, 1168, 1169, 1162, 1144,  318, 1145,
+     1167, 1158,  330,  341,  345,  349,  353,  357,  361, 1300,
+      419,  430,  436,  442,  440,  441, 1191,    0, 1190, 1173,
+     1163,  443, 1183,  444,  451,  468,  470,  472,  471,    0,
+
+      496,    0,  497,  498,    0,  499,  500,    0,  524,  525,
+      526,  536,  537,  553, 1178, 1171, 1184,  354,  356,  561,
+      563, 1165,  564,  565, 1157,  580,  590,  591,  592, 1178,
+      593,  617,  618,  619,  629,  630, 1155, 1165,  330,  362,
+      419,  483,  445,  364,  646, 1153, 1145, 1144, 1129, 1129,
+     1128, 1127, 1170, 1142, 1130,  662,  669,  643, 1134,  487,
+     1131, 1125, 1125, 1119, 1132, 1132, 1117, 1300, 1300, 1132,
+     1120,  646, 1127,  135, 1124, 1130,  561, 1125, 1300, 1116,
+     1123, 1122, 1125, 1111, 1110, 1114, 1109,  448, 1114,  650,
+      653,  665, 1300, 1106, 1104, 1104, 1112, 1113, 1095,  670,
+
+     1100, 1106,  486,  579,  655,  661,  668,  726,  732, 1112,
+      682, 1119, 1110,  688,  730, 1117, 1116, 1109, 1123, 1113,
+     1104,  712, 1111,    0, 1102,  731, 1109, 1100,  733, 1107,
+     1098,  734, 1105, 1096,  736, 1103, 1094,  737, 1101, 1092,
+      738, 1099, 1090,  739, 1097, 1088,  740, 1095, 1086,  741,
+     1093, 1084,  742, 1091, 1082,  743, 1089, 1080,  744, 1087,
+     1078,  745, 1085, 1076,  746, 1083, 1074,  747, 1081, 1072,
+      748, 1079, 1070,  749, 1077, 1080, 1073, 1080,    0, 1073,
+        0, 1088, 1063,  750, 1070, 1061,  751, 1068,    0, 1059,
+      752, 1066, 1057,  755, 1064, 1063, 1054,  758, 1061, 1052,
+
+      776, 1059, 1050,  777, 1057, 1048,  779, 1055, 1058, 1045,
+      780, 1052, 1043,  782, 1050, 1041,  783, 1048, 1039,  784,
+     1046, 1037,  785, 1044, 1035,  786, 1042, 1041,    0, 1032,
+     1039, 1030, 1037, 1028, 1035, 1026, 1033,  787, 1032,  788,
+     1047, 1022,  789, 1029, 1028, 1006, 1000, 1005, 1011,  994,
+     1009,  424, 1300, 1008,  998, 1002, 1300, 1300,  992, 1001,
+      987, 1004,  987,  990,  984, 1300,  985,  984,  981,  988,
+      981,  989,  985,  995,  992,  974,  980,  987,  971,  970,
+      988,  970,  982,  981, 1300,  980,  970,  974, 1300,  961,
+     1300,  966,  966,  974,  957,  958,  968, 1300, 1300, 1000,
+
+      982,  998,    0,  798,  996,  996,  995,  994,  993,  992,
+      991,  990,  989,  988,  987,  986,  985,  984,  983,  982,
+      981,  980,  979,  978,  965,  958,    0,    0,    0,  975,
+      974,  973,  972,  971,  970,  969,  968,  967,  945,  965,
+      964,  963,  962,  961,  960,  959,  958,  957,  956,  955,
+      929,  936,  793,  927,  934,  794,  950,  949,  918,  921,
+      901,    0,  902,  895,  902,  901,  902,  894,  912, 1300,
+     1300,  894,  892,  902,  895, 1300,  890,  907,  516, 1300,
+      898,  882,  883,  892,  883,  882,  882, 1300,  881,  890,
+      880,  896,  893, 1300,  892,  890,  879,  880,  876,  868,
+
+      875,  870,  871,  866,  892,  892,  890,  904,  903,  898,
+        0,  886,  885,  884,  883,  882,  881,  880,  879,  878,
+      877,  876,  875,  874,  873,  872,  871,  870,  869,  868,
+        0,    0,  867,  866,  865,  864,  863,  862,  861,  860,
+      859,  804,  858,  857,  856,  855,  854,  853,  852,  851,
+      850,  849,  848,  865,  839,  846,  862,  836,  843,  841,
+      840,  818,  818,    0,  825,    0,  859,  858,  807,  825,
+     1300,  820,  815,  808,  804,  816,  806,  804,  800,  816,
+      807,  806, 1300, 1300,  809, 1300,  804,  797,  786,  797,
+      789,  793,  806,  801,  804,  786, 1300, 1300,  798,  787,
+
+     1300,    0,    0,    0,    0,    0,  826,    0,    0,    0,
+        0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
+        0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
+        0,    0,    0,    0,    0,  814,  813,  802,    0,    0,
+        0,    0,    0,    0,    0,    0,    0,    0,    0,  785,
+      798,  779,  792,    0,    0,  656,    0,    0,  706,  702,
+     1300,  649, 1300,  648,  648,  654, 1300,  637,  645,  610,
+      612,  608,  608, 1300,  572,  583, 1300, 1300,  577,  573,
+      560,  557,  542,  555, 1300,  539,  573,    0,    0,  572,
+        0,  555,    0,  546,    0,  562,  551,  495,  479, 1300,
+
+     1300, 1300,  481,  481, 1300,  480,  443,   31, 1300,  141,
+      166,  171,  186, 1300, 1300,  211,  236,  276,    0,    0,
+     1300, 1300,  290, 1300,  325, 1300,  346, 1300, 1300,  343,
+      341, 1300, 1300, 1300,  365,    0,  380, 1300,  371, 1300,
+     1300,  486, 1300, 1300,  451,  458,    0,    0, 1300,  836,
+      503,  839
+    } ;
+
+static yyconst flex_int16_t yy_def[853] =
+    {   0,
+      849,    1,  849,  849,  849,  849,  849,  850,  851,  849,
+      849,  849,  849,  849,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  849,  849,  850,  849,  851,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  852,  849,  849,  849,  849,
+      849,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  851,
+
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+
+      849,  849,  849,  849,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+
+      849,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  851,  851,  851,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  851,  851,  851,  851,
+      851,  851,  851,  851,  851,  849,  849,  849,  849,  849,
+
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  851,  851,  851,  851,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  851,  851,  849,  849,  849,  849,
+      849,  851,  849,  849,  851,  851,  851,  851,    0,  849,
+      849,  849
+    } ;
+
+static yyconst flex_int16_t yy_nxt[1368] =
+    {   0,
+        4,    5,    6,    5,    7,    8,    9,    4,   10,   11,
+       12,   13,   14,   11,   11,   15,    9,   16,   17,   18,
+       19,    9,    9,    9,   20,   21,   22,    9,   23,   24,
+        9,   25,   26,   27,   28,    9,    9,   29,    9,    9,
+        9,    9,    9,    9,    9,    9,   30,    9,    9,    9,
+        9,    9,    9,    9,    9,    9,   31,    9,   32,   33,
+       34,    9,   35,    9,    9,    9,    9,   36,   96,   36,
+       41,  116,  137,   97,   80,  138,  829,   42,   43,   43,
+       43,   43,   43,   43,   77,   81,   78,  119,   82,  117,
+       83,  238,   79,   66,   67,   67,   67,   67,   67,   67,
+
+       84,   85,  239,  150,   68,  120,   36,   86,   36,  151,
+       44,   45,   46,   47,   48,   49,   50,   51,   52,  141,
+      142,   53,   54,   55,   56,   57,   58,   59,   60,   61,
+       68,  143,   62,   63,   64,   65,   66,   67,   67,   67,
+       67,   67,   67,  170,  194,  195,   69,   68,   66,   67,
+       67,   67,   67,   67,   67,  218,  171,  219,   70,   68,
+       66,   67,   67,   67,   67,   67,   67,   72,  139,   73,
+       71,   68,  140,   68,  144,   92,   74,  145,   98,   88,
+      467,   89,   75,   93,   76,   68,   90,   99,   94,   91,
+      101,  100,  102,  103,   95,  468,  830,   68,  136,  133,
+
+      210,  133,  133,  152,  133,  104,  105,  133,  106,  107,
+      108,  109,  110,  134,  111,  133,  112,  153,  133,  211,
+      135,  831,  113,  114,  154,  115,   41,   43,   43,   43,
+       43,   43,   43,  146,  147,  157,  832,  132,  165,  133,
+      166,  161,  162,  167,  168,  833,  158,  163,  188,  159,
+      133,  169,  160,  265,  189,  164,  834,  201,  133,  174,
+      173,  175,  176,  132,  835,  266,  128,  129,   46,   47,
+       48,   49,  172,   51,   52,  202,  285,   53,   54,   55,
+       56,   57,   58,  130,   60,   61,  286,  243,  131,  244,
+      173,  173,  173,  173,  177,  173,  173,  178,  179,  173,
+
+      173,  173,  181,  181,  181,  181,  181,  181,  228,  836,
+      196,  197,  182,   66,   67,   67,   67,   67,   67,   67,
+      198,  232,  229,  183,   68,  184,  184,  184,  184,  184,
+      184,  240,  134,  241,  255,  233,  282,  287,  182,  135,
+      258,  258,  283,  288,  242,  837,  258,  430,  164,  256,
+       68,  257,  257,  257,  257,  257,  257,  258,  258,  258,
+      261,  258,  258,  298,  258,  272,  258,  258,  258,  258,
+      431,  258,  381,  299,  258,  258,  379,  838,  258,  432,
+      440,  289,  258,  290,  258,  258,  291,  292,  380,  258,
+      382,  839,  258,  303,  303,  303,  303,  840,  441,  841,
+
+      258,  842,  433,  258,  303,  303,  303,  303,  304,  303,
+      303,  305,  306,  303,  303,  303,  303,  303,  303,  303,
+      307,  303,  303,  303,  303,  303,  303,  303,   43,   43,
+       43,   43,   43,   43,  843,  844,  434,  308,  132,  309,
+      309,  309,  309,  309,  309,  184,  184,  184,  184,  184,
+      184,  184,  184,  184,  184,  184,  184,  310,  313,  435,
+      321,  325,  311,  314,  132,  322,  326,  438,  328,  847,
+      565,  311,  315,  329,  322,  326,  848,  311,  314,  439,
+      312,  316,  329,  323,  327,  331,  566,  334,  340,  337,
+      332,  330,  335,  341,  338,  482,  845,  846,  483,  332,
+
+      436,  335,  341,  338,   40,  332,  828,  335,  333,  338,
+      336,  342,  339,  343,  346,  349,  352,  355,  344,  347,
+      350,  353,  356,  437,  827,  826,  825,  344,  347,  350,
+      353,  356,  455,  824,  347,  350,  345,  348,  351,  354,
+      357,  358,  361,  364,  823,  456,  359,  362,  365,  498,
+      498,  498,  498,  367,  370,  359,  362,  365,  368,  371,
+      822,  359,  362,  365,  360,  363,  366,  368,  371,  678,
+      373,  821,  679,  368,  371,  374,  369,  372,  383,  820,
+      386,  390,  393,  384,  374,  387,  391,  394,  819,  818,
+      374,  817,  384,  375,  387,  391,  394,  397,  816,  815,
+
+      814,  385,  398,  388,  392,  395,  471,  400,  403,  406,
+      410,  398,  401,  404,  407,  411,  813,  398,  812,  472,
+      399,  401,  404,  407,  411,  811,  810,  401,  404,  407,
+      402,  405,  408,  412,  413,  416,  419,  809,  808,  414,
+      417,  420,  498,  498,  498,  498,  422,  425,  414,  417,
+      420,  423,  426,  807,  414,  417,  420,  415,  418,  421,
+      423,  426,  806,  442,  805,  804,  423,  426,  443,  424,
+      427,  257,  257,  257,  257,  257,  257,  443,  257,  257,
+      257,  257,  257,  257,  453,  453,  444,  453,  453,  803,
+      453,  453,  453,  453,  453,  453,  802,  453,  801,  310,
+
+      453,  453,  800,  799,  453,  313,  485,  453,  453,  798,
+      797,  453,  453,  492,  796,  493,  795,  494,  499,  498,
+      498,  498,  312,  453,  498,  498,  498,  498,  316,  321,
+      495,  498,  498,  498,  498,  309,  309,  309,  309,  309,
+      309,  309,  309,  309,  309,  309,  309,  313,  325,  501,
+      328,  331,  323,  334,  337,  340,  343,  346,  349,  352,
+      355,  358,  361,  364,  367,  370,  373,  383,  386,  390,
+      316,  327,  393,  330,  333,  397,  336,  339,  342,  345,
+      348,  351,  354,  357,  360,  363,  366,  369,  372,  375,
+      385,  388,  392,  400,  403,  395,  406,  410,  399,  413,
+
+      416,  419,  422,  425,  551,  554,  442,  794,  608,  609,
+      655,  658,  793,  792,  736,  737,  402,  405,  791,  408,
+      412,  790,  415,  418,  421,  424,  427,  552,  555,  444,
+      610,  789,  788,  656,  659,  738,   38,   38,   38,  180,
+      180,  787,  786,  785,  784,  783,  782,  781,  780,  779,
+      778,  777,  776,  775,  774,  773,  772,  771,  770,  769,
+      768,  767,  766,  765,  764,  763,  762,  761,  760,  759,
+      758,  757,  756,  755,  754,  753,  659,  752,  751,  656,
+      750,  749,  748,  747,  746,  745,  744,  743,  742,  741,
+      740,  739,  735,  734,  733,  732,  731,  730,  729,  728,
+
+      727,  726,  725,  724,  723,  722,  721,  720,  719,  718,
+      717,  716,  715,  714,  713,  712,  711,  710,  709,  708,
+      707,  706,  705,  704,  703,  702,  701,  700,  699,  698,
+      697,  696,  695,  694,  693,  692,  691,  690,  689,  688,
+      687,  686,  685,  684,  683,  682,  681,  680,  677,  676,
+      675,  674,  673,  672,  671,  670,  669,  668,  667,  666,
+      665,  664,  663,  662,  661,  660,  657,  555,  654,  552,
+      653,  652,  651,  650,  649,  648,  647,  646,  645,  644,
+      643,  642,  641,  640,  639,  638,  637,  636,  635,  634,
+      633,  632,  631,  630,  629,  628,  627,  626,  625,  624,
+
+      623,  622,  621,  620,  619,  618,  617,  616,  615,  614,
+      613,  612,  611,  607,  606,  605,  604,  603,  602,  601,
+      600,  599,  598,  597,  596,  595,  594,  593,  592,  591,
+      590,  589,  588,  587,  586,  585,  584,  583,  582,  581,
+      580,  579,  578,  577,  576,  575,  574,  573,  572,  571,
+      570,  569,  568,  567,  564,  563,  562,  561,  560,  559,
+      558,  557,  444,  556,  553,  550,  437,  549,  435,  548,
+      433,  547,  431,  546,  545,  427,  544,  424,  543,  421,
+      542,  418,  541,  415,  540,  412,  539,  538,  408,  537,
+      405,  536,  402,  535,  399,  534,  533,  395,  532,  392,
+
+      531,  388,  530,  385,  529,  528,  527,  526,  525,  524,
+      375,  523,  372,  522,  369,  521,  366,  520,  363,  519,
+      360,  518,  357,  517,  354,  516,  351,  515,  348,  514,
+      345,  513,  342,  512,  339,  511,  336,  510,  333,  509,
+      330,  508,  327,  507,  323,  506,  505,  504,  503,  502,
+      316,  500,  312,  497,  496,  491,  490,  489,  488,  487,
+      486,  484,  481,  480,  479,  478,  477,  476,  475,  474,
+      473,  470,  469,  466,  465,  464,  463,  462,  461,  460,
+      459,  458,  457,  454,  289,  261,  452,  451,  450,  449,
+      448,  447,  446,  445,  429,  428,  409,  396,  389,  378,
+
+      377,  376,  324,  320,  319,  318,  317,  302,  301,  300,
+      297,  296,  295,  294,  293,  284,  281,  280,  279,  278,
+      277,  276,  275,  274,  273,  271,  270,  269,  268,  267,
+      264,  263,  262,  260,  259,  172,  254,  253,  252,  251,
+      250,  249,  248,  247,  246,  245,  237,  236,  235,  234,
+      231,  230,  227,  226,  225,  224,  223,  222,  221,  220,
+      217,  216,  215,  214,  213,  212,  209,  208,  207,  206,
+      205,  204,  203,  200,  199,  193,  192,  191,  190,  187,
+      186,  185,  156,  155,  149,  148,   39,  127,  126,  125,
+      124,  123,  122,  121,  118,   87,   39,   37,  849,    3,
+
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849
+    } ;
+
+static yyconst flex_int16_t yy_chk[1368] =
+    {   0,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
+        1,    1,    1,    1,    1,    1,    1,    5,   23,    5,
+       10,   27,   46,   23,   17,   46,  808,   10,   10,   10,
+       10,   10,   10,   10,   16,   17,   16,   29,   17,   27,
+       18,  116,   16,   11,   11,   11,   11,   11,   11,   11,
+
+       18,   19,  116,   53,   11,   29,   36,   19,   36,   53,
+       10,   10,   10,   10,   10,   10,   10,   10,   10,   48,
+       48,   10,   10,   10,   10,   10,   10,   10,   10,   10,
+       11,   48,   10,   10,   10,   10,   12,   12,   12,   12,
+       12,   12,   12,   61,   80,   80,   12,   12,   13,   13,
+       13,   13,   13,   13,   13,   99,   61,   99,   13,   13,
+       14,   14,   14,   14,   14,   14,   14,   15,   47,   15,
+       14,   14,   47,   12,   49,   22,   15,   49,   24,   21,
+      274,   21,   15,   22,   15,   13,   21,   24,   22,   21,
+       25,   24,   25,   25,   22,  274,  810,   14,   45,   45,
+
+       92,   44,   44,   54,   45,   25,   26,   44,   26,   26,
+       26,   26,   26,   44,   26,   45,   26,   54,   44,   92,
+       44,  811,   26,   26,   54,   26,   41,   43,   43,   43,
+       43,   43,   43,   50,   50,   57,  812,   43,   60,   50,
+       60,   59,   59,   60,   60,  813,   57,   59,   75,   57,
+       50,   60,   57,  140,   75,   59,  816,   84,   59,   63,
+       63,   63,   63,   43,  817,  140,   41,   41,   41,   41,
+       41,   41,   62,   41,   41,   84,  159,   41,   41,   41,
+       41,   41,   41,   41,   41,   41,  159,  118,   41,  118,
+       62,   62,   62,   62,   64,   64,   64,   64,   65,   65,
+
+       65,   65,   66,   66,   66,   66,   66,   66,  108,  818,
+       81,   81,   66,   67,   67,   67,   67,   67,   67,   67,
+       81,  111,  108,   68,   67,   68,   68,   68,   68,   68,
+       68,  117,  128,  117,  130,  111,  157,  160,   66,  128,
+      133,  133,  157,  160,  117,  823,  133,  239,  130,  132,
+       67,  132,  132,  132,  132,  132,  132,  133,  136,  136,
+      136,  146,  146,  169,  136,  147,  147,  146,  161,  161,
+      239,  147,  219,  169,  161,  136,  218,  825,  146,  240,
+      244,  161,  147,  162,  162,  161,  163,  163,  218,  162,
+      219,  827,  163,  173,  173,  173,  173,  830,  244,  831,
+
+      162,  835,  240,  163,  174,  174,  174,  174,  175,  175,
+      175,  175,  176,  176,  176,  176,  177,  177,  177,  177,
+      178,  178,  178,  178,  179,  179,  179,  179,  181,  181,
+      181,  181,  181,  181,  837,  839,  241,  182,  181,  182,
+      182,  182,  182,  182,  182,  183,  183,  183,  183,  183,
+      183,  184,  184,  184,  184,  184,  184,  185,  186,  241,
+      192,  194,  185,  186,  181,  192,  194,  243,  195,  845,
+      452,  185,  186,  195,  192,  194,  846,  185,  186,  243,
+      185,  186,  195,  192,  194,  196,  452,  197,  199,  198,
+      196,  195,  197,  199,  198,  288,  842,  842,  288,  196,
+
+      242,  197,  199,  198,  851,  196,  807,  197,  196,  198,
+      197,  199,  198,  201,  203,  204,  206,  207,  201,  203,
+      204,  206,  207,  242,  806,  804,  803,  201,  203,  204,
+      206,  207,  260,  799,  203,  204,  201,  203,  204,  206,
+      207,  209,  210,  211,  798,  260,  209,  210,  211,  303,
+      303,  303,  303,  212,  213,  209,  210,  211,  212,  213,
+      797,  209,  210,  211,  209,  210,  211,  212,  213,  579,
+      214,  796,  579,  212,  213,  214,  212,  213,  220,  794,
+      221,  223,  224,  220,  214,  221,  223,  224,  792,  790,
+      214,  787,  220,  214,  221,  223,  224,  226,  786,  784,
+
+      783,  220,  226,  221,  223,  224,  277,  227,  228,  229,
+      231,  226,  227,  228,  229,  231,  782,  226,  781,  277,
+      226,  227,  228,  229,  231,  780,  779,  227,  228,  229,
+      227,  228,  229,  231,  232,  233,  234,  776,  775,  232,
+      233,  234,  304,  304,  304,  304,  235,  236,  232,  233,
+      234,  235,  236,  773,  232,  233,  234,  232,  233,  234,
+      235,  236,  772,  245,  771,  770,  235,  236,  245,  235,
+      236,  256,  256,  256,  256,  256,  256,  245,  257,  257,
+      257,  257,  257,  257,  258,  258,  245,  272,  272,  769,
+      258,  290,  290,  272,  291,  291,  768,  290,  766,  311,
+
+      291,  258,  765,  764,  272,  314,  292,  292,  290,  762,
+      760,  291,  292,  300,  759,  300,  756,  300,  305,  305,
+      305,  305,  311,  292,  306,  306,  306,  306,  314,  322,
+      300,  307,  307,  307,  307,  308,  308,  308,  308,  308,
+      308,  309,  309,  309,  309,  309,  309,  315,  326,  315,
+      329,  332,  322,  335,  338,  341,  344,  347,  350,  353,
+      356,  359,  362,  365,  368,  371,  374,  384,  387,  391,
+      315,  326,  394,  329,  332,  398,  335,  338,  341,  344,
+      347,  350,  353,  356,  359,  362,  365,  368,  371,  374,
+      384,  387,  391,  401,  404,  394,  407,  411,  398,  414,
+
+      417,  420,  423,  426,  438,  440,  443,  753,  504,  504,
+      553,  556,  752,  751,  642,  642,  401,  404,  750,  407,
+      411,  738,  414,  417,  420,  423,  426,  438,  440,  443,
+      504,  737,  736,  553,  556,  642,  850,  850,  850,  852,
+      852,  707,  700,  699,  696,  695,  694,  693,  692,  691,
+      690,  689,  688,  687,  685,  682,  681,  680,  679,  678,
+      677,  676,  675,  674,  673,  672,  670,  669,  668,  667,
+      665,  663,  662,  661,  660,  659,  658,  657,  656,  655,
+      654,  653,  652,  651,  650,  649,  648,  647,  646,  645,
+      644,  643,  641,  640,  639,  638,  637,  636,  635,  634,
+
+      633,  630,  629,  628,  627,  626,  625,  624,  623,  622,
+      621,  620,  619,  618,  617,  616,  615,  614,  613,  612,
+      610,  609,  608,  607,  606,  605,  604,  603,  602,  601,
+      600,  599,  598,  597,  596,  595,  593,  592,  591,  590,
+      589,  587,  586,  585,  584,  583,  582,  581,  578,  577,
+      575,  574,  573,  572,  569,  568,  567,  566,  565,  564,
+      563,  561,  560,  559,  558,  557,  555,  554,  552,  551,
+      550,  549,  548,  547,  546,  545,  544,  543,  542,  541,
+      540,  539,  538,  537,  536,  535,  534,  533,  532,  531,
+      530,  526,  525,  524,  523,  522,  521,  520,  519,  518,
+
+      517,  516,  515,  514,  513,  512,  511,  510,  509,  508,
+      507,  506,  505,  502,  501,  500,  497,  496,  495,  494,
+      493,  492,  490,  488,  487,  486,  484,  483,  482,  481,
+      480,  479,  478,  477,  476,  475,  474,  473,  472,  471,
+      470,  469,  468,  467,  465,  464,  463,  462,  461,  460,
+      459,  456,  455,  454,  451,  450,  449,  448,  447,  446,
+      445,  444,  442,  441,  439,  437,  436,  435,  434,  433,
+      432,  431,  430,  428,  427,  425,  424,  422,  421,  419,
+      418,  416,  415,  413,  412,  410,  409,  408,  406,  405,
+      403,  402,  400,  399,  397,  396,  395,  393,  392,  390,
+
+      388,  386,  385,  383,  382,  380,  378,  377,  376,  375,
+      373,  372,  370,  369,  367,  366,  364,  363,  361,  360,
+      358,  357,  355,  354,  352,  351,  349,  348,  346,  345,
+      343,  342,  340,  339,  337,  336,  334,  333,  331,  330,
+      328,  327,  325,  323,  321,  320,  319,  318,  317,  316,
+      313,  312,  310,  302,  301,  299,  298,  297,  296,  295,
+      294,  289,  287,  286,  285,  284,  283,  282,  281,  280,
+      278,  276,  275,  273,  271,  270,  267,  266,  265,  264,
+      263,  262,  261,  259,  255,  254,  253,  252,  251,  250,
+      249,  248,  247,  246,  238,  237,  230,  225,  222,  217,
+
+      216,  215,  193,  191,  190,  189,  187,  172,  171,  170,
+      168,  167,  166,  165,  164,  158,  156,  155,  154,  153,
+      152,  151,  150,  149,  148,  145,  144,  143,  142,  141,
+      139,  138,  137,  135,  134,  131,  129,  127,  126,  125,
+      124,  123,  122,  121,  120,  119,  115,  114,  113,  112,
+      110,  109,  107,  106,  105,  104,  103,  102,  101,  100,
+       98,   97,   96,   95,   94,   93,   91,   90,   89,   88,
+       87,   86,   85,   83,   82,   79,   78,   77,   76,   74,
+       73,   72,   56,   55,   52,   51,   38,   37,   35,   34,
+       33,   32,   31,   30,   28,   20,    8,    7,    3,  849,
+
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849,  849,  849,  849,
+      849,  849,  849,  849,  849,  849,  849
+    } ;
+
+/* The intent behind this definition is that it'll catch
+ * any uses of REJECT which flex missed.
+ */
+#define REJECT reject_used_but_not_detected
+#define yymore() yymore_used_but_not_detected
+#define YY_MORE_ADJ 0
+#define YY_RESTORE_YY_MORE_OFFSET
+#line 1 "program/program_lexer.l"
+#line 2 "program/program_lexer.l"
+/*
+ * Copyright © 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include "main/glheader.h"
+#include "main/imports.h"
+#include "program/prog_instruction.h"
+#include "program/prog_statevars.h"
+#include "program/symbol_table.h"
+#include "program/program_parser.h"
+#include "program/program_parse.tab.h"
+
+#define require_ARB_vp (yyextra->mode == ARB_vertex)
+#define require_ARB_fp (yyextra->mode == ARB_fragment)
+#define require_NV_fp  (yyextra->option.NV_fragment)
+#define require_shadow (yyextra->option.Shadow)
+#define require_rect   (yyextra->option.TexRect)
+#define require_texarray        (yyextra->option.TexArray)
+
+#ifndef HAVE_UNISTD_H
+#define YY_NO_UNISTD_H
+#endif
+
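+/* Helper macros for rules whose text is a keyword in one program type
+ * (ARB_vertex vs. ARB_fragment) but a plain identifier in the other:
+ * return the token when the condition holds, otherwise fall back to
+ * handle_ident() or to a single DOT token.
+ */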
+#define return_token_or_IDENTIFIER(condition, token)	\
+   do {							\
+      if (condition) {					\
+	 return token;					\
+      } else {						\
+	 return handle_ident(yyextra, yytext, yylval);	\
+      }							\
+   } while (0)
+
+#define return_token_or_DOT(condition, token)		\
+   do {							\
+      if (condition) {					\
+	 return token;					\
+      } else {						\
+	 yyless(1);					\
+	 return DOT;					\
+      }							\
+   } while (0)
+
+
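+/* If the mode condition holds and the characters following the opcode
+ * name (yytext + len) form a valid instruction suffix, record the opcode
+ * in yylval and return the token class; otherwise treat the whole text
+ * as an identifier.
+ */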
+#define return_opcode(condition, token, opcode, len)	\
+   do {							\
+      if (condition &&					\
+	  _mesa_parse_instruction_suffix(yyextra,	\
+					 yytext + len,	\
+					 & yylval->temp_inst)) {	\
+	 yylval->temp_inst.Opcode = OPCODE_ ## opcode;	\
+	 return token;					\
+      } else {						\
+	 return handle_ident(yyextra, yytext, yylval);	\
+      }							\
+   } while (0)
+
+#define SWIZZLE_INVAL  MAKE_SWIZZLE4(SWIZZLE_NIL, SWIZZLE_NIL, \
+				     SWIZZLE_NIL, SWIZZLE_NIL)
+
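+/* Map a write-mask character (x/y/z/w or the r/g/b/a aliases) to its
+ * WRITEMASK_* bit; any other character yields 0.
+ */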
+static unsigned
+mask_from_char(char c)
+{
+   switch (c) {
+   case 'x':
+   case 'r':
+      return WRITEMASK_X;
+   case 'y':
+   case 'g':
+      return WRITEMASK_Y;
+   case 'z':
+   case 'b':
+      return WRITEMASK_Z;
+   case 'w':
+   case 'a':
+      return WRITEMASK_W;
+   }
+
+   return 0;
+}
+
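+/* Map a swizzle selector character (x/y/z/w or r/g/b/a) to the
+ * corresponding SWIZZLE_* component index.
+ */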
+static unsigned
+swiz_from_char(char c)
+{
+   switch (c) {
+   case 'x':
+   case 'r':
+      return SWIZZLE_X;
+   case 'y':
+   case 'g':
+      return SWIZZLE_Y;
+   case 'z':
+   case 'b':
+      return SWIZZLE_Z;
+   case 'w':
+   case 'a':
+      return SWIZZLE_W;
+   }
+
+   return 0;
+}
+
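+/* Hand the identifier text to the parser and classify it: IDENTIFIER if
+ * it is not yet in the symbol table, USED_IDENTIFIER otherwise.
+ */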
+static int
+handle_ident(struct asm_parser_state *state, const char *text, YYSTYPE *lval)
+{
+   lval->string = strdup(text);
+
+   return (_mesa_symbol_table_find_symbol(state->st, 0, text) == NULL)
+      ? IDENTIFIER : USED_IDENTIFIER;
+}
+
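+/* Executed before every matched rule: advance the column range over the
+ * matched text and keep a running absolute character position.
+ */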
+#define YY_USER_ACTION							\
+   do {									\
+      yylloc->first_column = yylloc->last_column;			\
+      yylloc->last_column += yyleng;					\
+      if ((yylloc->first_line == 1)					\
+	  && (yylloc->first_column == 1)) {				\
+	 yylloc->position = 1;						\
+      } else {								\
+	 yylloc->position += yylloc->last_column - yylloc->first_column; \
+      }									\
+   } while(0);
+
+#define YY_NO_INPUT
+
+/* Yes, this is intentionally doing nothing. We have this line of code
+here only to avoid the compiler complaining about an unput function
+that is defined but never called. */
+#define YY_USER_INIT while (0) { unput(0); }
+
+#define YY_EXTRA_TYPE struct asm_parser_state *
+
+/* Flex defines a couple of functions with neither declarations nor the
+static keyword. Declare them here to avoid a compiler warning. */
+int _mesa_program_lexer_get_column  (yyscan_t yyscanner);
+void _mesa_program_lexer_set_column (int  column_no , yyscan_t yyscanner);
+
+#line 1178 "program/lex.yy.c"
+
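+/* The scanner uses only the default INITIAL start condition; no
+ * exclusive states are defined.
+ */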
+#define INITIAL 0
+
+#ifndef YY_NO_UNISTD_H
+/* Special case for "unistd.h", since it is non-ANSI. We include it way
+ * down here because we want the user's section 1 to have been scanned first.
+ * The user has a chance to override it with an option.
+ */
+#include <unistd.h>
+#endif
+
+#ifndef YY_EXTRA_TYPE
+#define YY_EXTRA_TYPE void *
+#endif
+
+/* Holds the entire state of the reentrant scanner. */
+struct yyguts_t
+    {
+
+    /* User-defined. Not touched by flex. */
+    YY_EXTRA_TYPE yyextra_r;
+
+    /* The rest are the same as the globals declared in the non-reentrant scanner. */
+    FILE *yyin_r, *yyout_r;
+    size_t yy_buffer_stack_top; /**< index of top of stack. */
+    size_t yy_buffer_stack_max; /**< capacity of stack. */
+    YY_BUFFER_STATE * yy_buffer_stack; /**< Stack as an array. */
+    char yy_hold_char;
+    int yy_n_chars;
+    int yyleng_r;
+    char *yy_c_buf_p;
+    int yy_init;
+    int yy_start;
+    int yy_did_buffer_switch_on_eof;
+    int yy_start_stack_ptr;
+    int yy_start_stack_depth;
+    int *yy_start_stack;
+    yy_state_type yy_last_accepting_state;
+    char* yy_last_accepting_cpos;
+
+    int yylineno_r;
+    int yy_flex_debug_r;
+
+    char *yytext_r;
+    int yy_more_flag;
+    int yy_more_len;
+
+    YYSTYPE * yylval_r;
+
+    YYLTYPE * yylloc_r;
+
+    }; /* end struct yyguts_t */
+
+static int yy_init_globals (yyscan_t yyscanner );
+
+    /* This must go here because YYSTYPE and YYLTYPE are included
+     * from bison output in section 1. */
+    #    define yylval yyg->yylval_r
+    
+    #    define yylloc yyg->yylloc_r
+    
+int _mesa_program_lexer_lex_init (yyscan_t* scanner);
+
+int _mesa_program_lexer_lex_init_extra (YY_EXTRA_TYPE user_defined,yyscan_t* scanner);
+
+/* Accessor methods to globals.
+   These are made visible to non-reentrant scanners for convenience. */
+
+int _mesa_program_lexer_lex_destroy (yyscan_t yyscanner );
+
+int _mesa_program_lexer_get_debug (yyscan_t yyscanner );
+
+void _mesa_program_lexer_set_debug (int debug_flag ,yyscan_t yyscanner );
+
+YY_EXTRA_TYPE _mesa_program_lexer_get_extra (yyscan_t yyscanner );
+
+void _mesa_program_lexer_set_extra (YY_EXTRA_TYPE user_defined ,yyscan_t yyscanner );
+
+FILE *_mesa_program_lexer_get_in (yyscan_t yyscanner );
+
+void _mesa_program_lexer_set_in  (FILE * in_str ,yyscan_t yyscanner );
+
+FILE *_mesa_program_lexer_get_out (yyscan_t yyscanner );
+
+void _mesa_program_lexer_set_out  (FILE * out_str ,yyscan_t yyscanner );
+
+int _mesa_program_lexer_get_leng (yyscan_t yyscanner );
+
+char *_mesa_program_lexer_get_text (yyscan_t yyscanner );
+
+int _mesa_program_lexer_get_lineno (yyscan_t yyscanner );
+
+void _mesa_program_lexer_set_lineno (int line_number ,yyscan_t yyscanner );
+
+YYSTYPE * _mesa_program_lexer_get_lval (yyscan_t yyscanner );
+
+void _mesa_program_lexer_set_lval (YYSTYPE * yylval_param ,yyscan_t yyscanner );
+
+       YYLTYPE *_mesa_program_lexer_get_lloc (yyscan_t yyscanner );
+    
+        void _mesa_program_lexer_set_lloc (YYLTYPE * yylloc_param ,yyscan_t yyscanner );
+    
+/* Macros after this point can all be overridden by user definitions in
+ * section 1.
+ */
+
+#ifndef YY_SKIP_YYWRAP
+#ifdef __cplusplus
+extern "C" int _mesa_program_lexer_wrap (yyscan_t yyscanner );
+#else
+extern int _mesa_program_lexer_wrap (yyscan_t yyscanner );
+#endif
+#endif
+
+    static void yyunput (int c,char *buf_ptr  ,yyscan_t yyscanner);
+    
+#ifndef yytext_ptr
+static void yy_flex_strncpy (char *,yyconst char *,int ,yyscan_t yyscanner);
+#endif
+
+#ifdef YY_NEED_STRLEN
+static int yy_flex_strlen (yyconst char * ,yyscan_t yyscanner);
+#endif
+
+#ifndef YY_NO_INPUT
+
+#ifdef __cplusplus
+static int yyinput (yyscan_t yyscanner );
+#else
+static int input (yyscan_t yyscanner );
+#endif
+
+#endif
+
+/* Amount of stuff to slurp up with each read. */
+#ifndef YY_READ_BUF_SIZE
+#ifdef __ia64__
+/* On IA-64, the buffer size is 16k, not 8k */
+#define YY_READ_BUF_SIZE 16384
+#else
+#define YY_READ_BUF_SIZE 8192
+#endif /* __ia64__ */
+#endif
+
+/* Copy whatever the last rule matched to the standard output. */
+#ifndef ECHO
+/* This used to be an fputs(), but since the string might contain NULs,
+ * we now use fwrite().
+ */
+#define ECHO do { if (fwrite( yytext, yyleng, 1, yyout )) {} } while (0)
+#endif
+
+/* Gets input and stuffs it into "buf".  The number of characters read, or
+ * YY_NULL, is returned in "result".  Interactive buffers are read one
+ * character at a time; block-mode buffers are read with fread().
+ */
+#ifndef YY_INPUT
+#define YY_INPUT(buf,result,max_size) \
+	if ( YY_CURRENT_BUFFER_LVALUE->yy_is_interactive ) \
+		{ \
+		int c = '*'; \
+		size_t n; \
+		for ( n = 0; n < max_size && \
+			     (c = getc( yyin )) != EOF && c != '\n'; ++n ) \
+			buf[n] = (char) c; \
+		if ( c == '\n' ) \
+			buf[n++] = (char) c; \
+		if ( c == EOF && ferror( yyin ) ) \
+			YY_FATAL_ERROR( "input in flex scanner failed" ); \
+		result = n; \
+		} \
+	else \
+		{ \
+		errno=0; \
+		while ( (result = fread(buf, 1, max_size, yyin))==0 && ferror(yyin)) \
+			{ \
+			if( errno != EINTR) \
+				{ \
+				YY_FATAL_ERROR( "input in flex scanner failed" ); \
+				break; \
+				} \
+			errno=0; \
+			clearerr(yyin); \
+			} \
+		}\
+\
+
+#endif
+
+/* No semi-colon after return; correct usage is to write "yyterminate();" -
+ * we don't want an extra ';' after the "return" because that will cause
+ * some compilers to complain about unreachable statements.
+ */
+#ifndef yyterminate
+#define yyterminate() return YY_NULL
+#endif
+
+/* Number of entries by which start-condition stack grows. */
+#ifndef YY_START_STACK_INCR
+#define YY_START_STACK_INCR 25
+#endif
+
+/* Report a fatal error. */
+#ifndef YY_FATAL_ERROR
+#define YY_FATAL_ERROR(msg) yy_fatal_error( msg , yyscanner)
+#endif
+
+/* end tables serialization structures and prototypes */
+
+/* Default declaration of generated scanner - a define so the user can
+ * easily add parameters.
+ */
+#ifndef YY_DECL
+#define YY_DECL_IS_OURS 1
+
+extern int _mesa_program_lexer_lex \
+               (YYSTYPE * yylval_param,YYLTYPE * yylloc_param ,yyscan_t yyscanner);
+
+#define YY_DECL int _mesa_program_lexer_lex \
+               (YYSTYPE * yylval_param, YYLTYPE * yylloc_param , yyscan_t yyscanner)
+#endif /* !YY_DECL */
+
+/* Code executed at the beginning of each rule, after yytext and yyleng
+ * have been set up.
+ */
+#ifndef YY_USER_ACTION
+#define YY_USER_ACTION
+#endif
+
+/* Code executed at the end of each rule. */
+#ifndef YY_BREAK
+#define YY_BREAK break;
+#endif
+
+#define YY_RULE_SETUP \
+	YY_USER_ACTION
+
+/** The main scanner function which does all the work.
+ */
+YY_DECL
+{
+	register yy_state_type yy_current_state;
+	register char *yy_cp, *yy_bp;
+	register int yy_act;
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+
+#line 170 "program/program_lexer.l"
+
+
+#line 1427 "program/lex.yy.c"
+
+    yylval = yylval_param;
+
+    yylloc = yylloc_param;
+
+	if ( !yyg->yy_init )
+		{
+		yyg->yy_init = 1;
+
+#ifdef YY_USER_INIT
+		YY_USER_INIT;
+#endif
+
+		if ( ! yyg->yy_start )
+			yyg->yy_start = 1;	/* first start state */
+
+		if ( ! yyin )
+			yyin = stdin;
+
+		if ( ! yyout )
+			yyout = stdout;
+
+		if ( ! YY_CURRENT_BUFFER ) {
+			_mesa_program_lexer_ensure_buffer_stack (yyscanner);
+			YY_CURRENT_BUFFER_LVALUE =
+				_mesa_program_lexer__create_buffer(yyin,YY_BUF_SIZE ,yyscanner);
+		}
+
+		_mesa_program_lexer__load_buffer_state(yyscanner );
+		}
+
+	while ( 1 )		/* loops until end-of-file is reached */
+		{
+		yy_cp = yyg->yy_c_buf_p;
+
+		/* Support of yytext. */
+		*yy_cp = yyg->yy_hold_char;
+
+		/* yy_bp points to the position in yy_ch_buf of the start of
+		 * the current run.
+		 */
+		yy_bp = yy_cp;
+
+		yy_current_state = yyg->yy_start;
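+		/* Walk the compressed DFA: yy_ec maps each input byte to an
+		 * equivalence class, and yy_base/yy_def/yy_chk/yy_nxt encode
+		 * the transition function, remembering the last accepting
+		 * state for backtracking.
+		 */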
+yy_match:
+		do
+			{
+			register YY_CHAR yy_c = yy_ec[YY_SC_TO_UI(*yy_cp)];
+			if ( yy_accept[yy_current_state] )
+				{
+				yyg->yy_last_accepting_state = yy_current_state;
+				yyg->yy_last_accepting_cpos = yy_cp;
+				}
+			while ( yy_chk[yy_base[yy_current_state] + yy_c] != yy_current_state )
+				{
+				yy_current_state = (int) yy_def[yy_current_state];
+				if ( yy_current_state >= 850 )
+					yy_c = yy_meta[(unsigned int) yy_c];
+				}
+			yy_current_state = yy_nxt[yy_base[yy_current_state] + (unsigned int) yy_c];
+			++yy_cp;
+			}
+		while ( yy_base[yy_current_state] != 1300 );
+
+yy_find_action:
+		yy_act = yy_accept[yy_current_state];
+		if ( yy_act == 0 )
+			{ /* have to back up */
+			yy_cp = yyg->yy_last_accepting_cpos;
+			yy_current_state = yyg->yy_last_accepting_state;
+			yy_act = yy_accept[yy_current_state];
+			}
+
+		YY_DO_BEFORE_ACTION;
+
+do_action:	/* This label is used only to access EOF actions. */
+
+		switch ( yy_act )
+	{ /* beginning of action switch */
+			case 0: /* must back up */
+			/* undo the effects of YY_DO_BEFORE_ACTION */
+			*yy_cp = yyg->yy_hold_char;
+			yy_cp = yyg->yy_last_accepting_cpos;
+			yy_current_state = yyg->yy_last_accepting_state;
+			goto yy_find_action;
+
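+/* One case per pattern in program_lexer.l; the #line directives give each
+ * rule's location in the original .l source.
+ */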
+case 1:
+YY_RULE_SETUP
+#line 172 "program/program_lexer.l"
+{ return ARBvp_10; }
+	YY_BREAK
+case 2:
+YY_RULE_SETUP
+#line 173 "program/program_lexer.l"
+{ return ARBfp_10; }
+	YY_BREAK
+case 3:
+YY_RULE_SETUP
+#line 174 "program/program_lexer.l"
+{
+   yylval->integer = at_address;
+   return_token_or_IDENTIFIER(require_ARB_vp, ADDRESS);
+}
+	YY_BREAK
+case 4:
+YY_RULE_SETUP
+#line 178 "program/program_lexer.l"
+{ return ALIAS; }
+	YY_BREAK
+case 5:
+YY_RULE_SETUP
+#line 179 "program/program_lexer.l"
+{ return ATTRIB; }
+	YY_BREAK
+case 6:
+YY_RULE_SETUP
+#line 180 "program/program_lexer.l"
+{ return END; }
+	YY_BREAK
+case 7:
+YY_RULE_SETUP
+#line 181 "program/program_lexer.l"
+{ return OPTION; }
+	YY_BREAK
+case 8:
+YY_RULE_SETUP
+#line 182 "program/program_lexer.l"
+{ return OUTPUT; }
+	YY_BREAK
+case 9:
+YY_RULE_SETUP
+#line 183 "program/program_lexer.l"
+{ return PARAM; }
+	YY_BREAK
+case 10:
+YY_RULE_SETUP
+#line 184 "program/program_lexer.l"
+{ yylval->integer = at_temp; return TEMP; }
+	YY_BREAK
+case 11:
+YY_RULE_SETUP
+#line 186 "program/program_lexer.l"
+{ return_opcode(             1, VECTOR_OP, ABS, 3); }
+	YY_BREAK
+case 12:
+YY_RULE_SETUP
+#line 187 "program/program_lexer.l"
+{ return_opcode(             1, BIN_OP, ADD, 3); }
+	YY_BREAK
+case 13:
+YY_RULE_SETUP
+#line 188 "program/program_lexer.l"
+{ return_opcode(require_ARB_vp, ARL, ARL, 3); }
+	YY_BREAK
+case 14:
+YY_RULE_SETUP
+#line 190 "program/program_lexer.l"
+{ return_opcode(require_ARB_fp, TRI_OP, CMP, 3); }
+	YY_BREAK
+case 15:
+YY_RULE_SETUP
+#line 191 "program/program_lexer.l"
+{ return_opcode(require_ARB_fp, SCALAR_OP, COS, 3); }
+	YY_BREAK
+case 16:
+YY_RULE_SETUP
+#line 193 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  VECTOR_OP, DDX, 3); }
+	YY_BREAK
+case 17:
+YY_RULE_SETUP
+#line 194 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  VECTOR_OP, DDY, 3); }
+	YY_BREAK
+case 18:
+YY_RULE_SETUP
+#line 195 "program/program_lexer.l"
+{ return_opcode(             1, BIN_OP, DP3, 3); }
+	YY_BREAK
+case 19:
+YY_RULE_SETUP
+#line 196 "program/program_lexer.l"
+{ return_opcode(             1, BIN_OP, DP4, 3); }
+	YY_BREAK
+case 20:
+YY_RULE_SETUP
+#line 197 "program/program_lexer.l"
+{ return_opcode(             1, BIN_OP, DPH, 3); }
+	YY_BREAK
+case 21:
+YY_RULE_SETUP
+#line 198 "program/program_lexer.l"
+{ return_opcode(             1, BIN_OP, DST, 3); }
+	YY_BREAK
+case 22:
+YY_RULE_SETUP
+#line 200 "program/program_lexer.l"
+{ return_opcode(             1, SCALAR_OP, EX2, 3); }
+	YY_BREAK
+case 23:
+YY_RULE_SETUP
+#line 201 "program/program_lexer.l"
+{ return_opcode(require_ARB_vp, SCALAR_OP, EXP, 3); }
+	YY_BREAK
+case 24:
+YY_RULE_SETUP
+#line 203 "program/program_lexer.l"
+{ return_opcode(             1, VECTOR_OP, FLR, 3); }
+	YY_BREAK
+case 25:
+YY_RULE_SETUP
+#line 204 "program/program_lexer.l"
+{ return_opcode(             1, VECTOR_OP, FRC, 3); }
+	YY_BREAK
+case 26:
+YY_RULE_SETUP
+#line 206 "program/program_lexer.l"
+{ return_opcode(require_ARB_fp, KIL, KIL, 3); }
+	YY_BREAK
+case 27:
+YY_RULE_SETUP
+#line 208 "program/program_lexer.l"
+{ return_opcode(             1, VECTOR_OP, LIT, 3); }
+	YY_BREAK
+case 28:
+YY_RULE_SETUP
+#line 209 "program/program_lexer.l"
+{ return_opcode(             1, SCALAR_OP, LG2, 3); }
+	YY_BREAK
+case 29:
+YY_RULE_SETUP
+#line 210 "program/program_lexer.l"
+{ return_opcode(require_ARB_vp, SCALAR_OP, LOG, 3); }
+	YY_BREAK
+case 30:
+YY_RULE_SETUP
+#line 211 "program/program_lexer.l"
+{ return_opcode(require_ARB_fp, TRI_OP, LRP, 3); }
+	YY_BREAK
+case 31:
+YY_RULE_SETUP
+#line 213 "program/program_lexer.l"
+{ return_opcode(             1, TRI_OP, MAD, 3); }
+	YY_BREAK
+case 32:
+YY_RULE_SETUP
+#line 214 "program/program_lexer.l"
+{ return_opcode(             1, BIN_OP, MAX, 3); }
+	YY_BREAK
+case 33:
+YY_RULE_SETUP
+#line 215 "program/program_lexer.l"
+{ return_opcode(             1, BIN_OP, MIN, 3); }
+	YY_BREAK
+case 34:
+YY_RULE_SETUP
+#line 216 "program/program_lexer.l"
+{ return_opcode(             1, VECTOR_OP, MOV, 3); }
+	YY_BREAK
+case 35:
+YY_RULE_SETUP
+#line 217 "program/program_lexer.l"
+{ return_opcode(             1, BIN_OP, MUL, 3); }
+	YY_BREAK
+case 36:
+YY_RULE_SETUP
+#line 219 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  VECTOR_OP, PK2H, 4); }
+	YY_BREAK
+case 37:
+YY_RULE_SETUP
+#line 220 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  VECTOR_OP, PK2US, 5); }
+	YY_BREAK
+case 38:
+YY_RULE_SETUP
+#line 221 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  VECTOR_OP, PK4B, 4); }
+	YY_BREAK
+case 39:
+YY_RULE_SETUP
+#line 222 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  VECTOR_OP, PK4UB, 5); }
+	YY_BREAK
+case 40:
+YY_RULE_SETUP
+#line 223 "program/program_lexer.l"
+{ return_opcode(             1, BINSC_OP, POW, 3); }
+	YY_BREAK
+case 41:
+YY_RULE_SETUP
+#line 225 "program/program_lexer.l"
+{ return_opcode(             1, SCALAR_OP, RCP, 3); }
+	YY_BREAK
+case 42:
+YY_RULE_SETUP
+#line 226 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  BIN_OP,    RFL, 3); }
+	YY_BREAK
+case 43:
+YY_RULE_SETUP
+#line 227 "program/program_lexer.l"
+{ return_opcode(             1, SCALAR_OP, RSQ, 3); }
+	YY_BREAK
+case 44:
+YY_RULE_SETUP
+#line 229 "program/program_lexer.l"
+{ return_opcode(require_ARB_fp, SCALAR_OP, SCS, 3); }
+	YY_BREAK
+case 45:
+YY_RULE_SETUP
+#line 230 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  BIN_OP, SEQ, 3); }
+	YY_BREAK
+case 46:
+YY_RULE_SETUP
+#line 231 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  BIN_OP, SFL, 3); }
+	YY_BREAK
+case 47:
+YY_RULE_SETUP
+#line 232 "program/program_lexer.l"
+{ return_opcode(             1, BIN_OP, SGE, 3); }
+	YY_BREAK
+case 48:
+YY_RULE_SETUP
+#line 233 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  BIN_OP, SGT, 3); }
+	YY_BREAK
+case 49:
+YY_RULE_SETUP
+#line 234 "program/program_lexer.l"
+{ return_opcode(require_ARB_fp, SCALAR_OP, SIN, 3); }
+	YY_BREAK
+case 50:
+YY_RULE_SETUP
+#line 235 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  BIN_OP, SLE, 3); }
+	YY_BREAK
+case 51:
+YY_RULE_SETUP
+#line 236 "program/program_lexer.l"
+{ return_opcode(             1, BIN_OP, SLT, 3); }
+	YY_BREAK
+case 52:
+YY_RULE_SETUP
+#line 237 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  BIN_OP, SNE, 3); }
+	YY_BREAK
+case 53:
+YY_RULE_SETUP
+#line 238 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  BIN_OP, STR, 3); }
+	YY_BREAK
+case 54:
+YY_RULE_SETUP
+#line 239 "program/program_lexer.l"
+{ return_opcode(             1, BIN_OP, SUB, 3); }
+	YY_BREAK
+case 55:
+YY_RULE_SETUP
+#line 240 "program/program_lexer.l"
+{ return_opcode(             1, SWZ, SWZ, 3); }
+	YY_BREAK
+case 56:
+YY_RULE_SETUP
+#line 242 "program/program_lexer.l"
+{ return_opcode(require_ARB_fp, SAMPLE_OP, TEX, 3); }
+	YY_BREAK
+case 57:
+YY_RULE_SETUP
+#line 243 "program/program_lexer.l"
+{ return_opcode(require_ARB_fp, SAMPLE_OP, TXB, 3); }
+	YY_BREAK
+case 58:
+YY_RULE_SETUP
+#line 244 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  TXD_OP, TXD, 3); }
+	YY_BREAK
+case 59:
+YY_RULE_SETUP
+#line 245 "program/program_lexer.l"
+{ return_opcode(require_ARB_fp, SAMPLE_OP, TXP, 3); }
+	YY_BREAK
+case 60:
+YY_RULE_SETUP
+#line 247 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  SCALAR_OP, UP2H, 4); }
+	YY_BREAK
+case 61:
+YY_RULE_SETUP
+#line 248 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  SCALAR_OP, UP2US, 5); }
+	YY_BREAK
+case 62:
+YY_RULE_SETUP
+#line 249 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  SCALAR_OP, UP4B, 4); }
+	YY_BREAK
+case 63:
+YY_RULE_SETUP
+#line 250 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  SCALAR_OP, UP4UB, 5); }
+	YY_BREAK
+case 64:
+YY_RULE_SETUP
+#line 252 "program/program_lexer.l"
+{ return_opcode(require_NV_fp,  TRI_OP, X2D, 3); }
+	YY_BREAK
+case 65:
+YY_RULE_SETUP
+#line 253 "program/program_lexer.l"
+{ return_opcode(             1, BIN_OP, XPD, 3); }
+	YY_BREAK
+case 66:
+YY_RULE_SETUP
+#line 255 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_vp, VERTEX); }
+	YY_BREAK
+case 67:
+YY_RULE_SETUP
+#line 256 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp, FRAGMENT); }
+	YY_BREAK
+case 68:
+YY_RULE_SETUP
+#line 257 "program/program_lexer.l"
+{ return PROGRAM; }
+	YY_BREAK
+case 69:
+YY_RULE_SETUP
+#line 258 "program/program_lexer.l"
+{ return STATE; }
+	YY_BREAK
+case 70:
+YY_RULE_SETUP
+#line 259 "program/program_lexer.l"
+{ return RESULT; }
+	YY_BREAK
+case 71:
+YY_RULE_SETUP
+#line 261 "program/program_lexer.l"
+{ return AMBIENT; }
+	YY_BREAK
+case 72:
+YY_RULE_SETUP
+#line 262 "program/program_lexer.l"
+{ return ATTENUATION; }
+	YY_BREAK
+case 73:
+YY_RULE_SETUP
+#line 263 "program/program_lexer.l"
+{ return BACK; }
+	YY_BREAK
+case 74:
+YY_RULE_SETUP
+#line 264 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_vp, CLIP); }
+	YY_BREAK
+case 75:
+YY_RULE_SETUP
+#line 265 "program/program_lexer.l"
+{ return COLOR; }
+	YY_BREAK
+case 76:
+YY_RULE_SETUP
+#line 266 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_fp, DEPTH); }
+	YY_BREAK
+case 77:
+YY_RULE_SETUP
+#line 267 "program/program_lexer.l"
+{ return DIFFUSE; }
+	YY_BREAK
+case 78:
+YY_RULE_SETUP
+#line 268 "program/program_lexer.l"
+{ return DIRECTION; }
+	YY_BREAK
+case 79:
+YY_RULE_SETUP
+#line 269 "program/program_lexer.l"
+{ return EMISSION; }
+	YY_BREAK
+case 80:
+YY_RULE_SETUP
+#line 270 "program/program_lexer.l"
+{ return ENV; }
+	YY_BREAK
+case 81:
+YY_RULE_SETUP
+#line 271 "program/program_lexer.l"
+{ return EYE; }
+	YY_BREAK
+case 82:
+YY_RULE_SETUP
+#line 272 "program/program_lexer.l"
+{ return FOGCOORD; }
+	YY_BREAK
+case 83:
+YY_RULE_SETUP
+#line 273 "program/program_lexer.l"
+{ return FOG; }
+	YY_BREAK
+case 84:
+YY_RULE_SETUP
+#line 274 "program/program_lexer.l"
+{ return FRONT; }
+	YY_BREAK
+case 85:
+YY_RULE_SETUP
+#line 275 "program/program_lexer.l"
+{ return HALF; }
+	YY_BREAK
+case 86:
+YY_RULE_SETUP
+#line 276 "program/program_lexer.l"
+{ return INVERSE; }
+	YY_BREAK
+case 87:
+YY_RULE_SETUP
+#line 277 "program/program_lexer.l"
+{ return INVTRANS; }
+	YY_BREAK
+case 88:
+YY_RULE_SETUP
+#line 278 "program/program_lexer.l"
+{ return LIGHT; }
+	YY_BREAK
+case 89:
+YY_RULE_SETUP
+#line 279 "program/program_lexer.l"
+{ return LIGHTMODEL; }
+	YY_BREAK
+case 90:
+YY_RULE_SETUP
+#line 280 "program/program_lexer.l"
+{ return LIGHTPROD; }
+	YY_BREAK
+case 91:
+YY_RULE_SETUP
+#line 281 "program/program_lexer.l"
+{ return LOCAL; }
+	YY_BREAK
+case 92:
+YY_RULE_SETUP
+#line 282 "program/program_lexer.l"
+{ return MATERIAL; }
+	YY_BREAK
+case 93:
+YY_RULE_SETUP
+#line 283 "program/program_lexer.l"
+{ return MAT_PROGRAM; }
+	YY_BREAK
+case 94:
+YY_RULE_SETUP
+#line 284 "program/program_lexer.l"
+{ return MATRIX; }
+	YY_BREAK
+case 95:
+YY_RULE_SETUP
+#line 285 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_vp, MATRIXINDEX); }
+	YY_BREAK
+case 96:
+YY_RULE_SETUP
+#line 286 "program/program_lexer.l"
+{ return MODELVIEW; }
+	YY_BREAK
+case 97:
+YY_RULE_SETUP
+#line 287 "program/program_lexer.l"
+{ return MVP; }
+	YY_BREAK
+case 98:
+YY_RULE_SETUP
+#line 288 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_vp, NORMAL); }
+	YY_BREAK
+case 99:
+YY_RULE_SETUP
+#line 289 "program/program_lexer.l"
+{ return OBJECT; }
+	YY_BREAK
+case 100:
+YY_RULE_SETUP
+#line 290 "program/program_lexer.l"
+{ return PALETTE; }
+	YY_BREAK
+case 101:
+YY_RULE_SETUP
+#line 291 "program/program_lexer.l"
+{ return PARAMS; }
+	YY_BREAK
+case 102:
+YY_RULE_SETUP
+#line 292 "program/program_lexer.l"
+{ return PLANE; }
+	YY_BREAK
+case 103:
+YY_RULE_SETUP
+#line 293 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_vp, POINT_TOK); }
+	YY_BREAK
+case 104:
+YY_RULE_SETUP
+#line 294 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_vp, POINTSIZE); }
+	YY_BREAK
+case 105:
+YY_RULE_SETUP
+#line 295 "program/program_lexer.l"
+{ return POSITION; }
+	YY_BREAK
+case 106:
+YY_RULE_SETUP
+#line 296 "program/program_lexer.l"
+{ return PRIMARY; }
+	YY_BREAK
+case 107:
+YY_RULE_SETUP
+#line 297 "program/program_lexer.l"
+{ return PROJECTION; }
+	YY_BREAK
+case 108:
+YY_RULE_SETUP
+#line 298 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_fp, RANGE); }
+	YY_BREAK
+case 109:
+YY_RULE_SETUP
+#line 299 "program/program_lexer.l"
+{ return ROW; }
+	YY_BREAK
+case 110:
+YY_RULE_SETUP
+#line 300 "program/program_lexer.l"
+{ return SCENECOLOR; }
+	YY_BREAK
+case 111:
+YY_RULE_SETUP
+#line 301 "program/program_lexer.l"
+{ return SECONDARY; }
+	YY_BREAK
+case 112:
+YY_RULE_SETUP
+#line 302 "program/program_lexer.l"
+{ return SHININESS; }
+	YY_BREAK
+case 113:
+YY_RULE_SETUP
+#line 303 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_vp, SIZE_TOK); }
+	YY_BREAK
+case 114:
+YY_RULE_SETUP
+#line 304 "program/program_lexer.l"
+{ return SPECULAR; }
+	YY_BREAK
+case 115:
+YY_RULE_SETUP
+#line 305 "program/program_lexer.l"
+{ return SPOT; }
+	YY_BREAK
+case 116:
+YY_RULE_SETUP
+#line 306 "program/program_lexer.l"
+{ return TEXCOORD; }
+	YY_BREAK
+case 117:
+YY_RULE_SETUP
+#line 307 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_fp, TEXENV); }
+	YY_BREAK
+case 118:
+YY_RULE_SETUP
+#line 308 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_vp, TEXGEN); }
+	YY_BREAK
+case 119:
+YY_RULE_SETUP
+#line 309 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_vp, TEXGEN_Q); }
+	YY_BREAK
+case 120:
+YY_RULE_SETUP
+#line 310 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_vp, TEXGEN_S); }
+	YY_BREAK
+case 121:
+YY_RULE_SETUP
+#line 311 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_vp, TEXGEN_T); }
+	YY_BREAK
+case 122:
+YY_RULE_SETUP
+#line 312 "program/program_lexer.l"
+{ return TEXTURE; }
+	YY_BREAK
+case 123:
+YY_RULE_SETUP
+#line 313 "program/program_lexer.l"
+{ return TRANSPOSE; }
+	YY_BREAK
+case 124:
+YY_RULE_SETUP
+#line 314 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_vp, VTXATTRIB); }
+	YY_BREAK
+case 125:
+YY_RULE_SETUP
+#line 315 "program/program_lexer.l"
+{ return_token_or_DOT(require_ARB_vp, WEIGHT); }
+	YY_BREAK
+case 126:
+YY_RULE_SETUP
+#line 317 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp, TEXTURE_UNIT); }
+	YY_BREAK
+case 127:
+YY_RULE_SETUP
+#line 318 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp, TEX_1D); }
+	YY_BREAK
+case 128:
+YY_RULE_SETUP
+#line 319 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp, TEX_2D); }
+	YY_BREAK
+case 129:
+YY_RULE_SETUP
+#line 320 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp, TEX_3D); }
+	YY_BREAK
+case 130:
+YY_RULE_SETUP
+#line 321 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp, TEX_CUBE); }
+	YY_BREAK
+case 131:
+YY_RULE_SETUP
+#line 322 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp && require_rect, TEX_RECT); }
+	YY_BREAK
+case 132:
+YY_RULE_SETUP
+#line 323 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp && require_shadow, TEX_SHADOW1D); }
+	YY_BREAK
+case 133:
+YY_RULE_SETUP
+#line 324 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp && require_shadow, TEX_SHADOW2D); }
+	YY_BREAK
+case 134:
+YY_RULE_SETUP
+#line 325 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp && require_shadow && require_rect, TEX_SHADOWRECT); }
+	YY_BREAK
+case 135:
+YY_RULE_SETUP
+#line 326 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp && require_texarray, TEX_ARRAY1D); }
+	YY_BREAK
+case 136:
+YY_RULE_SETUP
+#line 327 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp && require_texarray, TEX_ARRAY2D); }
+	YY_BREAK
+case 137:
+YY_RULE_SETUP
+#line 328 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp && require_shadow && require_texarray, TEX_ARRAYSHADOW1D); }
+	YY_BREAK
+case 138:
+YY_RULE_SETUP
+#line 329 "program/program_lexer.l"
+{ return_token_or_IDENTIFIER(require_ARB_fp && require_shadow && require_texarray, TEX_ARRAYSHADOW2D); }
+	YY_BREAK
+case 139:
+YY_RULE_SETUP
+#line 331 "program/program_lexer.l"
+{ return handle_ident(yyextra, yytext, yylval); }
+	YY_BREAK
+case 140:
+YY_RULE_SETUP
+#line 333 "program/program_lexer.l"
+{ return DOT_DOT; }
+	YY_BREAK
+case 141:
+YY_RULE_SETUP
+#line 335 "program/program_lexer.l"
+{
+   yylval->integer = strtol(yytext, NULL, 10);
+   return INTEGER;
+}
+	YY_BREAK
+case 142:
+YY_RULE_SETUP
+#line 339 "program/program_lexer.l"
+{
+   yylval->real = _mesa_strtof(yytext, NULL);
+   return REAL;
+}
+	YY_BREAK
+case 143:
+/* rule 143 can match eol */
+*yy_cp = yyg->yy_hold_char; /* undo effects of setting up yytext */
+yyg->yy_c_buf_p = yy_cp -= 1;
+YY_DO_BEFORE_ACTION; /* set up yytext again */
+YY_RULE_SETUP
+#line 343 "program/program_lexer.l"
+{
+   yylval->real = _mesa_strtof(yytext, NULL);
+   return REAL;
+}
+	YY_BREAK
+case 144:
+YY_RULE_SETUP
+#line 347 "program/program_lexer.l"
+{
+   yylval->real = _mesa_strtof(yytext, NULL);
+   return REAL;
+}
+	YY_BREAK
+case 145:
+YY_RULE_SETUP
+#line 351 "program/program_lexer.l"
+{
+   yylval->real = _mesa_strtof(yytext, NULL);
+   return REAL;
+}
+	YY_BREAK
+case 146:
+YY_RULE_SETUP
+#line 356 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_NOOP;
+   yylval->swiz_mask.mask = WRITEMASK_XYZW;
+   return MASK4;
+}
+	YY_BREAK
+case 147:
+YY_RULE_SETUP
+#line 362 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_XY
+      | mask_from_char(yytext[3]);
+   return MASK3;
+}
+	YY_BREAK
+case 148:
+YY_RULE_SETUP
+#line 368 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_XZW;
+   return MASK3;
+}
+	YY_BREAK
+case 149:
+YY_RULE_SETUP
+#line 373 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_YZW;
+   return MASK3;
+}
+	YY_BREAK
+case 150:
+YY_RULE_SETUP
+#line 379 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_X
+      | mask_from_char(yytext[2]);
+   return MASK2;
+}
+	YY_BREAK
+case 151:
+YY_RULE_SETUP
+#line 385 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_Y
+      | mask_from_char(yytext[2]);
+   return MASK2;
+}
+	YY_BREAK
+case 152:
+YY_RULE_SETUP
+#line 391 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_ZW;
+   return MASK2;
+}
+	YY_BREAK
+case 153:
+YY_RULE_SETUP
+#line 397 "program/program_lexer.l"
+{
+   const unsigned s = swiz_from_char(yytext[1]);
+   yylval->swiz_mask.swizzle = MAKE_SWIZZLE4(s, s, s, s);
+   yylval->swiz_mask.mask = mask_from_char(yytext[1]);
+   return MASK1; 
+}
+	YY_BREAK
+case 154:
+YY_RULE_SETUP
+#line 404 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = MAKE_SWIZZLE4(swiz_from_char(yytext[1]),
+					    swiz_from_char(yytext[2]),
+					    swiz_from_char(yytext[3]),
+					    swiz_from_char(yytext[4]));
+   yylval->swiz_mask.mask = 0;
+   return SWIZZLE;
+}
+	YY_BREAK
+case 155:
+YY_RULE_SETUP
+#line 413 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_NOOP;
+   yylval->swiz_mask.mask = WRITEMASK_XYZW;
+   return_token_or_DOT(require_ARB_fp, MASK4);
+}
+	YY_BREAK
+case 156:
+YY_RULE_SETUP
+#line 419 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_XY
+      | mask_from_char(yytext[3]);
+   return_token_or_DOT(require_ARB_fp, MASK3);
+}
+	YY_BREAK
+case 157:
+YY_RULE_SETUP
+#line 425 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_XZW;
+   return_token_or_DOT(require_ARB_fp, MASK3);
+}
+	YY_BREAK
+case 158:
+YY_RULE_SETUP
+#line 430 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_YZW;
+   return_token_or_DOT(require_ARB_fp, MASK3);
+}
+	YY_BREAK
+case 159:
+YY_RULE_SETUP
+#line 436 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_X
+      | mask_from_char(yytext[2]);
+   return_token_or_DOT(require_ARB_fp, MASK2);
+}
+	YY_BREAK
+case 160:
+YY_RULE_SETUP
+#line 442 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_Y
+      | mask_from_char(yytext[2]);
+   return_token_or_DOT(require_ARB_fp, MASK2);
+}
+	YY_BREAK
+case 161:
+YY_RULE_SETUP
+#line 448 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_ZW;
+   return_token_or_DOT(require_ARB_fp, MASK2);
+}
+	YY_BREAK
+case 162:
+YY_RULE_SETUP
+#line 454 "program/program_lexer.l"
+{
+   const unsigned s = swiz_from_char(yytext[1]);
+   yylval->swiz_mask.swizzle = MAKE_SWIZZLE4(s, s, s, s);
+   yylval->swiz_mask.mask = mask_from_char(yytext[1]);
+   return_token_or_DOT(require_ARB_fp, MASK1);
+}
+	YY_BREAK
+case 163:
+YY_RULE_SETUP
+#line 462 "program/program_lexer.l"
+{
+   if (require_ARB_vp) {
+      return TEXGEN_R;
+   } else {
+      yylval->swiz_mask.swizzle = MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_X,
+						SWIZZLE_X, SWIZZLE_X);
+      yylval->swiz_mask.mask = WRITEMASK_X;
+      return MASK1;
+   }
+}
+	YY_BREAK
+case 164:
+YY_RULE_SETUP
+#line 473 "program/program_lexer.l"
+{
+   yylval->swiz_mask.swizzle = MAKE_SWIZZLE4(swiz_from_char(yytext[1]),
+					    swiz_from_char(yytext[2]),
+					    swiz_from_char(yytext[3]),
+					    swiz_from_char(yytext[4]));
+   yylval->swiz_mask.mask = 0;
+   return_token_or_DOT(require_ARB_fp, SWIZZLE);
+}
+	YY_BREAK
+case 165:
+YY_RULE_SETUP
+#line 482 "program/program_lexer.l"
+{ return DOT; }
+	YY_BREAK
+case 166:
+/* rule 166 can match eol */
+YY_RULE_SETUP
+#line 484 "program/program_lexer.l"
+{
+   yylloc->first_line++;
+   yylloc->first_column = 1;
+   yylloc->last_line++;
+   yylloc->last_column = 1;
+   yylloc->position++;
+}
+	YY_BREAK
+case 167:
+YY_RULE_SETUP
+#line 491 "program/program_lexer.l"
+/* eat whitespace */ ;
+	YY_BREAK
+case 168:
+*yy_cp = yyg->yy_hold_char; /* undo effects of setting up yytext */
+yyg->yy_c_buf_p = yy_cp -= 1;
+YY_DO_BEFORE_ACTION; /* set up yytext again */
+YY_RULE_SETUP
+#line 492 "program/program_lexer.l"
+/* eat comments */ ;
+	YY_BREAK
+case 169:
+YY_RULE_SETUP
+#line 493 "program/program_lexer.l"
+{ return yytext[0]; }
+	YY_BREAK
+case 170:
+YY_RULE_SETUP
+#line 494 "program/program_lexer.l"
+ECHO;
+	YY_BREAK
+#line 2491 "program/lex.yy.c"
+case YY_STATE_EOF(INITIAL):
+	yyterminate();
+
+	case YY_END_OF_BUFFER:
+		{
+		/* Amount of text matched not including the EOB char. */
+		int yy_amount_of_matched_text = (int) (yy_cp - yyg->yytext_ptr) - 1;
+
+		/* Undo the effects of YY_DO_BEFORE_ACTION. */
+		*yy_cp = yyg->yy_hold_char;
+		YY_RESTORE_YY_MORE_OFFSET
+
+		if ( YY_CURRENT_BUFFER_LVALUE->yy_buffer_status == YY_BUFFER_NEW )
+			{
+			/* We're scanning a new file or input source.  It's
+			 * possible that this happened because the user
+			 * just pointed yyin at a new source and called
+			 * _mesa_program_lexer_lex().  If so, then we have to assure
+			 * consistency between YY_CURRENT_BUFFER and our
+			 * globals.  Here is the right place to do so, because
+			 * this is the first action (other than possibly a
+			 * back-up) that will match for the new input source.
+			 */
+			yyg->yy_n_chars = YY_CURRENT_BUFFER_LVALUE->yy_n_chars;
+			YY_CURRENT_BUFFER_LVALUE->yy_input_file = yyin;
+			YY_CURRENT_BUFFER_LVALUE->yy_buffer_status = YY_BUFFER_NORMAL;
+			}
+
+		/* Note that here we test for yy_c_buf_p "<=" to the position
+		 * of the first EOB in the buffer, since yy_c_buf_p will
+		 * already have been incremented past the NUL character
+		 * (since all states make transitions on EOB to the
+		 * end-of-buffer state).  Contrast this with the test
+		 * in input().
+		 */
+		if ( yyg->yy_c_buf_p <= &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[yyg->yy_n_chars] )
+			{ /* This was really a NUL. */
+			yy_state_type yy_next_state;
+
+			yyg->yy_c_buf_p = yyg->yytext_ptr + yy_amount_of_matched_text;
+
+			yy_current_state = yy_get_previous_state( yyscanner );
+
+			/* Okay, we're now positioned to make the NUL
+			 * transition.  We couldn't have
+			 * yy_get_previous_state() go ahead and do it
+			 * for us because it doesn't know how to deal
+			 * with the possibility of jamming (and we don't
+			 * want to build jamming into it because then it
+			 * will run more slowly).
+			 */
+
+			yy_next_state = yy_try_NUL_trans( yy_current_state , yyscanner);
+
+			yy_bp = yyg->yytext_ptr + YY_MORE_ADJ;
+
+			if ( yy_next_state )
+				{
+				/* Consume the NUL. */
+				yy_cp = ++yyg->yy_c_buf_p;
+				yy_current_state = yy_next_state;
+				goto yy_match;
+				}
+
+			else
+				{
+				yy_cp = yyg->yy_c_buf_p;
+				goto yy_find_action;
+				}
+			}
+
+		else switch ( yy_get_next_buffer( yyscanner ) )
+			{
+			case EOB_ACT_END_OF_FILE:
+				{
+				yyg->yy_did_buffer_switch_on_eof = 0;
+
+				if ( _mesa_program_lexer_wrap(yyscanner ) )
+					{
+					/* Note: because we've taken care in
+					 * yy_get_next_buffer() to have set up
+					 * yytext, we can now set up
+					 * yy_c_buf_p so that if some total
+					 * hoser (like flex itself) wants to
+					 * call the scanner after we return the
+					 * YY_NULL, it'll still work - another
+					 * YY_NULL will get returned.
+					 */
+					yyg->yy_c_buf_p = yyg->yytext_ptr + YY_MORE_ADJ;
+
+					yy_act = YY_STATE_EOF(YY_START);
+					goto do_action;
+					}
+
+				else
+					{
+					if ( ! yyg->yy_did_buffer_switch_on_eof )
+						YY_NEW_FILE;
+					}
+				break;
+				}
+
+			case EOB_ACT_CONTINUE_SCAN:
+				yyg->yy_c_buf_p =
+					yyg->yytext_ptr + yy_amount_of_matched_text;
+
+				yy_current_state = yy_get_previous_state( yyscanner );
+
+				yy_cp = yyg->yy_c_buf_p;
+				yy_bp = yyg->yytext_ptr + YY_MORE_ADJ;
+				goto yy_match;
+
+			case EOB_ACT_LAST_MATCH:
+				yyg->yy_c_buf_p =
+				&YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[yyg->yy_n_chars];
+
+				yy_current_state = yy_get_previous_state( yyscanner );
+
+				yy_cp = yyg->yy_c_buf_p;
+				yy_bp = yyg->yytext_ptr + YY_MORE_ADJ;
+				goto yy_find_action;
+			}
+		break;
+		}
+
+	default:
+		YY_FATAL_ERROR(
+			"fatal flex scanner internal error--no action found" );
+	} /* end of action switch */
+		} /* end of scanning one token */
+} /* end of _mesa_program_lexer_lex */
+
+/* yy_get_next_buffer - try to read in a new buffer
+ *
+ * Returns a code representing an action:
+ *	EOB_ACT_LAST_MATCH - matched text remains before the EOB; process it first
+ *	EOB_ACT_CONTINUE_SCAN - continue scanning from current position
+ *	EOB_ACT_END_OF_FILE - end of file
+ */
+static int yy_get_next_buffer (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+	register char *dest = YY_CURRENT_BUFFER_LVALUE->yy_ch_buf;
+	register char *source = yyg->yytext_ptr;
+	register int number_to_move, i;
+	int ret_val;
+
+	if ( yyg->yy_c_buf_p > &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[yyg->yy_n_chars + 1] )
+		YY_FATAL_ERROR(
+		"fatal flex scanner internal error--end of buffer missed" );
+
+	if ( YY_CURRENT_BUFFER_LVALUE->yy_fill_buffer == 0 )
+		{ /* Don't try to fill the buffer, so this is an EOF. */
+		if ( yyg->yy_c_buf_p - yyg->yytext_ptr - YY_MORE_ADJ == 1 )
+			{
+			/* We matched a single character, the EOB, so
+			 * treat this as a final EOF.
+			 */
+			return EOB_ACT_END_OF_FILE;
+			}
+
+		else
+			{
+			/* We matched some text prior to the EOB, first
+			 * process it.
+			 */
+			return EOB_ACT_LAST_MATCH;
+			}
+		}
+
+	/* Try to read more data. */
+
+	/* First move last chars to start of buffer. */
+	number_to_move = (int) (yyg->yy_c_buf_p - yyg->yytext_ptr) - 1;
+
+	for ( i = 0; i < number_to_move; ++i )
+		*(dest++) = *(source++);
+
+	if ( YY_CURRENT_BUFFER_LVALUE->yy_buffer_status == YY_BUFFER_EOF_PENDING )
+		/* don't do the read, it's not guaranteed to return an EOF,
+		 * just force an EOF
+		 */
+		YY_CURRENT_BUFFER_LVALUE->yy_n_chars = yyg->yy_n_chars = 0;
+
+	else
+		{
+			int num_to_read =
+			YY_CURRENT_BUFFER_LVALUE->yy_buf_size - number_to_move - 1;
+
+		while ( num_to_read <= 0 )
+			{ /* Not enough room in the buffer - grow it. */
+
+			/* just a shorter name for the current buffer */
+			YY_BUFFER_STATE b = YY_CURRENT_BUFFER;
+
+			int yy_c_buf_p_offset =
+				(int) (yyg->yy_c_buf_p - b->yy_ch_buf);
+
+			if ( b->yy_is_our_buffer )
+				{
+				int new_size = b->yy_buf_size * 2;
+
+				if ( new_size <= 0 )
+					b->yy_buf_size += b->yy_buf_size / 8;
+				else
+					b->yy_buf_size *= 2;
+
+				b->yy_ch_buf = (char *)
+					/* Include room for 2 EOB chars. */
+					_mesa_program_lexer_realloc((void *) b->yy_ch_buf,b->yy_buf_size + 2 ,yyscanner );
+				}
+			else
+				/* Can't grow it, we don't own it. */
+				b->yy_ch_buf = 0;
+
+			if ( ! b->yy_ch_buf )
+				YY_FATAL_ERROR(
+				"fatal error - scanner input buffer overflow" );
+
+			yyg->yy_c_buf_p = &b->yy_ch_buf[yy_c_buf_p_offset];
+
+			num_to_read = YY_CURRENT_BUFFER_LVALUE->yy_buf_size -
+						number_to_move - 1;
+
+			}
+
+		if ( num_to_read > YY_READ_BUF_SIZE )
+			num_to_read = YY_READ_BUF_SIZE;
+
+		/* Read in more data. */
+		YY_INPUT( (&YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[number_to_move]),
+			yyg->yy_n_chars, (size_t) num_to_read );
+
+		YY_CURRENT_BUFFER_LVALUE->yy_n_chars = yyg->yy_n_chars;
+		}
+
+	if ( yyg->yy_n_chars == 0 )
+		{
+		if ( number_to_move == YY_MORE_ADJ )
+			{
+			ret_val = EOB_ACT_END_OF_FILE;
+			_mesa_program_lexer_restart(yyin  ,yyscanner);
+			}
+
+		else
+			{
+			ret_val = EOB_ACT_LAST_MATCH;
+			YY_CURRENT_BUFFER_LVALUE->yy_buffer_status =
+				YY_BUFFER_EOF_PENDING;
+			}
+		}
+
+	else
+		ret_val = EOB_ACT_CONTINUE_SCAN;
+
+	if ((yy_size_t) (yyg->yy_n_chars + number_to_move) > YY_CURRENT_BUFFER_LVALUE->yy_buf_size) {
+		/* Extend the array by 50%, plus the number we really need. */
+		yy_size_t new_size = yyg->yy_n_chars + number_to_move + (yyg->yy_n_chars >> 1);
+		YY_CURRENT_BUFFER_LVALUE->yy_ch_buf = (char *) _mesa_program_lexer_realloc((void *) YY_CURRENT_BUFFER_LVALUE->yy_ch_buf,new_size ,yyscanner );
+		if ( ! YY_CURRENT_BUFFER_LVALUE->yy_ch_buf )
+			YY_FATAL_ERROR( "out of dynamic memory in yy_get_next_buffer()" );
+	}
+
+	yyg->yy_n_chars += number_to_move;
+	YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[yyg->yy_n_chars] = YY_END_OF_BUFFER_CHAR;
+	YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[yyg->yy_n_chars + 1] = YY_END_OF_BUFFER_CHAR;
+
+	yyg->yytext_ptr = &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[0];
+
+	return ret_val;
+}
+
+/* yy_get_previous_state - get the state just before the EOB char was reached */
+
+    static yy_state_type yy_get_previous_state (yyscan_t yyscanner)
+{
+	register yy_state_type yy_current_state;
+	register char *yy_cp;
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+
+	yy_current_state = yyg->yy_start;
+
+	for ( yy_cp = yyg->yytext_ptr + YY_MORE_ADJ; yy_cp < yyg->yy_c_buf_p; ++yy_cp )
+		{
+		register YY_CHAR yy_c = (*yy_cp ? yy_ec[YY_SC_TO_UI(*yy_cp)] : 1);
+		if ( yy_accept[yy_current_state] )
+			{
+			yyg->yy_last_accepting_state = yy_current_state;
+			yyg->yy_last_accepting_cpos = yy_cp;
+			}
+		while ( yy_chk[yy_base[yy_current_state] + yy_c] != yy_current_state )
+			{
+			yy_current_state = (int) yy_def[yy_current_state];
+			if ( yy_current_state >= 850 )
+				yy_c = yy_meta[(unsigned int) yy_c];
+			}
+		yy_current_state = yy_nxt[yy_base[yy_current_state] + (unsigned int) yy_c];
+		}
+
+	return yy_current_state;
+}
+
+/* yy_try_NUL_trans - try to make a transition on the NUL character
+ *
+ * synopsis
+ *	next_state = yy_try_NUL_trans( current_state );
+ */
+    static yy_state_type yy_try_NUL_trans  (yy_state_type yy_current_state , yyscan_t yyscanner)
+{
+	register int yy_is_jam;
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner; /* This var may be unused depending upon options. */
+	register char *yy_cp = yyg->yy_c_buf_p;
+
+	register YY_CHAR yy_c = 1;
+	if ( yy_accept[yy_current_state] )
+		{
+		yyg->yy_last_accepting_state = yy_current_state;
+		yyg->yy_last_accepting_cpos = yy_cp;
+		}
+	while ( yy_chk[yy_base[yy_current_state] + yy_c] != yy_current_state )
+		{
+		yy_current_state = (int) yy_def[yy_current_state];
+		if ( yy_current_state >= 850 )
+			yy_c = yy_meta[(unsigned int) yy_c];
+		}
+	yy_current_state = yy_nxt[yy_base[yy_current_state] + (unsigned int) yy_c];
+	yy_is_jam = (yy_current_state == 849);
+
+	return yy_is_jam ? 0 : yy_current_state;
+}
+
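+/* Push the character c back onto the input stream just before yy_bp,
+ * shifting the buffer contents toward the end if there is no room at the
+ * front.
+ */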
+    static void yyunput (int c, register char * yy_bp , yyscan_t yyscanner)
+{
+	register char *yy_cp;
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+
+    yy_cp = yyg->yy_c_buf_p;
+
+	/* undo effects of setting up yytext */
+	*yy_cp = yyg->yy_hold_char;
+
+	if ( yy_cp < YY_CURRENT_BUFFER_LVALUE->yy_ch_buf + 2 )
+		{ /* need to shift things up to make room */
+		/* +2 for EOB chars. */
+		register int number_to_move = yyg->yy_n_chars + 2;
+		register char *dest = &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[
+					YY_CURRENT_BUFFER_LVALUE->yy_buf_size + 2];
+		register char *source =
+				&YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[number_to_move];
+
+		while ( source > YY_CURRENT_BUFFER_LVALUE->yy_ch_buf )
+			*--dest = *--source;
+
+		yy_cp += (int) (dest - source);
+		yy_bp += (int) (dest - source);
+		YY_CURRENT_BUFFER_LVALUE->yy_n_chars =
+			yyg->yy_n_chars = YY_CURRENT_BUFFER_LVALUE->yy_buf_size;
+
+		if ( yy_cp < YY_CURRENT_BUFFER_LVALUE->yy_ch_buf + 2 )
+			YY_FATAL_ERROR( "flex scanner push-back overflow" );
+		}
+
+	*--yy_cp = (char) c;
+
+	yyg->yytext_ptr = yy_bp;
+	yyg->yy_hold_char = *yy_cp;
+	yyg->yy_c_buf_p = yy_cp;
+}
+
+#ifndef YY_NO_INPUT
+#ifdef __cplusplus
+    static int yyinput (yyscan_t yyscanner)
+#else
+    static int input  (yyscan_t yyscanner)
+#endif
+
+{
+	int c;
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+
+	*yyg->yy_c_buf_p = yyg->yy_hold_char;
+
+	if ( *yyg->yy_c_buf_p == YY_END_OF_BUFFER_CHAR )
+		{
+		/* yy_c_buf_p now points to the character we want to return.
+		 * If this occurs *before* the EOB characters, then it's a
+		 * valid NUL; if not, then we've hit the end of the buffer.
+		 */
+		if ( yyg->yy_c_buf_p < &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[yyg->yy_n_chars] )
+			/* This was really a NUL. */
+			*yyg->yy_c_buf_p = '\0';
+
+		else
+			{ /* need more input */
+			int offset = yyg->yy_c_buf_p - yyg->yytext_ptr;
+			++yyg->yy_c_buf_p;
+
+			switch ( yy_get_next_buffer( yyscanner ) )
+				{
+				case EOB_ACT_LAST_MATCH:
+					/* This happens because yy_get_next_buffer()
+					 * sees that we've accumulated a
+					 * token and flags that we need to
+					 * try matching the token before
+					 * proceeding.  But for input(),
+					 * there's no matching to consider.
+					 * So convert the EOB_ACT_LAST_MATCH
+					 * to EOB_ACT_END_OF_FILE.
+					 */
+
+					/* Reset buffer status. */
+					_mesa_program_lexer_restart(yyin ,yyscanner);
+
+					/*FALLTHROUGH*/
+
+				case EOB_ACT_END_OF_FILE:
+					{
+					if ( _mesa_program_lexer_wrap(yyscanner ) )
+						return EOF;
+
+					if ( ! yyg->yy_did_buffer_switch_on_eof )
+						YY_NEW_FILE;
+#ifdef __cplusplus
+					return yyinput(yyscanner);
+#else
+					return input(yyscanner);
+#endif
+					}
+
+				case EOB_ACT_CONTINUE_SCAN:
+					yyg->yy_c_buf_p = yyg->yytext_ptr + offset;
+					break;
+				}
+			}
+		}
+
+	c = *(unsigned char *) yyg->yy_c_buf_p;	/* cast for 8-bit char's */
+	*yyg->yy_c_buf_p = '\0';	/* preserve yytext */
+	yyg->yy_hold_char = *++yyg->yy_c_buf_p;
+
+	return c;
+}
+#endif	/* ifndef YY_NO_INPUT */
+
+/** Immediately switch to a different input stream.
+ * @param input_file A readable stream.
+ * @param yyscanner The scanner object.
+ * @note This function does not reset the start condition to @c INITIAL .
+ */
+    void _mesa_program_lexer_restart  (FILE * input_file , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+
+	if ( ! YY_CURRENT_BUFFER ){
+        _mesa_program_lexer_ensure_buffer_stack (yyscanner);
+		YY_CURRENT_BUFFER_LVALUE =
+            _mesa_program_lexer__create_buffer(yyin,YY_BUF_SIZE ,yyscanner);
+	}
+
+	_mesa_program_lexer__init_buffer(YY_CURRENT_BUFFER,input_file ,yyscanner);
+	_mesa_program_lexer__load_buffer_state(yyscanner );
+}
+
+/** Switch to a different input buffer.
+ * @param new_buffer The new input buffer.
+ * @param yyscanner The scanner object.
+ */
+    void _mesa_program_lexer__switch_to_buffer  (YY_BUFFER_STATE  new_buffer , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+
+	/* TODO. We should be able to replace this entire function body
+	 * with
+	 *		_mesa_program_lexer_pop_buffer_state();
+	 *		_mesa_program_lexer_push_buffer_state(new_buffer);
+     */
+	_mesa_program_lexer_ensure_buffer_stack (yyscanner);
+	if ( YY_CURRENT_BUFFER == new_buffer )
+		return;
+
+	if ( YY_CURRENT_BUFFER )
+		{
+		/* Flush out information for old buffer. */
+		*yyg->yy_c_buf_p = yyg->yy_hold_char;
+		YY_CURRENT_BUFFER_LVALUE->yy_buf_pos = yyg->yy_c_buf_p;
+		YY_CURRENT_BUFFER_LVALUE->yy_n_chars = yyg->yy_n_chars;
+		}
+
+	YY_CURRENT_BUFFER_LVALUE = new_buffer;
+	_mesa_program_lexer__load_buffer_state(yyscanner );
+
+	/* We don't actually know whether we did this switch during
+	 * EOF (_mesa_program_lexer_wrap()) processing, but the only time this flag
+	 * is looked at is after _mesa_program_lexer_wrap() is called, so it's safe
+	 * to go ahead and always set it.
+	 */
+	yyg->yy_did_buffer_switch_on_eof = 1;
+}
+
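+/* Copy the current buffer's bookkeeping (character count, scan position,
+ * input file) into the scanner's per-instance globals.
+ */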
+static void _mesa_program_lexer__load_buffer_state  (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+	yyg->yy_n_chars = YY_CURRENT_BUFFER_LVALUE->yy_n_chars;
+	yyg->yytext_ptr = yyg->yy_c_buf_p = YY_CURRENT_BUFFER_LVALUE->yy_buf_pos;
+	yyin = YY_CURRENT_BUFFER_LVALUE->yy_input_file;
+	yyg->yy_hold_char = *yyg->yy_c_buf_p;
+}
+
+/** Allocate and initialize an input buffer state.
+ * @param file A readable stream.
+ * @param size The character buffer size in bytes. When in doubt, use @c YY_BUF_SIZE.
+ * @param yyscanner The scanner object.
+ * @return the allocated buffer state.
+ */
+    YY_BUFFER_STATE _mesa_program_lexer__create_buffer  (FILE * file, int  size , yyscan_t yyscanner)
+{
+	YY_BUFFER_STATE b;
+    
+	b = (YY_BUFFER_STATE) _mesa_program_lexer_alloc(sizeof( struct yy_buffer_state ) ,yyscanner );
+	if ( ! b )
+		YY_FATAL_ERROR( "out of dynamic memory in _mesa_program_lexer__create_buffer()" );
+
+	b->yy_buf_size = size;
+
+	/* yy_ch_buf has to be 2 characters longer than the size given because
+	 * we need to put in 2 end-of-buffer characters.
+	 */
+	b->yy_ch_buf = (char *) _mesa_program_lexer_alloc(b->yy_buf_size + 2 ,yyscanner );
+	if ( ! b->yy_ch_buf )
+		YY_FATAL_ERROR( "out of dynamic memory in _mesa_program_lexer__create_buffer()" );
+
+	b->yy_is_our_buffer = 1;
+
+	_mesa_program_lexer__init_buffer(b,file ,yyscanner);
+
+	return b;
+}
+
+/** Destroy the buffer.
+ * @param b a buffer created with _mesa_program_lexer__create_buffer()
+ * @param yyscanner The scanner object.
+ */
+    void _mesa_program_lexer__delete_buffer (YY_BUFFER_STATE  b , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+
+	if ( ! b )
+		return;
+
+	if ( b == YY_CURRENT_BUFFER ) /* Not sure if we should pop here. */
+		YY_CURRENT_BUFFER_LVALUE = (YY_BUFFER_STATE) 0;
+
+	if ( b->yy_is_our_buffer )
+		_mesa_program_lexer_free((void *) b->yy_ch_buf ,yyscanner );
+
+	_mesa_program_lexer_free((void *) b ,yyscanner );
+}
+
+/* Initializes or reinitializes a buffer.
+ * This function is sometimes called more than once on the same buffer,
+ * such as during a _mesa_program_lexer_restart() or at EOF.
+ */
+    static void _mesa_program_lexer__init_buffer  (YY_BUFFER_STATE  b, FILE * file , yyscan_t yyscanner)
+
+{
+	int oerrno = errno;
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+
+	_mesa_program_lexer__flush_buffer(b ,yyscanner);
+
+	b->yy_input_file = file;
+	b->yy_fill_buffer = 1;
+
+    /* If b is the current buffer, then _mesa_program_lexer__init_buffer was _probably_
+     * called from _mesa_program_lexer_restart() or through yy_get_next_buffer.
+     * In that case, we don't want to reset the lineno or column.
+     */
+    if (b != YY_CURRENT_BUFFER){
+        b->yy_bs_lineno = 1;
+        b->yy_bs_column = 0;
+    }
+
+        b->yy_is_interactive = 0;
+    
+	errno = oerrno;
+}
+
+/** Discard all buffered characters. On the next scan, YY_INPUT will be called.
+ * @param b the buffer state to be flushed, usually @c YY_CURRENT_BUFFER.
+ * @param yyscanner The scanner object.
+ */
+    void _mesa_program_lexer__flush_buffer (YY_BUFFER_STATE  b , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+	if ( ! b )
+		return;
+
+	b->yy_n_chars = 0;
+
+	/* We always need two end-of-buffer characters.  The first causes
+	 * a transition to the end-of-buffer state.  The second causes
+	 * a jam in that state.
+	 */
+	b->yy_ch_buf[0] = YY_END_OF_BUFFER_CHAR;
+	b->yy_ch_buf[1] = YY_END_OF_BUFFER_CHAR;
+
+	b->yy_buf_pos = &b->yy_ch_buf[0];
+
+	b->yy_at_bol = 1;
+	b->yy_buffer_status = YY_BUFFER_NEW;
+
+	if ( b == YY_CURRENT_BUFFER )
+		_mesa_program_lexer__load_buffer_state(yyscanner );
+}
+
+/** Pushes the new state onto the stack. The new state becomes
+ *  the current state. This function will allocate the stack
+ *  if necessary.
+ *  @param new_buffer The new state.
+ *  @param yyscanner The scanner object.
+ */
+void _mesa_program_lexer_push_buffer_state (YY_BUFFER_STATE new_buffer , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+	if (new_buffer == NULL)
+		return;
+
+	_mesa_program_lexer_ensure_buffer_stack(yyscanner);
+
+	/* This block is copied from _mesa_program_lexer__switch_to_buffer. */
+	if ( YY_CURRENT_BUFFER )
+		{
+		/* Flush out information for old buffer. */
+		*yyg->yy_c_buf_p = yyg->yy_hold_char;
+		YY_CURRENT_BUFFER_LVALUE->yy_buf_pos = yyg->yy_c_buf_p;
+		YY_CURRENT_BUFFER_LVALUE->yy_n_chars = yyg->yy_n_chars;
+		}
+
+	/* Only push if top exists. Otherwise, replace top. */
+	if (YY_CURRENT_BUFFER)
+		yyg->yy_buffer_stack_top++;
+	YY_CURRENT_BUFFER_LVALUE = new_buffer;
+
+	/* copied from _mesa_program_lexer__switch_to_buffer. */
+	_mesa_program_lexer__load_buffer_state(yyscanner );
+	yyg->yy_did_buffer_switch_on_eof = 1;
+}
+
+/** Removes and deletes the top of the stack, if present.
+ *  The next element becomes the new top.
+ *  @param yyscanner The scanner object.
+ */
+void _mesa_program_lexer_pop_buffer_state (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+	if (!YY_CURRENT_BUFFER)
+		return;
+
+	_mesa_program_lexer__delete_buffer(YY_CURRENT_BUFFER ,yyscanner);
+	YY_CURRENT_BUFFER_LVALUE = NULL;
+	if (yyg->yy_buffer_stack_top > 0)
+		--yyg->yy_buffer_stack_top;
+
+	if (YY_CURRENT_BUFFER) {
+		_mesa_program_lexer__load_buffer_state(yyscanner );
+		yyg->yy_did_buffer_switch_on_eof = 1;
+	}
+}
+
+/* Allocates the stack if it does not exist.
+ *  Guarantees space for at least one push.
+ */
+static void _mesa_program_lexer_ensure_buffer_stack (yyscan_t yyscanner)
+{
+	int num_to_alloc;
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+
+	if (!yyg->yy_buffer_stack) {
+
+		/* The first allocation is just for 1 element, since we don't
+		 * know if this scanner will even need a stack; the stack is
+		 * grown on demand in the realloc path below.
+		 */
+		num_to_alloc = 1;
+		yyg->yy_buffer_stack = (struct yy_buffer_state**)_mesa_program_lexer_alloc
+								(num_to_alloc * sizeof(struct yy_buffer_state*)
+								, yyscanner);
+		if ( ! yyg->yy_buffer_stack )
+			YY_FATAL_ERROR( "out of dynamic memory in _mesa_program_lexer_ensure_buffer_stack()" );
+								  
+		memset(yyg->yy_buffer_stack, 0, num_to_alloc * sizeof(struct yy_buffer_state*));
+				
+		yyg->yy_buffer_stack_max = num_to_alloc;
+		yyg->yy_buffer_stack_top = 0;
+		return;
+	}
+
+	if (yyg->yy_buffer_stack_top >= (yyg->yy_buffer_stack_max) - 1){
+
+		/* Increase the buffer to prepare for a possible push. */
+		int grow_size = 8 /* arbitrary grow size */;
+
+		num_to_alloc = yyg->yy_buffer_stack_max + grow_size;
+		yyg->yy_buffer_stack = (struct yy_buffer_state**)_mesa_program_lexer_realloc
+								(yyg->yy_buffer_stack,
+								num_to_alloc * sizeof(struct yy_buffer_state*)
+								, yyscanner);
+		if ( ! yyg->yy_buffer_stack )
+			YY_FATAL_ERROR( "out of dynamic memory in _mesa_program_lexer_ensure_buffer_stack()" );
+
+		/* zero only the new slots.*/
+		memset(yyg->yy_buffer_stack + yyg->yy_buffer_stack_max, 0, grow_size * sizeof(struct yy_buffer_state*));
+		yyg->yy_buffer_stack_max = num_to_alloc;
+	}
+}
+
+/** Setup the input buffer state to scan directly from a user-specified character buffer.
+ * @param base the character buffer
+ * @param size the size in bytes of the character buffer
+ * @param yyscanner The scanner object.
+ * @return the newly allocated buffer state object. 
+ */
+YY_BUFFER_STATE _mesa_program_lexer__scan_buffer  (char * base, yy_size_t  size , yyscan_t yyscanner)
+{
+	YY_BUFFER_STATE b;
+    
+	if ( size < 2 ||
+	     base[size-2] != YY_END_OF_BUFFER_CHAR ||
+	     base[size-1] != YY_END_OF_BUFFER_CHAR )
+		/* They forgot to leave room for the EOB's. */
+		return 0;
+
+	b = (YY_BUFFER_STATE) _mesa_program_lexer_alloc(sizeof( struct yy_buffer_state ) ,yyscanner );
+	if ( ! b )
+		YY_FATAL_ERROR( "out of dynamic memory in _mesa_program_lexer__scan_buffer()" );
+
+	b->yy_buf_size = size - 2;	/* "- 2" to take care of EOB's */
+	b->yy_buf_pos = b->yy_ch_buf = base;
+	b->yy_is_our_buffer = 0;
+	b->yy_input_file = 0;
+	b->yy_n_chars = b->yy_buf_size;
+	b->yy_is_interactive = 0;
+	b->yy_at_bol = 1;
+	b->yy_fill_buffer = 0;
+	b->yy_buffer_status = YY_BUFFER_NEW;
+
+	_mesa_program_lexer__switch_to_buffer(b ,yyscanner );
+
+	return b;
+}
+
+/** Setup the input buffer state to scan a string. The next call to _mesa_program_lexer_lex() will
+ * scan from a @e copy of @a str.
+ * @param yystr a NUL-terminated string to scan
+ * @param yyscanner The scanner object.
+ * @return the newly allocated buffer state object.
+ * @note If you want to scan bytes that may contain NUL values, then use
+ *       _mesa_program_lexer__scan_bytes() instead.
+ */
+YY_BUFFER_STATE _mesa_program_lexer__scan_string (yyconst char * yystr , yyscan_t yyscanner)
+{
+    
+	return _mesa_program_lexer__scan_bytes(yystr,strlen(yystr) ,yyscanner);
+}
+
+/** Setup the input buffer state to scan the given bytes. The next call to _mesa_program_lexer_lex() will
+ * scan from a @e copy of @a bytes.
+ * @param yybytes the byte buffer to scan
+ * @param _yybytes_len the number of bytes in the buffer pointed to by @a bytes.
+ * @param yyscanner The scanner object.
+ * @return the newly allocated buffer state object.
+ */
+YY_BUFFER_STATE _mesa_program_lexer__scan_bytes  (yyconst char * yybytes, int  _yybytes_len , yyscan_t yyscanner)
+{
+	YY_BUFFER_STATE b;
+	char *buf;
+	yy_size_t n;
+	int i;
+    
+	/* Get memory for full buffer, including space for trailing EOB's. */
+	n = _yybytes_len + 2;
+	buf = (char *) _mesa_program_lexer_alloc(n ,yyscanner );
+	if ( ! buf )
+		YY_FATAL_ERROR( "out of dynamic memory in _mesa_program_lexer__scan_bytes()" );
+
+	for ( i = 0; i < _yybytes_len; ++i )
+		buf[i] = yybytes[i];
+
+	buf[_yybytes_len] = buf[_yybytes_len+1] = YY_END_OF_BUFFER_CHAR;
+
+	b = _mesa_program_lexer__scan_buffer(buf,n ,yyscanner);
+	if ( ! b )
+		YY_FATAL_ERROR( "bad buffer in _mesa_program_lexer__scan_bytes()" );
+
+	/* It's okay to grow etc. this buffer, and we should throw it
+	 * away when we're done.
+	 */
+	b->yy_is_our_buffer = 1;
+
+	return b;
+}
+
+#ifndef YY_EXIT_FAILURE
+#define YY_EXIT_FAILURE 2
+#endif
+
+static void yy_fatal_error (yyconst char* msg , yyscan_t yyscanner)
+{
+    	(void) fprintf( stderr, "%s\n", msg );
+	exit( YY_EXIT_FAILURE );
+}
+
+/* Redefine yyless() so it works in section 3 code. */
+
+#undef yyless
+#define yyless(n) \
+	do \
+		{ \
+		/* Undo effects of setting up yytext. */ \
+        int yyless_macro_arg = (n); \
+        YY_LESS_LINENO(yyless_macro_arg);\
+		yytext[yyleng] = yyg->yy_hold_char; \
+		yyg->yy_c_buf_p = yytext + yyless_macro_arg; \
+		yyg->yy_hold_char = *yyg->yy_c_buf_p; \
+		*yyg->yy_c_buf_p = '\0'; \
+		yyleng = yyless_macro_arg; \
+		} \
+	while ( 0 )
+
+/* Accessor methods (get/set functions) for struct members. */
+
+/** Get the user-defined data for this scanner.
+ * @param yyscanner The scanner object.
+ */
+YY_EXTRA_TYPE _mesa_program_lexer_get_extra  (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    return yyextra;
+}
+
+/** Get the current line number.
+ * @param yyscanner The scanner object.
+ */
+int _mesa_program_lexer_get_lineno  (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    
+        if (! YY_CURRENT_BUFFER)
+            return 0;
+    
+    return yylineno;
+}
+
+/** Get the current column number.
+ * @param yyscanner The scanner object.
+ */
+int _mesa_program_lexer_get_column  (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    
+        if (! YY_CURRENT_BUFFER)
+            return 0;
+    
+    return yycolumn;
+}
+
+/** Get the input stream.
+ * @param yyscanner The scanner object.
+ */
+FILE *_mesa_program_lexer_get_in  (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    return yyin;
+}
+
+/** Get the output stream.
+ * @param yyscanner The scanner object.
+ */
+FILE *_mesa_program_lexer_get_out  (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    return yyout;
+}
+
+/** Get the length of the current token.
+ * @param yyscanner The scanner object.
+ */
+int _mesa_program_lexer_get_leng  (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    return yyleng;
+}
+
+/** Get the current token.
+ * @param yyscanner The scanner object.
+ */
+
+char *_mesa_program_lexer_get_text  (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    return yytext;
+}
+
+/** Set the user-defined data. This data is never touched by the scanner.
+ * @param user_defined The data to be associated with this scanner.
+ * @param yyscanner The scanner object.
+ */
+void _mesa_program_lexer_set_extra (YY_EXTRA_TYPE  user_defined , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    yyextra = user_defined ;
+}
+
+/** Set the current line number.
+ * @param line_number The line number to set.
+ * @param yyscanner The scanner object.
+ */
+void _mesa_program_lexer_set_lineno (int  line_number , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+
+        /* lineno is only valid if an input buffer exists. */
+        if (! YY_CURRENT_BUFFER )
+           yy_fatal_error( "_mesa_program_lexer_set_lineno called with no buffer" , yyscanner); 
+    
+    yylineno = line_number;
+}
+
+/** Set the current column.
+ * @param column_no The column number to set.
+ * @param yyscanner The scanner object.
+ */
+void _mesa_program_lexer_set_column (int  column_no , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+
+        /* column is only valid if an input buffer exists. */
+        if (! YY_CURRENT_BUFFER )
+           yy_fatal_error( "_mesa_program_lexer_set_column called with no buffer" , yyscanner); 
+    
+    yycolumn = column_no;
+}
+
+/** Set the input stream. This does not discard the current
+ * input buffer.
+ * @param in_str A readable stream.
+ * @param yyscanner The scanner object.
+ * @see _mesa_program_lexer__switch_to_buffer
+ */
+void _mesa_program_lexer_set_in (FILE *  in_str , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    yyin = in_str ;
+}
+
+void _mesa_program_lexer_set_out (FILE *  out_str , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    yyout = out_str ;
+}
+
+int _mesa_program_lexer_get_debug  (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    return yy_flex_debug;
+}
+
+void _mesa_program_lexer_set_debug (int  bdebug , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    yy_flex_debug = bdebug ;
+}
+
+/* Accessor methods for yylval and yylloc */
+
+YYSTYPE * _mesa_program_lexer_get_lval  (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    return yylval;
+}
+
+void _mesa_program_lexer_set_lval (YYSTYPE *  yylval_param , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    yylval = yylval_param;
+}
+
+YYLTYPE *_mesa_program_lexer_get_lloc  (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    return yylloc;
+}
+    
+void _mesa_program_lexer_set_lloc (YYLTYPE *  yylloc_param , yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    yylloc = yylloc_param;
+}
+    
+/* User-visible API */
+
+/* _mesa_program_lexer_lex_init is special because it creates the scanner itself, so it is
+ * the ONLY reentrant function that doesn't take the scanner as the last argument.
+ * That's why we explicitly handle the declaration, instead of using our macros.
+ */
+
+int _mesa_program_lexer_lex_init(yyscan_t* ptr_yy_globals)
+
+{
+    if (ptr_yy_globals == NULL){
+        errno = EINVAL;
+        return 1;
+    }
+
+    *ptr_yy_globals = (yyscan_t) _mesa_program_lexer_alloc ( sizeof( struct yyguts_t ), NULL );
+
+    if (*ptr_yy_globals == NULL){
+        errno = ENOMEM;
+        return 1;
+    }
+
+    /* By setting to 0xAA, we expose bugs in yy_init_globals. Leave at 0x00 for releases. */
+    memset(*ptr_yy_globals,0x00,sizeof(struct yyguts_t));
+
+    return yy_init_globals ( *ptr_yy_globals );
+}
+
+/* _mesa_program_lexer_lex_init_extra has the same functionality as _mesa_program_lexer_lex_init, but follows the
+ * convention of taking the scanner as the last argument. Note however, that
+ * this is a *pointer* to a scanner, as it will be allocated by this call (and
+ * is the reason, too, why this function also must handle its own declaration).
+ * The user defined value in the first argument will be available to _mesa_program_lexer_alloc in
+ * the yyextra field.
+ */
+
+int _mesa_program_lexer_lex_init_extra(YY_EXTRA_TYPE yy_user_defined,yyscan_t* ptr_yy_globals )
+
+{
+    struct yyguts_t dummy_yyguts;
+
+    _mesa_program_lexer_set_extra (yy_user_defined, &dummy_yyguts);
+
+    if (ptr_yy_globals == NULL){
+        errno = EINVAL;
+        return 1;
+    }
+	
+    *ptr_yy_globals = (yyscan_t) _mesa_program_lexer_alloc ( sizeof( struct yyguts_t ), &dummy_yyguts );
+	
+    if (*ptr_yy_globals == NULL){
+        errno = ENOMEM;
+        return 1;
+    }
+    
+    /* By setting to 0xAA, we expose bugs in yy_init_globals.
+     * Leave at 0x00 for releases. */
+    memset(*ptr_yy_globals,0x00,sizeof(struct yyguts_t));
+    
+    _mesa_program_lexer_set_extra (yy_user_defined, *ptr_yy_globals);
+    
+    return yy_init_globals ( *ptr_yy_globals );
+}
+
+static int yy_init_globals (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+    /* Initialization is the same as for the non-reentrant scanner.
+     * This function is called from _mesa_program_lexer_lex_destroy(), so don't allocate here.
+     */
+
+    yyg->yy_buffer_stack = 0;
+    yyg->yy_buffer_stack_top = 0;
+    yyg->yy_buffer_stack_max = 0;
+    yyg->yy_c_buf_p = (char *) 0;
+    yyg->yy_init = 0;
+    yyg->yy_start = 0;
+
+    yyg->yy_start_stack_ptr = 0;
+    yyg->yy_start_stack_depth = 0;
+    yyg->yy_start_stack =  NULL;
+
+/* Defined in main.c */
+#ifdef YY_STDINIT
+    yyin = stdin;
+    yyout = stdout;
+#else
+    yyin = (FILE *) 0;
+    yyout = (FILE *) 0;
+#endif
+
+    /* For future reference: Set errno on error, since we are called by
+     * _mesa_program_lexer_lex_init()
+     */
+    return 0;
+}
+
+/* _mesa_program_lexer_lex_destroy is for both reentrant and non-reentrant scanners. */
+int _mesa_program_lexer_lex_destroy  (yyscan_t yyscanner)
+{
+    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
+
+    /* Pop the buffer stack, destroying each element. */
+	while(YY_CURRENT_BUFFER){
+		_mesa_program_lexer__delete_buffer(YY_CURRENT_BUFFER ,yyscanner );
+		YY_CURRENT_BUFFER_LVALUE = NULL;
+		_mesa_program_lexer_pop_buffer_state(yyscanner);
+	}
+
+	/* Destroy the stack itself. */
+	_mesa_program_lexer_free(yyg->yy_buffer_stack ,yyscanner);
+	yyg->yy_buffer_stack = NULL;
+
+    /* Destroy the start condition stack. */
+        _mesa_program_lexer_free(yyg->yy_start_stack ,yyscanner );
+        yyg->yy_start_stack = NULL;
+
+    /* Reset the globals. This is important in a non-reentrant scanner so the next time
+     * _mesa_program_lexer_lex() is called, initialization will occur. */
+    yy_init_globals( yyscanner);
+
+    /* Destroy the main struct (reentrant only). */
+    _mesa_program_lexer_free ( yyscanner , yyscanner );
+    yyscanner = NULL;
+    return 0;
+}
+
+/*
+ * Internal utility routines.
+ */
+
+#ifndef yytext_ptr
+static void yy_flex_strncpy (char* s1, yyconst char * s2, int n , yyscan_t yyscanner)
+{
+	register int i;
+	for ( i = 0; i < n; ++i )
+		s1[i] = s2[i];
+}
+#endif
+
+#ifdef YY_NEED_STRLEN
+static int yy_flex_strlen (yyconst char * s , yyscan_t yyscanner)
+{
+	register int n;
+	for ( n = 0; s[n]; ++n )
+		;
+
+	return n;
+}
+#endif
+
+void *_mesa_program_lexer_alloc (yy_size_t  size , yyscan_t yyscanner)
+{
+	return (void *) malloc( size );
+}
+
+void *_mesa_program_lexer_realloc  (void * ptr, yy_size_t  size , yyscan_t yyscanner)
+{
+	/* The cast to (char *) in the following accommodates both
+	 * implementations that use char* generic pointers, and those
+	 * that use void* generic pointers.  It works with the latter
+	 * because both ANSI C and C++ allow castless assignment from
+	 * any pointer type to void*, and deal with argument conversions
+	 * as though doing an assignment.
+	 */
+	return (void *) realloc( (char *) ptr, size );
+}
+
+void _mesa_program_lexer_free (void * ptr , yyscan_t yyscanner)
+{
+	free( (char *) ptr );	/* see _mesa_program_lexer_realloc() for (char *) cast */
+}
+
+#define YYTABLES_NAME "yytables"
+
+#line 494 "program/program_lexer.l"
+
+
+
+void
+_mesa_program_lexer_ctor(void **scanner, struct asm_parser_state *state,
+			 const char *string, size_t len)
+{
+   _mesa_program_lexer_lex_init_extra(state,scanner);
+   _mesa_program_lexer__scan_bytes(string,len,*scanner);
+}
+
+void
+_mesa_program_lexer_dtor(void *scanner)
+{
+   _mesa_program_lexer_lex_destroy(scanner);
+}
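+
+/* Usage sketch (illustrative; 'state' and 'source' are hypothetical
+ * caller-side names):
+ *
+ *    void *scanner;
+ *    _mesa_program_lexer_ctor(&scanner, state, source, strlen(source));
+ *    ... run the parser, which pulls tokens via _mesa_program_lexer_lex ...
+ *    _mesa_program_lexer_dtor(scanner);
+ *
+ * 'state' is the caller's struct asm_parser_state; 'source' is the
+ * program text to be scanned.
+ */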
+
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_cache.c b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_cache.c
new file mode 100644
index 0000000..c1c3d84
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_cache.c
@@ -0,0 +1,258 @@
+/**************************************************************************
+ * 
+ * Copyright 2003 VMware, Inc.
+ * All Rights Reserved.
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ * 
+ **************************************************************************/
+
+
+#include "main/glheader.h"
+#include "main/mtypes.h"
+#include "main/imports.h"
+#include "main/shaderobj.h"
+#include "program/prog_cache.h"
+#include "program/program.h"
+
+
+struct cache_item
+{
+   GLuint hash;
+   unsigned keysize;
+   void *key;
+   struct gl_program *program;
+   struct cache_item *next;
+};
+
+struct gl_program_cache
+{
+   struct cache_item **items;
+   struct cache_item *last;
+   GLuint size, n_items;
+};
+
+
+
+/**
+ * Compute hash index from state key.
+ */
+static GLuint
+hash_key(const void *key, GLuint key_size)
+{
+   const GLuint *ikey = (const GLuint *) key;
+   GLuint hash = 0, i;
+
+   assert(key_size >= 4);
+
+   /* Make a slightly better attempt at a hash function:
+    */
+   for (i = 0; i < key_size / sizeof(*ikey); i++)
+   {
+      hash += ikey[i];
+      hash += (hash << 10);
+      hash ^= (hash >> 6);
+   }
+
+   return hash;
+}
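+
+/* Usage sketch (illustrative; state_key and fill_state_key are
+ * hypothetical):
+ *
+ *    struct state_key key;
+ *    memset(&key, 0, sizeof(key));   // avoid uninitialized padding
+ *    fill_state_key(ctx, &key);
+ *    GLuint h = hash_key(&key, sizeof(key));
+ *
+ * The key must be at least 4 bytes and fully initialized, since the
+ * hash reads it as an array of GLuints and lookups memcmp() the raw
+ * bytes.
+ */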
+
+
+/**
+ * Rebuild/expand the hash table to accommodate more entries
+ */
+static void
+rehash(struct gl_program_cache *cache)
+{
+   struct cache_item **items;
+   struct cache_item *c, *next;
+   GLuint size, i;
+
+   cache->last = NULL;
+
+   size = cache->size * 3;
+   items = malloc(size * sizeof(*items));
+   memset(items, 0, size * sizeof(*items));
+
+   for (i = 0; i < cache->size; i++)
+      for (c = cache->items[i]; c; c = next) {
+	 next = c->next;
+	 c->next = items[c->hash % size];
+	 items[c->hash % size] = c;
+      }
+
+   free(cache->items);
+   cache->items = items;
+   cache->size = size;
+}
+
+
+static void
+clear_cache(struct gl_context *ctx, struct gl_program_cache *cache,
+	    GLboolean shader)
+{
+   struct cache_item *c, *next;
+   GLuint i;
+   
+   cache->last = NULL;
+
+   for (i = 0; i < cache->size; i++) {
+      for (c = cache->items[i]; c; c = next) {
+	 next = c->next;
+	 free(c->key);
+	 if (shader) {
+	    _mesa_reference_shader_program(ctx,
+					   (struct gl_shader_program **)&c->program,
+					   NULL);
+	 } else {
+	    _mesa_reference_program(ctx, &c->program, NULL);
+	 }
+	 free(c);
+      }
+      cache->items[i] = NULL;
+   }
+
+
+   cache->n_items = 0;
+}
+
+
+
+struct gl_program_cache *
+_mesa_new_program_cache(void)
+{
+   struct gl_program_cache *cache = CALLOC_STRUCT(gl_program_cache);
+   if (cache) {
+      cache->size = 17;
+      cache->items =
+         calloc(1, cache->size * sizeof(struct cache_item *));
+      if (!cache->items) {
+         free(cache);
+         return NULL;
+      }
+   }
+   return cache;
+}
+
+
+void
+_mesa_delete_program_cache(struct gl_context *ctx, struct gl_program_cache *cache)
+{
+   clear_cache(ctx, cache, GL_FALSE);
+   free(cache->items);
+   free(cache);
+}
+
+void
+_mesa_delete_shader_cache(struct gl_context *ctx,
+			  struct gl_program_cache *cache)
+{
+   clear_cache(ctx, cache, GL_TRUE);
+   free(cache->items);
+   free(cache);
+}
+
+
+struct gl_program *
+_mesa_search_program_cache(struct gl_program_cache *cache,
+                           const void *key, GLuint keysize)
+{
+   if (cache->last &&
+       cache->last->keysize == keysize &&
+       memcmp(cache->last->key, key, keysize) == 0) {
+      return cache->last->program;
+   }
+   else {
+      const GLuint hash = hash_key(key, keysize);
+      struct cache_item *c;
+
+      for (c = cache->items[hash % cache->size]; c; c = c->next) {
+         if (c->hash == hash &&
+             c->keysize == keysize &&
+             memcmp(c->key, key, keysize) == 0) {
+
+            cache->last = c;
+            return c->program;
+         }
+      }
+
+      return NULL;
+   }
+}
+
+
+void
+_mesa_program_cache_insert(struct gl_context *ctx,
+                           struct gl_program_cache *cache,
+                           const void *key, GLuint keysize,
+                           struct gl_program *program)
+{
+   const GLuint hash = hash_key(key, keysize);
+   struct cache_item *c = CALLOC_STRUCT(cache_item);
+
+   c->hash = hash;
+
+   c->key = malloc(keysize);
+   memcpy(c->key, key, keysize);
+   c->keysize = keysize;
+
+   c->program = program;  /* no refcount change */
+
+   if (cache->n_items > cache->size * 1.5) {
+      if (cache->size < 1000)
+	 rehash(cache);
+      else 
+	 clear_cache(ctx, cache, GL_FALSE);
+   }
+
+   cache->n_items++;
+   c->next = cache->items[hash % cache->size];
+   cache->items[hash % cache->size] = c;
+}
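+
+#if 0
+/* Lookup-or-insert sketch (illustrative; example_compile is a
+ * hypothetical helper).
+ */
+static struct gl_program *
+example_lookup_or_insert(struct gl_context *ctx,
+                         struct gl_program_cache *cache,
+                         const void *key, GLuint keysize)
+{
+   struct gl_program *p = _mesa_search_program_cache(cache, key, keysize);
+   if (!p) {
+      p = example_compile(ctx, key, keysize);
+      /* Insertion takes no reference; the caller keeps ownership. */
+      _mesa_program_cache_insert(ctx, cache, key, keysize, p);
+   }
+   return p;
+}
+#endif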
+
+void
+_mesa_shader_cache_insert(struct gl_context *ctx,
+			  struct gl_program_cache *cache,
+			  const void *key, GLuint keysize,
+			  struct gl_shader_program *program)
+{
+   const GLuint hash = hash_key(key, keysize);
+   struct cache_item *c = CALLOC_STRUCT(cache_item);
+
+   c->hash = hash;
+
+   c->key = malloc(keysize);
+   memcpy(c->key, key, keysize);
+   c->keysize = keysize;
+
+   c->program = (struct gl_program *)program;  /* no refcount change */
+
+   if (cache->n_items > cache->size * 1.5) {
+      if (cache->size < 1000)
+	 rehash(cache);
+      else
+	 clear_cache(ctx, cache, GL_TRUE);
+   }
+
+   cache->n_items++;
+   c->next = cache->items[hash % cache->size];
+   cache->items[hash % cache->size] = c;
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_cache.h b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_cache.h
new file mode 100644
index 0000000..fdd7e26
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_cache.h
@@ -0,0 +1,68 @@
+/**************************************************************************
+ * 
+ * Copyright 2003 VMware, Inc.
+ * All Rights Reserved.
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ * 
+ **************************************************************************/
+
+
+#ifndef PROG_CACHE_H
+#define PROG_CACHE_H
+
+
+#include "main/glheader.h"
+
+struct gl_context;
+
+/** Opaque type */
+struct gl_program_cache;
+
+
+extern struct gl_program_cache *
+_mesa_new_program_cache(void);
+
+extern void
+_mesa_delete_program_cache(struct gl_context *ctx, struct gl_program_cache *pc);
+
+extern void
+_mesa_delete_shader_cache(struct gl_context *ctx,
+			  struct gl_program_cache *cache);
+
+extern struct gl_program *
+_mesa_search_program_cache(struct gl_program_cache *cache,
+                           const void *key, GLuint keysize);
+
+extern void
+_mesa_program_cache_insert(struct gl_context *ctx,
+                           struct gl_program_cache *cache,
+                           const void *key, GLuint keysize,
+                           struct gl_program *program);
+
+void
+_mesa_shader_cache_insert(struct gl_context *ctx,
+			  struct gl_program_cache *cache,
+			  const void *key, GLuint keysize,
+			  struct gl_shader_program *program);
+
+
+#endif /* PROG_CACHE_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_diskcache.c b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_diskcache.c
new file mode 100644
index 0000000..76a7641
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_diskcache.c
@@ -0,0 +1,656 @@
+/* -*- c -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <sys/stat.h>
+#include "shader_cache.h"
+#include "prog_diskcache.h"
+#include "glsl_parser_extras.h"
+
+#ifndef _WIN32
+#include <dirent.h>
+
+struct dir_entry_t
+{
+   char *path;
+   struct stat info;
+};
+
+
+static int
+sort_by_access_time(const void *_a, const void *_b)
+{
+   /* Compare access time of 2 entries */
+   struct dir_entry_t *a = (struct dir_entry_t *) _a;
+   struct dir_entry_t *b = (struct dir_entry_t *) _b;
+
+   if (a->info.st_atime > b->info.st_atime)
+      return 1;
+   return -1;
+}
+
+
+static int
+valid_cache_entry(const struct dirent *entry)
+{
+   /* Only regular files are possible valid cache entries. */
+   if (entry->d_type == DT_REG)
+      return 1;
+   return 0;
+}
+
+
+/**
+ * Cache size management. If cache size exceeds max_cache_size,
+ * entries are sorted by access time and oldest entries deleted
+ * until we fit.
+ */
+static void
+manage_cache_size(const char *path, const unsigned max_cache_size)
+{
+   if (max_cache_size == 0xFFFFFFFF) {
+      /* A max size of 0xFFFFFFFF, i.e. (unsigned) -1, means "don't limit". */
+      return;
+   }
+
+   struct dirent **entries;
+   int n = scandir(path, &entries, valid_cache_entry, NULL);
+
+   if (n <= 0)
+      return;
+
+   struct dir_entry_t *cache = NULL;
+   unsigned cache_size = 0;
+   unsigned cache_entries = 0;
+
+   void *mem_ctx = ralloc_context(NULL);
+
+   cache = ralloc_array(mem_ctx, struct dir_entry_t, n);
+
+   /* Construct entries with path and access information + calculate
+    * total size used by entries.
+    */
+   while (n--) {
+      cache[cache_entries].path =
+         ralloc_asprintf(mem_ctx, "%s/%s", path, entries[n]->d_name);
+      stat(cache[cache_entries].path, &cache[cache_entries].info);
+
+      cache_size += cache[cache_entries].info.st_size;
+
+      cache_entries++;
+      free(entries[n]);
+   }
+   free(entries);
+
+   /* No need to manage if we fit the max size. */
+   if (cache_size < max_cache_size)
+      goto free_allocated_memory;
+
+   /* Sort oldest first so we can 'delete until cache size less than max'. */
+   qsort(cache, cache_entries, sizeof(struct dir_entry_t), sort_by_access_time);
+
+   unsigned i = 0;
+   while (cache_size > max_cache_size && i < cache_entries) {
+      unlink(cache[i].path);
+      cache_size -= cache[i].info.st_size;
+      i++;
+   }
+
+free_allocated_memory:
+
+   ralloc_free(mem_ctx);
+}
+#endif
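+
+/* Worked example for manage_cache_size() (illustrative numbers): with
+ * max_cache_size = 1 MiB and cached files totalling 1.5 MiB, the
+ * entries are sorted oldest-access-first and unlink()ed one by one
+ * until the running total no longer exceeds 1 MiB.
+ */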
+
+
+static int
+mesa_mkdir_cache(const char *path)
+{
+   char *copy = _mesa_strdup(path);
+   char *dir = strtok(copy, "/");
+   char *current = ralloc_strdup(NULL, "/");
+   int result = 0;
+
+   /* For example, for "/home/yogsothoth/.cache/mesa" the loop below
+    * calls mkdir once for each '/'-separated path component, building
+    * the path up as it goes.
+    */
+   while (dir) {
+      ralloc_strcat(&current, dir);
+
+      result = _mesa_mkdir(current);
+
+      if (result != 0 && result != EEXIST) {
+         ralloc_free(current);
+         free(copy);
+         return -1;
+      }
+
+      ralloc_strcat(&current, "/");
+      dir = strtok(NULL, "/");
+   }
+
+   ralloc_free(current);
+   free(copy);
+
+   return 0;
+}
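+
+/* Illustration: mesa_mkdir_cache("/home/yogsothoth/.cache/mesa") calls
+ * _mesa_mkdir() for "/home", "/home/yogsothoth",
+ * "/home/yogsothoth/.cache" and "/home/yogsothoth/.cache/mesa" in
+ * turn, treating EEXIST as success at each step.
+ */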
+
+
+int
+mesa_program_diskcache_init(struct gl_context *ctx)
+{
+   const char *tmp = "/tmp", *cache_root = NULL;
+   int result = 0;
+
+   if (ctx->Const.MaxShaderCacheSize == 0) {
+      // if 0 (default) then no cache will be active
+      ctx->BinaryProgramCacheActive = false;
+      return -1;
+   }
+
+   cache_root = _mesa_getenv("XDG_CACHE_DIR");
+   if (!cache_root)
+      cache_root = _mesa_getenv("HOME");
+   if (!cache_root)
+      cache_root = tmp;
+
+   asprintf(&ctx->BinaryProgramCachePath, "%s/.cache/mesa/programs", cache_root);
+
+   struct stat stat_info;
+   if (stat(ctx->BinaryProgramCachePath, &stat_info) != 0)
+      result = mesa_mkdir_cache(ctx->BinaryProgramCachePath);
+#ifndef _WIN32
+   else
+      manage_cache_size(ctx->BinaryProgramCachePath, ctx->Const.MaxShaderCacheSize);
+#endif
+
+   if (result == 0)
+      ctx->BinaryProgramCacheActive = true;
+   else
+      ctx->BinaryProgramCacheActive = false;
+
+   return result;
+}
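+
+/* Intended call flow (sketch; link_program stands in for the real
+ * linker entry point and is hypothetical):
+ *
+ *    mesa_program_diskcache_init(ctx);           // once per context
+ *    if (mesa_program_diskcache_find(ctx, prog) != 0) {
+ *       link_program(ctx, prog);                 // hypothetical call
+ *       mesa_program_diskcache_cache(ctx, prog);
+ *    }
+ */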
+
+
+static uint32_t
+checksum(const char *src)
+{
+   uint32_t sum = _mesa_str_checksum(src);
+   unsigned i;
+
+   /* Add some sugar on top (borrowed from brw_state_cache). This is meant
+    * to catch cache collisions when there are only small changes in the
+    * source such as mat3 -> mat4 in a type for example.
+    */
+   for (i = 0; i < strlen(src); i++) {
+      sum ^= (uint32_t) src[i];
+      sum = (sum << 5) | (sum >> 27);
+   }
+
+   return sum;
+}
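+
+/* Note: (sum << 5) | (sum >> 27) is a 5-bit left rotate of the 32-bit
+ * accumulator, so each source byte ends up influencing every bit of
+ * the final checksum after a few iterations.
+ */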
+
+
+/**
+ * Attempt to generate unique key for a gl_shader_program.
+ * TODO - this should be stronger and be based on some of the
+ * gl_shader_program content, not just sources.
+ */
+static char *
+generate_key(struct gl_shader_program *prog)
+{
+   char *key = ralloc_strdup(prog, "");
+   for (unsigned i = 0; i < prog->NumShaders; i++) {
+
+      /* No source, no key. */
+      if (!prog->Shaders[i]->Source)
+         return NULL;
+
+      /* At least some content required. */
+      if (strcmp(prog->Shaders[i]->Source, "") == 0)
+         return NULL;
+
+      uint32_t sum = checksum(prog->Shaders[i]->Source);
+
+      char tmp[32];
+      _mesa_snprintf(tmp, 32, "%u", sum);
+
+      ralloc_strcat(&key, tmp);
+   }
+
+   /* Key needs to have enough content. */
+   if (strlen(key) < 7) {
+      ralloc_free(key);
+      key = NULL;
+   }
+
+   return key;
+}
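+
+/* Key format example (hypothetical checksums): a program with two
+ * shaders whose checksums print as 123456789 and 987654321 yields the
+ * key "123456789987654321"; the cache/find paths below append ".g",
+ * so the file lands at <BinaryProgramCachePath>/123456789987654321.g.bin.
+ */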
+
+
+/**
+ * Cache gl_shader_program to disk
+ */
+int
+mesa_program_diskcache_cache(struct gl_context *ctx, struct gl_shader_program *prog)
+{
+   assert(ctx);
+   int result = -1;
+   struct stat stat_info;
+   char *key;
+
+   key = generate_key(prog);
+
+   if (!key)
+      return -1;
+
+   /* Glassy vs. Opaque compiled shaders */
+   ralloc_strcat(&key, ".g");
+
+   char *shader_path =
+      ralloc_asprintf(NULL, "%s/%s.bin", ctx->BinaryProgramCachePath, key);
+
+   /* Collision, do not attempt to overwrite. */
+   if (stat(shader_path, &stat_info) == 0)
+      goto cache_epilogue;
+
+   size_t size = 0;
+   char *data = mesa_program_serialize(ctx, prog, &size);
+
+   if (!data)
+      goto cache_epilogue;
+
+   FILE *out = fopen(shader_path, "w+");
+
+   if (!out)
+      goto cache_epilogue;
+
+   fwrite(data, size, 1, out);
+   fclose(out);
+   free(data);
+   result = 0;
+
+cache_epilogue:
+   ralloc_free(shader_path);
+   ralloc_free(key);
+   return result;
+}
+
+bool
+supported_by_program_cache(struct gl_shader_program *prog, bool is_write)
+{
+   /* No geometry shader support. */
+   if (prog->_LinkedShaders[MESA_SHADER_GEOMETRY])
+      return false;
+
+   /* No uniform block support. */
+   if (prog->NumUniformBlocks > 0)
+      return false;
+
+   /* No transform feedback support. */
+   if (prog->TransformFeedback.NumVarying > 0)
+      return false;
+
+   /* These more expensive checks should only be run when generating
+    * the cache entry
+    */
+   if (is_write)
+   {
+      /* Uniform structs are not working */
+      if (prog->UniformStorage) {
+         for (unsigned i = 0; i < prog->NumUserUniformStorage; i++) {
+            if (strchr(prog->UniformStorage[i].name, '.')) {
+               /* The uniform struct fields have already been broken
+                * down into unique variables; we have to inspect their
+                * names and bail out, since these aren't working.
+                */
+               return false;
+            }
+         }
+      }
+
+      /* This is nasty!  Short term solution for correctness! */
+      for (unsigned i = 0; i < prog->NumShaders; i++) {
+         if (prog->Shaders[i] && prog->Shaders[i]->Source) {
+            /* This opcode is not supported by MesaIR (yet?) */
+            if (strstr(prog->Shaders[i]->Source, "textureQueryLevels"))
+               return false;
+         }
+      }
+   }
+
+   return true;
+}
+
+/**
+ * Fill gl_shader_program from cache if found
+ */
+int
+mesa_program_diskcache_find(struct gl_context *ctx, struct gl_shader_program *prog)
+{
+   assert(ctx);
+   int result = 0;
+   char *key;
+
+   /* Do not use the diskcache when a program relinks. Relinking is not
+    * currently supported because of the way the cache key is generated:
+    * key generation would need to take into account the hashtables and
+    * possibly other data in gl_shader_program to catch changes in the
+    * linker inputs (which can produce a different program from the same
+    * sources).
+    */
+   if (prog->_Linked)
+      return -1;
+
+   /* This is heavier than we'd like, but string compares are
+    * insufficient for this caching.  It depends on state as well.
+    */
+   if (!supported_by_program_cache(prog, false /* is_write */))
+      return -1;
+
+   key = generate_key(prog);
+
+   if (!key)
+      return -1;
+
+   /* Glassy vs. Opaque compiled shaders */
+   ralloc_strcat(&key, ".g");
+
+
+   char *shader_path =
+      ralloc_asprintf(NULL, "%s/%s.bin", ctx->BinaryProgramCachePath, key);
+
+   result = mesa_program_load(ctx, prog, shader_path);
+
+   // TODO:  ensure we did not get a false cache hit by comparing the
+   // incoming strings with what we just deserialized
+
+   ralloc_free(shader_path);
+   ralloc_free(key);
+
+   return result;
+}
+
+
+
+/* The following functions are shader versions of the program caching
+ * functions. They could be moved to another file, or merged with the
+ * program versions, since several share lines of code. This hasn't
+ * been done yet, to avoid premature refactoring.
+ */
+
+bool
+supported_by_shader_cache(struct gl_shader *shader, bool is_write)
+{
+   /* No geometry shader support. */
+   // how hard to add?
+   if (shader->Stage == MESA_SHADER_GEOMETRY)
+      return false;
+
+   /* No uniform block support. */
+   if (shader->NumUniformBlocks > 0)
+      return false;
+
+   return true;
+}
+
+
+static int
+mesa_mkdir_shader_cache(const char *path)
+{
+   char *copy = _mesa_strdup(path);
+   char *dir = strtok(copy, "/");
+   char *current = ralloc_strdup(NULL, "/");
+   int result = 0;
+
+   /* For example, for "/home/yogsothoth/.cache/mesa" the loop below
+    * calls mkdir once for each '/'-separated path component, building
+    * the path up as it goes.
+    */
+   while (dir) {
+      ralloc_strcat(&current, dir);
+
+      result = _mesa_mkdir(current);
+
+      if (result != 0 && result != EEXIST) {
+         ralloc_free(current);
+         free(copy);
+         return -1;
+      }
+
+      ralloc_strcat(&current, "/");
+      dir = strtok(NULL, "/");
+   }
+
+   ralloc_free(current);
+   free(copy);
+
+   return 0;
+}
+
+
+
+/* This is based on mesa_program_diskcache_init,
+ * would be good to merge them at some point.
+ */
+
+int
+mesa_shader_diskcache_init(struct gl_context *ctx)
+{
+   const char *tmp = "/tmp", *cache_root = NULL;
+   int result = 0;
+
+   if (ctx->Const.MaxShaderCacheSize == 0) {
+      // if 0 (default) then no cache will be active
+      ctx->BinaryShaderCacheActive = false;
+      return -1;
+   }
+
+   cache_root = _mesa_getenv("XDG_CACHE_DIR");
+   if (!cache_root)
+      cache_root = _mesa_getenv("HOME");
+   if (!cache_root)
+      cache_root = tmp;
+
+   asprintf(&ctx->BinaryShaderCachePath, "%s/.cache/mesa/shaders", cache_root);
+
+   struct stat stat_info;
+   if (stat(ctx->BinaryShaderCachePath, &stat_info) != 0)
+      result = mesa_mkdir_shader_cache(ctx->BinaryShaderCachePath);
+#ifndef _WIN32
+   else
+      manage_cache_size(ctx->BinaryShaderCachePath, ctx->Const.MaxShaderCacheSize);
+#endif
+
+   if (result == 0)
+      ctx->BinaryShaderCacheActive = true;
+   else
+      ctx->BinaryShaderCacheActive = false;
+
+   return result;
+}
+
+
+static uint32_t
+shader_checksum(const char *src)
+{
+   uint32_t sum = _mesa_str_checksum(src);
+   unsigned i;
+
+   /* Add some sugar on top (borrowed from brw_state_cache). This is meant
+    * to catch cache collisions when there are only small changes in the
+    * source such as mat3 -> mat4 in a type for example.
+    */
+   for (i = 0; i < strlen(src); i++) {
+      sum ^= (uint32_t) src[i];
+      sum = (sum << 5) | (sum >> 27);
+   }
+
+   return sum;
+}
+
+
+/* This is based on generate_key(prog), would
+ * be good to merge them at some point.
+ */
+static char *
+generate_shader_key(struct gl_shader *shader)
+{
+   char *key = ralloc_strdup(shader, "");
+
+   /* No source, no key. */
+   if (shader->Source == NULL)
+      return NULL;
+
+   /* At least some content required. */
+   if (strcmp(shader->Source, "") == 0)
+      return NULL;
+
+   uint32_t sum = shader_checksum(shader->Source);
+
+   char tmp[32];
+   _mesa_snprintf(tmp, 32, "%u", sum);
+
+   ralloc_strcat(&key, tmp);
+
+   /* Key needs to have enough content. */
+   if (strlen(key) < 7) {
+      ralloc_free(key);
+      key = NULL;
+   }
+
+   return key;
+}
+
+/**
+ * Deserialize gl_shader structure
+ */
+struct gl_shader *
+mesa_shader_deserialize(struct gl_context *ctx, struct gl_shader *shader,
+                        const char* path)
+{
+   return read_single_shader(ctx, shader, path);
+}
+
+
+int
+mesa_shader_load(struct gl_context *ctx, struct gl_shader *shader,
+                 const char *path)
+{
+
+   struct gl_shader *result = mesa_shader_deserialize(ctx, shader, path);
+
+   if (result)
+     return 0;
+   else
+     return MESA_SHADER_DESERIALIZE_READ_ERROR;
+}
+
+/*
+ * This is based on mesa_program_diskcache_cache, would be good to
+ * merge them at some point.
+ */
+int
+mesa_shader_diskcache_cache(struct gl_context *ctx, struct gl_shader *shader)
+{
+   assert(ctx);
+   int result = -1;
+   struct stat stat_info;
+   char *key;
+   size_t size = 0;
+
+   key = generate_shader_key(shader);
+
+   if (!key)
+      return -1;
+
+   /* Glassy vs. Opaque compiled shaders */
+   ralloc_strcat(&key, ".g");
+
+   char *shader_path =
+      ralloc_asprintf(NULL, "%s/%s.bin", ctx->BinaryShaderCachePath, key);
+
+   char *data = NULL;
+   FILE *out = NULL;
+
+   /* Collision, do not attempt to overwrite. */
+   if (stat(shader_path, &stat_info) == 0)
+      goto cache_epilogue;
+
+   data = mesa_shader_serialize(ctx, shader, &size, true /* shader_only */);
+
+   if (!data)
+      goto cache_epilogue;
+
+   out = fopen(shader_path, "w+");
+
+   if (!out)
+      goto cache_epilogue;
+
+   fwrite(data, size, 1, out);
+   fclose(out);
+   free(data);
+   result = 0;
+
+cache_epilogue:
+   ralloc_free(shader_path);
+   ralloc_free(key);
+   return result;
+}
+
+
+
+/* This is based on mesa_program_diskcache_find, would be good to
+ * merge them at some point.
+ */
+int
+mesa_shader_diskcache_find(struct gl_context *ctx, struct gl_shader *shader)
+{
+   int result = 0;
+   char *key;
+
+   /* Don't lookup if already compiled. */
+   if (shader->CompileStatus)
+      return -1;
+
+   /* This is heavier than we'd like, but string compares are
+    * insufficient for this caching.  It depends on state as well.
+    */
+   if (!supported_by_shader_cache(shader, false /* is_write */))
+      return -1;
+
+   key = generate_shader_key(shader);
+
+   if (!key)
+      return -1;
+
+   /* Glassy vs. Opaque compiled shaders */
+   ralloc_strcat(&key, ".g");
+
+   char *shader_path =
+      ralloc_asprintf(NULL, "%s/%s.bin", ctx->BinaryShaderCachePath, key);
+
+   result = mesa_shader_load(ctx, shader, shader_path);
+
+   // TODO:  ensure we did not get a false cache hit by comparing the
+   // incoming string with what we just deserialized
+
+   ralloc_free(shader_path);
+   ralloc_free(key);
+
+   return result;
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_diskcache.h b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_diskcache.h
new file mode 100644
index 0000000..6e3973b
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_diskcache.h
@@ -0,0 +1,57 @@
+/* -*- c -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef PROGRAM_DISKCACHE_H
+#define PROGRAM_DISKCACHE_H
+
+#include "main/mtypes.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+int
+mesa_program_diskcache_init(struct gl_context *ctx);
+
+int
+mesa_program_diskcache_cache(struct gl_context *ctx, struct gl_shader_program *prog);
+
+int
+mesa_program_diskcache_find(struct gl_context *ctx, struct gl_shader_program *prog);
+
+extern int
+mesa_shader_diskcache_find(struct gl_context *ctx, struct gl_shader *shader);
+
+extern int
+mesa_shader_diskcache_cache(struct gl_context *ctx, struct gl_shader *shader);
+
+int
+mesa_shader_diskcache_init(struct gl_context *ctx);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* PROGRAM_DISKCACHE_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_execute.c b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_execute.c
new file mode 100644
index 0000000..115525e
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_execute.c
@@ -0,0 +1,1679 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file prog_execute.c
+ * Software interpreter for vertex/fragment programs.
+ * \author Brian Paul
+ */
+
+/*
+ * NOTE: we do everything in single-precision floating point; we don't
+ * currently observe the single/half/fixed-precision qualifiers.
+ */
+
+
+#include "main/glheader.h"
+#include "main/colormac.h"
+#include "main/macros.h"
+#include "prog_execute.h"
+#include "prog_instruction.h"
+#include "prog_parameter.h"
+#include "prog_print.h"
+#include "prog_noise.h"
+
+
+/* debug predicate */
+#define DEBUG_PROG 0
+
+
+/**
+ * Set x to positive or negative infinity.
+ */
+#if defined(USE_IEEE) || defined(_WIN32)
+#define SET_POS_INFINITY(x)                  \
+   do {                                      \
+         fi_type fi;                         \
+         fi.i = 0x7F800000;                  \
+         x = fi.f;                           \
+   } while (0)
+#define SET_NEG_INFINITY(x)                  \
+   do {                                      \
+         fi_type fi;                         \
+         fi.i = 0xFF800000;                  \
+         x = fi.f;                           \
+   } while (0)
+#else
+#define SET_POS_INFINITY(x)  x = (GLfloat) HUGE_VAL
+#define SET_NEG_INFINITY(x)  x = (GLfloat) -HUGE_VAL
+#endif
+
+#define SET_FLOAT_BITS(x, bits) ((fi_type *) (void *) &(x))->i = bits
+
+
+static const GLfloat ZeroVec[4] = { 0.0F, 0.0F, 0.0F, 0.0F };
+
+
+/**
+ * Return a pointer to the 4-element float vector specified by the given
+ * source register.
+ */
+static inline const GLfloat *
+get_src_register_pointer(const struct prog_src_register *source,
+                         const struct gl_program_machine *machine)
+{
+   const struct gl_program *prog = machine->CurProgram;
+   GLint reg = source->Index;
+
+   if (source->RelAddr) {
+      /* add address register value to src index/offset */
+      reg += machine->AddressReg[0][0];
+      if (reg < 0) {
+         return ZeroVec;
+      }
+   }
+
+   switch (source->File) {
+   case PROGRAM_TEMPORARY:
+      if (reg >= MAX_PROGRAM_TEMPS)
+         return ZeroVec;
+      return machine->Temporaries[reg];
+
+   case PROGRAM_INPUT:
+      if (prog->Target == GL_VERTEX_PROGRAM_ARB) {
+         if (reg >= VERT_ATTRIB_MAX)
+            return ZeroVec;
+         return machine->VertAttribs[reg];
+      }
+      else {
+         if (reg >= VARYING_SLOT_MAX)
+            return ZeroVec;
+         return machine->Attribs[reg][machine->CurElement];
+      }
+
+   case PROGRAM_OUTPUT:
+      if (reg >= MAX_PROGRAM_OUTPUTS)
+         return ZeroVec;
+      return machine->Outputs[reg];
+
+   case PROGRAM_STATE_VAR:
+      /* Fallthrough */
+   case PROGRAM_CONSTANT:
+      /* Fallthrough */
+   case PROGRAM_UNIFORM:
+      if (reg >= (GLint) prog->Parameters->NumParameters)
+         return ZeroVec;
+      return (GLfloat *) prog->Parameters->ParameterValues[reg];
+
+   case PROGRAM_SYSTEM_VALUE:
+      assert(reg < Elements(machine->SystemValues));
+      return machine->SystemValues[reg];
+
+   default:
+      _mesa_problem(NULL,
+         "Invalid src register file %d in get_src_register_pointer()",
+         source->File);
+      return ZeroVec;
+   }
+}
+
+
+/**
+ * Return a pointer to the 4-element float vector specified by the given
+ * destination register.
+ */
+static inline GLfloat *
+get_dst_register_pointer(const struct prog_dst_register *dest,
+                         struct gl_program_machine *machine)
+{
+   static GLfloat dummyReg[4];
+   GLint reg = dest->Index;
+
+   if (dest->RelAddr) {
+      /* add address register value to src index/offset */
+      reg += machine->AddressReg[0][0];
+      if (reg < 0) {
+         return dummyReg;
+      }
+   }
+
+   switch (dest->File) {
+   case PROGRAM_TEMPORARY:
+      if (reg >= MAX_PROGRAM_TEMPS)
+         return dummyReg;
+      return machine->Temporaries[reg];
+
+   case PROGRAM_OUTPUT:
+      if (reg >= MAX_PROGRAM_OUTPUTS)
+         return dummyReg;
+      return machine->Outputs[reg];
+
+   default:
+      _mesa_problem(NULL,
+         "Invalid dest register file %d in get_dst_register_pointer()",
+         dest->File);
+      return dummyReg;
+   }
+}
+
+
+
+/**
+ * Fetch a 4-element float vector from the given source register.
+ * Apply swizzling and negating as needed.
+ */
+static void
+fetch_vector4(const struct prog_src_register *source,
+              const struct gl_program_machine *machine, GLfloat result[4])
+{
+   const GLfloat *src = get_src_register_pointer(source, machine);
+
+   if (source->Swizzle == SWIZZLE_NOOP) {
+      /* no swizzling */
+      COPY_4V(result, src);
+   }
+   else {
+      ASSERT(GET_SWZ(source->Swizzle, 0) <= 3);
+      ASSERT(GET_SWZ(source->Swizzle, 1) <= 3);
+      ASSERT(GET_SWZ(source->Swizzle, 2) <= 3);
+      ASSERT(GET_SWZ(source->Swizzle, 3) <= 3);
+      result[0] = src[GET_SWZ(source->Swizzle, 0)];
+      result[1] = src[GET_SWZ(source->Swizzle, 1)];
+      result[2] = src[GET_SWZ(source->Swizzle, 2)];
+      result[3] = src[GET_SWZ(source->Swizzle, 3)];
+   }
+
+   if (source->Abs) {
+      result[0] = FABSF(result[0]);
+      result[1] = FABSF(result[1]);
+      result[2] = FABSF(result[2]);
+      result[3] = FABSF(result[3]);
+   }
+   if (source->Negate) {
+      ASSERT(source->Negate == NEGATE_XYZW);
+      result[0] = -result[0];
+      result[1] = -result[1];
+      result[2] = -result[2];
+      result[3] = -result[3];
+   }
+
+#ifdef NAN_CHECK
+   assert(!IS_INF_OR_NAN(result[0]));
+   assert(!IS_INF_OR_NAN(result[1]));
+   assert(!IS_INF_OR_NAN(result[2]));
+   assert(!IS_INF_OR_NAN(result[3]));
+#endif
+}
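+
+/* Example: for a source swizzle of yzxw with Negate == NEGATE_XYZW,
+ * the fetch above yields result = { -src[1], -src[2], -src[0], -src[3] }.
+ */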
+
+
+/**
+ * Fetch the derivative with respect to X or Y for the given register.
+ * XXX this currently only works for fragment program input attribs.
+ */
+static void
+fetch_vector4_deriv(struct gl_context * ctx,
+                    const struct prog_src_register *source,
+                    const struct gl_program_machine *machine,
+                    char xOrY, GLfloat result[4])
+{
+   if (source->File == PROGRAM_INPUT &&
+       source->Index < (GLint) machine->NumDeriv) {
+      const GLint col = machine->CurElement;
+      const GLfloat w = machine->Attribs[VARYING_SLOT_POS][col][3];
+      const GLfloat invQ = 1.0f / w;
+      GLfloat deriv[4];
+
+      if (xOrY == 'X') {
+         deriv[0] = machine->DerivX[source->Index][0] * invQ;
+         deriv[1] = machine->DerivX[source->Index][1] * invQ;
+         deriv[2] = machine->DerivX[source->Index][2] * invQ;
+         deriv[3] = machine->DerivX[source->Index][3] * invQ;
+      }
+      else {
+         deriv[0] = machine->DerivY[source->Index][0] * invQ;
+         deriv[1] = machine->DerivY[source->Index][1] * invQ;
+         deriv[2] = machine->DerivY[source->Index][2] * invQ;
+         deriv[3] = machine->DerivY[source->Index][3] * invQ;
+      }
+
+      result[0] = deriv[GET_SWZ(source->Swizzle, 0)];
+      result[1] = deriv[GET_SWZ(source->Swizzle, 1)];
+      result[2] = deriv[GET_SWZ(source->Swizzle, 2)];
+      result[3] = deriv[GET_SWZ(source->Swizzle, 3)];
+
+      if (source->Abs) {
+         result[0] = FABSF(result[0]);
+         result[1] = FABSF(result[1]);
+         result[2] = FABSF(result[2]);
+         result[3] = FABSF(result[3]);
+      }
+      if (source->Negate) {
+         ASSERT(source->Negate == NEGATE_XYZW);
+         result[0] = -result[0];
+         result[1] = -result[1];
+         result[2] = -result[2];
+         result[3] = -result[3];
+      }
+   }
+   else {
+      ASSIGN_4V(result, 0.0, 0.0, 0.0, 0.0);
+   }
+}
+
+
+/**
+ * As above, but only return the result[0] element.
+ */
+static void
+fetch_vector1(const struct prog_src_register *source,
+              const struct gl_program_machine *machine, GLfloat result[4])
+{
+   const GLfloat *src = get_src_register_pointer(source, machine);
+
+   result[0] = src[GET_SWZ(source->Swizzle, 0)];
+
+   if (source->Abs) {
+      result[0] = FABSF(result[0]);
+   }
+   if (source->Negate) {
+      result[0] = -result[0];
+   }
+}
+
+
+static GLuint
+fetch_vector1ui(const struct prog_src_register *source,
+                const struct gl_program_machine *machine)
+{
+   const GLuint *src = (GLuint *) get_src_register_pointer(source, machine);
+   return src[GET_SWZ(source->Swizzle, 0)];
+}
+
+
+/**
+ * Fetch texel from texture.  Use partial derivatives when possible.
+ */
+static inline void
+fetch_texel(struct gl_context *ctx,
+            const struct gl_program_machine *machine,
+            const struct prog_instruction *inst,
+            const GLfloat texcoord[4], GLfloat lodBias,
+            GLfloat color[4])
+{
+   const GLuint unit = machine->Samplers[inst->TexSrcUnit];
+
+   /* Note: we only have the right derivatives for fragment input attribs.
+    */
+   if (machine->NumDeriv > 0 &&
+       inst->SrcReg[0].File == PROGRAM_INPUT &&
+       inst->SrcReg[0].Index == VARYING_SLOT_TEX0 + inst->TexSrcUnit) {
+      /* simple texture fetch for which we should have derivatives */
+      GLuint attr = inst->SrcReg[0].Index;
+      machine->FetchTexelDeriv(ctx, texcoord,
+                               machine->DerivX[attr],
+                               machine->DerivY[attr],
+                               lodBias, unit, color);
+   }
+   else {
+      machine->FetchTexelLod(ctx, texcoord, lodBias, unit, color);
+   }
+}
+
+
+/**
+ * Test value against zero and return GT, LT, EQ or UN if NaN.
+ */
+static inline GLuint
+generate_cc(float value)
+{
+   if (value != value)
+      return COND_UN;           /* NaN */
+   if (value > 0.0F)
+      return COND_GT;
+   if (value < 0.0F)
+      return COND_LT;
+   return COND_EQ;
+}
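+
+/* Example: generate_cc(2.0f) yields COND_GT, generate_cc(-0.0f) yields
+ * COND_EQ (negative zero compares equal to zero), and a NaN input yields
+ * COND_UN because NaN fails the value == value test.
+ */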
+
+
+/**
+ * Test if the ccMaskRule is satisfied by the given condition code.
+ * Used to mask destination writes according to the current condition code.
+ */
+static inline GLboolean
+test_cc(GLuint condCode, GLuint ccMaskRule)
+{
+   switch (ccMaskRule) {
+   case COND_EQ: return (condCode == COND_EQ);
+   case COND_NE: return (condCode != COND_EQ);
+   case COND_LT: return (condCode == COND_LT);
+   case COND_GE: return (condCode == COND_GT || condCode == COND_EQ);
+   case COND_LE: return (condCode == COND_LT || condCode == COND_EQ);
+   case COND_GT: return (condCode == COND_GT);
+   case COND_TR: return GL_TRUE;
+   case COND_FL: return GL_FALSE;
+   default:      return GL_TRUE;
+   }
+}
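+
+/* Example: test_cc(COND_EQ, COND_GE) is GL_TRUE since GE accepts GT or EQ,
+ * while test_cc(COND_UN, COND_GE) is GL_FALSE: an unordered (NaN) code
+ * only satisfies COND_NE, COND_TR and the default rule.
+ */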
+
+
+/**
+ * Evaluate the 4 condition codes against a predicate and return GL_TRUE
+ * or GL_FALSE to indicate result.
+ */
+static inline GLboolean
+eval_condition(const struct gl_program_machine *machine,
+               const struct prog_instruction *inst)
+{
+   const GLuint swizzle = inst->DstReg.CondSwizzle;
+   const GLuint condMask = inst->DstReg.CondMask;
+   if (test_cc(machine->CondCodes[GET_SWZ(swizzle, 0)], condMask) ||
+       test_cc(machine->CondCodes[GET_SWZ(swizzle, 1)], condMask) ||
+       test_cc(machine->CondCodes[GET_SWZ(swizzle, 2)], condMask) ||
+       test_cc(machine->CondCodes[GET_SWZ(swizzle, 3)], condMask)) {
+      return GL_TRUE;
+   }
+   else {
+      return GL_FALSE;
+   }
+}
+
+
+/**
+ * Store 4 floats into a register.  Observe the instruction's saturate and
+ * set-condition-code flags.
+ */
+static void
+store_vector4(const struct prog_instruction *inst,
+              struct gl_program_machine *machine, const GLfloat value[4])
+{
+   const struct prog_dst_register *dstReg = &(inst->DstReg);
+   const GLboolean clamp = inst->SaturateMode == SATURATE_ZERO_ONE;
+   GLuint writeMask = dstReg->WriteMask;
+   GLfloat clampedValue[4];
+   GLfloat *dst = get_dst_register_pointer(dstReg, machine);
+
+#if 0
+   if (value[0] > 1.0e10 ||
+       IS_INF_OR_NAN(value[0]) ||
+       IS_INF_OR_NAN(value[1]) ||
+       IS_INF_OR_NAN(value[2]) || IS_INF_OR_NAN(value[3]))
+      printf("store %g %g %g %g\n", value[0], value[1], value[2], value[3]);
+#endif
+
+   if (clamp) {
+      clampedValue[0] = CLAMP(value[0], 0.0F, 1.0F);
+      clampedValue[1] = CLAMP(value[1], 0.0F, 1.0F);
+      clampedValue[2] = CLAMP(value[2], 0.0F, 1.0F);
+      clampedValue[3] = CLAMP(value[3], 0.0F, 1.0F);
+      value = clampedValue;
+   }
+
+   if (dstReg->CondMask != COND_TR) {
+      /* condition codes may turn off some writes */
+      if (writeMask & WRITEMASK_X) {
+         if (!test_cc(machine->CondCodes[GET_SWZ(dstReg->CondSwizzle, 0)],
+                      dstReg->CondMask))
+            writeMask &= ~WRITEMASK_X;
+      }
+      if (writeMask & WRITEMASK_Y) {
+         if (!test_cc(machine->CondCodes[GET_SWZ(dstReg->CondSwizzle, 1)],
+                      dstReg->CondMask))
+            writeMask &= ~WRITEMASK_Y;
+      }
+      if (writeMask & WRITEMASK_Z) {
+         if (!test_cc(machine->CondCodes[GET_SWZ(dstReg->CondSwizzle, 2)],
+                      dstReg->CondMask))
+            writeMask &= ~WRITEMASK_Z;
+      }
+      if (writeMask & WRITEMASK_W) {
+         if (!test_cc(machine->CondCodes[GET_SWZ(dstReg->CondSwizzle, 3)],
+                      dstReg->CondMask))
+            writeMask &= ~WRITEMASK_W;
+      }
+   }
+
+#ifdef NAN_CHECK
+   assert(!IS_INF_OR_NAN(value[0]));
+   assert(!IS_INF_OR_NAN(value[1]));
+   assert(!IS_INF_OR_NAN(value[2]));
+   assert(!IS_INF_OR_NAN(value[3]));
+#endif
+
+   if (writeMask & WRITEMASK_X)
+      dst[0] = value[0];
+   if (writeMask & WRITEMASK_Y)
+      dst[1] = value[1];
+   if (writeMask & WRITEMASK_Z)
+      dst[2] = value[2];
+   if (writeMask & WRITEMASK_W)
+      dst[3] = value[3];
+
+   if (inst->CondUpdate) {
+      if (writeMask & WRITEMASK_X)
+         machine->CondCodes[0] = generate_cc(value[0]);
+      if (writeMask & WRITEMASK_Y)
+         machine->CondCodes[1] = generate_cc(value[1]);
+      if (writeMask & WRITEMASK_Z)
+         machine->CondCodes[2] = generate_cc(value[2]);
+      if (writeMask & WRITEMASK_W)
+         machine->CondCodes[3] = generate_cc(value[3]);
+#if DEBUG_PROG
+      printf("CondCodes=(%s,%s,%s,%s) for:\n",
+             _mesa_condcode_string(machine->CondCodes[0]),
+             _mesa_condcode_string(machine->CondCodes[1]),
+             _mesa_condcode_string(machine->CondCodes[2]),
+             _mesa_condcode_string(machine->CondCodes[3]));
+#endif
+   }
+}
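+
+/* Example: with WriteMask == (WRITEMASK_X | WRITEMASK_W) and SaturateMode ==
+ * SATURATE_ZERO_ONE, storing (-1.0, 5.0, 0.25, 2.0) leaves dst.y and dst.z
+ * untouched and writes dst.x = 0.0, dst.w = 1.0.
+ */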
+
+
+/**
+ * Store 4 uints into a register.  Observe the set-condition-code flags.
+ */
+static void
+store_vector4ui(const struct prog_instruction *inst,
+                struct gl_program_machine *machine, const GLuint value[4])
+{
+   const struct prog_dst_register *dstReg = &(inst->DstReg);
+   GLuint writeMask = dstReg->WriteMask;
+   GLuint *dst = (GLuint *) get_dst_register_pointer(dstReg, machine);
+
+   if (dstReg->CondMask != COND_TR) {
+      /* condition codes may turn off some writes */
+      if (writeMask & WRITEMASK_X) {
+         if (!test_cc(machine->CondCodes[GET_SWZ(dstReg->CondSwizzle, 0)],
+                      dstReg->CondMask))
+            writeMask &= ~WRITEMASK_X;
+      }
+      if (writeMask & WRITEMASK_Y) {
+         if (!test_cc(machine->CondCodes[GET_SWZ(dstReg->CondSwizzle, 1)],
+                      dstReg->CondMask))
+            writeMask &= ~WRITEMASK_Y;
+      }
+      if (writeMask & WRITEMASK_Z) {
+         if (!test_cc(machine->CondCodes[GET_SWZ(dstReg->CondSwizzle, 2)],
+                      dstReg->CondMask))
+            writeMask &= ~WRITEMASK_Z;
+      }
+      if (writeMask & WRITEMASK_W) {
+         if (!test_cc(machine->CondCodes[GET_SWZ(dstReg->CondSwizzle, 3)],
+                      dstReg->CondMask))
+            writeMask &= ~WRITEMASK_W;
+      }
+   }
+
+   if (writeMask & WRITEMASK_X)
+      dst[0] = value[0];
+   if (writeMask & WRITEMASK_Y)
+      dst[1] = value[1];
+   if (writeMask & WRITEMASK_Z)
+      dst[2] = value[2];
+   if (writeMask & WRITEMASK_W)
+      dst[3] = value[3];
+
+   if (inst->CondUpdate) {
+      if (writeMask & WRITEMASK_X)
+         machine->CondCodes[0] = generate_cc((float)value[0]);
+      if (writeMask & WRITEMASK_Y)
+         machine->CondCodes[1] = generate_cc((float)value[1]);
+      if (writeMask & WRITEMASK_Z)
+         machine->CondCodes[2] = generate_cc((float)value[2]);
+      if (writeMask & WRITEMASK_W)
+         machine->CondCodes[3] = generate_cc((float)value[3]);
+#if DEBUG_PROG
+      printf("CondCodes=(%s,%s,%s,%s) for:\n",
+             _mesa_condcode_string(machine->CondCodes[0]),
+             _mesa_condcode_string(machine->CondCodes[1]),
+             _mesa_condcode_string(machine->CondCodes[2]),
+             _mesa_condcode_string(machine->CondCodes[3]));
+#endif
+   }
+}
+
+
+/**
+ * Execute the given vertex/fragment program.
+ *
+ * \param ctx  rendering context
+ * \param program  the program to execute
+ * \param machine  machine state (must be initialized)
+ * \return GL_TRUE if program completed or GL_FALSE if program executed KIL.
+ */
+GLboolean
+_mesa_execute_program(struct gl_context * ctx,
+                      const struct gl_program *program,
+                      struct gl_program_machine *machine)
+{
+   const GLuint numInst = program->NumInstructions;
+   const GLuint maxExec = 65536;
+   GLuint pc, numExec = 0;
+
+   machine->CurProgram = program;
+
+   if (DEBUG_PROG) {
+      printf("execute program %u --------------------\n", program->Id);
+   }
+
+   if (program->Target == GL_VERTEX_PROGRAM_ARB) {
+      machine->EnvParams = ctx->VertexProgram.Parameters;
+   }
+   else {
+      machine->EnvParams = ctx->FragmentProgram.Parameters;
+   }
+
+   for (pc = 0; pc < numInst; pc++) {
+      const struct prog_instruction *inst = program->Instructions + pc;
+
+      if (DEBUG_PROG) {
+         _mesa_print_instruction(inst);
+      }
+
+      switch (inst->Opcode) {
+      case OPCODE_ABS:
+         {
+            GLfloat a[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            result[0] = FABSF(a[0]);
+            result[1] = FABSF(a[1]);
+            result[2] = FABSF(a[2]);
+            result[3] = FABSF(a[3]);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_ADD:
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = a[0] + b[0];
+            result[1] = a[1] + b[1];
+            result[2] = a[2] + b[2];
+            result[3] = a[3] + b[3];
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("ADD (%g %g %g %g) = (%g %g %g %g) + (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]);
+            }
+         }
+         break;
+      case OPCODE_ARL:
+         {
+            GLfloat t[4];
+            fetch_vector4(&inst->SrcReg[0], machine, t);
+            machine->AddressReg[0][0] = IFLOOR(t[0]);
+            if (DEBUG_PROG) {
+               printf("ARL %d\n", machine->AddressReg[0][0]);
+            }
+         }
+         break;
+      case OPCODE_BGNLOOP:
+         /* no-op */
+         ASSERT(program->Instructions[inst->BranchTarget].Opcode
+                == OPCODE_ENDLOOP);
+         break;
+      case OPCODE_ENDLOOP:
+         /* subtract 1 here since pc is incremented by the for(pc) loop */
+         ASSERT(program->Instructions[inst->BranchTarget].Opcode
+                == OPCODE_BGNLOOP);
+         pc = inst->BranchTarget - 1;   /* go to matching BGNLOOP */
+         break;
+      case OPCODE_BGNSUB:      /* begin subroutine */
+         break;
+      case OPCODE_ENDSUB:      /* end subroutine */
+         break;
+      case OPCODE_BRK:         /* break out of loop (conditional) */
+         ASSERT(program->Instructions[inst->BranchTarget].Opcode
+                == OPCODE_ENDLOOP);
+         if (eval_condition(machine, inst)) {
+            /* break out of loop */
+            /* pc++ at end of for-loop will put us after the ENDLOOP inst */
+            pc = inst->BranchTarget;
+         }
+         break;
+      case OPCODE_CONT:        /* continue loop (conditional) */
+         ASSERT(program->Instructions[inst->BranchTarget].Opcode
+                == OPCODE_ENDLOOP);
+         if (eval_condition(machine, inst)) {
+            /* continue at ENDLOOP */
+            /* Subtract 1 here since we'll do pc++ at end of for-loop */
+            pc = inst->BranchTarget - 1;
+         }
+         break;
+      case OPCODE_CAL:         /* Call subroutine (conditional) */
+         if (eval_condition(machine, inst)) {
+            /* call the subroutine */
+            if (machine->StackDepth >= MAX_PROGRAM_CALL_DEPTH) {
+               return GL_TRUE;  /* Per GL_NV_vertex_program2 spec */
+            }
+            machine->CallStack[machine->StackDepth++] = pc + 1; /* next inst */
+            /* Subtract 1 here since we'll do pc++ at end of for-loop */
+            pc = inst->BranchTarget - 1;
+         }
+         break;
+      case OPCODE_CMP:
+         {
+            GLfloat a[4], b[4], c[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            fetch_vector4(&inst->SrcReg[2], machine, c);
+            result[0] = a[0] < 0.0F ? b[0] : c[0];
+            result[1] = a[1] < 0.0F ? b[1] : c[1];
+            result[2] = a[2] < 0.0F ? b[2] : c[2];
+            result[3] = a[3] < 0.0F ? b[3] : c[3];
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("CMP (%g %g %g %g) = (%g %g %g %g) < 0 ? (%g %g %g %g) : (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3],
+                      b[0], b[1], b[2], b[3],
+                      c[0], c[1], c[2], c[3]);
+            }
+         }
+         break;
+      case OPCODE_COS:
+         {
+            GLfloat a[4], result[4];
+            fetch_vector1(&inst->SrcReg[0], machine, a);
+            result[0] = result[1] = result[2] = result[3]
+               = (GLfloat) cos(a[0]);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_DDX:         /* Partial derivative with respect to X */
+         {
+            GLfloat result[4];
+            fetch_vector4_deriv(ctx, &inst->SrcReg[0], machine,
+                                'X', result);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_DDY:         /* Partial derivative with respect to Y */
+         {
+            GLfloat result[4];
+            fetch_vector4_deriv(ctx, &inst->SrcReg[0], machine,
+                                'Y', result);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_DP2:
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = result[1] = result[2] = result[3] = DOT2(a, b);
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("DP2 %g = (%g %g) . (%g %g)\n",
+                      result[0], a[0], a[1], b[0], b[1]);
+            }
+         }
+         break;
+      case OPCODE_DP3:
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = result[1] = result[2] = result[3] = DOT3(a, b);
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("DP3 %g = (%g %g %g) . (%g %g %g)\n",
+                      result[0], a[0], a[1], a[2], b[0], b[1], b[2]);
+            }
+         }
+         break;
+      case OPCODE_DP4:
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = result[1] = result[2] = result[3] = DOT4(a, b);
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("DP4 %g = (%g, %g %g %g) . (%g, %g %g %g)\n",
+                      result[0], a[0], a[1], a[2], a[3],
+                      b[0], b[1], b[2], b[3]);
+            }
+         }
+         break;
+      case OPCODE_DPH:
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = result[1] = result[2] = result[3] = DOT3(a, b) + b[3];
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_DST:         /* Distance vector */
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = 1.0F;
+            result[1] = a[1] * b[1];
+            result[2] = a[2];
+            result[3] = b[3];
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_EXP:
+         {
+            GLfloat t[4], q[4], floor_t0;
+            fetch_vector1(&inst->SrcReg[0], machine, t);
+            floor_t0 = FLOORF(t[0]);
+            if (floor_t0 > FLT_MAX_EXP) {
+               SET_POS_INFINITY(q[0]);
+               SET_POS_INFINITY(q[2]);
+            }
+            else if (floor_t0 < FLT_MIN_EXP) {
+               q[0] = 0.0F;
+               q[2] = 0.0F;
+            }
+            else {
+               q[0] = LDEXPF(1.0, (int) floor_t0);
+               /* Note: GL_NV_vertex_program expects 
+                * result.z = result.x * APPX(result.y)
+                * We do what the ARB extension says.
+                */
+               q[2] = (GLfloat) pow(2.0, t[0]);
+            }
+            q[1] = t[0] - floor_t0;
+            q[3] = 1.0F;
+            store_vector4( inst, machine, q );
+         }
+         break;
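+      /* Example: for t.x == 2.5, OPCODE_EXP stores
+       * (2^2, 2.5 - 2, 2^2.5, 1) = (4.0, 0.5, ~5.657, 1.0).
+       */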
+      case OPCODE_EX2:         /* Exponential base 2 */
+         {
+            GLfloat a[4], result[4], val;
+            fetch_vector1(&inst->SrcReg[0], machine, a);
+            val = (GLfloat) pow(2.0, a[0]);
+            /*
+            if (IS_INF_OR_NAN(val))
+               val = 1.0e10;
+            */
+            result[0] = result[1] = result[2] = result[3] = val;
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_FLR:
+         {
+            GLfloat a[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            result[0] = FLOORF(a[0]);
+            result[1] = FLOORF(a[1]);
+            result[2] = FLOORF(a[2]);
+            result[3] = FLOORF(a[3]);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_FRC:
+         {
+            GLfloat a[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            result[0] = a[0] - FLOORF(a[0]);
+            result[1] = a[1] - FLOORF(a[1]);
+            result[2] = a[2] - FLOORF(a[2]);
+            result[3] = a[3] - FLOORF(a[3]);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_IF:
+         {
+            GLboolean cond;
+            ASSERT(program->Instructions[inst->BranchTarget].Opcode
+                   == OPCODE_ELSE ||
+                   program->Instructions[inst->BranchTarget].Opcode
+                   == OPCODE_ENDIF);
+            /* eval condition */
+            if (inst->SrcReg[0].File != PROGRAM_UNDEFINED) {
+               GLfloat a[4];
+               fetch_vector1(&inst->SrcReg[0], machine, a);
+               cond = (a[0] != 0.0);
+            }
+            else {
+               cond = eval_condition(machine, inst);
+            }
+            if (DEBUG_PROG) {
+               printf("IF: %d\n", cond);
+            }
+            /* do if/else */
+            if (cond) {
+               /* do if-clause (just continue execution) */
+            }
+            else {
+               /* go to the instruction after ELSE or ENDIF */
+               assert(inst->BranchTarget >= 0);
+               pc = inst->BranchTarget;
+            }
+         }
+         break;
+      case OPCODE_ELSE:
+         /* goto ENDIF */
+         ASSERT(program->Instructions[inst->BranchTarget].Opcode
+                == OPCODE_ENDIF);
+         assert(inst->BranchTarget >= 0);
+         pc = inst->BranchTarget;
+         break;
+      case OPCODE_ENDIF:
+         /* nothing */
+         break;
+      case OPCODE_KIL_NV:      /* NV_f_p only (conditional) */
+         if (eval_condition(machine, inst)) {
+            return GL_FALSE;
+         }
+         break;
+      case OPCODE_KIL:         /* ARB_f_p only */
+         {
+            GLfloat a[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            if (DEBUG_PROG) {
+               printf("KIL if (%g %g %g %g) <= 0.0\n",
+                      a[0], a[1], a[2], a[3]);
+            }
+
+            if (a[0] < 0.0F || a[1] < 0.0F || a[2] < 0.0F || a[3] < 0.0F) {
+               return GL_FALSE;
+            }
+         }
+         break;
+      case OPCODE_LG2:         /* log base 2 */
+         {
+            GLfloat a[4], result[4], val;
+            fetch_vector1(&inst->SrcReg[0], machine, a);
+            /* The fast LOG2 macro doesn't meet the precision requirements. */
+            if (a[0] == 0.0F) {
+               val = -FLT_MAX;
+            }
+            else {
+               val = (float)(log(a[0]) * 1.442695F);
+            }
+            result[0] = result[1] = result[2] = result[3] = val;
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_LIT:
+         {
+            const GLfloat epsilon = 1.0F / 256.0F;      /* from NV VP spec */
+            GLfloat a[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            a[0] = MAX2(a[0], 0.0F);
+            a[1] = MAX2(a[1], 0.0F);
+            /* XXX ARB version clamps a[3], NV version doesn't */
+            a[3] = CLAMP(a[3], -(128.0F - epsilon), (128.0F - epsilon));
+            result[0] = 1.0F;
+            result[1] = a[0];
+            /* XXX we could probably just use pow() here */
+            if (a[0] > 0.0F) {
+               if (a[1] == 0.0 && a[3] == 0.0)
+                  result[2] = 1.0F;
+               else
+                  result[2] = (GLfloat) pow(a[1], a[3]);
+            }
+            else {
+               result[2] = 0.0F;
+            }
+            result[3] = 1.0F;
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("LIT (%g %g %g %g) : (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3]);
+            }
+         }
+         break;
+      case OPCODE_LOG:
+         {
+            GLfloat t[4], q[4], abs_t0;
+            fetch_vector1(&inst->SrcReg[0], machine, t);
+            abs_t0 = FABSF(t[0]);
+            if (abs_t0 != 0.0F) {
+               if (IS_INF_OR_NAN(abs_t0))
+               {
+                  SET_POS_INFINITY(q[0]);
+                  q[1] = 1.0F;
+                  SET_POS_INFINITY(q[2]);
+               }
+               else {
+                  int exponent;
+                  GLfloat mantissa = FREXPF(t[0], &exponent);
+                  q[0] = (GLfloat) (exponent - 1);
+                  q[1] = (GLfloat) (2.0 * mantissa); /* map [.5, 1) -> [1, 2) */
+
+                  /* The fast LOG2 macro doesn't meet the precision
+                   * requirements.
+                   */
+                  q[2] = (float)(log(t[0]) * 1.442695F);
+               }
+            }
+            else {
+               SET_NEG_INFINITY(q[0]);
+               q[1] = 1.0F;
+               SET_NEG_INFINITY(q[2]);
+            }
+            q[3] = 1.0;
+            store_vector4(inst, machine, q);
+         }
+         break;
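+      /* Example: for t.x == 8.0, FREXPF() yields mantissa 0.5 and
+       * exponent 4, so OPCODE_LOG stores (3.0, 1.0, log2(8) = 3.0, 1.0).
+       */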
+      case OPCODE_LRP:
+         {
+            GLfloat a[4], b[4], c[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            fetch_vector4(&inst->SrcReg[2], machine, c);
+            result[0] = a[0] * b[0] + (1.0F - a[0]) * c[0];
+            result[1] = a[1] * b[1] + (1.0F - a[1]) * c[1];
+            result[2] = a[2] * b[2] + (1.0F - a[2]) * c[2];
+            result[3] = a[3] * b[3] + (1.0F - a[3]) * c[3];
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("LRP (%g %g %g %g) = (%g %g %g %g), "
+                      "(%g %g %g %g), (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3],
+                      b[0], b[1], b[2], b[3], c[0], c[1], c[2], c[3]);
+            }
+         }
+         break;
+      case OPCODE_MAD:
+         {
+            GLfloat a[4], b[4], c[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            fetch_vector4(&inst->SrcReg[2], machine, c);
+            result[0] = a[0] * b[0] + c[0];
+            result[1] = a[1] * b[1] + c[1];
+            result[2] = a[2] * b[2] + c[2];
+            result[3] = a[3] * b[3] + c[3];
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("MAD (%g %g %g %g) = (%g %g %g %g) * "
+                      "(%g %g %g %g) + (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3],
+                      b[0], b[1], b[2], b[3], c[0], c[1], c[2], c[3]);
+            }
+         }
+         break;
+      case OPCODE_MAX:
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = MAX2(a[0], b[0]);
+            result[1] = MAX2(a[1], b[1]);
+            result[2] = MAX2(a[2], b[2]);
+            result[3] = MAX2(a[3], b[3]);
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("MAX (%g %g %g %g) = (%g %g %g %g), (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]);
+            }
+         }
+         break;
+      case OPCODE_MIN:
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = MIN2(a[0], b[0]);
+            result[1] = MIN2(a[1], b[1]);
+            result[2] = MIN2(a[2], b[2]);
+            result[3] = MIN2(a[3], b[3]);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_MOV:
+         {
+            GLfloat result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, result);
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("MOV (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3]);
+            }
+         }
+         break;
+      case OPCODE_MUL:
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = a[0] * b[0];
+            result[1] = a[1] * b[1];
+            result[2] = a[2] * b[2];
+            result[3] = a[3] * b[3];
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("MUL (%g %g %g %g) = (%g %g %g %g) * (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]);
+            }
+         }
+         break;
+      case OPCODE_NOISE1:
+         {
+            GLfloat a[4], result[4];
+            fetch_vector1(&inst->SrcReg[0], machine, a);
+            result[0] =
+               result[1] =
+               result[2] =
+               result[3] = _mesa_noise1(a[0]);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_NOISE2:
+         {
+            GLfloat a[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            result[0] =
+               result[1] =
+               result[2] = result[3] = _mesa_noise2(a[0], a[1]);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_NOISE3:
+         {
+            GLfloat a[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            result[0] =
+               result[1] =
+               result[2] =
+               result[3] = _mesa_noise3(a[0], a[1], a[2]);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_NOISE4:
+         {
+            GLfloat a[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            result[0] =
+               result[1] =
+               result[2] =
+               result[3] = _mesa_noise4(a[0], a[1], a[2], a[3]);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_NOP:
+         break;
+      case OPCODE_PK2H:        /* pack two 16-bit floats in one 32-bit float */
+         {
+            GLfloat a[4];
+            GLuint result[4];
+            GLhalfNV hx, hy;
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            hx = _mesa_float_to_half(a[0]);
+            hy = _mesa_float_to_half(a[1]);
+            result[0] =
+            result[1] =
+            result[2] =
+            result[3] = hx | (hy << 16);
+            store_vector4ui(inst, machine, result);
+         }
+         break;
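+      /* Example: OPCODE_PK2H applied to (1.0, -2.0, x, x) packs the half
+       * floats 0x3C00 and 0xC000 into the 32-bit word 0xC0003C00.
+       */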
+      case OPCODE_PK2US:       /* pack two GLushorts into one 32-bit float */
+         {
+            GLfloat a[4];
+            GLuint result[4], usx, usy;
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            a[0] = CLAMP(a[0], 0.0F, 1.0F);
+            a[1] = CLAMP(a[1], 0.0F, 1.0F);
+            usx = F_TO_I(a[0] * 65535.0F);
+            usy = F_TO_I(a[1] * 65535.0F);
+            result[0] =
+            result[1] =
+            result[2] =
+            result[3] = usx | (usy << 16);
+            store_vector4ui(inst, machine, result);
+         }
+         break;
+      case OPCODE_PK4B:        /* pack four GLbytes into one 32-bit float */
+         {
+            GLfloat a[4];
+            GLuint result[4], ubx, uby, ubz, ubw;
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            a[0] = CLAMP(a[0], -128.0F / 127.0F, 1.0F);
+            a[1] = CLAMP(a[1], -128.0F / 127.0F, 1.0F);
+            a[2] = CLAMP(a[2], -128.0F / 127.0F, 1.0F);
+            a[3] = CLAMP(a[3], -128.0F / 127.0F, 1.0F);
+            ubx = F_TO_I(127.0F * a[0] + 128.0F);
+            uby = F_TO_I(127.0F * a[1] + 128.0F);
+            ubz = F_TO_I(127.0F * a[2] + 128.0F);
+            ubw = F_TO_I(127.0F * a[3] + 128.0F);
+            result[0] =
+            result[1] =
+            result[2] =
+            result[3] = ubx | (uby << 8) | (ubz << 16) | (ubw << 24);
+            store_vector4ui(inst, machine, result);
+         }
+         break;
+      case OPCODE_PK4UB:       /* pack four GLubytes into one 32-bit float */
+         {
+            GLfloat a[4];
+            GLuint result[4], ubx, uby, ubz, ubw;
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            a[0] = CLAMP(a[0], 0.0F, 1.0F);
+            a[1] = CLAMP(a[1], 0.0F, 1.0F);
+            a[2] = CLAMP(a[2], 0.0F, 1.0F);
+            a[3] = CLAMP(a[3], 0.0F, 1.0F);
+            ubx = F_TO_I(255.0F * a[0]);
+            uby = F_TO_I(255.0F * a[1]);
+            ubz = F_TO_I(255.0F * a[2]);
+            ubw = F_TO_I(255.0F * a[3]);
+            result[0] =
+            result[1] =
+            result[2] =
+            result[3] = ubx | (uby << 8) | (ubz << 16) | (ubw << 24);
+            store_vector4ui(inst, machine, result);
+         }
+         break;
+      case OPCODE_POW:
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector1(&inst->SrcReg[0], machine, a);
+            fetch_vector1(&inst->SrcReg[1], machine, b);
+            result[0] = result[1] = result[2] = result[3]
+               = (GLfloat) pow(a[0], b[0]);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_RCP:
+         {
+            GLfloat a[4], result[4];
+            fetch_vector1(&inst->SrcReg[0], machine, a);
+            if (DEBUG_PROG) {
+               if (a[0] == 0)
+                  printf("RCP(0)\n");
+               else if (IS_INF_OR_NAN(a[0]))
+                  printf("RCP(inf)\n");
+            }
+            result[0] = result[1] = result[2] = result[3] = 1.0F / a[0];
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_RET:         /* return from subroutine (conditional) */
+         if (eval_condition(machine, inst)) {
+            if (machine->StackDepth == 0) {
+               return GL_TRUE;  /* Per GL_NV_vertex_program2 spec */
+            }
+            /* subtract one because of pc++ in the for loop */
+            pc = machine->CallStack[--machine->StackDepth] - 1;
+         }
+         break;
+      case OPCODE_RFL:         /* reflection vector */
+         {
+            GLfloat axis[4], dir[4], result[4], tmpX, tmpW;
+            fetch_vector4(&inst->SrcReg[0], machine, axis);
+            fetch_vector4(&inst->SrcReg[1], machine, dir);
+            tmpW = DOT3(axis, axis);
+            tmpX = (2.0F * DOT3(axis, dir)) / tmpW;
+            result[0] = tmpX * axis[0] - dir[0];
+            result[1] = tmpX * axis[1] - dir[1];
+            result[2] = tmpX * axis[2] - dir[2];
+            /* result[3] is never written! XXX enforce in parser! */
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_RSQ:         /* 1 / sqrt() */
+         {
+            GLfloat a[4], result[4];
+            fetch_vector1(&inst->SrcReg[0], machine, a);
+            a[0] = FABSF(a[0]);
+            result[0] = result[1] = result[2] = result[3] = INV_SQRTF(a[0]);
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("RSQ %g = 1/sqrt(|%g|)\n", result[0], a[0]);
+            }
+         }
+         break;
+      case OPCODE_SCS:         /* sine and cosine */
+         {
+            GLfloat a[4], result[4];
+            fetch_vector1(&inst->SrcReg[0], machine, a);
+            result[0] = (GLfloat) cos(a[0]);
+            result[1] = (GLfloat) sin(a[0]);
+            result[2] = 0.0;    /* undefined! */
+            result[3] = 0.0;    /* undefined! */
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_SEQ:         /* set on equal */
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = (a[0] == b[0]) ? 1.0F : 0.0F;
+            result[1] = (a[1] == b[1]) ? 1.0F : 0.0F;
+            result[2] = (a[2] == b[2]) ? 1.0F : 0.0F;
+            result[3] = (a[3] == b[3]) ? 1.0F : 0.0F;
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("SEQ (%g %g %g %g) = (%g %g %g %g) == (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3],
+                      b[0], b[1], b[2], b[3]);
+            }
+         }
+         break;
+      case OPCODE_SFL:         /* set false, operands ignored */
+         {
+            static const GLfloat result[4] = { 0.0F, 0.0F, 0.0F, 0.0F };
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_SGE:         /* set on greater or equal */
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = (a[0] >= b[0]) ? 1.0F : 0.0F;
+            result[1] = (a[1] >= b[1]) ? 1.0F : 0.0F;
+            result[2] = (a[2] >= b[2]) ? 1.0F : 0.0F;
+            result[3] = (a[3] >= b[3]) ? 1.0F : 0.0F;
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("SGE (%g %g %g %g) = (%g %g %g %g) >= (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3],
+                      b[0], b[1], b[2], b[3]);
+            }
+         }
+         break;
+      case OPCODE_SGT:         /* set on greater */
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = (a[0] > b[0]) ? 1.0F : 0.0F;
+            result[1] = (a[1] > b[1]) ? 1.0F : 0.0F;
+            result[2] = (a[2] > b[2]) ? 1.0F : 0.0F;
+            result[3] = (a[3] > b[3]) ? 1.0F : 0.0F;
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("SGT (%g %g %g %g) = (%g %g %g %g) > (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3],
+                      b[0], b[1], b[2], b[3]);
+            }
+         }
+         break;
+      case OPCODE_SIN:
+         {
+            GLfloat a[4], result[4];
+            fetch_vector1(&inst->SrcReg[0], machine, a);
+            result[0] = result[1] = result[2] = result[3]
+               = (GLfloat) sin(a[0]);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_SLE:         /* set on less or equal */
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = (a[0] <= b[0]) ? 1.0F : 0.0F;
+            result[1] = (a[1] <= b[1]) ? 1.0F : 0.0F;
+            result[2] = (a[2] <= b[2]) ? 1.0F : 0.0F;
+            result[3] = (a[3] <= b[3]) ? 1.0F : 0.0F;
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("SLE (%g %g %g %g) = (%g %g %g %g) <= (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3],
+                      b[0], b[1], b[2], b[3]);
+            }
+         }
+         break;
+      case OPCODE_SLT:         /* set on less */
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = (a[0] < b[0]) ? 1.0F : 0.0F;
+            result[1] = (a[1] < b[1]) ? 1.0F : 0.0F;
+            result[2] = (a[2] < b[2]) ? 1.0F : 0.0F;
+            result[3] = (a[3] < b[3]) ? 1.0F : 0.0F;
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("SLT (%g %g %g %g) = (%g %g %g %g) < (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3],
+                      b[0], b[1], b[2], b[3]);
+            }
+         }
+         break;
+      case OPCODE_SNE:         /* set on not equal */
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = (a[0] != b[0]) ? 1.0F : 0.0F;
+            result[1] = (a[1] != b[1]) ? 1.0F : 0.0F;
+            result[2] = (a[2] != b[2]) ? 1.0F : 0.0F;
+            result[3] = (a[3] != b[3]) ? 1.0F : 0.0F;
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("SNE (%g %g %g %g) = (%g %g %g %g) != (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3],
+                      b[0], b[1], b[2], b[3]);
+            }
+         }
+         break;
+      case OPCODE_SSG:         /* set sign (-1, 0 or +1) */
+         {
+            GLfloat a[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            result[0] = (GLfloat) ((a[0] > 0.0F) - (a[0] < 0.0F));
+            result[1] = (GLfloat) ((a[1] > 0.0F) - (a[1] < 0.0F));
+            result[2] = (GLfloat) ((a[2] > 0.0F) - (a[2] < 0.0F));
+            result[3] = (GLfloat) ((a[3] > 0.0F) - (a[3] < 0.0F));
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_STR:         /* set true, operands ignored */
+         {
+            static const GLfloat result[4] = { 1.0F, 1.0F, 1.0F, 1.0F };
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_SUB:
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = a[0] - b[0];
+            result[1] = a[1] - b[1];
+            result[2] = a[2] - b[2];
+            result[3] = a[3] - b[3];
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("SUB (%g %g %g %g) = (%g %g %g %g) - (%g %g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]);
+            }
+         }
+         break;
+      case OPCODE_SWZ:         /* extended swizzle */
+         {
+            const struct prog_src_register *source = &inst->SrcReg[0];
+            const GLfloat *src = get_src_register_pointer(source, machine);
+            GLfloat result[4];
+            GLuint i;
+            for (i = 0; i < 4; i++) {
+               const GLuint swz = GET_SWZ(source->Swizzle, i);
+               if (swz == SWIZZLE_ZERO)
+                  result[i] = 0.0;
+               else if (swz == SWIZZLE_ONE)
+                  result[i] = 1.0;
+               else {
+                  ASSERT(swz >= 0);
+                  ASSERT(swz <= 3);
+                  result[i] = src[swz];
+               }
+               if (source->Negate & (1 << i))
+                  result[i] = -result[i];
+            }
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_TEX:         /* Both ARB and NV frag prog */
+         /* Simple texel lookup */
+         {
+            GLfloat texcoord[4], color[4];
+            fetch_vector4(&inst->SrcReg[0], machine, texcoord);
+
+            /* For TEX, texcoord.Q should not be used and its value should not
+             * matter (at most, we pass coord.xyz to texture3D() in GLSL).
+             * Set Q=1 so that FetchTexelDeriv() doesn't get a garbage value
+             * which is effectively what happens when the texcoord swizzle
+             * is .xyzz
+             */
+            texcoord[3] = 1.0f;
+
+            fetch_texel(ctx, machine, inst, texcoord, 0.0, color);
+
+            if (DEBUG_PROG) {
+               printf("TEX (%g, %g, %g, %g) = texture[%d][%g, %g, %g, %g]\n",
+                      color[0], color[1], color[2], color[3],
+                      inst->TexSrcUnit,
+                      texcoord[0], texcoord[1], texcoord[2], texcoord[3]);
+            }
+            store_vector4(inst, machine, color);
+         }
+         break;
+      case OPCODE_TXB:         /* GL_ARB_fragment_program only */
+         /* Texel lookup with LOD bias */
+         {
+            GLfloat texcoord[4], color[4], lodBias;
+
+            fetch_vector4(&inst->SrcReg[0], machine, texcoord);
+
+            /* texcoord[3] is the bias to add to lambda */
+            lodBias = texcoord[3];
+
+            fetch_texel(ctx, machine, inst, texcoord, lodBias, color);
+
+            if (DEBUG_PROG) {
+               printf("TXB (%g, %g, %g, %g) = texture[%d][%g %g %g %g]"
+                      "  bias %g\n",
+                      color[0], color[1], color[2], color[3],
+                      inst->TexSrcUnit,
+                      texcoord[0],
+                      texcoord[1],
+                      texcoord[2],
+                      texcoord[3],
+                      lodBias);
+            }
+
+            store_vector4(inst, machine, color);
+         }
+         break;
+      case OPCODE_TXD:         /* GL_NV_fragment_program only */
+         /* Texture lookup w/ partial derivatives for LOD */
+         {
+            GLfloat texcoord[4], dtdx[4], dtdy[4], color[4];
+            fetch_vector4(&inst->SrcReg[0], machine, texcoord);
+            fetch_vector4(&inst->SrcReg[1], machine, dtdx);
+            fetch_vector4(&inst->SrcReg[2], machine, dtdy);
+            machine->FetchTexelDeriv(ctx, texcoord, dtdx, dtdy,
+                                     0.0, /* lodBias */
+                                     inst->TexSrcUnit, color);
+            store_vector4(inst, machine, color);
+         }
+         break;
+      case OPCODE_TXL:
+         /* Texel lookup with explicit LOD */
+         {
+            GLfloat texcoord[4], color[4], lod;
+
+            fetch_vector4(&inst->SrcReg[0], machine, texcoord);
+
+            /* texcoord[3] is the LOD */
+            lod = texcoord[3];
+
+            machine->FetchTexelLod(ctx, texcoord, lod,
+                                   machine->Samplers[inst->TexSrcUnit], color);
+
+            store_vector4(inst, machine, color);
+         }
+         break;
+      case OPCODE_TXP:         /* GL_ARB_fragment_program only */
+         /* Texture lookup w/ projective divide */
+         {
+            GLfloat texcoord[4], color[4];
+
+            fetch_vector4(&inst->SrcReg[0], machine, texcoord);
+            /* Not so sure about this test - if texcoord[3] is
+             * zero, we'd probably be fine except for an ASSERT in
+             * IROUND_POS() which gets triggered by the inf values created.
+             */
+            if (texcoord[3] != 0.0) {
+               texcoord[0] /= texcoord[3];
+               texcoord[1] /= texcoord[3];
+               texcoord[2] /= texcoord[3];
+            }
+
+            fetch_texel(ctx, machine, inst, texcoord, 0.0, color);
+
+            store_vector4(inst, machine, color);
+         }
+         break;
+      case OPCODE_TXP_NV:      /* GL_NV_fragment_program only */
+         /* Texture lookup w/ projective divide, as above, but do not
+          * do the divide by w if sampling from a cube map.
+          */
+         {
+            GLfloat texcoord[4], color[4];
+
+            fetch_vector4(&inst->SrcReg[0], machine, texcoord);
+            if (inst->TexSrcTarget != TEXTURE_CUBE_INDEX &&
+                texcoord[3] != 0.0) {
+               texcoord[0] /= texcoord[3];
+               texcoord[1] /= texcoord[3];
+               texcoord[2] /= texcoord[3];
+            }
+
+            fetch_texel(ctx, machine, inst, texcoord, 0.0, color);
+
+            store_vector4(inst, machine, color);
+         }
+         break;
+      case OPCODE_TRUNC:       /* truncate toward zero */
+         {
+            GLfloat a[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            result[0] = (GLfloat) (GLint) a[0];
+            result[1] = (GLfloat) (GLint) a[1];
+            result[2] = (GLfloat) (GLint) a[2];
+            result[3] = (GLfloat) (GLint) a[3];
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_UP2H:        /* unpack two 16-bit floats */
+         {
+            const GLuint raw = fetch_vector1ui(&inst->SrcReg[0], machine);
+            GLfloat result[4];
+            GLushort hx, hy;
+            hx = raw & 0xffff;
+            hy = raw >> 16;
+            result[0] = result[2] = _mesa_half_to_float(hx);
+            result[1] = result[3] = _mesa_half_to_float(hy);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_UP2US:       /* unpack two GLushorts */
+         {
+            const GLuint raw = fetch_vector1ui(&inst->SrcReg[0], machine);
+            GLfloat result[4];
+            GLushort usx, usy;
+            usx = raw & 0xffff;
+            usy = raw >> 16;
+            result[0] = result[2] = usx * (1.0f / 65535.0f);
+            result[1] = result[3] = usy * (1.0f / 65535.0f);
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_UP4B:        /* unpack four GLbytes */
+         {
+            const GLuint raw = fetch_vector1ui(&inst->SrcReg[0], machine);
+            GLfloat result[4];
+            result[0] = (((raw >> 0) & 0xff) - 128) / 127.0F;
+            result[1] = (((raw >> 8) & 0xff) - 128) / 127.0F;
+            result[2] = (((raw >> 16) & 0xff) - 128) / 127.0F;
+            result[3] = (((raw >> 24) & 0xff) - 128) / 127.0F;
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_UP4UB:       /* unpack four GLubytes */
+         {
+            const GLuint raw = fetch_vector1ui(&inst->SrcReg[0], machine);
+            GLfloat result[4];
+            result[0] = ((raw >> 0) & 0xff) / 255.0F;
+            result[1] = ((raw >> 8) & 0xff) / 255.0F;
+            result[2] = ((raw >> 16) & 0xff) / 255.0F;
+            result[3] = ((raw >> 24) & 0xff) / 255.0F;
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_XPD:         /* cross product */
+         {
+            GLfloat a[4], b[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            result[0] = a[1] * b[2] - a[2] * b[1];
+            result[1] = a[2] * b[0] - a[0] * b[2];
+            result[2] = a[0] * b[1] - a[1] * b[0];
+            result[3] = 1.0;
+            store_vector4(inst, machine, result);
+            if (DEBUG_PROG) {
+               printf("XPD (%g %g %g %g) = (%g %g %g) X (%g %g %g)\n",
+                      result[0], result[1], result[2], result[3],
+                      a[0], a[1], a[2], b[0], b[1], b[2]);
+            }
+         }
+         break;
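+      /* Example: OPCODE_XPD on the unit axes gives (1,0,0) X (0,1,0) =
+       * (0, 0, 1, 1); the w component is always set to 1.0.
+       */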
+      case OPCODE_X2D:         /* 2-D matrix transform */
+         {
+            GLfloat a[4], b[4], c[4], result[4];
+            fetch_vector4(&inst->SrcReg[0], machine, a);
+            fetch_vector4(&inst->SrcReg[1], machine, b);
+            fetch_vector4(&inst->SrcReg[2], machine, c);
+            result[0] = a[0] + b[0] * c[0] + b[1] * c[1];
+            result[1] = a[1] + b[0] * c[2] + b[1] * c[3];
+            result[2] = a[2] + b[0] * c[0] + b[1] * c[1];
+            result[3] = a[3] + b[0] * c[2] + b[1] * c[3];
+            store_vector4(inst, machine, result);
+         }
+         break;
+      case OPCODE_END:
+         return GL_TRUE;
+      default:
+         _mesa_problem(ctx, "Bad opcode %d in _mesa_execute_program",
+                       inst->Opcode);
+         return GL_TRUE;        /* return value doesn't matter */
+      }
+
+      numExec++;
+      if (numExec > maxExec) {
+         static GLboolean reported = GL_FALSE;
+         if (!reported) {
+            _mesa_problem(ctx, "Infinite loop detected in fragment program");
+            reported = GL_TRUE;
+         }
+         return GL_TRUE;
+      }
+
+   } /* for pc */
+
+   return GL_TRUE;
+}
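+
+
+#if 0
+/* Minimal usage sketch.  The my_fetch_lod / my_fetch_deriv callbacks are
+ * hypothetical placeholders for driver-provided texel fetch functions;
+ * real callers also initialize Attribs/VertAttribs, Samplers and the
+ * rest of the machine state before executing.
+ */
+static GLboolean
+run_program_example(struct gl_context *ctx, const struct gl_program *prog)
+{
+   struct gl_program_machine machine;
+
+   memset(&machine, 0, sizeof(machine));
+   machine.FetchTexelLod = my_fetch_lod;     /* hypothetical callback */
+   machine.FetchTexelDeriv = my_fetch_deriv; /* hypothetical callback */
+
+   /* GL_FALSE is returned only when the program executes KIL */
+   return _mesa_execute_program(ctx, prog, &machine);
+}
+#endif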
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_execute.h b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_execute.h
new file mode 100644
index 0000000..09542bf
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_execute.h
@@ -0,0 +1,91 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef PROG_EXECUTE_H
+#define PROG_EXECUTE_H
+
+#include "main/config.h"
+#include "main/mtypes.h"
+
+
+typedef void (*FetchTexelLodFunc)(struct gl_context *ctx, const GLfloat texcoord[4],
+                                  GLfloat lambda, GLuint unit, GLfloat color[4]);
+
+typedef void (*FetchTexelDerivFunc)(struct gl_context *ctx, const GLfloat texcoord[4],
+                                    const GLfloat texdx[4],
+                                    const GLfloat texdy[4],
+                                    GLfloat lodBias,
+                                    GLuint unit, GLfloat color[4]);
+
+
+/** NOTE: This must match SWRAST_MAX_WIDTH */
+#define PROG_MAX_WIDTH 16384
+
+
+/**
+ * Virtual machine state used during execution of vertex/fragment programs.
+ */
+struct gl_program_machine
+{
+   const struct gl_program *CurProgram;
+
+   /** Fragment Input attributes */
+   GLfloat (*Attribs)[PROG_MAX_WIDTH][4];
+   GLfloat (*DerivX)[4];
+   GLfloat (*DerivY)[4];
+   GLuint NumDeriv; /**< Max index into DerivX/Y arrays */
+   GLuint CurElement; /**< Index into Attribs arrays */
+
+   /** Vertex Input attribs */
+   GLfloat VertAttribs[VERT_ATTRIB_MAX][4];
+
+   GLfloat Temporaries[MAX_PROGRAM_TEMPS][4];
+   GLfloat Outputs[MAX_PROGRAM_OUTPUTS][4];
+   GLfloat (*EnvParams)[4]; /**< Vertex or Fragment env parameters */
+   GLuint CondCodes[4];  /**< COND_* value for x/y/z/w */
+   GLint AddressReg[MAX_PROGRAM_ADDRESS_REGS][4];
+   GLfloat SystemValues[SYSTEM_VALUE_MAX][4];
+
+   const GLubyte *Samplers;  /**< Array mapping sampler var to tex unit */
+
+   GLuint CallStack[MAX_PROGRAM_CALL_DEPTH]; /**< For CAL/RET instructions */
+   GLuint StackDepth; /**< Index/ptr to top of CallStack[] */
+
+   /** Texture fetch functions */
+   FetchTexelLodFunc FetchTexelLod;
+   FetchTexelDerivFunc FetchTexelDeriv;
+};
+
+
+extern void
+_mesa_get_program_register(struct gl_context *ctx, gl_register_file file,
+                           GLuint index, GLfloat val[4]);
+
+extern GLboolean
+_mesa_execute_program(struct gl_context *ctx,
+                      const struct gl_program *program,
+                      struct gl_program_machine *machine);
+
+
+#endif /* PROG_EXECUTE_H */
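As a usage illustration (not part of this change): a minimal, hypothetical sketch of driving the interpreter declared above. It assumes an initialized gl_context and a compiled gl_program; run_program_once is an invented name, and a real caller would also wire up Attribs, DerivX/DerivY and the FetchTexel* callbacks before executing.

#include <string.h>
#include "program/prog_execute.h"

static GLboolean
run_program_once(struct gl_context *ctx, const struct gl_program *prog)
{
   struct gl_program_machine machine;

   /* Zero Temporaries, Outputs, CallStack, etc., then point the
    * machine at the program to run. */
   memset(&machine, 0, sizeof(machine));
   machine.CurProgram = prog;

   /* Input attributes and texture-fetch hooks omitted in this sketch. */
   return _mesa_execute_program(ctx, prog, &machine);
}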
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_hash_table.c b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_hash_table.c
new file mode 100644
index 0000000..f45ed46
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_hash_table.c
@@ -0,0 +1,235 @@
+/*
+ * Copyright © 2008 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file prog_hash_table.c
+ * \brief Implementation of a generic, opaque hash table data type.
+ *
+ * \author Ian Romanick <ian.d.romanick@intel.com>
+ */
+
+#include "main/imports.h"
+#include "main/simple_list.h"
+#include "hash_table.h"
+
+struct node {
+   struct node *next;
+   struct node *prev;
+};
+
+struct hash_table {
+    hash_func_t    hash;
+    hash_compare_func_t  compare;
+
+    unsigned num_buckets;
+    struct node buckets[1];
+};
+
+
+struct hash_node {
+    struct node link;
+    const void *key;
+    void *data;
+};
+
+
+struct hash_table *
+hash_table_ctor(unsigned num_buckets, hash_func_t hash,
+                hash_compare_func_t compare)
+{
+    struct hash_table *ht;
+    unsigned i;
+
+
+    if (num_buckets < 16) {
+        num_buckets = 16;
+    }
+
+    ht = malloc(sizeof(*ht) +
+                ((num_buckets - 1) * sizeof(ht->buckets[0])));
+    if (ht != NULL) {
+        ht->hash = hash;
+        ht->compare = compare;
+        ht->num_buckets = num_buckets;
+
+        for (i = 0; i < num_buckets; i++) {
+            make_empty_list(& ht->buckets[i]);
+        }
+    }
+
+    return ht;
+}
+
+
+void
+hash_table_dtor(struct hash_table *ht)
+{
+   hash_table_clear(ht);
+   free(ht);
+}
+
+
+void
+hash_table_clear(struct hash_table *ht)
+{
+   struct node *node;
+   struct node *temp;
+   unsigned i;
+
+
+   for (i = 0; i < ht->num_buckets; i++) {
+      foreach_s(node, temp, & ht->buckets[i]) {
+	 remove_from_list(node);
+	 free(node);
+      }
+
+      assert(is_empty_list(& ht->buckets[i]));
+   }
+}
+
+
+static struct hash_node *
+get_node(struct hash_table *ht, const void *key)
+{
+    const unsigned hash_value = (*ht->hash)(key);
+    const unsigned bucket = hash_value % ht->num_buckets;
+    struct node *node;
+
+    foreach(node, & ht->buckets[bucket]) {
+       struct hash_node *hn = (struct hash_node *) node;
+
+       if ((*ht->compare)(hn->key, key) == 0) {
+	  return hn;
+       }
+    }
+
+    return NULL;
+}
+
+void *
+hash_table_find(struct hash_table *ht, const void *key)
+{
+   struct hash_node *hn = get_node(ht, key);
+
+   return (hn == NULL) ? NULL : hn->data;
+}
+
+void
+hash_table_insert(struct hash_table *ht, void *data, const void *key)
+{
+    const unsigned hash_value = (*ht->hash)(key);
+    const unsigned bucket = hash_value % ht->num_buckets;
+    struct hash_node *node;
+
+    node = calloc(1, sizeof(*node));
+    if (node == NULL)
+        return;
+
+    node->data = data;
+    node->key = key;
+
+    insert_at_head(& ht->buckets[bucket], & node->link);
+}
+
+bool
+hash_table_replace(struct hash_table *ht, void *data, const void *key)
+{
+    const unsigned hash_value = (*ht->hash)(key);
+    const unsigned bucket = hash_value % ht->num_buckets;
+    struct node *node;
+    struct hash_node *hn;
+
+    foreach(node, & ht->buckets[bucket]) {
+       hn = (struct hash_node *) node;
+
+       if ((*ht->compare)(hn->key, key) == 0) {
+	  hn->data = data;
+	  return true;
+       }
+    }
+
+    hn = calloc(1, sizeof(*hn));
+    if (hn == NULL)
+        return false;
+
+    hn->data = data;
+    hn->key = key;
+
+    insert_at_head(& ht->buckets[bucket], & hn->link);
+    return false;
+}
+
+void
+hash_table_remove(struct hash_table *ht, const void *key)
+{
+   struct node *node = (struct node *) get_node(ht, key);
+   if (node != NULL) {
+      remove_from_list(node);
+      free(node);
+      return;
+   }
+}
+
+void
+hash_table_call_foreach(struct hash_table *ht,
+			void (*callback)(const void *key,
+					 void *data,
+					 void *closure),
+			void *closure)
+{
+   unsigned bucket;
+
+   for (bucket = 0; bucket < ht->num_buckets; bucket++) {
+      struct node *node, *temp;
+      foreach_s(node, temp, &ht->buckets[bucket]) {
+	 struct hash_node *hn = (struct hash_node *) node;
+
+	 callback(hn->key, hn->data, closure);
+      }
+   }
+}
+
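+/* Bernstein's classic "djb2" string hash: hash = hash * 33 + c, seeded
+ * with 5381.
+ */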
+unsigned
+hash_table_string_hash(const void *key)
+{
+    const char *str = (const char *) key;
+    unsigned hash = 5381;
+
+
+    while (*str != '\0') {
+        hash = (hash * 33) + *str;
+        str++;
+    }
+
+    return hash;
+}
+
+
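+/* malloc'ed pointers are at least sizeof(void *) aligned, so their low
+ * bits are almost always zero; dividing them away keeps consecutive
+ * allocations from all landing in the same bucket.
+ */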
+unsigned
+hash_table_pointer_hash(const void *key)
+{
+   return (unsigned)((uintptr_t) key / sizeof(void *));
+}
+
+
+int
+hash_table_pointer_compare(const void *key1, const void *key2)
+{
+   return key1 == key2 ? 0 : 1;
+}
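As a usage illustration (not part of this change), a hypothetical sketch of the API above. string_compare and hash_table_demo are invented names; hash_compare_func_t follows the strcmp convention of returning 0 on a match, and note that the table stores key pointers rather than copying key strings.

#include <assert.h>
#include <string.h>
#include "hash_table.h"

static int
string_compare(const void *a, const void *b)
{
   return strcmp((const char *) a, (const char *) b);
}

static void
hash_table_demo(void)
{
   struct hash_table *ht =
      hash_table_ctor(32, hash_table_string_hash, string_compare);
   int value = 42;

   hash_table_insert(ht, &value, "answer");
   assert(hash_table_find(ht, "answer") == &value);   /* key found */
   assert(hash_table_find(ht, "missing") == NULL);    /* key absent */
   hash_table_dtor(ht);
}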
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_instruction.c b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_instruction.c
new file mode 100644
index 0000000..dcfedb7
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_instruction.c
@@ -0,0 +1,334 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ * Copyright (C) 1999-2009  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#include "main/glheader.h"
+#include "main/imports.h"
+#include "main/mtypes.h"
+#include "prog_instruction.h"
+
+
+/**
+ * Initialize program instruction fields to defaults.
+ * \param inst  first instruction to initialize
+ * \param count  number of instructions to initialize
+ */
+void
+_mesa_init_instructions(struct prog_instruction *inst, GLuint count)
+{
+   GLuint i;
+
+   memset(inst, 0, count * sizeof(struct prog_instruction));
+
+   for (i = 0; i < count; i++) {
+      inst[i].SrcReg[0].File = PROGRAM_UNDEFINED;
+      inst[i].SrcReg[0].Swizzle = SWIZZLE_NOOP;
+      inst[i].SrcReg[1].File = PROGRAM_UNDEFINED;
+      inst[i].SrcReg[1].Swizzle = SWIZZLE_NOOP;
+      inst[i].SrcReg[2].File = PROGRAM_UNDEFINED;
+      inst[i].SrcReg[2].Swizzle = SWIZZLE_NOOP;
+
+      inst[i].DstReg.File = PROGRAM_UNDEFINED;
+      inst[i].DstReg.WriteMask = WRITEMASK_XYZW;
+      inst[i].DstReg.CondMask = COND_TR;
+      inst[i].DstReg.CondSwizzle = SWIZZLE_NOOP;
+
+      inst[i].SaturateMode = SATURATE_OFF;
+      inst[i].Precision = FLOAT32;
+   }
+}
+
+
+/**
+ * Allocate an array of program instructions.
+ * \param numInst  number of instructions
+ * \return pointer to instruction memory
+ */
+struct prog_instruction *
+_mesa_alloc_instructions(GLuint numInst)
+{
+   return calloc(1, numInst * sizeof(struct prog_instruction));
+}
+
+
+/**
+ * Reallocate memory storing an array of program instructions.
+ * This is used when we need to append additional instructions onto a
+ * program.
+ * \param oldInst  pointer to first of old/src instructions
+ * \param numOldInst  number of instructions at <oldInst>
+ * \param numNewInst  desired size of new instruction array.
+ * \return  pointer to start of new instruction array.
+ */
+struct prog_instruction *
+_mesa_realloc_instructions(struct prog_instruction *oldInst,
+                           GLuint numOldInst, GLuint numNewInst)
+{
+   struct prog_instruction *newInst;
+
+   newInst = (struct prog_instruction *)
+      _mesa_realloc(oldInst,
+                    numOldInst * sizeof(struct prog_instruction),
+                    numNewInst * sizeof(struct prog_instruction));
+
+   return newInst;
+}
+
+
+/**
+ * Copy an array of program instructions.
+ * \param dest  pointer to destination.
+ * \param src  pointer to source.
+ * \param n  number of instructions to copy.
+ * \return pointer to destination.
+ */
+struct prog_instruction *
+_mesa_copy_instructions(struct prog_instruction *dest,
+                        const struct prog_instruction *src, GLuint n)
+{
+   GLuint i;
+   memcpy(dest, src, n * sizeof(struct prog_instruction));
+   for (i = 0; i < n; i++) {
+      if (src[i].Comment)
+         dest[i].Comment = _mesa_strdup(src[i].Comment);
+   }
+   return dest;
+}
+
+
+/**
+ * Free an array of instructions
+ */
+void
+_mesa_free_instructions(struct prog_instruction *inst, GLuint count)
+{
+   GLuint i;
+   for (i = 0; i < count; i++) {
+      free((char *)inst[i].Comment);
+   }
+   free(inst);
+}
+
+
+/**
+ * Basic info about each instruction
+ */
+struct instruction_info
+{
+   gl_inst_opcode Opcode;
+   const char *Name;
+   GLuint NumSrcRegs;
+   GLuint NumDstRegs;
+};
+
+/**
+ * Instruction info
+ * \note Opcode should equal array index!
+ */
+static const struct instruction_info InstInfo[MAX_OPCODE] = {
+   { OPCODE_NOP,    "NOP",     0, 0 },
+   { OPCODE_ABS,    "ABS",     1, 1 },
+   { OPCODE_ADD,    "ADD",     2, 1 },
+   { OPCODE_ARL,    "ARL",     1, 1 },
+   { OPCODE_BGNLOOP,"BGNLOOP", 0, 0 },
+   { OPCODE_BGNSUB, "BGNSUB",  0, 0 },
+   { OPCODE_BRK,    "BRK",     0, 0 },
+   { OPCODE_CAL,    "CAL",     0, 0 },
+   { OPCODE_CMP,    "CMP",     3, 1 },
+   { OPCODE_CONT,   "CONT",    0, 0 },
+   { OPCODE_COS,    "COS",     1, 1 },
+   { OPCODE_DDX,    "DDX",     1, 1 },
+   { OPCODE_DDY,    "DDY",     1, 1 },
+   { OPCODE_DP2,    "DP2",     2, 1 },
+   { OPCODE_DP3,    "DP3",     2, 1 },
+   { OPCODE_DP4,    "DP4",     2, 1 },
+   { OPCODE_DPH,    "DPH",     2, 1 },
+   { OPCODE_DST,    "DST",     2, 1 },
+   { OPCODE_ELSE,   "ELSE",    0, 0 },
+   { OPCODE_END,    "END",     0, 0 },
+   { OPCODE_ENDIF,  "ENDIF",   0, 0 },
+   { OPCODE_ENDLOOP,"ENDLOOP", 0, 0 },
+   { OPCODE_ENDSUB, "ENDSUB",  0, 0 },
+   { OPCODE_EX2,    "EX2",     1, 1 },
+   { OPCODE_EXP,    "EXP",     1, 1 },
+   { OPCODE_FLR,    "FLR",     1, 1 },
+   { OPCODE_FRC,    "FRC",     1, 1 },
+   { OPCODE_IF,     "IF",      1, 0 },
+   { OPCODE_KIL,    "KIL",     1, 0 },
+   { OPCODE_KIL_NV, "KIL_NV",  0, 0 },
+   { OPCODE_LG2,    "LG2",     1, 1 },
+   { OPCODE_LIT,    "LIT",     1, 1 },
+   { OPCODE_LOG,    "LOG",     1, 1 },
+   { OPCODE_LRP,    "LRP",     3, 1 },
+   { OPCODE_MAD,    "MAD",     3, 1 },
+   { OPCODE_MAX,    "MAX",     2, 1 },
+   { OPCODE_MIN,    "MIN",     2, 1 },
+   { OPCODE_MOV,    "MOV",     1, 1 },
+   { OPCODE_MUL,    "MUL",     2, 1 },
+   { OPCODE_NOISE1, "NOISE1",  1, 1 },
+   { OPCODE_NOISE2, "NOISE2",  1, 1 },
+   { OPCODE_NOISE3, "NOISE3",  1, 1 },
+   { OPCODE_NOISE4, "NOISE4",  1, 1 },
+   { OPCODE_PK2H,   "PK2H",    1, 1 },
+   { OPCODE_PK2US,  "PK2US",   1, 1 },
+   { OPCODE_PK4B,   "PK4B",    1, 1 },
+   { OPCODE_PK4UB,  "PK4UB",   1, 1 },
+   { OPCODE_POW,    "POW",     2, 1 },
+   { OPCODE_RCP,    "RCP",     1, 1 },
+   { OPCODE_RET,    "RET",     0, 0 },
+   { OPCODE_RFL,    "RFL",     1, 1 },
+   { OPCODE_RSQ,    "RSQ",     1, 1 },
+   { OPCODE_SCS,    "SCS",     1, 1 },
+   { OPCODE_SEQ,    "SEQ",     2, 1 },
+   { OPCODE_SFL,    "SFL",     0, 1 },
+   { OPCODE_SGE,    "SGE",     2, 1 },
+   { OPCODE_SGT,    "SGT",     2, 1 },
+   { OPCODE_SIN,    "SIN",     1, 1 },
+   { OPCODE_SLE,    "SLE",     2, 1 },
+   { OPCODE_SLT,    "SLT",     2, 1 },
+   { OPCODE_SNE,    "SNE",     2, 1 },
+   { OPCODE_SSG,    "SSG",     1, 1 },
+   { OPCODE_STR,    "STR",     0, 1 },
+   { OPCODE_SUB,    "SUB",     2, 1 },
+   { OPCODE_SWZ,    "SWZ",     1, 1 },
+   { OPCODE_TEX,    "TEX",     1, 1 },
+   { OPCODE_TXB,    "TXB",     1, 1 },
+   { OPCODE_TXD,    "TXD",     3, 1 },
+   { OPCODE_TXL,    "TXL",     1, 1 },
+   { OPCODE_TXP,    "TXP",     1, 1 },
+   { OPCODE_TXP_NV, "TXP_NV",  1, 1 },
+   { OPCODE_TRUNC,  "TRUNC",   1, 1 },
+   { OPCODE_UP2H,   "UP2H",    1, 1 },
+   { OPCODE_UP2US,  "UP2US",   1, 1 },
+   { OPCODE_UP4B,   "UP4B",    1, 1 },
+   { OPCODE_UP4UB,  "UP4UB",   1, 1 },
+   { OPCODE_X2D,    "X2D",     3, 1 },
+   { OPCODE_XPD,    "XPD",     2, 1 }
+};
+
+
+/**
+ * Return the number of src registers for the given instruction/opcode.
+ */
+GLuint
+_mesa_num_inst_src_regs(gl_inst_opcode opcode)
+{
+   ASSERT(opcode < MAX_OPCODE);
+   ASSERT(opcode == InstInfo[opcode].Opcode);
+   ASSERT(OPCODE_XPD == InstInfo[OPCODE_XPD].Opcode);
+   return InstInfo[opcode].NumSrcRegs;
+}
+
+
+/**
+ * Return the number of dst registers for the given instruction/opcode.
+ */
+GLuint
+_mesa_num_inst_dst_regs(gl_inst_opcode opcode)
+{
+   ASSERT(opcode < MAX_OPCODE);
+   ASSERT(opcode == InstInfo[opcode].Opcode);
+   ASSERT(OPCODE_XPD == InstInfo[OPCODE_XPD].Opcode);
+   return InstInfo[opcode].NumDstRegs;
+}
+
+
+GLboolean
+_mesa_is_tex_instruction(gl_inst_opcode opcode)
+{
+   return (opcode == OPCODE_TEX ||
+           opcode == OPCODE_TXB ||
+           opcode == OPCODE_TXD ||
+           opcode == OPCODE_TXL ||
+           opcode == OPCODE_TXP);
+}
+
+
+/**
+ * Check if there's a potential src/dst register data dependency when
+ * using SOA execution.
+ * Example:
+ *   MOV T, T.yxwz;
+ * This would expand into:
+ *   MOV t0, t1;
+ *   MOV t1, t0;
+ *   MOV t2, t3;
+ *   MOV t3, t2;
+ * The second instruction would read the freshly written t0, not the original value, if executed as-is.
+ */
+GLboolean
+_mesa_check_soa_dependencies(const struct prog_instruction *inst)
+{
+   GLuint i, chan;
+
+   if (inst->DstReg.WriteMask == WRITEMASK_X ||
+       inst->DstReg.WriteMask == WRITEMASK_Y ||
+       inst->DstReg.WriteMask == WRITEMASK_Z ||
+       inst->DstReg.WriteMask == WRITEMASK_W ||
+       inst->DstReg.WriteMask == 0x0) {
+      /* no chance of data dependency */
+      return GL_FALSE;
+   }
+
+   /* loop over src regs */
+   for (i = 0; i < 3; i++) {
+      if (inst->SrcReg[i].File == inst->DstReg.File &&
+          inst->SrcReg[i].Index == inst->DstReg.Index) {
+         /* loop over dest channels */
+         GLuint channelsWritten = 0x0;
+         for (chan = 0; chan < 4; chan++) {
+            if (inst->DstReg.WriteMask & (1 << chan)) {
+               /* check if we're reading a channel that's been written */
+               GLuint swizzle = GET_SWZ(inst->SrcReg[i].Swizzle, chan);
+               if (swizzle <= SWIZZLE_W &&
+                   (channelsWritten & (1 << swizzle))) {
+                  return GL_TRUE;
+               }
+
+               channelsWritten |= (1 << chan);
+            }
+         }
+      }
+   }
+   return GL_FALSE;
+}
+
+
+/**
+ * Return string name for given program opcode.
+ */
+const char *
+_mesa_opcode_string(gl_inst_opcode opcode)
+{
+   if (opcode < MAX_OPCODE)
+      return InstInfo[opcode].Name;
+   else {
+      static char s[20];
+      _mesa_snprintf(s, sizeof(s), "OP%u", opcode);
+      return s;
+   }
+}
+
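As a usage illustration (not part of this change), a hypothetical sketch of the lifecycle helpers above; instruction_demo is an invented name and allocation-failure handling is omitted.

#include <assert.h>
#include "prog_instruction.h"

static void
instruction_demo(void)
{
   struct prog_instruction *insts = _mesa_alloc_instructions(2);

   _mesa_init_instructions(insts, 2);   /* WRITEMASK_XYZW, COND_TR, ... */
   insts[0].Opcode = OPCODE_MOV;
   insts[1].Opcode = OPCODE_END;

   /* The InstInfo table answers per-opcode operand-count queries. */
   assert(_mesa_num_inst_src_regs(OPCODE_MOV) == 1);
   assert(_mesa_num_inst_dst_regs(OPCODE_END) == 0);

   _mesa_free_instructions(insts, 2);
}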
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_instruction.h b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_instruction.h
new file mode 100644
index 0000000..b9604e5
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_instruction.h
@@ -0,0 +1,430 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+/**
+ * \file prog_instruction.h
+ *
+ * Vertex/fragment program instruction datatypes and constants.
+ *
+ * \author Brian Paul
+ * \author Keith Whitwell
+ * \author Ian Romanick <idr@us.ibm.com>
+ */
+
+
+#ifndef PROG_INSTRUCTION_H
+#define PROG_INSTRUCTION_H
+
+
+#include "main/glheader.h"
+
+
+/**
+ * Swizzle indexes.
+ * Do not change!
+ */
+/*@{*/
+#define SWIZZLE_X    0
+#define SWIZZLE_Y    1
+#define SWIZZLE_Z    2
+#define SWIZZLE_W    3
+#define SWIZZLE_ZERO 4   /**< For SWZ instruction only */
+#define SWIZZLE_ONE  5   /**< For SWZ instruction only */
+#define SWIZZLE_NIL  7   /**< used during shader code gen (undefined value) */
+/*@}*/
+
+#define MAKE_SWIZZLE4(a,b,c,d) (((a)<<0) | ((b)<<3) | ((c)<<6) | ((d)<<9))
+#define SWIZZLE_NOOP           MAKE_SWIZZLE4(0,1,2,3)
+#define GET_SWZ(swz, idx)      (((swz) >> ((idx)*3)) & 0x7)
+#define GET_BIT(msk, idx)      (((msk) >> (idx)) & 0x1)
+
+#define SWIZZLE_XYZW MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W)
+#define SWIZZLE_XXXX MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_X, SWIZZLE_X, SWIZZLE_X)
+#define SWIZZLE_YYYY MAKE_SWIZZLE4(SWIZZLE_Y, SWIZZLE_Y, SWIZZLE_Y, SWIZZLE_Y)
+#define SWIZZLE_ZZZZ MAKE_SWIZZLE4(SWIZZLE_Z, SWIZZLE_Z, SWIZZLE_Z, SWIZZLE_Z)
+#define SWIZZLE_WWWW MAKE_SWIZZLE4(SWIZZLE_W, SWIZZLE_W, SWIZZLE_W, SWIZZLE_W)
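+
+/* Example: the swizzle ".yxwz" packs as MAKE_SWIZZLE4(1,0,3,2)
+ *   = (1<<0) | (0<<3) | (3<<6) | (2<<9) = 0x4c1,
+ * and GET_SWZ(0x4c1, 2) = (0x4c1 >> 6) & 0x7 = 3 = SWIZZLE_W.
+ */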
+
+
+/**
+ * Writemask values, 1 bit per component.
+ */
+/*@{*/
+#define WRITEMASK_X     0x1
+#define WRITEMASK_Y     0x2
+#define WRITEMASK_XY    0x3
+#define WRITEMASK_Z     0x4
+#define WRITEMASK_XZ    0x5
+#define WRITEMASK_YZ    0x6
+#define WRITEMASK_XYZ   0x7
+#define WRITEMASK_W     0x8
+#define WRITEMASK_XW    0x9
+#define WRITEMASK_YW    0xa
+#define WRITEMASK_XYW   0xb
+#define WRITEMASK_ZW    0xc
+#define WRITEMASK_XZW   0xd
+#define WRITEMASK_YZW   0xe
+#define WRITEMASK_XYZW  0xf
+/*@}*/
+
+
+/**
+ * Condition codes
+ */
+/*@{*/
+#define COND_GT  1  /**< greater than zero */
+#define COND_EQ  2  /**< equal to zero */
+#define COND_LT  3  /**< less than zero */
+#define COND_UN  4  /**< unordered (NaN) */
+#define COND_GE  5  /**< greater than or equal to zero */
+#define COND_LE  6  /**< less than or equal to zero */
+#define COND_NE  7  /**< not equal to zero */
+#define COND_TR  8  /**< always true */
+#define COND_FL  9  /**< always false */
+/*@}*/
+
+
+/**
+ * Instruction precision for GL_NV_fragment_program
+ */
+/*@{*/
+#define FLOAT32  0x1
+#define FLOAT16  0x2
+#define FIXED12  0x4
+/*@}*/
+
+
+/**
+ * Saturation modes when storing values.
+ */
+/*@{*/
+#define SATURATE_OFF            0
+#define SATURATE_ZERO_ONE       1
+/*@}*/
+
+
+/**
+ * Per-component negation masks
+ */
+/*@{*/
+#define NEGATE_X    0x1
+#define NEGATE_Y    0x2
+#define NEGATE_Z    0x4
+#define NEGATE_W    0x8
+#define NEGATE_XYZ  0x7
+#define NEGATE_XYZW 0xf
+#define NEGATE_NONE 0x0
+/*@}*/
+
+
+/**
+ * Program instruction opcodes for vertex, fragment and geometry programs.
+ */
+typedef enum prog_opcode {
+                     /* ARB_vp   ARB_fp   NV_vp   NV_fp     GLSL */
+                     /*------------------------------------------*/
+   OPCODE_NOP = 0,   /*                                      X   */
+   OPCODE_ABS,       /*   X        X       1.1               X   */
+   OPCODE_ADD,       /*   X        X       X       X         X   */
+   OPCODE_ARL,       /*   X                X                 X   */
+   OPCODE_BGNLOOP,   /*                                     opt  */
+   OPCODE_BGNSUB,    /*                                     opt  */
+   OPCODE_BRK,       /*                    2                opt  */
+   OPCODE_CAL,       /*                    2       2        opt  */
+   OPCODE_CMP,       /*            X                         X   */
+   OPCODE_CONT,      /*                                     opt  */
+   OPCODE_COS,       /*            X       2       X         X   */
+   OPCODE_DDX,       /*                            X         X   */
+   OPCODE_DDY,       /*                            X         X   */
+   OPCODE_DP2,       /*                            2         X   */
+   OPCODE_DP3,       /*   X        X       X       X         X   */
+   OPCODE_DP4,       /*   X        X       X       X         X   */
+   OPCODE_DPH,       /*   X        X       1.1                   */
+   OPCODE_DST,       /*   X        X       X       X             */
+   OPCODE_ELSE,      /*                                     opt  */
+   OPCODE_END,       /*   X        X       X       X        opt  */
+   OPCODE_ENDIF,     /*                                     opt  */
+   OPCODE_ENDLOOP,   /*                                     opt  */
+   OPCODE_ENDSUB,    /*                                     opt  */
+   OPCODE_EX2,       /*   X        X       2       X         X   */
+   OPCODE_EXP,       /*   X                X                     */
+   OPCODE_FLR,       /*   X        X       2       X         X   */
+   OPCODE_FRC,       /*   X        X       2       X         X   */
+   OPCODE_IF,        /*                                     opt  */
+   OPCODE_KIL,       /*            X                         X   */
+   OPCODE_KIL_NV,    /*                            X         X   */
+   OPCODE_LG2,       /*   X        X       2       X         X   */
+   OPCODE_LIT,       /*   X        X       X       X             */
+   OPCODE_LOG,       /*   X                X                     */
+   OPCODE_LRP,       /*            X               X             */
+   OPCODE_MAD,       /*   X        X       X       X         X   */
+   OPCODE_MAX,       /*   X        X       X       X         X   */
+   OPCODE_MIN,       /*   X        X       X       X         X   */
+   OPCODE_MOV,       /*   X        X       X       X         X   */
+   OPCODE_MUL,       /*   X        X       X       X         X   */
+   OPCODE_NOISE1,    /*                                      X   */
+   OPCODE_NOISE2,    /*                                      X   */
+   OPCODE_NOISE3,    /*                                      X   */
+   OPCODE_NOISE4,    /*                                      X   */
+   OPCODE_PK2H,      /*                            X             */
+   OPCODE_PK2US,     /*                            X             */
+   OPCODE_PK4B,      /*                            X             */
+   OPCODE_PK4UB,     /*                            X             */
+   OPCODE_POW,       /*   X        X               X         X   */
+   OPCODE_RCP,       /*   X        X       X       X         X   */
+   OPCODE_RET,       /*                    2       2        opt  */
+   OPCODE_RFL,       /*                            X             */
+   OPCODE_RSQ,       /*   X        X       X       X         X   */
+   OPCODE_SCS,       /*            X                         X   */
+   OPCODE_SEQ,       /*                    2       X         X   */
+   OPCODE_SFL,       /*                    2       X             */
+   OPCODE_SGE,       /*   X        X       X       X         X   */
+   OPCODE_SGT,       /*                    2       X         X   */
+   OPCODE_SIN,       /*            X       2       X         X   */
+   OPCODE_SLE,       /*                    2       X         X   */
+   OPCODE_SLT,       /*   X        X       X       X         X   */
+   OPCODE_SNE,       /*                    2       X         X   */
+   OPCODE_SSG,       /*                    2                 X   */
+   OPCODE_STR,       /*                    2       X             */
+   OPCODE_SUB,       /*   X        X       1.1     X         X   */
+   OPCODE_SWZ,       /*   X        X                         X   */
+   OPCODE_TEX,       /*            X       3       X         X   */
+   OPCODE_TXB,       /*            X       3                 X   */
+   OPCODE_TXD,       /*                            X         X   */
+   OPCODE_TXL,       /*                    3       2         X   */
+   OPCODE_TXP,       /*            X                         X   */
+   OPCODE_TXP_NV,    /*                    3       X             */
+   OPCODE_TRUNC,     /*                                      X   */
+   OPCODE_UP2H,      /*                            X             */
+   OPCODE_UP2US,     /*                            X             */
+   OPCODE_UP4B,      /*                            X             */
+   OPCODE_UP4UB,     /*                            X             */
+   OPCODE_X2D,       /*                            X             */
+   OPCODE_XPD,       /*   X        X                             */
+   MAX_OPCODE
+} gl_inst_opcode;
+
+
+/**
+ * Number of bits for the src/dst register Index field.
+ * This limits the size of temp/uniform register files.
+ */
+#define INST_INDEX_BITS 12
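+/* i.e. up to 2^12 = 4096 registers per file; src registers carry one
+ * extra sign bit for relative addressing (see prog_src_register below). */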
+
+
+/**
+ * Instruction source register.
+ */
+struct prog_src_register
+{
+   GLuint File:4;	/**< One of the PROGRAM_* register file values. */
+   GLint Index:(INST_INDEX_BITS+1); /**< Extra bit here for sign bit.
+                                     * May be negative for relative addressing.
+                                     */
+   GLuint Swizzle:12;
+   GLuint RelAddr:1;
+
+   /** Take the component-wise absolute value */
+   GLuint Abs:1;
+
+   /**
+    * Post-Abs negation.
+    * This will either be NEGATE_NONE or NEGATE_XYZW, except for the SWZ
+    * instruction which allows per-component negation.
+    */
+   GLuint Negate:4;
+
+   /**
+    * Is the register two-dimensional.
+    * Two dimensional registers are of the
+    * REGISTER[index][index2] format.
+    * They are used by the geometry shaders where
+    * the first index is the index within an array
+    * and the second index is the semantic of the
+    * array, e.g. gl_PositionIn[index] would become
+    * INPUT[index][gl_PositionIn]
+    */
+   GLuint HasIndex2:1;
+   GLuint RelAddr2:1;
+   GLint Index2:(INST_INDEX_BITS+1); /**< Extra bit here for sign bit.
+                                       * May be negative for relative
+                                       * addressing. */
+};
+
+
+/**
+ * Instruction destination register.
+ */
+struct prog_dst_register
+{
+   GLuint File:4;      /**< One of the PROGRAM_* register file values */
+   GLuint Index:INST_INDEX_BITS;  /**< Unsigned, never negative */
+   GLuint WriteMask:4;
+   GLuint RelAddr:1;
+
+   /**
+    * \name Conditional destination update control.
+    *
+    * \since
+    * NV_fragment_program_option, NV_vertex_program2, NV_vertex_program2_option.
+    */
+   /*@{*/
+   /**
+    * Takes one of the 9 possible condition values (EQ, FL, GT, GE, LE, LT,
+    * NE, TR, or UN).  Dest reg is only written to if the matching
+    * (swizzled) condition code value passes.  When a conditional update mask
+    * is not specified, this will be \c COND_TR.
+    */
+   GLuint CondMask:4;
+
+   /**
+    * Condition code swizzle value.
+    */
+   GLuint CondSwizzle:12;
+};
+
+
+/**
+ * Vertex/fragment program instruction.
+ */
+struct prog_instruction
+{
+   gl_inst_opcode Opcode;
+   struct prog_src_register SrcReg[3];
+   struct prog_dst_register DstReg;
+
+   /**
+    * Indicates that the instruction should update the condition code
+    * register.
+    *
+    * \since
+    * NV_fragment_program_option, NV_vertex_program2, NV_vertex_program2_option.
+    */
+   GLuint CondUpdate:1;
+
+   /**
+    * If prog_instruction::CondUpdate is \c GL_TRUE, this value selects the
+    * condition code register that is to be updated.
+    *
+    * In GL_NV_fragment_program or GL_NV_vertex_program2 mode, only condition
+    * code register 0 is available.  In GL_NV_vertex_program3 mode, condition
+    * code registers 0 and 1 are available.
+    *
+    * \since
+    * NV_fragment_program_option, NV_vertex_program2, NV_vertex_program2_option.
+    */
+   GLuint CondDst:1;
+
+   /**
+    * Saturate each value of the vectored result to the range [0,1] or the
+    * range [-1,1].  \c SSAT mode (i.e., saturation to the range [-1,1]) is
+    * only available in NV_fragment_program2 mode.
+    * Value is one of the SATURATE_* tokens.
+    *
+    * \since
+    * NV_fragment_program_option, NV_vertex_program3.
+    */
+   GLuint SaturateMode:2;
+
+   /**
+    * Per-instruction selectable precision: FLOAT32, FLOAT16, FIXED12.
+    *
+    * \since
+    * NV_fragment_program_option.
+    */
+   GLuint Precision:3;
+
+   /**
+    * \name Extra fields for TEX, TXB, TXD, TXL, TXP instructions.
+    */
+   /*@{*/
+   /** Source texture unit. */
+   GLuint TexSrcUnit:5;
+
+   /** Source texture target, one of TEXTURE_{1D,2D,3D,CUBE,RECT}_INDEX */
+   GLuint TexSrcTarget:4;
+
+   /** True if tex instruction should do shadow comparison */
+   GLuint TexShadow:1;
+   /*@}*/
+
+   /**
+    * For BRA and CAL instructions, the location to jump to.
+    * For BGNLOOP, points to ENDLOOP (and vice-versa).
+    * For BRK, points to ENDLOOP
+    * For IF, points to ELSE or ENDIF.
+    * For ELSE, points to ENDIF.
+    */
+   GLint BranchTarget;
+
+   /** for debugging purposes */
+   const char *Comment;
+
+   /** for driver use (try to remove someday) */
+   GLint Aux;
+};
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+extern void
+_mesa_init_instructions(struct prog_instruction *inst, GLuint count);
+
+extern struct prog_instruction *
+_mesa_alloc_instructions(GLuint numInst);
+
+extern struct prog_instruction *
+_mesa_realloc_instructions(struct prog_instruction *oldInst,
+                           GLuint numOldInst, GLuint numNewInst);
+
+extern struct prog_instruction *
+_mesa_copy_instructions(struct prog_instruction *dest,
+                        const struct prog_instruction *src, GLuint n);
+
+extern void
+_mesa_free_instructions(struct prog_instruction *inst, GLuint count);
+
+extern GLuint
+_mesa_num_inst_src_regs(gl_inst_opcode opcode);
+
+extern GLuint
+_mesa_num_inst_dst_regs(gl_inst_opcode opcode);
+
+extern GLboolean
+_mesa_is_tex_instruction(gl_inst_opcode opcode);
+
+extern GLboolean
+_mesa_check_soa_dependencies(const struct prog_instruction *inst);
+
+extern const char *
+_mesa_opcode_string(gl_inst_opcode opcode);
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif /* PROG_INSTRUCTION_H */
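To make the bitfields above concrete, here is a hypothetical snippet (not part of this change) that encodes MOV t0, t0.yxwz, the read-after-write hazard that _mesa_check_soa_dependencies() exists to flag. soa_hazard_demo is an invented name; PROGRAM_TEMPORARY is one of the PROGRAM_* register-file values referenced in the field comments.

#include <assert.h>
#include "prog_instruction.h"

static void
soa_hazard_demo(void)
{
   struct prog_instruction inst;

   _mesa_init_instructions(&inst, 1);
   inst.Opcode = OPCODE_MOV;

   inst.DstReg.File = PROGRAM_TEMPORARY;
   inst.DstReg.Index = 0;
   inst.DstReg.WriteMask = WRITEMASK_XYZW;

   inst.SrcReg[0].File = PROGRAM_TEMPORARY;
   inst.SrcReg[0].Index = 0;
   inst.SrcReg[0].Swizzle = MAKE_SWIZZLE4(SWIZZLE_Y, SWIZZLE_X,
                                          SWIZZLE_W, SWIZZLE_Z);

   /* Under SOA, channel y reads t0.x after channel x has already
    * overwritten it, so a dependency is reported. */
   assert(_mesa_check_soa_dependencies(&inst));
}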
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_noise.c b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_noise.c
new file mode 100644
index 0000000..ac920c2
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_noise.c
@@ -0,0 +1,638 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2006  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/*
+ * SimplexNoise1234
+ * Copyright (c) 2003-2005, Stefan Gustavson
+ *
+ * Contact: stegu@itn.liu.se
+ */
+
+/**
+ * \file
+ * \brief C implementation of Perlin Simplex Noise over 1, 2, 3 and 4 dims.
+ * \author Stefan Gustavson (stegu@itn.liu.se)
+ *
+ *
+ * This implementation is "Simplex Noise" as presented by
+ * Ken Perlin at a relatively obscure and not often cited course
+ * session "Real-Time Shading" at Siggraph 2001 (before real
+ * time shading actually took on), under the title "hardware noise".
+ * The 3D function is numerically equivalent to his Java reference
+ * code available in the PDF course notes, although I re-implemented
+ * it from scratch to get more readable code. The 1D, 2D and 4D cases
+ * were implemented from scratch by me from Ken Perlin's text.
+ *
+ * This file has no dependencies on any other file, not even its own
+ * header file. The header file is made for use by external code only.
+ */
+
+
+#include "main/imports.h"
+#include "prog_noise.h"
+
+#define FASTFLOOR(x) ( ((x)>0) ? ((int)(x)) : (((int)(x))-1) )
+
+/*
+ * ---------------------------------------------------------------------
+ * Static data
+ */
+
+/**
+ * Permutation table. This is just a random jumble of all numbers 0-255,
+ * repeated twice to avoid wrapping the index at 255 for each lookup.
+ * This needs to be exactly the same for all instances on all platforms,
+ * so it's easiest to just keep it as static explicit data.
+ * This also removes the need for any initialisation of this class.
+ *
+ * Note that making this an int[] instead of a char[] might make the
+ * code run faster on platforms with a high penalty for unaligned single
+ * byte addressing. Intel x86 is generally single-byte-friendly, but
+ * some other CPUs are faster with 4-aligned reads.
+ * However, a char[] is smaller, which avoids cache thrashing, and that
+ * is probably the most important aspect on most architectures.
+ * This array is accessed a *lot* by the noise functions.
+ * A vector-valued noise over 3D accesses it 96 times, and a
+ * float-valued 4D noise 64 times. We want this to fit in the cache!
+ */
+static const unsigned char perm[512] = { 151, 160, 137, 91, 90, 15,
+   131, 13, 201, 95, 96, 53, 194, 233, 7, 225, 140, 36, 103, 30, 69, 142, 8,
+      99, 37, 240, 21, 10, 23,
+   190, 6, 148, 247, 120, 234, 75, 0, 26, 197, 62, 94, 252, 219, 203, 117, 35,
+      11, 32, 57, 177, 33,
+   88, 237, 149, 56, 87, 174, 20, 125, 136, 171, 168, 68, 175, 74, 165, 71,
+      134, 139, 48, 27, 166,
+   77, 146, 158, 231, 83, 111, 229, 122, 60, 211, 133, 230, 220, 105, 92, 41,
+      55, 46, 245, 40, 244,
+   102, 143, 54, 65, 25, 63, 161, 1, 216, 80, 73, 209, 76, 132, 187, 208, 89,
+      18, 169, 200, 196,
+   135, 130, 116, 188, 159, 86, 164, 100, 109, 198, 173, 186, 3, 64, 52, 217,
+      226, 250, 124, 123,
+   5, 202, 38, 147, 118, 126, 255, 82, 85, 212, 207, 206, 59, 227, 47, 16, 58,
+      17, 182, 189, 28, 42,
+   223, 183, 170, 213, 119, 248, 152, 2, 44, 154, 163, 70, 221, 153, 101, 155,
+      167, 43, 172, 9,
+   129, 22, 39, 253, 19, 98, 108, 110, 79, 113, 224, 232, 178, 185, 112, 104,
+      218, 246, 97, 228,
+   251, 34, 242, 193, 238, 210, 144, 12, 191, 179, 162, 241, 81, 51, 145, 235,
+      249, 14, 239, 107,
+   49, 192, 214, 31, 181, 199, 106, 157, 184, 84, 204, 176, 115, 121, 50, 45,
+      127, 4, 150, 254,
+   138, 236, 205, 93, 222, 114, 67, 29, 24, 72, 243, 141, 128, 195, 78, 66,
+      215, 61, 156, 180,
+   151, 160, 137, 91, 90, 15,
+   131, 13, 201, 95, 96, 53, 194, 233, 7, 225, 140, 36, 103, 30, 69, 142, 8,
+      99, 37, 240, 21, 10, 23,
+   190, 6, 148, 247, 120, 234, 75, 0, 26, 197, 62, 94, 252, 219, 203, 117, 35,
+      11, 32, 57, 177, 33,
+   88, 237, 149, 56, 87, 174, 20, 125, 136, 171, 168, 68, 175, 74, 165, 71,
+      134, 139, 48, 27, 166,
+   77, 146, 158, 231, 83, 111, 229, 122, 60, 211, 133, 230, 220, 105, 92, 41,
+      55, 46, 245, 40, 244,
+   102, 143, 54, 65, 25, 63, 161, 1, 216, 80, 73, 209, 76, 132, 187, 208, 89,
+      18, 169, 200, 196,
+   135, 130, 116, 188, 159, 86, 164, 100, 109, 198, 173, 186, 3, 64, 52, 217,
+      226, 250, 124, 123,
+   5, 202, 38, 147, 118, 126, 255, 82, 85, 212, 207, 206, 59, 227, 47, 16, 58,
+      17, 182, 189, 28, 42,
+   223, 183, 170, 213, 119, 248, 152, 2, 44, 154, 163, 70, 221, 153, 101, 155,
+      167, 43, 172, 9,
+   129, 22, 39, 253, 19, 98, 108, 110, 79, 113, 224, 232, 178, 185, 112, 104,
+      218, 246, 97, 228,
+   251, 34, 242, 193, 238, 210, 144, 12, 191, 179, 162, 241, 81, 51, 145, 235,
+      249, 14, 239, 107,
+   49, 192, 214, 31, 181, 199, 106, 157, 184, 84, 204, 176, 115, 121, 50, 45,
+      127, 4, 150, 254,
+   138, 236, 205, 93, 222, 114, 67, 29, 24, 72, 243, 141, 128, 195, 78, 66,
+      215, 61, 156, 180
+};
+
+/*
+ * ---------------------------------------------------------------------
+ */
+
+/*
+ * Helper functions to compute gradients-dot-residualvectors (1D to 4D)
+ * Note that these generate gradients of more than unit length. To make
+ * a close match with the value range of classic Perlin noise, the final
+ * noise values need to be rescaled to fit nicely within [-1,1].
+ * (The simplex noise functions as such also have different scaling.)
+ * Note also that these noise functions are the most practical and useful
+ * signed version of Perlin noise. To return values according to the
+ * RenderMan specification from the SL noise() and pnoise() functions,
+ * the noise values need to be scaled and offset to [0,1], like this:
+ * float SLnoise = (SimplexNoise1234::noise(x,y,z) + 1.0) * 0.5;
+ */
+
+static float
+grad1(int hash, float x)
+{
+   int h = hash & 15;
+   float grad = 1.0f + (h & 7); /* Gradient value 1.0, 2.0, ..., 8.0 */
+   if (h & 8)
+      grad = -grad;             /* Set a random sign for the gradient */
+   return (grad * x);           /* Multiply the gradient with the distance */
+}
+
+static float
+grad2(int hash, float x, float y)
+{
+   int h = hash & 7;            /* Convert low 3 bits of hash code */
+   float u = h < 4 ? x : y;     /* into 8 simple gradient directions, */
+   float v = h < 4 ? y : x;     /* and compute the dot product with (x,y). */
+   return ((h & 1) ? -u : u) + ((h & 2) ? -2.0f * v : 2.0f * v);
+}
+
+static float
+grad3(int hash, float x, float y, float z)
+{
+   int h = hash & 15;           /* Convert low 4 bits of hash code into 12 simple */
+   float u = h < 8 ? x : y;     /* gradient directions, and compute dot product. */
+   float v = h < 4 ? y : h == 12 || h == 14 ? x : z;    /* Fix repeats at h = 12 to 15 */
+   return ((h & 1) ? -u : u) + ((h & 2) ? -v : v);
+}
+
+static float
+grad4(int hash, float x, float y, float z, float t)
+{
+   int h = hash & 31;           /* Convert low 5 bits of hash code into 32 simple */
+   float u = h < 24 ? x : y;    /* gradient directions, and compute dot product. */
+   float v = h < 16 ? y : z;
+   float w = h < 8 ? z : t;
+   return ((h & 1) ? -u : u) + ((h & 2) ? -v : v) + ((h & 4) ? -w : w);
+}
+
+/**
+ * A lookup table to traverse the simplex around a given point in 4D.
+ * Details can be found where this table is used, in the 4D noise method.
+ * TODO: This should not be required, backport it from Bill's GLSL code!
+ */
+static const unsigned char simplex[64][4] = {
+   {0, 1, 2, 3}, {0, 1, 3, 2}, {0, 0, 0, 0}, {0, 2, 3, 1},
+   {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {1, 2, 3, 0},
+   {0, 2, 1, 3}, {0, 0, 0, 0}, {0, 3, 1, 2}, {0, 3, 2, 1},
+   {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {1, 3, 2, 0},
+   {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
+   {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
+   {1, 2, 0, 3}, {0, 0, 0, 0}, {1, 3, 0, 2}, {0, 0, 0, 0},
+   {0, 0, 0, 0}, {0, 0, 0, 0}, {2, 3, 0, 1}, {2, 3, 1, 0},
+   {1, 0, 2, 3}, {1, 0, 3, 2}, {0, 0, 0, 0}, {0, 0, 0, 0},
+   {0, 0, 0, 0}, {2, 0, 3, 1}, {0, 0, 0, 0}, {2, 1, 3, 0},
+   {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
+   {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
+   {2, 0, 1, 3}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
+   {3, 0, 1, 2}, {3, 0, 2, 1}, {0, 0, 0, 0}, {3, 1, 2, 0},
+   {2, 1, 0, 3}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
+   {3, 1, 0, 2}, {0, 0, 0, 0}, {3, 2, 0, 1}, {3, 2, 1, 0}
+};
+
+
+/** 1D simplex noise */
+GLfloat
+_mesa_noise1(GLfloat x)
+{
+   int i0 = FASTFLOOR(x);
+   int i1 = i0 + 1;
+   float x0 = x - i0;
+   float x1 = x0 - 1.0f;
+   float t1 = 1.0f - x1 * x1;
+   float n0, n1;
+
+   float t0 = 1.0f - x0 * x0;
+/*  if(t0 < 0.0f) t0 = 0.0f; // this never happens for the 1D case */
+   t0 *= t0;
+   n0 = t0 * t0 * grad1(perm[i0 & 0xff], x0);
+
+/*  if(t1 < 0.0f) t1 = 0.0f; // this never happens for the 1D case */
+   t1 *= t1;
+   n1 = t1 * t1 * grad1(perm[i1 & 0xff], x1);
+   /* The maximum value of this noise is 8*(3/4)^4 = 2.53125 */
+   /* A factor of 0.395 would scale to fit exactly within [-1,1], but */
+   /* we want to match PRMan's 1D noise, so we scale it down some more. */
+   return 0.25f * (n0 + n1);
+}
+
+
+/** 2D simplex noise */
+GLfloat
+_mesa_noise2(GLfloat x, GLfloat y)
+{
+#define F2 0.366025403f         /* F2 = 0.5*(sqrt(3.0)-1.0) */
+#define G2 0.211324865f         /* G2 = (3.0-Math.sqrt(3.0))/6.0 */
+
+   float n0, n1, n2;            /* Noise contributions from the three corners */
+
+   /* Skew the input space to determine which simplex cell we're in */
+   float s = (x + y) * F2;      /* Hairy factor for 2D */
+   float xs = x + s;
+   float ys = y + s;
+   int i = FASTFLOOR(xs);
+   int j = FASTFLOOR(ys);
+
+   float t = (float) (i + j) * G2;
+   float X0 = i - t;            /* Unskew the cell origin back to (x,y) space */
+   float Y0 = j - t;
+   float x0 = x - X0;           /* The x,y distances from the cell origin */
+   float y0 = y - Y0;
+
+   float x1, y1, x2, y2;
+   unsigned int ii, jj;
+   float t0, t1, t2;
+
+   /* For the 2D case, the simplex shape is an equilateral triangle. */
+   /* Determine which simplex we are in. */
+   unsigned int i1, j1;         /* Offsets for second (middle) corner of simplex in (i,j) coords */
+   if (x0 > y0) {
+      i1 = 1;
+      j1 = 0;
+   }                            /* lower triangle, XY order: (0,0)->(1,0)->(1,1) */
+   else {
+      i1 = 0;
+      j1 = 1;
+   }                            /* upper triangle, YX order: (0,0)->(0,1)->(1,1) */
+
+   /* A step of (1,0) in (i,j) means a step of (1-c,-c) in (x,y), and */
+   /* a step of (0,1) in (i,j) means a step of (-c,1-c) in (x,y), where */
+   /* c = (3-sqrt(3))/6 */
+
+   x1 = x0 - i1 + G2;           /* Offsets for middle corner in (x,y) unskewed coords */
+   y1 = y0 - j1 + G2;
+   x2 = x0 - 1.0f + 2.0f * G2;  /* Offsets for last corner in (x,y) unskewed coords */
+   y2 = y0 - 1.0f + 2.0f * G2;
+
+   /* Wrap the integer indices at 256, to avoid indexing perm[] out of bounds */
+   ii = i & 0xff;
+   jj = j & 0xff;
+
+   /* Calculate the contribution from the three corners */
+   t0 = 0.5f - x0 * x0 - y0 * y0;
+   if (t0 < 0.0f)
+      n0 = 0.0f;
+   else {
+      t0 *= t0;
+      n0 = t0 * t0 * grad2(perm[ii + perm[jj]], x0, y0);
+   }
+
+   t1 = 0.5f - x1 * x1 - y1 * y1;
+   if (t1 < 0.0f)
+      n1 = 0.0f;
+   else {
+      t1 *= t1;
+      n1 = t1 * t1 * grad2(perm[ii + i1 + perm[jj + j1]], x1, y1);
+   }
+
+   t2 = 0.5f - x2 * x2 - y2 * y2;
+   if (t2 < 0.0f)
+      n2 = 0.0f;
+   else {
+      t2 *= t2;
+      n2 = t2 * t2 * grad2(perm[ii + 1 + perm[jj + 1]], x2, y2);
+   }
+
+   /* Add contributions from each corner to get the final noise value. */
+   /* The result is scaled to return values in the interval [-1,1]. */
+   return 40.0f * (n0 + n1 + n2);       /* TODO: The scale factor is preliminary! */
+}
+
+
+/** 3D simplex noise */
+GLfloat
+_mesa_noise3(GLfloat x, GLfloat y, GLfloat z)
+{
+/* Simple skewing factors for the 3D case */
+#define F3 0.333333333f
+#define G3 0.166666667f
+
+   float n0, n1, n2, n3;        /* Noise contributions from the four corners */
+
+   /* Skew the input space to determine which simplex cell we're in */
+   float s = (x + y + z) * F3;  /* Very nice and simple skew factor for 3D */
+   float xs = x + s;
+   float ys = y + s;
+   float zs = z + s;
+   int i = FASTFLOOR(xs);
+   int j = FASTFLOOR(ys);
+   int k = FASTFLOOR(zs);
+
+   float t = (float) (i + j + k) * G3;
+   float X0 = i - t;            /* Unskew the cell origin back to (x,y,z) space */
+   float Y0 = j - t;
+   float Z0 = k - t;
+   float x0 = x - X0;           /* The x,y,z distances from the cell origin */
+   float y0 = y - Y0;
+   float z0 = z - Z0;
+
+   float x1, y1, z1, x2, y2, z2, x3, y3, z3;
+   unsigned int ii, jj, kk;
+   float t0, t1, t2, t3;
+
+   /* For the 3D case, the simplex shape is a slightly irregular tetrahedron. */
+   /* Determine which simplex we are in. */
+   unsigned int i1, j1, k1;     /* Offsets for second corner of simplex in (i,j,k) coords */
+   unsigned int i2, j2, k2;     /* Offsets for third corner of simplex in (i,j,k) coords */
+
+/* This code would benefit from a backport from the GLSL version! */
+   if (x0 >= y0) {
+      if (y0 >= z0) {
+         i1 = 1;
+         j1 = 0;
+         k1 = 0;
+         i2 = 1;
+         j2 = 1;
+         k2 = 0;
+      }                         /* X Y Z order */
+      else if (x0 >= z0) {
+         i1 = 1;
+         j1 = 0;
+         k1 = 0;
+         i2 = 1;
+         j2 = 0;
+         k2 = 1;
+      }                         /* X Z Y order */
+      else {
+         i1 = 0;
+         j1 = 0;
+         k1 = 1;
+         i2 = 1;
+         j2 = 0;
+         k2 = 1;
+      }                         /* Z X Y order */
+   }
+   else {                       /* x0<y0 */
+      if (y0 < z0) {
+         i1 = 0;
+         j1 = 0;
+         k1 = 1;
+         i2 = 0;
+         j2 = 1;
+         k2 = 1;
+      }                         /* Z Y X order */
+      else if (x0 < z0) {
+         i1 = 0;
+         j1 = 1;
+         k1 = 0;
+         i2 = 0;
+         j2 = 1;
+         k2 = 1;
+      }                         /* Y Z X order */
+      else {
+         i1 = 0;
+         j1 = 1;
+         k1 = 0;
+         i2 = 1;
+         j2 = 1;
+         k2 = 0;
+      }                         /* Y X Z order */
+   }
+
+   /* A step of (1,0,0) in (i,j,k) means a step of (1-c,-c,-c) in
+    * (x,y,z), a step of (0,1,0) in (i,j,k) means a step of
+    * (-c,1-c,-c) in (x,y,z), and a step of (0,0,1) in (i,j,k) means a
+    * step of (-c,-c,1-c) in (x,y,z), where c = 1/6.
+    */
+
+   x1 = x0 - i1 + G3;         /* Offsets for second corner in (x,y,z) coords */
+   y1 = y0 - j1 + G3;
+   z1 = z0 - k1 + G3;
+   x2 = x0 - i2 + 2.0f * G3;  /* Offsets for third corner in (x,y,z) coords */
+   y2 = y0 - j2 + 2.0f * G3;
+   z2 = z0 - k2 + 2.0f * G3;
+   x3 = x0 - 1.0f + 3.0f * G3;/* Offsets for last corner in (x,y,z) coords */
+   y3 = y0 - 1.0f + 3.0f * G3;
+   z3 = z0 - 1.0f + 3.0f * G3;
+
+   /* Wrap the integer indices at 256 to avoid indexing perm[] out of bounds */
+   ii = i & 0xff;
+   jj = j & 0xff;
+   kk = k & 0xff;
+
+   /* Calculate the contribution from the four corners */
+   t0 = 0.6f - x0 * x0 - y0 * y0 - z0 * z0;
+   if (t0 < 0.0f)
+      n0 = 0.0f;
+   else {
+      t0 *= t0;
+      n0 = t0 * t0 * grad3(perm[ii + perm[jj + perm[kk]]], x0, y0, z0);
+   }
+
+   t1 = 0.6f - x1 * x1 - y1 * y1 - z1 * z1;
+   if (t1 < 0.0f)
+      n1 = 0.0f;
+   else {
+      t1 *= t1;
+      n1 =
+         t1 * t1 * grad3(perm[ii + i1 + perm[jj + j1 + perm[kk + k1]]], x1,
+                         y1, z1);
+   }
+
+   t2 = 0.6f - x2 * x2 - y2 * y2 - z2 * z2;
+   if (t2 < 0.0f)
+      n2 = 0.0f;
+   else {
+      t2 *= t2;
+      n2 =
+         t2 * t2 * grad3(perm[ii + i2 + perm[jj + j2 + perm[kk + k2]]], x2,
+                         y2, z2);
+   }
+
+   t3 = 0.6f - x3 * x3 - y3 * y3 - z3 * z3;
+   if (t3 < 0.0f)
+      n3 = 0.0f;
+   else {
+      t3 *= t3;
+      n3 =
+         t3 * t3 * grad3(perm[ii + 1 + perm[jj + 1 + perm[kk + 1]]], x3, y3,
+                         z3);
+   }
+
+   /* Add contributions from each corner to get the final noise value.
+    * The result is scaled to stay just inside [-1,1]
+    */
+   return 32.0f * (n0 + n1 + n2 + n3);  /* TODO: The scale factor is preliminary! */
+}
+
+
+/** 4D simplex noise */
+GLfloat
+_mesa_noise4(GLfloat x, GLfloat y, GLfloat z, GLfloat w)
+{
+   /* The skewing and unskewing factors are hairy again for the 4D case */
+#define F4 0.309016994f         /* F4 = (Math.sqrt(5.0)-1.0)/4.0 */
+#define G4 0.138196601f         /* G4 = (5.0-Math.sqrt(5.0))/20.0 */
+
+   float n0, n1, n2, n3, n4;    /* Noise contributions from the five corners */
+
+   /* Skew the (x,y,z,w) space to determine which cell of 24 simplices we're in */
+   float s = (x + y + z + w) * F4;      /* Factor for 4D skewing */
+   float xs = x + s;
+   float ys = y + s;
+   float zs = z + s;
+   float ws = w + s;
+   int i = FASTFLOOR(xs);
+   int j = FASTFLOOR(ys);
+   int k = FASTFLOOR(zs);
+   int l = FASTFLOOR(ws);
+
+   float t = (i + j + k + l) * G4;      /* Factor for 4D unskewing */
+   float X0 = i - t;            /* Unskew the cell origin back to (x,y,z,w) space */
+   float Y0 = j - t;
+   float Z0 = k - t;
+   float W0 = l - t;
+
+   float x0 = x - X0;           /* The x,y,z,w distances from the cell origin */
+   float y0 = y - Y0;
+   float z0 = z - Z0;
+   float w0 = w - W0;
+
+   /* For the 4D case, the simplex is a 4D shape I won't even try to describe.
+    * To find out which of the 24 possible simplices we're in, we need to
+    * determine the magnitude ordering of x0, y0, z0 and w0.
+    * The method below is a good way of finding the ordering of x,y,z,w and
+    * then finding the correct traversal order for the simplex we're in.
+    * First, six pair-wise comparisons are performed between each possible pair
+    * of the four coordinates, and the results are used to add up binary bits
+    * for an integer index.
+    */
+   int c1 = (x0 > y0) ? 32 : 0;
+   int c2 = (x0 > z0) ? 16 : 0;
+   int c3 = (y0 > z0) ? 8 : 0;
+   int c4 = (x0 > w0) ? 4 : 0;
+   int c5 = (y0 > w0) ? 2 : 0;
+   int c6 = (z0 > w0) ? 1 : 0;
+   int c = c1 + c2 + c3 + c4 + c5 + c6;
+
+   unsigned int i1, j1, k1, l1;  /* The integer offsets for the second simplex corner */
+   unsigned int i2, j2, k2, l2;  /* The integer offsets for the third simplex corner */
+   unsigned int i3, j3, k3, l3;  /* The integer offsets for the fourth simplex corner */
+
+   float x1, y1, z1, w1, x2, y2, z2, w2, x3, y3, z3, w3, x4, y4, z4, w4;
+   unsigned int ii, jj, kk, ll;
+   float t0, t1, t2, t3, t4;
+
+   /*
+    * simplex[c] is a 4-vector with the numbers 0, 1, 2 and 3 in some
+    * order.  Many values of c will never occur, since e.g. x>y>z>w
+    * makes x<z, y<w and x<w impossible. Only the 24 indices which
+    * have non-zero entries make any sense.  We use thresholding to
+    * set the coordinates in turn from the largest magnitude.  The
+    * number 3 in the "simplex" array is at the position of the
+    * largest coordinate.
+    */
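+   /* Worked example: x0 > y0 > z0 > w0 sets all six comparison bits, so
+    * c = 63 and simplex[63] = {3, 2, 1, 0}: x is stepped first, then y,
+    * then z, then w.
+    */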
+   i1 = simplex[c][0] >= 3 ? 1 : 0;
+   j1 = simplex[c][1] >= 3 ? 1 : 0;
+   k1 = simplex[c][2] >= 3 ? 1 : 0;
+   l1 = simplex[c][3] >= 3 ? 1 : 0;
+   /* The number 2 in the "simplex" array is at the second largest coordinate. */
+   i2 = simplex[c][0] >= 2 ? 1 : 0;
+   j2 = simplex[c][1] >= 2 ? 1 : 0;
+   k2 = simplex[c][2] >= 2 ? 1 : 0;
+   l2 = simplex[c][3] >= 2 ? 1 : 0;
+   /* The number 1 in the "simplex" array is at the second smallest coordinate. */
+   i3 = simplex[c][0] >= 1 ? 1 : 0;
+   j3 = simplex[c][1] >= 1 ? 1 : 0;
+   k3 = simplex[c][2] >= 1 ? 1 : 0;
+   l3 = simplex[c][3] >= 1 ? 1 : 0;
+   /* The fifth corner has all coordinate offsets = 1, so no need to look that up. */
+
+   x1 = x0 - i1 + G4;           /* Offsets for second corner in (x,y,z,w) coords */
+   y1 = y0 - j1 + G4;
+   z1 = z0 - k1 + G4;
+   w1 = w0 - l1 + G4;
+   x2 = x0 - i2 + 2.0f * G4;    /* Offsets for third corner in (x,y,z,w) coords */
+   y2 = y0 - j2 + 2.0f * G4;
+   z2 = z0 - k2 + 2.0f * G4;
+   w2 = w0 - l2 + 2.0f * G4;
+   x3 = x0 - i3 + 3.0f * G4;    /* Offsets for fourth corner in (x,y,z,w) coords */
+   y3 = y0 - j3 + 3.0f * G4;
+   z3 = z0 - k3 + 3.0f * G4;
+   w3 = w0 - l3 + 3.0f * G4;
+   x4 = x0 - 1.0f + 4.0f * G4;  /* Offsets for last corner in (x,y,z,w) coords */
+   y4 = y0 - 1.0f + 4.0f * G4;
+   z4 = z0 - 1.0f + 4.0f * G4;
+   w4 = w0 - 1.0f + 4.0f * G4;
+
+   /* Wrap the integer indices at 256, to avoid indexing perm[] out of bounds */
+   ii = i & 0xff;
+   jj = j & 0xff;
+   kk = k & 0xff;
+   ll = l & 0xff;
+
+   /* Calculate the contribution from the five corners */
+   t0 = 0.6f - x0 * x0 - y0 * y0 - z0 * z0 - w0 * w0;
+   if (t0 < 0.0f)
+      n0 = 0.0f;
+   else {
+      t0 *= t0;
+      n0 =
+         t0 * t0 * grad4(perm[ii + perm[jj + perm[kk + perm[ll]]]], x0, y0,
+                         z0, w0);
+   }
+
+   t1 = 0.6f - x1 * x1 - y1 * y1 - z1 * z1 - w1 * w1;
+   if (t1 < 0.0f)
+      n1 = 0.0f;
+   else {
+      t1 *= t1;
+      n1 =
+         t1 * t1 *
+         grad4(perm[ii + i1 + perm[jj + j1 + perm[kk + k1 + perm[ll + l1]]]],
+               x1, y1, z1, w1);
+   }
+
+   t2 = 0.6f - x2 * x2 - y2 * y2 - z2 * z2 - w2 * w2;
+   if (t2 < 0.0f)
+      n2 = 0.0f;
+   else {
+      t2 *= t2;
+      n2 =
+         t2 * t2 *
+         grad4(perm[ii + i2 + perm[jj + j2 + perm[kk + k2 + perm[ll + l2]]]],
+               x2, y2, z2, w2);
+   }
+
+   t3 = 0.6f - x3 * x3 - y3 * y3 - z3 * z3 - w3 * w3;
+   if (t3 < 0.0f)
+      n3 = 0.0f;
+   else {
+      t3 *= t3;
+      n3 =
+         t3 * t3 *
+         grad4(perm[ii + i3 + perm[jj + j3 + perm[kk + k3 + perm[ll + l3]]]],
+               x3, y3, z3, w3);
+   }
+
+   t4 = 0.6f - x4 * x4 - y4 * y4 - z4 * z4 - w4 * w4;
+   if (t4 < 0.0f)
+      n4 = 0.0f;
+   else {
+      t4 *= t4;
+      n4 =
+         t4 * t4 *
+         grad4(perm[ii + 1 + perm[jj + 1 + perm[kk + 1 + perm[ll + 1]]]], x4,
+               y4, z4, w4);
+   }
+
+   /* Sum up and scale the result to cover the range [-1,1] */
+   return 27.0f * (n0 + n1 + n2 + n3 + n4);     /* TODO: The scale factor is preliminary! */
+}
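As a usage illustration (not part of this change), a tiny hypothetical consumer of the 2D function; fill_noise_row is an invented name. Per the comments above, results land roughly in [-1,1], so RenderMan-style [0,1] output is one scale-and-offset away.

#include "prog_noise.h"

static void
fill_noise_row(GLfloat *row, int width, GLfloat y)
{
   int i;
   for (i = 0; i < width; i++) {
      const GLfloat n = _mesa_noise2(i * 0.05f, y);   /* roughly [-1,1] */
      row[i] = (n + 1.0f) * 0.5f;                     /* remap to [0,1] */
   }
}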
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_noise.h b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_noise.h
new file mode 100644
index 0000000..51124ca
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_noise.h
@@ -0,0 +1,36 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2006  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef PROG_NOISE
+#define PROG_NOISE
+
+#include "main/glheader.h"
+
+extern GLfloat _mesa_noise1(GLfloat);
+extern GLfloat _mesa_noise2(GLfloat, GLfloat);
+extern GLfloat _mesa_noise3(GLfloat, GLfloat, GLfloat);
+extern GLfloat _mesa_noise4(GLfloat, GLfloat, GLfloat, GLfloat);
+
+#endif
+
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_opt_constant_fold.c b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_opt_constant_fold.c
new file mode 100644
index 0000000..3811c0d
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_opt_constant_fold.c
@@ -0,0 +1,447 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "main/glheader.h"
+#include "main/context.h"
+#include "main/macros.h"
+#include "program.h"
+#include "prog_instruction.h"
+#include "prog_optimize.h"
+#include "prog_parameter.h"
+#include <stdbool.h>
+
+static bool
+src_regs_are_constant(const struct prog_instruction *inst, unsigned num_srcs)
+{
+   unsigned i;
+
+   for (i = 0; i < num_srcs; i++) {
+      if (inst->SrcReg[i].File != PROGRAM_CONSTANT)
+	 return false;
+   }
+
+   return true;
+}
+
+static struct prog_src_register
+src_reg_for_float(struct gl_program *prog, float val)
+{
+   struct prog_src_register src;
+   unsigned swiz;
+
+   memset(&src, 0, sizeof(src));
+
+   src.File = PROGRAM_CONSTANT;
+   src.Index = _mesa_add_unnamed_constant(prog->Parameters,
+					  (gl_constant_value *) &val, 1, &swiz);
+   src.Swizzle = swiz;
+   return src;
+}
+
+static struct prog_src_register
+src_reg_for_vec4(struct gl_program *prog, const float *val)
+{
+   struct prog_src_register src;
+   unsigned swiz;
+
+   memset(&src, 0, sizeof(src));
+
+   src.File = PROGRAM_CONSTANT;
+   src.Index = _mesa_add_unnamed_constant(prog->Parameters,
+					  (gl_constant_value *) val, 4, &swiz);
+   src.Swizzle = swiz;
+   return src;
+}
+
+static bool
+src_regs_are_same(const struct prog_src_register *a,
+		  const struct prog_src_register *b)
+{
+   return (a->File == b->File)
+      && (a->Index == b->Index)
+      && (a->Swizzle == b->Swizzle)
+      && (a->Abs == b->Abs)
+      && (a->Negate == b->Negate)
+      && (a->RelAddr == 0)
+      && (b->RelAddr == 0);
+}
+
+static void
+get_value(struct gl_program *prog, struct prog_src_register *r, float *data)
+{
+   const gl_constant_value *const value =
+      prog->Parameters->ParameterValues[r->Index];
+
+   data[0] = value[GET_SWZ(r->Swizzle, 0)].f;
+   data[1] = value[GET_SWZ(r->Swizzle, 1)].f;
+   data[2] = value[GET_SWZ(r->Swizzle, 2)].f;
+   data[3] = value[GET_SWZ(r->Swizzle, 3)].f;
+
+   if (r->Abs) {
+      data[0] = fabsf(data[0]);
+      data[1] = fabsf(data[1]);
+      data[2] = fabsf(data[2]);
+      data[3] = fabsf(data[3]);
+   }
+
+   if (r->Negate & 0x01) {
+      data[0] = -data[0];
+   }
+
+   if (r->Negate & 0x02) {
+      data[1] = -data[1];
+   }
+
+   if (r->Negate & 0x04) {
+      data[2] = -data[2];
+   }
+
+   if (r->Negate & 0x08) {
+      data[3] = -data[3];
+   }
+}
+
+/**
+ * Try to replace instructions that produce a constant result with simple moves
+ *
+ * The hope is that a following copy propagation pass will eliminate the
+ * unnecessary move instructions.
+ */
+GLboolean
+_mesa_constant_fold(struct gl_program *prog)
+{
+   bool progress = false;
+   unsigned i;
+
+   for (i = 0; i < prog->NumInstructions; i++) {
+      struct prog_instruction *const inst = &prog->Instructions[i];
+
+      switch (inst->Opcode) {
+      case OPCODE_ADD:
+	 if (src_regs_are_constant(inst, 2)) {
+	    float a[4];
+	    float b[4];
+	    float result[4];
+
+	    get_value(prog, &inst->SrcReg[0], a);
+	    get_value(prog, &inst->SrcReg[1], b);
+
+	    result[0] = a[0] + b[0];
+	    result[1] = a[1] + b[1];
+	    result[2] = a[2] + b[2];
+	    result[3] = a[3] + b[3];
+
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_vec4(prog, result);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 }
+	 break;
+
+      case OPCODE_CMP:
+	 /* FINISHME: We could also optimize CMP instructions where the first
+	  * FINISHME: source is a constant that is either all < 0.0 or all
+	  * FINISHME: >= 0.0.
+	  */
+	 if (src_regs_are_constant(inst, 3)) {
+	    float a[4];
+	    float b[4];
+	    float c[4];
+	    float result[4];
+
+	    get_value(prog, &inst->SrcReg[0], a);
+	    get_value(prog, &inst->SrcReg[1], b);
+	    get_value(prog, &inst->SrcReg[2], c);
+
+            result[0] = a[0] < 0.0f ? b[0] : c[0];
+            result[1] = a[1] < 0.0f ? b[1] : c[1];
+            result[2] = a[2] < 0.0f ? b[2] : c[2];
+            result[3] = a[3] < 0.0f ? b[3] : c[3];
+
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_vec4(prog, result);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+	    inst->SrcReg[2].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[2].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 }
+	 break;
+
+      case OPCODE_DP2:
+      case OPCODE_DP3:
+      case OPCODE_DP4:
+	 if (src_regs_are_constant(inst, 2)) {
+	    float a[4];
+	    float b[4];
+	    float result;
+
+	    get_value(prog, &inst->SrcReg[0], a);
+	    get_value(prog, &inst->SrcReg[1], b);
+
+	    result = (a[0] * b[0]) + (a[1] * b[1]);
+
+	    if (inst->Opcode >= OPCODE_DP3)
+	       result += a[2] * b[2];
+
+	    if (inst->Opcode == OPCODE_DP4)
+	       result += a[3] * b[3];
+
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_float(prog, result);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 }
+	 break;
+
+      case OPCODE_MUL:
+	 if (src_regs_are_constant(inst, 2)) {
+	    float a[4];
+	    float b[4];
+	    float result[4];
+
+	    get_value(prog, &inst->SrcReg[0], a);
+	    get_value(prog, &inst->SrcReg[1], b);
+
+	    result[0] = a[0] * b[0];
+	    result[1] = a[1] * b[1];
+	    result[2] = a[2] * b[2];
+	    result[3] = a[3] * b[3];
+
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_vec4(prog, result);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 }
+	 break;
+
+      case OPCODE_SEQ:
+	 if (src_regs_are_constant(inst, 2)) {
+	    float a[4];
+	    float b[4];
+	    float result[4];
+
+	    get_value(prog, &inst->SrcReg[0], a);
+	    get_value(prog, &inst->SrcReg[1], b);
+
+	    result[0] = (a[0] == b[0]) ? 1.0f : 0.0f;
+	    result[1] = (a[1] == b[1]) ? 1.0f : 0.0f;
+	    result[2] = (a[2] == b[2]) ? 1.0f : 0.0f;
+	    result[3] = (a[3] == b[3]) ? 1.0f : 0.0f;
+
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_vec4(prog, result);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 } else if (src_regs_are_same(&inst->SrcReg[0], &inst->SrcReg[1])) {
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_float(prog, 1.0f);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 }
+	 break;
+
+      case OPCODE_SGE:
+	 if (src_regs_are_constant(inst, 2)) {
+	    float a[4];
+	    float b[4];
+	    float result[4];
+
+	    get_value(prog, &inst->SrcReg[0], a);
+	    get_value(prog, &inst->SrcReg[1], b);
+
+	    result[0] = (a[0] >= b[0]) ? 1.0f : 0.0f;
+	    result[1] = (a[1] >= b[1]) ? 1.0f : 0.0f;
+	    result[2] = (a[2] >= b[2]) ? 1.0f : 0.0f;
+	    result[3] = (a[3] >= b[3]) ? 1.0f : 0.0f;
+
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_vec4(prog, result);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 } else if (src_regs_are_same(&inst->SrcReg[0], &inst->SrcReg[1])) {
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_float(prog, 1.0f);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 }
+	 break;
+
+      case OPCODE_SGT:
+	 if (src_regs_are_constant(inst, 2)) {
+	    float a[4];
+	    float b[4];
+	    float result[4];
+
+	    get_value(prog, &inst->SrcReg[0], a);
+	    get_value(prog, &inst->SrcReg[1], b);
+
+	    result[0] = (a[0] > b[0]) ? 1.0f : 0.0f;
+	    result[1] = (a[1] > b[1]) ? 1.0f : 0.0f;
+	    result[2] = (a[2] > b[2]) ? 1.0f : 0.0f;
+	    result[3] = (a[3] > b[3]) ? 1.0f : 0.0f;
+
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_vec4(prog, result);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 } else if (src_regs_are_same(&inst->SrcReg[0], &inst->SrcReg[1])) {
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_float(prog, 0.0f);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 }
+	 break;
+
+      case OPCODE_SLE:
+	 if (src_regs_are_constant(inst, 2)) {
+	    float a[4];
+	    float b[4];
+	    float result[4];
+
+	    get_value(prog, &inst->SrcReg[0], a);
+	    get_value(prog, &inst->SrcReg[1], b);
+
+	    result[0] = (a[0] <= b[0]) ? 1.0f : 0.0f;
+	    result[1] = (a[1] <= b[1]) ? 1.0f : 0.0f;
+	    result[2] = (a[2] <= b[2]) ? 1.0f : 0.0f;
+	    result[3] = (a[3] <= b[3]) ? 1.0f : 0.0f;
+
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_vec4(prog, result);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 } else if (src_regs_are_same(&inst->SrcReg[0], &inst->SrcReg[1])) {
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_float(prog, 1.0f);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 }
+	 break;
+
+      case OPCODE_SLT:
+	 if (src_regs_are_constant(inst, 2)) {
+	    float a[4];
+	    float b[4];
+	    float result[4];
+
+	    get_value(prog, &inst->SrcReg[0], a);
+	    get_value(prog, &inst->SrcReg[1], b);
+
+	    result[0] = (a[0] < b[0]) ? 1.0f : 0.0f;
+	    result[1] = (a[1] < b[1]) ? 1.0f : 0.0f;
+	    result[2] = (a[2] < b[2]) ? 1.0f : 0.0f;
+	    result[3] = (a[3] < b[3]) ? 1.0f : 0.0f;
+
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_vec4(prog, result);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 } else if (src_regs_are_same(&inst->SrcReg[0], &inst->SrcReg[1])) {
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_float(prog, 0.0f);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 }
+	 break;
+
+      case OPCODE_SNE:
+	 if (src_regs_are_constant(inst, 2)) {
+	    float a[4];
+	    float b[4];
+	    float result[4];
+
+	    get_value(prog, &inst->SrcReg[0], a);
+	    get_value(prog, &inst->SrcReg[1], b);
+
+	    result[0] = (a[0] != b[0]) ? 1.0f : 0.0f;
+	    result[1] = (a[1] != b[1]) ? 1.0f : 0.0f;
+	    result[2] = (a[2] != b[2]) ? 1.0f : 0.0f;
+	    result[3] = (a[3] != b[3]) ? 1.0f : 0.0f;
+
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_vec4(prog, result);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 } else if (src_regs_are_same(&inst->SrcReg[0], &inst->SrcReg[1])) {
+	    inst->Opcode = OPCODE_MOV;
+	    inst->SrcReg[0] = src_reg_for_float(prog, 0.0f);
+
+	    inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	    inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+	    progress = true;
+	 }
+	 break;
+
+      default:
+	 break;
+      }
+   }
+
+   return progress;
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_optimize.c b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_optimize.c
new file mode 100644
index 0000000..55ad0e3
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_optimize.c
@@ -0,0 +1,1362 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2009  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * VMWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
+ * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+
+#include "main/glheader.h"
+#include "main/context.h"
+#include "main/macros.h"
+#include "program.h"
+#include "prog_instruction.h"
+#include "prog_optimize.h"
+#include "prog_print.h"
+
+
+#define MAX_LOOP_NESTING 50
+/* MAX_PROGRAM_TEMPS is a low number (256), and we want to be able to
+ * register allocate many temporary values into that small number of
+ * temps.  So allow large temporary indices coming into the register
+ * allocator.
+ */
+#define REG_ALLOCATE_MAX_PROGRAM_TEMPS	((1 << INST_INDEX_BITS) - 1)
+
+static GLboolean dbg = GL_FALSE;
+
+#define NO_MASK 0xf
+
+/**
+ * Returns the mask of channels (bitmask of WRITEMASK_X,Y,Z,W) that
+ * are read from the given src in this instruction.  The optional
+ * dst_mask argument can be used to mask off components of the dst
+ * register.
+ */
+static GLuint
+get_src_arg_mask(const struct prog_instruction *inst,
+                 GLuint arg, GLuint dst_mask)
+{
+   GLuint read_mask, channel_mask;
+   GLuint comp;
+
+   ASSERT(arg < _mesa_num_inst_src_regs(inst->Opcode));
+
+   /* From the dst register, find the written channels */
+   if (inst->CondUpdate) {
+      channel_mask = WRITEMASK_XYZW;
+   }
+   else {
+      switch (inst->Opcode) {
+      case OPCODE_MOV:
+      case OPCODE_MIN:
+      case OPCODE_MAX:
+      case OPCODE_ABS:
+      case OPCODE_ADD:
+      case OPCODE_MAD:
+      case OPCODE_MUL:
+      case OPCODE_SUB:
+      case OPCODE_CMP:
+      case OPCODE_FLR:
+      case OPCODE_FRC:
+      case OPCODE_LRP:
+      case OPCODE_SEQ:
+      case OPCODE_SGE:
+      case OPCODE_SGT:
+      case OPCODE_SLE:
+      case OPCODE_SLT:
+      case OPCODE_SNE:
+      case OPCODE_SSG:
+         channel_mask = inst->DstReg.WriteMask & dst_mask;
+         break;
+      case OPCODE_RCP:
+      case OPCODE_SIN:
+      case OPCODE_COS:
+      case OPCODE_RSQ:
+      case OPCODE_POW:
+      case OPCODE_EX2:
+      case OPCODE_LOG:
+         channel_mask = WRITEMASK_X;
+         break;
+      case OPCODE_DP2:
+         channel_mask = WRITEMASK_XY;
+         break;
+      case OPCODE_DP3:
+      case OPCODE_XPD:
+         channel_mask = WRITEMASK_XYZ;
+         break;
+      default:
+         channel_mask = WRITEMASK_XYZW;
+         break;
+      }
+   }
+
+   /* Now, given the src swizzle and the written channels, find which
+    * components are actually read
+    */
+   read_mask = 0x0;
+   for (comp = 0; comp < 4; ++comp) {
+      const GLuint coord = GET_SWZ(inst->SrcReg[arg].Swizzle, comp);
+      ASSERT(coord < 4);
+      if (channel_mask & (1 << comp) && coord <= SWIZZLE_W)
+         read_mask |= 1 << coord;
+   }
+
+   return read_mask;
+}
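+
+/* Illustrative example (annotation, not original code): for
+ *    MUL dst.xy, src0.wzyx, src1;
+ * with no condition update, the written channels are X and Y, so only
+ * swizzle slots 0 and 1 of src0 are consulted; they select W and Z, and
+ * get_src_arg_mask(inst, 0, NO_MASK) returns WRITEMASK_ZW.
+ */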
+
+
+/**
+ * For a MOV instruction, compute the dst write mask when the src register
+ * also has a mask
+ */
+static GLuint
+get_dst_mask_for_mov(const struct prog_instruction *mov, GLuint src_mask)
+{
+   const GLuint mask = mov->DstReg.WriteMask;
+   GLuint comp;
+   GLuint updated_mask = 0x0;
+
+   ASSERT(mov->Opcode == OPCODE_MOV);
+
+   for (comp = 0; comp < 4; ++comp) {
+      GLuint src_comp;
+      if ((mask & (1 << comp)) == 0)
+         continue;
+      src_comp = GET_SWZ(mov->SrcReg[0].Swizzle, comp);
+      if ((src_mask & (1 << src_comp)) == 0)
+         continue;
+      updated_mask |= 1 << comp;
+   }
+
+   return updated_mask;
+}
+
+
+/**
+ * Check whether the swizzle is regular.  That is, all of the swizzle
+ * terms are SWIZZLE_X,Y,Z,W and not SWIZZLE_ZERO or SWIZZLE_ONE.
+ */
+static GLboolean
+is_swizzle_regular(GLuint swz)
+{
+   return GET_SWZ(swz,0) <= SWIZZLE_W &&
+          GET_SWZ(swz,1) <= SWIZZLE_W &&
+          GET_SWZ(swz,2) <= SWIZZLE_W &&
+          GET_SWZ(swz,3) <= SWIZZLE_W;
+}
+
+
+/**
+ * In 'prog' remove instruction[i] if removeFlags[i] == TRUE.
+ * \return number of instructions removed
+ */
+static GLuint
+remove_instructions(struct gl_program *prog, const GLboolean *removeFlags)
+{
+   GLint i, removeEnd = 0, removeCount = 0;
+   GLuint totalRemoved = 0;
+
+   /* go backward */
+   for (i = prog->NumInstructions - 1; i >= 0; i--) {
+      if (removeFlags[i]) {
+         totalRemoved++;
+         if (removeCount == 0) {
+            /* begin a run of instructions to remove */
+            removeEnd = i;
+            removeCount = 1;
+         }
+         else {
+            /* extend the run of instructions to remove */
+            removeCount++;
+         }
+      }
+      else {
+         /* don't remove this instruction, but check if the preceding
+          * instructions are to be removed.
+          */
+         if (removeCount > 0) {
+            GLint removeStart = removeEnd - removeCount + 1;
+            _mesa_delete_instructions(prog, removeStart, removeCount);
+            removeStart = removeCount = 0; /* reset removal info */
+         }
+      }
+   }
+   /* Finish removing if the first instruction was to be removed. */
+   if (removeCount > 0) {
+      GLint removeStart = removeEnd - removeCount + 1;
+      _mesa_delete_instructions(prog, removeStart, removeCount);
+   }
+   return totalRemoved;
+}
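+
+/* Sketch of the run batching above (annotation, not original code): with
+ * removeFlags = {0,1,1,0,1}, the backward scan first deletes the single
+ * instruction at index 4, then the run [1,2] with one
+ * _mesa_delete_instructions() call, and returns 3.
+ */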
+
+
+/**
+ * Remap register indexes according to map.
+ * \param prog  the program to search/replace
+ * \param file  the type of register file to search/replace
+ * \param map  maps old register indexes to new indexes
+ */
+static void
+replace_regs(struct gl_program *prog, gl_register_file file, const GLint map[])
+{
+   GLuint i;
+
+   for (i = 0; i < prog->NumInstructions; i++) {
+      struct prog_instruction *inst = prog->Instructions + i;
+      const GLuint numSrc = _mesa_num_inst_src_regs(inst->Opcode);
+      GLuint j;
+      for (j = 0; j < numSrc; j++) {
+         if (inst->SrcReg[j].File == file) {
+            GLuint index = inst->SrcReg[j].Index;
+            ASSERT(map[index] >= 0);
+            inst->SrcReg[j].Index = map[index];
+         }
+      }
+      if (inst->DstReg.File == file) {
+         const GLuint index = inst->DstReg.Index;
+         ASSERT(map[index] >= 0);
+         inst->DstReg.Index = map[index];
+      }
+   }
+}
+
+
+/**
+ * Remove dead instructions from the given program.
+ * This is very primitive for now.  Basically look for temp registers
+ * that are written to but never read.  Remove any instructions that
+ * write to such registers.  Be careful with condition code setters.
+ */
+static GLboolean
+_mesa_remove_dead_code_global(struct gl_program *prog)
+{
+   GLboolean tempRead[REG_ALLOCATE_MAX_PROGRAM_TEMPS][4];
+   GLboolean *removeInst; /* per-instruction removal flag */
+   GLuint i, rem = 0, comp;
+
+   memset(tempRead, 0, sizeof(tempRead));
+
+   if (dbg) {
+      printf("Optimize: Begin dead code removal\n");
+      /*_mesa_print_program(prog);*/
+   }
+
+   removeInst =
+      calloc(1, prog->NumInstructions * sizeof(GLboolean));
+
+   /* Determine which temps are read and written */
+   for (i = 0; i < prog->NumInstructions; i++) {
+      const struct prog_instruction *inst = prog->Instructions + i;
+      const GLuint numSrc = _mesa_num_inst_src_regs(inst->Opcode);
+      GLuint j;
+
+      /* check src regs */
+      for (j = 0; j < numSrc; j++) {
+         if (inst->SrcReg[j].File == PROGRAM_TEMPORARY) {
+            const GLuint index = inst->SrcReg[j].Index;
+            GLuint read_mask;
+            ASSERT(index < REG_ALLOCATE_MAX_PROGRAM_TEMPS);
+	    read_mask = get_src_arg_mask(inst, j, NO_MASK);
+
+            if (inst->SrcReg[j].RelAddr) {
+               if (dbg)
+                  printf("abort remove dead code (indirect temp)\n");
+               goto done;
+            }
+
+	    for (comp = 0; comp < 4; comp++) {
+	       const GLuint swz = GET_SWZ(inst->SrcReg[j].Swizzle, comp);
+	       ASSERT(swz < 4);
+               if ((read_mask & (1 << swz)) == 0)
+		  continue;
+               if (swz <= SWIZZLE_W)
+                  tempRead[index][swz] = GL_TRUE;
+	    }
+         }
+      }
+
+      /* check dst reg */
+      if (inst->DstReg.File == PROGRAM_TEMPORARY) {
+         const GLuint index = inst->DstReg.Index;
+         ASSERT(index < REG_ALLOCATE_MAX_PROGRAM_TEMPS);
+
+         if (inst->DstReg.RelAddr) {
+            if (dbg)
+               printf("abort remove dead code (indirect temp)\n");
+            goto done;
+         }
+
+         if (inst->CondUpdate) {
+            /* If we're writing to this register and setting condition
+             * codes we cannot remove the instruction.  Prevent removal
+             * by setting the 'read' flag.
+             */
+            tempRead[index][0] = GL_TRUE;
+            tempRead[index][1] = GL_TRUE;
+            tempRead[index][2] = GL_TRUE;
+            tempRead[index][3] = GL_TRUE;
+         }
+      }
+   }
+
+   /* find instructions that write to dead registers, flag for removal */
+   for (i = 0; i < prog->NumInstructions; i++) {
+      struct prog_instruction *inst = prog->Instructions + i;
+      const GLuint numDst = _mesa_num_inst_dst_regs(inst->Opcode);
+
+      if (numDst != 0 && inst->DstReg.File == PROGRAM_TEMPORARY) {
+         GLint chan, index = inst->DstReg.Index;
+
+	 for (chan = 0; chan < 4; chan++) {
+	    if (!tempRead[index][chan] &&
+		inst->DstReg.WriteMask & (1 << chan)) {
+	       if (dbg) {
+		  printf("Remove writemask on %u.%c\n", i,
+			       chan == 3 ? 'w' : 'x' + chan);
+	       }
+	       inst->DstReg.WriteMask &= ~(1 << chan);
+	       rem++;
+	    }
+	 }
+
+	 if (inst->DstReg.WriteMask == 0) {
+	    /* If we cleared all writes, the instruction can be removed. */
+	    if (dbg)
+	       printf("Remove instruction %u: \n", i);
+	    removeInst[i] = GL_TRUE;
+	 }
+      }
+   }
+
+   /* now remove the instructions which aren't needed */
+   rem = remove_instructions(prog, removeInst);
+
+   if (dbg) {
+      printf("Optimize: End dead code removal.\n");
+      printf("  %u channel writes removed\n", rem);
+      printf("  %u instructions removed\n", rem);
+      /*_mesa_print_program(prog);*/
+   }
+
+done:
+   free(removeInst);
+   return rem != 0;
+}
+
+
+enum inst_use
+{
+   READ,
+   WRITE,
+   FLOW,
+   END
+};
+
+
+/**
+ * Scan forward in the program from 'start' for the next occurrence of TEMP[index].
+ * We check whether an instruction reads the components given by 'mask' and
+ * whether they are completely overwritten.
+ * Return READ, WRITE, FLOW or END to indicate the next usage, or an
+ * indicator that we cannot look any further.
+ */
+static enum inst_use
+find_next_use(const struct gl_program *prog,
+              GLuint start,
+              GLuint index,
+              GLuint mask)
+{
+   GLuint i;
+
+   for (i = start; i < prog->NumInstructions; i++) {
+      const struct prog_instruction *inst = prog->Instructions + i;
+      switch (inst->Opcode) {
+      case OPCODE_BGNLOOP:
+      case OPCODE_BGNSUB:
+      case OPCODE_CAL:
+      case OPCODE_CONT:
+      case OPCODE_IF:
+      case OPCODE_ELSE:
+      case OPCODE_ENDIF:
+      case OPCODE_ENDLOOP:
+      case OPCODE_ENDSUB:
+      case OPCODE_RET:
+         return FLOW;
+      case OPCODE_END:
+         return END;
+      default:
+         {
+            const GLuint numSrc = _mesa_num_inst_src_regs(inst->Opcode);
+            GLuint j;
+            for (j = 0; j < numSrc; j++) {
+               if (inst->SrcReg[j].RelAddr ||
+                   (inst->SrcReg[j].File == PROGRAM_TEMPORARY &&
+                   inst->SrcReg[j].Index == index &&
+                   (get_src_arg_mask(inst,j,NO_MASK) & mask)))
+                  return READ;
+            }
+            if (_mesa_num_inst_dst_regs(inst->Opcode) == 1 &&
+                inst->DstReg.File == PROGRAM_TEMPORARY &&
+                inst->DstReg.Index == index) {
+               mask &= ~inst->DstReg.WriteMask;
+               if (mask == 0)
+                  return WRITE;
+            }
+         }
+      }
+   }
+   return END;
+}
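+
+/* Usage sketch (annotation, not original code): for the program
+ *    0: MOV t0.xy, c0;
+ *    1: MOV t0.xyzw, c1;
+ *    2: END;
+ * find_next_use(prog, 1, t0, WRITEMASK_XY) returns WRITE because instruction
+ * 1 overwrites both tracked components before anything reads them, which is
+ * exactly the case where the local dead-code pass can drop instruction 0.
+ */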
+
+
+/**
+ * Is the given instruction opcode a flow-control opcode?
+ * XXX maybe move this into prog_instruction.[ch]
+ */
+static GLboolean
+_mesa_is_flow_control_opcode(enum prog_opcode opcode)
+{
+   switch (opcode) {
+   case OPCODE_BGNLOOP:
+   case OPCODE_BGNSUB:
+   case OPCODE_CAL:
+   case OPCODE_CONT:
+   case OPCODE_IF:
+   case OPCODE_ELSE:
+   case OPCODE_END:
+   case OPCODE_ENDIF:
+   case OPCODE_ENDLOOP:
+   case OPCODE_ENDSUB:
+   case OPCODE_RET:
+      return GL_TRUE;
+   default:
+      return GL_FALSE;
+   }
+}
+
+
+/**
+ * Test if the given instruction is a simple MOV (no conditional updating,
+ * no relative addressing, no negation/abs, etc.).
+ */
+static GLboolean
+can_downward_mov_be_modifed(const struct prog_instruction *mov)
+{
+   return
+      mov->Opcode == OPCODE_MOV &&
+      mov->CondUpdate == GL_FALSE &&
+      mov->SrcReg[0].RelAddr == 0 &&
+      mov->SrcReg[0].Negate == 0 &&
+      mov->SrcReg[0].Abs == 0 &&
+      mov->SrcReg[0].HasIndex2 == 0 &&
+      mov->SrcReg[0].RelAddr2 == 0 &&
+      mov->DstReg.RelAddr == 0 &&
+      mov->DstReg.CondMask == COND_TR;
+}
+
+
+static GLboolean
+can_upward_mov_be_modifed(const struct prog_instruction *mov)
+{
+   return
+      can_downward_mov_be_modifed(mov) &&
+      mov->DstReg.File == PROGRAM_TEMPORARY &&
+      mov->SaturateMode == SATURATE_OFF;
+}
+
+
+/**
+ * Try to remove use of extraneous MOV instructions, to free them up for dead
+ * code removal.
+ */
+static void
+_mesa_remove_extra_move_use(struct gl_program *prog)
+{
+   GLuint i, j;
+
+   if (dbg) {
+      printf("Optimize: Begin remove extra move use\n");
+      _mesa_print_program(prog);
+   }
+
+   /*
+    * Look for sequences such as this:
+    *    MOV tmpX, arg0;
+    *    ...
+    *    FOO tmpY, tmpX, arg1;
+    * and convert into:
+    *    MOV tmpX, arg0;
+    *    ...
+    *    FOO tmpY, arg0, arg1;
+    */
+
+   for (i = 0; i + 1 < prog->NumInstructions; i++) {
+      const struct prog_instruction *mov = prog->Instructions + i;
+      GLuint dst_mask, src_mask;
+      if (can_upward_mov_be_modifed(mov) == GL_FALSE)
+         continue;
+
+      /* While scanning the code, these two masks track which components
+       * are still active
+       */
+      dst_mask = mov->DstReg.WriteMask;
+      src_mask = get_src_arg_mask(mov, 0, NO_MASK);
+
+      /* Walk through the remaining instructions until the dst or src reg
+       * gets rewritten or we run into some flow control, eliminating uses
+       * of this MOV.
+       */
+      for (j = i + 1; j < prog->NumInstructions; j++) {
+	 struct prog_instruction *inst2 = prog->Instructions + j;
+         GLuint arg;
+
+	 if (_mesa_is_flow_control_opcode(inst2->Opcode))
+	     break;
+
+	 /* First rewrite this instruction's args if appropriate. */
+	 for (arg = 0; arg < _mesa_num_inst_src_regs(inst2->Opcode); arg++) {
+	    GLuint comp, read_mask;
+
+	    if (inst2->SrcReg[arg].File != mov->DstReg.File ||
+		inst2->SrcReg[arg].Index != mov->DstReg.Index ||
+		inst2->SrcReg[arg].RelAddr ||
+		inst2->SrcReg[arg].Abs)
+	       continue;
+            read_mask = get_src_arg_mask(inst2, arg, NO_MASK);
+
+	    /* Adjust the swizzles of inst2 to point at MOV's source if ALL the
+             * components read still come from the mov instruction
+             */
+            if (is_swizzle_regular(inst2->SrcReg[arg].Swizzle) &&
+               (read_mask & dst_mask) == read_mask) {
+               for (comp = 0; comp < 4; comp++) {
+                  const GLuint inst2_swz =
+                     GET_SWZ(inst2->SrcReg[arg].Swizzle, comp);
+                  const GLuint s = GET_SWZ(mov->SrcReg[0].Swizzle, inst2_swz);
+                  inst2->SrcReg[arg].Swizzle &= ~(7 << (3 * comp));
+                  inst2->SrcReg[arg].Swizzle |= s << (3 * comp);
+                  inst2->SrcReg[arg].Negate ^= (((mov->SrcReg[0].Negate >>
+                                                  inst2_swz) & 0x1) << comp);
+               }
+               inst2->SrcReg[arg].File = mov->SrcReg[0].File;
+               inst2->SrcReg[arg].Index = mov->SrcReg[0].Index;
+            }
+	 }
+
+	 /* The source of MOV is written. This potentially deactivates some
+          * components from the src and dst of the MOV instruction
+          */
+	 if (inst2->DstReg.File == mov->DstReg.File &&
+	     (inst2->DstReg.RelAddr ||
+	      inst2->DstReg.Index == mov->DstReg.Index)) {
+            dst_mask &= ~inst2->DstReg.WriteMask;
+            src_mask = get_src_arg_mask(mov, 0, dst_mask);
+         }
+
+         /* Idem when the destination of mov is written */
+	 if (inst2->DstReg.File == mov->SrcReg[0].File &&
+	     (inst2->DstReg.RelAddr ||
+	      inst2->DstReg.Index == mov->SrcReg[0].Index)) {
+            src_mask &= ~inst2->DstReg.WriteMask;
+            dst_mask &= get_dst_mask_for_mov(mov, src_mask);
+         }
+         if (dst_mask == 0)
+            break;
+      }
+   }
+
+   if (dbg) {
+      printf("Optimize: End remove extra move use.\n");
+      /*_mesa_print_program(prog);*/
+   }
+}
+
+
+/**
+ * Complements _mesa_remove_dead_code_global().  Try to remove dead code
+ * within a block by carefully tracking the swizzles.  Both functions should
+ * eventually be merged into one pass with a proper control-flow graph.
+ */
+static GLboolean
+_mesa_remove_dead_code_local(struct gl_program *prog)
+{
+   GLboolean *removeInst;
+   GLuint i, arg, rem = 0;
+
+   removeInst =
+      calloc(1, prog->NumInstructions * sizeof(GLboolean));
+
+   for (i = 0; i < prog->NumInstructions; i++) {
+      const struct prog_instruction *inst = prog->Instructions + i;
+      const GLuint index = inst->DstReg.Index;
+      const GLuint mask = inst->DstReg.WriteMask;
+      enum inst_use use;
+
+      /* We must deactivate the pass as soon as some indirection is used */
+      if (inst->DstReg.RelAddr)
+         goto done;
+      for (arg = 0; arg < _mesa_num_inst_src_regs(inst->Opcode); arg++)
+         if (inst->SrcReg[arg].RelAddr)
+            goto done;
+
+      if (_mesa_is_flow_control_opcode(inst->Opcode) ||
+          _mesa_num_inst_dst_regs(inst->Opcode) == 0 ||
+          inst->DstReg.File != PROGRAM_TEMPORARY ||
+          inst->DstReg.RelAddr)
+         continue;
+
+      use = find_next_use(prog, i+1, index, mask);
+      if (use == WRITE || use == END)
+         removeInst[i] = GL_TRUE;
+   }
+
+   rem = remove_instructions(prog, removeInst);
+
+done:
+   free(removeInst);
+   return rem != 0;
+}
+
+
+/**
+ * Try to inject the destination of mov as the destination of inst and recompute
+ * the swizzle operators for the sources of inst if required.  Return GL_TRUE
+ * if the substitution was possible, GL_FALSE otherwise.
+ */
+static GLboolean
+_mesa_merge_mov_into_inst(struct prog_instruction *inst,
+                          const struct prog_instruction *mov)
+{
+   /* Indirection table which associates destination and source components for
+    * the mov instruction
+    */
+   const GLuint mask = get_src_arg_mask(mov, 0, NO_MASK);
+
+   /* Some components are not written by inst. We cannot remove the mov */
+   if (mask != (inst->DstReg.WriteMask & mask))
+      return GL_FALSE;
+
+   inst->SaturateMode |= mov->SaturateMode;
+
+   /* Depending on the instruction, we may need to recompute the swizzles.
+    * Also, some other instructions (like TEX) are not linear. We will only
+    * consider completely active sources and destinations
+    */
+   switch (inst->Opcode) {
+
+   /* Cartesian instructions: we compute the swizzle */
+   case OPCODE_MOV:
+   case OPCODE_MIN:
+   case OPCODE_MAX:
+   case OPCODE_ABS:
+   case OPCODE_ADD:
+   case OPCODE_MAD:
+   case OPCODE_MUL:
+   case OPCODE_SUB:
+   {
+      GLuint dst_to_src_comp[4] = {0,0,0,0};
+      GLuint dst_comp, arg;
+      for (dst_comp = 0; dst_comp < 4; ++dst_comp) {
+         if (mov->DstReg.WriteMask & (1 << dst_comp)) {
+            const GLuint src_comp = GET_SWZ(mov->SrcReg[0].Swizzle, dst_comp);
+            ASSERT(src_comp < 4);
+            dst_to_src_comp[dst_comp] = src_comp;
+         }
+      }
+
+      /* Patch each source of the instruction */
+      for (arg = 0; arg < _mesa_num_inst_src_regs(inst->Opcode); arg++) {
+         const GLuint arg_swz = inst->SrcReg[arg].Swizzle;
+         inst->SrcReg[arg].Swizzle = 0;
+
+         /* Reset each active component of the swizzle */
+         for (dst_comp = 0; dst_comp < 4; ++dst_comp) {
+            GLuint src_comp, arg_comp;
+            if ((mov->DstReg.WriteMask & (1 << dst_comp)) == 0)
+               continue;
+            src_comp = dst_to_src_comp[dst_comp];
+            ASSERT(src_comp < 4);
+            arg_comp = GET_SWZ(arg_swz, src_comp);
+            ASSERT(arg_comp < 4);
+            inst->SrcReg[arg].Swizzle |= arg_comp << (3*dst_comp);
+         }
+      }
+      inst->DstReg = mov->DstReg;
+      return GL_TRUE;
+   }
+
+   /* Dot products and scalar instructions: we only change the destination */
+   case OPCODE_RCP:
+   case OPCODE_SIN:
+   case OPCODE_COS:
+   case OPCODE_RSQ:
+   case OPCODE_POW:
+   case OPCODE_EX2:
+   case OPCODE_LOG:
+   case OPCODE_DP2:
+   case OPCODE_DP3:
+   case OPCODE_DP4:
+      inst->DstReg = mov->DstReg;
+      return GL_TRUE;
+
+   /* All other instructions require fully active components with no swizzle */
+   default:
+      if (mov->SrcReg[0].Swizzle != SWIZZLE_XYZW ||
+          inst->DstReg.WriteMask != WRITEMASK_XYZW)
+         return GL_FALSE;
+      inst->DstReg = mov->DstReg;
+      return GL_TRUE;
+   }
+}
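+
+/* Worked example for the Cartesian case above (annotation, not original
+ * code): merging
+ *    ADD t0, a, b;
+ *    MOV t1.x, t0.w;
+ * gives dst_to_src_comp[0] = W, so the X slot of each source swizzle is
+ * replaced by its W component and the pair collapses to
+ *    ADD t1.x, a.wxxx, b.wxxx;
+ * (the inactive swizzle slots are reset to X and masked out by the
+ * write mask).
+ */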
+
+
+/**
+ * Try to remove extraneous MOV instructions from the given program.
+ */
+static GLboolean
+_mesa_remove_extra_moves(struct gl_program *prog)
+{
+   GLboolean *removeInst; /* per-instruction removal flag */
+   GLuint i, rem = 0, nesting = 0;
+
+   if (dbg) {
+      printf("Optimize: Begin remove extra moves\n");
+      _mesa_print_program(prog);
+   }
+
+   removeInst =
+      calloc(1, prog->NumInstructions * sizeof(GLboolean));
+
+   /*
+    * Look for sequences such as this:
+    *    FOO tmpX, arg0, arg1;
+    *    MOV tmpY, tmpX;
+    * and convert into:
+    *    FOO tmpY, arg0, arg1;
+    */
+
+   for (i = 0; i < prog->NumInstructions; i++) {
+      const struct prog_instruction *mov = prog->Instructions + i;
+
+      switch (mov->Opcode) {
+      case OPCODE_BGNLOOP:
+      case OPCODE_BGNSUB:
+      case OPCODE_IF:
+         nesting++;
+         break;
+      case OPCODE_ENDLOOP:
+      case OPCODE_ENDSUB:
+      case OPCODE_ENDIF:
+         nesting--;
+         break;
+      case OPCODE_MOV:
+         if (i > 0 &&
+             can_downward_mov_be_modifed(mov) &&
+             mov->SrcReg[0].File == PROGRAM_TEMPORARY &&
+             nesting == 0)
+         {
+
+            /* see if this MOV can be removed */
+            const GLuint id = mov->SrcReg[0].Index;
+            struct prog_instruction *prevInst;
+            GLuint prevI;
+
+            /* get pointer to previous instruction */
+            prevI = i - 1;
+            while (prevI > 0 && removeInst[prevI])
+               prevI--;
+            prevInst = prog->Instructions + prevI;
+
+            if (prevInst->DstReg.File == PROGRAM_TEMPORARY &&
+                prevInst->DstReg.Index == id &&
+                prevInst->DstReg.RelAddr == 0 &&
+                prevInst->DstReg.CondMask == COND_TR) {
+
+               const GLuint dst_mask = prevInst->DstReg.WriteMask;
+               enum inst_use next_use = find_next_use(prog, i+1, id, dst_mask);
+
+               if (next_use == WRITE || next_use == END) {
+                  /* OK, we can safely remove this MOV instruction.
+                   * Transform:
+                   *   prevI: FOO tempIndex, x, y;
+                   *       i: MOV z, tempIndex;
+                   * Into:
+                   *   prevI: FOO z, x, y;
+                   */
+                  if (_mesa_merge_mov_into_inst(prevInst, mov)) {
+                     removeInst[i] = GL_TRUE;
+                     if (dbg) {
+                        printf("Remove MOV at %u\n", i);
+                        printf("new prev inst %u: ", prevI);
+                        _mesa_print_instruction(prevInst);
+                     }
+                  }
+               }
+            }
+         }
+         break;
+      default:
+         ; /* nothing */
+      }
+   }
+
+   /* now remove the instructions which aren't needed */
+   rem = remove_instructions(prog, removeInst);
+
+   free(removeInst);
+
+   if (dbg) {
+      printf("Optimize: End remove extra moves.  %u instructions removed\n", rem);
+      /*_mesa_print_program(prog);*/
+   }
+
+   return rem != 0;
+}
+
+
+/** A live register interval */
+struct interval
+{
+   GLuint Reg;         /**< The temporary register index */
+   GLuint Start, End;  /**< Start/end instruction numbers */
+};
+
+
+/** A list of register intervals */
+struct interval_list
+{
+   GLuint Num;
+   struct interval Intervals[REG_ALLOCATE_MAX_PROGRAM_TEMPS];
+};
+
+
+static void
+append_interval(struct interval_list *list, const struct interval *inv)
+{
+   list->Intervals[list->Num++] = *inv;
+}
+
+
+/** Insert interval inv into list, sorted by interval end */
+static void
+insert_interval_by_end(struct interval_list *list, const struct interval *inv)
+{
+   /* XXX we could do a binary search insertion here since list is sorted */
+   GLint i = list->Num - 1;
+   while (i >= 0 && list->Intervals[i].End > inv->End) {
+      list->Intervals[i + 1] = list->Intervals[i];
+      i--;
+   }
+   list->Intervals[i + 1] = *inv;
+   list->Num++;
+
+#ifdef DEBUG
+   {
+      GLuint i;
+      for (i = 0; i + 1 < list->Num; i++) {
+         ASSERT(list->Intervals[i].End <= list->Intervals[i + 1].End);
+      }
+   }
+#endif
+}
+
+
+/** Remove the given interval from the interval list */
+static void
+remove_interval(struct interval_list *list, const struct interval *inv)
+{
+   /* XXX we could binary search since list is sorted */
+   GLuint k;
+   for (k = 0; k < list->Num; k++) {
+      if (list->Intervals[k].Reg == inv->Reg) {
+         /* found, remove it */
+         ASSERT(list->Intervals[k].Start == inv->Start);
+         ASSERT(list->Intervals[k].End == inv->End);
+         while (k < list->Num - 1) {
+            list->Intervals[k] = list->Intervals[k + 1];
+            k++;
+         }
+         list->Num--;
+         return;
+      }
+   }
+}
+
+
+/** called by qsort() */
+static int
+compare_start(const void *a, const void *b)
+{
+   const struct interval *ia = (const struct interval *) a;
+   const struct interval *ib = (const struct interval *) b;
+   if (ia->Start < ib->Start)
+      return -1;
+   else if (ia->Start > ib->Start)
+      return +1;
+   else
+      return 0;
+}
+
+
+/** sort the interval list according to interval starts */
+static void
+sort_interval_list_by_start(struct interval_list *list)
+{
+   qsort(list->Intervals, list->Num, sizeof(struct interval), compare_start);
+#ifdef DEBUG
+   {
+      GLuint i;
+      for (i = 0; i + 1 < list->Num; i++) {
+         ASSERT(list->Intervals[i].Start <= list->Intervals[i + 1].Start);
+      }
+   }
+#endif
+}
+
+struct loop_info
+{
+   GLuint Start, End;  /**< Start, end instructions of loop */
+};
+
+/**
+ * Update the intermediate interval info for register 'index' and
+ * instruction 'ic'.
+ */
+static void
+update_interval(GLint intBegin[], GLint intEnd[],
+		struct loop_info *loopStack, GLuint loopStackDepth,
+		GLuint index, GLuint ic)
+{
+   int i;
+   GLuint begin = ic;
+   GLuint end = ic;
+
+   /* If the register is used in a loop, extend its lifetime through the end
+    * of the outermost loop that doesn't contain its definition.
+    */
+   for (i = 0; i < loopStackDepth; i++) {
+      if (intBegin[index] < loopStack[i].Start) {
+	 end = loopStack[i].End;
+	 break;
+      }
+   }
+
+   /* Variables that are live at the end of a loop will also be live at the
+    * beginning, so an instruction inside of a loop should have its live
+    * interval begin at the start of the outermost loop.
+    */
+   if (loopStackDepth > 0 && ic > loopStack[0].Start && ic < loopStack[0].End) {
+      begin = loopStack[0].Start;
+   }
+
+   ASSERT(index < REG_ALLOCATE_MAX_PROGRAM_TEMPS);
+   if (intBegin[index] == -1) {
+      ASSERT(intEnd[index] == -1);
+      intBegin[index] = begin;
+      intEnd[index] = end;
+   }
+   else {
+      intEnd[index] = end;
+   }
+}
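+
+/* Example of the loop handling above (annotation, not original code): a temp
+ * first written at instruction 5 and then read at instruction 10 inside a
+ * loop spanning [8, 20] gets the interval [5, 20], not [5, 10]: because its
+ * definition precedes the loop, the value must stay live across the loop's
+ * back edge, so 'end' is pushed out to the loop's END instruction.
+ */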
+
+
+/**
+ * Find first/last instruction that references each temporary register.
+ */
+GLboolean
+_mesa_find_temp_intervals(const struct prog_instruction *instructions,
+                          GLuint numInstructions,
+                          GLint intBegin[REG_ALLOCATE_MAX_PROGRAM_TEMPS],
+                          GLint intEnd[REG_ALLOCATE_MAX_PROGRAM_TEMPS])
+{
+   struct loop_info loopStack[MAX_LOOP_NESTING];
+   GLuint loopStackDepth = 0;
+   GLuint i;
+
+   for (i = 0; i < REG_ALLOCATE_MAX_PROGRAM_TEMPS; i++){
+      intBegin[i] = intEnd[i] = -1;
+   }
+
+   /* Scan instructions looking for temporary registers */
+   for (i = 0; i < numInstructions; i++) {
+      const struct prog_instruction *inst = instructions + i;
+      if (inst->Opcode == OPCODE_BGNLOOP) {
+         loopStack[loopStackDepth].Start = i;
+         loopStack[loopStackDepth].End = inst->BranchTarget;
+         loopStackDepth++;
+      }
+      else if (inst->Opcode == OPCODE_ENDLOOP) {
+         loopStackDepth--;
+      }
+      else if (inst->Opcode == OPCODE_CAL) {
+         return GL_FALSE;
+      }
+      else {
+         const GLuint numSrc = 3;/*_mesa_num_inst_src_regs(inst->Opcode);*/
+         GLuint j;
+         for (j = 0; j < numSrc; j++) {
+            if (inst->SrcReg[j].File == PROGRAM_TEMPORARY) {
+               const GLuint index = inst->SrcReg[j].Index;
+               if (inst->SrcReg[j].RelAddr)
+                  return GL_FALSE;
+               update_interval(intBegin, intEnd, loopStack, loopStackDepth,
+			       index, i);
+            }
+         }
+         if (inst->DstReg.File == PROGRAM_TEMPORARY) {
+            const GLuint index = inst->DstReg.Index;
+            if (inst->DstReg.RelAddr)
+               return GL_FALSE;
+            update_interval(intBegin, intEnd, loopStack, loopStackDepth,
+			    index, i);
+         }
+      }
+   }
+
+   return GL_TRUE;
+}
+
+
+/**
+ * Find the live intervals for each temporary register in the program.
+ * For register R, the interval [A,B] indicates that R is referenced
+ * from instruction A through instruction B.
+ * Special consideration is needed for loops and subroutines.
+ * \return GL_TRUE if success, GL_FALSE if we cannot proceed for some reason
+ */
+static GLboolean
+find_live_intervals(struct gl_program *prog,
+                    struct interval_list *liveIntervals)
+{
+   GLint intBegin[REG_ALLOCATE_MAX_PROGRAM_TEMPS];
+   GLint intEnd[REG_ALLOCATE_MAX_PROGRAM_TEMPS];
+   GLuint i;
+
+   /*
+    * Note: we'll return GL_FALSE below if we find relative indexing
+    * into the TEMP register file.  We can't handle that yet.
+    * We also give up on subroutines for now.
+    */
+
+   if (dbg) {
+      printf("Optimize: Begin find intervals\n");
+   }
+
+   /* build intermediate arrays */
+   if (!_mesa_find_temp_intervals(prog->Instructions, prog->NumInstructions,
+                                  intBegin, intEnd))
+      return GL_FALSE;
+
+   /* Build live intervals list from intermediate arrays */
+   liveIntervals->Num = 0;
+   for (i = 0; i < REG_ALLOCATE_MAX_PROGRAM_TEMPS; i++) {
+      if (intBegin[i] >= 0) {
+         struct interval inv;
+         inv.Reg = i;
+         inv.Start = intBegin[i];
+         inv.End = intEnd[i];
+         append_interval(liveIntervals, &inv);
+      }
+   }
+
+   /* Sort the list according to interval starts */
+   sort_interval_list_by_start(liveIntervals);
+
+   if (dbg) {
+      /* print interval info */
+      for (i = 0; i < liveIntervals->Num; i++) {
+         const struct interval *inv = liveIntervals->Intervals + i;
+         printf("Reg[%d] live [%d, %d]:",
+                      inv->Reg, inv->Start, inv->End);
+         if (1) {
+            GLuint j;
+            for (j = 0; j < inv->Start; j++)
+               printf(" ");
+            for (j = inv->Start; j <= inv->End; j++)
+               printf("x");
+         }
+         printf("\n");
+      }
+   }
+
+   return GL_TRUE;
+}
+
+
+/** Scan the array of used register flags to find a free entry */
+static GLint
+alloc_register(GLboolean usedRegs[REG_ALLOCATE_MAX_PROGRAM_TEMPS])
+{
+   GLuint k;
+   for (k = 0; k < REG_ALLOCATE_MAX_PROGRAM_TEMPS; k++) {
+      if (!usedRegs[k]) {
+         usedRegs[k] = GL_TRUE;
+         return k;
+      }
+   }
+   return -1;
+}
+
+
+/**
+ * This function implements "Linear Scan Register Allocation" to reduce
+ * the number of temporary registers used by the program.
+ *
+ * We compute the "live interval" for all temporary registers then
+ * examine the overlap of the intervals to allocate new registers.
+ * Basically, if two intervals do not overlap, they can use the same register.
+ */
+static void
+_mesa_reallocate_registers(struct gl_program *prog)
+{
+   struct interval_list liveIntervals;
+   GLint registerMap[REG_ALLOCATE_MAX_PROGRAM_TEMPS];
+   GLboolean usedRegs[REG_ALLOCATE_MAX_PROGRAM_TEMPS];
+   GLuint i;
+   GLint maxTemp = -1;
+
+   if (dbg) {
+      printf("Optimize: Begin live-interval register reallocation\n");
+      _mesa_print_program(prog);
+   }
+
+   for (i = 0; i < REG_ALLOCATE_MAX_PROGRAM_TEMPS; i++){
+      registerMap[i] = -1;
+      usedRegs[i] = GL_FALSE;
+   }
+
+   if (!find_live_intervals(prog, &liveIntervals)) {
+      if (dbg)
+         printf("Aborting register reallocation\n");
+      return;
+   }
+
+   {
+      struct interval_list activeIntervals;
+      activeIntervals.Num = 0;
+
+      /* loop over live intervals, allocating a new register for each */
+      for (i = 0; i < liveIntervals.Num; i++) {
+         const struct interval *live = liveIntervals.Intervals + i;
+
+         if (dbg)
+            printf("Consider register %u\n", live->Reg);
+
+         /* Expire old intervals.  Intervals which have ended with respect
+          * to the live interval can have their remapped registers freed.
+          */
+         {
+            GLint j;
+            for (j = 0; j < (GLint) activeIntervals.Num; j++) {
+               const struct interval *inv = activeIntervals.Intervals + j;
+               if (inv->End >= live->Start) {
+                  /* Stop now.  Since the activeInterval list is sorted
+                   * we know we don't have to go further.
+                   */
+                  break;
+               }
+               else {
+                  /* Interval 'inv' has expired */
+                  const GLint regNew = registerMap[inv->Reg];
+                  ASSERT(regNew >= 0);
+
+                  if (dbg)
+                     printf("  expire interval for reg %u\n", inv->Reg);
+
+                  /* remove interval j from active list */
+                  remove_interval(&activeIntervals, inv);
+                  j--;  /* counter-act j++ in for-loop above */
+
+                  /* return register regNew to the free pool */
+                  if (dbg)
+                     printf("  free reg %d\n", regNew);
+                  ASSERT(usedRegs[regNew] == GL_TRUE);
+                  usedRegs[regNew] = GL_FALSE;
+               }
+            }
+         }
+
+         /* find a free register for this live interval */
+         {
+            const GLint k = alloc_register(usedRegs);
+            if (k < 0) {
+               /* out of registers, give up */
+               return;
+            }
+            registerMap[live->Reg] = k;
+            maxTemp = MAX2(maxTemp, k);
+            if (dbg)
+               printf("  remap register %u -> %d\n", live->Reg, k);
+         }
+
+         /* Insert this live interval into the active list which is sorted
+          * by increasing end points.
+          */
+         insert_interval_by_end(&activeIntervals, live);
+      }
+   }
+
+   if (maxTemp + 1 < (GLint) liveIntervals.Num) {
+      /* OK, we've reduced the number of registers needed.
+       * Scan the program and replace all the old temporary register
+       * indexes with the new indexes.
+       */
+      replace_regs(prog, PROGRAM_TEMPORARY, registerMap);
+
+      prog->NumTemporaries = maxTemp + 1;
+   }
+
+   if (dbg) {
+      printf("Optimize: End live-interval register reallocation\n");
+      printf("Num temp regs before: %u  after: %u\n",
+                   liveIntervals.Num, maxTemp + 1);
+      _mesa_print_program(prog);
+   }
+}
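+
+/* Worked linear-scan example (annotation, not original code): given intervals
+ *    t3 [0, 5],  t7 [2, 3],  t12 [6, 9]
+ * sorted by start, t3 maps to register 0 and t7 to register 1; when t12 is
+ * considered, both earlier intervals have expired (End < 6), their registers
+ * are returned to the pool, and t12 reuses register 0.  Three temporaries
+ * run in two registers, and replace_regs() rewrites the indices.
+ */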
+
+
+#if 0
+static void
+print_it(struct gl_context *ctx, struct gl_program *program, const char *txt) {
+   fprintf(stderr, "%s (%u inst):\n", txt, program->NumInstructions);
+   _mesa_print_program(program);
+   _mesa_print_program_parameters(ctx, program);
+   fprintf(stderr, "\n\n");
+}
+#endif
+
+/**
+ * This pass replaces CMP T0, T1 T2 T0 with MOV T0, T2 when the CMP
+ * instruction is the first instruction to write to register T0.  There are
+ * several lowering passes done in GLSL IR (e.g. branches and
+ * relative addressing) that create a large number of conditional assignments
+ * that ir_to_mesa converts to CMP instructions like the one mentioned above.
+ *
+ * Here is why this conversion is safe:
+ * CMP T0, T1 T2 T0 can be expanded to:
+ * if (T1 < 0.0)
+ * 	MOV T0, T2;
+ * else
+ * 	MOV T0, T0;
+ *
+ * If (T1 < 0.0) evaluates to true then our replacement MOV T0, T2 is the same
+ * as the original program.  If (T1 < 0.0) evaluates to false, executing
+ * MOV T0, T0 will store a garbage value in T0 since T0 is uninitialized.
+ * Therefore, it doesn't matter that we are replacing MOV T0, T0 with MOV T0, T2
+ * because any instruction that was going to read from T0 after this was going
+ * to read a garbage value anyway.
+ */
+static void
+_mesa_simplify_cmp(struct gl_program * program)
+{
+   GLuint tempWrites[REG_ALLOCATE_MAX_PROGRAM_TEMPS];
+   GLuint outputWrites[MAX_PROGRAM_OUTPUTS];
+   GLuint i;
+
+   if (dbg) {
+      printf("Optimize: Begin reads without writes\n");
+      _mesa_print_program(program);
+   }
+
+   for (i = 0; i < REG_ALLOCATE_MAX_PROGRAM_TEMPS; i++) {
+      tempWrites[i] = 0;
+   }
+
+   for (i = 0; i < MAX_PROGRAM_OUTPUTS; i++) {
+      outputWrites[i] = 0;
+   }
+
+   for (i = 0; i < program->NumInstructions; i++) {
+      struct prog_instruction *inst = program->Instructions + i;
+      GLuint prevWriteMask;
+
+      /* Give up if we encounter relative addressing or flow control. */
+      if (_mesa_is_flow_control_opcode(inst->Opcode) || inst->DstReg.RelAddr) {
+         return;
+      }
+
+      if (inst->DstReg.File == PROGRAM_OUTPUT) {
+         assert(inst->DstReg.Index < MAX_PROGRAM_OUTPUTS);
+         prevWriteMask = outputWrites[inst->DstReg.Index];
+         outputWrites[inst->DstReg.Index] |= inst->DstReg.WriteMask;
+      } else if (inst->DstReg.File == PROGRAM_TEMPORARY) {
+         assert(inst->DstReg.Index < REG_ALLOCATE_MAX_PROGRAM_TEMPS);
+         prevWriteMask = tempWrites[inst->DstReg.Index];
+         tempWrites[inst->DstReg.Index] |= inst->DstReg.WriteMask;
+      } else {
+         /* No other register type can be a destination register. */
+         continue;
+      }
+
+      /* For a CMP to be considered a conditional write, the destination
+       * register and source register two must be the same. */
+      if (inst->Opcode == OPCODE_CMP
+          && !(inst->DstReg.WriteMask & prevWriteMask)
+          && inst->SrcReg[2].File == inst->DstReg.File
+          && inst->SrcReg[2].Index == inst->DstReg.Index
+          && inst->DstReg.WriteMask == get_src_arg_mask(inst, 2, NO_MASK)) {
+
+         inst->Opcode = OPCODE_MOV;
+         inst->SrcReg[0] = inst->SrcReg[1];
+
+	 /* Unused operands are expected to have the file set to
+	  * PROGRAM_UNDEFINED.  This is how _mesa_init_instructions initializes
+	  * all of the sources.
+	  */
+	 inst->SrcReg[1].File = PROGRAM_UNDEFINED;
+	 inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+	 inst->SrcReg[2].File = PROGRAM_UNDEFINED;
+	 inst->SrcReg[2].Swizzle = SWIZZLE_NOOP;
+      }
+   }
+   if (dbg) {
+      printf("Optimize: End reads without writes\n");
+      _mesa_print_program(program);
+   }
+}
+
+/**
+ * Apply optimizations to the given program to eliminate unnecessary
+ * instructions, temp regs, etc.
+ */
+void
+_mesa_optimize_program(struct gl_context *ctx, struct gl_program *program)
+{
+   GLboolean any_change;
+
+   _mesa_simplify_cmp(program);
+   /* Iterate until a pass makes no further modifications */
+   do {
+      any_change = GL_FALSE;
+      _mesa_remove_extra_move_use(program);
+      if (_mesa_remove_dead_code_global(program))
+         any_change = GL_TRUE;
+      if (_mesa_remove_extra_moves(program))
+         any_change = GL_TRUE;
+      if (_mesa_remove_dead_code_local(program))
+         any_change = GL_TRUE;
+
+      any_change = _mesa_constant_fold(program) || any_change;
+      _mesa_reallocate_registers(program);
+   } while (any_change);
+}
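+
+#if 0
+/* Hypothetical driver sketch (not part of the original sources): run the
+ * optimizer on a freshly translated program and dump the result for
+ * inspection, mirroring the debug printing used throughout this file.
+ */
+static void
+optimize_and_dump(struct gl_context *ctx, struct gl_program *prog)
+{
+   _mesa_optimize_program(ctx, prog);
+   _mesa_print_program(prog);
+}
+#endif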
+
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_optimize.h b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_optimize.h
new file mode 100644
index 0000000..7607bff
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_optimize.h
@@ -0,0 +1,49 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2009  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * VMWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
+ * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef PROG_OPT_H
+#define PROG_OPT_H
+
+
+#include "main/config.h"
+#include "main/glheader.h"
+
+
+struct gl_context;
+struct gl_program;
+struct prog_instruction;
+
+
+extern GLboolean
+_mesa_find_temp_intervals(const struct prog_instruction *instructions,
+                          GLuint numInstructions,
+                          GLint intBegin[MAX_PROGRAM_TEMPS],
+                          GLint intEnd[MAX_PROGRAM_TEMPS]);
+
+extern void
+_mesa_optimize_program(struct gl_context *ctx, struct gl_program *program);
+
+extern GLboolean
+_mesa_constant_fold(struct gl_program *prog);
+
+#endif
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_parameter.c b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_parameter.c
new file mode 100644
index 0000000..fcdda9f
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_parameter.c
@@ -0,0 +1,582 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file prog_parameter.c
+ * Program parameter lists and functions.
+ * \author Brian Paul
+ */
+
+
+#include "main/glheader.h"
+#include "main/imports.h"
+#include "main/macros.h"
+#include "prog_instruction.h"
+#include "prog_parameter.h"
+#include "prog_statevars.h"
+
+
+struct gl_program_parameter_list *
+_mesa_new_parameter_list(void)
+{
+   return CALLOC_STRUCT(gl_program_parameter_list);
+}
+
+
+struct gl_program_parameter_list *
+_mesa_new_parameter_list_sized(unsigned size)
+{
+   struct gl_program_parameter_list *p = _mesa_new_parameter_list();
+
+   if ((p != NULL) && (size != 0)) {
+      p->Size = size;
+
+      /* alloc arrays */
+      p->Parameters = (struct gl_program_parameter *)
+	 calloc(1, size * sizeof(struct gl_program_parameter));
+
+      p->ParameterValues = (gl_constant_value (*)[4])
+         _mesa_align_malloc(size * 4 * sizeof(gl_constant_value), 16);
+
+
+      if ((p->Parameters == NULL) || (p->ParameterValues == NULL)) {
+	 free(p->Parameters);
+	 _mesa_align_free(p->ParameterValues);
+	 free(p);
+	 p = NULL;
+      }
+   }
+
+   return p;
+}
+
+
+/**
+ * Free a parameter list and all its parameters
+ */
+void
+_mesa_free_parameter_list(struct gl_program_parameter_list *paramList)
+{
+   GLuint i;
+   for (i = 0; i < paramList->NumParameters; i++) {
+      free((void *)paramList->Parameters[i].Name);
+   }
+   free(paramList->Parameters);
+   _mesa_align_free(paramList->ParameterValues);
+   free(paramList);
+}
+
+
+/**
+ * Add a new parameter to a parameter list.
+ * Note that parameter values are usually 4-element GLfloat vectors.
+ * When size > 4 we'll allocate a sequential block of parameters to
+ * store all the values (in blocks of 4).
+ *
+ * \param paramList  the list to add the parameter to
+ * \param type  type of parameter, such as PROGRAM_CONSTANT or PROGRAM_STATE_VAR
+ * \param name  the parameter name, will be duplicated/copied!
+ * \param size  number of elements in 'values' vector (1..4, or more)
+ * \param datatype  GL_FLOAT, GL_FLOAT_VECx, GL_INT, GL_INT_VECx or GL_NONE.
+ * \param values  initial parameter value, up to 4 gl_constant_values, or NULL
+ * \param state  state indexes, or NULL
+ * \return  index of new parameter in the list, or -1 if error (out of mem)
+ */
+GLint
+_mesa_add_parameter(struct gl_program_parameter_list *paramList,
+                    gl_register_file type, const char *name,
+                    GLuint size, GLenum datatype,
+                    const gl_constant_value *values,
+                    const gl_state_index state[STATE_LENGTH])
+{
+   const GLuint oldNum = paramList->NumParameters;
+   const GLuint sz4 = (size + 3) / 4; /* no. of new param slots needed */
+
+   assert(size > 0);
+
+   if (oldNum + sz4 > paramList->Size) {
+      /* Need to grow the parameter list array (alloc some extra) */
+      paramList->Size = paramList->Size + 4 * sz4;
+
+      /* realloc arrays */
+      paramList->Parameters = (struct gl_program_parameter *)
+	 _mesa_realloc(paramList->Parameters,
+		       oldNum * sizeof(struct gl_program_parameter),
+		       paramList->Size * sizeof(struct gl_program_parameter));
+
+      paramList->ParameterValues = (gl_constant_value (*)[4])
+         _mesa_align_realloc(paramList->ParameterValues,         /* old buf */
+                             oldNum * 4 * sizeof(gl_constant_value),/* old sz */
+                             paramList->Size*4*sizeof(gl_constant_value),/*new*/
+                             16);
+   }
+
+   if (!paramList->Parameters ||
+       !paramList->ParameterValues) {
+      /* out of memory */
+      paramList->NumParameters = 0;
+      paramList->Size = 0;
+      return -1;
+   }
+   else {
+      GLuint i, j;
+
+      paramList->NumParameters = oldNum + sz4;
+
+      memset(&paramList->Parameters[oldNum], 0,
+             sz4 * sizeof(struct gl_program_parameter));
+
+      for (i = 0; i < sz4; i++) {
+         struct gl_program_parameter *p = paramList->Parameters + oldNum + i;
+         p->Name = name ? _mesa_strdup(name) : NULL;
+         p->Type = type;
+         p->Size = size;
+         p->DataType = datatype;
+         if (values) {
+            if (size >= 4) {
+               COPY_4V(paramList->ParameterValues[oldNum + i], values);
+            }
+            else {
+               /* copy 1, 2 or 3 values */
+               GLuint remaining = size % 4;
+               assert(remaining < 4);
+               for (j = 0; j < remaining; j++) {
+                  paramList->ParameterValues[oldNum + i][j].f = values[j].f;
+               }
+               /* fill in remaining positions with zeros */
+               for (; j < 4; j++) {
+                  paramList->ParameterValues[oldNum + i][j].f = 0.0f;
+               }
+            }
+            values += 4;
+            p->Initialized = GL_TRUE;
+         }
+         else {
+            /* silence valgrind */
+            for (j = 0; j < 4; j++)
+            	paramList->ParameterValues[oldNum + i][j].f = 0;
+         }
+         size -= 4;
+      }
+
+      if (state) {
+         for (i = 0; i < STATE_LENGTH; i++)
+            paramList->Parameters[oldNum].StateIndexes[i] = state[i];
+      }
+
+      return (GLint) oldNum;
+   }
+}
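+
+/* Illustrative sketch (an editorial addition, not upstream Mesa code):
+ * adding a named vec4 constant.  The list and values here are assumed for
+ * the example only.
+ *
+ *    gl_constant_value v[4];
+ *    v[0].f = 1.0f;  v[1].f = 0.5f;  v[2].f = 0.0f;  v[3].f = 1.0f;
+ *    GLint idx = _mesa_add_parameter(paramList, PROGRAM_CONSTANT, "myColor",
+ *                                    4, GL_FLOAT_VEC4, v, NULL);
+ *
+ * For size > 4 (e.g. a 4x4 matrix with size == 16), four consecutive slots
+ * are allocated and 'values' must supply four gl_constant_values per slot.
+ */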
+
+
+/**
+ * Add a new named constant to the parameter list.
+ * This will be used when the program contains something like this:
+ *    PARAM myVals = { 0, 1, 2, 3 };
+ *
+ * \param paramList  the parameter list
+ * \param name  the name for the constant
+ * \param values  four float values
+ * \return index/position of the new parameter in the parameter list
+ */
+GLint
+_mesa_add_named_constant(struct gl_program_parameter_list *paramList,
+                         const char *name, const gl_constant_value values[4],
+                         GLuint size)
+{
+   /* first check if this is a duplicate constant */
+   GLint pos;
+   for (pos = 0; pos < (GLint)paramList->NumParameters; pos++) {
+      const gl_constant_value *pvals = paramList->ParameterValues[pos];
+      if (pvals[0].u == values[0].u &&
+          pvals[1].u == values[1].u &&
+          pvals[2].u == values[2].u &&
+          pvals[3].u == values[3].u &&
+          strcmp(paramList->Parameters[pos].Name, name) == 0) {
+         /* Same name and value is already in the param list - reuse it */
+         return pos;
+      }
+   }
+   /* not found, add new parameter */
+   return _mesa_add_parameter(paramList, PROGRAM_CONSTANT, name,
+                              size, GL_NONE, values, NULL);
+}
+
+
+/**
+ * Add a new unnamed constant to the parameter list.  This will be used
+ * when a fragment/vertex program contains something like this:
+ *    MOV r, { 0, 1, 2, 3 };
+ * If swizzleOut is non-null we'll search the parameter list for an
+ * existing instance of the constant which matches with a swizzle.
+ *
+ * \param paramList  the parameter list
+ * \param values  four float values
+ * \param swizzleOut  returns swizzle mask for accessing the constant
+ * \return index/position of the new parameter in the parameter list.
+ */
+GLint
+_mesa_add_typed_unnamed_constant(struct gl_program_parameter_list *paramList,
+                           const gl_constant_value values[4], GLuint size,
+                           GLenum datatype, GLuint *swizzleOut)
+{
+   GLint pos;
+   ASSERT(size >= 1);
+   ASSERT(size <= 4);
+
+   if (swizzleOut &&
+       _mesa_lookup_parameter_constant(paramList, values,
+                                       size, &pos, swizzleOut)) {
+      return pos;
+   }
+
+   /* Look for empty space in an existing unnamed constant parameter
+    * to add this constant.  This will only work for single-element
+    * constants because we rely on smearing (i.e. .yyyy or .zzzz).
+    */
+   if (size == 1 && swizzleOut) {
+      for (pos = 0; pos < (GLint) paramList->NumParameters; pos++) {
+         struct gl_program_parameter *p = paramList->Parameters + pos;
+         if (p->Type == PROGRAM_CONSTANT && p->Size + size <= 4) {
+            /* ok, found room */
+            gl_constant_value *pVal = paramList->ParameterValues[pos];
+            GLuint swz = p->Size; /* 1, 2 or 3 for Y, Z, W */
+            pVal[p->Size] = values[0];
+            p->Size++;
+            *swizzleOut = MAKE_SWIZZLE4(swz, swz, swz, swz);
+            return pos;
+         }
+      }
+   }
+
+   /* add a new parameter to store this constant */
+   pos = _mesa_add_parameter(paramList, PROGRAM_CONSTANT, NULL,
+                             size, datatype, values, NULL);
+   if (pos >= 0 && swizzleOut) {
+      if (size == 1)
+         *swizzleOut = SWIZZLE_XXXX;
+      else
+         *swizzleOut = SWIZZLE_NOOP;
+   }
+   return pos;
+}
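+
+/* Illustrative sketch (an editorial addition, not upstream Mesa code):
+ * adding a scalar constant and using the returned swizzle.  If 0.5 already
+ * occupies, say, the .z slot of an existing unnamed constant, 'pos' refers
+ * to that parameter and 'swz' comes back as MAKE_SWIZZLE4(2,2,2,2).
+ *
+ *    gl_constant_value half[4];
+ *    half[0].f = 0.5f;
+ *    GLuint swz;
+ *    GLint pos = _mesa_add_typed_unnamed_constant(paramList, half, 1,
+ *                                                 GL_FLOAT, &swz);
+ */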
+
+/**
+ * Add a new unnamed constant to the parameter list.  This will be used
+ * when a fragment/vertex program contains something like this:
+ *    MOV r, { 0, 1, 2, 3 };
+ * If swizzleOut is non-null we'll search the parameter list for an
+ * existing instance of the constant which matches with a swizzle.
+ *
+ * \param paramList  the parameter list
+ * \param values  four float values
+ * \param swizzleOut  returns swizzle mask for accessing the constant
+ * \return index/position of the new parameter in the parameter list.
+ * \sa _mesa_add_typed_unnamed_constant
+ */
+GLint
+_mesa_add_unnamed_constant(struct gl_program_parameter_list *paramList,
+                           const gl_constant_value values[4], GLuint size,
+                           GLuint *swizzleOut)
+{
+   return _mesa_add_typed_unnamed_constant(paramList, values, size, GL_NONE,
+                                           swizzleOut);
+}
+
+#if 0 /* not used yet */
+/**
+ * Returns the number of 4-component registers needed to store a piece
+ * of GL state.  For matrices this may be as many as 4 registers;
+ * everything else needs just 1 register.
+ */
+static GLuint
+sizeof_state_reference(const GLint *stateTokens)
+{
+   if (stateTokens[0] == STATE_MATRIX) {
+      GLuint rows = stateTokens[4] - stateTokens[3] + 1;
+      assert(rows >= 1);
+      assert(rows <= 4);
+      return rows;
+   }
+   else {
+      return 1;
+   }
+}
+#endif
+
+
+/**
+ * Add a new state reference to the parameter list.
+ * This will be used when the program contains something like this:
+ *    PARAM ambient = state.material.front.ambient;
+ *
+ * \param paramList  the parameter list
+ * \param stateTokens  an array of 5 (STATE_LENGTH) state tokens
+ * \return index of the new parameter.
+ */
+GLint
+_mesa_add_state_reference(struct gl_program_parameter_list *paramList,
+                          const gl_state_index stateTokens[STATE_LENGTH])
+{
+   const GLuint size = 4; /* XXX fix */
+   char *name;
+   GLint index;
+
+   /* Check if the state reference is already in the list */
+   for (index = 0; index < (GLint) paramList->NumParameters; index++) {
+      if (!memcmp(paramList->Parameters[index].StateIndexes,
+		  stateTokens, STATE_LENGTH * sizeof(gl_state_index))) {
+	 return index;
+      }
+   }
+
+   name = _mesa_program_state_string(stateTokens);
+   index = _mesa_add_parameter(paramList, PROGRAM_STATE_VAR, name,
+                               size, GL_NONE,
+                               NULL, (gl_state_index *) stateTokens);
+   paramList->StateFlags |= _mesa_program_state_flags(stateTokens);
+
+   /* free name string here since we duplicated it in add_parameter() */
+   free(name);
+
+   return index;
+}
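+
+/* Illustrative sketch (an editorial addition; the exact token layout is an
+ * assumption): referencing the front material ambient color, as produced by
+ *    PARAM ambient = state.material.front.ambient;
+ *
+ *    gl_state_index tokens[STATE_LENGTH] =
+ *       { STATE_MATERIAL, 0, STATE_AMBIENT, 0, 0 };
+ *    GLint idx = _mesa_add_state_reference(paramList, tokens);
+ */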
+
+
+/**
+ * Lookup a parameter value by name in the given parameter list.
+ * \return pointer to the float[4] values.
+ */
+gl_constant_value *
+_mesa_lookup_parameter_value(const struct gl_program_parameter_list *paramList,
+                             GLsizei nameLen, const char *name)
+{
+   GLint i = _mesa_lookup_parameter_index(paramList, nameLen, name);
+   if (i < 0)
+      return NULL;
+   else
+      return paramList->ParameterValues[i];
+}
+
+
+/**
+ * Given a program parameter name, find its position in the list of parameters.
+ * \param paramList  the parameter list to search
+ * \param nameLen  length of name (in chars).
+ *                 If length is negative, assume that name is null-terminated.
+ * \param name  the name to search for
+ * \return index of parameter in the list.
+ */
+GLint
+_mesa_lookup_parameter_index(const struct gl_program_parameter_list *paramList,
+                             GLsizei nameLen, const char *name)
+{
+   GLint i;
+
+   if (!paramList)
+      return -1;
+
+   if (nameLen == -1) {
+      /* name is null-terminated */
+      for (i = 0; i < (GLint) paramList->NumParameters; i++) {
+         if (paramList->Parameters[i].Name &&
+	     strcmp(paramList->Parameters[i].Name, name) == 0)
+            return i;
+      }
+   }
+   else {
+      /* name is not null-terminated, use nameLen */
+      for (i = 0; i < (GLint) paramList->NumParameters; i++) {
+         if (paramList->Parameters[i].Name &&
+	     strncmp(paramList->Parameters[i].Name, name, nameLen) == 0
+             && strlen(paramList->Parameters[i].Name) == (size_t)nameLen)
+            return i;
+      }
+   }
+   return -1;
+}
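+
+/* Illustrative sketch (an editorial addition, not upstream Mesa code):
+ * both lookup styles.
+ *
+ *    GLint a = _mesa_lookup_parameter_index(list, -1, "myColor");
+ *    // Non-terminated variant, e.g. when scanning a larger token buffer;
+ *    // it matches only if the parameter name's length is exactly nameLen:
+ *    GLint b = _mesa_lookup_parameter_index(list, 7, "myColor.x");
+ *
+ * Both return the same index, or -1 if no such parameter exists.
+ */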
+
+
+/**
+ * Look for a float vector in the given parameter list.  The float vector
+ * may be of length 1, 2, 3 or 4.  If swizzleOut is non-null, we'll try
+ * swizzling to find a match.
+ * \param list  the parameter list to search
+ * \param v  the float vector to search for
+ * \param vSize  number of element in v
+ * \param posOut  returns the position of the constant, if found
+ * \param swizzleOut  returns a swizzle mask describing location of the
+ *                    vector elements if found.
+ * \return GL_TRUE if found, GL_FALSE if not found
+ */
+GLboolean
+_mesa_lookup_parameter_constant(const struct gl_program_parameter_list *list,
+                                const gl_constant_value v[], GLuint vSize,
+                                GLint *posOut, GLuint *swizzleOut)
+{
+   GLuint i;
+
+   assert(vSize >= 1);
+   assert(vSize <= 4);
+
+   if (!list) {
+      *posOut = -1;
+      return GL_FALSE;
+   }
+
+   for (i = 0; i < list->NumParameters; i++) {
+      if (list->Parameters[i].Type == PROGRAM_CONSTANT) {
+         if (!swizzleOut) {
+            /* swizzle not allowed */
+            GLuint j, match = 0;
+            for (j = 0; j < vSize; j++) {
+               if (v[j].u == list->ParameterValues[i][j].u)
+                  match++;
+            }
+            if (match == vSize) {
+               *posOut = i;
+               return GL_TRUE;
+            }
+         }
+         else {
+            /* try matching w/ swizzle */
+             if (vSize == 1) {
+                /* look for v[0] anywhere within float[4] value */
+                GLuint j;
+                for (j = 0; j < list->Parameters[i].Size; j++) {
+                   if (list->ParameterValues[i][j].u == v[0].u) {
+                      /* found it */
+                      *posOut = i;
+                      *swizzleOut = MAKE_SWIZZLE4(j, j, j, j);
+                      return GL_TRUE;
+                   }
+                }
+             }
+             else if (vSize <= list->Parameters[i].Size) {
+                /* see if we can match this constant (with a swizzle) */
+                GLuint swz[4];
+                GLuint match = 0, j, k;
+                for (j = 0; j < vSize; j++) {
+                   if (v[j].u == list->ParameterValues[i][j].u) {
+                      swz[j] = j;
+                      match++;
+                   }
+                   else {
+                      for (k = 0; k < list->Parameters[i].Size; k++) {
+                         if (v[j].u == list->ParameterValues[i][k].u) {
+                            swz[j] = k;
+                            match++;
+                            break;
+                         }
+                      }
+                   }
+                }
+                /* smear last value to remaining positions */
+                for (; j < 4 && j > 0; j++)
+                   swz[j] = swz[j-1];
+
+                if (match == vSize) {
+                   *posOut = i;
+                   *swizzleOut = MAKE_SWIZZLE4(swz[0], swz[1], swz[2], swz[3]);
+                   return GL_TRUE;
+                }
+             }
+         }
+      }
+   }
+
+   *posOut = -1;
+   return GL_FALSE;
+}
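+
+/* Illustrative sketch (an editorial addition, not upstream Mesa code):
+ * searching for the vector (0, 1) with swizzling allowed.  If the constant
+ * (1, 0, 0, 0) is already in the list, the lookup succeeds with a .yxxx
+ * swizzle (the last matched component is smeared over the tail).
+ *
+ *    gl_constant_value v[2];
+ *    v[0].f = 0.0f;  v[1].f = 1.0f;
+ *    GLint pos;
+ *    GLuint swz;
+ *    if (_mesa_lookup_parameter_constant(list, v, 2, &pos, &swz)) {
+ *       // pos and swz together locate the match
+ *    }
+ */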
+
+
+struct gl_program_parameter_list *
+_mesa_clone_parameter_list(const struct gl_program_parameter_list *list)
+{
+   struct gl_program_parameter_list *clone;
+   GLuint i;
+
+   clone = _mesa_new_parameter_list();
+   if (!clone)
+      return NULL;
+
+   /* Not too efficient, but correct */
+   for (i = 0; i < list->NumParameters; i++) {
+      struct gl_program_parameter *p = list->Parameters + i;
+      struct gl_program_parameter *pCopy;
+      GLuint size = MIN2(p->Size, 4);
+      GLint j = _mesa_add_parameter(clone, p->Type, p->Name, size, p->DataType,
+                                    list->ParameterValues[i], NULL);
+      ASSERT(j >= 0);
+      pCopy = clone->Parameters + j;
+      /* copy state indexes */
+      if (p->Type == PROGRAM_STATE_VAR) {
+         GLint k;
+         for (k = 0; k < STATE_LENGTH; k++) {
+            pCopy->StateIndexes[k] = p->StateIndexes[k];
+         }
+      }
+      else {
+         clone->Parameters[j].Size = p->Size;
+      }
+
+   }
+
+   clone->StateFlags = list->StateFlags;
+
+   return clone;
+}
+
+
+/**
+ * Return a new parameter list which is listA + listB.
+ */
+struct gl_program_parameter_list *
+_mesa_combine_parameter_lists(const struct gl_program_parameter_list *listA,
+                              const struct gl_program_parameter_list *listB)
+{
+   struct gl_program_parameter_list *list;
+
+   if (listA) {
+      list = _mesa_clone_parameter_list(listA);
+      if (list && listB) {
+         GLuint i;
+         for (i = 0; i < listB->NumParameters; i++) {
+            struct gl_program_parameter *param = listB->Parameters + i;
+            _mesa_add_parameter(list, param->Type, param->Name, param->Size,
+                                param->DataType,
+                                listB->ParameterValues[i],
+                                param->StateIndexes);
+         }
+      }
+   }
+   else if (listB) {
+      list = _mesa_clone_parameter_list(listB);
+   }
+   else {
+      list = NULL;
+   }
+   return list;
+}
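+
+/* Illustrative sketch (an editorial addition; the program names are assumed):
+ * combining the parameter lists of two programs.  Entries from listB are
+ * appended after listA's, so their indices are shifted by listA's count.
+ *
+ *    struct gl_program_parameter_list *merged =
+ *       _mesa_combine_parameter_lists(vertProg->Parameters,
+ *                                     fragProg->Parameters);
+ *    if (merged) {
+ *       // ... use merged, then release it:
+ *       _mesa_free_parameter_list(merged);
+ *    }
+ */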
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_parameter.h b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_parameter.h
new file mode 100644
index 0000000..6b3b3c2
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_parameter.h
@@ -0,0 +1,158 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file prog_parameter.h
+ * Program parameter lists and functions.
+ * \author Brian Paul
+ */
+
+#ifndef PROG_PARAMETER_H
+#define PROG_PARAMETER_H
+
+#include "main/mtypes.h"
+#include "prog_statevars.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/**
+ * Actual data for constant values of parameters.
+ */
+typedef union gl_constant_value
+{
+   GLfloat f;
+   GLint b;
+   GLint i;
+   GLuint u;
+} gl_constant_value;
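+
+/* Editorial note (not upstream text): the union permits bitwise comparison
+ * of parameter slots regardless of their type, e.g.
+ *
+ *    gl_constant_value c;
+ *    c.f = 1.0f;
+ *    // c.u now holds the IEEE-754 bit pattern 0x3f800000
+ */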
+
+
+/**
+ * Program parameter.
+ * Used by shaders/programs for uniforms, constants, varying vars, etc.
+ */
+struct gl_program_parameter
+{
+   const char *Name;        /**< Null-terminated string */
+   gl_register_file Type;   /**< PROGRAM_CONSTANT or PROGRAM_STATE_VAR */
+   GLenum DataType;         /**< GL_FLOAT, GL_FLOAT_VEC2, etc */
+   /**
+    * Number of components (1..4), or more.
+    * If the number of components is greater than 4,
+    * this parameter is part of a larger uniform like a GLSL matrix or array.
+    * The next program parameter's Size will be Size-4 of this parameter.
+    */
+   GLuint Size;
+   GLboolean Initialized;   /**< debug: Has the ParameterValue[] been set? */
+   /**
+    * A sequence of STATE_* tokens and integers to identify GL state.
+    */
+   gl_state_index StateIndexes[STATE_LENGTH];
+};
+
+
+/**
+ * List of gl_program_parameter instances.
+ */
+struct gl_program_parameter_list
+{
+   GLuint Size;           /**< allocated size of Parameters, ParameterValues */
+   GLuint NumParameters;  /**< number of parameters in arrays */
+   struct gl_program_parameter *Parameters; /**< Array [Size] */
+   gl_constant_value (*ParameterValues)[4]; /**< Array [Size] of constant[4] */
+   GLbitfield StateFlags; /**< _NEW_* flags indicating which state changes
+                               might invalidate ParameterValues[] */
+};
+
+
+extern struct gl_program_parameter_list *
+_mesa_new_parameter_list(void);
+
+extern struct gl_program_parameter_list *
+_mesa_new_parameter_list_sized(unsigned size);
+
+extern void
+_mesa_free_parameter_list(struct gl_program_parameter_list *paramList);
+
+extern struct gl_program_parameter_list *
+_mesa_clone_parameter_list(const struct gl_program_parameter_list *list);
+
+extern struct gl_program_parameter_list *
+_mesa_combine_parameter_lists(const struct gl_program_parameter_list *a,
+                              const struct gl_program_parameter_list *b);
+
+static inline GLuint
+_mesa_num_parameters(const struct gl_program_parameter_list *list)
+{
+   return list ? list->NumParameters : 0;
+}
+
+extern GLint
+_mesa_add_parameter(struct gl_program_parameter_list *paramList,
+                    gl_register_file type, const char *name,
+                    GLuint size, GLenum datatype,
+                    const gl_constant_value *values,
+                    const gl_state_index state[STATE_LENGTH]);
+
+extern GLint
+_mesa_add_named_constant(struct gl_program_parameter_list *paramList,
+                         const char *name, const gl_constant_value values[4],
+                         GLuint size);
+
+extern GLint
+_mesa_add_typed_unnamed_constant(struct gl_program_parameter_list *paramList,
+                           const gl_constant_value values[4], GLuint size,
+                           GLenum datatype, GLuint *swizzleOut);
+
+extern GLint
+_mesa_add_unnamed_constant(struct gl_program_parameter_list *paramList,
+                           const gl_constant_value values[4], GLuint size,
+                           GLuint *swizzleOut);
+
+extern GLint
+_mesa_add_state_reference(struct gl_program_parameter_list *paramList,
+                          const gl_state_index stateTokens[STATE_LENGTH]);
+
+extern gl_constant_value *
+_mesa_lookup_parameter_value(const struct gl_program_parameter_list *paramList,
+                             GLsizei nameLen, const char *name);
+
+extern GLint
+_mesa_lookup_parameter_index(const struct gl_program_parameter_list *paramList,
+                             GLsizei nameLen, const char *name);
+
+extern GLboolean
+_mesa_lookup_parameter_constant(const struct gl_program_parameter_list *list,
+                                const gl_constant_value v[], GLuint vSize,
+                                GLint *posOut, GLuint *swizzleOut);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* PROG_PARAMETER_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_parameter_layout.c b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_parameter_layout.c
new file mode 100644
index 0000000..e834690
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_parameter_layout.c
@@ -0,0 +1,216 @@
+/*
+ * Copyright © 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file prog_parameter_layout.c
+ * \brief Helper functions to lay out storage for program parameters
+ *
+ * \author Ian Romanick <ian.d.romanick@intel.com>
+ */
+
+#include "main/compiler.h"
+#include "main/mtypes.h"
+#include "prog_parameter.h"
+#include "prog_parameter_layout.h"
+#include "prog_instruction.h"
+#include "program_parser.h"
+
+unsigned
+_mesa_combine_swizzles(unsigned base, unsigned applied)
+{
+   unsigned swiz = 0;
+   unsigned i;
+
+   for (i = 0; i < 4; i++) {
+      const unsigned s = GET_SWZ(applied, i);
+
+      swiz |= ((s <= SWIZZLE_W) ? GET_SWZ(base, s) : s) << (i * 3);
+   }
+
+   return swiz;
+}
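+
+/* Illustrative sketch (an editorial addition, not upstream Mesa code):
+ * component i of the result selects base[applied[i]].  Applying .yzwx on
+ * top of a base .wzyx therefore yields .zyxw:
+ *
+ *    unsigned base     = MAKE_SWIZZLE4(SWIZZLE_W, SWIZZLE_Z,
+ *                                      SWIZZLE_Y, SWIZZLE_X);
+ *    unsigned applied  = MAKE_SWIZZLE4(SWIZZLE_Y, SWIZZLE_Z,
+ *                                      SWIZZLE_W, SWIZZLE_X);
+ *    unsigned combined = _mesa_combine_swizzles(base, applied);
+ *    // combined == MAKE_SWIZZLE4(SWIZZLE_Z, SWIZZLE_Y,
+ *    //                           SWIZZLE_X, SWIZZLE_W)
+ */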
+
+
+/**
+ * Copy indirect access array from one parameter list to another
+ *
+ * \param src   Parameter array copied from
+ * \param dst   Parameter array copied to
+ * \param first Index of first element in \c src to copy
+ * \param count Number of elements to copy
+ *
+ * \return
+ * The location in \c dst of the first element copied from \c src on
+ * success.  -1 on failure.
+ *
+ * \warning
+ * This function assumes that there is already enough space available in
+ * \c dst to hold all of the elements that will be copied over.
+ */
+static int
+copy_indirect_accessed_array(struct gl_program_parameter_list *src,
+			     struct gl_program_parameter_list *dst,
+			     unsigned first, unsigned count)
+{
+   const int base = dst->NumParameters;
+   unsigned i, j;
+
+   for (i = first; i < (first + count); i++) {
+      struct gl_program_parameter *curr = & src->Parameters[i];
+
+      if (curr->Type == PROGRAM_CONSTANT) {
+	 j = dst->NumParameters;
+      } else {
+	 for (j = 0; j < dst->NumParameters; j++) {
+	    if (memcmp(dst->Parameters[j].StateIndexes, curr->StateIndexes, 
+		       sizeof(curr->StateIndexes)) == 0) {
+	       return -1;
+	    }
+	 }
+      }
+
+      assert(j == dst->NumParameters);
+
+      /* copy src parameter [i] to dest parameter [j] */
+      memcpy(& dst->Parameters[j], curr,
+	     sizeof(dst->Parameters[j]));
+      memcpy(dst->ParameterValues[j], src->ParameterValues[i],
+	     sizeof(GLfloat) * 4);
+
+      /* Pointer to the string name was copied.  Null-out src param name
+       * to prevent double free later.
+       */
+      curr->Name = NULL;
+
+      dst->NumParameters++;
+   }
+
+   return base;
+}
+
+
+/**
+ * XXX description???
+ * \return GL_TRUE for success, GL_FALSE for failure
+ */
+GLboolean
+_mesa_layout_parameters(struct asm_parser_state *state)
+{
+   struct gl_program_parameter_list *layout;
+   struct asm_instruction *inst;
+   unsigned i;
+
+   layout =
+      _mesa_new_parameter_list_sized(state->prog->Parameters->NumParameters);
+
+   /* PASS 1:  Move any parameters that are accessed indirectly from the
+    * original parameter list to the new parameter list.
+    */
+   for (inst = state->inst_head; inst != NULL; inst = inst->next) {
+      for (i = 0; i < 3; i++) {
+	 if (inst->SrcReg[i].Base.RelAddr) {
+	    /* Only attempt to add it to the new parameter list once.
+	     */
+	    if (!inst->SrcReg[i].Symbol->pass1_done) {
+	       const int new_begin =
+		  copy_indirect_accessed_array(state->prog->Parameters, layout,
+		      inst->SrcReg[i].Symbol->param_binding_begin,
+		      inst->SrcReg[i].Symbol->param_binding_length);
+
+	       if (new_begin < 0) {
+		  _mesa_free_parameter_list(layout);
+		  return GL_FALSE;
+	       }
+
+	       inst->SrcReg[i].Symbol->param_binding_begin = new_begin;
+	       inst->SrcReg[i].Symbol->pass1_done = 1;
+	    }
+
+	    /* Previously the Index was just the offset from the parameter
+	     * array.  Now that the base of the parameter array is known, the
+	     * index can be updated to its actual value.
+	     */
+	    inst->Base.SrcReg[i] = inst->SrcReg[i].Base;
+	    inst->Base.SrcReg[i].Index +=
+	       inst->SrcReg[i].Symbol->param_binding_begin;
+	 }
+      }
+   }
+
+   /* PASS 2:  Move any parameters that are not accessed indirectly from the
+    * original parameter list to the new parameter list.
+    */
+   for (inst = state->inst_head; inst != NULL; inst = inst->next) {
+      for (i = 0; i < 3; i++) {
+	 const struct gl_program_parameter *p;
+	 const int idx = inst->SrcReg[i].Base.Index;
+	 unsigned swizzle = SWIZZLE_NOOP;
+
+	 /* All relative addressed operands were processed on the first
+	  * pass.  Just skip them here.
+	  */
+	 if (inst->SrcReg[i].Base.RelAddr) {
+	    continue;
+	 }
+
+	 if ((inst->SrcReg[i].Base.File <= PROGRAM_OUTPUT)
+	     || (inst->SrcReg[i].Base.File >= PROGRAM_WRITE_ONLY)) {
+	    continue;
+	 }
+
+	 inst->Base.SrcReg[i] = inst->SrcReg[i].Base;
+	 p = & state->prog->Parameters->Parameters[idx];
+
+	 switch (p->Type) {
+	 case PROGRAM_CONSTANT: {
+	    const gl_constant_value *const v =
+	       state->prog->Parameters->ParameterValues[idx];
+
+	    inst->Base.SrcReg[i].Index =
+	       _mesa_add_unnamed_constant(layout, v, p->Size, & swizzle);
+
+	    inst->Base.SrcReg[i].Swizzle = 
+	       _mesa_combine_swizzles(swizzle, inst->Base.SrcReg[i].Swizzle);
+	    break;
+	 }
+
+	 case PROGRAM_STATE_VAR:
+	    inst->Base.SrcReg[i].Index =
+	       _mesa_add_state_reference(layout, p->StateIndexes);
+	    break;
+
+	 default:
+	    break;
+	 }
+
+	 inst->SrcReg[i].Base.File = p->Type;
+	 inst->Base.SrcReg[i].File = p->Type;
+      }
+   }
+
+   layout->StateFlags = state->prog->Parameters->StateFlags;
+   _mesa_free_parameter_list(state->prog->Parameters);
+   state->prog->Parameters = layout;
+
+   return GL_TRUE;
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_parameter_layout.h b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_parameter_layout.h
new file mode 100644
index 0000000..99a7b6c
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_parameter_layout.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright © 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file prog_parameter_layout.h
+ * \brief Helper functions to lay out storage for program parameters
+ *
+ * \author Ian Romanick <ian.d.romanick@intel.com>
+ */
+
+#pragma once
+
+#ifndef PROG_PARAMETER_LAYOUT_H
+#define PROG_PARAMETER_LAYOUT_H
+
+extern unsigned _mesa_combine_swizzles(unsigned base, unsigned applied);
+
+struct asm_parser_state;
+
+extern GLboolean _mesa_layout_parameters(struct asm_parser_state *state);
+
+#endif /* PROG_PARAMETER_LAYOUT_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_print.c b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_print.c
new file mode 100644
index 0000000..4a5c1c1
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_print.c
@@ -0,0 +1,1092 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ * Copyright (C) 2009  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file prog_print.c
+ * Print vertex/fragment programs - for debugging.
+ * \author Brian Paul
+ */
+
+#include <inttypes.h>  /* for PRIx64 macro */
+
+#include "main/glheader.h"
+#include "main/context.h"
+#include "main/imports.h"
+#include "prog_instruction.h"
+#include "prog_parameter.h"
+#include "prog_print.h"
+#include "prog_statevars.h"
+
+
+
+/**
+ * Return string name for given program/register file.
+ */
+const char *
+_mesa_register_file_name(gl_register_file f)
+{
+   switch (f) {
+   case PROGRAM_TEMPORARY:
+      return "TEMP";
+   case PROGRAM_STATE_VAR:
+      return "STATE";
+   case PROGRAM_INPUT:
+      return "INPUT";
+   case PROGRAM_OUTPUT:
+      return "OUTPUT";
+   case PROGRAM_CONSTANT:
+      return "CONST";
+   case PROGRAM_UNIFORM:
+      return "UNIFORM";
+   case PROGRAM_ADDRESS:
+      return "ADDR";
+   case PROGRAM_SAMPLER:
+      return "SAMPLER";
+   case PROGRAM_SYSTEM_VALUE:
+      return "SYSVAL";
+   case PROGRAM_UNDEFINED:
+      return "UNDEFINED";
+   default:
+      {
+         static char s[20];
+         _mesa_snprintf(s, sizeof(s), "FILE%u", f);
+         return s;
+      }
+   }
+}
+
+
+/**
+ * Return ARB_v/f_prog-style input attrib string.
+ */
+static const char *
+arb_input_attrib_string(GLint index, GLenum progType)
+{
+   /*
+    * These strings should match the VERT_ATTRIB_x and VARYING_SLOT_x tokens.
+    */
+   static const char *const vertAttribs[] = {
+      "vertex.position",
+      "vertex.weight",
+      "vertex.normal",
+      "vertex.color.primary",
+      "vertex.color.secondary",
+      "vertex.fogcoord",
+      "vertex.(six)", /* VERT_ATTRIB_COLOR_INDEX */
+      "vertex.(seven)", /* VERT_ATTRIB_EDGEFLAG */
+      "vertex.texcoord[0]",
+      "vertex.texcoord[1]",
+      "vertex.texcoord[2]",
+      "vertex.texcoord[3]",
+      "vertex.texcoord[4]",
+      "vertex.texcoord[5]",
+      "vertex.texcoord[6]",
+      "vertex.texcoord[7]",
+      "vertex.(sixteen)", /* VERT_ATTRIB_POINT_SIZE */
+      "vertex.attrib[0]",
+      "vertex.attrib[1]",
+      "vertex.attrib[2]",
+      "vertex.attrib[3]",
+      "vertex.attrib[4]",
+      "vertex.attrib[5]",
+      "vertex.attrib[6]",
+      "vertex.attrib[7]",
+      "vertex.attrib[8]",
+      "vertex.attrib[9]",
+      "vertex.attrib[10]",
+      "vertex.attrib[11]",
+      "vertex.attrib[12]",
+      "vertex.attrib[13]",
+      "vertex.attrib[14]",
+      "vertex.attrib[15]" /* MAX_VARYING = 16 */
+   };
+   static const char *const fragAttribs[] = {
+      "fragment.position",
+      "fragment.color.primary",
+      "fragment.color.secondary",
+      "fragment.fogcoord",
+      "fragment.texcoord[0]",
+      "fragment.texcoord[1]",
+      "fragment.texcoord[2]",
+      "fragment.texcoord[3]",
+      "fragment.texcoord[4]",
+      "fragment.texcoord[5]",
+      "fragment.texcoord[6]",
+      "fragment.texcoord[7]",
+      "fragment.(twelve)", /* VARYING_SLOT_PSIZ */
+      "fragment.(thirteen)", /* VARYING_SLOT_BFC0 */
+      "fragment.(fourteen)", /* VARYING_SLOT_BFC1 */
+      "fragment.(fifteen)", /* VARYING_SLOT_EDGE */
+      "fragment.(sixteen)", /* VARYING_SLOT_CLIP_VERTEX */
+      "fragment.(seventeen)", /* VARYING_SLOT_CLIP_DIST0 */
+      "fragment.(eighteen)", /* VARYING_SLOT_CLIP_DIST1 */
+      "fragment.(nineteen)", /* VARYING_SLOT_PRIMITIVE_ID */
+      "fragment.(twenty)", /* VARYING_SLOT_LAYER */
+      "fragment.(twenty-one)", /* VARYING_SLOT_VIEWPORT */
+      "fragment.(twenty-two)", /* VARYING_SLOT_FACE */
+      "fragment.(twenty-three)", /* VARYING_SLOT_PNTC */
+      "fragment.varying[0]",
+      "fragment.varying[1]",
+      "fragment.varying[2]",
+      "fragment.varying[3]",
+      "fragment.varying[4]",
+      "fragment.varying[5]",
+      "fragment.varying[6]",
+      "fragment.varying[7]",
+      "fragment.varying[8]",
+      "fragment.varying[9]",
+      "fragment.varying[10]",
+      "fragment.varying[11]",
+      "fragment.varying[12]",
+      "fragment.varying[13]",
+      "fragment.varying[14]",
+      "fragment.varying[15]",
+      "fragment.varying[16]",
+      "fragment.varying[17]",
+      "fragment.varying[18]",
+      "fragment.varying[19]",
+      "fragment.varying[20]",
+      "fragment.varying[21]",
+      "fragment.varying[22]",
+      "fragment.varying[23]",
+      "fragment.varying[24]",
+      "fragment.varying[25]",
+      "fragment.varying[26]",
+      "fragment.varying[27]",
+      "fragment.varying[28]",
+      "fragment.varying[29]",
+      "fragment.varying[30]",
+      "fragment.varying[31]", /* MAX_VARYING = 32 */
+   };
+
+   /* sanity checks */
+   STATIC_ASSERT(Elements(vertAttribs) == VERT_ATTRIB_MAX);
+   STATIC_ASSERT(Elements(fragAttribs) == VARYING_SLOT_MAX);
+   assert(strcmp(vertAttribs[VERT_ATTRIB_TEX0], "vertex.texcoord[0]") == 0);
+   assert(strcmp(vertAttribs[VERT_ATTRIB_GENERIC15], "vertex.attrib[15]") == 0);
+   assert(strcmp(fragAttribs[VARYING_SLOT_TEX0], "fragment.texcoord[0]") == 0);
+   assert(strcmp(fragAttribs[VARYING_SLOT_VAR0+15], "fragment.varying[15]") == 0);
+
+   if (progType == GL_VERTEX_PROGRAM_ARB) {
+      assert(index < Elements(vertAttribs));
+      return vertAttribs[index];
+   }
+   else {
+      assert(progType == GL_FRAGMENT_PROGRAM_ARB);
+      assert(index < Elements(fragAttribs));
+      return fragAttribs[index];
+   }
+}
+
+
+/**
+ * Print a vertex program's InputsRead field in human-readable format.
+ * For debugging.
+ */
+void
+_mesa_print_vp_inputs(GLbitfield inputs)
+{
+   printf("VP Inputs 0x%x: \n", inputs);
+   while (inputs) {
+      GLint attr = ffs(inputs) - 1;
+      const char *name = arb_input_attrib_string(attr,
+                                                 GL_VERTEX_PROGRAM_ARB);
+      printf("  %d: %s\n", attr, name);
+      inputs &= ~(1 << attr);
+   }
+}
+
+
+/**
+ * Print a fragment program's InputsRead field in human-readable format.
+ * For debugging.
+ */
+void
+_mesa_print_fp_inputs(GLbitfield inputs)
+{
+   printf("FP Inputs 0x%x: \n", inputs);
+   while (inputs) {
+      GLint attr = ffs(inputs) - 1;
+      const char *name = arb_input_attrib_string(attr,
+                                                 GL_FRAGMENT_PROGRAM_ARB);
+      printf("  %d: %s\n", attr, name);
+      inputs &= ~(1 << attr);
+   }
+}
+
+
+
+/**
+ * Return ARB_v/f_prog-style output attrib string.
+ */
+static const char *
+arb_output_attrib_string(GLint index, GLenum progType)
+{
+   /*
+    * These strings should match the VARYING_SLOT_x and FRAG_RESULT_x tokens.
+    */
+   static const char *const vertResults[] = {
+      "result.position",
+      "result.color.primary",
+      "result.color.secondary",
+      "result.fogcoord",
+      "result.texcoord[0]",
+      "result.texcoord[1]",
+      "result.texcoord[2]",
+      "result.texcoord[3]",
+      "result.texcoord[4]",
+      "result.texcoord[5]",
+      "result.texcoord[6]",
+      "result.texcoord[7]",
+      "result.pointsize", /* VARYING_SLOT_PSIZ */
+      "result.(thirteen)", /* VARYING_SLOT_BFC0 */
+      "result.(fourteen)", /* VARYING_SLOT_BFC1 */
+      "result.(fifteen)", /* VARYING_SLOT_EDGE */
+      "result.(sixteen)", /* VARYING_SLOT_CLIP_VERTEX */
+      "result.(seventeen)", /* VARYING_SLOT_CLIP_DIST0 */
+      "result.(eighteen)", /* VARYING_SLOT_CLIP_DIST1 */
+      "result.(nineteen)", /* VARYING_SLOT_PRIMITIVE_ID */
+      "result.(twenty)", /* VARYING_SLOT_LAYER */
+      "result.(twenty-one)", /* VARYING_SLOT_VIEWPORT */
+      "result.(twenty-two)", /* VARYING_SLOT_FACE */
+      "result.(twenty-three)", /* VARYING_SLOT_PNTC */
+      "result.varying[0]",
+      "result.varying[1]",
+      "result.varying[2]",
+      "result.varying[3]",
+      "result.varying[4]",
+      "result.varying[5]",
+      "result.varying[6]",
+      "result.varying[7]",
+      "result.varying[8]",
+      "result.varying[9]",
+      "result.varying[10]",
+      "result.varying[11]",
+      "result.varying[12]",
+      "result.varying[13]",
+      "result.varying[14]",
+      "result.varying[15]",
+      "result.varying[16]",
+      "result.varying[17]",
+      "result.varying[18]",
+      "result.varying[19]",
+      "result.varying[20]",
+      "result.varying[21]",
+      "result.varying[22]",
+      "result.varying[23]",
+      "result.varying[24]",
+      "result.varying[25]",
+      "result.varying[26]",
+      "result.varying[27]",
+      "result.varying[28]",
+      "result.varying[29]",
+      "result.varying[30]",
+      "result.varying[31]", /* MAX_VARYING = 32 */
+   };
+   static const char *const fragResults[] = {
+      "result.depth", /* FRAG_RESULT_DEPTH */
+      "result.(one)", /* FRAG_RESULT_STENCIL */
+      "result.color", /* FRAG_RESULT_COLOR */
+      "result.samplemask", /* FRAG_RESULT_SAMPLE_MASK */
+      "result.color[0]", /* FRAG_RESULT_DATA0 (named for GLSL's gl_FragData) */
+      "result.color[1]",
+      "result.color[2]",
+      "result.color[3]",
+      "result.color[4]",
+      "result.color[5]",
+      "result.color[6]",
+      "result.color[7]" /* MAX_DRAW_BUFFERS = 8 */
+   };
+
+   /* sanity checks */
+   STATIC_ASSERT(Elements(vertResults) == VARYING_SLOT_MAX);
+   STATIC_ASSERT(Elements(fragResults) == FRAG_RESULT_MAX);
+   assert(strcmp(vertResults[VARYING_SLOT_POS], "result.position") == 0);
+   assert(strcmp(vertResults[VARYING_SLOT_VAR0], "result.varying[0]") == 0);
+   assert(strcmp(fragResults[FRAG_RESULT_DATA0], "result.color[0]") == 0);
+
+   if (progType == GL_VERTEX_PROGRAM_ARB) {
+      assert(index < Elements(vertResults));
+      return vertResults[index];
+   }
+   else {
+      assert(progType == GL_FRAGMENT_PROGRAM_ARB);
+      assert(index < Elements(fragResults));
+      return fragResults[index];
+   }
+}
+
+
+/**
+ * Return string representation of the given register.
+ * Note that some types of registers (like PROGRAM_UNIFORM) aren't defined
+ * by the ARB/NV program languages so we've taken some liberties here.
+ * \param f  the register file (PROGRAM_INPUT, PROGRAM_TEMPORARY, etc)
+ * \param index  number of the register in the register file
+ * \param mode  the output format/mode/style
+ * \param prog  pointer to containing program
+ */
+static const char *
+reg_string(gl_register_file f, GLint index, gl_prog_print_mode mode,
+           GLboolean relAddr, const struct gl_program *prog,
+           GLboolean hasIndex2, GLboolean relAddr2, GLint index2)
+{
+   static char str[100];
+   const char *addr = relAddr ? "ADDR+" : "";
+
+   str[0] = 0;
+
+   switch (mode) {
+   case PROG_PRINT_DEBUG:
+      sprintf(str, "%s[%s%d]",
+              _mesa_register_file_name(f), addr, index);
+      if (hasIndex2) {
+         int offset = strlen(str);
+         const char *addr2 = relAddr2 ? "ADDR+" : "";
+         sprintf(str+offset, "[%s%d]", addr2, index2);
+      }
+      break;
+
+   case PROG_PRINT_ARB:
+      switch (f) {
+      case PROGRAM_INPUT:
+         sprintf(str, "%s", arb_input_attrib_string(index, prog->Target));
+         break;
+      case PROGRAM_OUTPUT:
+         sprintf(str, "%s", arb_output_attrib_string(index, prog->Target));
+         break;
+      case PROGRAM_TEMPORARY:
+         sprintf(str, "temp%d", index);
+         break;
+      case PROGRAM_CONSTANT: /* extension */
+         sprintf(str, "constant[%s%d]", addr, index);
+         break;
+      case PROGRAM_UNIFORM: /* extension */
+         sprintf(str, "uniform[%s%d]", addr, index);
+         break;
+      case PROGRAM_SYSTEM_VALUE:
+         sprintf(str, "sysvalue[%s%d]", addr, index);
+         break;
+      case PROGRAM_STATE_VAR:
+         {
+            struct gl_program_parameter *param
+               = prog->Parameters->Parameters + index;
+            char *state = _mesa_program_state_string(param->StateIndexes);
+            sprintf(str, "%s", state);
+            free(state);
+         }
+         break;
+      case PROGRAM_ADDRESS:
+         sprintf(str, "A%d", index);
+         break;
+      default:
+         _mesa_problem(NULL, "bad file in reg_string()");
+      }
+      break;
+
+   default:
+      _mesa_problem(NULL, "bad mode in reg_string()");
+   }
+
+   return str;
+}
+
+
+/**
+ * Return a string representation of the given swizzle word.
+ * If extended is true, use extended (comma-separated) format.
+ * \param swizzle  the swizzle field
+ * \param negateBase  4-bit negation vector
+ * \param extended  if true, also allow 0, 1 values
+ */
+const char *
+_mesa_swizzle_string(GLuint swizzle, GLuint negateMask, GLboolean extended)
+{
+   static const char swz[] = "xyzw01!?";  /* See SWIZZLE_x definitions */
+   static char s[20];
+   GLuint i = 0;
+
+   if (!extended && swizzle == SWIZZLE_NOOP && negateMask == 0)
+      return ""; /* no swizzle/negation */
+
+   if (!extended)
+      s[i++] = '.';
+
+   if (negateMask & NEGATE_X)
+      s[i++] = '-';
+   s[i++] = swz[GET_SWZ(swizzle, 0)];
+
+   if (extended) {
+      s[i++] = ',';
+   }
+
+   if (negateMask & NEGATE_Y)
+      s[i++] = '-';
+   s[i++] = swz[GET_SWZ(swizzle, 1)];
+
+   if (extended) {
+      s[i++] = ',';
+   }
+
+   if (negateMask & NEGATE_Z)
+      s[i++] = '-';
+   s[i++] = swz[GET_SWZ(swizzle, 2)];
+
+   if (extended) {
+      s[i++] = ',';
+   }
+
+   if (negateMask & NEGATE_W)
+      s[i++] = '-';
+   s[i++] = swz[GET_SWZ(swizzle, 3)];
+
+   s[i] = 0;
+   return s;
+}
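+
+/* Illustrative examples (an editorial addition, not upstream Mesa code):
+ *
+ *    _mesa_swizzle_string(MAKE_SWIZZLE4(0, 0, 0, 0), 0, GL_FALSE)
+ *       returns ".xxxx"
+ *    _mesa_swizzle_string(SWIZZLE_NOOP, NEGATE_X, GL_FALSE)
+ *       returns ".-xyzw"
+ *
+ * The returned pointer refers to a static buffer, so copy the string before
+ * the next call if it must be kept.
+ */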
+
+
+void
+_mesa_print_swizzle(GLuint swizzle)
+{
+   if (swizzle == SWIZZLE_XYZW) {
+      printf(".xyzw\n");
+   }
+   else {
+      const char *s = _mesa_swizzle_string(swizzle, 0, 0);
+      printf("%s\n", s);
+   }
+}
+
+
+const char *
+_mesa_writemask_string(GLuint writeMask)
+{
+   static char s[10];
+   GLuint i = 0;
+
+   if (writeMask == WRITEMASK_XYZW)
+      return "";
+
+   s[i++] = '.';
+   if (writeMask & WRITEMASK_X)
+      s[i++] = 'x';
+   if (writeMask & WRITEMASK_Y)
+      s[i++] = 'y';
+   if (writeMask & WRITEMASK_Z)
+      s[i++] = 'z';
+   if (writeMask & WRITEMASK_W)
+      s[i++] = 'w';
+
+   s[i] = 0;
+   return s;
+}
+
+
+const char *
+_mesa_condcode_string(GLuint condcode)
+{
+   switch (condcode) {
+   case COND_GT:  return "GT";
+   case COND_EQ:  return "EQ";
+   case COND_LT:  return "LT";
+   case COND_UN:  return "UN";
+   case COND_GE:  return "GE";
+   case COND_LE:  return "LE";
+   case COND_NE:  return "NE";
+   case COND_TR:  return "TR";
+   case COND_FL:  return "FL";
+   default: return "cond???";
+   }
+}
+
+
+static void
+fprint_dst_reg(FILE * f,
+               const struct prog_dst_register *dstReg,
+               gl_prog_print_mode mode,
+               const struct gl_program *prog)
+{
+   fprintf(f, "%s%s",
+	   reg_string((gl_register_file) dstReg->File,
+		      dstReg->Index, mode, dstReg->RelAddr, prog,
+                      GL_FALSE, GL_FALSE, 0),
+	   _mesa_writemask_string(dstReg->WriteMask));
+   
+   if (dstReg->CondMask != COND_TR) {
+      fprintf(f, " (%s.%s)",
+	      _mesa_condcode_string(dstReg->CondMask),
+	      _mesa_swizzle_string(dstReg->CondSwizzle,
+				   GL_FALSE, GL_FALSE));
+   }
+
+#if 0
+   fprintf(f, "%s[%d]%s",
+	   _mesa_register_file_name((gl_register_file) dstReg->File),
+	   dstReg->Index,
+	   _mesa_writemask_string(dstReg->WriteMask));
+#endif
+}
+
+
+static void
+fprint_src_reg(FILE *f,
+               const struct prog_src_register *srcReg, 
+               gl_prog_print_mode mode,
+               const struct gl_program *prog)
+{
+   const char *abs = srcReg->Abs ? "|" : "";
+
+   fprintf(f, "%s%s%s%s",
+	   abs,
+	   reg_string((gl_register_file) srcReg->File,
+		      srcReg->Index, mode, srcReg->RelAddr, prog,
+                      srcReg->HasIndex2, srcReg->RelAddr2, srcReg->Index2),
+	   _mesa_swizzle_string(srcReg->Swizzle,
+				srcReg->Negate, GL_FALSE),
+	   abs);
+#if 0
+   fprintf(f, "%s[%d]%s",
+	   _mesa_register_file_name((gl_register_file) srcReg->File),
+	   srcReg->Index,
+	   _mesa_swizzle_string(srcReg->Swizzle,
+				srcReg->Negate, GL_FALSE));
+#endif
+}
+
+
+static void
+fprint_comment(FILE *f, const struct prog_instruction *inst)
+{
+   if (inst->Comment)
+      fprintf(f, ";  # %s\n", inst->Comment);
+   else
+      fprintf(f, ";\n");
+}
+
+
+void
+_mesa_fprint_alu_instruction(FILE *f,
+			     const struct prog_instruction *inst,
+			     const char *opcode_string, GLuint numRegs,
+			     gl_prog_print_mode mode,
+			     const struct gl_program *prog)
+{
+   GLuint j;
+
+   fprintf(f, "%s", opcode_string);
+   if (inst->CondUpdate)
+      fprintf(f, ".C");
+
+   /* frag prog only */
+   if (inst->SaturateMode == SATURATE_ZERO_ONE)
+      fprintf(f, "_SAT");
+
+   fprintf(f, " ");
+   if (inst->DstReg.File != PROGRAM_UNDEFINED) {
+      fprint_dst_reg(f, &inst->DstReg, mode, prog);
+   }
+   else {
+      fprintf(f, " ???");
+   }
+
+   if (numRegs > 0)
+      fprintf(f, ", ");
+
+   for (j = 0; j < numRegs; j++) {
+      fprint_src_reg(f, inst->SrcReg + j, mode, prog);
+      if (j + 1 < numRegs)
+	 fprintf(f, ", ");
+   }
+
+   fprint_comment(f, inst);
+}
+
+
+void
+_mesa_print_alu_instruction(const struct prog_instruction *inst,
+                            const char *opcode_string, GLuint numRegs)
+{
+   _mesa_fprint_alu_instruction(stderr, inst, opcode_string,
+				numRegs, PROG_PRINT_DEBUG, NULL);
+}
+
+
+/**
+ * Print a single vertex/fragment program instruction.
+ */
+GLint
+_mesa_fprint_instruction_opt(FILE *f,
+                            const struct prog_instruction *inst,
+                            GLint indent,
+                            gl_prog_print_mode mode,
+                            const struct gl_program *prog)
+{
+   GLint i;
+
+   if (inst->Opcode == OPCODE_ELSE ||
+       inst->Opcode == OPCODE_ENDIF ||
+       inst->Opcode == OPCODE_ENDLOOP ||
+       inst->Opcode == OPCODE_ENDSUB) {
+      indent -= 3;
+   }
+   for (i = 0; i < indent; i++) {
+      fprintf(f, " ");
+   }
+
+   switch (inst->Opcode) {
+   case OPCODE_SWZ:
+      fprintf(f, "SWZ");
+      if (inst->SaturateMode == SATURATE_ZERO_ONE)
+         fprintf(f, "_SAT");
+      fprintf(f, " ");
+      fprint_dst_reg(f, &inst->DstReg, mode, prog);
+      fprintf(f, ", %s[%d], %s",
+	      _mesa_register_file_name((gl_register_file) inst->SrcReg[0].File),
+	      inst->SrcReg[0].Index,
+	      _mesa_swizzle_string(inst->SrcReg[0].Swizzle,
+				   inst->SrcReg[0].Negate, GL_TRUE));
+      fprint_comment(f, inst);
+      break;
+   case OPCODE_TEX:
+   case OPCODE_TXP:
+   case OPCODE_TXL:
+   case OPCODE_TXB:
+   case OPCODE_TXD:
+      fprintf(f, "%s", _mesa_opcode_string(inst->Opcode));
+      if (inst->SaturateMode == SATURATE_ZERO_ONE)
+         fprintf(f, "_SAT");
+      fprintf(f, " ");
+      fprint_dst_reg(f, &inst->DstReg, mode, prog);
+      fprintf(f, ", ");
+      fprint_src_reg(f, &inst->SrcReg[0], mode, prog);
+      if (inst->Opcode == OPCODE_TXD) {
+         fprintf(f, ", ");
+         fprint_src_reg(f, &inst->SrcReg[1], mode, prog);
+         fprintf(f, ", ");
+         fprint_src_reg(f, &inst->SrcReg[2], mode, prog);
+      }
+      fprintf(f, ", texture[%d], ", inst->TexSrcUnit);
+      switch (inst->TexSrcTarget) {
+      case TEXTURE_1D_INDEX:   fprintf(f, "1D");    break;
+      case TEXTURE_2D_INDEX:   fprintf(f, "2D");    break;
+      case TEXTURE_3D_INDEX:   fprintf(f, "3D");    break;
+      case TEXTURE_CUBE_INDEX: fprintf(f, "CUBE");  break;
+      case TEXTURE_RECT_INDEX: fprintf(f, "RECT");  break;
+      case TEXTURE_1D_ARRAY_INDEX: fprintf(f, "1D_ARRAY"); break;
+      case TEXTURE_2D_ARRAY_INDEX: fprintf(f, "2D_ARRAY"); break;
+      default:
+         ;
+      }
+      if (inst->TexShadow)
+         fprintf(f, " SHADOW");
+      fprint_comment(f, inst);
+      break;
+
+   case OPCODE_KIL:
+      fprintf(f, "%s", _mesa_opcode_string(inst->Opcode));
+      fprintf(f, " ");
+      fprint_src_reg(f, &inst->SrcReg[0], mode, prog);
+      fprint_comment(f, inst);
+      break;
+   case OPCODE_KIL_NV:
+      fprintf(f, "%s", _mesa_opcode_string(inst->Opcode));
+      fprintf(f, " ");
+      fprintf(f, "%s.%s",
+	      _mesa_condcode_string(inst->DstReg.CondMask),
+	      _mesa_swizzle_string(inst->DstReg.CondSwizzle,
+				   GL_FALSE, GL_FALSE));
+      fprint_comment(f, inst);
+      break;
+
+   case OPCODE_ARL:
+      fprintf(f, "ARL ");
+      fprint_dst_reg(f, &inst->DstReg, mode, prog);
+      fprintf(f, ", ");
+      fprint_src_reg(f, &inst->SrcReg[0], mode, prog);
+      fprint_comment(f, inst);
+      break;
+   case OPCODE_IF:
+      if (inst->SrcReg[0].File != PROGRAM_UNDEFINED) {
+         /* Use ordinary register */
+         fprintf(f, "IF ");
+         fprint_src_reg(f, &inst->SrcReg[0], mode, prog);
+         fprintf(f, "; ");
+      }
+      else {
+         /* Use cond codes */
+         fprintf(f, "IF (%s%s);",
+		 _mesa_condcode_string(inst->DstReg.CondMask),
+		 _mesa_swizzle_string(inst->DstReg.CondSwizzle,
+				      0, GL_FALSE));
+      }
+      fprintf(f, " # (if false, goto %d)", inst->BranchTarget);
+      fprint_comment(f, inst);
+      return indent + 3;
+   case OPCODE_ELSE:
+      fprintf(f, "ELSE; # (goto %d)\n", inst->BranchTarget);
+      return indent + 3;
+   case OPCODE_ENDIF:
+      fprintf(f, "ENDIF;\n");
+      break;
+   case OPCODE_BGNLOOP:
+      fprintf(f, "BGNLOOP; # (end at %d)\n", inst->BranchTarget);
+      return indent + 3;
+   case OPCODE_ENDLOOP:
+      fprintf(f, "ENDLOOP; # (goto %d)\n", inst->BranchTarget);
+      break;
+   case OPCODE_BRK:
+   case OPCODE_CONT:
+      fprintf(f, "%s (%s%s); # (goto %d)",
+	      _mesa_opcode_string(inst->Opcode),
+	      _mesa_condcode_string(inst->DstReg.CondMask),
+	      _mesa_swizzle_string(inst->DstReg.CondSwizzle, 0, GL_FALSE),
+	      inst->BranchTarget);
+      fprint_comment(f, inst);
+      break;
+
+   case OPCODE_BGNSUB:
+      fprintf(f, "BGNSUB");
+      fprint_comment(f, inst);
+      return indent + 3;
+   case OPCODE_ENDSUB:
+      if (mode == PROG_PRINT_DEBUG) {
+         fprintf(f, "ENDSUB");
+         fprint_comment(f, inst);
+      }
+      break;
+   case OPCODE_CAL:
+      fprintf(f, "CAL %u", inst->BranchTarget);
+      fprint_comment(f, inst);
+      break;
+   case OPCODE_RET:
+      fprintf(f, "RET (%s%s)",
+	      _mesa_condcode_string(inst->DstReg.CondMask),
+	      _mesa_swizzle_string(inst->DstReg.CondSwizzle, 0, GL_FALSE));
+      fprint_comment(f, inst);
+      break;
+
+   case OPCODE_END:
+      fprintf(f, "END\n");
+      break;
+   case OPCODE_NOP:
+      if (mode == PROG_PRINT_DEBUG) {
+         fprintf(f, "NOP");
+         fprint_comment(f, inst);
+      }
+      else if (inst->Comment) {
+         /* ARB/NV program syntax has no NOP instruction */
+         fprintf(f, "# %s\n", inst->Comment);
+      }
+      break;
+   /* XXX may need other special-case instructions */
+   default:
+      if (inst->Opcode < MAX_OPCODE) {
+         /* typical alu instruction */
+         _mesa_fprint_alu_instruction(f, inst,
+				      _mesa_opcode_string(inst->Opcode),
+				      _mesa_num_inst_src_regs(inst->Opcode),
+				      mode, prog);
+      }
+      else {
+         _mesa_fprint_alu_instruction(f, inst,
+				      _mesa_opcode_string(inst->Opcode),
+				      3/*_mesa_num_inst_src_regs(inst->Opcode)*/,
+				      mode, prog);
+      }
+      break;
+   }
+   return indent;
+}
+
+
+GLint
+_mesa_print_instruction_opt(const struct prog_instruction *inst,
+                            GLint indent,
+                            gl_prog_print_mode mode,
+                            const struct gl_program *prog)
+{
+   return _mesa_fprint_instruction_opt(stderr, inst, indent, mode, prog);
+}
+
+
+void
+_mesa_print_instruction(const struct prog_instruction *inst)
+{
+   /* note: the 'prog' parameter is ignored for PROG_PRINT_DEBUG */
+   _mesa_fprint_instruction_opt(stderr, inst, 0, PROG_PRINT_DEBUG, NULL);
+}
+
+
+
+/**
+ * Print program, with options.
+ */
+void
+_mesa_fprint_program_opt(FILE *f,
+                         const struct gl_program *prog,
+                         gl_prog_print_mode mode,
+                         GLboolean lineNumbers)
+{
+   GLuint i, indent = 0;
+
+   switch (prog->Target) {
+   case GL_VERTEX_PROGRAM_ARB:
+      if (mode == PROG_PRINT_ARB)
+         fprintf(f, "!!ARBvp1.0\n");
+      else
+         fprintf(f, "# Vertex Program/Shader %u\n", prog->Id);
+      break;
+   case GL_FRAGMENT_PROGRAM_ARB:
+      if (mode == PROG_PRINT_ARB)
+         fprintf(f, "!!ARBfp1.0\n");
+      else
+         fprintf(f, "# Fragment Program/Shader %u\n", prog->Id);
+      break;
+   case MESA_GEOMETRY_PROGRAM:
+      fprintf(f, "# Geometry Shader\n");
+   }
+
+   for (i = 0; i < prog->NumInstructions; i++) {
+      if (lineNumbers)
+         fprintf(f, "%3d: ", i);
+      indent = _mesa_fprint_instruction_opt(f, prog->Instructions + i,
+                                           indent, mode, prog);
+   }
+}
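+
+/* Usage sketch (illustrative, not part of the original file): dump a
+ * program in ARB assembly syntax, without line numbers, to a file.
+ * "dump.txt" and the prog pointer are assumptions supplied by the caller.
+ */
+#if 0
+   FILE *f = fopen("dump.txt", "w");
+   if (f) {
+      _mesa_fprint_program_opt(f, prog, PROG_PRINT_ARB, GL_FALSE);
+      fclose(f);
+   }
+#endif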
+
+
+/**
+ * Print program to stderr, default options.
+ */
+void
+_mesa_print_program(const struct gl_program *prog)
+{
+   _mesa_fprint_program_opt(stderr, prog, PROG_PRINT_DEBUG, GL_TRUE);
+}
+
+
+/**
+ * Return binary representation of 64-bit value (as a string).
+ * Insert a comma to separate each group of 8 bits.
+ * Note we return a pointer to local static storage so this is not
+ * re-entrant, etc.
+ * XXX move to imports.[ch] if useful elsewhere.
+ */
+static const char *
+binary(GLbitfield64 val)
+{
+   static char buf[80];
+   GLint i, len = 0;
+   for (i = 63; i >= 0; --i) {
+      if (val & (BITFIELD64_BIT(i)))
+         buf[len++] = '1';
+      else if (len > 0 || i == 0)
+         buf[len++] = '0';
+      if (len > 0 && ((i-1) % 8) == 7)
+         buf[len++] = ',';
+   }
+   buf[len] = '\0';
+   return buf;
+}
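+
+/* Worked example (sketch): for a mask with bits 9 and 0 set, binary()
+ * suppresses the leading zero groups and separates each byte with a
+ * comma, yielding "10,00000001".
+ */
+#if 0
+   GLbitfield64 mask = BITFIELD64_BIT(9) | BITFIELD64_BIT(0);
+   fprintf(stderr, "mask = 0b%s\n", binary(mask)); /* "10,00000001" */
+#endif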
+
+
+/**
+ * Print all of a program's parameters/fields to given file.
+ */
+static void
+_mesa_fprint_program_parameters(FILE *f,
+                                struct gl_context *ctx,
+                                const struct gl_program *prog)
+{
+   GLuint i;
+
+   fprintf(f, "InputsRead: %" PRIx64 " (0b%s)\n",
+           (uint64_t) prog->InputsRead, binary(prog->InputsRead));
+   fprintf(f, "OutputsWritten: %" PRIx64 " (0b%s)\n",
+           (uint64_t) prog->OutputsWritten, binary(prog->OutputsWritten));
+   fprintf(f, "NumInstructions=%d\n", prog->NumInstructions);
+   fprintf(f, "NumTemporaries=%d\n", prog->NumTemporaries);
+   fprintf(f, "NumParameters=%d\n", prog->NumParameters);
+   fprintf(f, "NumAttributes=%d\n", prog->NumAttributes);
+   fprintf(f, "NumAddressRegs=%d\n", prog->NumAddressRegs);
+   fprintf(f, "IndirectRegisterFiles: 0x%x (0b%s)\n",
+           prog->IndirectRegisterFiles, binary(prog->IndirectRegisterFiles));
+   fprintf(f, "SamplersUsed: 0x%x (0b%s)\n",
+                 prog->SamplersUsed, binary(prog->SamplersUsed));
+   fprintf(f, "Samplers=[ ");
+   for (i = 0; i < MAX_SAMPLERS; i++) {
+      fprintf(f, "%d ", prog->SamplerUnits[i]);
+   }
+   fprintf(f, "]\n");
+
+   _mesa_load_state_parameters(ctx, prog->Parameters);
+
+#if 0
+   fprintf(f, "Local Params:\n");
+   for (i = 0; i < MAX_PROGRAM_LOCAL_PARAMS; i++){
+      const GLfloat *p = prog->LocalParams[i];
+      fprintf(f, "%2d: %f, %f, %f, %f\n", i, p[0], p[1], p[2], p[3]);
+   }
+#endif
+   _mesa_print_parameter_list(prog->Parameters);
+}
+
+
+/**
+ * Print all of a program's parameters/fields to stderr.
+ */
+void
+_mesa_print_program_parameters(struct gl_context *ctx, const struct gl_program *prog)
+{
+   _mesa_fprint_program_parameters(stderr, ctx, prog);
+}
+
+
+/**
+ * Print a program parameter list to given file.
+ */
+static void
+_mesa_fprint_parameter_list(FILE *f,
+                            const struct gl_program_parameter_list *list)
+{
+   GLuint i;
+
+   if (!list)
+      return;
+
+   if (0)
+      fprintf(f, "param list %p\n", (void *) list);
+   fprintf(f, "dirty state flags: 0x%x\n", list->StateFlags);
+   for (i = 0; i < list->NumParameters; i++){
+      struct gl_program_parameter *param = list->Parameters + i;
+      const GLfloat *v = (GLfloat *) list->ParameterValues[i];
+      fprintf(f, "param[%d] sz=%d %s %s = {%.3g, %.3g, %.3g, %.3g}",
+	      i, param->Size,
+	      _mesa_register_file_name(list->Parameters[i].Type),
+	      param->Name, v[0], v[1], v[2], v[3]);
+      fprintf(f, "\n");
+   }
+}
+
+
+/**
+ * Print a program parameter list to stderr.
+ */
+void
+_mesa_print_parameter_list(const struct gl_program_parameter_list *list)
+{
+   _mesa_fprint_parameter_list(stderr, list);
+}
+
+
+/**
+ * Write shader and associated info to a file.
+ */
+void
+_mesa_write_shader_to_file(const struct gl_shader *shader)
+{
+   const char *type = "????";
+   char filename[100];
+   FILE *f;
+
+   switch (shader->Stage) {
+   case MESA_SHADER_FRAGMENT:
+      type = "frag";
+      break;
+   case MESA_SHADER_VERTEX:
+      type = "vert";
+      break;
+   case MESA_SHADER_GEOMETRY:
+      type = "geom";
+      break;
+   case MESA_SHADER_COMPUTE:
+      type = "comp";
+      break;
+   }
+
+   _mesa_snprintf(filename, sizeof(filename), "shader_%u.%s", shader->Name, type);
+   f = fopen(filename, "w");
+   if (!f) {
+      fprintf(stderr, "Unable to open %s for writing\n", filename);
+      return;
+   }
+
+   fprintf(f, "/* Shader %u source, checksum %u */\n", shader->Name, shader->SourceChecksum);
+   fputs(shader->Source, f);
+   fprintf(f, "\n");
+
+   fprintf(f, "/* Compile status: %s */\n",
+           shader->CompileStatus ? "ok" : "fail");
+   fprintf(f, "/* Log Info: */\n");
+   if (shader->InfoLog) {
+      fputs(shader->InfoLog, f);
+   }
+   if (shader->CompileStatus && shader->Program) {
+      fprintf(f, "/* GPU code */\n");
+      fprintf(f, "/*\n");
+      _mesa_fprint_program_opt(f, shader->Program, PROG_PRINT_DEBUG, GL_TRUE);
+      fprintf(f, "*/\n");
+      fprintf(f, "/* Parameters / constants */\n");
+      fprintf(f, "/*\n");
+      _mesa_fprint_parameter_list(f, shader->Program->Parameters);
+      fprintf(f, "*/\n");
+   }
+
+   fclose(f);
+}
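+
+/* Usage sketch (illustrative): dump each shader attached to a linked
+ * program object.  shProg is an assumption here -- a valid
+ * gl_shader_program pointer from the caller.
+ */
+#if 0
+   GLuint i;
+   for (i = 0; i < shProg->NumShaders; i++)
+      _mesa_write_shader_to_file(shProg->Shaders[i]);
+#endif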
+
+
+/**
+ * Append the shader's uniform info/values to the shader log file.
+ * The log file will typically have been created by the
+ * _mesa_write_shader_to_file function.
+ */
+void
+_mesa_append_uniforms_to_file(const struct gl_shader *shader)
+{
+   const struct gl_program *const prog = shader->Program;
+   const char *type;
+   char filename[100];
+   FILE *f;
+
+   if (shader->Stage == MESA_SHADER_FRAGMENT)
+      type = "frag";
+   else
+      type = "vert";
+
+   _mesa_snprintf(filename, sizeof(filename), "shader_%u.%s", shader->Name, type);
+   f = fopen(filename, "a"); /* append */
+   if (!f) {
+      fprintf(stderr, "Unable to open %s for appending\n", filename);
+      return;
+   }
+
+   fprintf(f, "/* First-draw parameters / constants */\n");
+   fprintf(f, "/*\n");
+   _mesa_fprint_parameter_list(f, prog->Parameters);
+   fprintf(f, "*/\n");
+
+   fclose(f);
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_print.h b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_print.h
new file mode 100644
index 0000000..cd61568
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_print.h
@@ -0,0 +1,118 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef PROG_PRINT_H
+#define PROG_PRINT_H
+
+#include <stdio.h>
+
+#include "main/glheader.h"
+#include "main/mtypes.h"
+
+struct gl_program;
+struct gl_program_parameter_list;
+struct gl_shader;
+struct prog_instruction;
+
+
+/**
+ * The output style to use when printing programs.
+ */
+typedef enum {
+   PROG_PRINT_ARB,
+   PROG_PRINT_DEBUG
+} gl_prog_print_mode;
+
+
+extern const char *
+_mesa_register_file_name(gl_register_file f);
+
+extern void
+_mesa_print_vp_inputs(GLbitfield inputs);
+
+extern void
+_mesa_print_fp_inputs(GLbitfield inputs);
+
+extern const char *
+_mesa_condcode_string(GLuint condcode);
+
+extern const char *
+_mesa_swizzle_string(GLuint swizzle, GLuint negateBase, GLboolean extended);
+
+const char *
+_mesa_writemask_string(GLuint writeMask);
+
+extern void
+_mesa_print_swizzle(GLuint swizzle);
+
+extern void
+_mesa_fprint_alu_instruction(FILE *f,
+			     const struct prog_instruction *inst,
+			     const char *opcode_string, GLuint numRegs,
+			     gl_prog_print_mode mode,
+			     const struct gl_program *prog);
+
+extern void
+_mesa_print_alu_instruction(const struct prog_instruction *inst,
+                            const char *opcode_string, GLuint numRegs);
+
+extern void
+_mesa_print_instruction(const struct prog_instruction *inst);
+
+extern GLint
+_mesa_fprint_instruction_opt(FILE *f,
+                            const struct prog_instruction *inst,
+                            GLint indent,
+                            gl_prog_print_mode mode,
+                            const struct gl_program *prog);
+
+extern GLint
+_mesa_print_instruction_opt(const struct prog_instruction *inst, GLint indent,
+                            gl_prog_print_mode mode,
+                            const struct gl_program *prog);
+
+extern void
+_mesa_print_program(const struct gl_program *prog);
+
+extern void
+_mesa_fprint_program_opt(FILE *f,
+                         const struct gl_program *prog, gl_prog_print_mode mode,
+                         GLboolean lineNumbers);
+
+extern void
+_mesa_print_program_parameters(struct gl_context *ctx, const struct gl_program *prog);
+
+extern void
+_mesa_print_parameter_list(const struct gl_program_parameter_list *list);
+
+
+extern void
+_mesa_write_shader_to_file(const struct gl_shader *shader);
+
+extern void
+_mesa_append_uniforms_to_file(const struct gl_shader *shader);
+
+
+#endif /* PROG_PRINT_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_statevars.c b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_statevars.c
new file mode 100644
index 0000000..bf356af
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_statevars.c
@@ -0,0 +1,527 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file prog_statevars.c
+ * Program state variable management.
+ * \author Brian Paul
+ */
+
+
+#include "main/glheader.h"
+#include "main/context.h"
+//#include "main/blend.h"
+#include "main/imports.h"
+#include "main/macros.h"
+#include "main/mtypes.h"
+//#include "main/fbobject.h"
+#include "prog_statevars.h"
+#include "prog_parameter.h"
+//#include "main/samplerobj.h"
+
+
+/**
+ * Use the list of tokens in the state[] array to find global GL state
+ * and return it in <value>.  Usually four values are returned in <value>,
+ * but matrix queries may return as many as 16.
+ * This function is used for ARB vertex/fragment programs; the program
+ * parser produces the state[] values.  In this tree the body is a stub:
+ * fixed-function GL state is not available, so any call simply reports
+ * an error via _mesa_problem().
+ */
+static void
+_mesa_fetch_state(struct gl_context *ctx, const gl_state_index state[],
+                  GLfloat *value)
+{
+    _mesa_problem(ctx, "Invalid state in _mesa_fetch_state");
+}
+
+
+/**
+ * Return a bitmask of the Mesa state flags (_NEW_* values) which would
+ * indicate that the given context state may have changed.
+ * The bitmask is used during validation to determine if we need to update
+ * vertex/fragment program parameters (like "state.material.color") when
+ * some GL state has changed.
+ */
+GLbitfield
+_mesa_program_state_flags(const gl_state_index state[STATE_LENGTH])
+{
+   switch (state[0]) {
+   case STATE_MATERIAL:
+   case STATE_LIGHTPROD:
+   case STATE_LIGHTMODEL_SCENECOLOR:
+      /* these can be affected by glColor when colormaterial mode is used */
+      return _NEW_LIGHT | _NEW_CURRENT_ATTRIB;
+
+   case STATE_LIGHT:
+   case STATE_LIGHTMODEL_AMBIENT:
+      return _NEW_LIGHT;
+
+   case STATE_TEXGEN:
+      return _NEW_TEXTURE;
+   case STATE_TEXENV_COLOR:
+      return _NEW_TEXTURE | _NEW_BUFFERS | _NEW_FRAG_CLAMP;
+
+   case STATE_FOG_COLOR:
+      return _NEW_FOG | _NEW_BUFFERS | _NEW_FRAG_CLAMP;
+   case STATE_FOG_PARAMS:
+      return _NEW_FOG;
+
+   case STATE_CLIPPLANE:
+      return _NEW_TRANSFORM;
+
+   case STATE_POINT_SIZE:
+   case STATE_POINT_ATTENUATION:
+      return _NEW_POINT;
+
+   case STATE_MODELVIEW_MATRIX:
+      return _NEW_MODELVIEW;
+   case STATE_PROJECTION_MATRIX:
+      return _NEW_PROJECTION;
+   case STATE_MVP_MATRIX:
+      return _NEW_MODELVIEW | _NEW_PROJECTION;
+   case STATE_TEXTURE_MATRIX:
+      return _NEW_TEXTURE_MATRIX;
+   case STATE_PROGRAM_MATRIX:
+      return _NEW_TRACK_MATRIX;
+
+   case STATE_NUM_SAMPLES:
+      return _NEW_BUFFERS;
+
+   case STATE_DEPTH_RANGE:
+      return _NEW_VIEWPORT;
+
+   case STATE_FRAGMENT_PROGRAM:
+   case STATE_VERTEX_PROGRAM:
+      return _NEW_PROGRAM;
+
+   case STATE_NORMAL_SCALE:
+      return _NEW_MODELVIEW;
+
+   case STATE_INTERNAL:
+      switch (state[1]) {
+      case STATE_CURRENT_ATTRIB:
+         return _NEW_CURRENT_ATTRIB;
+      case STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED:
+         return _NEW_CURRENT_ATTRIB | _NEW_LIGHT | _NEW_BUFFERS;
+
+      case STATE_NORMAL_SCALE:
+         return _NEW_MODELVIEW;
+
+      case STATE_TEXRECT_SCALE:
+      case STATE_ROT_MATRIX_0:
+      case STATE_ROT_MATRIX_1:
+	 return _NEW_TEXTURE;
+      case STATE_FOG_PARAMS_OPTIMIZED:
+	 return _NEW_FOG;
+      case STATE_POINT_SIZE_CLAMPED:
+         return _NEW_POINT | _NEW_MULTISAMPLE;
+      case STATE_LIGHT_SPOT_DIR_NORMALIZED:
+      case STATE_LIGHT_POSITION:
+      case STATE_LIGHT_POSITION_NORMALIZED:
+      case STATE_LIGHT_HALF_VECTOR:
+         return _NEW_LIGHT;
+
+      case STATE_PT_SCALE:
+      case STATE_PT_BIAS:
+         return _NEW_PIXEL;
+
+      case STATE_FB_SIZE:
+      case STATE_FB_WPOS_Y_TRANSFORM:
+         return _NEW_BUFFERS;
+
+      default:
+         /* Unknown state indexes are silently ignored; no flag is set,
+          * since such state is handled by the driver.
+          */
+         return 0;
+      }
+
+   default:
+      _mesa_problem(NULL, "unexpected state[0] in make_state_flags()");
+      return 0;
+   }
+}
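+
+/* Sketch (paramList is an assumed, populated parameter list): OR the
+ * flags of every state-var parameter together to learn which _NEW_*
+ * changes make the list stale; this mirrors how a parameter list's
+ * StateFlags field is accumulated.
+ */
+#if 0
+   GLbitfield dirty = 0;
+   GLuint i;
+   for (i = 0; i < paramList->NumParameters; i++) {
+      if (paramList->Parameters[i].Type == PROGRAM_STATE_VAR)
+         dirty |=
+            _mesa_program_state_flags(paramList->Parameters[i].StateIndexes);
+   }
+#endif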
+
+
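+/**
+ * Bounds-unchecked strcat helper: the caller must ensure that dst has
+ * room for the appended src plus the terminating NUL.
+ */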
+static void
+append(char *dst, const char *src)
+{
+   while (*dst)
+      dst++;
+   while (*src)
+     *dst++ = *src++;
+   *dst = 0;
+}
+
+
+/**
+ * Convert token 'k' to a string, append it onto 'dst' string.
+ */
+static void
+append_token(char *dst, gl_state_index k)
+{
+   switch (k) {
+   case STATE_MATERIAL:
+      append(dst, "material");
+      break;
+   case STATE_LIGHT:
+      append(dst, "light");
+      break;
+   case STATE_LIGHTMODEL_AMBIENT:
+      append(dst, "lightmodel.ambient");
+      break;
+   case STATE_LIGHTMODEL_SCENECOLOR:
+      break;
+   case STATE_LIGHTPROD:
+      append(dst, "lightprod");
+      break;
+   case STATE_TEXGEN:
+      append(dst, "texgen");
+      break;
+   case STATE_FOG_COLOR:
+      append(dst, "fog.color");
+      break;
+   case STATE_FOG_PARAMS:
+      append(dst, "fog.params");
+      break;
+   case STATE_CLIPPLANE:
+      append(dst, "clip");
+      break;
+   case STATE_POINT_SIZE:
+      append(dst, "point.size");
+      break;
+   case STATE_POINT_ATTENUATION:
+      append(dst, "point.attenuation");
+      break;
+   case STATE_MODELVIEW_MATRIX:
+      append(dst, "matrix.modelview");
+      break;
+   case STATE_PROJECTION_MATRIX:
+      append(dst, "matrix.projection");
+      break;
+   case STATE_MVP_MATRIX:
+      append(dst, "matrix.mvp");
+      break;
+   case STATE_TEXTURE_MATRIX:
+      append(dst, "matrix.texture");
+      break;
+   case STATE_PROGRAM_MATRIX:
+      append(dst, "matrix.program");
+      break;
+   case STATE_MATRIX_INVERSE:
+      append(dst, ".inverse");
+      break;
+   case STATE_MATRIX_TRANSPOSE:
+      append(dst, ".transpose");
+      break;
+   case STATE_MATRIX_INVTRANS:
+      append(dst, ".invtrans");
+      break;
+   case STATE_AMBIENT:
+      append(dst, ".ambient");
+      break;
+   case STATE_DIFFUSE:
+      append(dst, ".diffuse");
+      break;
+   case STATE_SPECULAR:
+      append(dst, ".specular");
+      break;
+   case STATE_EMISSION:
+      append(dst, ".emission");
+      break;
+   case STATE_SHININESS:
+      append(dst, "lshininess");
+      break;
+   case STATE_HALF_VECTOR:
+      append(dst, ".half");
+      break;
+   case STATE_POSITION:
+      append(dst, ".position");
+      break;
+   case STATE_ATTENUATION:
+      append(dst, ".attenuation");
+      break;
+   case STATE_SPOT_DIRECTION:
+      append(dst, ".spot.direction");
+      break;
+   case STATE_SPOT_CUTOFF:
+      append(dst, ".spot.cutoff");
+      break;
+   case STATE_TEXGEN_EYE_S:
+      append(dst, ".eye.s");
+      break;
+   case STATE_TEXGEN_EYE_T:
+      append(dst, ".eye.t");
+      break;
+   case STATE_TEXGEN_EYE_R:
+      append(dst, ".eye.r");
+      break;
+   case STATE_TEXGEN_EYE_Q:
+      append(dst, ".eye.q");
+      break;
+   case STATE_TEXGEN_OBJECT_S:
+      append(dst, ".object.s");
+      break;
+   case STATE_TEXGEN_OBJECT_T:
+      append(dst, ".object.t");
+      break;
+   case STATE_TEXGEN_OBJECT_R:
+      append(dst, ".object.r");
+      break;
+   case STATE_TEXGEN_OBJECT_Q:
+      append(dst, ".object.q");
+      break;
+   case STATE_TEXENV_COLOR:
+      append(dst, "texenv");
+      break;
+   case STATE_NUM_SAMPLES:
+      append(dst, "numsamples");
+      break;
+   case STATE_DEPTH_RANGE:
+      append(dst, "depth.range");
+      break;
+   case STATE_VERTEX_PROGRAM:
+   case STATE_FRAGMENT_PROGRAM:
+      break;
+   case STATE_ENV:
+      append(dst, "env");
+      break;
+   case STATE_LOCAL:
+      append(dst, "local");
+      break;
+   /* BEGIN internal state vars */
+   case STATE_INTERNAL:
+      append(dst, ".internal.");
+      break;
+   case STATE_CURRENT_ATTRIB:
+      append(dst, "current");
+      break;
+   case STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED:
+      append(dst, "currentAttribMaybeVPClamped");
+      break;
+   case STATE_NORMAL_SCALE:
+      append(dst, "normalScale");
+      break;
+   case STATE_TEXRECT_SCALE:
+      append(dst, "texrectScale");
+      break;
+   case STATE_FOG_PARAMS_OPTIMIZED:
+      append(dst, "fogParamsOptimized");
+      break;
+   case STATE_POINT_SIZE_CLAMPED:
+      append(dst, "pointSizeClamped");
+      break;
+   case STATE_LIGHT_SPOT_DIR_NORMALIZED:
+      append(dst, "lightSpotDirNormalized");
+      break;
+   case STATE_LIGHT_POSITION:
+      append(dst, "lightPosition");
+      break;
+   case STATE_LIGHT_POSITION_NORMALIZED:
+      append(dst, "light.position.normalized");
+      break;
+   case STATE_LIGHT_HALF_VECTOR:
+      append(dst, "lightHalfVector");
+      break;
+   case STATE_PT_SCALE:
+      append(dst, "PTscale");
+      break;
+   case STATE_PT_BIAS:
+      append(dst, "PTbias");
+      break;
+   case STATE_FB_SIZE:
+      append(dst, "FbSize");
+      break;
+   case STATE_FB_WPOS_Y_TRANSFORM:
+      append(dst, "FbWposYTransform");
+      break;
+   case STATE_ROT_MATRIX_0:
+      append(dst, "rotMatrixRow0");
+      break;
+   case STATE_ROT_MATRIX_1:
+      append(dst, "rotMatrixRow1");
+      break;
+   default:
+      /* probably STATE_INTERNAL_DRIVER+i (driver private state) */
+      append(dst, "driverState");
+   }
+}
+
+static void
+append_face(char *dst, GLint face)
+{
+   if (face == 0)
+      append(dst, "front.");
+   else
+      append(dst, "back.");
+}
+
+static void
+append_index(char *dst, GLint index)
+{
+   char s[20];
+   sprintf(s, "[%d]", index);
+   append(dst, s);
+}
+
+/**
+ * Make a string from the given state vector.
+ * For example, return "state.matrix.texture[2].inverse".
+ * Use free() to deallocate the string.
+ */
+char *
+_mesa_program_state_string(const gl_state_index state[STATE_LENGTH])
+{
+   char str[1000] = "";
+   char tmp[30];
+
+   append(str, "state.");
+   append_token(str, state[0]);
+
+   switch (state[0]) {
+   case STATE_MATERIAL:
+      append_face(str, state[1]);
+      append_token(str, state[2]);
+      break;
+   case STATE_LIGHT:
+      append_index(str, state[1]); /* light number [i]. */
+      append_token(str, state[2]); /* coefficients */
+      break;
+   case STATE_LIGHTMODEL_AMBIENT:
+      append(str, "lightmodel.ambient");
+      break;
+   case STATE_LIGHTMODEL_SCENECOLOR:
+      if (state[1] == 0) {
+         append(str, "lightmodel.front.scenecolor");
+      }
+      else {
+         append(str, "lightmodel.back.scenecolor");
+      }
+      break;
+   case STATE_LIGHTPROD:
+      append_index(str, state[1]); /* light number [i]. */
+      append_face(str, state[2]);
+      append_token(str, state[3]);
+      break;
+   case STATE_TEXGEN:
+      append_index(str, state[1]); /* tex unit [i] */
+      append_token(str, state[2]); /* plane coef */
+      break;
+   case STATE_TEXENV_COLOR:
+      append_index(str, state[1]); /* tex unit [i] */
+      append(str, "color");
+      break;
+   case STATE_CLIPPLANE:
+      append_index(str, state[1]); /* plane [i] */
+      append(str, ".plane");
+      break;
+   case STATE_MODELVIEW_MATRIX:
+   case STATE_PROJECTION_MATRIX:
+   case STATE_MVP_MATRIX:
+   case STATE_TEXTURE_MATRIX:
+   case STATE_PROGRAM_MATRIX:
+      {
+         /* state[0] = modelview, projection, texture, etc. */
+         /* state[1] = which texture matrix or program matrix */
+         /* state[2] = first row to fetch */
+         /* state[3] = last row to fetch */
+         /* state[4] = transpose, inverse or invtrans */
+         const gl_state_index mat = state[0];
+         const GLuint index = (GLuint) state[1];
+         const GLuint firstRow = (GLuint) state[2];
+         const GLuint lastRow = (GLuint) state[3];
+         const gl_state_index modifier = state[4];
+         if (index ||
+             mat == STATE_TEXTURE_MATRIX ||
+             mat == STATE_PROGRAM_MATRIX)
+            append_index(str, index);
+         if (modifier)
+            append_token(str, modifier);
+         if (firstRow == lastRow)
+            sprintf(tmp, ".row[%d]", firstRow);
+         else
+            sprintf(tmp, ".row[%d..%d]", firstRow, lastRow);
+         append(str, tmp);
+      }
+      break;
+   case STATE_POINT_SIZE:
+      break;
+   case STATE_POINT_ATTENUATION:
+      break;
+   case STATE_FOG_PARAMS:
+      break;
+   case STATE_FOG_COLOR:
+      break;
+   case STATE_NUM_SAMPLES:
+      break;
+   case STATE_DEPTH_RANGE:
+      break;
+   case STATE_FRAGMENT_PROGRAM:
+   case STATE_VERTEX_PROGRAM:
+      /* state[1] = {STATE_ENV, STATE_LOCAL} */
+      /* state[2] = parameter index          */
+      append_token(str, state[1]);
+      append_index(str, state[2]);
+      break;
+   case STATE_NORMAL_SCALE:
+      break;
+   case STATE_INTERNAL:
+      append_token(str, state[1]);
+      if (state[1] == STATE_CURRENT_ATTRIB)
+         append_index(str, state[2]);
+      break;
+   default:
+      _mesa_problem(NULL, "Invalid state in _mesa_program_state_string");
+      break;
+   }
+
+   return _mesa_strdup(str);
+}
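+
+/* Worked example (sketch): the token vector below names the full
+ * inverse-transpose modelview matrix; the caller frees the string.
+ */
+#if 0
+   const gl_state_index state[STATE_LENGTH] = {
+      STATE_MODELVIEW_MATRIX, 0, 0, 3, STATE_MATRIX_INVTRANS
+   };
+   char *s = _mesa_program_state_string(state);
+   /* s == "state.matrix.modelview.invtrans.row[0..3]" */
+   free(s);
+#endif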
+
+
+/**
+ * Loop over all the parameters in a parameter list.  If the parameter
+ * is a GL state reference, look up the current value of that state
+ * variable and put it into the parameter's Value[4] array.
+ * Other parameter types never change or are explicitly set by the user
+ * with glUniform() or glProgramParameter(), etc.
+ * This would be called at glBegin time.
+ */
+void
+_mesa_load_state_parameters(struct gl_context *ctx,
+                            struct gl_program_parameter_list *paramList)
+{
+   GLuint i;
+
+   if (!paramList)
+      return;
+
+   for (i = 0; i < paramList->NumParameters; i++) {
+      if (paramList->Parameters[i].Type == PROGRAM_STATE_VAR) {
+         _mesa_fetch_state(ctx,
+			   paramList->Parameters[i].StateIndexes,
+                           &paramList->ParameterValues[i][0].f);
+      }
+   }
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/prog_statevars.h b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_statevars.h
new file mode 100644
index 0000000..23a9f48
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/prog_statevars.h
@@ -0,0 +1,156 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef PROG_STATEVARS_H
+#define PROG_STATEVARS_H
+
+
+#include "main/glheader.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+struct gl_context;
+struct gl_program_parameter_list;
+
+/**
+ * Number of STATE_* values we need to address any GL state.
+ * Used to dimension arrays.
+ */
+#define STATE_LENGTH 5
+
+
+/**
+ * Used for describing GL state referenced from inside ARB vertex and
+ * fragment programs.
+ * A string such as "state.light[0].ambient" gets translated into a
+ * sequence of tokens such as [ STATE_LIGHT, 0, STATE_AMBIENT ].
+ *
+ * For state that's an array, like STATE_CLIPPLANE, the 2nd token [1] should
+ * always be the array index.
+ */
+typedef enum gl_state_index_ {
+   STATE_MATERIAL = 100,  /* start at 100 so small ints are seen as ints */
+
+   STATE_LIGHT,
+   STATE_LIGHTMODEL_AMBIENT,
+   STATE_LIGHTMODEL_SCENECOLOR,
+   STATE_LIGHTPROD,
+
+   STATE_TEXGEN,
+
+   STATE_FOG_COLOR,
+   STATE_FOG_PARAMS,
+
+   STATE_CLIPPLANE,
+
+   STATE_POINT_SIZE,
+   STATE_POINT_ATTENUATION,
+
+   STATE_MODELVIEW_MATRIX,
+   STATE_PROJECTION_MATRIX,
+   STATE_MVP_MATRIX,
+   STATE_TEXTURE_MATRIX,
+   STATE_PROGRAM_MATRIX,
+   STATE_MATRIX_INVERSE,
+   STATE_MATRIX_TRANSPOSE,
+   STATE_MATRIX_INVTRANS,
+
+   STATE_AMBIENT,
+   STATE_DIFFUSE,
+   STATE_SPECULAR,
+   STATE_EMISSION,
+   STATE_SHININESS,
+   STATE_HALF_VECTOR,
+
+   STATE_POSITION,       /**< xyzw = position */
+   STATE_ATTENUATION,    /**< xyz = attenuation, w = spot exponent */
+   STATE_SPOT_DIRECTION, /**< xyz = direction, w = cos(cutoff) */
+   STATE_SPOT_CUTOFF,    /**< x = cutoff, yzw = undefined */
+
+   STATE_TEXGEN_EYE_S,
+   STATE_TEXGEN_EYE_T,
+   STATE_TEXGEN_EYE_R,
+   STATE_TEXGEN_EYE_Q,
+   STATE_TEXGEN_OBJECT_S,
+   STATE_TEXGEN_OBJECT_T,
+   STATE_TEXGEN_OBJECT_R,
+   STATE_TEXGEN_OBJECT_Q,
+
+   STATE_TEXENV_COLOR,
+
+   STATE_NUM_SAMPLES,    /* An integer, not a float like the other state vars */
+
+   STATE_DEPTH_RANGE,
+
+   STATE_VERTEX_PROGRAM,
+   STATE_FRAGMENT_PROGRAM,
+
+   STATE_ENV,
+   STATE_LOCAL,
+
+   STATE_INTERNAL,		/* Mesa additions */
+   STATE_CURRENT_ATTRIB,        /* ctx->Current vertex attrib value */
+   STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED,        /* ctx->Current vertex attrib value after passthrough vertex processing */
+   STATE_NORMAL_SCALE,
+   STATE_TEXRECT_SCALE,
+   STATE_FOG_PARAMS_OPTIMIZED,  /* for faster fog calc */
+   STATE_POINT_SIZE_CLAMPED,    /* includes implementation dependent size clamp */
+   STATE_LIGHT_SPOT_DIR_NORMALIZED,   /* pre-normalized spot dir */
+   STATE_LIGHT_POSITION,              /* object vs eye space */
+   STATE_LIGHT_POSITION_NORMALIZED,   /* object vs eye space */
+   STATE_LIGHT_HALF_VECTOR,           /* object vs eye space */
+   STATE_PT_SCALE,              /**< Pixel transfer RGBA scale */
+   STATE_PT_BIAS,               /**< Pixel transfer RGBA bias */
+   STATE_FB_SIZE,               /**< (width-1, height-1, 0, 0) */
+   STATE_FB_WPOS_Y_TRANSFORM,   /**< (1, 0, -1, height) if a FBO is bound, (-1, height, 1, 0) otherwise */
+   STATE_ROT_MATRIX_0,          /**< ATI_envmap_bumpmap, rot matrix row 0 */
+   STATE_ROT_MATRIX_1,          /**< ATI_envmap_bumpmap, rot matrix row 1 */
+   STATE_INTERNAL_DRIVER	/* first available state index for drivers (must be last) */
+} gl_state_index;
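+
+/* Sketch: the five-token encoding of "state.light[0].diffuse" as the
+ * parser would store it (unused trailing slots left as zero).
+ */
+#if 0
+   const gl_state_index tokens[STATE_LENGTH] =
+      { STATE_LIGHT, 0, STATE_DIFFUSE, 0, 0 };
+#endif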
+
+
+
+extern void
+_mesa_load_state_parameters(struct gl_context *ctx,
+                            struct gl_program_parameter_list *paramList);
+
+
+extern GLbitfield
+_mesa_program_state_flags(const gl_state_index state[STATE_LENGTH]);
+
+
+extern char *
+_mesa_program_state_string(const gl_state_index state[STATE_LENGTH]);
+
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* PROG_STATEVARS_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/program.c b/icd/intel/compiler/mesa-utils/src/mesa/program/program.c
new file mode 100644
index 0000000..b9360ac
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/program.c
@@ -0,0 +1,1088 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file program.c
+ * Vertex and fragment program support functions.
+ * \author Brian Paul
+ */
+
+
+#include "icd-utils.h" // LunarG: ADD
+#include "main/glheader.h"
+#include "main/context.h"
+#include "main/hash.h"
+#include "main/macros.h"
+#include "program.h"
+//#include "prog_cache.h"
+#include "prog_parameter.h"
+#include "prog_instruction.h"
+
+
+/**
+ * A pointer to this dummy program is put into the hash table when
+ * glGenPrograms is called.
+ */
+struct gl_program _mesa_DummyProgram;
+
+
+/**
+ * Init context's vertex/fragment program state
+ */
+void
+_mesa_init_program(struct gl_context *ctx)
+{
+   /*
+    * If this assertion fails, we need to increase the field
+    * size for register indexes (see INST_INDEX_BITS).
+    */
+   ASSERT(ctx->Const.Program[MESA_SHADER_VERTEX].MaxUniformComponents / 4
+          <= (1 << INST_INDEX_BITS));
+   ASSERT(ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxUniformComponents / 4
+          <= (1 << INST_INDEX_BITS));
+
+   ASSERT(ctx->Const.Program[MESA_SHADER_VERTEX].MaxTemps <= (1 << INST_INDEX_BITS));
+   ASSERT(ctx->Const.Program[MESA_SHADER_VERTEX].MaxLocalParams <= (1 << INST_INDEX_BITS));
+   ASSERT(ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTemps <= (1 << INST_INDEX_BITS));
+   ASSERT(ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxLocalParams <= (1 << INST_INDEX_BITS));
+
+   ASSERT(ctx->Const.Program[MESA_SHADER_VERTEX].MaxUniformComponents <= 4 * MAX_UNIFORMS);
+   ASSERT(ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxUniformComponents <= 4 * MAX_UNIFORMS);
+
+   ASSERT(ctx->Const.Program[MESA_SHADER_VERTEX].MaxAddressOffset <= (1 << INST_INDEX_BITS));
+   ASSERT(ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxAddressOffset <= (1 << INST_INDEX_BITS));
+
+   /* If this fails, increase prog_instruction::TexSrcUnit size */
+   STATIC_ASSERT(MAX_TEXTURE_UNITS <= (1 << 5));
+
+   /* If this fails, increase prog_instruction::TexSrcTarget size */
+   STATIC_ASSERT(NUM_TEXTURE_TARGETS <= (1 << 4));
+
+   ctx->Program.ErrorPos = -1;
+   ctx->Program.ErrorString = _mesa_strdup("");
+
+   ctx->VertexProgram.Enabled = GL_FALSE;
+   ctx->VertexProgram.PointSizeEnabled =
+      (ctx->API == API_OPENGLES2) ? GL_TRUE : GL_FALSE;
+   ctx->VertexProgram.TwoSideEnabled = GL_FALSE;
+   _mesa_reference_vertprog(ctx, &ctx->VertexProgram.Current,
+                            ctx->Shared->DefaultVertexProgram);
+   assert(ctx->VertexProgram.Current);
+//   ctx->VertexProgram.Cache = _mesa_new_program_cache();
+
+   ctx->FragmentProgram.Enabled = GL_FALSE;
+   _mesa_reference_fragprog(ctx, &ctx->FragmentProgram.Current,
+                            ctx->Shared->DefaultFragmentProgram);
+   assert(ctx->FragmentProgram.Current);
+//   ctx->FragmentProgram.Cache = _mesa_new_program_cache();
+
+   ctx->GeometryProgram.Enabled = GL_FALSE;
+   /* right now by default we don't have a geometry program */
+   _mesa_reference_geomprog(ctx, &ctx->GeometryProgram.Current,
+                            NULL);
+//   ctx->GeometryProgram.Cache = _mesa_new_program_cache();
+
+   /* XXX probably move this stuff */
+   ctx->ATIFragmentShader.Enabled = GL_FALSE;
+   ctx->ATIFragmentShader.Current = ctx->Shared->DefaultFragmentShader;
+   assert(ctx->ATIFragmentShader.Current);
+   ctx->ATIFragmentShader.Current->RefCount++;
+}
+
+
+/**
+ * Free a context's vertex/fragment program state
+ */
+void
+_mesa_free_program_data(struct gl_context *ctx)
+{
+   _mesa_reference_vertprog(ctx, &ctx->VertexProgram.Current, NULL);
+//   _mesa_delete_program_cache(ctx, ctx->VertexProgram.Cache);
+   _mesa_reference_fragprog(ctx, &ctx->FragmentProgram.Current, NULL);
+//   _mesa_delete_shader_cache(ctx, ctx->FragmentProgram.Cache);
+   _mesa_reference_geomprog(ctx, &ctx->GeometryProgram.Current, NULL);
+//   _mesa_delete_program_cache(ctx, ctx->GeometryProgram.Cache);
+
+   /* XXX probably move this stuff */
+   if (ctx->ATIFragmentShader.Current) {
+      ctx->ATIFragmentShader.Current->RefCount--;
+      if (ctx->ATIFragmentShader.Current->RefCount <= 0) {
+         free(ctx->ATIFragmentShader.Current);
+      }
+   }
+
+   free((void *) ctx->Program.ErrorString);
+}
+
+
+/**
+ * Update the default program objects in the given context to reference those
+ * specified in the shared state and release those referencing the old
+ * shared state.
+ */
+void
+_mesa_update_default_objects_program(struct gl_context *ctx)
+{
+   _mesa_reference_vertprog(ctx, &ctx->VertexProgram.Current,
+                            ctx->Shared->DefaultVertexProgram);
+   assert(ctx->VertexProgram.Current);
+
+   _mesa_reference_fragprog(ctx, &ctx->FragmentProgram.Current,
+                            ctx->Shared->DefaultFragmentProgram);
+   assert(ctx->FragmentProgram.Current);
+
+   _mesa_reference_geomprog(ctx, &ctx->GeometryProgram.Current,
+                      ctx->Shared->DefaultGeometryProgram);
+
+   /* XXX probably move this stuff */
+   if (ctx->ATIFragmentShader.Current) {
+      ctx->ATIFragmentShader.Current->RefCount--;
+      if (ctx->ATIFragmentShader.Current->RefCount <= 0) {
+         free(ctx->ATIFragmentShader.Current);
+      }
+   }
+   ctx->ATIFragmentShader.Current = (struct ati_fragment_shader *) ctx->Shared->DefaultFragmentShader;
+   assert(ctx->ATIFragmentShader.Current);
+   ctx->ATIFragmentShader.Current->RefCount++;
+}
+
+
+/**
+ * Set the vertex/fragment program error state (position and error string).
+ * This is generally called from within the parsers.
+ */
+void
+_mesa_set_program_error(struct gl_context *ctx, GLint pos, const char *string)
+{
+   ctx->Program.ErrorPos = pos;
+   free((void *) ctx->Program.ErrorString);
+   if (!string)
+      string = "";
+   ctx->Program.ErrorString = _mesa_strdup(string);
+}
+
+
+/**
+ * Find the line number and column for 'pos' within 'string'.
+ * Return a copy of the line which contains 'pos'.  Free the line with
+ * free().
+ * \param string  the program string
+ * \param pos     the position within the string
+ * \param line    returns the line number corresponding to 'pos'.
+ * \param col     returns the column number corresponding to 'pos'.
+ * \return copy of the line containing 'pos'.
+ */
+const GLubyte *
+_mesa_find_line_column(const GLubyte *string, const GLubyte *pos,
+                       GLint *line, GLint *col)
+{
+   const GLubyte *lineStart = string;
+   const GLubyte *p = string;
+   GLubyte *s;
+   int len;
+
+   *line = 1;
+
+   while (p != pos) {
+      if (*p == (GLubyte) '\n') {
+         (*line)++;
+         lineStart = p + 1;
+      }
+      p++;
+   }
+
+   *col = (pos - lineStart) + 1;
+
+   /* return copy of this line */
+   while (*p != 0 && *p != '\n')
+      p++;
+   len = p - lineStart;
+   s = malloc(len + 1);
+   if (!s)
+      return NULL;
+   memcpy(s, lineStart, len);
+   s[len] = 0;
+
+   return s;
+}
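+
+/* Usage sketch (illustrative): report a parse error with line/column
+ * context.  "source" and "errorPos" are hypothetical: the program text
+ * and the offset at which parsing failed.
+ */
+#if 0
+   GLint line, col;
+   const GLubyte *lineStr =
+      _mesa_find_line_column(source, source + errorPos, &line, &col);
+   fprintf(stderr, "parse error at %d:%d: %s\n", line, col,
+           (const char *) lineStr);
+   free((void *) lineStr);
+#endif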
+
+
+/**
+ * Initialize a new vertex/fragment program object.
+ */
+static struct gl_program *
+_mesa_init_program_struct( struct gl_context *ctx, struct gl_program *prog,
+                           GLenum target, GLuint id)
+{
+   (void) ctx;
+   if (prog) {
+      GLuint i;
+      memset(prog, 0, sizeof(*prog));
+      prog->Id = id;
+      prog->Target = target;
+      // LunarG: VK does not use reference counts
+      // prog->RefCount = 1;
+      prog->Format = GL_PROGRAM_FORMAT_ASCII_ARB;
+
+      /* default mapping from samplers to texture units */
+      for (i = 0; i < MAX_SAMPLERS; i++)
+         prog->SamplerUnits[i] = i;
+   }
+
+   return prog;
+}
+
+
+/**
+ * Initialize a new fragment program object.
+ */
+struct gl_program *
+_mesa_init_fragment_program( struct gl_context *ctx, struct gl_fragment_program *prog,
+                             GLenum target, GLuint id)
+{
+   if (prog)
+      return _mesa_init_program_struct( ctx, &prog->Base, target, id );
+   else
+      return NULL;
+}
+
+
+/**
+ * Initialize a new vertex program object.
+ */
+struct gl_program *
+_mesa_init_vertex_program( struct gl_context *ctx, struct gl_vertex_program *prog,
+                           GLenum target, GLuint id)
+{
+   if (prog)
+      return _mesa_init_program_struct( ctx, &prog->Base, target, id );
+   else
+      return NULL;
+}
+
+
+/**
+ * Initialize a new compute program object.
+ */
+struct gl_program *
+_mesa_init_compute_program(struct gl_context *ctx,
+                           struct gl_compute_program *prog, GLenum target,
+                           GLuint id)
+{
+   if (prog)
+      return _mesa_init_program_struct( ctx, &prog->Base, target, id );
+   else
+      return NULL;
+}
+
+
+/**
+ * Initialize a new geometry program object.
+ */
+struct gl_program *
+_mesa_init_geometry_program( struct gl_context *ctx, struct gl_geometry_program *prog,
+                             GLenum target, GLuint id)
+{
+   if (prog)
+      return _mesa_init_program_struct( ctx, &prog->Base, target, id );
+   else
+      return NULL;
+}
+
+
+/**
+ * Allocate and initialize a new fragment/vertex program object but
+ * don't put it into the program hash table.  Called via
+ * ctx->Driver.NewProgram.  May be overridden (ie. replaced) by a
+ * device driver function to implement OO deriviation with additional
+ * types not understood by this function.
+ *
+ * \param ctx  context
+ * \param id   program id/number
+ * \param target  program target/type
+ * \return  pointer to new program object
+ */
+struct gl_program *
+_mesa_new_program(struct gl_context *ctx, GLenum target, GLuint id)
+{
+   struct gl_program *prog;
+   switch (target) {
+   case GL_VERTEX_PROGRAM_ARB: /* == GL_VERTEX_PROGRAM_NV */
+      prog = _mesa_init_vertex_program(ctx, CALLOC_STRUCT(gl_vertex_program),
+                                       target, id );
+      break;
+   case GL_FRAGMENT_PROGRAM_NV:
+   case GL_FRAGMENT_PROGRAM_ARB:
+      prog =_mesa_init_fragment_program(ctx,
+                                         CALLOC_STRUCT(gl_fragment_program),
+                                         target, id );
+      break;
+   case MESA_GEOMETRY_PROGRAM:
+      prog = _mesa_init_geometry_program(ctx,
+                                         CALLOC_STRUCT(gl_geometry_program),
+                                         target, id);
+      break;
+   case GL_COMPUTE_PROGRAM_NV:
+      prog = _mesa_init_compute_program(ctx,
+                                        CALLOC_STRUCT(gl_compute_program),
+                                        target, id);
+      break;
+   default:
+      _mesa_problem(ctx, "bad target in _mesa_new_program");
+      prog = NULL;
+   }
+   return prog;
+}
+
+
+/**
+ * Delete a program and remove it from the hash table, ignoring the
+ * reference count.
+ * Called via ctx->Driver.DeleteProgram.  May be wrapped (OO deriviation)
+ * by a device driver function.
+ */
+void
+_mesa_delete_program(struct gl_context *ctx, struct gl_program *prog)
+{
+   (void) ctx;
+   ASSERT(prog);
+   // LunarG: VK does not use reference counts
+   //ASSERT(prog->RefCount==0);
+
+   if (prog == &_mesa_DummyProgram)
+      return;
+
+   free(prog->String);
+   free(prog->LocalParams);
+
+   if (prog->Instructions) {
+      _mesa_free_instructions(prog->Instructions, prog->NumInstructions);
+   }
+   if (prog->Parameters) {
+       _mesa_free_parameter_list(prog->Parameters);
+   }
+
+   free(prog);
+}
+
+
+/**
+ * Return the gl_program object for a given ID.
+ * Basically just a wrapper for _mesa_HashLookup() to avoid a lot of
+ * casts elsewhere.
+ */
+struct gl_program *
+_mesa_lookup_program(struct gl_context *ctx, GLuint id)
+{
+   if (id)
+      return (struct gl_program *) _mesa_HashLookup(ctx->Shared->Programs, id);
+   else
+      return NULL;
+}
+
+
+/**
+ * Reference counting for vertex/fragment programs.
+ * This is normally only called from the _mesa_reference_program() macro
+ * when there's a real pointer change.
+ */
+void
+_mesa_reference_program_(struct gl_context *ctx,
+                         struct gl_program **ptr,
+                         struct gl_program *prog)
+{
+// LunarG: VK does not use reference counts
+#if 0
+#ifndef NDEBUG
+   assert(ptr);
+   if (*ptr && prog) {
+      /* sanity check */
+      if ((*ptr)->Target == GL_VERTEX_PROGRAM_ARB)
+         ASSERT(prog->Target == GL_VERTEX_PROGRAM_ARB);
+      else if ((*ptr)->Target == GL_FRAGMENT_PROGRAM_ARB)
+         ASSERT(prog->Target == GL_FRAGMENT_PROGRAM_ARB ||
+                prog->Target == GL_FRAGMENT_PROGRAM_NV);
+      else if ((*ptr)->Target == MESA_GEOMETRY_PROGRAM)
+         ASSERT(prog->Target == MESA_GEOMETRY_PROGRAM);
+   }
+#endif
+
+   if (*ptr) {
+      GLboolean deleteFlag;
+
+      /*mtx_lock(&(*ptr)->Mutex);*/
+#if 0
+      printf("Program %p ID=%u Target=%s  Refcount-- to %d\n",
+             *ptr, (*ptr)->Id,
+             ((*ptr)->Target == GL_VERTEX_PROGRAM_ARB ? "VP" :
+              ((*ptr)->Target == MESA_GEOMETRY_PROGRAM ? "GP" : "FP")),
+             (*ptr)->RefCount - 1);
+#endif
+      ASSERT((*ptr)->RefCount > 0);
+      (*ptr)->RefCount--;
+
+      deleteFlag = ((*ptr)->RefCount == 0);
+      /*mtx_lock(&(*ptr)->Mutex);*/
+
+      if (deleteFlag) {
+         ASSERT(ctx);
+         ctx->Driver.DeleteProgram(ctx, *ptr);
+      }
+
+      *ptr = NULL;
+   }
+
+   assert(!*ptr);
+   if (prog) {
+      /*mtx_lock(&prog->Mutex);*/
+      prog->RefCount++;
+#if 0
+      printf("Program %p ID=%u Target=%s  Refcount++ to %d\n",
+             prog, prog->Id,
+             (prog->Target == GL_VERTEX_PROGRAM_ARB ? "VP" :
+              (prog->Target == MESA_GEOMETRY_PROGRAM ? "GP" : "FP")),
+             prog->RefCount);
+#endif
+      /*mtx_unlock(&prog->Mutex);*/
+   }
+#endif
+
+   *ptr = prog;
+}
+
+
+/**
+ * Return a copy of a program.
+ * XXX Problem here if the program object is actually an OO derivation
+ * made by a device driver.
+ */
+struct gl_program *
+_mesa_clone_program(struct gl_context *ctx, const struct gl_program *prog)
+{
+   struct gl_program *clone;
+
+   clone = ctx->Driver.NewProgram(ctx, prog->Target, prog->Id);
+   if (!clone)
+      return NULL;
+
+   assert(clone->Target == prog->Target);
+   // LunarG: VK does not use reference counts
+   // assert(clone->RefCount == 1);
+
+   clone->String = (GLubyte *) _mesa_strdup((char *) prog->String);
+   clone->Format = prog->Format;
+   clone->Instructions = _mesa_alloc_instructions(prog->NumInstructions);
+   if (!clone->Instructions) {
+      _mesa_reference_program(ctx, &clone, NULL);
+      return NULL;
+   }
+   _mesa_copy_instructions(clone->Instructions, prog->Instructions,
+                           prog->NumInstructions);
+   clone->InputsRead = prog->InputsRead;
+   clone->OutputsWritten = prog->OutputsWritten;
+   clone->SamplersUsed = prog->SamplersUsed;
+   clone->ShadowSamplers = prog->ShadowSamplers;
+   memcpy(clone->TexturesUsed, prog->TexturesUsed, sizeof(prog->TexturesUsed));
+
+   if (prog->Parameters)
+      clone->Parameters = _mesa_clone_parameter_list(prog->Parameters);
+   if (prog->LocalParams) {
+      clone->LocalParams = malloc(MAX_PROGRAM_LOCAL_PARAMS *
+                                  sizeof(float[4]));
+      if (!clone->LocalParams) {
+         _mesa_reference_program(ctx, &clone, NULL);
+         return NULL;
+      }
+      memcpy(clone->LocalParams, prog->LocalParams,
+             MAX_PROGRAM_LOCAL_PARAMS * sizeof(float[4]));
+   }
+   clone->IndirectRegisterFiles = prog->IndirectRegisterFiles;
+   clone->NumInstructions = prog->NumInstructions;
+   clone->NumTemporaries = prog->NumTemporaries;
+   clone->NumParameters = prog->NumParameters;
+   clone->NumAttributes = prog->NumAttributes;
+   clone->NumAddressRegs = prog->NumAddressRegs;
+   clone->NumNativeInstructions = prog->NumNativeInstructions;
+   clone->NumNativeTemporaries = prog->NumNativeTemporaries;
+   clone->NumNativeParameters = prog->NumNativeParameters;
+   clone->NumNativeAttributes = prog->NumNativeAttributes;
+   clone->NumNativeAddressRegs = prog->NumNativeAddressRegs;
+   clone->NumAluInstructions = prog->NumAluInstructions;
+   clone->NumTexInstructions = prog->NumTexInstructions;
+   clone->NumTexIndirections = prog->NumTexIndirections;
+   clone->NumNativeAluInstructions = prog->NumNativeAluInstructions;
+   clone->NumNativeTexInstructions = prog->NumNativeTexInstructions;
+   clone->NumNativeTexIndirections = prog->NumNativeTexIndirections;
+
+   switch (prog->Target) {
+   case GL_VERTEX_PROGRAM_ARB:
+      {
+         const struct gl_vertex_program *vp = gl_vertex_program_const(prog);
+         struct gl_vertex_program *vpc = gl_vertex_program(clone);
+         vpc->IsPositionInvariant = vp->IsPositionInvariant;
+      }
+      break;
+   case GL_FRAGMENT_PROGRAM_ARB:
+      {
+         const struct gl_fragment_program *fp = gl_fragment_program_const(prog);
+         struct gl_fragment_program *fpc = gl_fragment_program(clone);
+         fpc->UsesKill = fp->UsesKill;
+         fpc->UsesDFdy = fp->UsesDFdy;
+         fpc->OriginUpperLeft = fp->OriginUpperLeft;
+         fpc->PixelCenterInteger = fp->PixelCenterInteger;
+      }
+      break;
+   case MESA_GEOMETRY_PROGRAM:
+      {
+         const struct gl_geometry_program *gp = gl_geometry_program_const(prog);
+         struct gl_geometry_program *gpc = gl_geometry_program(clone);
+         gpc->VerticesOut = gp->VerticesOut;
+         gpc->InputType = gp->InputType;
+         gpc->Invocations = gp->Invocations;
+         gpc->OutputType = gp->OutputType;
+         gpc->UsesEndPrimitive = gp->UsesEndPrimitive;
+      }
+      break;
+   default:
+      _mesa_problem(NULL, "Unexpected target in _mesa_clone_program");
+   }
+
+   return clone;
+}
+
+
+/**
+ * Insert 'count' NOP instructions at 'start' in the given program.
+ * Adjust branch targets accordingly.
+ */
+GLboolean
+_mesa_insert_instructions(struct gl_program *prog, GLuint start, GLuint count)
+{
+   const GLuint origLen = prog->NumInstructions;
+   const GLuint newLen = origLen + count;
+   struct prog_instruction *newInst;
+   GLuint i;
+
+   /* adjust branches */
+   for (i = 0; i < prog->NumInstructions; i++) {
+      struct prog_instruction *inst = prog->Instructions + i;
+      if (inst->BranchTarget > 0) {
+         if ((GLuint)inst->BranchTarget >= start) {
+            inst->BranchTarget += count;
+         }
+      }
+   }
+
+   /* Alloc storage for new instructions */
+   newInst = _mesa_alloc_instructions(newLen);
+   if (!newInst) {
+      return GL_FALSE;
+   }
+
+   /* Copy 'start' instructions into new instruction buffer */
+   _mesa_copy_instructions(newInst, prog->Instructions, start);
+
+   /* init the new instructions */
+   _mesa_init_instructions(newInst + start, count);
+
+   /* Copy the remaining/tail instructions to new inst buffer */
+   _mesa_copy_instructions(newInst + start + count,
+                           prog->Instructions + start,
+                           origLen - start);
+
+   /* free old instructions */
+   _mesa_free_instructions(prog->Instructions, origLen);
+
+   /* install new instructions */
+   prog->Instructions = newInst;
+   prog->NumInstructions = newLen;
+
+   return GL_TRUE;
+}
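+
+/* Worked example (sketch): inserting two NOPs at index 3 of a
+ * 10-instruction program grows it to 12 instructions and shifts every
+ * BranchTarget >= 3 up by two, so an IF whose target was 5 now
+ * branches to 7.  _mesa_delete_instructions below is the inverse.
+ */
+#if 0
+   if (_mesa_insert_instructions(prog, 3, 2)) {
+      /* prog->NumInstructions == 12; slots 3 and 4 are fresh NOPs */
+   }
+#endif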
+
+/**
+ * Delete 'count' instructions at 'start' in the given program.
+ * Adjust branch targets accordingly.
+ */
+GLboolean
+_mesa_delete_instructions(struct gl_program *prog, GLuint start, GLuint count)
+{
+   const GLuint origLen = prog->NumInstructions;
+   const GLuint newLen = origLen - count;
+   struct prog_instruction *newInst;
+   GLuint i;
+
+   /* adjust branches */
+   for (i = 0; i < prog->NumInstructions; i++) {
+      struct prog_instruction *inst = prog->Instructions + i;
+      if (inst->BranchTarget > 0) {
+         if (inst->BranchTarget > (GLint) start) {
+            inst->BranchTarget -= count;
+         }
+      }
+   }
+
+   /* Alloc storage for new instructions */
+   newInst = _mesa_alloc_instructions(newLen);
+   if (!newInst) {
+      return GL_FALSE;
+   }
+
+   /* Copy 'start' instructions into new instruction buffer */
+   _mesa_copy_instructions(newInst, prog->Instructions, start);
+
+   /* Copy the remaining/tail instructions to new inst buffer */
+   _mesa_copy_instructions(newInst + start,
+                           prog->Instructions + start + count,
+                           newLen - start);
+
+   /* free old instructions */
+   _mesa_free_instructions(prog->Instructions, origLen);
+
+   /* install new instructions */
+   prog->Instructions = newInst;
+   prog->NumInstructions = newLen;
+
+   return GL_TRUE;
+}
+
+
+/**
+ * Search instructions for registers that match (oldFile, oldIndex),
+ * replacing them with (newFile, newIndex).
+ */
+static void
+replace_registers(struct prog_instruction *inst, GLuint numInst,
+                  GLuint oldFile, GLuint oldIndex,
+                  GLuint newFile, GLuint newIndex)
+{
+   GLuint i, j;
+   for (i = 0; i < numInst; i++) {
+      /* src regs */
+      for (j = 0; j < _mesa_num_inst_src_regs(inst[i].Opcode); j++) {
+         if (inst[i].SrcReg[j].File == oldFile &&
+             inst[i].SrcReg[j].Index == oldIndex) {
+            inst[i].SrcReg[j].File = newFile;
+            inst[i].SrcReg[j].Index = newIndex;
+         }
+      }
+      /* dst reg */
+      if (inst[i].DstReg.File == oldFile && inst[i].DstReg.Index == oldIndex) {
+         inst[i].DstReg.File = newFile;
+         inst[i].DstReg.Index = newIndex;
+      }
+   }
+}
+
+
+/**
+ * Search instructions for references to program parameters.  When found,
+ * increment the parameter index by 'offset'.
+ * Used when combining programs.
+ */
+static void
+adjust_param_indexes(struct prog_instruction *inst, GLuint numInst,
+                     GLuint offset)
+{
+   GLuint i, j;
+   for (i = 0; i < numInst; i++) {
+      for (j = 0; j < _mesa_num_inst_src_regs(inst[i].Opcode); j++) {
+         GLuint f = inst[i].SrcReg[j].File;
+         if (f == PROGRAM_CONSTANT ||
+             f == PROGRAM_UNIFORM ||
+             f == PROGRAM_STATE_VAR) {
+            inst[i].SrcReg[j].Index += offset;
+         }
+      }
+   }
+}
+
+
+/**
+ * Combine two programs into one.  Fix instructions so the outputs of
+ * the first program go to the inputs of the second program.
+ */
+struct gl_program *
+_mesa_combine_programs(struct gl_context *ctx,
+                       const struct gl_program *progA,
+                       const struct gl_program *progB)
+{
+   struct prog_instruction *newInst;
+   struct gl_program *newProg;
+   const GLuint lenA = progA->NumInstructions - 1; /* omit END instr */
+   const GLuint lenB = progB->NumInstructions;
+   const GLuint numParamsA = _mesa_num_parameters(progA->Parameters);
+   const GLuint newLength = lenA + lenB;
+   GLboolean usedTemps[MAX_PROGRAM_TEMPS];
+   GLuint firstTemp = 0;
+   GLbitfield64 inputsB;
+   GLuint i;
+
+   ASSERT(progA->Target == progB->Target);
+
+   newInst = _mesa_alloc_instructions(newLength);
+   if (!newInst)
+      return NULL;
+
+   _mesa_copy_instructions(newInst, progA->Instructions, lenA);
+   _mesa_copy_instructions(newInst + lenA, progB->Instructions, lenB);
+
+   /* adjust branch / instruction addresses for B's instructions */
+   for (i = 0; i < lenB; i++) {
+      newInst[lenA + i].BranchTarget += lenA;
+   }
+
+   newProg = ctx->Driver.NewProgram(ctx, progA->Target, 0);
+   newProg->Instructions = newInst;
+   newProg->NumInstructions = newLength;
+
+   /* find used temp regs (we may need new temps below) */
+   _mesa_find_used_registers(newProg, PROGRAM_TEMPORARY,
+                             usedTemps, MAX_PROGRAM_TEMPS);
+
+   if (newProg->Target == GL_FRAGMENT_PROGRAM_ARB) {
+      const struct gl_fragment_program *fprogA, *fprogB;
+      struct gl_fragment_program *newFprog;
+      GLbitfield64 progB_inputsRead = progB->InputsRead;
+      GLint progB_colorFile, progB_colorIndex;
+
+      fprogA = gl_fragment_program_const(progA);
+      fprogB = gl_fragment_program_const(progB);
+      newFprog = gl_fragment_program(newProg);
+
+      newFprog->UsesKill = fprogA->UsesKill || fprogB->UsesKill;
+      newFprog->UsesDFdy = fprogA->UsesDFdy || fprogB->UsesDFdy;
+
+      /* We'll do a search and replace for instances
+       * of progB_colorFile/progB_colorIndex below...
+       */
+      progB_colorFile = PROGRAM_INPUT;
+      progB_colorIndex = VARYING_SLOT_COL0;
+
+      /*
+       * The fragment program may get color from a state var rather than
+       * a fragment input (vertex output) if it's constant.
+       * See the texenvprogram.c code.
+       * So, search the program's parameter list now to see if the program
+       * gets color from a state var instead of a conventional fragment
+       * input register.
+       */
+      for (i = 0; i < progB->Parameters->NumParameters; i++) {
+         struct gl_program_parameter *p = &progB->Parameters->Parameters[i];
+         if (p->Type == PROGRAM_STATE_VAR &&
+             p->StateIndexes[0] == STATE_INTERNAL &&
+             p->StateIndexes[1] == STATE_CURRENT_ATTRIB &&
+             (int) p->StateIndexes[2] == (int) VERT_ATTRIB_COLOR0) {
+            progB_inputsRead |= VARYING_BIT_COL0;
+            progB_colorFile = PROGRAM_STATE_VAR;
+            progB_colorIndex = i;
+            break;
+         }
+      }
+
+      /* Connect color outputs of fprogA to color inputs of fprogB, via a
+       * new temporary register.
+       */
+      if ((progA->OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_COLOR)) &&
+          (progB_inputsRead & VARYING_BIT_COL0)) {
+         GLint tempReg = _mesa_find_free_register(usedTemps, MAX_PROGRAM_TEMPS,
+                                                  firstTemp);
+         if (tempReg < 0) {
+            _mesa_problem(ctx, "No free temp regs found in "
+                          "_mesa_combine_programs(), using 31");
+            tempReg = 31;
+         }
+         firstTemp = tempReg + 1;
+
+         /* replace writes to result.color[0] with tempReg */
+         replace_registers(newInst, lenA,
+                           PROGRAM_OUTPUT, FRAG_RESULT_COLOR,
+                           PROGRAM_TEMPORARY, tempReg);
+         /* replace reads from the input color with tempReg */
+         replace_registers(newInst + lenA, lenB,
+                           progB_colorFile, progB_colorIndex, /* search for */
+                           PROGRAM_TEMPORARY, tempReg  /* replace with */ );
+      }
+
+      /* compute combined program's InputsRead */
+      inputsB = progB_inputsRead;
+      if (progA->OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_COLOR)) {
+         inputsB &= ~(1 << VARYING_SLOT_COL0);
+      }
+      newProg->InputsRead = progA->InputsRead | inputsB;
+      newProg->OutputsWritten = progB->OutputsWritten;
+      newProg->SamplersUsed = progA->SamplersUsed | progB->SamplersUsed;
+   }
+   else {
+      /* vertex program */
+      assert(0);      /* XXX todo */
+   }
+
+   /*
+    * Merge parameters (uniforms, constants, etc)
+    */
+   newProg->Parameters = _mesa_combine_parameter_lists(progA->Parameters,
+                                                       progB->Parameters);
+
+   adjust_param_indexes(newInst + lenA, lenB, numParamsA);
+
+
+   return newProg;
+}
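+
+/* Illustrative sketch (hypothetical caller): per the comments above, the
+ * intended use is fusing a generated texenv fragment program with a user
+ * fragment program, routing program A's color output into program B's
+ * color input through a fresh temporary:
+ *
+ *    struct gl_program *combined =
+ *       _mesa_combine_programs(ctx, &texenvProg->Base, &userProg->Base);
+ *    if (combined) {
+ *       // InputsRead/OutputsWritten/SamplersUsed/Parameters are merged
+ *    }
+ */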
+
+
+/**
+ * Populate the 'used' array with flags indicating which registers (TEMPs,
+ * INPUTs, OUTPUTs, etc.) are used by the given program.
+ * \param file  type of register to scan for
+ * \param used  returns true/false flags for in use / free
+ * \param usedSize  size of the 'used' array
+ */
+void
+_mesa_find_used_registers(const struct gl_program *prog,
+                          gl_register_file file,
+                          GLboolean used[], GLuint usedSize)
+{
+   GLuint i, j;
+
+   memset(used, 0, usedSize);
+
+   for (i = 0; i < prog->NumInstructions; i++) {
+      const struct prog_instruction *inst = prog->Instructions + i;
+      const GLuint n = _mesa_num_inst_src_regs(inst->Opcode);
+
+      if (inst->DstReg.File == file) {
+         ASSERT(inst->DstReg.Index < usedSize);
+         if (inst->DstReg.Index < usedSize)
+            used[inst->DstReg.Index] = GL_TRUE;
+      }
+
+      for (j = 0; j < n; j++) {
+         if (inst->SrcReg[j].File == file) {
+            ASSERT(inst->SrcReg[j].Index < (GLint) usedSize);
+            if (inst->SrcReg[j].Index < (GLint) usedSize)
+               used[inst->SrcReg[j].Index] = GL_TRUE;
+         }
+      }
+   }
+}
+
+
+/**
+ * Scan the given 'used' register flag array for the first unused
+ * (GL_FALSE) entry at or beyond 'firstReg'.
+ * \param used  vector of flags indicating registers in use (as returned
+ *              by _mesa_find_used_registers())
+ * \param usedSize  size of the 'used' array
+ * \param firstReg  first register to start searching at
+ * \return index of unused register, or -1 if none.
+ */
+GLint
+_mesa_find_free_register(const GLboolean used[],
+                         GLuint usedSize, GLuint firstReg)
+{
+   GLuint i;
+
+   assert(firstReg < usedSize);
+
+   for (i = firstReg; i < usedSize; i++)
+      if (!used[i])
+         return i;
+
+   return -1;
+}
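+
+/* Illustrative sketch (not code from this file): the two helpers above
+ * pair up to allocate a scratch temporary, as _mesa_combine_programs()
+ * does:
+ *
+ *    GLboolean used[MAX_PROGRAM_TEMPS];
+ *    _mesa_find_used_registers(prog, PROGRAM_TEMPORARY,
+ *                              used, MAX_PROGRAM_TEMPS);
+ *    GLint temp = _mesa_find_free_register(used, MAX_PROGRAM_TEMPS, 0);
+ *    if (temp < 0) {
+ *       // every temporary register is already in use
+ *    }
+ */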
+
+
+
+/**
+ * Check if the given register index is valid (doesn't exceed implementation-
+ * dependent limits).
+ * \return GL_TRUE if OK, GL_FALSE if bad index
+ */
+GLboolean
+_mesa_valid_register_index(const struct gl_context *ctx,
+                           gl_shader_stage shaderType,
+                           gl_register_file file, GLint index)
+{
+   const struct gl_program_constants *c;
+
+   assert(0 <= shaderType && shaderType < MESA_SHADER_STAGES);
+   c = &ctx->Const.Program[shaderType];
+
+   switch (file) {
+   case PROGRAM_UNDEFINED:
+      return GL_TRUE;  /* XXX or maybe false? */
+
+   case PROGRAM_TEMPORARY:
+      return index >= 0 && index < (GLint) c->MaxTemps;
+
+   case PROGRAM_UNIFORM:
+   case PROGRAM_STATE_VAR:
+      /* aka constant buffer */
+      return index >= 0 && index < (GLint) c->MaxUniformComponents / 4;
+
+   case PROGRAM_CONSTANT:
+      /* constant buffer w/ possible relative negative addressing */
+      return (index > (int) c->MaxUniformComponents / -4 &&
+              index < (int) c->MaxUniformComponents / 4);
+
+   case PROGRAM_INPUT:
+      if (index < 0)
+         return GL_FALSE;
+
+      switch (shaderType) {
+      case MESA_SHADER_VERTEX:
+         return index < VERT_ATTRIB_GENERIC0 + (GLint) c->MaxAttribs;
+      case MESA_SHADER_FRAGMENT:
+         return index < VARYING_SLOT_VAR0 + (GLint) ctx->Const.MaxVarying;
+      case MESA_SHADER_GEOMETRY:
+         return index < VARYING_SLOT_VAR0 + (GLint) ctx->Const.MaxVarying;
+      default:
+         return GL_FALSE;
+      }
+
+   case PROGRAM_OUTPUT:
+      if (index < 0)
+         return GL_FALSE;
+
+      switch (shaderType) {
+      case MESA_SHADER_VERTEX:
+         return index < VARYING_SLOT_VAR0 + (GLint) ctx->Const.MaxVarying;
+      case MESA_SHADER_FRAGMENT:
+         return index < FRAG_RESULT_DATA0 + (GLint) ctx->Const.MaxDrawBuffers;
+      case MESA_SHADER_GEOMETRY:
+         return index < VARYING_SLOT_VAR0 + (GLint) ctx->Const.MaxVarying;
+      default:
+         return GL_FALSE;
+      }
+
+   case PROGRAM_ADDRESS:
+      return index >= 0 && index < (GLint) c->MaxAddressRegs;
+
+   default:
+      _mesa_problem(ctx,
+                    "unexpected register file in _mesa_valid_register_index()");
+      return GL_FALSE;
+   }
+}
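+
+/* Illustrative sketch (hypothetical caller): a parser could use this to
+ * range-check a register reference before emitting an instruction:
+ *
+ *    if (!_mesa_valid_register_index(ctx, MESA_SHADER_FRAGMENT,
+ *                                    PROGRAM_TEMPORARY, index)) {
+ *       // reject: temporary index exceeds ctx->Const limits
+ *    }
+ */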
+
+
+
+/**
+ * "Post-process" a GPU program.  This is intended to be used for debugging.
+ * Example actions include no-op'ing instructions or changing instruction
+ * behaviour.
+ */
+void
+_mesa_postprocess_program(struct gl_context *ctx, struct gl_program *prog)
+{
+   static const GLfloat white[4] = { 0.5, 0.5, 0.5, 0.5 };
+   GLuint i;
+   GLuint whiteSwizzle;
+   GLint whiteIndex = _mesa_add_unnamed_constant(prog->Parameters,
+                                                 (gl_constant_value *) white,
+                                                 4, &whiteSwizzle);
+
+   (void) whiteIndex;
+
+   for (i = 0; i < prog->NumInstructions; i++) {
+      struct prog_instruction *inst = prog->Instructions + i;
+      const GLuint n = _mesa_num_inst_src_regs(inst->Opcode);
+
+      (void) n;
+
+      if (_mesa_is_tex_instruction(inst->Opcode)) {
+#if 0
+         /* replace TEX/TXP/TXB with MOV */
+         inst->Opcode = OPCODE_MOV;
+         inst->DstReg.WriteMask = WRITEMASK_XYZW;
+         inst->SrcReg[0].Swizzle = SWIZZLE_XYZW;
+         inst->SrcReg[0].Negate = NEGATE_NONE;
+#endif
+
+#if 0
+         /* disable shadow texture mode */
+         inst->TexShadow = 0;
+#endif
+      }
+
+      if (inst->Opcode == OPCODE_TXP) {
+#if 0
+         inst->Opcode = OPCODE_MOV;
+         inst->DstReg.WriteMask = WRITEMASK_XYZW;
+         inst->SrcReg[0].File = PROGRAM_CONSTANT;
+         inst->SrcReg[0].Index = whiteIndex;
+         inst->SrcReg[0].Swizzle = SWIZZLE_XYZW;
+         inst->SrcReg[0].Negate = NEGATE_NONE;
+#endif
+#if 0
+         inst->TexShadow = 0;
+#endif
+#if 0
+         inst->Opcode = OPCODE_TEX;
+         inst->TexShadow = 0;
+#endif
+      }
+
+   }
+}
+
+/**
+ * Get the minimum number of shader invocations per fragment.
+ * This function is useful for determining whether we need to do
+ * per-sample shading or per-fragment shading.
+ */
+GLint
+_mesa_get_min_invocations_per_fragment(struct gl_context *ctx,
+                                       const struct gl_fragment_program *prog,
+                                       bool ignore_sample_qualifier)
+{
+   /* From ARB_sample_shading specification:
+    * "Using gl_SampleID in a fragment shader causes the entire shader
+    *  to be evaluated per-sample."
+    *
+    * "Using gl_SamplePosition in a fragment shader causes the entire
+    *  shader to be evaluated per-sample."
+    *
+    * "If MULTISAMPLE or SAMPLE_SHADING_ARB is disabled, sample shading
+    *  has no effect."
+    */
+   if (ctx->Multisample.Enabled) {
+       // LunarG REM:
+      /* /\* The ARB_gpu_shader5 specification says: */
+      /*  * */
+      /*  * "Use of the "sample" qualifier on a fragment shader input */
+      /*  *  forces per-sample shading" */
+      /*  *\/ */
+      /* if (prog->IsSample && !ignore_sample_qualifier) */
+      /*    return MAX2(ctx->DrawBuffer->Visual.samples, 1); */
+
+      /* if (prog->Base.SystemValuesRead & (SYSTEM_BIT_SAMPLE_ID | */
+      /*                                    SYSTEM_BIT_SAMPLE_POS)) */
+      /*    return MAX2(ctx->DrawBuffer->Visual.samples, 1); */
+      /* else if (ctx->Multisample.SampleShading) */
+      /*    return MAX2(ceil(ctx->Multisample.MinSampleShadingValue * */
+      /*                     ctx->DrawBuffer->Visual.samples), 1); */
+      /* else */
+         return 1;
+   }
+   return 1;
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/program.h b/icd/intel/compiler/mesa-utils/src/mesa/program/program.h
new file mode 100644
index 0000000..ef69824
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/program.h
@@ -0,0 +1,283 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file program.h
+ * Vertex and fragment program support functions.
+ * \author Brian Paul
+ */
+
+
+/**
+ * \mainpage Mesa vertex and fragment program module
+ *
+ * This module or directory contains most of the code for vertex and
+ * fragment programs and shaders, including state management, parsers,
+ * and (some) software routines for executing programs
+ */
+
+#ifndef PROGRAM_H
+#define PROGRAM_H
+
+#include "main/compiler.h"
+#include "main/mtypes.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+extern struct gl_program _mesa_DummyProgram;
+
+
+extern void
+_mesa_init_program(struct gl_context *ctx);
+
+extern void
+_mesa_free_program_data(struct gl_context *ctx);
+
+extern void
+_mesa_update_default_objects_program(struct gl_context *ctx);
+
+extern void
+_mesa_set_program_error(struct gl_context *ctx, GLint pos, const char *string);
+
+extern const GLubyte *
+_mesa_find_line_column(const GLubyte *string, const GLubyte *pos,
+                       GLint *line, GLint *col);
+
+
+extern struct gl_program *
+_mesa_init_vertex_program(struct gl_context *ctx,
+                          struct gl_vertex_program *prog,
+                          GLenum target, GLuint id);
+
+extern struct gl_program *
+_mesa_init_fragment_program(struct gl_context *ctx,
+                            struct gl_fragment_program *prog,
+                            GLenum target, GLuint id);
+
+extern struct gl_program *
+_mesa_init_geometry_program(struct gl_context *ctx,
+                            struct gl_geometry_program *prog,
+                            GLenum target, GLuint id);
+
+extern struct gl_program *
+_mesa_init_compute_program(struct gl_context *ctx,
+                           struct gl_compute_program *prog,
+                           GLenum target, GLuint id);
+
+extern struct gl_program *
+_mesa_new_program(struct gl_context *ctx, GLenum target, GLuint id);
+
+extern void
+_mesa_delete_program(struct gl_context *ctx, struct gl_program *prog);
+
+extern struct gl_program *
+_mesa_lookup_program(struct gl_context *ctx, GLuint id);
+
+extern void
+_mesa_reference_program_(struct gl_context *ctx,
+                         struct gl_program **ptr,
+                         struct gl_program *prog);
+
+static inline void
+_mesa_reference_program(struct gl_context *ctx,
+                        struct gl_program **ptr,
+                        struct gl_program *prog)
+{
+   if (*ptr != prog)
+      _mesa_reference_program_(ctx, ptr, prog);
+}
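+
+/* Illustrative sketch (hypothetical local, not code from this file):
+ * these helpers implement pointer-swap reference counting; assigning
+ * through them releases the old program and retains the new one, and
+ * assigning NULL just drops the reference:
+ *
+ *    struct gl_program *cur = NULL;
+ *    _mesa_reference_program(ctx, &cur, prog);   // cur retains prog
+ *    _mesa_reference_program(ctx, &cur, NULL);   // reference released
+ */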
+
+static inline void
+_mesa_reference_vertprog(struct gl_context *ctx,
+                         struct gl_vertex_program **ptr,
+                         struct gl_vertex_program *prog)
+{
+   _mesa_reference_program(ctx, (struct gl_program **) ptr,
+                           (struct gl_program *) prog);
+}
+
+static inline void
+_mesa_reference_fragprog(struct gl_context *ctx,
+                         struct gl_fragment_program **ptr,
+                         struct gl_fragment_program *prog)
+{
+   _mesa_reference_program(ctx, (struct gl_program **) ptr,
+                           (struct gl_program *) prog);
+}
+
+static inline void
+_mesa_reference_geomprog(struct gl_context *ctx,
+                         struct gl_geometry_program **ptr,
+                         struct gl_geometry_program *prog)
+{
+   _mesa_reference_program(ctx, (struct gl_program **) ptr,
+                           (struct gl_program *) prog);
+}
+
+extern struct gl_program *
+_mesa_clone_program(struct gl_context *ctx, const struct gl_program *prog);
+
+static inline struct gl_vertex_program *
+_mesa_clone_vertex_program(struct gl_context *ctx,
+                           const struct gl_vertex_program *prog)
+{
+   return (struct gl_vertex_program *) _mesa_clone_program(ctx, &prog->Base);
+}
+
+static inline struct gl_geometry_program *
+_mesa_clone_geometry_program(struct gl_context *ctx,
+                             const struct gl_geometry_program *prog)
+{
+   return (struct gl_geometry_program *) _mesa_clone_program(ctx, &prog->Base);
+}
+
+static inline struct gl_fragment_program *
+_mesa_clone_fragment_program(struct gl_context *ctx,
+                             const struct gl_fragment_program *prog)
+{
+   return (struct gl_fragment_program *) _mesa_clone_program(ctx, &prog->Base);
+}
+
+
+extern  GLboolean
+_mesa_insert_instructions(struct gl_program *prog, GLuint start, GLuint count);
+
+extern  GLboolean
+_mesa_delete_instructions(struct gl_program *prog, GLuint start, GLuint count);
+
+extern struct gl_program *
+_mesa_combine_programs(struct gl_context *ctx,
+                       const struct gl_program *progA,
+                       const struct gl_program *progB);
+
+extern void
+_mesa_find_used_registers(const struct gl_program *prog,
+                          gl_register_file file,
+                          GLboolean used[], GLuint usedSize);
+
+extern GLint
+_mesa_find_free_register(const GLboolean used[],
+                         GLuint maxRegs, GLuint firstReg);
+
+
+extern GLboolean
+_mesa_valid_register_index(const struct gl_context *ctx,
+                           gl_shader_stage shaderType,
+                           gl_register_file file, GLint index);
+
+extern void
+_mesa_postprocess_program(struct gl_context *ctx, struct gl_program *prog);
+
+extern GLint
+_mesa_get_min_invocations_per_fragment(struct gl_context *ctx,
+                                       const struct gl_fragment_program *prog,
+                                       bool ignore_sample_qualifier);
+
+static inline GLuint
+_mesa_program_enum_to_shader_stage(GLenum v)
+{
+   switch (v) {
+   case GL_VERTEX_PROGRAM_ARB:
+      return MESA_SHADER_VERTEX;
+   case GL_FRAGMENT_PROGRAM_ARB:
+      return MESA_SHADER_FRAGMENT;
+   case GL_GEOMETRY_PROGRAM_NV:
+      return MESA_SHADER_GEOMETRY;
+   case GL_COMPUTE_PROGRAM_NV:
+      return MESA_SHADER_COMPUTE;
+   default:
+      ASSERT(0);
+      return ~0;
+   }
+}
+
+
+static inline GLenum
+_mesa_shader_stage_to_program(unsigned stage)
+{
+   switch (stage) {
+   case MESA_SHADER_VERTEX:
+      return GL_VERTEX_PROGRAM_ARB;
+   case MESA_SHADER_FRAGMENT:
+      return GL_FRAGMENT_PROGRAM_ARB;
+   case MESA_SHADER_GEOMETRY:
+      return GL_GEOMETRY_PROGRAM_NV;
+   case MESA_SHADER_COMPUTE:
+      return GL_COMPUTE_PROGRAM_NV;
+   }
+
+   assert(!"Unexpected shader stage in _mesa_shader_stage_to_program");
+   return GL_VERTEX_PROGRAM_ARB;
+}
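+
+/* Illustrative note (not code from this file): the two helpers above are
+ * inverses over the stages they both handle:
+ *
+ *    assert(_mesa_program_enum_to_shader_stage(
+ *              _mesa_shader_stage_to_program(MESA_SHADER_FRAGMENT))
+ *           == MESA_SHADER_FRAGMENT);
+ */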
+
+
+/* Cast wrappers from gl_program to gl_vertex/geometry/fragment_program */
+
+static inline struct gl_fragment_program *
+gl_fragment_program(struct gl_program *prog)
+{
+   return (struct gl_fragment_program *) prog;
+}
+
+static inline const struct gl_fragment_program *
+gl_fragment_program_const(const struct gl_program *prog)
+{
+   return (const struct gl_fragment_program *) prog;
+}
+
+
+static inline struct gl_vertex_program *
+gl_vertex_program(struct gl_program *prog)
+{
+   return (struct gl_vertex_program *) prog;
+}
+
+static inline const struct gl_vertex_program *
+gl_vertex_program_const(const struct gl_program *prog)
+{
+   return (const struct gl_vertex_program *) prog;
+}
+
+
+static inline struct gl_geometry_program *
+gl_geometry_program(struct gl_program *prog)
+{
+   return (struct gl_geometry_program *) prog;
+}
+
+static inline const struct gl_geometry_program *
+gl_geometry_program_const(const struct gl_program *prog)
+{
+   return (const struct gl_geometry_program *) prog;
+}
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif /* PROGRAM_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/program_lexer.l b/icd/intel/compiler/mesa-utils/src/mesa/program/program_lexer.l
new file mode 100644
index 0000000..d5dbcf3
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/program_lexer.l
@@ -0,0 +1,508 @@
+%{
+/*
+ * Copyright © 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include "main/glheader.h"
+#include "main/imports.h"
+#include "program/prog_instruction.h"
+#include "program/prog_statevars.h"
+#include "program/symbol_table.h"
+#include "program/program_parser.h"
+#include "program/program_parse.tab.h"
+
+#define require_ARB_vp (yyextra->mode == ARB_vertex)
+#define require_ARB_fp (yyextra->mode == ARB_fragment)
+#define require_NV_fp  (yyextra->option.NV_fragment)
+#define require_shadow (yyextra->option.Shadow)
+#define require_rect   (yyextra->option.TexRect)
+#define require_texarray        (yyextra->option.TexArray)
+
+#ifndef HAVE_UNISTD_H
+#define YY_NO_UNISTD_H
+#endif
+
+#define return_token_or_IDENTIFIER(condition, token)	\
+   do {							\
+      if (condition) {					\
+	 return token;					\
+      } else {						\
+	 return handle_ident(yyextra, yytext, yylval);	\
+      }							\
+   } while (0)
+
+#define return_token_or_DOT(condition, token)		\
+   do {							\
+      if (condition) {					\
+	 return token;					\
+      } else {						\
+	 yyless(1);					\
+	 return DOT;					\
+      }							\
+   } while (0)
+
+
+#define return_opcode(condition, token, opcode, len)	\
+   do {							\
+      if (condition &&					\
+	  _mesa_parse_instruction_suffix(yyextra,	\
+					 yytext + len,	\
+					 & yylval->temp_inst)) {	\
+	 yylval->temp_inst.Opcode = OPCODE_ ## opcode;	\
+	 return token;					\
+      } else {						\
+	 return handle_ident(yyextra, yytext, yylval);	\
+      }							\
+   } while (0)
+
+#define SWIZZLE_INVAL  MAKE_SWIZZLE4(SWIZZLE_NIL, SWIZZLE_NIL, \
+				     SWIZZLE_NIL, SWIZZLE_NIL)
+
+static unsigned
+mask_from_char(char c)
+{
+   switch (c) {
+   case 'x':
+   case 'r':
+      return WRITEMASK_X;
+   case 'y':
+   case 'g':
+      return WRITEMASK_Y;
+   case 'z':
+   case 'b':
+      return WRITEMASK_Z;
+   case 'w':
+   case 'a':
+      return WRITEMASK_W;
+   }
+
+   return 0;
+}
+
+static unsigned
+swiz_from_char(char c)
+{
+   switch (c) {
+   case 'x':
+   case 'r':
+      return SWIZZLE_X;
+   case 'y':
+   case 'g':
+      return SWIZZLE_Y;
+   case 'z':
+   case 'b':
+      return SWIZZLE_Z;
+   case 'w':
+   case 'a':
+      return SWIZZLE_W;
+   }
+
+   return 0;
+}
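+
+/* Illustrative sketch (not code from this file): the rules below combine
+ * these helpers to pack a four-character suffix such as ".wzyx" into a
+ * swizzle, one component per character:
+ *
+ *    // given yytext == ".wzyx"
+ *    unsigned swz = MAKE_SWIZZLE4(swiz_from_char(yytext[1]),
+ *                                 swiz_from_char(yytext[2]),
+ *                                 swiz_from_char(yytext[3]),
+ *                                 swiz_from_char(yytext[4]));
+ */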
+
+static int
+handle_ident(struct asm_parser_state *state, const char *text, YYSTYPE *lval)
+{
+   lval->string = strdup(text);
+
+   return (_mesa_symbol_table_find_symbol(state->st, 0, text) == NULL)
+      ? IDENTIFIER : USED_IDENTIFIER;
+}
+
+#define YY_USER_ACTION							\
+   do {									\
+      yylloc->first_column = yylloc->last_column;			\
+      yylloc->last_column += yyleng;					\
+      if ((yylloc->first_line == 1)					\
+	  && (yylloc->first_column == 1)) {				\
+	 yylloc->position = 1;						\
+      } else {								\
+	 yylloc->position += yylloc->last_column - yylloc->first_column; \
+      }									\
+   } while(0);
+
+#define YY_NO_INPUT
+
+/* Yes, this is intentionally doing nothing. We have this line of code
+here only to avoid the compiler complaining about an unput function
+that is defined, but never called. */
+#define YY_USER_INIT while (0) { unput(0); }
+
+#define YY_EXTRA_TYPE struct asm_parser_state *
+
+/* Flex defines a couple of functions without declarations or the
+static keyword. Declare them here to avoid compiler warnings. */
+int yyget_column  (yyscan_t yyscanner);
+void yyset_column (int  column_no , yyscan_t yyscanner);
+
+%}
+
+num    [0-9]+
+exp    [Ee][-+]?[0-9]+
+frac   "."[0-9]+
+dot    "."[ \t]*
+
+sz     [HRX]?
+szf    [HR]?
+cc     C?
+sat    (_SAT)?
+
+%option prefix="_mesa_program_lexer_"
+%option bison-bridge bison-locations reentrant noyywrap
+%%
+
+"!!ARBvp1.0"              { return ARBvp_10; }
+"!!ARBfp1.0"              { return ARBfp_10; }
+ADDRESS                   {
+   yylval->integer = at_address;
+   return_token_or_IDENTIFIER(require_ARB_vp, ADDRESS);
+}
+ALIAS                     { return ALIAS; }
+ATTRIB                    { return ATTRIB; }
+END                       { return END; }
+OPTION                    { return OPTION; }
+OUTPUT                    { return OUTPUT; }
+PARAM                     { return PARAM; }
+TEMP                      { yylval->integer = at_temp; return TEMP; }
+
+ABS{sz}{cc}{sat}   { return_opcode(             1, VECTOR_OP, ABS, 3); }
+ADD{sz}{cc}{sat}   { return_opcode(             1, BIN_OP, ADD, 3); }
+ARL                { return_opcode(require_ARB_vp, ARL, ARL, 3); }
+
+CMP{sat}           { return_opcode(require_ARB_fp, TRI_OP, CMP, 3); }
+COS{szf}{cc}{sat}  { return_opcode(require_ARB_fp, SCALAR_OP, COS, 3); }
+
+DDX{szf}{cc}{sat}  { return_opcode(require_NV_fp,  VECTOR_OP, DDX, 3); }
+DDY{szf}{cc}{sat}  { return_opcode(require_NV_fp,  VECTOR_OP, DDY, 3); }
+DP3{sz}{cc}{sat}   { return_opcode(             1, BIN_OP, DP3, 3); }
+DP4{sz}{cc}{sat}   { return_opcode(             1, BIN_OP, DP4, 3); }
+DPH{sz}{cc}{sat}   { return_opcode(             1, BIN_OP, DPH, 3); }
+DST{szf}{cc}{sat}  { return_opcode(             1, BIN_OP, DST, 3); }
+
+EX2{szf}{cc}{sat}  { return_opcode(             1, SCALAR_OP, EX2, 3); }
+EXP                { return_opcode(require_ARB_vp, SCALAR_OP, EXP, 3); }
+
+FLR{sz}{cc}{sat}   { return_opcode(             1, VECTOR_OP, FLR, 3); }
+FRC{sz}{cc}{sat}   { return_opcode(             1, VECTOR_OP, FRC, 3); }
+
+KIL                { return_opcode(require_ARB_fp, KIL, KIL, 3); }
+
+LIT{szf}{cc}{sat}  { return_opcode(             1, VECTOR_OP, LIT, 3); }
+LG2{szf}{cc}{sat}  { return_opcode(             1, SCALAR_OP, LG2, 3); }
+LOG                { return_opcode(require_ARB_vp, SCALAR_OP, LOG, 3); }
+LRP{sz}{cc}{sat}   { return_opcode(require_ARB_fp, TRI_OP, LRP, 3); }
+
+MAD{sz}{cc}{sat}   { return_opcode(             1, TRI_OP, MAD, 3); }
+MAX{sz}{cc}{sat}   { return_opcode(             1, BIN_OP, MAX, 3); }
+MIN{sz}{cc}{sat}   { return_opcode(             1, BIN_OP, MIN, 3); }
+MOV{sz}{cc}{sat}   { return_opcode(             1, VECTOR_OP, MOV, 3); }
+MUL{sz}{cc}{sat}   { return_opcode(             1, BIN_OP, MUL, 3); }
+
+PK2H               { return_opcode(require_NV_fp,  VECTOR_OP, PK2H, 4); }
+PK2US              { return_opcode(require_NV_fp,  VECTOR_OP, PK2US, 5); }
+PK4B               { return_opcode(require_NV_fp,  VECTOR_OP, PK4B, 4); }
+PK4UB              { return_opcode(require_NV_fp,  VECTOR_OP, PK4UB, 5); }
+POW{szf}{cc}{sat}  { return_opcode(             1, BINSC_OP, POW, 3); }
+
+RCP{szf}{cc}{sat}  { return_opcode(             1, SCALAR_OP, RCP, 3); }
+RFL{szf}{cc}{sat}  { return_opcode(require_NV_fp,  BIN_OP,    RFL, 3); }
+RSQ{szf}{cc}{sat}  { return_opcode(             1, SCALAR_OP, RSQ, 3); }
+
+SCS{sat}           { return_opcode(require_ARB_fp, SCALAR_OP, SCS, 3); }
+SEQ{sz}{cc}{sat}   { return_opcode(require_NV_fp,  BIN_OP, SEQ, 3); }
+SFL{sz}{cc}{sat}   { return_opcode(require_NV_fp,  BIN_OP, SFL, 3); }
+SGE{sz}{cc}{sat}   { return_opcode(             1, BIN_OP, SGE, 3); }
+SGT{sz}{cc}{sat}   { return_opcode(require_NV_fp,  BIN_OP, SGT, 3); }
+SIN{szf}{cc}{sat}  { return_opcode(require_ARB_fp, SCALAR_OP, SIN, 3); }
+SLE{sz}{cc}{sat}   { return_opcode(require_NV_fp,  BIN_OP, SLE, 3); }
+SLT{sz}{cc}{sat}   { return_opcode(             1, BIN_OP, SLT, 3); }
+SNE{sz}{cc}{sat}   { return_opcode(require_NV_fp,  BIN_OP, SNE, 3); }
+STR{sz}{cc}{sat}   { return_opcode(require_NV_fp,  BIN_OP, STR, 3); }
+SUB{sz}{cc}{sat}   { return_opcode(             1, BIN_OP, SUB, 3); }
+SWZ{sat}           { return_opcode(             1, SWZ, SWZ, 3); }
+
+TEX{cc}{sat}       { return_opcode(require_ARB_fp, SAMPLE_OP, TEX, 3); }
+TXB{cc}{sat}       { return_opcode(require_ARB_fp, SAMPLE_OP, TXB, 3); }
+TXD{cc}{sat}       { return_opcode(require_NV_fp,  TXD_OP, TXD, 3); }
+TXP{cc}{sat}       { return_opcode(require_ARB_fp, SAMPLE_OP, TXP, 3); }
+
+UP2H{cc}{sat}      { return_opcode(require_NV_fp,  SCALAR_OP, UP2H, 4); }
+UP2US{cc}{sat}     { return_opcode(require_NV_fp,  SCALAR_OP, UP2US, 5); }
+UP4B{cc}{sat}      { return_opcode(require_NV_fp,  SCALAR_OP, UP4B, 4); }
+UP4UB{cc}{sat}     { return_opcode(require_NV_fp,  SCALAR_OP, UP4UB, 5); }
+
+X2D{szf}{cc}{sat}  { return_opcode(require_NV_fp,  TRI_OP, X2D, 3); }
+XPD{sat}           { return_opcode(             1, BIN_OP, XPD, 3); }
+
+vertex                    { return_token_or_IDENTIFIER(require_ARB_vp, VERTEX); }
+fragment                  { return_token_or_IDENTIFIER(require_ARB_fp, FRAGMENT); }
+program                   { return PROGRAM; }
+state                     { return STATE; }
+result                    { return RESULT; }
+
+{dot}ambient              { return AMBIENT; }
+{dot}attenuation          { return ATTENUATION; }
+{dot}back                 { return BACK; }
+{dot}clip                 { return_token_or_DOT(require_ARB_vp, CLIP); }
+{dot}color                { return COLOR; }
+{dot}depth                { return_token_or_DOT(require_ARB_fp, DEPTH); }
+{dot}diffuse              { return DIFFUSE; }
+{dot}direction            { return DIRECTION; }
+{dot}emission             { return EMISSION; }
+{dot}env                  { return ENV; }
+{dot}eye                  { return EYE; }
+{dot}fogcoord             { return FOGCOORD; }
+{dot}fog                  { return FOG; }
+{dot}front                { return FRONT; }
+{dot}half                 { return HALF; }
+{dot}inverse              { return INVERSE; }
+{dot}invtrans             { return INVTRANS; }
+{dot}light                { return LIGHT; }
+{dot}lightmodel           { return LIGHTMODEL; }
+{dot}lightprod            { return LIGHTPROD; }
+{dot}local                { return LOCAL; }
+{dot}material             { return MATERIAL; }
+{dot}program              { return MAT_PROGRAM; }
+{dot}matrix               { return MATRIX; }
+{dot}matrixindex          { return_token_or_DOT(require_ARB_vp, MATRIXINDEX); }
+{dot}modelview            { return MODELVIEW; }
+{dot}mvp                  { return MVP; }
+{dot}normal               { return_token_or_DOT(require_ARB_vp, NORMAL); }
+{dot}object               { return OBJECT; }
+{dot}palette              { return PALETTE; }
+{dot}params               { return PARAMS; }
+{dot}plane                { return PLANE; }
+{dot}point                { return_token_or_DOT(require_ARB_vp, POINT_TOK); }
+{dot}pointsize            { return_token_or_DOT(require_ARB_vp, POINTSIZE); }
+{dot}position             { return POSITION; }
+{dot}primary              { return PRIMARY; }
+{dot}projection           { return PROJECTION; }
+{dot}range                { return_token_or_DOT(require_ARB_fp, RANGE); }
+{dot}row                  { return ROW; }
+{dot}scenecolor           { return SCENECOLOR; }
+{dot}secondary            { return SECONDARY; }
+{dot}shininess            { return SHININESS; }
+{dot}size                 { return_token_or_DOT(require_ARB_vp, SIZE_TOK); }
+{dot}specular             { return SPECULAR; }
+{dot}spot                 { return SPOT; }
+{dot}texcoord             { return TEXCOORD; }
+{dot}texenv               { return_token_or_DOT(require_ARB_fp, TEXENV); }
+{dot}texgen               { return_token_or_DOT(require_ARB_vp, TEXGEN); }
+{dot}q                    { return_token_or_DOT(require_ARB_vp, TEXGEN_Q); }
+{dot}s                    { return_token_or_DOT(require_ARB_vp, TEXGEN_S); }
+{dot}t                    { return_token_or_DOT(require_ARB_vp, TEXGEN_T); }
+{dot}texture              { return TEXTURE; }
+{dot}transpose            { return TRANSPOSE; }
+{dot}attrib               { return_token_or_DOT(require_ARB_vp, VTXATTRIB); }
+{dot}weight               { return_token_or_DOT(require_ARB_vp, WEIGHT); }
+
+texture                   { return_token_or_IDENTIFIER(require_ARB_fp, TEXTURE_UNIT); }
+1D                        { return_token_or_IDENTIFIER(require_ARB_fp, TEX_1D); }
+2D                        { return_token_or_IDENTIFIER(require_ARB_fp, TEX_2D); }
+3D                        { return_token_or_IDENTIFIER(require_ARB_fp, TEX_3D); }
+CUBE                      { return_token_or_IDENTIFIER(require_ARB_fp, TEX_CUBE); }
+RECT                      { return_token_or_IDENTIFIER(require_ARB_fp && require_rect, TEX_RECT); }
+SHADOW1D                  { return_token_or_IDENTIFIER(require_ARB_fp && require_shadow, TEX_SHADOW1D); }
+SHADOW2D                  { return_token_or_IDENTIFIER(require_ARB_fp && require_shadow, TEX_SHADOW2D); }
+SHADOWRECT                { return_token_or_IDENTIFIER(require_ARB_fp && require_shadow && require_rect, TEX_SHADOWRECT); }
+ARRAY1D                   { return_token_or_IDENTIFIER(require_ARB_fp && require_texarray, TEX_ARRAY1D); }
+ARRAY2D                   { return_token_or_IDENTIFIER(require_ARB_fp && require_texarray, TEX_ARRAY2D); }
+ARRAYSHADOW1D             { return_token_or_IDENTIFIER(require_ARB_fp && require_shadow && require_texarray, TEX_ARRAYSHADOW1D); }
+ARRAYSHADOW2D             { return_token_or_IDENTIFIER(require_ARB_fp && require_shadow && require_texarray, TEX_ARRAYSHADOW2D); }
+
+[_a-zA-Z$][_a-zA-Z0-9$]*  { return handle_ident(yyextra, yytext, yylval); }
+
+".."                      { return DOT_DOT; }
+
+{num}                     {
+   yylval->integer = strtol(yytext, NULL, 10);
+   return INTEGER;
+}
+{num}?{frac}{exp}?        {
+   yylval->real = _mesa_strtof(yytext, NULL);
+   return REAL;
+}
+{num}"."/[^.]             {
+   yylval->real = _mesa_strtof(yytext, NULL);
+   return REAL;
+}
+{num}{exp}                {
+   yylval->real = _mesa_strtof(yytext, NULL);
+   return REAL;
+}
+{num}"."{exp}             {
+   yylval->real = _mesa_strtof(yytext, NULL);
+   return REAL;
+}
+
+".xyzw"                   {
+   yylval->swiz_mask.swizzle = SWIZZLE_NOOP;
+   yylval->swiz_mask.mask = WRITEMASK_XYZW;
+   return MASK4;
+}
+
+".xy"[zw]                 {
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_XY
+      | mask_from_char(yytext[3]);
+   return MASK3;
+}
+".xzw"                    {
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_XZW;
+   return MASK3;
+}
+".yzw"                    {
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_YZW;
+   return MASK3;
+}
+
+".x"[yzw]                 {
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_X
+      | mask_from_char(yytext[2]);
+   return MASK2;
+}
+".y"[zw]                  {
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_Y
+      | mask_from_char(yytext[2]);
+   return MASK2;
+}
+".zw"                     {
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_ZW;
+   return MASK2;
+}
+
+"."[xyzw]                 {
+   const unsigned s = swiz_from_char(yytext[1]);
+   yylval->swiz_mask.swizzle = MAKE_SWIZZLE4(s, s, s, s);
+   yylval->swiz_mask.mask = mask_from_char(yytext[1]);
+   return MASK1; 
+}
+
+"."[xyzw]{4}              {
+   yylval->swiz_mask.swizzle = MAKE_SWIZZLE4(swiz_from_char(yytext[1]),
+					    swiz_from_char(yytext[2]),
+					    swiz_from_char(yytext[3]),
+					    swiz_from_char(yytext[4]));
+   yylval->swiz_mask.mask = 0;
+   return SWIZZLE;
+}
+
+".rgba"                   {
+   yylval->swiz_mask.swizzle = SWIZZLE_NOOP;
+   yylval->swiz_mask.mask = WRITEMASK_XYZW;
+   return_token_or_DOT(require_ARB_fp, MASK4);
+}
+
+".rg"[ba]                 {
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_XY
+      | mask_from_char(yytext[3]);
+   return_token_or_DOT(require_ARB_fp, MASK3);
+}
+".rba"                    {
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_XZW;
+   return_token_or_DOT(require_ARB_fp, MASK3);
+}
+".gba"                    {
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_YZW;
+   return_token_or_DOT(require_ARB_fp, MASK3);
+}
+
+".r"[gba]                 {
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_X
+      | mask_from_char(yytext[2]);
+   return_token_or_DOT(require_ARB_fp, MASK2);
+}
+".g"[ba]                  {
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_Y
+      | mask_from_char(yytext[2]);
+   return_token_or_DOT(require_ARB_fp, MASK2);
+}
+".ba"                     {
+   yylval->swiz_mask.swizzle = SWIZZLE_INVAL;
+   yylval->swiz_mask.mask = WRITEMASK_ZW;
+   return_token_or_DOT(require_ARB_fp, MASK2);
+}
+
+"."[gba]                  {
+   const unsigned s = swiz_from_char(yytext[1]);
+   yylval->swiz_mask.swizzle = MAKE_SWIZZLE4(s, s, s, s);
+   yylval->swiz_mask.mask = mask_from_char(yytext[1]);
+   return_token_or_DOT(require_ARB_fp, MASK1);
+}
+
+
+".r"                      {
+   if (require_ARB_vp) {
+      return TEXGEN_R;
+   } else {
+      yylval->swiz_mask.swizzle = MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_X,
+						SWIZZLE_X, SWIZZLE_X);
+      yylval->swiz_mask.mask = WRITEMASK_X;
+      return MASK1;
+   }
+}
+
+"."[rgba]{4}              {
+   yylval->swiz_mask.swizzle = MAKE_SWIZZLE4(swiz_from_char(yytext[1]),
+					    swiz_from_char(yytext[2]),
+					    swiz_from_char(yytext[3]),
+					    swiz_from_char(yytext[4]));
+   yylval->swiz_mask.mask = 0;
+   return_token_or_DOT(require_ARB_fp, SWIZZLE);
+}
+
+"."                       { return DOT; }
+
+\n                        {
+   yylloc->first_line++;
+   yylloc->first_column = 1;
+   yylloc->last_line++;
+   yylloc->last_column = 1;
+   yylloc->position++;
+}
+[ \t\r]+                  /* eat whitespace */ ;
+#.*$                      /* eat comments */ ;
+.                         { return yytext[0]; }
+%%
+
+void
+_mesa_program_lexer_ctor(void **scanner, struct asm_parser_state *state,
+			 const char *string, size_t len)
+{
+   yylex_init_extra(state, scanner);
+   yy_scan_bytes(string, len, *scanner);
+}
+
+void
+_mesa_program_lexer_dtor(void *scanner)
+{
+   yylex_destroy(scanner);
+}
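+
+/* Illustrative sketch (hypothetical caller): the parser owns the scanner
+ * lifecycle through the ctor/dtor pair above, roughly:
+ *
+ *    struct asm_parser_state state;   // zero-initialized elsewhere
+ *    _mesa_program_lexer_ctor(&state.scanner, &state, source, len);
+ *    _mesa_program_parse(&state);
+ *    _mesa_program_lexer_dtor(state.scanner);
+ */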
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/program_parse.tab.c b/icd/intel/compiler/mesa-utils/src/mesa/program/program_parse.tab.c
new file mode 100644
index 0000000..74979fe
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/program_parse.tab.c
@@ -0,0 +1,5674 @@
+/* A Bison parser, made by GNU Bison 2.7.12-4996.  */
+
+/* Bison implementation for Yacc-like parsers in C
+   
+      Copyright (C) 1984, 1989-1990, 2000-2013 Free Software Foundation, Inc.
+   
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+   
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+   
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+/* As a special exception, you may create a larger work that contains
+   part or all of the Bison parser skeleton and distribute that work
+   under terms of your choice, so long as that work isn't itself a
+   parser generator using the skeleton or a modified version thereof
+   as a parser skeleton.  Alternatively, if you modify or redistribute
+   the parser skeleton itself, you may (at your option) remove this
+   special exception, which will cause the skeleton and the resulting
+   Bison output files to be licensed under the GNU General Public
+   License without this special exception.
+   
+   This special exception was added by the Free Software Foundation in
+   version 2.2 of Bison.  */
+
+/* C LALR(1) parser skeleton written by Richard Stallman, by
+   simplifying the original so-called "semantic" parser.  */
+
+/* All symbols defined below should begin with yy or YY, to avoid
+   infringing on user name space.  This should be done even for local
+   variables, as they might otherwise be expanded by user macros.
+   There are some unavoidable exceptions within include files to
+   define necessary library symbols; they are noted "INFRINGES ON
+   USER NAME SPACE" below.  */
+
+/* Identify Bison output.  */
+#define YYBISON 1
+
+/* Bison version.  */
+#define YYBISON_VERSION "2.7.12-4996"
+
+/* Skeleton name.  */
+#define YYSKELETON_NAME "yacc.c"
+
+/* Pure parsers.  */
+#define YYPURE 1
+
+/* Push parsers.  */
+#define YYPUSH 0
+
+/* Pull parsers.  */
+#define YYPULL 1
+
+
+/* Substitute the variable and function names.  */
+#define yyparse         _mesa_program_parse
+#define yylex           _mesa_program_lex
+#define yyerror         _mesa_program_error
+#define yylval          _mesa_program_lval
+#define yychar          _mesa_program_char
+#define yydebug         _mesa_program_debug
+#define yynerrs         _mesa_program_nerrs
+#define yylloc          _mesa_program_lloc
+
+/* Copy the first part of user declarations.  */
+/* Line 371 of yacc.c  */
+#line 1 "program/program_parse.y"
+
+/*
+ * Copyright © 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "main/mtypes.h"
+#include "main/imports.h"
+#include "program/program.h"
+#include "program/prog_parameter.h"
+#include "program/prog_parameter_layout.h"
+#include "program/prog_statevars.h"
+#include "program/prog_instruction.h"
+
+#include "program/symbol_table.h"
+#include "program/program_parser.h"
+
+extern void *yy_scan_string(char *);
+extern void yy_delete_buffer(void *);
+
+static struct asm_symbol *declare_variable(struct asm_parser_state *state,
+    char *name, enum asm_type t, struct YYLTYPE *locp);
+
+static int add_state_reference(struct gl_program_parameter_list *param_list,
+    const gl_state_index tokens[STATE_LENGTH]);
+
+static int initialize_symbol_from_state(struct gl_program *prog,
+    struct asm_symbol *param_var, const gl_state_index tokens[STATE_LENGTH]);
+
+static int initialize_symbol_from_param(struct gl_program *prog,
+    struct asm_symbol *param_var, const gl_state_index tokens[STATE_LENGTH]);
+
+static int initialize_symbol_from_const(struct gl_program *prog,
+    struct asm_symbol *param_var, const struct asm_vector *vec,
+    GLboolean allowSwizzle);
+
+static int yyparse(struct asm_parser_state *state);
+
+static char *make_error_string(const char *fmt, ...);
+
+static void yyerror(struct YYLTYPE *locp, struct asm_parser_state *state,
+    const char *s);
+
+static int validate_inputs(struct YYLTYPE *locp,
+    struct asm_parser_state *state);
+
+static void init_dst_reg(struct prog_dst_register *r);
+
+static void set_dst_reg(struct prog_dst_register *r,
+                        gl_register_file file, GLint index);
+
+static void init_src_reg(struct asm_src_register *r);
+
+static void set_src_reg(struct asm_src_register *r,
+                        gl_register_file file, GLint index);
+
+static void set_src_reg_swz(struct asm_src_register *r,
+                            gl_register_file file, GLint index, GLuint swizzle);
+
+static void asm_instruction_set_operands(struct asm_instruction *inst,
+    const struct prog_dst_register *dst, const struct asm_src_register *src0,
+    const struct asm_src_register *src1, const struct asm_src_register *src2);
+
+static struct asm_instruction *asm_instruction_ctor(gl_inst_opcode op,
+    const struct prog_dst_register *dst, const struct asm_src_register *src0,
+    const struct asm_src_register *src1, const struct asm_src_register *src2);
+
+static struct asm_instruction *asm_instruction_copy_ctor(
+    const struct prog_instruction *base, const struct prog_dst_register *dst,
+    const struct asm_src_register *src0, const struct asm_src_register *src1,
+    const struct asm_src_register *src2);
+
+#ifndef FALSE
+#define FALSE 0
+#define TRUE (!FALSE)
+#endif
+
+#define YYLLOC_DEFAULT(Current, Rhs, N)					\
+   do {									\
+      if (N) {							\
+	 (Current).first_line = YYRHSLOC(Rhs, 1).first_line;		\
+	 (Current).first_column = YYRHSLOC(Rhs, 1).first_column;	\
+	 (Current).position = YYRHSLOC(Rhs, 1).position;		\
+	 (Current).last_line = YYRHSLOC(Rhs, N).last_line;		\
+	 (Current).last_column = YYRHSLOC(Rhs, N).last_column;		\
+      } else {								\
+	 (Current).first_line = YYRHSLOC(Rhs, 0).last_line;		\
+	 (Current).last_line = (Current).first_line;			\
+	 (Current).first_column = YYRHSLOC(Rhs, 0).last_column;		\
+	 (Current).last_column = (Current).first_column;		\
+	 (Current).position = YYRHSLOC(Rhs, 0).position			\
+	    + (Current).first_column;					\
+      }									\
+   } while(0)
+
+/* Line 371 of yacc.c  */
+#line 193 "./program/program_parse.tab.c"
+
+# ifndef YY_NULL
+#  if defined __cplusplus && 201103L <= __cplusplus
+#   define YY_NULL nullptr
+#  else
+#   define YY_NULL 0
+#  endif
+# endif
+
+/* Enabling verbose error messages.  */
+#ifdef YYERROR_VERBOSE
+# undef YYERROR_VERBOSE
+# define YYERROR_VERBOSE 1
+#else
+# define YYERROR_VERBOSE 1
+#endif
+
+/* In a future release of Bison, this section will be replaced
+   by #include "program_parse.tab.h".  */
+#ifndef YY__MESA_PROGRAM_PROGRAM_PROGRAM_PARSE_TAB_H_INCLUDED
+# define YY__MESA_PROGRAM_PROGRAM_PROGRAM_PARSE_TAB_H_INCLUDED
+/* Enabling traces.  */
+#ifndef YYDEBUG
+# define YYDEBUG 0
+#endif
+#if YYDEBUG
+extern int _mesa_program_debug;
+#endif
+
+/* Tokens.  */
+#ifndef YYTOKENTYPE
+# define YYTOKENTYPE
+   /* Put the tokens into the symbol table, so that GDB and other debuggers
+      know about them.  */
+   enum yytokentype {
+     ARBvp_10 = 258,
+     ARBfp_10 = 259,
+     ADDRESS = 260,
+     ALIAS = 261,
+     ATTRIB = 262,
+     OPTION = 263,
+     OUTPUT = 264,
+     PARAM = 265,
+     TEMP = 266,
+     END = 267,
+     BIN_OP = 268,
+     BINSC_OP = 269,
+     SAMPLE_OP = 270,
+     SCALAR_OP = 271,
+     TRI_OP = 272,
+     VECTOR_OP = 273,
+     ARL = 274,
+     KIL = 275,
+     SWZ = 276,
+     TXD_OP = 277,
+     INTEGER = 278,
+     REAL = 279,
+     AMBIENT = 280,
+     ATTENUATION = 281,
+     BACK = 282,
+     CLIP = 283,
+     COLOR = 284,
+     DEPTH = 285,
+     DIFFUSE = 286,
+     DIRECTION = 287,
+     EMISSION = 288,
+     ENV = 289,
+     EYE = 290,
+     FOG = 291,
+     FOGCOORD = 292,
+     FRAGMENT = 293,
+     FRONT = 294,
+     HALF = 295,
+     INVERSE = 296,
+     INVTRANS = 297,
+     LIGHT = 298,
+     LIGHTMODEL = 299,
+     LIGHTPROD = 300,
+     LOCAL = 301,
+     MATERIAL = 302,
+     MAT_PROGRAM = 303,
+     MATRIX = 304,
+     MATRIXINDEX = 305,
+     MODELVIEW = 306,
+     MVP = 307,
+     NORMAL = 308,
+     OBJECT = 309,
+     PALETTE = 310,
+     PARAMS = 311,
+     PLANE = 312,
+     POINT_TOK = 313,
+     POINTSIZE = 314,
+     POSITION = 315,
+     PRIMARY = 316,
+     PROGRAM = 317,
+     PROJECTION = 318,
+     RANGE = 319,
+     RESULT = 320,
+     ROW = 321,
+     SCENECOLOR = 322,
+     SECONDARY = 323,
+     SHININESS = 324,
+     SIZE_TOK = 325,
+     SPECULAR = 326,
+     SPOT = 327,
+     STATE = 328,
+     TEXCOORD = 329,
+     TEXENV = 330,
+     TEXGEN = 331,
+     TEXGEN_Q = 332,
+     TEXGEN_R = 333,
+     TEXGEN_S = 334,
+     TEXGEN_T = 335,
+     TEXTURE = 336,
+     TRANSPOSE = 337,
+     TEXTURE_UNIT = 338,
+     TEX_1D = 339,
+     TEX_2D = 340,
+     TEX_3D = 341,
+     TEX_CUBE = 342,
+     TEX_RECT = 343,
+     TEX_SHADOW1D = 344,
+     TEX_SHADOW2D = 345,
+     TEX_SHADOWRECT = 346,
+     TEX_ARRAY1D = 347,
+     TEX_ARRAY2D = 348,
+     TEX_ARRAYSHADOW1D = 349,
+     TEX_ARRAYSHADOW2D = 350,
+     VERTEX = 351,
+     VTXATTRIB = 352,
+     WEIGHT = 353,
+     IDENTIFIER = 354,
+     USED_IDENTIFIER = 355,
+     MASK4 = 356,
+     MASK3 = 357,
+     MASK2 = 358,
+     MASK1 = 359,
+     SWIZZLE = 360,
+     DOT_DOT = 361,
+     DOT = 362
+   };
+#endif
+
+
+#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
+typedef union YYSTYPE
+{
+/* Line 387 of yacc.c  */
+#line 124 "program/program_parse.y"
+
+   struct asm_instruction *inst;
+   struct asm_symbol *sym;
+   struct asm_symbol temp_sym;
+   struct asm_swizzle_mask swiz_mask;
+   struct asm_src_register src_reg;
+   struct prog_dst_register dst_reg;
+   struct prog_instruction temp_inst;
+   char *string;
+   unsigned result;
+   unsigned attrib;
+   int integer;
+   float real;
+   gl_state_index state[STATE_LENGTH];
+   int negate;
+   struct asm_vector vector;
+   gl_inst_opcode opcode;
+
+   struct {
+      unsigned swz;
+      unsigned rgba_valid:1;
+      unsigned xyzw_valid:1;
+      unsigned negate:1;
+   } ext_swizzle;
+
+
+/* Line 387 of yacc.c  */
+#line 370 "./program/program_parse.tab.c"
+} YYSTYPE;
+# define YYSTYPE_IS_TRIVIAL 1
+# define yystype YYSTYPE /* obsolescent; will be withdrawn */
+# define YYSTYPE_IS_DECLARED 1
+#endif
+
+#if ! defined YYLTYPE && ! defined YYLTYPE_IS_DECLARED
+typedef struct YYLTYPE
+{
+  int first_line;
+  int first_column;
+  int last_line;
+  int last_column;
+} YYLTYPE;
+# define yyltype YYLTYPE /* obsolescent; will be withdrawn */
+# define YYLTYPE_IS_DECLARED 1
+# define YYLTYPE_IS_TRIVIAL 1
+#endif
+
+
+#ifdef YYPARSE_PARAM
+#if defined __STDC__ || defined __cplusplus
+int _mesa_program_parse (void *YYPARSE_PARAM);
+#else
+int _mesa_program_parse ();
+#endif
+#else /* ! YYPARSE_PARAM */
+#if defined __STDC__ || defined __cplusplus
+int _mesa_program_parse (struct asm_parser_state *state);
+#else
+int _mesa_program_parse ();
+#endif
+#endif /* ! YYPARSE_PARAM */
+
+#endif /* !YY__MESA_PROGRAM_PROGRAM_PROGRAM_PARSE_TAB_H_INCLUDED  */
+
+/* Copy the second part of user declarations.  */
+/* Line 390 of yacc.c  */
+#line 269 "program/program_parse.y"
+
+extern int
+_mesa_program_lexer_lex(YYSTYPE *yylval_param, YYLTYPE *yylloc_param,
+                        void *yyscanner);
+
+static int
+yylex(YYSTYPE *yylval_param, YYLTYPE *yylloc_param,
+      struct asm_parser_state *state)
+{
+   return _mesa_program_lexer_lex(yylval_param, yylloc_param, state->scanner);
+}
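+
+/* Note (illustrative): this thin wrapper is what connects the pure
+ * (reentrant) bison parser to the reentrant flex scanner: bison invokes
+ * yylex(&yylval, &yylloc, state), and the wrapper forwards the call to
+ * the scanner instance stored in state->scanner by
+ * _mesa_program_lexer_ctor().
+ */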
+
+/* Line 390 of yacc.c  */
+#line 423 "./program/program_parse.tab.c"
+
+#ifdef short
+# undef short
+#endif
+
+#ifdef YYTYPE_UINT8
+typedef YYTYPE_UINT8 yytype_uint8;
+#else
+typedef unsigned char yytype_uint8;
+#endif
+
+#ifdef YYTYPE_INT8
+typedef YYTYPE_INT8 yytype_int8;
+#elif (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+typedef signed char yytype_int8;
+#else
+typedef short int yytype_int8;
+#endif
+
+#ifdef YYTYPE_UINT16
+typedef YYTYPE_UINT16 yytype_uint16;
+#else
+typedef unsigned short int yytype_uint16;
+#endif
+
+#ifdef YYTYPE_INT16
+typedef YYTYPE_INT16 yytype_int16;
+#else
+typedef short int yytype_int16;
+#endif
+
+#ifndef YYSIZE_T
+# ifdef __SIZE_TYPE__
+#  define YYSIZE_T __SIZE_TYPE__
+# elif defined size_t
+#  define YYSIZE_T size_t
+# elif ! defined YYSIZE_T && (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+#  include <stddef.h> /* INFRINGES ON USER NAME SPACE */
+#  define YYSIZE_T size_t
+# else
+#  define YYSIZE_T unsigned int
+# endif
+#endif
+
+#define YYSIZE_MAXIMUM ((YYSIZE_T) -1)
+
+#ifndef YY_
+# if defined YYENABLE_NLS && YYENABLE_NLS
+#  if ENABLE_NLS
+#   include <libintl.h> /* INFRINGES ON USER NAME SPACE */
+#   define YY_(Msgid) dgettext ("bison-runtime", Msgid)
+#  endif
+# endif
+# ifndef YY_
+#  define YY_(Msgid) Msgid
+# endif
+#endif
+
+#ifndef __attribute__
+/* This feature is available in gcc versions 2.5 and later.  */
+# if (! defined __GNUC__ || __GNUC__ < 2 \
+      || (__GNUC__ == 2 && __GNUC_MINOR__ < 5))
+#  define __attribute__(Spec) /* empty */
+# endif
+#endif
+
+/* Suppress unused-variable warnings by "using" E.  */
+#if ! defined lint || defined __GNUC__
+# define YYUSE(E) ((void) (E))
+#else
+# define YYUSE(E) /* empty */
+#endif
+
+
+/* Identity function, used to suppress warnings about constant conditions.  */
+#ifndef lint
+# define YYID(N) (N)
+#else
+#if (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+static int
+YYID (int yyi)
+#else
+static int
+YYID (yyi)
+    int yyi;
+#endif
+{
+  return yyi;
+}
+#endif
+
+#if ! defined yyoverflow || YYERROR_VERBOSE
+
+/* The parser invokes alloca or malloc; define the necessary symbols.  */
+
+# ifdef YYSTACK_USE_ALLOCA
+#  if YYSTACK_USE_ALLOCA
+#   ifdef __GNUC__
+#    define YYSTACK_ALLOC __builtin_alloca
+#   elif defined __BUILTIN_VA_ARG_INCR
+#    include <alloca.h> /* INFRINGES ON USER NAME SPACE */
+#   elif defined _AIX
+#    define YYSTACK_ALLOC __alloca
+#   elif defined _MSC_VER
+#    include <malloc.h> /* INFRINGES ON USER NAME SPACE */
+#    define alloca _alloca
+#   else
+#    define YYSTACK_ALLOC alloca
+#    if ! defined _ALLOCA_H && ! defined EXIT_SUCCESS && (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+#     include <stdlib.h> /* INFRINGES ON USER NAME SPACE */
+      /* Use EXIT_SUCCESS as a witness for stdlib.h.  */
+#     ifndef EXIT_SUCCESS
+#      define EXIT_SUCCESS 0
+#     endif
+#    endif
+#   endif
+#  endif
+# endif
+
+# ifdef YYSTACK_ALLOC
+   /* Pacify GCC's `empty if-body' warning.  */
+#  define YYSTACK_FREE(Ptr) do { /* empty */; } while (YYID (0))
+#  ifndef YYSTACK_ALLOC_MAXIMUM
+    /* The OS might guarantee only one guard page at the bottom of the stack,
+       and a page size can be as small as 4096 bytes.  So we cannot safely
+       invoke alloca (N) if N exceeds 4096.  Use a slightly smaller number
+       to allow for a few compiler-allocated temporary stack slots.  */
+#   define YYSTACK_ALLOC_MAXIMUM 4032 /* reasonable circa 2006 */
+#  endif
+# else
+#  define YYSTACK_ALLOC YYMALLOC
+#  define YYSTACK_FREE YYFREE
+#  ifndef YYSTACK_ALLOC_MAXIMUM
+#   define YYSTACK_ALLOC_MAXIMUM YYSIZE_MAXIMUM
+#  endif
+#  if (defined __cplusplus && ! defined EXIT_SUCCESS \
+       && ! ((defined YYMALLOC || defined malloc) \
+	     && (defined YYFREE || defined free)))
+#   include <stdlib.h> /* INFRINGES ON USER NAME SPACE */
+#   ifndef EXIT_SUCCESS
+#    define EXIT_SUCCESS 0
+#   endif
+#  endif
+#  ifndef YYMALLOC
+#   define YYMALLOC malloc
+#   if ! defined malloc && ! defined EXIT_SUCCESS && (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+void *malloc (YYSIZE_T); /* INFRINGES ON USER NAME SPACE */
+#   endif
+#  endif
+#  ifndef YYFREE
+#   define YYFREE free
+#   if ! defined free && ! defined EXIT_SUCCESS && (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+void free (void *); /* INFRINGES ON USER NAME SPACE */
+#   endif
+#  endif
+# endif
+#endif /* ! defined yyoverflow || YYERROR_VERBOSE */
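+
+/* Summary of the choices above: if the user defines yyoverflow, stack
+   growth is delegated to it entirely; otherwise YYSTACK_ALLOC is either
+   a real alloca (freed implicitly on return, capped at
+   YYSTACK_ALLOC_MAXIMUM to stay within one guard page) or plain
+   YYMALLOC/YYFREE, which default to malloc/free.  */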
+
+
+#if (! defined yyoverflow \
+     && (! defined __cplusplus \
+	 || (defined YYLTYPE_IS_TRIVIAL && YYLTYPE_IS_TRIVIAL \
+	     && defined YYSTYPE_IS_TRIVIAL && YYSTYPE_IS_TRIVIAL)))
+
+/* A type that is properly aligned for any stack member.  */
+union yyalloc
+{
+  yytype_int16 yyss_alloc;
+  YYSTYPE yyvs_alloc;
+  YYLTYPE yyls_alloc;
+};
+
+/* The size of the maximum gap between one aligned stack and the next.  */
+# define YYSTACK_GAP_MAXIMUM (sizeof (union yyalloc) - 1)
+
+/* The size of an array large enough to hold all stacks, each with
+   N elements.  */
+# define YYSTACK_BYTES(N) \
+     ((N) * (sizeof (yytype_int16) + sizeof (YYSTYPE) + sizeof (YYLTYPE)) \
+      + 2 * YYSTACK_GAP_MAXIMUM)
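+
+/* Three parallel stacks live inside one allocation: states
+   (yytype_int16), semantic values (YYSTYPE) and, because this grammar
+   tracks locations, YYLTYPE entries.  N elements of each cost N times
+   the summed element sizes; the two YYSTACK_GAP_MAXIMUM terms cover the
+   alignment padding at the two boundaries between consecutive stacks.  */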
+
+# define YYCOPY_NEEDED 1
+
+/* Relocate STACK from its old location to the new one.  The
+   local variables YYSIZE and YYSTACKSIZE give the old and new number of
+   elements in the stack, and YYPTR gives the new location of the
+   stack.  Advance YYPTR to a properly aligned location for the next
+   stack.  */
+# define YYSTACK_RELOCATE(Stack_alloc, Stack)				\
+    do									\
+      {									\
+	YYSIZE_T yynewbytes;						\
+	YYCOPY (&yyptr->Stack_alloc, Stack, yysize);			\
+	Stack = &yyptr->Stack_alloc;					\
+	yynewbytes = yystacksize * sizeof (*Stack) + YYSTACK_GAP_MAXIMUM; \
+	yyptr += yynewbytes / sizeof (*yyptr);				\
+      }									\
+    while (YYID (0))
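+
+/* A sketch of how the yacc.c skeleton uses this on overflow: it grabs
+   one new yyalloc block and relocates the three stacks into it back to
+   back, with yyptr walking forward through the block after each copy:
+
+      union yyalloc *yyptr =
+        (union yyalloc *) YYSTACK_ALLOC (YYSTACK_BYTES (yystacksize));
+      YYSTACK_RELOCATE (yyss_alloc, yyss);   state stack
+      YYSTACK_RELOCATE (yyvs_alloc, yyvs);   semantic-value stack
+      YYSTACK_RELOCATE (yyls_alloc, yyls);   location stack            */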
+
+#endif
+
+#if defined YYCOPY_NEEDED && YYCOPY_NEEDED
+/* Copy COUNT objects from SRC to DST.  The source and destination do
+   not overlap.  */
+# ifndef YYCOPY
+#  if defined __GNUC__ && 1 < __GNUC__
+#   define YYCOPY(Dst, Src, Count) \
+      __builtin_memcpy (Dst, Src, (Count) * sizeof (*(Src)))
+#  else
+#   define YYCOPY(Dst, Src, Count)              \
+      do                                        \
+        {                                       \
+          YYSIZE_T yyi;                         \
+          for (yyi = 0; yyi < (Count); yyi++)   \
+            (Dst)[yyi] = (Src)[yyi];            \
+        }                                       \
+      while (YYID (0))
+#  endif
+# endif
+#endif /* !YYCOPY_NEEDED */
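+
+/* YYCOPY degrades gracefully: on GCC it is a single __builtin_memcpy
+   (the old and new stacks never overlap), elsewhere it falls back to an
+   element-wise loop, so relocation still works on compilers without
+   builtins.  */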
+
+/* YYFINAL -- State number of the termination state.  */
+#define YYFINAL  5
+/* YYLAST -- Last index in YYTABLE.  */
+#define YYLAST   402
+
+/* YYNTOKENS -- Number of terminals.  */
+#define YYNTOKENS  120
+/* YYNNTS -- Number of nonterminals.  */
+#define YYNNTS  143
+/* YYNRULES -- Number of rules.  */
+#define YYNRULES  283
+/* YYNSTATES -- Number of states.  */
+#define YYNSTATES  478
+
+/* YYTRANSLATE(YYLEX) -- Bison symbol number corresponding to YYLEX.  */
+#define YYUNDEFTOK  2
+#define YYMAXUTOK   362
+
+#define YYTRANSLATE(YYX)						\
+  ((unsigned int) (YYX) <= YYMAXUTOK ? yytranslate[YYX] : YYUNDEFTOK)
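+
+/* yylex returns either a literal character code or one of the external
+   token numbers listed in yytoknum below (up to YYMAXUTOK); YYTRANSLATE
+   maps both onto the dense internal symbol numbers that index every
+   table that follows.  For example YYTRANSLATE (';') is yytranslate[59],
+   i.e. internal symbol 108, and any out-of-range value collapses to
+   YYUNDEFTOK, the "undefined token".  */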
+
+/* YYTRANSLATE[YYLEX] -- Bison symbol number corresponding to YYLEX.  */
+static const yytype_uint8 yytranslate[] =
+{
+       0,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+     115,   116,     2,   113,   109,   114,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,   108,
+       2,   117,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,   111,     2,   112,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,   118,   110,   119,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     2,     2,     2,     1,     2,     3,     4,
+       5,     6,     7,     8,     9,    10,    11,    12,    13,    14,
+      15,    16,    17,    18,    19,    20,    21,    22,    23,    24,
+      25,    26,    27,    28,    29,    30,    31,    32,    33,    34,
+      35,    36,    37,    38,    39,    40,    41,    42,    43,    44,
+      45,    46,    47,    48,    49,    50,    51,    52,    53,    54,
+      55,    56,    57,    58,    59,    60,    61,    62,    63,    64,
+      65,    66,    67,    68,    69,    70,    71,    72,    73,    74,
+      75,    76,    77,    78,    79,    80,    81,    82,    83,    84,
+      85,    86,    87,    88,    89,    90,    91,    92,    93,    94,
+      95,    96,    97,    98,    99,   100,   101,   102,   103,   104,
+     105,   106,   107
+};
+
+#if YYDEBUG
+/* YYPRHS[YYN] -- Index of the first RHS symbol of rule number YYN in
+   YYRHS.  */
+static const yytype_uint16 yyprhs[] =
+{
+       0,     0,     3,     8,    10,    12,    15,    16,    20,    23,
+      24,    27,    30,    32,    34,    36,    38,    40,    42,    44,
+      46,    48,    50,    52,    54,    59,    64,    69,    76,    83,
+      92,   101,   104,   107,   120,   123,   125,   127,   129,   131,
+     133,   135,   137,   139,   141,   143,   145,   147,   154,   157,
+     162,   165,   167,   171,   177,   181,   184,   192,   195,   197,
+     199,   201,   203,   208,   210,   212,   214,   216,   218,   220,
+     222,   226,   227,   230,   233,   235,   237,   239,   241,   243,
+     245,   247,   249,   251,   252,   254,   256,   258,   260,   261,
+     265,   269,   270,   273,   276,   278,   280,   282,   284,   286,
+     288,   290,   292,   297,   300,   303,   305,   308,   310,   313,
+     315,   318,   323,   328,   330,   331,   335,   337,   339,   342,
+     344,   347,   349,   351,   355,   362,   363,   365,   368,   373,
+     375,   379,   381,   383,   385,   387,   389,   391,   393,   395,
+     397,   399,   402,   405,   408,   411,   414,   417,   420,   423,
+     426,   429,   432,   435,   439,   441,   443,   445,   451,   453,
+     455,   457,   460,   462,   464,   467,   469,   472,   479,   481,
+     485,   487,   489,   491,   493,   495,   500,   502,   504,   506,
+     508,   510,   512,   515,   517,   519,   525,   527,   530,   532,
+     534,   540,   543,   544,   551,   555,   556,   558,   560,   562,
+     564,   566,   569,   571,   573,   576,   581,   586,   587,   591,
+     593,   595,   597,   600,   602,   604,   606,   608,   614,   616,
+     620,   626,   632,   634,   638,   644,   646,   648,   650,   652,
+     654,   656,   658,   660,   662,   666,   672,   680,   690,   693,
+     696,   698,   700,   701,   702,   707,   709,   710,   711,   715,
+     719,   721,   727,   730,   733,   736,   739,   743,   746,   750,
+     751,   755,   757,   759,   760,   762,   764,   765,   767,   769,
+     770,   772,   774,   775,   779,   780,   784,   785,   789,   791,
+     793,   795,   800,   802
+};
+
+/* YYRHS -- A `-1'-separated list of the rules' RHS.  */
+static const yytype_int16 yyrhs[] =
+{
+     121,     0,    -1,   122,   123,   125,    12,    -1,     3,    -1,
+       4,    -1,   123,   124,    -1,    -1,     8,   262,   108,    -1,
+     125,   126,    -1,    -1,   127,   108,    -1,   170,   108,    -1,
+     128,    -1,   129,    -1,   130,    -1,   131,    -1,   132,    -1,
+     133,    -1,   134,    -1,   135,    -1,   141,    -1,   136,    -1,
+     137,    -1,   138,    -1,    19,   146,   109,   142,    -1,    18,
+     145,   109,   144,    -1,    16,   145,   109,   142,    -1,    14,
+     145,   109,   142,   109,   142,    -1,    13,   145,   109,   144,
+     109,   144,    -1,    17,   145,   109,   144,   109,   144,   109,
+     144,    -1,    15,   145,   109,   144,   109,   139,   109,   140,
+      -1,    20,   144,    -1,    20,   166,    -1,    22,   145,   109,
+     144,   109,   144,   109,   144,   109,   139,   109,   140,    -1,
+      83,   256,    -1,    84,    -1,    85,    -1,    86,    -1,    87,
+      -1,    88,    -1,    89,    -1,    90,    -1,    91,    -1,    92,
+      -1,    93,    -1,    94,    -1,    95,    -1,    21,   145,   109,
+     150,   109,   147,    -1,   241,   143,    -1,   241,   110,   143,
+     110,    -1,   150,   162,    -1,   238,    -1,   241,   150,   163,
+      -1,   241,   110,   150,   163,   110,    -1,   151,   164,   165,
+      -1,   159,   161,    -1,   148,   109,   148,   109,   148,   109,
+     148,    -1,   241,   149,    -1,    23,    -1,   262,    -1,   100,
+      -1,   172,    -1,   152,   111,   153,   112,    -1,   186,    -1,
+     249,    -1,   100,    -1,   100,    -1,   154,    -1,   155,    -1,
+      23,    -1,   159,   160,   156,    -1,    -1,   113,   157,    -1,
+     114,   158,    -1,    23,    -1,    23,    -1,   100,    -1,   104,
+      -1,   104,    -1,   104,    -1,   104,    -1,   101,    -1,   105,
+      -1,    -1,   101,    -1,   102,    -1,   103,    -1,   104,    -1,
+      -1,   115,   166,   116,    -1,   115,   167,   116,    -1,    -1,
+     168,   163,    -1,   169,   163,    -1,    99,    -1,   100,    -1,
+     171,    -1,   178,    -1,   242,    -1,   245,    -1,   248,    -1,
+     261,    -1,     7,    99,   117,   172,    -1,    96,   173,    -1,
+      38,   177,    -1,    60,    -1,    98,   175,    -1,    53,    -1,
+      29,   254,    -1,    37,    -1,    74,   255,    -1,    50,   111,
+     176,   112,    -1,    97,   111,   174,   112,    -1,    23,    -1,
+      -1,   111,   176,   112,    -1,    23,    -1,    60,    -1,    29,
+     254,    -1,    37,    -1,    74,   255,    -1,   179,    -1,   180,
+      -1,    10,    99,   182,    -1,    10,    99,   111,   181,   112,
+     183,    -1,    -1,    23,    -1,   117,   185,    -1,   117,   118,
+     184,   119,    -1,   187,    -1,   184,   109,   187,    -1,   189,
+      -1,   225,    -1,   235,    -1,   189,    -1,   225,    -1,   236,
+      -1,   188,    -1,   226,    -1,   235,    -1,   189,    -1,    73,
+     213,    -1,    73,   190,    -1,    73,   192,    -1,    73,   195,
+      -1,    73,   197,    -1,    73,   203,    -1,    73,   199,    -1,
+      73,   206,    -1,    73,   208,    -1,    73,   210,    -1,    73,
+     212,    -1,    73,   224,    -1,    47,   253,   191,    -1,   201,
+      -1,    33,    -1,    69,    -1,    43,   111,   202,   112,   193,
+      -1,   201,    -1,    60,    -1,    26,    -1,    72,   194,    -1,
+      40,    -1,    32,    -1,    44,   196,    -1,    25,    -1,   253,
+      67,    -1,    45,   111,   202,   112,   253,   198,    -1,   201,
+      -1,    75,   257,   200,    -1,    29,    -1,    25,    -1,    31,
+      -1,    71,    -1,    23,    -1,    76,   255,   204,   205,    -1,
+      35,    -1,    54,    -1,    79,    -1,    80,    -1,    78,    -1,
+      77,    -1,    36,   207,    -1,    29,    -1,    56,    -1,    28,
+     111,   209,   112,    57,    -1,    23,    -1,    58,   211,    -1,
+      70,    -1,    26,    -1,   215,    66,   111,   218,   112,    -1,
+     215,   214,    -1,    -1,    66,   111,   218,   106,   218,   112,
+      -1,    49,   219,   216,    -1,    -1,   217,    -1,    41,    -1,
+      82,    -1,    42,    -1,    23,    -1,    51,   220,    -1,    63,
+      -1,    52,    -1,    81,   255,    -1,    55,   111,   222,   112,
+      -1,    48,   111,   223,   112,    -1,    -1,   111,   221,   112,
+      -1,    23,    -1,    23,    -1,    23,    -1,    30,    64,    -1,
+     229,    -1,   232,    -1,   227,    -1,   230,    -1,    62,    34,
+     111,   228,   112,    -1,   233,    -1,   233,   106,   233,    -1,
+      62,    34,   111,   233,   112,    -1,    62,    46,   111,   231,
+     112,    -1,   234,    -1,   234,   106,   234,    -1,    62,    46,
+     111,   234,   112,    -1,    23,    -1,    23,    -1,   237,    -1,
+     239,    -1,   238,    -1,   239,    -1,   240,    -1,    24,    -1,
+      23,    -1,   118,   240,   119,    -1,   118,   240,   109,   240,
+     119,    -1,   118,   240,   109,   240,   109,   240,   119,    -1,
+     118,   240,   109,   240,   109,   240,   109,   240,   119,    -1,
+     241,    24,    -1,   241,    23,    -1,   113,    -1,   114,    -1,
+      -1,    -1,   244,    11,   243,   247,    -1,   262,    -1,    -1,
+      -1,     5,   246,   247,    -1,   247,   109,    99,    -1,    99,
+      -1,   244,     9,    99,   117,   249,    -1,    65,    60,    -1,
+      65,    37,    -1,    65,   250,    -1,    65,    59,    -1,    65,
+      74,   255,    -1,    65,    30,    -1,    29,   251,   252,    -1,
+      -1,   111,    23,   112,    -1,    39,    -1,    27,    -1,    -1,
+      61,    -1,    68,    -1,    -1,    39,    -1,    27,    -1,    -1,
+      61,    -1,    68,    -1,    -1,   111,   258,   112,    -1,    -1,
+     111,   259,   112,    -1,    -1,   111,   260,   112,    -1,    23,
+      -1,    23,    -1,    23,    -1,     6,    99,   117,   100,    -1,
+      99,    -1,   100,    -1
+};
+
+/* YYRLINE[YYN] -- source line where rule number YYN was defined.  */
+static const yytype_uint16 yyrline[] =
+{
+       0,   284,   284,   287,   295,   307,   308,   311,   335,   336,
+     339,   354,   357,   362,   369,   370,   371,   372,   373,   374,
+     375,   378,   379,   380,   383,   389,   397,   403,   410,   416,
+     423,   467,   472,   481,   525,   531,   532,   533,   534,   535,
+     536,   537,   538,   539,   540,   541,   542,   545,   557,   565,
+     582,   589,   608,   619,   639,   663,   670,   703,   710,   726,
+     785,   828,   837,   859,   869,   873,   902,   921,   921,   923,
+     930,   942,   943,   944,   947,   961,   975,   995,  1006,  1018,
+    1020,  1021,  1022,  1023,  1026,  1026,  1026,  1026,  1027,  1030,
+    1034,  1039,  1045,  1052,  1059,  1081,  1103,  1104,  1105,  1106,
+    1107,  1108,  1111,  1130,  1134,  1140,  1144,  1148,  1152,  1156,
+    1160,  1164,  1169,  1175,  1186,  1186,  1187,  1189,  1193,  1197,
+    1201,  1207,  1207,  1209,  1227,  1253,  1256,  1271,  1277,  1283,
+    1284,  1291,  1297,  1303,  1311,  1317,  1323,  1331,  1337,  1343,
+    1351,  1352,  1355,  1356,  1357,  1358,  1359,  1360,  1361,  1362,
+    1363,  1364,  1365,  1368,  1377,  1381,  1385,  1391,  1400,  1404,
+    1408,  1417,  1421,  1427,  1433,  1440,  1445,  1453,  1463,  1465,
+    1473,  1479,  1483,  1487,  1493,  1504,  1513,  1517,  1522,  1526,
+    1530,  1534,  1540,  1547,  1551,  1557,  1565,  1576,  1583,  1587,
+    1593,  1603,  1614,  1618,  1636,  1645,  1648,  1654,  1658,  1662,
+    1668,  1679,  1684,  1689,  1694,  1699,  1704,  1712,  1715,  1720,
+    1733,  1741,  1752,  1760,  1760,  1762,  1762,  1764,  1774,  1779,
+    1786,  1796,  1805,  1810,  1817,  1827,  1837,  1849,  1849,  1850,
+    1850,  1852,  1862,  1870,  1880,  1888,  1896,  1905,  1916,  1920,
+    1926,  1927,  1928,  1931,  1931,  1934,  1969,  1973,  1973,  1976,
+    1983,  1992,  2006,  2015,  2024,  2028,  2037,  2046,  2057,  2064,
+    2074,  2102,  2111,  2123,  2126,  2135,  2146,  2147,  2148,  2151,
+    2152,  2153,  2156,  2157,  2160,  2161,  2164,  2165,  2168,  2179,
+    2190,  2201,  2227,  2228
+};
+#endif
+
+#if YYDEBUG || YYERROR_VERBOSE || 1
+/* YYTNAME[SYMBOL-NUM] -- String name of the symbol SYMBOL-NUM.
+   First, the terminals, then, starting at YYNTOKENS, nonterminals.  */
+static const char *const yytname[] =
+{
+  "$end", "error", "$undefined", "ARBvp_10", "ARBfp_10", "ADDRESS",
+  "ALIAS", "ATTRIB", "OPTION", "OUTPUT", "PARAM", "TEMP", "END", "BIN_OP",
+  "BINSC_OP", "SAMPLE_OP", "SCALAR_OP", "TRI_OP", "VECTOR_OP", "ARL",
+  "KIL", "SWZ", "TXD_OP", "INTEGER", "REAL", "AMBIENT", "ATTENUATION",
+  "BACK", "CLIP", "COLOR", "DEPTH", "DIFFUSE", "DIRECTION", "EMISSION",
+  "ENV", "EYE", "FOG", "FOGCOORD", "FRAGMENT", "FRONT", "HALF", "INVERSE",
+  "INVTRANS", "LIGHT", "LIGHTMODEL", "LIGHTPROD", "LOCAL", "MATERIAL",
+  "MAT_PROGRAM", "MATRIX", "MATRIXINDEX", "MODELVIEW", "MVP", "NORMAL",
+  "OBJECT", "PALETTE", "PARAMS", "PLANE", "POINT_TOK", "POINTSIZE",
+  "POSITION", "PRIMARY", "PROGRAM", "PROJECTION", "RANGE", "RESULT", "ROW",
+  "SCENECOLOR", "SECONDARY", "SHININESS", "SIZE_TOK", "SPECULAR", "SPOT",
+  "STATE", "TEXCOORD", "TEXENV", "TEXGEN", "TEXGEN_Q", "TEXGEN_R",
+  "TEXGEN_S", "TEXGEN_T", "TEXTURE", "TRANSPOSE", "TEXTURE_UNIT", "TEX_1D",
+  "TEX_2D", "TEX_3D", "TEX_CUBE", "TEX_RECT", "TEX_SHADOW1D",
+  "TEX_SHADOW2D", "TEX_SHADOWRECT", "TEX_ARRAY1D", "TEX_ARRAY2D",
+  "TEX_ARRAYSHADOW1D", "TEX_ARRAYSHADOW2D", "VERTEX", "VTXATTRIB",
+  "WEIGHT", "IDENTIFIER", "USED_IDENTIFIER", "MASK4", "MASK3", "MASK2",
+  "MASK1", "SWIZZLE", "DOT_DOT", "DOT", "';'", "','", "'|'", "'['", "']'",
+  "'+'", "'-'", "'('", "')'", "'='", "'{'", "'}'", "$accept", "program",
+  "language", "optionSequence", "option", "statementSequence", "statement",
+  "instruction", "ALU_instruction", "TexInstruction", "ARL_instruction",
+  "VECTORop_instruction", "SCALARop_instruction", "BINSCop_instruction",
+  "BINop_instruction", "TRIop_instruction", "SAMPLE_instruction",
+  "KIL_instruction", "TXD_instruction", "texImageUnit", "texTarget",
+  "SWZ_instruction", "scalarSrcReg", "scalarUse", "swizzleSrcReg",
+  "maskedDstReg", "maskedAddrReg", "extendedSwizzle", "extSwizComp",
+  "extSwizSel", "srcReg", "dstReg", "progParamArray", "progParamArrayMem",
+  "progParamArrayAbs", "progParamArrayRel", "addrRegRelOffset",
+  "addrRegPosOffset", "addrRegNegOffset", "addrReg", "addrComponent",
+  "addrWriteMask", "scalarSuffix", "swizzleSuffix", "optionalMask",
+  "optionalCcMask", "ccTest", "ccTest2", "ccMaskRule", "ccMaskRule2",
+  "namingStatement", "ATTRIB_statement", "attribBinding", "vtxAttribItem",
+  "vtxAttribNum", "vtxOptWeightNum", "vtxWeightNum", "fragAttribItem",
+  "PARAM_statement", "PARAM_singleStmt", "PARAM_multipleStmt",
+  "optArraySize", "paramSingleInit", "paramMultipleInit",
+  "paramMultInitList", "paramSingleItemDecl", "paramSingleItemUse",
+  "paramMultipleItem", "stateMultipleItem", "stateSingleItem",
+  "stateMaterialItem", "stateMatProperty", "stateLightItem",
+  "stateLightProperty", "stateSpotProperty", "stateLightModelItem",
+  "stateLModProperty", "stateLightProdItem", "stateLProdProperty",
+  "stateTexEnvItem", "stateTexEnvProperty", "ambDiffSpecProperty",
+  "stateLightNumber", "stateTexGenItem", "stateTexGenType",
+  "stateTexGenCoord", "stateFogItem", "stateFogProperty",
+  "stateClipPlaneItem", "stateClipPlaneNum", "statePointItem",
+  "statePointProperty", "stateMatrixRow", "stateMatrixRows",
+  "optMatrixRows", "stateMatrixItem", "stateOptMatModifier",
+  "stateMatModifier", "stateMatrixRowNum", "stateMatrixName",
+  "stateOptModMatNum", "stateModMatNum", "statePaletteMatNum",
+  "stateProgramMatNum", "stateDepthItem", "programSingleItem",
+  "programMultipleItem", "progEnvParams", "progEnvParamNums",
+  "progEnvParam", "progLocalParams", "progLocalParamNums",
+  "progLocalParam", "progEnvParamNum", "progLocalParamNum",
+  "paramConstDecl", "paramConstUse", "paramConstScalarDecl",
+  "paramConstScalarUse", "paramConstVector", "signedFloatConstant",
+  "optionalSign", "TEMP_statement", "@1", "optVarSize",
+  "ADDRESS_statement", "@2", "varNameList", "OUTPUT_statement",
+  "resultBinding", "resultColBinding", "optResultFaceType",
+  "optResultColorType", "optFaceType", "optColorType",
+  "optTexCoordUnitNum", "optTexImageUnitNum", "optLegacyTexUnitNum",
+  "texCoordUnitNum", "texImageUnitNum", "legacyTexUnitNum",
+  "ALIAS_statement", "string", YY_NULL
+};
+#endif
+
+# ifdef YYPRINT
+/* YYTOKNUM[YYLEX-NUM] -- Internal token number corresponding to
+   token YYLEX-NUM.  */
+static const yytype_uint16 yytoknum[] =
+{
+       0,   256,   257,   258,   259,   260,   261,   262,   263,   264,
+     265,   266,   267,   268,   269,   270,   271,   272,   273,   274,
+     275,   276,   277,   278,   279,   280,   281,   282,   283,   284,
+     285,   286,   287,   288,   289,   290,   291,   292,   293,   294,
+     295,   296,   297,   298,   299,   300,   301,   302,   303,   304,
+     305,   306,   307,   308,   309,   310,   311,   312,   313,   314,
+     315,   316,   317,   318,   319,   320,   321,   322,   323,   324,
+     325,   326,   327,   328,   329,   330,   331,   332,   333,   334,
+     335,   336,   337,   338,   339,   340,   341,   342,   343,   344,
+     345,   346,   347,   348,   349,   350,   351,   352,   353,   354,
+     355,   356,   357,   358,   359,   360,   361,   362,    59,    44,
+     124,    91,    93,    43,    45,    40,    41,    61,   123,   125
+};
+# endif
+
+/* YYR1[YYN] -- Symbol number of symbol that rule YYN derives.  */
+static const yytype_uint16 yyr1[] =
+{
+       0,   120,   121,   122,   122,   123,   123,   124,   125,   125,
+     126,   126,   127,   127,   128,   128,   128,   128,   128,   128,
+     128,   129,   129,   129,   130,   131,   132,   133,   134,   135,
+     136,   137,   137,   138,   139,   140,   140,   140,   140,   140,
+     140,   140,   140,   140,   140,   140,   140,   141,   142,   142,
+     143,   143,   144,   144,   145,   146,   147,   148,   149,   149,
+     150,   150,   150,   150,   151,   151,   152,   153,   153,   154,
+     155,   156,   156,   156,   157,   158,   159,   160,   161,   162,
+     163,   163,   163,   163,   164,   164,   164,   164,   164,   165,
+     165,   165,   166,   167,   168,   169,   170,   170,   170,   170,
+     170,   170,   171,   172,   172,   173,   173,   173,   173,   173,
+     173,   173,   173,   174,   175,   175,   176,   177,   177,   177,
+     177,   178,   178,   179,   180,   181,   181,   182,   183,   184,
+     184,   185,   185,   185,   186,   186,   186,   187,   187,   187,
+     188,   188,   189,   189,   189,   189,   189,   189,   189,   189,
+     189,   189,   189,   190,   191,   191,   191,   192,   193,   193,
+     193,   193,   193,   194,   195,   196,   196,   197,   198,   199,
+     200,   201,   201,   201,   202,   203,   204,   204,   205,   205,
+     205,   205,   206,   207,   207,   208,   209,   210,   211,   211,
+     212,   213,   214,   214,   215,   216,   216,   217,   217,   217,
+     218,   219,   219,   219,   219,   219,   219,   220,   220,   221,
+     222,   223,   224,   225,   225,   226,   226,   227,   228,   228,
+     229,   230,   231,   231,   232,   233,   234,   235,   235,   236,
+     236,   237,   238,   238,   239,   239,   239,   239,   240,   240,
+     241,   241,   241,   243,   242,   244,   244,   246,   245,   247,
+     247,   248,   249,   249,   249,   249,   249,   249,   250,   251,
+     251,   251,   251,   252,   252,   252,   253,   253,   253,   254,
+     254,   254,   255,   255,   256,   256,   257,   257,   258,   259,
+     260,   261,   262,   262
+};
+
+/* YYR2[YYN] -- Number of symbols composing right hand side of rule YYN.  */
+static const yytype_uint8 yyr2[] =
+{
+       0,     2,     4,     1,     1,     2,     0,     3,     2,     0,
+       2,     2,     1,     1,     1,     1,     1,     1,     1,     1,
+       1,     1,     1,     1,     4,     4,     4,     6,     6,     8,
+       8,     2,     2,    12,     2,     1,     1,     1,     1,     1,
+       1,     1,     1,     1,     1,     1,     1,     6,     2,     4,
+       2,     1,     3,     5,     3,     2,     7,     2,     1,     1,
+       1,     1,     4,     1,     1,     1,     1,     1,     1,     1,
+       3,     0,     2,     2,     1,     1,     1,     1,     1,     1,
+       1,     1,     1,     0,     1,     1,     1,     1,     0,     3,
+       3,     0,     2,     2,     1,     1,     1,     1,     1,     1,
+       1,     1,     4,     2,     2,     1,     2,     1,     2,     1,
+       2,     4,     4,     1,     0,     3,     1,     1,     2,     1,
+       2,     1,     1,     3,     6,     0,     1,     2,     4,     1,
+       3,     1,     1,     1,     1,     1,     1,     1,     1,     1,
+       1,     2,     2,     2,     2,     2,     2,     2,     2,     2,
+       2,     2,     2,     3,     1,     1,     1,     5,     1,     1,
+       1,     2,     1,     1,     2,     1,     2,     6,     1,     3,
+       1,     1,     1,     1,     1,     4,     1,     1,     1,     1,
+       1,     1,     2,     1,     1,     5,     1,     2,     1,     1,
+       5,     2,     0,     6,     3,     0,     1,     1,     1,     1,
+       1,     2,     1,     1,     2,     4,     4,     0,     3,     1,
+       1,     1,     2,     1,     1,     1,     1,     5,     1,     3,
+       5,     5,     1,     3,     5,     1,     1,     1,     1,     1,
+       1,     1,     1,     1,     3,     5,     7,     9,     2,     2,
+       1,     1,     0,     0,     4,     1,     0,     0,     3,     3,
+       1,     5,     2,     2,     2,     2,     3,     2,     3,     0,
+       3,     1,     1,     0,     1,     1,     0,     1,     1,     0,
+       1,     1,     0,     3,     0,     3,     0,     3,     1,     1,
+       1,     4,     1,     1
+};
+
+/* YYDEFACT[STATE-NAME] -- Default reduction number in state STATE-NUM.
+   Performed when YYTABLE doesn't specify something else to do.  Zero
+   means the default is an error.  */
+static const yytype_uint16 yydefact[] =
+{
+       0,     3,     4,     0,     6,     1,     9,     0,     5,   246,
+     282,   283,     0,   247,     0,     0,     0,     2,     0,     0,
+       0,     0,     0,     0,     0,   242,     0,     0,     8,     0,
+      12,    13,    14,    15,    16,    17,    18,    19,    21,    22,
+      23,    20,     0,    96,    97,   121,   122,    98,     0,    99,
+     100,   101,   245,     7,     0,     0,     0,     0,     0,    65,
+       0,    88,    64,     0,     0,     0,     0,     0,    76,     0,
+       0,    94,   240,   241,    31,    32,    83,     0,     0,     0,
+      10,    11,     0,   243,   250,   248,     0,     0,   125,   242,
+     123,   259,   257,   253,   255,   252,   272,   254,   242,    84,
+      85,    86,    87,    91,   242,   242,   242,   242,   242,   242,
+      78,    55,    81,    80,    82,    92,   233,   232,     0,     0,
+       0,     0,    60,     0,   242,    83,     0,    61,    63,   134,
+     135,   213,   214,   136,   229,   230,     0,   242,     0,     0,
+       0,   281,   102,   126,     0,   127,   131,   132,   133,   227,
+     228,   231,     0,   262,   261,     0,   263,     0,   256,     0,
+       0,    54,     0,     0,     0,    26,     0,    25,    24,   269,
+     119,   117,   272,   104,     0,     0,     0,     0,     0,     0,
+     266,     0,   266,     0,     0,   276,   272,   142,   143,   144,
+     145,   147,   146,   148,   149,   150,   151,     0,   152,   269,
+     109,     0,   107,   105,   272,     0,   114,   103,    83,     0,
+      52,     0,     0,     0,     0,   244,   249,     0,   239,   238,
+       0,   264,   265,   258,   278,     0,   242,    95,     0,     0,
+      83,   242,     0,    48,     0,    51,     0,   242,   270,   271,
+     118,   120,     0,     0,     0,   212,   183,   184,   182,     0,
+     165,   268,   267,   164,     0,     0,     0,     0,   207,   203,
+       0,   202,   272,   195,   189,   188,   187,     0,     0,     0,
+       0,   108,     0,   110,     0,     0,   106,     0,   242,   234,
+      69,     0,    67,    68,     0,   242,   242,   251,     0,   124,
+     260,   273,    28,    89,    90,    93,    27,     0,    79,    50,
+     274,     0,     0,   225,     0,   226,     0,   186,     0,   174,
+       0,   166,     0,   171,   172,   155,   156,   173,   153,   154,
+       0,     0,   201,     0,   204,   197,   199,   198,   194,   196,
+     280,     0,   170,   169,   176,   177,     0,     0,   116,     0,
+     113,     0,     0,    53,     0,    62,    77,    71,    47,     0,
+       0,     0,   242,    49,     0,    34,     0,   242,   220,   224,
+       0,     0,   266,   211,     0,   209,     0,   210,     0,   277,
+     181,   180,   178,   179,   175,   200,     0,   111,   112,   115,
+     242,   235,     0,     0,    70,   242,    58,    57,    59,   242,
+       0,     0,     0,   129,   137,   140,   138,   215,   216,   139,
+     279,     0,    35,    36,    37,    38,    39,    40,    41,    42,
+      43,    44,    45,    46,    30,    29,   185,   160,   162,   159,
+       0,   157,   158,     0,   206,   208,   205,   190,     0,    74,
+      72,    75,    73,     0,     0,     0,     0,   141,   192,   242,
+     128,   275,   163,   161,   167,   168,   242,   236,   242,     0,
+       0,     0,     0,   191,   130,     0,     0,     0,     0,   218,
+       0,   222,     0,   237,   242,     0,   217,     0,   221,     0,
+       0,    56,    33,   219,   223,     0,     0,   193
+};
+
+/* YYDEFGOTO[NTERM-NUM].  */
+static const yytype_int16 yydefgoto[] =
+{
+      -1,     3,     4,     6,     8,     9,    28,    29,    30,    31,
+      32,    33,    34,    35,    36,    37,    38,    39,    40,   301,
+     414,    41,   162,   233,    74,    60,    69,   348,   349,   387,
+     234,    61,   126,   281,   282,   283,   384,   430,   432,    70,
+     347,   111,   299,   115,   103,   161,    75,   229,    76,   230,
+      42,    43,   127,   207,   341,   276,   339,   173,    44,    45,
+      46,   144,    90,   289,   392,   145,   128,   393,   394,   129,
+     187,   318,   188,   421,   443,   189,   253,   190,   444,   191,
+     333,   319,   310,   192,   336,   374,   193,   248,   194,   308,
+     195,   266,   196,   437,   453,   197,   328,   329,   376,   263,
+     322,   366,   368,   364,   198,   130,   396,   397,   458,   131,
+     398,   460,   132,   304,   306,   399,   133,   149,   134,   135,
+     151,    77,    47,   139,    48,    49,    54,    85,    50,    62,
+      97,   156,   223,   254,   240,   158,   355,   268,   225,   401,
+     331,    51,    12
+};
+
+/* YYPACT[STATE-NUM] -- Index in YYTABLE of the portion describing
+   STATE-NUM.  */
+#define YYPACT_NINF -398
+static const yytype_int16 yypact[] =
+{
+      52,  -398,  -398,    14,  -398,  -398,    67,   152,  -398,    24,
+    -398,  -398,     5,  -398,    47,    81,    99,  -398,    -1,    -1,
+      -1,    -1,    -1,    -1,    43,    56,    -1,    -1,  -398,    97,
+    -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,
+    -398,  -398,   112,  -398,  -398,  -398,  -398,  -398,   156,  -398,
+    -398,  -398,  -398,  -398,   111,    98,   141,    95,   127,  -398,
+      84,   142,  -398,   146,   150,   153,   157,   158,  -398,   159,
+     165,  -398,  -398,  -398,  -398,  -398,   113,   -13,   161,   163,
+    -398,  -398,   162,  -398,  -398,   164,   174,    10,   252,    -3,
+    -398,   -11,  -398,  -398,  -398,  -398,   166,  -398,   -20,  -398,
+    -398,  -398,  -398,   167,   -20,   -20,   -20,   -20,   -20,   -20,
+    -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,   137,    70,
+     132,    85,   168,    34,   -20,   113,   169,  -398,  -398,  -398,
+    -398,  -398,  -398,  -398,  -398,  -398,    34,   -20,   171,   111,
+     179,  -398,  -398,  -398,   172,  -398,  -398,  -398,  -398,  -398,
+    -398,  -398,   216,  -398,  -398,   253,    76,   258,  -398,   176,
+     154,  -398,   178,    29,   180,  -398,   181,  -398,  -398,   110,
+    -398,  -398,   166,  -398,   175,   182,   183,   219,    32,   184,
+     177,   186,    94,   140,     7,   187,   166,  -398,  -398,  -398,
+    -398,  -398,  -398,  -398,  -398,  -398,  -398,   226,  -398,   110,
+    -398,   188,  -398,  -398,   166,   189,   190,  -398,   113,     9,
+    -398,     1,   193,   195,   240,   164,  -398,   191,  -398,  -398,
+     194,  -398,  -398,  -398,  -398,   197,   -20,  -398,   196,   198,
+     113,   -20,    34,  -398,   203,   206,   228,   -20,  -398,  -398,
+    -398,  -398,   290,   292,   293,  -398,  -398,  -398,  -398,   294,
+    -398,  -398,  -398,  -398,   251,   294,    48,   208,   209,  -398,
+     210,  -398,   166,    21,  -398,  -398,  -398,   299,   295,    12,
+     212,  -398,   302,  -398,   304,   302,  -398,   218,   -20,  -398,
+    -398,   217,  -398,  -398,   227,   -20,   -20,  -398,   214,  -398,
+    -398,  -398,  -398,  -398,  -398,  -398,  -398,   220,  -398,  -398,
+     222,   225,   229,  -398,   223,  -398,   224,  -398,   230,  -398,
+     231,  -398,   233,  -398,  -398,  -398,  -398,  -398,  -398,  -398,
+     314,   316,  -398,   317,  -398,  -398,  -398,  -398,  -398,  -398,
+    -398,   234,  -398,  -398,  -398,  -398,   170,   318,  -398,   235,
+    -398,   236,   237,  -398,    44,  -398,  -398,   143,  -398,   244,
+     -15,   245,    36,  -398,   332,  -398,   138,   -20,  -398,  -398,
+     301,   101,    94,  -398,   248,  -398,   249,  -398,   250,  -398,
+    -398,  -398,  -398,  -398,  -398,  -398,   254,  -398,  -398,  -398,
+     -20,  -398,   333,   340,  -398,   -20,  -398,  -398,  -398,   -20,
+     102,   132,    75,  -398,  -398,  -398,  -398,  -398,  -398,  -398,
+    -398,   255,  -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,
+    -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,
+     336,  -398,  -398,    49,  -398,  -398,  -398,  -398,    90,  -398,
+    -398,  -398,  -398,   256,   260,   259,   261,  -398,   298,    36,
+    -398,  -398,  -398,  -398,  -398,  -398,   -20,  -398,   -20,   228,
+     290,   292,   262,  -398,  -398,   257,   265,   268,   266,   273,
+     269,   274,   318,  -398,   -20,   138,  -398,   290,  -398,   292,
+     107,  -398,  -398,  -398,  -398,   318,   270,  -398
+};
+
+/* YYPGOTO[NTERM-NUM].  */
+static const yytype_int16 yypgoto[] =
+{
+    -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,
+    -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,   -78,
+     -82,  -398,  -100,   155,   -86,   215,  -398,  -398,  -372,  -398,
+     -54,  -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,   173,
+    -398,  -398,  -398,  -118,  -398,  -398,   232,  -398,  -398,  -398,
+    -398,  -398,   303,  -398,  -398,  -398,   114,  -398,  -398,  -398,
+    -398,  -398,  -398,  -398,  -398,  -398,  -398,   -53,  -398,   -88,
+    -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,  -398,
+    -398,  -334,   130,  -398,  -398,  -398,  -398,  -398,  -398,  -398,
+    -398,  -398,  -398,  -398,  -398,     0,  -398,  -398,  -397,  -398,
+    -398,  -398,  -398,  -398,  -398,   305,  -398,  -398,  -398,  -398,
+    -398,  -398,  -398,  -396,  -383,   306,  -398,  -398,  -137,   -87,
+    -120,   -89,  -398,  -398,  -398,  -398,  -398,   263,  -398,   185,
+    -398,  -398,  -398,  -177,   199,  -154,  -398,  -398,  -398,  -398,
+    -398,  -398,    -6
+};
+
+/* YYTABLE[YYPACT[STATE-NUM]].  What to do in state STATE-NUM.  If
+   positive, shift that token.  If negative, reduce the rule which
+   number is the opposite.  If YYTABLE_NINF, syntax error.  */
+#define YYTABLE_NINF -230
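+
+/* A sketch of the standard yacc.c lookup for state S and lookahead
+   symbol T, matching the comment above:
+
+      n = yypact[S] + T;
+      if (n < 0 || YYLAST < n || yycheck[n] != T)
+        take the default action yydefact[S];
+      else if (0 < yytable[n])
+        shift T and go to state yytable[n];
+      else if (yytable_value_is_error (yytable[n]))
+        report a syntax error;
+      else
+        reduce by rule -yytable[n];                                    */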
+static const yytype_int16 yytable[] =
+{
+     152,   146,   150,    52,   209,   256,   165,   210,   386,   168,
+     116,   117,   159,   433,     5,   163,   153,   163,   241,   164,
+     163,   166,   167,   125,   280,   118,   235,   422,   154,    13,
+      14,    15,   269,   264,    16,   152,    17,    18,    19,    20,
+      21,    22,    23,    24,    25,    26,    27,   334,   118,   119,
+     273,   213,   116,   117,   459,     1,     2,   116,   117,   119,
+     120,   246,   325,   326,    58,   470,   335,   118,   461,   208,
+     120,   473,   118,   313,   313,     7,   456,   265,   476,   314,
+     314,   315,   212,   121,    10,    11,   474,   122,   247,   445,
+     277,   119,   471,    72,    73,   235,   119,   123,   390,    59,
+     155,    68,   120,   327,   174,   124,   121,   120,   324,   391,
+      72,    73,   295,    53,   199,   124,   175,   316,   278,   317,
+     317,   251,   200,    10,    11,   121,   313,   417,   279,   122,
+     121,   296,   314,   252,   122,   201,   435,   221,   202,   232,
+     292,   418,   163,    68,   222,   203,    55,   124,   436,    72,
+      73,   302,   124,   380,   124,    71,    91,    92,   344,   204,
+     176,   419,   177,   381,    93,    82,   169,    83,   178,    72,
+      73,   238,   317,   420,   170,   179,   180,   181,   239,   182,
+      56,   183,   205,   206,   439,   423,    94,    95,   257,   152,
+     184,   258,   259,    98,   440,   260,   350,   171,    57,   446,
+     351,    96,   250,   261,   251,    80,    88,   185,   186,   447,
+      84,   172,    89,   475,   112,    86,   252,   113,   114,   427,
+      81,   262,   402,   403,   404,   405,   406,   407,   408,   409,
+     410,   411,   412,   413,    63,    64,    65,    66,    67,   218,
+     219,    78,    79,    99,   100,   101,   102,   370,   371,   372,
+     373,    10,    11,    71,   227,   104,   382,   383,    87,   105,
+     428,   138,   106,   152,   395,   150,   107,   108,   109,   110,
+     136,   415,   137,   140,   141,   143,   220,   157,   216,   -66,
+     211,   224,   160,   245,   217,   226,   242,   231,   214,   236,
+     237,   152,   270,   243,   244,   249,   350,   255,   267,   272,
+     274,   275,   285,   434,   286,    58,   290,   298,   288,   291,
+    -229,   300,   293,   303,   294,   305,   307,   309,   311,   320,
+     321,   323,   330,   337,   332,   338,   455,   340,   343,   345,
+     353,   346,   352,   354,   356,   358,   359,   363,   357,   365,
+     367,   375,   360,   361,   388,   362,   369,   377,   378,   379,
+     152,   395,   150,   385,   389,   400,   429,   152,   416,   350,
+     424,   425,   426,   431,   452,   448,   427,   441,   442,   449,
+     450,   457,   451,   462,   464,   350,   463,   465,   466,   467,
+     469,   468,   477,   472,   284,   312,   454,   297,     0,   342,
+     142,   438,   228,     0,   147,   148,     0,     0,   271,   287,
+       0,     0,   215
+};
+
+#define yypact_value_is_default(Yystate) \
+  (!!((Yystate) == (-398)))
+
+#define yytable_value_is_error(Yytable_value) \
+  YYID (0)
+
+static const yytype_int16 yycheck[] =
+{
+      89,    89,    89,     9,   124,   182,   106,   125,    23,   109,
+      23,    24,    98,   385,     0,   104,    27,   106,   172,   105,
+     109,   107,   108,    77,    23,    38,   163,   361,    39,     5,
+       6,     7,   186,    26,    10,   124,    12,    13,    14,    15,
+      16,    17,    18,    19,    20,    21,    22,    35,    38,    62,
+     204,   137,    23,    24,   450,     3,     4,    23,    24,    62,
+      73,    29,    41,    42,    65,   462,    54,    38,   451,   123,
+      73,   467,    38,    25,    25,     8,   448,    70,   475,    31,
+      31,    33,   136,    96,    99,   100,   469,   100,    56,   423,
+     208,    62,   464,   113,   114,   232,    62,   110,    62,   100,
+     111,   100,    73,    82,    34,   118,    96,    73,   262,    73,
+     113,   114,   230,   108,    29,   118,    46,    69,   109,    71,
+      71,    27,    37,    99,   100,    96,    25,    26,   119,   100,
+      96,   231,    31,    39,   100,    50,    34,    61,    53,   110,
+     226,    40,   231,   100,    68,    60,    99,   118,    46,   113,
+     114,   237,   118,   109,   118,    99,    29,    30,   278,    74,
+      28,    60,    30,   119,    37,     9,    29,    11,    36,   113,
+     114,    61,    71,    72,    37,    43,    44,    45,    68,    47,
+      99,    49,    97,    98,   109,   362,    59,    60,    48,   278,
+      58,    51,    52,   109,   119,    55,   285,    60,    99,   109,
+     286,    74,    25,    63,    27,   108,   111,    75,    76,   119,
+      99,    74,   117,   106,   101,   117,    39,   104,   105,   112,
+     108,    81,    84,    85,    86,    87,    88,    89,    90,    91,
+      92,    93,    94,    95,    19,    20,    21,    22,    23,    23,
+      24,    26,    27,   101,   102,   103,   104,    77,    78,    79,
+      80,    99,   100,    99,   100,   109,   113,   114,   117,   109,
+     380,    99,   109,   352,   352,   352,   109,   109,   109,   104,
+     109,   357,   109,   109,   100,    23,    23,   111,    99,   111,
+     111,    23,   115,    64,   112,   109,   111,   109,   117,   109,
+     109,   380,    66,   111,   111,   111,   385,   111,   111,   111,
+     111,   111,   109,   389,   109,    65,   112,   104,   117,   112,
+     104,    83,   116,    23,   116,    23,    23,    23,    67,   111,
+     111,   111,    23,   111,    29,    23,   446,    23,   110,   112,
+     110,   104,   118,   111,   109,   112,   112,    23,   109,    23,
+      23,    23,   112,   112,   350,   112,   112,   112,   112,   112,
+     439,   439,   439,   109,   109,    23,    23,   446,    57,   448,
+     112,   112,   112,    23,    66,   109,   112,   112,    32,   109,
+     111,   449,   111,   111,   109,   464,   119,   109,   112,   106,
+     106,   112,   112,   465,   211,   255,   439,   232,    -1,   275,
+      87,   391,   160,    -1,    89,    89,    -1,    -1,   199,   214,
+      -1,    -1,   139
+};
+
+/* YYSTOS[STATE-NUM] -- The (internal number of the) accessing
+   symbol of state STATE-NUM.  */
+static const yytype_uint16 yystos[] =
+{
+       0,     3,     4,   121,   122,     0,   123,     8,   124,   125,
+      99,   100,   262,     5,     6,     7,    10,    12,    13,    14,
+      15,    16,    17,    18,    19,    20,    21,    22,   126,   127,
+     128,   129,   130,   131,   132,   133,   134,   135,   136,   137,
+     138,   141,   170,   171,   178,   179,   180,   242,   244,   245,
+     248,   261,   262,   108,   246,    99,    99,    99,    65,   100,
+     145,   151,   249,   145,   145,   145,   145,   145,   100,   146,
+     159,    99,   113,   114,   144,   166,   168,   241,   145,   145,
+     108,   108,     9,    11,    99,   247,   117,   117,   111,   117,
+     182,    29,    30,    37,    59,    60,    74,   250,   109,   101,
+     102,   103,   104,   164,   109,   109,   109,   109,   109,   109,
+     104,   161,   101,   104,   105,   163,    23,    24,    38,    62,
+      73,    96,   100,   110,   118,   150,   152,   172,   186,   189,
+     225,   229,   232,   236,   238,   239,   109,   109,    99,   243,
+     109,   100,   172,    23,   181,   185,   189,   225,   235,   237,
+     239,   240,   241,    27,    39,   111,   251,   111,   255,   144,
+     115,   165,   142,   241,   144,   142,   144,   144,   142,    29,
+      37,    60,    74,   177,    34,    46,    28,    30,    36,    43,
+      44,    45,    47,    49,    58,    75,    76,   190,   192,   195,
+     197,   199,   203,   206,   208,   210,   212,   215,   224,    29,
+      37,    50,    53,    60,    74,    97,    98,   173,   150,   240,
+     163,   111,   150,   144,   117,   247,    99,   112,    23,    24,
+      23,    61,    68,   252,    23,   258,   109,   100,   166,   167,
+     169,   109,   110,   143,   150,   238,   109,   109,    61,    68,
+     254,   255,   111,   111,   111,    64,    29,    56,   207,   111,
+      25,    27,    39,   196,   253,   111,   253,    48,    51,    52,
+      55,    63,    81,   219,    26,    70,   211,   111,   257,   255,
+      66,   254,   111,   255,   111,   111,   175,   163,   109,   119,
+      23,   153,   154,   155,   159,   109,   109,   249,   117,   183,
+     112,   112,   144,   116,   116,   163,   142,   143,   104,   162,
+      83,   139,   144,    23,   233,    23,   234,    23,   209,    23,
+     202,    67,   202,    25,    31,    33,    69,    71,   191,   201,
+     111,   111,   220,   111,   255,    41,    42,    82,   216,   217,
+      23,   260,    29,   200,    35,    54,   204,   111,    23,   176,
+      23,   174,   176,   110,   240,   112,   104,   160,   147,   148,
+     241,   144,   118,   110,   111,   256,   109,   109,   112,   112,
+     112,   112,   112,    23,   223,    23,   221,    23,   222,   112,
+      77,    78,    79,    80,   205,    23,   218,   112,   112,   112,
+     109,   119,   113,   114,   156,   109,    23,   149,   262,   109,
+      62,    73,   184,   187,   188,   189,   226,   227,   230,   235,
+      23,   259,    84,    85,    86,    87,    88,    89,    90,    91,
+      92,    93,    94,    95,   140,   144,    57,    26,    40,    60,
+      72,   193,   201,   253,   112,   112,   112,   112,   240,    23,
+     157,    23,   158,   148,   144,    34,    46,   213,   215,   109,
+     119,   112,    32,   194,   198,   201,   109,   119,   109,   109,
+     111,   111,    66,   214,   187,   240,   148,   139,   228,   233,
+     231,   234,   111,   119,   109,   109,   112,   106,   112,   106,
+     218,   148,   140,   233,   234,   106,   218,   112
+};
+
+#define yyerrok		(yyerrstatus = 0)
+#define yyclearin	(yychar = YYEMPTY)
+#define YYEMPTY		(-2)
+#define YYEOF		0
+
+#define YYACCEPT	goto yyacceptlab
+#define YYABORT		goto yyabortlab
+#define YYERROR		goto yyerrorlab
+
+
+/* Like YYERROR except do call yyerror.  This remains here temporarily
+   to ease the transition to the new meaning of YYERROR, for GCC.
+   Once GCC version 2 has supplanted version 1, this can go.  However,
+   YYFAIL appears to be in use.  Nevertheless, it is formally deprecated
+   in Bison 2.4.2's NEWS entry, where a plan to phase it out is
+   discussed.  */
+
+#define YYFAIL		goto yyerrlab
+#if defined YYFAIL
+  /* This is here to suppress warnings from the GCC cpp's
+     -Wunused-macros.  Normally we don't worry about that warning, but
+     some users do, and we want to make it easy for users to remove
+     YYFAIL uses, which will produce warnings from Bison 2.5.  */
+#endif
+
+#define YYRECOVERING()  (!!yyerrstatus)
+
+#define YYBACKUP(Token, Value)                                  \
+do                                                              \
+  if (yychar == YYEMPTY)                                        \
+    {                                                           \
+      yychar = (Token);                                         \
+      yylval = (Value);                                         \
+      YYPOPSTACK (yylen);                                       \
+      yystate = *yyssp;                                         \
+      goto yybackup;                                            \
+    }                                                           \
+  else                                                          \
+    {                                                           \
+      yyerror (&yylloc, state, YY_("syntax error: cannot back up")); \
+      YYERROR;							\
+    }								\
+while (YYID (0))
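+
+/* YYBACKUP is for use inside rule actions only: it discards the symbols
+   of the rule being reduced, substitutes Token/Value as the new
+   lookahead, and re-enters the shift logic at yybackup.  It can only
+   succeed when no real lookahead is buffered (yychar == YYEMPTY);
+   otherwise it reports "syntax error: cannot back up" and fails.  */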
+
+/* Error token number */
+#define YYTERROR	1
+#define YYERRCODE	256
+
+
+/* YYLLOC_DEFAULT -- Set CURRENT to span from RHS[1] to RHS[N].
+   If N is 0, then set CURRENT to the empty location which ends
+   the previous symbol: RHS[0] (always defined).  */
+
+#ifndef YYLLOC_DEFAULT
+# define YYLLOC_DEFAULT(Current, Rhs, N)                                \
+    do                                                                  \
+      if (YYID (N))                                                     \
+        {                                                               \
+          (Current).first_line   = YYRHSLOC (Rhs, 1).first_line;        \
+          (Current).first_column = YYRHSLOC (Rhs, 1).first_column;      \
+          (Current).last_line    = YYRHSLOC (Rhs, N).last_line;         \
+          (Current).last_column  = YYRHSLOC (Rhs, N).last_column;       \
+        }                                                               \
+      else                                                              \
+        {                                                               \
+          (Current).first_line   = (Current).last_line   =              \
+            YYRHSLOC (Rhs, 0).last_line;                                \
+          (Current).first_column = (Current).last_column =              \
+            YYRHSLOC (Rhs, 0).last_column;                              \
+        }                                                               \
+    while (YYID (0))
+#endif
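+
+/* In other words, a nonempty rule's default location spans from the
+   first byte of its first RHS symbol to the last byte of its last; for
+   a three-symbol rule, for instance:
+
+      @$.first_line/.first_column  <-  @1.first_line/.first_column
+      @$.last_line/.last_column    <-  @3.last_line/.last_column
+
+   An empty rule (N == 0) collapses to the point just after the previous
+   symbol, so it still has a well-defined position.  */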
+
+#define YYRHSLOC(Rhs, K) ((Rhs)[K])
+
+
+/* YY_LOCATION_PRINT -- Print the location on the stream.
+   This macro was not mandated originally: define it only when we know we
+   won't break user code, i.e. when YYLTYPE is the trivial layout whose
+   fields we know how to print.  */
+
+#ifndef YY_LOCATION_PRINT
+# if defined YYLTYPE_IS_TRIVIAL && YYLTYPE_IS_TRIVIAL
+
+/* Print *YYLOCP on YYO.  Private, do not rely on its existence. */
+
+__attribute__((__unused__))
+#if (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+static unsigned
+yy_location_print_ (FILE *yyo, YYLTYPE const * const yylocp)
+#else
+static unsigned
+yy_location_print_ (yyo, yylocp)
+    FILE *yyo;
+    YYLTYPE const * const yylocp;
+#endif
+{
+  unsigned res = 0;
+  int end_col = 0 != yylocp->last_column ? yylocp->last_column - 1 : 0;
+  if (0 <= yylocp->first_line)
+    {
+      res += fprintf (yyo, "%d", yylocp->first_line);
+      if (0 <= yylocp->first_column)
+        res += fprintf (yyo, ".%d", yylocp->first_column);
+    }
+  if (0 <= yylocp->last_line)
+    {
+      if (yylocp->first_line < yylocp->last_line)
+        {
+          res += fprintf (yyo, "-%d", yylocp->last_line);
+          if (0 <= end_col)
+            res += fprintf (yyo, ".%d", end_col);
+        }
+      else if (0 <= end_col && yylocp->first_column < end_col)
+        res += fprintf (yyo, "-%d", end_col);
+    }
+  return res;
+ }
+
+#  define YY_LOCATION_PRINT(File, Loc)          \
+  yy_location_print_ (File, &(Loc))
+
+# else
+#  define YY_LOCATION_PRINT(File, Loc) ((void) 0)
+# endif
+#endif
+
+
+/* YYLEX -- calling `yylex' with the right arguments.  */
+#ifdef YYLEX_PARAM
+# define YYLEX yylex (&yylval, &yylloc, YYLEX_PARAM)
+#else
+# define YYLEX yylex (&yylval, &yylloc, state)
+#endif
+
+/* Enable debugging if requested.  */
+#if YYDEBUG
+
+# ifndef YYFPRINTF
+#  include <stdio.h> /* INFRINGES ON USER NAME SPACE */
+#  define YYFPRINTF fprintf
+# endif
+
+# define YYDPRINTF(Args)			\
+do {						\
+  if (yydebug)					\
+    YYFPRINTF Args;				\
+} while (YYID (0))
+
+# define YY_SYMBOL_PRINT(Title, Type, Value, Location)			  \
+do {									  \
+  if (yydebug)								  \
+    {									  \
+      YYFPRINTF (stderr, "%s ", Title);					  \
+      yy_symbol_print (stderr,						  \
+		  Type, Value, Location, state); \
+      YYFPRINTF (stderr, "\n");						  \
+    }									  \
+} while (YYID (0))
+
+
+/*--------------------------------.
+| Print this symbol on YYOUTPUT.  |
+`--------------------------------*/
+
+/*ARGSUSED*/
+#if (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+static void
+yy_symbol_value_print (FILE *yyoutput, int yytype, YYSTYPE const * const yyvaluep, YYLTYPE const * const yylocationp, struct asm_parser_state *state)
+#else
+static void
+yy_symbol_value_print (yyoutput, yytype, yyvaluep, yylocationp, state)
+    FILE *yyoutput;
+    int yytype;
+    YYSTYPE const * const yyvaluep;
+    YYLTYPE const * const yylocationp;
+    struct asm_parser_state *state;
+#endif
+{
+  FILE *yyo = yyoutput;
+  YYUSE (yyo);
+  if (!yyvaluep)
+    return;
+  YYUSE (yylocationp);
+  YYUSE (state);
+# ifdef YYPRINT
+  if (yytype < YYNTOKENS)
+    YYPRINT (yyoutput, yytoknum[yytype], *yyvaluep);
+# else
+  YYUSE (yyoutput);
+# endif
+  YYUSE (yytype);
+}
+
+
+/*--------------------------------.
+| Print this symbol on YYOUTPUT.  |
+`--------------------------------*/
+
+#if (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+static void
+yy_symbol_print (FILE *yyoutput, int yytype, YYSTYPE const * const yyvaluep, YYLTYPE const * const yylocationp, struct asm_parser_state *state)
+#else
+static void
+yy_symbol_print (yyoutput, yytype, yyvaluep, yylocationp, state)
+    FILE *yyoutput;
+    int yytype;
+    YYSTYPE const * const yyvaluep;
+    YYLTYPE const * const yylocationp;
+    struct asm_parser_state *state;
+#endif
+{
+  if (yytype < YYNTOKENS)
+    YYFPRINTF (yyoutput, "token %s (", yytname[yytype]);
+  else
+    YYFPRINTF (yyoutput, "nterm %s (", yytname[yytype]);
+
+  YY_LOCATION_PRINT (yyoutput, *yylocationp);
+  YYFPRINTF (yyoutput, ": ");
+  yy_symbol_value_print (yyoutput, yytype, yyvaluep, yylocationp, state);
+  YYFPRINTF (yyoutput, ")");
+}
+
+/*------------------------------------------------------------------.
+| yy_stack_print -- Print the state stack from its BOTTOM up to its |
+| TOP (included).                                                   |
+`------------------------------------------------------------------*/
+
+#if (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+static void
+yy_stack_print (yytype_int16 *yybottom, yytype_int16 *yytop)
+#else
+static void
+yy_stack_print (yybottom, yytop)
+    yytype_int16 *yybottom;
+    yytype_int16 *yytop;
+#endif
+{
+  YYFPRINTF (stderr, "Stack now");
+  for (; yybottom <= yytop; yybottom++)
+    {
+      int yybot = *yybottom;
+      YYFPRINTF (stderr, " %d", yybot);
+    }
+  YYFPRINTF (stderr, "\n");
+}
+
+# define YY_STACK_PRINT(Bottom, Top)				\
+do {								\
+  if (yydebug)							\
+    yy_stack_print ((Bottom), (Top));				\
+} while (YYID (0))
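+/* Example trace emitted when yydebug is nonzero:
+     Stack now 0 1 12 34
+   i.e. the automaton states from the stack bottom up to the top.  */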
+
+
+/*------------------------------------------------.
+| Report that the YYRULE is going to be reduced.  |
+`------------------------------------------------*/
+
+#if (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+static void
+yy_reduce_print (YYSTYPE *yyvsp, YYLTYPE *yylsp, int yyrule, struct asm_parser_state *state)
+#else
+static void
+yy_reduce_print (yyvsp, yylsp, yyrule, state)
+    YYSTYPE *yyvsp;
+    YYLTYPE *yylsp;
+    int yyrule;
+    struct asm_parser_state *state;
+#endif
+{
+  int yynrhs = yyr2[yyrule];
+  int yyi;
+  unsigned long int yylno = yyrline[yyrule];
+  YYFPRINTF (stderr, "Reducing stack by rule %d (line %lu):\n",
+	     yyrule - 1, yylno);
+  /* The symbols being reduced.  */
+  for (yyi = 0; yyi < yynrhs; yyi++)
+    {
+      YYFPRINTF (stderr, "   $%d = ", yyi + 1);
+      yy_symbol_print (stderr, yyrhs[yyprhs[yyrule] + yyi],
+		       &(yyvsp[(yyi + 1) - (yynrhs)]),
+		       &(yylsp[(yyi + 1) - (yynrhs)]), state);
+      YYFPRINTF (stderr, "\n");
+    }
+}
+
+# define YY_REDUCE_PRINT(Rule)		\
+do {					\
+  if (yydebug)				\
+    yy_reduce_print (yyvsp, yylsp, Rule, state); \
+} while (YYID (0))
+
+/* Nonzero means print parse trace.  It is left uninitialized so that
+   multiple parsers can coexist.  */
+int yydebug;
+#else /* !YYDEBUG */
+# define YYDPRINTF(Args)
+# define YY_SYMBOL_PRINT(Title, Type, Value, Location)
+# define YY_STACK_PRINT(Bottom, Top)
+# define YY_REDUCE_PRINT(Rule)
+#endif /* !YYDEBUG */
+
+
+/* YYINITDEPTH -- initial size of the parser's stacks.  */
+#ifndef	YYINITDEPTH
+# define YYINITDEPTH 200
+#endif
+
+/* YYMAXDEPTH -- maximum size the stacks can grow to (effective only
+   if the built-in stack extension method is used).
+
+   Do not make this value too large; the results are undefined if
+   YYSTACK_ALLOC_MAXIMUM < YYSTACK_BYTES (YYMAXDEPTH)
+   evaluated with infinite-precision integer arithmetic.  */
+
+#ifndef YYMAXDEPTH
+# define YYMAXDEPTH 10000
+#endif
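+/* The stack-growth code in yyparse below either defers to a user
+   yyoverflow hook or doubles yystacksize itself, capping it at
+   YYMAXDEPTH and reporting memory exhaustion once that cap is hit.  */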
+
+
+#if YYERROR_VERBOSE
+
+# ifndef yystrlen
+#  if defined __GLIBC__ && defined _STRING_H
+#   define yystrlen strlen
+#  else
+/* Return the length of YYSTR.  */
+#if (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+static YYSIZE_T
+yystrlen (const char *yystr)
+#else
+static YYSIZE_T
+yystrlen (yystr)
+    const char *yystr;
+#endif
+{
+  YYSIZE_T yylen;
+  for (yylen = 0; yystr[yylen]; yylen++)
+    continue;
+  return yylen;
+}
+#  endif
+# endif
+
+# ifndef yystpcpy
+#  if defined __GLIBC__ && defined _STRING_H && defined _GNU_SOURCE
+#   define yystpcpy stpcpy
+#  else
+/* Copy YYSRC to YYDEST, returning the address of the terminating '\0' in
+   YYDEST.  */
+#if (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+static char *
+yystpcpy (char *yydest, const char *yysrc)
+#else
+static char *
+yystpcpy (yydest, yysrc)
+    char *yydest;
+    const char *yysrc;
+#endif
+{
+  char *yyd = yydest;
+  const char *yys = yysrc;
+
+  while ((*yyd++ = *yys++) != '\0')
+    continue;
+
+  return yyd - 1;
+}
+#  endif
+# endif
+
+# ifndef yytnamerr
+/* Copy to YYRES the contents of YYSTR after stripping away unnecessary
+   quotes and backslashes, so that it's suitable for yyerror.  The
+   heuristic is that double-quoting is unnecessary unless the string
+   contains an apostrophe, a comma, or backslash (other than
+   backslash-backslash).  YYSTR is taken from yytname.  If YYRES is
+   null, do not copy; instead, return the length of what the result
+   would have been.  */
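+/* For example, the yytname entry "\"invalid\"" (quote characters
+   included) yields the seven characters `invalid'; a name containing an
+   apostrophe, comma, or stray backslash is copied verbatim, quotes and
+   all.  */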
+static YYSIZE_T
+yytnamerr (char *yyres, const char *yystr)
+{
+  if (*yystr == '"')
+    {
+      YYSIZE_T yyn = 0;
+      char const *yyp = yystr;
+
+      for (;;)
+	switch (*++yyp)
+	  {
+	  case '\'':
+	  case ',':
+	    goto do_not_strip_quotes;
+
+	  case '\\':
+	    if (*++yyp != '\\')
+	      goto do_not_strip_quotes;
+	    /* Fall through.  */
+	  default:
+	    if (yyres)
+	      yyres[yyn] = *yyp;
+	    yyn++;
+	    break;
+
+	  case '"':
+	    if (yyres)
+	      yyres[yyn] = '\0';
+	    return yyn;
+	  }
+    do_not_strip_quotes: ;
+    }
+
+  if (! yyres)
+    return yystrlen (yystr);
+
+  return yystpcpy (yyres, yystr) - yyres;
+}
+# endif
+
+/* Copy into *YYMSG, which is of size *YYMSG_ALLOC, an error message
+   about the unexpected token YYTOKEN for the state stack whose top is
+   YYSSP.
+
+   Return 0 if *YYMSG was successfully written.  Return 1 if *YYMSG is
+   not large enough to hold the message.  In that case, also set
+   *YYMSG_ALLOC to the required number of bytes.  Return 2 if the
+   required number of bytes is too large to store.  */
+static int
+yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
+                yytype_int16 *yyssp, int yytoken)
+{
+  YYSIZE_T yysize0 = yytnamerr (YY_NULL, yytname[yytoken]);
+  YYSIZE_T yysize = yysize0;
+  enum { YYERROR_VERBOSE_ARGS_MAXIMUM = 5 };
+  /* Internationalized format string. */
+  const char *yyformat = YY_NULL;
+  /* Arguments of yyformat. */
+  char const *yyarg[YYERROR_VERBOSE_ARGS_MAXIMUM];
+  /* Number of reported tokens (one for the "unexpected", one per
+     "expected"). */
+  int yycount = 0;
+
+  /* There are many possibilities here to consider:
+     - Assume YYFAIL is not used.  It's too flawed to consider.  See
+       <http://lists.gnu.org/archive/html/bison-patches/2009-12/msg00024.html>
+       for details.  YYERROR is fine as it does not invoke this
+       function.
+     - If this state is a consistent state with a default action, then
+       the only way this function was invoked is if the default action
+       is an error action.  In that case, don't check for expected
+       tokens because there are none.
+     - The only way there can be no lookahead present (in yychar) is if
+       this state is a consistent state with a default action.  Thus,
+       detecting the absence of a lookahead is sufficient to determine
+       that there is no unexpected or expected token to report.  In that
+       case, just report a simple "syntax error".
+     - Don't assume there isn't a lookahead just because this state is a
+       consistent state with a default action.  There might have been a
+       previous inconsistent state, consistent state with a non-default
+       action, or user semantic action that manipulated yychar.
+     - Of course, the expected token list depends on states to have
+       correct lookahead information, and it depends on the parser not
+       to perform extra reductions after fetching a lookahead from the
+       scanner and before detecting a syntax error.  Thus, state merging
+       (from LALR or IELR) and default reductions corrupt the expected
+       token list.  However, the list is correct for canonical LR with
+       one exception: it will still contain any token that will not be
+       accepted due to an error action in a later state.
+  */
+  if (yytoken != YYEMPTY)
+    {
+      int yyn = yypact[*yyssp];
+      yyarg[yycount++] = yytname[yytoken];
+      if (!yypact_value_is_default (yyn))
+        {
+          /* Start YYX at -YYN if negative to avoid negative indexes in
+             YYCHECK.  In other words, skip the first -YYN actions for
+             this state because they are default actions.  */
+          int yyxbegin = yyn < 0 ? -yyn : 0;
+          /* Stay within bounds of both yycheck and yytname.  */
+          int yychecklim = YYLAST - yyn + 1;
+          int yyxend = yychecklim < YYNTOKENS ? yychecklim : YYNTOKENS;
+          int yyx;
+
+          for (yyx = yyxbegin; yyx < yyxend; ++yyx)
+            if (yycheck[yyx + yyn] == yyx && yyx != YYTERROR
+                && !yytable_value_is_error (yytable[yyx + yyn]))
+              {
+                if (yycount == YYERROR_VERBOSE_ARGS_MAXIMUM)
+                  {
+                    yycount = 1;
+                    yysize = yysize0;
+                    break;
+                  }
+                yyarg[yycount++] = yytname[yyx];
+                {
+                  YYSIZE_T yysize1 = yysize + yytnamerr (YY_NULL, yytname[yyx]);
+                  if (! (yysize <= yysize1
+                         && yysize1 <= YYSTACK_ALLOC_MAXIMUM))
+                    return 2;
+                  yysize = yysize1;
+                }
+              }
+        }
+    }
+
+  switch (yycount)
+    {
+# define YYCASE_(N, S)                      \
+      case N:                               \
+        yyformat = S;                       \
+      break
+      YYCASE_(0, YY_("syntax error"));
+      YYCASE_(1, YY_("syntax error, unexpected %s"));
+      YYCASE_(2, YY_("syntax error, unexpected %s, expecting %s"));
+      YYCASE_(3, YY_("syntax error, unexpected %s, expecting %s or %s"));
+      YYCASE_(4, YY_("syntax error, unexpected %s, expecting %s or %s or %s"));
+      YYCASE_(5, YY_("syntax error, unexpected %s, expecting %s or %s or %s or %s"));
+# undef YYCASE_
+    }
+
+  {
+    YYSIZE_T yysize1 = yysize + yystrlen (yyformat);
+    if (! (yysize <= yysize1 && yysize1 <= YYSTACK_ALLOC_MAXIMUM))
+      return 2;
+    yysize = yysize1;
+  }
+
+  if (*yymsg_alloc < yysize)
+    {
+      *yymsg_alloc = 2 * yysize;
+      if (! (yysize <= *yymsg_alloc
+             && *yymsg_alloc <= YYSTACK_ALLOC_MAXIMUM))
+        *yymsg_alloc = YYSTACK_ALLOC_MAXIMUM;
+      return 1;
+    }
+
+  /* Avoid sprintf, as that infringes on the user's name space.
+     Don't have undefined behavior even if the translation
+     produced a string with the wrong number of "%s"s.  */
+  {
+    char *yyp = *yymsg;
+    int yyi = 0;
+    while ((*yyp = *yyformat) != '\0')
+      if (*yyp == '%' && yyformat[1] == 's' && yyi < yycount)
+        {
+          yyp += yytnamerr (yyp, yyarg[yyi++]);
+          yyformat += 2;
+        }
+      else
+        {
+          yyp++;
+          yyformat++;
+        }
+  }
+  return 0;
+}
+#endif /* YYERROR_VERBOSE */
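+/* Sketch of how a caller honors the contract above: on a return of 1,
+   grow *yymsg to the new *yymsg_alloc and call yysyntax_error again; on
+   2, fall back to a fixed "syntax error" message; on 0, hand the
+   completed *yymsg to yyerror.  */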
+
+/*-----------------------------------------------.
+| Release the memory associated to this symbol.  |
+`-----------------------------------------------*/
+
+/*ARGSUSED*/
+#if (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+static void
+yydestruct (const char *yymsg, int yytype, YYSTYPE *yyvaluep, YYLTYPE *yylocationp, struct asm_parser_state *state)
+#else
+static void
+yydestruct (yymsg, yytype, yyvaluep, yylocationp, state)
+    const char *yymsg;
+    int yytype;
+    YYSTYPE *yyvaluep;
+    YYLTYPE *yylocationp;
+    struct asm_parser_state *state;
+#endif
+{
+  YYUSE (yyvaluep);
+  YYUSE (yylocationp);
+  YYUSE (state);
+
+  if (!yymsg)
+    yymsg = "Deleting";
+  YY_SYMBOL_PRINT (yymsg, yytype, yyvaluep, yylocationp);
+
+  YYUSE (yytype);
+}
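+/* This grammar appears to declare no %destructor actions: apart from
+   the optional debug print, yydestruct is a no-op, and the rule actions
+   below free() their own string operands instead.  */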
+
+
+
+
+/*----------.
+| yyparse.  |
+`----------*/
+
+#ifdef YYPARSE_PARAM
+#if (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+int
+yyparse (void *YYPARSE_PARAM)
+#else
+int
+yyparse (YYPARSE_PARAM)
+    void *YYPARSE_PARAM;
+#endif
+#else /* ! YYPARSE_PARAM */
+#if (defined __STDC__ || defined __C99__FUNC__ \
+     || defined __cplusplus || defined _MSC_VER)
+int
+yyparse (struct asm_parser_state *state)
+#else
+int
+yyparse (state)
+    struct asm_parser_state *state;
+#endif
+#endif
+{
+/* The lookahead symbol.  */
+int yychar;
+
+
+#if defined __GNUC__ && 407 <= __GNUC__ * 100 + __GNUC_MINOR__
+/* Suppress an incorrect diagnostic about yylval being uninitialized.  */
+# define YY_IGNORE_MAYBE_UNINITIALIZED_BEGIN \
+    _Pragma ("GCC diagnostic push") \
+    _Pragma ("GCC diagnostic ignored \"-Wuninitialized\"")\
+    _Pragma ("GCC diagnostic ignored \"-Wmaybe-uninitialized\"")
+# define YY_IGNORE_MAYBE_UNINITIALIZED_END \
+    _Pragma ("GCC diagnostic pop")
+#else
+/* Default value used for initialization, for pacifying older GCCs
+   or non-GCC compilers.  */
+static YYSTYPE yyval_default;
+# define YY_INITIAL_VALUE(Value) = Value
+#endif
+static YYLTYPE yyloc_default
+# if defined YYLTYPE_IS_TRIVIAL && YYLTYPE_IS_TRIVIAL
+  = { 1, 1, 1, 1 }
+# endif
+;
+#ifndef YY_IGNORE_MAYBE_UNINITIALIZED_BEGIN
+# define YY_IGNORE_MAYBE_UNINITIALIZED_BEGIN
+# define YY_IGNORE_MAYBE_UNINITIALIZED_END
+#endif
+#ifndef YY_INITIAL_VALUE
+# define YY_INITIAL_VALUE(Value) /* Nothing. */
+#endif
+
+/* The semantic value of the lookahead symbol.  */
+YYSTYPE yylval YY_INITIAL_VALUE(yyval_default);
+
+/* Location data for the lookahead symbol.  */
+YYLTYPE yylloc = yyloc_default;
+
+
+    /* Number of syntax errors so far.  */
+    int yynerrs;
+
+    int yystate;
+    /* Number of tokens to shift before error messages enabled.  */
+    int yyerrstatus;
+
+    /* The stacks and their tools:
+       `yyss': related to states.
+       `yyvs': related to semantic values.
+       `yyls': related to locations.
+
+       Refer to the stacks through separate pointers, to allow yyoverflow
+       to reallocate them elsewhere.  */
+
+    /* The state stack.  */
+    yytype_int16 yyssa[YYINITDEPTH];
+    yytype_int16 *yyss;
+    yytype_int16 *yyssp;
+
+    /* The semantic value stack.  */
+    YYSTYPE yyvsa[YYINITDEPTH];
+    YYSTYPE *yyvs;
+    YYSTYPE *yyvsp;
+
+    /* The location stack.  */
+    YYLTYPE yylsa[YYINITDEPTH];
+    YYLTYPE *yyls;
+    YYLTYPE *yylsp;
+
+    /* The locations where the error started and ended.  */
+    YYLTYPE yyerror_range[3];
+
+    YYSIZE_T yystacksize;
+
+  int yyn;
+  int yyresult;
+  /* Lookahead token as an internal (translated) token number.  */
+  int yytoken = 0;
+  /* The variables used to return semantic value and location from the
+     action routines.  */
+  YYSTYPE yyval;
+  YYLTYPE yyloc;
+
+#if YYERROR_VERBOSE
+  /* Buffer for error messages, and its allocated size.  */
+  char yymsgbuf[128];
+  char *yymsg = yymsgbuf;
+  YYSIZE_T yymsg_alloc = sizeof yymsgbuf;
+#endif
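+  /* When YYERROR_VERBOSE is enabled, yymsgbuf starts at 128 bytes; if
+     yysyntax_error (above) returns 1, the error-handling code is
+     expected to switch to a heap buffer of yymsg_alloc bytes and
+     retry.  */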
+
+#define YYPOPSTACK(N)   (yyvsp -= (N), yyssp -= (N), yylsp -= (N))
+
+  /* The number of symbols on the RHS of the reduced rule.
+     Keep to zero when no symbol should be popped.  */
+  int yylen = 0;
+
+  yyssp = yyss = yyssa;
+  yyvsp = yyvs = yyvsa;
+  yylsp = yyls = yylsa;
+  yystacksize = YYINITDEPTH;
+
+  YYDPRINTF ((stderr, "Starting parse\n"));
+
+  yystate = 0;
+  yyerrstatus = 0;
+  yynerrs = 0;
+  yychar = YYEMPTY; /* Cause a token to be read.  */
+  yylsp[0] = yylloc;
+  goto yysetstate;
+
+/*------------------------------------------------------------.
+| yynewstate -- Push a new state, which is found in yystate.  |
+`------------------------------------------------------------*/
+ yynewstate:
+  /* In all cases, when you get here, the value and location stacks
+     have just been pushed.  So pushing a state here evens the stacks.  */
+  yyssp++;
+
+ yysetstate:
+  *yyssp = yystate;
+
+  if (yyss + yystacksize - 1 <= yyssp)
+    {
+      /* Get the current used size of the three stacks, in elements.  */
+      YYSIZE_T yysize = yyssp - yyss + 1;
+
+#ifdef yyoverflow
+      {
+	/* Give user a chance to reallocate the stack.  Use copies of
+	   these so that the &'s don't force the real ones into
+	   memory.  */
+	YYSTYPE *yyvs1 = yyvs;
+	yytype_int16 *yyss1 = yyss;
+	YYLTYPE *yyls1 = yyls;
+
+	/* Each stack pointer address is followed by the size of the
+	   data in use in that stack, in bytes.  This used to be a
+	   conditional around just the two extra args, but that might
+	   be undefined if yyoverflow is a macro.  */
+	yyoverflow (YY_("memory exhausted"),
+		    &yyss1, yysize * sizeof (*yyssp),
+		    &yyvs1, yysize * sizeof (*yyvsp),
+		    &yyls1, yysize * sizeof (*yylsp),
+		    &yystacksize);
+
+	yyls = yyls1;
+	yyss = yyss1;
+	yyvs = yyvs1;
+      }
+#else /* no yyoverflow */
+# ifndef YYSTACK_RELOCATE
+      goto yyexhaustedlab;
+# else
+      /* Extend the stack our own way.  */
+      if (YYMAXDEPTH <= yystacksize)
+	goto yyexhaustedlab;
+      yystacksize *= 2;
+      if (YYMAXDEPTH < yystacksize)
+	yystacksize = YYMAXDEPTH;
+
+      {
+	yytype_int16 *yyss1 = yyss;
+	union yyalloc *yyptr =
+	  (union yyalloc *) YYSTACK_ALLOC (YYSTACK_BYTES (yystacksize));
+	if (! yyptr)
+	  goto yyexhaustedlab;
+	YYSTACK_RELOCATE (yyss_alloc, yyss);
+	YYSTACK_RELOCATE (yyvs_alloc, yyvs);
+	YYSTACK_RELOCATE (yyls_alloc, yyls);
+#  undef YYSTACK_RELOCATE
+	if (yyss1 != yyssa)
+	  YYSTACK_FREE (yyss1);
+      }
+# endif
+#endif /* no yyoverflow */
+
+      yyssp = yyss + yysize - 1;
+      yyvsp = yyvs + yysize - 1;
+      yylsp = yyls + yysize - 1;
+
+      YYDPRINTF ((stderr, "Stack size increased to %lu\n",
+		  (unsigned long int) yystacksize));
+
+      if (yyss + yystacksize - 1 <= yyssp)
+	YYABORT;
+    }
+
+  YYDPRINTF ((stderr, "Entering state %d\n", yystate));
+
+  if (yystate == YYFINAL)
+    YYACCEPT;
+
+  goto yybackup;
+
+/*-----------.
+| yybackup.  |
+`-----------*/
+yybackup:
+
+  /* Do appropriate processing given the current state.  Read a
+     lookahead token if we need one and don't already have one.  */
+
+  /* First try to decide what to do without reference to lookahead token.  */
+  yyn = yypact[yystate];
+  if (yypact_value_is_default (yyn))
+    goto yydefault;
+
+  /* Not known => get a lookahead token if we don't already have one.  */
+
+  /* YYCHAR is either YYEMPTY or YYEOF or a valid lookahead symbol.  */
+  if (yychar == YYEMPTY)
+    {
+      YYDPRINTF ((stderr, "Reading a token: "));
+      yychar = YYLEX;
+    }
+
+  if (yychar <= YYEOF)
+    {
+      yychar = yytoken = YYEOF;
+      YYDPRINTF ((stderr, "Now at end of input.\n"));
+    }
+  else
+    {
+      yytoken = YYTRANSLATE (yychar);
+      YY_SYMBOL_PRINT ("Next token is", yytoken, &yylval, &yylloc);
+    }
+
+  /* If the proper action on seeing token YYTOKEN is to reduce or to
+     detect an error, take that action.  */
+  yyn += yytoken;
+  if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
+    goto yydefault;
+  yyn = yytable[yyn];
+  if (yyn <= 0)
+    {
+      if (yytable_value_is_error (yyn))
+        goto yyerrlab;
+      yyn = -yyn;
+      goto yyreduce;
+    }
+
+  /* Count tokens shifted since error; after three, turn off error
+     status.  */
+  if (yyerrstatus)
+    yyerrstatus--;
+
+  /* Shift the lookahead token.  */
+  YY_SYMBOL_PRINT ("Shifting", yytoken, &yylval, &yylloc);
+
+  /* Discard the shifted token.  */
+  yychar = YYEMPTY;
+
+  yystate = yyn;
+  YY_IGNORE_MAYBE_UNINITIALIZED_BEGIN
+  *++yyvsp = yylval;
+  YY_IGNORE_MAYBE_UNINITIALIZED_END
+  *++yylsp = yylloc;
+  goto yynewstate;
+
+
+/*-----------------------------------------------------------.
+| yydefault -- do the default action for the current state.  |
+`-----------------------------------------------------------*/
+yydefault:
+  yyn = yydefact[yystate];
+  if (yyn == 0)
+    goto yyerrlab;
+  goto yyreduce;
+
+
+/*-----------------------------.
+| yyreduce -- Do a reduction.  |
+`-----------------------------*/
+yyreduce:
+  /* yyn is the number of a rule to reduce with.  */
+  yylen = yyr2[yyn];
+
+  /* If YYLEN is nonzero, implement the default value of the action:
+     `$$ = $1'.
+
+     Otherwise, the following line sets YYVAL to garbage.
+     This behavior is undocumented and Bison
+     users should not rely upon it.  Assigning to YYVAL
+     unconditionally makes the parser a bit smaller, and it avoids a
+     GCC warning that YYVAL may be used uninitialized.  */
+  yyval = yyvsp[1-yylen];
+
+  /* Default location.  */
+  YYLLOC_DEFAULT (yyloc, (yylsp - yylen), yylen);
+  YY_REDUCE_PRINT (yyn);
+  switch (yyn)
+    {
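+      /* In the actions that follow, Bison expands $N of an M-symbol
+         rule to (yyvsp[(N) - (M)].field) and @N to (yylsp[(N) - (M)]);
+         e.g. in case 24 below, (yyvsp[(2) - (4)].dst_reg) is $2 of a
+         four-symbol right-hand side.  */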
+        case 3:
+/* Line 1787 of yacc.c  */
+#line 288 "program/program_parse.y"
+    {
+	   if (state->prog->Target != GL_VERTEX_PROGRAM_ARB) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid fragment program header");
+
+	   }
+	   state->mode = ARB_vertex;
+	}
+    break;
+
+  case 4:
+/* Line 1787 of yacc.c  */
+#line 296 "program/program_parse.y"
+    {
+	   if (state->prog->Target != GL_FRAGMENT_PROGRAM_ARB) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid vertex program header");
+	   }
+	   state->mode = ARB_fragment;
+
+	   state->option.TexRect =
+	      (state->ctx->Extensions.NV_texture_rectangle != GL_FALSE);
+	}
+    break;
+
+  case 7:
+/* Line 1787 of yacc.c  */
+#line 312 "program/program_parse.y"
+    {
+	   int valid = 0;
+
+	   if (state->mode == ARB_vertex) {
+	      valid = _mesa_ARBvp_parse_option(state, (yyvsp[(2) - (3)].string));
+	   } else if (state->mode == ARB_fragment) {
+	      valid = _mesa_ARBfp_parse_option(state, (yyvsp[(2) - (3)].string));
+	   }
+
+
+	   free((yyvsp[(2) - (3)].string));
+
+	   if (!valid) {
+	      const char *const err_str = (state->mode == ARB_vertex)
+		 ? "invalid ARB vertex program option"
+		 : "invalid ARB fragment program option";
+
+	      yyerror(& (yylsp[(2) - (3)]), state, err_str);
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 10:
+/* Line 1787 of yacc.c  */
+#line 340 "program/program_parse.y"
+    {
+	   if ((yyvsp[(1) - (2)].inst) != NULL) {
+	      if (state->inst_tail == NULL) {
+		 state->inst_head = (yyvsp[(1) - (2)].inst);
+	      } else {
+		 state->inst_tail->next = (yyvsp[(1) - (2)].inst);
+	      }
+
+	      state->inst_tail = (yyvsp[(1) - (2)].inst);
+	      (yyvsp[(1) - (2)].inst)->next = NULL;
+
+	      state->prog->NumInstructions++;
+	   }
+	}
+    break;
+
+  case 12:
+/* Line 1787 of yacc.c  */
+#line 358 "program/program_parse.y"
+    {
+	   (yyval.inst) = (yyvsp[(1) - (1)].inst);
+	   state->prog->NumAluInstructions++;
+	}
+    break;
+
+  case 13:
+/* Line 1787 of yacc.c  */
+#line 363 "program/program_parse.y"
+    {
+	   (yyval.inst) = (yyvsp[(1) - (1)].inst);
+	   state->prog->NumTexInstructions++;
+	}
+    break;
+
+  case 24:
+/* Line 1787 of yacc.c  */
+#line 384 "program/program_parse.y"
+    {
+	   (yyval.inst) = asm_instruction_ctor(OPCODE_ARL, & (yyvsp[(2) - (4)].dst_reg), & (yyvsp[(4) - (4)].src_reg), NULL, NULL);
+	}
+    break;
+
+  case 25:
+/* Line 1787 of yacc.c  */
+#line 390 "program/program_parse.y"
+    {
+	   if ((yyvsp[(1) - (4)].temp_inst).Opcode == OPCODE_DDY)
+	      state->fragment.UsesDFdy = 1;
+	   (yyval.inst) = asm_instruction_copy_ctor(& (yyvsp[(1) - (4)].temp_inst), & (yyvsp[(2) - (4)].dst_reg), & (yyvsp[(4) - (4)].src_reg), NULL, NULL);
+	}
+    break;
+
+  case 26:
+/* Line 1787 of yacc.c  */
+#line 398 "program/program_parse.y"
+    {
+	   (yyval.inst) = asm_instruction_copy_ctor(& (yyvsp[(1) - (4)].temp_inst), & (yyvsp[(2) - (4)].dst_reg), & (yyvsp[(4) - (4)].src_reg), NULL, NULL);
+	}
+    break;
+
+  case 27:
+/* Line 1787 of yacc.c  */
+#line 404 "program/program_parse.y"
+    {
+	   (yyval.inst) = asm_instruction_copy_ctor(& (yyvsp[(1) - (6)].temp_inst), & (yyvsp[(2) - (6)].dst_reg), & (yyvsp[(4) - (6)].src_reg), & (yyvsp[(6) - (6)].src_reg), NULL);
+	}
+    break;
+
+  case 28:
+/* Line 1787 of yacc.c  */
+#line 411 "program/program_parse.y"
+    {
+	   (yyval.inst) = asm_instruction_copy_ctor(& (yyvsp[(1) - (6)].temp_inst), & (yyvsp[(2) - (6)].dst_reg), & (yyvsp[(4) - (6)].src_reg), & (yyvsp[(6) - (6)].src_reg), NULL);
+	}
+    break;
+
+  case 29:
+/* Line 1787 of yacc.c  */
+#line 418 "program/program_parse.y"
+    {
+	   (yyval.inst) = asm_instruction_copy_ctor(& (yyvsp[(1) - (8)].temp_inst), & (yyvsp[(2) - (8)].dst_reg), & (yyvsp[(4) - (8)].src_reg), & (yyvsp[(6) - (8)].src_reg), & (yyvsp[(8) - (8)].src_reg));
+	}
+    break;
+
+  case 30:
+/* Line 1787 of yacc.c  */
+#line 424 "program/program_parse.y"
+    {
+	   (yyval.inst) = asm_instruction_copy_ctor(& (yyvsp[(1) - (8)].temp_inst), & (yyvsp[(2) - (8)].dst_reg), & (yyvsp[(4) - (8)].src_reg), NULL, NULL);
+	   if ((yyval.inst) != NULL) {
+	      const GLbitfield tex_mask = (1U << (yyvsp[(6) - (8)].integer));
+	      GLbitfield shadow_tex = 0;
+	      GLbitfield target_mask = 0;
+
+
+	      (yyval.inst)->Base.TexSrcUnit = (yyvsp[(6) - (8)].integer);
+
+	      if ((yyvsp[(8) - (8)].integer) < 0) {
+		 shadow_tex = tex_mask;
+
+		 (yyval.inst)->Base.TexSrcTarget = -(yyvsp[(8) - (8)].integer);
+		 (yyval.inst)->Base.TexShadow = 1;
+	      } else {
+		 (yyval.inst)->Base.TexSrcTarget = (yyvsp[(8) - (8)].integer);
+	      }
+
+	      target_mask = (1U << (yyval.inst)->Base.TexSrcTarget);
+
+	      /* If this texture unit was previously accessed and that access
+	       * had a different texture target, generate an error.
+	       *
+	       * If this texture unit was previously accessed and that access
+	       * had a different shadow mode, generate an error.
+	       */
+	      if ((state->prog->TexturesUsed[(yyvsp[(6) - (8)].integer)] != 0)
+		  && ((state->prog->TexturesUsed[(yyvsp[(6) - (8)].integer)] != target_mask)
+		      || ((state->prog->ShadowSamplers & tex_mask)
+			  != shadow_tex))) {
+		 yyerror(& (yylsp[(8) - (8)]), state,
+			 "multiple targets used on one texture image unit");
+		 YYERROR;
+	      }
+
+
+	      state->prog->TexturesUsed[(yyvsp[(6) - (8)].integer)] |= target_mask;
+	      state->prog->ShadowSamplers |= shadow_tex;
+	   }
+	}
+    break;
+
+  case 31:
+/* Line 1787 of yacc.c  */
+#line 468 "program/program_parse.y"
+    {
+	   (yyval.inst) = asm_instruction_ctor(OPCODE_KIL, NULL, & (yyvsp[(2) - (2)].src_reg), NULL, NULL);
+	   state->fragment.UsesKill = 1;
+	}
+    break;
+
+  case 32:
+/* Line 1787 of yacc.c  */
+#line 473 "program/program_parse.y"
+    {
+	   (yyval.inst) = asm_instruction_ctor(OPCODE_KIL_NV, NULL, NULL, NULL, NULL);
+	   (yyval.inst)->Base.DstReg.CondMask = (yyvsp[(2) - (2)].dst_reg).CondMask;
+	   (yyval.inst)->Base.DstReg.CondSwizzle = (yyvsp[(2) - (2)].dst_reg).CondSwizzle;
+	   state->fragment.UsesKill = 1;
+	}
+    break;
+
+  case 33:
+/* Line 1787 of yacc.c  */
+#line 482 "program/program_parse.y"
+    {
+	   (yyval.inst) = asm_instruction_copy_ctor(& (yyvsp[(1) - (12)].temp_inst), & (yyvsp[(2) - (12)].dst_reg), & (yyvsp[(4) - (12)].src_reg), & (yyvsp[(6) - (12)].src_reg), & (yyvsp[(8) - (12)].src_reg));
+	   if ((yyval.inst) != NULL) {
+	      const GLbitfield tex_mask = (1U << (yyvsp[(10) - (12)].integer));
+	      GLbitfield shadow_tex = 0;
+	      GLbitfield target_mask = 0;
+
+
+	      (yyval.inst)->Base.TexSrcUnit = (yyvsp[(10) - (12)].integer);
+
+	      if ((yyvsp[(12) - (12)].integer) < 0) {
+		 shadow_tex = tex_mask;
+
+		 (yyval.inst)->Base.TexSrcTarget = -(yyvsp[(12) - (12)].integer);
+		 (yyval.inst)->Base.TexShadow = 1;
+	      } else {
+		 (yyval.inst)->Base.TexSrcTarget = (yyvsp[(12) - (12)].integer);
+	      }
+
+	      target_mask = (1U << (yyval.inst)->Base.TexSrcTarget);
+
+	      /* If this texture unit was previously accessed and that access
+	       * had a different texture target, generate an error.
+	       *
+	       * If this texture unit was previously accessed and that access
+	       * had a different shadow mode, generate an error.
+	       */
+	      if ((state->prog->TexturesUsed[(yyvsp[(10) - (12)].integer)] != 0)
+		  && ((state->prog->TexturesUsed[(yyvsp[(10) - (12)].integer)] != target_mask)
+		      || ((state->prog->ShadowSamplers & tex_mask)
+			  != shadow_tex))) {
+		 yyerror(& (yylsp[(12) - (12)]), state,
+			 "multiple targets used on one texture image unit");
+		 YYERROR;
+	      }
+
+
+	      state->prog->TexturesUsed[(yyvsp[(10) - (12)].integer)] |= target_mask;
+	      state->prog->ShadowSamplers |= shadow_tex;
+	   }
+	}
+    break;
+
+  case 34:
+/* Line 1787 of yacc.c  */
+#line 526 "program/program_parse.y"
+    {
+	   (yyval.integer) = (yyvsp[(2) - (2)].integer);
+	}
+    break;
+
+  case 35:
+/* Line 1787 of yacc.c  */
+#line 531 "program/program_parse.y"
+    { (yyval.integer) = TEXTURE_1D_INDEX; }
+    break;
+
+  case 36:
+/* Line 1787 of yacc.c  */
+#line 532 "program/program_parse.y"
+    { (yyval.integer) = TEXTURE_2D_INDEX; }
+    break;
+
+  case 37:
+/* Line 1787 of yacc.c  */
+#line 533 "program/program_parse.y"
+    { (yyval.integer) = TEXTURE_3D_INDEX; }
+    break;
+
+  case 38:
+/* Line 1787 of yacc.c  */
+#line 534 "program/program_parse.y"
+    { (yyval.integer) = TEXTURE_CUBE_INDEX; }
+    break;
+
+  case 39:
+/* Line 1787 of yacc.c  */
+#line 535 "program/program_parse.y"
+    { (yyval.integer) = TEXTURE_RECT_INDEX; }
+    break;
+
+  case 40:
+/* Line 1787 of yacc.c  */
+#line 536 "program/program_parse.y"
+    { (yyval.integer) = -TEXTURE_1D_INDEX; }
+    break;
+
+  case 41:
+/* Line 1787 of yacc.c  */
+#line 537 "program/program_parse.y"
+    { (yyval.integer) = -TEXTURE_2D_INDEX; }
+    break;
+
+  case 42:
+/* Line 1787 of yacc.c  */
+#line 538 "program/program_parse.y"
+    { (yyval.integer) = -TEXTURE_RECT_INDEX; }
+    break;
+
+  case 43:
+/* Line 1787 of yacc.c  */
+#line 539 "program/program_parse.y"
+    { (yyval.integer) = TEXTURE_1D_ARRAY_INDEX; }
+    break;
+
+  case 44:
+/* Line 1787 of yacc.c  */
+#line 540 "program/program_parse.y"
+    { (yyval.integer) = TEXTURE_2D_ARRAY_INDEX; }
+    break;
+
+  case 45:
+/* Line 1787 of yacc.c  */
+#line 541 "program/program_parse.y"
+    { (yyval.integer) = -TEXTURE_1D_ARRAY_INDEX; }
+    break;
+
+  case 46:
+/* Line 1787 of yacc.c  */
+#line 542 "program/program_parse.y"
+    { (yyval.integer) = -TEXTURE_2D_ARRAY_INDEX; }
+    break;
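+
+  /* Note: the negated TEXTURE_*_INDEX values above (cases 40-42, 45 and
+     46, presumably the SHADOW targets) are decoded by the texture
+     instruction actions (cases 30 and 33), which flip the sign back and
+     set Base.TexShadow.  */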
+
+  case 47:
+/* Line 1787 of yacc.c  */
+#line 546 "program/program_parse.y"
+    {
+	   /* FIXME: Is this correct?  Should the extendedSwizzle be applied
+	    * FIXME: to the existing swizzle?
+	    */
+	   (yyvsp[(4) - (6)].src_reg).Base.Swizzle = (yyvsp[(6) - (6)].swiz_mask).swizzle;
+	   (yyvsp[(4) - (6)].src_reg).Base.Negate = (yyvsp[(6) - (6)].swiz_mask).mask;
+
+	   (yyval.inst) = asm_instruction_copy_ctor(& (yyvsp[(1) - (6)].temp_inst), & (yyvsp[(2) - (6)].dst_reg), & (yyvsp[(4) - (6)].src_reg), NULL, NULL);
+	}
+    break;
+
+  case 48:
+/* Line 1787 of yacc.c  */
+#line 558 "program/program_parse.y"
+    {
+	   (yyval.src_reg) = (yyvsp[(2) - (2)].src_reg);
+
+	   if ((yyvsp[(1) - (2)].negate)) {
+	      (yyval.src_reg).Base.Negate = ~(yyval.src_reg).Base.Negate;
+	   }
+	}
+    break;
+
+  case 49:
+/* Line 1787 of yacc.c  */
+#line 566 "program/program_parse.y"
+    {
+	   (yyval.src_reg) = (yyvsp[(3) - (4)].src_reg);
+
+	   if (!state->option.NV_fragment) {
+	      yyerror(& (yylsp[(2) - (4)]), state, "unexpected character '|'");
+	      YYERROR;
+	   }
+
+	   if ((yyvsp[(1) - (4)].negate)) {
+	      (yyval.src_reg).Base.Negate = ~(yyval.src_reg).Base.Negate;
+	   }
+
+	   (yyval.src_reg).Base.Abs = 1;
+	}
+    break;
+
+  case 50:
+/* Line 1787 of yacc.c  */
+#line 583 "program/program_parse.y"
+    {
+	   (yyval.src_reg) = (yyvsp[(1) - (2)].src_reg);
+
+	   (yyval.src_reg).Base.Swizzle = _mesa_combine_swizzles((yyval.src_reg).Base.Swizzle,
+						    (yyvsp[(2) - (2)].swiz_mask).swizzle);
+	}
+    break;
+
+  case 51:
+/* Line 1787 of yacc.c  */
+#line 590 "program/program_parse.y"
+    {
+	   struct asm_symbol temp_sym;
+
+	   if (!state->option.NV_fragment) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "expected scalar suffix");
+	      YYERROR;
+	   }
+
+	   memset(& temp_sym, 0, sizeof(temp_sym));
+	   temp_sym.param_binding_begin = ~0;
+	   initialize_symbol_from_const(state->prog, & temp_sym, & (yyvsp[(1) - (1)].vector), GL_TRUE);
+
+	   set_src_reg_swz(& (yyval.src_reg), PROGRAM_CONSTANT,
+                           temp_sym.param_binding_begin,
+                           temp_sym.param_binding_swizzle);
+	}
+    break;
+
+  case 52:
+/* Line 1787 of yacc.c  */
+#line 609 "program/program_parse.y"
+    {
+	   (yyval.src_reg) = (yyvsp[(2) - (3)].src_reg);
+
+	   if ((yyvsp[(1) - (3)].negate)) {
+	      (yyval.src_reg).Base.Negate = ~(yyval.src_reg).Base.Negate;
+	   }
+
+	   (yyval.src_reg).Base.Swizzle = _mesa_combine_swizzles((yyval.src_reg).Base.Swizzle,
+						    (yyvsp[(3) - (3)].swiz_mask).swizzle);
+	}
+    break;
+
+  case 53:
+/* Line 1787 of yacc.c  */
+#line 620 "program/program_parse.y"
+    {
+	   (yyval.src_reg) = (yyvsp[(3) - (5)].src_reg);
+
+	   if (!state->option.NV_fragment) {
+	      yyerror(& (yylsp[(2) - (5)]), state, "unexpected character '|'");
+	      YYERROR;
+	   }
+
+	   if ((yyvsp[(1) - (5)].negate)) {
+	      (yyval.src_reg).Base.Negate = ~(yyval.src_reg).Base.Negate;
+	   }
+
+	   (yyval.src_reg).Base.Abs = 1;
+	   (yyval.src_reg).Base.Swizzle = _mesa_combine_swizzles((yyval.src_reg).Base.Swizzle,
+						    (yyvsp[(4) - (5)].swiz_mask).swizzle);
+	}
+    break;
+
+  case 54:
+/* Line 1787 of yacc.c  */
+#line 640 "program/program_parse.y"
+    {
+	   (yyval.dst_reg) = (yyvsp[(1) - (3)].dst_reg);
+	   (yyval.dst_reg).WriteMask = (yyvsp[(2) - (3)].swiz_mask).mask;
+	   (yyval.dst_reg).CondMask = (yyvsp[(3) - (3)].dst_reg).CondMask;
+	   (yyval.dst_reg).CondSwizzle = (yyvsp[(3) - (3)].dst_reg).CondSwizzle;
+
+	   if ((yyval.dst_reg).File == PROGRAM_OUTPUT) {
+	      /* Technically speaking, this should check that it is in
+	       * vertex program mode.  However, PositionInvariant can never be
+	       * set in fragment program mode, so it is somewhat irrelevant.
+	       */
+	      if (state->option.PositionInvariant
+	       && ((yyval.dst_reg).Index == VARYING_SLOT_POS)) {
+		 yyerror(& (yylsp[(1) - (3)]), state, "position-invariant programs cannot "
+			 "write position");
+		 YYERROR;
+	      }
+
+	      state->prog->OutputsWritten |= BITFIELD64_BIT((yyval.dst_reg).Index);
+	   }
+	}
+    break;
+
+  case 55:
+/* Line 1787 of yacc.c  */
+#line 664 "program/program_parse.y"
+    {
+	   set_dst_reg(& (yyval.dst_reg), PROGRAM_ADDRESS, 0);
+	   (yyval.dst_reg).WriteMask = (yyvsp[(2) - (2)].swiz_mask).mask;
+	}
+    break;
+
+  case 56:
+/* Line 1787 of yacc.c  */
+#line 671 "program/program_parse.y"
+    {
+	   const unsigned xyzw_valid =
+	      ((yyvsp[(1) - (7)].ext_swizzle).xyzw_valid << 0)
+	      | ((yyvsp[(3) - (7)].ext_swizzle).xyzw_valid << 1)
+	      | ((yyvsp[(5) - (7)].ext_swizzle).xyzw_valid << 2)
+	      | ((yyvsp[(7) - (7)].ext_swizzle).xyzw_valid << 3);
+	   const unsigned rgba_valid =
+	      ((yyvsp[(1) - (7)].ext_swizzle).rgba_valid << 0)
+	      | ((yyvsp[(3) - (7)].ext_swizzle).rgba_valid << 1)
+	      | ((yyvsp[(5) - (7)].ext_swizzle).rgba_valid << 2)
+	      | ((yyvsp[(7) - (7)].ext_swizzle).rgba_valid << 3);
+
+	   /* All of the swizzle components have to be valid in either RGBA
+	    * or XYZW.  Note that 0 and 1 are valid in both, so both masks
+	    * can have some bits set.
+	    *
+	    * We somewhat deviate from the spec here.  It would be really hard
+	    * to figure out which component is the error, and there probably
+	    * isn't a lot of benefit.
+	    */
+	   if ((rgba_valid != 0x0f) && (xyzw_valid != 0x0f)) {
+	      yyerror(& (yylsp[(1) - (7)]), state, "cannot combine RGBA and XYZW swizzle "
+		      "components");
+	      YYERROR;
+	   }
+
+	   (yyval.swiz_mask).swizzle = MAKE_SWIZZLE4((yyvsp[(1) - (7)].ext_swizzle).swz, (yyvsp[(3) - (7)].ext_swizzle).swz, (yyvsp[(5) - (7)].ext_swizzle).swz, (yyvsp[(7) - (7)].ext_swizzle).swz);
+	   (yyval.swiz_mask).mask = ((yyvsp[(1) - (7)].ext_swizzle).negate) | ((yyvsp[(3) - (7)].ext_swizzle).negate << 1) | ((yyvsp[(5) - (7)].ext_swizzle).negate << 2)
+	      | ((yyvsp[(7) - (7)].ext_swizzle).negate << 3);
+	}
+    break;
+
+  case 57:
+/* Line 1787 of yacc.c  */
+#line 704 "program/program_parse.y"
+    {
+	   (yyval.ext_swizzle) = (yyvsp[(2) - (2)].ext_swizzle);
+	   (yyval.ext_swizzle).negate = ((yyvsp[(1) - (2)].negate)) ? 1 : 0;
+	}
+    break;
+
+  case 58:
+/* Line 1787 of yacc.c  */
+#line 711 "program/program_parse.y"
+    {
+	   if (((yyvsp[(1) - (1)].integer) != 0) && ((yyvsp[(1) - (1)].integer) != 1)) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid extended swizzle selector");
+	      YYERROR;
+	   }
+
+	   (yyval.ext_swizzle).swz = ((yyvsp[(1) - (1)].integer) == 0) ? SWIZZLE_ZERO : SWIZZLE_ONE;
+           (yyval.ext_swizzle).negate = 0;
+
+	   /* 0 and 1 are valid for both RGBA swizzle names and XYZW
+	    * swizzle names.
+	    */
+	   (yyval.ext_swizzle).xyzw_valid = 1;
+	   (yyval.ext_swizzle).rgba_valid = 1;
+	}
+    break;
+
+  case 59:
+/* Line 1787 of yacc.c  */
+#line 727 "program/program_parse.y"
+    {
+	   char s;
+
+	   if (strlen((yyvsp[(1) - (1)].string)) > 1) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid extended swizzle selector");
+	      YYERROR;
+	   }
+
+	   s = (yyvsp[(1) - (1)].string)[0];
+	   free((yyvsp[(1) - (1)].string));
+
+           (yyval.ext_swizzle).rgba_valid = 0;
+           (yyval.ext_swizzle).xyzw_valid = 0;
+           (yyval.ext_swizzle).negate = 0;
+
+	   switch (s) {
+	   case 'x':
+	      (yyval.ext_swizzle).swz = SWIZZLE_X;
+	      (yyval.ext_swizzle).xyzw_valid = 1;
+	      break;
+	   case 'y':
+	      (yyval.ext_swizzle).swz = SWIZZLE_Y;
+	      (yyval.ext_swizzle).xyzw_valid = 1;
+	      break;
+	   case 'z':
+	      (yyval.ext_swizzle).swz = SWIZZLE_Z;
+	      (yyval.ext_swizzle).xyzw_valid = 1;
+	      break;
+	   case 'w':
+	      (yyval.ext_swizzle).swz = SWIZZLE_W;
+	      (yyval.ext_swizzle).xyzw_valid = 1;
+	      break;
+
+	   case 'r':
+	      (yyval.ext_swizzle).swz = SWIZZLE_X;
+	      (yyval.ext_swizzle).rgba_valid = 1;
+	      break;
+	   case 'g':
+	      (yyval.ext_swizzle).swz = SWIZZLE_Y;
+	      (yyval.ext_swizzle).rgba_valid = 1;
+	      break;
+	   case 'b':
+	      (yyval.ext_swizzle).swz = SWIZZLE_Z;
+	      (yyval.ext_swizzle).rgba_valid = 1;
+	      break;
+	   case 'a':
+	      (yyval.ext_swizzle).swz = SWIZZLE_W;
+	      (yyval.ext_swizzle).rgba_valid = 1;
+	      break;
+
+	   default:
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid extended swizzle selector");
+	      YYERROR;
+	      break;
+	   }
+	}
+    break;
+
+  case 60:
+/* Line 1787 of yacc.c  */
+#line 786 "program/program_parse.y"
+    {
+	   struct asm_symbol *const s = (struct asm_symbol *)
+	      _mesa_symbol_table_find_symbol(state->st, 0, (yyvsp[(1) - (1)].string));
+
+	   free((yyvsp[(1) - (1)].string));
+
+	   if (s == NULL) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid operand variable");
+	      YYERROR;
+	   } else if ((s->type != at_param) && (s->type != at_temp)
+		      && (s->type != at_attrib)) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid operand variable");
+	      YYERROR;
+	   } else if ((s->type == at_param) && s->param_is_array) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "non-array access to array PARAM");
+	      YYERROR;
+	   }
+
+	   init_src_reg(& (yyval.src_reg));
+	   switch (s->type) {
+	   case at_temp:
+	      set_src_reg(& (yyval.src_reg), PROGRAM_TEMPORARY, s->temp_binding);
+	      break;
+	   case at_param:
+              set_src_reg_swz(& (yyval.src_reg), s->param_binding_type,
+                              s->param_binding_begin,
+                              s->param_binding_swizzle);
+	      break;
+	   case at_attrib:
+	      set_src_reg(& (yyval.src_reg), PROGRAM_INPUT, s->attrib_binding);
+	      state->prog->InputsRead |= BITFIELD64_BIT((yyval.src_reg).Base.Index);
+
+	      if (!validate_inputs(& (yylsp[(1) - (1)]), state)) {
+		 YYERROR;
+	      }
+	      break;
+
+	   default:
+	      YYERROR;
+	      break;
+	   }
+	}
+    break;
+
+  case 61:
+/* Line 1787 of yacc.c  */
+#line 829 "program/program_parse.y"
+    {
+	   set_src_reg(& (yyval.src_reg), PROGRAM_INPUT, (yyvsp[(1) - (1)].attrib));
+	   state->prog->InputsRead |= BITFIELD64_BIT((yyval.src_reg).Base.Index);
+
+	   if (!validate_inputs(& (yylsp[(1) - (1)]), state)) {
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 62:
+/* Line 1787 of yacc.c  */
+#line 838 "program/program_parse.y"
+    {
+	   if (! (yyvsp[(3) - (4)].src_reg).Base.RelAddr
+	       && ((unsigned) (yyvsp[(3) - (4)].src_reg).Base.Index >= (yyvsp[(1) - (4)].sym)->param_binding_length)) {
+	      yyerror(& (yylsp[(3) - (4)]), state, "out of bounds array access");
+	      YYERROR;
+	   }
+
+	   init_src_reg(& (yyval.src_reg));
+	   (yyval.src_reg).Base.File = (yyvsp[(1) - (4)].sym)->param_binding_type;
+
+	   if ((yyvsp[(3) - (4)].src_reg).Base.RelAddr) {
+              state->prog->IndirectRegisterFiles |= (1 << (yyval.src_reg).Base.File);
+	      (yyvsp[(1) - (4)].sym)->param_accessed_indirectly = 1;
+
+	      (yyval.src_reg).Base.RelAddr = 1;
+	      (yyval.src_reg).Base.Index = (yyvsp[(3) - (4)].src_reg).Base.Index;
+	      (yyval.src_reg).Symbol = (yyvsp[(1) - (4)].sym);
+	   } else {
+	      (yyval.src_reg).Base.Index = (yyvsp[(1) - (4)].sym)->param_binding_begin + (yyvsp[(3) - (4)].src_reg).Base.Index;
+	   }
+	}
+    break;
+
+  case 63:
+/* Line 1787 of yacc.c  */
+#line 860 "program/program_parse.y"
+    {
+           gl_register_file file = ((yyvsp[(1) - (1)].temp_sym).name != NULL) 
+	      ? (yyvsp[(1) - (1)].temp_sym).param_binding_type
+	      : PROGRAM_CONSTANT;
+           set_src_reg_swz(& (yyval.src_reg), file, (yyvsp[(1) - (1)].temp_sym).param_binding_begin,
+                           (yyvsp[(1) - (1)].temp_sym).param_binding_swizzle);
+	}
+    break;
+
+  case 64:
+/* Line 1787 of yacc.c  */
+#line 870 "program/program_parse.y"
+    {
+	   set_dst_reg(& (yyval.dst_reg), PROGRAM_OUTPUT, (yyvsp[(1) - (1)].result));
+	}
+    break;
+
+  case 65:
+/* Line 1787 of yacc.c  */
+#line 874 "program/program_parse.y"
+    {
+	   struct asm_symbol *const s = (struct asm_symbol *)
+	      _mesa_symbol_table_find_symbol(state->st, 0, (yyvsp[(1) - (1)].string));
+
+	   free((yyvsp[(1) - (1)].string));
+
+	   if (s == NULL) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid operand variable");
+	      YYERROR;
+	   } else if ((s->type != at_output) && (s->type != at_temp)) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid operand variable");
+	      YYERROR;
+	   }
+
+	   switch (s->type) {
+	   case at_temp:
+	      set_dst_reg(& (yyval.dst_reg), PROGRAM_TEMPORARY, s->temp_binding);
+	      break;
+	   case at_output:
+	      set_dst_reg(& (yyval.dst_reg), PROGRAM_OUTPUT, s->output_binding);
+	      break;
+	   default:
+	      set_dst_reg(& (yyval.dst_reg), s->param_binding_type, s->param_binding_begin);
+	      break;
+	   }
+	}
+    break;
+
+  case 66:
+/* Line 1787 of yacc.c  */
+#line 903 "program/program_parse.y"
+    {
+	   struct asm_symbol *const s = (struct asm_symbol *)
+	      _mesa_symbol_table_find_symbol(state->st, 0, (yyvsp[(1) - (1)].string));
+
+	   free((yyvsp[(1) - (1)].string));
+
+	   if (s == NULL) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid operand variable");
+	      YYERROR;
+	   } else if ((s->type != at_param) || !s->param_is_array) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "array access to non-PARAM variable");
+	      YYERROR;
+	   } else {
+	      (yyval.sym) = s;
+	   }
+	}
+    break;
+
+  case 69:
+/* Line 1787 of yacc.c  */
+#line 924 "program/program_parse.y"
+    {
+	   init_src_reg(& (yyval.src_reg));
+	   (yyval.src_reg).Base.Index = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 70:
+/* Line 1787 of yacc.c  */
+#line 931 "program/program_parse.y"
+    {
+	   /* FINISHME: Add support for multiple address registers.
+	    */
+	   /* FINISHME: Add support for 4-component address registers.
+	    */
+	   init_src_reg(& (yyval.src_reg));
+	   (yyval.src_reg).Base.RelAddr = 1;
+	   (yyval.src_reg).Base.Index = (yyvsp[(3) - (3)].integer);
+	}
+    break;
+
+  case 71:
+/* Line 1787 of yacc.c  */
+#line 942 "program/program_parse.y"
+    { (yyval.integer) = 0; }
+    break;
+
+  case 72:
+/* Line 1787 of yacc.c  */
+#line 943 "program/program_parse.y"
+    { (yyval.integer) = (yyvsp[(2) - (2)].integer); }
+    break;
+
+  case 73:
+/* Line 1787 of yacc.c  */
+#line 944 "program/program_parse.y"
+    { (yyval.integer) = -(yyvsp[(2) - (2)].integer); }
+    break;
+
+  case 74:
+/* Line 1787 of yacc.c  */
+#line 948 "program/program_parse.y"
+    {
+	   if (((yyvsp[(1) - (1)].integer) < 0) || ((yyvsp[(1) - (1)].integer) > (state->limits->MaxAddressOffset - 1))) {
+              char s[100];
+              _mesa_snprintf(s, sizeof(s),
+                             "relative address offset too large (%d)", (yyvsp[(1) - (1)].integer));
+	      yyerror(& (yylsp[(1) - (1)]), state, s);
+	      YYERROR;
+	   } else {
+	      (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	   }
+	}
+    break;
+
+  case 75:
+/* Line 1787 of yacc.c  */
+#line 962 "program/program_parse.y"
+    {
+	   if (((yyvsp[(1) - (1)].integer) < 0) || ((yyvsp[(1) - (1)].integer) > state->limits->MaxAddressOffset)) {
+              char s[100];
+              _mesa_snprintf(s, sizeof(s),
+                             "relative address offset too large (%d)", (yyvsp[(1) - (1)].integer));
+	      yyerror(& (yylsp[(1) - (1)]), state, s);
+	      YYERROR;
+	   } else {
+	      (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	   }
+	}
+    break;
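+
+  /* Cases 74 and 75 differ only in the upper bound checked
+     (MaxAddressOffset - 1 versus MaxAddressOffset), presumably matching
+     the asymmetric ranges of positive and negative relative offsets.  */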
+
+  case 76:
+/* Line 1787 of yacc.c  */
+#line 976 "program/program_parse.y"
+    {
+	   struct asm_symbol *const s = (struct asm_symbol *)
+	      _mesa_symbol_table_find_symbol(state->st, 0, (yyvsp[(1) - (1)].string));
+
+	   free((yyvsp[(1) - (1)].string));
+
+	   if (s == NULL) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid array member");
+	      YYERROR;
+	   } else if (s->type != at_address) {
+	      yyerror(& (yylsp[(1) - (1)]), state,
+		      "invalid variable for indexed array access");
+	      YYERROR;
+	   } else {
+	      (yyval.sym) = s;
+	   }
+	}
+    break;
+
+  case 77:
+/* Line 1787 of yacc.c  */
+#line 996 "program/program_parse.y"
+    {
+	   if ((yyvsp[(1) - (1)].swiz_mask).mask != WRITEMASK_X) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid address component selector");
+	      YYERROR;
+	   } else {
+	      (yyval.swiz_mask) = (yyvsp[(1) - (1)].swiz_mask);
+	   }
+	}
+    break;
+
+  case 78:
+/* Line 1787 of yacc.c  */
+#line 1007 "program/program_parse.y"
+    {
+	   if ((yyvsp[(1) - (1)].swiz_mask).mask != WRITEMASK_X) {
+	      yyerror(& (yylsp[(1) - (1)]), state,
+		      "address register write mask must be \".x\"");
+	      YYERROR;
+	   } else {
+	      (yyval.swiz_mask) = (yyvsp[(1) - (1)].swiz_mask);
+	   }
+	}
+    break;
+
+  case 83:
+/* Line 1787 of yacc.c  */
+#line 1023 "program/program_parse.y"
+    { (yyval.swiz_mask).swizzle = SWIZZLE_NOOP; (yyval.swiz_mask).mask = WRITEMASK_XYZW; }
+    break;
+
+  case 88:
+/* Line 1787 of yacc.c  */
+#line 1027 "program/program_parse.y"
+    { (yyval.swiz_mask).swizzle = SWIZZLE_NOOP; (yyval.swiz_mask).mask = WRITEMASK_XYZW; }
+    break;
+
+  case 89:
+/* Line 1787 of yacc.c  */
+#line 1031 "program/program_parse.y"
+    {
+	   (yyval.dst_reg) = (yyvsp[(2) - (3)].dst_reg);
+	}
+    break;
+
+  case 90:
+/* Line 1787 of yacc.c  */
+#line 1035 "program/program_parse.y"
+    {
+	   (yyval.dst_reg) = (yyvsp[(2) - (3)].dst_reg);
+	}
+    break;
+
+  case 91:
+/* Line 1787 of yacc.c  */
+#line 1039 "program/program_parse.y"
+    {
+	   (yyval.dst_reg).CondMask = COND_TR;
+	   (yyval.dst_reg).CondSwizzle = SWIZZLE_NOOP;
+	}
+    break;
+
+  case 92:
+/* Line 1787 of yacc.c  */
+#line 1046 "program/program_parse.y"
+    {
+	   (yyval.dst_reg) = (yyvsp[(1) - (2)].dst_reg);
+	   (yyval.dst_reg).CondSwizzle = (yyvsp[(2) - (2)].swiz_mask).swizzle;
+	}
+    break;
+
+  case 93:
+/* Line 1787 of yacc.c  */
+#line 1053 "program/program_parse.y"
+    {
+	   (yyval.dst_reg) = (yyvsp[(1) - (2)].dst_reg);
+	   (yyval.dst_reg).CondSwizzle = (yyvsp[(2) - (2)].swiz_mask).swizzle;
+	}
+    break;
+
+  case 94:
+/* Line 1787 of yacc.c  */
+#line 1060 "program/program_parse.y"
+    {
+	   const int cond = _mesa_parse_cc((yyvsp[(1) - (1)].string));
+	   if ((cond == 0) || ((yyvsp[(1) - (1)].string)[2] != '\0')) {
+	      char *const err_str =
+		 make_error_string("invalid condition code \"%s\"", (yyvsp[(1) - (1)].string));
+
+	      yyerror(& (yylsp[(1) - (1)]), state, (err_str != NULL)
+		      ? err_str : "invalid condition code");
+
+	      if (err_str != NULL) {
+		 free(err_str);
+	      }
+
+	      YYERROR;
+	   }
+
+	   (yyval.dst_reg).CondMask = cond;
+	   (yyval.dst_reg).CondSwizzle = SWIZZLE_NOOP;
+	}
+    break;
+
+  case 95:
+/* Line 1787 of yacc.c  */
+#line 1082 "program/program_parse.y"
+    {
+	   const int cond = _mesa_parse_cc((yyvsp[(1) - (1)].string));
+	   if ((cond == 0) || ((yyvsp[(1) - (1)].string)[2] != '\0')) {
+	      char *const err_str =
+		 make_error_string("invalid condition code \"%s\"", (yyvsp[(1) - (1)].string));
+
+	      yyerror(& (yylsp[(1) - (1)]), state, (err_str != NULL)
+		      ? err_str : "invalid condition code");
+
+	      if (err_str != NULL) {
+		 free(err_str);
+	      }
+
+	      YYERROR;
+	   }
+
+	   (yyval.dst_reg).CondMask = cond;
+	   (yyval.dst_reg).CondSwizzle = SWIZZLE_NOOP;
+	}
+    break;
+
+  case 102:
+/* Line 1787 of yacc.c  */
+#line 1112 "program/program_parse.y"
+    {
+	   struct asm_symbol *const s =
+	      declare_variable(state, (yyvsp[(2) - (4)].string), at_attrib, & (yylsp[(2) - (4)]));
+
+	   if (s == NULL) {
+	      free((yyvsp[(2) - (4)].string));
+	      YYERROR;
+	   } else {
+	      s->attrib_binding = (yyvsp[(4) - (4)].attrib);
+	      state->InputsBound |= BITFIELD64_BIT(s->attrib_binding);
+
+	      if (!validate_inputs(& (yylsp[(4) - (4)]), state)) {
+		 YYERROR;
+	      }
+	   }
+	}
+    break;
+
+  case 103:
+/* Line 1787 of yacc.c  */
+#line 1131 "program/program_parse.y"
+    {
+	   (yyval.attrib) = (yyvsp[(2) - (2)].attrib);
+	}
+    break;
+
+  case 104:
+/* Line 1787 of yacc.c  */
+#line 1135 "program/program_parse.y"
+    {
+	   (yyval.attrib) = (yyvsp[(2) - (2)].attrib);
+	}
+    break;
+
+  case 105:
+/* Line 1787 of yacc.c  */
+#line 1141 "program/program_parse.y"
+    {
+	   (yyval.attrib) = VERT_ATTRIB_POS;
+	}
+    break;
+
+  case 106:
+/* Line 1787 of yacc.c  */
+#line 1145 "program/program_parse.y"
+    {
+	   (yyval.attrib) = VERT_ATTRIB_WEIGHT;
+	}
+    break;
+
+  case 107:
+/* Line 1787 of yacc.c  */
+#line 1149 "program/program_parse.y"
+    {
+	   (yyval.attrib) = VERT_ATTRIB_NORMAL;
+	}
+    break;
+
+  case 108:
+/* Line 1787 of yacc.c  */
+#line 1153 "program/program_parse.y"
+    {
+	   (yyval.attrib) = VERT_ATTRIB_COLOR0 + (yyvsp[(2) - (2)].integer);
+	}
+    break;
+
+  case 109:
+/* Line 1787 of yacc.c  */
+#line 1157 "program/program_parse.y"
+    {
+	   (yyval.attrib) = VERT_ATTRIB_FOG;
+	}
+    break;
+
+  case 110:
+/* Line 1787 of yacc.c  */
+#line 1161 "program/program_parse.y"
+    {
+	   (yyval.attrib) = VERT_ATTRIB_TEX0 + (yyvsp[(2) - (2)].integer);
+	}
+    break;
+
+  case 111:
+/* Line 1787 of yacc.c  */
+#line 1165 "program/program_parse.y"
+    {
+	   yyerror(& (yylsp[(1) - (4)]), state, "GL_ARB_matrix_palette not supported");
+	   YYERROR;
+	}
+    break;
+
+  case 112:
+/* Line 1787 of yacc.c  */
+#line 1170 "program/program_parse.y"
+    {
+	   (yyval.attrib) = VERT_ATTRIB_GENERIC0 + (yyvsp[(3) - (4)].integer);
+	}
+    break;
+
+  case 113:
+/* Line 1787 of yacc.c  */
+#line 1176 "program/program_parse.y"
+    {
+	   if ((unsigned) (yyvsp[(1) - (1)].integer) >= state->limits->MaxAttribs) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid vertex attribute reference");
+	      YYERROR;
+	   }
+
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 117:
+/* Line 1787 of yacc.c  */
+#line 1190 "program/program_parse.y"
+    {
+	   (yyval.attrib) = VARYING_SLOT_POS;
+	}
+    break;
+
+  case 118:
+/* Line 1787 of yacc.c  */
+#line 1194 "program/program_parse.y"
+    {
+	   (yyval.attrib) = VARYING_SLOT_COL0 + (yyvsp[(2) - (2)].integer);
+	}
+    break;
+
+  case 119:
+/* Line 1787 of yacc.c  */
+#line 1198 "program/program_parse.y"
+    {
+	   (yyval.attrib) = VARYING_SLOT_FOGC;
+	}
+    break;
+
+  case 120:
+/* Line 1787 of yacc.c  */
+#line 1202 "program/program_parse.y"
+    {
+	   (yyval.attrib) = VARYING_SLOT_TEX0 + (yyvsp[(2) - (2)].integer);
+	}
+    break;
+
+  case 123:
+/* Line 1787 of yacc.c  */
+#line 1210 "program/program_parse.y"
+    {
+	   struct asm_symbol *const s =
+	      declare_variable(state, (yyvsp[(2) - (3)].string), at_param, & (yylsp[(2) - (3)]));
+
+	   if (s == NULL) {
+	      free((yyvsp[(2) - (3)].string));
+	      YYERROR;
+	   } else {
+	      s->param_binding_type = (yyvsp[(3) - (3)].temp_sym).param_binding_type;
+	      s->param_binding_begin = (yyvsp[(3) - (3)].temp_sym).param_binding_begin;
+	      s->param_binding_length = (yyvsp[(3) - (3)].temp_sym).param_binding_length;
+              s->param_binding_swizzle = (yyvsp[(3) - (3)].temp_sym).param_binding_swizzle;
+	      s->param_is_array = 0;
+	   }
+	}
+    break;
+
+  case 124:
+/* Line 1787 of yacc.c  */
+#line 1228 "program/program_parse.y"
+    {
+	   if (((yyvsp[(4) - (6)].integer) != 0) && ((unsigned) (yyvsp[(4) - (6)].integer) != (yyvsp[(6) - (6)].temp_sym).param_binding_length)) {
+	      free((yyvsp[(2) - (6)].string));
+	      yyerror(& (yylsp[(4) - (6)]), state, 
+		      "parameter array size and number of bindings must match");
+	      YYERROR;
+	   } else {
+	      struct asm_symbol *const s =
+		 declare_variable(state, (yyvsp[(2) - (6)].string), (yyvsp[(6) - (6)].temp_sym).type, & (yylsp[(2) - (6)]));
+
+	      if (s == NULL) {
+		 free((yyvsp[(2) - (6)].string));
+		 YYERROR;
+	      } else {
+		 s->param_binding_type = (yyvsp[(6) - (6)].temp_sym).param_binding_type;
+		 s->param_binding_begin = (yyvsp[(6) - (6)].temp_sym).param_binding_begin;
+		 s->param_binding_length = (yyvsp[(6) - (6)].temp_sym).param_binding_length;
+                 s->param_binding_swizzle = SWIZZLE_XYZW;
+		 s->param_is_array = 1;
+	      }
+	   }
+	}
+    break;
+
+  case 125:
+/* Line 1787 of yacc.c  */
+#line 1253 "program/program_parse.y"
+    {
+	   (yyval.integer) = 0;
+	}
+    break;
+
+  case 126:
+/* Line 1787 of yacc.c  */
+#line 1257 "program/program_parse.y"
+    {
+	   if (((yyvsp[(1) - (1)].integer) < 1) || ((unsigned) (yyvsp[(1) - (1)].integer) > state->limits->MaxParameters)) {
+              char msg[100];
+              _mesa_snprintf(msg, sizeof(msg),
+                             "invalid parameter array size (size=%d max=%u)",
+                             (yyvsp[(1) - (1)].integer), state->limits->MaxParameters);
+	      yyerror(& (yylsp[(1) - (1)]), state, msg);
+	      YYERROR;
+	   } else {
+	      (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	   }
+	}
+    break;
+
+  case 127:
+/* Line 1787 of yacc.c  */
+#line 1272 "program/program_parse.y"
+    {
+	   (yyval.temp_sym) = (yyvsp[(2) - (2)].temp_sym);
+	}
+    break;
+
+  case 128:
+/* Line 1787 of yacc.c  */
+#line 1278 "program/program_parse.y"
+    {
+	   (yyval.temp_sym) = (yyvsp[(3) - (4)].temp_sym);
+	}
+    break;
+
+  case 130:
+/* Line 1787 of yacc.c  */
+#line 1285 "program/program_parse.y"
+    {
+	   (yyvsp[(1) - (3)].temp_sym).param_binding_length += (yyvsp[(3) - (3)].temp_sym).param_binding_length;
+	   (yyval.temp_sym) = (yyvsp[(1) - (3)].temp_sym);
+	}
+    break;
+
+  case 131:
+/* Line 1787 of yacc.c  */
+#line 1292 "program/program_parse.y"
+    {
+	   memset(& (yyval.temp_sym), 0, sizeof((yyval.temp_sym)));
+	   (yyval.temp_sym).param_binding_begin = ~0;
+	   initialize_symbol_from_state(state->prog, & (yyval.temp_sym), (yyvsp[(1) - (1)].state));
+	}
+    break;
+
+  case 132:
+/* Line 1787 of yacc.c  */
+#line 1298 "program/program_parse.y"
+    {
+	   memset(& (yyval.temp_sym), 0, sizeof((yyval.temp_sym)));
+	   (yyval.temp_sym).param_binding_begin = ~0;
+	   initialize_symbol_from_param(state->prog, & (yyval.temp_sym), (yyvsp[(1) - (1)].state));
+	}
+    break;
+
+  case 133:
+/* Line 1787 of yacc.c  */
+#line 1304 "program/program_parse.y"
+    {
+	   memset(& (yyval.temp_sym), 0, sizeof((yyval.temp_sym)));
+	   (yyval.temp_sym).param_binding_begin = ~0;
+	   initialize_symbol_from_const(state->prog, & (yyval.temp_sym), & (yyvsp[(1) - (1)].vector), GL_TRUE);
+	}
+    break;
+
+  case 134:
+/* Line 1787 of yacc.c  */
+#line 1312 "program/program_parse.y"
+    {
+	   memset(& (yyval.temp_sym), 0, sizeof((yyval.temp_sym)));
+	   (yyval.temp_sym).param_binding_begin = ~0;
+	   initialize_symbol_from_state(state->prog, & (yyval.temp_sym), (yyvsp[(1) - (1)].state));
+	}
+    break;
+
+  case 135:
+/* Line 1787 of yacc.c  */
+#line 1318 "program/program_parse.y"
+    {
+	   memset(& (yyval.temp_sym), 0, sizeof((yyval.temp_sym)));
+	   (yyval.temp_sym).param_binding_begin = ~0;
+	   initialize_symbol_from_param(state->prog, & (yyval.temp_sym), (yyvsp[(1) - (1)].state));
+	}
+    break;
+
+  case 136:
+/* Line 1787 of yacc.c  */
+#line 1324 "program/program_parse.y"
+    {
+	   memset(& (yyval.temp_sym), 0, sizeof((yyval.temp_sym)));
+	   (yyval.temp_sym).param_binding_begin = ~0;
+	   initialize_symbol_from_const(state->prog, & (yyval.temp_sym), & (yyvsp[(1) - (1)].vector), GL_TRUE);
+	}
+    break;
+
+  case 137:
+/* Line 1787 of yacc.c  */
+#line 1332 "program/program_parse.y"
+    {
+	   memset(& (yyval.temp_sym), 0, sizeof((yyval.temp_sym)));
+	   (yyval.temp_sym).param_binding_begin = ~0;
+	   initialize_symbol_from_state(state->prog, & (yyval.temp_sym), (yyvsp[(1) - (1)].state));
+	}
+    break;
+
+  case 138:
+/* Line 1787 of yacc.c  */
+#line 1338 "program/program_parse.y"
+    {
+	   memset(& (yyval.temp_sym), 0, sizeof((yyval.temp_sym)));
+	   (yyval.temp_sym).param_binding_begin = ~0;
+	   initialize_symbol_from_param(state->prog, & (yyval.temp_sym), (yyvsp[(1) - (1)].state));
+	}
+    break;
+
+  case 139:
+/* Line 1787 of yacc.c  */
+#line 1344 "program/program_parse.y"
+    {
+	   memset(& (yyval.temp_sym), 0, sizeof((yyval.temp_sym)));
+	   (yyval.temp_sym).param_binding_begin = ~0;
+	   initialize_symbol_from_const(state->prog, & (yyval.temp_sym), & (yyvsp[(1) - (1)].vector), GL_FALSE);
+	}
+    break;
+
+  case 140:
+/* Line 1787 of yacc.c  */
+#line 1351 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(1) - (1)].state), sizeof((yyval.state))); }
+    break;
+
+  case 141:
+/* Line 1787 of yacc.c  */
+#line 1352 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(2) - (2)].state), sizeof((yyval.state))); }
+    break;
+
+  case 142:
+/* Line 1787 of yacc.c  */
+#line 1355 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(2) - (2)].state), sizeof((yyval.state))); }
+    break;
+
+  case 143:
+/* Line 1787 of yacc.c  */
+#line 1356 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(2) - (2)].state), sizeof((yyval.state))); }
+    break;
+
+  case 144:
+/* Line 1787 of yacc.c  */
+#line 1357 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(2) - (2)].state), sizeof((yyval.state))); }
+    break;
+
+  case 145:
+/* Line 1787 of yacc.c  */
+#line 1358 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(2) - (2)].state), sizeof((yyval.state))); }
+    break;
+
+  case 146:
+/* Line 1787 of yacc.c  */
+#line 1359 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(2) - (2)].state), sizeof((yyval.state))); }
+    break;
+
+  case 147:
+/* Line 1787 of yacc.c  */
+#line 1360 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(2) - (2)].state), sizeof((yyval.state))); }
+    break;
+
+  case 148:
+/* Line 1787 of yacc.c  */
+#line 1361 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(2) - (2)].state), sizeof((yyval.state))); }
+    break;
+
+  case 149:
+/* Line 1787 of yacc.c  */
+#line 1362 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(2) - (2)].state), sizeof((yyval.state))); }
+    break;
+
+  case 150:
+/* Line 1787 of yacc.c  */
+#line 1363 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(2) - (2)].state), sizeof((yyval.state))); }
+    break;
+
+  case 151:
+/* Line 1787 of yacc.c  */
+#line 1364 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(2) - (2)].state), sizeof((yyval.state))); }
+    break;
+
+  case 152:
+/* Line 1787 of yacc.c  */
+#line 1365 "program/program_parse.y"
+    { memcpy((yyval.state), (yyvsp[(2) - (2)].state), sizeof((yyval.state))); }
+    break;
+
+  case 153:
+/* Line 1787 of yacc.c  */
+#line 1369 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = STATE_MATERIAL;
+	   (yyval.state)[1] = (yyvsp[(2) - (3)].integer);
+	   (yyval.state)[2] = (yyvsp[(3) - (3)].integer);
+	}
+    break;
+
+  case 154:
+/* Line 1787 of yacc.c  */
+#line 1378 "program/program_parse.y"
+    {
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 155:
+/* Line 1787 of yacc.c  */
+#line 1382 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_EMISSION;
+	}
+    break;
+
+  case 156:
+/* Line 1787 of yacc.c  */
+#line 1386 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_SHININESS;
+	}
+    break;
+
+  case 157:
+/* Line 1787 of yacc.c  */
+#line 1392 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = STATE_LIGHT;
+	   (yyval.state)[1] = (yyvsp[(3) - (5)].integer);
+	   (yyval.state)[2] = (yyvsp[(5) - (5)].integer);
+	}
+    break;
+
+  case 158:
+/* Line 1787 of yacc.c  */
+#line 1401 "program/program_parse.y"
+    {
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 159:
+/* Line 1787 of yacc.c  */
+#line 1405 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_POSITION;
+	}
+    break;
+
+  case 160:
+/* Line 1787 of yacc.c  */
+#line 1409 "program/program_parse.y"
+    {
+	   if (!state->ctx->Extensions.EXT_point_parameters) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "GL_ARB_point_parameters not supported");
+	      YYERROR;
+	   }
+
+	   (yyval.integer) = STATE_ATTENUATION;
+	}
+    break;
+
+  case 161:
+/* Line 1787 of yacc.c  */
+#line 1418 "program/program_parse.y"
+    {
+	   (yyval.integer) = (yyvsp[(2) - (2)].integer);
+	}
+    break;
+
+  case 162:
+/* Line 1787 of yacc.c  */
+#line 1422 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_HALF_VECTOR;
+	}
+    break;
+
+  case 163:
+/* Line 1787 of yacc.c  */
+#line 1428 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_SPOT_DIRECTION;
+	}
+    break;
+
+  case 164:
+/* Line 1787 of yacc.c  */
+#line 1434 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = (yyvsp[(2) - (2)].state)[0];
+	   (yyval.state)[1] = (yyvsp[(2) - (2)].state)[1];
+	}
+    break;
+
+  case 165:
+/* Line 1787 of yacc.c  */
+#line 1441 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = STATE_LIGHTMODEL_AMBIENT;
+	}
+    break;
+
+  case 166:
+/* Line 1787 of yacc.c  */
+#line 1446 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = STATE_LIGHTMODEL_SCENECOLOR;
+	   (yyval.state)[1] = (yyvsp[(1) - (2)].integer);
+	}
+    break;
+
+  case 167:
+/* Line 1787 of yacc.c  */
+#line 1454 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = STATE_LIGHTPROD;
+	   (yyval.state)[1] = (yyvsp[(3) - (6)].integer);
+	   (yyval.state)[2] = (yyvsp[(5) - (6)].integer);
+	   (yyval.state)[3] = (yyvsp[(6) - (6)].integer);
+	}
+    break;
+
+  case 169:
+/* Line 1787 of yacc.c  */
+#line 1466 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = (yyvsp[(3) - (3)].integer);
+	   (yyval.state)[1] = (yyvsp[(2) - (3)].integer);
+	}
+    break;
+
+  case 170:
+/* Line 1787 of yacc.c  */
+#line 1474 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_TEXENV_COLOR;
+	}
+    break;
+
+  case 171:
+/* Line 1787 of yacc.c  */
+#line 1480 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_AMBIENT;
+	}
+    break;
+
+  case 172:
+/* Line 1787 of yacc.c  */
+#line 1484 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_DIFFUSE;
+	}
+    break;
+
+  case 173:
+/* Line 1787 of yacc.c  */
+#line 1488 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_SPECULAR;
+	}
+    break;
+
+  case 174:
+/* Line 1787 of yacc.c  */
+#line 1494 "program/program_parse.y"
+    {
+	   if ((unsigned) (yyvsp[(1) - (1)].integer) >= state->MaxLights) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid light selector");
+	      YYERROR;
+	   }
+
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 175:
+/* Line 1787 of yacc.c  */
+#line 1505 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = STATE_TEXGEN;
+	   (yyval.state)[1] = (yyvsp[(2) - (4)].integer);
+	   (yyval.state)[2] = (yyvsp[(3) - (4)].integer) + (yyvsp[(4) - (4)].integer);
+	}
+    break;
+
+  case 176:
+/* Line 1787 of yacc.c  */
+#line 1514 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_TEXGEN_EYE_S;
+	}
+    break;
+
+  case 177:
+/* Line 1787 of yacc.c  */
+#line 1518 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_TEXGEN_OBJECT_S;
+	}
+    break;
+
+  case 178:
+/* Line 1787 of yacc.c  */
+#line 1523 "program/program_parse.y"
+    {
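+	   /* The S case intentionally evaluates to zero; it is written as a
+	    * difference for symmetry with the T, R, and Q cases below.
+	    */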
+	   (yyval.integer) = STATE_TEXGEN_EYE_S - STATE_TEXGEN_EYE_S;
+	}
+    break;
+
+  case 179:
+/* Line 1787 of yacc.c  */
+#line 1527 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_TEXGEN_EYE_T - STATE_TEXGEN_EYE_S;
+	}
+    break;
+
+  case 180:
+/* Line 1787 of yacc.c  */
+#line 1531 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_TEXGEN_EYE_R - STATE_TEXGEN_EYE_S;
+	}
+    break;
+
+  case 181:
+/* Line 1787 of yacc.c  */
+#line 1535 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_TEXGEN_EYE_Q - STATE_TEXGEN_EYE_S;
+	}
+    break;
+
+  case 182:
+/* Line 1787 of yacc.c  */
+#line 1541 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = (yyvsp[(2) - (2)].integer);
+	}
+    break;
+
+  case 183:
+/* Line 1787 of yacc.c  */
+#line 1548 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_FOG_COLOR;
+	}
+    break;
+
+  case 184:
+/* Line 1787 of yacc.c  */
+#line 1552 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_FOG_PARAMS;
+	}
+    break;
+
+  case 185:
+/* Line 1787 of yacc.c  */
+#line 1558 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = STATE_CLIPPLANE;
+	   (yyval.state)[1] = (yyvsp[(3) - (5)].integer);
+	}
+    break;
+
+  case 186:
+/* Line 1787 of yacc.c  */
+#line 1566 "program/program_parse.y"
+    {
+	   if ((unsigned) (yyvsp[(1) - (1)].integer) >= state->MaxClipPlanes) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid clip plane selector");
+	      YYERROR;
+	   }
+
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 187:
+/* Line 1787 of yacc.c  */
+#line 1577 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = (yyvsp[(2) - (2)].integer);
+	}
+    break;
+
+  case 188:
+/* Line 1787 of yacc.c  */
+#line 1584 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_POINT_SIZE;
+	}
+    break;
+
+  case 189:
+/* Line 1787 of yacc.c  */
+#line 1588 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_POINT_ATTENUATION;
+	}
+    break;
+
+  case 190:
+/* Line 1787 of yacc.c  */
+#line 1594 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = (yyvsp[(1) - (5)].state)[0];
+	   (yyval.state)[1] = (yyvsp[(1) - (5)].state)[1];
+	   (yyval.state)[2] = (yyvsp[(4) - (5)].integer);
+	   (yyval.state)[3] = (yyvsp[(4) - (5)].integer);
+	   (yyval.state)[4] = (yyvsp[(1) - (5)].state)[2];
+	}
+    break;
+
+  case 191:
+/* Line 1787 of yacc.c  */
+#line 1604 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = (yyvsp[(1) - (2)].state)[0];
+	   (yyval.state)[1] = (yyvsp[(1) - (2)].state)[1];
+	   (yyval.state)[2] = (yyvsp[(2) - (2)].state)[2];
+	   (yyval.state)[3] = (yyvsp[(2) - (2)].state)[3];
+	   (yyval.state)[4] = (yyvsp[(1) - (2)].state)[2];
+	}
+    break;
+
+  case 192:
+/* Line 1787 of yacc.c  */
+#line 1614 "program/program_parse.y"
+    {
+	   (yyval.state)[2] = 0;
+	   (yyval.state)[3] = 3;
+	}
+    break;
+
+  case 193:
+/* Line 1787 of yacc.c  */
+#line 1619 "program/program_parse.y"
+    {
+	   /* It seems logical that the matrix row range specifier would have
+	    * to specify a range of more than one row (i.e., $5 > $3).
+	    * However, the ARB_vertex_program spec says "a program will fail
+	    * to load if <a> is greater than <b>."  This means that $3 == $5
+	    * is valid.
+	    */
+	   if ((yyvsp[(3) - (6)].integer) > (yyvsp[(5) - (6)].integer)) {
+	      yyerror(& (yylsp[(3) - (6)]), state, "invalid matrix row range");
+	      YYERROR;
+	   }
+
+	   (yyval.state)[2] = (yyvsp[(3) - (6)].integer);
+	   (yyval.state)[3] = (yyvsp[(5) - (6)].integer);
+	}
+    break;
+
+  case 194:
+/* Line 1787 of yacc.c  */
+#line 1637 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = (yyvsp[(2) - (3)].state)[0];
+	   (yyval.state)[1] = (yyvsp[(2) - (3)].state)[1];
+	   (yyval.state)[2] = (yyvsp[(3) - (3)].integer);
+	}
+    break;
+
+  case 195:
+/* Line 1787 of yacc.c  */
+#line 1645 "program/program_parse.y"
+    {
+	   (yyval.integer) = 0;
+	}
+    break;
+
+  case 196:
+/* Line 1787 of yacc.c  */
+#line 1649 "program/program_parse.y"
+    {
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 197:
+/* Line 1787 of yacc.c  */
+#line 1655 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_MATRIX_INVERSE;
+	}
+    break;
+
+  case 198:
+/* Line 1787 of yacc.c  */
+#line 1659 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_MATRIX_TRANSPOSE;
+	}
+    break;
+
+  case 199:
+/* Line 1787 of yacc.c  */
+#line 1663 "program/program_parse.y"
+    {
+	   (yyval.integer) = STATE_MATRIX_INVTRANS;
+	}
+    break;
+
+  case 200:
+/* Line 1787 of yacc.c  */
+#line 1669 "program/program_parse.y"
+    {
+	   if ((yyvsp[(1) - (1)].integer) > 3) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid matrix row reference");
+	      YYERROR;
+	   }
+
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 201:
+/* Line 1787 of yacc.c  */
+#line 1680 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = STATE_MODELVIEW_MATRIX;
+	   (yyval.state)[1] = (yyvsp[(2) - (2)].integer);
+	}
+    break;
+
+  case 202:
+/* Line 1787 of yacc.c  */
+#line 1685 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = STATE_PROJECTION_MATRIX;
+	   (yyval.state)[1] = 0;
+	}
+    break;
+
+  case 203:
+/* Line 1787 of yacc.c  */
+#line 1690 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = STATE_MVP_MATRIX;
+	   (yyval.state)[1] = 0;
+	}
+    break;
+
+  case 204:
+/* Line 1787 of yacc.c  */
+#line 1695 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = STATE_TEXTURE_MATRIX;
+	   (yyval.state)[1] = (yyvsp[(2) - (2)].integer);
+	}
+    break;
+
+  case 205:
+/* Line 1787 of yacc.c  */
+#line 1700 "program/program_parse.y"
+    {
+	   yyerror(& (yylsp[(1) - (4)]), state, "GL_ARB_matrix_palette not supported");
+	   YYERROR;
+	}
+    break;
+
+  case 206:
+/* Line 1787 of yacc.c  */
+#line 1705 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = STATE_PROGRAM_MATRIX;
+	   (yyval.state)[1] = (yyvsp[(3) - (4)].integer);
+	}
+    break;
+
+  case 207:
+/* Line 1787 of yacc.c  */
+#line 1712 "program/program_parse.y"
+    {
+	   (yyval.integer) = 0;
+	}
+    break;
+
+  case 208:
+/* Line 1787 of yacc.c  */
+#line 1716 "program/program_parse.y"
+    {
+	   (yyval.integer) = (yyvsp[(2) - (3)].integer);
+	}
+    break;
+
+  case 209:
+/* Line 1787 of yacc.c  */
+#line 1721 "program/program_parse.y"
+    {
+	   /* Since GL_ARB_vertex_blend isn't supported, only modelview matrix
+	    * zero is valid.
+	    */
+	   if ((yyvsp[(1) - (1)].integer) != 0) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid modelview matrix index");
+	      YYERROR;
+	   }
+
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 210:
+/* Line 1787 of yacc.c  */
+#line 1734 "program/program_parse.y"
+    {
+	   /* Since GL_ARB_matrix_palette isn't supported, just let any value
+	    * through here.  The error will be generated later.
+	    */
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 211:
+/* Line 1787 of yacc.c  */
+#line 1742 "program/program_parse.y"
+    {
+	   if ((unsigned) (yyvsp[(1) - (1)].integer) >= state->MaxProgramMatrices) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid program matrix selector");
+	      YYERROR;
+	   }
+
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 212:
+/* Line 1787 of yacc.c  */
+#line 1753 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = STATE_DEPTH_RANGE;
+	}
+    break;
+
+  case 217:
+/* Line 1787 of yacc.c  */
+#line 1765 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = state->state_param_enum;
+	   (yyval.state)[1] = STATE_ENV;
+	   (yyval.state)[2] = (yyvsp[(4) - (5)].state)[0];
+	   (yyval.state)[3] = (yyvsp[(4) - (5)].state)[1];
+	}
+    break;
+
+  case 218:
+/* Line 1787 of yacc.c  */
+#line 1775 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = (yyvsp[(1) - (1)].integer);
+	   (yyval.state)[1] = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 219:
+/* Line 1787 of yacc.c  */
+#line 1780 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = (yyvsp[(1) - (3)].integer);
+	   (yyval.state)[1] = (yyvsp[(3) - (3)].integer);
+	}
+    break;
+
+  case 220:
+/* Line 1787 of yacc.c  */
+#line 1787 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = state->state_param_enum;
+	   (yyval.state)[1] = STATE_ENV;
+	   (yyval.state)[2] = (yyvsp[(4) - (5)].integer);
+	   (yyval.state)[3] = (yyvsp[(4) - (5)].integer);
+	}
+    break;
+
+  case 221:
+/* Line 1787 of yacc.c  */
+#line 1797 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = state->state_param_enum;
+	   (yyval.state)[1] = STATE_LOCAL;
+	   (yyval.state)[2] = (yyvsp[(4) - (5)].state)[0];
+	   (yyval.state)[3] = (yyvsp[(4) - (5)].state)[1];
+	}
+    break;
+
+  case 222:
+/* Line 1787 of yacc.c  */
+#line 1806 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = (yyvsp[(1) - (1)].integer);
+	   (yyval.state)[1] = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 223:
+/* Line 1787 of yacc.c  */
+#line 1811 "program/program_parse.y"
+    {
+	   (yyval.state)[0] = (yyvsp[(1) - (3)].integer);
+	   (yyval.state)[1] = (yyvsp[(3) - (3)].integer);
+	}
+    break;
+
+  case 224:
+/* Line 1787 of yacc.c  */
+#line 1818 "program/program_parse.y"
+    {
+	   memset((yyval.state), 0, sizeof((yyval.state)));
+	   (yyval.state)[0] = state->state_param_enum;
+	   (yyval.state)[1] = STATE_LOCAL;
+	   (yyval.state)[2] = (yyvsp[(4) - (5)].integer);
+	   (yyval.state)[3] = (yyvsp[(4) - (5)].integer);
+	}
+    break;
+
+  case 225:
+/* Line 1787 of yacc.c  */
+#line 1828 "program/program_parse.y"
+    {
+	   if ((unsigned) (yyvsp[(1) - (1)].integer) >= state->limits->MaxEnvParams) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid environment parameter reference");
+	      YYERROR;
+	   }
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 226:
+/* Line 1787 of yacc.c  */
+#line 1838 "program/program_parse.y"
+    {
+	   if ((unsigned) (yyvsp[(1) - (1)].integer) >= state->limits->MaxLocalParams) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid local parameter reference");
+	      YYERROR;
+	   }
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 231:
+/* Line 1787 of yacc.c  */
+#line 1853 "program/program_parse.y"
+    {
+	   (yyval.vector).count = 4;
+	   (yyval.vector).data[0].f = (yyvsp[(1) - (1)].real);
+	   (yyval.vector).data[1].f = (yyvsp[(1) - (1)].real);
+	   (yyval.vector).data[2].f = (yyvsp[(1) - (1)].real);
+	   (yyval.vector).data[3].f = (yyvsp[(1) - (1)].real);
+	}
+    break;
+
+  case 232:
+/* Line 1787 of yacc.c  */
+#line 1863 "program/program_parse.y"
+    {
+	   (yyval.vector).count = 1;
+	   (yyval.vector).data[0].f = (yyvsp[(1) - (1)].real);
+	   (yyval.vector).data[1].f = (yyvsp[(1) - (1)].real);
+	   (yyval.vector).data[2].f = (yyvsp[(1) - (1)].real);
+	   (yyval.vector).data[3].f = (yyvsp[(1) - (1)].real);
+	}
+    break;
+
+  case 233:
+/* Line 1787 of yacc.c  */
+#line 1871 "program/program_parse.y"
+    {
+	   (yyval.vector).count = 1;
+	   (yyval.vector).data[0].f = (float) (yyvsp[(1) - (1)].integer);
+	   (yyval.vector).data[1].f = (float) (yyvsp[(1) - (1)].integer);
+	   (yyval.vector).data[2].f = (float) (yyvsp[(1) - (1)].integer);
+	   (yyval.vector).data[3].f = (float) (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 234:
+/* Line 1787 of yacc.c  */
+#line 1881 "program/program_parse.y"
+    {
+	   (yyval.vector).count = 4;
+	   (yyval.vector).data[0].f = (yyvsp[(2) - (3)].real);
+	   (yyval.vector).data[1].f = 0.0f;
+	   (yyval.vector).data[2].f = 0.0f;
+	   (yyval.vector).data[3].f = 1.0f;
+	}
+    break;
+
+  case 235:
+/* Line 1787 of yacc.c  */
+#line 1889 "program/program_parse.y"
+    {
+	   (yyval.vector).count = 4;
+	   (yyval.vector).data[0].f = (yyvsp[(2) - (5)].real);
+	   (yyval.vector).data[1].f = (yyvsp[(4) - (5)].real);
+	   (yyval.vector).data[2].f = 0.0f;
+	   (yyval.vector).data[3].f = 1.0f;
+	}
+    break;
+
+  case 236:
+/* Line 1787 of yacc.c  */
+#line 1898 "program/program_parse.y"
+    {
+	   (yyval.vector).count = 4;
+	   (yyval.vector).data[0].f = (yyvsp[(2) - (7)].real);
+	   (yyval.vector).data[1].f = (yyvsp[(4) - (7)].real);
+	   (yyval.vector).data[2].f = (yyvsp[(6) - (7)].real);
+	   (yyval.vector).data[3].f = 1.0f;
+	}
+    break;
+
+  case 237:
+/* Line 1787 of yacc.c  */
+#line 1907 "program/program_parse.y"
+    {
+	   (yyval.vector).count = 4;
+	   (yyval.vector).data[0].f = (yyvsp[(2) - (9)].real);
+	   (yyval.vector).data[1].f = (yyvsp[(4) - (9)].real);
+	   (yyval.vector).data[2].f = (yyvsp[(6) - (9)].real);
+	   (yyval.vector).data[3].f = (yyvsp[(8) - (9)].real);
+	}
+    break;
+
+  case 238:
+/* Line 1787 of yacc.c  */
+#line 1917 "program/program_parse.y"
+    {
+	   (yyval.real) = ((yyvsp[(1) - (2)].negate)) ? -(yyvsp[(2) - (2)].real) : (yyvsp[(2) - (2)].real);
+	}
+    break;
+
+  case 239:
+/* Line 1787 of yacc.c  */
+#line 1921 "program/program_parse.y"
+    {
+	   (yyval.real) = (float)(((yyvsp[(1) - (2)].negate)) ? -(yyvsp[(2) - (2)].integer) : (yyvsp[(2) - (2)].integer));
+	}
+    break;
+
+  case 240:
+/* Line 1787 of yacc.c  */
+#line 1926 "program/program_parse.y"
+    { (yyval.negate) = FALSE; }
+    break;
+
+  case 241:
+/* Line 1787 of yacc.c  */
+#line 1927 "program/program_parse.y"
+    { (yyval.negate) = TRUE;  }
+    break;
+
+  case 242:
+/* Line 1787 of yacc.c  */
+#line 1928 "program/program_parse.y"
+    { (yyval.negate) = FALSE; }
+    break;
+
+  case 243:
+/* Line 1787 of yacc.c  */
+#line 1931 "program/program_parse.y"
+    { (yyval.integer) = (yyvsp[(2) - (2)].integer); }
+    break;
+
+  case 245:
+/* Line 1787 of yacc.c  */
+#line 1935 "program/program_parse.y"
+    {
+	   /* NV_fragment_program_option defines the size qualifiers in a
+	    * fairly broken way.  "SHORT" or "LONG" can optionally be used
+	    * before TEMP or OUTPUT.  However, neither is a reserved word!
+	    * This means that we have to parse it as an identifier, then check
+	    * to make sure it's one of the valid values.  *sigh*
+	    *
+	    * In addition, the grammar in the extension spec does *not* allow
+	    * the size specifier to be optional, but all known implementations
+	    * do.
+	    */
+	   if (!state->option.NV_fragment) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "unexpected IDENTIFIER");
+	      YYERROR;
+	   }
+
+	   if (strcmp("SHORT", (yyvsp[(1) - (1)].string)) == 0) {
+	   } else if (strcmp("LONG", (yyvsp[(1) - (1)].string)) == 0) {
+	   } else {
+	      char *const err_str =
+		 make_error_string("invalid storage size specifier \"%s\"",
+				   (yyvsp[(1) - (1)].string));
+
+	      yyerror(& (yylsp[(1) - (1)]), state, (err_str != NULL)
+		      ? err_str : "invalid storage size specifier");
+
+	      if (err_str != NULL) {
+		 free(err_str);
+	      }
+
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 246:
+/* Line 1787 of yacc.c  */
+#line 1969 "program/program_parse.y"
+    {
+	}
+    break;
+
+  case 247:
+/* Line 1787 of yacc.c  */
+#line 1973 "program/program_parse.y"
+    { (yyval.integer) = (yyvsp[(1) - (1)].integer); }
+    break;
+
+  case 249:
+/* Line 1787 of yacc.c  */
+#line 1977 "program/program_parse.y"
+    {
+	   if (!declare_variable(state, (yyvsp[(3) - (3)].string), (yyvsp[(0) - (3)].integer), & (yylsp[(3) - (3)]))) {
+	      free((yyvsp[(3) - (3)].string));
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 250:
+/* Line 1787 of yacc.c  */
+#line 1984 "program/program_parse.y"
+    {
+	   if (!declare_variable(state, (yyvsp[(1) - (1)].string), (yyvsp[(0) - (1)].integer), & (yylsp[(1) - (1)]))) {
+	      free((yyvsp[(1) - (1)].string));
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 251:
+/* Line 1787 of yacc.c  */
+#line 1993 "program/program_parse.y"
+    {
+	   struct asm_symbol *const s =
+	      declare_variable(state, (yyvsp[(3) - (5)].string), at_output, & (yylsp[(3) - (5)]));
+
+	   if (s == NULL) {
+	      free((yyvsp[(3) - (5)].string));
+	      YYERROR;
+	   } else {
+	      s->output_binding = (yyvsp[(5) - (5)].result);
+	   }
+	}
+    break;
+
+  case 252:
+/* Line 1787 of yacc.c  */
+#line 2007 "program/program_parse.y"
+    {
+	   if (state->mode == ARB_vertex) {
+	      (yyval.result) = VARYING_SLOT_POS;
+	   } else {
+	      yyerror(& (yylsp[(2) - (2)]), state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 253:
+/* Line 1787 of yacc.c  */
+#line 2016 "program/program_parse.y"
+    {
+	   if (state->mode == ARB_vertex) {
+	      (yyval.result) = VARYING_SLOT_FOGC;
+	   } else {
+	      yyerror(& (yylsp[(2) - (2)]), state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 254:
+/* Line 1787 of yacc.c  */
+#line 2025 "program/program_parse.y"
+    {
+	   (yyval.result) = (yyvsp[(2) - (2)].result);
+	}
+    break;
+
+  case 255:
+/* Line 1787 of yacc.c  */
+#line 2029 "program/program_parse.y"
+    {
+	   if (state->mode == ARB_vertex) {
+	      (yyval.result) = VARYING_SLOT_PSIZ;
+	   } else {
+	      yyerror(& (yylsp[(2) - (2)]), state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 256:
+/* Line 1787 of yacc.c  */
+#line 2038 "program/program_parse.y"
+    {
+	   if (state->mode == ARB_vertex) {
+	      (yyval.result) = VARYING_SLOT_TEX0 + (yyvsp[(3) - (3)].integer);
+	   } else {
+	      yyerror(& (yylsp[(2) - (3)]), state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 257:
+/* Line 1787 of yacc.c  */
+#line 2047 "program/program_parse.y"
+    {
+	   if (state->mode == ARB_fragment) {
+	      (yyval.result) = FRAG_RESULT_DEPTH;
+	   } else {
+	      yyerror(& (yylsp[(2) - (2)]), state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 258:
+/* Line 1787 of yacc.c  */
+#line 2058 "program/program_parse.y"
+    {
+	   (yyval.result) = (yyvsp[(2) - (3)].integer) + (yyvsp[(3) - (3)].integer);
+	}
+    break;
+
+  case 259:
+/* Line 1787 of yacc.c  */
+#line 2064 "program/program_parse.y"
+    {
+	   if (state->mode == ARB_vertex) {
+	      (yyval.integer) = VARYING_SLOT_COL0;
+	   } else {
+	      if (state->option.DrawBuffers)
+		 (yyval.integer) = FRAG_RESULT_DATA0;
+	      else
+		 (yyval.integer) = FRAG_RESULT_COLOR;
+	   }
+	}
+    break;
+
+  case 260:
+/* Line 1787 of yacc.c  */
+#line 2075 "program/program_parse.y"
+    {
+	   if (state->mode == ARB_vertex) {
+	      yyerror(& (yylsp[(1) - (3)]), state, "invalid program result name");
+	      YYERROR;
+	   } else {
+	      if (!state->option.DrawBuffers) {
+		 /* From the ARB_draw_buffers spec (same text exists
+		  * for ATI_draw_buffers):
+		  *
+		  *     If this option is not specified, a fragment
+		  *     program that attempts to bind
+		  *     "result.color[n]" will fail to load, and only
+		  *     "result.color" will be allowed.
+		  */
+		 yyerror(& (yylsp[(1) - (3)]), state,
+			 "result.color[] used without "
+			 "`OPTION ARB_draw_buffers' or "
+			 "`OPTION ATI_draw_buffers'");
+		 YYERROR;
+	      } else if ((yyvsp[(2) - (3)].integer) >= state->MaxDrawBuffers) {
+		 yyerror(& (yylsp[(1) - (3)]), state,
+			 "result.color[] exceeds MAX_DRAW_BUFFERS_ARB");
+		 YYERROR;
+	      }
+	      (yyval.integer) = FRAG_RESULT_DATA0 + (yyvsp[(2) - (3)].integer);
+	   }
+	}
+    break;
+
+  case 261:
+/* Line 1787 of yacc.c  */
+#line 2103 "program/program_parse.y"
+    {
+	   if (state->mode == ARB_vertex) {
+	      (yyval.integer) = VARYING_SLOT_COL0;
+	   } else {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 262:
+/* Line 1787 of yacc.c  */
+#line 2112 "program/program_parse.y"
+    {
+	   if (state->mode == ARB_vertex) {
+	      (yyval.integer) = VARYING_SLOT_BFC0;
+	   } else {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 263:
+/* Line 1787 of yacc.c  */
+#line 2123 "program/program_parse.y"
+    {
+	   (yyval.integer) = 0; 
+	}
+    break;
+
+  case 264:
+/* Line 1787 of yacc.c  */
+#line 2127 "program/program_parse.y"
+    {
+	   if (state->mode == ARB_vertex) {
+	      (yyval.integer) = 0;
+	   } else {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 265:
+/* Line 1787 of yacc.c  */
+#line 2136 "program/program_parse.y"
+    {
+	   if (state->mode == ARB_vertex) {
+	      (yyval.integer) = 1;
+	   } else {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+    break;
+
+  case 266:
+/* Line 1787 of yacc.c  */
+#line 2146 "program/program_parse.y"
+    { (yyval.integer) = 0; }
+    break;
+
+  case 267:
+/* Line 1787 of yacc.c  */
+#line 2147 "program/program_parse.y"
+    { (yyval.integer) = 0; }
+    break;
+
+  case 268:
+/* Line 1787 of yacc.c  */
+#line 2148 "program/program_parse.y"
+    { (yyval.integer) = 1; }
+    break;
+
+  case 269:
+/* Line 1787 of yacc.c  */
+#line 2151 "program/program_parse.y"
+    { (yyval.integer) = 0; }
+    break;
+
+  case 270:
+/* Line 1787 of yacc.c  */
+#line 2152 "program/program_parse.y"
+    { (yyval.integer) = 0; }
+    break;
+
+  case 271:
+/* Line 1787 of yacc.c  */
+#line 2153 "program/program_parse.y"
+    { (yyval.integer) = 1; }
+    break;
+
+  case 272:
+/* Line 1787 of yacc.c  */
+#line 2156 "program/program_parse.y"
+    { (yyval.integer) = 0; }
+    break;
+
+  case 273:
+/* Line 1787 of yacc.c  */
+#line 2157 "program/program_parse.y"
+    { (yyval.integer) = (yyvsp[(2) - (3)].integer); }
+    break;
+
+  case 274:
+/* Line 1787 of yacc.c  */
+#line 2160 "program/program_parse.y"
+    { (yyval.integer) = 0; }
+    break;
+
+  case 275:
+/* Line 1787 of yacc.c  */
+#line 2161 "program/program_parse.y"
+    { (yyval.integer) = (yyvsp[(2) - (3)].integer); }
+    break;
+
+  case 276:
+/* Line 1787 of yacc.c  */
+#line 2164 "program/program_parse.y"
+    { (yyval.integer) = 0; }
+    break;
+
+  case 277:
+/* Line 1787 of yacc.c  */
+#line 2165 "program/program_parse.y"
+    { (yyval.integer) = (yyvsp[(2) - (3)].integer); }
+    break;
+
+  case 278:
+/* Line 1787 of yacc.c  */
+#line 2169 "program/program_parse.y"
+    {
+	   if ((unsigned) (yyvsp[(1) - (1)].integer) >= state->MaxTextureCoordUnits) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid texture coordinate unit selector");
+	      YYERROR;
+	   }
+
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 279:
+/* Line 1787 of yacc.c  */
+#line 2180 "program/program_parse.y"
+    {
+	   if ((unsigned) (yyvsp[(1) - (1)].integer) >= state->MaxTextureImageUnits) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid texture image unit selector");
+	      YYERROR;
+	   }
+
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 280:
+/* Line 1787 of yacc.c  */
+#line 2191 "program/program_parse.y"
+    {
+	   if ((unsigned) (yyvsp[(1) - (1)].integer) >= state->MaxTextureUnits) {
+	      yyerror(& (yylsp[(1) - (1)]), state, "invalid texture unit selector");
+	      YYERROR;
+	   }
+
+	   (yyval.integer) = (yyvsp[(1) - (1)].integer);
+	}
+    break;
+
+  case 281:
+/* Line 1787 of yacc.c  */
+#line 2202 "program/program_parse.y"
+    {
+	   struct asm_symbol *exist = (struct asm_symbol *)
+	      _mesa_symbol_table_find_symbol(state->st, 0, (yyvsp[(2) - (4)].string));
+	   struct asm_symbol *target = (struct asm_symbol *)
+	      _mesa_symbol_table_find_symbol(state->st, 0, (yyvsp[(4) - (4)].string));
+
+	   free((yyvsp[(4) - (4)].string));
+
+	   if (exist != NULL) {
+	      char m[1000];
+	      _mesa_snprintf(m, sizeof(m), "redeclared identifier: %s", (yyvsp[(2) - (4)].string));
+	      free((yyvsp[(2) - (4)].string));
+	      yyerror(& (yylsp[(2) - (4)]), state, m);
+	      YYERROR;
+	   } else if (target == NULL) {
+	      free((yyvsp[(2) - (4)].string));
+	      yyerror(& (yylsp[(4) - (4)]), state,
+		      "undefined variable binding in ALIAS statement");
+	      YYERROR;
+	   } else {
+	      _mesa_symbol_table_add_symbol(state->st, 0, (yyvsp[(2) - (4)].string), target);
+	   }
+	}
+    break;
+
+
+/* Line 1787 of yacc.c  */
+#line 4857 "./program/program_parse.tab.c"
+      default: break;
+    }
+  /* User semantic actions sometimes alter yychar, and that requires
+     that yytoken be updated with the new translation.  We take the
+     approach of translating immediately before every use of yytoken.
+     One alternative is translating here after every semantic action,
+     but that translation would be missed if the semantic action invokes
+     YYABORT, YYACCEPT, or YYERROR immediately after altering yychar or
+     if it invokes YYBACKUP.  In the case of YYABORT or YYACCEPT, an
+     incorrect destructor might then be invoked immediately.  In the
+     case of YYERROR or YYBACKUP, subsequent parser actions might lead
+     to an incorrect destructor call or verbose syntax error message
+     before the lookahead is translated.  */
+  YY_SYMBOL_PRINT ("-> $$ =", yyr1[yyn], &yyval, &yyloc);
+
+  YYPOPSTACK (yylen);
+  yylen = 0;
+  YY_STACK_PRINT (yyss, yyssp);
+
+  *++yyvsp = yyval;
+  *++yylsp = yyloc;
+
+  /* Now `shift' the result of the reduction.  Determine what state
+     that goes to, based on the state we popped back to and the rule
+     number reduced by.  */
+
+  yyn = yyr1[yyn];
+
+  yystate = yypgoto[yyn - YYNTOKENS] + *yyssp;
+  if (0 <= yystate && yystate <= YYLAST && yycheck[yystate] == *yyssp)
+    yystate = yytable[yystate];
+  else
+    yystate = yydefgoto[yyn - YYNTOKENS];
+
+  goto yynewstate;
+
+
+/*------------------------------------.
+| yyerrlab -- here on detecting error |
+`------------------------------------*/
+yyerrlab:
+  /* Make sure we have latest lookahead translation.  See comments at
+     user semantic actions for why this is necessary.  */
+  yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
+
+  /* If not already recovering from an error, report this error.  */
+  if (!yyerrstatus)
+    {
+      ++yynerrs;
+#if ! YYERROR_VERBOSE
+      yyerror (&yylloc, state, YY_("syntax error"));
+#else
+# define YYSYNTAX_ERROR yysyntax_error (&yymsg_alloc, &yymsg, \
+                                        yyssp, yytoken)
+      {
+        char const *yymsgp = YY_("syntax error");
+        int yysyntax_error_status;
+        yysyntax_error_status = YYSYNTAX_ERROR;
+        if (yysyntax_error_status == 0)
+          yymsgp = yymsg;
+        else if (yysyntax_error_status == 1)
+          {
+            if (yymsg != yymsgbuf)
+              YYSTACK_FREE (yymsg);
+            yymsg = (char *) YYSTACK_ALLOC (yymsg_alloc);
+            if (!yymsg)
+              {
+                yymsg = yymsgbuf;
+                yymsg_alloc = sizeof yymsgbuf;
+                yysyntax_error_status = 2;
+              }
+            else
+              {
+                yysyntax_error_status = YYSYNTAX_ERROR;
+                yymsgp = yymsg;
+              }
+          }
+        yyerror (&yylloc, state, yymsgp);
+        if (yysyntax_error_status == 2)
+          goto yyexhaustedlab;
+      }
+# undef YYSYNTAX_ERROR
+#endif
+    }
+
+  yyerror_range[1] = yylloc;
+
+  if (yyerrstatus == 3)
+    {
+      /* If just tried and failed to reuse lookahead token after an
+	 error, discard it.  */
+
+      if (yychar <= YYEOF)
+	{
+	  /* Return failure if at end of input.  */
+	  if (yychar == YYEOF)
+	    YYABORT;
+	}
+      else
+	{
+	  yydestruct ("Error: discarding",
+		      yytoken, &yylval, &yylloc, state);
+	  yychar = YYEMPTY;
+	}
+    }
+
+  /* Else will try to reuse lookahead token after shifting the error
+     token.  */
+  goto yyerrlab1;
+
+
+/*---------------------------------------------------.
+| yyerrorlab -- error raised explicitly by YYERROR.  |
+`---------------------------------------------------*/
+yyerrorlab:
+
+  /* Pacify compilers like GCC when the user code never invokes
+     YYERROR and the label yyerrorlab therefore never appears in user
+     code.  */
+  if (/*CONSTCOND*/ 0)
+     goto yyerrorlab;
+
+  yyerror_range[1] = yylsp[1-yylen];
+  /* Do not reclaim the symbols of the rule which action triggered
+     this YYERROR.  */
+  YYPOPSTACK (yylen);
+  yylen = 0;
+  YY_STACK_PRINT (yyss, yyssp);
+  yystate = *yyssp;
+  goto yyerrlab1;
+
+
+/*-------------------------------------------------------------.
+| yyerrlab1 -- common code for both syntax error and YYERROR.  |
+`-------------------------------------------------------------*/
+yyerrlab1:
+  yyerrstatus = 3;	/* Each real token shifted decrements this.  */
+
+  for (;;)
+    {
+      yyn = yypact[yystate];
+      if (!yypact_value_is_default (yyn))
+	{
+	  yyn += YYTERROR;
+	  if (0 <= yyn && yyn <= YYLAST && yycheck[yyn] == YYTERROR)
+	    {
+	      yyn = yytable[yyn];
+	      if (0 < yyn)
+		break;
+	    }
+	}
+
+      /* Pop the current state because it cannot handle the error token.  */
+      if (yyssp == yyss)
+	YYABORT;
+
+      yyerror_range[1] = *yylsp;
+      yydestruct ("Error: popping",
+		  yystos[yystate], yyvsp, yylsp, state);
+      YYPOPSTACK (1);
+      yystate = *yyssp;
+      YY_STACK_PRINT (yyss, yyssp);
+    }
+
+  YY_IGNORE_MAYBE_UNINITIALIZED_BEGIN
+  *++yyvsp = yylval;
+  YY_IGNORE_MAYBE_UNINITIALIZED_END
+
+  yyerror_range[2] = yylloc;
+  /* Using YYLLOC is tempting, but would change the location of
+     the lookahead.  YYLOC is available though.  */
+  YYLLOC_DEFAULT (yyloc, yyerror_range, 2);
+  *++yylsp = yyloc;
+
+  /* Shift the error token.  */
+  YY_SYMBOL_PRINT ("Shifting", yystos[yyn], yyvsp, yylsp);
+
+  yystate = yyn;
+  goto yynewstate;
+
+
+/*-------------------------------------.
+| yyacceptlab -- YYACCEPT comes here.  |
+`-------------------------------------*/
+yyacceptlab:
+  yyresult = 0;
+  goto yyreturn;
+
+/*-----------------------------------.
+| yyabortlab -- YYABORT comes here.  |
+`-----------------------------------*/
+yyabortlab:
+  yyresult = 1;
+  goto yyreturn;
+
+#if !defined yyoverflow || YYERROR_VERBOSE
+/*-------------------------------------------------.
+| yyexhaustedlab -- memory exhaustion comes here.  |
+`-------------------------------------------------*/
+yyexhaustedlab:
+  yyerror (&yylloc, state, YY_("memory exhausted"));
+  yyresult = 2;
+  /* Fall through.  */
+#endif
+
+yyreturn:
+  if (yychar != YYEMPTY)
+    {
+      /* Make sure we have latest lookahead translation.  See comments at
+         user semantic actions for why this is necessary.  */
+      yytoken = YYTRANSLATE (yychar);
+      yydestruct ("Cleanup: discarding lookahead",
+                  yytoken, &yylval, &yylloc, state);
+    }
+  /* Do not reclaim the symbols of the rule which action triggered
+     this YYABORT or YYACCEPT.  */
+  YYPOPSTACK (yylen);
+  YY_STACK_PRINT (yyss, yyssp);
+  while (yyssp != yyss)
+    {
+      yydestruct ("Cleanup: popping",
+		  yystos[*yyssp], yyvsp, yylsp, state);
+      YYPOPSTACK (1);
+    }
+#ifndef yyoverflow
+  if (yyss != yyssa)
+    YYSTACK_FREE (yyss);
+#endif
+#if YYERROR_VERBOSE
+  if (yymsg != yymsgbuf)
+    YYSTACK_FREE (yymsg);
+#endif
+  /* Make sure YYID is used.  */
+  return YYID (yyresult);
+}
+
+
+/* Line 2050 of yacc.c  */
+#line 2231 "program/program_parse.y"
+
+
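+/**
+ * Fill in the destination and up to three source operands of an
+ * instruction, substituting initialized defaults for any NULL operand.
+ */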
+void
+asm_instruction_set_operands(struct asm_instruction *inst,
+			     const struct prog_dst_register *dst,
+			     const struct asm_src_register *src0,
+			     const struct asm_src_register *src1,
+			     const struct asm_src_register *src2)
+{
+   /* In the core ARB extensions only the KIL instruction doesn't have a
+    * destination register.
+    */
+   if (dst == NULL) {
+      init_dst_reg(& inst->Base.DstReg);
+   } else {
+      inst->Base.DstReg = *dst;
+   }
+
+   /* The only instruction that doesn't have any source registers is the
+    * condition-code based KIL instruction added by NV_fragment_program_option.
+    */
+   if (src0 != NULL) {
+      inst->Base.SrcReg[0] = src0->Base;
+      inst->SrcReg[0] = *src0;
+   } else {
+      init_src_reg(& inst->SrcReg[0]);
+   }
+
+   if (src1 != NULL) {
+      inst->Base.SrcReg[1] = src1->Base;
+      inst->SrcReg[1] = *src1;
+   } else {
+      init_src_reg(& inst->SrcReg[1]);
+   }
+
+   if (src2 != NULL) {
+      inst->Base.SrcReg[2] = src2->Base;
+      inst->SrcReg[2] = *src2;
+   } else {
+      init_src_reg(& inst->SrcReg[2]);
+   }
+}
+
+
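+/**
+ * Allocate a zeroed instruction, initialize it to Mesa defaults, set its
+ * opcode, and wire up the (possibly NULL) operands.
+ */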
+struct asm_instruction *
+asm_instruction_ctor(gl_inst_opcode op,
+		     const struct prog_dst_register *dst,
+		     const struct asm_src_register *src0,
+		     const struct asm_src_register *src1,
+		     const struct asm_src_register *src2)
+{
+   struct asm_instruction *inst = CALLOC_STRUCT(asm_instruction);
+
+   if (inst) {
+      _mesa_init_instructions(& inst->Base, 1);
+      inst->Base.Opcode = op;
+
+      asm_instruction_set_operands(inst, dst, src0, src1, src2);
+   }
+
+   return inst;
+}
+
+
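+/**
+ * Like asm_instruction_ctor(), but take the opcode, condition-code state,
+ * saturation mode, and precision from an existing instruction template.
+ */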
+struct asm_instruction *
+asm_instruction_copy_ctor(const struct prog_instruction *base,
+			  const struct prog_dst_register *dst,
+			  const struct asm_src_register *src0,
+			  const struct asm_src_register *src1,
+			  const struct asm_src_register *src2)
+{
+   struct asm_instruction *inst = CALLOC_STRUCT(asm_instruction);
+
+   if (inst) {
+      _mesa_init_instructions(& inst->Base, 1);
+      inst->Base.Opcode = base->Opcode;
+      inst->Base.CondUpdate = base->CondUpdate;
+      inst->Base.CondDst = base->CondDst;
+      inst->Base.SaturateMode = base->SaturateMode;
+      inst->Base.Precision = base->Precision;
+
+      asm_instruction_set_operands(inst, dst, src0, src1, src2);
+   }
+
+   return inst;
+}
+
+
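+/**
+ * Reset a destination register: PROGRAM_UNDEFINED file, full XYZW write
+ * mask, and a pass-through (COND_TR) condition.
+ */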
+void
+init_dst_reg(struct prog_dst_register *r)
+{
+   memset(r, 0, sizeof(*r));
+   r->File = PROGRAM_UNDEFINED;
+   r->WriteMask = WRITEMASK_XYZW;
+   r->CondMask = COND_TR;
+   r->CondSwizzle = SWIZZLE_NOOP;
+}
+
+
+/** Like init_dst_reg() but set the File and Index fields. */
+void
+set_dst_reg(struct prog_dst_register *r, gl_register_file file, GLint index)
+{
+   const GLint maxIndex = 1 << INST_INDEX_BITS;
+   const GLint minIndex = 0;
+   ASSERT(index >= minIndex);
+   (void) minIndex;
+   ASSERT(index <= maxIndex);
+   (void) maxIndex;
+   ASSERT(file == PROGRAM_TEMPORARY ||
+	  file == PROGRAM_ADDRESS ||
+	  file == PROGRAM_OUTPUT);
+   memset(r, 0, sizeof(*r));
+   r->File = file;
+   r->Index = index;
+   r->WriteMask = WRITEMASK_XYZW;
+   r->CondMask = COND_TR;
+   r->CondSwizzle = SWIZZLE_NOOP;
+}
+
+
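+/**
+ * Reset a source register: PROGRAM_UNDEFINED file, identity (NOOP) swizzle,
+ * and no symbol binding.
+ */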
+void
+init_src_reg(struct asm_src_register *r)
+{
+   memset(r, 0, sizeof(*r));
+   r->Base.File = PROGRAM_UNDEFINED;
+   r->Base.Swizzle = SWIZZLE_NOOP;
+   r->Symbol = NULL;
+}
+
+
+/** Like init_src_reg() but set the File and Index fields. */
+void
+set_src_reg(struct asm_src_register *r, gl_register_file file, GLint index)
+{
+   set_src_reg_swz(r, file, index, SWIZZLE_XYZW);
+}
+
+
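+/**
+ * Like set_src_reg() but also set the swizzle.  Asserts that the index fits
+ * in INST_INDEX_BITS.
+ */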
+void
+set_src_reg_swz(struct asm_src_register *r, gl_register_file file, GLint index,
+                GLuint swizzle)
+{
+   const GLint maxIndex = (1 << INST_INDEX_BITS) - 1;
+   const GLint minIndex = -(1 << INST_INDEX_BITS);
+   ASSERT(file < PROGRAM_FILE_MAX);
+   ASSERT(index >= minIndex);
+   (void) minIndex;
+   ASSERT(index <= maxIndex);
+   (void) maxIndex;
+   memset(r, 0, sizeof(*r));
+   r->Base.File = file;
+   r->Base.Index = index;
+   r->Base.Swizzle = swizzle;
+   r->Symbol = NULL;
+}
+
+
+/**
+ * Validate the set of inputs used by a program
+ *
+ * Validates that legal sets of inputs are used by the program.  In this case
+ * "used" included both reading the input or binding the input to a name using
+ * the \c ATTRIB command.
+ *
+ * \return
+ * \c TRUE if the combination of inputs used is valid, \c FALSE otherwise.
+ */
+int
+validate_inputs(struct YYLTYPE *locp, struct asm_parser_state *state)
+{
+   const GLbitfield64 inputs = state->prog->InputsRead | state->InputsBound;
+
+   if (((inputs & VERT_BIT_FF_ALL) & (inputs >> VERT_ATTRIB_GENERIC0)) != 0) {
+      yyerror(locp, state, "illegal use of generic attribute and name attribute");
+      return 0;
+   }
+
+   return 1;
+}
+
+
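+/**
+ * Add a new variable to the symbol table, rejecting redeclared identifiers
+ * and enforcing the per-program temporary and address-register limits.
+ */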
+struct asm_symbol *
+declare_variable(struct asm_parser_state *state, char *name, enum asm_type t,
+		 struct YYLTYPE *locp)
+{
+   struct asm_symbol *s = NULL;
+   struct asm_symbol *exist = (struct asm_symbol *)
+      _mesa_symbol_table_find_symbol(state->st, 0, name);
+
+
+   if (exist != NULL) {
+      yyerror(locp, state, "redeclared identifier");
+   } else {
+      s = calloc(1, sizeof(struct asm_symbol));
+      s->name = name;
+      s->type = t;
+
+      switch (t) {
+      case at_temp:
+	 if (state->prog->NumTemporaries >= state->limits->MaxTemps) {
+	    yyerror(locp, state, "too many temporaries declared");
+	    free(s);
+	    return NULL;
+	 }
+
+	 s->temp_binding = state->prog->NumTemporaries;
+	 state->prog->NumTemporaries++;
+	 break;
+
+      case at_address:
+	 if (state->prog->NumAddressRegs >= state->limits->MaxAddressRegs) {
+	    yyerror(locp, state, "too many address registers declared");
+	    free(s);
+	    return NULL;
+	 }
+
+	 /* FINISHME: Add support for multiple address registers.
+	  */
+	 state->prog->NumAddressRegs++;
+	 break;
+
+      default:
+	 break;
+      }
+
+      _mesa_symbol_table_add_symbol(state->st, 0, s->name, s);
+      s->next = state->sym;
+      state->sym = s;
+   }
+
+   return s;
+}
+
+
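+/**
+ * Append a reference to a GL state vector to the program's parameter list
+ * and accumulate its state flags.
+ *
+ * \return index of the new entry in \c param_list.
+ */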
+int add_state_reference(struct gl_program_parameter_list *param_list,
+			const gl_state_index tokens[STATE_LENGTH])
+{
+   const GLuint size = 4; /* XXX fix */
+   char *name;
+   GLint index;
+
+   name = _mesa_program_state_string(tokens);
+   index = _mesa_add_parameter(param_list, PROGRAM_STATE_VAR, name,
+                               size, GL_NONE, NULL, tokens);
+   param_list->StateFlags |= _mesa_program_state_flags(tokens);
+
+   /* free the name string here since we duplicated it in _mesa_add_parameter() */
+   free(name);
+
+   return index;
+}
+
+
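+/**
+ * Bind a PARAM symbol to GL state.  Matrix state spanning multiple rows is
+ * unrolled into one parameter-list entry per row.
+ */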
+int
+initialize_symbol_from_state(struct gl_program *prog,
+			     struct asm_symbol *param_var, 
+			     const gl_state_index tokens[STATE_LENGTH])
+{
+   int idx = -1;
+   gl_state_index state_tokens[STATE_LENGTH];
+
+
+   memcpy(state_tokens, tokens, sizeof(state_tokens));
+
+   param_var->type = at_param;
+   param_var->param_binding_type = PROGRAM_STATE_VAR;
+
+   /* If we are adding a STATE_MATRIX that has multiple rows, we need to
+    * unroll it and call add_state_reference() for each row
+    */
+   if ((state_tokens[0] == STATE_MODELVIEW_MATRIX ||
+	state_tokens[0] == STATE_PROJECTION_MATRIX ||
+	state_tokens[0] == STATE_MVP_MATRIX ||
+	state_tokens[0] == STATE_TEXTURE_MATRIX ||
+	state_tokens[0] == STATE_PROGRAM_MATRIX)
+       && (state_tokens[2] != state_tokens[3])) {
+      int row;
+      const int first_row = state_tokens[2];
+      const int last_row = state_tokens[3];
+
+      for (row = first_row; row <= last_row; row++) {
+	 state_tokens[2] = state_tokens[3] = row;
+
+	 idx = add_state_reference(prog->Parameters, state_tokens);
+	 if (param_var->param_binding_begin == ~0U) {
+	    param_var->param_binding_begin = idx;
+            param_var->param_binding_swizzle = SWIZZLE_XYZW;
+         }
+
+	 param_var->param_binding_length++;
+      }
+   }
+   else {
+      idx = add_state_reference(prog->Parameters, state_tokens);
+      if (param_var->param_binding_begin == ~0U) {
+	 param_var->param_binding_begin = idx;
+         param_var->param_binding_swizzle = SWIZZLE_XYZW;
+      }
+      param_var->param_binding_length++;
+   }
+
+   return idx;
+}
+
+
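+/**
+ * Bind a PARAM symbol to a program environment or local parameter array.
+ * Multi-element ranges are unrolled into one entry per element, mirroring
+ * the matrix-row handling in initialize_symbol_from_state().
+ */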
+int
+initialize_symbol_from_param(struct gl_program *prog,
+			     struct asm_symbol *param_var, 
+			     const gl_state_index tokens[STATE_LENGTH])
+{
+   int idx = -1;
+   gl_state_index state_tokens[STATE_LENGTH];
+
+
+   memcpy(state_tokens, tokens, sizeof(state_tokens));
+
+   assert((state_tokens[0] == STATE_VERTEX_PROGRAM)
+	  || (state_tokens[0] == STATE_FRAGMENT_PROGRAM));
+   assert((state_tokens[1] == STATE_ENV)
+	  || (state_tokens[1] == STATE_LOCAL));
+
+   /*
+    * The param type is STATE_VAR.  The program parameter entry will
+    * effectively be a pointer into the LOCAL or ENV parameter array.
+    */
+   param_var->type = at_param;
+   param_var->param_binding_type = PROGRAM_STATE_VAR;
+
+   /* If we are adding a STATE_ENV or STATE_LOCAL that has multiple elements,
+    * we need to unroll it and call add_state_reference() for each row
+    */
+   if (state_tokens[2] != state_tokens[3]) {
+      int row;
+      const int first_row = state_tokens[2];
+      const int last_row = state_tokens[3];
+
+      for (row = first_row; row <= last_row; row++) {
+	 state_tokens[2] = state_tokens[3] = row;
+
+	 idx = add_state_reference(prog->Parameters, state_tokens);
+	 if (param_var->param_binding_begin == ~0U) {
+	    param_var->param_binding_begin = idx;
+            param_var->param_binding_swizzle = SWIZZLE_XYZW;
+         }
+	 param_var->param_binding_length++;
+      }
+   }
+   else {
+      idx = add_state_reference(prog->Parameters, state_tokens);
+      if (param_var->param_binding_begin == ~0U) {
+	 param_var->param_binding_begin = idx;
+         param_var->param_binding_swizzle = SWIZZLE_XYZW;
+      }
+      param_var->param_binding_length++;
+   }
+
+   return idx;
+}
+
+
+/**
+ * Put a float/vector constant/literal into the parameter list.
+ * \param param_var  returns info about the parameter/constant's location,
+ *                   binding, type, etc.
+ * \param vec  the vector/constant to add
+ * \param allowSwizzle  if true, try to consolidate constants which only differ
+ *                      by a swizzle.  We don't want to do this when building
+ *                      arrays of constants that may be indexed indirectly.
+ * \return index of the constant in the parameter list.
+ */
+int
+initialize_symbol_from_const(struct gl_program *prog,
+			     struct asm_symbol *param_var, 
+			     const struct asm_vector *vec,
+                             GLboolean allowSwizzle)
+{
+   unsigned swizzle;
+   const int idx = _mesa_add_unnamed_constant(prog->Parameters,
+                                              vec->data, vec->count,
+                                              allowSwizzle ? &swizzle : NULL);
+
+   param_var->type = at_param;
+   param_var->param_binding_type = PROGRAM_CONSTANT;
+
+   if (param_var->param_binding_begin == ~0U) {
+      param_var->param_binding_begin = idx;
+      param_var->param_binding_swizzle = allowSwizzle ? swizzle : SWIZZLE_XYZW;
+   }
+   param_var->param_binding_length++;
+
+   return idx;
+}
+
+
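+/**
+ * printf-style formatter that allocates a buffer of exactly the right size.
+ *
+ * \return the formatted string, or NULL on allocation failure.
+ */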
+char *
+make_error_string(const char *fmt, ...)
+{
+   int length;
+   char *str;
+   va_list args;
+
+
+   /* Call vsnprintf once to determine how large the final string is.  Call it
+    * again to do the actual formatting.  From the vsnprintf manual page:
+    *
+    *    Upon successful return, these functions return the number of
+    *    characters printed  (not including the trailing '\0' used to end
+    *    output to strings).
+    */
+   va_start(args, fmt);
+   length = 1 + vsnprintf(NULL, 0, fmt, args);
+   va_end(args);
+
+   str = malloc(length);
+   if (str) {
+      va_start(args, fmt);
+      vsnprintf(str, length, fmt, args);
+      va_end(args);
+   }
+
+   return str;
+}
+
+
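+/**
+ * Report a parse error both through the normal GL error mechanism and as a
+ * position-annotated program error string.
+ */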
+void
+yyerror(YYLTYPE *locp, struct asm_parser_state *state, const char *s)
+{
+   char *err_str;
+
+
+   err_str = make_error_string("glProgramStringARB(%s)\n", s);
+   if (err_str) {
+      _mesa_error(state->ctx, GL_INVALID_OPERATION, "%s", err_str);
+      free(err_str);
+   }
+
+   err_str = make_error_string("line %u, char %u: error: %s\n",
+			       locp->first_line, locp->first_column, s);
+   _mesa_set_program_error(state->ctx, locp->position, err_str);
+
+   if (err_str) {
+      free(err_str);
+   }
+}
+
+
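+/**
+ * Parse an ARB vertex or fragment program into \c state->prog.  On success
+ * the instruction list is flattened into prog->Instructions and terminated
+ * with OPCODE_END; the parser's temporary lists are freed either way.
+ */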
+GLboolean
+_mesa_parse_arb_program(struct gl_context *ctx, GLenum target, const GLubyte *str,
+			GLsizei len, struct asm_parser_state *state)
+{
+   struct asm_instruction *inst;
+   unsigned i;
+   GLubyte *strz;
+   GLboolean result = GL_FALSE;
+   void *temp;
+   struct asm_symbol *sym;
+
+   state->ctx = ctx;
+   state->prog->Target = target;
+   state->prog->Parameters = _mesa_new_parameter_list();
+
+   /* Make a copy of the program string and force it to be NUL-terminated.
+    */
+   strz = (GLubyte *) malloc(len + 1);
+   if (strz == NULL) {
+      _mesa_error(ctx, GL_OUT_OF_MEMORY, "glProgramStringARB");
+      return GL_FALSE;
+   }
+   memcpy (strz, str, len);
+   strz[len] = '\0';
+
+   state->prog->String = strz;
+
+   state->st = _mesa_symbol_table_ctor();
+
+   state->limits = (target == GL_VERTEX_PROGRAM_ARB)
+      ? & ctx->Const.Program[MESA_SHADER_VERTEX]
+      : & ctx->Const.Program[MESA_SHADER_FRAGMENT];
+
+   state->MaxTextureImageUnits = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits;
+   state->MaxTextureCoordUnits = ctx->Const.MaxTextureCoordUnits;
+   state->MaxTextureUnits = ctx->Const.MaxTextureUnits;
+   state->MaxClipPlanes = ctx->Const.MaxClipPlanes;
+   state->MaxLights = ctx->Const.MaxLights;
+   state->MaxProgramMatrices = ctx->Const.MaxProgramMatrices;
+   state->MaxDrawBuffers = ctx->Const.MaxDrawBuffers;
+
+   state->state_param_enum = (target == GL_VERTEX_PROGRAM_ARB)
+      ? STATE_VERTEX_PROGRAM : STATE_FRAGMENT_PROGRAM;
+
+   _mesa_set_program_error(ctx, -1, NULL);
+
+   _mesa_program_lexer_ctor(& state->scanner, state, (const char *) str, len);
+   yyparse(state);
+   _mesa_program_lexer_dtor(state->scanner);
+
+
+   if (ctx->Program.ErrorPos != -1) {
+      goto error;
+   }
+
+   if (! _mesa_layout_parameters(state)) {
+      struct YYLTYPE loc;
+
+      loc.first_line = 0;
+      loc.first_column = 0;
+      loc.position = len;
+
+      yyerror(& loc, state, "invalid PARAM usage");
+      goto error;
+   }
+
+   /* Add one instruction to store the "END" instruction.
+    */
+   state->prog->Instructions =
+      _mesa_alloc_instructions(state->prog->NumInstructions + 1);
+
+   if (state->prog->Instructions == NULL) {
+      goto error;
+   }
+
+   inst = state->inst_head;
+   for (i = 0; i < state->prog->NumInstructions; i++) {
+      struct asm_instruction *const temp = inst->next;
+
+      state->prog->Instructions[i] = inst->Base;
+      inst = temp;
+   }
+
+   /* Finally, tag on an OPCODE_END instruction */
+   {
+      const GLuint numInst = state->prog->NumInstructions;
+      _mesa_init_instructions(state->prog->Instructions + numInst, 1);
+      state->prog->Instructions[numInst].Opcode = OPCODE_END;
+   }
+   state->prog->NumInstructions++;
+
+   state->prog->NumParameters = state->prog->Parameters->NumParameters;
+   state->prog->NumAttributes = _mesa_bitcount_64(state->prog->InputsRead);
+
+   /*
+    * Initialize native counts to logical counts.  The device driver may
+    * change them if program is translated into a hardware program.
+    */
+   state->prog->NumNativeInstructions = state->prog->NumInstructions;
+   state->prog->NumNativeTemporaries = state->prog->NumTemporaries;
+   state->prog->NumNativeParameters = state->prog->NumParameters;
+   state->prog->NumNativeAttributes = state->prog->NumAttributes;
+   state->prog->NumNativeAddressRegs = state->prog->NumAddressRegs;
+
+   result = GL_TRUE;
+
+error:
+   for (inst = state->inst_head; inst != NULL; inst = temp) {
+      temp = inst->next;
+      free(inst);
+   }
+
+   state->inst_head = NULL;
+   state->inst_tail = NULL;
+
+   for (sym = state->sym; sym != NULL; sym = temp) {
+      temp = sym->next;
+
+      free((void *) sym->name);
+      free(sym);
+   }
+   state->sym = NULL;
+
+   _mesa_symbol_table_dtor(state->st);
+   state->st = NULL;
+
+   return result;
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/program_parse.tab.h b/icd/intel/compiler/mesa-utils/src/mesa/program/program_parse.tab.h
new file mode 100644
index 0000000..eab1c10
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/program_parse.tab.h
@@ -0,0 +1,225 @@
+/* A Bison parser, made by GNU Bison 2.7.12-4996.  */
+
+/* Bison interface for Yacc-like parsers in C
+   
+      Copyright (C) 1984, 1989-1990, 2000-2013 Free Software Foundation, Inc.
+   
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+   
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+   
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+/* As a special exception, you may create a larger work that contains
+   part or all of the Bison parser skeleton and distribute that work
+   under terms of your choice, so long as that work isn't itself a
+   parser generator using the skeleton or a modified version thereof
+   as a parser skeleton.  Alternatively, if you modify or redistribute
+   the parser skeleton itself, you may (at your option) remove this
+   special exception, which will cause the skeleton and the resulting
+   Bison output files to be licensed under the GNU General Public
+   License without this special exception.
+   
+   This special exception was added by the Free Software Foundation in
+   version 2.2 of Bison.  */
+
+#ifndef YY__MESA_PROGRAM_PROGRAM_PROGRAM_PARSE_TAB_H_INCLUDED
+# define YY__MESA_PROGRAM_PROGRAM_PROGRAM_PARSE_TAB_H_INCLUDED
+/* Enabling traces.  */
+#ifndef YYDEBUG
+# define YYDEBUG 0
+#endif
+#if YYDEBUG
+extern int _mesa_program_debug;
+#endif
+
+/* Tokens.  */
+#ifndef YYTOKENTYPE
+# define YYTOKENTYPE
+   /* Put the tokens into the symbol table, so that GDB and other debuggers
+      know about them.  */
+   enum yytokentype {
+     ARBvp_10 = 258,
+     ARBfp_10 = 259,
+     ADDRESS = 260,
+     ALIAS = 261,
+     ATTRIB = 262,
+     OPTION = 263,
+     OUTPUT = 264,
+     PARAM = 265,
+     TEMP = 266,
+     END = 267,
+     BIN_OP = 268,
+     BINSC_OP = 269,
+     SAMPLE_OP = 270,
+     SCALAR_OP = 271,
+     TRI_OP = 272,
+     VECTOR_OP = 273,
+     ARL = 274,
+     KIL = 275,
+     SWZ = 276,
+     TXD_OP = 277,
+     INTEGER = 278,
+     REAL = 279,
+     AMBIENT = 280,
+     ATTENUATION = 281,
+     BACK = 282,
+     CLIP = 283,
+     COLOR = 284,
+     DEPTH = 285,
+     DIFFUSE = 286,
+     DIRECTION = 287,
+     EMISSION = 288,
+     ENV = 289,
+     EYE = 290,
+     FOG = 291,
+     FOGCOORD = 292,
+     FRAGMENT = 293,
+     FRONT = 294,
+     HALF = 295,
+     INVERSE = 296,
+     INVTRANS = 297,
+     LIGHT = 298,
+     LIGHTMODEL = 299,
+     LIGHTPROD = 300,
+     LOCAL = 301,
+     MATERIAL = 302,
+     MAT_PROGRAM = 303,
+     MATRIX = 304,
+     MATRIXINDEX = 305,
+     MODELVIEW = 306,
+     MVP = 307,
+     NORMAL = 308,
+     OBJECT = 309,
+     PALETTE = 310,
+     PARAMS = 311,
+     PLANE = 312,
+     POINT_TOK = 313,
+     POINTSIZE = 314,
+     POSITION = 315,
+     PRIMARY = 316,
+     PROGRAM = 317,
+     PROJECTION = 318,
+     RANGE = 319,
+     RESULT = 320,
+     ROW = 321,
+     SCENECOLOR = 322,
+     SECONDARY = 323,
+     SHININESS = 324,
+     SIZE_TOK = 325,
+     SPECULAR = 326,
+     SPOT = 327,
+     STATE = 328,
+     TEXCOORD = 329,
+     TEXENV = 330,
+     TEXGEN = 331,
+     TEXGEN_Q = 332,
+     TEXGEN_R = 333,
+     TEXGEN_S = 334,
+     TEXGEN_T = 335,
+     TEXTURE = 336,
+     TRANSPOSE = 337,
+     TEXTURE_UNIT = 338,
+     TEX_1D = 339,
+     TEX_2D = 340,
+     TEX_3D = 341,
+     TEX_CUBE = 342,
+     TEX_RECT = 343,
+     TEX_SHADOW1D = 344,
+     TEX_SHADOW2D = 345,
+     TEX_SHADOWRECT = 346,
+     TEX_ARRAY1D = 347,
+     TEX_ARRAY2D = 348,
+     TEX_ARRAYSHADOW1D = 349,
+     TEX_ARRAYSHADOW2D = 350,
+     VERTEX = 351,
+     VTXATTRIB = 352,
+     WEIGHT = 353,
+     IDENTIFIER = 354,
+     USED_IDENTIFIER = 355,
+     MASK4 = 356,
+     MASK3 = 357,
+     MASK2 = 358,
+     MASK1 = 359,
+     SWIZZLE = 360,
+     DOT_DOT = 361,
+     DOT = 362
+   };
+#endif
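+
+/* Note: the first token value is 258 because Bison reserves 0-255 for
+ * single-character literal tokens and 256/257 for its internal "error"
+ * and "$undefined" tokens.
+ */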
+
+
+#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
+typedef union YYSTYPE
+{
+/* Line 2053 of yacc.c  */
+#line 124 "program/program_parse.y"
+
+   struct asm_instruction *inst;
+   struct asm_symbol *sym;
+   struct asm_symbol temp_sym;
+   struct asm_swizzle_mask swiz_mask;
+   struct asm_src_register src_reg;
+   struct prog_dst_register dst_reg;
+   struct prog_instruction temp_inst;
+   char *string;
+   unsigned result;
+   unsigned attrib;
+   int integer;
+   float real;
+   gl_state_index state[STATE_LENGTH];
+   int negate;
+   struct asm_vector vector;
+   gl_inst_opcode opcode;
+
+   struct {
+      unsigned swz;
+      unsigned rgba_valid:1;
+      unsigned xyzw_valid:1;
+      unsigned negate:1;
+   } ext_swizzle;
+
+
+/* Line 2053 of yacc.c  */
+#line 191 "./program/program_parse.tab.h"
+} YYSTYPE;
+# define YYSTYPE_IS_TRIVIAL 1
+# define yystype YYSTYPE /* obsolescent; will be withdrawn */
+# define YYSTYPE_IS_DECLARED 1
+#endif
+
+#if ! defined YYLTYPE && ! defined YYLTYPE_IS_DECLARED
+typedef struct YYLTYPE
+{
+  int first_line;
+  int first_column;
+  int last_line;
+  int last_column;
+} YYLTYPE;
+# define yyltype YYLTYPE /* obsolescent; will be withdrawn */
+# define YYLTYPE_IS_DECLARED 1
+# define YYLTYPE_IS_TRIVIAL 1
+#endif
+
+
+#ifdef YYPARSE_PARAM
+#if defined __STDC__ || defined __cplusplus
+int _mesa_program_parse (void *YYPARSE_PARAM);
+#else
+int _mesa_program_parse ();
+#endif
+#else /* ! YYPARSE_PARAM */
+#if defined __STDC__ || defined __cplusplus
+int _mesa_program_parse (struct asm_parser_state *state);
+#else
+int _mesa_program_parse ();
+#endif
+#endif /* ! YYPARSE_PARAM */
+
+#endif /* !YY__MESA_PROGRAM_PROGRAM_PROGRAM_PARSE_TAB_H_INCLUDED  */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/program_parse.y b/icd/intel/compiler/mesa-utils/src/mesa/program/program_parse.y
new file mode 100644
index 0000000..1664740
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/program_parse.y
@@ -0,0 +1,2809 @@
+%{
+/*
+ * Copyright © 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "main/mtypes.h"
+#include "main/imports.h"
+#include "program/program.h"
+#include "program/prog_parameter.h"
+#include "program/prog_parameter_layout.h"
+#include "program/prog_statevars.h"
+#include "program/prog_instruction.h"
+
+#include "program/symbol_table.h"
+#include "program/program_parser.h"
+
+extern void *yy_scan_string(char *);
+extern void yy_delete_buffer(void *);
+
+static struct asm_symbol *declare_variable(struct asm_parser_state *state,
+    char *name, enum asm_type t, struct YYLTYPE *locp);
+
+static int add_state_reference(struct gl_program_parameter_list *param_list,
+    const gl_state_index tokens[STATE_LENGTH]);
+
+static int initialize_symbol_from_state(struct gl_program *prog,
+    struct asm_symbol *param_var, const gl_state_index tokens[STATE_LENGTH]);
+
+static int initialize_symbol_from_param(struct gl_program *prog,
+    struct asm_symbol *param_var, const gl_state_index tokens[STATE_LENGTH]);
+
+static int initialize_symbol_from_const(struct gl_program *prog,
+    struct asm_symbol *param_var, const struct asm_vector *vec,
+    GLboolean allowSwizzle);
+
+static int yyparse(struct asm_parser_state *state);
+
+static char *make_error_string(const char *fmt, ...);
+
+static void yyerror(struct YYLTYPE *locp, struct asm_parser_state *state,
+    const char *s);
+
+static int validate_inputs(struct YYLTYPE *locp,
+    struct asm_parser_state *state);
+
+static void init_dst_reg(struct prog_dst_register *r);
+
+static void set_dst_reg(struct prog_dst_register *r,
+                        gl_register_file file, GLint index);
+
+static void init_src_reg(struct asm_src_register *r);
+
+static void set_src_reg(struct asm_src_register *r,
+                        gl_register_file file, GLint index);
+
+static void set_src_reg_swz(struct asm_src_register *r,
+                            gl_register_file file, GLint index, GLuint swizzle);
+
+static void asm_instruction_set_operands(struct asm_instruction *inst,
+    const struct prog_dst_register *dst, const struct asm_src_register *src0,
+    const struct asm_src_register *src1, const struct asm_src_register *src2);
+
+static struct asm_instruction *asm_instruction_ctor(gl_inst_opcode op,
+    const struct prog_dst_register *dst, const struct asm_src_register *src0,
+    const struct asm_src_register *src1, const struct asm_src_register *src2);
+
+static struct asm_instruction *asm_instruction_copy_ctor(
+    const struct prog_instruction *base, const struct prog_dst_register *dst,
+    const struct asm_src_register *src0, const struct asm_src_register *src1,
+    const struct asm_src_register *src2);
+
+#ifndef FALSE
+#define FALSE 0
+#define TRUE (!FALSE)
+#endif
+
+#define YYLLOC_DEFAULT(Current, Rhs, N)					\
+   do {									\
+      if (N) {							\
+	 (Current).first_line = YYRHSLOC(Rhs, 1).first_line;		\
+	 (Current).first_column = YYRHSLOC(Rhs, 1).first_column;	\
+	 (Current).position = YYRHSLOC(Rhs, 1).position;		\
+	 (Current).last_line = YYRHSLOC(Rhs, N).last_line;		\
+	 (Current).last_column = YYRHSLOC(Rhs, N).last_column;		\
+      } else {								\
+	 (Current).first_line = YYRHSLOC(Rhs, 0).last_line;		\
+	 (Current).last_line = (Current).first_line;			\
+	 (Current).first_column = YYRHSLOC(Rhs, 0).last_column;		\
+	 (Current).last_column = (Current).first_column;		\
+	 (Current).position = YYRHSLOC(Rhs, 0).position			\
+	    + (Current).first_column;					\
+      }									\
+   } while(0)
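+
+/* This is the stock Bison YYLLOC_DEFAULT extended with a "position" member:
+ * in addition to line/column it tracks an absolute character offset into
+ * the program string, which _mesa_parse_arb_program() uses when reporting
+ * errors such as the parameter-layout failure above.
+ */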
+%}
+
+%pure-parser
+%locations
+%lex-param   { struct asm_parser_state *state }
+%parse-param { struct asm_parser_state *state }
+%error-verbose
+
+%union {
+   struct asm_instruction *inst;
+   struct asm_symbol *sym;
+   struct asm_symbol temp_sym;
+   struct asm_swizzle_mask swiz_mask;
+   struct asm_src_register src_reg;
+   struct prog_dst_register dst_reg;
+   struct prog_instruction temp_inst;
+   char *string;
+   unsigned result;
+   unsigned attrib;
+   int integer;
+   float real;
+   gl_state_index state[STATE_LENGTH];
+   int negate;
+   struct asm_vector vector;
+   gl_inst_opcode opcode;
+
+   struct {
+      unsigned swz;
+      unsigned rgba_valid:1;
+      unsigned xyzw_valid:1;
+      unsigned negate:1;
+   } ext_swizzle;
+}
+
+%token ARBvp_10 ARBfp_10
+
+/* Tokens for assembler pseudo-ops */
+%token <integer> ADDRESS
+%token ALIAS ATTRIB
+%token OPTION OUTPUT
+%token PARAM
+%token <integer> TEMP
+%token END
+
+ /* Tokens for instructions */
+%token <temp_inst> BIN_OP BINSC_OP SAMPLE_OP SCALAR_OP TRI_OP VECTOR_OP
+%token <temp_inst> ARL KIL SWZ TXD_OP
+
+%token <integer> INTEGER
+%token <real> REAL
+
+%token AMBIENT ATTENUATION
+%token BACK
+%token CLIP COLOR
+%token DEPTH DIFFUSE DIRECTION
+%token EMISSION ENV EYE
+%token FOG FOGCOORD FRAGMENT FRONT
+%token HALF
+%token INVERSE INVTRANS
+%token LIGHT LIGHTMODEL LIGHTPROD LOCAL
+%token MATERIAL MAT_PROGRAM MATRIX MATRIXINDEX MODELVIEW MVP
+%token NORMAL
+%token OBJECT
+%token PALETTE PARAMS PLANE POINT_TOK POINTSIZE POSITION PRIMARY PROGRAM PROJECTION
+%token RANGE RESULT ROW
+%token SCENECOLOR SECONDARY SHININESS SIZE_TOK SPECULAR SPOT STATE
+%token TEXCOORD TEXENV TEXGEN TEXGEN_Q TEXGEN_R TEXGEN_S TEXGEN_T TEXTURE TRANSPOSE
+%token TEXTURE_UNIT TEX_1D TEX_2D TEX_3D TEX_CUBE TEX_RECT
+%token TEX_SHADOW1D TEX_SHADOW2D TEX_SHADOWRECT
+%token TEX_ARRAY1D TEX_ARRAY2D TEX_ARRAYSHADOW1D TEX_ARRAYSHADOW2D 
+%token VERTEX VTXATTRIB
+%token WEIGHT
+
+%token <string> IDENTIFIER USED_IDENTIFIER
+%type <string> string
+%token <swiz_mask> MASK4 MASK3 MASK2 MASK1 SWIZZLE
+%token DOT_DOT
+%token DOT
+
+%type <inst> instruction ALU_instruction TexInstruction
+%type <inst> ARL_instruction VECTORop_instruction
+%type <inst> SCALARop_instruction BINSCop_instruction BINop_instruction
+%type <inst> TRIop_instruction TXD_instruction SWZ_instruction SAMPLE_instruction
+%type <inst> KIL_instruction
+
+%type <dst_reg> dstReg maskedDstReg maskedAddrReg
+%type <src_reg> srcReg scalarUse scalarSrcReg swizzleSrcReg
+%type <swiz_mask> scalarSuffix swizzleSuffix extendedSwizzle
+%type <ext_swizzle> extSwizComp extSwizSel
+%type <swiz_mask> optionalMask
+
+%type <sym> progParamArray
+%type <integer> addrRegRelOffset addrRegPosOffset addrRegNegOffset
+%type <src_reg> progParamArrayMem progParamArrayAbs progParamArrayRel
+%type <sym> addrReg
+%type <swiz_mask> addrComponent addrWriteMask
+
+%type <dst_reg> ccMaskRule ccTest ccMaskRule2 ccTest2 optionalCcMask
+
+%type <result> resultBinding resultColBinding
+%type <integer> optFaceType optColorType
+%type <integer> optResultFaceType optResultColorType
+
+%type <integer> optTexImageUnitNum texImageUnitNum
+%type <integer> optTexCoordUnitNum texCoordUnitNum
+%type <integer> optLegacyTexUnitNum legacyTexUnitNum
+%type <integer> texImageUnit texTarget
+%type <integer> vtxAttribNum
+
+%type <attrib> attribBinding vtxAttribItem fragAttribItem
+
+%type <temp_sym> paramSingleInit paramSingleItemDecl
+%type <integer> optArraySize
+
+%type <state> stateSingleItem stateMultipleItem
+%type <state> stateMaterialItem
+%type <state> stateLightItem stateLightModelItem stateLightProdItem
+%type <state> stateTexGenItem stateFogItem stateClipPlaneItem statePointItem
+%type <state> stateMatrixItem stateMatrixRow stateMatrixRows
+%type <state> stateTexEnvItem stateDepthItem
+
+%type <state> stateLModProperty
+%type <state> stateMatrixName optMatrixRows
+
+%type <integer> stateMatProperty
+%type <integer> stateLightProperty stateSpotProperty
+%type <integer> stateLightNumber stateLProdProperty
+%type <integer> stateTexGenType stateTexGenCoord
+%type <integer> stateTexEnvProperty
+%type <integer> stateFogProperty
+%type <integer> stateClipPlaneNum
+%type <integer> statePointProperty
+
+%type <integer> stateOptMatModifier stateMatModifier stateMatrixRowNum
+%type <integer> stateOptModMatNum stateModMatNum statePaletteMatNum 
+%type <integer> stateProgramMatNum
+
+%type <integer> ambDiffSpecProperty
+
+%type <state> programSingleItem progEnvParam progLocalParam
+%type <state> programMultipleItem progEnvParams progLocalParams
+
+%type <temp_sym> paramMultipleInit paramMultInitList paramMultipleItem
+%type <temp_sym> paramSingleItemUse
+
+%type <integer> progEnvParamNum progLocalParamNum
+%type <state> progEnvParamNums progLocalParamNums
+
+%type <vector> paramConstDecl paramConstUse
+%type <vector> paramConstScalarDecl paramConstScalarUse paramConstVector
+%type <real> signedFloatConstant
+%type <negate> optionalSign
+
+%{
+extern int
+_mesa_program_lexer_lex(YYSTYPE *yylval_param, YYLTYPE *yylloc_param,
+                        void *yyscanner);
+
+static int
+yylex(YYSTYPE *yylval_param, YYLTYPE *yylloc_param,
+      struct asm_parser_state *state)
+{
+   return _mesa_program_lexer_lex(yylval_param, yylloc_param, state->scanner);
+}
+%}
+
+%%
+
+program: language optionSequence statementSequence END
+	;
+
+language: ARBvp_10
+	{
+	   if (state->prog->Target != GL_VERTEX_PROGRAM_ARB) {
+	      yyerror(& @1, state, "invalid fragment program header");
+
+	   }
+	   state->mode = ARB_vertex;
+	}
+	| ARBfp_10
+	{
+	   if (state->prog->Target != GL_FRAGMENT_PROGRAM_ARB) {
+	      yyerror(& @1, state, "invalid vertex program header");
+	   }
+	   state->mode = ARB_fragment;
+
+	   state->option.TexRect =
+	      (state->ctx->Extensions.NV_texture_rectangle != GL_FALSE);
+	}
+	;
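+
+/* For illustration: the lexer maps the literal headers "!!ARBvp1.0" and
+ * "!!ARBfp1.0" to the ARBvp_10 / ARBfp_10 tokens, so a source whose header
+ * does not match the glProgramStringARB() target (e.g. a "!!ARBvp1.0"
+ * string handed to GL_FRAGMENT_PROGRAM_ARB) is rejected here.
+ */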
+
+optionSequence: optionSequence option
+	|
+	;
+
+option: OPTION string ';'
+	{
+	   int valid = 0;
+
+	   if (state->mode == ARB_vertex) {
+	      valid = _mesa_ARBvp_parse_option(state, $2);
+	   } else if (state->mode == ARB_fragment) {
+	      valid = _mesa_ARBfp_parse_option(state, $2);
+	   }
+
+	   free($2);
+
+	   if (!valid) {
+	      const char *const err_str = (state->mode == ARB_vertex)
+		 ? "invalid ARB vertex program option"
+		 : "invalid ARB fragment program option";
+
+	      yyerror(& @2, state, err_str);
+	      YYERROR;
+	   }
+	}
+	;
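+
+/* Typical option lines accepted by this rule:
+ *
+ *    OPTION ARB_position_invariant;        (vertex programs)
+ *    OPTION ARB_precision_hint_fastest;    (fragment programs)
+ *
+ * The option names themselves are validated by _mesa_ARBvp_parse_option()
+ * and _mesa_ARBfp_parse_option(), not by the grammar.
+ */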
+
+statementSequence: statementSequence statement
+	|
+	;
+
+statement: instruction ';'
+	{
+	   if ($1 != NULL) {
+	      if (state->inst_tail == NULL) {
+		 state->inst_head = $1;
+	      } else {
+		 state->inst_tail->next = $1;
+	      }
+
+	      state->inst_tail = $1;
+	      $1->next = NULL;
+
+	      state->prog->NumInstructions++;
+	   }
+	}
+	| namingStatement ';'
+	;
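+
+/* Instructions are chained onto a singly linked list through
+ * state->inst_head / state->inst_tail; after a successful parse,
+ * _mesa_parse_arb_program() walks the list once to flatten it into the
+ * prog->Instructions array and then frees the list nodes.
+ */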
+
+instruction: ALU_instruction
+	{
+	   $$ = $1;
+	   state->prog->NumAluInstructions++;
+	}
+	| TexInstruction
+	{
+	   $$ = $1;
+	   state->prog->NumTexInstructions++;
+	}
+	;
+
+ALU_instruction: ARL_instruction
+	| VECTORop_instruction
+	| SCALARop_instruction
+	| BINSCop_instruction
+	| BINop_instruction
+	| TRIop_instruction
+	| SWZ_instruction
+	;
+
+TexInstruction: SAMPLE_instruction
+	| KIL_instruction
+	| TXD_instruction
+	;
+
+ARL_instruction: ARL maskedAddrReg ',' scalarSrcReg
+	{
+	   $$ = asm_instruction_ctor(OPCODE_ARL, & $2, & $4, NULL, NULL);
+	}
+	;
+
+VECTORop_instruction: VECTOR_OP maskedDstReg ',' swizzleSrcReg
+	{
+	   if ($1.Opcode == OPCODE_DDY)
+	      state->fragment.UsesDFdy = 1;
+	   $$ = asm_instruction_copy_ctor(& $1, & $2, & $4, NULL, NULL);
+	}
+	;
+
+SCALARop_instruction: SCALAR_OP maskedDstReg ',' scalarSrcReg
+	{
+	   $$ = asm_instruction_copy_ctor(& $1, & $2, & $4, NULL, NULL);
+	}
+	;
+
+BINSCop_instruction: BINSC_OP maskedDstReg ',' scalarSrcReg ',' scalarSrcReg
+	{
+	   $$ = asm_instruction_copy_ctor(& $1, & $2, & $4, & $6, NULL);
+	}
+	;
+
+
+BINop_instruction: BIN_OP maskedDstReg ',' swizzleSrcReg ',' swizzleSrcReg
+	{
+	   $$ = asm_instruction_copy_ctor(& $1, & $2, & $4, & $6, NULL);
+	}
+	;
+
+TRIop_instruction: TRI_OP maskedDstReg ','
+                   swizzleSrcReg ',' swizzleSrcReg ',' swizzleSrcReg
+	{
+	   $$ = asm_instruction_copy_ctor(& $1, & $2, & $4, & $6, & $8);
+	}
+	;
+
+SAMPLE_instruction: SAMPLE_OP maskedDstReg ',' swizzleSrcReg ',' texImageUnit ',' texTarget
+	{
+	   $$ = asm_instruction_copy_ctor(& $1, & $2, & $4, NULL, NULL);
+	   if ($$ != NULL) {
+	      const GLbitfield tex_mask = (1U << $6);
+	      GLbitfield shadow_tex = 0;
+	      GLbitfield target_mask = 0;
+
+	      $$->Base.TexSrcUnit = $6;
+
+	      if ($8 < 0) {
+		 shadow_tex = tex_mask;
+
+		 $$->Base.TexSrcTarget = -$8;
+		 $$->Base.TexShadow = 1;
+	      } else {
+		 $$->Base.TexSrcTarget = $8;
+	      }
+
+	      target_mask = (1U << $$->Base.TexSrcTarget);
+
+	      /* If this texture unit was previously accessed and that access
+	       * had a different texture target, generate an error.
+	       *
+	       * If this texture unit was previously accessed and that access
+	       * had a different shadow mode, generate an error.
+	       */
+	      if ((state->prog->TexturesUsed[$6] != 0)
+		  && ((state->prog->TexturesUsed[$6] != target_mask)
+		      || ((state->prog->ShadowSamplers & tex_mask)
+			  != shadow_tex))) {
+		 yyerror(& @8, state,
+			 "multiple targets used on one texture image unit");
+		 YYERROR;
+	      }
+
+	      state->prog->TexturesUsed[$6] |= target_mask;
+	      state->prog->ShadowSamplers |= shadow_tex;
+	   }
+	}
+	;
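+
+/* The TexturesUsed/ShadowSamplers bookkeeping above enforces that each
+ * texture image unit is sampled with exactly one target and shadow mode.
+ * For example, a fragment program containing both
+ *
+ *    TEX r0, fragment.texcoord[0], texture[0], 2D;
+ *    TEX r1, fragment.texcoord[1], texture[0], CUBE;
+ *
+ * would fail to load with the error generated above.
+ */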
+
+KIL_instruction: KIL swizzleSrcReg
+	{
+	   $$ = asm_instruction_ctor(OPCODE_KIL, NULL, & $2, NULL, NULL);
+	   state->fragment.UsesKill = 1;
+	}
+	| KIL ccTest
+	{
+	   $$ = asm_instruction_ctor(OPCODE_KIL_NV, NULL, NULL, NULL, NULL);
+	   $$->Base.DstReg.CondMask = $2.CondMask;
+	   $$->Base.DstReg.CondSwizzle = $2.CondSwizzle;
+	   state->fragment.UsesKill = 1;
+	}
+	;
+
+TXD_instruction: TXD_OP maskedDstReg ',' swizzleSrcReg ',' swizzleSrcReg ',' swizzleSrcReg ',' texImageUnit ',' texTarget
+	{
+	   $$ = asm_instruction_copy_ctor(& $1, & $2, & $4, & $6, & $8);
+	   if ($$ != NULL) {
+	      const GLbitfield tex_mask = (1U << $10);
+	      GLbitfield shadow_tex = 0;
+	      GLbitfield target_mask = 0;
+
+	      $$->Base.TexSrcUnit = $10;
+
+	      if ($12 < 0) {
+		 shadow_tex = tex_mask;
+
+		 $$->Base.TexSrcTarget = -$12;
+		 $$->Base.TexShadow = 1;
+	      } else {
+		 $$->Base.TexSrcTarget = $12;
+	      }
+
+	      target_mask = (1U << $$->Base.TexSrcTarget);
+
+	      /* If this texture unit was previously accessed and that access
+	       * had a different texture target, generate an error.
+	       *
+	       * If this texture unit was previously accessed and that access
+	       * had a different shadow mode, generate an error.
+	       */
+	      if ((state->prog->TexturesUsed[$10] != 0)
+		  && ((state->prog->TexturesUsed[$10] != target_mask)
+		      || ((state->prog->ShadowSamplers & tex_mask)
+			  != shadow_tex))) {
+		 yyerror(& @12, state,
+			 "multiple targets used on one texture image unit");
+		 YYERROR;
+	      }
+
+	      state->prog->TexturesUsed[$10] |= target_mask;
+	      state->prog->ShadowSamplers |= shadow_tex;
+	   }
+	}
+	;
+
+texImageUnit: TEXTURE_UNIT optTexImageUnitNum
+	{
+	   $$ = $2;
+	}
+	;
+
+texTarget: TEX_1D  { $$ = TEXTURE_1D_INDEX; }
+	| TEX_2D   { $$ = TEXTURE_2D_INDEX; }
+	| TEX_3D   { $$ = TEXTURE_3D_INDEX; }
+	| TEX_CUBE { $$ = TEXTURE_CUBE_INDEX; }
+	| TEX_RECT { $$ = TEXTURE_RECT_INDEX; }
+	| TEX_SHADOW1D   { $$ = -TEXTURE_1D_INDEX; }
+	| TEX_SHADOW2D   { $$ = -TEXTURE_2D_INDEX; }
+	| TEX_SHADOWRECT { $$ = -TEXTURE_RECT_INDEX; }
+	| TEX_ARRAY1D         { $$ = TEXTURE_1D_ARRAY_INDEX; }
+	| TEX_ARRAY2D         { $$ = TEXTURE_2D_ARRAY_INDEX; }
+	| TEX_ARRAYSHADOW1D   { $$ = -TEXTURE_1D_ARRAY_INDEX; }
+	| TEX_ARRAYSHADOW2D   { $$ = -TEXTURE_2D_ARRAY_INDEX; }
+	;
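+
+/* Shadow (and array-shadow) targets are encoded as negated indices so one
+ * integer can carry both the texture target and a shadow flag; the SAMPLE
+ * and TXD actions above un-negate the value and set Base.TexShadow.
+ */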
+
+SWZ_instruction: SWZ maskedDstReg ',' srcReg ',' extendedSwizzle
+	{
+	   /* FIXME: Is this correct?  Should the extendedSwizzle be applied
+	    * FIXME: to the existing swizzle?
+	    */
+	   $4.Base.Swizzle = $6.swizzle;
+	   $4.Base.Negate = $6.mask;
+
+	   $$ = asm_instruction_copy_ctor(& $1, & $2, & $4, NULL, NULL);
+	}
+	;
+
+scalarSrcReg: optionalSign scalarUse
+	{
+	   $$ = $2;
+
+	   if ($1) {
+	      $$.Base.Negate = ~$$.Base.Negate;
+	   }
+	}
+	| optionalSign '|' scalarUse '|'
+	{
+	   $$ = $3;
+
+	   if (!state->option.NV_fragment) {
+	      yyerror(& @2, state, "unexpected character '|'");
+	      YYERROR;
+	   }
+
+	   if ($1) {
+	      $$.Base.Negate = ~$$.Base.Negate;
+	   }
+
+	   $$.Base.Abs = 1;
+	}
+	;
+
+scalarUse:  srcReg scalarSuffix
+	{
+	   $$ = $1;
+
+	   $$.Base.Swizzle = _mesa_combine_swizzles($$.Base.Swizzle,
+						    $2.swizzle);
+	}
+	| paramConstScalarUse
+	{
+	   struct asm_symbol temp_sym;
+
+	   if (!state->option.NV_fragment) {
+	      yyerror(& @1, state, "expected scalar suffix");
+	      YYERROR;
+	   }
+
+	   memset(& temp_sym, 0, sizeof(temp_sym));
+	   temp_sym.param_binding_begin = ~0;
+	   initialize_symbol_from_const(state->prog, & temp_sym, & $1, GL_TRUE);
+
+	   set_src_reg_swz(& $$, PROGRAM_CONSTANT,
+                           temp_sym.param_binding_begin,
+                           temp_sym.param_binding_swizzle);
+	}
+	;
+
+swizzleSrcReg: optionalSign srcReg swizzleSuffix
+	{
+	   $$ = $2;
+
+	   if ($1) {
+	      $$.Base.Negate = ~$$.Base.Negate;
+	   }
+
+	   $$.Base.Swizzle = _mesa_combine_swizzles($$.Base.Swizzle,
+						    $3.swizzle);
+	}
+	| optionalSign '|' srcReg swizzleSuffix '|'
+	{
+	   $$ = $3;
+
+	   if (!state->option.NV_fragment) {
+	      yyerror(& @2, state, "unexpected character '|'");
+	      YYERROR;
+	   }
+
+	   if ($1) {
+	      $$.Base.Negate = ~$$.Base.Negate;
+	   }
+
+	   $$.Base.Abs = 1;
+	   $$.Base.Swizzle = _mesa_combine_swizzles($$.Base.Swizzle,
+						    $4.swizzle);
+	}
+	;
+
+maskedDstReg: dstReg optionalMask optionalCcMask
+	{
+	   $$ = $1;
+	   $$.WriteMask = $2.mask;
+	   $$.CondMask = $3.CondMask;
+	   $$.CondSwizzle = $3.CondSwizzle;
+
+	   if ($$.File == PROGRAM_OUTPUT) {
+	      /* Technically speaking, this should check that it is in
+	       * vertex program mode.  However, PositionInvariant can never be
+	       * set in fragment program mode, so it is somewhat irrelevant.
+	       */
+	      if (state->option.PositionInvariant
+	       && ($$.Index == VARYING_SLOT_POS)) {
+		 yyerror(& @1, state, "position-invariant programs cannot "
+			 "write position");
+		 YYERROR;
+	      }
+
+	      state->prog->OutputsWritten |= BITFIELD64_BIT($$.Index);
+	   }
+	}
+	;
+
+maskedAddrReg: addrReg addrWriteMask
+	{
+	   set_dst_reg(& $$, PROGRAM_ADDRESS, 0);
+	   $$.WriteMask = $2.mask;
+	}
+	;
+
+extendedSwizzle: extSwizComp ',' extSwizComp ',' extSwizComp ',' extSwizComp
+	{
+	   const unsigned xyzw_valid =
+	      ($1.xyzw_valid << 0)
+	      | ($3.xyzw_valid << 1)
+	      | ($5.xyzw_valid << 2)
+	      | ($7.xyzw_valid << 3);
+	   const unsigned rgba_valid =
+	      ($1.rgba_valid << 0)
+	      | ($3.rgba_valid << 1)
+	      | ($5.rgba_valid << 2)
+	      | ($7.rgba_valid << 3);
+
+	   /* All of the swizzle components have to be valid in either RGBA
+	    * or XYZW.  Note that 0 and 1 are valid in both, so both masks
+	    * can have some bits set.
+	    *
+	    * We somewhat deviate from the spec here.  It would be really hard
+	    * to figure out which component is the error, and there probably
+	    * isn't a lot of benefit.
+	    */
+	   if ((rgba_valid != 0x0f) && (xyzw_valid != 0x0f)) {
+	      yyerror(& @1, state, "cannot combine RGBA and XYZW swizzle "
+		      "components");
+	      YYERROR;
+	   }
+
+	   $$.swizzle = MAKE_SWIZZLE4($1.swz, $3.swz, $5.swz, $7.swz);
+	   $$.mask = ($1.negate) | ($3.negate << 1) | ($5.negate << 2)
+	      | ($7.negate << 3);
+	}
+	;
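+
+/* extendedSwizzle feeds the SWZ instruction, where each of the four
+ * components may be 0, 1, or a possibly negated source component, e.g.
+ *
+ *    SWZ result.color, r0, 1, x, y, -z;
+ *
+ * The rgba_valid/xyzw_valid bits accumulated above reject a mix such as
+ * "x, y, b, a", which names components from both coordinate systems.
+ */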
+
+extSwizComp: optionalSign extSwizSel
+	{
+	   $$ = $2;
+	   $$.negate = ($1) ? 1 : 0;
+	}
+	;
+
+extSwizSel: INTEGER
+	{
+	   if (($1 != 0) && ($1 != 1)) {
+	      yyerror(& @1, state, "invalid extended swizzle selector");
+	      YYERROR;
+	   }
+
+	   $$.swz = ($1 == 0) ? SWIZZLE_ZERO : SWIZZLE_ONE;
+           $$.negate = 0;
+
+	   /* 0 and 1 are valid for both RGBA swizzle names and XYZW
+	    * swizzle names.
+	    */
+	   $$.xyzw_valid = 1;
+	   $$.rgba_valid = 1;
+	}
+	| string
+	{
+	   char s;
+
+	   if (strlen($1) > 1) {
+	      yyerror(& @1, state, "invalid extended swizzle selector");
+	      YYERROR;
+	   }
+
+	   s = $1[0];
+	   free($1);
+
+           $$.rgba_valid = 0;
+           $$.xyzw_valid = 0;
+           $$.negate = 0;
+
+	   switch (s) {
+	   case 'x':
+	      $$.swz = SWIZZLE_X;
+	      $$.xyzw_valid = 1;
+	      break;
+	   case 'y':
+	      $$.swz = SWIZZLE_Y;
+	      $$.xyzw_valid = 1;
+	      break;
+	   case 'z':
+	      $$.swz = SWIZZLE_Z;
+	      $$.xyzw_valid = 1;
+	      break;
+	   case 'w':
+	      $$.swz = SWIZZLE_W;
+	      $$.xyzw_valid = 1;
+	      break;
+
+	   case 'r':
+	      $$.swz = SWIZZLE_X;
+	      $$.rgba_valid = 1;
+	      break;
+	   case 'g':
+	      $$.swz = SWIZZLE_Y;
+	      $$.rgba_valid = 1;
+	      break;
+	   case 'b':
+	      $$.swz = SWIZZLE_Z;
+	      $$.rgba_valid = 1;
+	      break;
+	   case 'a':
+	      $$.swz = SWIZZLE_W;
+	      $$.rgba_valid = 1;
+	      break;
+
+	   default:
+	      yyerror(& @1, state, "invalid extended swizzle selector");
+	      YYERROR;
+	      break;
+	   }
+	}
+	;
+
+srcReg: USED_IDENTIFIER /* temporaryReg | progParamSingle */
+	{
+	   struct asm_symbol *const s = (struct asm_symbol *)
+	      _mesa_symbol_table_find_symbol(state->st, 0, $1);
+
+	   free($1);
+
+	   if (s == NULL) {
+	      yyerror(& @1, state, "invalid operand variable");
+	      YYERROR;
+	   } else if ((s->type != at_param) && (s->type != at_temp)
+		      && (s->type != at_attrib)) {
+	      yyerror(& @1, state, "invalid operand variable");
+	      YYERROR;
+	   } else if ((s->type == at_param) && s->param_is_array) {
+	      yyerror(& @1, state, "non-array access to array PARAM");
+	      YYERROR;
+	   }
+
+	   init_src_reg(& $$);
+	   switch (s->type) {
+	   case at_temp:
+	      set_src_reg(& $$, PROGRAM_TEMPORARY, s->temp_binding);
+	      break;
+	   case at_param:
+              set_src_reg_swz(& $$, s->param_binding_type,
+                              s->param_binding_begin,
+                              s->param_binding_swizzle);
+	      break;
+	   case at_attrib:
+	      set_src_reg(& $$, PROGRAM_INPUT, s->attrib_binding);
+	      state->prog->InputsRead |= BITFIELD64_BIT($$.Base.Index);
+
+	      if (!validate_inputs(& @1, state)) {
+		 YYERROR;
+	      }
+	      break;
+
+	   default:
+	      YYERROR;
+	      break;
+	   }
+	}
+	| attribBinding
+	{
+	   set_src_reg(& $$, PROGRAM_INPUT, $1);
+	   state->prog->InputsRead |= BITFIELD64_BIT($$.Base.Index);
+
+	   if (!validate_inputs(& @1, state)) {
+	      YYERROR;
+	   }
+	}
+	| progParamArray '[' progParamArrayMem ']'
+	{
+	   if (! $3.Base.RelAddr
+	       && ((unsigned) $3.Base.Index >= $1->param_binding_length)) {
+	      yyerror(& @3, state, "out of bounds array access");
+	      YYERROR;
+	   }
+
+	   init_src_reg(& $$);
+	   $$.Base.File = $1->param_binding_type;
+
+	   if ($3.Base.RelAddr) {
+              state->prog->IndirectRegisterFiles |= (1 << $$.Base.File);
+	      $1->param_accessed_indirectly = 1;
+
+	      $$.Base.RelAddr = 1;
+	      $$.Base.Index = $3.Base.Index;
+	      $$.Symbol = $1;
+	   } else {
+	      $$.Base.Index = $1->param_binding_begin + $3.Base.Index;
+	   }
+	}
+	| paramSingleItemUse
+	{
+           gl_register_file file = ($1.name != NULL) 
+	      ? $1.param_binding_type
+	      : PROGRAM_CONSTANT;
+           set_src_reg_swz(& $$, file, $1.param_binding_begin,
+                           $1.param_binding_swizzle);
+	}
+	;
+
+dstReg: resultBinding
+	{
+	   set_dst_reg(& $$, PROGRAM_OUTPUT, $1);
+	}
+	| USED_IDENTIFIER /* temporaryReg | vertexResultReg */
+	{
+	   struct asm_symbol *const s = (struct asm_symbol *)
+	      _mesa_symbol_table_find_symbol(state->st, 0, $1);
+
+	   free($1);
+
+	   if (s == NULL) {
+	      yyerror(& @1, state, "invalid operand variable");
+	      YYERROR;
+	   } else if ((s->type != at_output) && (s->type != at_temp)) {
+	      yyerror(& @1, state, "invalid operand variable");
+	      YYERROR;
+	   }
+
+	   switch (s->type) {
+	   case at_temp:
+	      set_dst_reg(& $$, PROGRAM_TEMPORARY, s->temp_binding);
+	      break;
+	   case at_output:
+	      set_dst_reg(& $$, PROGRAM_OUTPUT, s->output_binding);
+	      break;
+	   default:
+	      set_dst_reg(& $$, s->param_binding_type, s->param_binding_begin);
+	      break;
+	   }
+	}
+	;
+
+progParamArray: USED_IDENTIFIER
+	{
+	   struct asm_symbol *const s = (struct asm_symbol *)
+	      _mesa_symbol_table_find_symbol(state->st, 0, $1);
+
+	   free($1);
+
+	   if (s == NULL) {
+	      yyerror(& @1, state, "invalid operand variable");
+	      YYERROR;
+	   } else if ((s->type != at_param) || !s->param_is_array) {
+	      yyerror(& @1, state, "array access to non-PARAM variable");
+	      YYERROR;
+	   } else {
+	      $$ = s;
+	   }
+	}
+	;
+
+progParamArrayMem: progParamArrayAbs | progParamArrayRel;
+
+progParamArrayAbs: INTEGER
+	{
+	   init_src_reg(& $$);
+	   $$.Base.Index = $1;
+	}
+	;
+
+progParamArrayRel: addrReg addrComponent addrRegRelOffset
+	{
+	   /* FINISHME: Add support for multiple address registers.
+	    */
+	   /* FINISHME: Add support for 4-component address registers.
+	    */
+	   init_src_reg(& $$);
+	   $$.Base.RelAddr = 1;
+	   $$.Base.Index = $3;
+	}
+	;
+
+addrRegRelOffset:              { $$ = 0; }
+	| '+' addrRegPosOffset { $$ = $2; }
+	| '-' addrRegNegOffset { $$ = -$2; }
+	;
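+
+/* A sketch of the relative-addressing form these rules accept, assuming a
+ * PARAM array "table" and an ADDRESS variable "A0" were declared earlier:
+ *
+ *    ARL A0.x, vertex.weight.x;
+ *    MOV result.color, table[A0.x + 3];
+ *
+ * The +/- offsets are range-checked against MaxAddressOffset just below.
+ */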
+
+addrRegPosOffset: INTEGER
+	{
+	   if (($1 < 0) || ($1 > (state->limits->MaxAddressOffset - 1))) {
+              char s[100];
+              _mesa_snprintf(s, sizeof(s),
+                             "relative address offset too large (%d)", $1);
+	      yyerror(& @1, state, s);
+	      YYERROR;
+	   } else {
+	      $$ = $1;
+	   }
+	}
+	;
+
+addrRegNegOffset: INTEGER
+	{
+	   if (($1 < 0) || ($1 > state->limits->MaxAddressOffset)) {
+              char s[100];
+              _mesa_snprintf(s, sizeof(s),
+                             "relative address offset too large (%d)", $1);
+	      yyerror(& @1, state, s);
+	      YYERROR;
+	   } else {
+	      $$ = $1;
+	   }
+	}
+	;
+
+addrReg: USED_IDENTIFIER
+	{
+	   struct asm_symbol *const s = (struct asm_symbol *)
+	      _mesa_symbol_table_find_symbol(state->st, 0, $1);
+
+	   free($1);
+
+	   if (s == NULL) {
+	      yyerror(& @1, state, "invalid array member");
+	      YYERROR;
+	   } else if (s->type != at_address) {
+	      yyerror(& @1, state,
+		      "invalid variable for indexed array access");
+	      YYERROR;
+	   } else {
+	      $$ = s;
+	   }
+	}
+	;
+
+addrComponent: MASK1
+	{
+	   if ($1.mask != WRITEMASK_X) {
+	      yyerror(& @1, state, "invalid address component selector");
+	      YYERROR;
+	   } else {
+	      $$ = $1;
+	   }
+	}
+	;
+
+addrWriteMask: MASK1
+	{
+	   if ($1.mask != WRITEMASK_X) {
+	      yyerror(& @1, state,
+		      "address register write mask must be \".x\"");
+	      YYERROR;
+	   } else {
+	      $$ = $1;
+	   }
+	}
+	;
+
+scalarSuffix: MASK1;
+
+swizzleSuffix: MASK1
+	| MASK4
+	| SWIZZLE
+	|              { $$.swizzle = SWIZZLE_NOOP; $$.mask = WRITEMASK_XYZW; }
+	;
+
+optionalMask: MASK4 | MASK3 | MASK2 | MASK1 
+	|              { $$.swizzle = SWIZZLE_NOOP; $$.mask = WRITEMASK_XYZW; }
+	;
+
+optionalCcMask: '(' ccTest ')'
+	{
+	   $$ = $2;
+	}
+	| '(' ccTest2 ')'
+	{
+	   $$ = $2;
+	}
+	|
+	{
+	   $$.CondMask = COND_TR;
+	   $$.CondSwizzle = SWIZZLE_NOOP;
+	}
+	;
+
+ccTest: ccMaskRule swizzleSuffix
+	{
+	   $$ = $1;
+	   $$.CondSwizzle = $2.swizzle;
+	}
+	;
+
+ccTest2: ccMaskRule2 swizzleSuffix
+	{
+	   $$ = $1;
+	   $$.CondSwizzle = $2.swizzle;
+	}
+	;
+
+ccMaskRule: IDENTIFIER
+	{
+	   const int cond = _mesa_parse_cc($1);
+	   if ((cond == 0) || ($1[2] != '\0')) {
+	      char *const err_str =
+		 make_error_string("invalid condition code \"%s\"", $1);
+
+	      yyerror(& @1, state, (err_str != NULL)
+		      ? err_str : "invalid condition code");
+
+	      if (err_str != NULL) {
+		 free(err_str);
+	      }
+
+	      YYERROR;
+	   }
+
+	   $$.CondMask = cond;
+	   $$.CondSwizzle = SWIZZLE_NOOP;
+	}
+	;
+
+ccMaskRule2: USED_IDENTIFIER
+	{
+	   const int cond = _mesa_parse_cc($1);
+	   if ((cond == 0) || ($1[2] != '\0')) {
+	      char *const err_str =
+		 make_error_string("invalid condition code \"%s\"", $1);
+
+	      yyerror(& @1, state, (err_str != NULL)
+		      ? err_str : "invalid condition code");
+
+	      if (err_str != NULL) {
+		 free(err_str);
+	      }
+
+	      YYERROR;
+	   }
+
+	   $$.CondMask = cond;
+	   $$.CondSwizzle = SWIZZLE_NOOP;
+	}
+	;
+
+namingStatement: ATTRIB_statement
+	| PARAM_statement
+	| TEMP_statement
+	| ADDRESS_statement
+	| OUTPUT_statement
+	| ALIAS_statement
+	;
+
+ATTRIB_statement: ATTRIB IDENTIFIER '=' attribBinding
+	{
+	   struct asm_symbol *const s =
+	      declare_variable(state, $2, at_attrib, & @2);
+
+	   if (s == NULL) {
+	      free($2);
+	      YYERROR;
+	   } else {
+	      s->attrib_binding = $4;
+	      state->InputsBound |= BITFIELD64_BIT(s->attrib_binding);
+
+	      if (!validate_inputs(& @4, state)) {
+		 YYERROR;
+	      }
+	   }
+	}
+	;
+
+attribBinding: VERTEX vtxAttribItem
+	{
+	   $$ = $2;
+	}
+	| FRAGMENT fragAttribItem
+	{
+	   $$ = $2;
+	}
+	;
+
+vtxAttribItem: POSITION
+	{
+	   $$ = VERT_ATTRIB_POS;
+	}
+	| WEIGHT vtxOptWeightNum
+	{
+	   $$ = VERT_ATTRIB_WEIGHT;
+	}
+	| NORMAL
+	{
+	   $$ = VERT_ATTRIB_NORMAL;
+	}
+	| COLOR optColorType
+	{
+	   $$ = VERT_ATTRIB_COLOR0 + $2;
+	}
+	| FOGCOORD
+	{
+	   $$ = VERT_ATTRIB_FOG;
+	}
+	| TEXCOORD optTexCoordUnitNum
+	{
+	   $$ = VERT_ATTRIB_TEX0 + $2;
+	}
+	| MATRIXINDEX '[' vtxWeightNum ']'
+	{
+	   yyerror(& @1, state, "GL_ARB_matrix_palette not supported");
+	   YYERROR;
+	}
+	| VTXATTRIB '[' vtxAttribNum ']'
+	{
+	   $$ = VERT_ATTRIB_GENERIC0 + $3;
+	}
+	;
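+
+/* Examples of the vertex attribute bindings this rule resolves:
+ *
+ *    ATTRIB pos = vertex.position;       -> VERT_ATTRIB_POS
+ *    ATTRIB tc  = vertex.texcoord[2];    -> VERT_ATTRIB_TEX0 + 2
+ *    ATTRIB gen = vertex.attrib[7];      -> VERT_ATTRIB_GENERIC0 + 7
+ */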
+
+vtxAttribNum: INTEGER
+	{
+	   if ((unsigned) $1 >= state->limits->MaxAttribs) {
+	      yyerror(& @1, state, "invalid vertex attribute reference");
+	      YYERROR;
+	   }
+
+	   $$ = $1;
+	}
+	;
+
+vtxOptWeightNum:  | '[' vtxWeightNum ']';
+vtxWeightNum: INTEGER;
+
+fragAttribItem: POSITION
+	{
+	   $$ = VARYING_SLOT_POS;
+	}
+	| COLOR optColorType
+	{
+	   $$ = VARYING_SLOT_COL0 + $2;
+	}
+	| FOGCOORD
+	{
+	   $$ = VARYING_SLOT_FOGC;
+	}
+	| TEXCOORD optTexCoordUnitNum
+	{
+	   $$ = VARYING_SLOT_TEX0 + $2;
+	}
+	;
+
+PARAM_statement: PARAM_singleStmt | PARAM_multipleStmt;
+
+PARAM_singleStmt: PARAM IDENTIFIER paramSingleInit
+	{
+	   struct asm_symbol *const s =
+	      declare_variable(state, $2, at_param, & @2);
+
+	   if (s == NULL) {
+	      free($2);
+	      YYERROR;
+	   } else {
+	      s->param_binding_type = $3.param_binding_type;
+	      s->param_binding_begin = $3.param_binding_begin;
+	      s->param_binding_length = $3.param_binding_length;
+              s->param_binding_swizzle = $3.param_binding_swizzle;
+	      s->param_is_array = 0;
+	   }
+	}
+	;
+
+PARAM_multipleStmt: PARAM IDENTIFIER '[' optArraySize ']' paramMultipleInit
+	{
+	   if (($4 != 0) && ((unsigned) $4 != $6.param_binding_length)) {
+	      free($2);
+	      yyerror(& @4, state, 
+		      "parameter array size and number of bindings must match");
+	      YYERROR;
+	   } else {
+	      struct asm_symbol *const s =
+		 declare_variable(state, $2, $6.type, & @2);
+
+	      if (s == NULL) {
+		 free($2);
+		 YYERROR;
+	      } else {
+		 s->param_binding_type = $6.param_binding_type;
+		 s->param_binding_begin = $6.param_binding_begin;
+		 s->param_binding_length = $6.param_binding_length;
+                 s->param_binding_swizzle = SWIZZLE_XYZW;
+		 s->param_is_array = 1;
+	      }
+	   }
+	}
+	;
+
+optArraySize:
+	{
+	   $$ = 0;
+	}
+	| INTEGER
+        {
+	   if (($1 < 1) || ((unsigned) $1 > state->limits->MaxParameters)) {
+              char msg[100];
+              _mesa_snprintf(msg, sizeof(msg),
+                             "invalid parameter array size (size=%d max=%u)",
+                             $1, state->limits->MaxParameters);
+	      yyerror(& @1, state, msg);
+	      YYERROR;
+	   } else {
+	      $$ = $1;
+	   }
+	}
+	;
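+
+/* A declared size must match the number of bindings (checked above), e.g.
+ *
+ *    PARAM mvp[4]  = { state.matrix.mvp };
+ *    PARAM misc[3] = { program.env[0..1], { 0, 0, 0, 1 } };
+ *
+ * while an empty size ("PARAM arr[] = ...", $$ = 0) lets the initializer
+ * determine the array length.
+ */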
+
+paramSingleInit: '=' paramSingleItemDecl
+	{
+	   $$ = $2;
+	}
+	;
+
+paramMultipleInit: '=' '{' paramMultInitList '}'
+	{
+	   $$ = $3;
+	}
+	;
+
+paramMultInitList: paramMultipleItem
+	| paramMultInitList ',' paramMultipleItem
+	{
+	   $1.param_binding_length += $3.param_binding_length;
+	   $$ = $1;
+	}
+	;
+
+paramSingleItemDecl: stateSingleItem
+	{
+	   memset(& $$, 0, sizeof($$));
+	   $$.param_binding_begin = ~0;
+	   initialize_symbol_from_state(state->prog, & $$, $1);
+	}
+	| programSingleItem
+	{
+	   memset(& $$, 0, sizeof($$));
+	   $$.param_binding_begin = ~0;
+	   initialize_symbol_from_param(state->prog, & $$, $1);
+	}
+	| paramConstDecl
+	{
+	   memset(& $$, 0, sizeof($$));
+	   $$.param_binding_begin = ~0;
+	   initialize_symbol_from_const(state->prog, & $$, & $1, GL_TRUE);
+	}
+	;
+
+paramSingleItemUse: stateSingleItem
+	{
+	   memset(& $$, 0, sizeof($$));
+	   $$.param_binding_begin = ~0;
+	   initialize_symbol_from_state(state->prog, & $$, $1);
+	}
+	| programSingleItem
+	{
+	   memset(& $$, 0, sizeof($$));
+	   $$.param_binding_begin = ~0;
+	   initialize_symbol_from_param(state->prog, & $$, $1);
+	}
+	| paramConstUse
+	{
+	   memset(& $$, 0, sizeof($$));
+	   $$.param_binding_begin = ~0;
+	   initialize_symbol_from_const(state->prog, & $$, & $1, GL_TRUE);
+	}
+	;
+
+paramMultipleItem: stateMultipleItem
+	{
+	   memset(& $$, 0, sizeof($$));
+	   $$.param_binding_begin = ~0;
+	   initialize_symbol_from_state(state->prog, & $$, $1);
+	}
+	| programMultipleItem
+	{
+	   memset(& $$, 0, sizeof($$));
+	   $$.param_binding_begin = ~0;
+	   initialize_symbol_from_param(state->prog, & $$, $1);
+	}
+	| paramConstDecl
+	{
+	   memset(& $$, 0, sizeof($$));
+	   $$.param_binding_begin = ~0;
+	   initialize_symbol_from_const(state->prog, & $$, & $1, GL_FALSE);
+	}
+	;
+
+stateMultipleItem: stateSingleItem        { memcpy($$, $1, sizeof($$)); }
+	| STATE stateMatrixRows           { memcpy($$, $2, sizeof($$)); }
+	;
+
+stateSingleItem: STATE stateMaterialItem  { memcpy($$, $2, sizeof($$)); }
+	| STATE stateLightItem            { memcpy($$, $2, sizeof($$)); }
+	| STATE stateLightModelItem       { memcpy($$, $2, sizeof($$)); }
+	| STATE stateLightProdItem        { memcpy($$, $2, sizeof($$)); }
+	| STATE stateTexGenItem           { memcpy($$, $2, sizeof($$)); }
+	| STATE stateTexEnvItem           { memcpy($$, $2, sizeof($$)); }
+	| STATE stateFogItem              { memcpy($$, $2, sizeof($$)); }
+	| STATE stateClipPlaneItem        { memcpy($$, $2, sizeof($$)); }
+	| STATE statePointItem            { memcpy($$, $2, sizeof($$)); }
+	| STATE stateMatrixRow            { memcpy($$, $2, sizeof($$)); }
+	| STATE stateDepthItem            { memcpy($$, $2, sizeof($$)); }
+	;
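+
+/* Each state* rule fills a gl_state_index tokens[STATE_LENGTH] vector
+ * ($$ has the <state> array type from the %union); the paramSingleItemDecl
+ * and related actions then pass it to initialize_symbol_from_state() to
+ * create the STATE_* parameter binding.
+ */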
+
+stateMaterialItem: MATERIAL optFaceType stateMatProperty
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = STATE_MATERIAL;
+	   $$[1] = $2;
+	   $$[2] = $3;
+	}
+	;
+
+stateMatProperty: ambDiffSpecProperty
+	{
+	   $$ = $1;
+	}
+	| EMISSION
+	{
+	   $$ = STATE_EMISSION;
+	}
+	| SHININESS
+	{
+	   $$ = STATE_SHININESS;
+	}
+	;
+
+stateLightItem: LIGHT '[' stateLightNumber ']' stateLightProperty
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = STATE_LIGHT;
+	   $$[1] = $3;
+	   $$[2] = $5;
+	}
+	;
+
+stateLightProperty: ambDiffSpecProperty
+	{
+	   $$ = $1;
+	}
+	| POSITION
+	{
+	   $$ = STATE_POSITION;
+	}
+	| ATTENUATION
+	{
+	   if (!state->ctx->Extensions.EXT_point_parameters) {
+	      yyerror(& @1, state, "GL_ARB_point_parameters not supported");
+	      YYERROR;
+	   }
+
+	   $$ = STATE_ATTENUATION;
+	}
+	| SPOT stateSpotProperty
+	{
+	   $$ = $2;
+	}
+	| HALF
+	{
+	   $$ = STATE_HALF_VECTOR;
+	}
+	;
+
+stateSpotProperty: DIRECTION
+	{
+	   $$ = STATE_SPOT_DIRECTION;
+	}
+	;
+
+stateLightModelItem: LIGHTMODEL stateLModProperty
+	{
+	   $$[0] = $2[0];
+	   $$[1] = $2[1];
+	}
+	;
+
+stateLModProperty: AMBIENT
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = STATE_LIGHTMODEL_AMBIENT;
+	}
+	| optFaceType SCENECOLOR
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = STATE_LIGHTMODEL_SCENECOLOR;
+	   $$[1] = $1;
+	}
+	;
+
+stateLightProdItem: LIGHTPROD '[' stateLightNumber ']' optFaceType stateLProdProperty
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = STATE_LIGHTPROD;
+	   $$[1] = $3;
+	   $$[2] = $5;
+	   $$[3] = $6;
+	}
+	;
+
+stateLProdProperty: ambDiffSpecProperty;
+
+stateTexEnvItem: TEXENV optLegacyTexUnitNum stateTexEnvProperty
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = $3;
+	   $$[1] = $2;
+	}
+	;
+
+stateTexEnvProperty: COLOR
+	{
+	   $$ = STATE_TEXENV_COLOR;
+	}
+	;
+
+ambDiffSpecProperty: AMBIENT
+	{
+	   $$ = STATE_AMBIENT;
+	}
+	| DIFFUSE
+	{
+	   $$ = STATE_DIFFUSE;
+	}
+	| SPECULAR
+	{
+	   $$ = STATE_SPECULAR;
+	}
+	;
+
+stateLightNumber: INTEGER
+	{
+	   if ((unsigned) $1 >= state->MaxLights) {
+	      yyerror(& @1, state, "invalid light selector");
+	      YYERROR;
+	   }
+
+	   $$ = $1;
+	}
+	;
+
+stateTexGenItem: TEXGEN optTexCoordUnitNum stateTexGenType stateTexGenCoord
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = STATE_TEXGEN;
+	   $$[1] = $2;
+	   $$[2] = $3 + $4;
+	}
+	;
+
+stateTexGenType: EYE
+	{
+	   $$ = STATE_TEXGEN_EYE_S;
+	}
+	| OBJECT
+	{
+	   $$ = STATE_TEXGEN_OBJECT_S;
+	}
+	;
+stateTexGenCoord: TEXGEN_S
+	{
+	   $$ = STATE_TEXGEN_EYE_S - STATE_TEXGEN_EYE_S;
+	}
+	| TEXGEN_T
+	{
+	   $$ = STATE_TEXGEN_EYE_T - STATE_TEXGEN_EYE_S;
+	}
+	| TEXGEN_R
+	{
+	   $$ = STATE_TEXGEN_EYE_R - STATE_TEXGEN_EYE_S;
+	}
+	| TEXGEN_Q
+	{
+	   $$ = STATE_TEXGEN_EYE_Q - STATE_TEXGEN_EYE_S;
+	}
+	;
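+
+/* The coordinate is expressed as an offset from the matching _S value so
+ * the "$3 + $4" addition in stateTexGenItem selects the right enum, e.g.
+ * STATE_TEXGEN_OBJECT_S + (STATE_TEXGEN_EYE_T - STATE_TEXGEN_EYE_S) ==
+ * STATE_TEXGEN_OBJECT_T.  This presumes the EYE and OBJECT S/T/R/Q enums
+ * are laid out contiguously in the same order.
+ */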
+
+stateFogItem: FOG stateFogProperty
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = $2;
+	}
+	;
+
+stateFogProperty: COLOR
+	{
+	   $$ = STATE_FOG_COLOR;
+	}
+	| PARAMS
+	{
+	   $$ = STATE_FOG_PARAMS;
+	}
+	;
+
+stateClipPlaneItem: CLIP '[' stateClipPlaneNum ']' PLANE
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = STATE_CLIPPLANE;
+	   $$[1] = $3;
+	}
+	;
+
+stateClipPlaneNum: INTEGER
+	{
+	   if ((unsigned) $1 >= state->MaxClipPlanes) {
+	      yyerror(& @1, state, "invalid clip plane selector");
+	      YYERROR;
+	   }
+
+	   $$ = $1;
+	}
+	;
+
+statePointItem: POINT_TOK statePointProperty
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = $2;
+	}
+	;
+
+statePointProperty: SIZE_TOK
+	{
+	   $$ = STATE_POINT_SIZE;
+	}
+	| ATTENUATION
+	{
+	   $$ = STATE_POINT_ATTENUATION;
+	}
+	;
+
+stateMatrixRow: stateMatrixItem ROW '[' stateMatrixRowNum ']'
+	{
+	   $$[0] = $1[0];
+	   $$[1] = $1[1];
+	   $$[2] = $4;
+	   $$[3] = $4;
+	   $$[4] = $1[2];
+	}
+	;
+
+stateMatrixRows: stateMatrixItem optMatrixRows
+	{
+	   $$[0] = $1[0];
+	   $$[1] = $1[1];
+	   $$[2] = $2[2];
+	   $$[3] = $2[3];
+	   $$[4] = $1[2];
+	}
+	;
+
+optMatrixRows:
+	{
+	   $$[2] = 0;
+	   $$[3] = 3;
+	}
+	| ROW '[' stateMatrixRowNum DOT_DOT stateMatrixRowNum ']'
+	{
+	   /* It seems logical that the matrix row range specifier would have
+	    * to specify a range of more than one row (i.e., $5 > $3).
+	    * However, the ARB_vertex_program spec says "a program will fail
+	    * to load if <a> is greater than <b>."  This means that $3 == $5
+	    * is valid.
+	    */
+	   if ($3 > $5) {
+	      yyerror(& @3, state, "invalid matrix row range");
+	      YYERROR;
+	   }
+
+	   $$[2] = $3;
+	   $$[3] = $5;
+	}
+	;
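+
+/* Examples: "state.matrix.mvp" with no ROW clause binds rows 0..3 (the
+ * defaults filled in above), "state.matrix.modelview.row[1..2]" binds rows
+ * 1 and 2, and "row[2..2]" is legal per the spec text quoted above.
+ */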
+
+stateMatrixItem: MATRIX stateMatrixName stateOptMatModifier
+	{
+	   $$[0] = $2[0];
+	   $$[1] = $2[1];
+	   $$[2] = $3;
+	}
+	;
+
+stateOptMatModifier: 
+	{
+	   $$ = 0;
+	}
+	| stateMatModifier
+	{
+	   $$ = $1;
+	}
+	;
+
+stateMatModifier: INVERSE 
+	{
+	   $$ = STATE_MATRIX_INVERSE;
+	}
+	| TRANSPOSE 
+	{
+	   $$ = STATE_MATRIX_TRANSPOSE;
+	}
+	| INVTRANS
+	{
+	   $$ = STATE_MATRIX_INVTRANS;
+	}
+	;
+
+stateMatrixRowNum: INTEGER
+	{
+	   if ($1 > 3) {
+	      yyerror(& @1, state, "invalid matrix row reference");
+	      YYERROR;
+	   }
+
+	   $$ = $1;
+	}
+	;
+
+stateMatrixName: MODELVIEW stateOptModMatNum
+	{
+	   $$[0] = STATE_MODELVIEW_MATRIX;
+	   $$[1] = $2;
+	}
+	| PROJECTION
+	{
+	   $$[0] = STATE_PROJECTION_MATRIX;
+	   $$[1] = 0;
+	}
+	| MVP
+	{
+	   $$[0] = STATE_MVP_MATRIX;
+	   $$[1] = 0;
+	}
+	| TEXTURE optTexCoordUnitNum
+	{
+	   $$[0] = STATE_TEXTURE_MATRIX;
+	   $$[1] = $2;
+	}
+	| PALETTE '[' statePaletteMatNum ']'
+	{
+	   yyerror(& @1, state, "GL_ARB_matrix_palette not supported");
+	   YYERROR;
+	}
+	| MAT_PROGRAM '[' stateProgramMatNum ']'
+	{
+	   $$[0] = STATE_PROGRAM_MATRIX;
+	   $$[1] = $3;
+	}
+	;
+
+stateOptModMatNum:
+	{
+	   $$ = 0;
+	}
+	| '[' stateModMatNum ']'
+	{
+	   $$ = $2;
+	}
+	;
+stateModMatNum: INTEGER
+	{
+	   /* Since GL_ARB_vertex_blend isn't supported, only modelview matrix
+	    * zero is valid.
+	    */
+	   if ($1 != 0) {
+	      yyerror(& @1, state, "invalid modelview matrix index");
+	      YYERROR;
+	   }
+
+	   $$ = $1;
+	}
+	;
+statePaletteMatNum: INTEGER
+	{
+	   /* Since GL_ARB_matrix_palette isn't supported, just let any value
+	    * through here.  The error will be generated later.
+	    */
+	   $$ = $1;
+	}
+	;
+stateProgramMatNum: INTEGER
+	{
+	   if ((unsigned) $1 >= state->MaxProgramMatrices) {
+	      yyerror(& @1, state, "invalid program matrix selector");
+	      YYERROR;
+	   }
+
+	   $$ = $1;
+	}
+	;
+
+stateDepthItem: DEPTH RANGE
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = STATE_DEPTH_RANGE;
+	}
+	;
+
+
+programSingleItem: progEnvParam | progLocalParam;
+
+programMultipleItem: progEnvParams | progLocalParams;
+
+progEnvParams: PROGRAM ENV '[' progEnvParamNums ']'
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = state->state_param_enum;
+	   $$[1] = STATE_ENV;
+	   $$[2] = $4[0];
+	   $$[3] = $4[1];
+	}
+	;
+
+progEnvParamNums: progEnvParamNum
+	{
+	   $$[0] = $1;
+	   $$[1] = $1;
+	}
+	| progEnvParamNum DOT_DOT progEnvParamNum
+	{
+	   $$[0] = $1;
+	   $$[1] = $3;
+	}
+	;
+
+progEnvParam: PROGRAM ENV '[' progEnvParamNum ']'
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = state->state_param_enum;
+	   $$[1] = STATE_ENV;
+	   $$[2] = $4;
+	   $$[3] = $4;
+	}
+	;
+
+progLocalParams: PROGRAM LOCAL '[' progLocalParamNums ']'
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = state->state_param_enum;
+	   $$[1] = STATE_LOCAL;
+	   $$[2] = $4[0];
+	   $$[3] = $4[1];
+	}
+	;
+
+progLocalParamNums: progLocalParamNum
+	{
+	   $$[0] = $1;
+	   $$[1] = $1;
+	}
+	| progLocalParamNum DOT_DOT progLocalParamNum
+	{
+	   $$[0] = $1;
+	   $$[1] = $3;
+	}
+	;
+
+progLocalParam: PROGRAM LOCAL '[' progLocalParamNum ']'
+	{
+	   memset($$, 0, sizeof($$));
+	   $$[0] = state->state_param_enum;
+	   $$[1] = STATE_LOCAL;
+	   $$[2] = $4;
+	   $$[3] = $4;
+	}
+	;
+
+progEnvParamNum: INTEGER
+	{
+	   if ((unsigned) $1 >= state->limits->MaxEnvParams) {
+	      yyerror(& @1, state, "invalid environment parameter reference");
+	      YYERROR;
+	   }
+	   $$ = $1;
+	}
+	;
+
+progLocalParamNum: INTEGER
+	{
+	   if ((unsigned) $1 >= state->limits->MaxLocalParams) {
+	      yyerror(& @1, state, "invalid local parameter reference");
+	      YYERROR;
+	   }
+	   $$ = $1;
+	}
+	;
+
+
+
+paramConstDecl: paramConstScalarDecl | paramConstVector;
+paramConstUse: paramConstScalarUse | paramConstVector;
+
+paramConstScalarDecl: signedFloatConstant
+	{
+	   $$.count = 4;
+	   $$.data[0].f = $1;
+	   $$.data[1].f = $1;
+	   $$.data[2].f = $1;
+	   $$.data[3].f = $1;
+	}
+	;
+
+paramConstScalarUse: REAL
+	{
+	   $$.count = 1;
+	   $$.data[0].f = $1;
+	   $$.data[1].f = $1;
+	   $$.data[2].f = $1;
+	   $$.data[3].f = $1;
+	}
+	| INTEGER
+	{
+	   $$.count = 1;
+	   $$.data[0].f = (float) $1;
+	   $$.data[1].f = (float) $1;
+	   $$.data[2].f = (float) $1;
+	   $$.data[3].f = (float) $1;
+	}
+	;
+
+paramConstVector: '{' signedFloatConstant '}'
+	{
+	   $$.count = 4;
+	   $$.data[0].f = $2;
+	   $$.data[1].f = 0.0f;
+	   $$.data[2].f = 0.0f;
+	   $$.data[3].f = 1.0f;
+	}
+	| '{' signedFloatConstant ',' signedFloatConstant '}'
+	{
+	   $$.count = 4;
+	   $$.data[0].f = $2;
+	   $$.data[1].f = $4;
+	   $$.data[2].f = 0.0f;
+	   $$.data[3].f = 1.0f;
+	}
+	| '{' signedFloatConstant ',' signedFloatConstant ','
+              signedFloatConstant '}'
+	{
+	   $$.count = 4;
+	   $$.data[0].f = $2;
+	   $$.data[1].f = $4;
+	   $$.data[2].f = $6;
+	   $$.data[3].f = 1.0f;
+	}
+	| '{' signedFloatConstant ',' signedFloatConstant ','
+              signedFloatConstant ',' signedFloatConstant '}'
+	{
+	   $$.count = 4;
+	   $$.data[0].f = $2;
+	   $$.data[1].f = $4;
+	   $$.data[2].f = $6;
+	   $$.data[3].f = $8;
+	}
+	;
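+
+/* Short constant vectors are padded with OpenGL's default component values
+ * (y = z = 0, w = 1), so "{ 2.5 }" yields (2.5, 0, 0, 1) and "{ 1.0, 2.0 }"
+ * yields (1.0, 2.0, 0, 1).
+ */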
+
+signedFloatConstant: optionalSign REAL
+	{
+	   $$ = ($1) ? -$2 : $2;
+	}
+	| optionalSign INTEGER
+	{
+	   $$ = (float)(($1) ? -$2 : $2);
+	}
+	;
+
+optionalSign: '+'        { $$ = FALSE; }
+	| '-'            { $$ = TRUE;  }
+	|                { $$ = FALSE; }
+	;
+
+TEMP_statement: optVarSize TEMP { $<integer>$ = $2; } varNameList
+	;
+
+optVarSize: string
+	{
+	   /* NV_fragment_program_option defines the size qualifiers in a
+	    * fairly broken way.  "SHORT" or "LONG" can optionally be used
+	    * before TEMP or OUTPUT.  However, neither is a reserved word!
+	    * This means that we have to parse it as an identifier, then check
+	    * to make sure it's one of the valid values.  *sigh*
+	    *
+	    * In addition, the grammar in the extension spec does *not* allow
+	    * the size specifier to be optional, but all known implementations
+	    * do.
+	    */
+	   if (!state->option.NV_fragment) {
+	      yyerror(& @1, state, "unexpected IDENTIFIER");
+	      YYERROR;
+	   }
+
+	   if (strcmp("SHORT", $1) == 0) {
+	   } else if (strcmp("LONG", $1) == 0) {
+	   } else {
+	      char *const err_str =
+		 make_error_string("invalid storage size specifier \"%s\"",
+				   $1);
+
+	      yyerror(& @1, state, (err_str != NULL)
+		      ? err_str : "invalid storage size specifier");
+
+	      if (err_str != NULL) {
+		 free(err_str);
+	      }
+
+	      YYERROR;
+	   }
+	}
+	|
+	{
+	}
+	;
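+
+/* Under "OPTION NV_fragment_program;" declarations such as
+ *
+ *    SHORT TEMP t0;
+ *    LONG OUTPUT o0 = result.color;
+ *
+ * are accepted; the SHORT/LONG qualifier is validated above and then
+ * deliberately ignored.
+ */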
+
+ADDRESS_statement: ADDRESS { $<integer>$ = $1; } varNameList
+	;
+
+varNameList: varNameList ',' IDENTIFIER
+	{
+	   if (!declare_variable(state, $3, $<integer>0, & @3)) {
+	      free($3);
+	      YYERROR;
+	   }
+	}
+	| IDENTIFIER
+	{
+	   if (!declare_variable(state, $1, $<integer>0, & @1)) {
+	      free($1);
+	      YYERROR;
+	   }
+	}
+	;
+
+OUTPUT_statement: optVarSize OUTPUT IDENTIFIER '=' resultBinding
+	{
+	   struct asm_symbol *const s =
+	      declare_variable(state, $3, at_output, & @3);
+
+	   if (s == NULL) {
+	      free($3);
+	      YYERROR;
+	   } else {
+	      s->output_binding = $5;
+	   }
+	}
+	;
+
+resultBinding: RESULT POSITION
+	{
+	   if (state->mode == ARB_vertex) {
+	      $$ = VARYING_SLOT_POS;
+	   } else {
+	      yyerror(& @2, state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+	| RESULT FOGCOORD
+	{
+	   if (state->mode == ARB_vertex) {
+	      $$ = VARYING_SLOT_FOGC;
+	   } else {
+	      yyerror(& @2, state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+	| RESULT resultColBinding
+	{
+	   $$ = $2;
+	}
+	| RESULT POINTSIZE
+	{
+	   if (state->mode == ARB_vertex) {
+	      $$ = VARYING_SLOT_PSIZ;
+	   } else {
+	      yyerror(& @2, state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+	| RESULT TEXCOORD optTexCoordUnitNum
+	{
+	   if (state->mode == ARB_vertex) {
+	      $$ = VARYING_SLOT_TEX0 + $3;
+	   } else {
+	      yyerror(& @2, state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+	| RESULT DEPTH
+	{
+	   if (state->mode == ARB_fragment) {
+	      $$ = FRAG_RESULT_DEPTH;
+	   } else {
+	      yyerror(& @2, state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+	;
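+
+/* Examples of result bindings, by program type:
+ *
+ *    OUTPUT oPos = result.position;    (vertex:   VARYING_SLOT_POS)
+ *    OUTPUT oCol = result.color;       (either:   via resultColBinding)
+ *    OUTPUT oDep = result.depth;       (fragment: FRAG_RESULT_DEPTH)
+ */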
+
+resultColBinding: COLOR optResultFaceType optResultColorType
+	{
+	   $$ = $2 + $3;
+	}
+	;
+
+optResultFaceType:
+	{
+	   if (state->mode == ARB_vertex) {
+	      $$ = VARYING_SLOT_COL0;
+	   } else {
+	      if (state->option.DrawBuffers)
+		 $$ = FRAG_RESULT_DATA0;
+	      else
+		 $$ = FRAG_RESULT_COLOR;
+	   }
+	}
+	| '[' INTEGER ']'
+	{
+	   if (state->mode == ARB_vertex) {
+	      yyerror(& @1, state, "invalid program result name");
+	      YYERROR;
+	   } else {
+	      if (!state->option.DrawBuffers) {
+		 /* From the ARB_draw_buffers spec (same text exists
+		  * for ATI_draw_buffers):
+		  *
+		  *     If this option is not specified, a fragment
+		  *     program that attempts to bind
+		  *     "result.color[n]" will fail to load, and only
+		  *     "result.color" will be allowed.
+		  */
+		 yyerror(& @1, state,
+			 "result.color[] used without "
+			 "`OPTION ARB_draw_buffers' or "
+			 "`OPTION ATI_draw_buffers'");
+		 YYERROR;
+	      } else if ($2 >= state->MaxDrawBuffers) {
+		 yyerror(& @1, state,
+			 "result.color[] exceeds MAX_DRAW_BUFFERS_ARB");
+		 YYERROR;
+	      }
+	      $$ = FRAG_RESULT_DATA0 + $2;
+	   }
+	}
+	| FRONT
+	{
+	   if (state->mode == ARB_vertex) {
+	      $$ = VARYING_SLOT_COL0;
+	   } else {
+	      yyerror(& @1, state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+	| BACK
+	{
+	   if (state->mode == ARB_vertex) {
+	      $$ = VARYING_SLOT_BFC0;
+	   } else {
+	      yyerror(& @1, state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+	;
+
+optResultColorType:
+	{
+	   $$ = 0; 
+	}
+	| PRIMARY
+	{
+	   if (state->mode == ARB_vertex) {
+	      $$ = 0;
+	   } else {
+	      yyerror(& @1, state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+	| SECONDARY
+	{
+	   if (state->mode == ARB_vertex) {
+	      $$ = 1;
+	   } else {
+	      yyerror(& @1, state, "invalid program result name");
+	      YYERROR;
+	   }
+	}
+	;
+
+optFaceType:    { $$ = 0; }
+	| FRONT	{ $$ = 0; }
+	| BACK  { $$ = 1; }
+	;
+
+optColorType:       { $$ = 0; }
+	| PRIMARY   { $$ = 0; }
+	| SECONDARY { $$ = 1; }
+	;
+
+optTexCoordUnitNum:                { $$ = 0; }
+	| '[' texCoordUnitNum ']'  { $$ = $2; }
+	;
+
+optTexImageUnitNum:                { $$ = 0; }
+	| '[' texImageUnitNum ']'  { $$ = $2; }
+	;
+
+optLegacyTexUnitNum:               { $$ = 0; }
+	| '[' legacyTexUnitNum ']' { $$ = $2; }
+	;
+
+texCoordUnitNum: INTEGER
+	{
+	   if ((unsigned) $1 >= state->MaxTextureCoordUnits) {
+	      yyerror(& @1, state, "invalid texture coordinate unit selector");
+	      YYERROR;
+	   }
+
+	   $$ = $1;
+	}
+	;
+
+texImageUnitNum: INTEGER
+	{
+	   if ((unsigned) $1 >= state->MaxTextureImageUnits) {
+	      yyerror(& @1, state, "invalid texture image unit selector");
+	      YYERROR;
+	   }
+
+	   $$ = $1;
+	}
+	;
+
+legacyTexUnitNum: INTEGER
+	{
+	   if ((unsigned) $1 >= state->MaxTextureUnits) {
+	      yyerror(& @1, state, "invalid texture unit selector");
+	      YYERROR;
+	   }
+
+	   $$ = $1;
+	}
+	;
+
+ALIAS_statement: ALIAS IDENTIFIER '=' USED_IDENTIFIER
+	{
+	   struct asm_symbol *exist = (struct asm_symbol *)
+	      _mesa_symbol_table_find_symbol(state->st, 0, $2);
+	   struct asm_symbol *target = (struct asm_symbol *)
+	      _mesa_symbol_table_find_symbol(state->st, 0, $4);
+
+	   free($4);
+
+	   if (exist != NULL) {
+	      char m[1000];
+	      _mesa_snprintf(m, sizeof(m), "redeclared identifier: %s", $2);
+	      free($2);
+	      yyerror(& @2, state, m);
+	      YYERROR;
+	   } else if (target == NULL) {
+	      free($2);
+	      yyerror(& @4, state,
+		      "undefined variable binding in ALIAS statement");
+	      YYERROR;
+	   } else {
+	      _mesa_symbol_table_add_symbol(state->st, 0, $2, target);
+	   }
+	}
+	;
+
+string: IDENTIFIER
+	| USED_IDENTIFIER
+	;
+
+%%
+
+void
+asm_instruction_set_operands(struct asm_instruction *inst,
+			     const struct prog_dst_register *dst,
+			     const struct asm_src_register *src0,
+			     const struct asm_src_register *src1,
+			     const struct asm_src_register *src2)
+{
+   /* In the core ARB extensions only the KIL instruction doesn't have a
+    * destination register.
+    */
+   if (dst == NULL) {
+      init_dst_reg(& inst->Base.DstReg);
+   } else {
+      inst->Base.DstReg = *dst;
+   }
+
+   /* The only instruction that doesn't have any source registers is the
+    * condition-code based KIL instruction added by NV_fragment_program_option.
+    */
+   if (src0 != NULL) {
+      inst->Base.SrcReg[0] = src0->Base;
+      inst->SrcReg[0] = *src0;
+   } else {
+      init_src_reg(& inst->SrcReg[0]);
+   }
+
+   if (src1 != NULL) {
+      inst->Base.SrcReg[1] = src1->Base;
+      inst->SrcReg[1] = *src1;
+   } else {
+      init_src_reg(& inst->SrcReg[1]);
+   }
+
+   if (src2 != NULL) {
+      inst->Base.SrcReg[2] = src2->Base;
+      inst->SrcReg[2] = *src2;
+   } else {
+      init_src_reg(& inst->SrcReg[2]);
+   }
+}
+
+
+struct asm_instruction *
+asm_instruction_ctor(gl_inst_opcode op,
+		     const struct prog_dst_register *dst,
+		     const struct asm_src_register *src0,
+		     const struct asm_src_register *src1,
+		     const struct asm_src_register *src2)
+{
+   struct asm_instruction *inst = CALLOC_STRUCT(asm_instruction);
+
+   if (inst) {
+      _mesa_init_instructions(& inst->Base, 1);
+      inst->Base.Opcode = op;
+
+      asm_instruction_set_operands(inst, dst, src0, src1, src2);
+   }
+
+   return inst;
+}
+
+
+struct asm_instruction *
+asm_instruction_copy_ctor(const struct prog_instruction *base,
+			  const struct prog_dst_register *dst,
+			  const struct asm_src_register *src0,
+			  const struct asm_src_register *src1,
+			  const struct asm_src_register *src2)
+{
+   struct asm_instruction *inst = CALLOC_STRUCT(asm_instruction);
+
+   if (inst) {
+      _mesa_init_instructions(& inst->Base, 1);
+      inst->Base.Opcode = base->Opcode;
+      inst->Base.CondUpdate = base->CondUpdate;
+      inst->Base.CondDst = base->CondDst;
+      inst->Base.SaturateMode = base->SaturateMode;
+      inst->Base.Precision = base->Precision;
+
+      asm_instruction_set_operands(inst, dst, src0, src1, src2);
+   }
+
+   return inst;
+}
+
+
+void
+init_dst_reg(struct prog_dst_register *r)
+{
+   memset(r, 0, sizeof(*r));
+   r->File = PROGRAM_UNDEFINED;
+   r->WriteMask = WRITEMASK_XYZW;
+   r->CondMask = COND_TR;
+   r->CondSwizzle = SWIZZLE_NOOP;
+}
+
+
+/** Like init_dst_reg() but set the File and Index fields. */
+void
+set_dst_reg(struct prog_dst_register *r, gl_register_file file, GLint index)
+{
+   const GLint maxIndex = 1 << INST_INDEX_BITS;
+   const GLint minIndex = 0;
+   ASSERT(index >= minIndex);
+   (void) minIndex;
+   ASSERT(index <= maxIndex);
+   (void) maxIndex;
+   ASSERT(file == PROGRAM_TEMPORARY ||
+	  file == PROGRAM_ADDRESS ||
+	  file == PROGRAM_OUTPUT);
+   memset(r, 0, sizeof(*r));
+   r->File = file;
+   r->Index = index;
+   r->WriteMask = WRITEMASK_XYZW;
+   r->CondMask = COND_TR;
+   r->CondSwizzle = SWIZZLE_NOOP;
+}
+
+
+void
+init_src_reg(struct asm_src_register *r)
+{
+   memset(r, 0, sizeof(*r));
+   r->Base.File = PROGRAM_UNDEFINED;
+   r->Base.Swizzle = SWIZZLE_NOOP;
+   r->Symbol = NULL;
+}
+
+
+/** Like init_src_reg() but set the File and Index fields. */
+void
+set_src_reg(struct asm_src_register *r, gl_register_file file, GLint index)
+{
+   set_src_reg_swz(r, file, index, SWIZZLE_XYZW);
+}
+
+
+void
+set_src_reg_swz(struct asm_src_register *r, gl_register_file file, GLint index,
+                GLuint swizzle)
+{
+   const GLint maxIndex = (1 << INST_INDEX_BITS) - 1;
+   const GLint minIndex = -(1 << INST_INDEX_BITS);
+   ASSERT(file < PROGRAM_FILE_MAX);
+   ASSERT(index >= minIndex);
+   (void) minIndex;
+   ASSERT(index <= maxIndex);
+   (void) maxIndex;
+   memset(r, 0, sizeof(*r));
+   r->Base.File = file;
+   r->Base.Index = index;
+   r->Base.Swizzle = swizzle;
+   r->Symbol = NULL;
+}
+
+
+/**
+ * Validate the set of inputs used by a program
+ *
+ * Validates that legal sets of inputs are used by the program.  In this case
+ * "used" included both reading the input or binding the input to a name using
+ * the \c ATTRIB command.
+ *
+ * \return
+ * \c TRUE if the combination of inputs used is valid, \c FALSE otherwise.
+ */
+int
+validate_inputs(struct YYLTYPE *locp, struct asm_parser_state *state)
+{
+   const GLbitfield64 inputs = state->prog->InputsRead | state->InputsBound;
+
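+   /* Shifting the generic-attribute bits down by VERT_ATTRIB_GENERIC0 lines
+    * generic attribute g up with conventional attribute g.  A non-zero AND
+    * with the fixed-function bits therefore means some attribute index is
+    * used through both bindings, which the aliasing rules forbid.
+    */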
+   if (((inputs & VERT_BIT_FF_ALL) & (inputs >> VERT_ATTRIB_GENERIC0)) != 0) {
+      yyerror(locp, state, "illegal use of generic attribute and name attribute");
+      return 0;
+   }
+
+   return 1;
+}
+
+
+struct asm_symbol *
+declare_variable(struct asm_parser_state *state, char *name, enum asm_type t,
+		 struct YYLTYPE *locp)
+{
+   struct asm_symbol *s = NULL;
+   struct asm_symbol *exist = (struct asm_symbol *)
+      _mesa_symbol_table_find_symbol(state->st, 0, name);
+
+
+   if (exist != NULL) {
+      yyerror(locp, state, "redeclared identifier");
+   } else {
+      s = calloc(1, sizeof(struct asm_symbol));
+      if (s == NULL) {
+	 yyerror(locp, state, "out of memory");
+	 return NULL;
+      }
+
+      s->name = name;
+      s->type = t;
+
+      switch (t) {
+      case at_temp:
+	 if (state->prog->NumTemporaries >= state->limits->MaxTemps) {
+	    yyerror(locp, state, "too many temporaries declared");
+	    free(s);
+	    return NULL;
+	 }
+
+	 s->temp_binding = state->prog->NumTemporaries;
+	 state->prog->NumTemporaries++;
+	 break;
+
+      case at_address:
+	 if (state->prog->NumAddressRegs >= state->limits->MaxAddressRegs) {
+	    yyerror(locp, state, "too many address registers declared");
+	    free(s);
+	    return NULL;
+	 }
+
+	 /* FINISHME: Add support for multiple address registers.
+	  */
+	 state->prog->NumAddressRegs++;
+	 break;
+
+      default:
+	 break;
+      }
+
+      _mesa_symbol_table_add_symbol(state->st, 0, s->name, s);
+      s->next = state->sym;
+      state->sym = s;
+   }
+
+   return s;
+}
+
+
+int add_state_reference(struct gl_program_parameter_list *param_list,
+			const gl_state_index tokens[STATE_LENGTH])
+{
+   const GLuint size = 4; /* XXX fix */
+   char *name;
+   GLint index;
+
+   name = _mesa_program_state_string(tokens);
+   index = _mesa_add_parameter(param_list, PROGRAM_STATE_VAR, name,
+                               size, GL_NONE, NULL, tokens);
+   param_list->StateFlags |= _mesa_program_state_flags(tokens);
+
+   /* free name string here since we duplicated it in add_parameter() */
+   free(name);
+
+   return index;
+}
+
+
+int
+initialize_symbol_from_state(struct gl_program *prog,
+			     struct asm_symbol *param_var, 
+			     const gl_state_index tokens[STATE_LENGTH])
+{
+   int idx = -1;
+   gl_state_index state_tokens[STATE_LENGTH];
+
+
+   memcpy(state_tokens, tokens, sizeof(state_tokens));
+
+   param_var->type = at_param;
+   param_var->param_binding_type = PROGRAM_STATE_VAR;
+
+   /* If we are adding a STATE_MATRIX that has multiple rows, we need to
+    * unroll it and call add_state_reference() for each row
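+    * (tokens[2] and tokens[3] hold the first and last row, so e.g. binding
+    * all of state.matrix.mvp creates four parameter entries, one per row).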
+    */
+   if ((state_tokens[0] == STATE_MODELVIEW_MATRIX ||
+	state_tokens[0] == STATE_PROJECTION_MATRIX ||
+	state_tokens[0] == STATE_MVP_MATRIX ||
+	state_tokens[0] == STATE_TEXTURE_MATRIX ||
+	state_tokens[0] == STATE_PROGRAM_MATRIX)
+       && (state_tokens[2] != state_tokens[3])) {
+      int row;
+      const int first_row = state_tokens[2];
+      const int last_row = state_tokens[3];
+
+      for (row = first_row; row <= last_row; row++) {
+	 state_tokens[2] = state_tokens[3] = row;
+
+	 idx = add_state_reference(prog->Parameters, state_tokens);
+	 if (param_var->param_binding_begin == ~0U) {
+	    param_var->param_binding_begin = idx;
+            param_var->param_binding_swizzle = SWIZZLE_XYZW;
+         }
+
+	 param_var->param_binding_length++;
+      }
+   }
+   else {
+      idx = add_state_reference(prog->Parameters, state_tokens);
+      if (param_var->param_binding_begin == ~0U) {
+	 param_var->param_binding_begin = idx;
+         param_var->param_binding_swizzle = SWIZZLE_XYZW;
+      }
+      param_var->param_binding_length++;
+   }
+
+   return idx;
+}
+
+
+int
+initialize_symbol_from_param(struct gl_program *prog,
+			     struct asm_symbol *param_var, 
+			     const gl_state_index tokens[STATE_LENGTH])
+{
+   int idx = -1;
+   gl_state_index state_tokens[STATE_LENGTH];
+
+
+   memcpy(state_tokens, tokens, sizeof(state_tokens));
+
+   assert((state_tokens[0] == STATE_VERTEX_PROGRAM)
+	  || (state_tokens[0] == STATE_FRAGMENT_PROGRAM));
+   assert((state_tokens[1] == STATE_ENV)
+	  || (state_tokens[1] == STATE_LOCAL));
+
+   /*
+    * The param type is STATE_VAR.  The program parameter entry will
+    * effectively be a pointer into the LOCAL or ENV parameter array.
+    */
+   param_var->type = at_param;
+   param_var->param_binding_type = PROGRAM_STATE_VAR;
+
+   /* If we are adding a STATE_ENV or STATE_LOCAL that has multiple elements,
+    * we need to unroll it and call add_state_reference() for each row
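+    * (e.g. "PARAM env[4] = { program.env[0..3] };" yields four entries).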
+    */
+   if (state_tokens[2] != state_tokens[3]) {
+      int row;
+      const int first_row = state_tokens[2];
+      const int last_row = state_tokens[3];
+
+      for (row = first_row; row <= last_row; row++) {
+	 state_tokens[2] = state_tokens[3] = row;
+
+	 idx = add_state_reference(prog->Parameters, state_tokens);
+	 if (param_var->param_binding_begin == ~0U) {
+	    param_var->param_binding_begin = idx;
+            param_var->param_binding_swizzle = SWIZZLE_XYZW;
+         }
+	 param_var->param_binding_length++;
+      }
+   }
+   else {
+      idx = add_state_reference(prog->Parameters, state_tokens);
+      if (param_var->param_binding_begin == ~0U) {
+	 param_var->param_binding_begin = idx;
+         param_var->param_binding_swizzle = SWIZZLE_XYZW;
+      }
+      param_var->param_binding_length++;
+   }
+
+   return idx;
+}
+
+
+/**
+ * Put a float/vector constant/literal into the parameter list.
+ * \param param_var  returns info about the parameter/constant's location,
+ *                   binding, type, etc.
+ * \param vec  the vector/constant to add
+ * \param allowSwizzle  if true, try to consolidate constants which only differ
+ *                      by a swizzle.  We don't want to do this when building
+ *                      arrays of constants that may be indexed indirectly.
+ * \return index of the constant in the parameter list.
+ */
+int
+initialize_symbol_from_const(struct gl_program *prog,
+			     struct asm_symbol *param_var, 
+			     const struct asm_vector *vec,
+                             GLboolean allowSwizzle)
+{
+   unsigned swizzle;
+   const int idx = _mesa_add_unnamed_constant(prog->Parameters,
+                                              vec->data, vec->count,
+                                              allowSwizzle ? &swizzle : NULL);
+
+   param_var->type = at_param;
+   param_var->param_binding_type = PROGRAM_CONSTANT;
+
+   if (param_var->param_binding_begin == ~0U) {
+      param_var->param_binding_begin = idx;
+      param_var->param_binding_swizzle = allowSwizzle ? swizzle : SWIZZLE_XYZW;
+   }
+   param_var->param_binding_length++;
+
+   return idx;
+}
+
+
+char *
+make_error_string(const char *fmt, ...)
+{
+   int length;
+   char *str;
+   va_list args;
+
+
+   /* Call vsnprintf once to determine how large the final string is.  Call it
+    * again to do the actual formatting.  From the vsnprintf manual page:
+    *
+    *    Upon successful return, these functions return the number of
+    *    characters printed  (not including the trailing '\0' used to end
+    *    output to strings).
+    */
+   va_start(args, fmt);
+   length = 1 + vsnprintf(NULL, 0, fmt, args);
+   va_end(args);
+
+   str = malloc(length);
+   if (str) {
+      va_start(args, fmt);
+      vsnprintf(str, length, fmt, args);
+      va_end(args);
+   }
+
+   return str;
+}
+
+
+void
+yyerror(YYLTYPE *locp, struct asm_parser_state *state, const char *s)
+{
+   char *err_str;
+
+
+   err_str = make_error_string("glProgramStringARB(%s)\n", s);
+   if (err_str) {
+      _mesa_error(state->ctx, GL_INVALID_OPERATION, "%s", err_str);
+      free(err_str);
+   }
+
+   err_str = make_error_string("line %u, char %u: error: %s\n",
+			       locp->first_line, locp->first_column, s);
+   _mesa_set_program_error(state->ctx, locp->position, err_str);
+
+   if (err_str) {
+      free(err_str);
+   }
+}
+
+
+GLboolean
+_mesa_parse_arb_program(struct gl_context *ctx, GLenum target, const GLubyte *str,
+			GLsizei len, struct asm_parser_state *state)
+{
+   struct asm_instruction *inst;
+   unsigned i;
+   GLubyte *strz;
+   GLboolean result = GL_FALSE;
+   void *temp;
+   struct asm_symbol *sym;
+
+   state->ctx = ctx;
+   state->prog->Target = target;
+   state->prog->Parameters = _mesa_new_parameter_list();
+
+   /* Make a copy of the program string and force it to be NUL-terminated.
+    */
+   strz = (GLubyte *) malloc(len + 1);
+   if (strz == NULL) {
+      _mesa_error(ctx, GL_OUT_OF_MEMORY, "glProgramStringARB");
+      return GL_FALSE;
+   }
+   memcpy (strz, str, len);
+   strz[len] = '\0';
+
+   state->prog->String = strz;
+
+   state->st = _mesa_symbol_table_ctor();
+
+   state->limits = (target == GL_VERTEX_PROGRAM_ARB)
+      ? & ctx->Const.Program[MESA_SHADER_VERTEX]
+      : & ctx->Const.Program[MESA_SHADER_FRAGMENT];
+
+   state->MaxTextureImageUnits = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits;
+   state->MaxTextureCoordUnits = ctx->Const.MaxTextureCoordUnits;
+   state->MaxTextureUnits = ctx->Const.MaxTextureUnits;
+   state->MaxClipPlanes = ctx->Const.MaxClipPlanes;
+   state->MaxLights = ctx->Const.MaxLights;
+   state->MaxProgramMatrices = ctx->Const.MaxProgramMatrices;
+   state->MaxDrawBuffers = ctx->Const.MaxDrawBuffers;
+
+   state->state_param_enum = (target == GL_VERTEX_PROGRAM_ARB)
+      ? STATE_VERTEX_PROGRAM : STATE_FRAGMENT_PROGRAM;
+
+   _mesa_set_program_error(ctx, -1, NULL);
+
+   _mesa_program_lexer_ctor(& state->scanner, state, (const char *) str, len);
+   yyparse(state);
+   _mesa_program_lexer_dtor(state->scanner);
+
+
+   if (ctx->Program.ErrorPos != -1) {
+      goto error;
+   }
+
+   if (! _mesa_layout_parameters(state)) {
+      struct YYLTYPE loc;
+
+      loc.first_line = 0;
+      loc.first_column = 0;
+      loc.position = len;
+
+      yyerror(& loc, state, "invalid PARAM usage");
+      goto error;
+   }
+
+   /* Add one instruction to store the "END" instruction.
+    */
+   state->prog->Instructions =
+      _mesa_alloc_instructions(state->prog->NumInstructions + 1);
+
+   if (state->prog->Instructions == NULL) {
+      goto error;
+   }
+
+   inst = state->inst_head;
+   for (i = 0; i < state->prog->NumInstructions; i++) {
+      struct asm_instruction *const temp = inst->next;
+
+      state->prog->Instructions[i] = inst->Base;
+      inst = temp;
+   }
+
+   /* Finally, tag on an OPCODE_END instruction */
+   {
+      const GLuint numInst = state->prog->NumInstructions;
+      _mesa_init_instructions(state->prog->Instructions + numInst, 1);
+      state->prog->Instructions[numInst].Opcode = OPCODE_END;
+   }
+   state->prog->NumInstructions++;
+
+   state->prog->NumParameters = state->prog->Parameters->NumParameters;
+   state->prog->NumAttributes = _mesa_bitcount_64(state->prog->InputsRead);
+
+   /*
+    * Initialize native counts to logical counts.  The device driver may
+    * change them if the program is translated into a hardware program.
+    */
+   state->prog->NumNativeInstructions = state->prog->NumInstructions;
+   state->prog->NumNativeTemporaries = state->prog->NumTemporaries;
+   state->prog->NumNativeParameters = state->prog->NumParameters;
+   state->prog->NumNativeAttributes = state->prog->NumAttributes;
+   state->prog->NumNativeAddressRegs = state->prog->NumAddressRegs;
+
+   result = GL_TRUE;
+
+error:
+   for (inst = state->inst_head; inst != NULL; inst = temp) {
+      temp = inst->next;
+      free(inst);
+   }
+
+   state->inst_head = NULL;
+   state->inst_tail = NULL;
+
+   for (sym = state->sym; sym != NULL; sym = temp) {
+      temp = sym->next;
+
+      free((void *) sym->name);
+      free(sym);
+   }
+   state->sym = NULL;
+
+   _mesa_symbol_table_dtor(state->st);
+   state->st = NULL;
+
+   return result;
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/program_parse_extra.c b/icd/intel/compiler/mesa-utils/src/mesa/program/program_parse_extra.c
new file mode 100644
index 0000000..a9e3640
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/program_parse_extra.c
@@ -0,0 +1,262 @@
+/*
+ * Copyright © 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <string.h>
+#include "main/mtypes.h"
+#include "prog_instruction.h"
+#include "program_parser.h"
+
+
+/**
+ * Extra assembly-level parser routines
+ *
+ * \author Ian Romanick <ian.d.romanick@intel.com>
+ */
+
+int
+_mesa_parse_instruction_suffix(const struct asm_parser_state *state,
+			       const char *suffix,
+			       struct prog_instruction *inst)
+{
+   inst->CondUpdate = 0;
+   inst->CondDst = 0;
+   inst->SaturateMode = SATURATE_OFF;
+   inst->Precision = FLOAT32;
+
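+   /* Suffix elements are consumed left to right, so e.g. "HC_SAT" selects
+    * FLOAT16 precision, enables condition-code update, and saturates the
+    * result, provided the required program modes and options are enabled.
+    */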
+
+   /* The first possible suffix element is the precision specifier from
+    * NV_fragment_program_option.
+    */
+   if (state->option.NV_fragment) {
+      switch (suffix[0]) {
+      case 'H':
+	 inst->Precision = FLOAT16;
+	 suffix++;
+	 break;
+      case 'R':
+	 inst->Precision = FLOAT32;
+	 suffix++;
+	 break;
+      case 'X':
+	 inst->Precision = FIXED12;
+	 suffix++;
+	 break;
+      default:
+	 break;
+      }
+   }
+
+   /* The next possible suffix element is the condition code modifier selection
+    * from NV_fragment_program_option.
+    */
+   if (state->option.NV_fragment) {
+      if (suffix[0] == 'C') {
+	 inst->CondUpdate = 1;
+	 suffix++;
+      }
+   }
+
+
+   /* The final possible suffix element is the saturation selector from
+    * ARB_fragment_program.
+    */
+   if (state->mode == ARB_fragment) {
+      if (strcmp(suffix, "_SAT") == 0) {
+	 inst->SaturateMode = SATURATE_ZERO_ONE;
+	 suffix += 4;
+      }
+   }
+
+
+   /* It is an error for all of the suffix string not to be consumed.
+    */
+   return suffix[0] == '\0';
+}
+
+
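+/* For example, _mesa_parse_cc("GT") returns COND_GT, while _mesa_parse_cc("G")
+ * and _mesa_parse_cc("GTX") both return 0 because the condition code name
+ * must be exactly two characters long.
+ */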
+int
+_mesa_parse_cc(const char *s)
+{
+   int cond = 0;
+
+   switch (s[0]) {
+   case 'E':
+      if (s[1] == 'Q') {
+	 cond = COND_EQ;
+      }
+      break;
+
+   case 'F':
+      if (s[1] == 'L') {
+	 cond = COND_FL;
+      }
+      break;
+
+   case 'G':
+      if (s[1] == 'E') {
+	 cond = COND_GE;
+      } else if (s[1] == 'T') {
+	 cond = COND_GT;
+      }
+      break;
+
+   case 'L':
+      if (s[1] == 'E') {
+	 cond = COND_LE;
+      } else if (s[1] == 'T') {
+	 cond = COND_LT;
+      }
+      break;
+
+   case 'N':
+      if (s[1] == 'E') {
+	 cond = COND_NE;
+      }
+      break;
+
+   case 'T':
+      if (s[1] == 'R') {
+	 cond = COND_TR;
+      }
+      break;
+
+   default:
+      break;
+   }
+
+   return ((cond == 0) || (s[2] != '\0')) ? 0 : cond;
+}
+
+
+int
+_mesa_ARBvp_parse_option(struct asm_parser_state *state, const char *option)
+{
+   if (strcmp(option, "ARB_position_invariant") == 0) {
+      state->option.PositionInvariant = 1;
+      return 1;
+   }
+
+   return 0;
+}
+
+
+int
+_mesa_ARBfp_parse_option(struct asm_parser_state *state, const char *option)
+{
+   /* All of the options currently supported start with "ARB_".  The code is
+    * currently structured with nested if-statements because eventually options
+    * that start with "NV_" will be supported.  This structure will result in
+    * less churn when those options are added.
+    */
+   if (strncmp(option, "ARB_", 4) == 0) {
+      /* Advance the pointer past the "ARB_" prefix.
+       */
+      option += 4;
+
+
+      if (strncmp(option, "fog_", 4) == 0) {
+	 option += 4;
+
+	 if (state->option.Fog == OPTION_NONE) {
+	    if (strcmp(option, "exp") == 0) {
+	       state->option.Fog = OPTION_FOG_EXP;
+	       return 1;
+	    } else if (strcmp(option, "exp2") == 0) {
+	       state->option.Fog = OPTION_FOG_EXP2;
+	       return 1;
+	    } else if (strcmp(option, "linear") == 0) {
+	       state->option.Fog = OPTION_FOG_LINEAR;
+	       return 1;
+	    }
+	 }
+
+	 return 0;
+      } else if (strncmp(option, "precision_hint_", 15) == 0) {
+	 option += 15;
+
+         /* The ARB_fragment_program spec, 3.11.4.5.2 says:
+          *
+          * "Only one precision control option may be specified by any given
+          * fragment program.  A fragment program that specifies both the
+          * "ARB_precision_hint_fastest" and "ARB_precision_hint_nicest"
+          * program options will fail to load."
+          */
+
+         if (strcmp(option, "nicest") == 0 && state->option.PrecisionHint != OPTION_FASTEST) {
+            state->option.PrecisionHint = OPTION_NICEST;
+            return 1;
+         } else if (strcmp(option, "fastest") == 0 && state->option.PrecisionHint != OPTION_NICEST) {
+            state->option.PrecisionHint = OPTION_FASTEST;
+            return 1;
+         }
+
+	 return 0;
+      } else if (strcmp(option, "draw_buffers") == 0) {
+	 /* Don't need to check extension availability because all Mesa-based
+	  * drivers support GL_ARB_draw_buffers.
+	  */
+	 state->option.DrawBuffers = 1;
+	 return 1;
+      } else if (strcmp(option, "fragment_program_shadow") == 0) {
+	 if (state->ctx->Extensions.ARB_fragment_program_shadow) {
+	    state->option.Shadow = 1;
+	    return 1;
+	 }
+      } else if (strncmp(option, "fragment_coord_", 15) == 0) {
+         option += 15;
+         if (state->ctx->Extensions.ARB_fragment_coord_conventions) {
+            if (strcmp(option, "origin_upper_left") == 0) {
+               state->option.OriginUpperLeft = 1;
+               return 1;
+            }
+            else if (strcmp(option, "pixel_center_integer") == 0) {
+               state->option.PixelCenterInteger = 1;
+               return 1;
+            }
+         }
+      }
+   } else if (strncmp(option, "ATI_", 4) == 0) {
+      option += 4;
+
+      if (strcmp(option, "draw_buffers") == 0) {
+	 /* Don't need to check extension availability because all Mesa-based
+	  * drivers support GL_ATI_draw_buffers.
+	  */
+	 state->option.DrawBuffers = 1;
+	 return 1;
+      }
+   } else if (strncmp(option, "NV_fragment_program", 19) == 0) {
+      option += 19;
+
+      /* Other NV_fragment_program strings may be supported later.
+       */
+      if (option[0] == '\0') {
+	 if (state->ctx->Extensions.NV_fragment_program_option) {
+	    state->option.NV_fragment = 1;
+	    return 1;
+	 }
+      }
+   }
+
+   return 0;
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/program_parser.h b/icd/intel/compiler/mesa-utils/src/mesa/program/program_parser.h
new file mode 100644
index 0000000..04c64f4
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/program_parser.h
@@ -0,0 +1,302 @@
+/*
+ * Copyright © 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#pragma once
+
+#include "main/config.h"
+#include "program/prog_parameter.h"
+
+struct gl_context;
+
+enum asm_type {
+   at_none,
+   at_address,
+   at_attrib,
+   at_param,
+   at_temp,
+   at_output
+};
+
+struct asm_symbol {
+   struct asm_symbol *next;    /**< List linkage for freeing. */
+   const char *name;
+   enum asm_type type;
+   unsigned attrib_binding;
+   unsigned output_binding;   /**< Output / result register number. */
+
+   /**
+    * One of PROGRAM_STATE_VAR or PROGRAM_CONSTANT.
+    */
+   unsigned param_binding_type;
+
+   /** 
+    * Offset into the program_parameter_list where the tokens representing our
+    * bound state (or constants) start.
+    */
+   unsigned param_binding_begin;
+
+   /**
+    * Constants put into the parameter list may be swizzled.  This field
+    * contains the symbol's swizzle (SWIZZLE_X/Y/Z/W).
+    */
+   unsigned param_binding_swizzle;
+
+   /* This is how many entries in the program_parameter_list we take up
+    * with our state tokens or constants. Note that this is _not_ the same as
+    * the number of param registers we eventually use.
+    */
+   unsigned param_binding_length;
+
+   /**
+    * Index of the temp register assigned to this variable.
+    */
+   unsigned temp_binding;
+
+   /**
+    * Flag whether or not a PARAM is an array
+    */
+   unsigned param_is_array:1;
+
+
+   /**
+    * Flag whether or not a PARAM array is accessed indirectly
+    */
+   unsigned param_accessed_indirectly:1;
+
+
+   /**
+    * \brief Is first pass of parameter layout done with this variable?
+    *
+    * The parameter layout routine operates in two passes.  This flag tracks
+    * whether or not the first pass has handled this variable.
+    *
+    * \sa _mesa_layout_parameters
+    */
+   unsigned pass1_done:1;
+};
+
+
+struct asm_vector {
+   unsigned count;
+   gl_constant_value data[4];
+};
+
+
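+/**
+ * Swizzle/writemask pair parsed from a component selection suffix.
+ *
+ * \c swizzle packs four 3-bit component selectors; \c mask is a 4-bit XYZW
+ * write mask.
+ */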
+struct asm_swizzle_mask {
+   unsigned swizzle:12;
+   unsigned mask:4;
+};
+
+
+struct asm_src_register {
+   struct prog_src_register Base;
+
+   /**
+    * Symbol associated with indirect access to parameter arrays.
+    *
+    * If \c Base::RelAddr is 1, this will point to the symbol for the parameter
+    * that is being dereferenced.  Further, \c Base::Index will be the offset
+    * from the address register being used.
+    */
+   struct asm_symbol *Symbol;
+};
+
+
+struct asm_instruction {
+   struct prog_instruction Base;
+   struct asm_instruction *next;
+   struct asm_src_register SrcReg[3];
+};
+
+
+struct asm_parser_state {
+   struct gl_context *ctx;
+   struct gl_program *prog;
+
+   /**
+    * Per-program target limits
+    */
+   struct gl_program_constants *limits;
+
+   struct _mesa_symbol_table *st;
+
+   /**
+    * Linked list of symbols
+    *
+    * This list is \b only used when cleaning up compiler state and freeing
+    * memory.
+    */
+   struct asm_symbol *sym;
+
+   /**
+    * State for the lexer.
+    */
+   void *scanner;
+
+   /**
+    * Linked list of instructions generated during parsing.
+    */
+   /*@{*/
+   struct asm_instruction *inst_head;
+   struct asm_instruction *inst_tail;
+   /*@}*/
+
+
+   /**
+    * Selected limits copied from gl_constants
+    *
+    * These are limits from the GL context, but various bits in the program
+    * must be validated against these values.
+    */
+   /*@{*/
+   unsigned MaxTextureCoordUnits;
+   unsigned MaxTextureImageUnits;
+   unsigned MaxTextureUnits;
+   unsigned MaxClipPlanes;
+   unsigned MaxLights;
+   unsigned MaxProgramMatrices;
+   unsigned MaxDrawBuffers;
+   /*@}*/
+
+   /**
+    * Value to use in state vector accessors for environment and local
+    * parameters
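+    * (either STATE_VERTEX_PROGRAM or STATE_FRAGMENT_PROGRAM, depending on
+    * the program target).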
+    */
+   unsigned state_param_enum;
+
+
+   /**
+    * Input attributes bound to specific names
+    *
+    * This is only needed so that errors can be properly produced when
+    * multiple ATTRIB statements bind illegal combinations of vertex
+    * attributes.
+    */
+   GLbitfield64 InputsBound;
+
+   enum {
+      invalid_mode = 0,
+      ARB_vertex,
+      ARB_fragment
+   } mode;
+
+   struct {
+      unsigned PositionInvariant:1;
+      unsigned Fog:2;
+      unsigned PrecisionHint:2;
+      unsigned DrawBuffers:1;
+      unsigned Shadow:1;
+      unsigned TexRect:1;
+      unsigned TexArray:1;
+      unsigned NV_fragment:1;
+      unsigned OriginUpperLeft:1;
+      unsigned PixelCenterInteger:1;
+   } option;
+
+   struct {
+      unsigned UsesKill:1;
+      unsigned UsesDFdy:1;
+   } fragment;
+};
+
+#define OPTION_NONE        0
+#define OPTION_FOG_EXP     1
+#define OPTION_FOG_EXP2    2
+#define OPTION_FOG_LINEAR  3
+#define OPTION_NICEST      1
+#define OPTION_FASTEST     2
+
+typedef struct YYLTYPE {
+   int first_line;
+   int first_column;
+   int last_line;
+   int last_column;
+   int position;
+} YYLTYPE;
+
+#define YYLTYPE_IS_DECLARED 1
+#define YYLTYPE_IS_TRIVIAL 1
+
+
+extern GLboolean _mesa_parse_arb_program(struct gl_context *ctx, GLenum target,
+    const GLubyte *str, GLsizei len, struct asm_parser_state *state);
+
+
+
+/* From program_lexer.l. */
+extern void _mesa_program_lexer_dtor(void *scanner);
+
+extern void _mesa_program_lexer_ctor(void **scanner,
+    struct asm_parser_state *state, const char *string, size_t len);
+
+
+/**
+ * \name From program_parse_extra.c
+ */
+/*@{*/
+
+/**
+ * Parses and processes an option string to an ARB vertex program
+ *
+ * \return
+ * Non-zero on success, zero on failure.
+ */
+extern int _mesa_ARBvp_parse_option(struct asm_parser_state *state,
+    const char *option);
+
+/**
+ * Parses and processes an option string to an ARB fragment program
+ *
+ * \return
+ * Non-zero on success, zero on failure.
+ */
+extern int _mesa_ARBfp_parse_option(struct asm_parser_state *state,
+    const char *option);
+
+/**
+ * Parses and processes instruction suffixes
+ *
+ * Instruction suffixes, such as \c _SAT, are processed.  The relevant bits
+ * are set in \c inst.  If suffixes are encountered that are either not known
+ * or not supported by the modes and options set in \c state, zero will be
+ * returned.
+ *
+ * \return
+ * Non-zero on success, zero on failure.
+ */
+extern int _mesa_parse_instruction_suffix(const struct asm_parser_state *state,
+    const char *suffix, struct prog_instruction *inst);
+
+/**
+ * Parses a condition code name
+ *
+ * The condition code names (e.g., \c LT, \c GT, \c NE) were added to assembly
+ * shaders with the \c GL_NV_fragment_program_option extension.  This function
+ * converts a string representation into one of the \c COND_ macros.
+ *
+ * \return
+ * One of the \c COND_ macros defined in prog_instruction.h on success or zero
+ * on failure.
+ */
+extern int _mesa_parse_cc(const char *s);
+
+/*@}*/
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/programopt.c b/icd/intel/compiler/mesa-utils/src/mesa/program/programopt.c
new file mode 100644
index 0000000..92a8831
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/programopt.c
@@ -0,0 +1,682 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file  programopt.c 
+ * Vertex/Fragment program optimizations and transformations for program
+ * options, etc.
+ *
+ * \author Brian Paul
+ */
+
+
+#include "main/glheader.h"
+#include "main/context.h"
+#include "prog_parameter.h"
+#include "prog_statevars.h"
+#include "program.h"
+#include "programopt.h"
+#include "prog_instruction.h"
+
+
+/**
+ * This function inserts instructions that transform the incoming vertex
+ * position by the combined modelview/projection matrix into a vertex
+ * program.
+ * May be used to implement the position_invariant option.
+ */
+static void
+_mesa_insert_mvp_dp4_code(struct gl_context *ctx, struct gl_vertex_program *vprog)
+{
+   struct prog_instruction *newInst;
+   const GLuint origLen = vprog->Base.NumInstructions;
+   const GLuint newLen = origLen + 4;
+   GLuint i;
+
+   /*
+    * Setup state references for the modelview/projection matrix.
+    * XXX we should check if these state vars are already declared.
+    */
+   static const gl_state_index mvpState[4][STATE_LENGTH] = {
+      { STATE_MVP_MATRIX, 0, 0, 0, 0 },  /* state.matrix.mvp.row[0] */
+      { STATE_MVP_MATRIX, 0, 1, 1, 0 },  /* state.matrix.mvp.row[1] */
+      { STATE_MVP_MATRIX, 0, 2, 2, 0 },  /* state.matrix.mvp.row[2] */
+      { STATE_MVP_MATRIX, 0, 3, 3, 0 },  /* state.matrix.mvp.row[3] */
+   };
+   GLint mvpRef[4];
+
+   for (i = 0; i < 4; i++) {
+      mvpRef[i] = _mesa_add_state_reference(vprog->Base.Parameters,
+                                            mvpState[i]);
+   }
+
+   /* Alloc storage for new instructions */
+   newInst = _mesa_alloc_instructions(newLen);
+   if (!newInst) {
+      _mesa_error(ctx, GL_OUT_OF_MEMORY,
+                  "glProgramString(inserting position_invariant code)");
+      return;
+   }
+
+   /*
+    * Generated instructions:
+    * newInst[0] = DP4 result.position.x, mvp.row[0], vertex.position;
+    * newInst[1] = DP4 result.position.y, mvp.row[1], vertex.position;
+    * newInst[2] = DP4 result.position.z, mvp.row[2], vertex.position;
+    * newInst[3] = DP4 result.position.w, mvp.row[3], vertex.position;
+    */
+   _mesa_init_instructions(newInst, 4);
+   for (i = 0; i < 4; i++) {
+      newInst[i].Opcode = OPCODE_DP4;
+      newInst[i].DstReg.File = PROGRAM_OUTPUT;
+      newInst[i].DstReg.Index = VARYING_SLOT_POS;
+      newInst[i].DstReg.WriteMask = (WRITEMASK_X << i);
+      newInst[i].SrcReg[0].File = PROGRAM_STATE_VAR;
+      newInst[i].SrcReg[0].Index = mvpRef[i];
+      newInst[i].SrcReg[0].Swizzle = SWIZZLE_NOOP;
+      newInst[i].SrcReg[1].File = PROGRAM_INPUT;
+      newInst[i].SrcReg[1].Index = VERT_ATTRIB_POS;
+      newInst[i].SrcReg[1].Swizzle = SWIZZLE_NOOP;
+   }
+
+   /* Append original instructions after new instructions */
+   _mesa_copy_instructions (newInst + 4, vprog->Base.Instructions, origLen);
+
+   /* free old instructions */
+   _mesa_free_instructions(vprog->Base.Instructions, origLen);
+
+   /* install new instructions */
+   vprog->Base.Instructions = newInst;
+   vprog->Base.NumInstructions = newLen;
+   vprog->Base.InputsRead |= VERT_BIT_POS;
+   vprog->Base.OutputsWritten |= BITFIELD64_BIT(VARYING_SLOT_POS);
+}
+
+
+static void
+_mesa_insert_mvp_mad_code(struct gl_context *ctx, struct gl_vertex_program *vprog)
+{
+   struct prog_instruction *newInst;
+   const GLuint origLen = vprog->Base.NumInstructions;
+   const GLuint newLen = origLen + 4;
+   GLuint hposTemp;
+   GLuint i;
+
+   /*
+    * Setup state references for the modelview/projection matrix.
+    * XXX we should check if these state vars are already declared.
+    */
+   static const gl_state_index mvpState[4][STATE_LENGTH] = {
+      { STATE_MVP_MATRIX, 0, 0, 0, STATE_MATRIX_TRANSPOSE },
+      { STATE_MVP_MATRIX, 0, 1, 1, STATE_MATRIX_TRANSPOSE },
+      { STATE_MVP_MATRIX, 0, 2, 2, STATE_MATRIX_TRANSPOSE },
+      { STATE_MVP_MATRIX, 0, 3, 3, STATE_MATRIX_TRANSPOSE },
+   };
+   GLint mvpRef[4];
+
+   for (i = 0; i < 4; i++) {
+      mvpRef[i] = _mesa_add_state_reference(vprog->Base.Parameters,
+                                            mvpState[i]);
+   }
+
+   /* Alloc storage for new instructions */
+   newInst = _mesa_alloc_instructions(newLen);
+   if (!newInst) {
+      _mesa_error(ctx, GL_OUT_OF_MEMORY,
+                  "glProgramString(inserting position_invariant code)");
+      return;
+   }
+
+   /* TEMP hposTemp; */
+   hposTemp = vprog->Base.NumTemporaries++;
+
+   /*
+    * Generated instructions:
+    *    emit_op2(p, OPCODE_MUL, tmp, 0, swizzle1(src,X), mat[0]);
+    *    emit_op3(p, OPCODE_MAD, tmp, 0, swizzle1(src,Y), mat[1], tmp);
+    *    emit_op3(p, OPCODE_MAD, tmp, 0, swizzle1(src,Z), mat[2], tmp);
+    *    emit_op3(p, OPCODE_MAD, dest, 0, swizzle1(src,W), mat[3], tmp);
+    */
+   _mesa_init_instructions(newInst, 4);
+
+   newInst[0].Opcode = OPCODE_MUL;
+   newInst[0].DstReg.File = PROGRAM_TEMPORARY;
+   newInst[0].DstReg.Index = hposTemp;
+   newInst[0].DstReg.WriteMask = WRITEMASK_XYZW;
+   newInst[0].SrcReg[0].File = PROGRAM_INPUT;
+   newInst[0].SrcReg[0].Index = VERT_ATTRIB_POS;
+   newInst[0].SrcReg[0].Swizzle = SWIZZLE_XXXX;
+   newInst[0].SrcReg[1].File = PROGRAM_STATE_VAR;
+   newInst[0].SrcReg[1].Index = mvpRef[0];
+   newInst[0].SrcReg[1].Swizzle = SWIZZLE_NOOP;
+
+   for (i = 1; i <= 2; i++) {
+      newInst[i].Opcode = OPCODE_MAD;
+      newInst[i].DstReg.File = PROGRAM_TEMPORARY;
+      newInst[i].DstReg.Index = hposTemp;
+      newInst[i].DstReg.WriteMask = WRITEMASK_XYZW;
+      newInst[i].SrcReg[0].File = PROGRAM_INPUT;
+      newInst[i].SrcReg[0].Index = VERT_ATTRIB_POS;
+      newInst[i].SrcReg[0].Swizzle = MAKE_SWIZZLE4(i,i,i,i);
+      newInst[i].SrcReg[1].File = PROGRAM_STATE_VAR;
+      newInst[i].SrcReg[1].Index = mvpRef[i];
+      newInst[i].SrcReg[1].Swizzle = SWIZZLE_NOOP;
+      newInst[i].SrcReg[2].File = PROGRAM_TEMPORARY;
+      newInst[i].SrcReg[2].Index = hposTemp;
+      newInst[i].SrcReg[2].Swizzle = SWIZZLE_NOOP;
+   }
+
+   newInst[3].Opcode = OPCODE_MAD;
+   newInst[3].DstReg.File = PROGRAM_OUTPUT;
+   newInst[3].DstReg.Index = VARYING_SLOT_POS;
+   newInst[3].DstReg.WriteMask = WRITEMASK_XYZW;
+   newInst[3].SrcReg[0].File = PROGRAM_INPUT;
+   newInst[3].SrcReg[0].Index = VERT_ATTRIB_POS;
+   newInst[3].SrcReg[0].Swizzle = SWIZZLE_WWWW;
+   newInst[3].SrcReg[1].File = PROGRAM_STATE_VAR;
+   newInst[3].SrcReg[1].Index = mvpRef[3];
+   newInst[3].SrcReg[1].Swizzle = SWIZZLE_NOOP;
+   newInst[3].SrcReg[2].File = PROGRAM_TEMPORARY;
+   newInst[3].SrcReg[2].Index = hposTemp;
+   newInst[3].SrcReg[2].Swizzle = SWIZZLE_NOOP;
+
+
+   /* Append original instructions after new instructions */
+   _mesa_copy_instructions (newInst + 4, vprog->Base.Instructions, origLen);
+
+   /* free old instructions */
+   _mesa_free_instructions(vprog->Base.Instructions, origLen);
+
+   /* install new instructions */
+   vprog->Base.Instructions = newInst;
+   vprog->Base.NumInstructions = newLen;
+   vprog->Base.InputsRead |= VERT_BIT_POS;
+   vprog->Base.OutputsWritten |= BITFIELD64_BIT(VARYING_SLOT_POS);
+}
+
+
+void
+_mesa_insert_mvp_code(struct gl_context *ctx, struct gl_vertex_program *vprog)
+{
+   if (ctx->ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS)
+      _mesa_insert_mvp_dp4_code( ctx, vprog );
+   else
+      _mesa_insert_mvp_mad_code( ctx, vprog );
+}
+
+
+/**
+ * Append instructions to implement fog
+ *
+ * The \c fragment.fogcoord input is used to compute the fog blend factor.
+ *
+ * \param ctx      The GL context
+ * \param fprog    Fragment program that fog instructions will be appended to.
+ * \param fog_mode Fog mode.  One of \c GL_EXP, \c GL_EXP2, or \c GL_LINEAR.
+ * \param saturate True if writes to color outputs should be clamped to [0, 1]
+ *
+ * \note
+ * This function sets \c VARYING_BIT_FOGC in \c fprog->Base.InputsRead.
+ *
+ * \todo With a little work, this function could be adapted to add fog code
+ * to vertex programs too.
+ */
+void
+_mesa_append_fog_code(struct gl_context *ctx,
+		      struct gl_fragment_program *fprog, GLenum fog_mode,
+		      GLboolean saturate)
+{
+   static const gl_state_index fogPStateOpt[STATE_LENGTH]
+      = { STATE_INTERNAL, STATE_FOG_PARAMS_OPTIMIZED, 0, 0, 0 };
+   static const gl_state_index fogColorState[STATE_LENGTH]
+      = { STATE_FOG_COLOR, 0, 0, 0, 0};
+   struct prog_instruction *newInst, *inst;
+   const GLuint origLen = fprog->Base.NumInstructions;
+   const GLuint newLen = origLen + 5;
+   GLuint i;
+   GLint fogPRefOpt, fogColorRef; /* state references */
+   GLuint colorTemp, fogFactorTemp; /* temporary registers */
+
+   if (fog_mode == GL_NONE) {
+      _mesa_problem(ctx, "_mesa_append_fog_code() called for fragment program"
+                    " with fog_mode == GL_NONE");
+      return;
+   }
+
+   if (!(fprog->Base.OutputsWritten & (1 << FRAG_RESULT_COLOR))) {
+      /* program doesn't output color, so nothing to do */
+      return;
+   }
+
+   /* Alloc storage for new instructions */
+   newInst = _mesa_alloc_instructions(newLen);
+   if (!newInst) {
+      _mesa_error(ctx, GL_OUT_OF_MEMORY,
+                  "glProgramString(inserting fog_option code)");
+      return;
+   }
+
+   /* Copy orig instructions into new instruction buffer */
+   _mesa_copy_instructions(newInst, fprog->Base.Instructions, origLen);
+
+   /* PARAM fogParamsRefOpt = internal optimized fog params; */
+   fogPRefOpt
+      = _mesa_add_state_reference(fprog->Base.Parameters, fogPStateOpt);
+   /* PARAM fogColorRef = state.fog.color; */
+   fogColorRef
+      = _mesa_add_state_reference(fprog->Base.Parameters, fogColorState);
+
+   /* TEMP colorTemp; */
+   colorTemp = fprog->Base.NumTemporaries++;
+   /* TEMP fogFactorTemp; */
+   fogFactorTemp = fprog->Base.NumTemporaries++;
+
+   /* Scan program to find where result.color is written */
+   inst = newInst;
+   for (i = 0; i < fprog->Base.NumInstructions; i++) {
+      if (inst->Opcode == OPCODE_END)
+         break;
+      if (inst->DstReg.File == PROGRAM_OUTPUT &&
+          inst->DstReg.Index == FRAG_RESULT_COLOR) {
+         /* change the instruction to write to colorTemp w/ clamping */
+         inst->DstReg.File = PROGRAM_TEMPORARY;
+         inst->DstReg.Index = colorTemp;
+         inst->SaturateMode = saturate;
+         /* don't break (may be several writes to result.color) */
+      }
+      inst++;
+   }
+   assert(inst->Opcode == OPCODE_END); /* we'll overwrite this inst */
+
+   _mesa_init_instructions(inst, 5);
+
+   /* emit instructions to compute fog blending factor */
+   /* this is always clamped to [0, 1] regardless of fragment clamping */
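+   /* With the optimized fog params laid out as {-1/(end-start),
+    * end/(end-start), density/ln(2), density/sqrt(ln(2))}, the factor is:
+    *    GL_LINEAR: f = fogcoord * x + y            (one MAD)
+    *    GL_EXP:    f = 2^-(z * fogcoord)           (MUL, EX2)
+    *    GL_EXP2:   f = 2^-((w * fogcoord)^2)       (MUL, MUL, EX2)
+    */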
+   if (fog_mode == GL_LINEAR) {
+      /* MAD fogFactorTemp.x, fragment.fogcoord.x, fogPRefOpt.x, fogPRefOpt.y; */
+      inst->Opcode = OPCODE_MAD;
+      inst->DstReg.File = PROGRAM_TEMPORARY;
+      inst->DstReg.Index = fogFactorTemp;
+      inst->DstReg.WriteMask = WRITEMASK_X;
+      inst->SrcReg[0].File = PROGRAM_INPUT;
+      inst->SrcReg[0].Index = VARYING_SLOT_FOGC;
+      inst->SrcReg[0].Swizzle = SWIZZLE_XXXX;
+      inst->SrcReg[1].File = PROGRAM_STATE_VAR;
+      inst->SrcReg[1].Index = fogPRefOpt;
+      inst->SrcReg[1].Swizzle = SWIZZLE_XXXX;
+      inst->SrcReg[2].File = PROGRAM_STATE_VAR;
+      inst->SrcReg[2].Index = fogPRefOpt;
+      inst->SrcReg[2].Swizzle = SWIZZLE_YYYY;
+      inst->SaturateMode = SATURATE_ZERO_ONE;
+      inst++;
+   }
+   else {
+      ASSERT(fog_mode == GL_EXP || fog_mode == GL_EXP2);
+      /* fogPRefOpt.z = d/ln(2), fogPRefOpt.w = d/sqrt(ln(2)) */
+      /* EXP: MUL fogFactorTemp.x, fogPRefOpt.z, fragment.fogcoord.x; */
+      /* EXP2: MUL fogFactorTemp.x, fogPRefOpt.w, fragment.fogcoord.x; */
+      inst->Opcode = OPCODE_MUL;
+      inst->DstReg.File = PROGRAM_TEMPORARY;
+      inst->DstReg.Index = fogFactorTemp;
+      inst->DstReg.WriteMask = WRITEMASK_X;
+      inst->SrcReg[0].File = PROGRAM_STATE_VAR;
+      inst->SrcReg[0].Index = fogPRefOpt;
+      inst->SrcReg[0].Swizzle
+         = (fog_mode == GL_EXP) ? SWIZZLE_ZZZZ : SWIZZLE_WWWW;
+      inst->SrcReg[1].File = PROGRAM_INPUT;
+      inst->SrcReg[1].Index = VARYING_SLOT_FOGC;
+      inst->SrcReg[1].Swizzle = SWIZZLE_XXXX;
+      inst++;
+      if (fog_mode == GL_EXP2) {
+         /* MUL fogFactorTemp.x, fogFactorTemp.x, fogFactorTemp.x; */
+         inst->Opcode = OPCODE_MUL;
+         inst->DstReg.File = PROGRAM_TEMPORARY;
+         inst->DstReg.Index = fogFactorTemp;
+         inst->DstReg.WriteMask = WRITEMASK_X;
+         inst->SrcReg[0].File = PROGRAM_TEMPORARY;
+         inst->SrcReg[0].Index = fogFactorTemp;
+         inst->SrcReg[0].Swizzle = SWIZZLE_XXXX;
+         inst->SrcReg[1].File = PROGRAM_TEMPORARY;
+         inst->SrcReg[1].Index = fogFactorTemp;
+         inst->SrcReg[1].Swizzle = SWIZZLE_XXXX;
+         inst++;
+      }
+      /* EX2_SAT fogFactorTemp.x, -fogFactorTemp.x; */
+      inst->Opcode = OPCODE_EX2;
+      inst->DstReg.File = PROGRAM_TEMPORARY;
+      inst->DstReg.Index = fogFactorTemp;
+      inst->DstReg.WriteMask = WRITEMASK_X;
+      inst->SrcReg[0].File = PROGRAM_TEMPORARY;
+      inst->SrcReg[0].Index = fogFactorTemp;
+      inst->SrcReg[0].Negate = NEGATE_XYZW;
+      inst->SrcReg[0].Swizzle = SWIZZLE_XXXX;
+      inst->SaturateMode = SATURATE_ZERO_ONE;
+      inst++;
+   }
+   /* LRP result.color.xyz, fogFactorTemp.xxxx, colorTemp, fogColorRef; */
+   inst->Opcode = OPCODE_LRP;
+   inst->DstReg.File = PROGRAM_OUTPUT;
+   inst->DstReg.Index = FRAG_RESULT_COLOR;
+   inst->DstReg.WriteMask = WRITEMASK_XYZ;
+   inst->SrcReg[0].File = PROGRAM_TEMPORARY;
+   inst->SrcReg[0].Index = fogFactorTemp;
+   inst->SrcReg[0].Swizzle = SWIZZLE_XXXX;
+   inst->SrcReg[1].File = PROGRAM_TEMPORARY;
+   inst->SrcReg[1].Index = colorTemp;
+   inst->SrcReg[1].Swizzle = SWIZZLE_NOOP;
+   inst->SrcReg[2].File = PROGRAM_STATE_VAR;
+   inst->SrcReg[2].Index = fogColorRef;
+   inst->SrcReg[2].Swizzle = SWIZZLE_NOOP;
+   inst++;
+   /* MOV result.color.w, colorTemp.x;  # copy alpha */
+   inst->Opcode = OPCODE_MOV;
+   inst->DstReg.File = PROGRAM_OUTPUT;
+   inst->DstReg.Index = FRAG_RESULT_COLOR;
+   inst->DstReg.WriteMask = WRITEMASK_W;
+   inst->SrcReg[0].File = PROGRAM_TEMPORARY;
+   inst->SrcReg[0].Index = colorTemp;
+   inst->SrcReg[0].Swizzle = SWIZZLE_NOOP;
+   inst++;
+   /* END; */
+   inst->Opcode = OPCODE_END;
+   inst++;
+
+   /* free old instructions */
+   _mesa_free_instructions(fprog->Base.Instructions, origLen);
+
+   /* install new instructions */
+   fprog->Base.Instructions = newInst;
+   fprog->Base.NumInstructions = inst - newInst;
+   fprog->Base.InputsRead |= VARYING_BIT_FOGC;
+   assert(fprog->Base.OutputsWritten & (1 << FRAG_RESULT_COLOR));
+}
+
+
+
+static GLboolean
+is_texture_instruction(const struct prog_instruction *inst)
+{
+   switch (inst->Opcode) {
+   case OPCODE_TEX:
+   case OPCODE_TXB:
+   case OPCODE_TXD:
+   case OPCODE_TXL:
+   case OPCODE_TXP:
+   case OPCODE_TXP_NV:
+      return GL_TRUE;
+   default:
+      return GL_FALSE;
+   }
+}
+      
+
+/**
+ * Count the number of texture indirections in the given program.
+ * The program's NumTexIndirections field will be updated.
+ * See the GL_ARB_fragment_program spec (issue 24) for details.
+ * XXX we count texture indirections in texenvprogram.c (maybe use this code
+ * instead and elsewhere).
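+ *
+ * Roughly, a new indirection begins whenever a texture instruction sources a
+ * temporary written earlier in the current phase, or overwrites a temporary
+ * that ALU instructions in the current phase have already referenced.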
+ */
+void
+_mesa_count_texture_indirections(struct gl_program *prog)
+{
+   GLuint indirections = 1;
+   GLbitfield tempsOutput = 0x0;
+   GLbitfield aluTemps = 0x0;
+   GLuint i;
+
+   for (i = 0; i < prog->NumInstructions; i++) {
+      const struct prog_instruction *inst = prog->Instructions + i;
+
+      if (is_texture_instruction(inst)) {
+         if (((inst->SrcReg[0].File == PROGRAM_TEMPORARY) && 
+              (tempsOutput & (1 << inst->SrcReg[0].Index))) ||
+             ((inst->Opcode != OPCODE_KIL) &&
+              (inst->DstReg.File == PROGRAM_TEMPORARY) && 
+              (aluTemps & (1 << inst->DstReg.Index)))) 
+            {
+               indirections++;
+               tempsOutput = 0x0;
+               aluTemps = 0x0;
+            }
+      }
+      else {
+         GLuint j;
+         for (j = 0; j < 3; j++) {
+            if (inst->SrcReg[j].File == PROGRAM_TEMPORARY)
+               aluTemps |= (1 << inst->SrcReg[j].Index);
+         }
+         if (inst->DstReg.File == PROGRAM_TEMPORARY)
+            aluTemps |= (1 << inst->DstReg.Index);
+      }
+
+      if ((inst->Opcode != OPCODE_KIL) && (inst->DstReg.File == PROGRAM_TEMPORARY))
+         tempsOutput |= (1 << inst->DstReg.Index);
+   }
+
+   prog->NumTexIndirections = indirections;
+}
+
+
+/**
+ * Count number of texture instructions in given program and update the
+ * program's NumTexInstructions field.
+ */
+void
+_mesa_count_texture_instructions(struct gl_program *prog)
+{
+   GLuint i;
+   prog->NumTexInstructions = 0;
+   for (i = 0; i < prog->NumInstructions; i++) {
+      prog->NumTexInstructions += is_texture_instruction(prog->Instructions + i);
+   }
+}
+
+
+/**
+ * Scan/rewrite program to remove reads of custom (output) registers.
+ * The passed type has to be PROGRAM_OUTPUT.
+ * On some hardware, trying to read an output register causes trouble.
+ * So, rewrite the program to use a temporary register in this case.
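+ * E.g. reads of result.color become reads of a temporary, writes to
+ * result.color are redirected to that temporary, and a MOV from the temporary
+ * back to result.color is inserted just before the END instruction.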
+ */
+void
+_mesa_remove_output_reads(struct gl_program *prog, gl_register_file type)
+{
+   GLuint i;
+   GLint outputMap[VARYING_SLOT_MAX];
+   GLuint numVaryingReads = 0;
+   GLboolean usedTemps[MAX_PROGRAM_TEMPS];
+   GLuint firstTemp = 0;
+
+   _mesa_find_used_registers(prog, PROGRAM_TEMPORARY,
+                             usedTemps, MAX_PROGRAM_TEMPS);
+
+   assert(type == PROGRAM_OUTPUT);
+
+   for (i = 0; i < VARYING_SLOT_MAX; i++)
+      outputMap[i] = -1;
+
+   /* look for instructions which read from varying vars */
+   for (i = 0; i < prog->NumInstructions; i++) {
+      struct prog_instruction *inst = prog->Instructions + i;
+      const GLuint numSrc = _mesa_num_inst_src_regs(inst->Opcode);
+      GLuint j;
+      for (j = 0; j < numSrc; j++) {
+         if (inst->SrcReg[j].File == type) {
+            /* replace the read with a temp reg */
+            const GLuint var = inst->SrcReg[j].Index;
+            if (outputMap[var] == -1) {
+               numVaryingReads++;
+               outputMap[var] = _mesa_find_free_register(usedTemps,
+                                                         MAX_PROGRAM_TEMPS,
+                                                         firstTemp);
+               firstTemp = outputMap[var] + 1;
+            }
+            inst->SrcReg[j].File = PROGRAM_TEMPORARY;
+            inst->SrcReg[j].Index = outputMap[var];
+         }
+      }
+   }
+
+   if (numVaryingReads == 0)
+      return; /* nothing to be done */
+
+   /* look for instructions which write to the varying vars identified above */
+   for (i = 0; i < prog->NumInstructions; i++) {
+      struct prog_instruction *inst = prog->Instructions + i;
+      if (inst->DstReg.File == type &&
+          outputMap[inst->DstReg.Index] >= 0) {
+         /* change inst to write to the temp reg, instead of the varying */
+         inst->DstReg.File = PROGRAM_TEMPORARY;
+         inst->DstReg.Index = outputMap[inst->DstReg.Index];
+      }
+   }
+
+   /* insert new instructions to copy the temp vars to the varying vars */
+   {
+      struct prog_instruction *inst;
+      GLint endPos, var;
+
+      /* Look for END instruction and insert the new varying writes */
+      endPos = -1;
+      for (i = 0; i < prog->NumInstructions; i++) {
+         struct prog_instruction *inst = prog->Instructions + i;
+         if (inst->Opcode == OPCODE_END) {
+            endPos = i;
+            _mesa_insert_instructions(prog, i, numVaryingReads);
+            break;
+         }
+      }
+
+      assert(endPos >= 0);
+
+      /* insert new MOV instructions here */
+      inst = prog->Instructions + endPos;
+      for (var = 0; var < VARYING_SLOT_MAX; var++) {
+         if (outputMap[var] >= 0) {
+            /* MOV VAR[var], TEMP[tmp]; */
+            inst->Opcode = OPCODE_MOV;
+            inst->DstReg.File = type;
+            inst->DstReg.Index = var;
+            inst->SrcReg[0].File = PROGRAM_TEMPORARY;
+            inst->SrcReg[0].Index = outputMap[var];
+            inst++;
+         }
+      }
+   }
+}
+
+
+/**
+ * Make the given fragment program into a "no-op" shader.
+ * Actually, just copy the incoming fragment color (or texcoord)
+ * to the output color.
+ * This is for debug/test purposes.
+ */
+void
+_mesa_nop_fragment_program(struct gl_context *ctx, struct gl_fragment_program *prog)
+{
+   struct prog_instruction *inst;
+   GLuint inputAttr;
+
+   inst = _mesa_alloc_instructions(2);
+   if (!inst) {
+      _mesa_error(ctx, GL_OUT_OF_MEMORY, "_mesa_nop_fragment_program");
+      return;
+   }
+
+   _mesa_init_instructions(inst, 2);
+
+   inst[0].Opcode = OPCODE_MOV;
+   inst[0].DstReg.File = PROGRAM_OUTPUT;
+   inst[0].DstReg.Index = FRAG_RESULT_COLOR;
+   inst[0].SrcReg[0].File = PROGRAM_INPUT;
+   if (prog->Base.InputsRead & VARYING_BIT_COL0)
+      inputAttr = VARYING_SLOT_COL0;
+   else
+      inputAttr = VARYING_SLOT_TEX0;
+   inst[0].SrcReg[0].Index = inputAttr;
+
+   inst[1].Opcode = OPCODE_END;
+
+   _mesa_free_instructions(prog->Base.Instructions,
+                           prog->Base.NumInstructions);
+
+   prog->Base.Instructions = inst;
+   prog->Base.NumInstructions = 2;
+   prog->Base.InputsRead = BITFIELD64_BIT(inputAttr);
+   prog->Base.OutputsWritten = BITFIELD64_BIT(FRAG_RESULT_COLOR);
+}
+
+
+/**
+ * \sa _mesa_nop_fragment_program
+ * Replace the given vertex program with a "no-op" program that just
+ * transforms vertex position and emits color.
+ */
+void
+_mesa_nop_vertex_program(struct gl_context *ctx, struct gl_vertex_program *prog)
+{
+   struct prog_instruction *inst;
+   GLuint inputAttr;
+
+   /*
+    * Start with a simple vertex program that emits color.
+    */
+   inst = _mesa_alloc_instructions(2);
+   if (!inst) {
+      _mesa_error(ctx, GL_OUT_OF_MEMORY, "_mesa_nop_vertex_program");
+      return;
+   }
+
+   _mesa_init_instructions(inst, 2);
+
+   inst[0].Opcode = OPCODE_MOV;
+   inst[0].DstReg.File = PROGRAM_OUTPUT;
+   inst[0].DstReg.Index = VARYING_SLOT_COL0;
+   inst[0].SrcReg[0].File = PROGRAM_INPUT;
+   if (prog->Base.InputsRead & VERT_BIT_COLOR0)
+      inputAttr = VERT_ATTRIB_COLOR0;
+   else
+      inputAttr = VERT_ATTRIB_TEX0;
+   inst[0].SrcReg[0].Index = inputAttr;
+
+   inst[1].Opcode = OPCODE_END;
+
+   _mesa_free_instructions(prog->Base.Instructions,
+                           prog->Base.NumInstructions);
+
+   prog->Base.Instructions = inst;
+   prog->Base.NumInstructions = 2;
+   prog->Base.InputsRead = BITFIELD64_BIT(inputAttr);
+   prog->Base.OutputsWritten = BITFIELD64_BIT(VARYING_SLOT_COL0);
+
+   /*
+    * Now insert code to do standard modelview/projection transformation.
+    */
+   _mesa_insert_mvp_code(ctx, prog);
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/programopt.h b/icd/intel/compiler/mesa-utils/src/mesa/program/programopt.h
new file mode 100644
index 0000000..f22109f
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/programopt.h
@@ -0,0 +1,55 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef PROGRAMOPT_H
+#define PROGRAMOPT_H 1
+
+#include "main/mtypes.h"
+
+extern void
+_mesa_insert_mvp_code(struct gl_context *ctx, struct gl_vertex_program *vprog);
+
+extern void
+_mesa_append_fog_code(struct gl_context *ctx,
+		      struct gl_fragment_program *fprog, GLenum fog_mode,
+		      GLboolean saturate);
+
+extern void
+_mesa_count_texture_indirections(struct gl_program *prog);
+
+extern void
+_mesa_count_texture_instructions(struct gl_program *prog);
+
+extern void
+_mesa_remove_output_reads(struct gl_program *prog, gl_register_file type);
+
+extern void
+_mesa_nop_fragment_program(struct gl_context *ctx, struct gl_fragment_program *prog);
+
+extern void
+_mesa_nop_vertex_program(struct gl_context *ctx, struct gl_vertex_program *prog);
+
+
+#endif /* PROGRAMOPT_H */
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/register_allocate.c b/icd/intel/compiler/mesa-utils/src/mesa/program/register_allocate.c
new file mode 100644
index 0000000..6fac690
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/register_allocate.c
@@ -0,0 +1,676 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+/** @file register_allocate.c
+ *
+ * Graph-coloring register allocator.
+ *
+ * The basic idea of graph coloring is to make a node in a graph for
+ * every thing that needs a register (color) number assigned, and make
+ * edges in the graph between nodes that interfere (can't be allocated
+ * to the same register at the same time).
+ *
+ * During the "simplify" process, any node with fewer edges than
+ * there are registers can be assigned a register regardless of what
+ * its neighbors choose, so that node is pushed on a stack and
+ * removed (with its edges) from the graph.
+ * That likely causes other nodes to become trivially colorable as well.
+ *
+ * Then during the "select" process, nodes are popped off of that
+ * stack, their edges restored, and assigned a color different from
+ * their neighbors.  Because they were pushed on the stack only when
+ * they were trivially colorable, any color chosen won't interfere
+ * with the nodes to be popped later.
+ *
+ * The downside to most graph coloring is that real hardware often has
+ * limitations, like registers that need to be allocated to a node in
+ * pairs, or aligned on some boundary.  This implementation follows
+ * the paper "Retargetable Graph-Coloring Register Allocation for
+ * Irregular Architectures" by Johan Runeson and Sven-Olof Nyström.
+ *
+ * In this system, there are register classes each containing various
+ * registers, and registers may interfere with other registers.  For
+ * example, one might have a class of base registers, and a class of
+ * aligned register pairs that would each interfere with their pair of
+ * the base registers.  Each node has a register class it needs to be
+ * assigned to.  Define p(B) to be the size of register class B, and
+ * q(B,C) to be the number of registers in B that the worst choice
+ * register in C could conflict with.  Then, this system replaces the
+ * basic graph coloring test of "fewer edges from this node than there
+ * are registers" with "For this node of class B, the sum of q(B,C)
+ * for each neighbor node of class C is less than p(B)".
+ *
+ * A nice feature of the pq test is that q(B,C) can be computed once
+ * up front and stored in a 2-dimensional array, so that the cost of
+ * coloring a node is constant with the number of registers.  We do
+ * this during ra_set_finalize().
+ */
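+
+/* A small worked example of the pq test: say class B is 8 base registers
+ * r0..r7 and class C is 4 aligned pairs, each conflicting with the two
+ * base registers it spans.  Then p(B) = 8 and q(B,C) = 2 (any pair
+ * conflicts with exactly 2 registers of B), while q(C,B) = 1.  A node of
+ * class B with three neighbors of class C passes the test, since
+ * 3 * q(B,C) = 6 < p(B) = 8.  (Illustrative numbers only.)
+ */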
+
+#include <stdbool.h>
+#include <ralloc.h>
+
+#include "main/imports.h"
+#include "main/macros.h"
+#include "main/mtypes.h"
+#include "main/bitset.h"
+#include "register_allocate.h"
+
+#define NO_REG ~0
+
+struct ra_reg {
+   BITSET_WORD *conflicts;
+   unsigned int *conflict_list;
+   unsigned int conflict_list_size;
+   unsigned int num_conflicts;
+};
+
+struct ra_regs {
+   struct ra_reg *regs;
+   unsigned int count;
+
+   struct ra_class **classes;
+   unsigned int class_count;
+
+   bool round_robin;
+};
+
+struct ra_class {
+   /**
+    * Bitset indicating which registers belong to this class.
+    *
+    * (If bit N is set, then register N belongs to this class.)
+    */
+   BITSET_WORD *regs;
+
+   /**
+    * p(B) in Runeson/Nyström paper.
+    *
+    * This is "how many regs are in the set."
+    */
+   unsigned int p;
+
+   /**
+    * q(B,C) (indexed by C, B is this register class) in
+    * Runeson/Nyström paper.  This is "how many registers of B could
+    * the worst choice register from C conflict with".
+    */
+   unsigned int *q;
+};
+
+struct ra_node {
+   /** @{
+    *
+    * List of which nodes this node interferes with.  This should be
+    * symmetric with the other node.
+    */
+   BITSET_WORD *adjacency;
+   unsigned int *adjacency_list;
+   unsigned int adjacency_list_size;
+   unsigned int adjacency_count;
+   /** @} */
+
+   unsigned int class;
+
+   /* Register, if assigned, or NO_REG. */
+   unsigned int reg;
+
+   /**
+    * Set when the node is in the trivially colorable stack.  When
+    * set, the adjacency to this node is ignored, to implement the
+    * "remove the edge from the graph" in simplification without
+    * having to actually modify the adjacency_list.
+    */
+   bool in_stack;
+
+   /* For an implementation that needs register spilling, this is the
+    * approximate cost of spilling this node.
+    */
+   float spill_cost;
+};
+
+struct ra_graph {
+   struct ra_regs *regs;
+   /**
+    * the variables that need register allocation.
+    */
+   struct ra_node *nodes;
+   unsigned int count; /**< count of nodes. */
+
+   unsigned int *stack;
+   unsigned int stack_count;
+
+   /**
+    * Tracks the start of the set of optimistically-colored registers in the
+    * stack.
+    *
+    * Along with any registers not in the stack (if one called ra_simplify()
+    * and didn't do optimistic coloring), these need to be considered for
+    * spilling.
+    */
+   unsigned int stack_optimistic_start;
+};
+
+/**
+ * Creates a set of registers for the allocator.
+ *
+ * mem_ctx is a ralloc context for the allocator.  The reg set may be freed
+ * using ralloc_free().
+ */
+struct ra_regs *
+ra_alloc_reg_set(void *mem_ctx, unsigned int count)
+{
+   unsigned int i;
+   struct ra_regs *regs;
+
+   regs = rzalloc(mem_ctx, struct ra_regs);
+   regs->count = count;
+   regs->regs = rzalloc_array(regs, struct ra_reg, count);
+
+   for (i = 0; i < count; i++) {
+      regs->regs[i].conflicts = rzalloc_array(regs->regs, BITSET_WORD,
+                                              BITSET_WORDS(count));
+      BITSET_SET(regs->regs[i].conflicts, i);
+
+      regs->regs[i].conflict_list = ralloc_array(regs->regs, unsigned int, 4);
+      regs->regs[i].conflict_list_size = 4;
+      regs->regs[i].conflict_list[0] = i;
+      regs->regs[i].num_conflicts = 1;
+   }
+
+   return regs;
+}
+
+/**
+ * The register allocator by default prefers to allocate low register numbers,
+ * since it was written for hardware (gen4/5 Intel) that is limited in its
+ * multithreadedness by the number of registers used in a given shader.
+ *
+ * However, for hardware without that restriction, densely packed register
+ * allocation can put serious constraints on instruction scheduling.  This
+ * function tells the allocator to rotate around the registers if possible as
+ * it allocates the nodes.
+ */
+void
+ra_set_allocate_round_robin(struct ra_regs *regs)
+{
+   regs->round_robin = true;
+}
+
+static void
+ra_add_conflict_list(struct ra_regs *regs, unsigned int r1, unsigned int r2)
+{
+   struct ra_reg *reg1 = &regs->regs[r1];
+
+   if (reg1->conflict_list_size == reg1->num_conflicts) {
+      reg1->conflict_list_size *= 2;
+      reg1->conflict_list = reralloc(regs->regs, reg1->conflict_list,
+				     unsigned int, reg1->conflict_list_size);
+   }
+   reg1->conflict_list[reg1->num_conflicts++] = r2;
+   BITSET_SET(reg1->conflicts, r2);
+}
+
+void
+ra_add_reg_conflict(struct ra_regs *regs, unsigned int r1, unsigned int r2)
+{
+   if (!BITSET_TEST(regs->regs[r1].conflicts, r2)) {
+      ra_add_conflict_list(regs, r1, r2);
+      ra_add_conflict_list(regs, r2, r1);
+   }
+}
+
+/**
+ * Adds a conflict between base_reg and reg, and also between reg and
+ * anything that base_reg conflicts with.
+ *
+ * This can simplify code for setting up multiple register classes
+ * which are aggregates of some base hardware registers, compared to
+ * explicitly using ra_add_reg_conflict.
+ */
+void
+ra_add_transitive_reg_conflict(struct ra_regs *regs,
+			       unsigned int base_reg, unsigned int reg)
+{
+   int i;
+
+   ra_add_reg_conflict(regs, reg, base_reg);
+
+   for (i = 0; i < regs->regs[base_reg].num_conflicts; i++) {
+      ra_add_reg_conflict(regs, reg, regs->regs[base_reg].conflict_list[i]);
+   }
+}
+
+unsigned int
+ra_alloc_reg_class(struct ra_regs *regs)
+{
+   struct ra_class *class;
+
+   regs->classes = reralloc(regs->regs, regs->classes, struct ra_class *,
+			    regs->class_count + 1);
+
+   class = rzalloc(regs, struct ra_class);
+   regs->classes[regs->class_count] = class;
+
+   class->regs = rzalloc_array(class, BITSET_WORD, BITSET_WORDS(regs->count));
+
+   return regs->class_count++;
+}
+
+void
+ra_class_add_reg(struct ra_regs *regs, unsigned int c, unsigned int r)
+{
+   struct ra_class *class = regs->classes[c];
+
+   BITSET_SET(class->regs, r);
+   class->p++;
+}
+
+/**
+ * Returns true if the register belongs to the given class.
+ */
+static bool
+reg_belongs_to_class(unsigned int r, struct ra_class *c)
+{
+   return BITSET_TEST(c->regs, r);
+}
+
+/**
+ * Must be called after all conflicts and register classes have been
+ * set up and before the register set is used for allocation.
+ * To avoid costly q value computation, use the q_values parameter
+ * to pass precomputed q values to this function.
+ */
+void
+ra_set_finalize(struct ra_regs *regs, unsigned int **q_values)
+{
+   unsigned int b, c;
+
+   for (b = 0; b < regs->class_count; b++) {
+      regs->classes[b]->q = ralloc_array(regs, unsigned int, regs->class_count);
+   }
+
+   if (q_values) {
+      for (b = 0; b < regs->class_count; b++) {
+         for (c = 0; c < regs->class_count; c++) {
+            regs->classes[b]->q[c] = q_values[b][c];
+	 }
+      }
+      return;
+   }
+
+   /* Compute, for each class B and C, how many regs of B an
+    * allocation to C could conflict with.
+    */
+   for (b = 0; b < regs->class_count; b++) {
+      for (c = 0; c < regs->class_count; c++) {
+	 unsigned int rc;
+	 int max_conflicts = 0;
+
+	 for (rc = 0; rc < regs->count; rc++) {
+	    int conflicts = 0;
+	    int i;
+
+            if (!reg_belongs_to_class(rc, regs->classes[c]))
+	       continue;
+
+	    for (i = 0; i < regs->regs[rc].num_conflicts; i++) {
+	       unsigned int rb = regs->regs[rc].conflict_list[i];
+	       if (BITSET_TEST(regs->classes[b]->regs, rb))
+		  conflicts++;
+	    }
+	    max_conflicts = MAX2(max_conflicts, conflicts);
+	 }
+	 regs->classes[b]->q[c] = max_conflicts;
+      }
+   }
+}
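+
+/* A minimal usage sketch of the set-construction API above, building the
+ * base-register/aligned-pair layout from the worked example near the top
+ * of this file.  The register and class numbering is illustrative, not
+ * any particular backend's.
+ */
+#if 0
+static struct ra_regs *
+build_example_set(void *mem_ctx, unsigned int *base_class,
+                  unsigned int *pair_class)
+{
+   /* 8 base registers (0..7) plus 4 virtual pair registers (8..11). */
+   struct ra_regs *regs = ra_alloc_reg_set(mem_ctx, 12);
+   unsigned int i;
+
+   *base_class = ra_alloc_reg_class(regs);
+   for (i = 0; i < 8; i++)
+      ra_class_add_reg(regs, *base_class, i);
+
+   *pair_class = ra_alloc_reg_class(regs);
+   for (i = 0; i < 4; i++) {
+      unsigned int pair = 8 + i;
+      ra_class_add_reg(regs, *pair_class, pair);
+      /* Each pair conflicts with the two base registers it is made of. */
+      ra_add_transitive_reg_conflict(regs, 2 * i, pair);
+      ra_add_transitive_reg_conflict(regs, 2 * i + 1, pair);
+   }
+
+   /* Passing NULL makes ra_set_finalize() compute q(B,C) itself. */
+   ra_set_finalize(regs, NULL);
+   return regs;
+}
+#endif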
+
+static void
+ra_add_node_adjacency(struct ra_graph *g, unsigned int n1, unsigned int n2)
+{
+   BITSET_SET(g->nodes[n1].adjacency, n2);
+
+   if (g->nodes[n1].adjacency_count >=
+       g->nodes[n1].adjacency_list_size) {
+      g->nodes[n1].adjacency_list_size *= 2;
+      g->nodes[n1].adjacency_list = reralloc(g, g->nodes[n1].adjacency_list,
+                                             unsigned int,
+                                             g->nodes[n1].adjacency_list_size);
+   }
+
+   g->nodes[n1].adjacency_list[g->nodes[n1].adjacency_count] = n2;
+   g->nodes[n1].adjacency_count++;
+}
+
+struct ra_graph *
+ra_alloc_interference_graph(struct ra_regs *regs, unsigned int count)
+{
+   struct ra_graph *g;
+   unsigned int i;
+
+   g = rzalloc(regs, struct ra_graph);
+   g->regs = regs;
+   g->nodes = rzalloc_array(g, struct ra_node, count);
+   g->count = count;
+
+   g->stack = rzalloc_array(g, unsigned int, count);
+
+   for (i = 0; i < count; i++) {
+      int bitset_count = BITSET_WORDS(count);
+      g->nodes[i].adjacency = rzalloc_array(g, BITSET_WORD, bitset_count);
+
+      g->nodes[i].adjacency_list_size = 4;
+      g->nodes[i].adjacency_list =
+         ralloc_array(g, unsigned int, g->nodes[i].adjacency_list_size);
+      g->nodes[i].adjacency_count = 0;
+
+      ra_add_node_adjacency(g, i, i);
+      g->nodes[i].reg = NO_REG;
+   }
+
+   return g;
+}
+
+void
+ra_set_node_class(struct ra_graph *g,
+		  unsigned int n, unsigned int class)
+{
+   g->nodes[n].class = class;
+}
+
+void
+ra_add_node_interference(struct ra_graph *g,
+			 unsigned int n1, unsigned int n2)
+{
+   if (!BITSET_TEST(g->nodes[n1].adjacency, n2)) {
+      ra_add_node_adjacency(g, n1, n2);
+      ra_add_node_adjacency(g, n2, n1);
+   }
+}
+
+static bool
+pq_test(struct ra_graph *g, unsigned int n)
+{
+   unsigned int j;
+   unsigned int q = 0;
+   int n_class = g->nodes[n].class;
+
+   for (j = 0; j < g->nodes[n].adjacency_count; j++) {
+      unsigned int n2 = g->nodes[n].adjacency_list[j];
+      unsigned int n2_class = g->nodes[n2].class;
+
+      if (n != n2 && !g->nodes[n2].in_stack) {
+	 q += g->regs->classes[n_class]->q[n2_class];
+      }
+   }
+
+   return q < g->regs->classes[n_class]->p;
+}
+
+/**
+ * Simplifies the interference graph by pushing all
+ * trivially-colorable nodes into a stack of nodes to be colored,
+ * removing them from the graph, and rinsing and repeating.
+ *
+ * Returns true if all nodes were removed from the graph.  false
+ * means that either spilling will be required, or optimistic coloring
+ * should be applied.
+ */
+bool
+ra_simplify(struct ra_graph *g)
+{
+   bool progress = true;
+   int i;
+
+   while (progress) {
+      progress = false;
+
+      for (i = g->count - 1; i >= 0; i--) {
+	 if (g->nodes[i].in_stack || g->nodes[i].reg != NO_REG)
+	    continue;
+
+	 if (pq_test(g, i)) {
+	    g->stack[g->stack_count] = i;
+	    g->stack_count++;
+	    g->nodes[i].in_stack = true;
+	    progress = true;
+	 }
+      }
+   }
+
+   for (i = 0; i < g->count; i++) {
+      if (!g->nodes[i].in_stack && g->nodes[i].reg == NO_REG)
+	 return false;
+   }
+
+   return true;
+}
+
+/**
+ * Pops nodes from the stack back into the graph, coloring them with
+ * registers as they go.
+ *
+ * If all nodes were trivially colorable, then this must succeed.  If
+ * not (optimistic coloring), then it may return false.
+ */
+bool
+ra_select(struct ra_graph *g)
+{
+   int i;
+   int start_search_reg = 0;
+
+   while (g->stack_count != 0) {
+      unsigned int ri;
+      unsigned int r = -1;
+      int n = g->stack[g->stack_count - 1];
+      struct ra_class *c = g->regs->classes[g->nodes[n].class];
+
+      /* Find the lowest-numbered reg which is not used by a member
+       * of the graph adjacent to us.
+       */
+      for (ri = 0; ri < g->regs->count; ri++) {
+         r = (start_search_reg + ri) % g->regs->count;
+         if (!reg_belongs_to_class(r, c))
+	    continue;
+
+	 /* Check if any of our neighbors conflict with this register choice. */
+	 for (i = 0; i < g->nodes[n].adjacency_count; i++) {
+	    unsigned int n2 = g->nodes[n].adjacency_list[i];
+
+	    if (!g->nodes[n2].in_stack &&
+		BITSET_TEST(g->regs->regs[r].conflicts, g->nodes[n2].reg)) {
+	       break;
+	    }
+	 }
+	 if (i == g->nodes[n].adjacency_count)
+	    break;
+      }
+      if (ri == g->regs->count)
+	 return false;
+
+      g->nodes[n].reg = r;
+      g->nodes[n].in_stack = false;
+      g->stack_count--;
+
+      if (g->regs->round_robin)
+         start_search_reg = r + 1;
+   }
+
+   return true;
+}
+
+/**
+ * Optimistic register coloring: Just push the remaining nodes
+ * on the stack.  They'll be colored first in ra_select(), and
+ * if they succeed then the locally-colorable nodes are still
+ * locally-colorable and the rest of the register allocation
+ * will succeed.
+ */
+void
+ra_optimistic_color(struct ra_graph *g)
+{
+   unsigned int i;
+
+   g->stack_optimistic_start = g->stack_count;
+   for (i = 0; i < g->count; i++) {
+      if (g->nodes[i].in_stack || g->nodes[i].reg != NO_REG)
+	 continue;
+
+      g->stack[g->stack_count] = i;
+      g->stack_count++;
+      g->nodes[i].in_stack = true;
+   }
+}
+
+bool
+ra_allocate_no_spills(struct ra_graph *g)
+{
+   if (!ra_simplify(g)) {
+      ra_optimistic_color(g);
+   }
+   return ra_select(g);
+}
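+
+/* A sketch of the allocate/spill loop a backend would typically wrap
+ * around this allocator.  node_class_for(), spill_cost_for(),
+ * add_all_interference(), assign_hw_reg() and spill_virtual() are
+ * hypothetical caller-side helpers; only the ra_* calls are real API.
+ */
+#if 0
+static bool
+example_allocate(struct ra_regs *regs, unsigned int node_count)
+{
+   for (;;) {
+      struct ra_graph *g = ra_alloc_interference_graph(regs, node_count);
+      unsigned int n;
+      int spill_node;
+
+      for (n = 0; n < node_count; n++) {
+         ra_set_node_class(g, n, node_class_for(n));
+         ra_set_node_spill_cost(g, n, spill_cost_for(n));
+      }
+      add_all_interference(g); /* ra_add_node_interference() per overlap */
+
+      if (ra_allocate_no_spills(g)) {
+         for (n = 0; n < node_count; n++)
+            assign_hw_reg(n, ra_get_node_reg(g, n));
+         return true;
+      }
+
+      spill_node = ra_get_best_spill_node(g);
+      if (spill_node < 0)
+         return false; /* nothing left that is spillable */
+
+      /* Rewrite the IR to spill that node, then retry with the new
+       * (possibly larger) set of virtual registers.
+       */
+      node_count = spill_virtual(spill_node);
+   }
+}
+#endif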
+
+unsigned int
+ra_get_node_reg(struct ra_graph *g, unsigned int n)
+{
+   return g->nodes[n].reg;
+}
+
+/**
+ * Forces a node to a specific register.  This can be used to avoid
+ * creating a register class containing one node when handling data
+ * that must live in a fixed location and is known to not conflict
+ * with other forced register assignment (as is common with shader
+ * input data).  These nodes do not end up in the stack during
+ * ra_simplify(), and thus at ra_select() time it is as if they were
+ * the first popped off the stack and assigned their fixed locations.
+ * Nodes that use this function do not need to be assigned a register
+ * class.
+ *
+ * Must be called before ra_simplify().
+ */
+void
+ra_set_node_reg(struct ra_graph *g, unsigned int n, unsigned int reg)
+{
+   g->nodes[n].reg = reg;
+   g->nodes[n].in_stack = false;
+}
+
+static float
+ra_get_spill_benefit(struct ra_graph *g, unsigned int n)
+{
+   int j;
+   float benefit = 0;
+   int n_class = g->nodes[n].class;
+
+   /* Define the benefit of eliminating an interference between n, n2
+    * through spilling as q(C, B) / p(C).  This is similar to the
+    * "count number of edges" approach of traditional graph coloring,
+    * but takes classes into account.
+    */
+   for (j = 0; j < g->nodes[n].adjacency_count; j++) {
+      unsigned int n2 = g->nodes[n].adjacency_list[j];
+      if (n != n2) {
+	 unsigned int n2_class = g->nodes[n2].class;
+	 benefit += ((float)g->regs->classes[n_class]->q[n2_class] /
+		     g->regs->classes[n_class]->p);
+      }
+   }
+
+   return benefit;
+}
+
+/**
+ * Returns a node number to be spilled according to the cost/benefit using
+ * the pq test, or -1 if there are no spillable nodes.
+ */
+int
+ra_get_best_spill_node(struct ra_graph *g)
+{
+   unsigned int best_node = -1;
+   float best_benefit = 0.0;
+   unsigned int n, i;
+
+   /* For any registers not in the stack to be colored, consider them for
+    * spilling.  This will mostly collect nodes that were being optimistically
+    * colored as part of ra_allocate_no_spills() if we didn't successfully
+    * optimistically color.
+    *
+    * It also includes nodes not trivially colorable by ra_simplify() if it
+    * was used directly instead of as part of ra_allocate_no_spills().
+    */
+   for (n = 0; n < g->count; n++) {
+      float cost = g->nodes[n].spill_cost;
+      float benefit;
+
+      if (cost <= 0.0)
+	 continue;
+
+      if (g->nodes[n].in_stack)
+         continue;
+
+      benefit = ra_get_spill_benefit(g, n);
+
+      if (benefit / cost > best_benefit) {
+	 best_benefit = benefit / cost;
+	 best_node = n;
+      }
+   }
+
+   /* Also consider spilling any nodes that were set up to be optimistically
+    * colored that we couldn't manage to color in ra_select().
+    */
+   for (i = g->stack_optimistic_start; i < g->stack_count; i++) {
+      float cost, benefit;
+
+      n = g->stack[i];
+      cost = g->nodes[n].spill_cost;
+
+      if (cost <= 0.0)
+         continue;
+
+      benefit = ra_get_spill_benefit(g, n);
+
+      if (benefit / cost > best_benefit) {
+         best_benefit = benefit / cost;
+         best_node = n;
+      }
+   }
+
+   return best_node;
+}
+
+/**
+ * Only nodes with a spill cost set (cost != 0.0) will be considered
+ * for register spilling.
+ */
+void
+ra_set_node_spill_cost(struct ra_graph *g, unsigned int n, float cost)
+{
+   g->nodes[n].spill_cost = cost;
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/register_allocate.h b/icd/intel/compiler/mesa-utils/src/mesa/program/register_allocate.h
new file mode 100644
index 0000000..337dcf7
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/register_allocate.h
@@ -0,0 +1,79 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#include <stdbool.h>
+
+struct ra_class;
+struct ra_regs;
+
+/** @{
+ * Register set setup.
+ *
+ * This should be done once at backend initialization, as
+ * ra_set_finalize is O(r^2*c^2).  The registers may be virtual
+ * registers, such as aligned register pairs that conflict with the
+ * two real registers from which they are composed.
+ */
+struct ra_regs *ra_alloc_reg_set(void *mem_ctx, unsigned int count);
+void ra_set_allocate_round_robin(struct ra_regs *regs);
+unsigned int ra_alloc_reg_class(struct ra_regs *regs);
+void ra_add_reg_conflict(struct ra_regs *regs,
+			 unsigned int r1, unsigned int r2);
+void ra_add_transitive_reg_conflict(struct ra_regs *regs,
+				    unsigned int base_reg, unsigned int reg);
+void ra_class_add_reg(struct ra_regs *regs, unsigned int c, unsigned int reg);
+void ra_set_num_conflicts(struct ra_regs *regs, unsigned int class_a,
+                          unsigned int class_b, unsigned int num_conflicts);
+void ra_set_finalize(struct ra_regs *regs, unsigned int **q_values);
+/** @} */
+
+/** @{ Interference graph setup.
+ *
+ * Each interference graph node is a virtual variable in the IL.  It
+ * is up to the user to call ra_set_node_class() for each virtual
+ * variable, compute live ranges, and call ra_add_node_interference()
+ * between conflicting live ranges.
+ */
+struct ra_graph *ra_alloc_interference_graph(struct ra_regs *regs,
+					     unsigned int count);
+void ra_set_node_class(struct ra_graph *g, unsigned int n, unsigned int c);
+void ra_add_node_interference(struct ra_graph *g,
+			      unsigned int n1, unsigned int n2);
+/** @} */
+
+/** @{ Graph-coloring register allocation */
+bool ra_simplify(struct ra_graph *g);
+void ra_optimistic_color(struct ra_graph *g);
+bool ra_select(struct ra_graph *g);
+bool ra_allocate_no_spills(struct ra_graph *g);
+
+unsigned int ra_get_node_reg(struct ra_graph *g, unsigned int n);
+void ra_set_node_reg(struct ra_graph * g, unsigned int n, unsigned int reg);
+void ra_set_node_spill_cost(struct ra_graph *g, unsigned int n, float cost);
+int ra_get_best_spill_node(struct ra_graph *g);
+/** @} */
+
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/sampler.cpp b/icd/intel/compiler/mesa-utils/src/mesa/program/sampler.cpp
new file mode 100644
index 0000000..193d202
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/sampler.cpp
@@ -0,0 +1,137 @@
+/*
+ * Copyright (C) 2005-2007  Brian Paul   All Rights Reserved.
+ * Copyright (C) 2008  VMware, Inc.   All Rights Reserved.
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ir.h"
+#include "glsl_types.h"
+#include "ir_visitor.h"
+//#include "../glsl/program.h" // LunarG : Removed
+#include "linker.h" // LunarG : ADD
+#include "program/hash_table.h"
+#include "ir_uniform.h"
+
+extern "C" {
+#include "main/compiler.h"
+#include "main/mtypes.h"
+#include "program/prog_parameter.h"
+#include "program/program.h"
+}
+
+class get_sampler_name : public ir_hierarchical_visitor
+{
+public:
+   get_sampler_name(ir_dereference *last,
+		    struct gl_shader_program *shader_program)
+   {
+      this->mem_ctx = ralloc_context(NULL);
+      this->shader_program = shader_program;
+      this->name = NULL;
+      this->offset = 0;
+      this->last = last;
+   }
+
+   ~get_sampler_name()
+   {
+      ralloc_free(this->mem_ctx);
+   }
+
+   virtual ir_visitor_status visit(ir_dereference_variable *ir)
+   {
+      this->name = ir->var->name;
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit_leave(ir_dereference_record *ir)
+   {
+      this->name = ralloc_asprintf(mem_ctx, "%s.%s", name, ir->field);
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit_leave(ir_dereference_array *ir)
+   {
+      ir_constant *index = ir->array_index->as_constant();
+      int i;
+
+      if (index) {
+	 i = index->value.i[0];
+      } else {
+	 /* GLSL 1.10 and 1.20 allowed variable sampler array indices,
+	  * while GLSL 1.30 requires that the array indices be
+	  * constant integer expressions.  We don't expect any driver
+	  * to actually work with a truly variable array index, so the
+	  * only case that would work is an unrolled loop counter that
+	  * ends up being constant by this point.
+	  */
+	 ralloc_strcat(&shader_program->InfoLog,
+		       "warning: Variable sampler array index unsupported.\n"
+		       "This feature of the language was removed in GLSL 1.20 "
+		       "and is unlikely to be supported for 1.10 in Mesa.\n");
+	 i = 0;
+      }
+      if (ir != last) {
+	 this->name = ralloc_asprintf(mem_ctx, "%s[%d]", name, i);
+      } else {
+	 offset = i;
+      }
+      return visit_continue;
+   }
+
+   struct gl_shader_program *shader_program;
+   const char *name;
+   void *mem_ctx;
+   int offset;
+   ir_dereference *last;
+};
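+
+/* For example, a dereference like samplers[3] yields name "samplers" with
+ * offset 3 (a constant index on the sampler itself becomes the offset),
+ * while a hypothetical s.tex[1].unit yields name "s.tex[1].unit" with
+ * offset 0.
+ */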
+
+
+extern "C" int
+_mesa_get_sampler_uniform_value(class ir_dereference *sampler,
+				struct gl_shader_program *shader_program,
+				const struct gl_program *prog)
+{
+   get_sampler_name getname(sampler, shader_program);
+
+   GLuint shader = _mesa_program_enum_to_shader_stage(prog->Target);
+
+   sampler->accept(&getname);
+
+   unsigned location;
+   if (!shader_program->UniformHash->get(location, getname.name)) {
+      linker_error(shader_program,
+		   "failed to find sampler named %s.\n", getname.name);
+      return 0;
+   }
+
+   if (!shader_program->UniformStorage[location].sampler[shader].active) {
+      assert(0 && "cannot return a sampler");
+      linker_error(shader_program,
+		   "cannot return a sampler named %s, because it is not "
+                   "used in this shader stage. This is a driver bug.\n",
+                   getname.name);
+      return 0;
+   }
+
+   return shader_program->UniformStorage[location].sampler[shader].index +
+          getname.offset;
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/sampler.h b/icd/intel/compiler/mesa-utils/src/mesa/program/sampler.h
new file mode 100644
index 0000000..22467e9
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/sampler.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2005-2007  Brian Paul   All Rights Reserved.
+ * Copyright (C) 2008  VMware, Inc.   All Rights Reserved.
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+int
+_mesa_get_sampler_uniform_value(class ir_dereference *sampler,
+				struct gl_shader_program *shader_program,
+				const struct gl_program *prog);
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/string_to_uint_map.cpp b/icd/intel/compiler/mesa-utils/src/mesa/program/string_to_uint_map.cpp
new file mode 100644
index 0000000..cfa73ab
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/string_to_uint_map.cpp
@@ -0,0 +1,42 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file string_to_uint_map.cpp
+ * \brief Dumb wrappers so that C code can create and destroy maps.
+ *
+ * \author Ian Romanick <ian.d.romanick@intel.com>
+ */
+#include "hash_table.h"
+
+extern "C" struct string_to_uint_map *
+string_to_uint_map_ctor()
+{
+   return new string_to_uint_map;
+}
+
+extern "C" void
+string_to_uint_map_dtor(struct string_to_uint_map *map)
+{
+   delete map;
+}
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/symbol_table.c b/icd/intel/compiler/mesa-utils/src/mesa/program/symbol_table.c
new file mode 100644
index 0000000..9462978
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/symbol_table.c
@@ -0,0 +1,405 @@
+/*
+ * Copyright © 2008 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "main/imports.h"
+#include "symbol_table.h"
+#include "hash_table.h"
+
+struct symbol {
+    /**
+     * Link to the next symbol in the table with the same name
+     *
+     * The linked list of symbols with the same name is ordered by scope
+     * from inner-most to outer-most.
+     */
+    struct symbol *next_with_same_name;
+
+
+    /**
+     * Link to the next symbol in the table with the same scope
+     *
+     * The linked list of symbols with the same scope is unordered.  Symbols
+     * in this list need not have unique names.
+     */
+    struct symbol *next_with_same_scope;
+
+
+    /**
+     * Header information for the list of symbols with the same name.
+     */
+    struct symbol_header *hdr;
+
+
+    /**
+     * Name space of the symbol
+     *
+     * Name space are arbitrary user assigned integers.  No two symbols can
+     * exist in the same name space at the same scope level.
+     */
+    int name_space;
+
+    /** Scope depth where this symbol was defined. */
+    unsigned depth;
+
+    /**
+     * Arbitrary user supplied data.
+     */
+    void *data;
+};
+
+
+/**
+ * Header for the list of symbols that share a name.
+ */
+struct symbol_header {
+    /** Linkage in list of all headers in a given symbol table. */
+    struct symbol_header *next;
+
+    /** Symbol name. */
+    char *name;
+
+    /** Linked list of symbols with the same name. */
+    struct symbol *symbols;
+};
+
+
+/**
+ * Element of the scope stack.
+ */
+struct scope_level {
+    /** Link to next (inner) scope level. */
+    struct scope_level *next;
+    
+    /** Linked list of symbols with the same scope. */
+    struct symbol *symbols;
+};
+
+
+/**
+ * The symbol table.
+ */
+struct _mesa_symbol_table {
+    /** Hash table containing all symbols in the symbol table. */
+    struct hash_table *ht;
+
+    /** Top of scope stack. */
+    struct scope_level *current_scope;
+
+    /** List of all symbol headers in the table. */
+    struct symbol_header *hdr;
+
+    /** Current scope depth. */
+    unsigned depth;
+};
+
+
+static void
+check_symbol_table(struct _mesa_symbol_table *table)
+{
+#if !defined(NDEBUG)
+    struct scope_level *scope;
+
+    for (scope = table->current_scope; scope != NULL; scope = scope->next) {
+        struct symbol *sym;
+
+        for (sym = scope->symbols
+             ; sym != NULL
+             ; sym = sym->next_with_same_name) {
+            const struct symbol_header *const hdr = sym->hdr;
+            struct symbol *sym2;
+
+            for (sym2 = hdr->symbols
+                 ; sym2 != NULL
+                 ; sym2 = sym2->next_with_same_name) {
+                assert(sym2->hdr == hdr);
+            }
+        }
+    }
+#else
+    (void) table;
+#endif /* !defined(NDEBUG) */
+}
+
+void
+_mesa_symbol_table_pop_scope(struct _mesa_symbol_table *table)
+{
+    struct scope_level *const scope = table->current_scope;
+    struct symbol *sym = scope->symbols;
+
+    table->current_scope = scope->next;
+    table->depth--;
+
+    free(scope);
+
+    while (sym != NULL) {
+        struct symbol *const next = sym->next_with_same_scope;
+        struct symbol_header *const hdr = sym->hdr;
+
+        assert(hdr->symbols == sym);
+
+        hdr->symbols = sym->next_with_same_name;
+
+        free(sym);
+
+        sym = next;
+    }
+
+    check_symbol_table(table);
+}
+
+
+void
+_mesa_symbol_table_push_scope(struct _mesa_symbol_table *table)
+{
+    struct scope_level *const scope = calloc(1, sizeof(*scope));
+    
+    scope->next = table->current_scope;
+    table->current_scope = scope;
+    table->depth++;
+}
+
+
+static struct symbol_header *
+find_symbol(struct _mesa_symbol_table *table, const char *name)
+{
+    return (struct symbol_header *) hash_table_find(table->ht, name);
+}
+
+
+/**
+ * Determine the scope "distance" of a symbol from the current scope
+ *
+ * \return
+ * A non-negative number for the number of scopes between the current scope
+ * and the scope where a symbol was defined.  A value of zero means the current
+ * scope.  A negative number if the symbol does not exist.
+ */
+int
+_mesa_symbol_table_symbol_scope(struct _mesa_symbol_table *table,
+				int name_space, const char *name)
+{
+    struct symbol_header *const hdr = find_symbol(table, name);
+    struct symbol *sym;
+
+    if (hdr != NULL) {
+       for (sym = hdr->symbols; sym != NULL; sym = sym->next_with_same_name) {
+	  assert(sym->hdr == hdr);
+
+	  if ((name_space == -1) || (sym->name_space == name_space)) {
+	     assert(sym->depth <= table->depth);
+	     return table->depth - sym->depth;
+	  }
+       }
+    }
+
+    return -1;
+}
+
+
+void *
+_mesa_symbol_table_find_symbol(struct _mesa_symbol_table *table,
+                               int name_space, const char *name)
+{
+    struct symbol_header *const hdr = find_symbol(table, name);
+
+    if (hdr != NULL) {
+        struct symbol *sym;
+
+
+        for (sym = hdr->symbols; sym != NULL; sym = sym->next_with_same_name) {
+            assert(sym->hdr == hdr);
+
+            if ((name_space == -1) || (sym->name_space == name_space)) {
+                return sym->data;
+            }
+        }
+    }
+
+    return NULL;
+}
+
+
+int
+_mesa_symbol_table_add_symbol(struct _mesa_symbol_table *table,
+                              int name_space, const char *name,
+                              void *declaration)
+{
+    struct symbol_header *hdr;
+    struct symbol *sym;
+
+    check_symbol_table(table);
+
+    hdr = find_symbol(table, name);
+
+    check_symbol_table(table);
+
+    if (hdr == NULL) {
+       hdr = calloc(1, sizeof(*hdr));
+       hdr->name = strdup(name);
+
+       hash_table_insert(table->ht, hdr, hdr->name);
+       hdr->next = table->hdr;
+       table->hdr = hdr;
+    }
+
+    check_symbol_table(table);
+
+    /* If the symbol already exists in this namespace at this scope, it cannot
+     * be added to the table.
+     */
+    for (sym = hdr->symbols
+	 ; (sym != NULL) && (sym->name_space != name_space)
+	 ; sym = sym->next_with_same_name) {
+       /* empty */
+    }
+
+    if (sym && (sym->depth == table->depth))
+       return -1;
+
+    sym = calloc(1, sizeof(*sym));
+    sym->next_with_same_name = hdr->symbols;
+    sym->next_with_same_scope = table->current_scope->symbols;
+    sym->hdr = hdr;
+    sym->name_space = name_space;
+    sym->data = declaration;
+    sym->depth = table->depth;
+
+    assert(sym->hdr == hdr);
+
+    hdr->symbols = sym;
+    table->current_scope->symbols = sym;
+
+    check_symbol_table(table);
+    return 0;
+}
+
+
+int
+_mesa_symbol_table_add_global_symbol(struct _mesa_symbol_table *table,
+				     int name_space, const char *name,
+				     void *declaration)
+{
+    struct symbol_header *hdr;
+    struct symbol *sym;
+    struct symbol *curr;
+    struct scope_level *top_scope;
+
+    check_symbol_table(table);
+
+    hdr = find_symbol(table, name);
+
+    check_symbol_table(table);
+
+    if (hdr == NULL) {
+        hdr = calloc(1, sizeof(*hdr));
+        hdr->name = strdup(name);
+
+        hash_table_insert(table->ht, hdr, hdr->name);
+        hdr->next = table->hdr;
+        table->hdr = hdr;
+    }
+
+    check_symbol_table(table);
+
+    /* If the symbol already exists in this namespace at this scope, it cannot
+     * be added to the table.
+     */
+    for (sym = hdr->symbols
+	 ; (sym != NULL) && (sym->name_space != name_space)
+	 ; sym = sym->next_with_same_name) {
+       /* empty */
+    }
+
+    if (sym && sym->depth == 0)
+       return -1;
+
+    /* Find the top-level scope */
+    for (top_scope = table->current_scope
+	 ; top_scope->next != NULL
+	 ; top_scope = top_scope->next) {
+       /* empty */
+    }
+
+    sym = calloc(1, sizeof(*sym));
+    sym->next_with_same_scope = top_scope->symbols;
+    sym->hdr = hdr;
+    sym->name_space = name_space;
+    sym->data = declaration;
+
+    assert(sym->hdr == hdr);
+
+    /* Since next_with_same_name is ordered by scope, we need to append the
+     * new symbol to the _end_ of the list.
+     */
+    if (hdr->symbols == NULL) {
+       hdr->symbols = sym;
+    } else {
+       for (curr = hdr->symbols
+	    ; curr->next_with_same_name != NULL
+	    ; curr = curr->next_with_same_name) {
+	  /* empty */
+       }
+       curr->next_with_same_name = sym;
+    }
+    top_scope->symbols = sym;
+
+    check_symbol_table(table);
+    return 0;
+}
+
+
+
+struct _mesa_symbol_table *
+_mesa_symbol_table_ctor(void)
+{
+    struct _mesa_symbol_table *table = calloc(1, sizeof(*table));
+
+    if (table != NULL) {
+       table->ht = hash_table_ctor(32, hash_table_string_hash,
+				   hash_table_string_compare);
+
+       _mesa_symbol_table_push_scope(table);
+    }
+
+    return table;
+}
+
+
+void
+_mesa_symbol_table_dtor(struct _mesa_symbol_table *table)
+{
+   struct symbol_header *hdr;
+   struct symbol_header *next;
+
+   while (table->current_scope != NULL) {
+      _mesa_symbol_table_pop_scope(table);
+   }
+
+   for (hdr = table->hdr; hdr != NULL; hdr = next) {
+       next = hdr->next;
+       free(hdr->name);
+       free(hdr);
+   }
+
+   hash_table_dtor(table->ht);
+   free(table);
+}
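+
+/* A small usage sketch (name space 0 is arbitrary): a symbol added in an
+ * inner scope shadows an outer symbol of the same name, and is freed
+ * again when that scope is popped.
+ */
+#if 0
+static void
+symbol_table_example(void)
+{
+   struct _mesa_symbol_table *st = _mesa_symbol_table_ctor();
+   int outer = 1, inner = 2;
+
+   _mesa_symbol_table_add_symbol(st, 0, "x", &outer);
+
+   _mesa_symbol_table_push_scope(st);
+   _mesa_symbol_table_add_symbol(st, 0, "x", &inner);
+   /* Returns &inner: the innermost definition wins. */
+   (void) _mesa_symbol_table_find_symbol(st, 0, "x");
+
+   _mesa_symbol_table_pop_scope(st);
+   /* The shadowing definition is gone: returns &outer again. */
+   (void) _mesa_symbol_table_find_symbol(st, 0, "x");
+
+   _mesa_symbol_table_dtor(st);
+}
+#endif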
diff --git a/icd/intel/compiler/mesa-utils/src/mesa/program/symbol_table.h b/icd/intel/compiler/mesa-utils/src/mesa/program/symbol_table.h
new file mode 100644
index 0000000..1027f47
--- /dev/null
+++ b/icd/intel/compiler/mesa-utils/src/mesa/program/symbol_table.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright © 2008 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#ifndef MESA_SYMBOL_TABLE_H
+#define MESA_SYMBOL_TABLE_H
+
+struct _mesa_symbol_table;
+
+extern void _mesa_symbol_table_push_scope(struct _mesa_symbol_table *table);
+
+extern void _mesa_symbol_table_pop_scope(struct _mesa_symbol_table *table);
+
+extern int _mesa_symbol_table_add_symbol(struct _mesa_symbol_table *symtab,
+    int name_space, const char *name, void *declaration);
+
+extern int _mesa_symbol_table_add_global_symbol(
+    struct _mesa_symbol_table *symtab, int name_space, const char *name,
+    void *declaration);
+
+extern int _mesa_symbol_table_symbol_scope(struct _mesa_symbol_table *table,
+    int name_space, const char *name);
+
+extern void *_mesa_symbol_table_find_symbol(
+    struct _mesa_symbol_table *symtab, int name_space, const char *name);
+
+extern struct _mesa_symbol_table *_mesa_symbol_table_ctor(void);
+
+extern void _mesa_symbol_table_dtor(struct _mesa_symbol_table *);
+
+#endif /* MESA_SYMBOL_TABLE_H */
diff --git a/icd/intel/compiler/pipeline/brw_blorp.h b/icd/intel/compiler/pipeline/brw_blorp.h
new file mode 100644
index 0000000..15a7a0b
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_blorp.h
@@ -0,0 +1,424 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#pragma once
+
+#include <stdint.h>
+
+#include "brw_context.h"
+#include "intel_mipmap_tree.h"
+
+struct brw_context;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void
+brw_blorp_blit_miptrees(struct brw_context *brw,
+                        struct intel_mipmap_tree *src_mt,
+                        unsigned src_level, unsigned src_layer,
+                        struct intel_mipmap_tree *dst_mt,
+                        unsigned dst_level, unsigned dst_layer,
+                        float src_x0, float src_y0,
+                        float src_x1, float src_y1,
+                        float dst_x0, float dst_y0,
+                        float dst_x1, float dst_y1,
+                        GLenum filter, bool mirror_x, bool mirror_y);
+
+bool
+brw_blorp_clear_color(struct brw_context *brw, struct gl_framebuffer *fb,
+                      GLbitfield mask, bool partial_clear);
+
+void
+brw_blorp_resolve_color(struct brw_context *brw,
+                        struct intel_mipmap_tree *mt);
+
+#ifdef __cplusplus
+} /* end extern "C" */
+
+/**
+ * Binding table indices used by BLORP.
+ */
+enum {
+   BRW_BLORP_TEXTURE_BINDING_TABLE_INDEX,
+   BRW_BLORP_RENDERBUFFER_BINDING_TABLE_INDEX,
+   BRW_BLORP_NUM_BINDING_TABLE_ENTRIES
+};
+
+
+class brw_blorp_mip_info
+{
+public:
+   brw_blorp_mip_info();
+
+   void set(struct intel_mipmap_tree *mt,
+            unsigned int level, unsigned int layer);
+
+   struct intel_mipmap_tree *mt;
+
+   /**
+    * The miplevel to use.
+    */
+   uint32_t level;
+
+   /**
+    * The 2D layer within the miplevel. Combined, level and layer define the
+    * 2D miptree slice to use.
+    *
+    * Note: if mt is a 2D multisample array texture on Gen7+ using
+    * INTEL_MSAA_LAYOUT_UMS or INTEL_MSAA_LAYOUT_CMS, layer is the physical
+    * layer holding sample 0.  So, for example, if mt->num_samples == 4, then
+    * logical layer n corresponds to layer == 4*n.
+    */
+   uint32_t layer;
+
+   /**
+    * Width of the miplevel to be used.  For surfaces using
+    * INTEL_MSAA_LAYOUT_IMS, this is measured in samples, not pixels.
+    */
+   uint32_t width;
+
+   /**
+    * Height of the miplevel to be used.  For surfaces using
+    * INTEL_MSAA_LAYOUT_IMS, this is measured in samples, not pixels.
+    */
+   uint32_t height;
+
+   /**
+    * X offset within the surface to texture from (or render to).  For
+    * surfaces using INTEL_MSAA_LAYOUT_IMS, this is measured in samples, not
+    * pixels.
+    */
+   uint32_t x_offset;
+
+   /**
+    * Y offset within the surface to texture from (or render to).  For
+    * surfaces using INTEL_MSAA_LAYOUT_IMS, this is measured in samples, not
+    * pixels.
+    */
+   uint32_t y_offset;
+};
+
+class brw_blorp_surface_info : public brw_blorp_mip_info
+{
+public:
+   brw_blorp_surface_info();
+
+   void set(struct brw_context *brw,
+            struct intel_mipmap_tree *mt,
+            unsigned int level, unsigned int layer,
+            bool is_render_target);
+
+   uint32_t compute_tile_offsets(uint32_t *tile_x, uint32_t *tile_y) const;
+
+   /* Setting this flag indicates that the buffer's contents are W-tiled
+    * stencil data, but the surface state should be set up for Y tiled
+    * MESA_FORMAT_R_UNORM8 data (this is necessary because surface states don't
+    * support W tiling).
+    *
+    * Since W tiles are 64 pixels wide by 64 pixels high, whereas Y tiles of
+    * MESA_FORMAT_R_UNORM8 data are 128 pixels wide by 32 pixels high, the width and
+    * pitch stored in the surface state will be multiplied by 2, and the
+    * height will be halved.  Also, since W and Y tiles store their data in a
+    * different order, the width and height will be rounded up to a multiple
+    * of the tile size, to ensure that the WM program can access the full
+    * width and height of the buffer.
+    */
+   bool map_stencil_as_y_tiled;
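+
+   /* As a worked example of the mapping above: a 64x64-pixel W-tiled
+    * stencil region is exactly one W tile, and is programmed as a
+    * 128x32-pixel Y-tiled MESA_FORMAT_R_UNORM8 surface, i.e. exactly
+    * one Y tile.
+    */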
+
+   unsigned num_samples;
+
+   /* Setting this flag indicates that the surface should be set up in
+    * ARYSPC_LOD0 mode.  Ignored prior to Gen7.
+    */
+   bool array_spacing_lod0;
+
+   /**
+    * Format that should be used when setting up the surface state for this
+    * surface.  Should correspond to one of the BRW_SURFACEFORMAT_* enums.
+    */
+   uint32_t brw_surfaceformat;
+
+   /**
+    * For MSAA surfaces, MSAA layout that should be used when setting up the
+    * surface state for this surface.
+    */
+   intel_msaa_layout msaa_layout;
+};
+
+
+struct brw_blorp_coord_transform_params
+{
+   void setup(GLfloat src0, GLfloat src1, GLfloat dst0, GLfloat dst1,
+              bool mirror);
+
+   float multiplier;
+   float offset;
+};
+
+
+struct brw_blorp_wm_push_constants
+{
+   uint32_t dst_x0;
+   uint32_t dst_x1;
+   uint32_t dst_y0;
+   uint32_t dst_y1;
+   /* Top right coordinates of the rectangular grid used for scaled blitting */
+   float rect_grid_x1;
+   float rect_grid_y1;
+   brw_blorp_coord_transform_params x_transform;
+   brw_blorp_coord_transform_params y_transform;
+   /* Pad out to an integral number of registers */
+   uint32_t pad[6];
+};
+
+/* Every 32 bytes of push constant data constitutes one GEN register. */
+const unsigned int BRW_BLORP_NUM_PUSH_CONST_REGS =
+   sizeof(brw_blorp_wm_push_constants) / 32;
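+
+/* For the layout above that works out to 4 + 2 + 2*2 + 6 = 16 dwords,
+ * i.e. 64 bytes, or 2 registers.
+ */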
+
+struct brw_blorp_prog_data
+{
+   unsigned int first_curbe_grf;
+
+   /**
+    * True if the WM program should be run in MSDISPMODE_PERSAMPLE with more
+    * than one sample per pixel.
+    */
+   bool persample_msaa_dispatch;
+};
+
+
+enum gen7_fast_clear_op {
+   GEN7_FAST_CLEAR_OP_NONE,
+   GEN7_FAST_CLEAR_OP_FAST_CLEAR,
+   GEN7_FAST_CLEAR_OP_RESOLVE,
+};
+
+
+class brw_blorp_params
+{
+public:
+   brw_blorp_params();
+
+   virtual uint32_t get_wm_prog(struct brw_context *brw,
+                                brw_blorp_prog_data **prog_data) const = 0;
+
+   uint32_t x0;
+   uint32_t y0;
+   uint32_t x1;
+   uint32_t y1;
+   brw_blorp_mip_info depth;
+   uint32_t depth_format;
+   brw_blorp_surface_info src;
+   brw_blorp_surface_info dst;
+   enum gen6_hiz_op hiz_op;
+   enum gen7_fast_clear_op fast_clear_op;
+   bool use_wm_prog;
+   brw_blorp_wm_push_constants wm_push_consts;
+   bool color_write_disable[4];
+};
+
+
+void
+brw_blorp_exec(struct brw_context *brw, const brw_blorp_params *params);
+
+
+/**
+ * Parameters for a HiZ or depth resolve operation.
+ *
+ * For an overview of HiZ ops, see the following sections of the Sandy Bridge
+ * PRM, Volume 1, Part 2:
+ *   - 7.5.3.1 Depth Buffer Clear
+ *   - 7.5.3.2 Depth Buffer Resolve
+ *   - 7.5.3.3 Hierarchical Depth Buffer Resolve
+ */
+class brw_hiz_op_params : public brw_blorp_params
+{
+public:
+   brw_hiz_op_params(struct intel_mipmap_tree *mt,
+                     unsigned int level, unsigned int layer,
+                     gen6_hiz_op op);
+
+   virtual uint32_t get_wm_prog(struct brw_context *brw,
+                                brw_blorp_prog_data **prog_data) const;
+};
+
+struct brw_blorp_blit_prog_key
+{
+   /* Number of samples per pixel that have been configured in the surface
+    * state for texturing from.
+    */
+   unsigned tex_samples;
+
+   /* MSAA layout that has been configured in the surface state for texturing
+    * from.
+    */
+   intel_msaa_layout tex_layout;
+
+   /* Actual number of samples per pixel in the source image. */
+   unsigned src_samples;
+
+   /* Actual MSAA layout used by the source image. */
+   intel_msaa_layout src_layout;
+
+   /* Number of samples per pixel that have been configured in the render
+    * target.
+    */
+   unsigned rt_samples;
+
+   /* MSAA layout that has been configured in the render target. */
+   intel_msaa_layout rt_layout;
+
+   /* Actual number of samples per pixel in the destination image. */
+   unsigned dst_samples;
+
+   /* Actual MSAA layout used by the destination image. */
+   intel_msaa_layout dst_layout;
+
+   /* Type of the data to be read from the texture (one of
+    * BRW_REGISTER_TYPE_{UD,D,F}).
+    */
+   unsigned texture_data_type;
+
+   /* True if the source image is W tiled.  If true, the surface state for the
+    * source image must be configured as Y tiled, and tex_samples must be 0.
+    */
+   bool src_tiled_w;
+
+   /* True if the destination image is W tiled.  If true, the surface state
+    * for the render target must be configured as Y tiled, and rt_samples must
+    * be 0.
+    */
+   bool dst_tiled_w;
+
+   /* True if all source samples should be blended together to produce each
+    * destination pixel.  If true, src_tiled_w must be false, tex_samples must
+    * equal src_samples, and tex_samples must be nonzero.
+    */
+   bool blend;
+
+   /* True if the rectangle being sent through the rendering pipeline might be
+    * larger than the destination rectangle, so the WM program should kill any
+    * pixels that are outside the destination rectangle.
+    */
+   bool use_kill;
+
+   /**
+    * True if the WM program should be run in MSDISPMODE_PERSAMPLE with more
+    * than one sample per pixel.
+    */
+   bool persample_msaa_dispatch;
+
+   /* True for scaled blitting. */
+   bool blit_scaled;
+
+   /* Scale factors between the pixel grid and the grid of samples.  We use
+    * the sample grid for bilinear filtering in multisample scaled blits.
+    */
+   float x_scale;
+   float y_scale;
+
+   /* True for blits with filter = GL_LINEAR. */
+   bool bilinear_filter;
+};
+
+class brw_blorp_blit_params : public brw_blorp_params
+{
+public:
+   brw_blorp_blit_params(struct brw_context *brw,
+                         struct intel_mipmap_tree *src_mt,
+                         unsigned src_level, unsigned src_layer,
+                         struct intel_mipmap_tree *dst_mt,
+                         unsigned dst_level, unsigned dst_layer,
+                         GLfloat src_x0, GLfloat src_y0,
+                         GLfloat src_x1, GLfloat src_y1,
+                         GLfloat dst_x0, GLfloat dst_y0,
+                         GLfloat dst_x1, GLfloat dst_y1,
+                         GLenum filter, bool mirror_x, bool mirror_y);
+
+   virtual uint32_t get_wm_prog(struct brw_context *brw,
+                                brw_blorp_prog_data **prog_data) const;
+
+private:
+   brw_blorp_blit_prog_key wm_prog_key;
+};
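+
+/* Illustrative usage (the miptrees and coordinates here are made up): build
+ * the params for a 64x64 nearest-filtered blit and hand them to
+ * brw_blorp_exec():
+ *
+ *    brw_blorp_blit_params params(brw, src_mt, 0, 0, dst_mt, 0, 0,
+ *                                 0.0f, 0.0f, 64.0f, 64.0f,
+ *                                 0.0f, 0.0f, 64.0f, 64.0f,
+ *                                 GL_NEAREST, false, false);
+ *    brw_blorp_exec(brw, &params);
+ */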
+
+/**
+ * \name BLORP internals
+ * \{
+ *
+ * Used internally by gen6_blorp_exec() and gen7_blorp_exec().
+ */
+
+void
+gen6_blorp_init(struct brw_context *brw);
+
+void
+gen6_blorp_emit_state_base_address(struct brw_context *brw,
+                                   const brw_blorp_params *params);
+
+void
+gen6_blorp_emit_vertices(struct brw_context *brw,
+                         const brw_blorp_params *params);
+
+uint32_t
+gen6_blorp_emit_blend_state(struct brw_context *brw,
+                            const brw_blorp_params *params);
+
+uint32_t
+gen6_blorp_emit_cc_state(struct brw_context *brw,
+                         const brw_blorp_params *params);
+
+uint32_t
+gen6_blorp_emit_wm_constants(struct brw_context *brw,
+                             const brw_blorp_params *params);
+
+void
+gen6_blorp_emit_vs_disable(struct brw_context *brw,
+                           const brw_blorp_params *params);
+
+uint32_t
+gen6_blorp_emit_binding_table(struct brw_context *brw,
+                              const brw_blorp_params *params,
+                              uint32_t wm_surf_offset_renderbuffer,
+                              uint32_t wm_surf_offset_texture);
+
+uint32_t
+gen6_blorp_emit_depth_stencil_state(struct brw_context *brw,
+                                    const brw_blorp_params *params);
+
+void
+gen6_blorp_emit_gs_disable(struct brw_context *brw,
+                           const brw_blorp_params *params);
+
+void
+gen6_blorp_emit_clip_disable(struct brw_context *brw,
+                             const brw_blorp_params *params);
+
+void
+gen6_blorp_emit_drawing_rectangle(struct brw_context *brw,
+                                  const brw_blorp_params *params);
+/** \} */
+
+#endif /* __cplusplus */
diff --git a/icd/intel/compiler/pipeline/brw_blorp_blit_eu.cpp b/icd/intel/compiler/pipeline/brw_blorp_blit_eu.cpp
new file mode 100644
index 0000000..32a8413
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_blorp_blit_eu.cpp
@@ -0,0 +1,216 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "glsl/ralloc.h"
+#include "brw_blorp_blit_eu.h"
+#include "brw_blorp.h"
+
+brw_blorp_eu_emitter::brw_blorp_eu_emitter(struct brw_context *brw)
+   : brw_ctx(brw), mem_ctx(ralloc_context(NULL)),
+     c(rzalloc(mem_ctx, struct brw_wm_compile)),
+     generator(brw, c, NULL, NULL, false)
+{
+}
+
+brw_blorp_eu_emitter::~brw_blorp_eu_emitter()
+{
+   ralloc_free(mem_ctx);
+}
+
+const unsigned *
+brw_blorp_eu_emitter::get_program(unsigned *program_size, FILE *dump_file)
+{
+   const unsigned *res;
+
+   if (unlikely(INTEL_DEBUG & DEBUG_BLORP)) {
+      fprintf(stderr, "Native code for BLORP blit:\n");
+      res = generator.generate_assembly(NULL, &insts, program_size, dump_file);
+      fprintf(stderr, "\n");
+   } else {
+      res = generator.generate_assembly(NULL, &insts, program_size);
+   }
+
+   return res;
+}
+
+/**
+ * Emit code that kills pixels whose X and Y coordinates are outside the
+ * boundary of the rectangle defined by the push constants (dst_x0, dst_y0,
+ * dst_x1, dst_y1).
+ */
+void
+brw_blorp_eu_emitter::emit_kill_if_outside_rect(const struct brw_reg &x,
+                                                const struct brw_reg &y,
+                                                const struct brw_reg &dst_x0,
+                                                const struct brw_reg &dst_x1,
+                                                const struct brw_reg &dst_y0,
+                                                const struct brw_reg &dst_y1)
+{
+   struct brw_reg f0 = brw_flag_reg(0, 0);
+   struct brw_reg g1 = retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UW);
+
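+   /* Accumulate the four bounds tests in flag register f0: the first CMP
+    * sets f0 outright, and each predicated CMP after it can only clear
+    * channels that are still set, giving AND semantics.  g1 holds the pixel
+    * dispatch mask in the thread payload, so ANDing f0 into it kills every
+    * pixel that failed a test.
+    */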
+   emit_cmp(BRW_CONDITIONAL_GE, x, dst_x0);
+   emit_cmp(BRW_CONDITIONAL_GE, y, dst_y0)->predicate = BRW_PREDICATE_NORMAL;
+   emit_cmp(BRW_CONDITIONAL_L, x, dst_x1)->predicate = BRW_PREDICATE_NORMAL;
+   emit_cmp(BRW_CONDITIONAL_L, y, dst_y1)->predicate = BRW_PREDICATE_NORMAL;
+
+   fs_inst *inst = new (mem_ctx) fs_inst(BRW_OPCODE_AND, g1, f0, g1);
+   inst->force_writemask_all = true;
+   insts.push_tail(inst);
+}
+
+void
+brw_blorp_eu_emitter::emit_texture_lookup(const struct brw_reg &dst,
+                                          enum opcode op,
+                                          unsigned base_mrf,
+                                          unsigned msg_length)
+{
+   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, brw_message_reg(base_mrf));
+
+   inst->base_mrf = base_mrf;
+   inst->mlen = msg_length;
+   inst->sampler = 0;
+   inst->header_present = false;
+
+   insts.push_tail(inst);
+}
+
+void
+brw_blorp_eu_emitter::emit_render_target_write(const struct brw_reg &src0,
+                                               unsigned msg_reg_nr,
+                                               unsigned msg_length,
+                                               bool use_header)
+{
+   fs_inst *inst = new (mem_ctx) fs_inst(FS_OPCODE_BLORP_FB_WRITE);
+
+   inst->src[0] = src0;
+   inst->base_mrf = msg_reg_nr;
+   inst->mlen = msg_length;
+   inst->header_present = use_header;
+   inst->target = BRW_BLORP_RENDERBUFFER_BINDING_TABLE_INDEX;
+
+   insts.push_tail(inst);
+}
+
+void
+brw_blorp_eu_emitter::emit_scattered_write(enum opcode opcode,
+                                           const struct brw_reg &src0,
+                                           unsigned msg_reg_nr,
+                                           unsigned msg_length,
+                                           int dispatch_width,
+                                           bool use_header)
+{
+   assert(opcode == SHADER_OPCODE_DWORD_SCATTERED_WRITE ||
+          (brw_ctx->gen >= 7 && opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE));
+
+   fs_inst *inst = new (mem_ctx) fs_inst(opcode);
+
+   inst->dst = brw_vecn_reg(dispatch_width,
+           BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_NULL, 0);
+   inst->src[0] = src0;
+   inst->base_mrf = msg_reg_nr;
+   inst->mlen = msg_length;
+   inst->header_present = use_header;
+   inst->target = BRW_BLORP_RENDERBUFFER_BINDING_TABLE_INDEX;
+
+   insts.push_tail(inst);
+}
+
+void
+brw_blorp_eu_emitter::emit_scattered_read(const struct brw_reg &dst,
+                                          enum opcode opcode,
+                                          const struct brw_reg &src0,
+                                          unsigned msg_reg_nr,
+                                          unsigned msg_length,
+                                          int dispatch_width,
+                                          bool use_header)
+{
+   assert(opcode == SHADER_OPCODE_DWORD_SCATTERED_READ ||
+          (brw_ctx->gen >= 7 && opcode == SHADER_OPCODE_BYTE_SCATTERED_READ));
+
+   fs_inst *inst = new (mem_ctx) fs_inst(opcode);
+
+   switch (dispatch_width) {
+   case 1:
+   default:
+       inst->dst = vec1(dst);
+       break;
+   case 2:
+       inst->dst = vec2(dst);
+       break;
+   case 4:
+       inst->dst = vec4(dst);
+       break;
+   case 8:
+       inst->dst = vec8(dst);
+       break;
+   case 16:
+       inst->dst = vec16(dst);
+       break;
+   }
+
+   inst->src[0] = src0;
+   inst->base_mrf = msg_reg_nr;
+   inst->mlen = msg_length;
+   inst->header_present = use_header;
+   inst->target = BRW_BLORP_TEXTURE_BINDING_TABLE_INDEX;
+
+   insts.push_tail(inst);
+}
+
+void
+brw_blorp_eu_emitter::emit_urb_write_eot(unsigned base_mrf)
+{
+   fs_inst *inst = new (mem_ctx) fs_inst(VS_OPCODE_URB_WRITE);
+
+   inst->base_mrf = base_mrf;
+   /* header will be added by gen6_resolve_implied_move() */
+   inst->mlen = 1;
+   inst->header_present = true;
+   inst->eot = true;
+
+   insts.push_tail(inst);
+}
+
+void
+brw_blorp_eu_emitter::emit_combine(enum opcode combine_opcode,
+                                   const struct brw_reg &dst,
+                                   const struct brw_reg &src_1,
+                                   const struct brw_reg &src_2)
+{
+   assert(combine_opcode == BRW_OPCODE_ADD || combine_opcode == BRW_OPCODE_AVG);
+
+   insts.push_tail(new (mem_ctx) fs_inst(combine_opcode, dst, src_1, src_2));
+}
+
+fs_inst *
+brw_blorp_eu_emitter::emit_cmp(int op,
+                               const struct brw_reg &x,
+                               const struct brw_reg &y)
+{
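+   /* CMP to the null register: only the flag-register side effect of the
+    * conditional mod is wanted, not a result value.
+    */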
+   fs_inst *cmp = new (mem_ctx) fs_inst(BRW_OPCODE_CMP,
+                                        vec16(brw_null_reg()), x, y);
+   cmp->conditional_mod = op;
+   insts.push_tail(cmp);
+   return cmp;
+}
+
diff --git a/icd/intel/compiler/pipeline/brw_blorp_blit_eu.h b/icd/intel/compiler/pipeline/brw_blorp_blit_eu.h
new file mode 100644
index 0000000..4432438
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_blorp_blit_eu.h
@@ -0,0 +1,233 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef BRW_BLORP_BLIT_EU_H
+#define BRW_BLORP_BLIT_EU_H
+
+#include "brw_context.h"
+#include "brw_fs.h"
+
+class brw_blorp_eu_emitter
+{
+protected:
+   explicit brw_blorp_eu_emitter(struct brw_context *brw);
+   ~brw_blorp_eu_emitter();
+
+   const unsigned *get_program(unsigned *program_size, FILE *dump_file);
+
+   void emit_kill_if_outside_rect(const struct brw_reg &x,
+                                  const struct brw_reg &y,
+                                  const struct brw_reg &dst_x0,
+                                  const struct brw_reg &dst_x1,
+                                  const struct brw_reg &dst_y0,
+                                  const struct brw_reg &dst_y1);
+
+   void emit_texture_lookup(const struct brw_reg &dst,
+                            enum opcode op,
+                            unsigned base_mrf,
+                            unsigned msg_length);
+
+   void emit_render_target_write(const struct brw_reg &src0,
+                                 unsigned msg_reg_nr,
+                                 unsigned msg_length,
+                                 bool use_header);
+
+   void emit_scattered_write(enum opcode opcode,
+                             const struct brw_reg &src0,
+                             unsigned msg_reg_nr,
+                             unsigned msg_length,
+                             int dispatch_width,
+                             bool use_header);
+
+   void emit_scattered_read(const struct brw_reg &dst,
+                            enum opcode opcode,
+                            const struct brw_reg &src0,
+                            unsigned msg_reg_nr,
+                            unsigned msg_length,
+                            int dispatch_width,
+                            bool use_header);
+
+   void emit_urb_write_eot(unsigned base_mrf);
+
+   void emit_combine(enum opcode combine_opcode,
+                     const struct brw_reg &dst,
+                     const struct brw_reg &src_1,
+                     const struct brw_reg &src_2);
+
+   inline void emit_cond_mov(const struct brw_reg &x,
+                             const struct brw_reg &y,
+                             int op,
+                             const struct brw_reg &dst,
+                             const struct brw_reg &src)
+   {
+      emit_cmp(op, x, y);
+
+      fs_inst *mv = new (mem_ctx) fs_inst(BRW_OPCODE_MOV, dst, src);
+      mv->predicate = BRW_PREDICATE_NORMAL;
+      insts.push_tail(mv);
+   }
+
+   inline void emit_if_eq_mov(const struct brw_reg &x, unsigned y,
+                              const struct brw_reg &dst, unsigned src)
+   {
+      emit_cond_mov(x, brw_imm_d(y), BRW_CONDITIONAL_EQ, dst, brw_imm_d(src));
+   }
+
+   inline void emit_lrp(const struct brw_reg &dst,
+                        const struct brw_reg &src1,
+                        const struct brw_reg &src2,
+                        const struct brw_reg &src3)
+   {
+      insts.push_tail(
+         new (mem_ctx) fs_inst(BRW_OPCODE_LRP, dst, src1, src2, src3));
+   }
+
+   inline void emit_mov(const struct brw_reg& dst, const struct brw_reg& src)
+   {
+      insts.push_tail(new (mem_ctx) fs_inst(BRW_OPCODE_MOV, dst, src));
+   }
+
+   inline void emit_mov_8(const struct brw_reg& dst, const struct brw_reg& src)
+   {
+      fs_inst *mv = new (mem_ctx) fs_inst(BRW_OPCODE_MOV, dst, src);
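+      /* force_uncompressed restricts the MOV to SIMD8 (the first half of a
+       * SIMD16 pair); the other _8 variants below do the same.
+       */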
+      mv->force_uncompressed = true;
+      insts.push_tail(mv);
+   }
+
+   inline void emit_and(const struct brw_reg& dst,
+                        const struct brw_reg& src1,
+                        const struct brw_reg& src2)
+   {
+      insts.push_tail(new (mem_ctx) fs_inst(BRW_OPCODE_AND, dst, src1, src2));
+   }
+
+   inline void emit_add(const struct brw_reg& dst,
+                        const struct brw_reg& src1,
+                        const struct brw_reg& src2)
+   {
+      insts.push_tail(new (mem_ctx) fs_inst(BRW_OPCODE_ADD, dst, src1, src2));
+   }
+
+   inline void emit_add_8(const struct brw_reg& dst,
+                          const struct brw_reg& src1,
+                          const struct brw_reg& src2)
+   {
+      fs_inst *add = new (mem_ctx) fs_inst(BRW_OPCODE_ADD, dst, src1, src2);
+      add->force_uncompressed = true;
+      insts.push_tail(add);
+   }
+
+   inline void emit_mul(const struct brw_reg& dst,
+                        const struct brw_reg& src1,
+                        const struct brw_reg& src2)
+   {
+      insts.push_tail(new (mem_ctx) fs_inst(BRW_OPCODE_MUL, dst, src1, src2));
+   }
+
+   inline void emit_mul_8(const struct brw_reg& dst,
+                          const struct brw_reg& src1,
+                          const struct brw_reg& src2)
+   {
+      fs_inst *mul = new (mem_ctx) fs_inst(BRW_OPCODE_MUL, dst, src1, src2);
+      mul->force_uncompressed = true;
+      insts.push_tail(mul);
+   }
+
+   inline void emit_idiv(const struct brw_reg& dst,
+                         const struct brw_reg& src1,
+                         const struct brw_reg& src2)
+   {
+      fs_inst *idiv = new (mem_ctx) fs_inst(SHADER_OPCODE_INT_QUOTIENT, dst, src1, src2);
+      insts.push_tail(idiv);
+   }
+
+   inline void emit_irem(const struct brw_reg& dst,
+                         const struct brw_reg& src1,
+                         const struct brw_reg& src2)
+   {
+      fs_inst *irem = new (mem_ctx) fs_inst(SHADER_OPCODE_INT_REMAINDER, dst, src1, src2);
+      insts.push_tail(irem);
+   }
+
+   inline void emit_shr(const struct brw_reg& dst,
+                        const struct brw_reg& src1,
+                        const struct brw_reg& src2)
+   {
+      insts.push_tail(new (mem_ctx) fs_inst(BRW_OPCODE_SHR, dst, src1, src2));
+   }
+
+   inline void emit_shl(const struct brw_reg& dst,
+                        const struct brw_reg& src1,
+                        const struct brw_reg& src2)
+   {
+      insts.push_tail(new (mem_ctx) fs_inst(BRW_OPCODE_SHL, dst, src1, src2));
+   }
+
+   inline void emit_or(const struct brw_reg& dst,
+                       const struct brw_reg& src1,
+                       const struct brw_reg& src2)
+   {
+      insts.push_tail(new (mem_ctx) fs_inst(BRW_OPCODE_OR, dst, src1, src2));
+   }
+
+   inline void emit_frc(const struct brw_reg& dst,
+                        const struct brw_reg& src)
+   {
+      insts.push_tail(new (mem_ctx) fs_inst(BRW_OPCODE_FRC, dst, src));
+   }
+
+   inline void emit_rndd(const struct brw_reg& dst,
+                         const struct brw_reg& src)
+   {
+      insts.push_tail(new (mem_ctx) fs_inst(BRW_OPCODE_RNDD, dst, src));
+   }
+
+   inline void emit_cmp_if(int op,
+                           const struct brw_reg &x,
+                           const struct brw_reg &y)
+   {
+      emit_cmp(op, x, y);
+      insts.push_tail(new (mem_ctx) fs_inst(BRW_OPCODE_IF));
+   }
+
+   inline void emit_else(void)
+   {
+      insts.push_tail(new (mem_ctx) fs_inst(BRW_OPCODE_ELSE));
+   }
+
+   inline void emit_endif(void)
+   {
+      insts.push_tail(new (mem_ctx) fs_inst(BRW_OPCODE_ENDIF));
+   }
+
+private:
+   fs_inst *emit_cmp(int op, const struct brw_reg &x, const struct brw_reg &y);
+
+   struct brw_context *brw_ctx;
+   void *mem_ctx;
+   struct brw_wm_compile *c;
+   exec_list insts;
+   fs_generator generator;
+};
+
+#endif /* BRW_BLORP_BLIT_EU_H */
diff --git a/icd/intel/compiler/pipeline/brw_cfg.cpp b/icd/intel/compiler/pipeline/brw_cfg.cpp
new file mode 100644
index 0000000..53281c6
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_cfg.cpp
@@ -0,0 +1,316 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#include "brw_cfg.h"
+
+/** @file brw_cfg.cpp
+ *
+ * Walks the shader instructions generated and creates a set of basic
+ * blocks with successor/predecessor edges connecting them.
+ */
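+
+/* As a concrete example, an IF/ELSE/ENDIF sequence yields four blocks --
+ * the block ending in IF, the "then" block ending in ELSE, the "else"
+ * block, and the block starting at ENDIF -- with edges IF->then, IF->else,
+ * then->endif and else->endif (see the ENDIF case below).
+ */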
+
+static bblock_t *
+pop_stack(exec_list *list)
+{
+   bblock_link *link = (bblock_link *)list->get_tail();
+   bblock_t *block = link->block;
+   link->remove();
+
+   return block;
+}
+
+bblock_t::bblock_t() :
+   start_ip(0), end_ip(0), block_num(0)
+{
+   start = NULL;
+   end = NULL;
+
+   parents.make_empty();
+   children.make_empty();
+
+   if_inst = NULL;
+   else_inst = NULL;
+   endif_inst = NULL;
+}
+
+void
+bblock_t::add_successor(void *mem_ctx, bblock_t *successor)
+{
+   successor->parents.push_tail(new(mem_ctx) bblock_link(this));
+   children.push_tail(new(mem_ctx) bblock_link(successor));
+}
+
+void
+bblock_t::dump(backend_visitor *v)
+{
+   int ip = this->start_ip;
+   for (backend_instruction *inst = (backend_instruction *)this->start;
+	inst != this->end->next;
+	inst = (backend_instruction *) inst->next) {
+      fprintf(stderr, "%5d: ", ip);
+      v->dump_instruction(inst);
+      ip++;
+   }
+}
+
+cfg_t::cfg_t(exec_list *instructions)
+{
+   mem_ctx = ralloc_context(NULL);
+   block_list.make_empty();
+   blocks = NULL;
+   num_blocks = 0;
+
+   bblock_t *cur = NULL;
+   int ip = 0;
+
+   bblock_t *entry = new_block();
+   bblock_t *cur_if = NULL;    /**< BB ending with IF. */
+   bblock_t *cur_else = NULL;  /**< BB ending with ELSE. */
+   bblock_t *cur_endif = NULL; /**< BB starting with ENDIF. */
+   bblock_t *cur_do = NULL;    /**< BB ending with DO. */
+   bblock_t *cur_while = NULL; /**< BB immediately following WHILE. */
+   exec_list if_stack, else_stack, do_stack, while_stack;
+   bblock_t *next;
+
+   set_next_block(&cur, entry, ip);
+
+   entry->start = (backend_instruction *) instructions->get_head();
+
+   foreach_list(node, instructions) {
+      backend_instruction *inst = (backend_instruction *)node;
+
+      cur->end = inst;
+
+      /* set_next_block wants the post-incremented ip */
+      ip++;
+
+      switch (inst->opcode) {
+      case BRW_OPCODE_IF:
+	 /* Push our information onto a stack so we can recover from
+	  * nested ifs.
+	  */
+	 if_stack.push_tail(new(mem_ctx) bblock_link(cur_if));
+	 else_stack.push_tail(new(mem_ctx) bblock_link(cur_else));
+
+	 cur_if = cur;
+	 cur_else = NULL;
+         cur_endif = NULL;
+
+	 /* Set up our immediately following block, full of "then"
+	  * instructions.
+	  */
+	 next = new_block();
+	 next->start = (backend_instruction *)inst->next;
+	 cur_if->add_successor(mem_ctx, next);
+
+	 set_next_block(&cur, next, ip);
+	 break;
+
+      case BRW_OPCODE_ELSE:
+         cur_else = cur;
+
+	 next = new_block();
+	 next->start = (backend_instruction *)inst->next;
+	 cur_if->add_successor(mem_ctx, next);
+
+	 set_next_block(&cur, next, ip);
+	 break;
+
+      case BRW_OPCODE_ENDIF: {
+         if (cur->start == inst) {
+            /* New block was just created; use it. */
+            cur_endif = cur;
+         } else {
+            cur_endif = new_block();
+            cur_endif->start = inst;
+
+            cur->end = (backend_instruction *)inst->prev;
+            cur->add_successor(mem_ctx, cur_endif);
+
+            set_next_block(&cur, cur_endif, ip - 1);
+         }
+
+         backend_instruction *else_inst = NULL;
+         if (cur_else) {
+            else_inst = (backend_instruction *)cur_else->end;
+
+            cur_else->add_successor(mem_ctx, cur_endif);
+         } else {
+            cur_if->add_successor(mem_ctx, cur_endif);
+         }
+
+         assert(cur_if->end->opcode == BRW_OPCODE_IF);
+         assert(!else_inst || else_inst->opcode == BRW_OPCODE_ELSE);
+         assert(inst->opcode == BRW_OPCODE_ENDIF);
+
+         cur_if->if_inst = cur_if->end;
+         cur_if->else_inst = else_inst;
+         cur_if->endif_inst = inst;
+
+	 if (cur_else) {
+            cur_else->if_inst = cur_if->end;
+            cur_else->else_inst = else_inst;
+            cur_else->endif_inst = inst;
+         }
+
+         cur->if_inst = cur_if->end;
+         cur->else_inst = else_inst;
+         cur->endif_inst = inst;
+
+	 /* Pop the stack so we're in the previous if/else/endif */
+	 cur_if = pop_stack(&if_stack);
+	 cur_else = pop_stack(&else_stack);
+	 break;
+      }
+      case BRW_OPCODE_DO:
+	 /* Push our information onto a stack so we can recover from
+	  * nested loops.
+	  */
+	 do_stack.push_tail(new(mem_ctx) bblock_link(cur_do));
+	 while_stack.push_tail(new(mem_ctx) bblock_link(cur_while));
+
+	 /* Set up the block just after the WHILE.  We don't yet know
+	  * exactly where it will start.
+	  */
+	 cur_while = new_block();
+
+	 /* Set up our immediately following block: the start of the loop
+	  * body.
+	  */
+	 next = new_block();
+	 next->start = (backend_instruction *)inst->next;
+	 cur->add_successor(mem_ctx, next);
+	 cur_do = next;
+
+	 set_next_block(&cur, next, ip);
+	 break;
+
+      case BRW_OPCODE_CONTINUE:
+	 cur->add_successor(mem_ctx, cur_do);
+
+	 next = new_block();
+	 next->start = (backend_instruction *)inst->next;
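+	 /* An unpredicated CONTINUE (likewise BREAK, below) always jumps,
+	  * so the fall-through edge to the next block exists only when the
+	  * instruction is predicated.
+	  */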
+	 if (inst->predicate)
+	    cur->add_successor(mem_ctx, next);
+
+	 set_next_block(&cur, next, ip);
+	 break;
+
+      case BRW_OPCODE_BREAK:
+	 cur->add_successor(mem_ctx, cur_while);
+
+	 next = new_block();
+	 next->start = (backend_instruction *)inst->next;
+	 if (inst->predicate)
+	    cur->add_successor(mem_ctx, next);
+
+	 set_next_block(&cur, next, ip);
+	 break;
+
+      case BRW_OPCODE_WHILE:
+	 cur_while->start = (backend_instruction *)inst->next;
+
+	 cur->add_successor(mem_ctx, cur_do);
+	 set_next_block(&cur, cur_while, ip);
+
+	 /* Pop the stack so we're in the previous loop */
+	 cur_do = pop_stack(&do_stack);
+	 cur_while = pop_stack(&while_stack);
+	 break;
+
+      default:
+	 break;
+      }
+   }
+
+   cur->end_ip = ip;
+
+   make_block_array();
+}
+
+cfg_t::~cfg_t()
+{
+   ralloc_free(mem_ctx);
+}
+
+bblock_t *
+cfg_t::new_block()
+{
+   bblock_t *block = new(mem_ctx) bblock_t();
+
+   return block;
+}
+
+void
+cfg_t::set_next_block(bblock_t **cur, bblock_t *block, int ip)
+{
+   if (*cur) {
+      assert((*cur)->end->next == block->start);
+      (*cur)->end_ip = ip - 1;
+   }
+
+   block->start_ip = ip;
+   block->block_num = num_blocks++;
+   block_list.push_tail(new(mem_ctx) bblock_link(block));
+   *cur = block;
+}
+
+void
+cfg_t::make_block_array()
+{
+   blocks = ralloc_array(mem_ctx, bblock_t *, num_blocks);
+
+   int i = 0;
+   foreach_list(block_node, &block_list) {
+      bblock_link *link = (bblock_link *)block_node;
+      blocks[i++] = link->block;
+   }
+   assert(i == num_blocks);
+}
+
+void
+cfg_t::dump(backend_visitor *v)
+{
+   for (int b = 0; b < this->num_blocks; b++) {
+      bblock_t *block = this->blocks[b];
+      fprintf(stderr, "START B%d", b);
+      foreach_list(node, &block->parents) {
+         bblock_link *link = (bblock_link *)node;
+         fprintf(stderr, " <-B%d",
+                 link->block->block_num);
+      }
+      fprintf(stderr, "\n");
+      block->dump(v);
+      fprintf(stderr, "END B%d", b);
+      foreach_list(node, &block->children) {
+         bblock_link *link = (bblock_link *)node;
+         fprintf(stderr, " ->B%d",
+                 link->block->block_num);
+      }
+      fprintf(stderr, "\n");
+   }
+}
diff --git a/icd/intel/compiler/pipeline/brw_cfg.h b/icd/intel/compiler/pipeline/brw_cfg.h
new file mode 100644
index 0000000..7bd3e24
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_cfg.h
@@ -0,0 +1,91 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#include "brw_shader.h"
+
+class bblock_t;
+
+class bblock_link : public exec_node {
+public:
+   bblock_link(bblock_t *block)
+      : block(block)
+   {
+   }
+
+   bblock_t *block;
+};
+
+class bblock_t {
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(bblock_t)
+
+   bblock_t();
+
+   void add_successor(void *mem_ctx, bblock_t *successor);
+   void dump(backend_visitor *v);
+
+   backend_instruction *start;
+   backend_instruction *end;
+
+   int start_ip;
+   int end_ip;
+
+   exec_list parents;
+   exec_list children;
+   int block_num;
+
+   /* If the current basic block ends in an IF, ELSE, or ENDIF instruction,
+    * these pointers will hold the locations of the other associated control
+    * flow instructions.
+    *
+    * Otherwise they are NULL.
+    */
+   backend_instruction *if_inst;
+   backend_instruction *else_inst;
+   backend_instruction *endif_inst;
+};
+
+class cfg_t {
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(cfg_t)
+
+   cfg_t(exec_list *instructions);
+   ~cfg_t();
+
+   bblock_t *new_block();
+   void set_next_block(bblock_t **cur, bblock_t *block, int ip);
+   void make_block_array();
+
+   void dump(backend_visitor *v);
+
+   void *mem_ctx;
+
+   /** Ordered list (by ip) of basic blocks */
+   exec_list block_list;
+   bblock_t **blocks;
+   int num_blocks;
+};
diff --git a/icd/intel/compiler/pipeline/brw_context.c b/icd/intel/compiler/pipeline/brw_context.c
new file mode 100644
index 0000000..1d1ceaa
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_context.c
@@ -0,0 +1,1461 @@
+/*
+ Copyright 2003 VMware, Inc.
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+
+#include "main/api_exec.h"
+#include "main/context.h"
+#include "main/fbobject.h"
+#include "main/imports.h"
+#include "main/macros.h"
+#include "main/points.h"
+#include "main/version.h"
+#include "main/vtxfmt.h"
+
+#include "vbo/vbo_context.h"
+
+#include "drivers/common/driverfuncs.h"
+#include "drivers/common/meta.h"
+#include "utils.h"
+
+#include "brw_context.h"
+#include "brw_defines.h"
+#include "brw_draw.h"
+#include "brw_state.h"
+
+#include "intel_batchbuffer.h"
+#include "intel_buffer_objects.h"
+#include "intel_buffers.h"
+#include "intel_fbo.h"
+#include "intel_mipmap_tree.h"
+#include "intel_pixel.h"
+#include "intel_image.h"
+#include "intel_tex.h"
+#include "intel_tex_obj.h"
+
+#include "swrast_setup/swrast_setup.h"
+#include "tnl/tnl.h"
+#include "tnl/t_pipeline.h"
+#include "glsl/ralloc.h"
+
+#include "program/prog_diskcache.h"
+
+/***************************************
+ * Mesa's Driver Functions
+ ***************************************/
+
+static size_t
+brw_query_samples_for_format(struct gl_context *ctx, GLenum target,
+                             GLenum internalFormat, int samples[16])
+{
+   struct brw_context *brw = brw_context(ctx);
+
+   (void) target;
+
+   switch (brw->gen) {
+   case 8:
+      samples[0] = 8;
+      samples[1] = 4;
+      samples[2] = 2;
+      return 3;
+
+   case 7:
+      samples[0] = 8;
+      samples[1] = 4;
+      return 2;
+
+   case 6:
+      samples[0] = 4;
+      return 1;
+
+   default:
+      samples[0] = 1;
+      return 1;
+   }
+}
+
+const char *const brw_vendor_string = "Intel Open Source Technology Center";
+
+const char *
+brw_get_renderer_string(unsigned deviceID)
+{
+   const char *chipset;
+   static char buffer[128];
+
+   switch (deviceID) {
+#undef CHIPSET
+#define CHIPSET(id, symbol, str) case id: chipset = str; break;
+#include "pci_ids/i965_pci_ids.h"
+   default:
+      chipset = "Unknown Intel Chipset";
+      break;
+   }
+
+   (void) driGetRendererString(buffer, chipset, 0);
+   return buffer;
+}
+
+static const GLubyte *
+intelGetString(struct gl_context * ctx, GLenum name)
+{
+   const struct brw_context *const brw = brw_context(ctx);
+
+   switch (name) {
+   case GL_VENDOR:
+      return (GLubyte *) brw_vendor_string;
+
+   case GL_RENDERER:
+      return
+         (GLubyte *) brw_get_renderer_string(brw->intelScreen->deviceID);
+
+   default:
+      return NULL;
+   }
+}
+
+static void
+intel_viewport(struct gl_context *ctx)
+{
+   struct brw_context *brw = brw_context(ctx);
+   __DRIcontext *driContext = brw->driContext;
+
+   if (_mesa_is_winsys_fbo(ctx->DrawBuffer)) {
+      dri2InvalidateDrawable(driContext->driDrawablePriv);
+      dri2InvalidateDrawable(driContext->driReadablePriv);
+   }
+}
+
+static void
+intelInvalidateState(struct gl_context * ctx, GLuint new_state)
+{
+   struct brw_context *brw = brw_context(ctx);
+
+   if (ctx->swrast_context)
+      _swrast_InvalidateState(ctx, new_state);
+   _vbo_InvalidateState(ctx, new_state);
+
+   brw->NewGLState |= new_state;
+}
+
+#define flushFront(screen)      ((screen)->image.loader ? (screen)->image.loader->flushFrontBuffer : (screen)->dri2.loader->flushFrontBuffer)
+
+static void
+intel_flush_front(struct gl_context *ctx)
+{
+   struct brw_context *brw = brw_context(ctx);
+   __DRIcontext *driContext = brw->driContext;
+   __DRIdrawable *driDrawable = driContext->driDrawablePriv;
+   __DRIscreen *const screen = brw->intelScreen->driScrnPriv;
+
+   if (brw->front_buffer_dirty && _mesa_is_winsys_fbo(ctx->DrawBuffer)) {
+      if (flushFront(screen) && driDrawable &&
+          driDrawable->loaderPrivate) {
+
+         /* Resolve before flushing FAKE_FRONT_LEFT to FRONT_LEFT.
+          *
+          * This potentially resolves both front and back buffer. It
+          * is unnecessary to resolve the back, but harms nothing except
+          * performance. And no one cares about front-buffer render
+          * performance.
+          */
+         intel_resolve_for_dri2_flush(brw, driDrawable);
+         intel_batchbuffer_flush(brw);
+
+         flushFront(screen)(driDrawable, driDrawable->loaderPrivate);
+
+         /* We set the dirty bit in intel_prepare_render() if we're
+          * front buffer rendering once we get there.
+          */
+         brw->front_buffer_dirty = false;
+      }
+   }
+}
+
+static void
+intel_glFlush(struct gl_context *ctx)
+{
+   struct brw_context *brw = brw_context(ctx);
+
+   intel_batchbuffer_flush(brw);
+   intel_flush_front(ctx);
+   if (brw_is_front_buffer_drawing(ctx->DrawBuffer))
+      brw->need_throttle = true;
+}
+
+void
+intelFinish(struct gl_context * ctx)
+{
+   struct brw_context *brw = brw_context(ctx);
+
+   intel_glFlush(ctx);
+
+   if (brw->batch.last_bo)
+      drm_intel_bo_wait_rendering(brw->batch.last_bo);
+}
+
+static void
+brw_init_driver_functions(struct brw_context *brw,
+                          struct dd_function_table *functions)
+{
+   _mesa_init_driver_functions(functions);
+
+   /* GLX uses DRI2 invalidate events to handle window resizing.
+    * Unfortunately, EGL does not - libEGL is written in XCB (not Xlib),
+    * which doesn't provide a mechanism for snooping the event queues.
+    *
+    * So EGL still relies on viewport hacks to handle window resizing.
+    * This should go away with DRI3000.
+    */
+   if (!brw->driContext->driScreenPriv->dri2.useInvalidate)
+      functions->Viewport = intel_viewport;
+
+   functions->Flush = intel_glFlush;
+   functions->Finish = intelFinish;
+   functions->GetString = intelGetString;
+   functions->UpdateState = intelInvalidateState;
+
+   intelInitTextureFuncs(functions);
+   intelInitTextureImageFuncs(functions);
+   intelInitTextureSubImageFuncs(functions);
+   intelInitTextureCopyImageFuncs(functions);
+   intelInitClearFuncs(functions);
+   intelInitBufferFuncs(functions);
+   intelInitPixelFuncs(functions);
+   intelInitBufferObjectFuncs(functions);
+   intel_init_syncobj_functions(functions);
+   brw_init_object_purgeable_functions(functions);
+
+   brwInitFragProgFuncs( functions );
+   brw_init_common_queryobj_functions(functions);
+   if (brw->gen >= 6)
+      gen6_init_queryobj_functions(functions);
+   else
+      gen4_init_queryobj_functions(functions);
+
+   functions->QuerySamplesForFormat = brw_query_samples_for_format;
+
+   functions->NewTransformFeedback = brw_new_transform_feedback;
+   functions->DeleteTransformFeedback = brw_delete_transform_feedback;
+   functions->GetTransformFeedbackVertexCount =
+      brw_get_transform_feedback_vertex_count;
+   if (brw->gen >= 7) {
+      functions->BeginTransformFeedback = gen7_begin_transform_feedback;
+      functions->EndTransformFeedback = gen7_end_transform_feedback;
+      functions->PauseTransformFeedback = gen7_pause_transform_feedback;
+      functions->ResumeTransformFeedback = gen7_resume_transform_feedback;
+   } else {
+      functions->BeginTransformFeedback = brw_begin_transform_feedback;
+      functions->EndTransformFeedback = brw_end_transform_feedback;
+   }
+
+   if (brw->gen >= 6)
+      functions->GetSamplePosition = gen6_get_sample_position;
+}
+
+static void
+brw_initialize_context_constants(struct brw_context *brw)
+{
+   struct gl_context *ctx = &brw->ctx;
+
+   unsigned max_samplers =
+      brw->gen >= 8 || brw->is_haswell ? BRW_MAX_TEX_UNIT : 16;
+
+   ctx->Const.QueryCounterBits.Timestamp = 36;
+
+   ctx->Const.StripTextureBorder = true;
+
+   ctx->Const.MaxDualSourceDrawBuffers = 1;
+   ctx->Const.MaxDrawBuffers = BRW_MAX_DRAW_BUFFERS;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits = max_samplers;
+   ctx->Const.MaxTextureCoordUnits = 8; /* Mesa limit */
+   ctx->Const.MaxTextureUnits =
+      MIN2(ctx->Const.MaxTextureCoordUnits,
+           ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits);
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxTextureImageUnits = max_samplers;
+   if (brw->gen >= 7)
+      ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits = max_samplers;
+   else
+      ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits = 0;
+   if (getenv("INTEL_COMPUTE_SHADER")) {
+      ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = BRW_MAX_TEX_UNIT;
+      ctx->Const.MaxUniformBufferBindings += 12;
+   } else {
+      ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 0;
+   }
+   ctx->Const.MaxCombinedTextureImageUnits =
+      ctx->Const.Program[MESA_SHADER_VERTEX].MaxTextureImageUnits +
+      ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits +
+      ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits +
+      ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits;
+
+   ctx->Const.MaxTextureLevels = 14; /* 8192 */
+   if (ctx->Const.MaxTextureLevels > MAX_TEXTURE_LEVELS)
+      ctx->Const.MaxTextureLevels = MAX_TEXTURE_LEVELS;
+   ctx->Const.Max3DTextureLevels = 12; /* 2048 */
+   ctx->Const.MaxCubeTextureLevels = 14; /* 8192 */
+   ctx->Const.MaxTextureMbytes = 1536;
+
+   if (brw->gen >= 7)
+      ctx->Const.MaxArrayTextureLayers = 2048;
+   else
+      ctx->Const.MaxArrayTextureLayers = 512;
+
+   ctx->Const.MaxTextureRectSize = 1 << 12;
+
+   ctx->Const.MaxTextureMaxAnisotropy = 16.0;
+
+   ctx->Const.MaxRenderbufferSize = 8192;
+
+   /* Hardware only supports a limited number of transform feedback buffers.
+    * So we need to override the Mesa default (which is based only on software
+    * limits).
+    */
+   ctx->Const.MaxTransformFeedbackBuffers = BRW_MAX_SOL_BUFFERS;
+
+   /* On Gen6, in the worst case, we use up one binding table entry per
+    * transform feedback component (see comments above the definition of
+    * BRW_MAX_SOL_BINDINGS, in brw_context.h), so we need to advertise a value
+    * for MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS equal to
+    * BRW_MAX_SOL_BINDINGS.
+    *
+    * In "separate components" mode, we need to divide this value by
+    * BRW_MAX_SOL_BUFFERS, so that the total number of binding table entries
+    * used up by all buffers will not exceed BRW_MAX_SOL_BINDINGS.
+    */
+   ctx->Const.MaxTransformFeedbackInterleavedComponents = BRW_MAX_SOL_BINDINGS;
+   ctx->Const.MaxTransformFeedbackSeparateComponents =
+      BRW_MAX_SOL_BINDINGS / BRW_MAX_SOL_BUFFERS;
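+
+   /* Concretely, assuming the definitions elsewhere in this tree are
+    * BRW_MAX_SOL_BINDINGS == 64 and BRW_MAX_SOL_BUFFERS == 4, this
+    * advertises 64 interleaved components and 64 / 4 == 16 separate
+    * components.
+    */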
+
+   ctx->Const.AlwaysUseGetTransformFeedbackVertexCount = true;
+
+   int max_samples;
+   const int *msaa_modes = intel_supported_msaa_modes(brw->intelScreen);
+   const int clamp_max_samples =
+      driQueryOptioni(&brw->optionCache, "clamp_max_samples");
+
+   if (clamp_max_samples < 0) {
+      max_samples = msaa_modes[0];
+   } else {
+      /* Select the largest supported MSAA mode that does not exceed
+       * clamp_max_samples.
+       */
+      max_samples = 0;
+      for (int i = 0; msaa_modes[i] != 0; ++i) {
+         if (msaa_modes[i] <= clamp_max_samples) {
+            max_samples = msaa_modes[i];
+            break;
+         }
+      }
+   }
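+
+   /* This relies on intel_supported_msaa_modes() returning the modes in
+    * descending order, terminated by 0, so the first mode that fits the
+    * clamp is also the largest one.
+    */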
+
+   ctx->Const.MaxSamples = max_samples;
+   ctx->Const.MaxColorTextureSamples = max_samples;
+   ctx->Const.MaxDepthTextureSamples = max_samples;
+   ctx->Const.MaxIntegerSamples = max_samples;
+
+   if (brw->gen >= 7)
+      ctx->Const.MaxProgramTextureGatherComponents = 4;
+   else if (brw->gen == 6)
+      ctx->Const.MaxProgramTextureGatherComponents = 1;
+
+   ctx->Const.MinLineWidth = 1.0;
+   ctx->Const.MinLineWidthAA = 1.0;
+   ctx->Const.MaxLineWidth = 5.0;
+   ctx->Const.MaxLineWidthAA = 5.0;
+   ctx->Const.LineWidthGranularity = 0.5;
+
+   ctx->Const.MinPointSize = 1.0;
+   ctx->Const.MinPointSizeAA = 1.0;
+   ctx->Const.MaxPointSize = 255.0;
+   ctx->Const.MaxPointSizeAA = 255.0;
+   ctx->Const.PointSizeGranularity = 1.0;
+
+   if (brw->gen >= 5 || brw->is_g4x)
+      ctx->Const.MaxClipPlanes = 8;
+
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxNativeInstructions = 16 * 1024;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxAluInstructions = 0;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxTexInstructions = 0;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxTexIndirections = 0;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxNativeAluInstructions = 0;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxNativeTexInstructions = 0;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxNativeTexIndirections = 0;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxNativeAttribs = 16;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxNativeTemps = 256;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxNativeAddressRegs = 1;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxNativeParameters = 1024;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxEnvParams =
+      MIN2(ctx->Const.Program[MESA_SHADER_VERTEX].MaxNativeParameters,
+	   ctx->Const.Program[MESA_SHADER_VERTEX].MaxEnvParams);
+
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxNativeInstructions = 1024;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxNativeAluInstructions = 1024;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxNativeTexInstructions = 1024;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxNativeTexIndirections = 1024;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxNativeAttribs = 12;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxNativeTemps = 256;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxNativeAddressRegs = 0;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxNativeParameters = 1024;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxEnvParams =
+      MIN2(ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxNativeParameters,
+	   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxEnvParams);
+
+   /* Fragment shaders use real, 32-bit twos-complement integers for all
+    * integer types.
+    */
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].LowInt.RangeMin = 31;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].LowInt.RangeMax = 30;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].LowInt.Precision = 0;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].HighInt = ctx->Const.Program[MESA_SHADER_FRAGMENT].LowInt;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MediumInt = ctx->Const.Program[MESA_SHADER_FRAGMENT].LowInt;
+
+   if (brw->gen >= 7) {
+      ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxAtomicCounters = MAX_ATOMIC_COUNTERS;
+      ctx->Const.Program[MESA_SHADER_VERTEX].MaxAtomicCounters = MAX_ATOMIC_COUNTERS;
+      ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxAtomicCounters = MAX_ATOMIC_COUNTERS;
+      ctx->Const.Program[MESA_SHADER_COMPUTE].MaxAtomicCounters = MAX_ATOMIC_COUNTERS;
+      ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxAtomicBuffers = BRW_MAX_ABO;
+      ctx->Const.Program[MESA_SHADER_VERTEX].MaxAtomicBuffers = BRW_MAX_ABO;
+      ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxAtomicBuffers = BRW_MAX_ABO;
+      ctx->Const.Program[MESA_SHADER_COMPUTE].MaxAtomicBuffers = BRW_MAX_ABO;
+      ctx->Const.MaxCombinedAtomicBuffers = 3 * BRW_MAX_ABO;
+   }
+
+   /* Gen6 converts quads to polygons at the beginning of the 3D pipeline,
+    * but we're not sure how that interacts with vertex order, which affects
+    * the provoking-vertex decision.  Always use the last-vertex convention
+    * for quad primitives; that works as expected for now.
+    */
+   if (brw->gen >= 6)
+      ctx->Const.QuadsFollowProvokingVertexConvention = false;
+
+   ctx->Const.NativeIntegers = true;
+   ctx->Const.UniformBooleanTrue = 1;
+
+   /* From the gen4 PRM, volume 4 page 127:
+    *
+    *     "For SURFTYPE_BUFFER non-rendertarget surfaces, this field specifies
+    *      the base address of the first element of the surface, computed in
+    *      software by adding the surface base address to the byte offset of
+    *      the element in the buffer."
+    *
+    * However, unaligned accesses are slower, so enforce buffer alignment.
+    */
+   ctx->Const.UniformBufferOffsetAlignment = 16;
+   ctx->Const.TextureBufferOffsetAlignment = 16;
+
+   if (brw->gen >= 6) {
+      ctx->Const.MaxVarying = 32;
+      ctx->Const.Program[MESA_SHADER_VERTEX].MaxOutputComponents = 128;
+      ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxInputComponents = 64;
+      ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxOutputComponents = 128;
+      ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxInputComponents = 128;
+   }
+
+   /* We want the GLSL compiler to emit code that uses condition codes */
+   for (int i = 0; i < MESA_SHADER_STAGES; i++) {
+      ctx->ShaderCompilerOptions[i].MaxIfDepth = brw->gen < 6 ? 16 : UINT_MAX;
+      ctx->ShaderCompilerOptions[i].EmitCondCodes = true;
+      ctx->ShaderCompilerOptions[i].EmitNoNoise = true;
+      ctx->ShaderCompilerOptions[i].EmitNoMainReturn = true;
+      ctx->ShaderCompilerOptions[i].EmitNoIndirectInput = true;
+      ctx->ShaderCompilerOptions[i].EmitNoIndirectOutput =
+	 (i == MESA_SHADER_FRAGMENT);
+      ctx->ShaderCompilerOptions[i].EmitNoIndirectTemp =
+	 (i == MESA_SHADER_FRAGMENT);
+      ctx->ShaderCompilerOptions[i].EmitNoIndirectUniform = false;
+      ctx->ShaderCompilerOptions[i].LowerClipDistance = true;
+   }
+
+   ctx->ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS = true;
+   ctx->ShaderCompilerOptions[MESA_SHADER_GEOMETRY].OptimizeForAOS = true;
+
+   /* ARB_viewport_array */
+   if (brw->gen >= 7 && (ctx->API == API_OPENGL_CORE || ctx->API == API_VK)) {
+      ctx->Const.MaxViewports = GEN7_NUM_VIEWPORTS;
+      ctx->Const.ViewportSubpixelBits = 0;
+
+      /* Cast to float before negating because MaxViewportWidth is unsigned.
+       */
+      ctx->Const.ViewportBounds.Min = -(float)ctx->Const.MaxViewportWidth;
+      ctx->Const.ViewportBounds.Max = ctx->Const.MaxViewportWidth;
+   }
+
+   if (unlikely(INTEL_DEBUG & DEBUG_DRI)) {
+      switch (ctx->Const.GlassMode) {
+      case DRI_CONF_GLASS_MODE_NEVER:     fprintf(stderr, "GlassMode = never\n");     break;
+      case DRI_CONF_GLASS_MODE_WHITELIST: fprintf(stderr, "GlassMode = whitelist\n"); break;
+      case DRI_CONF_GLASS_MODE_ALWAYS:    fprintf(stderr, "GlassMode = always\n");    break;
+      default:                            fprintf(stderr, "GlassMode = unknown\n");   break;
+      }
+   }
+}
+
+/**
+ * Process driconf (drirc) options, setting appropriate context flags.
+ *
+ * intelInitExtensions still pokes at optionCache directly, in order to
+ * avoid advertising various extensions.  No flags are set, so it makes
+ * sense to continue doing that there.
+ */
+static void
+brw_process_driconf_options(struct brw_context *brw)
+{
+   struct gl_context *ctx = &brw->ctx;
+
+   driOptionCache *options = &brw->optionCache;
+   driParseConfigFiles(options, &brw->intelScreen->optionCache,
+                       brw->driContext->driScreenPriv->myNum, "i965");
+
+   int bo_reuse_mode = driQueryOptioni(options, "bo_reuse");
+   switch (bo_reuse_mode) {
+   case DRI_CONF_BO_REUSE_DISABLED:
+      break;
+   case DRI_CONF_BO_REUSE_ALL:
+      intel_bufmgr_gem_enable_reuse(brw->bufmgr);
+      break;
+   }
+
+   if (!driQueryOptionb(options, "hiz")) {
+       brw->has_hiz = false;
+       /* On gen6, you can only do separate stencil with HIZ. */
+       if (brw->gen == 6)
+          brw->has_separate_stencil = false;
+   }
+
+   if (driQueryOptionb(options, "always_flush_batch")) {
+      fprintf(stderr, "flushing batchbuffer before/after each draw call\n");
+      brw->always_flush_batch = true;
+   }
+
+   if (driQueryOptionb(options, "always_flush_cache")) {
+      fprintf(stderr, "flushing GPU caches before/after each draw call\n");
+      brw->always_flush_cache = true;
+   }
+
+   if (driQueryOptionb(options, "disable_throttling")) {
+      fprintf(stderr, "disabling flush throttling\n");
+      brw->disable_throttling = true;
+   }
+
+   brw->disable_derivative_optimization =
+      driQueryOptionb(&brw->optionCache, "disable_derivative_optimization");
+
+   brw->precompile = driQueryOptionb(&brw->optionCache, "shader_precompile");
+
+   ctx->Const.ForceGLSLExtensionsWarn =
+      driQueryOptionb(options, "force_glsl_extensions_warn");
+
+   ctx->Const.DisableGLSLLineContinuations =
+      driQueryOptionb(options, "disable_glsl_line_continuations");
+
+   ctx->Const.GlassMode = driQueryOptioni(&brw->optionCache, "glass_mode");
+   ctx->Const.GlassEnableReassociation =
+      driQueryOptionb(&brw->optionCache, "glass_enable_reassociation");
+
+   const int multithread_glsl_compiler =
+      driQueryOptioni(options, "multithread_glsl_compiler");
+   if (multithread_glsl_compiler > 0) {
+      const int max_threads = (multithread_glsl_compiler > 1) ?
+         multithread_glsl_compiler : 2;
+
+      _mesa_enable_glsl_threadpool(ctx, max_threads);
+      ctx->Const.DeferCompileShader = GL_TRUE;
+      ctx->Const.DeferLinkProgram = GL_TRUE;
+   }
+
+   int max_shader_cache_size =
+      driQueryOptioni(options, "max_shader_cache_size");
+   if (max_shader_cache_size != 0)
+      ctx->Const.MaxShaderCacheSize = (unsigned) max_shader_cache_size;
+
+   mesa_program_diskcache_init(ctx);
+   mesa_shader_diskcache_init(ctx);
+}
+
+GLboolean
+brwCreateContext(gl_api api,
+	         const struct gl_config *mesaVis,
+		 __DRIcontext *driContextPriv,
+                 unsigned major_version,
+                 unsigned minor_version,
+                 uint32_t flags,
+                 bool notify_reset,
+                 unsigned *dri_ctx_error,
+	         void *sharedContextPrivate)
+{
+   __DRIscreen *sPriv = driContextPriv->driScreenPriv;
+   struct gl_context *shareCtx = (struct gl_context *) sharedContextPrivate;
+   struct intel_screen *screen = sPriv->driverPrivate;
+   const struct brw_device_info *devinfo = screen->devinfo;
+   struct dd_function_table functions;
+
+   /* Only allow the __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS flag if the kernel
+    * provides us with context reset notifications.
+    */
+   uint32_t allowed_flags = __DRI_CTX_FLAG_DEBUG
+      | __DRI_CTX_FLAG_FORWARD_COMPATIBLE;
+
+   if (screen->has_context_reset_notification)
+      allowed_flags |= __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS;
+
+   if (flags & ~allowed_flags) {
+      *dri_ctx_error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
+      return false;
+   }
+
+   struct brw_context *brw = rzalloc(NULL, struct brw_context);
+   if (!brw) {
+      fprintf(stderr, "%s: failed to alloc context\n", __FUNCTION__);
+      *dri_ctx_error = __DRI_CTX_ERROR_NO_MEMORY;
+      return false;
+   }
+
+   driContextPriv->driverPrivate = brw;
+   brw->driContext = driContextPriv;
+   brw->intelScreen = screen;
+   brw->bufmgr = screen->bufmgr;
+
+   brw->gen = devinfo->gen;
+   brw->gt = devinfo->gt;
+   brw->is_g4x = devinfo->is_g4x;
+   brw->is_baytrail = devinfo->is_baytrail;
+   brw->is_haswell = devinfo->is_haswell;
+   brw->has_llc = devinfo->has_llc;
+   brw->has_hiz = devinfo->has_hiz_and_separate_stencil;
+   brw->has_separate_stencil = devinfo->has_hiz_and_separate_stencil;
+   brw->has_pln = devinfo->has_pln;
+   brw->has_compr4 = devinfo->has_compr4;
+   brw->has_surface_tile_offset = devinfo->has_surface_tile_offset;
+   brw->has_negative_rhw_bug = devinfo->has_negative_rhw_bug;
+   brw->needs_unlit_centroid_workaround =
+      devinfo->needs_unlit_centroid_workaround;
+
+   brw->must_use_separate_stencil = screen->hw_must_use_separate_stencil;
+   brw->has_swizzling = screen->hw_has_swizzling;
+
+   brw->vs.base.stage = MESA_SHADER_VERTEX;
+   brw->gs.base.stage = MESA_SHADER_GEOMETRY;
+   brw->wm.base.stage = MESA_SHADER_FRAGMENT;
+   if (brw->gen >= 8) {
+      gen8_init_vtable_surface_functions(brw);
+      gen7_init_vtable_sampler_functions(brw);
+      brw->vtbl.emit_depth_stencil_hiz = gen8_emit_depth_stencil_hiz;
+   } else if (brw->gen >= 7) {
+      gen7_init_vtable_surface_functions(brw);
+      gen7_init_vtable_sampler_functions(brw);
+      brw->vtbl.emit_depth_stencil_hiz = gen7_emit_depth_stencil_hiz;
+   } else {
+      gen4_init_vtable_surface_functions(brw);
+      gen4_init_vtable_sampler_functions(brw);
+      brw->vtbl.emit_depth_stencil_hiz = brw_emit_depth_stencil_hiz;
+   }
+
+   brw_init_driver_functions(brw, &functions);
+
+   if (notify_reset)
+      functions.GetGraphicsResetStatus = brw_get_graphics_reset_status;
+
+   struct gl_context *ctx = &brw->ctx;
+
+   if (!_mesa_initialize_context(ctx, api, mesaVis, shareCtx, &functions)) {
+      *dri_ctx_error = __DRI_CTX_ERROR_NO_MEMORY;
+      fprintf(stderr, "%s: failed to init mesa context\n", __FUNCTION__);
+      intelDestroyContext(driContextPriv);
+      return false;
+   }
+
+   driContextSetFlags(ctx, flags);
+
+   /* Initialize the software rasterizer and helper modules.
+    *
+    * As of GL 3.1 core, the gen4+ driver doesn't need the swrast context for
+    * software fallbacks (which we have to support on legacy GL to do weird
+    * glDrawPixels(), glBitmap(), and other functions).
+    */
+   if (api != API_OPENGL_CORE && api != API_OPENGLES2) {
+      _swrast_CreateContext(ctx);
+   }
+
+   _vbo_CreateContext(ctx);
+   if (ctx->swrast_context) {
+      _tnl_CreateContext(ctx);
+      TNL_CONTEXT(ctx)->Driver.RunPipeline = _tnl_run_pipeline;
+      _swsetup_CreateContext(ctx);
+
+      /* Configure swrast to match hardware characteristics: */
+      _swrast_allow_pixel_fog(ctx, false);
+      _swrast_allow_vertex_fog(ctx, true);
+   }
+
+   _mesa_meta_init(ctx);
+
+   brw_process_driconf_options(brw);
+   brw_process_intel_debug_variable(brw);
+   brw_initialize_context_constants(brw);
+
+   ctx->Const.ResetStrategy = notify_reset
+      ? GL_LOSE_CONTEXT_ON_RESET_ARB : GL_NO_RESET_NOTIFICATION_ARB;
+
+   /* Reinitialize the context point state.  It depends on ctx->Const values. */
+   _mesa_init_point(ctx);
+
+   intel_fbo_init(brw);
+
+   intel_batchbuffer_init(brw);
+
+   if (brw->gen >= 6) {
+      /* Create a new hardware context.  Using a hardware context means that
+       * our GPU state will be saved/restored on context switch, allowing us
+       * to assume that the GPU is in the same state we left it in.
+       *
+       * This is required for transform feedback buffer offsets, query objects,
+       * and also allows us to reduce how much state we have to emit.
+       */
+      brw->hw_ctx = drm_intel_gem_context_create(brw->bufmgr);
+
+      if (!brw->hw_ctx) {
+         fprintf(stderr, "Gen6+ requires Kernel 3.6 or later.\n");
+         intelDestroyContext(driContextPriv);
+         return false;
+      }
+   }
+
+   brw_init_state(brw);
+
+   intelInitExtensions(ctx);
+
+   brw_init_surface_formats(brw);
+
+   brw->max_vs_threads = devinfo->max_vs_threads;
+   brw->max_gs_threads = devinfo->max_gs_threads;
+   brw->max_wm_threads = devinfo->max_wm_threads;
+   brw->urb.size = devinfo->urb.size;
+   brw->urb.min_vs_entries = devinfo->urb.min_vs_entries;
+   brw->urb.max_vs_entries = devinfo->urb.max_vs_entries;
+   brw->urb.max_gs_entries = devinfo->urb.max_gs_entries;
+
+   /* Estimate the size of the mappable aperture into the GTT.  There's an
+    * ioctl to get the whole GTT size, but not one to get the mappable subset.
+    * It turns out it's basically always 256MB, though some ancient hardware
+    * was smaller.
+    */
+   uint32_t gtt_size = 256 * 1024 * 1024;
+
+   /* We don't want to map two objects such that a memcpy between them would
+    * just fault one mapping in and then the other over and over forever.  So
+    * we would need to divide the GTT size by 2.  Additionally, some GTT is
+    * taken up by things like the framebuffer and the ringbuffer and such, so
+    * be more conservative.
+    */
+   brw->max_gtt_map_object_size = gtt_size / 4;
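+   /* With the 256MB aperture estimate above, this caps any single mapped
+    * object at 64MB.
+    */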
+
+   if (brw->gen == 6)
+      brw->urb.gen6_gs_previously_active = false;
+
+   brw->prim_restart.in_progress = false;
+   brw->prim_restart.enable_cut_index = false;
+   brw->gs.enabled = false;
+
+   if (brw->gen < 6) {
+      brw->curbe.last_buf = calloc(1, 4096);
+      brw->curbe.next_buf = calloc(1, 4096);
+   }
+
+   ctx->VertexProgram._MaintainTnlProgram = true;
+   ctx->FragmentProgram._MaintainTexEnvProgram = true;
+
+   brw_draw_init( brw );
+
+   if ((flags & __DRI_CTX_FLAG_DEBUG) != 0) {
+      /* Turn on some extra GL_ARB_debug_output generation. */
+      brw->perf_debug = true;
+   }
+
+   if ((flags & __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS) != 0)
+      ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_ROBUST_ACCESS_BIT_ARB;
+
+   if (INTEL_DEBUG & DEBUG_SHADER_TIME)
+      brw_init_shader_time(brw);
+
+   /* brw_shader_precompile is not thread-safe when debug flags are set */
+   if (brw->precompile && (INTEL_DEBUG || brw->perf_debug))
+      ctx->Const.DeferLinkProgram = GL_FALSE;
+
+   _mesa_compute_version(ctx);
+
+   _mesa_initialize_dispatch_tables(ctx);
+   _mesa_initialize_vbo_vtxfmt(ctx);
+
+   if (ctx->Extensions.AMD_performance_monitor) {
+      brw_init_performance_monitors(brw);
+   }
+
+   return true;
+}
+
+void
+intelDestroyContext(__DRIcontext * driContextPriv)
+{
+   struct brw_context *brw =
+      (struct brw_context *) driContextPriv->driverPrivate;
+   struct gl_context *ctx = &brw->ctx;
+
+   assert(brw); /* should never be null */
+   if (!brw)
+      return;
+
+   /* Dump a final BMP in case the application doesn't call SwapBuffers */
+   if (INTEL_DEBUG & DEBUG_AUB) {
+      intel_batchbuffer_flush(brw);
+      aub_dump_bmp(&brw->ctx);
+   }
+
+   _mesa_meta_free(&brw->ctx);
+
+   if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
+      /* Force a report. */
+      brw->shader_time.report_time = 0;
+
+      brw_collect_and_report_shader_time(brw);
+      brw_destroy_shader_time(brw);
+   }
+
+   brw_destroy_state(brw);
+   brw_draw_destroy(brw);
+
+   drm_intel_bo_unreference(brw->curbe.curbe_bo);
+
+   free(brw->curbe.last_buf);
+   free(brw->curbe.next_buf);
+
+   drm_intel_gem_context_destroy(brw->hw_ctx);
+
+   if (ctx->swrast_context) {
+      _swsetup_DestroyContext(&brw->ctx);
+      _tnl_DestroyContext(&brw->ctx);
+   }
+   _vbo_DestroyContext(&brw->ctx);
+
+   if (ctx->swrast_context)
+      _swrast_DestroyContext(&brw->ctx);
+
+   intel_batchbuffer_free(brw);
+
+   drm_intel_bo_unreference(brw->first_post_swapbuffers_batch);
+   brw->first_post_swapbuffers_batch = NULL;
+
+   driDestroyOptionCache(&brw->optionCache);
+
+   /* free the Mesa context */
+   _mesa_free_context_data(&brw->ctx);
+
+   ralloc_free(brw);
+   driContextPriv->driverPrivate = NULL;
+}
+
+GLboolean
+intelUnbindContext(__DRIcontext * driContextPriv)
+{
+   /* Unset current context and dispatch table */
+   _mesa_make_current(NULL, NULL, NULL);
+
+   return true;
+}
+
+/**
+ * Fixes up the context for GLES2/3 with our default-to-sRGB-capable behavior
+ * on window system framebuffers.
+ *
+ * Desktop GL is fairly reasonable in its handling of sRGB: You can ask if
+ * your renderbuffer can do sRGB encode, and you can flip a switch that does
+ * sRGB encode if the renderbuffer can handle it.  You can ask specifically
+ * for a visual where you're guaranteed to be capable, but it turns out that
+ * everyone just makes all their ARGB8888 visuals capable and doesn't offer
+ * incapable ones, because there's no difference between the two in resources
+ * used.  Applications thus get built that accidentally rely on the default
+ * visual choice being sRGB, so we make ours sRGB capable.  Everything sounds
+ * great...
+ *
+ * But for GLES2/3, they decided that it was silly to not turn on sRGB encode
+ * for sRGB renderbuffers you made with the GL_EXT_texture_sRGB equivalent.
+ * So they removed the enable knob and made it "if the renderbuffer is sRGB
+ * capable, do sRGB encode".  Then, for your window system renderbuffers, you
+ * can ask for sRGB visuals and get sRGB encode, or not ask for sRGB visuals
+ * and get no sRGB encode (assuming that both kinds of visual are available).
+ * Thus our choice to support sRGB by default on our visuals for desktop would
+ * result in broken rendering of GLES apps that aren't expecting sRGB encode.
+ *
+ * Unfortunately, renderbuffer setup happens before a context is created.  So
+ * in intel_screen.c we always set up sRGB, and here, if you're a GLES2/3
+ * context (without an sRGB visual, though we don't have sRGB visuals exposed
+ * yet), we go turn that back off before anyone finds out.
+ */
+static void
+intel_gles3_srgb_workaround(struct brw_context *brw,
+                            struct gl_framebuffer *fb)
+{
+   struct gl_context *ctx = &brw->ctx;
+
+   if (_mesa_is_desktop_gl(ctx) || !fb->Visual.sRGBCapable)
+      return;
+
+   /* Some day when we support the sRGB capable bit on visuals available for
+    * GLES, we'll need to respect that and not disable things here.
+    */
+   fb->Visual.sRGBCapable = false;
+   for (int i = 0; i < BUFFER_COUNT; i++) {
+      if (fb->Attachment[i].Renderbuffer &&
+          fb->Attachment[i].Renderbuffer->Format == MESA_FORMAT_B8G8R8A8_SRGB) {
+         fb->Attachment[i].Renderbuffer->Format = MESA_FORMAT_B8G8R8A8_UNORM;
+      }
+   }
+}
+
+GLboolean
+intelMakeCurrent(__DRIcontext * driContextPriv,
+                 __DRIdrawable * driDrawPriv,
+                 __DRIdrawable * driReadPriv)
+{
+   struct brw_context *brw;
+   GET_CURRENT_CONTEXT(curCtx);
+
+   if (driContextPriv)
+      brw = (struct brw_context *) driContextPriv->driverPrivate;
+   else
+      brw = NULL;
+
+   /* According to the glXMakeCurrent() man page: "Pending commands to
+    * the previous context, if any, are flushed before it is released."
+    * But only flush if we're actually changing contexts.
+    */
+   if (brw_context(curCtx) && brw_context(curCtx) != brw) {
+      _mesa_flush(curCtx);
+   }
+
+   if (driContextPriv) {
+      struct gl_context *ctx = &brw->ctx;
+      struct gl_framebuffer *fb, *readFb;
+
+      if (driDrawPriv == NULL && driReadPriv == NULL) {
+         fb = _mesa_get_incomplete_framebuffer();
+         readFb = _mesa_get_incomplete_framebuffer();
+      } else {
+         fb = driDrawPriv->driverPrivate;
+         readFb = driReadPriv->driverPrivate;
+         driContextPriv->dri2.draw_stamp = driDrawPriv->dri2.stamp - 1;
+         driContextPriv->dri2.read_stamp = driReadPriv->dri2.stamp - 1;
+      }
+
+      /* The sRGB workaround changes the renderbuffer's format. We must change
+       * the format before the renderbuffer's miptree gets allocated; otherwise
+       * the formats of the renderbuffer and its miptree will differ.
+       */
+      intel_gles3_srgb_workaround(brw, fb);
+      intel_gles3_srgb_workaround(brw, readFb);
+
+      /* If the context viewport hasn't been initialized, force a call out to
+       * the loader to get buffers so we have a drawable size for the initial
+       * viewport. */
+      if (!brw->ctx.ViewportInitialized)
+         intel_prepare_render(brw);
+
+      _mesa_make_current(ctx, fb, readFb);
+   } else {
+      _mesa_make_current(NULL, NULL, NULL);
+   }
+
+   return true;
+}
+
+void
+intel_resolve_for_dri2_flush(struct brw_context *brw,
+                             __DRIdrawable *drawable)
+{
+   if (brw->gen < 6) {
+      /* MSAA and fast color clear are not supported, so don't waste time
+       * checking whether a resolve is needed.
+       */
+      return;
+   }
+
+   struct gl_framebuffer *fb = drawable->driverPrivate;
+   struct intel_renderbuffer *rb;
+
+   /* Usually, only the back buffer will need to be downsampled. However,
+    * the front buffer will also need it if the user has rendered into it.
+    */
+   static const gl_buffer_index buffers[2] = {
+         BUFFER_BACK_LEFT,
+         BUFFER_FRONT_LEFT,
+   };
+
+   for (int i = 0; i < 2; ++i) {
+      rb = intel_get_renderbuffer(fb, buffers[i]);
+      if (rb == NULL || rb->mt == NULL)
+         continue;
+      if (rb->mt->num_samples <= 1)
+         intel_miptree_resolve_color(brw, rb->mt);
+      else
+         intel_renderbuffer_downsample(brw, rb);
+   }
+}
+
+static unsigned
+intel_bits_per_pixel(const struct intel_renderbuffer *rb)
+{
+   return _mesa_get_format_bytes(intel_rb_format(rb)) * 8;
+}
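+/* For example, intel_bits_per_pixel() above returns 32 for a
+ * MESA_FORMAT_B8G8R8A8_UNORM renderbuffer (4 bytes per pixel).
+ */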
+
+static void
+intel_query_dri2_buffers(struct brw_context *brw,
+                         __DRIdrawable *drawable,
+                         __DRIbuffer **buffers,
+                         int *count);
+
+static void
+intel_process_dri2_buffer(struct brw_context *brw,
+                          __DRIdrawable *drawable,
+                          __DRIbuffer *buffer,
+                          struct intel_renderbuffer *rb,
+                          const char *buffer_name);
+
+static void
+intel_update_image_buffers(struct brw_context *brw, __DRIdrawable *drawable);
+
+static void
+intel_update_dri2_buffers(struct brw_context *brw, __DRIdrawable *drawable)
+{
+   struct gl_framebuffer *fb = drawable->driverPrivate;
+   struct intel_renderbuffer *rb;
+   __DRIbuffer *buffers = NULL;
+   int i, count;
+   const char *region_name;
+
+   /* Set this up front, so that in case our buffers get invalidated
+    * while we're getting new buffers, we don't clobber the stamp and
+    * thus ignore the invalidate. */
+   drawable->lastStamp = drawable->dri2.stamp;
+
+   if (unlikely(INTEL_DEBUG & DEBUG_DRI))
+      fprintf(stderr, "enter %s, drawable %p\n", __func__, drawable);
+
+   intel_query_dri2_buffers(brw, drawable, &buffers, &count);
+
+   if (buffers == NULL)
+      return;
+
+   for (i = 0; i < count; i++) {
+       switch (buffers[i].attachment) {
+       case __DRI_BUFFER_FRONT_LEFT:
+           rb = intel_get_renderbuffer(fb, BUFFER_FRONT_LEFT);
+           region_name = "dri2 front buffer";
+           break;
+
+       case __DRI_BUFFER_FAKE_FRONT_LEFT:
+           rb = intel_get_renderbuffer(fb, BUFFER_FRONT_LEFT);
+           region_name = "dri2 fake front buffer";
+           break;
+
+       case __DRI_BUFFER_BACK_LEFT:
+           rb = intel_get_renderbuffer(fb, BUFFER_BACK_LEFT);
+           region_name = "dri2 back buffer";
+           break;
+
+       case __DRI_BUFFER_DEPTH:
+       case __DRI_BUFFER_HIZ:
+       case __DRI_BUFFER_DEPTH_STENCIL:
+       case __DRI_BUFFER_STENCIL:
+       case __DRI_BUFFER_ACCUM:
+       default:
+           fprintf(stderr,
+                   "unhandled buffer attach event, attachment type %d\n",
+                   buffers[i].attachment);
+           return;
+       }
+
+       intel_process_dri2_buffer(brw, drawable, &buffers[i], rb, region_name);
+   }
+}
+
+void
+intel_update_renderbuffers(__DRIcontext *context, __DRIdrawable *drawable)
+{
+   struct brw_context *brw = context->driverPrivate;
+   __DRIscreen *screen = brw->intelScreen->driScrnPriv;
+
+   /* Set this up front, so that in case our buffers get invalidated
+    * while we're getting new buffers, we don't clobber the stamp and
+    * thus ignore the invalidate. */
+   drawable->lastStamp = drawable->dri2.stamp;
+
+   if (unlikely(INTEL_DEBUG & DEBUG_DRI))
+      fprintf(stderr, "enter %s, drawable %p\n", __func__, drawable);
+
+   if (screen->image.loader)
+      intel_update_image_buffers(brw, drawable);
+   else
+      intel_update_dri2_buffers(brw, drawable);
+
+   driUpdateFramebufferSize(&brw->ctx, drawable);
+}
+
+/**
+ * intel_prepare_render should be called anywhere that current read/drawbuffer
+ * state is required.
+ */
+void
+intel_prepare_render(struct brw_context *brw)
+{
+   struct gl_context *ctx = &brw->ctx;
+   __DRIcontext *driContext = brw->driContext;
+   __DRIdrawable *drawable;
+
+   drawable = driContext->driDrawablePriv;
+   if (drawable && drawable->dri2.stamp != driContext->dri2.draw_stamp) {
+      if (drawable->lastStamp != drawable->dri2.stamp)
+         intel_update_renderbuffers(driContext, drawable);
+      driContext->dri2.draw_stamp = drawable->dri2.stamp;
+   }
+
+   drawable = driContext->driReadablePriv;
+   if (drawable && drawable->dri2.stamp != driContext->dri2.read_stamp) {
+      if (drawable->lastStamp != drawable->dri2.stamp)
+         intel_update_renderbuffers(driContext, drawable);
+      driContext->dri2.read_stamp = drawable->dri2.stamp;
+   }
+
+   /* If we're currently rendering to the front buffer, the rendering
+    * that will happen next will probably dirty the front buffer.  So
+    * mark it as dirty here.
+    */
+   if (brw_is_front_buffer_drawing(ctx->DrawBuffer))
+      brw->front_buffer_dirty = true;
+
+   /* Wait for the swapbuffers before the one we just emitted, so we
+    * don't get too many swaps outstanding for apps that are GPU-heavy
+    * but not CPU-heavy.
+    *
+    * We're using intelDRI2Flush (called from the loader before
+    * swapbuffer) and glFlush (for front buffer rendering) as the
+    * indicator that a frame is done and then throttle when we get
+    * here as we prepare to render the next frame.  At this point, the
+    * round trips for swap/copy and getting new buffers are done, and
+    * we'll spend less time waiting on the GPU.
+    *
+    * Unfortunately, we don't have a handle to the batch containing
+    * the swap, and getting our hands on that doesn't seem worth it,
+    * so we just use the first batch we emitted after the last swap.
+    */
+   if (brw->need_throttle && brw->first_post_swapbuffers_batch) {
+      if (!brw->disable_throttling)
+         drm_intel_bo_wait_rendering(brw->first_post_swapbuffers_batch);
+      drm_intel_bo_unreference(brw->first_post_swapbuffers_batch);
+      brw->first_post_swapbuffers_batch = NULL;
+      brw->need_throttle = false;
+   }
+}
+
+/**
+ * \brief Query DRI2 to obtain a DRIdrawable's buffers.
+ *
+ * To determine which DRI buffers to request, examine the renderbuffers
+ * attached to the drawable's framebuffer. Then request the buffers with
+ * DRI2GetBuffers() or DRI2GetBuffersWithFormat().
+ *
+ * This is called from intel_update_renderbuffers().
+ *
+ * \param drawable      Drawable whose buffers are queried.
+ * \param buffers       [out] List of buffers returned by DRI2 query.
+ * \param buffer_count  [out] Number of buffers returned.
+ *
+ * \see intel_update_renderbuffers()
+ * \see DRI2GetBuffers()
+ * \see DRI2GetBuffersWithFormat()
+ */
+static void
+intel_query_dri2_buffers(struct brw_context *brw,
+                         __DRIdrawable *drawable,
+                         __DRIbuffer **buffers,
+                         int *buffer_count)
+{
+   __DRIscreen *screen = brw->intelScreen->driScrnPriv;
+   struct gl_framebuffer *fb = drawable->driverPrivate;
+   int i = 0;
+   unsigned attachments[8];
+
+   struct intel_renderbuffer *front_rb;
+   struct intel_renderbuffer *back_rb;
+
+   front_rb = intel_get_renderbuffer(fb, BUFFER_FRONT_LEFT);
+   back_rb = intel_get_renderbuffer(fb, BUFFER_BACK_LEFT);
+
+   memset(attachments, 0, sizeof(attachments));
+   if ((brw_is_front_buffer_drawing(fb) ||
+        brw_is_front_buffer_reading(fb) ||
+        !back_rb) && front_rb) {
+      /* If a fake front buffer is in use, then querying for
+       * __DRI_BUFFER_FRONT_LEFT will cause the server to copy the image from
+       * the real front buffer to the fake front buffer.  So before doing the
+       * query, we need to make sure all the pending drawing has landed in the
+       * real front buffer.
+       */
+      intel_batchbuffer_flush(brw);
+      intel_flush_front(&brw->ctx);
+
+      attachments[i++] = __DRI_BUFFER_FRONT_LEFT;
+      attachments[i++] = intel_bits_per_pixel(front_rb);
+   } else if (front_rb && brw->front_buffer_dirty) {
+      /* We have pending front buffer rendering, but we aren't querying for a
+       * front buffer.  If the front buffer we have is a fake front buffer,
+       * the X server is going to throw it away when it processes the query.
+       * So before doing the query, make sure all the pending drawing has
+       * landed in the real front buffer.
+       */
+      intel_batchbuffer_flush(brw);
+      intel_flush_front(&brw->ctx);
+   }
+
+   if (back_rb) {
+      attachments[i++] = __DRI_BUFFER_BACK_LEFT;
+      attachments[i++] = intel_bits_per_pixel(back_rb);
+   }
+
+   assert(i <= ARRAY_SIZE(attachments));
+
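+   /* For DRI2GetBuffersWithFormat, the attachments array holds
+    * (attachment, bits-per-pixel) pairs, which is why the count passed
+    * below is i / 2.
+    */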
+   *buffers = screen->dri2.loader->getBuffersWithFormat(drawable,
+                                                        &drawable->w,
+                                                        &drawable->h,
+                                                        attachments, i / 2,
+                                                        buffer_count,
+                                                        drawable->loaderPrivate);
+}
+
+/**
+ * \brief Assign a DRI buffer's DRM region to a renderbuffer.
+ *
+ * This is called from intel_update_renderbuffers().
+ *
+ * \par Note:
+ *    DRI buffers whose attachment point is DRI2BufferStencil or
+ *    DRI2BufferDepthStencil are handled as special cases.
+ *
+ * \param buffer_name is a human readable name, such as "dri2 front buffer",
+ *        that is passed to drm_intel_bo_gem_create_from_name().
+ *
+ * \see intel_update_renderbuffers()
+ */
+static void
+intel_process_dri2_buffer(struct brw_context *brw,
+                          __DRIdrawable *drawable,
+                          __DRIbuffer *buffer,
+                          struct intel_renderbuffer *rb,
+                          const char *buffer_name)
+{
+   struct gl_framebuffer *fb = drawable->driverPrivate;
+   drm_intel_bo *bo;
+
+   if (!rb)
+      return;
+
+   unsigned num_samples = rb->Base.Base.NumSamples;
+
+   /* We try to avoid closing and reopening the same BO name, because the first
+    * use of a mapping of the buffer involves a bunch of page faulting which is
+    * moderately expensive.
+    */
+   struct intel_mipmap_tree *last_mt;
+   if (num_samples == 0)
+      last_mt = rb->mt;
+   else
+      last_mt = rb->singlesample_mt;
+
+   uint32_t old_name = 0;
+   if (last_mt) {
+      /* The bo already has a name because the miptree was created by a
+       * previous call to intel_process_dri2_buffer(). If a bo already has a
+       * name, then drm_intel_bo_flink() is a low-cost getter.  It does not
+       * create a new name.
+       */
+      drm_intel_bo_flink(last_mt->bo, &old_name);
+   }
+
+   if (old_name == buffer->name)
+      return;
+
+   if (unlikely(INTEL_DEBUG & DEBUG_DRI)) {
+      fprintf(stderr,
+              "attaching buffer %d, at %d, cpp %d, pitch %d\n",
+              buffer->name, buffer->attachment,
+              buffer->cpp, buffer->pitch);
+   }
+
+   intel_miptree_release(&rb->mt);
+   bo = drm_intel_bo_gem_create_from_name(brw->bufmgr, buffer_name,
+                                          buffer->name);
+   if (!bo) {
+      fprintf(stderr,
+              "Failed to open BO for returned DRI2 buffer "
+              "(%dx%d, %s, named %d).\n"
+              "This is likely a bug in the X Server that will lead to a "
+              "crash soon.\n",
+              drawable->w, drawable->h, buffer_name, buffer->name);
+      return;
+   }
+
+   intel_update_winsys_renderbuffer_miptree(brw, rb, bo,
+                                            drawable->w, drawable->h,
+                                            buffer->pitch);
+
+   if (brw_is_front_buffer_drawing(fb) &&
+       (buffer->attachment == __DRI_BUFFER_FRONT_LEFT ||
+        buffer->attachment == __DRI_BUFFER_FAKE_FRONT_LEFT) &&
+       rb->Base.Base.NumSamples > 1) {
+      intel_renderbuffer_upsample(brw, rb);
+   }
+
+   assert(rb->mt);
+
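+   /* The miptree created above holds its own reference to the bo, so the
+    * reference taken when opening it can be dropped here.
+    */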
+   drm_intel_bo_unreference(bo);
+}
+
+/**
+ * \brief Query DRI image loader to obtain a DRIdrawable's buffers.
+ *
+ * To determine which DRI buffers to request, examine the renderbuffers
+ * attached to the drawable's framebuffer. Then request the buffers from
+ * the image loader.
+ *
+ * This is called from intel_update_renderbuffers().
+ *
+ * \param drawable      Drawable whose buffers are queried.
+ * \param buffers       [out] List of buffers returned by DRI2 query.
+ * \param buffer_count  [out] Number of buffers returned.
+ *
+ * \see intel_update_renderbuffers()
+ */
+
+static void
+intel_update_image_buffer(struct brw_context *intel,
+                          __DRIdrawable *drawable,
+                          struct intel_renderbuffer *rb,
+                          __DRIimage *buffer,
+                          enum __DRIimageBufferMask buffer_type)
+{
+   struct gl_framebuffer *fb = drawable->driverPrivate;
+
+   if (!rb || !buffer->bo)
+      return;
+
+   unsigned num_samples = rb->Base.Base.NumSamples;
+
+   /* Check and see if we're already bound to the right
+    * buffer object
+    */
+   struct intel_mipmap_tree *last_mt;
+   if (num_samples == 0)
+      last_mt = rb->mt;
+   else
+      last_mt = rb->singlesample_mt;
+
+   if (last_mt && last_mt->bo == buffer->bo)
+      return;
+
+   intel_update_winsys_renderbuffer_miptree(intel, rb, buffer->bo,
+                                            buffer->width, buffer->height,
+                                            buffer->pitch);
+
+   if (brw_is_front_buffer_drawing(fb) &&
+       buffer_type == __DRI_IMAGE_BUFFER_FRONT &&
+       rb->Base.Base.NumSamples > 1) {
+      intel_renderbuffer_upsample(intel, rb);
+   }
+}
+
+static void
+intel_update_image_buffers(struct brw_context *brw, __DRIdrawable *drawable)
+{
+   struct gl_framebuffer *fb = drawable->driverPrivate;
+   __DRIscreen *screen = brw->intelScreen->driScrnPriv;
+   struct intel_renderbuffer *front_rb;
+   struct intel_renderbuffer *back_rb;
+   struct __DRIimageList images;
+   unsigned int format;
+   uint32_t buffer_mask = 0;
+
+   front_rb = intel_get_renderbuffer(fb, BUFFER_FRONT_LEFT);
+   back_rb = intel_get_renderbuffer(fb, BUFFER_BACK_LEFT);
+
+   if (back_rb)
+      format = intel_rb_format(back_rb);
+   else if (front_rb)
+      format = intel_rb_format(front_rb);
+   else
+      return;
+
+   if (front_rb && (brw_is_front_buffer_drawing(fb) ||
+                    brw_is_front_buffer_reading(fb) || !back_rb)) {
+      buffer_mask |= __DRI_IMAGE_BUFFER_FRONT;
+   }
+
+   if (back_rb)
+      buffer_mask |= __DRI_IMAGE_BUFFER_BACK;
+
+   (*screen->image.loader->getBuffers) (drawable,
+                                        driGLFormatToImageFormat(format),
+                                        &drawable->dri2.stamp,
+                                        drawable->loaderPrivate,
+                                        buffer_mask,
+                                        &images);
+
+   if (images.image_mask & __DRI_IMAGE_BUFFER_FRONT) {
+      drawable->w = images.front->width;
+      drawable->h = images.front->height;
+      intel_update_image_buffer(brw,
+                                drawable,
+                                front_rb,
+                                images.front,
+                                __DRI_IMAGE_BUFFER_FRONT);
+   }
+   if (images.image_mask & __DRI_IMAGE_BUFFER_BACK) {
+      drawable->w = images.back->width;
+      drawable->h = images.back->height;
+      intel_update_image_buffer(brw,
+                                drawable,
+                                back_rb,
+                                images.back,
+                                __DRI_IMAGE_BUFFER_BACK);
+   }
+}
diff --git a/icd/intel/compiler/pipeline/brw_context.h b/icd/intel/compiler/pipeline/brw_context.h
new file mode 100644
index 0000000..9fd6a11
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_context.h
@@ -0,0 +1,1290 @@
+/*
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to
+ develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+
+#ifndef BRWCONTEXT_INC
+#define BRWCONTEXT_INC
+
+#include <stdbool.h>
+#include <string.h>
+#include "main/imports.h"
+#include "main/macros.h"
+//#include "main/mm.h"  // LunarG: Remove
+#include "main/mtypes.h"
+#include "brw_structs.h"
+
+typedef struct drm_intel_bo drm_intel_bo;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+#include "intel_debug.h"
+#include "intel_screen.h"
+//#include "intel_tex_obj.h" // LunarG: Remove
+//#include "intel_resolve_map.h"  // LunarG: Remove
+
+/* Glossary:
+ *
+ * URB - uniform resource buffer.  A mid-sized buffer which is
+ * partitioned between the fixed function units and used for passing
+ * values (vertices, primitives, constants) between them.
+ *
+ * CURBE - constant URB entry.  An urb region (entry) used to hold
+ * constant values which the fixed function units can be instructed to
+ * preload into the GRF when spawning a thread.
+ *
+ * VUE - vertex URB entry.  An urb entry holding a vertex and usually
+ * a vertex header.  The header contains control information and
+ * things like primitive type, Begin/end flags and clip codes.
+ *
+ * PUE - primitive URB entry.  An urb entry produced by the setup (SF)
+ * unit holding rasterization and interpolation parameters.
+ *
+ * GRF - general register file.  One of several register files
+ * addressable by programmed threads.  The inputs (r0, payload, curbe,
+ * urb) of the thread are preloaded to this area before the thread is
+ * spawned.  The registers are individually 8 dwords wide and suitable
+ * for general usage.  Registers holding thread input values are not
+ * special and may be overwritten.
+ *
+ * MRF - message register file.  Threads communicate (and terminate)
+ * by sending messages.  Message parameters are placed in contiguous
+ * MRF registers.  All program output is via these messages.  URB
+ * entries are populated by sending a message to the shared URB
+ * function containing the new data, together with a control word,
+ * often an unmodified copy of R0.
+ *
+ * R0 - GRF register 0.  Typically holds control information used when
+ * sending messages to other threads.
+ *
+ * EU or GEN4 EU: The name of the programmable subsystem of the
+ * i965 hardware.  Threads are executed by the EU, the registers
+ * described above are part of the EU architecture.
+ *
+ * Fixed function units:
+ *
+ * CS - Command streamer.  Notional first unit, little software
+ * interaction.  Holds the URB entries used for constant data, ie the
+ * CURBEs.
+ *
+ * VF/VS - Vertex Fetch / Vertex Shader.  The fixed function part of
+ * this unit is responsible for pulling vertices out of vertex buffers
+ * in vram and injecting them into the processing pipe as VUEs.  If
+ * enabled, it first passes them to a VS thread which is a good place
+ * for the driver to implement any active vertex shader.
+ *
+ * GS - Geometry Shader.  This corresponds to a new DX10 concept.  If
+ * enabled, incoming strips etc are passed to GS threads in individual
+ * line/triangle/point units.  The GS thread may perform arbitrary
+ * computation and emit whatever primitives with whatever vertices it
+ * chooses.  This makes GS an excellent place to implement GL's
+ * unfilled polygon modes, though of course it is capable of much
+ * more.  Additionally, GS is used to translate away primitives not
+ * handled by later units, including Quads and Lineloops.
+ *
+ * CLIP - Clipper.  Mesa's clipping algorithms are imported to run on
+ * this unit.  The fixed function part performs cliptesting against
+ * the 6 fixed clipplanes and makes decisions on whether or not the
+ * incoming primitive needs to be passed to a thread for clipping.
+ * User clip planes are handled via cooperation with the VS thread.
+ *
+ * SF - Strips Fans or Setup: Triangles are prepared for
+ * rasterization.  Interpolation coefficients are calculated.
+ * Flatshading and two-side lighting usually performed here.
+ *
+ * WM - Windower.  Interpolation of vertex attributes performed here.
+ * Fragment shader implemented here.  SIMD aspects of EU taken full
+ * advantage of, as pixels are processed in blocks of 16.
+ *
+ * CC - Color Calculator.  No EU threads associated with this unit.
+ * Handles blending and (presumably) depth and stencil testing.
+ */
+
+#define BRW_MAX_CURBE                    (32*16)
+
+struct brw_context;
+struct brw_instruction;
+struct brw_vs_prog_key;
+struct brw_vec4_prog_key;
+struct brw_wm_prog_key;
+struct brw_wm_prog_data;
+
+enum brw_state_id {
+   BRW_STATE_URB_FENCE,
+   BRW_STATE_FRAGMENT_PROGRAM,
+   BRW_STATE_GEOMETRY_PROGRAM,
+   BRW_STATE_VERTEX_PROGRAM,
+   BRW_STATE_CURBE_OFFSETS,
+   BRW_STATE_REDUCED_PRIMITIVE,
+   BRW_STATE_PRIMITIVE,
+   BRW_STATE_CONTEXT,
+   BRW_STATE_PSP,
+   BRW_STATE_SURFACES,
+   BRW_STATE_VS_BINDING_TABLE,
+   BRW_STATE_GS_BINDING_TABLE,
+   BRW_STATE_PS_BINDING_TABLE,
+   BRW_STATE_INDICES,
+   BRW_STATE_VERTICES,
+   BRW_STATE_BATCH,
+   BRW_STATE_INDEX_BUFFER,
+   BRW_STATE_VS_CONSTBUF,
+   BRW_STATE_GS_CONSTBUF,
+   BRW_STATE_PROGRAM_CACHE,
+   BRW_STATE_STATE_BASE_ADDRESS,
+   BRW_STATE_VUE_MAP_VS,
+   BRW_STATE_VUE_MAP_GEOM_OUT,
+   BRW_STATE_TRANSFORM_FEEDBACK,
+   BRW_STATE_RASTERIZER_DISCARD,
+   BRW_STATE_STATS_WM,
+   BRW_STATE_UNIFORM_BUFFER,
+   BRW_STATE_ATOMIC_BUFFER,
+   BRW_STATE_META_IN_PROGRESS,
+   BRW_STATE_INTERPOLATION_MAP,
+   BRW_STATE_PUSH_CONSTANT_ALLOCATION,
+   BRW_STATE_NUM_SAMPLES,
+   BRW_NUM_STATE_BITS
+};
+
+#define BRW_NEW_URB_FENCE               (1 << BRW_STATE_URB_FENCE)
+#define BRW_NEW_FRAGMENT_PROGRAM        (1 << BRW_STATE_FRAGMENT_PROGRAM)
+#define BRW_NEW_GEOMETRY_PROGRAM        (1 << BRW_STATE_GEOMETRY_PROGRAM)
+#define BRW_NEW_VERTEX_PROGRAM          (1 << BRW_STATE_VERTEX_PROGRAM)
+#define BRW_NEW_CURBE_OFFSETS           (1 << BRW_STATE_CURBE_OFFSETS)
+#define BRW_NEW_REDUCED_PRIMITIVE       (1 << BRW_STATE_REDUCED_PRIMITIVE)
+#define BRW_NEW_PRIMITIVE               (1 << BRW_STATE_PRIMITIVE)
+#define BRW_NEW_CONTEXT                 (1 << BRW_STATE_CONTEXT)
+#define BRW_NEW_PSP                     (1 << BRW_STATE_PSP)
+#define BRW_NEW_SURFACES		(1 << BRW_STATE_SURFACES)
+#define BRW_NEW_VS_BINDING_TABLE	(1 << BRW_STATE_VS_BINDING_TABLE)
+#define BRW_NEW_GS_BINDING_TABLE	(1 << BRW_STATE_GS_BINDING_TABLE)
+#define BRW_NEW_PS_BINDING_TABLE	(1 << BRW_STATE_PS_BINDING_TABLE)
+#define BRW_NEW_INDICES			(1 << BRW_STATE_INDICES)
+#define BRW_NEW_VERTICES		(1 << BRW_STATE_VERTICES)
+/**
+ * Used for any batch entry with a relocated pointer that will be used
+ * by any 3D rendering.
+ */
+#define BRW_NEW_BATCH                  (1 << BRW_STATE_BATCH)
+/** \see brw.state.depth_region */
+#define BRW_NEW_INDEX_BUFFER           (1 << BRW_STATE_INDEX_BUFFER)
+#define BRW_NEW_VS_CONSTBUF            (1 << BRW_STATE_VS_CONSTBUF)
+#define BRW_NEW_GS_CONSTBUF            (1 << BRW_STATE_GS_CONSTBUF)
+#define BRW_NEW_PROGRAM_CACHE		(1 << BRW_STATE_PROGRAM_CACHE)
+#define BRW_NEW_STATE_BASE_ADDRESS	(1 << BRW_STATE_STATE_BASE_ADDRESS)
+#define BRW_NEW_VUE_MAP_VS		(1 << BRW_STATE_VUE_MAP_VS)
+#define BRW_NEW_VUE_MAP_GEOM_OUT	(1 << BRW_STATE_VUE_MAP_GEOM_OUT)
+#define BRW_NEW_TRANSFORM_FEEDBACK	(1 << BRW_STATE_TRANSFORM_FEEDBACK)
+#define BRW_NEW_RASTERIZER_DISCARD	(1 << BRW_STATE_RASTERIZER_DISCARD)
+#define BRW_NEW_STATS_WM		(1 << BRW_STATE_STATS_WM)
+#define BRW_NEW_UNIFORM_BUFFER          (1 << BRW_STATE_UNIFORM_BUFFER)
+#define BRW_NEW_ATOMIC_BUFFER           (1 << BRW_STATE_ATOMIC_BUFFER)
+#define BRW_NEW_META_IN_PROGRESS        (1 << BRW_STATE_META_IN_PROGRESS)
+#define BRW_NEW_INTERPOLATION_MAP       (1 << BRW_STATE_INTERPOLATION_MAP)
+#define BRW_NEW_PUSH_CONSTANT_ALLOCATION (1 << BRW_STATE_PUSH_CONSTANT_ALLOCATION)
+#define BRW_NEW_NUM_SAMPLES             (1 << BRW_STATE_NUM_SAMPLES)
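+/* Example usage (a sketch, following the pattern used throughout the
+ * driver): code that swaps in a new index buffer raises the flag with
+ *    brw->state.dirty.brw |= BRW_NEW_INDEX_BUFFER;
+ * and any tracked-state atom listing that bit is re-run on the next draw.
+ */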
+
+struct brw_state_flags {
+   /** State update flags signalled by mesa internals */
+   GLuint mesa;
+   /**
+    * State update flags signalled as the result of brw_tracked_state updates
+    */
+   GLuint brw;
+   /** State update flags signalled by brw_state_cache.c searches */
+   GLuint cache;
+};
+
+#define AUB_TRACE_TYPE_MASK		0x0000ff00
+#define AUB_TRACE_TYPE_NOTYPE		(0 << 8)
+#define AUB_TRACE_TYPE_BATCH		(1 << 8)
+#define AUB_TRACE_TYPE_VERTEX_BUFFER	(5 << 8)
+#define AUB_TRACE_TYPE_2D_MAP		(6 << 8)
+#define AUB_TRACE_TYPE_CUBE_MAP		(7 << 8)
+#define AUB_TRACE_TYPE_VOLUME_MAP	(9 << 8)
+#define AUB_TRACE_TYPE_1D_MAP		(10 << 8)
+#define AUB_TRACE_TYPE_CONSTANT_BUFFER	(11 << 8)
+#define AUB_TRACE_TYPE_CONSTANT_URB	(12 << 8)
+#define AUB_TRACE_TYPE_INDEX_BUFFER	(13 << 8)
+#define AUB_TRACE_TYPE_GENERAL		(14 << 8)
+#define AUB_TRACE_TYPE_SURFACE		(15 << 8)
+
+/**
+ * state_struct_type enum values are encoded with the top 16 bits representing
+ * the type to be delivered to the .aub file, and the bottom 16 bits
+ * representing the subtype.  This macro performs the encoding.
+ */
+#define ENCODE_SS_TYPE(type, subtype) (((type) << 16) | (subtype))
+
+enum state_struct_type {
+   AUB_TRACE_VS_STATE =			ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 1),
+   AUB_TRACE_GS_STATE =			ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 2),
+   AUB_TRACE_CLIP_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 3),
+   AUB_TRACE_SF_STATE =			ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 4),
+   AUB_TRACE_WM_STATE =			ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 5),
+   AUB_TRACE_CC_STATE =			ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 6),
+   AUB_TRACE_CLIP_VP_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 7),
+   AUB_TRACE_SF_VP_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 8),
+   AUB_TRACE_CC_VP_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0x9),
+   AUB_TRACE_SAMPLER_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0xa),
+   AUB_TRACE_KERNEL_INSTRUCTIONS =	ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0xb),
+   AUB_TRACE_SCRATCH_SPACE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0xc),
+   AUB_TRACE_SAMPLER_DEFAULT_COLOR =    ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0xd),
+
+   AUB_TRACE_SCISSOR_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0x15),
+   AUB_TRACE_BLEND_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0x16),
+   AUB_TRACE_DEPTH_STENCIL_STATE =	ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0x17),
+
+   AUB_TRACE_VERTEX_BUFFER =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_VERTEX_BUFFER, 0),
+   AUB_TRACE_BINDING_TABLE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_SURFACE, 0x100),
+   AUB_TRACE_SURFACE_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_SURFACE, 0x200),
+   AUB_TRACE_VS_CONSTANTS =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_CONSTANT_BUFFER, 0),
+   AUB_TRACE_WM_CONSTANTS =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_CONSTANT_BUFFER, 1),
+};
+
+/**
+ * Decode a state_struct_type value to determine the type that should be
+ * stored in the .aub file.
+ */
+static inline uint32_t AUB_TRACE_TYPE(enum state_struct_type ss_type)
+{
+   return (ss_type & 0xFFFF0000) >> 16;
+}
+
+/**
+ * Decode a state_struct_type value to determine the subtype that should be
+ * stored in the .aub file.
+ */
+static inline uint32_t AUB_TRACE_SUBTYPE(enum state_struct_type ss_type)
+{
+   return ss_type & 0xFFFF;
+}
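+/* Worked example: AUB_TRACE_VS_STATE is ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 1)
+ * == ((14 << 8) << 16) | 1 == 0x0e000001, so AUB_TRACE_TYPE() yields 0x0e00
+ * (AUB_TRACE_TYPE_GENERAL) and AUB_TRACE_SUBTYPE() yields 1.
+ */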
+
+/** Subclass of Mesa vertex program */
+struct brw_vertex_program {
+   struct gl_vertex_program program;
+   GLuint id;
+};
+
+
+/** Subclass of Mesa geometry program */
+struct brw_geometry_program {
+   struct gl_geometry_program program;
+   unsigned id;  /**< serial no. to identify geom progs, never re-used */
+};
+
+
+/** Subclass of Mesa fragment program */
+struct brw_fragment_program {
+   struct gl_fragment_program program;
+   GLuint id;  /**< serial no. to identify frag progs, never re-used */
+};
+
+
+/** Subclass of Mesa compute program */
+struct brw_compute_program {
+   struct gl_compute_program program;
+   unsigned id;  /**< serial no. to identify compute progs, never re-used */
+};
+
+
+struct brw_shader {
+   struct gl_shader base;
+
+   bool compiled_once;
+};
+
+/* Note: If adding fields that need anything besides a normal memcmp() for
+ * comparing them, be sure to go fix brw_stage_prog_data_compare().
+ */
+struct brw_stage_prog_data {
+   struct {
+      /** size of our binding table. */
+      uint32_t size_bytes;
+
+      /** @{
+       * surface indices for the various groups of surfaces
+       */
+      uint32_t pull_constants_start;
+      uint32_t texture_start;
+      uint32_t gather_texture_start;
+      uint32_t ubo_start;
+      uint32_t abo_start;
+      uint32_t shader_time_start;
+      /** @} */
+   } binding_table;
+
+   GLuint nr_params;       /**< number of float params/constants */
+   GLuint nr_pull_params;
+
+   /* Pointers to tracked values (only valid once
+    * _mesa_load_state_parameters has been called at runtime).
+    *
+    * These must be the last fields of the struct (see
+    * brw_stage_prog_data_compare()).
+    */
+   const float **param;
+   const float **pull_param;
+};
+
+/* Data about a particular attempt to compile a program.  Note that
+ * there can be many of these, each in a different GL state
+ * corresponding to a different brw_wm_prog_key struct, with different
+ * compiled programs.
+ *
+ * Note: brw_wm_prog_data_compare() must be updated when adding fields to this
+ * struct!
+ */
+struct brw_wm_prog_data {
+   struct brw_stage_prog_data base;
+
+   GLuint curb_read_length;
+   GLuint num_varying_inputs;
+
+   GLuint first_curbe_grf;
+   GLuint first_curbe_grf_16;
+   GLuint reg_blocks;
+   GLuint reg_blocks_16;
+   GLuint total_scratch;
+
+   struct {
+      /** @{
+       * surface indices for the WM-specific surfaces
+       */
+      uint32_t render_target_start;
+      /** @} */
+   } binding_table;
+
+   bool dual_src_blend;
+   bool uses_pos_offset;
+   bool uses_omask;
+   uint32_t prog_offset_16;
+
+   /**
+    * Mask of which interpolation modes are required by the fragment shader.
+    * Used in hardware setup on gen6+.
+    */
+   uint32_t barycentric_interp_modes;
+
+   /**
+    * Map from gl_varying_slot to the position within the FS setup data
+    * payload where the varying's attribute vertex deltas should be delivered.
+    * For varying slots that are not used by the FS, the value is -1.
+    */
+   int urb_setup[VARYING_SLOT_MAX];
+};
+
+/**
+ * Enum representing the i965-specific vertex results that don't correspond
+ * exactly to any element of gl_varying_slot.  The values of this enum are
+ * assigned such that they don't conflict with gl_varying_slot.
+ */
+typedef enum
+{
+   BRW_VARYING_SLOT_NDC = VARYING_SLOT_MAX,
+   BRW_VARYING_SLOT_PAD,
+   /**
+    * Technically this is not a varying but just a placeholder that
+    * compile_sf_prog() inserts into its VUE map to cause the gl_PointCoord
+    * builtin variable to be compiled correctly.  See compile_sf_prog() for
+    * more info.
+    */
+   BRW_VARYING_SLOT_PNTC,
+   BRW_VARYING_SLOT_COUNT
+} brw_varying_slot;
+
+
+/**
+ * Data structure recording the relationship between the gl_varying_slot enum
+ * and "slots" within the vertex URB entry (VUE).  A "slot" is defined as a
+ * single octaword within the VUE (128 bits).
+ *
+ * Note that each BRW register contains 256 bits (2 octawords), so when
+ * accessing the VUE in URB_NOSWIZZLE mode, each register corresponds to two
+ * consecutive VUE slots.  When accessing the VUE in URB_INTERLEAVED mode (as
+ * in a vertex shader), each register corresponds to a single VUE slot, since
+ * it contains data for two separate vertices.
+ */
+struct brw_vue_map {
+   /**
+    * Bitfield representing all varying slots that are (a) stored in this VUE
+    * map, and (b) actually written by the shader.  Does not include any of
+    * the additional varying slots defined in brw_varying_slot.
+    */
+   GLbitfield64 slots_valid;
+
+   /**
+    * Map from gl_varying_slot value to VUE slot.  For gl_varying_slots that are
+    * not stored in a slot (because they are not written, or because
+    * additional processing is applied before storing them in the VUE), the
+    * value is -1.
+    */
+   signed char varying_to_slot[BRW_VARYING_SLOT_COUNT];
+
+   /**
+    * Map from VUE slot to gl_varying_slot value.  For slots that do not
+    * directly correspond to a gl_varying_slot, the value comes from
+    * brw_varying_slot.
+    *
+    * For slots that are not in use, the value is BRW_VARYING_SLOT_COUNT (this
+    * simplifies code that uses the value stored in slot_to_varying to
+    * create a bit mask).
+    */
+   signed char slot_to_varying[BRW_VARYING_SLOT_COUNT];
+
+   /**
+    * Total number of VUE slots in use
+    */
+   int num_slots;
+};
+
+/**
+ * Convert a VUE slot number into a byte offset within the VUE.
+ */
+static inline GLuint brw_vue_slot_to_offset(GLuint slot)
+{
+   return 16*slot;
+}
+
+/**
+ * Convert a vertex output (brw_varying_slot) into a byte offset within the
+ * VUE.
+ */
+static inline GLuint brw_varying_to_offset(struct brw_vue_map *vue_map,
+                                           GLuint varying)
+{
+   return brw_vue_slot_to_offset(vue_map->varying_to_slot[varying]);
+}
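+/* Example: if a shader writes a varying that lands in VUE slot 3 (i.e.
+ * vue_map->varying_to_slot[v] == 3), brw_varying_to_offset() returns
+ * 16 * 3 == 48 bytes, since each slot is one 128-bit octaword.
+ */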
+
+void brw_compute_vue_map(struct brw_context *brw, struct brw_vue_map *vue_map,
+                         GLbitfield64 slots_valid);
+
+
+/**
+ * Bitmask indicating which fragment shader inputs represent varyings (and
+ * hence have to be delivered to the fragment shader by the SF/SBE stage).
+ */
+#define BRW_FS_VARYING_INPUT_MASK \
+   (BITFIELD64_RANGE(0, VARYING_SLOT_MAX) & \
+    ~VARYING_BIT_POS & ~VARYING_BIT_FACE)
+
+
+/*
+ * Mapping of VUE map slots to interpolation modes.
+ */
+struct interpolation_mode_map {
+   unsigned char mode[BRW_VARYING_SLOT_COUNT];
+};
+
+static inline bool brw_any_flat_varyings(struct interpolation_mode_map *map)
+{
+   for (int i = 0; i < BRW_VARYING_SLOT_COUNT; i++)
+      if (map->mode[i] == INTERP_QUALIFIER_FLAT)
+         return true;
+
+   return false;
+}
+
+static inline bool brw_any_noperspective_varyings(struct interpolation_mode_map *map)
+{
+   for (int i = 0; i < BRW_VARYING_SLOT_COUNT; i++)
+      if (map->mode[i] == INTERP_QUALIFIER_NOPERSPECTIVE)
+         return true;
+
+   return false;
+}
+
+
+struct brw_sf_prog_data {
+   GLuint urb_read_length;
+   GLuint total_grf;
+
+   /* Each vertex may have up to 12 attributes, 4 components each,
+    * except WPOS which requires only 2.  (11*4 + 2) == 46 ==> 12
+    * rows.
+    *
+    * Actually we use 4 for each, so call it 12 rows.
+    */
+   GLuint urb_entry_size;
+};
+
+
+/**
+ * We always program SF to start reading at an offset of 1 (2 varying slots)
+ * from the start of the vertex URB entry.  This causes it to skip:
+ * - VARYING_SLOT_PSIZ and BRW_VARYING_SLOT_NDC on gen4-5
+ * - VARYING_SLOT_PSIZ and VARYING_SLOT_POS on gen6+
+ */
+#define BRW_SF_URB_ENTRY_READ_OFFSET 1
+
+
+struct brw_clip_prog_data {
+   GLuint curb_read_length;	/* user planes? */
+   GLuint clip_mode;
+   GLuint urb_read_length;
+   GLuint total_grf;
+};
+
+struct brw_ff_gs_prog_data {
+   GLuint urb_read_length;
+   GLuint total_grf;
+
+   /**
+    * Gen6 transform feedback: Amount by which the streaming vertex buffer
+    * indices should be incremented each time the GS is invoked.
+    */
+   unsigned svbi_postincrement_value;
+};
+
+
+/* Note: brw_vec4_prog_data_compare() must be updated when adding fields to
+ * this struct!
+ */
+struct brw_vec4_prog_data {
+   struct brw_stage_prog_data base;
+   struct brw_vue_map vue_map;
+
+   /**
+    * Register where the thread expects to find input data from the URB
+    * (typically uniforms, followed by per-vertex inputs).
+    */
+   unsigned dispatch_grf_start_reg;
+
+   GLuint curb_read_length;
+   GLuint urb_read_length;
+   GLuint total_grf;
+   GLuint total_scratch;
+
+   /* Used for calculating urb partitions.  In the VS, this is the size of the
+    * URB entry used for both input and output to the thread.  In the GS, this
+    * is the size of the URB entry used for output.
+    */
+   GLuint urb_entry_size;
+};
+
+
+/* Note: brw_vs_prog_data_compare() must be updated when adding fields to this
+ * struct!
+ */
+struct brw_vs_prog_data {
+   struct brw_vec4_prog_data base;
+
+   GLbitfield64 inputs_read;
+
+   bool uses_vertexid;
+   bool uses_instanceid;
+};
+
+
+/* Note: brw_gs_prog_data_compare() must be updated when adding fields to
+ * this struct!
+ */
+struct brw_gs_prog_data
+{
+   struct brw_vec4_prog_data base;
+
+   /**
+    * Size of an output vertex, measured in HWORDS (32 bytes).
+    */
+   unsigned output_vertex_size_hwords;
+
+   unsigned output_topology;
+
+   /**
+    * Size of the control data (cut bits or StreamID bits), in hwords (32
+    * bytes).  0 if there is no control data.
+    */
+   unsigned control_data_header_size_hwords;
+
+   /**
+    * Format of the control data (either GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID
+    * if the control data is StreamID bits, or
+    * GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT if the control data is cut bits).
+    * Ignored if control_data_header_size is 0.
+    */
+   unsigned control_data_format;
+
+   bool include_primitive_id;
+
+   int invocations;
+
+   /**
+    * True if the thread should be dispatched in DUAL_INSTANCE mode, false if
+    * it should be dispatched in DUAL_OBJECT mode.
+    */
+   bool dual_instanced_dispatch;
+};
+
+/** Number of texture sampler units */
+#define BRW_MAX_TEX_UNIT 32
+
+/** Max number of render targets in a shader */
+#define BRW_MAX_DRAW_BUFFERS 8
+
+/** Max number of atomic counter buffer objects in a shader */
+#define BRW_MAX_ABO 16
+
+/**
+ * Max number of binding table entries used for stream output.
+ *
+ * From the OpenGL 3.0 spec, table 6.44 (Transform Feedback State), the
+ * minimum value of MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS is 64.
+ *
+ * On Gen6, the size of transform feedback data is limited not by the number
+ * of components but by the number of binding table entries we set aside.  We
+ * use one binding table entry for a float, one entry for a vector, and one
+ * entry per matrix column.  Since the only way we can communicate our
+ * transform feedback capabilities to the client is via
+ * MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS, we need to plan for the
+ * worst case, in which all the varyings are floats, so we use up one binding
+ * table entry per component.  Therefore we need to set aside at least 64
+ * binding table entries for use by transform feedback.
+ *
+ * Note: since we don't currently pack varyings, it is currently impossible
+ * for the client to actually use up all of these binding table entries--if
+ * all of their varyings were floats, they would run out of varying slots and
+ * fail to link.  But that's a bug, so it seems prudent to go ahead and
+ * allocate the number of binding table entries we will need once the bug is
+ * fixed.
+ */
+#define BRW_MAX_SOL_BINDINGS 64
+
+/** Maximum number of actual buffers used for stream output */
+#define BRW_MAX_SOL_BUFFERS 4
+
+#define BRW_MAX_SURFACES   (BRW_MAX_DRAW_BUFFERS +                      \
+                            BRW_MAX_TEX_UNIT * 2 + /* normal, gather */ \
+                            12 + /* ubo */                              \
+                            BRW_MAX_ABO +                               \
+                            2 /* shader time, pull constants */)
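+/* With the limits above, BRW_MAX_SURFACES works out to
+ * 8 + 32*2 + 12 + 16 + 2 == 102 binding table entries per stage.
+ */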
+
+#define SURF_INDEX_GEN6_SOL_BINDING(t) (t)
+#define BRW_MAX_GEN6_GS_SURFACES       SURF_INDEX_GEN6_SOL_BINDING(BRW_MAX_SOL_BINDINGS)
+
+/**
+ * Stride in bytes between shader_time entries.
+ *
+ * We separate entries by a cacheline to reduce traffic between EUs writing to
+ * different entries.
+ */
+#define SHADER_TIME_STRIDE 64
+
+enum brw_cache_id {
+   BRW_CC_VP,
+   BRW_CC_UNIT,
+   BRW_WM_PROG,
+   BRW_BLORP_BLIT_PROG,
+   BRW_BLORP_CONST_COLOR_PROG,
+   BRW_SAMPLER,
+   BRW_WM_UNIT,
+   BRW_SF_PROG,
+   BRW_SF_VP,
+   BRW_SF_UNIT, /* scissor state on gen6 */
+   BRW_VS_UNIT,
+   BRW_VS_PROG,
+   BRW_FF_GS_UNIT,
+   BRW_FF_GS_PROG,
+   BRW_GS_PROG,
+   BRW_CLIP_VP,
+   BRW_CLIP_UNIT,
+   BRW_CLIP_PROG,
+
+   BRW_MAX_CACHE
+};
+
+struct brw_cache_item {
+   /**
+    * Effectively part of the key, cache_id identifies what kind of state
+    * buffer is involved, and also which brw->state.dirty.cache flag should
+    * be set when this cache item is chosen.
+    */
+   enum brw_cache_id cache_id;
+   /** 32-bit hash of the key data */
+   GLuint hash;
+   GLuint key_size;		/* for variable-sized keys */
+   GLuint aux_size;
+   const void *key;
+
+   uint32_t offset;
+   uint32_t size;
+
+   struct brw_cache_item *next;
+};
+
+
+typedef bool (*cache_aux_compare_func)(const void *a, const void *b);
+typedef void (*cache_aux_free_func)(const void *aux);
+
+struct brw_cache {
+   struct brw_context *brw;
+
+   struct brw_cache_item **items;
+   drm_intel_bo *bo;
+   GLuint size, n_items;
+
+   uint32_t next_offset;
+   bool bo_used_by_gpu;
+
+   /**
+    * Optional functions used in determining whether the prog_data for a new
+    * cache item matches an existing cache item (in case there's relevant data
+    * outside of the prog_data).  If NULL, a plain memcmp is done.
+    */
+   cache_aux_compare_func aux_compare[BRW_MAX_CACHE];
+   /** Optional functions for freeing other pointers attached to a prog_data. */
+   cache_aux_free_func aux_free[BRW_MAX_CACHE];
+};
+
+
+/* Considered adding a member to this struct to document which flags
+ * an update might raise so that ordering of the state atoms can be
+ * checked or derived at runtime.  Dropped the idea in favor of having
+ * a debug mode where the state is monitored for flags which are
+ * raised that have already been tested against.
+ */
+struct brw_tracked_state {
+   struct brw_state_flags dirty;
+   void (*emit)( struct brw_context *brw );
+};
+
+enum shader_time_shader_type {
+   ST_NONE,
+   ST_VS,
+   ST_VS_WRITTEN,
+   ST_VS_RESET,
+   ST_GS,
+   ST_GS_WRITTEN,
+   ST_GS_RESET,
+   ST_FS8,
+   ST_FS8_WRITTEN,
+   ST_FS8_RESET,
+   ST_FS16,
+   ST_FS16_WRITTEN,
+   ST_FS16_RESET,
+};
+
+/* Flags for brw->state.cache.
+ */
+#define CACHE_NEW_CC_VP                  (1<<BRW_CC_VP)
+#define CACHE_NEW_CC_UNIT                (1<<BRW_CC_UNIT)
+#define CACHE_NEW_WM_PROG                (1<<BRW_WM_PROG)
+#define CACHE_NEW_BLORP_BLIT_PROG        (1<<BRW_BLORP_BLIT_PROG)
+#define CACHE_NEW_BLORP_CONST_COLOR_PROG (1<<BRW_BLORP_CONST_COLOR_PROG)
+#define CACHE_NEW_SAMPLER                (1<<BRW_SAMPLER)
+#define CACHE_NEW_WM_UNIT                (1<<BRW_WM_UNIT)
+#define CACHE_NEW_SF_PROG                (1<<BRW_SF_PROG)
+#define CACHE_NEW_SF_VP                  (1<<BRW_SF_VP)
+#define CACHE_NEW_SF_UNIT                (1<<BRW_SF_UNIT)
+#define CACHE_NEW_VS_UNIT                (1<<BRW_VS_UNIT)
+#define CACHE_NEW_VS_PROG                (1<<BRW_VS_PROG)
+#define CACHE_NEW_FF_GS_UNIT             (1<<BRW_FF_GS_UNIT)
+#define CACHE_NEW_FF_GS_PROG             (1<<BRW_FF_GS_PROG)
+#define CACHE_NEW_GS_PROG                (1<<BRW_GS_PROG)
+#define CACHE_NEW_CLIP_VP                (1<<BRW_CLIP_VP)
+#define CACHE_NEW_CLIP_UNIT              (1<<BRW_CLIP_UNIT)
+#define CACHE_NEW_CLIP_PROG              (1<<BRW_CLIP_PROG)
+
+struct brw_vertex_buffer {
+   /** Buffer object containing the uploaded vertex data */
+   drm_intel_bo *bo;
+   uint32_t offset;
+   /** Byte stride between elements in the uploaded array */
+   GLuint stride;
+   GLuint step_rate;
+};
+struct brw_vertex_element {
+   const struct gl_client_array *glarray;
+
+   int buffer;
+
+   /** Offset of the first element within the buffer object */
+   unsigned int offset;
+};
+
+
+
+
+
+
+/**
+ * Data shared between each programmable stage in the pipeline (vs, gs, and
+ * wm).
+ */
+struct brw_stage_state
+{
+   gl_shader_stage stage;
+   struct brw_stage_prog_data *prog_data;
+
+   /**
+    * Optional scratch buffer used to store spilled register values and
+    * variably-indexed GRF arrays.
+    */
+   drm_intel_bo *scratch_bo;
+
+   /** Offset in the program cache to the program */
+   uint32_t prog_offset;
+
+   /** Offset in the batchbuffer to Gen4-5 pipelined state (VS/WM/GS_STATE). */
+   uint32_t state_offset;
+
+   uint32_t push_const_offset; /* Offset in the batchbuffer */
+   int push_const_size; /* in 256-bit register increments */
+
+   /* Binding table: pointers to SURFACE_STATE entries. */
+   uint32_t bind_bo_offset;
+   uint32_t surf_offset[BRW_MAX_SURFACES];
+
+   /** SAMPLER_STATE count and table offset */
+   uint32_t sampler_count;
+   uint32_t sampler_offset;
+
+   /** Offsets in the batch to sampler default colors (texture border color) */
+   uint32_t sdc_offset[BRW_MAX_TEX_UNIT];
+};
+
+
+/**
+ * brw_context is derived from gl_context.
+ */
+struct brw_context
+{
+    // LunarG: Note, this is our stripped-down gl_context
+   struct gl_context ctx; /**< base class, must be first field */
+
+   GLuint stats_wm;
+
+   bool disable_derivative_optimization;
+
+   GLuint primitive; /**< Hardware primitive, such as _3DPRIM_TRILIST. */
+
+   GLenum reduced_primitive;
+
+   /**
+    * Set if we're either a debug context or the INTEL_DEBUG=perf environment
+    * variable is set; it indicates that we should do the expensive work that
+    * might lead to a perf_debug() call.
+    */
+   bool perf_debug;
+
+   int gen;
+   int gt;
+
+   bool is_g4x;
+   bool is_baytrail;
+   bool is_haswell;
+
+   bool has_llc;
+   bool has_compr4;
+   bool has_negative_rhw_bug;
+   bool has_pln;
+
+   /**
+    * Some versions of Gen hardware don't do centroid interpolation correctly
+    * on unlit pixels, causing incorrect values for derivatives near triangle
+    * edges.  Enabling this flag causes the fragment shader to use
+    * non-centroid interpolation for unlit pixels, at the expense of two extra
+    * fragment shader instructions.
+    */
+   bool needs_unlit_centroid_workaround;
+
+
+   struct {
+      struct brw_vertex_element inputs[VERT_ATTRIB_MAX];
+      struct brw_vertex_buffer buffers[VERT_ATTRIB_MAX];
+
+      struct brw_vertex_element *enabled[VERT_ATTRIB_MAX];
+      GLuint nr_enabled;
+      GLuint nr_buffers;
+
+      /* Summary of the size and varying status of the active arrays, so we
+       * can check for changes to this state:
+       */
+      unsigned int min_index, max_index;
+
+      /* Offset from start of vertex buffer so we can avoid redefining
+       * the same VB packed over and over again.
+       */
+      unsigned int start_vertex_bias;
+   } vb;
+
+   struct {
+      /**
+       * Index buffer for this draw_prims call.
+       *
+       * Updates are signaled by BRW_NEW_INDICES.
+       */
+      const struct _mesa_index_buffer *ib;
+
+      /* Updates are signaled by BRW_NEW_INDEX_BUFFER. */
+      drm_intel_bo *bo;
+      GLuint type;
+
+      /* Offset to index buffer index to use in CMD_3D_PRIM so that we can
+       * avoid re-uploading the IB packet over and over if we're actually
+       * referencing the same index buffer.
+       */
+      unsigned int start_vertex_offset;
+   } ib;
+
+   /* Active vertex program:
+    */
+   const struct gl_vertex_program *vertex_program;
+   const struct gl_geometry_program *geometry_program;
+   const struct gl_fragment_program *fragment_program;
+
+   /**
+    * Platform specific constants containing the maximum number of threads
+    * for each pipeline stage.
+    */
+   int max_vs_threads;
+   int max_gs_threads;
+   int max_wm_threads;
+
+   /* BRW_NEW_URB_ALLOCATIONS:
+    */
+   struct {
+      GLuint vsize;		/* vertex size plus header in urb registers */
+      GLuint csize;		/* constant buffer size in urb registers */
+      GLuint sfsize;		/* setup data size in urb registers */
+
+      bool constrained;
+
+      GLuint min_vs_entries;    /* Minimum number of VS entries */
+      GLuint max_vs_entries;	/* Maximum number of VS entries */
+      GLuint max_gs_entries;	/* Maximum number of GS entries */
+
+      GLuint nr_vs_entries;
+      GLuint nr_gs_entries;
+      GLuint nr_clip_entries;
+      GLuint nr_sf_entries;
+      GLuint nr_cs_entries;
+
+      GLuint vs_start;
+      GLuint gs_start;
+      GLuint clip_start;
+      GLuint sf_start;
+      GLuint cs_start;
+      GLuint size; /* Hardware URB size, in KB. */
+
+      /* gen6: True if the most recently sent _3DSTATE_URB message allocated
+       * URB space for the GS.
+       */
+      bool gen6_gs_previously_active;
+   } urb;
+
+
+   /* BRW_NEW_CURBE_OFFSETS:
+    */
+   struct {
+      GLuint wm_start;  /**< pos of first wm const in CURBE buffer */
+      GLuint wm_size;   /**< number of float[4] consts, multiple of 16 */
+      GLuint clip_start;
+      GLuint clip_size;
+      GLuint vs_start;
+      GLuint vs_size;
+      GLuint total_size;
+
+      drm_intel_bo *curbe_bo;
+      /** Offset within curbe_bo of space for current curbe entry */
+      GLuint curbe_offset;
+      /** Offset within curbe_bo of space for next curbe entry */
+      GLuint curbe_next_offset;
+
+      /**
+       * Copy of the last set of CURBEs uploaded.  Frequently we'll end up
+       * in brw_curbe.c with the same set of constant data to be uploaded,
+       * so we'd rather not upload new constants in that case (it can cause
+       * a pipeline bubble since only up to 4 can be pipelined at a time).
+       */
+      GLfloat *last_buf;
+      /**
+       * Allocation for where to calculate the next set of CURBEs.
+       * It's a hot enough path that malloc/free of that data matters.
+       */
+      GLfloat *next_buf;
+      GLuint last_bufsz;
+   } curbe;
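+   /* A minimal sketch (not the actual brw_curbe.c code) of the redundant
+    * upload check that last_buf/next_buf enable:
+    *
+    *    if (brw->curbe.last_buf && sz == brw->curbe.last_bufsz &&
+    *        memcmp(brw->curbe.next_buf, brw->curbe.last_buf, sz) == 0) {
+    *       // same constants as last time, so skip the upload
+    *    }
+    */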
+
+   /**
+    * Layout of vertex data exiting the vertex shader.
+    *
+    * BRW_NEW_VUE_MAP_VS is flagged when this VUE map changes.
+    */
+   struct brw_vue_map vue_map_vs;
+
+   /**
+    * Layout of vertex data exiting the geometry portion of the pipeline.
+    * This comes from the geometry shader if one exists, otherwise from the
+    * vertex shader.
+    *
+    * BRW_NEW_VUE_MAP_GEOM_OUT is flagged when the VUE map changes.
+    */
+   struct brw_vue_map vue_map_geom_out;
+
+   struct {
+      struct brw_stage_state base;
+      struct brw_vs_prog_data *prog_data;
+   } vs;
+
+   struct {
+      struct brw_stage_state base;
+      struct brw_gs_prog_data *prog_data;
+
+      /**
+       * True if the 3DSTATE_GS command most recently emitted to the 3D
+       * pipeline enabled the GS; false otherwise.
+       */
+      bool enabled;
+   } gs;
+
+   struct {
+      struct brw_ff_gs_prog_data *prog_data;
+
+      bool prog_active;
+      /** Offset in the program cache to the CLIP program pre-gen6 */
+      uint32_t prog_offset;
+      uint32_t state_offset;
+
+      uint32_t bind_bo_offset;
+      uint32_t surf_offset[BRW_MAX_GEN6_GS_SURFACES];
+   } ff_gs;
+
+   struct {
+      struct brw_clip_prog_data *prog_data;
+
+      /** Offset in the program cache to the CLIP program pre-gen6 */
+      uint32_t prog_offset;
+
+      /* Offset in the batch to the CLIP state on pre-gen6. */
+      uint32_t state_offset;
+
+      /* As of gen6, this is the offset in the batch to the CLIP VP,
+       * instead of vp_bo.
+       */
+      uint32_t vp_offset;
+   } clip;
+
+
+   struct {
+      struct brw_sf_prog_data *prog_data;
+
+      /** Offset in the program cache to the CLIP program pre-gen6 */
+      uint32_t prog_offset;
+      uint32_t state_offset;
+      uint32_t vp_offset;
+   } sf;
+
+   struct {
+      struct brw_stage_state base;
+      struct brw_wm_prog_data *prog_data;
+
+      GLuint render_surf;
+
+      /**
+       * Buffer object used in place of multisampled null render targets on
+       * Gen6.  See brw_update_null_renderbuffer_surface().
+       */
+      drm_intel_bo *multisampled_null_render_target_bo;
+   } wm;
+
+
+   struct {
+      uint32_t state_offset;
+      uint32_t blend_state_offset;
+      uint32_t depth_stencil_state_offset;
+      uint32_t vp_offset;
+   } cc;
+
+
+   int num_atoms;
+   const struct brw_tracked_state **atoms;
+
+   struct {
+      //drm_intel_bo *bo; // LunarG: Remove
+      struct gl_shader_program **shader_programs;
+      struct gl_program **programs;
+      enum shader_time_shader_type *types;
+      uint64_t *cumulative;
+      int num_entries;
+      int max_entries;
+      double report_time;
+   } shader_time;
+
+   struct intel_screen *intelScreen;
+
+   // LunarG : ADD
+   // Give us a place to store our compile results
+   struct gl_shader_program *shader_prog;
+};
+
+
+
+
+
+/*======================================================================
+ * brw_program.c
+ */
+void brwInitFragProgFuncs( struct dd_function_table *functions );
+
+int brw_get_scratch_size(int size);
+void brw_get_scratch_bo(struct brw_context *brw,
+			drm_intel_bo **scratch_bo, int size);
+void brw_init_shader_time(struct brw_context *brw);
+int brw_get_shader_time_index(struct brw_context *brw,
+                              struct gl_shader_program *shader_prog,
+                              struct gl_program *prog,
+                              enum shader_time_shader_type type);
+void brw_collect_and_report_shader_time(struct brw_context *brw);
+void brw_destroy_shader_time(struct brw_context *brw);
+
+/* brw_urb.c
+ */
+void brw_upload_urb_fence(struct brw_context *brw);
+
+/* brw_curbe.c
+ */
+void brw_upload_cs_urb_state(struct brw_context *brw);
+
+/* brw_fs_reg_allocate.cpp
+ */
+void brw_fs_alloc_reg_sets(struct intel_screen *screen);
+
+/* brw_vec4_reg_allocate.cpp */
+void brw_vec4_alloc_reg_set(struct intel_screen *screen);
+
+/* brw_disasm.c */
+int brw_disasm (FILE *file, struct brw_instruction *inst, int gen);
+
+/* brw_vs.c */
+gl_clip_plane *brw_select_clip_planes(struct gl_context *ctx);
+
+/* intel_extensions.c */
+extern void intelInitExtensions(struct gl_context *ctx);
+
+
+/*======================================================================
+ * Inline conversion functions.  These are better-typed than the
+ * macros used previously:
+ */
+static inline struct brw_context *
+brw_context( struct gl_context *ctx )
+{
+   return (struct brw_context *)ctx;
+}
+
+static inline struct brw_vertex_program *
+brw_vertex_program(struct gl_vertex_program *p)
+{
+   return (struct brw_vertex_program *) p;
+}
+
+static inline const struct brw_vertex_program *
+brw_vertex_program_const(const struct gl_vertex_program *p)
+{
+   return (const struct brw_vertex_program *) p;
+}
+
+static inline struct brw_geometry_program *
+brw_geometry_program(struct gl_geometry_program *p)
+{
+   return (struct brw_geometry_program *) p;
+}
+
+static inline struct brw_fragment_program *
+brw_fragment_program(struct gl_fragment_program *p)
+{
+   return (struct brw_fragment_program *) p;
+}
+
+static inline const struct brw_fragment_program *
+brw_fragment_program_const(const struct gl_fragment_program *p)
+{
+   return (const struct brw_fragment_program *) p;
+}
+
+/**
+ * Pre-gen6, the register file of the EUs was shared between threads,
+ * and each thread used some subset allocated at a 16-register block
+ * granularity.  The unit state packets want these block counts.
+ */
+static inline int
+brw_register_blocks(int reg_count)
+{
+   return ALIGN(reg_count, 16) / 16 - 1;
+}
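+/* Worked example (not from the original source): a shader using 48 GRFs is
+ * already 16-aligned, so brw_register_blocks(48) == 48/16 - 1 == 2, i.e.
+ * three 16-register blocks encoded as "blocks minus one".
+ */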
+
+
+bool brw_do_cubemap_normalize(struct exec_list *instructions);
+bool brw_lower_texture_gradients(struct brw_context *brw,
+                                 struct exec_list *instructions);
+bool brw_do_lower_unnormalized_offset(struct exec_list *instructions);
+
+struct opcode_desc {
+    char    *name;
+    int	    nsrc;
+    int	    ndst;
+};
+
+extern const struct opcode_desc opcode_descs[128];
+extern const char * const conditional_modifier[16];
+
+
+/* ================================================================
+ * From the Linux kernel i386 header files; copes with odd sizes better
+ * than COPY_DWORDS would:
+ * XXX Put this in src/mesa/main/imports.h ???
+ */
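+/* Worked example (added for illustration): for n == 7, "rep movsl" copies
+ * one dword (n/4 == 1), the "testb $2" branch copies one word, and the
+ * "testb $1" branch copies the final byte, for 7 bytes total.
+ */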
+#if defined(i386) || defined(__i386__)
+static inline void * __memcpy(void * to, const void * from, size_t n)
+{
+   int d0, d1, d2;
+   __asm__ __volatile__(
+      "rep ; movsl\n\t"
+      "testb $2,%b4\n\t"
+      "je 1f\n\t"
+      "movsw\n"
+      "1:\ttestb $1,%b4\n\t"
+      "je 2f\n\t"
+      "movsb\n"
+      "2:"
+      : "=&c" (d0), "=&D" (d1), "=&S" (d2)
+      :"0" (n/4), "q" (n),"1" ((long) to),"2" ((long) from)
+      : "memory");
+   return (to);
+}
+#else
+#define __memcpy(a,b,c) memcpy(a,b,c)
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/icd/intel/compiler/pipeline/brw_cubemap_normalize.cpp b/icd/intel/compiler/pipeline/brw_cubemap_normalize.cpp
new file mode 100644
index 0000000..3357129
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_cubemap_normalize.cpp
@@ -0,0 +1,121 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file brw_cubemap_normalize.cpp
+ *
+ * IR lowering pass that normalizes cubemap coordinates so that the
+ * largest-magnitude component is -1.0 or 1.0.
+ *
+ * \author Eric Anholt <eric@anholt.net>
+ */
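+/* In GLSL terms the pass roughly rewrites (an illustrative sketch, not the
+ * code actually emitted):
+ *
+ *    vec4 t = texture(cube, coord);
+ *
+ * into:
+ *
+ *    vec3 c = coord.xyz;
+ *    c *= 1.0 / max(max(abs(c.x), abs(c.y)), abs(c.z));
+ *    vec4 t = texture(cube, c);
+ */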
+
+#include "glsl/glsl_types.h"
+#include "glsl/ir.h"
+#include "program/prog_instruction.h" /* For WRITEMASK_* */
+
+class brw_cubemap_normalize_visitor : public ir_hierarchical_visitor {
+public:
+   brw_cubemap_normalize_visitor()
+   {
+      progress = false;
+   }
+
+   ir_visitor_status visit_leave(ir_texture *ir);
+
+   bool progress;
+};
+
+ir_visitor_status
+brw_cubemap_normalize_visitor::visit_leave(ir_texture *ir)
+{
+   if (ir->sampler->type->sampler_dimensionality != GLSL_SAMPLER_DIM_CUBE)
+      return visit_continue;
+
+   if (!ir->coordinate)
+      return visit_continue;
+
+   void *mem_ctx = ralloc_parent(ir);
+
+   ir_variable *var = new(mem_ctx) ir_variable(ir->coordinate->type,
+					       "coordinate", ir_var_auto);
+   base_ir->insert_before(var);
+   ir_dereference *deref = new(mem_ctx) ir_dereference_variable(var);
+   ir_assignment *assign = new(mem_ctx) ir_assignment(deref, ir->coordinate,
+						      NULL);
+   base_ir->insert_before(assign);
+
+   deref = new(mem_ctx) ir_dereference_variable(var);
+   ir_rvalue *swiz0 = new(mem_ctx) ir_swizzle(deref, 0, 0, 0, 0, 1);
+   deref = new(mem_ctx) ir_dereference_variable(var);
+   ir_rvalue *swiz1 = new(mem_ctx) ir_swizzle(deref, 1, 0, 0, 0, 1);
+   deref = new(mem_ctx) ir_dereference_variable(var);
+   ir_rvalue *swiz2 = new(mem_ctx) ir_swizzle(deref, 2, 0, 0, 0, 1);
+
+   swiz0 = new(mem_ctx) ir_expression(ir_unop_abs, swiz0->type, swiz0, NULL);
+   swiz1 = new(mem_ctx) ir_expression(ir_unop_abs, swiz1->type, swiz1, NULL);
+   swiz2 = new(mem_ctx) ir_expression(ir_unop_abs, swiz2->type, swiz2, NULL);
+
+   ir_expression *expr;
+   expr = new(mem_ctx) ir_expression(ir_binop_max,
+				     glsl_type::float_type,
+				     swiz0, swiz1);
+
+   expr = new(mem_ctx) ir_expression(ir_binop_max,
+				     glsl_type::float_type,
+				     expr, swiz2);
+
+   expr = new(mem_ctx) ir_expression(ir_unop_rcp,
+				     glsl_type::float_type,
+				     expr, NULL);
+
+   /* coordinate.xyz *= expr */
+   assign = new(mem_ctx) ir_assignment(
+      new(mem_ctx) ir_dereference_variable(var),
+      new(mem_ctx) ir_swizzle(
+         new(mem_ctx) ir_expression(ir_binop_mul,
+                                    ir->coordinate->type,
+                                    new(mem_ctx) ir_dereference_variable(var),
+                                    expr),
+         0, 1, 2, 0, 3));
+   assign->write_mask = WRITEMASK_XYZ;
+   base_ir->insert_before(assign);
+   ir->coordinate = new(mem_ctx) ir_dereference_variable(var);
+
+   progress = true;
+   return visit_continue;
+}
+
+extern "C" {
+
+bool
+brw_do_cubemap_normalize(exec_list *instructions)
+{
+   brw_cubemap_normalize_visitor v;
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
+
+}
diff --git a/icd/intel/compiler/pipeline/brw_dead_control_flow.cpp b/icd/intel/compiler/pipeline/brw_dead_control_flow.cpp
new file mode 100644
index 0000000..63a3e5b
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_dead_control_flow.cpp
@@ -0,0 +1,83 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/** @file brw_dead_control_flow.cpp
+ *
+ * This file implements the dead control flow elimination optimization pass.
+ */
+
+#include "brw_shader.h"
+#include "brw_cfg.h"
+
+/* Look for and eliminate dead control flow:
+ *
+ *   - if/endif
+ *   - if/else/endif
+ */
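+/* For example (hedged sketch of the instruction stream): in
+ *
+ *    if (...)
+ *    endif
+ *
+ * or
+ *
+ *    if (...)
+ *    else
+ *    endif
+ *
+ * no instructions execute between the control-flow markers, so the IF,
+ * ELSE, and ENDIF can all be removed.
+ */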
+bool
+dead_control_flow_eliminate(backend_visitor *v)
+{
+   bool progress = false;
+
+   cfg_t cfg(&v->instructions);
+
+   for (int b = 0; b < cfg.num_blocks; b++) {
+      bblock_t *block = cfg.blocks[b];
+      bool found = false;
+
+      /* ENDIF instructions, by definition, can only be found at the start of
+       * basic blocks.
+       */
+      backend_instruction *endif_inst = block->start;
+      if (endif_inst->opcode != BRW_OPCODE_ENDIF)
+         continue;
+
+      backend_instruction *if_inst = NULL, *else_inst = NULL;
+      backend_instruction *prev_inst = (backend_instruction *) endif_inst->prev;
+      if (prev_inst->opcode == BRW_OPCODE_IF) {
+         if_inst = prev_inst;
+         found = true;
+      } else if (prev_inst->opcode == BRW_OPCODE_ELSE) {
+         else_inst = prev_inst;
+
+         prev_inst = (backend_instruction *) prev_inst->prev;
+         if (prev_inst->opcode == BRW_OPCODE_IF) {
+            if_inst = prev_inst;
+            found = true;
+         }
+      }
+
+      if (found) {
+         if_inst->remove();
+         if (else_inst)
+            else_inst->remove();
+         endif_inst->remove();
+         progress = true;
+      }
+   }
+
+   if (progress)
+      v->invalidate_live_intervals();
+
+   return progress;
+}
diff --git a/icd/intel/compiler/pipeline/brw_dead_control_flow.h b/icd/intel/compiler/pipeline/brw_dead_control_flow.h
new file mode 100644
index 0000000..57a4dab
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_dead_control_flow.h
@@ -0,0 +1,26 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_shader.h"
+
+bool dead_control_flow_eliminate(backend_visitor *v);
diff --git a/icd/intel/compiler/pipeline/brw_defines.h b/icd/intel/compiler/pipeline/brw_defines.h
new file mode 100644
index 0000000..b34f846
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_defines.h
@@ -0,0 +1,2223 @@
+/*
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to
+ develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+#define INTEL_MASK(high, low) (((1<<((high)-(low)+1))-1)<<(low))
+/* Using the GNU statement expression extension */
+#define SET_FIELD(value, field)                                         \
+   ({                                                                   \
+      uint32_t fieldval = (value) << field ## _SHIFT;                   \
+      assert((fieldval & ~ field ## _MASK) == 0);                       \
+      fieldval & field ## _MASK;                                        \
+   })
+
+#define GET_FIELD(word, field) (((word)  & field ## _MASK) >> field ## _SHIFT)
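+/* Hedged usage sketch for the helpers above, using the surface-type field
+ * defined later in this header:
+ *
+ *    uint32_t dw0 = SET_FIELD(BRW_SURFACE_2D, BRW_SURFACE_TYPE);
+ *    assert(GET_FIELD(dw0, BRW_SURFACE_TYPE) == BRW_SURFACE_2D);
+ */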
+
+#ifndef BRW_DEFINES_H
+#define BRW_DEFINES_H
+
+/* 3D state:
+ */
+#define CMD_3D_PRIM                                 0x7b00 /* 3DPRIMITIVE */
+/* DW0 */
+# define GEN4_3DPRIM_TOPOLOGY_TYPE_SHIFT            10
+# define GEN4_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL (0 << 15)
+# define GEN4_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM     (1 << 15)
+# define GEN7_3DPRIM_INDIRECT_PARAMETER_ENABLE      (1 << 10)
+/* DW1 */
+# define GEN7_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL (0 << 8)
+# define GEN7_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM     (1 << 8)
+
+#define _3DPRIM_POINTLIST         0x01
+#define _3DPRIM_LINELIST          0x02
+#define _3DPRIM_LINESTRIP         0x03
+#define _3DPRIM_TRILIST           0x04
+#define _3DPRIM_TRISTRIP          0x05
+#define _3DPRIM_TRIFAN            0x06
+#define _3DPRIM_QUADLIST          0x07
+#define _3DPRIM_QUADSTRIP         0x08
+#define _3DPRIM_LINELIST_ADJ      0x09
+#define _3DPRIM_LINESTRIP_ADJ     0x0A
+#define _3DPRIM_TRILIST_ADJ       0x0B
+#define _3DPRIM_TRISTRIP_ADJ      0x0C
+#define _3DPRIM_TRISTRIP_REVERSE  0x0D
+#define _3DPRIM_POLYGON           0x0E
+#define _3DPRIM_RECTLIST          0x0F
+#define _3DPRIM_LINELOOP          0x10
+#define _3DPRIM_POINTLIST_BF      0x11
+#define _3DPRIM_LINESTRIP_CONT    0x12
+#define _3DPRIM_LINESTRIP_BF      0x13
+#define _3DPRIM_LINESTRIP_CONT_BF 0x14
+#define _3DPRIM_TRIFAN_NOSTIPPLE  0x15
+
+#define BRW_ANISORATIO_2     0
+#define BRW_ANISORATIO_4     1
+#define BRW_ANISORATIO_6     2
+#define BRW_ANISORATIO_8     3
+#define BRW_ANISORATIO_10    4
+#define BRW_ANISORATIO_12    5
+#define BRW_ANISORATIO_14    6
+#define BRW_ANISORATIO_16    7
+
+#define BRW_BLENDFACTOR_ONE                 0x1
+#define BRW_BLENDFACTOR_SRC_COLOR           0x2
+#define BRW_BLENDFACTOR_SRC_ALPHA           0x3
+#define BRW_BLENDFACTOR_DST_ALPHA           0x4
+#define BRW_BLENDFACTOR_DST_COLOR           0x5
+#define BRW_BLENDFACTOR_SRC_ALPHA_SATURATE  0x6
+#define BRW_BLENDFACTOR_CONST_COLOR         0x7
+#define BRW_BLENDFACTOR_CONST_ALPHA         0x8
+#define BRW_BLENDFACTOR_SRC1_COLOR          0x9
+#define BRW_BLENDFACTOR_SRC1_ALPHA          0x0A
+#define BRW_BLENDFACTOR_ZERO                0x11
+#define BRW_BLENDFACTOR_INV_SRC_COLOR       0x12
+#define BRW_BLENDFACTOR_INV_SRC_ALPHA       0x13
+#define BRW_BLENDFACTOR_INV_DST_ALPHA       0x14
+#define BRW_BLENDFACTOR_INV_DST_COLOR       0x15
+#define BRW_BLENDFACTOR_INV_CONST_COLOR     0x17
+#define BRW_BLENDFACTOR_INV_CONST_ALPHA     0x18
+#define BRW_BLENDFACTOR_INV_SRC1_COLOR      0x19
+#define BRW_BLENDFACTOR_INV_SRC1_ALPHA      0x1A
+
+#define BRW_BLENDFUNCTION_ADD               0
+#define BRW_BLENDFUNCTION_SUBTRACT          1
+#define BRW_BLENDFUNCTION_REVERSE_SUBTRACT  2
+#define BRW_BLENDFUNCTION_MIN               3
+#define BRW_BLENDFUNCTION_MAX               4
+
+#define BRW_ALPHATEST_FORMAT_UNORM8         0
+#define BRW_ALPHATEST_FORMAT_FLOAT32        1
+
+#define BRW_CHROMAKEY_KILL_ON_ANY_MATCH  0
+#define BRW_CHROMAKEY_REPLACE_BLACK      1
+
+#define BRW_CLIP_API_OGL     0
+#define BRW_CLIP_API_DX      1
+
+#define BRW_CLIPMODE_NORMAL              0
+#define BRW_CLIPMODE_CLIP_ALL            1
+#define BRW_CLIPMODE_CLIP_NON_REJECTED   2
+#define BRW_CLIPMODE_REJECT_ALL          3
+#define BRW_CLIPMODE_ACCEPT_ALL          4
+#define BRW_CLIPMODE_KERNEL_CLIP         5
+
+#define BRW_CLIP_NDCSPACE     0
+#define BRW_CLIP_SCREENSPACE  1
+
+#define BRW_COMPAREFUNCTION_ALWAYS       0
+#define BRW_COMPAREFUNCTION_NEVER        1
+#define BRW_COMPAREFUNCTION_LESS         2
+#define BRW_COMPAREFUNCTION_EQUAL        3
+#define BRW_COMPAREFUNCTION_LEQUAL       4
+#define BRW_COMPAREFUNCTION_GREATER      5
+#define BRW_COMPAREFUNCTION_NOTEQUAL     6
+#define BRW_COMPAREFUNCTION_GEQUAL       7
+
+#define BRW_COVERAGE_PIXELS_HALF     0
+#define BRW_COVERAGE_PIXELS_1        1
+#define BRW_COVERAGE_PIXELS_2        2
+#define BRW_COVERAGE_PIXELS_4        3
+
+#define BRW_CULLMODE_BOTH        0
+#define BRW_CULLMODE_NONE        1
+#define BRW_CULLMODE_FRONT       2
+#define BRW_CULLMODE_BACK        3
+
+#define BRW_DEFAULTCOLOR_R8G8B8A8_UNORM      0
+#define BRW_DEFAULTCOLOR_R32G32B32A32_FLOAT  1
+
+#define BRW_DEPTHFORMAT_D32_FLOAT_S8X24_UINT     0
+#define BRW_DEPTHFORMAT_D32_FLOAT                1
+#define BRW_DEPTHFORMAT_D24_UNORM_S8_UINT        2
+#define BRW_DEPTHFORMAT_D24_UNORM_X8_UINT        3 /* GEN5 */
+#define BRW_DEPTHFORMAT_D16_UNORM                5
+
+#define BRW_FLOATING_POINT_IEEE_754        0
+#define BRW_FLOATING_POINT_NON_IEEE_754    1
+
+#define BRW_FRONTWINDING_CW      0
+#define BRW_FRONTWINDING_CCW     1
+
+#define BRW_SPRITE_POINT_ENABLE  16
+
+#define BRW_CUT_INDEX_ENABLE     (1 << 10)
+
+#define BRW_INDEX_BYTE     0
+#define BRW_INDEX_WORD     1
+#define BRW_INDEX_DWORD    2
+
+#define BRW_LOGICOPFUNCTION_CLEAR            0
+#define BRW_LOGICOPFUNCTION_NOR              1
+#define BRW_LOGICOPFUNCTION_AND_INVERTED     2
+#define BRW_LOGICOPFUNCTION_COPY_INVERTED    3
+#define BRW_LOGICOPFUNCTION_AND_REVERSE      4
+#define BRW_LOGICOPFUNCTION_INVERT           5
+#define BRW_LOGICOPFUNCTION_XOR              6
+#define BRW_LOGICOPFUNCTION_NAND             7
+#define BRW_LOGICOPFUNCTION_AND              8
+#define BRW_LOGICOPFUNCTION_EQUIV            9
+#define BRW_LOGICOPFUNCTION_NOOP             10
+#define BRW_LOGICOPFUNCTION_OR_INVERTED      11
+#define BRW_LOGICOPFUNCTION_COPY             12
+#define BRW_LOGICOPFUNCTION_OR_REVERSE       13
+#define BRW_LOGICOPFUNCTION_OR               14
+#define BRW_LOGICOPFUNCTION_SET              15
+
+#define BRW_MAPFILTER_NEAREST        0x0
+#define BRW_MAPFILTER_LINEAR         0x1
+#define BRW_MAPFILTER_ANISOTROPIC    0x2
+
+#define BRW_MIPFILTER_NONE        0
+#define BRW_MIPFILTER_NEAREST     1
+#define BRW_MIPFILTER_LINEAR      3
+
+#define BRW_ADDRESS_ROUNDING_ENABLE_U_MAG	0x20
+#define BRW_ADDRESS_ROUNDING_ENABLE_U_MIN	0x10
+#define BRW_ADDRESS_ROUNDING_ENABLE_V_MAG	0x08
+#define BRW_ADDRESS_ROUNDING_ENABLE_V_MIN	0x04
+#define BRW_ADDRESS_ROUNDING_ENABLE_R_MAG	0x02
+#define BRW_ADDRESS_ROUNDING_ENABLE_R_MIN	0x01
+
+#define BRW_POLYGON_FRONT_FACING     0
+#define BRW_POLYGON_BACK_FACING      1
+
+#define BRW_PREFILTER_ALWAYS     0x0
+#define BRW_PREFILTER_NEVER      0x1
+#define BRW_PREFILTER_LESS       0x2
+#define BRW_PREFILTER_EQUAL      0x3
+#define BRW_PREFILTER_LEQUAL     0x4
+#define BRW_PREFILTER_GREATER    0x5
+#define BRW_PREFILTER_NOTEQUAL   0x6
+#define BRW_PREFILTER_GEQUAL     0x7
+
+#define BRW_PROVOKING_VERTEX_0    0
+#define BRW_PROVOKING_VERTEX_1    1
+#define BRW_PROVOKING_VERTEX_2    2
+
+#define BRW_RASTRULE_UPPER_LEFT  0
+#define BRW_RASTRULE_UPPER_RIGHT 1
+/* These are listed as "Reserved, but not seen as useful"
+ * in Intel documentation (page 212, "Point Rasterization Rule",
+ * section 7.4 "SF Pipeline State Summary", of document
+ * "Intel® 965 Express Chipset Family and Intel® G35 Express
+ * Chipset Graphics Controller Programmer's Reference Manual,
+ * Volume 2: 3D/Media", Revision 1.0b as of January 2008,
+ * available at
+ *     http://intellinuxgraphics.org/documentation.html
+ * at the time of this writing).
+ *
+ * These appear to be supported on at least some
+ * i965-family devices, and BRW_RASTRULE_LOWER_RIGHT
+ * is useful when using OpenGL to render to an FBO
+ * (which has the pixel coordinate Y orientation inverted
+ * with respect to the normal OpenGL pixel coordinate system).
+ */
+#define BRW_RASTRULE_LOWER_LEFT  2
+#define BRW_RASTRULE_LOWER_RIGHT 3
+
+#define BRW_RENDERTARGET_CLAMPRANGE_UNORM    0
+#define BRW_RENDERTARGET_CLAMPRANGE_SNORM    1
+#define BRW_RENDERTARGET_CLAMPRANGE_FORMAT   2
+
+#define BRW_STENCILOP_KEEP               0
+#define BRW_STENCILOP_ZERO               1
+#define BRW_STENCILOP_REPLACE            2
+#define BRW_STENCILOP_INCRSAT            3
+#define BRW_STENCILOP_DECRSAT            4
+#define BRW_STENCILOP_INCR               5
+#define BRW_STENCILOP_DECR               6
+#define BRW_STENCILOP_INVERT             7
+
+/* Surface state DW0 */
+#define GEN8_SURFACE_IS_ARRAY                       (1 << 28)
+#define GEN8_SURFACE_VALIGN_4                       (1 << 16)
+#define GEN8_SURFACE_VALIGN_8                       (2 << 16)
+#define GEN8_SURFACE_VALIGN_16                      (3 << 16)
+#define GEN8_SURFACE_HALIGN_4                       (1 << 14)
+#define GEN8_SURFACE_HALIGN_8                       (2 << 14)
+#define GEN8_SURFACE_HALIGN_16                      (3 << 14)
+#define GEN8_SURFACE_TILING_NONE                    (0 << 12)
+#define GEN8_SURFACE_TILING_W                       (1 << 12)
+#define GEN8_SURFACE_TILING_X                       (2 << 12)
+#define GEN8_SURFACE_TILING_Y                       (3 << 12)
+#define BRW_SURFACE_RC_READ_WRITE	(1 << 8)
+#define BRW_SURFACE_MIPLAYOUT_SHIFT	10
+#define BRW_SURFACE_MIPMAPLAYOUT_BELOW   0
+#define BRW_SURFACE_MIPMAPLAYOUT_RIGHT   1
+#define BRW_SURFACE_CUBEFACE_ENABLES	0x3f
+#define BRW_SURFACE_BLEND_ENABLED	(1 << 13)
+#define BRW_SURFACE_WRITEDISABLE_B_SHIFT	14
+#define BRW_SURFACE_WRITEDISABLE_G_SHIFT	15
+#define BRW_SURFACE_WRITEDISABLE_R_SHIFT	16
+#define BRW_SURFACE_WRITEDISABLE_A_SHIFT	17
+
+#define BRW_SURFACEFORMAT_R32G32B32A32_FLOAT             0x000
+#define BRW_SURFACEFORMAT_R32G32B32A32_SINT              0x001
+#define BRW_SURFACEFORMAT_R32G32B32A32_UINT              0x002
+#define BRW_SURFACEFORMAT_R32G32B32A32_UNORM             0x003
+#define BRW_SURFACEFORMAT_R32G32B32A32_SNORM             0x004
+#define BRW_SURFACEFORMAT_R64G64_FLOAT                   0x005
+#define BRW_SURFACEFORMAT_R32G32B32X32_FLOAT             0x006
+#define BRW_SURFACEFORMAT_R32G32B32A32_SSCALED           0x007
+#define BRW_SURFACEFORMAT_R32G32B32A32_USCALED           0x008
+#define BRW_SURFACEFORMAT_R32G32B32A32_SFIXED            0x020
+#define BRW_SURFACEFORMAT_R64G64_PASSTHRU                0x021
+#define BRW_SURFACEFORMAT_R32G32B32_FLOAT                0x040
+#define BRW_SURFACEFORMAT_R32G32B32_SINT                 0x041
+#define BRW_SURFACEFORMAT_R32G32B32_UINT                 0x042
+#define BRW_SURFACEFORMAT_R32G32B32_UNORM                0x043
+#define BRW_SURFACEFORMAT_R32G32B32_SNORM                0x044
+#define BRW_SURFACEFORMAT_R32G32B32_SSCALED              0x045
+#define BRW_SURFACEFORMAT_R32G32B32_USCALED              0x046
+#define BRW_SURFACEFORMAT_R32G32B32_SFIXED               0x050
+#define BRW_SURFACEFORMAT_R16G16B16A16_UNORM             0x080
+#define BRW_SURFACEFORMAT_R16G16B16A16_SNORM             0x081
+#define BRW_SURFACEFORMAT_R16G16B16A16_SINT              0x082
+#define BRW_SURFACEFORMAT_R16G16B16A16_UINT              0x083
+#define BRW_SURFACEFORMAT_R16G16B16A16_FLOAT             0x084
+#define BRW_SURFACEFORMAT_R32G32_FLOAT                   0x085
+#define BRW_SURFACEFORMAT_R32G32_SINT                    0x086
+#define BRW_SURFACEFORMAT_R32G32_UINT                    0x087
+#define BRW_SURFACEFORMAT_R32_FLOAT_X8X24_TYPELESS       0x088
+#define BRW_SURFACEFORMAT_X32_TYPELESS_G8X24_UINT        0x089
+#define BRW_SURFACEFORMAT_L32A32_FLOAT                   0x08A
+#define BRW_SURFACEFORMAT_R32G32_UNORM                   0x08B
+#define BRW_SURFACEFORMAT_R32G32_SNORM                   0x08C
+#define BRW_SURFACEFORMAT_R64_FLOAT                      0x08D
+#define BRW_SURFACEFORMAT_R16G16B16X16_UNORM             0x08E
+#define BRW_SURFACEFORMAT_R16G16B16X16_FLOAT             0x08F
+#define BRW_SURFACEFORMAT_A32X32_FLOAT                   0x090
+#define BRW_SURFACEFORMAT_L32X32_FLOAT                   0x091
+#define BRW_SURFACEFORMAT_I32X32_FLOAT                   0x092
+#define BRW_SURFACEFORMAT_R16G16B16A16_SSCALED           0x093
+#define BRW_SURFACEFORMAT_R16G16B16A16_USCALED           0x094
+#define BRW_SURFACEFORMAT_R32G32_SSCALED                 0x095
+#define BRW_SURFACEFORMAT_R32G32_USCALED                 0x096
+#define BRW_SURFACEFORMAT_R32G32_FLOAT_LD                0x097
+#define BRW_SURFACEFORMAT_R32G32_SFIXED                  0x0A0
+#define BRW_SURFACEFORMAT_R64_PASSTHRU                   0x0A1
+#define BRW_SURFACEFORMAT_B8G8R8A8_UNORM                 0x0C0
+#define BRW_SURFACEFORMAT_B8G8R8A8_UNORM_SRGB            0x0C1
+#define BRW_SURFACEFORMAT_R10G10B10A2_UNORM              0x0C2
+#define BRW_SURFACEFORMAT_R10G10B10A2_UNORM_SRGB         0x0C3
+#define BRW_SURFACEFORMAT_R10G10B10A2_UINT               0x0C4
+#define BRW_SURFACEFORMAT_R10G10B10_SNORM_A2_UNORM       0x0C5
+#define BRW_SURFACEFORMAT_R8G8B8A8_UNORM                 0x0C7
+#define BRW_SURFACEFORMAT_R8G8B8A8_UNORM_SRGB            0x0C8
+#define BRW_SURFACEFORMAT_R8G8B8A8_SNORM                 0x0C9
+#define BRW_SURFACEFORMAT_R8G8B8A8_SINT                  0x0CA
+#define BRW_SURFACEFORMAT_R8G8B8A8_UINT                  0x0CB
+#define BRW_SURFACEFORMAT_R16G16_UNORM                   0x0CC
+#define BRW_SURFACEFORMAT_R16G16_SNORM                   0x0CD
+#define BRW_SURFACEFORMAT_R16G16_SINT                    0x0CE
+#define BRW_SURFACEFORMAT_R16G16_UINT                    0x0CF
+#define BRW_SURFACEFORMAT_R16G16_FLOAT                   0x0D0
+#define BRW_SURFACEFORMAT_B10G10R10A2_UNORM              0x0D1
+#define BRW_SURFACEFORMAT_B10G10R10A2_UNORM_SRGB         0x0D2
+#define BRW_SURFACEFORMAT_R11G11B10_FLOAT                0x0D3
+#define BRW_SURFACEFORMAT_R32_SINT                       0x0D6
+#define BRW_SURFACEFORMAT_R32_UINT                       0x0D7
+#define BRW_SURFACEFORMAT_R32_FLOAT                      0x0D8
+#define BRW_SURFACEFORMAT_R24_UNORM_X8_TYPELESS          0x0D9
+#define BRW_SURFACEFORMAT_X24_TYPELESS_G8_UINT           0x0DA
+#define BRW_SURFACEFORMAT_L16A16_UNORM                   0x0DF
+#define BRW_SURFACEFORMAT_I24X8_UNORM                    0x0E0
+#define BRW_SURFACEFORMAT_L24X8_UNORM                    0x0E1
+#define BRW_SURFACEFORMAT_A24X8_UNORM                    0x0E2
+#define BRW_SURFACEFORMAT_I32_FLOAT                      0x0E3
+#define BRW_SURFACEFORMAT_L32_FLOAT                      0x0E4
+#define BRW_SURFACEFORMAT_A32_FLOAT                      0x0E5
+#define BRW_SURFACEFORMAT_B8G8R8X8_UNORM                 0x0E9
+#define BRW_SURFACEFORMAT_B8G8R8X8_UNORM_SRGB            0x0EA
+#define BRW_SURFACEFORMAT_R8G8B8X8_UNORM                 0x0EB
+#define BRW_SURFACEFORMAT_R8G8B8X8_UNORM_SRGB            0x0EC
+#define BRW_SURFACEFORMAT_R9G9B9E5_SHAREDEXP             0x0ED
+#define BRW_SURFACEFORMAT_B10G10R10X2_UNORM              0x0EE
+#define BRW_SURFACEFORMAT_L16A16_FLOAT                   0x0F0
+#define BRW_SURFACEFORMAT_R32_UNORM                      0x0F1
+#define BRW_SURFACEFORMAT_R32_SNORM                      0x0F2
+#define BRW_SURFACEFORMAT_R10G10B10X2_USCALED            0x0F3
+#define BRW_SURFACEFORMAT_R8G8B8A8_SSCALED               0x0F4
+#define BRW_SURFACEFORMAT_R8G8B8A8_USCALED               0x0F5
+#define BRW_SURFACEFORMAT_R16G16_SSCALED                 0x0F6
+#define BRW_SURFACEFORMAT_R16G16_USCALED                 0x0F7
+#define BRW_SURFACEFORMAT_R32_SSCALED                    0x0F8
+#define BRW_SURFACEFORMAT_R32_USCALED                    0x0F9
+#define BRW_SURFACEFORMAT_B5G6R5_UNORM                   0x100
+#define BRW_SURFACEFORMAT_B5G6R5_UNORM_SRGB              0x101
+#define BRW_SURFACEFORMAT_B5G5R5A1_UNORM                 0x102
+#define BRW_SURFACEFORMAT_B5G5R5A1_UNORM_SRGB            0x103
+#define BRW_SURFACEFORMAT_B4G4R4A4_UNORM                 0x104
+#define BRW_SURFACEFORMAT_B4G4R4A4_UNORM_SRGB            0x105
+#define BRW_SURFACEFORMAT_R8G8_UNORM                     0x106
+#define BRW_SURFACEFORMAT_R8G8_SNORM                     0x107
+#define BRW_SURFACEFORMAT_R8G8_SINT                      0x108
+#define BRW_SURFACEFORMAT_R8G8_UINT                      0x109
+#define BRW_SURFACEFORMAT_R16_UNORM                      0x10A
+#define BRW_SURFACEFORMAT_R16_SNORM                      0x10B
+#define BRW_SURFACEFORMAT_R16_SINT                       0x10C
+#define BRW_SURFACEFORMAT_R16_UINT                       0x10D
+#define BRW_SURFACEFORMAT_R16_FLOAT                      0x10E
+#define BRW_SURFACEFORMAT_A8P8_UNORM_PALETTE0            0x10F
+#define BRW_SURFACEFORMAT_A8P8_UNORM_PALETTE1            0x110
+#define BRW_SURFACEFORMAT_I16_UNORM                      0x111
+#define BRW_SURFACEFORMAT_L16_UNORM                      0x112
+#define BRW_SURFACEFORMAT_A16_UNORM                      0x113
+#define BRW_SURFACEFORMAT_L8A8_UNORM                     0x114
+#define BRW_SURFACEFORMAT_I16_FLOAT                      0x115
+#define BRW_SURFACEFORMAT_L16_FLOAT                      0x116
+#define BRW_SURFACEFORMAT_A16_FLOAT                      0x117
+#define BRW_SURFACEFORMAT_L8A8_UNORM_SRGB                0x118
+#define BRW_SURFACEFORMAT_R5G5_SNORM_B6_UNORM            0x119
+#define BRW_SURFACEFORMAT_B5G5R5X1_UNORM                 0x11A
+#define BRW_SURFACEFORMAT_B5G5R5X1_UNORM_SRGB            0x11B
+#define BRW_SURFACEFORMAT_R8G8_SSCALED                   0x11C
+#define BRW_SURFACEFORMAT_R8G8_USCALED                   0x11D
+#define BRW_SURFACEFORMAT_R16_SSCALED                    0x11E
+#define BRW_SURFACEFORMAT_R16_USCALED                    0x11F
+#define BRW_SURFACEFORMAT_P8A8_UNORM_PALETTE0            0x122
+#define BRW_SURFACEFORMAT_P8A8_UNORM_PALETTE1            0x123
+#define BRW_SURFACEFORMAT_A1B5G5R5_UNORM                 0x124
+#define BRW_SURFACEFORMAT_A4B4G4R4_UNORM                 0x125
+#define BRW_SURFACEFORMAT_L8A8_UINT                      0x126
+#define BRW_SURFACEFORMAT_L8A8_SINT                      0x127
+#define BRW_SURFACEFORMAT_R8_UNORM                       0x140
+#define BRW_SURFACEFORMAT_R8_SNORM                       0x141
+#define BRW_SURFACEFORMAT_R8_SINT                        0x142
+#define BRW_SURFACEFORMAT_R8_UINT                        0x143
+#define BRW_SURFACEFORMAT_A8_UNORM                       0x144
+#define BRW_SURFACEFORMAT_I8_UNORM                       0x145
+#define BRW_SURFACEFORMAT_L8_UNORM                       0x146
+#define BRW_SURFACEFORMAT_P4A4_UNORM                     0x147
+#define BRW_SURFACEFORMAT_A4P4_UNORM                     0x148
+#define BRW_SURFACEFORMAT_R8_SSCALED                     0x149
+#define BRW_SURFACEFORMAT_R8_USCALED                     0x14A
+#define BRW_SURFACEFORMAT_P8_UNORM_PALETTE0              0x14B
+#define BRW_SURFACEFORMAT_L8_UNORM_SRGB                  0x14C
+#define BRW_SURFACEFORMAT_P8_UNORM_PALETTE1              0x14D
+#define BRW_SURFACEFORMAT_P4A4_UNORM_PALETTE1            0x14E
+#define BRW_SURFACEFORMAT_A4P4_UNORM_PALETTE1            0x14F
+#define BRW_SURFACEFORMAT_Y8_SNORM                       0x150
+#define BRW_SURFACEFORMAT_L8_UINT                        0x152
+#define BRW_SURFACEFORMAT_L8_SINT                        0x153
+#define BRW_SURFACEFORMAT_I8_UINT                        0x154
+#define BRW_SURFACEFORMAT_I8_SINT                        0x155
+#define BRW_SURFACEFORMAT_DXT1_RGB_SRGB                  0x180
+#define BRW_SURFACEFORMAT_R1_UINT                        0x181
+#define BRW_SURFACEFORMAT_YCRCB_NORMAL                   0x182
+#define BRW_SURFACEFORMAT_YCRCB_SWAPUVY                  0x183
+#define BRW_SURFACEFORMAT_P2_UNORM_PALETTE0              0x184
+#define BRW_SURFACEFORMAT_P2_UNORM_PALETTE1              0x185
+#define BRW_SURFACEFORMAT_BC1_UNORM                      0x186
+#define BRW_SURFACEFORMAT_BC2_UNORM                      0x187
+#define BRW_SURFACEFORMAT_BC3_UNORM                      0x188
+#define BRW_SURFACEFORMAT_BC4_UNORM                      0x189
+#define BRW_SURFACEFORMAT_BC5_UNORM                      0x18A
+#define BRW_SURFACEFORMAT_BC1_UNORM_SRGB                 0x18B
+#define BRW_SURFACEFORMAT_BC2_UNORM_SRGB                 0x18C
+#define BRW_SURFACEFORMAT_BC3_UNORM_SRGB                 0x18D
+#define BRW_SURFACEFORMAT_MONO8                          0x18E
+#define BRW_SURFACEFORMAT_YCRCB_SWAPUV                   0x18F
+#define BRW_SURFACEFORMAT_YCRCB_SWAPY                    0x190
+#define BRW_SURFACEFORMAT_DXT1_RGB                       0x191
+#define BRW_SURFACEFORMAT_FXT1                           0x192
+#define BRW_SURFACEFORMAT_R8G8B8_UNORM                   0x193
+#define BRW_SURFACEFORMAT_R8G8B8_SNORM                   0x194
+#define BRW_SURFACEFORMAT_R8G8B8_SSCALED                 0x195
+#define BRW_SURFACEFORMAT_R8G8B8_USCALED                 0x196
+#define BRW_SURFACEFORMAT_R64G64B64A64_FLOAT             0x197
+#define BRW_SURFACEFORMAT_R64G64B64_FLOAT                0x198
+#define BRW_SURFACEFORMAT_BC4_SNORM                      0x199
+#define BRW_SURFACEFORMAT_BC5_SNORM                      0x19A
+#define BRW_SURFACEFORMAT_R16G16B16_FLOAT                0x19B
+#define BRW_SURFACEFORMAT_R16G16B16_UNORM                0x19C
+#define BRW_SURFACEFORMAT_R16G16B16_SNORM                0x19D
+#define BRW_SURFACEFORMAT_R16G16B16_SSCALED              0x19E
+#define BRW_SURFACEFORMAT_R16G16B16_USCALED              0x19F
+#define BRW_SURFACEFORMAT_BC6H_SF16                      0x1A1
+#define BRW_SURFACEFORMAT_BC7_UNORM                      0x1A2
+#define BRW_SURFACEFORMAT_BC7_UNORM_SRGB                 0x1A3
+#define BRW_SURFACEFORMAT_BC6H_UF16                      0x1A4
+#define BRW_SURFACEFORMAT_PLANAR_420_8                   0x1A5
+#define BRW_SURFACEFORMAT_R8G8B8_UNORM_SRGB              0x1A8
+#define BRW_SURFACEFORMAT_ETC1_RGB8                      0x1A9
+#define BRW_SURFACEFORMAT_ETC2_RGB8                      0x1AA
+#define BRW_SURFACEFORMAT_EAC_R11                        0x1AB
+#define BRW_SURFACEFORMAT_EAC_RG11                       0x1AC
+#define BRW_SURFACEFORMAT_EAC_SIGNED_R11                 0x1AD
+#define BRW_SURFACEFORMAT_EAC_SIGNED_RG11                0x1AE
+#define BRW_SURFACEFORMAT_ETC2_SRGB8                     0x1AF
+#define BRW_SURFACEFORMAT_R16G16B16_UINT                 0x1B0
+#define BRW_SURFACEFORMAT_R16G16B16_SINT                 0x1B1
+#define BRW_SURFACEFORMAT_R32_SFIXED                     0x1B2
+#define BRW_SURFACEFORMAT_R10G10B10A2_SNORM              0x1B3
+#define BRW_SURFACEFORMAT_R10G10B10A2_USCALED            0x1B4
+#define BRW_SURFACEFORMAT_R10G10B10A2_SSCALED            0x1B5
+#define BRW_SURFACEFORMAT_R10G10B10A2_SINT               0x1B6
+#define BRW_SURFACEFORMAT_B10G10R10A2_SNORM              0x1B7
+#define BRW_SURFACEFORMAT_B10G10R10A2_USCALED            0x1B8
+#define BRW_SURFACEFORMAT_B10G10R10A2_SSCALED            0x1B9
+#define BRW_SURFACEFORMAT_B10G10R10A2_UINT               0x1BA
+#define BRW_SURFACEFORMAT_B10G10R10A2_SINT               0x1BB
+#define BRW_SURFACEFORMAT_R64G64B64A64_PASSTHRU          0x1BC
+#define BRW_SURFACEFORMAT_R64G64B64_PASSTHRU             0x1BD
+#define BRW_SURFACEFORMAT_ETC2_RGB8_PTA                  0x1C0
+#define BRW_SURFACEFORMAT_ETC2_SRGB8_PTA                 0x1C1
+#define BRW_SURFACEFORMAT_ETC2_EAC_RGBA8                 0x1C2
+#define BRW_SURFACEFORMAT_ETC2_EAC_SRGB8_A8              0x1C3
+#define BRW_SURFACEFORMAT_R8G8B8_UINT                    0x1C8
+#define BRW_SURFACEFORMAT_R8G8B8_SINT                    0x1C9
+#define BRW_SURFACEFORMAT_RAW                            0x1FF
+#define BRW_SURFACE_FORMAT_SHIFT	18
+#define BRW_SURFACE_FORMAT_MASK		INTEL_MASK(26, 18)
+
+#define BRW_SURFACERETURNFORMAT_FLOAT32  0
+#define BRW_SURFACERETURNFORMAT_S1       1
+
+#define BRW_SURFACE_TYPE_SHIFT		29
+#define BRW_SURFACE_TYPE_MASK		INTEL_MASK(31, 29)
+#define BRW_SURFACE_1D      0
+#define BRW_SURFACE_2D      1
+#define BRW_SURFACE_3D      2
+#define BRW_SURFACE_CUBE    3
+#define BRW_SURFACE_BUFFER  4
+#define BRW_SURFACE_NULL    7
+
+#define GEN7_SURFACE_IS_ARRAY           (1 << 28)
+#define GEN7_SURFACE_VALIGN_2           (0 << 16)
+#define GEN7_SURFACE_VALIGN_4           (1 << 16)
+#define GEN7_SURFACE_HALIGN_4           (0 << 15)
+#define GEN7_SURFACE_HALIGN_8           (1 << 15)
+#define GEN7_SURFACE_TILING_NONE        (0 << 13)
+#define GEN7_SURFACE_TILING_X           (2 << 13)
+#define GEN7_SURFACE_TILING_Y           (3 << 13)
+#define GEN7_SURFACE_ARYSPC_FULL	(0 << 10)
+#define GEN7_SURFACE_ARYSPC_LOD0	(1 << 10)
+
+/* Surface state DW0 */
+#define GEN8_SURFACE_MOCS_SHIFT         24
+#define GEN8_SURFACE_MOCS_MASK          INTEL_MASK(30, 24)
+
+/* Surface state DW2 */
+#define BRW_SURFACE_HEIGHT_SHIFT	19
+#define BRW_SURFACE_HEIGHT_MASK		INTEL_MASK(31, 19)
+#define BRW_SURFACE_WIDTH_SHIFT		6
+#define BRW_SURFACE_WIDTH_MASK		INTEL_MASK(18, 6)
+#define BRW_SURFACE_LOD_SHIFT		2
+#define BRW_SURFACE_LOD_MASK		INTEL_MASK(5, 2)
+#define GEN7_SURFACE_HEIGHT_SHIFT       16
+#define GEN7_SURFACE_HEIGHT_MASK        INTEL_MASK(29, 16)
+#define GEN7_SURFACE_WIDTH_SHIFT        0
+#define GEN7_SURFACE_WIDTH_MASK         INTEL_MASK(13, 0)
+
+/* Surface state DW3 */
+#define BRW_SURFACE_DEPTH_SHIFT		21
+#define BRW_SURFACE_DEPTH_MASK		INTEL_MASK(31, 21)
+#define BRW_SURFACE_PITCH_SHIFT		3
+#define BRW_SURFACE_PITCH_MASK		INTEL_MASK(19, 3)
+#define BRW_SURFACE_TILED		(1 << 1)
+#define BRW_SURFACE_TILED_Y		(1 << 0)
+
+/* Surface state DW4 */
+#define BRW_SURFACE_MIN_LOD_SHIFT	28
+#define BRW_SURFACE_MIN_LOD_MASK	INTEL_MASK(31, 28)
+#define BRW_SURFACE_MULTISAMPLECOUNT_1  (0 << 4)
+#define BRW_SURFACE_MULTISAMPLECOUNT_4  (2 << 4)
+#define GEN7_SURFACE_MULTISAMPLECOUNT_1         (0 << 3)
+#define GEN8_SURFACE_MULTISAMPLECOUNT_2         (1 << 3)
+#define GEN7_SURFACE_MULTISAMPLECOUNT_4         (2 << 3)
+#define GEN7_SURFACE_MULTISAMPLECOUNT_8         (3 << 3)
+#define GEN8_SURFACE_MULTISAMPLECOUNT_16        (4 << 3)
+#define GEN7_SURFACE_MSFMT_MSS                  (0 << 6)
+#define GEN7_SURFACE_MSFMT_DEPTH_STENCIL        (1 << 6)
+#define GEN7_SURFACE_MIN_ARRAY_ELEMENT_SHIFT	18
+#define GEN7_SURFACE_MIN_ARRAY_ELEMENT_MASK     INTEL_MASK(28, 18)
+#define GEN7_SURFACE_RENDER_TARGET_VIEW_EXTENT_SHIFT	7
+#define GEN7_SURFACE_RENDER_TARGET_VIEW_EXTENT_MASK   INTEL_MASK(17, 7)
+
+/* Surface state DW5 */
+#define BRW_SURFACE_X_OFFSET_SHIFT		25
+#define BRW_SURFACE_X_OFFSET_MASK		INTEL_MASK(31, 25)
+#define BRW_SURFACE_VERTICAL_ALIGN_ENABLE	(1 << 24)
+#define BRW_SURFACE_Y_OFFSET_SHIFT		20
+#define BRW_SURFACE_Y_OFFSET_MASK		INTEL_MASK(23, 20)
+#define GEN7_SURFACE_MIN_LOD_SHIFT              4
+#define GEN7_SURFACE_MIN_LOD_MASK               INTEL_MASK(7, 4)
+#define GEN8_SURFACE_Y_OFFSET_SHIFT		21
+#define GEN8_SURFACE_Y_OFFSET_MASK		INTEL_MASK(23, 21)
+
+#define GEN7_SURFACE_MOCS_SHIFT                 16
+#define GEN7_SURFACE_MOCS_MASK                  INTEL_MASK(19, 16)
+
+/* Surface state DW6 */
+#define GEN7_SURFACE_MCS_ENABLE                 (1 << 0)
+#define GEN7_SURFACE_MCS_PITCH_SHIFT            3
+#define GEN7_SURFACE_MCS_PITCH_MASK             INTEL_MASK(11, 3)
+#define GEN8_SURFACE_AUX_QPITCH_SHIFT           16
+#define GEN8_SURFACE_AUX_QPITCH_MASK            INTEL_MASK(30, 16)
+#define GEN8_SURFACE_AUX_PITCH_SHIFT            3
+#define GEN8_SURFACE_AUX_PITCH_MASK             INTEL_MASK(11, 3)
+#define GEN8_SURFACE_AUX_MODE_MASK              INTEL_MASK(2, 0)
+
+#define GEN8_SURFACE_AUX_MODE_NONE              0
+#define GEN8_SURFACE_AUX_MODE_MCS               1
+#define GEN8_SURFACE_AUX_MODE_APPEND            2
+#define GEN8_SURFACE_AUX_MODE_HIZ               3
+
+/* Surface state DW7 */
+#define GEN7_SURFACE_CLEAR_COLOR_SHIFT		28
+#define GEN7_SURFACE_SCS_R_SHIFT                25
+#define GEN7_SURFACE_SCS_R_MASK                 INTEL_MASK(27, 25)
+#define GEN7_SURFACE_SCS_G_SHIFT                22
+#define GEN7_SURFACE_SCS_G_MASK                 INTEL_MASK(24, 22)
+#define GEN7_SURFACE_SCS_B_SHIFT                19
+#define GEN7_SURFACE_SCS_B_MASK                 INTEL_MASK(21, 19)
+#define GEN7_SURFACE_SCS_A_SHIFT                16
+#define GEN7_SURFACE_SCS_A_MASK                 INTEL_MASK(18, 16)
+
+/* The actual swizzle values/what channel to use */
+#define HSW_SCS_ZERO                     0
+#define HSW_SCS_ONE                      1
+#define HSW_SCS_RED                      4
+#define HSW_SCS_GREEN                    5
+#define HSW_SCS_BLUE                     6
+#define HSW_SCS_ALPHA                    7
+
+#define BRW_TEXCOORDMODE_WRAP            0
+#define BRW_TEXCOORDMODE_MIRROR          1
+#define BRW_TEXCOORDMODE_CLAMP           2
+#define BRW_TEXCOORDMODE_CUBE            3
+#define BRW_TEXCOORDMODE_CLAMP_BORDER    4
+#define BRW_TEXCOORDMODE_MIRROR_ONCE     5
+#define GEN8_TEXCOORDMODE_HALF_BORDER    6
+
+#define BRW_THREAD_PRIORITY_NORMAL   0
+#define BRW_THREAD_PRIORITY_HIGH     1
+
+#define BRW_TILEWALK_XMAJOR                 0
+#define BRW_TILEWALK_YMAJOR                 1
+
+#define BRW_VERTEX_SUBPIXEL_PRECISION_8BITS  0
+#define BRW_VERTEX_SUBPIXEL_PRECISION_4BITS  1
+
+/* Execution Unit (EU) defines
+ */
+
+#define BRW_ALIGN_1   0
+#define BRW_ALIGN_16  1
+
+#define BRW_ADDRESS_DIRECT                        0
+#define BRW_ADDRESS_REGISTER_INDIRECT_REGISTER    1
+
+#define BRW_CHANNEL_X     0
+#define BRW_CHANNEL_Y     1
+#define BRW_CHANNEL_Z     2
+#define BRW_CHANNEL_W     3
+
+enum brw_compression {
+   BRW_COMPRESSION_NONE       = 0,
+   BRW_COMPRESSION_2NDHALF    = 1,
+   BRW_COMPRESSION_COMPRESSED = 2,
+};
+
+#define GEN6_COMPRESSION_1Q		0
+#define GEN6_COMPRESSION_2Q		1
+#define GEN6_COMPRESSION_3Q		2
+#define GEN6_COMPRESSION_4Q		3
+#define GEN6_COMPRESSION_1H		0
+#define GEN6_COMPRESSION_2H		2
+
+#define BRW_CONDITIONAL_NONE  0
+#define BRW_CONDITIONAL_Z     1
+#define BRW_CONDITIONAL_NZ    2
+#define BRW_CONDITIONAL_EQ    1	/* Z */
+#define BRW_CONDITIONAL_NEQ   2	/* NZ */
+#define BRW_CONDITIONAL_G     3
+#define BRW_CONDITIONAL_GE    4
+#define BRW_CONDITIONAL_L     5
+#define BRW_CONDITIONAL_LE    6
+#define BRW_CONDITIONAL_R     7
+#define BRW_CONDITIONAL_O     8
+#define BRW_CONDITIONAL_U     9
+
+#define BRW_DEBUG_NONE        0
+#define BRW_DEBUG_BREAKPOINT  1
+
+#define BRW_DEPENDENCY_NORMAL         0
+#define BRW_DEPENDENCY_NOTCLEARED     1
+#define BRW_DEPENDENCY_NOTCHECKED     2
+#define BRW_DEPENDENCY_DISABLE        3
+
+#define BRW_EXECUTE_1     0
+#define BRW_EXECUTE_2     1
+#define BRW_EXECUTE_4     2
+#define BRW_EXECUTE_8     3
+#define BRW_EXECUTE_16    4
+#define BRW_EXECUTE_32    5
+
+#define BRW_HORIZONTAL_STRIDE_0   0
+#define BRW_HORIZONTAL_STRIDE_1   1
+#define BRW_HORIZONTAL_STRIDE_2   2
+#define BRW_HORIZONTAL_STRIDE_4   3
+
+#define BRW_INSTRUCTION_NORMAL    0
+#define BRW_INSTRUCTION_SATURATE  1
+
+#define BRW_MASK_ENABLE   0
+#define BRW_MASK_DISABLE  1
+
+/** @{
+ *
+ * Gen6 has replaced "mask enable/disable" with WECtrl, which is
+ * effectively the same but much simpler to think about.  Now, there
+ * are two contributors, ANDed together, that determine whether channels are
+ * executed: the predication on the instruction, and the channel write
+ * enable.
+ */
+/**
+ * This is the default value.  It means that a channel's write enable is set
+ * if the per-channel IP is pointing at this instruction.
+ */
+#define BRW_WE_NORMAL		0
+/**
+ * This is used like BRW_MASK_DISABLE, and causes all channels to have
+ * their write enable set.  Note that predication still contributes to
+ * whether the channel actually gets written.
+ */
+#define BRW_WE_ALL		1
+/** @} */
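+/* Informal restatement (not from the original comment): a channel is
+ * written only when both contributors are true, i.e. roughly
+ *
+ *    channel_written = write_enable && predicate_passes;
+ */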
+
+enum opcode {
+   /* These are the actual hardware opcodes. */
+   BRW_OPCODE_MOV =	1,
+   BRW_OPCODE_SEL =	2,
+   BRW_OPCODE_NOT =	4,
+   BRW_OPCODE_AND =	5,
+   BRW_OPCODE_OR =	6,
+   BRW_OPCODE_XOR =	7,
+   BRW_OPCODE_SHR =	8,
+   BRW_OPCODE_SHL =	9,
+   BRW_OPCODE_ASR =	12,
+   BRW_OPCODE_CMP =	16,
+   BRW_OPCODE_CMPN =	17,
+   BRW_OPCODE_F32TO16 = 19,
+   BRW_OPCODE_F16TO32 = 20,
+   BRW_OPCODE_BFREV =	23,
+   BRW_OPCODE_BFE =	24,
+   BRW_OPCODE_BFI1 =	25,
+   BRW_OPCODE_BFI2 =	26,
+   BRW_OPCODE_JMPI =	32,
+   BRW_OPCODE_IF =	34,
+   BRW_OPCODE_IFF =	35,
+   BRW_OPCODE_ELSE =	36,
+   BRW_OPCODE_ENDIF =	37,
+   BRW_OPCODE_DO =	38,
+   BRW_OPCODE_WHILE =	39,
+   BRW_OPCODE_BREAK =	40,
+   BRW_OPCODE_CONTINUE = 41,
+   BRW_OPCODE_HALT =	42,
+   BRW_OPCODE_MSAVE =	44,
+   BRW_OPCODE_MRESTORE = 45,
+   BRW_OPCODE_PUSH =	46,
+   BRW_OPCODE_POP =	47,
+   BRW_OPCODE_WAIT =	48,
+   BRW_OPCODE_SEND =	49,
+   BRW_OPCODE_SENDC =	50,
+   BRW_OPCODE_MATH =	56,
+   BRW_OPCODE_ADD =	64,
+   BRW_OPCODE_MUL =	65,
+   BRW_OPCODE_AVG =	66,
+   BRW_OPCODE_FRC =	67,
+   BRW_OPCODE_RNDU =	68,
+   BRW_OPCODE_RNDD =	69,
+   BRW_OPCODE_RNDE =	70,
+   BRW_OPCODE_RNDZ =	71,
+   BRW_OPCODE_MAC =	72,
+   BRW_OPCODE_MACH =	73,
+   BRW_OPCODE_LZD =	74,
+   BRW_OPCODE_FBH =	75,
+   BRW_OPCODE_FBL =	76,
+   BRW_OPCODE_CBIT =	77,
+   BRW_OPCODE_ADDC =	78,
+   BRW_OPCODE_SUBB =	79,
+   BRW_OPCODE_SAD2 =	80,
+   BRW_OPCODE_SADA2 =	81,
+   BRW_OPCODE_DP4 =	84,
+   BRW_OPCODE_DPH =	85,
+   BRW_OPCODE_DP3 =	86,
+   BRW_OPCODE_DP2 =	87,
+   BRW_OPCODE_LINE =	89,
+   BRW_OPCODE_PLN =	90,
+   BRW_OPCODE_MAD =	91,
+   BRW_OPCODE_LRP =	92,
+   BRW_OPCODE_NOP =	126,
+
+   /* These are compiler backend opcodes that get translated into other
+    * instructions.
+    */
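+   /* For instance (an illustrative pairing suggested by the opcode names,
+    * not a statement of the exact lowering), SHADER_OPCODE_POW below would
+    * ultimately be emitted as a hardware BRW_OPCODE_MATH instruction with
+    * BRW_MATH_FUNCTION_POW.
+    */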
+   FS_OPCODE_FB_WRITE = 128,
+   FS_OPCODE_BLORP_FB_WRITE,
+   SHADER_OPCODE_RCP,
+   SHADER_OPCODE_RSQ,
+   SHADER_OPCODE_SQRT,
+   SHADER_OPCODE_EXP2,
+   SHADER_OPCODE_LOG2,
+   SHADER_OPCODE_POW,
+   SHADER_OPCODE_INT_QUOTIENT,
+   SHADER_OPCODE_INT_REMAINDER,
+   SHADER_OPCODE_SIN,
+   SHADER_OPCODE_COS,
+
+   SHADER_OPCODE_TEX,
+   SHADER_OPCODE_TXD,
+   SHADER_OPCODE_TXF,
+   SHADER_OPCODE_TXL,
+   SHADER_OPCODE_TXS,
+   FS_OPCODE_TXB,
+   SHADER_OPCODE_TXF_CMS,
+   SHADER_OPCODE_TXF_UMS,
+   SHADER_OPCODE_TXF_MCS,
+   SHADER_OPCODE_LOD,
+   SHADER_OPCODE_TG4,
+   SHADER_OPCODE_TG4_OFFSET,
+
+   /* LunarG: TODO - shader time? */
+   /* SHADER_OPCODE_SHADER_TIME_ADD, */
+
+   SHADER_OPCODE_UNTYPED_ATOMIC,
+   SHADER_OPCODE_UNTYPED_SURFACE_READ,
+
+   SHADER_OPCODE_GEN4_SCRATCH_READ,
+   SHADER_OPCODE_GEN4_SCRATCH_WRITE,
+   SHADER_OPCODE_GEN7_SCRATCH_READ,
+
+   SHADER_OPCODE_DWORD_SCATTERED_WRITE,
+   SHADER_OPCODE_BYTE_SCATTERED_WRITE,
+   SHADER_OPCODE_DWORD_SCATTERED_READ,
+   SHADER_OPCODE_BYTE_SCATTERED_READ,
+
+   FS_OPCODE_DDX,
+   FS_OPCODE_DDY,
+   FS_OPCODE_PIXEL_X,
+   FS_OPCODE_PIXEL_Y,
+   FS_OPCODE_CINTERP,
+   FS_OPCODE_LINTERP,
+   FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
+   FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7,
+   FS_OPCODE_VARYING_PULL_CONSTANT_LOAD,
+   FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
+   FS_OPCODE_MOV_DISPATCH_TO_FLAGS,
+   FS_OPCODE_DISCARD_JUMP,
+   FS_OPCODE_SET_OMASK,
+   FS_OPCODE_SET_SAMPLE_ID,
+   FS_OPCODE_SET_SIMD4X2_OFFSET,
+   FS_OPCODE_PACK_HALF_2x16_SPLIT,
+   FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X,
+   FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y,
+   FS_OPCODE_PLACEHOLDER_HALT,
+
+   VS_OPCODE_URB_WRITE,
+   VS_OPCODE_PULL_CONSTANT_LOAD,
+   VS_OPCODE_PULL_CONSTANT_LOAD_GEN7,
+   VS_OPCODE_UNPACK_FLAGS_SIMD4X2,
+
+   /**
+    * Write geometry shader output data to the URB.
+    *
+    * Unlike VS_OPCODE_URB_WRITE, this opcode doesn't do an implied move from
+    * R0 to the first MRF.  This allows the geometry shader to override the
+    * "Slot {0,1} Offset" fields in the message header.
+    */
+   GS_OPCODE_URB_WRITE,
+
+   /**
+    * Terminate the geometry shader thread by doing an empty URB write.
+    *
+    * This opcode doesn't do an implied move from R0 to the first MRF.  This
+    * allows the geometry shader to override the "GS Number of Output Vertices
+    * for Slot {0,1}" fields in the message header.
+    */
+   GS_OPCODE_THREAD_END,
+
+   /**
+    * Set the "Slot {0,1} Offset" fields of a URB_WRITE message header.
+    *
+    * - dst is the MRF containing the message header.
+    *
+    * - src0.x indicates which portion of the URB should be written to (e.g. a
+    *   vertex number)
+    *
+    * - src1 is an immediate multiplier which will be applied to src0
+    *   (e.g. the size of a single vertex in the URB).
+    *
+    * Note: the hardware will apply this offset *in addition to* the offset in
+    * vec4_instruction::offset.
+    */
+   GS_OPCODE_SET_WRITE_OFFSET,
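+   /* Worked example (illustrative): with src0.x = 3 (the vertex number) and
+    * src1 = 2 (the per-vertex size), the slot offset becomes 3 * 2 = 6,
+    * applied in addition to vec4_instruction::offset.
+    */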
+
+   /**
+    * Set the "GS Number of Output Vertices for Slot {0,1}" fields of a
+    * URB_WRITE message header.
+    *
+    * - dst is the MRF containing the message header.
+    *
+    * - src0.x is the vertex count.  The upper 16 bits will be ignored.
+    */
+   GS_OPCODE_SET_VERTEX_COUNT,
+
+   /**
+    * Set DWORD 2 of dst to the immediate value in src.  Used by geometry
+    * shaders to initialize DWORD 2 of R0, which needs to be 0 in order for
+    * scratch reads and writes to operate correctly.
+    */
+   GS_OPCODE_SET_DWORD_2_IMMED,
+
+   /**
+    * Prepare the dst register for storage in the "Channel Mask" fields of a
+    * URB_WRITE message header.
+    *
+    * DWORD 4 of dst is shifted left by 4 bits, so that later,
+    * GS_OPCODE_SET_CHANNEL_MASKS can OR DWORDs 0 and 4 together to form the
+    * final channel mask.
+    *
+    * Note: since GS_OPCODE_SET_CHANNEL_MASKS ORs DWORDs 0 and 4 together to
+    * form the final channel mask, DWORDs 0 and 4 of the dst register must not
+    * have any extraneous bits set prior to execution of this opcode (that is,
+    * they should be in the range 0x0 to 0xf).
+    */
+   GS_OPCODE_PREPARE_CHANNEL_MASKS,
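+   /* Worked example (illustrative): with DWORD 0 = 0x3 and DWORD 4 = 0xc,
+    * this opcode shifts DWORD 4 left to 0xc0, and GS_OPCODE_SET_CHANNEL_MASKS
+    * can then OR the two into a final channel mask of 0xc3.
+    */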
+
+   /**
+    * Set the "Channel Mask" fields of a URB_WRITE message header.
+    *
+    * - dst is the MRF containing the message header.
+    *
+    * - src.x is the channel mask, as prepared by
+    *   GS_OPCODE_PREPARE_CHANNEL_MASKS.  DWORDs 0 and 4 are OR'ed together to
+    *   form the final channel mask.
+    */
+   GS_OPCODE_SET_CHANNEL_MASKS,
+
+   /**
+    * Get the "Instance ID" fields from the payload.
+    *
+    * - dst is the GRF for gl_InvocationID.
+    */
+   GS_OPCODE_GET_INSTANCE_ID,
+};
+
+enum brw_urb_write_flags {
+   BRW_URB_WRITE_NO_FLAGS = 0,
+
+   /**
+    * Causes a new URB entry to be allocated, and its address stored in the
+    * destination register (gen < 7).
+    */
+   BRW_URB_WRITE_ALLOCATE = 0x1,
+
+   /**
+    * Causes the current URB entry to be deallocated (gen < 7).
+    */
+   BRW_URB_WRITE_UNUSED = 0x2,
+
+   /**
+    * Causes the thread to terminate.
+    */
+   BRW_URB_WRITE_EOT = 0x4,
+
+   /**
+    * Indicates that the given URB entry is complete, and may be sent further
+    * down the 3D pipeline (gen < 7).
+    */
+   BRW_URB_WRITE_COMPLETE = 0x8,
+
+   /**
+    * Indicates that an additional offset (which may be different for the two
+    * vec4 slots) is stored in the message header (gen == 7).
+    */
+   BRW_URB_WRITE_PER_SLOT_OFFSET = 0x10,
+
+   /**
+    * Indicates that the channel masks in the URB_WRITE message header should
+    * not be overridden to 0xff (gen == 7).
+    */
+   BRW_URB_WRITE_USE_CHANNEL_MASKS = 0x20,
+
+   /**
+    * Indicates that the data should be sent to the URB using the
+    * URB_WRITE_OWORD message rather than URB_WRITE_HWORD (gen == 7).  This
+    * causes offsets to be interpreted as multiples of an OWORD instead of an
+    * HWORD, and only allows one OWORD to be written.
+    */
+   BRW_URB_WRITE_OWORD = 0x40,
+
+   /**
+    * Convenient combination of flags: end the thread while simultaneously
+    * marking the given URB entry as complete.
+    */
+   BRW_URB_WRITE_EOT_COMPLETE = BRW_URB_WRITE_EOT | BRW_URB_WRITE_COMPLETE,
+
+   /**
+    * Convenient combination of flags: mark the given URB entry as complete
+    * and simultaneously allocate a new one.
+    */
+   BRW_URB_WRITE_ALLOCATE_COMPLETE =
+      BRW_URB_WRITE_ALLOCATE | BRW_URB_WRITE_COMPLETE,
+};
+
+#ifdef __cplusplus
+/**
+ * Allow brw_urb_write_flags enums to be ORed together.
+ */
+inline brw_urb_write_flags
+operator|(brw_urb_write_flags x, brw_urb_write_flags y)
+{
+   return static_cast<brw_urb_write_flags>(static_cast<int>(x) |
+                                           static_cast<int>(y));
+}
+#endif
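+/* In C++, the overload keeps flag combinations typed as the enum instead of
+ * decaying to int, e.g. (illustrative):
+ *
+ *    brw_urb_write_flags flags =
+ *       BRW_URB_WRITE_PER_SLOT_OFFSET | BRW_URB_WRITE_USE_CHANNEL_MASKS;
+ */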
+
+#define BRW_PREDICATE_NONE             0
+#define BRW_PREDICATE_NORMAL           1
+#define BRW_PREDICATE_ALIGN1_ANYV             2
+#define BRW_PREDICATE_ALIGN1_ALLV             3
+#define BRW_PREDICATE_ALIGN1_ANY2H            4
+#define BRW_PREDICATE_ALIGN1_ALL2H            5
+#define BRW_PREDICATE_ALIGN1_ANY4H            6
+#define BRW_PREDICATE_ALIGN1_ALL4H            7
+#define BRW_PREDICATE_ALIGN1_ANY8H            8
+#define BRW_PREDICATE_ALIGN1_ALL8H            9
+#define BRW_PREDICATE_ALIGN1_ANY16H           10
+#define BRW_PREDICATE_ALIGN1_ALL16H           11
+#define BRW_PREDICATE_ALIGN16_REPLICATE_X     2
+#define BRW_PREDICATE_ALIGN16_REPLICATE_Y     3
+#define BRW_PREDICATE_ALIGN16_REPLICATE_Z     4
+#define BRW_PREDICATE_ALIGN16_REPLICATE_W     5
+#define BRW_PREDICATE_ALIGN16_ANY4H           6
+#define BRW_PREDICATE_ALIGN16_ALL4H           7
+
+#define BRW_ARCHITECTURE_REGISTER_FILE    0
+#define BRW_GENERAL_REGISTER_FILE         1
+#define BRW_MESSAGE_REGISTER_FILE         2
+#define BRW_IMMEDIATE_VALUE               3
+
+#define BRW_HW_REG_TYPE_UD  0
+#define BRW_HW_REG_TYPE_D   1
+#define BRW_HW_REG_TYPE_UW  2
+#define BRW_HW_REG_TYPE_W   3
+#define BRW_HW_REG_TYPE_F   7
+#define GEN8_HW_REG_TYPE_UQ 8
+#define GEN8_HW_REG_TYPE_Q  9
+
+#define BRW_HW_REG_NON_IMM_TYPE_UB  4
+#define BRW_HW_REG_NON_IMM_TYPE_B   5
+#define GEN7_HW_REG_NON_IMM_TYPE_DF 6
+#define GEN8_HW_REG_NON_IMM_TYPE_HF 10
+
+#define BRW_HW_REG_IMM_TYPE_UV  4 /* Gen6+ packed unsigned immediate vector */
+#define BRW_HW_REG_IMM_TYPE_VF  5 /* packed float immediate vector */
+#define BRW_HW_REG_IMM_TYPE_V   6 /* packed int imm. vector; uword dest only */
+#define GEN8_HW_REG_IMM_TYPE_DF 10
+#define GEN8_HW_REG_IMM_TYPE_HF 11
+
+/* SNB adds 3-src instructions (MAD and LRP) that only operate on floats, so
+ * the types were implied. IVB adds BFE and BFI2 that operate on doublewords
+ * and unsigned doublewords, so a new field is also available in the da3src
+ * struct (part of struct brw_instruction.bits1 in brw_structs.h) to select
+ * dst and shared-src types. The values are different from BRW_REGISTER_TYPE_*.
+ */
+#define BRW_3SRC_TYPE_F  0
+#define BRW_3SRC_TYPE_D  1
+#define BRW_3SRC_TYPE_UD 2
+#define BRW_3SRC_TYPE_DF 3
+
+#define BRW_ARF_NULL                  0x00
+#define BRW_ARF_ADDRESS               0x10
+#define BRW_ARF_ACCUMULATOR           0x20
+#define BRW_ARF_FLAG                  0x30
+#define BRW_ARF_MASK                  0x40
+#define BRW_ARF_MASK_STACK            0x50
+#define BRW_ARF_MASK_STACK_DEPTH      0x60
+#define BRW_ARF_STATE                 0x70
+#define BRW_ARF_CONTROL               0x80
+#define BRW_ARF_NOTIFICATION_COUNT    0x90
+#define BRW_ARF_IP                    0xA0
+#define BRW_ARF_TDR                   0xB0
+#define BRW_ARF_TIMESTAMP             0xC0
+
+#define BRW_MRF_COMPR4			(1 << 7)
+
+#define BRW_AMASK   0
+#define BRW_IMASK   1
+#define BRW_LMASK   2
+#define BRW_CMASK   3
+
+
+
+#define BRW_THREAD_NORMAL     0
+#define BRW_THREAD_ATOMIC     1
+#define BRW_THREAD_SWITCH     2
+
+#define BRW_VERTICAL_STRIDE_0                 0
+#define BRW_VERTICAL_STRIDE_1                 1
+#define BRW_VERTICAL_STRIDE_2                 2
+#define BRW_VERTICAL_STRIDE_4                 3
+#define BRW_VERTICAL_STRIDE_8                 4
+#define BRW_VERTICAL_STRIDE_16                5
+#define BRW_VERTICAL_STRIDE_32                6
+#define BRW_VERTICAL_STRIDE_64                7
+#define BRW_VERTICAL_STRIDE_128               8
+#define BRW_VERTICAL_STRIDE_256               9
+#define BRW_VERTICAL_STRIDE_ONE_DIMENSIONAL   0xF
+
+#define BRW_WIDTH_1       0
+#define BRW_WIDTH_2       1
+#define BRW_WIDTH_4       2
+#define BRW_WIDTH_8       3
+#define BRW_WIDTH_16      4
+
+#define BRW_STATELESS_BUFFER_BOUNDARY_1K      0
+#define BRW_STATELESS_BUFFER_BOUNDARY_2K      1
+#define BRW_STATELESS_BUFFER_BOUNDARY_4K      2
+#define BRW_STATELESS_BUFFER_BOUNDARY_8K      3
+#define BRW_STATELESS_BUFFER_BOUNDARY_16K     4
+#define BRW_STATELESS_BUFFER_BOUNDARY_32K     5
+#define BRW_STATELESS_BUFFER_BOUNDARY_64K     6
+#define BRW_STATELESS_BUFFER_BOUNDARY_128K    7
+#define BRW_STATELESS_BUFFER_BOUNDARY_256K    8
+#define BRW_STATELESS_BUFFER_BOUNDARY_512K    9
+#define BRW_STATELESS_BUFFER_BOUNDARY_1M      10
+#define BRW_STATELESS_BUFFER_BOUNDARY_2M      11
+
+#define BRW_POLYGON_FACING_FRONT      0
+#define BRW_POLYGON_FACING_BACK       1
+
+/**
+ * Message target: Shared Function ID for where to SEND a message.
+ *
+ * These are enumerated in the ISA reference under "send - Send Message".
+ * In particular, see the following tables:
+ * - G45 PRM, Volume 4, Table 14-15 "Message Descriptor Definition"
+ * - Sandybridge PRM, Volume 4 Part 2, Table 8-16 "Extended Message Descriptor"
+ * - Ivybridge PRM, Volume 1 Part 1, section 3.2.7 "GPE Function IDs"
+ */
+enum brw_message_target {
+   BRW_SFID_NULL                     = 0,
+   BRW_SFID_MATH                     = 1, /* Only valid on Gen4-5 */
+   BRW_SFID_SAMPLER                  = 2,
+   BRW_SFID_MESSAGE_GATEWAY          = 3,
+   BRW_SFID_DATAPORT_READ            = 4,
+   BRW_SFID_DATAPORT_WRITE           = 5,
+   BRW_SFID_URB                      = 6,
+   BRW_SFID_THREAD_SPAWNER           = 7,
+   BRW_SFID_VME                      = 8,
+
+   GEN6_SFID_DATAPORT_SAMPLER_CACHE  = 4,
+   GEN6_SFID_DATAPORT_RENDER_CACHE   = 5,
+   GEN6_SFID_DATAPORT_CONSTANT_CACHE = 9,
+
+   GEN7_SFID_DATAPORT_DATA_CACHE     = 10,
+   GEN7_SFID_PIXEL_INTERPOLATOR      = 11,
+   HSW_SFID_DATAPORT_DATA_CACHE_1    = 12,
+   HSW_SFID_CRE                      = 13,
+};
+
+#define GEN7_MESSAGE_TARGET_DP_DATA_CACHE     10
+
+#define BRW_SAMPLER_RETURN_FORMAT_FLOAT32     0
+#define BRW_SAMPLER_RETURN_FORMAT_UINT32      2
+#define BRW_SAMPLER_RETURN_FORMAT_SINT32      3
+
+#define BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE              0
+#define BRW_SAMPLER_MESSAGE_SIMD16_SAMPLE             0
+#define BRW_SAMPLER_MESSAGE_SIMD16_SAMPLE_BIAS        0
+#define BRW_SAMPLER_MESSAGE_SIMD8_KILLPIX             1
+#define BRW_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_LOD        1
+#define BRW_SAMPLER_MESSAGE_SIMD16_SAMPLE_LOD         1
+#define BRW_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_GRADIENTS  2
+#define BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_GRADIENTS    2
+#define BRW_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_COMPARE    0
+#define BRW_SAMPLER_MESSAGE_SIMD16_SAMPLE_COMPARE     2
+#define BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_BIAS_COMPARE 0
+#define BRW_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_LOD_COMPARE 1
+#define BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_LOD_COMPARE  1
+#define BRW_SAMPLER_MESSAGE_SIMD4X2_RESINFO           2
+#define BRW_SAMPLER_MESSAGE_SIMD16_RESINFO            2
+#define BRW_SAMPLER_MESSAGE_SIMD4X2_LD                3
+#define BRW_SAMPLER_MESSAGE_SIMD8_LD                  3
+#define BRW_SAMPLER_MESSAGE_SIMD16_LD                 3
+
+#define GEN5_SAMPLER_MESSAGE_SAMPLE              0
+#define GEN5_SAMPLER_MESSAGE_SAMPLE_BIAS         1
+#define GEN5_SAMPLER_MESSAGE_SAMPLE_LOD          2
+#define GEN5_SAMPLER_MESSAGE_SAMPLE_COMPARE      3
+#define GEN5_SAMPLER_MESSAGE_SAMPLE_DERIVS       4
+#define GEN5_SAMPLER_MESSAGE_SAMPLE_BIAS_COMPARE 5
+#define GEN5_SAMPLER_MESSAGE_SAMPLE_LOD_COMPARE  6
+#define GEN5_SAMPLER_MESSAGE_SAMPLE_LD           7
+#define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4      8
+#define GEN5_SAMPLER_MESSAGE_LOD                 9
+#define GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO      10
+#define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_C    16
+#define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO   17
+#define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C 18
+#define HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE 20
+#define GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS       29
+#define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS       30
+#define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DSS       31
+
+/* for GEN5 only */
+#define BRW_SAMPLER_SIMD_MODE_SIMD4X2                   0
+#define BRW_SAMPLER_SIMD_MODE_SIMD8                     1
+#define BRW_SAMPLER_SIMD_MODE_SIMD16                    2
+#define BRW_SAMPLER_SIMD_MODE_SIMD32_64                 3
+
+#define BRW_DATAPORT_OWORD_BLOCK_1_OWORDLOW   0
+#define BRW_DATAPORT_OWORD_BLOCK_1_OWORDHIGH  1
+#define BRW_DATAPORT_OWORD_BLOCK_2_OWORDS     2
+#define BRW_DATAPORT_OWORD_BLOCK_4_OWORDS     3
+#define BRW_DATAPORT_OWORD_BLOCK_8_OWORDS     4
+
+#define BRW_DATAPORT_OWORD_DUAL_BLOCK_1OWORD     0
+#define BRW_DATAPORT_OWORD_DUAL_BLOCK_4OWORDS    2
+
+#define BRW_DATAPORT_DWORD_SCATTERED_BLOCK_8DWORDS   2
+#define BRW_DATAPORT_DWORD_SCATTERED_BLOCK_16DWORDS  3
+
+/* This one stays the same across generations. */
+#define BRW_DATAPORT_READ_MESSAGE_OWORD_BLOCK_READ          0
+/* GEN4 */
+#define BRW_DATAPORT_READ_MESSAGE_OWORD_DUAL_BLOCK_READ     1
+#define BRW_DATAPORT_READ_MESSAGE_MEDIA_BLOCK_READ          2
+#define BRW_DATAPORT_READ_MESSAGE_DWORD_SCATTERED_READ      3
+/* G45, GEN5 */
+#define G45_DATAPORT_READ_MESSAGE_RENDER_UNORM_READ	    1
+#define G45_DATAPORT_READ_MESSAGE_OWORD_DUAL_BLOCK_READ     2
+#define G45_DATAPORT_READ_MESSAGE_AVC_LOOP_FILTER_READ	    3
+#define G45_DATAPORT_READ_MESSAGE_MEDIA_BLOCK_READ          4
+#define G45_DATAPORT_READ_MESSAGE_DWORD_SCATTERED_READ      6
+/* GEN6 */
+#define GEN6_DATAPORT_READ_MESSAGE_RENDER_UNORM_READ	    1
+#define GEN6_DATAPORT_READ_MESSAGE_OWORD_DUAL_BLOCK_READ     2
+#define GEN6_DATAPORT_READ_MESSAGE_MEDIA_BLOCK_READ          4
+#define GEN6_DATAPORT_READ_MESSAGE_OWORD_UNALIGN_BLOCK_READ  5
+#define GEN6_DATAPORT_READ_MESSAGE_DWORD_SCATTERED_READ      6
+
+#define BRW_DATAPORT_READ_TARGET_DATA_CACHE      0
+#define BRW_DATAPORT_READ_TARGET_RENDER_CACHE    1
+#define BRW_DATAPORT_READ_TARGET_SAMPLER_CACHE   2
+
+#define BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD16_SINGLE_SOURCE                0
+#define BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD16_SINGLE_SOURCE_REPLICATED     1
+#define BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD8_DUAL_SOURCE_SUBSPAN01         2
+#define BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD8_DUAL_SOURCE_SUBSPAN23         3
+#define BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD8_SINGLE_SOURCE_SUBSPAN01       4
+
+#define BRW_DATAPORT_WRITE_MESSAGE_OWORD_BLOCK_WRITE                0
+#define BRW_DATAPORT_WRITE_MESSAGE_OWORD_DUAL_BLOCK_WRITE           1
+#define BRW_DATAPORT_WRITE_MESSAGE_MEDIA_BLOCK_WRITE                2
+#define BRW_DATAPORT_WRITE_MESSAGE_DWORD_SCATTERED_WRITE            3
+#define BRW_DATAPORT_WRITE_MESSAGE_RENDER_TARGET_WRITE              4
+#define BRW_DATAPORT_WRITE_MESSAGE_STREAMED_VERTEX_BUFFER_WRITE     5
+#define BRW_DATAPORT_WRITE_MESSAGE_FLUSH_RENDER_CACHE               7
+
+/* GEN6 */
+#define GEN6_DATAPORT_WRITE_MESSAGE_DWORD_ATOMIC_WRITE              7
+#define GEN6_DATAPORT_WRITE_MESSAGE_OWORD_BLOCK_WRITE               8
+#define GEN6_DATAPORT_WRITE_MESSAGE_OWORD_DUAL_BLOCK_WRITE          9
+#define GEN6_DATAPORT_WRITE_MESSAGE_MEDIA_BLOCK_WRITE               10
+#define GEN6_DATAPORT_WRITE_MESSAGE_DWORD_SCATTERED_WRITE           11
+#define GEN6_DATAPORT_WRITE_MESSAGE_RENDER_TARGET_WRITE             12
+#define GEN6_DATAPORT_WRITE_MESSAGE_STREAMED_VB_WRITE               13
+#define GEN6_DATAPORT_WRITE_MESSAGE_RENDER_TARGET_UNORM_WRITE       14
+
+/* GEN7 */
+#define GEN7_DATAPORT_WRITE_MESSAGE_OWORD_DUAL_BLOCK_WRITE          10
+#define GEN7_DATAPORT_DC_OWORD_BLOCK_READ                           0
+#define GEN7_DATAPORT_DC_UNALIGNED_OWORD_BLOCK_READ                 1
+#define GEN7_DATAPORT_DC_OWORD_DUAL_BLOCK_READ                      2
+#define GEN7_DATAPORT_DC_DWORD_SCATTERED_READ                       3
+#define GEN7_DATAPORT_DC_BYTE_SCATTERED_READ                        4
+#define GEN7_DATAPORT_DC_UNTYPED_SURFACE_READ                       5
+#define GEN7_DATAPORT_DC_UNTYPED_ATOMIC_OP                          6
+#define GEN7_DATAPORT_DC_MEMORY_FENCE                               7
+#define GEN7_DATAPORT_DC_OWORD_BLOCK_WRITE                          8
+#define GEN7_DATAPORT_DC_OWORD_DUAL_BLOCK_WRITE                     10
+#define GEN7_DATAPORT_DC_DWORD_SCATTERED_WRITE                      11
+#define GEN7_DATAPORT_DC_BYTE_SCATTERED_WRITE                       12
+#define GEN7_DATAPORT_DC_UNTYPED_SURFACE_WRITE                      13
+
+#define GEN7_DATAPORT_SCRATCH_READ                            ((1 << 18) | \
+                                                               (0 << 17))
+#define GEN7_DATAPORT_SCRATCH_WRITE                           ((1 << 18) | \
+                                                               (1 << 17))
+#define GEN7_DATAPORT_SCRATCH_NUM_REGS_SHIFT                        12
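+/* Sketch (an assumption about usage; the exact encoding of the register
+ * count field is not specified here): a scratch write descriptor combines
+ * these fields along the lines of
+ *
+ *    desc = GEN7_DATAPORT_SCRATCH_WRITE |
+ *           (encoded_num_regs << GEN7_DATAPORT_SCRATCH_NUM_REGS_SHIFT);
+ */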
+
+/* HSW */
+#define HSW_DATAPORT_DC_PORT0_OWORD_BLOCK_READ                      0
+#define HSW_DATAPORT_DC_PORT0_UNALIGNED_OWORD_BLOCK_READ            1
+#define HSW_DATAPORT_DC_PORT0_OWORD_DUAL_BLOCK_READ                 2
+#define HSW_DATAPORT_DC_PORT0_DWORD_SCATTERED_READ                  3
+#define HSW_DATAPORT_DC_PORT0_BYTE_SCATTERED_READ                   4
+#define HSW_DATAPORT_DC_PORT0_MEMORY_FENCE                          7
+#define HSW_DATAPORT_DC_PORT0_OWORD_BLOCK_WRITE                     8
+#define HSW_DATAPORT_DC_PORT0_OWORD_DUAL_BLOCK_WRITE                10
+#define HSW_DATAPORT_DC_PORT0_DWORD_SCATTERED_WRITE                 11
+#define HSW_DATAPORT_DC_PORT0_BYTE_SCATTERED_WRITE                  12
+
+#define HSW_DATAPORT_DC_PORT1_UNTYPED_SURFACE_READ                  1
+#define HSW_DATAPORT_DC_PORT1_UNTYPED_ATOMIC_OP                     2
+#define HSW_DATAPORT_DC_PORT1_UNTYPED_ATOMIC_OP_SIMD4X2             3
+#define HSW_DATAPORT_DC_PORT1_MEDIA_BLOCK_READ                      4
+#define HSW_DATAPORT_DC_PORT1_TYPED_SURFACE_READ                    5
+#define HSW_DATAPORT_DC_PORT1_TYPED_ATOMIC_OP                       6
+#define HSW_DATAPORT_DC_PORT1_TYPED_ATOMIC_OP_SIMD4X2               7
+#define HSW_DATAPORT_DC_PORT1_UNTYPED_SURFACE_WRITE                 9
+#define HSW_DATAPORT_DC_PORT1_MEDIA_BLOCK_WRITE                     10
+#define HSW_DATAPORT_DC_PORT1_ATOMIC_COUNTER_OP                     11
+#define HSW_DATAPORT_DC_PORT1_ATOMIC_COUNTER_OP_SIMD4X2             12
+#define HSW_DATAPORT_DC_PORT1_TYPED_SURFACE_WRITE                   13
+
+/* Dataport atomic operations. */
+#define BRW_AOP_AND                   1
+#define BRW_AOP_OR                    2
+#define BRW_AOP_XOR                   3
+#define BRW_AOP_MOV                   4
+#define BRW_AOP_INC                   5
+#define BRW_AOP_DEC                   6
+#define BRW_AOP_ADD                   7
+#define BRW_AOP_SUB                   8
+#define BRW_AOP_REVSUB                9
+#define BRW_AOP_IMAX                  10
+#define BRW_AOP_IMIN                  11
+#define BRW_AOP_UMAX                  12
+#define BRW_AOP_UMIN                  13
+#define BRW_AOP_CMPWR                 14
+#define BRW_AOP_PREDEC                15
+
+#define BRW_MATH_FUNCTION_INV                              1
+#define BRW_MATH_FUNCTION_LOG                              2
+#define BRW_MATH_FUNCTION_EXP                              3
+#define BRW_MATH_FUNCTION_SQRT                             4
+#define BRW_MATH_FUNCTION_RSQ                              5
+#define BRW_MATH_FUNCTION_SIN                              6
+#define BRW_MATH_FUNCTION_COS                              7
+#define BRW_MATH_FUNCTION_SINCOS                           8 /* gen4, gen5 */
+#define BRW_MATH_FUNCTION_FDIV                             9 /* gen6+ */
+#define BRW_MATH_FUNCTION_POW                              10
+#define BRW_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER   11
+#define BRW_MATH_FUNCTION_INT_DIV_QUOTIENT                 12
+#define BRW_MATH_FUNCTION_INT_DIV_REMAINDER                13
+#define GEN8_MATH_FUNCTION_INVM                            14
+#define GEN8_MATH_FUNCTION_RSQRTM                          15
+
+#define BRW_MATH_INTEGER_UNSIGNED     0
+#define BRW_MATH_INTEGER_SIGNED       1
+
+#define BRW_MATH_PRECISION_FULL        0
+#define BRW_MATH_PRECISION_PARTIAL     1
+
+#define BRW_MATH_SATURATE_NONE         0
+#define BRW_MATH_SATURATE_SATURATE     1
+
+#define BRW_MATH_DATA_VECTOR  0
+#define BRW_MATH_DATA_SCALAR  1
+
+#define BRW_URB_OPCODE_WRITE_HWORD  0
+#define BRW_URB_OPCODE_WRITE_OWORD  1
+
+#define BRW_URB_SWIZZLE_NONE          0
+#define BRW_URB_SWIZZLE_INTERLEAVE    1
+#define BRW_URB_SWIZZLE_TRANSPOSE     2
+
+#define BRW_SCRATCH_SPACE_SIZE_1K     0
+#define BRW_SCRATCH_SPACE_SIZE_2K     1
+#define BRW_SCRATCH_SPACE_SIZE_4K     2
+#define BRW_SCRATCH_SPACE_SIZE_8K     3
+#define BRW_SCRATCH_SPACE_SIZE_16K    4
+#define BRW_SCRATCH_SPACE_SIZE_32K    5
+#define BRW_SCRATCH_SPACE_SIZE_64K    6
+#define BRW_SCRATCH_SPACE_SIZE_128K   7
+#define BRW_SCRATCH_SPACE_SIZE_256K   8
+#define BRW_SCRATCH_SPACE_SIZE_512K   9
+#define BRW_SCRATCH_SPACE_SIZE_1M     10
+#define BRW_SCRATCH_SPACE_SIZE_2M     11
+
+
+#define CMD_URB_FENCE                 0x6000
+#define CMD_CS_URB_STATE              0x6001
+#define CMD_CONST_BUFFER              0x6002
+
+#define CMD_STATE_BASE_ADDRESS        0x6101
+#define CMD_STATE_SIP                 0x6102
+#define CMD_PIPELINE_SELECT_965       0x6104
+#define CMD_PIPELINE_SELECT_GM45      0x6904
+
+#define _3DSTATE_PIPELINED_POINTERS		0x7800
+#define _3DSTATE_BINDING_TABLE_POINTERS		0x7801
+# define GEN6_BINDING_TABLE_MODIFY_VS	(1 << 8)
+# define GEN6_BINDING_TABLE_MODIFY_GS	(1 << 9)
+# define GEN6_BINDING_TABLE_MODIFY_PS	(1 << 12)
+
+#define _3DSTATE_BINDING_TABLE_POINTERS_VS	0x7826 /* GEN7+ */
+#define _3DSTATE_BINDING_TABLE_POINTERS_HS	0x7827 /* GEN7+ */
+#define _3DSTATE_BINDING_TABLE_POINTERS_DS	0x7828 /* GEN7+ */
+#define _3DSTATE_BINDING_TABLE_POINTERS_GS	0x7829 /* GEN7+ */
+#define _3DSTATE_BINDING_TABLE_POINTERS_PS	0x782A /* GEN7+ */
+
+#define _3DSTATE_SAMPLER_STATE_POINTERS		0x7802 /* GEN6+ */
+# define PS_SAMPLER_STATE_CHANGE				(1 << 12)
+# define GS_SAMPLER_STATE_CHANGE				(1 << 9)
+# define VS_SAMPLER_STATE_CHANGE				(1 << 8)
+/* DW1: VS */
+/* DW2: GS */
+/* DW3: PS */
+
+#define _3DSTATE_SAMPLER_STATE_POINTERS_VS	0x782B /* GEN7+ */
+#define _3DSTATE_SAMPLER_STATE_POINTERS_GS	0x782E /* GEN7+ */
+#define _3DSTATE_SAMPLER_STATE_POINTERS_PS	0x782F /* GEN7+ */
+
+#define _3DSTATE_VERTEX_BUFFERS       0x7808
+# define BRW_VB0_INDEX_SHIFT		27
+# define GEN6_VB0_INDEX_SHIFT		26
+# define BRW_VB0_ACCESS_VERTEXDATA	(0 << 26)
+# define BRW_VB0_ACCESS_INSTANCEDATA	(1 << 26)
+# define GEN6_VB0_ACCESS_VERTEXDATA	(0 << 20)
+# define GEN6_VB0_ACCESS_INSTANCEDATA	(1 << 20)
+# define GEN7_VB0_ADDRESS_MODIFYENABLE  (1 << 14)
+# define BRW_VB0_PITCH_SHIFT		0
+
+#define _3DSTATE_VERTEX_ELEMENTS      0x7809
+# define BRW_VE0_INDEX_SHIFT		27
+# define GEN6_VE0_INDEX_SHIFT		26
+# define BRW_VE0_FORMAT_SHIFT		16
+# define BRW_VE0_VALID			(1 << 26)
+# define GEN6_VE0_VALID			(1 << 25)
+# define GEN6_VE0_EDGE_FLAG_ENABLE	(1 << 15)
+# define BRW_VE0_SRC_OFFSET_SHIFT	0
+# define BRW_VE1_COMPONENT_NOSTORE	0
+# define BRW_VE1_COMPONENT_STORE_SRC	1
+# define BRW_VE1_COMPONENT_STORE_0	2
+# define BRW_VE1_COMPONENT_STORE_1_FLT	3
+# define BRW_VE1_COMPONENT_STORE_1_INT	4
+# define BRW_VE1_COMPONENT_STORE_VID	5
+# define BRW_VE1_COMPONENT_STORE_IID	6
+# define BRW_VE1_COMPONENT_STORE_PID	7
+# define BRW_VE1_COMPONENT_0_SHIFT	28
+# define BRW_VE1_COMPONENT_1_SHIFT	24
+# define BRW_VE1_COMPONENT_2_SHIFT	20
+# define BRW_VE1_COMPONENT_3_SHIFT	16
+# define BRW_VE1_DST_OFFSET_SHIFT	0
+
+#define CMD_INDEX_BUFFER              0x780a
+#define GEN4_3DSTATE_VF_STATISTICS		0x780b
+#define GM45_3DSTATE_VF_STATISTICS		0x680b
+#define _3DSTATE_CC_STATE_POINTERS		0x780e /* GEN6+ */
+#define _3DSTATE_BLEND_STATE_POINTERS		0x7824 /* GEN7+ */
+#define _3DSTATE_DEPTH_STENCIL_STATE_POINTERS	0x7825 /* GEN7+ */
+
+#define _3DSTATE_URB				0x7805 /* GEN6 */
+# define GEN6_URB_VS_SIZE_SHIFT				16
+# define GEN6_URB_VS_ENTRIES_SHIFT			0
+# define GEN6_URB_GS_ENTRIES_SHIFT			8
+# define GEN6_URB_GS_SIZE_SHIFT				0
+
+#define _3DSTATE_VF                             0x780c /* GEN7.5+ */
+#define HSW_CUT_INDEX_ENABLE                            (1 << 8)
+
+#define _3DSTATE_VF_INSTANCING                  0x7849 /* GEN8+ */
+# define GEN8_VF_INSTANCING_ENABLE                      (1 << 8)
+
+#define _3DSTATE_VF_SGVS                        0x784a /* GEN8+ */
+# define GEN8_SGVS_ENABLE_INSTANCE_ID                   (1 << 31)
+# define GEN8_SGVS_INSTANCE_ID_COMPONENT_SHIFT          29
+# define GEN8_SGVS_INSTANCE_ID_ELEMENT_OFFSET_SHIFT     16
+# define GEN8_SGVS_ENABLE_VERTEX_ID                     (1 << 15)
+# define GEN8_SGVS_VERTEX_ID_COMPONENT_SHIFT            13
+# define GEN8_SGVS_VERTEX_ID_ELEMENT_OFFSET_SHIFT       0
+
+#define _3DSTATE_VF_TOPOLOGY                    0x784b /* GEN8+ */
+
+#define _3DSTATE_WM_CHROMAKEY			0x784c /* GEN8+ */
+
+#define _3DSTATE_URB_VS                         0x7830 /* GEN7+ */
+#define _3DSTATE_URB_HS                         0x7831 /* GEN7+ */
+#define _3DSTATE_URB_DS                         0x7832 /* GEN7+ */
+#define _3DSTATE_URB_GS                         0x7833 /* GEN7+ */
+# define GEN7_URB_ENTRY_SIZE_SHIFT                      16
+# define GEN7_URB_STARTING_ADDRESS_SHIFT                25
+
+/* "GS URB Entry Allocation Size" is a U9-1 field (the programmed value is
+ * one less than the actual size), so the maximum gs_size is 2^9, or 512.
+ * It's counted in multiples of 64 bytes.
+ */
+#define GEN7_MAX_GS_URB_ENTRY_SIZE_BYTES		(512*64)
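+/* E.g. (assuming the minus-one encoding noted above) a 1024-byte entry
+ * occupies 1024 / 64 = 16 allocation units and is programmed as 15.
+ */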
+
+#define _3DSTATE_PUSH_CONSTANT_ALLOC_VS         0x7912 /* GEN7+ */
+#define _3DSTATE_PUSH_CONSTANT_ALLOC_GS         0x7915 /* GEN7+ */
+#define _3DSTATE_PUSH_CONSTANT_ALLOC_PS         0x7916 /* GEN7+ */
+# define GEN7_PUSH_CONSTANT_BUFFER_OFFSET_SHIFT         16
+
+#define _3DSTATE_VIEWPORT_STATE_POINTERS	0x780d /* GEN6+ */
+# define GEN6_CC_VIEWPORT_MODIFY			(1 << 12)
+# define GEN6_SF_VIEWPORT_MODIFY			(1 << 11)
+# define GEN6_CLIP_VIEWPORT_MODIFY			(1 << 10)
+# define GEN7_NUM_VIEWPORTS				16
+
+#define _3DSTATE_VIEWPORT_STATE_POINTERS_CC	0x7823 /* GEN7+ */
+#define _3DSTATE_VIEWPORT_STATE_POINTERS_SF_CL	0x7821 /* GEN7+ */
+
+#define _3DSTATE_SCISSOR_STATE_POINTERS		0x780f /* GEN6+ */
+
+#define _3DSTATE_VS				0x7810 /* GEN6+ */
+/* DW2 */
+# define GEN6_VS_SPF_MODE				(1 << 31)
+# define GEN6_VS_VECTOR_MASK_ENABLE			(1 << 30)
+# define GEN6_VS_SAMPLER_COUNT_SHIFT			27
+# define GEN6_VS_BINDING_TABLE_ENTRY_COUNT_SHIFT	18
+# define GEN6_VS_FLOATING_POINT_MODE_IEEE_754		(0 << 16)
+# define GEN6_VS_FLOATING_POINT_MODE_ALT		(1 << 16)
+/* DW4 */
+# define GEN6_VS_DISPATCH_START_GRF_SHIFT		20
+# define GEN6_VS_URB_READ_LENGTH_SHIFT			11
+# define GEN6_VS_URB_ENTRY_READ_OFFSET_SHIFT		4
+/* DW5 */
+# define GEN6_VS_MAX_THREADS_SHIFT			25
+# define HSW_VS_MAX_THREADS_SHIFT			23
+# define GEN6_VS_STATISTICS_ENABLE			(1 << 10)
+# define GEN6_VS_CACHE_DISABLE				(1 << 1)
+# define GEN6_VS_ENABLE					(1 << 0)
+/* Gen8+ DW8 */
+# define GEN8_VS_URB_ENTRY_OUTPUT_OFFSET_SHIFT          21
+# define GEN8_VS_URB_OUTPUT_LENGTH_SHIFT                16
+# define GEN8_VS_USER_CLIP_DISTANCE_SHIFT               8
+
+#define _3DSTATE_GS		      		0x7811 /* GEN6+ */
+/* DW2 */
+# define GEN6_GS_SPF_MODE				(1 << 31)
+# define GEN6_GS_VECTOR_MASK_ENABLE			(1 << 30)
+# define GEN6_GS_SAMPLER_COUNT_SHIFT			27
+# define GEN6_GS_BINDING_TABLE_ENTRY_COUNT_SHIFT	18
+# define GEN6_GS_FLOATING_POINT_MODE_IEEE_754		(0 << 16)
+# define GEN6_GS_FLOATING_POINT_MODE_ALT		(1 << 16)
+/* DW4 */
+# define GEN7_GS_OUTPUT_VERTEX_SIZE_SHIFT		23
+# define GEN7_GS_OUTPUT_TOPOLOGY_SHIFT			17
+# define GEN6_GS_URB_READ_LENGTH_SHIFT			11
+# define GEN7_GS_INCLUDE_VERTEX_HANDLES		        (1 << 10)
+# define GEN6_GS_URB_ENTRY_READ_OFFSET_SHIFT		4
+# define GEN6_GS_DISPATCH_START_GRF_SHIFT		0
+/* DW5 */
+# define GEN6_GS_MAX_THREADS_SHIFT			25
+# define HSW_GS_MAX_THREADS_SHIFT			24
+# define IVB_GS_CONTROL_DATA_FORMAT_SHIFT		24
+# define GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT		0
+# define GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID		1
+# define GEN7_GS_CONTROL_DATA_HEADER_SIZE_SHIFT		20
+# define GEN7_GS_INSTANCE_CONTROL_SHIFT			15
+# define GEN7_GS_DISPATCH_MODE_SINGLE			(0 << 11)
+# define GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE		(1 << 11)
+# define GEN7_GS_DISPATCH_MODE_DUAL_OBJECT		(2 << 11)
+# define GEN6_GS_STATISTICS_ENABLE			(1 << 10)
+# define GEN6_GS_SO_STATISTICS_ENABLE			(1 << 9)
+# define GEN6_GS_RENDERING_ENABLE			(1 << 8)
+# define GEN7_GS_INCLUDE_PRIMITIVE_ID			(1 << 4)
+# define GEN7_GS_REORDER_TRAILING			(1 << 2)
+# define GEN7_GS_ENABLE					(1 << 0)
+/* DW6 */
+# define HSW_GS_CONTROL_DATA_FORMAT_SHIFT		31
+# define GEN6_GS_REORDER				(1 << 30)
+# define GEN6_GS_DISCARD_ADJACENCY			(1 << 29)
+# define GEN6_GS_SVBI_PAYLOAD_ENABLE			(1 << 28)
+# define GEN6_GS_SVBI_POSTINCREMENT_ENABLE		(1 << 27)
+# define GEN6_GS_SVBI_POSTINCREMENT_VALUE_SHIFT		16
+# define GEN6_GS_SVBI_POSTINCREMENT_VALUE_MASK		INTEL_MASK(25, 16)
+# define GEN6_GS_ENABLE					(1 << 15)
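+/* INTEL_MASK(high, low), assumed to be defined earlier in this header,
+ * expands to a mask with bits high..low set; e.g. INTEL_MASK(25, 16) is
+ * 0x03ff0000.
+ */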
+
+/* Gen8+ DW9 */
+# define GEN8_GS_URB_ENTRY_OUTPUT_OFFSET_SHIFT          21
+# define GEN8_GS_URB_OUTPUT_LENGTH_SHIFT                16
+# define GEN8_GS_USER_CLIP_DISTANCE_SHIFT               8
+
+# define BRW_GS_EDGE_INDICATOR_0			(1 << 8)
+# define BRW_GS_EDGE_INDICATOR_1			(1 << 9)
+
+/* GS Thread Payload */
+/* R0 */
+# define GEN7_GS_PAYLOAD_INSTANCE_ID_SHIFT		27
+
+/* 3DSTATE_GS "Output Vertex Size" has an effective maximum of 62.  It's
+ * counted in multiples of 16 bytes.
+ */
+#define GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES		(62*16)
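+/* That caps a single output vertex at 62 * 16 = 992 bytes. */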
+
+#define _3DSTATE_HS                             0x781B /* GEN7+ */
+#define _3DSTATE_TE                             0x781C /* GEN7+ */
+#define _3DSTATE_DS                             0x781D /* GEN7+ */
+
+#define _3DSTATE_CLIP				0x7812 /* GEN6+ */
+/* DW1 */
+# define GEN7_CLIP_WINDING_CW                           (0 << 20)
+# define GEN7_CLIP_WINDING_CCW                          (1 << 20)
+# define GEN7_CLIP_VERTEX_SUBPIXEL_PRECISION_8          (0 << 19)
+# define GEN7_CLIP_VERTEX_SUBPIXEL_PRECISION_4          (1 << 19)
+# define GEN7_CLIP_EARLY_CULL                           (1 << 18)
+# define GEN7_CLIP_CULLMODE_BOTH                        (0 << 16)
+# define GEN7_CLIP_CULLMODE_NONE                        (1 << 16)
+# define GEN7_CLIP_CULLMODE_FRONT                       (2 << 16)
+# define GEN7_CLIP_CULLMODE_BACK                        (3 << 16)
+# define GEN6_CLIP_STATISTICS_ENABLE			(1 << 10)
+/**
+ * Performs cheap culling based only on the clip distance.  These bits must
+ * be disjoint from the USER_CLIP_CLIP_DISTANCE bits.
+ */
+# define GEN6_USER_CLIP_CULL_DISTANCES_SHIFT		0
+/* DW2 */
+# define GEN6_CLIP_ENABLE				(1 << 31)
+# define GEN6_CLIP_API_OGL				(0 << 30)
+# define GEN6_CLIP_API_D3D				(1 << 30)
+# define GEN6_CLIP_XY_TEST				(1 << 28)
+# define GEN6_CLIP_Z_TEST				(1 << 27)
+# define GEN6_CLIP_GB_TEST				(1 << 26)
+/** 8-bit field of which user clip distances to clip against. */
+# define GEN6_USER_CLIP_CLIP_DISTANCES_SHIFT		16
+# define GEN6_CLIP_MODE_NORMAL				(0 << 13)
+# define GEN6_CLIP_MODE_REJECT_ALL			(3 << 13)
+# define GEN6_CLIP_MODE_ACCEPT_ALL			(4 << 13)
+# define GEN6_CLIP_PERSPECTIVE_DIVIDE_DISABLE		(1 << 9)
+# define GEN6_CLIP_NON_PERSPECTIVE_BARYCENTRIC_ENABLE	(1 << 8)
+# define GEN6_CLIP_TRI_PROVOKE_SHIFT			4
+# define GEN6_CLIP_LINE_PROVOKE_SHIFT			2
+# define GEN6_CLIP_TRIFAN_PROVOKE_SHIFT			0
+/* DW3 */
+# define GEN6_CLIP_MIN_POINT_WIDTH_SHIFT		17
+# define GEN6_CLIP_MAX_POINT_WIDTH_SHIFT		6
+# define GEN6_CLIP_FORCE_ZERO_RTAINDEX			(1 << 5)
+# define GEN6_CLIP_MAX_VP_INDEX_MASK			INTEL_MASK(3, 0)
+
+#define _3DSTATE_SF				0x7813 /* GEN6+ */
+/* DW1 (for gen6) */
+# define GEN6_SF_NUM_OUTPUTS_SHIFT			22
+# define GEN6_SF_SWIZZLE_ENABLE				(1 << 21)
+# define GEN6_SF_POINT_SPRITE_UPPERLEFT			(0 << 20)
+# define GEN6_SF_POINT_SPRITE_LOWERLEFT			(1 << 20)
+# define GEN6_SF_URB_ENTRY_READ_LENGTH_SHIFT		11
+# define GEN6_SF_URB_ENTRY_READ_OFFSET_SHIFT		4
+/* DW2 */
+# define GEN6_SF_LEGACY_GLOBAL_DEPTH_BIAS		(1 << 11)
+# define GEN6_SF_STATISTICS_ENABLE			(1 << 10)
+# define GEN6_SF_GLOBAL_DEPTH_OFFSET_SOLID		(1 << 9)
+# define GEN6_SF_GLOBAL_DEPTH_OFFSET_WIREFRAME		(1 << 8)
+# define GEN6_SF_GLOBAL_DEPTH_OFFSET_POINT		(1 << 7)
+# define GEN6_SF_FRONT_SOLID				(0 << 5)
+# define GEN6_SF_FRONT_WIREFRAME			(1 << 5)
+# define GEN6_SF_FRONT_POINT				(2 << 5)
+# define GEN6_SF_BACK_SOLID				(0 << 3)
+# define GEN6_SF_BACK_WIREFRAME				(1 << 3)
+# define GEN6_SF_BACK_POINT				(2 << 3)
+# define GEN6_SF_VIEWPORT_TRANSFORM_ENABLE		(1 << 1)
+# define GEN6_SF_WINDING_CCW				(1 << 0)
+/* DW3 */
+# define GEN6_SF_LINE_AA_ENABLE				(1 << 31)
+# define GEN6_SF_CULL_BOTH				(0 << 29)
+# define GEN6_SF_CULL_NONE				(1 << 29)
+# define GEN6_SF_CULL_FRONT				(2 << 29)
+# define GEN6_SF_CULL_BACK				(3 << 29)
+# define GEN6_SF_LINE_WIDTH_SHIFT			18 /* U3.7 */
+# define GEN6_SF_LINE_END_CAP_WIDTH_0_5			(0 << 16)
+# define GEN6_SF_LINE_END_CAP_WIDTH_1_0			(1 << 16)
+# define GEN6_SF_LINE_END_CAP_WIDTH_2_0			(2 << 16)
+# define GEN6_SF_LINE_END_CAP_WIDTH_4_0			(3 << 16)
+# define GEN6_SF_SCISSOR_ENABLE				(1 << 11)
+# define GEN6_SF_MSRAST_OFF_PIXEL			(0 << 8)
+# define GEN6_SF_MSRAST_OFF_PATTERN			(1 << 8)
+# define GEN6_SF_MSRAST_ON_PIXEL			(2 << 8)
+# define GEN6_SF_MSRAST_ON_PATTERN			(3 << 8)
+/* DW4 */
+# define GEN6_SF_TRI_PROVOKE_SHIFT			29
+# define GEN6_SF_LINE_PROVOKE_SHIFT			27
+# define GEN6_SF_TRIFAN_PROVOKE_SHIFT			25
+# define GEN6_SF_LINE_AA_MODE_MANHATTAN			(0 << 14)
+# define GEN6_SF_LINE_AA_MODE_TRUE			(1 << 14)
+# define GEN6_SF_VERTEX_SUBPIXEL_8BITS			(0 << 12)
+# define GEN6_SF_VERTEX_SUBPIXEL_4BITS			(1 << 12)
+# define GEN6_SF_USE_STATE_POINT_WIDTH			(1 << 11)
+# define GEN6_SF_POINT_WIDTH_SHIFT			0 /* U8.3 */
+/* DW5: depth offset constant */
+/* DW6: depth offset scale */
+/* DW7: depth offset clamp */
+/* DW8 */
+# define ATTRIBUTE_1_OVERRIDE_W				(1 << 31)
+# define ATTRIBUTE_1_OVERRIDE_Z				(1 << 30)
+# define ATTRIBUTE_1_OVERRIDE_Y				(1 << 29)
+# define ATTRIBUTE_1_OVERRIDE_X				(1 << 28)
+# define ATTRIBUTE_1_CONST_SOURCE_SHIFT			25
+# define ATTRIBUTE_1_SWIZZLE_SHIFT			22
+# define ATTRIBUTE_1_SOURCE_SHIFT			16
+# define ATTRIBUTE_0_OVERRIDE_W				(1 << 15)
+# define ATTRIBUTE_0_OVERRIDE_Z				(1 << 14)
+# define ATTRIBUTE_0_OVERRIDE_Y				(1 << 13)
+# define ATTRIBUTE_0_OVERRIDE_X				(1 << 12)
+# define ATTRIBUTE_0_CONST_SOURCE_SHIFT			9
+#  define ATTRIBUTE_CONST_0000				0
+#  define ATTRIBUTE_CONST_0001_FLOAT			1
+#  define ATTRIBUTE_CONST_1111_FLOAT			2
+#  define ATTRIBUTE_CONST_PRIM_ID			3
+# define ATTRIBUTE_0_SWIZZLE_SHIFT			6
+# define ATTRIBUTE_0_SOURCE_SHIFT			0
+
+# define ATTRIBUTE_SWIZZLE_INPUTATTR                    0
+# define ATTRIBUTE_SWIZZLE_INPUTATTR_FACING             1
+# define ATTRIBUTE_SWIZZLE_INPUTATTR_W                  2
+# define ATTRIBUTE_SWIZZLE_INPUTATTR_FACING_W           3
+# define ATTRIBUTE_SWIZZLE_SHIFT                        6
+
+/* DW16: Point sprite texture coordinate enables */
+/* DW17: Constant interpolation enables */
+/* DW18: attr 0-7 wrap shortest enables */
+/* DW19: attr 8-16 wrap shortest enables */
+
+/* On GEN7, many fields of 3DSTATE_SF were split out into a new command,
+ * 3DSTATE_SBE.  The remaining fields live in different DWords but retain
+ * the same bit offsets.  The only new field is:
+ */
+/* GEN7/DW1: */
+# define GEN7_SF_DEPTH_BUFFER_SURFACE_FORMAT_SHIFT	12
+/* GEN7/DW2: */
+# define HSW_SF_LINE_STIPPLE_ENABLE			(1 << 14)
+
+# define GEN8_SF_SMOOTH_POINT_ENABLE                    (1 << 13)
+
+#define _3DSTATE_SBE				0x781F /* GEN7+ */
+/* DW1 */
+# define GEN8_SBE_FORCE_URB_ENTRY_READ_LENGTH           (1 << 29)
+# define GEN8_SBE_FORCE_URB_ENTRY_READ_OFFSET           (1 << 28)
+# define GEN7_SBE_SWIZZLE_CONTROL_MODE			(1 << 28)
+# define GEN7_SBE_NUM_OUTPUTS_SHIFT			22
+# define GEN7_SBE_SWIZZLE_ENABLE			(1 << 21)
+# define GEN7_SBE_POINT_SPRITE_LOWERLEFT		(1 << 20)
+# define GEN7_SBE_URB_ENTRY_READ_LENGTH_SHIFT		11
+# define GEN7_SBE_URB_ENTRY_READ_OFFSET_SHIFT		4
+# define GEN8_SBE_URB_ENTRY_READ_OFFSET_SHIFT		5
+/* DW2-9: Attribute setup (same as DW8-15 of gen6 _3DSTATE_SF) */
+/* DW10: Point sprite texture coordinate enables */
+/* DW11: Constant interpolation enables */
+/* DW12: attr 0-7 wrap shortest enables */
+/* DW13: attr 8-16 wrap shortest enables */
+
+#define _3DSTATE_SBE_SWIZ                       0x7851 /* GEN8+ */
+
+#define _3DSTATE_RASTER                         0x7850 /* GEN8+ */
+/* DW1 */
+# define GEN8_RASTER_FRONT_WINDING_CCW                  (1 << 21)
+# define GEN8_RASTER_CULL_BOTH                          (0 << 16)
+# define GEN8_RASTER_CULL_NONE                          (1 << 16)
+# define GEN8_RASTER_CULL_FRONT                         (2 << 16)
+# define GEN8_RASTER_CULL_BACK                          (3 << 16)
+# define GEN8_RASTER_SMOOTH_POINT_ENABLE                (1 << 13)
+# define GEN8_RASTER_API_MULTISAMPLE_ENABLE             (1 << 12)
+# define GEN8_RASTER_LINE_AA_ENABLE                     (1 << 2)
+# define GEN8_RASTER_SCISSOR_ENABLE                     (1 << 1)
+# define GEN8_RASTER_VIEWPORT_Z_CLIP_TEST_ENABLE        (1 << 0)
+
+/* Gen8 BLEND_STATE */
+/* DW0 */
+#define GEN8_BLEND_ALPHA_TO_COVERAGE_ENABLE             (1 << 31)
+#define GEN8_BLEND_INDEPENDENT_ALPHA_BLEND_ENABLE       (1 << 30)
+#define GEN8_BLEND_ALPHA_TO_ONE_ENABLE                  (1 << 29)
+#define GEN8_BLEND_ALPHA_TO_COVERAGE_DITHER_ENABLE      (1 << 28)
+#define GEN8_BLEND_ALPHA_TEST_ENABLE                    (1 << 27)
+#define GEN8_BLEND_ALPHA_TEST_FUNCTION_MASK             INTEL_MASK(26, 24)
+#define GEN8_BLEND_ALPHA_TEST_FUNCTION_SHIFT            24
+#define GEN8_BLEND_COLOR_DITHER_ENABLE                  (1 << 23)
+#define GEN8_BLEND_X_DITHER_OFFSET_MASK                 INTEL_MASK(22, 21)
+#define GEN8_BLEND_X_DITHER_OFFSET_SHIFT                21
+#define GEN8_BLEND_Y_DITHER_OFFSET_MASK                 INTEL_MASK(20, 19)
+#define GEN8_BLEND_Y_DITHER_OFFSET_SHIFT                19
+/* DW1 + 2n */
+#define GEN8_BLEND_COLOR_BUFFER_BLEND_ENABLE            (1 << 31)
+#define GEN8_BLEND_SRC_BLEND_FACTOR_MASK                INTEL_MASK(30, 26)
+#define GEN8_BLEND_SRC_BLEND_FACTOR_SHIFT               26
+#define GEN8_BLEND_DST_BLEND_FACTOR_MASK                INTEL_MASK(25, 21)
+#define GEN8_BLEND_DST_BLEND_FACTOR_SHIFT               21
+#define GEN8_BLEND_COLOR_BLEND_FUNCTION_MASK            INTEL_MASK(20, 18)
+#define GEN8_BLEND_COLOR_BLEND_FUNCTION_SHIFT           18
+#define GEN8_BLEND_SRC_ALPHA_BLEND_FACTOR_MASK          INTEL_MASK(17, 13)
+#define GEN8_BLEND_SRC_ALPHA_BLEND_FACTOR_SHIFT         13
+#define GEN8_BLEND_DST_ALPHA_BLEND_FACTOR_MASK          INTEL_MASK(12, 8)
+#define GEN8_BLEND_DST_ALPHA_BLEND_FACTOR_SHIFT         8
+#define GEN8_BLEND_ALPHA_BLEND_FUNCTION_MASK            INTEL_MASK(7, 5)
+#define GEN8_BLEND_ALPHA_BLEND_FUNCTION_SHIFT           5
+#define GEN8_BLEND_WRITE_DISABLE_ALPHA                  (1 << 3)
+#define GEN8_BLEND_WRITE_DISABLE_RED                    (1 << 2)
+#define GEN8_BLEND_WRITE_DISABLE_GREEN                  (1 << 1)
+#define GEN8_BLEND_WRITE_DISABLE_BLUE                   (1 << 0)
+/* DW1 + 2n + 1 */
+#define GEN8_BLEND_LOGIC_OP_ENABLE                      (1 << 31)
+#define GEN8_BLEND_LOGIC_OP_FUNCTION_MASK               INTEL_MASK(30, 27)
+#define GEN8_BLEND_LOGIC_OP_FUNCTION_SHIFT              27
+#define GEN8_BLEND_PRE_BLEND_SRC_ONLY_CLAMP_ENABLE      (1 << 4)
+#define GEN8_BLEND_COLOR_CLAMP_RANGE_RTFORMAT           (2 << 2)
+#define GEN8_BLEND_PRE_BLEND_COLOR_CLAMP_ENABLE         (1 << 1)
+#define GEN8_BLEND_POST_BLEND_COLOR_CLAMP_ENABLE        (1 << 0)
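+/* Illustrative read-back of one field using the mask/shift pairs above:
+ *
+ *    unsigned func = (dw & GEN8_BLEND_COLOR_BLEND_FUNCTION_MASK) >>
+ *                    GEN8_BLEND_COLOR_BLEND_FUNCTION_SHIFT;
+ */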
+
+#define _3DSTATE_WM_HZ_OP                       0x7852 /* GEN8+ */
+/* DW1 */
+# define GEN8_WM_HZ_STENCIL_CLEAR                       (1 << 31)
+# define GEN8_WM_HZ_DEPTH_CLEAR                         (1 << 30)
+# define GEN8_WM_HZ_DEPTH_RESOLVE                       (1 << 28)
+# define GEN8_WM_HZ_HIZ_RESOLVE                         (1 << 27)
+# define GEN8_WM_HZ_PIXEL_OFFSET_ENABLE                 (1 << 26)
+# define GEN8_WM_HZ_FULL_SURFACE_DEPTH_CLEAR            (1 << 25)
+# define GEN8_WM_HZ_STENCIL_CLEAR_VALUE_MASK            INTEL_MASK(23, 16)
+# define GEN8_WM_HZ_STENCIL_CLEAR_VALUE_SHIFT           16
+# define GEN8_WM_HZ_NUM_SAMPLES_MASK                    INTEL_MASK(15, 13)
+# define GEN8_WM_HZ_NUM_SAMPLES_SHIFT                   13
+/* DW2 */
+# define GEN8_WM_HZ_CLEAR_RECTANGLE_Y_MIN_MASK          INTEL_MASK(31, 16)
+# define GEN8_WM_HZ_CLEAR_RECTANGLE_Y_MIN_SHIFT         16
+# define GEN8_WM_HZ_CLEAR_RECTANGLE_X_MIN_MASK          INTEL_MASK(15, 0)
+# define GEN8_WM_HZ_CLEAR_RECTANGLE_X_MIN_SHIFT         0
+/* DW3 */
+# define GEN8_WM_HZ_CLEAR_RECTANGLE_Y_MAX_MASK          INTEL_MASK(31, 16)
+# define GEN8_WM_HZ_CLEAR_RECTANGLE_Y_MAX_SHIFT         16
+# define GEN8_WM_HZ_CLEAR_RECTANGLE_X_MAX_MASK          INTEL_MASK(15, 0)
+# define GEN8_WM_HZ_CLEAR_RECTANGLE_X_MAX_SHIFT         0
+/* DW4 */
+# define GEN8_WM_HZ_SAMPLE_MASK_MASK                    INTEL_MASK(15, 0)
+# define GEN8_WM_HZ_SAMPLE_MASK_SHIFT                   0
+
+
+#define _3DSTATE_PS_BLEND                       0x784D /* GEN8+ */
+/* DW1 */
+# define GEN8_PS_BLEND_ALPHA_TO_COVERAGE_ENABLE         (1 << 31)
+# define GEN8_PS_BLEND_HAS_WRITEABLE_RT                 (1 << 30)
+# define GEN8_PS_BLEND_COLOR_BUFFER_BLEND_ENABLE        (1 << 29)
+# define GEN8_PS_BLEND_SRC_ALPHA_BLEND_FACTOR_MASK      INTEL_MASK(28, 24)
+# define GEN8_PS_BLEND_SRC_ALPHA_BLEND_FACTOR_SHIFT     24
+# define GEN8_PS_BLEND_DST_ALPHA_BLEND_FACTOR_MASK      INTEL_MASK(23, 19)
+# define GEN8_PS_BLEND_DST_ALPHA_BLEND_FACTOR_SHIFT     19
+# define GEN8_PS_BLEND_SRC_BLEND_FACTOR_MASK            INTEL_MASK(18, 14)
+# define GEN8_PS_BLEND_SRC_BLEND_FACTOR_SHIFT           14
+# define GEN8_PS_BLEND_DST_BLEND_FACTOR_MASK            INTEL_MASK(13, 9)
+# define GEN8_PS_BLEND_DST_BLEND_FACTOR_SHIFT           9
+# define GEN8_PS_BLEND_ALPHA_TEST_ENABLE                (1 << 8)
+# define GEN8_PS_BLEND_INDEPENDENT_ALPHA_BLEND_ENABLE   (1 << 7)
+
+#define _3DSTATE_WM_DEPTH_STENCIL               0x784E /* GEN8+ */
+/* DW1 */
+# define GEN8_WM_DS_STENCIL_FAIL_OP_SHIFT               29
+# define GEN8_WM_DS_Z_FAIL_OP_SHIFT                     26
+# define GEN8_WM_DS_Z_PASS_OP_SHIFT                     23
+# define GEN8_WM_DS_BF_STENCIL_FUNC_SHIFT               20
+# define GEN8_WM_DS_BF_STENCIL_FAIL_OP_SHIFT            17
+# define GEN8_WM_DS_BF_Z_FAIL_OP_SHIFT                  14
+# define GEN8_WM_DS_BF_Z_PASS_OP_SHIFT                  11
+# define GEN8_WM_DS_STENCIL_FUNC_SHIFT                  8
+# define GEN8_WM_DS_DEPTH_FUNC_SHIFT                    5
+# define GEN8_WM_DS_DOUBLE_SIDED_STENCIL_ENABLE         (1 << 4)
+# define GEN8_WM_DS_STENCIL_TEST_ENABLE                 (1 << 3)
+# define GEN8_WM_DS_STENCIL_BUFFER_WRITE_ENABLE         (1 << 2)
+# define GEN8_WM_DS_DEPTH_TEST_ENABLE                   (1 << 1)
+# define GEN8_WM_DS_DEPTH_BUFFER_WRITE_ENABLE           (1 << 0)
+/* DW2 */
+# define GEN8_WM_DS_STENCIL_TEST_MASK_MASK              INTEL_MASK(31, 24)
+# define GEN8_WM_DS_STENCIL_TEST_MASK_SHIFT             24
+# define GEN8_WM_DS_STENCIL_WRITE_MASK_MASK             INTEL_MASK(23, 16)
+# define GEN8_WM_DS_STENCIL_WRITE_MASK_SHIFT            16
+# define GEN8_WM_DS_BF_STENCIL_TEST_MASK_MASK           INTEL_MASK(15, 8)
+# define GEN8_WM_DS_BF_STENCIL_TEST_MASK_SHIFT          8
+# define GEN8_WM_DS_BF_STENCIL_WRITE_MASK_MASK          INTEL_MASK(7, 0)
+# define GEN8_WM_DS_BF_STENCIL_WRITE_MASK_SHIFT         0
+
+#define _3DSTATE_PS_EXTRA                       0x784F /* GEN8+ */
+/* DW1 */
+# define GEN8_PSX_PIXEL_SHADER_VALID                    (1 << 31)
+# define GEN8_PSX_PIXEL_SHADER_NO_RT_WRITE              (1 << 30)
+# define GEN8_PSX_OMASK_TO_RENDER_TARGET                (1 << 29)
+# define GEN8_PSX_KILL_ENABLE                           (1 << 28)
+# define GEN8_PSX_PSCDEPTH_OFF                          (0 << 26)
+# define GEN8_PSX_PSCDEPTH_ON                           (1 << 26)
+# define GEN8_PSX_PSCDEPTH_ON_GE                        (2 << 26)
+# define GEN8_PSX_PSCDEPTH_ON_LE                        (3 << 26)
+# define GEN8_PSX_FORCE_COMPUTED_DEPTH                  (1 << 25)
+# define GEN8_PSX_USES_SOURCE_DEPTH                     (1 << 24)
+# define GEN8_PSX_USES_SOURCE_W                         (1 << 23)
+# define GEN8_PSX_ATTRIBUTE_ENABLE                      (1 << 8)
+# define GEN8_PSX_SHADER_DISABLES_ALPHA_TO_COVERAGE     (1 << 7)
+# define GEN8_PSX_SHADER_IS_PER_SAMPLE                  (1 << 6)
+# define GEN8_PSX_SHADER_COMPUTES_STENCIL               (1 << 5)
+# define GEN8_PSX_SHADER_HAS_UAV                        (1 << 2)
+# define GEN8_PSX_SHADER_USES_INPUT_COVERAGE_MASK       (1 << 1)
+
+enum brw_wm_barycentric_interp_mode {
+   BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC		= 0,
+   BRW_WM_PERSPECTIVE_CENTROID_BARYCENTRIC	= 1,
+   BRW_WM_PERSPECTIVE_SAMPLE_BARYCENTRIC	= 2,
+   BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC	= 3,
+   BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC	= 4,
+   BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC	= 5,
+   BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT  = 6
+};
+#define BRW_WM_NONPERSPECTIVE_BARYCENTRIC_BITS \
+   ((1 << BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC) | \
+    (1 << BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC) | \
+    (1 << BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC))
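+/* For example (illustrative, with "modes" a hypothetical bitfield indexed by
+ * brw_wm_barycentric_interp_mode and enable_noperspective_setup() a
+ * hypothetical helper):
+ *
+ *    if (modes & BRW_WM_NONPERSPECTIVE_BARYCENTRIC_BITS)
+ *       enable_noperspective_setup();
+ */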
+
+#define _3DSTATE_WM				0x7814 /* GEN6+ */
+/* DW1: kernel pointer */
+/* DW2 */
+# define GEN6_WM_SPF_MODE				(1 << 31)
+# define GEN6_WM_VECTOR_MASK_ENABLE			(1 << 30)
+# define GEN6_WM_SAMPLER_COUNT_SHIFT			27
+# define GEN6_WM_BINDING_TABLE_ENTRY_COUNT_SHIFT	18
+# define GEN6_WM_FLOATING_POINT_MODE_IEEE_754		(0 << 16)
+# define GEN6_WM_FLOATING_POINT_MODE_ALT		(1 << 16)
+/* DW3: scratch space */
+/* DW4 */
+# define GEN6_WM_STATISTICS_ENABLE			(1 << 31)
+# define GEN6_WM_DEPTH_CLEAR				(1 << 30)
+# define GEN6_WM_DEPTH_RESOLVE				(1 << 28)
+# define GEN6_WM_HIERARCHICAL_DEPTH_RESOLVE		(1 << 27)
+# define GEN6_WM_DISPATCH_START_GRF_SHIFT_0		16
+# define GEN6_WM_DISPATCH_START_GRF_SHIFT_1		8
+# define GEN6_WM_DISPATCH_START_GRF_SHIFT_2		0
+/* DW5 */
+# define GEN6_WM_MAX_THREADS_SHIFT			25
+# define GEN6_WM_KILL_ENABLE				(1 << 22)
+# define GEN6_WM_COMPUTED_DEPTH				(1 << 21)
+# define GEN6_WM_USES_SOURCE_DEPTH			(1 << 20)
+# define GEN6_WM_DISPATCH_ENABLE			(1 << 19)
+# define GEN6_WM_LINE_END_CAP_AA_WIDTH_0_5		(0 << 16)
+# define GEN6_WM_LINE_END_CAP_AA_WIDTH_1_0		(1 << 16)
+# define GEN6_WM_LINE_END_CAP_AA_WIDTH_2_0		(2 << 16)
+# define GEN6_WM_LINE_END_CAP_AA_WIDTH_4_0		(3 << 16)
+# define GEN6_WM_LINE_AA_WIDTH_0_5			(0 << 14)
+# define GEN6_WM_LINE_AA_WIDTH_1_0			(1 << 14)
+# define GEN6_WM_LINE_AA_WIDTH_2_0			(2 << 14)
+# define GEN6_WM_LINE_AA_WIDTH_4_0			(3 << 14)
+# define GEN6_WM_POLYGON_STIPPLE_ENABLE			(1 << 13)
+# define GEN6_WM_LINE_STIPPLE_ENABLE			(1 << 11)
+# define GEN6_WM_OMASK_TO_RENDER_TARGET			(1 << 9)
+# define GEN6_WM_USES_SOURCE_W				(1 << 8)
+# define GEN6_WM_DUAL_SOURCE_BLEND_ENABLE		(1 << 7)
+# define GEN6_WM_32_DISPATCH_ENABLE			(1 << 2)
+# define GEN6_WM_16_DISPATCH_ENABLE			(1 << 1)
+# define GEN6_WM_8_DISPATCH_ENABLE			(1 << 0)
+/* DW6 */
+# define GEN6_WM_NUM_SF_OUTPUTS_SHIFT			20
+# define GEN6_WM_POSOFFSET_NONE				(0 << 18)
+# define GEN6_WM_POSOFFSET_CENTROID			(2 << 18)
+# define GEN6_WM_POSOFFSET_SAMPLE			(3 << 18)
+# define GEN6_WM_POSITION_ZW_PIXEL			(0 << 16)
+# define GEN6_WM_POSITION_ZW_CENTROID			(2 << 16)
+# define GEN6_WM_POSITION_ZW_SAMPLE			(3 << 16)
+# define GEN6_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC	(1 << 15)
+# define GEN6_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC	(1 << 14)
+# define GEN6_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC	(1 << 13)
+# define GEN6_WM_PERSPECTIVE_SAMPLE_BARYCENTRIC		(1 << 12)
+# define GEN6_WM_PERSPECTIVE_CENTROID_BARYCENTRIC	(1 << 11)
+# define GEN6_WM_PERSPECTIVE_PIXEL_BARYCENTRIC		(1 << 10)
+# define GEN6_WM_BARYCENTRIC_INTERPOLATION_MODE_SHIFT   10
+# define GEN6_WM_POINT_RASTRULE_UPPER_RIGHT		(1 << 9)
+# define GEN6_WM_MSRAST_OFF_PIXEL			(0 << 1)
+# define GEN6_WM_MSRAST_OFF_PATTERN			(1 << 1)
+# define GEN6_WM_MSRAST_ON_PIXEL			(2 << 1)
+# define GEN6_WM_MSRAST_ON_PATTERN			(3 << 1)
+# define GEN6_WM_MSDISPMODE_PERSAMPLE			(0 << 0)
+# define GEN6_WM_MSDISPMODE_PERPIXEL			(1 << 0)
+/* DW7: kernel 1 pointer */
+/* DW8: kernel 2 pointer */
+
+#define _3DSTATE_CONSTANT_VS		      0x7815 /* GEN6+ */
+#define _3DSTATE_CONSTANT_GS		      0x7816 /* GEN6+ */
+#define _3DSTATE_CONSTANT_PS		      0x7817 /* GEN6+ */
+# define GEN6_CONSTANT_BUFFER_3_ENABLE			(1 << 15)
+# define GEN6_CONSTANT_BUFFER_2_ENABLE			(1 << 14)
+# define GEN6_CONSTANT_BUFFER_1_ENABLE			(1 << 13)
+# define GEN6_CONSTANT_BUFFER_0_ENABLE			(1 << 12)
+
+#define _3DSTATE_CONSTANT_HS                  0x7819 /* GEN7+ */
+#define _3DSTATE_CONSTANT_DS                  0x781A /* GEN7+ */
+
+#define _3DSTATE_STREAMOUT                    0x781e /* GEN7+ */
+/* DW1 */
+# define SO_FUNCTION_ENABLE				(1 << 31)
+# define SO_RENDERING_DISABLE				(1 << 30)
+/* Selects which incoming rendering stream goes down the pipeline.  The
+ * rendering stream defaults to 0 unless overridden by special cases in the
+ * GS state.
+ */
+# define SO_RENDER_STREAM_SELECT_SHIFT			27
+# define SO_RENDER_STREAM_SELECT_MASK			INTEL_MASK(28, 27)
+/* Controls reordering of TRISTRIP_* elements in stream output (not rendering).
+ */
+# define SO_REORDER_TRAILING				(1 << 26)
+/* Controls SO_NUM_PRIMS_WRITTEN_* and SO_PRIM_STORAGE_* */
+# define SO_STATISTICS_ENABLE				(1 << 25)
+# define SO_BUFFER_ENABLE(n)				(1 << (8 + (n)))
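+/* E.g. SO_BUFFER_ENABLE(2) evaluates to (1 << 10). */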
+/* DW2 */
+# define SO_STREAM_3_VERTEX_READ_OFFSET_SHIFT		29
+# define SO_STREAM_3_VERTEX_READ_OFFSET_MASK		INTEL_MASK(29, 29)
+# define SO_STREAM_3_VERTEX_READ_LENGTH_SHIFT		24
+# define SO_STREAM_3_VERTEX_READ_LENGTH_MASK		INTEL_MASK(28, 24)
+# define SO_STREAM_2_VERTEX_READ_OFFSET_SHIFT		21
+# define SO_STREAM_2_VERTEX_READ_OFFSET_MASK		INTEL_MASK(21, 21)
+# define SO_STREAM_2_VERTEX_READ_LENGTH_SHIFT		16
+# define SO_STREAM_2_VERTEX_READ_LENGTH_MASK		INTEL_MASK(20, 16)
+# define SO_STREAM_1_VERTEX_READ_OFFSET_SHIFT		13
+# define SO_STREAM_1_VERTEX_READ_OFFSET_MASK		INTEL_MASK(13, 13)
+# define SO_STREAM_1_VERTEX_READ_LENGTH_SHIFT		8
+# define SO_STREAM_1_VERTEX_READ_LENGTH_MASK		INTEL_MASK(12, 8)
+# define SO_STREAM_0_VERTEX_READ_OFFSET_SHIFT		5
+# define SO_STREAM_0_VERTEX_READ_OFFSET_MASK		INTEL_MASK(5, 5)
+# define SO_STREAM_0_VERTEX_READ_LENGTH_SHIFT		0
+# define SO_STREAM_0_VERTEX_READ_LENGTH_MASK		INTEL_MASK(4, 0)
+
+/* 3DSTATE_WM for Gen7 */
+/* DW1 */
+# define GEN7_WM_STATISTICS_ENABLE			(1 << 31)
+# define GEN7_WM_DEPTH_CLEAR				(1 << 30)
+# define GEN7_WM_DISPATCH_ENABLE			(1 << 29)
+# define GEN7_WM_DEPTH_RESOLVE				(1 << 28)
+# define GEN7_WM_HIERARCHICAL_DEPTH_RESOLVE		(1 << 27)
+# define GEN7_WM_KILL_ENABLE				(1 << 25)
+# define GEN7_WM_PSCDEPTH_OFF			        (0 << 23)
+# define GEN7_WM_PSCDEPTH_ON			        (1 << 23)
+# define GEN7_WM_PSCDEPTH_ON_GE			        (2 << 23)
+# define GEN7_WM_PSCDEPTH_ON_LE			        (3 << 23)
+# define GEN7_WM_USES_SOURCE_DEPTH			(1 << 20)
+# define GEN7_WM_USES_SOURCE_W			        (1 << 19)
+# define GEN7_WM_POSITION_ZW_PIXEL			(0 << 17)
+# define GEN7_WM_POSITION_ZW_CENTROID			(2 << 17)
+# define GEN7_WM_POSITION_ZW_SAMPLE			(3 << 17)
+# define GEN7_WM_BARYCENTRIC_INTERPOLATION_MODE_SHIFT   11
+# define GEN7_WM_USES_INPUT_COVERAGE_MASK	        (1 << 10)
+# define GEN7_WM_LINE_END_CAP_AA_WIDTH_0_5		(0 << 8)
+# define GEN7_WM_LINE_END_CAP_AA_WIDTH_1_0		(1 << 8)
+# define GEN7_WM_LINE_END_CAP_AA_WIDTH_2_0		(2 << 8)
+# define GEN7_WM_LINE_END_CAP_AA_WIDTH_4_0		(3 << 8)
+# define GEN7_WM_LINE_AA_WIDTH_0_5			(0 << 6)
+# define GEN7_WM_LINE_AA_WIDTH_1_0			(1 << 6)
+# define GEN7_WM_LINE_AA_WIDTH_2_0			(2 << 6)
+# define GEN7_WM_LINE_AA_WIDTH_4_0			(3 << 6)
+# define GEN7_WM_POLYGON_STIPPLE_ENABLE			(1 << 4)
+# define GEN7_WM_LINE_STIPPLE_ENABLE			(1 << 3)
+# define GEN7_WM_POINT_RASTRULE_UPPER_RIGHT		(1 << 2)
+# define GEN7_WM_MSRAST_OFF_PIXEL			(0 << 0)
+# define GEN7_WM_MSRAST_OFF_PATTERN			(1 << 0)
+# define GEN7_WM_MSRAST_ON_PIXEL			(2 << 0)
+# define GEN7_WM_MSRAST_ON_PATTERN			(3 << 0)
+/* DW2 */
+# define GEN7_WM_MSDISPMODE_PERSAMPLE			(0 << 31)
+# define GEN7_WM_MSDISPMODE_PERPIXEL			(1 << 31)
+
+#define _3DSTATE_PS				0x7820 /* GEN7+ */
+/* DW1: kernel pointer */
+/* DW2 */
+# define GEN7_PS_SPF_MODE				(1 << 31)
+# define GEN7_PS_VECTOR_MASK_ENABLE			(1 << 30)
+# define GEN7_PS_SAMPLER_COUNT_SHIFT			27
+# define GEN7_PS_BINDING_TABLE_ENTRY_COUNT_SHIFT	18
+# define GEN7_PS_FLOATING_POINT_MODE_IEEE_754		(0 << 16)
+# define GEN7_PS_FLOATING_POINT_MODE_ALT		(1 << 16)
+/* DW3: scratch space */
+/* DW4 */
+# define IVB_PS_MAX_THREADS_SHIFT			24
+# define HSW_PS_MAX_THREADS_SHIFT			23
+# define HSW_PS_SAMPLE_MASK_SHIFT		        12
+# define HSW_PS_SAMPLE_MASK_MASK			INTEL_MASK(19, 12)
+# define GEN7_PS_PUSH_CONSTANT_ENABLE		        (1 << 11)
+# define GEN7_PS_ATTRIBUTE_ENABLE		        (1 << 10)
+# define GEN7_PS_OMASK_TO_RENDER_TARGET			(1 << 9)
+# define GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE	(1 << 8)
+# define GEN7_PS_DUAL_SOURCE_BLEND_ENABLE		(1 << 7)
+# define GEN7_PS_RENDER_TARGET_RESOLVE_ENABLE		(1 << 6)
+# define GEN7_PS_POSOFFSET_NONE				(0 << 3)
+# define GEN7_PS_POSOFFSET_CENTROID			(2 << 3)
+# define GEN7_PS_POSOFFSET_SAMPLE			(3 << 3)
+# define GEN7_PS_32_DISPATCH_ENABLE			(1 << 2)
+# define GEN7_PS_16_DISPATCH_ENABLE			(1 << 1)
+# define GEN7_PS_8_DISPATCH_ENABLE			(1 << 0)
+/* DW5 */
+# define GEN7_PS_DISPATCH_START_GRF_SHIFT_0		16
+# define GEN7_PS_DISPATCH_START_GRF_SHIFT_1		8
+# define GEN7_PS_DISPATCH_START_GRF_SHIFT_2		0
+/* DW6: kernel 1 pointer */
+/* DW7: kernel 2 pointer */
+
+#define _3DSTATE_SAMPLE_MASK			0x7818 /* GEN6+ */
+
+#define _3DSTATE_DRAWING_RECTANGLE		0x7900
+#define _3DSTATE_BLEND_CONSTANT_COLOR		0x7901
+#define _3DSTATE_CHROMA_KEY			0x7904
+#define _3DSTATE_DEPTH_BUFFER			0x7905 /* GEN4-6 */
+#define _3DSTATE_POLY_STIPPLE_OFFSET		0x7906
+#define _3DSTATE_POLY_STIPPLE_PATTERN		0x7907
+#define _3DSTATE_LINE_STIPPLE_PATTERN		0x7908
+#define _3DSTATE_GLOBAL_DEPTH_OFFSET_CLAMP	0x7909
+#define _3DSTATE_AA_LINE_PARAMETERS		0x790a /* G45+ */
+
+#define _3DSTATE_GS_SVB_INDEX			0x790b /* CTG+ */
+/* DW1 */
+# define SVB_INDEX_SHIFT				29
+# define SVB_LOAD_INTERNAL_VERTEX_COUNT			(1 << 0) /* SNB+ */
+/* DW2: SVB index */
+/* DW3: SVB maximum index */
+
+#define _3DSTATE_MULTISAMPLE			0x790d /* GEN6+ */
+#define GEN8_3DSTATE_MULTISAMPLE		0x780d /* GEN8+ */
+/* DW1 */
+# define MS_PIXEL_LOCATION_CENTER			(0 << 4)
+# define MS_PIXEL_LOCATION_UPPER_LEFT			(1 << 4)
+# define MS_NUMSAMPLES_1				(0 << 1)
+# define MS_NUMSAMPLES_2				(1 << 1)
+# define MS_NUMSAMPLES_4				(2 << 1)
+# define MS_NUMSAMPLES_8				(3 << 1)
+# define MS_NUMSAMPLES_16				(4 << 1)
+
+#define _3DSTATE_SAMPLE_PATTERN                 0x791c
+
+#define _3DSTATE_STENCIL_BUFFER			0x790e /* ILK, SNB */
+#define _3DSTATE_HIER_DEPTH_BUFFER		0x790f /* ILK, SNB */
+
+#define GEN7_3DSTATE_CLEAR_PARAMS		0x7804
+#define GEN7_3DSTATE_DEPTH_BUFFER		0x7805
+#define GEN7_3DSTATE_STENCIL_BUFFER		0x7806
+# define HSW_STENCIL_ENABLED                            (1 << 31)
+#define GEN7_3DSTATE_HIER_DEPTH_BUFFER		0x7807
+
+#define _3DSTATE_CLEAR_PARAMS			0x7910 /* ILK, SNB */
+# define GEN5_DEPTH_CLEAR_VALID				(1 << 15)
+/* DW1: depth clear value */
+/* DW2 */
+# define GEN7_DEPTH_CLEAR_VALID				(1 << 0)
+
+#define _3DSTATE_SO_DECL_LIST			0x7917 /* GEN7+ */
+/* DW1 */
+# define SO_STREAM_TO_BUFFER_SELECTS_3_SHIFT		12
+# define SO_STREAM_TO_BUFFER_SELECTS_3_MASK		INTEL_MASK(15, 12)
+# define SO_STREAM_TO_BUFFER_SELECTS_2_SHIFT		8
+# define SO_STREAM_TO_BUFFER_SELECTS_2_MASK		INTEL_MASK(11, 8)
+# define SO_STREAM_TO_BUFFER_SELECTS_1_SHIFT		4
+# define SO_STREAM_TO_BUFFER_SELECTS_1_MASK		INTEL_MASK(7, 4)
+# define SO_STREAM_TO_BUFFER_SELECTS_0_SHIFT		0
+# define SO_STREAM_TO_BUFFER_SELECTS_0_MASK		INTEL_MASK(3, 0)
+/* DW2 */
+# define SO_NUM_ENTRIES_3_SHIFT				24
+# define SO_NUM_ENTRIES_3_MASK				INTEL_MASK(31, 24)
+# define SO_NUM_ENTRIES_2_SHIFT				16
+# define SO_NUM_ENTRIES_2_MASK				INTEL_MASK(23, 16)
+# define SO_NUM_ENTRIES_1_SHIFT				8
+# define SO_NUM_ENTRIES_1_MASK				INTEL_MASK(15, 8)
+# define SO_NUM_ENTRIES_0_SHIFT				0
+# define SO_NUM_ENTRIES_0_MASK				INTEL_MASK(7, 0)
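+/* Illustrative note (assumption about the INTEL_MASK helper defined
+ * earlier in this header): each field pairs an INTEL_MASK(high, low) bit
+ * range with a _SHIFT for its low bit, so a count n is packed as e.g.
+ * (n << SO_NUM_ENTRIES_1_SHIFT) & SO_NUM_ENTRIES_1_MASK, landing in
+ * bits 15..8 of the DWord.
+ */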
+
+/* SO_DECL DW0 */
+# define SO_DECL_OUTPUT_BUFFER_SLOT_SHIFT		12
+# define SO_DECL_OUTPUT_BUFFER_SLOT_MASK		INTEL_MASK(13, 12)
+# define SO_DECL_HOLE_FLAG				(1 << 11)
+# define SO_DECL_REGISTER_INDEX_SHIFT			4
+# define SO_DECL_REGISTER_INDEX_MASK			INTEL_MASK(9, 4)
+# define SO_DECL_COMPONENT_MASK_SHIFT			0
+# define SO_DECL_COMPONENT_MASK_MASK			INTEL_MASK(3, 0)
+
+#define _3DSTATE_SO_BUFFER                    0x7918 /* GEN7+ */
+/* DW1 */
+# define GEN8_SO_BUFFER_ENABLE                          (1 << 31)
+# define SO_BUFFER_INDEX_SHIFT				29
+# define SO_BUFFER_INDEX_MASK				INTEL_MASK(30, 29)
+# define GEN8_SO_BUFFER_OFFSET_WRITE_ENABLE             (1 << 21)
+# define GEN8_SO_BUFFER_OFFSET_ADDRESS_ENABLE           (1 << 20)
+# define SO_BUFFER_PITCH_SHIFT				0
+# define SO_BUFFER_PITCH_MASK				INTEL_MASK(11, 0)
+/* DW2: start address */
+/* DW3: end address. */
+
+#define CMD_MI_FLUSH                  0x0200
+
+# define BLT_X_SHIFT					0
+# define BLT_X_MASK					INTEL_MASK(15, 0)
+# define BLT_Y_SHIFT					16
+# define BLT_Y_MASK					INTEL_MASK(31, 16)
+
+#define GEN5_MI_REPORT_PERF_COUNT ((0x26 << 23) | (3 - 2))
+/* DW0 */
+# define GEN5_MI_COUNTER_SET_0      (0 << 6)
+# define GEN5_MI_COUNTER_SET_1      (1 << 6)
+/* DW1 */
+# define MI_COUNTER_ADDRESS_GTT     (1 << 0)
+/* DW2: a user-defined report ID (written to the buffer but can be anything) */
+
+#define GEN6_MI_REPORT_PERF_COUNT ((0x28 << 23) | (3 - 2))
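+/* Illustrative note (assuming the usual MI command encoding): the low
+ * bits of the two opcodes above hold the command length in DWords minus
+ * two, so (3 - 2) marks a three-DWord command.
+ */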
+
+/* Bitfields for the URB_WRITE message, DW2 of message header: */
+#define URB_WRITE_PRIM_END		0x1
+#define URB_WRITE_PRIM_START		0x2
+#define URB_WRITE_PRIM_TYPE_SHIFT	2
+
+
+/* Maximum number of entries that can be addressed using a binding table
+ * pointer of type SURFTYPE_BUFFER
+ */
+#define BRW_MAX_NUM_BUFFER_ENTRIES	(1 << 27)
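+/* i.e. 2^27 = 134,217,728 addressable entries. */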
+
+/* Memory Object Control State:
+ * Specifying zero for L3 means "uncached in L3", at least on Haswell
+ * and Baytrail, since there are no PTE flags for setting L3 cacheability.
+ * On Ivybridge, the PTEs do have a cache-in-L3 bit, so setting MOCS to 0
+ * may still respect that.
+ */
+#define GEN7_MOCS_L3                    1
+
+/* Ivybridge only: cache in LLC.
+ * Specifying zero here means to use the PTE values set by the kernel;
+ * non-zero overrides the PTE values.
+ */
+#define IVB_MOCS_LLC                    (1 << 1)
+
+/* Baytrail only: snoop in CPU cache */
+#define BYT_MOCS_SNOOP                  (1 << 1)
+
+/* Haswell only: LLC/eLLC controls (write-back or uncached).
+ * Specifying zero here means to use the PTE values set by the kernel,
+ * which is useful since it offers additional control (write-through
+ * caching and age).  Non-zero overrides the PTE values.
+ */
+#define HSW_MOCS_UC_LLC_UC_ELLC         (1 << 1)
+#define HSW_MOCS_WB_LLC_WB_ELLC         (2 << 1)
+#define HSW_MOCS_UC_LLC_WB_ELLC         (3 << 1)
+
+/* Broadwell: write-back or write-through; always use all the caches. */
+#define BDW_MOCS_WB 0x78
+#define BDW_MOCS_WT 0x58
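+
+/* Illustrative sketch (not from the original file): one way a driver might
+ * pick a default MOCS value from the macros above.  The helper name and
+ * the brw_device_info fields it reads are assumptions for the example;
+ * only the MOCS macros come from this header, and OR'ing the L3 bit with
+ * the per-platform bits is an assumption as well.
+ */
+#if 0
+static uint32_t
+example_default_mocs(const struct brw_device_info *devinfo)
+{
+   if (devinfo->gen >= 8)
+      return BDW_MOCS_WB;                     /* write-back, all caches */
+   if (devinfo->is_haswell)
+      return HSW_MOCS_WB_LLC_WB_ELLC;         /* write-back in LLC/eLLC */
+   if (devinfo->is_baytrail)
+      return BYT_MOCS_SNOOP | GEN7_MOCS_L3;   /* CPU-snooped, cached in L3 */
+   if (devinfo->is_ivybridge)
+      return IVB_MOCS_LLC | GEN7_MOCS_L3;     /* cached in LLC and L3 */
+   return GEN7_MOCS_L3;                       /* plain gen7: L3 only */
+}
+#endif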
+
+#include "intel_chipset.h"
+
+#endif
diff --git a/icd/intel/compiler/pipeline/brw_device_info.c b/icd/intel/compiler/pipeline/brw_device_info.c
new file mode 100644
index 0000000..f26e13c
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_device_info.c
@@ -0,0 +1,318 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "brw_device_info.h"
+
+static const struct brw_device_info brw_device_info_i965 = {
+   .gen = 4,
+   .has_negative_rhw_bug = true,
+   .needs_unlit_centroid_workaround = true,
+   .max_vs_threads = 16,
+   .max_gs_threads = 2,
+   .max_wm_threads = 8 * 4,
+   .urb = {
+      .size = 256,
+   },
+};
+
+static const struct brw_device_info brw_device_info_g4x = {
+   .gen = 4,
+   .has_pln = true,
+   .has_compr4 = true,
+   .has_surface_tile_offset = true,
+   .needs_unlit_centroid_workaround = true,
+   .is_g4x = true,
+   .max_vs_threads = 32,
+   .max_gs_threads = 2,
+   .max_wm_threads = 10 * 5,
+   .urb = {
+      .size = 384,
+   },
+};
+
+static const struct brw_device_info brw_device_info_ilk = {
+   .gen = 5,
+   .has_pln = true,
+   .has_compr4 = true,
+   .has_surface_tile_offset = true,
+   .needs_unlit_centroid_workaround = true,
+   .max_vs_threads = 72,
+   .max_gs_threads = 32,
+   .max_wm_threads = 12 * 6,
+   .urb = {
+      .size = 1024,
+   },
+};
+
+static const struct brw_device_info brw_device_info_snb_gt1 = {
+   .gen = 6,
+   .gt = 1,
+   .has_hiz_and_separate_stencil = true,
+   .has_llc = true,
+   .has_pln = true,
+   .has_surface_tile_offset = true,
+   .needs_unlit_centroid_workaround = true,
+   .max_vs_threads = 24,
+   .max_gs_threads = 21, /* conservative; 24 if rendering disabled. */
+   .max_wm_threads = 40,
+   .urb = {
+      .size = 32,
+      .min_vs_entries = 24,
+      .max_vs_entries = 256,
+      .max_gs_entries = 256,
+   },
+};
+
+static const struct brw_device_info brw_device_info_snb_gt2 = {
+   .gen = 6,
+   .gt = 2,
+   .has_hiz_and_separate_stencil = true,
+   .has_llc = true,
+   .has_pln = true,
+   .has_surface_tile_offset = true,
+   .needs_unlit_centroid_workaround = true,
+   .max_vs_threads = 60,
+   .max_gs_threads = 60,
+   .max_wm_threads = 80,
+   .urb = {
+      .size = 64,
+      .min_vs_entries = 24,
+      .max_vs_entries = 256,
+      .max_gs_entries = 256,
+   },
+};
+
+/* GEN7_FEATURES: the common gen7 fields are repeated inline in the structs below. */
+
+static const struct brw_device_info brw_device_info_ivb_gt1 = {
+   .gen = 7,
+   .has_hiz_and_separate_stencil = true,
+   .must_use_separate_stencil = true,
+   .has_llc = true,
+   .has_pln = true,
+   .has_surface_tile_offset = true,
+   .needs_unlit_centroid_workaround = true,
+   .is_ivybridge = true,
+   .gt = 1,
+   .max_vs_threads = 36,
+   .max_gs_threads = 36,
+   .max_wm_threads = 48,
+   .urb = {
+      .size = 128,
+      .min_vs_entries = 32,
+      .max_vs_entries = 512,
+      .max_gs_entries = 192,
+   },
+};
+
+static const struct brw_device_info brw_device_info_ivb_gt2 = {
+   .gen = 7,
+   .has_hiz_and_separate_stencil = true,
+   .must_use_separate_stencil = true,
+   .has_llc = true,
+   .has_pln = true,
+   .has_surface_tile_offset = true,
+   .needs_unlit_centroid_workaround = true,
+   .is_ivybridge = true,
+   .gt = 2,
+   .max_vs_threads = 128,
+   .max_gs_threads = 128,
+   .max_wm_threads = 172,
+   .urb = {
+      .size = 256,
+      .min_vs_entries = 32,
+      .max_vs_entries = 704,
+      .max_gs_entries = 320,
+   },
+};
+
+static const struct brw_device_info brw_device_info_byt = {
+   .gen = 7,
+   .has_hiz_and_separate_stencil = true,
+   .must_use_separate_stencil = true,
+   .has_llc = false,
+   .has_pln = true,
+   .has_surface_tile_offset = true,
+   .needs_unlit_centroid_workaround = true,
+   .is_baytrail = true,
+   .gt = 1,
+   .max_vs_threads = 36,
+   .max_gs_threads = 36,
+   .max_wm_threads = 48,
+   .urb = {
+      .size = 128,
+      .min_vs_entries = 32,
+      .max_vs_entries = 512,
+      .max_gs_entries = 192,
+   },
+};
+
+static const struct brw_device_info brw_device_info_hsw_gt1 = {
+   .gen = 7,
+   .has_hiz_and_separate_stencil = true,
+   .must_use_separate_stencil = true,
+   .has_llc = true,
+   .has_pln = true,
+   .has_surface_tile_offset = true,
+   .needs_unlit_centroid_workaround = true,
+   .is_haswell = true,
+   .gt = 1,
+   .max_vs_threads = 70,
+   .max_gs_threads = 70,
+   .max_wm_threads = 102,
+   .urb = {
+      .size = 128,
+      .min_vs_entries = 32,
+      .max_vs_entries = 640,
+      .max_gs_entries = 256,
+   },
+};
+
+static const struct brw_device_info brw_device_info_hsw_gt2 = {
+   .gen = 7,
+   .has_hiz_and_separate_stencil = true,
+   .must_use_separate_stencil = true,
+   .has_llc = true,
+   .has_pln = true,
+   .has_surface_tile_offset = true,
+   .needs_unlit_centroid_workaround = true,
+   .is_haswell = true,
+   .gt = 2,
+   .max_vs_threads = 280,
+   .max_gs_threads = 256,
+   .max_wm_threads = 204,
+   .urb = {
+      .size = 256,
+      .min_vs_entries = 64,
+      .max_vs_entries = 1664,
+      .max_gs_entries = 640,
+   },
+};
+
+static const struct brw_device_info brw_device_info_hsw_gt3 = {
+   .gen = 7,
+   .has_hiz_and_separate_stencil = true,
+   .must_use_separate_stencil = true,
+   .has_llc = true,
+   .has_pln = true,
+   .has_surface_tile_offset = true,
+   .needs_unlit_centroid_workaround = true,
+   .is_haswell = true,
+   .gt = 3,
+   .max_vs_threads = 280,
+   .max_gs_threads = 256,
+   .max_wm_threads = 408,
+   .urb = {
+      .size = 512,
+      .min_vs_entries = 64,
+      .max_vs_entries = 1664,
+      .max_gs_entries = 640,
+   },
+};
+
+/* GEN8_FEATURES: the common gen8 fields are repeated inline in the structs below. */
+
+static const struct brw_device_info brw_device_info_bdw_gt1 = {
+   .gen = 8,
+   .has_hiz_and_separate_stencil = true,
+   .must_use_separate_stencil = true,
+   .has_llc = true,
+   .has_pln = true,
+   .max_vs_threads = 504,
+   .max_gs_threads = 504,
+   .max_wm_threads = 384,
+   .gt = 1,
+   .urb = {
+      .size = 192,
+      .min_vs_entries = 64,
+      .max_vs_entries = 2560,
+      .max_gs_entries = 960,
+   },
+};
+
+static const struct brw_device_info brw_device_info_bdw_gt2 = {
+   .gen = 8,
+   .has_hiz_and_separate_stencil = true,
+   .must_use_separate_stencil = true,
+   .has_llc = true,
+   .has_pln = true,
+   .max_vs_threads = 504,
+   .max_gs_threads = 504,
+   .max_wm_threads = 384,
+   .gt = 2,
+   .urb = {
+      .size = 384,
+      .min_vs_entries = 64,
+      .max_vs_entries = 2560,
+      .max_gs_entries = 960,
+   },
+};
+
+static const struct brw_device_info brw_device_info_bdw_gt3 = {
+   .gen = 8,
+   .has_hiz_and_separate_stencil = true,
+   .must_use_separate_stencil = true,
+   .has_llc = true,
+   .has_pln = true,
+   .max_vs_threads = 504,
+   .max_gs_threads = 504,
+   .max_wm_threads = 384,
+   .gt = 3,
+   .urb = {
+      .size = 384,
+      .min_vs_entries = 64,
+      .max_vs_entries = 2560,
+      .max_gs_entries = 960,
+   },
+};
+
+/* Thread counts and URB limits are placeholders, and may not be accurate.
+ * These were copied from Haswell GT1, above.
+ */
+static const struct brw_device_info brw_device_info_chv = {
+   .gen = 8,
+   .has_hiz_and_separate_stencil = true,
+   .must_use_separate_stencil = true,
+   .has_pln = true,
+   .is_cherryview = 1,
+   .gt = 1,
+   .has_llc = false,
+   .max_vs_threads = 70,
+   .max_gs_threads = 70,
+   .max_wm_threads = 102,
+   .urb = {
+      .size = 128,
+      .min_vs_entries = 64,
+      .max_vs_entries = 640,
+      .max_gs_entries = 256,
+   },
+};
+
+const struct brw_device_info *
+brw_get_device_info(int devid)
+{
+   switch (devid) {
+#undef CHIPSET
+#define CHIPSET(id, family, name) case id: return &brw_device_info_##family;
+#include "pci_ids/i965_pci_ids.h"
+   default:
+      fprintf(stderr, "i965_dri.so does not support the 0x%x PCI ID.\n", devid);
+      return NULL;
+   }
+}
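+
+/* Note on the switch above (illustrative, not from the original file):
+ * pci_ids/i965_pci_ids.h is an X-macro list of CHIPSET(id, family, name)
+ * entries, so redefining CHIPSET before including it expands every entry
+ * into a case label.  A hypothetical two-entry list
+ *
+ *    CHIPSET(0x0152, ivb_gt1, "Intel(R) Ivy Bridge GT1")
+ *    CHIPSET(0x0162, ivb_gt2, "Intel(R) Ivy Bridge GT2")
+ *
+ * would expand to
+ *
+ *    case 0x0152: return &brw_device_info_ivb_gt1;
+ *    case 0x0162: return &brw_device_info_ivb_gt2;
+ */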
diff --git a/icd/intel/compiler/pipeline/brw_device_info.h b/icd/intel/compiler/pipeline/brw_device_info.h
new file mode 100644
index 0000000..e506beb
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_device_info.h
@@ -0,0 +1,81 @@
+ /*
+  * Copyright © 2013 Intel Corporation
+  *
+  * Permission is hereby granted, free of charge, to any person obtaining a
+  * copy of this software and associated documentation files (the "Software"),
+  * to deal in the Software without restriction, including without limitation
+  * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+  * and/or sell copies of the Software, and to permit persons to whom the
+  * Software is furnished to do so, subject to the following conditions:
+  *
+  * The above copyright notice and this permission notice (including the next
+  * paragraph) shall be included in all copies or substantial portions of the
+  * Software.
+  *
+  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+  * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+  * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+  * IN THE SOFTWARE.
+  *
+  */
+
+#pragma once
+#include <stdbool.h>
+
+struct brw_device_info
+{
+   int gen; /**< Generation number: 4, 5, 6, 7, ... */
+   int gt;
+
+   bool is_g4x;
+   bool is_ivybridge;
+   bool is_baytrail;
+   bool is_haswell;
+   bool is_cherryview;
+
+   bool has_hiz_and_separate_stencil;
+   bool must_use_separate_stencil;
+
+   bool has_llc;
+
+   bool has_pln;
+   bool has_compr4;
+   bool has_surface_tile_offset;
+
+   /**
+    * Quirks:
+    *  @{
+    */
+   bool has_negative_rhw_bug;
+
+   /**
+    * Some versions of Gen hardware don't do centroid interpolation correctly
+    * on unlit pixels, causing incorrect values for derivatives near triangle
+    * edges.  Enabling this flag causes the fragment shader to use
+    * non-centroid interpolation for unlit pixels, at the expense of two extra
+    * fragment shader instructions.
+    */
+   bool needs_unlit_centroid_workaround;
+   /** @} */
+
+   /**
+    * GPU Limits:
+    *  @{
+    */
+   unsigned max_vs_threads;
+   unsigned max_gs_threads;
+   unsigned max_wm_threads;
+
+   struct {
+      unsigned size;
+      unsigned min_vs_entries;
+      unsigned max_vs_entries;
+      unsigned max_gs_entries;
+   } urb;
+   /** @} */
+};
+
+const struct brw_device_info *brw_get_device_info(int devid);
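+
+/* Usage sketch (illustrative, not part of the original header): callers
+ * look the table up by PCI device ID and must handle NULL for unknown
+ * devices, e.g.
+ *
+ *    const struct brw_device_info *devinfo = brw_get_device_info(pci_id);
+ *    if (devinfo == NULL)
+ *       return false;   // unsupported chipset
+ */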
diff --git a/icd/intel/compiler/pipeline/brw_disasm.c b/icd/intel/compiler/pipeline/brw_disasm.c
new file mode 100644
index 0000000..e54172c
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_disasm.c
@@ -0,0 +1,1466 @@
+/*
+ * Copyright © 2008 Keith Packard
+ *
+ * Permission to use, copy, modify, distribute, and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that copyright
+ * notice and this permission notice appear in supporting documentation, and
+ * that the name of the copyright holders not be used in advertising or
+ * publicity pertaining to distribution of the software without specific,
+ * written prior permission.  The copyright holders make no representations
+ * about the suitability of this software for any purpose.  It is provided "as
+ * is" without express or implied warranty.
+ *
+ * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
+ * INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO
+ * EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR
+ * CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
+ * DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
+ * TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
+ * OF THIS SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <getopt.h>
+#include <unistd.h>
+#include <stdarg.h>
+
+#include "main/mtypes.h"
+
+#include "brw_context.h"
+#include "brw_defines.h"
+
+const struct opcode_desc opcode_descs[128] = {
+    [BRW_OPCODE_MOV] = { .name = "mov", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_FRC] = { .name = "frc", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_RNDU] = { .name = "rndu", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_RNDD] = { .name = "rndd", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_RNDE] = { .name = "rnde", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_RNDZ] = { .name = "rndz", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_NOT] = { .name = "not", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_LZD] = { .name = "lzd", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_F32TO16] = { .name = "f32to16", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_F16TO32] = { .name = "f16to32", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_BFREV] = { .name = "bfrev", .nsrc = 1, .ndst = 1},
+    [BRW_OPCODE_FBH] = { .name = "fbh", .nsrc = 1, .ndst = 1},
+    [BRW_OPCODE_FBL] = { .name = "fbl", .nsrc = 1, .ndst = 1},
+    [BRW_OPCODE_CBIT] = { .name = "cbit", .nsrc = 1, .ndst = 1},
+
+    [BRW_OPCODE_MUL] = { .name = "mul", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_MAC] = { .name = "mac", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_MACH] = { .name = "mach", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_LINE] = { .name = "line", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_PLN] = { .name = "pln", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_MAD] = { .name = "mad", .nsrc = 3, .ndst = 1 },
+    [BRW_OPCODE_LRP] = { .name = "lrp", .nsrc = 3, .ndst = 1 },
+    [BRW_OPCODE_SAD2] = { .name = "sad2", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_SADA2] = { .name = "sada2", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_DP4] = { .name = "dp4", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_DPH] = { .name = "dph", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_DP3] = { .name = "dp3", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_DP2] = { .name = "dp2", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_MATH] = { .name = "math", .nsrc = 2, .ndst = 1 },
+
+    [BRW_OPCODE_AVG] = { .name = "avg", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_ADD] = { .name = "add", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_SEL] = { .name = "sel", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_AND] = { .name = "and", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_OR] = { .name = "or", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_XOR] = { .name = "xor", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_SHR] = { .name = "shr", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_SHL] = { .name = "shl", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_ASR] = { .name = "asr", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_CMP] = { .name = "cmp", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_CMPN] = { .name = "cmpn", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_BFE] = { .name = "bfe", .nsrc = 3, .ndst = 1},
+    [BRW_OPCODE_BFI1] = { .name = "bfi1", .nsrc = 2, .ndst = 1},
+    [BRW_OPCODE_BFI2] = { .name = "bfi2", .nsrc = 3, .ndst = 1},
+    [BRW_OPCODE_ADDC] = { .name = "addc", .nsrc = 2, .ndst = 1},
+    [BRW_OPCODE_SUBB] = { .name = "subb", .nsrc = 2, .ndst = 1},
+
+    [BRW_OPCODE_SEND] = { .name = "send", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_SENDC] = { .name = "sendc", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_NOP] = { .name = "nop", .nsrc = 0, .ndst = 0 },
+    [BRW_OPCODE_JMPI] = { .name = "jmpi", .nsrc = 0, .ndst = 0 },
+    [BRW_OPCODE_IF] = { .name = "if", .nsrc = 2, .ndst = 0 },
+    [BRW_OPCODE_IFF] = { .name = "iff", .nsrc = 2, .ndst = 1 },
+    [BRW_OPCODE_WHILE] = { .name = "while", .nsrc = 2, .ndst = 0 },
+    [BRW_OPCODE_ELSE] = { .name = "else", .nsrc = 2, .ndst = 0 },
+    [BRW_OPCODE_BREAK] = { .name = "break", .nsrc = 2, .ndst = 0 },
+    [BRW_OPCODE_CONTINUE] = { .name = "cont", .nsrc = 1, .ndst = 0 },
+    [BRW_OPCODE_HALT] = { .name = "halt", .nsrc = 1, .ndst = 0 },
+    [BRW_OPCODE_MSAVE] = { .name = "msave", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_PUSH] = { .name = "push", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_MRESTORE] = { .name = "mrest", .nsrc = 1, .ndst = 1 },
+    [BRW_OPCODE_POP] = { .name = "pop", .nsrc = 2, .ndst = 0 },
+    [BRW_OPCODE_WAIT] = { .name = "wait", .nsrc = 1, .ndst = 0 },
+    [BRW_OPCODE_DO] = { .name = "do", .nsrc = 0, .ndst = 0 },
+    [BRW_OPCODE_ENDIF] = { .name = "endif", .nsrc = 2, .ndst = 0 },
+};
+static const struct opcode_desc *opcode = opcode_descs;
+
+const char * const conditional_modifier[16] = {
+    [BRW_CONDITIONAL_NONE] = "",
+    [BRW_CONDITIONAL_Z] = ".e",
+    [BRW_CONDITIONAL_NZ] = ".ne",
+    [BRW_CONDITIONAL_G] = ".g",
+    [BRW_CONDITIONAL_GE] = ".ge",
+    [BRW_CONDITIONAL_L] = ".l",
+    [BRW_CONDITIONAL_LE] = ".le",
+    [BRW_CONDITIONAL_R] = ".r",
+    [BRW_CONDITIONAL_O] = ".o",
+    [BRW_CONDITIONAL_U] = ".u",
+};
+
+static const char * const negate[2] = {
+    [0] = "",
+    [1] = "-",
+};
+
+static const char * const _abs[2] = {
+    [0] = "",
+    [1] = "(abs)",
+};
+
+static const char * const vert_stride[16] = {
+    [0] = "0",
+    [1] = "1",
+    [2] = "2",
+    [3] = "4",
+    [4] = "8",
+    [5] = "16",
+    [6] = "32",
+    [15] = "VxH",
+};
+
+static const char * const width[8] = {
+    [0] = "1",
+    [1] = "2",
+    [2] = "4",
+    [3] = "8",
+    [4] = "16",
+};
+
+static const char * const horiz_stride[4] = {
+    [0] = "0",
+    [1] = "1",
+    [2] = "2",
+    [3] = "4"
+};
+
+static const char * const chan_sel[4] = {
+    [0] = "x",
+    [1] = "y",
+    [2] = "z",
+    [3] = "w",
+};
+
+static const char * const debug_ctrl[2] = {
+    [0] = "",
+    [1] = ".breakpoint"
+};
+
+static const char * const saturate[2] = {
+    [0] = "",
+    [1] = ".sat"
+};
+
+static const char * const accwr[2] = {
+    [0] = "",
+    [1] = "AccWrEnable"
+};
+
+static const char * const wectrl[2] = {
+    [0] = "WE_normal",
+    [1] = "WE_all"
+};
+
+static const char * const exec_size[8] = {
+    [0] = "1",
+    [1] = "2",
+    [2] = "4",
+    [3] = "8",
+    [4] = "16",
+    [5] = "32"
+};
+
+static const char * const pred_inv[2] = {
+    [0] = "+",
+    [1] = "-"
+};
+
+static const char * const pred_ctrl_align16[16] = {
+    [1] = "",
+    [2] = ".x",
+    [3] = ".y",
+    [4] = ".z",
+    [5] = ".w",
+    [6] = ".any4h",
+    [7] = ".all4h",
+};
+
+static const char * const pred_ctrl_align1[16] = {
+    [1] = "",
+    [2] = ".anyv",
+    [3] = ".allv",
+    [4] = ".any2h",
+    [5] = ".all2h",
+    [6] = ".any4h",
+    [7] = ".all4h",
+    [8] = ".any8h",
+    [9] = ".all8h",
+    [10] = ".any16h",
+    [11] = ".all16h",
+};
+
+static const char * const thread_ctrl[4] = {
+    [0] = "",
+    [2] = "switch"
+};
+
+static const char * const compr_ctrl[4] = {
+    [0] = "",
+    [1] = "sechalf",
+    [2] = "compr",
+    [3] = "compr4",
+};
+
+static const char * const dep_ctrl[4] = {
+    [0] = "",
+    [1] = "NoDDClr",
+    [2] = "NoDDChk",
+    [3] = "NoDDClr,NoDDChk",
+};
+
+static const char * const mask_ctrl[4] = {
+    [0] = "",
+    [1] = "nomask",
+};
+
+static const char * const access_mode[2] = {
+    [0] = "align1",
+    [1] = "align16",
+};
+
+static const char * const reg_encoding[8] = {
+    [0] = "UD",
+    [1] = "D",
+    [2] = "UW",
+    [3] = "W",
+    [4] = "UB",
+    [5] = "B",
+    [7] = "F"
+};
+
+const char * const three_source_reg_encoding[] = {
+   [BRW_3SRC_TYPE_F]  = "F",
+   [BRW_3SRC_TYPE_D]  = "D",
+   [BRW_3SRC_TYPE_UD] = "UD",
+};
+
+const int reg_type_size[8] = {
+    [0] = 4,
+    [1] = 4,
+    [2] = 2,
+    [3] = 2,
+    [4] = 1,
+    [5] = 1,
+    [7] = 4
+};
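+/* Illustrative note: subregister numbers in the instruction encoding are
+ * byte offsets; the disassembler divides by reg_type_size[] so they print
+ * as elements of the register type (e.g. byte offset 8 in an F register
+ * prints as ".2").
+ */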
+
+static const char * const reg_file[4] = {
+    [0] = "A",
+    [1] = "g",
+    [2] = "m",
+    [3] = "imm",
+};
+
+static const char * const writemask[16] = {
+    [0x0] = ".",
+    [0x1] = ".x",
+    [0x2] = ".y",
+    [0x3] = ".xy",
+    [0x4] = ".z",
+    [0x5] = ".xz",
+    [0x6] = ".yz",
+    [0x7] = ".xyz",
+    [0x8] = ".w",
+    [0x9] = ".xw",
+    [0xa] = ".yw",
+    [0xb] = ".xyw",
+    [0xc] = ".zw",
+    [0xd] = ".xzw",
+    [0xe] = ".yzw",
+    [0xf] = "",
+};
+
+static const char * const end_of_thread[2] = {
+    [0] = "",
+    [1] = "EOT"
+};
+
+static const char * const target_function[16] = {
+    [BRW_SFID_NULL] = "null",
+    [BRW_SFID_MATH] = "math",
+    [BRW_SFID_SAMPLER] = "sampler",
+    [BRW_SFID_MESSAGE_GATEWAY] = "gateway",
+    [BRW_SFID_DATAPORT_READ] = "read",
+    [BRW_SFID_DATAPORT_WRITE] = "write",
+    [BRW_SFID_URB] = "urb",
+    [BRW_SFID_THREAD_SPAWNER] = "thread_spawner",
+    [BRW_SFID_VME] = "vme",
+};
+
+static const char * const target_function_gen6[16] = {
+    [BRW_SFID_NULL] = "null",
+    [BRW_SFID_MATH] = "math",
+    [BRW_SFID_SAMPLER] = "sampler",
+    [BRW_SFID_MESSAGE_GATEWAY] = "gateway",
+    [BRW_SFID_URB] = "urb",
+    [BRW_SFID_THREAD_SPAWNER] = "thread_spawner",
+    [GEN6_SFID_DATAPORT_SAMPLER_CACHE] = "sampler",
+    [GEN6_SFID_DATAPORT_RENDER_CACHE] = "render",
+    [GEN6_SFID_DATAPORT_CONSTANT_CACHE] = "const",
+    [GEN7_SFID_DATAPORT_DATA_CACHE] = "data",
+    [GEN7_SFID_PIXEL_INTERPOLATOR] = "pixel interp",
+    [HSW_SFID_DATAPORT_DATA_CACHE_1] = "dp data 1",
+    [HSW_SFID_CRE] = "cre",
+};
+
+static const char * const dp_rc_msg_type_gen6[16] = {
+    [BRW_DATAPORT_READ_MESSAGE_OWORD_BLOCK_READ] = "OWORD block read",
+    [GEN6_DATAPORT_READ_MESSAGE_RENDER_UNORM_READ] = "RT UNORM read",
+    [GEN6_DATAPORT_READ_MESSAGE_OWORD_DUAL_BLOCK_READ] = "OWORD dual block read",
+    [GEN6_DATAPORT_READ_MESSAGE_MEDIA_BLOCK_READ] = "media block read",
+    [GEN6_DATAPORT_READ_MESSAGE_OWORD_UNALIGN_BLOCK_READ] = "OWORD unaligned block read",
+    [GEN6_DATAPORT_READ_MESSAGE_DWORD_SCATTERED_READ] = "DWORD scattered read",
+    [GEN6_DATAPORT_WRITE_MESSAGE_DWORD_ATOMIC_WRITE] = "DWORD atomic write",
+    [GEN6_DATAPORT_WRITE_MESSAGE_OWORD_BLOCK_WRITE] = "OWORD block write",
+    [GEN6_DATAPORT_WRITE_MESSAGE_OWORD_DUAL_BLOCK_WRITE] = "OWORD dual block write",
+    [GEN6_DATAPORT_WRITE_MESSAGE_MEDIA_BLOCK_WRITE] = "media block write",
+    [GEN6_DATAPORT_WRITE_MESSAGE_DWORD_SCATTERED_WRITE] = "DWORD scattered write",
+    [GEN6_DATAPORT_WRITE_MESSAGE_RENDER_TARGET_WRITE] = "RT write",
+    [GEN6_DATAPORT_WRITE_MESSAGE_STREAMED_VB_WRITE] = "streamed VB write",
+    [GEN6_DATAPORT_WRITE_MESSAGE_RENDER_TARGET_UNORM_WRITE] = "RT UNORM write",
+};
+
+static const char *const dp_dc0_msg_type_gen7[16] = {
+    [GEN7_DATAPORT_DC_OWORD_BLOCK_READ] = "DC OWORD block read",
+    [GEN7_DATAPORT_DC_UNALIGNED_OWORD_BLOCK_READ] = "DC unaligned OWORD block read",
+    [GEN7_DATAPORT_DC_OWORD_DUAL_BLOCK_READ] = "DC OWORD dual block read",
+    [GEN7_DATAPORT_DC_DWORD_SCATTERED_READ] = "DC DWORD scattered read",
+    [GEN7_DATAPORT_DC_BYTE_SCATTERED_READ] = "DC byte scattered read",
+    [GEN7_DATAPORT_DC_UNTYPED_ATOMIC_OP] = "DC untyped atomic",
+    [GEN7_DATAPORT_DC_MEMORY_FENCE] = "DC mfence",
+    [GEN7_DATAPORT_DC_OWORD_BLOCK_WRITE] = "DC OWORD block write",
+    [GEN7_DATAPORT_DC_OWORD_DUAL_BLOCK_WRITE] = "DC OWORD dual block write",
+    [GEN7_DATAPORT_DC_DWORD_SCATTERED_WRITE] = "DC DWORD scattered write",
+    [GEN7_DATAPORT_DC_BYTE_SCATTERED_WRITE] = "DC byte scattered write",
+    [GEN7_DATAPORT_DC_UNTYPED_SURFACE_WRITE] = "DC untyped surface write",
+};
+
+static const char *const dp_dc1_msg_type_hsw[16] = {
+    [HSW_DATAPORT_DC_PORT1_UNTYPED_SURFACE_READ] = "untyped surface read",
+    [HSW_DATAPORT_DC_PORT1_UNTYPED_ATOMIC_OP] = "DC untyped atomic op",
+    [HSW_DATAPORT_DC_PORT1_UNTYPED_ATOMIC_OP_SIMD4X2] = "DC untyped 4x2 atomic op",
+    [HSW_DATAPORT_DC_PORT1_MEDIA_BLOCK_READ] = "DC media block read",
+    [HSW_DATAPORT_DC_PORT1_TYPED_SURFACE_READ] = "DC typed surface read",
+    [HSW_DATAPORT_DC_PORT1_TYPED_ATOMIC_OP] = "DC typed atomic",
+    [HSW_DATAPORT_DC_PORT1_TYPED_ATOMIC_OP_SIMD4X2] = "DC typed 4x2 atomic op",
+    [HSW_DATAPORT_DC_PORT1_UNTYPED_SURFACE_WRITE] = "DC untyped surface write",
+    [HSW_DATAPORT_DC_PORT1_MEDIA_BLOCK_WRITE] = "DC media block write",
+    [HSW_DATAPORT_DC_PORT1_ATOMIC_COUNTER_OP] = "DC atomic counter op",
+    [HSW_DATAPORT_DC_PORT1_ATOMIC_COUNTER_OP_SIMD4X2] = "DC 4x2 atomic counter op",
+    [HSW_DATAPORT_DC_PORT1_TYPED_SURFACE_WRITE] = "DC typed surface write",
+};
+
+static const char * const aop[16] = {
+   [BRW_AOP_AND] = "and",
+   [BRW_AOP_OR] = "or",
+   [BRW_AOP_XOR] = "xor",
+   [BRW_AOP_MOV] = "mov",
+   [BRW_AOP_INC] = "inc",
+   [BRW_AOP_DEC] = "dec",
+   [BRW_AOP_ADD] = "add",
+   [BRW_AOP_SUB] = "sub",
+   [BRW_AOP_REVSUB] = "revsub",
+   [BRW_AOP_IMAX] = "imax",
+   [BRW_AOP_IMIN] = "imin",
+   [BRW_AOP_UMAX] = "umax",
+   [BRW_AOP_UMIN] = "umin",
+   [BRW_AOP_CMPWR] = "cmpwr",
+   [BRW_AOP_PREDEC] = "predec",
+};
+
+static const char * const math_function[16] = {
+    [BRW_MATH_FUNCTION_INV] = "inv",
+    [BRW_MATH_FUNCTION_LOG] = "log",
+    [BRW_MATH_FUNCTION_EXP] = "exp",
+    [BRW_MATH_FUNCTION_SQRT] = "sqrt",
+    [BRW_MATH_FUNCTION_RSQ] = "rsq",
+    [BRW_MATH_FUNCTION_SIN] = "sin",
+    [BRW_MATH_FUNCTION_COS] = "cos",
+    [BRW_MATH_FUNCTION_SINCOS] = "sincos",
+    [BRW_MATH_FUNCTION_FDIV] = "fdiv",
+    [BRW_MATH_FUNCTION_POW] = "pow",
+    [BRW_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER] = "intdivmod",
+    [BRW_MATH_FUNCTION_INT_DIV_QUOTIENT] = "intdiv",
+    [BRW_MATH_FUNCTION_INT_DIV_REMAINDER] = "intmod",
+};
+
+static const char * const math_saturate[2] = {
+    [0] = "",
+    [1] = "sat"
+};
+
+static const char * const math_signed[2] = {
+    [0] = "",
+    [1] = "signed"
+};
+
+static const char * const math_scalar[2] = {
+    [0] = "",
+    [1] = "scalar"
+};
+
+static const char * const math_precision[2] = {
+    [0] = "",
+    [1] = "partial_precision"
+};
+
+static const char * const urb_opcode[2] = {
+    [0] = "urb_write",
+    [1] = "ff_sync",
+};
+
+static const char * const urb_swizzle[4] = {
+    [BRW_URB_SWIZZLE_NONE] = "",
+    [BRW_URB_SWIZZLE_INTERLEAVE] = "interleave",
+    [BRW_URB_SWIZZLE_TRANSPOSE] = "transpose",
+};
+
+static const char * const urb_allocate[2] = {
+    [0] = "",
+    [1] = "allocate"
+};
+
+static const char * const urb_used[2] = {
+    [0] = "",
+    [1] = "used"
+};
+
+static const char * const urb_complete[2] = {
+    [0] = "",
+    [1] = "complete"
+};
+
+static const char * const sampler_target_format[4] = {
+    [0] = "F",
+    [2] = "UD",
+    [3] = "D"
+};
+
+
+static int column;
+
+static int string (FILE *file, const char *string)
+{
+    fputs (string, file);
+    column += strlen (string);
+    return 0;
+}
+
+static int format (FILE *f, const char *format, ...)
+{
+    char    buf[1024];
+    va_list	args;
+    va_start (args, format);
+
+    vsnprintf (buf, sizeof (buf) - 1, format, args);
+    va_end (args);
+    string (f, buf);
+    return 0;
+}
+
+static int newline (FILE *f)
+{
+    putc ('\n', f);
+    column = 0;
+    return 0;
+}
+
+static int pad (FILE *f, int c)
+{
+    do
+	string (f, " ");
+    while (column < c);
+    return 0;
+}
+
+static int control (FILE *file, const char *name, const char * const ctrl[],
+                    unsigned id, int *space)
+{
+    if (!ctrl[id]) {
+	fprintf (file, "*** invalid %s value %d ",
+		 name, id);
+	return 1;
+    }
+    if (ctrl[id][0])
+    {
+	if (space && *space)
+	    string (file, " ");
+	string (file, ctrl[id]);
+	if (space)
+	    *space = 1;
+    }
+    return 0;
+}
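+
+/* Illustrative examples for the helper above: with the saturate[] table,
+ *    control (file, "saturate", saturate, 1, NULL);   prints ".sat"
+ *    control (file, "saturate", saturate, 0, NULL);   prints nothing
+ * and an id with no table entry prints an "*** invalid ..." marker and
+ * returns 1.
+ */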
+
+static int print_opcode (FILE *file, int id)
+{
+    if (!opcode[id].name) {
+	format (file, "*** invalid opcode value %d ", id);
+	return 1;
+    }
+    string (file, opcode[id].name);
+    return 0;
+}
+
+static int reg (FILE *file, unsigned _reg_file, unsigned _reg_nr)
+{
+    int	err = 0;
+
+    /* Clear the Compr4 instruction compression bit. */
+    if (_reg_file == BRW_MESSAGE_REGISTER_FILE)
+       _reg_nr &= ~(1 << 7);
+
+    if (_reg_file == BRW_ARCHITECTURE_REGISTER_FILE) {
+	switch (_reg_nr & 0xf0) {
+	case BRW_ARF_NULL:
+	    string (file, "null");
+	    return -1;
+	case BRW_ARF_ADDRESS:
+	    format (file, "a%d", _reg_nr & 0x0f);
+	    break;
+	case BRW_ARF_ACCUMULATOR:
+	    format (file, "acc%d", _reg_nr & 0x0f);
+	    break;
+	case BRW_ARF_FLAG:
+	    format (file, "f%d", _reg_nr & 0x0f);
+	    break;
+	case BRW_ARF_MASK:
+	    format (file, "mask%d", _reg_nr & 0x0f);
+	    break;
+	case BRW_ARF_MASK_STACK:
+	    format (file, "msd%d", _reg_nr & 0x0f);
+	    break;
+	case BRW_ARF_STATE:
+	    format (file, "sr%d", _reg_nr & 0x0f);
+	    break;
+	case BRW_ARF_CONTROL:
+	    format (file, "cr%d", _reg_nr & 0x0f);
+	    break;
+	case BRW_ARF_NOTIFICATION_COUNT:
+	    format (file, "n%d", _reg_nr & 0x0f);
+	    break;
+	case BRW_ARF_IP:
+	    string (file, "ip");
+	    return -1;
+	    break;
+	default:
+	    format (file, "ARF%d", _reg_nr);
+	    break;
+	}
+    } else {
+	err  |= control (file, "src reg file", reg_file, _reg_file, NULL);
+	format (file, "%d", _reg_nr);
+    }
+    return err;
+}
+
+static int dest (FILE *file, struct brw_instruction *inst)
+{
+    int	err = 0;
+
+    if (inst->header.access_mode == BRW_ALIGN_1)
+    {
+	if (inst->bits1.da1.dest_address_mode == BRW_ADDRESS_DIRECT)
+	{
+	    err |= reg (file, inst->bits1.da1.dest_reg_file, inst->bits1.da1.dest_reg_nr);
+	    if (err == -1)
+		return 0;
+	    if (inst->bits1.da1.dest_subreg_nr)
+		format (file, ".%d", inst->bits1.da1.dest_subreg_nr /
+				     reg_type_size[inst->bits1.da1.dest_reg_type]);
+	    string (file, "<");
+	    err |= control (file, "horiz stride", horiz_stride, inst->bits1.da1.dest_horiz_stride, NULL);
+	    string (file, ">");
+	    err |= control (file, "dest reg encoding", reg_encoding, inst->bits1.da1.dest_reg_type, NULL);
+	}
+	else
+	{
+	    string (file, "g[a0");
+	    if (inst->bits1.ia1.dest_subreg_nr)
+		format (file, ".%d", inst->bits1.ia1.dest_subreg_nr /
+					reg_type_size[inst->bits1.ia1.dest_reg_type]);
+	    if (inst->bits1.ia1.dest_indirect_offset)
+		format (file, " %d", inst->bits1.ia1.dest_indirect_offset);
+	    string (file, "]<");
+	    err |= control (file, "horiz stride", horiz_stride, inst->bits1.ia1.dest_horiz_stride, NULL);
+	    string (file, ">");
+	    err |= control (file, "dest reg encoding", reg_encoding, inst->bits1.ia1.dest_reg_type, NULL);
+	}
+    }
+    else
+    {
+	if (inst->bits1.da16.dest_address_mode == BRW_ADDRESS_DIRECT)
+	{
+	    err |= reg (file, inst->bits1.da16.dest_reg_file, inst->bits1.da16.dest_reg_nr);
+	    if (err == -1)
+		return 0;
+	    if (inst->bits1.da16.dest_subreg_nr)
+		format (file, ".%d", inst->bits1.da16.dest_subreg_nr /
+				     reg_type_size[inst->bits1.da16.dest_reg_type]);
+	    string (file, "<1>");
+	    err |= control (file, "writemask", writemask, inst->bits1.da16.dest_writemask, NULL);
+	    err |= control (file, "dest reg encoding", reg_encoding, inst->bits1.da16.dest_reg_type, NULL);
+	}
+	else
+	{
+	    err = 1;
+	    string (file, "Indirect align16 address mode not supported");
+	}
+    }
+
+    return 0;
+}
+
+static int dest_3src (FILE *file, struct brw_instruction *inst)
+{
+    int	err = 0;
+    uint32_t reg_file;
+
+    if (inst->bits1.da3src.dest_reg_file)
+       reg_file = BRW_MESSAGE_REGISTER_FILE;
+    else
+       reg_file = BRW_GENERAL_REGISTER_FILE;
+
+    err |= reg (file, reg_file, inst->bits1.da3src.dest_reg_nr);
+    if (err == -1)
+       return 0;
+    if (inst->bits1.da3src.dest_subreg_nr)
+       format (file, ".%d", inst->bits1.da3src.dest_subreg_nr);
+    string (file, "<1>");
+    err |= control (file, "writemask", writemask, inst->bits1.da3src.dest_writemask, NULL);
+    err |= control (file, "dest reg encoding", three_source_reg_encoding,
+                    inst->bits1.da3src.dst_type, NULL);
+
+    return 0;
+}
+
+static int src_align1_region (FILE *file,
+			      unsigned _vert_stride, unsigned _width, unsigned _horiz_stride)
+{
+    int err = 0;
+    string (file, "<");
+    err |= control (file, "vert stride", vert_stride, _vert_stride, NULL);
+    string (file, ",");
+    err |= control (file, "width", width, _width, NULL);
+    string (file, ",");
+    err |= control (file, "horiz_stride", horiz_stride, _horiz_stride, NULL);
+    string (file, ">");
+    return err;
+}
+
+static int src_da1 (FILE *file, unsigned type, unsigned _reg_file,
+		    unsigned _vert_stride, unsigned _width, unsigned _horiz_stride,
+		    unsigned reg_num, unsigned sub_reg_num, unsigned __abs, unsigned _negate)
+{
+    int err = 0;
+    err |= control (file, "negate", negate, _negate, NULL);
+    err |= control (file, "abs", _abs, __abs, NULL);
+
+    err |= reg (file, _reg_file, reg_num);
+    if (err == -1)
+	return 0;
+    if (sub_reg_num)
+	format (file, ".%d", sub_reg_num / reg_type_size[type]); /* use formal style like spec */
+    src_align1_region (file, _vert_stride, _width, _horiz_stride);
+    err |= control (file, "src reg encoding", reg_encoding, type, NULL);
+    return err;
+}
+
+static int src_ia1 (FILE *file,
+		    unsigned type,
+		    unsigned _reg_file,
+		    int _addr_imm,
+		    unsigned _addr_subreg_nr,
+		    unsigned _negate,
+		    unsigned __abs,
+		    unsigned _addr_mode,
+		    unsigned _horiz_stride,
+		    unsigned _width,
+		    unsigned _vert_stride)
+{
+    int err = 0;
+    err |= control (file, "negate", negate, _negate, NULL);
+    err |= control (file, "abs", _abs, __abs, NULL);
+
+    string (file, "g[a0");
+    if (_addr_subreg_nr)
+	format (file, ".%d", _addr_subreg_nr);
+    if (_addr_imm)
+	format (file, " %d", _addr_imm);
+    string (file, "]");
+    src_align1_region (file, _vert_stride, _width, _horiz_stride);
+    err |= control (file, "src reg encoding", reg_encoding, type, NULL);
+    return err;
+}
+
+static int src_da16 (FILE *file,
+		     unsigned _reg_type,
+		     unsigned _reg_file,
+		     unsigned _vert_stride,
+		     unsigned _reg_nr,
+		     unsigned _subreg_nr,
+		     unsigned __abs,
+		     unsigned _negate,
+		     unsigned swz_x,
+		     unsigned swz_y,
+		     unsigned swz_z,
+		     unsigned swz_w)
+{
+    int err = 0;
+    err |= control (file, "negate", negate, _negate, NULL);
+    err |= control (file, "abs", _abs, __abs, NULL);
+
+    err |= reg (file, _reg_file, _reg_nr);
+    if (err == -1)
+	return 0;
+    if (_subreg_nr)
+	/* Bit 4 is the subreg number in byte addressing.  Give it the same
+	   meaning as in the da1 case, so the output looks consistent. */
+	format (file, ".%d", 16 / reg_type_size[_reg_type]);
+    string (file, "<");
+    err |= control (file, "vert stride", vert_stride, _vert_stride, NULL);
+    string (file, ",4,1>");
+    /*
+     * Three kinds of swizzle display:
+     *  identity - nothing printed
+     *  1->all	 - print the single channel
+     *  1->1     - print the mapping
+     */
+    if (swz_x == BRW_CHANNEL_X &&
+	swz_y == BRW_CHANNEL_Y &&
+	swz_z == BRW_CHANNEL_Z &&
+	swz_w == BRW_CHANNEL_W)
+    {
+	;
+    }
+    else if (swz_x == swz_y && swz_x == swz_z && swz_x == swz_w)
+    {
+	string (file, ".");
+	err |= control (file, "channel select", chan_sel, swz_x, NULL);
+    }
+    else
+    {
+	string (file, ".");
+	err |= control (file, "channel select", chan_sel, swz_x, NULL);
+	err |= control (file, "channel select", chan_sel, swz_y, NULL);
+	err |= control (file, "channel select", chan_sel, swz_z, NULL);
+	err |= control (file, "channel select", chan_sel, swz_w, NULL);
+    }
+    err |= control (file, "src da16 reg type", reg_encoding, _reg_type, NULL);
+    return err;
+}
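+
+/* Illustration of the three swizzle cases above: the identity swizzle
+ * xyzw prints nothing, a broadcast such as wwww prints ".w", and an
+ * arbitrary mapping such as yxzw prints ".yxzw".
+ */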
+
+static int src0_3src (FILE *file, struct brw_instruction *inst)
+{
+    int err = 0;
+    unsigned swz_x = (inst->bits2.da3src.src0_swizzle >> 0) & 0x3;
+    unsigned swz_y = (inst->bits2.da3src.src0_swizzle >> 2) & 0x3;
+    unsigned swz_z = (inst->bits2.da3src.src0_swizzle >> 4) & 0x3;
+    unsigned swz_w = (inst->bits2.da3src.src0_swizzle >> 6) & 0x3;
+
+    err |= control (file, "negate", negate, inst->bits1.da3src.src0_negate, NULL);
+    err |= control (file, "abs", _abs, inst->bits1.da3src.src0_abs, NULL);
+
+    err |= reg (file, BRW_GENERAL_REGISTER_FILE, inst->bits2.da3src.src0_reg_nr);
+    if (err == -1)
+	return 0;
+    if (inst->bits2.da3src.src0_subreg_nr)
+	format (file, ".%d", inst->bits2.da3src.src0_subreg_nr);
+    if (inst->bits2.da3src.src0_rep_ctrl)
+       string (file, "<0,1,0>");
+    else
+       string (file, "<4,4,1>");
+    err |= control (file, "src da16 reg type", three_source_reg_encoding,
+                    inst->bits1.da3src.src_type, NULL);
+    /*
+     * Three kinds of swizzle display:
+     *  identity - nothing printed
+     *  1->all	 - print the single channel
+     *  1->1     - print the mapping
+     */
+    if (swz_x == BRW_CHANNEL_X &&
+	swz_y == BRW_CHANNEL_Y &&
+	swz_z == BRW_CHANNEL_Z &&
+	swz_w == BRW_CHANNEL_W)
+    {
+	;
+    }
+    else if (swz_x == swz_y && swz_x == swz_z && swz_x == swz_w)
+    {
+	string (file, ".");
+	err |= control (file, "channel select", chan_sel, swz_x, NULL);
+    }
+    else
+    {
+	string (file, ".");
+	err |= control (file, "channel select", chan_sel, swz_x, NULL);
+	err |= control (file, "channel select", chan_sel, swz_y, NULL);
+	err |= control (file, "channel select", chan_sel, swz_z, NULL);
+	err |= control (file, "channel select", chan_sel, swz_w, NULL);
+    }
+    return err;
+}
+
+static int src1_3src (FILE *file, struct brw_instruction *inst)
+{
+    int err = 0;
+    unsigned swz_x = (inst->bits2.da3src.src1_swizzle >> 0) & 0x3;
+    unsigned swz_y = (inst->bits2.da3src.src1_swizzle >> 2) & 0x3;
+    unsigned swz_z = (inst->bits2.da3src.src1_swizzle >> 4) & 0x3;
+    unsigned swz_w = (inst->bits2.da3src.src1_swizzle >> 6) & 0x3;
+    unsigned src1_subreg_nr = (inst->bits2.da3src.src1_subreg_nr_low |
+			     (inst->bits3.da3src.src1_subreg_nr_high << 2));
+
+    err |= control (file, "negate", negate, inst->bits1.da3src.src1_negate,
+		    NULL);
+    err |= control (file, "abs", _abs, inst->bits1.da3src.src1_abs, NULL);
+
+    err |= reg (file, BRW_GENERAL_REGISTER_FILE,
+		inst->bits3.da3src.src1_reg_nr);
+    if (err == -1)
+	return 0;
+    if (src1_subreg_nr)
+	format (file, ".%d", src1_subreg_nr);
+    if (inst->bits2.da3src.src1_rep_ctrl)
+       string (file, "<0,1,0>");
+    else
+       string (file, "<4,4,1>");
+    err |= control (file, "src da16 reg type", three_source_reg_encoding,
+                    inst->bits1.da3src.src_type, NULL);
+    /*
+     * Three kinds of swizzle display:
+     *  identity - nothing printed
+     *  1->all	 - print the single channel
+     *  1->1     - print the mapping
+     */
+    if (swz_x == BRW_CHANNEL_X &&
+	swz_y == BRW_CHANNEL_Y &&
+	swz_z == BRW_CHANNEL_Z &&
+	swz_w == BRW_CHANNEL_W)
+    {
+	;
+    }
+    else if (swz_x == swz_y && swz_x == swz_z && swz_x == swz_w)
+    {
+	string (file, ".");
+	err |= control (file, "channel select", chan_sel, swz_x, NULL);
+    }
+    else
+    {
+	string (file, ".");
+	err |= control (file, "channel select", chan_sel, swz_x, NULL);
+	err |= control (file, "channel select", chan_sel, swz_y, NULL);
+	err |= control (file, "channel select", chan_sel, swz_z, NULL);
+	err |= control (file, "channel select", chan_sel, swz_w, NULL);
+    }
+    return err;
+}
+
+
+static int src2_3src (FILE *file, struct brw_instruction *inst)
+{
+    int err = 0;
+    unsigned swz_x = (inst->bits3.da3src.src2_swizzle >> 0) & 0x3;
+    unsigned swz_y = (inst->bits3.da3src.src2_swizzle >> 2) & 0x3;
+    unsigned swz_z = (inst->bits3.da3src.src2_swizzle >> 4) & 0x3;
+    unsigned swz_w = (inst->bits3.da3src.src2_swizzle >> 6) & 0x3;
+
+    err |= control (file, "negate", negate, inst->bits1.da3src.src2_negate,
+		    NULL);
+    err |= control (file, "abs", _abs, inst->bits1.da3src.src2_abs, NULL);
+
+    err |= reg (file, BRW_GENERAL_REGISTER_FILE,
+		inst->bits3.da3src.src2_reg_nr);
+    if (err == -1)
+	return 0;
+    if (inst->bits3.da3src.src2_subreg_nr)
+	format (file, ".%d", inst->bits3.da3src.src2_subreg_nr);
+    if (inst->bits3.da3src.src2_rep_ctrl)
+       string (file, "<0,1,0>");
+    else
+       string (file, "<4,4,1>");
+    err |= control (file, "src da16 reg type", three_source_reg_encoding,
+                    inst->bits1.da3src.src_type, NULL);
+    /*
+     * Three kinds of swizzle display:
+     *  identity - nothing printed
+     *  1->all	 - print the single channel
+     *  1->1     - print the mapping
+     */
+    if (swz_x == BRW_CHANNEL_X &&
+	swz_y == BRW_CHANNEL_Y &&
+	swz_z == BRW_CHANNEL_Z &&
+	swz_w == BRW_CHANNEL_W)
+    {
+	;
+    }
+    else if (swz_x == swz_y && swz_x == swz_z && swz_x == swz_w)
+    {
+	string (file, ".");
+	err |= control (file, "channel select", chan_sel, swz_x, NULL);
+    }
+    else
+    {
+	string (file, ".");
+	err |= control (file, "channel select", chan_sel, swz_x, NULL);
+	err |= control (file, "channel select", chan_sel, swz_y, NULL);
+	err |= control (file, "channel select", chan_sel, swz_z, NULL);
+	err |= control (file, "channel select", chan_sel, swz_w, NULL);
+    }
+    return err;
+}
+
+static int imm (FILE *file, unsigned type, struct brw_instruction *inst) {
+    switch (type) {
+    case BRW_HW_REG_TYPE_UD:
+	format (file, "0x%08xUD", inst->bits3.ud);
+	break;
+    case BRW_HW_REG_TYPE_D:
+	format (file, "%dD", inst->bits3.d);
+	break;
+    case BRW_HW_REG_TYPE_UW:
+	format (file, "0x%04xUW", (uint16_t) inst->bits3.ud);
+	break;
+    case BRW_HW_REG_TYPE_W:
+	format (file, "%dW", (int16_t) inst->bits3.d);
+	break;
+    case BRW_HW_REG_IMM_TYPE_UV:
+	format (file, "0x%08xUV", inst->bits3.ud);
+	break;
+    case BRW_HW_REG_IMM_TYPE_VF:
+	format (file, "Vector Float");
+	break;
+    case BRW_HW_REG_IMM_TYPE_V:
+	format (file, "0x%08xV", inst->bits3.ud);
+	break;
+    case BRW_HW_REG_TYPE_F:
+	format (file, "%-gF", inst->bits3.f);
+    }
+    return 0;
+}
+
+static int src0 (FILE *file, struct brw_instruction *inst)
+{
+    if (inst->bits1.da1.src0_reg_file == BRW_IMMEDIATE_VALUE)
+	return imm (file, inst->bits1.da1.src0_reg_type,
+		    inst);
+    else if (inst->header.access_mode == BRW_ALIGN_1)
+    {
+	if (inst->bits2.da1.src0_address_mode == BRW_ADDRESS_DIRECT)
+	{
+	    return src_da1 (file,
+			    inst->bits1.da1.src0_reg_type,
+			    inst->bits1.da1.src0_reg_file,
+			    inst->bits2.da1.src0_vert_stride,
+			    inst->bits2.da1.src0_width,
+			    inst->bits2.da1.src0_horiz_stride,
+			    inst->bits2.da1.src0_reg_nr,
+			    inst->bits2.da1.src0_subreg_nr,
+			    inst->bits2.da1.src0_abs,
+			    inst->bits2.da1.src0_negate);
+	}
+	else
+	{
+	    return src_ia1 (file,
+			    inst->bits1.ia1.src0_reg_type,
+			    inst->bits1.ia1.src0_reg_file,
+			    inst->bits2.ia1.src0_indirect_offset,
+			    inst->bits2.ia1.src0_subreg_nr,
+			    inst->bits2.ia1.src0_negate,
+			    inst->bits2.ia1.src0_abs,
+			    inst->bits2.ia1.src0_address_mode,
+			    inst->bits2.ia1.src0_horiz_stride,
+			    inst->bits2.ia1.src0_width,
+			    inst->bits2.ia1.src0_vert_stride);
+	}
+    }
+    else
+    {
+	if (inst->bits2.da16.src0_address_mode == BRW_ADDRESS_DIRECT)
+	{
+	    return src_da16 (file,
+			     inst->bits1.da16.src0_reg_type,
+			     inst->bits1.da16.src0_reg_file,
+			     inst->bits2.da16.src0_vert_stride,
+			     inst->bits2.da16.src0_reg_nr,
+			     inst->bits2.da16.src0_subreg_nr,
+			     inst->bits2.da16.src0_abs,
+			     inst->bits2.da16.src0_negate,
+			     inst->bits2.da16.src0_swz_x,
+			     inst->bits2.da16.src0_swz_y,
+			     inst->bits2.da16.src0_swz_z,
+			     inst->bits2.da16.src0_swz_w);
+	}
+	else
+	{
+	    string (file, "Indirect align16 address mode not supported");
+	    return 1;
+	}
+    }
+}
+
+static int src1 (FILE *file, struct brw_instruction *inst)
+{
+    if (inst->bits1.da1.src1_reg_file == BRW_IMMEDIATE_VALUE)
+	return imm (file, inst->bits1.da1.src1_reg_type,
+		    inst);
+    else if (inst->header.access_mode == BRW_ALIGN_1)
+    {
+	if (inst->bits3.da1.src1_address_mode == BRW_ADDRESS_DIRECT)
+	{
+	    return src_da1 (file,
+			    inst->bits1.da1.src1_reg_type,
+			    inst->bits1.da1.src1_reg_file,
+			    inst->bits3.da1.src1_vert_stride,
+			    inst->bits3.da1.src1_width,
+			    inst->bits3.da1.src1_horiz_stride,
+			    inst->bits3.da1.src1_reg_nr,
+			    inst->bits3.da1.src1_subreg_nr,
+			    inst->bits3.da1.src1_abs,
+			    inst->bits3.da1.src1_negate);
+	}
+	else
+	{
+	    return src_ia1 (file,
+			    inst->bits1.ia1.src1_reg_type,
+			    inst->bits1.ia1.src1_reg_file,
+			    inst->bits3.ia1.src1_indirect_offset,
+			    inst->bits3.ia1.src1_subreg_nr,
+			    inst->bits3.ia1.src1_negate,
+			    inst->bits3.ia1.src1_abs,
+			    inst->bits3.ia1.src1_address_mode,
+			    inst->bits3.ia1.src1_horiz_stride,
+			    inst->bits3.ia1.src1_width,
+			    inst->bits3.ia1.src1_vert_stride);
+	}
+    }
+    else
+    {
+	if (inst->bits3.da16.src1_address_mode == BRW_ADDRESS_DIRECT)
+	{
+	    return src_da16 (file,
+			     inst->bits1.da16.src1_reg_type,
+			     inst->bits1.da16.src1_reg_file,
+			     inst->bits3.da16.src1_vert_stride,
+			     inst->bits3.da16.src1_reg_nr,
+			     inst->bits3.da16.src1_subreg_nr,
+			     inst->bits3.da16.src1_abs,
+			     inst->bits3.da16.src1_negate,
+			     inst->bits3.da16.src1_swz_x,
+			     inst->bits3.da16.src1_swz_y,
+			     inst->bits3.da16.src1_swz_z,
+			     inst->bits3.da16.src1_swz_w);
+	}
+	else
+	{
+	    string (file, "Indirect align16 address mode not supported");
+	    return 1;
+	}
+    }
+}
+
+static int qtr_ctrl(FILE *file, struct brw_instruction *inst)
+{
+    int qtr_ctl = inst->header.compression_control;
+    int exec_size = 1 << inst->header.execution_size;
+
+    if (exec_size == 8) {
+	switch (qtr_ctl) {
+	case 0:
+	    string (file, " 1Q");
+	    break;
+	case 1:
+	    string (file, " 2Q");
+	    break;
+	case 2:
+	    string (file, " 3Q");
+	    break;
+	case 3:
+	    string (file, " 4Q");
+	    break;
+	}
+    } else if (exec_size == 16){
+	if (qtr_ctl < 2)
+	    string (file, " 1H");
+	else
+	    string (file, " 2H");
+    }
+    return 0;
+}
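+
+/* Examples for the helper above: an exec_size-8 instruction with
+ * compression_control 1 prints " 2Q" (second quarter), while an
+ * exec_size-16 instruction with compression_control 2 prints " 2H"
+ * (second half).
+ */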
+
+int brw_disasm (FILE *file, struct brw_instruction *inst, int gen)
+{
+    int	err = 0;
+    int space = 0;
+
+    if (inst->header.predicate_control) {
+	string (file, "(");
+	err |= control (file, "predicate inverse", pred_inv, inst->header.predicate_inverse, NULL);
+	format (file, "f%d", gen >= 7 ? inst->bits2.da1.flag_reg_nr : 0);
+	if (inst->bits2.da1.flag_subreg_nr)
+	    format (file, ".%d", inst->bits2.da1.flag_subreg_nr);
+	if (inst->header.access_mode == BRW_ALIGN_1)
+	    err |= control (file, "predicate control align1", pred_ctrl_align1,
+			    inst->header.predicate_control, NULL);
+	else
+	    err |= control (file, "predicate control align16", pred_ctrl_align16,
+			    inst->header.predicate_control, NULL);
+	string (file, ") ");
+    }
+
+    err |= print_opcode (file, inst->header.opcode);
+    err |= control (file, "saturate", saturate, inst->header.saturate, NULL);
+    err |= control (file, "debug control", debug_ctrl, inst->header.debug_control, NULL);
+
+    if (inst->header.opcode == BRW_OPCODE_MATH) {
+	string (file, " ");
+	err |= control (file, "function", math_function,
+			inst->header.destreg__conditionalmod, NULL);
+    } else if (inst->header.opcode != BRW_OPCODE_SEND &&
+	       inst->header.opcode != BRW_OPCODE_SENDC) {
+	err |= control (file, "conditional modifier", conditional_modifier,
+			inst->header.destreg__conditionalmod, NULL);
+
+        /* If we're using the conditional modifier, print which flags reg is
+         * used for it.  Note that on gen6+, the embedded-condition SEL and
+         * control flow doesn't update flags.
+         */
+	if (inst->header.destreg__conditionalmod &&
+            (gen < 6 || (inst->header.opcode != BRW_OPCODE_SEL &&
+                         inst->header.opcode != BRW_OPCODE_IF &&
+                         inst->header.opcode != BRW_OPCODE_WHILE))) {
+	    format (file, ".f%d", gen >= 7 ? inst->bits2.da1.flag_reg_nr : 0);
+	    if (inst->bits2.da1.flag_subreg_nr)
+		format (file, ".%d", inst->bits2.da1.flag_subreg_nr);
+        }
+    }
+
+    if (inst->header.opcode != BRW_OPCODE_NOP) {
+	string (file, "(");
+	err |= control (file, "execution size", exec_size, inst->header.execution_size, NULL);
+	string (file, ")");
+    }
+
+    if (inst->header.opcode == BRW_OPCODE_SEND && gen < 6)
+	format (file, " %d", inst->header.destreg__conditionalmod);
+
+    if (opcode[inst->header.opcode].nsrc == 3) {
+       pad (file, 16);
+       err |= dest_3src (file, inst);
+
+       pad (file, 32);
+       err |= src0_3src (file, inst);
+
+       pad (file, 48);
+       err |= src1_3src (file, inst);
+
+       pad (file, 64);
+       err |= src2_3src (file, inst);
+    } else {
+       if (opcode[inst->header.opcode].ndst > 0) {
+	  pad (file, 16);
+	  err |= dest (file, inst);
+       } else if (gen == 7 && (inst->header.opcode == BRW_OPCODE_ELSE ||
+			       inst->header.opcode == BRW_OPCODE_ENDIF ||
+			       inst->header.opcode == BRW_OPCODE_WHILE)) {
+	  format (file, " %d", inst->bits3.break_cont.jip);
+       } else if (gen == 6 && (inst->header.opcode == BRW_OPCODE_IF ||
+			       inst->header.opcode == BRW_OPCODE_ELSE ||
+			       inst->header.opcode == BRW_OPCODE_ENDIF ||
+			       inst->header.opcode == BRW_OPCODE_WHILE)) {
+	  format (file, " %d", inst->bits1.branch_gen6.jump_count);
+       } else if ((gen >= 6 && (inst->header.opcode == BRW_OPCODE_BREAK ||
+                                inst->header.opcode == BRW_OPCODE_CONTINUE ||
+                                inst->header.opcode == BRW_OPCODE_HALT)) ||
+                  (gen == 7 && inst->header.opcode == BRW_OPCODE_IF)) {
+	  format (file, " %d %d", inst->bits3.break_cont.uip, inst->bits3.break_cont.jip);
+       } else if (inst->header.opcode == BRW_OPCODE_JMPI) {
+	  format (file, " %d", inst->bits3.d);
+       }
+
+       if (opcode[inst->header.opcode].nsrc > 0) {
+	  pad (file, 32);
+	  err |= src0 (file, inst);
+       }
+       if (opcode[inst->header.opcode].nsrc > 1) {
+	  pad (file, 48);
+	  err |= src1 (file, inst);
+       }
+    }
+
+    if (inst->header.opcode == BRW_OPCODE_SEND ||
+	inst->header.opcode == BRW_OPCODE_SENDC) {
+	enum brw_message_target target;
+
+	if (gen >= 6)
+	    target = inst->header.destreg__conditionalmod;
+	else if (gen == 5)
+	    target = inst->bits2.send_gen5.sfid;
+	else
+	    target = inst->bits3.generic.msg_target;
+
+	newline (file);
+	pad (file, 16);
+	space = 0;
+
+	if (gen >= 6) {
+	   err |= control (file, "target function", target_function_gen6,
+			   target, &space);
+	} else {
+	   err |= control (file, "target function", target_function,
+			   target, &space);
+	}
+
+	switch (target) {
+	case BRW_SFID_MATH:
+	    err |= control (file, "math function", math_function,
+			    inst->bits3.math.function, &space);
+	    err |= control (file, "math saturate", math_saturate,
+			    inst->bits3.math.saturate, &space);
+	    err |= control (file, "math signed", math_signed,
+			    inst->bits3.math.int_type, &space);
+	    err |= control (file, "math scalar", math_scalar,
+			    inst->bits3.math.data_type, &space);
+	    err |= control (file, "math precision", math_precision,
+			    inst->bits3.math.precision, &space);
+	    break;
+	case BRW_SFID_SAMPLER:
+	    if (gen >= 7) {
+		format (file, " (%d, %d, %d, %d)",
+			inst->bits3.sampler_gen7.binding_table_index,
+			inst->bits3.sampler_gen7.sampler,
+			inst->bits3.sampler_gen7.msg_type,
+			inst->bits3.sampler_gen7.simd_mode);
+	    } else if (gen >= 5) {
+		format (file, " (%d, %d, %d, %d)",
+			inst->bits3.sampler_gen5.binding_table_index,
+			inst->bits3.sampler_gen5.sampler,
+			inst->bits3.sampler_gen5.msg_type,
+			inst->bits3.sampler_gen5.simd_mode);
+	    } else if (0 /* FINISHME: is_g4x */) {
+		format (file, " (%d, %d)",
+			inst->bits3.sampler_g4x.binding_table_index,
+			inst->bits3.sampler_g4x.sampler);
+	    } else {
+		format (file, " (%d, %d, ",
+			inst->bits3.sampler.binding_table_index,
+			inst->bits3.sampler.sampler);
+		err |= control (file, "sampler target format",
+				sampler_target_format,
+				inst->bits3.sampler.return_format, NULL);
+		string (file, ")");
+	    }
+	    break;
+	case BRW_SFID_DATAPORT_READ:
+	    if (gen >= 6) {
+		format (file, " (%d, %d, %d, %d)",
+			inst->bits3.gen6_dp.binding_table_index,
+			inst->bits3.gen6_dp.msg_control,
+			inst->bits3.gen6_dp.msg_type,
+			inst->bits3.gen6_dp.send_commit_msg);
+	    } else if (gen >= 5 /* FINISHME: || is_g4x */) {
+		format (file, " (%d, %d, %d)",
+			inst->bits3.dp_read_gen5.binding_table_index,
+			inst->bits3.dp_read_gen5.msg_control,
+			inst->bits3.dp_read_gen5.msg_type);
+	    } else {
+		format (file, " (%d, %d, %d)",
+			inst->bits3.dp_read.binding_table_index,
+			inst->bits3.dp_read.msg_control,
+			inst->bits3.dp_read.msg_type);
+	    }
+	    break;
+
+	case BRW_SFID_DATAPORT_WRITE:
+	    if (gen >= 7) {
+		format (file, " (");
+
+		err |= control (file, "DP rc message type",
+				dp_rc_msg_type_gen6,
+				inst->bits3.gen7_dp.msg_type, &space);
+
+		format (file, ", %d, %d, %d)",
+			inst->bits3.gen7_dp.binding_table_index,
+			inst->bits3.gen7_dp.msg_control,
+			inst->bits3.gen7_dp.msg_type);
+	    } else if (gen == 6) {
+		format (file, " (");
+
+		err |= control (file, "DP rc message type",
+				dp_rc_msg_type_gen6,
+				inst->bits3.gen6_dp.msg_type, &space);
+
+		format (file, ", %d, %d, %d, %d)",
+			inst->bits3.gen6_dp.binding_table_index,
+			inst->bits3.gen6_dp.msg_control,
+			inst->bits3.gen6_dp.msg_type,
+			inst->bits3.gen6_dp.send_commit_msg);
+	    } else {
+		format (file, " (%d, %d, %d, %d)",
+			inst->bits3.dp_write.binding_table_index,
+			(inst->bits3.dp_write.last_render_target << 3) |
+			inst->bits3.dp_write.msg_control,
+			inst->bits3.dp_write.msg_type,
+			inst->bits3.dp_write.send_commit_msg);
+	    }
+	    break;
+
+	case BRW_SFID_URB:
+	    if (gen >= 5) {
+		format (file, " %d", inst->bits3.urb_gen5.offset);
+	    } else {
+		format (file, " %d", inst->bits3.urb.offset);
+	    }
+
+	    space = 1;
+	    if (gen >= 5) {
+		err |= control (file, "urb opcode", urb_opcode,
+				inst->bits3.urb_gen5.opcode, &space);
+	    }
+	    err |= control (file, "urb swizzle", urb_swizzle,
+			    inst->bits3.urb.swizzle_control, &space);
+	    err |= control (file, "urb allocate", urb_allocate,
+			    inst->bits3.urb.allocate, &space);
+	    err |= control (file, "urb used", urb_used,
+			    inst->bits3.urb.used, &space);
+	    err |= control (file, "urb complete", urb_complete,
+			    inst->bits3.urb.complete, &space);
+	    break;
+	case BRW_SFID_THREAD_SPAWNER:
+	    break;
+	case GEN7_SFID_DATAPORT_DATA_CACHE:
+           if (gen >= 7) {
+              format (file, " (");
+
+              err |= control (file, "DP DC0 message type",
+                              dp_dc0_msg_type_gen7,
+                              inst->bits3.gen7_dp.msg_type, &space);
+
+              format (file, ", %d, ", inst->bits3.gen7_dp.binding_table_index);
+
+              switch (inst->bits3.gen7_dp.msg_type) {
+              case GEN7_DATAPORT_DC_UNTYPED_ATOMIC_OP:
+                 control (file, "atomic op", aop, inst->bits3.ud >> 8 & 0xf,
+                          &space);
+                 break;
+              default:
+                 format (file, "%d", inst->bits3.gen7_dp.msg_control);
+              }
+              format (file, ")");
+              break;
+           }
+           /* FALLTHROUGH */
+
+	case HSW_SFID_DATAPORT_DATA_CACHE_1:
+	    if (gen >= 7) {
+		format (file, " (");
+
+		err |= control (file, "DP DC1 message type",
+				dp_dc1_msg_type_hsw,
+				inst->bits3.gen7_dp.msg_type, &space);
+
+		format (file, ", %d, ",
+			inst->bits3.gen7_dp.binding_table_index);
+
+                switch (inst->bits3.gen7_dp.msg_type) {
+                case HSW_DATAPORT_DC_PORT1_UNTYPED_ATOMIC_OP:
+                case HSW_DATAPORT_DC_PORT1_UNTYPED_ATOMIC_OP_SIMD4X2:
+                case HSW_DATAPORT_DC_PORT1_TYPED_ATOMIC_OP:
+                case HSW_DATAPORT_DC_PORT1_TYPED_ATOMIC_OP_SIMD4X2:
+                case HSW_DATAPORT_DC_PORT1_ATOMIC_COUNTER_OP:
+                case HSW_DATAPORT_DC_PORT1_ATOMIC_COUNTER_OP_SIMD4X2:
+                   control (file, "atomic op", aop,
+                            inst->bits3.ud >> 8 & 0xf, &space);
+                   break;
+                default:
+                   format (file, "%d", inst->bits3.gen7_dp.msg_control);
+                }
+                format (file, ")");
+                break;
+            }
+            /* FALLTHROUGH */
+
+	default:
+	    format (file, "unsupported target %d", target);
+	    break;
+	}
+	if (space)
+	    string (file, " ");
+	if (gen >= 5) {
+	   format (file, "mlen %d",
+		   inst->bits3.generic_gen5.msg_length);
+	   format (file, " rlen %d",
+		   inst->bits3.generic_gen5.response_length);
+	} else {
+	   format (file, "mlen %d",
+		   inst->bits3.generic.msg_length);
+	   format (file, " rlen %d",
+		   inst->bits3.generic.response_length);
+	}
+    }
+    pad (file, 64);
+    if (inst->header.opcode != BRW_OPCODE_NOP) {
+	string (file, "{");
+	space = 1;
+	err |= control(file, "access mode", access_mode, inst->header.access_mode, &space);
+	if (gen >= 6)
+	    err |= control (file, "write enable control", wectrl, inst->header.mask_control, &space);
+	else
+	    err |= control (file, "mask control", mask_ctrl, inst->header.mask_control, &space);
+	err |= control (file, "dependency control", dep_ctrl, inst->header.dependency_control, &space);
+
+	if (gen >= 6)
+	    err |= qtr_ctrl (file, inst);
+	else {
+	    if (inst->header.compression_control == BRW_COMPRESSION_COMPRESSED &&
+		opcode[inst->header.opcode].ndst > 0 &&
+		inst->bits1.da1.dest_reg_file == BRW_MESSAGE_REGISTER_FILE &&
+		inst->bits1.da1.dest_reg_nr & (1 << 7)) {
+		format (file, " compr4");
+	    } else {
+		err |= control (file, "compression control", compr_ctrl,
+				inst->header.compression_control, &space);
+	    }
+	}
+
+	err |= control (file, "thread control", thread_ctrl, inst->header.thread_control, &space);
+	if (gen >= 6)
+	    err |= control (file, "acc write control", accwr, inst->header.acc_wr_control, &space);
+	if (inst->header.opcode == BRW_OPCODE_SEND ||
+	    inst->header.opcode == BRW_OPCODE_SENDC)
+	    err |= control (file, "end of thread", end_of_thread,
+			    inst->bits3.generic.end_of_thread, &space);
+	if (space)
+	    string (file, " ");
+	string (file, "}");
+    }
+    string (file, ";");
+    newline (file);
+    return err;
+}
diff --git a/icd/intel/compiler/pipeline/brw_eu.c b/icd/intel/compiler/pipeline/brw_eu.c
new file mode 100644
index 0000000..bb84029
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_eu.c
@@ -0,0 +1,294 @@
+/*
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to
+ develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+
+#include "brw_context.h"
+#include "brw_defines.h"
+#include "brw_eu.h"
+
+#include "glsl/ralloc.h"
+
+/**
+ * Converts a BRW_REGISTER_TYPE_* enum to a short string (F, UD, and so on).
+ *
+ * This is different from reg_encoding in brw_disasm.c in that it operates
+ * on the abstract enum values, rather than the generation-specific encoding.
+ */
+const char *
+brw_reg_type_letters(unsigned type)
+{
+   const char *names[] = {
+      [BRW_REGISTER_TYPE_UD] = "UD",
+      [BRW_REGISTER_TYPE_D]  = "D",
+      [BRW_REGISTER_TYPE_UW] = "UW",
+      [BRW_REGISTER_TYPE_W]  = "W",
+      [BRW_REGISTER_TYPE_F]  = "F",
+      [BRW_REGISTER_TYPE_UB] = "UB",
+      [BRW_REGISTER_TYPE_B]  = "B",
+      [BRW_REGISTER_TYPE_UV] = "UV",
+      [BRW_REGISTER_TYPE_V]  = "V",
+      [BRW_REGISTER_TYPE_VF] = "VF",
+      [BRW_REGISTER_TYPE_DF] = "DF",
+      [BRW_REGISTER_TYPE_HF] = "HF",
+      [BRW_REGISTER_TYPE_UQ] = "UQ",
+      [BRW_REGISTER_TYPE_Q]  = "Q",
+   };
+   assert(type <= BRW_REGISTER_TYPE_UQ);
+   return names[type];
+}
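+
+/* Usage sketch: a disassembly line such as "mov(8) g4<1>F g2<8,8,1>F" gets
+ * its "F" type suffixes from calls like
+ * brw_reg_type_letters(BRW_REGISTER_TYPE_F).
+ */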
+
+/* Returns the corresponding conditional mod for swapping src0 and
+ * src1 in e.g. CMP.
+ */
+uint32_t
+brw_swap_cmod(uint32_t cmod)
+{
+   switch (cmod) {
+   case BRW_CONDITIONAL_Z:
+   case BRW_CONDITIONAL_NZ:
+      return cmod;
+   case BRW_CONDITIONAL_G:
+      return BRW_CONDITIONAL_L;
+   case BRW_CONDITIONAL_GE:
+      return BRW_CONDITIONAL_LE;
+   case BRW_CONDITIONAL_L:
+      return BRW_CONDITIONAL_G;
+   case BRW_CONDITIONAL_LE:
+      return BRW_CONDITIONAL_GE;
+   default:
+      return ~0;
+   }
+}
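+
+/* Usage sketch: to swap the sources of "CMP.G dst, a, b", the equivalent
+ * instruction is "CMP.L dst, b, a", with the new modifier obtained via
+ * brw_swap_cmod(BRW_CONDITIONAL_G).
+ */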
+
+
+/* How does predicate control work when execution_size != 8?  Do I
+ * need to test/set for 0xffff when execution_size is 16?
+ */
+void brw_set_predicate_control_flag_value( struct brw_compile *p, unsigned value )
+{
+   p->current->header.predicate_control = BRW_PREDICATE_NONE;
+
+   if (value != 0xff) {
+      if (value != p->flag_value) {
+	 brw_push_insn_state(p);
+	 brw_MOV(p, brw_flag_reg(0, 0), brw_imm_uw(value));
+	 p->flag_value = value;
+	 brw_pop_insn_state(p);
+      }
+
+      p->current->header.predicate_control = BRW_PREDICATE_NORMAL;
+   }
+}
+
+void brw_set_predicate_control( struct brw_compile *p, unsigned pc )
+{
+   p->current->header.predicate_control = pc;
+}
+
+void brw_set_predicate_inverse(struct brw_compile *p, bool predicate_inverse)
+{
+   p->current->header.predicate_inverse = predicate_inverse;
+}
+
+void brw_set_conditionalmod( struct brw_compile *p, unsigned conditional )
+{
+   p->current->header.destreg__conditionalmod = conditional;
+}
+
+void brw_set_flag_reg(struct brw_compile *p, int reg, int subreg)
+{
+   p->current->bits2.da1.flag_reg_nr = reg;
+   p->current->bits2.da1.flag_subreg_nr = subreg;
+}
+
+void brw_set_access_mode( struct brw_compile *p, unsigned access_mode )
+{
+   p->current->header.access_mode = access_mode;
+}
+
+void
+brw_set_compression_control(struct brw_compile *p,
+			    enum brw_compression compression_control)
+{
+   p->compressed = (compression_control == BRW_COMPRESSION_COMPRESSED);
+
+   if (p->brw->gen >= 6) {
+      /* Since we don't use the SIMD32 support in gen6, we translate
+       * the pre-gen6 compression control here.
+       */
+      switch (compression_control) {
+      case BRW_COMPRESSION_NONE:
+	 /* This is the "use the first set of bits of dmask/vmask/arf
+	  * according to execsize" option.
+	  */
+	 p->current->header.compression_control = GEN6_COMPRESSION_1Q;
+	 break;
+      case BRW_COMPRESSION_2NDHALF:
+	 /* For SIMD8, this is "use the second set of 8 bits." */
+	 p->current->header.compression_control = GEN6_COMPRESSION_2Q;
+	 break;
+      case BRW_COMPRESSION_COMPRESSED:
+	 /* For SIMD16 instruction compression, use the first set of 16 bits
+	  * since we don't do SIMD32 dispatch.
+	  */
+	 p->current->header.compression_control = GEN6_COMPRESSION_1H;
+	 break;
+      default:
+	 assert(!"not reached");
+	 p->current->header.compression_control = GEN6_COMPRESSION_1H;
+	 break;
+      }
+   } else {
+      p->current->header.compression_control = compression_control;
+   }
+}
+
+void brw_set_mask_control( struct brw_compile *p, unsigned value )
+{
+   p->current->header.mask_control = value;
+}
+
+void brw_set_saturate( struct brw_compile *p, bool enable )
+{
+   p->current->header.saturate = enable;
+}
+
+void brw_set_acc_write_control(struct brw_compile *p, unsigned value)
+{
+   if (p->brw->gen >= 6)
+      p->current->header.acc_wr_control = value;
+}
+
+void brw_push_insn_state( struct brw_compile *p )
+{
+   assert(p->current != &p->stack[BRW_EU_MAX_INSN_STACK-1]);
+   memcpy(p->current+1, p->current, sizeof(struct brw_instruction));
+   p->compressed_stack[p->current - p->stack] = p->compressed;
+   p->current++;
+}
+
+void brw_pop_insn_state( struct brw_compile *p )
+{
+   assert(p->current != p->stack);
+   p->current--;
+   p->compressed = p->compressed_stack[p->current - p->stack];
+}
+
+
+/***********************************************************************
+ */
+void
+brw_init_compile(struct brw_context *brw, struct brw_compile *p, void *mem_ctx)
+{
+   memset(p, 0, sizeof(*p));
+
+   p->brw = brw;
+   /*
+    * Set the initial instruction store array size to 1024; if that turns
+    * out not to be enough, brw_next_insn() will keep doubling the store
+    * size until allocation fails.
+    */
+   p->store_size = 1024;
+   p->store = rzalloc_array(mem_ctx, struct brw_instruction, p->store_size);
+   p->nr_insn = 0;
+   p->current = p->stack;
+   p->compressed = false;
+   memset(p->current, 0, sizeof(p->current[0]));
+
+   p->mem_ctx = mem_ctx;
+
+   /* Some defaults?
+    */
+   brw_set_mask_control(p, BRW_MASK_ENABLE); /* what does this do? */
+   brw_set_saturate(p, 0);
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+   brw_set_predicate_control_flag_value(p, 0xff);
+
+   /* Set up control flow stack */
+   p->if_stack_depth = 0;
+   p->if_stack_array_size = 16;
+   p->if_stack = rzalloc_array(mem_ctx, int, p->if_stack_array_size);
+
+   p->loop_stack_depth = 0;
+   p->loop_stack_array_size = 16;
+   p->loop_stack = rzalloc_array(mem_ctx, int, p->loop_stack_array_size);
+   p->if_depth_in_loop = rzalloc_array(mem_ctx, int, p->loop_stack_array_size);
+
+   brw_init_compaction_tables(brw);
+}
+
+
+const unsigned *brw_get_program( struct brw_compile *p,
+			       unsigned *sz )
+{
+   brw_compact_instructions(p);
+
+   *sz = p->next_insn_offset;
+   return (const unsigned *)p->store;
+}
+
+void
+brw_dump_compile(struct brw_compile *p, FILE *out, int start, int end)
+{
+   struct brw_context *brw = p->brw;
+   void *store = p->store;
+   bool dump_hex = false;
+
+   for (int offset = start; offset < end;) {
+      struct brw_instruction *insn = store + offset;
+      struct brw_instruction uncompacted;
+      fprintf(out, "0x%08x: ", offset);
+
+      if (insn->header.cmpt_control) {
+	 struct brw_compact_instruction *compacted = (void *)insn;
+	 if (dump_hex) {
+	    fprintf(out, "0x%08x 0x%08x                       ",
+		    ((uint32_t *)insn)[1],
+		    ((uint32_t *)insn)[0]);
+	 }
+
+	 brw_uncompact_instruction(brw, &uncompacted, compacted);
+	 insn = &uncompacted;
+	 offset += 8;
+      } else {
+	 if (dump_hex) {
+	    fprintf(out, "0x%08x 0x%08x 0x%08x 0x%08x ",
+		    ((uint32_t *)insn)[3],
+		    ((uint32_t *)insn)[2],
+		    ((uint32_t *)insn)[1],
+		    ((uint32_t *)insn)[0]);
+	 }
+	 offset += 16;
+      }
+
+      brw_disasm(out, insn, p->brw->gen);
+   }
+}
diff --git a/icd/intel/compiler/pipeline/brw_eu.h b/icd/intel/compiler/pipeline/brw_eu.h
new file mode 100644
index 0000000..68c68fd
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_eu.h
@@ -0,0 +1,451 @@
+/*
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to
+ develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+
+#ifndef BRW_EU_H
+#define BRW_EU_H
+
+#include <stdbool.h>
+#include "brw_structs.h"
+#include "brw_defines.h"
+#include "brw_reg.h"
+#include "program/prog_instruction.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define BRW_EU_MAX_INSN_STACK 5
+
+struct brw_compile {
+   struct brw_instruction *store;
+   int store_size;
+   unsigned nr_insn;
+   unsigned int next_insn_offset;
+
+   void *mem_ctx;
+
+   /* Allow clients to push/pop instruction state:
+    */
+   struct brw_instruction stack[BRW_EU_MAX_INSN_STACK];
+   bool compressed_stack[BRW_EU_MAX_INSN_STACK];
+   struct brw_instruction *current;
+
+   unsigned flag_value;
+   bool single_program_flow;
+   bool compressed;
+   struct brw_context *brw;
+
+   /* Control flow stacks:
+    * - if_stack contains IF and ELSE instructions which must be patched
+    *   (and popped) once the matching ENDIF instruction is encountered.
+    *
+    *   Just store the instruction pointer (an index).
+    */
+   int *if_stack;
+   int if_stack_depth;
+   int if_stack_array_size;
+
+   /**
+    * loop_stack contains the instruction pointers of the starts of loops which
+    * must be patched (and popped) once the matching WHILE instruction is
+    * encountered.
+    */
+   int *loop_stack;
+   /**
+    * Pre-gen6, the BREAK and CONT instructions had to specify how many IF/ENDIF
+    * blocks they were popping out of, to fix up the mask stack.  This tracks
+    * the IF/ENDIF nesting in each current nested loop level.
+    */
+   int *if_depth_in_loop;
+   int loop_stack_depth;
+   int loop_stack_array_size;
+};
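+
+/* Illustrative flow (a sketch of the intended protocol): emitting an IF
+ * pushes its instruction index onto if_stack; when the matching brw_ENDIF()
+ * is emitted, the index is popped and the stored IF (or ELSE) instruction's
+ * branch offsets are patched to point past the ENDIF.
+ */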
+
+static inline struct brw_instruction *current_insn( struct brw_compile *p)
+{
+   return &p->store[p->nr_insn];
+}
+
+void brw_pop_insn_state( struct brw_compile *p );
+void brw_push_insn_state( struct brw_compile *p );
+void brw_set_mask_control( struct brw_compile *p, unsigned value );
+void brw_set_saturate( struct brw_compile *p, bool enable );
+void brw_set_access_mode( struct brw_compile *p, unsigned access_mode );
+void brw_set_compression_control(struct brw_compile *p, enum brw_compression c);
+void brw_set_predicate_control_flag_value( struct brw_compile *p, unsigned value );
+void brw_set_predicate_control( struct brw_compile *p, unsigned pc );
+void brw_set_predicate_inverse(struct brw_compile *p, bool predicate_inverse);
+void brw_set_conditionalmod( struct brw_compile *p, unsigned conditional );
+void brw_set_flag_reg(struct brw_compile *p, int reg, int subreg);
+void brw_set_acc_write_control(struct brw_compile *p, unsigned value);
+
+void brw_init_compile(struct brw_context *, struct brw_compile *p,
+		      void *mem_ctx);
+void brw_dump_compile(struct brw_compile *p, FILE *out, int start, int end);
+const unsigned *brw_get_program( struct brw_compile *p, unsigned *sz );
+
+struct brw_instruction *brw_next_insn(struct brw_compile *p, unsigned opcode);
+void brw_set_dest(struct brw_compile *p, struct brw_instruction *insn,
+		  struct brw_reg dest);
+void brw_set_src0(struct brw_compile *p, struct brw_instruction *insn,
+		  struct brw_reg reg);
+
+void gen6_resolve_implied_move(struct brw_compile *p,
+			       struct brw_reg *src,
+			       unsigned msg_reg_nr);
+
+/* Helpers for regular instructions:
+ */
+#define ALU1(OP)					\
+struct brw_instruction *brw_##OP(struct brw_compile *p,	\
+	      struct brw_reg dest,			\
+	      struct brw_reg src0);
+
+#define ALU2(OP)					\
+struct brw_instruction *brw_##OP(struct brw_compile *p,	\
+	      struct brw_reg dest,			\
+	      struct brw_reg src0,			\
+	      struct brw_reg src1);
+
+#define ALU3(OP)					\
+struct brw_instruction *brw_##OP(struct brw_compile *p,	\
+	      struct brw_reg dest,			\
+	      struct brw_reg src0,			\
+	      struct brw_reg src1,			\
+	      struct brw_reg src2);
+
+#define ROUND(OP) \
+void brw_##OP(struct brw_compile *p, struct brw_reg dest, struct brw_reg src0);
+
+ALU1(MOV)
+ALU2(SEL)
+ALU1(NOT)
+ALU2(AND)
+ALU2(OR)
+ALU2(XOR)
+ALU2(SHR)
+ALU2(SHL)
+ALU2(ASR)
+ALU1(F32TO16)
+ALU1(F16TO32)
+ALU2(JMPI)
+ALU2(ADD)
+ALU2(AVG)
+ALU2(MUL)
+ALU1(FRC)
+ALU1(RNDD)
+ALU2(MAC)
+ALU2(MACH)
+ALU1(LZD)
+ALU2(DP4)
+ALU2(DPH)
+ALU2(DP3)
+ALU2(DP2)
+ALU2(LINE)
+ALU2(PLN)
+ALU3(MAD)
+ALU3(LRP)
+ALU1(BFREV)
+ALU3(BFE)
+ALU2(BFI1)
+ALU3(BFI2)
+ALU1(FBH)
+ALU1(FBL)
+ALU1(CBIT)
+ALU2(ADDC)
+ALU2(SUBB)
+ALU2(MAC)
+
+ROUND(RNDZ)
+ROUND(RNDE)
+
+#undef ALU1
+#undef ALU2
+#undef ALU3
+#undef ROUND
+
+
+/* Helpers for SEND instruction:
+ */
+void brw_set_sampler_message(struct brw_compile *p,
+                             struct brw_instruction *insn,
+                             unsigned binding_table_index,
+                             unsigned sampler,
+                             unsigned msg_type,
+                             unsigned response_length,
+                             unsigned msg_length,
+                             unsigned header_present,
+                             unsigned simd_mode,
+                             unsigned return_format);
+
+void brw_set_dp_read_message(struct brw_compile *p,
+			     struct brw_instruction *insn,
+			     unsigned binding_table_index,
+			     unsigned msg_control,
+			     unsigned msg_type,
+			     unsigned target_cache,
+			     unsigned msg_length,
+                             bool header_present,
+			     unsigned response_length);
+
+void brw_set_dp_write_message(struct brw_compile *p,
+			      struct brw_instruction *insn,
+			      unsigned binding_table_index,
+			      unsigned msg_control,
+			      unsigned msg_type,
+			      unsigned msg_length,
+			      bool header_present,
+			      unsigned last_render_target,
+			      unsigned response_length,
+			      unsigned end_of_thread,
+			      unsigned send_commit_msg);
+
+void brw_urb_WRITE(struct brw_compile *p,
+		   struct brw_reg dest,
+		   unsigned msg_reg_nr,
+		   struct brw_reg src0,
+                   enum brw_urb_write_flags flags,
+		   unsigned msg_length,
+		   unsigned response_length,
+		   unsigned offset,
+		   unsigned swizzle);
+
+void brw_ff_sync(struct brw_compile *p,
+		   struct brw_reg dest,
+		   unsigned msg_reg_nr,
+		   struct brw_reg src0,
+		   bool allocate,
+		   unsigned response_length,
+		   bool eot);
+
+void brw_svb_write(struct brw_compile *p,
+                   struct brw_reg dest,
+                   unsigned msg_reg_nr,
+                   struct brw_reg src0,
+                   unsigned binding_table_index,
+                   bool   send_commit_msg);
+
+void brw_fb_WRITE(struct brw_compile *p,
+		  int dispatch_width,
+		   unsigned msg_reg_nr,
+		   struct brw_reg src0,
+		   unsigned msg_control,
+		   unsigned binding_table_index,
+		   unsigned msg_length,
+		   unsigned response_length,
+		   bool eot,
+		   bool header_present);
+
+void brw_SAMPLE(struct brw_compile *p,
+		struct brw_reg dest,
+		unsigned msg_reg_nr,
+		struct brw_reg src0,
+		unsigned binding_table_index,
+		unsigned sampler,
+		unsigned msg_type,
+		unsigned response_length,
+		unsigned msg_length,
+		unsigned header_present,
+		unsigned simd_mode,
+		unsigned return_format);
+
+void brw_math( struct brw_compile *p,
+	       struct brw_reg dest,
+	       unsigned function,
+	       unsigned msg_reg_nr,
+	       struct brw_reg src,
+	       unsigned data_type,
+	       unsigned precision );
+
+void brw_math2(struct brw_compile *p,
+	       struct brw_reg dest,
+	       unsigned function,
+	       struct brw_reg src0,
+	       struct brw_reg src1);
+
+void brw_oword_block_read(struct brw_compile *p,
+			  struct brw_reg dest,
+			  struct brw_reg mrf,
+			  uint32_t offset,
+			  uint32_t bind_table_index);
+
+void brw_oword_block_read_scratch(struct brw_compile *p,
+				  struct brw_reg dest,
+				  struct brw_reg mrf,
+				  int num_regs,
+				  unsigned offset);
+
+void brw_oword_block_write_scratch(struct brw_compile *p,
+				   struct brw_reg mrf,
+				   int num_regs,
+				   unsigned offset);
+
+void gen7_block_read_scratch(struct brw_compile *p,
+                             struct brw_reg dest,
+                             int num_regs,
+                             unsigned offset);
+
+void brw_shader_time_add(struct brw_compile *p,
+                         struct brw_reg payload,
+                         uint32_t surf_index);
+
+/* If/else/endif.  Works by manipulating the execution flags on each
+ * channel.
+ */
+struct brw_instruction *brw_IF(struct brw_compile *p,
+			       unsigned execute_size);
+struct brw_instruction *gen6_IF(struct brw_compile *p, uint32_t conditional,
+				struct brw_reg src0, struct brw_reg src1);
+
+void brw_ELSE(struct brw_compile *p);
+void brw_ENDIF(struct brw_compile *p);
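+
+/* Minimal usage sketch (assuming an initialized struct brw_compile *p):
+ *
+ *    brw_IF(p, BRW_EXECUTE_8);
+ *       ... instructions for channels that passed the condition ...
+ *    brw_ELSE(p);
+ *       ... instructions for the remaining channels ...
+ *    brw_ENDIF(p);
+ */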
+
+/* DO/WHILE loops:
+ */
+struct brw_instruction *brw_DO(struct brw_compile *p,
+			       unsigned execute_size);
+
+struct brw_instruction *brw_WHILE(struct brw_compile *p);
+
+struct brw_instruction *brw_BREAK(struct brw_compile *p);
+struct brw_instruction *brw_CONT(struct brw_compile *p);
+struct brw_instruction *gen6_CONT(struct brw_compile *p);
+struct brw_instruction *gen6_HALT(struct brw_compile *p);
+/* Forward jumps:
+ */
+void brw_land_fwd_jump(struct brw_compile *p, int jmp_insn_idx);
+
+
+
+void brw_NOP(struct brw_compile *p);
+
+void brw_WAIT(struct brw_compile *p);
+
+/* Special case: there is never a destination; the execution size will be
+ * taken from src0:
+ */
+void brw_CMP(struct brw_compile *p,
+	     struct brw_reg dest,
+	     unsigned conditional,
+	     struct brw_reg src0,
+	     struct brw_reg src1);
+
+void
+brw_untyped_atomic(struct brw_compile *p,
+                   struct brw_reg dest,
+                   struct brw_reg mrf,
+                   unsigned atomic_op,
+                   unsigned bind_table_index,
+                   unsigned msg_length,
+                   unsigned response_length);
+
+void
+brw_untyped_surface_read(struct brw_compile *p,
+                         struct brw_reg dest,
+                         struct brw_reg mrf,
+                         unsigned bind_table_index,
+                         unsigned msg_length,
+                         unsigned response_length);
+
+void
+brw_scattered_write(struct brw_compile *p,
+                    struct brw_reg dest,
+                    struct brw_reg mrf,
+                    unsigned bind_table_index,
+                    unsigned msg_length,
+                    bool header_present,
+                    bool in_dwords);
+
+void
+brw_scattered_read(struct brw_compile *p,
+                   struct brw_reg dest,
+                   struct brw_reg mrf,
+                   unsigned bind_table_index,
+                   unsigned msg_length,
+                   bool header_present,
+                   bool in_dwords);
+
+/***********************************************************************
+ * brw_eu_util.c:
+ */
+
+void brw_copy_indirect_to_indirect(struct brw_compile *p,
+				   struct brw_indirect dst_ptr,
+				   struct brw_indirect src_ptr,
+				   unsigned count);
+
+void brw_copy_from_indirect(struct brw_compile *p,
+			    struct brw_reg dst,
+			    struct brw_indirect ptr,
+			    unsigned count);
+
+void brw_copy4(struct brw_compile *p,
+	       struct brw_reg dst,
+	       struct brw_reg src,
+	       unsigned count);
+
+void brw_copy8(struct brw_compile *p,
+	       struct brw_reg dst,
+	       struct brw_reg src,
+	       unsigned count);
+
+void brw_math_invert( struct brw_compile *p,
+		      struct brw_reg dst,
+		      struct brw_reg src);
+
+void brw_set_src1(struct brw_compile *p,
+		  struct brw_instruction *insn,
+		  struct brw_reg reg);
+
+void brw_set_uip_jip(struct brw_compile *p);
+
+uint32_t brw_swap_cmod(uint32_t cmod);
+
+/* brw_eu_compact.c */
+void brw_init_compaction_tables(struct brw_context *brw);
+void brw_compact_instructions(struct brw_compile *p);
+void brw_uncompact_instruction(struct brw_context *brw,
+			       struct brw_instruction *dst,
+			       struct brw_compact_instruction *src);
+bool brw_try_compact_instruction(struct brw_compile *p,
+                                 struct brw_compact_instruction *dst,
+                                 struct brw_instruction *src);
+
+void brw_debug_compact_uncompact(struct brw_context *brw,
+				 struct brw_instruction *orig,
+				 struct brw_instruction *uncompacted);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/icd/intel/compiler/pipeline/brw_eu_compact.c b/icd/intel/compiler/pipeline/brw_eu_compact.c
new file mode 100644
index 0000000..eb7da9c
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_eu_compact.c
@@ -0,0 +1,807 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/** @file brw_eu_compact.c
+ *
+ * Instruction compaction is a feature of gm45 and newer hardware that allows
+ * for a smaller instruction encoding.
+ *
+ * The instruction cache is on the order of 32KB, and many programs generate
+ * far more instructions than that.  The instruction cache is built to barely
+ * keep up with instruction dispatch ability in cache hit cases -- L1
+ * instruction cache misses that still hit in the next level could limit
+ * throughput by around 50%.
+ *
+ * The idea of instruction compaction is that most instructions use a tiny
+ * subset of the GPU functionality, so we can encode what would be a 16-byte
+ * instruction in 8 bytes using some lookup tables for various fields.
+ */
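+
+/* A worked illustration (numbers hypothetical): a program of 4096
+ * instructions occupies 64KB uncompacted.  If half of those instructions
+ * compact from 16 bytes down to 8, the program shrinks to 48KB, easing
+ * pressure on an instruction cache on the order of 32KB.
+ */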
+
+#include "brw_context.h"
+#include "brw_eu.h"
+
+#include "icd-utils.h" // LunarG : ADD
+
+static const uint32_t gen6_control_index_table[32] = {
+   0b00000000000000000,
+   0b01000000000000000,
+   0b00110000000000000,
+   0b00000000100000000,
+   0b00010000000000000,
+   0b00001000100000000,
+   0b00000000100000010,
+   0b00000000000000010,
+   0b01000000100000000,
+   0b01010000000000000,
+   0b10110000000000000,
+   0b00100000000000000,
+   0b11010000000000000,
+   0b11000000000000000,
+   0b01001000100000000,
+   0b01000000000001000,
+   0b01000000000000100,
+   0b00000000000001000,
+   0b00000000000000100,
+   0b00111000100000000,
+   0b00001000100000010,
+   0b00110000100000000,
+   0b00110000000000001,
+   0b00100000000000001,
+   0b00110000000000010,
+   0b00110000000000101,
+   0b00110000000001001,
+   0b00110000000010000,
+   0b00110000000000011,
+   0b00110000000000100,
+   0b00110000100001000,
+   0b00100000000001001
+};
+
+static const uint32_t gen6_datatype_table[32] = {
+   0b001001110000000000,
+   0b001000110000100000,
+   0b001001110000000001,
+   0b001000000001100000,
+   0b001010110100101001,
+   0b001000000110101101,
+   0b001100011000101100,
+   0b001011110110101101,
+   0b001000000111101100,
+   0b001000000001100001,
+   0b001000110010100101,
+   0b001000000001000001,
+   0b001000001000110001,
+   0b001000001000101001,
+   0b001000000000100000,
+   0b001000001000110010,
+   0b001010010100101001,
+   0b001011010010100101,
+   0b001000000110100101,
+   0b001100011000101001,
+   0b001011011000101100,
+   0b001011010110100101,
+   0b001011110110100101,
+   0b001111011110111101,
+   0b001111011110111100,
+   0b001111011110111101,
+   0b001111011110011101,
+   0b001111011110111110,
+   0b001000000000100001,
+   0b001000000000100010,
+   0b001001111111011101,
+   0b001000001110111110,
+};
+
+static const uint16_t gen6_subreg_table[32] = {
+   0b000000000000000,
+   0b000000000000100,
+   0b000000110000000,
+   0b111000000000000,
+   0b011110000001000,
+   0b000010000000000,
+   0b000000000010000,
+   0b000110000001100,
+   0b001000000000000,
+   0b000001000000000,
+   0b000001010010100,
+   0b000000001010110,
+   0b010000000000000,
+   0b110000000000000,
+   0b000100000000000,
+   0b000000010000000,
+   0b000000000001000,
+   0b100000000000000,
+   0b000001010000000,
+   0b001010000000000,
+   0b001100000000000,
+   0b000000001010100,
+   0b101101010010100,
+   0b010100000000000,
+   0b000000010001111,
+   0b011000000000000,
+   0b111110000000000,
+   0b101000000000000,
+   0b000000000001111,
+   0b000100010001111,
+   0b001000010001111,
+   0b000110000000000,
+};
+
+static const uint16_t gen6_src_index_table[32] = {
+   0b000000000000,
+   0b010110001000,
+   0b010001101000,
+   0b001000101000,
+   0b011010010000,
+   0b000100100000,
+   0b010001101100,
+   0b010101110000,
+   0b011001111000,
+   0b001100101000,
+   0b010110001100,
+   0b001000100000,
+   0b010110001010,
+   0b000000000010,
+   0b010101010000,
+   0b010101101000,
+   0b111101001100,
+   0b111100101100,
+   0b011001110000,
+   0b010110001001,
+   0b010101011000,
+   0b001101001000,
+   0b010000101100,
+   0b010000000000,
+   0b001101110000,
+   0b001100010000,
+   0b001100000000,
+   0b010001101010,
+   0b001101111000,
+   0b000001110000,
+   0b001100100000,
+   0b001101010000,
+};
+
+static const uint32_t gen7_control_index_table[32] = {
+   0b0000000000000000010,
+   0b0000100000000000000,
+   0b0000100000000000001,
+   0b0000100000000000010,
+   0b0000100000000000011,
+   0b0000100000000000100,
+   0b0000100000000000101,
+   0b0000100000000000111,
+   0b0000100000000001000,
+   0b0000100000000001001,
+   0b0000100000000001101,
+   0b0000110000000000000,
+   0b0000110000000000001,
+   0b0000110000000000010,
+   0b0000110000000000011,
+   0b0000110000000000100,
+   0b0000110000000000101,
+   0b0000110000000000111,
+   0b0000110000000001001,
+   0b0000110000000001101,
+   0b0000110000000010000,
+   0b0000110000100000000,
+   0b0001000000000000000,
+   0b0001000000000000010,
+   0b0001000000000000100,
+   0b0001000000100000000,
+   0b0010110000000000000,
+   0b0010110000000010000,
+   0b0011000000000000000,
+   0b0011000000100000000,
+   0b0101000000000000000,
+   0b0101000000100000000
+};
+
+static const uint32_t gen7_datatype_table[32] = {
+   0b001000000000000001,
+   0b001000000000100000,
+   0b001000000000100001,
+   0b001000000001100001,
+   0b001000000010111101,
+   0b001000001011111101,
+   0b001000001110100001,
+   0b001000001110100101,
+   0b001000001110111101,
+   0b001000010000100001,
+   0b001000110000100000,
+   0b001000110000100001,
+   0b001001010010100101,
+   0b001001110010100100,
+   0b001001110010100101,
+   0b001111001110111101,
+   0b001111011110011101,
+   0b001111011110111100,
+   0b001111011110111101,
+   0b001111111110111100,
+   0b000000001000001100,
+   0b001000000000111101,
+   0b001000000010100101,
+   0b001000010000100000,
+   0b001001010010100100,
+   0b001001110010000100,
+   0b001010010100001001,
+   0b001101111110111101,
+   0b001111111110111101,
+   0b001011110110101100,
+   0b001010010100101000,
+   0b001010110100101000
+};
+
+static const uint16_t gen7_subreg_table[32] = {
+   0b000000000000000,
+   0b000000000000001,
+   0b000000000001000,
+   0b000000000001111,
+   0b000000000010000,
+   0b000000010000000,
+   0b000000100000000,
+   0b000000110000000,
+   0b000001000000000,
+   0b000001000010000,
+   0b000010100000000,
+   0b001000000000000,
+   0b001000000000001,
+   0b001000010000001,
+   0b001000010000010,
+   0b001000010000011,
+   0b001000010000100,
+   0b001000010000111,
+   0b001000010001000,
+   0b001000010001110,
+   0b001000010001111,
+   0b001000110000000,
+   0b001000111101000,
+   0b010000000000000,
+   0b010000110000000,
+   0b011000000000000,
+   0b011110010000111,
+   0b100000000000000,
+   0b101000000000000,
+   0b110000000000000,
+   0b111000000000000,
+   0b111000000011100
+};
+
+static const uint16_t gen7_src_index_table[32] = {
+   0b000000000000,
+   0b000000000010,
+   0b000000010000,
+   0b000000010010,
+   0b000000011000,
+   0b000000100000,
+   0b000000101000,
+   0b000001001000,
+   0b000001010000,
+   0b000001110000,
+   0b000001111000,
+   0b001100000000,
+   0b001100000010,
+   0b001100001000,
+   0b001100010000,
+   0b001100010010,
+   0b001100100000,
+   0b001100101000,
+   0b001100111000,
+   0b001101000000,
+   0b001101000010,
+   0b001101001000,
+   0b001101010000,
+   0b001101100000,
+   0b001101101000,
+   0b001101110000,
+   0b001101110001,
+   0b001101111000,
+   0b010001101000,
+   0b010001101001,
+   0b010001101010,
+   0b010110001000
+};
+
+static const uint32_t *control_index_table;
+static const uint32_t *datatype_table;
+static const uint16_t *subreg_table;
+static const uint16_t *src_index_table;
+
+static bool
+set_control_index(struct brw_context *brw,
+                  struct brw_compact_instruction *dst,
+                  struct brw_instruction *src)
+{
+   uint32_t *src_u32 = (uint32_t *)src;
+   uint32_t uncompacted = 0;
+
+   uncompacted |= ((src_u32[0] >> 8) & 0xffff) << 0;
+   uncompacted |= ((src_u32[0] >> 31) & 0x1) << 16;
+   /* On gen7, the flag register number gets integrated into the control
+    * index.
+    */
+   if (brw->gen >= 7)
+      uncompacted |= ((src_u32[2] >> 25) & 0x3) << 17;
+
+   for (int i = 0; i < 32; i++) {
+      if (control_index_table[i] == uncompacted) {
+	 dst->dw0.control_index = i;
+	 return true;
+      }
+   }
+
+   return false;
+}
+
+static bool
+set_datatype_index(struct brw_compact_instruction *dst,
+                   struct brw_instruction *src)
+{
+   uint32_t uncompacted = 0;
+
+   uncompacted |= src->bits1.ud & 0x7fff;
+   uncompacted |= (src->bits1.ud >> 29) << 15;
+
+   for (int i = 0; i < 32; i++) {
+      if (datatype_table[i] == uncompacted) {
+	 dst->dw0.data_type_index = i;
+	 return true;
+      }
+   }
+
+   return false;
+}
+
+static bool
+set_subreg_index(struct brw_compact_instruction *dst,
+                 struct brw_instruction *src)
+{
+   uint16_t uncompacted = 0;
+
+   uncompacted |= src->bits1.da1.dest_subreg_nr << 0;
+   uncompacted |= src->bits2.da1.src0_subreg_nr << 5;
+   uncompacted |= src->bits3.da1.src1_subreg_nr << 10;
+
+   for (int i = 0; i < 32; i++) {
+      if (subreg_table[i] == uncompacted) {
+	 dst->dw0.sub_reg_index = i;
+	 return true;
+      }
+   }
+
+   return false;
+}
+
+static bool
+get_src_index(uint16_t uncompacted,
+              uint16_t *compacted)
+{
+   for (int i = 0; i < 32; i++) {
+      if (src_index_table[i] == uncompacted) {
+	 *compacted = i;
+	 return true;
+      }
+   }
+
+   return false;
+}
+
+static bool
+set_src0_index(struct brw_compact_instruction *dst,
+               struct brw_instruction *src)
+{
+   uint16_t compacted, uncompacted = 0;
+
+   uncompacted |= (src->bits2.ud >> 13) & 0xfff;
+
+   if (!get_src_index(uncompacted, &compacted))
+      return false;
+
+   dst->dw0.src0_index = compacted & 0x3;
+   dst->dw1.src0_index = compacted >> 2;
+
+   return true;
+}
+
+static bool
+set_src1_index(struct brw_compact_instruction *dst,
+               struct brw_instruction *src)
+{
+   uint16_t compacted, uncompacted = 0;
+
+   uncompacted |= (src->bits3.ud >> 13) & 0xfff;
+
+   if (!get_src_index(uncompacted, &compacted))
+      return false;
+
+   dst->dw1.src1_index = compacted;
+
+   return true;
+}
+
+/**
+ * Tries to compact instruction src into dst.
+ *
+ * It doesn't modify dst unless src is compactable, which is relied on by
+ * brw_compact_instructions().
+ */
+bool
+brw_try_compact_instruction(struct brw_compile *p,
+                            struct brw_compact_instruction *dst,
+                            struct brw_instruction *src)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_compact_instruction temp;
+
+   if (src->header.opcode == BRW_OPCODE_IF ||
+       src->header.opcode == BRW_OPCODE_ELSE ||
+       src->header.opcode == BRW_OPCODE_ENDIF ||
+       src->header.opcode == BRW_OPCODE_HALT ||
+       src->header.opcode == BRW_OPCODE_DO ||
+       src->header.opcode == BRW_OPCODE_WHILE) {
+      /* FINISHME: The fixup code below, and brw_set_uip_jip and friends, need
+       * to be able to handle compacted flow control instructions.
+       */
+      return false;
+   }
+
+   /* FINISHME: immediates */
+   if (src->bits1.da1.src0_reg_file == BRW_IMMEDIATE_VALUE ||
+       src->bits1.da1.src1_reg_file == BRW_IMMEDIATE_VALUE)
+      return false;
+
+   memset(&temp, 0, sizeof(temp));
+
+   temp.dw0.opcode = src->header.opcode;
+   temp.dw0.debug_control = src->header.debug_control;
+   if (!set_control_index(brw, &temp, src))
+      return false;
+   if (!set_datatype_index(&temp, src))
+      return false;
+   if (!set_subreg_index(&temp, src))
+      return false;
+   temp.dw0.acc_wr_control = src->header.acc_wr_control;
+   temp.dw0.conditionalmod = src->header.destreg__conditionalmod;
+   if (brw->gen <= 6)
+      temp.dw0.flag_subreg_nr = src->bits2.da1.flag_subreg_nr;
+   temp.dw0.cmpt_ctrl = 1;
+   if (!set_src0_index(&temp, src))
+      return false;
+   if (!set_src1_index(&temp, src))
+      return false;
+   temp.dw1.dst_reg_nr = src->bits1.da1.dest_reg_nr;
+   temp.dw1.src0_reg_nr = src->bits2.da1.src0_reg_nr;
+   temp.dw1.src1_reg_nr = src->bits3.da1.src1_reg_nr;
+
+   *dst = temp;
+
+   return true;
+}
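+
+/* Usage sketch: callers compact in place and only advance by 8 bytes on
+ * success; because dst is untouched on failure, the full 16-byte encoding
+ * can simply be kept:
+ *
+ *    if (!src->header.cmpt_control &&
+ *        brw_try_compact_instruction(p, dst, src)) {
+ *       ... instruction now occupies 8 bytes at dst ...
+ *    } else {
+ *       ... keep (or memmove) the 16-byte encoding ...
+ *    }
+ */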
+
+static void
+set_uncompacted_control(struct brw_context *brw,
+                        struct brw_instruction *dst,
+                        struct brw_compact_instruction *src)
+{
+   uint32_t *dst_u32 = (uint32_t *)dst;
+   uint32_t uncompacted = control_index_table[src->dw0.control_index];
+
+   dst_u32[0] |= ((uncompacted >> 0) & 0xffff) << 8;
+   dst_u32[0] |= ((uncompacted >> 16) & 0x1) << 31;
+
+   if (brw->gen >= 7)
+      dst_u32[2] |= ((uncompacted >> 17) & 0x3) << 25;
+}
+
+static void
+set_uncompacted_datatype(struct brw_instruction *dst,
+                         struct brw_compact_instruction *src)
+{
+   uint32_t uncompacted = datatype_table[src->dw0.data_type_index];
+
+   dst->bits1.ud &= ~(0x7 << 29);
+   dst->bits1.ud |= ((uncompacted >> 15) & 0x7) << 29;
+   dst->bits1.ud &= ~0x7fff;
+   dst->bits1.ud |= uncompacted & 0x7fff;
+}
+
+static void
+set_uncompacted_subreg(struct brw_instruction *dst,
+                       struct brw_compact_instruction *src)
+{
+   uint16_t uncompacted = subreg_table[src->dw0.sub_reg_index];
+
+   dst->bits1.da1.dest_subreg_nr = (uncompacted >> 0)  & 0x1f;
+   dst->bits2.da1.src0_subreg_nr = (uncompacted >> 5)  & 0x1f;
+   dst->bits3.da1.src1_subreg_nr = (uncompacted >> 10) & 0x1f;
+}
+
+static void
+set_uncompacted_src0(struct brw_instruction *dst,
+                     struct brw_compact_instruction *src)
+{
+   uint32_t compacted = src->dw0.src0_index | src->dw1.src0_index << 2;
+   uint16_t uncompacted = src_index_table[compacted];
+
+   dst->bits2.ud |= uncompacted << 13;
+}
+
+static void
+set_uncompacted_src1(struct brw_instruction *dst,
+                     struct brw_compact_instruction *src)
+{
+   uint16_t uncompacted = src_index_table[src->dw1.src1_index];
+
+   dst->bits3.ud |= uncompacted << 13;
+}
+
+void
+brw_uncompact_instruction(struct brw_context *brw,
+                          struct brw_instruction *dst,
+                          struct brw_compact_instruction *src)
+{
+   memset(dst, 0, sizeof(*dst));
+
+   dst->header.opcode = src->dw0.opcode;
+   dst->header.debug_control = src->dw0.debug_control;
+
+   set_uncompacted_control(brw, dst, src);
+   set_uncompacted_datatype(dst, src);
+   set_uncompacted_subreg(dst, src);
+   dst->header.acc_wr_control = src->dw0.acc_wr_control;
+   dst->header.destreg__conditionalmod = src->dw0.conditionalmod;
+   if (brw->gen <= 6)
+      dst->bits2.da1.flag_subreg_nr = src->dw0.flag_subreg_nr;
+   set_uncompacted_src0(dst, src);
+   set_uncompacted_src1(dst, src);
+   dst->bits1.da1.dest_reg_nr = src->dw1.dst_reg_nr;
+   dst->bits2.da1.src0_reg_nr = src->dw1.src0_reg_nr;
+   dst->bits3.da1.src1_reg_nr = src->dw1.src1_reg_nr;
+}
+
+void brw_debug_compact_uncompact(struct brw_context *brw,
+                                 struct brw_instruction *orig,
+                                 struct brw_instruction *uncompacted)
+{
+   fprintf(stderr, "Instruction compact/uncompact changed (gen%d):\n",
+           brw->gen);
+
+   fprintf(stderr, "  before: ");
+   brw_disasm(stderr, orig, brw->gen);
+
+   fprintf(stderr, "  after:  ");
+   brw_disasm(stderr, uncompacted, brw->gen);
+
+   uint32_t *before_bits = (uint32_t *)orig;
+   uint32_t *after_bits = (uint32_t *)uncompacted;
+   fprintf(stderr, "  changed bits:\n");
+   for (int i = 0; i < 128; i++) {
+      uint32_t before = before_bits[i / 32] & (1 << (i & 31));
+      uint32_t after = after_bits[i / 32] & (1 << (i & 31));
+
+      if (before != after) {
+         fprintf(stderr, "  bit %d, %s to %s\n", i,
+                 before ? "set" : "unset",
+                 after ? "set" : "unset");
+      }
+   }
+}
+
+static int
+compacted_between(int old_ip, int old_target_ip, int *compacted_counts)
+{
+   int this_compacted_count = compacted_counts[old_ip];
+   int target_compacted_count = compacted_counts[old_target_ip];
+   return target_compacted_count - this_compacted_count;
+}
+
+static void
+update_uip_jip(struct brw_instruction *insn, int this_old_ip,
+               int *compacted_counts)
+{
+   int target_old_ip;
+
+   target_old_ip = this_old_ip + insn->bits3.break_cont.jip;
+   insn->bits3.break_cont.jip -= compacted_between(this_old_ip,
+                                                   target_old_ip,
+                                                   compacted_counts);
+
+   target_old_ip = this_old_ip + insn->bits3.break_cont.uip;
+   insn->bits3.break_cont.uip -= compacted_between(this_old_ip,
+                                                   target_old_ip,
+                                                   compacted_counts);
+}
+
+void
+brw_init_compaction_tables(struct brw_context *brw)
+{
+   assert(gen6_control_index_table[ARRAY_SIZE(gen6_control_index_table) - 1] != 0);
+   assert(gen6_datatype_table[ARRAY_SIZE(gen6_datatype_table) - 1] != 0);
+   assert(gen6_subreg_table[ARRAY_SIZE(gen6_subreg_table) - 1] != 0);
+   assert(gen6_src_index_table[ARRAY_SIZE(gen6_src_index_table) - 1] != 0);
+   assert(gen7_control_index_table[ARRAY_SIZE(gen7_control_index_table) - 1] != 0);
+   assert(gen7_datatype_table[ARRAY_SIZE(gen7_datatype_table) - 1] != 0);
+   assert(gen7_subreg_table[ARRAY_SIZE(gen7_subreg_table) - 1] != 0);
+   assert(gen7_src_index_table[ARRAY_SIZE(gen7_src_index_table) - 1] != 0);
+
+   switch (brw->gen) {
+   case 7:
+      control_index_table = gen7_control_index_table;
+      datatype_table = gen7_datatype_table;
+      subreg_table = gen7_subreg_table;
+      src_index_table = gen7_src_index_table;
+      break;
+   case 6:
+      control_index_table = gen6_control_index_table;
+      datatype_table = gen6_datatype_table;
+      subreg_table = gen6_subreg_table;
+      src_index_table = gen6_src_index_table;
+      break;
+   default:
+      return;
+   }
+}
+
+void
+brw_compact_instructions(struct brw_compile *p)
+{
+   struct brw_context *brw = p->brw;
+   void *store = p->store;
+   /* For an instruction at byte offset 8*i before compaction, this is the number
+    * of compacted instructions that preceded it.
+    */
+   int compacted_counts[p->next_insn_offset / 8];
+   /* For an instruction at byte offset 8*i after compaction, this is the
+    * 8-byte offset it was at before compaction.
+    */
+   int old_ip[p->next_insn_offset / 8];
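+   /* Worked example (illustrative): if the first two instructions compact
+    * and the third does not, the third moves from old byte offset 32 to new
+    * byte offset 16, so compacted_counts[32 / 8] == 2 and
+    * old_ip[16 / 8] == 32 / 8.
+    */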
+
+   if (brw->gen < 6)
+      return;
+
+   int src_offset;
+   int offset = 0;
+   int compacted_count = 0;
+   for (src_offset = 0; src_offset < p->nr_insn * 16;) {
+      struct brw_instruction *src = store + src_offset;
+      void *dst = store + offset;
+
+      old_ip[offset / 8] = src_offset / 8;
+      compacted_counts[src_offset / 8] = compacted_count;
+
+      struct brw_instruction saved = *src;
+
+      if (!src->header.cmpt_control &&
+          brw_try_compact_instruction(p, dst, src)) {
+         compacted_count++;
+
+         if (INTEL_DEBUG) {
+            struct brw_instruction uncompacted;
+            brw_uncompact_instruction(brw, &uncompacted, dst);
+            if (memcmp(&saved, &uncompacted, sizeof(uncompacted))) {
+               brw_debug_compact_uncompact(brw, &saved, &uncompacted);
+            }
+         }
+
+         offset += 8;
+         src_offset += 16;
+      } else {
+         int size = src->header.cmpt_control ? 8 : 16;
+
+         /* It appears that the end of thread SEND instruction needs to be
+          * aligned, or the GPU hangs.
+          */
+         if ((src->header.opcode == BRW_OPCODE_SEND ||
+              src->header.opcode == BRW_OPCODE_SENDC) &&
+             src->bits3.generic.end_of_thread &&
+             (offset & 8) != 0) {
+            struct brw_compact_instruction *align = store + offset;
+            memset(align, 0, sizeof(*align));
+            align->dw0.opcode = BRW_OPCODE_NOP;
+            align->dw0.cmpt_ctrl = 1;
+            offset += 8;
+            old_ip[offset / 8] = src_offset / 8;
+            dst = store + offset;
+         }
+
+         /* If we didn't compact this instruction, we need to move it down into
+          * place.
+          */
+         if (offset != src_offset) {
+            memmove(dst, src, size);
+         }
+         offset += size;
+         src_offset += size;
+      }
+   }
+
+   /* Fix up control flow offsets. */
+   p->next_insn_offset = offset;
+   for (offset = 0; offset < p->next_insn_offset;) {
+      struct brw_instruction *insn = store + offset;
+      int this_old_ip = old_ip[offset / 8];
+      int this_compacted_count = compacted_counts[this_old_ip];
+      int target_old_ip, target_compacted_count;
+
+      switch (insn->header.opcode) {
+      case BRW_OPCODE_BREAK:
+      case BRW_OPCODE_CONTINUE:
+      case BRW_OPCODE_HALT:
+         update_uip_jip(insn, this_old_ip, compacted_counts);
+         break;
+
+      case BRW_OPCODE_IF:
+      case BRW_OPCODE_ELSE:
+      case BRW_OPCODE_ENDIF:
+      case BRW_OPCODE_WHILE:
+         if (brw->gen == 6) {
+            target_old_ip = this_old_ip + insn->bits1.branch_gen6.jump_count;
+            target_compacted_count = compacted_counts[target_old_ip];
+            insn->bits1.branch_gen6.jump_count -= (target_compacted_count -
+                                                   this_compacted_count);
+         } else {
+            update_uip_jip(insn, this_old_ip, compacted_counts);
+         }
+         break;
+      }
+
+      if (insn->header.cmpt_control) {
+         offset += 8;
+      } else {
+         offset += 16;
+      }
+   }
+
+   /* p->nr_insn still counts in units of uncompacted (16-byte) instructions,
+    * so divide.  We do want to be sure there's a valid instruction in any
+    * alignment padding, so that the next compression pass (for the FS 8/16
+    * compile passes) parses correctly.
+    */
+   if (p->next_insn_offset & 8) {
+      struct brw_compact_instruction *align = store + offset;
+      memset(align, 0, sizeof(*align));
+      align->dw0.opcode = BRW_OPCODE_NOP;
+      align->dw0.cmpt_ctrl = 1;
+      p->next_insn_offset += 8;
+   }
+   p->nr_insn = p->next_insn_offset / 16;
+
+   if (0) {
+      fprintf(stderr, "dumping compacted program\n");
+      brw_dump_compile(p, stderr, 0, p->next_insn_offset);
+
+      int cmp = 0;
+      for (offset = 0; offset < p->next_insn_offset;) {
+         struct brw_instruction *insn = store + offset;
+
+         if (insn->header.cmpt_control) {
+            offset += 8;
+            cmp++;
+         } else {
+            offset += 16;
+         }
+      }
+      fprintf(stderr, "%db/%db saved (%d%%)\n", cmp * 8, offset + cmp * 8,
+              cmp * 8 * 100 / (offset + cmp * 8));
+   }
+}
diff --git a/icd/intel/compiler/pipeline/brw_eu_emit.c b/icd/intel/compiler/pipeline/brw_eu_emit.c
new file mode 100644
index 0000000..c9b54c5
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_eu_emit.c
@@ -0,0 +1,2853 @@
+/*
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to
+ develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+
+#include "brw_context.h"
+#include "brw_defines.h"
+#include "brw_eu.h"
+
+#include "glsl/ralloc.h"
+
+#include "icd-utils.h" // LunarG : ADD
+
+/***********************************************************************
+ * Internal helper for constructing instructions
+ */
+
+static void guess_execution_size(struct brw_compile *p,
+				 struct brw_instruction *insn,
+				 struct brw_reg reg)
+{
+   if (reg.width == BRW_WIDTH_8 && p->compressed)
+      insn->header.execution_size = BRW_EXECUTE_16;
+   else
+      insn->header.execution_size = reg.width;	/* note - definitions are compatible */
+}
+
+
+/**
+ * Prior to Sandybridge, the SEND instruction accepted non-MRF source
+ * registers, implicitly moving the operand to a message register.
+ *
+ * On Sandybridge, this is no longer the case.  This function performs the
+ * explicit move; it should be called before emitting a SEND instruction.
+ */
+void
+gen6_resolve_implied_move(struct brw_compile *p,
+			  struct brw_reg *src,
+			  unsigned msg_reg_nr)
+{
+   struct brw_context *brw = p->brw;
+   if (brw->gen < 6)
+      return;
+
+   if (src->file == BRW_MESSAGE_REGISTER_FILE)
+      return;
+
+   if (src->file != BRW_ARCHITECTURE_REGISTER_FILE || src->nr != BRW_ARF_NULL) {
+      brw_push_insn_state(p);
+      brw_set_mask_control(p, BRW_MASK_DISABLE);
+      brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+      brw_MOV(p, retype(brw_message_reg(msg_reg_nr), BRW_REGISTER_TYPE_UD),
+	      retype(*src, BRW_REGISTER_TYPE_UD));
+      brw_pop_insn_state(p);
+   }
+   *src = brw_message_reg(msg_reg_nr);
+}
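+
+/* Illustrative only (register numbers made up): on Gen6, a SEND whose
+ * payload was built in a GRF, say g4, first gets an unmasked,
+ * uncompressed copy along the lines of
+ *
+ *    mov(8)  m2<1>:UD  g4<8,8,1>:UD  { WE_all }
+ *
+ * and *src is rewritten to m2, so the SEND then reads its message from
+ * the message register file as the hardware requires.
+ */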
+
+static void
+gen7_convert_mrf_to_grf(struct brw_compile *p, struct brw_reg *reg)
+{
+   /* From the Ivybridge PRM, Volume 4 Part 3, page 218 ("send"):
+    * "The send with EOT should use register space R112-R127 for <src>. This is
+    *  to enable loading of a new thread into the same slot while the message
+    *  with EOT for current thread is pending dispatch."
+    *
+    * Since we're pretending to have 16 MRFs anyway, we may as well use the
+    * registers required for messages with EOT.
+    */
+   struct brw_context *brw = p->brw;
+   if (brw->gen == 7 && reg->file == BRW_MESSAGE_REGISTER_FILE) {
+      reg->file = BRW_GENERAL_REGISTER_FILE;
+      reg->nr += GEN7_MRF_HACK_START;
+   }
+}
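+
+/* Illustrative only: assuming GEN7_MRF_HACK_START is 112 (its usual
+ * value in brw_defines.h), a message register m4 on Ivybridge is
+ * rewritten to g116, which lands in the R112-R127 range the PRM quote
+ * above reserves for sends with EOT.
+ */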
+
+/**
+ * Convert a brw_reg_type enumeration value into the hardware representation.
+ *
+ * The hardware encoding may depend on whether the value is an immediate.
+ */
+unsigned
+brw_reg_type_to_hw_type(const struct brw_context *brw,
+                        enum brw_reg_type type, unsigned file)
+{
+   if (file == BRW_IMMEDIATE_VALUE) {
+      static const int imm_hw_types[] = {
+         [BRW_REGISTER_TYPE_UD] = BRW_HW_REG_TYPE_UD,
+         [BRW_REGISTER_TYPE_D]  = BRW_HW_REG_TYPE_D,
+         [BRW_REGISTER_TYPE_UW] = BRW_HW_REG_TYPE_UW,
+         [BRW_REGISTER_TYPE_W]  = BRW_HW_REG_TYPE_W,
+         [BRW_REGISTER_TYPE_F]  = BRW_HW_REG_TYPE_F,
+         [BRW_REGISTER_TYPE_UB] = -1,
+         [BRW_REGISTER_TYPE_B]  = -1,
+         [BRW_REGISTER_TYPE_UV] = BRW_HW_REG_IMM_TYPE_UV,
+         [BRW_REGISTER_TYPE_VF] = BRW_HW_REG_IMM_TYPE_VF,
+         [BRW_REGISTER_TYPE_V]  = BRW_HW_REG_IMM_TYPE_V,
+         [BRW_REGISTER_TYPE_DF] = GEN8_HW_REG_IMM_TYPE_DF,
+         [BRW_REGISTER_TYPE_HF] = GEN8_HW_REG_IMM_TYPE_HF,
+         [BRW_REGISTER_TYPE_UQ] = GEN8_HW_REG_TYPE_UQ,
+         [BRW_REGISTER_TYPE_Q]  = GEN8_HW_REG_TYPE_Q,
+      };
+      assert(type < ARRAY_SIZE(imm_hw_types));
+      assert(imm_hw_types[type] != -1);
+      assert(brw->gen >= 8 || type < BRW_REGISTER_TYPE_DF);
+      return imm_hw_types[type];
+   } else {
+      /* Non-immediate registers */
+      static const int hw_types[] = {
+         [BRW_REGISTER_TYPE_UD] = BRW_HW_REG_TYPE_UD,
+         [BRW_REGISTER_TYPE_D]  = BRW_HW_REG_TYPE_D,
+         [BRW_REGISTER_TYPE_UW] = BRW_HW_REG_TYPE_UW,
+         [BRW_REGISTER_TYPE_W]  = BRW_HW_REG_TYPE_W,
+         [BRW_REGISTER_TYPE_UB] = BRW_HW_REG_NON_IMM_TYPE_UB,
+         [BRW_REGISTER_TYPE_B]  = BRW_HW_REG_NON_IMM_TYPE_B,
+         [BRW_REGISTER_TYPE_F]  = BRW_HW_REG_TYPE_F,
+         [BRW_REGISTER_TYPE_UV] = -1,
+         [BRW_REGISTER_TYPE_VF] = -1,
+         [BRW_REGISTER_TYPE_V]  = -1,
+         [BRW_REGISTER_TYPE_DF] = GEN7_HW_REG_NON_IMM_TYPE_DF,
+         [BRW_REGISTER_TYPE_HF] = GEN8_HW_REG_NON_IMM_TYPE_HF,
+         [BRW_REGISTER_TYPE_UQ] = GEN8_HW_REG_TYPE_UQ,
+         [BRW_REGISTER_TYPE_Q]  = GEN8_HW_REG_TYPE_Q,
+      };
+      assert(type < ARRAY_SIZE(hw_types));
+      assert(hw_types[type] != -1);
+      assert(brw->gen >= 7 || type < BRW_REGISTER_TYPE_DF);
+      assert(brw->gen >= 8 || type < BRW_REGISTER_TYPE_HF);
+      return hw_types[type];
+   }
+}
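+
+/* Usage sketch (illustrative): encoding a float GRF source,
+ *
+ *    brw_reg_type_to_hw_type(brw, BRW_REGISTER_TYPE_F,
+ *                            BRW_GENERAL_REGISTER_FILE)
+ *
+ * returns BRW_HW_REG_TYPE_F, while requesting a UB immediate would trip
+ * the imm_hw_types[type] != -1 assertion above, since byte types have
+ * no immediate encoding.
+ */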
+
+void
+brw_set_dest(struct brw_compile *p, struct brw_instruction *insn,
+	     struct brw_reg dest)
+{
+   if (dest.file != BRW_ARCHITECTURE_REGISTER_FILE &&
+       dest.file != BRW_MESSAGE_REGISTER_FILE)
+      assert(dest.nr < 128);
+
+   gen7_convert_mrf_to_grf(p, &dest);
+
+   insn->bits1.da1.dest_reg_file = dest.file;
+   insn->bits1.da1.dest_reg_type =
+      brw_reg_type_to_hw_type(p->brw, dest.type, dest.file);
+   insn->bits1.da1.dest_address_mode = dest.address_mode;
+
+   if (dest.address_mode == BRW_ADDRESS_DIRECT) {
+      insn->bits1.da1.dest_reg_nr = dest.nr;
+
+      if (insn->header.access_mode == BRW_ALIGN_1) {
+	 insn->bits1.da1.dest_subreg_nr = dest.subnr;
+	 if (dest.hstride == BRW_HORIZONTAL_STRIDE_0)
+	    dest.hstride = BRW_HORIZONTAL_STRIDE_1;
+	 insn->bits1.da1.dest_horiz_stride = dest.hstride;
+      }
+      else {
+	 insn->bits1.da16.dest_subreg_nr = dest.subnr / 16;
+	 insn->bits1.da16.dest_writemask = dest.dw1.bits.writemask;
+         if (dest.file == BRW_GENERAL_REGISTER_FILE ||
+             dest.file == BRW_MESSAGE_REGISTER_FILE) {
+            assert(dest.dw1.bits.writemask != 0);
+         }
+	 /* From the Ivybridge PRM, Vol 4, Part 3, Section 5.2.4.1:
+	  *    Although Dst.HorzStride is a don't care for Align16, HW needs
+	  *    this to be programmed as "01".
+	  */
+	 insn->bits1.da16.dest_horiz_stride = 1;
+      }
+   }
+   else {
+      insn->bits1.ia1.dest_subreg_nr = dest.subnr;
+
+      /* These are different sizes in align1 vs align16:
+       */
+      if (insn->header.access_mode == BRW_ALIGN_1) {
+	 insn->bits1.ia1.dest_indirect_offset = dest.dw1.bits.indirect_offset;
+	 if (dest.hstride == BRW_HORIZONTAL_STRIDE_0)
+	    dest.hstride = BRW_HORIZONTAL_STRIDE_1;
+	 insn->bits1.ia1.dest_horiz_stride = dest.hstride;
+      }
+      else {
+	 insn->bits1.ia16.dest_indirect_offset = dest.dw1.bits.indirect_offset;
+	 /* even ignored in da16, still need to set as '01' */
+	 insn->bits1.ia16.dest_horiz_stride = 1;
+      }
+   }
+
+   /* NEW: Set the execution size based on dest.width and
+    * insn->compression_control:
+    */
+   guess_execution_size(p, insn, dest);
+}
+
+extern int reg_type_size[];
+
+static void
+validate_reg(struct brw_instruction *insn, struct brw_reg reg)
+{
+   int hstride_for_reg[] = {0, 1, 2, 4};
+   int vstride_for_reg[] = {0, 1, 2, 4, 8, 16, 32, 64, 128, 256};
+   int width_for_reg[] = {1, 2, 4, 8, 16};
+   int execsize_for_reg[] = {1, 2, 4, 8, 16};
+   int width, hstride, vstride, execsize;
+
+   if (reg.file == BRW_IMMEDIATE_VALUE) {
+      /* 3.3.6: Region Parameters.  Restriction: Immediate vectors
+       * mean the destination has to be 128-bit aligned and the
+       * destination horiz stride has to be a word.
+       */
+      if (reg.type == BRW_REGISTER_TYPE_V) {
+	 assert(hstride_for_reg[insn->bits1.da1.dest_horiz_stride] *
+		reg_type_size[insn->bits1.da1.dest_reg_type] == 2);
+      }
+
+      return;
+   }
+
+   if (reg.file == BRW_ARCHITECTURE_REGISTER_FILE &&
+       reg.nr == BRW_ARF_NULL)
+      return;
+
+   assert(reg.hstride >= 0 && reg.hstride < Elements(hstride_for_reg));
+   hstride = hstride_for_reg[reg.hstride];
+
+   if (reg.vstride == 0xf) {
+      vstride = -1;
+   } else {
+      assert(reg.vstride >= 0 && reg.vstride < Elements(vstride_for_reg));
+      vstride = vstride_for_reg[reg.vstride];
+   }
+
+   assert(reg.width >= 0 && reg.width < Elements(width_for_reg));
+   width = width_for_reg[reg.width];
+
+   assert(insn->header.execution_size >= 0 &&
+	  insn->header.execution_size < Elements(execsize_for_reg));
+   execsize = execsize_for_reg[insn->header.execution_size];
+
+   /* Restrictions from 3.3.10: Register Region Restrictions. */
+   /* 3. */
+   assert(execsize >= width);
+
+   /* 4. */
+   if (execsize == width && hstride != 0) {
+      assert(vstride == -1 || vstride == width * hstride);
+   }
+
+   /* 5. */
+   if (execsize == width && hstride == 0) {
+      /* no restriction on vstride. */
+   }
+
+   /* 6. */
+   if (width == 1) {
+      assert(hstride == 0);
+   }
+
+   /* 7. */
+   if (execsize == 1 && width == 1) {
+      assert(hstride == 0);
+      assert(vstride == 0);
+   }
+
+   /* 8. */
+   if (vstride == 0 && hstride == 0) {
+      assert(width == 1);
+   }
+
+   /* 10. Check destination issues. */
+}
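+
+/* Illustrative check against the rules above: a canonical <8;8,1>:F
+ * region under an execsize-8 instruction passes restriction 4, since
+ * vstride (8) == width (8) * hstride (1); a scalar <0;1,0> region
+ * passes via restrictions 6 and 8 instead.
+ */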
+
+void
+brw_set_src0(struct brw_compile *p, struct brw_instruction *insn,
+	     struct brw_reg reg)
+{
+   struct brw_context *brw = p->brw;
+
+   if (reg.file != BRW_ARCHITECTURE_REGISTER_FILE)
+      assert(reg.nr < 128);
+
+   gen7_convert_mrf_to_grf(p, &reg);
+
+   if (brw->gen >= 6 && (insn->header.opcode == BRW_OPCODE_SEND ||
+                           insn->header.opcode == BRW_OPCODE_SENDC)) {
+      /* Any source modifiers or regions will be ignored, since this just
+       * identifies the MRF/GRF to start reading the message contents from.
+       * Check for some likely failures.
+       */
+      assert(!reg.negate);
+      assert(!reg.abs);
+      assert(reg.address_mode == BRW_ADDRESS_DIRECT);
+   }
+
+   validate_reg(insn, reg);
+
+   insn->bits1.da1.src0_reg_file = reg.file;
+   insn->bits1.da1.src0_reg_type =
+      brw_reg_type_to_hw_type(brw, reg.type, reg.file);
+   insn->bits2.da1.src0_abs = reg.abs;
+   insn->bits2.da1.src0_negate = reg.negate;
+   insn->bits2.da1.src0_address_mode = reg.address_mode;
+
+   if (reg.file == BRW_IMMEDIATE_VALUE) {
+      insn->bits3.ud = reg.dw1.ud;
+
+      /* Required to set some fields in src1 as well:
+       */
+      insn->bits1.da1.src1_reg_file = 0; /* arf */
+      insn->bits1.da1.src1_reg_type = insn->bits1.da1.src0_reg_type;
+   }
+   else
+   {
+      if (reg.address_mode == BRW_ADDRESS_DIRECT) {
+	 if (insn->header.access_mode == BRW_ALIGN_1) {
+	    insn->bits2.da1.src0_subreg_nr = reg.subnr;
+	    insn->bits2.da1.src0_reg_nr = reg.nr;
+	 }
+	 else {
+	    insn->bits2.da16.src0_subreg_nr = reg.subnr / 16;
+	    insn->bits2.da16.src0_reg_nr = reg.nr;
+	 }
+      }
+      else {
+	 insn->bits2.ia1.src0_subreg_nr = reg.subnr;
+
+	 if (insn->header.access_mode == BRW_ALIGN_1) {
+	    insn->bits2.ia1.src0_indirect_offset = reg.dw1.bits.indirect_offset;
+	 }
+	 else {
+	    insn->bits2.ia16.src0_subreg_nr = reg.dw1.bits.indirect_offset;
+	 }
+      }
+
+      if (insn->header.access_mode == BRW_ALIGN_1) {
+	 if (reg.width == BRW_WIDTH_1 &&
+	     insn->header.execution_size == BRW_EXECUTE_1) {
+	    insn->bits2.da1.src0_horiz_stride = BRW_HORIZONTAL_STRIDE_0;
+	    insn->bits2.da1.src0_width = BRW_WIDTH_1;
+	    insn->bits2.da1.src0_vert_stride = BRW_VERTICAL_STRIDE_0;
+	 }
+	 else {
+	    insn->bits2.da1.src0_horiz_stride = reg.hstride;
+	    insn->bits2.da1.src0_width = reg.width;
+	    insn->bits2.da1.src0_vert_stride = reg.vstride;
+	 }
+      }
+      else {
+	 insn->bits2.da16.src0_swz_x = BRW_GET_SWZ(reg.dw1.bits.swizzle, BRW_CHANNEL_X);
+	 insn->bits2.da16.src0_swz_y = BRW_GET_SWZ(reg.dw1.bits.swizzle, BRW_CHANNEL_Y);
+	 insn->bits2.da16.src0_swz_z = BRW_GET_SWZ(reg.dw1.bits.swizzle, BRW_CHANNEL_Z);
+	 insn->bits2.da16.src0_swz_w = BRW_GET_SWZ(reg.dw1.bits.swizzle, BRW_CHANNEL_W);
+
+	 /* This is an oddity of the fact that we're using the same
+	  * register descriptions for align_16 as for align_1:
+	  */
+	 if (reg.vstride == BRW_VERTICAL_STRIDE_8)
+	    insn->bits2.da16.src0_vert_stride = BRW_VERTICAL_STRIDE_4;
+	 else
+	    insn->bits2.da16.src0_vert_stride = reg.vstride;
+      }
+   }
+}
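+
+/* Note (illustrative): the width==1 / execsize==1 special case above
+ * canonicalizes scalar sources to a <0;1,0> region no matter what
+ * strides the caller passed, which is the usual encoding for a
+ * replicated scalar operand.
+ */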
+
+
+void brw_set_src1(struct brw_compile *p,
+		  struct brw_instruction *insn,
+		  struct brw_reg reg)
+{
+   assert(reg.file != BRW_MESSAGE_REGISTER_FILE);
+
+   if (reg.file != BRW_ARCHITECTURE_REGISTER_FILE)
+      assert(reg.nr < 128);
+
+   gen7_convert_mrf_to_grf(p, &reg);
+
+   validate_reg(insn, reg);
+
+   insn->bits1.da1.src1_reg_file = reg.file;
+   insn->bits1.da1.src1_reg_type =
+      brw_reg_type_to_hw_type(p->brw, reg.type, reg.file);
+   insn->bits3.da1.src1_abs = reg.abs;
+   insn->bits3.da1.src1_negate = reg.negate;
+
+   /* Only src1 can be immediate in two-argument instructions.
+    */
+   assert(insn->bits1.da1.src0_reg_file != BRW_IMMEDIATE_VALUE);
+
+   if (reg.file == BRW_IMMEDIATE_VALUE) {
+      insn->bits3.ud = reg.dw1.ud;
+   }
+   else {
+      /* This is a hardware restriction, which may or may not be lifted
+       * in the future:
+       */
+      assert (reg.address_mode == BRW_ADDRESS_DIRECT);
+      /* assert (reg.file == BRW_GENERAL_REGISTER_FILE); */
+
+      if (insn->header.access_mode == BRW_ALIGN_1) {
+	 insn->bits3.da1.src1_subreg_nr = reg.subnr;
+	 insn->bits3.da1.src1_reg_nr = reg.nr;
+      }
+      else {
+	 insn->bits3.da16.src1_subreg_nr = reg.subnr / 16;
+	 insn->bits3.da16.src1_reg_nr = reg.nr;
+      }
+
+      if (insn->header.access_mode == BRW_ALIGN_1) {
+	 if (reg.width == BRW_WIDTH_1 &&
+	     insn->header.execution_size == BRW_EXECUTE_1) {
+	    insn->bits3.da1.src1_horiz_stride = BRW_HORIZONTAL_STRIDE_0;
+	    insn->bits3.da1.src1_width = BRW_WIDTH_1;
+	    insn->bits3.da1.src1_vert_stride = BRW_VERTICAL_STRIDE_0;
+	 }
+	 else {
+	    insn->bits3.da1.src1_horiz_stride = reg.hstride;
+	    insn->bits3.da1.src1_width = reg.width;
+	    insn->bits3.da1.src1_vert_stride = reg.vstride;
+	 }
+      }
+      else {
+	 insn->bits3.da16.src1_swz_x = BRW_GET_SWZ(reg.dw1.bits.swizzle, BRW_CHANNEL_X);
+	 insn->bits3.da16.src1_swz_y = BRW_GET_SWZ(reg.dw1.bits.swizzle, BRW_CHANNEL_Y);
+	 insn->bits3.da16.src1_swz_z = BRW_GET_SWZ(reg.dw1.bits.swizzle, BRW_CHANNEL_Z);
+	 insn->bits3.da16.src1_swz_w = BRW_GET_SWZ(reg.dw1.bits.swizzle, BRW_CHANNEL_W);
+
+	 /* This is an oddity of the fact that we're using the same
+	  * register descriptions for align_16 as for align_1:
+	  */
+	 if (reg.vstride == BRW_VERTICAL_STRIDE_8)
+	    insn->bits3.da16.src1_vert_stride = BRW_VERTICAL_STRIDE_4;
+	 else
+	    insn->bits3.da16.src1_vert_stride = reg.vstride;
+      }
+   }
+}
+
+/**
+ * Set the Message Descriptor and Extended Message Descriptor fields
+ * for SEND messages.
+ *
+ * \note This zeroes out the Function Control bits, so it must be called
+ *       \b before filling out any message-specific data.  Callers can
+ *       choose not to fill in irrelevant bits; they will be zero.
+ */
+static void
+brw_set_message_descriptor(struct brw_compile *p,
+			   struct brw_instruction *inst,
+			   enum brw_message_target sfid,
+			   unsigned msg_length,
+			   unsigned response_length,
+			   bool header_present,
+			   bool end_of_thread)
+{
+   struct brw_context *brw = p->brw;
+
+   brw_set_src1(p, inst, brw_imm_d(0));
+
+   if (brw->gen >= 5) {
+      inst->bits3.generic_gen5.header_present = header_present;
+      inst->bits3.generic_gen5.response_length = response_length;
+      inst->bits3.generic_gen5.msg_length = msg_length;
+      inst->bits3.generic_gen5.end_of_thread = end_of_thread;
+
+      if (brw->gen >= 6) {
+	 /* On Gen6+ Message target/SFID goes in bits 27:24 of the header */
+	 inst->header.destreg__conditionalmod = sfid;
+      } else {
+	 /* Set Extended Message Descriptor (ex_desc) */
+	 inst->bits2.send_gen5.sfid = sfid;
+	 inst->bits2.send_gen5.end_of_thread = end_of_thread;
+      }
+   } else {
+      inst->bits3.generic.response_length = response_length;
+      inst->bits3.generic.msg_length = msg_length;
+      inst->bits3.generic.msg_target = sfid;
+      inst->bits3.generic.end_of_thread = end_of_thread;
+   }
+}
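+
+/* Illustrative call with made-up parameters: a Gen6 URB write with a
+ * five-register message (header included) and no reply could be started
+ * with
+ *
+ *    brw_set_message_descriptor(p, insn, BRW_SFID_URB,
+ *                               5, 0, true, false);
+ *
+ * leaving only the URB-specific descriptor bits to fill in afterwards.
+ */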
+
+static void brw_set_math_message( struct brw_compile *p,
+				  struct brw_instruction *insn,
+				  unsigned function,
+				  unsigned integer_type,
+				  bool low_precision,
+				  unsigned dataType )
+{
+   struct brw_context *brw = p->brw;
+   unsigned msg_length;
+   unsigned response_length;
+
+   /* Infer message length from the function */
+   switch (function) {
+   case BRW_MATH_FUNCTION_POW:
+   case BRW_MATH_FUNCTION_INT_DIV_QUOTIENT:
+   case BRW_MATH_FUNCTION_INT_DIV_REMAINDER:
+   case BRW_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER:
+      msg_length = 2;
+      break;
+   default:
+      msg_length = 1;
+      break;
+   }
+
+   /* Infer response length from the function */
+   switch (function) {
+   case BRW_MATH_FUNCTION_SINCOS:
+   case BRW_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER:
+      response_length = 2;
+      break;
+   default:
+      response_length = 1;
+      break;
+   }
+
+
+   brw_set_message_descriptor(p, insn, BRW_SFID_MATH,
+			      msg_length, response_length, false, false);
+   if (brw->gen == 5) {
+      insn->bits3.math_gen5.function = function;
+      insn->bits3.math_gen5.int_type = integer_type;
+      insn->bits3.math_gen5.precision = low_precision;
+      insn->bits3.math_gen5.saturate = insn->header.saturate;
+      insn->bits3.math_gen5.data_type = dataType;
+      insn->bits3.math_gen5.snapshot = 0;
+   } else {
+      insn->bits3.math.function = function;
+      insn->bits3.math.int_type = integer_type;
+      insn->bits3.math.precision = low_precision;
+      insn->bits3.math.saturate = insn->header.saturate;
+      insn->bits3.math.data_type = dataType;
+   }
+   insn->header.saturate = 0;
+}
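+
+/* Example of the inference above (illustrative): BRW_MATH_FUNCTION_POW
+ * takes two operands, so msg_length becomes 2, but returns one value,
+ * so response_length stays 1; INT_DIV_QUOTIENT_AND_REMAINDER needs 2
+ * for both.
+ */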
+
+
+static void brw_set_ff_sync_message(struct brw_compile *p,
+				    struct brw_instruction *insn,
+				    bool allocate,
+				    unsigned response_length,
+				    bool end_of_thread)
+{
+   brw_set_message_descriptor(p, insn, BRW_SFID_URB,
+			      1, response_length, true, end_of_thread);
+   insn->bits3.urb_gen5.opcode = 1; /* FF_SYNC */
+   insn->bits3.urb_gen5.offset = 0; /* Not used by FF_SYNC */
+   insn->bits3.urb_gen5.swizzle_control = 0; /* Not used by FF_SYNC */
+   insn->bits3.urb_gen5.allocate = allocate;
+   insn->bits3.urb_gen5.used = 0; /* Not used by FF_SYNC */
+   insn->bits3.urb_gen5.complete = 0; /* Not used by FF_SYNC */
+}
+
+static void brw_set_urb_message( struct brw_compile *p,
+				 struct brw_instruction *insn,
+                                 enum brw_urb_write_flags flags,
+				 unsigned msg_length,
+				 unsigned response_length,
+				 unsigned offset,
+				 unsigned swizzle_control )
+{
+   struct brw_context *brw = p->brw;
+
+   brw_set_message_descriptor(p, insn, BRW_SFID_URB,
+			      msg_length, response_length, true,
+                              flags & BRW_URB_WRITE_EOT);
+   if (brw->gen == 7) {
+      if (flags & BRW_URB_WRITE_OWORD) {
+         assert(msg_length == 2); /* header + one OWORD of data */
+         insn->bits3.urb_gen7.opcode = BRW_URB_OPCODE_WRITE_OWORD;
+      } else {
+         insn->bits3.urb_gen7.opcode = BRW_URB_OPCODE_WRITE_HWORD;
+      }
+      insn->bits3.urb_gen7.offset = offset;
+      assert(swizzle_control != BRW_URB_SWIZZLE_TRANSPOSE);
+      insn->bits3.urb_gen7.swizzle_control = swizzle_control;
+      insn->bits3.urb_gen7.per_slot_offset =
+         flags & BRW_URB_WRITE_PER_SLOT_OFFSET ? 1 : 0;
+      insn->bits3.urb_gen7.complete = flags & BRW_URB_WRITE_COMPLETE ? 1 : 0;
+   } else if (brw->gen >= 5) {
+      insn->bits3.urb_gen5.opcode = 0;	/* URB_WRITE */
+      insn->bits3.urb_gen5.offset = offset;
+      insn->bits3.urb_gen5.swizzle_control = swizzle_control;
+      insn->bits3.urb_gen5.allocate = flags & BRW_URB_WRITE_ALLOCATE ? 1 : 0;
+      insn->bits3.urb_gen5.used = flags & BRW_URB_WRITE_UNUSED ? 0 : 1;
+      insn->bits3.urb_gen5.complete = flags & BRW_URB_WRITE_COMPLETE ? 1 : 0;
+   } else {
+      insn->bits3.urb.opcode = 0;	/* ? */
+      insn->bits3.urb.offset = offset;
+      insn->bits3.urb.swizzle_control = swizzle_control;
+      insn->bits3.urb.allocate = flags & BRW_URB_WRITE_ALLOCATE ? 1 : 0;
+      insn->bits3.urb.used = flags & BRW_URB_WRITE_UNUSED ? 0 : 1;
+      insn->bits3.urb.complete = flags & BRW_URB_WRITE_COMPLETE ? 1 : 0;
+   }
+}
+
+void
+brw_set_dp_write_message(struct brw_compile *p,
+			 struct brw_instruction *insn,
+			 unsigned binding_table_index,
+			 unsigned msg_control,
+			 unsigned msg_type,
+			 unsigned msg_length,
+			 bool header_present,
+			 unsigned last_render_target,
+			 unsigned response_length,
+			 unsigned end_of_thread,
+			 unsigned send_commit_msg)
+{
+   struct brw_context *brw = p->brw;
+   unsigned sfid;
+
+   if (brw->gen >= 7) {
+      /* Use the Render Cache for RT writes; otherwise use the Data Cache */
+      if (msg_type == GEN6_DATAPORT_WRITE_MESSAGE_RENDER_TARGET_WRITE)
+	 sfid = GEN6_SFID_DATAPORT_RENDER_CACHE;
+      else
+	 sfid = GEN7_SFID_DATAPORT_DATA_CACHE;
+   } else if (brw->gen == 6) {
+      /* Use the render cache for all write messages. */
+      sfid = GEN6_SFID_DATAPORT_RENDER_CACHE;
+   } else {
+      sfid = BRW_SFID_DATAPORT_WRITE;
+   }
+
+   brw_set_message_descriptor(p, insn, sfid, msg_length, response_length,
+			      header_present, end_of_thread);
+
+   if (brw->gen >= 7) {
+      insn->bits3.gen7_dp.binding_table_index = binding_table_index;
+      insn->bits3.gen7_dp.msg_control = msg_control;
+      insn->bits3.gen7_dp.last_render_target = last_render_target;
+      insn->bits3.gen7_dp.msg_type = msg_type;
+   } else if (brw->gen == 6) {
+      insn->bits3.gen6_dp.binding_table_index = binding_table_index;
+      insn->bits3.gen6_dp.msg_control = msg_control;
+      insn->bits3.gen6_dp.last_render_target = last_render_target;
+      insn->bits3.gen6_dp.msg_type = msg_type;
+      insn->bits3.gen6_dp.send_commit_msg = send_commit_msg;
+   } else if (brw->gen == 5) {
+      insn->bits3.dp_write_gen5.binding_table_index = binding_table_index;
+      insn->bits3.dp_write_gen5.msg_control = msg_control;
+      insn->bits3.dp_write_gen5.last_render_target = last_render_target;
+      insn->bits3.dp_write_gen5.msg_type = msg_type;
+      insn->bits3.dp_write_gen5.send_commit_msg = send_commit_msg;
+   } else {
+      insn->bits3.dp_write.binding_table_index = binding_table_index;
+      insn->bits3.dp_write.msg_control = msg_control;
+      insn->bits3.dp_write.last_render_target = last_render_target;
+      insn->bits3.dp_write.msg_type = msg_type;
+      insn->bits3.dp_write.send_commit_msg = send_commit_msg;
+   }
+}
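+
+/* Note (illustrative): per the SFID selection above, a Gen7
+ * render-target write is steered to the render cache while any other
+ * Gen7 write message goes to the data cache; on Gen6 every write
+ * message funnels through the render cache.
+ */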
+
+void
+brw_set_dp_read_message(struct brw_compile *p,
+			struct brw_instruction *insn,
+			unsigned binding_table_index,
+			unsigned msg_control,
+			unsigned msg_type,
+			unsigned target_cache,
+			unsigned msg_length,
+                        bool header_present,
+			unsigned response_length)
+{
+   struct brw_context *brw = p->brw;
+   unsigned sfid;
+
+   if (brw->gen >= 7) {
+      sfid = GEN7_SFID_DATAPORT_DATA_CACHE;
+   } else if (brw->gen == 6) {
+      if (target_cache == BRW_DATAPORT_READ_TARGET_RENDER_CACHE)
+	 sfid = GEN6_SFID_DATAPORT_RENDER_CACHE;
+      else
+	 sfid = GEN6_SFID_DATAPORT_SAMPLER_CACHE;
+   } else {
+      sfid = BRW_SFID_DATAPORT_READ;
+   }
+
+   brw_set_message_descriptor(p, insn, sfid, msg_length, response_length,
+			      header_present, false);
+
+   if (brw->gen >= 7) {
+      insn->bits3.gen7_dp.binding_table_index = binding_table_index;
+      insn->bits3.gen7_dp.msg_control = msg_control;
+      insn->bits3.gen7_dp.last_render_target = 0;
+      insn->bits3.gen7_dp.msg_type = msg_type;
+   } else if (brw->gen == 6) {
+      insn->bits3.gen6_dp.binding_table_index = binding_table_index;
+      insn->bits3.gen6_dp.msg_control = msg_control;
+      insn->bits3.gen6_dp.last_render_target = 0;
+      insn->bits3.gen6_dp.msg_type = msg_type;
+      insn->bits3.gen6_dp.send_commit_msg = 0;
+   } else if (brw->gen == 5) {
+      insn->bits3.dp_read_gen5.binding_table_index = binding_table_index;
+      insn->bits3.dp_read_gen5.msg_control = msg_control;
+      insn->bits3.dp_read_gen5.msg_type = msg_type;
+      insn->bits3.dp_read_gen5.target_cache = target_cache;
+   } else if (brw->is_g4x) {
+      insn->bits3.dp_read_g4x.binding_table_index = binding_table_index; /*0:7*/
+      insn->bits3.dp_read_g4x.msg_control = msg_control;  /*8:10*/
+      insn->bits3.dp_read_g4x.msg_type = msg_type;  /*11:13*/
+      insn->bits3.dp_read_g4x.target_cache = target_cache;  /*14:15*/
+   } else {
+      insn->bits3.dp_read.binding_table_index = binding_table_index; /*0:7*/
+      insn->bits3.dp_read.msg_control = msg_control;  /*8:11*/
+      insn->bits3.dp_read.msg_type = msg_type;  /*12:13*/
+      insn->bits3.dp_read.target_cache = target_cache;  /*14:15*/
+   }
+}
+
+void
+brw_set_sampler_message(struct brw_compile *p,
+                        struct brw_instruction *insn,
+                        unsigned binding_table_index,
+                        unsigned sampler,
+                        unsigned msg_type,
+                        unsigned response_length,
+                        unsigned msg_length,
+                        unsigned header_present,
+                        unsigned simd_mode,
+                        unsigned return_format)
+{
+   struct brw_context *brw = p->brw;
+
+   brw_set_message_descriptor(p, insn, BRW_SFID_SAMPLER, msg_length,
+			      response_length, header_present, false);
+
+   if (brw->gen >= 7) {
+      insn->bits3.sampler_gen7.binding_table_index = binding_table_index;
+      insn->bits3.sampler_gen7.sampler = sampler;
+      insn->bits3.sampler_gen7.msg_type = msg_type;
+      insn->bits3.sampler_gen7.simd_mode = simd_mode;
+   } else if (brw->gen >= 5) {
+      insn->bits3.sampler_gen5.binding_table_index = binding_table_index;
+      insn->bits3.sampler_gen5.sampler = sampler;
+      insn->bits3.sampler_gen5.msg_type = msg_type;
+      insn->bits3.sampler_gen5.simd_mode = simd_mode;
+   } else if (brw->is_g4x) {
+      insn->bits3.sampler_g4x.binding_table_index = binding_table_index;
+      insn->bits3.sampler_g4x.sampler = sampler;
+      insn->bits3.sampler_g4x.msg_type = msg_type;
+   } else {
+      insn->bits3.sampler.binding_table_index = binding_table_index;
+      insn->bits3.sampler.sampler = sampler;
+      insn->bits3.sampler.msg_type = msg_type;
+      insn->bits3.sampler.return_format = return_format;
+   }
+}
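+
+/* Illustrative call with made-up arguments: a SIMD16 sample on Gen7
+ * might pass binding_table_index 1, sampler 0, a sample message type,
+ * and simd_mode BRW_SAMPLER_SIMD_MODE_SIMD16; note that return_format
+ * is only encoded on the pre-G4X path above.
+ */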
+
+
+#define next_insn brw_next_insn
+struct brw_instruction *
+brw_next_insn(struct brw_compile *p, unsigned opcode)
+{
+   struct brw_instruction *insn;
+
+   if (p->nr_insn + 1 > p->store_size) {
+      if (0) {
+         fprintf(stderr, "incresing the store size to %d\n",
+                 p->store_size << 1);
+      }
+      p->store_size <<= 1;
+      p->store = reralloc(p->mem_ctx, p->store,
+                          struct brw_instruction, p->store_size);
+      if (!p->store)
+         assert(!"realloc eu store memeory failed");
+   }
+
+   p->next_insn_offset += 16;
+   insn = &p->store[p->nr_insn++];
+   memcpy(insn, p->current, sizeof(*insn));
+
+   /* Reset this one-shot flag:
+    */
+
+   if (p->current->header.destreg__conditionalmod) {
+      p->current->header.destreg__conditionalmod = 0;
+      p->current->header.predicate_control = BRW_PREDICATE_NORMAL;
+   }
+
+   insn->header.opcode = opcode;
+   return insn;
+}
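+
+/* Note (illustrative): because the store doubles on overflow, emitting
+ * instruction 1025 into a hypothetical 1024-entry store triggers a
+ * single reralloc to 2048 entries, so growth stays amortized O(1) per
+ * emitted instruction.
+ */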
+
+static struct brw_instruction *brw_alu1( struct brw_compile *p,
+					 unsigned opcode,
+					 struct brw_reg dest,
+					 struct brw_reg src )
+{
+   struct brw_instruction *insn = next_insn(p, opcode);
+   brw_set_dest(p, insn, dest);
+   brw_set_src0(p, insn, src);
+   return insn;
+}
+
+static struct brw_instruction *brw_alu2(struct brw_compile *p,
+					unsigned opcode,
+					struct brw_reg dest,
+					struct brw_reg src0,
+					struct brw_reg src1 )
+{
+   struct brw_instruction *insn = next_insn(p, opcode);
+   brw_set_dest(p, insn, dest);
+   brw_set_src0(p, insn, src0);
+   brw_set_src1(p, insn, src1);
+   return insn;
+}
+
+static int
+get_3src_subreg_nr(struct brw_reg reg)
+{
+   if (reg.vstride == BRW_VERTICAL_STRIDE_0) {
+      assert(brw_is_single_value_swizzle(reg.dw1.bits.swizzle));
+      return reg.subnr / 4 + BRW_GET_SWZ(reg.dw1.bits.swizzle, 0);
+   } else {
+      return reg.subnr / 4;
+   }
+}
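+
+/* Illustrative only (made-up register): a replicated scalar source at
+ * byte subnr 8 of g7 with a .yyyy swizzle has vstride 0, so the
+ * swizzle's first channel (y == 1) folds into the subregister:
+ * 8 / 4 + 1 == 3, selecting dword 3 of g7.
+ */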
+
+static struct brw_instruction *brw_alu3(struct brw_compile *p,
+					unsigned opcode,
+					struct brw_reg dest,
+					struct brw_reg src0,
+					struct brw_reg src1,
+					struct brw_reg src2)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn = next_insn(p, opcode);
+
+   gen7_convert_mrf_to_grf(p, &dest);
+
+   assert(insn->header.access_mode == BRW_ALIGN_16);
+
+   assert(dest.file == BRW_GENERAL_REGISTER_FILE ||
+	  dest.file == BRW_MESSAGE_REGISTER_FILE);
+   assert(dest.nr < 128);
+   assert(dest.address_mode == BRW_ADDRESS_DIRECT);
+   assert(dest.type == BRW_REGISTER_TYPE_F ||
+          dest.type == BRW_REGISTER_TYPE_D ||
+          dest.type == BRW_REGISTER_TYPE_UD);
+   insn->bits1.da3src.dest_reg_file = (dest.file == BRW_MESSAGE_REGISTER_FILE);
+   insn->bits1.da3src.dest_reg_nr = dest.nr;
+   insn->bits1.da3src.dest_subreg_nr = dest.subnr / 16;
+   insn->bits1.da3src.dest_writemask = dest.dw1.bits.writemask;
+   guess_execution_size(p, insn, dest);
+
+   assert(src0.file == BRW_GENERAL_REGISTER_FILE);
+   assert(src0.address_mode == BRW_ADDRESS_DIRECT);
+   assert(src0.nr < 128);
+   insn->bits2.da3src.src0_swizzle = src0.dw1.bits.swizzle;
+   insn->bits2.da3src.src0_subreg_nr = get_3src_subreg_nr(src0);
+   insn->bits2.da3src.src0_reg_nr = src0.nr;
+   insn->bits1.da3src.src0_abs = src0.abs;
+   insn->bits1.da3src.src0_negate = src0.negate;
+   insn->bits2.da3src.src0_rep_ctrl = src0.vstride == BRW_VERTICAL_STRIDE_0;
+
+   assert(src1.file == BRW_GENERAL_REGISTER_FILE);
+   assert(src1.address_mode == BRW_ADDRESS_DIRECT);
+   assert(src1.nr < 128);
+   insn->bits2.da3src.src1_swizzle = src1.dw1.bits.swizzle;
+   insn->bits2.da3src.src1_subreg_nr_low = get_3src_subreg_nr(src1) & 0x3;
+   insn->bits3.da3src.src1_subreg_nr_high = get_3src_subreg_nr(src1) >> 2;
+   insn->bits2.da3src.src1_rep_ctrl = src1.vstride == BRW_VERTICAL_STRIDE_0;
+   insn->bits3.da3src.src1_reg_nr = src1.nr;
+   insn->bits1.da3src.src1_abs = src1.abs;
+   insn->bits1.da3src.src1_negate = src1.negate;
+
+   assert(src2.file == BRW_GENERAL_REGISTER_FILE);
+   assert(src2.address_mode == BRW_ADDRESS_DIRECT);
+   assert(src2.nr < 128);
+   insn->bits3.da3src.src2_swizzle = src2.dw1.bits.swizzle;
+   insn->bits3.da3src.src2_subreg_nr = get_3src_subreg_nr(src2);
+   insn->bits3.da3src.src2_rep_ctrl = src2.vstride == BRW_VERTICAL_STRIDE_0;
+   insn->bits3.da3src.src2_reg_nr = src2.nr;
+   insn->bits1.da3src.src2_abs = src2.abs;
+   insn->bits1.da3src.src2_negate = src2.negate;
+
+   if (brw->gen >= 7) {
+      /* Set both the source and destination types based on dest.type,
+       * ignoring the source register types.  The MAD and LRP emitters ensure
+       * that all four types are float.  The BFE and BFI2 emitters, however,
+       * may send us mixed D and UD types and want us to ignore that and use
+       * the destination type.
+       */
+      switch (dest.type) {
+      case BRW_REGISTER_TYPE_F:
+         insn->bits1.da3src.src_type = BRW_3SRC_TYPE_F;
+         insn->bits1.da3src.dst_type = BRW_3SRC_TYPE_F;
+         break;
+      case BRW_REGISTER_TYPE_D:
+         insn->bits1.da3src.src_type = BRW_3SRC_TYPE_D;
+         insn->bits1.da3src.dst_type = BRW_3SRC_TYPE_D;
+         break;
+      case BRW_REGISTER_TYPE_UD:
+         insn->bits1.da3src.src_type = BRW_3SRC_TYPE_UD;
+         insn->bits1.da3src.dst_type = BRW_3SRC_TYPE_UD;
+         break;
+      }
+   }
+
+   return insn;
+}
+
+
+/***********************************************************************
+ * Convenience routines.
+ */
+#define ALU1(OP)					\
+struct brw_instruction *brw_##OP(struct brw_compile *p,	\
+	      struct brw_reg dest,			\
+	      struct brw_reg src0)   			\
+{							\
+   return brw_alu1(p, BRW_OPCODE_##OP, dest, src0);    	\
+}
+
+#define ALU2(OP)					\
+struct brw_instruction *brw_##OP(struct brw_compile *p,	\
+	      struct brw_reg dest,			\
+	      struct brw_reg src0,			\
+	      struct brw_reg src1)   			\
+{							\
+   return brw_alu2(p, BRW_OPCODE_##OP, dest, src0, src1);	\
+}
+
+#define ALU3(OP)					\
+struct brw_instruction *brw_##OP(struct brw_compile *p,	\
+	      struct brw_reg dest,			\
+	      struct brw_reg src0,			\
+	      struct brw_reg src1,			\
+	      struct brw_reg src2)   			\
+{							\
+   return brw_alu3(p, BRW_OPCODE_##OP, dest, src0, src1, src2);	\
+}
+
+#define ALU3F(OP)                                               \
+struct brw_instruction *brw_##OP(struct brw_compile *p,         \
+                                 struct brw_reg dest,           \
+                                 struct brw_reg src0,           \
+                                 struct brw_reg src1,           \
+                                 struct brw_reg src2)           \
+{                                                               \
+   assert(dest.type == BRW_REGISTER_TYPE_F);                    \
+   assert(src0.type == BRW_REGISTER_TYPE_F);                    \
+   assert(src1.type == BRW_REGISTER_TYPE_F);                    \
+   assert(src2.type == BRW_REGISTER_TYPE_F);                    \
+   return brw_alu3(p, BRW_OPCODE_##OP, dest, src0, src1, src2); \
+}
+
+/* Rounding operations (other than RNDD) require two instructions - the first
+ * stores a rounded value (possibly the wrong way) in the dest register, but
+ * also sets a per-channel "increment bit" in the flag register.  A predicated
+ * add of 1.0 fixes dest to contain the desired result.
+ *
+ * Sandybridge and later appear to round correctly without an ADD.
+ */
+#define ROUND(OP)							      \
+void brw_##OP(struct brw_compile *p,					      \
+	      struct brw_reg dest,					      \
+	      struct brw_reg src)					      \
+{									      \
+   struct brw_instruction *rnd, *add;					      \
+   rnd = next_insn(p, BRW_OPCODE_##OP);					      \
+   brw_set_dest(p, rnd, dest);						      \
+   brw_set_src0(p, rnd, src);						      \
+									      \
+   if (p->brw->gen < 6) {						      \
+      /* turn on round-increments */					      \
+      rnd->header.destreg__conditionalmod = BRW_CONDITIONAL_R;		      \
+      add = brw_ADD(p, dest, dest, brw_imm_f(1.0f));			      \
+      add->header.predicate_control = BRW_PREDICATE_NORMAL;		      \
+   }									      \
+}
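+
+/* Illustrative pre-Gen6 expansion of brw_RNDZ(p, dst, src), in rough
+ * assembly syntax:
+ *
+ *    rndz.r(8)     dst  src        // sets per-channel increment bits
+ *    (+f0) add(8)  dst  dst  1.0F  // predicated fix-up
+ *
+ * On Gen6 and later only the first instruction is emitted.
+ */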
+
+
+ALU1(MOV)
+ALU2(SEL)
+ALU1(NOT)
+ALU2(AND)
+ALU2(OR)
+ALU2(XOR)
+ALU2(SHR)
+ALU2(SHL)
+ALU2(ASR)
+ALU1(F32TO16)
+ALU1(F16TO32)
+ALU1(FRC)
+ALU1(RNDD)
+ALU2(MAC)
+ALU2(MACH)
+ALU1(LZD)
+ALU2(DP4)
+ALU2(DPH)
+ALU2(DP3)
+ALU2(DP2)
+ALU2(LINE)
+ALU2(PLN)
+ALU3F(MAD)
+ALU3F(LRP)
+ALU1(BFREV)
+ALU3(BFE)
+ALU2(BFI1)
+ALU3(BFI2)
+ALU1(FBH)
+ALU1(FBL)
+ALU1(CBIT)
+ALU2(ADDC)
+ALU2(SUBB)
+
+ROUND(RNDZ)
+ROUND(RNDE)
+
+
+struct brw_instruction *brw_ADD(struct brw_compile *p,
+				struct brw_reg dest,
+				struct brw_reg src0,
+				struct brw_reg src1)
+{
+   /* 6.2.2: add */
+   if (src0.type == BRW_REGISTER_TYPE_F ||
+       (src0.file == BRW_IMMEDIATE_VALUE &&
+	src0.type == BRW_REGISTER_TYPE_VF)) {
+      assert(src1.type != BRW_REGISTER_TYPE_UD);
+      assert(src1.type != BRW_REGISTER_TYPE_D);
+   }
+
+   if (src1.type == BRW_REGISTER_TYPE_F ||
+       (src1.file == BRW_IMMEDIATE_VALUE &&
+	src1.type == BRW_REGISTER_TYPE_VF)) {
+      assert(src0.type != BRW_REGISTER_TYPE_UD);
+      assert(src0.type != BRW_REGISTER_TYPE_D);
+   }
+
+   return brw_alu2(p, BRW_OPCODE_ADD, dest, src0, src1);
+}
+
+struct brw_instruction *brw_AVG(struct brw_compile *p,
+                                struct brw_reg dest,
+                                struct brw_reg src0,
+                                struct brw_reg src1)
+{
+   assert(dest.type == src0.type);
+   assert(src0.type == src1.type);
+   switch (src0.type) {
+   case BRW_REGISTER_TYPE_B:
+   case BRW_REGISTER_TYPE_UB:
+   case BRW_REGISTER_TYPE_W:
+   case BRW_REGISTER_TYPE_UW:
+   case BRW_REGISTER_TYPE_D:
+   case BRW_REGISTER_TYPE_UD:
+      break;
+   default:
+      assert(!"Bad type for brw_AVG");
+   }
+
+   return brw_alu2(p, BRW_OPCODE_AVG, dest, src0, src1);
+}
+
+struct brw_instruction *brw_MUL(struct brw_compile *p,
+				struct brw_reg dest,
+				struct brw_reg src0,
+				struct brw_reg src1)
+{
+   /* 6.32.38: mul */
+   if (src0.type == BRW_REGISTER_TYPE_D ||
+       src0.type == BRW_REGISTER_TYPE_UD ||
+       src1.type == BRW_REGISTER_TYPE_D ||
+       src1.type == BRW_REGISTER_TYPE_UD) {
+      assert(dest.type != BRW_REGISTER_TYPE_F);
+   }
+
+   if (src0.type == BRW_REGISTER_TYPE_F ||
+       (src0.file == BRW_IMMEDIATE_VALUE &&
+	src0.type == BRW_REGISTER_TYPE_VF)) {
+      assert(src1.type != BRW_REGISTER_TYPE_UD);
+      assert(src1.type != BRW_REGISTER_TYPE_D);
+   }
+
+   if (src1.type == BRW_REGISTER_TYPE_F ||
+       (src1.file == BRW_IMMEDIATE_VALUE &&
+	src1.type == BRW_REGISTER_TYPE_VF)) {
+      assert(src0.type != BRW_REGISTER_TYPE_UD);
+      assert(src0.type != BRW_REGISTER_TYPE_D);
+   }
+
+   assert(src0.file != BRW_ARCHITECTURE_REGISTER_FILE ||
+	  src0.nr != BRW_ARF_ACCUMULATOR);
+   assert(src1.file != BRW_ARCHITECTURE_REGISTER_FILE ||
+	  src1.nr != BRW_ARF_ACCUMULATOR);
+
+   return brw_alu2(p, BRW_OPCODE_MUL, dest, src0, src1);
+}
+
+
+void brw_NOP(struct brw_compile *p)
+{
+   struct brw_instruction *insn = next_insn(p, BRW_OPCODE_NOP);
+   brw_set_dest(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
+   brw_set_src0(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
+   brw_set_src1(p, insn, brw_imm_ud(0x0));
+}
+
+
+
+
+
+/***********************************************************************
+ * Comparisons, if/else/endif
+ */
+
+struct brw_instruction *brw_JMPI(struct brw_compile *p,
+                                 struct brw_reg dest,
+                                 struct brw_reg src0,
+                                 struct brw_reg src1)
+{
+   struct brw_instruction *insn = brw_alu2(p, BRW_OPCODE_JMPI, dest, src0, src1);
+
+   insn->header.execution_size = 1;
+   insn->header.compression_control = BRW_COMPRESSION_NONE;
+   insn->header.mask_control = BRW_MASK_DISABLE;
+
+   p->current->header.predicate_control = BRW_PREDICATE_NONE;
+
+   return insn;
+}
+
+static void
+push_if_stack(struct brw_compile *p, struct brw_instruction *inst)
+{
+   p->if_stack[p->if_stack_depth] = inst - p->store;
+
+   p->if_stack_depth++;
+   if (p->if_stack_array_size <= p->if_stack_depth) {
+      p->if_stack_array_size *= 2;
+      p->if_stack = reralloc(p->mem_ctx, p->if_stack, int,
+			     p->if_stack_array_size);
+   }
+}
+
+static struct brw_instruction *
+pop_if_stack(struct brw_compile *p)
+{
+   p->if_stack_depth--;
+   return &p->store[p->if_stack[p->if_stack_depth]];
+}
+
+static void
+push_loop_stack(struct brw_compile *p, struct brw_instruction *inst)
+{
+   if (p->loop_stack_array_size <= (p->loop_stack_depth + 1)) {
+      p->loop_stack_array_size *= 2;
+      p->loop_stack = reralloc(p->mem_ctx, p->loop_stack, int,
+			       p->loop_stack_array_size);
+      p->if_depth_in_loop = reralloc(p->mem_ctx, p->if_depth_in_loop, int,
+				     p->loop_stack_array_size);
+   }
+
+   p->loop_stack[p->loop_stack_depth] = inst - p->store;
+   p->loop_stack_depth++;
+   p->if_depth_in_loop[p->loop_stack_depth] = 0;
+}
+
+static struct brw_instruction *
+get_inner_do_insn(struct brw_compile *p)
+{
+   return &p->store[p->loop_stack[p->loop_stack_depth - 1]];
+}
+
+/* The EU takes the value from the flag register and pushes it onto some
+ * sort of a stack (presumably merging with any flag value already on
+ * the stack).  Within an if block, the flags at the top of the stack
+ * control execution on each channel of the unit, e.g. on each of the
+ * 16 pixel values in our wm programs.
+ *
+ * When the matching 'else' instruction is reached (presumably by
+ * countdown of the instruction count patched in by our ELSE/ENDIF
+ * functions), the relevant flags are inverted.
+ *
+ * When the matching 'endif' instruction is reached, the flags are
+ * popped off.  If the stack is now empty, normal execution resumes.
+ */
+struct brw_instruction *
+brw_IF(struct brw_compile *p, unsigned execute_size)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn;
+
+   insn = next_insn(p, BRW_OPCODE_IF);
+
+   /* Override the defaults for this instruction:
+    */
+   if (brw->gen < 6) {
+      brw_set_dest(p, insn, brw_ip_reg());
+      brw_set_src0(p, insn, brw_ip_reg());
+      brw_set_src1(p, insn, brw_imm_d(0x0));
+   } else if (brw->gen == 6) {
+      brw_set_dest(p, insn, brw_imm_w(0));
+      insn->bits1.branch_gen6.jump_count = 0;
+      brw_set_src0(p, insn, vec1(retype(brw_null_reg(), BRW_REGISTER_TYPE_D)));
+      brw_set_src1(p, insn, vec1(retype(brw_null_reg(), BRW_REGISTER_TYPE_D)));
+   } else {
+      brw_set_dest(p, insn, vec1(retype(brw_null_reg(), BRW_REGISTER_TYPE_D)));
+      brw_set_src0(p, insn, vec1(retype(brw_null_reg(), BRW_REGISTER_TYPE_D)));
+      brw_set_src1(p, insn, brw_imm_ud(0));
+      insn->bits3.break_cont.jip = 0;
+      insn->bits3.break_cont.uip = 0;
+   }
+
+   insn->header.execution_size = execute_size;
+   insn->header.compression_control = BRW_COMPRESSION_NONE;
+   insn->header.predicate_control = BRW_PREDICATE_NORMAL;
+   insn->header.mask_control = BRW_MASK_ENABLE;
+   if (!p->single_program_flow)
+      insn->header.thread_control = BRW_THREAD_SWITCH;
+
+   p->current->header.predicate_control = BRW_PREDICATE_NONE;
+
+   push_if_stack(p, insn);
+   p->if_depth_in_loop[p->loop_stack_depth]++;
+   return insn;
+}
+
+/* This function is only used for gen6-style IF instructions with an
+ * embedded comparison (conditional modifier).  It is not used on gen7.
+ */
+struct brw_instruction *
+gen6_IF(struct brw_compile *p, uint32_t conditional,
+	struct brw_reg src0, struct brw_reg src1)
+{
+   struct brw_instruction *insn;
+
+   insn = next_insn(p, BRW_OPCODE_IF);
+
+   brw_set_dest(p, insn, brw_imm_w(0));
+   if (p->compressed) {
+      insn->header.execution_size = BRW_EXECUTE_16;
+   } else {
+      insn->header.execution_size = BRW_EXECUTE_8;
+   }
+   insn->bits1.branch_gen6.jump_count = 0;
+   brw_set_src0(p, insn, src0);
+   brw_set_src1(p, insn, src1);
+
+   assert(insn->header.compression_control == BRW_COMPRESSION_NONE);
+   assert(insn->header.predicate_control == BRW_PREDICATE_NONE);
+   insn->header.destreg__conditionalmod = conditional;
+
+   if (!p->single_program_flow)
+      insn->header.thread_control = BRW_THREAD_SWITCH;
+
+   push_if_stack(p, insn);
+   return insn;
+}
+
+/**
+ * In single-program-flow (SPF) mode, convert IF and ELSE into ADDs.
+ */
+static void
+convert_IF_ELSE_to_ADD(struct brw_compile *p,
+		       struct brw_instruction *if_inst,
+		       struct brw_instruction *else_inst)
+{
+   /* The next instruction (where the ENDIF would be, if it existed) */
+   struct brw_instruction *next_inst = &p->store[p->nr_insn];
+
+   assert(p->single_program_flow);
+   assert(if_inst != NULL && if_inst->header.opcode == BRW_OPCODE_IF);
+   assert(else_inst == NULL || else_inst->header.opcode == BRW_OPCODE_ELSE);
+   assert(if_inst->header.execution_size == BRW_EXECUTE_1);
+
+   /* Convert IF to an ADD instruction that moves the instruction pointer
+    * to the first instruction of the ELSE block.  If there is no ELSE
+    * block, point to where ENDIF would be.  Reverse the predicate.
+    *
+    * There's no need to execute an ENDIF since we don't need to do any
+    * stack operations, and if we're currently executing, we just want to
+    * continue normally.
+    */
+   if_inst->header.opcode = BRW_OPCODE_ADD;
+   if_inst->header.predicate_inverse = 1;
+
+   if (else_inst != NULL) {
+      /* Convert ELSE to an ADD instruction that points where the ENDIF
+       * would be.
+       */
+      else_inst->header.opcode = BRW_OPCODE_ADD;
+
+      if_inst->bits3.ud = (else_inst - if_inst + 1) * 16;
+      else_inst->bits3.ud = (next_inst - else_inst) * 16;
+   } else {
+      if_inst->bits3.ud = (next_inst - if_inst) * 16;
+   }
+}
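+
+/* Worked example (hypothetical layout): with IF at instruction index 10,
+ * ELSE at 14 and the would-be ENDIF at 18, the IF becomes a
+ * predicate-inverted ADD of (14 - 10 + 1) * 16 == 80 bytes to IP
+ * (landing just past the ELSE), and the ELSE becomes an ADD of
+ * (18 - 14) * 16 == 64 bytes.
+ */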
+
+/**
+ * Patch IF and ELSE instructions with appropriate jump targets.
+ */
+static void
+patch_IF_ELSE(struct brw_compile *p,
+	      struct brw_instruction *if_inst,
+	      struct brw_instruction *else_inst,
+	      struct brw_instruction *endif_inst)
+{
+   struct brw_context *brw = p->brw;
+
+   /* We shouldn't be patching IF and ELSE instructions in single program flow
+    * mode when gen < 6, because in single program flow mode on those
+    * platforms, we convert flow control instructions to conditional ADDs that
+    * operate on IP (see brw_ENDIF).
+    *
+    * However, on Gen6, writing to IP doesn't work in single program flow mode
+    * (see the SandyBridge PRM, Volume 4 part 2, p79: "When SPF is ON, IP may
+    * not be updated by non-flow control instructions.").  And on later
+    * platforms, there is no significant benefit to converting control flow
+    * instructions to conditional ADDs.  So we do patch IF and ELSE
+    * instructions in single program flow mode on those platforms.
+    */
+   if (brw->gen < 6)
+      assert(!p->single_program_flow);
+
+   assert(if_inst != NULL && if_inst->header.opcode == BRW_OPCODE_IF);
+   assert(endif_inst != NULL);
+   assert(else_inst == NULL || else_inst->header.opcode == BRW_OPCODE_ELSE);
+
+   unsigned br = 1;
+   /* The jump count is in units of 64-bit data chunks, so one 128-bit
+    * instruction requires 2 chunks.
+    */
+   if (brw->gen >= 5)
+      br = 2;
+
+   assert(endif_inst->header.opcode == BRW_OPCODE_ENDIF);
+   endif_inst->header.execution_size = if_inst->header.execution_size;
+
+   if (else_inst == NULL) {
+      /* Patch IF -> ENDIF */
+      if (brw->gen < 6) {
+	 /* Turn it into an IFF, which means no mask stack operations for
+	  * all-false and jumping past the ENDIF.
+	  */
+	 if_inst->header.opcode = BRW_OPCODE_IFF;
+	 if_inst->bits3.if_else.jump_count = br * (endif_inst - if_inst + 1);
+	 if_inst->bits3.if_else.pop_count = 0;
+	 if_inst->bits3.if_else.pad0 = 0;
+      } else if (brw->gen == 6) {
+	 /* As of gen6, there is no IFF and IF must point to the ENDIF. */
+	 if_inst->bits1.branch_gen6.jump_count = br * (endif_inst - if_inst);
+      } else {
+	 if_inst->bits3.break_cont.uip = br * (endif_inst - if_inst);
+	 if_inst->bits3.break_cont.jip = br * (endif_inst - if_inst);
+      }
+   } else {
+      else_inst->header.execution_size = if_inst->header.execution_size;
+
+      /* Patch IF -> ELSE */
+      if (brw->gen < 6) {
+	 if_inst->bits3.if_else.jump_count = br * (else_inst - if_inst);
+	 if_inst->bits3.if_else.pop_count = 0;
+	 if_inst->bits3.if_else.pad0 = 0;
+      } else if (brw->gen == 6) {
+	 if_inst->bits1.branch_gen6.jump_count = br * (else_inst - if_inst + 1);
+      }
+
+      /* Patch ELSE -> ENDIF */
+      if (brw->gen < 6) {
+	 /* BRW_OPCODE_ELSE pre-gen6 should point just past the
+	  * matching ENDIF.
+	  */
+	 else_inst->bits3.if_else.jump_count = br*(endif_inst - else_inst + 1);
+	 else_inst->bits3.if_else.pop_count = 1;
+	 else_inst->bits3.if_else.pad0 = 0;
+      } else if (brw->gen == 6) {
+	 /* BRW_OPCODE_ELSE on gen6 should point to the matching ENDIF. */
+	 else_inst->bits1.branch_gen6.jump_count = br*(endif_inst - else_inst);
+      } else {
+	 /* The IF instruction's JIP should point just past the ELSE */
+	 if_inst->bits3.break_cont.jip = br * (else_inst - if_inst + 1);
+	 /* The IF instruction's UIP and ELSE's JIP should point to ENDIF */
+	 if_inst->bits3.break_cont.uip = br * (endif_inst - if_inst);
+	 else_inst->bits3.break_cont.jip = br * (endif_inst - else_inst);
+      }
+   }
+}
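+
+/* Worked example (hypothetical layout): on Gen5, br == 2, so with IF at
+ * instruction index 10, ELSE at 14 and ENDIF at 18, the IF gets
+ * jump_count 2 * (14 - 10) == 8 and the ELSE gets
+ * 2 * (18 - 14 + 1) == 10, both in 64-bit chunk units.
+ */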
+
+void
+brw_ELSE(struct brw_compile *p)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn;
+
+   insn = next_insn(p, BRW_OPCODE_ELSE);
+
+   if (brw->gen < 6) {
+      brw_set_dest(p, insn, brw_ip_reg());
+      brw_set_src0(p, insn, brw_ip_reg());
+      brw_set_src1(p, insn, brw_imm_d(0x0));
+   } else if (brw->gen == 6) {
+      brw_set_dest(p, insn, brw_imm_w(0));
+      insn->bits1.branch_gen6.jump_count = 0;
+      brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+      brw_set_src1(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+   } else {
+      brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+      brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+      brw_set_src1(p, insn, brw_imm_ud(0));
+      insn->bits3.break_cont.jip = 0;
+      insn->bits3.break_cont.uip = 0;
+   }
+
+   insn->header.compression_control = BRW_COMPRESSION_NONE;
+   insn->header.mask_control = BRW_MASK_ENABLE;
+   if (!p->single_program_flow)
+      insn->header.thread_control = BRW_THREAD_SWITCH;
+
+   push_if_stack(p, insn);
+}
+
+void
+brw_ENDIF(struct brw_compile *p)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn = NULL;
+   struct brw_instruction *else_inst = NULL;
+   struct brw_instruction *if_inst = NULL;
+   struct brw_instruction *tmp;
+   bool emit_endif = true;
+
+   /* In single program flow mode, we can express IF and ELSE instructions
+    * equivalently as ADD instructions that operate on IP.  On platforms prior
+    * to Gen6, flow control instructions cause an implied thread switch, so
+    * this is a significant savings.
+    *
+    * However, on Gen6, writing to IP doesn't work in single program flow mode
+    * (see the SandyBridge PRM, Volume 4 part 2, p79: "When SPF is ON, IP may
+    * not be updated by non-flow control instructions.").  And on later
+    * platforms, there is no significant benefit to converting control flow
+    * instructions to conditional ADDs.  So we only do this trick on Gen4 and
+    * Gen5.
+    */
+   if (brw->gen < 6 && p->single_program_flow)
+      emit_endif = false;
+
+   /*
+    * A single next_insn() may change the base address of the instruction
+    * store memory (p->store), so call it first, before converting any
+    * saved if-stack index back into an instruction pointer.
+    */
+   if (emit_endif)
+      insn = next_insn(p, BRW_OPCODE_ENDIF);
+
+   /* Pop the IF and (optional) ELSE instructions from the stack */
+   p->if_depth_in_loop[p->loop_stack_depth]--;
+   tmp = pop_if_stack(p);
+   if (tmp->header.opcode == BRW_OPCODE_ELSE) {
+      else_inst = tmp;
+      tmp = pop_if_stack(p);
+   }
+   if_inst = tmp;
+
+   if (!emit_endif) {
+      /* ENDIF is useless; don't bother emitting it. */
+      convert_IF_ELSE_to_ADD(p, if_inst, else_inst);
+      return;
+   }
+
+   if (brw->gen < 6) {
+      brw_set_dest(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
+      brw_set_src0(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
+      brw_set_src1(p, insn, brw_imm_d(0x0));
+   } else if (brw->gen == 6) {
+      brw_set_dest(p, insn, brw_imm_w(0));
+      brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+      brw_set_src1(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+   } else {
+      brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+      brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+      brw_set_src1(p, insn, brw_imm_ud(0));
+   }
+
+   insn->header.compression_control = BRW_COMPRESSION_NONE;
+   insn->header.mask_control = BRW_MASK_ENABLE;
+   insn->header.thread_control = BRW_THREAD_SWITCH;
+
+   /* Also pop item off the stack in the endif instruction: */
+   if (brw->gen < 6) {
+      insn->bits3.if_else.jump_count = 0;
+      insn->bits3.if_else.pop_count = 1;
+      insn->bits3.if_else.pad0 = 0;
+   } else if (brw->gen == 6) {
+      insn->bits1.branch_gen6.jump_count = 2;
+   } else {
+      insn->bits3.break_cont.jip = 2;
+   }
+   patch_IF_ELSE(p, if_inst, else_inst, insn);
+}
+
+struct brw_instruction *brw_BREAK(struct brw_compile *p)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn;
+
+   insn = next_insn(p, BRW_OPCODE_BREAK);
+   if (brw->gen >= 6) {
+      brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+      brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+      brw_set_src1(p, insn, brw_imm_d(0x0));
+   } else {
+      brw_set_dest(p, insn, brw_ip_reg());
+      brw_set_src0(p, insn, brw_ip_reg());
+      brw_set_src1(p, insn, brw_imm_d(0x0));
+      insn->bits3.if_else.pad0 = 0;
+      insn->bits3.if_else.pop_count = p->if_depth_in_loop[p->loop_stack_depth];
+   }
+   insn->header.compression_control = BRW_COMPRESSION_NONE;
+   insn->header.execution_size = BRW_EXECUTE_8;
+
+   return insn;
+}
+
+struct brw_instruction *gen6_CONT(struct brw_compile *p)
+{
+   struct brw_instruction *insn;
+
+   insn = next_insn(p, BRW_OPCODE_CONTINUE);
+   brw_set_dest(p, insn, brw_ip_reg());
+   brw_set_src0(p, insn, brw_ip_reg());
+   brw_set_src1(p, insn, brw_imm_d(0x0));
+
+   insn->header.compression_control = BRW_COMPRESSION_NONE;
+   insn->header.execution_size = BRW_EXECUTE_8;
+   return insn;
+}
+
+struct brw_instruction *brw_CONT(struct brw_compile *p)
+{
+   struct brw_instruction *insn;
+   insn = next_insn(p, BRW_OPCODE_CONTINUE);
+   brw_set_dest(p, insn, brw_ip_reg());
+   brw_set_src0(p, insn, brw_ip_reg());
+   brw_set_src1(p, insn, brw_imm_d(0x0));
+   insn->header.compression_control = BRW_COMPRESSION_NONE;
+   insn->header.execution_size = BRW_EXECUTE_8;
+   /* insn->header.mask_control = BRW_MASK_DISABLE; */
+   insn->bits3.if_else.pad0 = 0;
+   insn->bits3.if_else.pop_count = p->if_depth_in_loop[p->loop_stack_depth];
+   return insn;
+}
+
+struct brw_instruction *gen6_HALT(struct brw_compile *p)
+{
+   struct brw_instruction *insn;
+
+   insn = next_insn(p, BRW_OPCODE_HALT);
+   brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+   brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+   brw_set_src1(p, insn, brw_imm_d(0x0)); /* UIP and JIP, updated later. */
+
+   if (p->compressed) {
+      insn->header.execution_size = BRW_EXECUTE_16;
+   } else {
+      insn->header.compression_control = BRW_COMPRESSION_NONE;
+      insn->header.execution_size = BRW_EXECUTE_8;
+   }
+   return insn;
+}
+
+/* DO/WHILE loop:
+ *
+ * The DO/WHILE is just an unterminated loop -- break or continue are
+ * used for control within the loop.  We have a few ways they can be
+ * done.
+ *
+ * For uniform control flow, the WHILE is just a jump, so ADD ip, ip,
+ * jip and no DO instruction.
+ *
+ * For non-uniform control flow pre-gen6, there's a DO instruction to
+ * push the mask, and a WHILE to jump back, and BREAK to get out and
+ * pop the mask.
+ *
+ * For gen6, there's no more mask stack, so no need for DO.  WHILE
+ * just points back to the first instruction of the loop.
+ */
+struct brw_instruction *brw_DO(struct brw_compile *p, unsigned execute_size)
+{
+   struct brw_context *brw = p->brw;
+
+   if (brw->gen >= 6 || p->single_program_flow) {
+      push_loop_stack(p, &p->store[p->nr_insn]);
+      return &p->store[p->nr_insn];
+   } else {
+      struct brw_instruction *insn = next_insn(p, BRW_OPCODE_DO);
+
+      push_loop_stack(p, insn);
+
+      /* Override the defaults for this instruction:
+       */
+      brw_set_dest(p, insn, brw_null_reg());
+      brw_set_src0(p, insn, brw_null_reg());
+      brw_set_src1(p, insn, brw_null_reg());
+
+      insn->header.compression_control = BRW_COMPRESSION_NONE;
+      insn->header.execution_size = execute_size;
+      insn->header.predicate_control = BRW_PREDICATE_NONE;
+      /* insn->header.mask_control = BRW_MASK_ENABLE; */
+      /* insn->header.mask_control = BRW_MASK_DISABLE; */
+
+      return insn;
+   }
+}
+
+/**
+ * For pre-gen6, we patch BREAK/CONT instructions to point at the WHILE
+ * instruction here.
+ *
+ * For gen6+, see brw_set_uip_jip(), which doesn't care so much about the loop
+ * nesting, since it can always just point to the end of the block/current loop.
+ */
+static void
+brw_patch_break_cont(struct brw_compile *p, struct brw_instruction *while_inst)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *do_inst = get_inner_do_insn(p);
+   struct brw_instruction *inst;
+   int br = (brw->gen == 5) ? 2 : 1;
+
+   for (inst = while_inst - 1; inst != do_inst; inst--) {
+      /* If the jump count is != 0, that means that this instruction has already
+       * been patched because it's part of a loop inside of the one we're
+       * patching.
+       */
+      if (inst->header.opcode == BRW_OPCODE_BREAK &&
+	  inst->bits3.if_else.jump_count == 0) {
+	 inst->bits3.if_else.jump_count = br * ((while_inst - inst) + 1);
+      } else if (inst->header.opcode == BRW_OPCODE_CONTINUE &&
+		 inst->bits3.if_else.jump_count == 0) {
+	 inst->bits3.if_else.jump_count = br * (while_inst - inst);
+      }
+   }
+}
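+
+/* Illustrative (made-up indices): a BREAK at instruction 20 in a loop
+ * whose WHILE lands at 30 gets jump_count br * (30 - 20 + 1), jumping
+ * just past the WHILE; a CONTINUE at 20 gets br * (30 - 20) and lands
+ * on the WHILE itself.
+ */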
+
+struct brw_instruction *brw_WHILE(struct brw_compile *p)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn, *do_insn;
+   unsigned br = 1;
+
+   if (brw->gen >= 5)
+      br = 2;
+
+   if (brw->gen >= 7) {
+      insn = next_insn(p, BRW_OPCODE_WHILE);
+      do_insn = get_inner_do_insn(p);
+
+      brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+      brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+      brw_set_src1(p, insn, brw_imm_ud(0));
+      insn->bits3.break_cont.jip = br * (do_insn - insn);
+
+      insn->header.execution_size = BRW_EXECUTE_8;
+   } else if (brw->gen == 6) {
+      insn = next_insn(p, BRW_OPCODE_WHILE);
+      do_insn = get_inner_do_insn(p);
+
+      brw_set_dest(p, insn, brw_imm_w(0));
+      insn->bits1.branch_gen6.jump_count = br * (do_insn - insn);
+      brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+      brw_set_src1(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+
+      insn->header.execution_size = BRW_EXECUTE_8;
+   } else {
+      if (p->single_program_flow) {
+	 insn = next_insn(p, BRW_OPCODE_ADD);
+         do_insn = get_inner_do_insn(p);
+
+	 brw_set_dest(p, insn, brw_ip_reg());
+	 brw_set_src0(p, insn, brw_ip_reg());
+	 brw_set_src1(p, insn, brw_imm_d((do_insn - insn) * 16));
+	 insn->header.execution_size = BRW_EXECUTE_1;
+      } else {
+	 insn = next_insn(p, BRW_OPCODE_WHILE);
+         do_insn = get_inner_do_insn(p);
+
+	 assert(do_insn->header.opcode == BRW_OPCODE_DO);
+
+	 brw_set_dest(p, insn, brw_ip_reg());
+	 brw_set_src0(p, insn, brw_ip_reg());
+	 brw_set_src1(p, insn, brw_imm_d(0));
+
+	 insn->header.execution_size = do_insn->header.execution_size;
+	 insn->bits3.if_else.jump_count = br * (do_insn - insn + 1);
+	 insn->bits3.if_else.pop_count = 0;
+	 insn->bits3.if_else.pad0 = 0;
+
+	 brw_patch_break_cont(p, insn);
+      }
+   }
+   insn->header.compression_control = BRW_COMPRESSION_NONE;
+   p->current->header.predicate_control = BRW_PREDICATE_NONE;
+
+   p->loop_stack_depth--;
+
+   return insn;
+}
+
+
+/* FORWARD JUMPS:
+ */
+void brw_land_fwd_jump(struct brw_compile *p, int jmp_insn_idx)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *jmp_insn = &p->store[jmp_insn_idx];
+   unsigned jmpi = 1;
+
+   if (brw->gen >= 5)
+      jmpi = 2;
+
+   assert(jmp_insn->header.opcode == BRW_OPCODE_JMPI);
+   assert(jmp_insn->bits1.da1.src1_reg_file == BRW_IMMEDIATE_VALUE);
+
+   jmp_insn->bits3.ud = jmpi * (p->nr_insn - jmp_insn_idx - 1);
+}
+
+
+
+/* To integrate with the above, it makes sense that the comparison
+ * instruction should populate the flag register.  It might be simpler
+ * just to use the flag reg for most WM tasks?
+ */
+void brw_CMP(struct brw_compile *p,
+	     struct brw_reg dest,
+	     unsigned conditional,
+	     struct brw_reg src0,
+	     struct brw_reg src1)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn = next_insn(p, BRW_OPCODE_CMP);
+
+   insn->header.destreg__conditionalmod = conditional;
+   brw_set_dest(p, insn, dest);
+   brw_set_src0(p, insn, src0);
+   brw_set_src1(p, insn, src1);
+
+/*    guess_execution_size(insn, src0); */
+
+
+   /* Make it so that future instructions will use the computed flag
+    * value until brw_set_predicate_control_flag_value() is called
+    * again.
+    */
+   if (dest.file == BRW_ARCHITECTURE_REGISTER_FILE &&
+       dest.nr == 0) {
+      p->current->header.predicate_control = BRW_PREDICATE_NORMAL;
+      p->flag_value = 0xff;
+   }
+
+   /* Item WaCMPInstNullDstForcesThreadSwitch in the Haswell Bspec workarounds
+    * page says:
+    *    "Any CMP instruction with a null destination must use a {switch}."
+    *
+    * It also applies to other Gen7 platforms (IVB, BYT) even though it isn't
+    * mentioned on their work-arounds pages.
+    */
+   if (brw->gen == 7) {
+      if (dest.file == BRW_ARCHITECTURE_REGISTER_FILE &&
+          dest.nr == BRW_ARF_NULL) {
+         insn->header.thread_control = BRW_THREAD_SWITCH;
+      }
+   }
+}
+
+/* Issue a 'wait' instruction on notification register n1; the host can
+ * program MMIO to wake the thread back up.
+ */
+void brw_WAIT (struct brw_compile *p)
+{
+   struct brw_instruction *insn = next_insn(p, BRW_OPCODE_WAIT);
+   struct brw_reg src = brw_notification_1_reg();
+
+   brw_set_dest(p, insn, src);
+   brw_set_src0(p, insn, src);
+   brw_set_src1(p, insn, brw_null_reg());
+   insn->header.execution_size = 0; /* must be BRW_EXECUTE_1 */
+   insn->header.predicate_control = 0;
+   insn->header.compression_control = 0;
+}
+
+
+/***********************************************************************
+ * Helpers for the various SEND message types:
+ */
+
+/** Extended math function, float[8].
+ */
+void brw_math( struct brw_compile *p,
+	       struct brw_reg dest,
+	       unsigned function,
+	       unsigned msg_reg_nr,
+	       struct brw_reg src,
+	       unsigned data_type,
+	       unsigned precision )
+{
+   struct brw_context *brw = p->brw;
+
+   if (brw->gen >= 6) {
+      struct brw_instruction *insn = next_insn(p, BRW_OPCODE_MATH);
+
+      assert(dest.file == BRW_GENERAL_REGISTER_FILE ||
+             (brw->gen >= 7 && dest.file == BRW_MESSAGE_REGISTER_FILE));
+      assert(src.file == BRW_GENERAL_REGISTER_FILE);
+
+      assert(dest.hstride == BRW_HORIZONTAL_STRIDE_1);
+      if (brw->gen == 6)
+	 assert(src.hstride == BRW_HORIZONTAL_STRIDE_1);
+
+      /* Source modifiers are ignored for extended math instructions on Gen6. */
+      if (brw->gen == 6) {
+	 assert(!src.negate);
+	 assert(!src.abs);
+      }
+
+      if (function == BRW_MATH_FUNCTION_INT_DIV_QUOTIENT ||
+	  function == BRW_MATH_FUNCTION_INT_DIV_REMAINDER ||
+	  function == BRW_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER) {
+	 assert(src.type != BRW_REGISTER_TYPE_F);
+      } else {
+	 assert(src.type == BRW_REGISTER_TYPE_F);
+      }
+
+      /* Math is the same ISA format as other opcodes, except that CondModifier
+       * becomes FC[3:0] and ThreadCtrl becomes FC[5:4].
+       */
+      insn->header.destreg__conditionalmod = function;
+
+      brw_set_dest(p, insn, dest);
+      brw_set_src0(p, insn, src);
+      brw_set_src1(p, insn, brw_null_reg());
+   } else {
+      struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND);
+
+      /* Example code doesn't set predicate_control for send
+       * instructions.
+       */
+      insn->header.predicate_control = 0;
+      insn->header.destreg__conditionalmod = msg_reg_nr;
+
+      brw_set_dest(p, insn, dest);
+      brw_set_src0(p, insn, src);
+      brw_set_math_message(p,
+			   insn,
+			   function,
+			   src.type == BRW_REGISTER_TYPE_D,
+			   precision,
+			   data_type);
+   }
+}
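+
+/* As a sketch, a gen6+ reciprocal would be emitted as
+ *
+ *    brw_math(p, dst, BRW_MATH_FUNCTION_INV, 0, src,
+ *             BRW_MATH_DATA_VECTOR, BRW_MATH_PRECISION_FULL);
+ *
+ * where msg_reg_nr, data_type, and precision are only consulted on the
+ * pre-gen6 SEND path.
+ */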
+
+/** Extended math function, float[8].
+ */
+void brw_math2(struct brw_compile *p,
+	       struct brw_reg dest,
+	       unsigned function,
+	       struct brw_reg src0,
+	       struct brw_reg src1)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn = next_insn(p, BRW_OPCODE_MATH);
+
+   assert(dest.file == BRW_GENERAL_REGISTER_FILE ||
+          (brw->gen >= 7 && dest.file == BRW_MESSAGE_REGISTER_FILE));
+   assert(src0.file == BRW_GENERAL_REGISTER_FILE);
+   assert(src1.file == BRW_GENERAL_REGISTER_FILE);
+
+   assert(dest.hstride == BRW_HORIZONTAL_STRIDE_1);
+   if (brw->gen == 6) {
+      assert(src0.hstride == BRW_HORIZONTAL_STRIDE_1);
+      assert(src1.hstride == BRW_HORIZONTAL_STRIDE_1);
+   }
+
+   if (function == BRW_MATH_FUNCTION_INT_DIV_QUOTIENT ||
+       function == BRW_MATH_FUNCTION_INT_DIV_REMAINDER ||
+       function == BRW_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER) {
+      assert(src0.type != BRW_REGISTER_TYPE_F);
+      assert(src1.type != BRW_REGISTER_TYPE_F);
+   } else {
+      assert(src0.type == BRW_REGISTER_TYPE_F);
+      assert(src1.type == BRW_REGISTER_TYPE_F);
+   }
+
+   /* Source modifiers are ignored for extended math instructions on Gen6. */
+   if (brw->gen == 6) {
+      assert(!src0.negate);
+      assert(!src0.abs);
+      assert(!src1.negate);
+      assert(!src1.abs);
+   }
+
+   /* Math is the same ISA format as other opcodes, except that CondModifier
+    * becomes FC[3:0] and ThreadCtrl becomes FC[5:4].
+    */
+   insn->header.destreg__conditionalmod = function;
+
+   brw_set_dest(p, insn, dest);
+   brw_set_src0(p, insn, src0);
+   brw_set_src1(p, insn, src1);
+}
+
+
+/**
+ * Write a block of OWORDs (half a GRF each) from the scratch buffer,
+ * using a constant offset per channel.
+ *
+ * The offset must be aligned to oword size (16 bytes).  Used for
+ * register spilling.
+ */
+void brw_oword_block_write_scratch(struct brw_compile *p,
+				   struct brw_reg mrf,
+				   int num_regs,
+				   unsigned offset)
+{
+   struct brw_context *brw = p->brw;
+   uint32_t msg_control, msg_type;
+   int mlen;
+
+   if (brw->gen >= 6)
+      offset /= 16;
+
+   mrf = retype(mrf, BRW_REGISTER_TYPE_UD);
+
+   if (num_regs == 1) {
+      msg_control = BRW_DATAPORT_OWORD_BLOCK_2_OWORDS;
+      mlen = 2;
+   } else {
+      msg_control = BRW_DATAPORT_OWORD_BLOCK_4_OWORDS;
+      mlen = 3;
+   }
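+
+   /* mlen is the message header (1 MRF) plus the data payload, so a
+    * one-register spill sends 2 MRFs and a two-register spill sends 3.
+    */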
+
+   /* Set up the message header.  This is g0, with g0.2 filled with
+    * the offset.  We don't want to leave our offset around in g0 or
+    * it'll screw up texture samples, so set it up inside the message
+    * reg.
+    */
+   {
+      brw_push_insn_state(p);
+      brw_set_mask_control(p, BRW_MASK_DISABLE);
+      brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+
+      brw_MOV(p, mrf, retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
+
+      /* set message header global offset field (reg 0, element 2) */
+      brw_MOV(p,
+	      retype(brw_vec1_reg(BRW_MESSAGE_REGISTER_FILE,
+				  mrf.nr,
+				  2), BRW_REGISTER_TYPE_UD),
+	      brw_imm_ud(offset));
+
+      brw_pop_insn_state(p);
+   }
+
+   {
+      struct brw_reg dest;
+      struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND);
+      int send_commit_msg;
+      struct brw_reg src_header = retype(brw_vec8_grf(0, 0),
+					 BRW_REGISTER_TYPE_UW);
+
+      if (insn->header.compression_control != BRW_COMPRESSION_NONE) {
+	 insn->header.compression_control = BRW_COMPRESSION_NONE;
+	 src_header = vec16(src_header);
+      }
+      assert(insn->header.predicate_control == BRW_PREDICATE_NONE);
+      insn->header.destreg__conditionalmod = mrf.nr;
+
+      /* Until gen6, writes followed by reads from the same location
+       * are not guaranteed to be ordered unless write_commit is set.
+       * If set, then a no-op write is issued to the destination
+       * register to set a dependency, and a read from the destination
+       * can be used to ensure the ordering.
+       *
+       * For gen6, only writes between different threads need ordering
+       * protection.  Our use of DP writes is all about register
+       * spilling within a thread.
+       */
+      if (brw->gen >= 6) {
+	 dest = retype(vec16(brw_null_reg()), BRW_REGISTER_TYPE_UW);
+	 send_commit_msg = 0;
+      } else {
+	 dest = src_header;
+	 send_commit_msg = 1;
+      }
+
+      brw_set_dest(p, insn, dest);
+      if (brw->gen >= 6) {
+	 brw_set_src0(p, insn, mrf);
+      } else {
+	 brw_set_src0(p, insn, brw_null_reg());
+      }
+
+      if (brw->gen >= 6)
+	 msg_type = GEN6_DATAPORT_WRITE_MESSAGE_OWORD_BLOCK_WRITE;
+      else
+	 msg_type = BRW_DATAPORT_WRITE_MESSAGE_OWORD_BLOCK_WRITE;
+
+      brw_set_dp_write_message(p,
+			       insn,
+			       255, /* binding table index (255=stateless) */
+			       msg_control,
+			       msg_type,
+			       mlen,
+			       true, /* header_present */
+			       0, /* not a render target */
+			       send_commit_msg, /* response_length */
+			       0, /* eot */
+			       send_commit_msg);
+   }
+}
+
+
+/**
+ * Read a block of owords (half a GRF each) from the scratch buffer
+ * using a constant index per channel.
+ *
+ * Offset must be aligned to oword size (16 bytes).  Used for register
+ * spilling.
+ */
+void
+brw_oword_block_read_scratch(struct brw_compile *p,
+			     struct brw_reg dest,
+			     struct brw_reg mrf,
+			     int num_regs,
+			     unsigned offset)
+{
+   struct brw_context *brw = p->brw;
+   uint32_t msg_control;
+   int rlen;
+
+   if (brw->gen >= 6)
+      offset /= 16;
+
+   mrf = retype(mrf, BRW_REGISTER_TYPE_UD);
+   dest = retype(dest, BRW_REGISTER_TYPE_UW);
+
+   if (num_regs == 1) {
+      msg_control = BRW_DATAPORT_OWORD_BLOCK_2_OWORDS;
+      rlen = 1;
+   } else {
+      msg_control = BRW_DATAPORT_OWORD_BLOCK_4_OWORDS;
+      rlen = 2;
+   }
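+
+   /* The response length is just the number of registers being
+    * unspilled; the offset travels in the message header, so only one
+    * MRF is sent.
+    */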
+
+   {
+      brw_push_insn_state(p);
+      brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+      brw_set_mask_control(p, BRW_MASK_DISABLE);
+
+      brw_MOV(p, mrf, retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
+
+      /* set message header global offset field (reg 0, element 2) */
+      brw_MOV(p,
+	      retype(brw_vec1_reg(BRW_MESSAGE_REGISTER_FILE,
+				  mrf.nr,
+				  2), BRW_REGISTER_TYPE_UD),
+	      brw_imm_ud(offset));
+
+      brw_pop_insn_state(p);
+   }
+
+   {
+      struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND);
+
+      assert(insn->header.predicate_control == 0);
+      insn->header.compression_control = BRW_COMPRESSION_NONE;
+      insn->header.destreg__conditionalmod = mrf.nr;
+
+      brw_set_dest(p, insn, dest);	/* UW? */
+      if (brw->gen >= 6) {
+	 brw_set_src0(p, insn, mrf);
+      } else {
+	 brw_set_src0(p, insn, brw_null_reg());
+      }
+
+      brw_set_dp_read_message(p,
+			      insn,
+			      255, /* binding table index (255=stateless) */
+			      msg_control,
+			      BRW_DATAPORT_READ_MESSAGE_OWORD_BLOCK_READ, /* msg_type */
+			      BRW_DATAPORT_READ_TARGET_RENDER_CACHE,
+			      1, /* msg_length */
+                              true, /* header_present */
+			      rlen);
+   }
+}
+
+void
+gen7_block_read_scratch(struct brw_compile *p,
+                        struct brw_reg dest,
+                        int num_regs,
+                        unsigned offset)
+{
+   dest = retype(dest, BRW_REGISTER_TYPE_UW);
+
+   struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND);
+
+   assert(insn->header.predicate_control == BRW_PREDICATE_NONE);
+   insn->header.compression_control = BRW_COMPRESSION_NONE;
+
+   brw_set_dest(p, insn, dest);
+
+   /* The HW requires that the header is present; this is to get the g0.5
+    * scratch offset.
+    */
+   bool header_present = true;
+   brw_set_src0(p, insn, brw_vec8_grf(0, 0));
+
+   brw_set_message_descriptor(p, insn,
+                              GEN7_SFID_DATAPORT_DATA_CACHE,
+                              1, /* mlen: just g0 */
+                              num_regs,
+                              header_present,
+                              false);
+
+   insn->bits3.ud |= GEN7_DATAPORT_SCRATCH_READ;
+
+   assert(num_regs == 1 || num_regs == 2 || num_regs == 4);
+   insn->bits3.ud |= (num_regs - 1) << GEN7_DATAPORT_SCRATCH_NUM_REGS_SHIFT;
+
+   /* According to the docs, offset is "A 12-bit HWord offset into the memory
+    * Immediate Memory buffer as specified by binding table 0xFF."  An HWORD
+    * is 32 bytes, which happens to be the size of a register.
+    */
+   offset /= REG_SIZE;
+   assert(offset < (1 << 12));
+   insn->bits3.ud |= offset;
+}
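+
+/* For example, a two-register unspill at byte offset 64 sets the
+ * num_regs field to 1 and the HWord offset field to 64 / 32 = 2.
+ */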
+
+/**
+ * Read a float[4] vector from the data port Data Cache (const buffer).
+ * Location (in buffer) should be a multiple of 16.
+ * Used for fetching shader constants.
+ */
+void brw_oword_block_read(struct brw_compile *p,
+			  struct brw_reg dest,
+			  struct brw_reg mrf,
+			  uint32_t offset,
+			  uint32_t bind_table_index)
+{
+   struct brw_context *brw = p->brw;
+
+   /* On newer hardware, offset is in units of owords. */
+   if (brw->gen >= 6)
+      offset /= 16;
+
+   mrf = retype(mrf, BRW_REGISTER_TYPE_UD);
+
+   brw_push_insn_state(p);
+   brw_set_predicate_control(p, BRW_PREDICATE_NONE);
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+
+   brw_MOV(p, mrf, retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
+
+   /* set message header global offset field (reg 0, element 2) */
+   brw_MOV(p,
+	   retype(brw_vec1_reg(BRW_MESSAGE_REGISTER_FILE,
+			       mrf.nr,
+			       2), BRW_REGISTER_TYPE_UD),
+	   brw_imm_ud(offset));
+
+   struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND);
+   insn->header.destreg__conditionalmod = mrf.nr;
+
+   /* cast dest to a uword[8] vector */
+   dest = retype(vec8(dest), BRW_REGISTER_TYPE_UW);
+
+   brw_set_dest(p, insn, dest);
+   if (brw->gen >= 6) {
+      brw_set_src0(p, insn, mrf);
+   } else {
+      brw_set_src0(p, insn, brw_null_reg());
+   }
+
+   brw_set_dp_read_message(p,
+			   insn,
+			   bind_table_index,
+			   BRW_DATAPORT_OWORD_BLOCK_1_OWORDLOW,
+			   BRW_DATAPORT_READ_MESSAGE_OWORD_BLOCK_READ,
+			   BRW_DATAPORT_READ_TARGET_DATA_CACHE,
+			   1, /* msg_length */
+                           true, /* header_present */
+			   1); /* response_length (1 reg, 2 owords!) */
+
+   brw_pop_insn_state(p);
+}
+
+
+void brw_fb_WRITE(struct brw_compile *p,
+		  int dispatch_width,
+                  unsigned msg_reg_nr,
+                  struct brw_reg src0,
+                  unsigned msg_control,
+                  unsigned binding_table_index,
+                  unsigned msg_length,
+                  unsigned response_length,
+                  bool eot,
+                  bool header_present)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn;
+   unsigned msg_type;
+   struct brw_reg dest;
+
+   if (dispatch_width == 16)
+      dest = retype(vec16(brw_null_reg()), BRW_REGISTER_TYPE_UW);
+   else
+      dest = retype(vec8(brw_null_reg()), BRW_REGISTER_TYPE_UW);
+
+   if (brw->gen >= 6) {
+      insn = next_insn(p, BRW_OPCODE_SENDC);
+   } else {
+      insn = next_insn(p, BRW_OPCODE_SEND);
+   }
+   insn->header.compression_control = BRW_COMPRESSION_NONE;
+
+   if (brw->gen >= 6) {
+      /* headerless version, just submit color payload */
+      src0 = brw_message_reg(msg_reg_nr);
+
+      msg_type = GEN6_DATAPORT_WRITE_MESSAGE_RENDER_TARGET_WRITE;
+   } else {
+      insn->header.destreg__conditionalmod = msg_reg_nr;
+
+      msg_type = BRW_DATAPORT_WRITE_MESSAGE_RENDER_TARGET_WRITE;
+   }
+
+   brw_set_dest(p, insn, dest);
+   brw_set_src0(p, insn, src0);
+   brw_set_dp_write_message(p,
+			    insn,
+			    binding_table_index,
+			    msg_control,
+			    msg_type,
+			    msg_length,
+			    header_present,
+			    eot, /* last render target write */
+			    response_length,
+			    eot,
+			    0 /* send_commit_msg */);
+}
+
+
+/**
+ * Texture sample instruction.
+ * Note: the msg_type plus msg_length values determine exactly what kind
+ * of sampling operation is performed.  See volume 4, page 161 of docs.
+ */
+void brw_SAMPLE(struct brw_compile *p,
+		struct brw_reg dest,
+		unsigned msg_reg_nr,
+		struct brw_reg src0,
+		unsigned binding_table_index,
+		unsigned sampler,
+		unsigned msg_type,
+		unsigned response_length,
+		unsigned msg_length,
+		unsigned header_present,
+		unsigned simd_mode,
+		unsigned return_format)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn;
+
+   if (msg_reg_nr != -1)
+      gen6_resolve_implied_move(p, &src0, msg_reg_nr);
+
+   insn = next_insn(p, BRW_OPCODE_SEND);
+   insn->header.predicate_control = 0; /* XXX */
+
+   /* From the 965 PRM (volume 4, part 1, section 14.2.41):
+    *
+    *    "Instruction compression is not allowed for this instruction (that
+    *     is, send). The hardware behavior is undefined if this instruction is
+    *     set as compressed. However, compress control can be set to "SecHalf"
+    *     to affect the EMask generation."
+    *
+    * No similar wording is found in later PRMs, but there are examples
+    * utilizing send with SecHalf.  More importantly, SIMD8 sampler messages
+    * are allowed in SIMD16 mode and they could not work without SecHalf.  For
+    * these reasons, we allow BRW_COMPRESSION_2NDHALF here.
+    */
+   if (insn->header.compression_control != BRW_COMPRESSION_2NDHALF)
+      insn->header.compression_control = BRW_COMPRESSION_NONE;
+
+   if (brw->gen < 6)
+      insn->header.destreg__conditionalmod = msg_reg_nr;
+
+   brw_set_dest(p, insn, dest);
+   brw_set_src0(p, insn, src0);
+   brw_set_sampler_message(p, insn,
+                           binding_table_index,
+                           sampler,
+                           msg_type,
+                           response_length,
+                           msg_length,
+                           header_present,
+                           simd_mode,
+                           return_format);
+}
+
+/* All these variables are pretty confusing - we might be better off
+ * using bitmasks and macros for this, in the old style.  Or perhaps
+ * just having the caller instantiate the fields in dword3 itself.
+ */
+void brw_urb_WRITE(struct brw_compile *p,
+		   struct brw_reg dest,
+		   unsigned msg_reg_nr,
+		   struct brw_reg src0,
+                   enum brw_urb_write_flags flags,
+		   unsigned msg_length,
+		   unsigned response_length,
+		   unsigned offset,
+		   unsigned swizzle)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn;
+
+   gen6_resolve_implied_move(p, &src0, msg_reg_nr);
+
+   if (brw->gen == 7 && !(flags & BRW_URB_WRITE_USE_CHANNEL_MASKS)) {
+      /* Enable Channel Masks in the URB_WRITE_HWORD message header */
+      brw_push_insn_state(p);
+      brw_set_access_mode(p, BRW_ALIGN_1);
+      brw_set_mask_control(p, BRW_MASK_DISABLE);
+      brw_OR(p, retype(brw_vec1_reg(BRW_MESSAGE_REGISTER_FILE, msg_reg_nr, 5),
+		       BRW_REGISTER_TYPE_UD),
+	        retype(brw_vec1_grf(0, 5), BRW_REGISTER_TYPE_UD),
+		brw_imm_ud(0xff00));
+      brw_pop_insn_state(p);
+   }
+
+   insn = next_insn(p, BRW_OPCODE_SEND);
+
+   assert(msg_length < BRW_MAX_MRF);
+
+   brw_set_dest(p, insn, dest);
+   brw_set_src0(p, insn, src0);
+   brw_set_src1(p, insn, brw_imm_d(0));
+
+   if (brw->gen < 6)
+      insn->header.destreg__conditionalmod = msg_reg_nr;
+
+   brw_set_urb_message(p,
+		       insn,
+		       flags,
+		       msg_length,
+		       response_length,
+		       offset,
+		       swizzle);
+}
+
+static int
+next_ip(struct brw_compile *p, int ip)
+{
+   struct brw_instruction *insn = (void *)p->store + ip;
+
+   if (insn->header.cmpt_control)
+      return ip + 8;
+   else
+      return ip + 16;
+}
+
+static int
+brw_find_next_block_end(struct brw_compile *p, int start)
+{
+   int ip;
+   void *store = p->store;
+
+   for (ip = next_ip(p, start); ip < p->next_insn_offset; ip = next_ip(p, ip)) {
+      struct brw_instruction *insn = store + ip;
+
+      switch (insn->header.opcode) {
+      case BRW_OPCODE_ENDIF:
+      case BRW_OPCODE_ELSE:
+      case BRW_OPCODE_WHILE:
+      case BRW_OPCODE_HALT:
+	 return ip;
+      }
+   }
+
+   return 0;
+}
+
+/* There is no DO instruction on gen6, so to find the end of the loop
+ * we have to see if the loop is jumping back before our start
+ * instruction.
+ */
+static int
+brw_find_loop_end(struct brw_compile *p, int start)
+{
+   struct brw_context *brw = p->brw;
+   int ip;
+   int scale = 8;
+   void *store = p->store;
+
+   /* Always start after the instruction (such as a WHILE) we're trying to fix
+    * up.
+    */
+   for (ip = next_ip(p, start); ip < p->next_insn_offset; ip = next_ip(p, ip)) {
+      struct brw_instruction *insn = store + ip;
+
+      if (insn->header.opcode == BRW_OPCODE_WHILE) {
+	 int jip = brw->gen == 6 ? insn->bits1.branch_gen6.jump_count
+				   : insn->bits3.break_cont.jip;
+	 if (ip + jip * scale <= start)
+	    return ip;
+      }
+   }
+   assert(!"not reached");
+   return start;
+}
+
+/* After program generation, go back and update the UIP and JIP of
+ * BREAK, CONT, and HALT instructions to their correct locations.
+ */
+void
+brw_set_uip_jip(struct brw_compile *p)
+{
+   struct brw_context *brw = p->brw;
+   int ip;
+   int scale = 8;
+   void *store = p->store;
+
+   if (brw->gen < 6)
+      return;
+
+   for (ip = 0; ip < p->next_insn_offset; ip = next_ip(p, ip)) {
+      struct brw_instruction *insn = store + ip;
+
+      if (insn->header.cmpt_control) {
+	 /* Fixups for compacted BREAK/CONTINUE not supported yet. */
+	 assert(insn->header.opcode != BRW_OPCODE_BREAK &&
+		insn->header.opcode != BRW_OPCODE_CONTINUE &&
+		insn->header.opcode != BRW_OPCODE_HALT);
+	 continue;
+      }
+
+      int block_end_ip = brw_find_next_block_end(p, ip);
+      switch (insn->header.opcode) {
+      case BRW_OPCODE_BREAK:
+         assert(block_end_ip != 0);
+	 insn->bits3.break_cont.jip = (block_end_ip - ip) / scale;
+	 /* Gen7 UIP points to WHILE; Gen6 points just after it */
+	 insn->bits3.break_cont.uip =
+	    (brw_find_loop_end(p, ip) - ip +
+             (brw->gen == 6 ? 16 : 0)) / scale;
+	 break;
+      case BRW_OPCODE_CONTINUE:
+         assert(block_end_ip != 0);
+	 insn->bits3.break_cont.jip = (block_end_ip - ip) / scale;
+	 insn->bits3.break_cont.uip =
+            (brw_find_loop_end(p, ip) - ip) / scale;
+
+	 assert(insn->bits3.break_cont.uip != 0);
+	 assert(insn->bits3.break_cont.jip != 0);
+	 break;
+
+      case BRW_OPCODE_ENDIF:
+         if (block_end_ip == 0)
+            insn->bits3.break_cont.jip = 2;
+         else
+            insn->bits3.break_cont.jip = (block_end_ip - ip) / scale;
+	 break;
+
+      case BRW_OPCODE_HALT:
+	 /* From the Sandy Bridge PRM (volume 4, part 2, section 8.3.19):
+	  *
+	  *    "In case of the halt instruction not inside any conditional
+	  *     code block, the value of <JIP> and <UIP> should be the
+	  *     same. In case of the halt instruction inside conditional code
+	  *     block, the <UIP> should be the end of the program, and the
+	  *     <JIP> should be end of the most inner conditional code block."
+	  *
+	  * The uip will have already been set by whoever set up the
+	  * instruction.
+	  */
+	 if (block_end_ip == 0) {
+	    insn->bits3.break_cont.jip = insn->bits3.break_cont.uip;
+	 } else {
+	    insn->bits3.break_cont.jip = (block_end_ip - ip) / scale;
+	 }
+	 assert(insn->bits3.break_cont.uip != 0);
+	 assert(insn->bits3.break_cont.jip != 0);
+	 break;
+      }
+   }
+}
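+
+/* Worked example: a BREAK three uncompacted instructions (48 bytes)
+ * before its WHILE gets uip = 48 / 8 = 6 on gen7; gen6 adds one more
+ * instruction (16 bytes) since its UIP points just past the WHILE,
+ * giving uip = 8.
+ */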
+
+void brw_ff_sync(struct brw_compile *p,
+		   struct brw_reg dest,
+		   unsigned msg_reg_nr,
+		   struct brw_reg src0,
+		   bool allocate,
+		   unsigned response_length,
+		   bool eot)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn;
+
+   gen6_resolve_implied_move(p, &src0, msg_reg_nr);
+
+   insn = next_insn(p, BRW_OPCODE_SEND);
+   brw_set_dest(p, insn, dest);
+   brw_set_src0(p, insn, src0);
+   brw_set_src1(p, insn, brw_imm_d(0));
+
+   if (brw->gen < 6)
+      insn->header.destreg__conditionalmod = msg_reg_nr;
+
+   brw_set_ff_sync_message(p,
+			   insn,
+			   allocate,
+			   response_length,
+			   eot);
+}
+
+/**
+ * Emit the SEND instruction necessary to generate stream output data on Gen6
+ * (for transform feedback).
+ *
+ * If send_commit_msg is true, this is the last piece of stream output data
+ * from this thread, so send the data as a committed write.  According to the
+ * Sandy Bridge PRM (volume 2 part 1, section 4.5.1):
+ *
+ *   "Prior to End of Thread with a URB_WRITE, the kernel must ensure all
+ *   writes are complete by sending the final write as a committed write."
+ */
+void
+brw_svb_write(struct brw_compile *p,
+              struct brw_reg dest,
+              unsigned msg_reg_nr,
+              struct brw_reg src0,
+              unsigned binding_table_index,
+              bool   send_commit_msg)
+{
+   struct brw_instruction *insn;
+
+   gen6_resolve_implied_move(p, &src0, msg_reg_nr);
+
+   insn = next_insn(p, BRW_OPCODE_SEND);
+   brw_set_dest(p, insn, dest);
+   brw_set_src0(p, insn, src0);
+   brw_set_src1(p, insn, brw_imm_d(0));
+   brw_set_dp_write_message(p, insn,
+                            binding_table_index,
+                            0, /* msg_control: ignored */
+                            GEN6_DATAPORT_WRITE_MESSAGE_STREAMED_VB_WRITE,
+                            1, /* msg_length */
+                            true, /* header_present */
+                            0, /* last_render_target: ignored */
+                            send_commit_msg, /* response_length */
+                            0, /* end_of_thread */
+                            send_commit_msg); /* send_commit_msg */
+}
+
+static void
+brw_set_dp_untyped_atomic_message(struct brw_compile *p,
+                                  struct brw_instruction *insn,
+                                  unsigned atomic_op,
+                                  unsigned bind_table_index,
+                                  unsigned msg_length,
+                                  unsigned response_length,
+                                  bool header_present)
+{
+   if (p->brw->is_haswell) {
+      brw_set_message_descriptor(p, insn, HSW_SFID_DATAPORT_DATA_CACHE_1,
+                                 msg_length, response_length,
+                                 header_present, false);
+
+
+      if (insn->header.access_mode == BRW_ALIGN_1) {
+         if (insn->header.execution_size != BRW_EXECUTE_16)
+            insn->bits3.ud |= 1 << 12; /* SIMD8 mode */
+
+         insn->bits3.gen7_dp.msg_type =
+            HSW_DATAPORT_DC_PORT1_UNTYPED_ATOMIC_OP;
+      } else {
+         insn->bits3.gen7_dp.msg_type =
+            HSW_DATAPORT_DC_PORT1_UNTYPED_ATOMIC_OP_SIMD4X2;
+      }
+
+   } else {
+      brw_set_message_descriptor(p, insn, GEN7_SFID_DATAPORT_DATA_CACHE,
+                                 msg_length, response_length,
+                                 header_present, false);
+
+      insn->bits3.gen7_dp.msg_type = GEN7_DATAPORT_DC_UNTYPED_ATOMIC_OP;
+
+      if (insn->header.execution_size != BRW_EXECUTE_16)
+         insn->bits3.ud |= 1 << 12; /* SIMD8 mode */
+   }
+
+   if (response_length)
+      insn->bits3.ud |= 1 << 13; /* Return data expected */
+
+   insn->bits3.gen7_dp.binding_table_index = bind_table_index;
+   insn->bits3.ud |= atomic_op << 8;
+}
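+
+/* Descriptor layout written above: bits 8..11 hold the atomic op,
+ * bit 12 selects SIMD8 in align1 mode, and bit 13 requests return data.
+ */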
+
+void
+brw_untyped_atomic(struct brw_compile *p,
+                   struct brw_reg dest,
+                   struct brw_reg mrf,
+                   unsigned atomic_op,
+                   unsigned bind_table_index,
+                   unsigned msg_length,
+                   unsigned response_length) {
+   struct brw_instruction *insn = brw_next_insn(p, BRW_OPCODE_SEND);
+
+   brw_set_dest(p, insn, retype(dest, BRW_REGISTER_TYPE_UD));
+   brw_set_src0(p, insn, retype(mrf, BRW_REGISTER_TYPE_UD));
+   brw_set_src1(p, insn, brw_imm_d(0));
+   brw_set_dp_untyped_atomic_message(
+      p, insn, atomic_op, bind_table_index, msg_length, response_length,
+      insn->header.access_mode == BRW_ALIGN_1);
+}
+
+static void
+brw_set_dp_untyped_surface_read_message(struct brw_compile *p,
+                                        struct brw_instruction *insn,
+                                        unsigned bind_table_index,
+                                        unsigned msg_length,
+                                        unsigned response_length,
+                                        bool header_present)
+{
+   const unsigned dispatch_width =
+      (insn->header.execution_size == BRW_EXECUTE_16 ? 16 : 8);
+   const unsigned num_channels = response_length / (dispatch_width / 8);
+
+   if (p->brw->is_haswell) {
+      brw_set_message_descriptor(p, insn, HSW_SFID_DATAPORT_DATA_CACHE_1,
+                                 msg_length, response_length,
+                                 header_present, false);
+
+      insn->bits3.gen7_dp.msg_type = HSW_DATAPORT_DC_PORT1_UNTYPED_SURFACE_READ;
+   } else {
+      brw_set_message_descriptor(p, insn, GEN7_SFID_DATAPORT_DATA_CACHE,
+                                 msg_length, response_length,
+                                 header_present, false);
+
+      insn->bits3.gen7_dp.msg_type = GEN7_DATAPORT_DC_UNTYPED_SURFACE_READ;
+   }
+
+   if (insn->header.access_mode == BRW_ALIGN_1) {
+      if (dispatch_width == 16)
+         insn->bits3.ud |= 1 << 12; /* SIMD16 mode */
+      else
+         insn->bits3.ud |= 2 << 12; /* SIMD8 mode */
+   }
+
+   insn->bits3.gen7_dp.binding_table_index = bind_table_index;
+
+   /* Set mask of 32-bit channels to drop. */
+   insn->bits3.ud |= (0xf & (0xf << num_channels)) << 8;
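+
+   /* E.g. for num_channels == 2: 0xf << 2 is 0x3c, masked to 0xc, so
+    * channels 0 and 1 are kept and channels 2 and 3 are dropped.
+    */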
+}
+
+void
+brw_untyped_surface_read(struct brw_compile *p,
+                         struct brw_reg dest,
+                         struct brw_reg mrf,
+                         unsigned bind_table_index,
+                         unsigned msg_length,
+                         unsigned response_length)
+{
+   struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND);
+
+   brw_set_dest(p, insn, retype(dest, BRW_REGISTER_TYPE_UD));
+   brw_set_src0(p, insn, retype(mrf, BRW_REGISTER_TYPE_UD));
+   brw_set_dp_untyped_surface_read_message(
+      p, insn, bind_table_index, msg_length, response_length,
+      insn->header.access_mode == BRW_ALIGN_1);
+}
+
+static void
+brw_scattered_op(struct brw_compile *p,
+                 struct brw_instruction *insn,
+                 bool read, bool in_dwords)
+{
+    struct brw_context *brw = p->brw;
+    uint32_t simd_mode;
+
+    if (brw->gen >= 7) {
+        int msg_type;
+
+        if (read) {
+            msg_type = (in_dwords) ?
+                GEN7_DATAPORT_DC_DWORD_SCATTERED_READ :
+                GEN7_DATAPORT_DC_BYTE_SCATTERED_READ;
+        } else {
+            msg_type = (in_dwords) ?
+                GEN7_DATAPORT_DC_DWORD_SCATTERED_WRITE :
+                GEN7_DATAPORT_DC_BYTE_SCATTERED_WRITE;
+        }
+
+        insn->bits3.gen7_dp.msg_type = msg_type;
+    } else {
+        assert(in_dwords);
+
+        if (read) {
+            insn->bits3.gen6_dp.msg_type =
+                GEN6_DATAPORT_READ_MESSAGE_DWORD_SCATTERED_READ;
+        } else {
+            insn->bits3.gen6_dp.msg_type =
+                GEN6_DATAPORT_WRITE_MESSAGE_DWORD_SCATTERED_WRITE;
+        }
+    }
+
+    if (in_dwords) {
+        simd_mode = (insn->header.execution_size == BRW_EXECUTE_16) ?
+            (0x3 << 8) : (0x2 << 8);
+    } else {
+        simd_mode = (insn->header.execution_size == BRW_EXECUTE_16) ?
+            (0x1 << 8) : (0x0 << 8);
+    }
+
+    insn->bits3.ud |= simd_mode;
+}
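+
+/* The literals ORed into bits 8..9 above select the message's SIMD mode:
+ * dword scattered messages use 2 (SIMD8) or 3 (SIMD16), while byte
+ * scattered messages use 0 (SIMD8) or 1 (SIMD16).
+ */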
+
+void
+brw_scattered_write(struct brw_compile *p,
+                    struct brw_reg dest,
+                    struct brw_reg mrf,
+                    unsigned bind_table_index,
+                    unsigned msg_length,
+                    bool header_present,
+                    bool in_dwords)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND);
+   enum brw_message_target sfid;
+
+   brw_set_dest(p, insn, retype(dest, BRW_REGISTER_TYPE_UD));
+   brw_set_src0(p, insn, retype(mrf, BRW_REGISTER_TYPE_UD));
+
+   if (brw->gen >= 7)
+       sfid = GEN7_SFID_DATAPORT_DATA_CACHE;
+   else
+       sfid = GEN6_SFID_DATAPORT_RENDER_CACHE;
+
+   brw_set_message_descriptor(p, insn, sfid, msg_length, 0,
+           header_present, false);
+   brw_scattered_op(p, insn, false, in_dwords);
+
+   insn->bits3.gen6_dp.binding_table_index = bind_table_index;
+}
+
+void
+brw_scattered_read(struct brw_compile *p,
+                   struct brw_reg dest,
+                   struct brw_reg mrf,
+                   unsigned bind_table_index,
+                   unsigned msg_length,
+                   bool header_present,
+                   bool in_dwords)
+{
+   struct brw_context *brw = p->brw;
+   struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND);
+   enum brw_message_target sfid;
+   unsigned rlen;
+
+   brw_set_dest(p, insn, retype(dest, BRW_REGISTER_TYPE_UD));
+   brw_set_src0(p, insn, retype(mrf, BRW_REGISTER_TYPE_UD));
+
+   if (brw->gen >= 7)
+       sfid = GEN7_SFID_DATAPORT_DATA_CACHE;
+   else
+       sfid = GEN6_SFID_DATAPORT_RENDER_CACHE;
+
+   rlen = (insn->header.execution_size == BRW_EXECUTE_16) ? 2 : 1;
+
+   brw_set_message_descriptor(p, insn, sfid, msg_length, rlen,
+           header_present, false);
+   brw_scattered_op(p, insn, true, in_dwords);
+
+   insn->bits3.gen6_dp.binding_table_index = bind_table_index;
+}
+
+// LunarG : TODO - shader time??
+/**
+ * This instruction is generated as a single-channel align1 instruction by
+ * both the VS and FS stages when using INTEL_DEBUG=shader_time.
+ *
+ * We can't use the typed atomic op in the FS because that has the execution
+ * mask ANDed with the pixel mask, but we just want to write the one dword for
+ * all the pixels.
+ *
+ * We don't use the SIMD4x2 atomic ops in the VS because we want to just write
+ * one u32.  So we use the same untyped atomic write message as the pixel
+ * shader.
+ *
+ * The untyped atomic operation requires a BUFFER surface type with RAW
+ * format, and is only accessible through the legacy DATA_CACHE dataport
+ * messages.
+ */
+//void brw_shader_time_add(struct brw_compile *p,
+//                         struct brw_reg payload,
+//                         uint32_t surf_index)
+//{
+//   struct brw_context *brw = p->brw;
+//   assert(brw->gen >= 7);
+
+//   brw_push_insn_state(p);
+//   brw_set_access_mode(p, BRW_ALIGN_1);
+//   brw_set_mask_control(p, BRW_MASK_DISABLE);
+//   struct brw_instruction *send = brw_next_insn(p, BRW_OPCODE_SEND);
+//   brw_pop_insn_state(p);
+
+//   /* We use brw_vec1_reg and unmasked because we want to increment the given
+//    * offset only once.
+//    */
+//   brw_set_dest(p, send, brw_vec1_reg(BRW_ARCHITECTURE_REGISTER_FILE,
+//                                      BRW_ARF_NULL, 0));
+//   brw_set_src0(p, send, brw_vec1_reg(payload.file,
+//                                      payload.nr, 0));
+//   brw_set_dp_untyped_atomic_message(p, send, BRW_AOP_ADD, surf_index,
+//                                     2 /* message length */,
+//                                     0 /* response length */,
+//                                     false /* header present */);
+//}
diff --git a/icd/intel/compiler/pipeline/brw_fs.cpp b/icd/intel/compiler/pipeline/brw_fs.cpp
new file mode 100644
index 0000000..df7762b
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs.cpp
@@ -0,0 +1,3399 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/** @file brw_fs.cpp
+ *
+ * This file drives the GLSL IR -> LIR translation, contains the
+ * optimizations on the LIR, and drives the generation of native code
+ * from the LIR.
+ */
+
+extern "C" {
+
+#include <sys/types.h>
+
+#include "main/hash_table.h"
+#include "main/macros.h"
+#include "icd-utils.h"
+#include "main/shaderobj.h"
+//#include "main/fbobject.h"  // LunarG: Remove
+#include "program/prog_parameter.h"
+#include "program/prog_print.h"
+#include "program/register_allocate.h"
+#include "program/sampler.h"
+#include "program/hash_table.h"
+#include "brw_context.h"
+#include "brw_eu.h"
+#include "brw_wm.h"
+}
+#include "brw_fs.h"
+#include "brw_dead_control_flow.h"
+#include "main/uniforms.h"
+#include "brw_fs_live_variables.h"
+#include "glsl/glsl_types.h"
+#include "glsl/glsl_parser_extras.h"
+
+void
+fs_inst::init()
+{
+   memset(this, 0, sizeof(*this));
+   this->conditional_mod = BRW_CONDITIONAL_NONE;
+
+   this->dst = reg_undef;
+   this->src[0] = reg_undef;
+   this->src[1] = reg_undef;
+   this->src[2] = reg_undef;
+
+   /* This will be the case for almost all instructions. */
+   this->regs_written = 1;
+
+   this->writes_accumulator = false;
+}
+
+fs_inst::fs_inst()
+{
+   init();
+   this->opcode = BRW_OPCODE_NOP;
+}
+
+fs_inst::fs_inst(enum opcode opcode)
+{
+   init();
+   this->opcode = opcode;
+}
+
+fs_inst::fs_inst(enum opcode opcode, fs_reg dst)
+{
+   init();
+   this->opcode = opcode;
+   this->dst = dst;
+
+   if (dst.file == GRF)
+      assert(dst.reg_offset >= 0);
+}
+
+fs_inst::fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0)
+{
+   init();
+   this->opcode = opcode;
+   this->dst = dst;
+   this->src[0] = src0;
+
+   if (dst.file == GRF)
+      assert(dst.reg_offset >= 0);
+   if (src[0].file == GRF)
+      assert(src[0].reg_offset >= 0);
+}
+
+fs_inst::fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1)
+{
+   init();
+   this->opcode = opcode;
+   this->dst = dst;
+   this->src[0] = src0;
+   this->src[1] = src1;
+
+   if (dst.file == GRF)
+      assert(dst.reg_offset >= 0);
+   if (src[0].file == GRF)
+      assert(src[0].reg_offset >= 0);
+   if (src[1].file == GRF)
+      assert(src[1].reg_offset >= 0);
+}
+
+fs_inst::fs_inst(enum opcode opcode, fs_reg dst,
+		 fs_reg src0, fs_reg src1, fs_reg src2)
+{
+   init();
+   this->opcode = opcode;
+   this->dst = dst;
+   this->src[0] = src0;
+   this->src[1] = src1;
+   this->src[2] = src2;
+
+   if (dst.file == GRF)
+      assert(dst.reg_offset >= 0);
+   if (src[0].file == GRF)
+      assert(src[0].reg_offset >= 0);
+   if (src[1].file == GRF)
+      assert(src[1].reg_offset >= 0);
+   if (src[2].file == GRF)
+      assert(src[2].reg_offset >= 0);
+}
+
+#define ALU1(op)                                                        \
+   fs_inst *                                                            \
+   fs_visitor::op(fs_reg dst, fs_reg src0)                              \
+   {                                                                    \
+      return new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0);          \
+   }
+
+#define ALU2(op)                                                        \
+   fs_inst *                                                            \
+   fs_visitor::op(fs_reg dst, fs_reg src0, fs_reg src1)                 \
+   {                                                                    \
+      return new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, src1);    \
+   }
+
+#define ALU2_ACC(op)                                                    \
+   fs_inst *                                                            \
+   fs_visitor::op(fs_reg dst, fs_reg src0, fs_reg src1)                 \
+   {                                                                    \
+      fs_inst *inst = new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, src1);\
+      inst->writes_accumulator = true;                                  \
+      return inst;                                                      \
+   }
+
+#define ALU3(op)                                                        \
+   fs_inst *                                                            \
+   fs_visitor::op(fs_reg dst, fs_reg src0, fs_reg src1, fs_reg src2)    \
+   {                                                                    \
+      return new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, src1, src2);\
+   }
+
+ALU1(NOT)
+ALU1(MOV)
+ALU1(FRC)
+ALU1(RNDD)
+ALU1(RNDE)
+ALU1(RNDZ)
+ALU2(ADD)
+ALU2(MUL)
+ALU2_ACC(MACH)
+ALU2(AND)
+ALU2(OR)
+ALU2(XOR)
+ALU2(SHL)
+ALU2(SHR)
+ALU2(ASR)
+ALU3(LRP)
+ALU1(BFREV)
+ALU3(BFE)
+ALU2(BFI1)
+ALU3(BFI2)
+ALU1(FBH)
+ALU1(FBL)
+ALU1(CBIT)
+ALU3(MAD)
+ALU2_ACC(ADDC)
+ALU2_ACC(SUBB)
+ALU2(SEL)
+ALU2(MAC)
+
+/** Gen4 predicated IF. */
+fs_inst *
+fs_visitor::IF(uint32_t predicate)
+{
+   fs_inst *inst = new(mem_ctx) fs_inst(BRW_OPCODE_IF);
+   inst->predicate = predicate;
+   return inst;
+}
+
+/** Gen6 IF with embedded comparison. */
+fs_inst *
+fs_visitor::IF(fs_reg src0, fs_reg src1, uint32_t condition)
+{
+   assert(brw->gen == 6);
+   fs_inst *inst = new(mem_ctx) fs_inst(BRW_OPCODE_IF,
+                                        reg_null_d, src0, src1);
+   inst->conditional_mod = condition;
+   return inst;
+}
+
+/**
+ * CMP: Sets the low bit of the destination channels with the result
+ * of the comparison, while the upper bits are undefined, and updates
+ * the flag register with the packed 16 bits of the result.
+ */
+fs_inst *
+fs_visitor::CMP(fs_reg dst, fs_reg src0, fs_reg src1, uint32_t condition)
+{
+   fs_inst *inst;
+
+   /* Take the instruction:
+    *
+    * CMP null<d> src0<f> src1<f>
+    *
+    * Original gen4 does type conversion to the destination type before
+    * comparison, producing garbage results for floating point comparisons.
+    * gen5 does the comparison on the execution type (resolved source types),
+    * so dst type doesn't matter.  gen6 does comparison and then uses the
+    * result as if it were the dst type with no conversion, which happens to
+    * mostly work out for float-interpreted-as-int since our comparisons are
+    * for >0, =0, <0.
+    */
+   if (brw->gen == 4) {
+      dst.type = src0.type;
+      if (dst.file == HW_REG)
+	 dst.fixed_hw_reg.type = dst.type;
+   }
+
+   resolve_ud_negate(&src0);
+   resolve_ud_negate(&src1);
+
+   inst = new(mem_ctx) fs_inst(BRW_OPCODE_CMP, dst, src0, src1);
+   inst->conditional_mod = condition;
+
+   return inst;
+}
+
+exec_list
+fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_reg &dst,
+                                       const fs_reg &surf_index,
+                                       const fs_reg &varying_offset,
+                                       uint32_t const_offset)
+{
+   exec_list instructions;
+   fs_inst *inst;
+
+   /* We have our constant surface use a pitch of 4 bytes, so our index can
+    * be any component of a vector, and then we load 4 contiguous
+    * components starting from that.
+    *
+    * We break down the const_offset into a portion added to the variable
+    * offset and a portion done using reg_offset, which means that if you
+    * have GLSL using something like "uniform vec4 a[20]; gl_FragColor =
+    * a[i]", we'll temporarily generate 4 vec4 loads from offset i * 4, and
+    * CSE can later notice that those loads are all the same and eliminate
+    * the redundant ones.
+    */
+   fs_reg vec4_offset = fs_reg(this, glsl_type::int_type);
+   instructions.push_tail(ADD(vec4_offset,
+                              varying_offset, const_offset & ~3));
+
+   int scale = 1;
+   if (brw->gen == 4 && dispatch_width == 8) {
+      /* Pre-gen5, we can either use a SIMD8 message that requires (header,
+       * u, v, r) as parameters, or we can just use the SIMD16 message
+       * consisting of (header, u).  We choose the second, at the cost of a
+       * longer return length.
+       */
+      scale = 2;
+   }
+
+   enum opcode op;
+   if (brw->gen >= 7)
+      op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7;
+   else
+      op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD;
+   fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4 * scale), dst.type);
+   inst = new(mem_ctx) fs_inst(op, vec4_result, surf_index, vec4_offset);
+   inst->regs_written = 4 * scale;
+   instructions.push_tail(inst);
+
+   if (brw->gen < 7) {
+      inst->base_mrf = 13;
+      inst->header_present = true;
+      if (brw->gen == 4)
+         inst->mlen = 3;
+      else
+         inst->mlen = 1 + dispatch_width / 8;
+   }
+
+   vec4_result.reg_offset += (const_offset & 3) * scale;
+   instructions.push_tail(MOV(dst, vec4_result));
+
+   return instructions;
+}
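+
+/* As a sketch: for "uniform vec4 a[20]; ... a[i]" with const_offset 0,
+ * this emits an ADD to form the vec4-aligned offset, one pull-constant
+ * load writing 4 (or 8, when scale == 2) GRFs, and a MOV that picks out
+ * the requested component via reg_offset.
+ */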
+
+/**
+ * A helper for MOV generation for fixing up broken hardware SEND dependency
+ * handling.
+ */
+fs_inst *
+fs_visitor::DEP_RESOLVE_MOV(int grf)
+{
+   fs_inst *inst = MOV(brw_null_reg(), fs_reg(GRF, grf, BRW_REGISTER_TYPE_F));
+
+   inst->ir = NULL;
+   inst->annotation = "send dependency resolve";
+
+   /* The caller always wants uncompressed to emit the minimal extra
+    * dependencies, and to avoid having to deal with aligning its regs to 2.
+    */
+   inst->force_uncompressed = true;
+
+   return inst;
+}
+
+bool
+fs_inst::equals(fs_inst *inst) const
+{
+   return (opcode == inst->opcode &&
+           dst.equals(inst->dst) &&
+           src[0].equals(inst->src[0]) &&
+           src[1].equals(inst->src[1]) &&
+           src[2].equals(inst->src[2]) &&
+           saturate == inst->saturate &&
+           predicate == inst->predicate &&
+           conditional_mod == inst->conditional_mod &&
+           mlen == inst->mlen &&
+           base_mrf == inst->base_mrf &&
+           sampler == inst->sampler &&
+           target == inst->target &&
+           eot == inst->eot &&
+           header_present == inst->header_present &&
+           shadow_compare == inst->shadow_compare &&
+           offset == inst->offset);
+}
+
+bool
+fs_inst::overwrites_reg(const fs_reg &reg) const
+{
+   return (reg.file == dst.file &&
+           reg.reg == dst.reg &&
+           reg.reg_offset >= dst.reg_offset  &&
+           reg.reg_offset < dst.reg_offset + regs_written);
+}
+
+bool
+fs_inst::is_send_from_grf() const
+{
+   return (opcode == FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7 ||
+           // LunarG : TODO - shader time??
+           //opcode == SHADER_OPCODE_SHADER_TIME_ADD ||
+           (opcode == FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD &&
+            src[1].file == GRF) ||
+           (is_tex() && src[0].file == GRF));
+}
+
+bool
+fs_visitor::can_do_source_mods(fs_inst *inst)
+{
+   if (brw->gen == 6 && inst->is_math())
+      return false;
+
+   if (inst->is_send_from_grf())
+      return false;
+
+   if (!inst->can_do_source_mods())
+      return false;
+
+   return true;
+}
+
+void
+fs_reg::init()
+{
+   memset(this, 0, sizeof(*this));
+   stride = 1;
+}
+
+/** Generic unset register constructor. */
+fs_reg::fs_reg()
+{
+   init();
+   this->file = BAD_FILE;
+}
+
+/** Immediate value constructor. */
+fs_reg::fs_reg(float f)
+{
+   init();
+   this->file = IMM;
+   this->type = BRW_REGISTER_TYPE_F;
+   this->imm.f = f;
+}
+
+/** Immediate value constructor. */
+fs_reg::fs_reg(int32_t i)
+{
+   init();
+   this->file = IMM;
+   this->type = BRW_REGISTER_TYPE_D;
+   this->imm.i = i;
+}
+
+/** Immediate value constructor. */
+fs_reg::fs_reg(uint32_t u)
+{
+   init();
+   this->file = IMM;
+   this->type = BRW_REGISTER_TYPE_UD;
+   this->imm.u = u;
+}
+
+/** Fixed brw_reg. */
+fs_reg::fs_reg(struct brw_reg fixed_hw_reg)
+{
+   init();
+   this->file = HW_REG;
+   this->fixed_hw_reg = fixed_hw_reg;
+   this->type = fixed_hw_reg.type;
+}
+
+bool
+fs_reg::equals(const fs_reg &r) const
+{
+   return (file == r.file &&
+           reg == r.reg &&
+           reg_offset == r.reg_offset &&
+           subreg_offset == r.subreg_offset &&
+           type == r.type &&
+           negate == r.negate &&
+           abs == r.abs &&
+           !reladdr && !r.reladdr &&
+           memcmp(&fixed_hw_reg, &r.fixed_hw_reg,
+                  sizeof(fixed_hw_reg)) == 0 &&
+           stride == r.stride &&
+           imm.u == r.imm.u);
+}
+
+fs_reg &
+fs_reg::apply_stride(unsigned stride)
+{
+   assert((this->stride * stride) <= 4 &&
+          (is_power_of_two(stride) || stride == 0) &&
+          file != HW_REG && file != IMM);
+   this->stride *= stride;
+   return *this;
+}
+
+fs_reg &
+fs_reg::set_smear(unsigned subreg)
+{
+   assert(file != HW_REG && file != IMM);
+   subreg_offset = subreg * type_sz(type);
+   stride = 0;
+   return *this;
+}
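+
+/* set_smear() produces a <0,1,0>-style scalar region: with stride 0 the
+ * single element at subreg is replicated across the execution width.
+ */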
+
+bool
+fs_reg::is_contiguous() const
+{
+   return stride == 1;
+}
+
+bool
+fs_reg::is_zero() const
+{
+   if (file != IMM)
+      return false;
+
+   return type == BRW_REGISTER_TYPE_F ? imm.f == 0.0 : imm.i == 0;
+}
+
+bool
+fs_reg::is_one() const
+{
+   if (file != IMM)
+      return false;
+
+   return type == BRW_REGISTER_TYPE_F ? imm.f == 1.0 : imm.i == 1;
+}
+
+bool
+fs_reg::is_null() const
+{
+   return file == HW_REG &&
+          fixed_hw_reg.file == BRW_ARCHITECTURE_REGISTER_FILE &&
+          fixed_hw_reg.nr == BRW_ARF_NULL;
+}
+
+bool
+fs_reg::is_valid_3src() const
+{
+   return file == GRF || file == UNIFORM;
+}
+
+bool
+fs_reg::is_accumulator() const
+{
+   return file == HW_REG &&
+          fixed_hw_reg.file == BRW_ARCHITECTURE_REGISTER_FILE &&
+          fixed_hw_reg.nr == BRW_ARF_ACCUMULATOR;
+}
+
+int
+fs_visitor::type_size(const struct glsl_type *type)
+{
+   unsigned int size, i;
+
+   switch (type->base_type) {
+   case GLSL_TYPE_UINT:
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_BOOL:
+      return type->components();
+   case GLSL_TYPE_ARRAY:
+      return type_size(type->fields.array) * type->length;
+   case GLSL_TYPE_STRUCT:
+      size = 0;
+      for (i = 0; i < type->length; i++) {
+	 size += type_size(type->fields.structure[i].type);
+      }
+      return size;
+   case GLSL_TYPE_SAMPLER:
+      /* Samplers take up no register space, since they're baked in at
+       * link time.
+       */
+      return 0;
+   case GLSL_TYPE_ATOMIC_UINT:
+      return 0;
+   case GLSL_TYPE_IMAGE:
+   case GLSL_TYPE_VOID:
+   case GLSL_TYPE_ERROR:
+   case GLSL_TYPE_INTERFACE:
+      assert(!"not reached");
+      break;
+   }
+
+   return 0;
+}
+
+
+fs_reg
+fs_visitor::get_timestamp()
+{
+   assert(brw->gen >= 7);
+
+   fs_reg ts = fs_reg(retype(brw_vec1_reg(BRW_ARCHITECTURE_REGISTER_FILE,
+                                          BRW_ARF_TIMESTAMP,
+                                          0),
+                             BRW_REGISTER_TYPE_UD));
+
+   fs_reg dst = fs_reg(this, glsl_type::uint_type);
+
+   fs_inst *mov = emit(MOV(dst, ts));
+   /* We want to read the 3 fields we care about (mostly field 0, but also 2)
+    * even if it's not enabled in the dispatch.
+    */
+   mov->force_writemask_all = true;
+   mov->force_uncompressed = true;
+
+   /* The caller wants the low 32 bits of the timestamp.  Since it's running
+    * at the GPU clock rate of ~1.2 GHz, it will roll over every ~3 seconds,
+    * which is plenty of time for our purposes.  It is identical across the
+    * EUs, but since it's tracking GPU core speed it will increment at a
+    * varying rate as render P-states change.
+    *
+    * The caller could also check if render P-states have changed (or anything
+    * else that might disrupt timing) by setting smear to 2 and checking if
+    * that field is != 0.
+    */
+   dst.set_smear(0);
+
+   return dst;
+}
+
+// LunarG: TODO - support shader time?
+//void
+//fs_visitor::emit_shader_time_begin()
+//{
+//   current_annotation = "shader time start";
+//   shader_start_time = get_timestamp();
+//}
+
+//void
+//fs_visitor::emit_shader_time_end()
+//{
+//   current_annotation = "shader time end";
+
+//   enum shader_time_shader_type type, written_type, reset_type;
+//   if (dispatch_width == 8) {
+//      type = ST_FS8;
+//      written_type = ST_FS8_WRITTEN;
+//      reset_type = ST_FS8_RESET;
+//   } else {
+//      assert(dispatch_width == 16);
+//      type = ST_FS16;
+//      written_type = ST_FS16_WRITTEN;
+//      reset_type = ST_FS16_RESET;
+//   }
+
+//   fs_reg shader_end_time = get_timestamp();
+
+//   /* Check that there weren't any timestamp reset events (assuming these
+//    * were the only two timestamp reads that happened).
+//    */
+//   fs_reg reset = shader_end_time;
+//   reset.set_smear(2);
+//   fs_inst *test = emit(AND(reg_null_d, reset, fs_reg(1u)));
+//   test->conditional_mod = BRW_CONDITIONAL_Z;
+//   emit(IF(BRW_PREDICATE_NORMAL));
+
+//   push_force_uncompressed();
+//   fs_reg start = shader_start_time;
+//   start.negate = true;
+//   fs_reg diff = fs_reg(this, glsl_type::uint_type);
+//   emit(ADD(diff, start, shader_end_time));
+
+//   /* If there were no instructions between the two timestamp gets, the diff
+//    * is 2 cycles.  Remove that overhead, so I can forget about that when
+//    * trying to determine the time taken for single instructions.
+//    */
+//   emit(ADD(diff, diff, fs_reg(-2u)));
+
+//   emit_shader_time_write(type, diff);
+//   emit_shader_time_write(written_type, fs_reg(1u));
+//   emit(BRW_OPCODE_ELSE);
+//   emit_shader_time_write(reset_type, fs_reg(1u));
+//   emit(BRW_OPCODE_ENDIF);
+
+//   pop_force_uncompressed();
+//}
+
+//void
+//fs_visitor::emit_shader_time_write(enum shader_time_shader_type type,
+//                                   fs_reg value)
+//{
+//   int shader_time_index =
+//      brw_get_shader_time_index(brw, shader_prog, &fp->Base, type);
+//   fs_reg offset = fs_reg(shader_time_index * SHADER_TIME_STRIDE);
+
+//   fs_reg payload;
+//   if (dispatch_width == 8)
+//      payload = fs_reg(this, glsl_type::uvec2_type);
+//   else
+//      payload = fs_reg(this, glsl_type::uint_type);
+
+//   emit(new(mem_ctx) fs_inst(SHADER_OPCODE_SHADER_TIME_ADD,
+//                             fs_reg(), payload, offset, value));
+//}
+
+void
+fs_visitor::vfail(const char *format, va_list va)
+{
+   char *msg;
+
+   if (failed)
+      return;
+
+   failed = true;
+
+   msg = ralloc_vasprintf(mem_ctx, format, va);
+   msg = ralloc_asprintf(mem_ctx, "FS compile failed: %s\n", msg);
+
+   this->fail_msg = msg;
+
+   if (INTEL_DEBUG & DEBUG_WM) {
+      fprintf(stderr, "%s",  msg);
+   }
+}
+
+void
+fs_visitor::fail(const char *format, ...)
+{
+   va_list va;
+
+   va_start(va, format);
+   vfail(format, va);
+   va_end(va);
+}
+
+/**
+ * Mark this program as impossible to compile in SIMD16 mode.
+ *
+ * During the SIMD8 compile (which happens first), we can detect and flag
+ * things that are unsupported in SIMD16 mode, so the compiler can skip
+ * the SIMD16 compile altogether.
+ *
+ * During a SIMD16 compile (if one happens anyway), this just calls fail().
+ */
+void
+fs_visitor::no16(const char *format, ...)
+{
+   va_list va;
+
+   va_start(va, format);
+
+   if (dispatch_width == 16) {
+      vfail(format, va);
+   } else {
+      simd16_unsupported = true;
+
+      if (brw->perf_debug) {
+         if (no16_msg)
+            ralloc_vasprintf_append(&no16_msg, format, va);
+         else
+            no16_msg = ralloc_vasprintf(mem_ctx, format, va);
+      }
+   }
+
+   va_end(va);
+}
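+
+/* Usage sketch for no16(): during the SIMD8 compile, a call such as
+ * no16("SIMD16 INTDIV unsupported\n") (as the INTDIV path in emit_math()
+ * below makes) just sets simd16_unsupported and, under perf_debug, records
+ * the message; the same call made from a SIMD16 compile fails that compile
+ * outright via vfail().
+ */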
+
+fs_inst *
+fs_visitor::emit(enum opcode opcode)
+{
+   return emit(new(mem_ctx) fs_inst(opcode));
+}
+
+fs_inst *
+fs_visitor::emit(enum opcode opcode, fs_reg dst)
+{
+   return emit(new(mem_ctx) fs_inst(opcode, dst));
+}
+
+fs_inst *
+fs_visitor::emit(enum opcode opcode, fs_reg dst, fs_reg src0)
+{
+   return emit(new(mem_ctx) fs_inst(opcode, dst, src0));
+}
+
+fs_inst *
+fs_visitor::emit(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1)
+{
+   return emit(new(mem_ctx) fs_inst(opcode, dst, src0, src1));
+}
+
+fs_inst *
+fs_visitor::emit(enum opcode opcode, fs_reg dst,
+                 fs_reg src0, fs_reg src1, fs_reg src2)
+{
+   return emit(new(mem_ctx) fs_inst(opcode, dst, src0, src1, src2));
+}
+
+void
+fs_visitor::push_force_uncompressed()
+{
+   force_uncompressed_stack++;
+}
+
+void
+fs_visitor::pop_force_uncompressed()
+{
+   force_uncompressed_stack--;
+   assert(force_uncompressed_stack >= 0);
+}
+
+/**
+ * Returns true if the instruction has a flag that means it won't
+ * update an entire destination register.
+ *
+ * For example, dead code elimination and live variable analysis want to know
+ * when a write to a variable screens off any preceding values that were in
+ * it.
+ */
+bool
+fs_inst::is_partial_write() const
+{
+   return ((this->predicate && this->opcode != BRW_OPCODE_SEL) ||
+           this->force_uncompressed ||
+           this->force_sechalf || !this->dst.is_contiguous());
+}
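+
+/* Illustrative cases for the test above: a MOV with BRW_PREDICATE_NORMAL
+ * (not a SEL) leaves unselected channels alone, and a force_uncompressed
+ * instruction under SIMD16 dispatch writes only the first half of a
+ * two-register destination -- both therefore count as partial writes.
+ */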
+
+int
+fs_inst::regs_read(fs_visitor *v, int arg) const
+{
+   if (is_tex() && arg == 0 && src[0].file == GRF) {
+      if (v->dispatch_width == 16)
+	 return (mlen + 1) / 2;
+      else
+	 return mlen;
+   }
+   return 1;
+}
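+
+/* Worked example for the SIMD16 branch above (hypothetical payload): a
+ * texture send with mlen == 5 reads (5 + 1) / 2 == 3 virtual GRFs, since
+ * in SIMD16 each virtual GRF covers two hardware registers and mlen counts
+ * hardware registers, rounded up here.
+ */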
+
+bool
+fs_inst::reads_flag() const
+{
+   return predicate;
+}
+
+bool
+fs_inst::writes_flag() const
+{
+   return (conditional_mod && opcode != BRW_OPCODE_SEL) ||
+          opcode == FS_OPCODE_MOV_DISPATCH_TO_FLAGS;
+}
+
+/**
+ * Returns how many MRFs an FS opcode will write over.
+ *
+ * Note that this is not the 0 or 1 implied writes in an actual gen
+ * instruction -- the FS opcodes often generate MOVs in addition.
+ */
+int
+fs_visitor::implied_mrf_writes(fs_inst *inst)
+{
+   if (inst->mlen == 0)
+      return 0;
+
+   if (inst->base_mrf == -1)
+      return 0;
+
+   switch (inst->opcode) {
+   case SHADER_OPCODE_RCP:
+   case SHADER_OPCODE_RSQ:
+   case SHADER_OPCODE_SQRT:
+   case SHADER_OPCODE_EXP2:
+   case SHADER_OPCODE_LOG2:
+   case SHADER_OPCODE_SIN:
+   case SHADER_OPCODE_COS:
+      return 1 * dispatch_width / 8;
+   case SHADER_OPCODE_POW:
+   case SHADER_OPCODE_INT_QUOTIENT:
+   case SHADER_OPCODE_INT_REMAINDER:
+      return 2 * dispatch_width / 8;
+   case SHADER_OPCODE_TEX:
+   case FS_OPCODE_TXB:
+   case SHADER_OPCODE_TXD:
+   case SHADER_OPCODE_TXF:
+   case SHADER_OPCODE_TXF_CMS:
+   case SHADER_OPCODE_TXF_MCS:
+   case SHADER_OPCODE_TG4:
+   case SHADER_OPCODE_TG4_OFFSET:
+   case SHADER_OPCODE_TXL:
+   case SHADER_OPCODE_TXS:
+   case SHADER_OPCODE_LOD:
+      return 1;
+   case FS_OPCODE_FB_WRITE:
+      return 2;
+   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
+   case SHADER_OPCODE_GEN4_SCRATCH_READ:
+      return 1;
+   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD:
+      return inst->mlen;
+   case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
+      return 2;
+   case SHADER_OPCODE_UNTYPED_ATOMIC:
+   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+      return 0;
+   default:
+      assert(!"not reached");
+      return inst->mlen;
+   }
+}
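+
+/* Quick arithmetic check for the math cases above: SHADER_OPCODE_POW in
+ * SIMD16 implies 2 * 16 / 8 == 4 MRF writes, and in SIMD8 it implies
+ * 2 * 8 / 8 == 2, matching its two-operand message payload.
+ */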
+
+int
+fs_visitor::virtual_grf_alloc(int size)
+{
+   if (virtual_grf_array_size <= virtual_grf_count) {
+      if (virtual_grf_array_size == 0)
+	 virtual_grf_array_size = 16;
+      else
+	 virtual_grf_array_size *= 2;
+      virtual_grf_sizes = reralloc(mem_ctx, virtual_grf_sizes, int,
+				   virtual_grf_array_size);
+   }
+   virtual_grf_sizes[virtual_grf_count] = size;
+   return virtual_grf_count++;
+}
+
+/** Fixed HW reg constructor. */
+fs_reg::fs_reg(enum register_file file, int reg)
+{
+   init();
+   this->file = file;
+   this->reg = reg;
+   this->type = BRW_REGISTER_TYPE_F;
+}
+
+/** Fixed HW reg constructor. */
+fs_reg::fs_reg(enum register_file file, int reg, uint32_t type)
+{
+   init();
+   this->file = file;
+   this->reg = reg;
+   this->type = type;
+}
+
+/** Automatic reg constructor. */
+fs_reg::fs_reg(class fs_visitor *v, const struct glsl_type *type)
+{
+   init();
+
+   this->file = GRF;
+   this->reg = v->virtual_grf_alloc(v->type_size(type));
+   this->reg_offset = 0;
+   this->type = brw_type_for_base_type(type);
+}
+
+fs_reg *
+fs_visitor::variable_storage(ir_variable *var)
+{
+   return (fs_reg *)hash_table_find(this->variable_ht, var);
+}
+
+void
+import_uniforms_callback(const void *key,
+			 void *data,
+			 void *closure)
+{
+   struct hash_table *dst_ht = (struct hash_table *)closure;
+   const fs_reg *reg = (const fs_reg *)data;
+
+   if (reg->file != UNIFORM)
+      return;
+
+   hash_table_insert(dst_ht, data, key);
+}
+
+/* For SIMD16, we need to follow from the uniform setup of SIMD8 dispatch.
+ * This brings in those uniform definitions.
+ */
+void
+fs_visitor::import_uniforms(fs_visitor *v)
+{
+   hash_table_call_foreach(v->variable_ht,
+			   import_uniforms_callback,
+			   variable_ht);
+   this->push_constant_loc = v->push_constant_loc;
+   this->pull_constant_loc = v->pull_constant_loc;
+   this->uniforms = v->uniforms;
+   this->param_size = v->param_size;
+}
+
+/* Our support for uniforms is piggy-backed on the struct
+ * gl_fragment_program, because that's where the values actually
+ * get stored, rather than in some global gl_shader_program uniform
+ * store.
+ */
+void
+fs_visitor::setup_uniform_values(ir_variable *ir)
+{
+   int namelen = strlen(ir->name);
+
+   /* The data for our (non-builtin) uniforms is stored in a series of
+    * gl_uniform_driver_storage structs for each subcomponent that
+    * glGetUniformLocation() could name.  We know it's been set up in the same
+    * order we'd walk the type, so walk the list of storage and find anything
+    * with our name, or the prefix of a component that starts with our name.
+    */
+   unsigned params_before = uniforms;
+   for (unsigned u = 0; u < shader_prog->NumUserUniformStorage; u++) {
+      struct gl_uniform_storage *storage = &shader_prog->UniformStorage[u];
+
+      if (strncmp(ir->name, storage->name, namelen) != 0 ||
+          (storage->name[namelen] != 0 &&
+           storage->name[namelen] != '.' &&
+           storage->name[namelen] != '[')) {
+         continue;
+      }
+
+      unsigned slots = storage->type->component_slots();
+      if (storage->array_elements)
+         slots *= storage->array_elements;
+
+      for (unsigned i = 0; i < slots; i++) {
+         stage_prog_data->param[uniforms++] = &storage->storage[i].f;
+      }
+   }
+
+   /* Make sure we actually initialized the right amount of stuff here. */
+   assert(params_before + ir->type->component_slots() == uniforms);
+   (void)params_before;
+}
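+
+/* Sketch of the prefix match above, with made-up names: for
+ * "uniform vec4 foo[2]", a storage entry named "foo" (or "foo[0]",
+ * depending on how the linker spelled it) passes the check because the
+ * character after the prefix is '\0' or '[', and contributes
+ * component_slots() == 4 times array_elements == 2, i.e. 8 params; an
+ * unrelated "foobar" is rejected since 'b' is none of '\0', '.' or '['.
+ */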
+
+
+/* Our support for builtin uniforms is even scarier than non-builtin.
+ * It sits on top of the PROG_STATE_VAR parameters that are
+ * automatically updated from GL context state.
+ */
+void
+fs_visitor::setup_builtin_uniform_values(ir_variable *ir)
+{
+   const ir_state_slot *const slots = ir->state_slots;
+   assert(ir->state_slots != NULL);
+
+   for (unsigned int i = 0; i < ir->num_state_slots; i++) {
+      /* This state reference has already been setup by ir_to_mesa, but we'll
+       * get the same index back here.
+       */
+      int index = _mesa_add_state_reference(this->fp->Base.Parameters,
+					    (gl_state_index *)slots[i].tokens);
+
+      /* Add each of the unique swizzles of the element as a parameter.
+       * This'll end up matching the expected layout of the
+       * array/matrix/structure we're trying to fill in.
+       */
+      int last_swiz = -1;
+      for (unsigned int j = 0; j < 4; j++) {
+	 int swiz = GET_SWZ(slots[i].swizzle, j);
+	 if (swiz == last_swiz)
+	    break;
+	 last_swiz = swiz;
+
+         stage_prog_data->param[uniforms++] =
+            &fp->Base.Parameters->ParameterValues[index][swiz].f;
+      }
+   }
+}
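+
+/* Worked example of the swizzle walk above (hypothetical state slot): a
+ * slot with SWIZZLE_XYZW yields swiz values 0, 1, 2, 3 and adds four
+ * params, while SWIZZLE_XXXX adds one and breaks at j == 1 because the
+ * second GET_SWZ result repeats last_swiz.
+ */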
+
+fs_reg *
+fs_visitor::emit_fragcoord_interpolation(ir_variable *ir)
+{
+   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
+   fs_reg wpos = *reg;
+   bool flip = !ir->data.origin_upper_left;
+
+   /* gl_FragCoord.x */
+   if (ir->data.pixel_center_integer) {
+      emit(MOV(wpos, this->pixel_x));
+   } else {
+      emit(ADD(wpos, this->pixel_x, fs_reg(0.5f)));
+   }
+   wpos.reg_offset++;
+
+   /* gl_FragCoord.y */
+   if (!flip && ir->data.pixel_center_integer) {
+      emit(MOV(wpos, this->pixel_y));
+   } else {
+      fs_reg pixel_y = this->pixel_y;
+      float offset = (ir->data.pixel_center_integer ? 0.0 : 0.5);
+
+      if (flip) {
+	 pixel_y.negate = true;
+	 offset += c->key.drawable_height - 1.0;
+      }
+
+      emit(ADD(wpos, pixel_y, fs_reg(offset)));
+   }
+   wpos.reg_offset++;
+
+   /* gl_FragCoord.z */
+   if (brw->gen >= 6) {
+      emit(MOV(wpos, fs_reg(brw_vec8_grf(c->source_depth_reg, 0))));
+   } else {
+      emit(FS_OPCODE_LINTERP, wpos,
+           this->delta_x[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC],
+           this->delta_y[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC],
+           interp_reg(VARYING_SLOT_POS, 2));
+   }
+   wpos.reg_offset++;
+
+   /* gl_FragCoord.w: Already set up in emit_interpolation */
+   emit(BRW_OPCODE_MOV, wpos, this->wpos_w);
+
+   return reg;
+}
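+
+/* Numeric check of the Y flip above (a hypothetical 480-pixel-high
+ * drawable, half-integer centers, lower-left origin): the ADD computes
+ * -pixel_y + (0.5 + 480 - 1.0), so pixel_y == 0 lands on
+ * gl_FragCoord.y == 479.5 and pixel_y == 479 lands on 0.5.
+ */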
+
+fs_inst *
+fs_visitor::emit_linterp(const fs_reg &attr, const fs_reg &interp,
+                         glsl_interp_qualifier interpolation_mode,
+                         bool is_centroid, bool is_sample)
+{
+   brw_wm_barycentric_interp_mode barycoord_mode;
+   if (brw->gen >= 6) {
+      if (is_centroid) {
+         if (interpolation_mode == INTERP_QUALIFIER_SMOOTH)
+            barycoord_mode = BRW_WM_PERSPECTIVE_CENTROID_BARYCENTRIC;
+         else
+            barycoord_mode = BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC;
+      } else if (is_sample) {
+         if (interpolation_mode == INTERP_QUALIFIER_SMOOTH)
+            barycoord_mode = BRW_WM_PERSPECTIVE_SAMPLE_BARYCENTRIC;
+         else
+            barycoord_mode = BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC;
+      } else {
+         if (interpolation_mode == INTERP_QUALIFIER_SMOOTH)
+            barycoord_mode = BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC;
+         else
+            barycoord_mode = BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC;
+      }
+   } else {
+      /* On Ironlake and below, there is only one interpolation mode.
+       * Centroid interpolation doesn't mean anything on this hardware --
+       * there is no multisampling.
+       */
+      barycoord_mode = BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC;
+   }
+   return emit(FS_OPCODE_LINTERP, attr,
+               this->delta_x[barycoord_mode],
+               this->delta_y[barycoord_mode], interp);
+}
+
+fs_reg *
+fs_visitor::emit_general_interpolation(ir_variable *ir)
+{
+   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
+   reg->type = brw_type_for_base_type(ir->type->get_scalar_type());
+   fs_reg attr = *reg;
+
+   unsigned int array_elements;
+   const glsl_type *type;
+
+   if (ir->type->is_array()) {
+      array_elements = ir->type->length;
+      if (array_elements == 0) {
+	 fail("dereferenced array '%s' has length 0\n", ir->name);
+      }
+      type = ir->type->fields.array;
+   } else {
+      array_elements = 1;
+      type = ir->type;
+   }
+
+   glsl_interp_qualifier interpolation_mode =
+      ir->determine_interpolation_mode(c->key.flat_shade);
+
+   int location = ir->data.location;
+   for (unsigned int i = 0; i < array_elements; i++) {
+      for (unsigned int j = 0; j < type->matrix_columns; j++) {
+	 if (c->prog_data.urb_setup[location] == -1) {
+	    /* If there's no incoming setup data for this slot, don't
+	     * emit interpolation for it.
+	     */
+	    attr.reg_offset += type->vector_elements;
+	    location++;
+	    continue;
+	 }
+
+	 if (interpolation_mode == INTERP_QUALIFIER_FLAT) {
+	    /* Constant interpolation (flat shading) case. The SF has
+	     * handed us defined values in only the constant offset
+	     * field of the setup reg.
+	     */
+	    for (unsigned int k = 0; k < type->vector_elements; k++) {
+	       struct brw_reg interp = interp_reg(location, k);
+	       interp = suboffset(interp, 3);
+               interp.type = reg->type;
+	       emit(FS_OPCODE_CINTERP, attr, fs_reg(interp));
+	       attr.reg_offset++;
+	    }
+	 } else {
+	    /* Smooth/noperspective interpolation case. */
+	    for (unsigned int k = 0; k < type->vector_elements; k++) {
+               struct brw_reg interp = interp_reg(location, k);
+               emit_linterp(attr, fs_reg(interp), interpolation_mode,
+                            ir->data.centroid && !c->key.persample_shading,
+                            ir->data.sample || c->key.persample_shading);
+               if (brw->needs_unlit_centroid_workaround && ir->data.centroid) {
+                  /* Get the pixel/sample mask into f0 so that we know
+                   * which pixels are lit.  Then, for each channel that is
+                   * unlit, replace the centroid data with non-centroid
+                   * data.
+                   */
+                  emit(FS_OPCODE_MOV_DISPATCH_TO_FLAGS);
+                  fs_inst *inst = emit_linterp(attr, fs_reg(interp),
+                                               interpolation_mode,
+                                               false, false);
+                  inst->predicate = BRW_PREDICATE_NORMAL;
+                  inst->predicate_inverse = true;
+               }
+               if (brw->gen < 6 && interpolation_mode == INTERP_QUALIFIER_SMOOTH) {
+                  emit(BRW_OPCODE_MUL, attr, attr, this->pixel_w);
+               }
+	       attr.reg_offset++;
+	    }
+
+	 }
+	 location++;
+      }
+   }
+
+   return reg;
+}
+
+fs_reg *
+fs_visitor::emit_frontfacing_interpolation(ir_variable *ir)
+{
+   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
+
+   /* The frontfacing comes in as a bit in the thread payload. */
+   if (brw->gen >= 6) {
+      emit(BRW_OPCODE_ASR, *reg,
+	   fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)),
+	   fs_reg(15));
+      emit(BRW_OPCODE_NOT, *reg, *reg);
+      emit(BRW_OPCODE_AND, *reg, *reg, fs_reg(1));
+   } else {
+      struct brw_reg r1_6ud = retype(brw_vec1_grf(1, 6), BRW_REGISTER_TYPE_UD);
+      /* bit 31 is "primitive is back face", so checking < (1 << 31) gives
+       * us front face
+       */
+      emit(CMP(*reg, fs_reg(r1_6ud), fs_reg(1u << 31), BRW_CONDITIONAL_L));
+      emit(BRW_OPCODE_AND, *reg, *reg, fs_reg(1u));
+   }
+
+   return reg;
+}
+
+void
+fs_visitor::compute_sample_position(fs_reg dst, fs_reg int_sample_pos)
+{
+   assert(dst.type == BRW_REGISTER_TYPE_F);
+
+   if (c->key.compute_pos_offset) {
+      /* Convert int_sample_pos to floating point */
+      emit(MOV(dst, int_sample_pos));
+      /* Scale to the range [0, 1] */
+      emit(MUL(dst, dst, fs_reg(1 / 16.0f)));
+   } else {
+      /* From ARB_sample_shading specification:
+       * "When rendering to a non-multisample buffer, or if multisample
+       *  rasterization is disabled, gl_SamplePosition will always be
+       *  (0.5, 0.5)."
+       */
+      emit(MOV(dst, fs_reg(0.5f)));
+   }
+}
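+
+/* Worked example for the scale above: the payload offsets are subpixel
+ * positions in 1/16ths (0..15, one per byte), so an integer sample
+ * position of 8 becomes 8 * (1 / 16.0) == 0.5, the pixel center, after
+ * the MOV/MUL pair.
+ */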
+
+fs_reg *
+fs_visitor::emit_samplepos_setup(ir_variable *ir)
+{
+   assert(brw->gen >= 6);
+   assert(ir->type == glsl_type::vec2_type);
+
+   this->current_annotation = "compute sample position";
+   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
+   fs_reg pos = *reg;
+   fs_reg int_sample_x = fs_reg(this, glsl_type::int_type);
+   fs_reg int_sample_y = fs_reg(this, glsl_type::int_type);
+
+   /* WM will be run in MSDISPMODE_PERSAMPLE. So, only one of SIMD8 or SIMD16
+    * mode will be enabled.
+    *
+    * From the Ivy Bridge PRM, volume 2 part 1, page 344:
+    * R31.1:0         Position Offset X/Y for Slot[3:0]
+    * R31.3:2         Position Offset X/Y for Slot[7:4]
+    * .....
+    *
+    * The X, Y sample positions come in as bytes in the thread payload. So, read
+    * the positions using vstride=16, width=8, hstride=2.
+    */
+   struct brw_reg sample_pos_reg =
+      stride(retype(brw_vec1_grf(c->sample_pos_reg, 0),
+                    BRW_REGISTER_TYPE_B), 16, 8, 2);
+
+   fs_inst *inst = emit(MOV(int_sample_x, fs_reg(sample_pos_reg)));
+   if (dispatch_width == 16) {
+      inst->force_uncompressed = true;
+      inst = emit(MOV(half(int_sample_x, 1),
+                      fs_reg(suboffset(sample_pos_reg, 16))));
+      inst->force_sechalf = true;
+   }
+   /* Compute gl_SamplePosition.x */
+   compute_sample_position(pos, int_sample_x);
+   pos.reg_offset++;
+   inst = emit(MOV(int_sample_y, fs_reg(suboffset(sample_pos_reg, 1))));
+   if (dispatch_width == 16) {
+      inst->force_uncompressed = true;
+      inst = emit(MOV(half(int_sample_y, 1),
+                      fs_reg(suboffset(sample_pos_reg, 17))));
+      inst->force_sechalf = true;
+   }
+   /* Compute gl_SamplePosition.y */
+   compute_sample_position(pos, int_sample_y);
+   return reg;
+}
+
+fs_reg *
+fs_visitor::emit_sampleid_setup(ir_variable *ir)
+{
+   assert(brw->gen >= 6);
+
+   this->current_annotation = "compute sample id";
+   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
+
+   if (c->key.compute_sample_id) {
+      fs_reg t1 = fs_reg(this, glsl_type::int_type);
+      fs_reg t2 = fs_reg(this, glsl_type::int_type);
+      t2.type = BRW_REGISTER_TYPE_UW;
+
+      /* The PS will be run in MSDISPMODE_PERSAMPLE. For example with
+       * 8x multisampling, subspan 0 will represent sample N (where N
+       * is 0, 2, 4 or 6), subspan 1 will represent sample 1, 3, 5 or
+       * 7. We can find the value of N by looking at R0.0 bits 7:6
+       * ("Starting Sample Pair Index (SSPI)") and multiplying by two
+       * (since samples are always delivered in pairs). That is, we
+       * compute 2*((R0.0 & 0xc0) >> 6) == (R0.0 & 0xc0) >> 5. Then
+       * we need to add N to the sequence (0, 0, 0, 0, 1, 1, 1, 1) in
+       * case of SIMD8 and sequence (0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2,
+       * 2, 3, 3, 3, 3) in case of SIMD16. We compute this sequence by
+       * populating a temporary variable with the sequence (0, 1, 2, 3),
+       * and then reading from it using vstride=1, width=4, hstride=0.
+       * These computations hold good for 4x multisampling as well.
+       *
+       * For 2x MSAA and SIMD16, we want to use the sequence (0, 1, 0, 1):
+       * the first four slots are sample 0 of subspan 0; the next four
+       * are sample 1 of subspan 0; the third group is sample 0 of
+       * subspan 1, and finally sample 1 of subspan 1.
+       */
+      fs_inst *inst;
+      inst = emit(BRW_OPCODE_AND, t1,
+                  fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UD)),
+                  fs_reg(0xc0));
+      inst->force_writemask_all = true;
+      inst = emit(BRW_OPCODE_SHR, t1, t1, fs_reg(5));
+      inst->force_writemask_all = true;
+      /* This works for both SIMD8 and SIMD16 */
+      inst = emit(MOV(t2, brw_imm_v(c->key.persample_2x ? 0x1010 : 0x3210)));
+      inst->force_writemask_all = true;
+      /* This special instruction takes care of setting vstride=1,
+       * width=4, hstride=0 of t2 during an ADD instruction.
+       */
+      emit(FS_OPCODE_SET_SAMPLE_ID, *reg, t1, t2);
+   } else {
+      /* As per GL_ARB_sample_shading specification:
+       * "When rendering to a non-multisample buffer, or if multisample
+       *  rasterization is disabled, gl_SampleID will always be zero."
+       */
+      emit(BRW_OPCODE_MOV, *reg, fs_reg(0));
+   }
+
+   return reg;
+}
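+
+/* Worked example of the SSPI math above (hypothetical R0.0 contents): if
+ * R0.0 bits 7:6 hold 2, then (R0.0 & 0xc0) >> 5 == 4 == 2 * 2, the
+ * starting sample N; the vstride=1, width=4, hstride=0 read of the 0x3210
+ * immediate contributes (0, 0, 0, 0, 1, 1, 1, 1), so the SIMD8 ADD
+ * produces sample IDs (4, 4, 4, 4, 5, 5, 5, 5).
+ */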
+
+fs_reg *
+fs_visitor::emit_samplemaskin_setup(ir_variable *ir)
+{
+   assert(brw->gen >= 7);
+   this->current_annotation = "compute gl_SampleMaskIn";
+   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
+   emit(MOV(*reg, fs_reg(retype(brw_vec8_grf(c->sample_mask_reg, 0), BRW_REGISTER_TYPE_D))));
+   return reg;
+}
+
+fs_reg
+fs_visitor::fix_math_operand(fs_reg src)
+{
+   /* Can't do hstride == 0 args on gen6 math, so expand it out. We
+    * might be able to do better by doing execsize = 1 math and then
+    * expanding that result out, but we would need to be careful with
+    * masking.
+    *
+    * The hardware ignores source modifiers (negate and abs) on math
+    * instructions, so we also move to a temp to set those up.
+    */
+   if (brw->gen == 6 && src.file != UNIFORM && src.file != IMM &&
+       !src.abs && !src.negate)
+      return src;
+
+   /* Gen7 relaxes most of the above restrictions, but still can't use IMM
+    * operands to math
+    */
+   if (brw->gen >= 7 && src.file != IMM)
+      return src;
+
+   fs_reg expanded = fs_reg(this, glsl_type::float_type);
+   expanded.type = src.type;
+   emit(BRW_OPCODE_MOV, expanded, src);
+   return expanded;
+}
+
+fs_inst *
+fs_visitor::emit_math(enum opcode opcode, fs_reg dst, fs_reg src)
+{
+   switch (opcode) {
+   case SHADER_OPCODE_RCP:
+   case SHADER_OPCODE_RSQ:
+   case SHADER_OPCODE_SQRT:
+   case SHADER_OPCODE_EXP2:
+   case SHADER_OPCODE_LOG2:
+   case SHADER_OPCODE_SIN:
+   case SHADER_OPCODE_COS:
+      break;
+   default:
+      assert(!"not reached: bad math opcode");
+      return NULL;
+   }
+
+   /* Can't do hstride == 0 args to gen6 math, so expand it out.  We
+    * might be able to do better by doing execsize = 1 math and then
+    * expanding that result out, but we would need to be careful with
+    * masking.
+    *
+    * Gen 6 hardware ignores source modifiers (negate and abs) on math
+    * instructions, so we also move to a temp to set those up.
+    */
+   if (brw->gen == 6 || brw->gen == 7)
+      src = fix_math_operand(src);
+
+   fs_inst *inst = emit(opcode, dst, src);
+
+   if (brw->gen < 6) {
+      inst->base_mrf = 2;
+      inst->mlen = dispatch_width / 8;
+   }
+
+   return inst;
+}
+
+fs_inst *
+fs_visitor::emit_math(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1)
+{
+   int base_mrf = 2;
+   fs_inst *inst;
+
+   switch (opcode) {
+   case SHADER_OPCODE_INT_QUOTIENT:
+   case SHADER_OPCODE_INT_REMAINDER:
+      if (brw->gen >= 7)
+	 no16("SIMD16 INTDIV unsupported\n");
+      break;
+   case SHADER_OPCODE_POW:
+      break;
+   default:
+      assert(!"not reached: unsupported binary math opcode.");
+      return NULL;
+   }
+
+   if (brw->gen >= 8) {
+      inst = emit(opcode, dst, src0, src1);
+   } else if (brw->gen >= 6) {
+      src0 = fix_math_operand(src0);
+      src1 = fix_math_operand(src1);
+
+      inst = emit(opcode, dst, src0, src1);
+   } else {
+      /* From the Ironlake PRM, Volume 4, Part 1, Section 6.1.13
+       * "Message Payload":
+       *
+       * "Operand0[7].  For the INT DIV functions, this operand is the
+       *  denominator."
+       *  ...
+       * "Operand1[7].  For the INT DIV functions, this operand is the
+       *  numerator."
+       */
+      bool is_int_div = opcode != SHADER_OPCODE_POW;
+      fs_reg &op0 = is_int_div ? src1 : src0;
+      fs_reg &op1 = is_int_div ? src0 : src1;
+
+      emit(BRW_OPCODE_MOV, fs_reg(MRF, base_mrf + 1, op1.type), op1);
+      inst = emit(opcode, dst, op0, reg_null_f);
+
+      inst->base_mrf = base_mrf;
+      inst->mlen = 2 * dispatch_width / 8;
+   }
+   return inst;
+}
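+
+/* Sketch of the pre-gen6 operand swap above (made-up operands): for
+ * INT_QUOTIENT dst = a / b, op0 becomes b and op1 becomes a, so the
+ * numerator a is MOVed to MRF base_mrf + 1 and the denominator b goes as
+ * the inline source, matching the Operand0/Operand1 layout quoted from
+ * the Ironlake PRM.
+ */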
+
+void
+fs_visitor::assign_curb_setup()
+{
+   if (dispatch_width == 8) {
+      c->prog_data.first_curbe_grf = c->nr_payload_regs;
+   } else {
+      c->prog_data.first_curbe_grf_16 = c->nr_payload_regs;
+   }
+
+   c->prog_data.curb_read_length = ALIGN(stage_prog_data->nr_params, 8) / 8;
+
+   /* Map the offsets in the UNIFORM file to fixed HW regs. */
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      for (unsigned int i = 0; i < 3; i++) {
+	 if (inst->src[i].file == UNIFORM) {
+            int uniform_nr = inst->src[i].reg + inst->src[i].reg_offset;
+            int constant_nr;
+            if (uniform_nr >= 0 && uniform_nr < (int) uniforms) {
+               constant_nr = push_constant_loc[uniform_nr];
+            } else {
+               /* Section 5.11 of the OpenGL 4.1 spec says:
+                * "Out-of-bounds reads return undefined values, which include
+                *  values from other variables of the active program or zero."
+                * Just return the first push constant.
+                */
+               constant_nr = 0;
+            }
+
+	    struct brw_reg brw_reg = brw_vec1_grf(c->nr_payload_regs +
+						  constant_nr / 8,
+						  constant_nr % 8);
+
+	    inst->src[i].file = HW_REG;
+	    inst->src[i].fixed_hw_reg = byte_offset(
+               retype(brw_reg, inst->src[i].type),
+               inst->src[i].subreg_offset);
+	 }
+      }
+   }
+}
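+
+/* Worked example of the push-constant addressing above (hypothetical
+ * numbers): with nr_payload_regs == 2, a source whose constant_nr is 10
+ * maps to GRF (2 + 10 / 8) == 3, component 10 % 8 == 2, i.e. g3.2, since
+ * eight float components fit per register.
+ */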
+
+void
+fs_visitor::calculate_urb_setup()
+{
+   for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
+      c->prog_data.urb_setup[i] = -1;
+   }
+
+   int urb_next = 0;
+   /* Figure out where each of the incoming setup attributes lands. */
+   if (brw->gen >= 6) {
+      if (_mesa_bitcount_64(fp->Base.InputsRead &
+                            BRW_FS_VARYING_INPUT_MASK) <= 16) {
+         /* The SF/SBE pipeline stage can do arbitrary rearrangement of the
+          * first 16 varying inputs, so we can put them wherever we want.
+          * Just put them in order.
+          *
+          * This is useful because it means that (a) inputs not used by the
+          * fragment shader won't take up valuable register space, and (b) we
+          * won't have to recompile the fragment shader if it gets paired with
+          * a different vertex (or geometry) shader.
+          */
+         for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
+            if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
+                BITFIELD64_BIT(i)) {
+               c->prog_data.urb_setup[i] = urb_next++;
+            }
+         }
+      } else {
+         /* We have enough input varyings that the SF/SBE pipeline stage can't
+          * arbitrarily rearrange them to suit our whim; we have to put them
+          * in an order that matches the output of the previous pipeline stage
+          * (geometry or vertex shader).
+          */
+         struct brw_vue_map prev_stage_vue_map;
+         brw_compute_vue_map(brw, &prev_stage_vue_map,
+                             c->key.input_slots_valid);
+         int first_slot = 2 * BRW_SF_URB_ENTRY_READ_OFFSET;
+         assert(prev_stage_vue_map.num_slots <= first_slot + 32);
+         for (int slot = first_slot; slot < prev_stage_vue_map.num_slots;
+              slot++) {
+            int varying = prev_stage_vue_map.slot_to_varying[slot];
+            /* Note that varying == BRW_VARYING_SLOT_COUNT when a slot is
+             * unused.
+             */
+            if (varying != BRW_VARYING_SLOT_COUNT &&
+                (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
+                 BITFIELD64_BIT(varying))) {
+               c->prog_data.urb_setup[varying] = slot - first_slot;
+            }
+         }
+         urb_next = prev_stage_vue_map.num_slots - first_slot;
+      }
+   } else {
+      /* FINISHME: The sf doesn't map VS->FS inputs for us very well. */
+      for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
+         /* Point size is packed into the header, not as a general attribute */
+         if (i == VARYING_SLOT_PSIZ)
+            continue;
+
+         if (c->key.input_slots_valid & BITFIELD64_BIT(i)) {
+            /* The back color slot is skipped when the front color is
+             * also written to.  In addition, some slots can be
+             * written in the vertex shader and not read in the
+             * fragment shader.  So the register number must always be
+             * incremented, mapped or not.
+             */
+            if (_mesa_varying_slot_in_fs((gl_varying_slot) i))
+               c->prog_data.urb_setup[i] = urb_next;
+            urb_next++;
+         }
+      }
+
+      /*
+       * It's an FS-only attribute, and the SF thread already did the
+       * interpolation for it.  So, count it here, too.
+       *
+       * See compile_sf_prog() for more info.
+       */
+      if (fp->Base.InputsRead & BITFIELD64_BIT(VARYING_SLOT_PNTC))
+         c->prog_data.urb_setup[VARYING_SLOT_PNTC] = urb_next++;
+   }
+
+   c->prog_data.num_varying_inputs = urb_next;
+}
+
+void
+fs_visitor::assign_urb_setup()
+{
+   int urb_start = c->nr_payload_regs + c->prog_data.curb_read_length;
+
+   /* Offset all the urb_setup[] index by the actual position of the
+    * setup regs, now that the location of the constants has been chosen.
+    */
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      if (inst->opcode == FS_OPCODE_LINTERP) {
+	 assert(inst->src[2].file == HW_REG);
+	 inst->src[2].fixed_hw_reg.nr += urb_start;
+      }
+
+      if (inst->opcode == FS_OPCODE_CINTERP) {
+	 assert(inst->src[0].file == HW_REG);
+	 inst->src[0].fixed_hw_reg.nr += urb_start;
+      }
+   }
+
+   /* Each attribute is 4 setup channels, each of which is half a reg. */
+   this->first_non_payload_grf =
+      urb_start + c->prog_data.num_varying_inputs * 2;
+}
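+
+/* Quick check of the channel math above (hypothetical count): with 6
+ * varying inputs the setup data occupies 6 * 2 == 12 GRFs past urb_start,
+ * because each attribute's 4 setup channels take half a register apiece.
+ */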
+
+/**
+ * Split large virtual GRFs into separate components if we can.
+ *
+ * This is mostly duplicated with what brw_fs_vector_splitting does,
+ * but that's really conservative because it's afraid of doing
+ * splitting that doesn't result in real progress after the rest of
+ * the optimization phases, which would cause infinite looping in
+ * optimization.  We can do it once here, safely.  This also has the
+ * opportunity to split interpolated values, or maybe even uniforms,
+ * which we don't have at the IR level.
+ *
+ * We want to split, because virtual GRFs are what we register
+ * allocate and spill (due to contiguousness requirements for some
+ * instructions), and they're what we naturally generate in the
+ * codegen process, but most virtual GRFs don't actually need to be
+ * contiguous sets of GRFs.  If we split, we'll end up with reduced
+ * live intervals and better dead code elimination and coalescing.
+ */
+void
+fs_visitor::split_virtual_grfs()
+{
+   int num_vars = this->virtual_grf_count;
+   bool split_grf[num_vars];
+   int new_virtual_grf[num_vars];
+
+   /* Try to split anything > 0 sized. */
+   for (int i = 0; i < num_vars; i++) {
+      if (this->virtual_grf_sizes[i] != 1)
+	 split_grf[i] = true;
+      else
+	 split_grf[i] = false;
+   }
+
+   if (brw->has_pln &&
+       this->delta_x[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC].file == GRF) {
+      /* PLN opcodes rely on the delta_xy being contiguous.  We only have to
+       * check this for BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC, because prior to
+       * Gen6, that was the only supported interpolation mode, and since Gen6,
+       * delta_x and delta_y are in fixed hardware registers.
+       */
+      split_grf[this->delta_x[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC].reg] =
+         false;
+   }
+
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      /* If there's a SEND message that requires contiguous destination
+       * registers, no splitting is allowed.
+       */
+      if (inst->regs_written > 1) {
+	 split_grf[inst->dst.reg] = false;
+      }
+
+      /* If we're sending from a GRF, don't split it, on the assumption that
+       * the send is reading the whole thing.
+       */
+      if (inst->is_send_from_grf()) {
+         for (int i = 0; i < 3; i++) {
+            if (inst->src[i].file == GRF) {
+               split_grf[inst->src[i].reg] = false;
+            }
+         }
+      }
+   }
+
+   /* Allocate new space for split regs.  Note that the virtual
+    * numbers will be contiguous.
+    */
+   for (int i = 0; i < num_vars; i++) {
+      if (split_grf[i]) {
+	 new_virtual_grf[i] = virtual_grf_alloc(1);
+	 for (int j = 2; j < this->virtual_grf_sizes[i]; j++) {
+	    int reg = virtual_grf_alloc(1);
+	    assert(reg == new_virtual_grf[i] + j - 1);
+	    (void) reg;
+	 }
+	 this->virtual_grf_sizes[i] = 1;
+      }
+   }
+
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      if (inst->dst.file == GRF &&
+	  split_grf[inst->dst.reg] &&
+	  inst->dst.reg_offset != 0) {
+	 inst->dst.reg = (new_virtual_grf[inst->dst.reg] +
+			  inst->dst.reg_offset - 1);
+	 inst->dst.reg_offset = 0;
+      }
+      for (int i = 0; i < 3; i++) {
+	 if (inst->src[i].file == GRF &&
+	     split_grf[inst->src[i].reg] &&
+	     inst->src[i].reg_offset != 0) {
+	    inst->src[i].reg = (new_virtual_grf[inst->src[i].reg] +
+				inst->src[i].reg_offset - 1);
+	    inst->src[i].reg_offset = 0;
+	 }
+      }
+   }
+   invalidate_live_intervals();
+}
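+
+/* Before/after sketch for the split above (made-up register numbers): a
+ * size-4 virtual GRF vgrf5 keeps component 0 (its size drops to 1), while
+ * components 1..3 move to three freshly allocated size-1 registers; an
+ * access at reg_offset 2 is rewritten to the second new register at
+ * reg_offset 0, shrinking live intervals to individual components.
+ */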
+
+/**
+ * Remove unused virtual GRFs and compact the virtual_grf_* arrays.
+ *
+ * During code generation, we create tons of temporary variables, many of
+ * which get immediately killed and are never used again.  Yet, in later
+ * optimization and analysis passes, such as compute_live_intervals, we need
+ * to loop over all the virtual GRFs.  Compacting them can save a lot of
+ * overhead.
+ */
+void
+fs_visitor::compact_virtual_grfs()
+{
+   /* Mark which virtual GRFs are used, and count how many. */
+   int remap_table[this->virtual_grf_count];
+   memset(remap_table, -1, sizeof(remap_table));
+
+   foreach_list(node, &this->instructions) {
+      const fs_inst *inst = (const fs_inst *) node;
+
+      if (inst->dst.file == GRF)
+         remap_table[inst->dst.reg] = 0;
+
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file == GRF)
+            remap_table[inst->src[i].reg] = 0;
+      }
+   }
+
+   /* In addition to registers used in instructions, fs_visitor keeps
+    * direct references to certain special values which must be patched:
+    */
+   struct {
+      fs_reg *reg;
+      unsigned count;
+   } special[] = {
+      { &frag_depth, 1 },
+      { &pixel_x, 1 },
+      { &pixel_y, 1 },
+      { &pixel_w, 1 },
+      { &wpos_w, 1 },
+      { &dual_src_output, 1 },
+      { outputs, ARRAY_SIZE(outputs) },
+      { delta_x, ARRAY_SIZE(delta_x) },
+      { delta_y, ARRAY_SIZE(delta_y) },
+      { &sample_mask, 1 },
+      { &shader_start_time, 1 },
+   };
+
+   /* Treat all special values as used, to be conservative */
+   for (unsigned i = 0; i < ARRAY_SIZE(special); i++) {
+      for (unsigned j = 0; j < special[i].count; j++) {
+         if (special[i].reg[j].file == GRF)
+            remap_table[special[i].reg[j].reg] = 0;
+      }
+   }
+
+   /* Compact the GRF arrays. */
+   int new_index = 0;
+   for (int i = 0; i < this->virtual_grf_count; i++) {
+      if (remap_table[i] != -1) {
+         remap_table[i] = new_index;
+         virtual_grf_sizes[new_index] = virtual_grf_sizes[i];
+         invalidate_live_intervals();
+         ++new_index;
+      }
+   }
+
+   this->virtual_grf_count = new_index;
+
+   /* Patch all the instructions to use the newly renumbered registers */
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *) node;
+
+      if (inst->dst.file == GRF)
+         inst->dst.reg = remap_table[inst->dst.reg];
+
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file == GRF)
+            inst->src[i].reg = remap_table[inst->src[i].reg];
+      }
+   }
+
+   /* Patch all the references to special values */
+   for (unsigned i = 0; i < ARRAY_SIZE(special); i++) {
+      for (unsigned j = 0; j < special[i].count; j++) {
+         fs_reg *reg = &special[i].reg[j];
+         if (reg->file == GRF && remap_table[reg->reg] != -1)
+            reg->reg = remap_table[reg->reg];
+      }
+   }
+}
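+
+/* Small worked example of the remap above (hypothetical counts): with
+ * three virtual GRFs of which only #0 and #2 are referenced, remap_table
+ * starts out as (0, -1, 0), the compaction loop rewrites it to
+ * (0, -1, 1), and virtual_grf_count drops from 3 to 2 before the
+ * instructions and special values are patched.
+ */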
+
+/*
+ * Implements array access of uniforms by inserting a
+ * PULL_CONSTANT_LOAD instruction.
+ *
+ * Unlike temporary GRF array access (where we don't support it due to
+ * the difficulty of doing relative addressing on instruction
+ * destinations), we could potentially do array access of uniforms
+ * that were loaded in GRF space as push constants.  In real-world
+ * usage we've seen, though, the arrays being used are always larger
+ * than we could load as push constants, so just always move all
+ * uniform array access out to a pull constant buffer.
+ */
+void
+fs_visitor::move_uniform_array_access_to_pull_constants()
+{
+   if (dispatch_width != 8)
+      return;
+
+   pull_constant_loc = ralloc_array(mem_ctx, int, uniforms);
+
+   for (unsigned int i = 0; i < uniforms; i++) {
+      pull_constant_loc[i] = -1;
+   }
+
+   /* Walk through and find array access of uniforms.  Put a copy of that
+    * uniform in the pull constant buffer.
+    *
+    * Note that we don't move constant-indexed accesses to arrays.  No
+    * testing has been done of the performance impact of this choice.
+    */
+   foreach_list_safe(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      for (int i = 0 ; i < 3; i++) {
+         if (inst->src[i].file != UNIFORM || !inst->src[i].reladdr)
+            continue;
+
+         int uniform = inst->src[i].reg;
+
+         /* If this array isn't already present in the pull constant buffer,
+          * add it.
+          */
+         if (pull_constant_loc[uniform] == -1) {
+            const float **values = &stage_prog_data->param[uniform];
+
+            assert(param_size[uniform]);
+
+            for (int j = 0; j < param_size[uniform]; j++) {
+               pull_constant_loc[uniform + j] = stage_prog_data->nr_pull_params;
+
+               stage_prog_data->pull_param[stage_prog_data->nr_pull_params++] =
+                  values[j];
+            }
+         }
+      }
+   }
+}
+
+/**
+ * Assign UNIFORM file registers to either push constants or pull constants.
+ *
+ * We allow a fragment shader to have more than the specified minimum
+ * maximum number of fragment shader uniform components (64).  If
+ * there are too many of these, they'd fill up all of register space.
+ * So, this will push some of them out to the pull constant buffer and
+ * update the program to load them.
+ */
+void
+fs_visitor::assign_constant_locations()
+{
+   /* Only the first compile (SIMD8 mode) gets to decide on locations. */
+   if (dispatch_width != 8)
+      return;
+
+   /* Find which UNIFORM registers are still in use. */
+   bool is_live[uniforms];
+   for (unsigned int i = 0; i < uniforms; i++) {
+      is_live[i] = false;
+   }
+
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *) node;
+
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file != UNIFORM)
+            continue;
+
+         int constant_nr = inst->src[i].reg + inst->src[i].reg_offset;
+         if (constant_nr >= 0 && constant_nr < (int) uniforms)
+            is_live[constant_nr] = true;
+      }
+   }
+
+   /* Only allow 16 registers (128 uniform components) as push constants.
+    *
+    * Just demote the end of the list.  We could probably do better
+    * here, demoting things that are rarely used in the program first.
+    */
+   // LunarG: TODO - turning off push constants for bring up
+   unsigned int max_push_components = 0; //16 * 8;
+   unsigned int num_push_constants = 0;
+
+   push_constant_loc = ralloc_array(mem_ctx, int, uniforms);
+
+   for (unsigned int i = 0; i < uniforms; i++) {
+      if (!is_live[i] || pull_constant_loc[i] != -1) {
+         /* This UNIFORM register is either dead, or has already been demoted
+          * to a pull const.  Mark it as no longer living in the param[] array.
+          */
+         push_constant_loc[i] = -1;
+         continue;
+      }
+
+      if (num_push_constants < max_push_components) {
+         /* Retain as a push constant.  Record the location in the params[]
+          * array.
+          */
+         push_constant_loc[i] = num_push_constants++;
+      } else {
+         /* Demote to a pull constant. */
+         push_constant_loc[i] = -1;
+
+         int pull_index = stage_prog_data->nr_pull_params++;
+         stage_prog_data->pull_param[pull_index] = stage_prog_data->param[i];
+         pull_constant_loc[i] = pull_index;
+      }
+   }
+
+   stage_prog_data->nr_params = num_push_constants;
+
+   /* Up until now, the param[] array has been indexed by reg + reg_offset
+    * of UNIFORM registers.  Condense it to only contain the uniforms we
+    * chose to upload as push constants.
+    */
+   for (unsigned int i = 0; i < uniforms; i++) {
+      int remapped = push_constant_loc[i];
+
+      if (remapped == -1)
+         continue;
+
+      assert(remapped <= (int)i);
+      stage_prog_data->param[remapped] = stage_prog_data->param[i];
+   }
+}
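+
+/* Budget arithmetic for the limit discussed above: the usual cap of 16
+ * push registers is 16 * 8 == 128 float components; with the LunarG
+ * bring-up value of 0, every live uniform is demoted to the pull-constant
+ * path instead.
+ */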
+
+/**
+ * Replace UNIFORM register file access with either UNIFORM_PULL_CONSTANT_LOAD
+ * or VARYING_PULL_CONSTANT_LOAD instructions which load values into VGRFs.
+ */
+void
+fs_visitor::demote_pull_constants()
+{
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      for (int i = 0; i < 3; i++) {
+	 if (inst->src[i].file != UNIFORM)
+	    continue;
+
+         int pull_index = pull_constant_loc[inst->src[i].reg +
+                                            inst->src[i].reg_offset];
+         if (pull_index == -1)
+	    continue;
+
+         /* Set up the annotation tracking for new generated instructions. */
+         base_ir = inst->ir;
+         current_annotation = inst->annotation;
+
+         fs_reg surf_index(stage_prog_data->binding_table.pull_constants_start);
+         fs_reg dst = fs_reg(this, glsl_type::float_type);
+
+         /* Generate a pull load into dst. */
+         if (inst->src[i].reladdr) {
+            exec_list list = VARYING_PULL_CONSTANT_LOAD(dst,
+                                                        surf_index,
+                                                        *inst->src[i].reladdr,
+                                                        pull_index);
+            inst->insert_before(&list);
+            inst->src[i].reladdr = NULL;
+         } else {
+            fs_reg offset = fs_reg((unsigned)(pull_index * 4) & ~15);
+            fs_inst *pull =
+               new(mem_ctx) fs_inst(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
+                                    dst, surf_index, offset);
+            inst->insert_before(pull);
+            inst->src[i].set_smear(pull_index & 3);
+         }
+
+         /* Rewrite the instruction to use the temporary VGRF. */
+         inst->src[i].file = GRF;
+         inst->src[i].reg = dst.reg;
+         inst->src[i].reg_offset = 0;
+      }
+   }
+   invalidate_live_intervals();
+}
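+
+/* Worked example of the constant-indexed path above (hypothetical index):
+ * pull_index == 5 gives offset (5 * 4) & ~15 == 16, the second 16-byte
+ * block of the buffer, and set_smear(5 & 3 == 1) then selects the second
+ * 32-bit component of the loaded vec4.
+ */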
+
+bool
+fs_visitor::opt_algebraic()
+{
+   bool progress = false;
+
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      switch (inst->opcode) {
+      case BRW_OPCODE_MUL:
+	 if (inst->src[1].file != IMM)
+	    continue;
+
+	 /* a * 1.0 = a */
+	 if (inst->src[1].is_one()) {
+	    inst->opcode = BRW_OPCODE_MOV;
+	    inst->src[1] = reg_undef;
+	    progress = true;
+	    break;
+	 }
+
+         /* a * 0.0 = 0.0 */
+         if (inst->src[1].is_zero()) {
+            inst->opcode = BRW_OPCODE_MOV;
+            inst->src[0] = inst->src[1];
+            inst->src[1] = reg_undef;
+            progress = true;
+            break;
+         }
+
+	 break;
+      case BRW_OPCODE_ADD:
+         if (inst->src[1].file != IMM)
+            continue;
+
+         /* a + 0.0 = a */
+         if (inst->src[1].is_zero()) {
+            inst->opcode = BRW_OPCODE_MOV;
+            inst->src[1] = reg_undef;
+            progress = true;
+            break;
+         }
+         break;
+      case BRW_OPCODE_OR:
+         if (inst->src[0].equals(inst->src[1])) {
+            inst->opcode = BRW_OPCODE_MOV;
+            inst->src[1] = reg_undef;
+            progress = true;
+            break;
+         }
+         break;
+      case BRW_OPCODE_LRP:
+         if (inst->src[1].equals(inst->src[2])) {
+            inst->opcode = BRW_OPCODE_MOV;
+            inst->src[0] = inst->src[1];
+            inst->src[1] = reg_undef;
+            inst->src[2] = reg_undef;
+            progress = true;
+            break;
+         }
+         break;
+      case BRW_OPCODE_SEL:
+         if (inst->saturate && inst->src[1].file == IMM) {
+            switch (inst->conditional_mod) {
+            case BRW_CONDITIONAL_LE:
+            case BRW_CONDITIONAL_L:
+               switch (inst->src[1].type) {
+               case BRW_REGISTER_TYPE_F:
+                  if (inst->src[1].imm.f >= 1.0f) {
+                     inst->opcode = BRW_OPCODE_MOV;
+                     inst->src[1] = reg_undef;
+                     progress = true;
+                  }
+                  break;
+               default:
+                  break;
+               }
+               break;
+            case BRW_CONDITIONAL_GE:
+            case BRW_CONDITIONAL_G:
+               switch (inst->src[1].type) {
+               case BRW_REGISTER_TYPE_F:
+                  if (inst->src[1].imm.f <= 0.0f) {
+                     inst->opcode = BRW_OPCODE_MOV;
+                     inst->src[1] = reg_undef;
+                     inst->conditional_mod = BRW_CONDITIONAL_NONE;
+                     progress = true;
+                  }
+                  break;
+               default:
+                  break;
+               }
+            default:
+               break;
+            }
+         }
+         break;
+      default:
+	 break;
+      }
+   }
+
+   return progress;
+}
+
+bool
+fs_visitor::compute_to_mrf()
+{
+   bool progress = false;
+   int next_ip = 0;
+
+   calculate_live_intervals();
+
+   foreach_list_safe(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      int ip = next_ip;
+      next_ip++;
+
+      if (inst->opcode != BRW_OPCODE_MOV ||
+	  inst->is_partial_write() ||
+	  inst->dst.file != MRF || inst->src[0].file != GRF ||
+	  inst->dst.type != inst->src[0].type ||
+	  inst->src[0].abs || inst->src[0].negate ||
+          !inst->src[0].is_contiguous() ||
+          inst->src[0].subreg_offset)
+	 continue;
+
+      /* Work out which hardware MRF registers are written by this
+       * instruction.
+       */
+      int mrf_low = inst->dst.reg & ~BRW_MRF_COMPR4;
+      int mrf_high;
+      if (inst->dst.reg & BRW_MRF_COMPR4) {
+	 mrf_high = mrf_low + 4;
+      } else if (dispatch_width == 16 &&
+		 (!inst->force_uncompressed && !inst->force_sechalf)) {
+	 mrf_high = mrf_low + 1;
+      } else {
+	 mrf_high = mrf_low;
+      }
+
+      /* Can't compute-to-MRF this GRF if someone else was going to
+       * read it later.
+       */
+      if (this->virtual_grf_end[inst->src[0].reg] > ip)
+	 continue;
+
+      /* Found a move of a GRF to a MRF.  Let's see if we can go
+       * rewrite the thing that made this GRF to write into the MRF.
+       */
+      fs_inst *scan_inst;
+      for (scan_inst = (fs_inst *)inst->prev;
+	   scan_inst->prev != NULL;
+	   scan_inst = (fs_inst *)scan_inst->prev) {
+	 if (scan_inst->dst.file == GRF &&
+	     scan_inst->dst.reg == inst->src[0].reg) {
+	    /* Found the last thing to write our reg we want to turn
+	     * into a compute-to-MRF.
+	     */
+
+	    /* If this one instruction didn't populate all the
+	     * channels, bail.  We might be able to rewrite everything
+	     * that writes that reg, but it would require smarter
+	     * tracking to delay the rewriting until complete success.
+	     */
+	    if (scan_inst->is_partial_write())
+	       break;
+
+            /* Things returning more than one register would need us to
+             * understand coalescing out more than one MOV at a time.
+             */
+            if (scan_inst->regs_written > 1)
+               break;
+
+	    /* SEND instructions can't have MRF as a destination. */
+	    if (scan_inst->mlen)
+	       break;
+
+	    if (brw->gen == 6) {
+	       /* gen6 math instructions must have the destination be
+		* GRF, so no compute-to-MRF for them.
+		*/
+	       if (scan_inst->is_math()) {
+		  break;
+	       }
+	    }
+
+	    if (scan_inst->dst.reg_offset == inst->src[0].reg_offset) {
+	       /* Found the creator of our MRF's source value. */
+	       scan_inst->dst.file = MRF;
+	       scan_inst->dst.reg = inst->dst.reg;
+	       scan_inst->saturate |= inst->saturate;
+	       inst->remove();
+	       progress = true;
+	    }
+	    break;
+	 }
+
+	 /* We don't handle control flow here.  Most computation of
+	  * values that end up in MRFs are shortly before the MRF
+	  * write anyway.
+	  */
+	 if (scan_inst->is_control_flow() && scan_inst->opcode != BRW_OPCODE_IF)
+	    break;
+
+	 /* You can't read from an MRF, so if someone else reads our
+	  * MRF's source GRF that we wanted to rewrite, that stops us.
+	  */
+	 bool interfered = false;
+	 for (int i = 0; i < 3; i++) {
+	    if (scan_inst->src[i].file == GRF &&
+		scan_inst->src[i].reg == inst->src[0].reg &&
+		scan_inst->src[i].reg_offset == inst->src[0].reg_offset) {
+	       interfered = true;
+	    }
+	 }
+	 if (interfered)
+	    break;
+
+	 if (scan_inst->dst.file == MRF) {
+	    /* If somebody else writes our MRF here, we can't
+	     * compute-to-MRF before that.
+	     */
+	    int scan_mrf_low = scan_inst->dst.reg & ~BRW_MRF_COMPR4;
+	    int scan_mrf_high;
+
+	    if (scan_inst->dst.reg & BRW_MRF_COMPR4) {
+	       scan_mrf_high = scan_mrf_low + 4;
+	    } else if (dispatch_width == 16 &&
+		       (!scan_inst->force_uncompressed &&
+			!scan_inst->force_sechalf)) {
+	       scan_mrf_high = scan_mrf_low + 1;
+	    } else {
+	       scan_mrf_high = scan_mrf_low;
+	    }
+
+	    if (mrf_low == scan_mrf_low ||
+		mrf_low == scan_mrf_high ||
+		mrf_high == scan_mrf_low ||
+		mrf_high == scan_mrf_high) {
+	       break;
+	    }
+	 }
+
+	 if (scan_inst->mlen > 0 && scan_inst->base_mrf != -1) {
+	    /* Found a SEND instruction, which means that there are
+	     * live values in MRFs from base_mrf to base_mrf +
+	     * scan_inst->mlen - 1.  Don't go pushing our MRF write up
+	     * above it.
+	     */
+	    if (mrf_low >= scan_inst->base_mrf &&
+		mrf_low < scan_inst->base_mrf + scan_inst->mlen) {
+	       break;
+	    }
+	    if (mrf_high >= scan_inst->base_mrf &&
+		mrf_high < scan_inst->base_mrf + scan_inst->mlen) {
+	       break;
+	    }
+	 }
+      }
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
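+
+/* Before/after sketch for the pass above (made-up registers):
+ *
+ *    add vgrf4, vgrf1, vgrf2          add m3, vgrf1, vgrf2
+ *    mov m3, vgrf4              =>    (MOV removed)
+ *
+ * which is only legal because nothing reads vgrf4 after the MOV and the
+ * ADD writes the whole register (no partial write).
+ */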
+
+/**
+ * Walks through basic blocks, looking for repeated MRF writes and
+ * removing the later ones.
+ */
+bool
+fs_visitor::remove_duplicate_mrf_writes()
+{
+   fs_inst *last_mrf_move[16];
+   bool progress = false;
+
+   /* Need to update the MRF tracking for compressed instructions. */
+   if (dispatch_width == 16)
+      return false;
+
+   memset(last_mrf_move, 0, sizeof(last_mrf_move));
+
+   foreach_list_safe(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      if (inst->is_control_flow()) {
+	 memset(last_mrf_move, 0, sizeof(last_mrf_move));
+      }
+
+      if (inst->opcode == BRW_OPCODE_MOV &&
+	  inst->dst.file == MRF) {
+	 fs_inst *prev_inst = last_mrf_move[inst->dst.reg];
+	 if (prev_inst && inst->equals(prev_inst)) {
+	    inst->remove();
+	    progress = true;
+	    continue;
+	 }
+      }
+
+      /* Clear out the last-write records for MRFs that were overwritten. */
+      if (inst->dst.file == MRF) {
+	 last_mrf_move[inst->dst.reg] = NULL;
+      }
+
+      if (inst->mlen > 0 && inst->base_mrf != -1) {
+	 /* Found a SEND instruction, which will include two or fewer
+	  * implied MRF writes.  We could do better here.
+	  */
+	 for (int i = 0; i < implied_mrf_writes(inst); i++) {
+	    last_mrf_move[inst->base_mrf + i] = NULL;
+	 }
+      }
+
+      /* Clear out any MRF move records whose sources got overwritten. */
+      if (inst->dst.file == GRF) {
+	 for (unsigned int i = 0; i < Elements(last_mrf_move); i++) {
+	    if (last_mrf_move[i] &&
+		last_mrf_move[i]->src[0].reg == inst->dst.reg) {
+	       last_mrf_move[i] = NULL;
+	    }
+	 }
+      }
+
+      if (inst->opcode == BRW_OPCODE_MOV &&
+	  inst->dst.file == MRF &&
+	  inst->src[0].file == GRF &&
+	  !inst->is_partial_write()) {
+	 last_mrf_move[inst->dst.reg] = inst;
+      }
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
+
+static void
+clear_deps_for_inst_src(fs_inst *inst, int dispatch_width, bool *deps,
+                        int first_grf, int grf_len)
+{
+   bool inst_simd16 = (dispatch_width > 8 &&
+                       !inst->force_uncompressed &&
+                       !inst->force_sechalf);
+
+   /* Clear the flag for registers that actually got read (as expected). */
+   for (int i = 0; i < 3; i++) {
+      int grf;
+      if (inst->src[i].file == GRF) {
+         grf = inst->src[i].reg;
+      } else if (inst->src[i].file == HW_REG &&
+                 inst->src[i].fixed_hw_reg.file == BRW_GENERAL_REGISTER_FILE) {
+         grf = inst->src[i].fixed_hw_reg.nr;
+      } else {
+         continue;
+      }
+
+      if (grf >= first_grf &&
+          grf < first_grf + grf_len) {
+         deps[grf - first_grf] = false;
+         if (inst_simd16)
+            deps[grf - first_grf + 1] = false;
+      }
+   }
+}
+
+/**
+ * Implements this workaround for the original 965:
+ *
+ *     "[DevBW, DevCL] Implementation Restrictions: As the hardware does not
+ *      check for post destination dependencies on this instruction, software
+ *      must ensure that there is no destination hazard for the case of ‘write
+ *      followed by a posted write’ shown in the following example.
+ *
+ *      1. mov r3 0
+ *      2. send r3.xy <rest of send instruction>
+ *      3. mov r2 r3
+ *
+ *      Due to no post-destination dependency check on the ‘send’, the above
+ *      code sequence could have two instructions (1 and 2) in flight at the
+ *      same time that both consider ‘r3’ as the target of their final writes.
+ */
+void
+fs_visitor::insert_gen4_pre_send_dependency_workarounds(fs_inst *inst)
+{
+   int reg_size = dispatch_width / 8;
+   int write_len = inst->regs_written * reg_size;
+   int first_write_grf = inst->dst.reg;
+   bool needs_dep[BRW_MAX_MRF];
+   assert(write_len < (int)sizeof(needs_dep) - 1);
+
+   memset(needs_dep, false, sizeof(needs_dep));
+   memset(needs_dep, true, write_len);
+
+   clear_deps_for_inst_src(inst, dispatch_width,
+                           needs_dep, first_write_grf, write_len);
+
+   /* Walk backwards looking for writes to registers we're writing which
+    * aren't read since being written.  If we hit the start of the program,
+    * we assume that there are no outstanding dependencies on entry to the
+    * program.
+    */
+   for (fs_inst *scan_inst = (fs_inst *)inst->prev;
+        !scan_inst->is_head_sentinel();
+        scan_inst = (fs_inst *)scan_inst->prev) {
+
+      /* If we hit control flow, assume that there *are* outstanding
+       * dependencies, and force their cleanup before our instruction.
+       */
+      if (scan_inst->is_control_flow()) {
+         for (int i = 0; i < write_len; i++) {
+            if (needs_dep[i]) {
+               inst->insert_before(DEP_RESOLVE_MOV(first_write_grf + i));
+            }
+         }
+         return;
+      }
+
+      bool scan_inst_simd16 = (dispatch_width > 8 &&
+                               !scan_inst->force_uncompressed &&
+                               !scan_inst->force_sechalf);
+
+      /* We insert our reads as late as possible on the assumption that any
+       * instruction but a MOV that might have left us an outstanding
+       * dependency has more latency than a MOV.
+       */
+      if (scan_inst->dst.file == GRF) {
+         for (int i = 0; i < scan_inst->regs_written; i++) {
+            int reg = scan_inst->dst.reg + i * reg_size;
+
+            if (reg >= first_write_grf &&
+                reg < first_write_grf + write_len &&
+                needs_dep[reg - first_write_grf]) {
+               inst->insert_before(DEP_RESOLVE_MOV(reg));
+               needs_dep[reg - first_write_grf] = false;
+               if (scan_inst_simd16)
+                  needs_dep[reg - first_write_grf + 1] = false;
+            }
+         }
+      }
+
+      /* Clear the flag for registers that actually got read (as expected). */
+      clear_deps_for_inst_src(scan_inst, dispatch_width,
+                              needs_dep, first_write_grf, write_len);
+
+      /* Continue the loop only if we haven't resolved all the dependencies */
+      int i;
+      for (i = 0; i < write_len; i++) {
+         if (needs_dep[i])
+            break;
+      }
+      if (i == write_len)
+         return;
+   }
+}
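+
+/* A minimal sketch (not from the original source) of the transform above,
+ * using hypothetical register numbers.  DEP_RESOLVE_MOV is assumed to lower
+ * to a MOV that reads the register (e.g. "mov null g4"):
+ *
+ *    before:   mov  g4 g2
+ *              ...
+ *              send g4 <message>      // posted write, no dependency check
+ *
+ *    after:    mov  g4 g2
+ *              ...
+ *              mov  null g4           // forces the outstanding write to retire
+ *              send g4 <message>
+ */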
+
+/**
+ * Implements this workaround for the original 965:
+ *
+ *     "[DevBW, DevCL] Errata: A destination register from a send can not be
+ *      used as a destination register until after it has been sourced by an
+ *      instruction with a different destination register."
+ */
+void
+fs_visitor::insert_gen4_post_send_dependency_workarounds(fs_inst *inst)
+{
+   int write_len = inst->regs_written * dispatch_width / 8;
+   int first_write_grf = inst->dst.reg;
+   bool needs_dep[BRW_MAX_MRF];
+   assert(write_len < (int)sizeof(needs_dep) - 1);
+
+   memset(needs_dep, false, sizeof(needs_dep));
+   memset(needs_dep, true, write_len);
+   /* Walk forwards looking for writes to registers we're writing which aren't
+    * read before being written.
+    */
+   for (fs_inst *scan_inst = (fs_inst *)inst->next;
+        !scan_inst->is_tail_sentinel();
+        scan_inst = (fs_inst *)scan_inst->next) {
+      /* If we hit control flow, force resolve all remaining dependencies. */
+      if (scan_inst->is_control_flow()) {
+         for (int i = 0; i < write_len; i++) {
+            if (needs_dep[i])
+               scan_inst->insert_before(DEP_RESOLVE_MOV(first_write_grf + i));
+         }
+         return;
+      }
+
+      /* Clear the flag for registers that actually got read (as expected). */
+      clear_deps_for_inst_src(scan_inst, dispatch_width,
+                              needs_dep, first_write_grf, write_len);
+
+      /* We insert our reads as late as possible since they're reading the
+       * result of a SEND, which has massive latency.
+       */
+      if (scan_inst->dst.file == GRF &&
+          scan_inst->dst.reg >= first_write_grf &&
+          scan_inst->dst.reg < first_write_grf + write_len &&
+          needs_dep[scan_inst->dst.reg - first_write_grf]) {
+         scan_inst->insert_before(DEP_RESOLVE_MOV(scan_inst->dst.reg));
+         needs_dep[scan_inst->dst.reg - first_write_grf] = false;
+      }
+
+      /* Continue the loop only if we haven't resolved all the dependencies */
+      int i;
+      for (i = 0; i < write_len; i++) {
+         if (needs_dep[i])
+            break;
+      }
+      if (i == write_len)
+         return;
+   }
+
+   /* If we hit the end of the program, resolve all remaining dependencies out
+    * of paranoia.
+    */
+   fs_inst *last_inst = (fs_inst *)this->instructions.get_tail();
+   assert(last_inst->eot);
+   for (int i = 0; i < write_len; i++) {
+      if (needs_dep[i])
+         last_inst->insert_before(DEP_RESOLVE_MOV(first_write_grf + i));
+   }
+}
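+
+/* Illustrative only (hypothetical sequence): if a send writes g8 and a later
+ * instruction wants to overwrite g8 before anything has sourced it,
+ *
+ *    send g8 <message>
+ *    mov  g8 g2             // hazard: g8 reused as a destination
+ *
+ * the pass interposes a read so that g8 is sourced first:
+ *
+ *    send g8 <message>
+ *    mov  null g8           // DEP_RESOLVE_MOV
+ *    mov  g8 g2
+ */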
+
+void
+fs_visitor::insert_gen4_send_dependency_workarounds()
+{
+   if (brw->gen != 4 || brw->is_g4x)
+      return;
+
+   bool progress = false;
+
+   /* Note that we're done with register allocation, so GRF fs_regs always
+    * have a .reg_offset of 0.
+    */
+
+   foreach_list_safe(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
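+      /* An instruction with mlen != 0 and a GRF destination is a send-like
+       * message (e.g. a sampler or constant read) that returns data into
+       * the GRF, which is what the Gen4 workarounds care about. */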
+      if (inst->mlen != 0 && inst->dst.file == GRF) {
+         insert_gen4_pre_send_dependency_workarounds(inst);
+         insert_gen4_post_send_dependency_workarounds(inst);
+         progress = true;
+      }
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+}
+
+/**
+ * Turns the generic expression-style uniform pull constant load instruction
+ * into a hardware-specific series of instructions for loading a pull
+ * constant.
+ *
+ * The expression style allows the CSE pass before this to optimize out
+ * repeated loads from the same offset, and gives the pre-register-allocation
+ * scheduling full flexibility, while the conversion to native instructions
+ * allows the post-register-allocation scheduler the best information
+ * possible.
+ *
+ * Note that execution masking for setting up pull constant loads is special:
+ * the channels that need to be written are unrelated to the current execution
+ * mask, since a later instruction will use one of the result channels as a
+ * source operand for all 8 or 16 of its channels.
+ */
+void
+fs_visitor::lower_uniform_pull_constant_loads()
+{
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      if (inst->opcode != FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD)
+         continue;
+
+      if (brw->gen >= 7) {
+         /* The offset arg before was a vec4-aligned byte offset.  We need to
+          * turn it into a dword offset.
+          */
+         fs_reg const_offset_reg = inst->src[1];
+         assert(const_offset_reg.file == IMM &&
+                const_offset_reg.type == BRW_REGISTER_TYPE_UD);
+         const_offset_reg.imm.u /= 4;
+         fs_reg payload = fs_reg(this, glsl_type::uint_type);
+
+         /* This is actually going to be a MOV, but since only the first dword
+          * is accessed, we have a special opcode to do just that one.  Note
+          * that this needs to be an operation that will be considered a def
+          * by live variable analysis, or register allocation will explode.
+          */
+         fs_inst *setup = new(mem_ctx) fs_inst(FS_OPCODE_SET_SIMD4X2_OFFSET,
+                                               payload, const_offset_reg);
+         setup->force_writemask_all = true;
+
+         setup->ir = inst->ir;
+         setup->annotation = inst->annotation;
+         inst->insert_before(setup);
+
+         /* Similarly, this will only populate the first 4 channels of the
+          * result register (since we only use smear values from 0-3), but we
+          * don't tell the optimizer.
+          */
+         inst->opcode = FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7;
+         inst->src[1] = payload;
+
+         invalidate_live_intervals();
+      } else {
+         /* Before register allocation, we didn't tell the scheduler about the
+          * MRF we use.  We know it's safe to use this MRF because nothing
+          * else does except for register spill/unspill, which generates and
+          * uses its MRF within a single IR instruction.
+          */
+         inst->base_mrf = 14;
+         inst->mlen = 1;
+      }
+   }
+}
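+
+/* A rough sketch (not from the source) of the Gen7 lowering above, in IR
+ * pseudo-notation with made-up vgrf numbers; the 16-byte vec4-aligned offset
+ * becomes dword offset 4:
+ *
+ *    before:  uniform_pull_const_load      vgrf1, surf, 16u
+ *    after:   set_simd4x2_offset           vgrf2, 4u
+ *             uniform_pull_const_load_gen7 vgrf1, surf, vgrf2
+ */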
+
+void
+fs_visitor::dump_instructions()
+{
+   calculate_register_pressure();
+
+   int ip = 0, max_pressure = 0, indent = 0;
+   foreach_list(node, &this->instructions) {
+      backend_instruction *inst = (backend_instruction *)node;
+
+      max_pressure = MAX2(max_pressure, regs_live_at_ip[ip]);
+
+      switch (inst->opcode) {
+      case BRW_OPCODE_DO:
+      case BRW_OPCODE_IF:    break;
+      case BRW_OPCODE_ELSE:
+      case BRW_OPCODE_ENDIF: --indent; break;
+      case BRW_OPCODE_WHILE: --indent; break;
+      default: break;
+      }
+
+      fprintf(stderr, "{%3d} %4d: ", regs_live_at_ip[ip], ip);
+      for (int i = 0; i < indent * 4; ++i)
+         fprintf(stderr, " ");
+
+      dump_instruction(inst, stderr);
+      ++ip;
+
+      switch (inst->opcode) {
+      case BRW_OPCODE_DO:
+      case BRW_OPCODE_IF:
+      case BRW_OPCODE_ELSE:  ++indent; break;
+      case BRW_OPCODE_WHILE:
+      case BRW_OPCODE_ENDIF: break;
+      default: break;
+      }
+   }
+   fprintf(stderr, "Maximum %3d registers live at once.\n", max_pressure);
+}
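+
+/* Hypothetical sample of the output format produced above, one line per
+ * instruction as "{live regs} ip: instruction":
+ *
+ *    { 12}    7: add vgrf5:F, vgrf3:F, vgrf4:F
+ */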
+
+void
+fs_visitor::dump_instruction(backend_instruction *be_inst)
+{
+   dump_instruction(be_inst, stderr);
+}
+
+void
+fs_visitor::dump_instruction(backend_instruction *be_inst, FILE *file)
+{
+   char buff[256];
+   dump_instruction(be_inst, buff);
+
+   fprintf(file, "%s", buff);
+
+   fprintf(file, "\n");
+}
+
+void
+fs_visitor::dump_instruction(backend_instruction *be_inst, char* string)
+{
+   fs_inst *inst = (fs_inst *)be_inst;
+
+   if (inst->predicate) {
+      string += sprintf(string, "(%cf0.%d) ",
+             inst->predicate_inverse ? '-' : '+',
+             inst->flag_subreg);
+   }
+
+   string += sprintf(string, "%s", brw_instruction_name(inst->opcode));
+   if (inst->saturate)
+      string += sprintf(string, ".sat");
+   if (inst->conditional_mod) {
+      string += sprintf(string, "%s", conditional_modifier[inst->conditional_mod]);
+      if (!inst->predicate &&
+          (brw->gen < 5 || (inst->opcode != BRW_OPCODE_SEL &&
+                              inst->opcode != BRW_OPCODE_IF &&
+                              inst->opcode != BRW_OPCODE_WHILE))) {
+         string += sprintf(string, ".f0.%d", inst->flag_subreg);
+      }
+   }
+   string += sprintf(string, " ");
+
+   switch (inst->dst.file) {
+   case GRF:
+      string += sprintf(string, "vgrf%d", inst->dst.reg);
+      if (virtual_grf_sizes[inst->dst.reg] != 1 ||
+          inst->dst.subreg_offset)
+         string += sprintf(string, "+%d.%d",
+                 inst->dst.reg_offset, inst->dst.subreg_offset);
+      break;
+   case MRF:
+      string += sprintf(string, "m%d", inst->dst.reg);
+      break;
+   case BAD_FILE:
+      string += sprintf(string, "(null)");
+      break;
+   case UNIFORM:
+      string += sprintf(string, "***u%d***", inst->dst.reg + inst->dst.reg_offset);
+      break;
+   case HW_REG:
+      if (inst->dst.fixed_hw_reg.file == BRW_ARCHITECTURE_REGISTER_FILE) {
+         switch (inst->dst.fixed_hw_reg.nr) {
+         case BRW_ARF_NULL:
+            string += sprintf(string, "null");
+            break;
+         case BRW_ARF_ADDRESS:
+            string += sprintf(string, "a0.%d", inst->dst.fixed_hw_reg.subnr);
+            break;
+         case BRW_ARF_ACCUMULATOR:
+            string += sprintf(string, "acc%d", inst->dst.fixed_hw_reg.subnr);
+            break;
+         case BRW_ARF_FLAG:
+            string += sprintf(string, "f%d.%d", inst->dst.fixed_hw_reg.nr & 0xf,
+                             inst->dst.fixed_hw_reg.subnr);
+            break;
+         default:
+            string += sprintf(string, "arf%d.%d", inst->dst.fixed_hw_reg.nr & 0xf,
+                               inst->dst.fixed_hw_reg.subnr);
+            break;
+         }
+      } else {
+         string += sprintf(string, "hw_reg%d", inst->dst.fixed_hw_reg.nr);
+      }
+      if (inst->dst.fixed_hw_reg.subnr)
+         string += sprintf(string, "+%d", inst->dst.fixed_hw_reg.subnr);
+      break;
+   default:
+      string += sprintf(string, "???");
+      break;
+   }
+   string += sprintf(string, ":%s, ", brw_reg_type_letters(inst->dst.type));
+
+   for (int i = 0; i < 3 && inst->src[i].file != BAD_FILE; i++) {
+      if (inst->src[i].negate)
+         string += sprintf(string, "-");
+      if (inst->src[i].abs)
+         string += sprintf(string, "|");
+      switch (inst->src[i].file) {
+      case GRF:
+         string += sprintf(string, "vgrf%d", inst->src[i].reg);
+         if (virtual_grf_sizes[inst->src[i].reg] != 1 ||
+             inst->src[i].subreg_offset)
+            string += sprintf(string, "+%d.%d", inst->src[i].reg_offset,
+                    inst->src[i].subreg_offset);
+         break;
+      case MRF:
+         string += sprintf(string, "***m%d***", inst->src[i].reg);
+         break;
+      case UNIFORM:
+         string += sprintf(string, "u%d", inst->src[i].reg + inst->src[i].reg_offset);
+         if (inst->src[i].reladdr) {
+            string += sprintf(string, "+reladdr");
+         } else if (virtual_grf_sizes[inst->src[i].reg] != 1 ||
+             inst->src[i].subreg_offset) {
+            string += sprintf(string, "+%d.%d", inst->src[i].reg_offset,
+                    inst->src[i].subreg_offset);
+         }
+         break;
+      case BAD_FILE:
+         string += sprintf(string, "(null)");
+         break;
+      case IMM:
+         switch (inst->src[i].type) {
+         case BRW_REGISTER_TYPE_F:
+            string += sprintf(string, "%ff", inst->src[i].imm.f);
+            break;
+         case BRW_REGISTER_TYPE_D:
+            string += sprintf(string, "%dd", inst->src[i].imm.i);
+            break;
+         case BRW_REGISTER_TYPE_UD:
+            string += sprintf(string, "%uu", inst->src[i].imm.u);
+            break;
+         default:
+            string += sprintf(string, "???");
+            break;
+         }
+         break;
+      case HW_REG:
+         if (inst->src[i].fixed_hw_reg.negate)
+            string += sprintf(string, "-");
+         if (inst->src[i].fixed_hw_reg.abs)
+            string += sprintf(string, "|");
+         if (inst->src[i].fixed_hw_reg.file == BRW_ARCHITECTURE_REGISTER_FILE) {
+            switch (inst->src[i].fixed_hw_reg.nr) {
+            case BRW_ARF_NULL:
+               string += sprintf(string, "null");
+               break;
+            case BRW_ARF_ADDRESS:
+               string += sprintf(string, "a0.%d", inst->src[i].fixed_hw_reg.subnr);
+               break;
+            case BRW_ARF_ACCUMULATOR:
+               string += sprintf(string, "acc%d", inst->src[i].fixed_hw_reg.subnr);
+               break;
+            case BRW_ARF_FLAG:
+               string += sprintf(string, "f%d.%d", inst->src[i].fixed_hw_reg.nr & 0xf,
+                                inst->src[i].fixed_hw_reg.subnr);
+               break;
+            default:
+               string += sprintf(string, "arf%d.%d", inst->src[i].fixed_hw_reg.nr & 0xf,
+                                  inst->src[i].fixed_hw_reg.subnr);
+               break;
+            }
+         } else {
+            string += sprintf(string, "hw_reg%d", inst->src[i].fixed_hw_reg.nr);
+         }
+         if (inst->src[i].fixed_hw_reg.subnr)
+            string += sprintf(string, "+%d", inst->src[i].fixed_hw_reg.subnr);
+         if (inst->src[i].fixed_hw_reg.abs)
+            string += sprintf(string, "|");
+         break;
+      default:
+         string += sprintf(string, "???");
+         break;
+      }
+      if (inst->src[i].abs)
+         string += sprintf(string, "|");
+
+      if (inst->src[i].file != IMM) {
+         string += sprintf(string, ":%s", brw_reg_type_letters(inst->src[i].type));
+      }
+
+      if (i < 2 && inst->src[i + 1].file != BAD_FILE)
+         string += sprintf(string, ", ");
+   }
+
+   string += sprintf(string, " ");
+
+   if (inst->force_uncompressed)
+      string += sprintf(string, "1sthalf ");
+
+   if (inst->force_sechalf)
+      string += sprintf(string, "2ndhalf ");
+}
+
+
+/**
+ * Possibly returns the instruction that set up @p reg.
+ *
+ * Sometimes we want to take the result of some expression/variable
+ * dereference tree and rewrite the instruction generating the result
+ * of the tree.  When processing the tree, we know that the
+ * instructions generated are all writing temporaries that are dead
+ * outside of this tree.  So, if we have some instructions that write
+ * a temporary, we're free to point that temp write somewhere else.
+ *
+ * Note that this doesn't guarantee that the instruction generated only
+ * wrote reg -- it might be the size=4 destination of a texture instruction.
+ */
+fs_inst *
+fs_visitor::get_instruction_generating_reg(fs_inst *start,
+					   fs_inst *end,
+					   const fs_reg &reg)
+{
+   if (end == start ||
+       end->is_partial_write() ||
+       reg.reladdr ||
+       !reg.equals(end->dst)) {
+      return NULL;
+   } else {
+      return end;
+   }
+}
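+
+/* Typical (hypothetical) use, mirroring try_rewrite_rhs_to_dst(): if `last`
+ * is the final instruction emitted for an RHS tree, a caller may retarget
+ * the temporary write instead of emitting an extra MOV:
+ *
+ *    fs_inst *gen = get_instruction_generating_reg(pre_rhs, last, this->result);
+ *    if (gen)
+ *       gen->dst = lhs_reg;
+ */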
+
+void
+fs_visitor::setup_payload_gen6()
+{
+   bool uses_depth =
+      (fp->Base.InputsRead & (1 << VARYING_SLOT_POS)) != 0;
+   unsigned barycentric_interp_modes = c->prog_data.barycentric_interp_modes;
+
+   assert(brw->gen >= 6);
+
+   /* R0-1: masks, pixel X/Y coordinates. */
+   c->nr_payload_regs = 2;
+   /* R2: only for 32-pixel dispatch. */
+
+   /* R3-26: barycentric interpolation coordinates.  These appear in the
+    * same order that they appear in the brw_wm_barycentric_interp_mode
+    * enum.  Each set of coordinates occupies 2 registers if dispatch width
+    * == 8 and 4 registers if dispatch width == 16.  Coordinates only
+    * appear if they were enabled using the "Barycentric Interpolation
+    * Mode" bits in WM_STATE.
+    */
+   for (int i = 0; i < BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT; ++i) {
+      if (barycentric_interp_modes & (1 << i)) {
+         c->barycentric_coord_reg[i] = c->nr_payload_regs;
+         c->nr_payload_regs += 2;
+         if (dispatch_width == 16) {
+            c->nr_payload_regs += 2;
+         }
+      }
+   }
+
+   /* R27: interpolated depth if uses source depth */
+   if (uses_depth) {
+      c->source_depth_reg = c->nr_payload_regs;
+      c->nr_payload_regs++;
+      if (dispatch_width == 16) {
+         /* R28: interpolated depth if not SIMD8. */
+         c->nr_payload_regs++;
+      }
+   }
+   /* R29: interpolated W set if GEN6_WM_USES_SOURCE_W. */
+   if (uses_depth) {
+      c->source_w_reg = c->nr_payload_regs;
+      c->nr_payload_regs++;
+      if (dispatch_width == 16) {
+         /* R30: interpolated W if not SIMD8. */
+         c->nr_payload_regs++;
+      }
+   }
+
+   c->prog_data.uses_pos_offset = c->key.compute_pos_offset;
+   /* R31: MSAA position offsets. */
+   if (c->prog_data.uses_pos_offset) {
+      c->sample_pos_reg = c->nr_payload_regs;
+      c->nr_payload_regs++;
+   }
+
+   /* R32: MSAA input coverage mask */
+   if (fp->Base.SystemValuesRead & SYSTEM_BIT_SAMPLE_MASK_IN) {
+      assert(brw->gen >= 7);
+      c->sample_mask_reg = c->nr_payload_regs;
+      c->nr_payload_regs++;
+      if (dispatch_width == 16) {
+         /* R33: input coverage mask if not SIMD8. */
+         c->nr_payload_regs++;
+      }
+   }
+
+   /* R34-: bary for 32-pixel. */
+   /* R58-59: interp W for 32-pixel. */
+
+   if (fp->Base.OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_DEPTH)) {
+      c->source_depth_to_render_target = true;
+   }
+}
+
+void
+fs_visitor::assign_binding_table_offsets()
+{
+   uint32_t next_binding_table_offset = 0;
+
+   /* If there are no color regions, we still perform an FB write to a null
+    * renderbuffer, which we place at surface index 0.
+    */
+   c->prog_data.binding_table.render_target_start = next_binding_table_offset;
+   next_binding_table_offset += MAX2(c->key.nr_color_regions, 1);
+
+   assign_common_binding_table_offsets(next_binding_table_offset);
+}
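+
+/* E.g. (illustrative): with c->key.nr_color_regions == 2, render targets
+ * occupy binding table entries 0-1 and the common entries (textures, pull
+ * constants, ...) are assigned starting at index 2.
+ */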
+
+int
+fs_visitor::calculate_register_pressure(int extra)
+{
+   invalidate_live_intervals();
+   calculate_live_intervals(extra);
+
+   int num_instructions = 0;
+   foreach_list(node, &this->instructions) {
+      ++num_instructions;
+   }
+
+   regs_live_at_ip = rzalloc_array(mem_ctx, int, num_instructions);
+
+   int pressure = 0;
+
+   for (int reg = 0; reg < virtual_grf_count; reg++) {
+      for (int ip = virtual_grf_start[reg]; ip <= virtual_grf_end[reg]; ip++) {
+         regs_live_at_ip[ip] += virtual_grf_sizes[reg];
+         pressure = MAX2(pressure, regs_live_at_ip[ip]);
+      }
+   }
+
+   return pressure;
+}
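+
+/* Worked example (illustrative): with vgrf0 of size 2 live over ips [0,3]
+ * and vgrf1 of size 1 live over ips [2,5], the loop above produces
+ * regs_live_at_ip = {2, 2, 3, 3, 1, 1} and returns a peak pressure of 3.
+ */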
+
+/**
+ * Look for repeated FS_OPCODE_MOV_DISPATCH_TO_FLAGS and drop the later ones.
+ *
+ * The needs_unlit_centroid_workaround ends up producing one of these per
+ * channel of centroid input, so it's good to clean them up.
+ *
+ * An assumption here is that nothing ever modifies the dispatched pixels
+ * value that FS_OPCODE_MOV_DISPATCH_TO_FLAGS reads from, but the hardware
+ * dictates that anyway.
+ */
+void
+fs_visitor::opt_drop_redundant_mov_to_flags()
+{
+   bool flag_mov_found[2] = {false};
+
+   foreach_list_safe(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      if (inst->is_control_flow()) {
+         memset(flag_mov_found, 0, sizeof(flag_mov_found));
+      } else if (inst->opcode == FS_OPCODE_MOV_DISPATCH_TO_FLAGS) {
+         if (!flag_mov_found[inst->flag_subreg])
+            flag_mov_found[inst->flag_subreg] = true;
+         else
+            inst->remove();
+      } else if (inst->writes_flag()) {
+         flag_mov_found[inst->flag_subreg] = false;
+      }
+   }
+}
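+
+/* Illustrative (hypothetical) effect of the pass above:
+ *
+ *    mov_dispatch_to_flags  f0.1
+ *    cmp.nz.f0.1 ...                 // writes the flag, so...
+ *    mov_dispatch_to_flags  f0.1     // ...this one is kept
+ *    mov_dispatch_to_flags  f0.1     // redundant: removed
+ */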
+
+bool
+fs_visitor::run()
+{
+   sanity_param_count = fp->Base.Parameters->NumParameters;
+   bool allocated_without_spills;
+
+   assign_binding_table_offsets();
+
+//   if (brw->gen >= 6)
+   assert(brw->gen >= 6);
+   setup_payload_gen6();
+//   else
+//      setup_payload_gen4();
+
+   if (0) {
+      emit_dummy_fs();
+   } else {
+//      if (INTEL_DEBUG & DEBUG_SHADER_TIME)
+//         emit_shader_time_begin();
+
+      calculate_urb_setup();
+      if (fp->Base.InputsRead > 0) {
+         if (brw->gen < 6)
+            emit_interpolation_setup_gen4();
+         else
+            emit_interpolation_setup_gen6();
+      }
+
+      /* We handle discards by keeping track of the still-live pixels in f0.1.
+       * Initialize it with the dispatched pixels.
+       */
+      if (fp->UsesKill || c->key.alpha_test_func) {
+         fs_inst *discard_init = emit(FS_OPCODE_MOV_DISPATCH_TO_FLAGS);
+         discard_init->flag_subreg = 1;
+      }
+
+      /* Generate FS IR for main().  (the visitor only descends into
+       * functions called "main").
+       */
+//      if (shader) {
+         assert(shader);
+         foreach_list(node, &*shader->base.ir) {
+            ir_instruction *ir = (ir_instruction *)node;
+            base_ir = ir;
+            this->result = reg_undef;
+            ir->accept(this);
+         }
+//      } else {
+//         emit_fragment_program_code();
+//      }
+      base_ir = NULL;
+      if (failed)
+         return false;
+
+      emit(FS_OPCODE_PLACEHOLDER_HALT);
+
+      if (c->key.alpha_test_func)
+         emit_alpha_test();
+
+      emit_fb_writes();
+
+      split_virtual_grfs();
+
+      move_uniform_array_access_to_pull_constants();
+      assign_constant_locations();
+      demote_pull_constants();
+
+      opt_drop_redundant_mov_to_flags();
+
+      bool progress;
+      do {
+         progress = false;
+
+         compact_virtual_grfs();
+
+         progress = remove_duplicate_mrf_writes() || progress;
+
+         progress = opt_algebraic() || progress;
+         progress = opt_cse() || progress;
+         progress = opt_copy_propagate() || progress;
+         progress = opt_peephole_predicated_break() || progress;
+         progress = dead_code_eliminate() || progress;
+         progress = opt_peephole_sel() || progress;
+         progress = dead_control_flow_eliminate(this) || progress;
+         progress = opt_saturate_propagation() || progress;
+         progress = register_coalesce() || progress;
+         progress = compute_to_mrf() || progress;
+      } while (progress);
+
+      lower_uniform_pull_constant_loads();
+
+      assign_curb_setup();
+      assign_urb_setup();
+
+      static const enum instruction_scheduler_mode pre_modes_glassy[] = {
+         SCHEDULE_PRE_IPS_BU_HI,
+         SCHEDULE_PRE_IPS_BU_MH,
+         SCHEDULE_PRE_IPS_BU_MD,
+         SCHEDULE_PRE_IPS_BU_ML,
+         SCHEDULE_PRE_IPS_BU_LO,
+      };
+
+      const enum instruction_scheduler_mode *pre_modes = pre_modes_glassy;
+      unsigned pre_mode_count = ARRAY_SIZE(pre_modes_glassy);
+
+      /* Try each scheduling heuristic to see if it can successfully register
+       * allocate without spilling.  They should be ordered by decreasing
+       * performance but increasing likelihood of allocating.
+       */
+      for (unsigned i = 0; i < pre_mode_count; i++) {
+         estimated_clocks = schedule_instructions(pre_modes[i]);
+
+         if (0) {
+            assign_regs_trivial();
+            allocated_without_spills = true;
+         } else {
+            allocated_without_spills = assign_regs(false);
+         }
+         if (allocated_without_spills)
+            break;
+      }
+
+      if (!allocated_without_spills) {
+         /* We assume that any spilling is worse than just dropping back to
+          * SIMD8.  There's probably actually some intermediate point where
+          * SIMD16 with a couple of spills is still better.
+          */
+         if (dispatch_width == 16) {
+            fail("Failure to register allocate.  Reduce number of "
+                 "live scalar values to avoid this.");
+         }
+
+         /* Since we're out of heuristics, just go spill registers until we
+          * get an allocation.
+          */
+         while (!assign_regs(true)) {
+            if (failed)
+               break;
+         }
+      }
+   }
+   assert(force_uncompressed_stack == 0);
+
+   /* This must come after all optimization and register allocation, since
+    * it inserts dead code that happens to have side effects, and it does
+    * so based on the actual physical registers in use.
+    */
+   insert_gen4_send_dependency_workarounds();
+
+   if (failed)
+      return false;
+
+   if (!allocated_without_spills)
+      schedule_instructions(SCHEDULE_POST);
+
+   if (dispatch_width == 8)
+      c->prog_data.reg_blocks = brw_register_blocks(grf_used);
+   else
+      c->prog_data.reg_blocks_16 = brw_register_blocks(grf_used);
+
+   /* If any state parameters were appended, then ParameterValues could have
+    * been realloced, in which case the driver uniform storage set up by
+    * _mesa_associate_uniform_storage() would point to freed memory.  Make
+    * sure that didn't happen.
+    */
+   assert(sanity_param_count == fp->Base.Parameters->NumParameters);
+
+   return !failed;
+}
+
+const unsigned *
+brw_wm_fs_emit(struct brw_context *brw, struct brw_wm_compile *c,
+               struct gl_fragment_program *fp,
+               struct gl_shader_program *prog,
+               unsigned *final_assembly_size)
+{
+//   bool start_busy = false;
+//   double start_time = 0;
+
+//   if (unlikely(brw->perf_debug)) {
+//      start_busy = (brw->batch.last_bo &&
+//                    drm_intel_bo_busy(brw->batch.last_bo));
+//      start_time = get_time();
+//   }
+
+   struct brw_shader *shader = NULL;
+   if (prog)
+      shader = (brw_shader *) prog->_LinkedShaders[MESA_SHADER_FRAGMENT];
+
+   if (unlikely(INTEL_DEBUG & DEBUG_WM))
+      brw_dump_ir(brw, "fragment", prog, &shader->base, &fp->Base);
+
+   /* Now the main event: Visit the shader IR and generate our FS IR for it.
+    */
+   fs_visitor v(brw, c, prog, fp, 8);
+   if (!v.run()) {
+      if (prog) {
+         prog->LinkStatus = false;
+         ralloc_strcat(&prog->InfoLog, v.fail_msg);
+      }
+
+      _mesa_problem(NULL, "Failed to compile fragment shader: %s\n",
+                    v.fail_msg);
+
+      return NULL;
+   }
+
+   exec_list *simd16_instructions = NULL;
+   fs_visitor v2(brw, c, prog, fp, 16);
+   if (brw->gen >= 5 && likely(!(INTEL_DEBUG & DEBUG_NO16))) {
+      if (!v.simd16_unsupported) {
+         /* Try a SIMD16 compile */
+         v2.import_uniforms(&v);
+         if (!v2.run()) {
+            perf_debug("SIMD16 shader failed to compile, falling back to "
+                       "SIMD8 at a 10-20%% performance cost: %s", v2.fail_msg);
+         } else {
+            // Use SIMD16 unless its estimated clocks exceed 8/7 of the SIMD8 estimate.
+            if ((v2.estimated_clocks * 7 / 8) < v.estimated_clocks)
+               simd16_instructions = &v2.instructions;
+         }
+      } else {
+         perf_debug("SIMD16 shader unsupported, falling back to "
+                    "SIMD8 at a 10-20%% performance cost: %s", v.no16_msg);
+      }
+   }
+
+   const unsigned *assembly = NULL;
+   // LunarG : TODO - Gen8 support
+//   if (brw->gen >= 8) {
+//      gen8_fs_generator g(brw, c, prog, fp, v.do_dual_src);
+//      assembly = g.generate_assembly(&v.instructions, simd16_instructions,
+//                                     final_assembly_size);
+//   } else {
+      fs_generator g(brw, c, prog, fp, v.do_dual_src);
+      assembly = g.generate_assembly(&v.instructions, simd16_instructions,
+                                     final_assembly_size);
+//   }
+
+//   if (unlikely(brw->perf_debug) && shader) {
+//      if (shader->compiled_once)
+//         brw_wm_debug_recompile(brw, prog, &c->key);
+//      shader->compiled_once = true;
+
+//      if (start_busy && !drm_intel_bo_busy(brw->batch.last_bo)) {
+//         perf_debug("FS compile took %.03f ms and stalled the GPU\n",
+//                    (get_time() - start_time) * 1000);
+//      }
+//   }
+
+   return assembly;
+}
+
+bool
+brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program *prog)
+{
+   struct brw_context *brw = brw_context(ctx);
+   const struct brw_shader_program_precompile_key *pre_key =
+      brw_shader_program_get_precompile_key(prog);
+   struct brw_wm_prog_key key;
+
+   if (!prog->_LinkedShaders[MESA_SHADER_FRAGMENT])
+      return true;
+
+   struct gl_fragment_program *fp = (struct gl_fragment_program *)
+      prog->_LinkedShaders[MESA_SHADER_FRAGMENT]->Program;
+   struct brw_fragment_program *bfp = brw_fragment_program(fp);
+   bool program_uses_dfdy = fp->UsesDFdy;
+
+   memset(&key, 0, sizeof(key));
+
+   if (brw->gen < 6) {
+      if (fp->UsesKill)
+         key.iz_lookup |= IZ_PS_KILL_ALPHATEST_BIT;
+
+      if (fp->Base.OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_DEPTH))
+         key.iz_lookup |= IZ_PS_COMPUTES_DEPTH_BIT;
+
+      /* Just assume depth testing. */
+      key.iz_lookup |= IZ_DEPTH_TEST_ENABLE_BIT;
+      key.iz_lookup |= IZ_DEPTH_WRITE_ENABLE_BIT;
+   }
+
+   if (brw->gen < 6 || _mesa_bitcount_64(fp->Base.InputsRead &
+                                         BRW_FS_VARYING_INPUT_MASK) > 16)
+      key.input_slots_valid = fp->Base.InputsRead | VARYING_BIT_POS;
+
+   key.clamp_fragment_color = ctx->API == API_OPENGL_COMPAT;
+
+   unsigned sampler_count = _mesa_fls(fp->Base.SamplersUsed);
+   for (unsigned i = 0; i < sampler_count; i++) {
+      if (fp->Base.ShadowSamplers & (1 << i)) {
+         /* Assume DEPTH_TEXTURE_MODE is the default: X, X, X, 1 */
+         key.tex.swizzles[i] =
+            MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_X, SWIZZLE_X, SWIZZLE_ONE);
+      } else {
+         /* Color sampler: assume no swizzling. */
+         key.tex.swizzles[i] = SWIZZLE_XYZW;
+      }
+   }
+
+   if (fp->Base.InputsRead & VARYING_BIT_POS) {
+      key.drawable_height = pre_key->fbo_height;
+   }
+
+   key.nr_color_regions = _mesa_bitcount_64(fp->Base.OutputsWritten &
+         ~(BITFIELD64_BIT(FRAG_RESULT_DEPTH) |
+         BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK)));
+
+   if ((fp->Base.InputsRead & VARYING_BIT_POS) || program_uses_dfdy) {
+      key.render_to_fbo = pre_key->is_user_fbo ||
+                          key.nr_color_regions > 1;
+   }
+
+   /* GL_FRAGMENT_SHADER_DERIVATIVE_HINT is almost always GL_DONT_CARE.  The
+    * quality of the derivatives is likely to be determined by the driconf
+    * option.
+    */
+   key.high_quality_derivatives = brw->disable_derivative_optimization;
+
+   key.program_string_id = bfp->id;
+
+   struct brw_wm_compile *c;
+
+   c = brw_wm_init_compile(brw, prog, bfp, &key);
+   if (!c)
+      return false;
+
+   if (!brw_wm_do_compile(brw, c)) {
+      brw_wm_clear_compile(brw, c);
+      return false;
+   }
+
+   // Rather than deferring or uploading to the cache, hand the compile
+   // results back to the brw_context.
+   brw_shader_program_save_wm_compile(brw->shader_prog, c);
+
+   // Populate some other fields that would normally be set up at draw time.
+   brw->wm.base.sampler_count = _mesa_fls(fp->Base.SamplersUsed);
+   // Others?  ...
+
+   brw_wm_clear_compile(brw, c);
+
+   return true;
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs.h b/icd/intel/compiler/pipeline/brw_fs.h
new file mode 100644
index 0000000..2454589
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs.h
@@ -0,0 +1,797 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#pragma once
+
+#include "brw_shader.h"
+
+extern "C" {
+
+#include <sys/types.h>
+
+#include "main/macros.h"
+#include "main/shaderobj.h"
+#include "main/uniforms.h"
+#include "program/prog_parameter.h"
+#include "program/prog_print.h"
+#include "program/prog_optimize.h"
+#include "program/register_allocate.h"
+#include "program/sampler.h"
+#include "program/hash_table.h"
+#include "brw_context.h"
+#include "brw_eu.h"
+#include "brw_wm.h"
+#include "brw_shader.h"
+}
+#include "gen8_generator.h"
+#include "glsl/glsl_types.h"
+#include "glsl/ir.h"
+
+#define MAX_SAMPLER_MESSAGE_SIZE 11
+
+class bblock_t;
+namespace {
+   struct acp_entry;
+}
+
+namespace brw {
+   class fs_live_variables;
+}
+
+class igraph_t;
+
+class fs_reg {
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(fs_reg)
+
+   void init();
+
+   fs_reg();
+   fs_reg(float f);
+   fs_reg(int32_t i);
+   fs_reg(uint32_t u);
+   fs_reg(struct brw_reg fixed_hw_reg);
+   fs_reg(enum register_file file, int reg);
+   fs_reg(enum register_file file, int reg, uint32_t type);
+   fs_reg(class fs_visitor *v, const struct glsl_type *type);
+
+   bool equals(const fs_reg &r) const;
+   bool is_zero() const;
+   bool is_one() const;
+   bool is_null() const;
+   bool is_valid_3src() const;
+   bool is_contiguous() const;
+   bool is_accumulator() const;
+
+   fs_reg &apply_stride(unsigned stride);
+   /** Smear a channel of the reg to all channels. */
+   fs_reg &set_smear(unsigned subreg);
+
+   /** Register file: GRF, MRF, IMM. */
+   enum register_file file;
+   /** Register type.  BRW_REGISTER_TYPE_* */
+   uint8_t type;
+   /**
+    * Register number.  For MRF, it's the hardware register.  For
+    * GRF, it's a virtual register number until register allocation
+    */
+   uint16_t reg;
+   /**
+    * Offset from the start of the contiguous register block.
+    *
+    * For pre-register-allocation GRFs, this is in units of a float per pixel
+    * (1 hardware register for SIMD8 mode, or 2 registers for SIMD16 mode).
+    * For uniforms, this is in units of 1 float.
+    */
+   int reg_offset;
+   /**
+    * Offset in bytes from the start of the register.  Values up to a
+    * backend_reg::reg_offset unit are valid.
+    */
+   int subreg_offset;
+
+   /** Value for file == IMM */
+   union {
+      int32_t i;
+      uint32_t u;
+      float f;
+   } imm;
+
+   struct brw_reg fixed_hw_reg;
+
+   fs_reg *reladdr;
+
+   bool negate;
+   bool abs;
+
+   /** Register region horizontal stride */
+   uint8_t stride;
+};
+
+static inline fs_reg
+retype(fs_reg reg, unsigned type)
+{
+   reg.fixed_hw_reg.type = reg.type = type;
+   return reg;
+}
+
+static inline fs_reg
+offset(fs_reg reg, unsigned delta)
+{
+   assert(delta == 0 || (reg.file != HW_REG && reg.file != IMM));
+   reg.reg_offset += delta;
+   return reg;
+}
+
+static inline fs_reg
+byte_offset(fs_reg reg, unsigned delta)
+{
+   assert(delta == 0 || (reg.file != HW_REG && reg.file != IMM));
+   reg.subreg_offset += delta;
+   return reg;
+}
+
+/**
+ * Get either of the 8-component halves of a 16-component register.
+ *
+ * Note: this also works if \c reg represents a SIMD16 pair of registers.
+ */
+static inline fs_reg
+half(const fs_reg &reg, unsigned idx)
+{
+   assert(idx < 2);
+   assert(idx == 0 || (reg.file != HW_REG && reg.file != IMM));
+   return byte_offset(reg, 8 * idx * reg.stride * type_sz(reg.type));
+}
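+
+/* Example (illustrative): for a stride-1 float register in SIMD16,
+ * half(reg, 0) covers channels 0-7 and half(reg, 1) covers channels 8-15,
+ * i.e. a byte_offset of 8 * 1 * sizeof(float) = 32 bytes, one full register.
+ */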
+
+static const fs_reg reg_undef;
+static const fs_reg reg_null_f(retype(brw_null_reg(), BRW_REGISTER_TYPE_F));
+static const fs_reg reg_null_d(retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+static const fs_reg reg_null_ud(retype(brw_null_reg(), BRW_REGISTER_TYPE_UD));
+
+class ip_record : public exec_node {
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(ip_record)
+
+   ip_record(int ip)
+   {
+      this->ip = ip;
+   }
+
+   int ip;
+};
+
+class fs_inst : public backend_instruction {
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(fs_inst)
+
+   void init();
+
+   fs_inst();
+   fs_inst(enum opcode opcode);
+   fs_inst(enum opcode opcode, fs_reg dst);
+   fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0);
+   fs_inst(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst(enum opcode opcode, fs_reg dst,
+           fs_reg src0, fs_reg src1, fs_reg src2);
+
+   bool equals(fs_inst *inst) const;
+   bool overwrites_reg(const fs_reg &reg) const;
+   bool is_send_from_grf() const;
+   bool is_partial_write() const;
+   int regs_read(fs_visitor *v, int arg) const;
+
+   bool reads_flag() const;
+   bool writes_flag() const;
+
+   fs_reg dst;
+   fs_reg src[3];
+
+   /** @{
+    * Annotation for the generated IR.  One of the two can be set.
+    */
+   const void *ir;
+   const char *annotation;
+   /** @} */
+
+   uint32_t texture_offset; /**< Texture offset bitfield */
+   uint32_t offset; /* spill/unspill offset */
+
+   uint8_t conditional_mod; /**< BRW_CONDITIONAL_* */
+
+   /* Chooses which flag subregister (f0.0 or f0.1) is used for conditional
+    * mod and predication.
+    */
+   uint8_t flag_subreg;
+
+   uint8_t mlen; /**< SEND message length */
+   uint8_t regs_written; /**< Number of vgrfs written by a SEND message, or 1 */
+   int8_t base_mrf; /**< First MRF in the SEND message, if mlen is nonzero. */
+   uint8_t sampler;
+   uint8_t target; /**< MRT target. */
+   bool saturate:1;
+   bool eot:1;
+   bool header_present:1;
+   bool shadow_compare:1;
+   bool force_uncompressed:1;
+   bool force_sechalf:1;
+   bool force_writemask_all:1;
+};
+
+/**
+ * The fragment shader front-end.
+ *
+ * Translates either GLSL IR or Mesa IR (for ARB_fragment_program) into FS IR.
+ */
+class fs_visitor : public backend_visitor
+{
+public:
+
+   fs_visitor(struct brw_context *brw,
+              struct brw_wm_compile *c,
+              struct gl_shader_program *shader_prog,
+              struct gl_fragment_program *fp,
+              unsigned dispatch_width);
+   ~fs_visitor();
+
+   fs_reg *variable_storage(ir_variable *var);
+   int virtual_grf_alloc(int size);
+   void import_uniforms(fs_visitor *v);
+
+   void visit(ir_variable *ir);
+   void visit(ir_assignment *ir);
+   void visit(ir_dereference_variable *ir);
+   void visit(ir_dereference_record *ir);
+   void visit(ir_dereference_array *ir);
+   void visit(ir_expression *ir);
+   void visit(ir_texture *ir);
+   void visit(ir_if *ir);
+   void visit(ir_constant *ir);
+   void visit(ir_swizzle *ir);
+   void visit(ir_return *ir);
+   void visit(ir_loop *ir);
+   void visit(ir_loop_jump *ir);
+   void visit(ir_discard *ir);
+   void visit(ir_call *ir);
+   void visit(ir_function *ir);
+   void visit(ir_function_signature *ir);
+   void visit(ir_emit_vertex *);
+   void visit(ir_end_primitive *);
+
+   uint32_t gather_channel(ir_texture *ir, int sampler);
+   void swizzle_result(ir_texture *ir, fs_reg orig_val, int sampler);
+
+   bool can_do_source_mods(fs_inst *inst);
+
+   fs_inst *emit(fs_inst *inst);
+   void emit(exec_list list);
+
+   fs_inst *emit(enum opcode opcode);
+   fs_inst *emit(enum opcode opcode, fs_reg dst);
+   fs_inst *emit(enum opcode opcode, fs_reg dst, fs_reg src0);
+   fs_inst *emit(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *emit(enum opcode opcode, fs_reg dst,
+                 fs_reg src0, fs_reg src1, fs_reg src2);
+
+   fs_inst *MOV(fs_reg dst, fs_reg src);
+   fs_inst *NOT(fs_reg dst, fs_reg src);
+   fs_inst *RNDD(fs_reg dst, fs_reg src);
+   fs_inst *RNDE(fs_reg dst, fs_reg src);
+   fs_inst *RNDZ(fs_reg dst, fs_reg src);
+   fs_inst *FRC(fs_reg dst, fs_reg src);
+   fs_inst *ADD(fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *MUL(fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *MACH(fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *MAC(fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *SHL(fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *SHR(fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *ASR(fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *AND(fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *OR(fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *XOR(fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *IF(uint32_t predicate);
+   fs_inst *IF(fs_reg src0, fs_reg src1, uint32_t condition);
+   fs_inst *CMP(fs_reg dst, fs_reg src0, fs_reg src1,
+                uint32_t condition);
+   fs_inst *LRP(fs_reg dst, fs_reg a, fs_reg y, fs_reg x);
+   fs_inst *DEP_RESOLVE_MOV(int grf);
+   fs_inst *BFREV(fs_reg dst, fs_reg value);
+   fs_inst *BFE(fs_reg dst, fs_reg bits, fs_reg offset, fs_reg value);
+   fs_inst *BFI1(fs_reg dst, fs_reg bits, fs_reg offset);
+   fs_inst *BFI2(fs_reg dst, fs_reg bfi1_dst, fs_reg insert, fs_reg base);
+   fs_inst *FBH(fs_reg dst, fs_reg value);
+   fs_inst *FBL(fs_reg dst, fs_reg value);
+   fs_inst *CBIT(fs_reg dst, fs_reg value);
+   fs_inst *MAD(fs_reg dst, fs_reg c, fs_reg b, fs_reg a);
+   fs_inst *ADDC(fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *SUBB(fs_reg dst, fs_reg src0, fs_reg src1);
+   fs_inst *SEL(fs_reg dst, fs_reg src0, fs_reg src1);
+
+   int type_size(const struct glsl_type *type);
+   fs_inst *get_instruction_generating_reg(fs_inst *start,
+					   fs_inst *end,
+					   const fs_reg &reg);
+
+   exec_list VARYING_PULL_CONSTANT_LOAD(const fs_reg &dst,
+                                        const fs_reg &surf_index,
+                                        const fs_reg &varying_offset,
+                                        uint32_t const_offset);
+
+   bool run();
+   void assign_binding_table_offsets();
+   void setup_payload_gen4();
+   void setup_payload_gen6();
+   void assign_curb_setup();
+   void calculate_urb_setup();
+   void assign_urb_setup();
+   bool assign_regs(bool allow_spilling);
+   bool assign_regs_glassy(bool allow_spilling);
+   void assign_regs_trivial();
+   void get_used_mrfs(bool *mrf_used);
+   void setup_payload_interference(int* payload_last_use_ip,
+                                   int* mrf_first_use_ip,
+                                   int payload_reg_count,
+                                   int mrf_node_count,
+                                   int first_payload_node);
+   void choose_spill_reg(float* spill_costs, bool* no_spill);
+   int choose_spill_reg(igraph_t& g);
+   void spill_reg(int spill_reg);
+   void split_virtual_grfs();
+   void compact_virtual_grfs();
+   void move_uniform_array_access_to_pull_constants();
+   void assign_constant_locations();
+   void demote_pull_constants();
+   void invalidate_live_intervals();
+   void calculate_live_intervals(int extra = 0);
+   int calculate_register_pressure(int extra = 0);
+   bool opt_algebraic();
+   bool opt_cse();
+   bool opt_cse_local(bblock_t *block, exec_list *aeb);
+   bool opt_copy_propagate();
+   bool try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry);
+   bool opt_copy_propagate_local(void *mem_ctx, bblock_t *block,
+                                 exec_list *acp);
+   void opt_drop_redundant_mov_to_flags();
+   bool register_coalesce();
+   bool compute_to_mrf();
+   bool dead_code_eliminate();
+   bool remove_duplicate_mrf_writes();
+   bool virtual_grf_interferes(int a, int b);
+   int  live_in_count(int block_num) const;
+   int  live_out_count(int block_num) const;
+   int schedule_instructions(instruction_scheduler_mode mode);
+   void insert_gen4_send_dependency_workarounds();
+   void insert_gen4_pre_send_dependency_workarounds(fs_inst *inst);
+   void insert_gen4_post_send_dependency_workarounds(fs_inst *inst);
+   void vfail(const char *msg, va_list args);
+   void fail(const char *msg, ...);
+   void no16(const char *msg, ...);
+   void lower_uniform_pull_constant_loads();
+
+   void push_force_uncompressed();
+   void pop_force_uncompressed();
+
+   void emit_dummy_fs();
+   fs_reg *emit_fragcoord_interpolation(ir_variable *ir);
+   fs_inst *emit_linterp(const fs_reg &attr, const fs_reg &interp,
+                         glsl_interp_qualifier interpolation_mode,
+                         bool is_centroid, bool is_sample);
+   fs_reg *emit_frontfacing_interpolation(ir_variable *ir);
+   fs_reg *emit_samplepos_setup(ir_variable *ir);
+   fs_reg *emit_sampleid_setup(ir_variable *ir);
+   fs_reg *emit_samplemaskin_setup(ir_variable *ir);
+   fs_reg *emit_general_interpolation(ir_variable *ir);
+   void emit_interpolation_setup_gen4();
+   void emit_interpolation_setup_gen6();
+   void compute_sample_position(fs_reg dst, fs_reg int_sample_pos);
+   fs_reg rescale_texcoord(ir_texture *ir, fs_reg coordinate,
+                           bool is_rect, int sampler, int texunit);
+   fs_inst *emit_texture_gen4(ir_texture *ir, fs_reg dst, fs_reg coordinate,
+			      fs_reg shadow_comp, fs_reg lod, fs_reg lod2);
+   fs_inst *emit_texture_gen5(ir_texture *ir, fs_reg dst, fs_reg coordinate,
+                              fs_reg shadow_comp, fs_reg lod, fs_reg lod2,
+                              fs_reg sample_index);
+   fs_inst *emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate,
+                              fs_reg shadow_comp, fs_reg lod, fs_reg lod2,
+                              fs_reg sample_index, fs_reg mcs, int sampler);
+   fs_reg emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, int sampler);
+   void emit_gen6_gather_wa(uint8_t wa, fs_reg dst);
+   fs_reg fix_math_operand(fs_reg src);
+   fs_inst *emit_math(enum opcode op, fs_reg dst, fs_reg src0);
+   fs_inst *emit_math(enum opcode op, fs_reg dst, fs_reg src0, fs_reg src1);
+   void emit_lrp(const fs_reg &dst, const fs_reg &x, const fs_reg &y,
+                 const fs_reg &a);
+   void emit_minmax(uint32_t conditionalmod, const fs_reg &dst,
+                    const fs_reg &src0, const fs_reg &src1);
+   bool try_emit_saturate(ir_expression *ir);
+   bool try_emit_mad(ir_expression *ir);
+   void try_replace_with_sel();
+   bool opt_peephole_sel();
+   bool opt_peephole_predicated_break();
+   bool opt_saturate_propagation();
+   void emit_bool_to_cond_code(ir_rvalue *condition);
+   void emit_if_gen6(ir_if *ir);
+   void emit_unspill(fs_inst *inst, fs_reg reg, uint32_t spill_offset,
+                     int count);
+
+   void emit_fragment_program_code();
+   void setup_fp_regs();
+   fs_reg get_fp_src_reg(const prog_src_register *src);
+   fs_reg get_fp_dst_reg(const prog_dst_register *dst);
+   void emit_fp_alu1(enum opcode opcode,
+                     const struct prog_instruction *fpi,
+                     fs_reg dst, fs_reg src);
+   void emit_fp_alu2(enum opcode opcode,
+                     const struct prog_instruction *fpi,
+                     fs_reg dst, fs_reg src0, fs_reg src1);
+   void emit_fp_scalar_write(const struct prog_instruction *fpi,
+                             fs_reg dst, fs_reg src);
+   void emit_fp_scalar_math(enum opcode opcode,
+                            const struct prog_instruction *fpi,
+                            fs_reg dst, fs_reg src);
+
+   void emit_fp_minmax(const struct prog_instruction *fpi,
+                       fs_reg dst, fs_reg src0, fs_reg src1);
+
+   void emit_fp_sop(uint32_t conditional_mod,
+                    const struct prog_instruction *fpi,
+                    fs_reg dst, fs_reg src0, fs_reg src1, fs_reg one);
+
+   void emit_color_write(int target, int index, int first_color_mrf);
+   void emit_alpha_test();
+   void emit_fb_writes();
+
+   void emit_shader_time_begin();
+   void emit_shader_time_end();
+   void emit_shader_time_write(enum shader_time_shader_type type,
+                               fs_reg value);
+
+   void emit_untyped_atomic(unsigned atomic_op, unsigned surf_index,
+                            fs_reg dst, fs_reg offset, fs_reg src0,
+                            fs_reg src1);
+
+   void emit_untyped_surface_read(unsigned surf_index, fs_reg dst,
+                                  fs_reg offset);
+
+   bool try_rewrite_rhs_to_dst(ir_assignment *ir,
+			       fs_reg dst,
+			       fs_reg src,
+			       fs_inst *pre_rhs_inst,
+			       fs_inst *last_rhs_inst);
+   void emit_assignment_writes(fs_reg &l, fs_reg &r,
+			       const glsl_type *type, bool predicated);
+   void resolve_ud_negate(fs_reg *reg);
+   void resolve_bool_comparison(ir_rvalue *rvalue, fs_reg *reg);
+
+   fs_reg get_timestamp();
+
+   struct brw_reg interp_reg(int location, int channel);
+   void setup_uniform_values(ir_variable *ir);
+   void setup_builtin_uniform_values(ir_variable *ir);
+   int implied_mrf_writes(fs_inst *inst);
+
+   virtual void dump_instructions();
+   void dump_instruction(backend_instruction *inst);
+   void dump_instruction(backend_instruction *inst, FILE *file);
+   void dump_instruction(backend_instruction *inst, char* string);
+
+   void visit_atomic_counter_intrinsic(ir_call *ir);
+
+   struct gl_fragment_program *fp;
+   struct brw_wm_compile *c;
+   unsigned int sanity_param_count;
+
+   int *param_size;
+
+   int *virtual_grf_sizes;
+   int virtual_grf_count;
+   int virtual_grf_array_size;
+   int *virtual_grf_start;
+   int *virtual_grf_end;
+   brw::fs_live_variables *live_intervals;
+
+   int *regs_live_at_ip;
+
+   /** Number of uniform variable components visited. */
+   unsigned uniforms;
+
+   int estimated_clocks;
+
+   /**
+    * Array mapping UNIFORM register numbers to the pull parameter index,
+    * or -1 if this uniform register isn't being uploaded as a pull constant.
+    */
+   int *pull_constant_loc;
+
+   /**
+    * Array mapping UNIFORM register numbers to the push parameter index,
+    * or -1 if this uniform register isn't being uploaded as a push constant.
+    */
+   int *push_constant_loc;
+
+   struct hash_table *variable_ht;
+   fs_reg frag_depth;
+   fs_reg sample_mask;
+   fs_reg outputs[BRW_MAX_DRAW_BUFFERS];
+   unsigned output_components[BRW_MAX_DRAW_BUFFERS];
+   fs_reg dual_src_output;
+   bool do_dual_src;
+   int first_non_payload_grf;
+   /** Either BRW_MAX_GRF or GEN7_MRF_HACK_START */
+   int max_grf;
+
+   fs_reg *fp_temp_regs;
+   fs_reg *fp_input_regs;
+
+   /** @{ debug annotation info */
+   const char *current_annotation;
+   const void *base_ir;
+   /** @} */
+
+   bool failed;
+   char *fail_msg;
+   bool simd16_unsupported;
+   char *no16_msg;
+
+   /* Result of last visit() method. */
+   fs_reg result;
+
+   fs_reg pixel_x;
+   fs_reg pixel_y;
+   fs_reg wpos_w;
+   fs_reg pixel_w;
+   fs_reg delta_x[BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT];
+   fs_reg delta_y[BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT];
+   fs_reg shader_start_time;
+
+   int grf_used;
+   bool spilled_any_registers;
+
+   const unsigned dispatch_width; /**< 8 or 16 */
+
+   int force_uncompressed_stack;
+};
+
+/**
+ * The fragment shader code generator.
+ *
+ * Translates FS IR to actual i965 assembly code.
+ */
+class fs_generator
+{
+public:
+   fs_generator(struct brw_context *brw,
+                struct brw_wm_compile *c,
+                struct gl_shader_program *prog,
+                struct gl_fragment_program *fp,
+                bool dual_source_output);
+   ~fs_generator();
+
+   const unsigned *generate_assembly(exec_list *simd8_instructions,
+                                     exec_list *simd16_instructions,
+                                     unsigned *assembly_size,
+                                     FILE *dump_file = NULL);
+
+private:
+   void generate_code(exec_list *instructions, FILE *dump_file);
+   void generate_fb_write(fs_inst *inst);
+   void generate_blorp_fb_write(fs_inst *inst);
+   void generate_pixel_xy(struct brw_reg dst, bool is_x);
+   void generate_linterp(fs_inst *inst, struct brw_reg dst,
+			 struct brw_reg *src);
+   void generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src);
+   void generate_math1_gen7(fs_inst *inst,
+			    struct brw_reg dst,
+			    struct brw_reg src);
+   void generate_math2_gen7(fs_inst *inst,
+			    struct brw_reg dst,
+			    struct brw_reg src0,
+			    struct brw_reg src1);
+   void generate_math1_gen6(fs_inst *inst,
+			    struct brw_reg dst,
+			    struct brw_reg src);
+   void generate_math2_gen6(fs_inst *inst,
+			    struct brw_reg dst,
+			    struct brw_reg src0,
+			    struct brw_reg src1);
+   void generate_math_gen4(fs_inst *inst,
+			   struct brw_reg dst,
+			   struct brw_reg src);
+   void generate_math_g45(fs_inst *inst,
+			  struct brw_reg dst,
+			  struct brw_reg src);
+   void generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src);
+   void generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg src,
+                     bool negate_value);
+   void generate_scratch_write(fs_inst *inst, struct brw_reg src);
+   void generate_scratch_read(fs_inst *inst, struct brw_reg dst);
+   void generate_scratch_read_gen7(fs_inst *inst, struct brw_reg dst);
+   void generate_uniform_pull_constant_load(fs_inst *inst, struct brw_reg dst,
+                                            struct brw_reg index,
+                                            struct brw_reg offset);
+   void generate_uniform_pull_constant_load_gen7(fs_inst *inst,
+                                                 struct brw_reg dst,
+                                                 struct brw_reg surf_index,
+                                                 struct brw_reg offset);
+   void generate_varying_pull_constant_load(fs_inst *inst, struct brw_reg dst,
+                                            struct brw_reg index,
+                                            struct brw_reg offset);
+   void generate_varying_pull_constant_load_gen7(fs_inst *inst,
+                                                 struct brw_reg dst,
+                                                 struct brw_reg index,
+                                                 struct brw_reg offset);
+   void generate_mov_dispatch_to_flags(fs_inst *inst);
+
+   void generate_set_omask(fs_inst *inst,
+                           struct brw_reg dst,
+                           struct brw_reg sample_mask);
+
+   void generate_set_sample_id(fs_inst *inst,
+                               struct brw_reg dst,
+                               struct brw_reg src0,
+                               struct brw_reg src1);
+
+   void generate_set_simd4x2_offset(fs_inst *inst,
+                                    struct brw_reg dst,
+                                    struct brw_reg offset);
+   void generate_discard_jump(fs_inst *inst);
+
+   void generate_pack_half_2x16_split(fs_inst *inst,
+                                      struct brw_reg dst,
+                                      struct brw_reg x,
+                                      struct brw_reg y);
+   void generate_unpack_half_2x16_split(fs_inst *inst,
+                                        struct brw_reg dst,
+                                        struct brw_reg src);
+
+   void generate_shader_time_add(fs_inst *inst,
+                                 struct brw_reg payload,
+                                 struct brw_reg offset,
+                                 struct brw_reg value);
+
+   void generate_untyped_atomic(fs_inst *inst,
+                                struct brw_reg dst,
+                                struct brw_reg atomic_op,
+                                struct brw_reg surf_index);
+
+   void generate_untyped_surface_read(fs_inst *inst,
+                                      struct brw_reg dst,
+                                      struct brw_reg surf_index);
+
+   void generate_scattered_write(fs_inst *inst,
+                                 const struct brw_reg &dst,
+                                 const struct brw_reg &src);
+
+   void generate_scattered_read(fs_inst *inst,
+                                const struct brw_reg &dst,
+                                const struct brw_reg &src);
+
+   void patch_discard_jumps_to_fb_writes();
+
+   struct brw_context *brw;
+   struct gl_context *ctx;
+
+   struct brw_compile *p;
+   struct brw_wm_compile *c;
+
+   struct gl_shader_program *prog;
+   const struct gl_fragment_program *fp;
+
+   unsigned dispatch_width; /**< 8 or 16 */
+
+   exec_list discard_halt_patches;
+   bool dual_source_output;
+   void *mem_ctx;
+};
+
+/**
+ * The Gen8 (Broadwell) fragment shader code generator.
+ *
+ * Translates FS IR to actual Gen8 assembly code.
+ */
+class gen8_fs_generator : public gen8_generator
+{
+public:
+   gen8_fs_generator(struct brw_context *brw,
+                     struct brw_wm_compile *c,
+                     struct gl_shader_program *prog,
+                     struct gl_fragment_program *fp,
+                     bool dual_source_output);
+   ~gen8_fs_generator();
+
+   const unsigned *generate_assembly(exec_list *simd8_instructions,
+                                     exec_list *simd16_instructions,
+                                     unsigned *assembly_size);
+
+private:
+   void generate_code(exec_list *instructions);
+   void generate_fb_write(fs_inst *inst);
+   void generate_linterp(fs_inst *inst, struct brw_reg dst,
+                         struct brw_reg *src);
+   void generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src);
+   void generate_math1(fs_inst *inst, struct brw_reg dst, struct brw_reg src);
+   void generate_math2(fs_inst *inst, struct brw_reg dst,
+                       struct brw_reg src0, struct brw_reg src1);
+   void generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src);
+   void generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg src,
+                     bool negate_value);
+   void generate_scratch_write(fs_inst *inst, struct brw_reg src);
+   void generate_scratch_read(fs_inst *inst, struct brw_reg dst);
+   void generate_scratch_read_gen7(fs_inst *inst, struct brw_reg dst);
+   void generate_uniform_pull_constant_load(fs_inst *inst,
+                                            struct brw_reg dst,
+                                            struct brw_reg index,
+                                            struct brw_reg offset);
+   void generate_varying_pull_constant_load(fs_inst *inst,
+                                            struct brw_reg dst,
+                                            struct brw_reg index,
+                                            struct brw_reg offset);
+   void generate_mov_dispatch_to_flags(fs_inst *ir);
+   void generate_set_omask(fs_inst *ir,
+                           struct brw_reg dst,
+                           struct brw_reg sample_mask);
+   void generate_set_sample_id(fs_inst *ir,
+                               struct brw_reg dst,
+                               struct brw_reg src0,
+                               struct brw_reg src1);
+   void generate_set_simd4x2_offset(fs_inst *ir,
+                                    struct brw_reg dst,
+                                    struct brw_reg offset);
+   void generate_pack_half_2x16_split(fs_inst *inst,
+                                      struct brw_reg dst,
+                                      struct brw_reg x,
+                                      struct brw_reg y);
+   void generate_unpack_half_2x16_split(fs_inst *inst,
+                                        struct brw_reg dst,
+                                        struct brw_reg src);
+   void generate_untyped_atomic(fs_inst *inst,
+                                struct brw_reg dst,
+                                struct brw_reg atomic_op,
+                                struct brw_reg surf_index);
+
+   void generate_untyped_surface_read(fs_inst *inst,
+                                      struct brw_reg dst,
+                                      struct brw_reg surf_index);
+   void generate_discard_jump(fs_inst *ir);
+
+   void patch_discard_jumps_to_fb_writes();
+
+   struct brw_wm_compile *c;
+   const struct gl_fragment_program *fp;
+
+   unsigned dispatch_width; /**< 8 or 16 */
+
+   bool dual_source_output;
+
+   exec_list discard_halt_patches;
+};
+
+bool brw_do_channel_expressions(struct exec_list *instructions);
+bool brw_do_vector_splitting(struct exec_list *instructions);
+bool brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program *prog);
+
+struct brw_reg brw_reg_from_fs_reg(fs_reg *reg);
diff --git a/icd/intel/compiler/pipeline/brw_fs_channel_expressions.cpp b/icd/intel/compiler/pipeline/brw_fs_channel_expressions.cpp
new file mode 100644
index 0000000..10bbd36
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_channel_expressions.cpp
@@ -0,0 +1,429 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file brw_fs_channel_expressions.cpp
+ *
+ * Breaks vector operations down into operations on each component.
+ *
+ * The 965 fragment shader receives 8 or 16 pixels at a time, so each
+ * channel of a vector is laid out as 1 or 2 8-float registers.  Each
+ * ALU operation operates on one of those channel registers.  As a
+ * result, there is no value to the 965 fragment shader in tracking
+ * "vector" expressions in the sense of GLSL fragment shaders, while
+ * operating on one channel at a time helps constant folding, algebraic
+ * simplification, and reducing the liveness of channel registers.
+ *
+ * The exception to the desire to break everything down to floats is
+ * texturing.  The texture sampler returns a writemasked 4- or
+ * 8-register sequence containing the texture values.  We don't want
+ * to dispatch to the sampler separately for each channel we need, so
+ * we do retain the vector types in that case.
+ */
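+
+/* As a rough illustration (hypothetical IR, not taken from a real shader),
+ * a vector assignment such as
+ *
+ *    (assign (xyzw) (var_ref v) (expression vec4 + (var_ref a) (var_ref b)))
+ *
+ * is rewritten into one scalar assignment per channel:
+ *
+ *    (assign (x) (var_ref v) (expression float + (swiz x (var_ref a))
+ *                                                (swiz x (var_ref b))))
+ *
+ * and likewise for y, z, and w, with the operands first copied into
+ * temporaries so each channel extraction is cheap.
+ */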
+
+extern "C" {
+//#include "main/core.h" // LunarG : Removed
+#include "brw_wm.h"
+}
+#include "glsl/ir.h"
+#include "glsl/ir_expression_flattening.h"
+#include "glsl/glsl_types.h"
+
+class ir_channel_expressions_visitor : public ir_hierarchical_visitor {
+public:
+   ir_channel_expressions_visitor()
+   {
+      this->progress = false;
+      this->mem_ctx = NULL;
+   }
+
+   ir_visitor_status visit_leave(ir_assignment *);
+
+   ir_rvalue *get_element(ir_variable *var, unsigned int element);
+   void assign(ir_assignment *ir, int elem, ir_rvalue *val);
+
+   bool progress;
+   void *mem_ctx;
+};
+
+static bool
+channel_expressions_predicate(ir_instruction *ir)
+{
+   ir_expression *expr = ir->as_expression();
+   unsigned int i;
+
+   if (!expr)
+      return false;
+
+   for (i = 0; i < expr->get_num_operands(); i++) {
+      if (expr->operands[i]->type->is_vector())
+	 return true;
+   }
+
+   return false;
+}
+
+bool
+brw_do_channel_expressions(exec_list *instructions)
+{
+   ir_channel_expressions_visitor v;
+
+   /* Pull out any matrix expression to a separate assignment to a
+    * temp.  This will make our handling of the breakdown to
+    * operations on the matrix's vector components much easier.
+    */
+   do_expression_flattening(instructions, channel_expressions_predicate);
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
+
+ir_rvalue *
+ir_channel_expressions_visitor::get_element(ir_variable *var, unsigned int elem)
+{
+   ir_dereference *deref;
+
+   if (var->type->is_scalar())
+      return new(mem_ctx) ir_dereference_variable(var);
+
+   assert(elem < var->type->components());
+   deref = new(mem_ctx) ir_dereference_variable(var);
+   return new(mem_ctx) ir_swizzle(deref, elem, 0, 0, 0, 1);
+}
+
+void
+ir_channel_expressions_visitor::assign(ir_assignment *ir, int elem, ir_rvalue *val)
+{
+   ir_dereference *lhs = ir->lhs->clone(mem_ctx, NULL);
+   ir_assignment *assign;
+
+   /* This assign-of-expression should have been generated by the
+    * expression flattening visitor (since we never short circuit to
+    * not flatten, even for plain assignments of variables), so the
+    * writemask is always full.
+    */
+   assert(ir->write_mask == (1 << ir->lhs->type->components()) - 1);
+
+   assign = new(mem_ctx) ir_assignment(lhs, val, NULL, (1 << elem));
+   ir->insert_before(assign);
+}
+
+ir_visitor_status
+ir_channel_expressions_visitor::visit_leave(ir_assignment *ir)
+{
+   ir_expression *expr = ir->rhs->as_expression();
+   bool found_vector = false;
+   unsigned int i, vector_elements = 1;
+   ir_variable *op_var[3];
+
+   if (!expr)
+      return visit_continue;
+
+   if (!this->mem_ctx)
+      this->mem_ctx = ralloc_parent(ir);
+
+   for (i = 0; i < expr->get_num_operands(); i++) {
+      if (expr->operands[i]->type->is_vector()) {
+	 found_vector = true;
+	 vector_elements = expr->operands[i]->type->vector_elements;
+	 break;
+      }
+   }
+   if (!found_vector)
+      return visit_continue;
+
+   /* Store the expression operands in temps so we can use them
+    * multiple times.
+    */
+   for (i = 0; i < expr->get_num_operands(); i++) {
+      ir_assignment *assign;
+      ir_dereference *deref;
+
+      assert(!expr->operands[i]->type->is_matrix());
+
+      op_var[i] = new(mem_ctx) ir_variable(expr->operands[i]->type,
+					   "channel_expressions",
+					   ir_var_temporary);
+      ir->insert_before(op_var[i]);
+
+      deref = new(mem_ctx) ir_dereference_variable(op_var[i]);
+      assign = new(mem_ctx) ir_assignment(deref,
+					  expr->operands[i],
+					  NULL);
+      ir->insert_before(assign);
+   }
+
+   const glsl_type *element_type = glsl_type::get_instance(ir->lhs->type->base_type,
+							   1, 1);
+
+   /* OK, time to break down this vector operation. */
+   switch (expr->operation) {
+   case ir_unop_bit_not:
+   case ir_unop_logic_not:
+   case ir_unop_neg:
+   case ir_unop_abs:
+   case ir_unop_sign:
+   case ir_unop_rcp:
+   case ir_unop_rsq:
+   case ir_unop_sqrt:
+   case ir_unop_exp:
+   case ir_unop_log:
+   case ir_unop_exp2:
+   case ir_unop_log2:
+   case ir_unop_bitcast_i2f:
+   case ir_unop_bitcast_f2i:
+   case ir_unop_bitcast_f2u:
+   case ir_unop_bitcast_u2f:
+   case ir_unop_i2u:
+   case ir_unop_u2i:
+   case ir_unop_f2i:
+   case ir_unop_f2u:
+   case ir_unop_i2f:
+   case ir_unop_f2b:
+   case ir_unop_b2f:
+   case ir_unop_i2b:
+   case ir_unop_b2i:
+   case ir_unop_u2f:
+   case ir_unop_trunc:
+   case ir_unop_ceil:
+   case ir_unop_floor:
+   case ir_unop_fract:
+   case ir_unop_round_even:
+   case ir_unop_sin:
+   case ir_unop_cos:
+   case ir_unop_sin_reduced:
+   case ir_unop_cos_reduced:
+   case ir_unop_dFdx:
+   case ir_unop_dFdy:
+   case ir_unop_bitfield_reverse:
+   case ir_unop_bit_count:
+   case ir_unop_find_msb:
+   case ir_unop_find_lsb:
+      for (i = 0; i < vector_elements; i++) {
+	 ir_rvalue *op0 = get_element(op_var[0], i);
+
+	 assign(ir, i, new(mem_ctx) ir_expression(expr->operation,
+						  element_type,
+						  op0,
+						  NULL));
+      }
+      break;
+
+   case ir_binop_add:
+   case ir_binop_sub:
+   case ir_binop_mul:
+   case ir_binop_imul_high:
+   case ir_binop_div:
+   case ir_binop_carry:
+   case ir_binop_borrow:
+   case ir_binop_mod:
+   case ir_binop_min:
+   case ir_binop_max:
+   case ir_binop_pow:
+   case ir_binop_lshift:
+   case ir_binop_rshift:
+   case ir_binop_bit_and:
+   case ir_binop_bit_xor:
+   case ir_binop_bit_or:
+   case ir_binop_less:
+   case ir_binop_greater:
+   case ir_binop_lequal:
+   case ir_binop_gequal:
+   case ir_binop_equal:
+   case ir_binop_nequal:
+      for (i = 0; i < vector_elements; i++) {
+	 ir_rvalue *op0 = get_element(op_var[0], i);
+	 ir_rvalue *op1 = get_element(op_var[1], i);
+
+	 assign(ir, i, new(mem_ctx) ir_expression(expr->operation,
+						  element_type,
+						  op0,
+						  op1));
+      }
+      break;
+
+   case ir_unop_any: {
+      ir_expression *temp;
+      temp = new(mem_ctx) ir_expression(ir_binop_logic_or,
+					element_type,
+					get_element(op_var[0], 0),
+					get_element(op_var[0], 1));
+
+      for (i = 2; i < vector_elements; i++) {
+	 temp = new(mem_ctx) ir_expression(ir_binop_logic_or,
+					   element_type,
+					   get_element(op_var[0], i),
+					   temp);
+      }
+      assign(ir, 0, temp);
+      break;
+   }
+
+   case ir_binop_dot: {
+      ir_expression *last = NULL;
+      for (i = 0; i < vector_elements; i++) {
+	 ir_rvalue *op0 = get_element(op_var[0], i);
+	 ir_rvalue *op1 = get_element(op_var[1], i);
+	 ir_expression *temp;
+
+	 temp = new(mem_ctx) ir_expression(ir_binop_mul,
+					   element_type,
+					   op0,
+					   op1);
+	 if (last) {
+	    last = new(mem_ctx) ir_expression(ir_binop_add,
+					      element_type,
+					      temp,
+					      last);
+	 } else {
+	    last = temp;
+	 }
+      }
+      assign(ir, 0, last);
+      break;
+   }
+
+   case ir_binop_logic_and:
+   case ir_binop_logic_xor:
+   case ir_binop_logic_or:
+      ir->fprint(stderr);
+      fprintf(stderr, "\n");
+      assert(!"not reached: expression operates on scalars only");
+      break;
+   case ir_binop_all_equal:
+   case ir_binop_any_nequal: {
+      ir_expression *last = NULL;
+      for (i = 0; i < vector_elements; i++) {
+	 ir_rvalue *op0 = get_element(op_var[0], i);
+	 ir_rvalue *op1 = get_element(op_var[1], i);
+	 ir_expression *temp;
+	 ir_expression_operation join;
+
+	 if (expr->operation == ir_binop_all_equal)
+	    join = ir_binop_logic_and;
+	 else
+	    join = ir_binop_logic_or;
+
+	 temp = new(mem_ctx) ir_expression(expr->operation,
+					   element_type,
+					   op0,
+					   op1);
+	 if (last) {
+	    last = new(mem_ctx) ir_expression(join,
+					      element_type,
+					      temp,
+					      last);
+	 } else {
+	    last = temp;
+	 }
+      }
+      assign(ir, 0, last);
+      break;
+   }
+   case ir_unop_noise:
+      assert(!"noise should have been broken down to function call");
+      break;
+
+   case ir_binop_bfm: {
+      /* Does not need to be scalarized, since its result will be identical
+       * for all channels.
+       */
+      ir_rvalue *op0 = get_element(op_var[0], 0);
+      ir_rvalue *op1 = get_element(op_var[1], 0);
+
+      assign(ir, 0, new(mem_ctx) ir_expression(expr->operation,
+                                               element_type,
+                                               op0,
+                                               op1));
+      break;
+   }
+
+   case ir_binop_ubo_load:
+      assert(!"not yet supported");
+      break;
+
+   case ir_triop_fma:
+   case ir_triop_lrp:
+   case ir_triop_csel:
+   case ir_triop_bitfield_extract:
+      for (i = 0; i < vector_elements; i++) {
+	 ir_rvalue *op0 = get_element(op_var[0], i);
+	 ir_rvalue *op1 = get_element(op_var[1], i);
+	 ir_rvalue *op2 = get_element(op_var[2], i);
+
+	 assign(ir, i, new(mem_ctx) ir_expression(expr->operation,
+						  element_type,
+						  op0,
+						  op1,
+						  op2));
+      }
+      break;
+
+   case ir_triop_bfi: {
+      /* Only a single BFM is needed for multiple BFIs. */
+      ir_rvalue *op0 = get_element(op_var[0], 0);
+
+      for (i = 0; i < vector_elements; i++) {
+         ir_rvalue *op1 = get_element(op_var[1], i);
+         ir_rvalue *op2 = get_element(op_var[2], i);
+
+         assign(ir, i, new(mem_ctx) ir_expression(expr->operation,
+                                                  element_type,
+                                                  op0->clone(mem_ctx, NULL),
+                                                  op1,
+                                                  op2));
+      }
+      break;
+   }
+
+   case ir_unop_pack_snorm_2x16:
+   case ir_unop_pack_snorm_4x8:
+   case ir_unop_pack_unorm_2x16:
+   case ir_unop_pack_unorm_4x8:
+   case ir_unop_pack_half_2x16:
+   case ir_unop_unpack_snorm_2x16:
+   case ir_unop_unpack_snorm_4x8:
+   case ir_unop_unpack_unorm_2x16:
+   case ir_unop_unpack_unorm_4x8:
+   case ir_unop_unpack_half_2x16:
+   case ir_binop_ldexp:
+   case ir_binop_vector_extract:
+   case ir_triop_vector_insert:
+   case ir_quadop_bitfield_insert:
+   case ir_quadop_vector:
+      assert(!"should have been lowered");
+      break;
+
+   case ir_unop_unpack_half_2x16_split_x:
+   case ir_unop_unpack_half_2x16_split_y:
+   case ir_binop_pack_half_2x16_split:
+      assert(!"not reached: expression operates on scalars only");
+      break;
+   }
+
+   ir->remove();
+   this->progress = true;
+
+   return visit_continue;
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs_copy_propagation.cpp b/icd/intel/compiler/pipeline/brw_fs_copy_propagation.cpp
new file mode 100644
index 0000000..ef149c8
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_copy_propagation.cpp
@@ -0,0 +1,632 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/** @file brw_fs_copy_propagation.cpp
+ *
+ * Support for global copy propagation in two passes: A local pass that does
+ * intra-block copy (and constant) propagation, and a global pass that uses
+ * dataflow analysis on the copies available at the end of each block to re-do
+ * local copy propagation with more copies available.
+ *
+ * See Muchnick's Advanced Compiler Design and Implementation, section
+ * 12.5 (p356).
+ */
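+
+/* A small, hypothetical example of the net effect: given
+ *
+ *    MOV tmp, a
+ *    ADD dst, tmp, b
+ *
+ * the ACP records the copy tmp <- a while it remains live, so the ADD is
+ * rewritten to
+ *
+ *    ADD dst, a, b
+ *
+ * leaving the MOV for dead code elimination to clean up.
+ */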
+
+#define ACP_HASH_SIZE 16
+
+#include "main/bitset.h"
+#include "brw_fs.h"
+#include "brw_cfg.h"
+
+namespace { /* avoid conflict with opt_copy_propagation_elements */
+struct acp_entry : public exec_node {
+   fs_reg dst;
+   fs_reg src;
+};
+
+struct block_data {
+   /**
+    * Which entries in the fs_copy_prop_dataflow acp table are live at the
+    * start of this block.  This is the useful output of the analysis, since
+    * it lets us plug those into the local copy propagation on the second
+    * pass.
+    */
+   BITSET_WORD *livein;
+
+   /**
+    * Which entries in the fs_copy_prop_dataflow acp table are live at the end
+    * of this block.  This is done in initial setup from the per-block acps
+    * returned by the first local copy prop pass.
+    */
+   BITSET_WORD *liveout;
+
+   /**
+    * Which entries in the fs_copy_prop_dataflow acp table are generated by
+    * instructions in this block which reach the end of the block without
+    * being killed.
+    */
+   BITSET_WORD *copy;
+
+   /**
+    * Which entries in the fs_copy_prop_dataflow acp table are killed over the
+    * course of this block.
+    */
+   BITSET_WORD *kill;
+};
+
+class fs_copy_prop_dataflow
+{
+public:
+   fs_copy_prop_dataflow(void *mem_ctx, cfg_t *cfg,
+                         exec_list *out_acp[ACP_HASH_SIZE]);
+
+   void setup_initial_values();
+   void run();
+
+   void dump_block_data() const;
+
+   void *mem_ctx;
+   cfg_t *cfg;
+
+   acp_entry **acp;
+   int num_acp;
+   int bitset_words;
+
+   struct block_data *bd;
+};
+} /* anonymous namespace */
+
+fs_copy_prop_dataflow::fs_copy_prop_dataflow(void *mem_ctx, cfg_t *cfg,
+                                             exec_list *out_acp[ACP_HASH_SIZE])
+   : mem_ctx(mem_ctx), cfg(cfg)
+{
+   bd = rzalloc_array(mem_ctx, struct block_data, cfg->num_blocks);
+
+   num_acp = 0;
+   for (int b = 0; b < cfg->num_blocks; b++) {
+      for (int i = 0; i < ACP_HASH_SIZE; i++) {
+         foreach_list(entry_node, &out_acp[b][i]) {
+            num_acp++;
+         }
+      }
+   }
+
+   acp = rzalloc_array(mem_ctx, struct acp_entry *, num_acp);
+
+   bitset_words = BITSET_WORDS(num_acp);
+
+   int next_acp = 0;
+   for (int b = 0; b < cfg->num_blocks; b++) {
+      bd[b].livein = rzalloc_array(bd, BITSET_WORD, bitset_words);
+      bd[b].liveout = rzalloc_array(bd, BITSET_WORD, bitset_words);
+      bd[b].copy = rzalloc_array(bd, BITSET_WORD, bitset_words);
+      bd[b].kill = rzalloc_array(bd, BITSET_WORD, bitset_words);
+
+      for (int i = 0; i < ACP_HASH_SIZE; i++) {
+         foreach_list(entry_node, &out_acp[b][i]) {
+            acp_entry *entry = (acp_entry *)entry_node;
+
+            acp[next_acp] = entry;
+
+            /* opt_copy_propagate_local populates out_acp with copies created
+             * in a block which are still live at the end of the block.  This
+             * is exactly what we want in the COPY set.
+             */
+            BITSET_SET(bd[b].copy, next_acp);
+
+            next_acp++;
+         }
+      }
+   }
+
+   assert(next_acp == num_acp);
+
+   setup_initial_values();
+   run();
+}
+
+/**
+ * Set up initial values for each of the data flow sets, prior to running
+ * the fixed-point algorithm.
+ */
+void
+fs_copy_prop_dataflow::setup_initial_values()
+{
+   /* Initialize the COPY and KILL sets. */
+   for (int b = 0; b < cfg->num_blocks; b++) {
+      bblock_t *block = cfg->blocks[b];
+
+      for (fs_inst *inst = (fs_inst *)block->start;
+           inst != block->end->next;
+           inst = (fs_inst *)inst->next) {
+         if (inst->dst.file != GRF)
+            continue;
+
+         /* Mark ACP entries which are killed by this instruction. */
+         for (int i = 0; i < num_acp; i++) {
+            if (inst->overwrites_reg(acp[i]->dst) ||
+                inst->overwrites_reg(acp[i]->src)) {
+               BITSET_SET(bd[b].kill, i);
+            }
+         }
+      }
+   }
+
+   /* Populate the initial values for the livein and liveout sets.  For the
+    * block at the start of the program, livein = 0 and liveout = copy.
+    * For the others, set liveout to 0 (the empty set) and livein to ~0
+    * (the universal set).
+    */
+   for (int b = 0; b < cfg->num_blocks; b++) {
+      bblock_t *block = cfg->blocks[b];
+      if (block->parents.is_empty()) {
+         for (int i = 0; i < bitset_words; i++) {
+            bd[b].livein[i] = 0u;
+            bd[b].liveout[i] = bd[b].copy[i];
+         }
+      } else {
+         for (int i = 0; i < bitset_words; i++) {
+            bd[b].liveout[i] = 0u;
+            bd[b].livein[i] = ~0u;
+         }
+      }
+   }
+}
+
+/**
+ * Run the fixed-point dataflow algorithm: repeatedly recompute
+ *
+ *    liveout = copy | (livein & ~kill)
+ *    livein  = AND over all parent blocks' liveout
+ *
+ * for every block until neither set changes.
+ */
+void
+fs_copy_prop_dataflow::run()
+{
+   bool progress;
+
+   do {
+      progress = false;
+
+      /* Update liveout for all blocks. */
+      for (int b = 0; b < cfg->num_blocks; b++) {
+         if (cfg->blocks[b]->parents.is_empty())
+            continue;
+
+         for (int i = 0; i < bitset_words; i++) {
+            const BITSET_WORD old_liveout = bd[b].liveout[i];
+
+            bd[b].liveout[i] =
+               bd[b].copy[i] | (bd[b].livein[i] & ~bd[b].kill[i]);
+
+            if (old_liveout != bd[b].liveout[i])
+               progress = true;
+         }
+      }
+
+      /* Update livein for all blocks.  If a copy is live out of all parent
+       * blocks, it's live coming in to this block.
+       */
+      for (int b = 0; b < cfg->num_blocks; b++) {
+         if (cfg->blocks[b]->parents.is_empty())
+            continue;
+
+         for (int i = 0; i < bitset_words; i++) {
+            const BITSET_WORD old_livein = bd[b].livein[i];
+
+            bd[b].livein[i] = ~0u;
+            foreach_list(block_node, &cfg->blocks[b]->parents) {
+               bblock_link *link = (bblock_link *)block_node;
+               bblock_t *block = link->block;
+               bd[b].livein[i] &= bd[block->block_num].liveout[i];
+            }
+
+            if (old_livein != bd[b].livein[i])
+               progress = true;
+         }
+      }
+   } while (progress);
+}
+
+void
+fs_copy_prop_dataflow::dump_block_data() const
+{
+   for (int b = 0; b < cfg->num_blocks; b++) {
+      bblock_t *block = cfg->blocks[b];
+      fprintf(stderr, "Block %d [%d, %d] (parents ", block->block_num,
+             block->start_ip, block->end_ip);
+      foreach_list(block_node, &block->parents) {
+         bblock_t *parent = ((bblock_link *) block_node)->block;
+         fprintf(stderr, "%d ", parent->block_num);
+      }
+      fprintf(stderr, "):\n");
+      fprintf(stderr, "       livein = 0x");
+      for (int i = 0; i < bitset_words; i++)
+         fprintf(stderr, "%08x", bd[b].livein[i]);
+      fprintf(stderr, ", liveout = 0x");
+      for (int i = 0; i < bitset_words; i++)
+         fprintf(stderr, "%08x", bd[b].liveout[i]);
+      fprintf(stderr, ",\n       copy   = 0x");
+      for (int i = 0; i < bitset_words; i++)
+         fprintf(stderr, "%08x", bd[b].copy[i]);
+      fprintf(stderr, ", kill    = 0x");
+      for (int i = 0; i < bitset_words; i++)
+         fprintf(stderr, "%08x", bd[b].kill[i]);
+      fprintf(stderr, "\n");
+   }
+}
+
+static bool
+is_logic_op(enum opcode opcode)
+{
+   return (opcode == BRW_OPCODE_AND ||
+           opcode == BRW_OPCODE_OR  ||
+           opcode == BRW_OPCODE_XOR ||
+           opcode == BRW_OPCODE_NOT);
+}
+
+bool
+fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
+{
+   if (entry->src.file == IMM)
+      return false;
+
+   /* Bail if inst is reading more than entry is writing. */
+   if ((inst->regs_read(this, arg) * inst->src[arg].stride *
+        type_sz(inst->src[arg].type)) > type_sz(entry->dst.type))
+      return false;
+
+   if (inst->src[arg].file != entry->dst.file ||
+       inst->src[arg].reg != entry->dst.reg ||
+       inst->src[arg].reg_offset != entry->dst.reg_offset ||
+       inst->src[arg].subreg_offset != entry->dst.subreg_offset) {
+      return false;
+   }
+
+   /* See resolve_ud_negate() and the comment in brw_fs_generator.cpp. */
+   if (inst->conditional_mod &&
+       inst->src[arg].type == BRW_REGISTER_TYPE_UD &&
+       entry->src.negate)
+      return false;
+
+   bool has_source_modifiers = entry->src.abs || entry->src.negate;
+
+   if ((has_source_modifiers || entry->src.file == UNIFORM ||
+        !entry->src.is_contiguous()) &&
+       !can_do_source_mods(inst))
+      return false;
+
+   /* Bail if the result of composing both strides would exceed the
+    * hardware limit.
+    */
+   if (entry->src.stride * inst->src[arg].stride > 4)
+      return false;
+
+   /* Bail if the result of composing both strides cannot be expressed
+    * as another stride. This avoids, for example, trying to transform
+    * this:
+    *
+    *     MOV (8) rX<1>UD rY<0;1,0>UD
+    *     FOO (8) ...     rX<8;8,1>UW
+    *
+    * into this:
+    *
+    *     FOO (8) ...     rY<0;1,0>UW
+    *
+    * Which would have different semantics.
+    */
+   if (entry->src.stride != 1 &&
+       (inst->src[arg].stride *
+        type_sz(inst->src[arg].type)) % type_sz(entry->src.type) != 0)
+      return false;
+
+   if (has_source_modifiers && entry->dst.type != inst->src[arg].type)
+      return false;
+
+   if (brw->gen >= 8 && (entry->src.negate || entry->src.abs) &&
+       is_logic_op(inst->opcode)) {
+      return false;
+   }
+
+   inst->src[arg].file = entry->src.file;
+   inst->src[arg].reg = entry->src.reg;
+   inst->src[arg].reg_offset = entry->src.reg_offset;
+   inst->src[arg].subreg_offset = entry->src.subreg_offset;
+   inst->src[arg].stride *= entry->src.stride;
+
+   if (!inst->src[arg].abs) {
+      inst->src[arg].abs = entry->src.abs;
+      inst->src[arg].negate ^= entry->src.negate;
+   }
+
+   return true;
+}
+
+
+static bool
+try_constant_propagate(struct brw_context *brw, fs_inst *inst,
+                       acp_entry *entry)
+{
+   bool progress = false;
+
+   if (entry->src.file != IMM)
+      return false;
+
+   for (int i = 2; i >= 0; i--) {
+      if (inst->src[i].file != entry->dst.file ||
+          inst->src[i].reg != entry->dst.reg ||
+          inst->src[i].reg_offset != entry->dst.reg_offset ||
+          inst->src[i].subreg_offset != entry->dst.subreg_offset ||
+          inst->src[i].type != entry->dst.type ||
+          inst->src[i].stride > 1)
+         continue;
+
+      /* Don't bother with cases that should have been taken care of by the
+       * GLSL compiler's constant folding pass.
+       */
+      if (inst->src[i].negate || inst->src[i].abs)
+         continue;
+
+      switch (inst->opcode) {
+      case BRW_OPCODE_MOV:
+         inst->src[i] = entry->src;
+         progress = true;
+         break;
+
+      case SHADER_OPCODE_POW:
+      case SHADER_OPCODE_INT_QUOTIENT:
+      case SHADER_OPCODE_INT_REMAINDER:
+         if (brw->gen < 8)
+            break;
+         /* fallthrough */
+      case BRW_OPCODE_BFI1:
+      case BRW_OPCODE_ASR:
+      case BRW_OPCODE_SHL:
+      case BRW_OPCODE_SHR:
+      case BRW_OPCODE_SUBB:
+         if (i == 1) {
+            inst->src[i] = entry->src;
+            progress = true;
+         }
+         break;
+
+      case BRW_OPCODE_MACH:
+      case BRW_OPCODE_MUL:
+      case BRW_OPCODE_ADD:
+      case BRW_OPCODE_OR:
+      case BRW_OPCODE_AND:
+      case BRW_OPCODE_XOR:
+      case BRW_OPCODE_ADDC:
+         if (i == 1) {
+            inst->src[i] = entry->src;
+            progress = true;
+         } else if (i == 0 && inst->src[1].file != IMM) {
+            /* Fit this constant in by commuting the operands.
+             * Exception: we can't do this for 32-bit integer MUL/MACH
+             * because it's asymmetric.
+             */
+            if ((inst->opcode == BRW_OPCODE_MUL ||
+                 inst->opcode == BRW_OPCODE_MACH) &&
+                (inst->src[1].type == BRW_REGISTER_TYPE_D ||
+                 inst->src[1].type == BRW_REGISTER_TYPE_UD))
+               break;
+            inst->src[0] = inst->src[1];
+            inst->src[1] = entry->src;
+            progress = true;
+         }
+         break;
+
+      case BRW_OPCODE_CMP:
+      case BRW_OPCODE_IF:
+         if (i == 1) {
+            inst->src[i] = entry->src;
+            progress = true;
+         } else if (i == 0 && inst->src[1].file != IMM) {
+            uint32_t new_cmod;
+
+            new_cmod = brw_swap_cmod(inst->conditional_mod);
+            if (new_cmod != ~0u) {
+               /* Fit this constant in by swapping the operands and
+                * flipping the test
+                */
+               inst->src[0] = inst->src[1];
+               inst->src[1] = entry->src;
+               inst->conditional_mod = new_cmod;
+               progress = true;
+            }
+         }
+         break;
+
+      case BRW_OPCODE_SEL:
+         if (i == 1) {
+            inst->src[i] = entry->src;
+            progress = true;
+         } else if (i == 0 && inst->src[1].file != IMM) {
+            inst->src[0] = inst->src[1];
+            inst->src[1] = entry->src;
+
+            /* If this was predicated, flipping operands means
+             * we also need to flip the predicate.
+             */
+            if (inst->conditional_mod == BRW_CONDITIONAL_NONE) {
+               inst->predicate_inverse =
+                  !inst->predicate_inverse;
+            }
+            progress = true;
+         }
+         break;
+
+      case SHADER_OPCODE_RCP:
+         /* The hardware doesn't do math on immediate values
+          * (because why are you doing that, seriously?), but
+          * the correct answer is to just constant fold it
+          * anyway.
+          */
+         assert(i == 0);
+         if (inst->src[0].imm.f != 0.0f) {
+            inst->opcode = BRW_OPCODE_MOV;
+            inst->src[0] = entry->src;
+            inst->src[0].imm.f = 1.0f / inst->src[0].imm.f;
+            progress = true;
+         }
+         break;
+
+      case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
+         inst->src[i] = entry->src;
+         progress = true;
+         break;
+
+      default:
+         break;
+      }
+   }
+
+   return progress;
+}
+
+static bool
+can_propagate_from(fs_inst *inst)
+{
+   return (inst->opcode == BRW_OPCODE_MOV &&
+           inst->dst.file == GRF &&
+           ((inst->src[0].file == GRF &&
+             (inst->src[0].reg != inst->dst.reg ||
+              inst->src[0].reg_offset != inst->dst.reg_offset)) ||
+            inst->src[0].file == UNIFORM ||
+            inst->src[0].file == IMM) &&
+           inst->src[0].type == inst->dst.type &&
+           !inst->saturate &&
+           !inst->is_partial_write());
+}
+
+/* Walks a basic block and does copy propagation on it using the acp
+ * list.
+ */
+bool
+fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
+                                     exec_list *acp)
+{
+   bool progress = false;
+
+   for (fs_inst *inst = (fs_inst *)block->start;
+	inst != block->end->next;
+	inst = (fs_inst *)inst->next) {
+
+      /* Try propagating into this instruction. */
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file != GRF)
+            continue;
+
+         foreach_list(entry_node, &acp[inst->src[i].reg % ACP_HASH_SIZE]) {
+            acp_entry *entry = (acp_entry *)entry_node;
+
+            if (try_constant_propagate(brw, inst, entry))
+               progress = true;
+
+            if (try_copy_propagate(inst, i, entry))
+               progress = true;
+         }
+      }
+
+      /* kill the destination from the ACP */
+      if (inst->dst.file == GRF) {
+	 foreach_list_safe(entry_node, &acp[inst->dst.reg % ACP_HASH_SIZE]) {
+	    acp_entry *entry = (acp_entry *)entry_node;
+
+	    if (inst->overwrites_reg(entry->dst)) {
+	       entry->remove();
+	    }
+	 }
+
+         /* Oops, we only have the chaining hash based on the destination, not
+          * the source, so walk across the entire table.
+          */
+         for (int i = 0; i < ACP_HASH_SIZE; i++) {
+            foreach_list_safe(entry_node, &acp[i]) {
+               acp_entry *entry = (acp_entry *)entry_node;
+               if (inst->overwrites_reg(entry->src))
+                  entry->remove();
+            }
+	 }
+      }
+
+      /* If this instruction's source could potentially be folded into the
+       * operand of another instruction, add it to the ACP.
+       */
+      if (can_propagate_from(inst)) {
+	 acp_entry *entry = ralloc(copy_prop_ctx, acp_entry);
+	 entry->dst = inst->dst;
+	 entry->src = inst->src[0];
+	 acp[entry->dst.reg % ACP_HASH_SIZE].push_tail(entry);
+      }
+   }
+
+   return progress;
+}
+
+bool
+fs_visitor::opt_copy_propagate()
+{
+   bool progress = false;
+   void *copy_prop_ctx = ralloc_context(NULL);
+   cfg_t cfg(&instructions);
+   exec_list *out_acp[cfg.num_blocks];
+   for (int i = 0; i < cfg.num_blocks; i++)
+      out_acp[i] = new exec_list [ACP_HASH_SIZE];
+
+   /* First, walk through each block doing local copy propagation and getting
+    * the set of copies available at the end of the block.
+    */
+   for (int b = 0; b < cfg.num_blocks; b++) {
+      bblock_t *block = cfg.blocks[b];
+
+      progress = opt_copy_propagate_local(copy_prop_ctx, block,
+                                          out_acp[b]) || progress;
+   }
+
+   /* Do dataflow analysis for those available copies. */
+   fs_copy_prop_dataflow dataflow(copy_prop_ctx, &cfg, out_acp);
+
+   /* Next, re-run local copy propagation, this time with the set of copies
+    * provided by the dataflow analysis available at the start of a block.
+    */
+   for (int b = 0; b < cfg.num_blocks; b++) {
+      bblock_t *block = cfg.blocks[b];
+      exec_list in_acp[ACP_HASH_SIZE];
+
+      for (int i = 0; i < dataflow.num_acp; i++) {
+         if (BITSET_TEST(dataflow.bd[b].livein, i)) {
+            struct acp_entry *entry = dataflow.acp[i];
+            in_acp[entry->dst.reg % ACP_HASH_SIZE].push_tail(entry);
+         }
+      }
+
+      progress = opt_copy_propagate_local(copy_prop_ctx, block, in_acp) || progress;
+   }
+
+   for (int i = 0; i < cfg.num_blocks; i++)
+      delete [] out_acp[i];
+   ralloc_free(copy_prop_ctx);
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs_cse.cpp b/icd/intel/compiler/pipeline/brw_fs_cse.cpp
new file mode 100644
index 0000000..ea610bd
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_cse.cpp
@@ -0,0 +1,282 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_fs.h"
+#include "brw_cfg.h"
+
+/** @file brw_fs_cse.cpp
+ *
+ * Support for local common subexpression elimination.
+ *
+ * See Muchnick's Advanced Compiler Design and Implementation, section
+ * 13.1 (p378).
+ */
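+
+/* A hypothetical sketch of the transform: if a block contains
+ *
+ *    ADD x, a, b
+ *    ...
+ *    ADD y, a, b
+ *
+ * the first ADD is retargeted at a fresh temporary and both uses become
+ * copies from it:
+ *
+ *    ADD tmp, a, b
+ *    MOV x, tmp
+ *    ...
+ *    MOV y, tmp
+ */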
+
+namespace {
+struct aeb_entry : public exec_node {
+   /** The instruction that generates the expression value. */
+   fs_inst *generator;
+
+   /** The temporary where the value is stored. */
+   fs_reg tmp;
+};
+}
+
+static bool
+is_expression(const fs_inst *const inst)
+{
+   switch (inst->opcode) {
+   case BRW_OPCODE_SEL:
+   case BRW_OPCODE_NOT:
+   case BRW_OPCODE_AND:
+   case BRW_OPCODE_OR:
+   case BRW_OPCODE_XOR:
+   case BRW_OPCODE_SHR:
+   case BRW_OPCODE_SHL:
+   case BRW_OPCODE_ASR:
+   case BRW_OPCODE_CMP:
+   case BRW_OPCODE_CMPN:
+   case BRW_OPCODE_ADD:
+   case BRW_OPCODE_MUL:
+   case BRW_OPCODE_FRC:
+   case BRW_OPCODE_RNDU:
+   case BRW_OPCODE_RNDD:
+   case BRW_OPCODE_RNDE:
+   case BRW_OPCODE_RNDZ:
+   case BRW_OPCODE_LINE:
+   case BRW_OPCODE_PLN:
+   case BRW_OPCODE_MAD:
+   case BRW_OPCODE_LRP:
+   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
+   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7:
+   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD:
+   case FS_OPCODE_CINTERP:
+   case FS_OPCODE_LINTERP:
+      return true;
+   default:
+      return false;
+   }
+}
+
+static bool
+is_expression_commutative(enum opcode op)
+{
+   switch (op) {
+   case BRW_OPCODE_AND:
+   case BRW_OPCODE_OR:
+   case BRW_OPCODE_XOR:
+   case BRW_OPCODE_ADD:
+   case BRW_OPCODE_MUL:
+      return true;
+   default:
+      return false;
+   }
+}
+
+static bool
+operands_match(enum opcode op, fs_reg *xs, fs_reg *ys)
+{
+   if (!is_expression_commutative(op)) {
+      return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && xs[2].equals(ys[2]);
+   } else {
+      return (xs[0].equals(ys[0]) && xs[1].equals(ys[1])) ||
+             (xs[1].equals(ys[0]) && xs[0].equals(ys[1]));
+   }
+}
+
+static bool
+instructions_match(fs_inst *a, fs_inst *b)
+{
+   return a->opcode == b->opcode &&
+          a->saturate == b->saturate &&
+          a->predicate == b->predicate &&
+          a->predicate_inverse == b->predicate_inverse &&
+          a->conditional_mod == b->conditional_mod &&
+          a->dst.type == b->dst.type &&
+          operands_match(a->opcode, a->src, b->src);
+}
+
+bool
+fs_visitor::opt_cse_local(bblock_t *block, exec_list *aeb)
+{
+   bool progress = false;
+
+   void *cse_ctx = ralloc_context(NULL);
+
+   int ip = block->start_ip;
+   for (fs_inst *inst = (fs_inst *)block->start;
+	inst != block->end->next;
+	inst = (fs_inst *) inst->next) {
+
+      /* Consider only expression opcodes that fully write a GRF (or the
+       * null register).
+       */
+      if (is_expression(inst) && !inst->is_partial_write() &&
+          (inst->dst.file != HW_REG || inst->dst.is_null()))
+      {
+	 bool found = false;
+
+	 aeb_entry *entry;
+	 foreach_list(entry_node, aeb) {
+	    entry = (aeb_entry *) entry_node;
+
+	    /* Match current instruction's expression against those in AEB. */
+	    if (instructions_match(inst, entry->generator)) {
+	       found = true;
+	       progress = true;
+	       break;
+	    }
+	 }
+
+	 if (!found) {
+	    /* Our first sighting of this expression.  Create an entry. */
+	    aeb_entry *entry = ralloc(cse_ctx, aeb_entry);
+	    entry->tmp = reg_undef;
+	    entry->generator = inst;
+	    aeb->push_tail(entry);
+	 } else {
+	    /* This is at least our second sighting of this expression.
+	     * If we don't have a temporary already, make one.
+	     */
+	    bool no_existing_temp = entry->tmp.file == BAD_FILE;
+	    if (no_existing_temp && !entry->generator->dst.is_null()) {
+               int written = entry->generator->regs_written;
+
+               fs_reg orig_dst = entry->generator->dst;
+               fs_reg tmp = fs_reg(GRF, virtual_grf_alloc(written),
+                                   orig_dst.type);
+               entry->tmp = tmp;
+               entry->generator->dst = tmp;
+
+               for (int i = 0; i < written; i++) {
+                  fs_inst *copy = MOV(orig_dst, tmp);
+                  copy->force_writemask_all =
+                     entry->generator->force_writemask_all;
+                  entry->generator->insert_after(copy);
+
+                  orig_dst.reg_offset++;
+                  tmp.reg_offset++;
+               }
+	    }
+
+	    /* dest <- temp */
+            if (!inst->dst.is_null()) {
+               int written = inst->regs_written;
+               assert(written == entry->generator->regs_written);
+               assert(inst->dst.type == entry->tmp.type);
+               fs_reg dst = inst->dst;
+               fs_reg tmp = entry->tmp;
+               fs_inst *copy = NULL;
+               for (int i = 0; i < written; i++) {
+                  copy = MOV(dst, tmp);
+                  copy->force_writemask_all = inst->force_writemask_all;
+                  inst->insert_before(copy);
+
+                  dst.reg_offset++;
+                  tmp.reg_offset++;
+               }
+            }
+
+            /* Set our iterator so that next time through the loop inst->next
+             * will get the instruction in the basic block after the one we've
+             * removed.
+             */
+            fs_inst *prev = (fs_inst *)inst->prev;
+
+            inst->remove();
+
+	    /* If we removed the block's last instruction, update the block end. */
+	    if (inst == block->end) {
+	       block->end = prev;
+	    }
+
+            inst = prev;
+	 }
+      }
+
+      foreach_list_safe(entry_node, aeb) {
+	 aeb_entry *entry = (aeb_entry *)entry_node;
+
+         /* If we just wrote the flag register, kill all AEB entries that
+          * read it, or that write a different value to it.
+          */
+         if (inst->writes_flag()) {
+            if (entry->generator->reads_flag() ||
+                (entry->generator->writes_flag() &&
+                 !instructions_match(inst, entry->generator))) {
+               entry->remove();
+               ralloc_free(entry);
+               continue;
+            }
+         }
+
+	 for (int i = 0; i < 3; i++) {
+            fs_reg *src_reg = &entry->generator->src[i];
+
+            /* Kill all AEB entries that use the destination we just
+             * overwrote.
+             */
+            if (inst->overwrites_reg(entry->generator->src[i])) {
+	       entry->remove();
+	       ralloc_free(entry);
+	       break;
+	    }
+
+            /* Kill any AEB entries using registers that don't get reused any
+             * more -- a sure sign they'll fail operands_match().
+             */
+            if (src_reg->file == GRF && virtual_grf_end[src_reg->reg] < ip) {
+               entry->remove();
+               ralloc_free(entry);
+	       break;
+            }
+	 }
+      }
+
+      ip++;
+   }
+
+   ralloc_free(cse_ctx);
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
+
+bool
+fs_visitor::opt_cse()
+{
+   bool progress = false;
+
+   calculate_live_intervals();
+
+   cfg_t cfg(&instructions);
+
+   for (int b = 0; b < cfg.num_blocks; b++) {
+      bblock_t *block = cfg.blocks[b];
+      exec_list aeb;
+
+      progress = opt_cse_local(block, &aeb) || progress;
+   }
+
+   return progress;
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs_dead_code_eliminate.cpp b/icd/intel/compiler/pipeline/brw_fs_dead_code_eliminate.cpp
new file mode 100644
index 0000000..dfeceb0
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_dead_code_eliminate.cpp
@@ -0,0 +1,120 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_fs.h"
+#include "brw_fs_live_variables.h"
+#include "brw_cfg.h"
+
+/** @file brw_fs_dead_code_eliminate.cpp
+ *
+ * Dataflow-aware dead code elimination.
+ *
+ * Walks the instruction list from the bottom, removing instructions whose
+ * results are neither live out of the block nor read again before the
+ * end of the block.
+ */
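+
+/* A hypothetical example: in
+ *
+ *    MOV x, a
+ *    MOV y, b
+ *    ADD z, x, c
+ *
+ * the MOV into y is NOPed and then removed if y is neither in the block's
+ * liveout set nor read before the end of the block.
+ */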
+
+bool
+fs_visitor::dead_code_eliminate()
+{
+   bool progress = false;
+
+   cfg_t cfg(&instructions);
+
+   calculate_live_intervals();
+
+   int num_vars = live_intervals->num_vars;
+   BITSET_WORD *live = ralloc_array(NULL, BITSET_WORD, BITSET_WORDS(num_vars));
+
+   for (int b = 0; b < cfg.num_blocks; b++) {
+      bblock_t *block = cfg.blocks[b];
+      memcpy(live, live_intervals->bd[b].liveout,
+             sizeof(BITSET_WORD) * BITSET_WORDS(num_vars));
+
+      for (fs_inst *inst = (fs_inst *)block->end;
+           inst != block->start->prev;
+           inst = (fs_inst *)inst->prev) {
+         if (inst->dst.file == GRF &&
+             !inst->has_side_effects() &&
+             !inst->writes_flag()) {
+            bool result_live = false;
+
+            if (inst->regs_written == 1) {
+               int var = live_intervals->var_from_reg(&inst->dst);
+               result_live = BITSET_TEST(live, var);
+            } else {
+               int var = live_intervals->var_from_vgrf[inst->dst.reg];
+               for (int i = 0; i < inst->regs_written; i++) {
+                  result_live = result_live || BITSET_TEST(live, var + i);
+               }
+            }
+
+            if (!result_live) {
+               progress = true;
+
+               if (inst->writes_accumulator) {
+                  inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type));
+               } else {
+                  inst->opcode = BRW_OPCODE_NOP;
+                  continue;
+               }
+            }
+         }
+
+         if (inst->dst.file == GRF) {
+            if (!inst->is_partial_write()) {
+               int var = live_intervals->var_from_vgrf[inst->dst.reg];
+               for (int i = 0; i < inst->regs_written; i++) {
+                  BITSET_CLEAR(live, var + inst->dst.reg_offset + i);
+               }
+            }
+         }
+
+         for (int i = 0; i < 3; i++) {
+            if (inst->src[i].file == GRF) {
+               int var = live_intervals->var_from_vgrf[inst->src[i].reg];
+
+               for (int j = 0; j < inst->regs_read(this, i); j++) {
+                  BITSET_SET(live, var + inst->src[i].reg_offset + j);
+               }
+            }
+         }
+      }
+   }
+
+   ralloc_free(live);
+
+   if (progress) {
+      foreach_list_safe(node, &this->instructions) {
+         fs_inst *inst = (fs_inst *)node;
+
+         if (inst->opcode == BRW_OPCODE_NOP) {
+            inst->remove();
+         }
+      }
+
+      invalidate_live_intervals();
+   }
+
+   return progress;
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs_generator.cpp b/icd/intel/compiler/pipeline/brw_fs_generator.cpp
new file mode 100644
index 0000000..9aefda9
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_generator.cpp
@@ -0,0 +1,1936 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/** @file brw_fs_generator.cpp
+ *
+ * This file supports generating code from the FS LIR to the actual
+ * native instructions.
+ */
+
+extern "C" {
+#include "main/macros.h"
+#include "brw_context.h"
+#include "brw_eu.h"
+} /* extern "C" */
+
+#include "brw_fs.h"
+#include "brw_cfg.h"
+
+#include "icd-utils.h" // LunarG: ADD
+
+fs_generator::fs_generator(struct brw_context *brw,
+                           struct brw_wm_compile *c,
+                           struct gl_shader_program *prog,
+                           struct gl_fragment_program *fp,
+                           bool dual_source_output)
+
+   : brw(brw), c(c), prog(prog), fp(fp), dual_source_output(dual_source_output)
+{
+   ctx = &brw->ctx;
+
+   mem_ctx = c;
+
+   p = rzalloc(mem_ctx, struct brw_compile);
+   brw_init_compile(brw, p, mem_ctx);
+}
+
+fs_generator::~fs_generator()
+{
+}
+
+void
+fs_generator::patch_discard_jumps_to_fb_writes()
+{
+   if (brw->gen < 6 || this->discard_halt_patches.is_empty())
+      return;
+
+   /* There is a somewhat strange undocumented requirement of using
+    * HALT, according to the simulator.  If some channel has HALTed to
+    * a particular UIP, then by the end of the program, every channel
+    * must have HALTed to that UIP.  Furthermore, the tracking is a
+    * stack, so you can't complete the final halt to one UIP after
+    * having started halting to a new UIP.
+    *
+    * Symptoms of not emitting this instruction on actual hardware
+    * included GPU hangs and sparkly rendering on the piglit discard
+    * tests.
+    */
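+   /* For instance (illustrative numbers only): a HALT recorded at ip 10,
+    * with the final HALT emitted at ip 50, gets its UIP patched to
+    * (50 - 10) * 2 below, since jump targets are encoded in
+    * half-instruction units.
+    */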
+   struct brw_instruction *last_halt = gen6_HALT(p);
+   last_halt->bits3.break_cont.uip = 2;
+   last_halt->bits3.break_cont.jip = 2;
+
+   int ip = p->nr_insn;
+
+   foreach_list(node, &this->discard_halt_patches) {
+      ip_record *patch_ip = (ip_record *)node;
+      struct brw_instruction *patch = &p->store[patch_ip->ip];
+
+      assert(patch->header.opcode == BRW_OPCODE_HALT);
+      /* HALT takes a half-instruction distance from the pre-incremented IP. */
+      patch->bits3.break_cont.uip = (ip - patch_ip->ip) * 2;
+   }
+
+   this->discard_halt_patches.make_empty();
+}
+
+void
+fs_generator::generate_scattered_write(fs_inst *inst,
+                                       const struct brw_reg &dst,
+                                       const struct brw_reg &src)
+{
+    brw_scattered_write(p,
+            dst,
+            src,
+            inst->target,
+            inst->mlen,
+            inst->header_present,
+            (inst->opcode == SHADER_OPCODE_DWORD_SCATTERED_WRITE));
+
+   brw_mark_surface_used(&c->prog_data.base, inst->target);
+}
+
+void
+fs_generator::generate_scattered_read(fs_inst *inst,
+                                      const struct brw_reg &dst,
+                                      const struct brw_reg &src)
+{
+    brw_scattered_read(p,
+            dst,
+            src,
+            inst->target,
+            inst->mlen,
+            inst->header_present,
+            (inst->opcode == SHADER_OPCODE_DWORD_SCATTERED_READ));
+
+   brw_mark_surface_used(&c->prog_data.base, inst->target);
+}
+
+void
+fs_generator::generate_fb_write(fs_inst *inst)
+{
+   bool eot = inst->eot;
+   struct brw_reg implied_header;
+   uint32_t msg_control;
+
+   /* The header is 2 registers; g0 and g1 supply its contents.  g0 is
+    * copied by the implied header move, so here we set up g1.
+    */
+   brw_push_insn_state(p);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   brw_set_predicate_control(p, BRW_PREDICATE_NONE);
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+
+   if (inst->header_present) {
+      /* On HSW, the GPU will use the predicate on SENDC, unless the header is
+       * present.
+       */
+      if ((fp && fp->UsesKill) || c->key.alpha_test_func) {
+         struct brw_reg pixel_mask;
+
+         if (brw->gen >= 6)
+            pixel_mask = retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UW);
+         else
+            pixel_mask = retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UW);
+
+         brw_MOV(p, pixel_mask, brw_flag_reg(0, 1));
+      }
+
+      if (brw->gen >= 6) {
+	 brw_set_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+	 brw_MOV(p,
+		 retype(brw_message_reg(inst->base_mrf), BRW_REGISTER_TYPE_UD),
+		 retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
+	 brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+
+         if (inst->target > 0 && c->key.replicate_alpha) {
+            /* Set "Source0 Alpha Present to RenderTarget" bit in message
+             * header.
+             */
+            brw_OR(p,
+		   vec1(retype(brw_message_reg(inst->base_mrf), BRW_REGISTER_TYPE_UD)),
+		   vec1(retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD)),
+		   brw_imm_ud(0x1 << 11));
+         }
+
+	 if (inst->target > 0) {
+	    /* Set the render target index for choosing BLEND_STATE. */
+	    brw_MOV(p, retype(brw_vec1_reg(BRW_MESSAGE_REGISTER_FILE,
+					   inst->base_mrf, 2),
+			      BRW_REGISTER_TYPE_UD),
+		    brw_imm_ud(inst->target));
+	 }
+
+	 implied_header = brw_null_reg();
+      } else {
+	 implied_header = retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UW);
+
+	 brw_MOV(p,
+		 brw_message_reg(inst->base_mrf + 1),
+		 brw_vec8_grf(1, 0));
+      }
+   } else {
+      implied_header = brw_null_reg();
+   }
+
+   if (this->dual_source_output)
+      msg_control = BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD8_DUAL_SOURCE_SUBSPAN01;
+   else if (dispatch_width == 16)
+      msg_control = BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD16_SINGLE_SOURCE;
+   else
+      msg_control = BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD8_SINGLE_SOURCE_SUBSPAN01;
+
+   brw_pop_insn_state(p);
+
+   uint32_t surf_index =
+      c->prog_data.binding_table.render_target_start + inst->target;
+   brw_fb_WRITE(p,
+		dispatch_width,
+		inst->base_mrf,
+		implied_header,
+		msg_control,
+		surf_index,
+		inst->mlen,
+		0,
+		eot,
+		inst->header_present);
+
+   brw_mark_surface_used(&c->prog_data.base, surf_index);
+}
+
+void
+fs_generator::generate_blorp_fb_write(fs_inst *inst)
+{
+   brw_fb_WRITE(p,
+                16 /* dispatch_width */,
+                inst->base_mrf,
+                brw_reg_from_fs_reg(&inst->src[0]),
+                BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD16_SINGLE_SOURCE,
+                inst->target,
+                inst->mlen,
+                0,
+                true,
+                inst->header_present);
+}
+
+/* Computes the integer pixel x,y values from the origin.
+ *
+ * This is the basis of gl_FragCoord computation, but is also used
+ * pre-gen6 to compute the interpolation deltas from v0.
+ */
+void
+fs_generator::generate_pixel_xy(struct brw_reg dst, bool is_x)
+{
+   struct brw_reg g1_uw = retype(brw_vec1_grf(1, 0), BRW_REGISTER_TYPE_UW);
+   struct brw_reg src;
+   struct brw_reg deltas;
+
+   if (is_x) {
+      src = stride(suboffset(g1_uw, 4), 2, 4, 0);
+      deltas = brw_imm_v(0x10101010);
+   } else {
+      src = stride(suboffset(g1_uw, 5), 2, 4, 0);
+      deltas = brw_imm_v(0x11001100);
+   }
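+
+   /* brw_imm_v packs eight 4-bit elements: reading from the low nibble up,
+    * 0x10101010 gives X deltas 0,1,0,1,... and 0x11001100 gives Y deltas
+    * 0,0,1,1,... -- the pixel offsets within each 2x2 subspan.
+    */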
+
+   if (dispatch_width == 16) {
+      dst = vec16(dst);
+   }
+
+   /* We do this SIMD8 or SIMD16, but since the destination is UW we
+    * don't do compression in the SIMD16 case.
+    */
+   brw_push_insn_state(p);
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+   brw_ADD(p, dst, src, deltas);
+   brw_pop_insn_state(p);
+}
+
+void
+fs_generator::generate_linterp(fs_inst *inst,
+			     struct brw_reg dst, struct brw_reg *src)
+{
+   struct brw_reg delta_x = src[0];
+   struct brw_reg delta_y = src[1];
+   struct brw_reg interp = src[2];
+
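+   /* PLN reads the two delta registers as one aligned pair (delta_y lives
+    * in the register after delta_x), hence the register-number checks;
+    * otherwise we fall back to LINE + MAC.
+    */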
+   if (brw->has_pln &&
+       delta_y.nr == delta_x.nr + 1 &&
+       (brw->gen >= 6 || (delta_x.nr & 1) == 0)) {
+      brw_PLN(p, dst, interp, delta_x);
+   } else {
+      brw_LINE(p, brw_null_reg(), interp, delta_x);
+      brw_MAC(p, dst, suboffset(interp, 1), delta_y);
+   }
+}
+
+void
+fs_generator::generate_math1_gen7(fs_inst *inst,
+			        struct brw_reg dst,
+			        struct brw_reg src0)
+{
+   assert(inst->mlen == 0);
+   brw_math(p, dst,
+	    brw_math_function(inst->opcode),
+	    0, src0,
+	    BRW_MATH_DATA_VECTOR,
+	    BRW_MATH_PRECISION_FULL);
+}
+
+void
+fs_generator::generate_math2_gen7(fs_inst *inst,
+			        struct brw_reg dst,
+			        struct brw_reg src0,
+			        struct brw_reg src1)
+{
+   assert(inst->mlen == 0);
+   brw_math2(p, dst, brw_math_function(inst->opcode), src0, src1);
+}
+
+void
+fs_generator::generate_math1_gen6(fs_inst *inst,
+			        struct brw_reg dst,
+			        struct brw_reg src0)
+{
+   int op = brw_math_function(inst->opcode);
+
+   assert(inst->mlen == 0);
+
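+   /* For SIMD16, the math instruction is issued as two uncompressed SIMD8
+    * halves rather than one compressed instruction.
+    */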
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+   brw_math(p, dst,
+	    op,
+	    0, src0,
+	    BRW_MATH_DATA_VECTOR,
+	    BRW_MATH_PRECISION_FULL);
+
+   if (dispatch_width == 16) {
+      brw_set_compression_control(p, BRW_COMPRESSION_2NDHALF);
+      brw_math(p, sechalf(dst),
+	       op,
+	       0, sechalf(src0),
+	       BRW_MATH_DATA_VECTOR,
+	       BRW_MATH_PRECISION_FULL);
+      brw_set_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+   }
+}
+
+void
+fs_generator::generate_math2_gen6(fs_inst *inst,
+			        struct brw_reg dst,
+			        struct brw_reg src0,
+			        struct brw_reg src1)
+{
+   int op = brw_math_function(inst->opcode);
+
+   assert(inst->mlen == 0);
+
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+   brw_math2(p, dst, op, src0, src1);
+
+   if (dispatch_width == 16) {
+      brw_set_compression_control(p, BRW_COMPRESSION_2NDHALF);
+      brw_math2(p, sechalf(dst), op, sechalf(src0), sechalf(src1));
+      brw_set_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+   }
+}
+
+void
+fs_generator::generate_math_gen4(fs_inst *inst,
+			       struct brw_reg dst,
+			       struct brw_reg src)
+{
+   int op = brw_math_function(inst->opcode);
+
+   assert(inst->mlen >= 1);
+
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+   brw_math(p, dst,
+	    op,
+	    inst->base_mrf, src,
+	    BRW_MATH_DATA_VECTOR,
+	    BRW_MATH_PRECISION_FULL);
+
+   if (dispatch_width == 16) {
+      brw_set_compression_control(p, BRW_COMPRESSION_2NDHALF);
+      brw_math(p, sechalf(dst),
+	       op,
+	       inst->base_mrf + 1, sechalf(src),
+	       BRW_MATH_DATA_VECTOR,
+	       BRW_MATH_PRECISION_FULL);
+
+      brw_set_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+   }
+}
+
+void
+fs_generator::generate_math_g45(fs_inst *inst,
+                                struct brw_reg dst,
+                                struct brw_reg src)
+{
+   if (inst->opcode == SHADER_OPCODE_POW ||
+       inst->opcode == SHADER_OPCODE_INT_QUOTIENT ||
+       inst->opcode == SHADER_OPCODE_INT_REMAINDER) {
+      generate_math_gen4(inst, dst, src);
+      return;
+   }
+
+   int op = brw_math_function(inst->opcode);
+
+   assert(inst->mlen >= 1);
+
+   brw_math(p, dst,
+            op,
+            inst->base_mrf, src,
+            BRW_MATH_DATA_VECTOR,
+            BRW_MATH_PRECISION_FULL);
+}
+
+void
+fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src)
+{
+   int msg_type = -1;
+   int rlen = 4;
+   uint32_t simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD8;
+   uint32_t return_format;
+
+   switch (dst.type) {
+   case BRW_REGISTER_TYPE_D:
+      return_format = BRW_SAMPLER_RETURN_FORMAT_SINT32;
+      break;
+   case BRW_REGISTER_TYPE_UD:
+      return_format = BRW_SAMPLER_RETURN_FORMAT_UINT32;
+      break;
+   default:
+      return_format = BRW_SAMPLER_RETURN_FORMAT_FLOAT32;
+      break;
+   }
+
+   if (dispatch_width == 16 &&
+       !inst->force_uncompressed && !inst->force_sechalf)
+      simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD16;
+
+   if (brw->gen >= 5) {
+      switch (inst->opcode) {
+      case SHADER_OPCODE_TEX:
+	 if (inst->shadow_compare) {
+	    msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_COMPARE;
+	 } else {
+	    msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE;
+	 }
+	 break;
+      case FS_OPCODE_TXB:
+	 if (inst->shadow_compare) {
+	    msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_BIAS_COMPARE;
+	 } else {
+	    msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_BIAS;
+	 }
+	 break;
+      case SHADER_OPCODE_TXL:
+	 if (inst->shadow_compare) {
+	    msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LOD_COMPARE;
+	 } else {
+	    msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LOD;
+	 }
+	 break;
+      case SHADER_OPCODE_TXS:
+	 msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO;
+	 break;
+      case SHADER_OPCODE_TXD:
+         if (inst->shadow_compare) {
+            /* Gen7.5+.  Otherwise, lowered by brw_lower_texture_gradients(). */
+            assert(brw->is_haswell);
+            msg_type = HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE;
+         } else {
+            msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_DERIVS;
+         }
+	 break;
+      case SHADER_OPCODE_TXF:
+	 msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
+	 break;
+      case SHADER_OPCODE_TXF_CMS:
+         if (brw->gen >= 7)
+            msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS;
+         else
+            msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
+         break;
+      case SHADER_OPCODE_TXF_UMS:
+         assert(brw->gen >= 7);
+         msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DSS;
+         break;
+      case SHADER_OPCODE_TXF_MCS:
+         assert(brw->gen >= 7);
+         msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS;
+         break;
+      case SHADER_OPCODE_LOD:
+         msg_type = GEN5_SAMPLER_MESSAGE_LOD;
+         break;
+      case SHADER_OPCODE_TG4:
+         if (inst->shadow_compare) {
+            assert(brw->gen >= 7);
+            msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_C;
+         } else {
+            assert(brw->gen >= 6);
+            msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4;
+         }
+         break;
+      case SHADER_OPCODE_TG4_OFFSET:
+         assert(brw->gen >= 7);
+         if (inst->shadow_compare) {
+            msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C;
+         } else {
+            msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO;
+         }
+         break;
+      default:
+	 assert(!"not reached");
+	 break;
+      }
+   } else {
+      switch (inst->opcode) {
+      case SHADER_OPCODE_TEX:
+	 /* Note that G45 and older determine shadow compare and dispatch width
+	  * from message length for most messages.
+	  */
+	 assert(dispatch_width == 8);
+	 msg_type = BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE;
+	 if (inst->shadow_compare) {
+	    assert(inst->mlen == 6);
+	 } else {
+	    assert(inst->mlen <= 4);
+	 }
+	 break;
+      case FS_OPCODE_TXB:
+	 if (inst->shadow_compare) {
+	    assert(inst->mlen == 6);
+	    msg_type = BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_BIAS_COMPARE;
+	 } else {
+	    assert(inst->mlen == 9);
+	    msg_type = BRW_SAMPLER_MESSAGE_SIMD16_SAMPLE_BIAS;
+	    simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD16;
+	 }
+	 break;
+      case SHADER_OPCODE_TXL:
+	 if (inst->shadow_compare) {
+	    assert(inst->mlen == 6);
+	    msg_type = BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_LOD_COMPARE;
+	 } else {
+	    assert(inst->mlen == 9);
+	    msg_type = BRW_SAMPLER_MESSAGE_SIMD16_SAMPLE_LOD;
+	    simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD16;
+	 }
+	 break;
+      case SHADER_OPCODE_TXD:
+	 /* There is no sample_d_c message; comparisons are done manually */
+	 assert(inst->mlen == 7 || inst->mlen == 10);
+	 msg_type = BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_GRADIENTS;
+	 break;
+      case SHADER_OPCODE_TXF:
+	 assert(inst->mlen == 9);
+	 msg_type = BRW_SAMPLER_MESSAGE_SIMD16_LD;
+	 simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD16;
+	 break;
+      case SHADER_OPCODE_TXS:
+	 assert(inst->mlen == 3);
+	 msg_type = BRW_SAMPLER_MESSAGE_SIMD16_RESINFO;
+	 simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD16;
+	 break;
+      default:
+	 assert(!"not reached");
+	 break;
+      }
+   }
+   assert(msg_type != -1);
+
+   if (simd_mode == BRW_SAMPLER_SIMD_MODE_SIMD16) {
+      rlen = 8;
+      dst = vec16(dst);
+   }
+
+   if (brw->gen >= 7 && inst->header_present && dispatch_width == 16) {
+      /* The send-from-GRF for SIMD16 texturing with a header has an extra
+       * hardware register allocated to it, which we need to skip over (since
+       * our coordinates in the payload are in the even-numbered registers,
+       * and the header comes right before the first one).
+       */
+      assert(src.file == BRW_GENERAL_REGISTER_FILE);
+      src.nr++;
+   }
+
+   /* Load the message header if present.  If there's a texture offset,
+    * we need to set it up explicitly and load the offset bitfield.
+    * Otherwise, we can use an implied move from g0 to the first message reg.
+    */
+   if (inst->header_present) {
+      if (brw->gen < 6 && !inst->texture_offset) {
+         /* Set up an implied move from g0 to the MRF. */
+         src = retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UW);
+      } else {
+         struct brw_reg header_reg;
+
+         if (brw->gen >= 7) {
+            header_reg = src;
+         } else {
+            assert(inst->base_mrf != -1);
+            header_reg = brw_message_reg(inst->base_mrf);
+         }
+
+         brw_push_insn_state(p);
+         brw_set_mask_control(p, BRW_MASK_DISABLE);
+         brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+         /* Explicitly set up the message header by copying g0 to the MRF. */
+         brw_MOV(p, header_reg, brw_vec8_grf(0, 0));
+
+         if (inst->texture_offset) {
+            /* Set the offset bits in DWord 2. */
+            brw_MOV(p, get_element_ud(header_reg, 2),
+                       brw_imm_ud(inst->texture_offset));
+         }
+
+         if (inst->sampler >= 16) {
+            /* The "Sampler Index" field can only store values between 0 and 15.
+             * However, we can add an offset to the "Sampler State Pointer"
+             * field, effectively selecting a different set of 16 samplers.
+             *
+             * The "Sampler State Pointer" needs to be aligned to a 32-byte
+             * offset, and each sampler state is only 16-bytes, so we can't
+             * exclusively use the offset - we have to use both.
+             */
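+            /* For example, sampler 20 uses "Sampler Index" 20 % 16 = 4 and
+             * bumps the state pointer by 16 * 1 * 16 bytes = 256 bytes
+             * (one group of 16 16-byte sampler states).
+             */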
+            assert(brw->is_haswell); /* field only exists on Haswell */
+            brw_ADD(p,
+                    get_element_ud(header_reg, 3),
+                    get_element_ud(brw_vec8_grf(0, 0), 3),
+                    brw_imm_ud(16 * (inst->sampler / 16) *
+                               sizeof(gen7_sampler_state)));
+         }
+         brw_pop_insn_state(p);
+      }
+   }
+
+   uint32_t surface_index = ((inst->opcode == SHADER_OPCODE_TG4 ||
+      inst->opcode == SHADER_OPCODE_TG4_OFFSET)
+      ? c->prog_data.base.binding_table.gather_texture_start
+      : c->prog_data.base.binding_table.texture_start) + inst->sampler;
+
+   brw_SAMPLE(p,
+	      retype(dst, BRW_REGISTER_TYPE_UW),
+	      inst->base_mrf,
+	      src,
+              surface_index,
+	      inst->sampler % 16,
+	      msg_type,
+	      rlen,
+	      inst->mlen,
+	      inst->header_present,
+	      simd_mode,
+	      return_format);
+
+   brw_mark_surface_used(&c->prog_data.base, surface_index);
+}
+
+
+/* For OPCODE_DDX and OPCODE_DDY, per channel of output we've got input
+ * looking like:
+ *
+ * arg0: ss0.tl ss0.tr ss0.bl ss0.br ss1.tl ss1.tr ss1.bl ss1.br
+ *
+ * Ideally, we want to produce:
+ *
+ *           DDX                     DDY
+ * dst: (ss0.tr - ss0.tl)     (ss0.tl - ss0.bl)
+ *      (ss0.tr - ss0.tl)     (ss0.tr - ss0.br)
+ *      (ss0.br - ss0.bl)     (ss0.tl - ss0.bl)
+ *      (ss0.br - ss0.bl)     (ss0.tr - ss0.br)
+ *      (ss1.tr - ss1.tl)     (ss1.tl - ss1.bl)
+ *      (ss1.tr - ss1.tl)     (ss1.tr - ss1.br)
+ *      (ss1.br - ss1.bl)     (ss1.tl - ss1.bl)
+ *      (ss1.br - ss1.bl)     (ss1.tr - ss1.br)
+ *
+ * and add another set of two more subspans if in 16-pixel dispatch mode.
+ *
+ * For DDX, it ends up being easy: width = 2, horiz = 0 gets us the same
+ * result for each pair, and vertstride = 2 jumps us 2 elements after
+ * processing a pair.  But this ideal approximation may impose a large
+ * performance cost on sample_d: on at least Haswell, the sample_d
+ * instruction can optimize when the same LOD is used for all pixels in a
+ * subspan.
+ *
+ * For DDY, we need to use ALIGN16 mode since it's capable of doing the
+ * appropriate swizzling.
+ */
+void
+fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src)
+{
+   unsigned vstride, width;
+
+   if (c->key.high_quality_derivatives) {
+      /* produce accurate derivatives */
+      vstride = BRW_VERTICAL_STRIDE_2;
+      width = BRW_WIDTH_2;
+   }
+   else {
+      /* replicate the derivative at the top-left pixel to other pixels */
+      vstride = BRW_VERTICAL_STRIDE_4;
+      width = BRW_WIDTH_4;
+   }
+
+   struct brw_reg src0 = brw_reg(src.file, src.nr, 1,
+				 BRW_REGISTER_TYPE_F,
+				 vstride,
+				 width,
+				 BRW_HORIZONTAL_STRIDE_0,
+				 BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
+   struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
+				 BRW_REGISTER_TYPE_F,
+				 vstride,
+				 width,
+				 BRW_HORIZONTAL_STRIDE_0,
+				 BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
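+   /* In the accurate case, the <2;2,0> regions pair src0 at subreg 1
+    * (tr, br, ...) with src1 at subreg 0 (tl, bl, ...), so the ADD below
+    * yields (tr - tl) and (br - bl) per subspan; the <4;4,0> regions
+    * instead replicate (tr - tl) across the whole subspan.
+    */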
+   brw_ADD(p, dst, src0, negate(src1));
+}
+
+/* The negate_value boolean is used to negate the derivative computation for
+ * FBOs, since they place the origin at the upper left instead of the lower
+ * left.
+ */
+void
+fs_generator::generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg src,
+                         bool negate_value)
+{
+   if (c->key.high_quality_derivatives) {
+      /* From the Ivy Bridge PRM, volume 4 part 3, section 3.3.9 (Register
+       * Region Restrictions):
+       *
+       *     In Align16 access mode, SIMD16 is not allowed for DW operations
+       *     and SIMD8 is not allowed for DF operations.
+       *
+       * In this context, "DW operations" means "operations acting on 32-bit
+       * values", so it includes operations on floats.
+       *
+       * Gen4 has a similar restriction.  From the i965 PRM, section 11.5.3
+       * (Instruction Compression -> Rules and Restrictions):
+       *
+       *     A compressed instruction must be in Align1 access mode. Align16
+       *     mode instructions cannot be compressed.
+       *
+       * Similar text exists in the g45 PRM.
+       *
+       * On these platforms, if we're building a SIMD16 shader, we need to
+       * manually unroll to a pair of SIMD8 instructions.
+       */
+      bool unroll_to_simd8 =
+         (dispatch_width == 16 &&
+          (brw->gen == 4 || (brw->gen == 7 && !brw->is_haswell)));
+
+      /* produce accurate derivatives */
+      struct brw_reg src0 = brw_reg(src.file, src.nr, 0,
+                                    BRW_REGISTER_TYPE_F,
+                                    BRW_VERTICAL_STRIDE_4,
+                                    BRW_WIDTH_4,
+                                    BRW_HORIZONTAL_STRIDE_1,
+                                    BRW_SWIZZLE_XYXY, WRITEMASK_XYZW);
+      struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
+                                    BRW_REGISTER_TYPE_F,
+                                    BRW_VERTICAL_STRIDE_4,
+                                    BRW_WIDTH_4,
+                                    BRW_HORIZONTAL_STRIDE_1,
+                                    BRW_SWIZZLE_ZWZW, WRITEMASK_XYZW);
+      brw_push_insn_state(p);
+      brw_set_access_mode(p, BRW_ALIGN_16);
+      if (unroll_to_simd8)
+         brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+      if (negate_value)
+         brw_ADD(p, dst, src1, negate(src0));
+      else
+         brw_ADD(p, dst, src0, negate(src1));
+      if (unroll_to_simd8) {
+         brw_set_compression_control(p, BRW_COMPRESSION_2NDHALF);
+         src0 = sechalf(src0);
+         src1 = sechalf(src1);
+         dst = sechalf(dst);
+         if (negate_value)
+            brw_ADD(p, dst, src1, negate(src0));
+         else
+            brw_ADD(p, dst, src0, negate(src1));
+      }
+      brw_pop_insn_state(p);
+   } else {
+      /* replicate the derivative at the top-left pixel to other pixels */
+      struct brw_reg src0 = brw_reg(src.file, src.nr, 0,
+                                    BRW_REGISTER_TYPE_F,
+                                    BRW_VERTICAL_STRIDE_4,
+                                    BRW_WIDTH_4,
+                                    BRW_HORIZONTAL_STRIDE_0,
+                                    BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
+      struct brw_reg src1 = brw_reg(src.file, src.nr, 2,
+                                    BRW_REGISTER_TYPE_F,
+                                    BRW_VERTICAL_STRIDE_4,
+                                    BRW_WIDTH_4,
+                                    BRW_HORIZONTAL_STRIDE_0,
+                                    BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
+      if (negate_value)
+         brw_ADD(p, dst, src1, negate(src0));
+      else
+         brw_ADD(p, dst, src0, negate(src1));
+   }
+}
+
+void
+fs_generator::generate_discard_jump(fs_inst *inst)
+{
+   assert(brw->gen >= 6);
+
+   /* This HALT will be patched up at FB write time to point UIP at the end of
+    * the program, and at brw_uip_jip() JIP will be set to the end of the
+    * current block (or the program).
+    */
+   this->discard_halt_patches.push_tail(new(mem_ctx) ip_record(p->nr_insn));
+
+   brw_push_insn_state(p);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   gen6_HALT(p);
+   brw_pop_insn_state(p);
+}
+
+void
+fs_generator::generate_scratch_write(fs_inst *inst, struct brw_reg src)
+{
+   assert(inst->mlen != 0);
+
+   brw_MOV(p,
+	   retype(brw_message_reg(inst->base_mrf + 1), BRW_REGISTER_TYPE_UD),
+	   retype(src, BRW_REGISTER_TYPE_UD));
+   brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf),
+                                 dispatch_width / 8, inst->offset);
+}
+
+void
+fs_generator::generate_scratch_read(fs_inst *inst, struct brw_reg dst)
+{
+   assert(inst->mlen != 0);
+
+   brw_oword_block_read_scratch(p, dst, brw_message_reg(inst->base_mrf),
+                                dispatch_width / 8, inst->offset);
+}
+
+void
+fs_generator::generate_scratch_read_gen7(fs_inst *inst, struct brw_reg dst)
+{
+   gen7_block_read_scratch(p, dst, dispatch_width / 8, inst->offset);
+}
+
+void
+fs_generator::generate_uniform_pull_constant_load(fs_inst *inst,
+                                                  struct brw_reg dst,
+                                                  struct brw_reg index,
+                                                  struct brw_reg offset)
+{
+   assert(inst->mlen != 0);
+
+   assert(index.file == BRW_IMMEDIATE_VALUE &&
+	  index.type == BRW_REGISTER_TYPE_UD);
+   uint32_t surf_index = index.dw1.ud;
+
+   assert(offset.file == BRW_IMMEDIATE_VALUE &&
+	  offset.type == BRW_REGISTER_TYPE_UD);
+   uint32_t read_offset = offset.dw1.ud;
+
+   brw_oword_block_read(p, dst, brw_message_reg(inst->base_mrf),
+			read_offset, surf_index);
+
+   brw_mark_surface_used(&c->prog_data.base, surf_index);
+}
+
+void
+fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst,
+                                                       struct brw_reg dst,
+                                                       struct brw_reg index,
+                                                       struct brw_reg offset)
+{
+   assert(inst->mlen == 0);
+
+   assert(index.file == BRW_IMMEDIATE_VALUE &&
+	  index.type == BRW_REGISTER_TYPE_UD);
+   uint32_t surf_index = index.dw1.ud;
+
+   assert(offset.file == BRW_GENERAL_REGISTER_FILE);
+   /* Reference just the dword we need, to avoid angering validate_reg(). */
+   offset = brw_vec1_grf(offset.nr, 0);
+
+   brw_push_insn_state(p);
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   struct brw_instruction *send = brw_next_insn(p, BRW_OPCODE_SEND);
+   brw_pop_insn_state(p);
+
+   /* We use the SIMD4x2 mode because we want to end up with 4 components in
+    * the destination loaded consecutively from the same offset (which appears
+    * in the first component, and the rest are ignored).
+    */
+   dst.width = BRW_WIDTH_4;
+   brw_set_dest(p, send, dst);
+   brw_set_src0(p, send, offset);
+   brw_set_sampler_message(p, send,
+                           surf_index,
+                           0, /* LD message ignores sampler unit */
+                           GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
+                           1, /* rlen */
+                           1, /* mlen */
+                           false, /* no header */
+                           BRW_SAMPLER_SIMD_MODE_SIMD4X2,
+                           0);
+
+   brw_mark_surface_used(&c->prog_data.base, surf_index);
+}
+
+void
+fs_generator::generate_varying_pull_constant_load(fs_inst *inst,
+                                                  struct brw_reg dst,
+                                                  struct brw_reg index,
+                                                  struct brw_reg offset)
+{
+   assert(brw->gen < 7); /* Should use the gen7 variant. */
+   assert(inst->header_present);
+   assert(inst->mlen);
+
+   assert(index.file == BRW_IMMEDIATE_VALUE &&
+	  index.type == BRW_REGISTER_TYPE_UD);
+   uint32_t surf_index = index.dw1.ud;
+
+   uint32_t simd_mode, rlen, msg_type;
+   if (dispatch_width == 16) {
+      simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD16;
+      rlen = 8;
+   } else {
+      simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD8;
+      rlen = 4;
+   }
+
+   if (brw->gen >= 5)
+      msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
+   else {
+      /* We always use the SIMD16 message so that we only have to load U, and
+       * not V or R.
+       */
+      msg_type = BRW_SAMPLER_MESSAGE_SIMD16_LD;
+      assert(inst->mlen == 3);
+      assert(inst->regs_written == 8);
+      rlen = 8;
+      simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD16;
+   }
+
+   struct brw_reg offset_mrf = retype(brw_message_reg(inst->base_mrf + 1),
+                                      BRW_REGISTER_TYPE_D);
+   brw_MOV(p, offset_mrf, offset);
+
+   struct brw_reg header = brw_vec8_grf(0, 0);
+   gen6_resolve_implied_move(p, &header, inst->base_mrf);
+
+   struct brw_instruction *send = brw_next_insn(p, BRW_OPCODE_SEND);
+   send->header.compression_control = BRW_COMPRESSION_NONE;
+   brw_set_dest(p, send, retype(dst, BRW_REGISTER_TYPE_UW));
+   brw_set_src0(p, send, header);
+   if (brw->gen < 6)
+      send->header.destreg__conditionalmod = inst->base_mrf;
+
+   /* Our surface is set up as floats, regardless of what actual data is
+    * stored in it.
+    */
+   uint32_t return_format = BRW_SAMPLER_RETURN_FORMAT_FLOAT32;
+   brw_set_sampler_message(p, send,
+                           surf_index,
+                           0, /* sampler (unused) */
+                           msg_type,
+                           rlen,
+                           inst->mlen,
+                           inst->header_present,
+                           simd_mode,
+                           return_format);
+
+   brw_mark_surface_used(&c->prog_data.base, surf_index);
+}
+
+void
+fs_generator::generate_varying_pull_constant_load_gen7(fs_inst *inst,
+                                                       struct brw_reg dst,
+                                                       struct brw_reg index,
+                                                       struct brw_reg offset)
+{
+   assert(brw->gen >= 7);
+   /* Varying-offset pull constant loads are treated as normal expressions on
+    * gen7, so the fact that they are send messages is hidden at the IR
+    * level.
+    */
+   assert(!inst->header_present);
+   assert(!inst->mlen);
+
+   assert(index.file == BRW_IMMEDIATE_VALUE &&
+	  index.type == BRW_REGISTER_TYPE_UD);
+   uint32_t surf_index = index.dw1.ud;
+
+   uint32_t simd_mode, rlen, mlen;
+   if (dispatch_width == 16) {
+      mlen = 2;
+      rlen = 8;
+      simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD16;
+   } else {
+      mlen = 1;
+      rlen = 4;
+      simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD8;
+   }
+
+   struct brw_instruction *send = brw_next_insn(p, BRW_OPCODE_SEND);
+   brw_set_dest(p, send, dst);
+   brw_set_src0(p, send, offset);
+   brw_set_sampler_message(p, send,
+                           surf_index,
+                           0, /* LD message ignores sampler unit */
+                           GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
+                           rlen,
+                           mlen,
+                           false, /* no header */
+                           simd_mode,
+                           0);
+
+   brw_mark_surface_used(&c->prog_data.base, surf_index);
+}
+
+/**
+ * Cause the current pixel/sample mask (from R1.7 bits 15:0) to be transferred
+ * into the flags register (f0.0).
+ *
+ * Used only on Gen6 and above.
+ */
+void
+fs_generator::generate_mov_dispatch_to_flags(fs_inst *inst)
+{
+   struct brw_reg flags = brw_flag_reg(0, inst->flag_subreg);
+   struct brw_reg dispatch_mask;
+
+   if (brw->gen >= 6)
+      dispatch_mask = retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UW);
+   else
+      dispatch_mask = retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UW);
+
+   brw_push_insn_state(p);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   brw_MOV(p, flags, dispatch_mask);
+   brw_pop_insn_state(p);
+}
+
+
+static uint32_t brw_file_from_reg(fs_reg *reg)
+{
+   switch (reg->file) {
+   case GRF:
+      return BRW_GENERAL_REGISTER_FILE;
+   case MRF:
+      return BRW_MESSAGE_REGISTER_FILE;
+   case IMM:
+      return BRW_IMMEDIATE_VALUE;
+   default:
+      assert(!"not reached");
+      return BRW_GENERAL_REGISTER_FILE;
+   }
+}
+
+struct brw_reg
+brw_reg_from_fs_reg(fs_reg *reg)
+{
+   struct brw_reg brw_reg;
+
+   switch (reg->file) {
+   case GRF:
+   case MRF:
+      if (reg->stride == 0) {
+         brw_reg = brw_vec1_reg(brw_file_from_reg(reg), reg->reg, 0);
+      } else {
+         brw_reg = brw_vec8_reg(brw_file_from_reg(reg), reg->reg, 0);
+         brw_reg = stride(brw_reg, 8 * reg->stride, 8, reg->stride);
+      }
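+      /* e.g. stride 1 gives the usual <8;8,1> region and stride 2 gives
+       * <16;8,2>; stride 0 above is the scalar <0;1,0> case.
+       */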
+
+      brw_reg = retype(brw_reg, reg->type);
+      brw_reg = byte_offset(brw_reg, reg->subreg_offset);
+      break;
+   case IMM:
+      switch (reg->type) {
+      case BRW_REGISTER_TYPE_F:
+	 brw_reg = brw_imm_f(reg->imm.f);
+	 break;
+      case BRW_REGISTER_TYPE_D:
+	 brw_reg = brw_imm_d(reg->imm.i);
+	 break;
+      case BRW_REGISTER_TYPE_UD:
+	 brw_reg = brw_imm_ud(reg->imm.u);
+	 break;
+      default:
+	 assert(!"not reached");
+	 brw_reg = brw_null_reg();
+	 break;
+      }
+      break;
+   case HW_REG:
+      assert(reg->type == reg->fixed_hw_reg.type);
+      brw_reg = reg->fixed_hw_reg;
+      break;
+   case BAD_FILE:
+      /* Probably unused. */
+      brw_reg = brw_null_reg();
+      break;
+   case UNIFORM:
+      assert(!"not reached");
+      brw_reg = brw_null_reg();
+      break;
+   default:
+      assert(!"not reached");
+      brw_reg = brw_null_reg();
+      break;
+   }
+   if (reg->abs)
+      brw_reg = brw_abs(brw_reg);
+   if (reg->negate)
+      brw_reg = negate(brw_reg);
+
+   return brw_reg;
+}
+
+/**
+ * Sets the first word of a vgrf for gen7+ simd4x2 uniform pull constant
+ * sampler LD messages.
+ *
+ * We don't want to bake it into the send message's code generation because
+ * that means we don't get a chance to schedule the instructions.
+ */
+void
+fs_generator::generate_set_simd4x2_offset(fs_inst *inst,
+                                          struct brw_reg dst,
+                                          struct brw_reg value)
+{
+   assert(value.file == BRW_IMMEDIATE_VALUE);
+
+   brw_push_insn_state(p);
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   brw_MOV(p, retype(brw_vec1_reg(dst.file, dst.nr, 0), value.type), value);
+   brw_pop_insn_state(p);
+}
+
+/* Applies a vstride=16, width=8, hstride=2 region to register mask (or
+ * vstride=0, width=1, hstride=0 when the mask is passed as a uniform)
+ * before moving it to register dst.
+ */
+void
+fs_generator::generate_set_omask(fs_inst *inst,
+                                 struct brw_reg dst,
+                                 struct brw_reg mask)
+{
+   bool stride_8_8_1 =
+    (mask.vstride == BRW_VERTICAL_STRIDE_8 &&
+     mask.width == BRW_WIDTH_8 &&
+     mask.hstride == BRW_HORIZONTAL_STRIDE_1);
+
+   bool stride_0_1_0 =
+    (mask.vstride == BRW_VERTICAL_STRIDE_0 &&
+     mask.width == BRW_WIDTH_1 &&
+     mask.hstride == BRW_HORIZONTAL_STRIDE_0);
+
+   assert(stride_8_8_1 || stride_0_1_0);
+   assert(dst.type == BRW_REGISTER_TYPE_UW);
+
+   if (dispatch_width == 16)
+      dst = vec16(dst);
+   brw_push_insn_state(p);
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+
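+   /* For the 8_8_1 layout, the <16;8,2> region below reads every other
+    * 16-bit word of mask (the low word of each dword); the uniform case is
+    * broadcast unchanged.
+    */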
+   if (stride_8_8_1) {
+      brw_MOV(p, dst, retype(stride(mask, 16, 8, 2), dst.type));
+   } else if (stride_0_1_0) {
+      brw_MOV(p, dst, retype(mask, dst.type));
+   }
+   brw_pop_insn_state(p);
+}
+
+/* Applies a vstride=1, width=4, hstride=0 region to register src1 for
+ * the ADD instruction.
+ */
+void
+fs_generator::generate_set_sample_id(fs_inst *inst,
+                                     struct brw_reg dst,
+                                     struct brw_reg src0,
+                                     struct brw_reg src1)
+{
+   assert(dst.type == BRW_REGISTER_TYPE_D ||
+          dst.type == BRW_REGISTER_TYPE_UD);
+   assert(src0.type == BRW_REGISTER_TYPE_D ||
+          src0.type == BRW_REGISTER_TYPE_UD);
+
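+   /* The <1;4,0> UW region repeats each word of src1 four times, so all
+    * four pixels of a 2x2 subspan add the same value; the SIMD16 second
+    * half continues two words in via suboffset(reg, 2).
+    */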
+   brw_push_insn_state(p);
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   struct brw_reg reg = retype(stride(src1, 1, 4, 0), BRW_REGISTER_TYPE_UW);
+   brw_ADD(p, dst, src0, reg);
+   if (dispatch_width == 16)
+      brw_ADD(p, offset(dst, 1), offset(src0, 1), suboffset(reg, 2));
+   brw_pop_insn_state(p);
+}
+
+/**
+ * Change the register's data type from UD to W, doubling the strides in order
+ * to compensate for halving the data type width.
+ */
+static struct brw_reg
+ud_reg_to_w(struct brw_reg r)
+{
+   assert(r.type == BRW_REGISTER_TYPE_UD);
+   r.type = BRW_REGISTER_TYPE_W;
+
+   /* The BRW_*_STRIDE enums are defined so that incrementing the field
+    * doubles the real stride.
+    */
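+   /* e.g. r.hstride going from BRW_HORIZONTAL_STRIDE_1 to
+    * BRW_HORIZONTAL_STRIDE_2 keeps the same byte stride: a one-dword step
+    * becomes a two-word step.
+    */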
+   if (r.hstride != 0)
+      ++r.hstride;
+   if (r.vstride != 0)
+      ++r.vstride;
+
+   return r;
+}
+
+void
+fs_generator::generate_pack_half_2x16_split(fs_inst *inst,
+                                            struct brw_reg dst,
+                                            struct brw_reg x,
+                                            struct brw_reg y)
+{
+   assert(brw->gen >= 7);
+   assert(dst.type == BRW_REGISTER_TYPE_UD);
+   assert(x.type == BRW_REGISTER_TYPE_F);
+   assert(y.type == BRW_REGISTER_TYPE_F);
+
+   /* From the Ivybridge PRM, Vol4, Part3, Section 6.27 f32to16:
+    *
+    *   Because this instruction does not have a 16-bit floating-point type,
+    *   the destination data type must be Word (W).
+    *
+    *   The destination must be DWord-aligned and specify a horizontal stride
+    *   (HorzStride) of 2. The 16-bit result is stored in the lower word of
+    *   each destination channel and the upper word is not modified.
+    */
+   struct brw_reg dst_w = ud_reg_to_w(dst);
+
+   /* Give each 32-bit channel of dst the form below, where "." means
+    * unchanged.
+    *   0x....hhhh
+    */
+   brw_F32TO16(p, dst_w, y);
+
+   /* Now the form:
+    *   0xhhhh0000
+    */
+   brw_SHL(p, dst, dst, brw_imm_ud(16u));
+
+   /* And finally, the form of packHalf2x16's output:
+    *   0xhhhhllll
+    */
+   brw_F32TO16(p, dst_w, x);
+}
+
+void
+fs_generator::generate_unpack_half_2x16_split(fs_inst *inst,
+                                              struct brw_reg dst,
+                                              struct brw_reg src)
+{
+   assert(brw->gen >= 7);
+   assert(dst.type == BRW_REGISTER_TYPE_F);
+   assert(src.type == BRW_REGISTER_TYPE_UD);
+
+   /* From the Ivybridge PRM, Vol4, Part3, Section 6.26 f16to32:
+    *
+    *   Because this instruction does not have a 16-bit floating-point type,
+    *   the source data type must be Word (W). The destination type must be
+    *   F (Float).
+    */
+   struct brw_reg src_w = ud_reg_to_w(src);
+
+   /* Each channel of src has the form of unpackHalf2x16's input: 0xhhhhllll.
+    * For the Y case, we wish to access only the upper word; therefore
+    * a 16-bit subregister offset is needed.
+    */
+   assert(inst->opcode == FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X ||
+          inst->opcode == FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y);
+   if (inst->opcode == FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y)
+      src_w.subnr += 2;
+
+   brw_F16TO32(p, dst, src_w);
+}
+
+// LunarG : TODO - shader time??
+//void
+//fs_generator::generate_shader_time_add(fs_inst *inst,
+//                                       struct brw_reg payload,
+//                                       struct brw_reg offset,
+//                                       struct brw_reg value)
+//{
+//   assert(brw->gen >= 7);
+//   brw_push_insn_state(p);
+//   brw_set_mask_control(p, true);
+
+//   assert(payload.file == BRW_GENERAL_REGISTER_FILE);
+//   struct brw_reg payload_offset = retype(brw_vec1_grf(payload.nr, 0),
+//                                          offset.type);
+//   struct brw_reg payload_value = retype(brw_vec1_grf(payload.nr + 1, 0),
+//                                         value.type);
+
+//   assert(offset.file == BRW_IMMEDIATE_VALUE);
+//   if (value.file == BRW_GENERAL_REGISTER_FILE) {
+//      value.width = BRW_WIDTH_1;
+//      value.hstride = BRW_HORIZONTAL_STRIDE_0;
+//      value.vstride = BRW_VERTICAL_STRIDE_0;
+//   } else {
+//      assert(value.file == BRW_IMMEDIATE_VALUE);
+//   }
+
+//   /* Trying to deal with setup of the params from the IR is crazy in the FS8
+//    * case, and we don't really care about squeezing every bit of performance
+//    * out of this path, so we just emit the MOVs from here.
+//    */
+//   brw_MOV(p, payload_offset, offset);
+//   brw_MOV(p, payload_value, value);
+//   brw_shader_time_add(p, payload,
+//                       c->prog_data.base.binding_table.shader_time_start);
+//   brw_pop_insn_state(p);
+
+//   brw_mark_surface_used(&c->prog_data.base,
+//                         c->prog_data.base.binding_table.shader_time_start);
+//}
+
+void
+fs_generator::generate_untyped_atomic(fs_inst *inst, struct brw_reg dst,
+                                      struct brw_reg atomic_op,
+                                      struct brw_reg surf_index)
+{
+   assert(atomic_op.file == BRW_IMMEDIATE_VALUE &&
+          atomic_op.type == BRW_REGISTER_TYPE_UD &&
+          surf_index.file == BRW_IMMEDIATE_VALUE &&
+	  surf_index.type == BRW_REGISTER_TYPE_UD);
+
+   brw_untyped_atomic(p, dst, brw_message_reg(inst->base_mrf),
+                      atomic_op.dw1.ud, surf_index.dw1.ud,
+                      inst->mlen, dispatch_width / 8);
+
+   brw_mark_surface_used(&c->prog_data.base, surf_index.dw1.ud);
+}
+
+void
+fs_generator::generate_untyped_surface_read(fs_inst *inst, struct brw_reg dst,
+                                            struct brw_reg surf_index)
+{
+   assert(surf_index.file == BRW_IMMEDIATE_VALUE &&
+	  surf_index.type == BRW_REGISTER_TYPE_UD);
+
+   brw_untyped_surface_read(p, dst, brw_message_reg(inst->base_mrf),
+                            surf_index.dw1.ud,
+                            inst->mlen, dispatch_width / 8);
+
+   brw_mark_surface_used(&c->prog_data.base, surf_index.dw1.ud);
+}
+
+void
+fs_generator::generate_code(exec_list *instructions, FILE *dump_file)
+{
+   int last_native_insn_offset = p->next_insn_offset;
+   const char *last_annotation_string = NULL;
+   const void *last_annotation_ir = NULL;
+
+   if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
+      if (prog) {
+         fprintf(stderr,
+                 "Native code for %s fragment shader %d (SIMD%d dispatch):\n",
+                 prog->Label ? prog->Label : "unnamed",
+                 prog->Name, dispatch_width);
+      } else if (fp) {
+         fprintf(stderr,
+                 "Native code for fragment program %d (SIMD%d dispatch):\n",
+                 fp->Base.Id, dispatch_width);
+      } else {
+         fprintf(stderr, "Native code for blorp program (SIMD%d dispatch):\n",
+                 dispatch_width);
+      }
+   }
+
+   cfg_t *cfg = NULL;
+   if (unlikely(INTEL_DEBUG & DEBUG_WM))
+      cfg = new(mem_ctx) cfg_t(instructions);
+
+   foreach_list(node, instructions) {
+      fs_inst *inst = (fs_inst *)node;
+      struct brw_reg src[3], dst;
+
+      if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
+	 foreach_list(node, &cfg->block_list) {
+	    bblock_link *link = (bblock_link *)node;
+	    bblock_t *block = link->block;
+
+	    if (block->start == inst) {
+	       fprintf(stderr, "   START B%d", block->block_num);
+	       foreach_list(predecessor_node, &block->parents) {
+		  bblock_link *predecessor_link =
+		     (bblock_link *)predecessor_node;
+		  bblock_t *predecessor_block = predecessor_link->block;
+		  fprintf(stderr, " <-B%d", predecessor_block->block_num);
+	       }
+	       fprintf(stderr, "\n");
+	    }
+	 }
+
+	 if (last_annotation_ir != inst->ir) {
+	    last_annotation_ir = inst->ir;
+	    if (last_annotation_ir) {
+	       fprintf(stderr, "   ");
+//               if (prog)
+	       assert(prog);
+	       ((ir_instruction *)inst->ir)->fprint(stderr);
+//               else {
+//                  const prog_instruction *fpi;
+//                  fpi = (const prog_instruction *)inst->ir;
+//                  fprintf(stderr, "%d: ",
+//                          (int)(fpi - (fp ? fp->Base.Instructions : 0)));
+//                  _mesa_fprint_instruction_opt(stderr,
+//                                               fpi,
+//                                               0, PROG_PRINT_DEBUG, NULL);
+//               }
+	       fprintf(stderr, "\n");
+	    }
+	 }
+	 if (last_annotation_string != inst->annotation) {
+	    last_annotation_string = inst->annotation;
+	    if (last_annotation_string)
+	       fprintf(stderr, "   %s\n", last_annotation_string);
+	 }
+      }
+
+      for (unsigned int i = 0; i < 3; i++) {
+	 src[i] = brw_reg_from_fs_reg(&inst->src[i]);
+
+	 /* The accumulator result appears to get used for the
+	  * conditional modifier generation.  When negating a UD
+	  * value, there is a 33rd bit generated for the sign in the
+	  * accumulator value, so now you can't check, for example,
+	  * equality with a 32-bit value.  See piglit fs-op-neg-uvec4.
+	  */
+	 assert(!inst->conditional_mod ||
+		inst->src[i].type != BRW_REGISTER_TYPE_UD ||
+		!inst->src[i].negate);
+      }
+      dst = brw_reg_from_fs_reg(&inst->dst);
+
+      brw_set_conditionalmod(p, inst->conditional_mod);
+      brw_set_predicate_control(p, inst->predicate);
+      brw_set_predicate_inverse(p, inst->predicate_inverse);
+      brw_set_flag_reg(p, 0, inst->flag_subreg);
+      brw_set_saturate(p, inst->saturate);
+      brw_set_mask_control(p, inst->force_writemask_all);
+      brw_set_acc_write_control(p, inst->writes_accumulator);
+
+      if (inst->force_uncompressed || dispatch_width == 8) {
+	 brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+      } else if (inst->force_sechalf) {
+	 brw_set_compression_control(p, BRW_COMPRESSION_2NDHALF);
+      } else {
+	 brw_set_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+      }
+
+      switch (inst->opcode) {
+      case BRW_OPCODE_MOV:
+	 brw_MOV(p, dst, src[0]);
+	 break;
+      case BRW_OPCODE_ADD:
+	 brw_ADD(p, dst, src[0], src[1]);
+	 break;
+      case BRW_OPCODE_MUL:
+	 brw_MUL(p, dst, src[0], src[1]);
+	 break;
+      case BRW_OPCODE_AVG:
+	 brw_AVG(p, dst, src[0], src[1]);
+	 break;
+      case BRW_OPCODE_MACH:
+	 brw_MACH(p, dst, src[0], src[1]);
+	 break;
+
+      case BRW_OPCODE_MAD:
+         assert(brw->gen >= 6);
+	 brw_set_access_mode(p, BRW_ALIGN_16);
+         if (dispatch_width == 16 && !brw->is_haswell) {
+	    brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+	    brw_MAD(p, dst, src[0], src[1], src[2]);
+	    brw_set_compression_control(p, BRW_COMPRESSION_2NDHALF);
+	    brw_MAD(p, sechalf(dst), sechalf(src[0]), sechalf(src[1]), sechalf(src[2]));
+	    brw_set_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+	 } else {
+	    brw_MAD(p, dst, src[0], src[1], src[2]);
+	 }
+	 brw_set_access_mode(p, BRW_ALIGN_1);
+	 break;
+
+      case BRW_OPCODE_LRP:
+         assert(brw->gen >= 6);
+	 brw_set_access_mode(p, BRW_ALIGN_16);
+         if (dispatch_width == 16 && !brw->is_haswell) {
+	    brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+	    brw_LRP(p, dst, src[0], src[1], src[2]);
+	    brw_set_compression_control(p, BRW_COMPRESSION_2NDHALF);
+	    brw_LRP(p, sechalf(dst), sechalf(src[0]), sechalf(src[1]), sechalf(src[2]));
+	    brw_set_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+	 } else {
+	    brw_LRP(p, dst, src[0], src[1], src[2]);
+	 }
+	 brw_set_access_mode(p, BRW_ALIGN_1);
+	 break;
+
+      case BRW_OPCODE_FRC:
+	 brw_FRC(p, dst, src[0]);
+	 break;
+      case BRW_OPCODE_RNDD:
+	 brw_RNDD(p, dst, src[0]);
+	 break;
+      case BRW_OPCODE_RNDE:
+	 brw_RNDE(p, dst, src[0]);
+	 break;
+      case BRW_OPCODE_RNDZ:
+	 brw_RNDZ(p, dst, src[0]);
+	 break;
+
+      case BRW_OPCODE_AND:
+	 brw_AND(p, dst, src[0], src[1]);
+	 break;
+      case BRW_OPCODE_OR:
+	 brw_OR(p, dst, src[0], src[1]);
+	 break;
+      case BRW_OPCODE_XOR:
+	 brw_XOR(p, dst, src[0], src[1]);
+	 break;
+      case BRW_OPCODE_NOT:
+	 brw_NOT(p, dst, src[0]);
+	 break;
+      case BRW_OPCODE_ASR:
+	 brw_ASR(p, dst, src[0], src[1]);
+	 break;
+      case BRW_OPCODE_SHR:
+	 brw_SHR(p, dst, src[0], src[1]);
+	 break;
+      case BRW_OPCODE_SHL:
+	 brw_SHL(p, dst, src[0], src[1]);
+	 break;
+      case BRW_OPCODE_F32TO16:
+         assert(brw->gen >= 7);
+         brw_F32TO16(p, dst, src[0]);
+         break;
+      case BRW_OPCODE_F16TO32:
+         assert(brw->gen >= 7);
+         brw_F16TO32(p, dst, src[0]);
+         break;
+      case BRW_OPCODE_CMP:
+	 brw_CMP(p, dst, inst->conditional_mod, src[0], src[1]);
+	 break;
+      case BRW_OPCODE_SEL:
+	 brw_SEL(p, dst, src[0], src[1]);
+	 break;
+      case BRW_OPCODE_BFREV:
+         assert(brw->gen >= 7);
+         /* BFREV only supports UD type for src and dst. */
+         brw_BFREV(p, retype(dst, BRW_REGISTER_TYPE_UD),
+                      retype(src[0], BRW_REGISTER_TYPE_UD));
+         break;
+      case BRW_OPCODE_FBH:
+         assert(brw->gen >= 7);
+         /* FBH only supports UD type for dst. */
+         brw_FBH(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
+         break;
+      case BRW_OPCODE_FBL:
+         assert(brw->gen >= 7);
+         /* FBL only supports UD type for dst. */
+         brw_FBL(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
+         break;
+      case BRW_OPCODE_CBIT:
+         assert(brw->gen >= 7);
+         /* CBIT only supports UD type for dst. */
+         brw_CBIT(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
+         break;
+      case BRW_OPCODE_ADDC:
+         assert(brw->gen >= 7);
+         brw_ADDC(p, dst, src[0], src[1]);
+         break;
+      case BRW_OPCODE_SUBB:
+         assert(brw->gen >= 7);
+         brw_SUBB(p, dst, src[0], src[1]);
+         break;
+      case BRW_OPCODE_MAC:
+         brw_MAC(p, dst, src[0], src[1]);
+         break;
+
+      case BRW_OPCODE_BFE:
+         assert(brw->gen >= 7);
+         brw_set_access_mode(p, BRW_ALIGN_16);
+         if (dispatch_width == 16 && !brw->is_haswell) {
+            brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+            brw_BFE(p, dst, src[0], src[1], src[2]);
+            brw_set_compression_control(p, BRW_COMPRESSION_2NDHALF);
+            brw_BFE(p, sechalf(dst), sechalf(src[0]), sechalf(src[1]), sechalf(src[2]));
+            brw_set_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+         } else {
+            brw_BFE(p, dst, src[0], src[1], src[2]);
+         }
+         brw_set_access_mode(p, BRW_ALIGN_1);
+         break;
+
+      case BRW_OPCODE_BFI1:
+         assert(brw->gen >= 7);
+         /* The Haswell WaForceSIMD8ForBFIInstruction workaround says that we
+          * should
+          *
+          *    "Force BFI instructions to be executed always in SIMD8."
+          */
+         if (dispatch_width == 16 && brw->is_haswell) {
+            brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+            brw_BFI1(p, dst, src[0], src[1]);
+            brw_set_compression_control(p, BRW_COMPRESSION_2NDHALF);
+            brw_BFI1(p, sechalf(dst), sechalf(src[0]), sechalf(src[1]));
+            brw_set_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+         } else {
+            brw_BFI1(p, dst, src[0], src[1]);
+         }
+         break;
+      case BRW_OPCODE_BFI2:
+         assert(brw->gen >= 7);
+         brw_set_access_mode(p, BRW_ALIGN_16);
+         /* The Haswell WaForceSIMD8ForBFIInstruction workaround says that we
+          * should
+          *
+          *    "Force BFI instructions to be executed always in SIMD8."
+          *
+          * Otherwise we would be able to emit compressed instructions like we
+          * do for the other three-source instructions.
+          */
+         if (dispatch_width == 16) {
+            brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+            brw_BFI2(p, dst, src[0], src[1], src[2]);
+            brw_set_compression_control(p, BRW_COMPRESSION_2NDHALF);
+            brw_BFI2(p, sechalf(dst), sechalf(src[0]), sechalf(src[1]), sechalf(src[2]));
+            brw_set_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+         } else {
+            brw_BFI2(p, dst, src[0], src[1], src[2]);
+         }
+         brw_set_access_mode(p, BRW_ALIGN_1);
+         break;
+
+      case BRW_OPCODE_IF:
+	 if (inst->src[0].file != BAD_FILE) {
+	    /* The instruction has an embedded compare (only allowed on gen6) */
+	    assert(brw->gen == 6);
+	    gen6_IF(p, inst->conditional_mod, src[0], src[1]);
+	 } else {
+	    brw_IF(p, dispatch_width == 16 ? BRW_EXECUTE_16 : BRW_EXECUTE_8);
+	 }
+	 break;
+
+      case BRW_OPCODE_ELSE:
+	 brw_ELSE(p);
+	 break;
+      case BRW_OPCODE_ENDIF:
+	 brw_ENDIF(p);
+	 break;
+
+      case BRW_OPCODE_DO:
+	 brw_DO(p, BRW_EXECUTE_8);
+	 break;
+
+      case BRW_OPCODE_BREAK:
+	 brw_BREAK(p);
+	 brw_set_predicate_control(p, BRW_PREDICATE_NONE);
+	 break;
+      case BRW_OPCODE_CONTINUE:
+	 /* FINISHME: We need to write the loop instruction support still. */
+	 if (brw->gen >= 6)
+	    gen6_CONT(p);
+	 else
+	    brw_CONT(p);
+	 brw_set_predicate_control(p, BRW_PREDICATE_NONE);
+	 break;
+
+      case BRW_OPCODE_WHILE:
+	 brw_WHILE(p);
+	 break;
+
+      case SHADER_OPCODE_RCP:
+      case SHADER_OPCODE_RSQ:
+      case SHADER_OPCODE_SQRT:
+      case SHADER_OPCODE_EXP2:
+      case SHADER_OPCODE_LOG2:
+      case SHADER_OPCODE_SIN:
+      case SHADER_OPCODE_COS:
+	 if (brw->gen >= 7) {
+	    generate_math1_gen7(inst, dst, src[0]);
+	 } else if (brw->gen == 6) {
+	    generate_math1_gen6(inst, dst, src[0]);
+	 } else if (brw->gen == 5 || brw->is_g4x) {
+	    generate_math_g45(inst, dst, src[0]);
+	 } else {
+	    generate_math_gen4(inst, dst, src[0]);
+	 }
+	 break;
+      case SHADER_OPCODE_INT_QUOTIENT:
+      case SHADER_OPCODE_INT_REMAINDER:
+      case SHADER_OPCODE_POW:
+	 if (brw->gen >= 7) {
+	    generate_math2_gen7(inst, dst, src[0], src[1]);
+	 } else if (brw->gen == 6) {
+	    generate_math2_gen6(inst, dst, src[0], src[1]);
+	 } else {
+	    generate_math_gen4(inst, dst, src[0]);
+	 }
+	 break;
+      case FS_OPCODE_PIXEL_X:
+	 generate_pixel_xy(dst, true);
+	 break;
+      case FS_OPCODE_PIXEL_Y:
+	 generate_pixel_xy(dst, false);
+	 break;
+      case FS_OPCODE_CINTERP:
+	 brw_MOV(p, dst, src[0]);
+	 break;
+      case FS_OPCODE_LINTERP:
+	 generate_linterp(inst, dst, src);
+	 break;
+      case SHADER_OPCODE_TEX:
+      case FS_OPCODE_TXB:
+      case SHADER_OPCODE_TXD:
+      case SHADER_OPCODE_TXF:
+      case SHADER_OPCODE_TXF_CMS:
+      case SHADER_OPCODE_TXF_UMS:
+      case SHADER_OPCODE_TXF_MCS:
+      case SHADER_OPCODE_TXL:
+      case SHADER_OPCODE_TXS:
+      case SHADER_OPCODE_LOD:
+      case SHADER_OPCODE_TG4:
+      case SHADER_OPCODE_TG4_OFFSET:
+	 generate_tex(inst, dst, src[0]);
+	 break;
+      case FS_OPCODE_DDX:
+	 generate_ddx(inst, dst, src[0]);
+	 break;
+      case FS_OPCODE_DDY:
+         /* Make sure fp->UsesDFdy flag got set (otherwise there's no
+          * guarantee that c->key.render_to_fbo is set).
+          */
+         assert(fp->UsesDFdy);
+	 generate_ddy(inst, dst, src[0], c->key.render_to_fbo);
+	 break;
+
+      case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
+	 generate_scratch_write(inst, src[0]);
+	 break;
+
+      case SHADER_OPCODE_GEN4_SCRATCH_READ:
+	 generate_scratch_read(inst, dst);
+	 break;
+
+      case SHADER_OPCODE_GEN7_SCRATCH_READ:
+	 generate_scratch_read_gen7(inst, dst);
+	 break;
+
+      case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
+	 generate_uniform_pull_constant_load(inst, dst, src[0], src[1]);
+	 break;
+
+      case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7:
+	 generate_uniform_pull_constant_load_gen7(inst, dst, src[0], src[1]);
+	 break;
+
+      case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD:
+	 generate_varying_pull_constant_load(inst, dst, src[0], src[1]);
+	 break;
+
+      case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7:
+	 generate_varying_pull_constant_load_gen7(inst, dst, src[0], src[1]);
+	 break;
+
+      case FS_OPCODE_FB_WRITE:
+	 generate_fb_write(inst);
+	 break;
+
+      case FS_OPCODE_BLORP_FB_WRITE:
+	 generate_blorp_fb_write(inst);
+	 break;
+
+      case FS_OPCODE_MOV_DISPATCH_TO_FLAGS:
+         generate_mov_dispatch_to_flags(inst);
+         break;
+
+      case FS_OPCODE_DISCARD_JUMP:
+         generate_discard_jump(inst);
+         break;
+
+// LunarG : TODO - shader time??
+//      case SHADER_OPCODE_SHADER_TIME_ADD:
+//         generate_shader_time_add(inst, src[0], src[1], src[2]);
+//         break;
+
+      case SHADER_OPCODE_UNTYPED_ATOMIC:
+         generate_untyped_atomic(inst, dst, src[0], src[1]);
+         break;
+
+      case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+         generate_untyped_surface_read(inst, dst, src[0]);
+         break;
+
+      case FS_OPCODE_SET_SIMD4X2_OFFSET:
+         generate_set_simd4x2_offset(inst, dst, src[0]);
+         break;
+
+      case FS_OPCODE_SET_OMASK:
+         generate_set_omask(inst, dst, src[0]);
+         break;
+
+      case FS_OPCODE_SET_SAMPLE_ID:
+         generate_set_sample_id(inst, dst, src[0], src[1]);
+         break;
+
+      case FS_OPCODE_PACK_HALF_2x16_SPLIT:
+          generate_pack_half_2x16_split(inst, dst, src[0], src[1]);
+          break;
+
+      case FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X:
+      case FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y:
+         generate_unpack_half_2x16_split(inst, dst, src[0]);
+         break;
+
+      case FS_OPCODE_PLACEHOLDER_HALT:
+         /* This is the place where the final HALT needs to be inserted if
+          * we've emitted any discards.  If not, this will emit no code.
+          */
+         patch_discard_jumps_to_fb_writes();
+         break;
+
+      case SHADER_OPCODE_DWORD_SCATTERED_WRITE:
+      case SHADER_OPCODE_BYTE_SCATTERED_WRITE:
+         generate_scattered_write(inst, dst, src[0]);
+         break;
+      case SHADER_OPCODE_DWORD_SCATTERED_READ:
+      case SHADER_OPCODE_BYTE_SCATTERED_READ:
+         generate_scattered_read(inst, dst, src[0]);
+         break;
+      case VS_OPCODE_URB_WRITE:
+         brw_urb_WRITE(p,
+                 brw_null_reg(), /* dest */
+                 inst->base_mrf, /* starting mrf reg nr */
+                 brw_vec8_grf(0, 0), /* src */
+                 (inst->eot) ? BRW_URB_WRITE_EOT_COMPLETE : BRW_URB_WRITE_NO_FLAGS,
+                 inst->mlen,
+                 0,
+                 inst->offset,
+                 BRW_URB_SWIZZLE_INTERLEAVE);
+         break;
+
+      default:
+	 if (inst->opcode < (int) ARRAY_SIZE(opcode_descs)) {
+	    _mesa_problem(ctx, "Unsupported opcode `%s' in FS",
+			  opcode_descs[inst->opcode].name);
+	 } else {
+	    _mesa_problem(ctx, "Unsupported opcode %d in FS", inst->opcode);
+	 }
+	 abort();
+      }
+
+      if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
+	 brw_dump_compile(p, stderr,
+			  last_native_insn_offset, p->next_insn_offset);
+
+	 foreach_list(node, &cfg->block_list) {
+	    bblock_link *link = (bblock_link *)node;
+	    bblock_t *block = link->block;
+
+	    if (block->end == inst) {
+	       fprintf(stderr, "   END B%d", block->block_num);
+	       foreach_list(successor_node, &block->children) {
+		  bblock_link *successor_link =
+		     (bblock_link *)successor_node;
+		  bblock_t *successor_block = successor_link->block;
+		  fprintf(stderr, " ->B%d", successor_block->block_num);
+	       }
+	       fprintf(stderr, "\n");
+	    }
+	 }
+      }
+
+      last_native_insn_offset = p->next_insn_offset;
+   }
+
+   if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
+      fprintf(stderr, "\n");
+   }
+
+   brw_set_uip_jip(p);
+
+   /* OK, while the INTEL_DEBUG=wm above is very nice for debugging FS
+    * emit issues, it doesn't get the jump distances into the output,
+    * which is often something we want to debug.  So this is here in
+    * case you're doing that.
+    */
+   if (dump_file) {
+      brw_dump_compile(p, dump_file, 0, p->next_insn_offset);
+   }
+}
+
+const unsigned *
+fs_generator::generate_assembly(exec_list *simd8_instructions,
+                                exec_list *simd16_instructions,
+                                unsigned *assembly_size,
+                                FILE *dump_file)
+{
+   assert(simd8_instructions || simd16_instructions);
+
+   if (simd8_instructions) {
+      dispatch_width = 8;
+      generate_code(simd8_instructions, dump_file);
+   }
+
+   if (simd16_instructions) {
+      /* We have to do a compaction pass now, or the one at the end of
+       * execution will squash down where our prog_offset start needs
+       * to be.
+       */
+      brw_compact_instructions(p);
+
+      /* align to 64 byte boundary. */
+      while ((p->nr_insn * sizeof(struct brw_instruction)) % 64) {
+         brw_NOP(p);
+      }
+
+      /* Save off the start of this SIMD16 program */
+      c->prog_data.prog_offset_16 = p->nr_insn * sizeof(struct brw_instruction);
+
+      brw_set_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+
+      dispatch_width = 16;
+      generate_code(simd16_instructions, dump_file);
+   }
+
+   return brw_get_program(p, assembly_size);
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs_live_variables.cpp b/icd/intel/compiler/pipeline/brw_fs_live_variables.cpp
new file mode 100644
index 0000000..395fa38
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_live_variables.cpp
@@ -0,0 +1,399 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#include "brw_cfg.h"
+#include "brw_fs_live_variables.h"
+
+using namespace brw;
+
+#define MAX_INSTRUCTION (1 << 30)
+
+/** @file brw_fs_live_variables.cpp
+ *
+ * Support for calculating liveness information about virtual GRFs.
+ *
+ * This produces a live interval for each whole virtual GRF.  We could
+ * choose to expose per-component live intervals for VGRFs of size > 1,
+ * but we currently do not.  It is easier for the consumers of this
+ * information to work with whole VGRFs.
+ *
+ * However, we internally track use/def information at the per-component
+ * (reg_offset) level for greater accuracy.  Large VGRFs may be accessed
+ * piecemeal over many (possibly non-adjacent) instructions.  In this case,
+ * examining a single instruction is insufficient to decide whether a whole
+ * VGRF is ultimately used or defined.  Tracking individual components
+ * allows us to easily assemble this information.
+ *
+ * See Muchnick's Advanced Compiler Design and Implementation, section
+ * 14.1 (p444).
+ */
+
+void
+fs_live_variables::setup_one_read(bblock_t *block, fs_inst *inst,
+                                  int ip, fs_reg reg)
+{
+   int var = var_from_vgrf[reg.reg] + reg.reg_offset;
+   assert(var < num_vars);
+
+   /* In most cases, a register can be written over safely by the
+    * same instruction that is its last use.  For a single
+    * instruction, the sources are dereferenced before writing of the
+    * destination starts (naturally).  This gets more complicated for
+    * simd16, because the instruction:
+    *
+    * add(16)      g4<1>F      g4<8,8,1>F   g6<8,8,1>F
+    *
+    * is actually decoded in hardware as:
+    *
+    * add(8)       g4<1>F      g4<8,8,1>F   g6<8,8,1>F
+    * add(8)       g5<1>F      g5<8,8,1>F   g7<8,8,1>F
+    *
+    * Which is safe.  However, if we have uniform accesses
+    * happening, we get into trouble:
+    *
+    * add(8)       g4<1>F      g4<0,1,0>F   g6<8,8,1>F
+    * add(8)       g5<1>F      g4<0,1,0>F   g7<8,8,1>F
+    *
+    * Now our destination for the first instruction overwrote the
+    * second instruction's src0, and we get garbage for those 8
+    * pixels.  There's a similar issue for the pre-gen6
+    * pixel_x/pixel_y, which are registers of 16-bit values and thus
+    * would get stomped by the first decode as well.
+    */
+   int end_ip = ip;
+   if (v->dispatch_width == 16 && (reg.stride == 0 ||
+                                   reg.type == BRW_REGISTER_TYPE_UW ||
+                                   reg.type == BRW_REGISTER_TYPE_W ||
+                                   reg.type == BRW_REGISTER_TYPE_UB ||
+                                   reg.type == BRW_REGISTER_TYPE_B)) {
+      end_ip++;
+   }
+
+   start[var] = MIN2(start[var], ip);
+   end[var] = MAX2(end[var], end_ip);
+
+   /* The use[] bitset marks when the block makes use of a variable (VGRF
+    * channel) without having completely defined that variable within the
+    * block.
+    */
+   if (!BITSET_TEST(bd[block->block_num].def, var))
+      BITSET_SET(bd[block->block_num].use, var);
+}
+
+void
+fs_live_variables::setup_one_write(bblock_t *block, fs_inst *inst,
+                                   int ip, fs_reg reg)
+{
+   int var = var_from_vgrf[reg.reg] + reg.reg_offset;
+   assert(var < num_vars);
+
+   start[var] = MIN2(start[var], ip);
+   end[var] = MAX2(end[var], ip);
+
+   /* The def[] bitset marks when an initialization in a block completely
+    * screens off previous updates of that variable (VGRF channel).
+    */
+   if (inst->dst.file == GRF && !inst->is_partial_write()) {
+      if (!BITSET_TEST(bd[block->block_num].use, var))
+         BITSET_SET(bd[block->block_num].def, var);
+   }
+}
+
+/**
+ * Sets up the use[] and def[] bitsets.
+ *
+ * The basic-block-level live variable analysis needs to know which
+ * variables get used before they're completely defined, and which
+ * variables are completely defined before they're used.
+ *
+ * These are tracked at the per-component level, rather than whole VGRFs.
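+ *
+ * As a small example: in a block whose only instruction is
+ *
+ *    mov vgrf1, vgrf0
+ *
+ * vgrf0's channel lands in use[] (read before being completely defined in
+ * the block) and vgrf1's channel lands in def[] (completely defined before
+ * being used in the block).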
+ */
+void
+fs_live_variables::setup_def_use()
+{
+   int ip = 0;
+
+   for (int b = 0; b < cfg->num_blocks; b++) {
+      bblock_t *block = cfg->blocks[b];
+
+      assert(ip == block->start_ip);
+      if (b > 0)
+	 assert(cfg->blocks[b - 1]->end_ip == ip - 1);
+
+      for (fs_inst *inst = (fs_inst *)block->start;
+	   inst != block->end->next;
+	   inst = (fs_inst *)inst->next) {
+
+	 /* Set use[] for this instruction */
+	 for (unsigned int i = 0; i < 3; i++) {
+            fs_reg reg = inst->src[i];
+
+            if (reg.file != GRF)
+               continue;
+
+            for (int j = 0; j < inst->regs_read(v, i); j++) {
+               setup_one_read(block, inst, ip, reg);
+               reg.reg_offset++;
+            }
+	 }
+
+         /* Set def[] for this instruction */
+         if (inst->dst.file == GRF) {
+            fs_reg reg = inst->dst;
+            for (int j = 0; j < inst->regs_written; j++) {
+               setup_one_write(block, inst, ip, reg);
+               reg.reg_offset++;
+            }
+	 }
+
+	 ip++;
+      }
+   }
+}
+
+/**
+ * The algorithm incrementally sets bits in liveout and livein,
+ * propagating it through control flow.  It will eventually terminate
+ * because it only ever adds bits, and stops when no bits are added in
+ * a pass.
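+ *
+ * Per block b this is the classic backward dataflow step:
+ *
+ *    livein[b]  = use[b] | (liveout[b] & ~def[b])
+ *    liveout[b] = union of livein[s] over all successors s of b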
+ */
+void
+fs_live_variables::compute_live_variables()
+{
+   bool cont = true;
+
+   while (cont) {
+      cont = false;
+
+      for (int b = 0; b < cfg->num_blocks; b++) {
+	 /* Update livein */
+	 for (int i = 0; i < bitset_words; i++) {
+            BITSET_WORD new_livein = (bd[b].use[i] |
+                                      (bd[b].liveout[i] & ~bd[b].def[i]));
+	    if (new_livein & ~bd[b].livein[i]) {
+               bd[b].livein[i] |= new_livein;
+               cont = true;
+	    }
+	 }
+
+	 /* Update liveout */
+	 foreach_list(block_node, &cfg->blocks[b]->children) {
+	    bblock_link *link = (bblock_link *)block_node;
+	    bblock_t *block = link->block;
+
+	    for (int i = 0; i < bitset_words; i++) {
+               BITSET_WORD new_liveout = (bd[block->block_num].livein[i] &
+                                          ~bd[b].liveout[i]);
+               if (new_liveout) {
+                  bd[b].liveout[i] |= new_liveout;
+                  cont = true;
+               }
+	    }
+	 }
+      }
+   }
+}
+
+/**
+ * Extend the start/end ranges for each variable to account for the
+ * new information calculated from control flow.
+ */
+void
+fs_live_variables::compute_start_end()
+{
+   for (int b = 0; b < cfg->num_blocks; b++) {
+      for (int i = 0; i < num_vars; i++) {
+	 if (BITSET_TEST(bd[b].livein, i)) {
+	    start[i] = MIN2(start[i], cfg->blocks[b]->start_ip);
+	    end[i] = MAX2(end[i], cfg->blocks[b]->start_ip);
+	 }
+
+	 if (BITSET_TEST(bd[b].liveout, i)) {
+	    start[i] = MIN2(start[i], cfg->blocks[b]->end_ip);
+	    end[i] = MAX2(end[i], cfg->blocks[b]->end_ip);
+	 }
+
+      }
+   }
+}
+
+int
+fs_live_variables::var_from_reg(fs_reg *reg)
+{
+   return var_from_vgrf[reg->reg] + reg->reg_offset;
+}
+
+fs_live_variables::fs_live_variables(fs_visitor *v, cfg_t *cfg)
+   : v(v), cfg(cfg)
+{
+   mem_ctx = ralloc_context(NULL);
+
+   num_vgrfs = v->virtual_grf_count;
+   num_vars = 0;
+   var_from_vgrf = rzalloc_array(mem_ctx, int, num_vgrfs);
+   for (int i = 0; i < num_vgrfs; i++) {
+      var_from_vgrf[i] = num_vars;
+      num_vars += v->virtual_grf_sizes[i];
+   }
+
+   vgrf_from_var = rzalloc_array(mem_ctx, int, num_vars);
+   for (int i = 0; i < num_vgrfs; i++) {
+      for (int j = 0; j < v->virtual_grf_sizes[i]; j++) {
+         vgrf_from_var[var_from_vgrf[i] + j] = i;
+      }
+   }
+
+   start = ralloc_array(mem_ctx, int, num_vars);
+   end = rzalloc_array(mem_ctx, int, num_vars);
+   for (int i = 0; i < num_vars; i++) {
+      start[i] = MAX_INSTRUCTION;
+      end[i] = -1;
+   }
+
+   bd = rzalloc_array(mem_ctx, struct block_data, cfg->num_blocks);
+   blocks = cfg->num_blocks;
+
+   bitset_words = BITSET_WORDS(num_vars);
+   for (int i = 0; i < cfg->num_blocks; i++) {
+      bd[i].def = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+      bd[i].use = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+      bd[i].livein = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+      bd[i].liveout = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+   }
+
+   setup_def_use();
+   compute_live_variables();
+   compute_start_end();
+}
+
+fs_live_variables::~fs_live_variables()
+{
+   ralloc_free(mem_ctx);
+}
+
+void
+fs_visitor::invalidate_live_intervals()
+{
+   ralloc_free(live_intervals);
+   live_intervals = NULL;
+}
+
+/**
+ * Compute the live intervals for each virtual GRF.
+ *
+ * This uses the per-component use/def data, but combines it to produce
+ * information about whole VGRFs.
+ */
+void
+fs_visitor::calculate_live_intervals(int payload)
+{
+   if (this->live_intervals)
+      return;
+
+   int num_vgrfs = this->virtual_grf_count + payload;
+   ralloc_free(this->virtual_grf_start);
+   ralloc_free(this->virtual_grf_end);
+   virtual_grf_start = ralloc_array(mem_ctx, int, num_vgrfs);
+   virtual_grf_end = ralloc_array(mem_ctx, int, num_vgrfs);
+
+   for (int i = 0; i < num_vgrfs; i++) {
+      virtual_grf_start[i] = MAX_INSTRUCTION;
+      virtual_grf_end[i] = -1;
+   }
+
+   cfg_t cfg(&instructions);
+   this->live_intervals = new(mem_ctx) fs_live_variables(this, &cfg);
+
+   /* Merge the per-component live ranges to whole VGRF live ranges. */
+   for (int i = 0; i < live_intervals->num_vars; i++) {
+      int vgrf = live_intervals->vgrf_from_var[i];
+      virtual_grf_start[vgrf] = MIN2(virtual_grf_start[vgrf],
+                                     live_intervals->start[i]);
+      virtual_grf_end[vgrf] = MAX2(virtual_grf_end[vgrf],
+                                   live_intervals->end[i]);
+   }
+}
+
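+/* Live ranges are treated as half-open intervals here: a variable whose
+ * range ends exactly where another's begins does not interfere with it,
+ * matching the rule in setup_one_read() that an instruction may safely
+ * overwrite a register whose last use it is.
+ */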
+bool
+fs_live_variables::vars_interfere(int a, int b)
+{
+   return !(end[b] <= start[a] ||
+            end[a] <= start[b]);
+}
+
+bool
+fs_visitor::virtual_grf_interferes(int a, int b)
+{
+   return !(virtual_grf_end[a] <= virtual_grf_start[b] ||
+            virtual_grf_end[b] <= virtual_grf_start[a]);
+}
+
+int
+fs_visitor::live_in_count(int block_num) const
+{
+   int count = 0;
+
+   assert(this->live_intervals);
+   assert(this->live_intervals->bd);
+   assert(block_num < this->live_intervals->blocks);
+
+   /* Count number of live ins for each block */
+   for (int i=0; i<this->live_intervals->bitset_words; ++i)
+      count += _mesa_bitcount(this->live_intervals->bd[block_num].livein[i]);
+
+   return count;
+}
+
+int
+fs_visitor::live_out_count(int block_num) const
+{
+   int count = 0;
+
+   assert(this->live_intervals);
+   assert(this->live_intervals->bd);
+   assert(block_num < this->live_intervals->blocks);
+
+   /* Count number of live outs for each block */
+   for (int i=0; i<this->live_intervals->bitset_words; ++i)
+      count += _mesa_bitcount(this->live_intervals->bd[block_num].liveout[i]);
+
+   static const bool debug = false;
+
+   if (debug) {
+      int debug_count = 0;
+      fprintf(stderr, "Block %d out: ", block_num);
+      for (int var=0; var<this->live_intervals->num_vars; ++var)
+         if (BITSET_TEST(this->live_intervals->bd[block_num].liveout, var)) {
+            ++debug_count;
+            fprintf(stderr, "var%d=vgrf%d ", var, this->live_intervals->vgrf_from_var[var]);
+         }
+
+      fprintf(stderr, " (total=%d)\n", debug_count);
+   }
+
+   return count;
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs_live_variables.h b/icd/intel/compiler/pipeline/brw_fs_live_variables.h
new file mode 100644
index 0000000..7b9adce
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_live_variables.h
@@ -0,0 +1,104 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#include "brw_fs.h"
+#include "main/bitset.h"
+
+class cfg_t;
+
+namespace brw {
+
+struct block_data {
+   /**
+    * Which variables are defined before being used in the block.
+    *
+    * Note that for our purposes, "defined" means unconditionally, completely
+    * defined.
+    */
+   BITSET_WORD *def;
+
+   /**
+    * Which variables are used before being defined in the block.
+    */
+   BITSET_WORD *use;
+
+   /** Which defs reach the entry point of the block. */
+   BITSET_WORD *livein;
+
+   /** Which defs reach the exit point of the block. */
+   BITSET_WORD *liveout;
+};
+
+class fs_live_variables {
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(fs_live_variables)
+
+   fs_live_variables(fs_visitor *v, cfg_t *cfg);
+   ~fs_live_variables();
+
+   void setup_def_use();
+   void setup_one_read(bblock_t *block, fs_inst *inst, int ip, fs_reg reg);
+   void setup_one_write(bblock_t *block, fs_inst *inst, int ip, fs_reg reg);
+   void compute_live_variables();
+   void compute_start_end();
+
+   bool vars_interfere(int a, int b);
+   int var_from_reg(fs_reg *reg);
+
+   fs_visitor *v;
+   cfg_t *cfg;
+   void *mem_ctx;
+
+   /** Map from virtual GRF number to index in block_data arrays. */
+   int *var_from_vgrf;
+
+   /**
+    * Map from any index in block_data to the virtual GRF containing it.
+    *
+    * For virtual_grf_sizes of [1, 2, 3], vgrf_from_var would contain
+    * [0, 1, 1, 2, 2, 2].
+    */
+   int *vgrf_from_var;
+
+   int num_vars;
+   int num_vgrfs;
+   int bitset_words;
+   int blocks;
+
+   /** @{
+    * Final computed live ranges for each var (each component of each virtual
+    * GRF).
+    */
+   int *start;
+   int *end;
+   /** @} */
+
+   /** Per-basic-block information on live variables */
+   struct block_data *bd;
+};
+
+} /* namespace brw */
diff --git a/icd/intel/compiler/pipeline/brw_fs_peephole_predicated_break.cpp b/icd/intel/compiler/pipeline/brw_fs_peephole_predicated_break.cpp
new file mode 100644
index 0000000..bb0a2ac
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_peephole_predicated_break.cpp
@@ -0,0 +1,96 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_fs.h"
+#include "brw_cfg.h"
+
+/** @file brw_fs_peephole_predicated_break.cpp
+ *
+ * Loops are often structured as
+ *
+ * loop:
+ *    CMP.f0
+ *    (+f0) IF
+ *    BREAK
+ *    ENDIF
+ *    ...
+ *    WHILE loop
+ *
+ * This peephole pass removes the IF and ENDIF instructions and predicates the
+ * BREAK, dropping two instructions from the loop body.
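+ *
+ * After the pass, the example above becomes (roughly):
+ *
+ * loop:
+ *    CMP.f0
+ *    (+f0) BREAK
+ *    ...
+ *    WHILE loop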
+ */
+
+bool
+fs_visitor::opt_peephole_predicated_break()
+{
+   bool progress = false;
+
+   cfg_t cfg(&instructions);
+
+   for (int b = 0; b < cfg.num_blocks; b++) {
+      bblock_t *block = cfg.blocks[b];
+
+      /* BREAK and CONTINUE instructions, by definition, can only be found at
+       * the ends of basic blocks.
+       */
+      fs_inst *inst = (fs_inst *) block->end;
+      if (inst->opcode != BRW_OPCODE_BREAK && inst->opcode != BRW_OPCODE_CONTINUE)
+         continue;
+
+      fs_inst *if_inst = (fs_inst *) inst->prev;
+      if (if_inst->opcode != BRW_OPCODE_IF)
+         continue;
+
+      fs_inst *endif_inst = (fs_inst *) inst->next;
+      if (endif_inst->opcode != BRW_OPCODE_ENDIF)
+         continue;
+
+      /* For Sandybridge with IF with embedded comparison we need to emit an
+       * instruction to set the flag register.
+       */
+      if (brw->gen == 6 && if_inst->conditional_mod) {
+         fs_inst *cmp_inst = CMP(reg_null_d, if_inst->src[0], if_inst->src[1],
+                                 if_inst->conditional_mod);
+         if_inst->insert_before(cmp_inst);
+         inst->predicate = BRW_PREDICATE_NORMAL;
+      } else {
+         inst->predicate = if_inst->predicate;
+         inst->predicate_inverse = if_inst->predicate_inverse;
+      }
+
+      if_inst->remove();
+      endif_inst->remove();
+
+      /* By removing the ENDIF instruction we removed a basic block. Skip over
+       * it for the next iteration.
+       */
+      b++;
+
+      progress = true;
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs_reg_allocate.cpp b/icd/intel/compiler/pipeline/brw_fs_reg_allocate.cpp
new file mode 100644
index 0000000..fb9eb9b
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_reg_allocate.cpp
@@ -0,0 +1,818 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ * Copyright (C) 2014 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#include "brw_fs.h"
+#include "glsl/glsl_types.h"
+#include "glsl/ir_optimization.h"
+#include "glsl/glsl_parser_extras.h"
+#include "icd-utils.h" // LunarG : ADD
+
+#include <vector>
+#include <algorithm>
+#include <bitset>
+
+static const bool debug = false;
+
+class igraph_node_t {
+public:
+   igraph_node_t(int /* node_count */) : color(-1), size(1), spillCost(-1.0f)
+   {
+      edges.reserve(16);
+   }
+
+   void addEdge(int i) { edges.push_back(i); }
+   int  getEdge(int p) const { return edges[p]; }
+   int  getColor() const { return color; }
+   void setColor(int c) { color = c; }
+   int  getEdgeCount() const { return edges.size(); }
+   void setSize(int s) { size = s; }
+   int  getSize() const { return size; }
+   void setSpillCost(float c) { spillCost = c; }
+   float getSpillCost() const { return spillCost; }
+
+   bool interferesWith(int n) const { return std::find(edges.begin(), edges.end(), n) != edges.end(); }
+
+   void dumpEdges() const {
+      for (int e=0; e<int(edges.size()); ++e)
+         fprintf(stderr, " %d", edges[e]);
+   }
+
+private:
+   std::vector<int> edges;
+   int   color;
+   int   size;
+   float spillCost;
+};
+
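+/* A simple greedy graph-coloring register allocator: each node is a virtual
+ * GRF (plus payload and, on gen7+, MRF-hack registers), an edge marks a
+ * live-range interference, and a node's color is its base register number
+ * (in reg_width-sized units).  A node of size > 1 occupies that many
+ * consecutive colors.
+ */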
+class igraph_t {
+public:
+   static const int max_reg_count = 128; // max registers we can ever allocate
+
+   igraph_t(int node_count, int phys_count,
+            int virtual_grf_count, int* virtual_grf_sizes) :
+      toColor(node_count),
+      nodes(node_count, igraph_node_t(node_count)),
+      phys_count(phys_count),
+      virtual_grf_count(virtual_grf_count),
+      fail_node(-1)
+   {
+      assert(phys_count <= max_reg_count);
+
+      for (int n=0; n<virtual_grf_count; ++n)
+         nodes[n].setSize(virtual_grf_sizes[n]);
+   }
+
+   void dumpGraph() const {
+      fprintf(stderr, "RA: igraph:\n");
+      for (int n=0; n<int(nodes.size()); ++n) {
+         fprintf(stderr, "RA: Node %3d ->%3d:", n, nodes[n].getColor());
+         nodes[n].dumpEdges();
+         fprintf(stderr, "\n");
+      }
+   }
+
+   void addInterference(int i, int j) {
+      assert(i != j);
+      nodes[i].addEdge(j);
+      nodes[j].addEdge(i);
+   }
+
+   int  getColor(int i) const { return nodes[i].getColor(); }
+   void setColor(int i, int color) { nodes[i].setColor(color); }
+
+   void setSpillCost(int i, float c) { nodes[i].setSpillCost(c); }
+
+   int getBestSpillNode() const;
+
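+   /* Coloring order: largest nodes first, then highest edge count, so the
+    * hardest-to-place nodes are colored while the register file is still
+    * mostly empty; the index comparison breaks ties deterministically.
+    */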
+   struct sorter {
+      sorter(const igraph_t& g) : g(g) { }
+
+      bool operator()(int i, int j) const {
+         return
+            g.nodes[i].getSize() != g.nodes[j].getSize() ?
+                  g.nodes[i].getSize() > g.nodes[j].getSize() :
+            g.nodes[i].getEdgeCount() != g.nodes[j].getEdgeCount() ?
+                  g.nodes[i].getEdgeCount() > g.nodes[j].getEdgeCount() :
+            i < j;
+      }
+
+      const igraph_t& g;
+   };
+
+   bool colorNode(int n, std::bitset<max_reg_count>& used) {
+      // Trivial return if already colored
+      if (nodes[n].getColor() >= 0)
+         return true;
+
+      used.reset();
+
+      // Place interfering nodes
+      for (int e=0; e<nodes[n].getEdgeCount(); ++e) {
+         const int n2    = nodes[n].getEdge(e);
+         const int color = nodes[n2].getColor();
+         if (color >= 0)
+            for (int s=0; s<nodes[n2].getSize(); ++s)
+               used.set(color+s);
+      }
+
+      // Find color for this node
+      int c;
+      int avail=0;
+      for (c=0; c<phys_count; ++c) {
+         if (used.test(c))
+            avail=0;
+         else
+            if (++avail >= nodes[n].getSize())
+               break;
+      }
+
+      if (avail < nodes[n].getSize())
+         return false;
+
+      nodes[n].setColor(c + 1 - nodes[n].getSize());
+      return true;
+   }
+
+   bool colorGraph() {
+      for (int n=0; n<int(nodes.size()); ++n)
+         toColor[n] = n;
+
+      std::sort(toColor.begin(), toColor.end(), sorter(*this));
+
+      std::bitset<max_reg_count> used;
+
+      for (int n=0; n<int(toColor.size()); ++n) {
+         if (!colorNode(toColor[n], used)) {
+            fail_node = toColor[n];
+
+            if (debug) {
+               fprintf(stderr, "RA: fail: node=%d, size=%d\n",
+                       toColor[n],
+                       nodes[toColor[n]].getSize());
+               fprintf(stderr, "RA: Map: ");
+               for (int x=0; x<int(used.size()); ++x)
+                  fprintf(stderr, "%s", used[x] ? "X" : ".");
+               fprintf(stderr, "\n");
+            }
+
+            return false;
+         }
+      }
+
+      return true;
+   }
+
+private:
+   float getSpillBenefit(int n) const;
+
+   std::vector<int> toColor;
+   std::vector<igraph_node_t> nodes;
+   int phys_count;
+   int virtual_grf_count;
+   int fail_node;
+};
+
+float igraph_t::getSpillBenefit(int n) const
+{
+   float benefit = 0;
+
+   for (int e = 0; e < nodes[n].getEdgeCount(); e++) {
+      int n2 = nodes[n].getEdge(e);
+
+      if (n != n2)
+         benefit += nodes[n2].getSize();
+   }
+
+   return benefit;
+}
+
+int igraph_t::getBestSpillNode() const
+{
+   int best_node = -1;
+   float best_benefit = 0.0;
+
+   for (int n = 0; n < virtual_grf_count; n++) {
+      const float cost = nodes[n].getSpillCost();
+
+      if (cost <= 0.0)
+         continue;
+
+      const float benefit = getSpillBenefit(n);
+
+      if (benefit / cost > best_benefit) {
+         best_benefit = benefit / cost;
+         best_node = n;
+      }
+   }
+
+   return best_node;
+}
+
+
+static void
+assign_reg(int *reg_hw_locations, fs_reg *reg, int reg_width)
+{
+   if (reg->file == GRF) {
+      assert(reg->reg_offset >= 0);
+      assert(reg_hw_locations[reg->reg] >= 0);
+
+      reg->reg = reg_hw_locations[reg->reg] + reg->reg_offset * reg_width;
+      reg->reg_offset = 0;
+   }
+}
+
+void
+fs_visitor::assign_regs_trivial()
+{
+   int hw_reg_mapping[this->virtual_grf_count + 1];
+   int i;
+   int reg_width = dispatch_width / 8;
+
+   /* Note that compressed instructions require alignment to 2 registers. */
+   hw_reg_mapping[0] = ALIGN(this->first_non_payload_grf, reg_width);
+   for (i = 1; i <= this->virtual_grf_count; i++) {
+      hw_reg_mapping[i] = (hw_reg_mapping[i - 1] +
+			   this->virtual_grf_sizes[i - 1] * reg_width);
+   }
+   this->grf_used = hw_reg_mapping[this->virtual_grf_count];
+
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      assign_reg(hw_reg_mapping, &inst->dst, reg_width);
+      assign_reg(hw_reg_mapping, &inst->src[0], reg_width);
+      assign_reg(hw_reg_mapping, &inst->src[1], reg_width);
+      assign_reg(hw_reg_mapping, &inst->src[2], reg_width);
+   }
+
+   if (this->grf_used >= max_grf) {
+      fail("Ran out of regs on trivial allocator (%d/%d)\n",
+	   this->grf_used, max_grf);
+   }
+}
+
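+/* Returns the distance, in instructions, from a DO to its matching WHILE,
+ * tracking nesting depth so inner loops are skipped over.
+ */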
+int
+count_to_loop_end(fs_inst *do_inst)
+{
+   int depth = 1;
+   int ip = 1;
+   for (fs_inst *inst = (fs_inst *)do_inst->next;
+        depth > 0;
+        inst = (fs_inst *)inst->next) {
+      switch (inst->opcode) {
+      case BRW_OPCODE_DO:
+         depth++;
+         break;
+      case BRW_OPCODE_WHILE:
+         depth--;
+         break;
+      default:
+         break;
+      }
+      ip++;
+   }
+   return ip;
+}
+
+/**
+ * Sets up interference between thread payload registers and the virtual GRFs
+ * to be allocated for program temporaries.
+ *
+ * We want to be able to reallocate the payload for our virtual GRFs, notably
+ * because the setup coefficients for a full set of 16 FS inputs takes up 8 of
+ * our 128 registers.
+ *
+ * The layout of the payload registers is:
+ *
+ * 0..nr_payload_regs-1: fixed function setup (including bary coordinates).
+ * nr_payload_regs..nr_payload_regs+curb_read_length-1: uniform data
+ * nr_payload_regs+curb_read_length..first_non_payload_grf-1: setup coefficients.
+ *
+ * And we have payload_node_count nodes covering these registers in order
+ * (note that in SIMD16, a node is two registers).
+ */
+void
+fs_visitor::setup_payload_interference(int* payload_last_use_ip,
+                                       int* mrf_first_use_ip,
+                                       int payload_node_count,
+                                       int mrf_node_count,
+                                       int first_payload_node)
+{
+   int reg_width = dispatch_width / 8;
+   int loop_depth = 0;
+   int loop_end_ip = 0;
+   int loop_start_ip = 0;
+
+   memset(payload_last_use_ip, 0, payload_node_count * sizeof(int));
+
+   if (mrf_first_use_ip) {
+      for (int i=0; i<mrf_node_count; ++i)
+         mrf_first_use_ip[i] = -1;
+   }
+
+   int ip = 0;
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      switch (inst->opcode) {
+      case BRW_OPCODE_DO:
+         loop_depth++;
+
+         /* Since payload regs are deffed only at the start of the shader
+          * execution, any uses of the payload within a loop mean the live
+          * interval extends to the end of the outermost loop.  Find the ip of
+          * the end now.
+          */
+         if (loop_depth == 1) {
+            loop_start_ip = ip;
+            loop_end_ip = ip + count_to_loop_end(inst);
+         }
+         break;
+      case BRW_OPCODE_WHILE:
+         loop_depth--;
+         break;
+      default:
+         break;
+      }
+
+      int use_ip;
+      int mrf_ip;
+      if (loop_depth > 0) {
+         use_ip = loop_end_ip;
+         mrf_ip = loop_start_ip;
+      } else {
+         use_ip = ip;
+         mrf_ip = ip;
+      }
+
+      /* Note that UNIFORM args have been turned into FIXED_HW_REG by
+       * assign_curbe_setup(), and interpolation uses fixed hardware regs from
+       * the start (see interp_reg()).
+       */
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file == HW_REG &&
+             inst->src[i].fixed_hw_reg.file == BRW_GENERAL_REGISTER_FILE) {
+            int node_nr = inst->src[i].fixed_hw_reg.nr / reg_width;
+            if (node_nr >= payload_node_count)
+               continue;
+
+            payload_last_use_ip[node_nr] = use_ip;
+         }
+      }
+
+      if (mrf_first_use_ip) {
+         if (inst->dst.file == MRF) {
+            const int reg = inst->dst.reg & ~BRW_MRF_COMPR4;
+            mrf_first_use_ip[reg] = mrf_ip;
+            if (reg_width == 2) {
+               if (inst->dst.reg & BRW_MRF_COMPR4) {
+                  mrf_first_use_ip[reg + 4] = mrf_ip;
+               } else {
+                  mrf_first_use_ip[reg + 1] = mrf_ip;
+               }
+            }
+         }
+
+         if (inst->mlen > 0) {
+            for (int i = 0; i < implied_mrf_writes(inst); i++) {
+               mrf_first_use_ip[inst->base_mrf + i] = mrf_ip;
+            }
+         }
+      }
+
+      /* Special case instructions which have extra implied registers used. */
+      switch (inst->opcode) {
+      case FS_OPCODE_FB_WRITE:
+         /* We could omit this for the !inst->header_present case, except that
+          * the simulator apparently incorrectly reads from g0/g1 instead of
+          * sideband.  It also really freaks out driver developers to see g0
+          * used in unusual places, so just always reserve it.
+          */
+         payload_last_use_ip[0 / reg_width] = use_ip;
+         payload_last_use_ip[1 / reg_width] = use_ip;
+         break;
+
+      case FS_OPCODE_LINTERP:
+         /* On gen6+ in SIMD16, there are 4 adjacent registers (so 2 nodes)
+          * used by PLN's sourcing of the deltas, while we list only the first
+          * two in the arguments (1 node).  Pre-gen6, the deltas are computed
+          * in normal VGRFs.
+          */
+         if (brw->gen >= 6) {
+            int delta_x_arg = 0;
+            if (inst->src[delta_x_arg].file == HW_REG &&
+                inst->src[delta_x_arg].fixed_hw_reg.file ==
+                BRW_GENERAL_REGISTER_FILE) {
+               int sechalf_node = (inst->src[delta_x_arg].fixed_hw_reg.nr /
+                                   reg_width) + 1;
+               assert(sechalf_node < payload_node_count);
+               payload_last_use_ip[sechalf_node] = use_ip;
+            }
+         }
+         break;
+
+      default:
+         break;
+      }
+
+      ip++;
+   }
+}
+
+/**
+ * Sets the mrf_used array to indicate which MRFs are used by the shader IR
+ *
+ * This is used in assign_regs() to decide which of the GRFs that we use as
+ * MRFs on gen7 get normally register allocated, and in register spilling to
+ * see if we can actually use MRFs to do spills without overwriting normal MRF
+ * contents.
+ */
+void
+fs_visitor::get_used_mrfs(bool *mrf_used)
+{
+   int reg_width = dispatch_width / 8;
+
+   memset(mrf_used, 0, BRW_MAX_MRF * sizeof(bool));
+
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      if (inst->dst.file == MRF) {
+         int reg = inst->dst.reg & ~BRW_MRF_COMPR4;
+         mrf_used[reg] = true;
+         if (reg_width == 2) {
+            if (inst->dst.reg & BRW_MRF_COMPR4) {
+               mrf_used[reg + 4] = true;
+            } else {
+               mrf_used[reg + 1] = true;
+            }
+         }
+      }
+
+      if (inst->mlen > 0) {
+	 for (int i = 0; i < implied_mrf_writes(inst); i++) {
+            mrf_used[inst->base_mrf + i] = true;
+         }
+      }
+   }
+}
+
+bool
+fs_visitor::assign_regs(bool allow_spilling)
+{
+   /* Most of this allocation was written for a reg_width of 1
+    * (dispatch_width == 8).  In extending to SIMD16, the code was
+    * left in place and it was converted to have the hardware
+    * registers it's allocating be contiguous physical pairs of regs
+    * for reg_width == 2.
+    */
+   int reg_width = dispatch_width / 8;
+   int payload_node_count = (ALIGN(this->first_non_payload_grf, reg_width) /
+                            reg_width);
+
+   int node_count = this->virtual_grf_count;
+   int first_payload_node = node_count;
+   node_count += payload_node_count;
+   int first_mrf_hack_node = node_count;
+   int mrf_hack_node_count = 0;
+   if (brw->gen >= 7) {
+      mrf_hack_node_count = BRW_MAX_GRF - GEN7_MRF_HACK_START;
+      node_count += mrf_hack_node_count;
+   }
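+
+   /* Node numbering: [0, virtual_grf_count) are the VGRFs,
+    * [first_payload_node, first_mrf_hack_node) the payload registers, and
+    * on gen7+ [first_mrf_hack_node, node_count) the GRFs used as MRFs.
+    */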
+
+   int hw_reg_mapping[node_count];
+   int mrf_first_use_ip[mrf_hack_node_count];
+   int payload_last_use_ip[payload_node_count];
+   int extra_regs = node_count - virtual_grf_count;
+
+   invalidate_live_intervals();
+   calculate_live_intervals(extra_regs);
+
+   const int incoming_pressure = debug ? calculate_register_pressure(extra_regs) : 0;
+
+   setup_payload_interference(payload_last_use_ip, mrf_first_use_ip,
+                              payload_node_count, mrf_hack_node_count, first_payload_node);
+
+   igraph_t igraph(node_count, BRW_MAX_GRF / reg_width, virtual_grf_count, virtual_grf_sizes);
+
+   if (debug) {
+      fprintf(stderr, "RA: width=%d, incoming pressure=%d\n", reg_width, incoming_pressure);
+   }
+
+   // Set payload interferences
+   for (int i = 0; i < payload_node_count; i++) {
+      for (int j = 0; j < this->virtual_grf_count; j++)
+         if (this->virtual_grf_start[j] <= payload_last_use_ip[i])
+            igraph.addInterference(first_payload_node + i, j);
+
+      igraph.setColor(first_payload_node + i, i);
+   }
+
+   // Set node interferences
+   for (int i = 0; i < node_count; i++)
+      for (int j = 0; j < i; j++)
+         if (virtual_grf_interferes(i, j) ||
+             (virtual_grf_end[i] == virtual_grf_start[i] ||
+              virtual_grf_end[j] == virtual_grf_start[j]))
+            igraph.addInterference(i, j);
+
+   // Set MRF interferences
+   if (brw->gen >= 7) {
+      for (int i = 0; i < mrf_hack_node_count; i++) {
+         if (mrf_first_use_ip[i] >= 0) {
+            for (int j = 0; j < this->virtual_grf_count; j++)
+               if (this->virtual_grf_end[j] >= mrf_first_use_ip[i])
+                  igraph.addInterference(first_mrf_hack_node + i, j);
+
+            igraph.setColor(first_mrf_hack_node + i, (GEN7_MRF_HACK_START + i) / reg_width);
+         }
+      }
+   }
+
+   if (!igraph.colorGraph()) {
+      /* Failed to allocate registers.  Spill a reg, and the caller will
+       * loop back into here to try again. */
+
+      if (allow_spilling) {
+         const int reg = choose_spill_reg(igraph);
+
+         if (debug) {
+            fprintf(stderr, "RA: spill: reg=%d, size=%d\n", reg, virtual_grf_sizes[reg]);
+            // fprintf(stderr, "RA: pre-spill\n");
+            // dump_instructions();
+            // fprintf(stderr, "RA: post-spill\n");
+            // dump_instructions();
+         }
+
+         if (reg == -1) {
+            fail("no register to spill:\n");
+            dump_instructions();
+         } else {
+            spill_reg(reg);
+         }
+      }
+
+      return false;
+   }
+
+   /* Get the chosen virtual registers for each node, and map virtual
+    * regs in the register classes back down to real hardware reg
+    * numbers.
+    */
+   this->grf_used = payload_node_count * reg_width;
+   for (int i = 0; i < this->virtual_grf_count; i++) {
+      hw_reg_mapping[i] = igraph.getColor(i) * reg_width;
+
+      // fprintf(stderr, "RA: vgrf%d -> %d\n", i, hw_reg_mapping[i]);
+
+      this->grf_used = MAX2(this->grf_used,
+        		    hw_reg_mapping[i] + this->virtual_grf_sizes[i] * reg_width);
+   }
+
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      assign_reg(hw_reg_mapping, &inst->dst,    reg_width);
+      assign_reg(hw_reg_mapping, &inst->src[0], reg_width);
+      assign_reg(hw_reg_mapping, &inst->src[1], reg_width);
+      assign_reg(hw_reg_mapping, &inst->src[2], reg_width);
+   }
+
+   if (debug || unlikely(INTEL_DEBUG & DEBUG_WM)) {
+      fprintf(stderr, "RA: success\n");
+   }
+
+   return true;
+}
+
+void
+fs_visitor::emit_unspill(fs_inst *inst, fs_reg dst, uint32_t spill_offset,
+                         int count)
+{
+   for (int i = 0; i < count; i++) {
+      /* The gen7 descriptor-based offset is 12 bits of HWORD units. */
+      bool gen7_read = brw->gen >= 7 && spill_offset < (1 << 12) * REG_SIZE;
+
+      fs_inst *unspill_inst =
+         new(mem_ctx) fs_inst(gen7_read ?
+                              SHADER_OPCODE_GEN7_SCRATCH_READ :
+                              SHADER_OPCODE_GEN4_SCRATCH_READ,
+                              dst);
+      unspill_inst->offset = spill_offset;
+      unspill_inst->ir = inst->ir;
+      unspill_inst->annotation = inst->annotation;
+
+      if (!gen7_read) {
+         unspill_inst->base_mrf = 14;
+         unspill_inst->mlen = 1; /* header contains offset */
+      }
+      inst->insert_before(unspill_inst);
+
+      dst.reg_offset++;
+      spill_offset += dispatch_width * sizeof(float);
+   }
+}
+
+void
+fs_visitor::choose_spill_reg(float* spill_costs, bool* no_spill)
+{
+   float loop_scale = 1.0;
+
+   for (int i = 0; i < this->virtual_grf_count; i++) {
+      spill_costs[i] = 0.0;
+      no_spill[i] = false;
+   }
+
+   /* Calculate costs for spilling nodes.  Call it a cost of 1 per
+    * spill/unspill we'll have to do, and guess that the insides of
+    * loops run 10 times.
+    */
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      for (unsigned int i = 0; i < 3; i++) {
+	 if (inst->src[i].file == GRF) {
+	    spill_costs[inst->src[i].reg] += loop_scale;
+
+            /* Register spilling logic assumes full-width registers; smeared
+             * registers have a width of 1 so if we try to spill them we'll
+             * generate invalid assembly.  This shouldn't be a problem because
+             * smeared registers are only used as short-term temporaries when
+             * loading pull constants, so spilling them is unlikely to reduce
+             * register pressure anyhow.
+             */
+            if (!inst->src[i].is_contiguous()) {
+               no_spill[inst->src[i].reg] = true;
+            }
+	 }
+      }
+
+      if (inst->dst.file == GRF) {
+	 spill_costs[inst->dst.reg] += inst->regs_written * loop_scale;
+
+         if (!inst->dst.is_contiguous()) {
+            no_spill[inst->dst.reg] = true;
+         }
+      }
+
+      switch (inst->opcode) {
+
+      case BRW_OPCODE_DO:
+	 loop_scale *= 10;
+	 break;
+
+      case BRW_OPCODE_WHILE:
+	 loop_scale /= 10;
+	 break;
+
+      case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
+	 if (inst->src[0].file == GRF)
+	    no_spill[inst->src[0].reg] = true;
+	 break;
+
+      case SHADER_OPCODE_GEN4_SCRATCH_READ:
+      case SHADER_OPCODE_GEN7_SCRATCH_READ:
+	 if (inst->dst.file == GRF)
+	    no_spill[inst->dst.reg] = true;
+	 break;
+
+      default:
+	 break;
+      }
+   }
+}
+
+int
+fs_visitor::choose_spill_reg(igraph_t& g)
+{
+   float spill_costs[this->virtual_grf_count];
+   bool no_spill[this->virtual_grf_count];
+
+   choose_spill_reg(spill_costs, no_spill);
+
+   for (int i = 0; i < this->virtual_grf_count; i++) {
+      if (!no_spill[i])
+         g.setSpillCost(i, spill_costs[i]);
+   }
+
+   return g.getBestSpillNode();
+}
+
+void
+fs_visitor::spill_reg(int spill_reg)
+{
+   int reg_size = dispatch_width * sizeof(float);
+   int size = virtual_grf_sizes[spill_reg];
+   unsigned int spill_offset = c->last_scratch;
+   assert(ALIGN(spill_offset, 16) == spill_offset); /* oword read/write req. */
+   int spill_base_mrf = dispatch_width > 8 ? 13 : 14;
+
+   /* Spills may use MRFs 13-15 in the SIMD16 case.  Our texturing is done
+    * using up to 11 MRFs starting from either m1 or m2, and fb writes can use
+    * up to m13 (gen6+ simd16: 2 header + 8 color + 2 src0alpha + 2 omask) or
+    * m15 (gen4-5 simd16: 2 header + 8 color + 1 aads + 2 src depth + 2 dst
+    * depth), starting from m1.  In summary: We may not be able to spill in
+    * SIMD16 mode, because we'd stomp the FB writes.
+    */
+   if (!spilled_any_registers) {
+      bool mrf_used[BRW_MAX_MRF];
+      get_used_mrfs(mrf_used);
+
+      for (int i = spill_base_mrf; i < BRW_MAX_MRF; i++) {
+         if (mrf_used[i]) {
+            fail("Register spilling not supported with m%d used", i);
+            return;
+         }
+      }
+
+      spilled_any_registers = true;
+   }
+
+   c->last_scratch += size * reg_size;
+
+   /* Generate spill/unspill instructions for the objects being
+    * spilled.  Right now, we spill or unspill the whole thing to a
+    * virtual grf of the same size.  For most instructions, though, we
+    * could just spill/unspill the GRF being accessed.
+    */
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      for (unsigned int i = 0; i < 3; i++) {
+	 if (inst->src[i].file == GRF &&
+	     inst->src[i].reg == spill_reg) {
+            int regs_read = inst->regs_read(this, i);
+            int subset_spill_offset = (spill_offset +
+                                       reg_size * inst->src[i].reg_offset);
+            fs_reg unspill_dst(GRF, virtual_grf_alloc(regs_read));
+
+            inst->src[i].reg = unspill_dst.reg;
+            inst->src[i].reg_offset = 0;
+
+            emit_unspill(inst, unspill_dst, subset_spill_offset, regs_read);
+	 }
+      }
+
+      if (inst->dst.file == GRF &&
+	  inst->dst.reg == spill_reg) {
+         int subset_spill_offset = (spill_offset +
+                                    reg_size * inst->dst.reg_offset);
+         fs_reg spill_src(GRF, virtual_grf_alloc(inst->regs_written));
+
+         inst->dst.reg = spill_src.reg;
+         inst->dst.reg_offset = 0;
+
+         /* If our write is going to affect just part of the
+          * inst->regs_written, then we need to unspill the destination
+          * since we write back out all of the regs_written.
+          */
+	 if (inst->predicate || inst->force_uncompressed ||
+             inst->force_sechalf || inst->dst.subreg_offset) {
+            emit_unspill(inst, spill_src, subset_spill_offset,
+                         inst->regs_written);
+	 }
+
+	 for (int chan = 0; chan < inst->regs_written; chan++) {
+	    fs_inst *spill_inst =
+               new(mem_ctx) fs_inst(SHADER_OPCODE_GEN4_SCRATCH_WRITE,
+                                    reg_null_f, spill_src);
+	    spill_src.reg_offset++;
+	    spill_inst->offset = subset_spill_offset + chan * reg_size;
+	    spill_inst->ir = inst->ir;
+	    spill_inst->annotation = inst->annotation;
+	    spill_inst->mlen = 1 + dispatch_width / 8; /* header, value */
+	    spill_inst->base_mrf = spill_base_mrf;
+	    inst->insert_after(spill_inst);
+	 }
+      }
+   }
+
+   invalidate_live_intervals();
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs_register_coalesce.cpp b/icd/intel/compiler/pipeline/brw_fs_register_coalesce.cpp
new file mode 100644
index 0000000..01b672f
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_register_coalesce.cpp
@@ -0,0 +1,248 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/** @file brw_fs_register_coalesce.cpp
+ *
+ * Implements register coalescing: Checks if the two registers involved in a
+ * raw move don't interfere, in which case they can both be stored in the same
+ * place and the MOV removed.
+ *
+ * To do this, all uses of the source of the MOV in the shader are replaced
+ * with the destination of the MOV. For example:
+ *
+ * add vgrf3:F, vgrf1:F, vgrf2:F
+ * mov vgrf4:F, vgrf3:F
+ * mul vgrf5:F, vgrf5:F, vgrf4:F
+ *
+ * becomes
+ *
+ * add vgrf4:F, vgrf1:F, vgrf2:F
+ * mul vgrf5:F, vgrf5:F, vgrf4:F
+ */
+
+#include "brw_fs.h"
+#include "brw_fs_live_variables.h"
+
+static bool
+is_nop_mov(const fs_inst *inst)
+{
+   if (inst->opcode == BRW_OPCODE_MOV) {
+      return inst->dst.equals(inst->src[0]);
+   }
+
+   return false;
+}
+
+static bool
+is_coalesce_candidate(const fs_inst *inst, const int *virtual_grf_sizes)
+{
+   if (inst->opcode != BRW_OPCODE_MOV ||
+       inst->is_partial_write() ||
+       inst->saturate ||
+       inst->src[0].file != GRF ||
+       inst->src[0].negate ||
+       inst->src[0].abs ||
+       !inst->src[0].is_contiguous() ||
+       inst->dst.file != GRF ||
+       inst->dst.type != inst->src[0].type) {
+      return false;
+   }
+
+   if (virtual_grf_sizes[inst->src[0].reg] >
+       virtual_grf_sizes[inst->dst.reg])
+      return false;
+
+   return true;
+}
+
+static bool
+can_coalesce_vars(brw::fs_live_variables *live_intervals,
+                  const exec_list *instructions, const fs_inst *inst,
+                  int var_to, int var_from)
+{
+   if (!live_intervals->vars_interfere(var_from, var_to))
+      return true;
+
+   /* We know that the live ranges of A (var_from) and B (var_to)
+    * interfere because of the ->vars_interfere() call above. If the end
+    * of B's live range is after the end of A's range, then we know two
+    * things:
+    *  - the start of B's live range must be in A's live range (since we
+    *    already know the two ranges interfere, this is the only remaining
+    *    possibility)
+    *  - the interference isn't of the form we're looking for (where B is
+    *    entirely inside A)
+    */
+   if (live_intervals->end[var_to] > live_intervals->end[var_from])
+      return false;
+
+   int scan_ip = -1;
+
+   foreach_list(n, instructions) {
+      fs_inst *scan_inst = (fs_inst *)n;
+      scan_ip++;
+
+      if (scan_inst->is_control_flow())
+         return false;
+
+      if (scan_ip <= live_intervals->start[var_to])
+         continue;
+
+      if (scan_ip > live_intervals->end[var_to])
+         break;
+
+      if (scan_inst->dst.equals(inst->dst) ||
+          scan_inst->dst.equals(inst->src[0]))
+         return false;
+   }
+
+   return true;
+}
+
+bool
+fs_visitor::register_coalesce()
+{
+   bool progress = false;
+
+   calculate_live_intervals();
+
+   int src_size = 0;
+   int channels_remaining = 0;
+   int reg_from = -1, reg_to = -1;
+   int reg_to_offset[MAX_SAMPLER_MESSAGE_SIZE];
+   fs_inst *mov[MAX_SAMPLER_MESSAGE_SIZE];
+   int var_to[MAX_SAMPLER_MESSAGE_SIZE];
+   int var_from[MAX_SAMPLER_MESSAGE_SIZE];
+
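+   /* Candidate MOVs are gathered per source VGRF: one MOV for each channel
+    * (reg_offset) of the source, all of them targeting the same destination
+    * VGRF.  Only once every channel has been seen do we check interference
+    * and rewrite all uses of the source.
+    */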
+   foreach_list(node, &this->instructions) {
+      fs_inst *inst = (fs_inst *)node;
+
+      if (!is_coalesce_candidate(inst, virtual_grf_sizes))
+         continue;
+
+      if (is_nop_mov(inst)) {
+         inst->opcode = BRW_OPCODE_NOP;
+         progress = true;
+         continue;
+      }
+
+      if (reg_from != inst->src[0].reg) {
+         reg_from = inst->src[0].reg;
+
+         src_size = virtual_grf_sizes[inst->src[0].reg];
+         assert(src_size <= MAX_SAMPLER_MESSAGE_SIZE);
+
+         channels_remaining = src_size;
+         memset(mov, 0, sizeof(mov));
+
+         reg_to = inst->dst.reg;
+      }
+
+      if (reg_to != inst->dst.reg)
+         continue;
+
+      const int offset = inst->src[0].reg_offset;
+      reg_to_offset[offset] = inst->dst.reg_offset;
+      mov[offset] = inst;
+      channels_remaining--;
+
+      if (channels_remaining)
+         continue;
+
+      bool can_coalesce = true;
+      for (int i = 0; i < src_size; i++) {
+         var_to[i] = live_intervals->var_from_vgrf[reg_to] + reg_to_offset[i];
+         var_from[i] = live_intervals->var_from_vgrf[reg_from] + i;
+
+         if (!can_coalesce_vars(live_intervals, &instructions, inst,
+                                var_to[i], var_from[i])) {
+            can_coalesce = false;
+            reg_from = -1;
+            break;
+         }
+      }
+
+      if (!can_coalesce)
+         continue;
+
+      progress = true;
+
+      for (int i = 0; i < src_size; i++) {
+         if (mov[i]) {
+            mov[i]->opcode = BRW_OPCODE_NOP;
+            mov[i]->conditional_mod = BRW_CONDITIONAL_NONE;
+            mov[i]->dst = reg_undef;
+            mov[i]->src[0] = reg_undef;
+            mov[i]->src[1] = reg_undef;
+            mov[i]->src[2] = reg_undef;
+         }
+      }
+
+      foreach_list(node, &this->instructions) {
+         fs_inst *scan_inst = (fs_inst *)node;
+
+         for (int i = 0; i < src_size; i++) {
+            if (mov[i]) {
+               if (scan_inst->dst.file == GRF &&
+                   scan_inst->dst.reg == reg_from &&
+                   scan_inst->dst.reg_offset == i) {
+                  scan_inst->dst.reg = reg_to;
+                  scan_inst->dst.reg_offset = reg_to_offset[i];
+               }
+               for (int j = 0; j < 3; j++) {
+                  if (scan_inst->src[j].file == GRF &&
+                      scan_inst->src[j].reg == reg_from &&
+                      scan_inst->src[j].reg_offset == i) {
+                     scan_inst->src[j].reg = reg_to;
+                     scan_inst->src[j].reg_offset = reg_to_offset[i];
+                  }
+               }
+            }
+         }
+      }
+
+      for (int i = 0; i < src_size; i++) {
+         live_intervals->start[var_to[i]] =
+            MIN2(live_intervals->start[var_to[i]],
+                 live_intervals->start[var_from[i]]);
+         live_intervals->end[var_to[i]] =
+            MAX2(live_intervals->end[var_to[i]],
+                 live_intervals->end[var_from[i]]);
+      }
+      reg_from = -1;
+   }
+
+   if (progress) {
+      foreach_list_safe(node, &this->instructions) {
+         fs_inst *inst = (fs_inst *)node;
+
+         if (inst->opcode == BRW_OPCODE_NOP) {
+            inst->remove();
+         }
+      }
+
+      invalidate_live_intervals();
+   }
+
+   return progress;
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs_saturate_propagation.cpp b/icd/intel/compiler/pipeline/brw_fs_saturate_propagation.cpp
new file mode 100644
index 0000000..35e6774
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_saturate_propagation.cpp
@@ -0,0 +1,108 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_fs.h"
+#include "brw_fs_live_variables.h"
+#include "brw_cfg.h"
+
+/** @file brw_fs_saturate_propagation.cpp
+ *
+ * Propagates the saturate modifier from a MOV back onto the instruction
+ * that produced the moved value, when that value is not read again past the
+ * MOV and the producer can take a saturate modifier.
+ */
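+
+/* For example (register numbers are illustrative):
+ *
+ *    add      vgrf2:F, vgrf0:F, vgrf1:F
+ *    mov.sat  vgrf3:F, vgrf2:F
+ *
+ * becomes
+ *
+ *    add.sat  vgrf2:F, vgrf0:F, vgrf1:F
+ *    mov      vgrf3:F, vgrf2:F
+ */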
+
+static bool
+opt_saturate_propagation_local(fs_visitor *v, bblock_t *block)
+{
+   bool progress = false;
+   int ip = block->start_ip - 1;
+
+   for (fs_inst *inst = (fs_inst *)block->start;
+        inst != block->end->next;
+        inst = (fs_inst *) inst->next) {
+      ip++;
+
+      if (inst->opcode != BRW_OPCODE_MOV ||
+          inst->dst.file != GRF ||
+          inst->src[0].file != GRF ||
+          inst->src[0].abs ||
+          inst->src[0].negate ||
+          !inst->saturate)
+         continue;
+
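+      /* Propagation is only safe when this MOV is the last read of its
+       * source (or the MOV is a self-move): saturating the defining
+       * instruction would otherwise clamp every other use of the value.
+       */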
+      int src_var = v->live_intervals->var_from_reg(&inst->src[0]);
+      int src_end_ip = v->live_intervals->end[src_var];
+      if (src_end_ip > ip && !inst->dst.equals(inst->src[0]))
+         continue;
+
+      int scan_ip = ip;
+      bool interfered = false;
+      for (fs_inst *scan_inst = (fs_inst *) inst->prev;
+           scan_inst != block->start->prev;
+           scan_inst = (fs_inst *) scan_inst->prev) {
+         scan_ip--;
+
+         if (scan_inst->dst.file == GRF &&
+             scan_inst->dst.reg == inst->src[0].reg &&
+             scan_inst->dst.reg_offset == inst->src[0].reg_offset &&
+             !scan_inst->is_partial_write()) {
+            if (scan_inst->can_do_saturate()) {
+               scan_inst->saturate = true;
+               inst->saturate = false;
+               progress = true;
+            }
+            break;
+         }
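+         /* A read of the source between its definition and the MOV means
+          * another instruction consumes the unsaturated value, so the
+          * modifier cannot be propagated.
+          */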
+         for (int i = 0; i < 3; i++) {
+            if (scan_inst->src[i].file == GRF &&
+                scan_inst->src[i].reg == inst->src[0].reg &&
+                scan_inst->src[i].reg_offset == inst->src[0].reg_offset) {
+               interfered = true;
+               break;
+            }
+         }
+
+         if (interfered)
+            break;
+      }
+   }
+
+   return progress;
+}
+
+bool
+fs_visitor::opt_saturate_propagation()
+{
+   bool progress = false;
+
+   calculate_live_intervals();
+
+   cfg_t cfg(&instructions);
+
+   for (int b = 0; b < cfg.num_blocks; b++) {
+      progress = opt_saturate_propagation_local(this, cfg.blocks[b])
+                 || progress;
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs_sel_peephole.cpp b/icd/intel/compiler/pipeline/brw_fs_sel_peephole.cpp
new file mode 100644
index 0000000..db0be19
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_sel_peephole.cpp
@@ -0,0 +1,236 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_fs.h"
+#include "brw_cfg.h"
+
+/** @file brw_fs_sel_peephole.cpp
+ *
+ * This file contains the opt_peephole_sel() optimization pass that replaces
+ * MOV instructions to the same destination in the "then" and "else" bodies of
+ * an if statement with SEL instructions.
+ */
+
+/* Four MOVs seems to be pretty typical, so I picked the next power of two in
+ * the hopes that it would handle almost anything possible in a single
+ * pass.
+ */
+#define MAX_MOVS 8 /**< The maximum number of MOVs to attempt to match. */
+
+/**
+ * Scans forwards from an IF counting consecutive MOV instructions in the
+ * "then" and "else" blocks of the if statement.
+ *
+ * A pointer to the IF instruction is passed as the <if_inst> argument, and a
+ * pointer to the matching ELSE as <else_inst>.  The function stores pointers
+ * to the MOV instructions in the <then_mov> and <else_mov> arrays.
+ *
+ * \return the minimum number of MOVs found in the two branches, or zero if
+ *         either branch does not begin with a MOV.
+ *
+ * E.g.:
+ *                  IF ...
+ *    then_mov[0] = MOV g4, ...
+ *    then_mov[1] = MOV g5, ...
+ *    then_mov[2] = MOV g6, ...
+ *                  ELSE ...
+ *    else_mov[0] = MOV g4, ...
+ *    else_mov[1] = MOV g5, ...
+ *    else_mov[2] = MOV g7, ...
+ *                  ENDIF
+ *    returns 3.
+ */
+static int
+count_movs_from_if(fs_inst *then_mov[MAX_MOVS], fs_inst *else_mov[MAX_MOVS],
+                   fs_inst *if_inst, fs_inst *else_inst)
+{
+   fs_inst *m = if_inst;
+
+   assert(m->opcode == BRW_OPCODE_IF);
+   m = (fs_inst *) m->next;
+
+   int then_movs = 0;
+   while (then_movs < MAX_MOVS && m->opcode == BRW_OPCODE_MOV) {
+      then_mov[then_movs] = m;
+      m = (fs_inst *) m->next;
+      then_movs++;
+   }
+
+   m = (fs_inst *) else_inst->next;
+
+   int else_movs = 0;
+   while (else_movs < MAX_MOVS && m->opcode == BRW_OPCODE_MOV) {
+      else_mov[else_movs] = m;
+      m = (fs_inst *) m->next;
+      else_movs++;
+   }
+
+   return MIN2(then_movs, else_movs);
+}
+
+/**
+ * Try to replace IF/MOV+/ELSE/MOV+/ENDIF with SEL.
+ *
+ * Many GLSL shaders contain the following pattern:
+ *
+ *    x = condition ? foo : bar
+ *
+ * or
+ *
+ *    if (...) a.xyzw = foo.xyzw;
+ *    else     a.xyzw = bar.xyzw;
+ *
+ * The compiler emits an ir_if tree for this, since each subexpression might be
+ * a complex tree that could have side-effects or short-circuit logic.
+ *
+ * However, the common case is to simply select one of two constants or
+ * variable values---which is exactly what SEL is for.  In this case, the
+ * assembly looks like:
+ *
+ *    (+f0) IF
+ *    MOV dst src0
+ *    ...
+ *    ELSE
+ *    MOV dst src1
+ *    ...
+ *    ENDIF
+ *
+ * where each pair of MOVs writes to a common destination and can easily be
+ * translated into
+ *
+ *    (+f0) SEL dst src0 src1
+ *
+ * If src0 is an immediate value, we promote it to a temporary GRF.
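+ * A SEL's first source cannot be an immediate (only the last source of an
+ * instruction may be a constant), so e.g. (register names illustrative):
+ *
+ *    MOV tmp src0_imm
+ *    (+f0) SEL dst tmp src1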
+ */
+bool
+fs_visitor::opt_peephole_sel()
+{
+   bool progress = false;
+
+   cfg_t cfg(&instructions);
+
+   for (int b = 0; b < cfg.num_blocks; b++) {
+      bblock_t *block = cfg.blocks[b];
+
+      /* IF instructions, by definition, can only be found at the ends of
+       * basic blocks.
+       */
+      fs_inst *if_inst = (fs_inst *) block->end;
+      if (if_inst->opcode != BRW_OPCODE_IF)
+         continue;
+
+      if (!block->else_inst)
+         continue;
+
+      fs_inst *else_inst = (fs_inst *) block->else_inst;
+      assert(else_inst->opcode == BRW_OPCODE_ELSE);
+
+      fs_inst *else_mov[MAX_MOVS] = { NULL };
+      fs_inst *then_mov[MAX_MOVS] = { NULL };
+
+      int movs = count_movs_from_if(then_mov, else_mov, if_inst, else_inst);
+
+      if (movs == 0)
+         continue;
+
+      fs_inst *sel_inst[MAX_MOVS] = { NULL };
+      fs_inst *mov_imm_inst[MAX_MOVS] = { NULL };
+
+      /* Generate SEL instructions for pairs of MOVs to a common destination. */
+      for (int i = 0; i < movs; i++) {
+         if (!then_mov[i] || !else_mov[i])
+            break;
+
+         /* Check that the MOVs are the right form. */
+         if (!then_mov[i]->dst.equals(else_mov[i]->dst) ||
+             then_mov[i]->is_partial_write() ||
+             else_mov[i]->is_partial_write()) {
+            movs = i;
+            break;
+         }
+
+         /* Check that source types for mov operations match. */
+         if (then_mov[i]->src[0].type != else_mov[i]->src[0].type) {
+            movs = i;
+            break;
+         }
+
+         if (!then_mov[i]->src[0].equals(else_mov[i]->src[0])) {
+            /* Only the last source register can be a constant, so if the MOV
+             * in the "then" clause uses a constant, we need to put it in a
+             * temporary.
+             */
+            fs_reg src0(then_mov[i]->src[0]);
+            if (src0.file == IMM) {
+               src0 = fs_reg(this, glsl_type::float_type);
+               src0.type = then_mov[i]->src[0].type;
+               mov_imm_inst[i] = MOV(src0, then_mov[i]->src[0]);
+            }
+
+            sel_inst[i] = SEL(then_mov[i]->dst, src0, else_mov[i]->src[0]);
+
+            if (brw->gen == 6 && if_inst->conditional_mod) {
+               /* For Sandybridge, where IF has an embedded comparison */
+               sel_inst[i]->predicate = BRW_PREDICATE_NORMAL;
+            } else {
+               /* Separate CMP and IF instructions */
+               sel_inst[i]->predicate = if_inst->predicate;
+               sel_inst[i]->predicate_inverse = if_inst->predicate_inverse;
+            }
+         } else {
+            sel_inst[i] = MOV(then_mov[i]->dst, then_mov[i]->src[0]);
+         }
+      }
+
+      if (movs == 0)
+         continue;
+
+      /* Emit a CMP if our IF used the embedded comparison */
+      if (brw->gen == 6 && if_inst->conditional_mod) {
+         fs_inst *cmp_inst = CMP(reg_null_d, if_inst->src[0], if_inst->src[1],
+                                 if_inst->conditional_mod);
+         if_inst->insert_before(cmp_inst);
+      }
+
+      for (int i = 0; i < movs; i++) {
+         if (mov_imm_inst[i])
+            if_inst->insert_before(mov_imm_inst[i]);
+         if_inst->insert_before(sel_inst[i]);
+
+         then_mov[i]->remove();
+         else_mov[i]->remove();
+      }
+
+      progress = true;
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs_vector_splitting.cpp b/icd/intel/compiler/pipeline/brw_fs_vector_splitting.cpp
new file mode 100644
index 0000000..5467f38
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_vector_splitting.cpp
@@ -0,0 +1,399 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file brw_fs_vector_splitting.cpp
+ *
+ * If a vector is only ever referenced by its components, then
+ * split those components out to individual variables so they can be
+ * handled normally by other optimization passes.
+ *
+ * This skips vectors in uniforms and varyings, which need to be
+ * accessible as vectors for their access by the GL.  Also, vector
+ * results of non-variable-derefs in assignments aren't handled
+ * because to do so we would have to store the vector result to a
+ * temporary in order to unload each channel, and to do so would just
+ * loop us back to where we started.  For the 965, this is exactly the
+ * behavior we want for the results of texture lookups, but probably not for
+ * most other vector expressions.
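+ *
+ * For example (an illustrative GLSL-level view of the split):
+ *
+ *    vec2 v;            float v_x, v_y;
+ *    v.x = a;   ==>     v_x = a;
+ *    v.y = b;           v_y = b;
+ *
+ * The replacement scalars are named after the vector with an _x/_y/_z/_w
+ * suffix, matching the names built in brw_do_vector_splitting() below.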
+ */
+
+extern "C" {
+//#include "main/core.h" // LunarG : Removed
+#include "brw_context.h"
+}
+#include "glsl/ir.h"
+#include "glsl/ir_visitor.h"
+#include "glsl/ir_rvalue_visitor.h"
+#include "glsl/glsl_types.h"
+
+static bool debug = false;
+
+class variable_entry : public exec_node
+{
+public:
+   variable_entry(ir_variable *var)
+   {
+      this->var = var;
+      this->whole_vector_access = 0;
+      this->mem_ctx = NULL;
+   }
+
+   ir_variable *var; /* The key: the variable's pointer. */
+
+   /** Number of times the variable is referenced, including assignments. */
+   unsigned whole_vector_access;
+
+   ir_variable *components[4];
+
+   /** ralloc_parent(this->var) -- the shader's ralloc context. */
+   void *mem_ctx;
+};
+
+class ir_vector_reference_visitor : public ir_hierarchical_visitor {
+public:
+   ir_vector_reference_visitor(void)
+   {
+      this->mem_ctx = ralloc_context(NULL);
+      this->variable_list.make_empty();
+   }
+
+   ~ir_vector_reference_visitor(void)
+   {
+      ralloc_free(mem_ctx);
+   }
+
+   virtual ir_visitor_status visit(ir_variable *);
+   virtual ir_visitor_status visit(ir_dereference_variable *);
+   virtual ir_visitor_status visit_enter(ir_swizzle *);
+   virtual ir_visitor_status visit_enter(ir_assignment *);
+   virtual ir_visitor_status visit_enter(ir_function_signature *);
+
+   variable_entry *get_variable_entry(ir_variable *var);
+
+   /* List of variable_entry */
+   exec_list variable_list;
+
+   void *mem_ctx;
+};
+
+variable_entry *
+ir_vector_reference_visitor::get_variable_entry(ir_variable *var)
+{
+   assert(var);
+
+   if (!var->type->is_vector())
+      return NULL;
+
+   switch (var->data.mode) {
+   case ir_var_uniform:
+   case ir_var_shader_in:
+   case ir_var_shader_out:
+   case ir_var_system_value:
+   case ir_var_function_in:
+   case ir_var_function_out:
+   case ir_var_function_inout:
+      /* Can't split varyings or uniforms.  Function in/outs won't get split
+       * either.
+       */
+      return NULL;
+   case ir_var_auto:
+   case ir_var_temporary:
+      break;
+   }
+
+   foreach_list(node, &this->variable_list) {
+      variable_entry *entry = (variable_entry *)node;
+      if (entry->var == var)
+	 return entry;
+   }
+
+   variable_entry *entry = new(mem_ctx) variable_entry(var);
+   this->variable_list.push_tail(entry);
+   return entry;
+}
+
+
+ir_visitor_status
+ir_vector_reference_visitor::visit(ir_variable *ir)
+{
+   /* Make sure the splitting pass gets a chance to consider this variable. */
+   (void)this->get_variable_entry(ir);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vector_reference_visitor::visit(ir_dereference_variable *ir)
+{
+   ir_variable *const var = ir->var;
+   variable_entry *entry = this->get_variable_entry(var);
+
+   if (entry)
+      entry->whole_vector_access++;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vector_reference_visitor::visit_enter(ir_swizzle *ir)
+{
+   /* Don't descend into a vector ir_dereference_variable below. */
+   if (ir->val->as_dereference_variable() && ir->type->is_scalar())
+      return visit_continue_with_parent;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vector_reference_visitor::visit_enter(ir_assignment *ir)
+{
+   if (ir->lhs->as_dereference_variable() &&
+       ir->rhs->as_dereference_variable() &&
+       !ir->condition) {
+      /* We'll split copies of a vector to copies of channels, so don't
+       * descend to the ir_dereference_variables.
+       */
+      return visit_continue_with_parent;
+   }
+   if (ir->lhs->as_dereference_variable() &&
+       is_power_of_two(ir->write_mask) &&
+       !ir->condition) {
+      /* If we're writing just a channel, then channel-splitting the LHS is OK.
+       */
+      ir->rhs->accept(this);
+      return visit_continue_with_parent;
+   }
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vector_reference_visitor::visit_enter(ir_function_signature *ir)
+{
+   /* We don't want to descend into the function parameters and
+    * split them, so just accept the body here.
+    */
+   visit_list_elements(this, &ir->body);
+   return visit_continue_with_parent;
+}
+
+class ir_vector_splitting_visitor : public ir_rvalue_visitor {
+public:
+   ir_vector_splitting_visitor(exec_list *vars)
+   {
+      this->variable_list = vars;
+   }
+
+   virtual ir_visitor_status visit_leave(ir_assignment *);
+
+   void handle_rvalue(ir_rvalue **rvalue);
+   variable_entry *get_splitting_entry(ir_variable *var);
+
+   exec_list *variable_list;
+};
+
+variable_entry *
+ir_vector_splitting_visitor::get_splitting_entry(ir_variable *var)
+{
+   assert(var);
+
+   if (!var->type->is_vector())
+      return NULL;
+
+   foreach_list(node, &*this->variable_list) {
+      variable_entry *entry = (variable_entry *)node;
+      if (entry->var == var) {
+	 return entry;
+      }
+   }
+
+   return NULL;
+}
+
+void
+ir_vector_splitting_visitor::handle_rvalue(ir_rvalue **rvalue)
+{
+   if (!*rvalue)
+      return;
+
+   ir_swizzle *swiz = (*rvalue)->as_swizzle();
+   if (!swiz || !swiz->type->is_scalar())
+      return;
+
+   ir_dereference_variable *deref_var = swiz->val->as_dereference_variable();
+   if (!deref_var)
+      return;
+
+   variable_entry *entry = get_splitting_entry(deref_var->var);
+   if (!entry)
+      return;
+
+   ir_variable *var = entry->components[swiz->mask.x];
+   *rvalue = new(entry->mem_ctx) ir_dereference_variable(var);
+}
+
+ir_visitor_status
+ir_vector_splitting_visitor::visit_leave(ir_assignment *ir)
+{
+   ir_dereference_variable *lhs_deref = ir->lhs->as_dereference_variable();
+   ir_dereference_variable *rhs_deref = ir->rhs->as_dereference_variable();
+   variable_entry *lhs = lhs_deref ? get_splitting_entry(lhs_deref->var) : NULL;
+   variable_entry *rhs = rhs_deref ? get_splitting_entry(rhs_deref->var) : NULL;
+
+   if (lhs_deref && rhs_deref && (lhs || rhs) && !ir->condition) {
+      unsigned int rhs_chan = 0;
+
+      /* Straight assignment of vector variables. */
+      for (unsigned int i = 0; i < ir->lhs->type->vector_elements; i++) {
+	 ir_dereference *new_lhs;
+	 ir_rvalue *new_rhs;
+	 void *mem_ctx = lhs ? lhs->mem_ctx : rhs->mem_ctx;
+	 unsigned int writemask;
+
+	 if (!(ir->write_mask & (1 << i)))
+	    continue;
+
+	 if (lhs) {
+	    new_lhs = new(mem_ctx) ir_dereference_variable(lhs->components[i]);
+	    writemask = 1;
+	 } else {
+	    new_lhs = ir->lhs->clone(mem_ctx, NULL);
+	    writemask = 1 << i;
+	 }
+
+	 if (rhs) {
+	    new_rhs =
+	       new(mem_ctx) ir_dereference_variable(rhs->components[rhs_chan]);
+	 } else {
+	    new_rhs = new(mem_ctx) ir_swizzle(ir->rhs->clone(mem_ctx, NULL),
+					      rhs_chan, 0, 0, 0, 1);
+	 }
+
+	 ir->insert_before(new(mem_ctx) ir_assignment(new_lhs,
+						      new_rhs,
+						      NULL, writemask));
+
+	 rhs_chan++;
+      }
+      ir->remove();
+   } else if (lhs) {
+      void *mem_ctx = lhs->mem_ctx;
+      int elem = -1;
+
+      switch (ir->write_mask) {
+      case (1 << 0):
+	 elem = 0;
+	 break;
+      case (1 << 1):
+	 elem = 1;
+	 break;
+      case (1 << 2):
+	 elem = 2;
+	 break;
+      case (1 << 3):
+	 elem = 3;
+	 break;
+      default:
+	 ir->fprint(stderr);
+	 assert(!"not reached: non-channelwise dereference of LHS.");
+      }
+
+      ir->lhs = new(mem_ctx) ir_dereference_variable(lhs->components[elem]);
+      ir->write_mask = (1 << 0);
+
+      handle_rvalue(&ir->rhs);
+   } else {
+      handle_rvalue(&ir->rhs);
+   }
+
+   handle_rvalue(&ir->condition);
+
+   return visit_continue;
+}
+
+bool
+brw_do_vector_splitting(exec_list *instructions)
+{
+   ir_vector_reference_visitor refs;
+
+   visit_list_elements(&refs, instructions);
+
+   /* Trim out variables we can't split. */
+   foreach_list_safe(node, &refs.variable_list) {
+      variable_entry *entry = (variable_entry *)node;
+
+      if (debug) {
+	 fprintf(stderr, "vector %s@%p: whole_access %d\n",
+                 entry->var->name, (void *) entry->var,
+                 entry->whole_vector_access);
+      }
+
+      if (entry->whole_vector_access) {
+	 entry->remove();
+      }
+   }
+
+   if (refs.variable_list.is_empty())
+      return false;
+
+   void *mem_ctx = ralloc_context(NULL);
+
+   /* Replace the decls of the vectors to be split with their split
+    * components.
+    */
+   foreach_list(node, &refs.variable_list) {
+      variable_entry *entry = (variable_entry *)node;
+      const struct glsl_type *type;
+      type = glsl_type::get_instance(entry->var->type->base_type, 1, 1);
+
+      entry->mem_ctx = ralloc_parent(entry->var);
+
+      for (unsigned int i = 0; i < entry->var->type->vector_elements; i++) {
+	 const char *name = ralloc_asprintf(mem_ctx, "%s_%c",
+					    entry->var->name,
+					    "xyzw"[i]);
+
+	 entry->components[i] = new(entry->mem_ctx) ir_variable(type, name,
+								ir_var_temporary);
+	 entry->var->insert_before(entry->components[i]);
+      }
+
+      entry->var->remove();
+   }
+
+   ir_vector_splitting_visitor split(&refs.variable_list);
+   visit_list_elements(&split, instructions);
+
+   ralloc_free(mem_ctx);
+
+   return true;
+}
diff --git a/icd/intel/compiler/pipeline/brw_fs_visitor.cpp b/icd/intel/compiler/pipeline/brw_fs_visitor.cpp
new file mode 100644
index 0000000..dc6c2be
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_fs_visitor.cpp
@@ -0,0 +1,3036 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/** @file brw_fs_visitor.cpp
+ *
+ * This file supports generating the FS LIR from the GLSL IR.  The LIR
+ * makes it easier to do backend-specific optimizations than doing so
+ * in the GLSL IR or in the native code.
+ */
+extern "C" {
+
+#include <sys/types.h>
+
+#include "main/macros.h"
+#include "main/shaderobj.h"
+#include "program/prog_parameter.h"
+#include "program/prog_print.h"
+#include "program/prog_optimize.h"
+#include "program/register_allocate.h"
+#include "program/sampler.h"
+#include "program/hash_table.h"
+#include "brw_context.h"
+#include "brw_eu.h"
+#include "brw_wm.h"
+}
+#include "brw_fs.h"
+#include "main/uniforms.h"
+#include "glsl/glsl_types.h"
+#include "glsl/ir_optimization.h"
+
+void
+fs_visitor::visit(ir_variable *ir)
+{
+   fs_reg *reg = NULL;
+
+   if (variable_storage(ir))
+      return;
+
+   if (ir->data.mode == ir_var_shader_in) {
+      if (!strcmp(ir->name, "gl_FragCoord")) {
+	 reg = emit_fragcoord_interpolation(ir);
+      } else if (!strcmp(ir->name, "gl_FrontFacing")) {
+	 reg = emit_frontfacing_interpolation(ir);
+      } else {
+	 reg = emit_general_interpolation(ir);
+      }
+      assert(reg);
+      hash_table_insert(this->variable_ht, reg, ir);
+      return;
+   } else if (ir->data.mode == ir_var_shader_out) {
+      reg = new(this->mem_ctx) fs_reg(this, ir->type);
+
+      if (ir->data.index > 0) {
+	 assert(ir->data.location == FRAG_RESULT_DATA0);
+	 assert(ir->data.index == 1);
+	 this->dual_src_output = *reg;
+         this->do_dual_src = true;
+      } else if (ir->data.location == FRAG_RESULT_COLOR) {
+	 /* Writing gl_FragColor outputs to all color regions. */
+	 for (unsigned int i = 0; i < MAX2(c->key.nr_color_regions, 1); i++) {
+	    this->outputs[i] = *reg;
+	    this->output_components[i] = 4;
+	 }
+      } else if (ir->data.location == FRAG_RESULT_DEPTH) {
+	 this->frag_depth = *reg;
+      } else if (ir->data.location == FRAG_RESULT_SAMPLE_MASK) {
+         this->sample_mask = *reg;
+      } else {
+	 /* gl_FragData or a user-defined FS output */
+	 assert(ir->data.location >= FRAG_RESULT_DATA0 &&
+		ir->data.location < FRAG_RESULT_DATA0 + BRW_MAX_DRAW_BUFFERS);
+
+	 int vector_elements =
+	    ir->type->is_array() ? ir->type->fields.array->vector_elements
+				 : ir->type->vector_elements;
+
+	 /* General color output. */
+	 for (unsigned int i = 0; i < MAX2(1, ir->type->length); i++) {
+	    int output = ir->data.location - FRAG_RESULT_DATA0 + i;
+	    this->outputs[output] = *reg;
+	    this->outputs[output].reg_offset += vector_elements * i;
+	    this->output_components[output] = vector_elements;
+	 }
+      }
+   } else if (ir->data.mode == ir_var_uniform) {
+      int param_index = uniforms;
+
+      /* Thanks to the lower_ubo_reference pass, we will see only
+       * ir_binop_ubo_load expressions and not ir_dereference_variable for UBO
+       * variables, so no need for them to be in variable_ht.
+       *
+       * Atomic counters take no uniform storage, no need to do
+       * anything here.
+       */
+      if (ir->is_in_uniform_block() || ir->type->contains_atomic())
+         return;
+
+      if (dispatch_width == 16) {
+	 if (!variable_storage(ir)) {
+	    fail("Failed to find uniform '%s' in SIMD16\n", ir->name);
+	 }
+	 return;
+      }
+
+      param_size[param_index] = type_size(ir->type);
+      if (!strncmp(ir->name, "gl_", 3)) {
+	 setup_builtin_uniform_values(ir);
+      } else {
+	 setup_uniform_values(ir);
+      }
+
+      reg = new(this->mem_ctx) fs_reg(UNIFORM, param_index);
+      reg->type = brw_type_for_base_type(ir->type);
+
+   } else if (ir->data.mode == ir_var_system_value) {
+      if (ir->data.location == SYSTEM_VALUE_SAMPLE_POS) {
+	 reg = emit_samplepos_setup(ir);
+      } else if (ir->data.location == SYSTEM_VALUE_SAMPLE_ID) {
+	 reg = emit_sampleid_setup(ir);
+      } else if (ir->data.location == SYSTEM_VALUE_SAMPLE_MASK_IN) {
+         reg = emit_samplemaskin_setup(ir);
+      }
+   }
+
+   if (!reg)
+      reg = new(this->mem_ctx) fs_reg(this, ir->type);
+
+   hash_table_insert(this->variable_ht, reg, ir);
+}
+
+void
+fs_visitor::visit(ir_dereference_variable *ir)
+{
+   fs_reg *reg = variable_storage(ir->var);
+   this->result = *reg;
+}
+
+void
+fs_visitor::visit(ir_dereference_record *ir)
+{
+   const glsl_type *struct_type = ir->record->type;
+
+   ir->record->accept(this);
+
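+   /* Walk the struct's fields, accumulating the register-space size of
+    * every field that precedes the one being dereferenced; the sum is the
+    * field's offset within the record's storage.
+    */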
+   unsigned int offset = 0;
+   for (unsigned int i = 0; i < struct_type->length; i++) {
+      if (strcmp(struct_type->fields.structure[i].name, ir->field) == 0)
+	 break;
+      offset += type_size(struct_type->fields.structure[i].type);
+   }
+   this->result.reg_offset += offset;
+   this->result.type = brw_type_for_base_type(ir->type);
+}
+
+void
+fs_visitor::visit(ir_dereference_array *ir)
+{
+   ir_constant *constant_index;
+   fs_reg src;
+   int element_size = type_size(ir->type);
+
+   constant_index = ir->array_index->as_constant();
+
+   ir->array->accept(this);
+   src = this->result;
+   src.type = brw_type_for_base_type(ir->type);
+
+   if (constant_index) {
+      assert(src.file == UNIFORM || src.file == GRF);
+      src.reg_offset += constant_index->value.i[0] * element_size;
+   } else {
+      /* Variable index array dereference.  We attach the variable index
+       * component to the reg as a pointer to a register containing the
+       * offset.  Currently only uniform arrays are supported in this patch,
+       * and that reladdr pointer is resolved by
+       * move_uniform_array_access_to_pull_constants().  All other array types
+       * are lowered by lower_variable_index_to_cond_assign().
+       */
+      ir->array_index->accept(this);
+
+      fs_reg index_reg;
+      index_reg = fs_reg(this, glsl_type::int_type);
+      emit(BRW_OPCODE_MUL, index_reg, this->result, fs_reg(element_size));
+
+      if (src.reladdr) {
+         emit(BRW_OPCODE_ADD, index_reg, *src.reladdr, index_reg);
+      }
+
+      src.reladdr = ralloc(mem_ctx, fs_reg);
+      memcpy(src.reladdr, &index_reg, sizeof(index_reg));
+   }
+   this->result = src;
+}
+
+void
+fs_visitor::emit_lrp(const fs_reg &dst, const fs_reg &x, const fs_reg &y,
+                     const fs_reg &a)
+{
+   if (brw->gen < 6 ||
+       !x.is_valid_3src() ||
+       !y.is_valid_3src() ||
+       !a.is_valid_3src()) {
+      /* We can't use the LRP instruction.  Emit x*(1-a) + y*a. */
+      fs_reg y_times_a           = fs_reg(this, glsl_type::float_type);
+      fs_reg one_minus_a         = fs_reg(this, glsl_type::float_type);
+      fs_reg x_times_one_minus_a = fs_reg(this, glsl_type::float_type);
+
+      emit(MUL(y_times_a, y, a));
+
+      fs_reg negative_a = a;
+      negative_a.negate = !a.negate;
+      emit(ADD(one_minus_a, negative_a, fs_reg(1.0f)));
+      emit(MUL(x_times_one_minus_a, x, one_minus_a));
+
+      emit(ADD(dst, x_times_one_minus_a, y_times_a));
+   } else {
+      /* The LRP instruction actually does op1 * op0 + op2 * (1 - op0), so
+       * we need to reorder the operands.
+       */
+      emit(LRP(dst, a, y, x));
+   }
+}
+
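+/* Emit a MIN or MAX as a SEL.  On gen6+ the conditional mod on the SEL
+ * picks the smaller or larger operand directly; earlier gens need an
+ * explicit CMP to set the flag register, followed by a predicated SEL.
+ */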
+void
+fs_visitor::emit_minmax(uint32_t conditionalmod, const fs_reg &dst,
+                        const fs_reg &src0, const fs_reg &src1)
+{
+   fs_inst *inst;
+
+   if (brw->gen >= 6) {
+      inst = emit(BRW_OPCODE_SEL, dst, src0, src1);
+      inst->conditional_mod = conditionalmod;
+   } else {
+      emit(CMP(reg_null_d, src0, src1, conditionalmod));
+
+      inst = emit(BRW_OPCODE_SEL, dst, src0, src1);
+      inst->predicate = BRW_PREDICATE_NORMAL;
+   }
+}
+
+/* Instruction selection: Produce a MOV.sat instead of
+ * MIN(MAX(val, 0), 1) when possible.
+ */
+bool
+fs_visitor::try_emit_saturate(ir_expression *ir)
+{
+   ir_rvalue *sat_val = ir->as_rvalue_to_saturate();
+
+   if (!sat_val)
+      return false;
+
+   fs_inst *pre_inst = (fs_inst *) this->instructions.get_tail();
+
+   sat_val->accept(this);
+   fs_reg src = this->result;
+
+   fs_inst *last_inst = (fs_inst *) this->instructions.get_tail();
+
+   /* If the last instruction from our accept() didn't generate our
+    * src, generate a saturated MOV.
+    */
+   fs_inst *modify = get_instruction_generating_reg(pre_inst, last_inst, src);
+   if (!modify || modify->regs_written != 1) {
+      this->result = fs_reg(this, ir->type);
+      fs_inst *inst = emit(MOV(this->result, src));
+      inst->saturate = true;
+   } else {
+      modify->saturate = true;
+      this->result = src;
+   }
+
+   return true;
+}
+
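+/* Try to fuse an ADD of a MUL into a single MAD, rewriting (a * b) + c as
+ * MAD dst, c, a, b (the accumuland comes first).  Constant operands are
+ * rejected because the 3-src MAD needs register sources.
+ */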
+bool
+fs_visitor::try_emit_mad(ir_expression *ir)
+{
+   /* 3-src instructions were introduced in gen6. */
+   if (brw->gen < 6)
+      return false;
+
+   /* MAD can only handle floating-point data. */
+   if (ir->type != glsl_type::float_type)
+      return false;
+
+   ir_rvalue *nonmul = ir->operands[1];
+   ir_expression *mul = ir->operands[0]->as_expression();
+
+   if (!mul || mul->operation != ir_binop_mul) {
+      nonmul = ir->operands[0];
+      mul = ir->operands[1]->as_expression();
+
+      if (!mul || mul->operation != ir_binop_mul)
+         return false;
+   }
+
+   if (nonmul->as_constant() ||
+       mul->operands[0]->as_constant() ||
+       mul->operands[1]->as_constant())
+      return false;
+
+   nonmul->accept(this);
+   fs_reg src0 = this->result;
+
+   mul->operands[0]->accept(this);
+   fs_reg src1 = this->result;
+
+   mul->operands[1]->accept(this);
+   fs_reg src2 = this->result;
+
+   this->result = fs_reg(this, ir->type);
+   emit(BRW_OPCODE_MAD, this->result, src0, src1, src2);
+
+   return true;
+}
+
+void
+fs_visitor::visit(ir_expression *ir)
+{
+   unsigned int operand;
+   fs_reg op[3], temp;
+   fs_inst *inst;
+
+   assert(ir->get_num_operands() <= 3);
+
+   if (try_emit_saturate(ir))
+      return;
+   if (ir->operation == ir_binop_add) {
+      if (try_emit_mad(ir))
+	 return;
+   }
+
+   for (operand = 0; operand < ir->get_num_operands(); operand++) {
+      ir->operands[operand]->accept(this);
+      if (this->result.file == BAD_FILE) {
+	 fail("Failed to get tree for expression operand:\n");
+	 ir->operands[operand]->fprint(stderr);
+         fprintf(stderr, "\n");
+      }
+      assert(this->result.is_valid_3src());
+      op[operand] = this->result;
+
+      /* Matrix expression operands should have been broken down to vector
+       * operations already.
+       */
+      assert(!ir->operands[operand]->type->is_matrix());
+      /* And then those vector operands should have been broken down to scalar.
+       */
+      assert(!ir->operands[operand]->type->is_vector());
+   }
+
+   /* Storage for our result.  If our result goes into an assignment, it will
+    * just get copy-propagated out, so no worries.
+    */
+   this->result = fs_reg(this, ir->type);
+
+   switch (ir->operation) {
+   case ir_unop_logic_not:
+      /* Note that BRW_OPCODE_NOT is not appropriate here, since it takes
+       * the one's complement of the whole register, not just bit 0.
+       */
+      emit(XOR(this->result, op[0], fs_reg(1)));
+      break;
+   case ir_unop_neg:
+      op[0].negate = !op[0].negate;
+      emit(MOV(this->result, op[0]));
+      break;
+   case ir_unop_abs:
+      op[0].abs = true;
+      op[0].negate = false;
+      emit(MOV(this->result, op[0]));
+      break;
+   case ir_unop_sign:
+      if (ir->type->is_float()) {
+         /* AND(val, 0x80000000) gives the sign bit.
+          *
+          * Predicated OR ORs 1.0 (0x3f800000) with the sign bit if val is not
+          * zero.
+          */
+         emit(CMP(reg_null_f, op[0], fs_reg(0.0f), BRW_CONDITIONAL_NZ));
+
+         op[0].type = BRW_REGISTER_TYPE_UD;
+         this->result.type = BRW_REGISTER_TYPE_UD;
+         emit(AND(this->result, op[0], fs_reg(0x80000000u)));
+
+         inst = emit(OR(this->result, this->result, fs_reg(0x3f800000u)));
+         inst->predicate = BRW_PREDICATE_NORMAL;
+
+         this->result.type = BRW_REGISTER_TYPE_F;
+      } else {
+         /*  ASR(val, 31) -> negative val generates 0xffffffff (signed -1).
+          *               -> non-negative val generates 0x00000000.
+          *  Predicated OR sets 1 if val is positive.
+          */
+         emit(CMP(reg_null_d, op[0], fs_reg(0), BRW_CONDITIONAL_G));
+
+         emit(ASR(this->result, op[0], fs_reg(31)));
+
+         inst = emit(OR(this->result, this->result, fs_reg(1)));
+         inst->predicate = BRW_PREDICATE_NORMAL;
+      }
+      break;
+   case ir_unop_rcp:
+      emit_math(SHADER_OPCODE_RCP, this->result, op[0]);
+      break;
+
+   case ir_unop_exp2:
+      emit_math(SHADER_OPCODE_EXP2, this->result, op[0]);
+      break;
+   case ir_unop_log2:
+      emit_math(SHADER_OPCODE_LOG2, this->result, op[0]);
+      break;
+   case ir_unop_exp:
+   case ir_unop_log:
+      assert(!"not reached: should be handled by ir_explog_to_explog2");
+      break;
+   case ir_unop_sin:
+   case ir_unop_sin_reduced:
+      emit_math(SHADER_OPCODE_SIN, this->result, op[0]);
+      break;
+   case ir_unop_cos:
+   case ir_unop_cos_reduced:
+      emit_math(SHADER_OPCODE_COS, this->result, op[0]);
+      break;
+
+   case ir_unop_dFdx:
+      emit(FS_OPCODE_DDX, this->result, op[0]);
+      break;
+   case ir_unop_dFdy:
+      emit(FS_OPCODE_DDY, this->result, op[0]);
+      break;
+
+   case ir_binop_add:
+      emit(ADD(this->result, op[0], op[1]));
+      break;
+   case ir_binop_sub:
+      assert(!"not reached: should be handled by ir_sub_to_add_neg");
+      break;
+
+   case ir_binop_mul:
+      if (brw->gen < 8 && ir->type->is_integer()) {
+	 /* For integer multiplication, the MUL uses the low 16 bits
+	  * of one of the operands (src0 on gen6, src1 on gen7).  The
+	  * MACH accumulates in the contribution of the upper 16 bits
+	  * of that operand.
+          */
+         if (ir->operands[0]->is_uint16_constant()) {
+            if (brw->gen < 7)
+               emit(MUL(this->result, op[0], op[1]));
+            else
+               emit(MUL(this->result, op[1], op[0]));
+         } else if (ir->operands[1]->is_uint16_constant()) {
+            if (brw->gen < 7)
+               emit(MUL(this->result, op[1], op[0]));
+            else
+               emit(MUL(this->result, op[0], op[1]));
+         } else {
+            if (brw->gen >= 7)
+               no16("SIMD16 explicit accumulator operands unsupported\n");
+
+            struct brw_reg acc = retype(brw_acc_reg(), this->result.type);
+
+            emit(MUL(acc, op[0], op[1]));
+            emit(MACH(reg_null_d, op[0], op[1]));
+            emit(MOV(this->result, fs_reg(acc)));
+         }
+      } else {
+	 emit(MUL(this->result, op[0], op[1]));
+      }
+      break;
+   case ir_binop_imul_high: {
+      if (brw->gen >= 7)
+         no16("SIMD16 explicit accumulator operands unsupported\n");
+
+      struct brw_reg acc = retype(brw_acc_reg(), this->result.type);
+
+      emit(MUL(acc, op[0], op[1]));
+      emit(MACH(this->result, op[0], op[1]));
+      break;
+   }
+   case ir_binop_div:
+      /* Floating point should be lowered by DIV_TO_MUL_RCP in the compiler. */
+      assert(ir->type->is_integer());
+      emit_math(SHADER_OPCODE_INT_QUOTIENT, this->result, op[0], op[1]);
+      break;
+   case ir_binop_carry: {
+      if (brw->gen >= 7)
+         no16("SIMD16 explicit accumulator operands unsupported\n");
+
+      struct brw_reg acc = retype(brw_acc_reg(), BRW_REGISTER_TYPE_UD);
+
+      emit(ADDC(reg_null_ud, op[0], op[1]));
+      emit(MOV(this->result, fs_reg(acc)));
+      break;
+   }
+   case ir_binop_borrow: {
+      if (brw->gen >= 7)
+         no16("SIMD16 explicit accumulator operands unsupported\n");
+
+      struct brw_reg acc = retype(brw_acc_reg(), BRW_REGISTER_TYPE_UD);
+
+      emit(SUBB(reg_null_ud, op[0], op[1]));
+      emit(MOV(this->result, fs_reg(acc)));
+      break;
+   }
+   case ir_binop_mod:
+      /* Floating point should be lowered by MOD_TO_FRACT in the compiler. */
+      assert(ir->type->is_integer());
+      emit_math(SHADER_OPCODE_INT_REMAINDER, this->result, op[0], op[1]);
+      break;
+
+   case ir_binop_less:
+   case ir_binop_greater:
+   case ir_binop_lequal:
+   case ir_binop_gequal:
+   case ir_binop_equal:
+   case ir_binop_all_equal:
+   case ir_binop_nequal:
+   case ir_binop_any_nequal:
+      resolve_bool_comparison(ir->operands[0], &op[0]);
+      resolve_bool_comparison(ir->operands[1], &op[1]);
+
+      emit(CMP(this->result, op[0], op[1],
+               brw_conditional_for_comparison(ir->operation)));
+      break;
+
+   case ir_binop_logic_xor:
+      emit(XOR(this->result, op[0], op[1]));
+      break;
+
+   case ir_binop_logic_or:
+      emit(OR(this->result, op[0], op[1]));
+      break;
+
+   case ir_binop_logic_and:
+      emit(AND(this->result, op[0], op[1]));
+      break;
+
+   case ir_binop_dot:
+   case ir_unop_any:
+      assert(!"not reached: should be handled by brw_fs_channel_expressions");
+      break;
+
+   case ir_unop_noise:
+      assert(!"not reached: should be handled by lower_noise");
+      break;
+
+   case ir_quadop_vector:
+      assert(!"not reached: should be handled by lower_quadop_vector");
+      break;
+
+   case ir_binop_vector_extract:
+      assert(!"not reached: should be handled by lower_vec_index_to_cond_assign()");
+      break;
+
+   case ir_triop_vector_insert:
+      assert(!"not reached: should be handled by lower_vector_insert()");
+      break;
+
+   case ir_binop_ldexp:
+      assert(!"not reached: should be handled by ldexp_to_arith()");
+      break;
+
+   case ir_unop_sqrt:
+      emit_math(SHADER_OPCODE_SQRT, this->result, op[0]);
+      break;
+
+   case ir_unop_rsq:
+      emit_math(SHADER_OPCODE_RSQ, this->result, op[0]);
+      break;
+
+   case ir_unop_bitcast_i2f:
+   case ir_unop_bitcast_u2f:
+      op[0].type = BRW_REGISTER_TYPE_F;
+      this->result = op[0];
+      break;
+   case ir_unop_i2u:
+   case ir_unop_bitcast_f2u:
+      op[0].type = BRW_REGISTER_TYPE_UD;
+      this->result = op[0];
+      break;
+   case ir_unop_u2i:
+   case ir_unop_bitcast_f2i:
+      op[0].type = BRW_REGISTER_TYPE_D;
+      this->result = op[0];
+      break;
+   case ir_unop_i2f:
+   case ir_unop_u2f:
+   case ir_unop_f2i:
+   case ir_unop_f2u:
+      emit(MOV(this->result, op[0]));
+      break;
+
+   case ir_unop_b2i:
+      emit(AND(this->result, op[0], fs_reg(1)));
+      break;
+   case ir_unop_b2f:
+      temp = fs_reg(this, glsl_type::int_type);
+      emit(AND(temp, op[0], fs_reg(1)));
+      emit(MOV(this->result, temp));
+      break;
+
+   case ir_unop_f2b:
+      emit(CMP(this->result, op[0], fs_reg(0.0f), BRW_CONDITIONAL_NZ));
+      break;
+   case ir_unop_i2b:
+      emit(CMP(this->result, op[0], fs_reg(0), BRW_CONDITIONAL_NZ));
+      break;
+
+   case ir_unop_trunc:
+      emit(RNDZ(this->result, op[0]));
+      break;
+   case ir_unop_ceil:
+      op[0].negate = !op[0].negate;
+      emit(RNDD(this->result, op[0]));
+      this->result.negate = true;
+      break;
+   case ir_unop_floor:
+      emit(RNDD(this->result, op[0]));
+      break;
+   case ir_unop_fract:
+      emit(FRC(this->result, op[0]));
+      break;
+   case ir_unop_round_even:
+      emit(RNDE(this->result, op[0]));
+      break;
+
+   case ir_binop_min:
+   case ir_binop_max:
+      resolve_ud_negate(&op[0]);
+      resolve_ud_negate(&op[1]);
+      emit_minmax(ir->operation == ir_binop_min ?
+                  BRW_CONDITIONAL_L : BRW_CONDITIONAL_GE,
+                  this->result, op[0], op[1]);
+      break;
+   case ir_unop_pack_snorm_2x16:
+   case ir_unop_pack_snorm_4x8:
+   case ir_unop_pack_unorm_2x16:
+   case ir_unop_pack_unorm_4x8:
+   case ir_unop_unpack_snorm_2x16:
+   case ir_unop_unpack_snorm_4x8:
+   case ir_unop_unpack_unorm_2x16:
+   case ir_unop_unpack_unorm_4x8:
+   case ir_unop_unpack_half_2x16:
+   case ir_unop_pack_half_2x16:
+      assert(!"not reached: should be handled by lower_packing_builtins");
+      break;
+   case ir_unop_unpack_half_2x16_split_x:
+      emit(FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X, this->result, op[0]);
+      break;
+   case ir_unop_unpack_half_2x16_split_y:
+      emit(FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y, this->result, op[0]);
+      break;
+   case ir_binop_pow:
+      emit_math(SHADER_OPCODE_POW, this->result, op[0], op[1]);
+      break;
+
+   case ir_unop_bitfield_reverse:
+      emit(BFREV(this->result, op[0]));
+      break;
+   case ir_unop_bit_count:
+      emit(CBIT(this->result, op[0]));
+      break;
+   case ir_unop_find_msb:
+      temp = fs_reg(this, glsl_type::uint_type);
+      emit(FBH(temp, op[0]));
+
+      /* FBH counts from the MSB side, while GLSL's findMSB() wants the count
+       * from the LSB side. If FBH didn't return an error (0xFFFFFFFF), then
+       * subtract the result from 31 to convert the MSB count into an LSB count.
+       */
+
+      /* FBH only supports UD type for dst, so use a MOV to convert UD to D. */
+      emit(MOV(this->result, temp));
+      emit(CMP(reg_null_d, this->result, fs_reg(-1), BRW_CONDITIONAL_NZ));
+
+      temp.negate = true;
+      inst = emit(ADD(this->result, temp, fs_reg(31)));
+      inst->predicate = BRW_PREDICATE_NORMAL;
+      break;
+   case ir_unop_find_lsb:
+      emit(FBL(this->result, op[0]));
+      break;
+   case ir_triop_bitfield_extract:
+      /* Note that the instruction's argument order is reversed from GLSL
+       * and the IR.
+       */
+      emit(BFE(this->result, op[2], op[1], op[0]));
+      break;
+   case ir_binop_bfm:
+      emit(BFI1(this->result, op[0], op[1]));
+      break;
+   case ir_triop_bfi:
+      emit(BFI2(this->result, op[0], op[1], op[2]));
+      break;
+   case ir_quadop_bitfield_insert:
+      assert(!"not reached: should be handled by "
+              "lower_instructions::bitfield_insert_to_bfm_bfi");
+      break;
+
+   case ir_unop_bit_not:
+      emit(NOT(this->result, op[0]));
+      break;
+   case ir_binop_bit_and:
+      emit(AND(this->result, op[0], op[1]));
+      break;
+   case ir_binop_bit_xor:
+      emit(XOR(this->result, op[0], op[1]));
+      break;
+   case ir_binop_bit_or:
+      emit(OR(this->result, op[0], op[1]));
+      break;
+
+   case ir_binop_lshift:
+      emit(SHL(this->result, op[0], op[1]));
+      break;
+
+   case ir_binop_rshift:
+      if (ir->type->base_type == GLSL_TYPE_INT)
+	 emit(ASR(this->result, op[0], op[1]));
+      else
+	 emit(SHR(this->result, op[0], op[1]));
+      break;
+   case ir_binop_pack_half_2x16_split:
+      emit(FS_OPCODE_PACK_HALF_2x16_SPLIT, this->result, op[0], op[1]);
+      break;
+   case ir_binop_ubo_load: {
+      /* This IR node takes a constant uniform block and a constant or
+       * variable byte offset within the block and loads a vector from that.
+       */
+      ir_constant *uniform_block = ir->operands[0]->as_constant();
+      ir_constant *const_offset = ir->operands[1]->as_constant();
+      fs_reg surf_index = fs_reg(c->prog_data.base.binding_table.ubo_start +
+                                 uniform_block->value.u[0]);
+      if (const_offset) {
+         fs_reg packed_consts = fs_reg(this, glsl_type::float_type);
+         packed_consts.type = result.type;
+
+         fs_reg const_offset_reg = fs_reg(const_offset->value.u[0] & ~15);
+         emit(new(mem_ctx) fs_inst(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
+                                   packed_consts, surf_index, const_offset_reg));
+
+         for (int i = 0; i < ir->type->vector_elements; i++) {
+            packed_consts.set_smear(const_offset->value.u[0] % 16 / 4 + i);
+
+            /* The std140 packing rules don't allow vectors to cross 16-byte
+             * boundaries, and a reg is 32 bytes.
+             */
+            assert(packed_consts.subreg_offset < 32);
+
+            /* UBO bools are any nonzero value.  We consider bools to be
+             * values with the low bit set to 1.  Convert them using CMP.
+             */
+            if (ir->type->base_type == GLSL_TYPE_BOOL) {
+               emit(CMP(result, packed_consts, fs_reg(0u), BRW_CONDITIONAL_NZ));
+            } else {
+               emit(MOV(result, packed_consts));
+            }
+
+            result.reg_offset++;
+         }
+      } else {
+         /* Turn the byte offset into a dword offset. */
+         fs_reg base_offset = fs_reg(this, glsl_type::int_type);
+         emit(SHR(base_offset, op[1], fs_reg(2)));
+
+         for (int i = 0; i < ir->type->vector_elements; i++) {
+            emit(VARYING_PULL_CONSTANT_LOAD(result, surf_index,
+                                            base_offset, i));
+
+            if (ir->type->base_type == GLSL_TYPE_BOOL)
+               emit(CMP(result, result, fs_reg(0), BRW_CONDITIONAL_NZ));
+
+            result.reg_offset++;
+         }
+      }
+
+      result.reg_offset = 0;
+      break;
+   }
+
+   case ir_triop_fma:
+      /* Note that the instruction's argument order is reversed from GLSL
+       * and the IR.
+       */
+      emit(MAD(this->result, op[2], op[1], op[0]));
+      break;
+
+   case ir_triop_lrp:
+      emit_lrp(this->result, op[0], op[1], op[2]);
+      break;
+
+   case ir_triop_csel:
+      emit(CMP(reg_null_d, op[0], fs_reg(0), BRW_CONDITIONAL_NZ));
+      inst = emit(BRW_OPCODE_SEL, this->result, op[1], op[2]);
+      inst->predicate = BRW_PREDICATE_NORMAL;
+      break;
+   }
+}
+
+void
+fs_visitor::emit_assignment_writes(fs_reg &l, fs_reg &r,
+				   const glsl_type *type, bool predicated)
+{
+   switch (type->base_type) {
+   case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_UINT:
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_BOOL:
+      for (unsigned int i = 0; i < type->components(); i++) {
+	 l.type = brw_type_for_base_type(type);
+	 r.type = brw_type_for_base_type(type);
+
+	 if (predicated || !l.equals(r)) {
+	    fs_inst *inst = emit(MOV(l, r));
+	    inst->predicate = predicated ? BRW_PREDICATE_NORMAL : BRW_PREDICATE_NONE;
+	 }
+
+	 l.reg_offset++;
+	 r.reg_offset++;
+      }
+      break;
+   case GLSL_TYPE_ARRAY:
+      for (unsigned int i = 0; i < type->length; i++) {
+	 emit_assignment_writes(l, r, type->fields.array, predicated);
+      }
+      break;
+
+   case GLSL_TYPE_STRUCT:
+      for (unsigned int i = 0; i < type->length; i++) {
+	 emit_assignment_writes(l, r, type->fields.structure[i].type,
+				predicated);
+      }
+      break;
+
+   case GLSL_TYPE_SAMPLER:
+   case GLSL_TYPE_IMAGE:
+   case GLSL_TYPE_ATOMIC_UINT:
+      break;
+
+   case GLSL_TYPE_VOID:
+   case GLSL_TYPE_ERROR:
+   case GLSL_TYPE_INTERFACE:
+      assert(!"not reached");
+      break;
+   }
+}
+
+/* If the RHS processing resulted in an instruction generating a
+ * temporary value, and it would be easy to rewrite the instruction to
+ * generate its result right into the LHS instead, do so.  This ends
+ * up reliably removing instructions where it can be tricky to do so
+ * later without real UD chain information.
+ */
+bool
+fs_visitor::try_rewrite_rhs_to_dst(ir_assignment *ir,
+                                   fs_reg dst,
+                                   fs_reg src,
+                                   fs_inst *pre_rhs_inst,
+                                   fs_inst *last_rhs_inst)
+{
+   /* Only attempt if we're doing a direct assignment. */
+   if (ir->condition ||
+       !(ir->lhs->type->is_scalar() ||
+        (ir->lhs->type->is_vector() &&
+         ir->write_mask == (1 << ir->lhs->type->vector_elements) - 1)))
+      return false;
+
+   /* Make sure the last instruction generated our source reg. */
+   fs_inst *modify = get_instruction_generating_reg(pre_rhs_inst,
+						    last_rhs_inst,
+						    src);
+   if (!modify)
+      return false;
+
+   /* If last_rhs_inst wrote a different number of components than our LHS,
+    * we can't safely rewrite it.
+    */
+   if (virtual_grf_sizes[dst.reg] != modify->regs_written)
+      return false;
+
+   /* Success!  Rewrite the instruction. */
+   modify->dst = dst;
+
+   return true;
+}
+
+void
+fs_visitor::visit(ir_assignment *ir)
+{
+   fs_reg l, r;
+   fs_inst *inst;
+
+   /* FINISHME: arrays on the lhs */
+   ir->lhs->accept(this);
+   l = this->result;
+
+   fs_inst *pre_rhs_inst = (fs_inst *) this->instructions.get_tail();
+
+   ir->rhs->accept(this);
+   r = this->result;
+
+   fs_inst *last_rhs_inst = (fs_inst *) this->instructions.get_tail();
+
+   assert(l.file != BAD_FILE);
+   assert(r.file != BAD_FILE);
+
+   if (try_rewrite_rhs_to_dst(ir, l, r, pre_rhs_inst, last_rhs_inst))
+      return;
+
+   if (ir->condition) {
+      emit_bool_to_cond_code(ir->condition);
+   }
+
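+   /* Per-channel MOVs: the LHS steps through every vector channel, while
+    * the RHS only advances past channels that are actually written.
+    */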
+   if (ir->lhs->type->is_scalar() ||
+       ir->lhs->type->is_vector()) {
+      for (int i = 0; i < ir->lhs->type->vector_elements; i++) {
+	 if (ir->write_mask & (1 << i)) {
+	    inst = emit(MOV(l, r));
+	    if (ir->condition)
+	       inst->predicate = BRW_PREDICATE_NORMAL;
+	    r.reg_offset++;
+	 }
+	 l.reg_offset++;
+      }
+   } else {
+      emit_assignment_writes(l, r, ir->lhs->type, ir->condition != NULL);
+   }
+}
+
+fs_inst *
+fs_visitor::emit_texture_gen4(ir_texture *ir, fs_reg dst, fs_reg coordinate,
+			      fs_reg shadow_c, fs_reg lod, fs_reg dPdy)
+{
+   int mlen;
+   int base_mrf = 1;
+   bool simd16 = false;
+   fs_reg orig_dst;
+
+   /* g0 header. */
+   mlen = 1;
+
+   if (ir->shadow_comparitor) {
+      for (int i = 0; i < ir->coordinate->type->vector_elements; i++) {
+	 emit(MOV(fs_reg(MRF, base_mrf + mlen + i), coordinate));
+	 coordinate.reg_offset++;
+      }
+
+      /* gen4's SIMD8 sampler always has the slots for u,v,r present.
+       * The unused slots must be zeroed.
+       */
+      for (int i = ir->coordinate->type->vector_elements; i < 3; i++) {
+         emit(MOV(fs_reg(MRF, base_mrf + mlen + i), fs_reg(0.0f)));
+      }
+      mlen += 3;
+
+      if (ir->op == ir_tex) {
+	 /* There's no plain shadow compare message, so we use shadow
+	  * compare with a bias of 0.0.
+	  */
+	 emit(MOV(fs_reg(MRF, base_mrf + mlen), fs_reg(0.0f)));
+	 mlen++;
+      } else if (ir->op == ir_txb || ir->op == ir_txl) {
+	 emit(MOV(fs_reg(MRF, base_mrf + mlen), lod));
+	 mlen++;
+      } else {
+         assert(!"Should not get here.");
+      }
+
+      emit(MOV(fs_reg(MRF, base_mrf + mlen), shadow_c));
+      mlen++;
+   } else if (ir->op == ir_tex) {
+      for (int i = 0; i < ir->coordinate->type->vector_elements; i++) {
+	 emit(MOV(fs_reg(MRF, base_mrf + mlen + i), coordinate));
+	 coordinate.reg_offset++;
+      }
+      /* zero the others. */
+      for (int i = ir->coordinate->type->vector_elements; i < 3; i++) {
+         emit(MOV(fs_reg(MRF, base_mrf + mlen + i), fs_reg(0.0f)));
+      }
+      /* gen4's SIMD8 sampler always has the slots for u,v,r present. */
+      mlen += 3;
+   } else if (ir->op == ir_txd) {
+      fs_reg &dPdx = lod;
+
+      for (int i = 0; i < ir->coordinate->type->vector_elements; i++) {
+	 emit(MOV(fs_reg(MRF, base_mrf + mlen + i), coordinate));
+	 coordinate.reg_offset++;
+      }
+      /* the slots for u and v are always present, but r is optional */
+      mlen += MAX2(ir->coordinate->type->vector_elements, 2);
+
+      /*  P   = u, v, r
+       * dPdx = dudx, dvdx, drdx
+       * dPdy = dudy, dvdy, drdy
+       *
+       * 1-arg: Does not exist.
+       *
+       * 2-arg: dudx   dvdx   dudy   dvdy
+       *        dPdx.x dPdx.y dPdy.x dPdy.y
+       *        m4     m5     m6     m7
+       *
+       * 3-arg: dudx   dvdx   drdx   dudy   dvdy   drdy
+       *        dPdx.x dPdx.y dPdx.z dPdy.x dPdy.y dPdy.z
+       *        m5     m6     m7     m8     m9     m10
+       */
+      for (int i = 0; i < ir->lod_info.grad.dPdx->type->vector_elements; i++) {
+	 emit(MOV(fs_reg(MRF, base_mrf + mlen), dPdx));
+	 dPdx.reg_offset++;
+      }
+      mlen += MAX2(ir->lod_info.grad.dPdx->type->vector_elements, 2);
+
+      for (int i = 0; i < ir->lod_info.grad.dPdy->type->vector_elements; i++) {
+	 emit(MOV(fs_reg(MRF, base_mrf + mlen), dPdy));
+	 dPdy.reg_offset++;
+      }
+      mlen += MAX2(ir->lod_info.grad.dPdy->type->vector_elements, 2);
+   } else if (ir->op == ir_txs) {
+      /* There's no SIMD8 resinfo message on Gen4.  Use SIMD16 instead. */
+      simd16 = true;
+      emit(MOV(fs_reg(MRF, base_mrf + mlen, BRW_REGISTER_TYPE_UD), lod));
+      mlen += 2;
+   } else {
+      /* Oh joy.  gen4 doesn't have SIMD8 non-shadow-compare bias/lod
+       * instructions.  We'll need to do SIMD16 here.
+       */
+      simd16 = true;
+      assert(ir->op == ir_txb || ir->op == ir_txl || ir->op == ir_txf);
+
+      for (int i = 0; i < ir->coordinate->type->vector_elements; i++) {
+	 emit(MOV(fs_reg(MRF, base_mrf + mlen + i * 2, coordinate.type),
+                  coordinate));
+	 coordinate.reg_offset++;
+      }
+
+      /* Initialize the rest of u/v/r with 0.0.  Empirically, this seems to
+       * be necessary for TXF (ld), but seems wise to do for all messages.
+       */
+      for (int i = ir->coordinate->type->vector_elements; i < 3; i++) {
+	 emit(MOV(fs_reg(MRF, base_mrf + mlen + i * 2), fs_reg(0.0f)));
+      }
+
+      /* lod/bias appears after u/v/r. */
+      mlen += 6;
+
+      emit(MOV(fs_reg(MRF, base_mrf + mlen, lod.type), lod));
+      mlen++;
+
+      /* The unused upper half. */
+      mlen++;
+   }
+
+   if (simd16) {
+      /* Now, since we're doing simd16, the return is 2 interleaved
+       * vec4s where the odd-indexed ones are junk. We'll need to move
+       * this weirdness around to the expected layout.
+       */
+      orig_dst = dst;
+      dst = fs_reg(GRF, virtual_grf_alloc(8),
+                   (brw->is_g4x ?
+                    brw_type_for_base_type(ir->type) :
+                    BRW_REGISTER_TYPE_F));
+   }
+
+   fs_inst *inst = NULL;
+   switch (ir->op) {
+   case ir_tex:
+      inst = emit(SHADER_OPCODE_TEX, dst);
+      break;
+   case ir_txb:
+      inst = emit(FS_OPCODE_TXB, dst);
+      break;
+   case ir_txl:
+      inst = emit(SHADER_OPCODE_TXL, dst);
+      break;
+   case ir_txd:
+      inst = emit(SHADER_OPCODE_TXD, dst);
+      break;
+   case ir_txs:
+      inst = emit(SHADER_OPCODE_TXS, dst);
+      break;
+   case ir_txf:
+      inst = emit(SHADER_OPCODE_TXF, dst);
+      break;
+   default:
+      fail("unrecognized texture opcode");
+   }
+   inst->base_mrf = base_mrf;
+   inst->mlen = mlen;
+   inst->header_present = true;
+   inst->regs_written = simd16 ? 8 : 4;
+
+   if (simd16) {
+      for (int i = 0; i < 4; i++) {
+	 emit(MOV(orig_dst, dst));
+	 orig_dst.reg_offset++;
+	 dst.reg_offset += 2;
+      }
+   }
+
+   return inst;
+}
+
+/* gen5's sampler has slots for u, v, r, array index, then optional
+ * parameters like the shadow comparator or LOD bias.  If the optional
+ * parameters aren't present, those trailing slots don't need to be
+ * included in the message.
+ *
+ * Regardless, we don't fill in the unnecessary slots, which may look
+ * surprising in the disassembly.
+ */
+fs_inst *
+fs_visitor::emit_texture_gen5(ir_texture *ir, fs_reg dst, fs_reg coordinate,
+                              fs_reg shadow_c, fs_reg lod, fs_reg lod2,
+                              fs_reg sample_index)
+{
+   int mlen = 0;
+   int base_mrf = 2;
+   int reg_width = dispatch_width / 8;
+   bool header_present = false;
+   const int vector_elements =
+      ir->coordinate ? ir->coordinate->type->vector_elements : 0;
+
+   if (ir->offset) {
+      /* The offsets set up by the ir_texture visitor are in the
+       * m1 header, so we can't go headerless.
+       */
+      header_present = true;
+      mlen++;
+      base_mrf--;
+   }
+
+   for (int i = 0; i < vector_elements; i++) {
+      emit(MOV(fs_reg(MRF, base_mrf + mlen + i * reg_width, coordinate.type),
+               coordinate));
+      coordinate.reg_offset++;
+   }
+   mlen += vector_elements * reg_width;
+
+   if (ir->shadow_comparitor) {
+      mlen = MAX2(mlen, header_present + 4 * reg_width);
+
+      emit(MOV(fs_reg(MRF, base_mrf + mlen), shadow_c));
+      mlen += reg_width;
+   }
+
+   fs_inst *inst = NULL;
+   switch (ir->op) {
+   case ir_tex:
+      inst = emit(SHADER_OPCODE_TEX, dst);
+      break;
+   case ir_txb:
+      mlen = MAX2(mlen, header_present + 4 * reg_width);
+      emit(MOV(fs_reg(MRF, base_mrf + mlen), lod));
+      mlen += reg_width;
+
+      inst = emit(FS_OPCODE_TXB, dst);
+      break;
+   case ir_txl:
+      mlen = MAX2(mlen, header_present + 4 * reg_width);
+      emit(MOV(fs_reg(MRF, base_mrf + mlen), lod));
+      mlen += reg_width;
+
+      inst = emit(SHADER_OPCODE_TXL, dst);
+      break;
+   case ir_txd: {
+      mlen = MAX2(mlen, header_present + 4 * reg_width); /* skip over 'ai' */
+
+      /**
+       *  P   =  u,    v,    r
+       * dPdx = dudx, dvdx, drdx
+       * dPdy = dudy, dvdy, drdy
+       *
+       * Load up these values:
+       * - dudx   dudy   dvdx   dvdy   drdx   drdy
+       * - dPdx.x dPdy.x dPdx.y dPdy.y dPdx.z dPdy.z
+       */
+      for (int i = 0; i < ir->lod_info.grad.dPdx->type->vector_elements; i++) {
+	 emit(MOV(fs_reg(MRF, base_mrf + mlen), lod));
+	 lod.reg_offset++;
+	 mlen += reg_width;
+
+	 emit(MOV(fs_reg(MRF, base_mrf + mlen), lod2));
+	 lod2.reg_offset++;
+	 mlen += reg_width;
+      }
+
+      inst = emit(SHADER_OPCODE_TXD, dst);
+      break;
+   }
+   case ir_txs:
+      emit(MOV(fs_reg(MRF, base_mrf + mlen, BRW_REGISTER_TYPE_UD), lod));
+      mlen += reg_width;
+      inst = emit(SHADER_OPCODE_TXS, dst);
+      break;
+   case ir_query_levels:
+      emit(MOV(fs_reg(MRF, base_mrf + mlen, BRW_REGISTER_TYPE_UD), fs_reg(0u)));
+      mlen += reg_width;
+      inst = emit(SHADER_OPCODE_TXS, dst);
+      break;
+   case ir_txf:
+      mlen = header_present + 4 * reg_width;
+      emit(MOV(fs_reg(MRF, base_mrf + mlen - reg_width, BRW_REGISTER_TYPE_UD), lod));
+      inst = emit(SHADER_OPCODE_TXF, dst);
+      break;
+   case ir_txf_ms:
+      mlen = header_present + 4 * reg_width;
+
+      /* lod */
+      emit(MOV(fs_reg(MRF, base_mrf + mlen - reg_width, BRW_REGISTER_TYPE_UD), fs_reg(0)));
+      /* sample index */
+      emit(MOV(fs_reg(MRF, base_mrf + mlen, BRW_REGISTER_TYPE_UD), sample_index));
+      mlen += reg_width;
+      inst = emit(SHADER_OPCODE_TXF_CMS, dst);
+      break;
+   case ir_lod:
+      inst = emit(SHADER_OPCODE_LOD, dst);
+      break;
+   case ir_tg4:
+      inst = emit(SHADER_OPCODE_TG4, dst);
+      break;
+   default:
+      fail("unrecognized texture opcode");
+      break;
+   }
+   inst->base_mrf = base_mrf;
+   inst->mlen = mlen;
+   inst->header_present = header_present;
+   inst->regs_written = 4;
+
+   if (mlen > MAX_SAMPLER_MESSAGE_SIZE) {
+      fail("Message length >" STRINGIFY(MAX_SAMPLER_MESSAGE_SIZE)
+           " disallowed by hardware\n");
+   }
+
+   return inst;
+}
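+
+/* Worked example of the gen5 layout: a SIMD8 texture2D() call with bias
+ * (ir_txb) and no offset has base_mrf == 2 and packs
+ *
+ *    m2: u    m3: v    m4..m5: skipped r/ai slots    m6: bias
+ *
+ * for a final mlen of 5.  The MAX2(mlen, header_present + 4 * reg_width)
+ * bumps are what skip the empty slots so the optional parameters land in
+ * their fixed positions.
+ */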
+
+fs_inst *
+fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate,
+                              fs_reg shadow_c, fs_reg lod, fs_reg lod2,
+                              fs_reg sample_index, fs_reg mcs, int sampler)
+{
+   int reg_width = dispatch_width / 8;
+   bool header_present = false;
+
+   fs_reg payload = fs_reg(this, glsl_type::float_type);
+   fs_reg next = payload;
+
+   if (ir->op == ir_tg4 || (ir->offset && ir->op != ir_txf) || sampler >= 16) {
+      /* For general texture offsets (no txf workaround), we need a header to
+       * put them in.  Note that for SIMD16 we're making space for two actual
+       * hardware registers here, so the emit will have to fix up for this.
+       *
+       * ir_tg4 needs to place its channel select in the header, for
+       * interaction with ARB_texture_swizzle.
+       *
+       * The sampler index is only 4 bits, so for larger sampler numbers we
+       * need to offset the Sampler State Pointer in the header.
+       */
+      header_present = true;
+      next.reg_offset++;
+   }
+
+   if (ir->shadow_comparitor) {
+      emit(MOV(next, shadow_c));
+      next.reg_offset++;
+   }
+
+   bool has_nonconstant_offset = ir->offset && !ir->offset->as_constant();
+   bool coordinate_done = false;
+
+   /* Set up the LOD info */
+   switch (ir->op) {
+   case ir_tex:
+   case ir_lod:
+      break;
+   case ir_txb:
+      emit(MOV(next, lod));
+      next.reg_offset++;
+      break;
+   case ir_txl:
+      emit(MOV(next, lod));
+      next.reg_offset++;
+      break;
+   case ir_txd: {
+      no16("Gen7 does not support sample_d/sample_d_c in SIMD16 mode.");
+
+      /* Load dPdx and the coordinate together:
+       * [hdr], [ref], x, dPdx.x, dPdy.x, y, dPdx.y, dPdy.y, z, dPdx.z, dPdy.z
+       */
+      for (int i = 0; i < ir->coordinate->type->vector_elements; i++) {
+	 emit(MOV(next, coordinate));
+	 coordinate.reg_offset++;
+	 next.reg_offset++;
+
+         /* For cube map array, the coordinate is (u,v,r,ai) but there are
+          * only derivatives for (u, v, r).
+          */
+         if (i < ir->lod_info.grad.dPdx->type->vector_elements) {
+            emit(MOV(next, lod));
+            lod.reg_offset++;
+            next.reg_offset++;
+
+            emit(MOV(next, lod2));
+            lod2.reg_offset++;
+            next.reg_offset++;
+         }
+      }
+
+      coordinate_done = true;
+      break;
+   }
+   case ir_txs:
+      emit(MOV(retype(next, BRW_REGISTER_TYPE_UD), lod));
+      next.reg_offset++;
+      break;
+   case ir_query_levels:
+      emit(MOV(retype(next, BRW_REGISTER_TYPE_UD), fs_reg(0u)));
+      next.reg_offset++;
+      break;
+   case ir_txf:
+      /* Unfortunately, the parameters for LD are intermixed: u, lod, v, r. */
+      emit(MOV(retype(next, BRW_REGISTER_TYPE_D), coordinate));
+      coordinate.reg_offset++;
+      next.reg_offset++;
+
+      emit(MOV(retype(next, BRW_REGISTER_TYPE_D), lod));
+      next.reg_offset++;
+
+      for (int i = 1; i < ir->coordinate->type->vector_elements; i++) {
+	 emit(MOV(retype(next, BRW_REGISTER_TYPE_D), coordinate));
+	 coordinate.reg_offset++;
+	 next.reg_offset++;
+      }
+
+      coordinate_done = true;
+      break;
+   case ir_txf_ms:
+      emit(MOV(retype(next, BRW_REGISTER_TYPE_UD), sample_index));
+      next.reg_offset++;
+
+      /* data from the multisample control surface */
+      emit(MOV(retype(next, BRW_REGISTER_TYPE_UD), mcs));
+      next.reg_offset++;
+
+      /* there is no offsetting for this message; just copy in the integer
+       * texture coordinates
+       */
+      for (int i = 0; i < ir->coordinate->type->vector_elements; i++) {
+         emit(MOV(retype(next, BRW_REGISTER_TYPE_D), coordinate));
+         coordinate.reg_offset++;
+         next.reg_offset++;
+      }
+
+      coordinate_done = true;
+      break;
+   case ir_tg4:
+      if (has_nonconstant_offset) {
+         if (ir->shadow_comparitor)
+            no16("Gen7 does not support gather4_po_c in SIMD16 mode.");
+
+         /* More crazy intermixing */
+         ir->offset->accept(this);
+         fs_reg offset_value = this->result;
+
+         for (int i = 0; i < 2; i++) { /* u, v */
+            emit(MOV(next, coordinate));
+            coordinate.reg_offset++;
+            next.reg_offset++;
+         }
+
+         for (int i = 0; i < 2; i++) { /* offu, offv */
+            emit(MOV(retype(next, BRW_REGISTER_TYPE_D), offset_value));
+            offset_value.reg_offset++;
+            next.reg_offset++;
+         }
+
+         if (ir->coordinate->type->vector_elements == 3) { /* r if present */
+            emit(MOV(next, coordinate));
+            coordinate.reg_offset++;
+            next.reg_offset++;
+         }
+
+         coordinate_done = true;
+      }
+      break;
+   }
+
+   /* Set up the coordinate (except for cases where it was done above) */
+   if (ir->coordinate && !coordinate_done) {
+      for (int i = 0; i < ir->coordinate->type->vector_elements; i++) {
+         emit(MOV(next, coordinate));
+         coordinate.reg_offset++;
+         next.reg_offset++;
+      }
+   }
+
+   /* Generate the SEND */
+   fs_inst *inst = NULL;
+   switch (ir->op) {
+   case ir_tex: inst = emit(SHADER_OPCODE_TEX, dst, payload); break;
+   case ir_txb: inst = emit(FS_OPCODE_TXB, dst, payload); break;
+   case ir_txl: inst = emit(SHADER_OPCODE_TXL, dst, payload); break;
+   case ir_txd: inst = emit(SHADER_OPCODE_TXD, dst, payload); break;
+   case ir_txf: inst = emit(SHADER_OPCODE_TXF, dst, payload); break;
+   case ir_txf_ms: inst = emit(SHADER_OPCODE_TXF_CMS, dst, payload); break;
+   case ir_txs: inst = emit(SHADER_OPCODE_TXS, dst, payload); break;
+   case ir_query_levels: inst = emit(SHADER_OPCODE_TXS, dst, payload); break;
+   case ir_lod: inst = emit(SHADER_OPCODE_LOD, dst, payload); break;
+   case ir_tg4:
+      if (has_nonconstant_offset)
+         inst = emit(SHADER_OPCODE_TG4_OFFSET, dst, payload);
+      else
+         inst = emit(SHADER_OPCODE_TG4, dst, payload);
+      break;
+   }
+   inst->base_mrf = -1;
+   if (reg_width == 2)
+      inst->mlen = next.reg_offset * reg_width - header_present;
+   else
+      inst->mlen = next.reg_offset * reg_width;
+   inst->header_present = header_present;
+   inst->regs_written = 4;
+
+   virtual_grf_sizes[payload.reg] = next.reg_offset;
+   if (inst->mlen > MAX_SAMPLER_MESSAGE_SIZE) {
+      fail("Message length >" STRINGIFY(MAX_SAMPLER_MESSAGE_SIZE)
+           " disallowed by hardware\n");
+   }
+
+   return inst;
+}
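+
+/* Worked mlen example: a SIMD16 textureLod() of a 2D texture with no
+ * header writes lod, u, v -- three payload slots of two registers each --
+ * so next.reg_offset is 3 and mlen is 6.  When a header is present it
+ * occupies a single physical register even in SIMD16, hence the
+ * "- header_present" correction above.
+ */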
+
+fs_reg
+fs_visitor::rescale_texcoord(ir_texture *ir, fs_reg coordinate,
+                             bool is_rect, int sampler, int texunit)
+{
+   fs_inst *inst = NULL;
+   bool needs_gl_clamp = true;
+   fs_reg scale_x, scale_y;
+
+   /* The 965 requires the EU to do the normalization of GL rectangle
+    * texture coordinates.  We use the program parameter state
+    * tracking to get the scaling factor.
+    */
+   if (is_rect &&
+       (brw->gen < 6 ||
+	(brw->gen >= 6 && (c->key.tex.gl_clamp_mask[0] & (1 << sampler) ||
+			     c->key.tex.gl_clamp_mask[1] & (1 << sampler))))) {
+      struct gl_program_parameter_list *params = prog->Parameters;
+      int tokens[STATE_LENGTH] = {
+	 STATE_INTERNAL,
+	 STATE_TEXRECT_SCALE,
+	 texunit,
+	 0,
+	 0
+      };
+
+      no16("rectangle scale uniform setup not supported on SIMD16\n");
+      if (dispatch_width == 16) {
+	 return coordinate;
+      }
+
+      GLuint index = _mesa_add_state_reference(params,
+					       (gl_state_index *)tokens);
+      /* Try to find existing copies of the texrect scale uniforms. */
+      for (unsigned i = 0; i < uniforms; i++) {
+         if (stage_prog_data->param[i] ==
+             &prog->Parameters->ParameterValues[index][0].f) {
+            scale_x = fs_reg(UNIFORM, i);
+            scale_y = fs_reg(UNIFORM, i + 1);
+            break;
+         }
+      }
+
+      /* If we didn't already set them up, do so now. */
+      if (scale_x.file == BAD_FILE) {
+         scale_x = fs_reg(UNIFORM, uniforms);
+         scale_y = fs_reg(UNIFORM, uniforms + 1);
+
+         stage_prog_data->param[uniforms++] =
+            &prog->Parameters->ParameterValues[index][0].f;
+         stage_prog_data->param[uniforms++] =
+            &prog->Parameters->ParameterValues[index][1].f;
+      }
+   }
+
+   /* Pre-gen6, the EU has to normalize the rectangle coordinates itself,
+    * so scale them here by the 1/width and 1/height factors set up above.
+    */
+   if (brw->gen < 6 && is_rect) {
+      fs_reg dst = fs_reg(this, ir->coordinate->type);
+      fs_reg src = coordinate;
+      coordinate = dst;
+
+      emit(MUL(dst, src, scale_x));
+      dst.reg_offset++;
+      src.reg_offset++;
+      emit(MUL(dst, src, scale_y));
+   } else if (is_rect) {
+      /* On gen6+, the sampler handles the rectangle coordinates
+       * natively, without needing rescaling.  But that means we have
+       * to do GL_CLAMP clamping at the [0, width], [0, height] scale,
+       * not [0, 1] like the default case below.
+       */
+      needs_gl_clamp = false;
+
+      for (int i = 0; i < 2; i++) {
+	 if (c->key.tex.gl_clamp_mask[i] & (1 << sampler)) {
+	    fs_reg chan = coordinate;
+	    chan.reg_offset += i;
+
+	    inst = emit(BRW_OPCODE_SEL, chan, chan, fs_reg(0.0f));
+	    inst->conditional_mod = BRW_CONDITIONAL_G;
+
+	    /* Our parameter comes in as 1.0/width or 1.0/height,
+	     * because that's what people normally want for doing
+	     * texture rectangle handling.  We need width or height
+	     * for clamping, but we don't care enough to make a new
+	     * parameter type, so just invert back.
+	     */
+	    fs_reg limit = fs_reg(this, glsl_type::float_type);
+	    emit(MOV(limit, i == 0 ? scale_x : scale_y));
+	    emit(SHADER_OPCODE_RCP, limit, limit);
+
+	    inst = emit(BRW_OPCODE_SEL, chan, chan, limit);
+	    inst->conditional_mod = BRW_CONDITIONAL_L;
+	 }
+      }
+   }
+
+   if (ir->coordinate && needs_gl_clamp) {
+      for (unsigned int i = 0;
+	   i < MIN2(ir->coordinate->type->vector_elements, 3); i++) {
+	 if (c->key.tex.gl_clamp_mask[i] & (1 << sampler)) {
+	    fs_reg chan = coordinate;
+	    chan.reg_offset += i;
+
+	    fs_inst *inst = emit(MOV(chan, chan));
+	    inst->saturate = true;
+	 }
+      }
+   }
+   return coordinate;
+}
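+
+/* The SEL pair in the rectangle path above is a clamp: SEL with a .G
+ * conditional writes max(chan, 0.0) and SEL with .L writes
+ * min(chan, limit), giving GL_CLAMP's [0, size] behavior:
+ *
+ *    SEL.G  chan, chan, 0.0     // chan = max(chan, 0.0)
+ *    SEL.L  chan, chan, limit   // chan = min(chan, limit)
+ */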
+
+/* Sample from the MCS surface attached to this multisample texture. */
+fs_reg
+fs_visitor::emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, int sampler)
+{
+   int reg_width = dispatch_width / 8;
+   fs_reg payload = fs_reg(this, glsl_type::float_type);
+   fs_reg dest = fs_reg(this, glsl_type::uvec4_type);
+   fs_reg next = payload;
+
+   /* parameters are: u, v, r, lod; missing parameters are treated as zero */
+   for (int i = 0; i < ir->coordinate->type->vector_elements; i++) {
+      emit(MOV(retype(next, BRW_REGISTER_TYPE_D), coordinate));
+      coordinate.reg_offset++;
+      next.reg_offset++;
+   }
+
+   fs_inst *inst = emit(SHADER_OPCODE_TXF_MCS, dest, payload);
+   virtual_grf_sizes[payload.reg] = next.reg_offset;
+   inst->base_mrf = -1;
+   inst->mlen = next.reg_offset * reg_width;
+   inst->header_present = false;
+   inst->regs_written = 4; /* we only care about one reg of response,
+                            * but the sampler always writes 4/8
+                            */
+   inst->sampler = sampler;
+
+   return dest;
+}
+
+void
+fs_visitor::visit(ir_texture *ir)
+{
+   fs_inst *inst = NULL;
+
+   int sampler = _mesa_get_sampler_uniform_value(ir->sampler, shader_prog, prog);
+
+   /* FINISHME: We're failing to recompile our programs when the sampler is
+    * updated.  This only matters for the texture rectangle scale parameters
+    * (pre-gen6, or gen6+ with GL_CLAMP).
+    */
+
+   int texunit = prog->SamplerUnits[sampler];
+
+   if (ir->op == ir_tg4) {
+      /* When tg4 is used with the degenerate ZERO/ONE swizzles, don't bother
+       * emitting anything other than setting up the constant result.
+       */
+      ir_constant *chan = ir->lod_info.component->as_constant();
+      int swiz = GET_SWZ(c->key.tex.swizzles[sampler], chan->value.i[0]);
+      if (swiz == SWIZZLE_ZERO || swiz == SWIZZLE_ONE) {
+
+         fs_reg res = fs_reg(this, glsl_type::vec4_type);
+         this->result = res;
+
+         for (int i = 0; i < 4; i++) {
+            emit(MOV(res, fs_reg(swiz == SWIZZLE_ZERO ? 0.0f : 1.0f)));
+            res.reg_offset++;
+         }
+         return;
+      }
+   }
+
+   /* Should be lowered by do_lower_texture_projection */
+   assert(!ir->projector);
+
+   /* Should be lowered */
+   assert(!ir->offset || !ir->offset->type->is_array());
+
+   /* Generate code to compute all the subexpression trees.  This has to be
+    * done before loading any values into MRFs for the sampler message since
+    * generating these values may involve SEND messages that need the MRFs.
+    */
+   fs_reg coordinate;
+   if (ir->coordinate) {
+      ir->coordinate->accept(this);
+
+      coordinate = rescale_texcoord(ir, this->result,
+                                    ir->sampler->type->sampler_dimensionality ==
+                                    GLSL_SAMPLER_DIM_RECT,
+                                    sampler, texunit);
+   }
+
+   fs_reg shadow_comparitor;
+   if (ir->shadow_comparitor) {
+      ir->shadow_comparitor->accept(this);
+      shadow_comparitor = this->result;
+   }
+
+   fs_reg lod, lod2, sample_index, mcs;
+   switch (ir->op) {
+   case ir_tex:
+   case ir_lod:
+   case ir_tg4:
+   case ir_query_levels:
+      break;
+   case ir_txb:
+      ir->lod_info.bias->accept(this);
+      lod = this->result;
+      break;
+   case ir_txd:
+      ir->lod_info.grad.dPdx->accept(this);
+      lod = this->result;
+
+      ir->lod_info.grad.dPdy->accept(this);
+      lod2 = this->result;
+      break;
+   case ir_txf:
+   case ir_txl:
+   case ir_txs:
+      ir->lod_info.lod->accept(this);
+      lod = this->result;
+      break;
+   case ir_txf_ms:
+      ir->lod_info.sample_index->accept(this);
+      sample_index = this->result;
+
+      if (brw->gen >= 7 && c->key.tex.compressed_multisample_layout_mask & (1<<sampler))
+         mcs = emit_mcs_fetch(ir, coordinate, sampler);
+      else
+         mcs = fs_reg(0u);
+      break;
+   default:
+      assert(!"Unrecognized texture opcode");
+   }
+
+   /* Writemasking doesn't eliminate channels on SIMD8 texture
+    * samples, so don't worry about them.
+    */
+   fs_reg dst = fs_reg(this, glsl_type::get_instance(ir->type->base_type, 4, 1));
+
+   if (brw->gen >= 7) {
+      inst = emit_texture_gen7(ir, dst, coordinate, shadow_comparitor,
+                               lod, lod2, sample_index, mcs, sampler);
+   } else if (brw->gen >= 5) {
+      inst = emit_texture_gen5(ir, dst, coordinate, shadow_comparitor,
+                               lod, lod2, sample_index);
+   } else {
+      inst = emit_texture_gen4(ir, dst, coordinate, shadow_comparitor,
+                               lod, lod2);
+   }
+
+   if (ir->offset != NULL && ir->op != ir_txf)
+      inst->texture_offset = brw_texture_offset(ctx, ir->offset->as_constant());
+
+   if (ir->op == ir_tg4)
+      inst->texture_offset |= gather_channel(ir, sampler) << 16; // M0.2:16-17
+
+   inst->sampler = sampler;
+
+   if (ir->shadow_comparitor)
+      inst->shadow_compare = true;
+
+   /* fixup #layers for cube map arrays */
+   if (ir->op == ir_txs) {
+      glsl_type const *type = ir->sampler->type;
+      if (type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE &&
+          type->sampler_array) {
+         fs_reg depth = dst;
+         depth.reg_offset = 2;
+         emit_math(SHADER_OPCODE_INT_QUOTIENT, depth, depth, fs_reg(6));
+      }
+   }
+
+   if (brw->gen == 6 && ir->op == ir_tg4) {
+      emit_gen6_gather_wa(c->key.tex.gen6_gather_wa[sampler], dst);
+   }
+
+   swizzle_result(ir, dst, sampler);
+}
+
+/**
+ * Apply workarounds for Gen6 gather with UINT/SINT
+ */
+void
+fs_visitor::emit_gen6_gather_wa(uint8_t wa, fs_reg dst)
+{
+   if (!wa)
+      return;
+
+   int width = (wa & WA_8BIT) ? 8 : 16;
+
+   for (int i = 0; i < 4; i++) {
+      fs_reg dst_f = retype(dst, BRW_REGISTER_TYPE_F);
+      /* Convert from UNORM to UINT */
+      emit(MUL(dst_f, dst_f, fs_reg((float)((1 << width) - 1))));
+      emit(MOV(dst, dst_f));
+
+      if (wa & WA_SIGN) {
+         /* Reinterpret the UINT value as a signed INT value by
+          * shifting the sign bit into place, then shifting back
+          * preserving sign.
+          */
+         emit(SHL(dst, dst, fs_reg(32 - width)));
+         emit(ASR(dst, dst, fs_reg(32 - width)));
+      }
+
+      dst.reg_offset++;
+   }
+}
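+
+/* Worked example for wa == (WA_8BIT | WA_SIGN): an 8-bit SINT texel of -1
+ * comes back from the broken gather as the UNORM float 1.0.  MUL by 255
+ * and the MOV back to integer give 0xff; SHL by 24 then ASR by 24
+ * sign-extend that to the expected -1.
+ */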
+
+/**
+ * Set up the gather channel based on the swizzle, for gather4.
+ */
+uint32_t
+fs_visitor::gather_channel(ir_texture *ir, int sampler)
+{
+   ir_constant *chan = ir->lod_info.component->as_constant();
+   int swiz = GET_SWZ(c->key.tex.swizzles[sampler], chan->value.i[0]);
+   switch (swiz) {
+      case SWIZZLE_X: return 0;
+      case SWIZZLE_Y:
+         /* gather4 sampler is broken for green channel on RG32F --
+          * we must ask for blue instead.
+          */
+         if (c->key.tex.gather_channel_quirk_mask & (1<<sampler))
+            return 2;
+         return 1;
+      case SWIZZLE_Z: return 2;
+      case SWIZZLE_W: return 3;
+      default:
+         assert(!"Not reached"); /* zero, one swizzles handled already */
+         return 0;
+   }
+}
+
+/**
+ * Swizzle the result of a texture lookup.  This is necessary for
+ * EXT_texture_swizzle as well as DEPTH_TEXTURE_MODE for shadow comparisons.
+ */
+void
+fs_visitor::swizzle_result(ir_texture *ir, fs_reg orig_val, int sampler)
+{
+   if (ir->op == ir_query_levels) {
+      /* # levels is in .w */
+      orig_val.reg_offset += 3;
+      this->result = orig_val;
+      return;
+   }
+
+   this->result = orig_val;
+
+   /* txs, lod don't actually sample the texture, so swizzling the result
+    * makes no sense.  tg4's swizzle was already applied when selecting the
+    * gather channel.
+    */
+   if (ir->op == ir_txs || ir->op == ir_lod || ir->op == ir_tg4)
+      return;
+
+   if (ir->type == glsl_type::float_type) {
+      /* Ignore DEPTH_TEXTURE_MODE swizzling. */
+      assert(ir->sampler->type->sampler_shadow);
+   } else if (c->key.tex.swizzles[sampler] != SWIZZLE_NOOP) {
+      fs_reg swizzled_result = fs_reg(this, glsl_type::vec4_type);
+
+      for (int i = 0; i < 4; i++) {
+	 int swiz = GET_SWZ(c->key.tex.swizzles[sampler], i);
+	 fs_reg l = swizzled_result;
+	 l.reg_offset += i;
+
+	 if (swiz == SWIZZLE_ZERO) {
+	    emit(MOV(l, fs_reg(0.0f)));
+	 } else if (swiz == SWIZZLE_ONE) {
+	    emit(MOV(l, fs_reg(1.0f)));
+	 } else {
+	    fs_reg r = orig_val;
+	    r.reg_offset += GET_SWZ(c->key.tex.swizzles[sampler], i);
+	    emit(MOV(l, r));
+	 }
+      }
+      this->result = swizzled_result;
+   }
+}
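+
+/* For example, a swizzle of (GREEN, RED, ONE, ZERO) turns a sampled
+ * (r, g, b, a) into (g, r, 1.0, 0.0): channels 0 and 1 are MOVs from
+ * reg_offsets 1 and 0 of orig_val, channel 2 is an immediate 1.0f, and
+ * channel 3 an immediate 0.0f.
+ */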
+
+void
+fs_visitor::visit(ir_swizzle *ir)
+{
+   ir->val->accept(this);
+   fs_reg val = this->result;
+
+   if (ir->type->vector_elements == 1) {
+      this->result.reg_offset += ir->mask.x;
+      return;
+   }
+
+   fs_reg result = fs_reg(this, ir->type);
+   this->result = result;
+
+   for (unsigned int i = 0; i < ir->type->vector_elements; i++) {
+      fs_reg channel = val;
+      int swiz = 0;
+
+      switch (i) {
+      case 0:
+	 swiz = ir->mask.x;
+	 break;
+      case 1:
+	 swiz = ir->mask.y;
+	 break;
+      case 2:
+	 swiz = ir->mask.z;
+	 break;
+      case 3:
+	 swiz = ir->mask.w;
+	 break;
+      }
+
+      channel.reg_offset += swiz;
+      emit(MOV(result, channel));
+      result.reg_offset++;
+   }
+}
+
+void
+fs_visitor::visit(ir_discard *ir)
+{
+   assert(ir->condition == NULL); /* FINISHME */
+
+   /* We track our discarded pixels in f0.1.  By predicating on it, we can
+    * update just the flag bits that aren't yet discarded.  By emitting a
+    * CMP of g0 != g0, all our currently executing channels will get turned
+    * off.
+    */
+   fs_reg some_reg = fs_reg(retype(brw_vec8_grf(0, 0),
+                                   BRW_REGISTER_TYPE_UW));
+   fs_inst *cmp = emit(CMP(reg_null_f, some_reg, some_reg,
+                           BRW_CONDITIONAL_NZ));
+   cmp->predicate = BRW_PREDICATE_NORMAL;
+   cmp->flag_subreg = 1;
+
+   if (brw->gen >= 6) {
+      /* For performance, after a discard, jump to the end of the shader.
+       * Only jump if all relevant channels have been discarded.
+       */
+      fs_inst *discard_jump = emit(FS_OPCODE_DISCARD_JUMP);
+      discard_jump->flag_subreg = 1;
+
+      discard_jump->predicate = (dispatch_width == 8)
+                                ? BRW_PREDICATE_ALIGN1_ANY8H
+                                : BRW_PREDICATE_ALIGN1_ANY16H;
+      discard_jump->predicate_inverse = true;
+   }
+}
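+
+/* Concretely: a channel that is still live has its f0.1 bit set, so the
+ * predicated CMP executes for it, evaluates g0 != g0 (always false), and
+ * clears the bit.  Channels that already discarded, or that aren't
+ * executing this discard, keep their flag bits untouched.
+ */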
+
+void
+fs_visitor::visit(ir_constant *ir)
+{
+   /* Set this->result to reg at the bottom of the function because some code
+    * paths will cause this visitor to be applied to other fields.  This will
+    * cause the value stored in this->result to be modified.
+    *
+    * Make reg constant so that it doesn't get accidentally modified along the
+    * way.  Yes, I actually had this problem. :(
+    */
+   const fs_reg reg(this, ir->type);
+   fs_reg dst_reg = reg;
+
+   if (ir->type->is_array()) {
+      const unsigned size = type_size(ir->type->fields.array);
+
+      for (unsigned i = 0; i < ir->type->length; i++) {
+	 ir->array_elements[i]->accept(this);
+	 fs_reg src_reg = this->result;
+
+	 dst_reg.type = src_reg.type;
+	 for (unsigned j = 0; j < size; j++) {
+	    emit(MOV(dst_reg, src_reg));
+	    src_reg.reg_offset++;
+	    dst_reg.reg_offset++;
+	 }
+      }
+   } else if (ir->type->is_record()) {
+      foreach_list(node, &ir->components) {
+	 ir_constant *const field = (ir_constant *) node;
+	 const unsigned size = type_size(field->type);
+
+	 field->accept(this);
+	 fs_reg src_reg = this->result;
+
+	 dst_reg.type = src_reg.type;
+	 for (unsigned j = 0; j < size; j++) {
+	    emit(MOV(dst_reg, src_reg));
+	    src_reg.reg_offset++;
+	    dst_reg.reg_offset++;
+	 }
+      }
+   } else {
+      const unsigned size = type_size(ir->type);
+
+      for (unsigned i = 0; i < size; i++) {
+	 switch (ir->type->base_type) {
+	 case GLSL_TYPE_FLOAT:
+	    emit(MOV(dst_reg, fs_reg(ir->value.f[i])));
+	    break;
+	 case GLSL_TYPE_UINT:
+	    emit(MOV(dst_reg, fs_reg(ir->value.u[i])));
+	    break;
+	 case GLSL_TYPE_INT:
+	    emit(MOV(dst_reg, fs_reg(ir->value.i[i])));
+	    break;
+	 case GLSL_TYPE_BOOL:
+	    emit(MOV(dst_reg, fs_reg((int)ir->value.b[i])));
+	    break;
+	 default:
+	    assert(!"Non-float/uint/int/bool constant");
+	 }
+	 dst_reg.reg_offset++;
+      }
+   }
+
+   this->result = reg;
+}
+
+void
+fs_visitor::emit_bool_to_cond_code(ir_rvalue *ir)
+{
+   ir_expression *expr = ir->as_expression();
+
+   if (expr &&
+       expr->operation != ir_binop_logic_and &&
+       expr->operation != ir_binop_logic_or &&
+       expr->operation != ir_binop_logic_xor &&
+       expr->operation != ir_triop_csel) {
+      fs_reg op[2];
+      fs_inst *inst;
+
+      assert(expr->get_num_operands() <= 2);
+      for (unsigned int i = 0; i < expr->get_num_operands(); i++) {
+	 assert(expr->operands[i]->type->is_scalar());
+
+	 expr->operands[i]->accept(this);
+	 op[i] = this->result;
+
+	 resolve_ud_negate(&op[i]);
+      }
+
+      switch (expr->operation) {
+      case ir_unop_logic_not:
+	 inst = emit(AND(reg_null_d, op[0], fs_reg(1)));
+	 inst->conditional_mod = BRW_CONDITIONAL_Z;
+	 break;
+
+      case ir_unop_f2b:
+	 if (brw->gen >= 6) {
+	    emit(CMP(reg_null_d, op[0], fs_reg(0.0f), BRW_CONDITIONAL_NZ));
+	 } else {
+	    inst = emit(MOV(reg_null_f, op[0]));
+            inst->conditional_mod = BRW_CONDITIONAL_NZ;
+	 }
+	 break;
+
+      case ir_unop_i2b:
+	 if (brw->gen >= 6) {
+	    emit(CMP(reg_null_d, op[0], fs_reg(0), BRW_CONDITIONAL_NZ));
+	 } else {
+	    inst = emit(MOV(reg_null_d, op[0]));
+            inst->conditional_mod = BRW_CONDITIONAL_NZ;
+	 }
+	 break;
+
+      case ir_binop_greater:
+      case ir_binop_gequal:
+      case ir_binop_less:
+      case ir_binop_lequal:
+      case ir_binop_equal:
+      case ir_binop_all_equal:
+      case ir_binop_nequal:
+      case ir_binop_any_nequal:
+	 resolve_bool_comparison(expr->operands[0], &op[0]);
+	 resolve_bool_comparison(expr->operands[1], &op[1]);
+
+	 emit(CMP(reg_null_d, op[0], op[1],
+                  brw_conditional_for_comparison(expr->operation)));
+	 break;
+
+      default:
+	 assert(!"not reached");
+	 fail("bad cond code\n");
+	 break;
+      }
+      return;
+   }
+
+   ir->accept(this);
+
+   fs_inst *inst = emit(AND(reg_null_d, this->result, fs_reg(1)));
+   inst->conditional_mod = BRW_CONDITIONAL_NZ;
+}
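+
+/* For example, a condition like (a < b) is emitted directly as
+ *
+ *    CMP.L null, a, b
+ *
+ * setting the flag register, rather than materializing a boolean value
+ * and AND-ing it with 1 as the fallback path at the bottom does.
+ */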
+
+/**
+ * Emit a gen6 IF statement with the comparison folded into the IF
+ * instruction.
+ */
+void
+fs_visitor::emit_if_gen6(ir_if *ir)
+{
+   ir_expression *expr = ir->condition->as_expression();
+
+   if (expr) {
+      fs_reg op[2];
+      fs_inst *inst;
+      fs_reg temp;
+
+      assert(expr->get_num_operands() <= 2);
+      for (unsigned int i = 0; i < expr->get_num_operands(); i++) {
+	 assert(expr->operands[i]->type->is_scalar());
+
+	 expr->operands[i]->accept(this);
+	 op[i] = this->result;
+      }
+
+      switch (expr->operation) {
+      case ir_unop_logic_not:
+      case ir_binop_logic_xor:
+      case ir_binop_logic_or:
+      case ir_binop_logic_and:
+         /* For operations on bool arguments, only the low bit of the bool is
+          * valid, and the others are undefined.  Fall back to the condition
+          * code path.
+          */
+         break;
+
+      case ir_unop_f2b:
+	 inst = emit(BRW_OPCODE_IF, reg_null_f, op[0], fs_reg(0));
+	 inst->conditional_mod = BRW_CONDITIONAL_NZ;
+	 return;
+
+      case ir_unop_i2b:
+	 emit(IF(op[0], fs_reg(0), BRW_CONDITIONAL_NZ));
+	 return;
+
+      case ir_binop_greater:
+      case ir_binop_gequal:
+      case ir_binop_less:
+      case ir_binop_lequal:
+      case ir_binop_equal:
+      case ir_binop_all_equal:
+      case ir_binop_nequal:
+      case ir_binop_any_nequal:
+	 resolve_bool_comparison(expr->operands[0], &op[0]);
+	 resolve_bool_comparison(expr->operands[1], &op[1]);
+
+	 emit(IF(op[0], op[1],
+                 brw_conditional_for_comparison(expr->operation)));
+	 return;
+      default:
+	 assert(!"not reached");
+	 emit(IF(op[0], fs_reg(0), BRW_CONDITIONAL_NZ));
+	 fail("bad condition\n");
+	 return;
+      }
+   }
+
+   emit_bool_to_cond_code(ir->condition);
+   fs_inst *inst = emit(BRW_OPCODE_IF);
+   inst->predicate = BRW_PREDICATE_NORMAL;
+}
+
+/**
+ * Try to replace IF/MOV/ELSE/MOV/ENDIF with SEL.
+ *
+ * Many GLSL shaders contain the following pattern:
+ *
+ *    x = condition ? foo : bar
+ *
+ * The compiler emits an ir_if tree for this, since each subexpression might be
+ * a complex tree that could have side-effects or short-circuit logic.
+ *
+ * However, the common case is to simply select one of two constants or
+ * variable values---which is exactly what SEL is for.  In this case, the
+ * assembly looks like:
+ *
+ *    (+f0) IF
+ *    MOV dst src0
+ *    ELSE
+ *    MOV dst src1
+ *    ENDIF
+ *
+ * which can be easily translated into:
+ *
+ *    (+f0) SEL dst src0 src1
+ *
+ * If src0 is an immediate value, we promote it to a temporary GRF.
+ */
+void
+fs_visitor::try_replace_with_sel()
+{
+   fs_inst *endif_inst = (fs_inst *) instructions.get_tail();
+   assert(endif_inst->opcode == BRW_OPCODE_ENDIF);
+
+   /* Pattern match in reverse: IF, MOV, ELSE, MOV, ENDIF. */
+   int opcodes[] = {
+      BRW_OPCODE_IF, BRW_OPCODE_MOV, BRW_OPCODE_ELSE, BRW_OPCODE_MOV,
+   };
+
+   fs_inst *match = (fs_inst *) endif_inst->prev;
+   for (int i = 0; i < 4; i++) {
+      if (match->is_head_sentinel() || match->opcode != opcodes[4-i-1])
+         return;
+      match = (fs_inst *) match->prev;
+   }
+
+   /* The opcodes match; it looks like the right sequence of instructions. */
+   fs_inst *else_mov = (fs_inst *) endif_inst->prev;
+   fs_inst *then_mov = (fs_inst *) else_mov->prev->prev;
+   fs_inst *if_inst = (fs_inst *) then_mov->prev;
+
+   /* Check that the MOVs are the right form. */
+   if (then_mov->dst.equals(else_mov->dst) &&
+       !then_mov->is_partial_write() &&
+       !else_mov->is_partial_write()) {
+
+      /* Remove the matched instructions; we'll emit a SEL to replace them. */
+      while (!if_inst->next->is_tail_sentinel())
+         if_inst->next->remove();
+      if_inst->remove();
+
+      /* Only the last source register can be a constant, so if the MOV in
+       * the "then" clause uses a constant, we need to put it in a temporary.
+       */
+      fs_reg src0(then_mov->src[0]);
+      if (src0.file == IMM) {
+         src0 = fs_reg(this, glsl_type::float_type);
+         src0.type = then_mov->src[0].type;
+         emit(MOV(src0, then_mov->src[0]));
+      }
+
+      fs_inst *sel;
+      if (if_inst->conditional_mod) {
+         /* Sandybridge-specific IF with embedded comparison */
+         emit(CMP(reg_null_d, if_inst->src[0], if_inst->src[1],
+                  if_inst->conditional_mod));
+         sel = emit(BRW_OPCODE_SEL, then_mov->dst, src0, else_mov->src[0]);
+         sel->predicate = BRW_PREDICATE_NORMAL;
+      } else {
+         /* Separate CMP and IF instructions */
+         sel = emit(BRW_OPCODE_SEL, then_mov->dst, src0, else_mov->src[0]);
+         sel->predicate = if_inst->predicate;
+         sel->predicate_inverse = if_inst->predicate_inverse;
+      }
+   }
+}
+
+void
+fs_visitor::visit(ir_if *ir)
+{
+   if (brw->gen < 6) {
+      no16("Can't support (non-uniform) control flow on SIMD16\n");
+   }
+
+   /* Don't point the annotation at the if statement, because then it plus
+    * the then and else blocks get printed.
+    */
+   this->base_ir = ir->condition;
+
+   if (brw->gen == 6) {
+      emit_if_gen6(ir);
+   } else {
+      emit_bool_to_cond_code(ir->condition);
+
+      emit(IF(BRW_PREDICATE_NORMAL));
+   }
+
+   foreach_list(node, &ir->then_instructions) {
+      ir_instruction *ir = (ir_instruction *)node;
+      this->base_ir = ir;
+
+      ir->accept(this);
+   }
+
+   if (!ir->else_instructions.is_empty()) {
+      emit(BRW_OPCODE_ELSE);
+
+      foreach_list(node, &ir->else_instructions) {
+	 ir_instruction *ir = (ir_instruction *)node;
+	 this->base_ir = ir;
+
+	 ir->accept(this);
+      }
+   }
+
+   emit(BRW_OPCODE_ENDIF);
+
+   try_replace_with_sel();
+}
+
+void
+fs_visitor::visit(ir_loop *ir)
+{
+   if (brw->gen < 6) {
+      no16("Can't support (non-uniform) control flow on SIMD16\n");
+   }
+
+   this->base_ir = NULL;
+   emit(BRW_OPCODE_DO);
+
+   foreach_list(node, &ir->body_instructions) {
+      ir_instruction *ir = (ir_instruction *)node;
+
+      this->base_ir = ir;
+      ir->accept(this);
+   }
+
+   this->base_ir = NULL;
+   emit(BRW_OPCODE_WHILE);
+}
+
+void
+fs_visitor::visit(ir_loop_jump *ir)
+{
+   switch (ir->mode) {
+   case ir_loop_jump::jump_break:
+      emit(BRW_OPCODE_BREAK);
+      break;
+   case ir_loop_jump::jump_continue:
+      emit(BRW_OPCODE_CONTINUE);
+      break;
+   }
+}
+
+void
+fs_visitor::visit_atomic_counter_intrinsic(ir_call *ir)
+{
+   ir_dereference *deref = static_cast<ir_dereference *>(
+      ir->actual_parameters.get_head());
+   ir_variable *location = deref->variable_referenced();
+   unsigned surf_index = (c->prog_data.base.binding_table.abo_start +
+                          location->data.atomic.buffer_index);
+
+   /* Calculate the surface offset */
+   fs_reg offset(this, glsl_type::uint_type);
+   ir_dereference_array *deref_array = deref->as_dereference_array();
+
+   if (deref_array) {
+      deref_array->array_index->accept(this);
+
+      fs_reg tmp(this, glsl_type::uint_type);
+      emit(MUL(tmp, this->result, ATOMIC_COUNTER_SIZE));
+      emit(ADD(offset, tmp, location->data.atomic.offset));
+   } else {
+      offset = location->data.atomic.offset;
+   }
+
+   /* Emit the appropriate machine instruction */
+   const char *callee = ir->callee->function_name();
+   ir->return_deref->accept(this);
+   fs_reg dst = this->result;
+
+   if (!strcmp("__intrinsic_atomic_read", callee)) {
+      emit_untyped_surface_read(surf_index, dst, offset);
+
+   } else if (!strcmp("__intrinsic_atomic_increment", callee)) {
+      emit_untyped_atomic(BRW_AOP_INC, surf_index, dst, offset,
+                          fs_reg(), fs_reg());
+
+   } else if (!strcmp("__intrinsic_atomic_predecrement", callee)) {
+      emit_untyped_atomic(BRW_AOP_PREDEC, surf_index, dst, offset,
+                          fs_reg(), fs_reg());
+   }
+}
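+
+/* For example, GLSL's atomicCounterIncrement(c) arrives here as
+ * __intrinsic_atomic_increment: surf_index comes from c's binding,
+ * offset from c's location within the buffer, and the pre-increment
+ * value lands in the return deref's register via BRW_AOP_INC.
+ */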
+
+void
+fs_visitor::visit(ir_call *ir)
+{
+   const char *callee = ir->callee->function_name();
+
+   if (!strcmp("__intrinsic_atomic_read", callee) ||
+       !strcmp("__intrinsic_atomic_increment", callee) ||
+       !strcmp("__intrinsic_atomic_predecrement", callee)) {
+      visit_atomic_counter_intrinsic(ir);
+   } else {
+      assert(!"Unsupported intrinsic.");
+   }
+}
+
+void
+fs_visitor::visit(ir_return *ir)
+{
+   assert(!"FINISHME");
+}
+
+void
+fs_visitor::visit(ir_function *ir)
+{
+   /* Ignore function bodies other than main() -- we shouldn't see calls to
+    * them since they should all have been inlined by the time we get here.
+    */
+   if (strcmp(ir->name, "main") == 0) {
+      const ir_function_signature *sig;
+      exec_list empty;
+
+      sig = ir->matching_signature(NULL, &empty);
+
+      assert(sig);
+
+      foreach_list(node, &sig->body) {
+	 ir_instruction *ir = (ir_instruction *)node;
+	 this->base_ir = ir;
+
+	 ir->accept(this);
+      }
+   }
+}
+
+void
+fs_visitor::visit(ir_function_signature *ir)
+{
+   assert(!"not reached");
+   (void)ir;
+}
+
+void
+fs_visitor::visit(ir_emit_vertex *)
+{
+   assert(!"not reached");
+}
+
+void
+fs_visitor::visit(ir_end_primitive *)
+{
+   assert(!"not reached");
+}
+
+void
+fs_visitor::emit_untyped_atomic(unsigned atomic_op, unsigned surf_index,
+                                fs_reg dst, fs_reg offset, fs_reg src0,
+                                fs_reg src1)
+{
+   const unsigned operand_len = dispatch_width / 8;
+   unsigned mlen = 0;
+
+   /* Initialize the sample mask in the message header. */
+   emit(MOV(brw_uvec_mrf(8, mlen, 0), fs_reg(0u)))
+      ->force_writemask_all = true;
+
+   if (fp->UsesKill) {
+      emit(MOV(brw_uvec_mrf(1, mlen, 7), brw_flag_reg(0, 1)))
+         ->force_writemask_all = true;
+   } else {
+      emit(MOV(brw_uvec_mrf(1, mlen, 7),
+               retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UD)))
+         ->force_writemask_all = true;
+   }
+
+   mlen++;
+
+   /* Set the atomic operation offset. */
+   emit(MOV(brw_uvec_mrf(dispatch_width, mlen, 0), offset));
+   mlen += operand_len;
+
+   /* Set the atomic operation arguments. */
+   if (src0.file != BAD_FILE) {
+      emit(MOV(brw_uvec_mrf(dispatch_width, mlen, 0), src0));
+      mlen += operand_len;
+   }
+
+   if (src1.file != BAD_FILE) {
+      emit(MOV(brw_uvec_mrf(dispatch_width, mlen, 0), src1));
+      mlen += operand_len;
+   }
+
+   /* Emit the instruction. */
+   fs_inst *inst = new(mem_ctx) fs_inst(SHADER_OPCODE_UNTYPED_ATOMIC, dst,
+                                        atomic_op, surf_index);
+   inst->base_mrf = 0;
+   inst->mlen = mlen;
+   inst->header_present = true;
+   emit(inst);
+}
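+
+/* For reference, the atomic message assembled above looks like this in
+ * SIMD8 (each operand takes two registers in SIMD16):
+ *
+ *    m0: header; the live-channel mask is written into m0.7
+ *    m1: per-channel surface offsets
+ *    m2: src0, when present
+ *    m3: src1, when present (e.g. for two-source compare/swap operations)
+ */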
+
+void
+fs_visitor::emit_untyped_surface_read(unsigned surf_index, fs_reg dst,
+                                      fs_reg offset)
+{
+   const unsigned operand_len = dispatch_width / 8;
+   unsigned mlen = 0;
+
+   /* Initialize the sample mask in the message header. */
+   emit(MOV(brw_uvec_mrf(8, mlen, 0), fs_reg(0u)))
+      ->force_writemask_all = true;
+
+   if (fp->UsesKill) {
+      emit(MOV(brw_uvec_mrf(1, mlen, 7), brw_flag_reg(0, 1)))
+         ->force_writemask_all = true;
+   } else {
+      emit(MOV(brw_uvec_mrf(1, mlen, 7),
+               retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UD)))
+         ->force_writemask_all = true;
+   }
+
+   mlen++;
+
+   /* Set the surface read offset. */
+   emit(MOV(brw_uvec_mrf(dispatch_width, mlen, 0), offset));
+   mlen += operand_len;
+
+   /* Emit the instruction. */
+   fs_inst *inst = new(mem_ctx)
+      fs_inst(SHADER_OPCODE_UNTYPED_SURFACE_READ, dst, surf_index);
+   inst->base_mrf = 0;
+   inst->mlen = mlen;
+   inst->header_present = true;
+   emit(inst);
+}
+
+fs_inst *
+fs_visitor::emit(fs_inst *inst)
+{
+   if (force_uncompressed_stack > 0)
+      inst->force_uncompressed = true;
+
+   inst->annotation = this->current_annotation;
+   inst->ir = this->base_ir;
+
+   this->instructions.push_tail(inst);
+
+   return inst;
+}
+
+void
+fs_visitor::emit(exec_list list)
+{
+   foreach_list_safe(node, &list) {
+      fs_inst *inst = (fs_inst *)node;
+      inst->remove();
+      emit(inst);
+   }
+}
+
+/** Emits a dummy fragment shader consisting of magenta for bringup purposes. */
+void
+fs_visitor::emit_dummy_fs()
+{
+   int reg_width = dispatch_width / 8;
+
+   /* Everyone's favorite color. */
+   emit(MOV(fs_reg(MRF, 2 + 0 * reg_width), fs_reg(1.0f)));
+   emit(MOV(fs_reg(MRF, 2 + 1 * reg_width), fs_reg(0.0f)));
+   emit(MOV(fs_reg(MRF, 2 + 2 * reg_width), fs_reg(1.0f)));
+   emit(MOV(fs_reg(MRF, 2 + 3 * reg_width), fs_reg(0.0f)));
+
+   fs_inst *write;
+   write = emit(FS_OPCODE_FB_WRITE, fs_reg(0), fs_reg(0));
+   write->base_mrf = 2;
+   write->mlen = 4 * reg_width;
+   write->eot = true;
+}
+
+/* The register location here is relative to the start of the URB
+ * data.  It will get adjusted to be a real location before
+ * generate_code() time.
+ */
+struct brw_reg
+fs_visitor::interp_reg(int location, int channel)
+{
+   int regnr = c->prog_data.urb_setup[location] * 2 + channel / 2;
+   int stride = (channel & 1) * 4;
+
+   assert(c->prog_data.urb_setup[location] != -1);
+
+   return brw_vec1_grf(regnr, stride);
+}
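+
+/* For example, with urb_setup[location] == 3 the four channels map to
+ *
+ *    channel 0 -> (reg 6, subreg 0)    channel 1 -> (reg 6, subreg 4)
+ *    channel 2 -> (reg 7, subreg 0)    channel 3 -> (reg 7, subreg 4)
+ *
+ * (URB-relative numbers): two channels' setup data per register, four
+ * floats apart.
+ */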
+
+/** Emits the interpolation for the varying inputs. */
+void
+fs_visitor::emit_interpolation_setup_gen4()
+{
+   this->current_annotation = "compute pixel centers";
+   this->pixel_x = fs_reg(this, glsl_type::uint_type);
+   this->pixel_y = fs_reg(this, glsl_type::uint_type);
+   this->pixel_x.type = BRW_REGISTER_TYPE_UW;
+   this->pixel_y.type = BRW_REGISTER_TYPE_UW;
+
+   emit(FS_OPCODE_PIXEL_X, this->pixel_x);
+   emit(FS_OPCODE_PIXEL_Y, this->pixel_y);
+
+   this->current_annotation = "compute pixel deltas from v0";
+   if (brw->has_pln) {
+      this->delta_x[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC] =
+         fs_reg(this, glsl_type::vec2_type);
+      this->delta_y[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC] =
+         this->delta_x[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC];
+      this->delta_y[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC].reg_offset++;
+   } else {
+      this->delta_x[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC] =
+         fs_reg(this, glsl_type::float_type);
+      this->delta_y[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC] =
+         fs_reg(this, glsl_type::float_type);
+   }
+   emit(ADD(this->delta_x[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC],
+            this->pixel_x, fs_reg(negate(brw_vec1_grf(1, 0)))));
+   emit(ADD(this->delta_y[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC],
+            this->pixel_y, fs_reg(negate(brw_vec1_grf(1, 1)))));
+
+   this->current_annotation = "compute pos.w and 1/pos.w";
+   /* Compute wpos.w.  It's always in our setup, since it's needed to
+    * interpolate the other attributes.
+    */
+   this->wpos_w = fs_reg(this, glsl_type::float_type);
+   emit(FS_OPCODE_LINTERP, wpos_w,
+        this->delta_x[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC],
+        this->delta_y[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC],
+	interp_reg(VARYING_SLOT_POS, 3));
+   /* Compute the pixel 1/W value from wpos.w. */
+   this->pixel_w = fs_reg(this, glsl_type::float_type);
+   emit_math(SHADER_OPCODE_RCP, this->pixel_w, wpos_w);
+   this->current_annotation = NULL;
+}
+
+/** Emits the interpolation for the varying inputs. */
+void
+fs_visitor::emit_interpolation_setup_gen6()
+{
+   struct brw_reg g1_uw = retype(brw_vec1_grf(1, 0), BRW_REGISTER_TYPE_UW);
+
+   /* If the pixel centers end up used, the setup is the same as for gen4. */
+   this->current_annotation = "compute pixel centers";
+   fs_reg int_pixel_x = fs_reg(this, glsl_type::uint_type);
+   fs_reg int_pixel_y = fs_reg(this, glsl_type::uint_type);
+   int_pixel_x.type = BRW_REGISTER_TYPE_UW;
+   int_pixel_y.type = BRW_REGISTER_TYPE_UW;
+   emit(ADD(int_pixel_x,
+            fs_reg(stride(suboffset(g1_uw, 4), 2, 4, 0)),
+            fs_reg(brw_imm_v(0x10101010))));
+   emit(ADD(int_pixel_y,
+            fs_reg(stride(suboffset(g1_uw, 5), 2, 4, 0)),
+            fs_reg(brw_imm_v(0x11001100))));
+
+   /* As of gen6, we can no longer mix float and int sources.  We have
+    * to turn the integer pixel centers into floats for their actual
+    * use.
+    */
+   this->pixel_x = fs_reg(this, glsl_type::float_type);
+   this->pixel_y = fs_reg(this, glsl_type::float_type);
+   emit(MOV(this->pixel_x, int_pixel_x));
+   emit(MOV(this->pixel_y, int_pixel_y));
+
+   this->current_annotation = "compute pos.w";
+   this->pixel_w = fs_reg(brw_vec8_grf(c->source_w_reg, 0));
+   this->wpos_w = fs_reg(this, glsl_type::float_type);
+   emit_math(SHADER_OPCODE_RCP, this->wpos_w, this->pixel_w);
+
+   for (int i = 0; i < BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT; ++i) {
+      uint8_t reg = c->barycentric_coord_reg[i];
+      this->delta_x[i] = fs_reg(brw_vec8_grf(reg, 0));
+      this->delta_y[i] = fs_reg(brw_vec8_grf(reg + 1, 0));
+   }
+
+   this->current_annotation = NULL;
+}
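+
+/* A note on the pixel-center math above: g1 holds the X/Y origins of the
+ * 2x2 subspans, and brw_imm_v packs eight 4-bit immediates, low nibble
+ * first.  So 0x10101010 is the per-pixel X offsets {0,1,0,1,0,1,0,1} and
+ * 0x11001100 the Y offsets {0,0,1,1,0,0,1,1}, visiting the four pixels of
+ * each subspan in order.
+ */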
+
+void
+fs_visitor::emit_color_write(int target, int index, int first_color_mrf)
+{
+   int reg_width = dispatch_width / 8;
+   fs_inst *inst;
+   fs_reg color = outputs[target];
+   fs_reg mrf;
+
+   /* If there's no color data to be written, skip it. */
+   if (color.file == BAD_FILE)
+      return;
+
+   color.reg_offset += index;
+
+   if (dispatch_width == 8 || brw->gen >= 6) {
+      /* SIMD8 write looks like:
+       * m + 0: r0
+       * m + 1: r1
+       * m + 2: g0
+       * m + 3: g1
+       *
+       * gen6 SIMD16 DP write looks like:
+       * m + 0: r0
+       * m + 1: r1
+       * m + 2: g0
+       * m + 3: g1
+       * m + 4: b0
+       * m + 5: b1
+       * m + 6: a0
+       * m + 7: a1
+       */
+      inst = emit(MOV(fs_reg(MRF, first_color_mrf + index * reg_width,
+                             color.type),
+                      color));
+      inst->saturate = c->key.clamp_fragment_color;
+   } else {
+      /* pre-gen6 SIMD16 single source DP write looks like:
+       * m + 0: r0
+       * m + 1: g0
+       * m + 2: b0
+       * m + 3: a0
+       * m + 4: r1
+       * m + 5: g1
+       * m + 6: b1
+       * m + 7: a1
+       */
+      if (brw->has_compr4) {
+	 /* By setting the high bit of the MRF register number, we
+	  * indicate that we want COMPR4 mode - instead of doing the
+	  * usual destination + 1 for the second half we get
+	  * destination + 4.
+	  */
+	 inst = emit(MOV(fs_reg(MRF, BRW_MRF_COMPR4 + first_color_mrf + index,
+                                color.type),
+                         color));
+	 inst->saturate = c->key.clamp_fragment_color;
+      } else {
+	 push_force_uncompressed();
+	 inst = emit(MOV(fs_reg(MRF, first_color_mrf + index, color.type),
+                         color));
+	 inst->saturate = c->key.clamp_fragment_color;
+	 pop_force_uncompressed();
+
+	 inst = emit(MOV(fs_reg(MRF, first_color_mrf + index + 4, color.type),
+                         half(color, 1)));
+	 inst->force_sechalf = true;
+	 inst->saturate = c->key.clamp_fragment_color;
+      }
+   }
+}
+
+static int
+cond_for_alpha_func(GLenum func)
+{
+   switch(func) {
+      case GL_GREATER:
+         return BRW_CONDITIONAL_G;
+      case GL_GEQUAL:
+         return BRW_CONDITIONAL_GE;
+      case GL_LESS:
+         return BRW_CONDITIONAL_L;
+      case GL_LEQUAL:
+         return BRW_CONDITIONAL_LE;
+      case GL_EQUAL:
+         return BRW_CONDITIONAL_EQ;
+      case GL_NOTEQUAL:
+         return BRW_CONDITIONAL_NEQ;
+      default:
+         assert(!"Not reached");
+         return 0;
+   }
+}
+
+/**
+ * Alpha test support for when we compile it into the shader instead
+ * of using the normal fixed-function alpha test.
+ */
+void
+fs_visitor::emit_alpha_test()
+{
+   this->current_annotation = "Alpha test";
+
+   fs_inst *cmp;
+   if (c->key.alpha_test_func == GL_ALWAYS)
+      return;
+
+   if (c->key.alpha_test_func == GL_NEVER) {
+      /* f0.1 = 0 */
+      fs_reg some_reg = fs_reg(retype(brw_vec8_grf(0, 0),
+                                      BRW_REGISTER_TYPE_UW));
+      cmp = emit(CMP(reg_null_f, some_reg, some_reg,
+                     BRW_CONDITIONAL_NEQ));
+   } else {
+      /* RT0 alpha */
+      fs_reg color = outputs[0];
+      color.reg_offset += 3;
+
+      /* f0.1 &= func(color, ref) */
+      cmp = emit(CMP(reg_null_f, color, fs_reg(c->key.alpha_test_ref),
+                     cond_for_alpha_func(c->key.alpha_test_func)));
+   }
+   cmp->predicate = BRW_PREDICATE_NORMAL;
+   cmp->flag_subreg = 1;
+}
+
+void
+fs_visitor::emit_fb_writes()
+{
+   this->current_annotation = "FB write header";
+   bool header_present = true;
+   /* We can potentially have a message length of up to 15, so we have to set
+    * base_mrf to either 0 or 1 in order to fit in m0..m15.
+    */
+   int base_mrf = 1;
+   int nr = base_mrf;
+   int reg_width = dispatch_width / 8;
+   bool src0_alpha_to_render_target = false;
+
+   if (do_dual_src) {
+      no16("GL_ARB_blend_func_extended not yet supported in SIMD16.");
+      if (dispatch_width == 16)
+         do_dual_src = false;
+   }
+
+   /* From the Sandy Bridge PRM, volume 4, page 198:
+    *
+    *     "Dispatched Pixel Enables. One bit per pixel indicating
+    *      which pixels were originally enabled when the thread was
+    *      dispatched. This field is only required for the end-of-
+    *      thread message and on all dual-source messages."
+    */
+   if (brw->gen >= 6 &&
+       (brw->is_haswell || brw->gen >= 8 || !this->fp->UsesKill) &&
+       !do_dual_src &&
+       c->key.nr_color_regions == 1) {
+      header_present = false;
+   }
+
+   if (header_present) {
+      src0_alpha_to_render_target = brw->gen >= 6 &&
+				    !do_dual_src &&
+                                    c->key.replicate_alpha;
+      /* m2, m3 header */
+      nr += 2;
+   }
+
+   if (c->aa_dest_stencil_reg) {
+      push_force_uncompressed();
+      emit(MOV(fs_reg(MRF, nr++),
+               fs_reg(brw_vec8_grf(c->aa_dest_stencil_reg, 0))));
+      pop_force_uncompressed();
+   }
+
+   c->prog_data.uses_omask =
+      fp->Base.OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK);
+   if (c->prog_data.uses_omask) {
+      this->current_annotation = "FB write oMask";
+      assert(this->sample_mask.file != BAD_FILE);
+      /* Hand over gl_SampleMask. Only lower 16 bits are relevant. */
+      emit(FS_OPCODE_SET_OMASK, fs_reg(MRF, nr, BRW_REGISTER_TYPE_UW), this->sample_mask);
+      nr += 1;
+   }
+
+   /* Reserve space for color. It'll be filled in per MRT below. */
+   int color_mrf = nr;
+   nr += 4 * reg_width;
+   if (do_dual_src)
+      nr += 4;
+   if (src0_alpha_to_render_target)
+      nr += reg_width;
+
+   if (c->source_depth_to_render_target) {
+      if (brw->gen == 6) {
+	 /* For outputting oDepth on gen6, SIMD8 writes have to be
+	  * used.  This would require SIMD8 moves of each half to
+	  * message regs, kind of like pre-gen5 SIMD16 FB writes.
+	  * Just bail on doing so for now.
+	  */
+	 no16("Missing support for simd16 depth writes on gen6\n");
+      }
+
+      if (prog->OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_DEPTH)) {
+	 /* Hand over gl_FragDepth. */
+	 assert(this->frag_depth.file != BAD_FILE);
+	 emit(MOV(fs_reg(MRF, nr), this->frag_depth));
+      } else {
+	 /* Pass through the payload depth. */
+	 emit(MOV(fs_reg(MRF, nr),
+                  fs_reg(brw_vec8_grf(c->source_depth_reg, 0))));
+      }
+      nr += reg_width;
+   }
+
+   if (c->dest_depth_reg) {
+      emit(MOV(fs_reg(MRF, nr),
+               fs_reg(brw_vec8_grf(c->dest_depth_reg, 0))));
+      nr += reg_width;
+   }
+
+   if (do_dual_src) {
+      fs_reg src0 = this->outputs[0];
+      fs_reg src1 = this->dual_src_output;
+
+      this->current_annotation = ralloc_asprintf(this->mem_ctx,
+						 "FB write src0");
+      for (int i = 0; i < 4; i++) {
+	 fs_inst *inst = emit(MOV(fs_reg(MRF, color_mrf + i, src0.type), src0));
+	 src0.reg_offset++;
+	 inst->saturate = c->key.clamp_fragment_color;
+      }
+
+      this->current_annotation = ralloc_asprintf(this->mem_ctx,
+						 "FB write src1");
+      for (int i = 0; i < 4; i++) {
+	 fs_inst *inst = emit(MOV(fs_reg(MRF, color_mrf + 4 + i, src1.type),
+                                  src1));
+	 src1.reg_offset++;
+	 inst->saturate = c->key.clamp_fragment_color;
+      }
+
+//      if (INTEL_DEBUG & DEBUG_SHADER_TIME)
+//         emit_shader_time_end();
+
+      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
+      inst->target = 0;
+      inst->base_mrf = base_mrf;
+      inst->mlen = nr - base_mrf;
+      inst->eot = true;
+      inst->header_present = header_present;
+      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
+         inst->predicate = BRW_PREDICATE_NORMAL;
+         inst->flag_subreg = 1;
+      }
+
+      c->prog_data.dual_src_blend = true;
+      this->current_annotation = NULL;
+      return;
+   }
+
+   for (int target = 0; target < c->key.nr_color_regions; target++) {
+      this->current_annotation = ralloc_asprintf(this->mem_ctx,
+						 "FB write target %d",
+						 target);
+      /* If src0_alpha_to_render_target is true, include source zero alpha
+       * data in RenderTargetWrite message for targets > 0.
+       */
+      int write_color_mrf = color_mrf;
+      if (src0_alpha_to_render_target && target != 0) {
+         fs_inst *inst;
+         fs_reg color = outputs[0];
+         color.reg_offset += 3;
+
+         inst = emit(MOV(fs_reg(MRF, write_color_mrf, color.type),
+                         color));
+         inst->saturate = c->key.clamp_fragment_color;
+         write_color_mrf = color_mrf + reg_width;
+      }
+
+      for (unsigned i = 0; i < this->output_components[target]; i++)
+         emit_color_write(target, i, write_color_mrf);
+
+      bool eot = false;
+      if (target == c->key.nr_color_regions - 1) {
+         eot = true;
+
+//         if (INTEL_DEBUG & DEBUG_SHADER_TIME)
+//            emit_shader_time_end();
+      }
+
+      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
+      inst->target = target;
+      inst->base_mrf = base_mrf;
+      if (src0_alpha_to_render_target && target == 0)
+         inst->mlen = nr - base_mrf - reg_width;
+      else
+         inst->mlen = nr - base_mrf;
+      inst->eot = eot;
+      inst->header_present = header_present;
+      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
+         inst->predicate = BRW_PREDICATE_NORMAL;
+         inst->flag_subreg = 1;
+      }
+   }
+
+   if (c->key.nr_color_regions == 0) {
+      /* Even if there's no color buffers enabled, we still need to send
+       * alpha out the pipeline to our null renderbuffer to support
+       * alpha-testing, alpha-to-coverage, and so on.
+       */
+      emit_color_write(0, 3, color_mrf);
+
+//      if (INTEL_DEBUG & DEBUG_SHADER_TIME)
+//         emit_shader_time_end();
+
+      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
+      inst->base_mrf = base_mrf;
+      inst->mlen = nr - base_mrf;
+      inst->eot = true;
+      inst->header_present = header_present;
+      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
+         inst->predicate = BRW_PREDICATE_NORMAL;
+         inst->flag_subreg = 1;
+      }
+   }
+
+   this->current_annotation = NULL;
+}
+
+void
+fs_visitor::resolve_ud_negate(fs_reg *reg)
+{
+   if (reg->type != BRW_REGISTER_TYPE_UD ||
+       !reg->negate)
+      return;
+
+   fs_reg temp = fs_reg(this, glsl_type::uint_type);
+   emit(MOV(temp, *reg));
+   *reg = temp;
+}
+
+void
+fs_visitor::resolve_bool_comparison(ir_rvalue *rvalue, fs_reg *reg)
+{
+   if (rvalue->type != glsl_type::bool_type)
+      return;
+
+   fs_reg temp = fs_reg(this, glsl_type::bool_type);
+   emit(AND(temp, *reg, fs_reg(1)));
+   *reg = temp;
+}
+
+fs_visitor::fs_visitor(struct brw_context *brw,
+                       struct brw_wm_compile *c,
+                       struct gl_shader_program *shader_prog,
+                       struct gl_fragment_program *fp,
+                       unsigned dispatch_width)
+   : backend_visitor(brw, shader_prog, &fp->Base, &c->prog_data.base,
+                     MESA_SHADER_FRAGMENT),
+     dispatch_width(dispatch_width)
+{
+   this->c = c;
+   this->fp = fp;
+   this->mem_ctx = ralloc_context(NULL);
+   this->failed = false;
+   this->simd16_unsupported = false;
+   this->no16_msg = NULL;
+   this->variable_ht = hash_table_ctor(0,
+                                       hash_table_pointer_hash,
+                                       hash_table_pointer_compare);
+
+   memset(this->outputs, 0, sizeof(this->outputs));
+   memset(this->output_components, 0, sizeof(this->output_components));
+   this->first_non_payload_grf = 0;
+   this->max_grf = brw->gen >= 7 ? GEN7_MRF_HACK_START : BRW_MAX_GRF;
+
+   this->current_annotation = NULL;
+   this->base_ir = NULL;
+
+   this->virtual_grf_sizes = NULL;
+   this->virtual_grf_count = 0;
+   this->virtual_grf_array_size = 0;
+   this->virtual_grf_start = NULL;
+   this->virtual_grf_end = NULL;
+   this->live_intervals = NULL;
+   this->regs_live_at_ip = NULL;
+
+   this->uniforms = 0;
+   this->pull_constant_loc = NULL;
+   this->push_constant_loc = NULL;
+
+   this->force_uncompressed_stack = 0;
+
+   this->spilled_any_registers = false;
+   this->do_dual_src = false;
+
+   if (dispatch_width == 8)
+      this->param_size = rzalloc_array(mem_ctx, int, stage_prog_data->nr_params);
+}
+
+fs_visitor::~fs_visitor()
+{
+   ralloc_free(this->mem_ctx);
+   hash_table_dtor(this->variable_ht);
+}
diff --git a/icd/intel/compiler/pipeline/brw_lower_texture_gradients.cpp b/icd/intel/compiler/pipeline/brw_lower_texture_gradients.cpp
new file mode 100644
index 0000000..1589a20
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_lower_texture_gradients.cpp
@@ -0,0 +1,179 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file brw_lower_texture_gradients.cpp
+ */
+
+#include "glsl/ir.h"
+#include "glsl/ir_builder.h"
+#include "program/prog_instruction.h"
+#include "brw_context.h"
+
+using namespace ir_builder;
+
+class lower_texture_grad_visitor : public ir_hierarchical_visitor {
+public:
+   lower_texture_grad_visitor(bool has_sample_d_c)
+      : has_sample_d_c(has_sample_d_c)
+   {
+      progress = false;
+   }
+
+   ir_visitor_status visit_leave(ir_texture *ir);
+
+
+   bool progress;
+   bool has_sample_d_c;
+
+private:
+   void emit(ir_variable *, ir_rvalue *);
+};
+
+/**
+ * Emit a variable declaration and an assignment to initialize it.
+ */
+void
+lower_texture_grad_visitor::emit(ir_variable *var, ir_rvalue *value)
+{
+   base_ir->insert_before(var);
+   base_ir->insert_before(assign(var, value));
+}
+
+static const glsl_type *
+txs_type(const glsl_type *type)
+{
+   unsigned dims;
+   switch (type->sampler_dimensionality) {
+   case GLSL_SAMPLER_DIM_1D:
+      dims = 1;
+      break;
+   case GLSL_SAMPLER_DIM_2D:
+   case GLSL_SAMPLER_DIM_RECT:
+   case GLSL_SAMPLER_DIM_CUBE:
+      dims = 2;
+      break;
+   case GLSL_SAMPLER_DIM_3D:
+      dims = 3;
+      break;
+   default:
+      assert(!"Should not get here: invalid sampler dimensionality");
+      dims = 2;
+   }
+
+   if (type->sampler_array)
+      dims++;
+
+   return glsl_type::get_instance(GLSL_TYPE_INT, dims, 1);
+}
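+
+/* Illustrative note, not in the original source: txs_type() picks the ivecN
+ * that textureSize() returns for the sampler being lowered.  For example, a
+ * samplerCubeShadow has GLSL_SAMPLER_DIM_CUBE and no array flag, so dims is
+ * 2 and the pass sees an ivec2 (per-face width, height); a
+ * sampler2DArrayShadow gets dims == 2 + 1 == 3, i.e. an ivec3.
+ */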
+
+ir_visitor_status
+lower_texture_grad_visitor::visit_leave(ir_texture *ir)
+{
+   /* Only lower textureGrad with shadow samplers */
+   if (ir->op != ir_txd || !ir->shadow_comparitor)
+      return visit_continue;
+
+   /* Lower textureGrad() with samplerCubeShadow even if we have the sample_d_c
+    * message.  GLSL provides gradients for the 'r' coordinate.  Unfortunately:
+    *
+    * From the Ivybridge PRM, Volume 4, Part 1, sample_d message description:
+    * "The r coordinate contains the faceid, and the r gradients are ignored
+    *  by hardware."
+    *
+    * We likely need similar treatment for samplerCube and samplerCubeArray,
+    * but we have insufficient testing for that at the moment.
+    */
+   bool need_lowering = !has_sample_d_c ||
+      ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
+
+   if (!need_lowering)
+      return visit_continue;
+
+   void *mem_ctx = ralloc_parent(ir);
+
+   const glsl_type *grad_type = ir->lod_info.grad.dPdx->type;
+
+   /* Use textureSize() to get the width and height of LOD 0; swizzle away
+    * the depth/number of array slices.
+    */
+   ir_texture *txs = new(mem_ctx) ir_texture(ir_txs);
+   txs->set_sampler(ir->sampler->clone(mem_ctx, NULL),
+		    txs_type(ir->sampler->type));
+   txs->lod_info.lod = new(mem_ctx) ir_constant(0);
+   ir_variable *size =
+      new(mem_ctx) ir_variable(grad_type, "size", ir_var_temporary);
+   if (ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE) {
+      base_ir->insert_before(size);
+      base_ir->insert_before(assign(size, expr(ir_unop_i2f, txs), WRITEMASK_XY));
+      base_ir->insert_before(assign(size, new(mem_ctx) ir_constant(1.0f), WRITEMASK_Z));
+   } else {
+      emit(size, expr(ir_unop_i2f,
+                      swizzle_for_size(txs, grad_type->vector_elements)));
+   }
+
+   /* Scale the gradients by width and height.  Effectively, the incoming
+    * gradients are s'(x,y), t'(x,y), and r'(x,y) from equation 3.19 in the
+    * GL 3.0 spec; we want u'(x,y), which is w_t * s'(x,y).
+    */
+   ir_variable *dPdx =
+      new(mem_ctx) ir_variable(grad_type, "dPdx", ir_var_temporary);
+   emit(dPdx, mul(size, ir->lod_info.grad.dPdx));
+
+   ir_variable *dPdy =
+      new(mem_ctx) ir_variable(grad_type, "dPdy", ir_var_temporary);
+   emit(dPdy, mul(size, ir->lod_info.grad.dPdy));
+
+   /* Calculate rho from equation 3.20 of the GL 3.0 specification. */
+   ir_rvalue *rho;
+   if (dPdx->type->is_scalar()) {
+      rho = expr(ir_binop_max, expr(ir_unop_abs, dPdx),
+			       expr(ir_unop_abs, dPdy));
+   } else {
+      rho = expr(ir_binop_max, expr(ir_unop_sqrt, dot(dPdx, dPdx)),
+			       expr(ir_unop_sqrt, dot(dPdy, dPdy)));
+   }
+
+   /* lambda_base = log2(rho).  We're ignoring GL state biases for now. */
+   ir->op = ir_txl;
+   ir->lod_info.lod = expr(ir_unop_log2, rho);
+
+   progress = true;
+   return visit_continue;
+}
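+
+/* Rough GLSL-level sketch of the transform above (illustrative only, not
+ * part of the original source).  For a sampler2DShadow s, the pass turns
+ *
+ *    result = textureGrad(s, P, dPdx, dPdy);
+ *
+ * into the equivalent of
+ *
+ *    vec2 size = vec2(textureSize(s, 0));
+ *    vec2 dx = size * dPdx, dy = size * dPdy;
+ *    float rho = max(sqrt(dot(dx, dx)), sqrt(dot(dy, dy)));
+ *    result = textureLod(s, P, log2(rho));
+ */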
+
+extern "C" {
+
+bool
+brw_lower_texture_gradients(struct brw_context *brw,
+                            struct exec_list *instructions)
+{
+   bool has_sample_d_c = brw->gen >= 8 || brw->is_haswell;
+   lower_texture_grad_visitor v(has_sample_d_c);
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
+
+}
diff --git a/icd/intel/compiler/pipeline/brw_lower_unnormalized_offset.cpp b/icd/intel/compiler/pipeline/brw_lower_unnormalized_offset.cpp
new file mode 100644
index 0000000..c95d7f3
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_lower_unnormalized_offset.cpp
@@ -0,0 +1,101 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file brw_lower_unnormalized_offset.cpp
+ *
+ * IR lowering pass that converts a texture offset into an adjusted
+ * coordinate, for use with unnormalized coordinates.  At least the gather4*
+ * messages on Ivybridge and Haswell mishandle nonzero offsets.
+ *
+ * \author Chris Forbes <chrisf@ijw.co.nz>
+ */
+
+#include "glsl/glsl_types.h"
+#include "glsl/ir.h"
+#include "glsl/ir_builder.h"
+
+using namespace ir_builder;
+
+class brw_lower_unnormalized_offset_visitor : public ir_hierarchical_visitor {
+public:
+   brw_lower_unnormalized_offset_visitor()
+   {
+      progress = false;
+   }
+
+   ir_visitor_status visit_leave(ir_texture *ir);
+
+   bool progress;
+};
+
+ir_visitor_status
+brw_lower_unnormalized_offset_visitor::visit_leave(ir_texture *ir)
+{
+   if (!ir->offset)
+      return visit_continue;
+
+   if (ir->op == ir_tg4 || ir->op == ir_tex) {
+      if (ir->sampler->type->sampler_dimensionality != GLSL_SAMPLER_DIM_RECT)
+         return visit_continue;
+   } else if (ir->op != ir_txf) {
+      return visit_continue;
+   }
+
+   void *mem_ctx = ralloc_parent(ir);
+
+   if (ir->op == ir_txf) {
+      ir_variable *var = new(mem_ctx) ir_variable(ir->coordinate->type,
+                                                  "coordinate",
+                                                  ir_var_temporary);
+      base_ir->insert_before(var);
+      base_ir->insert_before(assign(var, ir->coordinate));
+      base_ir->insert_before(assign(var,
+               add(swizzle_for_size(var, ir->offset->type->vector_elements), ir->offset),
+               (1 << ir->offset->type->vector_elements) - 1));
+
+      ir->coordinate = new(mem_ctx) ir_dereference_variable(var);
+   } else {
+      ir->coordinate = add(ir->coordinate, i2f(ir->offset));
+   }
+
+   ir->offset = NULL;
+
+   progress = true;
+   return visit_continue;
+}
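+
+/* Illustrative note, not in the original source: for ir_txf the visitor
+ * above effectively rewrites
+ *
+ *    texelFetchOffset(s, coord, lod, offset)
+ *
+ * as
+ *
+ *    texelFetch(s, coord + offset, lod)
+ *
+ * while for rect samplers (ir_tex/ir_tg4) it folds the integer offset into
+ * the unnormalized float coordinate via an i2f conversion.
+ */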
+
+extern "C" {
+
+bool
+brw_do_lower_unnormalized_offset(exec_list *instructions)
+{
+   brw_lower_unnormalized_offset_visitor v;
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
+
+}
diff --git a/icd/intel/compiler/pipeline/brw_program.c b/icd/intel/compiler/pipeline/brw_program.c
new file mode 100644
index 0000000..a56c947
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_program.c
@@ -0,0 +1,606 @@
+/*
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to
+ develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+#include <pthread.h>
+#include "main/imports.h"
+#include "main/enums.h"
+//#include "main/shaderobj.h" // LunarG: Remove
+#include "program/prog_parameter.h"
+#include "program/prog_print.h"
+#include "program/program.h"
+//#include "program/programopt.h" // LunarG: Remove
+//#include "tnl/tnl.h" // LunarG: Remove
+#include "glsl/ralloc.h"
+#include "glsl/ir.h"
+
+#include "brw_context.h"
+#include "brw_wm.h"
+
+static unsigned
+get_new_program_id(struct intel_screen *screen)
+{
+   unsigned id = screen->program_id++;
+
+   return id;
+}
+
+//static void brwBindProgram( struct gl_context *ctx,
+//			    GLenum target,
+//			    struct gl_program *prog )
+//{
+//   struct brw_context *brw = brw_context(ctx);
+
+//   switch (target) {
+//   case GL_VERTEX_PROGRAM_ARB:
+//      brw->state.dirty.brw |= BRW_NEW_VERTEX_PROGRAM;
+//      break;
+//   case MESA_GEOMETRY_PROGRAM:
+//      brw->state.dirty.brw |= BRW_NEW_GEOMETRY_PROGRAM;
+//      break;
+//   case GL_FRAGMENT_PROGRAM_ARB:
+//      brw->state.dirty.brw |= BRW_NEW_FRAGMENT_PROGRAM;
+//      break;
+//   }
+//}
+
+struct gl_program *brwNewProgram( struct gl_context *ctx,
+                      GLenum target,
+                      GLuint id )
+{
+   struct brw_context *brw = brw_context(ctx);
+
+   switch (target) {
+   case GL_VERTEX_PROGRAM_ARB: {
+      struct brw_vertex_program *prog = CALLOC_STRUCT(brw_vertex_program);
+      if (prog) {
+         prog->id = get_new_program_id(brw->intelScreen);
+
+         return _mesa_init_vertex_program(ctx, &prog->program, target, id);
+      } else {
+         return NULL;
+      }
+   }
+
+   case MESA_GEOMETRY_PROGRAM: {
+      struct brw_geometry_program *prog = CALLOC_STRUCT(brw_geometry_program);
+      if (prog) {
+         prog->id = get_new_program_id(brw->intelScreen);
+
+         return _mesa_init_geometry_program(ctx, &prog->program, target, id);
+      } else {
+         return NULL;
+      }
+   }
+
+   case GL_FRAGMENT_PROGRAM_ARB: {
+      struct brw_fragment_program *prog = CALLOC_STRUCT(brw_fragment_program);
+      if (prog) {
+         prog->id = get_new_program_id(brw->intelScreen);
+
+         return _mesa_init_fragment_program(ctx, &prog->program, target, id);
+      } else {
+         return NULL;
+      }
+   }
+
+//   case GL_COMPUTE_PROGRAM_NV: {
+//      struct brw_compute_program *prog = CALLOC_STRUCT(brw_compute_program);
+//      if (prog) {
+//         prog->id = get_new_program_id(brw->intelScreen);
+
+//         return _mesa_init_compute_program(ctx, &prog->program, target, id);
+//      } else {
+//         return NULL;
+//      }
+//   }
+
+   default:
+      assert(!"Unsupported target in brwNewProgram()");
+      return NULL;
+   }
+}
+
+void brwDeleteProgram( struct gl_context *ctx,
+                  struct gl_program *prog )
+{
+   _mesa_delete_program( ctx, prog );
+}
+
+
+// LunarG : Remove - Most of this is shader-time related and may be turned back on later
+
+//static GLboolean
+//brwIsProgramNative(struct gl_context *ctx,
+//		   GLenum target,
+//		   struct gl_program *prog)
+//{
+//   return true;
+//}
+
+//static GLboolean
+//brwProgramStringNotify(struct gl_context *ctx,
+//		       GLenum target,
+//		       struct gl_program *prog)
+//{
+//   struct brw_context *brw = brw_context(ctx);
+
+//   switch (target) {
+//   case GL_FRAGMENT_PROGRAM_ARB: {
+//      struct gl_fragment_program *fprog = (struct gl_fragment_program *) prog;
+//      struct brw_fragment_program *newFP = brw_fragment_program(fprog);
+//      const struct brw_fragment_program *curFP =
+//         brw_fragment_program_const(brw->fragment_program);
+
+//      if (newFP == curFP)
+//	 brw->state.dirty.brw |= BRW_NEW_FRAGMENT_PROGRAM;
+//      newFP->id = get_new_program_id(brw->intelScreen);
+//      break;
+//   }
+//   case GL_VERTEX_PROGRAM_ARB: {
+//      struct gl_vertex_program *vprog = (struct gl_vertex_program *) prog;
+//      struct brw_vertex_program *newVP = brw_vertex_program(vprog);
+//      const struct brw_vertex_program *curVP =
+//         brw_vertex_program_const(brw->vertex_program);
+
+//      if (newVP == curVP)
+//	 brw->state.dirty.brw |= BRW_NEW_VERTEX_PROGRAM;
+//      if (newVP->program.IsPositionInvariant) {
+//	 _mesa_insert_mvp_code(ctx, &newVP->program);
+//      }
+//      newVP->id = get_new_program_id(brw->intelScreen);
+
+//      /* Also tell tnl about it:
+//       */
+//      _tnl_program_string(ctx, target, prog);
+//      break;
+//   }
+//   default:
+//      /*
+//       * driver->ProgramStringNotify is only called for ARB programs, fixed
+//       * function vertex programs, and ir_to_mesa (which isn't used by the
+//       * i965 back-end).  Therefore, even after geometry shaders are added,
+//       * this function should only ever be called with a target of
+//       * GL_VERTEX_PROGRAM_ARB or GL_FRAGMENT_PROGRAM_ARB.
+//       */
+//      assert(!"Unexpected target in brwProgramStringNotify");
+//      break;
+//   }
+
+//   brw_add_texrect_params(prog);
+
+//   return true;
+//}
+
+//void
+//brw_add_texrect_params(struct gl_program *prog)
+//{
+//   for (int texunit = 0; texunit < BRW_MAX_TEX_UNIT; texunit++) {
+//      if (!(prog->TexturesUsed[texunit] & (1 << TEXTURE_RECT_INDEX)))
+//         continue;
+
+//      int tokens[STATE_LENGTH] = {
+//         STATE_INTERNAL,
+//         STATE_TEXRECT_SCALE,
+//         texunit,
+//         0,
+//         0
+//      };
+
+//      _mesa_add_state_reference(prog->Parameters, (gl_state_index *)tokens);
+//   }
+//}
+
+/* Per-thread scratch space is a power-of-two multiple of 1KB. */
+int
+brw_get_scratch_size(int size)
+{
+   int i;
+
+   for (i = 1024; i < size; i *= 2)
+      ;
+
+   return i;
+}
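+
+/* Example, not in the original source: the loop above rounds the requested
+ * size up to the next power of two, with a 1KB floor:
+ *
+ *    brw_get_scratch_size(1)    == 1024
+ *    brw_get_scratch_size(1024) == 1024
+ *    brw_get_scratch_size(1500) == 2048
+ */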
+
+//void
+//brw_get_scratch_bo(struct brw_context *brw,
+//		   drm_intel_bo **scratch_bo, int size)
+//{
+//   drm_intel_bo *old_bo = *scratch_bo;
+
+//   if (old_bo && old_bo->size < size) {
+//      drm_intel_bo_unreference(old_bo);
+//      old_bo = NULL;
+//   }
+
+//   if (!old_bo) {
+//      *scratch_bo = drm_intel_bo_alloc(brw->bufmgr, "scratch bo", size, 4096);
+//   }
+//}
+
+//void brwInitFragProgFuncs( struct dd_function_table *functions )
+//{
+//   assert(functions->ProgramStringNotify == _tnl_program_string);
+
+//   functions->BindProgram = brwBindProgram;
+//   functions->NewProgram = brwNewProgram;
+//   functions->DeleteProgram = brwDeleteProgram;
+//   functions->IsProgramNative = brwIsProgramNative;
+//   functions->ProgramStringNotify = brwProgramStringNotify;
+
+//   functions->NewShader = brw_new_shader;
+//   functions->NewShaderProgram = brw_new_shader_program;
+//   functions->LinkShader = brw_link_shader;
+//   functions->NotifyLinkShader = brw_notify_link_shader;
+//}
+
+//void
+//brw_init_shader_time(struct brw_context *brw)
+//{
+//   const int max_entries = 8192;
+//   brw->shader_time.bo = drm_intel_bo_alloc(brw->bufmgr, "shader time",
+//                                            max_entries * SHADER_TIME_STRIDE,
+//                                            4096);
+//   brw->shader_time.shader_programs = rzalloc_array(brw, struct gl_shader_program *,
+//                                                    max_entries);
+//   brw->shader_time.programs = rzalloc_array(brw, struct gl_program *,
+//                                             max_entries);
+//   brw->shader_time.types = rzalloc_array(brw, enum shader_time_shader_type,
+//                                          max_entries);
+//   brw->shader_time.cumulative = rzalloc_array(brw, uint64_t,
+//                                               max_entries);
+//   brw->shader_time.max_entries = max_entries;
+//}
+
+//static int
+//compare_time(const void *a, const void *b)
+//{
+//   uint64_t * const *a_val = a;
+//   uint64_t * const *b_val = b;
+
+//   /* We don't just subtract because we're turning the value to an int. */
+//   if (**a_val < **b_val)
+//      return -1;
+//   else if (**a_val == **b_val)
+//      return 0;
+//   else
+//      return 1;
+//}
+
+//static void
+//get_written_and_reset(struct brw_context *brw, int i,
+//                      uint64_t *written, uint64_t *reset)
+//{
+//   enum shader_time_shader_type type = brw->shader_time.types[i];
+//   assert(type == ST_VS || type == ST_GS || type == ST_FS8 || type == ST_FS16);
+
+//   /* Find where we recorded written and reset. */
+//   int wi, ri;
+
+//   for (wi = i; brw->shader_time.types[wi] != type + 1; wi++)
+//      ;
+
+//   for (ri = i; brw->shader_time.types[ri] != type + 2; ri++)
+//      ;
+
+//   *written = brw->shader_time.cumulative[wi];
+//   *reset = brw->shader_time.cumulative[ri];
+//}
+
+//static void
+//print_shader_time_line(const char *stage, const char *name,
+//                       int shader_num, uint64_t time, uint64_t total)
+//{
+//   fprintf(stderr, "%-6s%-18s", stage, name);
+
+//   if (shader_num != -1)
+//      fprintf(stderr, "%4d: ", shader_num);
+//   else
+//      fprintf(stderr, "    : ");
+
+//   fprintf(stderr, "%16lld (%7.2f Gcycles)      %4.1f%%\n",
+//           (long long)time,
+//           (double)time / 1000000000.0,
+//           (double)time / total * 100.0);
+//}
+
+//static void
+//brw_report_shader_time(struct brw_context *brw)
+//{
+//   if (!brw->shader_time.bo || !brw->shader_time.num_entries)
+//      return;
+
+//   uint64_t scaled[brw->shader_time.num_entries];
+//   uint64_t *sorted[brw->shader_time.num_entries];
+//   uint64_t total_by_type[ST_FS16 + 1];
+//   memset(total_by_type, 0, sizeof(total_by_type));
+//   double total = 0;
+//   for (int i = 0; i < brw->shader_time.num_entries; i++) {
+//      uint64_t written = 0, reset = 0;
+//      enum shader_time_shader_type type = brw->shader_time.types[i];
+
+//      sorted[i] = &scaled[i];
+
+//      switch (type) {
+//      case ST_VS_WRITTEN:
+//      case ST_VS_RESET:
+//      case ST_GS_WRITTEN:
+//      case ST_GS_RESET:
+//      case ST_FS8_WRITTEN:
+//      case ST_FS8_RESET:
+//      case ST_FS16_WRITTEN:
+//      case ST_FS16_RESET:
+//         /* We'll handle these along with the time. */
+//         scaled[i] = 0;
+//         continue;
+
+//      case ST_VS:
+//      case ST_GS:
+//      case ST_FS8:
+//      case ST_FS16:
+//         get_written_and_reset(brw, i, &written, &reset);
+//         break;
+
+//      default:
+//         /* I sometimes want to print things that aren't the 3 shader times.
+//          * Just print the sum in that case.
+//          */
+//         written = 1;
+//         reset = 0;
+//         break;
+//      }
+
+//      uint64_t time = brw->shader_time.cumulative[i];
+//      if (written) {
+//         scaled[i] = time / written * (written + reset);
+//      } else {
+//         scaled[i] = time;
+//      }
+
+//      switch (type) {
+//      case ST_VS:
+//      case ST_GS:
+//      case ST_FS8:
+//      case ST_FS16:
+//         total_by_type[type] += scaled[i];
+//         break;
+//      default:
+//         break;
+//      }
+
+//      total += scaled[i];
+//   }
+
+//   if (total == 0) {
+//      fprintf(stderr, "No shader time collected yet\n");
+//      return;
+//   }
+
+//   qsort(sorted, brw->shader_time.num_entries, sizeof(sorted[0]), compare_time);
+
+//   fprintf(stderr, "\n");
+//   fprintf(stderr, "type          ID                  cycles spent                   %% of total\n");
+//   for (int s = 0; s < brw->shader_time.num_entries; s++) {
+//      const char *shader_name;
+//      const char *stage;
+//      /* Work back from the sorted pointers to the index of the time to print. */
+//      int i = sorted[s] - scaled;
+//      struct gl_shader_program *prog = brw->shader_time.shader_programs[i];
+
+//      if (scaled[i] == 0)
+//         continue;
+
+//      int shader_num = -1;
+//      if (prog) {
+//         shader_num = prog->Name;
+
+//         /* The fixed function fragment shader generates GLSL IR with a Name
+//          * of 0, and nothing else does.
+//          */
+//         if (prog->Label) {
+//            shader_name = prog->Label;
+//         } else if (shader_num == 0 &&
+//             (brw->shader_time.types[i] == ST_FS8 ||
+//              brw->shader_time.types[i] == ST_FS16)) {
+//            shader_name = "ff";
+//            shader_num = -1;
+//         } else {
+//            shader_name = "glsl";
+//         }
+//      } else if (brw->shader_time.programs[i]) {
+//         shader_num = brw->shader_time.programs[i]->Id;
+//         if (shader_num == 0) {
+//            shader_name = "ff";
+//            shader_num = -1;
+//         } else {
+//            shader_name = "prog";
+//         }
+//      } else {
+//         shader_name = "other";
+//      }
+
+//      switch (brw->shader_time.types[i]) {
+//      case ST_VS:
+//         stage = "vs";
+//         break;
+//      case ST_GS:
+//         stage = "gs";
+//         break;
+//      case ST_FS8:
+//         stage = "fs8";
+//         break;
+//      case ST_FS16:
+//         stage = "fs16";
+//         break;
+//      default:
+//         stage = "other";
+//         break;
+//      }
+
+//      print_shader_time_line(stage, shader_name, shader_num,
+//                             scaled[i], total);
+//   }
+
+//   fprintf(stderr, "\n");
+//   print_shader_time_line("total", "vs", -1, total_by_type[ST_VS], total);
+//   print_shader_time_line("total", "gs", -1, total_by_type[ST_GS], total);
+//   print_shader_time_line("total", "fs8", -1, total_by_type[ST_FS8], total);
+//   print_shader_time_line("total", "fs16", -1, total_by_type[ST_FS16], total);
+//}
+
+//static void
+//brw_collect_shader_time(struct brw_context *brw)
+//{
+//   if (!brw->shader_time.bo)
+//      return;
+
+//   /* This probably stalls on the last rendering.  We could fix that by
+//    * delaying reading the reports, but it doesn't look like it's a big
+//    * overhead compared to the cost of tracking the time in the first place.
+//    */
+//   drm_intel_bo_map(brw->shader_time.bo, true);
+
+//   uint32_t *times = brw->shader_time.bo->virtual;
+
+//   for (int i = 0; i < brw->shader_time.num_entries; i++) {
+//      brw->shader_time.cumulative[i] += times[i * SHADER_TIME_STRIDE / 4];
+//   }
+
+//   /* Zero the BO out to clear it out for our next collection.
+//    */
+//   memset(times, 0, brw->shader_time.bo->size);
+//   drm_intel_bo_unmap(brw->shader_time.bo);
+//}
+
+//void
+//brw_collect_and_report_shader_time(struct brw_context *brw)
+//{
+//   brw_collect_shader_time(brw);
+
+//   if (brw->shader_time.report_time == 0 ||
+//       get_time() - brw->shader_time.report_time >= 1.0) {
+//      brw_report_shader_time(brw);
+//      brw->shader_time.report_time = get_time();
+//   }
+//}
+
+///**
+// * Chooses an index in the shader_time buffer and sets up tracking information
+// * for our printouts.
+// *
+// * Note that this holds on to references to the underlying programs, which may
+// * change their lifetimes compared to normal operation.
+// */
+//int
+//brw_get_shader_time_index(struct brw_context *brw,
+//                          struct gl_shader_program *shader_prog,
+//                          struct gl_program *prog,
+//                          enum shader_time_shader_type type)
+//{
+//   struct gl_context *ctx = &brw->ctx;
+
+//   int shader_time_index = brw->shader_time.num_entries++;
+//   assert(shader_time_index < brw->shader_time.max_entries);
+//   brw->shader_time.types[shader_time_index] = type;
+
+//   _mesa_reference_shader_program(ctx,
+//                                  &brw->shader_time.shader_programs[shader_time_index],
+//                                  shader_prog);
+
+//   _mesa_reference_program(ctx,
+//                           &brw->shader_time.programs[shader_time_index],
+//                           prog);
+
+//   return shader_time_index;
+//}
+
+//void
+//brw_destroy_shader_time(struct brw_context *brw)
+//{
+//   drm_intel_bo_unreference(brw->shader_time.bo);
+//   brw->shader_time.bo = NULL;
+//}
+
+void
+brw_mark_surface_used(struct brw_stage_prog_data *prog_data,
+                      unsigned surf_index)
+{
+   assert(surf_index < BRW_MAX_SURFACES);
+
+   prog_data->binding_table.size_bytes =
+      MAX2(prog_data->binding_table.size_bytes, (surf_index + 1) * 4);
+}
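+
+/* Illustrative note, not in the original source: binding table entries are
+ * 4 bytes each, so marking surf_index 3 as used grows size_bytes to at
+ * least (3 + 1) * 4 == 16 bytes, i.e. room for entries 0..3.
+ */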
+
+bool
+brw_stage_prog_data_compare(const struct brw_stage_prog_data *a,
+                            const struct brw_stage_prog_data *b)
+{
+   /* Compare everything in the struct up to the pointer members. */
+   if (memcmp(a, b, offsetof(struct brw_stage_prog_data, param)))
+      return false;
+
+   if (memcmp(a->param, b->param, a->nr_params * sizeof(void *)))
+      return false;
+
+   if (memcmp(a->pull_param, b->pull_param, a->nr_pull_params * sizeof(void *)))
+      return false;
+
+   return true;
+}
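+
+/* Illustrative note, not in the original source: offsetof(..., param) is a
+ * valid comparison bound only on the assumption that param is the first
+ * pointer member of struct brw_stage_prog_data; the plain data before it is
+ * memcmp'd wholesale, and the pointed-to arrays are compared separately
+ * above.
+ */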
+
+void
+brw_stage_prog_data_free(const void *p)
+{
+   struct brw_stage_prog_data *prog_data = (struct brw_stage_prog_data *)p;
+
+   ralloc_free(prog_data->param);
+   ralloc_free(prog_data->pull_param);
+}
+
+void
+brw_dump_ir(struct brw_context *brw, const char *stage,
+            struct gl_shader_program *shader_prog,
+            struct gl_shader *shader, struct gl_program *prog)
+{
+//   if (shader_prog) {
+   assert(shader_prog);
+   fprintf(stderr,
+           "GLSL IR for native %s shader %d:\n", stage, shader_prog->Name);
+   _mesa_print_ir(stderr, shader->ir, NULL);
+   fprintf(stderr, "\n\n");
+//   } else {
+//      fprintf(stderr, "ARB_%s_program %d ir for native %s shader\n",
+//              stage, prog->Id, stage);
+//      _mesa_print_program(prog);
+//   }
+}
diff --git a/icd/intel/compiler/pipeline/brw_program.h b/icd/intel/compiler/pipeline/brw_program.h
new file mode 100644
index 0000000..53ddc32
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_program.h
@@ -0,0 +1,101 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef BRW_PROGRAM_H
+#define BRW_PROGRAM_H
+
+enum gen6_gather_sampler_wa {
+   WA_SIGN = 1,      /* whether we need to sign extend */
+   WA_8BIT = 2,      /* if we have an 8bit format needing wa */
+   WA_16BIT = 4,     /* if we have a 16bit format needing wa */
+};
+
+/**
+ * Sampler information needed by VS, WM, and GS program cache keys.
+ */
+struct brw_sampler_prog_key_data {
+   /**
+    * EXT_texture_swizzle and DEPTH_TEXTURE_MODE swizzles.
+    */
+   uint16_t swizzles[MAX_SAMPLERS];
+
+   uint32_t gl_clamp_mask[3];
+
+   /**
+    * For RG32F, gather4's channel select is broken.
+    */
+   uint32_t gather_channel_quirk_mask;
+
+   /**
+    * Whether this sampler uses the compressed multisample surface layout.
+    */
+   uint32_t compressed_multisample_layout_mask;
+
+   /**
+    * For Sandybridge, which shader workaround (w/a) we need for gather quirks.
+    */
+   uint8_t gen6_gather_wa[MAX_SAMPLERS];
+};
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void brw_populate_sampler_prog_key_data(struct gl_context *ctx,
+				        const struct gl_program *prog,
+                                        unsigned sampler_count,
+				        struct brw_sampler_prog_key_data *key);
+bool brw_debug_recompile_sampler_key(struct brw_context *brw,
+                                     const struct brw_sampler_prog_key_data *old_key,
+                                     const struct brw_sampler_prog_key_data *key);
+void brw_add_texrect_params(struct gl_program *prog);
+
+void
+brw_mark_surface_used(struct brw_stage_prog_data *prog_data,
+                      unsigned surf_index);
+
+bool
+brw_stage_prog_data_compare(const struct brw_stage_prog_data *a,
+                            const struct brw_stage_prog_data *b);
+
+void
+brw_stage_prog_data_free(const void *prog_data);
+
+void
+brw_dump_ir(struct brw_context *brw, const char *stage,
+            struct gl_shader_program *shader_prog,
+            struct gl_shader *shader, struct gl_program *prog);
+
+// LunarG : ADD
+struct gl_program *brwNewProgram( struct gl_context *ctx,
+                                 GLenum target,
+                                 GLuint id );
+
+void brwDeleteProgram( struct gl_context *ctx,
+                       struct gl_program *prog );
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif
diff --git a/icd/intel/compiler/pipeline/brw_reg.h b/icd/intel/compiler/pipeline/brw_reg.h
new file mode 100644
index 0000000..38c0ae5
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_reg.h
@@ -0,0 +1,891 @@
+/*
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to
+ develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+/** @file brw_reg.h
+ *
+ * This file defines struct brw_reg, which is our representation for EU
+ * registers.  They're not a hardware specific format, just an abstraction
+ * that intends to capture the full flexibility of the hardware registers.
+ *
+ * The brw_eu_emit.c layer's brw_set_dest/brw_set_src[01] functions encode
+ * the abstract brw_reg type into the actual hardware instruction encoding.
+ */
+
+#ifndef BRW_REG_H
+#define BRW_REG_H
+
+#include <stdbool.h>
+#include "main/imports.h"
+#include "main/compiler.h"
+#include "program/prog_instruction.h"
+#include "brw_defines.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Number of general purpose registers (VS, WM, etc) */
+#define BRW_MAX_GRF 128
+
+/**
+ * First GRF used for the MRF hack.
+ *
+ * On gen7, MRFs are no longer used, and contiguous GRFs are used instead.  We
+ * haven't converted our compiler to be aware of this, so it asks for MRFs and
+ * brw_eu_emit.c quietly converts them to be accesses of the top GRFs.  The
+ * register allocators have to be careful of this to avoid corrupting the "MRF"s
+ * with actual GRF allocations.
+ */
+#define GEN7_MRF_HACK_START 112
+
+/** Number of message register file registers */
+#define BRW_MAX_MRF 16
+
+#define BRW_SWIZZLE4(a,b,c,d) (((a)<<0) | ((b)<<2) | ((c)<<4) | ((d)<<6))
+#define BRW_GET_SWZ(swz, idx) (((swz) >> ((idx)*2)) & 0x3)
+
+#define BRW_SWIZZLE_NOOP      BRW_SWIZZLE4(0,1,2,3)
+#define BRW_SWIZZLE_XYZW      BRW_SWIZZLE4(0,1,2,3)
+#define BRW_SWIZZLE_XXXX      BRW_SWIZZLE4(0,0,0,0)
+#define BRW_SWIZZLE_YYYY      BRW_SWIZZLE4(1,1,1,1)
+#define BRW_SWIZZLE_ZZZZ      BRW_SWIZZLE4(2,2,2,2)
+#define BRW_SWIZZLE_WWWW      BRW_SWIZZLE4(3,3,3,3)
+#define BRW_SWIZZLE_XYXY      BRW_SWIZZLE4(0,1,0,1)
+#define BRW_SWIZZLE_YZXW      BRW_SWIZZLE4(1,2,0,3)
+#define BRW_SWIZZLE_ZXYW      BRW_SWIZZLE4(2,0,1,3)
+#define BRW_SWIZZLE_ZWZW      BRW_SWIZZLE4(2,3,2,3)
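+
+/* Worked example, not in the original source: each channel select occupies
+ * two bits, so BRW_SWIZZLE_XYZW == BRW_SWIZZLE4(0,1,2,3) == 0xe4, and
+ * BRW_GET_SWZ(0xe4, 2) == (0xe4 >> 4) & 0x3 == 2, i.e. component Z reads
+ * from source channel Z.
+ */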
+
+static inline bool
+brw_is_single_value_swizzle(int swiz)
+{
+   return (swiz == BRW_SWIZZLE_XXXX ||
+           swiz == BRW_SWIZZLE_YYYY ||
+           swiz == BRW_SWIZZLE_ZZZZ ||
+           swiz == BRW_SWIZZLE_WWWW);
+}
+
+enum PACKED brw_reg_type {
+   BRW_REGISTER_TYPE_UD = 0,
+   BRW_REGISTER_TYPE_D,
+   BRW_REGISTER_TYPE_UW,
+   BRW_REGISTER_TYPE_W,
+   BRW_REGISTER_TYPE_F,
+
+   /** Non-immediates only: @{ */
+   BRW_REGISTER_TYPE_UB,
+   BRW_REGISTER_TYPE_B,
+   /** @} */
+
+   /** Immediates only: @{ */
+   BRW_REGISTER_TYPE_UV,
+   BRW_REGISTER_TYPE_V,
+   BRW_REGISTER_TYPE_VF,
+   /** @} */
+
+   BRW_REGISTER_TYPE_DF, /* Gen7+ (no immediates until Gen8+) */
+
+   /* Gen8+ */
+   BRW_REGISTER_TYPE_HF,
+   BRW_REGISTER_TYPE_UQ,
+   BRW_REGISTER_TYPE_Q,
+};
+
+unsigned brw_reg_type_to_hw_type(const struct brw_context *brw,
+                                 enum brw_reg_type type, unsigned file);
+const char *brw_reg_type_letters(unsigned brw_reg_type);
+
+#define REG_SIZE (8*4)
+
+/* These aren't hardware structs, just something useful for us to pass around:
+ *
+ * Align1 operation has a lot of control over input ranges.  Used in
+ * WM programs to implement shaders decomposed into "channel serial"
+ * or "structure of array" form:
+ */
+struct brw_reg {
+   unsigned type:4;
+   unsigned file:2;
+   unsigned nr:8;
+   unsigned subnr:5;              /* :1 in align16 */
+   unsigned negate:1;             /* source only */
+   unsigned abs:1;                /* source only */
+   unsigned vstride:4;            /* source only */
+   unsigned width:3;              /* src only, align1 only */
+   unsigned hstride:2;            /* align1 only */
+   unsigned address_mode:1;       /* relative addressing, hopefully! */
+   unsigned pad0:1;
+
+   union {
+      struct {
+         unsigned swizzle:8;      /* src only, align16 only */
+         unsigned writemask:4;    /* dest only, align16 only */
+         int  indirect_offset:10; /* relative addressing offset */
+         unsigned pad1:10;        /* two dwords total */
+      } bits;
+
+      float f;
+      int   d;
+      unsigned ud;
+   } dw1;
+};
+
+
+struct brw_indirect {
+   unsigned addr_subnr:4;
+   int addr_offset:10;
+   unsigned pad:18;
+};
+
+
+static inline int
+type_sz(unsigned type)
+{
+   switch(type) {
+   case BRW_REGISTER_TYPE_UD:
+   case BRW_REGISTER_TYPE_D:
+   case BRW_REGISTER_TYPE_F:
+      return 4;
+   case BRW_REGISTER_TYPE_UW:
+   case BRW_REGISTER_TYPE_W:
+      return 2;
+   case BRW_REGISTER_TYPE_UB:
+   case BRW_REGISTER_TYPE_B:
+      return 1;
+   default:
+      return 0;
+   }
+}
+
+static inline bool
+type_is_signed(unsigned type)
+{
+   switch(type) {
+   case BRW_REGISTER_TYPE_D:
+   case BRW_REGISTER_TYPE_W:
+   case BRW_REGISTER_TYPE_F:
+   case BRW_REGISTER_TYPE_B:
+   case BRW_REGISTER_TYPE_V:
+   case BRW_REGISTER_TYPE_VF:
+   case BRW_REGISTER_TYPE_DF:
+   case BRW_REGISTER_TYPE_HF:
+   case BRW_REGISTER_TYPE_Q:
+      return true;
+
+   case BRW_REGISTER_TYPE_UD:
+   case BRW_REGISTER_TYPE_UW:
+   case BRW_REGISTER_TYPE_UB:
+   case BRW_REGISTER_TYPE_UV:
+   case BRW_REGISTER_TYPE_UQ:
+      return false;
+
+   default:
+      assert(!"Unreachable.");
+      return false;
+   }
+}
+
+/**
+ * Construct a brw_reg.
+ * \param file      one of the BRW_x_REGISTER_FILE values
+ * \param nr        register number/index
+ * \param subnr     register sub number
+ * \param type      one of BRW_REGISTER_TYPE_x
+ * \param vstride   one of BRW_VERTICAL_STRIDE_x
+ * \param width     one of BRW_WIDTH_x
+ * \param hstride   one of BRW_HORIZONTAL_STRIDE_x
+ * \param swizzle   one of BRW_SWIZZLE_x
+ * \param writemask WRITEMASK_X/Y/Z/W bitfield
+ */
+static inline struct brw_reg
+brw_reg(unsigned file,
+        unsigned nr,
+        unsigned subnr,
+        unsigned type,
+        unsigned vstride,
+        unsigned width,
+        unsigned hstride,
+        unsigned swizzle,
+        unsigned writemask)
+{
+   struct brw_reg reg;
+   if (file == BRW_GENERAL_REGISTER_FILE)
+      assert(nr < BRW_MAX_GRF);
+   else if (file == BRW_MESSAGE_REGISTER_FILE)
+      assert((nr & ~(1 << 7)) < BRW_MAX_MRF);
+   else if (file == BRW_ARCHITECTURE_REGISTER_FILE)
+      assert(nr <= BRW_ARF_TIMESTAMP);
+
+   reg.type = type;
+   reg.file = file;
+   reg.nr = nr;
+   reg.subnr = subnr * type_sz(type);
+   reg.negate = 0;
+   reg.abs = 0;
+   reg.vstride = vstride;
+   reg.width = width;
+   reg.hstride = hstride;
+   reg.address_mode = BRW_ADDRESS_DIRECT;
+   reg.pad0 = 0;
+
+   /* Could do better: If the reg is r5.3<0;1,0>, we probably want to
+    * set swizzle and writemask to W, as the lower bits of subnr will
+    * be lost when converted to align16.  This is probably too much to
+    * keep track of as you'd want it adjusted by suboffset(), etc.
+    * Perhaps fix up when converting to align16?
+    */
+   reg.dw1.bits.swizzle = swizzle;
+   reg.dw1.bits.writemask = writemask;
+   reg.dw1.bits.indirect_offset = 0;
+   reg.dw1.bits.pad1 = 0;
+   return reg;
+}
+
+/** Construct float[16] register */
+static inline struct brw_reg
+brw_vec16_reg(unsigned file, unsigned nr, unsigned subnr)
+{
+   return brw_reg(file,
+                  nr,
+                  subnr,
+                  BRW_REGISTER_TYPE_F,
+                  BRW_VERTICAL_STRIDE_16,
+                  BRW_WIDTH_16,
+                  BRW_HORIZONTAL_STRIDE_1,
+                  BRW_SWIZZLE_XYZW,
+                  WRITEMASK_XYZW);
+}
+
+/** Construct float[8] register */
+static inline struct brw_reg
+brw_vec8_reg(unsigned file, unsigned nr, unsigned subnr)
+{
+   return brw_reg(file,
+                  nr,
+                  subnr,
+                  BRW_REGISTER_TYPE_F,
+                  BRW_VERTICAL_STRIDE_8,
+                  BRW_WIDTH_8,
+                  BRW_HORIZONTAL_STRIDE_1,
+                  BRW_SWIZZLE_XYZW,
+                  WRITEMASK_XYZW);
+}
+
+/** Construct float[4] register */
+static inline struct brw_reg
+brw_vec4_reg(unsigned file, unsigned nr, unsigned subnr)
+{
+   return brw_reg(file,
+                  nr,
+                  subnr,
+                  BRW_REGISTER_TYPE_F,
+                  BRW_VERTICAL_STRIDE_4,
+                  BRW_WIDTH_4,
+                  BRW_HORIZONTAL_STRIDE_1,
+                  BRW_SWIZZLE_XYZW,
+                  WRITEMASK_XYZW);
+}
+
+/** Construct float[2] register */
+static inline struct brw_reg
+brw_vec2_reg(unsigned file, unsigned nr, unsigned subnr)
+{
+   return brw_reg(file,
+                  nr,
+                  subnr,
+                  BRW_REGISTER_TYPE_F,
+                  BRW_VERTICAL_STRIDE_2,
+                  BRW_WIDTH_2,
+                  BRW_HORIZONTAL_STRIDE_1,
+                  BRW_SWIZZLE_XYXY,
+                  WRITEMASK_XY);
+}
+
+/** Construct float[1] register */
+static inline struct brw_reg
+brw_vec1_reg(unsigned file, unsigned nr, unsigned subnr)
+{
+   return brw_reg(file,
+                  nr,
+                  subnr,
+                  BRW_REGISTER_TYPE_F,
+                  BRW_VERTICAL_STRIDE_0,
+                  BRW_WIDTH_1,
+                  BRW_HORIZONTAL_STRIDE_0,
+                  BRW_SWIZZLE_XXXX,
+                  WRITEMASK_X);
+}
+
+static inline struct brw_reg
+brw_vecn_reg(unsigned width, unsigned file, unsigned nr, unsigned subnr)
+{
+   switch (width) {
+   case 1:
+      return brw_vec1_reg(file, nr, subnr);
+   case 2:
+      return brw_vec2_reg(file, nr, subnr);
+   case 4:
+      return brw_vec4_reg(file, nr, subnr);
+   case 8:
+      return brw_vec8_reg(file, nr, subnr);
+   case 16:
+      return brw_vec16_reg(file, nr, subnr);
+   default:
+      assert(!"Invalid register width");
+   }
+   unreachable();
+}
+
+static inline struct brw_reg
+retype(struct brw_reg reg, unsigned type)
+{
+   reg.type = type;
+   return reg;
+}
+
+static inline struct brw_reg
+sechalf(struct brw_reg reg)
+{
+   if (reg.vstride)
+      reg.nr++;
+   return reg;
+}
+
+static inline struct brw_reg
+suboffset(struct brw_reg reg, unsigned delta)
+{
+   reg.subnr += delta * type_sz(reg.type);
+   return reg;
+}
+
+
+static inline struct brw_reg
+offset(struct brw_reg reg, unsigned delta)
+{
+   reg.nr += delta;
+   return reg;
+}
+
+
+static inline struct brw_reg
+byte_offset(struct brw_reg reg, unsigned bytes)
+{
+   unsigned newoffset = reg.nr * REG_SIZE + reg.subnr + bytes;
+   reg.nr = newoffset / REG_SIZE;
+   reg.subnr = newoffset % REG_SIZE;
+   return reg;
+}
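+
+/* Example, not in the original source: with REG_SIZE == 32, applying
+ * byte_offset() to r2.8 with bytes == 40 gives newoffset == 2*32 + 8 + 40
+ * == 112, so the result addresses r3.16.
+ */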
+
+
+/** Construct unsigned word[16] register */
+static inline struct brw_reg
+brw_uw16_reg(unsigned file, unsigned nr, unsigned subnr)
+{
+   return suboffset(retype(brw_vec16_reg(file, nr, 0), BRW_REGISTER_TYPE_UW), subnr);
+}
+
+/** Construct unsigned word[8] register */
+static inline struct brw_reg
+brw_uw8_reg(unsigned file, unsigned nr, unsigned subnr)
+{
+   return suboffset(retype(brw_vec8_reg(file, nr, 0), BRW_REGISTER_TYPE_UW), subnr);
+}
+
+/** Construct unsigned word[1] register */
+static inline struct brw_reg
+brw_uw1_reg(unsigned file, unsigned nr, unsigned subnr)
+{
+   return suboffset(retype(brw_vec1_reg(file, nr, 0), BRW_REGISTER_TYPE_UW), subnr);
+}
+
+static inline struct brw_reg
+brw_imm_reg(unsigned type)
+{
+   return brw_reg(BRW_IMMEDIATE_VALUE,
+                  0,
+                  0,
+                  type,
+                  BRW_VERTICAL_STRIDE_0,
+                  BRW_WIDTH_1,
+                  BRW_HORIZONTAL_STRIDE_0,
+                  0,
+                  0);
+}
+
+/** Construct float immediate register */
+static inline struct brw_reg
+brw_imm_f(float f)
+{
+   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_F);
+   imm.dw1.f = f;
+   return imm;
+}
+
+/** Construct integer immediate register */
+static inline struct brw_reg
+brw_imm_d(int d)
+{
+   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_D);
+   imm.dw1.d = d;
+   return imm;
+}
+
+/** Construct uint immediate register */
+static inline struct brw_reg
+brw_imm_ud(unsigned ud)
+{
+   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_UD);
+   imm.dw1.ud = ud;
+   return imm;
+}
+
+/** Construct ushort immediate register */
+static inline struct brw_reg
+brw_imm_uw(uint16_t uw)
+{
+   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_UW);
+   imm.dw1.ud = uw | (uw << 16);
+   return imm;
+}
+
+/** Construct short immediate register */
+static inline struct brw_reg
+brw_imm_w(int16_t w)
+{
+   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_W);
+   imm.dw1.d = w | (w << 16);
+   return imm;
+}
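+
+/* Illustrative note, not in the original source: 16-bit immediates are
+ * replicated into both halves of the 32-bit immediate slot, e.g.
+ * brw_imm_uw(0x1234) stores 0x12341234 in dw1.ud.
+ */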
+
+/* brw_imm_b and brw_imm_ub aren't supported by hardware - the type
+ * numbers alias with _V and _VF below:
+ */
+
+/** Construct vector of eight signed half-byte values */
+static inline struct brw_reg
+brw_imm_v(unsigned v)
+{
+   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_V);
+   imm.vstride = BRW_VERTICAL_STRIDE_0;
+   imm.width = BRW_WIDTH_8;
+   imm.hstride = BRW_HORIZONTAL_STRIDE_1;
+   imm.dw1.ud = v;
+   return imm;
+}
+
+/** Construct vector of four 8-bit float values */
+static inline struct brw_reg
+brw_imm_vf(unsigned v)
+{
+   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_VF);
+   imm.vstride = BRW_VERTICAL_STRIDE_0;
+   imm.width = BRW_WIDTH_4;
+   imm.hstride = BRW_HORIZONTAL_STRIDE_1;
+   imm.dw1.ud = v;
+   return imm;
+}
+
+/**
+ * Convert an integer into a "restricted" 8-bit float, used in vector
+ * immediates.  The 8-bit floating point format has a sign bit, an
+ * excess-3 3-bit exponent, and a 4-bit mantissa.  All integer values
+ * from -31 to 31 can be represented exactly.
+ */
+static inline uint8_t
+int_to_float8(int x)
+{
+   if (x == 0) {
+      return 0;
+   } else if (x < 0) {
+      return 1 << 7 | int_to_float8(-x);
+   } else {
+      const unsigned exponent = _mesa_logbase2(x);
+      const unsigned mantissa = (x - (1 << exponent)) << (4 - exponent);
+      assert(exponent <= 4);
+      return (exponent + 3) << 4 | mantissa;
+   }
+}
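+
+/* Worked example, not in the original source: for x == 10, exponent == 3
+ * and mantissa == (10 - 8) << (4 - 3) == 4, so the encoding is
+ * (3 + 3) << 4 | 4 == 0x64: sign 0, exponent field 6 (2^3 after removing
+ * the excess-3 bias), mantissa 4/16, i.e. 1.25 * 8 == 10.
+ */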
+
+/**
+ * Construct a floating-point packed vector immediate from its integer
+ * values. \sa int_to_float8()
+ */
+static inline struct brw_reg
+brw_imm_vf4(int v0, int v1, int v2, int v3)
+{
+   return brw_imm_vf((int_to_float8(v0) << 0) |
+                     (int_to_float8(v1) << 8) |
+                     (int_to_float8(v2) << 16) |
+                     (int_to_float8(v3) << 24));
+}
+
+
+static inline struct brw_reg
+brw_address(struct brw_reg reg)
+{
+   return brw_imm_uw(reg.nr * REG_SIZE + reg.subnr);
+}
+
+/** Construct float[1] general-purpose register */
+static inline struct brw_reg
+brw_vec1_grf(unsigned nr, unsigned subnr)
+{
+   return brw_vec1_reg(BRW_GENERAL_REGISTER_FILE, nr, subnr);
+}
+
+/** Construct float[2] general-purpose register */
+static inline struct brw_reg
+brw_vec2_grf(unsigned nr, unsigned subnr)
+{
+   return brw_vec2_reg(BRW_GENERAL_REGISTER_FILE, nr, subnr);
+}
+
+/** Construct float[4] general-purpose register */
+static inline struct brw_reg
+brw_vec4_grf(unsigned nr, unsigned subnr)
+{
+   return brw_vec4_reg(BRW_GENERAL_REGISTER_FILE, nr, subnr);
+}
+
+/** Construct float[8] general-purpose register */
+static inline struct brw_reg
+brw_vec8_grf(unsigned nr, unsigned subnr)
+{
+   return brw_vec8_reg(BRW_GENERAL_REGISTER_FILE, nr, subnr);
+}
+
+
+static inline struct brw_reg
+brw_uw8_grf(unsigned nr, unsigned subnr)
+{
+   return brw_uw8_reg(BRW_GENERAL_REGISTER_FILE, nr, subnr);
+}
+
+static inline struct brw_reg
+brw_uw16_grf(unsigned nr, unsigned subnr)
+{
+   return brw_uw16_reg(BRW_GENERAL_REGISTER_FILE, nr, subnr);
+}
+
+
+/** Construct null register (usually used for setting condition codes) */
+static inline struct brw_reg
+brw_null_reg(void)
+{
+   return brw_vec8_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_NULL, 0);
+}
+
+static inline struct brw_reg
+brw_address_reg(unsigned subnr)
+{
+   return brw_uw1_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_ADDRESS, subnr);
+}
+
+/* If/else instructions break in align16 mode if writemask & swizzle
+ * aren't xyzw.  This goes against the convention for other scalar
+ * regs:
+ */
+static inline struct brw_reg
+brw_ip_reg(void)
+{
+   return brw_reg(BRW_ARCHITECTURE_REGISTER_FILE,
+                  BRW_ARF_IP,
+                  0,
+                  BRW_REGISTER_TYPE_UD,
+                  BRW_VERTICAL_STRIDE_4, /* ? */
+                  BRW_WIDTH_1,
+                  BRW_HORIZONTAL_STRIDE_0,
+                  BRW_SWIZZLE_XYZW, /* NOTE! */
+                  WRITEMASK_XYZW); /* NOTE! */
+}
+
+static inline struct brw_reg
+brw_acc_reg(void)
+{
+   return brw_vec8_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_ACCUMULATOR, 0);
+}
+
+static inline struct brw_reg
+brw_notification_1_reg(void)
+{
+   return brw_reg(BRW_ARCHITECTURE_REGISTER_FILE,
+                  BRW_ARF_NOTIFICATION_COUNT,
+                  1,
+                  BRW_REGISTER_TYPE_UD,
+                  BRW_VERTICAL_STRIDE_0,
+                  BRW_WIDTH_1,
+                  BRW_HORIZONTAL_STRIDE_0,
+                  BRW_SWIZZLE_XXXX,
+                  WRITEMASK_X);
+}
+
+
+static inline struct brw_reg
+brw_flag_reg(int reg, int subreg)
+{
+   return brw_uw1_reg(BRW_ARCHITECTURE_REGISTER_FILE,
+                      BRW_ARF_FLAG + reg, subreg);
+}
+
+
+static inline struct brw_reg
+brw_mask_reg(unsigned subnr)
+{
+   return brw_uw1_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_MASK, subnr);
+}
+
+static inline struct brw_reg
+brw_message_reg(unsigned nr)
+{
+   assert((nr & ~(1 << 7)) < BRW_MAX_MRF);
+   return brw_vec8_reg(BRW_MESSAGE_REGISTER_FILE, nr, 0);
+}
+
+static inline struct brw_reg
+brw_uvec_mrf(unsigned width, unsigned nr, unsigned subnr)
+{
+   return retype(brw_vecn_reg(width, BRW_MESSAGE_REGISTER_FILE, nr, subnr),
+                 BRW_REGISTER_TYPE_UD);
+}
+
+/* This is almost always called with a numeric constant argument, so
+ * make things easy to evaluate at compile time:
+ */
+static inline unsigned cvt(unsigned val)
+{
+   switch (val) {
+   case 0: return 0;
+   case 1: return 1;
+   case 2: return 2;
+   case 4: return 3;
+   case 8: return 4;
+   case 16: return 5;
+   case 32: return 6;
+   }
+   return 0;
+}
+
+static inline struct brw_reg
+stride(struct brw_reg reg, unsigned vstride, unsigned width, unsigned hstride)
+{
+   reg.vstride = cvt(vstride);
+   reg.width = cvt(width) - 1;
+   reg.hstride = cvt(hstride);
+   return reg;
+}
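+
+/* Example, not in the original source: cvt() maps a stride of 2^n to the
+ * hardware encoding n + 1 (and 0 to 0), so vec16() below is
+ * stride(reg, 16, 16, 1), i.e. vstride == 5, width == 4, hstride == 1 --
+ * the standard <16;16,1> region.
+ */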
+
+
+static inline struct brw_reg
+vec16(struct brw_reg reg)
+{
+   return stride(reg, 16,16,1);
+}
+
+static inline struct brw_reg
+vec8(struct brw_reg reg)
+{
+   return stride(reg, 8,8,1);
+}
+
+static inline struct brw_reg
+vec4(struct brw_reg reg)
+{
+   return stride(reg, 4,4,1);
+}
+
+static inline struct brw_reg
+vec2(struct brw_reg reg)
+{
+   return stride(reg, 2,2,1);
+}
+
+static inline struct brw_reg
+vec1(struct brw_reg reg)
+{
+   return stride(reg, 0,1,0);
+}
+
+
+static inline struct brw_reg
+get_element(struct brw_reg reg, unsigned elt)
+{
+   return vec1(suboffset(reg, elt));
+}
+
+static inline struct brw_reg
+get_element_ud(struct brw_reg reg, unsigned elt)
+{
+   return vec1(suboffset(retype(reg, BRW_REGISTER_TYPE_UD), elt));
+}
+
+static inline struct brw_reg
+get_element_d(struct brw_reg reg, unsigned elt)
+{
+   return vec1(suboffset(retype(reg, BRW_REGISTER_TYPE_D), elt));
+}
+
+
+static inline struct brw_reg
+brw_swizzle(struct brw_reg reg, unsigned x, unsigned y, unsigned z, unsigned w)
+{
+   assert(reg.file != BRW_IMMEDIATE_VALUE);
+
+   reg.dw1.bits.swizzle = BRW_SWIZZLE4(BRW_GET_SWZ(reg.dw1.bits.swizzle, x),
+                                       BRW_GET_SWZ(reg.dw1.bits.swizzle, y),
+                                       BRW_GET_SWZ(reg.dw1.bits.swizzle, z),
+                                       BRW_GET_SWZ(reg.dw1.bits.swizzle, w));
+   return reg;
+}
+
+
+static inline struct brw_reg
+brw_swizzle1(struct brw_reg reg, unsigned x)
+{
+   return brw_swizzle(reg, x, x, x, x);
+}
+
+static inline struct brw_reg
+brw_writemask(struct brw_reg reg, unsigned mask)
+{
+   assert(reg.file != BRW_IMMEDIATE_VALUE);
+   reg.dw1.bits.writemask &= mask;
+   return reg;
+}
+
+static inline struct brw_reg
+brw_set_writemask(struct brw_reg reg, unsigned mask)
+{
+   assert(reg.file != BRW_IMMEDIATE_VALUE);
+   reg.dw1.bits.writemask = mask;
+   return reg;
+}
+
+static inline struct brw_reg
+negate(struct brw_reg reg)
+{
+   reg.negate ^= 1;
+   return reg;
+}
+
+static inline struct brw_reg
+brw_abs(struct brw_reg reg)
+{
+   reg.abs = 1;
+   reg.negate = 0;
+   return reg;
+}
+
+/************************************************************************/
+
+static inline struct brw_reg
+brw_vec4_indirect(unsigned subnr, int offset)
+{
+   struct brw_reg reg =  brw_vec4_grf(0, 0);
+   reg.subnr = subnr;
+   reg.address_mode = BRW_ADDRESS_REGISTER_INDIRECT_REGISTER;
+   reg.dw1.bits.indirect_offset = offset;
+   return reg;
+}
+
+static inline struct brw_reg
+brw_vec1_indirect(unsigned subnr, int offset)
+{
+   struct brw_reg reg =  brw_vec1_grf(0, 0);
+   reg.subnr = subnr;
+   reg.address_mode = BRW_ADDRESS_REGISTER_INDIRECT_REGISTER;
+   reg.dw1.bits.indirect_offset = offset;
+   return reg;
+}
+
+static inline struct brw_reg
+deref_4f(struct brw_indirect ptr, int offset)
+{
+   return brw_vec4_indirect(ptr.addr_subnr, ptr.addr_offset + offset);
+}
+
+static inline struct brw_reg
+deref_1f(struct brw_indirect ptr, int offset)
+{
+   return brw_vec1_indirect(ptr.addr_subnr, ptr.addr_offset + offset);
+}
+
+static inline struct brw_reg
+deref_4b(struct brw_indirect ptr, int offset)
+{
+   return retype(deref_4f(ptr, offset), BRW_REGISTER_TYPE_B);
+}
+
+static inline struct brw_reg
+deref_1uw(struct brw_indirect ptr, int offset)
+{
+   return retype(deref_1f(ptr, offset), BRW_REGISTER_TYPE_UW);
+}
+
+static inline struct brw_reg
+deref_1d(struct brw_indirect ptr, int offset)
+{
+   return retype(deref_1f(ptr, offset), BRW_REGISTER_TYPE_D);
+}
+
+static inline struct brw_reg
+deref_1ud(struct brw_indirect ptr, int offset)
+{
+   return retype(deref_1f(ptr, offset), BRW_REGISTER_TYPE_UD);
+}
+
+static inline struct brw_reg
+get_addr_reg(struct brw_indirect ptr)
+{
+   return brw_address_reg(ptr.addr_subnr);
+}
+
+static inline struct brw_indirect
+brw_indirect_offset(struct brw_indirect ptr, int offset)
+{
+   ptr.addr_offset += offset;
+   return ptr;
+}
+
+static inline struct brw_indirect
+brw_indirect(unsigned addr_subnr, int offset)
+{
+   struct brw_indirect ptr;
+   ptr.addr_subnr = addr_subnr;
+   ptr.addr_offset = offset;
+   ptr.pad = 0;
+   return ptr;
+}
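+
+/* Usage sketch for the indirect helpers above (the offsets and subregister
+ * numbers are arbitrary example values):
+ *
+ *    struct brw_indirect ptr = brw_indirect(0, 0);
+ *    struct brw_reg v = deref_4f(ptr, 16);    // vec4 float at a0.0 + 16
+ *    struct brw_reg d = deref_1ud(ptr, 4);    // scalar UD at a0.0 + 4
+ */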
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/icd/intel/compiler/pipeline/brw_schedule_instructions.cpp b/icd/intel/compiler/pipeline/brw_schedule_instructions.cpp
new file mode 100644
index 0000000..1f46569
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_schedule_instructions.cpp
@@ -0,0 +1,2360 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ * Copyright (C) 2014 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#include "brw_fs.h"
+#include "brw_vec4.h"
+#include "glsl/glsl_types.h"
+#include "glsl/ir_optimization.h"
+
+#include "brw_fs_live_variables.h"
+#include <algorithm>
+#include <vector>
+
+using namespace brw;
+
+/** @file brw_fs_schedule_instructions.cpp
+ *
+ * List scheduling of FS instructions.
+ *
+ * The basic model of the list scheduler is to take a basic block,
+ * compute a DAG of the dependencies (RAW ordering with latency, WAW
+ * ordering with latency, WAR ordering), and make a list of the DAG heads.
+ * Heuristically pick a DAG head, then put all the children that are
+ * now DAG heads into the list of things to schedule.
+ *
+ * The heuristic is the important part.  We're trying to be cheap,
+ * since actually computing the optimal scheduling is NP-complete.
+ * What we do is track a "current clock".  When we schedule a node, we
+ * update the earliest-unblocked clock time of its children, and
+ * increment the clock.  Then, when trying to schedule, we just pick
+ * the earliest-unblocked instruction to schedule.
+ *
+ * Note that often there will be many things which could execute
+ * immediately, and there are a range of heuristic options to choose
+ * from in picking among those.
+ */
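+
+/* A minimal sketch of that greedy core, assuming hypothetical helpers
+ * pick_earliest_unblocked() and emit() standing in for the heuristic choice
+ * implemented by choose_instruction_to_schedule_*() below (the real
+ * run_td()/run_bu() also fold in register-pressure heuristics):
+ *
+ *    int clock = 0;
+ *    while (schedule_node *n = pick_earliest_unblocked(candidates)) {
+ *       emit(n->inst);
+ *       clock += issue_time(n->inst);
+ *       for (int i = 0; i < n->child_count; i++) {
+ *          schedule_node *c = n->children[i];
+ *          c->unblocked_time = MAX2(c->unblocked_time,
+ *                                   clock + n->child_latency[i]);
+ *          if (--c->parent_count == 0)
+ *             candidates.push_tail(c);
+ *       }
+ *    }
+ */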
+
+static const bool debug = false;
+static const bool detail_debug = false;
+#define USE_GRAPHVIZ 0
+
+class instruction_scheduler;
+
+namespace {
+   bool use_ips(int mode)
+   {
+      switch (mode) {
+      case SCHEDULE_PRE_IPS_TD_HI:
+      case SCHEDULE_PRE_IPS_TD_LO:
+      case SCHEDULE_PRE_IPS_BU_LIMIT:
+      case SCHEDULE_PRE_IPS_BU_LO:
+      case SCHEDULE_PRE_IPS_BU_ML:
+      case SCHEDULE_PRE_IPS_BU_MD:
+      case SCHEDULE_PRE_IPS_BU_MH:
+      case SCHEDULE_PRE_IPS_BU_HI:
+         return true;
+      default:
+         return false;
+      }
+   }
+
+   bool use_bu(int mode)
+   {
+      switch (mode) {
+      case SCHEDULE_PRE_IPS_BU_LIMIT:
+      case SCHEDULE_PRE_IPS_BU_LO:
+      case SCHEDULE_PRE_IPS_BU_ML:
+      case SCHEDULE_PRE_IPS_BU_MD:
+      case SCHEDULE_PRE_IPS_BU_MH:
+      case SCHEDULE_PRE_IPS_BU_HI:
+         return true;
+      default:
+         return false;
+      }
+   }
+
+   /* Despite the name, this clamps to [-1, 1] rather than [0, 1]; it is used
+    * below on weighting factors that can legitimately be negative. */
+   template <typename T> T clamp01(T v) { return std::max(std::min(v, T(1)), T(-1)); }
+   template <typename T> T lerp(T x, T y, T a) { return x * (T(1) - a) + y * a; }
+
+} // anonymous namespace
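+
+/* For example, lerp(0.25f, 0.75f, 0.5f) == 0.5f, and clamp01(1.3f) == 1.0f
+ * while clamp01(-1.3f) == -1.0f; the weighting code later in this file
+ * relies on these helpers.
+ */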
+
+
+class schedule_node : public exec_node
+{
+public:
+   schedule_node(backend_instruction *inst, instruction_scheduler *sched);
+   void set_latency_gen4();
+   void set_latency_gen7(bool is_haswell);
+
+   backend_instruction *inst;
+   schedule_node **children;
+   int *child_latency;
+   int *child_platency;
+   int child_count;
+   int parent_count;
+   int child_array_size;
+   int unblocked_time;
+   int unblocked_ptime;
+   int latency;
+   int platency;	// Physical latency
+   bool critical_path;
+
+   /**
+    * Which iteration of pushing groups of children onto the candidates list
+    * this node was a part of.
+    */
+   unsigned cand_generation;
+
+   /**
+    * This is the node's physical latency plus the maximum delay of its
+    * children, or just the issue_time if it's a leaf node.
+    */
+   int delay;
+};
+
+void
+schedule_node::set_latency_gen4()
+{
+   int chans = 8;
+   int math_latency = 22;
+
+   switch (inst->opcode) {
+   case SHADER_OPCODE_RCP:
+      this->latency = 1 * chans * math_latency;
+      break;
+   case SHADER_OPCODE_RSQ:
+      this->latency = 2 * chans * math_latency;
+      break;
+   case SHADER_OPCODE_INT_QUOTIENT:
+   case SHADER_OPCODE_SQRT:
+   case SHADER_OPCODE_LOG2:
+      /* full precision log.  partial is 2. */
+      this->latency = 3 * chans * math_latency;
+      break;
+   case SHADER_OPCODE_INT_REMAINDER:
+   case SHADER_OPCODE_EXP2:
+      /* full precision.  partial is 3, same throughput. */
+      this->latency = 4 * chans * math_latency;
+      break;
+   case SHADER_OPCODE_POW:
+      this->latency = 8 * chans * math_latency;
+      break;
+   case SHADER_OPCODE_SIN:
+   case SHADER_OPCODE_COS:
+      /* minimum latency, max is 12 rounds. */
+      this->latency = 5 * chans * math_latency;
+      break;
+   default:
+      this->latency = 2;
+      break;
+   }
+   this->platency = this->latency;
+}
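+
+/* Worked example of the Gen4 model above: SHADER_OPCODE_RSQ gets
+ * latency = 2 * chans * math_latency = 2 * 8 * 22 = 352 cycles, while any
+ * opcode outside the math unit falls through to the default latency of 2.
+ */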
+
+void
+schedule_node::set_latency_gen7(bool is_haswell)
+{
+   switch (inst->opcode) {
+   case BRW_OPCODE_MAD:
+      /* 2 cycles
+       *  (since the last two src operands are in different register banks):
+       * mad(8) g4<1>F g2.2<4,4,1>F.x  g2<4,4,1>F.x g3.1<4,4,1>F.x { align16 WE_normal 1Q };
+       *
+       * 3 cycles on IVB, 4 on HSW
+       *  (since the last two src operands are in the same register bank):
+       * mad(8) g4<1>F g2.2<4,4,1>F.x  g2<4,4,1>F.x g2.1<4,4,1>F.x { align16 WE_normal 1Q };
+       *
+       * 18 cycles on IVB, 16 on HSW
+       *  (since the last two src operands are in different register banks):
+       * mad(8) g4<1>F g2.2<4,4,1>F.x  g2<4,4,1>F.x g3.1<4,4,1>F.x { align16 WE_normal 1Q };
+       * mov(8) null   g4<4,5,1>F                     { align16 WE_normal 1Q };
+       *
+       * 20 cycles on IVB, 18 on HSW
+       *  (since the last two src operands are in the same register bank):
+       * mad(8) g4<1>F g2.2<4,4,1>F.x  g2<4,4,1>F.x g2.1<4,4,1>F.x { align16 WE_normal 1Q };
+       * mov(8) null   g4<4,4,1>F                     { align16 WE_normal 1Q };
+       */
+
+      /* Our register allocator doesn't know about register banks, so use the
+       * higher latency.
+       */
+      latency = is_haswell ? 16 : 18;
+      break;
+
+   case BRW_OPCODE_LRP:
+      /* 2 cycles
+       *  (since the last two src operands are in different register banks):
+       * lrp(8) g4<1>F g2.2<4,4,1>F.x  g2<4,4,1>F.x g3.1<4,4,1>F.x { align16 WE_normal 1Q };
+       *
+       * 3 cycles on IVB, 4 on HSW
+       *  (since the last two src operands are in the same register bank):
+       * lrp(8) g4<1>F g2.2<4,4,1>F.x  g2<4,4,1>F.x g2.1<4,4,1>F.x { align16 WE_normal 1Q };
+       *
+       * 16 cycles on IVB, 14 on HSW
+       *  (since the last two src operands are in different register banks):
+       * lrp(8) g4<1>F g2.2<4,4,1>F.x  g2<4,4,1>F.x g3.1<4,4,1>F.x { align16 WE_normal 1Q };
+       * mov(8) null   g4<4,4,1>F                     { align16 WE_normal 1Q };
+       *
+       * 16 cycles
+       *  (since the last two src operands are in the same register bank):
+       * lrp(8) g4<1>F g2.2<4,4,1>F.x  g2<4,4,1>F.x g2.1<4,4,1>F.x { align16 WE_normal 1Q };
+       * mov(8) null   g4<4,4,1>F                     { align16 WE_normal 1Q };
+       */
+
+      /* Our register allocator doesn't know about register banks, so use the
+       * higher latency.
+       */
+      latency = 14;
+      break;
+
+   case SHADER_OPCODE_RCP:
+   case SHADER_OPCODE_RSQ:
+   case SHADER_OPCODE_SQRT:
+   case SHADER_OPCODE_LOG2:
+   case SHADER_OPCODE_EXP2:
+   case SHADER_OPCODE_SIN:
+   case SHADER_OPCODE_COS:
+      /* 2 cycles:
+       * math inv(8) g4<1>F g2<0,1,0>F      null       { align1 WE_normal 1Q };
+       *
+       * 18 cycles:
+       * math inv(8) g4<1>F g2<0,1,0>F      null       { align1 WE_normal 1Q };
+       * mov(8)      null   g4<8,8,1>F                 { align1 WE_normal 1Q };
+       *
+       * Same for exp2, log2, rsq, sqrt, sin, cos.
+       */
+      latency = is_haswell ? 14 : 16;
+      break;
+
+   case SHADER_OPCODE_POW:
+      /* 2 cycles:
+       * math pow(8) g4<1>F g2<0,1,0>F   g2.1<0,1,0>F  { align1 WE_normal 1Q };
+       *
+       * 26 cycles:
+       * math pow(8) g4<1>F g2<0,1,0>F   g2.1<0,1,0>F  { align1 WE_normal 1Q };
+       * mov(8)      null   g4<8,8,1>F                 { align1 WE_normal 1Q };
+       */
+      latency = is_haswell ? 22 : 24;
+      break;
+
+   case SHADER_OPCODE_TEX:
+   case SHADER_OPCODE_TXD:
+   case SHADER_OPCODE_TXF:
+   case SHADER_OPCODE_TXL:
+      /* 18 cycles:
+       * mov(8)  g115<1>F   0F                         { align1 WE_normal 1Q };
+       * mov(8)  g114<1>F   0F                         { align1 WE_normal 1Q };
+       * send(8) g4<1>UW    g114<8,8,1>F
+       *   sampler (10, 0, 0, 1) mlen 2 rlen 4         { align1 WE_normal 1Q };
+       *
+       * 697 +/-49 cycles (min 610, n=26):
+       * mov(8)  g115<1>F   0F                         { align1 WE_normal 1Q };
+       * mov(8)  g114<1>F   0F                         { align1 WE_normal 1Q };
+       * send(8) g4<1>UW    g114<8,8,1>F
+       *   sampler (10, 0, 0, 1) mlen 2 rlen 4         { align1 WE_normal 1Q };
+       * mov(8)  null       g4<8,8,1>F                 { align1 WE_normal 1Q };
+       *
+       * So the latency on our first texture load of the batchbuffer takes
+       * ~700 cycles, since the caches are cold at that point.
+       *
+       * 840 +/- 92 cycles (min 720, n=25):
+       * mov(8)  g115<1>F   0F                         { align1 WE_normal 1Q };
+       * mov(8)  g114<1>F   0F                         { align1 WE_normal 1Q };
+       * send(8) g4<1>UW    g114<8,8,1>F
+       *   sampler (10, 0, 0, 1) mlen 2 rlen 4         { align1 WE_normal 1Q };
+       * mov(8)  null       g4<8,8,1>F                 { align1 WE_normal 1Q };
+       * send(8) g4<1>UW    g114<8,8,1>F
+       *   sampler (10, 0, 0, 1) mlen 2 rlen 4         { align1 WE_normal 1Q };
+       * mov(8)  null       g4<8,8,1>F                 { align1 WE_normal 1Q };
+       *
+       * On the second load, it takes just an extra ~140 cycles, and after
+       * accounting for the 14 cycles of the MOV's latency, that makes ~130.
+       *
+       * 683 +/- 49 cycles (min = 602, n=47):
+       * mov(8)  g115<1>F   0F                         { align1 WE_normal 1Q };
+       * mov(8)  g114<1>F   0F                         { align1 WE_normal 1Q };
+       * send(8) g4<1>UW    g114<8,8,1>F
+       *   sampler (10, 0, 0, 1) mlen 2 rlen 4         { align1 WE_normal 1Q };
+       * send(8) g50<1>UW   g114<8,8,1>F
+       *   sampler (10, 0, 0, 1) mlen 2 rlen 4         { align1 WE_normal 1Q };
+       * mov(8)  null       g4<8,8,1>F                 { align1 WE_normal 1Q };
+       *
+       * The unit appears to be pipelined, since this matches up with the
+       * cache-cold case, despite there being two loads here.  If you replace
+       * the g4 in the MOV to null with g50, it's still 693 +/- 52 (n=39).
+       *
+       * So, take some number between the cache-hot 140 cycles and the
+       * cache-cold 700 cycles.  No particular tuning was done on this.
+       *
+       * I haven't done significant testing of the non-TEX opcodes.  TXL at
+       * least looked about the same as TEX.
+       */
+      latency = 200;
+      break;
+
+   case SHADER_OPCODE_TXS:
+      /* Testing textureSize(sampler2D, 0), one load was 420 +/- 41
+       * cycles (n=15):
+       * mov(8)   g114<1>UD  0D                        { align1 WE_normal 1Q };
+       * send(8)  g6<1>UW    g114<8,8,1>F
+       *   sampler (10, 0, 10, 1) mlen 1 rlen 4        { align1 WE_normal 1Q };
+       * mov(16)  g6<1>F     g6<8,8,1>D                { align1 WE_normal 1Q };
+       *
+       *
+       * Two loads was 535 +/- 30 cycles (n=19):
+       * mov(16)   g114<1>UD  0D                       { align1 WE_normal 1H };
+       * send(16)  g6<1>UW    g114<8,8,1>F
+       *   sampler (10, 0, 10, 2) mlen 2 rlen 8        { align1 WE_normal 1H };
+       * mov(16)   g114<1>UD  0D                       { align1 WE_normal 1H };
+       * mov(16)   g6<1>F     g6<8,8,1>D               { align1 WE_normal 1H };
+       * send(16)  g8<1>UW    g114<8,8,1>F
+       *   sampler (10, 0, 10, 2) mlen 2 rlen 8        { align1 WE_normal 1H };
+       * mov(16)   g8<1>F     g8<8,8,1>D               { align1 WE_normal 1H };
+       * add(16)   g6<1>F     g6<8,8,1>F   g8<8,8,1>F  { align1 WE_normal 1H };
+       *
+       * Since the only caches that should matter are just the
+       * instruction/state cache containing the surface state, assume that we
+       * always have hot caches.
+       */
+      latency = 100;
+      break;
+
+   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD:
+   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
+   case VS_OPCODE_PULL_CONSTANT_LOAD:
+      /* testing using varying-index pull constants:
+       *
+       * 16 cycles:
+       * mov(8)  g4<1>D  g2.1<0,1,0>F                  { align1 WE_normal 1Q };
+       * send(8) g4<1>F  g4<8,8,1>D
+       *   data (9, 2, 3) mlen 1 rlen 1                { align1 WE_normal 1Q };
+       *
+       * ~480 cycles:
+       * mov(8)  g4<1>D  g2.1<0,1,0>F                  { align1 WE_normal 1Q };
+       * send(8) g4<1>F  g4<8,8,1>D
+       *   data (9, 2, 3) mlen 1 rlen 1                { align1 WE_normal 1Q };
+       * mov(8)  null    g4<8,8,1>F                    { align1 WE_normal 1Q };
+       *
+       * ~620 cycles:
+       * mov(8)  g4<1>D  g2.1<0,1,0>F                  { align1 WE_normal 1Q };
+       * send(8) g4<1>F  g4<8,8,1>D
+       *   data (9, 2, 3) mlen 1 rlen 1                { align1 WE_normal 1Q };
+       * mov(8)  null    g4<8,8,1>F                    { align1 WE_normal 1Q };
+       * send(8) g4<1>F  g4<8,8,1>D
+       *   data (9, 2, 3) mlen 1 rlen 1                { align1 WE_normal 1Q };
+       * mov(8)  null    g4<8,8,1>F                    { align1 WE_normal 1Q };
+       *
+       * So, if it's cache-hot, it's about 140.  If it's cache cold, it's
+       * about 460.  We expect to mostly be cache hot, so pick something more
+       * in that direction.
+       */
+      latency = 200;
+      break;
+
+   case SHADER_OPCODE_GEN7_SCRATCH_READ:
+      /* Testing a load from offset 0, that had been previously written:
+       *
+       * send(8) g114<1>UW g0<8,8,1>F data (0, 0, 0) mlen 1 rlen 1 { align1 WE_normal 1Q };
+       * mov(8)  null      g114<8,8,1>F { align1 WE_normal 1Q };
+       *
+       * The cycles spent seemed to be grouped around 40-50 (as low as 38),
+       * then around 140.  Presumably this is cache hit vs miss.
+       */
+      latency = 50;
+      break;
+
+   case SHADER_OPCODE_UNTYPED_ATOMIC:
+      /* Test code:
+       *   mov(8)    g112<1>ud       0x00000000ud       { align1 WE_all 1Q };
+       *   mov(1)    g112.7<1>ud     g1.7<0,1,0>ud      { align1 WE_all };
+       *   mov(8)    g113<1>ud       0x00000000ud       { align1 WE_normal 1Q };
+       *   send(8)   g4<1>ud         g112<8,8,1>ud
+       *             data (38, 5, 6) mlen 2 rlen 1      { align1 WE_normal 1Q };
+       *
+       * Running it 100 times as fragment shader on a 128x128 quad
+       * gives an average latency of 13867 cycles per atomic op,
+       * standard deviation 3%.  Note that this is a rather
+       * pessimistic estimate, the actual latency in cases with few
+       * collisions between threads and favorable pipelining has been
+       * seen to be reduced by a factor of 100.
+       */
+      latency = 14000;
+      break;
+
+   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+      /* Test code:
+       *   mov(8)    g112<1>UD       0x00000000UD       { align1 WE_all 1Q };
+       *   mov(1)    g112.7<1>UD     g1.7<0,1,0>UD      { align1 WE_all };
+       *   mov(8)    g113<1>UD       0x00000000UD       { align1 WE_normal 1Q };
+       *   send(8)   g4<1>UD         g112<8,8,1>UD
+       *             data (38, 6, 5) mlen 2 rlen 1      { align1 WE_normal 1Q };
+       *   .
+       *   . [repeats 8 times]
+       *   .
+       *   mov(8)    g112<1>UD       0x00000000UD       { align1 WE_all 1Q };
+       *   mov(1)    g112.7<1>UD     g1.7<0,1,0>UD      { align1 WE_all };
+       *   mov(8)    g113<1>UD       0x00000000UD       { align1 WE_normal 1Q };
+       *   send(8)   g4<1>UD         g112<8,8,1>UD
+       *             data (38, 6, 5) mlen 2 rlen 1      { align1 WE_normal 1Q };
+       *
+       * Running it 100 times as fragment shader on a 128x128 quad
+       * gives an average latency of 583 cycles per surface read,
+       * standard deviation 0.9%.
+       */
+      latency = is_haswell ? 300 : 600;
+      break;
+
+   default:
+      /* 2 cycles:
+       * mul(8) g4<1>F g2<0,1,0>F      0.5F            { align1 WE_normal 1Q };
+       *
+       * 16 cycles:
+       * mul(8) g4<1>F g2<0,1,0>F      0.5F            { align1 WE_normal 1Q };
+       * mov(8) null   g4<8,8,1>F                      { align1 WE_normal 1Q };
+       */
+      latency = 14;
+      break;
+   }
+   platency = latency;
+}
+
+class instruction_scheduler {
+public:
+   instruction_scheduler(backend_visitor *v, int grf_count, int allocatable_grfs,
+                         instruction_scheduler_mode mode)
+   {
+      this->bv = v;
+      this->mem_ctx = ralloc_context(NULL);
+      this->grf_count = grf_count;
+      this->allocatable_grfs = allocatable_grfs;
+      this->instructions.make_empty();
+      this->instructions_to_schedule = 0;
+      this->post_reg_alloc = (mode == SCHEDULE_POST);
+      this->mode = mode;
+      this->time = 0;
+      this->ptime = 0;
+      this->previous_chosen = NULL;
+      this->block_num = 0;
+
+      switch (mode) {
+      case SCHEDULE_PRE_IPS_TD_HI:    this->pressure_panic_threshold = 0.90f; break;
+      case SCHEDULE_PRE_IPS_TD_LO:    this->pressure_panic_threshold = 0.50f; break;
+      case SCHEDULE_PRE_IPS_BU_HI:    this->pressure_panic_threshold = 0.90f; break;
+      case SCHEDULE_PRE_IPS_BU_MH:    this->pressure_panic_threshold = 0.75f; break;
+      case SCHEDULE_PRE_IPS_BU_MD:    this->pressure_panic_threshold = 0.65f; break;
+      case SCHEDULE_PRE_IPS_BU_ML:    this->pressure_panic_threshold = 0.55f; break;
+      case SCHEDULE_PRE_IPS_BU_LO:    this->pressure_panic_threshold = 0.45f; break;
+      default:
+         this->pressure_panic_threshold = 0.6f;
+      }
+
+      if (!post_reg_alloc) {
+         this->remaining_grf_uses = rzalloc_array(mem_ctx, int, grf_count);
+         this->grf_active = rzalloc_array(mem_ctx, bool, grf_count);
+      } else {
+         this->remaining_grf_uses = NULL;
+         this->grf_active = NULL;
+      }
+   }
+
+   ~instruction_scheduler()
+   {
+      ralloc_free(this->mem_ctx);
+   }
+   void add_barrier_deps(bool bu, schedule_node *n);
+   void add_dep(bool bu, schedule_node *before, schedule_node *after, int latency, int platency);
+   void add_dep(bool bu, schedule_node *before, schedule_node *after);
+
+   void run_td(exec_list *instructions);
+   void run_bu(exec_list *instructions);
+   void add_inst(backend_instruction *inst);
+   void compute_delay(schedule_node *node);
+#if USE_GRAPHVIZ
+   void find_critical_path(schedule_node *node);
+#endif // USE_GRAPHVIZ
+   virtual void calculate_deps(bool bu) = 0;
+   virtual schedule_node *choose_instruction_to_schedule_td() = 0;
+   virtual schedule_node *choose_instruction_to_schedule_bu() = 0;
+
+   /**
+    * Returns how many cycles it takes the instruction to issue.
+    *
+    * Instructions in gen hardware are handled one SIMD4 vector at a time,
+    * with 1 cycle per vector dispatched.  Thus SIMD8 pixel shaders take 2
+    * cycles to dispatch and SIMD16 (compressed) instructions take 4.
+    */
+   virtual int issue_time(backend_instruction *inst) = 0;
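+   /* E.g., by the model above, a SIMD8 FS instruction reports 2 cycles and a
+    * SIMD16 (compressed) one reports 4. */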
+
+   virtual void count_remaining_grf_uses(backend_instruction *inst) = 0;
+   virtual void update_register_pressure(backend_instruction *inst, bool bu) = 0;
+   virtual float get_register_pressure_benefit(backend_instruction *inst, bool bu) = 0;
+   virtual bool is_partially_scheduled(backend_instruction *inst) = 0;
+   virtual bool consumes_dst(backend_instruction *prev, backend_instruction *curr) = 0;
+
+   void schedule_instructions_td(backend_instruction *next_block_header, int live_ins);
+   void schedule_instructions_bu(backend_instruction *next_block_header, int live_outs);
+
+   void *mem_ctx;
+
+   bool post_reg_alloc;
+   int instructions_to_schedule;
+   int grf_count;
+   int allocatable_grfs;
+   int time;
+   int ptime;
+   exec_list instructions;
+   backend_visitor *bv;
+
+   float current_block_pressure;
+   float pressure_panic_threshold;
+   int block_num;
+
+   instruction_scheduler_mode mode;
+
+   // The previously scheduled node, or NULL if none
+   schedule_node *previous_chosen;
+
+   /**
+    * Number of instructions left to schedule that reference each vgrf.
+    *
+    * Used so that we can prefer scheduling instructions that will end the
+    * live intervals of multiple variables, to reduce register pressure.
+    */
+   int *remaining_grf_uses;
+
+   /**
+    * Tracks whether each VGRF has had an instruction scheduled that uses it.
+    *
+    * This is used to estimate whether scheduling a new instruction will
+    * increase register pressure.
+    */
+   bool *grf_active;
+};
+
+class fs_instruction_scheduler : public instruction_scheduler
+{
+public:
+   fs_instruction_scheduler(fs_visitor *v, int grf_count, int allocatable_grfs,
+                            instruction_scheduler_mode mode);
+   void calculate_deps(bool bu);
+   bool conflict(fs_reg *r0, int n0, fs_reg *r1, int n1);
+   bool is_compressed(fs_inst *inst);
+   schedule_node *choose_instruction_to_schedule_td();
+   schedule_node *choose_instruction_to_schedule_bu();
+   schedule_node *choose_instruction_to_schedule_classic();
+   int issue_time(backend_instruction *inst);
+   fs_visitor *v;
+
+   void count_remaining_grf_uses(backend_instruction *inst);
+   void update_register_pressure(backend_instruction *inst, bool bu);
+   float get_register_pressure_benefit(backend_instruction *inst, bool bu);
+   bool is_partially_scheduled(backend_instruction *inst);
+   bool consumes_dst(backend_instruction *prev, backend_instruction *curr);
+};
+
+fs_instruction_scheduler::fs_instruction_scheduler(fs_visitor *v,
+                                                   int grf_count, int allocatable_grfs,
+                                                   instruction_scheduler_mode mode)
+   : instruction_scheduler(v, grf_count, allocatable_grfs, mode),
+     v(v)
+{
+}
+
+void
+fs_instruction_scheduler::count_remaining_grf_uses(backend_instruction *be)
+{
+   fs_inst *inst = (fs_inst *)be;
+
+   if (!remaining_grf_uses)
+      return;
+
+   if (inst->dst.file == GRF)
+      remaining_grf_uses[inst->dst.reg]++;
+
+   for (int i = 0; i < 3; i++) {
+      if (inst->src[i].file != GRF)
+         continue;
+
+      remaining_grf_uses[inst->src[i].reg]++;
+   }
+}
+
+void
+fs_instruction_scheduler::update_register_pressure(backend_instruction *be, bool bu)
+{
+   fs_inst *inst = (fs_inst *)be;
+
+   if (!remaining_grf_uses)
+      return;
+
+   current_block_pressure -= get_register_pressure_benefit(be, bu);
+
+   if (bu) {
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file == GRF) {
+            remaining_grf_uses[inst->src[i].reg]--;
+            grf_active[inst->src[i].reg] = true;
+         }
+      }
+
+      if (inst->dst.file == GRF) {
+         grf_active[inst->dst.reg] = false;
+         remaining_grf_uses[inst->dst.reg]--;
+      }
+
+   } else {
+      if (inst->dst.file == GRF) {
+         remaining_grf_uses[inst->dst.reg]--;
+         grf_active[inst->dst.reg] = true;
+      }
+
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file == GRF) {
+            remaining_grf_uses[inst->src[i].reg]--;
+            grf_active[inst->src[i].reg] = true;
+         }
+      }
+   }
+}
+
+bool
+fs_instruction_scheduler::is_partially_scheduled(backend_instruction *be)
+{
+   /* Look for partial writes to register groups, because we'll prefer
+    * scheduling whole groups where possible.
+    */
+
+   fs_inst *inst = (fs_inst *)be;
+   int remaining = remaining_grf_uses[inst->dst.reg];
+
+   if (grf_active[inst->dst.reg] && remaining < 4) {
+      foreach_list(node, &instructions) {
+         schedule_node *n = (schedule_node *)node;
+
+         if (((fs_inst*)n->inst)->dst.reg == inst->dst.reg)
+            --remaining;
+      }
+   }
+
+   return remaining < remaining_grf_uses[inst->dst.reg];
+}
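+
+/* In effect: this returns true only when the destination's VGRF is already
+ * live (grf_active), has fewer than four uses left to schedule, and some
+ * instruction remaining on the unscheduled list writes the same VGRF, which
+ * nudges the chooser toward finishing a register group it has started.
+ */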
+
+bool
+fs_instruction_scheduler::consumes_dst(backend_instruction *prev, backend_instruction *curr)
+{
+   if (!prev || !curr)
+      return false;
+
+   fs_inst *fs_prev = (fs_inst *)prev;
+   fs_inst *fs_curr = (fs_inst *)curr;
+
+   if (fs_prev->dst.file != GRF)
+      return false;
+
+   for (int i = 0; i < 3; i++) {
+      if (fs_curr->src[i].file == GRF && fs_curr->src[i].reg == fs_prev->dst.reg)
+         return true;
+   }
+
+   return false;
+}
+
+float
+fs_instruction_scheduler::get_register_pressure_benefit(backend_instruction *be, bool bu)
+{
+   fs_inst *inst = (fs_inst *)be;
+   float benefit = 0.0f;
+
+   if (use_ips(mode))
+      if (inst->opcode == FS_OPCODE_LINTERP || inst->opcode == FS_OPCODE_CINTERP)
+         return -0.5f;
+
+   if (bu) {
+      if (inst->dst.file == GRF) {
+         if (remaining_grf_uses[inst->dst.reg] == 1)
+            benefit += v->virtual_grf_sizes[inst->dst.reg];
+      }
+
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file != GRF)
+            continue;
+
+         if (!grf_active[inst->src[i].reg])
+            benefit -= v->virtual_grf_sizes[inst->src[i].reg];
+      }
+
+   } else {
+      if (inst->dst.file == GRF) {
+         if (remaining_grf_uses[inst->dst.reg] == 1)
+            benefit += v->virtual_grf_sizes[inst->dst.reg];
+         if (!grf_active[inst->dst.reg])
+            benefit -= v->virtual_grf_sizes[inst->dst.reg];
+      }
+
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file != GRF)
+            continue;
+
+         if (remaining_grf_uses[inst->src[i].reg] == 1)
+            benefit += v->virtual_grf_sizes[inst->src[i].reg];
+         if (!grf_active[inst->src[i].reg])
+            benefit -= v->virtual_grf_sizes[inst->src[i].reg];
+      }
+   }
+
+   return benefit;
+}
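+
+/* Worked example (top-down case): if remaining_grf_uses[dst.reg] == 1, this
+ * write is the VGRF's last reference, so its whole virtual_grf_sizes[] entry
+ * is credited; writing or reading a VGRF that is not yet grf_active starts a
+ * new live range, so its size is debited.  A net-positive benefit therefore
+ * means scheduling the instruction should reduce register pressure.
+ */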
+
+class vec4_instruction_scheduler : public instruction_scheduler
+{
+public:
+   vec4_instruction_scheduler(vec4_visitor *v, int grf_count, int allocatable_grfs,
+                              instruction_scheduler_mode mode);
+   void calculate_deps(bool bu);
+   schedule_node *choose_instruction_to_schedule_td();
+   schedule_node *choose_instruction_to_schedule_bu();
+   int issue_time(backend_instruction *inst);
+   vec4_visitor *v;
+
+   void count_remaining_grf_uses(backend_instruction *inst);
+   void update_register_pressure(backend_instruction *inst, bool bu);
+   float get_register_pressure_benefit(backend_instruction *inst, bool bu);
+   bool is_partially_scheduled(backend_instruction *inst) { return false; }
+   bool consumes_dst(backend_instruction *prev, backend_instruction *curr) { return false; }
+};
+
+vec4_instruction_scheduler::vec4_instruction_scheduler(vec4_visitor *v,
+                                                       int grf_count, int allocatable_grfs,
+                                                       instruction_scheduler_mode mode)
+   : instruction_scheduler(v, grf_count, allocatable_grfs, mode),
+     v(v)
+{
+}
+
+void
+vec4_instruction_scheduler::count_remaining_grf_uses(backend_instruction *be)
+{
+}
+
+void
+vec4_instruction_scheduler::update_register_pressure(backend_instruction *be, bool bu)
+{
+}
+
+float
+vec4_instruction_scheduler::get_register_pressure_benefit(backend_instruction *be, bool bu)
+{
+   return 0.0f;
+}
+
+schedule_node::schedule_node(backend_instruction *inst,
+                             instruction_scheduler *sched)
+{
+   struct brw_context *brw = sched->bv->brw;
+
+   this->inst = inst;
+   this->child_array_size = 0;
+   this->children = NULL;
+   this->child_latency = NULL;
+   this->child_platency = NULL;
+   this->child_count = 0;
+   this->critical_path = false;
+   this->parent_count = 0;
+   this->unblocked_time = 0;
+   this->unblocked_ptime = 0;
+   this->cand_generation = 0;
+   this->delay = 0;
+
+   /* We can't measure Gen6 timings directly but expect them to be much
+    * closer to Gen7 than Gen4.
+    */
+   if (brw->gen >= 6)
+      set_latency_gen7(brw->is_haswell);
+   else
+      set_latency_gen4();
+
+   // Estimate of average thread coverage.  Threads might be more effective in
+   // many cases, but there are also cache misses and other high latency
+   // events.
+   static const int hw_thread_count = 2;
+
+   this->latency = this->latency / hw_thread_count;
+   this->latency = (this->latency < 1) ? 1 : this->latency;
+   this->platency = this->latency;
+
+   if (!sched->post_reg_alloc)   // Overwrite "scheduling" latency only
+      this->latency = 1;
+}
+
+void
+instruction_scheduler::add_inst(backend_instruction *inst)
+{
+   schedule_node *n = new(mem_ctx) schedule_node(inst, this);
+
+   assert(!inst->is_head_sentinel());
+   assert(!inst->is_tail_sentinel());
+
+   this->instructions_to_schedule++;
+
+   inst->remove();
+   instructions.push_tail(n);
+}
+
+#if USE_GRAPHVIZ
+/** Calculate critical path. */
+void
+instruction_scheduler::find_critical_path(schedule_node *n)
+{
+   if (!n || !n->child_count)
+      return;
+
+   do {
+      int delay = -65535;
+      schedule_node *next = 0;
+
+      n->critical_path = true;
+
+      for (int i = 0; i < n->child_count; i++) {
+         if (n->children[i]->delay > delay) {
+            next = n->children[i];
+            delay = n->children[i]->delay;
+         }
+      }
+
+      n = next;
+   } while (n);
+}
+#endif // USE_GRAPHVIZ
+
+/** Recursive computation of the delay member of a node. */
+void
+instruction_scheduler::compute_delay(schedule_node *n)
+{
+   if (!n->child_count) {
+      n->delay = issue_time(n->inst);
+   } else {
+      for (int i = 0; i < n->child_count; i++) {
+         if (!n->children[i]->delay)
+            compute_delay(n->children[i]);
+
+         n->delay = MAX2(n->delay, n->platency + n->children[i]->delay);
+      }
+   }
+}
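+
+/* Small example: for a dependence chain a -> b -> c where c is a leaf,
+ * delay(c) = issue_time(c), delay(b) = platency(b) + delay(c), and
+ * delay(a) = platency(a) + delay(b); i.e. delay is the physical-latency
+ * length of the longest path from the node to the end of the block.
+ */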
+
+/**
+ * Add a dependency between two instruction nodes.
+ *
+ * The @after node will be scheduled after @before.  We will try to
+ * schedule it @latency cycles after @before, but no guarantees there.
+ */
+void
+instruction_scheduler::add_dep(bool bu, schedule_node *before, schedule_node *after,
+			       int latency, int platency)
+{
+   if (!before || !after)
+      return;
+
+   assert(before != after);
+
+   if (bu) {
+      for (int i = 0; i < after->child_count; i++) {
+         if (after->children[i] == before) {
+            return;
+         }
+      }
+
+      if (after->child_array_size <= after->child_count) {
+         if (after->child_array_size < 16)
+            after->child_array_size = 16;
+         else
+            after->child_array_size *= 2;
+
+         after->children = reralloc(mem_ctx, after->children,
+                                     schedule_node *,
+                                     after->child_array_size);
+
+         after->child_latency = reralloc(mem_ctx, after->child_latency,
+                                         int, after->child_array_size);
+         after->child_platency = reralloc(mem_ctx, after->child_platency,
+                                          int, after->child_array_size);
+      }
+
+      after->children[after->child_count] = before;
+      after->child_latency[after->child_count] = latency;
+      after->child_platency[after->child_count] = platency;
+      after->child_count++;
+      before->parent_count++;
+   } else {
+      for (int i = 0; i < before->child_count; i++) {
+         if (before->children[i] == after) {
+            before->child_latency[i] = MAX2(before->child_latency[i], latency);
+            before->child_platency[i] = MAX2(before->child_platency[i], platency);
+            return;
+         }
+      }
+
+      if (before->child_array_size <= before->child_count) {
+         if (before->child_array_size < 16)
+            before->child_array_size = 16;
+         else
+            before->child_array_size *= 2;
+
+         before->children = reralloc(mem_ctx, before->children,
+                                     schedule_node *,
+                                     before->child_array_size);
+         before->child_latency = reralloc(mem_ctx, before->child_latency,
+                                          int, before->child_array_size);
+         before->child_platency = reralloc(mem_ctx, before->child_platency,
+                                           int, before->child_array_size);
+      }
+
+      before->children[before->child_count] = after;
+      before->child_latency[before->child_count] = latency;
+      before->child_platency[before->child_count] = platency;
+      before->child_count++;
+      after->parent_count++;
+   }
+}
+
+void
+instruction_scheduler::add_dep(bool bu, schedule_node *before, schedule_node *after)
+{
+   if (!before)
+      return;
+
+   add_dep(bu, before, after, before->latency, before->platency);
+}
+
+/**
+ * Sometimes we really want this node to execute after everything that
+ * comes before it and before everything that follows it.  This adds
+ * the deps to do so.
+ */
+void
+instruction_scheduler::add_barrier_deps(bool bu, schedule_node *n)
+{
+   schedule_node *prev = (schedule_node *)n->prev;
+   schedule_node *next = (schedule_node *)n->next;
+
+   if (prev) {
+      while (!prev->is_head_sentinel()) {
+	 add_dep(bu, prev, n, 0, 0);
+	 prev = (schedule_node *)prev->prev;
+      }
+   }
+
+   if (next) {
+      while (!next->is_tail_sentinel()) {
+	 add_dep(bu, n, next, 0, 0);
+	 next = (schedule_node *)next->next;
+      }
+   }
+}
+
+/* Instruction scheduling needs to be aware of when an MRF write
+ * actually writes 2 MRFs.
+ */
+bool
+fs_instruction_scheduler::is_compressed(fs_inst *inst)
+{
+   return (v->dispatch_width == 16 &&
+	   !inst->force_uncompressed &&
+	   !inst->force_sechalf);
+}
+
+bool fs_instruction_scheduler::conflict(fs_reg *r0, int n0, fs_reg *r1, int n1)
+{
+   const int r0_start = r0->reg + r0->reg_offset + 0;
+   const int r0_end   = r0->reg + r0->reg_offset + n0;
+   const int r1_start = r1->reg + r1->reg_offset + 0;
+   const int r1_end   = r1->reg + r1->reg_offset + n1;
+
+   return !(r0_end < r1_start || r0_start > r1_end);
+}
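+
+/* Note that the end points above are one past the last register touched but
+ * the test treats them inclusively, so the check is conservative by one
+ * register: e.g. r0 = {reg 2, n0 = 2} covers GRFs 2..3, yet a range starting
+ * at GRF 4 still reports a conflict.  A false positive merely adds a
+ * spurious dependence edge, which is safe.
+ */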
+
+
+void
+fs_instruction_scheduler::calculate_deps(bool bu)
+{
+   const bool gen6plus = v->brw->gen >= 6;
+
+   /* Pre-register-allocation, this tracks the last write per VGRF (so
+    * different reg_offsets within it can interfere when they shouldn't).
+    * After register allocation, reg_offsets are gone and we track individual
+    * GRF registers.
+    */
+   schedule_node *last_grf_write[grf_count];
+   schedule_node *last_mrf_write[BRW_MAX_MRF];
+   schedule_node *last_conditional_mod[2] = { NULL, NULL };
+   schedule_node *last_accumulator_write = NULL;
+   /* Fixed HW registers are assumed to be separate from the virtual
+    * GRFs, so they can be tracked separately.  We don't really write
+    * to fixed GRFs much, so don't bother tracking them on a more
+    * granular level.
+    */
+   schedule_node *last_fixed_grf_write = NULL;
+   int reg_width = v->dispatch_width / 8;
+
+   /* The last instruction always needs to still be the last
+    * instruction.  Either it's flow control (IF, ELSE, ENDIF, DO,
+    * WHILE) and scheduling other things after it would disturb the
+    * basic block, or it's FB_WRITE and we should do a better job at
+    * dead code elimination anyway.
+    */
+   schedule_node *last = (schedule_node *)instructions.get_tail();
+   add_barrier_deps(bu, last);
+
+   memset(last_grf_write, 0, sizeof(last_grf_write));
+   memset(last_mrf_write, 0, sizeof(last_mrf_write));
+
+   /* top-to-bottom dependencies: RAW and WAW. */
+   foreach_list(node, &instructions) {
+      schedule_node *n = (schedule_node *)node;
+      fs_inst *inst = (fs_inst *)n->inst;
+
+      if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT ||
+         inst->has_side_effects())
+         add_barrier_deps(bu, n);
+
+      /* read-after-write deps. */
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file == GRF) {
+            if (post_reg_alloc) {
+               for (int r = 0; r < reg_width * inst->regs_read(v, i); r++)
+                  add_dep(bu, last_grf_write[inst->src[i].reg + r], n);
+            } else {
+                fs_inst* writer = last_grf_write[inst->src[i].reg] ?
+                    (fs_inst*)last_grf_write[inst->src[i].reg]->inst : 0;
+                fs_inst* reader = (fs_inst*)n->inst;
+
+                if (reader && writer)
+                    // LunarG TODO: Fix and re-enable component-level dependence analysis
+                    // With the commented logic, dependencies are not recorded if the previous
+                    // write did not include the current source components. This misses dependencies
+                    // if a write preceding the last did include the current source components.
+                    // if (conflict(&writer->dst,    writer->regs_written,
+                    //             &reader->src[i], reader->regs_read(v, i)))
+                        add_dep(bu, last_grf_write[inst->src[i].reg], n);
+            }
+         } else if (inst->src[i].file == HW_REG &&
+                    (inst->src[i].fixed_hw_reg.file ==
+                     BRW_GENERAL_REGISTER_FILE)) {
+            if (post_reg_alloc) {
+               int size = reg_width;
+               if (inst->src[i].fixed_hw_reg.vstride == BRW_VERTICAL_STRIDE_0)
+                  size = 1;
+               for (int r = 0; r < size; r++)
+                  add_dep(bu, last_grf_write[inst->src[i].fixed_hw_reg.nr + r], n);
+            } else {
+               add_dep(bu, last_fixed_grf_write, n);
+            }
+         } else if (inst->src[i].is_accumulator() && gen6plus) {
+            add_dep(bu, last_accumulator_write, n);
+         } else if (inst->src[i].file != BAD_FILE &&
+                    inst->src[i].file != IMM &&
+                    inst->src[i].file != UNIFORM &&
+                    (inst->src[i].file != HW_REG ||
+                     inst->src[i].fixed_hw_reg.file != IMM)) {
+            assert(inst->src[i].file != MRF);
+            add_barrier_deps(bu, n);
+         }
+      }
+
+      if (inst->base_mrf != -1) {
+         for (int i = 0; i < inst->mlen; i++) {
+            /* It looks like the MRF regs are released in the send
+             * instruction once it's sent, not when the result comes
+             * back.
+             */
+            add_dep(bu, last_mrf_write[inst->base_mrf + i], n);
+         }
+      }
+
+      if (inst->reads_flag()) {
+         add_dep(bu, last_conditional_mod[inst->flag_subreg], n);
+      }
+
+      if (inst->reads_accumulator_implicitly()) {
+         if (gen6plus) {
+             add_dep(bu, last_accumulator_write, n);
+         } else {
+             add_barrier_deps(bu, n);
+         }
+      }
+
+      /* write-after-write deps. */
+      if (inst->dst.file == GRF) {
+         if (post_reg_alloc) {
+            for (int r = 0; r < inst->regs_written * reg_width; r++) {
+               add_dep(bu, last_grf_write[inst->dst.reg + r], n);
+               last_grf_write[inst->dst.reg + r] = n;
+            }
+         } else {
+            add_dep(bu, last_grf_write[inst->dst.reg], n);
+            last_grf_write[inst->dst.reg] = n;
+         }
+      } else if (inst->dst.file == MRF) {
+         int reg = inst->dst.reg & ~BRW_MRF_COMPR4;
+
+         add_dep(bu, last_mrf_write[reg], n);
+         last_mrf_write[reg] = n;
+         if (is_compressed(inst)) {
+            if (inst->dst.reg & BRW_MRF_COMPR4)
+               reg += 4;
+            else
+               reg++;
+            add_dep(bu, last_mrf_write[reg], n);
+            last_mrf_write[reg] = n;
+         }
+      } else if (inst->dst.file == HW_REG &&
+                 inst->dst.fixed_hw_reg.file == BRW_GENERAL_REGISTER_FILE) {
+         if (post_reg_alloc) {
+            for (int r = 0; r < reg_width; r++)
+               last_grf_write[inst->dst.fixed_hw_reg.nr + r] = n;
+         } else {
+            last_fixed_grf_write = n;
+         }
+      } else if (inst->dst.is_accumulator() && gen6plus) {
+         add_dep(bu, last_accumulator_write, n);
+         last_accumulator_write = n;
+      } else if (inst->dst.file != BAD_FILE) {
+         add_barrier_deps(bu, n);
+      }
+
+      if (inst->mlen > 0 && inst->base_mrf != -1) {
+         for (int i = 0; i < v->implied_mrf_writes(inst); i++) {
+            add_dep(bu, last_mrf_write[inst->base_mrf + i], n);
+            last_mrf_write[inst->base_mrf + i] = n;
+         }
+      }
+
+      if (inst->writes_flag()) {
+         add_dep(bu, last_conditional_mod[inst->flag_subreg], n, 0, 0);
+         last_conditional_mod[inst->flag_subreg] = n;
+      }
+
+      if (inst->writes_accumulator) {
+         if (gen6plus) {
+             add_dep(bu, last_accumulator_write, n);
+             last_accumulator_write = n;
+         } else {
+            add_barrier_deps(bu, n);
+         }
+      }
+   }
+
+   /* bottom-to-top dependencies: WAR */
+   memset(last_grf_write, 0, sizeof(last_grf_write));
+   memset(last_mrf_write, 0, sizeof(last_mrf_write));
+   memset(last_conditional_mod, 0, sizeof(last_conditional_mod));
+   last_accumulator_write = NULL;
+   last_fixed_grf_write = NULL;
+
+   exec_node *node;
+   exec_node *prev;
+   for (node = instructions.get_tail(), prev = node->prev;
+        !node->is_head_sentinel();
+        node = prev, prev = node->prev) {
+      schedule_node *n = (schedule_node *)node;
+      fs_inst *inst = (fs_inst *)n->inst;
+
+      /* write-after-read deps. */
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file == GRF) {
+            if (post_reg_alloc) {
+               for (int r = 0; r < reg_width * inst->regs_read(v, i); r++)
+                  add_dep(bu, n, last_grf_write[inst->src[i].reg + r]);
+            } else {
+                fs_inst* writer = last_grf_write[inst->src[i].reg] ?
+                    (fs_inst*)last_grf_write[inst->src[i].reg]->inst : 0;
+                fs_inst* reader = (fs_inst*)n->inst;
+
+                if (reader && writer)
+                    if (conflict(&writer->dst,    writer->regs_written,
+                                 &reader->src[i], reader->regs_read(v, i)))
+                        add_dep(bu, n, last_grf_write[inst->src[i].reg]);
+            }
+         } else if (inst->src[i].file == HW_REG &&
+                    (inst->src[i].fixed_hw_reg.file ==
+                     BRW_GENERAL_REGISTER_FILE)) {
+            if (post_reg_alloc) {
+               int size = reg_width;
+               if (inst->src[i].fixed_hw_reg.vstride == BRW_VERTICAL_STRIDE_0)
+                  size = 1;
+               for (int r = 0; r < size; r++)
+                  add_dep(bu, n, last_grf_write[inst->src[i].fixed_hw_reg.nr + r]);
+            } else {
+               add_dep(bu, n, last_fixed_grf_write);
+            }
+         } else if (inst->src[i].is_accumulator() && gen6plus) {
+            add_dep(bu, n, last_accumulator_write);
+         } else if (inst->src[i].file != BAD_FILE &&
+                    inst->src[i].file != IMM &&
+                    inst->src[i].file != UNIFORM &&
+                    (inst->src[i].file != HW_REG ||
+                     inst->src[i].fixed_hw_reg.file != IMM)) {
+            assert(inst->src[i].file != MRF);
+            add_barrier_deps(bu, n);
+         }
+      }
+
+      if (inst->base_mrf != -1) {
+         for (int i = 0; i < inst->mlen; i++) {
+            /* It looks like the MRF regs are released in the send
+             * instruction once it's sent, not when the result comes
+             * back.
+             */
+            add_dep(bu, n, last_mrf_write[inst->base_mrf + i], 2, 2);
+         }
+      }
+
+      if (inst->reads_flag()) {
+         add_dep(bu, n, last_conditional_mod[inst->flag_subreg]);
+      }
+
+      if (inst->reads_accumulator_implicitly()) {
+         if (gen6plus) {
+            add_dep(bu, n, last_accumulator_write);
+         } else {
+            add_barrier_deps(bu, n);
+         }
+      }
+
+      /* Update the things this instruction wrote, so earlier reads
+       * can mark this as WAR dependency.
+       */
+      if (inst->dst.file == GRF) {
+         if (post_reg_alloc) {
+            for (int r = 0; r < inst->regs_written * reg_width; r++)
+               last_grf_write[inst->dst.reg + r] = n;
+         } else {
+            last_grf_write[inst->dst.reg] = n;
+         }
+      } else if (inst->dst.file == MRF) {
+         int reg = inst->dst.reg & ~BRW_MRF_COMPR4;
+
+         last_mrf_write[reg] = n;
+
+         if (is_compressed(inst)) {
+            if (inst->dst.reg & BRW_MRF_COMPR4)
+               reg += 4;
+            else
+               reg++;
+
+            last_mrf_write[reg] = n;
+         }
+      } else if (inst->dst.file == HW_REG &&
+                 inst->dst.fixed_hw_reg.file == BRW_GENERAL_REGISTER_FILE) {
+         if (post_reg_alloc) {
+            for (int r = 0; r < reg_width; r++)
+               last_grf_write[inst->dst.fixed_hw_reg.nr + r] = n;
+         } else {
+            last_fixed_grf_write = n;
+         }
+      } else if (inst->dst.is_accumulator() && gen6plus) {
+         last_accumulator_write = n;
+      } else if (inst->dst.file != BAD_FILE) {
+         add_barrier_deps(bu, n);
+      }
+
+      if (inst->mlen > 0 && inst->base_mrf != -1) {
+         for (int i = 0; i < v->implied_mrf_writes(inst); i++) {
+            last_mrf_write[inst->base_mrf + i] = n;
+         }
+      }
+
+      if (inst->writes_flag()) {
+         last_conditional_mod[inst->flag_subreg] = n;
+      }
+
+      if (inst->writes_accumulator) {
+         if (gen6plus) {
+            last_accumulator_write = n;
+         } else {
+            add_barrier_deps(bu, n);
+         }
+      }
+   }
+}
+
+void
+vec4_instruction_scheduler::calculate_deps(bool bu)
+{
+   const bool gen6plus = v->brw->gen >= 6;
+
+   schedule_node *last_grf_write[grf_count];
+   schedule_node *last_mrf_write[BRW_MAX_MRF];
+   schedule_node *last_conditional_mod = NULL;
+   schedule_node *last_accumulator_write = NULL;
+   /* Fixed HW registers are assumed to be separate from the virtual
+    * GRFs, so they can be tracked separately.  We don't really write
+    * to fixed GRFs much, so don't bother tracking them on a more
+    * granular level.
+    */
+   schedule_node *last_fixed_grf_write = NULL;
+
+   /* The last instruction always needs to still be the last instruction.
+    * Either it's flow control (IF, ELSE, ENDIF, DO, WHILE) and scheduling
+    * other things after it would disturb the basic block, or it's the EOT
+    * URB_WRITE and we should do a better job at dead code eliminating
+    * anything that could have been scheduled after it.
+    */
+   schedule_node *last = (schedule_node *)instructions.get_tail();
+   add_barrier_deps(bu, last);
+
+   memset(last_grf_write, 0, sizeof(last_grf_write));
+   memset(last_mrf_write, 0, sizeof(last_mrf_write));
+
+   /* top-to-bottom dependencies: RAW and WAW. */
+   foreach_list(node, &instructions) {
+      schedule_node *n = (schedule_node *)node;
+      vec4_instruction *inst = (vec4_instruction *)n->inst;
+
+      if (inst->has_side_effects())
+         add_barrier_deps(bu, n);
+
+      /* read-after-write deps. */
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file == GRF) {
+            add_dep(bu, last_grf_write[inst->src[i].reg], n);
+         } else if (inst->src[i].file == HW_REG &&
+                    (inst->src[i].fixed_hw_reg.file ==
+                     BRW_GENERAL_REGISTER_FILE)) {
+            add_dep(bu, last_fixed_grf_write, n);
+         } else if (inst->src[i].is_accumulator() && gen6plus) {
+            assert(last_accumulator_write);
+            add_dep(bu, last_accumulator_write, n);
+         } else if (inst->src[i].file != BAD_FILE &&
+                    inst->src[i].file != IMM &&
+                    inst->src[i].file != UNIFORM) {
+            /* No reads from MRF, and ATTR is already translated away */
+            assert(inst->src[i].file != MRF &&
+                   inst->src[i].file != ATTR);
+            add_barrier_deps(bu, n);
+         }
+      }
+
+      for (int i = 0; i < inst->mlen; i++) {
+         /* It looks like the MRF regs are released in the send
+          * instruction once it's sent, not when the result comes
+          * back.
+          */
+         add_dep(bu, last_mrf_write[inst->base_mrf + i], n);
+      }
+
+      if (inst->reads_flag()) {
+         assert(last_conditional_mod);
+         add_dep(bu, last_conditional_mod, n);
+      }
+
+      if (inst->reads_accumulator_implicitly()) {
+         if (gen6plus) {
+            assert(last_accumulator_write);
+            add_dep(bu, last_accumulator_write, n);
+         } else {
+            add_barrier_deps(bu, n);
+         }
+      }
+
+      /* write-after-write deps. */
+      if (inst->dst.file == GRF) {
+         add_dep(bu, last_grf_write[inst->dst.reg], n);
+         last_grf_write[inst->dst.reg] = n;
+      } else if (inst->dst.file == MRF) {
+         add_dep(bu, last_mrf_write[inst->dst.reg], n);
+         last_mrf_write[inst->dst.reg] = n;
+      } else if (inst->dst.file == HW_REG &&
+                 inst->dst.fixed_hw_reg.file == BRW_GENERAL_REGISTER_FILE) {
+         last_fixed_grf_write = n;
+      } else if (inst->dst.is_accumulator() && gen6plus) {
+         add_dep(bu, last_accumulator_write, n);
+         last_accumulator_write = n;
+      } else if (inst->dst.file != BAD_FILE) {
+         add_barrier_deps(bu, n);
+      }
+
+      if (inst->mlen > 0) {
+         for (int i = 0; i < v->implied_mrf_writes(inst); i++) {
+            add_dep(bu, last_mrf_write[inst->base_mrf + i], n);
+            last_mrf_write[inst->base_mrf + i] = n;
+         }
+      }
+
+      if (inst->writes_flag()) {
+         add_dep(bu, last_conditional_mod, n, 0, 0);
+         last_conditional_mod = n;
+      }
+
+      if (inst->writes_accumulator) {
+         if (gen6plus) {
+            add_dep(bu, last_accumulator_write, n);
+            last_accumulator_write = n;
+         } else {
+            add_barrier_deps(bu, n);
+         }
+      }
+   }
+
+   /* bottom-to-top dependencies: WAR */
+   memset(last_grf_write, 0, sizeof(last_grf_write));
+   memset(last_mrf_write, 0, sizeof(last_mrf_write));
+   last_conditional_mod = NULL;
+   last_accumulator_write = NULL;
+   last_fixed_grf_write = NULL;
+
+   exec_node *node;
+   exec_node *prev;
+   for (node = instructions.get_tail(), prev = node->prev;
+        !node->is_head_sentinel();
+        node = prev, prev = node->prev) {
+      schedule_node *n = (schedule_node *)node;
+      vec4_instruction *inst = (vec4_instruction *)n->inst;
+
+      /* write-after-read deps. */
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file == GRF) {
+            add_dep(bu, n, last_grf_write[inst->src[i].reg]);
+         } else if (inst->src[i].file == HW_REG &&
+                    (inst->src[i].fixed_hw_reg.file ==
+                     BRW_GENERAL_REGISTER_FILE)) {
+            add_dep(bu, n, last_fixed_grf_write);
+         } else if (inst->src[i].is_accumulator() && gen6plus) {
+            add_dep(bu, n, last_accumulator_write);
+         } else if (inst->src[i].file != BAD_FILE &&
+                    inst->src[i].file != IMM &&
+                    inst->src[i].file != UNIFORM) {
+            assert(inst->src[i].file != MRF &&
+                   inst->src[i].file != ATTR);
+            add_barrier_deps(bu, n);
+         }
+      }
+
+      for (int i = 0; i < inst->mlen; i++) {
+         /* It looks like the MRF regs are released in the send
+          * instruction once it's sent, not when the result comes
+          * back.
+          */
+         add_dep(bu, n, last_mrf_write[inst->base_mrf + i], 2, 2);
+      }
+
+      if (inst->reads_flag()) {
+         add_dep(bu, n, last_conditional_mod);
+      }
+
+      if (inst->reads_accumulator_implicitly()) {
+         if (gen6plus) {
+            add_dep(bu, n, last_accumulator_write);
+         } else {
+            add_barrier_deps(bu, n);
+         }
+      }
+
+      /* Update the things this instruction wrote, so earlier reads
+       * can mark this as WAR dependency.
+       */
+      if (inst->dst.file == GRF) {
+         last_grf_write[inst->dst.reg] = n;
+      } else if (inst->dst.file == MRF) {
+         last_mrf_write[inst->dst.reg] = n;
+      } else if (inst->dst.file == HW_REG &&
+                 inst->dst.fixed_hw_reg.file == BRW_GENERAL_REGISTER_FILE) {
+         last_fixed_grf_write = n;
+      } else if (inst->dst.is_accumulator() && gen6plus) {
+         last_accumulator_write = n;
+      } else if (inst->dst.file != BAD_FILE) {
+         add_barrier_deps(bu, n);
+      }
+
+      if (inst->mlen > 0) {
+         for (int i = 0; i < v->implied_mrf_writes(inst); i++) {
+            last_mrf_write[inst->base_mrf + i] = n;
+         }
+      }
+
+      if (inst->writes_flag()) {
+         last_conditional_mod = n;
+      }
+
+      if (inst->writes_accumulator) {
+         if (gen6plus) {
+            last_accumulator_write = n;
+         } else {
+            add_barrier_deps(bu, n);
+         }
+      }
+   }
+}
+
+schedule_node *
+fs_instruction_scheduler::choose_instruction_to_schedule_classic()
+{
+   schedule_node *chosen = NULL;
+
+   if (mode == SCHEDULE_PRE || mode == SCHEDULE_POST) {
+      int chosen_time = 0;
+
+      /* Of the instructions ready to execute or the closest to
+       * being ready, choose the oldest one.
+       */
+      foreach_list(node, &instructions) {
+         schedule_node *n = (schedule_node *)node;
+
+         if (!chosen || n->unblocked_time < chosen_time) {
+            chosen = n;
+            chosen_time = n->unblocked_time;
+         }
+      }
+   } else {
+      /* Before register allocation, we don't care about the latencies of
+       * instructions.  All we care about is reducing live intervals of
+       * variables so that we can avoid register spilling, or get SIMD16
+       * shaders which naturally do a better job of hiding instruction
+       * latency.
+       */
+      foreach_list(node, &instructions) {
+         schedule_node *n = (schedule_node *)node;
+         fs_inst *inst = (fs_inst *)n->inst;
+
+         if (!chosen) {
+            chosen = n;
+            continue;
+         }
+
+         /* Most important: If we can definitely reduce register pressure, do
+          * so immediately.
+          */
+         float register_pressure_benefit = get_register_pressure_benefit(n->inst, false);
+         float chosen_register_pressure_benefit =
+            get_register_pressure_benefit(chosen->inst, false);
+
+         if (register_pressure_benefit > 0 &&
+             register_pressure_benefit > chosen_register_pressure_benefit) {
+            chosen = n;
+            continue;
+         } else if (chosen_register_pressure_benefit > 0 &&
+                    (register_pressure_benefit <
+                     chosen_register_pressure_benefit)) {
+            continue;
+         }
+
+         if (mode == SCHEDULE_PRE_LIFO) {
+            /* Prefer instructions that recently became available for
+             * scheduling.  These are the things that are most likely to
+             * (eventually) make a variable dead and reduce register pressure.
+             * Typical register pressure estimates don't work for us because
+             * most of our pressure comes from texturing, where no single
+             * instruction to schedule will make a vec4 value dead.
+             */
+            if (n->cand_generation > chosen->cand_generation) {
+               chosen = n;
+               continue;
+            } else if (n->cand_generation < chosen->cand_generation) {
+               continue;
+            }
+
+            /* On MRF-using chips, prefer non-SEND instructions.  If we don't
+             * do this, then because we prefer instructions that just became
+             * candidates, we'll end up in a pattern of scheduling a SEND,
+             * then the MRFs for the next SEND, then the next SEND, then the
+             * MRFs, etc., without ever consuming the results of a send.
+             */
+            if (v->brw->gen < 7) {
+               fs_inst *chosen_inst = (fs_inst *)chosen->inst;
+
+               /* We use regs_written > 1 as our test for the kind of send
+                * instruction to avoid -- only sends generate many regs, and a
+                * single-result send is probably actually reducing register
+                * pressure.
+                */
+               if (inst->regs_written <= 1 && chosen_inst->regs_written > 1) {
+                  chosen = n;
+                  continue;
+               } else if (inst->regs_written > chosen_inst->regs_written) {
+                  continue;
+               }
+            }
+         }
+
+         /* For instructions pushed on the cands list at the same time, prefer
+          * the one with the highest delay to the end of the program.  This is
+          * most likely to have its values able to be consumed first (such as
+          * for a large tree of lowered ubo loads, which appear reversed in
+          * the instruction stream with respect to when they can be consumed).
+          */
+         if (n->delay > chosen->delay) {
+            chosen = n;
+            continue;
+         } else if (n->delay < chosen->delay) {
+            continue;
+         }
+
+         /* If all other metrics are equal, we prefer the first instruction in
+          * the list (program execution).
+          */
+      }
+   }
+
+   return chosen;
+}
+
+schedule_node *
+fs_instruction_scheduler::choose_instruction_to_schedule_td()
+{
+   if (!use_ips(mode))
+      return choose_instruction_to_schedule_classic();
+
+   schedule_node *chosen = NULL;
+   schedule_node *first = NULL;
+
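+   /* Flip into "panic" mode once the estimated pressure for this block
+    * exceeds a fixed fraction of the allocatable GRFs.  Illustratively, with
+    * a pressure_panic_threshold of 0.9 and 128 allocatable GRFs, panic mode
+    * would engage above ~115 registers.
+    */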
+   const float data_pressure_fraction  = current_block_pressure / allocatable_grfs;
+   const bool  data_pressure_panic = data_pressure_fraction > pressure_panic_threshold;
+   float chosen_weight = -1e10f;
+
+   int chosen_time = 0;
+
+   foreach_list(node, &instructions) {
+      schedule_node *n = (schedule_node *)node;
+      if (!chosen)
+         chosen = n;
+
+      if (!first)
+         first = n;
+
+      if (use_ips(mode)) {
+         fs_inst *inst = (fs_inst *)n->inst;
+
+         const bool partially_scheduled = is_partially_scheduled(inst);
+         const float pressure_reduction = get_register_pressure_benefit(inst, false);
+
+         // Weighting factors, each clamped to [0..1] by clamp01()
+         const float latency_factor     = clamp01(n->platency / 25.0f);
+         const float pressure_factor    = clamp01(pressure_reduction / 4.0f);
+         const float partial_factor     = partially_scheduled ? 1.0f : 0.0f;
+         const float locality_factor    = (previous_chosen && consumes_dst(previous_chosen->inst, inst)) ? 1.0f : 0.0f;
+         const float generation_factor  = clamp01((n->cand_generation - first->cand_generation) / 20.0f);
+         const float delay_factor       = clamp01((n->delay - first->delay) / 100.0f);
+         const float unblocked_factor   = clamp01((n->unblocked_ptime - ptime) / 20.0f);
+
+         float weight;
+
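+         /* Collapse the factors into one scalar weight.  In panic mode we
+          * heavily penalize latency and reward pressure reduction; otherwise
+          * we mostly chase the critical path via delay_factor.
+          */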
+         if (data_pressure_panic) {
+            weight =
+               latency_factor     * -100.0f +
+               pressure_factor    *   20.0f  +
+               partial_factor     *   20.0f  +
+               locality_factor    *    1.0f  +
+               generation_factor  *   10.0f  +
+               delay_factor       *    0.0f  +
+               unblocked_factor   *   10.0f  +
+               0.0f;
+         } else {
+            weight =
+               delay_factor       *  500.0f +
+               unblocked_factor   * -200.0f +
+               latency_factor     *  100.0f +
+               0.0f;
+
+         }
+
+         if (weight > chosen_weight) {
+            chosen_weight = weight;
+            chosen = n;
+         }
+
+         if (debug) {
+            fprintf(stderr, "factors: W=%7.2f: %5.2f %5.2f %s: ",
+                    weight,
+                    delay_factor,
+                    unblocked_factor,
+                    data_pressure_panic ? "!" : "");
+            bv->dump_instruction(inst);
+         }
+      } else { /* non-IPS modes; normally handled by the classic chooser */
+         /* This branch should be unreachable (non-IPS modes return through
+          * choose_instruction_to_schedule_classic() above).  Seed chosen_time
+          * from the first candidate so the comparison is meaningful.
+          */
+         if (chosen == n || n->unblocked_time < chosen_time) {
+            chosen = n;
+            chosen_time = n->unblocked_time;
+         }
+      }
+   }
+
+   return chosen;
+}
+
+schedule_node *
+fs_instruction_scheduler::choose_instruction_to_schedule_bu()
+{
+   schedule_node *chosen = NULL;
+   schedule_node *first = NULL;
+   float chosen_weight = -1e10f;
+
+   const bool data_pressure_panic = current_block_pressure > (allocatable_grfs * pressure_panic_threshold);
+
+   foreach_list(node, &instructions) {
+      schedule_node *n = (schedule_node *)node;
+      fs_inst *inst = (fs_inst *)n->inst;
+
+      if (!first)
+         first = n;
+
+      if (!chosen)
+         chosen = n;
+
+      const float pressure_reduction = get_register_pressure_benefit(inst, true);
+
+      if (inst->dst.file == GRF)
+         assert(inst->dst.reg < v->virtual_grf_count);
+
+      const float lifetime_factor    = (inst->dst.file == GRF) ?
+         clamp01((v->virtual_grf_end[inst->dst.reg] - v->virtual_grf_start[inst->dst.reg]) / 200.0f) : 0.0f;
+      const float pressure_factor    = clamp01(pressure_reduction / 4.0f);
+      const float delay_factor       = clamp01((n->delay - first->delay) / 100.0f);
+      const float parent_factor      = clamp01(n->parent_count / 20.0f);
+      const float partial_factor     = inst->dst.file == GRF ? clamp01(remaining_grf_uses[inst->dst.reg] / 4.0) : 0.0f;
+      const float unblocked_factor   = clamp01((n->unblocked_ptime - ptime) / 20.0f);
+      const float panic_factor       = ((current_block_pressure / float(allocatable_grfs * pressure_panic_threshold)) - 1.0f) /
+         (1.0f - pressure_panic_threshold);
+      const float offset_factor      = (inst->src[0].file == GRF) ? inst->src[0].reg_offset : 0.0f;
+
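+      /* panic_factor scales from 0 when pressure sits exactly at the panic
+       * threshold up to roughly 1 when pressure reaches the full allocatable
+       * budget; it is used below to lerp between the panic weight sets.
+       */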
+      float weight = 0.0f;
+
+      switch (mode) {
+      case SCHEDULE_PRE_IPS_BU_LIMIT:
+         weight =
+            lifetime_factor *   100.0f;
+         break;
+      case SCHEDULE_PRE_IPS_BU_HI:    // fall through
+      case SCHEDULE_PRE_IPS_BU_MH:    // ...
+      case SCHEDULE_PRE_IPS_BU_MD:    // ...
+      case SCHEDULE_PRE_IPS_BU_ML:    // ...
+      case SCHEDULE_PRE_IPS_BU_LO:
+         if (data_pressure_panic) {
+            weight =
+               pressure_factor  *   lerp(50.0f, 100.0f, panic_factor) +
+               unblocked_factor *   lerp(-70.0f, -50.0f, panic_factor) +
+               delay_factor     *   lerp(50.0f, 20.0f, panic_factor) +
+               lifetime_factor  *    10.0f +
+               offset_factor    *     5.0f +
+               0.0;
+         } else {
+            weight =
+               delay_factor     *    50.0f +
+               unblocked_factor *   -70.0f +
+               partial_factor   *    50.0f +
+               pressure_factor  *     5.0f +
+               offset_factor    *     5.0f +
+               0.0;
+         }
+         break;
+
+      default:
+         assert(0);
+         weight = lifetime_factor *   100.0f;
+      }
+
+      if (debug && detail_debug) {
+         fprintf(stderr, "BU factors: W=%7.2f: D=%5.2f L=%5.2f P=%5.2f parents=%5.2f Part=%5.2f: ",
+                 weight,
+                 delay_factor,
+                 lifetime_factor,
+                 pressure_factor,
+                 parent_factor,
+                 partial_factor);
+         bv->dump_instruction(inst);
+      }
+
+      if (weight >= chosen_weight) {
+         chosen_weight = weight;
+         chosen = n;
+      }
+   }
+
+   return (schedule_node*)chosen;
+}
+
+schedule_node *
+vec4_instruction_scheduler::choose_instruction_to_schedule_bu()
+{
+   return choose_instruction_to_schedule_td();   // TODO: ...
+}
+
+schedule_node *
+vec4_instruction_scheduler::choose_instruction_to_schedule_td()
+{
+   schedule_node *chosen = NULL;
+   int chosen_time = 0;
+
+   /* Of the instructions ready to execute or the closest to being ready,
+    * choose the oldest one.
+    */
+   foreach_list(node, &instructions) {
+      schedule_node *n = (schedule_node *)node;
+
+      if (!chosen || n->unblocked_time < chosen_time) {
+         chosen = n;
+         chosen_time = n->unblocked_time;
+      }
+   }
+
+   return chosen;
+}
+
+int
+fs_instruction_scheduler::issue_time(backend_instruction *inst)
+{
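+   /* A compressed (SIMD16) instruction is issued as two SIMD8 halves, so it
+    * occupies the issue pipeline for twice as long.
+    */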
+   if (is_compressed((fs_inst *)inst))
+      return 4;
+   else
+      return 2;
+}
+
+int
+vec4_instruction_scheduler::issue_time(backend_instruction *inst)
+{
+   /* We always execute as two vec4s in parallel. */
+   return 2;
+}
+
+void
+instruction_scheduler::schedule_instructions_bu(backend_instruction *next_block_header,
+                                                int live_outputs)
+{
+   /* Remove non-DAG heads from the list. */
+   foreach_list_safe(node, &instructions) {
+      schedule_node *n = (schedule_node *)node;
+      if (n->parent_count != 0) {
+         n->remove();
+      }
+   }
+
+   previous_chosen = NULL;
+   unsigned cand_generation = 1;
+   while (!instructions.is_empty()) {
+      schedule_node *chosen = choose_instruction_to_schedule_bu();
+
+      assert(chosen);
+      chosen->remove();
+
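+      /* Scheduling proceeds bottom-up, so each chosen instruction is placed
+       * before the previously chosen one; the block thereby comes out in
+       * final top-down order.
+       */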
+      if (!previous_chosen)
+         next_block_header->insert_before(chosen->inst);
+      else
+         previous_chosen->inst->insert_before(chosen->inst);
+
+      instructions_to_schedule--;
+      previous_chosen = chosen;
+
+      update_register_pressure(chosen->inst, true);
+
+      /* Update the clock for how soon an instruction could start after the
+       * chosen one.
+       */
+      time += issue_time(chosen->inst);
+      ptime += issue_time(chosen->inst);
+
+      /* If we expected a delay for scheduling, then bump the clock to reflect
+       * that as well.  In reality, the hardware will switch to another
+       * hyperthread and may not return to dispatching our thread for a while
+       * even after we're unblocked.
+       */
+      time = MAX2(time, chosen->unblocked_time);
+      ptime = MAX2(ptime, chosen->unblocked_ptime);
+
+      if (debug) {
+         const bool data_pressure_panic = current_block_pressure > (allocatable_grfs * pressure_panic_threshold);
+         fprintf(stderr, "BU: clock %d, pressure %.2f%s, scheduled: ",
+                 ptime,
+                 current_block_pressure,
+                 data_pressure_panic ? "(!)" : "");
+         bv->dump_instruction(chosen->inst);
+      }
+
+      /* Now that we've scheduled a new instruction, some of its
+       * children can be promoted to the list of instructions ready to
+       * be scheduled.  Update the children's unblocked time for this
+       * DAG edge as we do so.
+       */
+      for (int i = chosen->child_count - 1; i >= 0; i--) {
+         schedule_node *child = chosen->children[i];
+
+         child->unblocked_time = MAX2(child->unblocked_time,
+                                   time + chosen->child_latency[i]);
+         child->unblocked_ptime = MAX2(child->unblocked_ptime,
+                                   ptime + chosen->child_platency[i]);
+
+         if (debug && detail_debug) {
+            fprintf(stderr, "\tchild %d, %d parents: ", i, child->parent_count);
+            bv->dump_instruction(child->inst);
+         }
+
+         child->cand_generation = cand_generation;
+         child->parent_count--;
+         if (child->parent_count == 0) {
+            if (debug && detail_debug) {
+               fprintf(stderr, "\t\tnow available\n");
+            }
+            instructions.push_head(child);
+         }
+      }
+      cand_generation++;
+   }
+}
+
+void
+instruction_scheduler::schedule_instructions_td(backend_instruction *next_block_header,
+                                                int live_inputs)
+{
+   time = 0;
+
+   /* Remove non-DAG heads from the list. */
+   foreach_list_safe(node, &instructions) {
+      schedule_node *n = (schedule_node *)node;
+      if (n->parent_count != 0)
+         n->remove();
+   }
+
+   previous_chosen = NULL;
+   unsigned cand_generation = 1;
+   while (!instructions.is_empty()) {
+      schedule_node *chosen = choose_instruction_to_schedule_td();
+      previous_chosen = chosen;
+
+      /* Schedule this instruction. */
+      assert(chosen);
+      chosen->remove();
+      next_block_header->insert_before(chosen->inst);
+      instructions_to_schedule--;
+      update_register_pressure(chosen->inst, false);
+
+      /* Update the clock for how soon an instruction could start after the
+       * chosen one.
+       */
+      time += issue_time(chosen->inst);
+      ptime += issue_time(chosen->inst);
+
+      /* If we expected a delay for scheduling, then bump the clock to reflect
+       * that as well.  In reality, the hardware will switch to another
+       * hyperthread and may not return to dispatching our thread for a while
+       * even after we're unblocked.
+       */
+      time = MAX2(time, chosen->unblocked_time);
+      ptime = MAX2(ptime, chosen->unblocked_ptime);
+
+      if (debug) {
+         const bool data_pressure_panic = current_block_pressure > (allocatable_grfs * pressure_panic_threshold);
+         fprintf(stderr, "clock %d, pressure %.2f%s, scheduled: ",
+                 ptime,
+                 current_block_pressure,
+                 data_pressure_panic ? "(!)" : "");
+         bv->dump_instruction(chosen->inst);
+      }
+
+      /* Now that we've scheduled a new instruction, some of its
+       * children can be promoted to the list of instructions ready to
+       * be scheduled.  Update the children's unblocked time for this
+       * DAG edge as we do so.
+       */
+      for (int i = chosen->child_count - 1; i >= 0; i--) {
+         schedule_node *child = chosen->children[i];
+
+         child->unblocked_time = MAX2(child->unblocked_time,
+                                      time + chosen->child_latency[i]);
+         child->unblocked_ptime = MAX2(child->unblocked_ptime,
+                                      ptime + chosen->child_platency[i]);
+
+         if (debug && detail_debug) {
+            fprintf(stderr, "\tchild %d, %d parents: ", i, child->parent_count);
+            bv->dump_instruction(child->inst);
+         }
+
+         child->cand_generation = cand_generation;
+         child->parent_count--;
+         if (child->parent_count == 0) {
+            if (debug && detail_debug) {
+               fprintf(stderr, "\t\tnow available\n");
+            }
+            instructions.push_head(child);
+         }
+      }
+      cand_generation++;
+
+      /* Shared resource: the mathbox.  There's one mathbox per EU on Gen6+
+       * but it's more limited pre-gen6, so if we send something off to it then
+       * the next math instruction isn't going to make progress until the first
+       * is done.
+       */
+      if (chosen->inst->is_math()) {
+         foreach_list(node, &instructions) {
+            schedule_node *n = (schedule_node *)node;
+
+            if (n->inst->is_math()) {
+               n->unblocked_time = MAX2(n->unblocked_time,
+                                        time + chosen->latency);
+               n->unblocked_ptime = MAX2(n->unblocked_ptime,
+                                        ptime + chosen->platency);
+            }
+         }
+      }
+   }
+
+   assert(instructions_to_schedule == 0);
+}
+
+void
+instruction_scheduler::run_bu(exec_list *all_instructions)
+{
+   // When pressure is too high, attempt a pressure-limiting pass to squeeze
+   // it down.  We make a bottom-up (BU) pass for this purpose.
+
+   // First, create a reverse schedule.  This is a hack to fit into the
+   // currently available structures.
+
+   instructions_to_schedule = 0;
+   ptime = 0;
+   time = 0;
+
+   backend_instruction *next_block_header = (backend_instruction *)all_instructions->head;
+
+   if (debug) {
+      fprintf(stderr, "\nInstructions before scheduling (reg_alloc %d)\n",
+              post_reg_alloc);
+      bv->dump_instructions();
+   }
+
+   /* Populate the remaining GRF uses array to improve the pre-regalloc
+    * scheduling.
+    */
+   if (remaining_grf_uses) {
+      for (int x=0; x<grf_count; ++x) {
+         remaining_grf_uses[x] = 0;
+      }
+   }
+
+   block_num = 0;
+   while (!next_block_header->is_tail_sentinel()) {
+      int block_pos = 0;
+      /* Add things to be scheduled until we get to a new BB. */
+      while (!next_block_header->is_tail_sentinel()) {
+         backend_instruction *inst = next_block_header;
+         next_block_header = (backend_instruction *)next_block_header->next;
+
+         count_remaining_grf_uses(inst);
+
+         add_inst(inst);
+
+         // In CFG creation, an ENDIF can be the first instruction of a block.
+         // This mirrors the logic in cfg_t::cfg_t.
+         if (inst->is_control_flow() &&
+             !(inst->opcode == BRW_OPCODE_ENDIF && block_pos == 0))
+            break;
+         ++block_pos;
+      }
+
+      calculate_deps(true);
+
+      // Update GRF sets
+      for (int grf=0; grf<grf_count; ++grf) {
+         grf_active[grf] = BITSET_TEST(((fs_visitor*)bv)->live_intervals->bd[block_num].liveout, grf);
+      }
+
+      foreach_list(node, &instructions) {
+         schedule_node *n = (schedule_node *)node;
+         compute_delay(n);
+      }
+
+      // Count of live outputs from the block.
+      const int live_outs = bv->live_out_count(block_num);
+      current_block_pressure = live_outs;
+
+      if (debug) {
+         fprintf(stderr, "BU: block: %d: live outs = %d, live ins = %d\n", block_num, live_outs,
+                 bv->live_in_count(block_num));
+      }
+
+      // Schedule this block BU
+      schedule_instructions_bu(next_block_header, live_outs);
+
+      block_num++;
+   }
+}
+
+#if USE_GRAPHVIZ
+void
+dump_dot(exec_list *instructions, backend_visitor* bv, int blocknum, int part)
+{
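+   /* part 0 truncates the file and writes the digraph header, part 1
+    * appends the nodes and edges for the current block, and part 2 writes
+    * the closing brace.
+    */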
+   if (part == 0) {
+      FILE* fp = fopen("/tmp/blockgraph.gv", "w");
+
+      if (!fp)
+         return;
+
+      fprintf(fp, "digraph blocks {\n");
+      fclose(fp);
+      return;
+   }
+
+   FILE* fp = fopen("/tmp/blockgraph.gv", "a");
+   if (!fp)
+      return;
+
+   if (part == 1) {
+      char name0[512];
+      char name1[512];
+      char insttext[512];
+
+      int ip = 0;
+
+      foreach_list(node, instructions) {
+         schedule_node *n = (schedule_node *)node;
+         backend_instruction *inst = (backend_instruction *)n->inst;
+
+         const bool istx =
+            inst->opcode == SHADER_OPCODE_TEX ||
+            inst->opcode == SHADER_OPCODE_TXD ||
+            inst->opcode == SHADER_OPCODE_TXF ||
+            inst->opcode == SHADER_OPCODE_TXL;
+
+         insttext[0] = '\0';
+         bv->dump_instruction(inst, insttext);
+
+         sprintf(name0, "node_0x%08lx [shape=rectangle, label=<<TABLE BORDER=\"0\"><TR><TD COLSPAN=\"4\"><FONT %s POINT-SIZE=\"10\">%s</FONT></TD></TR><TR><TD>D=%d</TD><TD>L=%d</TD><TD>P=%d</TD><TD>C=%d</TD></TR></TABLE>>%s];",
+                 (long)n,
+                 istx ? " COLOR=\"blue\"" : "",
+                 insttext, // brw_instruction_name(inst->opcode),
+                 n->delay,
+                 n->platency,
+                 n->parent_count,
+                 n->child_count,
+                 (n->parent_count == 0 ? ", color=yellow" : ""));
+
+         fprintf(fp, "  %s\n", name0);
+      }
+
+      foreach_list(node, instructions) {
+         schedule_node *n = (schedule_node *)node;
+
+         sprintf(name0, "node_0x%08lx", (long)n);
+
+         int max_delay = -65535;
+         int critical_child = 0;
+         for (int i = 0; i < n->child_count; i++) {
+            schedule_node *cn = (schedule_node *)n->children[i];
+
+            if (cn->delay > max_delay) {
+               max_delay = cn->delay;
+               critical_child = i;
+            }
+         }
+
+         for (int i = 0; i < n->child_count; i++) {
+            schedule_node *cn = (schedule_node *)n->children[i];
+
+            sprintf(name1, "node_0x%08lx", (long)cn);
+            fprintf(fp, "  %s -> %s [color=%s]%s;\n", name0, name1,
+                    (n->critical_path && i == critical_child ? "red" :
+                     n->child_latency[i] == 0 ? "green" :
+                     n->platency > 8 ? "blue" : "black"),
+                    "");
+         }
+
+         ++ip;
+      }
+   }
+
+   if (part == 2) {
+      fprintf(fp, "}\n");
+   }
+
+   fclose(fp);
+}
+#endif // USE_GRAPHVIZ
+
+void
+instruction_scheduler::run_td(exec_list *all_instructions)
+{
+   backend_instruction *next_block_header =
+      (backend_instruction *)all_instructions->head;
+
+   if (debug && detail_debug) {
+      fprintf(stderr, "\nInstructions before scheduling (reg_alloc %d)\n",
+              post_reg_alloc);
+      bv->dump_instructions();
+   }
+
+   /* Populate the remaining GRF uses array to improve the pre-regalloc
+    * scheduling.
+    */
+   if (remaining_grf_uses) {
+      for (int x=0; x<grf_count; ++x) {
+         remaining_grf_uses[x] = 0;
+         grf_active[x] = false;
+      }
+
+      foreach_list(node, all_instructions) {
+         count_remaining_grf_uses((backend_instruction *)node);
+      }
+   }
+
+   block_num = 0;
+
+#if USE_GRAPHVIZ
+   dump_dot(&instructions, bv, block_num, 0);
+#endif
+
+   while (!next_block_header->is_tail_sentinel()) {
+      /* Add things to be scheduled until we get to a new BB. */
+
+      int block_pos = 0;
+      while (!next_block_header->is_tail_sentinel()) {
+         backend_instruction *inst = next_block_header;
+         next_block_header = (backend_instruction *)next_block_header->next;
+
+         add_inst(inst);
+
+         if (use_ips(mode)) {
+            // In CFG creation, an ENDIF can be the first instruction of a block.
+            // This mirrors the logic in cfg_t::cfg_t.
+            if (inst->is_control_flow() &&
+                !(inst->opcode == BRW_OPCODE_ENDIF && block_pos == 0))
+               break;
+         } else {
+            if (inst->is_control_flow())
+               break;
+         }
+
+         ++block_pos;
+      }
+
+      calculate_deps(false);
+
+      foreach_list(node, &instructions) {
+         schedule_node *n = (schedule_node *)node;
+         compute_delay(n);
+      }
+
+#if USE_GRAPHVIZ
+      find_critical_path((schedule_node*)instructions.head);
+      dump_dot(&instructions, bv, block_num, 1);
+#endif
+
+      // Count of live inputs to the block.
+      int live_ins = 0;
+
+      if (use_ips(mode)) {
+         live_ins = bv->live_in_count(block_num);
+         current_block_pressure = live_ins;
+      }
+
+      if (debug) {
+         fprintf(stderr, "block: %d: live ins = %d, live outs = %d\n", block_num,
+                 live_ins, bv->live_out_count(block_num));
+      }
+
+      schedule_instructions_td(next_block_header, live_ins);
+
+      block_num++;
+   }
+
+#if USE_GRAPHVIZ
+   dump_dot(&instructions, bv, block_num, 2);
+#endif
+}
+
+int
+fs_visitor::schedule_instructions(instruction_scheduler_mode mode)
+{
+   int grf_count;
+   if (mode == SCHEDULE_POST)
+      grf_count = grf_used;
+   else
+      grf_count = virtual_grf_count;
+
+   const int in_pressure = calculate_register_pressure();
+
+   // In SIMD16, each channel requires 2 registers.
+   const int allocatable_grfs = (dispatch_width == 16) ? (max_grf / 2) : max_grf;
+
+   fs_instruction_scheduler sched(this, grf_count, allocatable_grfs, mode);
+
+   // Move interpolations close to their uses, to decrease register pressure
+   // at the potential cost of compute time.  Usually this is a win, but
+   // certainly a more sophisticated approach is indicated.
+   // sched.move_interps(&instructions);
+
+   if (use_bu(mode))
+      sched.run_bu(&instructions);
+   else
+      sched.run_td(&instructions);
+
+   if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
+      fprintf(stderr, "%s: in_pressure=%d, limit=%d, out pressure=%d, panic=%.2f\n",
+              use_bu(mode) ? "BU" : "TD",
+              in_pressure, allocatable_grfs, calculate_register_pressure(), sched.pressure_panic_threshold);
+      fprintf(stderr, "fs%d estimated execution time: %d cycles\n",
+              dispatch_width,
+              sched.ptime * (dispatch_width == 16 ? 1 : 2));
+   }
+
+   if (debug) {
+      invalidate_live_intervals();
+      calculate_live_intervals();
+
+      fprintf(stderr, "\nInstructions after scheduling (reg_alloc)\n");
+      dump_instructions();
+   }
+
+   invalidate_live_intervals();
+
+   return sched.ptime * (dispatch_width == 16 ? 1 : 2);
+}
+
+int
+vec4_visitor::opt_schedule_instructions(instruction_scheduler_mode mode)
+{
+   vec4_instruction_scheduler sched(this, prog_data->total_grf, max_grf, mode);
+
+   calculate_live_intervals();
+   sched.run_td(&instructions);
+
+   if (unlikely(debug_flag)) {
+      fprintf(stderr, "vec4 estimated execution time: %d cycles\n", sched.ptime);
+   }
+
+   if (debug) {
+      invalidate_live_intervals();
+      calculate_live_intervals();
+
+      fprintf(stderr, "\nInstructions after scheduling (reg_alloc)\n");
+      dump_instructions();
+   }
+
+   invalidate_live_intervals();
+
+   return sched.ptime;
+}
diff --git a/icd/intel/compiler/pipeline/brw_shader.cpp b/icd/intel/compiler/pipeline/brw_shader.cpp
new file mode 100644
index 0000000..3a397e8
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_shader.cpp
@@ -0,0 +1,1056 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+extern "C" {
+#include "main/macros.h"
+#include "brw_context.h"
+}
+#include "brw_shader.h"
+#include "brw_vs.h"
+#include "brw_vec4_gs.h"
+#include "brw_vec4_gs_visitor.h"
+#include "brw_fs.h"
+#include "glsl/ir_optimization.h"
+#include "glsl/glsl_parser_extras.h"
+//#include "main/fbobject.h"  // LunarG: Remove
+//#include "main/shaderapi.h" // LunarG: Remove
+#include "main/shaderobj.h"   // LunarG: ADD
+#include "program/hash_table.h" // LunarG: ADD
+#include "icd-utils.h"        // LunarG: ADD
+
+struct brw_shader_program {
+   struct gl_shader_program base;
+
+   struct brw_shader_program_precompile_key pre_key;
+
+   GLbitfield saved;
+
+   struct {
+      struct brw_vs_prog_key key;
+      struct brw_vs_prog_data data;
+      const unsigned *program;
+      unsigned program_size;
+   } vs;
+
+   struct {
+      struct brw_gs_prog_key key;
+      struct brw_gs_prog_data data;
+      const unsigned *program;
+      unsigned program_size;
+   } gs;
+
+   struct {
+      struct brw_wm_prog_key key;
+      struct brw_wm_prog_data data;
+      const unsigned *program;
+      unsigned program_size;
+   } wm;
+};
+
+static inline struct brw_shader_program *
+brw_shader_program(struct gl_shader_program *prog)
+{
+   return (struct brw_shader_program *) prog;
+}
+
+// LunarG : ADD - These expose results of the shader compile.  There may
+//                be another way to get this data; revisit this later.
+
+struct brw_wm_prog_data *get_wm_prog_data(struct gl_shader_program *prog)
+{
+   struct brw_shader_program *brw_prog = (struct brw_shader_program *) prog;
+   return &brw_prog->wm.data;
+}
+
+const unsigned *get_wm_program(struct gl_shader_program *prog)
+{
+    struct brw_shader_program *brw_prog = (struct brw_shader_program *) prog;
+    return brw_prog->wm.program;
+}
+
+unsigned get_wm_program_size(struct gl_shader_program *prog)
+{
+    struct brw_shader_program *brw_prog = (struct brw_shader_program *) prog;
+    return brw_prog->wm.program_size;
+}
+
+struct brw_vs_prog_data *get_vs_prog_data(struct gl_shader_program *prog)
+{
+   struct brw_shader_program *brw_prog = (struct brw_shader_program *) prog;
+   return &brw_prog->vs.data;
+}
+
+const unsigned *get_vs_program(struct gl_shader_program *prog)
+{
+    struct brw_shader_program *brw_prog = (struct brw_shader_program *) prog;
+    return brw_prog->vs.program;
+}
+
+unsigned get_vs_program_size(struct gl_shader_program *prog)
+{
+    struct brw_shader_program *brw_prog = (struct brw_shader_program *) prog;
+    return brw_prog->vs.program_size;
+}
+
+struct brw_gs_prog_data *get_gs_prog_data(struct gl_shader_program *prog)
+{
+   struct brw_shader_program *brw_prog = (struct brw_shader_program *) prog;
+   return &brw_prog->gs.data;
+}
+
+const unsigned *get_gs_program(struct gl_shader_program *prog)
+{
+    struct brw_shader_program *brw_prog = (struct brw_shader_program *) prog;
+    return brw_prog->gs.program;
+}
+
+unsigned get_gs_program_size(struct gl_shader_program *prog)
+{
+    struct brw_shader_program *brw_prog = (struct brw_shader_program *) prog;
+    return brw_prog->gs.program_size;
+}
+
+struct gl_shader *
+brw_new_shader(struct gl_context *ctx, GLuint name, GLuint type)
+{
+   struct brw_shader *shader;
+
+   shader = rzalloc(NULL, struct brw_shader);
+   if (shader) {
+      shader->base.Type = type;
+      shader->base.Stage = _mesa_shader_enum_to_shader_stage(type);
+      shader->base.Name = name;
+      // LunarG: Removed
+      //_mesa_init_shader(ctx, &shader->base);
+   }
+
+   return &shader->base;
+}
+
+struct gl_shader_program *
+brw_new_shader_program(struct gl_context *ctx, GLuint name)
+{
+   struct brw_shader_program *prog = rzalloc(NULL, struct brw_shader_program);
+   if (prog) {
+      prog->base.Name = name;
+      // LunarG: ADD - inline the contents of this function to avoid
+      //               bringing in shaderobj.c
+      //_mesa_init_shader_program(ctx, &prog->base);
+      prog->base.Type = GL_SHADER_PROGRAM_MESA;
+      // LunarG: Remove - VK does not use reference counts
+      // prog->base.RefCount = 1;
+
+      prog->base.AttributeBindings = new string_to_uint_map;
+      prog->base.FragDataBindings = new string_to_uint_map;
+      prog->base.FragDataIndexBindings = new string_to_uint_map;
+
+      prog->base.Geom.VerticesOut = 0;
+      prog->base.Geom.InputType = GL_TRIANGLES;
+      prog->base.Geom.OutputType = GL_TRIANGLE_STRIP;
+
+      prog->base.TransformFeedback.BufferMode = GL_INTERLEAVED_ATTRIBS;
+
+      prog->base.InfoLog = ralloc_strdup(prog, "");
+   }
+   return &prog->base;
+}
+
+void
+brw_notify_link_shader(struct gl_context *ctx,
+                       struct gl_shader_program *shProg)
+{
+//   if (brw->precompile) {
+//      prog->pre_key.fbo_height = ctx->DrawBuffer->Height;
+//      prog->pre_key.is_user_fbo = _mesa_is_user_fbo(ctx->DrawBuffer);
+//   }
+}
+
+const struct brw_shader_program_precompile_key *
+brw_shader_program_get_precompile_key(struct gl_shader_program *shader_prog)
+{
+   struct brw_shader_program *prog = brw_shader_program(shader_prog);
+   return &prog->pre_key;
+}
+
+void
+brw_shader_program_save_vs_compile(struct gl_shader_program *shader_prog,
+                                   const struct brw_vs_compile *c)
+{
+   struct brw_shader_program *prog = brw_shader_program(shader_prog);
+
+   memcpy(&prog->vs.key, &c->key, sizeof(prog->vs.key));
+   memcpy(&prog->vs.data, &c->prog_data, sizeof(prog->vs.data));
+   prog->vs.program = c->base.program;
+   prog->vs.program_size = c->base.program_size;
+   ralloc_steal(shader_prog, (void *) c->base.program);
+
+   prog->saved |= 1 << BRW_VS_PROG;
+}
+
+void
+brw_shader_program_save_gs_compile(struct gl_shader_program *shader_prog,
+                                   const struct brw_gs_compile *c)
+{
+   struct brw_shader_program *prog = brw_shader_program(shader_prog);
+
+   memcpy(&prog->gs.key, &c->key, sizeof(prog->gs.key));
+   memcpy(&prog->gs.data, &c->prog_data, sizeof(prog->gs.data));
+   prog->gs.program = c->base.program;
+   prog->gs.program_size = c->base.program_size;
+   ralloc_steal(shader_prog, (void *) c->base.program);
+
+   prog->saved |= 1 << BRW_GS_PROG;
+}
+
+void
+brw_shader_program_save_wm_compile(struct gl_shader_program *shader_prog,
+                                   const struct brw_wm_compile *c)
+{
+   struct brw_shader_program *prog = brw_shader_program(shader_prog);
+
+   memcpy(&prog->wm.key, &c->key, sizeof(prog->wm.key));
+   memcpy(&prog->wm.data, &c->prog_data, sizeof(prog->wm.data));
+   prog->wm.program = c->program;
+   prog->wm.program_size = c->program_size;
+   ralloc_steal(shader_prog, (void *) c->program);
+
+   prog->saved |= 1 << BRW_WM_PROG;
+}
+
+bool
+brw_shader_program_restore_vs_compile(struct gl_shader_program *shader_prog,
+                                      struct brw_vs_compile *c)
+{
+   struct brw_shader_program *prog = brw_shader_program(shader_prog);
+
+   if (!(prog->saved & (1 << BRW_VS_PROG)))
+      return false;
+
+   prog->saved &= ~(1 << BRW_VS_PROG);
+
+   if (memcmp(&c->key, &prog->vs.key, sizeof(prog->vs.key)))
+      return false;
+
+   memcpy(&c->prog_data, &prog->vs.data, sizeof(prog->vs.data));
+   c->base.program = prog->vs.program;
+   c->base.program_size = prog->vs.program_size;
+   ralloc_steal(c->base.mem_ctx, (void *) c->base.program);
+
+   return true;
+}
+
+bool
+brw_shader_program_restore_gs_compile(struct gl_shader_program *shader_prog,
+                                      struct brw_gs_compile *c)
+{
+   struct brw_shader_program *prog = brw_shader_program(shader_prog);
+
+   if (!(prog->saved & (1 << BRW_GS_PROG)))
+      return false;
+
+   prog->saved &= ~(1 << BRW_GS_PROG);
+
+   if (memcmp(&c->key, &prog->gs.key, sizeof(prog->gs.key)))
+      return false;
+
+   memcpy(&c->prog_data, &prog->gs.data, sizeof(prog->gs.data));
+   c->base.program = prog->gs.program;
+   c->base.program_size = prog->gs.program_size;
+   ralloc_steal(c->base.mem_ctx, (void *) c->base.program);
+
+   return true;
+}
+
+bool
+brw_shader_program_restore_wm_compile(struct gl_shader_program *shader_prog,
+                                      struct brw_wm_compile *c)
+{
+   struct brw_shader_program *prog = brw_shader_program(shader_prog);
+
+   if (!(prog->saved & (1 << BRW_WM_PROG)))
+      return false;
+
+   prog->saved &= ~(1 << BRW_WM_PROG);
+
+   if (memcmp(&c->key, &prog->wm.key, sizeof(prog->wm.key)))
+      return false;
+
+   memcpy(&c->prog_data, &prog->wm.data, sizeof(prog->wm.data));
+   c->program = prog->wm.program;
+   c->program_size = prog->wm.program_size;
+   ralloc_steal(c, (void *) c->program);
+
+   return true;
+}
+
+/**
+ * Performs a compile of the shader stages even when we don't know
+ * what non-orthogonal state will be set, in the hope that it reflects
+ * the eventual NOS used, and thus allows us to produce link failures.
+ */
+static bool
+brw_shader_precompile(struct gl_context *ctx, struct gl_shader_program *prog)
+{
+   switch (prog->Type) {
+   case MESA_SHADER_FRAGMENT:
+       if (!brw_fs_precompile(ctx, prog))
+           return false;
+       break;
+
+   case MESA_SHADER_GEOMETRY:
+       if (!brw_gs_precompile(ctx, prog))
+           return false;
+       break;
+
+   case MESA_SHADER_VERTEX:
+       if (!brw_vs_precompile(ctx, prog))
+           return false;
+       break;
+
+   default:
+      assert(0);
+   }
+
+   return true;
+}
+
+static void
+brw_lower_packing_builtins(struct brw_context *brw,
+                           gl_shader_stage shader_type,
+                           exec_list *ir)
+{
+   int ops = LOWER_PACK_SNORM_2x16
+           | LOWER_UNPACK_SNORM_2x16
+           | LOWER_PACK_UNORM_2x16
+           | LOWER_UNPACK_UNORM_2x16
+           | LOWER_PACK_SNORM_4x8
+           | LOWER_UNPACK_SNORM_4x8
+           | LOWER_PACK_UNORM_4x8
+           | LOWER_UNPACK_UNORM_4x8;
+
+   if (brw->gen >= 7) {
+      /* Gen7 introduced the f32to16 and f16to32 instructions, which can be
+       * used to execute packHalf2x16 and unpackHalf2x16. For AOS code, no
+       * lowering is needed. For SOA code, the Half2x16 ops must be
+       * scalarized.
+       */
+      if (shader_type == MESA_SHADER_FRAGMENT) {
+         ops |= LOWER_PACK_HALF_2x16_TO_SPLIT
+             |  LOWER_UNPACK_HALF_2x16_TO_SPLIT;
+      }
+   } else {
+      ops |= LOWER_PACK_HALF_2x16
+          |  LOWER_UNPACK_HALF_2x16;
+   }
+
+   lower_packing_builtins(ir, ops);
+}
+
+/**
+ * Copy program-specific data generated by linking from the gl_shader_program
+ * object to a specific gl_program object.
+ *
+ * Brought over from shaderapi.c
+ */
+void mesa_copy_linked_program_data(gl_shader_stage type,
+                               const struct gl_shader_program *src,
+                               struct gl_program *dst)
+{
+   switch (type) {
+   case MESA_SHADER_VERTEX:
+      dst->UsesClipDistanceOut = src->Vert.UsesClipDistance;
+      break;
+   case MESA_SHADER_GEOMETRY: {
+      struct gl_geometry_program *dst_gp = (struct gl_geometry_program *) dst;
+      dst_gp->VerticesIn = src->Geom.VerticesIn;
+      dst_gp->VerticesOut = src->Geom.VerticesOut;
+      dst_gp->Invocations = src->Geom.Invocations;
+      dst_gp->InputType = src->Geom.InputType;
+      dst_gp->OutputType = src->Geom.OutputType;
+      dst->UsesClipDistanceOut = src->Geom.UsesClipDistance;
+      dst_gp->UsesEndPrimitive = src->Geom.UsesEndPrimitive;
+   }
+      break;
+   case MESA_SHADER_COMPUTE: {
+      struct gl_compute_program *dst_cp = (struct gl_compute_program *) dst;
+      int i;
+      for (i = 0; i < 3; i++)
+         dst_cp->LocalSize[i] = src->Comp.LocalSize[i];
+   }
+      break;
+   default:
+      break;
+   }
+}
+
+
+// LunarG : ADD - We redid indenting for this whole function to make it readable
+GLboolean
+brw_link_shader(struct gl_context *ctx, struct gl_shader_program *shProg)
+{
+    struct brw_context *brw = brw_context(ctx);
+    unsigned int stage;
+
+    for (stage = 0; stage < ARRAY_SIZE(shProg->_LinkedShaders); stage++) {
+        const struct gl_shader_compiler_options *options =
+                &ctx->ShaderCompilerOptions[stage];
+        struct brw_shader *shader =
+                (struct brw_shader *)shProg->_LinkedShaders[stage];
+
+        if (!shader)
+            continue;
+
+        struct gl_program *prog =
+                // LunarG : Call the function directly
+                //     ctx->Driver.NewProgram(ctx, _mesa_shader_stage_to_program(stage),
+                //                                shader->base.Name);
+                brwNewProgram(ctx, _mesa_shader_stage_to_program(stage),
+                              shader->base.Name);
+        if (!prog)
+            return false;
+        prog->Parameters = _mesa_new_parameter_list();
+
+        mesa_copy_linked_program_data((gl_shader_stage) stage, shProg, prog);
+
+        bool progress;
+
+        /* lower_packing_builtins() inserts arithmetic instructions, so it
+         * must precede lower_instructions().
+         */
+        brw_lower_packing_builtins(brw, (gl_shader_stage) stage, shader->base.ir);
+        do_mat_op_to_vec(shader->base.ir);
+        const int bitfield_insert = brw->gen >= 7
+                ? BITFIELD_INSERT_TO_BFM_BFI
+                : 0;
+        lower_instructions(shader->base.ir,
+                           MOD_TO_FRACT |
+                           DIV_TO_MUL_RCP |
+                           SUB_TO_ADD_NEG |
+                           EXP_TO_EXP2 |
+                           LOG_TO_LOG2 |
+                           bitfield_insert |
+                           LDEXP_TO_ARITH);
+
+        /* Pre-gen6 HW can only nest if-statements 16 deep.  Beyond this,
+         * if-statements need to be flattened.
+         */
+        if (brw->gen < 6)
+            lower_if_to_cond_assign(shader->base.ir, 16);
+
+        do_lower_texture_projection(shader->base.ir);
+        brw_lower_texture_gradients(brw, shader->base.ir);
+        do_vec_index_to_cond_assign(shader->base.ir);
+        lower_vector_insert(shader->base.ir, true);
+        brw_do_cubemap_normalize(shader->base.ir);
+        lower_offset_arrays(shader->base.ir);
+        brw_do_lower_unnormalized_offset(shader->base.ir);
+        lower_noise(shader->base.ir);
+        lower_quadop_vector(shader->base.ir, false);
+
+        bool lowered_variable_indexing =
+                lower_variable_index_to_cond_assign(shader->base.ir,
+                                                    options->EmitNoIndirectInput,
+                                                    options->EmitNoIndirectOutput,
+                                                    options->EmitNoIndirectTemp,
+                                                    options->EmitNoIndirectUniform);
+
+        if (unlikely(brw->perf_debug && lowered_variable_indexing)) {
+            perf_debug("Unsupported form of variable indexing in FS; falling "
+                       "back to very inefficient code generation\n");
+        }
+
+        lower_ubo_reference(&shader->base, shader->base.ir);
+
+        do {
+            progress = false;
+
+            if (stage == MESA_SHADER_FRAGMENT) {
+                brw_do_channel_expressions(shader->base.ir);
+                brw_do_vector_splitting(shader->base.ir);
+            }
+
+            progress = do_lower_jumps(shader->base.ir, true, true,
+                                      true, /* main return */
+                                      false, /* continue */
+                                      false /* loops */
+                                      ) || progress;
+
+            progress = do_common_optimization(shader->base.ir, true, true,
+                                              options, ctx->Const.NativeIntegers)
+                    || progress;
+        } while (progress);
+
+        /* Make a pass over the IR to add state references for any built-in
+         * uniforms that are used.  This has to be done now (during linking).
+         * Code generation doesn't happen until the first time this shader is
+         * used for rendering.  Waiting until then to generate the parameters is
+         * too late.  At that point, the values for the built-in uniforms won't
+         * get sent to the shader.
+         */
+        foreach_list(node, shader->base.ir) {
+            ir_variable *var = ((ir_instruction *) node)->as_variable();
+
+            if ((var == NULL) || (var->data.mode != ir_var_uniform)
+                    || (strncmp(var->name, "gl_", 3) != 0))
+                continue;
+
+            const ir_state_slot *const slots = var->state_slots;
+            assert(var->state_slots != NULL);
+
+            for (unsigned int i = 0; i < var->num_state_slots; i++) {
+                _mesa_add_state_reference(prog->Parameters,
+                                          (gl_state_index *) slots[i].tokens);
+            }
+        }
+
+        validate_ir_tree(shader->base.ir);
+
+        do_set_program_inouts(shader->base.ir, prog, shader->base.Stage);
+
+        prog->SamplersUsed = shader->base.active_samplers;
+
+        _mesa_update_shader_textures_used(shProg, prog);
+
+        _mesa_reference_program(ctx, &shader->base.Program, prog);
+
+        // LunarG : TODO - rectangle support
+        //      brw_add_texrect_params(prog);
+
+        /* This has to be done last.  Any operation that can cause
+         * prog->ParameterValues to get reallocated (e.g., anything that adds a
+         * program constant) has to happen before creating this linkage.
+         */
+        // LunarG : TODO - uniform support
+        //      _mesa_associate_uniform_storage(ctx, shProg, prog->Parameters);
+
+        _mesa_reference_program(ctx, &prog, NULL);
+
+        if (ctx->GlslFlags & GLSL_DUMP) {
+            fprintf(stderr, "\n");
+            fprintf(stderr, "GLSL IR for linked %s program %d:\n",
+                    _mesa_shader_stage_to_string(shader->base.Stage),
+                    shProg->Name);
+            _mesa_print_ir(stderr, shader->base.ir, NULL);
+            fprintf(stderr, "\n");
+        }
+    }
+
+    if ((ctx->GlslFlags & GLSL_DUMP) && shProg->Name != 0) {
+        for (unsigned i = 0; i < shProg->NumShaders; i++) {
+            const struct gl_shader *sh = shProg->Shaders[i];
+            if (!sh)
+                continue;
+
+            fprintf(stderr, "GLSL %s shader %d source for linked program %d:\n",
+                    _mesa_shader_stage_to_string(sh->Stage),
+                    i, shProg->Name);
+            fprintf(stderr, "%s", sh->Source);
+            fprintf(stderr, "\n");
+        }
+    }
+
+    if (!brw_shader_precompile(ctx, shProg))
+        return false;
+
+    return true;
+}
+
+
+int
+brw_type_for_base_type(const struct glsl_type *type)
+{
+   switch (type->base_type) {
+   case GLSL_TYPE_FLOAT:
+      return BRW_REGISTER_TYPE_F;
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_BOOL:
+      return BRW_REGISTER_TYPE_D;
+   case GLSL_TYPE_UINT:
+      return BRW_REGISTER_TYPE_UD;
+   case GLSL_TYPE_ARRAY:
+      return brw_type_for_base_type(type->fields.array);
+   case GLSL_TYPE_STRUCT:
+   case GLSL_TYPE_SAMPLER:
+   case GLSL_TYPE_ATOMIC_UINT:
+      /* These should be overridden with the type of the member when
+       * dereferenced into.  BRW_REGISTER_TYPE_UD seems like a likely
+       * way to trip up if we don't.
+       */
+      return BRW_REGISTER_TYPE_UD;
+   case GLSL_TYPE_IMAGE:
+      return BRW_REGISTER_TYPE_UD;
+   case GLSL_TYPE_VOID:
+   case GLSL_TYPE_ERROR:
+   case GLSL_TYPE_INTERFACE:
+      assert(!"not reached");
+      break;
+   }
+
+   return BRW_REGISTER_TYPE_F;
+}
+
+uint32_t
+brw_conditional_for_comparison(unsigned int op)
+{
+   switch (op) {
+   case ir_binop_less:
+      return BRW_CONDITIONAL_L;
+   case ir_binop_greater:
+      return BRW_CONDITIONAL_G;
+   case ir_binop_lequal:
+      return BRW_CONDITIONAL_LE;
+   case ir_binop_gequal:
+      return BRW_CONDITIONAL_GE;
+   case ir_binop_equal:
+   case ir_binop_all_equal: /* same as equal for scalars */
+      return BRW_CONDITIONAL_Z;
+   case ir_binop_nequal:
+   case ir_binop_any_nequal: /* same as nequal for scalars */
+      return BRW_CONDITIONAL_NZ;
+   default:
+      assert(!"not reached: bad operation for comparison");
+      return BRW_CONDITIONAL_NZ;
+   }
+}
+
+uint32_t
+brw_math_function(enum opcode op)
+{
+   switch (op) {
+   case SHADER_OPCODE_RCP:
+      return BRW_MATH_FUNCTION_INV;
+   case SHADER_OPCODE_RSQ:
+      return BRW_MATH_FUNCTION_RSQ;
+   case SHADER_OPCODE_SQRT:
+      return BRW_MATH_FUNCTION_SQRT;
+   case SHADER_OPCODE_EXP2:
+      return BRW_MATH_FUNCTION_EXP;
+   case SHADER_OPCODE_LOG2:
+      return BRW_MATH_FUNCTION_LOG;
+   case SHADER_OPCODE_POW:
+      return BRW_MATH_FUNCTION_POW;
+   case SHADER_OPCODE_SIN:
+      return BRW_MATH_FUNCTION_SIN;
+   case SHADER_OPCODE_COS:
+      return BRW_MATH_FUNCTION_COS;
+   case SHADER_OPCODE_INT_QUOTIENT:
+      return BRW_MATH_FUNCTION_INT_DIV_QUOTIENT;
+   case SHADER_OPCODE_INT_REMAINDER:
+      return BRW_MATH_FUNCTION_INT_DIV_REMAINDER;
+   default:
+      assert(!"not reached: unknown math function");
+      return 0;
+   }
+}
+
+uint32_t
+brw_texture_offset(struct gl_context *ctx, ir_constant *offset)
+{
+   /* If the driver does not support GL_ARB_gpu_shader5, the offset
+    * must be constant.
+    */
+   assert(offset != NULL || ctx->Extensions.ARB_gpu_shader5);
+
+   if (!offset) return 0;  /* nonconstant offset; caller will handle it. */
+
+   signed char offsets[3];
+   for (unsigned i = 0; i < offset->type->vector_elements; i++)
+      offsets[i] = (signed char) offset->value.i[i];
+
+   /* Combine all three offsets into a single unsigned dword:
+    *
+    *    bits 11:8 - U Offset (X component)
+    *    bits  7:4 - V Offset (Y component)
+    *    bits  3:0 - R Offset (Z component)
+    */
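+   /* For example, a constant offset of (1, -2, 3) packs to 0x1E3:
+    * U = 0x1, V = 0xE (-2 in 4-bit two's complement), R = 0x3.
+    */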
+   unsigned offset_bits = 0;
+   for (unsigned i = 0; i < offset->type->vector_elements; i++) {
+      const unsigned shift = 4 * (2 - i);
+      offset_bits |= (offsets[i] << shift) & (0xF << shift);
+   }
+   return offset_bits;
+}
+
+const char *
+brw_instruction_name(enum opcode op)
+{
+   char *fallback;
+
+   if (op < ARRAY_SIZE(opcode_descs) && opcode_descs[op].name)
+      return opcode_descs[op].name;
+
+   switch (op) {
+   case FS_OPCODE_FB_WRITE:
+      return "fb_write";
+   case FS_OPCODE_BLORP_FB_WRITE:
+      return "blorp_fb_write";
+
+   case SHADER_OPCODE_RCP:
+      return "rcp";
+   case SHADER_OPCODE_RSQ:
+      return "rsq";
+   case SHADER_OPCODE_SQRT:
+      return "sqrt";
+   case SHADER_OPCODE_EXP2:
+      return "exp2";
+   case SHADER_OPCODE_LOG2:
+      return "log2";
+   case SHADER_OPCODE_POW:
+      return "pow";
+   case SHADER_OPCODE_INT_QUOTIENT:
+      return "int_quot";
+   case SHADER_OPCODE_INT_REMAINDER:
+      return "int_rem";
+   case SHADER_OPCODE_SIN:
+      return "sin";
+   case SHADER_OPCODE_COS:
+      return "cos";
+
+   case SHADER_OPCODE_TEX:
+      return "tex";
+   case SHADER_OPCODE_TXD:
+      return "txd";
+   case SHADER_OPCODE_TXF:
+      return "txf";
+   case SHADER_OPCODE_TXL:
+      return "txl";
+   case SHADER_OPCODE_TXS:
+      return "txs";
+   case FS_OPCODE_TXB:
+      return "txb";
+   case SHADER_OPCODE_TXF_CMS:
+      return "txf_cms";
+   case SHADER_OPCODE_TXF_UMS:
+      return "txf_ums";
+   case SHADER_OPCODE_TXF_MCS:
+      return "txf_mcs";
+   case SHADER_OPCODE_TG4:
+      return "tg4";
+   case SHADER_OPCODE_TG4_OFFSET:
+      return "tg4_offset";
+
+   case SHADER_OPCODE_GEN4_SCRATCH_READ:
+      return "gen4_scratch_read";
+   case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
+      return "gen4_scratch_write";
+   case SHADER_OPCODE_GEN7_SCRATCH_READ:
+      return "gen7_scratch_read";
+
+   case FS_OPCODE_DDX:
+      return "ddx";
+   case FS_OPCODE_DDY:
+      return "ddy";
+
+   case FS_OPCODE_PIXEL_X:
+      return "pixel_x";
+   case FS_OPCODE_PIXEL_Y:
+      return "pixel_y";
+
+   case FS_OPCODE_CINTERP:
+      return "cinterp";
+   case FS_OPCODE_LINTERP:
+      return "linterp";
+
+   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
+      return "uniform_pull_const";
+   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7:
+      return "uniform_pull_const_gen7";
+   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD:
+      return "varying_pull_const";
+   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7:
+      return "varying_pull_const_gen7";
+
+   case FS_OPCODE_MOV_DISPATCH_TO_FLAGS:
+      return "mov_dispatch_to_flags";
+   case FS_OPCODE_DISCARD_JUMP:
+      return "discard_jump";
+
+   case FS_OPCODE_SET_SIMD4X2_OFFSET:
+      return "set_simd4x2_offset";
+
+   case FS_OPCODE_PACK_HALF_2x16_SPLIT:
+      return "pack_half_2x16_split";
+   case FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X:
+      return "unpack_half_2x16_split_x";
+   case FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y:
+      return "unpack_half_2x16_split_y";
+
+   case FS_OPCODE_PLACEHOLDER_HALT:
+      return "placeholder_halt";
+
+   case VS_OPCODE_URB_WRITE:
+      return "vs_urb_write";
+   case VS_OPCODE_PULL_CONSTANT_LOAD:
+      return "pull_constant_load";
+   case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7:
+      return "pull_constant_load_gen7";
+   case VS_OPCODE_UNPACK_FLAGS_SIMD4X2:
+      return "unpack_flags_simd4x2";
+
+   case GS_OPCODE_URB_WRITE:
+      return "gs_urb_write";
+   case GS_OPCODE_THREAD_END:
+      return "gs_thread_end";
+   case GS_OPCODE_SET_WRITE_OFFSET:
+      return "set_write_offset";
+   case GS_OPCODE_SET_VERTEX_COUNT:
+      return "set_vertex_count";
+   case GS_OPCODE_SET_DWORD_2_IMMED:
+      return "set_dword_2_immed";
+   case GS_OPCODE_PREPARE_CHANNEL_MASKS:
+      return "prepare_channel_masks";
+   case GS_OPCODE_SET_CHANNEL_MASKS:
+      return "set_channel_masks";
+   case GS_OPCODE_GET_INSTANCE_ID:
+      return "get_instance_id";
+
+   default:
+      /* Yes, this leaks.  It's in debug code, it should never occur, and if
+       * it does, you should just add the case to the list above.
+       */
+      int U_ASSERT_ONLY retval = asprintf(&fallback, "op%d", op);
+      assert(retval != -1);
+      return fallback;
+   }
+}
+
+backend_visitor::backend_visitor(struct brw_context *brw,
+                                 struct gl_shader_program *shader_prog,
+                                 struct gl_program *prog,
+                                 struct brw_stage_prog_data *stage_prog_data,
+                                 gl_shader_stage stage)
+   : brw(brw),
+     ctx(&brw->ctx),
+     shader(shader_prog ?
+        (struct brw_shader *)shader_prog->_LinkedShaders[stage] : NULL),
+     shader_prog(shader_prog),
+     prog(prog),
+     stage_prog_data(stage_prog_data)
+{
+}
+
+bool
+backend_instruction::is_tex() const
+{
+   return (opcode == SHADER_OPCODE_TEX ||
+           opcode == FS_OPCODE_TXB ||
+           opcode == SHADER_OPCODE_TXD ||
+           opcode == SHADER_OPCODE_TXF ||
+           opcode == SHADER_OPCODE_TXF_CMS ||
+           opcode == SHADER_OPCODE_TXF_UMS ||
+           opcode == SHADER_OPCODE_TXF_MCS ||
+           opcode == SHADER_OPCODE_TXL ||
+           opcode == SHADER_OPCODE_TXS ||
+           opcode == SHADER_OPCODE_LOD ||
+           opcode == SHADER_OPCODE_TG4 ||
+           opcode == SHADER_OPCODE_TG4_OFFSET);
+}
+
+bool
+backend_instruction::is_math() const
+{
+   return (opcode == SHADER_OPCODE_RCP ||
+           opcode == SHADER_OPCODE_RSQ ||
+           opcode == SHADER_OPCODE_SQRT ||
+           opcode == SHADER_OPCODE_EXP2 ||
+           opcode == SHADER_OPCODE_LOG2 ||
+           opcode == SHADER_OPCODE_SIN ||
+           opcode == SHADER_OPCODE_COS ||
+           opcode == SHADER_OPCODE_INT_QUOTIENT ||
+           opcode == SHADER_OPCODE_INT_REMAINDER ||
+           opcode == SHADER_OPCODE_POW);
+}
+
+bool
+backend_instruction::is_control_flow() const
+{
+   switch (opcode) {
+   case BRW_OPCODE_DO:
+   case BRW_OPCODE_WHILE:
+   case BRW_OPCODE_IF:
+   case BRW_OPCODE_ELSE:
+   case BRW_OPCODE_ENDIF:
+   case BRW_OPCODE_BREAK:
+   case BRW_OPCODE_CONTINUE:
+      return true;
+   default:
+      return false;
+   }
+}
+
+bool
+backend_instruction::can_do_source_mods() const
+{
+   switch (opcode) {
+   case BRW_OPCODE_ADDC:
+   case BRW_OPCODE_BFE:
+   case BRW_OPCODE_BFI1:
+   case BRW_OPCODE_BFI2:
+   case BRW_OPCODE_BFREV:
+   case BRW_OPCODE_CBIT:
+   case BRW_OPCODE_FBH:
+   case BRW_OPCODE_FBL:
+   case BRW_OPCODE_SUBB:
+      return false;
+   default:
+      return true;
+   }
+}
+
+bool
+backend_instruction::can_do_saturate() const
+{
+   switch (opcode) {
+   case BRW_OPCODE_ADD:
+   case BRW_OPCODE_ASR:
+   case BRW_OPCODE_AVG:
+   case BRW_OPCODE_DP2:
+   case BRW_OPCODE_DP3:
+   case BRW_OPCODE_DP4:
+   case BRW_OPCODE_DPH:
+   case BRW_OPCODE_F16TO32:
+   case BRW_OPCODE_F32TO16:
+   case BRW_OPCODE_LINE:
+   case BRW_OPCODE_LRP:
+   case BRW_OPCODE_MAC:
+   case BRW_OPCODE_MACH:
+   case BRW_OPCODE_MAD:
+   case BRW_OPCODE_MATH:
+   case BRW_OPCODE_MOV:
+   case BRW_OPCODE_MUL:
+   case BRW_OPCODE_PLN:
+   case BRW_OPCODE_RNDD:
+   case BRW_OPCODE_RNDE:
+   case BRW_OPCODE_RNDU:
+   case BRW_OPCODE_RNDZ:
+   case BRW_OPCODE_SEL:
+   case BRW_OPCODE_SHL:
+   case BRW_OPCODE_SHR:
+   case FS_OPCODE_LINTERP:
+   case SHADER_OPCODE_COS:
+   case SHADER_OPCODE_EXP2:
+   case SHADER_OPCODE_LOG2:
+   case SHADER_OPCODE_POW:
+   case SHADER_OPCODE_RCP:
+   case SHADER_OPCODE_RSQ:
+   case SHADER_OPCODE_SIN:
+   case SHADER_OPCODE_SQRT:
+      return true;
+   default:
+      return false;
+   }
+}
+
+bool
+backend_instruction::reads_accumulator_implicitly() const
+{
+   switch (opcode) {
+   case BRW_OPCODE_MAC:
+   case BRW_OPCODE_MACH:
+   case BRW_OPCODE_SADA2:
+      return true;
+   default:
+      return false;
+   }
+}
+
+bool
+backend_instruction::has_side_effects() const
+{
+   switch (opcode) {
+   case SHADER_OPCODE_UNTYPED_ATOMIC:
+      return true;
+   default:
+      return false;
+   }
+}
+
+void
+backend_visitor::dump_instructions()
+{
+   int ip = 0;
+   foreach_list(node, &this->instructions) {
+      backend_instruction *inst = (backend_instruction *)node;
+      fprintf(stderr, "%d: ", ip++);
+      dump_instruction(inst);
+   }
+}
+
+
+/**
+ * Sets up the starting offsets for the groups of binding table entries
+ * common to all pipeline stages.
+ *
+ * Unused groups are initialized to 0xd0d0d0d0 to make it obvious that they're
+ * unused, and also to ensure that adding small offsets to them will trigger
+ * some of our asserts that surface indices are < BRW_MAX_SURFACES.
+ */
+void
+backend_visitor::assign_common_binding_table_offsets(uint32_t next_binding_table_offset)
+{
+   int num_textures = _mesa_fls(prog->SamplersUsed);
+
+   stage_prog_data->binding_table.texture_start = next_binding_table_offset;
+   next_binding_table_offset += num_textures;
+
+   if (shader) {
+      stage_prog_data->binding_table.ubo_start = next_binding_table_offset;
+      next_binding_table_offset += shader->base.NumUniformBlocks;
+   } else {
+      stage_prog_data->binding_table.ubo_start = 0xd0d0d0d0;
+   }
+
+//   if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
+//      stage_prog_data->binding_table.shader_time_start = next_binding_table_offset;
+//      next_binding_table_offset++;
+//   } else {
+      stage_prog_data->binding_table.shader_time_start = 0xd0d0d0d0;
+//   }
+
+   if (prog->UsesGather) {
+      if (brw->gen >= 8) {
+         stage_prog_data->binding_table.gather_texture_start =
+            stage_prog_data->binding_table.texture_start;
+      } else {
+         stage_prog_data->binding_table.gather_texture_start = next_binding_table_offset;
+         next_binding_table_offset += num_textures;
+      }
+   } else {
+      stage_prog_data->binding_table.gather_texture_start = 0xd0d0d0d0;
+   }
+
+   if (shader_prog && shader_prog->NumAtomicBuffers) {
+      stage_prog_data->binding_table.abo_start = next_binding_table_offset;
+      next_binding_table_offset += shader_prog->NumAtomicBuffers;
+   } else {
+      stage_prog_data->binding_table.abo_start = 0xd0d0d0d0;
+   }
+
+   /* This may or may not be used depending on how the compile goes. */
+   stage_prog_data->binding_table.pull_constants_start = next_binding_table_offset;
+   next_binding_table_offset++;
+
+   assert(next_binding_table_offset <= BRW_MAX_SURFACES);
+
+   /* prog_data->base.binding_table.size will be set by brw_mark_surface_used. */
+}
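+
+/* Worked example (illustrative, with assumed inputs): with 4 textures in
+ * SamplersUsed, 2 UBOs, and no gather, atomics, or shader-time, a call with
+ * next_binding_table_offset == 0 yields texture_start = 0, ubo_start = 4,
+ * pull_constants_start = 6, and the remaining group starts left at the
+ * 0xd0d0d0d0 poison value.
+ */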
diff --git a/icd/intel/compiler/pipeline/brw_shader.h b/icd/intel/compiler/pipeline/brw_shader.h
new file mode 100644
index 0000000..0706e61
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_shader.h
@@ -0,0 +1,185 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <stdint.h>
+#include "brw_defines.h"
+#include "main/compiler.h"
+#include "glsl/ir.h"
+
+#pragma once
+
+enum PACKED register_file {
+   BAD_FILE,
+   GRF,
+   MRF,
+   IMM,
+   HW_REG, /* a struct brw_reg */
+   ATTR,
+   UNIFORM, /* prog_data->params[reg] */
+};
+
+#ifdef __cplusplus
+
+class backend_instruction : public exec_node {
+public:
+   bool is_tex() const;
+   bool is_math() const;
+   bool is_control_flow() const;
+   bool can_do_source_mods() const;
+   bool can_do_saturate() const;
+   bool reads_accumulator_implicitly() const;
+
+   /**
+    * True if the instruction has side effects other than writing to
+    * its destination registers.  You are expected not to reorder or
+    * optimize these out unless you know what you are doing.
+    */
+   bool has_side_effects() const;
+
+   enum opcode opcode; /* BRW_OPCODE_* or FS_OPCODE_* */
+
+   uint8_t predicate;
+   bool predicate_inverse;
+   bool writes_accumulator; /**< instruction implicitly writes accumulator */
+};
+
+enum instruction_scheduler_mode {
+   SCHEDULE_PRE_IPS_TD_HI,
+   SCHEDULE_PRE_IPS_TD_LO,
+   SCHEDULE_PRE_IPS_BU_LIMIT,
+   SCHEDULE_PRE_IPS_BU_LO,
+   SCHEDULE_PRE_IPS_BU_ML,
+   SCHEDULE_PRE_IPS_BU_MD,
+   SCHEDULE_PRE_IPS_BU_MH,
+   SCHEDULE_PRE_IPS_BU_HI,
+   SCHEDULE_PRE,
+   SCHEDULE_PRE_NON_LIFO,
+   SCHEDULE_PRE_LIFO,
+   SCHEDULE_POST,
+};
+
+class backend_visitor : public ir_visitor {
+protected:
+
+   backend_visitor(struct brw_context *brw,
+                   struct gl_shader_program *shader_prog,
+                   struct gl_program *prog,
+                   struct brw_stage_prog_data *stage_prog_data,
+                   gl_shader_stage stage);
+
+public:
+
+   struct brw_context * const brw;
+   struct gl_context * const ctx;
+   struct brw_shader * const shader;
+   struct gl_shader_program * const shader_prog;
+   struct gl_program * const prog;
+   struct brw_stage_prog_data * const stage_prog_data;
+
+   /** ralloc context for temporary data used during compile */
+   void *mem_ctx;
+
+   /**
+    * List of either fs_inst or vec4_instruction (inheriting from
+    * backend_instruction)
+    */
+   exec_list instructions;
+
+   virtual void dump_instruction(backend_instruction *inst) = 0;
+   virtual void dump_instruction(backend_instruction *inst, FILE *file) = 0;
+   virtual void dump_instruction(backend_instruction *inst, char* string) = 0;
+   virtual void dump_instructions();
+
+   void assign_common_binding_table_offsets(uint32_t next_binding_table_offset);
+
+   virtual void invalidate_live_intervals() = 0;
+   virtual int live_in_count(int block_num) const = 0;
+   virtual int live_out_count(int block_num) const = 0;
+};
+
+uint32_t brw_texture_offset(struct gl_context *ctx, ir_constant *offset);
+
+#endif /* __cplusplus */
+
+int brw_type_for_base_type(const struct glsl_type *type);
+uint32_t brw_conditional_for_comparison(unsigned int op);
+uint32_t brw_math_function(enum opcode op);
+const char *brw_instruction_name(enum opcode op);
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct brw_shader_program_precompile_key {
+   unsigned fbo_height;
+   bool is_user_fbo;
+};
+
+struct brw_vs_compile;
+struct brw_gs_compile;
+struct brw_wm_compile;
+
+const struct brw_shader_program_precompile_key *
+brw_shader_program_get_precompile_key(struct gl_shader_program *shader_prog);
+
+void
+brw_shader_program_save_vs_compile(struct gl_shader_program *shader_prog,
+                                   const struct brw_vs_compile *c);
+
+void
+brw_shader_program_save_gs_compile(struct gl_shader_program *shader_prog,
+                                   const struct brw_gs_compile *c);
+
+void
+brw_shader_program_save_wm_compile(struct gl_shader_program *shader_prog,
+                                   const struct brw_wm_compile *c);
+
+bool
+brw_shader_program_restore_vs_compile(struct gl_shader_program *shader_prog,
+                                      struct brw_vs_compile *c);
+
+bool
+brw_shader_program_restore_gs_compile(struct gl_shader_program *shader_prog,
+                                      struct brw_gs_compile *c);
+
+bool
+brw_shader_program_restore_wm_compile(struct gl_shader_program *shader_prog,
+                                      struct brw_wm_compile *c);
+
+// LunarG : ADD
+// Exposing some functions that were abstracted away; it would be good to put them back.
+struct gl_shader_program *brw_new_shader_program(struct gl_context *ctx, GLuint name);
+struct brw_shader_program *get_brw_shader_program(struct gl_shader_program *prog);
+GLboolean brw_link_shader(struct gl_context *ctx, struct gl_shader_program *prog);
+struct brw_wm_prog_data *get_wm_prog_data(struct gl_shader_program *prog);
+const unsigned *get_wm_program(struct gl_shader_program *prog);
+unsigned get_wm_program_size(struct gl_shader_program *prog);
+struct brw_vs_prog_data *get_vs_prog_data(struct gl_shader_program *prog);
+const unsigned *get_vs_program(struct gl_shader_program *prog);
+unsigned get_vs_program_size(struct gl_shader_program *prog);
+struct brw_gs_prog_data *get_gs_prog_data(struct gl_shader_program *prog);
+const unsigned *get_gs_program(struct gl_shader_program *prog);
+unsigned get_gs_program_size(struct gl_shader_program *prog);
+#ifdef __cplusplus
+}
+#endif
diff --git a/icd/intel/compiler/pipeline/brw_structs.h b/icd/intel/compiler/pipeline/brw_structs.h
new file mode 100644
index 0000000..9dbc797
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_structs.h
@@ -0,0 +1,1443 @@
+/*
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to
+ develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+
+#ifndef BRW_STRUCTS_H
+#define BRW_STRUCTS_H
+
+struct brw_urb_fence
+{
+   struct
+   {
+      unsigned length:8;
+      unsigned vs_realloc:1;
+      unsigned gs_realloc:1;
+      unsigned clp_realloc:1;
+      unsigned sf_realloc:1;
+      unsigned vfe_realloc:1;
+      unsigned cs_realloc:1;
+      unsigned pad:2;
+      unsigned opcode:16;
+   } header;
+
+   struct
+   {
+      unsigned vs_fence:10;
+      unsigned gs_fence:10;
+      unsigned clp_fence:10;
+      unsigned pad:2;
+   } bits0;
+
+   struct
+   {
+      unsigned sf_fence:10;
+      unsigned vf_fence:10;
+      unsigned cs_fence:11;
+      unsigned pad:1;
+   } bits1;
+};
+
+/* State structs for the various fixed function units:
+ */
+
+
+struct thread0
+{
+   unsigned pad0:1;
+   unsigned grf_reg_count:3;
+   unsigned pad1:2;
+   unsigned kernel_start_pointer:26; /* Offset from GENERAL_STATE_BASE */
+};
+
+struct thread1
+{
+   unsigned ext_halt_exception_enable:1;
+   unsigned sw_exception_enable:1;
+   unsigned mask_stack_exception_enable:1;
+   unsigned timeout_exception_enable:1;
+   unsigned illegal_op_exception_enable:1;
+   unsigned pad0:3;
+   unsigned depth_coef_urb_read_offset:6;	/* WM only */
+   unsigned pad1:2;
+   unsigned floating_point_mode:1;
+   unsigned thread_priority:1;
+   unsigned binding_table_entry_count:8;
+   unsigned pad3:5;
+   unsigned single_program_flow:1;
+};
+
+struct thread2
+{
+   unsigned per_thread_scratch_space:4;
+   unsigned pad0:6;
+   unsigned scratch_space_base_pointer:22;
+};
+
+
+struct thread3
+{
+   unsigned dispatch_grf_start_reg:4;
+   unsigned urb_entry_read_offset:6;
+   unsigned pad0:1;
+   unsigned urb_entry_read_length:6;
+   unsigned pad1:1;
+   unsigned const_urb_entry_read_offset:6;
+   unsigned pad2:1;
+   unsigned const_urb_entry_read_length:6;
+   unsigned pad3:1;
+};
+
+
+
+struct brw_clip_unit_state
+{
+   struct thread0 thread0;
+   struct
+   {
+      unsigned pad0:7;
+      unsigned sw_exception_enable:1;
+      unsigned pad1:3;
+      unsigned mask_stack_exception_enable:1;
+      unsigned pad2:1;
+      unsigned illegal_op_exception_enable:1;
+      unsigned pad3:2;
+      unsigned floating_point_mode:1;
+      unsigned thread_priority:1;
+      unsigned binding_table_entry_count:8;
+      unsigned pad4:5;
+      unsigned single_program_flow:1;
+   } thread1;
+
+   struct thread2 thread2;
+   struct thread3 thread3;
+
+   struct
+   {
+      unsigned pad0:9;
+      unsigned gs_output_stats:1; /* not always */
+      unsigned stats_enable:1;
+      unsigned nr_urb_entries:7;
+      unsigned pad1:1;
+      unsigned urb_entry_allocation_size:5;
+      unsigned pad2:1;
+      unsigned max_threads:5; 	/* may be less */
+      unsigned pad3:2;
+   } thread4;
+
+   struct
+   {
+      unsigned pad0:13;
+      unsigned clip_mode:3;
+      unsigned userclip_enable_flags:8;
+      unsigned userclip_must_clip:1;
+      unsigned negative_w_clip_test:1;
+      unsigned guard_band_enable:1;
+      unsigned viewport_z_clip_enable:1;
+      unsigned viewport_xy_clip_enable:1;
+      unsigned vertex_position_space:1;
+      unsigned api_mode:1;
+      unsigned pad2:1;
+   } clip5;
+
+   struct
+   {
+      unsigned pad0:5;
+      unsigned clipper_viewport_state_ptr:27;
+   } clip6;
+
+
+   float viewport_xmin;
+   float viewport_xmax;
+   float viewport_ymin;
+   float viewport_ymax;
+};
+
+struct gen6_blend_state
+{
+   struct {
+      unsigned dest_blend_factor:5;
+      unsigned source_blend_factor:5;
+      unsigned pad3:1;
+      unsigned blend_func:3;
+      unsigned pad2:1;
+      unsigned ia_dest_blend_factor:5;
+      unsigned ia_source_blend_factor:5;
+      unsigned pad1:1;
+      unsigned ia_blend_func:3;
+      unsigned pad0:1;
+      unsigned ia_blend_enable:1;
+      unsigned blend_enable:1;
+   } blend0;
+
+   struct {
+      unsigned post_blend_clamp_enable:1;
+      unsigned pre_blend_clamp_enable:1;
+      unsigned clamp_range:2;
+      unsigned pad0:4;
+      unsigned x_dither_offset:2;
+      unsigned y_dither_offset:2;
+      unsigned dither_enable:1;
+      unsigned alpha_test_func:3;
+      unsigned alpha_test_enable:1;
+      unsigned pad1:1;
+      unsigned logic_op_func:4;
+      unsigned logic_op_enable:1;
+      unsigned pad2:1;
+      unsigned write_disable_b:1;
+      unsigned write_disable_g:1;
+      unsigned write_disable_r:1;
+      unsigned write_disable_a:1;
+      unsigned pad3:1;
+      unsigned alpha_to_coverage_dither:1;
+      unsigned alpha_to_one:1;
+      unsigned alpha_to_coverage:1;
+   } blend1;
+};
+
+struct gen6_color_calc_state
+{
+   struct {
+      unsigned alpha_test_format:1;
+      unsigned pad0:14;
+      unsigned round_disable:1;
+      unsigned bf_stencil_ref:8;
+      unsigned stencil_ref:8;
+   } cc0;
+
+   union {
+      float alpha_ref_f;
+      struct {
+	 unsigned ui:8;
+	 unsigned pad0:24;
+      } alpha_ref_fi;
+   } cc1;
+
+   float constant_r;
+   float constant_g;
+   float constant_b;
+   float constant_a;
+};
+
+struct gen6_depth_stencil_state
+{
+   struct {
+      unsigned pad0:3;
+      unsigned bf_stencil_pass_depth_pass_op:3;
+      unsigned bf_stencil_pass_depth_fail_op:3;
+      unsigned bf_stencil_fail_op:3;
+      unsigned bf_stencil_func:3;
+      unsigned bf_stencil_enable:1;
+      unsigned pad1:2;
+      unsigned stencil_write_enable:1;
+      unsigned stencil_pass_depth_pass_op:3;
+      unsigned stencil_pass_depth_fail_op:3;
+      unsigned stencil_fail_op:3;
+      unsigned stencil_func:3;
+      unsigned stencil_enable:1;
+   } ds0;
+
+   struct {
+      unsigned bf_stencil_write_mask:8;
+      unsigned bf_stencil_test_mask:8;
+      unsigned stencil_write_mask:8;
+      unsigned stencil_test_mask:8;
+   } ds1;
+
+   struct {
+      unsigned pad0:26;
+      unsigned depth_write_enable:1;
+      unsigned depth_test_func:3;
+      unsigned pad1:1;
+      unsigned depth_test_enable:1;
+   } ds2;
+};
+
+struct brw_cc_unit_state
+{
+   struct
+   {
+      unsigned pad0:3;
+      unsigned bf_stencil_pass_depth_pass_op:3;
+      unsigned bf_stencil_pass_depth_fail_op:3;
+      unsigned bf_stencil_fail_op:3;
+      unsigned bf_stencil_func:3;
+      unsigned bf_stencil_enable:1;
+      unsigned pad1:2;
+      unsigned stencil_write_enable:1;
+      unsigned stencil_pass_depth_pass_op:3;
+      unsigned stencil_pass_depth_fail_op:3;
+      unsigned stencil_fail_op:3;
+      unsigned stencil_func:3;
+      unsigned stencil_enable:1;
+   } cc0;
+
+
+   struct
+   {
+      unsigned bf_stencil_ref:8;
+      unsigned stencil_write_mask:8;
+      unsigned stencil_test_mask:8;
+      unsigned stencil_ref:8;
+   } cc1;
+
+
+   struct
+   {
+      unsigned logicop_enable:1;
+      unsigned pad0:10;
+      unsigned depth_write_enable:1;
+      unsigned depth_test_function:3;
+      unsigned depth_test:1;
+      unsigned bf_stencil_write_mask:8;
+      unsigned bf_stencil_test_mask:8;
+   } cc2;
+
+
+   struct
+   {
+      unsigned pad0:8;
+      unsigned alpha_test_func:3;
+      unsigned alpha_test:1;
+      unsigned blend_enable:1;
+      unsigned ia_blend_enable:1;
+      unsigned pad1:1;
+      unsigned alpha_test_format:1;
+      unsigned pad2:16;
+   } cc3;
+
+   struct
+   {
+      unsigned pad0:5;
+      unsigned cc_viewport_state_offset:27; /* Offset from GENERAL_STATE_BASE */
+   } cc4;
+
+   struct
+   {
+      unsigned pad0:2;
+      unsigned ia_dest_blend_factor:5;
+      unsigned ia_src_blend_factor:5;
+      unsigned ia_blend_function:3;
+      unsigned statistics_enable:1;
+      unsigned logicop_func:4;
+      unsigned pad1:11;
+      unsigned dither_enable:1;
+   } cc5;
+
+   struct
+   {
+      unsigned clamp_post_alpha_blend:1;
+      unsigned clamp_pre_alpha_blend:1;
+      unsigned clamp_range:2;
+      unsigned pad0:11;
+      unsigned y_dither_offset:2;
+      unsigned x_dither_offset:2;
+      unsigned dest_blend_factor:5;
+      unsigned src_blend_factor:5;
+      unsigned blend_function:3;
+   } cc6;
+
+   struct {
+      union {
+	 float f;
+	 uint8_t ub[4];
+      } alpha_ref;
+   } cc7;
+};
+
+struct brw_sf_unit_state
+{
+   struct thread0 thread0;
+   struct thread1 thread1;
+   struct thread2 thread2;
+   struct thread3 thread3;
+
+   struct
+   {
+      unsigned pad0:10;
+      unsigned stats_enable:1;
+      unsigned nr_urb_entries:7;
+      unsigned pad1:1;
+      unsigned urb_entry_allocation_size:5;
+      unsigned pad2:1;
+      unsigned max_threads:6;
+      unsigned pad3:1;
+   } thread4;
+
+   struct
+   {
+      unsigned front_winding:1;
+      unsigned viewport_transform:1;
+      unsigned pad0:3;
+      unsigned sf_viewport_state_offset:27; /* Offset from GENERAL_STATE_BASE */
+   } sf5;
+
+   struct
+   {
+      unsigned pad0:9;
+      unsigned dest_org_vbias:4;
+      unsigned dest_org_hbias:4;
+      unsigned scissor:1;
+      unsigned disable_2x2_trifilter:1;
+      unsigned disable_zero_pix_trifilter:1;
+      unsigned point_rast_rule:2;
+      unsigned line_endcap_aa_region_width:2;
+      unsigned line_width:4;
+      unsigned fast_scissor_disable:1;
+      unsigned cull_mode:2;
+      unsigned aa_enable:1;
+   } sf6;
+
+   struct
+   {
+      unsigned point_size:11;
+      unsigned use_point_size_state:1;
+      unsigned subpixel_precision:1;
+      unsigned sprite_point:1;
+      unsigned pad0:10;
+      unsigned aa_line_distance_mode:1;
+      unsigned trifan_pv:2;
+      unsigned linestrip_pv:2;
+      unsigned tristrip_pv:2;
+      unsigned line_last_pixel_enable:1;
+   } sf7;
+
+};
+
+struct gen6_scissor_rect
+{
+   unsigned xmin:16;
+   unsigned ymin:16;
+   unsigned xmax:16;
+   unsigned ymax:16;
+};
+
+struct brw_gs_unit_state
+{
+   struct thread0 thread0;
+   struct thread1 thread1;
+   struct thread2 thread2;
+   struct thread3 thread3;
+
+   struct
+   {
+      unsigned pad0:8;
+      unsigned rendering_enable:1; /* for Ironlake */
+      unsigned pad4:1;
+      unsigned stats_enable:1;
+      unsigned nr_urb_entries:7;
+      unsigned pad1:1;
+      unsigned urb_entry_allocation_size:5;
+      unsigned pad2:1;
+      unsigned max_threads:5;
+      unsigned pad3:2;
+   } thread4;
+
+   struct
+   {
+      unsigned sampler_count:3;
+      unsigned pad0:2;
+      unsigned sampler_state_pointer:27;
+   } gs5;
+
+
+   struct
+   {
+      unsigned max_vp_index:4;
+      unsigned pad0:12;
+      unsigned svbi_post_inc_value:10;
+      unsigned pad1:1;
+      unsigned svbi_post_inc_enable:1;
+      unsigned svbi_payload:1;
+      unsigned discard_adjaceny:1;
+      unsigned reorder_enable:1;
+      unsigned pad2:1;
+   } gs6;
+};
+
+
+struct brw_vs_unit_state
+{
+   struct thread0 thread0;
+   struct thread1 thread1;
+   struct thread2 thread2;
+   struct thread3 thread3;
+
+   struct
+   {
+      unsigned pad0:10;
+      unsigned stats_enable:1;
+      unsigned nr_urb_entries:7;
+      unsigned pad1:1;
+      unsigned urb_entry_allocation_size:5;
+      unsigned pad2:1;
+      unsigned max_threads:6;
+      unsigned pad3:1;
+   } thread4;
+
+   struct
+   {
+      unsigned sampler_count:3;
+      unsigned pad0:2;
+      unsigned sampler_state_pointer:27;
+   } vs5;
+
+   struct
+   {
+      unsigned vs_enable:1;
+      unsigned vert_cache_disable:1;
+      unsigned pad0:30;
+   } vs6;
+};
+
+
+struct brw_wm_unit_state
+{
+   struct thread0 thread0;
+   struct thread1 thread1;
+   struct thread2 thread2;
+   struct thread3 thread3;
+
+   struct {
+      unsigned stats_enable:1;
+      unsigned depth_buffer_clear:1;
+      unsigned sampler_count:3;
+      unsigned sampler_state_pointer:27;
+   } wm4;
+
+   struct
+   {
+      unsigned enable_8_pix:1;
+      unsigned enable_16_pix:1;
+      unsigned enable_32_pix:1;
+      unsigned enable_con_32_pix:1;
+      unsigned enable_con_64_pix:1;
+      unsigned pad0:1;
+
+      /* These next four bits are for Ironlake+ */
+      unsigned fast_span_coverage_enable:1;
+      unsigned depth_buffer_clear:1;
+      unsigned depth_buffer_resolve_enable:1;
+      unsigned hierarchical_depth_buffer_resolve_enable:1;
+
+      unsigned legacy_global_depth_bias:1;
+      unsigned line_stipple:1;
+      unsigned depth_offset:1;
+      unsigned polygon_stipple:1;
+      unsigned line_aa_region_width:2;
+      unsigned line_endcap_aa_region_width:2;
+      unsigned early_depth_test:1;
+      unsigned thread_dispatch_enable:1;
+      unsigned program_uses_depth:1;
+      unsigned program_computes_depth:1;
+      unsigned program_uses_killpixel:1;
+      unsigned legacy_line_rast: 1;
+      unsigned transposed_urb_read_enable:1;
+      unsigned max_threads:7;
+   } wm5;
+
+   float global_depth_offset_constant;
+   float global_depth_offset_scale;
+
+   /* for Ironlake only */
+   struct {
+      unsigned pad0:1;
+      unsigned grf_reg_count_1:3;
+      unsigned pad1:2;
+      unsigned kernel_start_pointer_1:26;
+   } wm8;
+
+   struct {
+      unsigned pad0:1;
+      unsigned grf_reg_count_2:3;
+      unsigned pad1:2;
+      unsigned kernel_start_pointer_2:26;
+   } wm9;
+
+   struct {
+      unsigned pad0:1;
+      unsigned grf_reg_count_3:3;
+      unsigned pad1:2;
+      unsigned kernel_start_pointer_3:26;
+   } wm10;
+};
+
+struct brw_sampler_default_color {
+   float color[4];
+};
+
+struct gen5_sampler_default_color {
+   uint8_t ub[4];
+   float f[4];
+   uint16_t hf[4];
+   uint16_t us[4];
+   int16_t s[4];
+   uint8_t b[4];
+};
+
+struct brw_sampler_state
+{
+
+   struct
+   {
+      unsigned shadow_function:3;
+      unsigned lod_bias:11;
+      unsigned min_filter:3;
+      unsigned mag_filter:3;
+      unsigned mip_filter:2;
+      unsigned base_level:5;
+      unsigned min_mag_neq:1;
+      unsigned lod_preclamp:1;
+      unsigned default_color_mode:1;
+      unsigned pad0:1;
+      unsigned disable:1;
+   } ss0;
+
+   struct
+   {
+      unsigned r_wrap_mode:3;
+      unsigned t_wrap_mode:3;
+      unsigned s_wrap_mode:3;
+      unsigned cube_control_mode:1;
+      unsigned pad:2;
+      unsigned max_lod:10;
+      unsigned min_lod:10;
+   } ss1;
+
+
+   struct
+   {
+      unsigned pad:5;
+      unsigned default_color_pointer:27;
+   } ss2;
+
+   struct
+   {
+      unsigned non_normalized_coord:1;
+      unsigned pad:12;
+      unsigned address_round:6;
+      unsigned max_aniso:3;
+      unsigned chroma_key_mode:1;
+      unsigned chroma_key_index:2;
+      unsigned chroma_key_enable:1;
+      unsigned monochrome_filter_width:3;
+      unsigned monochrome_filter_height:3;
+   } ss3;
+};
+
+struct gen7_sampler_state
+{
+   struct
+   {
+      unsigned aniso_algorithm:1;
+      unsigned lod_bias:13;
+      unsigned min_filter:3;
+      unsigned mag_filter:3;
+      unsigned mip_filter:2;
+      unsigned base_level:5;
+      unsigned pad1:1;
+      unsigned lod_preclamp:1;
+      unsigned default_color_mode:1;
+      unsigned pad0:1;
+      unsigned disable:1;
+   } ss0;
+
+   struct
+   {
+      unsigned cube_control_mode:1;
+      unsigned shadow_function:3;
+      unsigned pad:4;
+      unsigned max_lod:12;
+      unsigned min_lod:12;
+   } ss1;
+
+   struct
+   {
+      unsigned pad:5;
+      unsigned default_color_pointer:27;
+   } ss2;
+
+   struct
+   {
+      unsigned r_wrap_mode:3;
+      unsigned t_wrap_mode:3;
+      unsigned s_wrap_mode:3;
+      unsigned pad:1;
+      unsigned non_normalized_coord:1;
+      unsigned trilinear_quality:2;
+      unsigned address_round:6;
+      unsigned max_aniso:3;
+      unsigned chroma_key_mode:1;
+      unsigned chroma_key_index:2;
+      unsigned chroma_key_enable:1;
+      unsigned pad0:6;
+   } ss3;
+};
+
+struct brw_clipper_viewport
+{
+   float xmin;
+   float xmax;
+   float ymin;
+   float ymax;
+};
+
+struct brw_cc_viewport
+{
+   float min_depth;
+   float max_depth;
+};
+
+struct brw_sf_viewport
+{
+   struct {
+      float m00;
+      float m11;
+      float m22;
+      float m30;
+      float m31;
+      float m32;
+   } viewport;
+
+   /* scissor coordinates are inclusive */
+   struct {
+      int16_t xmin;
+      int16_t ymin;
+      int16_t xmax;
+      int16_t ymax;
+   } scissor;
+};
+
+struct gen6_sf_viewport {
+   float m00;
+   float m11;
+   float m22;
+   float m30;
+   float m31;
+   float m32;
+};
+
+struct gen7_sf_clip_viewport {
+   struct {
+      float m00;
+      float m11;
+      float m22;
+      float m30;
+      float m31;
+      float m32;
+   } viewport;
+
+   unsigned pad0[2];
+
+   struct {
+      float xmin;
+      float xmax;
+      float ymin;
+      float ymax;
+   } guardband;
+
+   float pad1[4];
+};
+
+struct brw_urb_immediate {
+   unsigned opcode:4;
+   unsigned offset:6;
+   unsigned swizzle_control:2;
+   unsigned pad:1;
+   unsigned allocate:1;
+   unsigned used:1;
+   unsigned complete:1;
+   unsigned response_length:4;
+   unsigned msg_length:4;
+   unsigned msg_target:4;
+   unsigned pad1:3;
+   unsigned end_of_thread:1;
+};
+
+/* Instruction format for the execution units:
+ */
+
+struct brw_instruction
+{
+   struct
+   {
+      unsigned opcode:7;
+      unsigned pad:1;
+      unsigned access_mode:1;
+      unsigned mask_control:1;
+      unsigned dependency_control:2;
+      unsigned compression_control:2; /* gen6: quarter control */
+      unsigned thread_control:2;
+      unsigned predicate_control:4;
+      unsigned predicate_inverse:1;
+      unsigned execution_size:3;
+      /**
+       * Conditional Modifier for most instructions.  On Gen6+, this is also
+       * used for the SEND instruction's Message Target/SFID.
+       */
+      unsigned destreg__conditionalmod:4;
+      unsigned acc_wr_control:1;
+      unsigned cmpt_control:1;
+      unsigned debug_control:1;
+      unsigned saturate:1;
+   } header;
+
+   union {
+      struct
+      {
+	 unsigned dest_reg_file:2;
+	 unsigned dest_reg_type:3;
+	 unsigned src0_reg_file:2;
+	 unsigned src0_reg_type:3;
+	 unsigned src1_reg_file:2;
+	 unsigned src1_reg_type:3;
+         unsigned nibctrl:1; /* gen7+ */
+	 unsigned dest_subreg_nr:5;
+	 unsigned dest_reg_nr:8;
+	 unsigned dest_horiz_stride:2;
+	 unsigned dest_address_mode:1;
+      } da1;
+
+      struct
+      {
+	 unsigned dest_reg_file:2;
+	 unsigned dest_reg_type:3;
+	 unsigned src0_reg_file:2;
+	 unsigned src0_reg_type:3;
+	 unsigned src1_reg_file:2;        /* 0x00000c00 */
+	 unsigned src1_reg_type:3;        /* 0x00007000 */
+         unsigned nibctrl:1; /* gen7+ */
+	 int dest_indirect_offset:10;	/* offset against the deref'd address reg */
+	 unsigned dest_subreg_nr:3; /* subnr for the address reg a0.x */
+	 unsigned dest_horiz_stride:2;
+	 unsigned dest_address_mode:1;
+      } ia1;
+
+      struct
+      {
+	 unsigned dest_reg_file:2;
+	 unsigned dest_reg_type:3;
+	 unsigned src0_reg_file:2;
+	 unsigned src0_reg_type:3;
+	 unsigned src1_reg_file:2;
+	 unsigned src1_reg_type:3;
+         unsigned nibctrl:1; /* gen7+ */
+	 unsigned dest_writemask:4;
+	 unsigned dest_subreg_nr:1;
+	 unsigned dest_reg_nr:8;
+	 unsigned dest_horiz_stride:2;
+	 unsigned dest_address_mode:1;
+      } da16;
+
+      struct
+      {
+	 unsigned dest_reg_file:2;
+	 unsigned dest_reg_type:3;
+	 unsigned src0_reg_file:2;
+	 unsigned src0_reg_type:3;
+         unsigned src1_reg_file:2;
+         unsigned src1_reg_type:3;
+         unsigned nibctrl:1; /* gen7+ */
+	 unsigned dest_writemask:4;
+	 int dest_indirect_offset:6;
+	 unsigned dest_subreg_nr:3;
+	 unsigned dest_horiz_stride:2;
+	 unsigned dest_address_mode:1;
+      } ia16;
+
+      struct {
+	 unsigned dest_reg_file:2;
+	 unsigned dest_reg_type:3;
+	 unsigned src0_reg_file:2;
+	 unsigned src0_reg_type:3;
+	 unsigned src1_reg_file:2;
+	 unsigned src1_reg_type:3;
+	 unsigned pad:1;
+
+	 int jump_count:16;
+      } branch_gen6;
+
+      struct {
+         unsigned dest_reg_file:1; /* gen6, not gen7+ */
+	 unsigned flag_subreg_num:1;
+         unsigned flag_reg_nr:1; /* gen7+ */
+         unsigned pad0:1;
+	 unsigned src0_abs:1;
+	 unsigned src0_negate:1;
+	 unsigned src1_abs:1;
+	 unsigned src1_negate:1;
+	 unsigned src2_abs:1;
+	 unsigned src2_negate:1;
+         unsigned src_type:2; /* gen7+ */
+         unsigned dst_type:2; /* gen7+ */
+         unsigned pad1:1;
+         unsigned nibctrl:1; /* gen7+ */
+         unsigned pad2:1;
+	 unsigned dest_writemask:4;
+	 unsigned dest_subreg_nr:3;
+	 unsigned dest_reg_nr:8;
+      } da3src;
+
+      uint32_t ud;
+   } bits1;
+
+
+   union {
+      struct
+      {
+	 unsigned src0_subreg_nr:5;
+	 unsigned src0_reg_nr:8;
+	 unsigned src0_abs:1;
+	 unsigned src0_negate:1;
+	 unsigned src0_address_mode:1;
+	 unsigned src0_horiz_stride:2;
+	 unsigned src0_width:3;
+	 unsigned src0_vert_stride:4;
+	 unsigned flag_subreg_nr:1;
+         unsigned flag_reg_nr:1; /* gen7+ */
+	 unsigned pad:5;
+      } da1;
+
+      struct
+      {
+	 int src0_indirect_offset:10;
+	 unsigned src0_subreg_nr:3;
+	 unsigned src0_abs:1;
+	 unsigned src0_negate:1;
+	 unsigned src0_address_mode:1;
+	 unsigned src0_horiz_stride:2;
+	 unsigned src0_width:3;
+	 unsigned src0_vert_stride:4;
+	 unsigned flag_subreg_nr:1;
+         unsigned flag_reg_nr:1; /* gen7+ */
+	 unsigned pad:5;
+      } ia1;
+
+      struct
+      {
+	 unsigned src0_swz_x:2;
+	 unsigned src0_swz_y:2;
+	 unsigned src0_subreg_nr:1;
+	 unsigned src0_reg_nr:8;
+	 unsigned src0_abs:1;
+	 unsigned src0_negate:1;
+	 unsigned src0_address_mode:1;
+	 unsigned src0_swz_z:2;
+	 unsigned src0_swz_w:2;
+	 unsigned pad0:1;
+	 unsigned src0_vert_stride:4;
+	 unsigned flag_subreg_nr:1;
+         unsigned flag_reg_nr:1; /* gen7+ */
+	 unsigned pad1:5;
+      } da16;
+
+      struct
+      {
+	 unsigned src0_swz_x:2;
+	 unsigned src0_swz_y:2;
+	 int src0_indirect_offset:6;
+	 unsigned src0_subreg_nr:3;
+	 unsigned src0_abs:1;
+	 unsigned src0_negate:1;
+	 unsigned src0_address_mode:1;
+	 unsigned src0_swz_z:2;
+	 unsigned src0_swz_w:2;
+	 unsigned pad0:1;
+	 unsigned src0_vert_stride:4;
+	 unsigned flag_subreg_nr:1;
+         unsigned flag_reg_nr:1; /* gen7+ */
+	 unsigned pad1:5;
+      } ia16;
+
+      /* Extended Message Descriptor for Ironlake (Gen5) SEND instruction.
+       *
+       * Does not apply to Gen6+.  The SFID/message target moved to bits
+       * 27:24 of the header (destreg__conditionalmod); EOT is in bits3.
+       */
+       struct
+       {
+           unsigned pad:26;
+           unsigned end_of_thread:1;
+           unsigned pad1:1;
+           unsigned sfid:4;
+       } send_gen5;  /* for Ironlake only */
+
+      struct {
+	 unsigned src0_rep_ctrl:1;
+	 unsigned src0_swizzle:8;
+	 unsigned src0_subreg_nr:3;
+	 unsigned src0_reg_nr:8;
+	 unsigned pad0:1;
+	 unsigned src1_rep_ctrl:1;
+	 unsigned src1_swizzle:8;
+	 unsigned src1_subreg_nr_low:2;
+      } da3src;
+
+      uint32_t ud;
+   } bits2;
+
+   union
+   {
+      struct
+      {
+	 unsigned src1_subreg_nr:5;
+	 unsigned src1_reg_nr:8;
+	 unsigned src1_abs:1;
+	 unsigned src1_negate:1;
+	 unsigned src1_address_mode:1;
+	 unsigned src1_horiz_stride:2;
+	 unsigned src1_width:3;
+	 unsigned src1_vert_stride:4;
+	 unsigned pad0:7;
+      } da1;
+
+      struct
+      {
+	 unsigned src1_swz_x:2;
+	 unsigned src1_swz_y:2;
+	 unsigned src1_subreg_nr:1;
+	 unsigned src1_reg_nr:8;
+	 unsigned src1_abs:1;
+	 unsigned src1_negate:1;
+	 unsigned src1_address_mode:1;
+	 unsigned src1_swz_z:2;
+	 unsigned src1_swz_w:2;
+	 unsigned pad1:1;
+	 unsigned src1_vert_stride:4;
+	 unsigned pad2:7;
+      } da16;
+
+      struct
+      {
+	 int  src1_indirect_offset:10;
+	 unsigned src1_subreg_nr:3;
+	 unsigned src1_abs:1;
+	 unsigned src1_negate:1;
+	 unsigned src1_address_mode:1;
+	 unsigned src1_horiz_stride:2;
+	 unsigned src1_width:3;
+	 unsigned src1_vert_stride:4;
+	 unsigned pad1:7;
+      } ia1;
+
+      struct
+      {
+	 unsigned src1_swz_x:2;
+	 unsigned src1_swz_y:2;
+	 int  src1_indirect_offset:6;
+	 unsigned src1_subreg_nr:3;
+	 unsigned src1_abs:1;
+	 unsigned src1_negate:1;
+	 unsigned pad0:1;
+	 unsigned src1_swz_z:2;
+	 unsigned src1_swz_w:2;
+	 unsigned pad1:1;
+	 unsigned src1_vert_stride:4;
+	 unsigned pad2:7;
+      } ia16;
+
+
+      struct
+      {
+	 int  jump_count:16;	/* note: signed */
+	 unsigned  pop_count:4;
+	 unsigned  pad0:12;
+      } if_else;
+
+      /* This is also used for gen7 IF/ELSE instructions */
+      struct
+      {
+	 /* Signed jump distance to the ip to jump to if all channels
+	  * are disabled after the break or continue.  It should point
+	  * to the end of the innermost control flow block, as that's
+	  * where some channel could get re-enabled.
+	  */
+	 int jip:16;
+
+	 /* Signed jump distance to the location to resume execution
+	  * of this channel if it's enabled for the break or continue.
+	  */
+	 int uip:16;
+      } break_cont;
+
+      /**
+       * \defgroup SEND instructions / Message Descriptors
+       *
+       * @{
+       */
+
+      /**
+       * Generic Message Descriptor for Gen4 SEND instructions.  The structs
+       * below expand function_control to something specific for their
+       * message.  Due to struct packing issues, they duplicate these bits.
+       *
+       * See the G45 PRM, Volume 4, Table 14-15.
+       */
+      struct {
+	 unsigned function_control:16;
+	 unsigned response_length:4;
+	 unsigned msg_length:4;
+	 unsigned msg_target:4;
+	 unsigned pad1:3;
+	 unsigned end_of_thread:1;
+      } generic;
+
+      /**
+       * Generic Message Descriptor for Gen5-7 SEND instructions.
+       *
+       * See the Sandybridge PRM, Volume 2 Part 2, Table 8-15.  (Sadly, most
+       * of the information on the SEND instruction is missing from the public
+       * Ironlake PRM.)
+       *
+       * The table claims that bit 31 is reserved/MBZ on Gen6+, but it lies.
+       * According to the SEND instruction description:
+       * "The MSb of the message description, the EOT field, always comes from
+       *  bit 127 of the instruction word"...which is bit 31 of this field.
+       */
+      struct {
+	 unsigned function_control:19;
+	 unsigned header_present:1;
+	 unsigned response_length:5;
+	 unsigned msg_length:4;
+	 unsigned pad1:2;
+	 unsigned end_of_thread:1;
+      } generic_gen5;
+
+      /** G45 PRM, Volume 4, Section 6.1.1.1 */
+      struct {
+	 unsigned function:4;
+	 unsigned int_type:1;
+	 unsigned precision:1;
+	 unsigned saturate:1;
+	 unsigned data_type:1;
+	 unsigned pad0:8;
+	 unsigned response_length:4;
+	 unsigned msg_length:4;
+	 unsigned msg_target:4;
+	 unsigned pad1:3;
+	 unsigned end_of_thread:1;
+      } math;
+
+      /** Ironlake PRM, Volume 4 Part 1, Section 6.1.1.1 */
+      struct {
+	 unsigned function:4;
+	 unsigned int_type:1;
+	 unsigned precision:1;
+	 unsigned saturate:1;
+	 unsigned data_type:1;
+	 unsigned snapshot:1;
+	 unsigned pad0:10;
+	 unsigned header_present:1;
+	 unsigned response_length:5;
+	 unsigned msg_length:4;
+	 unsigned pad1:2;
+	 unsigned end_of_thread:1;
+      } math_gen5;
+
+      /** G45 PRM, Volume 4, Section 4.8.1.1.1 [DevBW] and [DevCL] */
+      struct {
+	 unsigned binding_table_index:8;
+	 unsigned sampler:4;
+	 unsigned return_format:2;
+	 unsigned msg_type:2;
+	 unsigned response_length:4;
+	 unsigned msg_length:4;
+	 unsigned msg_target:4;
+	 unsigned pad1:3;
+	 unsigned end_of_thread:1;
+      } sampler;
+
+      /** G45 PRM, Volume 4, Section 4.8.1.1.2 [DevCTG] */
+      struct {
+         unsigned binding_table_index:8;
+         unsigned sampler:4;
+         unsigned msg_type:4;
+         unsigned response_length:4;
+         unsigned msg_length:4;
+         unsigned msg_target:4;
+         unsigned pad1:3;
+         unsigned end_of_thread:1;
+      } sampler_g4x;
+
+      /** Ironlake PRM, Volume 4 Part 1, Section 4.11.1.1.3 */
+      struct {
+	 unsigned binding_table_index:8;
+	 unsigned sampler:4;
+	 unsigned msg_type:4;
+	 unsigned simd_mode:2;
+	 unsigned pad0:1;
+	 unsigned header_present:1;
+	 unsigned response_length:5;
+	 unsigned msg_length:4;
+	 unsigned pad1:2;
+	 unsigned end_of_thread:1;
+      } sampler_gen5;
+
+      struct {
+	 unsigned binding_table_index:8;
+	 unsigned sampler:4;
+	 unsigned msg_type:5;
+	 unsigned simd_mode:2;
+	 unsigned header_present:1;
+	 unsigned response_length:5;
+	 unsigned msg_length:4;
+	 unsigned pad1:2;
+	 unsigned end_of_thread:1;
+      } sampler_gen7;
+
+      struct brw_urb_immediate urb;
+
+      struct {
+	 unsigned opcode:4;
+	 unsigned offset:6;
+	 unsigned swizzle_control:2;
+	 unsigned pad:1;
+	 unsigned allocate:1;
+	 unsigned used:1;
+	 unsigned complete:1;
+	 unsigned pad0:3;
+	 unsigned header_present:1;
+	 unsigned response_length:5;
+	 unsigned msg_length:4;
+	 unsigned pad1:2;
+	 unsigned end_of_thread:1;
+      } urb_gen5;
+
+      struct {
+	 unsigned opcode:3;
+	 unsigned offset:11;
+	 unsigned swizzle_control:1;
+	 unsigned complete:1;
+	 unsigned per_slot_offset:1;
+	 unsigned pad0:2;
+	 unsigned header_present:1;
+	 unsigned response_length:5;
+	 unsigned msg_length:4;
+	 unsigned pad1:2;
+	 unsigned end_of_thread:1;
+      } urb_gen7;
+
+      /** 965 PRM, Volume 4, Section 5.10.1.1: Message Descriptor */
+      struct {
+	 unsigned binding_table_index:8;
+	 unsigned msg_control:4;
+	 unsigned msg_type:2;
+	 unsigned target_cache:2;
+	 unsigned response_length:4;
+	 unsigned msg_length:4;
+	 unsigned msg_target:4;
+	 unsigned pad1:3;
+	 unsigned end_of_thread:1;
+      } dp_read;
+
+      /** G45 PRM, Volume 4, Section 5.10.1.1.2 */
+      struct {
+	 unsigned binding_table_index:8;
+	 unsigned msg_control:3;
+	 unsigned msg_type:3;
+	 unsigned target_cache:2;
+	 unsigned response_length:4;
+	 unsigned msg_length:4;
+	 unsigned msg_target:4;
+	 unsigned pad1:3;
+	 unsigned end_of_thread:1;
+      } dp_read_g4x;
+
+      /** Ironlake PRM, Volume 4 Part 1, Section 5.10.2.1.2. */
+      struct {
+	 unsigned binding_table_index:8;
+	 unsigned msg_control:3;
+	 unsigned msg_type:3;
+	 unsigned target_cache:2;
+	 unsigned pad0:3;
+	 unsigned header_present:1;
+	 unsigned response_length:5;
+	 unsigned msg_length:4;
+	 unsigned pad1:2;
+	 unsigned end_of_thread:1;
+      } dp_read_gen5;
+
+      /** G45 PRM, Volume 4, Section 5.10.1.1.2.  For both Gen4 and G45. */
+      struct {
+	 unsigned binding_table_index:8;
+	 unsigned msg_control:3;
+	 unsigned last_render_target:1;
+	 unsigned msg_type:3;
+	 unsigned send_commit_msg:1;
+	 unsigned response_length:4;
+	 unsigned msg_length:4;
+	 unsigned msg_target:4;
+	 unsigned pad1:3;
+	 unsigned end_of_thread:1;
+      } dp_write;
+
+      /** Ironlake PRM, Volume 4 Part 1, Section 5.10.2.1.2. */
+      struct {
+	 unsigned binding_table_index:8;
+	 unsigned msg_control:3;
+	 unsigned last_render_target:1;
+	 unsigned msg_type:3;
+	 unsigned send_commit_msg:1;
+	 unsigned pad0:3;
+	 unsigned header_present:1;
+	 unsigned response_length:5;
+	 unsigned msg_length:4;
+	 unsigned pad1:2;
+	 unsigned end_of_thread:1;
+      } dp_write_gen5;
+
+      /**
+       * Message for the Sandybridge Sampler Cache or Constant Cache Data Port.
+       *
+       * See the Sandybridge PRM, Volume 4 Part 1, Section 3.9.2.1.1.
+       **/
+      struct {
+	 unsigned binding_table_index:8;
+	 unsigned msg_control:5;
+	 unsigned msg_type:3;
+	 unsigned pad0:3;
+	 unsigned header_present:1;
+	 unsigned response_length:5;
+	 unsigned msg_length:4;
+	 unsigned pad1:2;
+	 unsigned end_of_thread:1;
+      } gen6_dp_sampler_const_cache;
+
+      /**
+       * Message for the Sandybridge Render Cache Data Port.
+       *
+       * Most fields are defined in the Sandybridge PRM, Volume 4 Part 1,
+       * Section 3.9.2.1.1: Message Descriptor.
+       *
+       * "Slot Group Select" and "Last Render Target" are part of the
+       * 5-bit message control for Render Target Write messages.  See
+       * Section 3.9.9.2.1 of the same volume.
+       */
+      struct {
+	 unsigned binding_table_index:8;
+	 unsigned msg_control:3;
+	 unsigned slot_group_select:1;
+	 unsigned last_render_target:1;
+	 unsigned msg_type:4;
+	 unsigned send_commit_msg:1;
+	 unsigned pad0:1;
+	 unsigned header_present:1;
+	 unsigned response_length:5;
+	 unsigned msg_length:4;
+	 unsigned pad1:2;
+	 unsigned end_of_thread:1;
+      } gen6_dp;
+
+      /**
+       * Message for any of the Gen7 Data Port caches.
+       *
+       * Most fields are defined in the Ivybridge PRM, Volume 4 Part 1,
+       * section 3.9.2.1.1 "Message Descriptor".  Once again, "Slot Group
+       * Select" and "Last Render Target" are part of the 6-bit message
+       * control for Render Target Writes (section 3.9.11.2).
+       */
+      struct {
+	 unsigned binding_table_index:8;
+	 unsigned msg_control:3;
+	 unsigned slot_group_select:1;
+	 unsigned last_render_target:1;
+	 unsigned msg_control_pad:1;
+	 unsigned msg_type:4;
+	 unsigned pad1:1;
+	 unsigned header_present:1;
+	 unsigned response_length:5;
+	 unsigned msg_length:4;
+	 unsigned pad2:2;
+	 unsigned end_of_thread:1;
+      } gen7_dp;
+
+      /**
+       * Message for the Gen7 Pixel Interpolator.
+       *
+       * Defined in the Ivybridge PRM, Volume 4 Part 2,
+       * section 4.1.1.1.
+       */
+      struct {
+         GLuint msg_data:8;
+         GLuint pad1:3;
+         GLuint slot_group:1;
+         GLuint msg_type:2;
+         GLuint interpolation_mode:1;
+         GLuint pad2:1;
+         GLuint simd_mode:1;
+         GLuint pad3:1;
+         GLuint response_length:5;
+         GLuint msg_length:4;
+         GLuint pad4:2;
+         GLuint end_of_thread:1;
+      } gen7_pi;
+      /** @} */
+
+      struct {
+	 unsigned src1_subreg_nr_high:1;
+	 unsigned src1_reg_nr:8;
+	 unsigned pad0:1;
+	 unsigned src2_rep_ctrl:1;
+	 unsigned src2_swizzle:8;
+	 unsigned src2_subreg_nr:3;
+	 unsigned src2_reg_nr:8;
+	 unsigned pad1:2;
+      } da3src;
+
+      int d;
+      unsigned ud;
+      float f;
+   } bits3;
+};
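+
+/* Illustrative note: brw_instruction maps one 128-bit native instruction as
+ * four 32-bit words -- the header plus the bits1/bits2/bits3 unions, whose
+ * active member depends on the access mode (da1/ia1/da16/ia16) or, for SEND,
+ * on the message descriptor layout for the target generation.
+ */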
+
+struct brw_compact_instruction {
+   struct {
+      unsigned opcode:7;          /*  0- 6 */
+      unsigned debug_control:1;   /*  7- 7 */
+      unsigned control_index:5;   /*  8-12 */
+      unsigned data_type_index:5; /* 13-17 */
+      unsigned sub_reg_index:5;   /* 18-22 */
+      unsigned acc_wr_control:1;  /* 23-23 */
+      unsigned conditionalmod:4;  /* 24-27 */
+      unsigned flag_subreg_nr:1;     /* 28-28 */
+      unsigned cmpt_ctrl:1;       /* 29-29 */
+      unsigned src0_index:2;      /* 30-31 */
+   } dw0;
+
+   struct {
+      unsigned src0_index:3;  /* 32-34 */
+      unsigned src1_index:5;  /* 35-39 */
+      unsigned dst_reg_nr:8;  /* 40-47 */
+      unsigned src0_reg_nr:8; /* 48-55 */
+      unsigned src1_reg_nr:8; /* 56-63 */
+   } dw1;
+};
+
+#endif
diff --git a/icd/intel/compiler/pipeline/brw_vec4.cpp b/icd/intel/compiler/pipeline/brw_vec4.cpp
new file mode 100644
index 0000000..f5d78bd
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4.cpp
@@ -0,0 +1,1902 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_vec4.h"
+#include "brw_cfg.h"
+#include "brw_vs.h"
+#include "brw_dead_control_flow.h"
+#include "glsl/glsl_parser_extras.h"
+
+extern "C" {
+#include "main/macros.h"
+#include "main/shaderobj.h"
+#include "program/prog_print.h"
+#include "program/prog_parameter.h"
+}
+
+#define MAX_INSTRUCTION (1 << 30)
+
+using namespace brw;
+
+namespace brw {
+
+/**
+ * Common helper for constructing swizzles.  When only a subset of
+ * channels of a vec4 are used, we don't want to reference the other
+ * channels, as that will tell optimization passes that those other
+ * channels are used.
+ */
+unsigned
+swizzle_for_size(int size)
+{
+   static const unsigned size_swizzles[4] = {
+      BRW_SWIZZLE4(SWIZZLE_X, SWIZZLE_X, SWIZZLE_X, SWIZZLE_X),
+      BRW_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Y, SWIZZLE_Y),
+      BRW_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_Z),
+      BRW_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W),
+   };
+
+   assert((size >= 1) && (size <= 4));
+   return size_swizzles[size - 1];
+}
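+
+/* Illustrative note: swizzle_for_size(2) yields BRW_SWIZZLE4(X, Y, Y, Y), so
+ * a vec2 source repeats .y into the unused channels instead of referencing
+ * .z or .w, keeping analysis passes from seeing phantom channel reads.
+ */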
+
+void
+src_reg::init()
+{
+   memset(this, 0, sizeof(*this));
+
+   this->file = BAD_FILE;
+}
+
+src_reg::src_reg(register_file file, int reg, const glsl_type *type)
+{
+   init();
+
+   this->file = file;
+   this->reg = reg;
+   if (type && (type->is_scalar() || type->is_vector() || type->is_matrix()))
+      this->swizzle = swizzle_for_size(type->vector_elements);
+   else
+      this->swizzle = BRW_SWIZZLE_XYZW;
+}
+
+/** Generic unset register constructor. */
+src_reg::src_reg()
+{
+   init();
+}
+
+src_reg::src_reg(float f)
+{
+   init();
+
+   this->file = IMM;
+   this->type = BRW_REGISTER_TYPE_F;
+   this->imm.f = f;
+}
+
+src_reg::src_reg(uint32_t u)
+{
+   init();
+
+   this->file = IMM;
+   this->type = BRW_REGISTER_TYPE_UD;
+   this->imm.u = u;
+}
+
+src_reg::src_reg(int32_t i)
+{
+   init();
+
+   this->file = IMM;
+   this->type = BRW_REGISTER_TYPE_D;
+   this->imm.i = i;
+}
+
+src_reg::src_reg(struct brw_reg reg)
+{
+   init();
+
+   this->file = HW_REG;
+   this->fixed_hw_reg = reg;
+   this->type = reg.type;
+}
+
+src_reg::src_reg(dst_reg reg)
+{
+   init();
+
+   this->file = reg.file;
+   this->reg = reg.reg;
+   this->reg_offset = reg.reg_offset;
+   this->type = reg.type;
+   this->reladdr = reg.reladdr;
+   this->fixed_hw_reg = reg.fixed_hw_reg;
+
+   int swizzles[4];
+   int next_chan = 0;
+   int last = 0;
+
+   for (int i = 0; i < 4; i++) {
+      if (!(reg.writemask & (1 << i)))
+         continue;
+
+      swizzles[next_chan++] = last = i;
+   }
+
+   for (; next_chan < 4; next_chan++) {
+      swizzles[next_chan] = last;
+   }
+
+   this->swizzle = BRW_SWIZZLE4(swizzles[0], swizzles[1],
+                                swizzles[2], swizzles[3]);
+}
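+
+/* Illustrative note: the loop above converts a writemask to a swizzle by
+ * listing the written channels in order and padding with the last one, so a
+ * dst writemask of .xz becomes swizzle XZZZ and .y becomes YYYY.
+ */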
+
+bool
+src_reg::is_accumulator() const
+{
+   return file == HW_REG &&
+          fixed_hw_reg.file == BRW_ARCHITECTURE_REGISTER_FILE &&
+          fixed_hw_reg.nr == BRW_ARF_ACCUMULATOR;
+}
+
+
+void
+dst_reg::init()
+{
+   memset(this, 0, sizeof(*this));
+   this->file = BAD_FILE;
+   this->writemask = WRITEMASK_XYZW;
+}
+
+dst_reg::dst_reg()
+{
+   init();
+}
+
+dst_reg::dst_reg(register_file file, int reg)
+{
+   init();
+
+   this->file = file;
+   this->reg = reg;
+}
+
+dst_reg::dst_reg(register_file file, int reg, const glsl_type *type,
+                 int writemask)
+{
+   init();
+
+   this->file = file;
+   this->reg = reg;
+   this->type = brw_type_for_base_type(type);
+   this->writemask = writemask;
+}
+
+dst_reg::dst_reg(struct brw_reg reg)
+{
+   init();
+
+   this->file = HW_REG;
+   this->fixed_hw_reg = reg;
+   this->type = reg.type;
+}
+
+dst_reg::dst_reg(src_reg reg)
+{
+   init();
+
+   this->file = reg.file;
+   this->reg = reg.reg;
+   this->reg_offset = reg.reg_offset;
+   this->type = reg.type;
+   /* How should we do writemasking when converting from a src_reg?  It seems
+    * pretty obvious that for src.xxxx the caller wants to write to src.x, but
+    * what about for src.wx?  Just special-case src.xxxx for now.
+    */
+   if (reg.swizzle == BRW_SWIZZLE_XXXX)
+      this->writemask = WRITEMASK_X;
+   else
+      this->writemask = WRITEMASK_XYZW;
+   this->reladdr = reg.reladdr;
+   this->fixed_hw_reg = reg.fixed_hw_reg;
+}
+
+bool
+dst_reg::is_null() const
+{
+   return file == HW_REG &&
+          fixed_hw_reg.file == BRW_ARCHITECTURE_REGISTER_FILE &&
+          fixed_hw_reg.nr == BRW_ARF_NULL;
+}
+
+bool
+dst_reg::is_accumulator() const
+{
+   return file == HW_REG &&
+          fixed_hw_reg.file == BRW_ARCHITECTURE_REGISTER_FILE &&
+          fixed_hw_reg.nr == BRW_ARF_ACCUMULATOR;
+}
+
+bool
+vec4_instruction::is_send_from_grf()
+{
+   switch (opcode) {
+   // LunarG : TODO - shader time??
+   //case SHADER_OPCODE_SHADER_TIME_ADD:
+   case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7:
+      return true;
+   default:
+      return false;
+   }
+}
+
+bool
+vec4_visitor::can_do_source_mods(vec4_instruction *inst)
+{
+   if (brw->gen == 6 && inst->is_math())
+      return false;
+
+   if (inst->is_send_from_grf())
+      return false;
+
+   if (!inst->can_do_source_mods())
+      return false;
+
+   return true;
+}
+
+/**
+ * Returns how many MRFs an opcode will write over.
+ *
+ * Note that this does not count the 0 or 1 implied writes of the actual gen
+ * instruction -- the generate_* functions emit additional MOVs for message
+ * setup.
+ */
+int
+vec4_visitor::implied_mrf_writes(vec4_instruction *inst)
+{
+   if (inst->mlen == 0)
+      return 0;
+
+   switch (inst->opcode) {
+   case SHADER_OPCODE_RCP:
+   case SHADER_OPCODE_RSQ:
+   case SHADER_OPCODE_SQRT:
+   case SHADER_OPCODE_EXP2:
+   case SHADER_OPCODE_LOG2:
+   case SHADER_OPCODE_SIN:
+   case SHADER_OPCODE_COS:
+      return 1;
+   case SHADER_OPCODE_INT_QUOTIENT:
+   case SHADER_OPCODE_INT_REMAINDER:
+   case SHADER_OPCODE_POW:
+      return 2;
+   case VS_OPCODE_URB_WRITE:
+      return 1;
+   case VS_OPCODE_PULL_CONSTANT_LOAD:
+      return 2;
+   case SHADER_OPCODE_GEN4_SCRATCH_READ:
+      return 2;
+   case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
+      return 3;
+   case GS_OPCODE_URB_WRITE:
+   case GS_OPCODE_THREAD_END:
+      return 0;
+   // LunarG : TODO - shader time??
+//   case SHADER_OPCODE_SHADER_TIME_ADD:
+//      return 0;
+   case SHADER_OPCODE_TEX:
+   case SHADER_OPCODE_TXL:
+   case SHADER_OPCODE_TXD:
+   case SHADER_OPCODE_TXF:
+   case SHADER_OPCODE_TXF_CMS:
+   case SHADER_OPCODE_TXF_MCS:
+   case SHADER_OPCODE_TXS:
+   case SHADER_OPCODE_TG4:
+   case SHADER_OPCODE_TG4_OFFSET:
+      return inst->header_present ? 1 : 0;
+   case SHADER_OPCODE_UNTYPED_ATOMIC:
+   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+      return 0;
+   default:
+      assert(!"not reached");
+      return inst->mlen;
+   }
+}
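+
+/* Illustrative note: for example, SHADER_OPCODE_POW consumes two MRFs for its
+ * operand setup, while a sampler message such as SHADER_OPCODE_TEX uses one
+ * MRF only when a message header is present.
+ */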
+
+bool
+src_reg::equals(src_reg *r)
+{
+   return (file == r->file &&
+	   reg == r->reg &&
+	   reg_offset == r->reg_offset &&
+	   type == r->type &&
+	   negate == r->negate &&
+	   abs == r->abs &&
+	   swizzle == r->swizzle &&
+	   !reladdr && !r->reladdr &&
+	   memcmp(&fixed_hw_reg, &r->fixed_hw_reg,
+		  sizeof(fixed_hw_reg)) == 0 &&
+	   imm.u == r->imm.u);
+}
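+
+/* Illustrative note: equals() is deliberately conservative -- any register
+ * using indirect addressing (a non-NULL reladdr) compares unequal to
+ * everything, since the actual register referenced isn't known statically.
+ */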
+
+static bool
+try_eliminate_instruction(vec4_instruction *inst, int new_writemask,
+                          const struct brw_context *brw)
+{
+   if (inst->has_side_effects())
+      return false;
+
+   if (new_writemask == 0) {
+      /* Don't dead code eliminate instructions that write to the
+       * accumulator as a side-effect. Instead just set the destination
+       * to the null register to free it.
+       */
+      if (inst->writes_accumulator || inst->writes_flag()) {
+         inst->dst = dst_reg(retype(brw_null_reg(), inst->dst.type));
+      } else {
+         inst->remove();
+      }
+
+      return true;
+   } else if (inst->dst.writemask != new_writemask) {
+      switch (inst->opcode) {
+      case SHADER_OPCODE_TXF_CMS:
+      case SHADER_OPCODE_GEN4_SCRATCH_READ:
+      case VS_OPCODE_PULL_CONSTANT_LOAD:
+      case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7:
+         break;
+      default:
+         /* Do not set a writemask on Gen6 for math instructions, those are
+          * executed using align1 mode that does not support a destination mask.
+          */
+         if (!(brw->gen == 6 && inst->is_math()) && !inst->is_tex()) {
+            inst->dst.writemask = new_writemask;
+            return true;
+         }
+      }
+   }
+
+   return false;
+}
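+
+/* Illustrative note: when no channel of an instruction's destination is live
+ * (new_writemask == 0) but it writes the accumulator or flag as a
+ * side-effect, the instruction is kept with its destination retargeted to
+ * the null register rather than being removed outright.
+ */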
+
+/**
+ * Must be called after calculate_live_intervals() to remove unused
+ * writes to registers -- register allocation will fail otherwise
+ * because something that is written (def'd) but never used won't be
+ * considered to interfere with other regs.
+ */
+bool
+vec4_visitor::dead_code_eliminate()
+{
+   bool progress = false;
+   int pc = -1;
+
+   calculate_live_intervals();
+
+   foreach_list_safe(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      pc++;
+
+      bool inst_writes_flag = false;
+      if (inst->dst.file != GRF) {
+         if (inst->dst.is_null() && inst->writes_flag()) {
+            inst_writes_flag = true;
+         } else {
+            continue;
+         }
+      }
+
+      if (inst->dst.file == GRF) {
+         int write_mask = inst->dst.writemask;
+
+         for (int c = 0; c < 4; c++) {
+            if (write_mask & (1 << c)) {
+               assert(this->virtual_grf_end[inst->dst.reg * 4 + c] >= pc);
+               if (this->virtual_grf_end[inst->dst.reg * 4 + c] == pc) {
+                  write_mask &= ~(1 << c);
+               }
+            }
+         }
+
+         progress = try_eliminate_instruction(inst, write_mask, brw) ||
+                    progress;
+      }
+
+      if (inst->predicate || inst->prev == NULL)
+         continue;
+
+      int dead_channels;
+      if (inst_writes_flag) {
+/* Arbitrarily chosen, other than not being an xyzw writemask. */
+#define FLAG_WRITEMASK (1 << 5)
+         dead_channels = inst->reads_flag() ? 0 : FLAG_WRITEMASK;
+      } else {
+         dead_channels = inst->dst.writemask;
+
+         for (int i = 0; i < 3; i++) {
+            if (inst->src[i].file != GRF ||
+                inst->src[i].reg != inst->dst.reg)
+                  continue;
+
+            for (int j = 0; j < 4; j++) {
+               int swiz = BRW_GET_SWZ(inst->src[i].swizzle, j);
+               dead_channels &= ~(1 << swiz);
+            }
+         }
+      }
+
+      for (exec_node *node = inst->prev, *prev = node->prev;
+           prev != NULL && dead_channels != 0;
+           node = prev, prev = prev->prev) {
+         vec4_instruction *scan_inst = (vec4_instruction *)node;
+
+         if (scan_inst->is_control_flow())
+            break;
+
+         if (inst_writes_flag) {
+            if (scan_inst->dst.is_null() && scan_inst->writes_flag()) {
+               scan_inst->remove();
+               progress = true;
+               continue;
+            } else if (scan_inst->reads_flag()) {
+               break;
+            }
+         }
+
+         if (inst->dst.file == scan_inst->dst.file &&
+             inst->dst.reg == scan_inst->dst.reg &&
+             inst->dst.reg_offset == scan_inst->dst.reg_offset) {
+            int new_writemask = scan_inst->dst.writemask & ~dead_channels;
+
+            progress = try_eliminate_instruction(scan_inst, new_writemask, brw) ||
+                       progress;
+         }
+
+         for (int i = 0; i < 3; i++) {
+            if (scan_inst->src[i].file != inst->dst.file ||
+                scan_inst->src[i].reg != inst->dst.reg)
+               continue;
+
+            for (int j = 0; j < 4; j++) {
+               int swiz = BRW_GET_SWZ(scan_inst->src[i].swizzle, j);
+               dead_channels &= ~(1 << swiz);
+            }
+         }
+      }
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
+
+void
+vec4_visitor::split_uniform_registers()
+{
+   /* Prior to this, uniforms have been in an array sized according to
+    * the number of vector uniforms present, sparsely filled (so an
+    * aggregate results in reg indices being skipped over).  Now we're
+    * going to cut those aggregates up so each .reg index is one
+    * vector.  The goal is to make elimination of unused uniform
+    * components easier later.
+    */
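+   /* Illustrative (hypothetical indices): a mat2 uniform at reg 2 occupying
+    * reg_offsets 0..1 becomes two vec4 uniforms at regs 2 and 3, each with
+    * reg_offset 0, by folding the offset into the reg index below.
+    */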
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      for (int i = 0 ; i < 3; i++) {
+	 if (inst->src[i].file != UNIFORM)
+	    continue;
+
+	 assert(!inst->src[i].reladdr);
+
+	 inst->src[i].reg += inst->src[i].reg_offset;
+	 inst->src[i].reg_offset = 0;
+      }
+   }
+
+   /* Update the bookkeeping: every uniform is now vector-sized. */
+   for (int i = 0; i < this->uniforms; i++) {
+      this->uniform_size[i] = 1;
+   }
+}
+
+void
+vec4_visitor::pack_uniform_registers()
+{
+   bool uniform_used[this->uniforms];
+   int new_loc[this->uniforms];
+   int new_chan[this->uniforms];
+
+   memset(uniform_used, 0, sizeof(uniform_used));
+   memset(new_loc, 0, sizeof(new_loc));
+   memset(new_chan, 0, sizeof(new_chan));
+
+   /* Find which uniform vectors are actually used by the program.  We
+    * expect unused vector elements when we've moved array access out
+    * to pull constants, and from some GLSL code generators like Wine.
+    */
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      for (int i = 0 ; i < 3; i++) {
+	 if (inst->src[i].file != UNIFORM)
+	    continue;
+
+	 uniform_used[inst->src[i].reg] = true;
+      }
+   }
+
+   int new_uniform_count = 0;
+
+   /* Now, figure out a packing of the live uniform vectors into our
+    * push constants.
+    */
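+   /* e.g. (hypothetical sizes): if dst=1 currently holds two components and
+    * src=5 is a live vec2, it fits (2 + 2 <= 4), so new_loc[5] = 1,
+    * new_chan[5] = 2, and its param pointers are copied into slots 6..7.
+    */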
+   for (int src = 0; src < uniforms; src++) {
+      assert(src < uniform_array_size);
+      int size = this->uniform_vector_size[src];
+
+      if (!uniform_used[src]) {
+	 this->uniform_vector_size[src] = 0;
+	 continue;
+      }
+
+      int dst;
+      /* Find the lowest place we can slot this uniform in. */
+      for (dst = 0; dst < src; dst++) {
+	 if (this->uniform_vector_size[dst] + size <= 4)
+	    break;
+      }
+
+      if (src == dst) {
+	 new_loc[src] = dst;
+	 new_chan[src] = 0;
+      } else {
+	 new_loc[src] = dst;
+	 new_chan[src] = this->uniform_vector_size[dst];
+
+	 /* Move the references to the data */
+	 for (int j = 0; j < size; j++) {
+	    stage_prog_data->param[dst * 4 + new_chan[src] + j] =
+	       stage_prog_data->param[src * 4 + j];
+	 }
+
+	 this->uniform_vector_size[dst] += size;
+	 this->uniform_vector_size[src] = 0;
+      }
+
+      new_uniform_count = MAX2(new_uniform_count, dst + 1);
+   }
+
+   this->uniforms = new_uniform_count;
+
+   /* Now, update the instructions for our repacked uniforms. */
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      for (int i = 0 ; i < 3; i++) {
+	 int src = inst->src[i].reg;
+
+	 if (inst->src[i].file != UNIFORM)
+	    continue;
+
+	 inst->src[i].reg = new_loc[src];
+
+	 int sx = BRW_GET_SWZ(inst->src[i].swizzle, 0) + new_chan[src];
+	 int sy = BRW_GET_SWZ(inst->src[i].swizzle, 1) + new_chan[src];
+	 int sz = BRW_GET_SWZ(inst->src[i].swizzle, 2) + new_chan[src];
+	 int sw = BRW_GET_SWZ(inst->src[i].swizzle, 3) + new_chan[src];
+	 inst->src[i].swizzle = BRW_SWIZZLE4(sx, sy, sz, sw);
+      }
+   }
+}
+
+bool
+src_reg::is_zero() const
+{
+   if (file != IMM)
+      return false;
+
+   if (type == BRW_REGISTER_TYPE_F) {
+      return imm.f == 0.0;
+   } else {
+      return imm.i == 0;
+   }
+}
+
+bool
+src_reg::is_one() const
+{
+   if (file != IMM)
+      return false;
+
+   if (type == BRW_REGISTER_TYPE_F) {
+      return imm.f == 1.0;
+   } else {
+      return imm.i == 1;
+   }
+}
+
+/**
+ * Does algebraic optimizations (0 * a = 0, 1 * a = a, a + 0 = a).
+ *
+ * While GLSL IR also performs this optimization, we end up with it in
+ * our instruction stream for a couple of reasons.  One is that we
+ * sometimes generate silly instructions, for example in array access
+ * where we'll generate "ADD offset, index, base" even if base is 0.
+ * The other is that GLSL IR's constant propagation doesn't track the
+ * components of aggregates, so some VS patterns (initialize matrix to
+ * 0, accumulate in vertex blending factors) end up breaking down to
+ * instructions involving 0.
+ */
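+/*
+ * The rewrites below, in brief:
+ *
+ *    ADD dst, src0, 0  ->  MOV dst, src0
+ *    MUL dst, src0, 1  ->  MOV dst, src0
+ *    MUL dst, src0, 0  ->  MOV dst, 0   (zero typed to match src0)
+ */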
+bool
+vec4_visitor::opt_algebraic()
+{
+   bool progress = false;
+
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      switch (inst->opcode) {
+      case BRW_OPCODE_ADD:
+	 if (inst->src[1].is_zero()) {
+	    inst->opcode = BRW_OPCODE_MOV;
+	    inst->src[1] = src_reg();
+	    progress = true;
+	 }
+	 break;
+
+      case BRW_OPCODE_MUL:
+	 if (inst->src[1].is_zero()) {
+	    inst->opcode = BRW_OPCODE_MOV;
+	    switch (inst->src[0].type) {
+	    case BRW_REGISTER_TYPE_F:
+	       inst->src[0] = src_reg(0.0f);
+	       break;
+	    case BRW_REGISTER_TYPE_D:
+	       inst->src[0] = src_reg(0);
+	       break;
+	    case BRW_REGISTER_TYPE_UD:
+	       inst->src[0] = src_reg(0u);
+	       break;
+	    default:
+	       assert(!"not reached");
+	       inst->src[0] = src_reg(0.0f);
+	       break;
+	    }
+	    inst->src[1] = src_reg();
+	    progress = true;
+	 } else if (inst->src[1].is_one()) {
+	    inst->opcode = BRW_OPCODE_MOV;
+	    inst->src[1] = src_reg();
+	    progress = true;
+	 }
+	 break;
+      default:
+	 break;
+      }
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
+
+/**
+ * Only a limited number of hardware registers may be used for push
+ * constants, so this turns access to the overflowed constants into
+ * pull constants.
+ */
+void
+vec4_visitor::move_push_constants_to_pull_constants()
+{
+   int pull_constant_loc[this->uniforms];
+
+   /* Only allow 32 registers (256 uniform components) as push constants,
+    * which is the limit on gen6.
+    */
+   // LunarG : TODO - turning off push constants for bring up
+   int max_uniform_components = 0;  // 32 * 8
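+   /* With the bring-up value of 0 above, the early-out below is only taken
+    * when there are no uniforms at all, so every uniform vec4 is demoted to
+    * a pull constant load.
+    */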
+   if (this->uniforms * 4 <= max_uniform_components)
+      return;
+
+   /* Make some sort of choice as to which uniforms get sent to pull
+    * constants.  We could potentially do something clever here like
+    * look for the most infrequently used uniform vec4s, but leave
+    * that for later.
+    */
+   for (int i = 0; i < this->uniforms * 4; i += 4) {
+      pull_constant_loc[i / 4] = -1;
+
+      if (i >= max_uniform_components) {
+	 const float **values = &stage_prog_data->param[i];
+
+	 /* Try to find an existing copy of this uniform in the pull
+	  * constants if it was part of an array access already.
+	  */
+	 for (unsigned int j = 0; j < stage_prog_data->nr_pull_params; j += 4) {
+	    int matches;
+
+	    for (matches = 0; matches < 4; matches++) {
+	       if (stage_prog_data->pull_param[j + matches] != values[matches])
+		  break;
+	    }
+
+	    if (matches == 4) {
+	       pull_constant_loc[i / 4] = j / 4;
+	       break;
+	    }
+	 }
+
+	 if (pull_constant_loc[i / 4] == -1) {
+	    assert(stage_prog_data->nr_pull_params % 4 == 0);
+	    pull_constant_loc[i / 4] = stage_prog_data->nr_pull_params / 4;
+
+	    for (int j = 0; j < 4; j++) {
+	       stage_prog_data->pull_param[stage_prog_data->nr_pull_params++] =
+                  values[j];
+	    }
+	 }
+      }
+   }
+
+   /* Now actually rewrite usage of the things we've moved to pull
+    * constants.
+    */
+   foreach_list_safe(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      for (int i = 0 ; i < 3; i++) {
+	 if (inst->src[i].file != UNIFORM ||
+	     pull_constant_loc[inst->src[i].reg] == -1)
+	    continue;
+
+	 int uniform = inst->src[i].reg;
+
+	 dst_reg temp = dst_reg(this, glsl_type::vec4_type);
+
+	 emit_pull_constant_load(inst, temp, inst->src[i],
+				 pull_constant_loc[uniform]);
+
+	 inst->src[i].file = temp.file;
+	 inst->src[i].reg = temp.reg;
+	 inst->src[i].reg_offset = temp.reg_offset;
+	 inst->src[i].reladdr = NULL;
+      }
+   }
+
+   /* Repack push constants to remove the now-unused ones. */
+   pack_uniform_registers();
+}
+
+/**
+ * Sets the dependency control fields on instructions after register
+ * allocation and before the generator is run.
+ *
+ * When you have a sequence of instructions like:
+ *
+ * DP4 temp.x vertex uniform[0]
+ * DP4 temp.y vertex uniform[0]
+ * DP4 temp.z vertex uniform[0]
+ * DP4 temp.w vertex uniform[0]
+ *
+ * The hardware doesn't know that it can actually run the later instructions
+ * while the previous ones are in flight, producing stalls.  However, we have
+ * manual fields we can set in the instructions that let it do so.
+ */
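+/*
+ * For the DP4 sequence above, the walk below marks the first three writes
+ * with no_dd_clear and the last three with no_dd_check, since each
+ * instruction writes a channel disjoint from those already written.
+ */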
+void
+vec4_visitor::opt_set_dependency_control()
+{
+   vec4_instruction *last_grf_write[BRW_MAX_GRF];
+   uint8_t grf_channels_written[BRW_MAX_GRF];
+   vec4_instruction *last_mrf_write[BRW_MAX_GRF];
+   uint8_t mrf_channels_written[BRW_MAX_GRF];
+
+   cfg_t cfg(&instructions);
+
+   assert(prog_data->total_grf ||
+          !"Must be called after register allocation");
+
+   for (int i = 0; i < cfg.num_blocks; i++) {
+      bblock_t *bblock = cfg.blocks[i];
+      vec4_instruction *inst;
+
+      memset(last_grf_write, 0, sizeof(last_grf_write));
+      memset(last_mrf_write, 0, sizeof(last_mrf_write));
+
+      for (inst = (vec4_instruction *)bblock->start;
+           inst != (vec4_instruction *)bblock->end->next;
+           inst = (vec4_instruction *)inst->next) {
+         /* If we read from a register that we were doing dependency control
+          * on, don't do dependency control across the read.
+          */
+         for (int i = 0; i < 3; i++) {
+            int reg = inst->src[i].reg + inst->src[i].reg_offset;
+            if (inst->src[i].file == GRF) {
+               last_grf_write[reg] = NULL;
+            } else if (inst->src[i].file == HW_REG) {
+               memset(last_grf_write, 0, sizeof(last_grf_write));
+               break;
+            }
+            assert(inst->src[i].file != MRF);
+         }
+
+         /* In the presence of send messages, totally interrupt dependency
+          * control.  They're long enough that the chance of dependency
+          * control around them just doesn't matter.
+          */
+         if (inst->mlen) {
+            memset(last_grf_write, 0, sizeof(last_grf_write));
+            memset(last_mrf_write, 0, sizeof(last_mrf_write));
+            continue;
+         }
+
+         /* It looks like setting dependency control on a predicated
+          * instruction hangs the GPU.
+          */
+         if (inst->predicate) {
+            memset(last_grf_write, 0, sizeof(last_grf_write));
+            memset(last_mrf_write, 0, sizeof(last_mrf_write));
+            continue;
+         }
+
+         /* Dependency control does not work well over math instructions.
+          */
+         if (inst->is_math()) {
+            memset(last_grf_write, 0, sizeof(last_grf_write));
+            memset(last_mrf_write, 0, sizeof(last_mrf_write));
+            continue;
+         }
+
+         /* Now, see if we can do dependency control for this instruction
+          * against a previous one writing to its destination.
+          */
+         int reg = inst->dst.reg + inst->dst.reg_offset;
+         if (inst->dst.file == GRF) {
+            if (last_grf_write[reg] &&
+                !(inst->dst.writemask & grf_channels_written[reg])) {
+               last_grf_write[reg]->no_dd_clear = true;
+               inst->no_dd_check = true;
+            } else {
+               grf_channels_written[reg] = 0;
+            }
+
+            last_grf_write[reg] = inst;
+            grf_channels_written[reg] |= inst->dst.writemask;
+         } else if (inst->dst.file == MRF) {
+            if (last_mrf_write[reg] &&
+                !(inst->dst.writemask & mrf_channels_written[reg])) {
+               last_mrf_write[reg]->no_dd_clear = true;
+               inst->no_dd_check = true;
+            } else {
+               mrf_channels_written[reg] = 0;
+            }
+
+            last_mrf_write[reg] = inst;
+            mrf_channels_written[reg] |= inst->dst.writemask;
+         } else if (inst->dst.file == HW_REG) {
+            if (inst->dst.fixed_hw_reg.file == BRW_GENERAL_REGISTER_FILE)
+               memset(last_grf_write, 0, sizeof(last_grf_write));
+            if (inst->dst.fixed_hw_reg.file == BRW_MESSAGE_REGISTER_FILE)
+               memset(last_mrf_write, 0, sizeof(last_mrf_write));
+         }
+      }
+   }
+}
+
+bool
+vec4_instruction::can_reswizzle_dst(int dst_writemask,
+                                    int swizzle,
+                                    int swizzle_mask)
+{
+   /* If this instruction sets anything not referenced by swizzle, then we'd
+    * totally break it when we reswizzle.
+    */
+   if (dst.writemask & ~swizzle_mask)
+      return false;
+
+   switch (opcode) {
+   case BRW_OPCODE_DP4:
+   case BRW_OPCODE_DP3:
+   case BRW_OPCODE_DP2:
+      return true;
+   default:
+      /* Check if there happens to be no reswizzling required. */
+      for (int c = 0; c < 4; c++) {
+         int bit = 1 << BRW_GET_SWZ(swizzle, c);
+         /* Skip components of the swizzle not used by the dst. */
+         if (!(dst_writemask & (1 << c)))
+            continue;
+
+         /* We don't do the reswizzling yet, so just sanity check that we
+          * don't have to.
+          */
+         if (bit != (1 << c))
+            return false;
+      }
+      return true;
+   }
+}
+
+/**
+ * For any channels in the swizzle's source that were populated by this
+ * instruction, rewrite the instruction to put the appropriate result directly
+ * in those channels.
+ *
+ * e.g. for swizzle=yywx, MUL a.xy b c -> MUL a.yy_x b c
+ */
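+/*
+ * For a dot product this is safe because every enabled channel receives
+ * the same scalar; e.g. (illustrative) a DP4 writing a.x followed by
+ * MOV b.xy, a.xxxx coalesces into a DP4 writing b.xy, with no source
+ * changes.
+ */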
+void
+vec4_instruction::reswizzle_dst(int dst_writemask, int swizzle)
+{
+   int new_writemask = 0;
+
+   switch (opcode) {
+   case BRW_OPCODE_DP4:
+   case BRW_OPCODE_DP3:
+   case BRW_OPCODE_DP2:
+      for (int c = 0; c < 4; c++) {
+         int bit = 1 << BRW_GET_SWZ(swizzle, c);
+         /* Skip components of the swizzle not used by the dst. */
+         if (!(dst_writemask & (1 << c)))
+            continue;
+         /* If we were populating this component, then populate the
+          * corresponding channel of the new dst.
+          */
+         if (dst.writemask & bit)
+            new_writemask |= (1 << c);
+      }
+      dst.writemask = new_writemask;
+      break;
+   default:
+      for (int c = 0; c < 4; c++) {
+         /* Skip components of the swizzle not used by the dst. */
+         if (!(dst_writemask & (1 << c)))
+            continue;
+
+         /* We don't do the reswizzling yet, so just sanity check that we
+          * don't have to.
+          */
+         assert((1 << BRW_GET_SWZ(swizzle, c)) == (1 << c));
+      }
+      break;
+   }
+}
+
+/*
+ * Tries to reduce extra MOV instructions by finding temporary GRFs that are
+ * written and then immediately MOVed into another reg, and rewriting the
+ * original write of the GRF to target the final destination directly.
+ */
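+/*
+ * Illustrative pattern (hypothetical registers):
+ *
+ *    ADD vgrf5.xyzw, vgrf1, vgrf2
+ *    MOV m4.xyzw,    vgrf5.xyzw
+ *
+ * becomes ADD m4.xyzw, vgrf1, vgrf2 when vgrf5 is not read again and
+ * nothing between the two instructions interferes.
+ */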
+bool
+vec4_visitor::opt_register_coalesce()
+{
+   bool progress = false;
+   int next_ip = 0;
+
+   calculate_live_intervals();
+
+   foreach_list_safe(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      int ip = next_ip;
+      next_ip++;
+
+      if (inst->opcode != BRW_OPCODE_MOV ||
+          (inst->dst.file != GRF && inst->dst.file != MRF) ||
+	  inst->predicate ||
+	  inst->src[0].file != GRF ||
+	  inst->dst.type != inst->src[0].type ||
+	  inst->src[0].abs || inst->src[0].negate || inst->src[0].reladdr)
+	 continue;
+
+      bool to_mrf = (inst->dst.file == MRF);
+
+      /* Can't coalesce this GRF if someone else was going to
+       * read it later.
+       */
+      if (this->virtual_grf_end[inst->src[0].reg * 4 + 0] > ip ||
+          this->virtual_grf_end[inst->src[0].reg * 4 + 1] > ip ||
+          this->virtual_grf_end[inst->src[0].reg * 4 + 2] > ip ||
+          this->virtual_grf_end[inst->src[0].reg * 4 + 3] > ip)
+	 continue;
+
+      /* We need to check interference with the final destination between this
+       * instruction and the earliest instruction involved in writing the GRF
+       * we're eliminating.  To do that, keep track of which of our source
+       * channels we've seen initialized.
+       */
+      bool chans_needed[4] = {false, false, false, false};
+      int chans_remaining = 0;
+      int swizzle_mask = 0;
+      for (int i = 0; i < 4; i++) {
+	 int chan = BRW_GET_SWZ(inst->src[0].swizzle, i);
+
+	 if (!(inst->dst.writemask & (1 << i)))
+	    continue;
+
+         swizzle_mask |= (1 << chan);
+
+	 if (!chans_needed[chan]) {
+	    chans_needed[chan] = true;
+	    chans_remaining++;
+	 }
+      }
+
+      /* Now walk up the instruction stream trying to see if we can rewrite
+       * everything writing to the temporary to write into the destination
+       * instead.
+       */
+      vec4_instruction *scan_inst;
+      for (scan_inst = (vec4_instruction *)inst->prev;
+	   scan_inst->prev != NULL;
+	   scan_inst = (vec4_instruction *)scan_inst->prev) {
+	 if (scan_inst->dst.file == GRF &&
+	     scan_inst->dst.reg == inst->src[0].reg &&
+	     scan_inst->dst.reg_offset == inst->src[0].reg_offset) {
+            /* Found something writing to the reg we want to coalesce away. */
+            if (to_mrf) {
+               /* SEND instructions can't have MRF as a destination. */
+               if (scan_inst->mlen)
+                  break;
+
+               if (brw->gen == 6) {
+                  /* gen6 math instructions must have the destination be
+                   * GRF, so no compute-to-MRF for them.
+                   */
+                  if (scan_inst->is_math()) {
+                     break;
+                  }
+               }
+            }
+
+            /* If we can't handle the swizzle, bail. */
+            if (!scan_inst->can_reswizzle_dst(inst->dst.writemask,
+                                              inst->src[0].swizzle,
+                                              swizzle_mask)) {
+               break;
+            }
+
+	    /* Mark which channels we found unconditional writes for. */
+	    if (!scan_inst->predicate) {
+	       for (int i = 0; i < 4; i++) {
+		  if (scan_inst->dst.writemask & (1 << i) &&
+		      chans_needed[i]) {
+		     chans_needed[i] = false;
+		     chans_remaining--;
+		  }
+	       }
+	    }
+
+	    if (chans_remaining == 0)
+	       break;
+	 }
+
+	 /* We don't handle flow control here.  Most computation of values
+	  * that could be coalesced happens just before their use.
+	  */
+	 if (scan_inst->opcode == BRW_OPCODE_DO ||
+	     scan_inst->opcode == BRW_OPCODE_WHILE ||
+	     scan_inst->opcode == BRW_OPCODE_ELSE ||
+	     scan_inst->opcode == BRW_OPCODE_ENDIF) {
+	    break;
+	 }
+
+         /* You can't read from an MRF, so if someone else reads our MRF's
+          * source GRF that we wanted to rewrite, that stops us.  If it's a
+          * GRF we're trying to coalesce to, we don't actually handle
+          * rewriting sources so bail in that case as well.
+          */
+	 bool interfered = false;
+	 for (int i = 0; i < 3; i++) {
+	    if (scan_inst->src[i].file == GRF &&
+		scan_inst->src[i].reg == inst->src[0].reg &&
+		scan_inst->src[i].reg_offset == inst->src[0].reg_offset) {
+	       interfered = true;
+	    }
+	 }
+	 if (interfered)
+	    break;
+
+         /* If somebody else writes our destination here, we can't coalesce
+          * before that.
+          */
+         if (scan_inst->dst.file == inst->dst.file &&
+             scan_inst->dst.reg == inst->dst.reg) {
+	    break;
+         }
+
+         /* Check for reads of the register we're trying to coalesce into.  We
+          * can't go rewriting instructions above that to put some other value
+          * in the register instead.
+          */
+         if (to_mrf && scan_inst->mlen > 0) {
+            if (inst->dst.reg >= scan_inst->base_mrf &&
+                inst->dst.reg < scan_inst->base_mrf + scan_inst->mlen) {
+               break;
+            }
+         } else {
+            for (int i = 0; i < 3; i++) {
+               if (scan_inst->src[i].file == inst->dst.file &&
+                   scan_inst->src[i].reg == inst->dst.reg &&
+                   scan_inst->src[i].reg_offset == inst->src[0].reg_offset) {
+                  interfered = true;
+               }
+            }
+            if (interfered)
+               break;
+         }
+      }
+
+      if (chans_remaining == 0) {
+	 /* If we've made it here, we have a MOV we want to coalesce out, and
+	  * a scan_inst pointing to the earliest instruction involved in
+	  * computing the value.  Now go rewrite the instruction stream
+	  * between the two.
+	  */
+
+	 while (scan_inst != inst) {
+	    if (scan_inst->dst.file == GRF &&
+		scan_inst->dst.reg == inst->src[0].reg &&
+		scan_inst->dst.reg_offset == inst->src[0].reg_offset) {
+               scan_inst->reswizzle_dst(inst->dst.writemask,
+                                        inst->src[0].swizzle);
+	       scan_inst->dst.file = inst->dst.file;
+	       scan_inst->dst.reg = inst->dst.reg;
+	       scan_inst->dst.reg_offset = inst->dst.reg_offset;
+	       scan_inst->saturate |= inst->saturate;
+	    }
+	    scan_inst = (vec4_instruction *)scan_inst->next;
+	 }
+	 inst->remove();
+	 progress = true;
+      }
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
+
+/**
+ * Splits virtual GRFs requesting more than one contiguous physical register.
+ *
+ * We initially create large virtual GRFs for temporary structures, arrays,
+ * and matrices, so that the dereference visitor functions can add reg_offsets
+ * to work their way down to the actual member being accessed.  But when it
+ * comes to optimization, we'd like to treat each register as individual
+ * storage if possible.
+ *
+ * So far, the only thing that might prevent splitting is a send message from
+ * a GRF on IVB.
+ */
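+/*
+ * Example (hypothetical sizes): vgrf7 of size 4 keeps reg_offset 0 in
+ * vgrf7 itself, while offsets 1..3 move to three freshly allocated
+ * size-1 registers; an access at reg_offset 2 is rewritten to the second
+ * new register with reg_offset 0.
+ */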
+void
+vec4_visitor::split_virtual_grfs()
+{
+   int num_vars = this->virtual_grf_count;
+   int new_virtual_grf[num_vars];
+   bool split_grf[num_vars];
+
+   memset(new_virtual_grf, 0, sizeof(new_virtual_grf));
+
+   /* Try to split anything larger than one register. */
+   for (int i = 0; i < num_vars; i++) {
+      split_grf[i] = this->virtual_grf_sizes[i] != 1;
+   }
+
+   /* Check that the instructions are compatible with the registers we're trying
+    * to split.
+    */
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      /* If there's a SEND message loading from a GRF on gen7+, it needs to be
+       * contiguous.
+       */
+      if (inst->is_send_from_grf()) {
+         for (int i = 0; i < 3; i++) {
+            if (inst->src[i].file == GRF) {
+               split_grf[inst->src[i].reg] = false;
+            }
+         }
+      }
+   }
+
+   /* Allocate new space for split regs.  Note that the virtual
+    * numbers will be contiguous.
+    */
+   for (int i = 0; i < num_vars; i++) {
+      if (!split_grf[i])
+         continue;
+
+      new_virtual_grf[i] = virtual_grf_alloc(1);
+      for (int j = 2; j < this->virtual_grf_sizes[i]; j++) {
+         int reg = virtual_grf_alloc(1);
+         assert(reg == new_virtual_grf[i] + j - 1);
+         (void) reg;
+      }
+      this->virtual_grf_sizes[i] = 1;
+   }
+
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      if (inst->dst.file == GRF && split_grf[inst->dst.reg] &&
+          inst->dst.reg_offset != 0) {
+         inst->dst.reg = (new_virtual_grf[inst->dst.reg] +
+                          inst->dst.reg_offset - 1);
+         inst->dst.reg_offset = 0;
+      }
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file == GRF && split_grf[inst->src[i].reg] &&
+             inst->src[i].reg_offset != 0) {
+            inst->src[i].reg = (new_virtual_grf[inst->src[i].reg] +
+                                inst->src[i].reg_offset - 1);
+            inst->src[i].reg_offset = 0;
+         }
+      }
+   }
+   invalidate_live_intervals();
+}
+
+void
+vec4_visitor::dump_instruction(backend_instruction *be_inst)
+{
+   dump_instruction(be_inst, stderr);
+}
+
+void
+vec4_visitor::dump_instruction(backend_instruction *be_inst, FILE *file)
+{
+   char buff[256];
+   dump_instruction(be_inst, buff);
+
+   fprintf(file, "%s", buff);
+
+   fprintf(file, "\n");
+}
+
+void
+vec4_visitor::dump_instruction(backend_instruction *be_inst, char *string)
+{
+   vec4_instruction *inst = (vec4_instruction *)be_inst;
+
+   if (inst->predicate) {
+      string += sprintf(string, "(%cf0) ",
+             inst->predicate_inverse ? '-' : '+');
+   }
+
+   string += sprintf(string, "%s", brw_instruction_name(inst->opcode));
+   if (inst->conditional_mod) {
+      string += sprintf(string, "%s", conditional_modifier[inst->conditional_mod]);
+   }
+   string += sprintf(string, " ");
+
+   switch (inst->dst.file) {
+   case GRF:
+      string += sprintf(string, "vgrf%d.%d", inst->dst.reg, inst->dst.reg_offset);
+      break;
+   case MRF:
+      string += sprintf(string, "m%d", inst->dst.reg);
+      break;
+   case HW_REG:
+      if (inst->dst.fixed_hw_reg.file == BRW_ARCHITECTURE_REGISTER_FILE) {
+         switch (inst->dst.fixed_hw_reg.nr) {
+         case BRW_ARF_NULL:
+            string += sprintf(string, "null");
+            break;
+         case BRW_ARF_ADDRESS:
+            string += sprintf(string, "a0.%d", inst->dst.fixed_hw_reg.subnr);
+            break;
+         case BRW_ARF_ACCUMULATOR:
+            string += sprintf(string, "acc%d", inst->dst.fixed_hw_reg.subnr);
+            break;
+         case BRW_ARF_FLAG:
+            string += sprintf(string, "f%d.%d", inst->dst.fixed_hw_reg.nr & 0xf,
+                             inst->dst.fixed_hw_reg.subnr);
+            break;
+         default:
+            string += sprintf(string, "arf%d.%d", inst->dst.fixed_hw_reg.nr & 0xf,
+                               inst->dst.fixed_hw_reg.subnr);
+            break;
+         }
+      } else {
+         string += sprintf(string, "hw_reg%d", inst->dst.fixed_hw_reg.nr);
+      }
+      if (inst->dst.fixed_hw_reg.subnr)
+         string += sprintf(string, "+%d", inst->dst.fixed_hw_reg.subnr);
+      break;
+   case BAD_FILE:
+      string += sprintf(string, "(null)");
+      break;
+   default:
+      string += sprintf(string, "???");
+      break;
+   }
+   if (inst->dst.writemask != WRITEMASK_XYZW) {
+      string += sprintf(string, ".");
+      if (inst->dst.writemask & 1)
+         string += sprintf(string, "x");
+      if (inst->dst.writemask & 2)
+         string += sprintf(string, "y");
+      if (inst->dst.writemask & 4)
+         string += sprintf(string, "z");
+      if (inst->dst.writemask & 8)
+         string += sprintf(string, "w");
+   }
+   string += sprintf(string, ":%s, ", brw_reg_type_letters(inst->dst.type));
+
+   for (int i = 0; i < 3 && inst->src[i].file != BAD_FILE; i++) {
+      if (inst->src[i].negate)
+         string += sprintf(string, "-");
+      if (inst->src[i].abs)
+         string += sprintf(string, "|");
+      switch (inst->src[i].file) {
+      case GRF:
+         string += sprintf(string, "vgrf%d", inst->src[i].reg);
+         break;
+      case ATTR:
+         string += sprintf(string, "attr%d", inst->src[i].reg);
+         break;
+      case UNIFORM:
+         string += sprintf(string, "u%d", inst->src[i].reg);
+         break;
+      case IMM:
+         switch (inst->src[i].type) {
+         case BRW_REGISTER_TYPE_F:
+            string += sprintf(string, "%fF", inst->src[i].imm.f);
+            break;
+         case BRW_REGISTER_TYPE_D:
+            string += sprintf(string, "%dD", inst->src[i].imm.i);
+            break;
+         case BRW_REGISTER_TYPE_UD:
+            string += sprintf(string, "%uU", inst->src[i].imm.u);
+            break;
+         default:
+            string += sprintf(string, "???");
+            break;
+         }
+         break;
+      case HW_REG:
+         if (inst->src[i].fixed_hw_reg.negate)
+            string += sprintf(string, "-");
+         if (inst->src[i].fixed_hw_reg.abs)
+            string += sprintf(string, "|");
+         if (inst->src[i].fixed_hw_reg.file == BRW_ARCHITECTURE_REGISTER_FILE) {
+            switch (inst->src[i].fixed_hw_reg.nr) {
+            case BRW_ARF_NULL:
+               string += sprintf(string, "null");
+               break;
+            case BRW_ARF_ADDRESS:
+               string += sprintf(string, "a0.%d", inst->src[i].fixed_hw_reg.subnr);
+               break;
+            case BRW_ARF_ACCUMULATOR:
+               string += sprintf(string, "acc%d", inst->src[i].fixed_hw_reg.subnr);
+               break;
+            case BRW_ARF_FLAG:
+               string += sprintf(string, "f%d.%d", inst->src[i].fixed_hw_reg.nr & 0xf,
+                                inst->src[i].fixed_hw_reg.subnr);
+               break;
+            default:
+               string += sprintf(string, "arf%d.%d", inst->src[i].fixed_hw_reg.nr & 0xf,
+                                  inst->src[i].fixed_hw_reg.subnr);
+               break;
+            }
+         } else {
+            string += sprintf(string, "hw_reg%d", inst->src[i].fixed_hw_reg.nr);
+         }
+         if (inst->src[i].fixed_hw_reg.subnr)
+            string += sprintf(string, "+%d", inst->src[i].fixed_hw_reg.subnr);
+         if (inst->src[i].fixed_hw_reg.abs)
+            string += sprintf(string, "|");
+         break;
+      case BAD_FILE:
+         string += sprintf(string, "(null)");
+         break;
+      default:
+         string += sprintf(string, "???");
+         break;
+      }
+
+      if (virtual_grf_sizes[inst->src[i].reg] != 1)
+         string += sprintf(string, ".%d", inst->src[i].reg_offset);
+
+      if (inst->src[i].file != IMM) {
+         static const char *chans[4] = {"x", "y", "z", "w"};
+         string += sprintf(string, ".");
+         for (int c = 0; c < 4; c++) {
+            string += sprintf(string, "%s", chans[BRW_GET_SWZ(inst->src[i].swizzle, c)]);
+         }
+      }
+
+      if (inst->src[i].abs)
+         string += sprintf(string, "|");
+
+      if (inst->src[i].file != IMM) {
+         string += sprintf(string, ":%s", brw_reg_type_letters(inst->src[i].type));
+      }
+
+      if (i < 2 && inst->src[i + 1].file != BAD_FILE)
+         string += sprintf(string, ", ");
+   }
+}
+
+
+static inline struct brw_reg
+attribute_to_hw_reg(int attr, bool interleaved)
+{
+   if (interleaved)
+      return stride(brw_vec4_grf(attr / 2, (attr % 2) * 4), 0, 4, 1);
+   else
+      return brw_vec8_grf(attr, 0);
+}
+
+
+/**
+ * Replace each register of type ATTR in this->instructions with a reference
+ * to a fixed HW register.
+ *
+ * If interleaved is true, then each attribute takes up half a register, with
+ * register N containing attribute 2*N in its first half and attribute 2*N+1
+ * in its second half (this corresponds to the payload setup used by geometry
+ * shaders in "single" or "dual instanced" dispatch mode).  If interleaved is
+ * false, then each attribute takes up a whole register, with register N
+ * containing attribute N (this corresponds to the payload setup used by
+ * vertex shaders, and by geometry shaders in "dual object" dispatch mode).
+ */
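+/*
+ * e.g. (illustrative): attribute_to_hw_reg(5, true) returns the second
+ * half of g2 (a vec4 starting at subregister 4), while
+ * attribute_to_hw_reg(5, false) returns all of g5.
+ */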
+void
+vec4_visitor::lower_attributes_to_hw_regs(const int *attribute_map,
+                                          bool interleaved)
+{
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      /* We have to support ATTR as a destination for GL_FIXED fixup. */
+      if (inst->dst.file == ATTR) {
+	 int grf = attribute_map[inst->dst.reg + inst->dst.reg_offset];
+
+         /* All attributes used in the shader need to have been assigned a
+          * hardware register by the caller
+          */
+         assert(grf != 0);
+
+	 struct brw_reg reg = attribute_to_hw_reg(grf, interleaved);
+	 reg.type = inst->dst.type;
+	 reg.dw1.bits.writemask = inst->dst.writemask;
+
+	 inst->dst.file = HW_REG;
+	 inst->dst.fixed_hw_reg = reg;
+      }
+
+      for (int i = 0; i < 3; i++) {
+	 if (inst->src[i].file != ATTR)
+	    continue;
+
+	 int grf = attribute_map[inst->src[i].reg + inst->src[i].reg_offset];
+
+         /* All attributes used in the shader need to have been assigned a
+          * hardware register by the caller
+          */
+         assert(grf != 0);
+
+	 struct brw_reg reg = attribute_to_hw_reg(grf, interleaved);
+	 reg.dw1.bits.swizzle = inst->src[i].swizzle;
+         reg.type = inst->src[i].type;
+	 if (inst->src[i].abs)
+	    reg = brw_abs(reg);
+	 if (inst->src[i].negate)
+	    reg = negate(reg);
+
+	 inst->src[i].file = HW_REG;
+	 inst->src[i].fixed_hw_reg = reg;
+      }
+   }
+}
+
+int
+vec4_vs_visitor::setup_attributes(int payload_reg)
+{
+   int nr_attributes;
+   int attribute_map[VERT_ATTRIB_MAX + 1];
+   memset(attribute_map, 0, sizeof(attribute_map));
+
+   nr_attributes = 0;
+   for (int i = 0; i < VERT_ATTRIB_MAX; i++) {
+      if (vs_prog_data->inputs_read & BITFIELD64_BIT(i)) {
+	 attribute_map[i] = payload_reg + nr_attributes;
+	 nr_attributes++;
+      }
+   }
+
+   /* VertexID is stored by the VF as the last vertex element, but we
+    * don't represent it with a flag in inputs_read, so we call it
+    * VERT_ATTRIB_MAX.
+    */
+   if (vs_prog_data->uses_vertexid || vs_prog_data->uses_instanceid) {
+      attribute_map[VERT_ATTRIB_MAX] = payload_reg + nr_attributes;
+      nr_attributes++;
+   }
+
+   lower_attributes_to_hw_regs(attribute_map, false /* interleaved */);
+
+   /* The BSpec says we always have to read at least one thing from
+    * the VF, and it appears that the hardware wedges otherwise.
+    */
+   if (nr_attributes == 0)
+      nr_attributes = 1;
+
+   prog_data->urb_read_length = (nr_attributes + 1) / 2;
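+   /* e.g. (illustrative): 5 attributes -> urb_read_length = 3, attributes
+    * being counted two per read unit here.
+    */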
+
+   unsigned vue_entries =
+      MAX2(nr_attributes, prog_data->vue_map.num_slots);
+
+   if (brw->gen == 6)
+      prog_data->urb_entry_size = ALIGN(vue_entries, 8) / 8;
+   else
+      prog_data->urb_entry_size = ALIGN(vue_entries, 4) / 4;
+
+   return payload_reg + nr_attributes;
+}
+
+int
+vec4_visitor::setup_uniforms(int reg)
+{
+   prog_data->dispatch_grf_start_reg = reg;
+
+   /* The pre-gen6 VS requires that some push constants get loaded no
+    * matter what, or the GPU would hang.
+    */
+   if (brw->gen < 6 && this->uniforms == 0) {
+      assert(this->uniforms < this->uniform_array_size);
+      this->uniform_vector_size[this->uniforms] = 1;
+
+      stage_prog_data->param =
+         reralloc(NULL, stage_prog_data->param, const float *, 4);
+      for (unsigned int i = 0; i < 4; i++) {
+	 unsigned int slot = this->uniforms * 4 + i;
+	 static float zero = 0.0;
+	 stage_prog_data->param[slot] = &zero;
+      }
+
+      this->uniforms++;
+      reg++;
+   } else {
+      reg += ALIGN(uniforms, 2) / 2;
+   }
+
+   stage_prog_data->nr_params = this->uniforms * 4;
+
+   prog_data->curb_read_length = reg - prog_data->dispatch_grf_start_reg;
+
+   return reg;
+}
+
+void
+vec4_vs_visitor::setup_payload(void)
+{
+   int reg = 0;
+
+   /* The payload always contains important data in g0, which contains
+    * the URB handles that are passed on to the URB write at the end
+    * of the thread.  So, we always start push constants at g1.
+    */
+   reg++;
+
+   reg = setup_uniforms(reg);
+
+   reg = setup_attributes(reg);
+
+   this->first_non_payload_grf = reg;
+}
+
+// LunarG : TODO - shader time??
+//src_reg
+//vec4_visitor::get_timestamp()
+//{
+//   assert(brw->gen >= 7);
+
+//   src_reg ts = src_reg(brw_reg(BRW_ARCHITECTURE_REGISTER_FILE,
+//                                BRW_ARF_TIMESTAMP,
+//                                0,
+//                                BRW_REGISTER_TYPE_UD,
+//                                BRW_VERTICAL_STRIDE_0,
+//                                BRW_WIDTH_4,
+//                                BRW_HORIZONTAL_STRIDE_4,
+//                                BRW_SWIZZLE_XYZW,
+//                                WRITEMASK_XYZW));
+
+//   dst_reg dst = dst_reg(this, glsl_type::uvec4_type);
+
+//   vec4_instruction *mov = emit(MOV(dst, ts));
+//   /* We want to read the 3 fields we care about (mostly field 0, but also 2)
+//    * even if it's not enabled in the dispatch.
+//    */
+//   mov->force_writemask_all = true;
+
+//   return src_reg(dst);
+//}
+
+//void
+//vec4_visitor::emit_shader_time_begin()
+//{
+//   current_annotation = "shader time start";
+//   shader_start_time = get_timestamp();
+//}
+
+// LunarG : TODO - shader time??
+//void
+//vec4_visitor::emit_shader_time_end()
+//{
+//   current_annotation = "shader time end";
+//   src_reg shader_end_time = get_timestamp();
+
+
+//   /* Check that there weren't any timestamp reset events (assuming these
+//    * were the only two timestamp reads that happened).
+//    */
+//   src_reg reset_end = shader_end_time;
+//   reset_end.swizzle = BRW_SWIZZLE_ZZZZ;
+//   vec4_instruction *test = emit(AND(dst_null_d(), reset_end, src_reg(1u)));
+//   test->conditional_mod = BRW_CONDITIONAL_Z;
+
+//   emit(IF(BRW_PREDICATE_NORMAL));
+
+//   /* Take the current timestamp and get the delta. */
+//   shader_start_time.negate = true;
+//   dst_reg diff = dst_reg(this, glsl_type::uint_type);
+//   emit(ADD(diff, shader_start_time, shader_end_time));
+
+//   /* If there were no instructions between the two timestamp gets, the diff
+//    * is 2 cycles.  Remove that overhead, so I can forget about that when
+//    * trying to determine the time taken for single instructions.
+//    */
+//   emit(ADD(diff, src_reg(diff), src_reg(-2u)));
+
+//   emit_shader_time_write(st_base, src_reg(diff));
+//   emit_shader_time_write(st_written, src_reg(1u));
+//   emit(BRW_OPCODE_ELSE);
+//   emit_shader_time_write(st_reset, src_reg(1u));
+//   emit(BRW_OPCODE_ENDIF);
+//}
+
+// LunarG : TODO - shader time??
+//void
+//vec4_visitor::emit_shader_time_write(enum shader_time_shader_type type,
+//                                     src_reg value)
+//{
+//   int shader_time_index =
+//      brw_get_shader_time_index(brw, shader_prog, prog, type);
+
+//   dst_reg dst =
+//      dst_reg(this, glsl_type::get_array_instance(glsl_type::vec4_type, 2));
+
+//   dst_reg offset = dst;
+//   dst_reg time = dst;
+//   time.reg_offset++;
+
+//   offset.type = BRW_REGISTER_TYPE_UD;
+//   emit(MOV(offset, src_reg(shader_time_index * SHADER_TIME_STRIDE)));
+
+//   time.type = BRW_REGISTER_TYPE_UD;
+//   emit(MOV(time, src_reg(value)));
+
+//   emit(SHADER_OPCODE_SHADER_TIME_ADD, dst_reg(), src_reg(dst));
+//}
+
+bool
+vec4_visitor::run()
+{
+   sanity_param_count = prog->Parameters->NumParameters;
+
+//   if (INTEL_DEBUG & DEBUG_SHADER_TIME)
+//      emit_shader_time_begin();
+
+   assign_common_binding_table_offsets(0);
+
+   emit_prolog();
+
+   /* Generate VS IR for main().  (the visitor only descends into
+    * functions called "main").
+    */
+//   if (shader) {
+      assert(shader);
+      visit_instructions(shader->base.ir);
+//   } else {
+//      emit_program_code();
+//   }
+   base_ir = NULL;
+
+   if (key->userclip_active && !prog->UsesClipDistanceOut)
+      setup_uniform_clipplane_values();
+
+   emit_thread_end();
+
+   /* Before any optimization, push array accesses out to scratch
+    * space where we need them to be.  This pass may allocate new
+    * virtual GRFs, so we want to do it early.  It also makes sure
+    * that we have reladdr computations available for CSE, since we'll
+    * often do repeated subexpressions for those.
+    */
+   if (shader) {
+      move_grf_array_access_to_scratch();
+      move_uniform_array_access_to_pull_constants();
+   } else {
+      /* The ARB_vertex_program frontend emits pull constant loads directly
+       * rather than using reladdr, so we don't need to walk through all the
+       * instructions looking for things to move.  There isn't anything.
+       *
+       * We do still need to split things to vec4 size.
+       */
+      split_uniform_registers();
+   }
+   pack_uniform_registers();
+   move_push_constants_to_pull_constants();
+   split_virtual_grfs();
+
+   bool progress;
+   do {
+      progress = false;
+      progress = dead_code_eliminate() || progress;
+      progress = dead_control_flow_eliminate(this) || progress;
+      progress = opt_copy_propagation() || progress;
+      progress = opt_algebraic() || progress;
+      progress = opt_register_coalesce() || progress;
+   } while (progress);
+
+   if (failed)
+      return false;
+
+   setup_payload();
+
+   if (false) {
+      /* Debug of register spilling: Go spill everything. */
+      const int grf_count = virtual_grf_count;
+      float spill_costs[virtual_grf_count];
+      bool no_spill[virtual_grf_count];
+      evaluate_spill_costs(spill_costs, no_spill);
+      for (int i = 0; i < grf_count; i++) {
+         if (no_spill[i])
+            continue;
+         spill_reg(i);
+      }
+   }
+
+   while (!reg_allocate()) {
+      if (failed)
+         return false;
+   }
+
+   opt_schedule_instructions(SCHEDULE_PRE_IPS_TD_HI);
+
+   opt_set_dependency_control();
+
+   /* If any state parameters were appended, then ParameterValues could have
+    * been realloced, in which case the driver uniform storage set up by
+    * _mesa_associate_uniform_storage() would point to freed memory.  Make
+    * sure that didn't happen.
+    */
+   assert(sanity_param_count == prog->Parameters->NumParameters);
+
+   return !failed;
+}
+
+} /* namespace brw */
+
+extern "C" {
+
+/**
+ * Compile a vertex shader.
+ *
+ * Returns the final assembly and the program's size.
+ */
+const unsigned *
+brw_vs_emit(struct brw_context *brw,
+            struct gl_shader_program *prog,
+            struct brw_vs_compile *c,
+            struct brw_vs_prog_data *prog_data,
+            void *mem_ctx,
+            unsigned *final_assembly_size)
+{
+//   bool start_busy = false;
+//   double start_time = 0;
+
+//   if (unlikely(brw->perf_debug)) {
+//      start_busy = (brw->batch.last_bo &&
+//                    drm_intel_bo_busy(brw->batch.last_bo));
+//      start_time = get_time();
+//   }
+
+   struct brw_shader *shader = NULL;
+   if (prog)
+      shader = (brw_shader *) prog->_LinkedShaders[MESA_SHADER_VERTEX];
+
+   if (unlikely(INTEL_DEBUG & DEBUG_VS))
+      brw_dump_ir(brw, "vertex", prog, &shader->base, &c->vp->program.Base);
+
+   vec4_vs_visitor v(brw, c, prog_data, prog, mem_ctx);
+   if (!v.run()) {
+      if (prog) {
+         prog->LinkStatus = false;
+         ralloc_strcat(&prog->InfoLog, v.fail_msg);
+      }
+
+      _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
+                    v.fail_msg);
+
+      return NULL;
+   }
+
+   const unsigned *assembly = NULL;
+   if (brw->gen >= 8) {
+//      gen8_vec4_generator g(brw, prog, &c->vp->program.Base, &prog_data->base,
+//                            mem_ctx, INTEL_DEBUG & DEBUG_VS);
+//      assembly = g.generate_assembly(&v.instructions, final_assembly_size);
+   } else {
+      vec4_generator g(brw, prog, &c->vp->program.Base, &prog_data->base,
+                       mem_ctx, INTEL_DEBUG & DEBUG_VS);
+      assembly = g.generate_assembly(&v.instructions, final_assembly_size);
+   }
+
+//   if (unlikely(brw->perf_debug) && shader) {
+//      if (shader->compiled_once) {
+//         brw_vs_debug_recompile(brw, prog, &c->key);
+//      }
+//      if (start_busy && !drm_intel_bo_busy(brw->batch.last_bo)) {
+//         perf_debug("VS compile took %.03f ms and stalled the GPU\n",
+//                    (get_time() - start_time) * 1000);
+//      }
+//      shader->compiled_once = true;
+//   }
+
+   return assembly;
+}
+
+
+void
+brw_vec4_setup_prog_key_for_precompile(struct gl_context *ctx,
+                                       struct brw_vec4_prog_key *key,
+                                       GLuint id, struct gl_program *prog)
+{
+   key->program_string_id = id;
+   key->clamp_vertex_color = ctx->API == API_OPENGL_COMPAT;
+
+   unsigned sampler_count = _mesa_fls(prog->SamplersUsed);
+   for (unsigned i = 0; i < sampler_count; i++) {
+      if (prog->ShadowSamplers & (1 << i)) {
+         /* Assume DEPTH_TEXTURE_MODE is the default: X, X, X, 1 */
+         key->tex.swizzles[i] =
+            MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_X, SWIZZLE_X, SWIZZLE_ONE);
+      } else {
+         /* Color sampler: assume no swizzling. */
+         key->tex.swizzles[i] = SWIZZLE_XYZW;
+      }
+   }
+}
+
+} /* extern "C" */
diff --git a/icd/intel/compiler/pipeline/brw_vec4.h b/icd/intel/compiler/pipeline/brw_vec4.h
new file mode 100644
index 0000000..b953db8
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4.h
@@ -0,0 +1,814 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef BRW_VEC4_H
+#define BRW_VEC4_H
+
+#include <stdint.h>
+#include "brw_shader.h"
+#include "main/compiler.h"
+#include "program/hash_table.h"
+#include "brw_program.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "brw_context.h"
+#include "brw_eu.h"
+
+#ifdef __cplusplus
+}; /* extern "C" */
+#include "gen8_generator.h"
+#endif
+
+#include "glsl/ir.h"
+
+
+struct brw_vec4_compile {
+   GLuint last_scratch; /**< measured in 32-byte (register size) units */
+
+   struct gl_shader_program *shader_prog;
+
+   void *mem_ctx;
+   const unsigned *program;
+   unsigned program_size;
+};
+
+
+struct brw_vec4_prog_key {
+   GLuint program_string_id;
+
+   /**
+    * True if at least one clip flag is enabled, regardless of whether the
+    * shader uses clip planes or gl_ClipDistance.
+    */
+   GLuint userclip_active:1;
+
+   /**
+    * How many user clipping planes are being uploaded to the vertex shader as
+    * push constants.
+    */
+   GLuint nr_userclip_plane_consts:4;
+
+   GLuint clamp_vertex_color:1;
+
+   struct brw_sampler_prog_key_data tex;
+};
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void
+brw_vec4_setup_prog_key_for_precompile(struct gl_context *ctx,
+                                       struct brw_vec4_prog_key *key,
+                                       GLuint id, struct gl_program *prog);
+
+#ifdef __cplusplus
+} /* extern "C" */
+
+namespace brw {
+
+class dst_reg;
+
+class vec4_live_variables;
+
+unsigned
+swizzle_for_size(int size);
+
+class reg
+{
+public:
+   /** Register file: GRF, MRF, IMM. */
+   enum register_file file;
+   /** Virtual register number.  0 = fixed hw reg */
+   int reg;
+   /** Offset within the virtual register. */
+   int reg_offset;
+   /** Register type.  BRW_REGISTER_TYPE_* */
+   int type;
+   struct brw_reg fixed_hw_reg;
+
+   /** Value for file == IMM */
+   union {
+      int32_t i;
+      uint32_t u;
+      float f;
+   } imm;
+};
+
+class src_reg : public reg
+{
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(src_reg)
+
+   void init();
+
+   src_reg(register_file file, int reg, const glsl_type *type);
+   src_reg();
+   src_reg(float f);
+   src_reg(uint32_t u);
+   src_reg(int32_t i);
+   src_reg(struct brw_reg reg);
+
+   bool equals(src_reg *r);
+   bool is_zero() const;
+   bool is_one() const;
+   bool is_accumulator() const;
+
+   src_reg(class vec4_visitor *v, const struct glsl_type *type);
+
+   explicit src_reg(dst_reg reg);
+
+   GLuint swizzle; /**< BRW_SWIZZLE_XYZW macros from brw_reg.h. */
+   bool negate;
+   bool abs;
+
+   src_reg *reladdr;
+};
+
+static inline src_reg
+retype(src_reg reg, unsigned type)
+{
+   reg.fixed_hw_reg.type = reg.type = type;
+   return reg;
+}
+
+static inline src_reg
+offset(src_reg reg, unsigned delta)
+{
+   assert(delta == 0 || (reg.file != HW_REG && reg.file != IMM));
+   reg.reg_offset += delta;
+   return reg;
+}
+
+/**
+ * Reswizzle a given source register.
+ * \sa brw_swizzle().
+ */
+static inline src_reg
+swizzle(src_reg reg, unsigned swizzle)
+{
+   assert(reg.file != HW_REG);
+   reg.swizzle = BRW_SWIZZLE4(
+      BRW_GET_SWZ(reg.swizzle, BRW_GET_SWZ(swizzle, 0)),
+      BRW_GET_SWZ(reg.swizzle, BRW_GET_SWZ(swizzle, 1)),
+      BRW_GET_SWZ(reg.swizzle, BRW_GET_SWZ(swizzle, 2)),
+      BRW_GET_SWZ(reg.swizzle, BRW_GET_SWZ(swizzle, 3)));
+   return reg;
+}
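+/*
+ * e.g. (illustrative): swizzle(reg, BRW_SWIZZLE_YYYY) broadcasts whatever
+ * channel reg's current swizzle selects for y across all four channels,
+ * composing the two swizzles rather than replacing the old one.
+ */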
+
+static inline src_reg
+negate(src_reg reg)
+{
+   assert(reg.file != HW_REG && reg.file != IMM);
+   reg.negate = !reg.negate;
+   return reg;
+}
+
+class dst_reg : public reg
+{
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(dst_reg)
+
+   void init();
+
+   dst_reg();
+   dst_reg(register_file file, int reg);
+   dst_reg(register_file file, int reg, const glsl_type *type, int writemask);
+   dst_reg(struct brw_reg reg);
+   dst_reg(class vec4_visitor *v, const struct glsl_type *type);
+
+   explicit dst_reg(src_reg reg);
+
+   bool is_null() const;
+   bool is_accumulator() const;
+
+   int writemask; /**< Bitfield of WRITEMASK_[XYZW] */
+
+   src_reg *reladdr;
+};
+
+static inline dst_reg
+retype(dst_reg reg, unsigned type)
+{
+   reg.fixed_hw_reg.type = reg.type = type;
+   return reg;
+}
+
+static inline dst_reg
+offset(dst_reg reg, unsigned delta)
+{
+   assert(delta == 0 || (reg.file != HW_REG && reg.file != IMM));
+   reg.reg_offset += delta;
+   return reg;
+}
+
+static inline dst_reg
+writemask(dst_reg reg, unsigned mask)
+{
+   assert(reg.file != HW_REG && reg.file != IMM);
+   assert((reg.writemask & mask) != 0);
+   reg.writemask &= mask;
+   return reg;
+}
+
+class vec4_instruction : public backend_instruction {
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(vec4_instruction)
+
+   vec4_instruction(vec4_visitor *v, enum opcode opcode,
+		    dst_reg dst = dst_reg(),
+		    src_reg src0 = src_reg(),
+		    src_reg src1 = src_reg(),
+		    src_reg src2 = src_reg());
+
+   struct brw_reg get_dst(void);
+   struct brw_reg get_src(const struct brw_vec4_prog_data *prog_data, int i);
+
+   dst_reg dst;
+   src_reg src[3];
+
+   bool saturate;
+   bool force_writemask_all;
+   bool no_dd_clear, no_dd_check;
+
+   int conditional_mod; /**< BRW_CONDITIONAL_* */
+
+   int sampler;
+   uint32_t texture_offset; /**< Texture Offset bitfield */
+   int target; /**< MRT target. */
+   bool shadow_compare;
+
+   enum brw_urb_write_flags urb_write_flags;
+   bool header_present;
+   int mlen; /**< SEND message length */
+   int base_mrf; /**< First MRF in the SEND message, if mlen is nonzero. */
+
+   uint32_t offset; /* spill/unspill offset */
+   /** @{
+    * Annotation for the generated IR.  One of the two can be set.
+    */
+   const void *ir;
+   const char *annotation;
+   /** @} */
+
+   bool is_send_from_grf();
+   bool can_reswizzle_dst(int dst_writemask, int swizzle, int swizzle_mask);
+   void reswizzle_dst(int dst_writemask, int swizzle);
+
+   bool reads_flag()
+   {
+      return predicate || opcode == VS_OPCODE_UNPACK_FLAGS_SIMD4X2;
+   }
+
+   bool writes_flag()
+   {
+      return conditional_mod && opcode != BRW_OPCODE_SEL;
+   }
+};
+
+/**
+ * The vertex shader front-end.
+ *
+ * Translates either GLSL IR or Mesa IR (for ARB_vertex_program and
+ * fixed-function) into VS IR.
+ */
+class vec4_visitor : public backend_visitor
+{
+public:
+   vec4_visitor(struct brw_context *brw,
+                struct brw_vec4_compile *c,
+                struct gl_program *prog,
+                const struct brw_vec4_prog_key *key,
+                struct brw_vec4_prog_data *prog_data,
+		struct gl_shader_program *shader_prog,
+                gl_shader_stage stage,
+		void *mem_ctx,
+                bool debug_flag,
+                bool no_spills,
+                shader_time_shader_type st_base,
+                shader_time_shader_type st_written,
+                shader_time_shader_type st_reset);
+   ~vec4_visitor();
+
+   dst_reg dst_null_f()
+   {
+      return dst_reg(brw_null_reg());
+   }
+
+   dst_reg dst_null_d()
+   {
+      return dst_reg(retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+   }
+
+   dst_reg dst_null_ud()
+   {
+      return dst_reg(retype(brw_null_reg(), BRW_REGISTER_TYPE_UD));
+   }
+
+   struct brw_vec4_compile * const c;
+   const struct brw_vec4_prog_key * const key;
+   struct brw_vec4_prog_data * const prog_data;
+   unsigned int sanity_param_count;
+
+   char *fail_msg;
+   bool failed;
+
+   /**
+    * GLSL IR currently being processed, which is associated with our
+    * driver IR instructions for debugging purposes.
+    */
+   const void *base_ir;
+   const char *current_annotation;
+
+   int *virtual_grf_sizes;
+   int virtual_grf_count;
+   int virtual_grf_array_size;
+   int first_non_payload_grf;
+   unsigned int max_grf;
+   int *virtual_grf_start;
+   int *virtual_grf_end;
+   brw::vec4_live_variables *live_intervals;
+   dst_reg userplane[MAX_CLIP_PLANES];
+
+   /**
+    * This is the size to be used for an array with an element per
+    * reg_offset
+    */
+   int virtual_grf_reg_count;
+   /** Per-virtual-grf indices into an array of size virtual_grf_reg_count */
+   int *virtual_grf_reg_map;
+
+   dst_reg *variable_storage(ir_variable *var);
+
+   void reladdr_to_temp(ir_instruction *ir, src_reg *reg, int *num_reladdr);
+
+   bool need_all_constants_in_pull_buffer;
+
+   /**
+    * \name Visit methods
+    *
+    * As typical for the visitor pattern, there must be one \c visit method for
+    * each concrete subclass of \c ir_instruction.  Virtual base classes within
+    * the hierarchy should not have \c visit methods.
+    */
+   /*@{*/
+   virtual void visit(ir_variable *);
+   virtual void visit(ir_loop *);
+   virtual void visit(ir_loop_jump *);
+   virtual void visit(ir_function_signature *);
+   virtual void visit(ir_function *);
+   virtual void visit(ir_expression *);
+   virtual void visit(ir_swizzle *);
+   virtual void visit(ir_dereference_variable  *);
+   virtual void visit(ir_dereference_array *);
+   virtual void visit(ir_dereference_record *);
+   virtual void visit(ir_assignment *);
+   virtual void visit(ir_constant *);
+   virtual void visit(ir_call *);
+   virtual void visit(ir_return *);
+   virtual void visit(ir_discard *);
+   virtual void visit(ir_texture *);
+   virtual void visit(ir_if *);
+   virtual void visit(ir_emit_vertex *);
+   virtual void visit(ir_end_primitive *);
+   /*@}*/
+
+   src_reg result;
+
+   /* Regs for vertex results.  Generated at ir_variable visiting time
+    * for the ir->locations used.
+    */
+   dst_reg output_reg[BRW_VARYING_SLOT_COUNT];
+   const char *output_reg_annotation[BRW_VARYING_SLOT_COUNT];
+   int *uniform_size;
+   int *uniform_vector_size;
+   int uniform_array_size; /**< Size of uniform_[vector_]size arrays */
+   int uniforms;
+
+   src_reg shader_start_time;
+
+   struct hash_table *variable_ht;
+
+   bool run(void);
+   void fail(const char *msg, ...);
+
+   int virtual_grf_alloc(int size);
+   void setup_uniform_clipplane_values();
+   void setup_uniform_values(ir_variable *ir);
+   void setup_builtin_uniform_values(ir_variable *ir);
+   int setup_uniforms(int payload_reg);
+   bool reg_allocate_trivial();
+   bool reg_allocate();
+   void evaluate_spill_costs(float *spill_costs, bool *no_spill);
+   int choose_spill_reg(struct ra_graph *g);
+   void spill_reg(int spill_reg);
+   void move_grf_array_access_to_scratch();
+   void move_uniform_array_access_to_pull_constants();
+   void move_push_constants_to_pull_constants();
+   void split_uniform_registers();
+   void pack_uniform_registers();
+   void calculate_live_intervals();
+   void invalidate_live_intervals();
+   void split_virtual_grfs();
+   bool dead_code_eliminate();
+   bool virtual_grf_interferes(int a, int b);
+   int  live_in_count(int block_num) const;
+   int  live_out_count(int block_num) const;
+   bool opt_copy_propagation();
+   bool opt_algebraic();
+   bool opt_register_coalesce();
+   void opt_set_dependency_control();
+   int opt_schedule_instructions(instruction_scheduler_mode mode);
+
+   bool can_do_source_mods(vec4_instruction *inst);
+
+   vec4_instruction *emit(vec4_instruction *inst);
+
+   vec4_instruction *emit(enum opcode opcode);
+
+   vec4_instruction *emit(enum opcode opcode, dst_reg dst);
+
+   vec4_instruction *emit(enum opcode opcode, dst_reg dst, src_reg src0);
+
+   vec4_instruction *emit(enum opcode opcode, dst_reg dst,
+			  src_reg src0, src_reg src1);
+
+   vec4_instruction *emit(enum opcode opcode, dst_reg dst,
+			  src_reg src0, src_reg src1, src_reg src2);
+
+   vec4_instruction *emit_before(vec4_instruction *inst,
+				 vec4_instruction *new_inst);
+
+   vec4_instruction *MOV(dst_reg dst, src_reg src0);
+   vec4_instruction *NOT(dst_reg dst, src_reg src0);
+   vec4_instruction *RNDD(dst_reg dst, src_reg src0);
+   vec4_instruction *RNDE(dst_reg dst, src_reg src0);
+   vec4_instruction *RNDZ(dst_reg dst, src_reg src0);
+   vec4_instruction *FRC(dst_reg dst, src_reg src0);
+   vec4_instruction *F32TO16(dst_reg dst, src_reg src0);
+   vec4_instruction *F16TO32(dst_reg dst, src_reg src0);
+   vec4_instruction *ADD(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *MUL(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *MACH(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *MAC(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *AND(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *OR(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *XOR(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *DP3(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *DP4(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *DPH(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *SHL(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *SHR(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *ASR(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *CMP(dst_reg dst, src_reg src0, src_reg src1,
+			 uint32_t condition);
+   vec4_instruction *IF(src_reg src0, src_reg src1, uint32_t condition);
+   vec4_instruction *IF(uint32_t predicate);
+   vec4_instruction *PULL_CONSTANT_LOAD(dst_reg dst, src_reg index);
+   vec4_instruction *SCRATCH_READ(dst_reg dst, src_reg index);
+   vec4_instruction *SCRATCH_WRITE(dst_reg dst, src_reg src, src_reg index);
+   vec4_instruction *LRP(dst_reg dst, src_reg a, src_reg y, src_reg x);
+   vec4_instruction *BFREV(dst_reg dst, src_reg value);
+   vec4_instruction *BFE(dst_reg dst, src_reg bits, src_reg offset, src_reg value);
+   vec4_instruction *BFI1(dst_reg dst, src_reg bits, src_reg offset);
+   vec4_instruction *BFI2(dst_reg dst, src_reg bfi1_dst, src_reg insert, src_reg base);
+   vec4_instruction *FBH(dst_reg dst, src_reg value);
+   vec4_instruction *FBL(dst_reg dst, src_reg value);
+   vec4_instruction *CBIT(dst_reg dst, src_reg value);
+   vec4_instruction *MAD(dst_reg dst, src_reg c, src_reg b, src_reg a);
+   vec4_instruction *ADDC(dst_reg dst, src_reg src0, src_reg src1);
+   vec4_instruction *SUBB(dst_reg dst, src_reg src0, src_reg src1);
+
+   int implied_mrf_writes(vec4_instruction *inst);
+
+   bool try_rewrite_rhs_to_dst(ir_assignment *ir,
+			       dst_reg dst,
+			       src_reg src,
+			       vec4_instruction *pre_rhs_inst,
+			       vec4_instruction *last_rhs_inst);
+
+   bool try_copy_propagation(vec4_instruction *inst, int arg,
+                             src_reg *values[4]);
+
+   /** Walks an exec_list of ir_instruction and sends it through this visitor. */
+   void visit_instructions(const exec_list *list);
+
+   void emit_vp_sop(uint32_t condmod, dst_reg dst,
+                    src_reg src0, src_reg src1, src_reg one);
+
+   void emit_bool_to_cond_code(ir_rvalue *ir, uint32_t *predicate);
+   void emit_bool_comparison(unsigned int op, dst_reg dst, src_reg src0, src_reg src1);
+   void emit_if_gen6(ir_if *ir);
+
+   void emit_minmax(uint32_t condmod, dst_reg dst, src_reg src0, src_reg src1);
+
+   void emit_lrp(const dst_reg &dst,
+                 const src_reg &x, const src_reg &y, const src_reg &a);
+
+   void emit_block_move(dst_reg *dst, src_reg *src,
+			const struct glsl_type *type, uint32_t predicate);
+
+   void emit_constant_values(dst_reg *dst, ir_constant *value);
+
+   /**
+    * Emit the correct dot-product instruction for the type of arguments
+    */
+   void emit_dp(dst_reg dst, src_reg src0, src_reg src1, unsigned elements);
+
+   void emit_scalar(ir_instruction *ir, enum prog_opcode op,
+		    dst_reg dst, src_reg src0);
+
+   void emit_scalar(ir_instruction *ir, enum prog_opcode op,
+		    dst_reg dst, src_reg src0, src_reg src1);
+
+   void emit_scs(ir_instruction *ir, enum prog_opcode op,
+		 dst_reg dst, const src_reg &src);
+
+   src_reg fix_3src_operand(src_reg src);
+
+   void emit_math1_gen6(enum opcode opcode, dst_reg dst, src_reg src);
+   void emit_math1_gen4(enum opcode opcode, dst_reg dst, src_reg src);
+   void emit_math(enum opcode opcode, dst_reg dst, src_reg src);
+   void emit_math2_gen6(enum opcode opcode, dst_reg dst, src_reg src0, src_reg src1);
+   void emit_math2_gen4(enum opcode opcode, dst_reg dst, src_reg src0, src_reg src1);
+   void emit_math(enum opcode opcode, dst_reg dst, src_reg src0, src_reg src1);
+   src_reg fix_math_operand(src_reg src);
+
+   void emit_pack_half_2x16(dst_reg dst, src_reg src0);
+   void emit_unpack_half_2x16(dst_reg dst, src_reg src0);
+
+   uint32_t gather_channel(ir_texture *ir, int sampler);
+   src_reg emit_mcs_fetch(ir_texture *ir, src_reg coordinate, int sampler);
+   void emit_gen6_gather_wa(uint8_t wa, dst_reg dst);
+   void swizzle_result(ir_texture *ir, src_reg orig_val, int sampler);
+
+   void emit_ndc_computation();
+   void emit_psiz_and_flags(struct brw_reg reg);
+   void emit_clip_distances(dst_reg reg, int offset);
+   void emit_generic_urb_slot(dst_reg reg, int varying);
+   void emit_urb_slot(int mrf, int varying);
+
+   void emit_shader_time_begin();
+   void emit_shader_time_end();
+   void emit_shader_time_write(enum shader_time_shader_type type,
+                               src_reg value);
+
+   void emit_untyped_atomic(unsigned atomic_op, unsigned surf_index,
+                            dst_reg dst, src_reg offset, src_reg src0,
+                            src_reg src1);
+
+   void emit_untyped_surface_read(unsigned surf_index, dst_reg dst,
+                                  src_reg offset);
+
+   src_reg get_scratch_offset(vec4_instruction *inst,
+			      src_reg *reladdr, int reg_offset);
+   src_reg get_pull_constant_offset(vec4_instruction *inst,
+				    src_reg *reladdr, int reg_offset);
+   void emit_scratch_read(vec4_instruction *inst,
+			  dst_reg dst,
+			  src_reg orig_src,
+			  int base_offset);
+   void emit_scratch_write(vec4_instruction *inst,
+			   int base_offset);
+   void emit_pull_constant_load(vec4_instruction *inst,
+				dst_reg dst,
+				src_reg orig_src,
+				int base_offset);
+
+   bool try_emit_sat(ir_expression *ir);
+   bool try_emit_mad(ir_expression *ir);
+   void resolve_ud_negate(src_reg *reg);
+
+   src_reg get_timestamp();
+
+   bool process_move_condition(ir_rvalue *ir);
+
+   void dump_instruction(backend_instruction *inst);
+   void dump_instruction(backend_instruction *inst, FILE *file);
+   void dump_instruction(backend_instruction *inst, char *string);
+
+   void visit_atomic_counter_intrinsic(ir_call *ir);
+
+protected:
+   void emit_vertex();
+   void lower_attributes_to_hw_regs(const int *attribute_map,
+                                    bool interleaved);
+   void setup_payload_interference(struct ra_graph *g, int first_payload_node,
+                                   int reg_node_count);
+   virtual dst_reg *make_reg_for_system_value(ir_variable *ir) = 0;
+   virtual void setup_payload() = 0;
+   virtual void emit_prolog() = 0;
+   //virtual void emit_program_code() = 0;
+   virtual void emit_thread_end() = 0;
+   virtual void emit_urb_write_header(int mrf) = 0;
+   virtual vec4_instruction *emit_urb_write_opcode(bool complete) = 0;
+   virtual int compute_array_stride(ir_dereference_array *ir);
+
+   const bool debug_flag;
+
+private:
+   /**
+    * If true, then register allocation should fail instead of spilling.
+    */
+   const bool no_spills;
+
+   const shader_time_shader_type st_base;
+   const shader_time_shader_type st_written;
+   const shader_time_shader_type st_reset;
+};
+
+
+/**
+ * The vertex shader code generator.
+ *
+ * Translates VS IR to actual i965 assembly code.
+ */
+class vec4_generator
+{
+public:
+   vec4_generator(struct brw_context *brw,
+                  struct gl_shader_program *shader_prog,
+                  struct gl_program *prog,
+                  struct brw_vec4_prog_data *prog_data,
+                  void *mem_ctx,
+                  bool debug_flag);
+   ~vec4_generator();
+
+   const unsigned *generate_assembly(exec_list *insts, unsigned *asm_size);
+
+private:
+   void generate_code(exec_list *instructions);
+   void generate_vec4_instruction(vec4_instruction *inst,
+                                  struct brw_reg dst,
+                                  struct brw_reg *src);
+
+   void generate_math1_gen4(vec4_instruction *inst,
+			    struct brw_reg dst,
+			    struct brw_reg src);
+   void generate_math1_gen6(vec4_instruction *inst,
+			    struct brw_reg dst,
+			    struct brw_reg src);
+   void generate_math2_gen4(vec4_instruction *inst,
+			    struct brw_reg dst,
+			    struct brw_reg src0,
+			    struct brw_reg src1);
+   void generate_math2_gen6(vec4_instruction *inst,
+			    struct brw_reg dst,
+			    struct brw_reg src0,
+			    struct brw_reg src1);
+   void generate_math2_gen7(vec4_instruction *inst,
+			    struct brw_reg dst,
+			    struct brw_reg src0,
+			    struct brw_reg src1);
+
+   void generate_tex(vec4_instruction *inst,
+		     struct brw_reg dst,
+		     struct brw_reg src);
+
+   void generate_vs_urb_write(vec4_instruction *inst);
+   void generate_gs_urb_write(vec4_instruction *inst);
+   void generate_gs_thread_end(vec4_instruction *inst);
+   void generate_gs_set_write_offset(struct brw_reg dst,
+                                     struct brw_reg src0,
+                                     struct brw_reg src1);
+   void generate_gs_set_vertex_count(struct brw_reg dst,
+                                     struct brw_reg src);
+   void generate_gs_set_dword_2_immed(struct brw_reg dst, struct brw_reg src);
+   void generate_gs_prepare_channel_masks(struct brw_reg dst);
+   void generate_gs_set_channel_masks(struct brw_reg dst, struct brw_reg src);
+   void generate_gs_get_instance_id(struct brw_reg dst);
+   void generate_oword_dual_block_offsets(struct brw_reg m1,
+					  struct brw_reg index);
+   void generate_scratch_write(vec4_instruction *inst,
+			       struct brw_reg dst,
+			       struct brw_reg src,
+			       struct brw_reg index);
+   void generate_scratch_read(vec4_instruction *inst,
+			      struct brw_reg dst,
+			      struct brw_reg index);
+   void generate_pull_constant_load(vec4_instruction *inst,
+				    struct brw_reg dst,
+				    struct brw_reg index,
+				    struct brw_reg offset);
+   void generate_pull_constant_load_gen7(vec4_instruction *inst,
+                                         struct brw_reg dst,
+                                         struct brw_reg surf_index,
+                                         struct brw_reg offset);
+   void generate_unpack_flags(vec4_instruction *inst,
+                              struct brw_reg dst);
+
+   void generate_untyped_atomic(vec4_instruction *inst,
+                                struct brw_reg dst,
+                                struct brw_reg atomic_op,
+                                struct brw_reg surf_index);
+
+   void generate_untyped_surface_read(vec4_instruction *inst,
+                                      struct brw_reg dst,
+                                      struct brw_reg surf_index);
+
+   struct brw_context *brw;
+
+   struct brw_compile *p;
+
+   struct gl_shader_program *shader_prog;
+   const struct gl_program *prog;
+
+   struct brw_vec4_prog_data *prog_data;
+
+   void *mem_ctx;
+   const bool debug_flag;
+};
+
+/**
+ * The Gen8+ vertex shader code generator.
+ *
+ * Translates VS IR into assembly for Gen8+ (Broadwell) hardware.
+ */
+class gen8_vec4_generator : public gen8_generator
+{
+public:
+   gen8_vec4_generator(struct brw_context *brw,
+                       struct gl_shader_program *shader_prog,
+                       struct gl_program *prog,
+                       struct brw_vec4_prog_data *prog_data,
+                       void *mem_ctx,
+                       bool debug_flag);
+   ~gen8_vec4_generator();
+
+   const unsigned *generate_assembly(exec_list *insts, unsigned *asm_size);
+
+private:
+   void generate_code(exec_list *instructions);
+   void generate_vec4_instruction(vec4_instruction *inst,
+                                  struct brw_reg dst,
+                                  struct brw_reg *src);
+
+   void generate_tex(vec4_instruction *inst,
+                     struct brw_reg dst);
+
+   void generate_urb_write(vec4_instruction *ir, bool copy_g0);
+   void generate_gs_thread_end(vec4_instruction *ir);
+   void generate_gs_set_write_offset(struct brw_reg dst,
+                                     struct brw_reg src0,
+                                     struct brw_reg src1);
+   void generate_gs_set_vertex_count(struct brw_reg dst,
+                                     struct brw_reg src);
+   void generate_gs_set_dword_2_immed(struct brw_reg dst, struct brw_reg src);
+   void generate_gs_prepare_channel_masks(struct brw_reg dst);
+   void generate_gs_set_channel_masks(struct brw_reg dst, struct brw_reg src);
+
+   void generate_oword_dual_block_offsets(struct brw_reg m1,
+                                          struct brw_reg index);
+   void generate_scratch_write(vec4_instruction *inst,
+                               struct brw_reg dst,
+                               struct brw_reg src,
+                               struct brw_reg index);
+   void generate_scratch_read(vec4_instruction *inst,
+                              struct brw_reg dst,
+                              struct brw_reg index);
+   void generate_pull_constant_load(vec4_instruction *inst,
+                                    struct brw_reg dst,
+                                    struct brw_reg index,
+                                    struct brw_reg offset);
+   void generate_untyped_atomic(vec4_instruction *ir,
+                                struct brw_reg dst,
+                                struct brw_reg atomic_op,
+                                struct brw_reg surf_index);
+   void generate_untyped_surface_read(vec4_instruction *ir,
+                                      struct brw_reg dst,
+                                      struct brw_reg surf_index);
+
+   struct brw_vec4_prog_data *prog_data;
+
+   const bool debug_flag;
+};
+
+
+} /* namespace brw */
+#endif /* __cplusplus */
+
+#endif /* BRW_VEC4_H */
diff --git a/icd/intel/compiler/pipeline/brw_vec4_copy_propagation.cpp b/icd/intel/compiler/pipeline/brw_vec4_copy_propagation.cpp
new file mode 100644
index 0000000..ce9b36c
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4_copy_propagation.cpp
@@ -0,0 +1,411 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/**
+ * @file brw_vec4_copy_propagation.cpp
+ *
+ * Implements tracking of values copied between registers, and
+ * optimizations based on that: copy propagation and constant
+ * propagation.
+ */
+
+#include "brw_vec4.h"
+extern "C" {
+#include "main/macros.h"
+}
+
+namespace brw {
+
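+/* A MOV counts as a trackable copy only when it is unpredicated and
+ * unsaturated, writes a GRF, uses no relative addressing, and performs
+ * no type conversion; anything else leaves the destination not
+ * bit-for-bit identical to the source.
+ */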
+static bool
+is_direct_copy(vec4_instruction *inst)
+{
+   return (inst->opcode == BRW_OPCODE_MOV &&
+	   !inst->predicate &&
+	   inst->dst.file == GRF &&
+	   !inst->saturate &&
+	   !inst->dst.reladdr &&
+	   !inst->src[0].reladdr &&
+	   inst->dst.type == inst->src[0].type);
+}
+
+static bool
+is_dominated_by_previous_instruction(vec4_instruction *inst)
+{
+   return (inst->opcode != BRW_OPCODE_DO &&
+	   inst->opcode != BRW_OPCODE_WHILE &&
+	   inst->opcode != BRW_OPCODE_ELSE &&
+	   inst->opcode != BRW_OPCODE_ENDIF);
+}
+
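+/* Returns true when inst's GRF write clobbers the register channel that
+ * values[ch] was copied from, invalidating the tracked copy.
+ */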
+static bool
+is_channel_updated(vec4_instruction *inst, src_reg *values[4], int ch)
+{
+   const src_reg *src = values[ch];
+
+   /* consider GRF only */
+   assert(inst->dst.file == GRF);
+   if (!src || src->file != GRF)
+      return false;
+
+   return (src->reg == inst->dst.reg &&
+	   src->reg_offset == inst->dst.reg_offset &&
+	   inst->dst.writemask & (1 << BRW_GET_SWZ(src->swizzle, ch)));
+}
+
+static bool
+try_constant_propagation(struct brw_context *brw, vec4_instruction *inst,
+                         int arg, src_reg *values[4])
+{
+   /* For constant propagation, we only handle the same constant
+    * across all 4 channels.  Some day, we should handle the 8-bit
+    * float vector format, which would let us constant propagate
+    * vectors better.
+    */
+   src_reg value = *values[0];
+   for (int i = 1; i < 4; i++) {
+      if (!value.equals(values[i]))
+	 return false;
+   }
+
+   if (value.file != IMM)
+      return false;
+
+   if (inst->src[arg].abs) {
+      if (value.type == BRW_REGISTER_TYPE_F) {
+	 value.imm.f = fabs(value.imm.f);
+      } else if (value.type == BRW_REGISTER_TYPE_D) {
+	 if (value.imm.i < 0)
+	    value.imm.i = -value.imm.i;
+      }
+   }
+
+   if (inst->src[arg].negate) {
+      if (value.type == BRW_REGISTER_TYPE_F)
+	 value.imm.f = -value.imm.f;
+      else
+	 value.imm.u = -value.imm.u;
+   }
+
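+   /* The EU only accepts an immediate as the final source of an
+    * instruction, so a constant in src0 can only be propagated by
+    * commuting it into src1 where the operation allows it.
+    */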
+   switch (inst->opcode) {
+   case BRW_OPCODE_MOV:
+      inst->src[arg] = value;
+      return true;
+
+   case SHADER_OPCODE_POW:
+   case SHADER_OPCODE_INT_QUOTIENT:
+   case SHADER_OPCODE_INT_REMAINDER:
+      if (brw->gen < 8)
+         break;
+      /* fallthrough */
+   case BRW_OPCODE_DP2:
+   case BRW_OPCODE_DP3:
+   case BRW_OPCODE_DP4:
+   case BRW_OPCODE_DPH:
+   case BRW_OPCODE_BFI1:
+   case BRW_OPCODE_ASR:
+   case BRW_OPCODE_SHL:
+   case BRW_OPCODE_SHR:
+   case BRW_OPCODE_SUBB:
+      if (arg == 1) {
+         inst->src[arg] = value;
+         return true;
+      }
+      break;
+
+   case BRW_OPCODE_MACH:
+   case BRW_OPCODE_MUL:
+   case BRW_OPCODE_ADD:
+   case BRW_OPCODE_OR:
+   case BRW_OPCODE_AND:
+   case BRW_OPCODE_XOR:
+   case BRW_OPCODE_ADDC:
+      if (arg == 1) {
+	 inst->src[arg] = value;
+	 return true;
+      } else if (arg == 0 && inst->src[1].file != IMM) {
+	 /* Fit this constant in by commuting the operands.  Exception: we
+	  * can't do this for 32-bit integer MUL/MACH because it's asymmetric.
+	  */
+	 if ((inst->opcode == BRW_OPCODE_MUL ||
+              inst->opcode == BRW_OPCODE_MACH) &&
+	     (inst->src[1].type == BRW_REGISTER_TYPE_D ||
+	      inst->src[1].type == BRW_REGISTER_TYPE_UD))
+	    break;
+	 inst->src[0] = inst->src[1];
+	 inst->src[1] = value;
+	 return true;
+      }
+      break;
+
+   case BRW_OPCODE_CMP:
+      if (arg == 1) {
+	 inst->src[arg] = value;
+	 return true;
+      } else if (arg == 0 && inst->src[1].file != IMM) {
+	 uint32_t new_cmod;
+
+	 new_cmod = brw_swap_cmod(inst->conditional_mod);
+	 if (new_cmod != ~0u) {
+	    /* Fit this constant in by swapping the operands and
+	     * flipping the test.
+	     */
+	    inst->src[0] = inst->src[1];
+	    inst->src[1] = value;
+	    inst->conditional_mod = new_cmod;
+	    return true;
+	 }
+      }
+      break;
+
+   case BRW_OPCODE_SEL:
+      if (arg == 1) {
+	 inst->src[arg] = value;
+	 return true;
+      } else if (arg == 0 && inst->src[1].file != IMM) {
+	 inst->src[0] = inst->src[1];
+	 inst->src[1] = value;
+
+	 /* If this was predicated, flipping operands means
+	  * we also need to flip the predicate.
+	  */
+	 if (inst->conditional_mod == BRW_CONDITIONAL_NONE) {
+	    inst->predicate_inverse = !inst->predicate_inverse;
+	 }
+	 return true;
+      }
+      break;
+
+   default:
+      break;
+   }
+
+   return false;
+}
+
+static bool
+is_logic_op(enum opcode opcode)
+{
+   return (opcode == BRW_OPCODE_AND ||
+           opcode == BRW_OPCODE_OR  ||
+           opcode == BRW_OPCODE_XOR ||
+           opcode == BRW_OPCODE_NOT);
+}
+
+bool
+vec4_visitor::try_copy_propagation(vec4_instruction *inst, int arg,
+                                   src_reg *values[4])
+{
+   /* For copy propagation, all four channels must come from the same
+    * source register; only the per-channel swizzles may differ, which
+    * is why the comparison below ignores the swizzle.
+    */
+   src_reg value = *values[0];
+   for (int i = 1; i < 4; i++) {
+      /* This is equals() except we don't care about the swizzle. */
+      if (value.file != values[i]->file ||
+	  value.reg != values[i]->reg ||
+	  value.reg_offset != values[i]->reg_offset ||
+	  value.type != values[i]->type ||
+	  value.negate != values[i]->negate ||
+	  value.abs != values[i]->abs) {
+	 return false;
+      }
+   }
+
+   /* Compute the swizzle of the original register by swizzling the
+    * component loaded from each value according to the swizzle of
+    * operand we're going to change.
+    */
+   int s[4];
+   for (int i = 0; i < 4; i++) {
+      s[i] = BRW_GET_SWZ(values[i]->swizzle,
+			 BRW_GET_SWZ(inst->src[arg].swizzle, i));
+   }
+   value.swizzle = BRW_SWIZZLE4(s[0], s[1], s[2], s[3]);
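+
+   /* Example: if the tracked copy was "MOV tmp.xyzw, src.yzwx" and this
+    * instruction reads tmp.zzzz, every values[i] carries swizzle yzwx,
+    * each channel resolves to component w, and the composed swizzle
+    * becomes src.wwww.
+    */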
+
+   if (value.file != UNIFORM &&
+       value.file != GRF &&
+       value.file != ATTR)
+      return false;
+
+   if (brw->gen >= 8 && (value.negate || value.abs) &&
+       is_logic_op(inst->opcode)) {
+      return false;
+   }
+
+   if (inst->src[arg].abs) {
+      value.negate = false;
+      value.abs = true;
+   }
+   if (inst->src[arg].negate)
+      value.negate = !value.negate;
+
+   bool has_source_modifiers = value.negate || value.abs;
+
+   /* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on
+    * their sources, so don't propagate modifiers, uniforms, or swizzles
+    * into instructions that can't handle them.
+    */
+   if ((has_source_modifiers || value.file == UNIFORM ||
+        value.swizzle != BRW_SWIZZLE_XYZW) && !can_do_source_mods(inst))
+      return false;
+
+   if (has_source_modifiers && value.type != inst->src[arg].type)
+      return false;
+
+   bool is_3src_inst = (inst->opcode == BRW_OPCODE_LRP ||
+                        inst->opcode == BRW_OPCODE_MAD ||
+                        inst->opcode == BRW_OPCODE_BFE ||
+                        inst->opcode == BRW_OPCODE_BFI2);
+   if (is_3src_inst && value.file == UNIFORM)
+      return false;
+
+   if (inst->is_send_from_grf())
+      return false;
+
+   /* We can't copy-propagate a UD negation into a condmod
+    * instruction, because the condmod ends up looking at the 33-bit
+    * signed accumulator value instead of the 32-bit value we wanted.
+    */
+   if (inst->conditional_mod &&
+       value.negate &&
+       value.type == BRW_REGISTER_TYPE_UD)
+      return false;
+
+   /* Don't report progress if this is a noop. */
+   if (value.equals(&inst->src[arg]))
+      return false;
+
+   value.type = inst->src[arg].type;
+   inst->src[arg] = value;
+   return true;
+}
+
+bool
+vec4_visitor::opt_copy_propagation()
+{
+   bool progress = false;
+   src_reg *cur_value[virtual_grf_reg_count][4];
+
+   memset(&cur_value, 0, sizeof(cur_value));
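+
+   /* cur_value[reg][ch] points at the copy source currently mirrored by
+    * channel ch of packed register slot reg, or NULL when that channel's
+    * contents are unknown.
+    */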
+
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      /* This pass only works on basic blocks.  If there's flow
+       * control, throw out all our information and start from
+       * scratch.
+       *
+       * This should really be fixed by using a structure like in
+       * src/glsl/opt_copy_propagation.cpp to track available copies.
+       */
+      if (!is_dominated_by_previous_instruction(inst)) {
+	 memset(cur_value, 0, sizeof(cur_value));
+	 continue;
+      }
+
+      /* For each source arg, see if each component comes from a copy
+       * from the same type file (IMM, GRF, UNIFORM), and try
+       * optimizing out access to the copy result
+       */
+      for (int i = 2; i >= 0; i--) {
+	 /* Copied values end up in GRFs, and we don't track reladdr
+	  * accesses.
+	  */
+	 if (inst->src[i].file != GRF ||
+	     inst->src[i].reladdr)
+	    continue;
+
+	 int reg = (virtual_grf_reg_map[inst->src[i].reg] +
+		    inst->src[i].reg_offset);
+
+	 /* Find the regs that each swizzle component came from.
+	  */
+	 src_reg *values[4];
+	 int c;
+	 for (c = 0; c < 4; c++) {
+	    values[c] = cur_value[reg][BRW_GET_SWZ(inst->src[i].swizzle, c)];
+
+	    /* If there's no available copy for this channel, bail.
+	     * We could be more aggressive here -- some channels might
+	     * not get used based on the destination writemask.
+	     */
+	    if (!values[c])
+	       break;
+
+	    /* We'll only be able to copy propagate if the sources are
+	     * all from the same file -- there's no ability to swizzle
+	     * 0 or 1 constants in with source registers like in i915.
+	     */
+	    if (c > 0 && values[c - 1]->file != values[c]->file)
+	       break;
+	 }
+
+	 if (c != 4)
+	    continue;
+
+	 if (try_constant_propagation(brw, inst, i, values) ||
+	     try_copy_propagation(inst, i, values))
+	    progress = true;
+      }
+
+      /* Track available source registers. */
+      if (inst->dst.file == GRF) {
+	 const int reg =
+	    virtual_grf_reg_map[inst->dst.reg] + inst->dst.reg_offset;
+
+	 /* Update our destination's current channel values.  For a direct copy,
+	  * the value is the newly propagated source.  Otherwise, we don't know
+	  * the new value, so clear it.
+	  */
+	 bool direct_copy = is_direct_copy(inst);
+	 for (int i = 0; i < 4; i++) {
+	    if (inst->dst.writemask & (1 << i)) {
+	       cur_value[reg][i] = direct_copy ? &inst->src[0] : NULL;
+	    }
+	 }
+
+	 /* Clear the records for any registers whose current value came from
+	  * our destination's updated channels, as the two are no longer equal.
+	  */
+	 if (inst->dst.reladdr)
+	    memset(cur_value, 0, sizeof(cur_value));
+	 else {
+	    for (int i = 0; i < virtual_grf_reg_count; i++) {
+	       for (int j = 0; j < 4; j++) {
+		  if (is_channel_updated(inst, cur_value[i], j)) {
+		     cur_value[i][j] = NULL;
+		  }
+	       }
+	    }
+	 }
+      }
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
+
+} /* namespace brw */
diff --git a/icd/intel/compiler/pipeline/brw_vec4_generator.cpp b/icd/intel/compiler/pipeline/brw_vec4_generator.cpp
new file mode 100644
index 0000000..a4f0741
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4_generator.cpp
@@ -0,0 +1,1371 @@
+/* Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_vec4.h"
+
+extern "C" {
+#include "brw_eu.h"
+#include "main/macros.h"
+#include "program/prog_print.h"
+#include "program/prog_parameter.h"
+};
+
+#include "icd-utils.h" // LunarG : ADD
+
+namespace brw {
+
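+/* get_dst() and get_src() lower the IR-level register descriptions into
+ * the hardware brw_reg encodings consumed by the code generator.
+ */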
+struct brw_reg
+vec4_instruction::get_dst(void)
+{
+   struct brw_reg brw_reg;
+
+   switch (dst.file) {
+   case GRF:
+      brw_reg = brw_vec8_grf(dst.reg + dst.reg_offset, 0);
+      brw_reg = retype(brw_reg, dst.type);
+      brw_reg.dw1.bits.writemask = dst.writemask;
+      break;
+
+   case MRF:
+      brw_reg = brw_message_reg(dst.reg + dst.reg_offset);
+      brw_reg = retype(brw_reg, dst.type);
+      brw_reg.dw1.bits.writemask = dst.writemask;
+      break;
+
+   case HW_REG:
+      assert(dst.type == dst.fixed_hw_reg.type);
+      brw_reg = dst.fixed_hw_reg;
+      break;
+
+   case BAD_FILE:
+      brw_reg = brw_null_reg();
+      break;
+
+   default:
+      assert(!"not reached");
+      brw_reg = brw_null_reg();
+      break;
+   }
+   return brw_reg;
+}
+
+struct brw_reg
+vec4_instruction::get_src(const struct brw_vec4_prog_data *prog_data, int i)
+{
+   struct brw_reg brw_reg;
+
+   switch (src[i].file) {
+   case GRF:
+      brw_reg = brw_vec8_grf(src[i].reg + src[i].reg_offset, 0);
+      brw_reg = retype(brw_reg, src[i].type);
+      brw_reg.dw1.bits.swizzle = src[i].swizzle;
+      if (src[i].abs)
+	 brw_reg = brw_abs(brw_reg);
+      if (src[i].negate)
+	 brw_reg = negate(brw_reg);
+      break;
+
+   case IMM:
+      switch (src[i].type) {
+      case BRW_REGISTER_TYPE_F:
+	 brw_reg = brw_imm_f(src[i].imm.f);
+	 break;
+      case BRW_REGISTER_TYPE_D:
+	 brw_reg = brw_imm_d(src[i].imm.i);
+	 break;
+      case BRW_REGISTER_TYPE_UD:
+	 brw_reg = brw_imm_ud(src[i].imm.u);
+	 break;
+      default:
+	 assert(!"not reached");
+	 brw_reg = brw_null_reg();
+	 break;
+      }
+      break;
+
+   case UNIFORM:
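+      /* Uniforms are packed two vec4s per GRF: reg/2 selects the
+       * hardware register and (reg%2)*4 selects which half of it.
+       */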
+      brw_reg = stride(brw_vec4_grf(prog_data->dispatch_grf_start_reg +
+                                    (src[i].reg + src[i].reg_offset) / 2,
+				    ((src[i].reg + src[i].reg_offset) % 2) * 4),
+		       0, 4, 1);
+      brw_reg = retype(brw_reg, src[i].type);
+      brw_reg.dw1.bits.swizzle = src[i].swizzle;
+      if (src[i].abs)
+	 brw_reg = brw_abs(brw_reg);
+      if (src[i].negate)
+	 brw_reg = negate(brw_reg);
+
+      /* This should have been moved to pull constants. */
+      assert(!src[i].reladdr);
+      break;
+
+   case HW_REG:
+      assert(src[i].type == src[i].fixed_hw_reg.type);
+      brw_reg = src[i].fixed_hw_reg;
+      break;
+
+   case BAD_FILE:
+      /* Probably unused. */
+      brw_reg = brw_null_reg();
+      break;
+   case ATTR:
+   default:
+      assert(!"not reached");
+      brw_reg = brw_null_reg();
+      break;
+   }
+
+   return brw_reg;
+}
+
+vec4_generator::vec4_generator(struct brw_context *brw,
+                               struct gl_shader_program *shader_prog,
+                               struct gl_program *prog,
+                               struct brw_vec4_prog_data *prog_data,
+                               void *mem_ctx,
+                               bool debug_flag)
+   : brw(brw), shader_prog(shader_prog), prog(prog), prog_data(prog_data),
+     mem_ctx(mem_ctx), debug_flag(debug_flag)
+{
+   p = rzalloc(mem_ctx, struct brw_compile);
+   brw_init_compile(brw, p, mem_ctx);
+}
+
+vec4_generator::~vec4_generator()
+{
+}
+
+void
+vec4_generator::generate_math1_gen4(vec4_instruction *inst,
+                                    struct brw_reg dst,
+                                    struct brw_reg src)
+{
+   brw_math(p,
+	    dst,
+	    brw_math_function(inst->opcode),
+	    inst->base_mrf,
+	    src,
+	    BRW_MATH_DATA_VECTOR,
+	    BRW_MATH_PRECISION_FULL);
+}
+
+static void
+check_gen6_math_src_arg(struct brw_reg src)
+{
+   /* Source swizzles are ignored. */
+   assert(!src.abs);
+   assert(!src.negate);
+   assert(src.dw1.bits.swizzle == BRW_SWIZZLE_XYZW);
+}
+
+void
+vec4_generator::generate_math1_gen6(vec4_instruction *inst,
+                                    struct brw_reg dst,
+                                    struct brw_reg src)
+{
+   /* Can't do writemask because math can't be align16. */
+   assert(dst.dw1.bits.writemask == WRITEMASK_XYZW);
+   check_gen6_math_src_arg(src);
+
+   brw_set_access_mode(p, BRW_ALIGN_1);
+   brw_math(p,
+	    dst,
+	    brw_math_function(inst->opcode),
+	    inst->base_mrf,
+	    src,
+	    BRW_MATH_DATA_SCALAR,
+	    BRW_MATH_PRECISION_FULL);
+   brw_set_access_mode(p, BRW_ALIGN_16);
+}
+
+void
+vec4_generator::generate_math2_gen7(vec4_instruction *inst,
+                                    struct brw_reg dst,
+                                    struct brw_reg src0,
+                                    struct brw_reg src1)
+{
+   brw_math2(p,
+	     dst,
+	     brw_math_function(inst->opcode),
+	     src0, src1);
+}
+
+void
+vec4_generator::generate_math2_gen6(vec4_instruction *inst,
+                                    struct brw_reg dst,
+                                    struct brw_reg src0,
+                                    struct brw_reg src1)
+{
+   /* Can't do writemask because math can't be align16. */
+   assert(dst.dw1.bits.writemask == WRITEMASK_XYZW);
+   /* Source swizzles are ignored. */
+   check_gen6_math_src_arg(src0);
+   check_gen6_math_src_arg(src1);
+
+   brw_set_access_mode(p, BRW_ALIGN_1);
+   brw_math2(p,
+	     dst,
+	     brw_math_function(inst->opcode),
+	     src0, src1);
+   brw_set_access_mode(p, BRW_ALIGN_16);
+}
+
+void
+vec4_generator::generate_math2_gen4(vec4_instruction *inst,
+                                    struct brw_reg dst,
+                                    struct brw_reg src0,
+                                    struct brw_reg src1)
+{
+   /* From the Ironlake PRM, Volume 4, Part 1, Section 6.1.13
+    * "Message Payload":
+    *
+    * "Operand0[7].  For the INT DIV functions, this operand is the
+    *  denominator."
+    *  ...
+    * "Operand1[7].  For the INT DIV functions, this operand is the
+    *  numerator."
+    */
+   bool is_int_div = inst->opcode != SHADER_OPCODE_POW;
+   struct brw_reg &op0 = is_int_div ? src1 : src0;
+   struct brw_reg &op1 = is_int_div ? src0 : src1;
+
+   brw_push_insn_state(p);
+   brw_set_saturate(p, false);
+   brw_set_predicate_control(p, BRW_PREDICATE_NONE);
+   brw_MOV(p, retype(brw_message_reg(inst->base_mrf + 1), op1.type), op1);
+   brw_pop_insn_state(p);
+
+   brw_math(p,
+	    dst,
+	    brw_math_function(inst->opcode),
+	    inst->base_mrf,
+	    op0,
+	    BRW_MATH_DATA_VECTOR,
+	    BRW_MATH_PRECISION_FULL);
+}
+
+void
+vec4_generator::generate_tex(vec4_instruction *inst,
+                             struct brw_reg dst,
+                             struct brw_reg src)
+{
+   int msg_type = -1;
+
+   if (brw->gen >= 5) {
+      switch (inst->opcode) {
+      case SHADER_OPCODE_TEX:
+      case SHADER_OPCODE_TXL:
+	 if (inst->shadow_compare) {
+	    msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LOD_COMPARE;
+	 } else {
+	    msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LOD;
+	 }
+	 break;
+      case SHADER_OPCODE_TXD:
+         if (inst->shadow_compare) {
+            /* Gen7.5+.  Otherwise, lowered by brw_lower_texture_gradients(). */
+            assert(brw->is_haswell);
+            msg_type = HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE;
+         } else {
+            msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_DERIVS;
+         }
+	 break;
+      case SHADER_OPCODE_TXF:
+	 msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
+	 break;
+      case SHADER_OPCODE_TXF_CMS:
+         if (brw->gen >= 7)
+            msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS;
+         else
+            msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
+         break;
+      case SHADER_OPCODE_TXF_MCS:
+         assert(brw->gen >= 7);
+         msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS;
+         break;
+      case SHADER_OPCODE_TXS:
+	 msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO;
+	 break;
+      case SHADER_OPCODE_TG4:
+         if (inst->shadow_compare) {
+            msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_C;
+         } else {
+            msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4;
+         }
+         break;
+      case SHADER_OPCODE_TG4_OFFSET:
+         if (inst->shadow_compare) {
+            msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C;
+         } else {
+            msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO;
+         }
+         break;
+      default:
+	 assert(!"should not get here: invalid vec4 texture opcode");
+	 break;
+      }
+   } else {
+      switch (inst->opcode) {
+      case SHADER_OPCODE_TEX:
+      case SHADER_OPCODE_TXL:
+	 if (inst->shadow_compare) {
+	    msg_type = BRW_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_LOD_COMPARE;
+	    assert(inst->mlen == 3);
+	 } else {
+	    msg_type = BRW_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_LOD;
+	    assert(inst->mlen == 2);
+	 }
+	 break;
+      case SHADER_OPCODE_TXD:
+	 /* There is no sample_d_c message; comparisons are done manually. */
+	 msg_type = BRW_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_GRADIENTS;
+	 assert(inst->mlen == 4);
+	 break;
+      case SHADER_OPCODE_TXF:
+	 msg_type = BRW_SAMPLER_MESSAGE_SIMD4X2_LD;
+	 assert(inst->mlen == 2);
+	 break;
+      case SHADER_OPCODE_TXS:
+	 msg_type = BRW_SAMPLER_MESSAGE_SIMD4X2_RESINFO;
+	 assert(inst->mlen == 2);
+	 break;
+      default:
+	 assert(!"should not get here: invalid vec4 texture opcode");
+	 break;
+      }
+   }
+
+   assert(msg_type != -1);
+
+   /* Load the message header if present.  If there's a texture offset, we need
+    * to set it up explicitly and load the offset bitfield.  Otherwise, we can
+    * use an implied move from g0 to the first message register.
+    */
+   if (inst->header_present) {
+      if (brw->gen < 6 && !inst->texture_offset) {
+         /* Set up an implied move from g0 to the MRF. */
+         src = brw_vec8_grf(0, 0);
+      } else {
+         struct brw_reg header =
+            retype(brw_message_reg(inst->base_mrf), BRW_REGISTER_TYPE_UD);
+
+         /* Explicitly set up the message header by copying g0 to the MRF. */
+         brw_push_insn_state(p);
+         brw_set_mask_control(p, BRW_MASK_DISABLE);
+         brw_MOV(p, header, retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
+
+         brw_set_access_mode(p, BRW_ALIGN_1);
+
+         if (inst->texture_offset) {
+            /* Set the texel offset bits in DWord 2. */
+            brw_MOV(p, get_element_ud(header, 2),
+                    brw_imm_ud(inst->texture_offset));
+         }
+
+         if (inst->sampler >= 16) {
+            /* The "Sampler Index" field can only store values between 0 and 15.
+             * However, we can add an offset to the "Sampler State Pointer"
+             * field, effectively selecting a different set of 16 samplers.
+             *
+             * The "Sampler State Pointer" needs to be aligned to a 32-byte
+             * offset, and each sampler state is only 16 bytes, so we can't
+             * exclusively use the offset; we have to use both.
+             */
+            assert(brw->is_haswell); /* field only exists on Haswell */
+            brw_ADD(p,
+                    get_element_ud(header, 3),
+                    get_element_ud(brw_vec8_grf(0, 0), 3),
+                    brw_imm_ud(16 * (inst->sampler / 16) *
+                               sizeof(gen7_sampler_state)));
+         }
+         brw_pop_insn_state(p);
+      }
+   }
+
+   uint32_t return_format;
+
+   switch (dst.type) {
+   case BRW_REGISTER_TYPE_D:
+      return_format = BRW_SAMPLER_RETURN_FORMAT_SINT32;
+      break;
+   case BRW_REGISTER_TYPE_UD:
+      return_format = BRW_SAMPLER_RETURN_FORMAT_UINT32;
+      break;
+   default:
+      return_format = BRW_SAMPLER_RETURN_FORMAT_FLOAT32;
+      break;
+   }
+
+   uint32_t surface_index = ((inst->opcode == SHADER_OPCODE_TG4 ||
+      inst->opcode == SHADER_OPCODE_TG4_OFFSET)
+      ? prog_data->base.binding_table.gather_texture_start
+      : prog_data->base.binding_table.texture_start) + inst->sampler;
+
+   brw_SAMPLE(p,
+	      dst,
+	      inst->base_mrf,
+	      src,
+              surface_index,
+	      inst->sampler % 16,
+	      msg_type,
+	      1, /* response length */
+	      inst->mlen,
+	      inst->header_present,
+	      BRW_SAMPLER_SIMD_MODE_SIMD4X2,
+	      return_format);
+
+   brw_mark_surface_used(&prog_data->base, surface_index);
+}
+
+void
+vec4_generator::generate_vs_urb_write(vec4_instruction *inst)
+{
+   brw_urb_WRITE(p,
+		 brw_null_reg(), /* dest */
+		 inst->base_mrf, /* starting mrf reg nr */
+		 brw_vec8_grf(0, 0), /* src */
+                 inst->urb_write_flags,
+		 inst->mlen,
+		 0,		/* response len */
+		 inst->offset,	/* urb destination offset */
+		 BRW_URB_SWIZZLE_INTERLEAVE);
+}
+
+void
+vec4_generator::generate_gs_urb_write(vec4_instruction *inst)
+{
+   struct brw_reg src = brw_message_reg(inst->base_mrf);
+   brw_urb_WRITE(p,
+                 brw_null_reg(), /* dest */
+                 inst->base_mrf, /* starting mrf reg nr */
+                 src,
+                 inst->urb_write_flags,
+                 inst->mlen,
+                 0,             /* response len */
+                 inst->offset,  /* urb destination offset */
+                 BRW_URB_SWIZZLE_INTERLEAVE);
+}
+
+void
+vec4_generator::generate_gs_thread_end(vec4_instruction *inst)
+{
+   struct brw_reg src = brw_message_reg(inst->base_mrf);
+   brw_urb_WRITE(p,
+                 brw_null_reg(), /* dest */
+                 inst->base_mrf, /* starting mrf reg nr */
+                 src,
+                 BRW_URB_WRITE_EOT,
+                 1,              /* message len */
+                 0,              /* response len */
+                 0,              /* urb destination offset */
+                 BRW_URB_SWIZZLE_INTERLEAVE);
+}
+
+void
+vec4_generator::generate_gs_set_write_offset(struct brw_reg dst,
+                                             struct brw_reg src0,
+                                             struct brw_reg src1)
+{
+   /* From p22 of volume 4 part 2 of the Ivy Bridge PRM (2.4.3.1 Message
+    * Header: M0.3):
+    *
+    *     Slot 0 Offset. This field, after adding to the Global Offset field
+    *     in the message descriptor, specifies the offset (in 256-bit units)
+    *     from the start of the URB entry, as referenced by URB Handle 0, at
+    *     which the data will be accessed.
+    *
+    * Similar text describes DWORD M0.4, which is slot 1 offset.
+    *
+    * Therefore, we want to multiply DWORDs 0 and 4 of src0 (the x components
+    * of the register for geometry shader invocations 0 and 1) by the
+    * immediate value in src1, and store the result in DWORDs 3 and 4 of dst.
+    *
+    * We can do this with the following EU instruction:
+    *
+    *     mul(2) dst.3<1>UD src0<8;2,4>UD src1   { Align1 WE_all }
+    */
+   brw_push_insn_state(p);
+   brw_set_access_mode(p, BRW_ALIGN_1);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   brw_MUL(p, suboffset(stride(dst, 2, 2, 1), 3), stride(src0, 8, 2, 4),
+           src1);
+   brw_set_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
+vec4_generator::generate_gs_set_vertex_count(struct brw_reg dst,
+                                             struct brw_reg src)
+{
+   brw_push_insn_state(p);
+   brw_set_access_mode(p, BRW_ALIGN_1);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+
+   /* If we think of the src and dst registers as composed of 8 DWORDs each,
+    * we want to pick up the contents of DWORDs 0 and 4 from src, truncate
+    * them to WORDs, and then pack them into DWORD 2 of dst.
+    *
+    * It's easier to get the EU to do this if we think of the src and dst
+    * registers as composed of 16 WORDS each; then, we want to pick up the
+    * contents of WORDs 0 and 8 from src, and pack them into WORDs 4 and 5 of
+    * dst.
+    *
+    * We can do that by the following EU instruction:
+    *
+    *     mov (2) dst.4<1>:uw src<8;1,0>:uw   { Align1, Q1, NoMask }
+    */
+   brw_MOV(p, suboffset(stride(retype(dst, BRW_REGISTER_TYPE_UW), 2, 2, 1), 4),
+           stride(retype(src, BRW_REGISTER_TYPE_UW), 8, 1, 0));
+   brw_set_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
+vec4_generator::generate_gs_set_dword_2_immed(struct brw_reg dst,
+                                              struct brw_reg src)
+{
+   assert(src.file == BRW_IMMEDIATE_VALUE);
+
+   brw_push_insn_state(p);
+   brw_set_access_mode(p, BRW_ALIGN_1);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   brw_MOV(p, suboffset(vec1(dst), 2), src);
+   brw_set_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
+vec4_generator::generate_gs_prepare_channel_masks(struct brw_reg dst)
+{
+   /* We want to left shift just DWORD 4 (the x component belonging to the
+    * second geometry shader invocation) by 4 bits.  So generate the
+    * instruction:
+    *
+    *     shl(1) dst.4<1>UD dst.4<0,1,0>UD 4UD { align1 WE_all }
+    */
+   dst = suboffset(vec1(dst), 4);
+   brw_push_insn_state(p);
+   brw_set_access_mode(p, BRW_ALIGN_1);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   brw_SHL(p, dst, dst, brw_imm_ud(4));
+   brw_pop_insn_state(p);
+}
+
+void
+vec4_generator::generate_gs_set_channel_masks(struct brw_reg dst,
+                                              struct brw_reg src)
+{
+   /* From p21 of volume 4 part 2 of the Ivy Bridge PRM (2.4.3.1 Message
+    * Header: M0.5):
+    *
+    *     15 Vertex 1 DATA [3] / Vertex 0 DATA[7] Channel Mask
+    *
+    *        When Swizzle Control = URB_INTERLEAVED this bit controls Vertex 1
+    *        DATA[3], when Swizzle Control = URB_NOSWIZZLE this bit controls
+    *        Vertex 0 DATA[7].  This bit is ANDed with the corresponding
+    *        channel enable to determine the final channel enable.  For the
+    *        URB_READ_OWORD & URB_READ_HWORD messages, when final channel
+    *        enable is 1 it indicates that Vertex 1 DATA [3] will be included
+    *        in the writeback message.  For the URB_WRITE_OWORD &
+    *        URB_WRITE_HWORD messages, when final channel enable is 1 it
+    *        indicates that Vertex 1 DATA [3] will be written to the surface.
+    *
+    *        0: Vertex 1 DATA [3] / Vertex 0 DATA[7] channel not included
+    *        1: Vertex DATA [3] / Vertex 0 DATA[7] channel included
+    *
+    *     14 Vertex 1 DATA [2] Channel Mask
+    *     13 Vertex 1 DATA [1] Channel Mask
+    *     12 Vertex 1 DATA [0] Channel Mask
+    *     11 Vertex 0 DATA [3] Channel Mask
+    *     10 Vertex 0 DATA [2] Channel Mask
+    *      9 Vertex 0 DATA [1] Channel Mask
+    *      8 Vertex 0 DATA [0] Channel Mask
+    *
+    * (This is from a section of the PRM that is agnostic to the particular
+    * type of shader being executed, so "Vertex 0" and "Vertex 1" refer to
+    * geometry shader invocations 0 and 1, respectively).  Since we have the
+    * enable flags for geometry shader invocation 0 in bits 3:0 of DWORD 0,
+    * and the enable flags for geometry shader invocation 1 in bits 7:0 of
+    * DWORD 4, we just need to OR them together and store the result in bits
+    * 15:8 of DWORD 5.
+    *
+    * It's easier to get the EU to do this if we think of the src and dst
+    * registers as composed of 32 bytes each; then, we want to pick up the
+    * contents of bytes 0 and 16 from src, OR them together, and store them in
+    * byte 21.
+    *
+    * We can do that by the following EU instruction:
+    *
+    *     or(1) dst.21<1>UB src<0,1,0>UB src.16<0,1,0>UB { align1 WE_all }
+    *
+    * Note: this relies on the source register having zeros in (a) bits 7:4 of
+    * DWORD 0 and (b) bits 3:0 of DWORD 4.  We can rely on (b) because the
+    * source register was prepared by GS_OPCODE_PREPARE_CHANNEL_MASKS (which
+    * shifts DWORD 4 left by 4 bits), and we can rely on (a) because prior to
+    * the execution of GS_OPCODE_PREPARE_CHANNEL_MASKS, DWORDs 0 and 4 need to
+    * contain valid channel mask values (which are in the range 0x0-0xf).
+    */
+   dst = retype(dst, BRW_REGISTER_TYPE_UB);
+   src = retype(src, BRW_REGISTER_TYPE_UB);
+   brw_push_insn_state(p);
+   brw_set_access_mode(p, BRW_ALIGN_1);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   brw_OR(p, suboffset(vec1(dst), 21), vec1(src), suboffset(vec1(src), 16));
+   brw_pop_insn_state(p);
+}
+
+void
+vec4_generator::generate_gs_get_instance_id(struct brw_reg dst)
+{
+   /* We want to right shift R0.0 & R0.1 by GEN7_GS_PAYLOAD_INSTANCE_ID_SHIFT
+    * and store into dst.0 & dst.4. So generate the instruction:
+    *
+    *     shr(8) dst<1> R0<1,4,0> GEN7_GS_PAYLOAD_INSTANCE_ID_SHIFT { align1 WE_normal 1Q }
+    */
+   brw_push_insn_state(p);
+   brw_set_access_mode(p, BRW_ALIGN_1);
+   dst = retype(dst, BRW_REGISTER_TYPE_UD);
+   struct brw_reg r0(retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
+   brw_SHR(p, dst, stride(r0, 1, 4, 0),
+           brw_imm_ud(GEN7_GS_PAYLOAD_INSTANCE_ID_SHIFT));
+   brw_pop_insn_state(p);
+}
+
+void
+vec4_generator::generate_oword_dual_block_offsets(struct brw_reg m1,
+                                                  struct brw_reg index)
+{
+   int second_vertex_offset;
+
+   if (brw->gen >= 6)
+      second_vertex_offset = 1;
+   else
+      second_vertex_offset = 16;
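+
+   /* The second vertex's data begins one OWord past the first; gen6+
+    * message headers express these block offsets in OWords, while older
+    * parts use bytes (16 bytes == one OWord).
+    */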
+
+   m1 = retype(m1, BRW_REGISTER_TYPE_D);
+
+   /* Set up M1 (message payload).  Only the block offsets in M1.0 and
+    * M1.4 are used, and the rest are ignored.
+    */
+   struct brw_reg m1_0 = suboffset(vec1(m1), 0);
+   struct brw_reg m1_4 = suboffset(vec1(m1), 4);
+   struct brw_reg index_0 = suboffset(vec1(index), 0);
+   struct brw_reg index_4 = suboffset(vec1(index), 4);
+
+   brw_push_insn_state(p);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   brw_set_access_mode(p, BRW_ALIGN_1);
+
+   brw_MOV(p, m1_0, index_0);
+
+   if (index.file == BRW_IMMEDIATE_VALUE) {
+      index_4.dw1.ud += second_vertex_offset;
+      brw_MOV(p, m1_4, index_4);
+   } else {
+      brw_ADD(p, m1_4, index_4, brw_imm_d(second_vertex_offset));
+   }
+
+   brw_pop_insn_state(p);
+}
+
+void
+vec4_generator::generate_unpack_flags(vec4_instruction *inst,
+                                      struct brw_reg dst)
+{
+   brw_push_insn_state(p);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   brw_set_access_mode(p, BRW_ALIGN_1);
+
+   struct brw_reg flags = brw_flag_reg(0, 0);
+   struct brw_reg dst_0 = suboffset(vec1(dst), 0);
+   struct brw_reg dst_4 = suboffset(vec1(dst), 4);
+
+   brw_AND(p, dst_0, flags, brw_imm_ud(0x0f));
+   brw_AND(p, dst_4, flags, brw_imm_ud(0xf0));
+   brw_SHR(p, dst_4, dst_4, brw_imm_ud(4));
+
+   brw_pop_insn_state(p);
+}
+
+void
+vec4_generator::generate_scratch_read(vec4_instruction *inst,
+                                      struct brw_reg dst,
+                                      struct brw_reg index)
+{
+   struct brw_reg header = brw_vec8_grf(0, 0);
+
+   gen6_resolve_implied_move(p, &header, inst->base_mrf);
+
+   generate_oword_dual_block_offsets(brw_message_reg(inst->base_mrf + 1),
+				     index);
+
+   uint32_t msg_type;
+
+   if (brw->gen >= 6)
+      msg_type = GEN6_DATAPORT_READ_MESSAGE_OWORD_DUAL_BLOCK_READ;
+   else if (brw->gen == 5 || brw->is_g4x)
+      msg_type = G45_DATAPORT_READ_MESSAGE_OWORD_DUAL_BLOCK_READ;
+   else
+      msg_type = BRW_DATAPORT_READ_MESSAGE_OWORD_DUAL_BLOCK_READ;
+
+   /* Each of the 8 channel enables is considered for whether each
+    * dword is written.
+    */
+   struct brw_instruction *send = brw_next_insn(p, BRW_OPCODE_SEND);
+   brw_set_dest(p, send, dst);
+   brw_set_src0(p, send, header);
+   if (brw->gen < 6)
+      send->header.destreg__conditionalmod = inst->base_mrf;
+   brw_set_dp_read_message(p, send,
+			   255, /* binding table index: stateless access */
+			   BRW_DATAPORT_OWORD_DUAL_BLOCK_1OWORD,
+			   msg_type,
+			   BRW_DATAPORT_READ_TARGET_RENDER_CACHE,
+			   2, /* mlen */
+                           true, /* header_present */
+			   1 /* rlen */);
+}
+
+void
+vec4_generator::generate_scratch_write(vec4_instruction *inst,
+                                       struct brw_reg dst,
+                                       struct brw_reg src,
+                                       struct brw_reg index)
+{
+   struct brw_reg header = brw_vec8_grf(0, 0);
+   bool write_commit;
+
+   /* If the instruction is predicated, we'll predicate the send, not
+    * the header setup.
+    */
+   brw_set_predicate_control(p, BRW_PREDICATE_NONE);
+
+   gen6_resolve_implied_move(p, &header, inst->base_mrf);
+
+   generate_oword_dual_block_offsets(brw_message_reg(inst->base_mrf + 1),
+				     index);
+
+   brw_MOV(p,
+	   retype(brw_message_reg(inst->base_mrf + 2), BRW_REGISTER_TYPE_D),
+	   retype(src, BRW_REGISTER_TYPE_D));
+
+   uint32_t msg_type;
+
+   if (brw->gen >= 7)
+      msg_type = GEN7_DATAPORT_WRITE_MESSAGE_OWORD_DUAL_BLOCK_WRITE;
+   else if (brw->gen == 6)
+      msg_type = GEN6_DATAPORT_WRITE_MESSAGE_OWORD_DUAL_BLOCK_WRITE;
+   else
+      msg_type = BRW_DATAPORT_WRITE_MESSAGE_OWORD_DUAL_BLOCK_WRITE;
+
+   brw_set_predicate_control(p, inst->predicate);
+
+   /* Pre-gen6, we have to specify write commits to ensure ordering
+    * between reads and writes within a thread.  Afterwards, that's
+    * guaranteed and write commits only matter for inter-thread
+    * synchronization.
+    */
+   if (brw->gen >= 6) {
+      write_commit = false;
+   } else {
+      /* The visitor set up our destination register to be g0.  This
+       * means that when the next read comes along, we will end up
+       * reading from g0 and causing a block on the write commit.  For
+       * write-after-read, we are relying on the value of the previous
+       * read being used (and thus blocking on completion) before our
+       * write is executed.  This means we have to be careful in
+       * instruction scheduling to not violate this assumption.
+       */
+      write_commit = true;
+   }
+
+   /* Each of the 8 channel enables is considered for whether each
+    * dword is written.
+    */
+   struct brw_instruction *send = brw_next_insn(p, BRW_OPCODE_SEND);
+   brw_set_dest(p, send, dst);
+   brw_set_src0(p, send, header);
+   if (brw->gen < 6)
+      send->header.destreg__conditionalmod = inst->base_mrf;
+   brw_set_dp_write_message(p, send,
+			    255, /* binding table index: stateless access */
+			    BRW_DATAPORT_OWORD_DUAL_BLOCK_1OWORD,
+			    msg_type,
+			    3, /* mlen */
+			    true, /* header present */
+			    false, /* not a render target write */
+			    write_commit, /* rlen */
+			    false, /* eot */
+			    write_commit /* send_commit_msg */);
+}
+
+void
+vec4_generator::generate_pull_constant_load(vec4_instruction *inst,
+                                            struct brw_reg dst,
+                                            struct brw_reg index,
+                                            struct brw_reg offset)
+{
+   assert(brw->gen <= 7);
+   assert(index.file == BRW_IMMEDIATE_VALUE &&
+	  index.type == BRW_REGISTER_TYPE_UD);
+   uint32_t surf_index = index.dw1.ud;
+
+   struct brw_reg header = brw_vec8_grf(0, 0);
+
+   gen6_resolve_implied_move(p, &header, inst->base_mrf);
+
+   brw_MOV(p, retype(brw_message_reg(inst->base_mrf + 1), BRW_REGISTER_TYPE_D),
+	   offset);
+
+   uint32_t msg_type;
+
+   if (brw->gen >= 6)
+      msg_type = GEN6_DATAPORT_READ_MESSAGE_OWORD_DUAL_BLOCK_READ;
+   else if (brw->gen == 5 || brw->is_g4x)
+      msg_type = G45_DATAPORT_READ_MESSAGE_OWORD_DUAL_BLOCK_READ;
+   else
+      msg_type = BRW_DATAPORT_READ_MESSAGE_OWORD_DUAL_BLOCK_READ;
+
+   /* Each of the 8 channel enables is considered for whether each
+    * dword is written.
+    */
+   struct brw_instruction *send = brw_next_insn(p, BRW_OPCODE_SEND);
+   brw_set_dest(p, send, dst);
+   brw_set_src0(p, send, header);
+   if (brw->gen < 6)
+      send->header.destreg__conditionalmod = inst->base_mrf;
+   brw_set_dp_read_message(p, send,
+			   surf_index,
+			   BRW_DATAPORT_OWORD_DUAL_BLOCK_1OWORD,
+			   msg_type,
+			   BRW_DATAPORT_READ_TARGET_DATA_CACHE,
+			   2, /* mlen */
+                           true, /* header_present */
+			   1 /* rlen */);
+
+   brw_mark_surface_used(&prog_data->base, surf_index);
+}
+
+void
+vec4_generator::generate_pull_constant_load_gen7(vec4_instruction *inst,
+                                                 struct brw_reg dst,
+                                                 struct brw_reg surf_index,
+                                                 struct brw_reg offset)
+{
+   assert(surf_index.file == BRW_IMMEDIATE_VALUE &&
+	  surf_index.type == BRW_REGISTER_TYPE_UD);
+
+   brw_instruction *insn = brw_next_insn(p, BRW_OPCODE_SEND);
+   brw_set_dest(p, insn, dst);
+   brw_set_src0(p, insn, offset);
+   brw_set_sampler_message(p, insn,
+                           surf_index.dw1.ud,
+                           0, /* LD message ignores sampler unit */
+                           GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
+                           1, /* rlen */
+                           1, /* mlen */
+                           false, /* no header */
+                           BRW_SAMPLER_SIMD_MODE_SIMD4X2,
+                           0);
+
+   brw_mark_surface_used(&prog_data->base, surf_index.dw1.ud);
+}
+
+void
+vec4_generator::generate_untyped_atomic(vec4_instruction *inst,
+                                        struct brw_reg dst,
+                                        struct brw_reg atomic_op,
+                                        struct brw_reg surf_index)
+{
+   assert(atomic_op.file == BRW_IMMEDIATE_VALUE &&
+          atomic_op.type == BRW_REGISTER_TYPE_UD &&
+          surf_index.file == BRW_IMMEDIATE_VALUE &&
+	  surf_index.type == BRW_REGISTER_TYPE_UD);
+
+   brw_untyped_atomic(p, dst, brw_message_reg(inst->base_mrf),
+                      atomic_op.dw1.ud, surf_index.dw1.ud,
+                      inst->mlen, 1);
+
+   brw_mark_surface_used(&prog_data->base, surf_index.dw1.ud);
+}
+
+void
+vec4_generator::generate_untyped_surface_read(vec4_instruction *inst,
+                                              struct brw_reg dst,
+                                              struct brw_reg surf_index)
+{
+   assert(surf_index.file == BRW_IMMEDIATE_VALUE &&
+	  surf_index.type == BRW_REGISTER_TYPE_UD);
+
+   brw_untyped_surface_read(p, dst, brw_message_reg(inst->base_mrf),
+                            surf_index.dw1.ud,
+                            inst->mlen, 1);
+
+   brw_mark_surface_used(&prog_data->base, surf_index.dw1.ud);
+}
+
+/**
+ * Generate assembly for a Vec4 IR instruction.
+ *
+ * \param instruction The Vec4 IR instruction to generate code for.
+ * \param dst         The destination register.
+ * \param src         An array of up to three source registers.
+ */
+void
+vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
+                                          struct brw_reg dst,
+                                          struct brw_reg *src)
+{
+   vec4_instruction *inst = (vec4_instruction *) instruction;
+
+   if (dst.width == BRW_WIDTH_4) {
+      /* This happens in attribute fixups for "dual instanced" geometry
+       * shaders, since they use attributes that are vec4's.  Since the exec
+       * width is only 4, it's essential that the caller set
+       * force_writemask_all in order to make sure the instruction is executed
+       * regardless of which channels are enabled.
+       */
+      assert(inst->force_writemask_all);
+
+      /* Fix up any <8;8,1> or <0;4,1> source registers to <4;4,1> to satisfy
+       * the following register region restrictions (from Graphics BSpec:
+       * 3D-Media-GPGPU Engine > EU Overview > Registers and Register Regions
+       * > Register Region Restrictions)
+       *
+       *     "1. ExecSize must be greater than or equal to Width.
+       *
+       *     2. If ExecSize = Width and HorzStride != 0, VertStride must be set
+       *        to Width * HorzStride."
+       */
+      for (int i = 0; i < 3; i++) {
+         if (src[i].file == BRW_GENERAL_REGISTER_FILE)
+            src[i] = stride(src[i], 4, 4, 1);
+      }
+   }
+
+   switch (inst->opcode) {
+   case BRW_OPCODE_MOV:
+      brw_MOV(p, dst, src[0]);
+      break;
+   case BRW_OPCODE_ADD:
+      brw_ADD(p, dst, src[0], src[1]);
+      break;
+   case BRW_OPCODE_MUL:
+      brw_MUL(p, dst, src[0], src[1]);
+      break;
+   case BRW_OPCODE_MACH:
+      brw_MACH(p, dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_MAD:
+      assert(brw->gen >= 6);
+      brw_MAD(p, dst, src[0], src[1], src[2]);
+      break;
+
+   case BRW_OPCODE_FRC:
+      brw_FRC(p, dst, src[0]);
+      break;
+   case BRW_OPCODE_RNDD:
+      brw_RNDD(p, dst, src[0]);
+      break;
+   case BRW_OPCODE_RNDE:
+      brw_RNDE(p, dst, src[0]);
+      break;
+   case BRW_OPCODE_RNDZ:
+      brw_RNDZ(p, dst, src[0]);
+      break;
+
+   case BRW_OPCODE_AND:
+      brw_AND(p, dst, src[0], src[1]);
+      break;
+   case BRW_OPCODE_OR:
+      brw_OR(p, dst, src[0], src[1]);
+      break;
+   case BRW_OPCODE_XOR:
+      brw_XOR(p, dst, src[0], src[1]);
+      break;
+   case BRW_OPCODE_NOT:
+      brw_NOT(p, dst, src[0]);
+      break;
+   case BRW_OPCODE_ASR:
+      brw_ASR(p, dst, src[0], src[1]);
+      break;
+   case BRW_OPCODE_SHR:
+      brw_SHR(p, dst, src[0], src[1]);
+      break;
+   case BRW_OPCODE_SHL:
+      brw_SHL(p, dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_CMP:
+      brw_CMP(p, dst, inst->conditional_mod, src[0], src[1]);
+      break;
+   case BRW_OPCODE_SEL:
+      brw_SEL(p, dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_DPH:
+      brw_DPH(p, dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_DP4:
+      brw_DP4(p, dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_DP3:
+      brw_DP3(p, dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_DP2:
+      brw_DP2(p, dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_F32TO16:
+      assert(brw->gen >= 7);
+      brw_F32TO16(p, dst, src[0]);
+      break;
+
+   case BRW_OPCODE_F16TO32:
+      assert(brw->gen >= 7);
+      brw_F16TO32(p, dst, src[0]);
+      break;
+
+   case BRW_OPCODE_LRP:
+      assert(brw->gen >= 6);
+      brw_LRP(p, dst, src[0], src[1], src[2]);
+      break;
+
+   case BRW_OPCODE_BFREV:
+      assert(brw->gen >= 7);
+      /* BFREV only supports UD type for src and dst. */
+      brw_BFREV(p, retype(dst, BRW_REGISTER_TYPE_UD),
+                   retype(src[0], BRW_REGISTER_TYPE_UD));
+      break;
+   case BRW_OPCODE_FBH:
+      assert(brw->gen >= 7);
+      /* FBH only supports UD type for dst. */
+      brw_FBH(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
+      break;
+   case BRW_OPCODE_FBL:
+      assert(brw->gen >= 7);
+      /* FBL only supports UD type for dst. */
+      brw_FBL(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
+      break;
+   case BRW_OPCODE_CBIT:
+      assert(brw->gen >= 7);
+      /* CBIT only supports UD type for dst. */
+      brw_CBIT(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
+      break;
+   case BRW_OPCODE_ADDC:
+      assert(brw->gen >= 7);
+      brw_ADDC(p, dst, src[0], src[1]);
+      break;
+   case BRW_OPCODE_SUBB:
+      assert(brw->gen >= 7);
+      brw_SUBB(p, dst, src[0], src[1]);
+      break;
+   case BRW_OPCODE_MAC:
+      brw_MAC(p, dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_BFE:
+      assert(brw->gen >= 7);
+      brw_BFE(p, dst, src[0], src[1], src[2]);
+      break;
+
+   case BRW_OPCODE_BFI1:
+      assert(brw->gen >= 7);
+      brw_BFI1(p, dst, src[0], src[1]);
+      break;
+   case BRW_OPCODE_BFI2:
+      assert(brw->gen >= 7);
+      brw_BFI2(p, dst, src[0], src[1], src[2]);
+      break;
+
+   case BRW_OPCODE_IF:
+      if (inst->src[0].file != BAD_FILE) {
+         /* The instruction has an embedded compare (only allowed on gen6) */
+         assert(brw->gen == 6);
+         gen6_IF(p, inst->conditional_mod, src[0], src[1]);
+      } else {
+         struct brw_instruction *brw_inst = brw_IF(p, BRW_EXECUTE_8);
+         brw_inst->header.predicate_control = inst->predicate;
+      }
+      break;
+
+   case BRW_OPCODE_ELSE:
+      brw_ELSE(p);
+      break;
+   case BRW_OPCODE_ENDIF:
+      brw_ENDIF(p);
+      break;
+
+   case BRW_OPCODE_DO:
+      brw_DO(p, BRW_EXECUTE_8);
+      break;
+
+   case BRW_OPCODE_BREAK:
+      brw_BREAK(p);
+      brw_set_predicate_control(p, BRW_PREDICATE_NONE);
+      break;
+   case BRW_OPCODE_CONTINUE:
+      /* FINISHME: We still need to write the loop instruction support. */
+      if (brw->gen >= 6)
+         gen6_CONT(p);
+      else
+         brw_CONT(p);
+      brw_set_predicate_control(p, BRW_PREDICATE_NONE);
+      break;
+
+   case BRW_OPCODE_WHILE:
+      brw_WHILE(p);
+      break;
+
+   case SHADER_OPCODE_RCP:
+   case SHADER_OPCODE_RSQ:
+   case SHADER_OPCODE_SQRT:
+   case SHADER_OPCODE_EXP2:
+   case SHADER_OPCODE_LOG2:
+   case SHADER_OPCODE_SIN:
+   case SHADER_OPCODE_COS:
+      if (brw->gen == 6) {
+	 generate_math1_gen6(inst, dst, src[0]);
+      } else {
+	 /* Also works for Gen7. */
+	 generate_math1_gen4(inst, dst, src[0]);
+      }
+      break;
+
+   case SHADER_OPCODE_POW:
+   case SHADER_OPCODE_INT_QUOTIENT:
+   case SHADER_OPCODE_INT_REMAINDER:
+      if (brw->gen >= 7) {
+	 generate_math2_gen7(inst, dst, src[0], src[1]);
+      } else if (brw->gen == 6) {
+	 generate_math2_gen6(inst, dst, src[0], src[1]);
+      } else {
+	 generate_math2_gen4(inst, dst, src[0], src[1]);
+      }
+      break;
+
+   case SHADER_OPCODE_TEX:
+   case SHADER_OPCODE_TXD:
+   case SHADER_OPCODE_TXF:
+   case SHADER_OPCODE_TXF_CMS:
+   case SHADER_OPCODE_TXF_MCS:
+   case SHADER_OPCODE_TXL:
+   case SHADER_OPCODE_TXS:
+   case SHADER_OPCODE_TG4:
+   case SHADER_OPCODE_TG4_OFFSET:
+      generate_tex(inst, dst, src[0]);
+      break;
+
+   case VS_OPCODE_URB_WRITE:
+      generate_vs_urb_write(inst);
+      break;
+
+   case SHADER_OPCODE_GEN4_SCRATCH_READ:
+      generate_scratch_read(inst, dst, src[0]);
+      break;
+
+   case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
+      generate_scratch_write(inst, dst, src[0], src[1]);
+      break;
+
+   case VS_OPCODE_PULL_CONSTANT_LOAD:
+      generate_pull_constant_load(inst, dst, src[0], src[1]);
+      break;
+
+   case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7:
+      generate_pull_constant_load_gen7(inst, dst, src[0], src[1]);
+      break;
+
+   case GS_OPCODE_URB_WRITE:
+      generate_gs_urb_write(inst);
+      break;
+
+   case GS_OPCODE_THREAD_END:
+      generate_gs_thread_end(inst);
+      break;
+
+   case GS_OPCODE_SET_WRITE_OFFSET:
+      generate_gs_set_write_offset(dst, src[0], src[1]);
+      break;
+
+   case GS_OPCODE_SET_VERTEX_COUNT:
+      generate_gs_set_vertex_count(dst, src[0]);
+      break;
+
+   case GS_OPCODE_SET_DWORD_2_IMMED:
+      generate_gs_set_dword_2_immed(dst, src[0]);
+      break;
+
+   case GS_OPCODE_PREPARE_CHANNEL_MASKS:
+      generate_gs_prepare_channel_masks(dst);
+      break;
+
+   case GS_OPCODE_SET_CHANNEL_MASKS:
+      generate_gs_set_channel_masks(dst, src[0]);
+      break;
+
+   case GS_OPCODE_GET_INSTANCE_ID:
+      generate_gs_get_instance_id(dst);
+      break;
+
+//   case SHADER_OPCODE_SHADER_TIME_ADD:
+//      brw_shader_time_add(p, src[0],
+//                          prog_data->base.binding_table.shader_time_start);
+//      brw_mark_surface_used(&prog_data->base,
+//                            prog_data->base.binding_table.shader_time_start);
+//      break;
+
+   case SHADER_OPCODE_UNTYPED_ATOMIC:
+      generate_untyped_atomic(inst, dst, src[0], src[1]);
+      break;
+
+   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+      generate_untyped_surface_read(inst, dst, src[0]);
+      break;
+
+   case VS_OPCODE_UNPACK_FLAGS_SIMD4X2:
+      generate_unpack_flags(inst, dst);
+      break;
+
+   default:
+      if (inst->opcode < (int) ARRAY_SIZE(opcode_descs)) {
+         _mesa_problem(&brw->ctx, "Unsupported opcode in `%s' in vec4\n",
+                       opcode_descs[inst->opcode].name);
+      } else {
+         _mesa_problem(&brw->ctx, "Unsupported opcode %d in vec4", inst->opcode);
+      }
+      abort();
+   }
+}
+
+void
+vec4_generator::generate_code(exec_list *instructions)
+{
+   int last_native_insn_offset = 0;
+   const char *last_annotation_string = NULL;
+   const void *last_annotation_ir = NULL;
+
+   if (unlikely(debug_flag)) {
+      if (shader_prog) {
+         fprintf(stderr, "Native code for %s vertex shader %d:\n",
+                 shader_prog->Label ? shader_prog->Label : "unnamed",
+                 shader_prog->Name);
+      } else {
+         fprintf(stderr, "Native code for vertex program %d:\n", prog->Id);
+      }
+   }
+
+   foreach_list(node, instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+      struct brw_reg src[3], dst;
+
+      if (unlikely(debug_flag)) {
+	 if (last_annotation_ir != inst->ir) {
+	    last_annotation_ir = inst->ir;
+	    if (last_annotation_ir) {
+	       fprintf(stderr, "   ");
+               if (shader_prog) {
+                  ((ir_instruction *) last_annotation_ir)->fprint(stderr);
+               } else {
+//                  const prog_instruction *vpi;
+//                  vpi = (const prog_instruction *) inst->ir;
+//                  fprintf(stderr, "%d: ", (int)(vpi - prog->Instructions));
+//                  _mesa_fprint_instruction_opt(stderr, vpi, 0,
+//                                               PROG_PRINT_DEBUG, NULL);
+               }
+	       fprintf(stderr, "\n");
+	    }
+	 }
+	 if (last_annotation_string != inst->annotation) {
+	    last_annotation_string = inst->annotation;
+	    if (last_annotation_string)
+	       fprintf(stderr, "   %s\n", last_annotation_string);
+	 }
+      }
+
+      for (unsigned int i = 0; i < 3; i++) {
+	 src[i] = inst->get_src(this->prog_data, i);
+      }
+      dst = inst->get_dst();
+
+      brw_set_conditionalmod(p, inst->conditional_mod);
+      brw_set_predicate_control(p, inst->predicate);
+      brw_set_predicate_inverse(p, inst->predicate_inverse);
+      brw_set_saturate(p, inst->saturate);
+      brw_set_mask_control(p, inst->force_writemask_all);
+      brw_set_acc_write_control(p, inst->writes_accumulator);
+
+      unsigned pre_emit_nr_insn = p->nr_insn;
+
+      generate_vec4_instruction(inst, dst, src);
+
+      if (inst->no_dd_clear || inst->no_dd_check) {
+         assert(p->nr_insn == pre_emit_nr_insn + 1 ||
+                !"no_dd_check or no_dd_clear set for IR emitting more "
+                "than 1 instruction");
+
+         struct brw_instruction *last = &p->store[pre_emit_nr_insn];
+
+         if (inst->no_dd_clear)
+            last->header.dependency_control |= BRW_DEPENDENCY_NOTCLEARED;
+         if (inst->no_dd_check)
+            last->header.dependency_control |= BRW_DEPENDENCY_NOTCHECKED;
+      }
+
+      if (unlikely(debug_flag)) {
+	 brw_dump_compile(p, stderr,
+			  last_native_insn_offset, p->next_insn_offset);
+      }
+
+      last_native_insn_offset = p->next_insn_offset;
+   }
+
+   if (unlikely(debug_flag)) {
+      fprintf(stderr, "\n");
+   }
+
+   brw_set_uip_jip(p);
+
+   /* OK, while the INTEL_DEBUG=vs above is very nice for debugging VS
+    * emit issues, it doesn't get the jump distances into the output,
+    * which is often something we want to debug.  So this is here in
+    * case you're doing that.
+    */
+   if (0 && unlikely(debug_flag)) {
+      brw_dump_compile(p, stderr, 0, p->next_insn_offset);
+   }
+}
+
+const unsigned *
+vec4_generator::generate_assembly(exec_list *instructions,
+                                  unsigned *assembly_size)
+{
+   brw_set_access_mode(p, BRW_ALIGN_16);
+   generate_code(instructions);
+   return brw_get_program(p, assembly_size);
+}
+
+} /* namespace brw */
diff --git a/icd/intel/compiler/pipeline/brw_vec4_gs.c b/icd/intel/compiler/pipeline/brw_vec4_gs.c
new file mode 100644
index 0000000..d59ee9a
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4_gs.c
@@ -0,0 +1,332 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file brw_vec4_gs.c
+ *
+ * State atom for client-programmable geometry shaders, and support code.
+ */
+
+#include "brw_vec4_gs.h"
+#include "brw_context.h"
+#include "brw_vec4_gs_visitor.h"
+//#include "brw_state.h"
+
+const GLuint prim_to_hw_prim[GL_TRIANGLE_STRIP_ADJACENCY+1] = {
+   _3DPRIM_POINTLIST,
+   _3DPRIM_LINELIST,
+   _3DPRIM_LINELOOP,
+   _3DPRIM_LINESTRIP,
+   _3DPRIM_TRILIST,
+   _3DPRIM_TRISTRIP,
+   _3DPRIM_TRIFAN,
+   _3DPRIM_QUADLIST,
+   _3DPRIM_QUADSTRIP,
+   _3DPRIM_POLYGON,
+   _3DPRIM_LINELIST_ADJ,
+   _3DPRIM_LINESTRIP_ADJ,
+   _3DPRIM_TRILIST_ADJ,
+   _3DPRIM_TRISTRIP_ADJ,
+};
+
+static void
+brw_gs_init_compile(struct brw_context *brw,
+                    struct gl_shader_program *prog,
+                    struct brw_geometry_program *gp,
+                    const struct brw_gs_prog_key *key,
+                    struct brw_gs_compile *c)
+{
+   memset(c, 0, sizeof(*c));
+
+   c->key = *key;
+   c->gp = gp;
+   c->base.shader_prog = prog;
+   c->base.mem_ctx = ralloc_context(NULL);
+}
+
+static bool
+brw_gs_do_compile(struct brw_context *brw,
+                  struct brw_gs_compile *c)
+{
+   c->prog_data.include_primitive_id =
+      (c->gp->program.Base.InputsRead & VARYING_BIT_PRIMITIVE_ID) != 0;
+
+   c->prog_data.invocations = c->gp->program.Invocations;
+
+   /* Allocate the references to the uniforms that will end up in the
+    * prog_data associated with the compiled program, and which will be freed
+    * by the state cache.
+    *
+    * Note: param_count needs to be num_uniform_components * 4, since we add
+    * padding around uniform values below vec4 size, so the worst case is that
+    * every uniform is a float which gets padded to the size of a vec4.
+    */
+   struct gl_shader *gs =
+      c->base.shader_prog->_LinkedShaders[MESA_SHADER_GEOMETRY];
+   int param_count = gs->num_uniform_components * 4;
+
+   /* We also upload clip plane data as uniforms */
+   param_count += MAX_CLIP_PLANES * 4;
+
+   c->prog_data.base.base.param =
+      rzalloc_array(NULL, const float *, param_count);
+   c->prog_data.base.base.pull_param =
+      rzalloc_array(NULL, const float *, param_count);
+   /* Note that we set nr_params here NOT to the size of the param and
+    * pull_param arrays, but to the number of uniform components
+    * vec4_visitor needs.  vec4_visitor::setup_uniforms() will set it back
+    * to a proper value.
+    */
+   c->prog_data.base.base.nr_params = ALIGN(param_count, 4) / 4 + gs->num_samplers;
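+   /* Example (illustrative): param_count = 20 with 2 samplers gives
+    * nr_params = ALIGN(20, 4) / 4 + 2 = 7 until setup_uniforms() recomputes
+    * the real value.
+    */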
+
+   if (c->gp->program.OutputType == GL_POINTS) {
+      /* When the output type is points, the geometry shader may output data
+       * to multiple streams, and EndPrimitive() has no effect.  So we
+       * configure the hardware to interpret the control data as stream ID.
+       */
+      c->prog_data.control_data_format = GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID;
+
+      /* However, StreamID is not yet supported, so we output zero bits of
+       * control data per vertex.
+       */
+      c->control_data_bits_per_vertex = 0;
+   } else {
+      /* When the output type is triangle_strip or line_strip, EndPrimitive()
+       * may be used to terminate the current strip and start a new one
+       * (similar to primitive restart), and outputting data to multiple
+       * streams is not supported.  So we configure the hardware to interpret
+       * the control data as EndPrimitive information (a.k.a. "cut bits").
+       */
+      c->prog_data.control_data_format = GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT;
+
+      /* We only need to output control data if the shader actually calls
+       * EndPrimitive().
+       */
+      c->control_data_bits_per_vertex =
+         c->gp->program.UsesEndPrimitive ? 1 : 0;
+   }
+   c->control_data_header_size_bits =
+      c->gp->program.VerticesOut * c->control_data_bits_per_vertex;
+
+   /* 1 HWORD = 32 bytes = 256 bits */
+   c->prog_data.control_data_header_size_hwords =
+      ALIGN(c->control_data_header_size_bits, 256) / 256;
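+   /* Example (illustrative): a shader with VerticesOut = 100 and 1 cut bit
+    * per vertex accumulates 100 header bits, which rounds up to
+    * ALIGN(100, 256) / 256 = 1 HWORD.
+    */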
+
+   GLbitfield64 outputs_written = c->gp->program.Base.OutputsWritten;
+
+   /* In order for legacy clipping to work, we need to populate the clip
+    * distance varying slots whenever clipping is enabled, even if the vertex
+    * shader doesn't write to gl_ClipDistance.
+    */
+   if (c->key.base.userclip_active) {
+      outputs_written |= BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST0);
+      outputs_written |= BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST1);
+   }
+
+   brw_compute_vue_map(brw, &c->prog_data.base.vue_map, outputs_written);
+
+   /* Compute the output vertex size.
+    *
+    * From the Ivy Bridge PRM, Vol2 Part1 7.2.1.1 STATE_GS - Output Vertex
+    * Size (p168):
+    *
+    *     [0,62] indicating [1,63] 16B units
+    *
+    *     Specifies the size of each vertex stored in the GS output entry
+    *     (following any Control Header data) as a number of 128-bit units
+    *     (minus one).
+    *
+    *     Programming Restrictions: The vertex size must be programmed as a
+    *     multiple of 32B units with the following exception: Rendering is
+    *     disabled (as per SOL stage state) and the vertex size output by the
+    *     GS thread is 16B.
+    *
+    *     If rendering is enabled (as per SOL state) the vertex size must be
+    *     programmed as a multiple of 32B units. In other words, the only time
+    *     software can program a vertex size with an odd number of 16B units
+    *     is when rendering is disabled.
+    *
+    * Note: B=bytes in the above text.
+    *
+    * It doesn't seem worth the extra trouble to optimize the case where the
+    * vertex size is 16B (especially since this would require special-casing
+    * the GEN assembly that writes to the URB).  So we just set the vertex
+    * size to a multiple of 32B (2 vec4's) in all cases.
+    *
+    * The maximum output vertex size is 62*16 = 992 bytes (31 hwords).  We
+    * budget that as follows:
+    *
+    *   512 bytes for varyings (a varying component is 4 bytes and
+    *             gl_MaxGeometryOutputComponents = 128)
+    *    16 bytes overhead for VARYING_SLOT_PSIZ (each varying slot is 16
+    *             bytes)
+    *    16 bytes overhead for gl_Position (we allocate it a slot in the VUE
+    *             even if it's not used)
+    *    32 bytes overhead for gl_ClipDistance (we allocate it 2 VUE slots
+    *             whenever clip planes are enabled, even if the shader doesn't
+    *             write to gl_ClipDistance)
+    *    16 bytes overhead since the VUE size must be a multiple of 32 bytes
+    *             (see below)--this causes up to 1 VUE slot to be wasted
+    *   400 bytes available for varying packing overhead
+    *
+    * Worst-case varying packing overhead is 3/4 of a varying slot (12 bytes)
+    * per interpolation type, so this is plenty.
+    */
+   unsigned output_vertex_size_bytes = c->prog_data.base.vue_map.num_slots * 16;
+   assert(output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES);
+   c->prog_data.output_vertex_size_hwords =
+      ALIGN(output_vertex_size_bytes, 32) / 32;
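+   /* Example (illustrative): a VUE map with 7 slots gives 7 * 16 = 112
+    * bytes, padded up to ALIGN(112, 32) / 32 = 4 HWORDs.
+    */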
+
+   /* Compute URB entry size.  The maximum allowed URB entry size is 32k.
+    * That divides up as follows:
+    *
+    *     64 bytes for the control data header (cut indices or StreamID bits)
+    *   4096 bytes for varyings (a varying component is 4 bytes and
+    *              gl_MaxGeometryTotalOutputComponents = 1024)
+    *   4096 bytes overhead for VARYING_SLOT_PSIZ (each varying slot is 16
+    *              bytes/vertex and gl_MaxGeometryOutputVertices is 256)
+    *   4096 bytes overhead for gl_Position (we allocate it a slot in the VUE
+    *              even if it's not used)
+    *   8192 bytes overhead for gl_ClipDistance (we allocate it 2 VUE slots
+    *              whenever clip planes are enabled, even if the shader doesn't
+    *              write to gl_ClipDistance)
+    *   4096 bytes overhead since the VUE size must be a multiple of 32
+    *              bytes (see above)--this causes up to 1 VUE slot to be wasted
+    *   8128 bytes available for varying packing overhead
+    *
+    * Worst-case varying packing overhead is 3/4 of a varying slot per
+    * interpolation type, which works out to 3072 bytes, so this would allow
+    * us to accommodate 2 interpolation types without any danger of running
+    * out of URB space.
+    *
+    * In practice, the risk of running out of URB space is very small, since
+    * the above figures are all worst-case, and most of them scale with the
+    * number of output vertices.  So we'll just calculate the amount of space
+    * we need, and if it's too large, fail to compile.
+    */
+   unsigned output_size_bytes =
+      c->prog_data.output_vertex_size_hwords * 32 * c->gp->program.VerticesOut;
+   output_size_bytes += 32 * c->prog_data.control_data_header_size_hwords;
+
+   /* Broadwell stores "Vertex Count" as a full 8 DWord (32 byte) URB output,
+    * which comes before the control header.
+    */
+   if (brw->gen >= 8)
+      output_size_bytes += 32;
+
+   assert(output_size_bytes >= 1);
+   if (output_size_bytes > GEN7_MAX_GS_URB_ENTRY_SIZE_BYTES)
+      return false;
+
+   /* URB entry sizes are stored as a multiple of 64 bytes. */
+   c->prog_data.base.urb_entry_size = ALIGN(output_size_bytes, 64) / 64;
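+   /* Example (illustrative): 4 output vertices of 2 HWORDs each plus a
+    * 1-HWORD control header is 4 * 64 + 32 = 288 bytes, stored as
+    * ALIGN(288, 64) / 64 = 5 64-byte units.
+    */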
+
+   c->prog_data.output_topology = prim_to_hw_prim[c->gp->program.OutputType];
+
+   brw_compute_vue_map(brw, &c->input_vue_map, c->key.input_varyings);
+
+   /* GS inputs are read from the VUE 256 bits (2 vec4's) at a time, so we
+    * need to program a URB read length of ceiling(num_slots / 2).
+    */
+   c->prog_data.base.urb_read_length = (c->input_vue_map.num_slots + 1) / 2;
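+   /* Example (illustrative): 9 input slots require a read length of
+    * (9 + 1) / 2 = 5.
+    */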
+
+   c->base.program = brw_gs_emit(brw, c->base.shader_prog, c,
+         c->base.mem_ctx, &c->base.program_size);
+   if (c->base.program == NULL)
+      return false;
+
+   if (c->base.last_scratch) {
+      c->prog_data.base.total_scratch
+         = brw_get_scratch_size(c->base.last_scratch*REG_SIZE);
+   }
+
+   return true;
+}
+
+static void
+brw_gs_clear_compile(struct brw_context *brw,
+                     struct brw_gs_compile *c)
+{
+   ralloc_free(c->base.mem_ctx);
+}
+
+
+bool
+brw_gs_precompile(struct gl_context *ctx, struct gl_shader_program *prog)
+{
+   struct brw_context *brw = brw_context(ctx);
+   struct brw_gs_prog_key key;
+
+   if (!prog->_LinkedShaders[MESA_SHADER_GEOMETRY])
+      return true;
+
+   struct gl_geometry_program *gp = (struct gl_geometry_program *)
+      prog->_LinkedShaders[MESA_SHADER_GEOMETRY]->Program;
+   struct brw_geometry_program *bgp = brw_geometry_program(gp);
+
+   memset(&key, 0, sizeof(key));
+
+   brw_vec4_setup_prog_key_for_precompile(ctx, &key.base, bgp->id, &gp->Base);
+
+   /* Assume that the set of varyings coming in from the vertex shader exactly
+    * matches what the geometry shader requires.
+    */
+   key.input_varyings = gp->Base.InputsRead;
+
+   struct brw_gs_compile c;
+
+   brw_gs_init_compile(brw, prog, bgp, &key, &c);
+   if (!brw_gs_do_compile(brw, &c)) {
+      brw_gs_clear_compile(brw, &c);
+      return false;
+   }
+
+   /* Rather than defer or upload to cache, hand off the compile results
+    * back to the brw_context.
+    */
+   brw_shader_program_save_gs_compile(brw->shader_prog, &c);
+
+   brw_gs_clear_compile(brw, &c);
+
+   return true;
+}
+
+
+bool
+brw_gs_prog_data_compare(const void *in_a, const void *in_b)
+{
+   const struct brw_gs_prog_data *a = in_a;
+   const struct brw_gs_prog_data *b = in_b;
+
+   /* Compare the base structure. */
+   if (!brw_stage_prog_data_compare(&a->base.base, &b->base.base))
+      return false;
+
+   /* Compare the rest of the struct. */
+   const unsigned offset = sizeof(struct brw_stage_prog_data);
+   if (memcmp(((char *) a) + offset, ((char *) b) + offset,
+              sizeof(struct brw_gs_prog_data) - offset)) {
+      return false;
+   }
+
+   return true;
+}
diff --git a/icd/intel/compiler/pipeline/brw_vec4_gs.h b/icd/intel/compiler/pipeline/brw_vec4_gs.h
new file mode 100644
index 0000000..5d4244e
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4_gs.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef BRW_VEC4_GS_H
+#define BRW_VEC4_GS_H
+
+#include <stdbool.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct gl_context;
+struct gl_shader_program;
+
+bool brw_gs_precompile(struct gl_context *ctx, struct gl_shader_program *prog);
+bool brw_gs_prog_data_compare(const void *a, const void *b);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif /* BRW_VEC4_GS_H */
diff --git a/icd/intel/compiler/pipeline/brw_vec4_gs_visitor.cpp b/icd/intel/compiler/pipeline/brw_vec4_gs_visitor.cpp
new file mode 100644
index 0000000..9546ffe
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4_gs_visitor.cpp
@@ -0,0 +1,635 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file brw_vec4_gs_visitor.cpp
+ *
+ * Geometry-shader-specific code derived from the vec4_visitor class.
+ */
+
+#include "brw_vec4_gs_visitor.h"
+
+const unsigned MAX_GS_INPUT_VERTICES = 6;
+
+namespace brw {
+
+vec4_gs_visitor::vec4_gs_visitor(struct brw_context *brw,
+                                 struct brw_gs_compile *c,
+                                 struct gl_shader_program *prog,
+                                 void *mem_ctx,
+                                 bool no_spills)
+   : vec4_visitor(brw, &c->base, &c->gp->program.Base, &c->key.base,
+                  &c->prog_data.base, prog, MESA_SHADER_GEOMETRY, mem_ctx,
+                  INTEL_DEBUG & DEBUG_GS, no_spills,
+                  ST_GS, ST_GS_WRITTEN, ST_GS_RESET),
+     c(c)
+{
+}
+
+
+dst_reg *
+vec4_gs_visitor::make_reg_for_system_value(ir_variable *ir)
+{
+   dst_reg *reg = new(mem_ctx) dst_reg(this, ir->type);
+
+   switch (ir->data.location) {
+   case SYSTEM_VALUE_INVOCATION_ID:
+      this->current_annotation = "initialize gl_InvocationID";
+      emit(GS_OPCODE_GET_INSTANCE_ID, *reg);
+      break;
+   default:
+      assert(!"not reached");
+      break;
+   }
+
+   return reg;
+}
+
+
+int
+vec4_gs_visitor::setup_varying_inputs(int payload_reg, int *attribute_map,
+                                      int attributes_per_reg)
+{
+   /* For geometry shaders there are N copies of the input attributes, where N
+    * is the number of input vertices.  attribute_map[BRW_VARYING_SLOT_COUNT *
+    * i + j] represents attribute j for vertex i.
+    *
+    * Note that GS inputs are read from the VUE 256 bits (2 vec4's) at a time,
+    * so the total number of input slots that will be delivered to the GS (and
+    * thus the stride of the input arrays) is urb_read_length * 2.
+    */
+   const unsigned num_input_vertices = c->gp->program.VerticesIn;
+   assert(num_input_vertices <= MAX_GS_INPUT_VERTICES);
+   unsigned input_array_stride = c->prog_data.base.urb_read_length * 2;
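+   /* Example (illustrative): with urb_read_length = 3, each input vertex
+    * occupies 6 slots in the payload, so slot s of vertex v lives at
+    * payload offset 6 * v + s.
+    */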
+
+   for (int slot = 0; slot < c->input_vue_map.num_slots; slot++) {
+      int varying = c->input_vue_map.slot_to_varying[slot];
+      for (unsigned vertex = 0; vertex < num_input_vertices; vertex++) {
+         attribute_map[BRW_VARYING_SLOT_COUNT * vertex + varying] =
+            attributes_per_reg * payload_reg + input_array_stride * vertex +
+            slot;
+      }
+   }
+
+   int regs_used = ALIGN(input_array_stride * num_input_vertices,
+                         attributes_per_reg) / attributes_per_reg;
+   return payload_reg + regs_used;
+}
+
+
+void
+vec4_gs_visitor::setup_payload()
+{
+   int attribute_map[BRW_VARYING_SLOT_COUNT * MAX_GS_INPUT_VERTICES];
+
+   /* If we are in dual instanced mode, then attributes are going to be
+    * interleaved, so one register contains two attribute slots.
+    */
+   int attributes_per_reg = c->prog_data.dual_instanced_dispatch ? 2 : 1;
+
+   /* If a geometry shader tries to read from an input that wasn't written by
+    * the vertex shader, that produces undefined results, but it shouldn't
+    * crash anything.  So initialize attribute_map to zeros--that ensures that
+    * these undefined results are read from r0.
+    */
+   memset(attribute_map, 0, sizeof(attribute_map));
+
+   int reg = 0;
+
+   /* The payload always contains important data in r0, which contains
+    * the URB handles that are passed on to the URB write at the end
+    * of the thread.
+    */
+   reg++;
+
+   /* If the shader uses gl_PrimitiveIDIn, that goes in r1. */
+   if (c->prog_data.include_primitive_id)
+      attribute_map[VARYING_SLOT_PRIMITIVE_ID] = attributes_per_reg * reg++;
+
+   reg = setup_uniforms(reg);
+
+   reg = setup_varying_inputs(reg, attribute_map, attributes_per_reg);
+
+   lower_attributes_to_hw_regs(attribute_map,
+                               c->prog_data.dual_instanced_dispatch);
+
+   this->first_non_payload_grf = reg;
+}
+
+
+void
+vec4_gs_visitor::emit_prolog()
+{
+   /* In vertex shaders, r0.2 is guaranteed to be initialized to zero.  In
+    * geometry shaders, it isn't (it contains a bunch of information we don't
+    * need, like the input primitive type).  We need r0.2 to be zero in order
+    * to build scratch read/write messages correctly (otherwise this value
+    * will be interpreted as a global offset, causing us to do our scratch
+    * reads/writes to garbage memory).  So just set it to zero at the top of
+    * the shader.
+    */
+   this->current_annotation = "clear r0.2";
+   dst_reg r0(retype(brw_vec4_grf(0, 0), BRW_REGISTER_TYPE_UD));
+   vec4_instruction *inst = emit(GS_OPCODE_SET_DWORD_2_IMMED, r0, 0u);
+   inst->force_writemask_all = true;
+
+   /* Create a virtual register to hold the vertex count */
+   this->vertex_count = src_reg(this, glsl_type::uint_type);
+
+   /* Initialize the vertex_count register to 0 */
+   this->current_annotation = "initialize vertex_count";
+   inst = emit(MOV(dst_reg(this->vertex_count), 0u));
+   inst->force_writemask_all = true;
+
+   if (c->control_data_header_size_bits > 0) {
+      /* Create a virtual register to hold the current set of control data
+       * bits.
+       */
+      this->control_data_bits = src_reg(this, glsl_type::uint_type);
+
+      /* If we're outputting more than 32 control data bits, then EmitVertex()
+       * will set control_data_bits to 0 after emitting the first vertex.
+       * Otherwise, we need to initialize it to 0 here.
+       */
+      if (c->control_data_header_size_bits <= 32) {
+         this->current_annotation = "initialize control data bits";
+         inst = emit(MOV(dst_reg(this->control_data_bits), 0u));
+         inst->force_writemask_all = true;
+      }
+   }
+
+   /* If the geometry shader uses the gl_PointSize input, we need to fix it up
+    * to account for the fact that the vertex shader stored it in the w
+    * component of VARYING_SLOT_PSIZ.
+    */
+   if (c->gp->program.Base.InputsRead & VARYING_BIT_PSIZ) {
+      this->current_annotation = "swizzle gl_PointSize input";
+      for (int vertex = 0; vertex < c->gp->program.VerticesIn; vertex++) {
+         dst_reg dst(ATTR,
+                     BRW_VARYING_SLOT_COUNT * vertex + VARYING_SLOT_PSIZ);
+         dst.type = BRW_REGISTER_TYPE_F;
+         src_reg src(dst);
+         dst.writemask = WRITEMASK_X;
+         src.swizzle = BRW_SWIZZLE_WWWW;
+         inst = emit(MOV(dst, src));
+
+         /* In dual instanced dispatch mode, dst has a width of 4, so we need
+          * to make sure the MOV happens regardless of which channels are
+          * enabled.
+          */
+         inst->force_writemask_all = true;
+      }
+   }
+
+   this->current_annotation = NULL;
+}
+
+
+void
+vec4_gs_visitor::emit_program_code()
+{
+   /* We don't support NV_geometry_program4. */
+   assert(!"Unreached");
+}
+
+
+void
+vec4_gs_visitor::emit_thread_end()
+{
+   if (c->control_data_header_size_bits > 0) {
+      /* During shader execution, we only ever call emit_control_data_bits()
+       * just prior to outputting a vertex.  Therefore, the control data bits
+       * corresponding to the most recently output vertex still need to be
+       * emitted.
+       */
+      current_annotation = "thread end: emit control data bits";
+      emit_control_data_bits();
+   }
+
+   /* MRF 0 is reserved for the debugger, so start with message header
+    * in MRF 1.
+    */
+   int base_mrf = 1;
+
+   current_annotation = "thread end";
+   dst_reg mrf_reg(MRF, base_mrf);
+   src_reg r0(retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
+   vec4_instruction *inst = emit(MOV(mrf_reg, r0));
+   inst->force_writemask_all = true;
+   emit(GS_OPCODE_SET_VERTEX_COUNT, mrf_reg, this->vertex_count);
+//   if (INTEL_DEBUG & DEBUG_SHADER_TIME)
+//      emit_shader_time_end();
+   inst = emit(GS_OPCODE_THREAD_END);
+   inst->base_mrf = base_mrf;
+   inst->mlen = 1;
+}
+
+
+void
+vec4_gs_visitor::emit_urb_write_header(int mrf)
+{
+   /* The SEND instruction that writes the vertex data to the VUE will use
+    * per_slot_offset=true, which means that DWORDs 3 and 4 of the message
+    * header specify an offset (in multiples of 256 bits) into the URB entry
+    * at which the write should take place.
+    *
+    * So we have to prepare a message header with the appropriate offset
+    * values.
+    */
+   dst_reg mrf_reg(MRF, mrf);
+   src_reg r0(retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
+   this->current_annotation = "URB write header";
+   vec4_instruction *inst = emit(MOV(mrf_reg, r0));
+   inst->force_writemask_all = true;
+   emit(GS_OPCODE_SET_WRITE_OFFSET, mrf_reg, this->vertex_count,
+        (uint32_t) c->prog_data.output_vertex_size_hwords);
+}
+
+
+vec4_instruction *
+vec4_gs_visitor::emit_urb_write_opcode(bool complete)
+{
+   /* We don't care whether the vertex is complete, because in general
+    * geometry shaders output multiple vertices, and we don't terminate the
+    * thread until all vertices are complete.
+    */
+   (void) complete;
+
+   vec4_instruction *inst = emit(GS_OPCODE_URB_WRITE);
+   inst->offset = c->prog_data.control_data_header_size_hwords;
+
+   /* We need to increment Global Offset by 1 to make room for Broadwell's
+    * extra "Vertex Count" payload at the beginning of the URB entry.
+    */
+   if (brw->gen >= 8)
+      inst->offset++;
+
+   inst->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET;
+   return inst;
+}
+
+
+int
+vec4_gs_visitor::compute_array_stride(ir_dereference_array *ir)
+{
+   /* Geometry shader inputs are arrays, but they use an unusual array layout:
+    * instead of all array elements for a given geometry shader input being
+    * stored consecutively, all geometry shader inputs are interleaved into
+    * one giant array.  At this stage of compilation, we assume that the
+    * stride of the array is BRW_VARYING_SLOT_COUNT.  Later,
+    * setup_attributes() will remap our accesses to the actual input array.
+    */
+   ir_dereference_variable *deref_var = ir->array->as_dereference_variable();
+   if (deref_var && deref_var->var->data.mode == ir_var_shader_in)
+      return BRW_VARYING_SLOT_COUNT;
+   else
+      return vec4_visitor::compute_array_stride(ir);
+}
+
+
+/**
+ * Write out a batch of 32 control data bits from the control_data_bits
+ * register to the URB.
+ *
+ * The current value of the vertex_count register determines which DWORD in
+ * the URB receives the control data bits.  The control_data_bits register is
+ * assumed to contain the correct data for the vertex that was most recently
+ * output, and all previous vertices that share the same DWORD.
+ *
+ * This function takes care of ensuring that if no vertices have been output
+ * yet, no control bits are emitted.
+ */
+void
+vec4_gs_visitor::emit_control_data_bits()
+{
+   assert(c->control_data_bits_per_vertex != 0);
+
+   /* Since the URB_WRITE_OWORD message operates with 128-bit (vec4 sized)
+    * granularity, we need to use two tricks to ensure that the batch of 32
+    * control data bits is written to the appropriate DWORD in the URB.  To
+    * select which vec4 we are writing to, we use the "slot {0,1} offset"
+    * fields of the message header.  To select which DWORD in the vec4 we are
+    * writing to, we use the channel mask fields of the message header.  To
+    * avoid penalizing geometry shaders that emit a small number of vertices
+    * with extra bookkeeping, we only do each of these tricks when
+    * c->prog_data.control_data_header_size_bits is large enough to make it
+    * necessary.
+    *
+    * Note: this means that if we're outputting just a single DWORD of control
+    * data bits, we'll actually replicate it four times since we won't do any
+    * channel masking.  But that's not a problem since in this case the
+    * hardware only pays attention to the first DWORD.
+    */
+   enum brw_urb_write_flags urb_write_flags = BRW_URB_WRITE_OWORD;
+   if (c->control_data_header_size_bits > 32)
+      urb_write_flags = urb_write_flags | BRW_URB_WRITE_USE_CHANNEL_MASKS;
+   if (c->control_data_header_size_bits > 128)
+      urb_write_flags = urb_write_flags | BRW_URB_WRITE_PER_SLOT_OFFSET;
+
+   /* If vertex_count is 0, then no control data bits have been accumulated
+    * yet, so we should do nothing.
+    */
+   emit(CMP(dst_null_d(), this->vertex_count, 0u, BRW_CONDITIONAL_NEQ));
+   emit(IF(BRW_PREDICATE_NORMAL));
+   {
+      /* If we are using either channel masks or a per-slot offset, then we
+       * need to figure out which DWORD we are trying to write to, using the
+       * formula:
+       *
+       *     dword_index = (vertex_count - 1) * bits_per_vertex / 32
+       *
+       * Since bits_per_vertex is a power of two, and is known at compile
+       * time, this can be optimized to a shift:
+       *
+       *     dword_index = (vertex_count - 1) >> (5 - log2(bits_per_vertex))
+       *
+       * The code below computes the shift count as
+       * 6 - _mesa_fls(bits_per_vertex); this is equivalent because
+       * _mesa_fls() is one-based (fls(x) == log2(x) + 1 for powers of two).
+       */
+      src_reg dword_index(this, glsl_type::uint_type);
+      if (urb_write_flags) {
+         src_reg prev_count(this, glsl_type::uint_type);
+         emit(ADD(dst_reg(prev_count), this->vertex_count, 0xffffffffu));
+         unsigned log2_bits_per_vertex =
+            _mesa_fls(c->control_data_bits_per_vertex);
+         emit(SHR(dst_reg(dword_index), prev_count,
+                  (uint32_t) (6 - log2_bits_per_vertex)));
+      }
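+      /* Example (illustrative): with bits_per_vertex = 2 and
+       * vertex_count = 40, dword_index = 39 * 2 / 32 = 2; the shift form
+       * gives 39 >> 4 = 2 as well.
+       */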
+
+      /* Start building the URB write message.  The first MRF gets a copy of
+       * R0.
+       */
+      int base_mrf = 1;
+      dst_reg mrf_reg(MRF, base_mrf);
+      src_reg r0(retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
+      vec4_instruction *inst = emit(MOV(mrf_reg, r0));
+      inst->force_writemask_all = true;
+
+      if (urb_write_flags & BRW_URB_WRITE_PER_SLOT_OFFSET) {
+         /* Set the per-slot offset to dword_index / 4, so that we'll write to
+          * the appropriate OWORD within the control data header.
+          */
+         src_reg per_slot_offset(this, glsl_type::uint_type);
+         emit(SHR(dst_reg(per_slot_offset), dword_index, 2u));
+         emit(GS_OPCODE_SET_WRITE_OFFSET, mrf_reg, per_slot_offset, 1u);
+      }
+
+      if (urb_write_flags & BRW_URB_WRITE_USE_CHANNEL_MASKS) {
+         /* Set the channel masks to 1 << (dword_index % 4), so that we'll
+          * write to the appropriate DWORD within the OWORD.  We need to do
+          * this computation with force_writemask_all, otherwise garbage data
+          * from invocation 0 might clobber the mask for invocation 1 when
+          * GS_OPCODE_PREPARE_CHANNEL_MASKS tries to OR the two masks
+          * together.
+          */
+         src_reg channel(this, glsl_type::uint_type);
+         inst = emit(AND(dst_reg(channel), dword_index, 3u));
+         inst->force_writemask_all = true;
+         src_reg one(this, glsl_type::uint_type);
+         inst = emit(MOV(dst_reg(one), 1u));
+         inst->force_writemask_all = true;
+         src_reg channel_mask(this, glsl_type::uint_type);
+         inst = emit(SHL(dst_reg(channel_mask), one, channel));
+         inst->force_writemask_all = true;
+         emit(GS_OPCODE_PREPARE_CHANNEL_MASKS, dst_reg(channel_mask),
+                                               channel_mask);
+         emit(GS_OPCODE_SET_CHANNEL_MASKS, mrf_reg, channel_mask);
+      }
+
+      /* Store the control data bits in the message payload and send it. */
+      dst_reg mrf_reg2(MRF, base_mrf + 1);
+      inst = emit(MOV(mrf_reg2, this->control_data_bits));
+      inst->force_writemask_all = true;
+      inst = emit(GS_OPCODE_URB_WRITE);
+      inst->urb_write_flags = urb_write_flags;
+      /* We need to increment Global Offset by 256-bits to make room for
+       * Broadwell's extra "Vertex Count" payload at the beginning of the
+       * URB entry.  Since this is an OWord message, Global Offset is counted
+       * in 128-bit units, so we must set it to 2.
+       */
+      if (brw->gen >= 8)
+         inst->offset = 2;
+      inst->base_mrf = base_mrf;
+      inst->mlen = 2;
+   }
+   emit(BRW_OPCODE_ENDIF);
+}
+
+
+void
+vec4_gs_visitor::visit(ir_emit_vertex *)
+{
+   this->current_annotation = "emit vertex: safety check";
+
+   /* To ensure that we don't output more vertices than the shader specified
+    * using max_vertices, do the logic inside a conditional of the form "if
+    * (vertex_count < MAX)"
+    */
+   unsigned num_output_vertices = c->gp->program.VerticesOut;
+   emit(CMP(dst_null_d(), this->vertex_count,
+            src_reg(num_output_vertices), BRW_CONDITIONAL_L));
+   emit(IF(BRW_PREDICATE_NORMAL));
+   {
+      /* If we're outputting 32 control data bits or less, then we can wait
+       * until the shader is over to output them all.  Otherwise we need to
+       * output them as we go.  Now is the time to do it, since we're about to
+       * output the vertex_count'th vertex, so it's guaranteed that the
+       * control data bits associated with the (vertex_count - 1)th vertex are
+       * correct.
+       */
+      if (c->control_data_header_size_bits > 32) {
+         this->current_annotation = "emit vertex: emit control data bits";
+         /* Only emit control data bits if we've finished accumulating a batch
+          * of 32 bits.  This is the case when:
+          *
+          *     (vertex_count * bits_per_vertex) % 32 == 0
+          *
+          * (in other words, when the last 5 bits of vertex_count *
+          * bits_per_vertex are 0).  Assuming bits_per_vertex == 2^n for some
+          * integer n (which is always the case, since bits_per_vertex is
+          * always 1 or 2), this is equivalent to requiring that the last 5-n
+          * bits of vertex_count are 0:
+          *
+          *     vertex_count & (2^(5-n) - 1) == 0
+          *
+          * 2^(5-n) == 2^5 / 2^n == 32 / bits_per_vertex, so this is
+          * equivalent to:
+          *
+          *     vertex_count & (32 / bits_per_vertex - 1) == 0
+          */
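+         /* Example (illustrative): with bits_per_vertex = 2 the mask is
+          * 32 / 2 - 1 = 15, so a batch is flushed whenever vertex_count is
+          * a multiple of 16.
+          */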
+         vec4_instruction *inst =
+            emit(AND(dst_null_d(), this->vertex_count,
+                     (uint32_t) (32 / c->control_data_bits_per_vertex - 1)));
+         inst->conditional_mod = BRW_CONDITIONAL_Z;
+         emit(IF(BRW_PREDICATE_NORMAL));
+         {
+            emit_control_data_bits();
+
+            /* Reset control_data_bits to 0 so we can start accumulating a new
+             * batch.
+             *
+             * Note: in the case where vertex_count == 0, this neutralizes the
+             * effect of any call to EndPrimitive() that the shader may have
+             * made before outputting its first vertex.
+             */
+            inst = emit(MOV(dst_reg(this->control_data_bits), 0u));
+            inst->force_writemask_all = true;
+         }
+         emit(BRW_OPCODE_ENDIF);
+      }
+
+      this->current_annotation = "emit vertex: vertex data";
+      emit_vertex();
+
+      this->current_annotation = "emit vertex: increment vertex count";
+      emit(ADD(dst_reg(this->vertex_count), this->vertex_count,
+               src_reg(1u)));
+   }
+   emit(BRW_OPCODE_ENDIF);
+
+   this->current_annotation = NULL;
+}
+
+void
+vec4_gs_visitor::visit(ir_end_primitive *)
+{
+   /* We can only do EndPrimitive() functionality when the control data
+    * consists of cut bits.  Fortunately, the only time it isn't is when the
+    * output type is points, in which case EndPrimitive() is a no-op.
+    */
+   if (c->prog_data.control_data_format !=
+       GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT) {
+      return;
+   }
+
+   /* Cut bits use one bit per vertex. */
+   assert(c->control_data_bits_per_vertex == 1);
+
+   /* Cut bit n should be set to 1 if EndPrimitive() was called after emitting
+    * vertex n, 0 otherwise.  So all we need to do here is mark bit
+    * (vertex_count - 1) % 32 in the cut_bits register to indicate that
+    * EndPrimitive() was called after emitting vertex (vertex_count - 1);
+    * vec4_gs_visitor::emit_control_data_bits() will take care of the rest.
+    *
+    * Note that if EndPrimitive() is called before emitting any vertices, this
+    * will cause us to set bit 31 of the control_data_bits register to 1.
+    * That's fine because:
+    *
+    * - If max_vertices < 32, then vertex number 31 (zero-based) will never be
+    *   output, so the hardware will ignore cut bit 31.
+    *
+    * - If max_vertices == 32, then vertex number 31 is guaranteed to be the
+    *   last vertex, so setting cut bit 31 has no effect (since the primitive
+    *   is automatically ended when the GS terminates).
+    *
+    * - If max_vertices > 32, then the ir_emit_vertex visitor will reset the
+    *   control_data_bits register to 0 when the first vertex is emitted.
+    */
+
+   /* control_data_bits |= 1 << ((vertex_count - 1) % 32) */
+   src_reg one(this, glsl_type::uint_type);
+   emit(MOV(dst_reg(one), 1u));
+   src_reg prev_count(this, glsl_type::uint_type);
+   emit(ADD(dst_reg(prev_count), this->vertex_count, 0xffffffffu));
+   src_reg mask(this, glsl_type::uint_type);
+   /* Note: we're relying on the fact that the GEN SHL instruction only pays
+    * attention to the lower 5 bits of its second source argument, so on this
+    * architecture, 1 << (vertex_count - 1) is equivalent to 1 <<
+    * ((vertex_count - 1) % 32).
+    */
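+   /* Example (illustrative): for the 36th vertex, prev_count = 35 and
+    * 1 << 35 executes as 1 << (35 & 31) = 1 << 3, marking cut bit 3 of the
+    * current 32-bit batch.
+    */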
+   emit(SHL(dst_reg(mask), one, prev_count));
+   emit(OR(dst_reg(this->control_data_bits), this->control_data_bits, mask));
+}
+
+static const unsigned *
+generate_assembly(struct brw_context *brw,
+                  struct gl_shader_program *shader_prog,
+                  struct gl_program *prog,
+                  struct brw_vec4_prog_data *prog_data,
+                  void *mem_ctx,
+                  exec_list *instructions,
+                  unsigned *final_assembly_size)
+{
+//   if (brw->gen >= 8) {
+//      gen8_vec4_generator g(brw, shader_prog, prog, prog_data, mem_ctx,
+//                            INTEL_DEBUG & DEBUG_GS);
+//      return g.generate_assembly(instructions, final_assembly_size);
+//   } else {
+      vec4_generator g(brw, shader_prog, prog, prog_data, mem_ctx,
+                       INTEL_DEBUG & DEBUG_GS);
+      return g.generate_assembly(instructions, final_assembly_size);
+//   }
+}
+
+extern "C" const unsigned *
+brw_gs_emit(struct brw_context *brw,
+            struct gl_shader_program *prog,
+            struct brw_gs_compile *c,
+            void *mem_ctx,
+            unsigned *final_assembly_size)
+{
+   if (unlikely(INTEL_DEBUG & DEBUG_GS)) {
+      struct brw_shader *shader =
+         (brw_shader *) prog->_LinkedShaders[MESA_SHADER_GEOMETRY];
+
+      brw_dump_ir(brw, "geometry", prog, &shader->base, NULL);
+   }
+
+   /* Compile the geometry shader in DUAL_OBJECT dispatch mode, if we can do
+    * so without spilling.  If the GS invocation count is greater than 1, we
+    * can't use DUAL_OBJECT mode.
+    */
+   if (c->prog_data.invocations <= 1 &&
+       likely(!(INTEL_DEBUG & DEBUG_NO_DUAL_OBJECT_GS))) {
+      c->prog_data.dual_instanced_dispatch = false;
+
+      vec4_gs_visitor v(brw, c, prog, mem_ctx, true /* no_spills */);
+      if (v.run()) {
+         return generate_assembly(brw, prog, &c->gp->program.Base,
+                                  &c->prog_data.base, mem_ctx, &v.instructions,
+                                  final_assembly_size);
+      }
+   }
+
+   /* Either we failed to compile in DUAL_OBJECT mode (probably because it
+    * would have required spilling) or DUAL_OBJECT mode is disabled.  So fall
+    * back to DUAL_INSTANCED mode, which consumes fewer registers.
+    *
+    * FIXME: In an ideal world we'd fall back to SINGLE mode, which would
+    * allow us to interleave general purpose registers (resulting in even less
+    * likelihood of spilling).  But at the moment, the vec4 generator and
+    * visitor classes don't have the infrastructure to interleave general
+    * purpose registers, so DUAL_INSTANCED is the best we can do.
+    */
+   c->prog_data.dual_instanced_dispatch = true;
+
+   vec4_gs_visitor v(brw, c, prog, mem_ctx, false /* no_spills */);
+   if (!v.run()) {
+      prog->LinkStatus = false;
+      ralloc_strcat(&prog->InfoLog, v.fail_msg);
+      return NULL;
+   }
+
+   return generate_assembly(brw, prog, &c->gp->program.Base, &c->prog_data.base,
+                            mem_ctx, &v.instructions, final_assembly_size);
+}
+
+
+} /* namespace brw */
diff --git a/icd/intel/compiler/pipeline/brw_vec4_gs_visitor.h b/icd/intel/compiler/pipeline/brw_vec4_gs_visitor.h
new file mode 100644
index 0000000..25415ea
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4_gs_visitor.h
@@ -0,0 +1,111 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file brw_vec4_gs_visitor.h
+ *
+ * Geometry-shader-specific code derived from the vec4_visitor class.
+ */
+
+#ifndef BRW_VEC4_GS_VISITOR_H
+#define BRW_VEC4_GS_VISITOR_H
+
+#include "brw_vec4.h"
+
+
+struct brw_gs_prog_key
+{
+   struct brw_vec4_prog_key base;
+
+   GLbitfield64 input_varyings;
+};
+
+
+/**
+ * Scratch data used when compiling a GLSL geometry shader.
+ */
+struct brw_gs_compile
+{
+   struct brw_vec4_compile base;
+   struct brw_gs_prog_key key;
+   struct brw_gs_prog_data prog_data;
+   struct brw_vue_map input_vue_map;
+
+   struct brw_geometry_program *gp;
+
+   unsigned control_data_bits_per_vertex;
+   unsigned control_data_header_size_bits;
+};
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+const unsigned *brw_gs_emit(struct brw_context *brw,
+                            struct gl_shader_program *prog,
+                            struct brw_gs_compile *c,
+                            void *mem_ctx,
+                            unsigned *final_assembly_size);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#ifdef __cplusplus
+namespace brw {
+
+class vec4_gs_visitor : public vec4_visitor
+{
+public:
+   vec4_gs_visitor(struct brw_context *brw,
+                   struct brw_gs_compile *c,
+                   struct gl_shader_program *prog,
+                   void *mem_ctx,
+                   bool no_spills);
+
+protected:
+   virtual dst_reg *make_reg_for_system_value(ir_variable *ir);
+   virtual void setup_payload();
+   virtual void emit_prolog();
+   virtual void emit_program_code();
+   virtual void emit_thread_end();
+   virtual void emit_urb_write_header(int mrf);
+   virtual vec4_instruction *emit_urb_write_opcode(bool complete);
+   virtual int compute_array_stride(ir_dereference_array *ir);
+   virtual void visit(ir_emit_vertex *);
+   virtual void visit(ir_end_primitive *);
+
+private:
+   int setup_varying_inputs(int payload_reg, int *attribute_map,
+                            int attributes_per_reg);
+   void emit_control_data_bits();
+
+   src_reg vertex_count;
+   src_reg control_data_bits;
+   const struct brw_gs_compile * const c;
+};
+
+} /* namespace brw */
+#endif /* __cplusplus */
+
+#endif /* BRW_VEC4_GS_VISITOR_H */
diff --git a/icd/intel/compiler/pipeline/brw_vec4_live_variables.cpp b/icd/intel/compiler/pipeline/brw_vec4_live_variables.cpp
new file mode 100644
index 0000000..ff97a24
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4_live_variables.cpp
@@ -0,0 +1,340 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#include "brw_cfg.h"
+#include "brw_vec4_live_variables.h"
+
+using namespace brw;
+
+/** @file brw_vec4_live_variables.cpp
+ *
+ * Support for computing at the basic block level which variables
+ * (virtual GRFs in our case) are live at entry and exit.
+ *
+ * See Muchnick's Advanced Compiler Design and Implementation, section
+ * 14.1 (p444).
+ */
+
+/**
+ * Sets up the use[] and def[] arrays.
+ *
+ * The basic-block-level live variable analysis needs to know which
+ * variables get used before they're completely defined, and which
+ * variables are completely defined before they're used.
+ *
+ * We independently track each channel of a vec4.  This is because we need to
+ * be able to recognize a sequence like:
+ *
+ * ...
+ * DP4 tmp.x a b;
+ * DP4 tmp.y c d;
+ * MUL result.xy tmp.xy e.xy
+ * ...
+ *
+ * as having tmp live only across that sequence (assuming it's used nowhere
+ * else), because it's a common pattern.  A more conservative approach that
+ * doesn't get tmp marked as defined in this block will tend to result in
+ * spilling.
+ */
+void
+vec4_live_variables::setup_def_use()
+{
+   int ip = 0;
+
+   for (int b = 0; b < cfg->num_blocks; b++) {
+      bblock_t *block = cfg->blocks[b];
+
+      assert(ip == block->start_ip);
+      if (b > 0)
+	 assert(cfg->blocks[b - 1]->end_ip == ip - 1);
+
+      for (vec4_instruction *inst = (vec4_instruction *)block->start;
+	   inst != block->end->next;
+	   inst = (vec4_instruction *)inst->next) {
+
+	 /* Set use[] for this instruction */
+	 for (unsigned int i = 0; i < 3; i++) {
+	    if (inst->src[i].file == GRF) {
+	       int reg = inst->src[i].reg;
+
+               for (int j = 0; j < 4; j++) {
+                  int c = BRW_GET_SWZ(inst->src[i].swizzle, j);
+                  if (!BITSET_TEST(bd[b].def, reg * 4 + c))
+                     BITSET_SET(bd[b].use, reg * 4 + c);
+               }
+	    }
+	 }
+
+	 /* Check for unconditional writes to whole registers. These
+	  * are the things that screen off preceding definitions of a
+	  * variable, and thus qualify for being in def[].
+	  */
+	 if (inst->dst.file == GRF &&
+	     v->virtual_grf_sizes[inst->dst.reg] == 1 &&
+	     !inst->predicate) {
+            for (int c = 0; c < 4; c++) {
+               if (inst->dst.writemask & (1 << c)) {
+                  int reg = inst->dst.reg;
+                  if (!BITSET_TEST(bd[b].use, reg * 4 + c))
+                     BITSET_SET(bd[b].def, reg * 4 + c);
+               }
+            }
+         }
+
+	 ip++;
+      }
+   }
+}
+
+/**
+ * The algorithm incrementally sets bits in liveout and livein,
+ * propagating it through control flow.  It will eventually terminate
+ * because it only ever adds bits, and stops when no bits are added in
+ * a pass.
+ */
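+/* As a sketch, the fixed-point iteration below implements the standard
+ * backward dataflow equations:
+ *
+ *    livein[b]  = use[b] | (liveout[b] & ~def[b])
+ *    liveout[b] = union of livein[s] over all successors s of b
+ */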
+void
+vec4_live_variables::compute_live_variables()
+{
+   bool cont = true;
+
+   while (cont) {
+      cont = false;
+
+      for (int b = 0; b < cfg->num_blocks; b++) {
+	 /* Update livein */
+	 for (int i = 0; i < bitset_words; i++) {
+            BITSET_WORD new_livein = (bd[b].use[i] |
+                                      (bd[b].liveout[i] & ~bd[b].def[i]));
+            if (new_livein & ~bd[b].livein[i]) {
+               bd[b].livein[i] |= new_livein;
+               cont = true;
+	    }
+	 }
+
+	 /* Update liveout */
+	 foreach_list(block_node, &cfg->blocks[b]->children) {
+	    bblock_link *link = (bblock_link *)block_node;
+	    bblock_t *block = link->block;
+
+	    for (int i = 0; i < bitset_words; i++) {
+               BITSET_WORD new_liveout = (bd[block->block_num].livein[i] &
+                                          ~bd[b].liveout[i]);
+               if (new_liveout) {
+                  bd[b].liveout[i] |= new_liveout;
+		  cont = true;
+	       }
+	    }
+	 }
+      }
+   }
+}
+
+vec4_live_variables::vec4_live_variables(vec4_visitor *v, cfg_t *cfg)
+   : v(v), cfg(cfg)
+{
+   mem_ctx = ralloc_context(NULL);
+
+   /* This is an emergency hack to work around a defect I don't have time to
+    * figure out the cause of.
+    */
+   num_vars = v->virtual_grf_count * 4;
+   bd = rzalloc_array(mem_ctx, struct block_data, cfg->num_blocks);
+   blocks = cfg->num_blocks;
+
+   bitset_words = BITSET_WORDS(num_vars);
+   for (int i = 0; i < cfg->num_blocks; i++) {
+      bd[i].def = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+      bd[i].use = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+      bd[i].livein = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+      bd[i].liveout = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+   }
+
+   setup_def_use();
+   compute_live_variables();
+}
+
+vec4_live_variables::~vec4_live_variables()
+{
+   ralloc_free(mem_ctx);
+}
+
+#define MAX_INSTRUCTION (1 << 30)
+
+/**
+ * Computes a conservative start/end of the live intervals for each virtual GRF.
+ *
+ * We could expose per-channel live intervals to the consumer based on the
+ * information we computed in vec4_live_variables, except that our only
+ * current user is virtual_grf_interferes().  So we instead union the
+ * per-channel ranges into a per-vgrf range for virtual_grf_start[] and
+ * virtual_grf_end[].
+ *
+ * We could potentially have virtual_grf_interferes() do the test per-channel,
+ * which would let some interesting register allocation occur (particularly on
+ * code-generated GLSL sequences from the Cg compiler which does register
+ * allocation at the GLSL level and thus reuses components of the variable
+ * with distinct lifetimes).  But right now the complexity of doing so doesn't
+ * seem worth it, since having virtual_grf_interferes() be cheap is important
+ * for register allocation performance.
+ */
+void
+vec4_visitor::calculate_live_intervals()
+{
+   if (this->live_intervals)
+      return;
+
+   int *start = ralloc_array(mem_ctx, int, this->virtual_grf_count * 4);
+   int *end = ralloc_array(mem_ctx, int, this->virtual_grf_count * 4);
+   ralloc_free(this->virtual_grf_start);
+   ralloc_free(this->virtual_grf_end);
+   this->virtual_grf_start = start;
+   this->virtual_grf_end = end;
+
+   for (int i = 0; i < this->virtual_grf_count * 4; i++) {
+      start[i] = MAX_INSTRUCTION;
+      end[i] = -1;
+   }
+
+   /* Start by setting up the intervals with no knowledge of control
+    * flow.
+    */
+   int ip = 0;
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      for (unsigned int i = 0; i < 3; i++) {
+	 if (inst->src[i].file == GRF) {
+	    int reg = inst->src[i].reg;
+
+            for (int j = 0; j < 4; j++) {
+               int c = BRW_GET_SWZ(inst->src[i].swizzle, j);
+
+               start[reg * 4 + c] = MIN2(start[reg * 4 + c], ip);
+               end[reg * 4 + c] = ip;
+            }
+	 }
+      }
+
+      if (inst->dst.file == GRF) {
+         int reg = inst->dst.reg;
+
+         for (int c = 0; c < 4; c++) {
+            if (inst->dst.writemask & (1 << c)) {
+               start[reg * 4 + c] = MIN2(start[reg * 4 + c], ip);
+               end[reg * 4 + c] = ip;
+            }
+         }
+      }
+
+      ip++;
+   }
+
+   /* Now, extend those intervals using our analysis of control flow.
+    *
+    * The control flow-aware analysis was done at a channel level, while at
+    * this point we're distilling it down to vgrfs.
+    */
+   cfg_t cfg(&instructions);
+   this->live_intervals = new(mem_ctx) vec4_live_variables(this, &cfg);
+
+
+   for (int b = 0; b < cfg.num_blocks; b++) {
+      for (int i = 0; i < live_intervals->num_vars; i++) {
+	 if (BITSET_TEST(live_intervals->bd[b].livein, i)) {
+	    start[i] = MIN2(start[i], cfg.blocks[b]->start_ip);
+	    end[i] = MAX2(end[i], cfg.blocks[b]->start_ip);
+	 }
+
+	 if (BITSET_TEST(live_intervals->bd[b].liveout, i)) {
+	    start[i] = MIN2(start[i], cfg.blocks[b]->end_ip);
+	    end[i] = MAX2(end[i], cfg.blocks[b]->end_ip);
+	 }
+      }
+   }
+}
+
+void
+vec4_visitor::invalidate_live_intervals()
+{
+   ralloc_free(live_intervals);
+   live_intervals = NULL;
+}
+
+bool
+vec4_visitor::virtual_grf_interferes(int a, int b)
+{
+   int start_a = MIN2(MIN2(virtual_grf_start[a * 4 + 0],
+                           virtual_grf_start[a * 4 + 1]),
+                      MIN2(virtual_grf_start[a * 4 + 2],
+                           virtual_grf_start[a * 4 + 3]));
+   int start_b = MIN2(MIN2(virtual_grf_start[b * 4 + 0],
+                           virtual_grf_start[b * 4 + 1]),
+                      MIN2(virtual_grf_start[b * 4 + 2],
+                           virtual_grf_start[b * 4 + 3]));
+   int end_a = MAX2(MAX2(virtual_grf_end[a * 4 + 0],
+                         virtual_grf_end[a * 4 + 1]),
+                    MAX2(virtual_grf_end[a * 4 + 2],
+                         virtual_grf_end[a * 4 + 3]));
+   int end_b = MAX2(MAX2(virtual_grf_end[b * 4 + 0],
+                         virtual_grf_end[b * 4 + 1]),
+                    MAX2(virtual_grf_end[b * 4 + 2],
+                         virtual_grf_end[b * 4 + 3]));
+   return !(end_a <= start_b ||
+            end_b <= start_a);
+}
+
+int
+vec4_visitor::live_out_count(int block_num) const
+{
+   int count = 0;
+
+   assert(this->live_intervals);
+   assert(this->live_intervals->bd);
+   assert(block_num < this->live_intervals->blocks);
+
+   /* Count the number of live-out variables for this block */
+   for (int i = 0; i < this->live_intervals->bitset_words; ++i)
+      count += _mesa_bitcount(this->live_intervals->bd[block_num].liveout[i]);
+
+   return count;
+}
+
+int
+vec4_visitor::live_in_count(int block_num) const
+{
+   int count = 0;
+
+   assert(this->live_intervals);
+   assert(this->live_intervals->bd);
+   assert(block_num < this->live_intervals->blocks);
+
+   /* Count the number of live-in variables for this block */
+   for (int i = 0; i < this->live_intervals->bitset_words; ++i)
+      count += _mesa_bitcount(this->live_intervals->bd[block_num].livein[i]);
+
+   return count;
+}
+
diff --git a/icd/intel/compiler/pipeline/brw_vec4_live_variables.h b/icd/intel/compiler/pipeline/brw_vec4_live_variables.h
new file mode 100644
index 0000000..9793d5a
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4_live_variables.h
@@ -0,0 +1,76 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#include "main/bitset.h"
+#include "brw_vec4.h"
+
+namespace brw {
+
+struct block_data {
+   /**
+    * Which variables are defined before being used in the block.
+    *
+    * Note that for our purposes, "defined" means unconditionally, completely
+    * defined.
+    */
+   BITSET_WORD *def;
+
+   /**
+    * Which variables are used before being defined in the block.
+    */
+   BITSET_WORD *use;
+
+   /** Which defs reach the entry point of the block. */
+   BITSET_WORD *livein;
+
+   /** Which defs reach the exit point of the block. */
+   BITSET_WORD *liveout;
+};
+
+class vec4_live_variables {
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(vec4_live_variables)
+
+   vec4_live_variables(vec4_visitor *v, cfg_t *cfg);
+   ~vec4_live_variables();
+
+   void setup_def_use();
+   void compute_live_variables();
+
+   vec4_visitor *v;
+   cfg_t *cfg;
+   void *mem_ctx;
+
+   int num_vars;
+   int bitset_words;
+   int blocks;
+
+   /** Per-basic-block information on live variables */
+   struct block_data *bd;
+};
+
+} /* namespace brw */
diff --git a/icd/intel/compiler/pipeline/brw_vec4_reg_allocate.cpp b/icd/intel/compiler/pipeline/brw_vec4_reg_allocate.cpp
new file mode 100644
index 0000000..349c031
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4_reg_allocate.cpp
@@ -0,0 +1,367 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+extern "C" {
+#include "main/macros.h"
+#include "program/register_allocate.h"
+} /* extern "C" */
+
+#include "brw_vec4.h"
+#include "brw_vs.h"
+
+using namespace brw;
+
+namespace brw {
+
+static void
+assign(unsigned int *reg_hw_locations, reg *reg)
+{
+   if (reg->file == GRF) {
+      reg->reg = reg_hw_locations[reg->reg];
+   }
+}
+
+bool
+vec4_visitor::reg_allocate_trivial()
+{
+   unsigned int hw_reg_mapping[this->virtual_grf_count];
+   bool virtual_grf_used[this->virtual_grf_count];
+   int i;
+   int next;
+
+   /* Calculate which virtual GRFs are actually in use after whatever
+    * optimization passes have occurred.
+    */
+   for (int i = 0; i < this->virtual_grf_count; i++) {
+      virtual_grf_used[i] = false;
+   }
+
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *) node;
+
+      if (inst->dst.file == GRF)
+	 virtual_grf_used[inst->dst.reg] = true;
+
+      for (int i = 0; i < 3; i++) {
+	 if (inst->src[i].file == GRF)
+	    virtual_grf_used[inst->src[i].reg] = true;
+      }
+   }
+
+   hw_reg_mapping[0] = this->first_non_payload_grf;
+   next = hw_reg_mapping[0] + this->virtual_grf_sizes[0];
+   for (i = 1; i < this->virtual_grf_count; i++) {
+      if (virtual_grf_used[i]) {
+	 hw_reg_mapping[i] = next;
+	 next += this->virtual_grf_sizes[i];
+      }
+   }
+   prog_data->total_grf = next;
+
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *) node;
+
+      assign(hw_reg_mapping, &inst->dst);
+      assign(hw_reg_mapping, &inst->src[0]);
+      assign(hw_reg_mapping, &inst->src[1]);
+      assign(hw_reg_mapping, &inst->src[2]);
+   }
+
+   if (prog_data->total_grf > max_grf) {
+      fail("Ran out of regs on trivial allocator (%d/%d)\n",
+	   prog_data->total_grf, max_grf);
+      return false;
+   }
+
+   return true;
+}
+
+extern "C" void
+brw_vec4_alloc_reg_set(struct intel_screen *screen)
+{
+   int base_reg_count =
+      screen->devinfo->gen >= 7 ? GEN7_MRF_HACK_START : BRW_MAX_GRF;
+
+   /* After running split_virtual_grfs(), almost all VGRFs will be of size 1.
+    * SEND-from-GRF sources cannot be split, so we also need classes for each
+    * potential message length.
+    */
+   const int class_count = 2;
+   const int class_sizes[class_count] = {1, 2};
+
+   /* Compute the total number of registers across all classes. */
+   int ra_reg_count = 0;
+   for (int i = 0; i < class_count; i++) {
+      ra_reg_count += base_reg_count - (class_sizes[i] - 1);
+   }
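+   /* For example (a sketch): with class sizes {1, 2}, this comes to
+    * base_reg_count + (base_reg_count - 1) allocator registers in total.
+    */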
+
+   ralloc_free(screen->vec4_reg_set.ra_reg_to_grf);
+   screen->vec4_reg_set.ra_reg_to_grf = ralloc_array(screen, uint8_t, ra_reg_count);
+   ralloc_free(screen->vec4_reg_set.regs);
+   screen->vec4_reg_set.regs = ra_alloc_reg_set(screen, ra_reg_count);
+   if (screen->devinfo->gen >= 6)
+      ra_set_allocate_round_robin(screen->vec4_reg_set.regs);
+   ralloc_free(screen->vec4_reg_set.classes);
+   screen->vec4_reg_set.classes = ralloc_array(screen, int, class_count);
+
+   /* Now, add the registers to their classes, and add the conflicts
+    * between them and the base GRF registers (and also each other).
+    */
+   int reg = 0;
+   for (int i = 0; i < class_count; i++) {
+      int class_reg_count = base_reg_count - (class_sizes[i] - 1);
+      screen->vec4_reg_set.classes[i] = ra_alloc_reg_class(screen->vec4_reg_set.regs);
+
+      for (int j = 0; j < class_reg_count; j++) {
+	 ra_class_add_reg(screen->vec4_reg_set.regs, screen->vec4_reg_set.classes[i], reg);
+
+	 screen->vec4_reg_set.ra_reg_to_grf[reg] = j;
+
+	 for (int base_reg = j;
+	      base_reg < j + class_sizes[i];
+	      base_reg++) {
+	    ra_add_transitive_reg_conflict(screen->vec4_reg_set.regs, base_reg, reg);
+	 }
+
+	 reg++;
+      }
+   }
+   assert(reg == ra_reg_count);
+
+   ra_set_finalize(screen->vec4_reg_set.regs, NULL);
+}
+
+void
+vec4_visitor::setup_payload_interference(struct ra_graph *g,
+                                         int first_payload_node,
+                                         int reg_node_count)
+{
+   int payload_node_count = this->first_non_payload_grf;
+
+   for (int i = 0; i < payload_node_count; i++) {
+      /* Mark each payload reg node as being allocated to its physical register.
+       *
+       * The alternative would be to have per-physical register classes, which
+       * would just be silly.
+       */
+      ra_set_node_reg(g, first_payload_node + i, i);
+
+      /* For now, just mark each payload node as interfering with every other
+       * node to be allocated.
+       */
+      for (int j = 0; j < reg_node_count; j++) {
+         ra_add_node_interference(g, first_payload_node + i, j);
+      }
+   }
+}
+
+bool
+vec4_visitor::reg_allocate()
+{
+   struct intel_screen *screen = brw->intelScreen;
+   unsigned int hw_reg_mapping[virtual_grf_count];
+   int payload_reg_count = this->first_non_payload_grf;
+
+   /* Using the trivial allocator can be useful in debugging undefined
+    * register access as a result of broken optimization passes.
+    */
+   if (0)
+      return reg_allocate_trivial();
+
+   calculate_live_intervals();
+
+   int node_count = virtual_grf_count;
+   int first_payload_node = node_count;
+   node_count += payload_reg_count;
+   struct ra_graph *g =
+      ra_alloc_interference_graph(screen->vec4_reg_set.regs, node_count);
+
+   for (int i = 0; i < virtual_grf_count; i++) {
+      int size = this->virtual_grf_sizes[i];
+      assert(size >= 1 && size <= 2 &&
+             "Register allocation relies on split_virtual_grfs().");
+      ra_set_node_class(g, i, screen->vec4_reg_set.classes[size - 1]);
+
+      for (int j = 0; j < i; j++) {
+	 if (virtual_grf_interferes(i, j)) {
+	    ra_add_node_interference(g, i, j);
+	 }
+      }
+   }
+
+   setup_payload_interference(g, first_payload_node, node_count);
+
+   if (!ra_allocate_no_spills(g)) {
+      /* Failed to allocate registers.  Spill a reg, and the caller will
+       * loop back into here to try again.
+       */
+      int reg = choose_spill_reg(g);
+      if (this->no_spills) {
+         fail("Failure to register allocate.  Reduce number of live "
+              "values to avoid this.");
+      } else if (reg == -1) {
+         fail("no register to spill\n");
+      } else {
+         spill_reg(reg);
+      }
+      ralloc_free(g);
+      return false;
+   }
+
+   /* Get the chosen virtual registers for each node, and map virtual
+    * regs in the register classes back down to real hardware reg
+    * numbers.
+    */
+   prog_data->total_grf = payload_reg_count;
+   for (int i = 0; i < virtual_grf_count; i++) {
+      int reg = ra_get_node_reg(g, i);
+
+      hw_reg_mapping[i] = screen->vec4_reg_set.ra_reg_to_grf[reg];
+      prog_data->total_grf = MAX2(prog_data->total_grf,
+				  hw_reg_mapping[i] + virtual_grf_sizes[i]);
+   }
+
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      assign(hw_reg_mapping, &inst->dst);
+      assign(hw_reg_mapping, &inst->src[0]);
+      assign(hw_reg_mapping, &inst->src[1]);
+      assign(hw_reg_mapping, &inst->src[2]);
+   }
+
+   ralloc_free(g);
+
+   return true;
+}
+
+void
+vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
+{
+   float loop_scale = 1.0;
+
+   for (int i = 0; i < this->virtual_grf_count; i++) {
+      spill_costs[i] = 0.0;
+      no_spill[i] = virtual_grf_sizes[i] != 1;
+   }
+
+   /* Calculate costs for spilling nodes.  Call it a cost of 1 per
+    * spill/unspill we'll have to do, and guess that the insides of
+    * loops run 10 times.
+    */
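+   /* For example (illustrative only): a GRF referenced once at the top
+    * level costs 1, once inside a single loop costs 10, and once inside a
+    * doubly nested loop costs 100.
+    */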
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *) node;
+
+      for (unsigned int i = 0; i < 3; i++) {
+	 if (inst->src[i].file == GRF) {
+	    spill_costs[inst->src[i].reg] += loop_scale;
+            if (inst->src[i].reladdr)
+               no_spill[inst->src[i].reg] = true;
+	 }
+      }
+
+      if (inst->dst.file == GRF) {
+	 spill_costs[inst->dst.reg] += loop_scale;
+         if (inst->dst.reladdr)
+            no_spill[inst->dst.reg] = true;
+      }
+
+      switch (inst->opcode) {
+
+      case BRW_OPCODE_DO:
+	 loop_scale *= 10;
+	 break;
+
+      case BRW_OPCODE_WHILE:
+	 loop_scale /= 10;
+	 break;
+
+      case SHADER_OPCODE_GEN4_SCRATCH_READ:
+      case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
+         for (int i = 0; i < 3; i++) {
+            if (inst->src[i].file == GRF)
+               no_spill[inst->src[i].reg] = true;
+         }
+	 if (inst->dst.file == GRF)
+	    no_spill[inst->dst.reg] = true;
+	 break;
+
+      default:
+	 break;
+      }
+   }
+}
+
+int
+vec4_visitor::choose_spill_reg(struct ra_graph *g)
+{
+   float spill_costs[this->virtual_grf_count];
+   bool no_spill[this->virtual_grf_count];
+
+   evaluate_spill_costs(spill_costs, no_spill);
+
+   for (int i = 0; i < this->virtual_grf_count; i++) {
+      if (!no_spill[i])
+         ra_set_node_spill_cost(g, i, spill_costs[i]);
+   }
+
+   return ra_get_best_spill_node(g);
+}
+
+void
+vec4_visitor::spill_reg(int spill_reg_nr)
+{
+   assert(virtual_grf_sizes[spill_reg_nr] == 1);
+   unsigned int spill_offset = c->last_scratch++;
+
+   /* Generate spill/unspill instructions for the objects being spilled. */
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *) node;
+
+      for (unsigned int i = 0; i < 3; i++) {
+         if (inst->src[i].file == GRF && inst->src[i].reg == spill_reg_nr) {
+            src_reg spill_reg = inst->src[i];
+            inst->src[i].reg = virtual_grf_alloc(1);
+            dst_reg temp = dst_reg(inst->src[i]);
+
+            /* Only read the necessary channels, to avoid overwriting the rest
+             * with data that may not have been written to scratch.
+             */
+            temp.writemask = 0;
+            for (int c = 0; c < 4; c++)
+               temp.writemask |= (1 << BRW_GET_SWZ(inst->src[i].swizzle, c));
+            assert(temp.writemask != 0);
+
+            emit_scratch_read(inst, temp, spill_reg, spill_offset);
+         }
+      }
+
+      if (inst->dst.file == GRF && inst->dst.reg == spill_reg_nr) {
+         emit_scratch_write(inst, spill_offset);
+      }
+   }
+
+   invalidate_live_intervals();
+}
+
+} /* namespace brw */
diff --git a/icd/intel/compiler/pipeline/brw_vec4_visitor.cpp b/icd/intel/compiler/pipeline/brw_vec4_visitor.cpp
new file mode 100644
index 0000000..a9b5dad
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4_visitor.cpp
@@ -0,0 +1,3450 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_vec4.h"
+#include "glsl/ir_uniform.h"
+extern "C" {
+#include "program/sampler.h"
+}
+
+namespace brw {
+
+vec4_instruction::vec4_instruction(vec4_visitor *v,
+				   enum opcode opcode, dst_reg dst,
+				   src_reg src0, src_reg src1, src_reg src2)
+{
+   this->opcode = opcode;
+   this->dst = dst;
+   this->src[0] = src0;
+   this->src[1] = src1;
+   this->src[2] = src2;
+   this->saturate = false;
+   this->force_writemask_all = false;
+   this->no_dd_clear = false;
+   this->no_dd_check = false;
+   this->writes_accumulator = false;
+   this->conditional_mod = BRW_CONDITIONAL_NONE;
+   this->sampler = 0;
+   this->texture_offset = 0;
+   this->target = 0;
+   this->shadow_compare = false;
+   this->ir = v->base_ir;
+   this->urb_write_flags = BRW_URB_WRITE_NO_FLAGS;
+   this->header_present = false;
+   this->mlen = 0;
+   this->base_mrf = 0;
+   this->offset = 0;
+   this->annotation = v->current_annotation;
+}
+
+vec4_instruction *
+vec4_visitor::emit(vec4_instruction *inst)
+{
+   this->instructions.push_tail(inst);
+
+   return inst;
+}
+
+vec4_instruction *
+vec4_visitor::emit_before(vec4_instruction *inst, vec4_instruction *new_inst)
+{
+   new_inst->ir = inst->ir;
+   new_inst->annotation = inst->annotation;
+
+   inst->insert_before(new_inst);
+
+   return inst;
+}
+
+vec4_instruction *
+vec4_visitor::emit(enum opcode opcode, dst_reg dst,
+		   src_reg src0, src_reg src1, src_reg src2)
+{
+   return emit(new(mem_ctx) vec4_instruction(this, opcode, dst,
+					     src0, src1, src2));
+}
+
+
+vec4_instruction *
+vec4_visitor::emit(enum opcode opcode, dst_reg dst, src_reg src0, src_reg src1)
+{
+   return emit(new(mem_ctx) vec4_instruction(this, opcode, dst, src0, src1));
+}
+
+vec4_instruction *
+vec4_visitor::emit(enum opcode opcode, dst_reg dst, src_reg src0)
+{
+   return emit(new(mem_ctx) vec4_instruction(this, opcode, dst, src0));
+}
+
+vec4_instruction *
+vec4_visitor::emit(enum opcode opcode, dst_reg dst)
+{
+   return emit(new(mem_ctx) vec4_instruction(this, opcode, dst));
+}
+
+vec4_instruction *
+vec4_visitor::emit(enum opcode opcode)
+{
+   return emit(new(mem_ctx) vec4_instruction(this, opcode, dst_reg()));
+}
+
+#define ALU1(op)							\
+   vec4_instruction *							\
+   vec4_visitor::op(dst_reg dst, src_reg src0)				\
+   {									\
+      return new(mem_ctx) vec4_instruction(this, BRW_OPCODE_##op, dst,	\
+					   src0);			\
+   }
+
+#define ALU2(op)							\
+   vec4_instruction *							\
+   vec4_visitor::op(dst_reg dst, src_reg src0, src_reg src1)		\
+   {									\
+      return new(mem_ctx) vec4_instruction(this, BRW_OPCODE_##op, dst,	\
+					   src0, src1);			\
+   }
+
+#define ALU2_ACC(op)							\
+   vec4_instruction *							\
+   vec4_visitor::op(dst_reg dst, src_reg src0, src_reg src1)		\
+   {									\
+      vec4_instruction *inst = new(mem_ctx) vec4_instruction(this,     \
+                       BRW_OPCODE_##op, dst, src0, src1);		\
+      inst->writes_accumulator = true;                                 \
+      return inst;                                                     \
+   }
+
+#define ALU3(op)							\
+   vec4_instruction *							\
+   vec4_visitor::op(dst_reg dst, src_reg src0, src_reg src1, src_reg src2)\
+   {									\
+      assert(brw->gen >= 6);						\
+      return new(mem_ctx) vec4_instruction(this, BRW_OPCODE_##op, dst,	\
+					   src0, src1, src2);		\
+   }
+
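+/* Each ALU*() invocation below defines a builder such as
+ *
+ *    vec4_instruction *vec4_visitor::ADD(dst_reg dst, src_reg src0,
+ *                                        src_reg src1);
+ *
+ * which constructs the instruction in mem_ctx but does not add it to the
+ * instruction stream; callers pass the result to emit().
+ */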
+ALU1(NOT)
+ALU1(MOV)
+ALU1(FRC)
+ALU1(RNDD)
+ALU1(RNDE)
+ALU1(RNDZ)
+ALU1(F32TO16)
+ALU1(F16TO32)
+ALU2(ADD)
+ALU2(MUL)
+ALU2_ACC(MACH)
+ALU2(AND)
+ALU2(OR)
+ALU2(XOR)
+ALU2(DP3)
+ALU2(DP4)
+ALU2(DPH)
+ALU2(SHL)
+ALU2(SHR)
+ALU2(ASR)
+ALU3(LRP)
+ALU1(BFREV)
+ALU3(BFE)
+ALU2(BFI1)
+ALU3(BFI2)
+ALU1(FBH)
+ALU1(FBL)
+ALU1(CBIT)
+ALU3(MAD)
+ALU2_ACC(ADDC)
+ALU2_ACC(SUBB)
+ALU2(MAC)
+
+/** Gen4 predicated IF. */
+vec4_instruction *
+vec4_visitor::IF(uint32_t predicate)
+{
+   vec4_instruction *inst;
+
+   inst = new(mem_ctx) vec4_instruction(this, BRW_OPCODE_IF);
+   inst->predicate = predicate;
+
+   return inst;
+}
+
+/** Gen6 IF with embedded comparison. */
+vec4_instruction *
+vec4_visitor::IF(src_reg src0, src_reg src1, uint32_t condition)
+{
+   assert(brw->gen == 6);
+
+   vec4_instruction *inst;
+
+   resolve_ud_negate(&src0);
+   resolve_ud_negate(&src1);
+
+   inst = new(mem_ctx) vec4_instruction(this, BRW_OPCODE_IF, dst_null_d(),
+					src0, src1);
+   inst->conditional_mod = condition;
+
+   return inst;
+}
+
+/**
+ * CMP: Sets the low bit of the destination channels with the result
+ * of the comparison, while the upper bits are undefined, and updates
+ * the flag register with the packed 16 bits of the result.
+ */
+vec4_instruction *
+vec4_visitor::CMP(dst_reg dst, src_reg src0, src_reg src1, uint32_t condition)
+{
+   vec4_instruction *inst;
+
+   /* Original gen4 hardware does type conversion to the destination type
+    * before comparison, producing garbage results for floating
+    * point comparisons.
+    */
+   if (brw->gen == 4) {
+      dst.type = src0.type;
+      if (dst.file == HW_REG)
+	 dst.fixed_hw_reg.type = dst.type;
+   }
+
+   resolve_ud_negate(&src0);
+   resolve_ud_negate(&src1);
+
+   inst = new(mem_ctx) vec4_instruction(this, BRW_OPCODE_CMP, dst, src0, src1);
+   inst->conditional_mod = condition;
+
+   return inst;
+}
+
+vec4_instruction *
+vec4_visitor::SCRATCH_READ(dst_reg dst, src_reg index)
+{
+   vec4_instruction *inst;
+
+   inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_GEN4_SCRATCH_READ,
+					dst, index);
+   inst->base_mrf = 14;
+   inst->mlen = 2;
+
+   return inst;
+}
+
+vec4_instruction *
+vec4_visitor::SCRATCH_WRITE(dst_reg dst, src_reg src, src_reg index)
+{
+   vec4_instruction *inst;
+
+   inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_GEN4_SCRATCH_WRITE,
+					dst, src, index);
+   inst->base_mrf = 13;
+   inst->mlen = 3;
+
+   return inst;
+}
+
+void
+vec4_visitor::emit_dp(dst_reg dst, src_reg src0, src_reg src1, unsigned elements)
+{
+   static enum opcode dot_opcodes[] = {
+      BRW_OPCODE_DP2, BRW_OPCODE_DP3, BRW_OPCODE_DP4
+   };
+
+   emit(dot_opcodes[elements - 2], dst, src0, src1);
+}
+
+src_reg
+vec4_visitor::fix_3src_operand(src_reg src)
+{
+   /* Using vec4 uniforms in SIMD4x2 programs is difficult. You'd like to be
+    * able to use vertical stride of zero to replicate the vec4 uniform, like
+    *
+    *    g3<0;4,1>:f - [0, 4][1, 5][2, 6][3, 7]
+    *
+    * But you can't, since vertical stride is always four in three-source
+    * instructions. Instead, insert a MOV instruction to do the replication so
+    * that the three-source instruction can consume it.
+    */
+
+   /* The MOV is only needed if the source is a uniform or immediate. */
+   if (src.file != UNIFORM && src.file != IMM)
+      return src;
+
+   if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle))
+      return src;
+
+   dst_reg expanded = dst_reg(this, glsl_type::vec4_type);
+   expanded.type = src.type;
+   emit(MOV(expanded, src));
+   return src_reg(expanded);
+}
+
+src_reg
+vec4_visitor::fix_math_operand(src_reg src)
+{
+   /* The gen6 math instruction ignores the source modifiers --
+    * swizzle, abs, negate, and at least some parts of the register
+    * region description.
+    *
+    * Rather than trying to enumerate all these cases, *always* expand the
+    * operand to a temp GRF for gen6.
+    *
+    * For gen7, keep the operand as-is, except if immediate, which gen7 still
+    * can't use.
+    */
+
+   if (brw->gen == 7 && src.file != IMM)
+      return src;
+
+   dst_reg expanded = dst_reg(this, glsl_type::vec4_type);
+   expanded.type = src.type;
+   emit(MOV(expanded, src));
+   return src_reg(expanded);
+}
+
+void
+vec4_visitor::emit_math1_gen6(enum opcode opcode, dst_reg dst, src_reg src)
+{
+   src = fix_math_operand(src);
+
+   if (dst.writemask != WRITEMASK_XYZW) {
+      /* The gen6 math instruction must be align1, so we can't do
+       * writemasks.
+       */
+      dst_reg temp_dst = dst_reg(this, glsl_type::vec4_type);
+
+      emit(opcode, temp_dst, src);
+
+      emit(MOV(dst, src_reg(temp_dst)));
+   } else {
+      emit(opcode, dst, src);
+   }
+}
+
+void
+vec4_visitor::emit_math1_gen4(enum opcode opcode, dst_reg dst, src_reg src)
+{
+   vec4_instruction *inst = emit(opcode, dst, src);
+   inst->base_mrf = 1;
+   inst->mlen = 1;
+}
+
+void
+vec4_visitor::emit_math(opcode opcode, dst_reg dst, src_reg src)
+{
+   switch (opcode) {
+   case SHADER_OPCODE_RCP:
+   case SHADER_OPCODE_RSQ:
+   case SHADER_OPCODE_SQRT:
+   case SHADER_OPCODE_EXP2:
+   case SHADER_OPCODE_LOG2:
+   case SHADER_OPCODE_SIN:
+   case SHADER_OPCODE_COS:
+      break;
+   default:
+      assert(!"not reached: bad math opcode");
+      return;
+   }
+
+   if (brw->gen >= 8) {
+      emit(opcode, dst, src);
+   } else if (brw->gen >= 6) {
+      emit_math1_gen6(opcode, dst, src);
+   } else {
+      emit_math1_gen4(opcode, dst, src);
+   }
+}
+
+void
+vec4_visitor::emit_math2_gen6(enum opcode opcode,
+			      dst_reg dst, src_reg src0, src_reg src1)
+{
+   src0 = fix_math_operand(src0);
+   src1 = fix_math_operand(src1);
+
+   if (dst.writemask != WRITEMASK_XYZW) {
+      /* The gen6 math instruction must be align1, so we can't do
+       * writemasks.
+       */
+      dst_reg temp_dst = dst_reg(this, glsl_type::vec4_type);
+      temp_dst.type = dst.type;
+
+      emit(opcode, temp_dst, src0, src1);
+
+      emit(MOV(dst, src_reg(temp_dst)));
+   } else {
+      emit(opcode, dst, src0, src1);
+   }
+}
+
+void
+vec4_visitor::emit_math2_gen4(enum opcode opcode,
+			      dst_reg dst, src_reg src0, src_reg src1)
+{
+   vec4_instruction *inst = emit(opcode, dst, src0, src1);
+   inst->base_mrf = 1;
+   inst->mlen = 2;
+}
+
+void
+vec4_visitor::emit_math(enum opcode opcode,
+			dst_reg dst, src_reg src0, src_reg src1)
+{
+   switch (opcode) {
+   case SHADER_OPCODE_POW:
+   case SHADER_OPCODE_INT_QUOTIENT:
+   case SHADER_OPCODE_INT_REMAINDER:
+      break;
+   default:
+      assert(!"not reached: unsupported binary math opcode");
+      return;
+   }
+
+   if (brw->gen >= 8) {
+      emit(opcode, dst, src0, src1);
+   } else if (brw->gen >= 6) {
+      emit_math2_gen6(opcode, dst, src0, src1);
+   } else {
+      emit_math2_gen4(opcode, dst, src0, src1);
+   }
+}
+
+void
+vec4_visitor::emit_pack_half_2x16(dst_reg dst, src_reg src0)
+{
+   if (brw->gen < 7)
+      assert(!"ir_unop_pack_half_2x16 should be lowered");
+
+   assert(dst.type == BRW_REGISTER_TYPE_UD);
+   assert(src0.type == BRW_REGISTER_TYPE_F);
+
+   /* From the Ivybridge PRM, Vol4, Part3, Section 6.27 f32to16:
+    *
+    *   Because this instruction does not have a 16-bit floating-point type,
+    *   the destination data type must be Word (W).
+    *
+    *   The destination must be DWord-aligned and specify a horizontal stride
+    *   (HorzStride) of 2. The 16-bit result is stored in the lower word of
+    *   each destination channel and the upper word is not modified.
+    *
+    * The above restriction implies that the f32to16 instruction must use
+    * align1 mode, because only in align1 mode is it possible to specify
+    * horizontal stride.  We choose here to defy the hardware docs and emit
+    * align16 instructions.
+    *
+    * (I [chadv] did attempt to emit align1 instructions for VS f32to16
+    * instructions. I was partially successful in that the code passed all
+    * tests.  However, the code was dubiously correct and fragile, and the
+    * tests were not harsh enough to probe that frailty. Not trusting the
+    * code, I chose instead to remain in align16 mode in defiance of the hw
+    * docs).
+    *
+    * I've [chadv] experimentally confirmed that, on gen7 hardware and the
+    * simulator, emitting a f32to16 in align16 mode with UD as destination
+    * data type is safe. The behavior differs from that specified in the PRM
+    * in that the upper word of each destination channel is cleared to 0.
+    */
+
+   dst_reg tmp_dst(this, glsl_type::uvec2_type);
+   src_reg tmp_src(tmp_dst);
+
+#if 0
+   /* Verify the undocumented behavior on which the following instructions
+    * rely.  If f32to16 fails to clear the upper word of the X and Y channels,
+    * then the result of the bit-or instruction below will be incorrect.
+    *
+    * You should inspect the disasm output in order to verify that the MOV is
+    * not optimized away.
+    */
+   emit(MOV(tmp_dst, src_reg(0x12345678u)));
+#endif
+
+   /* Give tmp the form below, where "." means untouched.
+    *
+    *     w z          y          x w z          y          x
+    *   |.|.|0x0000hhhh|0x0000llll|.|.|0x0000hhhh|0x0000llll|
+    *
+    * That the upper word of each write-channel be 0 is required for the
+    * following bit-shift and bit-or instructions to work. Note that this
+    * relies on the undocumented hardware behavior mentioned above.
+    */
+   tmp_dst.writemask = WRITEMASK_XY;
+   emit(F32TO16(tmp_dst, src0));
+
+   /* Give the write-channels of dst the form:
+    *   0xhhhh0000
+    */
+   tmp_src.swizzle = BRW_SWIZZLE_YYYY;
+   emit(SHL(dst, tmp_src, src_reg(16u)));
+
+   /* Finally, give the write-channels of dst the form of packHalf2x16's
+    * output:
+    *   0xhhhhllll
+    */
+   tmp_src.swizzle = BRW_SWIZZLE_XXXX;
+   emit(OR(dst, src_reg(dst), tmp_src));
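+
+   /* Net effect per enabled channel (a sketch):
+    *   dst = (f32to16(src0.y) << 16) | f32to16(src0.x)
+    */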
+}
+
+void
+vec4_visitor::emit_unpack_half_2x16(dst_reg dst, src_reg src0)
+{
+   if (brw->gen < 7)
+      assert(!"ir_unop_unpack_half_2x16 should be lowered");
+
+   assert(dst.type == BRW_REGISTER_TYPE_F);
+   assert(src0.type == BRW_REGISTER_TYPE_UD);
+
+   /* From the Ivybridge PRM, Vol4, Part3, Section 6.26 f16to32:
+    *
+    *   Because this instruction does not have a 16-bit floating-point type,
+    *   the source data type must be Word (W). The destination type must be
+    *   F (Float).
+    *
+    * To use W as the source data type, we must adjust horizontal strides,
+    * which is only possible in align1 mode. All my [chadv] attempts at
+    * emitting align1 instructions for unpackHalf2x16 failed to pass the
+    * Piglit tests, so I gave up.
+    *
+    * I've verified that, on gen7 hardware and the simulator, it is safe to
+    * emit f16to32 in align16 mode with UD as source data type.
+    */
+
+   dst_reg tmp_dst(this, glsl_type::uvec2_type);
+   src_reg tmp_src(tmp_dst);
+
+   tmp_dst.writemask = WRITEMASK_X;
+   emit(AND(tmp_dst, src0, src_reg(0xffffu)));
+
+   tmp_dst.writemask = WRITEMASK_Y;
+   emit(SHR(tmp_dst, src0, src_reg(16u)));
+
+   dst.writemask = WRITEMASK_XY;
+   emit(F16TO32(dst, tmp_src));
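+
+   /* Net effect (a sketch):
+    *   dst.x = f16to32(src0 & 0xffff)
+    *   dst.y = f16to32(src0 >> 16)
+    */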
+}
+
+void
+vec4_visitor::visit_instructions(const exec_list *list)
+{
+   foreach_list(node, list) {
+      ir_instruction *ir = (ir_instruction *)node;
+
+      base_ir = ir;
+      ir->accept(this);
+   }
+}
+
+
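+/* Returns the size of a type in vec4 registers.  For example (a sketch):
+ * float, vec4 and bool each take 1; mat3 takes 3 (one register per
+ * column); and vec4[10] takes 10.
+ */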
+static int
+type_size(const struct glsl_type *type)
+{
+   unsigned int i;
+   int size;
+
+   switch (type->base_type) {
+   case GLSL_TYPE_UINT:
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_BOOL:
+      if (type->is_matrix()) {
+	 return type->matrix_columns;
+      } else {
+	 /* Regardless of size of vector, it gets a vec4. This is bad
+	  * packing for things like floats, but otherwise arrays become a
+	  * mess.  Hopefully a later pass over the code can pack scalars
+	  * down if appropriate.
+	  */
+	 return 1;
+      }
+   case GLSL_TYPE_ARRAY:
+      assert(type->length > 0);
+      return type_size(type->fields.array) * type->length;
+   case GLSL_TYPE_STRUCT:
+      size = 0;
+      for (i = 0; i < type->length; i++) {
+	 size += type_size(type->fields.structure[i].type);
+      }
+      return size;
+   case GLSL_TYPE_SAMPLER:
+      /* Samplers take up one slot in UNIFORMS[], but they're baked in
+       * at link time.
+       */
+      return 1;
+   case GLSL_TYPE_ATOMIC_UINT:
+      return 0;
+   case GLSL_TYPE_IMAGE:
+   case GLSL_TYPE_VOID:
+   case GLSL_TYPE_ERROR:
+   case GLSL_TYPE_INTERFACE:
+      assert(0);
+      break;
+   }
+
+   return 0;
+}
+
+int
+vec4_visitor::virtual_grf_alloc(int size)
+{
+   if (virtual_grf_array_size <= virtual_grf_count) {
+      if (virtual_grf_array_size == 0)
+	 virtual_grf_array_size = 16;
+      else
+	 virtual_grf_array_size *= 2;
+      virtual_grf_sizes = reralloc(mem_ctx, virtual_grf_sizes, int,
+				   virtual_grf_array_size);
+      virtual_grf_reg_map = reralloc(mem_ctx, virtual_grf_reg_map, int,
+				     virtual_grf_array_size);
+   }
+   virtual_grf_reg_map[virtual_grf_count] = virtual_grf_reg_count;
+   virtual_grf_reg_count += size;
+   virtual_grf_sizes[virtual_grf_count] = size;
+   return virtual_grf_count++;
+}
+
+src_reg::src_reg(class vec4_visitor *v, const struct glsl_type *type)
+{
+   init();
+
+   this->file = GRF;
+   this->reg = v->virtual_grf_alloc(type_size(type));
+
+   if (type->is_array() || type->is_record()) {
+      this->swizzle = BRW_SWIZZLE_NOOP;
+   } else {
+      this->swizzle = swizzle_for_size(type->vector_elements);
+   }
+
+   this->type = brw_type_for_base_type(type);
+}
+
+dst_reg::dst_reg(class vec4_visitor *v, const struct glsl_type *type)
+{
+   init();
+
+   this->file = GRF;
+   this->reg = v->virtual_grf_alloc(type_size(type));
+
+   if (type->is_array() || type->is_record()) {
+      this->writemask = WRITEMASK_XYZW;
+   } else {
+      this->writemask = (1 << type->vector_elements) - 1;
+   }
+
+   this->type = brw_type_for_base_type(type);
+}
+
+/* Our support for uniforms is piggy-backed on the struct
+ * gl_fragment_program, because that's where the values actually
+ * get stored, rather than in some global gl_shader_program uniform
+ * store.
+ */
+void
+vec4_visitor::setup_uniform_values(ir_variable *ir)
+{
+   int namelen = strlen(ir->name);
+
+   /* The data for our (non-builtin) uniforms is stored in a series of
+    * gl_uniform_driver_storage structs for each subcomponent that
+    * glGetUniformLocation() could name.  We know it's been set up in the same
+    * order we'd walk the type, so walk the list of storage and find anything
+    * with our name, or the prefix of a component that starts with our name.
+    */
+   for (unsigned u = 0; u < shader_prog->NumUserUniformStorage; u++) {
+      struct gl_uniform_storage *storage = &shader_prog->UniformStorage[u];
+
+      if (strncmp(ir->name, storage->name, namelen) != 0 ||
+          (storage->name[namelen] != 0 &&
+           storage->name[namelen] != '.' &&
+           storage->name[namelen] != '[')) {
+         continue;
+      }
+
+      gl_constant_value *components = storage->storage;
+      unsigned vector_count = (MAX2(storage->array_elements, 1) *
+                               storage->type->matrix_columns);
+
+      for (unsigned s = 0; s < vector_count; s++) {
+         assert(uniforms < uniform_array_size);
+         uniform_vector_size[uniforms] = storage->type->vector_elements;
+
+         int i;
+         for (i = 0; i < uniform_vector_size[uniforms]; i++) {
+            stage_prog_data->param[uniforms * 4 + i] = &components->f;
+            components++;
+         }
+         for (; i < 4; i++) {
+            static float zero = 0;
+            stage_prog_data->param[uniforms * 4 + i] = &zero;
+         }
+
+         uniforms++;
+      }
+   }
+}
+
+void
+vec4_visitor::setup_uniform_clipplane_values()
+{
+//   gl_clip_plane *clip_planes = brw_select_clip_planes(ctx);
+//
+//   for (int i = 0; i < key->nr_userclip_plane_consts; ++i) {
+//      assert(this->uniforms < uniform_array_size);
+//      this->uniform_vector_size[this->uniforms] = 4;
+//      this->userplane[i] = dst_reg(UNIFORM, this->uniforms);
+//      this->userplane[i].type = BRW_REGISTER_TYPE_F;
+//      for (int j = 0; j < 4; ++j) {
+//         stage_prog_data->param[this->uniforms * 4 + j] = &clip_planes[i][j];
+//      }
+//      ++this->uniforms;
+//   }
+   assert(!"no gl_ClipPlane support");
+}
+
+/* Our support for builtin uniforms is even scarier than non-builtin.
+ * It sits on top of the PROG_STATE_VAR parameters that are
+ * automatically updated from GL context state.
+ */
+void
+vec4_visitor::setup_builtin_uniform_values(ir_variable *ir)
+{
+   const ir_state_slot *const slots = ir->state_slots;
+   assert(ir->state_slots != NULL);
+
+   for (unsigned int i = 0; i < ir->num_state_slots; i++) {
+      /* This state reference has already been setup by ir_to_mesa,
+       * but we'll get the same index back here.  We can reference
+       * ParameterValues directly, since unlike brw_fs.cpp, we never
+       * add new state references during compile.
+       */
+      int index = _mesa_add_state_reference(this->prog->Parameters,
+					    (gl_state_index *)slots[i].tokens);
+      float *values = &this->prog->Parameters->ParameterValues[index][0].f;
+
+      assert(this->uniforms < uniform_array_size);
+      this->uniform_vector_size[this->uniforms] = 0;
+      /* Add each of the unique swizzled channels of the element.
+       * This will end up matching the size of the glsl_type of this field.
+       */
+      int last_swiz = -1;
+      for (unsigned int j = 0; j < 4; j++) {
+	 int swiz = GET_SWZ(slots[i].swizzle, j);
+	 last_swiz = swiz;
+
+	 stage_prog_data->param[this->uniforms * 4 + j] = &values[swiz];
+	 assert(this->uniforms < uniform_array_size);
+	 if (swiz <= last_swiz)
+	    this->uniform_vector_size[this->uniforms]++;
+      }
+      this->uniforms++;
+   }
+}
+
+dst_reg *
+vec4_visitor::variable_storage(ir_variable *var)
+{
+   return (dst_reg *)hash_table_find(this->variable_ht, var);
+}
+
+void
+vec4_visitor::emit_bool_to_cond_code(ir_rvalue *ir, uint32_t *predicate)
+{
+   ir_expression *expr = ir->as_expression();
+
+   *predicate = BRW_PREDICATE_NORMAL;
+
+   if (expr) {
+      src_reg op[2];
+      vec4_instruction *inst;
+
+      assert(expr->get_num_operands() <= 2);
+      for (unsigned int i = 0; i < expr->get_num_operands(); i++) {
+	 expr->operands[i]->accept(this);
+	 op[i] = this->result;
+
+	 resolve_ud_negate(&op[i]);
+      }
+
+      switch (expr->operation) {
+      case ir_unop_logic_not:
+	 inst = emit(AND(dst_null_d(), op[0], src_reg(1)));
+	 inst->conditional_mod = BRW_CONDITIONAL_Z;
+	 break;
+
+      case ir_binop_logic_xor:
+	 inst = emit(XOR(dst_null_d(), op[0], op[1]));
+	 inst->conditional_mod = BRW_CONDITIONAL_NZ;
+	 break;
+
+      case ir_binop_logic_or:
+	 inst = emit(OR(dst_null_d(), op[0], op[1]));
+	 inst->conditional_mod = BRW_CONDITIONAL_NZ;
+	 break;
+
+      case ir_binop_logic_and:
+	 inst = emit(AND(dst_null_d(), op[0], op[1]));
+	 inst->conditional_mod = BRW_CONDITIONAL_NZ;
+	 break;
+
+      case ir_unop_f2b:
+	 if (brw->gen >= 6) {
+	    emit(CMP(dst_null_d(), op[0], src_reg(0.0f), BRW_CONDITIONAL_NZ));
+	 } else {
+	    inst = emit(MOV(dst_null_f(), op[0]));
+	    inst->conditional_mod = BRW_CONDITIONAL_NZ;
+	 }
+	 break;
+
+      case ir_unop_i2b:
+	 if (brw->gen >= 6) {
+	    emit(CMP(dst_null_d(), op[0], src_reg(0), BRW_CONDITIONAL_NZ));
+	 } else {
+	    inst = emit(MOV(dst_null_d(), op[0]));
+	    inst->conditional_mod = BRW_CONDITIONAL_NZ;
+	 }
+	 break;
+
+      case ir_binop_all_equal:
+	 inst = emit(CMP(dst_null_d(), op[0], op[1], BRW_CONDITIONAL_Z));
+	 *predicate = BRW_PREDICATE_ALIGN16_ALL4H;
+	 break;
+
+      case ir_binop_any_nequal:
+	 inst = emit(CMP(dst_null_d(), op[0], op[1], BRW_CONDITIONAL_NZ));
+	 *predicate = BRW_PREDICATE_ALIGN16_ANY4H;
+	 break;
+
+      case ir_unop_any:
+	 inst = emit(CMP(dst_null_d(), op[0], src_reg(0), BRW_CONDITIONAL_NZ));
+	 *predicate = BRW_PREDICATE_ALIGN16_ANY4H;
+	 break;
+
+      case ir_binop_greater:
+      case ir_binop_gequal:
+      case ir_binop_less:
+      case ir_binop_lequal:
+      case ir_binop_equal:
+      case ir_binop_nequal:
+	 emit(CMP(dst_null_d(), op[0], op[1],
+		  brw_conditional_for_comparison(expr->operation)));
+	 break;
+
+      default:
+	 assert(!"not reached");
+	 break;
+      }
+      return;
+   }
+
+   ir->accept(this);
+
+   resolve_ud_negate(&this->result);
+
+   if (brw->gen >= 6) {
+      vec4_instruction *inst = emit(AND(dst_null_d(),
+					this->result, src_reg(1)));
+      inst->conditional_mod = BRW_CONDITIONAL_NZ;
+   } else {
+      vec4_instruction *inst = emit(MOV(dst_null_d(), this->result));
+      inst->conditional_mod = BRW_CONDITIONAL_NZ;
+   }
+}
+
+/**
+ * Emit a gen6 IF statement with the comparison folded into the IF
+ * instruction.
+ */
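+/* For example, "if (a < b)" can become a single IF instruction using
+ * BRW_CONDITIONAL_L on the two operands, instead of a CMP that writes a
+ * boolean followed by a predicated IF.
+ */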
+void
+vec4_visitor::emit_if_gen6(ir_if *ir)
+{
+   ir_expression *expr = ir->condition->as_expression();
+
+   if (expr) {
+      src_reg op[2];
+      dst_reg temp;
+
+      assert(expr->get_num_operands() <= 2);
+      for (unsigned int i = 0; i < expr->get_num_operands(); i++) {
+	 expr->operands[i]->accept(this);
+	 op[i] = this->result;
+      }
+
+      switch (expr->operation) {
+      case ir_unop_logic_not:
+	 emit(IF(op[0], src_reg(0), BRW_CONDITIONAL_Z));
+	 return;
+
+      case ir_binop_logic_xor:
+	 emit(IF(op[0], op[1], BRW_CONDITIONAL_NZ));
+	 return;
+
+      case ir_binop_logic_or:
+	 temp = dst_reg(this, glsl_type::bool_type);
+	 emit(OR(temp, op[0], op[1]));
+	 emit(IF(src_reg(temp), src_reg(0), BRW_CONDITIONAL_NZ));
+	 return;
+
+      case ir_binop_logic_and:
+	 temp = dst_reg(this, glsl_type::bool_type);
+	 emit(AND(temp, op[0], op[1]));
+	 emit(IF(src_reg(temp), src_reg(0), BRW_CONDITIONAL_NZ));
+	 return;
+
+      case ir_unop_f2b:
+	 emit(IF(op[0], src_reg(0), BRW_CONDITIONAL_NZ));
+	 return;
+
+      case ir_unop_i2b:
+	 emit(IF(op[0], src_reg(0), BRW_CONDITIONAL_NZ));
+	 return;
+
+      case ir_binop_greater:
+      case ir_binop_gequal:
+      case ir_binop_less:
+      case ir_binop_lequal:
+      case ir_binop_equal:
+      case ir_binop_nequal:
+	 emit(IF(op[0], op[1],
+		 brw_conditional_for_comparison(expr->operation)));
+	 return;
+
+      case ir_binop_all_equal:
+	 emit(CMP(dst_null_d(), op[0], op[1], BRW_CONDITIONAL_Z));
+	 emit(IF(BRW_PREDICATE_ALIGN16_ALL4H));
+	 return;
+
+      case ir_binop_any_nequal:
+	 emit(CMP(dst_null_d(), op[0], op[1], BRW_CONDITIONAL_NZ));
+	 emit(IF(BRW_PREDICATE_ALIGN16_ANY4H));
+	 return;
+
+      case ir_unop_any:
+	 emit(CMP(dst_null_d(), op[0], src_reg(0), BRW_CONDITIONAL_NZ));
+	 emit(IF(BRW_PREDICATE_ALIGN16_ANY4H));
+	 return;
+
+      default:
+	 assert(!"not reached");
+	 emit(IF(op[0], src_reg(0), BRW_CONDITIONAL_NZ));
+	 return;
+      }
+      return;
+   }
+
+   ir->condition->accept(this);
+
+   emit(IF(this->result, src_reg(0), BRW_CONDITIONAL_NZ));
+}
+
+void
+vec4_visitor::visit(ir_variable *ir)
+{
+   dst_reg *reg = NULL;
+
+   if (variable_storage(ir))
+      return;
+
+   switch (ir->data.mode) {
+   case ir_var_shader_in:
+      reg = new(mem_ctx) dst_reg(ATTR, ir->data.location);
+      break;
+
+   case ir_var_shader_out:
+      reg = new(mem_ctx) dst_reg(this, ir->type);
+
+      for (int i = 0; i < type_size(ir->type); i++) {
+	 output_reg[ir->data.location + i] = *reg;
+	 output_reg[ir->data.location + i].reg_offset = i;
+	 output_reg[ir->data.location + i].type =
+            brw_type_for_base_type(ir->type->get_scalar_type());
+	 output_reg_annotation[ir->data.location + i] = ir->name;
+      }
+      break;
+
+   case ir_var_auto:
+   case ir_var_temporary:
+      reg = new(mem_ctx) dst_reg(this, ir->type);
+      break;
+
+   case ir_var_uniform:
+      reg = new(this->mem_ctx) dst_reg(UNIFORM, this->uniforms);
+
+      /* Thanks to the lower_ubo_reference pass, we will see only
+       * ir_binop_ubo_load expressions and not ir_dereference_variable for UBO
+       * variables, so no need for them to be in variable_ht.
+       *
+       * Atomic counters take no uniform storage, no need to do
+       * anything here.
+       */
+      if (ir->is_in_uniform_block() || ir->type->contains_atomic())
+         return;
+
+      /* Track how big the whole uniform variable is, in case we need to put a
+       * copy of its data into pull constants for array access.
+       */
+      assert(this->uniforms < uniform_array_size);
+      this->uniform_size[this->uniforms] = type_size(ir->type);
+
+      if (!strncmp(ir->name, "gl_", 3)) {
+	 setup_builtin_uniform_values(ir);
+      } else {
+	 setup_uniform_values(ir);
+      }
+      break;
+
+   case ir_var_system_value:
+      reg = make_reg_for_system_value(ir);
+      break;
+
+   default:
+      assert(!"not reached");
+   }
+
+   reg->type = brw_type_for_base_type(ir->type);
+   hash_table_insert(this->variable_ht, reg, ir);
+}
+
+void
+vec4_visitor::visit(ir_loop *ir)
+{
+   /* We don't want debugging output to print the whole body of the
+    * loop as the annotation.
+    */
+   this->base_ir = NULL;
+
+   emit(BRW_OPCODE_DO);
+
+   visit_instructions(&ir->body_instructions);
+
+   emit(BRW_OPCODE_WHILE);
+}
+
+void
+vec4_visitor::visit(ir_loop_jump *ir)
+{
+   switch (ir->mode) {
+   case ir_loop_jump::jump_break:
+      emit(BRW_OPCODE_BREAK);
+      break;
+   case ir_loop_jump::jump_continue:
+      emit(BRW_OPCODE_CONTINUE);
+      break;
+   }
+}
+
+
+void
+vec4_visitor::visit(ir_function_signature *ir)
+{
+   assert(0);
+   (void)ir;
+}
+
+void
+vec4_visitor::visit(ir_function *ir)
+{
+   /* Ignore function bodies other than main() -- we shouldn't see calls to
+    * them since they should all be inlined.
+    */
+   if (strcmp(ir->name, "main") == 0) {
+      const ir_function_signature *sig;
+      exec_list empty;
+
+      sig = ir->matching_signature(NULL, &empty);
+
+      assert(sig);
+
+      visit_instructions(&sig->body);
+   }
+}
+
+bool
+vec4_visitor::try_emit_sat(ir_expression *ir)
+{
+   ir_rvalue *sat_src = ir->as_rvalue_to_saturate();
+   if (!sat_src)
+      return false;
+
+   sat_src->accept(this);
+   src_reg src = this->result;
+
+   this->result = src_reg(this, ir->type);
+   vec4_instruction *inst;
+   inst = emit(MOV(dst_reg(this->result), src));
+   inst->saturate = true;
+
+   return true;
+}
+
+bool
+vec4_visitor::try_emit_mad(ir_expression *ir)
+{
+   /* 3-src instructions were introduced in gen6. */
+   if (brw->gen < 6)
+      return false;
+
+   /* MAD can only handle floating-point data. */
+   if (ir->type->base_type != GLSL_TYPE_FLOAT)
+      return false;
+
+   ir_rvalue *nonmul = ir->operands[1];
+   ir_expression *mul = ir->operands[0]->as_expression();
+
+   if (!mul || mul->operation != ir_binop_mul) {
+      nonmul = ir->operands[0];
+      mul = ir->operands[1]->as_expression();
+
+      if (!mul || mul->operation != ir_binop_mul)
+         return false;
+   }
+
+   nonmul->accept(this);
+   src_reg src0 = fix_3src_operand(this->result);
+
+   mul->operands[0]->accept(this);
+   src_reg src1 = fix_3src_operand(this->result);
+
+   mul->operands[1]->accept(this);
+   src_reg src2 = fix_3src_operand(this->result);
+
+   this->result = src_reg(this, ir->type);
+   emit(BRW_OPCODE_MAD, dst_reg(this->result), src0, src1, src2);
+
+   return true;
+}
+
+void
+vec4_visitor::emit_bool_comparison(unsigned int op,
+				 dst_reg dst, src_reg src0, src_reg src1)
+{
+   /* original gen4 does destination conversion before comparison. */
+   if (brw->gen < 5)
+      dst.type = src0.type;
+
+   emit(CMP(dst, src0, src1, brw_conditional_for_comparison(op)));
+
+   dst.type = BRW_REGISTER_TYPE_D;
+   emit(AND(dst, src_reg(dst), src_reg(0x1)));
+}
+
+void
+vec4_visitor::emit_minmax(uint32_t conditionalmod, dst_reg dst,
+                          src_reg src0, src_reg src1)
+{
+   vec4_instruction *inst;
+
+   if (brw->gen >= 6) {
+      inst = emit(BRW_OPCODE_SEL, dst, src0, src1);
+      inst->conditional_mod = conditionalmod;
+   } else {
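+      /* Pre-gen6 SEL can't apply the conditional mod directly, so compare
+       * first and use the resulting predicate to choose between the two
+       * sources.
+       */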
+      emit(CMP(dst, src0, src1, conditionalmod));
+
+      inst = emit(BRW_OPCODE_SEL, dst, src0, src1);
+      inst->predicate = BRW_PREDICATE_NORMAL;
+   }
+}
+
+void
+vec4_visitor::emit_lrp(const dst_reg &dst,
+                       const src_reg &x, const src_reg &y, const src_reg &a)
+{
+   if (brw->gen >= 6) {
+      /* Note that the instruction's argument order is reversed from GLSL
+       * and the IR.
+       */
+      emit(LRP(dst,
+               fix_3src_operand(a), fix_3src_operand(y), fix_3src_operand(x)));
+   } else {
+      /* Earlier generations don't support three source operations, so we
+       * need to emit x*(1-a) + y*a.
+       */
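+      /* The (1 - a) term below is formed as ADD(negate(a), 1.0f), since
+       * source negation is a free register modifier here and costs no
+       * extra instruction.
+       */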
+      dst_reg y_times_a           = dst_reg(this, glsl_type::vec4_type);
+      dst_reg one_minus_a         = dst_reg(this, glsl_type::vec4_type);
+      dst_reg x_times_one_minus_a = dst_reg(this, glsl_type::vec4_type);
+      y_times_a.writemask           = dst.writemask;
+      one_minus_a.writemask         = dst.writemask;
+      x_times_one_minus_a.writemask = dst.writemask;
+
+      emit(MUL(y_times_a, y, a));
+      emit(ADD(one_minus_a, negate(a), src_reg(1.0f)));
+      emit(MUL(x_times_one_minus_a, x, src_reg(one_minus_a)));
+      emit(ADD(dst, src_reg(x_times_one_minus_a), src_reg(y_times_a)));
+   }
+}
+
+void
+vec4_visitor::visit(ir_expression *ir)
+{
+   unsigned int operand;
+   src_reg op[Elements(ir->operands)];
+   src_reg result_src;
+   dst_reg result_dst;
+   vec4_instruction *inst;
+
+   if (try_emit_sat(ir))
+      return;
+
+   if (ir->operation == ir_binop_add) {
+      if (try_emit_mad(ir))
+	 return;
+   }
+
+   for (operand = 0; operand < ir->get_num_operands(); operand++) {
+      this->result.file = BAD_FILE;
+      ir->operands[operand]->accept(this);
+      if (this->result.file == BAD_FILE) {
+	 fprintf(stderr, "Failed to get tree for expression operand:\n");
+	 ir->operands[operand]->fprint(stderr);
+	 exit(1);
+      }
+      op[operand] = this->result;
+
+      /* Matrix expression operands should have been broken down to vector
+       * operations already.
+       */
+      assert(!ir->operands[operand]->type->is_matrix());
+   }
+
+   int vector_elements = ir->operands[0]->type->vector_elements;
+   if (ir->operands[1]) {
+      vector_elements = MAX2(vector_elements,
+			     ir->operands[1]->type->vector_elements);
+   }
+
+   this->result.file = BAD_FILE;
+
+   /* Storage for our result.  Ideally for an assignment we'd be using
+    * the actual storage for the result here, instead.
+    */
+   result_src = src_reg(this, ir->type);
+   /* convenience for the emit functions below. */
+   result_dst = dst_reg(result_src);
+   /* If nothing special happens, this is the result. */
+   this->result = result_src;
+   /* Limit writes to the channels that will be used by result_src later.
+    * This does limit this temp's use as a temporary for multi-instruction
+    * sequences.
+    */
+   result_dst.writemask = (1 << ir->type->vector_elements) - 1;
+
+   switch (ir->operation) {
+   case ir_unop_logic_not:
+      /* Note that BRW_OPCODE_NOT is not appropriate here, since it is the
+       * one's complement of the whole register, not just bit 0.
+       */
+      emit(XOR(result_dst, op[0], src_reg(1)));
+      break;
+   case ir_unop_neg:
+      op[0].negate = !op[0].negate;
+      emit(MOV(result_dst, op[0]));
+      break;
+   case ir_unop_abs:
+      op[0].abs = true;
+      op[0].negate = false;
+      emit(MOV(result_dst, op[0]));
+      break;
+
+   case ir_unop_sign:
+      if (ir->type->is_float()) {
+         /* AND(val, 0x80000000) gives the sign bit.
+          *
+          * Predicated OR ORs 1.0 (0x3f800000) with the sign bit if val is not
+          * zero.
+          */
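+         /* For example, sign(-2.5f): -2.5f is 0xc0200000, so the AND leaves
+          * 0x80000000, and since the CMP flagged -2.5f as nonzero the OR
+          * produces 0xbf800000, i.e. -1.0f.
+          */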
+         emit(CMP(dst_null_f(), op[0], src_reg(0.0f), BRW_CONDITIONAL_NZ));
+
+         op[0].type = BRW_REGISTER_TYPE_UD;
+         result_dst.type = BRW_REGISTER_TYPE_UD;
+         emit(AND(result_dst, op[0], src_reg(0x80000000u)));
+
+         inst = emit(OR(result_dst, src_reg(result_dst), src_reg(0x3f800000u)));
+         inst->predicate = BRW_PREDICATE_NORMAL;
+
+         this->result.type = BRW_REGISTER_TYPE_F;
+      } else {
+         /*  ASR(val, 31) -> negative val generates 0xffffffff (signed -1).
+          *               -> non-negative val generates 0x00000000.
+          *  Predicated OR sets 1 if val is positive.
+          */
+         emit(CMP(dst_null_d(), op[0], src_reg(0), BRW_CONDITIONAL_G));
+
+         emit(ASR(result_dst, op[0], src_reg(31)));
+
+         inst = emit(OR(result_dst, src_reg(result_dst), src_reg(1)));
+         inst->predicate = BRW_PREDICATE_NORMAL;
+      }
+      break;
+
+   case ir_unop_rcp:
+      emit_math(SHADER_OPCODE_RCP, result_dst, op[0]);
+      break;
+
+   case ir_unop_exp2:
+      emit_math(SHADER_OPCODE_EXP2, result_dst, op[0]);
+      break;
+   case ir_unop_log2:
+      emit_math(SHADER_OPCODE_LOG2, result_dst, op[0]);
+      break;
+   case ir_unop_exp:
+   case ir_unop_log:
+      assert(!"not reached: should be handled by ir_explog_to_explog2");
+      break;
+   case ir_unop_sin:
+   case ir_unop_sin_reduced:
+      emit_math(SHADER_OPCODE_SIN, result_dst, op[0]);
+      break;
+   case ir_unop_cos:
+   case ir_unop_cos_reduced:
+      emit_math(SHADER_OPCODE_COS, result_dst, op[0]);
+      break;
+
+   case ir_unop_dFdx:
+   case ir_unop_dFdy:
+      assert(!"derivatives not valid in vertex shader");
+      break;
+
+   case ir_unop_bitfield_reverse:
+      emit(BFREV(result_dst, op[0]));
+      break;
+   case ir_unop_bit_count:
+      emit(CBIT(result_dst, op[0]));
+      break;
+   case ir_unop_find_msb: {
+      src_reg temp = src_reg(this, glsl_type::uint_type);
+
+      inst = emit(FBH(dst_reg(temp), op[0]));
+      inst->dst.writemask = WRITEMASK_XYZW;
+
+      /* FBH counts from the MSB side, while GLSL's findMSB() wants the count
+       * from the LSB side. If FBH didn't return an error (0xFFFFFFFF), then
+       * subtract the result from 31 to convert the MSB count into an LSB count.
+       */
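+      /* For example, findMSB(0x00010000): FBH sees 15 leading zero bits
+       * and returns 15, and 31 - 15 = 16 is the LSB-side bit index GLSL
+       * expects.
+       */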
+
+      /* FBH only supports UD type for dst, so use a MOV to convert UD to D. */
+      temp.swizzle = BRW_SWIZZLE_NOOP;
+      emit(MOV(result_dst, temp));
+
+      src_reg src_tmp = src_reg(result_dst);
+      emit(CMP(dst_null_d(), src_tmp, src_reg(-1), BRW_CONDITIONAL_NZ));
+
+      src_tmp.negate = true;
+      inst = emit(ADD(result_dst, src_tmp, src_reg(31)));
+      inst->predicate = BRW_PREDICATE_NORMAL;
+      break;
+   }
+   case ir_unop_find_lsb:
+      emit(FBL(result_dst, op[0]));
+      break;
+
+   case ir_unop_noise:
+      assert(!"not reached: should be handled by lower_noise");
+      break;
+
+   case ir_binop_add:
+      emit(ADD(result_dst, op[0], op[1]));
+      break;
+   case ir_binop_sub:
+      assert(!"not reached: should be handled by ir_sub_to_add_neg");
+      break;
+
+   case ir_binop_mul:
+      if (brw->gen < 8 && ir->type->is_integer()) {
+	 /* For integer multiplication, the MUL uses the low 16 bits of one of
+	  * the operands (src0 through SNB, src1 on IVB and later).  The MACH
+	  * accumulates in the contribution of the upper 16 bits of that
+	  * operand.  If we can determine that one of the args is in the low
+	  * 16 bits, though, we can just emit a single MUL.
+          */
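+         /* For example, a multiply by the literal 7 takes one of the
+          * single-MUL paths below, while a product of two arbitrary 32-bit
+          * values needs the MUL/MACH/MOV sequence so the accumulator picks
+          * up the contribution of the operand's upper 16 bits.
+          */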
+         if (ir->operands[0]->is_uint16_constant()) {
+            if (brw->gen < 7)
+               emit(MUL(result_dst, op[0], op[1]));
+            else
+               emit(MUL(result_dst, op[1], op[0]));
+         } else if (ir->operands[1]->is_uint16_constant()) {
+            if (brw->gen < 7)
+               emit(MUL(result_dst, op[1], op[0]));
+            else
+               emit(MUL(result_dst, op[0], op[1]));
+         } else {
+            struct brw_reg acc = retype(brw_acc_reg(), result_dst.type);
+
+            emit(MUL(acc, op[0], op[1]));
+            emit(MACH(dst_null_d(), op[0], op[1]));
+            emit(MOV(result_dst, src_reg(acc)));
+         }
+      } else {
+	 emit(MUL(result_dst, op[0], op[1]));
+      }
+      break;
+   case ir_binop_imul_high: {
+      struct brw_reg acc = retype(brw_acc_reg(), result_dst.type);
+
+      emit(MUL(acc, op[0], op[1]));
+      emit(MACH(result_dst, op[0], op[1]));
+      break;
+   }
+   case ir_binop_div:
+      /* Floating point should be lowered by DIV_TO_MUL_RCP in the compiler. */
+      assert(ir->type->is_integer());
+      emit_math(SHADER_OPCODE_INT_QUOTIENT, result_dst, op[0], op[1]);
+      break;
+   case ir_binop_carry: {
+      struct brw_reg acc = retype(brw_acc_reg(), BRW_REGISTER_TYPE_UD);
+
+      emit(ADDC(dst_null_ud(), op[0], op[1]));
+      emit(MOV(result_dst, src_reg(acc)));
+      break;
+   }
+   case ir_binop_borrow: {
+      struct brw_reg acc = retype(brw_acc_reg(), BRW_REGISTER_TYPE_UD);
+
+      emit(SUBB(dst_null_ud(), op[0], op[1]));
+      emit(MOV(result_dst, src_reg(acc)));
+      break;
+   }
+   case ir_binop_mod:
+      /* Floating point should be lowered by MOD_TO_FRACT in the compiler. */
+      assert(ir->type->is_integer());
+      emit_math(SHADER_OPCODE_INT_REMAINDER, result_dst, op[0], op[1]);
+      break;
+
+   case ir_binop_less:
+   case ir_binop_greater:
+   case ir_binop_lequal:
+   case ir_binop_gequal:
+   case ir_binop_equal:
+   case ir_binop_nequal: {
+      emit(CMP(result_dst, op[0], op[1],
+	       brw_conditional_for_comparison(ir->operation)));
+      emit(AND(result_dst, result_src, src_reg(0x1)));
+      break;
+   }
+
+   case ir_binop_all_equal:
+      /* "==" operator producing a scalar boolean. */
+      if (ir->operands[0]->type->is_vector() ||
+	  ir->operands[1]->type->is_vector()) {
+	 emit(CMP(dst_null_d(), op[0], op[1], BRW_CONDITIONAL_Z));
+	 emit(MOV(result_dst, src_reg(0)));
+	 inst = emit(MOV(result_dst, src_reg(1)));
+	 inst->predicate = BRW_PREDICATE_ALIGN16_ALL4H;
+      } else {
+	 emit(CMP(result_dst, op[0], op[1], BRW_CONDITIONAL_Z));
+	 emit(AND(result_dst, result_src, src_reg(0x1)));
+      }
+      break;
+   case ir_binop_any_nequal:
+      /* "!=" operator producing a scalar boolean. */
+      if (ir->operands[0]->type->is_vector() ||
+	  ir->operands[1]->type->is_vector()) {
+	 emit(CMP(dst_null_d(), op[0], op[1], BRW_CONDITIONAL_NZ));
+
+	 emit(MOV(result_dst, src_reg(0)));
+	 inst = emit(MOV(result_dst, src_reg(1)));
+	 inst->predicate = BRW_PREDICATE_ALIGN16_ANY4H;
+      } else {
+	 emit(CMP(result_dst, op[0], op[1], BRW_CONDITIONAL_NZ));
+	 emit(AND(result_dst, result_src, src_reg(0x1)));
+      }
+      break;
+
+   case ir_unop_any:
+      emit(CMP(dst_null_d(), op[0], src_reg(0), BRW_CONDITIONAL_NZ));
+      emit(MOV(result_dst, src_reg(0)));
+
+      inst = emit(MOV(result_dst, src_reg(1)));
+      inst->predicate = BRW_PREDICATE_ALIGN16_ANY4H;
+      break;
+
+   case ir_binop_logic_xor:
+      emit(XOR(result_dst, op[0], op[1]));
+      break;
+
+   case ir_binop_logic_or:
+      emit(OR(result_dst, op[0], op[1]));
+      break;
+
+   case ir_binop_logic_and:
+      emit(AND(result_dst, op[0], op[1]));
+      break;
+
+   case ir_binop_dot:
+      assert(ir->operands[0]->type->is_vector());
+      assert(ir->operands[0]->type == ir->operands[1]->type);
+      emit_dp(result_dst, op[0], op[1], ir->operands[0]->type->vector_elements);
+      break;
+
+   case ir_unop_sqrt:
+      emit_math(SHADER_OPCODE_SQRT, result_dst, op[0]);
+      break;
+   case ir_unop_rsq:
+      emit_math(SHADER_OPCODE_RSQ, result_dst, op[0]);
+      break;
+
+   case ir_unop_bitcast_i2f:
+   case ir_unop_bitcast_u2f:
+      this->result = op[0];
+      this->result.type = BRW_REGISTER_TYPE_F;
+      break;
+
+   case ir_unop_bitcast_f2i:
+      this->result = op[0];
+      this->result.type = BRW_REGISTER_TYPE_D;
+      break;
+
+   case ir_unop_bitcast_f2u:
+      this->result = op[0];
+      this->result.type = BRW_REGISTER_TYPE_UD;
+      break;
+
+   case ir_unop_i2f:
+   case ir_unop_i2u:
+   case ir_unop_u2i:
+   case ir_unop_u2f:
+   case ir_unop_b2f:
+   case ir_unop_b2i:
+   case ir_unop_f2i:
+   case ir_unop_f2u:
+      emit(MOV(result_dst, op[0]));
+      break;
+   case ir_unop_f2b:
+   case ir_unop_i2b: {
+      emit(CMP(result_dst, op[0], src_reg(0.0f), BRW_CONDITIONAL_NZ));
+      emit(AND(result_dst, result_src, src_reg(1)));
+      break;
+   }
+
+   case ir_unop_trunc:
+      emit(RNDZ(result_dst, op[0]));
+      break;
+   case ir_unop_ceil:
+      op[0].negate = !op[0].negate;
+      inst = emit(RNDD(result_dst, op[0]));
+      this->result.negate = true;
+      break;
+   case ir_unop_floor:
+      inst = emit(RNDD(result_dst, op[0]));
+      break;
+   case ir_unop_fract:
+      inst = emit(FRC(result_dst, op[0]));
+      break;
+   case ir_unop_round_even:
+      emit(RNDE(result_dst, op[0]));
+      break;
+
+   case ir_binop_min:
+      emit_minmax(BRW_CONDITIONAL_L, result_dst, op[0], op[1]);
+      break;
+   case ir_binop_max:
+      emit_minmax(BRW_CONDITIONAL_G, result_dst, op[0], op[1]);
+      break;
+
+   case ir_binop_pow:
+      emit_math(SHADER_OPCODE_POW, result_dst, op[0], op[1]);
+      break;
+
+   case ir_unop_bit_not:
+      inst = emit(NOT(result_dst, op[0]));
+      break;
+   case ir_binop_bit_and:
+      inst = emit(AND(result_dst, op[0], op[1]));
+      break;
+   case ir_binop_bit_xor:
+      inst = emit(XOR(result_dst, op[0], op[1]));
+      break;
+   case ir_binop_bit_or:
+      inst = emit(OR(result_dst, op[0], op[1]));
+      break;
+
+   case ir_binop_lshift:
+      inst = emit(SHL(result_dst, op[0], op[1]));
+      break;
+
+   case ir_binop_rshift:
+      if (ir->type->base_type == GLSL_TYPE_INT)
+         inst = emit(ASR(result_dst, op[0], op[1]));
+      else
+         inst = emit(SHR(result_dst, op[0], op[1]));
+      break;
+
+   case ir_binop_bfm:
+      emit(BFI1(result_dst, op[0], op[1]));
+      break;
+
+   case ir_binop_ubo_load: {
+      ir_constant *uniform_block = ir->operands[0]->as_constant();
+      ir_constant *const_offset_ir = ir->operands[1]->as_constant();
+      unsigned const_offset = const_offset_ir ? const_offset_ir->value.u[0] : 0;
+      src_reg offset;
+
+      /* Now, load the vector from that offset. */
+      assert(ir->type->is_vector() || ir->type->is_scalar());
+
+      src_reg packed_consts = src_reg(this, glsl_type::vec4_type);
+      packed_consts.type = result.type;
+      src_reg surf_index =
+         src_reg(prog_data->base.binding_table.ubo_start + uniform_block->value.u[0]);
+      if (const_offset_ir) {
+         if (brw->gen >= 8) {
+            /* Store the offset in a GRF so we can send-from-GRF. */
+            offset = src_reg(this, glsl_type::int_type);
+            emit(MOV(dst_reg(offset), src_reg(const_offset / 16)));
+         } else {
+            /* Immediates are fine on older generations since they'll be moved
+             * to a (potentially fake) MRF at the generator level.
+             */
+            offset = src_reg(const_offset / 16);
+         }
+      } else {
+         offset = src_reg(this, glsl_type::uint_type);
+         emit(SHR(dst_reg(offset), op[1], src_reg(4)));
+      }
+
+      if (brw->gen >= 7) {
+         dst_reg grf_offset = dst_reg(this, glsl_type::int_type);
+         grf_offset.type = offset.type;
+
+         emit(MOV(grf_offset, offset));
+
+         emit(new(mem_ctx) vec4_instruction(this,
+                                            VS_OPCODE_PULL_CONSTANT_LOAD_GEN7,
+                                            dst_reg(packed_consts),
+                                            surf_index,
+                                            src_reg(grf_offset)));
+      } else {
+         vec4_instruction *pull =
+            emit(new(mem_ctx) vec4_instruction(this,
+                                               VS_OPCODE_PULL_CONSTANT_LOAD,
+                                               dst_reg(packed_consts),
+                                               surf_index,
+                                               offset));
+         pull->base_mrf = 14;
+         pull->mlen = 1;
+      }
+
+      packed_consts.swizzle = swizzle_for_size(ir->type->vector_elements);
+      packed_consts.swizzle += BRW_SWIZZLE4(const_offset % 16 / 4,
+                                            const_offset % 16 / 4,
+                                            const_offset % 16 / 4,
+                                            const_offset % 16 / 4);
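+      /* For example, a float at byte offset 20 fetches vec4 slot 1
+       * (20 / 16) and then bumps each swizzle channel by (20 % 16) / 4 = 1,
+       * selecting the .y component of the fetched vec4.
+       */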
+
+      /* UBO bools are any nonzero int.  We store bools as either 0 or 1. */
+      if (ir->type->base_type == GLSL_TYPE_BOOL) {
+         emit(CMP(result_dst, packed_consts, src_reg(0u),
+                  BRW_CONDITIONAL_NZ));
+         emit(AND(result_dst, result, src_reg(0x1)));
+      } else {
+         emit(MOV(result_dst, packed_consts));
+      }
+      break;
+   }
+
+   case ir_binop_vector_extract:
+      assert(!"should have been lowered by vec_index_to_cond_assign");
+      break;
+
+   case ir_triop_fma:
+      op[0] = fix_3src_operand(op[0]);
+      op[1] = fix_3src_operand(op[1]);
+      op[2] = fix_3src_operand(op[2]);
+      /* Note that the instruction's argument order is reversed from GLSL
+       * and the IR.
+       */
+      emit(MAD(result_dst, op[2], op[1], op[0]));
+      break;
+
+   case ir_triop_lrp:
+      emit_lrp(result_dst, op[0], op[1], op[2]);
+      break;
+
+   case ir_triop_csel:
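+      /* Per-channel select: the CMP sets the flag wherever the condition
+       * is nonzero, and the predicated SEL then picks op[1] where the flag
+       * is set and op[2] elsewhere.
+       */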
+      emit(CMP(dst_null_d(), op[0], src_reg(0), BRW_CONDITIONAL_NZ));
+      inst = emit(BRW_OPCODE_SEL, result_dst, op[1], op[2]);
+      inst->predicate = BRW_PREDICATE_NORMAL;
+      break;
+
+   case ir_triop_bfi:
+      op[0] = fix_3src_operand(op[0]);
+      op[1] = fix_3src_operand(op[1]);
+      op[2] = fix_3src_operand(op[2]);
+      emit(BFI2(result_dst, op[0], op[1], op[2]));
+      break;
+
+   case ir_triop_bitfield_extract:
+      op[0] = fix_3src_operand(op[0]);
+      op[1] = fix_3src_operand(op[1]);
+      op[2] = fix_3src_operand(op[2]);
+      /* Note that the instruction's argument order is reversed from GLSL
+       * and the IR.
+       */
+      emit(BFE(result_dst, op[2], op[1], op[0]));
+      break;
+
+   case ir_triop_vector_insert:
+      assert(!"should have been lowered by lower_vector_insert");
+      break;
+
+   case ir_quadop_bitfield_insert:
+      assert(!"not reached: should be handled by "
+              "bitfield_insert_to_bfm_bfi\n");
+      break;
+
+   case ir_quadop_vector:
+      assert(!"not reached: should be handled by lower_quadop_vector");
+      break;
+
+   case ir_unop_pack_half_2x16:
+      emit_pack_half_2x16(result_dst, op[0]);
+      break;
+   case ir_unop_unpack_half_2x16:
+      emit_unpack_half_2x16(result_dst, op[0]);
+      break;
+   case ir_unop_pack_snorm_2x16:
+   case ir_unop_pack_snorm_4x8:
+   case ir_unop_pack_unorm_2x16:
+   case ir_unop_pack_unorm_4x8:
+   case ir_unop_unpack_snorm_2x16:
+   case ir_unop_unpack_snorm_4x8:
+   case ir_unop_unpack_unorm_2x16:
+   case ir_unop_unpack_unorm_4x8:
+      assert(!"not reached: should be handled by lower_packing_builtins");
+      break;
+   case ir_unop_unpack_half_2x16_split_x:
+   case ir_unop_unpack_half_2x16_split_y:
+   case ir_binop_pack_half_2x16_split:
+      assert(!"not reached: should not occur in vertex shader");
+      break;
+   case ir_binop_ldexp:
+      assert(!"not reached: should be handled by ldexp_to_arith()");
+      break;
+   }
+}
+
+
+void
+vec4_visitor::visit(ir_swizzle *ir)
+{
+   src_reg src;
+   int i = 0;
+   int swizzle[4];
+
+   /* Note that this is only swizzles in expressions, not those on the left
+    * hand side of an assignment, which do write masking.  See ir_assignment
+    * for that.
+    */
+
+   ir->val->accept(this);
+   src = this->result;
+   assert(src.file != BAD_FILE);
+
+   for (i = 0; i < ir->type->vector_elements; i++) {
+      switch (i) {
+      case 0:
+	 swizzle[i] = BRW_GET_SWZ(src.swizzle, ir->mask.x);
+	 break;
+      case 1:
+	 swizzle[i] = BRW_GET_SWZ(src.swizzle, ir->mask.y);
+	 break;
+      case 2:
+	 swizzle[i] = BRW_GET_SWZ(src.swizzle, ir->mask.z);
+	 break;
+      case 3:
+	 swizzle[i] = BRW_GET_SWZ(src.swizzle, ir->mask.w);
+	 break;
+      }
+   }
+   for (; i < 4; i++) {
+      /* Replicate the last channel out. */
+      swizzle[i] = swizzle[ir->type->vector_elements - 1];
+   }
+
+   src.swizzle = BRW_SWIZZLE4(swizzle[0], swizzle[1], swizzle[2], swizzle[3]);
+
+   this->result = src;
+}
+
+void
+vec4_visitor::visit(ir_dereference_variable *ir)
+{
+   const struct glsl_type *type = ir->type;
+   dst_reg *reg = variable_storage(ir->var);
+
+   if (!reg) {
+      fail("Failed to find variable storage for %s\n", ir->var->name);
+      this->result = src_reg(brw_null_reg());
+      return;
+   }
+
+   this->result = src_reg(*reg);
+
+   /* System values get their swizzle from the dst_reg writemask */
+   if (ir->var->data.mode == ir_var_system_value)
+      return;
+
+   if (type->is_scalar() || type->is_vector() || type->is_matrix())
+      this->result.swizzle = swizzle_for_size(type->vector_elements);
+}
+
+
+int
+vec4_visitor::compute_array_stride(ir_dereference_array *ir)
+{
+   /* Under normal circumstances array elements are stored consecutively, so
+    * the stride is equal to the size of the array element.
+    */
+   return type_size(ir->type);
+}
+
+
+void
+vec4_visitor::visit(ir_dereference_array *ir)
+{
+   ir_constant *constant_index;
+   src_reg src;
+   int array_stride = compute_array_stride(ir);
+
+   constant_index = ir->array_index->constant_expression_value();
+
+   ir->array->accept(this);
+   src = this->result;
+
+   if (constant_index) {
+      src.reg_offset += constant_index->value.i[0] * array_stride;
+   } else {
+      /* Variable index array dereference.  It eats the "vec4" of the
+       * base of the array and an index that offsets the Mesa register
+       * index.
+       */
+      ir->array_index->accept(this);
+
+      src_reg index_reg;
+
+      if (array_stride == 1) {
+	 index_reg = this->result;
+      } else {
+	 index_reg = src_reg(this, glsl_type::int_type);
+
+	 emit(MUL(dst_reg(index_reg), this->result, src_reg(array_stride)));
+      }
+
+      if (src.reladdr) {
+	 src_reg temp = src_reg(this, glsl_type::int_type);
+
+	 emit(ADD(dst_reg(temp), *src.reladdr, index_reg));
+
+	 index_reg = temp;
+      }
+
+      src.reladdr = ralloc(mem_ctx, src_reg);
+      memcpy(src.reladdr, &index_reg, sizeof(index_reg));
+   }
+
+   /* If the type is smaller than a vec4, replicate the last channel out. */
+   if (ir->type->is_scalar() || ir->type->is_vector() || ir->type->is_matrix())
+      src.swizzle = swizzle_for_size(ir->type->vector_elements);
+   else
+      src.swizzle = BRW_SWIZZLE_NOOP;
+   src.type = brw_type_for_base_type(ir->type);
+
+   this->result = src;
+}
+
+void
+vec4_visitor::visit(ir_dereference_record *ir)
+{
+   unsigned int i;
+   const glsl_type *struct_type = ir->record->type;
+   int offset = 0;
+
+   ir->record->accept(this);
+
+   for (i = 0; i < struct_type->length; i++) {
+      if (strcmp(struct_type->fields.structure[i].name, ir->field) == 0)
+	 break;
+      offset += type_size(struct_type->fields.structure[i].type);
+   }
+
+   /* If the type is smaller than a vec4, replicate the last channel out. */
+   if (ir->type->is_scalar() || ir->type->is_vector() || ir->type->is_matrix())
+      this->result.swizzle = swizzle_for_size(ir->type->vector_elements);
+   else
+      this->result.swizzle = BRW_SWIZZLE_NOOP;
+   this->result.type = brw_type_for_base_type(ir->type);
+
+   this->result.reg_offset += offset;
+}
+
+/**
+ * We want to be careful in assignment setup to hit the actual storage
+ * instead of potentially using a temporary like we might with the
+ * ir_dereference handler.
+ */
+static dst_reg
+get_assignment_lhs(ir_dereference *ir, vec4_visitor *v)
+{
+   /* The LHS must be a dereference.  If the LHS is a variable indexed array
+    * access of a vector, it must be separated into a series of conditional
+    * before reaching this point (see ir_vec_index_to_cond_assign).
+    */
+   assert(ir->as_dereference());
+   ir_dereference_array *deref_array = ir->as_dereference_array();
+   if (deref_array) {
+      assert(!deref_array->array->type->is_vector());
+   }
+
+   /* Use the rvalue deref handler for the most part.  We'll ignore
+    * swizzles in it and write swizzles using writemask, though.
+    */
+   ir->accept(v);
+   return dst_reg(v->result);
+}
+
+void
+vec4_visitor::emit_block_move(dst_reg *dst, src_reg *src,
+			      const struct glsl_type *type, uint32_t predicate)
+{
+   if (type->base_type == GLSL_TYPE_STRUCT) {
+      for (unsigned int i = 0; i < type->length; i++) {
+	 emit_block_move(dst, src, type->fields.structure[i].type, predicate);
+      }
+      return;
+   }
+
+   if (type->is_array()) {
+      for (unsigned int i = 0; i < type->length; i++) {
+	 emit_block_move(dst, src, type->fields.array, predicate);
+      }
+      return;
+   }
+
+   if (type->is_matrix()) {
+      const struct glsl_type *vec_type;
+
+      vec_type = glsl_type::get_instance(GLSL_TYPE_FLOAT,
+					 type->vector_elements, 1);
+
+      for (int i = 0; i < type->matrix_columns; i++) {
+	 emit_block_move(dst, src, vec_type, predicate);
+      }
+      return;
+   }
+
+   assert(type->is_scalar() || type->is_vector());
+
+   dst->type = brw_type_for_base_type(type);
+   src->type = dst->type;
+
+   dst->writemask = (1 << type->vector_elements) - 1;
+
+   src->swizzle = swizzle_for_size(type->vector_elements);
+
+   vec4_instruction *inst = emit(MOV(*dst, *src));
+   inst->predicate = predicate;
+
+   dst->reg_offset++;
+   src->reg_offset++;
+}
+
+
+/* If the RHS processing resulted in an instruction generating a
+ * temporary value, and it would be easy to rewrite the instruction to
+ * generate its result right into the LHS instead, do so.  This ends
+ * up reliably removing instructions where it can be tricky to do so
+ * later without real UD chain information.
+ */
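+/* For example, for "v.xy = a + b;" the ADD that wrote a temporary GRF can
+ * have its destination rewritten to point at v's storage directly, so the
+ * per-element MOVs that would otherwise follow are never emitted.
+ */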
+bool
+vec4_visitor::try_rewrite_rhs_to_dst(ir_assignment *ir,
+				     dst_reg dst,
+				     src_reg src,
+				     vec4_instruction *pre_rhs_inst,
+				     vec4_instruction *last_rhs_inst)
+{
+   /* This could be supported, but it would take more smarts. */
+   if (ir->condition)
+      return false;
+
+   if (pre_rhs_inst == last_rhs_inst)
+      return false; /* No instructions generated to work with. */
+
+   /* Make sure the last instruction generated our source reg. */
+   if (src.file != GRF ||
+       src.file != last_rhs_inst->dst.file ||
+       src.reg != last_rhs_inst->dst.reg ||
+       src.reg_offset != last_rhs_inst->dst.reg_offset ||
+       src.reladdr ||
+       src.abs ||
+       src.negate ||
+       last_rhs_inst->predicate != BRW_PREDICATE_NONE)
+      return false;
+
+   /* Check that the last instruction fully initialized the channels
+    * we want to use, in the order we want to use them.  We could
+    * potentially reswizzle the operands of many instructions so that
+    * we could handle out of order channels, but don't yet.
+    */
+
+   for (unsigned i = 0; i < 4; i++) {
+      if (dst.writemask & (1 << i)) {
+	 if (!(last_rhs_inst->dst.writemask & (1 << i)))
+	    return false;
+
+	 if (BRW_GET_SWZ(src.swizzle, i) != i)
+	    return false;
+      }
+   }
+
+   /* Success!  Rewrite the instruction. */
+   last_rhs_inst->dst.file = dst.file;
+   last_rhs_inst->dst.reg = dst.reg;
+   last_rhs_inst->dst.reg_offset = dst.reg_offset;
+   last_rhs_inst->dst.reladdr = dst.reladdr;
+   last_rhs_inst->dst.writemask &= dst.writemask;
+
+   return true;
+}
+
+void
+vec4_visitor::visit(ir_assignment *ir)
+{
+   dst_reg dst = get_assignment_lhs(ir->lhs, this);
+   uint32_t predicate = BRW_PREDICATE_NONE;
+
+   if (!ir->lhs->type->is_scalar() &&
+       !ir->lhs->type->is_vector()) {
+      ir->rhs->accept(this);
+      src_reg src = this->result;
+
+      if (ir->condition) {
+	 emit_bool_to_cond_code(ir->condition, &predicate);
+      }
+
+      /* emit_block_move doesn't account for swizzles in the source register.
+       * This should be ok, since the source register is a structure or an
+       * array, and those can't be swizzled.  But double-check to be sure.
+       */
+      assert(src.swizzle ==
+             (ir->rhs->type->is_matrix()
+              ? swizzle_for_size(ir->rhs->type->vector_elements)
+              : BRW_SWIZZLE_NOOP));
+
+      emit_block_move(&dst, &src, ir->rhs->type, predicate);
+      return;
+   }
+
+   /* Now we're down to just a scalar/vector with writemasks. */
+   int i;
+
+   vec4_instruction *pre_rhs_inst, *last_rhs_inst;
+   pre_rhs_inst = (vec4_instruction *)this->instructions.get_tail();
+
+   ir->rhs->accept(this);
+
+   last_rhs_inst = (vec4_instruction *)this->instructions.get_tail();
+
+   src_reg src = this->result;
+
+   int swizzles[4];
+   int first_enabled_chan = 0;
+   int src_chan = 0;
+
+   assert(ir->lhs->type->is_vector() ||
+	  ir->lhs->type->is_scalar());
+   dst.writemask = ir->write_mask;
+
+   for (int i = 0; i < 4; i++) {
+      if (dst.writemask & (1 << i)) {
+	 first_enabled_chan = BRW_GET_SWZ(src.swizzle, i);
+	 break;
+      }
+   }
+
+   /* Swizzle a small RHS vector into the channels being written.
+    *
+    * GLSL IR treats write_mask as dictating how many channels are present
+    * on the RHS, while in our instructions we need to make those channels
+    * appear in the slots of the vec4 they're written to.
+    */
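+   /* For example, writing a two-channel RHS to v.zw: the enabled slots z
+    * and w take the RHS's first and second channels, and the disabled
+    * slots are padded with the first enabled channel's selection.
+    */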
+   for (int i = 0; i < 4; i++) {
+      if (dst.writemask & (1 << i))
+	 swizzles[i] = BRW_GET_SWZ(src.swizzle, src_chan++);
+      else
+	 swizzles[i] = first_enabled_chan;
+   }
+   src.swizzle = BRW_SWIZZLE4(swizzles[0], swizzles[1],
+			      swizzles[2], swizzles[3]);
+
+   if (try_rewrite_rhs_to_dst(ir, dst, src, pre_rhs_inst, last_rhs_inst)) {
+      return;
+   }
+
+   if (ir->condition) {
+      emit_bool_to_cond_code(ir->condition, &predicate);
+   }
+
+   for (i = 0; i < type_size(ir->lhs->type); i++) {
+      vec4_instruction *inst = emit(MOV(dst, src));
+      inst->predicate = predicate;
+
+      dst.reg_offset++;
+      src.reg_offset++;
+   }
+}
+
+void
+vec4_visitor::emit_constant_values(dst_reg *dst, ir_constant *ir)
+{
+   if (ir->type->base_type == GLSL_TYPE_STRUCT) {
+      foreach_list(node, &ir->components) {
+	 ir_constant *field_value = (ir_constant *)node;
+
+	 emit_constant_values(dst, field_value);
+      }
+      return;
+   }
+
+   if (ir->type->is_array()) {
+      for (unsigned int i = 0; i < ir->type->length; i++) {
+	 emit_constant_values(dst, ir->array_elements[i]);
+      }
+      return;
+   }
+
+   if (ir->type->is_matrix()) {
+      for (int i = 0; i < ir->type->matrix_columns; i++) {
+	 float *vec = &ir->value.f[i * ir->type->vector_elements];
+
+	 for (int j = 0; j < ir->type->vector_elements; j++) {
+	    dst->writemask = 1 << j;
+	    dst->type = BRW_REGISTER_TYPE_F;
+
+	    emit(MOV(*dst, src_reg(vec[j])));
+	 }
+	 dst->reg_offset++;
+      }
+      return;
+   }
+
+   int remaining_writemask = (1 << ir->type->vector_elements) - 1;
+
+   for (int i = 0; i < ir->type->vector_elements; i++) {
+      if (!(remaining_writemask & (1 << i)))
+	 continue;
+
+      dst->writemask = 1 << i;
+      dst->type = brw_type_for_base_type(ir->type);
+
+      /* Find other components that match the one we're about to
+       * write.  Emits fewer instructions for things like vec4(0.5,
+       * 1.5, 1.5, 1.5).
+       */
+      for (int j = i + 1; j < ir->type->vector_elements; j++) {
+	 if (ir->type->base_type == GLSL_TYPE_BOOL) {
+	    if (ir->value.b[i] == ir->value.b[j])
+	       dst->writemask |= (1 << j);
+	 } else {
+	    /* u, i, and f storage all line up, so no need for a
+	     * switch case for comparing each type.
+	     */
+	    if (ir->value.u[i] == ir->value.u[j])
+	       dst->writemask |= (1 << j);
+	 }
+      }
+
+      switch (ir->type->base_type) {
+      case GLSL_TYPE_FLOAT:
+	 emit(MOV(*dst, src_reg(ir->value.f[i])));
+	 break;
+      case GLSL_TYPE_INT:
+	 emit(MOV(*dst, src_reg(ir->value.i[i])));
+	 break;
+      case GLSL_TYPE_UINT:
+	 emit(MOV(*dst, src_reg(ir->value.u[i])));
+	 break;
+      case GLSL_TYPE_BOOL:
+	 emit(MOV(*dst, src_reg(ir->value.b[i])));
+	 break;
+      default:
+	 assert(!"Non-float/uint/int/bool constant");
+	 break;
+      }
+
+      remaining_writemask &= ~dst->writemask;
+   }
+   dst->reg_offset++;
+}
+
+void
+vec4_visitor::visit(ir_constant *ir)
+{
+   dst_reg dst = dst_reg(this, ir->type);
+   this->result = src_reg(dst);
+
+   emit_constant_values(&dst, ir);
+}
+
+void
+vec4_visitor::visit_atomic_counter_intrinsic(ir_call *ir)
+{
+   ir_dereference *deref = static_cast<ir_dereference *>(
+      ir->actual_parameters.get_head());
+   ir_variable *location = deref->variable_referenced();
+   unsigned surf_index = (prog_data->base.binding_table.abo_start +
+                          location->data.atomic.buffer_index);
+
+   /* Calculate the surface offset */
+   src_reg offset(this, glsl_type::uint_type);
+   ir_dereference_array *deref_array = deref->as_dereference_array();
+   if (deref_array) {
+      deref_array->array_index->accept(this);
+
+      src_reg tmp(this, glsl_type::uint_type);
+      emit(MUL(dst_reg(tmp), this->result, ATOMIC_COUNTER_SIZE));
+      emit(ADD(dst_reg(offset), tmp, location->data.atomic.offset));
+   } else {
+      offset = location->data.atomic.offset;
+   }
+
+   /* Emit the appropriate machine instruction */
+   const char *callee = ir->callee->function_name();
+   dst_reg dst = get_assignment_lhs(ir->return_deref, this);
+
+   if (!strcmp("__intrinsic_atomic_read", callee)) {
+      emit_untyped_surface_read(surf_index, dst, offset);
+
+   } else if (!strcmp("__intrinsic_atomic_increment", callee)) {
+      emit_untyped_atomic(BRW_AOP_INC, surf_index, dst, offset,
+                          src_reg(), src_reg());
+
+   } else if (!strcmp("__intrinsic_atomic_predecrement", callee)) {
+      emit_untyped_atomic(BRW_AOP_PREDEC, surf_index, dst, offset,
+                          src_reg(), src_reg());
+   }
+}
+
+void
+vec4_visitor::visit(ir_call *ir)
+{
+   const char *callee = ir->callee->function_name();
+
+   if (!strcmp("__intrinsic_atomic_read", callee) ||
+       !strcmp("__intrinsic_atomic_increment", callee) ||
+       !strcmp("__intrinsic_atomic_predecrement", callee)) {
+      visit_atomic_counter_intrinsic(ir);
+   } else {
+      assert(!"Unsupported intrinsic.");
+   }
+}
+
+src_reg
+vec4_visitor::emit_mcs_fetch(ir_texture *ir, src_reg coordinate, int sampler)
+{
+   vec4_instruction *inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TXF_MCS);
+   inst->base_mrf = 2;
+   inst->mlen = 1;
+   inst->sampler = sampler;
+   inst->dst = dst_reg(this, glsl_type::uvec4_type);
+   inst->dst.writemask = WRITEMASK_XYZW;
+
+   /* parameters are: u, v, r, lod; lod will always be zero due to api restrictions */
+   int param_base = inst->base_mrf;
+   int coord_mask = (1 << ir->coordinate->type->vector_elements) - 1;
+   int zero_mask = 0xf & ~coord_mask;
+
+   emit(MOV(dst_reg(MRF, param_base, ir->coordinate->type, coord_mask),
+            coordinate));
+
+   emit(MOV(dst_reg(MRF, param_base, ir->coordinate->type, zero_mask),
+            src_reg(0)));
+
+   emit(inst);
+   return src_reg(inst->dst);
+}
+
+void
+vec4_visitor::visit(ir_texture *ir)
+{
+   int sampler =
+      _mesa_get_sampler_uniform_value(ir->sampler, shader_prog, prog);
+
+   /* When tg4 is used with the degenerate ZERO/ONE swizzles, don't bother
+    * emitting anything other than setting up the constant result.
+    */
+   if (ir->op == ir_tg4) {
+      ir_constant *chan = ir->lod_info.component->as_constant();
+      int swiz = GET_SWZ(key->tex.swizzles[sampler], chan->value.i[0]);
+      if (swiz == SWIZZLE_ZERO || swiz == SWIZZLE_ONE) {
+         dst_reg result(this, ir->type);
+         this->result = src_reg(result);
+         emit(MOV(result, src_reg(swiz == SWIZZLE_ONE ? 1.0f : 0.0f)));
+         return;
+      }
+   }
+
+   /* Should be lowered by do_lower_texture_projection */
+   assert(!ir->projector);
+
+   /* Should be lowered */
+   assert(!ir->offset || !ir->offset->type->is_array());
+
+   /* Generate code to compute all the subexpression trees.  This has to be
+    * done before loading any values into MRFs for the sampler message since
+    * generating these values may involve SEND messages that need the MRFs.
+    */
+   src_reg coordinate;
+   if (ir->coordinate) {
+      ir->coordinate->accept(this);
+      coordinate = this->result;
+   }
+
+   src_reg shadow_comparitor;
+   if (ir->shadow_comparitor) {
+      ir->shadow_comparitor->accept(this);
+      shadow_comparitor = this->result;
+   }
+
+   bool has_nonconstant_offset = ir->offset && !ir->offset->as_constant();
+   src_reg offset_value;
+   if (has_nonconstant_offset) {
+      ir->offset->accept(this);
+      offset_value = src_reg(this->result);
+   }
+
+   const glsl_type *lod_type = NULL, *sample_index_type = NULL;
+   src_reg lod, dPdx, dPdy, sample_index, mcs;
+   switch (ir->op) {
+   case ir_tex:
+      lod = src_reg(0.0f);
+      lod_type = glsl_type::float_type;
+      break;
+   case ir_txf:
+   case ir_txl:
+   case ir_txs:
+      ir->lod_info.lod->accept(this);
+      lod = this->result;
+      lod_type = ir->lod_info.lod->type;
+      break;
+   case ir_query_levels:
+      lod = src_reg(0);
+      lod_type = glsl_type::int_type;
+      break;
+   case ir_txf_ms:
+      ir->lod_info.sample_index->accept(this);
+      sample_index = this->result;
+      sample_index_type = ir->lod_info.sample_index->type;
+
+      if (brw->gen >= 7 && key->tex.compressed_multisample_layout_mask & (1<<sampler))
+         mcs = emit_mcs_fetch(ir, coordinate, sampler);
+      else
+         mcs = src_reg(0u);
+      break;
+   case ir_txd:
+      ir->lod_info.grad.dPdx->accept(this);
+      dPdx = this->result;
+
+      ir->lod_info.grad.dPdy->accept(this);
+      dPdy = this->result;
+
+      lod_type = ir->lod_info.grad.dPdx->type;
+      break;
+   case ir_txb:
+   case ir_lod:
+   case ir_tg4:
+      break;
+   }
+
+   vec4_instruction *inst = NULL;
+   switch (ir->op) {
+   case ir_tex:
+   case ir_txl:
+      inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TXL);
+      break;
+   case ir_txd:
+      inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TXD);
+      break;
+   case ir_txf:
+      inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TXF);
+      break;
+   case ir_txf_ms:
+      inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TXF_CMS);
+      break;
+   case ir_txs:
+      inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TXS);
+      break;
+   case ir_tg4:
+      if (has_nonconstant_offset)
+         inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TG4_OFFSET);
+      else
+         inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TG4);
+      break;
+   case ir_query_levels:
+      inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TXS);
+      break;
+   case ir_txb:
+      assert(!"TXB is not valid for vertex shaders.");
+      break;
+   case ir_lod:
+      assert(!"LOD is not valid for vertex shaders.");
+      break;
+   default:
+      assert(!"Unrecognized tex op");
+   }
+
+   if (ir->offset != NULL && ir->op != ir_txf)
+      inst->texture_offset = brw_texture_offset(ctx, ir->offset->as_constant());
+
+   /* Stuff the channel select bits in the top of the texture offset */
+   if (ir->op == ir_tg4)
+      inst->texture_offset |= gather_channel(ir, sampler) << 16;
+
+   /* The message header is necessary for:
+    * - Gen4 (always)
+    * - Texel offsets
+    * - Gather channel selection
+    * - Sampler indices too large to fit in a 4-bit value.
+    */
+   inst->header_present =
+      brw->gen < 5 || inst->texture_offset != 0 || ir->op == ir_tg4 ||
+      sampler >= 16;
+   inst->base_mrf = 2;
+   inst->mlen = inst->header_present + 1; /* always at least one */
+   inst->sampler = sampler;
+   inst->dst = dst_reg(this, ir->type);
+   inst->dst.writemask = WRITEMASK_XYZW;
+   inst->shadow_compare = ir->shadow_comparitor != NULL;
+
+   /* MRF for the first parameter */
+   int param_base = inst->base_mrf + inst->header_present;
+
+   if (ir->op == ir_txs || ir->op == ir_query_levels) {
+      int writemask = brw->gen == 4 ? WRITEMASK_W : WRITEMASK_X;
+      emit(MOV(dst_reg(MRF, param_base, lod_type, writemask), lod));
+   } else {
+      /* Load the coordinate */
+      /* FINISHME: gl_clamp_mask and saturate */
+      int coord_mask = (1 << ir->coordinate->type->vector_elements) - 1;
+      int zero_mask = 0xf & ~coord_mask;
+
+      emit(MOV(dst_reg(MRF, param_base, ir->coordinate->type, coord_mask),
+               coordinate));
+
+      if (zero_mask != 0) {
+         emit(MOV(dst_reg(MRF, param_base, ir->coordinate->type, zero_mask),
+                  src_reg(0)));
+      }
+      /* Load the shadow comparitor */
+      if (ir->shadow_comparitor && ir->op != ir_txd && (ir->op != ir_tg4 || !has_nonconstant_offset)) {
+	 emit(MOV(dst_reg(MRF, param_base + 1, ir->shadow_comparitor->type,
+			  WRITEMASK_X),
+		  shadow_comparitor));
+	 inst->mlen++;
+      }
+
+      /* Load the LOD info */
+      if (ir->op == ir_tex || ir->op == ir_txl) {
+	 int mrf, writemask;
+	 if (brw->gen >= 5) {
+	    mrf = param_base + 1;
+	    if (ir->shadow_comparitor) {
+	       writemask = WRITEMASK_Y;
+	       /* mlen already incremented */
+	    } else {
+	       writemask = WRITEMASK_X;
+	       inst->mlen++;
+	    }
+	 } else /* brw->gen == 4 */ {
+	    mrf = param_base;
+	    writemask = WRITEMASK_W;
+	 }
+	 emit(MOV(dst_reg(MRF, mrf, lod_type, writemask), lod));
+      } else if (ir->op == ir_txf) {
+         emit(MOV(dst_reg(MRF, param_base, lod_type, WRITEMASK_W), lod));
+      } else if (ir->op == ir_txf_ms) {
+         emit(MOV(dst_reg(MRF, param_base + 1, sample_index_type, WRITEMASK_X),
+                  sample_index));
+         if (brw->gen >= 7) {
+            /* MCS data is in the first channel of `mcs`, but we need to get
+             * it into the .y channel of the second vec4 of params, so
+             * replicate .x across the whole vec4 and then mask off
+             * everything except .y.
+             */
+            mcs.swizzle = BRW_SWIZZLE_XXXX;
+            emit(MOV(dst_reg(MRF, param_base + 1, glsl_type::uint_type, WRITEMASK_Y),
+                     mcs));
+         }
+         inst->mlen++;
+      } else if (ir->op == ir_txd) {
+	 const glsl_type *type = lod_type;
+
+	 if (brw->gen >= 5) {
+	    dPdx.swizzle = BRW_SWIZZLE4(SWIZZLE_X,SWIZZLE_X,SWIZZLE_Y,SWIZZLE_Y);
+	    dPdy.swizzle = BRW_SWIZZLE4(SWIZZLE_X,SWIZZLE_X,SWIZZLE_Y,SWIZZLE_Y);
+	    emit(MOV(dst_reg(MRF, param_base + 1, type, WRITEMASK_XZ), dPdx));
+	    emit(MOV(dst_reg(MRF, param_base + 1, type, WRITEMASK_YW), dPdy));
+	    inst->mlen++;
+
+	    if (ir->type->vector_elements == 3 || ir->shadow_comparitor) {
+	       dPdx.swizzle = BRW_SWIZZLE_ZZZZ;
+	       dPdy.swizzle = BRW_SWIZZLE_ZZZZ;
+	       emit(MOV(dst_reg(MRF, param_base + 2, type, WRITEMASK_X), dPdx));
+	       emit(MOV(dst_reg(MRF, param_base + 2, type, WRITEMASK_Y), dPdy));
+	       inst->mlen++;
+
+               if (ir->shadow_comparitor) {
+                  emit(MOV(dst_reg(MRF, param_base + 2,
+                                   ir->shadow_comparitor->type, WRITEMASK_Z),
+                           shadow_comparitor));
+               }
+	    }
+	 } else /* brw->gen == 4 */ {
+	    emit(MOV(dst_reg(MRF, param_base + 1, type, WRITEMASK_XYZ), dPdx));
+	    emit(MOV(dst_reg(MRF, param_base + 2, type, WRITEMASK_XYZ), dPdy));
+	    inst->mlen += 2;
+	 }
+      } else if (ir->op == ir_tg4 && has_nonconstant_offset) {
+         if (ir->shadow_comparitor) {
+            emit(MOV(dst_reg(MRF, param_base, ir->shadow_comparitor->type, WRITEMASK_W),
+                     shadow_comparitor));
+         }
+
+         emit(MOV(dst_reg(MRF, param_base + 1, glsl_type::ivec2_type, WRITEMASK_XY),
+                  offset_value));
+         inst->mlen++;
+      }
+   }
+
+   emit(inst);
+
+   /* Fix up the number of layers (z) for cube arrays: the hardware returns
+    * faces * layers, but the GL spec requires just the number of layers.
+    */
+   if (ir->op == ir_txs) {
+      glsl_type const *type = ir->sampler->type;
+      if (type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE &&
+          type->sampler_array) {
+         emit_math(SHADER_OPCODE_INT_QUOTIENT,
+                   writemask(inst->dst, WRITEMASK_Z),
+                   src_reg(inst->dst), src_reg(6));
+      }
+   }
+
+   if (brw->gen == 6 && ir->op == ir_tg4) {
+      emit_gen6_gather_wa(key->tex.gen6_gather_wa[sampler], inst->dst);
+   }
+
+   swizzle_result(ir, src_reg(inst->dst), sampler);
+}
+
+/**
+ * Apply workarounds for Gen6 gather with UINT/SINT
+ */
+void
+vec4_visitor::emit_gen6_gather_wa(uint8_t wa, dst_reg dst)
+{
+   if (!wa)
+      return;
+
+   int width = (wa & WA_8BIT) ? 8 : 16;
+   dst_reg dst_f = dst;
+   dst_f.type = BRW_REGISTER_TYPE_F;
+
+   /* Convert from UNORM to UINT */
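+   /* For an 8-bit format the sampler returned v / 255.0f, so multiplying
+    * by 255.0f and converting back to an integer type recovers v.
+    */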
+   emit(MUL(dst_f, src_reg(dst_f), src_reg((float)((1 << width) - 1))));
+   emit(MOV(dst, src_reg(dst_f)));
+
+   if (wa & WA_SIGN) {
+      /* Reinterpret the UINT value as a signed INT value by
+       * shifting the sign bit into place, then shifting back
+       * preserving sign.
+       */
+      emit(SHL(dst, src_reg(dst), src_reg(32 - width)));
+      emit(ASR(dst, src_reg(dst), src_reg(32 - width)));
+   }
+}
+
+/**
+ * Set up the gather channel based on the swizzle, for gather4.
+ */
+uint32_t
+vec4_visitor::gather_channel(ir_texture *ir, int sampler)
+{
+   ir_constant *chan = ir->lod_info.component->as_constant();
+   int swiz = GET_SWZ(key->tex.swizzles[sampler], chan->value.i[0]);
+   switch (swiz) {
+      case SWIZZLE_X: return 0;
+      case SWIZZLE_Y:
+         /* gather4 sampler is broken for green channel on RG32F --
+          * we must ask for blue instead.
+          */
+         if (key->tex.gather_channel_quirk_mask & (1<<sampler))
+            return 2;
+         return 1;
+      case SWIZZLE_Z: return 2;
+      case SWIZZLE_W: return 3;
+      default:
+         assert(!"Not reached"); /* zero, one swizzles handled already */
+         return 0;
+   }
+}
+
+void
+vec4_visitor::swizzle_result(ir_texture *ir, src_reg orig_val, int sampler)
+{
+   int s = key->tex.swizzles[sampler];
+
+   this->result = src_reg(this, ir->type);
+   dst_reg swizzled_result(this->result);
+
+   if (ir->op == ir_query_levels) {
+      /* # levels is in .w */
+      orig_val.swizzle = BRW_SWIZZLE4(SWIZZLE_W, SWIZZLE_W, SWIZZLE_W, SWIZZLE_W);
+      emit(MOV(swizzled_result, orig_val));
+      return;
+   }
+
+   if (ir->op == ir_txs || ir->type == glsl_type::float_type ||
+       s == SWIZZLE_NOOP || ir->op == ir_tg4) {
+      emit(MOV(swizzled_result, orig_val));
+      return;
+   }
+
+   int zero_mask = 0, one_mask = 0, copy_mask = 0;
+   int swizzle[4] = {0};
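+   /* For a swizzle like (R, G, ZERO, ONE), x and y land in copy_mask and
+    * are handled by one swizzled MOV, while z is written with 0.0f and w
+    * with 1.0f by the two immediate MOVs below.
+    */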
+
+   for (int i = 0; i < 4; i++) {
+      switch (GET_SWZ(s, i)) {
+      case SWIZZLE_ZERO:
+	 zero_mask |= (1 << i);
+	 break;
+      case SWIZZLE_ONE:
+	 one_mask |= (1 << i);
+	 break;
+      default:
+	 copy_mask |= (1 << i);
+	 swizzle[i] = GET_SWZ(s, i);
+	 break;
+      }
+   }
+
+   if (copy_mask) {
+      orig_val.swizzle = BRW_SWIZZLE4(swizzle[0], swizzle[1], swizzle[2], swizzle[3]);
+      swizzled_result.writemask = copy_mask;
+      emit(MOV(swizzled_result, orig_val));
+   }
+
+   if (zero_mask) {
+      swizzled_result.writemask = zero_mask;
+      emit(MOV(swizzled_result, src_reg(0.0f)));
+   }
+
+   if (one_mask) {
+      swizzled_result.writemask = one_mask;
+      emit(MOV(swizzled_result, src_reg(1.0f)));
+   }
+}
+
+void
+vec4_visitor::visit(ir_return *ir)
+{
+   assert(!"not reached");
+}
+
+void
+vec4_visitor::visit(ir_discard *ir)
+{
+   assert(!"not reached");
+}
+
+void
+vec4_visitor::visit(ir_if *ir)
+{
+   /* Don't point the annotation at the if statement, because then it plus
+    * the then and else blocks get printed.
+    */
+   this->base_ir = ir->condition;
+
+   if (brw->gen == 6) {
+      emit_if_gen6(ir);
+   } else {
+      uint32_t predicate;
+      emit_bool_to_cond_code(ir->condition, &predicate);
+      emit(IF(predicate));
+   }
+
+   visit_instructions(&ir->then_instructions);
+
+   if (!ir->else_instructions.is_empty()) {
+      this->base_ir = ir->condition;
+      emit(BRW_OPCODE_ELSE);
+
+      visit_instructions(&ir->else_instructions);
+   }
+
+   this->base_ir = ir->condition;
+   emit(BRW_OPCODE_ENDIF);
+}
+
+void
+vec4_visitor::visit(ir_emit_vertex *)
+{
+   assert(!"not reached");
+}
+
+void
+vec4_visitor::visit(ir_end_primitive *)
+{
+   assert(!"not reached");
+}
+
+void
+vec4_visitor::emit_untyped_atomic(unsigned atomic_op, unsigned surf_index,
+                                  dst_reg dst, src_reg offset,
+                                  src_reg src0, src_reg src1)
+{
+   unsigned mlen = 0;
+
+   /* Set the atomic operation offset. */
+   emit(MOV(brw_writemask(brw_uvec_mrf(8, mlen, 0), WRITEMASK_X), offset));
+   mlen++;
+
+   /* Set the atomic operation arguments. */
+   if (src0.file != BAD_FILE) {
+      emit(MOV(brw_writemask(brw_uvec_mrf(8, mlen, 0), WRITEMASK_X), src0));
+      mlen++;
+   }
+
+   if (src1.file != BAD_FILE) {
+      emit(MOV(brw_writemask(brw_uvec_mrf(8, mlen, 0), WRITEMASK_X), src1));
+      mlen++;
+   }
+
+   /* Emit the instruction.  Note that this maps to the normal SIMD8
+    * untyped atomic message on Ivy Bridge, but that's OK because
+    * unused channels will be masked out.
+    */
+   vec4_instruction *inst = emit(SHADER_OPCODE_UNTYPED_ATOMIC, dst,
+                                 src_reg(atomic_op), src_reg(surf_index));
+   inst->base_mrf = 0;
+   inst->mlen = mlen;
+}
+
+void
+vec4_visitor::emit_untyped_surface_read(unsigned surf_index, dst_reg dst,
+                                        src_reg offset)
+{
+   /* Set the surface read offset. */
+   emit(MOV(brw_writemask(brw_uvec_mrf(8, 0, 0), WRITEMASK_X), offset));
+
+   /* Emit the instruction.  Note that this maps to the normal SIMD8
+    * untyped surface read message, but that's OK because unused
+    * channels will be masked out.
+    */
+   vec4_instruction *inst = emit(SHADER_OPCODE_UNTYPED_SURFACE_READ,
+                                 dst, src_reg(surf_index));
+   inst->base_mrf = 0;
+   inst->mlen = 1;
+}
+
+void
+vec4_visitor::emit_ndc_computation()
+{
+   /* Get the position */
+   src_reg pos = src_reg(output_reg[VARYING_SLOT_POS]);
+
+   /* Build ndc coords, which are (x/w, y/w, z/w, 1/w) */
+   dst_reg ndc = dst_reg(this, glsl_type::vec4_type);
+   output_reg[BRW_VARYING_SLOT_NDC] = ndc;
+
+   current_annotation = "NDC";
+   dst_reg ndc_w = ndc;
+   ndc_w.writemask = WRITEMASK_W;
+   src_reg pos_w = pos;
+   pos_w.swizzle = BRW_SWIZZLE4(SWIZZLE_W, SWIZZLE_W, SWIZZLE_W, SWIZZLE_W);
+   emit_math(SHADER_OPCODE_RCP, ndc_w, pos_w);
+
+   dst_reg ndc_xyz = ndc;
+   ndc_xyz.writemask = WRITEMASK_XYZ;
+
+   emit(MUL(ndc_xyz, pos, src_reg(ndc_w)));
+}
+
+void
+vec4_visitor::emit_psiz_and_flags(struct brw_reg reg)
+{
+   if (brw->gen < 6 &&
+       ((prog_data->vue_map.slots_valid & VARYING_BIT_PSIZ) ||
+        key->userclip_active || brw->has_negative_rhw_bug)) {
+      dst_reg header1 = dst_reg(this, glsl_type::uvec4_type);
+      dst_reg header1_w = header1;
+      header1_w.writemask = WRITEMASK_W;
+
+      emit(MOV(header1, 0u));
+
+      if (prog_data->vue_map.slots_valid & VARYING_BIT_PSIZ) {
+	 src_reg psiz = src_reg(output_reg[VARYING_SLOT_PSIZ]);
+
+	 current_annotation = "Point size";
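+         /* Scale the float point size by 2^11 and mask off everything but
+          * bits 8..18, leaving the point width encoded as an 11-bit
+          * fixed-point value in the header's point width field.
+          */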
+	 emit(MUL(header1_w, psiz, src_reg((float)(1 << 11))));
+	 emit(AND(header1_w, src_reg(header1_w), 0x7ff << 8));
+      }
+
+      if (key->userclip_active) {
+         current_annotation = "Clipping flags";
+         dst_reg flags0 = dst_reg(this, glsl_type::uint_type);
+         dst_reg flags1 = dst_reg(this, glsl_type::uint_type);
+
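+         /* Each CMP below sets the flag register from the per-channel
+          * "distance < 0" results for four clip planes at a time; the
+          * unpack opcode transfers those flag bits into flags0/flags1, and
+          * the second group is shifted up by 4 so that planes 0..7 each
+          * contribute one bit when OR'd into header1.w.
+          */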
+         emit(CMP(dst_null_f(), src_reg(output_reg[VARYING_SLOT_CLIP_DIST0]), src_reg(0.0f), BRW_CONDITIONAL_L));
+         emit(VS_OPCODE_UNPACK_FLAGS_SIMD4X2, flags0, src_reg(0));
+         emit(OR(header1_w, src_reg(header1_w), src_reg(flags0)));
+
+         emit(CMP(dst_null_f(), src_reg(output_reg[VARYING_SLOT_CLIP_DIST1]), src_reg(0.0f), BRW_CONDITIONAL_L));
+         emit(VS_OPCODE_UNPACK_FLAGS_SIMD4X2, flags1, src_reg(0));
+         emit(SHL(flags1, src_reg(flags1), src_reg(4)));
+         emit(OR(header1_w, src_reg(header1_w), src_reg(flags1)));
+      }
+
+      /* i965 clipping workaround:
+       * 1) Test for negative RHW (reciprocal homogeneous W).
+       * 2) If set,
+       *      set ndc = (0,0,0,0)
+       *      set ucp[6] = 1
+       *
+       * Later, clipping will detect ucp[6] and ensure the primitive is
+       * clipped against all fixed planes.
+       */
+      if (brw->has_negative_rhw_bug) {
+         src_reg ndc_w = src_reg(output_reg[BRW_VARYING_SLOT_NDC]);
+         ndc_w.swizzle = BRW_SWIZZLE_WWWW;
+         emit(CMP(dst_null_f(), ndc_w, src_reg(0.0f), BRW_CONDITIONAL_L));
+         vec4_instruction *inst;
+         inst = emit(OR(header1_w, src_reg(header1_w), src_reg(1u << 6)));
+         inst->predicate = BRW_PREDICATE_NORMAL;
+         inst = emit(MOV(output_reg[BRW_VARYING_SLOT_NDC], src_reg(0.0f)));
+         inst->predicate = BRW_PREDICATE_NORMAL;
+      }
+
+      emit(MOV(retype(reg, BRW_REGISTER_TYPE_UD), src_reg(header1)));
+   } else if (brw->gen < 6) {
+      emit(MOV(retype(reg, BRW_REGISTER_TYPE_UD), 0u));
+   } else {
+      emit(MOV(retype(reg, BRW_REGISTER_TYPE_D), src_reg(0)));
+      if (prog_data->vue_map.slots_valid & VARYING_BIT_PSIZ) {
+         emit(MOV(brw_writemask(reg, WRITEMASK_W),
+                  src_reg(output_reg[VARYING_SLOT_PSIZ])));
+      }
+      if (prog_data->vue_map.slots_valid & VARYING_BIT_LAYER) {
+         emit(MOV(retype(brw_writemask(reg, WRITEMASK_Y), BRW_REGISTER_TYPE_D),
+                  src_reg(output_reg[VARYING_SLOT_LAYER])));
+      }
+      if (prog_data->vue_map.slots_valid & VARYING_BIT_VIEWPORT) {
+         emit(MOV(retype(brw_writemask(reg, WRITEMASK_Z), BRW_REGISTER_TYPE_D),
+                  src_reg(output_reg[VARYING_SLOT_VIEWPORT])));
+      }
+   }
+}
+
+void
+vec4_visitor::emit_clip_distances(dst_reg reg, int offset)
+{
+   /* From the GLSL 1.30 spec, section 7.1 (Vertex Shader Special Variables):
+    *
+    *     "If a linked set of shaders forming the vertex stage contains no
+    *     static write to gl_ClipVertex or gl_ClipDistance, but the
+    *     application has requested clipping against user clip planes through
+    *     the API, then the coordinate written to gl_Position is used for
+    *     comparison against the user clip planes."
+    *
+    * This function is only called if the shader didn't write to
+    * gl_ClipDistance.  Accordingly, we use gl_ClipVertex to perform clipping
+    * if the user wrote to it; otherwise we use gl_Position.
+    */
+   gl_varying_slot clip_vertex = VARYING_SLOT_CLIP_VERTEX;
+   if (!(prog_data->vue_map.slots_valid & VARYING_BIT_CLIP_VERTEX)) {
+      clip_vertex = VARYING_SLOT_POS;
+   }
+
+   for (int i = 0; i + offset < key->nr_userclip_plane_consts && i < 4;
+        ++i) {
+      reg.writemask = 1 << i;
+      emit(DP4(reg,
+               src_reg(output_reg[clip_vertex]),
+               src_reg(this->userplane[i + offset])));
+   }
+}
+
+void
+vec4_visitor::emit_generic_urb_slot(dst_reg reg, int varying)
+{
+   assert (varying < VARYING_SLOT_MAX);
+   reg.type = output_reg[varying].type;
+   current_annotation = output_reg_annotation[varying];
+   /* Copy the register, saturating if necessary */
+   vec4_instruction *inst = emit(MOV(reg,
+                                     src_reg(output_reg[varying])));
+   if ((varying == VARYING_SLOT_COL0 ||
+        varying == VARYING_SLOT_COL1 ||
+        varying == VARYING_SLOT_BFC0 ||
+        varying == VARYING_SLOT_BFC1) &&
+       key->clamp_vertex_color) {
+      inst->saturate = true;
+   }
+}
+
+void
+vec4_visitor::emit_urb_slot(int mrf, int varying)
+{
+   struct brw_reg hw_reg = brw_message_reg(mrf);
+   dst_reg reg = dst_reg(MRF, mrf);
+   reg.type = BRW_REGISTER_TYPE_F;
+
+   switch (varying) {
+   case VARYING_SLOT_PSIZ:
+      /* PSIZ is always in slot 0, and is coupled with other flags. */
+      current_annotation = "indices, point width, clip flags";
+      emit_psiz_and_flags(hw_reg);
+      break;
+   case BRW_VARYING_SLOT_NDC:
+      current_annotation = "NDC";
+      emit(MOV(reg, src_reg(output_reg[BRW_VARYING_SLOT_NDC])));
+      break;
+   case VARYING_SLOT_POS:
+      current_annotation = "gl_Position";
+      emit(MOV(reg, src_reg(output_reg[VARYING_SLOT_POS])));
+      break;
+   case VARYING_SLOT_EDGE:
+      /* This is present when doing unfilled polygons.  We're supposed to copy
+       * the edge flag from the user-provided vertex array
+       * (glEdgeFlagPointer); otherwise we copy the current value of that
+       * attribute (which starts as 1.0f).  This is then used in clipping to
+       * determine which edges should be drawn as wireframe.
+       */
+      current_annotation = "edge flag";
+      emit(MOV(reg, src_reg(dst_reg(ATTR, VERT_ATTRIB_EDGEFLAG,
+                                    glsl_type::float_type, WRITEMASK_XYZW))));
+      break;
+   case BRW_VARYING_SLOT_PAD:
+      /* No need to write to this slot */
+      break;
+   default:
+      emit_generic_urb_slot(reg, varying);
+      break;
+   }
+}
+
+static int
+align_interleaved_urb_mlen(struct brw_context *brw, int mlen)
+{
+   if (brw->gen >= 6) {
+      /* URB data written (does not include the message header reg) must
+       * be a multiple of 256 bits, or 2 VS registers.  See vol5c.5,
+       * section 5.4.3.2.2: URB_INTERLEAVED.
+       *
+       * URB entries are allocated on a multiple of 1024 bits, so an
+       * extra 128 bits written here to make the end align to 256 is
+       * no problem.
+       */
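+      /* mlen counts the header register plus the data registers, so
+       * forcing mlen to be odd makes the data length (mlen - 1) even,
+       * i.e. a whole number of 256-bit units.
+       */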
+      if ((mlen % 2) != 1)
+	 mlen++;
+   }
+
+   return mlen;
+}
+
+
+/**
+ * Generates the VUE payload plus the necessary URB write instructions to
+ * output it.
+ *
+ * The VUE layout is documented in Volume 2a.
+ */
+void
+vec4_visitor::emit_vertex()
+{
+   /* MRF 0 is reserved for the debugger, so start with message header
+    * in MRF 1.
+    */
+   int base_mrf = 1;
+   int mrf = base_mrf;
+   /* In the process of generating our URB write message contents, we
+    * may need to unspill a register or load from an array.  Those
+    * reads would use MRFs 14-15.
+    */
+   int max_usable_mrf = 13;
+
+   /* The following assertion verifies that max_usable_mrf causes an
+    * even number of URB write data registers, which meets gen6's
+    * requirements for length alignment.
+    */
+   assert ((max_usable_mrf - base_mrf) % 2 == 0);
+
+   /* First mrf is the g0-based message header containing URB handles and
+    * such.
+    */
+   emit_urb_write_header(mrf++);
+
+   if (brw->gen < 6) {
+      emit_ndc_computation();
+   }
+
+   /* Lower legacy ff and ClipVertex clipping to clip distances */
+   if (key->userclip_active && !prog->UsesClipDistanceOut) {
+      current_annotation = "user clip distances";
+
+      output_reg[VARYING_SLOT_CLIP_DIST0] = dst_reg(this, glsl_type::vec4_type);
+      output_reg[VARYING_SLOT_CLIP_DIST1] = dst_reg(this, glsl_type::vec4_type);
+
+      emit_clip_distances(output_reg[VARYING_SLOT_CLIP_DIST0], 0);
+      emit_clip_distances(output_reg[VARYING_SLOT_CLIP_DIST1], 4);
+   }
+
+   /* We may need to split this up into several URB writes, so do them in a
+    * loop.
+    */
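+   /* Each write can carry at most (max_usable_mrf - base_mrf) MRFs of slot
+    * data -- 12 slots, or 6 URB rows, here -- so shaders with more slots
+    * issue additional writes at increasing URB offsets.
+    */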
+   int slot = 0;
+   bool complete = false;
+   do {
+      /* URB offset is in URB row increments, and each of our MRFs is half of
+       * one of those, since we're doing interleaved writes.
+       */
+      int offset = slot / 2;
+
+      mrf = base_mrf + 1;
+      for (; slot < prog_data->vue_map.num_slots; ++slot) {
+         emit_urb_slot(mrf++, prog_data->vue_map.slot_to_varying[slot]);
+
+         /* If this was max_usable_mrf, we can't fit anything more into this
+          * URB WRITE.
+          */
+         if (mrf > max_usable_mrf) {
+            slot++;
+            break;
+         }
+      }
+
+      complete = slot >= prog_data->vue_map.num_slots;
+      current_annotation = "URB write";
+      vec4_instruction *inst = emit_urb_write_opcode(complete);
+      inst->base_mrf = base_mrf;
+      inst->mlen = align_interleaved_urb_mlen(brw, mrf - base_mrf);
+      inst->offset += offset;
+   } while (!complete);
+}
+
+
+src_reg
+vec4_visitor::get_scratch_offset(vec4_instruction *inst,
+				 src_reg *reladdr, int reg_offset)
+{
+   /* Because we store the values to scratch interleaved like our
+    * vertex data, we need to scale the vec4 index by 2.
+    */
+   int message_header_scale = 2;
+
+   /* Pre-gen6, the message header uses byte offsets instead of vec4
+    * (16-byte) offset units.
+    */
+   if (brw->gen < 6)
+      message_header_scale *= 16;
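+   /* A one-register offset thus advances by 2 vec4 slots on gen6+, or by
+    * 2 * 16 = 32 bytes on earlier generations.
+    */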
+
+   if (reladdr) {
+      src_reg index = src_reg(this, glsl_type::int_type);
+
+      emit_before(inst, ADD(dst_reg(index), *reladdr, src_reg(reg_offset)));
+      emit_before(inst, MUL(dst_reg(index),
+			    index, src_reg(message_header_scale)));
+
+      return index;
+   } else {
+      return src_reg(reg_offset * message_header_scale);
+   }
+}
+
+src_reg
+vec4_visitor::get_pull_constant_offset(vec4_instruction *inst,
+				       src_reg *reladdr, int reg_offset)
+{
+   if (reladdr) {
+      src_reg index = src_reg(this, glsl_type::int_type);
+
+      emit_before(inst, ADD(dst_reg(index), *reladdr, src_reg(reg_offset)));
+
+      /* Pre-gen6, the message header uses byte offsets instead of vec4
+       * (16-byte) offset units.
+       */
+      if (brw->gen < 6) {
+	 emit_before(inst, MUL(dst_reg(index), index, src_reg(16)));
+      }
+
+      return index;
+   } else if (brw->gen >= 8) {
+      /* Store the offset in a GRF so we can send-from-GRF. */
+      src_reg offset = src_reg(this, glsl_type::int_type);
+      emit_before(inst, MOV(dst_reg(offset), src_reg(reg_offset)));
+      return offset;
+   } else {
+      int message_header_scale = brw->gen < 6 ? 16 : 1;
+      return src_reg(reg_offset * message_header_scale);
+   }
+}
+
+/**
+ * Emits an instruction before @inst to load the value named by @orig_src
+ * from scratch space at @base_offset to @temp.
+ *
+ * @base_offset is measured in 32-byte units (the size of a register).
+ */
+void
+vec4_visitor::emit_scratch_read(vec4_instruction *inst,
+				dst_reg temp, src_reg orig_src,
+				int base_offset)
+{
+   int reg_offset = base_offset + orig_src.reg_offset;
+   src_reg index = get_scratch_offset(inst, orig_src.reladdr, reg_offset);
+
+   emit_before(inst, SCRATCH_READ(temp, index));
+}
+
+/**
+ * Emits an instruction after @inst to store the value to be written
+ * to @orig_dst to scratch space at @base_offset, from @temp.
+ *
+ * @base_offset is measured in 32-byte units (the size of a register).
+ */
+void
+vec4_visitor::emit_scratch_write(vec4_instruction *inst, int base_offset)
+{
+   int reg_offset = base_offset + inst->dst.reg_offset;
+   src_reg index = get_scratch_offset(inst, inst->dst.reladdr, reg_offset);
+
+   /* Create a temporary register to store *inst's result in.
+    *
+    * We have to be careful in MOVing from our temporary result register in
+    * the scratch write.  If we swizzle from channels of the temporary that
+    * weren't initialized, it will confuse live interval analysis, which will
+    * make spilling fail to make progress.
+    */
+   src_reg temp = src_reg(this, glsl_type::vec4_type);
+   temp.type = inst->dst.type;
+   int first_writemask_chan = ffs(inst->dst.writemask) - 1;
+   int swizzles[4];
+   for (int i = 0; i < 4; i++)
+      if (inst->dst.writemask & (1 << i))
+         swizzles[i] = i;
+      else
+         swizzles[i] = first_writemask_chan;
+   temp.swizzle = BRW_SWIZZLE4(swizzles[0], swizzles[1],
+                               swizzles[2], swizzles[3]);
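+   /* As an example, a writemask of XZ yields swizzles of (x, x, z, x): the
+    * unwritten y and w channels read back channel x, which the instruction
+    * is known to write.
+    */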
+
+   dst_reg dst = dst_reg(brw_writemask(brw_vec8_grf(0, 0),
+				       inst->dst.writemask));
+   vec4_instruction *write = SCRATCH_WRITE(dst, temp, index);
+   write->predicate = inst->predicate;
+   write->ir = inst->ir;
+   write->annotation = inst->annotation;
+   inst->insert_after(write);
+
+   inst->dst.file = temp.file;
+   inst->dst.reg = temp.reg;
+   inst->dst.reg_offset = temp.reg_offset;
+   inst->dst.reladdr = NULL;
+}
+
+/**
+ * We can't generally support array access in GRF space, because a
+ * single instruction's destination can only span 2 contiguous
+ * registers.  So, we send all GRF arrays that get variable index
+ * access to scratch space.
+ */
+void
+vec4_visitor::move_grf_array_access_to_scratch()
+{
+   int scratch_loc[this->virtual_grf_count];
+
+   for (int i = 0; i < this->virtual_grf_count; i++) {
+      scratch_loc[i] = -1;
+   }
+
+   /* First, calculate the set of virtual GRFs that need to be punted
+    * to scratch due to having any array access on them, and where in
+    * scratch.
+    */
+   foreach_list(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      if (inst->dst.file == GRF && inst->dst.reladdr &&
+	  scratch_loc[inst->dst.reg] == -1) {
+	 scratch_loc[inst->dst.reg] = c->last_scratch;
+	 c->last_scratch += this->virtual_grf_sizes[inst->dst.reg];
+      }
+
+      for (int i = 0; i < 3; i++) {
+	 src_reg *src = &inst->src[i];
+
+	 if (src->file == GRF && src->reladdr &&
+	     scratch_loc[src->reg] == -1) {
+	    scratch_loc[src->reg] = c->last_scratch;
+	    c->last_scratch += this->virtual_grf_sizes[src->reg];
+	 }
+      }
+   }
+
+   /* Now, for anything that will be accessed through scratch, rewrite
+    * it to load/store.  Note that this is a _safe list walk, because
+    * we may generate a new scratch_write instruction after the one
+    * we're processing.
+    */
+   foreach_list_safe(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      /* Set up the annotation tracking for newly generated instructions. */
+      base_ir = inst->ir;
+      current_annotation = inst->annotation;
+
+      if (inst->dst.file == GRF && scratch_loc[inst->dst.reg] != -1) {
+	 emit_scratch_write(inst, scratch_loc[inst->dst.reg]);
+      }
+
+      for (int i = 0; i < 3; i++) {
+	 if (inst->src[i].file != GRF || scratch_loc[inst->src[i].reg] == -1)
+	    continue;
+
+	 dst_reg temp = dst_reg(this, glsl_type::vec4_type);
+
+	 emit_scratch_read(inst, temp, inst->src[i],
+			   scratch_loc[inst->src[i].reg]);
+
+	 inst->src[i].file = temp.file;
+	 inst->src[i].reg = temp.reg;
+	 inst->src[i].reg_offset = temp.reg_offset;
+	 inst->src[i].reladdr = NULL;
+      }
+   }
+}
+
+/**
+ * Emits an instruction before @inst to load the value named by @orig_src
+ * from the pull constant buffer (surface) at @base_offset to @temp.
+ */
+void
+vec4_visitor::emit_pull_constant_load(vec4_instruction *inst,
+				      dst_reg temp, src_reg orig_src,
+				      int base_offset)
+{
+   int reg_offset = base_offset + orig_src.reg_offset;
+   src_reg index = src_reg(prog_data->base.binding_table.pull_constants_start);
+   src_reg offset = get_pull_constant_offset(inst, orig_src.reladdr, reg_offset);
+   vec4_instruction *load;
+
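+   /* On gen7+ the offset has to live in a GRF for the send-from-GRF
+    * message; older generations pass it through the MRF payload instead
+    * (one register starting at MRF 14).
+    */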
+   if (brw->gen >= 7) {
+      dst_reg grf_offset = dst_reg(this, glsl_type::int_type);
+      grf_offset.type = offset.type;
+      emit_before(inst, MOV(grf_offset, offset));
+
+      load = new(mem_ctx) vec4_instruction(this,
+                                           VS_OPCODE_PULL_CONSTANT_LOAD_GEN7,
+                                           temp, index, src_reg(grf_offset));
+   } else {
+      load = new(mem_ctx) vec4_instruction(this, VS_OPCODE_PULL_CONSTANT_LOAD,
+                                           temp, index, offset);
+      load->base_mrf = 14;
+      load->mlen = 1;
+   }
+   emit_before(inst, load);
+}
+
+/**
+ * Implements array access of uniforms by inserting a
+ * PULL_CONSTANT_LOAD instruction.
+ *
+ * Unlike temporary GRF array access (where we don't support it due to
+ * the difficulty of doing relative addressing on instruction
+ * destinations), we could potentially do array access of uniforms
+ * that were loaded in GRF space as push constants.  In real-world
+ * usage we've seen, though, the arrays being used are always larger
+ * than we could load as push constants, so just always move all
+ * uniform array access out to a pull constant buffer.
+ */
+void
+vec4_visitor::move_uniform_array_access_to_pull_constants()
+{
+   int pull_constant_loc[this->uniforms];
+
+   for (int i = 0; i < this->uniforms; i++) {
+      pull_constant_loc[i] = -1;
+   }
+
+   /* Walk through and find array access of uniforms.  Put a copy of that
+    * uniform in the pull constant buffer.
+    *
+    * Note that we don't move constant-indexed accesses to arrays.  No
+    * testing has been done of the performance impact of this choice.
+    */
+   foreach_list_safe(node, &this->instructions) {
+      vec4_instruction *inst = (vec4_instruction *)node;
+
+      for (int i = 0; i < 3; i++) {
+	 if (inst->src[i].file != UNIFORM || !inst->src[i].reladdr)
+	    continue;
+
+	 int uniform = inst->src[i].reg;
+
+	 /* If this array isn't already present in the pull constant buffer,
+	  * add it.
+	  */
+	 if (pull_constant_loc[uniform] == -1) {
+	    const float **values = &stage_prog_data->param[uniform * 4];
+
+	    pull_constant_loc[uniform] = stage_prog_data->nr_pull_params / 4;
+
+	    assert(uniform < uniform_array_size);
+	    for (int j = 0; j < uniform_size[uniform] * 4; j++) {
+	       stage_prog_data->pull_param[stage_prog_data->nr_pull_params++]
+                  = values[j];
+	    }
+	 }
+
+	 /* Set up the annotation tracking for newly generated instructions. */
+	 base_ir = inst->ir;
+	 current_annotation = inst->annotation;
+
+	 dst_reg temp = dst_reg(this, glsl_type::vec4_type);
+
+	 emit_pull_constant_load(inst, temp, inst->src[i],
+				 pull_constant_loc[uniform]);
+
+	 inst->src[i].file = temp.file;
+	 inst->src[i].reg = temp.reg;
+	 inst->src[i].reg_offset = temp.reg_offset;
+	 inst->src[i].reladdr = NULL;
+      }
+   }
+
+   /* Now there are no accesses of the UNIFORM file with a reladdr, so
+    * no need to track them as larger-than-vec4 objects.  This will be
+    * relied on in cutting out unused uniform vectors from push
+    * constants.
+    */
+   split_uniform_registers();
+}
+
+void
+vec4_visitor::resolve_ud_negate(src_reg *reg)
+{
+   if (reg->type != BRW_REGISTER_TYPE_UD ||
+       !reg->negate)
+      return;
+
+   src_reg temp = src_reg(this, glsl_type::uvec4_type);
+   emit(BRW_OPCODE_MOV, dst_reg(temp), *reg);
+   *reg = temp;
+}
+
+vec4_visitor::vec4_visitor(struct brw_context *brw,
+                           struct brw_vec4_compile *c,
+                           struct gl_program *prog,
+                           const struct brw_vec4_prog_key *key,
+                           struct brw_vec4_prog_data *prog_data,
+			   struct gl_shader_program *shader_prog,
+                           gl_shader_stage stage,
+			   void *mem_ctx,
+                           bool debug_flag,
+                           bool no_spills,
+                           shader_time_shader_type st_base,
+                           shader_time_shader_type st_written,
+                           shader_time_shader_type st_reset)
+   : backend_visitor(brw, shader_prog, prog, &prog_data->base, stage),
+     c(c),
+     key(key),
+     prog_data(prog_data),
+     sanity_param_count(0),
+     fail_msg(NULL),
+     first_non_payload_grf(0),
+     need_all_constants_in_pull_buffer(false),
+     debug_flag(debug_flag),
+     no_spills(no_spills),
+     st_base(st_base),
+     st_written(st_written),
+     st_reset(st_reset)
+{
+   this->mem_ctx = mem_ctx;
+   this->failed = false;
+
+   this->base_ir = NULL;
+   this->current_annotation = NULL;
+   memset(this->output_reg_annotation, 0, sizeof(this->output_reg_annotation));
+
+   this->variable_ht = hash_table_ctor(0,
+				       hash_table_pointer_hash,
+				       hash_table_pointer_compare);
+
+   this->virtual_grf_start = NULL;
+   this->virtual_grf_end = NULL;
+   this->virtual_grf_sizes = NULL;
+   this->virtual_grf_count = 0;
+   this->virtual_grf_reg_map = NULL;
+   this->virtual_grf_reg_count = 0;
+   this->virtual_grf_array_size = 0;
+   this->live_intervals = NULL;
+
+   this->max_grf = brw->gen >= 7 ? GEN7_MRF_HACK_START : BRW_MAX_GRF;
+
+   this->uniforms = 0;
+
+   /* Initialize uniform_array_size to at least 1 because pre-gen6 VS requires
+    * at least one. See setup_uniforms() in brw_vec4.cpp.
+    */
+   this->uniform_array_size = 1;
+   if (prog_data) {
+      this->uniform_array_size = MAX2(stage_prog_data->nr_params, 1);
+   }
+
+   this->uniform_size = rzalloc_array(mem_ctx, int, this->uniform_array_size);
+   this->uniform_vector_size = rzalloc_array(mem_ctx, int, this->uniform_array_size);
+}
+
+vec4_visitor::~vec4_visitor()
+{
+   hash_table_dtor(this->variable_ht);
+}
+
+
+void
+vec4_visitor::fail(const char *format, ...)
+{
+   va_list va;
+   char *msg;
+
+   if (failed)
+      return;
+
+   failed = true;
+
+   va_start(va, format);
+   msg = ralloc_vasprintf(mem_ctx, format, va);
+   va_end(va);
+   msg = ralloc_asprintf(mem_ctx, "vec4 compile failed: %s\n", msg);
+
+   this->fail_msg = msg;
+
+   if (debug_flag) {
+      fprintf(stderr, "%s",  msg);
+   }
+}
+
+} /* namespace brw */
diff --git a/icd/intel/compiler/pipeline/brw_vec4_vs_visitor.cpp b/icd/intel/compiler/pipeline/brw_vec4_vs_visitor.cpp
new file mode 100644
index 0000000..c3c4735
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vec4_vs_visitor.cpp
@@ -0,0 +1,227 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+
+#include "brw_vs.h"
+#include "main/context.h"
+
+
+namespace brw {
+
+void
+vec4_vs_visitor::emit_prolog()
+{
+   dst_reg sign_recovery_shift;
+   dst_reg normalize_factor;
+   dst_reg es3_normalize_factor;
+
+   for (int i = 0; i < VERT_ATTRIB_MAX; i++) {
+      if (vs_prog_data->inputs_read & BITFIELD64_BIT(i)) {
+         uint8_t wa_flags = vs_compile->key.gl_attrib_wa_flags[i];
+         dst_reg reg(ATTR, i);
+         dst_reg reg_d = reg;
+         reg_d.type = BRW_REGISTER_TYPE_D;
+         dst_reg reg_ud = reg;
+         reg_ud.type = BRW_REGISTER_TYPE_UD;
+
+         /* Do GL_FIXED rescaling for GLES2.0.  Our GL_FIXED attributes
+          * come in as floating point conversions of the integer values.
+          */
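+         /* GL_FIXED is a signed 16.16 fixed-point format, so multiplying by
+          * 1.0f / 65536.0f (2^-16) rescales the raw integer value back to
+          * the intended real value.
+          */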
+         if (wa_flags & BRW_ATTRIB_WA_COMPONENT_MASK) {
+            dst_reg dst = reg;
+            dst.type = brw_type_for_base_type(glsl_type::vec4_type);
+            dst.writemask = (1 << (wa_flags & BRW_ATTRIB_WA_COMPONENT_MASK)) - 1;
+            emit(MUL(dst, src_reg(dst), src_reg(1.0f / 65536.0f)));
+         }
+
+         /* Do sign recovery for 2101010 formats if required. */
+         if (wa_flags & BRW_ATTRIB_WA_SIGN) {
+            if (sign_recovery_shift.file == BAD_FILE) {
+               /* shift constant: <22,22,22,30> */
+               sign_recovery_shift = dst_reg(this, glsl_type::uvec4_type);
+               emit(MOV(writemask(sign_recovery_shift, WRITEMASK_XYZ), src_reg(22u)));
+               emit(MOV(writemask(sign_recovery_shift, WRITEMASK_W), src_reg(30u)));
+            }
+
+            emit(SHL(reg_ud, src_reg(reg_ud), src_reg(sign_recovery_shift)));
+            emit(ASR(reg_d, src_reg(reg_d), src_reg(sign_recovery_shift)));
+         }
+
+         /* Apply BGRA swizzle if required. */
+         if (wa_flags & BRW_ATTRIB_WA_BGRA) {
+            src_reg temp = src_reg(reg);
+            temp.swizzle = BRW_SWIZZLE4(2,1,0,3);
+            emit(MOV(reg, temp));
+         }
+
+         if (wa_flags & BRW_ATTRIB_WA_NORMALIZE) {
+            /* ES 3.0 has different rules for converting signed normalized
+             * fixed-point numbers than desktop GL.
+             */
+            if (_mesa_is_gles3(ctx) && (wa_flags & BRW_ATTRIB_WA_SIGN)) {
+               /* According to equation 2.2 of the ES 3.0 specification,
+                * signed normalization conversion is done by:
+                *
+                * f = c / (2^(b-1)-1)
+                */
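+               /* For the 10_10_10_2 formats handled here, b is 10 for the
+                * x/y/z channels and 2 for w, giving the divisors
+                * 2^9 - 1 = 511 and 2^1 - 1 = 1 used below.
+                */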
+               if (es3_normalize_factor.file == BAD_FILE) {
+                  /* mul constant: 1 / (2^(b-1) - 1) */
+                  es3_normalize_factor = dst_reg(this, glsl_type::vec4_type);
+                  emit(MOV(writemask(es3_normalize_factor, WRITEMASK_XYZ),
+                           src_reg(1.0f / ((1<<9) - 1))));
+                  emit(MOV(writemask(es3_normalize_factor, WRITEMASK_W),
+                           src_reg(1.0f / ((1<<1) - 1))));
+               }
+
+               dst_reg dst = reg;
+               dst.type = brw_type_for_base_type(glsl_type::vec4_type);
+               emit(MOV(dst, src_reg(reg_d)));
+               emit(MUL(dst, src_reg(dst), src_reg(es3_normalize_factor)));
+               emit_minmax(BRW_CONDITIONAL_G, dst, src_reg(dst), src_reg(-1.0f));
+            } else {
+               /* The following equations are from the OpenGL 3.2 specification:
+                *
+                * 2.1 unsigned normalization
+                * f = c/(2^n-1)
+                *
+                * 2.2 signed normalization
+                * f = (2c+1)/(2^n-1)
+                *
+                * Both of these share a common divisor, which is represented by
+                * "normalize_factor" in the code below.
+                */
+               if (normalize_factor.file == BAD_FILE) {
+                  /* 1 / (2^b - 1) for b=<10,10,10,2> */
+                  normalize_factor = dst_reg(this, glsl_type::vec4_type);
+                  emit(MOV(writemask(normalize_factor, WRITEMASK_XYZ),
+                           src_reg(1.0f / ((1<<10) - 1))));
+                  emit(MOV(writemask(normalize_factor, WRITEMASK_W),
+                           src_reg(1.0f / ((1<<2) - 1))));
+               }
+
+               dst_reg dst = reg;
+               dst.type = brw_type_for_base_type(glsl_type::vec4_type);
+               emit(MOV(dst, src_reg((wa_flags & BRW_ATTRIB_WA_SIGN) ? reg_d : reg_ud)));
+
+               /* For signed normalization, we want the numerator to be 2c+1. */
+               if (wa_flags & BRW_ATTRIB_WA_SIGN) {
+                  emit(MUL(dst, src_reg(dst), src_reg(2.0f)));
+                  emit(ADD(dst, src_reg(dst), src_reg(1.0f)));
+               }
+
+               emit(MUL(dst, src_reg(dst), src_reg(normalize_factor)));
+            }
+         }
+
+         if (wa_flags & BRW_ATTRIB_WA_SCALE) {
+            dst_reg dst = reg;
+            dst.type = brw_type_for_base_type(glsl_type::vec4_type);
+            emit(MOV(dst, src_reg((wa_flags & BRW_ATTRIB_WA_SIGN) ? reg_d : reg_ud)));
+         }
+      }
+   }
+}
+
+
+dst_reg *
+vec4_vs_visitor::make_reg_for_system_value(ir_variable *ir)
+{
+   /* VertexID is stored by the VF as the last vertex element, but
+    * we don't represent it with a flag in inputs_read, so we call
+    * it VERT_ATTRIB_MAX, which setup_attributes() picks up on.
+    */
+   dst_reg *reg = new(mem_ctx) dst_reg(ATTR, VERT_ATTRIB_MAX);
+
+   switch (ir->data.location) {
+   case SYSTEM_VALUE_VERTEX_ID:
+      reg->writemask = WRITEMASK_X;
+      vs_prog_data->uses_vertexid = true;
+      break;
+   case SYSTEM_VALUE_INSTANCE_ID:
+      reg->writemask = WRITEMASK_Y;
+      vs_prog_data->uses_instanceid = true;
+      break;
+   default:
+      assert(!"not reached");
+      break;
+   }
+
+   return reg;
+}
+
+
+void
+vec4_vs_visitor::emit_urb_write_header(int mrf)
+{
+   /* No need to do anything for VS; an implied write to this MRF will be
+    * performed by VS_OPCODE_URB_WRITE.
+    */
+   (void) mrf;
+}
+
+
+vec4_instruction *
+vec4_vs_visitor::emit_urb_write_opcode(bool complete)
+{
+   /* For VS, the URB writes end the thread. */
+   if (complete) {
+//      if (INTEL_DEBUG & DEBUG_SHADER_TIME)
+//         emit_shader_time_end();
+   }
+
+   vec4_instruction *inst = emit(VS_OPCODE_URB_WRITE);
+   inst->urb_write_flags = complete ?
+      BRW_URB_WRITE_EOT_COMPLETE : BRW_URB_WRITE_NO_FLAGS;
+
+   return inst;
+}
+
+
+void
+vec4_vs_visitor::emit_thread_end()
+{
+   /* For VS, we always end the thread by emitting a single vertex.
+    * emit_urb_write_opcode() will take care of setting the eot flag on the
+    * SEND instruction.
+    */
+   emit_vertex();
+}
+
+
+vec4_vs_visitor::vec4_vs_visitor(struct brw_context *brw,
+                                 struct brw_vs_compile *vs_compile,
+                                 struct brw_vs_prog_data *vs_prog_data,
+                                 struct gl_shader_program *prog,
+                                 void *mem_ctx)
+   : vec4_visitor(brw, &vs_compile->base, &vs_compile->vp->program.Base,
+                  &vs_compile->key.base, &vs_prog_data->base, prog,
+                  MESA_SHADER_VERTEX,
+                  mem_ctx, INTEL_DEBUG & DEBUG_VS, false /* no_spills */,
+                  ST_VS, ST_VS_WRITTEN, ST_VS_RESET),
+     vs_compile(vs_compile),
+     vs_prog_data(vs_prog_data)
+{
+}
+
+
+} /* namespace brw */
diff --git a/icd/intel/compiler/pipeline/brw_vs.c b/icd/intel/compiler/pipeline/brw_vs.c
new file mode 100644
index 0000000..e86ac85
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vs.c
@@ -0,0 +1,364 @@
+/*
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to
+ develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+
+#include "main/compiler.h"
+#include "brw_context.h"
+#include "brw_vs.h"
+//#include "brw_util.h"  // LunarG: Remove
+//#include "brw_state.h" // LunarG: Remove
+#include "program/prog_print.h"
+#include "program/prog_parameter.h"
+
+#include "glsl/ralloc.h"
+
+#include "icd-utils.h"  // LunarG: ADD
+
+static inline void assign_vue_slot(struct brw_vue_map *vue_map,
+                                   int varying)
+{
+   /* Make sure this varying hasn't been assigned a slot already */
+   assert (vue_map->varying_to_slot[varying] == -1);
+
+   vue_map->varying_to_slot[varying] = vue_map->num_slots;
+   vue_map->slot_to_varying[vue_map->num_slots++] = varying;
+}
+
+/**
+ * Compute the VUE map for vertex shader program.
+ */
+void
+brw_compute_vue_map(struct brw_context *brw, struct brw_vue_map *vue_map,
+                    GLbitfield64 slots_valid)
+{
+   vue_map->slots_valid = slots_valid;
+   int i;
+
+   /* gl_Layer and gl_ViewportIndex don't get their own varying slots -- they
+    * are stored in the first VUE slot (VARYING_SLOT_PSIZ).
+    */
+   slots_valid &= ~(VARYING_BIT_LAYER | VARYING_BIT_VIEWPORT);
+
+   /* Make sure that the values we store in vue_map->varying_to_slot and
+    * vue_map->slot_to_varying won't overflow the signed chars that are used
+    * to store them.  Note that since vue_map->slot_to_varying sometimes holds
+    * values equal to BRW_VARYING_SLOT_COUNT, we need to ensure that
+    * BRW_VARYING_SLOT_COUNT is <= 127, not 128.
+    */
+   STATIC_ASSERT(BRW_VARYING_SLOT_COUNT <= 127);
+
+   vue_map->num_slots = 0;
+   for (i = 0; i < BRW_VARYING_SLOT_COUNT; ++i) {
+      vue_map->varying_to_slot[i] = -1;
+      vue_map->slot_to_varying[i] = BRW_VARYING_SLOT_COUNT;
+   }
+
+   /* VUE header: format depends on chip generation and whether clipping is
+    * enabled.
+    */
+   if (brw->gen < 6) {
+      /* There are 8 dwords in VUE header pre-Ironlake:
+       * dword 0-3 is indices, point width, clip flags.
+       * dword 4-7 is ndc position
+       * dword 8-11 is the first vertex data.
+       *
+       * On Ironlake the VUE header is nominally 20 dwords, but the hardware
+       * will accept the same header layout as Gen4 (and should be a bit faster).
+       */
+      assign_vue_slot(vue_map, VARYING_SLOT_PSIZ);
+      assign_vue_slot(vue_map, BRW_VARYING_SLOT_NDC);
+      assign_vue_slot(vue_map, VARYING_SLOT_POS);
+   } else {
+      /* There are 8 or 16 DWs (D0-D15) in VUE header on Sandybridge:
+       * dword 0-3 of the header is indices, point width, clip flags.
+       * dword 4-7 is the 4D space position
+       * dword 8-15 of the vertex header is the user clip distance if
+       * enabled.
+       * dword 8-11 or 16-19 is the first vertex element data we fill.
+       */
+      assign_vue_slot(vue_map, VARYING_SLOT_PSIZ);
+      assign_vue_slot(vue_map, VARYING_SLOT_POS);
+      if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST0))
+         assign_vue_slot(vue_map, VARYING_SLOT_CLIP_DIST0);
+      if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST1))
+         assign_vue_slot(vue_map, VARYING_SLOT_CLIP_DIST1);
+
+      /* front and back colors need to be consecutive so that we can use
+       * ATTRIBUTE_SWIZZLE_INPUTATTR_FACING to swizzle them when doing
+       * two-sided color.
+       */
+      if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_COL0))
+         assign_vue_slot(vue_map, VARYING_SLOT_COL0);
+      if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_BFC0))
+         assign_vue_slot(vue_map, VARYING_SLOT_BFC0);
+      if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_COL1))
+         assign_vue_slot(vue_map, VARYING_SLOT_COL1);
+      if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_BFC1))
+         assign_vue_slot(vue_map, VARYING_SLOT_BFC1);
+   }
+
+   /* The hardware doesn't care about the rest of the vertex outputs, so just
+    * assign them contiguously.  Don't reassign outputs that already have a
+    * slot.
+    *
+    * We generally don't need to assign a slot for VARYING_SLOT_CLIP_VERTEX,
+    * since it's encoded as the clip distances by emit_clip_distances().
+    * However, it may be output by transform feedback, and we'd rather not
+    * recompute state when TF changes, so we just always include it.
+    */
+   for (int i = 0; i < VARYING_SLOT_MAX; ++i) {
+      if ((slots_valid & BITFIELD64_BIT(i)) &&
+          vue_map->varying_to_slot[i] == -1) {
+         assign_vue_slot(vue_map, i);
+      }
+   }
+}
+
+
+// LunarG: TODO - How to handle user clip planes?
+/**
+ * Decide which set of clip planes should be used when clipping via
+ * gl_Position or gl_ClipVertex.
+ */
+//gl_clip_plane *brw_select_clip_planes(struct gl_context *ctx)
+//{
+//   if (ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX]) {
+//      /* There is currently a GLSL vertex shader, so clip according to GLSL
+//       * rules, which means compare gl_ClipVertex (or gl_Position, if
+//       * gl_ClipVertex wasn't assigned) against the eye-coordinate clip planes
+//       * that were stored in EyeUserPlane at the time the clip planes were
+//       * specified.
+//       */
+//      return ctx->Transform.EyeUserPlane;
+//   } else {
+//      /* Either we are using fixed function or an ARB vertex program.  In
+//       * either case the clip planes are going to be compared against
+//       * gl_Position (which is in clip coordinates) so we have to clip using
+//       * _ClipUserPlane, which was transformed into clip coordinates by Mesa
+//       * core.
+//       */
+//      return ctx->Transform._ClipUserPlane;
+//   }
+//}
+
+
+bool
+brw_vs_prog_data_compare(const void *in_a, const void *in_b)
+{
+   const struct brw_vs_prog_data *a = in_a;
+   const struct brw_vs_prog_data *b = in_b;
+
+   /* Compare the base structure. */
+   if (!brw_stage_prog_data_compare(&a->base.base, &b->base.base))
+      return false;
+
+   /* Compare the rest of the struct. */
+   const unsigned offset = sizeof(struct brw_stage_prog_data);
+   if (memcmp(((char *) a) + offset, ((char *) b) + offset,
+              sizeof(struct brw_vs_prog_data) - offset)) {
+      return false;
+   }
+
+   return true;
+}
+
+static void
+brw_vs_init_compile(struct brw_context *brw,
+	            struct gl_shader_program *prog,
+	            struct brw_vertex_program *vp,
+	            const struct brw_vs_prog_key *key,
+	            struct brw_vs_compile *c)
+{
+   memset(c, 0, sizeof(*c));
+
+   memcpy(&c->key, key, sizeof(*key));
+   c->vp = vp;
+   c->base.shader_prog = prog;
+   c->base.mem_ctx = ralloc_context(NULL);
+}
+
+static bool
+brw_vs_do_compile(struct brw_context *brw,
+	          struct brw_vs_compile *c)
+{
+   struct brw_stage_prog_data *stage_prog_data = &c->prog_data.base.base;
+   struct gl_shader *vs = NULL;
+   int i;
+
+   if (c->base.shader_prog)
+      vs = c->base.shader_prog->_LinkedShaders[MESA_SHADER_VERTEX];
+
+   /* Allocate the references to the uniforms that will end up in the
+    * prog_data associated with the compiled program, and which will be freed
+    * by the state cache.
+    */
+   int param_count;
+   if (vs) {
+      /* We add padding around uniform values below vec4 size, with the worst
+       * case being a float value that gets blown up to a vec4, so be
+       * conservative here.
+       */
+      param_count = vs->num_uniform_components * 4;
+   } else {
+      param_count = c->vp->program.Base.Parameters->NumParameters * 4;
+   }
+   /* vec4_visitor::setup_uniform_clipplane_values() also uploads user clip
+    * planes as uniforms.
+    */
+   param_count += c->key.base.nr_userclip_plane_consts * 4;
+
+   stage_prog_data->param = rzalloc_array(NULL, const float *, param_count);
+   stage_prog_data->pull_param = rzalloc_array(NULL, const float *, param_count);
+
+   /* We set nr_params here NOT to the size of the param and pull_param
+    * arrays, but to the number of uniform components vec4_visitor needs.
+    * vec4_visitor::setup_uniforms() will set it back to a proper value.
+    */
+   stage_prog_data->nr_params = ALIGN(param_count, 4) / 4;
+   if (vs) {
+      stage_prog_data->nr_params += vs->num_samplers;
+   }
+
+   GLbitfield64 outputs_written = c->vp->program.Base.OutputsWritten;
+   c->prog_data.inputs_read = c->vp->program.Base.InputsRead;
+
+   if (c->key.copy_edgeflag) {
+      outputs_written |= BITFIELD64_BIT(VARYING_SLOT_EDGE);
+      c->prog_data.inputs_read |= VERT_BIT_EDGEFLAG;
+   }
+
+   if (brw->gen < 6) {
+      /* Put dummy slots into the VUE for the SF to put the replaced
+       * point sprite coords in.  We shouldn't need these dummy slots,
+       * which take up precious URB space, but dropping them would mean that
+       * the SF doesn't get nicely aligned pairs of input coords mapped to
+       * output coords, which would be a pain to handle.
+       */
+      for (i = 0; i < 8; i++) {
+         if (c->key.point_coord_replace & (1 << i))
+            outputs_written |= BITFIELD64_BIT(VARYING_SLOT_TEX0 + i);
+      }
+
+      /* if back colors are written, allocate slots for front colors too */
+      if (outputs_written & BITFIELD64_BIT(VARYING_SLOT_BFC0))
+         outputs_written |= BITFIELD64_BIT(VARYING_SLOT_COL0);
+      if (outputs_written & BITFIELD64_BIT(VARYING_SLOT_BFC1))
+         outputs_written |= BITFIELD64_BIT(VARYING_SLOT_COL1);
+   }
+
+   /* In order for legacy clipping to work, we need to populate the clip
+    * distance varying slots whenever clipping is enabled, even if the vertex
+    * shader doesn't write to gl_ClipDistance.
+    */
+   if (c->key.base.userclip_active) {
+      outputs_written |= BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST0);
+      outputs_written |= BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST1);
+   }
+
+   brw_compute_vue_map(brw, &c->prog_data.base.vue_map, outputs_written);
+
+   if (0) {
+      _mesa_fprint_program_opt(stderr, &c->vp->program.Base, PROG_PRINT_DEBUG,
+			       true);
+   }
+
+   /* Emit GEN4 code.
+    */
+   c->base.program = brw_vs_emit(brw, c->base.shader_prog, c,
+         &c->prog_data, c->base.mem_ctx, &c->base.program_size);
+   if (c->base.program == NULL)
+      return false;
+
+   if (c->base.last_scratch) {
+      c->prog_data.base.total_scratch
+         = brw_get_scratch_size(c->base.last_scratch*REG_SIZE);
+   }
+
+   return true;
+}
+
+static void
+brw_vs_clear_compile(struct brw_context *brw,
+	             struct brw_vs_compile *c)
+{
+   ralloc_free(c->base.mem_ctx);
+}
+
+// LunarG : TODO - user clip planes?
+//void
+//brw_setup_vec4_key_clip_info(struct brw_context *brw,
+//                             struct brw_vec4_prog_key *key,
+//                             bool program_uses_clip_distance)
+//{
+//   struct gl_context *ctx = &brw->ctx;
+
+//   key->userclip_active = (ctx->Transform.ClipPlanesEnabled != 0);
+//   if (key->userclip_active && !program_uses_clip_distance) {
+//      key->nr_userclip_plane_consts
+//         = _mesa_logbase2(ctx->Transform.ClipPlanesEnabled) + 1;
+//   }
+//}
+
+bool
+brw_vs_precompile(struct gl_context *ctx, struct gl_shader_program *prog)
+{
+   struct brw_context *brw = brw_context(ctx);
+   struct brw_vs_prog_key key;
+
+   if (!prog->_LinkedShaders[MESA_SHADER_VERTEX])
+      return true;
+
+   struct gl_vertex_program *vp = (struct gl_vertex_program *)
+      prog->_LinkedShaders[MESA_SHADER_VERTEX]->Program;
+   struct brw_vertex_program *bvp = brw_vertex_program(vp);
+
+   memset(&key, 0, sizeof(key));
+
+   brw_vec4_setup_prog_key_for_precompile(ctx, &key.base, bvp->id, &vp->Base);
+
+   // In VK, user clipping is triggered solely from the shader.
+   key.base.userclip_active = vp->Base.UsesClipDistanceOut;
+
+   struct brw_vs_compile c;
+
+   brw_vs_init_compile(brw, prog, bvp, &key, &c);
+   if (!brw_vs_do_compile(brw, &c)) {
+      brw_vs_clear_compile(brw, &c);
+      return false;
+   }
+
+   // Rather than defer or upload to cache, hand off
+   // the compile results back to the brw_context
+   brw_shader_program_save_vs_compile(brw->shader_prog, &c);
+
+   return true;
+}
diff --git a/icd/intel/compiler/pipeline/brw_vs.h b/icd/intel/compiler/pipeline/brw_vs.h
new file mode 100644
index 0000000..62fbb56
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_vs.h
@@ -0,0 +1,140 @@
+/*
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to
+ develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+
+#ifndef BRW_VS_H
+#define BRW_VS_H
+
+
+#include "brw_context.h"
+#include "brw_eu.h"
+#include "brw_vec4.h"
+#include "program/program.h"
+
+/**
+ * The VF can't natively handle certain types of attributes, such as GL_FIXED
+ * or most 10_10_10_2 types.  These flags enable various VS workarounds to
+ * "fix" attributes at the beginning of shaders.
+ */
+#define BRW_ATTRIB_WA_COMPONENT_MASK    7  /* mask for GL_FIXED scale channel count */
+#define BRW_ATTRIB_WA_NORMALIZE     8   /* normalize in shader */
+#define BRW_ATTRIB_WA_BGRA          16  /* swap r/b channels in shader */
+#define BRW_ATTRIB_WA_SIGN          32  /* interpret as signed in shader */
+#define BRW_ATTRIB_WA_SCALE         64  /* interpret as scaled in shader */
+
+struct brw_vs_prog_key {
+   struct brw_vec4_prog_key base;
+
+   /*
+    * Per-attribute workaround flags
+    */
+   uint8_t gl_attrib_wa_flags[VERT_ATTRIB_MAX];
+
+   GLuint copy_edgeflag:1;
+
+   /**
+    * For pre-Gen6 hardware, a bitfield indicating which texture coordinates
+    * are going to be replaced with point coordinates (as a consequence of a
+    * call to glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE)).  Because
+    * our SF thread requires exact matching between VS outputs and FS inputs,
+    * these texture coordinates will need to be unconditionally included in
+    * the VUE, even if they aren't written by the vertex shader.
+    */
+   GLuint point_coord_replace:8;
+};
+
+
+struct brw_vs_compile {
+   struct brw_vec4_compile base;
+   struct brw_vs_prog_key key;
+   struct brw_vs_prog_data prog_data;
+
+   struct brw_vertex_program *vp;
+};
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+const unsigned *brw_vs_emit(struct brw_context *brw,
+                            struct gl_shader_program *prog,
+                            struct brw_vs_compile *c,
+                            struct brw_vs_prog_data *prog_data,
+                            void *mem_ctx,
+                            unsigned *program_size);
+bool brw_vs_precompile(struct gl_context *ctx, struct gl_shader_program *prog);
+void brw_vs_debug_recompile(struct brw_context *brw,
+                            struct gl_shader_program *prog,
+                            const struct brw_vs_prog_key *key);
+bool brw_vs_prog_data_compare(const void *a, const void *b);
+
+#ifdef __cplusplus
+} /* extern "C" */
+
+
+namespace brw {
+
+class vec4_vs_visitor : public vec4_visitor
+{
+public:
+   vec4_vs_visitor(struct brw_context *brw,
+                   struct brw_vs_compile *vs_compile,
+                   struct brw_vs_prog_data *vs_prog_data,
+                   struct gl_shader_program *prog,
+                   void *mem_ctx);
+
+protected:
+   virtual dst_reg *make_reg_for_system_value(ir_variable *ir);
+   virtual void setup_payload();
+   virtual void emit_prolog();
+   //virtual void emit_program_code();
+   virtual void emit_thread_end();
+   virtual void emit_urb_write_header(int mrf);
+   virtual vec4_instruction *emit_urb_write_opcode(bool complete);
+
+private:
+   int setup_attributes(int payload_reg);
+   void setup_vp_regs();
+   dst_reg get_vp_dst_reg(const prog_dst_register &dst);
+   src_reg get_vp_src_reg(const prog_src_register &src);
+
+   struct brw_vs_compile * const vs_compile;
+   struct brw_vs_prog_data * const vs_prog_data;
+   src_reg *vp_temp_regs;
+   src_reg vp_addr_reg;
+};
+
+} /* namespace brw */
+
+
+#endif
+
+#endif
diff --git a/icd/intel/compiler/pipeline/brw_wm.c b/icd/intel/compiler/pipeline/brw_wm.c
new file mode 100644
index 0000000..f6a47d4
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_wm.c
@@ -0,0 +1,469 @@
+/*
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to
+ develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+#include "brw_context.h"
+#include "brw_shader.h"
+#include "brw_wm.h"
+//#include "brw_state.h"  // LunarG: Remove
+#include "main/enums.h"
+#include "main/formats.h"
+//#include "main/fbobject.h" // LunarG: Remove
+//#include "main/samplerobj.h" // LunarG: Remove
+#include "program/prog_parameter.h"
+#include "program/program.h"
+//#include "intel_mipmap_tree.h" // LunarG: Remove
+
+#include "glsl/ralloc.h"
+
+/**
+ * Return a bitfield where bit n is set if barycentric interpolation mode n
+ * (see enum brw_wm_barycentric_interp_mode) is needed by the fragment shader.
+ */
+static unsigned
+brw_compute_barycentric_interp_modes(struct brw_context *brw,
+                                     bool shade_model_flat,
+                                     bool persample_shading,
+                                     const struct gl_fragment_program *fprog)
+{
+   unsigned barycentric_interp_modes = 0;
+   int attr;
+
+   /* Loop through all fragment shader inputs to figure out what interpolation
+    * modes are in use, and set the appropriate bits in
+    * barycentric_interp_modes.
+    */
+   for (attr = 0; attr < VARYING_SLOT_MAX; ++attr) {
+      enum glsl_interp_qualifier interp_qualifier =
+         fprog->InterpQualifier[attr];
+      bool is_centroid = (fprog->IsCentroid & BITFIELD64_BIT(attr)) &&
+         !persample_shading;
+      bool is_sample = (fprog->IsSample & BITFIELD64_BIT(attr)) ||
+         persample_shading;
+      bool is_gl_Color = attr == VARYING_SLOT_COL0 || attr == VARYING_SLOT_COL1;
+
+      /* Ignore unused inputs. */
+      if (!(fprog->Base.InputsRead & BITFIELD64_BIT(attr)))
+         continue;
+
+      /* Ignore WPOS and FACE, because they don't require interpolation. */
+      if (attr == VARYING_SLOT_POS || attr == VARYING_SLOT_FACE)
+         continue;
+
+      /* Determine the set (or sets) of barycentric coordinates needed to
+       * interpolate this variable.  Note that when
+       * brw->needs_unlit_centroid_workaround is set, centroid interpolation
+       * uses PIXEL interpolation for unlit pixels and CENTROID interpolation
+       * for lit pixels, so we need both sets of barycentric coordinates.
+       */
+      if (interp_qualifier == INTERP_QUALIFIER_NOPERSPECTIVE) {
+         if (is_centroid) {
+            barycentric_interp_modes |=
+               1 << BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC;
+         } else if (is_sample) {
+            barycentric_interp_modes |=
+               1 << BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC;
+         }
+         if ((!is_centroid && !is_sample) ||
+             brw->needs_unlit_centroid_workaround) {
+            barycentric_interp_modes |=
+               1 << BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC;
+         }
+      } else if (interp_qualifier == INTERP_QUALIFIER_SMOOTH ||
+                 (!(shade_model_flat && is_gl_Color) &&
+                  interp_qualifier == INTERP_QUALIFIER_NONE)) {
+         if (is_centroid) {
+            barycentric_interp_modes |=
+               1 << BRW_WM_PERSPECTIVE_CENTROID_BARYCENTRIC;
+         } else if (is_sample) {
+            barycentric_interp_modes |=
+               1 << BRW_WM_PERSPECTIVE_SAMPLE_BARYCENTRIC;
+         }
+         if ((!is_centroid && !is_sample) ||
+             brw->needs_unlit_centroid_workaround) {
+            barycentric_interp_modes |=
+               1 << BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC;
+         }
+      }
+   }
+
+   return barycentric_interp_modes;
+}
+
+bool
+brw_wm_prog_data_compare(const void *in_a, const void *in_b)
+{
+   const struct brw_wm_prog_data *a = in_a;
+   const struct brw_wm_prog_data *b = in_b;
+
+   /* Compare the base structure. */
+   if (!brw_stage_prog_data_compare(&a->base, &b->base))
+      return false;
+
+   /* Compare the rest of the structure. */
+   const unsigned offset = sizeof(struct brw_stage_prog_data);
+   if (memcmp(((char *) a) + offset, ((char *) b) + offset,
+              sizeof(struct brw_wm_prog_data) - offset))
+      return false;
+
+   return true;
+}
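+
+/* Annotation (not in the original sources): the raw memcmp above assumes
+ * that everything in brw_wm_prog_data past the embedded brw_stage_prog_data
+ * is plain data with no pointers, and that padding bytes are zeroed (the
+ * structure lives inside an rzalloc'd brw_wm_compile), since memcmp also
+ * compares padding.
+ */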
+
+struct brw_wm_compile *
+brw_wm_init_compile(struct brw_context *brw,
+		    struct gl_shader_program *prog,
+		    struct brw_fragment_program *fp,
+		    const struct brw_wm_prog_key *key)
+{
+   struct brw_wm_compile *c;
+
+   c = rzalloc(NULL, struct brw_wm_compile);
+   if (!c)
+      return NULL;
+
+   c->shader_prog = prog;
+   c->fp = fp;
+   c->key = *key;
+
+   return c;
+}
+
+bool
+brw_wm_do_compile(struct brw_context *brw,
+                  struct brw_wm_compile *c)
+{
+   struct gl_context *ctx = &brw->ctx;
+   struct gl_shader *fs = NULL;
+
+   if (c->shader_prog)
+      fs = c->shader_prog->_LinkedShaders[MESA_SHADER_FRAGMENT];
+
+   /* Allocate the references to the uniforms that will end up in the
+    * prog_data associated with the compiled program, and which will be freed
+    * by the state cache.
+    */
+   int param_count;
+   if (fs) {
+      param_count = fs->num_uniform_components;
+   } else {
+      param_count = c->fp->program.Base.Parameters->NumParameters * 4;
+   }
+   /* The backend also sometimes adds params for texture size. */
+   param_count += 2 * ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits;
+   c->prog_data.base.param = rzalloc_array(NULL, const float *, param_count);
+   c->prog_data.base.pull_param =
+      rzalloc_array(NULL, const float *, param_count);
+   c->prog_data.base.nr_params = param_count;
+
+   c->prog_data.barycentric_interp_modes =
+      brw_compute_barycentric_interp_modes(brw, c->key.flat_shade,
+                                           c->key.persample_shading,
+                                           &c->fp->program);
+
+   c->program = brw_wm_fs_emit(brw, c,
+         &c->fp->program, c->shader_prog, &c->program_size);
+   if (c->program == NULL)
+      return false;
+
+   /* Scratch space is used for register spilling */
+   if (c->last_scratch) {
+      perf_debug("Fragment shader triggered register spilling.  "
+                 "Try reducing the number of live scalar values to "
+                 "improve performance.\n");
+
+      c->prog_data.total_scratch = brw_get_scratch_size(c->last_scratch);
+   }
+
+   if (unlikely(INTEL_DEBUG & DEBUG_WM))
+      fprintf(stderr, "\n");
+
+   return true;
+}
+
+void
+brw_wm_clear_compile(struct brw_context *brw,
+                     struct brw_wm_compile *c)
+{
+   ralloc_free(c);
+}
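+
+/* Typical lifecycle sketch (annotation, not part of the original code):
+ *
+ *    struct brw_wm_compile *c = brw_wm_init_compile(brw, prog, fp, &key);
+ *    if (c && brw_wm_do_compile(brw, c))
+ *       brw_wm_upload_compile(brw, c);
+ *    brw_wm_clear_compile(brw, c);
+ *
+ * brw_wm_upload_compile() is declared in brw_wm.h; brw_wm_clear_compile()
+ * releases everything hanging off the compile via ralloc_free().
+ */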
+
+//static uint8_t
+//gen6_gather_workaround(GLenum internalformat)
+//{
+//   switch (internalformat) {
+//      case GL_R8I: return WA_SIGN | WA_8BIT;
+//      case GL_R8UI: return WA_8BIT;
+//      case GL_R16I: return WA_SIGN | WA_16BIT;
+//      case GL_R16UI: return WA_16BIT;
+//      /* note that even though GL_R32I and GL_R32UI have format overrides
+//       * in the surface state, there is no shader w/a required */
+//      default: return 0;
+//   }
+//}
+
+// LunarG : TODO - bring this back in when mapping descriptor
+//                 sets to NOS
+//void
+//brw_populate_sampler_prog_key_data(struct gl_context *ctx,
+//				   const struct gl_program *prog,
+//                                   unsigned sampler_count,
+//				   struct brw_sampler_prog_key_data *key)
+//{
+//   struct brw_context *brw = brw_context(ctx);
+
+//   for (int s = 0; s < sampler_count; s++) {
+//      key->swizzles[s] = SWIZZLE_NOOP;
+
+//      if (!(prog->SamplersUsed & (1 << s)))
+//	 continue;
+
+//      int unit_id = prog->SamplerUnits[s];
+//      const struct gl_texture_unit *unit = &ctx->Texture.Unit[unit_id];
+
+//      if (unit->_Current && unit->_Current->Target != GL_TEXTURE_BUFFER) {
+//	 const struct gl_texture_object *t = unit->_Current;
+//	 const struct gl_texture_image *img = t->Image[0][t->BaseLevel];
+//	 struct gl_sampler_object *sampler = _mesa_get_samplerobj(ctx, unit_id);
+
+//         const bool alpha_depth = t->DepthMode == GL_ALPHA &&
+//            (img->_BaseFormat == GL_DEPTH_COMPONENT ||
+//             img->_BaseFormat == GL_DEPTH_STENCIL);
+
+//         /* Haswell handles texture swizzling as surface format overrides
+//          * (except for GL_ALPHA); all other platforms need MOVs in the shader.
+//          */
+//         if (alpha_depth || (brw->gen < 8 && !brw->is_haswell))
+//            key->swizzles[s] = brw_get_texture_swizzle(ctx, t);
+
+//	 if (brw->gen < 8 &&
+//             sampler->MinFilter != GL_NEAREST &&
+//	     sampler->MagFilter != GL_NEAREST) {
+//	    if (sampler->WrapS == GL_CLAMP)
+//	       key->gl_clamp_mask[0] |= 1 << s;
+//	    if (sampler->WrapT == GL_CLAMP)
+//	       key->gl_clamp_mask[1] |= 1 << s;
+//	    if (sampler->WrapR == GL_CLAMP)
+//	       key->gl_clamp_mask[2] |= 1 << s;
+//	 }
+
+//         /* gather4's channel select for green from RG32F is broken;
+//          * requires a shader w/a on IVB; fixable with just SCS on HSW. */
+//         if (brw->gen == 7 && !brw->is_haswell && prog->UsesGather) {
+//            if (img->InternalFormat == GL_RG32F)
+//               key->gather_channel_quirk_mask |= 1 << s;
+//         }
+
+//         /* Gen6's gather4 is broken for UINT/SINT; we treat them as
+//          * UNORM/FLOAT instead and fix it in the shader.
+//          */
+//         if (brw->gen == 6 && prog->UsesGather) {
+//            key->gen6_gather_wa[s] = gen6_gather_workaround(img->InternalFormat);
+//         }
+
+//         /* If this is a multisample sampler, and uses the CMS MSAA layout,
+//          * then we need to emit slightly different code to first sample the
+//          * MCS surface.
+//          */
+//         struct intel_texture_object *intel_tex =
+//            intel_texture_object((struct gl_texture_object *)t);
+
+//         if (brw->gen >= 7 &&
+//             intel_tex->mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS) {
+//            key->compressed_multisample_layout_mask |= 1 << s;
+//         }
+//      }
+//   }
+//}
+
+//static void brw_wm_populate_key( struct brw_context *brw,
+//				 struct brw_wm_prog_key *key )
+//{
+//   struct gl_context *ctx = &brw->ctx;
+//   /* BRW_NEW_FRAGMENT_PROGRAM */
+//   const struct brw_fragment_program *fp =
+//      (struct brw_fragment_program *)brw->fragment_program;
+//   const struct gl_program *prog = (struct gl_program *) brw->fragment_program;
+//   GLuint lookup = 0;
+//   GLuint line_aa;
+//   bool program_uses_dfdy = fp->program.UsesDFdy;
+//   bool multisample_fbo = ctx->DrawBuffer->Visual.samples > 1;
+
+//   memset(key, 0, sizeof(*key));
+
+//   /* Build the index for table lookup
+//    */
+//   if (brw->gen < 6) {
+//      /* _NEW_COLOR */
+//      if (fp->program.UsesKill || ctx->Color.AlphaEnabled)
+//	 lookup |= IZ_PS_KILL_ALPHATEST_BIT;
+
+//      if (fp->program.Base.OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_DEPTH))
+//	 lookup |= IZ_PS_COMPUTES_DEPTH_BIT;
+
+//      /* _NEW_DEPTH */
+//      if (ctx->Depth.Test)
+//	 lookup |= IZ_DEPTH_TEST_ENABLE_BIT;
+
+//      if (ctx->Depth.Test && ctx->Depth.Mask) /* ?? */
+//	 lookup |= IZ_DEPTH_WRITE_ENABLE_BIT;
+
+//      /* _NEW_STENCIL | _NEW_BUFFERS */
+//      if (ctx->Stencil._Enabled) {
+//	 lookup |= IZ_STENCIL_TEST_ENABLE_BIT;
+
+//	 if (ctx->Stencil.WriteMask[0] ||
+//	     ctx->Stencil.WriteMask[ctx->Stencil._BackFace])
+//	    lookup |= IZ_STENCIL_WRITE_ENABLE_BIT;
+//      }
+//      key->iz_lookup = lookup;
+//   }
+
+//   line_aa = AA_NEVER;
+
+//   /* _NEW_LINE, _NEW_POLYGON, BRW_NEW_REDUCED_PRIMITIVE */
+//   if (ctx->Line.SmoothFlag) {
+//      if (brw->reduced_primitive == GL_LINES) {
+//	 line_aa = AA_ALWAYS;
+//      }
+//      else if (brw->reduced_primitive == GL_TRIANGLES) {
+//	 if (ctx->Polygon.FrontMode == GL_LINE) {
+//	    line_aa = AA_SOMETIMES;
+
+//	    if (ctx->Polygon.BackMode == GL_LINE ||
+//		(ctx->Polygon.CullFlag &&
+//		 ctx->Polygon.CullFaceMode == GL_BACK))
+//	       line_aa = AA_ALWAYS;
+//	 }
+//	 else if (ctx->Polygon.BackMode == GL_LINE) {
+//	    line_aa = AA_SOMETIMES;
+
+//	    if ((ctx->Polygon.CullFlag &&
+//		 ctx->Polygon.CullFaceMode == GL_FRONT))
+//	       line_aa = AA_ALWAYS;
+//	 }
+//      }
+//   }
+
+//   key->line_aa = line_aa;
+
+//   /* _NEW_HINT */
+//   if (brw->disable_derivative_optimization) {
+//      key->high_quality_derivatives =
+//         ctx->Hint.FragmentShaderDerivative != GL_FASTEST;
+//   } else {
+//      key->high_quality_derivatives =
+//         ctx->Hint.FragmentShaderDerivative == GL_NICEST;
+//   }
+
+//   if (brw->gen < 6)
+//      key->stats_wm = brw->stats_wm;
+
+//   /* _NEW_LIGHT */
+//   key->flat_shade = (ctx->Light.ShadeModel == GL_FLAT);
+
+//   /* _NEW_FRAG_CLAMP | _NEW_BUFFERS */
+//   key->clamp_fragment_color = ctx->Color._ClampFragmentColor;
+
+//   /* _NEW_TEXTURE */
+//   brw_populate_sampler_prog_key_data(ctx, prog, brw->wm.base.sampler_count,
+//                                      &key->tex);
+
+//   /* _NEW_BUFFERS */
+//   /*
+//    * Include the draw buffer origin and height so that we can calculate
+//    * fragment position values relative to the bottom left of the drawable,
+//    * from the incoming screen origin relative position we get as part of our
+//    * payload.
+//    *
+//    * This is only needed for the WM_WPOSXY opcode when the fragment program
+//    * uses the gl_FragCoord input.
+//    *
+//    * We could avoid recompiling by including this as a constant referenced by
+//    * our program, but if we were to do that it would also be nice to handle
+//    * getting that constant updated at batchbuffer submit time (when we
+//    * hold the lock and know where the buffer really is) rather than at emit
+//    * time when we don't hold the lock and are just guessing.  We could also
+//    * just avoid using this as key data if the program doesn't use
+//    * fragment.position.
+//    *
+//    * For DRI2 the origin_x/y will always be (0,0) but we still need the
+//    * drawable height in order to invert the Y axis.
+//    */
+//   if (fp->program.Base.InputsRead & VARYING_BIT_POS) {
+//      key->drawable_height = ctx->DrawBuffer->Height;
+//   }
+
+//   if ((fp->program.Base.InputsRead & VARYING_BIT_POS) || program_uses_dfdy) {
+//      key->render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer);
+//   }
+
+//   /* _NEW_BUFFERS */
+//   key->nr_color_regions = ctx->DrawBuffer->_NumColorDrawBuffers;
+
+//   /* _NEW_MULTISAMPLE, _NEW_COLOR, _NEW_BUFFERS */
+//   key->replicate_alpha = ctx->DrawBuffer->_NumColorDrawBuffers > 1 &&
+//      (ctx->Multisample.SampleAlphaToCoverage || ctx->Color.AlphaEnabled);
+
+//   /* _NEW_BUFFERS _NEW_MULTISAMPLE */
+//   /* Ignore sample qualifier while computing this flag. */
+//   key->persample_shading =
+//      _mesa_get_min_invocations_per_fragment(ctx, &fp->program, true) > 1;
+//   if (key->persample_shading)
+//      key->persample_2x = ctx->DrawBuffer->Visual.samples == 2;
+
+//   key->compute_pos_offset =
+//      _mesa_get_min_invocations_per_fragment(ctx, &fp->program, false) > 1 &&
+//      fp->program.Base.SystemValuesRead & SYSTEM_BIT_SAMPLE_POS;
+
+//   key->compute_sample_id =
+//      multisample_fbo &&
+//      ctx->Multisample.Enabled &&
+//      (fp->program.Base.SystemValuesRead & SYSTEM_BIT_SAMPLE_ID);
+
+//   /* BRW_NEW_VUE_MAP_GEOM_OUT */
+//   if (brw->gen < 6 || _mesa_bitcount_64(fp->program.Base.InputsRead &
+//                                         BRW_FS_VARYING_INPUT_MASK) > 16)
+//      key->input_slots_valid = brw->vue_map_geom_out.slots_valid;
+
+
+//   /* _NEW_COLOR | _NEW_BUFFERS */
+//   /* Pre-gen6, the hardware alpha test always used each render
+//    * target's alpha to do alpha test, as opposed to render target 0's alpha
+//    * like GL requires.  Fix that by building the alpha test into the
+//    * shader, and we'll skip enabling the fixed function alpha test.
+//    */
+//   if (brw->gen < 6 && ctx->DrawBuffer->_NumColorDrawBuffers > 1 && ctx->Color.AlphaEnabled) {
+//      key->alpha_test_func = ctx->Color.AlphaFunc;
+//      key->alpha_test_ref = ctx->Color.AlphaRef;
+//   }
+
+//   /* The unique fragment program ID */
+//   key->program_string_id = fp->id;
+//}
diff --git a/icd/intel/compiler/pipeline/brw_wm.h b/icd/intel/compiler/pipeline/brw_wm.h
new file mode 100644
index 0000000..b3102f7
--- /dev/null
+++ b/icd/intel/compiler/pipeline/brw_wm.h
@@ -0,0 +1,150 @@
+/*
+ Copyright (C) Intel Corp.  2006.  All Rights Reserved.
+ Intel funded Tungsten Graphics to
+ develop this 3D driver.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice (including the
+ next paragraph) shall be included in all copies or substantial
+ portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ **********************************************************************/
+ /*
+  * Authors:
+  *   Keith Whitwell <keithw@vmware.com>
+  */
+
+
+#ifndef BRW_WM_H
+#define BRW_WM_H
+
+#include <stdbool.h>
+
+#include "program/prog_instruction.h"
+#include "brw_context.h"
+#include "brw_eu.h"
+#include "brw_program.h"
+
+/* A big lookup table is used to figure out which and how many
+ * additional regs will be inserted before the main payload in the WM
+ * program execution.  These mainly relate to depth and stencil
+ * processing and the early-depth-test optimization.
+ */
+#define IZ_PS_KILL_ALPHATEST_BIT    0x1
+#define IZ_PS_COMPUTES_DEPTH_BIT    0x2
+#define IZ_DEPTH_WRITE_ENABLE_BIT   0x4
+#define IZ_DEPTH_TEST_ENABLE_BIT    0x8
+#define IZ_STENCIL_WRITE_ENABLE_BIT 0x10
+#define IZ_STENCIL_TEST_ENABLE_BIT  0x20
+#define IZ_BIT_MAX                  0x40
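+
+/* Annotation: e.g. a program with depth test and depth write enabled but no
+ * kill/alpha-test or stencil would yield iz_lookup =
+ * IZ_DEPTH_TEST_ENABLE_BIT | IZ_DEPTH_WRITE_ENABLE_BIT = 0x8 | 0x4 = 0xc.
+ */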
+
+#define AA_NEVER     0
+#define AA_SOMETIMES 1
+#define AA_ALWAYS    2
+
+struct brw_wm_prog_key {
+   uint8_t iz_lookup;
+   GLuint stats_wm:1;
+   GLuint flat_shade:1;
+   GLuint persample_shading:1;
+   GLuint persample_2x:1;
+   GLuint nr_color_regions:5;
+   GLuint replicate_alpha:1;
+   GLuint render_to_fbo:1;
+   GLuint clamp_fragment_color:1;
+   GLuint compute_pos_offset:1;
+   GLuint compute_sample_id:1;
+   GLuint line_aa:2;
+   GLuint high_quality_derivatives:1;
+
+   GLushort drawable_height;
+   GLbitfield64 input_slots_valid;
+   GLuint program_string_id:32;
+   GLenum alpha_test_func;          /**< For Gen4/5 MRT alpha test */
+   float alpha_test_ref;
+
+   struct brw_sampler_prog_key_data tex;
+};
+
+struct brw_wm_compile {
+   struct brw_wm_prog_key key;
+   struct brw_wm_prog_data prog_data;
+
+   uint8_t source_depth_reg;
+   uint8_t source_w_reg;
+   uint8_t aa_dest_stencil_reg;
+   uint8_t dest_depth_reg;
+   uint8_t sample_pos_reg;
+   uint8_t sample_mask_reg;
+   uint8_t barycentric_coord_reg[BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT];
+   uint8_t nr_payload_regs;
+   GLuint source_depth_to_render_target:1;
+   GLuint runtime_check_aads_emit:1;
+
+   GLuint last_scratch;
+
+   struct gl_shader_program *shader_prog;
+   struct brw_fragment_program *fp;
+
+   const unsigned *program;
+   unsigned program_size;
+};
+
+struct brw_wm_compile *
+brw_wm_init_compile(struct brw_context *brw,
+		    struct gl_shader_program *prog,
+		    struct brw_fragment_program *fp,
+		    const struct brw_wm_prog_key *key);
+
+bool
+brw_wm_do_compile(struct brw_context *brw,
+                  struct brw_wm_compile *c);
+
+void
+brw_wm_upload_compile(struct brw_context *brw,
+                      const struct brw_wm_compile *c);
+
+void
+brw_wm_clear_compile(struct brw_context *brw,
+                     struct brw_wm_compile *c);
+
+/**
+ * Compile a fragment shader.
+ *
+ * Returns the final assembly and the program's size.
+ */
+const unsigned *brw_wm_fs_emit(struct brw_context *brw,
+                               struct brw_wm_compile *c,
+                               struct gl_fragment_program *fp,
+                               struct gl_shader_program *prog,
+                               unsigned *final_assembly_size);
+
+void brw_notify_link_shader(struct gl_context *ctx, struct gl_shader_program *prog);
+// LunarG : TODO - find a way to call these as they were
+//GLboolean brw_link_shader(struct gl_context *ctx, struct gl_shader_program *prog);
+struct gl_shader *brw_new_shader(struct gl_context *ctx, GLuint name, GLuint type);
+// LunarG : TODO - find a way to call these as they were
+//struct gl_shader_program *brw_new_shader_program(struct gl_context *ctx, GLuint name);
+
+bool brw_color_buffer_write_enabled(struct brw_context *brw);
+void brw_wm_debug_recompile(struct brw_context *brw,
+                            struct gl_shader_program *prog,
+                            const struct brw_wm_prog_key *key);
+bool brw_wm_prog_data_compare(const void *a, const void *b);
+
+#endif
diff --git a/icd/intel/compiler/pipeline/gen8_generator.h b/icd/intel/compiler/pipeline/gen8_generator.h
new file mode 100644
index 0000000..b144809
--- /dev/null
+++ b/icd/intel/compiler/pipeline/gen8_generator.h
@@ -0,0 +1,198 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/**
+ * @file gen8_generator.h
+ *
+ * Code generation for Gen8+ hardware, replacing the brw_eu_emit.c layer.
+ */
+
+#pragma once
+
+extern "C" {
+#include "main/macros.h"
+} /* extern "C" */
+
+#include "gen8_instruction.h"
+
+class gen8_generator {
+public:
+   gen8_generator(struct brw_context *brw,
+                  struct gl_shader_program *shader_prog,
+                  struct gl_program *prog,
+                  void *mem_ctx);
+   ~gen8_generator();
+
+   /**
+    * Instruction emitters.
+    * @{
+    */
+   #define ALU1(OP) \
+   gen8_instruction *OP(struct brw_reg dst, struct brw_reg src);
+   #define ALU2(OP) \
+   gen8_instruction *OP(struct brw_reg d, struct brw_reg, struct brw_reg);
+   #define ALU3(OP) \
+   gen8_instruction *OP(struct brw_reg d, \
+                        struct brw_reg, struct brw_reg, struct brw_reg);
+   ALU2(ADD)
+   ALU2(AND)
+   ALU2(ASR)
+   ALU3(BFE)
+   ALU2(BFI1)
+   ALU3(BFI2)
+   ALU1(F32TO16)
+   ALU1(F16TO32)
+   ALU1(BFREV)
+   ALU1(CBIT)
+   ALU2(ADDC)
+   ALU2(SUBB)
+   ALU2(DP2)
+   ALU2(DP3)
+   ALU2(DP4)
+   ALU2(DPH)
+   ALU1(FBH)
+   ALU1(FBL)
+   ALU1(FRC)
+   ALU2(LINE)
+   ALU3(LRP)
+   ALU2(MAC)
+   ALU2(MACH)
+   ALU3(MAD)
+   ALU2(MUL)
+   ALU1(MOV)
+   ALU1(MOV_RAW)
+   ALU1(NOT)
+   ALU2(OR)
+   ALU2(PLN)
+   ALU1(RNDD)
+   ALU1(RNDE)
+   ALU1(RNDZ)
+   ALU2(SEL)
+   ALU2(SHL)
+   ALU2(SHR)
+   ALU2(XOR)
+   #undef ALU1
+   #undef ALU2
+   #undef ALU3
+
+   gen8_instruction *CMP(struct brw_reg dst, unsigned conditional,
+                         struct brw_reg src0, struct brw_reg src1);
+   gen8_instruction *IF(unsigned predicate);
+   gen8_instruction *ELSE();
+   gen8_instruction *ENDIF();
+   void DO();
+   gen8_instruction *BREAK();
+   gen8_instruction *CONTINUE();
+   gen8_instruction *WHILE();
+
+   gen8_instruction *HALT();
+
+   gen8_instruction *MATH(unsigned math_function,
+                          struct brw_reg dst,
+                          struct brw_reg src0);
+   gen8_instruction *MATH(unsigned math_function,
+                          struct brw_reg dst,
+                          struct brw_reg src0,
+                          struct brw_reg src1);
+   gen8_instruction *NOP();
+   /** @} */
+
+   void disassemble(FILE *out, int start, int end);
+
+protected:
+   gen8_instruction *alu3(unsigned opcode,
+                          struct brw_reg dst,
+                          struct brw_reg src0,
+                          struct brw_reg src1,
+                          struct brw_reg src2);
+
+   gen8_instruction *math(unsigned math_function,
+                          struct brw_reg dst,
+                          struct brw_reg src0);
+
+   gen8_instruction *next_inst(unsigned opcode);
+
+   struct gl_shader_program *shader_prog;
+   struct gl_program *prog;
+
+   struct brw_context *brw;
+   struct intel_context *intel;
+   struct gl_context *ctx;
+
+   gen8_instruction *store;
+   unsigned store_size;
+   unsigned nr_inst;
+   unsigned next_inst_offset;
+
+   /**
+    * Control flow stacks:
+    *
+    * if_stack contains IF and ELSE instructions which must be patched with
+    * the final jump offsets (and popped) once the matching ENDIF is encountered.
+    *
+    * We actually store an array index into the store, rather than pointers
+    * to the instructions.  This is necessary since we may realloc the store.
+    *
+    *  @{
+    */
+   int *if_stack;
+   int if_stack_depth;
+   int if_stack_array_size;
+
+   int *loop_stack;
+   int loop_stack_depth;
+   int loop_stack_array_size;
+
+   int if_depth_in_loop;
+
+   void push_if_stack(gen8_instruction *inst);
+   gen8_instruction *pop_if_stack();
+   /** @} */
+
+   void patch_IF_ELSE(gen8_instruction *if_inst,
+                      gen8_instruction *else_inst,
+                      gen8_instruction *endif_inst);
+
+   unsigned next_ip(unsigned ip) const;
+   unsigned find_next_block_end(unsigned start_ip) const;
+   unsigned find_loop_end(unsigned start) const;
+
+   void patch_jump_targets();
+
+   /**
+    * Default state for new instructions.
+    */
+   struct {
+      unsigned exec_size;
+      unsigned access_mode;
+      unsigned mask_control;
+      unsigned qtr_control;
+      unsigned flag_subreg_nr;
+      unsigned conditional_mod;
+      unsigned predicate;
+      bool predicate_inverse;
+      bool saturate;
+   } default_state;
+
+   void *mem_ctx;
+};
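+
+/* Usage sketch (annotation, not part of the original header): a predicated
+ * select could be emitted with the control-flow helpers roughly as
+ *
+ *    gen8_instruction *if_inst = g->IF(BRW_PREDICATE_NORMAL);
+ *    g->MOV(dst, src0);
+ *    g->ELSE();
+ *    g->MOV(dst, src1);
+ *    g->ENDIF();
+ *
+ * with the IF/ELSE jump offsets patched once ENDIF pops the if_stack.
+ */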
diff --git a/icd/intel/compiler/pipeline/gen8_instruction.h b/icd/intel/compiler/pipeline/gen8_instruction.h
new file mode 100644
index 0000000..89f2364
--- /dev/null
+++ b/icd/intel/compiler/pipeline/gen8_instruction.h
@@ -0,0 +1,422 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/**
+ * @file gen8_instruction.h
+ *
+ * A representation of a Gen8+ EU instruction, with helper methods to get
+ * and set various fields.  This is the actual hardware format.
+ */
+
+#ifndef GEN8_INSTRUCTION_H
+#define GEN8_INSTRUCTION_H
+
+#include <stdio.h>
+#include <stdint.h>
+
+#include "brw_context.h"
+#include "brw_reg.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct gen8_instruction {
+   uint32_t data[4];
+};
+
+static inline unsigned gen8_instruction_bits(struct gen8_instruction *inst,
+                                             unsigned high,
+                                             unsigned low);
+static inline void gen8_instruction_set_bits(struct gen8_instruction *inst,
+                                             unsigned high,
+                                             unsigned low,
+                                             unsigned value);
+
+#define F(name, high, low) \
+static inline void gen8_set_##name(struct gen8_instruction *inst, unsigned v) \
+{ \
+   gen8_instruction_set_bits(inst, high, low, v); \
+} \
+static inline unsigned gen8_##name(struct gen8_instruction *inst) \
+{ \
+   return gen8_instruction_bits(inst, high, low); \
+}
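+
+/* Annotation: e.g. F(opcode, 6, 0) below expands to gen8_set_opcode() and
+ * gen8_opcode(), which access bits 6..0 of data[0] via
+ * gen8_instruction_set_bits()/gen8_instruction_bits().
+ */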
+
+F(src1_vert_stride,    120, 117)
+F(src1_da1_width,      116, 114)
+F(src1_da16_swiz_w,    115, 114)
+F(src1_da16_swiz_z,    113, 112)
+F(src1_da1_hstride,    113, 112)
+F(src1_address_mode,   111, 111)
+/** Src1.SrcMod @{ */
+F(src1_negate,         110, 110)
+F(src1_abs,            109, 109)
+/** @} */
+F(src1_ia1_subreg_nr,  108, 105)
+F(src1_da_reg_nr,      108, 101)
+F(src1_da16_subreg_nr, 100, 100)
+F(src1_da1_subreg_nr,  100,  96)
+F(src1_da16_swiz_y,     99,  98)
+F(src1_da16_swiz_x,     97,  96)
+F(src1_reg_type,        94,  91)
+F(src1_reg_file,        90,  89)
+F(src0_vert_stride,     88,  85)
+F(src0_da1_width,       84,  82)
+F(src0_da16_swiz_w,     83,  82)
+F(src0_da16_swiz_z,     81,  80)
+F(src0_da1_hstride,     81,  80)
+F(src0_address_mode,    79,  79)
+/** Src0.SrcMod @{ */
+F(src0_negate,          78,  78)
+F(src0_abs,             77,  77)
+/** @} */
+F(src0_ia1_subreg_nr,   76,  73)
+F(src0_da_reg_nr,       76,  69)
+F(src0_da16_subreg_nr,  68,  68)
+F(src0_da1_subreg_nr,   68,  64)
+F(src0_da16_swiz_y,     67,  66)
+F(src0_da16_swiz_x,     65,  64)
+F(dst_address_mode,     63,  63)
+F(dst_da1_hstride,      62,  61)
+F(dst_ia1_subreg_nr,    60,  57)
+F(dst_da_reg_nr,        60,  53)
+F(dst_da16_subreg_nr,   52,  52)
+F(dst_da1_subreg_nr,    52,  48)
+F(da16_writemask,       51,  48) /* Dst.ChanEn */
+F(src0_reg_type,        46,  43)
+F(src0_reg_file,        42,  41)
+F(dst_reg_type,         40,  37)
+F(dst_reg_file,         36,  35)
+F(mask_control,         34,  34)
+F(flag_reg_nr,          33,  33)
+F(flag_subreg_nr,       32,  32)
+F(saturate,             31,  31)
+F(branch_control,       30,  30)
+F(debug_control,        30,  30)
+F(cmpt_control,         29,  29)
+F(acc_wr_control,       28,  28)
+F(cond_modifier,        27,  24)
+F(exec_size,            23,  21)
+F(pred_inv,             20,  20)
+F(pred_control,         19,  16)
+F(thread_control,       15,  14)
+F(qtr_control,          13,  12)
+F(nib_control,          11,  11)
+F(no_dd_check,          10,  10)
+F(no_dd_clear,           9,   9)
+F(access_mode,           8,   8)
+/* Bit 7 is Reserved (for future Opcode expansion) */
+F(opcode,                6,   0)
+
+/**
+ * Three-source instructions:
+ *  @{
+ */
+F(src2_3src_reg_nr,    125, 118)
+F(src2_3src_subreg_nr, 117, 115)
+F(src2_3src_swizzle,   114, 107)
+F(src2_3src_rep_ctrl,  106, 106)
+F(src1_3src_reg_nr,    104,  97)
+/* src1_3src_subreg_nr spans word boundaries and has to be handled specially */
+F(src1_3src_swizzle,    93,  86)
+F(src1_3src_rep_ctrl,   85,  85)
+F(src0_3src_reg_nr,     83,  76)
+F(src0_3src_subreg_nr,  75,  73)
+F(src0_3src_swizzle,    72,  65)
+F(src0_3src_rep_ctrl,   64,  64)
+F(dst_3src_reg_nr,      63,  56)
+F(dst_3src_subreg_nr,   55,  53)
+F(dst_3src_writemask,   52,  49)
+F(dst_3src_type,        48,  46)
+F(src_3src_type,        45,  43)
+F(src2_3src_negate,     42,  42)
+F(src2_3src_abs,        41,  41)
+F(src1_3src_negate,     40,  40)
+F(src1_3src_abs,        39,  39)
+F(src0_3src_negate,     38,  38)
+F(src0_3src_abs,        37,  37)
+/** @} */
+
+/**
+ * Fields for SEND messages:
+ *  @{
+ */
+F(eot,                 127, 127)
+F(mlen,                124, 121)
+F(rlen,                120, 116)
+F(header_present,      115, 115)
+F(function_control,    114,  96)
+F(sfid,                 27,  24)
+F(math_function,        27,  24)
+/** @} */
+
+/**
+ * URB message function control bits:
+ *  @{
+ */
+F(urb_per_slot_offset, 113, 113)
+F(urb_interleave,      111, 111)
+F(urb_global_offset,   110, 100)
+F(urb_opcode,           99,  96)
+/** @} */
+
+/* Message descriptor bits */
+#define MD(name, high, low) F(name, (high + 96), (low + 96))
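+
+/* Annotation: the SEND message descriptor occupies the high dword of the
+ * instruction (bits 127:96), so MD() biases descriptor-relative bit numbers
+ * by 96; e.g. MD(sampler, 11, 8) below maps to instruction bits 107..104.
+ */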
+
+/**
+ * Sampler message function control bits:
+ *  @{
+ */
+MD(sampler_simd_mode,   18,  17)
+MD(sampler_msg_type,    16,  12)
+MD(sampler,             11,   8)
+MD(binding_table_index,  7,   0) /* also used by other messages */
+/** @} */
+
+/**
+ * Data port message function control bits:
+ *  @{
+ */
+MD(dp_category,         18,  18)
+MD(dp_message_type,     17,  14)
+MD(dp_message_control,  13,   8)
+/** @} */
+
+/**
+ * Scratch message bits:
+ *  @{
+ */
+MD(scratch_read_write,  17,  17) /* 0 = read,  1 = write */
+MD(scratch_type,        16,  16) /* 0 = OWord, 1 = DWord */
+MD(scratch_invalidate_after_read, 15, 15)
+MD(scratch_block_size,  13,  12)
+MD(scratch_addr_offset, 11,   0)
+/** @} */
+
+/**
+ * Render Target message function control bits:
+ *  @{
+ */
+MD(rt_last,             12,  12)
+MD(rt_slot_group,       11,  11)
+MD(rt_message_type,     10,   8)
+/** @} */
+
+/**
+ * Thread Spawn message function control bits:
+ *  @{
+ */
+MD(ts_resource_select,   4,   4)
+MD(ts_request_type,      1,   1)
+MD(ts_opcode,            0,   0)
+/** @} */
+
+/**
+ * Video Motion Estimation message function control bits:
+ *  @{
+ */
+F(vme_message_type,     14,  13)
+/** @} */
+
+/**
+ * Check & Refinement Engine message function control bits:
+ *  @{
+ */
+F(cre_message_type,     14,  13)
+/** @} */
+
+#undef MD
+#undef F
+
+static inline void
+gen8_set_src1_3src_subreg_nr(struct gen8_instruction *inst, unsigned v)
+{
+   assert((v & ~0x7) == 0);
+
+   gen8_instruction_set_bits(inst, 95, 94, v & 0x3);
+   gen8_instruction_set_bits(inst, 96, 96, v >> 2);
+}
+
+static inline unsigned
+gen8_src1_3src_subreg_nr(struct gen8_instruction *inst)
+{
+   return gen8_instruction_bits(inst, 95, 94) |
+          (gen8_instruction_bits(inst, 96, 96) << 2);
+}
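+
+/* Annotation: src1_3src_subreg_nr needs the open-coded accessors above
+ * because its three bits (96..94) straddle the data[2]/data[3] dword
+ * boundary, and gen8_instruction_bits()/gen8_instruction_set_bits() assert
+ * that a field stays within a single 32-bit word.
+ */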
+
+#define GEN8_IA1_ADDR_IMM(reg, nine, high, low)                              \
+static inline void                                                           \
+gen8_set_##reg##_ia1_addr_imm(struct gen8_instruction *inst, unsigned value) \
+{                                                                            \
+   assert((value & ~0x3ff) == 0);                                            \
+   gen8_instruction_set_bits(inst, high, low, value & 0x1ff);                \
+   gen8_instruction_set_bits(inst, nine, nine, value >> 9);                  \
+}                                                                            \
+                                                                             \
+static inline unsigned                                                       \
+gen8_##reg##_ia1_addr_imm(struct gen8_instruction *inst)                     \
+{                                                                            \
+   return gen8_instruction_bits(inst, high, low) |                           \
+          (gen8_instruction_bits(inst, nine, nine) << 9);                    \
+}
+
+/* AddrImm[9:0] for Align1 Indirect Addressing */
+GEN8_IA1_ADDR_IMM(src1, 121, 104, 96)
+GEN8_IA1_ADDR_IMM(src0,  95,  72, 64)
+GEN8_IA1_ADDR_IMM(dst,   47,  56, 48)
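+
+/* Annotation: each 10-bit AddrImm value is stored non-contiguously; bits
+ * 8:0 occupy the (high, low) range while bit 9 sits at the separate "nine"
+ * position. For dst, for instance, bits 8:0 land in instruction bits 56..48
+ * and bit 9 lands in bit 47.
+ */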
+
+/**
+ * Flow control instruction bits:
+ *  @{
+ */
+static inline unsigned gen8_uip(struct gen8_instruction *inst)
+{
+   return inst->data[2];
+}
+static inline void gen8_set_uip(struct gen8_instruction *inst, unsigned uip)
+{
+   inst->data[2] = uip;
+}
+static inline unsigned gen8_jip(struct gen8_instruction *inst)
+{
+   return inst->data[3];
+}
+static inline void gen8_set_jip(struct gen8_instruction *inst, unsigned jip)
+{
+   inst->data[3] = jip;
+}
+/** @} */
+
+static inline int gen8_src1_imm_d(struct gen8_instruction *inst)
+{
+   return inst->data[3];
+}
+static inline unsigned gen8_src1_imm_ud(struct gen8_instruction *inst)
+{
+   return inst->data[3];
+}
+static inline float gen8_src1_imm_f(struct gen8_instruction *inst)
+{
+   fi_type ft;
+
+   ft.u = inst->data[3];
+   return ft.f;
+}
+
+void gen8_set_dst(const struct brw_context *brw,
+                  struct gen8_instruction *inst, struct brw_reg reg);
+void gen8_set_src0(const struct brw_context *brw,
+                   struct gen8_instruction *inst, struct brw_reg reg);
+void gen8_set_src1(const struct brw_context *brw,
+                   struct gen8_instruction *inst, struct brw_reg reg);
+
+void gen8_set_urb_message(const struct brw_context *brw,
+                          struct gen8_instruction *inst,
+                          enum brw_urb_write_flags flags,
+                          unsigned mlen, unsigned rlen,
+                          unsigned offset, bool interleave);
+
+void gen8_set_sampler_message(const struct brw_context *brw,
+                              struct gen8_instruction *inst,
+                              unsigned binding_table_index, unsigned sampler,
+                              unsigned msg_type, unsigned rlen, unsigned mlen,
+                              bool header_present, unsigned simd_mode);
+
+void gen8_set_dp_message(const struct brw_context *brw,
+                         struct gen8_instruction *inst,
+                         enum brw_message_target sfid,
+                         unsigned binding_table_index,
+                         unsigned msg_type,
+                         unsigned msg_control,
+                         unsigned msg_length,
+                         unsigned response_length,
+                         bool header_present,
+                         bool end_of_thread);
+
+void gen8_set_dp_scratch_message(const struct brw_context *brw,
+                                 struct gen8_instruction *inst,
+                                 bool write,
+                                 bool dword,
+                                 bool invalidate_after_read,
+                                 unsigned num_regs,
+                                 unsigned addr_offset,
+                                 unsigned msg_length,
+                                 unsigned response_length,
+                                 bool header_present,
+                                 bool end_of_thread);
+
+/** Disassemble the instruction. */
+int gen8_disassemble(FILE *file, struct gen8_instruction *inst, int gen);
+
+
+/**
+ * Fetch a set of contiguous bits from the instruction.
+ *
+ * Bit indexes range from 0..127; fields may not cross 32-bit boundaries.
+ */
+static inline unsigned
+gen8_instruction_bits(struct gen8_instruction *inst, unsigned high, unsigned low)
+{
+   /* We assume the field doesn't cross 32-bit boundaries. */
+   const unsigned word = high / 32;
+   assert(word == low / 32);
+
+   high %= 32;
+   low %= 32;
+
+   const unsigned mask = (((1 << (high - low + 1)) - 1) << low);
+
+   return (inst->data[word] & mask) >> low;
+}
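+
+/* Worked example (annotation): gen8_exec_size() resolves to
+ * gen8_instruction_bits(inst, 23, 21): word = 0, mask =
+ * ((1 << 3) - 1) << 21 = 0x00e00000, and the masked bits are shifted back
+ * down by 21.
+ */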
+
+/**
+ * Set bits in the instruction, with proper shifting and masking.
+ *
+ * Bit indexes range from 0..127; fields may not cross 32-bit boundaries.
+ */
+static inline void
+gen8_instruction_set_bits(struct gen8_instruction *inst,
+                          unsigned high,
+                          unsigned low,
+                          unsigned value)
+{
+   const unsigned word = high / 32;
+   assert(word == low / 32);
+
+   high %= 32;
+   low %= 32;
+
+   const unsigned mask = (((1 << (high - low + 1)) - 1) << low);
+
+   /* Make sure the supplied value actually fits in the given bitfield. */
+   assert((value & (mask >> low)) == value);
+
+   inst->data[word] = (inst->data[word] & ~mask) | ((value << low) & mask);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/icd/intel/compiler/pipeline/gen8_vec4_generator.cpp b/icd/intel/compiler/pipeline/gen8_vec4_generator.cpp
new file mode 100644
index 0000000..16416f1
--- /dev/null
+++ b/icd/intel/compiler/pipeline/gen8_vec4_generator.cpp
@@ -0,0 +1,948 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_vec4.h"
+
+extern "C" {
+#include "brw_eu.h"
+#include "main/macros.h"
+#include "program/prog_print.h"
+#include "program/prog_parameter.h"
+};
+
+#include "icd-utils.h" // LunarG : ADD
+
+namespace brw {
+
+gen8_vec4_generator::gen8_vec4_generator(struct brw_context *brw,
+                                         struct gl_shader_program *shader_prog,
+                                         struct gl_program *prog,
+                                         struct brw_vec4_prog_data *prog_data,
+                                         void *mem_ctx,
+                                         bool debug_flag)
+   : gen8_generator(brw, shader_prog, prog, mem_ctx),
+     prog_data(prog_data),
+     debug_flag(debug_flag)
+{
+}
+
+gen8_vec4_generator::~gen8_vec4_generator()
+{
+}
+
+void
+gen8_vec4_generator::generate_tex(vec4_instruction *ir, struct brw_reg dst)
+{
+   int msg_type = 0;
+
+   switch (ir->opcode) {
+   case SHADER_OPCODE_TEX:
+   case SHADER_OPCODE_TXL:
+      if (ir->shadow_compare) {
+         msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LOD_COMPARE;
+      } else {
+         msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LOD;
+      }
+      break;
+   case SHADER_OPCODE_TXD:
+      if (ir->shadow_compare) {
+         msg_type = HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE;
+      } else {
+         msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_DERIVS;
+      }
+      break;
+   case SHADER_OPCODE_TXF:
+      msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
+      break;
+   case SHADER_OPCODE_TXF_CMS:
+      msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS;
+      break;
+   case SHADER_OPCODE_TXF_MCS:
+      msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS;
+      break;
+   case SHADER_OPCODE_TXS:
+      msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO;
+      break;
+   case SHADER_OPCODE_TG4:
+      if (ir->shadow_compare) {
+         msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_C;
+      } else {
+         msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4;
+      }
+      break;
+   case SHADER_OPCODE_TG4_OFFSET:
+      if (ir->shadow_compare) {
+         msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C;
+      } else {
+         msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO;
+      }
+      break;
+   default:
+      assert(!"should not get here: invalid VS texture opcode");
+      break;
+   }
+
+   if (ir->header_present) {
+      MOV_RAW(retype(brw_message_reg(ir->base_mrf), BRW_REGISTER_TYPE_UD),
+              retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
+
+      default_state.access_mode = BRW_ALIGN_1;
+
+      if (ir->texture_offset) {
+         /* Set the offset bits in DWord 2. */
+         MOV_RAW(retype(brw_vec1_reg(MRF, ir->base_mrf, 2),
+                        BRW_REGISTER_TYPE_UD),
+                 brw_imm_ud(ir->texture_offset));
+      }
+
+      if (ir->sampler >= 16) {
+         /* The "Sampler Index" field can only store values between 0 and 15.
+          * However, we can add an offset to the "Sampler State Pointer"
+          * field, effectively selecting a different set of 16 samplers.
+          *
+          * The "Sampler State Pointer" needs to be aligned to a 32-byte
+          * offset, and each sampler state is only 16-bytes, so we can't
+          * exclusively use the offset - we have to use both.
+          */
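+         /* Worked example (annotation): for ir->sampler == 18, the ADD
+          * below offsets the Sampler State Pointer by
+          * 16 * (18 / 16) * 16 bytes = 256 bytes (a multiple of 32), and
+          * the SEND later in this function uses sampler index 18 % 16 == 2.
+          */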
+         gen8_instruction *add =
+            ADD(get_element_ud(brw_message_reg(ir->base_mrf), 3),
+                get_element_ud(brw_vec8_grf(0, 0), 3),
+                brw_imm_ud(16 * (ir->sampler / 16) *
+                           sizeof(gen7_sampler_state)));
+         gen8_set_mask_control(add, BRW_MASK_DISABLE);
+      }
+
+      default_state.access_mode = BRW_ALIGN_16;
+   }
+
+   uint32_t surf_index =
+      prog_data->base.binding_table.texture_start + ir->sampler;
+
+   gen8_instruction *inst = next_inst(BRW_OPCODE_SEND);
+   gen8_set_dst(brw, inst, dst);
+   gen8_set_src0(brw, inst, brw_message_reg(ir->base_mrf));
+   gen8_set_sampler_message(brw, inst,
+                            surf_index,
+                            ir->sampler % 16,
+                            msg_type,
+                            1,
+                            ir->mlen,
+                            ir->header_present,
+                            BRW_SAMPLER_SIMD_MODE_SIMD4X2);
+
+   brw_mark_surface_used(&prog_data->base, surf_index);
+}
+
+void
+gen8_vec4_generator::generate_urb_write(vec4_instruction *ir, bool vs)
+{
+   struct brw_reg header = brw_vec8_grf(GEN7_MRF_HACK_START + ir->base_mrf, 0);
+
+   /* Copy g0. */
+   if (vs)
+      MOV_RAW(header, brw_vec8_grf(0, 0));
+
+   gen8_instruction *inst;
+   if (!(ir->urb_write_flags & BRW_URB_WRITE_USE_CHANNEL_MASKS)) {
+      /* Enable Channel Masks in the URB_WRITE_OWORD message header */
+      default_state.access_mode = BRW_ALIGN_1;
+      MOV_RAW(brw_vec1_grf(GEN7_MRF_HACK_START + ir->base_mrf, 5),
+              brw_imm_ud(0xff00));
+      default_state.access_mode = BRW_ALIGN_16;
+   }
+
+   inst = next_inst(BRW_OPCODE_SEND);
+   gen8_set_urb_message(brw, inst, ir->urb_write_flags, ir->mlen, 0, ir->offset,
+                        true);
+   gen8_set_dst(brw, inst, brw_null_reg());
+   gen8_set_src0(brw, inst, header);
+}
+
+void
+gen8_vec4_generator::generate_gs_set_vertex_count(struct brw_reg eot_mrf_header,
+                                                  struct brw_reg src)
+{
+   /* Move the vertex count into the second MRF for the EOT write. */
+   assert(eot_mrf_header.file == BRW_MESSAGE_REGISTER_FILE);
+   int dst_nr = GEN7_MRF_HACK_START + eot_mrf_header.nr + 1;
+   MOV(retype(brw_vec8_grf(dst_nr, 0), BRW_REGISTER_TYPE_UD), src);
+}
+
+void
+gen8_vec4_generator::generate_gs_thread_end(vec4_instruction *ir)
+{
+   struct brw_reg src = brw_vec8_grf(GEN7_MRF_HACK_START + ir->base_mrf, 0);
+   gen8_instruction *inst;
+
+   /* Enable Channel Masks in the URB_WRITE_HWORD message header */
+   default_state.access_mode = BRW_ALIGN_1;
+   inst = OR(retype(brw_vec1_grf(GEN7_MRF_HACK_START + ir->base_mrf, 5),
+                    BRW_REGISTER_TYPE_UD),
+             retype(brw_vec1_grf(0, 5), BRW_REGISTER_TYPE_UD),
+             brw_imm_ud(0xff00)); /* could be 0x1100 but shouldn't matter */
+   gen8_set_mask_control(inst, BRW_MASK_DISABLE);
+   default_state.access_mode = BRW_ALIGN_16;
+
+   /* mlen = 2: g0 header + vertex count */
+   inst = next_inst(BRW_OPCODE_SEND);
+   gen8_set_urb_message(brw, inst, BRW_URB_WRITE_EOT, 2, 0, 0, true);
+   gen8_set_dst(brw, inst, brw_null_reg());
+   gen8_set_src0(brw, inst, src);
+}
+
+void
+gen8_vec4_generator::generate_gs_set_write_offset(struct brw_reg dst,
+                                                  struct brw_reg src0,
+                                                  struct brw_reg src1)
+{
+   /* From p22 of volume 4 part 2 of the Ivy Bridge PRM (2.4.3.1 Message
+    * Header: M0.3):
+    *
+    *     Slot 0 Offset. This field, after adding to the Global Offset field
+    *     in the message descriptor, specifies the offset (in 256-bit units)
+    *     from the start of the URB entry, as referenced by URB Handle 0, at
+    *     which the data will be accessed.
+    *
+    * Similar text describes DWORD M0.4, which is slot 1 offset.
+    *
+    * Therefore, we want to multiply DWORDs 0 and 4 of src0 (the x components
+    * of the register for geometry shader invocations 0 and 1) by the
+    * immediate value in src1, and store the result in DWORDs 3 and 4 of dst.
+    *
+    * We can do this with the following EU instruction:
+    *
+    *     mul(2) dst.3<1>UD src0<8;2,4>UD src1   { Align1 WE_all }
+    */
+   default_state.access_mode = BRW_ALIGN_1;
+   gen8_instruction *inst =
+      MUL(suboffset(stride(dst, 2, 2, 1), 3), stride(src0, 8, 2, 4), src1);
+   gen8_set_mask_control(inst, BRW_MASK_DISABLE);
+   default_state.access_mode = BRW_ALIGN_16;
+}
+
+void
+gen8_vec4_generator::generate_gs_set_dword_2_immed(struct brw_reg dst,
+                                                   struct brw_reg src)
+{
+   assert(src.file == BRW_IMMEDIATE_VALUE);
+
+   default_state.access_mode = BRW_ALIGN_1;
+
+   gen8_instruction *inst = MOV(suboffset(vec1(dst), 2), src);
+   gen8_set_mask_control(inst, BRW_MASK_DISABLE);
+
+   default_state.access_mode = BRW_ALIGN_16;
+}
+
+void
+gen8_vec4_generator::generate_gs_prepare_channel_masks(struct brw_reg dst)
+{
+   /* We want to left shift just DWORD 4 (the x component belonging to the
+    * second geometry shader invocation) by 4 bits.  So generate the
+    * instruction:
+    *
+    *     shl(1) dst.4<1>UD dst.4<0,1,0>UD 4UD { align1 WE_all }
+    */
+   dst = suboffset(vec1(dst), 4);
+   default_state.access_mode = BRW_ALIGN_1;
+   gen8_instruction *inst = SHL(dst, dst, brw_imm_ud(4));
+   gen8_set_mask_control(inst, BRW_MASK_DISABLE);
+   default_state.access_mode = BRW_ALIGN_16;
+}
+
+void
+gen8_vec4_generator::generate_gs_set_channel_masks(struct brw_reg dst,
+                                                   struct brw_reg src)
+{
+   /* From p21 of volume 4 part 2 of the Ivy Bridge PRM (2.4.3.1 Message
+    * Header: M0.5):
+    *
+    *     15 Vertex 1 DATA [3] / Vertex 0 DATA[7] Channel Mask
+    *
+    *        When Swizzle Control = URB_INTERLEAVED this bit controls Vertex 1
+    *        DATA[3], when Swizzle Control = URB_NOSWIZZLE this bit controls
+    *        Vertex 0 DATA[7].  This bit is ANDed with the corresponding
+    *        channel enable to determine the final channel enable.  For the
+    *        URB_READ_OWORD & URB_READ_HWORD messages, when final channel
+    *        enable is 1 it indicates that Vertex 1 DATA [3] will be included
+    *        in the writeback message.  For the URB_WRITE_OWORD &
+    *        URB_WRITE_HWORD messages, when final channel enable is 1 it
+    *        indicates that Vertex 1 DATA [3] will be written to the surface.
+    *
+    *        0: Vertex 1 DATA [3] / Vertex 0 DATA[7] channel not included
+    *        1: Vertex 1 DATA [3] / Vertex 0 DATA[7] channel included
+    *
+    *     14 Vertex 1 DATA [2] Channel Mask
+    *     13 Vertex 1 DATA [1] Channel Mask
+    *     12 Vertex 1 DATA [0] Channel Mask
+    *     11 Vertex 0 DATA [3] Channel Mask
+    *     10 Vertex 0 DATA [2] Channel Mask
+    *      9 Vertex 0 DATA [1] Channel Mask
+    *      8 Vertex 0 DATA [0] Channel Mask
+    *
+    * (This is from a section of the PRM that is agnostic to the particular
+    * type of shader being executed, so "Vertex 0" and "Vertex 1" refer to
+    * geometry shader invocations 0 and 1, respectively).  Since we have the
+    * enable flags for geometry shader invocation 0 in bits 3:0 of DWORD 0,
+    * and the enable flags for geometry shader invocation 1 in bits 7:0 of
+    * DWORD 4, we just need to OR them together and store the result in bits
+    * 15:8 of DWORD 5.
+    *
+    * It's easier to get the EU to do this if we think of the src and dst
+    * registers as composed of 32 bytes each; then, we want to pick up the
+    * contents of bytes 0 and 16 from src, OR them together, and store them in
+    * byte 21.
+    *
+    * We can do that by the following EU instruction:
+    *
+    *     or(1) dst.21<1>UB src<0,1,0>UB src.16<0,1,0>UB { align1 WE_all }
+    *
+    * Note: this relies on the source register having zeros in (a) bits 7:4 of
+    * DWORD 0 and (b) bits 3:0 of DWORD 4.  We can rely on (b) because the
+    * source register was prepared by GS_OPCODE_PREPARE_CHANNEL_MASKS (which
+    * shifts DWORD 4 left by 4 bits), and we can rely on (a) because prior to
+    * the execution of GS_OPCODE_PREPARE_CHANNEL_MASKS, DWORDs 0 and 4 need to
+    * contain valid channel mask values (which are in the range 0x0-0xf).
+    */
+   dst = retype(dst, BRW_REGISTER_TYPE_UB);
+   src = retype(src, BRW_REGISTER_TYPE_UB);
+
+   default_state.access_mode = BRW_ALIGN_1;
+
+   gen8_instruction *inst =
+      OR(suboffset(vec1(dst), 21), vec1(src), suboffset(vec1(src), 16));
+   gen8_set_mask_control(inst, BRW_MASK_DISABLE);
+
+   default_state.access_mode = BRW_ALIGN_16;
+}
+
+void
+gen8_vec4_generator::generate_oword_dual_block_offsets(struct brw_reg m1,
+                                                       struct brw_reg index)
+{
+   int second_vertex_offset = 1;
+
+   m1 = retype(m1, BRW_REGISTER_TYPE_D);
+
+   /* Set up M1 (message payload).  Only the block offsets in M1.0 and
+    * M1.4 are used, and the rest are ignored.
+    */
+   struct brw_reg m1_0 = suboffset(vec1(m1), 0);
+   struct brw_reg m1_4 = suboffset(vec1(m1), 4);
+   struct brw_reg index_0 = suboffset(vec1(index), 0);
+   struct brw_reg index_4 = suboffset(vec1(index), 4);
+
+   default_state.mask_control = BRW_MASK_DISABLE;
+   default_state.access_mode = BRW_ALIGN_1;
+
+   MOV(m1_0, index_0);
+
+   if (index.file == BRW_IMMEDIATE_VALUE) {
+      index_4.dw1.ud += second_vertex_offset;
+      MOV(m1_4, index_4);
+   } else {
+      ADD(m1_4, index_4, brw_imm_d(second_vertex_offset));
+   }
+
+   default_state.mask_control = BRW_MASK_ENABLE;
+   default_state.access_mode = BRW_ALIGN_16;
+}
+
+void
+gen8_vec4_generator::generate_scratch_read(vec4_instruction *ir,
+                                           struct brw_reg dst,
+                                           struct brw_reg index)
+{
+   struct brw_reg header = brw_vec8_grf(GEN7_MRF_HACK_START + ir->base_mrf, 0);
+
+   MOV_RAW(header, brw_vec8_grf(0, 0));
+
+   generate_oword_dual_block_offsets(brw_message_reg(ir->base_mrf + 1), index);
+
+   /* Each of the 8 channel enables is considered for whether each
+    * dword is written.
+    */
+   gen8_instruction *send = next_inst(BRW_OPCODE_SEND);
+   gen8_set_dst(brw, send, dst);
+   gen8_set_src0(brw, send, header);
+   gen8_set_dp_message(brw, send, GEN7_SFID_DATAPORT_DATA_CACHE,
+                       255, /* binding table index: stateless access */
+                       GEN6_DATAPORT_READ_MESSAGE_OWORD_DUAL_BLOCK_READ,
+                       BRW_DATAPORT_OWORD_DUAL_BLOCK_1OWORD,
+                       2,      /* mlen */
+                       1,      /* rlen */
+                       true,   /* header present */
+                       false); /* EOT */
+}
+
+void
+gen8_vec4_generator::generate_scratch_write(vec4_instruction *ir,
+                                            struct brw_reg dst,
+                                            struct brw_reg src,
+                                            struct brw_reg index)
+{
+   struct brw_reg header = brw_vec8_grf(GEN7_MRF_HACK_START + ir->base_mrf, 0);
+
+   MOV_RAW(header, brw_vec8_grf(0, 0));
+
+   generate_oword_dual_block_offsets(brw_message_reg(ir->base_mrf + 1), index);
+
+   MOV(retype(brw_message_reg(ir->base_mrf + 2), BRW_REGISTER_TYPE_D),
+       retype(src, BRW_REGISTER_TYPE_D));
+
+   /* Each of the 8 channel enables is considered for whether each
+    * dword is written.
+    */
+   gen8_instruction *send = next_inst(BRW_OPCODE_SEND);
+   gen8_set_dst(brw, send, dst);
+   gen8_set_src0(brw, send, header);
+   gen8_set_pred_control(send, ir->predicate);
+   gen8_set_dp_message(brw, send, GEN7_SFID_DATAPORT_DATA_CACHE,
+                       255, /* binding table index: stateless access */
+                       GEN7_DATAPORT_WRITE_MESSAGE_OWORD_DUAL_BLOCK_WRITE,
+                       BRW_DATAPORT_OWORD_DUAL_BLOCK_1OWORD,
+                       3,      /* mlen */
+                       0,      /* rlen */
+                       true,   /* header present */
+                       false); /* EOT */
+}
+
+void
+gen8_vec4_generator::generate_pull_constant_load(vec4_instruction *inst,
+                                                 struct brw_reg dst,
+                                                 struct brw_reg index,
+                                                 struct brw_reg offset)
+{
+   assert(index.file == BRW_IMMEDIATE_VALUE &&
+          index.type == BRW_REGISTER_TYPE_UD);
+   uint32_t surf_index = index.dw1.ud;
+
+   assert(offset.file == BRW_GENERAL_REGISTER_FILE);
+
+   /* Each of the 8 channel enables is considered for whether each
+    * dword is written.
+    */
+   gen8_instruction *send = next_inst(BRW_OPCODE_SEND);
+   gen8_set_dst(brw, send, dst);
+   gen8_set_src0(brw, send, offset);
+   gen8_set_sampler_message(brw, send,
+                            surf_index,
+                            0, /* The LD message ignores the sampler unit. */
+                            GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
+                            1,      /* rlen */
+                            1,      /* mlen */
+                            false,  /* no header */
+                            BRW_SAMPLER_SIMD_MODE_SIMD4X2);
+
+   brw_mark_surface_used(&prog_data->base, surf_index);
+}
+
+void
+gen8_vec4_generator::generate_untyped_atomic(vec4_instruction *ir,
+                                             struct brw_reg dst,
+                                             struct brw_reg atomic_op,
+                                             struct brw_reg surf_index)
+{
+   assert(atomic_op.file == BRW_IMMEDIATE_VALUE &&
+          atomic_op.type == BRW_REGISTER_TYPE_UD &&
+          surf_index.file == BRW_IMMEDIATE_VALUE &&
+          surf_index.type == BRW_REGISTER_TYPE_UD);
+   assert((atomic_op.dw1.ud & ~0xf) == 0);
+
+   unsigned msg_control =
+      atomic_op.dw1.ud | /* Atomic Operation Type: BRW_AOP_* */
+      (1 << 5); /* Return data expected */
+
+   gen8_instruction *inst = next_inst(BRW_OPCODE_SEND);
+   gen8_set_dst(brw, inst, retype(dst, BRW_REGISTER_TYPE_UD));
+   gen8_set_src0(brw, inst, brw_message_reg(ir->base_mrf));
+   gen8_set_dp_message(brw, inst, HSW_SFID_DATAPORT_DATA_CACHE_1,
+                       surf_index.dw1.ud,
+                       HSW_DATAPORT_DC_PORT1_UNTYPED_ATOMIC_OP_SIMD4X2,
+                       msg_control,
+                       ir->mlen,
+                       1,
+                       ir->header_present,
+                       false);
+
+   brw_mark_surface_used(&prog_data->base, surf_index.dw1.ud);
+}
+
+
+
+void
+gen8_vec4_generator::generate_untyped_surface_read(vec4_instruction *ir,
+                                                   struct brw_reg dst,
+                                                   struct brw_reg surf_index)
+{
+   assert(surf_index.file == BRW_IMMEDIATE_VALUE &&
+          surf_index.type == BRW_REGISTER_TYPE_UD);
+
+   gen8_instruction *inst = next_inst(BRW_OPCODE_SEND);
+   gen8_set_dst(brw, inst, retype(dst, BRW_REGISTER_TYPE_UD));
+   gen8_set_src0(brw, inst, brw_message_reg(ir->base_mrf));
+   gen8_set_dp_message(brw, inst, HSW_SFID_DATAPORT_DATA_CACHE_1,
+                       surf_index.dw1.ud,
+                       HSW_DATAPORT_DC_PORT1_UNTYPED_SURFACE_READ,
+                       0xe, /* enable only the R channel */
+                       ir->mlen,
+                       1,
+                       ir->header_present,
+                       false);
+
+   brw_mark_surface_used(&prog_data->base, surf_index.dw1.ud);
+}
+
+void
+gen8_vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
+                                               struct brw_reg dst,
+                                               struct brw_reg *src)
+{
+   vec4_instruction *ir = instruction;
+
+   if (dst.width == BRW_WIDTH_4) {
+      /* This happens in attribute fixups for "dual instanced" geometry
+       * shaders, since they use attributes that are vec4's.  Since the exec
+       * width is only 4, it's essential that the caller set
+       * force_writemask_all in order to make sure the instruction is executed
+       * regardless of which channels are enabled.
+       */
+      assert(ir->force_writemask_all);
+
+      /* Fix up any <8;8,1> or <0;4,1> source registers to <4;4,1> to satisfy
+       * the following register region restrictions (from Graphics BSpec:
+       * 3D-Media-GPGPU Engine > EU Overview > Registers and Register Regions
+       * > Register Region Restrictions)
+       *
+       *     1. ExecSize must be greater than or equal to Width.
+       *
+       *     2. If ExecSize = Width and HorzStride != 0, VertStride must be set
+       *        to Width * HorzStride.
+       */
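+      /* Region notation is <VertStride; Width, HorzStride>: the
+       * stride(reg, 4, 4, 1) fixup below therefore produces a <4;4,1>
+       * region, which satisfies both restrictions for an ExecSize of 4.
+       */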
+      for (int i = 0; i < 3; i++) {
+         if (src[i].file == BRW_GENERAL_REGISTER_FILE)
+            src[i] = stride(src[i], 4, 4, 1);
+      }
+   }
+
+   switch (ir->opcode) {
+   case BRW_OPCODE_MOV:
+      MOV(dst, src[0]);
+      break;
+
+   case BRW_OPCODE_ADD:
+      ADD(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_MUL:
+      MUL(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_MACH:
+      MACH(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_MAD:
+      MAD(dst, src[0], src[1], src[2]);
+      break;
+
+   case BRW_OPCODE_FRC:
+      FRC(dst, src[0]);
+      break;
+
+   case BRW_OPCODE_RNDD:
+      RNDD(dst, src[0]);
+      break;
+
+   case BRW_OPCODE_RNDE:
+      RNDE(dst, src[0]);
+      break;
+
+   case BRW_OPCODE_RNDZ:
+      RNDZ(dst, src[0]);
+      break;
+
+   case BRW_OPCODE_AND:
+      AND(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_OR:
+      OR(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_XOR:
+      XOR(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_NOT:
+      NOT(dst, src[0]);
+      break;
+
+   case BRW_OPCODE_ASR:
+      ASR(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_SHR:
+      SHR(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_SHL:
+      SHL(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_CMP:
+      CMP(dst, ir->conditional_mod, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_SEL:
+      SEL(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_DPH:
+      DPH(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_DP4:
+      DP4(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_DP3:
+      DP3(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_DP2:
+      DP2(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_F32TO16:
+      /* Emulate the Gen7 zeroing bug. */
+      MOV(retype(dst, BRW_REGISTER_TYPE_UD), brw_imm_ud(0u));
+      MOV(retype(dst, BRW_REGISTER_TYPE_HF), src[0]);
+      break;
+
+   case BRW_OPCODE_F16TO32:
+      MOV(dst, retype(src[0], BRW_REGISTER_TYPE_HF));
+      break;
+
+   case BRW_OPCODE_LRP:
+      LRP(dst, src[0], src[1], src[2]);
+      break;
+
+   case BRW_OPCODE_BFREV:
+      /* BFREV only supports UD type for src and dst. */
+      BFREV(retype(dst, BRW_REGISTER_TYPE_UD),
+            retype(src[0], BRW_REGISTER_TYPE_UD));
+      break;
+
+   case BRW_OPCODE_FBH:
+      /* FBH only supports UD type for dst. */
+      FBH(retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
+      break;
+
+   case BRW_OPCODE_FBL:
+      /* FBL only supports UD type for dst. */
+      FBL(retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
+      break;
+
+   case BRW_OPCODE_CBIT:
+      /* CBIT only supports UD type for dst. */
+      CBIT(retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
+      break;
+
+   case BRW_OPCODE_ADDC:
+      ADDC(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_SUBB:
+      SUBB(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_BFE:
+      BFE(dst, src[0], src[1], src[2]);
+      break;
+
+   case BRW_OPCODE_BFI1:
+      BFI1(dst, src[0], src[1]);
+      break;
+
+   case BRW_OPCODE_BFI2:
+      BFI2(dst, src[0], src[1], src[2]);
+      break;
+
+   case BRW_OPCODE_IF:
+      IF(ir->predicate);
+      break;
+
+   case BRW_OPCODE_ELSE:
+      ELSE();
+      break;
+
+   case BRW_OPCODE_ENDIF:
+      ENDIF();
+      break;
+
+   case BRW_OPCODE_DO:
+      DO();
+      break;
+
+   case BRW_OPCODE_BREAK:
+      BREAK();
+      break;
+
+   case BRW_OPCODE_CONTINUE:
+      CONTINUE();
+      break;
+
+   case BRW_OPCODE_WHILE:
+      WHILE();
+      break;
+
+   case SHADER_OPCODE_RCP:
+      MATH(BRW_MATH_FUNCTION_INV, dst, src[0]);
+      break;
+
+   case SHADER_OPCODE_RSQ:
+      MATH(BRW_MATH_FUNCTION_RSQ, dst, src[0]);
+      break;
+
+   case SHADER_OPCODE_SQRT:
+      MATH(BRW_MATH_FUNCTION_SQRT, dst, src[0]);
+      break;
+
+   case SHADER_OPCODE_EXP2:
+      MATH(BRW_MATH_FUNCTION_EXP, dst, src[0]);
+      break;
+
+   case SHADER_OPCODE_LOG2:
+      MATH(BRW_MATH_FUNCTION_LOG, dst, src[0]);
+      break;
+
+   case SHADER_OPCODE_SIN:
+      MATH(BRW_MATH_FUNCTION_SIN, dst, src[0]);
+      break;
+
+   case SHADER_OPCODE_COS:
+      MATH(BRW_MATH_FUNCTION_COS, dst, src[0]);
+      break;
+
+   case SHADER_OPCODE_POW:
+      MATH(BRW_MATH_FUNCTION_POW, dst, src[0], src[1]);
+      break;
+
+   case SHADER_OPCODE_INT_QUOTIENT:
+      MATH(BRW_MATH_FUNCTION_INT_DIV_QUOTIENT, dst, src[0], src[1]);
+      break;
+
+   case SHADER_OPCODE_INT_REMAINDER:
+      MATH(BRW_MATH_FUNCTION_INT_DIV_REMAINDER, dst, src[0], src[1]);
+      break;
+
+   case SHADER_OPCODE_TEX:
+   case SHADER_OPCODE_TXD:
+   case SHADER_OPCODE_TXF:
+   case SHADER_OPCODE_TXF_CMS:
+   case SHADER_OPCODE_TXF_MCS:
+   case SHADER_OPCODE_TXL:
+   case SHADER_OPCODE_TXS:
+   case SHADER_OPCODE_TG4:
+   case SHADER_OPCODE_TG4_OFFSET:
+      generate_tex(ir, dst);
+      break;
+
+   case VS_OPCODE_URB_WRITE:
+      generate_urb_write(ir, true);
+      break;
+
+   case SHADER_OPCODE_GEN4_SCRATCH_READ:
+      generate_scratch_read(ir, dst, src[0]);
+      break;
+
+   case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
+      generate_scratch_write(ir, dst, src[0], src[1]);
+      break;
+
+   case VS_OPCODE_PULL_CONSTANT_LOAD:
+   case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7:
+      generate_pull_constant_load(ir, dst, src[0], src[1]);
+      break;
+
+   case GS_OPCODE_URB_WRITE:
+      generate_urb_write(ir, false);
+      break;
+
+   case GS_OPCODE_THREAD_END:
+      generate_gs_thread_end(ir);
+      break;
+
+   case GS_OPCODE_SET_WRITE_OFFSET:
+      generate_gs_set_write_offset(dst, src[0], src[1]);
+      break;
+
+   case GS_OPCODE_SET_VERTEX_COUNT:
+      generate_gs_set_vertex_count(dst, src[0]);
+      break;
+
+   case GS_OPCODE_SET_DWORD_2_IMMED:
+      generate_gs_set_dword_2_immed(dst, src[0]);
+      break;
+
+   case GS_OPCODE_PREPARE_CHANNEL_MASKS:
+      generate_gs_prepare_channel_masks(dst);
+      break;
+
+   case GS_OPCODE_SET_CHANNEL_MASKS:
+      generate_gs_set_channel_masks(dst, src[0]);
+      break;
+
+//   case SHADER_OPCODE_SHADER_TIME_ADD:
+//      assert(!"XXX: Missing Gen8 vec4 support for INTEL_DEBUG=shader_time");
+//      break;
+
+   case SHADER_OPCODE_UNTYPED_ATOMIC:
+      generate_untyped_atomic(ir, dst, src[0], src[1]);
+      break;
+
+   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+      generate_untyped_surface_read(ir, dst, src[0]);
+      break;
+
+   case VS_OPCODE_UNPACK_FLAGS_SIMD4X2:
+      assert(!"VS_OPCODE_UNPACK_FLAGS_SIMD4X2 should not be used on Gen8+.");
+      break;
+
+   default:
+      if (ir->opcode < (int) ARRAY_SIZE(opcode_descs)) {
+         _mesa_problem(ctx, "Unsupported opcode in `%s' in VS\n",
+                       opcode_descs[ir->opcode].name);
+      } else {
+         _mesa_problem(ctx, "Unsupported opcode %d in VS\n", ir->opcode);
+      }
+      abort();
+   }
+}
+
+void
+gen8_vec4_generator::generate_code(exec_list *instructions)
+{
+   int last_native_inst_offset = 0;
+   const char *last_annotation_string = NULL;
+   const void *last_annotation_ir = NULL;
+
+   if (unlikely(debug_flag)) {
+      if (shader_prog) {
+         fprintf(stderr, "Native code for %s vertex shader %d:\n",
+                 shader_prog->Label ? shader_prog->Label : "unnamed",
+                 shader_prog->Name);
+      } else {
+         fprintf(stderr, "Native code for vertex program %d:\n", prog->Id);
+      }
+   }
+
+   foreach_list(node, instructions) {
+      vec4_instruction *ir = (vec4_instruction *) node;
+      struct brw_reg src[3], dst;
+
+      if (unlikely(debug_flag)) {
+         if (last_annotation_ir != ir->ir) {
+            last_annotation_ir = ir->ir;
+            if (last_annotation_ir) {
+               fprintf(stderr, "   ");
+               if (shader_prog) {
+                  ((ir_instruction *) last_annotation_ir)->fprint(stderr);
+               } else {
+                  const prog_instruction *vpi;
+                  vpi = (const prog_instruction *) ir->ir;
+                  fprintf(stderr, "%d: ", (int)(vpi - prog->Instructions));
+                  _mesa_fprint_instruction_opt(stderr, vpi, 0,
+                                               PROG_PRINT_DEBUG, NULL);
+               }
+               fprintf(stderr, "\n");
+            }
+         }
+         if (last_annotation_string != ir->annotation) {
+            last_annotation_string = ir->annotation;
+            if (last_annotation_string)
+               fprintf(stderr, "   %s\n", last_annotation_string);
+         }
+      }
+
+      for (unsigned int i = 0; i < 3; i++) {
+         src[i] = ir->get_src(prog_data, i);
+      }
+      dst = ir->get_dst();
+
+      default_state.conditional_mod = ir->conditional_mod;
+      default_state.predicate = ir->predicate;
+      default_state.predicate_inverse = ir->predicate_inverse;
+      default_state.saturate = ir->saturate;
+
+      const unsigned pre_emit_nr_inst = nr_inst;
+
+      generate_vec4_instruction(ir, dst, src);
+
+      if (ir->no_dd_clear || ir->no_dd_check) {
+         assert(nr_inst == pre_emit_nr_inst + 1 ||
+                !"no_dd_check or no_dd_clear set for IR emitting more "
+                 "than 1 instruction");
+
+         gen8_instruction *last = &store[pre_emit_nr_inst];
+         gen8_set_no_dd_clear(last, ir->no_dd_clear);
+         gen8_set_no_dd_check(last, ir->no_dd_check);
+      }
+
+      if (unlikely(debug_flag)) {
+         disassemble(stderr, last_native_inst_offset, next_inst_offset);
+      }
+
+      last_native_inst_offset = next_inst_offset;
+   }
+
+   if (unlikely(debug_flag)) {
+      fprintf(stderr, "\n");
+   }
+
+   patch_jump_targets();
+
+   /* OK, while the INTEL_DEBUG=vs above is very nice for debugging VS
+    * emit issues, it doesn't get the jump distances into the output,
+    * which is often something we want to debug.  So this is here in
+    * case you're doing that.
+    */
+   if (0 && unlikely(debug_flag)) {
+      disassemble(stderr, 0, next_inst_offset);
+   }
+}
+
+const unsigned *
+gen8_vec4_generator::generate_assembly(exec_list *instructions,
+                                       unsigned *assembly_size)
+{
+   default_state.access_mode = BRW_ALIGN_16;
+   default_state.exec_size = BRW_EXECUTE_8;
+   generate_code(instructions);
+   *assembly_size = next_inst_offset;
+   return (const unsigned *) store;
+}
+
+} /* namespace brw */
diff --git a/icd/intel/compiler/pipeline/intel_chipset.h b/icd/intel/compiler/pipeline/intel_chipset.h
new file mode 100644
index 0000000..a760dee
--- /dev/null
+++ b/icd/intel/compiler/pipeline/intel_chipset.h
@@ -0,0 +1,246 @@
+ /*
+ * Copyright © 2007 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#pragma once
+#include <stdbool.h>
+
+#define PCI_CHIP_IGD_GM			0xA011
+#define PCI_CHIP_IGD_G			0xA001
+
+#define IS_IGDGM(devid)	(devid == PCI_CHIP_IGD_GM)
+#define IS_IGDG(devid)	(devid == PCI_CHIP_IGD_G)
+#define IS_IGD(devid) (IS_IGDG(devid) || IS_IGDGM(devid))
+
+#define PCI_CHIP_I965_G			0x29A2
+#define PCI_CHIP_I965_Q			0x2992
+#define PCI_CHIP_I965_G_1		0x2982
+#define PCI_CHIP_I946_GZ		0x2972
+#define PCI_CHIP_I965_GM                0x2A02
+#define PCI_CHIP_I965_GME               0x2A12
+
+#define PCI_CHIP_GM45_GM                0x2A42
+
+#define PCI_CHIP_IGD_E_G                0x2E02
+#define PCI_CHIP_Q45_G                  0x2E12
+#define PCI_CHIP_G45_G                  0x2E22
+#define PCI_CHIP_G41_G                  0x2E32
+#define PCI_CHIP_B43_G                  0x2E42
+#define PCI_CHIP_B43_G1                 0x2E92
+
+#define PCI_CHIP_ILD_G                  0x0042
+#define PCI_CHIP_ILM_G                  0x0046
+
+#define PCI_CHIP_SANDYBRIDGE_GT1	0x0102	/* Desktop */
+#define PCI_CHIP_SANDYBRIDGE_GT2	0x0112
+#define PCI_CHIP_SANDYBRIDGE_GT2_PLUS	0x0122
+#define PCI_CHIP_SANDYBRIDGE_M_GT1	0x0106	/* Mobile */
+#define PCI_CHIP_SANDYBRIDGE_M_GT2	0x0116
+#define PCI_CHIP_SANDYBRIDGE_M_GT2_PLUS	0x0126
+#define PCI_CHIP_SANDYBRIDGE_S		0x010A	/* Server */
+
+#define PCI_CHIP_IVYBRIDGE_GT1          0x0152  /* Desktop */
+#define PCI_CHIP_IVYBRIDGE_GT2          0x0162
+#define PCI_CHIP_IVYBRIDGE_M_GT1        0x0156  /* Mobile */
+#define PCI_CHIP_IVYBRIDGE_M_GT2        0x0166
+#define PCI_CHIP_IVYBRIDGE_S_GT1        0x015a  /* Server */
+#define PCI_CHIP_IVYBRIDGE_S_GT2        0x016a
+
+#define PCI_CHIP_BAYTRAIL_M_1           0x0F31
+#define PCI_CHIP_BAYTRAIL_M_2           0x0F32
+#define PCI_CHIP_BAYTRAIL_M_3           0x0F33
+#define PCI_CHIP_BAYTRAIL_M_4           0x0157
+#define PCI_CHIP_BAYTRAIL_D             0x0155
+
+#define PCI_CHIP_HASWELL_GT1            0x0402 /* Desktop */
+#define PCI_CHIP_HASWELL_GT2            0x0412
+#define PCI_CHIP_HASWELL_GT3            0x0422
+#define PCI_CHIP_HASWELL_M_GT1          0x0406 /* Mobile */
+#define PCI_CHIP_HASWELL_M_GT2          0x0416
+#define PCI_CHIP_HASWELL_M_GT3          0x0426
+#define PCI_CHIP_HASWELL_S_GT1          0x040A /* Server */
+#define PCI_CHIP_HASWELL_S_GT2          0x041A
+#define PCI_CHIP_HASWELL_S_GT3          0x042A
+#define PCI_CHIP_HASWELL_B_GT1          0x040B /* Reserved */
+#define PCI_CHIP_HASWELL_B_GT2          0x041B
+#define PCI_CHIP_HASWELL_B_GT3          0x042B
+#define PCI_CHIP_HASWELL_E_GT1          0x040E /* Reserved */
+#define PCI_CHIP_HASWELL_E_GT2          0x041E
+#define PCI_CHIP_HASWELL_E_GT3          0x042E
+#define PCI_CHIP_HASWELL_SDV_GT1        0x0C02 /* Desktop */
+#define PCI_CHIP_HASWELL_SDV_GT2        0x0C12
+#define PCI_CHIP_HASWELL_SDV_GT3        0x0C22
+#define PCI_CHIP_HASWELL_SDV_M_GT1      0x0C06 /* Mobile */
+#define PCI_CHIP_HASWELL_SDV_M_GT2      0x0C16
+#define PCI_CHIP_HASWELL_SDV_M_GT3      0x0C26
+#define PCI_CHIP_HASWELL_SDV_S_GT1      0x0C0A /* Server */
+#define PCI_CHIP_HASWELL_SDV_S_GT2      0x0C1A
+#define PCI_CHIP_HASWELL_SDV_S_GT3      0x0C2A
+#define PCI_CHIP_HASWELL_SDV_B_GT1      0x0C0B /* Reserved */
+#define PCI_CHIP_HASWELL_SDV_B_GT2      0x0C1B
+#define PCI_CHIP_HASWELL_SDV_B_GT3      0x0C2B
+#define PCI_CHIP_HASWELL_SDV_E_GT1      0x0C0E /* Reserved */
+#define PCI_CHIP_HASWELL_SDV_E_GT2      0x0C1E
+#define PCI_CHIP_HASWELL_SDV_E_GT3      0x0C2E
+#define PCI_CHIP_HASWELL_ULT_GT1        0x0A02 /* Desktop */
+#define PCI_CHIP_HASWELL_ULT_GT2        0x0A12
+#define PCI_CHIP_HASWELL_ULT_GT3        0x0A22
+#define PCI_CHIP_HASWELL_ULT_M_GT1      0x0A06 /* Mobile */
+#define PCI_CHIP_HASWELL_ULT_M_GT2      0x0A16
+#define PCI_CHIP_HASWELL_ULT_M_GT3      0x0A26
+#define PCI_CHIP_HASWELL_ULT_S_GT1      0x0A0A /* Server */
+#define PCI_CHIP_HASWELL_ULT_S_GT2      0x0A1A
+#define PCI_CHIP_HASWELL_ULT_S_GT3      0x0A2A
+#define PCI_CHIP_HASWELL_ULT_B_GT1      0x0A0B /* Reserved */
+#define PCI_CHIP_HASWELL_ULT_B_GT2      0x0A1B
+#define PCI_CHIP_HASWELL_ULT_B_GT3      0x0A2B
+#define PCI_CHIP_HASWELL_ULT_E_GT1      0x0A0E /* Reserved */
+#define PCI_CHIP_HASWELL_ULT_E_GT2      0x0A1E
+#define PCI_CHIP_HASWELL_ULT_E_GT3      0x0A2E
+#define PCI_CHIP_HASWELL_CRW_GT1        0x0D02 /* Desktop */
+#define PCI_CHIP_HASWELL_CRW_GT2        0x0D12
+#define PCI_CHIP_HASWELL_CRW_GT3        0x0D22
+#define PCI_CHIP_HASWELL_CRW_M_GT1      0x0D06 /* Mobile */
+#define PCI_CHIP_HASWELL_CRW_M_GT2      0x0D16
+#define PCI_CHIP_HASWELL_CRW_M_GT3      0x0D26
+#define PCI_CHIP_HASWELL_CRW_S_GT1      0x0D0A /* Server */
+#define PCI_CHIP_HASWELL_CRW_S_GT2      0x0D1A
+#define PCI_CHIP_HASWELL_CRW_S_GT3      0x0D2A
+#define PCI_CHIP_HASWELL_CRW_B_GT1      0x0D0B /* Reserved */
+#define PCI_CHIP_HASWELL_CRW_B_GT2      0x0D1B
+#define PCI_CHIP_HASWELL_CRW_B_GT3      0x0D2B
+#define PCI_CHIP_HASWELL_CRW_E_GT1      0x0D0E /* Reserved */
+#define PCI_CHIP_HASWELL_CRW_E_GT2      0x0D1E
+#define PCI_CHIP_HASWELL_CRW_E_GT3      0x0D2E
+
+#define IS_G45(devid)           (devid == PCI_CHIP_IGD_E_G || \
+                                 devid == PCI_CHIP_Q45_G || \
+                                 devid == PCI_CHIP_G45_G || \
+                                 devid == PCI_CHIP_G41_G || \
+                                 devid == PCI_CHIP_B43_G || \
+                                 devid == PCI_CHIP_B43_G1)
+#define IS_GM45(devid)          (devid == PCI_CHIP_GM45_GM)
+#define IS_G4X(devid)		(IS_G45(devid) || IS_GM45(devid))
+
+#define IS_ILD(devid)           (devid == PCI_CHIP_ILD_G)
+#define IS_ILM(devid)           (devid == PCI_CHIP_ILM_G)
+#define IS_GEN5(devid)          (IS_ILD(devid) || IS_ILM(devid))
+
+#define IS_SNB_GT1(devid)	(devid == PCI_CHIP_SANDYBRIDGE_GT1 || \
+				 devid == PCI_CHIP_SANDYBRIDGE_M_GT1 || \
+				 devid == PCI_CHIP_SANDYBRIDGE_S)
+
+#define IS_SNB_GT2(devid)	(devid == PCI_CHIP_SANDYBRIDGE_GT2 || \
+				 devid == PCI_CHIP_SANDYBRIDGE_GT2_PLUS	|| \
+				 devid == PCI_CHIP_SANDYBRIDGE_M_GT2 || \
+				 devid == PCI_CHIP_SANDYBRIDGE_M_GT2_PLUS)
+
+#define IS_GEN6(devid)		(IS_SNB_GT1(devid) || IS_SNB_GT2(devid))
+
+#define IS_IVB_GT1(devid)       (devid == PCI_CHIP_IVYBRIDGE_GT1 || \
+				 devid == PCI_CHIP_IVYBRIDGE_M_GT1 || \
+				 devid == PCI_CHIP_IVYBRIDGE_S_GT1)
+
+#define IS_IVB_GT2(devid)       (devid == PCI_CHIP_IVYBRIDGE_GT2 || \
+				 devid == PCI_CHIP_IVYBRIDGE_M_GT2 || \
+				 devid == PCI_CHIP_IVYBRIDGE_S_GT2)
+
+#define IS_IVYBRIDGE(devid)     (IS_IVB_GT1(devid) || IS_IVB_GT2(devid))
+
+#define IS_BAYTRAIL(devid)      (devid == PCI_CHIP_BAYTRAIL_M_1 || \
+                                 devid == PCI_CHIP_BAYTRAIL_M_2 || \
+                                 devid == PCI_CHIP_BAYTRAIL_M_3 || \
+                                 devid == PCI_CHIP_BAYTRAIL_M_4 || \
+                                 devid == PCI_CHIP_BAYTRAIL_D)
+
+#define IS_GEN7(devid)	        (IS_IVYBRIDGE(devid) || \
+				 IS_BAYTRAIL(devid) || \
+				 IS_HASWELL(devid))
+
+#define IS_HSW_GT1(devid)	(devid == PCI_CHIP_HASWELL_GT1 || \
+				 devid == PCI_CHIP_HASWELL_M_GT1 || \
+				 devid == PCI_CHIP_HASWELL_S_GT1 || \
+				 devid == PCI_CHIP_HASWELL_B_GT1 || \
+				 devid == PCI_CHIP_HASWELL_E_GT1 || \
+				 devid == PCI_CHIP_HASWELL_SDV_GT1 || \
+				 devid == PCI_CHIP_HASWELL_SDV_M_GT1 || \
+				 devid == PCI_CHIP_HASWELL_SDV_S_GT1 || \
+				 devid == PCI_CHIP_HASWELL_SDV_B_GT1 || \
+				 devid == PCI_CHIP_HASWELL_SDV_E_GT1 || \
+				 devid == PCI_CHIP_HASWELL_ULT_GT1 || \
+				 devid == PCI_CHIP_HASWELL_ULT_M_GT1 || \
+				 devid == PCI_CHIP_HASWELL_ULT_S_GT1 || \
+				 devid == PCI_CHIP_HASWELL_ULT_B_GT1 || \
+				 devid == PCI_CHIP_HASWELL_ULT_E_GT1 || \
+				 devid == PCI_CHIP_HASWELL_CRW_GT1 || \
+				 devid == PCI_CHIP_HASWELL_CRW_M_GT1 || \
+				 devid == PCI_CHIP_HASWELL_CRW_S_GT1 || \
+				 devid == PCI_CHIP_HASWELL_CRW_B_GT1 || \
+				 devid == PCI_CHIP_HASWELL_CRW_E_GT1)
+#define IS_HSW_GT2(devid)	(devid == PCI_CHIP_HASWELL_GT2 || \
+				 devid == PCI_CHIP_HASWELL_M_GT2 || \
+				 devid == PCI_CHIP_HASWELL_S_GT2 || \
+				 devid == PCI_CHIP_HASWELL_B_GT2 || \
+				 devid == PCI_CHIP_HASWELL_E_GT2 || \
+				 devid == PCI_CHIP_HASWELL_SDV_GT2 || \
+				 devid == PCI_CHIP_HASWELL_SDV_M_GT2 || \
+				 devid == PCI_CHIP_HASWELL_SDV_S_GT2 || \
+				 devid == PCI_CHIP_HASWELL_SDV_B_GT2 || \
+				 devid == PCI_CHIP_HASWELL_SDV_E_GT2 || \
+				 devid == PCI_CHIP_HASWELL_ULT_GT2 || \
+				 devid == PCI_CHIP_HASWELL_ULT_M_GT2 || \
+				 devid == PCI_CHIP_HASWELL_ULT_S_GT2 || \
+				 devid == PCI_CHIP_HASWELL_ULT_B_GT2 || \
+				 devid == PCI_CHIP_HASWELL_ULT_E_GT2 || \
+				 devid == PCI_CHIP_HASWELL_CRW_GT2 || \
+				 devid == PCI_CHIP_HASWELL_CRW_M_GT2 || \
+				 devid == PCI_CHIP_HASWELL_CRW_S_GT2 || \
+				 devid == PCI_CHIP_HASWELL_CRW_B_GT2 || \
+				 devid == PCI_CHIP_HASWELL_CRW_E_GT2)
+#define IS_HSW_GT3(devid)	(devid == PCI_CHIP_HASWELL_GT3 || \
+				 devid == PCI_CHIP_HASWELL_M_GT3 || \
+				 devid == PCI_CHIP_HASWELL_S_GT3 || \
+				 devid == PCI_CHIP_HASWELL_B_GT3 || \
+				 devid == PCI_CHIP_HASWELL_E_GT3 || \
+				 devid == PCI_CHIP_HASWELL_SDV_GT3 || \
+				 devid == PCI_CHIP_HASWELL_SDV_M_GT3 || \
+				 devid == PCI_CHIP_HASWELL_SDV_S_GT3 || \
+				 devid == PCI_CHIP_HASWELL_SDV_B_GT3 || \
+				 devid == PCI_CHIP_HASWELL_SDV_E_GT3 || \
+				 devid == PCI_CHIP_HASWELL_ULT_GT3 || \
+				 devid == PCI_CHIP_HASWELL_ULT_M_GT3 || \
+				 devid == PCI_CHIP_HASWELL_ULT_S_GT3 || \
+				 devid == PCI_CHIP_HASWELL_ULT_B_GT3 || \
+				 devid == PCI_CHIP_HASWELL_ULT_E_GT3 || \
+				 devid == PCI_CHIP_HASWELL_CRW_GT3 || \
+				 devid == PCI_CHIP_HASWELL_CRW_M_GT3 || \
+				 devid == PCI_CHIP_HASWELL_CRW_S_GT3 || \
+				 devid == PCI_CHIP_HASWELL_CRW_B_GT3 || \
+				 devid == PCI_CHIP_HASWELL_CRW_E_GT3)
+
+#define IS_HASWELL(devid)       (IS_HSW_GT1(devid) || \
+				 IS_HSW_GT2(devid) || \
+				 IS_HSW_GT3(devid))
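+
+/* Usage sketch (illustrative only): IS_GEN7() deliberately includes Haswell
+ * and Baytrail, so an Ivybridge/Baytrail-specific path is typically spelled
+ *
+ *    if (IS_GEN7(devid) && !IS_HASWELL(devid))
+ *       ...
+ */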
diff --git a/icd/intel/compiler/pipeline/intel_debug.c b/icd/intel/compiler/pipeline/intel_debug.c
new file mode 100644
index 0000000..790a37b
--- /dev/null
+++ b/icd/intel/compiler/pipeline/intel_debug.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright 2003 VMware, Inc.
+ * Copyright © 2006 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/**
+ * \file intel_debug.c
+ *
+ * Support for the INTEL_DEBUG environment variable, along with other
+ * miscellaneous debugging code.
+ */
+
+#include "brw_context.h"
+#include "intel_debug.h"
+//#include "utils.h"  // LunarG :
+
+//uint64_t INTEL_DEBUG = DEBUG_WM | DEBUG_NO16 | DEBUG_GS | DEBUG_VS;
+uint64_t INTEL_DEBUG = 0;
+
+//static const struct dri_debug_control debug_control[] = {
+//   { "tex",   DEBUG_TEXTURE},
+//   { "state", DEBUG_STATE},
+//   { "blit",  DEBUG_BLIT},
+//   { "mip",   DEBUG_MIPTREE},
+//   { "fall",  DEBUG_PERF},
+//   { "perf",  DEBUG_PERF},
+//   { "perfmon", DEBUG_PERFMON},
+//   { "bat",   DEBUG_BATCH},
+//   { "pix",   DEBUG_PIXEL},
+//   { "buf",   DEBUG_BUFMGR},
+//   { "fbo",   DEBUG_FBO},
+//   { "fs",    DEBUG_WM },
+//   { "gs",    DEBUG_GS},
+//   { "sync",  DEBUG_SYNC},
+//   { "prim",  DEBUG_PRIMS },
+//   { "vert",  DEBUG_VERTS },
+//   { "dri",   DEBUG_DRI },
+//   { "sf",    DEBUG_SF },
+//   { "stats", DEBUG_STATS },
+//   { "wm",    DEBUG_WM },
+//   { "urb",   DEBUG_URB },
+//   { "vs",    DEBUG_VS },
+//   { "clip",  DEBUG_CLIP },
+//   { "aub",   DEBUG_AUB },
+//   { "shader_time", DEBUG_SHADER_TIME },
+//   { "no16",  DEBUG_NO16 },
+//   { "blorp", DEBUG_BLORP },
+//   { "nodualobj", DEBUG_NO_DUAL_OBJECT_GS },
+//   { NULL,    0 }
+//};
+
+void
+brw_process_intel_debug_variable(struct brw_context *brw)
+{
+   // LunarG : For now, let's hard code our debug variables
+   INTEL_DEBUG = 0;
+
+//   INTEL_DEBUG = driParseDebugString(getenv("INTEL_DEBUG"), debug_control);
+//   if (INTEL_DEBUG & DEBUG_BUFMGR)
+//      dri_bufmgr_set_debug(brw->bufmgr, true);
+
+//   if ((INTEL_DEBUG & DEBUG_SHADER_TIME) && brw->gen < 7) {
+//      fprintf(stderr,
+//              "shader_time debugging requires gen7 (Ivybridge) or better.\n");
+//      INTEL_DEBUG &= ~DEBUG_SHADER_TIME;
+//   }
+
+//   if (INTEL_DEBUG & DEBUG_PERF)
+//      brw->perf_debug = true;
+
+//   if (INTEL_DEBUG & DEBUG_AUB)
+//      drm_intel_bufmgr_gem_set_aub_dump(brw->bufmgr, true);
+}
diff --git a/icd/intel/compiler/pipeline/intel_debug.h b/icd/intel/compiler/pipeline/intel_debug.h
new file mode 100644
index 0000000..6402cec
--- /dev/null
+++ b/icd/intel/compiler/pipeline/intel_debug.h
@@ -0,0 +1,109 @@
+/*
+ * Copyright 2003 VMware, Inc.
+ * Copyright © 2007 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining
+ * a copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sublicense, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial
+ * portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+#pragma once
+
+/**
+ * \file intel_debug.h
+ *
+ * Basic INTEL_DEBUG environment variable handling.  This file defines the
+ * list of debugging flags, as well as some macros for handling them.
+ */
+
+extern uint64_t INTEL_DEBUG;
+
+#define DEBUG_TEXTURE	  0x1
+#define DEBUG_STATE	  0x2
+#define DEBUG_BLIT	  0x8
+#define DEBUG_MIPTREE     0x10
+#define DEBUG_PERF	  0x20
+#define DEBUG_PERFMON     0x40
+#define DEBUG_BATCH       0x80
+#define DEBUG_PIXEL       0x100
+#define DEBUG_BUFMGR      0x200
+#define DEBUG_FBO         0x800
+#define DEBUG_GS          0x1000
+#define DEBUG_SYNC	  0x2000
+#define DEBUG_PRIMS	  0x4000
+#define DEBUG_VERTS	  0x8000
+#define DEBUG_DRI         0x10000
+#define DEBUG_SF          0x20000
+#define DEBUG_STATS       0x100000
+#define DEBUG_WM          0x400000
+#define DEBUG_URB         0x800000
+#define DEBUG_VS          0x1000000
+#define DEBUG_CLIP        0x2000000
+#define DEBUG_AUB         0x4000000
+#define DEBUG_SHADER_TIME 0x8000000
+#define DEBUG_BLORP       0x10000000
+#define DEBUG_NO16        0x20000000
+#define DEBUG_VUE         0x40000000
+#define DEBUG_NO_DUAL_OBJECT_GS 0x80000000
+
+#ifdef HAVE_ANDROID_PLATFORM
+#define LOG_TAG "INTEL-MESA"
+#include <cutils/log.h>
+#ifndef ALOGW
+#define ALOGW LOGW
+#endif
+#define dbg_printf(...)	ALOGW(__VA_ARGS__)
+#else
+#define dbg_printf(...)	fprintf(stderr, __VA_ARGS__)
+#endif /* HAVE_ANDROID_PLATFORM */
+
+#define DBG(...) do {						\
+	if (unlikely(INTEL_DEBUG & FILE_DEBUG_FLAG))		\
+		dbg_printf(__VA_ARGS__);			\
+} while(0)
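+
+/* Usage sketch (assuming the Mesa convention that each .c file defines its
+ * own FILE_DEBUG_FLAG before using DBG):
+ *
+ *    #define FILE_DEBUG_FLAG DEBUG_MIPTREE
+ *    DBG("%s: mapping level %u\n", __func__, level);
+ */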
+
+#define perf_debug(...) do {					\
+   static GLuint msg_id = 0;                                    \
+   if (unlikely(INTEL_DEBUG & DEBUG_PERF))                      \
+      dbg_printf(__VA_ARGS__);                                  \
+   if (brw->perf_debug)                                         \
+      _mesa_gl_debug(&brw->ctx, &msg_id,                        \
+                     MESA_DEBUG_TYPE_PERFORMANCE,               \
+                     MESA_DEBUG_SEVERITY_MEDIUM,                \
+                     __VA_ARGS__);                              \
+} while(0)
+
+#define WARN_ONCE(cond, fmt...) do {                            \
+   if (unlikely(cond)) {                                        \
+      static bool _warned = false;                              \
+      static GLuint msg_id = 0;                                 \
+      if (!_warned) {                                           \
+         fprintf(stderr, "WARNING: ");                          \
+         fprintf(stderr, fmt);                                  \
+         _warned = true;                                        \
+                                                                \
+         _mesa_gl_debug(ctx, &msg_id,                           \
+                        MESA_DEBUG_TYPE_OTHER,                  \
+                        MESA_DEBUG_SEVERITY_HIGH, fmt);         \
+      }                                                         \
+   }                                                            \
+} while (0)
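+
+/* Note that WARN_ONCE expects a struct gl_context pointer named ctx to be
+ * in scope at the call site, just as perf_debug() above expects brw.
+ */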
+
+struct brw_context;
+
+extern void brw_process_intel_debug_variable(struct brw_context *brw);
diff --git a/icd/intel/compiler/pipeline/intel_mipmap_tree.h b/icd/intel/compiler/pipeline/intel_mipmap_tree.h
new file mode 100644
index 0000000..3f5f4e8
--- /dev/null
+++ b/icd/intel/compiler/pipeline/intel_mipmap_tree.h
@@ -0,0 +1,702 @@
+/**************************************************************************
+ *
+ * Copyright 2006 VMware, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **************************************************************************/
+
+/** @file intel_mipmap_tree.h
+ *
+ * This file defines the structure that wraps a BO and describes how the
+ * mipmap levels and slices of a texture are laid out.
+ *
+ * The hardware has a fixed layout of a texture depending on parameters such
+ * as the target/type (2D, 3D, CUBE), width, height, pitch, and number of
+ * mipmap levels.  The individual level/layer slices are each 2D rectangles of
+ * pixels at some x/y offset from the start of the drm_intel_bo.
+ *
+ * Original OpenGL allowed texture miplevels to be specified in arbitrary
+ * order, and a texture may change size over time.  Thus, each
+ * intel_texture_image has a reference to a miptree that contains the pixel
+ * data sized appropriately for it, which will later be referenced by/copied
+ * to the intel_texture_object at draw time (intel_finalize_mipmap_tree()) so
+ * that there's a single miptree for the complete texture.
+ */
+
+#ifndef INTEL_MIPMAP_TREE_H
+#define INTEL_MIPMAP_TREE_H
+
+#include <assert.h>
+
+#include "main/mtypes.h"
+#include "intel_resolve_map.h"
+#include <GL/internal/dri_interface.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct brw_context;
+struct intel_renderbuffer;
+
+struct intel_resolve_map;
+struct intel_texture_image;
+
+/**
+ * When calling intel_miptree_map() on an ETC-transcoded-to-RGB miptree or a
+ * depthstencil-split-to-separate-stencil miptree, we'll normally make a
+ * temporary and recreate the kind of data requested by Mesa core, since we're
+ * satisfying some glGetTexImage() request or something.
+ *
+ * However, occasionally you want to actually map the miptree's current data
+ * without transcoding back.  This flag to intel_miptree_map() gets you that.
+ */
+#define BRW_MAP_DIRECT_BIT	0x80000000
+
+struct intel_miptree_map {
+   /** Bitfield of GL_MAP_READ_BIT, GL_MAP_WRITE_BIT, GL_MAP_INVALIDATE_BIT */
+   GLbitfield mode;
+   /** Region of interest for the map. */
+   int x, y, w, h;
+   /** Possibly malloced temporary buffer for the mapping. */
+   void *buffer;
+   /** Possible pointer to a temporary linear miptree for the mapping. */
+   struct intel_mipmap_tree *mt;
+   /** Pointer to the start of (map_x, map_y) returned by the mapping. */
+   void *ptr;
+   /** Stride of the mapping. */
+   int stride;
+};
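+
+/* Usage sketch for BRW_MAP_DIRECT_BIT (illustrative, using the map/unmap
+ * entry points declared later in this header):
+ *
+ *    void *ptr;
+ *    int stride;
+ *    intel_miptree_map(brw, mt, level, slice, x, y, w, h,
+ *                      GL_MAP_READ_BIT | BRW_MAP_DIRECT_BIT, &ptr, &stride);
+ *    ...
+ *    intel_miptree_unmap(brw, mt, level, slice);
+ */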
+
+/**
+ * Describes the location of each texture image within a miptree.
+ */
+struct intel_mipmap_level
+{
+   /** Offset to this miptree level, used in computing x_offset. */
+   GLuint level_x;
+   /** Offset to this miptree level, used in computing y_offset. */
+   GLuint level_y;
+
+   /**
+    * \brief Number of 2D slices in this miplevel.
+    *
+    * The exact semantics of depth varies according to the texture target:
+    *    - For GL_TEXTURE_CUBE_MAP, depth is 6.
+    *    - For GL_TEXTURE_2D_ARRAY, depth is the number of array slices. It is
+    *      identical for all miplevels in the texture.
+    *    - For GL_TEXTURE_3D, it is the texture's depth at this miplevel. Its
+    *      value, like width and height, varies with miplevel.
+    *    - For other texture types, depth is 1.
+    *    - Additionally, for UMS and CMS miptrees, depth is multiplied by
+    *      sample count.
+    */
+   GLuint depth;
+
+   /**
+    * \brief List of 2D images in this mipmap level.
+    *
+    * This may be a list of cube faces, array slices in 2D array texture, or
+    * layers in a 3D texture. The list's length is \c depth.
+    */
+   struct intel_mipmap_slice {
+      /**
+       * \name Offset to slice
+       * \{
+       *
+       * Hardware formats are so diverse that there is no unified way to
+       * compute the slice offsets, so we store them in this table.
+       *
+       * The (x, y) offset to slice \c s at level \c l relative to the
+       * miptree's base address is
+       * \code
+       *     x = mt->level[l].slice[s].x_offset
+       *     y = mt->level[l].slice[s].y_offset
+       * \endcode
+       */
+      GLuint x_offset;
+      GLuint y_offset;
+      /** \} */
+
+      /**
+       * Mapping information. Persistent for the duration of
+       * intel_miptree_map/unmap on this slice.
+       */
+      struct intel_miptree_map *map;
+
+      /**
+       * \brief Is HiZ enabled for this slice?
+       *
+       * If \c mt->level[l].slice[s].has_hiz is set, then (1) \c mt->hiz_mt
+       * has been allocated and (2) the HiZ memory corresponding to this slice
+       * resides at \c mt->hiz_mt->level[l].slice[s].
+       */
+      bool has_hiz;
+   } *slice;
+};
+
+/**
+ * Enum for keeping track of the different MSAA layouts supported by Gen7.
+ */
+enum intel_msaa_layout
+{
+   /**
+    * Ordinary surface with no MSAA.
+    */
+   INTEL_MSAA_LAYOUT_NONE,
+
+   /**
+    * Interleaved Multisample Surface.  The additional samples are
+    * accommodated by scaling up the width and the height of the surface so
+    * that all the samples corresponding to a pixel are located at nearby
+    * memory locations.
+    */
+   INTEL_MSAA_LAYOUT_IMS,
+
+   /**
+    * Uncompressed Multisample Surface.  The surface is stored as a 2D array,
+    * with array slice n containing all pixel data for sample n.
+    */
+   INTEL_MSAA_LAYOUT_UMS,
+
+   /**
+    * Compressed Multisample Surface.  The surface is stored as in
+    * INTEL_MSAA_LAYOUT_UMS, but there is an additional buffer called the MCS
+    * (Multisample Control Surface) buffer.  Each pixel in the MCS buffer
+    * indicates the mapping from sample number to array slice.  This allows
+    * the common case (where all samples constituting a pixel have the same
+    * color value) to be stored efficiently by just using a single array
+    * slice.
+    */
+   INTEL_MSAA_LAYOUT_CMS,
+};
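+
+/* A concrete reading of the above (illustrative): an 8x UMS surface for a
+ * 64x64 logical buffer is stored as a 2D array of 8 slices, slice n holding
+ * sample n of every pixel; CMS adds an MCS buffer on top so that pixels
+ * whose samples all agree can be backed by a single slice.
+ */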
+
+/**
+ * Enum for keeping track of the fast clear state of a buffer associated with
+ * a miptree.
+ *
+ * Fast clear works by deferring the memory writes that would be used to clear
+ * the buffer, so that instead of performing them at the time of the clear
+ * operation, the hardware automatically performs them at the time that the
+ * buffer is later accessed for rendering.  The MCS buffer keeps track of
+ * which regions of the buffer still have pending clear writes.
+ *
+ * This enum keeps track of the driver's knowledge of pending fast clears in
+ * the MCS buffer.
+ *
+ * MCS buffers only exist on Gen7+.
+ */
+enum intel_fast_clear_state
+{
+   /**
+    * There is no MCS buffer for this miptree, and one should never be
+    * allocated.
+    */
+   INTEL_FAST_CLEAR_STATE_NO_MCS,
+
+   /**
+    * No deferred clears are pending for this miptree, and the contents of the
+    * color buffer are entirely correct.  An MCS buffer may or may not exist
+    * for this miptree.  If it does exist, it is entirely in the "no deferred
+    * clears pending" state.  If it does not exist, it will be created the
+    * first time a fast color clear is executed.
+    *
+    * In this state, the color buffer can be used for purposes other than
+    * rendering without needing a render target resolve.
+    *
+    * Since there is no such thing as a "fast color clear resolve" for MSAA
+    * buffers, an MSAA buffer will never be in this state.
+    */
+   INTEL_FAST_CLEAR_STATE_RESOLVED,
+
+   /**
+    * An MCS buffer exists for this miptree, and deferred clears are pending
+    * for some regions of the color buffer, as indicated by the MCS buffer.
+    * The contents of the color buffer are only correct for the regions where
+    * the MCS buffer doesn't indicate a deferred clear.
+    *
+    * If a single-sample buffer is in this state, a render target resolve must
+    * be performed before it can be used for purposes other than rendering.
+    */
+   INTEL_FAST_CLEAR_STATE_UNRESOLVED,
+
+   /**
+    * An MCS buffer exists for this miptree, and deferred clears are pending
+    * for the entire color buffer, and the contents of the MCS buffer reflect
+    * this.  The contents of the color buffer are undefined.
+    *
+    * If a single-sample buffer is in this state, a render target resolve must
+    * be performed before it can be used for purposes other than rendering.
+    *
+    * If the client attempts to clear a buffer which is already in this state,
+    * the clear can be safely skipped, since the buffer is already clear.
+    */
+   INTEL_FAST_CLEAR_STATE_CLEAR,
+};
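+
+/* Sketch of how the states above are consumed (an illustration, not
+ * normative): before a single-sample buffer is used for anything other than
+ * rendering, pending fast clears must be resolved, e.g.
+ *
+ *    if (mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_UNRESOLVED ||
+ *        mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_CLEAR)
+ *       intel_miptree_resolve_color(brw, mt);
+ *
+ * which returns the miptree to INTEL_FAST_CLEAR_STATE_RESOLVED.
+ */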
+
+struct intel_mipmap_tree
+{
+   /** Buffer object containing the pixel data. */
+   drm_intel_bo *bo;
+
+   uint32_t pitch; /**< pitch in bytes. */
+
+   uint32_t tiling; /**< One of the I915_TILING_* flags */
+
+   /* Effectively the key:
+    */
+   GLenum target;
+
+   /**
+    * Generally, this is just the same as the gl_texture_image->TexFormat or
+    * gl_renderbuffer->Format.
+    *
+    * However, for textures and renderbuffers with packed depth/stencil formats
+    * on hardware where we want or need to use separate stencil, there will be
+    * two miptrees for storing the data.  If the depthstencil texture or rb is
+    * MESA_FORMAT_Z32_FLOAT_S8X24_UINT, then mt->format will be
+    * MESA_FORMAT_Z_FLOAT32, otherwise for MESA_FORMAT_Z24_UNORM_S8_UINT objects it will be
+    * MESA_FORMAT_Z24_UNORM_X8_UINT.
+    *
+    * For ETC1/ETC2 textures, this is one of the uncompressed mesa texture
+    * formats if the hardware lacks support for ETC1/ETC2. See @ref etc_format.
+    */
+   mesa_format format;
+
+   /** The original ETC texture format, when \c format holds its transcoded
+    *  (uncompressed) equivalent. */
+   mesa_format etc_format;
+
+   /**
+    * The X offset of each image in the miptree must be aligned to this.
+    * See the comments in brw_tex_layout.c.
+    */
+   unsigned int align_w;
+   unsigned int align_h; /**< \see align_w */
+
+   GLuint first_level;
+   GLuint last_level;
+
+   /**
+    * Level zero image dimensions.  These dimensions correspond to the
+    * physical layout of data in memory.  Accordingly, they account for the
+    * extra width, height, and or depth that must be allocated in order to
+    * accommodate multisample formats, and they account for the extra factor
+    * of 6 in depth that must be allocated in order to accommodate cubemap
+    * textures.
+    */
+   GLuint physical_width0, physical_height0, physical_depth0;
+
+   GLuint cpp; /**< bytes per pixel */
+   GLuint num_samples;
+   bool compressed;
+
+   /**
+    * Level zero image dimensions.  These dimensions correspond to the
+    * logical width, height, and depth of the texture as seen by client code.
+    * Accordingly, they do not account for the extra width, height, and/or
+    * depth that must be allocated in order to accommodate multisample
+    * formats, nor do they account for the extra factor of 6 in depth that
+    * must be allocated in order to accommodate cubemap textures.
+    */
+   uint32_t logical_width0, logical_height0, logical_depth0;
+
+   /**
+    * For 1D array, 2D array, cube, and 2D multisampled surfaces on Gen7: true
+    * if the surface contains only LOD 0, and hence no space is allocated for
+    * LODs other than 0 between array slices.
+    *
+    * Corresponds to the surface_array_spacing bit in gen7_surface_state.
+    */
+   bool array_spacing_lod0;
+
+   /**
+    * The distance in rows between array slices in an uncompressed surface.
+    *
+    * For compressed surfaces, slices are stored closer together physically;
+    * the real distance is (qpitch / block height).
+    */
+   uint32_t qpitch;
+
+   /**
+    * MSAA layout used by this buffer.
+    */
+   enum intel_msaa_layout msaa_layout;
+
+   /* Derived from the above:
+    */
+   GLuint total_width;
+   GLuint total_height;
+
+   /* The 3DSTATE_CLEAR_PARAMS value associated with the last depth clear to
+    * this depth mipmap tree, if any.
+    */
+   uint32_t depth_clear_value;
+
+   /* Includes image offset tables:
+    */
+   struct intel_mipmap_level level[MAX_TEXTURE_LEVELS];
+
+   /* Offset into bo where miptree starts:
+    */
+   uint32_t offset;
+
+   /**
+    * \brief HiZ miptree
+    *
+    * The hiz miptree contains the miptree's hiz buffer. To allocate the hiz
+    * miptree, use intel_miptree_alloc_hiz().
+    *
+    * To determine if hiz is enabled, do not check this pointer. Instead, use
+    * intel_miptree_slice_has_hiz().
+    */
+   struct intel_mipmap_tree *hiz_mt;
+
+   /**
+    * \brief Map of miptree slices to needed resolves.
+    *
+    * This is used only when the miptree has a child HiZ miptree.
+    *
+    * Let \c mt be a depth miptree with HiZ enabled. Then the resolve map is
+    * \c mt->hiz_map. The resolve map of the child HiZ miptree, \c
+    * mt->hiz_mt->hiz_map, is unused.
+    */
+   struct intel_resolve_map hiz_map;
+
+   /**
+    * \brief Stencil miptree for depthstencil textures.
+    *
+    * This miptree is used for depthstencil textures and renderbuffers that
+    * require separate stencil.  It always has the true copy of the stencil
+    * bits, regardless of mt->format.
+    *
+    * \see intel_miptree_map_depthstencil()
+    * \see intel_miptree_unmap_depthstencil()
+    */
+   struct intel_mipmap_tree *stencil_mt;
+
+   /**
+    * \brief MCS miptree.
+    *
+    * This miptree contains the "multisample control surface", which stores
+    * the necessary information to implement compressed MSAA
+    * (INTEL_MSAA_LAYOUT_CMS) and "fast color clear" behaviour on Gen7+.
+    *
+    * NULL if no MCS miptree is in use for this surface.
+    */
+   struct intel_mipmap_tree *mcs_mt;
+
+   /**
+    * Fast clear state for this buffer.
+    */
+   enum intel_fast_clear_state fast_clear_state;
+
+   /**
+    * The SURFACE_STATE bits associated with the last fast color clear to this
+    * color mipmap tree, if any.
+    *
+    * This value will only ever contain ones in bits 28-31, so it is safe to
+    * OR into dword 7 of SURFACE_STATE.
+    */
+   uint32_t fast_clear_color_value;
+
+   /* These are also refcounted:
+    */
+   GLuint refcount;
+};
+
+enum intel_miptree_tiling_mode {
+   INTEL_MIPTREE_TILING_ANY,
+   INTEL_MIPTREE_TILING_Y,
+   INTEL_MIPTREE_TILING_NONE,
+};
+
+bool
+intel_is_non_msrt_mcs_buffer_supported(struct brw_context *brw,
+                                       struct intel_mipmap_tree *mt);
+
+void
+intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
+                                 struct intel_mipmap_tree *mt,
+                                 unsigned *width_px, unsigned *height);
+
+bool
+intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw,
+                                 struct intel_mipmap_tree *mt);
+
+struct intel_mipmap_tree *intel_miptree_create(struct brw_context *brw,
+                                               GLenum target,
+					       mesa_format format,
+                                               GLuint first_level,
+                                               GLuint last_level,
+                                               GLuint width0,
+                                               GLuint height0,
+                                               GLuint depth0,
+					       bool expect_accelerated_upload,
+                                               GLuint num_samples,
+                                               enum intel_miptree_tiling_mode);
+
+struct intel_mipmap_tree *
+intel_miptree_create_layout(struct brw_context *brw,
+                            GLenum target,
+                            mesa_format format,
+                            GLuint first_level,
+                            GLuint last_level,
+                            GLuint width0,
+                            GLuint height0,
+                            GLuint depth0,
+                            bool for_bo,
+                            GLuint num_samples);
+
+struct intel_mipmap_tree *
+intel_miptree_create_for_bo(struct brw_context *brw,
+                            drm_intel_bo *bo,
+                            mesa_format format,
+                            uint32_t offset,
+                            uint32_t width,
+                            uint32_t height,
+                            int pitch);
+
+void
+intel_update_winsys_renderbuffer_miptree(struct brw_context *intel,
+                                         struct intel_renderbuffer *irb,
+                                         drm_intel_bo *bo,
+                                         uint32_t width, uint32_t height,
+                                         uint32_t pitch);
+
+/**
+ * Create a miptree appropriate as the storage for a non-texture renderbuffer.
+ * The miptree has the following properties:
+ *     - The target is GL_TEXTURE_2D.
+ *     - There are no levels other than the base level 0.
+ *     - Depth is 1.
+ */
+struct intel_mipmap_tree*
+intel_miptree_create_for_renderbuffer(struct brw_context *brw,
+                                      mesa_format format,
+                                      uint32_t width,
+                                      uint32_t height,
+                                      uint32_t num_samples);
+
+mesa_format
+intel_depth_format_for_depthstencil_format(mesa_format format);
+
+mesa_format
+intel_lower_compressed_format(struct brw_context *brw, mesa_format format);
+
+/** \brief Assert that the level and layer are valid for the miptree. */
+static inline void
+intel_miptree_check_level_layer(struct intel_mipmap_tree *mt,
+                                uint32_t level,
+                                uint32_t layer)
+{
+   assert(level >= mt->first_level);
+   assert(level <= mt->last_level);
+   assert(layer < mt->level[level].depth);
+}
+
+void intel_miptree_reference(struct intel_mipmap_tree **dst,
+                             struct intel_mipmap_tree *src);
+
+void intel_miptree_release(struct intel_mipmap_tree **mt);
+
+/* Check if an image fits an existing mipmap tree layout
+ */
+bool intel_miptree_match_image(struct intel_mipmap_tree *mt,
+                                    struct gl_texture_image *image);
+
+void
+intel_miptree_get_image_offset(const struct intel_mipmap_tree *mt,
+			       GLuint level, GLuint slice,
+			       GLuint *x, GLuint *y);
+
+void
+intel_miptree_get_dimensions_for_image(struct gl_texture_image *image,
+                                       int *width, int *height, int *depth);
+
+void
+intel_miptree_get_tile_masks(const struct intel_mipmap_tree *mt,
+                             uint32_t *mask_x, uint32_t *mask_y,
+                             bool map_stencil_as_y_tiled);
+
+uint32_t
+intel_miptree_get_tile_offsets(const struct intel_mipmap_tree *mt,
+                               GLuint level, GLuint slice,
+                               uint32_t *tile_x,
+                               uint32_t *tile_y);
+uint32_t
+intel_miptree_get_aligned_offset(const struct intel_mipmap_tree *mt,
+                                 uint32_t x, uint32_t y,
+                                 bool map_stencil_as_y_tiled);
+
+void intel_miptree_set_level_info(struct intel_mipmap_tree *mt,
+                                  GLuint level,
+                                  GLuint x, GLuint y, GLuint d);
+
+void intel_miptree_set_image_offset(struct intel_mipmap_tree *mt,
+                                    GLuint level,
+                                    GLuint img, GLuint x, GLuint y);
+
+void
+intel_miptree_copy_teximage(struct brw_context *brw,
+                            struct intel_texture_image *intelImage,
+                            struct intel_mipmap_tree *dst_mt, bool invalidate);
+
+bool
+intel_miptree_alloc_mcs(struct brw_context *brw,
+                        struct intel_mipmap_tree *mt,
+                        GLuint num_samples);
+
+/**
+ * \name Miptree HiZ functions
+ * \{
+ *
+ * It is safe to call the "slice_set_need_resolve" and "slice_resolve"
+ * functions on a miptree without HiZ. In that case, each function is a no-op.
+ */
+
+/**
+ * \brief Allocate the miptree's embedded HiZ miptree.
+ * \see intel_mipmap_tree:hiz_mt
+ * \return false if allocation failed
+ */
+
+bool
+intel_miptree_alloc_hiz(struct brw_context *brw,
+			struct intel_mipmap_tree *mt);
+
+bool
+intel_miptree_slice_has_hiz(struct intel_mipmap_tree *mt,
+                            uint32_t level,
+                            uint32_t layer);
+
+void
+intel_miptree_slice_set_needs_hiz_resolve(struct intel_mipmap_tree *mt,
+                                          uint32_t level,
+					  uint32_t depth);
+void
+intel_miptree_slice_set_needs_depth_resolve(struct intel_mipmap_tree *mt,
+                                            uint32_t level,
+					    uint32_t depth);
+
+void
+intel_miptree_set_all_slices_need_depth_resolve(struct intel_mipmap_tree *mt,
+                                                uint32_t level);
+
+/**
+ * \return false if no resolve was needed
+ */
+bool
+intel_miptree_slice_resolve_hiz(struct brw_context *brw,
+				struct intel_mipmap_tree *mt,
+				unsigned int level,
+				unsigned int depth);
+
+/**
+ * \return false if no resolve was needed
+ */
+bool
+intel_miptree_slice_resolve_depth(struct brw_context *brw,
+				  struct intel_mipmap_tree *mt,
+				  unsigned int level,
+				  unsigned int depth);
+
+/**
+ * \return false if no resolve was needed
+ */
+bool
+intel_miptree_all_slices_resolve_hiz(struct brw_context *brw,
+				     struct intel_mipmap_tree *mt);
+
+/**
+ * \return false if no resolve was needed
+ */
+bool
+intel_miptree_all_slices_resolve_depth(struct brw_context *brw,
+				       struct intel_mipmap_tree *mt);
+
+/**\}*/
+
+/**
+ * Update the fast clear state for a miptree to indicate that it has been used
+ * for rendering.
+ */
+static inline void
+intel_miptree_used_for_rendering(struct intel_mipmap_tree *mt)
+{
+   /* If the buffer was previously in fast clear state, change it to
+    * unresolved state, since it won't be guaranteed to be clear after
+    * rendering occurs.
+    */
+   if (mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_CLEAR)
+      mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_UNRESOLVED;
+}
+
+void
+intel_miptree_resolve_color(struct brw_context *brw,
+                            struct intel_mipmap_tree *mt);
+
+void
+intel_miptree_make_shareable(struct brw_context *brw,
+                             struct intel_mipmap_tree *mt);
+
+void
+intel_miptree_updownsample(struct brw_context *brw,
+                           struct intel_mipmap_tree *src,
+                           struct intel_mipmap_tree *dst);
+
+void brw_miptree_layout(struct brw_context *brw, struct intel_mipmap_tree *mt);
+
+void *intel_miptree_map_raw(struct brw_context *brw,
+                            struct intel_mipmap_tree *mt);
+
+void intel_miptree_unmap_raw(struct brw_context *brw,
+                             struct intel_mipmap_tree *mt);
+
+void
+intel_miptree_map(struct brw_context *brw,
+		  struct intel_mipmap_tree *mt,
+		  unsigned int level,
+		  unsigned int slice,
+		  unsigned int x,
+		  unsigned int y,
+		  unsigned int w,
+		  unsigned int h,
+		  GLbitfield mode,
+		  void **out_ptr,
+		  int *out_stride);
+
+void
+intel_miptree_unmap(struct brw_context *brw,
+		    struct intel_mipmap_tree *mt,
+		    unsigned int level,
+		    unsigned int slice);
+
+void
+intel_hiz_exec(struct brw_context *brw, struct intel_mipmap_tree *mt,
+	       unsigned int level, unsigned int layer, enum gen6_hiz_op op);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/icd/intel/compiler/pipeline/intel_resolve_map.h b/icd/intel/compiler/pipeline/intel_resolve_map.h
new file mode 100644
index 0000000..8504271
--- /dev/null
+++ b/icd/intel/compiler/pipeline/intel_resolve_map.h
@@ -0,0 +1,104 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#pragma once
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * For an overview of the HiZ operations, see the following sections of the
+ * Sandy Bridge PRM, Volume 1, Part 2:
+ *   - 7.5.3.1 Depth Buffer Clear
+ *   - 7.5.3.2 Depth Buffer Resolve
+ *   - 7.5.3.3 Hierarchical Depth Buffer Resolve
+ *
+ * Of these, two are recorded in the resolve map as operations pending on the
+ * buffer: the depth resolve and the HiZ resolve.
+ */
+enum gen6_hiz_op {
+   GEN6_HIZ_OP_DEPTH_CLEAR,
+   GEN6_HIZ_OP_DEPTH_RESOLVE,
+   GEN6_HIZ_OP_HIZ_RESOLVE,
+   GEN6_HIZ_OP_NONE,
+};
+
+/**
+ * \brief Map of miptree slices to needed resolves.
+ *
+ * The map is implemented as a linear doubly-linked list.
+ *
+ * In the intel_resolve_map*() functions, the \c head argument is not
+ * inspected for its data. It only serves as an anchor for the list.
+ *
+ * \par Design Discussion
+ *
+ *     There are two possible ways to record which miptree slices need
+ *     resolves. 1) Maintain a flag for every miptree slice in the texture,
+ *     likely in intel_mipmap_level::slice, or 2) maintain a list of only
+ *     those slices that need a resolve.
+ *
+ *     Immediately before drawing, a full depth resolve is performed on each
+ *     enabled depth texture. If design 1 were chosen, then at each draw call
+ *     it would be necessary to iterate over each miptree slice of each
+ *     enabled depth texture in order to query if each slice needed a resolve.
+ *     In the worst case, this would require 2^16 iterations: 16 texture
+ *     units, 16 miplevels, and 256 depth layers (assuming maximums for OpenGL
+ *     2.1).
+ *
+ *     By choosing design 2, the number of iterations is exactly the minimum
+ *     necessary.
+ */
+struct intel_resolve_map {
+   uint32_t level;
+   uint32_t layer;
+   enum gen6_hiz_op need;
+
+   struct intel_resolve_map *next;
+   struct intel_resolve_map *prev;
+};
+
+void
+intel_resolve_map_set(struct intel_resolve_map *head,
+		      uint32_t level,
+		      uint32_t layer,
+		      enum gen6_hiz_op need);
+
+struct intel_resolve_map*
+intel_resolve_map_get(struct intel_resolve_map *head,
+		      uint32_t level,
+		      uint32_t layer);
+
+void
+intel_resolve_map_remove(struct intel_resolve_map *elem);
+
+void
+intel_resolve_map_clear(struct intel_resolve_map *head);
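+
+/* Illustrative sketch (not from the original source) of how a caller might
+ * drive this API; `head` is assumed to be an embedded, zero-initialized
+ * anchor element whose own level/layer/need fields are never read:
+ *
+ *    struct intel_resolve_map head = { 0 };
+ *
+ *    // Mark slice (level 2, layer 0) as needing a HiZ resolve.
+ *    intel_resolve_map_set(&head, 2, 0, GEN6_HIZ_OP_HIZ_RESOLVE);
+ *
+ *    // Later, before sampling, look it up and drop it once resolved.
+ *    struct intel_resolve_map *e = intel_resolve_map_get(&head, 2, 0);
+ *    if (e && e->need == GEN6_HIZ_OP_HIZ_RESOLVE)
+ *       intel_resolve_map_remove(e);
+ *
+ *    // Or discard all pending resolves at once.
+ *    intel_resolve_map_clear(&head);
+ */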
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
diff --git a/icd/intel/compiler/pipeline/intel_screen.c b/icd/intel/compiler/pipeline/intel_screen.c
new file mode 100644
index 0000000..d06e7f3
--- /dev/null
+++ b/icd/intel/compiler/pipeline/intel_screen.c
@@ -0,0 +1,1481 @@
+/**************************************************************************
+ *
+ * Copyright 2003 VMware, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **************************************************************************/
+
+#include <errno.h>
+#include <time.h>
+#include <unistd.h>
+#include "main/glheader.h"
+#include "main/context.h"
+#include "main/framebuffer.h"
+#include "main/renderbuffer.h"
+#include "main/texobj.h"
+#include "main/hash.h"
+#include "main/fbobject.h"
+#include "main/version.h"
+#include "swrast/s_renderbuffer.h"
+#include "glsl/ralloc.h"
+
+#include "utils.h"
+#include "xmlpool.h"
+
+static const __DRIconfigOptionsExtension brw_config_options = {
+   .base = { __DRI_CONFIG_OPTIONS, 1 },
+   .xml =
+DRI_CONF_BEGIN
+   DRI_CONF_SECTION_PERFORMANCE
+      DRI_CONF_VBLANK_MODE(DRI_CONF_VBLANK_ALWAYS_SYNC)
+      DRI_CONF_MULTITHREAD_GLSL_COMPILER(0)
+      DRI_CONF_MAX_SHADER_CACHE_SIZE(0)
+
+      /* Options correspond to DRI_CONF_BO_REUSE_DISABLED,
+       * DRI_CONF_BO_REUSE_ALL
+       */
+      DRI_CONF_OPT_BEGIN_V(bo_reuse, enum, 1, "0:1")
+	 DRI_CONF_DESC_BEGIN(en, "Buffer object reuse")
+	    DRI_CONF_ENUM(0, "Disable buffer object reuse")
+	    DRI_CONF_ENUM(1, "Enable reuse of all sizes of buffer objects")
+	 DRI_CONF_DESC_END
+      DRI_CONF_OPT_END
+
+      DRI_CONF_OPT_BEGIN_B(hiz, "true")
+	 DRI_CONF_DESC(en, "Enable Hierarchical Z on gen6+")
+      DRI_CONF_OPT_END
+
+      DRI_CONF_OPT_BEGIN_B(disable_derivative_optimization, "false")
+	 DRI_CONF_DESC(en, "Derivatives with finer granularity by default")
+      DRI_CONF_OPT_END
+
+      DRI_CONF_OPT_BEGIN_V(glass_mode, enum, 0, "0:2")
+	 DRI_CONF_DESC_BEGIN(en, "Glass Optimizer Mode")
+	    DRI_CONF_ENUM(0, "Disable Glass Optimizer")
+	    DRI_CONF_ENUM(1, "Use Driver Whitelist")
+	    DRI_CONF_ENUM(2, "Enable Glass Optimizer")
+	 DRI_CONF_DESC_END
+      DRI_CONF_OPT_END
+
+      DRI_CONF_OPT_BEGIN_B(glass_enable_reassociation, "true")
+         DRI_CONF_DESC(en, "Use reassociate optimization pass in LunarGLASS")
+      DRI_CONF_OPT_END
+
+   DRI_CONF_SECTION_END
+
+   DRI_CONF_SECTION_QUALITY
+      DRI_CONF_FORCE_S3TC_ENABLE("false")
+
+      DRI_CONF_OPT_BEGIN(clamp_max_samples, int, -1)
+              DRI_CONF_DESC(en, "Clamp the value of GL_MAX_SAMPLES to the "
+                            "given integer. If negative, then do not clamp.")
+      DRI_CONF_OPT_END
+   DRI_CONF_SECTION_END
+
+   DRI_CONF_SECTION_DEBUG
+      DRI_CONF_NO_RAST("false")
+      DRI_CONF_ALWAYS_FLUSH_BATCH("false")
+      DRI_CONF_ALWAYS_FLUSH_CACHE("false")
+      DRI_CONF_DISABLE_THROTTLING("false")
+      DRI_CONF_FORCE_GLSL_EXTENSIONS_WARN("false")
+      DRI_CONF_DISABLE_GLSL_LINE_CONTINUATIONS("false")
+      DRI_CONF_DISABLE_BLEND_FUNC_EXTENDED("false")
+
+      DRI_CONF_OPT_BEGIN_B(shader_precompile, "true")
+	 DRI_CONF_DESC(en, "Perform code generation at shader link time.")
+      DRI_CONF_OPT_END
+   DRI_CONF_SECTION_END
+DRI_CONF_END
+};
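+
+/* For reference (a sketch; the exact driver name to use depends on how this
+ * build is loaded): options declared above are user-settable through the
+ * standard driconf mechanism, e.g. in ~/.drirc:
+ *
+ *    <driconf>
+ *      <device screen="0" driver="i965">
+ *        <application name="Default">
+ *          <option name="hiz" value="true" />
+ *          <option name="bo_reuse" value="1" />
+ *        </application>
+ *      </device>
+ *    </driconf>
+ */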
+
+#include "intel_batchbuffer.h"
+#include "intel_buffers.h"
+#include "intel_bufmgr.h"
+#include "intel_chipset.h"
+#include "intel_fbo.h"
+#include "intel_mipmap_tree.h"
+#include "intel_screen.h"
+#include "intel_tex.h"
+#include "intel_image.h"
+
+#include "brw_context.h"
+
+#include "i915_drm.h"
+
+/**
+ * For debugging purposes, this returns a time in seconds.
+ */
+double
+get_time(void)
+{
+   struct timespec tp;
+
+   clock_gettime(CLOCK_MONOTONIC, &tp);
+
+   return tp.tv_sec + tp.tv_nsec / 1000000000.0;
+}
+
+void
+aub_dump_bmp(struct gl_context *ctx)
+{
+   struct gl_framebuffer *fb = ctx->DrawBuffer;
+
+   for (int i = 0; i < fb->_NumColorDrawBuffers; i++) {
+      struct intel_renderbuffer *irb =
+	 intel_renderbuffer(fb->_ColorDrawBuffers[i]);
+
+      if (irb && irb->mt) {
+	 enum aub_dump_bmp_format format;
+
+	 switch (irb->Base.Base.Format) {
+	 case MESA_FORMAT_B8G8R8A8_UNORM:
+	 case MESA_FORMAT_B8G8R8X8_UNORM:
+	    format = AUB_DUMP_BMP_FORMAT_ARGB_8888;
+	    break;
+	 default:
+	    continue;
+	 }
+
+         drm_intel_gem_bo_aub_dump_bmp(irb->mt->bo,
+				       irb->draw_x,
+				       irb->draw_y,
+				       irb->Base.Base.Width,
+				       irb->Base.Base.Height,
+				       format,
+				       irb->mt->pitch,
+				       0);
+      }
+   }
+}
+
+static const __DRItexBufferExtension intelTexBufferExtension = {
+   .base = { __DRI_TEX_BUFFER, 3 },
+
+   .setTexBuffer        = intelSetTexBuffer,
+   .setTexBuffer2       = intelSetTexBuffer2,
+   .releaseTexBuffer    = NULL,
+};
+
+static void
+intel_dri2_flush_with_flags(__DRIcontext *cPriv,
+                            __DRIdrawable *dPriv,
+                            unsigned flags,
+                            enum __DRI2throttleReason reason)
+{
+   struct brw_context *brw = cPriv->driverPrivate;
+
+   if (!brw)
+      return;
+
+   struct gl_context *ctx = &brw->ctx;
+
+   FLUSH_VERTICES(ctx, 0);
+
+   if (flags & __DRI2_FLUSH_DRAWABLE)
+      intel_resolve_for_dri2_flush(brw, dPriv);
+
+   if (reason == __DRI2_THROTTLE_SWAPBUFFER ||
+       reason == __DRI2_THROTTLE_FLUSHFRONT) {
+      brw->need_throttle = true;
+   }
+
+   intel_batchbuffer_flush(brw);
+
+   if (INTEL_DEBUG & DEBUG_AUB) {
+      aub_dump_bmp(ctx);
+   }
+}
+
+/**
+ * Provides compatibility with loaders that only support the older (version
+ * 1-3) flush interface.
+ *
+ * That includes libGL up to Mesa 9.0, and the X Server at least up to 1.13.
+ */
+static void
+intel_dri2_flush(__DRIdrawable *drawable)
+{
+   intel_dri2_flush_with_flags(drawable->driContextPriv, drawable,
+                               __DRI2_FLUSH_DRAWABLE,
+                               __DRI2_THROTTLE_SWAPBUFFER);
+}
+
+static const struct __DRI2flushExtensionRec intelFlushExtension = {
+    .base = { __DRI2_FLUSH, 4 },
+
+    .flush              = intel_dri2_flush,
+    .invalidate         = dri2InvalidateDrawable,
+    .flush_with_flags   = intel_dri2_flush_with_flags,
+};
+
+static struct intel_image_format intel_image_formats[] = {
+   { __DRI_IMAGE_FOURCC_ARGB8888, __DRI_IMAGE_COMPONENTS_RGBA, 1,
+     { { 0, 0, 0, __DRI_IMAGE_FORMAT_ARGB8888, 4 } } },
+
+   { __DRI_IMAGE_FOURCC_SARGB8888, __DRI_IMAGE_COMPONENTS_RGBA, 1,
+     { { 0, 0, 0, __DRI_IMAGE_FORMAT_SARGB8, 4 } } },
+
+   { __DRI_IMAGE_FOURCC_XRGB8888, __DRI_IMAGE_COMPONENTS_RGB, 1,
+     { { 0, 0, 0, __DRI_IMAGE_FORMAT_XRGB8888, 4 }, } },
+
+   { __DRI_IMAGE_FOURCC_RGB565, __DRI_IMAGE_COMPONENTS_RGB, 1,
+     { { 0, 0, 0, __DRI_IMAGE_FORMAT_RGB565, 2 } } },
+
+   { __DRI_IMAGE_FOURCC_YUV410, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+     { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+       { 1, 2, 2, __DRI_IMAGE_FORMAT_R8, 1 },
+       { 2, 2, 2, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_YUV411, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+     { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+       { 1, 2, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+       { 2, 2, 0, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_YUV420, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+     { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+       { 1, 1, 1, __DRI_IMAGE_FORMAT_R8, 1 },
+       { 2, 1, 1, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_YUV422, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+     { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+       { 1, 1, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+       { 2, 1, 0, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_YUV444, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+     { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+       { 1, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+       { 2, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_NV12, __DRI_IMAGE_COMPONENTS_Y_UV, 2,
+     { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+       { 1, 1, 1, __DRI_IMAGE_FORMAT_GR88, 2 } } },
+
+   { __DRI_IMAGE_FOURCC_NV16, __DRI_IMAGE_COMPONENTS_Y_UV, 2,
+     { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+       { 1, 1, 0, __DRI_IMAGE_FORMAT_GR88, 2 } } },
+
+   /* For YUYV buffers, we set up two overlapping DRI images and treat
+    * them as planar buffers in the compositors.  Plane 0 is GR88 and
+    * samples YU or YV pairs, placing Y into the R component, while
+    * plane 1 is ARGB and samples whole YUYV clusters, placing U into
+    * the G component and V into A.  This lets the texture sampler
+    * interpolate the Y components correctly when sampling from
+    * plane 0, and interpolate U and V correctly when sampling from
+    * plane 1. */
+   { __DRI_IMAGE_FOURCC_YUYV, __DRI_IMAGE_COMPONENTS_Y_XUXV, 2,
+     { { 0, 0, 0, __DRI_IMAGE_FORMAT_GR88, 2 },
+       { 0, 1, 0, __DRI_IMAGE_FORMAT_ARGB8888, 4 } } }
+};
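+
+/* Worked example of the per-plane fields above (a sketch matching the
+ * shift math in intel_from_planar() below): each plane entry is
+ * { buffer_index, width_shift, height_shift, dri_format, cpp }. For a
+ * 256x128 NV12 image, plane 0 is R8 at full 256x128 resolution, while
+ * plane 1 has width_shift = height_shift = 1, so the GR88 UV plane is
+ * (256 >> 1) x (128 >> 1) = 128x64.
+ */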
+
+static void
+intel_image_warn_if_unaligned(__DRIimage *image, const char *func)
+{
+   uint32_t tiling, swizzle;
+   drm_intel_bo_get_tiling(image->bo, &tiling, &swizzle);
+
+   if (tiling != I915_TILING_NONE && (image->offset & 0xfff)) {
+      _mesa_warning(NULL, "%s: offset 0x%08x not on tile boundary",
+                    func, image->offset);
+   }
+}
+
+static struct intel_image_format *
+intel_image_format_lookup(int fourcc)
+{
+   struct intel_image_format *f = NULL;
+
+   for (unsigned i = 0; i < ARRAY_SIZE(intel_image_formats); i++) {
+      if (intel_image_formats[i].fourcc == fourcc) {
+	 f = &intel_image_formats[i];
+	 break;
+      }
+   }
+
+   return f;
+}
+
+static __DRIimage *
+intel_allocate_image(int dri_format, void *loaderPrivate)
+{
+    __DRIimage *image;
+
+    image = calloc(1, sizeof *image);
+    if (image == NULL)
+	return NULL;
+
+    image->dri_format = dri_format;
+    image->offset = 0;
+
+    image->format = driImageFormatToGLFormat(dri_format);
+    if (dri_format != __DRI_IMAGE_FORMAT_NONE &&
+        image->format == MESA_FORMAT_NONE) {
+       free(image);
+       return NULL;
+    }
+
+    image->internal_format = _mesa_get_format_base_format(image->format);
+    image->data = loaderPrivate;
+
+    return image;
+}
+
+/**
+ * Sets up a DRIImage structure to point to a slice out of a miptree.
+ */
+static void
+intel_setup_image_from_mipmap_tree(struct brw_context *brw, __DRIimage *image,
+                                   struct intel_mipmap_tree *mt, GLuint level,
+                                   GLuint zoffset)
+{
+   intel_miptree_make_shareable(brw, mt);
+
+   intel_miptree_check_level_layer(mt, level, zoffset);
+
+   image->width = minify(mt->physical_width0, level - mt->first_level);
+   image->height = minify(mt->physical_height0, level - mt->first_level);
+   image->pitch = mt->pitch;
+
+   image->offset = intel_miptree_get_tile_offsets(mt, level, zoffset,
+                                                  &image->tile_x,
+                                                  &image->tile_y);
+
+   drm_intel_bo_unreference(image->bo);
+   image->bo = mt->bo;
+   drm_intel_bo_reference(mt->bo);
+}
+
+static __DRIimage *
+intel_create_image_from_name(__DRIscreen *screen,
+			     int width, int height, int format,
+			     int name, int pitch, void *loaderPrivate)
+{
+    struct intel_screen *intelScreen = screen->driverPrivate;
+    __DRIimage *image;
+    int cpp;
+
+    image = intel_allocate_image(format, loaderPrivate);
+    if (image == NULL)
+       return NULL;
+
+    if (image->format == MESA_FORMAT_NONE)
+       cpp = 1;
+    else
+       cpp = _mesa_get_format_bytes(image->format);
+
+    image->width = width;
+    image->height = height;
+    image->pitch = pitch * cpp;
+    image->bo = drm_intel_bo_gem_create_from_name(intelScreen->bufmgr, "image",
+                                                  name);
+    if (!image->bo) {
+       free(image);
+       return NULL;
+    }
+
+    return image;
+}
+
+static __DRIimage *
+intel_create_image_from_renderbuffer(__DRIcontext *context,
+				     int renderbuffer, void *loaderPrivate)
+{
+   __DRIimage *image;
+   struct brw_context *brw = context->driverPrivate;
+   struct gl_context *ctx = &brw->ctx;
+   struct gl_renderbuffer *rb;
+   struct intel_renderbuffer *irb;
+
+   rb = _mesa_lookup_renderbuffer(ctx, renderbuffer);
+   if (!rb) {
+      _mesa_error(ctx, GL_INVALID_OPERATION, "glRenderbufferExternalMESA");
+      return NULL;
+   }
+
+   irb = intel_renderbuffer(rb);
+   intel_miptree_make_shareable(brw, irb->mt);
+   image = calloc(1, sizeof *image);
+   if (image == NULL)
+      return NULL;
+
+   image->internal_format = rb->InternalFormat;
+   image->format = rb->Format;
+   image->offset = 0;
+   image->data = loaderPrivate;
+   drm_intel_bo_unreference(image->bo);
+   image->bo = irb->mt->bo;
+   drm_intel_bo_reference(irb->mt->bo);
+   image->width = rb->Width;
+   image->height = rb->Height;
+   image->pitch = irb->mt->pitch;
+   image->dri_format = driGLFormatToImageFormat(image->format);
+   image->has_depthstencil = irb->mt->stencil_mt ? true : false;
+
+   rb->NeedsFinishRenderTexture = true;
+   return image;
+}
+
+static __DRIimage *
+intel_create_image_from_texture(__DRIcontext *context, int target,
+                                unsigned texture, int zoffset,
+                                int level,
+                                unsigned *error,
+                                void *loaderPrivate)
+{
+   __DRIimage *image;
+   struct brw_context *brw = context->driverPrivate;
+   struct gl_texture_object *obj;
+   struct intel_texture_object *iobj;
+   GLuint face = 0;
+
+   obj = _mesa_lookup_texture(&brw->ctx, texture);
+   if (!obj || obj->Target != target) {
+      *error = __DRI_IMAGE_ERROR_BAD_PARAMETER;
+      return NULL;
+   }
+
+   if (target == GL_TEXTURE_CUBE_MAP)
+      face = zoffset;
+
+   _mesa_test_texobj_completeness(&brw->ctx, obj);
+   iobj = intel_texture_object(obj);
+   if (!obj->_BaseComplete || (level > 0 && !obj->_MipmapComplete)) {
+      *error = __DRI_IMAGE_ERROR_BAD_PARAMETER;
+      return NULL;
+   }
+
+   if (level < obj->BaseLevel || level > obj->_MaxLevel) {
+      *error = __DRI_IMAGE_ERROR_BAD_MATCH;
+      return NULL;
+   }
+
+   if (target == GL_TEXTURE_3D && obj->Image[face][level]->Depth < zoffset) {
+      *error = __DRI_IMAGE_ERROR_BAD_MATCH;
+      return NULL;
+   }
+   image = calloc(1, sizeof *image);
+   if (image == NULL) {
+      *error = __DRI_IMAGE_ERROR_BAD_ALLOC;
+      return NULL;
+   }
+
+   image->internal_format = obj->Image[face][level]->InternalFormat;
+   image->format = obj->Image[face][level]->TexFormat;
+   image->data = loaderPrivate;
+   intel_setup_image_from_mipmap_tree(brw, image, iobj->mt, level, zoffset);
+   image->dri_format = driGLFormatToImageFormat(image->format);
+   image->has_depthstencil = iobj->mt->stencil_mt ? true : false;
+   if (image->dri_format == __DRI_IMAGE_FORMAT_NONE) {
+      *error = __DRI_IMAGE_ERROR_BAD_PARAMETER;
+      free(image);
+      return NULL;
+   }
+
+   *error = __DRI_IMAGE_ERROR_SUCCESS;
+   return image;
+}
+
+static void
+intel_destroy_image(__DRIimage *image)
+{
+   drm_intel_bo_unreference(image->bo);
+   free(image);
+}
+
+static __DRIimage *
+intel_create_image(__DRIscreen *screen,
+		   int width, int height, int format,
+		   unsigned int use,
+		   void *loaderPrivate)
+{
+   __DRIimage *image;
+   struct intel_screen *intelScreen = screen->driverPrivate;
+   uint32_t tiling;
+   int cpp;
+   unsigned long pitch;
+
+   tiling = I915_TILING_X;
+   if (use & __DRI_IMAGE_USE_CURSOR) {
+      if (width != 64 || height != 64)
+	 return NULL;
+      tiling = I915_TILING_NONE;
+   }
+
+   if (use & __DRI_IMAGE_USE_LINEAR)
+      tiling = I915_TILING_NONE;
+
+   image = intel_allocate_image(format, loaderPrivate);
+   if (image == NULL)
+      return NULL;
+
+   cpp = _mesa_get_format_bytes(image->format);
+   image->bo = drm_intel_bo_alloc_tiled(intelScreen->bufmgr, "image",
+                                        width, height, cpp, &tiling,
+                                        &pitch, 0);
+   if (image->bo == NULL) {
+      free(image);
+      return NULL;
+   }
+   image->width = width;
+   image->height = height;
+   image->pitch = pitch;
+
+   return image;
+}
+
+static GLboolean
+intel_query_image(__DRIimage *image, int attrib, int *value)
+{
+   switch (attrib) {
+   case __DRI_IMAGE_ATTRIB_STRIDE:
+      *value = image->pitch;
+      return true;
+   case __DRI_IMAGE_ATTRIB_HANDLE:
+      *value = image->bo->handle;
+      return true;
+   case __DRI_IMAGE_ATTRIB_NAME:
+      return !drm_intel_bo_flink(image->bo, (uint32_t *) value);
+   case __DRI_IMAGE_ATTRIB_FORMAT:
+      *value = image->dri_format;
+      return true;
+   case __DRI_IMAGE_ATTRIB_WIDTH:
+      *value = image->width;
+      return true;
+   case __DRI_IMAGE_ATTRIB_HEIGHT:
+      *value = image->height;
+      return true;
+   case __DRI_IMAGE_ATTRIB_COMPONENTS:
+      if (image->planar_format == NULL)
+         return false;
+      *value = image->planar_format->components;
+      return true;
+   case __DRI_IMAGE_ATTRIB_FD:
+      if (drm_intel_bo_gem_export_to_prime(image->bo, value) == 0)
+         return true;
+      return false;
+   default:
+      return false;
+   }
+}
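+
+/* Usage sketch (an assumption, mirroring how a loader-side caller would use
+ * the __DRI_IMAGE extension; `img_ext` and `image` are hypothetical names):
+ * exporting a dma-buf fd and reading the stride.
+ *
+ *    int fd, stride;
+ *    if (img_ext->queryImage(image, __DRI_IMAGE_ATTRIB_FD, &fd) &&
+ *        img_ext->queryImage(image, __DRI_IMAGE_ATTRIB_STRIDE, &stride)) {
+ *       // fd now owns a prime handle referencing image->bo
+ *    }
+ */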
+
+static __DRIimage *
+intel_dup_image(__DRIimage *orig_image, void *loaderPrivate)
+{
+   __DRIimage *image;
+
+   image = calloc(1, sizeof *image);
+   if (image == NULL)
+      return NULL;
+
+   drm_intel_bo_reference(orig_image->bo);
+   image->bo              = orig_image->bo;
+   image->internal_format = orig_image->internal_format;
+   image->planar_format   = orig_image->planar_format;
+   image->dri_format      = orig_image->dri_format;
+   image->format          = orig_image->format;
+   image->offset          = orig_image->offset;
+   image->width           = orig_image->width;
+   image->height          = orig_image->height;
+   image->pitch           = orig_image->pitch;
+   image->tile_x          = orig_image->tile_x;
+   image->tile_y          = orig_image->tile_y;
+   image->has_depthstencil = orig_image->has_depthstencil;
+   image->data            = loaderPrivate;
+
+   memcpy(image->strides, orig_image->strides, sizeof(image->strides));
+   memcpy(image->offsets, orig_image->offsets, sizeof(image->offsets));
+
+   return image;
+}
+
+static GLboolean
+intel_validate_usage(__DRIimage *image, unsigned int use)
+{
+   if (use & __DRI_IMAGE_USE_CURSOR) {
+      if (image->width != 64 || image->height != 64)
+	 return GL_FALSE;
+   }
+
+   return GL_TRUE;
+}
+
+static __DRIimage *
+intel_create_image_from_names(__DRIscreen *screen,
+                              int width, int height, int fourcc,
+                              int *names, int num_names,
+                              int *strides, int *offsets,
+                              void *loaderPrivate)
+{
+    struct intel_image_format *f = NULL;
+    __DRIimage *image;
+    int i, index;
+
+    if (screen == NULL || names == NULL || num_names != 1)
+        return NULL;
+
+    f = intel_image_format_lookup(fourcc);
+    if (f == NULL)
+        return NULL;
+
+    image = intel_create_image_from_name(screen, width, height,
+                                         __DRI_IMAGE_FORMAT_NONE,
+                                         names[0], strides[0],
+                                         loaderPrivate);
+
+    if (image == NULL)
+        return NULL;
+
+    image->planar_format = f;
+    for (i = 0; i < f->nplanes; i++) {
+        index = f->planes[i].buffer_index;
+        image->offsets[index] = offsets[index];
+        image->strides[index] = strides[index];
+    }
+
+    return image;
+}
+
+static __DRIimage *
+intel_create_image_from_fds(__DRIscreen *screen,
+                            int width, int height, int fourcc,
+                            int *fds, int num_fds, int *strides, int *offsets,
+                            void *loaderPrivate)
+{
+   struct intel_screen *intelScreen = screen->driverPrivate;
+   struct intel_image_format *f;
+   __DRIimage *image;
+   int i, index;
+
+   if (fds == NULL || num_fds != 1)
+      return NULL;
+
+   f = intel_image_format_lookup(fourcc);
+   if (f == NULL)
+      return NULL;
+
+   if (f->nplanes == 1)
+      image = intel_allocate_image(f->planes[0].dri_format, loaderPrivate);
+   else
+      image = intel_allocate_image(__DRI_IMAGE_FORMAT_NONE, loaderPrivate);
+
+   if (image == NULL)
+      return NULL;
+
+   image->bo = drm_intel_bo_gem_create_from_prime(intelScreen->bufmgr,
+                                                  fds[0],
+                                                  height * strides[0]);
+   if (image->bo == NULL) {
+      free(image);
+      return NULL;
+   }
+   image->width = width;
+   image->height = height;
+   image->pitch = strides[0];
+
+   image->planar_format = f;
+   for (i = 0; i < f->nplanes; i++) {
+      index = f->planes[i].buffer_index;
+      image->offsets[index] = offsets[index];
+      image->strides[index] = strides[index];
+   }
+
+   if (f->nplanes == 1) {
+      image->offset = image->offsets[0];
+      intel_image_warn_if_unaligned(image, __FUNCTION__);
+   }
+
+   return image;
+}
+
+static __DRIimage *
+intel_create_image_from_dma_bufs(__DRIscreen *screen,
+                                 int width, int height, int fourcc,
+                                 int *fds, int num_fds,
+                                 int *strides, int *offsets,
+                                 enum __DRIYUVColorSpace yuv_color_space,
+                                 enum __DRISampleRange sample_range,
+                                 enum __DRIChromaSiting horizontal_siting,
+                                 enum __DRIChromaSiting vertical_siting,
+                                 unsigned *error,
+                                 void *loaderPrivate)
+{
+   __DRIimage *image;
+   struct intel_image_format *f = intel_image_format_lookup(fourcc);
+
+   /* For now only packed formats that have native sampling are supported. */
+   if (!f || f->nplanes != 1) {
+      *error = __DRI_IMAGE_ERROR_BAD_MATCH;
+      return NULL;
+   }
+
+   image = intel_create_image_from_fds(screen, width, height, fourcc, fds,
+                                       num_fds, strides, offsets,
+                                       loaderPrivate);
+
+   /*
+    * Invalid parameters and any inconsistencies between them are assumed to
+    * be checked by the caller. Therefore, besides unsupported formats, the
+    * only way this can fail is allocation.
+    */
+   if (!image) {
+      *error = __DRI_IMAGE_ERROR_BAD_ALLOC;
+      return NULL;
+   }
+
+   image->dma_buf_imported = true;
+   image->yuv_color_space = yuv_color_space;
+   image->sample_range = sample_range;
+   image->horizontal_siting = horizontal_siting;
+   image->vertical_siting = vertical_siting;
+
+   *error = __DRI_IMAGE_ERROR_SUCCESS;
+   return image;
+}
+
+static __DRIimage *
+intel_from_planar(__DRIimage *parent, int plane, void *loaderPrivate)
+{
+    int width, height, offset, stride, dri_format, index;
+    struct intel_image_format *f;
+    __DRIimage *image;
+
+    if (parent == NULL || parent->planar_format == NULL)
+        return NULL;
+
+    f = parent->planar_format;
+
+    if (plane >= f->nplanes)
+        return NULL;
+
+    width = parent->width >> f->planes[plane].width_shift;
+    height = parent->height >> f->planes[plane].height_shift;
+    dri_format = f->planes[plane].dri_format;
+    index = f->planes[plane].buffer_index;
+    offset = parent->offsets[index];
+    stride = parent->strides[index];
+
+    image = intel_allocate_image(dri_format, loaderPrivate);
+    if (image == NULL)
+       return NULL;
+
+    if (offset + height * stride > parent->bo->size) {
+       _mesa_warning(NULL, "intel_create_sub_image: subimage out of bounds");
+       free(image);
+       return NULL;
+    }
+
+    image->bo = parent->bo;
+    drm_intel_bo_reference(parent->bo);
+
+    image->width = width;
+    image->height = height;
+    image->pitch = stride;
+    image->offset = offset;
+
+    intel_image_warn_if_unaligned(image, __FUNCTION__);
+
+    return image;
+}
+
+static const __DRIimageExtension intelImageExtension = {
+    .base = { __DRI_IMAGE, 8 },
+
+    .createImageFromName                = intel_create_image_from_name,
+    .createImageFromRenderbuffer        = intel_create_image_from_renderbuffer,
+    .destroyImage                       = intel_destroy_image,
+    .createImage                        = intel_create_image,
+    .queryImage                         = intel_query_image,
+    .dupImage                           = intel_dup_image,
+    .validateUsage                      = intel_validate_usage,
+    .createImageFromNames               = intel_create_image_from_names,
+    .fromPlanar                         = intel_from_planar,
+    .createImageFromTexture             = intel_create_image_from_texture,
+    .createImageFromFds                 = intel_create_image_from_fds,
+    .createImageFromDmaBufs             = intel_create_image_from_dma_bufs
+};
+
+static int
+brw_query_renderer_integer(__DRIscreen *psp, int param, unsigned int *value)
+{
+   const struct intel_screen *const intelScreen =
+      (struct intel_screen *) psp->driverPrivate;
+
+   switch (param) {
+   case __DRI2_RENDERER_VENDOR_ID:
+      value[0] = 0x8086;
+      return 0;
+   case __DRI2_RENDERER_DEVICE_ID:
+      value[0] = intelScreen->deviceID;
+      return 0;
+   case __DRI2_RENDERER_ACCELERATED:
+      value[0] = 1;
+      return 0;
+   case __DRI2_RENDERER_VIDEO_MEMORY: {
+      /* Once a batch uses more than 75% of the maximum mappable size, we
+       * assume that there's some fragmentation, and we start doing extra
+       * flushing, etc.  That's the big cliff apps will care about.
+       */
+      size_t aper_size;
+      size_t mappable_size;
+
+      drm_intel_get_aperture_sizes(psp->fd, &mappable_size, &aper_size);
+
+      const unsigned gpu_mappable_megabytes =
+         (aper_size / (1024 * 1024)) * 3 / 4;
+
+      const long system_memory_pages = sysconf(_SC_PHYS_PAGES);
+      const long system_page_size = sysconf(_SC_PAGE_SIZE);
+
+      if (system_memory_pages <= 0 || system_page_size <= 0)
+         return -1;
+
+      const uint64_t system_memory_bytes = (uint64_t) system_memory_pages
+         * (uint64_t) system_page_size;
+
+      const unsigned system_memory_megabytes =
+         (unsigned) (system_memory_bytes / (1024 * 1024));
+
+      value[0] = MIN2(system_memory_megabytes, gpu_mappable_megabytes);
+      return 0;
+   }
+   case __DRI2_RENDERER_UNIFIED_MEMORY_ARCHITECTURE:
+      value[0] = 1;
+      return 0;
+   case __DRI2_RENDERER_PREFERRED_PROFILE:
+      value[0] = (psp->max_gl_core_version != 0)
+         ? (1U << __DRI_API_OPENGL_CORE) : (1U << __DRI_API_OPENGL);
+      return 0;
+   default:
+      return driQueryRendererIntegerCommon(psp, param, value);
+   }
+
+   return -1;
+}
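+
+/* Worked example for the __DRI2_RENDERER_VIDEO_MEMORY math above: with a
+ * 2 GiB aperture (aper_size), gpu_mappable_megabytes = 2048 * 3 / 4 = 1536.
+ * On a machine with 8 GiB of RAM (system_memory_megabytes = 8192), the
+ * reported value is MIN2(8192, 1536) = 1536 MB.
+ */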
+
+static int
+brw_query_renderer_string(__DRIscreen *psp, int param, const char **value)
+{
+   const struct intel_screen *intelScreen =
+      (struct intel_screen *) psp->driverPrivate;
+
+   switch (param) {
+   case __DRI2_RENDERER_VENDOR_ID:
+      value[0] = brw_vendor_string;
+      return 0;
+   case __DRI2_RENDERER_DEVICE_ID:
+      value[0] = brw_get_renderer_string(intelScreen->deviceID);
+      return 0;
+   default:
+      break;
+   }
+
+   return -1;
+}
+
+static const __DRI2rendererQueryExtension intelRendererQueryExtension = {
+   .base = { __DRI2_RENDERER_QUERY, 1 },
+
+   .queryInteger = brw_query_renderer_integer,
+   .queryString = brw_query_renderer_string
+};
+
+static const __DRIrobustnessExtension dri2Robustness = {
+   .base = { __DRI2_ROBUSTNESS, 1 }
+};
+
+static const __DRIextension *intelScreenExtensions[] = {
+    &intelTexBufferExtension.base,
+    &intelFlushExtension.base,
+    &intelImageExtension.base,
+    &intelRendererQueryExtension.base,
+    &dri2ConfigQueryExtension.base,
+    NULL
+};
+
+static const __DRIextension *intelRobustScreenExtensions[] = {
+    &intelTexBufferExtension.base,
+    &intelFlushExtension.base,
+    &intelImageExtension.base,
+    &intelRendererQueryExtension.base,
+    &dri2ConfigQueryExtension.base,
+    &dri2Robustness.base,
+    NULL
+};
+
+static bool
+intel_get_param(__DRIscreen *psp, int param, int *value)
+{
+   int ret;
+   struct drm_i915_getparam gp;
+
+   memset(&gp, 0, sizeof(gp));
+   gp.param = param;
+   gp.value = value;
+
+   ret = drmCommandWriteRead(psp->fd, DRM_I915_GETPARAM, &gp, sizeof(gp));
+   if (ret) {
+      if (ret != -EINVAL)
+	 _mesa_warning(NULL, "drm_i915_getparam: %d", ret);
+      return false;
+   }
+
+   return true;
+}
+
+static bool
+intel_get_boolean(__DRIscreen *psp, int param)
+{
+   int value = 0;
+   return intel_get_param(psp, param, &value) && value;
+}
+
+static void
+intelDestroyScreen(__DRIscreen * sPriv)
+{
+   struct intel_screen *intelScreen = sPriv->driverPrivate;
+
+   dri_bufmgr_destroy(intelScreen->bufmgr);
+   driDestroyOptionInfo(&intelScreen->optionCache);
+
+   ralloc_free(intelScreen);
+   sPriv->driverPrivate = NULL;
+}
+
+
+/**
+ * This is called when we need to set up GL rendering to a new X window.
+ */
+static GLboolean
+intelCreateBuffer(__DRIscreen * driScrnPriv,
+                  __DRIdrawable * driDrawPriv,
+                  const struct gl_config * mesaVis, GLboolean isPixmap)
+{
+   struct intel_renderbuffer *rb;
+   struct intel_screen *screen = (struct intel_screen*) driScrnPriv->driverPrivate;
+   mesa_format rgbFormat;
+   unsigned num_samples = intel_quantize_num_samples(screen, mesaVis->samples);
+   struct gl_framebuffer *fb;
+
+   if (isPixmap)
+      return false;
+
+   fb = CALLOC_STRUCT(gl_framebuffer);
+   if (!fb)
+      return false;
+
+   _mesa_initialize_window_framebuffer(fb, mesaVis);
+
+   if (screen->winsys_msaa_samples_override != -1) {
+      num_samples = screen->winsys_msaa_samples_override;
+      fb->Visual.samples = num_samples;
+   }
+
+   if (mesaVis->redBits == 5)
+      rgbFormat = MESA_FORMAT_B5G6R5_UNORM;
+   else if (mesaVis->sRGBCapable)
+      rgbFormat = MESA_FORMAT_B8G8R8A8_SRGB;
+   else if (mesaVis->alphaBits == 0)
+      rgbFormat = MESA_FORMAT_B8G8R8X8_UNORM;
+   else {
+      rgbFormat = MESA_FORMAT_B8G8R8A8_SRGB;
+      fb->Visual.sRGBCapable = true;
+   }
+
+   /* setup the hardware-based renderbuffers */
+   rb = intel_create_renderbuffer(rgbFormat, num_samples);
+   _mesa_add_renderbuffer(fb, BUFFER_FRONT_LEFT, &rb->Base.Base);
+
+   if (mesaVis->doubleBufferMode) {
+      rb = intel_create_renderbuffer(rgbFormat, num_samples);
+      _mesa_add_renderbuffer(fb, BUFFER_BACK_LEFT, &rb->Base.Base);
+   }
+
+   /*
+    * Assert here that the gl_config has an expected depth/stencil bit
+    * combination: one of d24/s8, d16/s0, d0/s0. (See intelInitScreen2(),
+    * which constructs the advertised configs.)
+    */
+   if (mesaVis->depthBits == 24) {
+      assert(mesaVis->stencilBits == 8);
+
+      if (screen->devinfo->has_hiz_and_separate_stencil) {
+         rb = intel_create_private_renderbuffer(MESA_FORMAT_Z24_UNORM_X8_UINT,
+                                                num_samples);
+         _mesa_add_renderbuffer(fb, BUFFER_DEPTH, &rb->Base.Base);
+         rb = intel_create_private_renderbuffer(MESA_FORMAT_S_UINT8,
+                                                num_samples);
+         _mesa_add_renderbuffer(fb, BUFFER_STENCIL, &rb->Base.Base);
+      } else {
+         /*
+          * Use combined depth/stencil. Note that the renderbuffer is
+          * attached to two attachment points.
+          */
+         rb = intel_create_private_renderbuffer(MESA_FORMAT_Z24_UNORM_S8_UINT,
+                                                num_samples);
+         _mesa_add_renderbuffer(fb, BUFFER_DEPTH, &rb->Base.Base);
+         _mesa_add_renderbuffer(fb, BUFFER_STENCIL, &rb->Base.Base);
+      }
+   }
+   else if (mesaVis->depthBits == 16) {
+      assert(mesaVis->stencilBits == 0);
+      rb = intel_create_private_renderbuffer(MESA_FORMAT_Z_UNORM16,
+                                             num_samples);
+      _mesa_add_renderbuffer(fb, BUFFER_DEPTH, &rb->Base.Base);
+   }
+   else {
+      assert(mesaVis->depthBits == 0);
+      assert(mesaVis->stencilBits == 0);
+   }
+
+   /* now add any/all software-based renderbuffers we may need */
+   _swrast_add_soft_renderbuffers(fb,
+                                  false, /* never sw color */
+                                  false, /* never sw depth */
+                                  false, /* never sw stencil */
+                                  mesaVis->accumRedBits > 0,
+                                  false, /* never sw alpha */
+                                  false  /* never sw aux */ );
+   driDrawPriv->driverPrivate = fb;
+
+   return true;
+}
+
+static void
+intelDestroyBuffer(__DRIdrawable * driDrawPriv)
+{
+    struct gl_framebuffer *fb = driDrawPriv->driverPrivate;
+
+    _mesa_reference_framebuffer(&fb, NULL);
+}
+
+static bool
+intel_init_bufmgr(struct intel_screen *intelScreen)
+{
+   __DRIscreen *spriv = intelScreen->driScrnPriv;
+
+   intelScreen->no_hw = getenv("INTEL_NO_HW") != NULL;
+
+   intelScreen->bufmgr = intel_bufmgr_gem_init(spriv->fd, BATCH_SZ);
+   if (intelScreen->bufmgr == NULL) {
+      fprintf(stderr, "[%s:%u] Error initializing buffer manager.\n",
+	      __func__, __LINE__);
+      return false;
+   }
+
+   drm_intel_bufmgr_gem_set_vma_cache_size(intelScreen->bufmgr, 512);
+   drm_intel_bufmgr_gem_enable_fenced_relocs(intelScreen->bufmgr);
+
+   if (!intel_get_boolean(spriv, I915_PARAM_HAS_RELAXED_DELTA)) {
+      fprintf(stderr, "[%s: %u] Kernel 2.6.39 required.\n", __func__, __LINE__);
+      return false;
+   }
+
+   return true;
+}
+
+static bool
+intel_detect_swizzling(struct intel_screen *screen)
+{
+   drm_intel_bo *buffer;
+   unsigned long flags = 0;
+   unsigned long aligned_pitch;
+   uint32_t tiling = I915_TILING_X;
+   uint32_t swizzle_mode = 0;
+
+   buffer = drm_intel_bo_alloc_tiled(screen->bufmgr, "swizzle test",
+				     64, 64, 4,
+				     &tiling, &aligned_pitch, flags);
+   if (buffer == NULL)
+      return false;
+
+   drm_intel_bo_get_tiling(buffer, &tiling, &swizzle_mode);
+   drm_intel_bo_unreference(buffer);
+
+   if (swizzle_mode == I915_BIT_6_SWIZZLE_NONE)
+      return false;
+   else
+      return true;
+}
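+
+/* Background sketch (a simplification): bit-6 swizzling means the kernel
+ * XORs address bit 6 with higher address bits when mapping tiled buffers,
+ * depending on the DRAM configuration. The probe above allocates a small
+ * X-tiled bo and asks the kernel which swizzle mode it actually applied;
+ * anything other than I915_BIT_6_SWIZZLE_NONE means CPU access to tiled
+ * data must account for the swizzle.
+ */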
+
+/**
+ * Return the array of MSAA modes supported by the hardware. The array is
+ * sorted in decreasing order, includes 0 (non-MSAA) as a valid mode, and
+ * ends with a -1 sentinel.
+ */
+const int*
+intel_supported_msaa_modes(const struct intel_screen *screen)
+{
+   static const int gen8_modes[] = {8, 4, 2, 0, -1};
+   static const int gen7_modes[] = {8, 4, 0, -1};
+   static const int gen6_modes[] = {4, 0, -1};
+   static const int gen4_modes[] = {0, -1};
+
+   if (screen->devinfo->gen >= 8) {
+      return gen8_modes;
+   } else if (screen->devinfo->gen >= 7) {
+      return gen7_modes;
+   } else if (screen->devinfo->gen == 6) {
+      return gen6_modes;
+   } else {
+      return gen4_modes;
+   }
+}
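+
+/* Usage sketch (hedged; consistent with how intel_quantize_num_samples()
+ * consumes this array): walk down to the -1 sentinel and keep the smallest
+ * supported mode that still satisfies the request, i.e. round up.
+ *
+ *    const int *modes = intel_supported_msaa_modes(screen);
+ *    int quantized = 0;
+ *    for (int i = 0; modes[i] != -1; i++) {
+ *       if (modes[i] >= requested)
+ *          quantized = modes[i];   // smallest mode >= requested so far
+ *       else
+ *          break;                  // array is decreasing; nothing larger left
+ *    }
+ */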
+
+static __DRIconfig**
+intel_screen_make_configs(__DRIscreen *dri_screen)
+{
+   static const mesa_format formats[] = {
+      MESA_FORMAT_B5G6R5_UNORM,
+      MESA_FORMAT_B8G8R8A8_UNORM
+   };
+
+   /* GLX_SWAP_COPY_OML is not supported due to page flipping. */
+   static const GLenum back_buffer_modes[] = {
+       GLX_SWAP_UNDEFINED_OML, GLX_NONE,
+   };
+
+   static const uint8_t singlesample_samples[1] = {0};
+   static const uint8_t multisample_samples[2]  = {4, 8};
+
+   struct intel_screen *screen = dri_screen->driverPrivate;
+   const struct brw_device_info *devinfo = screen->devinfo;
+   uint8_t depth_bits[4], stencil_bits[4];
+   __DRIconfig **configs = NULL;
+
+   /* Generate singlesample configs without accumulation buffer. */
+   for (int i = 0; i < ARRAY_SIZE(formats); i++) {
+      __DRIconfig **new_configs;
+      int num_depth_stencil_bits = 2;
+
+      /* Starting with DRI2 protocol version 1.1 we can request a depth/stencil
+       * buffer that has a different number of bits per pixel than the color
+       * buffer; gen >= 6 supports this.
+       */
+      depth_bits[0] = 0;
+      stencil_bits[0] = 0;
+
+      if (formats[i] == MESA_FORMAT_B5G6R5_UNORM) {
+         depth_bits[1] = 16;
+         stencil_bits[1] = 0;
+         if (devinfo->gen >= 6) {
+             depth_bits[2] = 24;
+             stencil_bits[2] = 8;
+             num_depth_stencil_bits = 3;
+         }
+      } else {
+         depth_bits[1] = 24;
+         stencil_bits[1] = 8;
+      }
+
+      new_configs = driCreateConfigs(formats[i],
+                                     depth_bits,
+                                     stencil_bits,
+                                     num_depth_stencil_bits,
+                                     back_buffer_modes, 2,
+                                     singlesample_samples, 1,
+                                     false);
+      configs = driConcatConfigs(configs, new_configs);
+   }
+
+   /* Generate the minimum possible set of configs that include an
+    * accumulation buffer.
+    */
+   for (int i = 0; i < ARRAY_SIZE(formats); i++) {
+      __DRIconfig **new_configs;
+
+      if (formats[i] == MESA_FORMAT_B5G6R5_UNORM) {
+         depth_bits[0] = 16;
+         stencil_bits[0] = 0;
+      } else {
+         depth_bits[0] = 24;
+         stencil_bits[0] = 8;
+      }
+
+      new_configs = driCreateConfigs(formats[i],
+                                     depth_bits, stencil_bits, 1,
+                                     back_buffer_modes, 1,
+                                     singlesample_samples, 1,
+                                     true);
+      configs = driConcatConfigs(configs, new_configs);
+   }
+
+   /* Generate multisample configs.
+    *
+    * This loop breaks early, and hence is a no-op, on gen < 6.
+    *
+    * Multisample configs must follow the singlesample configs in order to
+    * work around an X server bug present in 1.12. The X server chooses to
+    * associate the first listed RGBA888-Z24S8 config, regardless of its
+    * sample count, with the 32-bit depth visual used for compositing.
+    *
+    * Only doublebuffer configs with GLX_SWAP_UNDEFINED_OML behavior are
+    * supported.  Singlebuffer configs are not supported because no one wants
+    * them.
+    */
+   for (int i = 0; i < ARRAY_SIZE(formats); i++) {
+      if (devinfo->gen < 6)
+         break;
+
+      __DRIconfig **new_configs;
+      const int num_depth_stencil_bits = 2;
+      int num_msaa_modes = 0;
+
+      depth_bits[0] = 0;
+      stencil_bits[0] = 0;
+
+      if (formats[i] == MESA_FORMAT_B5G6R5_UNORM) {
+         depth_bits[1] = 16;
+         stencil_bits[1] = 0;
+      } else {
+         depth_bits[1] = 24;
+         stencil_bits[1] = 8;
+      }
+
+      if (devinfo->gen >= 7)
+         num_msaa_modes = 2;
+      else if (devinfo->gen == 6)
+         num_msaa_modes = 1;
+
+      new_configs = driCreateConfigs(formats[i],
+                                     depth_bits,
+                                     stencil_bits,
+                                     num_depth_stencil_bits,
+                                     back_buffer_modes, 1,
+                                     multisample_samples,
+                                     num_msaa_modes,
+                                     false);
+      configs = driConcatConfigs(configs, new_configs);
+   }
+
+   if (configs == NULL) {
+      fprintf(stderr, "[%s:%u] Error creating FBConfig!\n", __func__,
+              __LINE__);
+      return NULL;
+   }
+
+   return configs;
+}
+
+static void
+set_max_gl_versions(struct intel_screen *screen)
+{
+   __DRIscreen *psp = screen->driScrnPriv;
+
+   switch (screen->devinfo->gen) {
+   case 8:
+   case 7:
+      psp->max_gl_core_version = 33;
+      psp->max_gl_compat_version = 30;
+      psp->max_gl_es1_version = 11;
+      psp->max_gl_es2_version = 30;
+      break;
+   case 6:
+      psp->max_gl_core_version = 31;
+      psp->max_gl_compat_version = 30;
+      psp->max_gl_es1_version = 11;
+      psp->max_gl_es2_version = 30;
+      break;
+   case 5:
+   case 4:
+      psp->max_gl_core_version = 0;
+      psp->max_gl_compat_version = 21;
+      psp->max_gl_es1_version = 11;
+      psp->max_gl_es2_version = 20;
+      break;
+   default:
+      assert(!"unrecognized intel_screen::gen");
+      break;
+   }
+}
+
+/**
+ * This is the driver specific part of the createNewScreen entry point.
+ * Called when using DRI2.
+ *
+ * \return the list of __DRIconfigs supported by this driver
+ */
+static const
+__DRIconfig **intelInitScreen2(__DRIscreen *psp)
+{
+   struct intel_screen *intelScreen;
+
+   if (psp->image.loader) {
+   } else if (psp->dri2.loader->base.version <= 2 ||
+       psp->dri2.loader->getBuffersWithFormat == NULL) {
+      fprintf(stderr,
+	      "\nERROR!  DRI2 loader with getBuffersWithFormat() "
+	      "support required\n");
+      return false;
+   }
+
+   /* Allocate the private area */
+   intelScreen = rzalloc(NULL, struct intel_screen);
+   if (!intelScreen) {
+      fprintf(stderr, "\nERROR!  Allocating private area failed\n");
+      return false;
+   }
+   /* parse information in __driConfigOptions */
+   driParseOptionInfo(&intelScreen->optionCache, brw_config_options.xml);
+
+   intelScreen->driScrnPriv = psp;
+   psp->driverPrivate = (void *) intelScreen;
+
+   if (!intel_init_bufmgr(intelScreen))
+       return false;
+
+   intelScreen->deviceID = drm_intel_bufmgr_gem_get_devid(intelScreen->bufmgr);
+   intelScreen->devinfo = brw_get_device_info(intelScreen->deviceID);
+   if (!intelScreen->devinfo)
+      return false;
+
+   intelScreen->hw_must_use_separate_stencil = intelScreen->devinfo->gen >= 7;
+
+   intelScreen->hw_has_swizzling = intel_detect_swizzling(intelScreen);
+
+   const char *force_msaa = getenv("INTEL_FORCE_MSAA");
+   if (force_msaa) {
+      intelScreen->winsys_msaa_samples_override =
+         intel_quantize_num_samples(intelScreen, atoi(force_msaa));
+      printf("Forcing winsys sample count to %d\n",
+             intelScreen->winsys_msaa_samples_override);
+   } else {
+      intelScreen->winsys_msaa_samples_override = -1;
+   }
+
+   set_max_gl_versions(intelScreen);
+
+   /* Notification of GPU resets requires hardware contexts and a kernel new
+    * enough to support DRM_IOCTL_I915_GET_RESET_STATS.  If the ioctl is
+    * supported, calling it with a context of 0 will either generate EPERM or
+    * no error.  If the ioctl is not supported, it always generates EINVAL.
+    * Use this to determine whether to advertise the __DRI2_ROBUSTNESS
+    * extension to the loader.
+    *
+    * Don't even try on pre-Gen6, since we don't attempt to use contexts there.
+    */
+   if (intelScreen->devinfo->gen >= 6) {
+      struct drm_i915_reset_stats stats;
+      memset(&stats, 0, sizeof(stats));
+
+      const int ret = drmIoctl(psp->fd, DRM_IOCTL_I915_GET_RESET_STATS, &stats);
+
+      intelScreen->has_context_reset_notification =
+         (ret != -1 || errno != EINVAL);
+   }
+
+   psp->extensions = !intelScreen->has_context_reset_notification
+      ? intelScreenExtensions : intelRobustScreenExtensions;
+
+   brw_fs_alloc_reg_sets(intelScreen);
+   brw_vec4_alloc_reg_set(intelScreen);
+
+   return (const __DRIconfig**) intel_screen_make_configs(psp);
+}
+
+struct intel_buffer {
+   __DRIbuffer base;
+   drm_intel_bo *bo;
+};
+
+static __DRIbuffer *
+intelAllocateBuffer(__DRIscreen *screen,
+		    unsigned attachment, unsigned format,
+		    int width, int height)
+{
+   struct intel_buffer *intelBuffer;
+   struct intel_screen *intelScreen = screen->driverPrivate;
+
+   assert(attachment == __DRI_BUFFER_FRONT_LEFT ||
+          attachment == __DRI_BUFFER_BACK_LEFT);
+
+   intelBuffer = calloc(1, sizeof *intelBuffer);
+   if (intelBuffer == NULL)
+      return NULL;
+
+   /* The front and back buffers are color buffers, which are X tiled. */
+   uint32_t tiling = I915_TILING_X;
+   unsigned long pitch;
+   int cpp = format / 8;
+   intelBuffer->bo = drm_intel_bo_alloc_tiled(intelScreen->bufmgr,
+                                              "intelAllocateBuffer",
+                                              width,
+                                              height,
+                                              cpp,
+                                              &tiling, &pitch,
+                                              BO_ALLOC_FOR_RENDER);
+
+   if (intelBuffer->bo == NULL) {
+	   free(intelBuffer);
+	   return NULL;
+   }
+
+   drm_intel_bo_flink(intelBuffer->bo, &intelBuffer->base.name);
+
+   intelBuffer->base.attachment = attachment;
+   intelBuffer->base.cpp = cpp;
+   intelBuffer->base.pitch = pitch;
+
+   return &intelBuffer->base;
+}
+
+static void
+intelReleaseBuffer(__DRIscreen *screen, __DRIbuffer *buffer)
+{
+   struct intel_buffer *intelBuffer = (struct intel_buffer *) buffer;
+
+   drm_intel_bo_unreference(intelBuffer->bo);
+   free(intelBuffer);
+}
+
+static const struct __DriverAPIRec brw_driver_api = {
+   .InitScreen		 = intelInitScreen2,
+   .DestroyScreen	 = intelDestroyScreen,
+   .CreateContext	 = brwCreateContext,
+   .DestroyContext	 = intelDestroyContext,
+   .CreateBuffer	 = intelCreateBuffer,
+   .DestroyBuffer	 = intelDestroyBuffer,
+   .MakeCurrent		 = intelMakeCurrent,
+   .UnbindContext	 = intelUnbindContext,
+   .AllocateBuffer       = intelAllocateBuffer,
+   .ReleaseBuffer        = intelReleaseBuffer
+};
+
+static const struct __DRIDriverVtableExtensionRec brw_vtable = {
+   .base = { __DRI_DRIVER_VTABLE, 1 },
+   .vtable = &brw_driver_api,
+};
+
+static const __DRIextension *brw_driver_extensions[] = {
+    &driCoreExtension.base,
+    &driImageDriverExtension.base,
+    &driDRI2Extension.base,
+    &brw_vtable.base,
+    &brw_config_options.base,
+    NULL
+};
+
+PUBLIC const __DRIextension **__driDriverGetExtensions_i965(void)
+{
+   globalDriverAPI = &brw_driver_api;
+
+   return brw_driver_extensions;
+}
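+
+/* Note (a sketch of the loader contract, not spelled out in this file):
+ * __driDriverGetExtensions_<drivername> is the symbol a DRI loader resolves
+ * with dlsym() after dlopen()ing the driver .so. Returning
+ * brw_driver_extensions here is what publishes the driver vtable and the
+ * config options declared above to libGL and the X server.
+ */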
diff --git a/icd/intel/compiler/pipeline/intel_screen.h b/icd/intel/compiler/pipeline/intel_screen.h
new file mode 100644
index 0000000..6231332
--- /dev/null
+++ b/icd/intel/compiler/pipeline/intel_screen.h
@@ -0,0 +1,63 @@
+/**************************************************************************
+ *
+ * Copyright 2003 VMware, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **************************************************************************/
+
+#ifndef _INTEL_INIT_H_
+#define _INTEL_INIT_H_
+
+#include <stdbool.h>
+#include <sys/time.h>
+#include "intel_chipset.h"
+#include "brw_device_info.h"
+
+struct intel_screen
+{
+   int deviceID;
+   const struct brw_device_info *devinfo;
+
+   /**
+    * A unique ID for shader programs.
+    */
+   unsigned program_id;
+
+   struct {
+      struct ra_regs *regs;
+
+      /**
+       * Array of the ra classes for the unaligned contiguous register
+       * block sizes used.
+       */
+      int *classes;
+
+      /**
+       * Mapping for register-allocated objects in *regs to the first
+       * GRF for that object.
+       */
+      uint8_t *ra_reg_to_grf;
+   } vec4_reg_set;
+};
+
+#endif
diff --git a/icd/intel/compiler/pipeline/pipeline_compiler_interface.cpp b/icd/intel/compiler/pipeline/pipeline_compiler_interface.cpp
new file mode 100644
index 0000000..ecf055e
--- /dev/null
+++ b/icd/intel/compiler/pipeline/pipeline_compiler_interface.cpp
@@ -0,0 +1,938 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: GregF <greg@LunarG.com>
+ *
+ */
+
+#include <cinttypes>
+
+extern "C" {
+#include "desc.h"
+#include "gpu.h"
+#include "shader.h"
+#include "pipeline.h"
+}
+
+#include "compiler/shader/compiler_interface.h"
+#include "compiler/pipeline/pipeline_compiler_interface.h"
+#include "compiler/pipeline/brw_context.h"
+#include "compiler/pipeline/brw_shader.h"
+#include "compiler/mesa-utils/src/mesa/main/context.h"
+#include "compiler/mesa-utils/src/glsl/ralloc.h"
+#include "compiler/pipeline/brw_device_info.h"
+#include "compiler/pipeline/brw_wm.h"
+
+#ifndef STANDALONE_SHADER_COMPILER
+
+    static const bool standaloneCompiler = false;
+
+#else
+
+    static const bool standaloneCompiler = true;
+
+    // remove this when standalone resource map creation exists
+    bool intel_desc_iter_init_for_binding(struct intel_desc_iter *iter,
+                                          const struct intel_desc_layout *layout,
+                                          uint32_t binding_index, uint32_t array_base)
+    {
+        return true;
+    }
+
+#endif //STANDALONE_SHADER_COMPILER
+
+struct brw_binding_table {
+    uint32_t count;
+
+    uint32_t rt_start;
+
+    uint32_t texture_start;
+    uint32_t texture_count;
+
+    uint32_t ubo_start;
+    uint32_t ubo_count;
+
+    uint32_t texture_gather_start;
+    uint32_t texture_gather_count;
+
+    uint32_t *sampler_binding;
+    uint32_t *sampler_set;
+
+    uint32_t *uniform_binding;
+    uint32_t *uniform_set;
+
+    uint32_t *texture_gather_binding;
+    uint32_t *texture_gather_set;
+};
+
+static void initialize_brw_context(struct brw_context *brw,
+                                   const struct intel_gpu *gpu)
+{
+
+    // create a stripped down context for compilation
+    initialize_mesa_context_to_defaults(&brw->ctx);
+
+    //
+    // init the things pulled from DRI in brwCreateContext
+    //
+    struct brw_device_info *devInfo = rzalloc(brw, struct brw_device_info);
+    switch (intel_gpu_gen(gpu)) {
+    case INTEL_GEN(7.5):
+        devInfo->gen = 7;
+        devInfo->is_haswell = true;
+        break;
+    case INTEL_GEN(7):
+        devInfo->gen = 7;
+        break;
+    case INTEL_GEN(6):
+        devInfo->gen = 6;
+        break;
+    default:
+        assert(!"unsupported GEN");
+        break;
+    }
+
+    devInfo->gt = gpu->gt;
+    devInfo->has_llc = true;
+    devInfo->has_pln = true;
+    devInfo->has_compr4 = true;
+    devInfo->has_negative_rhw_bug = false;
+    devInfo->needs_unlit_centroid_workaround = true;
+
+    // hand code values until we have something to pull from
+    // use brw_device_info_hsw_gt3
+    brw->intelScreen = rzalloc(brw, struct intel_screen);
+    brw->intelScreen->devinfo = devInfo;
+
+    brw->gen = brw->intelScreen->devinfo->gen;
+    brw->gt = brw->intelScreen->devinfo->gt;
+    brw->is_g4x = brw->intelScreen->devinfo->is_g4x;
+    brw->is_baytrail = brw->intelScreen->devinfo->is_baytrail;
+    brw->is_haswell = brw->intelScreen->devinfo->is_haswell;
+    brw->has_llc = brw->intelScreen->devinfo->has_llc;
+    brw->has_pln = brw->intelScreen->devinfo->has_pln;
+    brw->has_compr4 = brw->intelScreen->devinfo->has_compr4;
+    brw->has_negative_rhw_bug = brw->intelScreen->devinfo->has_negative_rhw_bug;
+    brw->needs_unlit_centroid_workaround =
+       brw->intelScreen->devinfo->needs_unlit_centroid_workaround;
+
+    brw->vs.base.stage = MESA_SHADER_VERTEX;
+    brw->gs.base.stage = MESA_SHADER_GEOMETRY;
+    brw->wm.base.stage = MESA_SHADER_FRAGMENT;
+
+    //
+    // init what remains of intel_screen
+    //
+    brw->intelScreen->deviceID = 0;
+    brw->intelScreen->program_id = 0;
+
+    brw_vec4_alloc_reg_set(brw->intelScreen);
+
+    brw->shader_prog = brw_new_shader_program(&brw->ctx, 0);
+}
+
+static void hexdump(FILE *fp, void *ptr, int buflen) {
+  unsigned int *buf = (unsigned int*)ptr;
+  int i, j;
+  for (i=0; i<(buflen/4); i+=4) {
+    fprintf(fp,"%06x: ", i);
+    for (j=0; j<4; j++)
+      if (i+j < (buflen/4))
+        fprintf(fp,"%08x ", buf[i+j]);
+      else
+        fprintf(fp,"         ");
+    fprintf(fp,"\n");
+  }
+
+  fflush(fp);
+}
+
+static void base_prog_dump(FILE *fp, struct brw_stage_prog_data* stage_prog_data)
+{
+    fprintf(fp, "stage_prog_data->binding_table.size_bytes = %u\n",
+                 stage_prog_data->binding_table.size_bytes);
+    fprintf(fp, "stage_prog_data->binding_table.pull_constants_start = %u\n",
+                 stage_prog_data->binding_table.pull_constants_start);
+    fprintf(fp, "stage_prog_data->binding_table.texture_start = %u\n",
+                 stage_prog_data->binding_table.texture_start);
+    fprintf(fp, "stage_prog_data->binding_table.gather_texture_start = %u\n",
+                 stage_prog_data->binding_table.gather_texture_start);
+    fprintf(fp, "stage_prog_data->binding_table.ubo_start = %u\n",
+                 stage_prog_data->binding_table.ubo_start);
+    fprintf(fp, "stage_prog_data->binding_table.abo_start = %u\n",
+                 stage_prog_data->binding_table.abo_start);
+    fprintf(fp, "stage_prog_data->binding_table.shader_time_start = %u\n",
+                 stage_prog_data->binding_table.shader_time_start);
+
+    fprintf(fp, "stage_prog_data->nr_params = %u\n",
+                 stage_prog_data->nr_params);
+    fprintf(fp, "stage_prog_data->nr_pull_params = %u\n",
+                 stage_prog_data->nr_pull_params);
+
+    fprintf(fp, "== push constants: ==\n");
+    fprintf(fp, "stage_prog_data->nr_params = %u\n",
+                 stage_prog_data->nr_params);
+
+    for (unsigned i = 0; i < stage_prog_data->nr_params; ++i) {
+        fprintf(fp, "stage_prog_data->param[%u] = %p\n",
+                     i, (void *) stage_prog_data->param[i]);
+        fprintf(fp, "*stage_prog_data->param[%u] = %f\n",
+                     i, *stage_prog_data->param[i]);
+    }
+
+    fprintf(fp, "== pull constants: ==\n");
+    fprintf(fp, "stage_prog_data->nr_pull_params = %u\n",
+                 stage_prog_data->nr_pull_params);
+
+    for (unsigned i = 0; i < stage_prog_data->nr_pull_params; ++i) {
+        fprintf(fp, "stage_prog_data->pull_param[%u] = %p\n",
+                     i, (void *) stage_prog_data->pull_param[i]);
+        fprintf(fp, "*stage_prog_data->pull_param[%u] = %f\n",
+                     i, *stage_prog_data->pull_param[i]);
+    }
+}
+
+
+static void base_vec4_prog_dump(FILE *fp, struct brw_vec4_prog_data* vec4_prog_data)
+{
+    fprintf(fp, "vec4_prog_data->vue_map.slots_valid = 0x%" PRIX64 "\n",
+                 vec4_prog_data->vue_map.slots_valid);
+
+    for (int i = 0; i < BRW_VARYING_SLOT_COUNT; ++i)
+        fprintf(fp, "vec4_prog_data->vue_map.varying_to_slot[%i] = %i\n", i,
+               (int) vec4_prog_data->vue_map.varying_to_slot[i]);
+
+    for (int i = 0; i < BRW_VARYING_SLOT_COUNT; ++i)
+        fprintf(fp, "vec4_prog_data->vue_map.slot_to_varying[%i] = %i\n", i,
+               (int) vec4_prog_data->vue_map.slot_to_varying[i]);
+
+    fprintf(fp, "vec4_prog_data->vue_map.num_slots = %i\n",
+                 vec4_prog_data->vue_map.num_slots);
+    fprintf(fp, "vec4_prog_data->dispatch_grf_start_reg = %u\n",
+                 vec4_prog_data->dispatch_grf_start_reg);
+    fprintf(fp, "vec4_prog_data->curb_read_length = %u\n",
+                 vec4_prog_data->curb_read_length);
+    fprintf(fp, "vec4_prog_data->urb_read_length = %u\n",
+                 vec4_prog_data->urb_read_length);
+    fprintf(fp, "vec4_prog_data->total_grf = %u\n",
+                 vec4_prog_data->total_grf);
+    fprintf(fp, "vec4_prog_data->total_scratch = %u\n",
+                 vec4_prog_data->total_scratch);
+    fprintf(fp, "vec4_prog_data->urb_entry_size = %u\n",
+                 vec4_prog_data->urb_entry_size);
+}
+
+static void vs_data_dump(FILE *fp, struct brw_vs_prog_data *vs_prog_data)
+{
+    fprintf(fp, "\n=== begin brw_vs_prog_data ===\n");
+
+    base_prog_dump(fp, &vs_prog_data->base.base);
+
+    base_vec4_prog_dump(fp, &vs_prog_data->base);
+
+    fprintf(fp, "vs_prog_data->inputs_read = 0x%" PRIX64 "\n",
+                 vs_prog_data->inputs_read);
+    fprintf(fp, "vs_prog_data->uses_vertexid = %s\n",
+                 vs_prog_data->uses_vertexid ? "true" : "false");
+    fprintf(fp, "vs_prog_data->uses_instanceid = %s\n",
+                 vs_prog_data->uses_instanceid ? "true" : "false");
+
+    fprintf(fp, "=== end brw_vs_prog_data ===\n");
+
+    fflush(fp);
+}
+
+static void gs_data_dump(FILE *fp, struct brw_gs_prog_data *gs_prog_data)
+{
+    fprintf(fp, "\n=== begin brw_gs_prog_data ===\n");
+
+    base_prog_dump(fp, &gs_prog_data->base.base);
+
+    base_vec4_prog_dump(fp, &gs_prog_data->base);
+
+    fprintf(fp, "gs_prog_data->output_vertex_size_hwords = %u\n",
+                 gs_prog_data->output_vertex_size_hwords);
+    fprintf(fp, "gs_prog_data->output_topology = %u\n",
+                 gs_prog_data->output_topology);
+    fprintf(fp, "gs_prog_data->control_data_header_size_hwords = %u\n",
+                 gs_prog_data->control_data_header_size_hwords);
+    fprintf(fp, "gs_prog_data->control_data_format = %u\n",
+                 gs_prog_data->control_data_format);
+    fprintf(fp, "gs_prog_data->include_primitive_id = %s\n",
+                 gs_prog_data->include_primitive_id ? "true" : "false");
+    fprintf(fp, "gs_prog_data->invocations = %u\n",
+                 gs_prog_data->invocations);
+    fprintf(fp, "gs_prog_data->dual_instanced_dispatch = %s\n",
+                 gs_prog_data->dual_instanced_dispatch ? "true" : "false");
+
+    fprintf(fp, "=== end brw_gs_prog_data ===\n");
+
+    fflush(fp);
+}
+
+static void fs_data_dump(FILE *fp, struct brw_wm_prog_data* wm_prog_data)
+{
+    fprintf(fp, "\n=== begin brw_wm_prog_data ===\n");
+
+    base_prog_dump(fp, &wm_prog_data->base);
+
+    fprintf(fp, "wm_prog_data->curb_read_length = %u\n",
+                 wm_prog_data->curb_read_length);
+    fprintf(fp, "wm_prog_data->num_varying_inputs = %u\n",
+                 wm_prog_data->num_varying_inputs);
+    fprintf(fp, "wm_prog_data->first_curbe_grf = %u\n",
+                 wm_prog_data->first_curbe_grf);
+    fprintf(fp, "wm_prog_data->first_curbe_grf_16 = %u\n",
+                 wm_prog_data->first_curbe_grf_16);
+    fprintf(fp, "wm_prog_data->reg_blocks = %u\n",
+                 wm_prog_data->reg_blocks);
+    fprintf(fp, "wm_prog_data->reg_blocks_16 = %u\n",
+                 wm_prog_data->reg_blocks_16);
+    fprintf(fp, "wm_prog_data->total_scratch = %u\n",
+                 wm_prog_data->total_scratch);
+    fprintf(fp, "wm_prog_data->binding_table.render_target_start = %u\n",
+                 wm_prog_data->binding_table.render_target_start);
+    fprintf(fp, "wm_prog_data->dual_src_blend = %s\n",
+                 wm_prog_data->dual_src_blend ? "true" : "false");
+    fprintf(fp, "wm_prog_data->uses_pos_offset = %s\n",
+                 wm_prog_data->uses_pos_offset ? "true" : "false");
+    fprintf(fp, "wm_prog_data->uses_omask = %s\n",
+                 wm_prog_data->uses_omask ? "true" : "false");
+    fprintf(fp, "wm_prog_data->prog_offset_16 = %u\n",
+                 wm_prog_data->prog_offset_16);
+    fprintf(fp, "wm_prog_data->barycentric_interp_modes = %u\n",
+                 wm_prog_data->barycentric_interp_modes);
+
+    for (int i = 0; i < VARYING_SLOT_MAX; ++i) {
+        fprintf(fp, "wm_prog_data->urb_setup[%i] = %i\n",
+                  i, wm_prog_data->urb_setup[i]);
+    }
+
+    fprintf(fp, "=== end brw_wm_prog_data ===\n");
+
+    fflush(fp);
+}
+
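+// Allocation shims: in standalone compiler builds there is no intel_gpu
+// handle to allocate against, so these fall back to plain libc calloc/free
+// and ignore the requested alignment and allocation scope (assumed
+// acceptable for offline compilation).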
+static inline void pipe_interface_free(const void *handle, void *ptr)
+{
+    if (standaloneCompiler)
+        free(ptr);
+    else
+        intel_free(handle, ptr);
+}
+
+static inline void *pipe_interface_alloc(const void *handle,
+                                                    size_t size, size_t alignment,
+                                                    VkSystemAllocationScope scope)
+{
+    if (standaloneCompiler)
+        return calloc(size, sizeof(char));
+    else
+        return intel_alloc(handle, size, alignment, scope);
+}
+
+static void rmap_destroy(const struct intel_gpu *gpu,
+                         struct intel_pipeline_rmap *rmap)
+{
+    pipe_interface_free(gpu, rmap->slots);
+    pipe_interface_free(gpu, rmap);
+}
+
+static struct intel_pipeline_rmap *rmap_create(const struct intel_gpu *gpu,
+                                               const struct intel_pipeline_layout *pipeline_layout,
+                                               const struct brw_binding_table *bt)
+{
+    struct intel_pipeline_rmap *rmap;
+    struct intel_desc_iter iter;
+    uint32_t surface_count, i;
+
+    rmap = (struct intel_pipeline_rmap *)
+        pipe_interface_alloc(gpu, sizeof(*rmap), sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!rmap)
+        return NULL;
+
+    // remove this when standalone resource map creation exists
+    if (standaloneCompiler)
+        return rmap;
+
+    memset(rmap, 0, sizeof(*rmap));
+
+    /* Fix the compiler and fix these!  No point in understanding them. */
+    rmap->rt_count = bt->texture_start;
+    rmap->texture_resource_count = bt->ubo_start - bt->texture_start;
+    rmap->uav_count = bt->ubo_count;
+    rmap->sampler_count = rmap->texture_resource_count;
+    surface_count = rmap->rt_count + rmap->texture_resource_count +
+        rmap->uav_count;
+    rmap->slot_count = surface_count + rmap->sampler_count;
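+    // e.g. 1 RT + 2 textures + 1 UBO gives surface_count = 4 and, with the
+    // 2 sampler slots appended, slot_count = 6 (illustrative numbers)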
+
+    rmap->slots = (struct intel_pipeline_rmap_slot *)
+        pipe_interface_alloc(gpu, sizeof(rmap->slots[0]) * rmap->slot_count,
+            sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!rmap->slots) {
+        pipe_interface_free(gpu, rmap);
+        return NULL;
+    }
+
+    memset(rmap->slots, 0, sizeof(rmap->slots[0]) * rmap->slot_count);
+
+    for (i = 0; i < bt->rt_start; i++)
+        rmap->slots[i].type = INTEL_PIPELINE_RMAP_UNUSED;
+
+    for (i = bt->rt_start; i < bt->texture_start; i++) {
+        rmap->slots[i].type = INTEL_PIPELINE_RMAP_RT;
+        rmap->slots[i].index = i - bt->rt_start;
+    }
+
+    for (i = bt->texture_start; i < bt->ubo_start; i++) {
+        rmap->slots[i].type = INTEL_PIPELINE_RMAP_SURFACE;
+        rmap->slots[i].index = bt->sampler_set[i - bt->texture_start];
+
+        // use the set and binding data to find correct dset slot
+        // XXX validate both set and binding
+        // XXX no array support
+        intel_desc_iter_init_for_binding(&iter,
+                pipeline_layout->layouts[rmap->slots[i].index],
+                bt->sampler_binding[i - bt->texture_start], 0);
+
+        rmap->slots[i].u.surface.offset = iter.begin;
+        rmap->slots[i].u.surface.dynamic_offset_index = -1;
+
+        rmap->slots[bt->count + i - bt->texture_start].type =
+            INTEL_PIPELINE_RMAP_SAMPLER;
+        rmap->slots[bt->count + i - bt->texture_start].index =
+            rmap->slots[i].index;
+        rmap->slots[bt->count + i - bt->texture_start].u.sampler =
+            iter.begin;
+    }
+
+    for (i = bt->ubo_start; i < bt->texture_gather_start; i++) {
+        rmap->slots[i].type = INTEL_PIPELINE_RMAP_SURFACE;
+        rmap->slots[i].index = bt->uniform_set[i - bt->ubo_start];
+
+        // use the set and binding data to find correct dset slot
+        // XXX validate both set and binding
+        // XXX no array support
+        intel_desc_iter_init_for_binding(&iter,
+                pipeline_layout->layouts[rmap->slots[i].index],
+                bt->uniform_binding[i - bt->ubo_start], 0);
+
+        rmap->slots[i].u.surface.offset = iter.begin;
+        rmap->slots[i].u.surface.dynamic_offset_index = -1;
+    }
+
+    for (i = bt->texture_gather_start; i < bt->count; i++) {
+        rmap->slots[i].type = INTEL_PIPELINE_RMAP_SURFACE;
+        rmap->slots[i].index = bt->texture_gather_set[i - bt->texture_gather_start];
+
+        // use the set and binding data to find correct dset slot
+        // XXX validate both set and binding
+        // XXX no array support
+        intel_desc_iter_init_for_binding(&iter,
+                pipeline_layout->layouts[rmap->slots[i].index],
+                bt->texture_gather_binding[i - bt->texture_gather_start], 0);
+
+        rmap->slots[i].u.surface.offset = iter.begin;
+        rmap->slots[i].u.surface.dynamic_offset_index = -1;
+
+        rmap->slots[bt->count + i - bt->texture_gather_start].type =
+            INTEL_PIPELINE_RMAP_SAMPLER;
+        rmap->slots[bt->count + i - bt->texture_gather_start].index =
+            rmap->slots[i].index;
+        rmap->slots[bt->count + i - bt->texture_gather_start].u.sampler =
+            iter.begin;
+    }
+
+    return rmap;
+}
+
+extern "C" {
+
+struct brw_context *intel_create_brw_context(const struct intel_gpu *gpu)
+{
+    // create a brw_context
+    struct brw_context *brw = rzalloc(NULL, struct brw_context);
+
+    // initialize sub structures (allocated from the brw ralloc context)
+    initialize_brw_context(brw, gpu);
+
+    return brw;
+}
+
+void intel_destroy_brw_context(struct brw_context *brw)
+{
+    ralloc_free(brw->shader_prog);
+    ralloc_free(brw);
+}
+
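+// Packing convention (mirrored from the LunarGLASS GLSL backend): the
+// descriptor set lives in the upper 16 bits of location, biased by 1, and
+// the binding in the lower 16 bits.  Illustrative example: location
+// 0x0002000A unpacks to binding = 10 and set = 2 - 1 = 1; upper bits of
+// zero leave set = 0, meaning no explicit set decoration was present.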
+void unpack_set_and_binding(const int location, int &set, int &binding)
+{
+    // Logic mirrored from LunarGLASS GLSL backend
+    set = (unsigned) location >> 16;
+    binding = location & 0xFFFF;
+
+    // Unbias set, which was biased by 1 to distinguish between "set=0" and nothing.
+    bool setPresent = (set != 0);
+    if (setPresent)
+        --set;
+}
+
+const char *shader_stage_to_string(VkShaderStageFlagBits stage)
+{
+   switch (stage) {
+   case VK_SHADER_STAGE_VERTEX_BIT:          return "vertex";
+   case VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT:    return "tessellation control";
+   case VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT: return "tessellation evaluation";
+   case VK_SHADER_STAGE_GEOMETRY_BIT:        return "geometry";
+   case VK_SHADER_STAGE_FRAGMENT_BIT:        return "fragment";
+   case VK_SHADER_STAGE_COMPUTE_BIT:         return "compute";
+   default:
+       assert(0 && "Unknown shader stage");
+       return "unknown";
+   }
+}
+
+static VkResult build_binding_table(const struct intel_gpu *gpu,
+                                    struct brw_context *brw,
+                                    struct brw_binding_table *bt,
+                                    const brw_stage_prog_data &data,
+                                    struct gl_shader_program *sh_prog,
+                                    VkShaderStageFlagBits stage)
+{
+    gl_shader *sh = 0;
+    switch (stage) {
+    case VK_SHADER_STAGE_VERTEX_BIT:   sh = sh_prog->_LinkedShaders[MESA_SHADER_VERTEX];   break;
+    case VK_SHADER_STAGE_GEOMETRY_BIT: sh = sh_prog->_LinkedShaders[MESA_SHADER_GEOMETRY]; break;
+    case VK_SHADER_STAGE_FRAGMENT_BIT: sh = sh_prog->_LinkedShaders[MESA_SHADER_FRAGMENT]; break;
+    default:
+        assert(0 && "Unknown shader stage");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    bt->count = data.binding_table.size_bytes / 4;
+
+    // See assign_common_binding_table_offsets.
+    // If gather is in use (before gen8), the shader will use a second set of
+    // indices for gathers on an instruction-by-instruction basis.  We must
+    // duplicate our texture binding entries to compensate; they follow the
+    // UBO entries.
+    const bool usesGather = sh->Program->UsesGather && (brw->gen < 8);
+    bt->texture_start = data.binding_table.texture_start;
+    bt->texture_count = data.binding_table.ubo_start - data.binding_table.texture_start;
+    bt->ubo_start     = data.binding_table.ubo_start;
+    if (usesGather) {
+        bt->ubo_count            = data.binding_table.gather_texture_start - data.binding_table.ubo_start;
+        bt->texture_gather_start = data.binding_table.gather_texture_start;
+        bt->texture_gather_count = data.binding_table.pull_constants_start - bt->texture_gather_start;
+    } else {
+        bt->ubo_count            = data.binding_table.pull_constants_start - data.binding_table.ubo_start;
+        bt->texture_gather_start = data.binding_table.pull_constants_start;
+        bt->texture_gather_count = 0;
+    }
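+    // Example with made-up numbers: a gen7 shader with 2 textures, 1 UBO,
+    // and gather in use might report texture_start = 0, ubo_start = 2,
+    // gather_texture_start = 3, and pull_constants_start = 5, giving
+    // texture_count = 2, ubo_count = 1, and texture_gather_count = 2.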
+
+    // TODO: handle atomics
+    assert(sh_prog->NumAtomicBuffers == 0);
+
+    if (bt->ubo_count != sh->NumUniformBlocks) {
+        // If there is no UBO data to pull from, the shader is reading from the
+        // default uniform block, which will not work in VK.  We need a binding
+        // slot to pull from.
+        intel_log(gpu, VK_DEBUG_REPORT_ERROR_BIT_EXT, VK_DEBUG_REPORT_OBJECT_TYPE_PHYSICAL_DEVICE_EXT, reinterpret_cast<uint64_t>(gpu), 0, 0,
+                "compile error: UBO mismatch, %s shader may read from global, non-block uniform", shader_stage_to_string(stage));
+
+        assert(0);
+        /* TODO: Move this test to validation layer, is this already covered in shader_checker? */
+//        return VK_ERROR_BAD_PIPELINE_DATA;
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    // Sampler mapping data
+    bt->sampler_binding = (uint32_t*) rzalloc_size(brw, bt->texture_count * sizeof(uint32_t));
+    bt->sampler_set     = (uint32_t*) rzalloc_size(brw, bt->texture_count * sizeof(uint32_t));
+    for (int i = 0; i < bt->texture_count; ++i) {
+        int location = sh->SamplerUnits[i];
+        int set = 0;
+        int binding = 0;
+
+        unpack_set_and_binding(location, set, binding);
+
+        bt->sampler_binding[i] = binding;
+        bt->sampler_set[i]     = set;
+    }
+
+    // UBO mapping data
+    bt->uniform_binding = (uint32_t*) rzalloc_size(brw, bt->ubo_count * sizeof(uint32_t));
+    bt->uniform_set     = (uint32_t*) rzalloc_size(brw, bt->ubo_count * sizeof(uint32_t));
+    for (int i = 0; i < bt->ubo_count; ++i) {
+        int location = sh->UniformBlocks[i].Binding;
+        int set = 0;
+        int binding = 0;
+
+        unpack_set_and_binding(location, set, binding);
+
+        bt->uniform_binding[i] = binding;
+        bt->uniform_set[i]     = set;
+    }
+
+    // TextureGather mapping data
+    // Currently this is exactly like Sampler mapping data
+    bt->texture_gather_binding = (uint32_t*) rzalloc_size(brw, bt->texture_gather_count * sizeof(uint32_t));
+    bt->texture_gather_set     = (uint32_t*) rzalloc_size(brw, bt->texture_gather_count * sizeof(uint32_t));
+    for (int i = 0; i < bt->texture_gather_count; ++i) {
+        int location = sh->SamplerUnits[i];
+        int set = 0;
+        int binding = 0;
+
+        unpack_set_and_binding(location, set, binding);
+
+        bt->texture_gather_binding[i] = binding;
+        bt->texture_gather_set[i]     = set;
+    }
+
+    return VK_SUCCESS;
+}
+
+// invoke backend compiler to generate ISA and supporting data structures
+VkResult intel_pipeline_shader_compile(struct intel_pipeline_shader *pipe_shader,
+                                         const struct intel_gpu *gpu,
+                                         const struct intel_pipeline_layout *pipeline_layout,
+                                         const VkPipelineShaderStageCreateInfo *info,
+                                         const struct intel_ir* ir)
+{
+    /* XXX how about constness? */
+    struct gl_shader_program *sh_prog = (struct gl_shader_program *) ir;
+    VkResult status = VK_SUCCESS;
+    struct brw_binding_table bt;
+
+    struct brw_context *brw = intel_create_brw_context(gpu);
+
+    memset(&bt, 0, sizeof(bt));
+
+    // LunarG : TODO - should this have been set for us somewhere?
+    sh_prog->Type = sh_prog->Shaders[0]->Stage;
+
+    if (brw_link_shader(&brw->ctx, sh_prog)) {
+
+        // first take at standalone backend compile
+        switch(sh_prog->Shaders[0]->Type) {
+        case GL_VERTEX_SHADER:
+        {
+            pipe_shader->codeSize = get_vs_program_size(brw->shader_prog);
+
+            pipe_shader->pCode = pipe_interface_alloc(gpu, pipe_shader->codeSize, sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+
+            if (!pipe_shader->pCode) {
+                status = VK_ERROR_OUT_OF_HOST_MEMORY;
+                break;
+            }
+
+            // copy the ISA out of our compile context; it is about to poof away
+            memcpy(pipe_shader->pCode, get_vs_program(brw->shader_prog), pipe_shader->codeSize);
+
+            struct brw_vs_prog_data *data = get_vs_prog_data(brw->shader_prog);
+
+            if (data->uses_vertexid)
+                pipe_shader->uses |= INTEL_SHADER_USE_VID;
+
+            if (data->uses_instanceid)
+                pipe_shader->uses |= INTEL_SHADER_USE_IID;
+
+            assert(VERT_ATTRIB_MAX - VERT_ATTRIB_GENERIC0 < 64);
+            uint64_t user_attr_read = 0;
+            for (int i=VERT_ATTRIB_GENERIC0; i < VERT_ATTRIB_MAX; i++) {
+                if (data->inputs_read & BITFIELD64_BIT(i)) {
+                    user_attr_read |= (1ULL << (i - VERT_ATTRIB_GENERIC0));
+                }
+            }
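+            // e.g. a VS reading GENERIC0 and GENERIC3 yields
+            // user_attr_read = 0b1001 (illustrative)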
+            pipe_shader->inputs_read = user_attr_read;
+
+            pipe_shader->enable_user_clip = sh_prog->Vert.UsesClipDistance;
+
+            assert(VARYING_SLOT_MAX - VARYING_SLOT_CLIP_DIST0 < 64);
+            uint64_t varyings_written = 0;
+            for (int i=VARYING_SLOT_CLIP_DIST0; i < VARYING_SLOT_MAX; i++) {
+                if (data->base.vue_map.varying_to_slot[i] >= 0) {
+                    varyings_written |= (1ULL << (i - VARYING_SLOT_CLIP_DIST0));
+                }
+            }
+            pipe_shader->outputs_written = varyings_written;
+
+            pipe_shader->outputs_offset = BRW_SF_URB_ENTRY_READ_OFFSET * 2;
+
+            // These are really best guesses, and will require more work to
+            // understand as we turn on more features
+            pipe_shader->in_count = u_popcount(user_attr_read) +
+                    ((data->uses_vertexid || data->uses_instanceid) ? 1 : 0);
+            pipe_shader->out_count = data->base.vue_map.num_slots;// = 2;
+            pipe_shader->urb_grf_start = data->base.dispatch_grf_start_reg;// = 1;
+            pipe_shader->surface_count = data->base.base.binding_table.size_bytes / 4;
+            pipe_shader->ubo_start     = data->base.base.binding_table.ubo_start;
+            pipe_shader->per_thread_scratch_size = data->base.total_scratch;
+
+            status = build_binding_table(gpu, brw, &bt, data->base.base, sh_prog, VK_SHADER_STAGE_VERTEX_BIT);
+            if (status != VK_SUCCESS)
+                break;
+
+            if (unlikely(INTEL_DEBUG & DEBUG_VS)) {
+                printf("out_count: %d\n", pipe_shader->out_count);
+
+                vs_data_dump(stdout, data);
+
+                fprintf(stdout,"\nISA generated by compiler:\n");
+                fprintf(stdout,"ISA size: %i\n", (int) pipe_shader->codeSize);
+                hexdump(stdout, pipe_shader->pCode, pipe_shader->codeSize);
+                fflush(stdout);
+            }
+        }
+            break;
+
+        case GL_GEOMETRY_SHADER:
+        {
+            pipe_shader->codeSize = get_gs_program_size(brw->shader_prog);
+
+            pipe_shader->pCode = pipe_interface_alloc(gpu, pipe_shader->codeSize, sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+
+            if (!pipe_shader->pCode) {
+                status = VK_ERROR_OUT_OF_HOST_MEMORY;
+                break;
+            }
+
+            // copy the ISA out of our compile context; it is about to poof away
+            memcpy(pipe_shader->pCode, get_gs_program(brw->shader_prog), pipe_shader->codeSize);
+
+            struct brw_gs_prog_data *data = get_gs_prog_data(brw->shader_prog);
+
+            struct gl_geometry_program *gp = (struct gl_geometry_program *)
+               sh_prog->_LinkedShaders[MESA_SHADER_GEOMETRY]->Program;
+
+            // for now, assume inputs have not changed, but need to hook up vue_map_vs
+            pipe_shader->inputs_read = gp->Base.InputsRead;
+
+            // urb entries are reported in pairs, see vec4_gs_visitor::setup_varying_inputs()
+            pipe_shader->in_count = data->base.urb_read_length * 2;
+
+            pipe_shader->enable_user_clip = sh_prog->Geom.UsesClipDistance;
+            pipe_shader->discard_adj      = (sh_prog->Geom.InputType == GL_LINES ||
+                                             sh_prog->Geom.InputType == GL_TRIANGLES);
+
+            assert(VARYING_SLOT_MAX - VARYING_SLOT_CLIP_DIST0 < 64);
+            uint64_t varyings_written = 0;
+            for (int i=VARYING_SLOT_CLIP_DIST0; i < VARYING_SLOT_MAX; i++) {
+                if (data->base.vue_map.varying_to_slot[i] >= 0) {
+                    varyings_written |= (1ULL << (i - VARYING_SLOT_CLIP_DIST0));
+                }
+            }
+            pipe_shader->outputs_written = varyings_written;
+            pipe_shader->outputs_offset = BRW_SF_URB_ENTRY_READ_OFFSET * 2;
+            pipe_shader->out_count = data->base.vue_map.num_slots;
+
+            // The following were all programmed in brw_gs_do_compile
+            pipe_shader->output_size_hwords              = data->output_vertex_size_hwords;
+            pipe_shader->output_topology                 = data->output_topology;
+            pipe_shader->control_data_header_size_hwords = data->control_data_header_size_hwords;
+            pipe_shader->control_data_format             = data->control_data_format;
+            pipe_shader->include_primitive_id            = data->include_primitive_id;
+            pipe_shader->invocations                     = data->invocations;
+            pipe_shader->dual_instanced_dispatch         = data->dual_instanced_dispatch;
+
+            // The rest duplicated from VS, merge it and clean up
+            pipe_shader->urb_grf_start = data->base.dispatch_grf_start_reg;
+            pipe_shader->surface_count = data->base.base.binding_table.size_bytes / 4;
+            pipe_shader->ubo_start     = data->base.base.binding_table.ubo_start;
+            pipe_shader->per_thread_scratch_size = data->base.total_scratch;
+
+            status = build_binding_table(gpu, brw, &bt, data->base.base, sh_prog, VK_SHADER_STAGE_GEOMETRY_BIT);
+            if (status != VK_SUCCESS)
+                break;
+
+            if (unlikely(INTEL_DEBUG & DEBUG_GS)) {
+                printf("out_count: %d\n", pipe_shader->out_count);
+
+                gs_data_dump(stdout, data);
+
+                fprintf(stdout,"\nISA generated by compiler:\n");
+                fprintf(stdout,"ISA size: %i\n", (int) pipe_shader->codeSize);
+                hexdump(stdout, pipe_shader->pCode, pipe_shader->codeSize);
+                fflush(stdout);
+            }
+        }
+        break;
+
+        case GL_FRAGMENT_SHADER:
+        {
+            // Start pulling bits out of our compile result.
+            // See upload_ps_state() for notes on what each of these values means.
+
+            // It would be preferable to pull this data out without exposing the
+            // internals of the compiler, but no clean way has presented itself yet.
+
+            pipe_shader->codeSize = get_wm_program_size(brw->shader_prog);
+
+            pipe_shader->pCode = pipe_interface_alloc(gpu, pipe_shader->codeSize, sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+
+            if (!pipe_shader->pCode) {
+                status = VK_ERROR_OUT_OF_HOST_MEMORY;
+                break;
+            }
+
+            // copy the ISA out of our compile context; it is about to poof away
+            memcpy(pipe_shader->pCode, get_wm_program(brw->shader_prog), pipe_shader->codeSize);
+
+            struct brw_wm_prog_data *data = get_wm_prog_data(brw->shader_prog);
+
+            assert(VARYING_SLOT_MAX - VARYING_SLOT_CLIP_DIST0 < 64);
+            uint64_t varyings_read = 0;
+            for (int i=VARYING_SLOT_CLIP_DIST0; i < VARYING_SLOT_MAX; i++) {
+                if (data->urb_setup[i] >= 0) {
+                    varyings_read |= (1ULL << (i - VARYING_SLOT_CLIP_DIST0));
+                }
+            }
+            pipe_shader->inputs_read = varyings_read;
+
+            pipe_shader->generic_input_start = VARYING_SLOT_VAR0 - VARYING_SLOT_CLIP_DIST0;
+
+            pipe_shader->reads_user_clip = data->urb_setup[VARYING_SLOT_CLIP_DIST0] >= 0 ||
+                                           data->urb_setup[VARYING_SLOT_CLIP_DIST1] >= 0;
+
+            pipe_shader->surface_count = data->base.binding_table.size_bytes / 4;
+            pipe_shader->ubo_start     = data->base.binding_table.ubo_start;
+            pipe_shader->urb_grf_start = data->first_curbe_grf;
+            pipe_shader->in_count      = data->num_varying_inputs;
+
+            // Pass on SIMD16 info
+            pipe_shader->urb_grf_start_16 = data->first_curbe_grf_16;
+            pipe_shader->offset_16        = data->prog_offset_16;
+
+            // These are programmed based on gen7_wm_state.c::upload_wm_state()
+            struct gl_fragment_program *fp = (struct gl_fragment_program *)
+               sh_prog->_LinkedShaders[MESA_SHADER_FRAGMENT]->Program;
+
+            if (fp->UsesKill)
+                pipe_shader->uses |= INTEL_SHADER_USE_KILL;
+
+            if (fp->Base.InputsRead & VARYING_BIT_POS)
+                pipe_shader->uses |= INTEL_SHADER_USE_DEPTH | INTEL_SHADER_USE_W;
+
+            if (fp->Base.OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_DEPTH)) {
+
+                switch (fp->FragDepthLayout) {
+                   case FRAG_DEPTH_LAYOUT_NONE:
+                   case FRAG_DEPTH_LAYOUT_ANY:
+                      pipe_shader->computed_depth_mode = INTEL_COMPUTED_DEPTH_MODE_ON;
+                      break;
+                   case FRAG_DEPTH_LAYOUT_GREATER:
+                      pipe_shader->computed_depth_mode = INTEL_COMPUTED_DEPTH_MODE_ON_GE;
+                      break;
+                   case FRAG_DEPTH_LAYOUT_LESS:
+                      pipe_shader->computed_depth_mode = INTEL_COMPUTED_DEPTH_MODE_ON_LE;
+                      break;
+                   case FRAG_DEPTH_LAYOUT_UNCHANGED:
+                      break;
+                }
+            }
+
+            // Ensure this is 1:1, or create a converter
+            pipe_shader->barycentric_interps = data->barycentric_interp_modes;
+
+            if (data->urb_setup[VARYING_SLOT_PNTC] >= 0)
+                pipe_shader->point_sprite_enables = 1 << data->urb_setup[VARYING_SLOT_PNTC];
+
+            struct brw_stage_state *stage_state = &brw->wm.base;
+            pipe_shader->sampler_count = stage_state->sampler_count;
+
+            // TODO - Figure out multiple FS outputs
+            pipe_shader->out_count = 1;
+
+            pipe_shader->per_thread_scratch_size = data->total_scratch;
+
+            // call common code for common binding table entries
+            status = build_binding_table(gpu, brw, &bt, data->base, sh_prog, VK_SHADER_STAGE_FRAGMENT_BIT);
+            if (status != VK_SUCCESS)
+                break;
+
+            // and then tack on the remaining field for FS
+            bt.rt_start = data->binding_table.render_target_start;
+
+            if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
+                // print out the supporting structures generated by the BE compile:
+                fs_data_dump(stdout, data);
+
+                printf("in_count: %d\n", pipe_shader->in_count);
+
+                fprintf(stdout,"\nISA generated by compiler:\n");
+                fprintf(stdout,"ISA size: %i\n", (int) pipe_shader->codeSize);
+                hexdump(stdout, pipe_shader->pCode, pipe_shader->codeSize);
+                fflush(stdout);
+            }
+        }
+            break;
+
+        case GL_COMPUTE_SHADER:
+        default:
+            assert(0);
+            status = VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+    } else {
+        assert(0);
+        status = VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    if (status == VK_SUCCESS) {
+        pipe_shader->rmap = rmap_create(gpu, pipeline_layout, &bt);
+        if (!pipe_shader->rmap) {
+            intel_pipeline_shader_cleanup(pipe_shader, gpu);
+            status = VK_ERROR_OUT_OF_HOST_MEMORY;
+        }
+    }
+
+    intel_destroy_brw_context(brw);
+
+    return status;
+}
+
+void intel_pipeline_shader_cleanup(struct intel_pipeline_shader *sh,
+                                   const struct intel_gpu *gpu)
+{
+    pipe_interface_free(gpu, sh->pCode);
+    if (sh->rmap)
+        rmap_destroy(gpu, sh->rmap);
+    memset(sh, 0, sizeof(*sh));
+}
+
+
+void intel_disassemble_kernel(const struct intel_gpu *gpu,
+                              const void *kernel, size_t size)
+{
+    struct brw_compile c;
+
+    memset(&c, 0, sizeof(c));
+    c.brw = intel_create_brw_context(gpu);
+    c.store = (struct brw_instruction *) kernel;
+
+    brw_dump_compile(&c, stderr, 0, size);
+}
+
+} // extern "C"
diff --git a/icd/intel/compiler/pipeline/pipeline_compiler_interface.h b/icd/intel/compiler/pipeline/pipeline_compiler_interface.h
new file mode 100644
index 0000000..201913a
--- /dev/null
+++ b/icd/intel/compiler/pipeline/pipeline_compiler_interface.h
@@ -0,0 +1,69 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: GregF <greg@LunarG.com>
+ *
+ */
+
+#ifndef PIPELINE_COMPILER_INTERFACE_H
+#define PIPELINE_COMPILER_INTERFACE_H
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "dev.h"
+
+struct brw_context;
+struct intel_pipeline_layout;
+struct intel_gpu;
+struct intel_ir;
+struct intel_pipeline_shader;
+
+struct brw_context *intel_create_brw_context(const struct intel_gpu *gpu);
+void intel_destroy_brw_context(struct brw_context *brw);
+
+VkResult intel_pipeline_shader_compile(struct intel_pipeline_shader *ips,
+                                         const struct intel_gpu *gpu,
+                                         const struct intel_pipeline_layout *pipeline_layout,
+                                         const VkPipelineShaderStageCreateInfo *info,
+                                         const struct intel_ir* ir);
+
+void intel_pipeline_shader_cleanup(struct intel_pipeline_shader *sh,
+                                   const struct intel_gpu *gpu);
+
+VkResult intel_pipeline_shader_compile_meta(struct intel_pipeline_shader *sh,
+                                              const struct intel_gpu *gpu,
+                                              enum intel_dev_meta_shader id);
+
+void intel_disassemble_kernel(const struct intel_gpu *gpu,
+                              const void *kernel, size_t size);
+
+#ifdef __cplusplus
+} // extern "C"
+#endif
+
+#endif /* PIPELINE_COMPILER_INTERFACE_H */
diff --git a/icd/intel/compiler/pipeline/pipeline_compiler_interface_meta.cpp b/icd/intel/compiler/pipeline/pipeline_compiler_interface_meta.cpp
new file mode 100644
index 0000000..a1a737e
--- /dev/null
+++ b/icd/intel/compiler/pipeline/pipeline_compiler_interface_meta.cpp
@@ -0,0 +1,743 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ *
+ */
+
+
+extern "C" {
+#include "gpu.h"
+#include "pipeline.h"
+}
+#include "compiler/shader/compiler_interface.h"
+#include "compiler/pipeline/pipeline_compiler_interface.h"
+#include "compiler/pipeline/brw_blorp_blit_eu.h"
+#include "compiler/pipeline/brw_blorp.h"
+
+enum sampler_param {
+    SAMPLER_PARAM_HOLE,
+    SAMPLER_PARAM_ZERO,
+    SAMPLER_PARAM_X,
+    SAMPLER_PARAM_Y,
+    SAMPLER_PARAM_LAYER,
+    SAMPLER_PARAM_LOD,
+};
+
+class intel_meta_compiler : public brw_blorp_eu_emitter
+{
+public:
+    intel_meta_compiler(const struct intel_gpu *gpu,
+                        struct brw_context *brw,
+                        enum intel_dev_meta_shader id);
+    void *compile(brw_blorp_prog_data *prog_data, uint32_t *code_size);
+
+private:
+    void alloc_regs()
+    {
+        int grf = base_grf;
+
+        grf = alloc_pcb_regs(grf);
+        grf = alloc_input_regs(grf);
+        grf = alloc_temp_regs(grf);
+
+        assert(grf <= 128);
+    }
+
+    int alloc_pcb_regs(int grf);
+    int alloc_input_regs(int grf);
+    int alloc_temp_regs(int grf);
+
+    void emit_compute_frag_coord();
+    void emit_sampler_payload(const struct brw_reg &mrf,
+                              const enum sampler_param *params,
+                              uint32_t param_count);
+
+    void emit_vs_fill_mem();
+    void emit_vs_copy_mem();
+    void emit_vs_copy_img_to_mem();
+    void emit_copy_mem();
+    void emit_copy_img();
+    void emit_copy_mem_to_img();
+
+    void emit_clear_color();
+    void emit_clear_depth();
+    void *codegen(uint32_t *code_size);
+
+    const struct intel_gpu *gpu;
+    struct brw_context *brw;
+    enum intel_dev_meta_shader id;
+
+    const struct brw_reg poison;
+    const struct brw_reg r0;
+    const struct brw_reg r1;
+    const int base_grf;
+    const int base_mrf;
+
+    /* pushed consts */
+    struct brw_reg clear_vals[4];
+
+    struct brw_reg src_offset_x;
+    struct brw_reg src_offset_y;
+    struct brw_reg src_layer;
+    struct brw_reg src_lod;
+
+    struct brw_reg dst_mem_offset;
+    struct brw_reg dst_extent_width;
+
+    /* inputs */
+    struct brw_reg vid;
+
+    /* temps */
+    struct brw_reg frag_x;
+    struct brw_reg frag_y;
+
+    struct brw_reg texels[4];
+
+    struct brw_reg temps[4];
+};
+
+intel_meta_compiler::intel_meta_compiler(const struct intel_gpu *gpu,
+                                         struct brw_context *brw,
+                                         enum intel_dev_meta_shader id)
+    : brw_blorp_eu_emitter(brw), gpu(gpu), brw(brw), id(id),
+      poison(brw_imm_ud(0x12345678)),
+      r0(retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD)),
+      r1(retype(brw_vec8_grf(1, 0), BRW_REGISTER_TYPE_UD)),
+      base_grf(2), /* skipping r0 and r1 */
+      base_mrf(1)
+{
+    int i;
+
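+    /* Initialize every parameter register to the poison immediate so that a
+     * meta shader variant reading a register it never allocated produces a
+     * recognizable 0x12345678 pattern instead of silent garbage. */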
+    for (i = 0; i < ARRAY_SIZE(clear_vals); i++)
+        clear_vals[i] = poison;
+
+    src_offset_x = poison;
+    src_offset_y = poison;
+    src_layer = poison;
+    src_lod = poison;
+
+    dst_mem_offset = poison;
+    dst_extent_width = poison;
+
+    vid = poison;
+}
+
+int intel_meta_compiler::alloc_pcb_regs(int grf)
+{
+    switch (id) {
+    case INTEL_DEV_META_VS_FILL_MEM:
+        dst_mem_offset = retype(brw_vec1_grf(grf, 0), BRW_REGISTER_TYPE_UD);
+        clear_vals[0] = retype(brw_vec1_grf(grf, 1), BRW_REGISTER_TYPE_UD);
+        break;
+    case INTEL_DEV_META_VS_COPY_MEM:
+    case INTEL_DEV_META_VS_COPY_MEM_UNALIGNED:
+        dst_mem_offset = retype(brw_vec1_grf(grf, 0), BRW_REGISTER_TYPE_UD);
+        src_offset_x = retype(brw_vec1_grf(grf, 1), BRW_REGISTER_TYPE_UD);
+        break;
+    case INTEL_DEV_META_VS_COPY_R8_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R16_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32G32_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32G32B32A32_TO_MEM:
+        src_offset_x = retype(brw_vec1_grf(grf, 0), BRW_REGISTER_TYPE_UD);
+        src_offset_y = retype(brw_vec1_grf(grf, 1), BRW_REGISTER_TYPE_UD);
+        dst_extent_width = retype(brw_vec1_grf(grf, 2), BRW_REGISTER_TYPE_UD);
+        dst_mem_offset = retype(brw_vec1_grf(grf, 3), BRW_REGISTER_TYPE_UD);
+        break;
+    case INTEL_DEV_META_FS_COPY_MEM:
+    case INTEL_DEV_META_FS_COPY_1D:
+    case INTEL_DEV_META_FS_COPY_1D_ARRAY:
+    case INTEL_DEV_META_FS_COPY_2D:
+    case INTEL_DEV_META_FS_COPY_2D_ARRAY:
+    case INTEL_DEV_META_FS_COPY_2D_MS:
+        src_offset_x = retype(brw_vec1_grf(grf, 0), BRW_REGISTER_TYPE_UD);
+        src_offset_y = retype(brw_vec1_grf(grf, 1), BRW_REGISTER_TYPE_UD);
+        src_layer = retype(brw_vec1_grf(grf, 2), BRW_REGISTER_TYPE_UD);
+        src_lod = retype(brw_vec1_grf(grf, 3), BRW_REGISTER_TYPE_UD);
+        break;
+    case INTEL_DEV_META_FS_COPY_1D_TO_MEM:
+    case INTEL_DEV_META_FS_COPY_1D_ARRAY_TO_MEM:
+    case INTEL_DEV_META_FS_COPY_2D_TO_MEM:
+    case INTEL_DEV_META_FS_COPY_2D_ARRAY_TO_MEM:
+    case INTEL_DEV_META_FS_COPY_2D_MS_TO_MEM:
+        src_offset_x = retype(brw_vec1_grf(grf, 0), BRW_REGISTER_TYPE_UD);
+        src_offset_y = retype(brw_vec1_grf(grf, 1), BRW_REGISTER_TYPE_UD);
+        src_layer = retype(brw_vec1_grf(grf, 2), BRW_REGISTER_TYPE_UD);
+        src_lod = retype(brw_vec1_grf(grf, 3), BRW_REGISTER_TYPE_UD);
+        dst_mem_offset = retype(brw_vec1_grf(grf, 4), BRW_REGISTER_TYPE_UD);
+        dst_extent_width = retype(brw_vec1_grf(grf, 5), BRW_REGISTER_TYPE_UD);
+        break;
+    case INTEL_DEV_META_FS_COPY_MEM_TO_IMG:
+        src_offset_x = retype(brw_vec1_grf(grf, 0), BRW_REGISTER_TYPE_UD);
+        src_offset_y = retype(brw_vec1_grf(grf, 1), BRW_REGISTER_TYPE_UD);
+        dst_extent_width = retype(brw_vec1_grf(grf, 2), BRW_REGISTER_TYPE_UD);
+        break;
+    case INTEL_DEV_META_FS_CLEAR_COLOR:
+    case INTEL_DEV_META_FS_CLEAR_DEPTH:
+        clear_vals[0] = retype(brw_vec1_grf(grf, 0), BRW_REGISTER_TYPE_UD);
+        clear_vals[1] = retype(brw_vec1_grf(grf, 1), BRW_REGISTER_TYPE_UD);
+        clear_vals[2] = retype(brw_vec1_grf(grf, 2), BRW_REGISTER_TYPE_UD);
+        clear_vals[3] = retype(brw_vec1_grf(grf, 3), BRW_REGISTER_TYPE_UD);
+        break;
+    case INTEL_DEV_META_FS_RESOLVE_2X:
+    case INTEL_DEV_META_FS_RESOLVE_4X:
+    case INTEL_DEV_META_FS_RESOLVE_8X:
+    case INTEL_DEV_META_FS_RESOLVE_16X:
+        src_offset_x = retype(brw_vec1_grf(grf, 0), BRW_REGISTER_TYPE_UD);
+        src_offset_y = retype(brw_vec1_grf(grf, 1), BRW_REGISTER_TYPE_UD);
+        break;
+    default:
+        break;
+    }
+
+    return grf + 1;
+}
+
+int intel_meta_compiler::alloc_input_regs(int grf)
+{
+    switch (id) {
+    case INTEL_DEV_META_VS_FILL_MEM:
+    case INTEL_DEV_META_VS_COPY_MEM:
+    case INTEL_DEV_META_VS_COPY_MEM_UNALIGNED:
+    case INTEL_DEV_META_VS_COPY_R8_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R16_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32G32_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32G32B32A32_TO_MEM:
+        vid = retype(brw_vec1_grf(grf, 0), BRW_REGISTER_TYPE_UD);
+        break;
+    default:
+        break;
+    }
+
+    return grf + 1;
+}
+
+int intel_meta_compiler::alloc_temp_regs(int grf)
+{
+    int i;
+
+    frag_x = retype(brw_vec8_grf(grf, 0), BRW_REGISTER_TYPE_UW);
+    grf++;
+
+    frag_y = retype(brw_vec8_grf(grf, 0), BRW_REGISTER_TYPE_UW);
+    grf++;
+
+    for (i = 0; i < ARRAY_SIZE(texels); i++) {
+        texels[i] = retype(brw_vec8_grf(grf, 0), BRW_REGISTER_TYPE_UD);
+        grf += 8;
+    }
+
+    for (i = 0; i < ARRAY_SIZE(temps); i++) {
+        temps[i] = retype(brw_vec8_grf(grf, 0), BRW_REGISTER_TYPE_UD);
+        grf += 2;
+    }
+
+    return grf;
+}
+
+void intel_meta_compiler::emit_compute_frag_coord()
+{
+    const struct brw_reg x = retype(suboffset(r1, 2), BRW_REGISTER_TYPE_UW);
+    const struct brw_reg y = suboffset(x, 1);
+
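+    /* brw_imm_v packs eight 4-bit immediates: 0x10101010 adds the per-pixel
+     * X offsets {0,1,0,1,...} and 0x11001100 the Y offsets {0,0,1,1,...} to
+     * the subspan origin held in r1, yielding per-pixel fragment coordinates. */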
+    emit_add(frag_x, stride(x, 2, 4, 0), brw_imm_v(0x10101010));
+    emit_add(frag_y, stride(y, 2, 4, 0), brw_imm_v(0x11001100));
+}
+
+void intel_meta_compiler::emit_sampler_payload(const struct brw_reg &mrf,
+                                               const enum sampler_param *params,
+                                               uint32_t param_count)
+{
+    int mrf_offset = 0;
+    uint32_t i;
+
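+    /* Each parameter, including skipped "hole" slots, advances the payload by
+     * two MRF registers (SIMD16 message layout). */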
+    for (i = 0; i < param_count; i++) {
+        switch (params[i]) {
+        case SAMPLER_PARAM_HOLE:
+            break;
+        case SAMPLER_PARAM_ZERO:
+            emit_mov(offset(mrf, mrf_offset), brw_imm_ud(0));
+            break;
+        case SAMPLER_PARAM_X:
+            emit_add(offset(mrf, mrf_offset), frag_x, src_offset_x);
+            break;
+        case SAMPLER_PARAM_Y:
+            emit_add(offset(mrf, mrf_offset), frag_y, src_offset_y);
+            break;
+        case SAMPLER_PARAM_LAYER:
+            emit_mov(offset(mrf, mrf_offset), src_layer);
+            break;
+        case SAMPLER_PARAM_LOD:
+            emit_mov(offset(mrf, mrf_offset), src_lod);
+            break;
+        default:
+            assert(!"unknown sampler parameter");
+            break;
+        }
+
+        mrf_offset += 2;
+    }
+}
+
+void intel_meta_compiler::emit_vs_fill_mem()
+{
+    const struct brw_reg mrf =
+        retype(vec1(brw_message_reg(base_mrf)), BRW_REGISTER_TYPE_UD);
+    int mrf_offset = 0;
+    bool use_header;
+
+    if (brw->gen >= 7) {
+        use_header = false;
+    } else {
+        emit_mov_8(offset(vec8(mrf), mrf_offset), r0);
+        mrf_offset += 1;
+
+        use_header = true;
+    }
+
+    emit_add_8(offset(mrf, mrf_offset), dst_mem_offset, vid);
+    mrf_offset += 1;
+
+    emit_mov_8(offset(mrf, mrf_offset), clear_vals[0]);
+    mrf_offset += 1;
+
+    emit_scattered_write(SHADER_OPCODE_DWORD_SCATTERED_WRITE,
+            mrf, base_mrf, mrf_offset, 1, use_header);
+
+    emit_urb_write_eot(base_mrf);
+}
+
+void intel_meta_compiler::emit_vs_copy_mem()
+{
+    const struct brw_reg mrf =
+        retype(vec1(brw_message_reg(base_mrf)), BRW_REGISTER_TYPE_UD);
+    int mrf_offset = 0;
+    enum opcode op_read, op_write;
+    bool use_header;
+
+    if (id == INTEL_DEV_META_VS_COPY_MEM) {
+        op_read = SHADER_OPCODE_DWORD_SCATTERED_READ;
+        op_write = SHADER_OPCODE_DWORD_SCATTERED_WRITE;
+    } else {
+        /* Byte Scattered Read/Write are Gen7+ only */
+        if (brw->gen == 6) {
+            emit_urb_write_eot(base_mrf);
+            return;
+        }
+        op_read = SHADER_OPCODE_BYTE_SCATTERED_READ;
+        op_write = SHADER_OPCODE_BYTE_SCATTERED_WRITE;
+    }
+
+    if (brw->gen >= 7) {
+        use_header = false;
+    } else {
+        emit_mov_8(offset(vec8(mrf), mrf_offset), r0);
+        mrf_offset += 1;
+
+        use_header = true;
+    }
+
+    emit_add_8(offset(mrf, mrf_offset), src_offset_x, vid);
+    mrf_offset += 1;
+    emit_scattered_read(temps[0], op_read, mrf,
+            base_mrf, mrf_offset, 1, use_header);
+
+    /* prepare to set up dst offset */
+    mrf_offset -= 1;
+
+    emit_add_8(offset(mrf, mrf_offset), dst_mem_offset, vid);
+    mrf_offset += 1;
+    emit_mov_8(offset(mrf, mrf_offset), vec1(temps[0]));
+    mrf_offset += 1;
+
+    emit_scattered_write(op_write, mrf, base_mrf, mrf_offset, 1, use_header);
+
+    emit_urb_write_eot(base_mrf);
+}
+
+void intel_meta_compiler::emit_vs_copy_img_to_mem()
+{
+    const struct brw_reg mrf =
+        retype(brw_message_reg(base_mrf), BRW_REGISTER_TYPE_UD);
+    int mrf_offset = 0;
+    bool use_header;
+    enum opcode op_write;
+    int op_slot_count, i;
+
+    switch (id) {
+    case INTEL_DEV_META_VS_COPY_R8_TO_MEM:
+        op_write = SHADER_OPCODE_BYTE_SCATTERED_WRITE;
+        op_slot_count = 1;
+        break;
+    case INTEL_DEV_META_VS_COPY_R16_TO_MEM:
+        op_write = SHADER_OPCODE_BYTE_SCATTERED_WRITE;
+        op_slot_count = 2;
+        break;
+    case INTEL_DEV_META_VS_COPY_R32_TO_MEM:
+        op_write = SHADER_OPCODE_DWORD_SCATTERED_WRITE;
+        op_slot_count = 1;
+        break;
+    case INTEL_DEV_META_VS_COPY_R32G32_TO_MEM:
+        op_write = SHADER_OPCODE_DWORD_SCATTERED_WRITE;
+        op_slot_count = 2;
+        break;
+    case INTEL_DEV_META_VS_COPY_R32G32B32A32_TO_MEM:
+        op_write = SHADER_OPCODE_DWORD_SCATTERED_WRITE;
+        op_slot_count = 4;
+        break;
+    default:
+        op_write = SHADER_OPCODE_DWORD_SCATTERED_WRITE;
+        op_slot_count = 0;
+        break;
+    }
+
+    if (!op_slot_count || (brw->gen == 6 &&
+                           op_write == SHADER_OPCODE_BYTE_SCATTERED_WRITE)) {
+        emit_urb_write_eot(base_mrf);
+        return;
+    }
+
+    /* load image texel */
+    emit_mov(temps[0], vid);
+    emit_mov(temps[1], dst_extent_width);
+    emit_irem(temps[2], temps[0], temps[1]);
+    emit_idiv(temps[3], temps[0], temps[1]);
+    if (brw->gen >= 7) {
+        emit_add(offset(mrf, 0), vec1(src_offset_x), vec1(temps[2]));
+        emit_mov(offset(mrf, 2), brw_imm_ud(0));
+        emit_add(offset(mrf, 4), vec1(src_offset_y), vec1(temps[3]));
+        mrf_offset = 6;
+    } else {
+        emit_add(offset(mrf, 0), vec1(src_offset_x), vec1(temps[2]));
+        emit_add(offset(mrf, 2), vec1(src_offset_y), vec1(temps[3]));
+        mrf_offset = 4;
+    }
+    emit_texture_lookup(vec1(texels[0]), SHADER_OPCODE_TXF, base_mrf, mrf_offset);
+
+    mrf_offset = 0;
+    if (brw->gen >= 7) {
+        use_header = false;
+    } else {
+        emit_mov_8(offset(mrf, mrf_offset), r0);
+        mrf_offset += 1;
+
+        use_header = true;
+    }
+
+    /* offsets */
+    if (op_slot_count == 1) {
+        emit_add_8(offset(vec1(mrf), mrf_offset), dst_mem_offset, vid);
+    } else {
+        emit_mul_8(vec1(temps[0]), vid, brw_imm_ud(op_slot_count));
+        emit_add_8(vec1(temps[0]), dst_mem_offset, vec1(temps[0]));
+
+        emit_mov_8(offset(vec1(mrf), mrf_offset), vec1(temps[0]));
+        for (i = 1; i < op_slot_count; i++) {
+            emit_add_8(suboffset(offset(vec1(mrf), mrf_offset), i),
+                    vec1(temps[0]), brw_imm_ud(i));
+        }
+    }
+    mrf_offset += 1;
+
+    /* values */
+    if (id == INTEL_DEV_META_VS_COPY_R16_TO_MEM) {
+        emit_mov(suboffset(offset(vec1(mrf), mrf_offset), 0),
+                vec1(texels[0]));
+        emit_shr(suboffset(offset(vec1(mrf), mrf_offset), 1),
+                vec1(texels[0]), brw_imm_ud(8));
+    } else {
+        for (i = 0; i < op_slot_count; i++) {
+            emit_mov_8(suboffset(offset(vec1(mrf), mrf_offset), i),
+                    offset(vec1(texels[0]), 2 * i));
+        }
+    }
+    mrf_offset += 1;
+
+    emit_scattered_write(op_write, vec1(mrf), base_mrf, mrf_offset,
+            op_slot_count, use_header);
+
+    emit_urb_write_eot(base_mrf);
+}
+
+void intel_meta_compiler::emit_copy_mem()
+{
+    const struct brw_reg mrf =
+        retype(brw_message_reg(base_mrf), BRW_REGISTER_TYPE_UD);
+
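+    /* The render-target write payload carries four color channels at two MRF
+     * registers each (SIMD16), hence the offsets 0/2/4/6 and the message
+     * length of 8 below. */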
+    emit_compute_frag_coord();
+    emit_add(mrf, frag_x, src_offset_x);
+    emit_texture_lookup(texels[0], SHADER_OPCODE_TXF, base_mrf, 2);
+
+    emit_mov(offset(mrf, 0), offset(texels[0], 0));
+    emit_mov(offset(mrf, 2), offset(texels[0], 2));
+    emit_mov(offset(mrf, 4), offset(texels[0], 4));
+    emit_mov(offset(mrf, 6), offset(texels[0], 6));
+    emit_render_target_write(mrf, base_mrf, 8, false);
+}
+
+void intel_meta_compiler::emit_copy_img()
+{
+    const struct brw_reg mrf =
+        retype(brw_message_reg(base_mrf), BRW_REGISTER_TYPE_UD);
+    enum sampler_param params[8];
+    uint32_t param_count = 0;
+
+    if (brw->gen >= 7) {
+        params[param_count++] = SAMPLER_PARAM_X;
+        params[param_count++] = SAMPLER_PARAM_LOD;
+
+        switch (id) {
+        case INTEL_DEV_META_FS_COPY_1D:
+            params[param_count++] = SAMPLER_PARAM_ZERO;
+            break;
+        case INTEL_DEV_META_FS_COPY_1D_ARRAY:
+            params[param_count++] = SAMPLER_PARAM_LAYER;
+            break;
+        case INTEL_DEV_META_FS_COPY_2D:
+            params[param_count++] = SAMPLER_PARAM_Y;
+            params[param_count++] = SAMPLER_PARAM_ZERO;
+            break;
+        case INTEL_DEV_META_FS_COPY_2D_ARRAY:
+            params[param_count++] = SAMPLER_PARAM_Y;
+            params[param_count++] = SAMPLER_PARAM_LAYER;
+            break;
+        case INTEL_DEV_META_FS_COPY_2D_MS:
+            /* TODO */
+            break;
+        default:
+            break;
+        }
+    } else {
+        params[param_count++] = SAMPLER_PARAM_X;
+
+        switch (id) {
+        case INTEL_DEV_META_FS_COPY_1D:
+            params[param_count++] = SAMPLER_PARAM_ZERO;
+            params[param_count++] = SAMPLER_PARAM_HOLE;
+            break;
+        case INTEL_DEV_META_FS_COPY_1D_ARRAY:
+            params[param_count++] = SAMPLER_PARAM_LAYER;
+            params[param_count++] = SAMPLER_PARAM_ZERO;
+            break;
+        case INTEL_DEV_META_FS_COPY_2D:
+            params[param_count++] = SAMPLER_PARAM_Y;
+            params[param_count++] = SAMPLER_PARAM_ZERO;
+            break;
+        case INTEL_DEV_META_FS_COPY_2D_ARRAY:
+            params[param_count++] = SAMPLER_PARAM_Y;
+            params[param_count++] = SAMPLER_PARAM_LAYER;
+            break;
+        case INTEL_DEV_META_FS_COPY_2D_MS:
+            /* TODO */
+            break;
+        default:
+            break;
+        }
+
+        params[param_count++] = SAMPLER_PARAM_LOD;
+    }
+
+    emit_compute_frag_coord();
+    emit_sampler_payload(mrf, params, param_count);
+    emit_texture_lookup(texels[0], SHADER_OPCODE_TXF,
+            base_mrf, param_count * 2);
+
+    emit_mov(offset(mrf, 0), offset(texels[0], 0));
+    emit_mov(offset(mrf, 2), offset(texels[0], 2));
+    emit_mov(offset(mrf, 4), offset(texels[0], 4));
+    emit_mov(offset(mrf, 6), offset(texels[0], 6));
+    emit_render_target_write(mrf, base_mrf, 10, false);
+}
+
+void intel_meta_compiler::emit_copy_mem_to_img()
+{
+    const struct brw_reg mrf =
+        retype(brw_message_reg(base_mrf), BRW_REGISTER_TYPE_UD);
+
+    emit_compute_frag_coord();
+    emit_add(temps[0], frag_y, src_offset_y);
+    emit_add(temps[1], frag_x, src_offset_x);
+
+    /* convert (x, y) to linear offset */
+    emit_mul(temps[2], temps[0], dst_extent_width);
+    emit_add(mrf, temps[2], temps[1]);
+    emit_texture_lookup(texels[0], SHADER_OPCODE_TXF, base_mrf, 2);
+
+    emit_mov(offset(mrf, 0), offset(texels[0], 0));
+    emit_mov(offset(mrf, 2), offset(texels[0], 2));
+    emit_mov(offset(mrf, 4), offset(texels[0], 4));
+    emit_mov(offset(mrf, 6), offset(texels[0], 6));
+    emit_render_target_write(mrf, base_mrf, 8, false);
+}
+
+void intel_meta_compiler::emit_clear_color()
+{
+    const struct brw_reg mrf =
+        retype(brw_message_reg(base_mrf), BRW_REGISTER_TYPE_UD);
+
+    emit_mov(offset(mrf, 0), clear_vals[0]);
+    emit_mov(offset(mrf, 2), clear_vals[1]);
+    emit_mov(offset(mrf, 4), clear_vals[2]);
+    emit_mov(offset(mrf, 6), clear_vals[3]);
+    emit_render_target_write(mrf, base_mrf, 8, false);
+}
+
+void intel_meta_compiler::emit_clear_depth()
+{
+    const struct brw_reg mrf =
+        retype(brw_message_reg(base_mrf), BRW_REGISTER_TYPE_UD);
+
+    /* skip color and write oDepth only */
+    emit_mov(offset(mrf, 8), clear_vals[0]);
+    emit_render_target_write(mrf, base_mrf, 10, false);
+}
+
+void *intel_meta_compiler::codegen(uint32_t *code_size)
+{
+    const unsigned *prog;
+    unsigned prog_size;
+    void *code;
+
+    prog = get_program(&prog_size, stderr);
+
+    code = intel_alloc(gpu, prog_size, sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_DEVICE);
+    if (!code)
+        return NULL;
+
+    memcpy(code, prog, prog_size);
+    if (code_size)
+        *code_size = prog_size;
+
+    return code;
+}
+
+void *intel_meta_compiler::compile(brw_blorp_prog_data *prog_data,
+                                   uint32_t *code_size)
+{
+    memset(prog_data, 0, sizeof(*prog_data));
+    prog_data->first_curbe_grf = base_grf;
+
+    alloc_regs();
+
+    switch (id) {
+    case INTEL_DEV_META_VS_FILL_MEM:
+        emit_vs_fill_mem();
+        break;
+    case INTEL_DEV_META_VS_COPY_MEM:
+    case INTEL_DEV_META_VS_COPY_MEM_UNALIGNED:
+        emit_vs_copy_mem();
+        break;
+    case INTEL_DEV_META_VS_COPY_R8_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R16_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32G32_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32G32B32A32_TO_MEM:
+        emit_vs_copy_img_to_mem();
+        break;
+    case INTEL_DEV_META_FS_COPY_MEM:
+        emit_copy_mem();
+        break;
+    case INTEL_DEV_META_FS_COPY_1D:
+    case INTEL_DEV_META_FS_COPY_1D_ARRAY:
+    case INTEL_DEV_META_FS_COPY_2D:
+    case INTEL_DEV_META_FS_COPY_2D_ARRAY:
+    case INTEL_DEV_META_FS_COPY_2D_MS:
+        emit_copy_img();
+        break;
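+    /* Note: the image-to-memory copy and resolve IDs below currently
+     * compile the same program as the clear-color case (as does the
+     * default case), apparently as a placeholder.
+     */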
+    case INTEL_DEV_META_FS_COPY_1D_TO_MEM:
+    case INTEL_DEV_META_FS_COPY_1D_ARRAY_TO_MEM:
+    case INTEL_DEV_META_FS_COPY_2D_TO_MEM:
+    case INTEL_DEV_META_FS_COPY_2D_ARRAY_TO_MEM:
+    case INTEL_DEV_META_FS_COPY_2D_MS_TO_MEM:
+        emit_clear_color();
+        break;
+    case INTEL_DEV_META_FS_COPY_MEM_TO_IMG:
+        emit_copy_mem_to_img();
+        break;
+    case INTEL_DEV_META_FS_CLEAR_COLOR:
+        emit_clear_color();
+        break;
+    case INTEL_DEV_META_FS_CLEAR_DEPTH:
+        emit_clear_depth();
+        break;
+    case INTEL_DEV_META_FS_RESOLVE_2X:
+    case INTEL_DEV_META_FS_RESOLVE_4X:
+    case INTEL_DEV_META_FS_RESOLVE_8X:
+    case INTEL_DEV_META_FS_RESOLVE_16X:
+        emit_clear_color();
+        break;
+    default:
+        emit_clear_color();
+        break;
+    }
+
+    return codegen(code_size);
+}
+
+extern "C" {
+
+VkResult intel_pipeline_shader_compile_meta(struct intel_pipeline_shader *sh,
+                                              const struct intel_gpu *gpu,
+                                              enum intel_dev_meta_shader id)
+{
+    struct brw_context *brw = intel_create_brw_context(gpu);
+
+    intel_meta_compiler c(gpu, brw, id);
+    brw_blorp_prog_data prog_data;
+
+    sh->pCode = c.compile(&prog_data, &sh->codeSize);
+
+    sh->in_count = 0;
+    sh->out_count = 1;
+    sh->uses = 0;
+    sh->surface_count = BRW_BLORP_NUM_BINDING_TABLE_ENTRIES;
+    sh->urb_grf_start = prog_data.first_curbe_grf;
+
+    switch (id) {
+    case INTEL_DEV_META_VS_FILL_MEM:
+    case INTEL_DEV_META_VS_COPY_MEM:
+    case INTEL_DEV_META_VS_COPY_MEM_UNALIGNED:
+    case INTEL_DEV_META_VS_COPY_R8_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R16_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32G32_TO_MEM:
+    case INTEL_DEV_META_VS_COPY_R32G32B32A32_TO_MEM:
+        sh->in_count = 1;
+        sh->uses |= INTEL_SHADER_USE_VID;
+        break;
+    case INTEL_DEV_META_FS_CLEAR_DEPTH:
+        sh->uses |= INTEL_COMPUTED_DEPTH_MODE_ON;
+        break;
+    default:
+        break;
+    }
+
+    ralloc_free(brw->shader_prog);
+    ralloc_free(brw);
+
+    return (sh->pCode) ? VK_SUCCESS : VK_ERROR_VALIDATION_FAILED_EXT;
+}
+
+} // extern "C"
diff --git a/icd/intel/compiler/shader/.gitignore b/icd/intel/compiler/shader/.gitignore
new file mode 100644
index 0000000..43720f6
--- /dev/null
+++ b/icd/intel/compiler/shader/.gitignore
@@ -0,0 +1,6 @@
+glsl_compiler
+glsl_lexer.cpp
+glsl_parser.cpp
+glsl_parser.h
+glsl_parser.output
+glsl_test
diff --git a/icd/intel/compiler/shader/README b/icd/intel/compiler/shader/README
new file mode 100644
index 0000000..0b2b589
--- /dev/null
+++ b/icd/intel/compiler/shader/README
@@ -0,0 +1,228 @@
+Welcome to Mesa's GLSL compiler.  A brief overview of how things flow:
+
+1) lex and yacc-based preprocessor takes the incoming shader string
+and produces a new string containing the preprocessed shader.  This
+takes care of things like #if, #ifdef, #define, and preprocessor macro
+invocations.  Note that #version, #extension, and some others are
+passed straight through.  See glcpp/*
+
+2) lex and yacc-based parser takes the preprocessed string and
+generates the AST (abstract syntax tree).  Almost no checking is
+performed in this stage.  See glsl_lexer.lpp and glsl_parser.ypp.
+
+3) The AST is converted to "HIR".  This is the intermediate
+representation of the compiler.  Constructors are generated, function
+calls are resolved to particular function signatures, and all the
+semantic checking is performed.  See ast_*.cpp for the conversion, and
+ir.h for the IR structures.
+
+4) The driver (Mesa, or main.cpp for the standalone binary) performs
+optimizations.  These include copy propagation, dead code elimination,
+constant folding, and others.  Generally the driver will call
+optimizations in a loop, as each may open up opportunities for other
+optimizations to do additional work; a sketch of such a loop follows
+step 7 below.  See most files called ir_*.cpp
+
+5) linking is performed.  This does checking to ensure that the
+outputs of the vertex shader match the inputs of the fragment shader,
+and assigns locations to uniforms, attributes, and varyings.  See
+linker.cpp.
+
+6) The driver may perform additional optimization at this point, as
+for example dead code elimination previously couldn't remove functions
+or global variable usage when we didn't know what other code would be
+linked in.
+
+7) The driver performs code generation out of the IR, taking a linked
+shader program and producing a compiled program for each stage.  See
+ir_to_mesa.cpp for Mesa IR code generation.
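+
+A minimal sketch of the optimization loop from step 4.  The pass entry
+points named here are illustrative (the real ones live in the ir_*.cpp
+files in this directory and may take extra parameters):
+
+   bool progress;
+   do {
+      progress = false;
+      /* each pass returns true if it changed the IR */
+      progress = do_copy_propagation(ir) || progress;
+      progress = do_constant_folding(ir) || progress;
+      progress = do_dead_code_unlinked(ir) || progress;
+   } while (progress);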
+
+FAQ:
+
+Q: What is HIR versus IR versus LIR?
+
+A: The idea behind the naming was that ast_to_hir would produce a
+high-level IR ("HIR"), with things like matrix operations, structure
+assignments, etc., present.  A series of lowering passes would occur
+that do things like break matrix multiplication into a series of dot
+products/MADs, make structure assignment be a series of assignment of
+components, flatten if statements into conditional moves, and such,
+producing a low level IR ("LIR").
+
+However, it now appears that each driver will have different
+requirements from a LIR.  A 915-generation chipset wants all functions
+inlined, all loops unrolled, all ifs flattened, no variable array
+accesses, and matrix multiplication broken down.  The Mesa IR backend
+for swrast would like matrices and structure assignment broken down,
+but it can support function calls and dynamic branching.  A 965 vertex
+shader IR backend could potentially even handle some matrix operations
+without breaking them down, but the 965 fragment shader IR backend
+would want to break (almost) all operations down channel-wise
+and perform optimization on that.  As a result, there's no single
+low-level IR that will make everyone happy.  So that usage has fallen
+out of favor, and each driver will perform a series of lowering passes
+to take the HIR down to whatever restrictions it wants to impose
+before doing codegen.
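+
+As an illustration, a driver with 915-like restrictions might run a
+lowering sequence along these lines before codegen (the pass names
+follow the conventions in this directory, but the exact entry points
+and flags vary by driver and are assumptions here):
+
+   do_function_inlining(ir);
+   lower_instructions(ir, SUB_TO_ADD_NEG | DIV_TO_MUL_RCP | MOD_TO_FRACT);
+   lower_if_to_cond_assign(ir);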
+
+Q: How is the IR structured?
+
+A: The best way to get started seeing it would be to run the
+standalone compiler against a shader:
+
+./glsl_compiler --dump-lir \
+	~/src/piglit/tests/shaders/glsl-orangebook-ch06-bump.frag
+
+So for example one of the ir_instructions in main() contains:
+
+(assign (constant bool (1)) (var_ref litColor)  (expression vec3 *
+ (var_ref SurfaceColor) (var_ref __retval) ) )
+
+Or more visually:
+                     (assign)
+                 /       |        \
+        (var_ref)  (expression *)  (constant bool 1)
+         /          /           \
+(litColor)      (var_ref)    (var_ref)
+                  /                  \
+           (SurfaceColor)          (__retval)
+
+which came from:
+
+litColor = SurfaceColor * max(dot(normDelta, LightDir), 0.0);
+
+(the max call is not represented in this expression tree, as it was a
+function call that got inlined but not brought into this expression
+tree)
+
+Each of those nodes is a subclass of ir_instruction.  A particular
+ir_instruction instance may only appear once in the whole IR tree with
+the exception of ir_variables, which appear once as variable
+declarations:
+
+(declare () vec3 normDelta)
+
+and multiple times as the targets of variable dereferences:
+...
+(assign (constant bool (1)) (var_ref __retval) (expression float dot
+ (var_ref normDelta) (var_ref LightDir) ) )
+...
+(assign (constant bool (1)) (var_ref __retval) (expression vec3 -
+ (var_ref LightDir) (expression vec3 * (constant float (2.000000))
+ (expression vec3 * (expression float dot (var_ref normDelta) (var_ref
+ LightDir) ) (var_ref normDelta) ) ) ) )
+...
+
+Each node has a type.  Expressions may involve several different types:
+(declare (uniform ) mat4 gl_ModelViewMatrix)
+(assign (constant bool (1)) (var_ref constructor_tmp) (expression
+ vec4 * (var_ref gl_ModelViewMatrix) (var_ref gl_Vertex) ) )
+
+An expression tree can be arbitrarily deep, and the compiler tries to
+keep them structured like that so that things like algebraic
+optimizations ((color * 1.0 == color) and ((mat1 * mat2) * vec == mat1
+* (mat2 * vec))) or recognizing operation patterns for code generation
+(vec1 * vec2 + vec3 == mad(vec1, vec2, vec3)) are easier.  This comes
+at the expense of additional trickery in implementing some
+optimizations like CSE where one must navigate an expression tree.
+
+Q: Why no SSA representation?
+
+A: Converting an IR tree to SSA form makes dead code elimination,
+common subexpression elimination, and many other optimizations much
+easier.  However, in our primarily vector-based language, there's some
+major questions as to how it would work.  Do we do SSA on the scalar
+or vector level?  If we do it at the vector level, we're going to end
+up with many different versions of the variable when encountering code
+like:
+
+(assign (constant bool (1)) (swiz x (var_ref __retval) ) (var_ref a) ) 
+(assign (constant bool (1)) (swiz y (var_ref __retval) ) (var_ref b) ) 
+(assign (constant bool (1)) (swiz z (var_ref __retval) ) (var_ref c) ) 
+
+If every masked update of a component relies on the previous value of
+the variable, then we're probably going to be quite limited in our
+dead code elimination wins, and recognizing common expressions may
+just not happen.  On the other hand, if we operate channel-wise, then
+we'll be prone to optimizing the operation on one of the channels at
+the expense of making its instruction flow different from the other
+channels, and a vector-based GPU would end up with worse code than if
+we didn't optimize operations on that channel!
+
+Once again, it appears that our optimization requirements are driven
+significantly by the target architecture.  For now, targeting the Mesa
+IR backend, SSA does not appear to be that important to producing
+excellent code, but we do expect to do some SSA-based optimizations
+for the 965 fragment shader backend when that is developed.
+
+Q: How should I expand instructions that take multiple backend instructions?
+
+A: Sometimes you'll have to do the expansion in your code generation --
+see, for example, ir_to_mesa.cpp's handling of ir_unop_sqrt.  However,
+in many cases you'll want to do a pass over the IR to convert
+non-native instructions to a series of native instructions.  For
+example, for the Mesa backend we have ir_div_to_mul_rcp.cpp because
+Mesa IR (and many hardware backends) only have a reciprocal
+instruction, not a divide.  Implementing non-native instructions this
+way gives the chance for constant folding to occur, so (a / 2.0)
+becomes (a * 0.5) after codegen instead of (a * (1.0 / 2.0))
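+
+A sketch of such a pass in the hierarchical-visitor style used in this
+directory.  The class name and its progress member are illustrative,
+and the real ir_div_to_mul_rcp also handles integer division:
+
+   ir_visitor_status
+   div_to_mul_rcp_visitor::visit_leave(ir_expression *ir)
+   {
+      if (ir->operation != ir_binop_div)
+         return visit_continue;
+
+      /* rewrite a / b as a * rcp(b) in place */
+      void *mem_ctx = ralloc_parent(ir);
+      ir->operation = ir_binop_mul;
+      ir->operands[1] = new(mem_ctx) ir_expression(ir_unop_rcp,
+                                                   ir->operands[1]->type,
+                                                   ir->operands[1],
+                                                   NULL);
+      this->progress = true;
+      return visit_continue;
+   }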
+
+Q: How should I handle my special hardware instructions with respect to IR?
+
+A: Our current theory is that if multiple targets have an instruction for
+some operation, then we should probably be able to represent that in
+the IR.  Generally this is in the form of an ir_{bin,un}op expression
+type.  For example, we initially implemented fract() using (a -
+floor(a)), but both 945 and 965 have instructions to give that result,
+and it would also simplify the implementation of mod(), so
+ir_unop_fract was added.  The following areas need updating to add a
+new expression type:
+
+ir.h (new enum)
+ir.cpp:operator_strs (used for ir_reader)
+ir_constant_expression.cpp (you probably want to be able to constant fold)
+ir_validate.cpp (check users have the right types)
+
+You may also need to update the backends if they will see the new expr type:
+
+../mesa/shaders/ir_to_mesa.cpp
+
+You can then use the new expression from builtins (if all backends
+would rather see it), or scan the IR and convert to use your new
+expression type (see ir_mod_to_fract, for example).
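+
+For instance, the first two updates for a hypothetical ir_unop_saturate
+would look roughly like this; the enum position and the operator_strs
+position must match, since the table is indexed by the enum:
+
+   /* ir.h: new entry in the expression-operation enum */
+   ir_unop_saturate,
+
+   /* ir.cpp: matching operator_strs entry, used by the ir_reader */
+   "sat",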
+
+Q: How is memory management handled in the compiler?
+
+A: The hierarchical memory allocator "talloc" developed for the Samba
+project is used, so that things like optimization passes don't have to
+worry about their garbage collection so much.  It has a few nice
+features, including low performance overhead and good debugging
+support that's trivially available.
+
+Generally, each stage of the compile creates a talloc context and
+allocates its memory out of that or children of it.  At the end of the
+stage, the pieces still live are stolen to a new context and the old
+one freed, or the whole context is kept for use by the next stage.
+
+For IR transformations, a temporary context is used, then at the end
+of all transformations, reparent_ir reparents all live nodes under the
+shader's IR list, and the old context full of dead nodes is freed.
+When developing a single IR transformation pass, this means that you
+want to allocate instruction nodes out of the temporary context, so if
+it becomes dead it doesn't live on as the child of a live node.  At
+the moment, optimization passes aren't passed that temporary context,
+so they find it by calling talloc_parent() on a nearby IR node.  The
+talloc_parent() call is expensive, so many passes will cache the
+result of the first talloc_parent().  Cleaning up all the optimization
+passes to take a context argument and not call talloc_parent() is left
+as an exercise.
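+
+Under those conventions a single pass invocation looks roughly like
+this sketch (run_transformation stands in for any IR transformation;
+talloc_new, talloc_free, and reparent_ir are the real entry points):
+
+   void *ctx = talloc_new(NULL);      /* temporary pass context */
+
+   run_transformation(ir, ctx);       /* new nodes allocated from ctx */
+
+   reparent_ir(ir, shader_mem_ctx);   /* steal live nodes to the shader */
+   talloc_free(ctx);                  /* dead nodes die with the context */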
+
+Q: What is the file naming convention in this directory?
+
+A: Initially, there really wasn't one.  We have since adopted one:
+
+ - Files that implement code lowering passes should be named lower_*
+   (e.g., lower_noise.cpp).
+ - Files that implement optimization passes should be named opt_*.
+ - Files that implement a class that is used throughout the code should
+   take the name of that class (e.g., ir_hierarchical_visitor.cpp).
+ - Files that contain code not fitting in one of the previous
+   categories should have a sensible name (e.g., glsl_parser.ypp).
diff --git a/icd/intel/compiler/shader/TODO b/icd/intel/compiler/shader/TODO
new file mode 100644
index 0000000..bd077a8
--- /dev/null
+++ b/icd/intel/compiler/shader/TODO
@@ -0,0 +1,12 @@
+- Detect code paths in non-void functions that don't reach a return statement
+
+- Improve handling of constants and their initializers.  Constant initializers
+  should never generate any code.  This is trival for scalar constants.  It is
+  also trivial for arrays, matrices, and vectors that are accessed with
+  constant index values.  For others it is more complicated.  Perhaps these
+  cases should be silently converted to uniforms?
+
+- Track source locations throughout the IR.  There are currently several
+  places where we cannot emit line numbers for errors (and currently emit 0:0)
+  because we've "lost" the line number information.  This is particularly
+  noticeable at link time.
diff --git a/icd/intel/compiler/shader/ast.h b/icd/intel/compiler/shader/ast.h
new file mode 100644
index 0000000..6b136f5
--- /dev/null
+++ b/icd/intel/compiler/shader/ast.h
@@ -0,0 +1,1124 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef AST_H
+#define AST_H
+
+#include "list.h"
+#include "glsl_parser_extras.h"
+
+struct _mesa_glsl_parse_state;
+
+struct YYLTYPE;
+
+/**
+ * \defgroup AST Abstract syntax tree node definitions
+ *
+ * An abstract syntax tree is generated by the parser.  This is a fairly
+ * direct representation of the grammar derivation for the source program.
+ * No semantic checking is done during the generation of the AST.  Only
+ * syntactic checking is done.  Semantic checking is performed by a later
+ * stage that converts the AST to a more generic intermediate representation.
+ *
+ *@{
+ */
+/**
+ * Base class of all abstract syntax tree nodes
+ */
+class ast_node {
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(ast_node);
+
+   /**
+    * Print an AST node in something approximating the original GLSL code
+    */
+   virtual void print(void) const;
+
+   /**
+    * Convert the AST node to the high-level intermediate representation
+    */
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   /**
+    * Retrieve the source location of an AST node
+    *
+    * This function is primarily used to get the source position of an AST node
+    * into a form that can be passed to \c _mesa_glsl_error.
+    *
+    * \sa _mesa_glsl_error, ast_node::set_location
+    */
+   struct YYLTYPE get_location(void) const
+   {
+      struct YYLTYPE locp;
+
+      locp.source = this->location.source;
+      locp.first_line = this->location.first_line;
+      locp.first_column = this->location.first_column;
+      locp.last_line = this->location.last_line;
+      locp.last_column = this->location.last_column;
+
+      return locp;
+   }
+
+   /**
+    * Set the source location of an AST node from a parser location
+    *
+    * \sa ast_node::get_location
+    */
+   void set_location(const struct YYLTYPE &locp)
+   {
+      this->location.source = locp.source;
+      this->location.first_line = locp.first_line;
+      this->location.first_column = locp.first_column;
+      this->location.last_line = locp.last_line;
+      this->location.last_column = locp.last_column;
+   }
+
+   /**
+    * Set the source location range of an AST node using two location nodes
+    *
+    * \sa ast_node::set_location
+    */
+   void set_location_range(const struct YYLTYPE &begin, const struct YYLTYPE &end)
+   {
+      this->location.source = begin.source;
+      this->location.first_line = begin.first_line;
+      this->location.last_line = end.last_line;
+      this->location.first_column = begin.first_column;
+      this->location.last_column = end.last_column;
+   }
+
+   /**
+    * Source location of the AST node.
+    */
+   struct {
+      unsigned source;          /**< GLSL source number. */
+      unsigned first_line;      /**< First line number within the source string. */
+      unsigned first_column;    /**< First column in the first line. */
+      unsigned last_line;       /**< Last line number within the source string. */
+      unsigned last_column;     /**< Last column in the last line. */
+   } location;
+
+   exec_node link;
+
+protected:
+   /**
+    * The only constructor is protected so that only derived class objects can
+    * be created.
+    */
+   ast_node(void);
+};
+
+
+/**
+ * Operators for AST expression nodes.
+ */
+enum ast_operators {
+   ast_assign,
+   ast_plus,        /**< Unary + operator. */
+   ast_neg,
+   ast_add,
+   ast_sub,
+   ast_mul,
+   ast_div,
+   ast_mod,
+   ast_lshift,
+   ast_rshift,
+   ast_less,
+   ast_greater,
+   ast_lequal,
+   ast_gequal,
+   ast_equal,
+   ast_nequal,
+   ast_bit_and,
+   ast_bit_xor,
+   ast_bit_or,
+   ast_bit_not,
+   ast_logic_and,
+   ast_logic_xor,
+   ast_logic_or,
+   ast_logic_not,
+
+   ast_mul_assign,
+   ast_div_assign,
+   ast_mod_assign,
+   ast_add_assign,
+   ast_sub_assign,
+   ast_ls_assign,
+   ast_rs_assign,
+   ast_and_assign,
+   ast_xor_assign,
+   ast_or_assign,
+
+   ast_conditional,
+
+   ast_pre_inc,
+   ast_pre_dec,
+   ast_post_inc,
+   ast_post_dec,
+   ast_field_selection,
+   ast_array_index,
+
+   ast_function_call,
+
+   ast_identifier,
+   ast_int_constant,
+   ast_uint_constant,
+   ast_float_constant,
+   ast_bool_constant,
+
+   ast_sequence,
+   ast_aggregate
+};
+
+/**
+ * Representation of any sort of expression.
+ */
+class ast_expression : public ast_node {
+public:
+   ast_expression(int oper, ast_expression *,
+		  ast_expression *, ast_expression *);
+
+   ast_expression(const char *identifier) :
+      oper(ast_identifier)
+   {
+      subexpressions[0] = NULL;
+      subexpressions[1] = NULL;
+      subexpressions[2] = NULL;
+      primary_expression.identifier = identifier;
+      this->non_lvalue_description = NULL;
+   }
+
+   static const char *operator_string(enum ast_operators op);
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   virtual void hir_no_rvalue(exec_list *instructions,
+                              struct _mesa_glsl_parse_state *state);
+
+   ir_rvalue *do_hir(exec_list *instructions,
+                     struct _mesa_glsl_parse_state *state,
+                     bool needs_rvalue);
+
+   virtual void print(void) const;
+
+   enum ast_operators oper;
+
+   ast_expression *subexpressions[3];
+
+   union {
+      const char *identifier;
+      int int_constant;
+      float float_constant;
+      unsigned uint_constant;
+      int bool_constant;
+   } primary_expression;
+
+
+   /**
+    * List of expressions for an \c ast_sequence or parameters for an
+    * \c ast_function_call
+    */
+   exec_list expressions;
+
+   /**
+    * For things that can't be l-values, this describes what it is.
+    *
+    * This text is used by the code that generates IR for assignments to
+    * detect and emit useful messages for assignments to some things that
+    * can't be l-values.  For example, pre- or post-increment expressions.
+    *
+    * \note
+    * This pointer may be \c NULL.
+    */
+   const char *non_lvalue_description;
+};
+
+class ast_expression_bin : public ast_expression {
+public:
+   ast_expression_bin(int oper, ast_expression *, ast_expression *);
+
+   virtual void print(void) const;
+};
+
+/**
+ * Subclass of expressions for function calls
+ */
+class ast_function_expression : public ast_expression {
+public:
+   ast_function_expression(ast_expression *callee)
+      : ast_expression(ast_function_call, callee,
+		       NULL, NULL),
+	cons(false)
+   {
+      /* empty */
+   }
+
+   ast_function_expression(class ast_type_specifier *type)
+      : ast_expression(ast_function_call, (ast_expression *) type,
+		       NULL, NULL),
+	cons(true)
+   {
+      /* empty */
+   }
+
+   bool is_constructor() const
+   {
+      return cons;
+   }
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   virtual void hir_no_rvalue(exec_list *instructions,
+                              struct _mesa_glsl_parse_state *state);
+
+private:
+   /**
+    * Is this function call actually a constructor?
+    */
+   bool cons;
+};
+
+class ast_array_specifier : public ast_node {
+public:
+   /** Unsized array specifier ([]) */
+   explicit ast_array_specifier(const struct YYLTYPE &locp)
+     : is_unsized_array(true)
+   {
+      set_location(locp);
+   }
+
+   /** Sized array specifier ([dim]) */
+   ast_array_specifier(const struct YYLTYPE &locp, ast_expression *dim)
+     : is_unsized_array(false)
+   {
+      set_location(locp);
+      array_dimensions.push_tail(&dim->link);
+   }
+
+   void add_dimension(ast_expression *dim)
+   {
+      array_dimensions.push_tail(&dim->link);
+   }
+
+   virtual void print(void) const;
+
+   /* If true, this means that the array has an unsized outermost dimension. */
+   bool is_unsized_array;
+
+   /* This list contains objects of type ast_node containing the
+    * sized dimensions only, in outermost-to-innermost order.
+    */
+   exec_list array_dimensions;
+};
+
+/**
+ * C-style aggregate initialization class
+ *
+ * Represents C-style initializers of vectors, matrices, arrays, and
+ * structures. E.g., vec3 pos = {1.0, 0.0, -1.0} is equivalent to
+ * vec3 pos = vec3(1.0, 0.0, -1.0).
+ *
+ * Specified in GLSL 4.20 and GL_ARB_shading_language_420pack.
+ *
+ * \sa _mesa_ast_set_aggregate_type
+ */
+class ast_aggregate_initializer : public ast_expression {
+public:
+   ast_aggregate_initializer()
+      : ast_expression(ast_aggregate, NULL, NULL, NULL),
+        constructor_type(NULL)
+   {
+      /* empty */
+   }
+
+   /**
+    * glsl_type of the aggregate, which is inferred from the LHS of whatever
+    * the aggregate is being used to initialize.  This can't be inferred at
+    * parse time (since the parser deals with ast_type_specifiers, not
+    * glsl_types), so the parser leaves it NULL.  However, the ast-to-hir
+    * conversion code makes sure to fill it in with the appropriate type
+    * before hir() is called.
+    */
+   const glsl_type *constructor_type;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+                          struct _mesa_glsl_parse_state *state);
+
+   virtual void hir_no_rvalue(exec_list *instructions,
+                              struct _mesa_glsl_parse_state *state);
+};
+
+/**
+ * Number of possible operators for an ast_expression
+ *
+ * This is done as a define instead of as an additional value in the enum so
+ * that the compiler won't generate spurious messages like "warning:
+ * enumeration value ‘ast_num_operators’ not handled in switch"
+ */
+#define AST_NUM_OPERATORS (ast_sequence + 1)
+
+
+class ast_compound_statement : public ast_node {
+public:
+   ast_compound_statement(int new_scope, ast_node *statements);
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   int new_scope;
+   exec_list statements;
+};
+
+class ast_declaration : public ast_node {
+public:
+   ast_declaration(const char *identifier,
+                   ast_array_specifier *array_specifier,
+                   ast_expression *initializer);
+   virtual void print(void) const;
+
+   const char *identifier;
+
+   ast_array_specifier *array_specifier;
+
+   ast_expression *initializer;
+};
+
+
+enum {
+   ast_precision_none = 0, /**< Absence of precision qualifier. */
+   ast_precision_high,
+   ast_precision_medium,
+   ast_precision_low
+};
+
+struct ast_type_qualifier {
+   DECLARE_RALLOC_CXX_OPERATORS(ast_type_qualifier);
+
+   union {
+      struct {
+	 unsigned invariant:1;
+	 unsigned constant:1;
+	 unsigned attribute:1;
+	 unsigned varying:1;
+	 unsigned in:1;
+	 unsigned out:1;
+	 unsigned centroid:1;
+         unsigned sample:1;
+	 unsigned uniform:1;
+	 unsigned smooth:1;
+	 unsigned flat:1;
+	 unsigned noperspective:1;
+
+	 /** \name Layout qualifiers for GL_ARB_fragment_coord_conventions */
+	 /*@{*/
+	 unsigned origin_upper_left:1;
+	 unsigned pixel_center_integer:1;
+	 /*@}*/
+
+	 /**
+	  * Flag set if GL_ARB_explicit_attrib_location "location" layout
+	  * qualifier is used.
+	  */
+	 unsigned explicit_location:1;
+	 /**
+	  * Flag set if GL_ARB_explicit_attrib_location "index" layout
+	  * qualifier is used.
+	  */
+	 unsigned explicit_index:1;
+
+         /**
+          * Flag set if GL_ARB_shading_language_420pack "binding" layout
+          * qualifier is used.
+          */
+         unsigned explicit_binding:1;
+
+         /**
+          * Flag set if GL_ARB_shader_atomic_counter "offset" layout
+          * qualifier is used.
+          */
+         unsigned explicit_offset:1;
+
+         /** \name Layout qualifiers for GL_AMD_conservative_depth */
+         /** \{ */
+         unsigned depth_any:1;
+         unsigned depth_greater:1;
+         unsigned depth_less:1;
+         unsigned depth_unchanged:1;
+         /** \} */
+
+	 /** \name Layout qualifiers for GL_ARB_uniform_buffer_object */
+	 /** \{ */
+         unsigned std140:1;
+         unsigned shared:1;
+         unsigned packed:1;
+         unsigned column_major:1;
+         unsigned row_major:1;
+	 /** \} */
+
+	 /** \name Layout qualifiers for GLSL 1.50 geometry shaders */
+	 /** \{ */
+	 unsigned prim_type:1;
+	 unsigned max_vertices:1;
+	 /** \} */
+
+         /**
+          * local_size_{x,y,z} flags for compute shaders.  Bit 0 represents
+          * local_size_x, and so on.
+          */
+         unsigned local_size:3;
+
+	 /** \name Layout and memory qualifiers for ARB_shader_image_load_store. */
+	 /** \{ */
+	 unsigned early_fragment_tests:1;
+	 unsigned explicit_image_format:1;
+	 unsigned coherent:1;
+	 unsigned _volatile:1;
+	 unsigned restrict_flag:1;
+	 unsigned read_only:1; /**< "readonly" qualifier. */
+	 unsigned write_only:1; /**< "writeonly" qualifier. */
+	 /** \} */
+
+         /** \name Layout qualifiers for GL_ARB_gpu_shader5 */
+         /** \{ */
+         unsigned invocations:1;
+         /** \} */
+      }
+      /** \brief Set of flags, accessed by name. */
+      q;
+
+      /** \brief Set of flags, accessed as a bitmask. */
+      uint64_t i;
+   } flags;
+
+   /** Precision of the type (highp/mediump/lowp). */
+   unsigned precision:2;
+
+   /** Geometry shader invocations for GL_ARB_gpu_shader5. */
+   int invocations;
+
+   /**
+    * Location specified via GL_ARB_explicit_attrib_location layout
+    *
+    * \note
+    * This field is only valid if \c explicit_location is set.
+    */
+   int location;
+   /**
+    * Index specified via GL_ARB_explicit_attrib_location layout
+    *
+    * \note
+    * This field is only valid if \c explicit_index is set.
+    */
+   int index;
+
+   /** Maximum output vertices in GLSL 1.50 geometry shaders. */
+   int max_vertices;
+
+   /** Input or output primitive type in GLSL 1.50 geometry shaders */
+   GLenum prim_type;
+
+   /**
+    * Binding specified via GL_ARB_shading_language_420pack's "binding" keyword.
+    *
+    * \note
+    * This field is only valid if \c explicit_binding is set.
+    */
+   int binding;
+
+   /**
+    * Offset specified via GL_ARB_shader_atomic_counter's "offset"
+    * keyword.
+    *
+    * \note
+    * This field is only valid if \c explicit_offset is set.
+    */
+   int offset;
+
+   /**
+    * Local size specified via GL_ARB_compute_shader's "local_size_{x,y,z}"
+    * layout qualifier.  Element i of this array is only valid if
+    * flags.q.local_size & (1 << i) is set.
+    */
+   int local_size[3];
+
+   /**
+    * Image format specified with an ARB_shader_image_load_store
+    * layout qualifier.
+    *
+    * \note
+    * This field is only valid if \c explicit_image_format is set.
+    */
+   GLenum image_format;
+
+   /**
+    * Base type of the data read from or written to this image.  Only
+    * the following enumerants are allowed: GLSL_TYPE_UINT,
+    * GLSL_TYPE_INT, GLSL_TYPE_FLOAT.
+    *
+    * \note
+    * This field is only valid if \c explicit_image_format is set.
+    */
+   glsl_base_type image_base_type;
+
+   /**
+    * Return true if and only if an interpolation qualifier is present.
+    */
+   bool has_interpolation() const;
+
+   /**
+    * Return whether a layout qualifier is present.
+    */
+   bool has_layout() const;
+
+   /**
+    * Return whether a storage qualifier is present.
+    */
+   bool has_storage() const;
+
+   /**
+    * Return whether an auxiliary storage qualifier is present.
+    */
+   bool has_auxiliary_storage() const;
+
+   /**
+    * \brief Return string representation of interpolation qualifier.
+    *
+    * If an interpolation qualifier is present, then return that qualifier's
+    * string representation. Otherwise, return null. For example, if the
+    * noperspective bit is set, then this returns "noperspective".
+    *
+    * If multiple interpolation qualifiers are somehow present, then the
+    * returned string is undefined but not null.
+    */
+   const char *interpolation_string() const;
+
+   bool merge_qualifier(YYLTYPE *loc,
+			_mesa_glsl_parse_state *state,
+			ast_type_qualifier q);
+
+   bool merge_in_qualifier(YYLTYPE *loc,
+                           _mesa_glsl_parse_state *state,
+                           ast_type_qualifier q,
+                           ast_node* &node);
+
+};
+
+class ast_declarator_list;
+
+class ast_struct_specifier : public ast_node {
+public:
+   /**
+    * \brief Make a shallow copy of an ast_struct_specifier.
+    *
+    * Use only if the objects are allocated from the same context and will not
+    * be modified. Zeros the inherited ast_node's fields.
+    */
+   ast_struct_specifier(const ast_struct_specifier& that):
+      ast_node(), name(that.name), declarations(that.declarations),
+      is_declaration(that.is_declaration)
+   {
+      /* empty */
+   }
+
+   ast_struct_specifier(const char *identifier,
+			ast_declarator_list *declarator_list);
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   const char *name;
+   /* List of ast_declarator_list * */
+   exec_list declarations;
+   bool is_declaration;
+};
+
+
+
+class ast_type_specifier : public ast_node {
+public:
+   /**
+    * \brief Make a shallow copy of an ast_type_specifier, specifying array
+    *        fields.
+    *
+    * Use only if the objects are allocated from the same context and will not
+    * be modified. Zeros the inherited ast_node's fields.
+    */
+   ast_type_specifier(const ast_type_specifier *that,
+                      ast_array_specifier *array_specifier)
+      : ast_node(), type_name(that->type_name), structure(that->structure),
+        array_specifier(array_specifier),
+        default_precision(that->default_precision)
+   {
+      /* empty */
+   }
+
+   /** Construct a type specifier from a type name */
+   ast_type_specifier(const char *name) 
+      : type_name(name), structure(NULL), array_specifier(NULL),
+	default_precision(ast_precision_none)
+   {
+      /* empty */
+   }
+
+   /** Construct a type specifier from a structure definition */
+   ast_type_specifier(ast_struct_specifier *s)
+      : type_name(s->name), structure(s), array_specifier(NULL),
+	default_precision(ast_precision_none)
+   {
+      /* empty */
+   }
+
+   const struct glsl_type *glsl_type(const char **name,
+				     struct _mesa_glsl_parse_state *state)
+      const;
+
+   virtual void print(void) const;
+
+   ir_rvalue *hir(exec_list *, struct _mesa_glsl_parse_state *);
+
+   const char *type_name;
+   ast_struct_specifier *structure;
+
+   ast_array_specifier *array_specifier;
+
+   /** For precision statements, this is the given precision; otherwise none. */
+   unsigned default_precision:2;
+};
+
+
+class ast_fully_specified_type : public ast_node {
+public:
+   virtual void print(void) const;
+   bool has_qualifiers() const;
+
+   ast_fully_specified_type() : qualifier(), specifier(NULL)
+   {
+   }
+
+   const struct glsl_type *glsl_type(const char **name,
+				     struct _mesa_glsl_parse_state *state)
+      const;
+
+   ast_type_qualifier qualifier;
+   ast_type_specifier *specifier;
+};
+
+
+class ast_declarator_list : public ast_node {
+public:
+   ast_declarator_list(ast_fully_specified_type *);
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   ast_fully_specified_type *type;
+   /** List of 'ast_declaration *' */
+   exec_list declarations;
+
+   /**
+    * Special flag for vertex shader "invariant" declarations.
+    *
+    * Vertex shaders can contain "invariant" variable redeclarations that do
+    * not include a type.  For example, "invariant gl_Position;".  This flag
+    * is used to note these cases when no type is specified.
+    */
+   int invariant;
+};
+
+
+class ast_parameter_declarator : public ast_node {
+public:
+   ast_parameter_declarator() :
+      type(NULL),
+      identifier(NULL),
+      array_specifier(NULL),
+      formal_parameter(false),
+      is_void(false)
+   {
+      /* empty */
+   }
+
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   ast_fully_specified_type *type;
+   const char *identifier;
+   ast_array_specifier *array_specifier;
+
+   static void parameters_to_hir(exec_list *ast_parameters,
+				 bool formal, exec_list *ir_parameters,
+				 struct _mesa_glsl_parse_state *state);
+
+private:
+   /** Is this parameter declaration part of a formal parameter list? */
+   bool formal_parameter;
+
+   /**
+    * Is this parameter 'void' type?
+    *
+    * This field is set by \c ::hir.
+    */
+   bool is_void;
+};
+
+
+class ast_function : public ast_node {
+public:
+   ast_function(void);
+
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   ast_fully_specified_type *return_type;
+   const char *identifier;
+
+   exec_list parameters;
+
+private:
+   /**
+    * Is this prototype part of the function definition?
+    *
+    * Used by ast_function_definition::hir to process the parameters, etc.
+    * of the function.
+    *
+    * \sa ::hir
+    */
+   bool is_definition;
+
+   /**
+    * Function signature corresponding to this function prototype instance
+    *
+    * Used by ast_function_definition::hir to process the parameters, etc.
+    * of the function.
+    *
+    * \sa ::hir
+    */
+   class ir_function_signature *signature;
+
+   friend class ast_function_definition;
+};
+
+
+class ast_expression_statement : public ast_node {
+public:
+   ast_expression_statement(ast_expression *);
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   ast_expression *expression;
+};
+
+
+class ast_case_label : public ast_node {
+public:
+   ast_case_label(ast_expression *test_value);
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   /**
+    * A test value of NULL means 'default'.
+    */
+   ast_expression *test_value;
+};
+
+
+class ast_case_label_list : public ast_node {
+public:
+   ast_case_label_list(void);
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   /**
+    * A list of case labels.
+    */
+   exec_list labels;
+};
+
+
+class ast_case_statement : public ast_node {
+public:
+   ast_case_statement(ast_case_label_list *labels);
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   ast_case_label_list *labels;
+
+   /**
+    * A list of statements.
+    */
+   exec_list stmts;
+};
+
+
+class ast_case_statement_list : public ast_node {
+public:
+   ast_case_statement_list(void);
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   /**
+    * A list of cases.
+    */
+   exec_list cases;
+};
+
+
+class ast_switch_body : public ast_node {
+public:
+   ast_switch_body(ast_case_statement_list *stmts);
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   ast_case_statement_list *stmts;
+};
+
+
+class ast_selection_statement : public ast_node {
+public:
+   ast_selection_statement(ast_expression *condition,
+			   ast_node *then_statement,
+			   ast_node *else_statement);
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   ast_expression *condition;
+   ast_node *then_statement;
+   ast_node *else_statement;
+};
+
+
+class ast_switch_statement : public ast_node {
+public:
+   ast_switch_statement(ast_expression *test_expression,
+			ast_node *body);
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   ast_expression *test_expression;
+   ast_node *body;
+
+protected:
+   void test_to_hir(exec_list *, struct _mesa_glsl_parse_state *);
+};
+
+class ast_iteration_statement : public ast_node {
+public:
+   ast_iteration_statement(int mode, ast_node *init, ast_node *condition,
+			   ast_expression *rest_expression, ast_node *body);
+
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *, struct _mesa_glsl_parse_state *);
+
+   enum ast_iteration_modes {
+      ast_for,
+      ast_while,
+      ast_do_while
+   } mode;
+
+   ast_node *init_statement;
+   ast_node *condition;
+   ast_expression *rest_expression;
+
+   ast_node *body;
+
+   /**
+    * Generate IR from the condition of a loop
+    *
+    * This is factored out of ::hir because some loops have the condition
+    * test at the top (for and while), and others have it at the end (do-while).
+    */
+   void condition_to_hir(exec_list *, struct _mesa_glsl_parse_state *);
+};
+
+
+class ast_jump_statement : public ast_node {
+public:
+   ast_jump_statement(int mode, ast_expression *return_value);
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   enum ast_jump_modes {
+      ast_continue,
+      ast_break,
+      ast_return,
+      ast_discard
+   } mode;
+
+   ast_expression *opt_return_value;
+};
+
+
+class ast_function_definition : public ast_node {
+public:
+   ast_function_definition() : prototype(NULL), body(NULL)
+   {
+   }
+
+   virtual void print(void) const;
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   ast_function *prototype;
+   ast_compound_statement *body;
+};
+
+class ast_interface_block : public ast_node {
+public:
+   ast_interface_block(ast_type_qualifier layout,
+                       const char *instance_name,
+                       ast_array_specifier *array_specifier)
+   : layout(layout), block_name(NULL), instance_name(instance_name),
+     array_specifier(array_specifier)
+   {
+   }
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+			  struct _mesa_glsl_parse_state *state);
+
+   ast_type_qualifier layout;
+   const char *block_name;
+
+   /**
+    * Declared name of the block instance, if specified.
+    *
+    * If the block does not have an instance name, this field will be
+    * \c NULL.
+    */
+   const char *instance_name;
+
+   /** List of ast_declarator_list * */
+   exec_list declarations;
+
+   /**
+    * Declared array size of the block instance
+    *
+    * If the block is not declared as an array or if the block instance array
+    * is unsized, this field will be \c NULL.
+    */
+   ast_array_specifier *array_specifier;
+};
+
+
+/**
+ * AST node representing a declaration of the input layout for geometry
+ * shaders.
+ */
+class ast_gs_input_layout : public ast_node
+{
+public:
+   ast_gs_input_layout(const struct YYLTYPE &locp, GLenum prim_type)
+      : prim_type(prim_type)
+   {
+      set_location(locp);
+   }
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+                          struct _mesa_glsl_parse_state *state);
+
+private:
+   const GLenum prim_type;
+};
+
+
+/**
+ * AST node representing a declaration of the input layout for compute
+ * shaders.
+ */
+class ast_cs_input_layout : public ast_node
+{
+public:
+   ast_cs_input_layout(const struct YYLTYPE &locp, const unsigned *local_size)
+   {
+      memcpy(this->local_size, local_size, sizeof(this->local_size));
+      set_location(locp);
+   }
+
+   virtual ir_rvalue *hir(exec_list *instructions,
+                          struct _mesa_glsl_parse_state *state);
+
+private:
+   unsigned local_size[3];
+};
+
+/*@}*/
+
+extern void
+_mesa_ast_to_hir(exec_list *instructions, struct _mesa_glsl_parse_state *state);
+
+extern ir_rvalue *
+_mesa_ast_field_selection_to_hir(const ast_expression *expr,
+				 exec_list *instructions,
+				 struct _mesa_glsl_parse_state *state);
+
+extern ir_rvalue *
+_mesa_ast_array_index_to_hir(void *mem_ctx,
+			     struct _mesa_glsl_parse_state *state,
+			     ir_rvalue *array, ir_rvalue *idx,
+			     YYLTYPE &loc, YYLTYPE &idx_loc);
+
+extern void
+_mesa_ast_set_aggregate_type(const glsl_type *type,
+                             ast_expression *expr);
+
+void
+emit_function(_mesa_glsl_parse_state *state, ir_function *f);
+
+extern void
+check_builtin_array_max_size(const char *name, unsigned size,
+                             YYLTYPE loc, struct _mesa_glsl_parse_state *state);
+
+#endif /* AST_H */
diff --git a/icd/intel/compiler/shader/ast_array_index.cpp b/icd/intel/compiler/shader/ast_array_index.cpp
new file mode 100644
index 0000000..ebd336b
--- /dev/null
+++ b/icd/intel/compiler/shader/ast_array_index.cpp
@@ -0,0 +1,260 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ast.h"
+#include "glsl_types.h"
+#include "ir.h"
+
+#if defined(NDEBUG) && defined(__GNUC__)
+#define U_ASSERT_ONLY __attribute__((unused))
+#else
+#define U_ASSERT_ONLY
+#endif
+
+void
+ast_array_specifier::print(void) const
+{
+   if (this->is_unsized_array) {
+      printf("[ ] ");
+   }
+
+   foreach_list_typed (ast_node, array_dimension, link, &this->array_dimensions) {
+      printf("[ ");
+      array_dimension->print();
+      printf("] ");
+   }
+}
+
+/**
+ * If \c ir is a reference to an array for which we are tracking the max array
+ * element accessed, track that the given element has been accessed.
+ * Otherwise do nothing.
+ *
+ * This function also checks whether the array is a built-in array whose
+ * maximum size is too small to accommodate the given index, and if so uses
+ * loc and state to report the error.
+ */
+static void
+update_max_array_access(ir_rvalue *ir, unsigned idx, YYLTYPE *loc,
+                        struct _mesa_glsl_parse_state *state)
+{
+   if (ir_dereference_variable *deref_var = ir->as_dereference_variable()) {
+      ir_variable *var = deref_var->var;
+      if (idx > var->data.max_array_access) {
+         var->data.max_array_access = idx;
+
+         /* Check whether this access will, as a side effect, implicitly cause
+          * the size of a built-in array to be too large.
+          */
+         check_builtin_array_max_size(var->name, idx+1, *loc, state);
+      }
+   } else if (ir_dereference_record *deref_record =
+              ir->as_dereference_record()) {
+      /* There are two possibilities we need to consider:
+       *
+       * - Accessing an element of an array that is a member of a named
+       *   interface block (e.g. ifc.foo[i])
+       *
+       * - Accessing an element of an array that is a member of a named
+       *   interface block array (e.g. ifc[j].foo[i]).
+       */
+      ir_dereference_variable *deref_var =
+         deref_record->record->as_dereference_variable();
+      if (deref_var == NULL) {
+         if (ir_dereference_array *deref_array =
+             deref_record->record->as_dereference_array()) {
+            deref_var = deref_array->array->as_dereference_variable();
+         }
+      }
+
+      if (deref_var != NULL) {
+         if (deref_var->var->is_interface_instance()) {
+            const glsl_type U_ASSERT_ONLY *interface_type =
+               deref_var->var->get_interface_type();
+            unsigned field_index =
+               deref_record->record->type->field_index(deref_record->field);
+            assert(field_index < interface_type->length);
+            if (idx > deref_var->var->max_ifc_array_access[field_index]) {
+               deref_var->var->max_ifc_array_access[field_index] = idx;
+
+               /* Check whether this access will, as a side effect, implicitly
+                * cause the size of a built-in array to be too large.
+                */
+               check_builtin_array_max_size(deref_record->field, idx+1, *loc,
+                                            state);
+            }
+         }
+      }
+   }
+}
+
+
+ir_rvalue *
+_mesa_ast_array_index_to_hir(void *mem_ctx,
+			     struct _mesa_glsl_parse_state *state,
+			     ir_rvalue *array, ir_rvalue *idx,
+			     YYLTYPE &loc, YYLTYPE &idx_loc)
+{
+   if (!array->type->is_error()
+       && !array->type->is_array()
+       && !array->type->is_matrix()
+       && !array->type->is_vector()) {
+      _mesa_glsl_error(& idx_loc, state,
+		       "cannot dereference non-array / non-matrix / "
+		       "non-vector");
+   }
+
+   if (!idx->type->is_error()) {
+      if (!idx->type->is_integer()) {
+	 _mesa_glsl_error(& idx_loc, state, "array index must be integer type");
+      } else if (!idx->type->is_scalar()) {
+	 _mesa_glsl_error(& idx_loc, state, "array index must be scalar");
+      }
+   }
+
+   /* If the array index is a constant expression and the array has a
+    * declared size, ensure that the access is in-bounds.  If the array
+    * index is not a constant expression, ensure that the array has a
+    * declared size.
+    */
+   ir_constant *const const_index = idx->constant_expression_value();
+   if (const_index != NULL && idx->type->is_integer()) {
+      const int idx = const_index->value.i[0];
+      const char *type_name = "error";
+      unsigned bound = 0;
+
+      /* From page 24 (page 30 of the PDF) of the GLSL 1.50 spec:
+       *
+       *    "It is illegal to declare an array with a size, and then
+       *    later (in the same shader) index the same array with an
+       *    integral constant expression greater than or equal to the
+       *    declared size. It is also illegal to index an array with a
+       *    negative constant expression."
+       */
+      if (array->type->is_matrix()) {
+	 if (array->type->row_type()->vector_elements <= idx) {
+	    type_name = "matrix";
+	    bound = array->type->row_type()->vector_elements;
+	 }
+      } else if (array->type->is_vector()) {
+	 if (array->type->vector_elements <= idx) {
+	    type_name = "vector";
+	    bound = array->type->vector_elements;
+	 }
+      } else {
+	 /* glsl_type::array_size() returns -1 for non-array types.  This means
+	  * that we don't need to verify that the type is an array before
+	  * doing the bounds checking.
+	  */
+	 if ((array->type->array_size() > 0)
+	     && (array->type->array_size() <= idx)) {
+	    type_name = "array";
+	    bound = array->type->array_size();
+	 }
+      }
+
+      if (bound > 0) {
+	 _mesa_glsl_error(& loc, state, "%s index must be < %u",
+			  type_name, bound);
+      } else if (idx < 0) {
+	 _mesa_glsl_error(& loc, state, "%s index must be >= 0",
+			  type_name);
+      }
+
+      if (array->type->is_array())
+         update_max_array_access(array, idx, &loc, state);
+   } else if (const_index == NULL && array->type->is_array()) {
+      if (array->type->is_unsized_array()) {
+	 _mesa_glsl_error(&loc, state, "unsized array index must be constant");
+      } else if (array->type->fields.array->is_interface()
+                 && array->variable_referenced()->data.mode == ir_var_uniform) {
+	 /* Page 46 in section 4.3.7 of the OpenGL ES 3.00 spec says:
+	  *
+	  *     "All indexes used to index a uniform block array must be
+	  *     constant integral expressions."
+	  */
+	 _mesa_glsl_error(&loc, state,
+			  "uniform block array index must be constant");
+      } else {
+	 /* whole_variable_referenced can return NULL if the array is a
+	  * member of a structure.  In this case it is safe to not update
+	  * the max_array_access field because it is never used for fields
+	  * of structures.
+	  */
+	 ir_variable *v = array->whole_variable_referenced();
+	 if (v != NULL)
+	    v->data.max_array_access = array->type->array_size() - 1;
+      }
+
+      /* From page 23 (29 of the PDF) of the GLSL 1.30 spec:
+       *
+       *    "Samplers aggregated into arrays within a shader (using square
+       *    brackets [ ]) can only be indexed with integral constant
+       *    expressions [...]."
+       *
+       * This restriction was added in GLSL 1.30.  Shaders using earlier
+       * version of the language should not be rejected by the compiler
+       * front-end for using this construct.  This allows useful things such
+       * as using a loop counter as the index to an array of samplers.  If the
+       * loop is unrolled, the code should compile correctly.  Instead, emit a
+       * warning.
+       */
+      if (array->type->element_type()->is_sampler()) {
+	 if (!state->is_version(130, 100)) {
+	    if (state->es_shader) {
+	       _mesa_glsl_warning(&loc, state,
+				  "sampler arrays indexed with non-constant "
+				  "expressions is optional in %s",
+				  state->get_version_string());
+	    } else {
+	       _mesa_glsl_warning(&loc, state,
+				  "sampler arrays indexed with non-constant "
+				  "expressions will be forbidden in GLSL 1.30 "
+				  "and later");
+	    }
+	 } else {
+	    _mesa_glsl_error(&loc, state,
+			     "sampler arrays indexed with non-constant "
+			     "expressions are forbidden in GLSL 1.30 and "
+			     "later");
+	 }
+      }
+   }
+
+   /* After performing all of the error checking, generate the IR for the
+    * expression.
+    */
+   if (array->type->is_array()
+       || array->type->is_matrix()) {
+      return new(mem_ctx) ir_dereference_array(array, idx);
+   } else if (array->type->is_vector()) {
+      return new(mem_ctx) ir_expression(ir_binop_vector_extract, array, idx);
+   } else if (array->type->is_error()) {
+      return array;
+   } else {
+      ir_rvalue *result = new(mem_ctx) ir_dereference_array(array, idx);
+      result->type = glsl_type::error_type;
+
+      return result;
+   }
+}
diff --git a/icd/intel/compiler/shader/ast_expr.cpp b/icd/intel/compiler/shader/ast_expr.cpp
new file mode 100644
index 0000000..e624d11
--- /dev/null
+++ b/icd/intel/compiler/shader/ast_expr.cpp
@@ -0,0 +1,95 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <assert.h>
+#include "ast.h"
+
+const char *
+ast_expression::operator_string(enum ast_operators op)
+{
+   static const char *const operators[] = {
+      "=",
+      "+",
+      "-",
+      "+",
+      "-",
+      "*",
+      "/",
+      "%",
+      "<<",
+      ">>",
+      "<",
+      ">",
+      "<=",
+      ">=",
+      "==",
+      "!=",
+      "&",
+      "^",
+      "|",
+      "~",
+      "&&",
+      "^^",
+      "||",
+      "!",
+
+      "*=",
+      "/=",
+      "%=",
+      "+=",
+      "-=",
+      "<<=",
+      ">>=",
+      "&=",
+      "^=",
+      "|=",
+
+      "?:",
+
+      "++",
+      "--",
+      "++",
+      "--",
+      ".",
+   };
+
+   assert((unsigned int)op < sizeof(operators) / sizeof(operators[0]));
+
+   return operators[op];
+}
+
+
+ast_expression_bin::ast_expression_bin(int oper, ast_expression *ex0,
+				       ast_expression *ex1) :
+   ast_expression(oper, ex0, ex1, NULL)
+{
+   assert((oper >= ast_plus) && (oper <= ast_logic_not));
+}
+
+
+void
+ast_expression_bin::print(void) const
+{
+   subexpressions[0]->print();
+   printf("%s ", operator_string(oper));
+   subexpressions[1]->print();
+}
diff --git a/icd/intel/compiler/shader/ast_function.cpp b/icd/intel/compiler/shader/ast_function.cpp
new file mode 100644
index 0000000..cdc9d1c
--- /dev/null
+++ b/icd/intel/compiler/shader/ast_function.cpp
@@ -0,0 +1,1784 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "glsl_symbol_table.h"
+#include "ast.h"
+#include "glsl_types.h"
+#include "ir.h"
+#include "libfns.h" // LunarG ADD:
+
+static ir_rvalue *
+convert_component(ir_rvalue *src, const glsl_type *desired_type);
+
+bool
+apply_implicit_conversion(const glsl_type *to, ir_rvalue * &from,
+                          struct _mesa_glsl_parse_state *state);
+
+static unsigned
+process_parameters(exec_list *instructions, exec_list *actual_parameters,
+		   exec_list *parameters,
+		   struct _mesa_glsl_parse_state *state)
+{
+   unsigned count = 0;
+
+   foreach_list (n, parameters) {
+      ast_node *const ast = exec_node_data(ast_node, n, link);
+      ir_rvalue *result = ast->hir(instructions, state);
+
+      ir_constant *const constant = result->constant_expression_value();
+      if (constant != NULL)
+	 result = constant;
+
+      actual_parameters->push_tail(result);
+      count++;
+   }
+
+   return count;
+}
+
+
+/**
+ * Generate a source prototype for a function signature
+ *
+ * \param return_type Return type of the function.  May be \c NULL.
+ * \param name        Name of the function.
+ * \param parameters  List of \c ir_instruction nodes representing the
+ *                    parameter list for the function.  This may be either a
+ *                    formal (\c ir_variable) or actual (\c ir_rvalue)
+ *                    parameter list.  Only the type is used.
+ *
+ * \return
+ * A ralloced string representing the prototype of the function.
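+ *
+ * For example, for a hypothetical signature 'float dot(vec3, vec3)' the
+ * returned string would be "float dot(vec3, vec3)".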
+ */
+char *
+prototype_string(const glsl_type *return_type, const char *name,
+		 exec_list *parameters)
+{
+   char *str = NULL;
+
+   if (return_type != NULL)
+      str = ralloc_asprintf(NULL, "%s ", return_type->name);
+
+   ralloc_asprintf_append(&str, "%s(", name);
+
+   const char *comma = "";
+   foreach_list(node, parameters) {
+      const ir_variable *const param = (ir_variable *) node;
+
+      ralloc_asprintf_append(&str, "%s%s", comma, param->type->name);
+      comma = ", ";
+   }
+
+   ralloc_strcat(&str, ")");
+   return str;
+}
+
+static bool
+verify_image_parameter(YYLTYPE *loc, _mesa_glsl_parse_state *state,
+                       const ir_variable *formal, const ir_variable *actual)
+{
+   /**
+    * From the ARB_shader_image_load_store specification:
+    *
+    * "The values of image variables qualified with coherent,
+    *  volatile, restrict, readonly, or writeonly may not be passed
+    *  to functions whose formal parameters lack such
+    *  qualifiers. [...] It is legal to have additional qualifiers
+    *  on a formal parameter, but not to have fewer."
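+    *
+    * For example (an illustrative sketch), an actual parameter declared
+    * as a 'coherent' image2D may not be passed to a formal parameter
+    * declared without 'coherent'; the checks below reject each such case.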
+    */
+   if (actual->data.image.coherent && !formal->data.image.coherent) {
+      _mesa_glsl_error(loc, state,
+                       "function call parameter `%s' drops "
+                       "`coherent' qualifier", formal->name);
+      return false;
+   }
+
+   if (actual->data.image._volatile && !formal->data.image._volatile) {
+      _mesa_glsl_error(loc, state,
+                       "function call parameter `%s' drops "
+                       "`volatile' qualifier", formal->name);
+      return false;
+   }
+
+   if (actual->data.image.restrict_flag && !formal->data.image.restrict_flag) {
+      _mesa_glsl_error(loc, state,
+                       "function call parameter `%s' drops "
+                       "`restrict' qualifier", formal->name);
+      return false;
+   }
+
+   if (actual->data.image.read_only && !formal->data.image.read_only) {
+      _mesa_glsl_error(loc, state,
+                       "function call parameter `%s' drops "
+                       "`readonly' qualifier", formal->name);
+      return false;
+   }
+
+   if (actual->data.image.write_only && !formal->data.image.write_only) {
+      _mesa_glsl_error(loc, state,
+                       "function call parameter `%s' drops "
+                       "`writeonly' qualifier", formal->name);
+      return false;
+   }
+
+   return true;
+}
+
+/**
+ * Verify that 'out' and 'inout' actual parameters are lvalues.  Also, verify
+ * that 'const_in' formal parameters (an extension in our IR) correspond to
+ * ir_constant actual parameters.
+ */
+static bool
+verify_parameter_modes(_mesa_glsl_parse_state *state,
+		       ir_function_signature *sig,
+		       exec_list &actual_ir_parameters,
+		       exec_list &actual_ast_parameters)
+{
+   exec_node *actual_ir_node  = actual_ir_parameters.head;
+   exec_node *actual_ast_node = actual_ast_parameters.head;
+
+   foreach_list(formal_node, &sig->parameters) {
+      /* The lists must be the same length. */
+      assert(!actual_ir_node->is_tail_sentinel());
+      assert(!actual_ast_node->is_tail_sentinel());
+
+      const ir_variable *const formal = (ir_variable *) formal_node;
+      const ir_rvalue *const actual = (ir_rvalue *) actual_ir_node;
+      const ast_expression *const actual_ast =
+	 exec_node_data(ast_expression, actual_ast_node, link);
+
+      /* FIXME: 'loc' is incorrect (as of 2011-01-21). It is always
+       * FIXME: 0:0(0).
+       */
+      YYLTYPE loc = actual_ast->get_location();
+
+      /* Verify that 'const_in' parameters are ir_constants. */
+      if (formal->data.mode == ir_var_const_in &&
+	  actual->ir_type != ir_type_constant) {
+	 _mesa_glsl_error(&loc, state,
+			  "parameter `in %s' must be a constant expression",
+			  formal->name);
+	 return false;
+      }
+
+      /* Verify that 'out' and 'inout' actual parameters are lvalues. */
+      if (formal->data.mode == ir_var_function_out
+          || formal->data.mode == ir_var_function_inout) {
+	 const char *mode = NULL;
+	 switch (formal->data.mode) {
+	 case ir_var_function_out:   mode = "out";   break;
+	 case ir_var_function_inout: mode = "inout"; break;
+	 default:                    assert(false);  break;
+	 }
+
+	 /* This AST-based check catches errors like f(i++).  The IR-based
+	  * is_lvalue() is insufficient because the actual parameter at the
+	  * IR-level is just a temporary value, which is an l-value.
+	  */
+	 if (actual_ast->non_lvalue_description != NULL) {
+	    _mesa_glsl_error(&loc, state,
+			     "function parameter '%s %s' references a %s",
+			     mode, formal->name,
+			     actual_ast->non_lvalue_description);
+	    return false;
+	 }
+
+	 ir_variable *var = actual->variable_referenced();
+	 if (var)
+	    var->data.assigned = true;
+
+	 if (var && var->data.read_only) {
+	    _mesa_glsl_error(&loc, state,
+			     "function parameter '%s %s' references the "
+			     "read-only variable '%s'",
+			     mode, formal->name,
+			     actual->variable_referenced()->name);
+	    return false;
+	 } else if (!actual->is_lvalue()) {
+            /* Even though ir_binop_vector_extract is not an l-value, let it
+             * slip through.  generate_call will handle it correctly.
+             */
+            ir_expression *const expr = ((ir_rvalue *) actual)->as_expression();
+            if (expr == NULL
+                || expr->operation != ir_binop_vector_extract
+                || !expr->operands[0]->is_lvalue()) {
+               _mesa_glsl_error(&loc, state,
+                                "function parameter '%s %s' is not an lvalue",
+                                mode, formal->name);
+               return false;
+            }
+	 }
+      }
+
+      if (formal->type->is_image() &&
+          actual->variable_referenced()) {
+         if (!verify_image_parameter(&loc, state, formal,
+                                     actual->variable_referenced()))
+            return false;
+      }
+
+      actual_ir_node  = actual_ir_node->next;
+      actual_ast_node = actual_ast_node->next;
+   }
+   return true;
+}
+
+static void
+fix_parameter(void *mem_ctx, ir_rvalue *actual, const glsl_type *formal_type,
+              exec_list *before_instructions, exec_list *after_instructions,
+              bool parameter_is_inout)
+{
+   ir_expression *const expr = actual->as_expression();
+
+   /* If the types match exactly and the parameter is not a vector-extract,
+    * nothing needs to be done to fix the parameter.
+    */
+   if (formal_type == actual->type
+       && (expr == NULL || expr->operation != ir_binop_vector_extract))
+      return;
+
+   /* To convert an out parameter, we need to create a temporary variable to
+    * hold the value before conversion, and then perform the conversion after
+    * the function call returns.
+    *
+    * This has the effect of transforming code like this:
+    *
+    *   void f(out int x);
+    *   float value;
+    *   f(value);
+    *
+    * Into IR that's equivalent to this:
+    *
+    *   void f(out int x);
+    *   float value;
+    *   int out_parameter_conversion;
+    *   f(out_parameter_conversion);
+    *   value = float(out_parameter_conversion);
+    *
+    * If the parameter is an ir_expression of ir_binop_vector_extract,
+    * additional conversion is needed in the post-call re-write.
+    */
+   ir_variable *tmp =
+      new(mem_ctx) ir_variable(formal_type, "inout_tmp", ir_var_temporary);
+
+   before_instructions->push_tail(tmp);
+
+   /* If the parameter is an inout parameter, copy the value of the actual
+    * parameter to the new temporary.  Note that no type conversion is allowed
+    * here because inout parameters must match types exactly.
+    */
+   if (parameter_is_inout) {
+      /* Inout parameters should never require conversion, since that would
+       * require an implicit conversion to exist both to and from the formal
+       * parameter type, and there are no bidirectional implicit conversions.
+       */
+      assert(actual->type == formal_type);
+
+      ir_dereference_variable *const deref_tmp_1 =
+         new(mem_ctx) ir_dereference_variable(tmp);
+      ir_assignment *const assignment =
+         new(mem_ctx) ir_assignment(deref_tmp_1, actual);
+      before_instructions->push_tail(assignment);
+   }
+
+   /* Replace the parameter in the call with a dereference of the new
+    * temporary.
+    */
+   ir_dereference_variable *const deref_tmp_2 =
+      new(mem_ctx) ir_dereference_variable(tmp);
+   actual->replace_with(deref_tmp_2);
+
+
+   /* Copy the temporary variable to the actual parameter with optional
+    * type conversion applied.
+    */
+   ir_rvalue *rhs = new(mem_ctx) ir_dereference_variable(tmp);
+   if (actual->type != formal_type)
+      rhs = convert_component(rhs, actual->type);
+
+   ir_rvalue *lhs = actual;
+   if (expr != NULL && expr->operation == ir_binop_vector_extract) {
+      rhs = new(mem_ctx) ir_expression(ir_triop_vector_insert,
+                                       expr->operands[0]->type,
+                                       expr->operands[0]->clone(mem_ctx, NULL),
+                                       rhs,
+                                       expr->operands[1]->clone(mem_ctx, NULL));
+      lhs = expr->operands[0]->clone(mem_ctx, NULL);
+   }
+
+   ir_assignment *const assignment_2 = new(mem_ctx) ir_assignment(lhs, rhs);
+   after_instructions->push_tail(assignment_2);
+}
+
+/**
+ * Generate a function call.
+ *
+ * For non-void functions, this returns a dereference of the temporary variable
+ * which stores the return value for the call.  For void functions, this returns
+ * NULL.
+ */
+static ir_rvalue *
+generate_call(exec_list *instructions, ir_function_signature *sig,
+	      exec_list *actual_parameters,
+	      struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+   exec_list post_call_conversions;
+
+   /* Perform implicit conversion of arguments.  For out parameters, we need
+    * to place them in a temporary variable and do the conversion after the
+    * call takes place.  Since we haven't emitted the call yet, we'll place
+    * the post-call conversions in a temporary exec_list, and emit them later.
+    */
+   foreach_two_lists(formal_node, &sig->parameters,
+                     actual_node, actual_parameters) {
+      ir_rvalue *actual = (ir_rvalue *) actual_node;
+      ir_variable *formal = (ir_variable *) formal_node;
+
+      if (formal->type->is_numeric() || formal->type->is_boolean()) {
+	 switch (formal->data.mode) {
+	 case ir_var_const_in:
+	 case ir_var_function_in: {
+	    ir_rvalue *converted
+	       = convert_component(actual, formal->type);
+	    actual->replace_with(converted);
+	    break;
+	 }
+	 case ir_var_function_out:
+	 case ir_var_function_inout:
+            fix_parameter(ctx, actual, formal->type,
+                          instructions, &post_call_conversions,
+                          formal->data.mode == ir_var_function_inout);
+	    break;
+	 default:
+	    assert(!"Illegal formal parameter mode");
+	    break;
+	 }
+      }
+   }
+
+   /* If the function call is a constant expression, don't generate any
+    * instructions; just generate an ir_constant.
+    *
+    * Function calls were first allowed to be constant expressions in GLSL
+    * 1.20 and GLSL ES 3.00.
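+    *
+    * For example, in GLSL 1.20 a declaration such as
+    * 'const float s = sin(1.0);' is legal because the call folds to an
+    * ir_constant here (an illustrative example).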
+    */
+   if (state->is_version(120, 300)) {
+      ir_constant *value = sig->constant_expression_value(actual_parameters, NULL);
+      if (value != NULL) {
+	 return value;
+      }
+   }
+
+   ir_dereference_variable *deref = NULL;
+   if (!sig->return_type->is_void()) {
+      /* Create a new temporary to hold the return value. */
+      ir_variable *var;
+
+      var = new(ctx) ir_variable(sig->return_type,
+				 ralloc_asprintf(ctx, "%s_retval",
+						 sig->function_name()),
+				 ir_var_temporary);
+      instructions->push_tail(var);
+
+      deref = new(ctx) ir_dereference_variable(var);
+   }
+   ir_call *call = new(ctx) ir_call(sig, deref, actual_parameters);
+   instructions->push_tail(call);
+
+   /* Also emit any necessary out-parameter conversions. */
+   instructions->append_list(&post_call_conversions);
+
+   return deref ? deref->clone(ctx, NULL) : NULL;
+}
+
+/**
+ * Given a function name and parameter list, find the matching signature.
+ */
+static ir_function_signature *
+match_function_by_name(const char *name,
+		       exec_list *actual_parameters,
+		       struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+   ir_function *f = state->symbols->get_function(name);
+   ir_function_signature *local_sig = NULL;
+   ir_function_signature *sig = NULL;
+
+   /* Is the function hidden by a record type constructor? */
+   if (state->symbols->get_type(name))
+      goto done; /* no match */
+
+   /* Is the function hidden by a variable (impossible in 1.10)? */
+   if (!state->symbols->separate_function_namespace
+       && state->symbols->get_variable(name))
+      goto done; /* no match */
+
+   if (f != NULL) {
+      /* Look for a match in the local shader.  If exact, we're done. */
+      bool is_exact = false;
+      sig = local_sig = f->matching_signature(state, actual_parameters,
+                                              &is_exact);
+      if (is_exact)
+	 goto done;
+
+      if (!state->es_shader && f->has_user_signature()) {
+	 /* In desktop GL, the presence of a user-defined signature hides any
+	  * built-in signatures, so we must ignore them.  In contrast, in ES2
+	  * user-defined signatures add new overloads, so we must proceed.
+	  */
+	 goto done;
+      }
+   }
+
+   /* Local shader has no exact candidates; check the built-ins. */
+   _mesa_glsl_initialize_builtin_functions();
+   sig = _mesa_glsl_find_builtin_function(state, name, actual_parameters);
+
+done:
+   if (sig != NULL) {
+      /* If the match is from a linked built-in shader, import the prototype. */
+      if (sig != local_sig) {
+	 if (f == NULL) {
+	    f = new(ctx) ir_function(name);
+	    state->symbols->add_global_function(f);
+	    emit_function(state, f);
+	 }
+	 f->add_signature(sig->clone_prototype(f, NULL));
+      }
+   }
+   return sig;
+}
+
+static void
+print_function_prototypes(_mesa_glsl_parse_state *state, YYLTYPE *loc,
+                          ir_function *f)
+{
+   if (f == NULL)
+      return;
+
+   foreach_list (node, &f->signatures) {
+      ir_function_signature *sig = (ir_function_signature *) node;
+
+      if (sig->is_builtin() && !sig->is_builtin_available(state))
+         continue;
+
+      char *str = prototype_string(sig->return_type, f->name, &sig->parameters);
+      _mesa_glsl_error(loc, state, "   %s", str);
+      ralloc_free(str);
+   }
+}
+
+/**
+ * Raise a "no matching function" error, listing all possible overloads the
+ * compiler considered so developers can figure out what went wrong.
+ */
+static void
+no_matching_function_error(const char *name,
+			   YYLTYPE *loc,
+			   exec_list *actual_parameters,
+			   _mesa_glsl_parse_state *state)
+{
+   gl_shader *sh = _mesa_glsl_get_builtin_function_shader();
+
+   if (state->symbols->get_function(name) == NULL
+      && (!state->uses_builtin_functions
+          || sh->symbols->get_function(name) == NULL)) {
+      _mesa_glsl_error(loc, state, "no function with name '%s'", name);
+   } else {
+      char *str = prototype_string(NULL, name, actual_parameters);
+      _mesa_glsl_error(loc, state,
+                       "no matching function for call to `%s'; candidates are:",
+                       str);
+      ralloc_free(str);
+
+      print_function_prototypes(state, loc, state->symbols->get_function(name));
+
+      if (state->uses_builtin_functions) {
+         print_function_prototypes(state, loc, sh->symbols->get_function(name));
+      }
+   }
+}
+
+/**
+ * Perform automatic type conversion of constructor parameters
+ *
+ * This implements the rules in the "Conversion and Scalar Constructors"
+ * section (GLSL 1.10 section 5.4.1), not the "Implicit Conversions" rules.
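+ *
+ * For example, under these rules int(1.5) yields 1 and bool(0.0) yields
+ * false; the "Implicit Conversions" rules would perform neither
+ * conversion (an illustrative note, not a spec quote).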
+ */
+static ir_rvalue *
+convert_component(ir_rvalue *src, const glsl_type *desired_type)
+{
+   void *ctx = ralloc_parent(src);
+   const unsigned a = desired_type->base_type;
+   const unsigned b = src->type->base_type;
+   ir_expression *result = NULL;
+
+   if (src->type->is_error())
+      return src;
+
+   assert(a <= GLSL_TYPE_BOOL);
+   assert(b <= GLSL_TYPE_BOOL);
+
+   if (a == b)
+      return src;
+
+   switch (a) {
+   case GLSL_TYPE_UINT:
+      switch (b) {
+      case GLSL_TYPE_INT:
+	 result = new(ctx) ir_expression(ir_unop_i2u, src);
+	 break;
+      case GLSL_TYPE_FLOAT:
+	 result = new(ctx) ir_expression(ir_unop_f2u, src);
+	 break;
+      case GLSL_TYPE_BOOL:
+	 result = new(ctx) ir_expression(ir_unop_i2u,
+		  new(ctx) ir_expression(ir_unop_b2i, src));
+	 break;
+      }
+      break;
+   case GLSL_TYPE_INT:
+      switch (b) {
+      case GLSL_TYPE_UINT:
+	 result = new(ctx) ir_expression(ir_unop_u2i, src);
+	 break;
+      case GLSL_TYPE_FLOAT:
+	 result = new(ctx) ir_expression(ir_unop_f2i, src);
+	 break;
+      case GLSL_TYPE_BOOL:
+	 result = new(ctx) ir_expression(ir_unop_b2i, src);
+	 break;
+      }
+      break;
+   case GLSL_TYPE_FLOAT:
+      switch (b) {
+      case GLSL_TYPE_UINT:
+	 result = new(ctx) ir_expression(ir_unop_u2f, desired_type, src, NULL);
+	 break;
+      case GLSL_TYPE_INT:
+	 result = new(ctx) ir_expression(ir_unop_i2f, desired_type, src, NULL);
+	 break;
+      case GLSL_TYPE_BOOL:
+	 result = new(ctx) ir_expression(ir_unop_b2f, desired_type, src, NULL);
+	 break;
+      }
+      break;
+   case GLSL_TYPE_BOOL:
+      switch (b) {
+      case GLSL_TYPE_UINT:
+	 result = new(ctx) ir_expression(ir_unop_i2b,
+		  new(ctx) ir_expression(ir_unop_u2i, src));
+	 break;
+      case GLSL_TYPE_INT:
+	 result = new(ctx) ir_expression(ir_unop_i2b, desired_type, src, NULL);
+	 break;
+      case GLSL_TYPE_FLOAT:
+	 result = new(ctx) ir_expression(ir_unop_f2b, desired_type, src, NULL);
+	 break;
+      }
+      break;
+   }
+
+   assert(result != NULL);
+   assert(result->type == desired_type);
+
+   /* Try constant folding; it may fold in the conversion we just added. */
+   ir_constant *const constant = result->constant_expression_value();
+   return (constant != NULL) ? (ir_rvalue *) constant : (ir_rvalue *) result;
+}
+
+/**
+ * Dereference a specific component from a scalar, vector, or matrix
+ */
+static ir_rvalue *
+dereference_component(ir_rvalue *src, unsigned component)
+{
+   void *ctx = ralloc_parent(src);
+   assert(component < src->type->components());
+
+   /* If the source is a constant, just create a new constant instead of a
+    * dereference of the existing constant.
+    */
+   ir_constant *constant = src->as_constant();
+   if (constant)
+      return new(ctx) ir_constant(constant, component);
+
+   if (src->type->is_scalar()) {
+      return src;
+   } else if (src->type->is_vector()) {
+      return new(ctx) ir_swizzle(src, component, 0, 0, 0, 1);
+   } else {
+      assert(src->type->is_matrix());
+
+      /* Dereference a column of the matrix, then call this function again to
+       * get a specific element from that column.
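+       *
+       * For example, component 5 of a mat3 maps to column 1, row 2
+       * (5 / 3 == 1 and 5 % 3 == 2).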
+       */
+      const int c = component / src->type->column_type()->vector_elements;
+      const int r = component % src->type->column_type()->vector_elements;
+      ir_constant *const col_index = new(ctx) ir_constant(c);
+      ir_dereference *const col = new(ctx) ir_dereference_array(src, col_index);
+
+      col->type = src->type->column_type();
+
+      return dereference_component(col, r);
+   }
+
+   assert(!"Should not get here.");
+   return NULL;
+}
+
+
+static ir_rvalue *
+process_vec_mat_constructor(exec_list *instructions,
+                            const glsl_type *constructor_type,
+                            YYLTYPE *loc, exec_list *parameters,
+                            struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+
+   /* The ARB_shading_language_420pack spec says:
+    *
+    * "If an initializer is a list of initializers enclosed in curly braces,
+    *  the variable being declared must be a vector, a matrix, an array, or a
+    *  structure.
+    *
+    *      int i = { 1 }; // illegal, i is not an aggregate"
+    */
+   if (constructor_type->vector_elements <= 1) {
+      _mesa_glsl_error(loc, state, "aggregates can only initialize vectors, "
+                       "matrices, arrays, and structs");
+      return ir_rvalue::error_value(ctx);
+   }
+
+   exec_list actual_parameters;
+   const unsigned parameter_count =
+      process_parameters(instructions, &actual_parameters, parameters, state);
+
+   if (parameter_count == 0
+       || (constructor_type->is_vector() &&
+           constructor_type->vector_elements != parameter_count)
+       || (constructor_type->is_matrix() &&
+           constructor_type->matrix_columns != parameter_count)) {
+      _mesa_glsl_error(loc, state, "%s constructor must have %u parameters",
+                       constructor_type->is_vector() ? "vector" : "matrix",
+                       constructor_type->is_vector()
+                          ? constructor_type->vector_elements
+                          : constructor_type->matrix_columns);
+      return ir_rvalue::error_value(ctx);
+   }
+
+   bool all_parameters_are_constant = true;
+
+   /* Type cast each parameter and, if possible, fold constants. */
+   foreach_list_safe(n, &actual_parameters) {
+      ir_rvalue *ir = (ir_rvalue *) n;
+      ir_rvalue *result = ir;
+
+      /* Apply implicit conversions (not the scalar constructor rules!). See
+       * the spec quote above. */
+      if (constructor_type->is_float()) {
+         const glsl_type *desired_type =
+            glsl_type::get_instance(GLSL_TYPE_FLOAT,
+                                    ir->type->vector_elements,
+                                    ir->type->matrix_columns);
+         if (result->type->can_implicitly_convert_to(desired_type)) {
+            /* Even though convert_component() implements the constructor
+             * conversion rules (not the implicit conversion rules), it's safe
+             * to use it here because we already checked that the implicit
+             * conversion is legal.
+             */
+            result = convert_component(ir, desired_type);
+         }
+      }
+
+      if (constructor_type->is_matrix()) {
+         if (result->type != constructor_type->column_type()) {
+            _mesa_glsl_error(loc, state, "type error in matrix constructor: "
+                             "expected: %s, found %s",
+                             constructor_type->column_type()->name,
+                             result->type->name);
+            return ir_rvalue::error_value(ctx);
+         }
+      } else if (result->type != constructor_type->get_scalar_type()) {
+         _mesa_glsl_error(loc, state, "type error in vector constructor: "
+                          "expected: %s, found %s",
+                          constructor_type->get_scalar_type()->name,
+                          result->type->name);
+         return ir_rvalue::error_value(ctx);
+      }
+
+      /* Attempt to convert the parameter to a constant valued expression.
+       * After doing so, track whether or not all the parameters to the
+       * constructor are trivially constant valued expressions.
+       */
+      ir_rvalue *const constant = result->constant_expression_value();
+
+      if (constant != NULL)
+         result = constant;
+      else
+         all_parameters_are_constant = false;
+
+      ir->replace_with(result);
+   }
+
+   if (all_parameters_are_constant)
+      return new(ctx) ir_constant(constructor_type, &actual_parameters);
+
+   ir_variable *var = new(ctx) ir_variable(constructor_type, "vec_mat_ctor",
+                                           ir_var_temporary);
+   instructions->push_tail(var);
+
+   int i = 0;
+
+   foreach_list(node, &actual_parameters) {
+      ir_rvalue *rhs = (ir_rvalue *) node;
+      ir_instruction *assignment = NULL;
+
+      if (var->type->is_matrix()) {
+         ir_rvalue *lhs = new(ctx) ir_dereference_array(var,
+                                             new(ctx) ir_constant(i));
+         assignment = new(ctx) ir_assignment(lhs, rhs, NULL);
+      } else {
+         /* use writemask rather than index for vector */
+         assert(var->type->is_vector());
+         assert(i < 4);
+         ir_dereference *lhs = new(ctx) ir_dereference_variable(var);
+         assignment = new(ctx) ir_assignment(lhs, rhs, NULL, (unsigned)(1 << i));
+      }
+
+      instructions->push_tail(assignment);
+
+      i++;
+   }
+
+   return new(ctx) ir_dereference_variable(var);
+}
+
+
+static ir_rvalue *
+process_array_constructor(exec_list *instructions,
+			  const glsl_type *constructor_type,
+			  YYLTYPE *loc, exec_list *parameters,
+			  struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+   /* Array constructors come in two forms: sized and unsized.  Sized array
+    * constructors look like 'vec4[2](a, b)', where 'a' and 'b' are vec4
+    * variables.  In this case the number of parameters must exactly match the
+    * specified size of the array.
+    *
+    * Unsized array constructors look like 'vec4[](a, b)', where 'a' and 'b'
+    * are vec4 variables.  In this case the size of the array being constructed
+    * is determined by the number of parameters.
+    *
+    * From page 52 (page 58 of the PDF) of the GLSL 1.50 spec:
+    *
+    *    "There must be exactly the same number of arguments as the size of
+    *    the array being constructed. If no size is present in the
+    *    constructor, then the array is explicitly sized to the number of
+    *    arguments provided. The arguments are assigned in order, starting at
+    *    element 0, to the elements of the constructed array. Each argument
+    *    must be the same type as the element type of the array, or be a type
+    *    that can be converted to the element type of the array according to
+    *    Section 4.1.10 "Implicit Conversions.""
+    */
+   exec_list actual_parameters;
+   const unsigned parameter_count =
+      process_parameters(instructions, &actual_parameters, parameters, state);
+   bool is_unsized_array = constructor_type->is_unsized_array();
+
+   if ((parameter_count == 0) ||
+       (!is_unsized_array && (constructor_type->length != parameter_count))) {
+      const unsigned min_param = is_unsized_array
+         ? 1 : constructor_type->length;
+
+      _mesa_glsl_error(loc, state, "array constructor must have %s %u "
+		       "parameter%s",
+		       is_unsized_array ? "at least" : "exactly",
+		       min_param, (min_param <= 1) ? "" : "s");
+      return ir_rvalue::error_value(ctx);
+   }
+
+   if (is_unsized_array) {
+      constructor_type =
+	 glsl_type::get_array_instance(constructor_type->element_type(),
+				       parameter_count);
+      assert(constructor_type != NULL);
+      assert(constructor_type->length == parameter_count);
+   }
+
+   bool all_parameters_are_constant = true;
+
+   /* Type cast each parameter and, if possible, fold constants. */
+   foreach_list_safe(n, &actual_parameters) {
+      ir_rvalue *ir = (ir_rvalue *) n;
+      ir_rvalue *result = ir;
+
+      /* Apply implicit conversions (not the scalar constructor rules!). See
+       * the spec quote above. */
+      if (constructor_type->element_type()->is_float()) {
+	 const glsl_type *desired_type =
+	    glsl_type::get_instance(GLSL_TYPE_FLOAT,
+				    ir->type->vector_elements,
+				    ir->type->matrix_columns);
+	 if (result->type->can_implicitly_convert_to(desired_type)) {
+	    /* Even though convert_component() implements the constructor
+	     * conversion rules (not the implicit conversion rules), it's safe
+	     * to use it here because we already checked that the implicit
+	     * conversion is legal.
+	     */
+	    result = convert_component(ir, desired_type);
+	 }
+      }
+
+      if (result->type != constructor_type->element_type()) {
+	 _mesa_glsl_error(loc, state, "type error in array constructor: "
+			  "expected: %s, found %s",
+			  constructor_type->element_type()->name,
+			  result->type->name);
+         return ir_rvalue::error_value(ctx);
+      }
+
+      /* Attempt to convert the parameter to a constant valued expression.
+       * After doing so, track whether or not all the parameters to the
+       * constructor are trivially constant valued expressions.
+       */
+      ir_rvalue *const constant = result->constant_expression_value();
+
+      if (constant != NULL)
+         result = constant;
+      else
+         all_parameters_are_constant = false;
+
+      ir->replace_with(result);
+   }
+
+   if (all_parameters_are_constant)
+      return new(ctx) ir_constant(constructor_type, &actual_parameters);
+
+   ir_variable *var = new(ctx) ir_variable(constructor_type, "array_ctor",
+					   ir_var_temporary);
+   instructions->push_tail(var);
+
+   int i = 0;
+   foreach_list(node, &actual_parameters) {
+      ir_rvalue *rhs = (ir_rvalue *) node;
+      ir_rvalue *lhs = new(ctx) ir_dereference_array(var,
+						     new(ctx) ir_constant(i));
+
+      ir_instruction *assignment = new(ctx) ir_assignment(lhs, rhs, NULL);
+      instructions->push_tail(assignment);
+
+      i++;
+   }
+
+   return new(ctx) ir_dereference_variable(var);
+}
+
+
+/**
+ * Try to convert a record constructor to a constant expression
+ */
+static ir_constant *
+constant_record_constructor(const glsl_type *constructor_type,
+			    exec_list *parameters, void *mem_ctx)
+{
+   foreach_list(node, parameters) {
+      ir_constant *constant = ((ir_instruction *) node)->as_constant();
+      if (constant == NULL)
+	 return NULL;
+      node->replace_with(constant);
+   }
+
+   return new(mem_ctx) ir_constant(constructor_type, parameters);
+}
+
+
+/**
+ * Determine if a list consists of a single scalar r-value
+ */
+bool
+single_scalar_parameter(exec_list *parameters)
+{
+   const ir_rvalue *const p = (ir_rvalue *) parameters->head;
+   assert(((ir_rvalue *)p)->as_rvalue() != NULL);
+
+   return (p->type->is_scalar() && p->next->is_tail_sentinel());
+}
+
+
+/**
+ * Generate inline code for a vector constructor
+ *
+ * The generated constructor code will consist of a temporary variable
+ * declaration of the same type as the constructor.  A sequence of assignments
+ * from constructor parameters to the temporary will follow.
+ *
+ * \return
+ * An \c ir_dereference_variable of the temporary generated in the constructor
+ * body.
+ */
+ir_rvalue *
+emit_inline_vector_constructor(const glsl_type *type,
+			       exec_list *instructions,
+			       exec_list *parameters,
+			       void *ctx)
+{
+   assert(!parameters->is_empty());
+
+   ir_variable *var = new(ctx) ir_variable(type, "vec_ctor", ir_var_temporary);
+   instructions->push_tail(var);
+
+   /* There are two kinds of vector constructors.
+    *
+    *  - Construct a vector from a single scalar by replicating that scalar to
+    *    all components of the vector.
+    *
+    *  - Construct a vector from an arbitrary combination of vectors and
+    *    scalars.  The components of the constructor parameters are assigned
+    *    to the vector in order until the vector is full.
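+    *
+    * For example, vec3(1.0) replicates the scalar to all three
+    * components, while vec4(v.xy, 0.0, 1.0) fills components in order
+    * (an illustrative note).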
+    */
+   const unsigned lhs_components = type->components();
+   if (single_scalar_parameter(parameters)) {
+      ir_rvalue *first_param = (ir_rvalue *)parameters->head;
+      ir_rvalue *rhs = new(ctx) ir_swizzle(first_param, 0, 0, 0, 0,
+					   lhs_components);
+      ir_dereference_variable *lhs = new(ctx) ir_dereference_variable(var);
+      const unsigned mask = (1U << lhs_components) - 1;
+
+      assert(rhs->type == lhs->type);
+
+      ir_instruction *inst = new(ctx) ir_assignment(lhs, rhs, NULL, mask);
+      instructions->push_tail(inst);
+   } else {
+      unsigned base_component = 0;
+      unsigned base_lhs_component = 0;
+      ir_constant_data data;
+      unsigned constant_mask = 0, constant_components = 0;
+
+      memset(&data, 0, sizeof(data));
+
+      foreach_list(node, parameters) {
+	 ir_rvalue *param = (ir_rvalue *) node;
+	 unsigned rhs_components = param->type->components();
+
+	 /* Do not try to assign more components to the vector than it has!
+	  */
+	 if ((rhs_components + base_lhs_component) > lhs_components) {
+	    rhs_components = lhs_components - base_lhs_component;
+	 }
+
+	 const ir_constant *const c = param->as_constant();
+	 if (c != NULL) {
+	    for (unsigned i = 0; i < rhs_components; i++) {
+	       switch (c->type->base_type) {
+	       case GLSL_TYPE_UINT:
+		  data.u[i + base_component] = c->get_uint_component(i);
+		  break;
+	       case GLSL_TYPE_INT:
+		  data.i[i + base_component] = c->get_int_component(i);
+		  break;
+	       case GLSL_TYPE_FLOAT:
+		  data.f[i + base_component] = c->get_float_component(i);
+		  break;
+	       case GLSL_TYPE_BOOL:
+		  data.b[i + base_component] = c->get_bool_component(i);
+		  break;
+	       default:
+		  assert(!"Should not get here.");
+		  break;
+	       }
+	    }
+
+	    /* Mask of fields to be written in the assignment.
+	     */
+	    constant_mask |= ((1U << rhs_components) - 1) << base_lhs_component;
+	    constant_components += rhs_components;
+
+	    base_component += rhs_components;
+	 }
+	 /* Advance the component index by the number of components
+	  * that were just assigned.
+	  */
+	 base_lhs_component += rhs_components;
+      }
+
+      if (constant_mask != 0) {
+	 ir_dereference *lhs = new(ctx) ir_dereference_variable(var);
+	 const glsl_type *rhs_type = glsl_type::get_instance(var->type->base_type,
+							     constant_components,
+							     1);
+	 ir_rvalue *rhs = new(ctx) ir_constant(rhs_type, &data);
+
+	 ir_instruction *inst =
+	    new(ctx) ir_assignment(lhs, rhs, NULL, constant_mask);
+	 instructions->push_tail(inst);
+      }
+
+      base_component = 0;
+      foreach_list(node, parameters) {
+	 ir_rvalue *param = (ir_rvalue *) node;
+	 unsigned rhs_components = param->type->components();
+
+	 /* Do not try to assign more components to the vector than it has!
+	  */
+	 if ((rhs_components + base_component) > lhs_components) {
+	    rhs_components = lhs_components - base_component;
+	 }
+
+	 const ir_constant *const c = param->as_constant();
+	 if (c == NULL) {
+	    /* Mask of fields to be written in the assignment.
+	     */
+	    const unsigned write_mask = ((1U << rhs_components) - 1)
+	       << base_component;
+
+	    ir_dereference *lhs = new(ctx) ir_dereference_variable(var);
+
+	    /* Generate a swizzle so that LHS and RHS sizes match.
+	     */
+	    ir_rvalue *rhs =
+	       new(ctx) ir_swizzle(param, 0, 1, 2, 3, rhs_components);
+
+	    ir_instruction *inst =
+	       new(ctx) ir_assignment(lhs, rhs, NULL, write_mask);
+	    instructions->push_tail(inst);
+	 }
+
+	 /* Advance the component index by the number of components that were
+	  * just assigned.
+	  */
+	 base_component += rhs_components;
+      }
+   }
+   return new(ctx) ir_dereference_variable(var);
+}
+
+
+/**
+ * Generate assignment of a portion of a vector to a portion of a matrix column
+ *
+ * \param src_base  First component of the source to be used in assignment
+ * \param column    Column of destination to be assigned
+ * \param row_base  First component of the destination column to be assigned
+ * \param count     Number of components to be assigned
+ *
+ * \note
+ * \c src_base + \c count must be less than or equal to the number of components
+ * in the source vector.
+ */
+ir_instruction *
+assign_to_matrix_column(ir_variable *var, unsigned column, unsigned row_base,
+			ir_rvalue *src, unsigned src_base, unsigned count,
+			void *mem_ctx)
+{
+   ir_constant *col_idx = new(mem_ctx) ir_constant(column);
+   ir_dereference *column_ref = new(mem_ctx) ir_dereference_array(var, col_idx);
+
+   assert(column_ref->type->components() >= (row_base + count));
+   assert(src->type->components() >= (src_base + count));
+
+   /* Generate a swizzle that extracts the number of components from the source
+    * that are to be assigned to the column of the matrix.
+    */
+   if (count < src->type->vector_elements) {
+      src = new(mem_ctx) ir_swizzle(src,
+				    src_base + 0, src_base + 1,
+				    src_base + 2, src_base + 3,
+				    count);
+   }
+
+   /* Mask of fields to be written in the assignment.
+    */
+   const unsigned write_mask = ((1U << count) - 1) << row_base;
+
+   return new(mem_ctx) ir_assignment(column_ref, src, NULL, write_mask);
+}
+
+
+/**
+ * Generate inline code for a matrix constructor
+ *
+ * The generated constructor code will consist of a temporary variable
+ * declaration of the same type as the constructor.  A sequence of assignments
+ * from constructor parameters to the temporary will follow.
+ *
+ * \return
+ * An \c ir_dereference_variable of the temporary generated in the constructor
+ * body.
+ */
+ir_rvalue *
+emit_inline_matrix_constructor(const glsl_type *type,
+			       exec_list *instructions,
+			       exec_list *parameters,
+			       void *ctx)
+{
+   assert(!parameters->is_empty());
+
+   ir_variable *var = new(ctx) ir_variable(type, "mat_ctor", ir_var_temporary);
+   instructions->push_tail(var);
+
+   /* There are three kinds of matrix constructors.
+    *
+    *  - Construct a matrix from a single scalar by replicating that scalar
+    *    along the diagonal of the matrix and setting all other components to
+    *    zero.
+    *
+    *  - Construct a matrix from an arbitrary combination of vectors and
+    *    scalars.  The components of the constructor parameters are assigned
+    *    to the matrix in column-major order until the matrix is full.
+    *
+    *  - Construct a matrix from a single matrix.  The source matrix is copied
+    *    to the upper left portion of the constructed matrix, and the remaining
+    *    elements take values from the identity matrix.
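+    *
+    * For example (illustrative): mat3(2.0) sets the diagonal to 2.0 and
+    * everything else to zero; mat2(a, b, c, d) fills column-major; and
+    * mat4(m) for a mat2 'm' copies 'm' into the upper left and completes
+    * the rest from the identity matrix.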
+    */
+   ir_rvalue *const first_param = (ir_rvalue *) parameters->head;
+   if (single_scalar_parameter(parameters)) {
+      /* Assign the scalar to the X component of a vec4, and fill the remaining
+       * components with zero.
+       */
+      ir_variable *rhs_var =
+	 new(ctx) ir_variable(glsl_type::vec4_type, "mat_ctor_vec",
+			      ir_var_temporary);
+      instructions->push_tail(rhs_var);
+
+      ir_constant_data zero;
+      zero.f[0] = 0.0;
+      zero.f[1] = 0.0;
+      zero.f[2] = 0.0;
+      zero.f[3] = 0.0;
+
+      ir_instruction *inst =
+	 new(ctx) ir_assignment(new(ctx) ir_dereference_variable(rhs_var),
+				new(ctx) ir_constant(rhs_var->type, &zero),
+				NULL);
+      instructions->push_tail(inst);
+
+      ir_dereference *const rhs_ref = new(ctx) ir_dereference_variable(rhs_var);
+
+      inst = new(ctx) ir_assignment(rhs_ref, first_param, NULL, 0x01);
+      instructions->push_tail(inst);
+
+      /* Assign the temporary vector to each column of the destination matrix
+       * with a swizzle that puts the X component on the diagonal of the
+       * matrix.  In some cases this may mean that the X component does not
+       * get assigned into the column at all (i.e., when the matrix has more
+       * columns than rows).
+       */
+      static const unsigned rhs_swiz[4][4] = {
+	 { 0, 1, 1, 1 },
+	 { 1, 0, 1, 1 },
+	 { 1, 1, 0, 1 },
+	 { 1, 1, 1, 0 }
+      };
+
+      const unsigned cols_to_init = MIN2(type->matrix_columns,
+					 type->vector_elements);
+      for (unsigned i = 0; i < cols_to_init; i++) {
+	 ir_constant *const col_idx = new(ctx) ir_constant(i);
+	 ir_rvalue *const col_ref = new(ctx) ir_dereference_array(var, col_idx);
+
+	 ir_rvalue *const rhs_ref = new(ctx) ir_dereference_variable(rhs_var);
+	 ir_rvalue *const rhs = new(ctx) ir_swizzle(rhs_ref, rhs_swiz[i],
+						    type->vector_elements);
+
+	 inst = new(ctx) ir_assignment(col_ref, rhs, NULL);
+	 instructions->push_tail(inst);
+      }
+
+      for (unsigned i = cols_to_init; i < type->matrix_columns; i++) {
+	 ir_constant *const col_idx = new(ctx) ir_constant(i);
+	 ir_rvalue *const col_ref = new(ctx) ir_dereference_array(var, col_idx);
+
+	 ir_rvalue *const rhs_ref = new(ctx) ir_dereference_variable(rhs_var);
+	 ir_rvalue *const rhs = new(ctx) ir_swizzle(rhs_ref, 1, 1, 1, 1,
+						    type->vector_elements);
+
+	 inst = new(ctx) ir_assignment(col_ref, rhs, NULL);
+	 instructions->push_tail(inst);
+      }
+   } else if (first_param->type->is_matrix()) {
+      /* From page 50 (56 of the PDF) of the GLSL 1.50 spec:
+       *
+       *     "If a matrix is constructed from a matrix, then each component
+       *     (column i, row j) in the result that has a corresponding
+       *     component (column i, row j) in the argument will be initialized
+       *     from there. All other components will be initialized to the
+       *     identity matrix. If a matrix argument is given to a matrix
+       *     constructor, it is an error to have any other arguments."
+       */
+      assert(first_param->next->is_tail_sentinel());
+      ir_rvalue *const src_matrix = first_param;
+
+      /* If the source matrix is smaller, pre-initialize the relevant parts of
+       * the destination matrix to the identity matrix.
+       */
+      if ((src_matrix->type->matrix_columns < var->type->matrix_columns)
+	  || (src_matrix->type->vector_elements < var->type->vector_elements)) {
+
+	 /* If the source matrix has fewer rows, every column of the destination
+	  * must be initialized.  Otherwise only the columns in the destination
+	  * that do not exist in the source must be initialized.
+	  */
+	 unsigned col =
+	    (src_matrix->type->vector_elements < var->type->vector_elements)
+	    ? 0 : src_matrix->type->matrix_columns;
+
+	 const glsl_type *const col_type = var->type->column_type();
+	 for (/* empty */; col < var->type->matrix_columns; col++) {
+	    ir_constant_data ident;
+
+	    ident.f[0] = 0.0;
+	    ident.f[1] = 0.0;
+	    ident.f[2] = 0.0;
+	    ident.f[3] = 0.0;
+
+	    ident.f[col] = 1.0;
+
+	    ir_rvalue *const rhs = new(ctx) ir_constant(col_type, &ident);
+
+	    ir_rvalue *const lhs =
+	       new(ctx) ir_dereference_array(var, new(ctx) ir_constant(col));
+
+	    ir_instruction *inst = new(ctx) ir_assignment(lhs, rhs, NULL);
+	    instructions->push_tail(inst);
+	 }
+      }
+
+      /* Assign columns from the source matrix to the destination matrix.
+       *
+       * Since the parameter will be used in the RHS of multiple assignments,
+       * generate a temporary and copy the parameter there.
+       */
+      ir_variable *const rhs_var =
+	 new(ctx) ir_variable(first_param->type, "mat_ctor_mat",
+			      ir_var_temporary);
+      instructions->push_tail(rhs_var);
+
+      ir_dereference *const rhs_var_ref =
+	 new(ctx) ir_dereference_variable(rhs_var);
+      ir_instruction *const inst =
+	 new(ctx) ir_assignment(rhs_var_ref, first_param, NULL);
+      instructions->push_tail(inst);
+
+      const unsigned last_row = MIN2(src_matrix->type->vector_elements,
+				     var->type->vector_elements);
+      const unsigned last_col = MIN2(src_matrix->type->matrix_columns,
+				     var->type->matrix_columns);
+
+      unsigned swiz[4] = { 0, 0, 0, 0 };
+      for (unsigned i = 1; i < last_row; i++)
+	 swiz[i] = i;
+
+      const unsigned write_mask = (1U << last_row) - 1;
+
+      for (unsigned i = 0; i < last_col; i++) {
+	 ir_dereference *const lhs =
+	    new(ctx) ir_dereference_array(var, new(ctx) ir_constant(i));
+	 ir_rvalue *const rhs_col =
+	    new(ctx) ir_dereference_array(rhs_var, new(ctx) ir_constant(i));
+
+	 /* If one matrix has columns that are smaller than the columns of the
+	  * other matrix, wrap the column access of the larger with a swizzle
+	  * so that the LHS and RHS of the assignment have the same size (and
+	  * therefore have the same type).
+	  *
+	  * It would be perfectly valid to unconditionally generate the
+	  * swizzles, but generating them only when needed typically results
+	  * in a more compact IR tree.
+	  */
+	 ir_rvalue *rhs;
+	 if (lhs->type->vector_elements != rhs_col->type->vector_elements) {
+	    rhs = new(ctx) ir_swizzle(rhs_col, swiz, last_row);
+	 } else {
+	    rhs = rhs_col;
+	 }
+
+	 ir_instruction *inst =
+	    new(ctx) ir_assignment(lhs, rhs, NULL, write_mask);
+	 instructions->push_tail(inst);
+      }
+   } else {
+      const unsigned cols = type->matrix_columns;
+      const unsigned rows = type->vector_elements;
+      unsigned col_idx = 0;
+      unsigned row_idx = 0;
+
+      foreach_list (node, parameters) {
+	 ir_rvalue *const rhs = (ir_rvalue *) node;
+	 const unsigned components_remaining_this_column = rows - row_idx;
+	 unsigned rhs_components = rhs->type->components();
+	 unsigned rhs_base = 0;
+
+	 /* Since the parameter might be used in the RHS of two assignments,
+	  * generate a temporary and copy the parameter there.
+	  */
+	 ir_variable *rhs_var =
+	    new(ctx) ir_variable(rhs->type, "mat_ctor_vec", ir_var_temporary);
+	 instructions->push_tail(rhs_var);
+
+	 ir_dereference *rhs_var_ref =
+	    new(ctx) ir_dereference_variable(rhs_var);
+	 ir_instruction *inst = new(ctx) ir_assignment(rhs_var_ref, rhs, NULL);
+	 instructions->push_tail(inst);
+
+	 /* Assign the current parameter to as many components of the matrix
+	  * as it will fill.
+	  *
+	  * NOTE: A single vector parameter can span two matrix columns.  A
+	  * single vec4, for example, can completely fill a mat2.
+	  */
+	 if (rhs_components >= components_remaining_this_column) {
+	    const unsigned count = MIN2(rhs_components,
+					components_remaining_this_column);
+
+	    rhs_var_ref = new(ctx) ir_dereference_variable(rhs_var);
+
+	    ir_instruction *inst = assign_to_matrix_column(var, col_idx,
+							   row_idx,
+							   rhs_var_ref, 0,
+							   count, ctx);
+	    instructions->push_tail(inst);
+
+	    rhs_base = count;
+
+	    col_idx++;
+	    row_idx = 0;
+	 }
+
+	 /* If there is data left in the parameter and components left to be
+	  * set in the destination, emit another assignment.  It is possible
+	  * that the assignment could be of a vec4 to the last element of the
+	  * matrix.  In this case col_idx==cols, but there is still data
+	  * left in the source parameter.  Obviously, don't emit an assignment
+	  * to data outside the destination matrix.
+	  */
+	 if ((col_idx < cols) && (rhs_base < rhs_components)) {
+	    const unsigned count = rhs_components - rhs_base;
+
+	    rhs_var_ref = new(ctx) ir_dereference_variable(rhs_var);
+
+	    ir_instruction *inst = assign_to_matrix_column(var, col_idx,
+							   row_idx,
+							   rhs_var_ref,
+							   rhs_base,
+							   count, ctx);
+	    instructions->push_tail(inst);
+
+	    row_idx += count;
+	 }
+      }
+   }
+
+   return new(ctx) ir_dereference_variable(var);
+}
+
+
+ir_rvalue *
+emit_inline_record_constructor(const glsl_type *type,
+			       exec_list *instructions,
+			       exec_list *parameters,
+			       void *mem_ctx)
+{
+   ir_variable *const var =
+      new(mem_ctx) ir_variable(type, "record_ctor", ir_var_temporary);
+   ir_dereference_variable *const d = new(mem_ctx) ir_dereference_variable(var);
+
+   instructions->push_tail(var);
+
+   exec_node *node = parameters->head;
+   for (unsigned i = 0; i < type->length; i++) {
+      assert(!node->is_tail_sentinel());
+
+      ir_dereference *const lhs =
+	 new(mem_ctx) ir_dereference_record(d->clone(mem_ctx, NULL),
+					    type->fields.structure[i].name);
+
+      ir_rvalue *const rhs = ((ir_instruction *) node)->as_rvalue();
+      assert(rhs != NULL);
+
+      ir_instruction *const assign = new(mem_ctx) ir_assignment(lhs, rhs, NULL);
+
+      instructions->push_tail(assign);
+      node = node->next;
+   }
+
+   return d;
+}
+
+
+static ir_rvalue *
+process_record_constructor(exec_list *instructions,
+                           const glsl_type *constructor_type,
+                           YYLTYPE *loc, exec_list *parameters,
+                           struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+   exec_list actual_parameters;
+
+   process_parameters(instructions, &actual_parameters,
+                      parameters, state);
+
+   exec_node *node = actual_parameters.head;
+   for (unsigned i = 0; i < constructor_type->length; i++) {
+      ir_rvalue *ir = (ir_rvalue *) node;
+
+      if (node->is_tail_sentinel()) {
+         _mesa_glsl_error(loc, state,
+                          "insufficient parameters to constructor for `%s'",
+                          constructor_type->name);
+         return ir_rvalue::error_value(ctx);
+      }
+
+      if (apply_implicit_conversion(constructor_type->fields.structure[i].type,
+                                    ir, state)) {
+         node->replace_with(ir);
+      } else {
+         _mesa_glsl_error(loc, state,
+                          "parameter type mismatch in constructor for `%s.%s' "
+                          "(%s vs %s)",
+                          constructor_type->name,
+                          constructor_type->fields.structure[i].name,
+                          ir->type->name,
+                          constructor_type->fields.structure[i].type->name);
+         return ir_rvalue::error_value(ctx);
+      }
+
+      node = node->next;
+   }
+
+   if (!node->is_tail_sentinel()) {
+      _mesa_glsl_error(loc, state, "too many parameters in constructor "
+                                    "for `%s'", constructor_type->name);
+      return ir_rvalue::error_value(ctx);
+   }
+
+   ir_rvalue *const constant =
+      constant_record_constructor(constructor_type, &actual_parameters,
+                                  state);
+
+   return (constant != NULL)
+            ? constant
+            : emit_inline_record_constructor(constructor_type, instructions,
+                                             &actual_parameters, state);
+}
+
+
+ir_rvalue *
+ast_function_expression::hir(exec_list *instructions,
+			     struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+   /* There are three sorts of function calls.
+    *
+    * 1. constructors - The first subexpression is an ast_type_specifier.
+    * 2. methods - Only the .length() method of array types.
+    * 3. functions - Calls to regular old functions.
+    *
+    * Method calls are actually detected when the ast_field_selection
+    * expression is handled.
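+    *
+    * For example, 'vec2(x)' is a constructor, 'a.length()' is a method
+    * call, and 'max(a, b)' is an ordinary function call (illustrative).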
+    */
+   if (is_constructor()) {
+      const ast_type_specifier *type = (ast_type_specifier *) subexpressions[0];
+      YYLTYPE loc = type->get_location();
+      const char *name;
+
+      const glsl_type *const constructor_type = type->glsl_type(& name, state);
+
+      /* constructor_type can be NULL if a variable with the same name as the
+       * structure has come into scope.
+       */
+      if (constructor_type == NULL) {
+	 _mesa_glsl_error(& loc, state, "unknown type `%s' (structure name "
+			  "may be shadowed by a variable with the same name)",
+			  type->type_name);
+	 return ir_rvalue::error_value(ctx);
+      }
+
+
+      /* Constructors for samplers are illegal.
+       */
+      if (constructor_type->is_sampler()) {
+	 _mesa_glsl_error(& loc, state, "cannot construct sampler type `%s'",
+			  constructor_type->name);
+	 return ir_rvalue::error_value(ctx);
+      }
+
+      if (constructor_type->is_array()) {
+         if (!state->check_version(120, 300, &loc,
+                                   "array constructors forbidden")) {
+	    return ir_rvalue::error_value(ctx);
+	 }
+
+	 return process_array_constructor(instructions, constructor_type,
+					  & loc, &this->expressions, state);
+      }
+
+
+      /* There are two kinds of constructor calls.  Constructors for arrays and
+       * structures must have the exact number of arguments with matching types
+       * in the correct order.  These constructors follow essentially the same
+       * type matching rules as functions.
+       *
+       * Constructors for built-in language types, such as mat4 and vec2, are
+       * free form.  The only requirements are that the parameters must provide
+       * enough values of the correct scalar type and that no arguments are
+       * given past the last used argument.
+       *
+       * When using the C-style initializer syntax from GLSL 4.20, constructors
+       * must have the exact number of arguments with matching types in the
+       * correct order.
+       */
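+      /* For illustration (GLSL): `vec4(1.0, vec2(2.0), 3.0)' is a valid
+       * free-form constructor, since any mix of scalars and vectors may
+       * supply the four components, whereas a structure constructor must
+       * match its fields exactly, the way a function call would.
+       */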
+      if (constructor_type->is_record()) {
+         return process_record_constructor(instructions, constructor_type,
+                                           &loc, &this->expressions,
+                                           state);
+      }
+
+      if (!constructor_type->is_numeric() && !constructor_type->is_boolean())
+	 return ir_rvalue::error_value(ctx);
+
+      /* Total number of components of the type being constructed. */
+      const unsigned type_components = constructor_type->components();
+
+      /* Number of components from parameters that have actually been
+       * consumed.  This is used to perform several kinds of error checking.
+       */
+      unsigned components_used = 0;
+
+      unsigned matrix_parameters = 0;
+      unsigned nonmatrix_parameters = 0;
+      exec_list actual_parameters;
+
+      foreach_list (n, &this->expressions) {
+	 ast_node *ast = exec_node_data(ast_node, n, link);
+	 ir_rvalue *result = ast->hir(instructions, state)->as_rvalue();
+
+	 /* From page 50 (page 56 of the PDF) of the GLSL 1.50 spec:
+	  *
+	  *    "It is an error to provide extra arguments beyond this
+	  *    last used argument."
+	  */
+	 if (components_used >= type_components) {
+	    _mesa_glsl_error(& loc, state, "too many parameters to `%s' "
+			     "constructor",
+			     constructor_type->name);
+	    return ir_rvalue::error_value(ctx);
+	 }
+
+	 if (!result->type->is_numeric() && !result->type->is_boolean()) {
+	    _mesa_glsl_error(& loc, state, "cannot construct `%s' from a "
+			     "non-numeric data type",
+			     constructor_type->name);
+	    return ir_rvalue::error_value(ctx);
+	 }
+
+	 /* Count the number of matrix and nonmatrix parameters.  This
+	  * is used below to enforce some of the constructor rules.
+	  */
+	 if (result->type->is_matrix())
+	    matrix_parameters++;
+	 else
+	    nonmatrix_parameters++;
+
+	 actual_parameters.push_tail(result);
+	 components_used += result->type->components();
+      }
+
+      /* From page 28 (page 34 of the PDF) of the GLSL 1.10 spec:
+       *
+       *    "It is an error to construct matrices from other matrices. This
+       *    is reserved for future use."
+       */
+      if (matrix_parameters > 0
+          && constructor_type->is_matrix()
+          && !state->check_version(120, 100, &loc,
+                                   "cannot construct `%s' from a matrix",
+                                   constructor_type->name)) {
+	 return ir_rvalue::error_value(ctx);
+      }
+
+      /* From page 50 (page 56 of the PDF) of the GLSL 1.50 spec:
+       *
+       *    "If a matrix argument is given to a matrix constructor, it is
+       *    an error to have any other arguments."
+       */
+      if ((matrix_parameters > 0)
+	  && ((matrix_parameters + nonmatrix_parameters) > 1)
+	  && constructor_type->is_matrix()) {
+	 _mesa_glsl_error(& loc, state, "for matrix `%s' constructor, "
+			  "matrix must be only parameter",
+			  constructor_type->name);
+	 return ir_rvalue::error_value(ctx);
+      }
+
+      /* From page 28 (page 34 of the PDF) of the GLSL 1.10 spec:
+       *
+       *    "In these cases, there must be enough components provided in the
+       *    arguments to provide an initializer for every component in the
+       *    constructed value."
+       */
+      if (components_used < type_components && components_used != 1
+	  && matrix_parameters == 0) {
+	 _mesa_glsl_error(& loc, state, "too few components to construct "
+			  "`%s'",
+			  constructor_type->name);
+	 return ir_rvalue::error_value(ctx);
+      }
+
+      /* Later, we cast each parameter to the same base type as the
+       * constructor.  Since there are no non-floating point matrices, we
+       * need to break them up into a series of column vectors.
+       */
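+      /* Sketch of the transformation: for `ivec4(m)' with `mat2 m', the
+       * single matrix argument is replaced by matrix_tmp[0] and
+       * matrix_tmp[1], two vec2 columns that the conversion loop below can
+       * then cast to ivec2.
+       */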
+      if (constructor_type->base_type != GLSL_TYPE_FLOAT) {
+	 foreach_list_safe(n, &actual_parameters) {
+	    ir_rvalue *matrix = (ir_rvalue *) n;
+
+	    if (!matrix->type->is_matrix())
+	       continue;
+
+	    /* Create a temporary containing the matrix. */
+	    ir_variable *var = new(ctx) ir_variable(matrix->type, "matrix_tmp",
+						    ir_var_temporary);
+	    instructions->push_tail(var);
+	    instructions->push_tail(new(ctx) ir_assignment(new(ctx)
+	       ir_dereference_variable(var), matrix, NULL));
+	    var->constant_value = matrix->constant_expression_value();
+
+	    /* Replace the matrix with dereferences of its columns. */
+	    for (int i = 0; i < matrix->type->matrix_columns; i++) {
+	       matrix->insert_before(new (ctx) ir_dereference_array(var,
+		  new(ctx) ir_constant(i)));
+	    }
+	    matrix->remove();
+	 }
+      }
+
+      bool all_parameters_are_constant = true;
+
+      /* Type cast each parameter and, if possible, fold constants. */
+      foreach_list_safe(n, &actual_parameters) {
+	 ir_rvalue *ir = (ir_rvalue *) n;
+
+	 const glsl_type *desired_type =
+	    glsl_type::get_instance(constructor_type->base_type,
+				    ir->type->vector_elements,
+				    ir->type->matrix_columns);
+	 ir_rvalue *result = convert_component(ir, desired_type);
+
+	 /* Attempt to convert the parameter to a constant valued expression.
+	  * After doing so, track whether or not all the parameters to the
+	  * constructor are trivially constant valued expressions.
+	  */
+	 ir_rvalue *const constant = result->constant_expression_value();
+
+	 if (constant != NULL)
+	    result = constant;
+	 else
+	    all_parameters_are_constant = false;
+
+	 if (result != ir) {
+	    ir->replace_with(result);
+	 }
+      }
+
+      /* If all of the parameters are trivially constant, create a
+       * constant representing the complete collection of parameters.
+       */
+      if (all_parameters_are_constant) {
+	 return new(ctx) ir_constant(constructor_type, &actual_parameters);
+      } else if (constructor_type->is_scalar()) {
+	 return dereference_component((ir_rvalue *) actual_parameters.head,
+				      0);
+      } else if (constructor_type->is_vector()) {
+	 return emit_inline_vector_constructor(constructor_type,
+					       instructions,
+					       &actual_parameters,
+					       ctx);
+      } else {
+	 assert(constructor_type->is_matrix());
+	 return emit_inline_matrix_constructor(constructor_type,
+					       instructions,
+					       &actual_parameters,
+					       ctx);
+      }
+   } else {
+      const ast_expression *id = subexpressions[0];
+      const char *func_name = id->primary_expression.identifier;
+      YYLTYPE loc = get_location();
+      exec_list actual_parameters;
+
+      process_parameters(instructions, &actual_parameters, &this->expressions,
+			 state);
+
+      ir_function_signature *sig =
+	 match_function_by_name(func_name, &actual_parameters, state);
+
+      ir_rvalue *value = NULL;
+      if (sig == NULL) {
+	 no_matching_function_error(func_name, &loc, &actual_parameters, state);
+	 value = ir_rvalue::error_value(ctx);
+      } else if (!verify_parameter_modes(state, sig, actual_parameters, this->expressions)) {
+	 /* an error has already been emitted */
+	 value = ir_rvalue::error_value(ctx);
+      } else {
+	 value = generate_call(instructions, sig, &actual_parameters, state);
+      }
+
+      return value;
+   }
+
+   return ir_rvalue::error_value(ctx);
+}
+
+ir_rvalue *
+ast_aggregate_initializer::hir(exec_list *instructions,
+                               struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+   YYLTYPE loc = this->get_location();
+
+   if (!this->constructor_type) {
+      _mesa_glsl_error(&loc, state, "type of C-style initializer unknown");
+      return ir_rvalue::error_value(ctx);
+   }
+   const glsl_type *const constructor_type = this->constructor_type;
+
+   if (!state->ARB_shading_language_420pack_enable) {
+      _mesa_glsl_error(&loc, state, "C-style initialization requires the "
+                       "GL_ARB_shading_language_420pack extension");
+      return ir_rvalue::error_value(ctx);
+   }
+
+   if (constructor_type->is_array()) {
+      return process_array_constructor(instructions, constructor_type, &loc,
+                                       &this->expressions, state);
+   }
+
+   if (constructor_type->is_record()) {
+      return process_record_constructor(instructions, constructor_type, &loc,
+                                        &this->expressions, state);
+   }
+
+   return process_vec_mat_constructor(instructions, constructor_type, &loc,
+                                      &this->expressions, state);
+}
diff --git a/icd/intel/compiler/shader/ast_to_hir.cpp b/icd/intel/compiler/shader/ast_to_hir.cpp
new file mode 100644
index 0000000..2560cf3
--- /dev/null
+++ b/icd/intel/compiler/shader/ast_to_hir.cpp
@@ -0,0 +1,5731 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ast_to_hir.c
+ * Convert abstract syntax to high-level intermediate representation (HIR).
+ *
+ * During the conversion to HIR, the majority of the semantic checking is
+ * performed on the program.  This includes:
+ *
+ *    * Symbol table management
+ *    * Type checking
+ *    * Function binding
+ *
+ * The majority of this work could be done during parsing, and the parser could
+ * probably generate HIR directly.  However, this results in frequent changes
+ * to the parser code.  Since we do not assume that every system this compiler
+ * is built on will have Flex and Bison installed, we have to store the code
+ * generated by these tools in our version control system.  In other parts of
+ * the system we've seen problems where a parser was changed but the generated
+ * code was not committed, merge conflicts were created because two developers
+ * had slightly different versions of Bison installed, etc.
+ *
+ * I have also noticed that running Bison generated parsers in GDB is very
+ * irritating.  When you get a segfault on '$$ = $1->foo', you can't very
+ * well 'print $1' in GDB.
+ *
+ * As a result, my preference is to put as little C code as possible in the
+ * parser (and lexer) sources.
+ */
+
+#include "libfns.h" // LunarG ADD:
+#include "icd-utils.h" // LunarG ADD:
+#include "glsl_symbol_table.h"
+#include "glsl_parser_extras.h"
+#include "ast.h"
+#include "glsl_types.h"
+#include "program/hash_table.h"
+#include "ir.h"
+#include "ir_builder.h"
+
+using namespace ir_builder;
+
+static void
+detect_conflicting_assignments(struct _mesa_glsl_parse_state *state,
+			       exec_list *instructions);
+static void
+remove_per_vertex_blocks(exec_list *instructions,
+                         _mesa_glsl_parse_state *state, ir_variable_mode mode);
+
+
+void
+_mesa_ast_to_hir(exec_list *instructions, struct _mesa_glsl_parse_state *state)
+{
+   _mesa_glsl_initialize_variables(instructions, state);
+
+   state->symbols->separate_function_namespace = state->language_version == 110;
+
+   state->current_function = NULL;
+
+   state->toplevel_ir = instructions;
+
+   state->gs_input_prim_type_specified = false;
+   state->cs_input_local_size_specified = false;
+
+   /* Section 4.2 of the GLSL 1.20 specification states:
+    * "The built-in functions are scoped in a scope outside the global scope
+    *  users declare global variables in.  That is, a shader's global scope,
+    *  available for user-defined functions and global variables, is nested
+    *  inside the scope containing the built-in functions."
+    *
+    * Since built-in functions like ftransform() access built-in variables,
+    * it follows that those must be in the outer scope as well.
+    *
+    * We push scope here to create this nesting effect...but don't pop.
+    * This way, a shader's globals are still in the symbol table for use
+    * by the linker.
+    */
+   state->symbols->push_scope();
+
+   foreach_list_typed (ast_node, ast, link, & state->translation_unit)
+      ast->hir(instructions, state);
+
+   detect_recursion_unlinked(state, instructions);
+   detect_conflicting_assignments(state, instructions);
+
+   state->toplevel_ir = NULL;
+
+   /* Move all of the variable declarations to the front of the IR list, and
+    * reverse the order.  This has the (intended!) side effect that vertex
+    * shader inputs and fragment shader outputs will appear in the IR in the
+    * same order that they appeared in the shader code.  This results in the
+    * locations being assigned in the declared order.  Many (arguably buggy)
+    * applications depend on this behavior, and it matches what nearly all
+    * other drivers do.
+    */
+   foreach_list_safe(node, instructions) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var == NULL)
+         continue;
+
+      var->remove();
+      instructions->push_head(var);
+   }
+
+   /* Figure out if gl_FragCoord is actually used in fragment shader */
+   ir_variable *const var = state->symbols->get_variable("gl_FragCoord");
+   if (var != NULL)
+      state->fs_uses_gl_fragcoord = var->data.used;
+
+   /* From section 7.1 (Built-In Language Variables) of the GLSL 4.10 spec:
+    *
+    *     If multiple shaders using members of a built-in block belonging to
+    *     the same interface are linked together in the same program, they
+    *     must all redeclare the built-in block in the same way, as described
+    *     in section 4.3.7 "Interface Blocks" for interface block matching, or
+    *     a link error will result.
+    *
+    * The phrase "using members of a built-in block" implies that if two
+    * shaders are linked together and one of them *does not use* any members
+    * of the built-in block, then that shader does not need to have a matching
+    * redeclaration of the built-in block.
+    *
+    * This appears to be a clarification to the behavior established for
+    * gl_PerVertex by GLSL 1.50, therefore implement it regardless of GLSL
+    * version.
+    *
+    * The definition of "interface" in section 4.3.7 that applies here is as
+    * follows:
+    *
+    *     The boundary between adjacent programmable pipeline stages: This
+    *     spans all the outputs in all compilation units of the first stage
+    *     and all the inputs in all compilation units of the second stage.
+    *
+    * Therefore this rule applies to both inter- and intra-stage linking.
+    *
+    * The easiest way to implement this is to check whether the shader uses
+    * gl_PerVertex right after ast-to-ir conversion, and if it doesn't, simply
+    * remove all the relevant variable declarations from the IR, so that the
+    * linker won't see them and complain about mismatches.
+    */
+   remove_per_vertex_blocks(instructions, state, ir_var_shader_in);
+   remove_per_vertex_blocks(instructions, state, ir_var_shader_out);
+}
+
+
+/**
+ * If a conversion is available, convert one operand to a different type
+ *
+ * The \c from \c ir_rvalue is converted "in place".
+ *
+ * \param to     Type that the operand is to be converted to
+ * \param from   Operand that is being converted
+ * \param state  GLSL compiler state
+ *
+ * \return
+ * If a conversion is possible (or unnecessary), \c true is returned.
+ * Otherwise \c false is returned.
+ */
+bool
+apply_implicit_conversion(const glsl_type *to, ir_rvalue * &from,
+                          struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+   if (to->base_type == from->type->base_type)
+      return true;
+
+   /* This conversion was added in GLSL 1.20.  If the compilation mode is
+    * GLSL 1.10, the conversion is skipped.
+    */
+   if (!state->is_version(120, 0))
+      return false;
+
+   /* From page 27 (page 33 of the PDF) of the GLSL 1.50 spec:
+    *
+    *    "There are no implicit array or structure conversions. For
+    *    example, an array of int cannot be implicitly converted to an
+    *    array of float. There are no implicit conversions between
+    *    signed and unsigned integers."
+    */
+   /* FINISHME: The above comment is partially a lie.  There is int/uint
+    * FINISHME: conversion for immediate constants.
+    */
+   if (!to->is_float() || !from->type->is_numeric())
+      return false;
+
+   /* Convert to a floating point type with the same number of components
+    * as the original type - i.e. int to float, not int to vec4.
+    */
+   to = glsl_type::get_instance(GLSL_TYPE_FLOAT, from->type->vector_elements,
+			        from->type->matrix_columns);
+
+   switch (from->type->base_type) {
+   case GLSL_TYPE_INT:
+      from = new(ctx) ir_expression(ir_unop_i2f, to, from, NULL);
+      break;
+   case GLSL_TYPE_UINT:
+      from = new(ctx) ir_expression(ir_unop_u2f, to, from, NULL);
+      break;
+   case GLSL_TYPE_BOOL:
+      from = new(ctx) ir_expression(ir_unop_b2f, to, from, NULL);
+      break;
+   default:
+      assert(0);
+   }
+
+   return true;
+}
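+/* Usage sketch (hypothetical caller): promoting the int operand of a mixed
+ * `float + int' expression in place:
+ *
+ *     ir_rvalue *rhs = ...;   // type int
+ *     if (apply_implicit_conversion(glsl_type::float_type, rhs, state)) {
+ *        // rhs now points at (expression float i2f <old rhs>)
+ *     }
+ */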
+
+
+static const struct glsl_type *
+arithmetic_result_type(ir_rvalue * &value_a, ir_rvalue * &value_b,
+                       bool multiply,
+                       struct _mesa_glsl_parse_state *state, YYLTYPE *loc)
+{
+   const glsl_type *type_a = value_a->type;
+   const glsl_type *type_b = value_b->type;
+
+   /* From GLSL 1.50 spec, page 56:
+    *
+    *    "The arithmetic binary operators add (+), subtract (-),
+    *    multiply (*), and divide (/) operate on integer and
+    *    floating-point scalars, vectors, and matrices."
+    */
+   if (!type_a->is_numeric() || !type_b->is_numeric()) {
+      _mesa_glsl_error(loc, state,
+                       "operands to arithmetic operators must be numeric");
+      return glsl_type::error_type;
+   }
+
+
+   /*    "If one operand is floating-point based and the other is
+    *    not, then the conversions from Section 4.1.10 "Implicit
+    *    Conversions" are applied to the non-floating-point-based operand."
+    */
+   if (!apply_implicit_conversion(type_a, value_b, state)
+       && !apply_implicit_conversion(type_b, value_a, state)) {
+      _mesa_glsl_error(loc, state,
+                       "could not implicitly convert operands to "
+                       "arithmetic operator");
+      return glsl_type::error_type;
+   }
+   type_a = value_a->type;
+   type_b = value_b->type;
+
+   /*    "If the operands are integer types, they must both be signed or
+    *    both be unsigned."
+    *
+    * From this rule and the preceding conversion it can be inferred that
+    * both types must be GLSL_TYPE_FLOAT, or GLSL_TYPE_UINT, or GLSL_TYPE_INT.
+    * The is_numeric check above already filtered out the case where either
+    * type is not one of these, so now the base types need only be tested for
+    * equality.
+    */
+   if (type_a->base_type != type_b->base_type) {
+      _mesa_glsl_error(loc, state,
+                       "base type mismatch for arithmetic operator");
+      return glsl_type::error_type;
+   }
+
+   /*    "All arithmetic binary operators result in the same fundamental type
+    *    (signed integer, unsigned integer, or floating-point) as the
+    *    operands they operate on, after operand type conversion. After
+    *    conversion, the following cases are valid
+    *
+    *    * The two operands are scalars. In this case the operation is
+    *      applied, resulting in a scalar."
+    */
+   if (type_a->is_scalar() && type_b->is_scalar())
+      return type_a;
+
+   /*   "* One operand is a scalar, and the other is a vector or matrix.
+    *      In this case, the scalar operation is applied independently to each
+    *      component of the vector or matrix, resulting in the same size
+    *      vector or matrix."
+    */
+   if (type_a->is_scalar()) {
+      if (!type_b->is_scalar())
+         return type_b;
+   } else if (type_b->is_scalar()) {
+      return type_a;
+   }
+
+   /* All of the combinations of <scalar, scalar>, <vector, scalar>,
+    * <scalar, vector>, <scalar, matrix>, and <matrix, scalar> have been
+    * handled.
+    */
+   assert(!type_a->is_scalar());
+   assert(!type_b->is_scalar());
+
+   /*   "* The two operands are vectors of the same size. In this case, the
+    *      operation is done component-wise resulting in the same size
+    *      vector."
+    */
+   if (type_a->is_vector() && type_b->is_vector()) {
+      if (type_a == type_b) {
+         return type_a;
+      } else {
+         _mesa_glsl_error(loc, state,
+                          "vector size mismatch for arithmetic operator");
+         return glsl_type::error_type;
+      }
+   }
+
+   /* All of the combinations of <scalar, scalar>, <vector, scalar>,
+    * <scalar, vector>, <scalar, matrix>, <matrix, scalar>, and
+    * <vector, vector> have been handled.  At least one of the operands must
+    * be matrix.  Further, since there are no integer matrix types, the base
+    * type of both operands must be float.
+    */
+   assert(type_a->is_matrix() || type_b->is_matrix());
+   assert(type_a->base_type == GLSL_TYPE_FLOAT);
+   assert(type_b->base_type == GLSL_TYPE_FLOAT);
+
+   /*   "* The operator is add (+), subtract (-), or divide (/), and the
+    *      operands are matrices with the same number of rows and the same
+    *      number of columns. In this case, the operation is done component-
+    *      wise resulting in the same size matrix."
+    *    * The operator is multiply (*), where both operands are matrices or
+    *      one operand is a vector and the other a matrix. A right vector
+    *      operand is treated as a column vector and a left vector operand as a
+    *      row vector. In all these cases, it is required that the number of
+    *      columns of the left operand is equal to the number of rows of the
+    *      right operand. Then, the multiply (*) operation does a linear
+    *      algebraic multiply, yielding an object that has the same number of
+    *      rows as the left operand and the same number of columns as the right
+    *      operand. Section 5.10 "Vector and Matrix Operations" explains in
+    *      more detail how vectors and matrices are operated on."
+    */
+   if (! multiply) {
+      if (type_a == type_b)
+         return type_a;
+   } else {
+      if (type_a->is_matrix() && type_b->is_matrix()) {
+         /* Matrix multiply.  The columns of A must match the rows of B.  Given
+          * the other previously tested constraints, this means the vector type
+          * of a row from A must be the same as the vector type of a column from
+          * B.
+          */
+         if (type_a->row_type() == type_b->column_type()) {
+            /* The resulting matrix has the number of columns of matrix B and
+             * the number of rows of matrix A.  We get the row count of A by
+             * looking at the size of a vector that makes up a column.  The
+             * transpose (size of a row) is done for B.
+             */
+            const glsl_type *const type =
+               glsl_type::get_instance(type_a->base_type,
+                                       type_a->column_type()->vector_elements,
+                                       type_b->row_type()->vector_elements);
+            assert(type != glsl_type::error_type);
+
+            return type;
+         }
+      } else if (type_a->is_matrix()) {
+         /* A is a matrix and B is a column vector.  Columns of A must match
+          * rows of B.  Given the other previously tested constraints, this
+          * means the vector type of a row from A must be the same as the
+          * type of B.
+          */
+         if (type_a->row_type() == type_b) {
+            /* The resulting vector has a number of elements equal to
+             * the number of rows of matrix A. */
+            const glsl_type *const type =
+               glsl_type::get_instance(type_a->base_type,
+                                       type_a->column_type()->vector_elements,
+                                       1);
+            assert(type != glsl_type::error_type);
+
+            return type;
+         }
+      } else {
+         assert(type_b->is_matrix());
+
+         /* A is a row vector and B is a matrix.  Columns of A must match rows
+          * of B.  Given the other previously tested constraints, this means
+          * the type of A must be the same as the vector type of a column from
+          * B.
+          */
+         if (type_a == type_b->column_type()) {
+            /* The resulting vector has a number of elements equal to
+             * the number of columns of matrix B. */
+            const glsl_type *const type =
+               glsl_type::get_instance(type_a->base_type,
+                                       type_b->row_type()->vector_elements,
+                                       1);
+            assert(type != glsl_type::error_type);
+
+            return type;
+         }
+      }
+
+      _mesa_glsl_error(loc, state, "size mismatch for matrix multiplication");
+      return glsl_type::error_type;
+   }
+
+
+   /*    "All other cases are illegal."
+    */
+   _mesa_glsl_error(loc, state, "type mismatch");
+   return glsl_type::error_type;
+}
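+/* For example: with `multiply' set, `mat2x3 * vec2' yields vec3 (one element
+ * per matrix row) and `vec3 * mat2x3' yields vec2, while for add, subtract,
+ * and divide the two matrix operands must have identical types.
+ */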
+
+
+static const struct glsl_type *
+unary_arithmetic_result_type(const struct glsl_type *type,
+                             struct _mesa_glsl_parse_state *state, YYLTYPE *loc)
+{
+   /* From GLSL 1.50 spec, page 57:
+    *
+    *    "The arithmetic unary operators negate (-), post- and pre-increment
+    *     and decrement (-- and ++) operate on integer or floating-point
+    *     values (including vectors and matrices). All unary operators work
+    *     component-wise on their operands. These result with the same type
+    *     they operated on."
+    */
+   if (!type->is_numeric()) {
+      _mesa_glsl_error(loc, state,
+                       "operands to arithmetic operators must be numeric");
+      return glsl_type::error_type;
+   }
+
+   return type;
+}
+
+/**
+ * \brief Return the result type of a bit-logic operation.
+ *
+ * If the given types to the bit-logic operator are invalid, return
+ * glsl_type::error_type.
+ *
+ * \param type_a Type of LHS of bit-logic op
+ * \param type_b Type of RHS of bit-logic op
+ */
+static const struct glsl_type *
+bit_logic_result_type(const struct glsl_type *type_a,
+                      const struct glsl_type *type_b,
+                      ast_operators op,
+                      struct _mesa_glsl_parse_state *state, YYLTYPE *loc)
+{
+    if (!state->check_bitwise_operations_allowed(loc)) {
+       return glsl_type::error_type;
+    }
+
+    /* From page 50 (page 56 of PDF) of GLSL 1.30 spec:
+     *
+     *     "The bitwise operators and (&), exclusive-or (^), and inclusive-or
+     *     (|). The operands must be of type signed or unsigned integers or
+     *     integer vectors."
+     */
+    if (!type_a->is_integer()) {
+       _mesa_glsl_error(loc, state, "LHS of `%s' must be an integer",
+                         ast_expression::operator_string(op));
+       return glsl_type::error_type;
+    }
+    if (!type_b->is_integer()) {
+       _mesa_glsl_error(loc, state, "RHS of `%s' must be an integer",
+                        ast_expression::operator_string(op));
+       return glsl_type::error_type;
+    }
+
+    /*     "The fundamental types of the operands (signed or unsigned) must
+     *     match,"
+     */
+    if (type_a->base_type != type_b->base_type) {
+       _mesa_glsl_error(loc, state, "operands of `%s' must have the same "
+                        "base type", ast_expression::operator_string(op));
+       return glsl_type::error_type;
+    }
+
+    /*     "The operands cannot be vectors of differing size." */
+    if (type_a->is_vector() &&
+        type_b->is_vector() &&
+        type_a->vector_elements != type_b->vector_elements) {
+       _mesa_glsl_error(loc, state, "operands of `%s' cannot be vectors of "
+                        "different sizes", ast_expression::operator_string(op));
+       return glsl_type::error_type;
+    }
+
+    /*     "If one operand is a scalar and the other a vector, the scalar is
+     *     applied component-wise to the vector, resulting in the same type as
+     *     the vector. The fundamental types of the operands [...] will be the
+     *     resulting fundamental type."
+     */
+    if (type_a->is_scalar())
+        return type_b;
+    else
+        return type_a;
+}
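+/* For example: `uvec3 & uint' yields uvec3 (the scalar is applied
+ * component-wise), `int | uint' is rejected for mismatched base types, and
+ * `ivec2 ^ ivec3' is rejected for mismatched vector sizes.
+ */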
+
+static const struct glsl_type *
+modulus_result_type(const struct glsl_type *type_a,
+                    const struct glsl_type *type_b,
+                    struct _mesa_glsl_parse_state *state, YYLTYPE *loc)
+{
+   if (!state->check_version(130, 300, loc, "operator '%%' is reserved")) {
+      return glsl_type::error_type;
+   }
+
+   /* From GLSL 1.50 spec, page 56:
+    *    "The operator modulus (%) operates on signed or unsigned integers or
+    *    integer vectors. The operand types must both be signed or both be
+    *    unsigned."
+    */
+   if (!type_a->is_integer()) {
+      _mesa_glsl_error(loc, state, "LHS of operator %% must be an integer");
+      return glsl_type::error_type;
+   }
+   if (!type_b->is_integer()) {
+      _mesa_glsl_error(loc, state, "RHS of operator %% must be an integer");
+      return glsl_type::error_type;
+   }
+   if (type_a->base_type != type_b->base_type) {
+      _mesa_glsl_error(loc, state,
+                       "operands of %% must have the same base type");
+      return glsl_type::error_type;
+   }
+
+   /*    "The operands cannot be vectors of differing size. If one operand is
+    *    a scalar and the other vector, then the scalar is applied component-
+    *    wise to the vector, resulting in the same type as the vector. If both
+    *    are vectors of the same size, the result is computed component-wise."
+    */
+   if (type_a->is_vector()) {
+      if (!type_b->is_vector()
+          || (type_a->vector_elements == type_b->vector_elements))
+         return type_a;
+   } else
+      return type_b;
+
+   /*    "The operator modulus (%) is not defined for any other data types
+    *    (non-integer types)."
+    */
+   _mesa_glsl_error(loc, state, "type mismatch");
+   return glsl_type::error_type;
+}
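+/* For example: `ivec2 % int' yields ivec2, `int % uint' is rejected for
+ * mismatched base types, and `float % float' is rejected because modulus
+ * is defined only for integer types.
+ */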
+
+
+static const struct glsl_type *
+relational_result_type(ir_rvalue * &value_a, ir_rvalue * &value_b,
+                       struct _mesa_glsl_parse_state *state, YYLTYPE *loc)
+{
+   const glsl_type *type_a = value_a->type;
+   const glsl_type *type_b = value_b->type;
+
+   /* From GLSL 1.50 spec, page 56:
+    *    "The relational operators greater than (>), less than (<), greater
+    *    than or equal (>=), and less than or equal (<=) operate only on
+    *    scalar integer and scalar floating-point expressions."
+    */
+   if (!type_a->is_numeric()
+       || !type_b->is_numeric()
+       || !type_a->is_scalar()
+       || !type_b->is_scalar()) {
+      _mesa_glsl_error(loc, state,
+                       "operands to relational operators must be scalar and "
+                       "numeric");
+      return glsl_type::error_type;
+   }
+
+   /*    "Either the operands' types must match, or the conversions from
+    *    Section 4.1.10 "Implicit Conversions" will be applied to the integer
+    *    operand, after which the types must match."
+    */
+   if (!apply_implicit_conversion(type_a, value_b, state)
+       && !apply_implicit_conversion(type_b, value_a, state)) {
+      _mesa_glsl_error(loc, state,
+                       "could not implicitly convert operands to "
+                       "relational operator");
+      return glsl_type::error_type;
+   }
+   type_a = value_a->type;
+   type_b = value_b->type;
+
+   if (type_a->base_type != type_b->base_type) {
+      _mesa_glsl_error(loc, state, "base type mismatch");
+      return glsl_type::error_type;
+   }
+
+   /*    "The result is scalar Boolean."
+    */
+   return glsl_type::bool_type;
+}
+
+/**
+ * \brief Return the result type of a bit-shift operation.
+ *
+ * If the given types to the bit-shift operator are invalid, return
+ * glsl_type::error_type.
+ *
+ * \param type_a Type of LHS of bit-shift op
+ * \param type_b Type of RHS of bit-shift op
+ */
+static const struct glsl_type *
+shift_result_type(const struct glsl_type *type_a,
+                  const struct glsl_type *type_b,
+                  ast_operators op,
+                  struct _mesa_glsl_parse_state *state, YYLTYPE *loc)
+{
+   if (!state->check_bitwise_operations_allowed(loc)) {
+      return glsl_type::error_type;
+   }
+
+   /* From page 50 (page 56 of the PDF) of the GLSL 1.30 spec:
+    *
+    *     "The shift operators (<<) and (>>). For both operators, the operands
+    *     must be signed or unsigned integers or integer vectors. One operand
+    *     can be signed while the other is unsigned."
+    */
+   if (!type_a->is_integer()) {
+      _mesa_glsl_error(loc, state, "LHS of operator %s must be an integer or "
+                       "integer vector", ast_expression::operator_string(op));
+      return glsl_type::error_type;
+   }
+   if (!type_b->is_integer()) {
+      _mesa_glsl_error(loc, state, "RHS of operator %s must be an integer or "
+                       "integer vector", ast_expression::operator_string(op));
+      return glsl_type::error_type;
+   }
+
+   /*     "If the first operand is a scalar, the second operand has to be
+    *     a scalar as well."
+    */
+   if (type_a->is_scalar() && !type_b->is_scalar()) {
+      _mesa_glsl_error(loc, state, "if the first operand of %s is scalar, the "
+                       "second must be scalar as well",
+                       ast_expression::operator_string(op));
+      return glsl_type::error_type;
+   }
+
+   /* If both operands are vectors, check that they have same number of
+    * elements.
+    */
+   if (type_a->is_vector() &&
+       type_b->is_vector() &&
+       type_a->vector_elements != type_b->vector_elements) {
+      _mesa_glsl_error(loc, state, "vector operands to operator %s must "
+                       "have same number of elements",
+                       ast_expression::operator_string(op));
+      return glsl_type::error_type;
+   }
+
+   /*     "In all cases, the resulting type will be the same type as the left
+    *     operand."
+    */
+   return type_a;
+}
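+/* For example: `ivec2 << uint' is legal (mixed signedness is allowed) and
+ * yields ivec2, the type of the left operand, while `int << ivec2' is
+ * rejected because a scalar first operand requires a scalar second operand.
+ */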
+
+/**
+ * Validates that a value can be assigned to a location with a specified type
+ *
+ * Validates that \c rhs can be assigned to some location.  If the types are
+ * not an exact match but an automatic conversion is possible, \c rhs will be
+ * converted.
+ *
+ * \return
+ * \c NULL if \c rhs cannot be assigned to a location with type \c lhs_type.
+ * Otherwise the actual RHS to be assigned will be returned.  This may be
+ * \c rhs, or it may be \c rhs after some type conversion.
+ *
+ * \note
+ * In addition to being used for assignments, this function is used to
+ * type-check return values.
+ */
+ir_rvalue *
+validate_assignment(struct _mesa_glsl_parse_state *state,
+                    YYLTYPE loc, const glsl_type *lhs_type,
+                    ir_rvalue *rhs, bool is_initializer)
+{
+   /* If there is already some error in the RHS, just return it.  Anything
+    * else will lead to an avalanche of error messages back to the user.
+    */
+   if (rhs->type->is_error())
+      return rhs;
+
+   /* If the types are identical, the assignment can trivially proceed.
+    */
+   if (rhs->type == lhs_type)
+      return rhs;
+
+   /* If the array element types are the same and the LHS is unsized,
+    * the assignment is okay for initializers embedded in variable
+    * declarations.
+    *
+    * Note: Whole-array assignments are not permitted in GLSL 1.10, but this
+    * is handled by ir_dereference::is_lvalue.
+    */
+   if (lhs_type->is_unsized_array() && rhs->type->is_array()
+       && (lhs_type->element_type() == rhs->type->element_type())) {
+      if (is_initializer) {
+         return rhs;
+      } else {
+         _mesa_glsl_error(&loc, state,
+                          "implicitly sized arrays cannot be assigned");
+         return NULL;
+      }
+   }
+
+   /* Check for implicit conversion in GLSL 1.20 */
+   if (apply_implicit_conversion(lhs_type, rhs, state)) {
+      if (rhs->type == lhs_type)
+	 return rhs;
+   }
+
+   _mesa_glsl_error(&loc, state,
+                    "%s of type %s cannot be assigned to "
+                    "variable of type %s",
+                    is_initializer ? "initializer" : "value",
+                    rhs->type->name, lhs_type->name);
+
+   return NULL;
+}
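+/* For illustration (GLSL): in the declaration
+ * `float a[] = float[](1.0, 2.0);' the unsized LHS is accepted here because
+ * is_initializer is set, and the declaration handling later sizes the
+ * variable to float[2]; the same RHS in an ordinary assignment to an
+ * implicitly sized array is rejected above.
+ */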
+
+static void
+mark_whole_array_access(ir_rvalue *access)
+{
+   ir_dereference_variable *deref = access->as_dereference_variable();
+
+   if (deref && deref->var) {
+      deref->var->data.max_array_access = deref->type->length - 1;
+   }
+}
+
+static bool
+do_assignment(exec_list *instructions, struct _mesa_glsl_parse_state *state,
+              const char *non_lvalue_description,
+              ir_rvalue *lhs, ir_rvalue *rhs,
+              ir_rvalue **out_rvalue, bool needs_rvalue,
+              bool is_initializer,
+              YYLTYPE lhs_loc)
+{
+   void *ctx = state;
+   bool error_emitted = (lhs->type->is_error() || rhs->type->is_error());
+   ir_rvalue *extract_channel = NULL;
+
+   /* If the assignment LHS comes back as an ir_binop_vector_extract
+    * expression, move it to the RHS as an ir_triop_vector_insert.
+    */
+   if (lhs->ir_type == ir_type_expression) {
+      ir_expression *const lhs_expr = lhs->as_expression();
+
+      if (unlikely(lhs_expr->operation == ir_binop_vector_extract)) {
+         ir_rvalue *new_rhs =
+            validate_assignment(state, lhs_loc, lhs->type,
+                                rhs, is_initializer);
+
+         if (new_rhs == NULL) {
+            /* an error has already been emitted by validate_assignment */
+            return true;
+         } else {
+            /* This converts:
+             * - LHS: (expression float vector_extract <vec> <channel>)
+             * - RHS: <scalar>
+             * into:
+             * - LHS: <vec>
+             * - RHS: (expression vec2 vector_insert <vec> <channel> <scalar>)
+             *
+             * The LHS type is now a vector instead of a scalar.  Since GLSL
+             * allows assignments to be used as rvalues, we need to re-extract
+             * the channel from assignment_tmp when returning the rvalue.
+             */
+            extract_channel = lhs_expr->operands[1];
+            rhs = new(ctx) ir_expression(ir_triop_vector_insert,
+                                         lhs_expr->operands[0]->type,
+                                         lhs_expr->operands[0],
+                                         new_rhs,
+                                         extract_channel);
+            lhs = lhs_expr->operands[0]->clone(ctx, NULL);
+         }
+      }
+   }
+
+   ir_variable *lhs_var = lhs->variable_referenced();
+   if (lhs_var)
+      lhs_var->data.assigned = true;
+
+   if (!error_emitted) {
+      if (non_lvalue_description != NULL) {
+         _mesa_glsl_error(&lhs_loc, state,
+                          "assignment to %s",
+                          non_lvalue_description);
+         error_emitted = true;
+      } else if (lhs->variable_referenced() != NULL
+                 && lhs->variable_referenced()->data.read_only) {
+         _mesa_glsl_error(&lhs_loc, state,
+                          "assignment to read-only variable '%s'",
+                          lhs->variable_referenced()->name);
+         error_emitted = true;
+      } else if (lhs->type->is_array() &&
+                 !state->check_version(120, 300, &lhs_loc,
+                                       "whole array assignment forbidden")) {
+         /* From page 32 (page 38 of the PDF) of the GLSL 1.10 spec:
+          *
+          *    "Other binary or unary expressions, non-dereferenced
+          *     arrays, function names, swizzles with repeated fields,
+          *     and constants cannot be l-values."
+          *
+          * The restriction on arrays is lifted in GLSL 1.20 and GLSL ES 3.00.
+          */
+         error_emitted = true;
+      } else if (!lhs->is_lvalue()) {
+         _mesa_glsl_error(& lhs_loc, state, "non-lvalue in assignment");
+         error_emitted = true;
+      }
+   }
+
+   ir_rvalue *new_rhs =
+      validate_assignment(state, lhs_loc, lhs->type, rhs, is_initializer);
+   if (new_rhs != NULL) {
+      rhs = new_rhs;
+
+      /* If the LHS array was not declared with a size, it takes its size from
+       * the RHS.  If the LHS is an l-value and a whole array, it must be a
+       * dereference of a variable.  Any other case would require that the LHS
+       * is either not an l-value or not a whole array.
+       */
+      if (lhs->type->is_unsized_array()) {
+         ir_dereference *const d = lhs->as_dereference();
+
+         assert(d != NULL);
+
+         ir_variable *const var = d->variable_referenced();
+
+         assert(var != NULL);
+
+         if (var->data.max_array_access >= unsigned(rhs->type->array_size())) {
+            /* FINISHME: This should actually log the location of the RHS. */
+            _mesa_glsl_error(& lhs_loc, state, "array size must be > %u due to "
+                             "previous access",
+                             var->data.max_array_access);
+         }
+
+         var->type = glsl_type::get_array_instance(lhs->type->element_type(),
+                                                   rhs->type->array_size());
+         d->type = var->type;
+      }
+      if (lhs->type->is_array()) {
+         mark_whole_array_access(rhs);
+         mark_whole_array_access(lhs);
+      }
+   }
+
+   /* Most callers of do_assignment (assign, add_assign, pre_inc/dec,
+    * but not post_inc) need the converted assigned value as an rvalue
+    * to handle things like:
+    *
+    * i = j += 1;
+    */
+   if (needs_rvalue) {
+      ir_variable *var = new(ctx) ir_variable(rhs->type, "assignment_tmp",
+                                              ir_var_temporary);
+      instructions->push_tail(var);
+      instructions->push_tail(assign(var, rhs));
+
+      if (!error_emitted) {
+         ir_dereference_variable *deref_var = new(ctx) ir_dereference_variable(var);
+         instructions->push_tail(new(ctx) ir_assignment(lhs, deref_var));
+      }
+      ir_rvalue *rvalue = new(ctx) ir_dereference_variable(var);
+
+      if (extract_channel) {
+         rvalue = new(ctx) ir_expression(ir_binop_vector_extract,
+                                         rvalue,
+                                         extract_channel->clone(ctx, NULL));
+      }
+
+      *out_rvalue = rvalue;
+   } else {
+      if (!error_emitted)
+         instructions->push_tail(new(ctx) ir_assignment(lhs, rhs));
+      *out_rvalue = NULL;
+   }
+
+   return error_emitted;
+}
+
+static ir_rvalue *
+get_lvalue_copy(exec_list *instructions, ir_rvalue *lvalue)
+{
+   void *ctx = ralloc_parent(lvalue);
+   ir_variable *var;
+
+   var = new(ctx) ir_variable(lvalue->type, "_post_incdec_tmp",
+			      ir_var_temporary);
+   instructions->push_tail(var);
+   var->data.mode = ir_var_auto;
+
+   instructions->push_tail(new(ctx) ir_assignment(new(ctx) ir_dereference_variable(var),
+						  lvalue));
+
+   return new(ctx) ir_dereference_variable(var);
+}
+
+
+ir_rvalue *
+ast_node::hir(exec_list *instructions, struct _mesa_glsl_parse_state *state)
+{
+   (void) instructions;
+   (void) state;
+
+   return NULL;
+}
+
+void
+ast_function_expression::hir_no_rvalue(exec_list *instructions,
+                                       struct _mesa_glsl_parse_state *state)
+{
+   (void)hir(instructions, state);
+}
+
+void
+ast_aggregate_initializer::hir_no_rvalue(exec_list *instructions,
+                                         struct _mesa_glsl_parse_state *state)
+{
+   (void)hir(instructions, state);
+}
+
+static ir_rvalue *
+do_comparison(void *mem_ctx, int operation, ir_rvalue *op0, ir_rvalue *op1)
+{
+   int join_op;
+   ir_rvalue *cmp = NULL;
+
+   if (operation == ir_binop_all_equal)
+      join_op = ir_binop_logic_and;
+   else
+      join_op = ir_binop_logic_or;
+
+   switch (op0->type->base_type) {
+   case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_UINT:
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_BOOL:
+      return new(mem_ctx) ir_expression(operation, op0, op1);
+
+   case GLSL_TYPE_ARRAY: {
+      for (unsigned int i = 0; i < op0->type->length; i++) {
+         ir_rvalue *e0, *e1, *result;
+
+         e0 = new(mem_ctx) ir_dereference_array(op0->clone(mem_ctx, NULL),
+                                                new(mem_ctx) ir_constant(i));
+         e1 = new(mem_ctx) ir_dereference_array(op1->clone(mem_ctx, NULL),
+                                                new(mem_ctx) ir_constant(i));
+         result = do_comparison(mem_ctx, operation, e0, e1);
+
+         if (cmp) {
+            cmp = new(mem_ctx) ir_expression(join_op, cmp, result);
+         } else {
+            cmp = result;
+         }
+      }
+
+      mark_whole_array_access(op0);
+      mark_whole_array_access(op1);
+      break;
+   }
+
+   case GLSL_TYPE_STRUCT: {
+      for (unsigned int i = 0; i < op0->type->length; i++) {
+         ir_rvalue *e0, *e1, *result;
+         const char *field_name = op0->type->fields.structure[i].name;
+
+         e0 = new(mem_ctx) ir_dereference_record(op0->clone(mem_ctx, NULL),
+                                                 field_name);
+         e1 = new(mem_ctx) ir_dereference_record(op1->clone(mem_ctx, NULL),
+                                                 field_name);
+         result = do_comparison(mem_ctx, operation, e0, e1);
+
+         if (cmp) {
+            cmp = new(mem_ctx) ir_expression(join_op, cmp, result);
+         } else {
+            cmp = result;
+         }
+      }
+      break;
+   }
+
+   case GLSL_TYPE_ERROR:
+   case GLSL_TYPE_VOID:
+   case GLSL_TYPE_SAMPLER:
+   case GLSL_TYPE_IMAGE:
+   case GLSL_TYPE_INTERFACE:
+   case GLSL_TYPE_ATOMIC_UINT:
+      /* I assume a comparison of a struct containing a sampler just
+       * ignores the sampler present in the type.
+       */
+      break;
+   }
+
+   if (cmp == NULL)
+      cmp = new(mem_ctx) ir_constant(true);
+
+   return cmp;
+}
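+/* Sketch: for arrays and structs the comparison is built recursively, so
+ * `a == b' with `float a[2], b[2]' expands to roughly
+ * `(a[0] == b[0]) && (a[1] == b[1])', while `!=' joins the per-element
+ * ir_binop_any_nequal results with logical or.
+ */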
+
+/* For logical operations, we want to ensure that the operands are
+ * scalar booleans.  If an operand isn't, emit an error and return a constant
+ * boolean to avoid triggering cascading error messages.
+ */
+ir_rvalue *
+get_scalar_boolean_operand(exec_list *instructions,
+			   struct _mesa_glsl_parse_state *state,
+			   ast_expression *parent_expr,
+			   int operand,
+			   const char *operand_name,
+			   bool *error_emitted)
+{
+   ast_expression *expr = parent_expr->subexpressions[operand];
+   void *ctx = state;
+   ir_rvalue *val = expr->hir(instructions, state);
+
+   if (val->type->is_boolean() && val->type->is_scalar())
+      return val;
+
+   if (!*error_emitted) {
+      YYLTYPE loc = expr->get_location();
+      _mesa_glsl_error(&loc, state, "%s of `%s' must be scalar boolean",
+                       operand_name,
+                       parent_expr->operator_string(parent_expr->oper));
+      *error_emitted = true;
+   }
+
+   return new(ctx) ir_constant(true);
+}
+
+/**
+ * If name refers to a builtin array whose maximum allowed size is less than
+ * size, report an error.
+ */
+void
+check_builtin_array_max_size(const char *name, unsigned size,
+                             YYLTYPE loc, struct _mesa_glsl_parse_state *state)
+{
+   if ((strcmp("gl_TexCoord", name) == 0)
+       && (size > state->Const.MaxTextureCoords)) {
+      /* From page 54 (page 60 of the PDF) of the GLSL 1.20 spec:
+       *
+       *     "The size [of gl_TexCoord] can be at most
+       *     gl_MaxTextureCoords."
+       */
+      _mesa_glsl_error(&loc, state, "`gl_TexCoord' array size cannot "
+                       "be larger than gl_MaxTextureCoords (%u)",
+                       state->Const.MaxTextureCoords);
+   } else if (strcmp("gl_ClipDistance", name) == 0
+              && size > state->Const.MaxClipPlanes) {
+      /* From section 7.1 (Vertex Shader Special Variables) of the
+       * GLSL 1.30 spec:
+       *
+       *   "The gl_ClipDistance array is predeclared as unsized and
+       *   must be sized by the shader either redeclaring it with a
+       *   size or indexing it only with integral constant
+       *   expressions. ... The size can be at most
+       *   gl_MaxClipDistances."
+       */
+      _mesa_glsl_error(&loc, state, "`gl_ClipDistance' array size cannot "
+                       "be larger than gl_MaxClipDistances (%u)",
+                       state->Const.MaxClipPlanes);
+   }
+}
+
+/**
+ * Create the constant 1, of a type which is appropriate for incrementing and
+ * decrementing values of the given GLSL type.  For example, if type is vec4,
+ * this creates a constant value of 1.0 having type float.
+ *
+ * If the given type is invalid for increment and decrement operators, return
+ * a floating point 1--the error will be detected later.
+ */
+static ir_rvalue *
+constant_one_for_inc_dec(void *ctx, const glsl_type *type)
+{
+   switch (type->base_type) {
+   case GLSL_TYPE_UINT:
+      return new(ctx) ir_constant((unsigned) 1);
+   case GLSL_TYPE_INT:
+      return new(ctx) ir_constant(1);
+   default:
+   case GLSL_TYPE_FLOAT:
+      return new(ctx) ir_constant(1.0f);
+   }
+}
+
+ir_rvalue *
+ast_expression::hir(exec_list *instructions,
+                    struct _mesa_glsl_parse_state *state)
+{
+   return do_hir(instructions, state, true);
+}
+
+void
+ast_expression::hir_no_rvalue(exec_list *instructions,
+                              struct _mesa_glsl_parse_state *state)
+{
+   do_hir(instructions, state, false);
+}
+
+ir_rvalue *
+ast_expression::do_hir(exec_list *instructions,
+                       struct _mesa_glsl_parse_state *state,
+                       bool needs_rvalue)
+{
+   void *ctx = state;
+   static const int operations[AST_NUM_OPERATORS] = {
+      -1,               /* ast_assign doesn't convert to ir_expression. */
+      -1,               /* ast_plus doesn't convert to ir_expression. */
+      ir_unop_neg,
+      ir_binop_add,
+      ir_binop_sub,
+      ir_binop_mul,
+      ir_binop_div,
+      ir_binop_mod,
+      ir_binop_lshift,
+      ir_binop_rshift,
+      ir_binop_less,
+      ir_binop_greater,
+      ir_binop_lequal,
+      ir_binop_gequal,
+      ir_binop_all_equal,
+      ir_binop_any_nequal,
+      ir_binop_bit_and,
+      ir_binop_bit_xor,
+      ir_binop_bit_or,
+      ir_unop_bit_not,
+      ir_binop_logic_and,
+      ir_binop_logic_xor,
+      ir_binop_logic_or,
+      ir_unop_logic_not,
+
+      /* Note: The following block of expression types actually convert
+       * to multiple IR instructions.
+       */
+      ir_binop_mul,     /* ast_mul_assign */
+      ir_binop_div,     /* ast_div_assign */
+      ir_binop_mod,     /* ast_mod_assign */
+      ir_binop_add,     /* ast_add_assign */
+      ir_binop_sub,     /* ast_sub_assign */
+      ir_binop_lshift,  /* ast_ls_assign */
+      ir_binop_rshift,  /* ast_rs_assign */
+      ir_binop_bit_and, /* ast_and_assign */
+      ir_binop_bit_xor, /* ast_xor_assign */
+      ir_binop_bit_or,  /* ast_or_assign */
+
+      -1,               /* ast_conditional doesn't convert to ir_expression. */
+      ir_binop_add,     /* ast_pre_inc. */
+      ir_binop_sub,     /* ast_pre_dec. */
+      ir_binop_add,     /* ast_post_inc. */
+      ir_binop_sub,     /* ast_post_dec. */
+      -1,               /* ast_field_selection doesn't conv to ir_expression. */
+      -1,               /* ast_array_index doesn't convert to ir_expression. */
+      -1,               /* ast_function_call doesn't conv to ir_expression. */
+      -1,               /* ast_identifier doesn't convert to ir_expression. */
+      -1,               /* ast_int_constant doesn't convert to ir_expression. */
+      -1,               /* ast_uint_constant doesn't conv to ir_expression. */
+      -1,               /* ast_float_constant doesn't conv to ir_expression. */
+      -1,               /* ast_bool_constant doesn't conv to ir_expression. */
+      -1,               /* ast_sequence doesn't convert to ir_expression. */
+   };
+   ir_rvalue *result = NULL;
+   ir_rvalue *op[3];
+   const struct glsl_type *type; /* a temporary variable for switch cases */
+   bool error_emitted = false;
+   YYLTYPE loc;
+
+   loc = this->get_location();
+
+   switch (this->oper) {
+   case ast_aggregate:
+      assert(!"ast_aggregate: Should never get here.");
+      break;
+
+   case ast_assign: {
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = this->subexpressions[1]->hir(instructions, state);
+
+      error_emitted =
+         do_assignment(instructions, state,
+                       this->subexpressions[0]->non_lvalue_description,
+                       op[0], op[1], &result, needs_rvalue, false,
+                       this->subexpressions[0]->get_location());
+      break;
+   }
+
+   case ast_plus:
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+
+      type = unary_arithmetic_result_type(op[0]->type, state, & loc);
+
+      error_emitted = type->is_error();
+
+      result = op[0];
+      break;
+
+   case ast_neg:
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+
+      type = unary_arithmetic_result_type(op[0]->type, state, & loc);
+
+      error_emitted = type->is_error();
+
+      result = new(ctx) ir_expression(operations[this->oper], type,
+                                      op[0], NULL);
+      break;
+
+   case ast_add:
+   case ast_sub:
+   case ast_mul:
+   case ast_div:
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = this->subexpressions[1]->hir(instructions, state);
+
+      type = arithmetic_result_type(op[0], op[1],
+                                    (this->oper == ast_mul),
+                                    state, & loc);
+      error_emitted = type->is_error();
+
+      result = new(ctx) ir_expression(operations[this->oper], type,
+                                      op[0], op[1]);
+      break;
+
+   case ast_mod:
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = this->subexpressions[1]->hir(instructions, state);
+
+      type = modulus_result_type(op[0]->type, op[1]->type, state, & loc);
+
+      assert(operations[this->oper] == ir_binop_mod);
+
+      result = new(ctx) ir_expression(operations[this->oper], type,
+                                      op[0], op[1]);
+      error_emitted = type->is_error();
+      break;
+
+   case ast_lshift:
+   case ast_rshift:
+      if (!state->check_bitwise_operations_allowed(&loc)) {
+         error_emitted = true;
+      }
+
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = this->subexpressions[1]->hir(instructions, state);
+      type = shift_result_type(op[0]->type, op[1]->type, this->oper, state,
+                               &loc);
+      result = new(ctx) ir_expression(operations[this->oper], type,
+                                      op[0], op[1]);
+      error_emitted = op[0]->type->is_error() || op[1]->type->is_error();
+      break;
+
+   case ast_less:
+   case ast_greater:
+   case ast_lequal:
+   case ast_gequal:
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = this->subexpressions[1]->hir(instructions, state);
+
+      type = relational_result_type(op[0], op[1], state, & loc);
+
+      /* The relational operators must either generate an error or result
+       * in a scalar boolean.  See page 57 of the GLSL 1.50 spec.
+       */
+      assert(type->is_error()
+             || ((type->base_type == GLSL_TYPE_BOOL)
+                 && type->is_scalar()));
+
+      result = new(ctx) ir_expression(operations[this->oper], type,
+                                      op[0], op[1]);
+      error_emitted = type->is_error();
+      break;
+
+   case ast_nequal:
+   case ast_equal:
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = this->subexpressions[1]->hir(instructions, state);
+
+      /* From page 58 (page 64 of the PDF) of the GLSL 1.50 spec:
+       *
+       *    "The equality operators equal (==), and not equal (!=)
+       *    operate on all types. They result in a scalar Boolean. If
+       *    the operand types do not match, then there must be a
+       *    conversion from Section 4.1.10 "Implicit Conversions"
+       *    applied to one operand that can make them match, in which
+       *    case this conversion is done."
+       */
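+      /* Illustrative sketch: comparing `1 == 1.0' converts the int operand
+       * to float where implicit int-to-float conversion exists (GLSL 1.20+);
+       * `true == 1' has no applicable conversion and is an error.
+       */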
+      if ((!apply_implicit_conversion(op[0]->type, op[1], state)
+           && !apply_implicit_conversion(op[1]->type, op[0], state))
+          || (op[0]->type != op[1]->type)) {
+         _mesa_glsl_error(& loc, state, "operands of `%s' must have the same "
+                          "type", (this->oper == ast_equal) ? "==" : "!=");
+         error_emitted = true;
+      } else if ((op[0]->type->is_array() || op[1]->type->is_array()) &&
+                 !state->check_version(120, 300, &loc,
+                                       "array comparisons forbidden")) {
+         error_emitted = true;
+      } else if ((op[0]->type->contains_opaque() ||
+                  op[1]->type->contains_opaque())) {
+         _mesa_glsl_error(&loc, state, "opaque type comparisons forbidden");
+         error_emitted = true;
+      }
+
+      if (error_emitted) {
+         result = new(ctx) ir_constant(false);
+      } else {
+         result = do_comparison(ctx, operations[this->oper], op[0], op[1]);
+         assert(result->type == glsl_type::bool_type);
+      }
+      break;
+
+   case ast_bit_and:
+   case ast_bit_xor:
+   case ast_bit_or:
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = this->subexpressions[1]->hir(instructions, state);
+      type = bit_logic_result_type(op[0]->type, op[1]->type, this->oper,
+                                   state, &loc);
+      result = new(ctx) ir_expression(operations[this->oper], type,
+                                      op[0], op[1]);
+      error_emitted = op[0]->type->is_error() || op[1]->type->is_error();
+      break;
+
+   case ast_bit_not:
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+
+      if (!state->check_bitwise_operations_allowed(&loc)) {
+         error_emitted = true;
+      }
+
+      if (!op[0]->type->is_integer()) {
+         _mesa_glsl_error(&loc, state, "operand of `~' must be an integer");
+         error_emitted = true;
+      }
+
+      type = error_emitted ? glsl_type::error_type : op[0]->type;
+      result = new(ctx) ir_expression(ir_unop_bit_not, type, op[0], NULL);
+      break;
+
+   case ast_logic_and: {
+      exec_list rhs_instructions;
+      op[0] = get_scalar_boolean_operand(instructions, state, this, 0,
+                                         "LHS", &error_emitted);
+      op[1] = get_scalar_boolean_operand(&rhs_instructions, state, this, 1,
+                                         "RHS", &error_emitted);
+
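+      /* When evaluating the RHS emits instructions (e.g. a function call in
+       * `a && f()'), those instructions must only run when the LHS is true.
+       * Rough sketch of the lowering below:
+       *
+       *    bool and_tmp;
+       *    if (a) and_tmp = f(); else and_tmp = false;
+       */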
+      if (rhs_instructions.is_empty()) {
+         result = new(ctx) ir_expression(ir_binop_logic_and, op[0], op[1]);
+         type = result->type;
+      } else {
+         ir_variable *const tmp = new(ctx) ir_variable(glsl_type::bool_type,
+                                                       "and_tmp",
+                                                       ir_var_temporary);
+         instructions->push_tail(tmp);
+
+         ir_if *const stmt = new(ctx) ir_if(op[0]);
+         instructions->push_tail(stmt);
+
+         stmt->then_instructions.append_list(&rhs_instructions);
+         ir_dereference *const then_deref = new(ctx) ir_dereference_variable(tmp);
+         ir_assignment *const then_assign =
+            new(ctx) ir_assignment(then_deref, op[1]);
+         stmt->then_instructions.push_tail(then_assign);
+
+         ir_dereference *const else_deref = new(ctx) ir_dereference_variable(tmp);
+         ir_assignment *const else_assign =
+            new(ctx) ir_assignment(else_deref, new(ctx) ir_constant(false));
+         stmt->else_instructions.push_tail(else_assign);
+
+         result = new(ctx) ir_dereference_variable(tmp);
+         type = tmp->type;
+      }
+      break;
+   }
+
+   case ast_logic_or: {
+      exec_list rhs_instructions;
+      op[0] = get_scalar_boolean_operand(instructions, state, this, 0,
+                                         "LHS", &error_emitted);
+      op[1] = get_scalar_boolean_operand(&rhs_instructions, state, this, 1,
+                                         "RHS", &error_emitted);
+
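+      /* Mirror image of the `&&' case: `a || f()' lowers, roughly, to
+       *
+       *    bool or_tmp;
+       *    if (a) or_tmp = true; else or_tmp = f();
+       *
+       * (Sketch only.)
+       */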
+      if (rhs_instructions.is_empty()) {
+         result = new(ctx) ir_expression(ir_binop_logic_or, op[0], op[1]);
+         type = result->type;
+      } else {
+         ir_variable *const tmp = new(ctx) ir_variable(glsl_type::bool_type,
+                                                       "or_tmp",
+                                                       ir_var_temporary);
+         instructions->push_tail(tmp);
+
+         ir_if *const stmt = new(ctx) ir_if(op[0]);
+         instructions->push_tail(stmt);
+
+         ir_dereference *const then_deref = new(ctx) ir_dereference_variable(tmp);
+         ir_assignment *const then_assign =
+            new(ctx) ir_assignment(then_deref, new(ctx) ir_constant(true));
+         stmt->then_instructions.push_tail(then_assign);
+
+         stmt->else_instructions.append_list(&rhs_instructions);
+         ir_dereference *const else_deref = new(ctx) ir_dereference_variable(tmp);
+         ir_assignment *const else_assign =
+            new(ctx) ir_assignment(else_deref, op[1]);
+         stmt->else_instructions.push_tail(else_assign);
+
+         result = new(ctx) ir_dereference_variable(tmp);
+         type = tmp->type;
+      }
+      break;
+   }
+
+   case ast_logic_xor:
+      /* From page 33 (page 39 of the PDF) of the GLSL 1.10 spec:
+       *
+       *    "The logical binary operators and (&&), or ( | | ), and
+       *     exclusive or (^^). They operate only on two Boolean
+       *     expressions and result in a Boolean expression."
+       */
+      op[0] = get_scalar_boolean_operand(instructions, state, this, 0, "LHS",
+                                         &error_emitted);
+      op[1] = get_scalar_boolean_operand(instructions, state, this, 1, "RHS",
+                                         &error_emitted);
+
+      result = new(ctx) ir_expression(operations[this->oper], glsl_type::bool_type,
+                                      op[0], op[1]);
+      break;
+
+   case ast_logic_not:
+      op[0] = get_scalar_boolean_operand(instructions, state, this, 0,
+                                         "operand", &error_emitted);
+
+      result = new(ctx) ir_expression(operations[this->oper], glsl_type::bool_type,
+                                      op[0], NULL);
+      break;
+
+   case ast_mul_assign:
+   case ast_div_assign:
+   case ast_add_assign:
+   case ast_sub_assign: {
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = this->subexpressions[1]->hir(instructions, state);
+
+      type = arithmetic_result_type(op[0], op[1],
+                                    (this->oper == ast_mul_assign),
+                                    state, & loc);
+
+      ir_rvalue *temp_rhs = new(ctx) ir_expression(operations[this->oper], type,
+                                                   op[0], op[1]);
+
+      error_emitted =
+         do_assignment(instructions, state,
+                       this->subexpressions[0]->non_lvalue_description,
+                       op[0]->clone(ctx, NULL), temp_rhs,
+                       &result, needs_rvalue, false,
+                       this->subexpressions[0]->get_location());
+
+      /* GLSL 1.10 does not allow array assignment.  However, we don't have to
+       * explicitly test for this because none of the binary expression
+       * operators allow array operands either.
+       */
+
+      break;
+   }
+
+   case ast_mod_assign: {
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = this->subexpressions[1]->hir(instructions, state);
+
+      type = modulus_result_type(op[0]->type, op[1]->type, state, & loc);
+
+      assert(operations[this->oper] == ir_binop_mod);
+
+      ir_rvalue *temp_rhs;
+      temp_rhs = new(ctx) ir_expression(operations[this->oper], type,
+                                        op[0], op[1]);
+
+      error_emitted =
+         do_assignment(instructions, state,
+                       this->subexpressions[0]->non_lvalue_description,
+                       op[0]->clone(ctx, NULL), temp_rhs,
+                       &result, needs_rvalue, false,
+                       this->subexpressions[0]->get_location());
+      break;
+   }
+
+   case ast_ls_assign:
+   case ast_rs_assign: {
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = this->subexpressions[1]->hir(instructions, state);
+      type = shift_result_type(op[0]->type, op[1]->type, this->oper, state,
+                               &loc);
+      ir_rvalue *temp_rhs = new(ctx) ir_expression(operations[this->oper],
+                                                   type, op[0], op[1]);
+      error_emitted =
+         do_assignment(instructions, state,
+                       this->subexpressions[0]->non_lvalue_description,
+                       op[0]->clone(ctx, NULL), temp_rhs,
+                       &result, needs_rvalue, false,
+                       this->subexpressions[0]->get_location());
+      break;
+   }
+
+   case ast_and_assign:
+   case ast_xor_assign:
+   case ast_or_assign: {
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = this->subexpressions[1]->hir(instructions, state);
+      type = bit_logic_result_type(op[0]->type, op[1]->type, this->oper,
+                                   state, &loc);
+      ir_rvalue *temp_rhs = new(ctx) ir_expression(operations[this->oper],
+                                                   type, op[0], op[1]);
+      error_emitted =
+         do_assignment(instructions, state,
+                       this->subexpressions[0]->non_lvalue_description,
+                       op[0]->clone(ctx, NULL), temp_rhs,
+                       &result, needs_rvalue, false,
+                       this->subexpressions[0]->get_location());
+      break;
+   }
+
+   case ast_conditional: {
+      /* From page 59 (page 65 of the PDF) of the GLSL 1.50 spec:
+       *
+       *    "The ternary selection operator (?:). It operates on three
+       *    expressions (exp1 ? exp2 : exp3). This operator evaluates the
+       *    first expression, which must result in a scalar Boolean."
+       */
+      op[0] = get_scalar_boolean_operand(instructions, state, this, 0,
+                                         "condition", &error_emitted);
+
+      /* The ?: operator is implemented by generating an anonymous temporary
+       * followed by an if-statement.  The last instruction in each branch of
+       * the if-statement assigns a value to the anonymous temporary.  This
+       * temporary is the r-value of the expression.
+       */
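+      /* Sketch of the lowering for `x = c ? y : z':
+       *
+       *    <type of y/z> conditional_tmp;
+       *    if (c) conditional_tmp = y; else conditional_tmp = z;
+       *    x = conditional_tmp;
+       *
+       * (Unless everything folds to a constant below.)
+       */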
+      exec_list then_instructions;
+      exec_list else_instructions;
+
+      op[1] = this->subexpressions[1]->hir(&then_instructions, state);
+      op[2] = this->subexpressions[2]->hir(&else_instructions, state);
+
+      /* From page 59 (page 65 of the PDF) of the GLSL 1.50 spec:
+       *
+       *     "The second and third expressions can be any type, as
+       *     long their types match, or there is a conversion in
+       *     Section 4.1.10 "Implicit Conversions" that can be applied
+       *     to one of the expressions to make their types match. This
+       *     resulting matching type is the type of the entire
+       *     expression."
+       */
+      if ((!apply_implicit_conversion(op[1]->type, op[2], state)
+          && !apply_implicit_conversion(op[2]->type, op[1], state))
+          || (op[1]->type != op[2]->type)) {
+         YYLTYPE loc = this->subexpressions[1]->get_location();
+
+         _mesa_glsl_error(& loc, state, "second and third operands of ?: "
+                          "operator must have matching types");
+         error_emitted = true;
+         type = glsl_type::error_type;
+      } else {
+         type = op[1]->type;
+      }
+
+      /* From page 33 (page 39 of the PDF) of the GLSL 1.10 spec:
+       *
+       *    "The second and third expressions must be the same type, but can
+       *    be of any type other than an array."
+       */
+      if (type->is_array() &&
+          !state->check_version(120, 300, &loc,
+                                "second and third operands of ?: operator "
+                                "cannot be arrays")) {
+         error_emitted = true;
+      }
+
+      ir_constant *cond_val = op[0]->constant_expression_value();
+      ir_constant *then_val = op[1]->constant_expression_value();
+      ir_constant *else_val = op[2]->constant_expression_value();
+
+      if (then_instructions.is_empty()
+          && else_instructions.is_empty()
+          && (cond_val != NULL) && (then_val != NULL) && (else_val != NULL)) {
+         result = (cond_val->value.b[0]) ? then_val : else_val;
+      } else {
+         ir_variable *const tmp =
+            new(ctx) ir_variable(type, "conditional_tmp", ir_var_temporary);
+         instructions->push_tail(tmp);
+
+         ir_if *const stmt = new(ctx) ir_if(op[0]);
+         instructions->push_tail(stmt);
+
+         then_instructions.move_nodes_to(& stmt->then_instructions);
+         ir_dereference *const then_deref =
+            new(ctx) ir_dereference_variable(tmp);
+         ir_assignment *const then_assign =
+            new(ctx) ir_assignment(then_deref, op[1]);
+         stmt->then_instructions.push_tail(then_assign);
+
+         else_instructions.move_nodes_to(& stmt->else_instructions);
+         ir_dereference *const else_deref =
+            new(ctx) ir_dereference_variable(tmp);
+         ir_assignment *const else_assign =
+            new(ctx) ir_assignment(else_deref, op[2]);
+         stmt->else_instructions.push_tail(else_assign);
+
+         result = new(ctx) ir_dereference_variable(tmp);
+      }
+      break;
+   }
+
+   case ast_pre_inc:
+   case ast_pre_dec: {
+      this->non_lvalue_description = (this->oper == ast_pre_inc)
+         ? "pre-increment operation" : "pre-decrement operation";
+
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = constant_one_for_inc_dec(ctx, op[0]->type);
+
+      type = arithmetic_result_type(op[0], op[1], false, state, & loc);
+
+      ir_rvalue *temp_rhs;
+      temp_rhs = new(ctx) ir_expression(operations[this->oper], type,
+                                        op[0], op[1]);
+
+      error_emitted =
+         do_assignment(instructions, state,
+                       this->subexpressions[0]->non_lvalue_description,
+                       op[0]->clone(ctx, NULL), temp_rhs,
+                       &result, needs_rvalue, false,
+                       this->subexpressions[0]->get_location());
+      break;
+   }
+
+   case ast_post_inc:
+   case ast_post_dec: {
+      this->non_lvalue_description = (this->oper == ast_post_inc)
+         ? "post-increment operation" : "post-decrement operation";
+      op[0] = this->subexpressions[0]->hir(instructions, state);
+      op[1] = constant_one_for_inc_dec(ctx, op[0]->type);
+
+      error_emitted = op[0]->type->is_error() || op[1]->type->is_error();
+
+      type = arithmetic_result_type(op[0], op[1], false, state, & loc);
+
+      ir_rvalue *temp_rhs;
+      temp_rhs = new(ctx) ir_expression(operations[this->oper], type,
+                                        op[0], op[1]);
+
+      /* Get a temporary copy of the lvalue's value before it's modified.
+       * This may get thrown away later.
+       */
+      result = get_lvalue_copy(instructions, op[0]->clone(ctx, NULL));
+
+      ir_rvalue *junk_rvalue;
+      error_emitted =
+         do_assignment(instructions, state,
+                       this->subexpressions[0]->non_lvalue_description,
+                       op[0]->clone(ctx, NULL), temp_rhs,
+                       &junk_rvalue, false, false,
+                       this->subexpressions[0]->get_location());
+
+      break;
+   }
+
+   case ast_field_selection:
+      result = _mesa_ast_field_selection_to_hir(this, instructions, state);
+      break;
+
+   case ast_array_index: {
+      YYLTYPE index_loc = subexpressions[1]->get_location();
+
+      op[0] = subexpressions[0]->hir(instructions, state);
+      op[1] = subexpressions[1]->hir(instructions, state);
+
+      result = _mesa_ast_array_index_to_hir(ctx, state, op[0], op[1],
+                                            loc, index_loc);
+
+      if (result->type->is_error())
+         error_emitted = true;
+
+      break;
+   }
+
+   case ast_function_call:
+      /* Should *NEVER* get here.  ast_function_call should always be handled
+       * by ast_function_expression::hir.
+       */
+      assert(0);
+      break;
+
+   case ast_identifier: {
+      /* ast_identifier can appear in several places in a full abstract syntax
+       * tree.  This particular use must be at the location specified in the
+       * grammar as 'variable_identifier'.
+       */
+      ir_variable *var = 
+         state->symbols->get_variable(this->primary_expression.identifier);
+
+      if (var != NULL) {
+         var->data.used = true;
+         result = new(ctx) ir_dereference_variable(var);
+      } else {
+         _mesa_glsl_error(& loc, state, "`%s' undeclared",
+                          this->primary_expression.identifier);
+
+         result = ir_rvalue::error_value(ctx);
+         error_emitted = true;
+      }
+      break;
+   }
+
+   case ast_int_constant:
+      result = new(ctx) ir_constant(this->primary_expression.int_constant);
+      break;
+
+   case ast_uint_constant:
+      result = new(ctx) ir_constant(this->primary_expression.uint_constant);
+      break;
+
+   case ast_float_constant:
+      result = new(ctx) ir_constant(this->primary_expression.float_constant);
+      break;
+
+   case ast_bool_constant:
+      result = new(ctx) ir_constant(bool(this->primary_expression.bool_constant));
+      break;
+
+   case ast_sequence: {
+      /* It should not be possible to generate a sequence in the AST without
+       * any expressions in it.
+       */
+      assert(!this->expressions.is_empty());
+
+      /* The r-value of a sequence is the last expression in the sequence.  If
+       * the other expressions in the sequence have no side effects (and
+       * therefore add no instructions to the instruction list), they get
+       * dropped on the floor.
+       */
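+      /* E.g. the r-value of `(a, b, c)' is `c'; `a' and `b' contribute only
+       * their side effects.  (Illustrative.)
+       */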
+      exec_node *previous_tail_pred = NULL;
+      YYLTYPE previous_operand_loc = loc;
+
+      foreach_list_typed (ast_node, ast, link, &this->expressions) {
+         /* If one of the operands of comma operator does not generate any
+          * code, we want to emit a warning.  At each pass through the loop
+          * previous_tail_pred will point to the last instruction in the
+          * stream *before* processing the previous operand.  Naturally,
+          * instructions->tail_pred will point to the last instruction in the
+          * stream *after* processing the previous operand.  If the two
+          * pointers match, then the previous operand had no effect.
+          *
+          * The warning behavior here differs slightly from GCC.  GCC will
+          * only emit a warning if none of the left-hand operands have an
+          * effect; when it does warn, it warns once per operand.  I believe that
+          * there are some cases in C (especially with GCC extensions) where
+          * it is useful to have an intermediate step in a sequence have no
+          * effect, but I don't think these cases exist in GLSL.  Either way,
+          * it would be a giant hassle to replicate that behavior.
+          */
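+         /* For example, in `(a, b = 1)' evaluating `a' adds nothing to the
+          * instruction stream, so the warning below fires for it.
+          * (Illustrative.)
+          */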
+         if (previous_tail_pred == instructions->tail_pred) {
+            _mesa_glsl_warning(&previous_operand_loc, state,
+                               "left-hand operand of comma expression has "
+                               "no effect");
+         }
+
+         /* tail_pred is directly accessed instead of using the get_tail()
+          * method for performance reasons.  get_tail() has extra code to
+          * return NULL when the list is empty.  We don't care about that
+          * here, so using tail_pred directly is fine.
+          */
+         previous_tail_pred = instructions->tail_pred;
+         previous_operand_loc = ast->get_location();
+
+         result = ast->hir(instructions, state);
+      }
+
+      /* Any errors should have already been emitted in the loop above.
+       */
+      error_emitted = true;
+      break;
+   }
+   }
+   type = NULL; /* use result->type, not type. */
+   assert(result != NULL || !needs_rvalue);
+
+   if (result && result->type->is_error() && !error_emitted)
+      _mesa_glsl_error(& loc, state, "type mismatch");
+
+   return result;
+}
+
+
+ir_rvalue *
+ast_expression_statement::hir(exec_list *instructions,
+                              struct _mesa_glsl_parse_state *state)
+{
+   /* It is possible to have expression statements that don't have an
+    * expression.  This is the solitary semicolon:
+    *
+    * for (i = 0; i < 5; i++)
+    *     ;
+    *
+    * In this case the expression will be NULL.  Test for NULL and don't do
+    * anything in that case.
+    */
+   if (expression != NULL)
+      expression->hir_no_rvalue(instructions, state);
+
+   /* Statements do not have r-values.
+    */
+   return NULL;
+}
+
+
+ir_rvalue *
+ast_compound_statement::hir(exec_list *instructions,
+                            struct _mesa_glsl_parse_state *state)
+{
+   if (new_scope)
+      state->symbols->push_scope();
+
+   foreach_list_typed (ast_node, ast, link, &this->statements)
+      ast->hir(instructions, state);
+
+   if (new_scope)
+      state->symbols->pop_scope();
+
+   /* Compound statements do not have r-values.
+    */
+   return NULL;
+}
+
+/**
+ * Evaluate the given exec_node (which should be an ast_node representing
+ * a single array dimension) and return its integer value.
+ */
+static unsigned
+process_array_size(exec_node *node,
+                   struct _mesa_glsl_parse_state *state)
+{
+   exec_list dummy_instructions;
+
+   ast_node *array_size = exec_node_data(ast_node, node, link);
+   ir_rvalue *const ir = array_size->hir(& dummy_instructions, state);
+   YYLTYPE loc = array_size->get_location();
+
+   if (ir == NULL) {
+      _mesa_glsl_error(& loc, state,
+                       "array size could not be resolved");
+      return 0;
+   }
+
+   if (!ir->type->is_integer()) {
+      _mesa_glsl_error(& loc, state,
+                       "array size must be integer type");
+      return 0;
+   }
+
+   if (!ir->type->is_scalar()) {
+      _mesa_glsl_error(& loc, state,
+                       "array size must be scalar type");
+      return 0;
+   }
+
+   ir_constant *const size = ir->constant_expression_value();
+   if (size == NULL) {
+      _mesa_glsl_error(& loc, state, "array size must be a "
+                       "constant valued expression");
+      return 0;
+   }
+
+   if (size->value.i[0] <= 0) {
+      _mesa_glsl_error(& loc, state, "array size must be > 0");
+      return 0;
+   }
+
+   assert(size->type == ir->type);
+
+   /* If the array size is const (and we've verified that
+    * it is) then no instructions should have been emitted
+    * when we converted it to HIR. If they were emitted,
+    * then either the array size isn't const after all, or
+    * we are emitting unnecessary instructions.
+    */
+   assert(dummy_instructions.is_empty());
+
+   return size->value.u[0];
+}
+
+static const glsl_type *
+process_array_type(YYLTYPE *loc, const glsl_type *base,
+                   ast_array_specifier *array_specifier,
+                   struct _mesa_glsl_parse_state *state)
+{
+   const glsl_type *array_type = base;
+
+   if (array_specifier != NULL) {
+      if (base->is_array()) {
+
+         /* From page 19 (page 25) of the GLSL 1.20 spec:
+          *
+          * "Only one-dimensional arrays may be declared."
+          */
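+         /* E.g. `float x[2][3];' reaches this point because `base' is
+          * already an array type; without GL_ARB_arrays_of_arrays that is
+          * an error.  (Illustrative.)
+          */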
+         if (!state->ARB_arrays_of_arrays_enable) {
+            _mesa_glsl_error(loc, state,
+                             "invalid array of `%s'"
+                             "GL_ARB_arrays_of_arrays "
+                             "required for defining arrays of arrays",
+                             base->name);
+            return glsl_type::error_type;
+         }
+
+         if (base->length == 0) {
+            _mesa_glsl_error(loc, state,
+                             "only the outermost array dimension can "
+                             "be unsized",
+                             base->name);
+            return glsl_type::error_type;
+         }
+      }
+
+      for (exec_node *node = array_specifier->array_dimensions.tail_pred;
+           !node->is_head_sentinel(); node = node->prev) {
+         unsigned array_size = process_array_size(node, state);
+         array_type = glsl_type::get_array_instance(array_type, array_size);
+      }
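+      /* Note: dimensions are walked innermost-first, so `int a[2][3]'
+       * builds array(array(int, 3), 2).  (Illustrative.) */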
+
+      if (array_specifier->is_unsized_array)
+         array_type = glsl_type::get_array_instance(array_type, 0);
+   }
+
+   return array_type;
+}
+
+
+const glsl_type *
+ast_type_specifier::glsl_type(const char **name,
+                              struct _mesa_glsl_parse_state *state) const
+{
+   const struct glsl_type *type;
+
+   type = state->symbols->get_type(this->type_name);
+   *name = this->type_name;
+
+   YYLTYPE loc = this->get_location();
+   type = process_array_type(&loc, type, this->array_specifier, state);
+
+   return type;
+}
+
+const glsl_type *
+ast_fully_specified_type::glsl_type(const char **name,
+                                    struct _mesa_glsl_parse_state *state) const
+{
+   const struct glsl_type *type = this->specifier->glsl_type(name, state);
+
+   if (type == NULL)
+      return NULL;
+
+   if (type->base_type == GLSL_TYPE_FLOAT
+       && state->es_shader
+       && state->stage == MESA_SHADER_FRAGMENT
+       && this->qualifier.precision == ast_precision_none
+       && state->symbols->get_variable("#default precision") == NULL) {
+      YYLTYPE loc = this->get_location();
+      _mesa_glsl_error(&loc, state,
+                       "no precision specified this scope for type `%s'",
+                       type->name);
+   }
+
+   return type;
+}
+
+/**
+ * Determine whether a toplevel variable declaration declares a varying.  This
+ * function operates by examining the variable's mode and the shader target,
+ * so it correctly identifies linkage variables regardless of whether they are
+ * declared using the deprecated "varying" syntax or the new "in/out" syntax.
+ *
+ * Passing a non-toplevel variable declaration (e.g. a function parameter) to
+ * this function will produce undefined results.
+ */
+static bool
+is_varying_var(ir_variable *var, gl_shader_stage target)
+{
+   switch (target) {
+   case MESA_SHADER_VERTEX:
+      return var->data.mode == ir_var_shader_out;
+   case MESA_SHADER_FRAGMENT:
+      return var->data.mode == ir_var_shader_in;
+   default:
+      return var->data.mode == ir_var_shader_out ||
+             var->data.mode == ir_var_shader_in;
+   }
+}
+
+
+/**
+ * Matrix layout qualifiers are only allowed on certain types
+ */
+static void
+validate_matrix_layout_for_type(struct _mesa_glsl_parse_state *state,
+                                YYLTYPE *loc,
+                                const glsl_type *type,
+                                ir_variable *var)
+{
+   if (var && !var->is_in_uniform_block()) {
+      /* Layout qualifiers may only apply to interface blocks and fields in
+       * them.
+       */
+      _mesa_glsl_error(loc, state,
+                       "uniform block layout qualifiers row_major and "
+                       "column_major may not be applied to variables "
+                       "outside of uniform blocks");
+   } else if (!type->is_matrix() && !type->is_record()) {
+      /* The OpenGL ES 3.0 conformance tests did not originally allow
+       * matrix layout qualifiers on non-matrices.  However, the OpenGL
+       * 4.4 and OpenGL ES 3.0 (revision TBD) specifications were
+       * amended to specifically allow these layouts on all types.  Emit
+       * a warning so that people know their code may not be portable.
+       */
+      _mesa_glsl_warning(loc, state,
+                         "uniform block layout qualifiers row_major and "
+                         "column_major applied to non-matrix types may "
+                         "be rejected by older compilers");
+   } else if (type->is_record()) {
+      /* We allow 'layout(row_major)' on structure types because it's the only
+       * way to get row-major layouts on matrices contained in structures.
+       */
+      _mesa_glsl_warning(loc, state,
+                         "uniform block layout qualifiers row_major and "
+                         "column_major applied to structure types is not "
+                         "strictly conformant and may be rejected by other "
+                         "compilers");
+   }
+}
+
+static bool
+validate_binding_qualifier(struct _mesa_glsl_parse_state *state,
+                           YYLTYPE *loc,
+                           ir_variable *var,
+                           const ast_type_qualifier *qual)
+{
+   if (var->data.mode != ir_var_uniform) {
+      _mesa_glsl_error(loc, state,
+                       "the \"binding\" qualifier only applies to uniforms");
+      return false;
+   }
+
+   if (qual->binding < 0) {
+      _mesa_glsl_error(loc, state, "binding values must be >= 0");
+      return false;
+   }
+
+   const struct gl_context *const ctx = state->ctx;
+   unsigned elements = var->type->is_array() ? var->type->length : 1;
+   unsigned max_index = qual->binding + elements - 1;
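+   /* Worked example: `layout(binding = 2) uniform sampler2D s[4];' gives
+    * elements = 4 and max_index = 5 (bindings 2..5), which the checks
+    * below compare against the per-resource limits.  (Illustrative.) */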
+
+   if (var->type->is_interface()) {
+      /* UBOs.  From page 60 of the GLSL 4.20 specification:
+       * "If the binding point for any uniform block instance is less than zero,
+       *  or greater than or equal to the implementation-dependent maximum
+       *  number of uniform buffer bindings, a compilation error will occur.
+       *  When the binding identifier is used with a uniform block instanced as
+       *  an array of size N, all elements of the array from binding through
+       *  binding + N – 1 must be within this range."
+       *
+       * The implementation-dependent maximum is GL_MAX_UNIFORM_BUFFER_BINDINGS.
+       */
+      if (max_index >= ctx->Const.MaxUniformBufferBindings) {
+         _mesa_glsl_error(loc, state, "layout(binding = %d) for %d UBOs exceeds "
+                          "the maximum number of UBO binding points (%d)",
+                          qual->binding, elements,
+                          ctx->Const.MaxUniformBufferBindings);
+         return false;
+      }
+   } else if (var->type->is_sampler() ||
+              (var->type->is_array() && var->type->fields.array->is_sampler())) {
+      /* Samplers.  From page 63 of the GLSL 4.20 specification:
+       * "If the binding is less than zero, or greater than or equal to the
+       *  implementation-dependent maximum supported number of units, a
+       *  compilation error will occur. When the binding identifier is used
+       *  with an array of size N, all elements of the array from binding
+       *  through binding + N - 1 must be within this range."
+       */
+      unsigned limit = ctx->Const.Program[state->stage].MaxTextureImageUnits;
+
+      if (max_index >= limit) {
+         _mesa_glsl_error(loc, state, "layout(binding = %d) for %d samplers "
+                          "exceeds the maximum number of texture image units "
+                          "(%d)", qual->binding, elements, limit);
+
+         return false;
+      }
+   } else if (var->type->contains_atomic()) {
+      assert(ctx->Const.MaxAtomicBufferBindings <= MAX_COMBINED_ATOMIC_BUFFERS);
+      if (unsigned(qual->binding) >= ctx->Const.MaxAtomicBufferBindings) {
+         _mesa_glsl_error(loc, state, "layout(binding = %d) exceeds the "
+                          " maximum number of atomic counter buffer bindings"
+                          "(%d)", qual->binding,
+                          ctx->Const.MaxAtomicBufferBindings);
+
+         return false;
+      }
+   } else {
+      _mesa_glsl_error(loc, state,
+                       "the \"binding\" qualifier only applies to uniform "
+                       "blocks, samplers, atomic counters, or arrays thereof");
+      return false;
+   }
+
+   return true;
+}
+
+
+static glsl_interp_qualifier
+interpret_interpolation_qualifier(const struct ast_type_qualifier *qual,
+                                  ir_variable_mode mode,
+                                  struct _mesa_glsl_parse_state *state,
+                                  YYLTYPE *loc)
+{
+   glsl_interp_qualifier interpolation;
+   if (qual->flags.q.flat)
+      interpolation = INTERP_QUALIFIER_FLAT;
+   else if (qual->flags.q.noperspective)
+      interpolation = INTERP_QUALIFIER_NOPERSPECTIVE;
+   else if (qual->flags.q.smooth)
+      interpolation = INTERP_QUALIFIER_SMOOTH;
+   else
+      interpolation = INTERP_QUALIFIER_NONE;
+
+   if (interpolation != INTERP_QUALIFIER_NONE) {
+      if (mode != ir_var_shader_in && mode != ir_var_shader_out) {
+         _mesa_glsl_error(loc, state,
+                          "interpolation qualifier `%s' can only be applied to "
+                          "shader inputs or outputs.",
+                          interpolation_string(interpolation));
+
+      }
+
+      if ((state->stage == MESA_SHADER_VERTEX && mode == ir_var_shader_in) ||
+          (state->stage == MESA_SHADER_FRAGMENT && mode == ir_var_shader_out)) {
+         _mesa_glsl_error(loc, state,
+                          "interpolation qualifier `%s' cannot be applied to "
+                          "vertex shader inputs or fragment shader outputs",
+                          interpolation_string(interpolation));
+      }
+   }
+
+   return interpolation;
+}
+
+
+static void
+validate_explicit_location(const struct ast_type_qualifier *qual,
+                           ir_variable *var,
+                           struct _mesa_glsl_parse_state *state,
+                           YYLTYPE *loc)
+{
+   bool fail = false;
+
+   /* Between GL_ARB_explicit_attrib_location and
+    * GL_ARB_separate_shader_objects, the inputs and outputs of any shader
+    * stage can be assigned explicit locations.  The checking here associates
+    * the correct extension with the correct stage's input / output:
+    *
+    *                     input            output
+    *                     -----            ------
+    * vertex              explicit_loc     sso
+    * geometry            sso              sso
+    * fragment            sso              explicit_loc
+    */
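+   /* E.g. `layout(location = 0) in vec4 pos;' in a vertex shader requires
+    * GL_ARB_explicit_attrib_location, while `layout(location = 0) out vec4 c;'
+    * in the same stage requires GL_ARB_separate_shader_objects.
+    * (Illustrative.) */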
+   switch (state->stage) {
+   case MESA_SHADER_VERTEX:
+      if (var->data.mode == ir_var_shader_in) {
+         if (!state->check_explicit_attrib_location_allowed(loc, var))
+            return;
+
+         break;
+      }
+
+      if (var->data.mode == ir_var_shader_out) {
+         if (!state->check_separate_shader_objects_allowed(loc, var))
+            return;
+
+         break;
+      }
+
+      fail = true;
+      break;
+
+   case MESA_SHADER_GEOMETRY:
+      if (var->data.mode == ir_var_shader_in || var->data.mode == ir_var_shader_out) {
+         if (!state->check_separate_shader_objects_allowed(loc, var))
+            return;
+
+         break;
+      }
+
+      fail = true;
+      break;
+
+   case MESA_SHADER_FRAGMENT:
+      if (var->data.mode == ir_var_shader_in) {
+         if (!state->check_separate_shader_objects_allowed(loc, var))
+            return;
+
+         break;
+      }
+
+      if (var->data.mode == ir_var_shader_out) {
+         if (!state->check_explicit_attrib_location_allowed(loc, var))
+            return;
+
+         break;
+      }
+
+      fail = true;
+      break;
+
+   case MESA_SHADER_COMPUTE:
+      _mesa_glsl_error(loc, state,
+                       "compute shader variables cannot be given "
+                       "explicit locations");
+      return;
+   }
+
+   if (fail) {
+      _mesa_glsl_error(loc, state,
+                       "%s cannot be given an explicit location in %s shader",
+                       mode_string(var),
+                       _mesa_shader_stage_to_string(state->stage));
+   } else {
+      var->data.explicit_location = true;
+
+      /* This bit of silliness is needed because invalid explicit locations
+       * are supposed to be flagged during linking.  Small negative values
+       * biased by VERT_ATTRIB_GENERIC0 or FRAG_RESULT_DATA0 could alias
+       * built-in values (e.g., -16+VERT_ATTRIB_GENERIC0 = VERT_ATTRIB_POS).
+       * The linker needs to be able to differentiate these cases.  This
+       * ensures that negative values stay negative.
+       */
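+      /* E.g. a vertex shader input declared `layout(location = 3)' ends up
+       * with var->data.location == 3 + VERT_ATTRIB_GENERIC0.  (Illustrative.) */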
+      if (qual->location >= 0) {
+         switch (state->stage) {
+         case MESA_SHADER_VERTEX:
+            var->data.location = (var->data.mode == ir_var_shader_in)
+               ? (qual->location + VERT_ATTRIB_GENERIC0)
+               : (qual->location + VARYING_SLOT_VAR0);
+            break;
+
+         case MESA_SHADER_GEOMETRY:
+            var->data.location = qual->location + VARYING_SLOT_VAR0;
+            break;
+
+         case MESA_SHADER_FRAGMENT:
+            var->data.location = (var->data.mode == ir_var_shader_out)
+               ? (qual->location + FRAG_RESULT_DATA0)
+               : (qual->location + VARYING_SLOT_VAR0);
+            break;
+         case MESA_SHADER_COMPUTE:
+            assert(!"Unexpected shader type");
+            break;
+         }
+      } else {
+         var->data.location = qual->location;
+      }
+
+      if (qual->flags.q.explicit_index) {
+         /* From the GLSL 4.30 specification, section 4.4.2 (Output
+          * Layout Qualifiers):
+          *
+          * "It is also a compile-time error if a fragment shader
+          *  sets a layout index to less than 0 or greater than 1."
+          *
+          * Older specifications don't mandate a behavior; we take
+          * this as a clarification and always generate the error.
+          */
+         if (qual->index < 0 || qual->index > 1) {
+            _mesa_glsl_error(loc, state,
+                             "explicit index may only be 0 or 1");
+         } else {
+            var->data.explicit_index = true;
+            var->data.index = qual->index;
+         }
+      }
+   }
+}
+
+static void
+apply_image_qualifier_to_variable(const struct ast_type_qualifier *qual,
+                                  ir_variable *var,
+                                  struct _mesa_glsl_parse_state *state,
+                                  YYLTYPE *loc)
+{
+   const glsl_type *base_type =
+      (var->type->is_array() ? var->type->element_type() : var->type);
+
+   if (base_type->is_image()) {
+      if (var->data.mode != ir_var_uniform &&
+          var->data.mode != ir_var_function_in) {
+         _mesa_glsl_error(loc, state, "image variables may only be declared as "
+                          "function parameters or uniform-qualified "
+                          "global variables");
+      }
+
+      var->data.image.read_only |= qual->flags.q.read_only;
+      var->data.image.write_only |= qual->flags.q.write_only;
+      var->data.image.coherent |= qual->flags.q.coherent;
+      var->data.image._volatile |= qual->flags.q._volatile;
+      var->data.image.restrict_flag |= qual->flags.q.restrict_flag;
+      var->data.read_only = true;
+
+      if (qual->flags.q.explicit_image_format) {
+         if (var->data.mode == ir_var_function_in) {
+            _mesa_glsl_error(loc, state, "format qualifiers cannot be "
+                             "used on image function parameters");
+         }
+
+         if (qual->image_base_type != base_type->sampler_type) {
+            _mesa_glsl_error(loc, state, "format qualifier doesn't match the "
+                             "base data type of the image");
+         }
+
+         var->data.image.format = qual->image_format;
+      } else {
+         if (var->data.mode == ir_var_uniform && !qual->flags.q.write_only) {
+            _mesa_glsl_error(loc, state, "uniforms not qualified with "
+                             "`writeonly' must have a format layout "
+                             "qualifier");
+         }
+
+         var->data.image.format = GL_NONE;
+      }
+   }
+}
+
+static inline const char*
+get_layout_qualifier_string(bool origin_upper_left, bool pixel_center_integer)
+{
+   if (origin_upper_left && pixel_center_integer)
+      return "origin_upper_left, pixel_center_integer";
+   else if (origin_upper_left)
+      return "origin_upper_left";
+   else if (pixel_center_integer)
+      return "pixel_center_integer";
+   else
+      return " ";
+}
+
+static inline bool
+is_conflicting_fragcoord_redeclaration(struct _mesa_glsl_parse_state *state,
+                                       const struct ast_type_qualifier *qual)
+{
+   /* If gl_FragCoord was previously declared, and the qualifiers were
+    * different in any way, return true.
+    */
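+   /* E.g. one redeclaration with `layout(origin_upper_left)' followed by a
+    * redeclaration with no layout qualifiers conflicts.  (Illustrative.) */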
+   if (state->fs_redeclares_gl_fragcoord) {
+      return (state->fs_pixel_center_integer != qual->flags.q.pixel_center_integer
+         || state->fs_origin_upper_left != qual->flags.q.origin_upper_left);
+   }
+
+   return false;
+}
+
+static void
+apply_type_qualifier_to_variable(const struct ast_type_qualifier *qual,
+                                 ir_variable *var,
+                                 struct _mesa_glsl_parse_state *state,
+                                 YYLTYPE *loc,
+                                 bool is_parameter)
+{
+   STATIC_ASSERT(sizeof(qual->flags.q) <= sizeof(qual->flags.i));
+
+   if (qual->flags.q.invariant) {
+      if (var->data.used) {
+         _mesa_glsl_error(loc, state,
+                          "variable `%s' may not be redeclared "
+                          "`invariant' after being used",
+                          var->name);
+      } else {
+         var->data.invariant = 1;
+      }
+   }
+
+   if (qual->flags.q.constant || qual->flags.q.attribute
+       || qual->flags.q.uniform
+       || (qual->flags.q.varying && (state->stage == MESA_SHADER_FRAGMENT)))
+      var->data.read_only = 1;
+
+   if (qual->flags.q.centroid)
+      var->data.centroid = 1;
+
+   if (qual->flags.q.sample)
+      var->data.sample = 1;
+
+   if (qual->flags.q.attribute && state->stage != MESA_SHADER_VERTEX) {
+      var->type = glsl_type::error_type;
+      _mesa_glsl_error(loc, state,
+                       "`attribute' variables may not be declared in the "
+                       "%s shader",
+                       _mesa_shader_stage_to_string(state->stage));
+   }
+
+   /* Section 6.1.1 (Function Calling Conventions) of the GLSL 1.10 spec says:
+    *
+    *     "However, the const qualifier cannot be used with out or inout."
+    *
+    * The same section of the GLSL 4.40 spec further clarifies this saying:
+    *
+    *     "The const qualifier cannot be used with out or inout, or a
+    *     compile-time error results."
+    */
+   if (is_parameter && qual->flags.q.constant && qual->flags.q.out) {
+      _mesa_glsl_error(loc, state,
+                       "`const' may not be applied to `out' or `inout' "
+                       "function parameters");
+   }
+
+   /* If there is no qualifier that changes the mode of the variable, leave
+    * the setting alone.
+    */
+   if (qual->flags.q.in && qual->flags.q.out)
+      var->data.mode = ir_var_function_inout;
+   else if (qual->flags.q.in)
+      var->data.mode = is_parameter ? ir_var_function_in : ir_var_shader_in;
+   else if (qual->flags.q.attribute
+            || (qual->flags.q.varying && (state->stage == MESA_SHADER_FRAGMENT)))
+      var->data.mode = ir_var_shader_in;
+   else if (qual->flags.q.out)
+      var->data.mode = is_parameter ? ir_var_function_out : ir_var_shader_out;
+   else if (qual->flags.q.varying && (state->stage == MESA_SHADER_VERTEX))
+      var->data.mode = ir_var_shader_out;
+   else if (qual->flags.q.uniform)
+      var->data.mode = ir_var_uniform;
+
+   if (!is_parameter && is_varying_var(var, state->stage)) {
+      /* User-defined ins/outs are not permitted in compute shaders. */
+      if (state->stage == MESA_SHADER_COMPUTE) {
+         _mesa_glsl_error(loc, state,
+                          "user-defined input and output variables are not "
+                          "permitted in compute shaders");
+      }
+
+      /* This variable is being used to link data between shader stages (in
+       * pre-glsl-1.30 parlance, it's a "varying").  Check that it has a type
+       * that is allowed for such purposes.
+       *
+       * From page 25 (page 31 of the PDF) of the GLSL 1.10 spec:
+       *
+       *     "The varying qualifier can be used only with the data types
+       *     float, vec2, vec3, vec4, mat2, mat3, and mat4, or arrays of
+       *     these."
+       *
+       * This was relaxed in GLSL version 1.30 and GLSL ES version 3.00.  From
+       * page 31 (page 37 of the PDF) of the GLSL 1.30 spec:
+       *
+       *     "Fragment inputs can only be signed and unsigned integers and
+       *     integer vectors, float, floating-point vectors, matrices, or
+       *     arrays of these. Structures cannot be input."
+       *
+       * Similar text exists in the section on vertex shader outputs.
+       *
+       * Similar text exists in the GLSL ES 3.00 spec, except that the GLSL ES
+       * 3.00 spec allows structs as well.  Varying structs are also allowed
+       * in GLSL 1.50.
+       */
+      switch (var->type->get_scalar_type()->base_type) {
+      case GLSL_TYPE_FLOAT:
+         /* Ok in all GLSL versions */
+         break;
+      case GLSL_TYPE_UINT:
+      case GLSL_TYPE_INT:
+         if (state->is_version(130, 300))
+            break;
+         _mesa_glsl_error(loc, state,
+                          "varying variables must be of base type float in %s",
+                          state->get_version_string());
+         break;
+      case GLSL_TYPE_STRUCT:
+         if (state->is_version(150, 300))
+            break;
+         _mesa_glsl_error(loc, state,
+                          "varying variables may not be of type struct");
+         break;
+      default:
+         _mesa_glsl_error(loc, state, "illegal type for a varying variable");
+         break;
+      }
+   }
+
+   if (state->all_invariant && (state->current_function == NULL)) {
+      switch (state->stage) {
+      case MESA_SHADER_VERTEX:
+         if (var->data.mode == ir_var_shader_out)
+            var->data.invariant = true;
+         break;
+      case MESA_SHADER_GEOMETRY:
+         if ((var->data.mode == ir_var_shader_in)
+             || (var->data.mode == ir_var_shader_out))
+            var->data.invariant = true;
+         break;
+      case MESA_SHADER_FRAGMENT:
+         if (var->data.mode == ir_var_shader_in)
+            var->data.invariant = true;
+         break;
+      case MESA_SHADER_COMPUTE:
+         /* Invariance isn't meaningful in compute shaders. */
+         break;
+      }
+   }
+
+   var->data.interpolation =
+      interpret_interpolation_qualifier(qual, (ir_variable_mode) var->data.mode,
+                                        state, loc);
+
+   var->data.pixel_center_integer = qual->flags.q.pixel_center_integer;
+   var->data.origin_upper_left = qual->flags.q.origin_upper_left;
+   if ((qual->flags.q.origin_upper_left || qual->flags.q.pixel_center_integer)
+       && (strcmp(var->name, "gl_FragCoord") != 0)) {
+      const char *const qual_string = (qual->flags.q.origin_upper_left)
+         ? "origin_upper_left" : "pixel_center_integer";
+
+      _mesa_glsl_error(loc, state,
+                       "layout qualifier `%s' can only be applied to "
+                       "fragment shader input `gl_FragCoord'",
+                       qual_string);
+   }
+
+   if (var->name != NULL && strcmp(var->name, "gl_FragCoord") == 0) {
+
+      /* Section 4.3.8.1, page 39 of the GLSL 1.50 spec says:
+       *
+       *    "Within any shader, the first redeclarations of gl_FragCoord
+       *     must appear before any use of gl_FragCoord."
+       *
+       * Generate a compiler error if the above condition is not met by the
+       * fragment shader.
+       */
+      ir_variable *earlier = state->symbols->get_variable("gl_FragCoord");
+      if (earlier != NULL &&
+          earlier->data.used &&
+          !state->fs_redeclares_gl_fragcoord) {
+         _mesa_glsl_error(loc, state,
+                          "gl_FragCoord used before its first redeclaration "
+                          "in fragment shader");
+      }
+
+      /* Make sure all gl_FragCoord redeclarations specify the same layout
+       * qualifiers.
+       */
+      if (is_conflicting_fragcoord_redeclaration(state, qual)) {
+         const char *const qual_string =
+            get_layout_qualifier_string(qual->flags.q.origin_upper_left,
+                                        qual->flags.q.pixel_center_integer);
+
+         const char *const state_string =
+            get_layout_qualifier_string(state->fs_origin_upper_left,
+                                        state->fs_pixel_center_integer);
+
+         _mesa_glsl_error(loc, state,
+                          "gl_FragCoord redeclared with different layout "
+                          "qualifiers (%s) and (%s) ",
+                          state_string,
+                          qual_string);
+      }
+      state->fs_origin_upper_left = qual->flags.q.origin_upper_left;
+      state->fs_pixel_center_integer = qual->flags.q.pixel_center_integer;
+      state->fs_redeclares_gl_fragcoord_with_no_layout_qualifiers =
+         !qual->flags.q.origin_upper_left && !qual->flags.q.pixel_center_integer;
+      state->fs_redeclares_gl_fragcoord =
+         state->fs_origin_upper_left ||
+         state->fs_pixel_center_integer ||
+         state->fs_redeclares_gl_fragcoord_with_no_layout_qualifiers;
+   }
+
+   if (qual->flags.q.explicit_location) {
+      validate_explicit_location(qual, var, state, loc);
+   } else if (qual->flags.q.explicit_index) {
+      _mesa_glsl_error(loc, state, "explicit index requires explicit location");
+   }
+
+   if (qual->flags.q.explicit_binding &&
+       validate_binding_qualifier(state, loc, var, qual)) {
+      var->data.explicit_binding = true;
+      var->data.binding = qual->binding;
+   }
+
+   if (var->type->contains_atomic()) {
+      if (var->data.mode == ir_var_uniform) {
+         if (var->data.explicit_binding) {
+            unsigned *offset =
+               &state->atomic_counter_offsets[var->data.binding];
+
+            if (*offset % ATOMIC_COUNTER_SIZE)
+               _mesa_glsl_error(loc, state,
+                                "misaligned atomic counter offset");
+
+            var->data.atomic.offset = *offset;
+            *offset += var->type->atomic_size();
+
+         } else {
+            _mesa_glsl_error(loc, state,
+                             "atomic counters require explicit binding point");
+         }
+      } else if (var->data.mode != ir_var_function_in) {
+         _mesa_glsl_error(loc, state, "atomic counters may only be declared as "
+                          "function parameters or uniform-qualified "
+                          "global variables");
+      }
+   }
+
+   /* Does the declaration use the deprecated 'attribute' or 'varying'
+    * keywords?
+    */
+   const bool uses_deprecated_qualifier = qual->flags.q.attribute
+      || qual->flags.q.varying;
+
+   /* Is the 'layout' keyword used with parameters that allow relaxed checking?
+    * Many implementations of GL_ARB_fragment_coord_conventions and some
+    * implementations (only Mesa?) of GL_ARB_explicit_attrib_location
+    * allowed the layout qualifier to be used with 'varying' and 'attribute'.
+    * These extensions and all following extensions that add the 'layout'
+    * keyword have been modified to require the use of 'in' or 'out'.
+    *
+    * The following extensions do not allow the deprecated keywords:
+    *
+    *    GL_AMD_conservative_depth
+    *    GL_ARB_conservative_depth
+    *    GL_ARB_gpu_shader5
+    *    GL_ARB_separate_shader_objects
+    *    GL_ARB_tessellation_shader
+    *    GL_ARB_transform_feedback3
+    *    GL_ARB_uniform_buffer_object
+    *
+    * It is unknown whether GL_EXT_shader_image_load_store or GL_NV_gpu_shader5
+    * allow layout with the deprecated keywords.
+    */
+   const bool relaxed_layout_qualifier_checking =
+      state->ARB_fragment_coord_conventions_enable;
+
+   if (qual->has_layout() && uses_deprecated_qualifier) {
+      if (relaxed_layout_qualifier_checking) {
+         _mesa_glsl_warning(loc, state,
+                            "`layout' qualifier may not be used with "
+                            "`attribute' or `varying'");
+      } else {
+         _mesa_glsl_error(loc, state,
+                          "`layout' qualifier may not be used with "
+                          "`attribute' or `varying'");
+      }
+   }
+
+   /* Layout qualifiers for gl_FragDepth, which are enabled by extension
+    * AMD_conservative_depth.
+    */
+   int depth_layout_count = qual->flags.q.depth_any
+      + qual->flags.q.depth_greater
+      + qual->flags.q.depth_less
+      + qual->flags.q.depth_unchanged;
+   if (depth_layout_count > 0
+       && !state->AMD_conservative_depth_enable
+       && !state->ARB_conservative_depth_enable) {
+       _mesa_glsl_error(loc, state,
+                        "extension GL_AMD_conservative_depth or "
+                        "GL_ARB_conservative_depth must be enabled "
+                        "to use depth layout qualifiers");
+   } else if (depth_layout_count > 0
+              && strcmp(var->name, "gl_FragDepth") != 0) {
+       _mesa_glsl_error(loc, state,
+                        "depth layout qualifiers can be applied only to "
+                        "gl_FragDepth");
+   } else if (depth_layout_count > 1
+              && strcmp(var->name, "gl_FragDepth") == 0) {
+      _mesa_glsl_error(loc, state,
+                       "at most one depth layout qualifier can be applied to "
+                       "gl_FragDepth");
+   }
+   if (qual->flags.q.depth_any)
+      var->data.depth_layout = ir_depth_layout_any;
+   else if (qual->flags.q.depth_greater)
+      var->data.depth_layout = ir_depth_layout_greater;
+   else if (qual->flags.q.depth_less)
+      var->data.depth_layout = ir_depth_layout_less;
+   else if (qual->flags.q.depth_unchanged)
+       var->data.depth_layout = ir_depth_layout_unchanged;
+   else
+       var->data.depth_layout = ir_depth_layout_none;
+
+   if (qual->flags.q.std140 ||
+       qual->flags.q.packed ||
+       qual->flags.q.shared) {
+      _mesa_glsl_error(loc, state,
+                       "uniform block layout qualifiers std140, packed, and "
+                       "shared can only be applied to uniform blocks, not "
+                       "members");
+   }
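+   /* E.g. (illustrative): "layout(std140) uniform Block { vec4 v; };" is
+    * accepted, whereas qualifying the member itself, as in
+    * "uniform Block { layout(std140) vec4 v; };", is rejected here.
+    */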
+
+   if (qual->flags.q.row_major || qual->flags.q.column_major) {
+      validate_matrix_layout_for_type(state, loc, var->type, var);
+   }
+
+   if (var->type->contains_image())
+      apply_image_qualifier_to_variable(qual, var, state, loc);
+}
+
+/**
+ * Get the variable that is being redeclared by this declaration
+ *
+ * Semantic checks to verify the validity of the redeclaration are also
+ * performed.  If semantic checks fail, a compilation error will be emitted via
+ * \c _mesa_glsl_error, but a non-\c NULL pointer will still be returned.
+ *
+ * \returns
+ * A pointer to an existing variable in the current scope if the declaration
+ * is a redeclaration, \c NULL otherwise.
+ */
+static ir_variable *
+get_variable_being_redeclared(ir_variable *var, YYLTYPE loc,
+                              struct _mesa_glsl_parse_state *state,
+                              bool allow_all_redeclarations)
+{
+   /* Check if this declaration is actually a re-declaration, either to
+    * resize an array or add qualifiers to an existing variable.
+    *
+    * This is allowed for variables in the current scope, or when at
+    * global scope (for built-ins in the implicit outer scope).
+    */
+   ir_variable *earlier = state->symbols->get_variable(var->name);
+   if (earlier == NULL ||
+       (state->current_function != NULL &&
+        !state->symbols->name_declared_this_scope(var->name))) {
+      return NULL;
+   }
+
+   /* From page 24 (page 30 of the PDF) of the GLSL 1.50 spec,
+    *
+    * "It is legal to declare an array without a size and then
+    *  later re-declare the same name as an array of the same
+    *  type and specify a size."
+    */
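+   /* E.g. (illustrative):
+    *
+    *    float a[];      // unsized declaration
+    *    float a[4];     // legal redeclaration supplying the size
+    */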
+   if (earlier->type->is_unsized_array() && var->type->is_array()
+       && (var->type->element_type() == earlier->type->element_type())) {
+      /* FINISHME: This doesn't match the qualifiers on the two
+       * FINISHME: declarations.  It's not 100% clear whether this is
+       * FINISHME: required or not.
+       */
+
+      const unsigned size = unsigned(var->type->array_size());
+      check_builtin_array_max_size(var->name, size, loc, state);
+      if ((size > 0) && (size <= earlier->data.max_array_access)) {
+         _mesa_glsl_error(& loc, state, "array size must be > %u due to "
+                          "previous access",
+                          earlier->data.max_array_access);
+      }
+
+      earlier->type = var->type;
+      delete var;
+      var = NULL;
+   } else if ((state->ARB_fragment_coord_conventions_enable ||
+              state->is_version(150, 0))
+              && strcmp(var->name, "gl_FragCoord") == 0
+              && earlier->type == var->type
+              && earlier->data.mode == var->data.mode) {
+      /* Allow redeclaration of gl_FragCoord for ARB_fcc layout
+       * qualifiers.
+       */
+      earlier->data.origin_upper_left = var->data.origin_upper_left;
+      earlier->data.pixel_center_integer = var->data.pixel_center_integer;
+
+      /* According to section 4.3.7 of the GLSL 1.30 spec,
+       * the following built-in variables can be redeclared with an
+       * interpolation qualifier:
+       *    * gl_FrontColor
+       *    * gl_BackColor
+       *    * gl_FrontSecondaryColor
+       *    * gl_BackSecondaryColor
+       *    * gl_Color
+       *    * gl_SecondaryColor
+       */
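+      /* E.g. (illustrative): "flat out vec4 gl_FrontColor;" redeclares the
+       * built-in with an interpolation qualifier.
+       */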
+   } else if (state->is_version(130, 0)
+              && (strcmp(var->name, "gl_FrontColor") == 0
+                  || strcmp(var->name, "gl_BackColor") == 0
+                  || strcmp(var->name, "gl_FrontSecondaryColor") == 0
+                  || strcmp(var->name, "gl_BackSecondaryColor") == 0
+                  || strcmp(var->name, "gl_Color") == 0
+                  || strcmp(var->name, "gl_SecondaryColor") == 0)
+              && earlier->type == var->type
+              && earlier->data.mode == var->data.mode) {
+      earlier->data.interpolation = var->data.interpolation;
+
+      /* Layout qualifiers for gl_FragDepth. */
+   } else if ((state->AMD_conservative_depth_enable ||
+               state->ARB_conservative_depth_enable)
+              && strcmp(var->name, "gl_FragDepth") == 0
+              && earlier->type == var->type
+              && earlier->data.mode == var->data.mode) {
+
+      /* From the AMD_conservative_depth spec:
+       *     Within any shader, the first redeclarations of gl_FragDepth
+       *     must appear before any use of gl_FragDepth.
+       */
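+      /* E.g. (illustrative): writing "gl_FragDepth = 0.5;" and only then
+       * redeclaring "layout(depth_any) out float gl_FragDepth;" is an error.
+       */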
+      if (earlier->data.used) {
+         _mesa_glsl_error(&loc, state,
+                          "the first redeclaration of gl_FragDepth "
+                          "must appear before any use of gl_FragDepth");
+      }
+
+      /* Prevent inconsistent redeclaration of depth layout qualifier. */
+      if (earlier->data.depth_layout != ir_depth_layout_none
+          && earlier->data.depth_layout != var->data.depth_layout) {
+         _mesa_glsl_error(&loc, state,
+                          "gl_FragDepth: depth layout is declared here "
+                          "as '%s', but it was previously declared as "
+                          "'%s'",
+                          depth_layout_string(var->data.depth_layout),
+                          depth_layout_string(earlier->data.depth_layout));
+      }
+
+      earlier->data.depth_layout = var->data.depth_layout;
+
+   } else if (allow_all_redeclarations) {
+      if (earlier->data.mode != var->data.mode) {
+         _mesa_glsl_error(&loc, state,
+                          "redeclaration of `%s' with incorrect qualifiers",
+                          var->name);
+      } else if (earlier->type != var->type) {
+         _mesa_glsl_error(&loc, state,
+                          "redeclaration of `%s' has incorrect type",
+                          var->name);
+      }
+   } else {
+      _mesa_glsl_error(&loc, state, "`%s' redeclared", var->name);
+   }
+
+   return earlier;
+}
+
+/**
+ * Generate the IR for an initializer in a variable declaration
+ */
+ir_rvalue *
+process_initializer(ir_variable *var, ast_declaration *decl,
+		    ast_fully_specified_type *type,
+		    exec_list *initializer_instructions,
+		    struct _mesa_glsl_parse_state *state)
+{
+   ir_rvalue *result = NULL;
+
+   YYLTYPE initializer_loc = decl->initializer->get_location();
+
+   /* From page 24 (page 30 of the PDF) of the GLSL 1.10 spec:
+    *
+    *    "All uniform variables are read-only and are initialized either
+    *    directly by an application via API commands, or indirectly by
+    *    OpenGL."
+    */
+   if (var->data.mode == ir_var_uniform) {
+      state->check_version(120, 0, &initializer_loc,
+                           "cannot initialize uniforms");
+   }
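+   /* E.g. (illustrative): "uniform float gain = 1.0;" requires GLSL 1.20 or
+    * later; GLSL ES does not allow initialized uniforms at all.
+    */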
+
+   /* From section 4.1.7 of the GLSL 4.40 spec:
+    *
+    *    "Opaque variables [...] are initialized only through the
+    *     OpenGL API; they cannot be declared with an initializer in a
+    *     shader."
+    */
+   if (var->type->contains_opaque()) {
+      _mesa_glsl_error(& initializer_loc, state,
+                       "cannot initialize opaque variable");
+   }
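+   /* E.g. (illustrative): "uniform sampler2D tex = 0;" is rejected; samplers
+    * and other opaque types are initialized only through the API.
+    */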
+
+   if ((var->data.mode == ir_var_shader_in) && (state->current_function == NULL)) {
+      _mesa_glsl_error(& initializer_loc, state,
+		       "cannot initialize %s shader input / %s",
+		       _mesa_shader_stage_to_string(state->stage),
+		       (state->stage == MESA_SHADER_VERTEX)
+		       ? "attribute" : "varying");
+   }
+
+   /* If the initializer is an ast_aggregate_initializer, recursively store
+    * type information from the LHS into it, so that its hir() function can do
+    * type checking.
+    */
+   if (decl->initializer->oper == ast_aggregate)
+      _mesa_ast_set_aggregate_type(var->type, decl->initializer);
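+   /* E.g. (illustrative, with GL_ARB_shading_language_420pack):
+    *
+    *    vec4 v = { 1.0, 2.0, 3.0, 4.0 };   // C-style aggregate initializer
+    */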
+
+   ir_dereference *const lhs = new(state) ir_dereference_variable(var);
+   ir_rvalue *rhs = decl->initializer->hir(initializer_instructions, state);
+
+   /* Calculate the constant value if this is a const or uniform
+    * declaration.
+    */
+   if (type->qualifier.flags.q.constant
+       || type->qualifier.flags.q.uniform) {
+      ir_rvalue *new_rhs = validate_assignment(state, initializer_loc,
+                                               var->type, rhs, true);
+      if (new_rhs != NULL) {
+         rhs = new_rhs;
+
+         ir_constant *constant_value = rhs->constant_expression_value();
+         if (!constant_value) {
+            /* If ARB_shading_language_420pack is enabled, initializers of
+             * const-qualified local variables do not have to be constant
+             * expressions. Const-qualified global variables must still be
+             * initialized with constant expressions.
+             */
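+            /* E.g. (illustrative): inside a function body,
+             * "const float s = sin(angle);" becomes legal under 420pack.
+             */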
+            if (!state->ARB_shading_language_420pack_enable
+                || state->current_function == NULL) {
+               _mesa_glsl_error(& initializer_loc, state,
+                                "initializer of %s variable `%s' must be a "
+                                "constant expression",
+                                (type->qualifier.flags.q.constant)
+                                ? "const" : "uniform",
+                                decl->identifier);
+               if (var->type->is_numeric()) {
+                  /* Reduce cascading errors. */
+                  var->constant_value = ir_constant::zero(state, var->type);
+               }
+            }
+         } else {
+            rhs = constant_value;
+            var->constant_value = constant_value;
+         }
+      } else {
+         if (var->type->is_numeric()) {
+            /* Reduce cascading errors. */
+            var->constant_value = ir_constant::zero(state, var->type);
+         }
+      }
+   }
+
+   if (rhs && !rhs->type->is_error()) {
+      bool temp = var->data.read_only;
+      if (type->qualifier.flags.q.constant)
+         var->data.read_only = false;
+
+      /* Never emit code to initialize a uniform.
+       */
+      const glsl_type *initializer_type;
+      if (!type->qualifier.flags.q.uniform) {
+         do_assignment(initializer_instructions, state,
+                       NULL,
+                       lhs, rhs,
+                       &result, true,
+                       true,
+                       type->get_location());
+         initializer_type = result->type;
+      } else
+         initializer_type = rhs->type;
+
+      var->constant_initializer = rhs->constant_expression_value();
+      var->data.has_initializer = true;
+
+      /* If the declared variable is an unsized array, it must inherit
+       * its full type from the initializer.  A declaration such as
+       *
+       *     uniform float a[] = float[](1.0, 2.0, 3.0, 3.0);
+       *
+       * becomes
+       *
+       *     uniform float a[4] = float[](1.0, 2.0, 3.0, 3.0);
+       *
+       * The assignment generated in the if-statement (above) will also
+       * automatically handle this case for non-uniforms.
+       *
+       * If the declared variable is not an array, the types must
+       * already match exactly.  As a result, the type assignment
+       * here can be done unconditionally.  For non-uniforms the call
+       * to do_assignment can change the type of the initializer (via
+       * the implicit conversion rules).  For uniforms the initializer
+       * must be a constant expression, and the type of that expression
+       * was validated above.
+       */
+      var->type = initializer_type;
+
+      var->data.read_only = temp;
+   }
+
+   return result;
+}
+
+
+/**
+ * Do additional processing necessary for geometry shader input declarations
+ * (this covers both interface block arrays and bare input variables).
+ */
+static void
+handle_geometry_shader_input_decl(struct _mesa_glsl_parse_state *state,
+                                  YYLTYPE loc, ir_variable *var)
+{
+   unsigned num_vertices = 0;
+   if (state->gs_input_prim_type_specified) {
+      num_vertices = vertices_per_prim(state->in_qualifier->prim_type);
+   }
+
+   /* Geometry shader input variables must be arrays.  Caller should have
+    * reported an error for this.
+    */
+   if (!var->type->is_array()) {
+      assert(state->error);
+
+      /* To avoid cascading failures, short circuit the checks below. */
+      return;
+   }
+
+   if (var->type->is_unsized_array()) {
+      /* Section 4.3.8.1 (Input Layout Qualifiers) of the GLSL 1.50 spec says:
+       *
+       *   All geometry shader input unsized array declarations will be
+       *   sized by an earlier input layout qualifier, when present, as per
+       *   the following table.
+       *
+       * Followed by a table mapping each allowed input layout qualifier to
+       * the corresponding input length.
+       */
+      if (num_vertices != 0)
+         var->type = glsl_type::get_array_instance(var->type->fields.array,
+                                                   num_vertices);
+   } else {
+      /* Section 4.3.8.1 (Input Layout Qualifiers) of the GLSL 1.50 spec
+       * includes the following examples of compile-time errors:
+       *
+       *   // code sequence within one shader...
+       *   in vec4 Color1[];    // size unknown
+       *   ...Color1.length()...// illegal, length() unknown
+       *   in vec4 Color2[2];   // size is 2
+       *   ...Color1.length()...// illegal, Color1 still has no size
+       *   in vec4 Color3[3];   // illegal, input sizes are inconsistent
+       *   layout(lines) in;    // legal, input size is 2, matching
+       *   in vec4 Color4[3];   // illegal, contradicts layout
+       *   ...
+       *
+       * To detect the case illustrated by Color3, we verify that the size of
+       * an explicitly-sized array matches the size of any previously declared
+       * explicitly-sized array.  To detect the case illustrated by Color4, we
+       * verify that the size of an explicitly-sized array is consistent with
+       * any previously declared input layout.
+       */
+      if (num_vertices != 0 && var->type->length != num_vertices) {
+         _mesa_glsl_error(&loc, state,
+                          "geometry shader input size contradicts previously"
+                          " declared layout (size is %u, but layout requires a"
+                          " size of %u)", var->type->length, num_vertices);
+      } else if (state->gs_input_size != 0 &&
+                 var->type->length != state->gs_input_size) {
+         _mesa_glsl_error(&loc, state,
+                          "geometry shader input sizes are "
+                          "inconsistent (size is %u, but a previous "
+                          "declaration has size %u)",
+                          var->type->length, state->gs_input_size);
+      } else {
+         state->gs_input_size = var->type->length;
+      }
+   }
+}
+
+
+void
+validate_identifier(const char *identifier, YYLTYPE loc,
+                    struct _mesa_glsl_parse_state *state)
+{
+   /* From page 15 (page 21 of the PDF) of the GLSL 1.10 spec,
+    *
+    *   "Identifiers starting with "gl_" are reserved for use by
+    *   OpenGL, and may not be declared in a shader as either a
+    *   variable or a function."
+    */
+   if (strncmp(identifier, "gl_", 3) == 0) {
+      _mesa_glsl_error(&loc, state,
+                       "identifier `%s' uses reserved `gl_' prefix",
+                       identifier);
+   } else if (strstr(identifier, "__")) {
+      /* From page 14 (page 20 of the PDF) of the GLSL 1.10
+       * spec:
+       *
+       *     "In addition, all identifiers containing two
+       *      consecutive underscores (__) are reserved as
+       *      possible future keywords."
+       *
+       * The intention is that names containing __ are reserved for internal
+       * use by the implementation, and names prefixed with GL_ are reserved
+       * for use by Khronos.  Names simply containing __ are dangerous to use,
+       * but should be allowed.
+       *
+       * A future version of the GLSL specification will clarify this.
+       */
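+      /* E.g. (illustrative): "float mesa__internal;" only warns, while a
+       * "gl_"-prefixed name such as "vec4 gl_MyVar;" is a hard error above.
+       */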
+      _mesa_glsl_warning(&loc, state,
+                         "identifier `%s' uses reserved `__' string",
+                         identifier);
+   }
+}
+
+
+ir_rvalue *
+ast_declarator_list::hir(exec_list *instructions,
+                         struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+   const struct glsl_type *decl_type;
+   const char *type_name = NULL;
+   ir_rvalue *result = NULL;
+   YYLTYPE loc = this->get_location();
+
+   /* From page 46 (page 52 of the PDF) of the GLSL 1.50 spec:
+    *
+    *     "To ensure that a particular output variable is invariant, it is
+    *     necessary to use the invariant qualifier. It can either be used to
+    *     qualify a previously declared variable as being invariant
+    *
+    *         invariant gl_Position; // make existing gl_Position be invariant"
+    *
+    * In these cases the parser will set the 'invariant' flag in the declarator
+    * list, and the type will be NULL.
+    */
+   if (this->invariant) {
+      assert(this->type == NULL);
+
+      if (state->current_function != NULL) {
+         _mesa_glsl_error(& loc, state,
+                          "all uses of `invariant' keyword must be at global "
+                          "scope");
+      }
+
+      foreach_list_typed (ast_declaration, decl, link, &this->declarations) {
+         assert(decl->array_specifier == NULL);
+         assert(decl->initializer == NULL);
+
+         ir_variable *const earlier =
+            state->symbols->get_variable(decl->identifier);
+         if (earlier == NULL) {
+            _mesa_glsl_error(& loc, state,
+                             "undeclared variable `%s' cannot be marked "
+                             "invariant", decl->identifier);
+         } else if (!is_varying_var(earlier, state->stage)) {
+            _mesa_glsl_error(&loc, state,
+                             "`%s' cannot be marked invariant; interfaces between "
+                             "shader stages only", decl->identifier);
+         } else if (earlier->data.used) {
+            _mesa_glsl_error(& loc, state,
+                            "variable `%s' may not be redeclared "
+                            "`invariant' after being used",
+                            earlier->name);
+         } else {
+            earlier->data.invariant = true;
+         }
+      }
+
+      /* Invariant redeclarations do not have r-values.
+       */
+      return NULL;
+   }
+
+   assert(this->type != NULL);
+   assert(!this->invariant);
+
+   /* The type specifier may contain a structure definition.  Process that
+    * before any of the variable declarations.
+    */
+   (void) this->type->specifier->hir(instructions, state);
+
+   decl_type = this->type->glsl_type(& type_name, state);
+
+   /* An offset-qualified atomic counter declaration sets the default
+    * offset for the next declaration within the same atomic counter
+    * buffer.
+    */
+   if (decl_type && decl_type->contains_atomic()) {
+      if (type->qualifier.flags.q.explicit_binding &&
+          type->qualifier.flags.q.explicit_offset)
+         state->atomic_counter_offsets[type->qualifier.binding] =
+            type->qualifier.offset;
+   }
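+   /* E.g. (illustrative):
+    *
+    *    layout(binding = 0, offset = 8) uniform atomic_uint a;
+    *    layout(binding = 0) uniform atomic_uint b;   // offset 12 implied
+    */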
+
+   if (this->declarations.is_empty()) {
+      /* If there is no structure involved in the program text, there are
+       * three possible scenarios:
+       *
+       * - The program text contained something like 'vec4;'.  This is an
+       *   empty declaration.  It is valid but weird.  Emit a warning.
+       *
+       * - The program text contained something like 'S;' and 'S' is not the
+       *   name of a known structure type.  This is both invalid and weird.
+       *   Emit an error.
+       *
+       * - The program text contained something like 'mediump float;'
+       *   when the programmer probably meant 'precision mediump
+       *   float;' Emit a warning with a description of what they
+       *   probably meant to do.
+       *
+       * Note that if decl_type is NULL and there is a structure involved,
+       * there must have been some sort of error with the structure.  In this
+       * case we assume that an error was already generated on this line of
+       * code for the structure.  There is no need to generate an additional,
+       * confusing error.
+       */
+      assert(this->type->specifier->structure == NULL || decl_type != NULL
+	     || state->error);
+
+      if (decl_type == NULL) {
+         _mesa_glsl_error(&loc, state,
+                          "invalid type `%s' in empty declaration",
+                          type_name);
+      } else if (decl_type->base_type == GLSL_TYPE_ATOMIC_UINT) {
+         /* Empty atomic counter declarations are allowed and useful
+          * to set the default offset qualifier.
+          */
+         return NULL;
+      } else if (this->type->qualifier.precision != ast_precision_none) {
+         if (this->type->specifier->structure != NULL) {
+            _mesa_glsl_error(&loc, state,
+                             "precision qualifiers can't be applied "
+                             "to structures");
+         } else {
+            static const char *const precision_names[] = {
+               "highp",
+               "highp",
+               "mediump",
+               "lowp"
+            };
+
+            _mesa_glsl_warning(&loc, state,
+                               "empty declaration with precision qualifier, "
+                               "to set the default precision, use "
+                               "`precision %s %s;'",
+                               precision_names[this->type->qualifier.precision],
+                               type_name);
+         }
+      } else if (this->type->specifier->structure == NULL) {
+         _mesa_glsl_warning(&loc, state, "empty declaration");
+      }
+   }
+
+   foreach_list_typed (ast_declaration, decl, link, &this->declarations) {
+      const struct glsl_type *var_type;
+      ir_variable *var;
+
+      /* FINISHME: Emit a warning if a variable declaration shadows a
+       * FINISHME: declaration at a higher scope.
+       */
+
+      if ((decl_type == NULL) || decl_type->is_void()) {
+         if (type_name != NULL) {
+            _mesa_glsl_error(& loc, state,
+                             "invalid type `%s' in declaration of `%s'",
+                             type_name, decl->identifier);
+         } else {
+            _mesa_glsl_error(& loc, state,
+                             "invalid type in declaration of `%s'",
+                             decl->identifier);
+         }
+         continue;
+      }
+
+      var_type = process_array_type(&loc, decl_type, decl->array_specifier,
+                                    state);
+
+      var = new(ctx) ir_variable(var_type, decl->identifier, ir_var_auto);
+
+      /* The 'varying in' and 'varying out' qualifiers can only be used with
+       * ARB_geometry_shader4 and EXT_geometry_shader4, which we don't support
+       * yet.
+       */
+      if (this->type->qualifier.flags.q.varying) {
+         if (this->type->qualifier.flags.q.in) {
+            _mesa_glsl_error(& loc, state,
+                             "`varying in' qualifier in declaration of "
+                             "`%s' only valid for geometry shaders using "
+                             "ARB_geometry_shader4 or EXT_geometry_shader4",
+                             decl->identifier);
+         } else if (this->type->qualifier.flags.q.out) {
+            _mesa_glsl_error(& loc, state,
+                             "`varying out' qualifier in declaration of "
+                             "`%s' only valid for geometry shaders using "
+                             "ARB_geometry_shader4 or EXT_geometry_shader4",
+                             decl->identifier);
+         }
+      }
+
+      /* From page 22 (page 28 of the PDF) of the GLSL 1.10 specification;
+       *
+       *     "Global variables can only use the qualifiers const,
+       *     attribute, uniform, or varying. Only one may be
+       *     specified.
+       *
+       *     Local variables can only use the qualifier const."
+       *
+       * This is relaxed in GLSL 1.30 and GLSL ES 3.00.  It is also relaxed by
+       * any extension that adds the 'layout' keyword.
+       */
+      if (!state->is_version(130, 300)
+          && !state->has_explicit_attrib_location()
+          && !state->has_separate_shader_objects()
+          && !state->ARB_fragment_coord_conventions_enable) {
+         if (this->type->qualifier.flags.q.out) {
+            _mesa_glsl_error(& loc, state,
+                             "`out' qualifier in declaration of `%s' "
+                             "only valid for function parameters in %s",
+                             decl->identifier, state->get_version_string());
+         }
+         if (this->type->qualifier.flags.q.in) {
+            _mesa_glsl_error(& loc, state,
+                             "`in' qualifier in declaration of `%s' "
+                             "only valid for function parameters in %s",
+                             decl->identifier, state->get_version_string());
+         }
+         /* FINISHME: Test for other invalid qualifiers. */
+      }
+
+      apply_type_qualifier_to_variable(& this->type->qualifier, var, state,
+				       & loc, false);
+
+      if (this->type->qualifier.flags.q.invariant) {
+         if (!is_varying_var(var, state->stage)) {
+            _mesa_glsl_error(&loc, state,
+                             "`%s' cannot be marked invariant; interfaces between "
+                             "shader stages only", var->name);
+         }
+      }
+
+      if (state->current_function != NULL) {
+         const char *mode = NULL;
+         const char *extra = "";
+
+         /* There is no need to check for 'inout' here because the parser will
+          * only allow that in function parameter lists.
+          */
+         if (this->type->qualifier.flags.q.attribute) {
+            mode = "attribute";
+         } else if (this->type->qualifier.flags.q.uniform) {
+            mode = "uniform";
+         } else if (this->type->qualifier.flags.q.varying) {
+            mode = "varying";
+         } else if (this->type->qualifier.flags.q.in) {
+            mode = "in";
+            extra = " or in function parameter list";
+         } else if (this->type->qualifier.flags.q.out) {
+            mode = "out";
+            extra = " or in function parameter list";
+         }
+
+         if (mode) {
+            _mesa_glsl_error(& loc, state,
+                             "%s variable `%s' must be declared at "
+                             "global scope%s",
+                             mode, var->name, extra);
+         }
+      } else if (var->data.mode == ir_var_shader_in) {
+         var->data.read_only = true;
+
+         if (state->stage == MESA_SHADER_VERTEX) {
+            bool error_emitted = false;
+
+            /* From page 31 (page 37 of the PDF) of the GLSL 1.50 spec:
+             *
+             *    "Vertex shader inputs can only be float, floating-point
+             *    vectors, matrices, signed and unsigned integers and integer
+             *    vectors. Vertex shader inputs can also form arrays of these
+             *    types, but not structures."
+             *
+             * From page 31 (page 37 of the PDF) of the GLSL 1.30 spec:
+             *
+             *    "Vertex shader inputs can only be float, floating-point
+             *    vectors, matrices, signed and unsigned integers and integer
+             *    vectors. They cannot be arrays or structures."
+             *
+             * From page 23 (page 29 of the PDF) of the GLSL 1.20 spec:
+             *
+             *    "The attribute qualifier can be used only with float,
+             *    floating-point vectors, and matrices. Attribute variables
+             *    cannot be declared as arrays or structures."
+             *
+             * From page 33 (page 39 of the PDF) of the GLSL ES 3.00 spec:
+             *
+             *    "Vertex shader inputs can only be float, floating-point
+             *    vectors, matrices, signed and unsigned integers and integer
+             *    vectors. Vertex shader inputs cannot be arrays or
+             *    structures."
+             */
+            const glsl_type *check_type = var->type;
+            while (check_type->is_array())
+               check_type = check_type->element_type();
+
+            switch (check_type->base_type) {
+            case GLSL_TYPE_FLOAT:
+               break;
+            case GLSL_TYPE_UINT:
+            case GLSL_TYPE_INT:
+               if (state->is_version(120, 300))
+                  break;
+            /* FALLTHROUGH */
+            default:
+               _mesa_glsl_error(& loc, state,
+                                "vertex shader input / attribute cannot have "
+                                "type %s`%s'",
+                                var->type->is_array() ? "array of " : "",
+                                check_type->name);
+               error_emitted = true;
+            }
+
+            if (!error_emitted && var->type->is_array() &&
+                !state->check_version(150, 0, &loc,
+                                      "vertex shader input / attribute "
+                                      "cannot have array type")) {
+               error_emitted = true;
+            }
+         } else if (state->stage == MESA_SHADER_GEOMETRY) {
+            /* From section 4.3.4 (Inputs) of the GLSL 1.50 spec:
+             *
+             *     Geometry shader input variables get the per-vertex values
+             *     written out by vertex shader output variables of the same
+             *     names. Since a geometry shader operates on a set of
+             *     vertices, each input varying variable (or input block, see
+             *     interface blocks below) needs to be declared as an array.
+             */
+            if (!var->type->is_array()) {
+               _mesa_glsl_error(&loc, state,
+                                "geometry shader inputs must be arrays");
+            }
+
+            handle_geometry_shader_input_decl(state, loc, var);
+         }
+      }
+
+      /* Integer fragment inputs must be qualified with 'flat'.  In GLSL ES,
+       * so must integer vertex outputs.
+       *
+       * From section 4.3.4 ("Inputs") of the GLSL 1.50 spec:
+       *    "Fragment shader inputs that are signed or unsigned integers or
+       *    integer vectors must be qualified with the interpolation qualifier
+       *    flat."
+       *
+       * From section 4.3.4 ("Input Variables") of the GLSL 3.00 ES spec:
+       *    "Fragment shader inputs that are, or contain, signed or unsigned
+       *    integers or integer vectors must be qualified with the
+       *    interpolation qualifier flat."
+       *
+       * From section 4.3.6 ("Output Variables") of the GLSL 3.00 ES spec:
+       *    "Vertex shader outputs that are, or contain, signed or unsigned
+       *    integers or integer vectors must be qualified with the
+       *    interpolation qualifier flat."
+       *
+       * Note that prior to GLSL 1.50, this requirement applied to vertex
+       * outputs rather than fragment inputs.  That creates problems in the
+       * presence of geometry shaders, so we adopt the GLSL 1.50 rule for all
+       * desktop GL shaders.  For GLSL ES shaders, we follow the spec and
+       * apply the restriction to both vertex outputs and fragment inputs.
+       *
+       * Note also that the desktop GLSL specs are missing the text "or
+       * contain"; this is presumably an oversight, since there is no
+       * reasonable way to interpolate a fragment shader input that contains
+       * an integer.
+       */
+      if (state->is_version(130, 300) &&
+          var->type->contains_integer() &&
+          var->data.interpolation != INTERP_QUALIFIER_FLAT &&
+          ((state->stage == MESA_SHADER_FRAGMENT && var->data.mode == ir_var_shader_in)
+           || (state->stage == MESA_SHADER_VERTEX && var->data.mode == ir_var_shader_out
+               && state->es_shader))) {
+         const char *var_type = (state->stage == MESA_SHADER_VERTEX) ?
+            "vertex output" : "fragment input";
+         _mesa_glsl_error(&loc, state, "if a %s is (or contains) "
+                          "an integer, then it must be qualified with 'flat'",
+                          var_type);
+      }
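+      /* E.g. (illustrative): a fragment shader input declared as
+       * "in ivec2 idx;" is rejected here, while "flat in ivec2 idx;" is
+       * accepted.
+       */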
+
+
+      /* Interpolation qualifiers cannot be applied to 'varying' and
+       * 'centroid varying'.
+       *
+       * From page 29 (page 35 of the PDF) of the GLSL 1.30 spec:
+       *    "interpolation qualifiers may only precede the qualifiers in,
+       *    centroid in, out, or centroid out in a declaration. They do not apply
+       *    to the deprecated storage qualifiers varying or centroid varying."
+       *
+       * These deprecated storage qualifiers do not exist in GLSL ES 3.00.
+       */
+      if (state->is_version(130, 0)
+          && this->type->qualifier.has_interpolation()
+          && this->type->qualifier.flags.q.varying) {
+
+         const char *i = this->type->qualifier.interpolation_string();
+         assert(i != NULL);
+         const char *s;
+         if (this->type->qualifier.flags.q.centroid)
+            s = "centroid varying";
+         else
+            s = "varying";
+
+         _mesa_glsl_error(&loc, state,
+                          "qualifier '%s' cannot be applied to the "
+                          "deprecated storage qualifier '%s'", i, s);
+      }
+
+
+      /* Interpolation qualifiers can only apply to vertex shader outputs and
+       * fragment shader inputs.
+       *
+       * From page 29 (page 35 of the PDF) of the GLSL 1.30 spec:
+       *    "Outputs from a vertex shader (out) and inputs to a fragment
+       *    shader (in) can be further qualified with one or more of these
+       *    interpolation qualifiers"
+       *
+       * From page 31 (page 37 of the PDF) of the GLSL ES 3.00 spec:
+       *    "These interpolation qualifiers may only precede the qualifiers
+       *    in, centroid in, out, or centroid out in a declaration. They do
+       *    not apply to inputs into a vertex shader or outputs from a
+       *    fragment shader."
+       */
+      if (state->is_version(130, 300)
+          && this->type->qualifier.has_interpolation()) {
+
+         const char *i = this->type->qualifier.interpolation_string();
+         assert(i != NULL);
+
+         switch (state->stage) {
+         case MESA_SHADER_VERTEX:
+            if (this->type->qualifier.flags.q.in) {
+               _mesa_glsl_error(&loc, state,
+                                "qualifier '%s' cannot be applied to vertex "
+                                "shader inputs", i);
+            }
+            break;
+         case MESA_SHADER_FRAGMENT:
+            if (this->type->qualifier.flags.q.out) {
+               _mesa_glsl_error(&loc, state,
+                                "qualifier '%s' cannot be applied to fragment "
+                                "shader outputs", i);
+            }
+            break;
+         default:
+            break;
+         }
+      }
+
+
+      /* From section 4.3.4 of the GLSL 1.30 spec:
+       *    "It is an error to use centroid in in a vertex shader."
+       *
+       * From section 4.3.4 of the GLSL ES 3.00 spec:
+       *    "It is an error to use centroid in or interpolation qualifiers in
+       *    a vertex shader input."
+       */
+      if (state->is_version(130, 300)
+          && this->type->qualifier.flags.q.centroid
+          && this->type->qualifier.flags.q.in
+          && state->stage == MESA_SHADER_VERTEX) {
+
+         _mesa_glsl_error(&loc, state,
+                          "'centroid in' cannot be used in a vertex shader");
+      }
+
+      if (state->stage == MESA_SHADER_VERTEX
+          && this->type->qualifier.flags.q.sample
+          && this->type->qualifier.flags.q.in) {
+
+         _mesa_glsl_error(&loc, state,
+                          "'sample in' cannot be used in a vertex shader");
+      }
+
+      /* Section 4.3.6 of the GLSL 1.30 specification states:
+       * "It is an error to use centroid out in a fragment shader."
+       *
+       * The GL_ARB_shading_language_420pack extension specification states:
+       * "It is an error to use auxiliary storage qualifiers or interpolation
+       *  qualifiers on an output in a fragment shader."
+       */
+      if (state->stage == MESA_SHADER_FRAGMENT &&
+          this->type->qualifier.flags.q.out &&
+          this->type->qualifier.has_auxiliary_storage()) {
+         _mesa_glsl_error(&loc, state,
+                          "auxiliary storage qualifiers cannot be used on "
+                          "fragment shader outputs");
+      }
+
+      /* Precision qualifiers exist only in GLSL ES and in desktop GLSL
+       * versions 1.30 and later.
+       */
+      if (this->type->qualifier.precision != ast_precision_none) {
+         state->check_precision_qualifiers_allowed(&loc);
+      }
+
+
+      /* Precision qualifiers apply to floating point, integer and sampler
+       * types.
+       *
+       * Section 4.5.2 (Precision Qualifiers) of the GLSL 1.30 spec says:
+       *    "Any floating point or any integer declaration can have the type
+       *    preceded by one of these precision qualifiers [...] Literal
+       *    constants do not have precision qualifiers. Neither do Boolean
+       *    variables."
+       *
+       * Section 4.5 (Precision and Precision Qualifiers) of the GLSL 1.30
+       * spec also says:
+       *
+       *     "Precision qualifiers are added for code portability with OpenGL
+       *     ES, not for functionality. They have the same syntax as in OpenGL
+       *     ES."
+       *
+       * Section 8 (Built-In Functions) of the GLSL ES 1.00 spec says:
+       *
+       *     "uniform lowp sampler2D sampler;
+       *     highp vec2 coord;
+       *     ...
+       *     lowp vec4 col = texture2D (sampler, coord);
+       *                                            // texture2D returns lowp"
+       *
+       * From this, we infer that GLSL 1.30 (and later) should allow precision
+       * qualifiers on sampler types just like float and integer types.
+       */
+      if (this->type->qualifier.precision != ast_precision_none
+          && !var->type->is_float()
+          && !var->type->is_integer()
+          && !var->type->is_record()
+          && !var->type->is_sampler()
+          && !(var->type->is_array()
+               && (var->type->fields.array->is_float()
+                   || var->type->fields.array->is_integer()))) {
+
+         _mesa_glsl_error(&loc, state,
+                          "precision qualifiers apply only to floating point"
+                          ", integer and sampler types");
+      }
+
+      /* From section 4.1.7 of the GLSL 4.40 spec:
+       *
+       *    "[Opaque types] can only be declared as function
+       *     parameters or uniform-qualified variables."
+       */
+      if (var_type->contains_opaque() &&
+          !this->type->qualifier.flags.q.uniform) {
+         _mesa_glsl_error(&loc, state,
+                          "opaque variables must be declared uniform");
+      }
+
+      /* Process the initializer and add its instructions to a temporary
+       * list.  This list will be added to the instruction stream (below) after
+       * the declaration is added.  This is done because in some cases (such as
+       * redeclarations) the declaration may not actually be added to the
+       * instruction stream.
+       */
+      exec_list initializer_instructions;
+
+      /* Examine var name here since var may get deleted in the next call */
+      bool var_is_gl_id = (strncmp(var->name, "gl_", 3) == 0);
+
+      ir_variable *earlier =
+         get_variable_being_redeclared(var, decl->get_location(), state,
+                                       false /* allow_all_redeclarations */);
+      if (earlier != NULL) {
+         if (var_is_gl_id &&
+             earlier->data.how_declared == ir_var_declared_in_block) {
+            _mesa_glsl_error(&loc, state,
+                             "`%s' has already been redeclared using "
+                             "gl_PerVertex", var->name);
+         }
+         earlier->data.how_declared = ir_var_declared_normally;
+      }
+
+      if (decl->initializer != NULL) {
+         result = process_initializer((earlier == NULL) ? var : earlier,
+                                      decl, this->type,
+                                      &initializer_instructions, state);
+      }
+
+      /* From page 23 (page 29 of the PDF) of the GLSL 1.10 spec:
+       *
+       *     "It is an error to write to a const variable outside of
+       *      its declaration, so they must be initialized when
+       *      declared."
+       */
+      if (this->type->qualifier.flags.q.constant && decl->initializer == NULL) {
+         _mesa_glsl_error(& loc, state,
+                          "const declaration of `%s' must be initialized",
+                          decl->identifier);
+      }
+
+      if (state->es_shader) {
+         const glsl_type *const t = (earlier == NULL)
+            ? var->type : earlier->type;
+
+         if (t->is_unsized_array())
+            /* Section 10.17 of the GLSL ES 1.00 specification states that
+             * unsized array declarations have been removed from the language.
+             * Arrays that are sized using an initializer are still explicitly
+             * sized.  However, GLSL ES 1.00 does not allow array
+             * initializers.  That is only allowed in GLSL ES 3.00.
+             *
+             * Section 4.1.9 (Arrays) of the GLSL ES 3.00 spec says:
+             *
+             *     "An array type can also be formed without specifying a size
+             *     if the definition includes an initializer:
+             *
+             *         float x[] = float[2] (1.0, 2.0);     // declares an array of size 2
+             *         float y[] = float[] (1.0, 2.0, 3.0); // declares an array of size 3
+             *
+             *         float a[5];
+             *         float b[] = a;"
+             */
+            _mesa_glsl_error(& loc, state,
+                             "unsized array declarations are not allowed in "
+                             "GLSL ES");
+      }
+
+      /* If the declaration is not a redeclaration, there are a few additional
+       * semantic checks that must be applied.  In addition, the variable that
+       * was created for the declaration should be added to the IR stream.
+       */
+      if (earlier == NULL) {
+         validate_identifier(decl->identifier, loc, state);
+
+         /* Add the variable to the symbol table.  Note that the initializer's
+          * IR was already processed earlier (though it hasn't been emitted
+          * yet), without the variable in scope.
+          *
+          * This differs from most C-like languages, but it follows the GLSL
+          * specification.  From page 28 (page 34 of the PDF) of the GLSL 1.50
+          * spec:
+          *
+          *     "Within a declaration, the scope of a name starts immediately
+          *     after the initializer if present or immediately after the name
+          *     being declared if not."
+          */
+         if (!state->symbols->add_variable(var)) {
+            YYLTYPE loc = this->get_location();
+            _mesa_glsl_error(&loc, state, "name `%s' already taken in the "
+                             "current scope", decl->identifier);
+            continue;
+         }
+
+         /* Push the variable declaration to the top.  It means that all the
+          * variable declarations will appear in a funny last-to-first order,
+          * but otherwise we run into trouble if a function is prototyped, a
+          * global var is declared, then the function is defined with usage of
+          * the global var.  See glslparsertest's CorrectModule.frag.
+          */
+         instructions->push_head(var);
+      }
+
+      instructions->append_list(&initializer_instructions);
+   }
+
+
+   /* Generally, variable declarations do not have r-values.  However,
+    * one is used for the declaration in
+    *
+    * while (bool b = some_condition()) {
+    *   ...
+    * }
+    *
+    * so we return the rvalue from the last seen declaration here.
+    */
+   return result;
+}
+
+
+ir_rvalue *
+ast_parameter_declarator::hir(exec_list *instructions,
+                              struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+   const struct glsl_type *type;
+   const char *name = NULL;
+   YYLTYPE loc = this->get_location();
+
+   type = this->type->glsl_type(& name, state);
+
+   if (type == NULL) {
+      if (name != NULL) {
+         _mesa_glsl_error(& loc, state,
+                          "invalid type `%s' in declaration of `%s'",
+                          name, this->identifier);
+      } else {
+         _mesa_glsl_error(& loc, state,
+                          "invalid type in declaration of `%s'",
+                          this->identifier);
+      }
+
+      type = glsl_type::error_type;
+   }
+
+   /* From page 62 (page 68 of the PDF) of the GLSL 1.50 spec:
+    *
+    *    "Functions that accept no input arguments need not use void in the
+    *    argument list because prototypes (or definitions) are required and
+    *    therefore there is no ambiguity when an empty argument list "( )" is
+    *    declared. The idiom "(void)" as a parameter list is provided for
+    *    convenience."
+    *
+    * Placing this check here prevents a void parameter from being set up
+    * for a function, which avoids tripping up checks for main taking
+    * parameters and lookups of an unnamed symbol.
+    */
+   if (type->is_void()) {
+      if (this->identifier != NULL)
+         _mesa_glsl_error(& loc, state,
+                          "named parameter cannot have type `void'");
+
+      is_void = true;
+      return NULL;
+   }
+
+   if (formal_parameter && (this->identifier == NULL)) {
+      _mesa_glsl_error(& loc, state, "formal parameter lacks a name");
+      return NULL;
+   }
+
+   /* This only handles "vec4 foo[..]".  The earlier specifier->glsl_type(...)
+    * call already handled the "vec4[..] foo" case.
+    */
+   type = process_array_type(&loc, type, this->array_specifier, state);
+
+   if (!type->is_error() && type->is_unsized_array()) {
+      _mesa_glsl_error(&loc, state, "arrays passed as parameters must have "
+                       "a declared size");
+      type = glsl_type::error_type;
+   }
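+   /* E.g. (illustrative): "void f(float data[]);" is rejected, while
+    * "void f(float data[4]);" is accepted.
+    */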
+
+   is_void = false;
+   ir_variable *var = new(ctx)
+      ir_variable(type, this->identifier, ir_var_function_in);
+
+   /* Apply any specified qualifiers to the parameter declaration.  Note that
+    * for function parameters the default mode is 'in'.
+    */
+   apply_type_qualifier_to_variable(& this->type->qualifier, var, state, & loc,
+                                    true);
+
+   /* From section 4.1.7 of the GLSL 4.40 spec:
+    *
+    *   "Opaque variables cannot be treated as l-values; hence cannot
+    *    be used as out or inout function parameters, nor can they be
+    *    assigned into."
+    */
+   if ((var->data.mode == ir_var_function_inout || var->data.mode == ir_var_function_out)
+       && type->contains_opaque()) {
+      _mesa_glsl_error(&loc, state, "out and inout parameters cannot "
+                       "contain opaque variables");
+      type = glsl_type::error_type;
+   }
+
+   /* From page 39 (page 45 of the PDF) of the GLSL 1.10 spec:
+    *
+    *    "When calling a function, expressions that do not evaluate to
+    *     l-values cannot be passed to parameters declared as out or inout."
+    *
+    * From page 32 (page 38 of the PDF) of the GLSL 1.10 spec:
+    *
+    *    "Other binary or unary expressions, non-dereferenced arrays,
+    *     function names, swizzles with repeated fields, and constants
+    *     cannot be l-values."
+    *
+    * So for GLSL 1.10, passing an array as an out or inout parameter is not
+    * allowed.  This restriction is removed in GLSL 1.20, and in GLSL ES.
+    */
+   if ((var->data.mode == ir_var_function_inout || var->data.mode == ir_var_function_out)
+       && type->is_array()
+       && !state->check_version(120, 100, &loc,
+                                "arrays cannot be out or inout parameters")) {
+      type = glsl_type::error_type;
+   }
+
+   instructions->push_tail(var);
+
+   /* Parameter declarations do not have r-values.
+    */
+   return NULL;
+}
+
+
+void
+ast_parameter_declarator::parameters_to_hir(exec_list *ast_parameters,
+                                            bool formal,
+                                            exec_list *ir_parameters,
+                                            _mesa_glsl_parse_state *state)
+{
+   ast_parameter_declarator *void_param = NULL;
+   unsigned count = 0;
+
+   foreach_list_typed (ast_parameter_declarator, param, link, ast_parameters) {
+      param->formal_parameter = formal;
+      param->hir(ir_parameters, state);
+
+      if (param->is_void)
+         void_param = param;
+
+      count++;
+   }
+
+   if ((void_param != NULL) && (count > 1)) {
+      YYLTYPE loc = void_param->get_location();
+
+      _mesa_glsl_error(& loc, state,
+                       "`void' parameter must be the only parameter");
+   }
+}
+
+
+void
+emit_function(_mesa_glsl_parse_state *state, ir_function *f)
+{
+   /* IR invariants disallow function declarations or definitions
+    * nested within other function definitions.  But there is no
+    * requirement about the relative order of function declarations
+    * and definitions with respect to one another.  So simply insert
+    * the new ir_function block at the end of the toplevel instruction
+    * list.
+    */
+   state->toplevel_ir->push_tail(f);
+}
+
+
+ir_rvalue *
+ast_function::hir(exec_list *instructions,
+                  struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+   ir_function *f = NULL;
+   ir_function_signature *sig = NULL;
+   exec_list hir_parameters;
+
+   const char *const name = identifier;
+
+   /* New functions are always added to the top-level IR instruction stream,
+    * so this instruction list pointer is ignored.  See also emit_function
+    * (called below).
+    */
+   (void) instructions;
+
+   /* From page 21 (page 27 of the PDF) of the GLSL 1.20 spec,
+    *
+    *   "Function declarations (prototypes) cannot occur inside of functions;
+    *   they must be at global scope, or for the built-in functions, outside
+    *   the global scope."
+    *
+    * From page 27 (page 33 of the PDF) of the GLSL ES 1.00.16 spec,
+    *
+    *   "User defined functions may only be defined within the global scope."
+    *
+    * Note that this language does not appear in GLSL 1.10.
+    */
+   if ((state->current_function != NULL) &&
+       state->is_version(120, 100)) {
+      YYLTYPE loc = this->get_location();
+      _mesa_glsl_error(&loc, state,
+		       "declaration of function `%s' not allowed within "
+		       "function body", name);
+   }
+
+   validate_identifier(name, this->get_location(), state);
+
+   /* Convert the list of function parameters to HIR now so that they can be
+    * used below to compare this function's signature with previously seen
+    * signatures for functions with the same name.
+    */
+   ast_parameter_declarator::parameters_to_hir(& this->parameters,
+                                               is_definition,
+                                               & hir_parameters, state);
+
+   const char *return_type_name;
+   const glsl_type *return_type =
+      this->return_type->glsl_type(& return_type_name, state);
+
+   if (!return_type) {
+      YYLTYPE loc = this->get_location();
+      _mesa_glsl_error(&loc, state,
+                       "function `%s' has undeclared return type `%s'",
+                       name, return_type_name);
+      return_type = glsl_type::error_type;
+   }
+
+   /* From page 56 (page 62 of the PDF) of the GLSL 1.30 spec:
+    * "No qualifier is allowed on the return type of a function."
+    */
+   if (this->return_type->has_qualifiers()) {
+      YYLTYPE loc = this->get_location();
+      _mesa_glsl_error(& loc, state,
+                       "function `%s' return type has qualifiers", name);
+   }
+
+   /* Section 6.1 (Function Definitions) of the GLSL 1.20 spec says:
+    *
+    *     "Arrays are allowed as arguments and as the return type. In both
+    *     cases, the array must be explicitly sized."
+    */
+   if (return_type->is_unsized_array()) {
+      YYLTYPE loc = this->get_location();
+      _mesa_glsl_error(& loc, state,
+                       "function `%s' return type array must be explicitly "
+                       "sized", name);
+   }
+
+   /* From section 4.1.7 of the GLSL 4.40 spec:
+    *
+    *    "[Opaque types] can only be declared as function parameters
+    *     or uniform-qualified variables."
+    */
+   if (return_type->contains_opaque()) {
+      YYLTYPE loc = this->get_location();
+      _mesa_glsl_error(&loc, state,
+                       "function `%s' return type can't contain an opaque type",
+                       name);
+   }
+
+   /* Verify that this function's signature either doesn't match a previously
+    * seen signature for a function with the same name, or, if a match is found,
+    * that the previously seen signature does not have an associated definition.
+    */
+   f = state->symbols->get_function(name);
+   if (f != NULL && (state->es_shader || f->has_user_signature())) {
+      sig = f->exact_matching_signature(state, &hir_parameters);
+      if (sig != NULL) {
+         const char *badvar = sig->qualifiers_match(&hir_parameters);
+         if (badvar != NULL) {
+            YYLTYPE loc = this->get_location();
+
+            _mesa_glsl_error(&loc, state, "function `%s' parameter `%s' "
+                             "qualifiers don't match prototype", name, badvar);
+         }
+
+         if (sig->return_type != return_type) {
+            YYLTYPE loc = this->get_location();
+
+            _mesa_glsl_error(&loc, state, "function `%s' return type doesn't "
+                             "match prototype", name);
+         }
+
+         if (sig->is_defined) {
+            if (is_definition) {
+               YYLTYPE loc = this->get_location();
+               _mesa_glsl_error(& loc, state, "function `%s' redefined", name);
+            } else {
+               /* We just encountered a prototype that exactly matches a
+                * function that's already been defined.  This is redundant,
+                * and we should ignore it.
+                */
+               return NULL;
+            }
+         }
+      }
+   } else {
+      f = new(ctx) ir_function(name);
+      if (!state->symbols->add_function(f)) {
+         /* This function name shadows a non-function use of the same name. */
+         YYLTYPE loc = this->get_location();
+
+         _mesa_glsl_error(&loc, state, "function name `%s' conflicts with "
+                          "non-function", name);
+         return NULL;
+      }
+
+      emit_function(state, f);
+   }
+
+   /* Verify the return type of main() */
+   if (strcmp(name, "main") == 0) {
+      if (! return_type->is_void()) {
+         YYLTYPE loc = this->get_location();
+
+         _mesa_glsl_error(& loc, state, "main() must return void");
+      }
+
+      if (!hir_parameters.is_empty()) {
+         YYLTYPE loc = this->get_location();
+
+         _mesa_glsl_error(& loc, state, "main() must not take any parameters");
+      }
+   }
+
+   /* Finish storing the information about this new function in its signature.
+    */
+   if (sig == NULL) {
+      sig = new(ctx) ir_function_signature(return_type);
+      f->add_signature(sig);
+   }
+
+   sig->replace_parameters(&hir_parameters);
+   signature = sig;
+
+   /* Function declarations (prototypes) do not have r-values.
+    */
+   return NULL;
+}
+
+
+ir_rvalue *
+ast_function_definition::hir(exec_list *instructions,
+                             struct _mesa_glsl_parse_state *state)
+{
+   prototype->is_definition = true;
+   prototype->hir(instructions, state);
+
+   ir_function_signature *signature = prototype->signature;
+   if (signature == NULL)
+      return NULL;
+
+   assert(state->current_function == NULL);
+   state->current_function = signature;
+   state->found_return = false;
+
+   /* Duplicate parameters declared in the prototype as concrete variables.
+    * Add these to the symbol table.
+    */
+   state->symbols->push_scope();
+   foreach_list(n, &signature->parameters) {
+      ir_variable *const var = ((ir_instruction *) n)->as_variable();
+
+      assert(var != NULL);
+
+      /* The only way a parameter would "exist" is if two parameters have
+       * the same name.
+       */
+      if (state->symbols->name_declared_this_scope(var->name)) {
+         YYLTYPE loc = this->get_location();
+
+         _mesa_glsl_error(& loc, state, "parameter `%s' redeclared", var->name);
+      } else {
+         state->symbols->add_variable(var);
+      }
+   }
+
+   /* Convert the body of the function to HIR. */
+   this->body->hir(&signature->body, state);
+   signature->is_defined = true;
+
+   state->symbols->pop_scope();
+
+   assert(state->current_function == signature);
+   state->current_function = NULL;
+
+   if (!signature->return_type->is_void() && !state->found_return) {
+      YYLTYPE loc = this->get_location();
+      _mesa_glsl_error(& loc, state, "function `%s' has non-void return type "
+                       "%s, but no return statement",
+                       signature->function_name(),
+                       signature->return_type->name);
+   }
+
+   /* Function definitions do not have r-values.
+    */
+   return NULL;
+}
+
+
+ir_rvalue *
+ast_jump_statement::hir(exec_list *instructions,
+                        struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+
+   switch (mode) {
+   case ast_return: {
+      ir_return *inst;
+      assert(state->current_function);
+
+      if (opt_return_value) {
+         ir_rvalue *ret = opt_return_value->hir(instructions, state);
+
+         /* The returned value can be NULL if the shader says
+          * 'return foo();' and foo() is a function that returns void.
+          *
+          * NOTE: The GLSL spec doesn't say that this is an error.  The type
+          * of the return value is void.  If the return type of the function is
+          * also void, then this should compile without error.  Seriously.
+          */
+         const glsl_type *const ret_type =
+            (ret == NULL) ? glsl_type::void_type : ret->type;
+
+         /* Implicit conversions are not allowed for return values prior to
+          * ARB_shading_language_420pack.
+          */
+         if (state->current_function->return_type != ret_type) {
+            YYLTYPE loc = this->get_location();
+
+            if (state->ARB_shading_language_420pack_enable) {
+               if (!apply_implicit_conversion(state->current_function->return_type,
+                                              ret, state)) {
+                  _mesa_glsl_error(& loc, state,
+                                   "could not implicitly convert return value "
+                                   "to %s, in function `%s'",
+                                   state->current_function->return_type->name,
+                                   state->current_function->function_name());
+               }
+            } else {
+               _mesa_glsl_error(& loc, state,
+                                "`return' with wrong type %s, in function `%s' "
+                                "returning %s",
+                                ret_type->name,
+                                state->current_function->function_name(),
+                                state->current_function->return_type->name);
+            }
+         } else if (state->current_function->return_type->base_type ==
+                    GLSL_TYPE_VOID) {
+            YYLTYPE loc = this->get_location();
+
+            /* The ARB_shading_language_420pack, GLSL ES 3.0, and GLSL 4.20
+             * specs add a clarification:
+             *
+             *    "A void function can only use return without a return argument, even if
+             *     the return argument has void type. Return statements only accept values:
+             *
+             *         void func1() { }
+             *         void func2() { return func1(); } // illegal return statement"
+             */
+            _mesa_glsl_error(& loc, state,
+                             "void functions can only use `return' without a "
+                             "return argument");
+         }
+
+         inst = new(ctx) ir_return(ret);
+      } else {
+         if (state->current_function->return_type->base_type !=
+             GLSL_TYPE_VOID) {
+            YYLTYPE loc = this->get_location();
+
+            _mesa_glsl_error(& loc, state,
+                             "`return' with no value, in function %s returning "
+                             "non-void",
+                             state->current_function->function_name());
+         }
+         inst = new(ctx) ir_return;
+      }
+
+      state->found_return = true;
+      instructions->push_tail(inst);
+      break;
+   }
+
+   case ast_discard:
+      if (state->stage != MESA_SHADER_FRAGMENT) {
+         YYLTYPE loc = this->get_location();
+
+         _mesa_glsl_error(& loc, state,
+                          "`discard' may only appear in a fragment shader");
+      }
+      instructions->push_tail(new(ctx) ir_discard);
+      break;
+
+   case ast_break:
+   case ast_continue:
+      if (mode == ast_continue &&
+          state->loop_nesting_ast == NULL) {
+         YYLTYPE loc = this->get_location();
+
+         _mesa_glsl_error(& loc, state, "continue may only appear in a loop");
+      } else if (mode == ast_break &&
+         state->loop_nesting_ast == NULL &&
+         state->switch_state.switch_nesting_ast == NULL) {
+         YYLTYPE loc = this->get_location();
+
+         _mesa_glsl_error(& loc, state,
+                          "break may only appear in a loop or a switch");
+      } else {
+         /* For a loop, inline the for-loop rest-expression again, since we
+          * don't know where, near the end of the loop body, the normal copy
+          * of it will be placed.  The same goes for the condition of a
+          * do-while loop.
+          */
+         if (state->loop_nesting_ast != NULL &&
+             mode == ast_continue) {
+            if (state->loop_nesting_ast->rest_expression) {
+               state->loop_nesting_ast->rest_expression->hir(instructions,
+                                                             state);
+            }
+            if (state->loop_nesting_ast->mode ==
+                ast_iteration_statement::ast_do_while) {
+               state->loop_nesting_ast->condition_to_hir(instructions, state);
+            }
+         }
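+
+         /* For example, in
+          *
+          *     for (int i = 0; i < 4; i++) { if (i == 2) continue; }
+          *
+          * the "i++" must still run on the `continue' path, which is why a
+          * copy of it was just emitted.
+          */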
+
+         if (state->switch_state.is_switch_innermost &&
+             mode == ast_break) {
+            /* Force break out of switch by setting is_break switch state.
+             */
+            ir_variable *const is_break_var = state->switch_state.is_break_var;
+            ir_dereference_variable *const deref_is_break_var =
+               new(ctx) ir_dereference_variable(is_break_var);
+            ir_constant *const true_val = new(ctx) ir_constant(true);
+            ir_assignment *const set_break_var =
+               new(ctx) ir_assignment(deref_is_break_var, true_val);
+
+            instructions->push_tail(set_break_var);
+         } else {
+            ir_loop_jump *const jump =
+               new(ctx) ir_loop_jump((mode == ast_break)
+                  ? ir_loop_jump::jump_break
+                  : ir_loop_jump::jump_continue);
+            instructions->push_tail(jump);
+         }
+      }
+
+      break;
+   }
+
+   /* Jump instructions do not have r-values.
+    */
+   return NULL;
+}
+
+
+ir_rvalue *
+ast_selection_statement::hir(exec_list *instructions,
+                             struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+
+   ir_rvalue *const condition = this->condition->hir(instructions, state);
+
+   /* From page 66 (page 72 of the PDF) of the GLSL 1.50 spec:
+    *
+    *    "Any expression whose type evaluates to a Boolean can be used as the
+    *    conditional expression bool-expression. Vector types are not accepted
+    *    as the expression to if."
+    *
+    * Both rules are checked together here; violating either one yields the
+    * single "scalar boolean" diagnostic below.
+    */
+   if (!condition->type->is_boolean() || !condition->type->is_scalar()) {
+      YYLTYPE loc = this->condition->get_location();
+
+      _mesa_glsl_error(& loc, state, "if-statement condition must be scalar "
+                       "boolean");
+   }
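+
+   /* For example, "if (ivec2(0, 0))" and "if (bvec2(true, false))" are both
+    * rejected here; a condition such as "x > 0.0" is accepted.
+    */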
+
+   ir_if *const stmt = new(ctx) ir_if(condition);
+
+   if (then_statement != NULL) {
+      state->symbols->push_scope();
+      then_statement->hir(& stmt->then_instructions, state);
+      state->symbols->pop_scope();
+   }
+
+   if (else_statement != NULL) {
+      state->symbols->push_scope();
+      else_statement->hir(& stmt->else_instructions, state);
+      state->symbols->pop_scope();
+   }
+
+   instructions->push_tail(stmt);
+
+   /* if-statements do not have r-values.
+    */
+   return NULL;
+}
+
+
+ir_rvalue *
+ast_switch_statement::hir(exec_list *instructions,
+                          struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+
+   ir_rvalue *const test_expression =
+      this->test_expression->hir(instructions, state);
+
+   /* From page 66 (page 55 of the PDF) of the GLSL 1.50 spec:
+    *
+    *    "The type of init-expression in a switch statement must be a 
+    *     scalar integer." 
+    */
+   if (!test_expression->type->is_scalar() ||
+       !test_expression->type->is_integer()) {
+      YYLTYPE loc = this->test_expression->get_location();
+
+      _mesa_glsl_error(& loc,
+                       state,
+                       "switch-statement expression must be scalar "
+                       "integer");
+   }
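+
+   /* For example, "switch (1.5)" and "switch (uvec2(0u))" are rejected here,
+    * while "switch (i)" with an int or uint "i" is accepted.
+    */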
+
+   /* Track the switch-statement nesting in a stack-like manner.
+    */
+   struct glsl_switch_state saved = state->switch_state;
+
+   state->switch_state.is_switch_innermost = true;
+   state->switch_state.switch_nesting_ast = this;
+   state->switch_state.labels_ht = hash_table_ctor(0, hash_table_pointer_hash,
+                                                   hash_table_pointer_compare);
+   state->switch_state.previous_default = NULL;
+
+   /* Initialize is_fallthru state to false.
+    */
+   ir_rvalue *const is_fallthru_val = new(ctx) ir_constant(false);
+   state->switch_state.is_fallthru_var =
+      new(ctx) ir_variable(glsl_type::bool_type,
+                           "switch_is_fallthru_tmp",
+                           ir_var_temporary);
+   instructions->push_tail(state->switch_state.is_fallthru_var);
+
+   ir_dereference_variable *deref_is_fallthru_var =
+      new(ctx) ir_dereference_variable(state->switch_state.is_fallthru_var);
+   instructions->push_tail(new(ctx) ir_assignment(deref_is_fallthru_var,
+                                                  is_fallthru_val));
+
+   /* Initialize is_break state to false.
+    */
+   ir_rvalue *const is_break_val = new(ctx) ir_constant(false);
+   state->switch_state.is_break_var =
+      new(ctx) ir_variable(glsl_type::bool_type,
+                           "switch_is_break_tmp",
+                           ir_var_temporary);
+   instructions->push_tail(state->switch_state.is_break_var);
+
+   ir_dereference_variable *deref_is_break_var =
+      new(ctx) ir_dereference_variable(state->switch_state.is_break_var);
+   instructions->push_tail(new(ctx) ir_assignment(deref_is_break_var,
+                                                  is_break_val));
+
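+   /* With these temporaries in place, the switch body is lowered roughly as
+    * follows: each case label conditionally sets is_fallthru by comparing
+    * the cached test value against the label, each case body runs under
+    * "if (is_fallthru)", and `break' sets is_break, which in turn clears
+    * is_fallthru at the following cases.
+    */
+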
+   /* Cache test expression.
+    */
+   test_to_hir(instructions, state);
+
+   /* Emit code for body of switch stmt.
+    */
+   body->hir(instructions, state);
+
+   hash_table_dtor(state->switch_state.labels_ht);
+
+   state->switch_state = saved;
+
+   /* Switch statements do not have r-values. */
+   return NULL;
+}
+
+
+void
+ast_switch_statement::test_to_hir(exec_list *instructions,
+                                  struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+
+   /* Cache value of test expression. */
+   ir_rvalue *const test_val =
+      test_expression->hir(instructions, state);
+
+   state->switch_state.test_var = new(ctx) ir_variable(test_val->type,
+                                                       "switch_test_tmp",
+                                                       ir_var_temporary);
+   ir_dereference_variable *deref_test_var =
+      new(ctx) ir_dereference_variable(state->switch_state.test_var);
+
+   instructions->push_tail(state->switch_state.test_var);
+   instructions->push_tail(new(ctx) ir_assignment(deref_test_var, test_val));
+}
+
+
+ir_rvalue *
+ast_switch_body::hir(exec_list *instructions,
+                     struct _mesa_glsl_parse_state *state)
+{
+   if (stmts != NULL)
+      stmts->hir(instructions, state);
+
+   /* Switch bodies do not have r-values. */
+   return NULL;
+}
+
+ir_rvalue *
+ast_case_statement_list::hir(exec_list *instructions,
+                             struct _mesa_glsl_parse_state *state)
+{
+   foreach_list_typed (ast_case_statement, case_stmt, link, & this->cases)
+      case_stmt->hir(instructions, state);
+
+   /* Case statements do not have r-values. */
+   return NULL;
+}
+
+ir_rvalue *
+ast_case_statement::hir(exec_list *instructions,
+                        struct _mesa_glsl_parse_state *state)
+{
+   labels->hir(instructions, state);
+
+   /* Conditionally set fallthru state based on break state. */
+   ir_constant *const false_val = new(state) ir_constant(false);
+   ir_dereference_variable *const deref_is_fallthru_var =
+      new(state) ir_dereference_variable(state->switch_state.is_fallthru_var);
+   ir_dereference_variable *const deref_is_break_var =
+      new(state) ir_dereference_variable(state->switch_state.is_break_var);
+   ir_assignment *const reset_fallthru_on_break =
+      new(state) ir_assignment(deref_is_fallthru_var,
+                               false_val,
+                               deref_is_break_var);
+   instructions->push_tail(reset_fallthru_on_break);
+
+   /* Guard case statements depending on fallthru state. */
+   ir_dereference_variable *const deref_fallthru_guard =
+      new(state) ir_dereference_variable(state->switch_state.is_fallthru_var);
+   ir_if *const test_fallthru = new(state) ir_if(deref_fallthru_guard);
+
+   foreach_list_typed (ast_node, stmt, link, & this->stmts)
+      stmt->hir(& test_fallthru->then_instructions, state);
+
+   instructions->push_tail(test_fallthru);
+
+   /* Case statements do not have r-values. */
+   return NULL;
+}
+
+
+ir_rvalue *
+ast_case_label_list::hir(exec_list *instructions,
+                         struct _mesa_glsl_parse_state *state)
+{
+   foreach_list_typed (ast_case_label, label, link, & this->labels)
+      label->hir(instructions, state);
+
+   /* Case labels do not have r-values. */
+   return NULL;
+}
+
+ir_rvalue *
+ast_case_label::hir(exec_list *instructions,
+                    struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+
+   ir_dereference_variable *deref_fallthru_var =
+      new(ctx) ir_dereference_variable(state->switch_state.is_fallthru_var);
+
+   ir_rvalue *const true_val = new(ctx) ir_constant(true);
+
+   /* If not default case, ... */
+   if (this->test_value != NULL) {
+      /* Conditionally set fallthru state based on
+       * comparison of cached test expression value to case label.
+       */
+      ir_rvalue *const label_rval = this->test_value->hir(instructions, state);
+      ir_constant *label_const = label_rval->constant_expression_value();
+
+      if (!label_const) {
+         YYLTYPE loc = this->test_value->get_location();
+
+         _mesa_glsl_error(& loc, state,
+                          "switch statement case label must be a "
+                          "constant expression");
+
+         /* Stuff a dummy value in to allow processing to continue. */
+         label_const = new(ctx) ir_constant(0);
+      } else {
+         ast_expression *previous_label = (ast_expression *)
+            hash_table_find(state->switch_state.labels_ht,
+                            (void *)(uintptr_t)label_const->value.u[0]);
+
+         if (previous_label) {
+            YYLTYPE loc = this->test_value->get_location();
+            _mesa_glsl_error(& loc, state, "duplicate case value");
+
+            loc = previous_label->get_location();
+            _mesa_glsl_error(& loc, state, "this is the previous case label");
+         } else {
+            hash_table_insert(state->switch_state.labels_ht,
+                              this->test_value,
+                              (void *)(uintptr_t)label_const->value.u[0]);
+         }
+      }
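+
+      /* For example, "switch (i) { case 1: break; case 1: break; }" reports
+       * a duplicate case value here, pointing back at the first label.
+       */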
+
+      ir_dereference_variable *deref_test_var =
+         new(ctx) ir_dereference_variable(state->switch_state.test_var);
+
+      ir_rvalue *const test_cond = new(ctx) ir_expression(ir_binop_all_equal,
+                                                          label_const,
+                                                          deref_test_var);
+
+      ir_assignment *set_fallthru_on_test =
+         new(ctx) ir_assignment(deref_fallthru_var, true_val, test_cond);
+
+      instructions->push_tail(set_fallthru_on_test);
+   } else { /* default case */
+      if (state->switch_state.previous_default) {
+         YYLTYPE loc = this->get_location();
+         _mesa_glsl_error(& loc, state,
+                          "multiple default labels in one switch");
+
+         loc = state->switch_state.previous_default->get_location();
+         _mesa_glsl_error(& loc, state, "this is the first default label");
+      }
+      state->switch_state.previous_default = this;
+
+      /* Set fallthru state. */
+      ir_assignment *set_fallthru =
+         new(ctx) ir_assignment(deref_fallthru_var, true_val);
+
+      instructions->push_tail(set_fallthru);
+   }
+
+   /* Case statements do not have r-values. */
+   return NULL;
+}
+
+void
+ast_iteration_statement::condition_to_hir(exec_list *instructions,
+                                          struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+
+   if (condition != NULL) {
+      ir_rvalue *const cond =
+         condition->hir(instructions, state);
+
+      if ((cond == NULL)
+          || !cond->type->is_boolean() || !cond->type->is_scalar()) {
+         YYLTYPE loc = condition->get_location();
+
+         _mesa_glsl_error(& loc, state,
+                          "loop condition must be scalar boolean");
+      } else {
+         /* As the first code in the loop body, generate a block that looks
+          * like 'if (!condition) break;' as the loop termination condition.
+          */
+         ir_rvalue *const not_cond =
+            new(ctx) ir_expression(ir_unop_logic_not, cond);
+
+         ir_if *const if_stmt = new(ctx) ir_if(not_cond);
+
+         ir_jump *const break_stmt =
+            new(ctx) ir_loop_jump(ir_loop_jump::jump_break);
+
+         if_stmt->then_instructions.push_tail(break_stmt);
+         instructions->push_tail(if_stmt);
+      }
+   }
+}
+
+
+ir_rvalue *
+ast_iteration_statement::hir(exec_list *instructions,
+                             struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+
+   /* For-loops and while-loops start a new scope, but do-while loops do not.
+    */
+   if (mode != ast_do_while)
+      state->symbols->push_scope();
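+
+   /* For example, in "for (int i = 0; i < 4; i++) { }" the induction
+    * variable `i' lives in the loop's own scope and is not visible after
+    * the loop.
+    */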
+
+   if (init_statement != NULL)
+      init_statement->hir(instructions, state);
+
+   ir_loop *const stmt = new(ctx) ir_loop();
+   instructions->push_tail(stmt);
+
+   /* Track the current loop nesting. */
+   ast_iteration_statement *nesting_ast = state->loop_nesting_ast;
+
+   state->loop_nesting_ast = this;
+
+   /* Likewise, indicate that the following code is closest to a loop,
+    * NOT closest to a switch.
+    */
+   bool saved_is_switch_innermost = state->switch_state.is_switch_innermost;
+   state->switch_state.is_switch_innermost = false;
+
+   if (mode != ast_do_while)
+      condition_to_hir(&stmt->body_instructions, state);
+
+   if (body != NULL)
+      body->hir(& stmt->body_instructions, state);
+
+   if (rest_expression != NULL)
+      rest_expression->hir(& stmt->body_instructions, state);
+
+   if (mode == ast_do_while)
+      condition_to_hir(&stmt->body_instructions, state);
+
+   if (mode != ast_do_while)
+      state->symbols->pop_scope();
+
+   /* Restore previous nesting before returning. */
+   state->loop_nesting_ast = nesting_ast;
+   state->switch_state.is_switch_innermost = saved_is_switch_innermost;
+
+   /* Loops do not have r-values.
+    */
+   return NULL;
+}
+
+
+/**
+ * Determine if the given type is valid for establishing a default precision
+ * qualifier.
+ *
+ * From GLSL ES 3.00 section 4.5.4 ("Default Precision Qualifiers"):
+ *
+ *     "The precision statement
+ *
+ *         precision precision-qualifier type;
+ *
+ *     can be used to establish a default precision qualifier. The type field
+ *     can be either int or float or any of the sampler types, and the
+ *     precision-qualifier can be lowp, mediump, or highp."
+ *
+ * GLSL ES 1.00 has similar language.  GLSL 1.30 doesn't allow precision
+ * qualifiers on sampler types, but this seems like an oversight (since the
+ * intention of including these in GLSL 1.30 is to allow compatibility with ES
+ * shaders).  So we allow int, float, and all sampler types regardless of GLSL
+ * version.
+ */
+static bool
+is_valid_default_precision_type(const struct glsl_type *const type)
+{
+   if (type == NULL)
+      return false;
+
+   switch (type->base_type) {
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_FLOAT:
+      /* "int" and "float" are valid, but vectors and matrices are not. */
+      return type->vector_elements == 1 && type->matrix_columns == 1;
+   case GLSL_TYPE_SAMPLER:
+      return true;
+   default:
+      return false;
+   }
+}
+
+
+ir_rvalue *
+ast_type_specifier::hir(exec_list *instructions,
+                        struct _mesa_glsl_parse_state *state)
+{
+   if (this->default_precision == ast_precision_none && this->structure == NULL)
+      return NULL;
+
+   YYLTYPE loc = this->get_location();
+
+   /* If this is a precision statement, check that the type to which it is
+    * applied is either float or int.
+    *
+    * From section 4.5.3 of the GLSL 1.30 spec:
+    *    "The precision statement
+    *       precision precision-qualifier type;
+    *    can be used to establish a default precision qualifier. The type
+    *    field can be either int or float [...].  Any other types or
+    *    qualifiers will result in an error."
+    */
+   if (this->default_precision != ast_precision_none) {
+      if (!state->check_precision_qualifiers_allowed(&loc))
+         return NULL;
+
+      if (this->structure != NULL) {
+         _mesa_glsl_error(&loc, state,
+                          "precision qualifiers do not apply to structures");
+         return NULL;
+      }
+
+      if (this->array_specifier != NULL) {
+         _mesa_glsl_error(&loc, state,
+                          "default precision statements do not apply to "
+                          "arrays");
+         return NULL;
+      }
+
+      const struct glsl_type *const type =
+         state->symbols->get_type(this->type_name);
+      if (!is_valid_default_precision_type(type)) {
+         _mesa_glsl_error(&loc, state,
+                          "default precision statements apply only to "
+                          "float, int, and sampler types");
+         return NULL;
+      }
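+
+      /* For example, "precision mediump float;" and
+       * "precision lowp sampler2D;" pass this check, while
+       * "precision highp vec2;" does not.
+       */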
+
+      if (type->base_type == GLSL_TYPE_FLOAT
+          && state->es_shader
+          && state->stage == MESA_SHADER_FRAGMENT) {
+         /* Section 4.5.3 (Default Precision Qualifiers) of the GLSL ES 1.00
+          * spec says:
+          *
+          *     "The fragment language has no default precision qualifier for
+          *     floating point types."
+          *
+          * As a result, we have to track whether or not default precision has
+          * been specified for float in GLSL ES fragment shaders.
+          *
+          * Earlier in that same section, the spec says:
+          *
+          *     "Non-precision qualified declarations will use the precision
+          *     qualifier specified in the most recent precision statement
+          *     that is still in scope. The precision statement has the same
+          *     scoping rules as variable declarations. If it is declared
+          *     inside a compound statement, its effect stops at the end of
+          *     the innermost statement it was declared in. Precision
+          *     statements in nested scopes override precision statements in
+          *     outer scopes. Multiple precision statements for the same basic
+          *     type can appear inside the same scope, with later statements
+          *     overriding earlier statements within that scope."
+          *
+          * Default precision specifications follow the same scope rules as
+          * variables.  So, we can track the state of the default float
+          * precision in the symbol table, and the rules will just work.  This
+          * is a slight abuse of the symbol table, but it has the semantics
+          * that we want.
+          */
+         ir_variable *const junk =
+            new(state) ir_variable(type, "#default precision",
+                                   ir_var_temporary);
+
+         state->symbols->add_variable(junk);
+      }
+
+      /* FINISHME: Translate precision statements into IR. */
+      return NULL;
+   }
+
+   /* _mesa_ast_set_aggregate_type() sets the <structure> field so that
+    * process_record_constructor() can do type-checking on C-style initializer
+    * expressions of structs, but ast_struct_specifier should only be translated
+    * to HIR if it is declaring the type of a structure.
+    *
+    * The ->is_declaration field is false for initializers of variables
+    * declared separately from the struct's type definition.
+    *
+    *    struct S { ... };              (is_declaration = true)
+    *    struct T { ... } t = { ... };  (is_declaration = true)
+    *    S s = { ... };                 (is_declaration = false)
+    */
+   if (this->structure != NULL && this->structure->is_declaration)
+      return this->structure->hir(instructions, state);
+
+   return NULL;
+}
+
+
+/**
+ * Process a structure or interface block tree into an array of structure fields
+ *
+ * Aside from some syntax differences, structures and interface blocks are
+ * almost identical after parsing.  They are similar enough that the
+ * AST for each can be processed the same way into a set of
+ * \c glsl_struct_field to describe the members.
+ *
+ * If we're processing an interface block, var_mode should be the type of the
+ * interface block (ir_var_shader_in, ir_var_shader_out, or ir_var_uniform).
+ * If we're processing a structure, var_mode should be ir_var_auto.
+ *
+ * \return
+ * The number of fields processed.  A pointer to the array of structure
+ * fields is stored in \c *fields_ret.
+ */
+unsigned
+ast_process_structure_or_interface_block(exec_list *instructions,
+                                         struct _mesa_glsl_parse_state *state,
+                                         exec_list *declarations,
+                                         YYLTYPE &loc,
+                                         glsl_struct_field **fields_ret,
+                                         bool is_interface,
+                                         bool block_row_major,
+                                         bool allow_reserved_names,
+                                         ir_variable_mode var_mode)
+{
+   unsigned decl_count = 0;
+
+   /* Make an initial pass over the list of fields to determine how
+    * many there are.  Each element in this list is an ast_declarator_list.
+    * This means that we actually need to count the number of elements in the
+    * 'declarations' list in each of the elements.
+    */
+   foreach_list_typed (ast_declarator_list, decl_list, link, declarations) {
+      foreach_list_const (decl_ptr, & decl_list->declarations) {
+         decl_count++;
+      }
+   }
+
+   /* Allocate storage for the fields and process the field
+    * declarations.  As the declarations are processed, try to also convert
+    * the types to HIR.  This ensures that structure definitions embedded in
+    * other structure definitions or in interface blocks are processed.
+    */
+   glsl_struct_field *const fields = ralloc_array(state, glsl_struct_field,
+                                                  decl_count);
+
+   unsigned i = 0;
+   foreach_list_typed (ast_declarator_list, decl_list, link, declarations) {
+      const char *type_name;
+
+      decl_list->type->specifier->hir(instructions, state);
+
+      /* Section 10.9 of the GLSL ES 1.00 specification states that
+       * embedded structure definitions have been removed from the language.
+       */
+      if (state->es_shader && decl_list->type->specifier->structure != NULL) {
+         _mesa_glsl_error(&loc, state, "embedded structure definitions are "
+                          "not allowed in GLSL ES 1.00");
+      }
+
+      const glsl_type *decl_type =
+         decl_list->type->glsl_type(& type_name, state);
+
+      foreach_list_typed (ast_declaration, decl, link,
+                          &decl_list->declarations) {
+         if (!allow_reserved_names)
+            validate_identifier(decl->identifier, loc, state);
+
+         /* From section 4.3.9 of the GLSL 4.40 spec:
+          *
+          *    "[In interface blocks] opaque types are not allowed."
+          *
+          * It should be impossible for decl_type to be NULL here.  Cases that
+          * might naturally lead to decl_type being NULL, especially for the
+          * is_interface case, will have resulted in compilation having
+          * already halted due to a syntax error.
+          */
+         const struct glsl_type *field_type =
+            decl_type != NULL ? decl_type : glsl_type::error_type;
+
+         if (is_interface && field_type->contains_opaque()) {
+            YYLTYPE loc = decl_list->get_location();
+            _mesa_glsl_error(&loc, state,
+                             "uniform in non-default uniform block contains "
+                             "opaque variable");
+         }
+
+         if (field_type->contains_atomic()) {
+            /* FINISHME: Add a spec quotation here once updated spec
+             * FINISHME: language is available.  See Khronos bug #10903
+             * FINISHME: on whether atomic counters are allowed in
+             * FINISHME: structures.
+             */
+            YYLTYPE loc = decl_list->get_location();
+            _mesa_glsl_error(&loc, state, "atomic counter in structure or "
+                             "uniform block");
+         }
+
+         if (field_type->contains_image()) {
+            /* FINISHME: Same problem as with atomic counters.
+             * FINISHME: Request clarification from Khronos and add
+             * FINISHME: spec quotation here.
+             */
+            YYLTYPE loc = decl_list->get_location();
+            _mesa_glsl_error(&loc, state,
+                             "image in structure or uniform block");
+         }
+
+         const struct ast_type_qualifier *const qual =
+            & decl_list->type->qualifier;
+         if (qual->flags.q.std140 ||
+             qual->flags.q.packed ||
+             qual->flags.q.shared) {
+            _mesa_glsl_error(&loc, state,
+                             "uniform block layout qualifiers std140, packed, and "
+                             "shared can only be applied to uniform blocks, not "
+                             "members");
+         }
+
+         field_type = process_array_type(&loc, decl_type,
+                                         decl->array_specifier, state);
+         fields[i].type = field_type;
+         fields[i].name = decl->identifier;
+         fields[i].location = -1;
+         fields[i].interpolation =
+            interpret_interpolation_qualifier(qual, var_mode, state, &loc);
+         fields[i].centroid = qual->flags.q.centroid ? 1 : 0;
+         fields[i].sample = qual->flags.q.sample ? 1 : 0;
+
+         if (qual->flags.q.row_major || qual->flags.q.column_major) {
+            if (!qual->flags.q.uniform) {
+               _mesa_glsl_error(&loc, state,
+                                "row_major and column_major can only be "
+                                "applied to uniform interface blocks");
+            } else
+               validate_matrix_layout_for_type(state, &loc, field_type, NULL);
+         }
+
+         if (qual->flags.q.uniform && qual->has_interpolation()) {
+            _mesa_glsl_error(&loc, state,
+                             "interpolation qualifiers cannot be used "
+                             "with uniform interface blocks");
+         }
+
+         if (field_type->is_matrix() ||
+             (field_type->is_array() && field_type->fields.array->is_matrix())) {
+            fields[i].row_major = block_row_major;
+            if (qual->flags.q.row_major)
+               fields[i].row_major = true;
+            else if (qual->flags.q.column_major)
+               fields[i].row_major = false;
+         }
+
+         i++;
+      }
+   }
+
+   assert(i == decl_count);
+
+   *fields_ret = fields;
+   return decl_count;
+}
+
+
+ir_rvalue *
+ast_struct_specifier::hir(exec_list *instructions,
+                          struct _mesa_glsl_parse_state *state)
+{
+   YYLTYPE loc = this->get_location();
+
+   /* Section 4.1.8 (Structures) of the GLSL 1.10 spec says:
+    *
+    *     "Anonymous structures are not supported; so embedded structures must
+    *     have a declarator. A name given to an embedded struct is scoped at
+    *     the same level as the struct it is embedded in."
+    *
+    * The same section of the GLSL 1.20 spec says:
+    *
+    *     "Anonymous structures are not supported. Embedded structures are not
+    *     supported.
+    *
+    *         struct S { float f; };
+    *         struct T {
+    *             S;              // Error: anonymous structures disallowed
+    *             struct { ... }; // Error: embedded structures disallowed
+    *             S s;            // Okay: nested structures with name are allowed
+    *         };"
+    *
+    * The GLSL ES 1.00 and 3.00 specs have similar language and examples.  So,
+    * we allow embedded structures in 1.10 only.
+    */
+   if (state->language_version != 110 && state->struct_specifier_depth != 0)
+      _mesa_glsl_error(&loc, state,
+                       "embedded structure declarations are not allowed");
+
+   state->struct_specifier_depth++;
+
+   glsl_struct_field *fields;
+   unsigned decl_count =
+      ast_process_structure_or_interface_block(instructions,
+                                               state,
+                                               &this->declarations,
+                                               loc,
+                                               &fields,
+                                               false,
+                                               false,
+                                               false /* allow_reserved_names */,
+                                               ir_var_auto);
+
+   validate_identifier(this->name, loc, state);
+
+   const glsl_type *t =
+      glsl_type::get_record_instance(fields, decl_count, this->name);
+
+   if (!state->symbols->add_type(name, t)) {
+      _mesa_glsl_error(& loc, state, "struct `%s' previously defined", name);
+   } else {
+      const glsl_type **s = reralloc(state, state->user_structures,
+                                     const glsl_type *,
+                                     state->num_user_structures + 1);
+      if (s != NULL) {
+         s[state->num_user_structures] = t;
+         state->user_structures = s;
+         state->num_user_structures++;
+      }
+   }
+
+   state->struct_specifier_depth--;
+
+   /* Structure type definitions do not have r-values.
+    */
+   return NULL;
+}
+
+
+/**
+ * Visitor class which detects whether a given interface block has been used.
+ */
+class interface_block_usage_visitor : public ir_hierarchical_visitor
+{
+public:
+   interface_block_usage_visitor(ir_variable_mode mode, const glsl_type *block)
+      : mode(mode), block(block), found(false)
+   {
+   }
+
+   virtual ir_visitor_status visit(ir_dereference_variable *ir)
+   {
+      if (ir->var->data.mode == mode && ir->var->get_interface_type() == block) {
+         found = true;
+         return visit_stop;
+      }
+      return visit_continue;
+   }
+
+   bool usage_found() const
+   {
+      return this->found;
+   }
+
+private:
+   ir_variable_mode mode;
+   const glsl_type *block;
+   bool found;
+};
+
+
+ir_rvalue *
+ast_interface_block::hir(exec_list *instructions,
+                         struct _mesa_glsl_parse_state *state)
+{
+   YYLTYPE loc = this->get_location();
+
+   /* The ast_interface_block has a list of ast_declarator_lists.  We
+    * need to turn those into ir_variables with an association
+    * with this uniform block.
+    */
+   enum glsl_interface_packing packing;
+   if (this->layout.flags.q.shared) {
+      packing = GLSL_INTERFACE_PACKING_SHARED;
+   } else if (this->layout.flags.q.packed) {
+      packing = GLSL_INTERFACE_PACKING_PACKED;
+   } else {
+      /* The default layout is std140.
+       */
+      packing = GLSL_INTERFACE_PACKING_STD140;
+   }
+
+   ir_variable_mode var_mode;
+   const char *iface_type_name;
+   if (this->layout.flags.q.in) {
+      var_mode = ir_var_shader_in;
+      iface_type_name = "in";
+   } else if (this->layout.flags.q.out) {
+      var_mode = ir_var_shader_out;
+      iface_type_name = "out";
+   } else if (this->layout.flags.q.uniform) {
+      var_mode = ir_var_uniform;
+      iface_type_name = "uniform";
+   } else {
+      var_mode = ir_var_auto;
+      iface_type_name = "UNKNOWN";
+      assert(!"interface block layout qualifier not found!");
+   }
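+
+   /* For example, "uniform Transform { mat4 mvp; };" reaches this point
+    * with var_mode == ir_var_uniform, while an "out" block in a vertex
+    * shader uses ir_var_shader_out.
+    */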
+
+   bool redeclaring_per_vertex = strcmp(this->block_name, "gl_PerVertex") == 0;
+   bool block_row_major = this->layout.flags.q.row_major;
+   exec_list declared_variables;
+   glsl_struct_field *fields;
+   unsigned int num_variables =
+      ast_process_structure_or_interface_block(&declared_variables,
+                                               state,
+                                               &this->declarations,
+                                               loc,
+                                               &fields,
+                                               true,
+                                               block_row_major,
+                                               redeclaring_per_vertex,
+                                               var_mode);
+
+   if (!redeclaring_per_vertex)
+      validate_identifier(this->block_name, loc, state);
+
+   const glsl_type *earlier_per_vertex = NULL;
+   if (redeclaring_per_vertex) {
+      /* Find the previous declaration of gl_PerVertex.  If we're redeclaring
+       * the named interface block gl_in, we can find it by looking at the
+       * previous declaration of gl_in.  Otherwise we can find it by looking
+       * at the previous declaration of any of the built-in outputs,
+       * e.g. gl_Position.
+       *
+       * Also check that the instance name and array-ness of the redeclaration
+       * are correct.
+       */
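+
+      /* For example, a geometry shader typically redeclares the input block
+       * as
+       *
+       *     in gl_PerVertex { vec4 gl_Position; } gl_in[];
+       *
+       * while a vertex shader redeclares the output block without an
+       * instance name:
+       *
+       *     out gl_PerVertex { vec4 gl_Position; };
+       */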
+      switch (var_mode) {
+      case ir_var_shader_in:
+         if (ir_variable *earlier_gl_in =
+             state->symbols->get_variable("gl_in")) {
+            earlier_per_vertex = earlier_gl_in->get_interface_type();
+         } else {
+            _mesa_glsl_error(&loc, state,
+                             "redeclaration of gl_PerVertex input not allowed "
+                             "in the %s shader",
+                             _mesa_shader_stage_to_string(state->stage));
+         }
+         if (this->instance_name == NULL ||
+             strcmp(this->instance_name, "gl_in") != 0 || this->array_specifier == NULL) {
+            _mesa_glsl_error(&loc, state,
+                             "gl_PerVertex input must be redeclared as "
+                             "gl_in[]");
+         }
+         break;
+      case ir_var_shader_out:
+         if (ir_variable *earlier_gl_Position =
+             state->symbols->get_variable("gl_Position")) {
+            earlier_per_vertex = earlier_gl_Position->get_interface_type();
+         } else {
+            _mesa_glsl_error(&loc, state,
+                             "redeclaration of gl_PerVertex output not "
+                             "allowed in the %s shader",
+                             _mesa_shader_stage_to_string(state->stage));
+         }
+         if (this->instance_name != NULL) {
+            _mesa_glsl_error(&loc, state,
+                             "gl_PerVertex input may not be redeclared with "
+                             "an instance name");
+         }
+         break;
+      default:
+         _mesa_glsl_error(&loc, state,
+                          "gl_PerVertex must be declared as an input or an "
+                          "output");
+         break;
+      }
+
+      if (earlier_per_vertex == NULL) {
+         /* An error has already been reported.  Bail out to avoid null
+          * dereferences later in this function.
+          */
+         return NULL;
+      }
+
+      /* Copy locations from the old gl_PerVertex interface block. */
+      for (unsigned i = 0; i < num_variables; i++) {
+         int j = earlier_per_vertex->field_index(fields[i].name);
+         if (j == -1) {
+            _mesa_glsl_error(&loc, state,
+                             "redeclaration of gl_PerVertex must be a subset "
+                             "of the built-in members of gl_PerVertex");
+         } else {
+            fields[i].location =
+               earlier_per_vertex->fields.structure[j].location;
+            fields[i].interpolation =
+               earlier_per_vertex->fields.structure[j].interpolation;
+            fields[i].centroid =
+               earlier_per_vertex->fields.structure[j].centroid;
+            fields[i].sample =
+               earlier_per_vertex->fields.structure[j].sample;
+         }
+      }
+
+      /* From section 7.1 ("Built-in Language Variables") of the GLSL 4.10
+       * spec:
+       *
+       *     If a built-in interface block is redeclared, it must appear in
+       *     the shader before any use of any member included in the built-in
+       *     declaration, or a compilation error will result.
+       *
+       * This appears to be a clarification to the behaviour established for
+       * gl_PerVertex by GLSL 1.50, therefore we implement this behaviour
+       * regardless of GLSL version.
+       */
+      interface_block_usage_visitor v(var_mode, earlier_per_vertex);
+      v.run(instructions);
+      if (v.usage_found()) {
+         _mesa_glsl_error(&loc, state,
+                          "redeclaration of a built-in interface block must "
+                          "appear before any use of any member of the "
+                          "interface block");
+      }
+   }
+
+   const glsl_type *block_type =
+      glsl_type::get_interface_instance(fields,
+                                        num_variables,
+                                        packing,
+                                        this->block_name);
+
+   if (!state->symbols->add_interface(block_type->name, block_type, var_mode)) {
+      YYLTYPE loc = this->get_location();
+      _mesa_glsl_error(&loc, state, "interface block `%s' with type `%s' "
+                       "already taken in the current scope",
+                       this->block_name, iface_type_name);
+   }
+
+   /* Since interface blocks cannot contain statements, it should be
+    * impossible for the block to generate any instructions.
+    */
+   assert(declared_variables.is_empty());
+
+   /* From section 4.3.4 (Inputs) of the GLSL 1.50 spec:
+    *
+    *     Geometry shader input variables get the per-vertex values written
+    *     out by vertex shader output variables of the same names. Since a
+    *     geometry shader operates on a set of vertices, each input varying
+    *     variable (or input block, see interface blocks below) needs to be
+    *     declared as an array.
+    */
+   if (state->stage == MESA_SHADER_GEOMETRY && this->array_specifier == NULL &&
+       var_mode == ir_var_shader_in) {
+      _mesa_glsl_error(&loc, state, "geometry shader inputs must be arrays");
+   }
+
+   /* Page 39 (page 45 of the PDF) of section 4.3.7 in the GLSL ES 3.00 spec
+    * says:
+    *
+    *     "If an instance name (instance-name) is used, then it puts all the
+    *     members inside a scope within its own name space, accessed with the
+    *     field selector ( . ) operator (analogously to structures)."
+    */
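+
+   /* For example, given "uniform Transform { mat4 mvp; } xform;", the
+    * member is accessed as "xform.mvp" rather than as a global "mvp".
+    */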
+   if (this->instance_name) {
+      if (redeclaring_per_vertex) {
+         /* When a built-in in an unnamed interface block is redeclared,
+          * get_variable_being_redeclared() calls
+          * check_builtin_array_max_size() to make sure that built-in array
+          * variables aren't redeclared to illegal sizes.  But we're looking
+          * at a redeclaration of a named built-in interface block.  So we
+          * have to manually call check_builtin_array_max_size() for all parts
+          * of the interface that are arrays.
+          */
+         for (unsigned i = 0; i < num_variables; i++) {
+            if (fields[i].type->is_array()) {
+               const unsigned size = fields[i].type->array_size();
+               check_builtin_array_max_size(fields[i].name, size, loc, state);
+            }
+         }
+      } else {
+         validate_identifier(this->instance_name, loc, state);
+      }
+
+      ir_variable *var;
+
+      if (this->array_specifier != NULL) {
+         /* Section 4.3.7 (Interface Blocks) of the GLSL 1.50 spec says:
+          *
+          *     For uniform blocks declared an array, each individual array
+          *     element corresponds to a separate buffer object backing one
+          *     instance of the block. As the array size indicates the number
+          *     of buffer objects needed, uniform block array declarations
+          *     must specify an array size.
+          *
+          * And a few paragraphs later:
+          *
+          *     Geometry shader input blocks must be declared as arrays and
+          *     follow the array declaration and linking rules for all
+          *     geometry shader inputs. All other input and output block
+          *     arrays must specify an array size.
+          *
+          * The upshot of this is that the only circumstance where an
+          * interface array size *doesn't* need to be specified is on a
+          * geometry shader input.
+          */
+         if (this->array_specifier->is_unsized_array &&
+             (state->stage != MESA_SHADER_GEOMETRY || !this->layout.flags.q.in)) {
+            _mesa_glsl_error(&loc, state,
+                             "only geometry shader inputs may be unsized "
+                             "instance block arrays");
+         }
+
+         const glsl_type *block_array_type =
+            process_array_type(&loc, block_type, this->array_specifier, state);
+
+         var = new(state) ir_variable(block_array_type,
+                                      this->instance_name,
+                                      var_mode);
+      } else {
+         var = new(state) ir_variable(block_type,
+                                      this->instance_name,
+                                      var_mode);
+      }
+
+      if (state->stage == MESA_SHADER_GEOMETRY && var_mode == ir_var_shader_in)
+         handle_geometry_shader_input_decl(state, loc, var);
+
+      if (ir_variable *earlier =
+          state->symbols->get_variable(this->instance_name)) {
+         if (!redeclaring_per_vertex) {
+            _mesa_glsl_error(&loc, state, "`%s' redeclared",
+                             this->instance_name);
+         }
+         earlier->data.how_declared = ir_var_declared_normally;
+         earlier->type = var->type;
+         earlier->reinit_interface_type(block_type);
+         delete var;
+      } else {
+         /* Propagate the "binding" keyword into this UBO's fields;
+          * the UBO declaration itself doesn't get an ir_variable unless it
+          * has an instance name.  This is ugly.
+          */
+         var->data.explicit_binding = this->layout.flags.q.explicit_binding;
+         var->data.binding = this->layout.binding;
+
+         state->symbols->add_variable(var);
+         instructions->push_tail(var);
+      }
+   } else {
+      /* In order to have an array size, the block must also be declared with
+       * an instance name.
+       */
+      assert(this->array_specifier == NULL);
+
+      for (unsigned i = 0; i < num_variables; i++) {
+         ir_variable *var =
+            new(state) ir_variable(fields[i].type,
+                                   ralloc_strdup(state, fields[i].name),
+                                   var_mode);
+         var->data.interpolation = fields[i].interpolation;
+         var->data.centroid = fields[i].centroid;
+         var->data.sample = fields[i].sample;
+         var->init_interface_type(block_type);
+
+         if (redeclaring_per_vertex) {
+            ir_variable *earlier =
+               get_variable_being_redeclared(var, loc, state,
+                                             true /* allow_all_redeclarations */);
+            if (strncmp(var->name, "gl_", 3) != 0 || earlier == NULL) {
+               _mesa_glsl_error(&loc, state,
+                                "redeclaration of gl_PerVertex can only "
+                                "include built-in variables");
+            } else if (earlier->data.how_declared == ir_var_declared_normally) {
+               _mesa_glsl_error(&loc, state,
+                                "`%s' has already been redeclared", var->name);
+            } else {
+               earlier->data.how_declared = ir_var_declared_in_block;
+               earlier->reinit_interface_type(block_type);
+            }
+            continue;
+         }
+
+         if (state->symbols->get_variable(var->name) != NULL)
+            _mesa_glsl_error(&loc, state, "`%s' redeclared", var->name);
+
+         /* Propagate the "binding" keyword into this UBO's fields;
+          * the UBO declaration itself doesn't get an ir_variable unless it
+          * has an instance name.  This is ugly.
+          */
+         var->data.explicit_binding = this->layout.flags.q.explicit_binding;
+         var->data.binding = this->layout.binding;
+
+         state->symbols->add_variable(var);
+         instructions->push_tail(var);
+      }
+
+      if (redeclaring_per_vertex && block_type != earlier_per_vertex) {
+         /* From section 7.1 ("Built-in Language Variables") of the GLSL 4.10 spec:
+          *
+          *     It is also a compilation error ... to redeclare a built-in
+          *     block and then use a member from that built-in block that was
+          *     not included in the redeclaration.
+          *
+          * This appears to be a clarification to the behaviour established
+          * for gl_PerVertex by GLSL 1.50, therefore we implement this
+          * behaviour regardless of GLSL version.
+          *
+          * To prevent the shader from using a member that was not included in
+          * the redeclaration, we disable any ir_variables that are still
+          * associated with the old declaration of gl_PerVertex (since we've
+          * already updated all of the variables contained in the new
+          * gl_PerVertex to point to it).
+          *
+          * As a side effect this will prevent
+          * validate_intrastage_interface_blocks() from getting confused and
+          * thinking there are conflicting definitions of gl_PerVertex in the
+          * shader.
+          */
+         foreach_list_safe(node, instructions) {
+            ir_variable *const var = ((ir_instruction *) node)->as_variable();
+            if (var != NULL &&
+                var->get_interface_type() == earlier_per_vertex &&
+                var->data.mode == var_mode) {
+               if (var->data.how_declared == ir_var_declared_normally) {
+                  _mesa_glsl_error(&loc, state,
+                                   "redeclaration of gl_PerVertex cannot "
+                                   "follow a redeclaration of `%s'",
+                                   var->name);
+               }
+               state->symbols->disable_variable(var->name);
+               var->remove();
+            }
+         }
+      }
+   }
+
+   return NULL;
+}
+
+
+ir_rvalue *
+ast_gs_input_layout::hir(exec_list *instructions,
+                         struct _mesa_glsl_parse_state *state)
+{
+   YYLTYPE loc = this->get_location();
+
+   /* If any geometry input layout declaration preceded this one, make sure it
+    * was consistent with this one.
+    */
+   if (state->gs_input_prim_type_specified &&
+       state->in_qualifier->prim_type != this->prim_type) {
+      _mesa_glsl_error(&loc, state,
+                       "geometry shader input layout does not match"
+                       " previous declaration");
+      return NULL;
+   }
+
+   /* If any shader inputs occurred before this declaration and specified an
+    * array size, make sure the size they specified is consistent with the
+    * primitive type.
+    */
+   unsigned num_vertices = vertices_per_prim(this->prim_type);
+   if (state->gs_input_size != 0 && state->gs_input_size != num_vertices) {
+      _mesa_glsl_error(&loc, state,
+                       "this geometry shader input layout implies %u vertices"
+                       " per primitive, but a previous input is declared"
+                       " with size %u", num_vertices, state->gs_input_size);
+      return NULL;
+   }
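+
+   /* For example, "layout(triangles) in;" implies 3 vertices per primitive,
+    * so a preceding "in vec4 color[4];" triggers the error above.
+    */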
+
+   state->gs_input_prim_type_specified = true;
+
+   /* If any shader inputs occurred before this declaration and did not
+    * specify an array size, their size is determined now.
+    */
+   foreach_list (node, instructions) {
+      ir_variable *var = ((ir_instruction *) node)->as_variable();
+      if (var == NULL || var->data.mode != ir_var_shader_in)
+         continue;
+
+      /* Note: gl_PrimitiveIDIn has mode ir_var_shader_in, but it's not an
+       * array; skip it.
+       */
+
+      if (var->type->is_unsized_array()) {
+         if (var->data.max_array_access >= num_vertices) {
+            _mesa_glsl_error(&loc, state,
+                             "this geometry shader input layout implies %u"
+                             " vertices, but an access to element %u of input"
+                             " `%s' already exists", num_vertices,
+                             var->data.max_array_access, var->name);
+         } else {
+            var->type = glsl_type::get_array_instance(var->type->fields.array,
+                                                      num_vertices);
+         }
+      }
+   }
+
+   return NULL;
+}
+
+
+ir_rvalue *
+ast_cs_input_layout::hir(exec_list *instructions,
+                         struct _mesa_glsl_parse_state *state)
+{
+   YYLTYPE loc = this->get_location();
+
+   /* If any compute input layout declaration preceded this one, make sure it
+    * was consistent with this one.
+    */
+   if (state->cs_input_local_size_specified) {
+      for (int i = 0; i < 3; i++) {
+         if (state->cs_input_local_size[i] != this->local_size[i]) {
+            _mesa_glsl_error(&loc, state,
+                             "compute shader input layout does not match"
+                             " previous declaration");
+            return NULL;
+         }
+      }
+   }
+
+   /* From the ARB_compute_shader specification:
+    *
+    *     If the local size of the shader in any dimension is greater
+    *     than the maximum size supported by the implementation for that
+    *     dimension, a compile-time error results.
+    *
+    * It is not clear from the spec how the error should be reported if
+    * the total size of the work group exceeds
+    * MAX_COMPUTE_WORK_GROUP_INVOCATIONS, but it seems reasonable to
+    * report it at compile time as well.
+    */
+   GLuint64 total_invocations = 1;
+   for (int i = 0; i < 3; i++) {
+      if (this->local_size[i] > state->ctx->Const.MaxComputeWorkGroupSize[i]) {
+         _mesa_glsl_error(&loc, state,
+                          "local_size_%c exceeds MAX_COMPUTE_WORK_GROUP_SIZE"
+                          " (%d)", 'x' + i,
+                          state->ctx->Const.MaxComputeWorkGroupSize[i]);
+         break;
+      }
+      total_invocations *= this->local_size[i];
+      if (total_invocations >
+          state->ctx->Const.MaxComputeWorkGroupInvocations) {
+         _mesa_glsl_error(&loc, state,
+                          "product of local_sizes exceeds "
+                          "MAX_COMPUTE_WORK_GROUP_INVOCATIONS (%d)",
+                          state->ctx->Const.MaxComputeWorkGroupInvocations);
+         break;
+      }
+   }
+
+   state->cs_input_local_size_specified = true;
+   for (int i = 0; i < 3; i++)
+      state->cs_input_local_size[i] = this->local_size[i];
+
+   /* We may now declare the built-in constant gl_WorkGroupSize (see
+    * builtin_variable_generator::generate_constants() for why we didn't
+    * declare it earlier).
+    */
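+   /* e.g. (illustrative): "layout(local_size_x = 8, local_size_y = 8) in;"
+    * produces a gl_WorkGroupSize constant of (8, 8, 1), since unspecified
+    * dimensions were defaulted to 1 when the layout was merged.
+    */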
+   ir_variable *var = new(state->symbols)
+      ir_variable(glsl_type::ivec3_type, "gl_WorkGroupSize", ir_var_auto);
+   var->data.how_declared = ir_var_declared_implicitly;
+   var->data.read_only = true;
+   instructions->push_tail(var);
+   state->symbols->add_variable(var);
+   ir_constant_data data;
+   memset(&data, 0, sizeof(data));
+   for (int i = 0; i < 3; i++)
+      data.i[i] = this->local_size[i];
+   var->constant_value = new(var) ir_constant(glsl_type::ivec3_type, &data);
+   var->constant_initializer =
+      new(var) ir_constant(glsl_type::ivec3_type, &data);
+   var->data.has_initializer = true;
+
+   return NULL;
+}
+
+
+static void
+detect_conflicting_assignments(struct _mesa_glsl_parse_state *state,
+                               exec_list *instructions)
+{
+   bool gl_FragColor_assigned = false;
+   bool gl_FragData_assigned = false;
+   bool user_defined_fs_output_assigned = false;
+   ir_variable *user_defined_fs_output = NULL;
+
+   /* It would be nice to have proper location information. */
+   YYLTYPE loc;
+   memset(&loc, 0, sizeof(loc));
+
+   foreach_list(node, instructions) {
+      ir_variable *var = ((ir_instruction *)node)->as_variable();
+
+      if (!var || !var->data.assigned)
+         continue;
+
+      if (strcmp(var->name, "gl_FragColor") == 0)
+         gl_FragColor_assigned = true;
+      else if (strcmp(var->name, "gl_FragData") == 0)
+         gl_FragData_assigned = true;
+      else if (strncmp(var->name, "gl_", 3) != 0) {
+         if (state->stage == MESA_SHADER_FRAGMENT &&
+             var->data.mode == ir_var_shader_out) {
+            user_defined_fs_output_assigned = true;
+            user_defined_fs_output = var;
+         }
+      }
+   }
+
+   /* From the GLSL 1.30 spec:
+    *
+    *     "If a shader statically assigns a value to gl_FragColor, it
+    *      may not assign a value to any element of gl_FragData. If a
+    *      shader statically writes a value to any element of
+    *      gl_FragData, it may not assign a value to
+    *      gl_FragColor. That is, a shader may assign values to either
+    *      gl_FragColor or gl_FragData, but not both. Multiple shaders
+    *      linked together must also consistently write just one of
+    *      these variables.  Similarly, if user declared output
+    *      variables are in use (statically assigned to), then the
+    *      built-in variables gl_FragColor and gl_FragData may not be
+    *      assigned to. These incorrect usages all generate compile
+    *      time errors."
+    */
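+   /* e.g. (hypothetical GLSL): a fragment shader containing both
+    *
+    *    gl_FragColor = vec4(1.0);
+    *    gl_FragData[0] = vec4(1.0);
+    *
+    * triggers the first error below.
+    */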
+   if (gl_FragColor_assigned && gl_FragData_assigned) {
+      _mesa_glsl_error(&loc, state, "fragment shader writes to both "
+                       "`gl_FragColor' and `gl_FragData'");
+   } else if (gl_FragColor_assigned && user_defined_fs_output_assigned) {
+      _mesa_glsl_error(&loc, state, "fragment shader writes to both "
+                       "`gl_FragColor' and `%s'",
+                       user_defined_fs_output->name);
+   } else if (gl_FragData_assigned && user_defined_fs_output_assigned) {
+      _mesa_glsl_error(&loc, state, "fragment shader writes to both "
+                       "`gl_FragData' and `%s'",
+                       user_defined_fs_output->name);
+   }
+}
+
+
+static void
+remove_per_vertex_blocks(exec_list *instructions,
+                         _mesa_glsl_parse_state *state, ir_variable_mode mode)
+{
+   /* Find the gl_PerVertex interface block of the appropriate (in/out) mode,
+    * if it exists in this shader type.
+    */
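+   /* e.g. in a geometry shader the input copy of gl_PerVertex is reached
+    * through gl_in, while the output copy is found via gl_Position.
+    */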
+   const glsl_type *per_vertex = NULL;
+   switch (mode) {
+   case ir_var_shader_in:
+      if (ir_variable *gl_in = state->symbols->get_variable("gl_in"))
+         per_vertex = gl_in->get_interface_type();
+      break;
+   case ir_var_shader_out:
+      if (ir_variable *gl_Position =
+          state->symbols->get_variable("gl_Position")) {
+         per_vertex = gl_Position->get_interface_type();
+      }
+      break;
+   default:
+      assert(!"Unexpected mode");
+      break;
+   }
+
+   /* If we didn't find a built-in gl_PerVertex interface block, then we don't
+    * need to do anything.
+    */
+   if (per_vertex == NULL)
+      return;
+
+   /* If the interface block is used by the shader, then we don't need to do
+    * anything.
+    */
+   interface_block_usage_visitor v(mode, per_vertex);
+   v.run(instructions);
+   if (v.usage_found())
+      return;
+
+   /* Remove any ir_variable declarations that refer to the interface block
+    * we're removing.
+    */
+   foreach_list_safe(node, instructions) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+      if (var != NULL && var->get_interface_type() == per_vertex &&
+          var->data.mode == mode) {
+         state->symbols->disable_variable(var->name);
+         var->remove();
+      }
+   }
+}
diff --git a/icd/intel/compiler/shader/ast_type.cpp b/icd/intel/compiler/shader/ast_type.cpp
new file mode 100644
index 0000000..0ee2c49
--- /dev/null
+++ b/icd/intel/compiler/shader/ast_type.cpp
@@ -0,0 +1,306 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ast.h"
+
+void
+ast_type_specifier::print(void) const
+{
+   if (structure) {
+      structure->print();
+   } else {
+      printf("%s ", type_name);
+   }
+
+   if (array_specifier) {
+      array_specifier->print();
+   }
+}
+
+bool
+ast_fully_specified_type::has_qualifiers() const
+{
+   return this->qualifier.flags.i != 0;
+}
+
+bool ast_type_qualifier::has_interpolation() const
+{
+   return this->flags.q.smooth
+          || this->flags.q.flat
+          || this->flags.q.noperspective;
+}
+
+bool
+ast_type_qualifier::has_layout() const
+{
+   return this->flags.q.origin_upper_left
+          || this->flags.q.pixel_center_integer
+          || this->flags.q.depth_any
+          || this->flags.q.depth_greater
+          || this->flags.q.depth_less
+          || this->flags.q.depth_unchanged
+          || this->flags.q.std140
+          || this->flags.q.shared
+          || this->flags.q.column_major
+          || this->flags.q.row_major
+          || this->flags.q.packed
+          || this->flags.q.explicit_location
+          || this->flags.q.explicit_index
+          || this->flags.q.explicit_binding
+          || this->flags.q.explicit_offset;
+}
+
+bool
+ast_type_qualifier::has_storage() const
+{
+   return this->flags.q.constant
+          || this->flags.q.attribute
+          || this->flags.q.varying
+          || this->flags.q.in
+          || this->flags.q.out
+          || this->flags.q.uniform;
+}
+
+bool
+ast_type_qualifier::has_auxiliary_storage() const
+{
+   return this->flags.q.centroid
+          || this->flags.q.sample;
+}
+
+const char*
+ast_type_qualifier::interpolation_string() const
+{
+   if (this->flags.q.smooth)
+      return "smooth";
+   else if (this->flags.q.flat)
+      return "flat";
+   else if (this->flags.q.noperspective)
+      return "noperspective";
+   else
+      return NULL;
+}
+
+bool
+ast_type_qualifier::merge_qualifier(YYLTYPE *loc,
+				    _mesa_glsl_parse_state *state,
+				    ast_type_qualifier q)
+{
+   ast_type_qualifier ubo_mat_mask;
+   ubo_mat_mask.flags.i = 0;
+   ubo_mat_mask.flags.q.row_major = 1;
+   ubo_mat_mask.flags.q.column_major = 1;
+
+   ast_type_qualifier ubo_layout_mask;
+   ubo_layout_mask.flags.i = 0;
+   ubo_layout_mask.flags.q.std140 = 1;
+   ubo_layout_mask.flags.q.packed = 1;
+   ubo_layout_mask.flags.q.shared = 1;
+
+   ast_type_qualifier ubo_binding_mask;
+   ubo_binding_mask.flags.i = 0;
+   ubo_binding_mask.flags.q.explicit_binding = 1;
+   ubo_binding_mask.flags.q.explicit_offset = 1;
+
+   /* Uniform block layout qualifiers may overwrite each
+    * other (the rightmost takes priority), while all other
+    * qualifiers currently don't allow duplicates.
+    */
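+   /* e.g. (illustrative): merging std140 into a set that already contains
+    * shared replaces shared with std140, rather than triggering the
+    * duplicate-qualifier error below.
+    */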
+
+   if ((this->flags.i & q.flags.i & ~(ubo_mat_mask.flags.i |
+				      ubo_layout_mask.flags.i |
+                                      ubo_binding_mask.flags.i)) != 0) {
+      _mesa_glsl_error(loc, state,
+		       "duplicate layout qualifiers used");
+      return false;
+   }
+
+   if (q.flags.q.prim_type) {
+      if (this->flags.q.prim_type && this->prim_type != q.prim_type) {
+	 _mesa_glsl_error(loc, state,
+			  "conflicting primitive type qualifiers used");
+	 return false;
+      }
+      this->prim_type = q.prim_type;
+   }
+
+   if (q.flags.q.max_vertices) {
+      if (this->flags.q.max_vertices && this->max_vertices != q.max_vertices) {
+	 _mesa_glsl_error(loc, state,
+			  "geometry shader set conflicting max_vertices "
+			  "(%d and %d)", this->max_vertices, q.max_vertices);
+	 return false;
+      }
+      this->max_vertices = q.max_vertices;
+   }
+
+   if ((q.flags.i & ubo_mat_mask.flags.i) != 0)
+      this->flags.i &= ~ubo_mat_mask.flags.i;
+   if ((q.flags.i & ubo_layout_mask.flags.i) != 0)
+      this->flags.i &= ~ubo_layout_mask.flags.i;
+
+   for (int i = 0; i < 3; i++) {
+      if (q.flags.q.local_size & (1 << i)) {
+         if ((this->flags.q.local_size & (1 << i)) &&
+             this->local_size[i] != q.local_size[i]) {
+            _mesa_glsl_error(loc, state,
+                             "compute shader set conflicting values for "
+                             "local_size_%c (%d and %d)", 'x' + i,
+                             this->local_size[i], q.local_size[i]);
+            return false;
+         }
+         this->local_size[i] = q.local_size[i];
+      }
+   }
+
+   this->flags.i |= q.flags.i;
+
+   if (q.flags.q.explicit_location)
+      this->location = q.location;
+
+   if (q.flags.q.explicit_index)
+      this->index = q.index;
+
+   if (q.flags.q.explicit_binding)
+      this->binding = q.binding;
+
+   if (q.flags.q.explicit_offset)
+      this->offset = q.offset;
+
+   if (q.precision != ast_precision_none)
+      this->precision = q.precision;
+
+   if (q.flags.q.explicit_image_format) {
+      this->image_format = q.image_format;
+      this->image_base_type = q.image_base_type;
+   }
+
+   return true;
+}
+
+bool
+ast_type_qualifier::merge_in_qualifier(YYLTYPE *loc,
+                                       _mesa_glsl_parse_state *state,
+                                       ast_type_qualifier q,
+                                       ast_node* &node)
+{
+   void *mem_ctx = state;
+   bool create_gs_ast = false;
+   bool create_cs_ast = false;
+   ast_type_qualifier valid_in_mask;
+   valid_in_mask.flags.i = 0;
+
+   switch (state->stage) {
+   case MESA_SHADER_GEOMETRY:
+      if (q.flags.q.prim_type) {
+         /* Make sure this is a valid input primitive type. */
+         switch (q.prim_type) {
+         case GL_POINTS:
+         case GL_LINES:
+         case GL_LINES_ADJACENCY:
+         case GL_TRIANGLES:
+         case GL_TRIANGLES_ADJACENCY:
+            break;
+         default:
+            _mesa_glsl_error(loc, state,
+                             "invalid geometry shader input primitive type");
+            break;
+         }
+      }
+
+      create_gs_ast |=
+         q.flags.q.prim_type &&
+         !state->in_qualifier->flags.q.prim_type;
+
+      valid_in_mask.flags.q.prim_type = 1;
+      valid_in_mask.flags.q.invocations = 1;
+      break;
+   case MESA_SHADER_FRAGMENT:
+      if (q.flags.q.early_fragment_tests) {
+         state->early_fragment_tests = true;
+      } else {
+         _mesa_glsl_error(loc, state, "invalid input layout qualifier");
+      }
+      break;
+   case MESA_SHADER_COMPUTE:
+      create_cs_ast |=
+         q.flags.q.local_size != 0 &&
+         state->in_qualifier->flags.q.local_size == 0;
+
+      valid_in_mask.flags.q.local_size = 1;
+      break;
+   default:
+      _mesa_glsl_error(loc, state,
+                       "input layout qualifiers only valid in "
+                       "geometry, fragment and compute shaders");
+      break;
+   }
+
+   /* Generate an error when invalid input layout qualifiers are used. */
+   if ((q.flags.i & ~valid_in_mask.flags.i) != 0) {
+      _mesa_glsl_error(loc, state,
+		       "invalid input layout qualifiers used");
+      return false;
+   }
+
+   /* Input layout qualifiers can be specified multiple
+    * times in separate declarations, as long as they match.
+    */
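+   /* e.g. (illustrative GLSL): a second "layout(triangles) in;" is
+    * accepted, but "layout(points) in;" after "layout(triangles) in;"
+    * is a conflict.
+    */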
+   if (this->flags.q.prim_type) {
+      if (q.flags.q.prim_type &&
+          this->prim_type != q.prim_type) {
+         _mesa_glsl_error(loc, state,
+                          "conflicting input primitive types specified");
+      }
+   } else if (q.flags.q.prim_type) {
+      state->in_qualifier->flags.q.prim_type = 1;
+      state->in_qualifier->prim_type = q.prim_type;
+   }
+
+   if (this->flags.q.invocations &&
+       q.flags.q.invocations &&
+       this->invocations != q.invocations) {
+      _mesa_glsl_error(loc, state,
+                       "conflicting invocations counts specified");
+      return false;
+   } else if (q.flags.q.invocations) {
+      this->flags.q.invocations = 1;
+      this->invocations = q.invocations;
+   }
+
+   if (create_gs_ast) {
+      node = new(mem_ctx) ast_gs_input_layout(*loc, q.prim_type);
+   } else if (create_cs_ast) {
+      /* Infer a local_size of 1 for every unspecified dimension */
+      unsigned local_size[3];
+      for (int i = 0; i < 3; i++) {
+         if (q.flags.q.local_size & (1 << i))
+            local_size[i] = q.local_size[i];
+         else
+            local_size[i] = 1;
+      }
+      node = new(mem_ctx) ast_cs_input_layout(*loc, local_size);
+   }
+
+   return true;
+}
diff --git a/icd/intel/compiler/shader/builtin_functions.cpp b/icd/intel/compiler/shader/builtin_functions.cpp
new file mode 100644
index 0000000..28b241e
--- /dev/null
+++ b/icd/intel/compiler/shader/builtin_functions.cpp
@@ -0,0 +1,4405 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file builtin_functions.cpp
+ *
+ * Support for GLSL built-in functions.
+ *
+ * This file is split into several main components:
+ *
+ * 1. Availability predicates
+ *
+ *    A series of small functions that check whether the current shader
+ *    supports the version/extensions required to expose a built-in.
+ *
+ * 2. Core builtin_builder class functionality
+ *
+ * 3. Lists of built-in functions
+ *
+ *    The builtin_builder::create_builtins() function contains lists of all
+ *    built-in function signatures, where they're available, what types they
+ *    take, and so on.
+ *
+ * 4. Implementations of built-in function signatures
+ *
+ *    A series of functions which create ir_function_signatures and emit IR
+ *    via ir_builder to implement them.
+ *
+ * 5. External API
+ *
+ *    A few functions the rest of the compiler can use to interact with the
+ *    built-in function module.  For example, searching for a built-in by
+ *    name and parameters.
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "libfns.h" /* for struct gl_shader */
+#include "main/shaderobj.h"
+#include "ir_builder.h"
+#include "glsl_parser_extras.h"
+#include "program/prog_instruction.h"
+#include <limits>
+
+#define M_PIf   ((float) M_PI)
+#define M_PI_2f ((float) M_PI_2)
+#define M_PI_4f ((float) M_PI_4)
+
+using namespace ir_builder;
+
+/**
+ * Availability predicates:
+ *  @{
+ */
+static bool
+always_available(const _mesa_glsl_parse_state *)
+{
+   return true;
+}
+
+static bool
+compatibility_vs_only(const _mesa_glsl_parse_state *state)
+{
+   return state->stage == MESA_SHADER_VERTEX &&
+          state->language_version <= 130 &&
+          !state->es_shader;
+}
+
+static bool
+fs_only(const _mesa_glsl_parse_state *state)
+{
+   return state->stage == MESA_SHADER_FRAGMENT;
+}
+
+static bool
+gs_only(const _mesa_glsl_parse_state *state)
+{
+   return state->stage == MESA_SHADER_GEOMETRY;
+}
+
+static bool
+v110(const _mesa_glsl_parse_state *state)
+{
+   return !state->es_shader;
+}
+
+static bool
+v110_fs_only(const _mesa_glsl_parse_state *state)
+{
+   return !state->es_shader && state->stage == MESA_SHADER_FRAGMENT;
+}
+
+static bool
+v120(const _mesa_glsl_parse_state *state)
+{
+   return state->is_version(120, 300);
+}
+
+static bool
+v130(const _mesa_glsl_parse_state *state)
+{
+   return state->is_version(130, 300);
+}
+
+static bool
+v130_fs_only(const _mesa_glsl_parse_state *state)
+{
+   return state->is_version(130, 300) &&
+          state->stage == MESA_SHADER_FRAGMENT;
+}
+
+static bool
+v140(const _mesa_glsl_parse_state *state)
+{
+   return state->is_version(140, 0);
+}
+
+static bool
+texture_rectangle(const _mesa_glsl_parse_state *state)
+{
+   return state->ARB_texture_rectangle_enable;
+}
+
+static bool
+texture_external(const _mesa_glsl_parse_state *state)
+{
+   return state->OES_EGL_image_external_enable;
+}
+
+/** True if texturing functions with explicit LOD are allowed. */
+static bool
+lod_exists_in_stage(const _mesa_glsl_parse_state *state)
+{
+   /* Texturing functions with "Lod" in their name exist:
+    * - In the vertex shader stage (for all languages)
+    * - In any stage for GLSL 1.30+ or GLSL ES 3.00
+    * - In any stage for desktop GLSL with ARB_shader_texture_lod enabled.
+    *
+    * Since ARB_shader_texture_lod can only be enabled on desktop GLSL, we
+    * don't need to explicitly check state->es_shader.
+    */
+   return state->stage == MESA_SHADER_VERTEX ||
+          state->is_version(130, 300) ||
+          state->ARB_shader_texture_lod_enable;
+}
+
+static bool
+v110_lod(const _mesa_glsl_parse_state *state)
+{
+   return !state->es_shader && lod_exists_in_stage(state);
+}
+
+static bool
+shader_texture_lod(const _mesa_glsl_parse_state *state)
+{
+   return state->ARB_shader_texture_lod_enable;
+}
+
+static bool
+shader_texture_lod_and_rect(const _mesa_glsl_parse_state *state)
+{
+   return state->ARB_shader_texture_lod_enable &&
+          state->ARB_texture_rectangle_enable;
+}
+
+static bool
+shader_bit_encoding(const _mesa_glsl_parse_state *state)
+{
+   return state->is_version(330, 300) ||
+          state->ARB_shader_bit_encoding_enable ||
+          state->ARB_gpu_shader5_enable;
+}
+
+static bool
+shader_integer_mix(const _mesa_glsl_parse_state *state)
+{
+   return v130(state) && state->EXT_shader_integer_mix_enable;
+}
+
+static bool
+shader_packing_or_es3(const _mesa_glsl_parse_state *state)
+{
+   return state->ARB_shading_language_packing_enable ||
+          state->is_version(400, 300);
+}
+
+static bool
+shader_packing_or_es3_or_gpu_shader5(const _mesa_glsl_parse_state *state)
+{
+   return state->ARB_shading_language_packing_enable ||
+          state->ARB_gpu_shader5_enable ||
+          state->is_version(400, 300);
+}
+
+static bool
+gpu_shader5(const _mesa_glsl_parse_state *state)
+{
+   return state->is_version(400, 0) || state->ARB_gpu_shader5_enable;
+}
+
+static bool
+shader_packing_or_gpu_shader5(const _mesa_glsl_parse_state *state)
+{
+   return state->ARB_shading_language_packing_enable ||
+          gpu_shader5(state);
+}
+
+static bool
+texture_array_lod(const _mesa_glsl_parse_state *state)
+{
+   return lod_exists_in_stage(state) &&
+          state->EXT_texture_array_enable;
+}
+
+static bool
+fs_texture_array(const _mesa_glsl_parse_state *state)
+{
+   return state->stage == MESA_SHADER_FRAGMENT &&
+          state->EXT_texture_array_enable;
+}
+
+static bool
+texture_array(const _mesa_glsl_parse_state *state)
+{
+   return state->EXT_texture_array_enable;
+}
+
+static bool
+texture_multisample(const _mesa_glsl_parse_state *state)
+{
+   return state->is_version(150, 0) ||
+          state->ARB_texture_multisample_enable;
+}
+
+static bool
+fs_texture_cube_map_array(const _mesa_glsl_parse_state *state)
+{
+   return state->stage == MESA_SHADER_FRAGMENT &&
+          (state->is_version(400, 0) ||
+           state->ARB_texture_cube_map_array_enable);
+}
+
+static bool
+texture_cube_map_array(const _mesa_glsl_parse_state *state)
+{
+   return state->is_version(400, 0) ||
+          state->ARB_texture_cube_map_array_enable;
+}
+
+static bool
+texture_query_levels(const _mesa_glsl_parse_state *state)
+{
+   return state->is_version(430, 0) ||
+          state->ARB_texture_query_levels_enable;
+}
+
+static bool
+texture_query_lod(const _mesa_glsl_parse_state *state)
+{
+   return state->stage == MESA_SHADER_FRAGMENT &&
+          state->ARB_texture_query_lod_enable;
+}
+
+static bool
+texture_gather(const _mesa_glsl_parse_state *state)
+{
+   return state->is_version(400, 0) ||
+          state->ARB_texture_gather_enable ||
+          state->ARB_gpu_shader5_enable;
+}
+
+/* Only ARB_texture_gather, but not GLSL 4.0 or ARB_gpu_shader5;
+ * used for relaxation of the const offset requirements.
+ */
+static bool
+texture_gather_only(const _mesa_glsl_parse_state *state)
+{
+   return !state->is_version(400, 0) &&
+          !state->ARB_gpu_shader5_enable &&
+          state->ARB_texture_gather_enable;
+}
+
+/* Desktop GL or OES_standard_derivatives + fragment shader only */
+static bool
+fs_oes_derivatives(const _mesa_glsl_parse_state *state)
+{
+   return state->stage == MESA_SHADER_FRAGMENT &&
+          (state->is_version(110, 300) ||
+           state->OES_standard_derivatives_enable);
+}
+
+static bool
+tex1d_lod(const _mesa_glsl_parse_state *state)
+{
+   return !state->es_shader && lod_exists_in_stage(state);
+}
+
+/** True if sampler3D exists */
+static bool
+tex3d(const _mesa_glsl_parse_state *state)
+{
+   /* sampler3D exists in all desktop GLSL versions, GLSL ES 1.00 with the
+    * OES_texture_3D extension, and in GLSL ES 3.00.
+    */
+   return !state->es_shader ||
+          state->OES_texture_3D_enable ||
+          state->language_version >= 300;
+}
+
+static bool
+fs_tex3d(const _mesa_glsl_parse_state *state)
+{
+   return state->stage == MESA_SHADER_FRAGMENT &&
+          (!state->es_shader || state->OES_texture_3D_enable);
+}
+
+static bool
+tex3d_lod(const _mesa_glsl_parse_state *state)
+{
+   return tex3d(state) && lod_exists_in_stage(state);
+}
+
+static bool
+shader_atomic_counters(const _mesa_glsl_parse_state *state)
+{
+   return state->ARB_shader_atomic_counters_enable;
+}
+
+static bool
+shader_trinary_minmax(const _mesa_glsl_parse_state *state)
+{
+   return state->AMD_shader_trinary_minmax_enable;
+}
+
+static bool
+shader_image_load_store(const _mesa_glsl_parse_state *state)
+{
+   return (state->is_version(420, 0) ||
+           state->ARB_shader_image_load_store_enable);
+}
+
+/** @} */
+
+/******************************************************************************/
+
+namespace {
+
+/**
+ * builtin_builder: A singleton object representing the core of the built-in
+ * function module.
+ *
+ * It generates IR for every built-in function signature, and organizes them
+ * into functions.
+ */
+class builtin_builder {
+public:
+   builtin_builder();
+   ~builtin_builder();
+
+   void initialize();
+   void release();
+   ir_function_signature *find(_mesa_glsl_parse_state *state,
+                               const char *name, exec_list *actual_parameters);
+
+   /**
+    * A shader to hold all the built-in signatures; created by this module.
+    *
+    * This includes signatures for every built-in, regardless of version or
+    * enabled extensions.  The availability predicate associated with each
+    * signature allows matching_signature() to filter out the irrelevant ones.
+    */
+   gl_shader *shader;
+
+private:
+   void *mem_ctx;
+
+   /** Global variables used by built-in functions. */
+   ir_variable *gl_ModelViewProjectionMatrix;
+   ir_variable *gl_Vertex;
+
+   void create_shader();
+   void create_intrinsics();
+   void create_builtins();
+
+   /**
+    * IR builder helpers:
+    *
+    * These convenience functions assist in emitting IR, but don't necessarily
+    * fit in ir_builder itself.  Many of them rely on having a mem_ctx class
+    * member available.
+    */
+   ir_variable *in_var(const glsl_type *type, const char *name);
+   ir_variable *out_var(const glsl_type *type, const char *name);
+   ir_constant *imm(float f, unsigned vector_elements=1);
+   ir_constant *imm(int i, unsigned vector_elements=1);
+   ir_constant *imm(unsigned u, unsigned vector_elements=1);
+   ir_constant *imm(const glsl_type *type, const ir_constant_data &);
+   ir_dereference_variable *var_ref(ir_variable *var);
+   ir_dereference_array *array_ref(ir_variable *var, int i);
+   ir_swizzle *matrix_elt(ir_variable *var, int col, int row);
+
+   ir_expression *asin_expr(ir_variable *x);
+
+   /**
+    * Call function \param f with parameters specified as the linked
+    * list \param params of \c ir_variable objects.  \param ret should
+    * point to the ir_variable that will hold the function return
+    * value, or be \c NULL if the function has void return type.
+    */
+   ir_call *call(ir_function *f, ir_variable *ret, exec_list params);
+
+   /** Create a new function and add the given signatures. */
+   void add_function(const char *name, ...);
+
+   enum image_function_flags {
+      IMAGE_FUNCTION_EMIT_STUB = (1 << 0),
+      IMAGE_FUNCTION_RETURNS_VOID = (1 << 1),
+      IMAGE_FUNCTION_HAS_VECTOR_DATA_TYPE = (1 << 2),
+      IMAGE_FUNCTION_SUPPORTS_FLOAT_DATA_TYPE = (1 << 3),
+      IMAGE_FUNCTION_READ_ONLY = (1 << 4),
+      IMAGE_FUNCTION_WRITE_ONLY = (1 << 5)
+   };
+
+   /**
+    * Create a new image built-in function for all known image types.
+    * \p flags is a bitfield of \c image_function_flags flags.
+    */
+   void add_image_function(const char *name,
+                           const char *intrinsic_name,
+                           unsigned num_arguments,
+                           unsigned flags);
+
+   /**
+    * Create new functions for all known image built-ins and types.
+    * If \p glsl is \c true, use the GLSL built-in names and emit code
+    * to call into the actual compiler intrinsic.  If \p glsl is
+    * false, emit a function prototype with no body for each image
+    * intrinsic name.
+    */
+   void add_image_functions(bool glsl);
+
+   ir_function_signature *new_sig(const glsl_type *return_type,
+                                  builtin_available_predicate avail,
+                                  int num_params, ...);
+
+   /**
+    * Function signature generators:
+    *  @{
+    */
+   ir_function_signature *unop(builtin_available_predicate avail,
+                               ir_expression_operation opcode,
+                               const glsl_type *return_type,
+                               const glsl_type *param_type);
+   ir_function_signature *binop(ir_expression_operation opcode,
+                                builtin_available_predicate avail,
+                                const glsl_type *return_type,
+                                const glsl_type *param0_type,
+                                const glsl_type *param1_type);
+
+#define B0(X) ir_function_signature *_##X();
+#define B1(X) ir_function_signature *_##X(const glsl_type *);
+#define B2(X) ir_function_signature *_##X(const glsl_type *, const glsl_type *);
+#define B3(X) ir_function_signature *_##X(const glsl_type *, const glsl_type *, const glsl_type *);
+#define BA1(X) ir_function_signature *_##X(builtin_available_predicate, const glsl_type *);
+#define BA2(X) ir_function_signature *_##X(builtin_available_predicate, const glsl_type *, const glsl_type *);
+   B1(radians)
+   B1(degrees)
+   B1(sin)
+   B1(cos)
+   B1(tan)
+   B1(asin)
+   B1(acos)
+   B1(atan2)
+   B1(atan)
+   B1(sinh)
+   B1(cosh)
+   B1(tanh)
+   B1(asinh)
+   B1(acosh)
+   B1(atanh)
+   B1(pow)
+   B1(exp)
+   B1(log)
+   B1(exp2)
+   B1(log2)
+   B1(sqrt)
+   B1(inversesqrt)
+   B1(abs)
+   B1(sign)
+   B1(floor)
+   B1(trunc)
+   B1(round)
+   B1(roundEven)
+   B1(ceil)
+   B1(fract)
+   B2(mod)
+   B1(modf)
+   BA2(min)
+   BA2(max)
+   BA2(clamp)
+   B2(mix_lrp)
+   ir_function_signature *_mix_sel(builtin_available_predicate avail,
+                                   const glsl_type *val_type,
+                                   const glsl_type *blend_type);
+   B2(step)
+   B2(smoothstep)
+   B1(isnan)
+   B1(isinf)
+   B1(floatBitsToInt)
+   B1(floatBitsToUint)
+   B1(intBitsToFloat)
+   B1(uintBitsToFloat)
+   ir_function_signature *_packUnorm2x16(builtin_available_predicate avail);
+   ir_function_signature *_packSnorm2x16(builtin_available_predicate avail);
+   ir_function_signature *_packUnorm4x8(builtin_available_predicate avail);
+   ir_function_signature *_packSnorm4x8(builtin_available_predicate avail);
+   ir_function_signature *_unpackUnorm2x16(builtin_available_predicate avail);
+   ir_function_signature *_unpackSnorm2x16(builtin_available_predicate avail);
+   ir_function_signature *_unpackUnorm4x8(builtin_available_predicate avail);
+   ir_function_signature *_unpackSnorm4x8(builtin_available_predicate avail);
+   ir_function_signature *_packHalf2x16(builtin_available_predicate avail);
+   ir_function_signature *_unpackHalf2x16(builtin_available_predicate avail);
+   B1(length)
+   B1(distance);
+   B1(dot);
+   B1(cross);
+   B1(normalize);
+   B0(ftransform);
+   B1(faceforward);
+   B1(reflect);
+   B1(refract);
+   B1(matrixCompMult);
+   B1(outerProduct);
+   B0(determinant_mat2);
+   B0(determinant_mat3);
+   B0(determinant_mat4);
+   B0(inverse_mat2);
+   B0(inverse_mat3);
+   B0(inverse_mat4);
+   B1(transpose);
+   BA1(lessThan);
+   BA1(lessThanEqual);
+   BA1(greaterThan);
+   BA1(greaterThanEqual);
+   BA1(equal);
+   BA1(notEqual);
+   B1(any);
+   B1(all);
+   B1(not);
+   B2(textureSize);
+   ir_function_signature *_textureSize(builtin_available_predicate avail,
+                                       const glsl_type *return_type,
+                                       const glsl_type *sampler_type);
+
+/** Flags to _texture() */
+#define TEX_PROJECT 1
+#define TEX_OFFSET  2
+#define TEX_COMPONENT 4
+#define TEX_OFFSET_NONCONST 8
+#define TEX_OFFSET_ARRAY 16
+
+   ir_function_signature *_texture(ir_texture_opcode opcode,
+                                   builtin_available_predicate avail,
+                                   const glsl_type *return_type,
+                                   const glsl_type *sampler_type,
+                                   const glsl_type *coord_type,
+                                   int flags = 0);
+   B0(textureCubeArrayShadow);
+   ir_function_signature *_texelFetch(builtin_available_predicate avail,
+                                      const glsl_type *return_type,
+                                      const glsl_type *sampler_type,
+                                      const glsl_type *coord_type,
+                                      const glsl_type *offset_type = NULL);
+
+   B0(EmitVertex)
+   B0(EndPrimitive)
+
+   B2(textureQueryLod);
+   B1(textureQueryLevels);
+   B1(dFdx);
+   B1(dFdy);
+   B1(fwidth);
+   B1(noise1);
+   B1(noise2);
+   B1(noise3);
+   B1(noise4);
+
+   B1(bitfieldExtract)
+   B1(bitfieldInsert)
+   B1(bitfieldReverse)
+   B1(bitCount)
+   B1(findLSB)
+   B1(findMSB)
+   B1(fma)
+   B2(ldexp)
+   B2(frexp)
+   B1(uaddCarry)
+   B1(usubBorrow)
+   B1(mulExtended)
+
+   ir_function_signature *_atomic_intrinsic(builtin_available_predicate avail);
+   ir_function_signature *_atomic_op(const char *intrinsic,
+                                     builtin_available_predicate avail);
+
+   B1(min3)
+   B1(max3)
+   B1(mid3)
+
+   ir_function_signature *_image_prototype(const glsl_type *image_type,
+                                           const char *intrinsic_name,
+                                           unsigned num_arguments,
+                                           unsigned flags);
+   ir_function_signature *_image(const glsl_type *image_type,
+                                 const char *intrinsic_name,
+                                 unsigned num_arguments,
+                                 unsigned flags);
+
+   ir_function_signature *_memory_barrier_intrinsic(
+      builtin_available_predicate avail);
+   ir_function_signature *_memory_barrier(
+      builtin_available_predicate avail);
+
+#undef B0
+#undef B1
+#undef B2
+#undef B3
+#undef BA1
+#undef BA2
+   /** @} */
+};
+
+} /* anonymous namespace */
+
+/**
+ * Core builtin_builder functionality:
+ *  @{
+ */
+builtin_builder::builtin_builder()
+   : shader(NULL),
+     gl_ModelViewProjectionMatrix(NULL),
+     gl_Vertex(NULL)
+{
+   mem_ctx = NULL;
+}
+
+builtin_builder::~builtin_builder()
+{
+   ralloc_free(mem_ctx);
+}
+
+ir_function_signature *
+builtin_builder::find(_mesa_glsl_parse_state *state,
+                      const char *name, exec_list *actual_parameters)
+{
+   /* The shader currently being compiled requested a built-in function;
+    * it needs to link against builtin_builder::shader in order to get it.
+    *
+    * Even if we don't find a matching signature, we still need to do this so
+    * that the "no matching signature" error will list potential candidates
+    * from the available built-ins.
+    */
+   state->uses_builtin_functions = true;
+
+   ir_function *f = shader->symbols->get_function(name);
+   if (f == NULL)
+      return NULL;
+
+   ir_function_signature *sig = f->matching_signature(state, actual_parameters);
+   if (sig == NULL)
+      return NULL;
+
+   return sig;
+}
+
+void
+builtin_builder::initialize()
+{
+   /* If already initialized, don't do it again. */
+   if (mem_ctx != NULL)
+      return;
+
+   mem_ctx = ralloc_context(NULL);
+   create_shader();
+   create_intrinsics();
+   create_builtins();
+}
+
+void
+builtin_builder::release()
+{
+   ralloc_free(mem_ctx);
+   mem_ctx = NULL;
+
+   ralloc_free(shader);
+   shader = NULL;
+}
+
+void
+builtin_builder::create_shader()
+{
+   /* The target doesn't actually matter.  There's no target for generic
+    * GLSL utility code that could be linked against any stage, so just
+    * arbitrarily pick GL_VERTEX_SHADER.
+    */
+   shader = _mesa_new_shader(NULL, 0, GL_VERTEX_SHADER);
+   shader->symbols = new(mem_ctx) glsl_symbol_table;
+
+   gl_ModelViewProjectionMatrix =
+      new(mem_ctx) ir_variable(glsl_type::mat4_type,
+                               "gl_ModelViewProjectionMatrix",
+                               ir_var_uniform);
+
+   shader->symbols->add_variable(gl_ModelViewProjectionMatrix);
+
+   gl_Vertex = in_var(glsl_type::vec4_type, "gl_Vertex");
+   shader->symbols->add_variable(gl_Vertex);
+}
+
+/** @} */
+
+/**
+ * Create ir_function and ir_function_signature objects for each
+ * intrinsic.
+ */
+void
+builtin_builder::create_intrinsics()
+{
+   add_function("__intrinsic_atomic_read",
+                _atomic_intrinsic(shader_atomic_counters),
+                NULL);
+   add_function("__intrinsic_atomic_increment",
+                _atomic_intrinsic(shader_atomic_counters),
+                NULL);
+   add_function("__intrinsic_atomic_predecrement",
+                _atomic_intrinsic(shader_atomic_counters),
+                NULL);
+
+   add_image_functions(false);
+
+   add_function("__intrinsic_memory_barrier",
+                _memory_barrier_intrinsic(shader_image_load_store),
+                NULL);
+}
+
+/**
+ * Create ir_function and ir_function_signature objects for each built-in.
+ *
+ * Contains a list of every available built-in.
+ */
+void
+builtin_builder::create_builtins()
+{
+#define F(NAME)                                 \
+   add_function(#NAME,                          \
+                _##NAME(glsl_type::float_type), \
+                _##NAME(glsl_type::vec2_type),  \
+                _##NAME(glsl_type::vec3_type),  \
+                _##NAME(glsl_type::vec4_type),  \
+                NULL);
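+
+/* For instance, F(sin) expands to:
+ *   add_function("sin", _sin(glsl_type::float_type),
+ *                _sin(glsl_type::vec2_type), _sin(glsl_type::vec3_type),
+ *                _sin(glsl_type::vec4_type), NULL);
+ */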
+
+#define FI(NAME)                                \
+   add_function(#NAME,                          \
+                _##NAME(glsl_type::float_type), \
+                _##NAME(glsl_type::vec2_type),  \
+                _##NAME(glsl_type::vec3_type),  \
+                _##NAME(glsl_type::vec4_type),  \
+                _##NAME(glsl_type::int_type),   \
+                _##NAME(glsl_type::ivec2_type), \
+                _##NAME(glsl_type::ivec3_type), \
+                _##NAME(glsl_type::ivec4_type), \
+                NULL);
+
+#define FIU(NAME)                                                 \
+   add_function(#NAME,                                            \
+                _##NAME(always_available, glsl_type::float_type), \
+                _##NAME(always_available, glsl_type::vec2_type),  \
+                _##NAME(always_available, glsl_type::vec3_type),  \
+                _##NAME(always_available, glsl_type::vec4_type),  \
+                                                                  \
+                _##NAME(always_available, glsl_type::int_type),   \
+                _##NAME(always_available, glsl_type::ivec2_type), \
+                _##NAME(always_available, glsl_type::ivec3_type), \
+                _##NAME(always_available, glsl_type::ivec4_type), \
+                                                                  \
+                _##NAME(v130, glsl_type::uint_type),              \
+                _##NAME(v130, glsl_type::uvec2_type),             \
+                _##NAME(v130, glsl_type::uvec3_type),             \
+                _##NAME(v130, glsl_type::uvec4_type),             \
+                NULL);
+
+#define IU(NAME)                                \
+   add_function(#NAME,                          \
+                _##NAME(glsl_type::int_type),   \
+                _##NAME(glsl_type::ivec2_type), \
+                _##NAME(glsl_type::ivec3_type), \
+                _##NAME(glsl_type::ivec4_type), \
+                                                \
+                _##NAME(glsl_type::uint_type),  \
+                _##NAME(glsl_type::uvec2_type), \
+                _##NAME(glsl_type::uvec3_type), \
+                _##NAME(glsl_type::uvec4_type), \
+                NULL);
+
+#define FIUB(NAME)                                                \
+   add_function(#NAME,                                            \
+                _##NAME(always_available, glsl_type::float_type), \
+                _##NAME(always_available, glsl_type::vec2_type),  \
+                _##NAME(always_available, glsl_type::vec3_type),  \
+                _##NAME(always_available, glsl_type::vec4_type),  \
+                                                                  \
+                _##NAME(always_available, glsl_type::int_type),   \
+                _##NAME(always_available, glsl_type::ivec2_type), \
+                _##NAME(always_available, glsl_type::ivec3_type), \
+                _##NAME(always_available, glsl_type::ivec4_type), \
+                                                                  \
+                _##NAME(v130, glsl_type::uint_type),              \
+                _##NAME(v130, glsl_type::uvec2_type),             \
+                _##NAME(v130, glsl_type::uvec3_type),             \
+                _##NAME(v130, glsl_type::uvec4_type),             \
+                                                                  \
+                _##NAME(always_available, glsl_type::bool_type),  \
+                _##NAME(always_available, glsl_type::bvec2_type), \
+                _##NAME(always_available, glsl_type::bvec3_type), \
+                _##NAME(always_available, glsl_type::bvec4_type), \
+                NULL);
+
+#define FIU2_MIXED(NAME)                                                                 \
+   add_function(#NAME,                                                                   \
+                _##NAME(always_available, glsl_type::float_type, glsl_type::float_type), \
+                _##NAME(always_available, glsl_type::vec2_type,  glsl_type::float_type), \
+                _##NAME(always_available, glsl_type::vec3_type,  glsl_type::float_type), \
+                _##NAME(always_available, glsl_type::vec4_type,  glsl_type::float_type), \
+                                                                                         \
+                _##NAME(always_available, glsl_type::vec2_type,  glsl_type::vec2_type),  \
+                _##NAME(always_available, glsl_type::vec3_type,  glsl_type::vec3_type),  \
+                _##NAME(always_available, glsl_type::vec4_type,  glsl_type::vec4_type),  \
+                                                                                         \
+                _##NAME(always_available, glsl_type::int_type,   glsl_type::int_type),   \
+                _##NAME(always_available, glsl_type::ivec2_type, glsl_type::int_type),   \
+                _##NAME(always_available, glsl_type::ivec3_type, glsl_type::int_type),   \
+                _##NAME(always_available, glsl_type::ivec4_type, glsl_type::int_type),   \
+                                                                                         \
+                _##NAME(always_available, glsl_type::ivec2_type, glsl_type::ivec2_type), \
+                _##NAME(always_available, glsl_type::ivec3_type, glsl_type::ivec3_type), \
+                _##NAME(always_available, glsl_type::ivec4_type, glsl_type::ivec4_type), \
+                                                                                         \
+                _##NAME(v130, glsl_type::uint_type,  glsl_type::uint_type),              \
+                _##NAME(v130, glsl_type::uvec2_type, glsl_type::uint_type),              \
+                _##NAME(v130, glsl_type::uvec3_type, glsl_type::uint_type),              \
+                _##NAME(v130, glsl_type::uvec4_type, glsl_type::uint_type),              \
+                                                                                         \
+                _##NAME(v130, glsl_type::uvec2_type, glsl_type::uvec2_type),             \
+                _##NAME(v130, glsl_type::uvec3_type, glsl_type::uvec3_type),             \
+                _##NAME(v130, glsl_type::uvec4_type, glsl_type::uvec4_type),             \
+                NULL);
+
+   F(radians)
+   F(degrees)
+   F(sin)
+   F(cos)
+   F(tan)
+   F(asin)
+   F(acos)
+
+   add_function("atan",
+                _atan(glsl_type::float_type),
+                _atan(glsl_type::vec2_type),
+                _atan(glsl_type::vec3_type),
+                _atan(glsl_type::vec4_type),
+                _atan2(glsl_type::float_type),
+                _atan2(glsl_type::vec2_type),
+                _atan2(glsl_type::vec3_type),
+                _atan2(glsl_type::vec4_type),
+                NULL);
+
+   F(sinh)
+   F(cosh)
+   F(tanh)
+   F(asinh)
+   F(acosh)
+   F(atanh)
+   F(pow)
+   F(exp)
+   F(log)
+   F(exp2)
+   F(log2)
+   F(sqrt)
+   F(inversesqrt)
+   FI(abs)
+   FI(sign)
+   F(floor)
+   F(trunc)
+   F(round)
+   F(roundEven)
+   F(ceil)
+   F(fract)
+
+   add_function("mod",
+                _mod(glsl_type::float_type, glsl_type::float_type),
+                _mod(glsl_type::vec2_type,  glsl_type::float_type),
+                _mod(glsl_type::vec3_type,  glsl_type::float_type),
+                _mod(glsl_type::vec4_type,  glsl_type::float_type),
+
+                _mod(glsl_type::vec2_type,  glsl_type::vec2_type),
+                _mod(glsl_type::vec3_type,  glsl_type::vec3_type),
+                _mod(glsl_type::vec4_type,  glsl_type::vec4_type),
+                NULL);
+
+   F(modf)
+
+   FIU2_MIXED(min)
+   FIU2_MIXED(max)
+   FIU2_MIXED(clamp)
+
+   add_function("mix",
+                _mix_lrp(glsl_type::float_type, glsl_type::float_type),
+                _mix_lrp(glsl_type::vec2_type,  glsl_type::float_type),
+                _mix_lrp(glsl_type::vec3_type,  glsl_type::float_type),
+                _mix_lrp(glsl_type::vec4_type,  glsl_type::float_type),
+
+                _mix_lrp(glsl_type::vec2_type,  glsl_type::vec2_type),
+                _mix_lrp(glsl_type::vec3_type,  glsl_type::vec3_type),
+                _mix_lrp(glsl_type::vec4_type,  glsl_type::vec4_type),
+
+                _mix_sel(v130, glsl_type::float_type, glsl_type::bool_type),
+                _mix_sel(v130, glsl_type::vec2_type,  glsl_type::bvec2_type),
+                _mix_sel(v130, glsl_type::vec3_type,  glsl_type::bvec3_type),
+                _mix_sel(v130, glsl_type::vec4_type,  glsl_type::bvec4_type),
+
+                _mix_sel(shader_integer_mix, glsl_type::int_type,   glsl_type::bool_type),
+                _mix_sel(shader_integer_mix, glsl_type::ivec2_type, glsl_type::bvec2_type),
+                _mix_sel(shader_integer_mix, glsl_type::ivec3_type, glsl_type::bvec3_type),
+                _mix_sel(shader_integer_mix, glsl_type::ivec4_type, glsl_type::bvec4_type),
+
+                _mix_sel(shader_integer_mix, glsl_type::uint_type,  glsl_type::bool_type),
+                _mix_sel(shader_integer_mix, glsl_type::uvec2_type, glsl_type::bvec2_type),
+                _mix_sel(shader_integer_mix, glsl_type::uvec3_type, glsl_type::bvec3_type),
+                _mix_sel(shader_integer_mix, glsl_type::uvec4_type, glsl_type::bvec4_type),
+
+                _mix_sel(shader_integer_mix, glsl_type::bool_type,  glsl_type::bool_type),
+                _mix_sel(shader_integer_mix, glsl_type::bvec2_type, glsl_type::bvec2_type),
+                _mix_sel(shader_integer_mix, glsl_type::bvec3_type, glsl_type::bvec3_type),
+                _mix_sel(shader_integer_mix, glsl_type::bvec4_type, glsl_type::bvec4_type),
+                NULL);
+
+   add_function("step",
+                _step(glsl_type::float_type, glsl_type::float_type),
+                _step(glsl_type::float_type, glsl_type::vec2_type),
+                _step(glsl_type::float_type, glsl_type::vec3_type),
+                _step(glsl_type::float_type, glsl_type::vec4_type),
+
+                _step(glsl_type::vec2_type,  glsl_type::vec2_type),
+                _step(glsl_type::vec3_type,  glsl_type::vec3_type),
+                _step(glsl_type::vec4_type,  glsl_type::vec4_type),
+                NULL);
+
+   add_function("smoothstep",
+                _smoothstep(glsl_type::float_type, glsl_type::float_type),
+                _smoothstep(glsl_type::float_type, glsl_type::vec2_type),
+                _smoothstep(glsl_type::float_type, glsl_type::vec3_type),
+                _smoothstep(glsl_type::float_type, glsl_type::vec4_type),
+
+                _smoothstep(glsl_type::vec2_type,  glsl_type::vec2_type),
+                _smoothstep(glsl_type::vec3_type,  glsl_type::vec3_type),
+                _smoothstep(glsl_type::vec4_type,  glsl_type::vec4_type),
+                NULL);
+
+   F(isnan)
+   F(isinf)
+
+   F(floatBitsToInt)
+   F(floatBitsToUint)
+   add_function("intBitsToFloat",
+                _intBitsToFloat(glsl_type::int_type),
+                _intBitsToFloat(glsl_type::ivec2_type),
+                _intBitsToFloat(glsl_type::ivec3_type),
+                _intBitsToFloat(glsl_type::ivec4_type),
+                NULL);
+   add_function("uintBitsToFloat",
+                _uintBitsToFloat(glsl_type::uint_type),
+                _uintBitsToFloat(glsl_type::uvec2_type),
+                _uintBitsToFloat(glsl_type::uvec3_type),
+                _uintBitsToFloat(glsl_type::uvec4_type),
+                NULL);
+
+   add_function("packUnorm2x16",   _packUnorm2x16(shader_packing_or_es3_or_gpu_shader5),   NULL);
+   add_function("packSnorm2x16",   _packSnorm2x16(shader_packing_or_es3),                  NULL);
+   add_function("packUnorm4x8",    _packUnorm4x8(shader_packing_or_gpu_shader5),           NULL);
+   add_function("packSnorm4x8",    _packSnorm4x8(shader_packing_or_gpu_shader5),           NULL);
+   add_function("unpackUnorm2x16", _unpackUnorm2x16(shader_packing_or_es3_or_gpu_shader5), NULL);
+   add_function("unpackSnorm2x16", _unpackSnorm2x16(shader_packing_or_es3),                NULL);
+   add_function("unpackUnorm4x8",  _unpackUnorm4x8(shader_packing_or_gpu_shader5),         NULL);
+   add_function("unpackSnorm4x8",  _unpackSnorm4x8(shader_packing_or_gpu_shader5),         NULL);
+   add_function("packHalf2x16",    _packHalf2x16(shader_packing_or_es3),                   NULL);
+   add_function("unpackHalf2x16",  _unpackHalf2x16(shader_packing_or_es3),                 NULL);
+
+   F(length)
+   F(distance)
+   F(dot)
+
+   add_function("cross", _cross(glsl_type::vec3_type), NULL);
+
+   F(normalize)
+   add_function("ftransform", _ftransform(), NULL);
+   F(faceforward)
+   F(reflect)
+   F(refract)
+   // ...
+   add_function("matrixCompMult",
+                _matrixCompMult(glsl_type::mat2_type),
+                _matrixCompMult(glsl_type::mat3_type),
+                _matrixCompMult(glsl_type::mat4_type),
+                _matrixCompMult(glsl_type::mat2x3_type),
+                _matrixCompMult(glsl_type::mat2x4_type),
+                _matrixCompMult(glsl_type::mat3x2_type),
+                _matrixCompMult(glsl_type::mat3x4_type),
+                _matrixCompMult(glsl_type::mat4x2_type),
+                _matrixCompMult(glsl_type::mat4x3_type),
+                NULL);
+   add_function("outerProduct",
+                _outerProduct(glsl_type::mat2_type),
+                _outerProduct(glsl_type::mat3_type),
+                _outerProduct(glsl_type::mat4_type),
+                _outerProduct(glsl_type::mat2x3_type),
+                _outerProduct(glsl_type::mat2x4_type),
+                _outerProduct(glsl_type::mat3x2_type),
+                _outerProduct(glsl_type::mat3x4_type),
+                _outerProduct(glsl_type::mat4x2_type),
+                _outerProduct(glsl_type::mat4x3_type),
+                NULL);
+   add_function("determinant",
+                _determinant_mat2(),
+                _determinant_mat3(),
+                _determinant_mat4(),
+                NULL);
+   add_function("inverse",
+                _inverse_mat2(),
+                _inverse_mat3(),
+                _inverse_mat4(),
+                NULL);
+   add_function("transpose",
+                _transpose(glsl_type::mat2_type),
+                _transpose(glsl_type::mat3_type),
+                _transpose(glsl_type::mat4_type),
+                _transpose(glsl_type::mat2x3_type),
+                _transpose(glsl_type::mat2x4_type),
+                _transpose(glsl_type::mat3x2_type),
+                _transpose(glsl_type::mat3x4_type),
+                _transpose(glsl_type::mat4x2_type),
+                _transpose(glsl_type::mat4x3_type),
+                NULL);
+   FIU(lessThan)
+   FIU(lessThanEqual)
+   FIU(greaterThan)
+   FIU(greaterThanEqual)
+   FIUB(notEqual)
+   FIUB(equal)
+
+   add_function("any",
+                _any(glsl_type::bvec2_type),
+                _any(glsl_type::bvec3_type),
+                _any(glsl_type::bvec4_type),
+                NULL);
+
+   add_function("all",
+                _all(glsl_type::bvec2_type),
+                _all(glsl_type::bvec3_type),
+                _all(glsl_type::bvec4_type),
+                NULL);
+
+   add_function("not",
+                _not(glsl_type::bvec2_type),
+                _not(glsl_type::bvec3_type),
+                _not(glsl_type::bvec4_type),
+                NULL);
+
+   add_function("textureSize",
+                _textureSize(v130, glsl_type::int_type,   glsl_type::sampler1D_type),
+                _textureSize(v130, glsl_type::int_type,   glsl_type::isampler1D_type),
+                _textureSize(v130, glsl_type::int_type,   glsl_type::usampler1D_type),
+
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::sampler2D_type),
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::isampler2D_type),
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::usampler2D_type),
+
+                _textureSize(v130, glsl_type::ivec3_type, glsl_type::sampler3D_type),
+                _textureSize(v130, glsl_type::ivec3_type, glsl_type::isampler3D_type),
+                _textureSize(v130, glsl_type::ivec3_type, glsl_type::usampler3D_type),
+
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::samplerCube_type),
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::isamplerCube_type),
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::usamplerCube_type),
+
+                _textureSize(v130, glsl_type::int_type,   glsl_type::sampler1DShadow_type),
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::sampler2DShadow_type),
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::samplerCubeShadow_type),
+
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::sampler1DArray_type),
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::isampler1DArray_type),
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::usampler1DArray_type),
+                _textureSize(v130, glsl_type::ivec3_type, glsl_type::sampler2DArray_type),
+                _textureSize(v130, glsl_type::ivec3_type, glsl_type::isampler2DArray_type),
+                _textureSize(v130, glsl_type::ivec3_type, glsl_type::usampler2DArray_type),
+
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::sampler1DArrayShadow_type),
+                _textureSize(v130, glsl_type::ivec3_type, glsl_type::sampler2DArrayShadow_type),
+
+                _textureSize(texture_cube_map_array, glsl_type::ivec3_type, glsl_type::samplerCubeArray_type),
+                _textureSize(texture_cube_map_array, glsl_type::ivec3_type, glsl_type::isamplerCubeArray_type),
+                _textureSize(texture_cube_map_array, glsl_type::ivec3_type, glsl_type::usamplerCubeArray_type),
+                _textureSize(texture_cube_map_array, glsl_type::ivec3_type, glsl_type::samplerCubeArrayShadow_type),
+
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::sampler2DRect_type),
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::isampler2DRect_type),
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::usampler2DRect_type),
+                _textureSize(v130, glsl_type::ivec2_type, glsl_type::sampler2DRectShadow_type),
+
+                _textureSize(v140, glsl_type::int_type,   glsl_type::samplerBuffer_type),
+                _textureSize(v140, glsl_type::int_type,   glsl_type::isamplerBuffer_type),
+                _textureSize(v140, glsl_type::int_type,   glsl_type::usamplerBuffer_type),
+                _textureSize(texture_multisample, glsl_type::ivec2_type, glsl_type::sampler2DMS_type),
+                _textureSize(texture_multisample, glsl_type::ivec2_type, glsl_type::isampler2DMS_type),
+                _textureSize(texture_multisample, glsl_type::ivec2_type, glsl_type::usampler2DMS_type),
+
+                _textureSize(texture_multisample, glsl_type::ivec3_type, glsl_type::sampler2DMSArray_type),
+                _textureSize(texture_multisample, glsl_type::ivec3_type, glsl_type::isampler2DMSArray_type),
+                _textureSize(texture_multisample, glsl_type::ivec3_type, glsl_type::usampler2DMSArray_type),
+                NULL);
+
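+   /* texture() is registered once per (opcode, availability, return type,
+    * sampler type, coordinate type) combination; the ir_txb entries are
+    * the bias-taking overloads, restricted to fragment shaders via
+    * v130_fs_only / fs_texture_cube_map_array.
+    */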
+   add_function("texture",
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::float_type),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::float_type),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::float_type),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec2_type),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec3_type),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec3_type),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec3_type),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::samplerCube_type,  glsl_type::vec3_type),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isamplerCube_type, glsl_type::vec3_type),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usamplerCube_type, glsl_type::vec3_type),
+
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type,   glsl_type::vec3_type),
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type,   glsl_type::vec3_type),
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::samplerCubeShadow_type, glsl_type::vec4_type),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler1DArray_type,  glsl_type::vec2_type),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler1DArray_type, glsl_type::vec2_type),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler1DArray_type, glsl_type::vec2_type),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2DArray_type,  glsl_type::vec3_type),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type),
+
+                _texture(ir_tex, texture_cube_map_array, glsl_type::vec4_type,  glsl_type::samplerCubeArray_type,  glsl_type::vec4_type),
+                _texture(ir_tex, texture_cube_map_array, glsl_type::ivec4_type, glsl_type::isamplerCubeArray_type, glsl_type::vec4_type),
+                _texture(ir_tex, texture_cube_map_array, glsl_type::uvec4_type, glsl_type::usamplerCubeArray_type, glsl_type::vec4_type),
+
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler1DArrayShadow_type, glsl_type::vec3_type),
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler2DArrayShadow_type, glsl_type::vec4_type),
+                /* samplerCubeArrayShadow is special; it has an extra parameter
+                 * for the shadow comparator since there is no vec5 type.
+                 */
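+                /* i.e. float texture(samplerCubeArrayShadow s, vec4 P,
+                 *                    float compare);
+                 */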
+                _textureCubeArrayShadow(),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec2_type),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec2_type),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec2_type),
+
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler2DRectShadow_type, glsl_type::vec3_type),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::float_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::float_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::float_type),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec2_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec3_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec3_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec3_type),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::vec4_type,  glsl_type::samplerCube_type,  glsl_type::vec3_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::ivec4_type, glsl_type::isamplerCube_type, glsl_type::vec3_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::uvec4_type, glsl_type::usamplerCube_type, glsl_type::vec3_type),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::float_type, glsl_type::sampler1DShadow_type,   glsl_type::vec3_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::float_type, glsl_type::sampler2DShadow_type,   glsl_type::vec3_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::float_type, glsl_type::samplerCubeShadow_type, glsl_type::vec4_type),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::vec4_type,  glsl_type::sampler1DArray_type,  glsl_type::vec2_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::ivec4_type, glsl_type::isampler1DArray_type, glsl_type::vec2_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::uvec4_type, glsl_type::usampler1DArray_type, glsl_type::vec2_type),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::vec4_type,  glsl_type::sampler2DArray_type,  glsl_type::vec3_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type),
+                _texture(ir_txb, v130_fs_only, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type),
+
+                _texture(ir_txb, fs_texture_cube_map_array, glsl_type::vec4_type,  glsl_type::samplerCubeArray_type,  glsl_type::vec4_type),
+                _texture(ir_txb, fs_texture_cube_map_array, glsl_type::ivec4_type, glsl_type::isamplerCubeArray_type, glsl_type::vec4_type),
+                _texture(ir_txb, fs_texture_cube_map_array, glsl_type::uvec4_type, glsl_type::usamplerCubeArray_type, glsl_type::vec4_type),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::float_type, glsl_type::sampler1DArrayShadow_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("textureLod",
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::float_type),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::float_type),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::float_type),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec2_type),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec3_type),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec3_type),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec3_type),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::samplerCube_type,  glsl_type::vec3_type),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isamplerCube_type, glsl_type::vec3_type),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usamplerCube_type, glsl_type::vec3_type),
+
+                _texture(ir_txl, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec3_type),
+                _texture(ir_txl, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec3_type),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler1DArray_type,  glsl_type::vec2_type),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler1DArray_type, glsl_type::vec2_type),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler1DArray_type, glsl_type::vec2_type),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler2DArray_type,  glsl_type::vec3_type),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type),
+
+                _texture(ir_txl, texture_cube_map_array, glsl_type::vec4_type,  glsl_type::samplerCubeArray_type,  glsl_type::vec4_type),
+                _texture(ir_txl, texture_cube_map_array, glsl_type::ivec4_type, glsl_type::isamplerCubeArray_type, glsl_type::vec4_type),
+                _texture(ir_txl, texture_cube_map_array, glsl_type::uvec4_type, glsl_type::usamplerCubeArray_type, glsl_type::vec4_type),
+
+                _texture(ir_txl, v130, glsl_type::float_type, glsl_type::sampler1DArrayShadow_type, glsl_type::vec3_type),
+                NULL);
+
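+   /* The TEX_OFFSET flag appends a constant-expression int/ivec2/ivec3
+    * offset parameter that is added to the texel coordinates before
+    * sampling.
+    */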
+   add_function("textureOffset",
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::float_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::float_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::float_type, TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler2DRectShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler1DArray_type,  glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler1DArray_type, glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler1DArray_type, glsl_type::vec2_type, TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2DArray_type,  glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler1DArrayShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::float_type, TEX_OFFSET),
+                _texture(ir_txb, v130_fs_only, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::float_type, TEX_OFFSET),
+                _texture(ir_txb, v130_fs_only, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::float_type, TEX_OFFSET),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txb, v130_fs_only, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txb, v130_fs_only, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txb, v130_fs_only, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txb, v130_fs_only, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txb, v130_fs_only, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::vec4_type,  glsl_type::sampler1DArray_type,  glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txb, v130_fs_only, glsl_type::ivec4_type, glsl_type::isampler1DArray_type, glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txb, v130_fs_only, glsl_type::uvec4_type, glsl_type::usampler1DArray_type, glsl_type::vec2_type, TEX_OFFSET),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::vec4_type,  glsl_type::sampler2DArray_type,  glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txb, v130_fs_only, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txb, v130_fs_only, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_txb, v130_fs_only, glsl_type::float_type, glsl_type::sampler1DArrayShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+                NULL);
+
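+   /* TEX_PROJECT marks projective variants: the last component of the
+    * coordinate divides the others before the lookup, so vec4 P on a 2D
+    * sampler behaves like texture(s, P.xy / P.w).
+    */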
+   add_function("textureProj",
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler2DRectShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_txb, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_txb, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_txb, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_txb, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txb, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
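+   /* texelFetch() addresses a single texel with unnormalized integer
+    * coordinates plus an explicit LOD (a sample index for the multisample
+    * variants); rect and buffer samplers take no LOD argument.
+    */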
+   add_function("texelFetch",
+                _texelFetch(v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::int_type),
+                _texelFetch(v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::int_type),
+                _texelFetch(v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::int_type),
+
+                _texelFetch(v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::ivec2_type),
+                _texelFetch(v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::ivec2_type),
+                _texelFetch(v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::ivec2_type),
+
+                _texelFetch(v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::ivec3_type),
+                _texelFetch(v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::ivec3_type),
+                _texelFetch(v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::ivec3_type),
+
+                _texelFetch(v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::ivec2_type),
+                _texelFetch(v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::ivec2_type),
+                _texelFetch(v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::ivec2_type),
+
+                _texelFetch(v130, glsl_type::vec4_type,  glsl_type::sampler1DArray_type,  glsl_type::ivec2_type),
+                _texelFetch(v130, glsl_type::ivec4_type, glsl_type::isampler1DArray_type, glsl_type::ivec2_type),
+                _texelFetch(v130, glsl_type::uvec4_type, glsl_type::usampler1DArray_type, glsl_type::ivec2_type),
+
+                _texelFetch(v130, glsl_type::vec4_type,  glsl_type::sampler2DArray_type,  glsl_type::ivec3_type),
+                _texelFetch(v130, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::ivec3_type),
+                _texelFetch(v130, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::ivec3_type),
+
+                _texelFetch(v140, glsl_type::vec4_type,  glsl_type::samplerBuffer_type,  glsl_type::int_type),
+                _texelFetch(v140, glsl_type::ivec4_type, glsl_type::isamplerBuffer_type, glsl_type::int_type),
+                _texelFetch(v140, glsl_type::uvec4_type, glsl_type::usamplerBuffer_type, glsl_type::int_type),
+
+                _texelFetch(texture_multisample, glsl_type::vec4_type,  glsl_type::sampler2DMS_type,  glsl_type::ivec2_type),
+                _texelFetch(texture_multisample, glsl_type::ivec4_type, glsl_type::isampler2DMS_type, glsl_type::ivec2_type),
+                _texelFetch(texture_multisample, glsl_type::uvec4_type, glsl_type::usampler2DMS_type, glsl_type::ivec2_type),
+
+                _texelFetch(texture_multisample, glsl_type::vec4_type,  glsl_type::sampler2DMSArray_type,  glsl_type::ivec3_type),
+                _texelFetch(texture_multisample, glsl_type::ivec4_type, glsl_type::isampler2DMSArray_type, glsl_type::ivec3_type),
+                _texelFetch(texture_multisample, glsl_type::uvec4_type, glsl_type::usampler2DMSArray_type, glsl_type::ivec3_type),
+                NULL);
+
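+   /* texelFetchOffset() is texelFetch() with a trailing constant offset,
+    * the extra type argument after the coordinate type below.
+    */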
+   add_function("texelFetchOffset",
+                _texelFetch(v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::int_type, glsl_type::int_type),
+                _texelFetch(v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::int_type, glsl_type::int_type),
+                _texelFetch(v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::int_type, glsl_type::int_type),
+
+                _texelFetch(v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::ivec2_type, glsl_type::ivec2_type),
+                _texelFetch(v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::ivec2_type, glsl_type::ivec2_type),
+                _texelFetch(v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::ivec2_type, glsl_type::ivec2_type),
+
+                _texelFetch(v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::ivec3_type, glsl_type::ivec3_type),
+                _texelFetch(v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::ivec3_type, glsl_type::ivec3_type),
+                _texelFetch(v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::ivec3_type, glsl_type::ivec3_type),
+
+                _texelFetch(v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::ivec2_type, glsl_type::ivec2_type),
+                _texelFetch(v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::ivec2_type, glsl_type::ivec2_type),
+                _texelFetch(v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::ivec2_type, glsl_type::ivec2_type),
+
+                _texelFetch(v130, glsl_type::vec4_type,  glsl_type::sampler1DArray_type,  glsl_type::ivec2_type, glsl_type::int_type),
+                _texelFetch(v130, glsl_type::ivec4_type, glsl_type::isampler1DArray_type, glsl_type::ivec2_type, glsl_type::int_type),
+                _texelFetch(v130, glsl_type::uvec4_type, glsl_type::usampler1DArray_type, glsl_type::ivec2_type, glsl_type::int_type),
+
+                _texelFetch(v130, glsl_type::vec4_type,  glsl_type::sampler2DArray_type,  glsl_type::ivec3_type, glsl_type::ivec2_type),
+                _texelFetch(v130, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::ivec3_type, glsl_type::ivec2_type),
+                _texelFetch(v130, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::ivec3_type, glsl_type::ivec2_type),
+                NULL);
+
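+   /* Projective lookups can also be offset; these overloads combine the
+    * TEX_PROJECT and TEX_OFFSET flags.
+    */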
+   add_function("textureProjOffset",
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec2_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec2_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec2_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_tex, v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_tex, v130, glsl_type::float_type, glsl_type::sampler2DRectShadow_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_txb, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec2_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec2_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec2_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_txb, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_txb, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_txb, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txb, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                NULL);
+
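+   /* The next three entry points are the explicit-LOD (ir_txl)
+    * combinations of the offset and projection flags.
+    */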
+   add_function("textureLodOffset",
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::float_type, TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::float_type, TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::float_type, TEX_OFFSET),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_txl, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler1DArray_type,  glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler1DArray_type, glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler1DArray_type, glsl_type::vec2_type, TEX_OFFSET),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler2DArray_type,  glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_txl, v130, glsl_type::float_type, glsl_type::sampler1DArrayShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+                NULL);
+
+   add_function("textureProjLod",
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_txl, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txl, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("textureProjLodOffset",
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec2_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec2_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec2_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_txl, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_txl, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txl, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                NULL);
+
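+   /* textureGrad() (ir_txd) takes explicit dPdx/dPdy derivatives instead
+    * of computing them implicitly, so it is not restricted to fragment
+    * shaders.
+    */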
+   add_function("textureGrad",
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::float_type),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::float_type),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::float_type),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec2_type),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec3_type),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec3_type),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec3_type),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::samplerCube_type,  glsl_type::vec3_type),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isamplerCube_type, glsl_type::vec3_type),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usamplerCube_type, glsl_type::vec3_type),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec2_type),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec2_type),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec2_type),
+
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler2DRectShadow_type, glsl_type::vec3_type),
+
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type,   glsl_type::vec3_type),
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type,   glsl_type::vec3_type),
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::samplerCubeShadow_type, glsl_type::vec4_type),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler1DArray_type,  glsl_type::vec2_type),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler1DArray_type, glsl_type::vec2_type),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler1DArray_type, glsl_type::vec2_type),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2DArray_type,  glsl_type::vec3_type),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type),
+
+                _texture(ir_txd, texture_cube_map_array, glsl_type::vec4_type,  glsl_type::samplerCubeArray_type,  glsl_type::vec4_type),
+                _texture(ir_txd, texture_cube_map_array, glsl_type::ivec4_type, glsl_type::isamplerCubeArray_type, glsl_type::vec4_type),
+                _texture(ir_txd, texture_cube_map_array, glsl_type::uvec4_type, glsl_type::usamplerCubeArray_type, glsl_type::vec4_type),
+
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler1DArrayShadow_type, glsl_type::vec3_type),
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler2DArrayShadow_type, glsl_type::vec4_type),
+                NULL);
+
+   add_function("textureGradOffset",
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::float_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::float_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::float_type, TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler2DRectShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler1DArray_type,  glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler1DArray_type, glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler1DArray_type, glsl_type::vec2_type, TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2DArray_type,  glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler1DArrayShadow_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler2DArrayShadow_type, glsl_type::vec4_type, TEX_OFFSET),
+                NULL);
+
+   add_function("textureProjGrad",
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler2DRectShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("textureProjGradOffset",
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec2_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec2_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec2_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler1D_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler1D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler1D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2D_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler3D_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler3D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler3D_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec3_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler2DRectShadow_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler1DShadow_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                _texture(ir_txd, v130, glsl_type::float_type, glsl_type::sampler2DShadow_type, glsl_type::vec4_type, TEX_PROJECT | TEX_OFFSET),
+                NULL);
+
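+   /* Geometry shader vertex emission builtins. */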
+   add_function("EmitVertex",   _EmitVertex(),   NULL);
+   add_function("EndPrimitive", _EndPrimitive(), NULL);
+
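+   /* Note the ARB_texture_query_lod spelling; core GLSL 4.00 adopted the
+    * textureQueryLod capitalization.
+    */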
+   add_function("textureQueryLOD",
+                _textureQueryLod(glsl_type::sampler1D_type,  glsl_type::float_type),
+                _textureQueryLod(glsl_type::isampler1D_type, glsl_type::float_type),
+                _textureQueryLod(glsl_type::usampler1D_type, glsl_type::float_type),
+
+                _textureQueryLod(glsl_type::sampler2D_type,  glsl_type::vec2_type),
+                _textureQueryLod(glsl_type::isampler2D_type, glsl_type::vec2_type),
+                _textureQueryLod(glsl_type::usampler2D_type, glsl_type::vec2_type),
+
+                _textureQueryLod(glsl_type::sampler3D_type,  glsl_type::vec3_type),
+                _textureQueryLod(glsl_type::isampler3D_type, glsl_type::vec3_type),
+                _textureQueryLod(glsl_type::usampler3D_type, glsl_type::vec3_type),
+
+                _textureQueryLod(glsl_type::samplerCube_type,  glsl_type::vec3_type),
+                _textureQueryLod(glsl_type::isamplerCube_type, glsl_type::vec3_type),
+                _textureQueryLod(glsl_type::usamplerCube_type, glsl_type::vec3_type),
+
+                _textureQueryLod(glsl_type::sampler1DArray_type,  glsl_type::float_type),
+                _textureQueryLod(glsl_type::isampler1DArray_type, glsl_type::float_type),
+                _textureQueryLod(glsl_type::usampler1DArray_type, glsl_type::float_type),
+
+                _textureQueryLod(glsl_type::sampler2DArray_type,  glsl_type::vec2_type),
+                _textureQueryLod(glsl_type::isampler2DArray_type, glsl_type::vec2_type),
+                _textureQueryLod(glsl_type::usampler2DArray_type, glsl_type::vec2_type),
+
+                _textureQueryLod(glsl_type::samplerCubeArray_type,  glsl_type::vec3_type),
+                _textureQueryLod(glsl_type::isamplerCubeArray_type, glsl_type::vec3_type),
+                _textureQueryLod(glsl_type::usamplerCubeArray_type, glsl_type::vec3_type),
+
+                _textureQueryLod(glsl_type::sampler1DShadow_type, glsl_type::float_type),
+                _textureQueryLod(glsl_type::sampler2DShadow_type, glsl_type::vec2_type),
+                _textureQueryLod(glsl_type::samplerCubeShadow_type, glsl_type::vec3_type),
+                _textureQueryLod(glsl_type::sampler1DArrayShadow_type, glsl_type::float_type),
+                _textureQueryLod(glsl_type::sampler2DArrayShadow_type, glsl_type::vec2_type),
+                _textureQueryLod(glsl_type::samplerCubeArrayShadow_type, glsl_type::vec3_type),
+                NULL);
+
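+   /* textureQueryLevels() returns the number of accessible mipmap levels
+    * (ARB_texture_query_levels, core in GLSL 4.30).
+    */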
+   add_function("textureQueryLevels",
+                _textureQueryLevels(glsl_type::sampler1D_type),
+                _textureQueryLevels(glsl_type::sampler2D_type),
+                _textureQueryLevels(glsl_type::sampler3D_type),
+                _textureQueryLevels(glsl_type::samplerCube_type),
+                _textureQueryLevels(glsl_type::sampler1DArray_type),
+                _textureQueryLevels(glsl_type::sampler2DArray_type),
+                _textureQueryLevels(glsl_type::samplerCubeArray_type),
+                _textureQueryLevels(glsl_type::sampler1DShadow_type),
+                _textureQueryLevels(glsl_type::sampler2DShadow_type),
+                _textureQueryLevels(glsl_type::samplerCubeShadow_type),
+                _textureQueryLevels(glsl_type::sampler1DArrayShadow_type),
+                _textureQueryLevels(glsl_type::sampler2DArrayShadow_type),
+                _textureQueryLevels(glsl_type::samplerCubeArrayShadow_type),
+
+                _textureQueryLevels(glsl_type::isampler1D_type),
+                _textureQueryLevels(glsl_type::isampler2D_type),
+                _textureQueryLevels(glsl_type::isampler3D_type),
+                _textureQueryLevels(glsl_type::isamplerCube_type),
+                _textureQueryLevels(glsl_type::isampler1DArray_type),
+                _textureQueryLevels(glsl_type::isampler2DArray_type),
+                _textureQueryLevels(glsl_type::isamplerCubeArray_type),
+
+                _textureQueryLevels(glsl_type::usampler1D_type),
+                _textureQueryLevels(glsl_type::usampler2D_type),
+                _textureQueryLevels(glsl_type::usampler3D_type),
+                _textureQueryLevels(glsl_type::usamplerCube_type),
+                _textureQueryLevels(glsl_type::usampler1DArray_type),
+                _textureQueryLevels(glsl_type::usampler2DArray_type),
+                _textureQueryLevels(glsl_type::usamplerCubeArray_type),
+
+                NULL);
+
+   add_function("texture1D",
+                _texture(ir_tex, v110,         glsl_type::vec4_type,  glsl_type::sampler1D_type, glsl_type::float_type),
+                _texture(ir_txb, v110_fs_only, glsl_type::vec4_type,  glsl_type::sampler1D_type, glsl_type::float_type),
+                NULL);
+
+   add_function("texture1DArray",
+                _texture(ir_tex, texture_array,    glsl_type::vec4_type, glsl_type::sampler1DArray_type, glsl_type::vec2_type),
+                _texture(ir_txb, fs_texture_array, glsl_type::vec4_type, glsl_type::sampler1DArray_type, glsl_type::vec2_type),
+                NULL);
+
+   add_function("texture1DProj",
+                _texture(ir_tex, v110,         glsl_type::vec4_type,  glsl_type::sampler1D_type, glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_tex, v110,         glsl_type::vec4_type,  glsl_type::sampler1D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txb, v110_fs_only, glsl_type::vec4_type,  glsl_type::sampler1D_type, glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_txb, v110_fs_only, glsl_type::vec4_type,  glsl_type::sampler1D_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("texture1DLod",
+                _texture(ir_txl, tex1d_lod, glsl_type::vec4_type,  glsl_type::sampler1D_type, glsl_type::float_type),
+                NULL);
+
+   add_function("texture1DArrayLod",
+                _texture(ir_txl, texture_array_lod, glsl_type::vec4_type, glsl_type::sampler1DArray_type, glsl_type::vec2_type),
+                NULL);
+
+   add_function("texture1DProjLod",
+                _texture(ir_txl, tex1d_lod, glsl_type::vec4_type,  glsl_type::sampler1D_type, glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_txl, tex1d_lod, glsl_type::vec4_type,  glsl_type::sampler1D_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("texture2D",
+                _texture(ir_tex, always_available, glsl_type::vec4_type,  glsl_type::sampler2D_type, glsl_type::vec2_type),
+                _texture(ir_txb, fs_only,          glsl_type::vec4_type,  glsl_type::sampler2D_type, glsl_type::vec2_type),
+                _texture(ir_tex, texture_external, glsl_type::vec4_type,  glsl_type::samplerExternalOES_type, glsl_type::vec2_type),
+                NULL);
+
+   add_function("texture2DArray",
+                _texture(ir_tex, texture_array,    glsl_type::vec4_type, glsl_type::sampler2DArray_type, glsl_type::vec3_type),
+                _texture(ir_txb, fs_texture_array, glsl_type::vec4_type, glsl_type::sampler2DArray_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("texture2DProj",
+                _texture(ir_tex, always_available, glsl_type::vec4_type,  glsl_type::sampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_tex, always_available, glsl_type::vec4_type,  glsl_type::sampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txb, fs_only,          glsl_type::vec4_type,  glsl_type::sampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txb, fs_only,          glsl_type::vec4_type,  glsl_type::sampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_tex, texture_external, glsl_type::vec4_type,  glsl_type::samplerExternalOES_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_tex, texture_external, glsl_type::vec4_type,  glsl_type::samplerExternalOES_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("texture2DLod",
+                _texture(ir_txl, lod_exists_in_stage, glsl_type::vec4_type,  glsl_type::sampler2D_type, glsl_type::vec2_type),
+                NULL);
+
+   add_function("texture2DArrayLod",
+                _texture(ir_txl, texture_array_lod, glsl_type::vec4_type, glsl_type::sampler2DArray_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("texture2DProjLod",
+                _texture(ir_txl, v110_lod, glsl_type::vec4_type,  glsl_type::sampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txl, v110_lod, glsl_type::vec4_type,  glsl_type::sampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("texture3D",
+                _texture(ir_tex, tex3d,    glsl_type::vec4_type,  glsl_type::sampler3D_type, glsl_type::vec3_type),
+                _texture(ir_txb, fs_tex3d, glsl_type::vec4_type,  glsl_type::sampler3D_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("texture3DProj",
+                _texture(ir_tex, tex3d,    glsl_type::vec4_type,  glsl_type::sampler3D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txb, fs_tex3d, glsl_type::vec4_type,  glsl_type::sampler3D_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("texture3DLod",
+                _texture(ir_txl, tex3d_lod, glsl_type::vec4_type,  glsl_type::sampler3D_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("texture3DProjLod",
+                _texture(ir_txl, tex3d_lod, glsl_type::vec4_type,  glsl_type::sampler3D_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("textureCube",
+                _texture(ir_tex, always_available, glsl_type::vec4_type,  glsl_type::samplerCube_type, glsl_type::vec3_type),
+                _texture(ir_txb, fs_only,          glsl_type::vec4_type,  glsl_type::samplerCube_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("textureCubeLod",
+                _texture(ir_txl, v110_lod, glsl_type::vec4_type,  glsl_type::samplerCube_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("texture2DRect",
+                _texture(ir_tex, texture_rectangle, glsl_type::vec4_type,  glsl_type::sampler2DRect_type, glsl_type::vec2_type),
+                NULL);
+
+   add_function("texture2DRectProj",
+                _texture(ir_tex, texture_rectangle, glsl_type::vec4_type,  glsl_type::sampler2DRect_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_tex, texture_rectangle, glsl_type::vec4_type,  glsl_type::sampler2DRect_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("shadow1D",
+                _texture(ir_tex, v110,         glsl_type::vec4_type,  glsl_type::sampler1DShadow_type, glsl_type::vec3_type),
+                _texture(ir_txb, v110_fs_only, glsl_type::vec4_type,  glsl_type::sampler1DShadow_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("shadow1DArray",
+                _texture(ir_tex, texture_array,    glsl_type::vec4_type,  glsl_type::sampler1DArrayShadow_type, glsl_type::vec3_type),
+                _texture(ir_txb, fs_texture_array, glsl_type::vec4_type,  glsl_type::sampler1DArrayShadow_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("shadow2D",
+                _texture(ir_tex, v110,         glsl_type::vec4_type,  glsl_type::sampler2DShadow_type, glsl_type::vec3_type),
+                _texture(ir_txb, v110_fs_only, glsl_type::vec4_type,  glsl_type::sampler2DShadow_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("shadow2DArray",
+                _texture(ir_tex, texture_array,    glsl_type::vec4_type,  glsl_type::sampler2DArrayShadow_type, glsl_type::vec4_type),
+                _texture(ir_txb, fs_texture_array, glsl_type::vec4_type,  glsl_type::sampler2DArrayShadow_type, glsl_type::vec4_type),
+                NULL);
+
+   add_function("shadow1DProj",
+                _texture(ir_tex, v110,         glsl_type::vec4_type,  glsl_type::sampler1DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txb, v110_fs_only, glsl_type::vec4_type,  glsl_type::sampler1DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("shadow2DProj",
+                _texture(ir_tex, v110,         glsl_type::vec4_type,  glsl_type::sampler2DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txb, v110_fs_only, glsl_type::vec4_type,  glsl_type::sampler2DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("shadow1DLod",
+                _texture(ir_txl, v110_lod, glsl_type::vec4_type,  glsl_type::sampler1DShadow_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("shadow2DLod",
+                _texture(ir_txl, v110_lod, glsl_type::vec4_type,  glsl_type::sampler2DShadow_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("shadow1DArrayLod",
+                _texture(ir_txl, texture_array_lod, glsl_type::vec4_type, glsl_type::sampler1DArrayShadow_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("shadow1DProjLod",
+                _texture(ir_txl, v110_lod, glsl_type::vec4_type,  glsl_type::sampler1DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("shadow2DProjLod",
+                _texture(ir_txl, v110_lod, glsl_type::vec4_type,  glsl_type::sampler2DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("shadow2DRect",
+                _texture(ir_tex, texture_rectangle, glsl_type::vec4_type,  glsl_type::sampler2DRectShadow_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("shadow2DRectProj",
+                _texture(ir_tex, texture_rectangle, glsl_type::vec4_type,  glsl_type::sampler2DRectShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("texture1DGradARB",
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::sampler1D_type, glsl_type::float_type),
+                NULL);
+
+   add_function("texture1DProjGradARB",
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::sampler1D_type, glsl_type::vec2_type, TEX_PROJECT),
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::sampler1D_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("texture2DGradARB",
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::sampler2D_type, glsl_type::vec2_type),
+                NULL);
+
+   add_function("texture2DProjGradARB",
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::sampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::sampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("texture3DGradARB",
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::sampler3D_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("texture3DProjGradARB",
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::sampler3D_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("textureCubeGradARB",
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::samplerCube_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("shadow1DGradARB",
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::sampler1DShadow_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("shadow1DProjGradARB",
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::sampler1DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("shadow2DGradARB",
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::sampler2DShadow_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("shadow2DProjGradARB",
+                _texture(ir_txd, shader_texture_lod, glsl_type::vec4_type,  glsl_type::sampler2DShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("texture2DRectGradARB",
+                _texture(ir_txd, shader_texture_lod_and_rect, glsl_type::vec4_type,  glsl_type::sampler2DRect_type, glsl_type::vec2_type),
+                NULL);
+
+   add_function("texture2DRectProjGradARB",
+                _texture(ir_txd, shader_texture_lod_and_rect, glsl_type::vec4_type,  glsl_type::sampler2DRect_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txd, shader_texture_lod_and_rect, glsl_type::vec4_type,  glsl_type::sampler2DRect_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("shadow2DRectGradARB",
+                _texture(ir_txd, shader_texture_lod_and_rect, glsl_type::vec4_type,  glsl_type::sampler2DRectShadow_type, glsl_type::vec3_type),
+                NULL);
+
+   add_function("shadow2DRectProjGradARB",
+                _texture(ir_txd, shader_texture_lod_and_rect, glsl_type::vec4_type,  glsl_type::sampler2DRectShadow_type, glsl_type::vec4_type, TEX_PROJECT),
+                NULL);
+
+   add_function("textureGather",
+                _texture(ir_tg4, texture_gather, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec2_type),
+                _texture(ir_tg4, texture_gather, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type),
+                _texture(ir_tg4, texture_gather, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DRect_type, glsl_type::vec2_type),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec2_type),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec2_type),
+
+                _texture(ir_tg4, texture_gather, glsl_type::vec4_type, glsl_type::sampler2DArray_type, glsl_type::vec3_type),
+                _texture(ir_tg4, texture_gather, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type),
+                _texture(ir_tg4, texture_gather, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type),
+
+                _texture(ir_tg4, texture_gather, glsl_type::vec4_type, glsl_type::samplerCube_type, glsl_type::vec3_type),
+                _texture(ir_tg4, texture_gather, glsl_type::ivec4_type, glsl_type::isamplerCube_type, glsl_type::vec3_type),
+                _texture(ir_tg4, texture_gather, glsl_type::uvec4_type, glsl_type::usamplerCube_type, glsl_type::vec3_type),
+
+                _texture(ir_tg4, texture_gather, glsl_type::vec4_type, glsl_type::samplerCubeArray_type, glsl_type::vec4_type),
+                _texture(ir_tg4, texture_gather, glsl_type::ivec4_type, glsl_type::isamplerCubeArray_type, glsl_type::vec4_type),
+                _texture(ir_tg4, texture_gather, glsl_type::uvec4_type, glsl_type::usamplerCubeArray_type, glsl_type::vec4_type),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_COMPONENT),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DRect_type, glsl_type::vec2_type, TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_COMPONENT),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_COMPONENT),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::samplerCube_type, glsl_type::vec3_type, TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isamplerCube_type, glsl_type::vec3_type, TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usamplerCube_type, glsl_type::vec3_type, TEX_COMPONENT),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::samplerCubeArray_type, glsl_type::vec4_type, TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isamplerCubeArray_type, glsl_type::vec4_type, TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usamplerCubeArray_type, glsl_type::vec4_type, TEX_COMPONENT),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DShadow_type, glsl_type::vec2_type),
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DArrayShadow_type, glsl_type::vec3_type),
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::samplerCubeShadow_type, glsl_type::vec3_type),
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::samplerCubeArrayShadow_type, glsl_type::vec4_type),
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DRectShadow_type, glsl_type::vec2_type),
+                NULL);
+
+   add_function("textureGatherOffset",
+                _texture(ir_tg4, texture_gather_only, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_tg4, texture_gather_only, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+                _texture(ir_tg4, texture_gather_only, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+
+                _texture(ir_tg4, texture_gather_only, glsl_type::vec4_type, glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_tg4, texture_gather_only, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+                _texture(ir_tg4, texture_gather_only, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_NONCONST),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_NONCONST),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_NONCONST),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST | TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST | TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST | TEX_COMPONENT),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_NONCONST | TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_NONCONST | TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_NONCONST | TEX_COMPONENT),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST | TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST | TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST | TEX_COMPONENT),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DShadow_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DArrayShadow_type, glsl_type::vec3_type, TEX_OFFSET_NONCONST),
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DRectShadow_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+                NULL);
+
+   add_function("textureGatherOffsets",
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY | TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY | TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY | TEX_COMPONENT),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY | TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY | TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY | TEX_COMPONENT),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY | TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY | TEX_COMPONENT),
+                _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY | TEX_COMPONENT),
+
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DShadow_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DArrayShadow_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY),
+                _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, glsl_type::sampler2DRectShadow_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+                NULL);
+
+   F(dFdx)
+   F(dFdy)
+   F(fwidth)
+   F(noise1)
+   F(noise2)
+   F(noise3)
+   F(noise4)
+
+   IU(bitfieldExtract)
+   IU(bitfieldInsert)
+   IU(bitfieldReverse)
+   IU(bitCount)
+   IU(findLSB)
+   IU(findMSB)
+   F(fma)
+
+   add_function("ldexp",
+                _ldexp(glsl_type::float_type, glsl_type::int_type),
+                _ldexp(glsl_type::vec2_type,  glsl_type::ivec2_type),
+                _ldexp(glsl_type::vec3_type,  glsl_type::ivec3_type),
+                _ldexp(glsl_type::vec4_type,  glsl_type::ivec4_type),
+                NULL);
+
+   add_function("frexp",
+                _frexp(glsl_type::float_type, glsl_type::int_type),
+                _frexp(glsl_type::vec2_type,  glsl_type::ivec2_type),
+                _frexp(glsl_type::vec3_type,  glsl_type::ivec3_type),
+                _frexp(glsl_type::vec4_type,  glsl_type::ivec4_type),
+                NULL);
+   add_function("uaddCarry",
+                _uaddCarry(glsl_type::uint_type),
+                _uaddCarry(glsl_type::uvec2_type),
+                _uaddCarry(glsl_type::uvec3_type),
+                _uaddCarry(glsl_type::uvec4_type),
+                NULL);
+   add_function("usubBorrow",
+                _usubBorrow(glsl_type::uint_type),
+                _usubBorrow(glsl_type::uvec2_type),
+                _usubBorrow(glsl_type::uvec3_type),
+                _usubBorrow(glsl_type::uvec4_type),
+                NULL);
+   add_function("imulExtended",
+                _mulExtended(glsl_type::int_type),
+                _mulExtended(glsl_type::ivec2_type),
+                _mulExtended(glsl_type::ivec3_type),
+                _mulExtended(glsl_type::ivec4_type),
+                NULL);
+   add_function("umulExtended",
+                _mulExtended(glsl_type::uint_type),
+                _mulExtended(glsl_type::uvec2_type),
+                _mulExtended(glsl_type::uvec3_type),
+                _mulExtended(glsl_type::uvec4_type),
+                NULL);
+
+   add_function("atomicCounter",
+                _atomic_op("__intrinsic_atomic_read",
+                           shader_atomic_counters),
+                NULL);
+   add_function("atomicCounterIncrement",
+                _atomic_op("__intrinsic_atomic_increment",
+                           shader_atomic_counters),
+                NULL);
+   add_function("atomicCounterDecrement",
+                _atomic_op("__intrinsic_atomic_predecrement",
+                           shader_atomic_counters),
+                NULL);
+
+   add_function("min3",
+                _min3(glsl_type::float_type),
+                _min3(glsl_type::vec2_type),
+                _min3(glsl_type::vec3_type),
+                _min3(glsl_type::vec4_type),
+
+                _min3(glsl_type::int_type),
+                _min3(glsl_type::ivec2_type),
+                _min3(glsl_type::ivec3_type),
+                _min3(glsl_type::ivec4_type),
+
+                _min3(glsl_type::uint_type),
+                _min3(glsl_type::uvec2_type),
+                _min3(glsl_type::uvec3_type),
+                _min3(glsl_type::uvec4_type),
+                NULL);
+
+   add_function("max3",
+                _max3(glsl_type::float_type),
+                _max3(glsl_type::vec2_type),
+                _max3(glsl_type::vec3_type),
+                _max3(glsl_type::vec4_type),
+
+                _max3(glsl_type::int_type),
+                _max3(glsl_type::ivec2_type),
+                _max3(glsl_type::ivec3_type),
+                _max3(glsl_type::ivec4_type),
+
+                _max3(glsl_type::uint_type),
+                _max3(glsl_type::uvec2_type),
+                _max3(glsl_type::uvec3_type),
+                _max3(glsl_type::uvec4_type),
+                NULL);
+
+   add_function("mid3",
+                _mid3(glsl_type::float_type),
+                _mid3(glsl_type::vec2_type),
+                _mid3(glsl_type::vec3_type),
+                _mid3(glsl_type::vec4_type),
+
+                _mid3(glsl_type::int_type),
+                _mid3(glsl_type::ivec2_type),
+                _mid3(glsl_type::ivec3_type),
+                _mid3(glsl_type::ivec4_type),
+
+                _mid3(glsl_type::uint_type),
+                _mid3(glsl_type::uvec2_type),
+                _mid3(glsl_type::uvec3_type),
+                _mid3(glsl_type::uvec4_type),
+                NULL);
+
+   add_image_functions(true);
+
+   add_function("memoryBarrier",
+                _memory_barrier(shader_image_load_store),
+                NULL);
+
+#undef F
+#undef FI
+#undef FIU
+#undef FIUB
+#undef FIU2_MIXED
+}
+
+void
+builtin_builder::add_function(const char *name, ...)
+{
+   va_list ap;
+
+   ir_function *f = new(mem_ctx) ir_function(name);
+
+   va_start(ap, name);
+   while (true) {
+      ir_function_signature *sig = va_arg(ap, ir_function_signature *);
+      if (sig == NULL)
+         break;
+
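+      /* Debugging aid: flip this condition to true to run the IR
+       * validator over each signature as it is added.
+       */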
+      if (false) {
+         exec_list stuff;
+         stuff.push_tail(sig);
+         validate_ir_tree(&stuff);
+      }
+
+      f->add_signature(sig);
+   }
+   va_end(ap);
+
+   shader->symbols->add_function(f);
+}
+
+void
+builtin_builder::add_image_function(const char *name,
+                                    const char *intrinsic_name,
+                                    unsigned num_arguments,
+                                    unsigned flags)
+{
+   static const glsl_type *const types[] = {
+      glsl_type::image1D_type,
+      glsl_type::image2D_type,
+      glsl_type::image3D_type,
+      glsl_type::image2DRect_type,
+      glsl_type::imageCube_type,
+      glsl_type::imageBuffer_type,
+      glsl_type::image1DArray_type,
+      glsl_type::image2DArray_type,
+      glsl_type::imageCubeArray_type,
+      glsl_type::image2DMS_type,
+      glsl_type::image2DMSArray_type,
+      glsl_type::iimage1D_type,
+      glsl_type::iimage2D_type,
+      glsl_type::iimage3D_type,
+      glsl_type::iimage2DRect_type,
+      glsl_type::iimageCube_type,
+      glsl_type::iimageBuffer_type,
+      glsl_type::iimage1DArray_type,
+      glsl_type::iimage2DArray_type,
+      glsl_type::iimageCubeArray_type,
+      glsl_type::iimage2DMS_type,
+      glsl_type::iimage2DMSArray_type,
+      glsl_type::uimage1D_type,
+      glsl_type::uimage2D_type,
+      glsl_type::uimage3D_type,
+      glsl_type::uimage2DRect_type,
+      glsl_type::uimageCube_type,
+      glsl_type::uimageBuffer_type,
+      glsl_type::uimage1DArray_type,
+      glsl_type::uimage2DArray_type,
+      glsl_type::uimageCubeArray_type,
+      glsl_type::uimage2DMS_type,
+      glsl_type::uimage2DMSArray_type
+   };
+   ir_function *f = new(mem_ctx) ir_function(name);
+
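+   /* Skip the floating-point image types unless the function is flagged
+    * as supporting float data (the image atomics are integer-only).
+    */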
+   for (unsigned i = 0; i < Elements(types); ++i) {
+      if (types[i]->sampler_type != GLSL_TYPE_FLOAT ||
+          (flags & IMAGE_FUNCTION_SUPPORTS_FLOAT_DATA_TYPE))
+         f->add_signature(_image(types[i], intrinsic_name,
+                                 num_arguments, flags));
+   }
+
+   shader->symbols->add_function(f);
+}
+
+void
+builtin_builder::add_image_functions(bool glsl)
+{
+   const unsigned flags = (glsl ? IMAGE_FUNCTION_EMIT_STUB : 0);
+
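+   /* When `glsl` is true, create the user-visible built-ins as stubs that
+    * forward to the intrinsics; otherwise create the __intrinsic_* functions
+    * themselves.
+    */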
+   add_image_function(glsl ? "imageLoad" : "__intrinsic_image_load",
+                      "__intrinsic_image_load", 0,
+                      (flags | IMAGE_FUNCTION_HAS_VECTOR_DATA_TYPE |
+                       IMAGE_FUNCTION_SUPPORTS_FLOAT_DATA_TYPE |
+                       IMAGE_FUNCTION_READ_ONLY));
+
+   add_image_function(glsl ? "imageStore" : "__intrinsic_image_store",
+                      "__intrinsic_image_store", 1,
+                      (flags | IMAGE_FUNCTION_RETURNS_VOID |
+                       IMAGE_FUNCTION_HAS_VECTOR_DATA_TYPE |
+                       IMAGE_FUNCTION_SUPPORTS_FLOAT_DATA_TYPE |
+                       IMAGE_FUNCTION_WRITE_ONLY));
+
+   add_image_function(glsl ? "imageAtomicAdd" : "__intrinsic_image_atomic_add",
+                      "__intrinsic_image_atomic_add", 1, flags);
+
+   add_image_function(glsl ? "imageAtomicMin" : "__intrinsic_image_atomic_min",
+                      "__intrinsic_image_atomic_min", 1, flags);
+
+   add_image_function(glsl ? "imageAtomicMax" : "__intrinsic_image_atomic_max",
+                      "__intrinsic_image_atomic_max", 1, flags);
+
+   add_image_function(glsl ? "imageAtomicAnd" : "__intrinsic_image_atomic_and",
+                      "__intrinsic_image_atomic_and", 1, flags);
+
+   add_image_function(glsl ? "imageAtomicOr" : "__intrinsic_image_atomic_or",
+                      "__intrinsic_image_atomic_or", 1, flags);
+
+   add_image_function(glsl ? "imageAtomicXor" : "__intrinsic_image_atomic_xor",
+                      "__intrinsic_image_atomic_xor", 1, flags);
+
+   add_image_function((glsl ? "imageAtomicExchange" :
+                       "__intrinsic_image_atomic_exchange"),
+                      "__intrinsic_image_atomic_exchange", 1, flags);
+
+   add_image_function((glsl ? "imageAtomicCompSwap" :
+                       "__intrinsic_image_atomic_comp_swap"),
+                      "__intrinsic_image_atomic_comp_swap", 2, flags);
+}
+
+ir_variable *
+builtin_builder::in_var(const glsl_type *type, const char *name)
+{
+   return new(mem_ctx) ir_variable(type, name, ir_var_function_in);
+}
+
+ir_variable *
+builtin_builder::out_var(const glsl_type *type, const char *name)
+{
+   return new(mem_ctx) ir_variable(type, name, ir_var_function_out);
+}
+
+ir_constant *
+builtin_builder::imm(float f, unsigned vector_elements)
+{
+   return new(mem_ctx) ir_constant(f, vector_elements);
+}
+
+ir_constant *
+builtin_builder::imm(int i, unsigned vector_elements)
+{
+   return new(mem_ctx) ir_constant(i, vector_elements);
+}
+
+ir_constant *
+builtin_builder::imm(unsigned u, unsigned vector_elements)
+{
+   return new(mem_ctx) ir_constant(u, vector_elements);
+}
+
+ir_constant *
+builtin_builder::imm(const glsl_type *type, const ir_constant_data &data)
+{
+   return new(mem_ctx) ir_constant(type, &data);
+}
+
+ir_dereference_variable *
+builtin_builder::var_ref(ir_variable *var)
+{
+   return new(mem_ctx) ir_dereference_variable(var);
+}
+
+ir_dereference_array *
+builtin_builder::array_ref(ir_variable *var, int idx)
+{
+   return new(mem_ctx) ir_dereference_array(var, imm(idx));
+}
+
+/** Return an element of a matrix */
+ir_swizzle *
+builtin_builder::matrix_elt(ir_variable *var, int column, int row)
+{
+   return swizzle(array_ref(var, column), row, 1);
+}
+
+/**
+ * Implementations of built-in functions:
+ *  @{
+ */
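+
+/* Build a signature with the given return type and availability predicate;
+ * the trailing varargs are the num_params ir_variable parameters, in order.
+ */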
+ir_function_signature *
+builtin_builder::new_sig(const glsl_type *return_type,
+                         builtin_available_predicate avail,
+                         int num_params,
+                         ...)
+{
+   va_list ap;
+
+   ir_function_signature *sig =
+      new(mem_ctx) ir_function_signature(return_type, avail);
+
+   exec_list plist;
+   va_start(ap, num_params);
+   for (int i = 0; i < num_params; i++) {
+      plist.push_tail(va_arg(ap, ir_variable *));
+   }
+   va_end(ap);
+
+   sig->replace_parameters(&plist);
+   return sig;
+}
+
+#define MAKE_SIG(return_type, avail, ...)            \
+   ir_function_signature *sig =                      \
+      new_sig(return_type, avail, __VA_ARGS__);      \
+   ir_factory body(&sig->body, mem_ctx);             \
+   sig->is_defined = true;
+
+#define MAKE_INTRINSIC(return_type, avail, ...)      \
+   ir_function_signature *sig =                      \
+      new_sig(return_type, avail, __VA_ARGS__);      \
+   sig->is_intrinsic = true;
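+
+/* Both macros declare `sig` in the enclosing scope; MAKE_SIG also declares
+ * an ir_factory `body` that the builders below emit IR into, while
+ * MAKE_INTRINSIC produces a bodiless intrinsic signature.
+ */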
+
+ir_function_signature *
+builtin_builder::unop(builtin_available_predicate avail,
+                      ir_expression_operation opcode,
+                      const glsl_type *return_type,
+                      const glsl_type *param_type)
+{
+   ir_variable *x = in_var(param_type, "x");
+   MAKE_SIG(return_type, avail, 1, x);
+   body.emit(ret(expr(opcode, x)));
+   return sig;
+}
+
+#define UNOP(NAME, OPCODE, AVAIL)               \
+ir_function_signature *                         \
+builtin_builder::_##NAME(const glsl_type *type) \
+{                                               \
+   return unop(&AVAIL, OPCODE, type, type);     \
+}
+
+ir_function_signature *
+builtin_builder::binop(ir_expression_operation opcode,
+                       builtin_available_predicate avail,
+                       const glsl_type *return_type,
+                       const glsl_type *param0_type,
+                       const glsl_type *param1_type)
+{
+   ir_variable *x = in_var(param0_type, "x");
+   ir_variable *y = in_var(param1_type, "y");
+   MAKE_SIG(return_type, avail, 2, x, y);
+   body.emit(ret(expr(opcode, x, y)));
+   return sig;
+}
+
+#define BINOP(NAME, OPCODE, AVAIL)                                      \
+ir_function_signature *                                                 \
+builtin_builder::_##NAME(const glsl_type *return_type,                  \
+                         const glsl_type *param0_type,                  \
+                         const glsl_type *param1_type)                  \
+{                                                                       \
+   return binop(OPCODE, &AVAIL, return_type, param0_type, param1_type); \
+}
+
+/**
+ * Angle and Trigonometry Functions @{
+ */
+
+ir_function_signature *
+builtin_builder::_radians(const glsl_type *type)
+{
+   ir_variable *degrees = in_var(type, "degrees");
+   MAKE_SIG(type, always_available, 1, degrees);
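+   /* 0.0174532925 ~= pi / 180, so e.g. radians(180.0) ~= 3.1415927. */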
+   body.emit(ret(mul(degrees, imm(0.0174532925f))));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_degrees(const glsl_type *type)
+{
+   ir_variable *radians = in_var(type, "radians");
+   MAKE_SIG(type, always_available, 1, radians);
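+   /* 57.29578 ~= 180 / pi, the inverse of the factor used in _radians(). */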
+   body.emit(ret(mul(radians, imm(57.29578f))));
+   return sig;
+}
+
+UNOP(sin, ir_unop_sin, always_available)
+UNOP(cos, ir_unop_cos, always_available)
+
+ir_function_signature *
+builtin_builder::_tan(const glsl_type *type)
+{
+   ir_variable *theta = in_var(type, "theta");
+   MAKE_SIG(type, always_available, 1, theta);
+   body.emit(ret(div(sin(theta), cos(theta))));
+   return sig;
+}
+
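+/* Polynomial approximation of asin(x) over [-1, 1]:
+ *
+ *    asin(x) ~= sign(x) * (pi/2 - sqrt(1 - |x|) *
+ *               (pi/2 + |x| * (pi/4 - 1 + |x| * (c0 + c1 * |x|))))
+ *
+ * with c0 = 0.086566724 and c1 = -0.03102955. _acos() and _atan() reuse it
+ * via acos(x) = pi/2 - asin(x) and atan(z) = asin(z / sqrt(z^2 + 1)).
+ */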
+ir_expression *
+builtin_builder::asin_expr(ir_variable *x)
+{
+   return mul(sign(x),
+              sub(imm(M_PI_2f),
+                  mul(sqrt(sub(imm(1.0f), abs(x))),
+                      add(imm(M_PI_2f),
+                          mul(abs(x),
+                              add(imm(M_PI_4f - 1.0f),
+                                  mul(abs(x),
+                                      add(imm(0.086566724f),
+                                          mul(abs(x), imm(-0.03102955f))))))))));
+}
+
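+/* Build an ir_call of `f` with the given variables as actual parameters,
+ * storing the result in `ret` (no destination for void functions). Returns
+ * NULL if no signature of `f` exactly matches the parameter types.
+ */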
+ir_call *
+builtin_builder::call(ir_function *f, ir_variable *ret, exec_list params)
+{
+   exec_list actual_params;
+
+   foreach_list(node, &params) {
+      ir_variable *var = (ir_variable *) node;
+      actual_params.push_tail(var_ref(var));
+   }
+
+   ir_function_signature *sig =
+      f->exact_matching_signature(NULL, &actual_params);
+   if (!sig)
+      return NULL;
+
+   ir_dereference_variable *deref =
+      (sig->return_type->is_void() ? NULL : var_ref(ret));
+
+   return new(mem_ctx) ir_call(sig, deref, &actual_params);
+}
+
+ir_function_signature *
+builtin_builder::_asin(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(type, always_available, 1, x);
+
+   body.emit(ret(asin_expr(x)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_acos(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(type, always_available, 1, x);
+
+   body.emit(ret(sub(imm(M_PI_2f), asin_expr(x))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_atan2(const glsl_type *type)
+{
+   ir_variable *vec_y = in_var(type, "vec_y");
+   ir_variable *vec_x = in_var(type, "vec_x");
+   MAKE_SIG(type, always_available, 2, vec_y, vec_x);
+
+   ir_variable *vec_result = body.make_temp(type, "vec_result");
+   ir_variable *r = body.make_temp(glsl_type::float_type, "r");
+   for (int i = 0; i < type->vector_elements; i++) {
+      ir_variable *y = body.make_temp(glsl_type::float_type, "y");
+      ir_variable *x = body.make_temp(glsl_type::float_type, "x");
+      body.emit(assign(y, swizzle(vec_y, i, 1)));
+      body.emit(assign(x, swizzle(vec_x, i, 1)));
+
+      /* If |x| > 1.0e-8 * |y|: */
+      ir_if *outer_if =
+         new(mem_ctx) ir_if(greater(abs(x), mul(imm(1.0e-8f), abs(y))));
+
+      ir_factory outer_then(&outer_if->then_instructions, mem_ctx);
+
+      /* Then...call atan(y/x) */
+      ir_variable *y_over_x = outer_then.make_temp(glsl_type::float_type, "y_over_x");
+      outer_then.emit(assign(y_over_x, div(y, x)));
+      outer_then.emit(assign(r, mul(y_over_x, rsq(add(mul(y_over_x, y_over_x),
+                                                      imm(1.0f))))));
+      outer_then.emit(assign(r, asin_expr(r)));
+
+      /*     ...and fix it up: */
+      ir_if *inner_if = new(mem_ctx) ir_if(less(x, imm(0.0f)));
+      inner_if->then_instructions.push_tail(
+         if_tree(gequal(y, imm(0.0f)),
+                 assign(r, add(r, imm(M_PIf))),
+                 assign(r, sub(r, imm(M_PIf)))));
+      outer_then.emit(inner_if);
+
+      /* Else... */
+      outer_if->else_instructions.push_tail(
+         assign(r, mul(sign(y), imm(M_PI_2f))));
+
+      body.emit(outer_if);
+
+      body.emit(assign(vec_result, r, 1 << i));
+   }
+   body.emit(ret(vec_result));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_atan(const glsl_type *type)
+{
+   ir_variable *y_over_x = in_var(type, "y_over_x");
+   MAKE_SIG(type, always_available, 1, y_over_x);
+
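+   /* z / sqrt(z^2 + 1) == sin(atan(z)), so asin() of that yields atan(z). */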
+   ir_variable *t = body.make_temp(type, "t");
+   body.emit(assign(t, mul(y_over_x, rsq(add(mul(y_over_x, y_over_x),
+                                             imm(1.0f))))));
+
+   body.emit(ret(asin_expr(t)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_sinh(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(type, v130, 1, x);
+
+   /* 0.5 * (e^x - e^(-x)) */
+   body.emit(ret(mul(imm(0.5f), sub(exp(x), exp(neg(x))))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_cosh(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(type, v130, 1, x);
+
+   /* 0.5 * (e^x + e^(-x)) */
+   body.emit(ret(mul(imm(0.5f), add(exp(x), exp(neg(x))))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_tanh(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(type, v130, 1, x);
+
+   /* (e^x - e^(-x)) / (e^x + e^(-x)) */
+   body.emit(ret(div(sub(exp(x), exp(neg(x))),
+                     add(exp(x), exp(neg(x))))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_asinh(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(type, v130, 1, x);
+
+   body.emit(ret(mul(sign(x), log(add(abs(x), sqrt(add(mul(x, x),
+                                                       imm(1.0f))))))));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_acosh(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(type, v130, 1, x);
+
+   body.emit(ret(log(add(x, sqrt(sub(mul(x, x), imm(1.0f)))))));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_atanh(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(type, v130, 1, x);
+
+   body.emit(ret(mul(imm(0.5f), log(div(add(imm(1.0f), x),
+                                        sub(imm(1.0f), x))))));
+   return sig;
+}
+/** @} */
+
+/**
+ * Exponential Functions @{
+ */
+
+ir_function_signature *
+builtin_builder::_pow(const glsl_type *type)
+{
+   return binop(ir_binop_pow, always_available, type, type, type);
+}
+
+UNOP(exp,         ir_unop_exp,  always_available)
+UNOP(log,         ir_unop_log,  always_available)
+UNOP(exp2,        ir_unop_exp2, always_available)
+UNOP(log2,        ir_unop_log2, always_available)
+UNOP(sqrt,        ir_unop_sqrt, always_available)
+UNOP(inversesqrt, ir_unop_rsq,  always_available)
+
+/** @} */
+
+UNOP(abs,       ir_unop_abs,        always_available)
+UNOP(sign,      ir_unop_sign,       always_available)
+UNOP(floor,     ir_unop_floor,      always_available)
+UNOP(trunc,     ir_unop_trunc,      v130)
+UNOP(round,     ir_unop_round_even, always_available)
+UNOP(roundEven, ir_unop_round_even, always_available)
+UNOP(ceil,      ir_unop_ceil,       always_available)
+UNOP(fract,     ir_unop_fract,      always_available)
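+
+/* Note that round() also maps to ir_unop_round_even: GLSL leaves the
+ * rounding direction of halfway cases to the implementation, so
+ * round-to-even is a conforming choice.
+ */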
+
+ir_function_signature *
+builtin_builder::_mod(const glsl_type *x_type, const glsl_type *y_type)
+{
+   return binop(ir_binop_mod, always_available, x_type, x_type, y_type);
+}
+
+ir_function_signature *
+builtin_builder::_modf(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   ir_variable *i = out_var(type, "i");
+   MAKE_SIG(type, v130, 2, x, i);
+
+   ir_variable *t = body.make_temp(type, "t");
+   body.emit(assign(t, expr(ir_unop_trunc, x)));
+   body.emit(assign(i, t));
+   body.emit(ret(sub(x, t)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_min(builtin_available_predicate avail,
+                      const glsl_type *x_type, const glsl_type *y_type)
+{
+   return binop(ir_binop_min, avail, x_type, x_type, y_type);
+}
+
+ir_function_signature *
+builtin_builder::_max(builtin_available_predicate avail,
+                      const glsl_type *x_type, const glsl_type *y_type)
+{
+   return binop(ir_binop_max, avail, x_type, x_type, y_type);
+}
+
+ir_function_signature *
+builtin_builder::_clamp(builtin_available_predicate avail,
+                        const glsl_type *val_type, const glsl_type *bound_type)
+{
+   ir_variable *x = in_var(val_type, "x");
+   ir_variable *minVal = in_var(bound_type, "minVal");
+   ir_variable *maxVal = in_var(bound_type, "maxVal");
+   MAKE_SIG(val_type, avail, 3, x, minVal, maxVal);
+
+   body.emit(ret(clamp(x, minVal, maxVal)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_mix_lrp(const glsl_type *val_type, const glsl_type *blend_type)
+{
+   ir_variable *x = in_var(val_type, "x");
+   ir_variable *y = in_var(val_type, "y");
+   ir_variable *a = in_var(blend_type, "a");
+   MAKE_SIG(val_type, always_available, 3, x, y, a);
+
+   body.emit(ret(lrp(x, y, a)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_mix_sel(builtin_available_predicate avail,
+                          const glsl_type *val_type,
+                          const glsl_type *blend_type)
+{
+   ir_variable *x = in_var(val_type, "x");
+   ir_variable *y = in_var(val_type, "y");
+   ir_variable *a = in_var(blend_type, "a");
+   MAKE_SIG(val_type, avail, 3, x, y, a);
+
+   /* csel matches the ternary operator in that a selector of true chooses the
+    * first argument. This differs from mix(x, y, false), which chooses the
+    * second argument (to remain consistent with the interpolating version of
+    * mix(), which takes a blend factor from 0.0 to 1.0 where 0.0 is only x).
+    *
+    * To handle the behavior mismatch, reverse the x and y arguments.
+    */
+   body.emit(ret(csel(a, y, x)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_step(const glsl_type *edge_type, const glsl_type *x_type)
+{
+   ir_variable *edge = in_var(edge_type, "edge");
+   ir_variable *x = in_var(x_type, "x");
+   MAKE_SIG(x_type, always_available, 2, edge, x);
+
+   ir_variable *t = body.make_temp(x_type, "t");
+   if (x_type->vector_elements == 1) {
+      /* Both are floats */
+      body.emit(assign(t, b2f(gequal(x, edge))));
+   } else if (edge_type->vector_elements == 1) {
+      /* x is a vector but edge is a float */
+      for (int i = 0; i < x_type->vector_elements; i++) {
+         body.emit(assign(t, b2f(gequal(swizzle(x, i, 1), edge)), 1 << i));
+      }
+   } else {
+      /* Both are vectors */
+      for (int i = 0; i < x_type->vector_elements; i++) {
+         body.emit(assign(t, b2f(gequal(swizzle(x, i, 1), swizzle(edge, i, 1))),
+                          1 << i));
+      }
+   }
+   body.emit(ret(t));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_smoothstep(const glsl_type *edge_type, const glsl_type *x_type)
+{
+   ir_variable *edge0 = in_var(edge_type, "edge0");
+   ir_variable *edge1 = in_var(edge_type, "edge1");
+   ir_variable *x = in_var(x_type, "x");
+   MAKE_SIG(x_type, always_available, 3, edge0, edge1, x);
+
+   /* From the GLSL 1.10 specification:
+    *
+    *    genType t;
+    *    t = clamp((x - edge0) / (edge1 - edge0), 0, 1);
+    *    return t * t * (3 - 2 * t);
+    */
+
+   ir_variable *t = body.make_temp(x_type, "t");
+   body.emit(assign(t, clamp(div(sub(x, edge0), sub(edge1, edge0)),
+                             imm(0.0f), imm(1.0f))));
+
+   body.emit(ret(mul(t, mul(t, sub(imm(3.0f), mul(imm(2.0f), t))))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_isnan(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(glsl_type::bvec(type->vector_elements), v130, 1, x);
+
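+   /* IEEE 754: NaN is the only value that compares unequal to itself. */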
+   body.emit(ret(nequal(x, x)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_isinf(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(glsl_type::bvec(type->vector_elements), v130, 1, x);
+
+   ir_constant_data infinities;
+   for (int i = 0; i < type->vector_elements; i++) {
+      infinities.f[i] = std::numeric_limits<float>::infinity();
+   }
+
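+   /* abs() folds -inf onto +inf, so a single comparison catches both. */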
+   body.emit(ret(equal(abs(x), imm(type, infinities))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_floatBitsToInt(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(glsl_type::ivec(type->vector_elements), shader_bit_encoding, 1, x);
+   body.emit(ret(bitcast_f2i(x)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_floatBitsToUint(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(glsl_type::uvec(type->vector_elements), shader_bit_encoding, 1, x);
+   body.emit(ret(bitcast_f2u(x)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_intBitsToFloat(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(glsl_type::vec(type->vector_elements), shader_bit_encoding, 1, x);
+   body.emit(ret(bitcast_i2f(x)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_uintBitsToFloat(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(glsl_type::vec(type->vector_elements), shader_bit_encoding, 1, x);
+   body.emit(ret(bitcast_u2f(x)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_packUnorm2x16(builtin_available_predicate avail)
+{
+   ir_variable *v = in_var(glsl_type::vec2_type, "v");
+   MAKE_SIG(glsl_type::uint_type, avail, 1, v);
+   body.emit(ret(expr(ir_unop_pack_unorm_2x16, v)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_packSnorm2x16(builtin_available_predicate avail)
+{
+   ir_variable *v = in_var(glsl_type::vec2_type, "v");
+   MAKE_SIG(glsl_type::uint_type, avail, 1, v);
+   body.emit(ret(expr(ir_unop_pack_snorm_2x16, v)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_packUnorm4x8(builtin_available_predicate avail)
+{
+   ir_variable *v = in_var(glsl_type::vec4_type, "v");
+   MAKE_SIG(glsl_type::uint_type, avail, 1, v);
+   body.emit(ret(expr(ir_unop_pack_unorm_4x8, v)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_packSnorm4x8(builtin_available_predicate avail)
+{
+   ir_variable *v = in_var(glsl_type::vec4_type, "v");
+   MAKE_SIG(glsl_type::uint_type, avail, 1, v);
+   body.emit(ret(expr(ir_unop_pack_snorm_4x8, v)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_unpackUnorm2x16(builtin_available_predicate avail)
+{
+   ir_variable *p = in_var(glsl_type::uint_type, "p");
+   MAKE_SIG(glsl_type::vec2_type, avail, 1, p);
+   body.emit(ret(expr(ir_unop_unpack_unorm_2x16, p)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_unpackSnorm2x16(builtin_available_predicate avail)
+{
+   ir_variable *p = in_var(glsl_type::uint_type, "p");
+   MAKE_SIG(glsl_type::vec2_type, avail, 1, p);
+   body.emit(ret(expr(ir_unop_unpack_snorm_2x16, p)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_unpackUnorm4x8(builtin_available_predicate avail)
+{
+   ir_variable *p = in_var(glsl_type::uint_type, "p");
+   MAKE_SIG(glsl_type::vec4_type, avail, 1, p);
+   body.emit(ret(expr(ir_unop_unpack_unorm_4x8, p)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_unpackSnorm4x8(builtin_available_predicate avail)
+{
+   ir_variable *p = in_var(glsl_type::uint_type, "p");
+   MAKE_SIG(glsl_type::vec4_type, avail, 1, p);
+   body.emit(ret(expr(ir_unop_unpack_snorm_4x8, p)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_packHalf2x16(builtin_available_predicate avail)
+{
+   ir_variable *v = in_var(glsl_type::vec2_type, "v");
+   MAKE_SIG(glsl_type::uint_type, avail, 1, v);
+   body.emit(ret(expr(ir_unop_pack_half_2x16, v)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_unpackHalf2x16(builtin_available_predicate avail)
+{
+   ir_variable *p = in_var(glsl_type::uint_type, "p");
+   MAKE_SIG(glsl_type::vec2_type, avail, 1, p);
+   body.emit(ret(expr(ir_unop_unpack_half_2x16, p)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_length(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(glsl_type::float_type, always_available, 1, x);
+
+   body.emit(ret(sqrt(dot(x, x))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_distance(const glsl_type *type)
+{
+   ir_variable *p0 = in_var(type, "p0");
+   ir_variable *p1 = in_var(type, "p1");
+   MAKE_SIG(glsl_type::float_type, always_available, 2, p0, p1);
+
+   if (type->vector_elements == 1) {
+      body.emit(ret(abs(sub(p0, p1))));
+   } else {
+      ir_variable *p = body.make_temp(type, "p");
+      body.emit(assign(p, sub(p0, p1)));
+      body.emit(ret(sqrt(dot(p, p))));
+   }
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_dot(const glsl_type *type)
+{
+   if (type->vector_elements == 1)
+      return binop(ir_binop_mul, always_available, type, type, type);
+
+   return binop(ir_binop_dot, always_available,
+                glsl_type::float_type, type, type);
+}
+
+ir_function_signature *
+builtin_builder::_cross(const glsl_type *type)
+{
+   ir_variable *a = in_var(type, "a");
+   ir_variable *b = in_var(type, "b");
+   MAKE_SIG(type, always_available, 2, a, b);
+
+   int yzx = MAKE_SWIZZLE4(SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_X, 0);
+   int zxy = MAKE_SWIZZLE4(SWIZZLE_Z, SWIZZLE_X, SWIZZLE_Y, 0);
+
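+   /* cross(a, b) = a.yzx * b.zxy - a.zxy * b.yzx, i.e.
+    * (a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x).
+    */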
+   body.emit(ret(sub(mul(swizzle(a, yzx, 3), swizzle(b, zxy, 3)),
+                     mul(swizzle(a, zxy, 3), swizzle(b, yzx, 3)))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_normalize(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(type, always_available, 1, x);
+
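+   /* A normalized scalar is just its sign (+/-1). */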
+   if (type->vector_elements == 1) {
+      body.emit(ret(sign(x)));
+   } else {
+      body.emit(ret(mul(x, rsq(dot(x, x)))));
+   }
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_ftransform()
+{
+   MAKE_SIG(glsl_type::vec4_type, compatibility_vs_only, 0);
+
+   body.emit(ret(new(mem_ctx) ir_expression(ir_binop_mul,
+      glsl_type::vec4_type,
+      var_ref(gl_ModelViewProjectionMatrix),
+      var_ref(gl_Vertex))));
+
+   /* FINISHME: Once the ir_expression() constructor handles type inference
+    *           for matrix operations, we can simplify this to:
+    *
+    *    body.emit(ret(mul(gl_ModelViewProjectionMatrix, gl_Vertex)));
+    */
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_faceforward(const glsl_type *type)
+{
+   ir_variable *N = in_var(type, "N");
+   ir_variable *I = in_var(type, "I");
+   ir_variable *Nref = in_var(type, "Nref");
+   MAKE_SIG(type, always_available, 3, N, I, Nref);
+
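+   /* GLSL faceforward(): return N when dot(Nref, I) < 0, otherwise -N. */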
+   body.emit(if_tree(less(dot(Nref, I), imm(0.0f)),
+                     ret(N), ret(neg(N))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_reflect(const glsl_type *type)
+{
+   ir_variable *I = in_var(type, "I");
+   ir_variable *N = in_var(type, "N");
+   MAKE_SIG(type, always_available, 2, I, N);
+
+   /* I - 2 * dot(N, I) * N */
+   body.emit(ret(sub(I, mul(imm(2.0f), mul(dot(N, I), N)))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_refract(const glsl_type *type)
+{
+   ir_variable *I = in_var(type, "I");
+   ir_variable *N = in_var(type, "N");
+   ir_variable *eta = in_var(glsl_type::float_type, "eta");
+   MAKE_SIG(type, always_available, 3, I, N, eta);
+
+   ir_variable *n_dot_i = body.make_temp(glsl_type::float_type, "n_dot_i");
+   body.emit(assign(n_dot_i, dot(N, I)));
+
+   /* From the GLSL 1.10 specification:
+    * k = 1.0 - eta * eta * (1.0 - dot(N, I) * dot(N, I))
+    * if (k < 0.0)
+    *    return genType(0.0)
+    * else
+    *    return eta * I - (eta * dot(N, I) + sqrt(k)) * N
+    */
+   ir_variable *k = body.make_temp(glsl_type::float_type, "k");
+   body.emit(assign(k, sub(imm(1.0f),
+                           mul(eta, mul(eta, sub(imm(1.0f),
+                                                 mul(n_dot_i, n_dot_i)))))));
+   body.emit(if_tree(less(k, imm(0.0f)),
+                     ret(ir_constant::zero(mem_ctx, type)),
+                     ret(sub(mul(eta, I),
+                             mul(add(mul(eta, n_dot_i), sqrt(k)), N)))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_matrixCompMult(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   ir_variable *y = in_var(type, "y");
+   MAKE_SIG(type, always_available, 2, x, y);
+
+   ir_variable *z = body.make_temp(type, "z");
+   for (int i = 0; i < type->matrix_columns; i++) {
+      body.emit(assign(array_ref(z, i), mul(array_ref(x, i), array_ref(y, i))));
+   }
+   body.emit(ret(z));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_outerProduct(const glsl_type *type)
+{
+   ir_variable *c = in_var(glsl_type::vec(type->vector_elements), "c");
+   ir_variable *r = in_var(glsl_type::vec(type->matrix_columns), "r");
+   MAKE_SIG(type, v120, 2, c, r);
+
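+   /* Column i of the result is the column vector c scaled by the scalar r[i]. */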
+   ir_variable *m = body.make_temp(type, "m");
+   for (int i = 0; i < type->matrix_columns; i++) {
+      body.emit(assign(array_ref(m, i), mul(c, swizzle(r, i, 1))));
+   }
+   body.emit(ret(m));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_transpose(const glsl_type *orig_type)
+{
+   const glsl_type *transpose_type =
+      glsl_type::get_instance(GLSL_TYPE_FLOAT,
+                              orig_type->matrix_columns,
+                              orig_type->vector_elements);
+
+   ir_variable *m = in_var(orig_type, "m");
+   MAKE_SIG(transpose_type, v120, 1, m);
+
+   ir_variable *t = body.make_temp(transpose_type, "t");
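+   /* The writemask (1 << i) selects component i of column j, so this stores
+    * m[i][j] into t[j][i].
+    */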
+   for (int i = 0; i < orig_type->matrix_columns; i++) {
+      for (int j = 0; j < orig_type->vector_elements; j++) {
+         body.emit(assign(array_ref(t, j),
+                          matrix_elt(m, i, j),
+                          1 << i));
+      }
+   }
+   body.emit(ret(t));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_determinant_mat2()
+{
+   ir_variable *m = in_var(glsl_type::mat2_type, "m");
+   MAKE_SIG(glsl_type::float_type, v120, 1, m);
+
+   body.emit(ret(sub(mul(matrix_elt(m, 0, 0), matrix_elt(m, 1, 1)),
+                     mul(matrix_elt(m, 1, 0), matrix_elt(m, 0, 1)))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_determinant_mat3()
+{
+   ir_variable *m = in_var(glsl_type::mat3_type, "m");
+   MAKE_SIG(glsl_type::float_type, v120, 1, m);
+
+   ir_expression *f1 =
+      sub(mul(matrix_elt(m, 1, 1), matrix_elt(m, 2, 2)),
+          mul(matrix_elt(m, 1, 2), matrix_elt(m, 2, 1)));
+
+   ir_expression *f2 =
+      sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 2, 2)),
+          mul(matrix_elt(m, 1, 2), matrix_elt(m, 2, 0)));
+
+   ir_expression *f3 =
+      sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 2, 1)),
+          mul(matrix_elt(m, 1, 1), matrix_elt(m, 2, 0)));
+
+   body.emit(ret(add(sub(mul(matrix_elt(m, 0, 0), f1),
+                         mul(matrix_elt(m, 0, 1), f2)),
+                     mul(matrix_elt(m, 0, 2), f3))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_determinant_mat4()
+{
+   ir_variable *m = in_var(glsl_type::mat4_type, "m");
+   MAKE_SIG(glsl_type::float_type, v120, 1, m);
+
+   ir_variable *SubFactor00 = body.make_temp(glsl_type::float_type, "SubFactor00");
+   ir_variable *SubFactor01 = body.make_temp(glsl_type::float_type, "SubFactor01");
+   ir_variable *SubFactor02 = body.make_temp(glsl_type::float_type, "SubFactor02");
+   ir_variable *SubFactor03 = body.make_temp(glsl_type::float_type, "SubFactor03");
+   ir_variable *SubFactor04 = body.make_temp(glsl_type::float_type, "SubFactor04");
+   ir_variable *SubFactor05 = body.make_temp(glsl_type::float_type, "SubFactor05");
+   ir_variable *SubFactor06 = body.make_temp(glsl_type::float_type, "SubFactor06");
+   ir_variable *SubFactor07 = body.make_temp(glsl_type::float_type, "SubFactor07");
+   ir_variable *SubFactor08 = body.make_temp(glsl_type::float_type, "SubFactor08");
+   ir_variable *SubFactor09 = body.make_temp(glsl_type::float_type, "SubFactor09");
+   ir_variable *SubFactor10 = body.make_temp(glsl_type::float_type, "SubFactor10");
+   ir_variable *SubFactor11 = body.make_temp(glsl_type::float_type, "SubFactor11");
+   ir_variable *SubFactor12 = body.make_temp(glsl_type::float_type, "SubFactor12");
+   ir_variable *SubFactor13 = body.make_temp(glsl_type::float_type, "SubFactor13");
+   ir_variable *SubFactor14 = body.make_temp(glsl_type::float_type, "SubFactor14");
+   ir_variable *SubFactor15 = body.make_temp(glsl_type::float_type, "SubFactor15");
+   ir_variable *SubFactor16 = body.make_temp(glsl_type::float_type, "SubFactor16");
+   ir_variable *SubFactor17 = body.make_temp(glsl_type::float_type, "SubFactor17");
+   ir_variable *SubFactor18 = body.make_temp(glsl_type::float_type, "SubFactor18");
+
+   body.emit(assign(SubFactor00, sub(mul(matrix_elt(m, 2, 2), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 2), matrix_elt(m, 2, 3)))));
+   body.emit(assign(SubFactor01, sub(mul(matrix_elt(m, 2, 1), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 1), matrix_elt(m, 2, 3)))));
+   body.emit(assign(SubFactor02, sub(mul(matrix_elt(m, 2, 1), matrix_elt(m, 3, 2)), mul(matrix_elt(m, 3, 1), matrix_elt(m, 2, 2)))));
+   body.emit(assign(SubFactor03, sub(mul(matrix_elt(m, 2, 0), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 0), matrix_elt(m, 2, 3)))));
+   body.emit(assign(SubFactor04, sub(mul(matrix_elt(m, 2, 0), matrix_elt(m, 3, 2)), mul(matrix_elt(m, 3, 0), matrix_elt(m, 2, 2)))));
+   body.emit(assign(SubFactor05, sub(mul(matrix_elt(m, 2, 0), matrix_elt(m, 3, 1)), mul(matrix_elt(m, 3, 0), matrix_elt(m, 2, 1)))));
+   body.emit(assign(SubFactor06, sub(mul(matrix_elt(m, 1, 2), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 2), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor07, sub(mul(matrix_elt(m, 1, 1), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 1), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor08, sub(mul(matrix_elt(m, 1, 1), matrix_elt(m, 3, 2)), mul(matrix_elt(m, 3, 1), matrix_elt(m, 1, 2)))));
+   body.emit(assign(SubFactor09, sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 0), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor10, sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 3, 2)), mul(matrix_elt(m, 3, 0), matrix_elt(m, 1, 2)))));
+   body.emit(assign(SubFactor11, sub(mul(matrix_elt(m, 1, 1), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 1), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor12, sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 3, 1)), mul(matrix_elt(m, 3, 0), matrix_elt(m, 1, 1)))));
+   body.emit(assign(SubFactor13, sub(mul(matrix_elt(m, 1, 2), matrix_elt(m, 2, 3)), mul(matrix_elt(m, 2, 2), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor14, sub(mul(matrix_elt(m, 1, 1), matrix_elt(m, 2, 3)), mul(matrix_elt(m, 2, 1), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor15, sub(mul(matrix_elt(m, 1, 1), matrix_elt(m, 2, 2)), mul(matrix_elt(m, 2, 1), matrix_elt(m, 1, 2)))));
+   body.emit(assign(SubFactor16, sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 2, 3)), mul(matrix_elt(m, 2, 0), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor17, sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 2, 2)), mul(matrix_elt(m, 2, 0), matrix_elt(m, 1, 2)))));
+   body.emit(assign(SubFactor18, sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 2, 1)), mul(matrix_elt(m, 2, 0), matrix_elt(m, 1, 1)))));
+
+   ir_variable *adj_0 = body.make_temp(glsl_type::vec4_type, "adj_0");
+
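+   /* Laplace expansion along the first column: adj_0 collects the four
+    * cofactors of m[0], and the determinant is dot(m[0], adj_0) below.
+    */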
+   body.emit(assign(adj_0,
+                    add(sub(mul(matrix_elt(m, 1, 1), SubFactor00),
+                            mul(matrix_elt(m, 1, 2), SubFactor01)),
+                        mul(matrix_elt(m, 1, 3), SubFactor02)),
+                    WRITEMASK_X));
+   body.emit(assign(adj_0, neg(
+                    add(sub(mul(matrix_elt(m, 1, 0), SubFactor00),
+                            mul(matrix_elt(m, 1, 2), SubFactor03)),
+                        mul(matrix_elt(m, 1, 3), SubFactor04))),
+                    WRITEMASK_Y));
+   body.emit(assign(adj_0,
+                    add(sub(mul(matrix_elt(m, 1, 0), SubFactor01),
+                            mul(matrix_elt(m, 1, 1), SubFactor03)),
+                        mul(matrix_elt(m, 1, 3), SubFactor05)),
+                    WRITEMASK_Z));
+   body.emit(assign(adj_0, neg(
+                    add(sub(mul(matrix_elt(m, 1, 0), SubFactor02),
+                            mul(matrix_elt(m, 1, 1), SubFactor04)),
+                        mul(matrix_elt(m, 1, 2), SubFactor05))),
+                    WRITEMASK_W));
+
+   body.emit(ret(dot(array_ref(m, 0), adj_0)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_inverse_mat2()
+{
+   ir_variable *m = in_var(glsl_type::mat2_type, "m");
+   MAKE_SIG(glsl_type::mat2_type, v120, 1, m);
+
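+   /* 2x2 adjugate: swap the diagonal entries, negate the off-diagonal ones,
+    * then divide by the determinant.
+    */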
+   ir_variable *adj = body.make_temp(glsl_type::mat2_type, "adj");
+   body.emit(assign(array_ref(adj, 0), matrix_elt(m, 1, 1), 1 << 0));
+   body.emit(assign(array_ref(adj, 0), neg(matrix_elt(m, 0, 1)), 1 << 1));
+   body.emit(assign(array_ref(adj, 1), neg(matrix_elt(m, 1, 0)), 1 << 0));
+   body.emit(assign(array_ref(adj, 1), matrix_elt(m, 0, 0), 1 << 1));
+
+   ir_expression *det =
+      sub(mul(matrix_elt(m, 0, 0), matrix_elt(m, 1, 1)),
+          mul(matrix_elt(m, 1, 0), matrix_elt(m, 0, 1)));
+
+   body.emit(ret(div(adj, det)));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_inverse_mat3()
+{
+   ir_variable *m = in_var(glsl_type::mat3_type, "m");
+   MAKE_SIG(glsl_type::mat3_type, v120, 1, m);
+
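+   /* These three 2x2 cofactors feed both the first row of the adjugate and
+    * the determinant, so compute them once.
+    */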
+   ir_variable *f11_22_21_12 = body.make_temp(glsl_type::float_type, "f11_22_21_12");
+   ir_variable *f10_22_20_12 = body.make_temp(glsl_type::float_type, "f10_22_20_12");
+   ir_variable *f10_21_20_11 = body.make_temp(glsl_type::float_type, "f10_21_20_11");
+
+   body.emit(assign(f11_22_21_12,
+                    sub(mul(matrix_elt(m, 1, 1), matrix_elt(m, 2, 2)),
+                        mul(matrix_elt(m, 2, 1), matrix_elt(m, 1, 2)))));
+   body.emit(assign(f10_22_20_12,
+                    sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 2, 2)),
+                        mul(matrix_elt(m, 2, 0), matrix_elt(m, 1, 2)))));
+   body.emit(assign(f10_21_20_11,
+                    sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 2, 1)),
+                        mul(matrix_elt(m, 2, 0), matrix_elt(m, 1, 1)))));
+
+   ir_variable *adj = body.make_temp(glsl_type::mat3_type, "adj");
+   body.emit(assign(array_ref(adj, 0), f11_22_21_12, WRITEMASK_X));
+   body.emit(assign(array_ref(adj, 1), neg(f10_22_20_12), WRITEMASK_X));
+   body.emit(assign(array_ref(adj, 2), f10_21_20_11, WRITEMASK_X));
+
+   body.emit(assign(array_ref(adj, 0), neg(
+                    sub(mul(matrix_elt(m, 0, 1), matrix_elt(m, 2, 2)),
+                        mul(matrix_elt(m, 2, 1), matrix_elt(m, 0, 2)))),
+                    WRITEMASK_Y));
+   body.emit(assign(array_ref(adj, 1),
+                    sub(mul(matrix_elt(m, 0, 0), matrix_elt(m, 2, 2)),
+                        mul(matrix_elt(m, 2, 0), matrix_elt(m, 0, 2))),
+                    WRITEMASK_Y));
+   body.emit(assign(array_ref(adj, 2), neg(
+                    sub(mul(matrix_elt(m, 0, 0), matrix_elt(m, 2, 1)),
+                        mul(matrix_elt(m, 2, 0), matrix_elt(m, 0, 1)))),
+                    WRITEMASK_Y));
+
+   body.emit(assign(array_ref(adj, 0),
+                    sub(mul(matrix_elt(m, 0, 1), matrix_elt(m, 1, 2)),
+                        mul(matrix_elt(m, 1, 1), matrix_elt(m, 0, 2))),
+                    WRITEMASK_Z));
+   body.emit(assign(array_ref(adj, 1), neg(
+                    sub(mul(matrix_elt(m, 0, 0), matrix_elt(m, 1, 2)),
+                        mul(matrix_elt(m, 1, 0), matrix_elt(m, 0, 2)))),
+                    WRITEMASK_Z));
+   body.emit(assign(array_ref(adj, 2),
+                    sub(mul(matrix_elt(m, 0, 0), matrix_elt(m, 1, 1)),
+                        mul(matrix_elt(m, 1, 0), matrix_elt(m, 0, 1))),
+                    WRITEMASK_Z));
+
+   ir_expression *det =
+      add(sub(mul(matrix_elt(m, 0, 0), f11_22_21_12),
+              mul(matrix_elt(m, 0, 1), f10_22_20_12)),
+          mul(matrix_elt(m, 0, 2), f10_21_20_11));
+
+   body.emit(ret(div(adj, det)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_inverse_mat4()
+{
+   ir_variable *m = in_var(glsl_type::mat4_type, "m");
+   MAKE_SIG(glsl_type::mat4_type, v120, 1, m);
+
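+   /* Each SubFactor is a 2x2 determinant built from a pair of the last
+    * three columns of m; each one is reused by several adjugate entries.
+    */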
+   ir_variable *SubFactor00 = body.make_temp(glsl_type::float_type, "SubFactor00");
+   ir_variable *SubFactor01 = body.make_temp(glsl_type::float_type, "SubFactor01");
+   ir_variable *SubFactor02 = body.make_temp(glsl_type::float_type, "SubFactor02");
+   ir_variable *SubFactor03 = body.make_temp(glsl_type::float_type, "SubFactor03");
+   ir_variable *SubFactor04 = body.make_temp(glsl_type::float_type, "SubFactor04");
+   ir_variable *SubFactor05 = body.make_temp(glsl_type::float_type, "SubFactor05");
+   ir_variable *SubFactor06 = body.make_temp(glsl_type::float_type, "SubFactor06");
+   ir_variable *SubFactor07 = body.make_temp(glsl_type::float_type, "SubFactor07");
+   ir_variable *SubFactor08 = body.make_temp(glsl_type::float_type, "SubFactor08");
+   ir_variable *SubFactor09 = body.make_temp(glsl_type::float_type, "SubFactor09");
+   ir_variable *SubFactor10 = body.make_temp(glsl_type::float_type, "SubFactor10");
+   ir_variable *SubFactor11 = body.make_temp(glsl_type::float_type, "SubFactor11");
+   ir_variable *SubFactor12 = body.make_temp(glsl_type::float_type, "SubFactor12");
+   ir_variable *SubFactor13 = body.make_temp(glsl_type::float_type, "SubFactor13");
+   ir_variable *SubFactor14 = body.make_temp(glsl_type::float_type, "SubFactor14");
+   ir_variable *SubFactor15 = body.make_temp(glsl_type::float_type, "SubFactor15");
+   ir_variable *SubFactor16 = body.make_temp(glsl_type::float_type, "SubFactor16");
+   ir_variable *SubFactor17 = body.make_temp(glsl_type::float_type, "SubFactor17");
+   ir_variable *SubFactor18 = body.make_temp(glsl_type::float_type, "SubFactor18");
+
+   body.emit(assign(SubFactor00, sub(mul(matrix_elt(m, 2, 2), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 2), matrix_elt(m, 2, 3)))));
+   body.emit(assign(SubFactor01, sub(mul(matrix_elt(m, 2, 1), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 1), matrix_elt(m, 2, 3)))));
+   body.emit(assign(SubFactor02, sub(mul(matrix_elt(m, 2, 1), matrix_elt(m, 3, 2)), mul(matrix_elt(m, 3, 1), matrix_elt(m, 2, 2)))));
+   body.emit(assign(SubFactor03, sub(mul(matrix_elt(m, 2, 0), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 0), matrix_elt(m, 2, 3)))));
+   body.emit(assign(SubFactor04, sub(mul(matrix_elt(m, 2, 0), matrix_elt(m, 3, 2)), mul(matrix_elt(m, 3, 0), matrix_elt(m, 2, 2)))));
+   body.emit(assign(SubFactor05, sub(mul(matrix_elt(m, 2, 0), matrix_elt(m, 3, 1)), mul(matrix_elt(m, 3, 0), matrix_elt(m, 2, 1)))));
+   body.emit(assign(SubFactor06, sub(mul(matrix_elt(m, 1, 2), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 2), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor07, sub(mul(matrix_elt(m, 1, 1), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 1), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor08, sub(mul(matrix_elt(m, 1, 1), matrix_elt(m, 3, 2)), mul(matrix_elt(m, 3, 1), matrix_elt(m, 1, 2)))));
+   body.emit(assign(SubFactor09, sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 0), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor10, sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 3, 2)), mul(matrix_elt(m, 3, 0), matrix_elt(m, 1, 2)))));
+   body.emit(assign(SubFactor11, sub(mul(matrix_elt(m, 1, 1), matrix_elt(m, 3, 3)), mul(matrix_elt(m, 3, 1), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor12, sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 3, 1)), mul(matrix_elt(m, 3, 0), matrix_elt(m, 1, 1)))));
+   body.emit(assign(SubFactor13, sub(mul(matrix_elt(m, 1, 2), matrix_elt(m, 2, 3)), mul(matrix_elt(m, 2, 2), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor14, sub(mul(matrix_elt(m, 1, 1), matrix_elt(m, 2, 3)), mul(matrix_elt(m, 2, 1), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor15, sub(mul(matrix_elt(m, 1, 1), matrix_elt(m, 2, 2)), mul(matrix_elt(m, 2, 1), matrix_elt(m, 1, 2)))));
+   body.emit(assign(SubFactor16, sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 2, 3)), mul(matrix_elt(m, 2, 0), matrix_elt(m, 1, 3)))));
+   body.emit(assign(SubFactor17, sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 2, 2)), mul(matrix_elt(m, 2, 0), matrix_elt(m, 1, 2)))));
+   body.emit(assign(SubFactor18, sub(mul(matrix_elt(m, 1, 0), matrix_elt(m, 2, 1)), mul(matrix_elt(m, 2, 0), matrix_elt(m, 1, 1)))));
+
+   ir_variable *adj = body.make_temp(glsl_type::mat4_type, "adj");
+   body.emit(assign(array_ref(adj, 0),
+                    add(sub(mul(matrix_elt(m, 1, 1), SubFactor00),
+                            mul(matrix_elt(m, 1, 2), SubFactor01)),
+                        mul(matrix_elt(m, 1, 3), SubFactor02)),
+                    WRITEMASK_X));
+   body.emit(assign(array_ref(adj, 1), neg(
+                    add(sub(mul(matrix_elt(m, 1, 0), SubFactor00),
+                            mul(matrix_elt(m, 1, 2), SubFactor03)),
+                        mul(matrix_elt(m, 1, 3), SubFactor04))),
+                    WRITEMASK_X));
+   body.emit(assign(array_ref(adj, 2),
+                    add(sub(mul(matrix_elt(m, 1, 0), SubFactor01),
+                            mul(matrix_elt(m, 1, 1), SubFactor03)),
+                        mul(matrix_elt(m, 1, 3), SubFactor05)),
+                    WRITEMASK_X));
+   body.emit(assign(array_ref(adj, 3), neg(
+                    add(sub(mul(matrix_elt(m, 1, 0), SubFactor02),
+                            mul(matrix_elt(m, 1, 1), SubFactor04)),
+                        mul(matrix_elt(m, 1, 2), SubFactor05))),
+                    WRITEMASK_X));
+
+   body.emit(assign(array_ref(adj, 0), neg(
+                    add(sub(mul(matrix_elt(m, 0, 1), SubFactor00),
+                            mul(matrix_elt(m, 0, 2), SubFactor01)),
+                        mul(matrix_elt(m, 0, 3), SubFactor02))),
+                    WRITEMASK_Y));
+   body.emit(assign(array_ref(adj, 1),
+                    add(sub(mul(matrix_elt(m, 0, 0), SubFactor00),
+                            mul(matrix_elt(m, 0, 2), SubFactor03)),
+                        mul(matrix_elt(m, 0, 3), SubFactor04)),
+                    WRITEMASK_Y));
+   body.emit(assign(array_ref(adj, 2), neg(
+                    add(sub(mul(matrix_elt(m, 0, 0), SubFactor01),
+                            mul(matrix_elt(m, 0, 1), SubFactor03)),
+                        mul(matrix_elt(m, 0, 3), SubFactor05))),
+                    WRITEMASK_Y));
+   body.emit(assign(array_ref(adj, 3),
+                    add(sub(mul(matrix_elt(m, 0, 0), SubFactor02),
+                            mul(matrix_elt(m, 0, 1), SubFactor04)),
+                        mul(matrix_elt(m, 0, 2), SubFactor05)),
+                    WRITEMASK_Y));
+
+   body.emit(assign(array_ref(adj, 0),
+                    add(sub(mul(matrix_elt(m, 0, 1), SubFactor06),
+                            mul(matrix_elt(m, 0, 2), SubFactor07)),
+                        mul(matrix_elt(m, 0, 3), SubFactor08)),
+                    WRITEMASK_Z));
+   body.emit(assign(array_ref(adj, 1), neg(
+                    add(sub(mul(matrix_elt(m, 0, 0), SubFactor06),
+                            mul(matrix_elt(m, 0, 2), SubFactor09)),
+                        mul(matrix_elt(m, 0, 3), SubFactor10))),
+                    WRITEMASK_Z));
+   body.emit(assign(array_ref(adj, 2),
+                    add(sub(mul(matrix_elt(m, 0, 0), SubFactor11),
+                            mul(matrix_elt(m, 0, 1), SubFactor09)),
+                        mul(matrix_elt(m, 0, 3), SubFactor12)),
+                    WRITEMASK_Z));
+   body.emit(assign(array_ref(adj, 3), neg(
+                    add(sub(mul(matrix_elt(m, 0, 0), SubFactor08),
+                            mul(matrix_elt(m, 0, 1), SubFactor10)),
+                        mul(matrix_elt(m, 0, 2), SubFactor12))),
+                    WRITEMASK_Z));
+
+   body.emit(assign(array_ref(adj, 0), neg(
+                    add(sub(mul(matrix_elt(m, 0, 1), SubFactor13),
+                            mul(matrix_elt(m, 0, 2), SubFactor14)),
+                        mul(matrix_elt(m, 0, 3), SubFactor15))),
+                    WRITEMASK_W));
+   body.emit(assign(array_ref(adj, 1),
+                    add(sub(mul(matrix_elt(m, 0, 0), SubFactor13),
+                            mul(matrix_elt(m, 0, 2), SubFactor16)),
+                        mul(matrix_elt(m, 0, 3), SubFactor17)),
+                    WRITEMASK_W));
+   body.emit(assign(array_ref(adj, 2), neg(
+                    add(sub(mul(matrix_elt(m, 0, 0), SubFactor14),
+                            mul(matrix_elt(m, 0, 1), SubFactor16)),
+                        mul(matrix_elt(m, 0, 3), SubFactor18))),
+                    WRITEMASK_W));
+   body.emit(assign(array_ref(adj, 3),
+                    add(sub(mul(matrix_elt(m, 0, 0), SubFactor15),
+                            mul(matrix_elt(m, 0, 1), SubFactor17)),
+                        mul(matrix_elt(m, 0, 2), SubFactor18)),
+                    WRITEMASK_W));
+
+   ir_expression *det =
+      add(mul(matrix_elt(m, 0, 0), matrix_elt(adj, 0, 0)),
+          add(mul(matrix_elt(m, 0, 1), matrix_elt(adj, 1, 0)),
+              add(mul(matrix_elt(m, 0, 2), matrix_elt(adj, 2, 0)),
+                  mul(matrix_elt(m, 0, 3), matrix_elt(adj, 3, 0)))));
+
+   body.emit(ret(div(adj, det)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_lessThan(builtin_available_predicate avail,
+                           const glsl_type *type)
+{
+   return binop(ir_binop_less, avail,
+                glsl_type::bvec(type->vector_elements), type, type);
+}
+
+ir_function_signature *
+builtin_builder::_lessThanEqual(builtin_available_predicate avail,
+                                const glsl_type *type)
+{
+   return binop(ir_binop_lequal, avail,
+                glsl_type::bvec(type->vector_elements), type, type);
+}
+
+ir_function_signature *
+builtin_builder::_greaterThan(builtin_available_predicate avail,
+                              const glsl_type *type)
+{
+   return binop(ir_binop_greater, avail,
+                glsl_type::bvec(type->vector_elements), type, type);
+}
+
+ir_function_signature *
+builtin_builder::_greaterThanEqual(builtin_available_predicate avail,
+                                   const glsl_type *type)
+{
+   return binop(ir_binop_gequal, avail,
+                glsl_type::bvec(type->vector_elements), type, type);
+}
+
+ir_function_signature *
+builtin_builder::_equal(builtin_available_predicate avail,
+                        const glsl_type *type)
+{
+   return binop(ir_binop_equal, avail,
+                glsl_type::bvec(type->vector_elements), type, type);
+}
+
+ir_function_signature *
+builtin_builder::_notEqual(builtin_available_predicate avail,
+                           const glsl_type *type)
+{
+   return binop(ir_binop_nequal, avail,
+                glsl_type::bvec(type->vector_elements), type, type);
+}
+
+ir_function_signature *
+builtin_builder::_any(const glsl_type *type)
+{
+   return unop(always_available, ir_unop_any, glsl_type::bool_type, type);
+}
+
+ir_function_signature *
+builtin_builder::_all(const glsl_type *type)
+{
+   ir_variable *v = in_var(type, "v");
+   MAKE_SIG(glsl_type::bool_type, always_available, 1, v);
+
+   switch (type->vector_elements) {
+   case 2:
+      body.emit(ret(logic_and(swizzle_x(v), swizzle_y(v))));
+      break;
+   case 3:
+      body.emit(ret(logic_and(logic_and(swizzle_x(v), swizzle_y(v)),
+                              swizzle_z(v))));
+      break;
+   case 4:
+      body.emit(ret(logic_and(logic_and(logic_and(swizzle_x(v), swizzle_y(v)),
+                                        swizzle_z(v)),
+                              swizzle_w(v))));
+      break;
+   }
+
+   return sig;
+}
+
+UNOP(not, ir_unop_logic_not, always_available)
+
+static bool
+has_lod(const glsl_type *sampler_type)
+{
+   assert(sampler_type->is_sampler());
+
+   switch (sampler_type->sampler_dimensionality) {
+   case GLSL_SAMPLER_DIM_RECT:
+   case GLSL_SAMPLER_DIM_BUF:
+   case GLSL_SAMPLER_DIM_MS:
+      return false;
+   default:
+      return true;
+   }
+}
+
+ir_function_signature *
+builtin_builder::_textureSize(builtin_available_predicate avail,
+                              const glsl_type *return_type,
+                              const glsl_type *sampler_type)
+{
+   ir_variable *s = in_var(sampler_type, "sampler");
+   /* The sampler always exists; add optional lod later. */
+   MAKE_SIG(return_type, avail, 1, s);
+
+   ir_texture *tex = new(mem_ctx) ir_texture(ir_txs);
+   tex->set_sampler(new(mem_ctx) ir_dereference_variable(s), return_type);
+
+   if (has_lod(sampler_type)) {
+      ir_variable *lod = in_var(glsl_type::int_type, "lod");
+      sig->parameters.push_tail(lod);
+      tex->lod_info.lod = var_ref(lod);
+   } else {
+      tex->lod_info.lod = imm(0u);
+   }
+
+   body.emit(ret(tex));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_texture(ir_texture_opcode opcode,
+                          builtin_available_predicate avail,
+                          const glsl_type *return_type,
+                          const glsl_type *sampler_type,
+                          const glsl_type *coord_type,
+                          int flags)
+{
+   ir_variable *s = in_var(sampler_type, "sampler");
+   ir_variable *P = in_var(coord_type, "P");
+   /* The sampler and coordinate always exist; add optional parameters later. */
+   MAKE_SIG(return_type, avail, 2, s, P);
+
+   ir_texture *tex = new(mem_ctx) ir_texture(opcode);
+   tex->set_sampler(var_ref(s), return_type);
+
+   const int coord_size = sampler_type->coordinate_components();
+
+   if (coord_size == coord_type->vector_elements) {
+      tex->coordinate = var_ref(P);
+   } else {
+      /* The incoming coordinate also has the projector or shadow comparator,
+       * so we need to swizzle those away.
+       */
+      tex->coordinate = swizzle_for_size(P, coord_size);
+   }
+
+   /* The projector is always in the last component. */
+   if (flags & TEX_PROJECT)
+      tex->projector = swizzle(P, coord_type->vector_elements - 1, 1);
+
+   if (sampler_type->sampler_shadow) {
+      if (opcode == ir_tg4) {
+         /* gather has refz as a separate parameter, immediately after the
+          * coordinate
+          */
+         ir_variable *refz = in_var(glsl_type::float_type, "refz");
+         sig->parameters.push_tail(refz);
+         tex->shadow_comparitor = var_ref(refz);
+      } else {
+         /* The shadow comparator is normally in the Z component, but a few
+          * types have sufficiently large coordinates that it's in W.
+          */
+         tex->shadow_comparitor = swizzle(P, MAX2(coord_size, SWIZZLE_Z), 1);
+      }
+   }
+
+   if (opcode == ir_txl) {
+      ir_variable *lod = in_var(glsl_type::float_type, "lod");
+      sig->parameters.push_tail(lod);
+      tex->lod_info.lod = var_ref(lod);
+   } else if (opcode == ir_txd) {
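+      /* Gradients apply only to the spatial coordinates, so array samplers
+       * drop the layer component.
+       */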
+      int grad_size = coord_size - (sampler_type->sampler_array ? 1 : 0);
+      ir_variable *dPdx = in_var(glsl_type::vec(grad_size), "dPdx");
+      ir_variable *dPdy = in_var(glsl_type::vec(grad_size), "dPdy");
+      sig->parameters.push_tail(dPdx);
+      sig->parameters.push_tail(dPdy);
+      tex->lod_info.grad.dPdx = var_ref(dPdx);
+      tex->lod_info.grad.dPdy = var_ref(dPdy);
+   }
+
+   if (flags & (TEX_OFFSET | TEX_OFFSET_NONCONST)) {
+      int offset_size = coord_size - (sampler_type->sampler_array ? 1 : 0);
+      ir_variable *offset =
+         new(mem_ctx) ir_variable(glsl_type::ivec(offset_size), "offset",
+                                  (flags & TEX_OFFSET) ? ir_var_const_in : ir_var_function_in);
+      sig->parameters.push_tail(offset);
+      tex->offset = var_ref(offset);
+   }
+
+   if (flags & TEX_OFFSET_ARRAY) {
+      ir_variable *offsets =
+         new(mem_ctx) ir_variable(glsl_type::get_array_instance(glsl_type::ivec2_type, 4),
+                                  "offsets", ir_var_const_in);
+      sig->parameters.push_tail(offsets);
+      tex->offset = var_ref(offsets);
+   }
+
+   if (opcode == ir_tg4) {
+      if (flags & TEX_COMPONENT) {
+         ir_variable *component =
+            new(mem_ctx) ir_variable(glsl_type::int_type, "comp", ir_var_const_in);
+         sig->parameters.push_tail(component);
+         tex->lod_info.component = var_ref(component);
+      } else {
+         tex->lod_info.component = imm(0);
+      }
+   }
+
+   /* The "bias" parameter comes /after/ the "offset" parameter, which is
+    * inconsistent with both textureLodOffset and textureGradOffset.
+    */
+   if (opcode == ir_txb) {
+      ir_variable *bias = in_var(glsl_type::float_type, "bias");
+      sig->parameters.push_tail(bias);
+      tex->lod_info.bias = var_ref(bias);
+   }
+
+   body.emit(ret(tex));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_textureCubeArrayShadow()
+{
+   ir_variable *s = in_var(glsl_type::samplerCubeArrayShadow_type, "sampler");
+   ir_variable *P = in_var(glsl_type::vec4_type, "P");
+   ir_variable *compare = in_var(glsl_type::float_type, "compare");
+   MAKE_SIG(glsl_type::float_type, texture_cube_map_array, 3, s, P, compare);
+
+   ir_texture *tex = new(mem_ctx) ir_texture(ir_tex);
+   tex->set_sampler(var_ref(s), glsl_type::float_type);
+
+   tex->coordinate = var_ref(P);
+   tex->shadow_comparitor = var_ref(compare);
+
+   body.emit(ret(tex));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_texelFetch(builtin_available_predicate avail,
+                             const glsl_type *return_type,
+                             const glsl_type *sampler_type,
+                             const glsl_type *coord_type,
+                             const glsl_type *offset_type)
+{
+   ir_variable *s = in_var(sampler_type, "sampler");
+   ir_variable *P = in_var(coord_type, "P");
+   /* The sampler and coordinate always exist; add optional parameters later. */
+   MAKE_SIG(return_type, avail, 2, s, P);
+
+   ir_texture *tex = new(mem_ctx) ir_texture(ir_txf);
+   tex->coordinate = var_ref(P);
+   tex->set_sampler(var_ref(s), return_type);
+
+   if (sampler_type->sampler_dimensionality == GLSL_SAMPLER_DIM_MS) {
+      ir_variable *sample = in_var(glsl_type::int_type, "sample");
+      sig->parameters.push_tail(sample);
+      tex->lod_info.sample_index = var_ref(sample);
+      tex->op = ir_txf_ms;
+   } else if (has_lod(sampler_type)) {
+      ir_variable *lod = in_var(glsl_type::int_type, "lod");
+      sig->parameters.push_tail(lod);
+      tex->lod_info.lod = var_ref(lod);
+   } else {
+      tex->lod_info.lod = imm(0u);
+   }
+
+   if (offset_type != NULL) {
+      ir_variable *offset =
+         new(mem_ctx) ir_variable(offset_type, "offset", ir_var_const_in);
+      sig->parameters.push_tail(offset);
+      tex->offset = var_ref(offset);
+   }
+
+   body.emit(ret(tex));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_EmitVertex()
+{
+   MAKE_SIG(glsl_type::void_type, gs_only, 0);
+
+   body.emit(new(mem_ctx) ir_emit_vertex());
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_EndPrimitive()
+{
+   MAKE_SIG(glsl_type::void_type, gs_only, 0);
+
+   body.emit(new(mem_ctx) ir_end_primitive());
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_textureQueryLod(const glsl_type *sampler_type,
+                                  const glsl_type *coord_type)
+{
+   ir_variable *s = in_var(sampler_type, "sampler");
+   ir_variable *coord = in_var(coord_type, "coord");
+   /* The sampler and coordinate always exist; add optional parameters later. */
+   MAKE_SIG(glsl_type::vec2_type, texture_query_lod, 2, s, coord);
+
+   ir_texture *tex = new(mem_ctx) ir_texture(ir_lod);
+   tex->coordinate = var_ref(coord);
+   tex->set_sampler(var_ref(s), glsl_type::vec2_type);
+
+   body.emit(ret(tex));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_textureQueryLevels(const glsl_type *sampler_type)
+{
+   ir_variable *s = in_var(sampler_type, "sampler");
+   const glsl_type *return_type = glsl_type::int_type;
+   MAKE_SIG(return_type, texture_query_levels, 1, s);
+
+   ir_texture *tex = new(mem_ctx) ir_texture(ir_query_levels);
+   tex->set_sampler(var_ref(s), return_type);
+
+   body.emit(ret(tex));
+
+   return sig;
+}
+
+UNOP(dFdx, ir_unop_dFdx, fs_oes_derivatives)
+UNOP(dFdy, ir_unop_dFdy, fs_oes_derivatives)
+
+ir_function_signature *
+builtin_builder::_fwidth(const glsl_type *type)
+{
+   ir_variable *p = in_var(type, "p");
+   MAKE_SIG(type, fs_oes_derivatives, 1, p);
+
+   body.emit(ret(add(abs(expr(ir_unop_dFdx, p)), abs(expr(ir_unop_dFdy, p)))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_noise1(const glsl_type *type)
+{
+   return unop(v110, ir_unop_noise, glsl_type::float_type, type);
+}
+
+ir_function_signature *
+builtin_builder::_noise2(const glsl_type *type)
+{
+   ir_variable *p = in_var(type, "p");
+   MAKE_SIG(glsl_type::vec2_type, v110, 1, p);
+
+   ir_constant_data b_offset;
+   b_offset.f[0] = 601.0f;
+   b_offset.f[1] = 313.0f;
+   b_offset.f[2] = 29.0f;
+   b_offset.f[3] = 277.0f;
+
+   ir_variable *a = body.make_temp(glsl_type::float_type, "a");
+   ir_variable *b = body.make_temp(glsl_type::float_type, "b");
+   ir_variable *t = body.make_temp(glsl_type::vec2_type,  "t");
+   body.emit(assign(a, expr(ir_unop_noise, p)));
+   body.emit(assign(b, expr(ir_unop_noise, add(p, imm(type, b_offset)))));
+   body.emit(assign(t, a, WRITEMASK_X));
+   body.emit(assign(t, b, WRITEMASK_Y));
+   body.emit(ret(t));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_noise3(const glsl_type *type)
+{
+   ir_variable *p = in_var(type, "p");
+   MAKE_SIG(glsl_type::vec3_type, v110, 1, p);
+
+   ir_constant_data b_offset;
+   b_offset.f[0] = 601.0f;
+   b_offset.f[1] = 313.0f;
+   b_offset.f[2] = 29.0f;
+   b_offset.f[3] = 277.0f;
+
+   ir_constant_data c_offset;
+   c_offset.f[0] = 1559.0f;
+   c_offset.f[1] = 113.0f;
+   c_offset.f[2] = 1861.0f;
+   c_offset.f[3] = 797.0f;
+
+   ir_variable *a = body.make_temp(glsl_type::float_type, "a");
+   ir_variable *b = body.make_temp(glsl_type::float_type, "b");
+   ir_variable *c = body.make_temp(glsl_type::float_type, "c");
+   ir_variable *t = body.make_temp(glsl_type::vec3_type,  "t");
+   body.emit(assign(a, expr(ir_unop_noise, p)));
+   body.emit(assign(b, expr(ir_unop_noise, add(p, imm(type, b_offset)))));
+   body.emit(assign(c, expr(ir_unop_noise, add(p, imm(type, c_offset)))));
+   body.emit(assign(t, a, WRITEMASK_X));
+   body.emit(assign(t, b, WRITEMASK_Y));
+   body.emit(assign(t, c, WRITEMASK_Z));
+   body.emit(ret(t));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_noise4(const glsl_type *type)
+{
+   ir_variable *p = in_var(type, "p");
+   MAKE_SIG(glsl_type::vec4_type, v110, 1, p);
+
+   ir_variable *_p = body.make_temp(type, "_p");
+
+   ir_constant_data p_offset;
+   p_offset.f[0] = 1559.0f;
+   p_offset.f[1] = 113.0f;
+   p_offset.f[2] = 1861.0f;
+   p_offset.f[3] = 797.0f;
+
+   body.emit(assign(_p, add(p, imm(type, p_offset))));
+
+   ir_constant_data offset;
+   offset.f[0] = 601.0f;
+   offset.f[1] = 313.0f;
+   offset.f[2] = 29.0f;
+   offset.f[3] = 277.0f;
+
+   ir_variable *a = body.make_temp(glsl_type::float_type, "a");
+   ir_variable *b = body.make_temp(glsl_type::float_type, "b");
+   ir_variable *c = body.make_temp(glsl_type::float_type, "c");
+   ir_variable *d = body.make_temp(glsl_type::float_type, "d");
+   ir_variable *t = body.make_temp(glsl_type::vec4_type,  "t");
+   body.emit(assign(a, expr(ir_unop_noise, p)));
+   body.emit(assign(b, expr(ir_unop_noise, add(p, imm(type, offset)))));
+   body.emit(assign(c, expr(ir_unop_noise, _p)));
+   body.emit(assign(d, expr(ir_unop_noise, add(_p, imm(type, offset)))));
+   body.emit(assign(t, a, WRITEMASK_X));
+   body.emit(assign(t, b, WRITEMASK_Y));
+   body.emit(assign(t, c, WRITEMASK_Z));
+   body.emit(assign(t, d, WRITEMASK_W));
+   body.emit(ret(t));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_bitfieldExtract(const glsl_type *type)
+{
+   ir_variable *value  = in_var(type, "value");
+   ir_variable *offset = in_var(glsl_type::int_type, "offset");
+   ir_variable *bits   = in_var(glsl_type::int_type, "bits");
+   MAKE_SIG(type, gpu_shader5, 3, value, offset, bits);
+
+   body.emit(ret(expr(ir_triop_bitfield_extract, value, offset, bits)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_bitfieldInsert(const glsl_type *type)
+{
+   ir_variable *base   = in_var(type, "base");
+   ir_variable *insert = in_var(type, "insert");
+   ir_variable *offset = in_var(glsl_type::int_type, "offset");
+   ir_variable *bits   = in_var(glsl_type::int_type, "bits");
+   MAKE_SIG(type, gpu_shader5, 4, base, insert, offset, bits);
+
+   body.emit(ret(bitfield_insert(base, insert, offset, bits)));
+
+   return sig;
+}
+
+UNOP(bitfieldReverse, ir_unop_bitfield_reverse, gpu_shader5)
+
+ir_function_signature *
+builtin_builder::_bitCount(const glsl_type *type)
+{
+   return unop(gpu_shader5, ir_unop_bit_count,
+               glsl_type::ivec(type->vector_elements), type);
+}
+
+ir_function_signature *
+builtin_builder::_findLSB(const glsl_type *type)
+{
+   return unop(gpu_shader5, ir_unop_find_lsb,
+               glsl_type::ivec(type->vector_elements), type);
+}
+
+ir_function_signature *
+builtin_builder::_findMSB(const glsl_type *type)
+{
+   return unop(gpu_shader5, ir_unop_find_msb,
+               glsl_type::ivec(type->vector_elements), type);
+}
+
+ir_function_signature *
+builtin_builder::_fma(const glsl_type *type)
+{
+   ir_variable *a = in_var(type, "a");
+   ir_variable *b = in_var(type, "b");
+   ir_variable *c = in_var(type, "c");
+   MAKE_SIG(type, gpu_shader5, 3, a, b, c);
+
+   body.emit(ret(ir_builder::fma(a, b, c)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_ldexp(const glsl_type *x_type, const glsl_type *exp_type)
+{
+   return binop(ir_binop_ldexp, gpu_shader5, x_type, x_type, exp_type);
+}
+
+ir_function_signature *
+builtin_builder::_frexp(const glsl_type *x_type, const glsl_type *exp_type)
+{
+   ir_variable *x = in_var(x_type, "x");
+   ir_variable *exponent = out_var(exp_type, "exp");
+   MAKE_SIG(x_type, gpu_shader5, 2, x, exponent);
+
+   const unsigned vec_elem = x_type->vector_elements;
+   const glsl_type *bvec = glsl_type::get_instance(GLSL_TYPE_BOOL, vec_elem, 1);
+   const glsl_type *uvec = glsl_type::get_instance(GLSL_TYPE_UINT, vec_elem, 1);
+
+   /* Single-precision floating-point values are stored as
+    *   1 sign bit;
+    *   8 exponent bits;
+    *   23 mantissa bits.
+    *
+    * An exponent shift of 23 will shift the mantissa out, leaving only the
+    * exponent and sign bit (which itself may be zero, if the absolute value
+    * was taken before the bitcast and shift).
+    */
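+   /* Worked example: frexp(8.0) should yield 0.5 * 2^4.  bitcast(8.0f) is
+    * 0x41000000; shifting right by 23 gives the biased exponent 130, and
+    * adding the -126 bias produces 4.  Masking with 0x807fffff keeps the
+    * sign and mantissa bits (zero here), and OR'ing in 0x3f000000 rebuilds
+    * the mantissa as the value 0.5.
+    */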
+   ir_constant *exponent_shift = imm(23);
+   ir_constant *exponent_bias = imm(-126, vec_elem);
+
+   ir_constant *sign_mantissa_mask = imm(0x807fffffu, vec_elem);
+
+   /* Exponent of floating-point values in the range [0.5, 1.0). */
+   ir_constant *exponent_value = imm(0x3f000000u, vec_elem);
+
+   ir_variable *is_not_zero = body.make_temp(bvec, "is_not_zero");
+   body.emit(assign(is_not_zero, nequal(abs(x), imm(0.0f, vec_elem))));
+
+   /* Since abs(x) ensures that the sign bit is zero, we don't need to bitcast
+    * to unsigned integers to ensure that 1 bits aren't shifted in.
+    */
+   body.emit(assign(exponent, rshift(bitcast_f2i(abs(x)), exponent_shift)));
+   body.emit(assign(exponent, add(exponent, csel(is_not_zero, exponent_bias,
+                                                     imm(0, vec_elem)))));
+
+   ir_variable *bits = body.make_temp(uvec, "bits");
+   body.emit(assign(bits, bitcast_f2u(x)));
+   body.emit(assign(bits, bit_and(bits, sign_mantissa_mask)));
+   body.emit(assign(bits, bit_or(bits, csel(is_not_zero, exponent_value,
+                                                imm(0u, vec_elem)))));
+   body.emit(ret(bitcast_u2f(bits)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_uaddCarry(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   ir_variable *y = in_var(type, "y");
+   ir_variable *carry = out_var(type, "carry");
+   MAKE_SIG(type, gpu_shader5, 3, x, y, carry);
+
+   body.emit(assign(carry, ir_builder::carry(x, y)));
+   body.emit(ret(add(x, y)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_usubBorrow(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   ir_variable *y = in_var(type, "y");
+   ir_variable *borrow = out_var(type, "borrow");
+   MAKE_SIG(type, gpu_shader5, 3, x, y, borrow);
+
+   body.emit(assign(borrow, ir_builder::borrow(x, y)));
+   body.emit(ret(sub(x, y)));
+
+   return sig;
+}
+
+/**
+ * For both imulExtended() and umulExtended() built-ins.
+ */
+ir_function_signature *
+builtin_builder::_mulExtended(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   ir_variable *y = in_var(type, "y");
+   ir_variable *msb = out_var(type, "msb");
+   ir_variable *lsb = out_var(type, "lsb");
+   MAKE_SIG(glsl_type::void_type, gpu_shader5, 4, x, y, msb, lsb);
+
+   body.emit(assign(msb, imul_high(x, y)));
+   body.emit(assign(lsb, mul(x, y)));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_atomic_intrinsic(builtin_available_predicate avail)
+{
+   ir_variable *counter = in_var(glsl_type::atomic_uint_type, "counter");
+   MAKE_INTRINSIC(glsl_type::uint_type, avail, 1, counter);
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_atomic_op(const char *intrinsic,
+                            builtin_available_predicate avail)
+{
+   ir_variable *counter = in_var(glsl_type::atomic_uint_type, "atomic_counter");
+   MAKE_SIG(glsl_type::uint_type, avail, 1, counter);
+
+   ir_variable *retval = body.make_temp(glsl_type::uint_type, "atomic_retval");
+   body.emit(call(shader->symbols->get_function(intrinsic), retval,
+                  sig->parameters));
+   body.emit(ret(retval));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_min3(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   ir_variable *y = in_var(type, "y");
+   ir_variable *z = in_var(type, "z");
+   MAKE_SIG(type, shader_trinary_minmax, 3, x, y, z);
+
+   ir_expression *min3 = min2(x, min2(y, z));
+   body.emit(ret(min3));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_max3(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   ir_variable *y = in_var(type, "y");
+   ir_variable *z = in_var(type, "z");
+   MAKE_SIG(type, shader_trinary_minmax, 3, x, y, z);
+
+   ir_expression *max3 = max2(x, max2(y, z));
+   body.emit(ret(max3));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_mid3(const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   ir_variable *y = in_var(type, "y");
+   ir_variable *z = in_var(type, "z");
+   MAKE_SIG(type, shader_trinary_minmax, 3, x, y, z);
+
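+   /* This min/max expression selects the median of the three operands. */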
+   ir_expression *mid3 = max2(min2(x, y), max2(min2(x, z), min2(y, z)));
+   body.emit(ret(mid3));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_image_prototype(const glsl_type *image_type,
+                                  const char *intrinsic_name,
+                                  unsigned num_arguments,
+                                  unsigned flags)
+{
+   const glsl_type *data_type = glsl_type::get_instance(
+      image_type->sampler_type,
+      (flags & IMAGE_FUNCTION_HAS_VECTOR_DATA_TYPE ? 4 : 1),
+      1);
+   const glsl_type *ret_type = (flags & IMAGE_FUNCTION_RETURNS_VOID ?
+                                glsl_type::void_type : data_type);
+
+   /* Addressing arguments that are always present. */
+   ir_variable *image = in_var(image_type, "image");
+   ir_variable *coord = in_var(
+      glsl_type::ivec(image_type->coordinate_components()), "coord");
+
+   ir_function_signature *sig = new_sig(
+      ret_type, shader_image_load_store, 2, image, coord);
+
+   /* Sample index for multisample images. */
+   if (image_type->sampler_dimensionality == GLSL_SAMPLER_DIM_MS)
+      sig->parameters.push_tail(in_var(glsl_type::int_type, "sample"));
+
+   /* Data arguments. */
+   for (unsigned i = 0; i < num_arguments; ++i)
+      sig->parameters.push_tail(in_var(data_type,
+                                       ralloc_asprintf(NULL, "arg%d", i)));
+
+   /* Set the maximal set of qualifiers allowed for this image built-in.
+    * The spec permits a call whose argument carries fewer qualifiers than
+    * the prototype, but never more, so this makes the compiler accept
+    * everything it must while still rejecting cases like loads from
+    * write-only images or stores to read-only ones.
+    */
+   image->data.image.read_only = flags & IMAGE_FUNCTION_READ_ONLY;
+   image->data.image.write_only = flags & IMAGE_FUNCTION_WRITE_ONLY;
+   image->data.image.coherent = true;
+   image->data.image._volatile = true;
+   image->data.image.restrict_flag = true;
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_image(const glsl_type *image_type,
+                        const char *intrinsic_name,
+                        unsigned num_arguments,
+                        unsigned flags)
+{
+   ir_function_signature *sig = _image_prototype(image_type, intrinsic_name,
+                                                 num_arguments, flags);
+
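+   /* With EMIT_STUB we synthesize a body that simply forwards to the named
+    * intrinsic; otherwise the signature itself is exposed as the intrinsic.
+    */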
+   if (flags & IMAGE_FUNCTION_EMIT_STUB) {
+      ir_factory body(&sig->body, mem_ctx);
+      ir_function *f = shader->symbols->get_function(intrinsic_name);
+
+      if (flags & IMAGE_FUNCTION_RETURNS_VOID) {
+         body.emit(call(f, NULL, sig->parameters));
+      } else {
+         ir_variable *ret_val =
+            body.make_temp(sig->return_type, "_ret_val");
+         body.emit(call(f, ret_val, sig->parameters));
+         body.emit(ret(ret_val));
+      }
+
+      sig->is_defined = true;
+
+   } else {
+      sig->is_intrinsic = true;
+   }
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_memory_barrier_intrinsic(builtin_available_predicate avail)
+{
+   MAKE_INTRINSIC(glsl_type::void_type, avail, 0);
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_memory_barrier(builtin_available_predicate avail)
+{
+   MAKE_SIG(glsl_type::void_type, avail, 0);
+   body.emit(call(shader->symbols->get_function("__intrinsic_memory_barrier"),
+                  NULL, sig->parameters));
+   return sig;
+}
+
+/** @} */
+
+/******************************************************************************/
+
+/* The singleton instance of builtin_builder. */
+static builtin_builder builtins;
+static mtx_t builtins_lock = _MTX_INITIALIZER_NP;
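+/* Initialization, lookup, and release all serialize on builtins_lock, so
+ * the shared singleton can be used from multiple threads.
+ */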
+
+/**
+ * External API (exposing the built-in module to the rest of the compiler):
+ *  @{
+ */
+void
+_mesa_glsl_initialize_builtin_functions()
+{
+   mtx_lock(&builtins_lock);
+   builtins.initialize();
+   mtx_unlock(&builtins_lock);
+}
+
+void
+_mesa_glsl_release_builtin_functions()
+{
+   mtx_lock(&builtins_lock);
+   builtins.release();
+   mtx_unlock(&builtins_lock);
+}
+
+ir_function_signature *
+_mesa_glsl_find_builtin_function(_mesa_glsl_parse_state *state,
+                                 const char *name, exec_list *actual_parameters)
+{
+   ir_function_signature *s;
+   mtx_lock(&builtins_lock);
+   s = builtins.find(state, name, actual_parameters);
+   mtx_unlock(&builtins_lock);
+   return s;
+}
+
+gl_shader *
+_mesa_glsl_get_builtin_function_shader()
+{
+   return builtins.shader;
+}
+
+/** @} */
diff --git a/icd/intel/compiler/shader/builtin_type_macros.h b/icd/intel/compiler/shader/builtin_type_macros.h
new file mode 100644
index 0000000..236e1ce
--- /dev/null
+++ b/icd/intel/compiler/shader/builtin_type_macros.h
@@ -0,0 +1,156 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file builtin_type_macros.h
+ *
+ * This contains definitions for all GLSL built-in types, regardless of what
+ * language version or extension might provide them.
+ */
+
+#include "glsl_types.h"
+
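+/* Reading guide (inferred from the entries below): the including file is
+ * expected to define DECL_TYPE, X-macro style.  Plain types take
+ * (name, GL enum, base type, vector components, matrix columns); sampler
+ * and image types instead take (name, GL enum, base type, dimensionality,
+ * shadow, array, sampled base type).
+ */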
+DECL_TYPE(error,  GL_INVALID_ENUM, GLSL_TYPE_ERROR, 0, 0)
+DECL_TYPE(void,   GL_INVALID_ENUM, GLSL_TYPE_VOID,  0, 0)
+
+DECL_TYPE(bool,   GL_BOOL,         GLSL_TYPE_BOOL,  1, 1)
+DECL_TYPE(bvec2,  GL_BOOL_VEC2,    GLSL_TYPE_BOOL,  2, 1)
+DECL_TYPE(bvec3,  GL_BOOL_VEC3,    GLSL_TYPE_BOOL,  3, 1)
+DECL_TYPE(bvec4,  GL_BOOL_VEC4,    GLSL_TYPE_BOOL,  4, 1)
+
+DECL_TYPE(int,    GL_INT,          GLSL_TYPE_INT,   1, 1)
+DECL_TYPE(ivec2,  GL_INT_VEC2,     GLSL_TYPE_INT,   2, 1)
+DECL_TYPE(ivec3,  GL_INT_VEC3,     GLSL_TYPE_INT,   3, 1)
+DECL_TYPE(ivec4,  GL_INT_VEC4,     GLSL_TYPE_INT,   4, 1)
+
+DECL_TYPE(uint,   GL_UNSIGNED_INT,      GLSL_TYPE_UINT, 1, 1)
+DECL_TYPE(uvec2,  GL_UNSIGNED_INT_VEC2, GLSL_TYPE_UINT, 2, 1)
+DECL_TYPE(uvec3,  GL_UNSIGNED_INT_VEC3, GLSL_TYPE_UINT, 3, 1)
+DECL_TYPE(uvec4,  GL_UNSIGNED_INT_VEC4, GLSL_TYPE_UINT, 4, 1)
+
+DECL_TYPE(float,  GL_FLOAT,        GLSL_TYPE_FLOAT, 1, 1)
+DECL_TYPE(vec2,   GL_FLOAT_VEC2,   GLSL_TYPE_FLOAT, 2, 1)
+DECL_TYPE(vec3,   GL_FLOAT_VEC3,   GLSL_TYPE_FLOAT, 3, 1)
+DECL_TYPE(vec4,   GL_FLOAT_VEC4,   GLSL_TYPE_FLOAT, 4, 1)
+
+DECL_TYPE(mat2,   GL_FLOAT_MAT2,   GLSL_TYPE_FLOAT, 2, 2)
+DECL_TYPE(mat3,   GL_FLOAT_MAT3,   GLSL_TYPE_FLOAT, 3, 3)
+DECL_TYPE(mat4,   GL_FLOAT_MAT4,   GLSL_TYPE_FLOAT, 4, 4)
+
+DECL_TYPE(mat2x3, GL_FLOAT_MAT2x3, GLSL_TYPE_FLOAT, 3, 2)
+DECL_TYPE(mat2x4, GL_FLOAT_MAT2x4, GLSL_TYPE_FLOAT, 4, 2)
+DECL_TYPE(mat3x2, GL_FLOAT_MAT3x2, GLSL_TYPE_FLOAT, 2, 3)
+DECL_TYPE(mat3x4, GL_FLOAT_MAT3x4, GLSL_TYPE_FLOAT, 4, 3)
+DECL_TYPE(mat4x2, GL_FLOAT_MAT4x2, GLSL_TYPE_FLOAT, 2, 4)
+DECL_TYPE(mat4x3, GL_FLOAT_MAT4x3, GLSL_TYPE_FLOAT, 3, 4)
+
+DECL_TYPE(sampler1D,         GL_SAMPLER_1D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2D,         GL_SAMPLER_2D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler3D,         GL_SAMPLER_3D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_3D,   0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(samplerCube,       GL_SAMPLER_CUBE,                 GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE, 0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler1DArray,    GL_SAMPLER_1D_ARRAY,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DArray,    GL_SAMPLER_2D_ARRAY,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(samplerCubeArray,  GL_SAMPLER_CUBE_MAP_ARRAY,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE, 0, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DRect,     GL_SAMPLER_2D_RECT,              GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_RECT, 0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(samplerBuffer,     GL_SAMPLER_BUFFER,               GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_BUF,  0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DMS,       GL_SAMPLER_2D_MULTISAMPLE,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_MS,   0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DMSArray,  GL_SAMPLER_2D_MULTISAMPLE_ARRAY, GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_MS,   0, 1, GLSL_TYPE_FLOAT)
+
+DECL_TYPE(isampler1D,        GL_INT_SAMPLER_1D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isampler2D,        GL_INT_SAMPLER_2D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isampler3D,        GL_INT_SAMPLER_3D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_3D,   0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isamplerCube,      GL_INT_SAMPLER_CUBE,                 GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE, 0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isampler1DArray,   GL_INT_SAMPLER_1D_ARRAY,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 1, GLSL_TYPE_INT)
+DECL_TYPE(isampler2DArray,   GL_INT_SAMPLER_2D_ARRAY,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 1, GLSL_TYPE_INT)
+DECL_TYPE(isamplerCubeArray, GL_INT_SAMPLER_CUBE_MAP_ARRAY,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE, 0, 1, GLSL_TYPE_INT)
+DECL_TYPE(isampler2DRect,    GL_INT_SAMPLER_2D_RECT,              GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_RECT, 0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isamplerBuffer,    GL_INT_SAMPLER_BUFFER,               GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_BUF,  0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isampler2DMS,      GL_INT_SAMPLER_2D_MULTISAMPLE,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_MS,   0, 0, GLSL_TYPE_INT)
+DECL_TYPE(isampler2DMSArray, GL_INT_SAMPLER_2D_MULTISAMPLE_ARRAY, GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_MS,   0, 1, GLSL_TYPE_INT)
+
+DECL_TYPE(usampler1D,        GL_UNSIGNED_INT_SAMPLER_1D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usampler2D,        GL_UNSIGNED_INT_SAMPLER_2D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usampler3D,        GL_UNSIGNED_INT_SAMPLER_3D,                   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_3D,   0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usamplerCube,      GL_UNSIGNED_INT_SAMPLER_CUBE,                 GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE, 0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usampler1DArray,   GL_UNSIGNED_INT_SAMPLER_1D_ARRAY,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 1, GLSL_TYPE_UINT)
+DECL_TYPE(usampler2DArray,   GL_UNSIGNED_INT_SAMPLER_2D_ARRAY,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 1, GLSL_TYPE_UINT)
+DECL_TYPE(usamplerCubeArray, GL_UNSIGNED_INT_SAMPLER_CUBE_MAP_ARRAY,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE, 0, 1, GLSL_TYPE_UINT)
+DECL_TYPE(usampler2DRect,    GL_UNSIGNED_INT_SAMPLER_2D_RECT,              GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_RECT, 0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usamplerBuffer,    GL_UNSIGNED_INT_SAMPLER_BUFFER,               GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_BUF,  0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usampler2DMS,      GL_UNSIGNED_INT_SAMPLER_2D_MULTISAMPLE,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_MS,   0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(usampler2DMSArray, GL_UNSIGNED_INT_SAMPLER_2D_MULTISAMPLE_ARRAY, GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_MS,   0, 1, GLSL_TYPE_UINT)
+
+DECL_TYPE(sampler1DShadow,        GL_SAMPLER_1D_SHADOW,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,       1, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DShadow,        GL_SAMPLER_2D_SHADOW,             GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,       1, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(samplerCubeShadow,      GL_SAMPLER_CUBE_SHADOW,           GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE,     1, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler1DArrayShadow,   GL_SAMPLER_1D_ARRAY_SHADOW,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,       1, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DArrayShadow,   GL_SAMPLER_2D_ARRAY_SHADOW,       GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,       1, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(samplerCubeArrayShadow, GL_SAMPLER_CUBE_MAP_ARRAY_SHADOW, GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_CUBE,     1, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(sampler2DRectShadow,    GL_SAMPLER_2D_RECT_SHADOW,        GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_RECT,     1, 0, GLSL_TYPE_FLOAT)
+
+DECL_TYPE(samplerExternalOES,     GL_SAMPLER_EXTERNAL_OES,          GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_EXTERNAL, 0, 0, GLSL_TYPE_FLOAT)
+
+DECL_TYPE(image1D,         GL_IMAGE_1D,                                GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_1D,     0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(image2D,         GL_IMAGE_2D,                                GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_2D,     0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(image3D,         GL_IMAGE_3D,                                GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_3D,     0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(image2DRect,     GL_IMAGE_2D_RECT,                           GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_RECT,   0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(imageCube,       GL_IMAGE_CUBE,                              GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_CUBE,   0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(imageBuffer,     GL_IMAGE_BUFFER,                            GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_BUF,    0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(image1DArray,    GL_IMAGE_1D_ARRAY,                          GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_1D,     0, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(image2DArray,    GL_IMAGE_2D_ARRAY,                          GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_2D,     0, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(imageCubeArray,  GL_IMAGE_CUBE_MAP_ARRAY,                    GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_CUBE,   0, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(image2DMS,       GL_IMAGE_2D_MULTISAMPLE,                    GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_MS,     0, 0, GLSL_TYPE_FLOAT)
+DECL_TYPE(image2DMSArray,  GL_IMAGE_2D_MULTISAMPLE_ARRAY,              GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_MS,     0, 1, GLSL_TYPE_FLOAT)
+DECL_TYPE(iimage1D,        GL_INT_IMAGE_1D,                            GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_1D,     0, 0, GLSL_TYPE_INT)
+DECL_TYPE(iimage2D,        GL_INT_IMAGE_2D,                            GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_2D,     0, 0, GLSL_TYPE_INT)
+DECL_TYPE(iimage3D,        GL_INT_IMAGE_3D,                            GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_3D,     0, 0, GLSL_TYPE_INT)
+DECL_TYPE(iimage2DRect,    GL_INT_IMAGE_2D_RECT,                       GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_RECT,   0, 0, GLSL_TYPE_INT)
+DECL_TYPE(iimageCube,      GL_INT_IMAGE_CUBE,                          GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_CUBE,   0, 0, GLSL_TYPE_INT)
+DECL_TYPE(iimageBuffer,    GL_INT_IMAGE_BUFFER,                        GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_BUF,    0, 0, GLSL_TYPE_INT)
+DECL_TYPE(iimage1DArray,   GL_INT_IMAGE_1D_ARRAY,                      GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_1D,     0, 1, GLSL_TYPE_INT)
+DECL_TYPE(iimage2DArray,   GL_INT_IMAGE_2D_ARRAY,                      GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_2D,     0, 1, GLSL_TYPE_INT)
+DECL_TYPE(iimageCubeArray, GL_INT_IMAGE_CUBE_MAP_ARRAY,                GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_CUBE,   0, 1, GLSL_TYPE_INT)
+DECL_TYPE(iimage2DMS,      GL_INT_IMAGE_2D_MULTISAMPLE,                GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_MS,     0, 0, GLSL_TYPE_INT)
+DECL_TYPE(iimage2DMSArray, GL_INT_IMAGE_2D_MULTISAMPLE_ARRAY,          GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_MS,     0, 1, GLSL_TYPE_INT)
+DECL_TYPE(uimage1D,        GL_UNSIGNED_INT_IMAGE_1D,                   GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_1D,     0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(uimage2D,        GL_UNSIGNED_INT_IMAGE_2D,                   GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_2D,     0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(uimage3D,        GL_UNSIGNED_INT_IMAGE_3D,                   GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_3D,     0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(uimage2DRect,    GL_UNSIGNED_INT_IMAGE_2D_RECT,              GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_RECT,   0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(uimageCube,      GL_UNSIGNED_INT_IMAGE_CUBE,                 GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_CUBE,   0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(uimageBuffer,    GL_UNSIGNED_INT_IMAGE_BUFFER,               GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_BUF,    0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(uimage1DArray,   GL_UNSIGNED_INT_IMAGE_1D_ARRAY,             GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_1D,     0, 1, GLSL_TYPE_UINT)
+DECL_TYPE(uimage2DArray,   GL_UNSIGNED_INT_IMAGE_2D_ARRAY,             GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_2D,     0, 1, GLSL_TYPE_UINT)
+DECL_TYPE(uimageCubeArray, GL_UNSIGNED_INT_IMAGE_CUBE_MAP_ARRAY,       GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_CUBE,   0, 1, GLSL_TYPE_UINT)
+DECL_TYPE(uimage2DMS,      GL_UNSIGNED_INT_IMAGE_2D_MULTISAMPLE,       GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_MS,     0, 0, GLSL_TYPE_UINT)
+DECL_TYPE(uimage2DMSArray, GL_UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_ARRAY, GLSL_TYPE_IMAGE, GLSL_SAMPLER_DIM_MS,     0, 1, GLSL_TYPE_UINT)
+
+DECL_TYPE(atomic_uint, GL_UNSIGNED_INT_ATOMIC_COUNTER, GLSL_TYPE_ATOMIC_UINT, 1, 1)
+
+STRUCT_TYPE(gl_DepthRangeParameters)
+STRUCT_TYPE(gl_PointParameters)
+STRUCT_TYPE(gl_MaterialParameters)
+STRUCT_TYPE(gl_LightSourceParameters)
+STRUCT_TYPE(gl_LightModelParameters)
+STRUCT_TYPE(gl_LightModelProducts)
+STRUCT_TYPE(gl_LightProducts)
+STRUCT_TYPE(gl_FogParameters)
diff --git a/icd/intel/compiler/shader/builtin_types.cpp b/icd/intel/compiler/shader/builtin_types.cpp
new file mode 100644
index 0000000..0a0fa8c
--- /dev/null
+++ b/icd/intel/compiler/shader/builtin_types.cpp
@@ -0,0 +1,364 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file builtin_types.cpp
+ *
+ * The glsl_type class has static members to represent all the built-in types
+ * (such as the glsl_type::_float_type flyweight) as well as convenience pointer
+ * accessors (such as glsl_type::float_type).  Those global variables are
+ * declared and initialized in this file.
+ *
+ * This also contains _mesa_glsl_initialize_types(), a function which populates
+ * a symbol table with the available built-in types for a particular language
+ * version and set of enabled extensions.
+ */
+
+#include "glsl_types.h"
+#include "glsl_parser_extras.h"
+
+/**
+ * Declarations of type flyweights (glsl_type::_foo_type) and
+ * convenience pointers (glsl_type::foo_type).
+ * @{
+ */
+#define DECL_TYPE(NAME, ...)                                    \
+   const glsl_type glsl_type::_##NAME##_type = glsl_type(__VA_ARGS__, #NAME); \
+   const glsl_type *const glsl_type::NAME##_type = &glsl_type::_##NAME##_type;
+
+#define STRUCT_TYPE(NAME)                                       \
+   const glsl_type glsl_type::_struct_##NAME##_type =           \
+      glsl_type(NAME##_fields, Elements(NAME##_fields), #NAME); \
+   const glsl_type *const glsl_type::struct_##NAME##_type =     \
+      &glsl_type::_struct_##NAME##_type;
+
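+/* For illustration only: with the DECL_TYPE definition above,
+ * DECL_TYPE(vec4, GL_FLOAT_VEC4, GLSL_TYPE_FLOAT, 4, 1) expands roughly to
+ *
+ *    const glsl_type glsl_type::_vec4_type =
+ *       glsl_type(GL_FLOAT_VEC4, GLSL_TYPE_FLOAT, 4, 1, "vec4");
+ *    const glsl_type *const glsl_type::vec4_type = &glsl_type::_vec4_type;
+ *
+ * i.e. one flyweight instance plus one convenience pointer per built-in type.
+ */
+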
+static const struct glsl_struct_field gl_DepthRangeParameters_fields[] = {
+   { glsl_type::float_type, "near", false, -1 },
+   { glsl_type::float_type, "far",  false, -1 },
+   { glsl_type::float_type, "diff", false, -1 },
+};
+
+static const struct glsl_struct_field gl_PointParameters_fields[] = {
+   { glsl_type::float_type, "size", false, -1 },
+   { glsl_type::float_type, "sizeMin", false, -1 },
+   { glsl_type::float_type, "sizeMax", false, -1 },
+   { glsl_type::float_type, "fadeThresholdSize", false, -1 },
+   { glsl_type::float_type, "distanceConstantAttenuation", false, -1 },
+   { glsl_type::float_type, "distanceLinearAttenuation", false, -1 },
+   { glsl_type::float_type, "distanceQuadraticAttenuation", false, -1 },
+};
+
+static const struct glsl_struct_field gl_MaterialParameters_fields[] = {
+   { glsl_type::vec4_type, "emission", false, -1 },
+   { glsl_type::vec4_type, "ambient", false, -1 },
+   { glsl_type::vec4_type, "diffuse", false, -1 },
+   { glsl_type::vec4_type, "specular", false, -1 },
+   { glsl_type::float_type, "shininess", false, -1 },
+};
+
+static const struct glsl_struct_field gl_LightSourceParameters_fields[] = {
+   { glsl_type::vec4_type, "ambient", false, -1 },
+   { glsl_type::vec4_type, "diffuse", false, -1 },
+   { glsl_type::vec4_type, "specular", false, -1 },
+   { glsl_type::vec4_type, "position", false, -1 },
+   { glsl_type::vec4_type, "halfVector", false, -1 },
+   { glsl_type::vec3_type, "spotDirection", false, -1 },
+   { glsl_type::float_type, "spotExponent", false, -1 },
+   { glsl_type::float_type, "spotCutoff", false, -1 },
+   { glsl_type::float_type, "spotCosCutoff", false, -1 },
+   { glsl_type::float_type, "constantAttenuation", false, -1 },
+   { glsl_type::float_type, "linearAttenuation", false, -1 },
+   { glsl_type::float_type, "quadraticAttenuation", false, -1 },
+};
+
+static const struct glsl_struct_field gl_LightModelParameters_fields[] = {
+   { glsl_type::vec4_type, "ambient", false, -1 },
+};
+
+static const struct glsl_struct_field gl_LightModelProducts_fields[] = {
+   { glsl_type::vec4_type, "sceneColor", false, -1 },
+};
+
+static const struct glsl_struct_field gl_LightProducts_fields[] = {
+   { glsl_type::vec4_type, "ambient", false, -1 },
+   { glsl_type::vec4_type, "diffuse", false, -1 },
+   { glsl_type::vec4_type, "specular", false, -1 },
+};
+
+static const struct glsl_struct_field gl_FogParameters_fields[] = {
+   { glsl_type::vec4_type, "color", false, -1 },
+   { glsl_type::float_type, "density", false, -1 },
+   { glsl_type::float_type, "start", false, -1 },
+   { glsl_type::float_type, "end", false, -1 },
+   { glsl_type::float_type, "scale", false, -1 },
+};
+
+#include "builtin_type_macros.h"
+/** @} */
+
+/**
+ * Code to populate a symbol table with the built-in types available in a
+ * particular shading language version.  The table below tags every type
+ * with the GLSL/GLSL ES versions where it was introduced.
+ *
+ * @{
+ */
+#define T(TYPE, MIN_GL, MIN_ES) \
+   { glsl_type::TYPE##_type, MIN_GL, MIN_ES },
+
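+/* Expansion sketch: T(float, 110, 100) becomes
+ *    { glsl_type::float_type, 110, 100 },
+ * An ES entry of 999 is a sentinel meaning "available in no GLSL ES
+ * version", since is_version() can never be satisfied by it.
+ */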
+static const struct builtin_type_versions {
+   const glsl_type *const type;
+   int min_gl;
+   int min_es;
+} builtin_type_versions[] = {
+   T(void,                            110, 100)
+   T(bool,                            110, 100)
+   T(bvec2,                           110, 100)
+   T(bvec3,                           110, 100)
+   T(bvec4,                           110, 100)
+   T(int,                             110, 100)
+   T(ivec2,                           110, 100)
+   T(ivec3,                           110, 100)
+   T(ivec4,                           110, 100)
+   T(uint,                            130, 300)
+   T(uvec2,                           130, 300)
+   T(uvec3,                           130, 300)
+   T(uvec4,                           130, 300)
+   T(float,                           110, 100)
+   T(vec2,                            110, 100)
+   T(vec3,                            110, 100)
+   T(vec4,                            110, 100)
+   T(mat2,                            110, 100)
+   T(mat3,                            110, 100)
+   T(mat4,                            110, 100)
+   T(mat2x3,                          120, 300)
+   T(mat2x4,                          120, 300)
+   T(mat3x2,                          120, 300)
+   T(mat3x4,                          120, 300)
+   T(mat4x2,                          120, 300)
+   T(mat4x3,                          120, 300)
+
+   T(sampler1D,                       110, 999)
+   T(sampler2D,                       110, 100)
+   T(sampler3D,                       110, 300)
+   T(samplerCube,                     110, 100)
+   T(sampler1DArray,                  130, 999)
+   T(sampler2DArray,                  130, 300)
+   T(samplerCubeArray,                400, 999)
+   T(sampler2DRect,                   140, 999)
+   T(samplerBuffer,                   140, 999)
+   T(sampler2DMS,                     150, 999)
+   T(sampler2DMSArray,                150, 999)
+
+   T(isampler1D,                      130, 999)
+   T(isampler2D,                      130, 300)
+   T(isampler3D,                      130, 300)
+   T(isamplerCube,                    130, 300)
+   T(isampler1DArray,                 130, 999)
+   T(isampler2DArray,                 130, 300)
+   T(isamplerCubeArray,               400, 999)
+   T(isampler2DRect,                  140, 999)
+   T(isamplerBuffer,                  140, 999)
+   T(isampler2DMS,                    150, 999)
+   T(isampler2DMSArray,               150, 999)
+
+   T(usampler1D,                      130, 999)
+   T(usampler2D,                      130, 300)
+   T(usampler3D,                      130, 300)
+   T(usamplerCube,                    130, 300)
+   T(usampler1DArray,                 130, 999)
+   T(usampler2DArray,                 130, 300)
+   T(usamplerCubeArray,               400, 999)
+   T(usampler2DRect,                  140, 999)
+   T(usamplerBuffer,                  140, 999)
+   T(usampler2DMS,                    150, 999)
+   T(usampler2DMSArray,               150, 999)
+
+   T(sampler1DShadow,                 110, 999)
+   T(sampler2DShadow,                 110, 300)
+   T(samplerCubeShadow,               130, 300)
+   T(sampler1DArrayShadow,            130, 999)
+   T(sampler2DArrayShadow,            130, 300)
+   T(samplerCubeArrayShadow,          400, 999)
+   T(sampler2DRectShadow,             140, 999)
+
+   T(struct_gl_DepthRangeParameters,  110, 100)
+
+   T(image1D,                         420, 999)
+   T(image2D,                         420, 999)
+   T(image3D,                         420, 999)
+   T(image2DRect,                     420, 999)
+   T(imageCube,                       420, 999)
+   T(imageBuffer,                     420, 999)
+   T(image1DArray,                    420, 999)
+   T(image2DArray,                    420, 999)
+   T(imageCubeArray,                  420, 999)
+   T(image2DMS,                       420, 999)
+   T(image2DMSArray,                  420, 999)
+   T(iimage1D,                        420, 999)
+   T(iimage2D,                        420, 999)
+   T(iimage3D,                        420, 999)
+   T(iimage2DRect,                    420, 999)
+   T(iimageCube,                      420, 999)
+   T(iimageBuffer,                    420, 999)
+   T(iimage1DArray,                   420, 999)
+   T(iimage2DArray,                   420, 999)
+   T(iimageCubeArray,                 420, 999)
+   T(iimage2DMS,                      420, 999)
+   T(iimage2DMSArray,                 420, 999)
+   T(uimage1D,                        420, 999)
+   T(uimage2D,                        420, 999)
+   T(uimage3D,                        420, 999)
+   T(uimage2DRect,                    420, 999)
+   T(uimageCube,                      420, 999)
+   T(uimageBuffer,                    420, 999)
+   T(uimage1DArray,                   420, 999)
+   T(uimage2DArray,                   420, 999)
+   T(uimageCubeArray,                 420, 999)
+   T(uimage2DMS,                      420, 999)
+   T(uimage2DMSArray,                 420, 999)
+
+   T(atomic_uint,                     420, 999)
+};
+
+static const glsl_type *const deprecated_types[] = {
+   glsl_type::struct_gl_PointParameters_type,
+   glsl_type::struct_gl_MaterialParameters_type,
+   glsl_type::struct_gl_LightSourceParameters_type,
+   glsl_type::struct_gl_LightModelParameters_type,
+   glsl_type::struct_gl_LightModelProducts_type,
+   glsl_type::struct_gl_LightProducts_type,
+   glsl_type::struct_gl_FogParameters_type,
+};
+
+static inline void
+add_type(glsl_symbol_table *symbols, const glsl_type *const type)
+{
+   symbols->add_type(type->name, type);
+}
+
+/**
+ * Populate the symbol table with available built-in types.
+ */
+void
+_mesa_glsl_initialize_types(struct _mesa_glsl_parse_state *state)
+{
+   struct glsl_symbol_table *symbols = state->symbols;
+
+   for (unsigned i = 0; i < Elements(builtin_type_versions); i++) {
+      const struct builtin_type_versions *const t = &builtin_type_versions[i];
+      if (state->is_version(t->min_gl, t->min_es)) {
+         add_type(symbols, t->type);
+      }
+   }
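+
+   /* Illustrative note: for a "#version 130" desktop shader, the loop above
+    * makes uint/uvec* and the 1.30-era sampler types visible, while the
+    * image types stay hidden unless the shader is 4.20+ or one of the
+    * extension blocks below applies.
+    */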
+
+   /* Add deprecated structure types.  Although deprecated in 1.30, they
+    * remain present until 1.40+ (OpenGL 3.1+), where they were removed, so
+    * only pre-1.40 desktop shaders receive them.
+    */
+   if (!state->es_shader && state->language_version < 140) {
+      for (unsigned i = 0; i < Elements(deprecated_types); i++) {
+         add_type(symbols, deprecated_types[i]);
+      }
+   }
+
+   /* Add types for enabled extensions.  They may have already been added
+    * by the version-based loop, but attempting to add them a second time
+    * is harmless.
+    */
+   if (state->ARB_texture_cube_map_array_enable) {
+      add_type(symbols, glsl_type::samplerCubeArray_type);
+      add_type(symbols, glsl_type::samplerCubeArrayShadow_type);
+      add_type(symbols, glsl_type::isamplerCubeArray_type);
+      add_type(symbols, glsl_type::usamplerCubeArray_type);
+   }
+
+   if (state->ARB_texture_multisample_enable) {
+      add_type(symbols, glsl_type::sampler2DMS_type);
+      add_type(symbols, glsl_type::isampler2DMS_type);
+      add_type(symbols, glsl_type::usampler2DMS_type);
+      add_type(symbols, glsl_type::sampler2DMSArray_type);
+      add_type(symbols, glsl_type::isampler2DMSArray_type);
+      add_type(symbols, glsl_type::usampler2DMSArray_type);
+   }
+
+   if (state->ARB_texture_rectangle_enable) {
+      add_type(symbols, glsl_type::sampler2DRect_type);
+      add_type(symbols, glsl_type::sampler2DRectShadow_type);
+   }
+
+   if (state->EXT_texture_array_enable) {
+      add_type(symbols, glsl_type::sampler1DArray_type);
+      add_type(symbols, glsl_type::sampler2DArray_type);
+      add_type(symbols, glsl_type::sampler1DArrayShadow_type);
+      add_type(symbols, glsl_type::sampler2DArrayShadow_type);
+   }
+
+   if (state->OES_EGL_image_external_enable) {
+      add_type(symbols, glsl_type::samplerExternalOES_type);
+   }
+
+   if (state->OES_texture_3D_enable) {
+      add_type(symbols, glsl_type::sampler3D_type);
+   }
+
+   if (state->ARB_shader_image_load_store_enable) {
+      add_type(symbols, glsl_type::image1D_type);
+      add_type(symbols, glsl_type::image2D_type);
+      add_type(symbols, glsl_type::image3D_type);
+      add_type(symbols, glsl_type::image2DRect_type);
+      add_type(symbols, glsl_type::imageCube_type);
+      add_type(symbols, glsl_type::imageBuffer_type);
+      add_type(symbols, glsl_type::image1DArray_type);
+      add_type(symbols, glsl_type::image2DArray_type);
+      add_type(symbols, glsl_type::imageCubeArray_type);
+      add_type(symbols, glsl_type::image2DMS_type);
+      add_type(symbols, glsl_type::image2DMSArray_type);
+      add_type(symbols, glsl_type::iimage1D_type);
+      add_type(symbols, glsl_type::iimage2D_type);
+      add_type(symbols, glsl_type::iimage3D_type);
+      add_type(symbols, glsl_type::iimage2DRect_type);
+      add_type(symbols, glsl_type::iimageCube_type);
+      add_type(symbols, glsl_type::iimageBuffer_type);
+      add_type(symbols, glsl_type::iimage1DArray_type);
+      add_type(symbols, glsl_type::iimage2DArray_type);
+      add_type(symbols, glsl_type::iimageCubeArray_type);
+      add_type(symbols, glsl_type::iimage2DMS_type);
+      add_type(symbols, glsl_type::iimage2DMSArray_type);
+      add_type(symbols, glsl_type::uimage1D_type);
+      add_type(symbols, glsl_type::uimage2D_type);
+      add_type(symbols, glsl_type::uimage3D_type);
+      add_type(symbols, glsl_type::uimage2DRect_type);
+      add_type(symbols, glsl_type::uimageCube_type);
+      add_type(symbols, glsl_type::uimageBuffer_type);
+      add_type(symbols, glsl_type::uimage1DArray_type);
+      add_type(symbols, glsl_type::uimage2DArray_type);
+      add_type(symbols, glsl_type::uimageCubeArray_type);
+      add_type(symbols, glsl_type::uimage2DMS_type);
+      add_type(symbols, glsl_type::uimage2DMSArray_type);
+   }
+
+   if (state->ARB_shader_atomic_counters_enable) {
+      add_type(symbols, glsl_type::atomic_uint_type);
+   }
+}
+/** @} */
diff --git a/icd/intel/compiler/shader/builtin_variables.cpp b/icd/intel/compiler/shader/builtin_variables.cpp
new file mode 100644
index 0000000..7c864d5
--- /dev/null
+++ b/icd/intel/compiler/shader/builtin_variables.cpp
@@ -0,0 +1,1074 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ir.h"
+#include "glsl_parser_extras.h"
+#include "glsl_symbol_table.h"
+#include "libfns.h"
+#include "main/uniforms.h"
+#include "program/prog_parameter.h"
+#include "program/prog_statevars.h"
+#include "program/prog_instruction.h"
+#include "icd-utils.h" // LunarG - ADD
+
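+/* Each *_elements table below maps one built-in uniform onto driver state:
+ * an optional struct-field name (NULL for non-struct uniforms), a
+ * state-variable token list (see program/prog_statevars.h), and a swizzle
+ * selecting components of the fetched vector.  E.g. gl_DepthRange.diff is
+ * the z component (SWIZZLE_ZZZZ) of the STATE_DEPTH_RANGE vector.
+ */
+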
+static const struct gl_builtin_uniform_element gl_NumSamples_elements[] = {
+   {NULL, {STATE_NUM_SAMPLES, 0, 0}, SWIZZLE_XXXX}
+};
+
+static const struct gl_builtin_uniform_element gl_DepthRange_elements[] = {
+   {"near", {STATE_DEPTH_RANGE, 0, 0}, SWIZZLE_XXXX},
+   {"far", {STATE_DEPTH_RANGE, 0, 0}, SWIZZLE_YYYY},
+   {"diff", {STATE_DEPTH_RANGE, 0, 0}, SWIZZLE_ZZZZ},
+};
+
+static const struct gl_builtin_uniform_element gl_ClipPlane_elements[] = {
+   {NULL, {STATE_CLIPPLANE, 0, 0}, SWIZZLE_XYZW}
+};
+
+static const struct gl_builtin_uniform_element gl_Point_elements[] = {
+   {"size", {STATE_POINT_SIZE}, SWIZZLE_XXXX},
+   {"sizeMin", {STATE_POINT_SIZE}, SWIZZLE_YYYY},
+   {"sizeMax", {STATE_POINT_SIZE}, SWIZZLE_ZZZZ},
+   {"fadeThresholdSize", {STATE_POINT_SIZE}, SWIZZLE_WWWW},
+   {"distanceConstantAttenuation", {STATE_POINT_ATTENUATION}, SWIZZLE_XXXX},
+   {"distanceLinearAttenuation", {STATE_POINT_ATTENUATION}, SWIZZLE_YYYY},
+   {"distanceQuadraticAttenuation", {STATE_POINT_ATTENUATION}, SWIZZLE_ZZZZ},
+};
+
+static const struct gl_builtin_uniform_element gl_FrontMaterial_elements[] = {
+   {"emission", {STATE_MATERIAL, 0, STATE_EMISSION}, SWIZZLE_XYZW},
+   {"ambient", {STATE_MATERIAL, 0, STATE_AMBIENT}, SWIZZLE_XYZW},
+   {"diffuse", {STATE_MATERIAL, 0, STATE_DIFFUSE}, SWIZZLE_XYZW},
+   {"specular", {STATE_MATERIAL, 0, STATE_SPECULAR}, SWIZZLE_XYZW},
+   {"shininess", {STATE_MATERIAL, 0, STATE_SHININESS}, SWIZZLE_XXXX},
+};
+
+static const struct gl_builtin_uniform_element gl_BackMaterial_elements[] = {
+   {"emission", {STATE_MATERIAL, 1, STATE_EMISSION}, SWIZZLE_XYZW},
+   {"ambient", {STATE_MATERIAL, 1, STATE_AMBIENT}, SWIZZLE_XYZW},
+   {"diffuse", {STATE_MATERIAL, 1, STATE_DIFFUSE}, SWIZZLE_XYZW},
+   {"specular", {STATE_MATERIAL, 1, STATE_SPECULAR}, SWIZZLE_XYZW},
+   {"shininess", {STATE_MATERIAL, 1, STATE_SHININESS}, SWIZZLE_XXXX},
+};
+
+static const struct gl_builtin_uniform_element gl_LightSource_elements[] = {
+   {"ambient", {STATE_LIGHT, 0, STATE_AMBIENT}, SWIZZLE_XYZW},
+   {"diffuse", {STATE_LIGHT, 0, STATE_DIFFUSE}, SWIZZLE_XYZW},
+   {"specular", {STATE_LIGHT, 0, STATE_SPECULAR}, SWIZZLE_XYZW},
+   {"position", {STATE_LIGHT, 0, STATE_POSITION}, SWIZZLE_XYZW},
+   {"halfVector", {STATE_LIGHT, 0, STATE_HALF_VECTOR}, SWIZZLE_XYZW},
+   {"spotDirection", {STATE_LIGHT, 0, STATE_SPOT_DIRECTION},
+    MAKE_SWIZZLE4(SWIZZLE_X,
+		  SWIZZLE_Y,
+		  SWIZZLE_Z,
+		  SWIZZLE_Z)},
+   {"spotCosCutoff", {STATE_LIGHT, 0, STATE_SPOT_DIRECTION}, SWIZZLE_WWWW},
+   {"spotCutoff", {STATE_LIGHT, 0, STATE_SPOT_CUTOFF}, SWIZZLE_XXXX},
+   {"spotExponent", {STATE_LIGHT, 0, STATE_ATTENUATION}, SWIZZLE_WWWW},
+   {"constantAttenuation", {STATE_LIGHT, 0, STATE_ATTENUATION}, SWIZZLE_XXXX},
+   {"linearAttenuation", {STATE_LIGHT, 0, STATE_ATTENUATION}, SWIZZLE_YYYY},
+   {"quadraticAttenuation", {STATE_LIGHT, 0, STATE_ATTENUATION}, SWIZZLE_ZZZZ},
+};
+
+static const struct gl_builtin_uniform_element gl_LightModel_elements[] = {
+   {"ambient", {STATE_LIGHTMODEL_AMBIENT, 0}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_FrontLightModelProduct_elements[] = {
+   {"sceneColor", {STATE_LIGHTMODEL_SCENECOLOR, 0}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_BackLightModelProduct_elements[] = {
+   {"sceneColor", {STATE_LIGHTMODEL_SCENECOLOR, 1}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_FrontLightProduct_elements[] = {
+   {"ambient", {STATE_LIGHTPROD, 0, 0, STATE_AMBIENT}, SWIZZLE_XYZW},
+   {"diffuse", {STATE_LIGHTPROD, 0, 0, STATE_DIFFUSE}, SWIZZLE_XYZW},
+   {"specular", {STATE_LIGHTPROD, 0, 0, STATE_SPECULAR}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_BackLightProduct_elements[] = {
+   {"ambient", {STATE_LIGHTPROD, 0, 1, STATE_AMBIENT}, SWIZZLE_XYZW},
+   {"diffuse", {STATE_LIGHTPROD, 0, 1, STATE_DIFFUSE}, SWIZZLE_XYZW},
+   {"specular", {STATE_LIGHTPROD, 0, 1, STATE_SPECULAR}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_TextureEnvColor_elements[] = {
+   {NULL, {STATE_TEXENV_COLOR, 0}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_EyePlaneS_elements[] = {
+   {NULL, {STATE_TEXGEN, 0, STATE_TEXGEN_EYE_S}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_EyePlaneT_elements[] = {
+   {NULL, {STATE_TEXGEN, 0, STATE_TEXGEN_EYE_T}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_EyePlaneR_elements[] = {
+   {NULL, {STATE_TEXGEN, 0, STATE_TEXGEN_EYE_R}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_EyePlaneQ_elements[] = {
+   {NULL, {STATE_TEXGEN, 0, STATE_TEXGEN_EYE_Q}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_ObjectPlaneS_elements[] = {
+   {NULL, {STATE_TEXGEN, 0, STATE_TEXGEN_OBJECT_S}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_ObjectPlaneT_elements[] = {
+   {NULL, {STATE_TEXGEN, 0, STATE_TEXGEN_OBJECT_T}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_ObjectPlaneR_elements[] = {
+   {NULL, {STATE_TEXGEN, 0, STATE_TEXGEN_OBJECT_R}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_ObjectPlaneQ_elements[] = {
+   {NULL, {STATE_TEXGEN, 0, STATE_TEXGEN_OBJECT_Q}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_Fog_elements[] = {
+   {"color", {STATE_FOG_COLOR}, SWIZZLE_XYZW},
+   {"density", {STATE_FOG_PARAMS}, SWIZZLE_XXXX},
+   {"start", {STATE_FOG_PARAMS}, SWIZZLE_YYYY},
+   {"end", {STATE_FOG_PARAMS}, SWIZZLE_ZZZZ},
+   {"scale", {STATE_FOG_PARAMS}, SWIZZLE_WWWW},
+};
+
+static const struct gl_builtin_uniform_element gl_NormalScale_elements[] = {
+   {NULL, {STATE_NORMAL_SCALE}, SWIZZLE_XXXX},
+};
+
+static const struct gl_builtin_uniform_element gl_BumpRotMatrix0MESA_elements[] = {
+   {NULL, {STATE_INTERNAL, STATE_ROT_MATRIX_0}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_BumpRotMatrix1MESA_elements[] = {
+   {NULL, {STATE_INTERNAL, STATE_ROT_MATRIX_1}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_FogParamsOptimizedMESA_elements[] = {
+   {NULL, {STATE_INTERNAL, STATE_FOG_PARAMS_OPTIMIZED}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_CurrentAttribVertMESA_elements[] = {
+   {NULL, {STATE_INTERNAL, STATE_CURRENT_ATTRIB, 0}, SWIZZLE_XYZW},
+};
+
+static const struct gl_builtin_uniform_element gl_CurrentAttribFragMESA_elements[] = {
+   {NULL, {STATE_INTERNAL, STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED, 0}, SWIZZLE_XYZW},
+};
+
+#define MATRIX(name, statevar, modifier)				\
+   static const struct gl_builtin_uniform_element name ## _elements[] = { \
+      { NULL, { statevar, 0, 0, 0, modifier}, SWIZZLE_XYZW },		\
+      { NULL, { statevar, 0, 1, 1, modifier}, SWIZZLE_XYZW },		\
+      { NULL, { statevar, 0, 2, 2, modifier}, SWIZZLE_XYZW },		\
+      { NULL, { statevar, 0, 3, 3, modifier}, SWIZZLE_XYZW },		\
+   }
+
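+/* Illustrative note: each element row { statevar, 0, i, i, modifier } in the
+ * macro above fetches row i of the named matrix with the modifier applied.
+ * The pairings below look inverted (e.g. gl_ModelViewMatrix uses
+ * STATE_MATRIX_TRANSPOSE) because a row of the transpose is a column of the
+ * original, which is exactly what a column-major GLSL mat4 stores.
+ */
+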
+MATRIX(gl_ModelViewMatrix,
+       STATE_MODELVIEW_MATRIX, STATE_MATRIX_TRANSPOSE);
+MATRIX(gl_ModelViewMatrixInverse,
+       STATE_MODELVIEW_MATRIX, STATE_MATRIX_INVTRANS);
+MATRIX(gl_ModelViewMatrixTranspose,
+       STATE_MODELVIEW_MATRIX, 0);
+MATRIX(gl_ModelViewMatrixInverseTranspose,
+       STATE_MODELVIEW_MATRIX, STATE_MATRIX_INVERSE);
+
+MATRIX(gl_ProjectionMatrix,
+       STATE_PROJECTION_MATRIX, STATE_MATRIX_TRANSPOSE);
+MATRIX(gl_ProjectionMatrixInverse,
+       STATE_PROJECTION_MATRIX, STATE_MATRIX_INVTRANS);
+MATRIX(gl_ProjectionMatrixTranspose,
+       STATE_PROJECTION_MATRIX, 0);
+MATRIX(gl_ProjectionMatrixInverseTranspose,
+       STATE_PROJECTION_MATRIX, STATE_MATRIX_INVERSE);
+
+MATRIX(gl_ModelViewProjectionMatrix,
+       STATE_MVP_MATRIX, STATE_MATRIX_TRANSPOSE);
+MATRIX(gl_ModelViewProjectionMatrixInverse,
+       STATE_MVP_MATRIX, STATE_MATRIX_INVTRANS);
+MATRIX(gl_ModelViewProjectionMatrixTranspose,
+       STATE_MVP_MATRIX, 0);
+MATRIX(gl_ModelViewProjectionMatrixInverseTranspose,
+       STATE_MVP_MATRIX, STATE_MATRIX_INVERSE);
+
+MATRIX(gl_TextureMatrix,
+       STATE_TEXTURE_MATRIX, STATE_MATRIX_TRANSPOSE);
+MATRIX(gl_TextureMatrixInverse,
+       STATE_TEXTURE_MATRIX, STATE_MATRIX_INVTRANS);
+MATRIX(gl_TextureMatrixTranspose,
+       STATE_TEXTURE_MATRIX, 0);
+MATRIX(gl_TextureMatrixInverseTranspose,
+       STATE_TEXTURE_MATRIX, STATE_MATRIX_INVERSE);
+
+static const struct gl_builtin_uniform_element gl_NormalMatrix_elements[] = {
+   { NULL, { STATE_MODELVIEW_MATRIX, 0, 0, 0, STATE_MATRIX_INVERSE},
+     MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_Z) },
+   { NULL, { STATE_MODELVIEW_MATRIX, 0, 1, 1, STATE_MATRIX_INVERSE},
+     MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_Z) },
+   { NULL, { STATE_MODELVIEW_MATRIX, 0, 2, 2, STATE_MATRIX_INVERSE},
+     MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_Z) },
+};
+
+#undef MATRIX
+
+#define STATEVAR(name) {#name, name ## _elements, Elements(name ## _elements)}
+
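+/* Sketch of the expansion: STATEVAR(gl_Fog) yields
+ *    {"gl_Fog", gl_Fog_elements, Elements(gl_Fog_elements)}
+ * pairing each uniform name with its element table and that table's length.
+ */
+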
+static const struct gl_builtin_uniform_desc _mesa_builtin_uniform_desc[] = {
+   STATEVAR(gl_NumSamples),
+   STATEVAR(gl_DepthRange),
+   STATEVAR(gl_ClipPlane),
+   STATEVAR(gl_Point),
+   STATEVAR(gl_FrontMaterial),
+   STATEVAR(gl_BackMaterial),
+   STATEVAR(gl_LightSource),
+   STATEVAR(gl_LightModel),
+   STATEVAR(gl_FrontLightModelProduct),
+   STATEVAR(gl_BackLightModelProduct),
+   STATEVAR(gl_FrontLightProduct),
+   STATEVAR(gl_BackLightProduct),
+   STATEVAR(gl_TextureEnvColor),
+   STATEVAR(gl_EyePlaneS),
+   STATEVAR(gl_EyePlaneT),
+   STATEVAR(gl_EyePlaneR),
+   STATEVAR(gl_EyePlaneQ),
+   STATEVAR(gl_ObjectPlaneS),
+   STATEVAR(gl_ObjectPlaneT),
+   STATEVAR(gl_ObjectPlaneR),
+   STATEVAR(gl_ObjectPlaneQ),
+   STATEVAR(gl_Fog),
+
+   STATEVAR(gl_ModelViewMatrix),
+   STATEVAR(gl_ModelViewMatrixInverse),
+   STATEVAR(gl_ModelViewMatrixTranspose),
+   STATEVAR(gl_ModelViewMatrixInverseTranspose),
+
+   STATEVAR(gl_ProjectionMatrix),
+   STATEVAR(gl_ProjectionMatrixInverse),
+   STATEVAR(gl_ProjectionMatrixTranspose),
+   STATEVAR(gl_ProjectionMatrixInverseTranspose),
+
+   STATEVAR(gl_ModelViewProjectionMatrix),
+   STATEVAR(gl_ModelViewProjectionMatrixInverse),
+   STATEVAR(gl_ModelViewProjectionMatrixTranspose),
+   STATEVAR(gl_ModelViewProjectionMatrixInverseTranspose),
+
+   STATEVAR(gl_TextureMatrix),
+   STATEVAR(gl_TextureMatrixInverse),
+   STATEVAR(gl_TextureMatrixTranspose),
+   STATEVAR(gl_TextureMatrixInverseTranspose),
+
+   STATEVAR(gl_NormalMatrix),
+   STATEVAR(gl_NormalScale),
+
+   STATEVAR(gl_BumpRotMatrix0MESA),
+   STATEVAR(gl_BumpRotMatrix1MESA),
+   STATEVAR(gl_FogParamsOptimizedMESA),
+   STATEVAR(gl_CurrentAttribVertMESA),
+   STATEVAR(gl_CurrentAttribFragMESA),
+
+   {NULL, NULL, 0}
+};
+
+
+namespace {
+
+/**
+ * Data structure that accumulates fields for the gl_PerVertex interface
+ * block.
+ */
+class per_vertex_accumulator
+{
+public:
+   per_vertex_accumulator();
+   void add_field(int slot, const glsl_type *type, const char *name);
+   const glsl_type *construct_interface_instance() const;
+
+private:
+   glsl_struct_field fields[10];
+   unsigned num_fields;
+};
+
+
+per_vertex_accumulator::per_vertex_accumulator()
+   : fields(),
+     num_fields(0)
+{
+}
+
+
+void
+per_vertex_accumulator::add_field(int slot, const glsl_type *type,
+                                  const char *name)
+{
+   assert(this->num_fields < ARRAY_SIZE(this->fields));
+   this->fields[this->num_fields].type = type;
+   this->fields[this->num_fields].name = name;
+   this->fields[this->num_fields].row_major = false;
+   this->fields[this->num_fields].location = slot;
+   this->fields[this->num_fields].interpolation = INTERP_QUALIFIER_NONE;
+   this->fields[this->num_fields].centroid = 0;
+   this->fields[this->num_fields].sample = 0;
+   this->num_fields++;
+}
+
+
+const glsl_type *
+per_vertex_accumulator::construct_interface_instance() const
+{
+   return glsl_type::get_interface_instance(this->fields, this->num_fields,
+                                            GLSL_INTERFACE_PACKING_STD140,
+                                            "gl_PerVertex");
+}
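+
+/* Usage sketch (see generate_varyings()): fields are accumulated via
+ * add_field() for each built-in (gl_Position, gl_PointSize, ...), and
+ * construct_interface_instance() then produces the gl_PerVertex block type
+ * for the current stage.
+ */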
+
+
+class builtin_variable_generator
+{
+public:
+   builtin_variable_generator(exec_list *instructions,
+                              struct _mesa_glsl_parse_state *state);
+   void generate_constants();
+   void generate_uniforms();
+   void generate_vs_special_vars();
+   void generate_gs_special_vars();
+   void generate_fs_special_vars();
+   void generate_cs_special_vars();
+   void generate_varyings();
+
+private:
+   const glsl_type *array(const glsl_type *base, unsigned elements)
+   {
+      return glsl_type::get_array_instance(base, elements);
+   }
+
+   const glsl_type *type(const char *name)
+   {
+      return symtab->get_type(name);
+   }
+
+   ir_variable *add_input(int slot, const glsl_type *type, const char *name)
+   {
+      return add_variable(name, type, ir_var_shader_in, slot);
+   }
+
+   ir_variable *add_output(int slot, const glsl_type *type, const char *name)
+   {
+      return add_variable(name, type, ir_var_shader_out, slot);
+   }
+
+   ir_variable *add_system_value(int slot, const glsl_type *type,
+                                 const char *name)
+   {
+      return add_variable(name, type, ir_var_system_value, slot);
+   }
+
+   ir_variable *add_variable(const char *name, const glsl_type *type,
+                             enum ir_variable_mode mode, int slot);
+   ir_variable *add_uniform(const glsl_type *type, const char *name);
+   ir_variable *add_const(const char *name, int value);
+   ir_variable *add_const_ivec3(const char *name, int x, int y, int z);
+   void add_varying(int slot, const glsl_type *type, const char *name,
+                    const char *name_as_gs_input);
+
+   exec_list * const instructions;
+   struct _mesa_glsl_parse_state * const state;
+   glsl_symbol_table * const symtab;
+
+   /**
+    * True if compatibility-profile-only variables should be included.  (In
+    * desktop GL, these are always included when the GLSL version is 1.30 or
+    * below.)
+    */
+   const bool compatibility;
+
+   const glsl_type * const bool_t;
+   const glsl_type * const int_t;
+   const glsl_type * const float_t;
+   const glsl_type * const vec2_t;
+   const glsl_type * const vec3_t;
+   const glsl_type * const vec4_t;
+   const glsl_type * const mat3_t;
+   const glsl_type * const mat4_t;
+
+   per_vertex_accumulator per_vertex_in;
+   per_vertex_accumulator per_vertex_out;
+};
+
+
+builtin_variable_generator::builtin_variable_generator(
+   exec_list *instructions, struct _mesa_glsl_parse_state *state)
+   : instructions(instructions), state(state), symtab(state->symbols),
+     compatibility(!state->is_version(140, 100)),
+     bool_t(glsl_type::bool_type), int_t(glsl_type::int_type),
+     float_t(glsl_type::float_type), vec2_t(glsl_type::vec2_type),
+     vec3_t(glsl_type::vec3_type), vec4_t(glsl_type::vec4_type),
+     mat3_t(glsl_type::mat3_type), mat4_t(glsl_type::mat4_type)
+{
+}
+
+
+ir_variable *
+builtin_variable_generator::add_variable(const char *name,
+                                         const glsl_type *type,
+                                         enum ir_variable_mode mode, int slot)
+{
+   ir_variable *var = new(symtab) ir_variable(type, name, mode);
+   var->data.how_declared = ir_var_declared_implicitly;
+
+   switch (var->data.mode) {
+   case ir_var_auto:
+   case ir_var_shader_in:
+   case ir_var_uniform:
+   case ir_var_system_value:
+      var->data.read_only = true;
+      break;
+   case ir_var_shader_out:
+      break;
+   default:
+      /* The only variables that are added using this function should be
+       * uniforms, shader inputs, shader outputs, constants (which use
+       * ir_var_auto), and system values.
+       */
+      assert(0);
+      break;
+   }
+
+   var->data.location = slot;
+   var->data.explicit_location = (slot >= 0);
+   var->data.explicit_index = 0;
+
+   /* Once the variable is created and initialized, add it to the symbol table
+    * and add the declaration to the IR stream.
+    */
+   instructions->push_tail(var);
+
+   symtab->add_variable(var);
+   return var;
+}
+
+
+ir_variable *
+builtin_variable_generator::add_uniform(const glsl_type *type,
+                                        const char *name)
+{
+   ir_variable *const uni = add_variable(name, type, ir_var_uniform, -1);
+
+   unsigned i;
+   for (i = 0; _mesa_builtin_uniform_desc[i].name != NULL; i++) {
+      if (strcmp(_mesa_builtin_uniform_desc[i].name, name) == 0) {
+	 break;
+      }
+   }
+
+   assert(_mesa_builtin_uniform_desc[i].name != NULL);
+   const struct gl_builtin_uniform_desc* const statevar =
+      &_mesa_builtin_uniform_desc[i];
+
+   const unsigned array_count = type->is_array() ? type->length : 1;
+   uni->num_state_slots = array_count * statevar->num_elements;
+
+   ir_state_slot *slots =
+      ralloc_array(uni, ir_state_slot, uni->num_state_slots);
+
+   uni->state_slots = slots;
+
+   /* see main/config.h */
+   assert(uni->num_state_slots <= MAX_NUM_STATE_SLOTS);
+
+   for (unsigned a = 0; a < array_count; a++) {
+      for (unsigned j = 0; j < statevar->num_elements; j++) {
+	 const struct gl_builtin_uniform_element *element =
+	    &statevar->elements[j];
+
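+	 /* Copy the template tokens, then patch in the array index for
+	  * arrayed built-ins: the MESA current-attrib uniforms are
+	  * STATE_INTERNAL-based, so their index lives in the third token,
+	  * while for the rest it is the second.
+	  */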
+	 memcpy(slots->tokens, element->tokens, sizeof(element->tokens));
+	 if (type->is_array()) {
+	    if (strcmp(name, "gl_CurrentAttribVertMESA") == 0 ||
+		strcmp(name, "gl_CurrentAttribFragMESA") == 0) {
+	       slots->tokens[2] = a;
+	    } else {
+	       slots->tokens[1] = a;
+	    }
+	 }
+
+	 slots->swizzle = element->swizzle;
+	 slots++;
+      }
+   }
+
+   return uni;
+}
+
+
+ir_variable *
+builtin_variable_generator::add_const(const char *name, int value)
+{
+   ir_variable *const var = add_variable(name, glsl_type::int_type,
+					 ir_var_auto, -1);
+   var->constant_value = new(var) ir_constant(value);
+   var->constant_initializer = new(var) ir_constant(value);
+   var->data.has_initializer = true;
+   return var;
+}
+
+
+ir_variable *
+builtin_variable_generator::add_const_ivec3(const char *name, int x, int y,
+                                            int z)
+{
+   ir_variable *const var = add_variable(name, glsl_type::ivec3_type,
+                                         ir_var_auto, -1);
+   ir_constant_data data;
+   memset(&data, 0, sizeof(data));
+   data.i[0] = x;
+   data.i[1] = y;
+   data.i[2] = z;
+   var->constant_value = new(var) ir_constant(glsl_type::ivec3_type, &data);
+   var->constant_initializer =
+      new(var) ir_constant(glsl_type::ivec3_type, &data);
+   var->data.has_initializer = true;
+   return var;
+}
+
+
+void
+builtin_variable_generator::generate_constants()
+{
+   add_const("gl_MaxVertexAttribs", state->Const.MaxVertexAttribs);
+   add_const("gl_MaxVertexTextureImageUnits",
+             state->Const.MaxVertexTextureImageUnits);
+   add_const("gl_MaxCombinedTextureImageUnits",
+             state->Const.MaxCombinedTextureImageUnits);
+   add_const("gl_MaxTextureImageUnits", state->Const.MaxTextureImageUnits);
+   add_const("gl_MaxDrawBuffers", state->Const.MaxDrawBuffers);
+
+   /* Max uniforms/varyings: GLSL ES counts these in units of vectors; desktop
+    * GL counts them in units of "components" or "floats".
+    */
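+   /* E.g. (illustrative numbers only): an implementation reporting
+    * MaxVertexUniformComponents = 1024 exposes
+    * gl_MaxVertexUniformVectors = 1024 / 4 = 256 to ES shaders.
+    */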
+   if (state->es_shader) {
+      add_const("gl_MaxVertexUniformVectors",
+                state->Const.MaxVertexUniformComponents / 4);
+      add_const("gl_MaxFragmentUniformVectors",
+                state->Const.MaxFragmentUniformComponents / 4);
+
+      /* In GLSL ES 3.00, gl_MaxVaryingVectors was split out to separate
+       * vertex and fragment shader constants.
+       */
+      if (state->is_version(0, 300)) {
+         add_const("gl_MaxVertexOutputVectors",
+                   state->ctx->Const.Program[MESA_SHADER_VERTEX].MaxOutputComponents / 4);
+         add_const("gl_MaxFragmentInputVectors",
+                   state->ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxInputComponents / 4);
+      } else {
+         add_const("gl_MaxVaryingVectors",
+                   state->ctx->Const.MaxVarying);
+      }
+   } else {
+      add_const("gl_MaxVertexUniformComponents",
+                state->Const.MaxVertexUniformComponents);
+
+      /* Note: gl_MaxVaryingFloats was deprecated in GLSL 1.30+, but not
+       * removed
+       */
+      add_const("gl_MaxVaryingFloats", state->ctx->Const.MaxVarying * 4);
+
+      add_const("gl_MaxFragmentUniformComponents",
+                state->Const.MaxFragmentUniformComponents);
+   }
+
+   /* Texel offsets were introduced in ARB_shading_language_420pack (which
+    * requires desktop GLSL version 130), and adopted into desktop GLSL
+    * version 4.20 and GLSL ES version 3.00.
+    */
+   if ((state->is_version(130, 0) &&
+        state->ARB_shading_language_420pack_enable) ||
+      state->is_version(420, 300)) {
+      add_const("gl_MinProgramTexelOffset",
+                state->Const.MinProgramTexelOffset);
+      add_const("gl_MaxProgramTexelOffset",
+                state->Const.MaxProgramTexelOffset);
+   }
+
+   if (state->is_version(130, 0)) {
+      add_const("gl_MaxClipDistances", state->Const.MaxClipPlanes);
+      add_const("gl_MaxVaryingComponents", state->ctx->Const.MaxVarying * 4);
+   }
+
+   if (state->is_version(150, 0)) {
+      add_const("gl_MaxVertexOutputComponents",
+                state->Const.MaxVertexOutputComponents);
+      add_const("gl_MaxGeometryInputComponents",
+                state->Const.MaxGeometryInputComponents);
+      add_const("gl_MaxGeometryOutputComponents",
+                state->Const.MaxGeometryOutputComponents);
+      add_const("gl_MaxFragmentInputComponents",
+                state->Const.MaxFragmentInputComponents);
+      add_const("gl_MaxGeometryTextureImageUnits",
+                state->Const.MaxGeometryTextureImageUnits);
+      add_const("gl_MaxGeometryOutputVertices",
+                state->Const.MaxGeometryOutputVertices);
+      add_const("gl_MaxGeometryTotalOutputComponents",
+                state->Const.MaxGeometryTotalOutputComponents);
+      add_const("gl_MaxGeometryUniformComponents",
+                state->Const.MaxGeometryUniformComponents);
+
+      /* Note: the GLSL 1.50-4.40 specs require
+       * gl_MaxGeometryVaryingComponents to be present, and to be at least 64.
+       * But they do not define what it means (and there does not appear to be
+       * any corresponding constant in the GL specs).  However,
+       * ARB_geometry_shader4 defines MAX_GEOMETRY_VARYING_COMPONENTS_ARB to
+       * be the maximum number of components available for use as geometry
+       * outputs.  So we assume this is a synonym for
+       * gl_MaxGeometryOutputComponents.
+       */
+      add_const("gl_MaxGeometryVaryingComponents",
+                state->Const.MaxGeometryOutputComponents);
+   }
+
+   if (compatibility) {
+      /* Note: gl_MaxLights stopped being listed as an explicit constant in
+       * GLSL 1.30, however it continues to be referred to (as a minimum size
+       * for compatibility-mode uniforms) all the way up through GLSL 4.30, so
+       * this seems like it was probably an oversight.
+       */
+      add_const("gl_MaxLights", state->Const.MaxLights);
+
+      add_const("gl_MaxClipPlanes", state->Const.MaxClipPlanes);
+
+      /* Note: gl_MaxTextureUnits wasn't made compatibility-only until GLSL
+       * 1.50, however this seems like it was probably an oversight.
+       */
+      add_const("gl_MaxTextureUnits", state->Const.MaxTextureUnits);
+
+      /* Note: gl_MaxTextureCoords was left out of GLSL 1.40, but it was
+       * re-introduced in GLSL 1.50, so this seems like it was probably an
+       * oversight.
+       */
+      add_const("gl_MaxTextureCoords", state->Const.MaxTextureCoords);
+   }
+
+   if (state->ARB_shader_atomic_counters_enable) {
+      add_const("gl_MaxVertexAtomicCounters",
+                state->Const.MaxVertexAtomicCounters);
+      add_const("gl_MaxGeometryAtomicCounters",
+                state->Const.MaxGeometryAtomicCounters);
+      add_const("gl_MaxFragmentAtomicCounters",
+                state->Const.MaxFragmentAtomicCounters);
+      add_const("gl_MaxCombinedAtomicCounters",
+                state->Const.MaxCombinedAtomicCounters);
+      add_const("gl_MaxAtomicCounterBindings",
+                state->Const.MaxAtomicBufferBindings);
+      add_const("gl_MaxTessControlAtomicCounters", 0);
+      add_const("gl_MaxTessEvaluationAtomicCounters", 0);
+   }
+
+   if (state->is_version(430, 0) || state->ARB_compute_shader_enable) {
+      add_const_ivec3("gl_MaxComputeWorkGroupCount",
+                      state->Const.MaxComputeWorkGroupCount[0],
+                      state->Const.MaxComputeWorkGroupCount[1],
+                      state->Const.MaxComputeWorkGroupCount[2]);
+      add_const_ivec3("gl_MaxComputeWorkGroupSize",
+                      state->Const.MaxComputeWorkGroupSize[0],
+                      state->Const.MaxComputeWorkGroupSize[1],
+                      state->Const.MaxComputeWorkGroupSize[2]);
+
+      /* From the GLSL 4.40 spec, section 7.1 (Built-In Language Variables):
+       *
+       *     The built-in constant gl_WorkGroupSize is a compute-shader
+       *     constant containing the local work-group size of the shader.  The
+       *     size of the work group in the X, Y, and Z dimensions is stored in
+       *     the x, y, and z components.  The constant values in
+       *     gl_WorkGroupSize will match those specified in the required
+       *     local_size_x, local_size_y, and local_size_z layout qualifiers
+       *     for the current shader.  This is a constant so that it can be
+       *     used to size arrays of memory that can be shared within the local
+       *     work group.  It is a compile-time error to use gl_WorkGroupSize
+       *     in a shader that does not declare a fixed local group size, or
+       *     before that shader has declared a fixed local group size, using
+       *     local_size_x, local_size_y, and local_size_z.
+       *
+       * To prevent the shader from trying to refer to gl_WorkGroupSize before
+       * the layout declaration, we don't define it here.  Instead we define it
+       * in ast_cs_input_layout::hir().
+       */
+   }
+
+   if (state->is_version(420, 0) ||
+       state->ARB_shader_image_load_store_enable) {
+      add_const("gl_MaxImageUnits",
+                state->Const.MaxImageUnits);
+      add_const("gl_MaxCombinedImageUnitsAndFragmentOutputs",
+                state->Const.MaxCombinedImageUnitsAndFragmentOutputs);
+      add_const("gl_MaxImageSamples",
+                state->Const.MaxImageSamples);
+      add_const("gl_MaxVertexImageUniforms",
+                state->Const.MaxVertexImageUniforms);
+      add_const("gl_MaxTessControlImageUniforms", 0);
+      add_const("gl_MaxTessEvaluationImageUniforms", 0);
+      add_const("gl_MaxGeometryImageUniforms",
+                state->Const.MaxGeometryImageUniforms);
+      add_const("gl_MaxFragmentImageUniforms",
+                state->Const.MaxFragmentImageUniforms);
+      add_const("gl_MaxCombinedImageUniforms",
+                state->Const.MaxCombinedImageUniforms);
+   }
+}
+
+
+/**
+ * Generate uniform variables (which exist in all types of shaders).
+ */
+void
+builtin_variable_generator::generate_uniforms()
+{
+   add_uniform(int_t, "gl_NumSamples");
+   add_uniform(type("gl_DepthRangeParameters"), "gl_DepthRange");
+   add_uniform(array(vec4_t, VERT_ATTRIB_MAX), "gl_CurrentAttribVertMESA");
+   add_uniform(array(vec4_t, VARYING_SLOT_MAX), "gl_CurrentAttribFragMESA");
+
+   if (compatibility) {
+      add_uniform(mat4_t, "gl_ModelViewMatrix");
+      add_uniform(mat4_t, "gl_ProjectionMatrix");
+      add_uniform(mat4_t, "gl_ModelViewProjectionMatrix");
+      add_uniform(mat3_t, "gl_NormalMatrix");
+      add_uniform(mat4_t, "gl_ModelViewMatrixInverse");
+      add_uniform(mat4_t, "gl_ProjectionMatrixInverse");
+      add_uniform(mat4_t, "gl_ModelViewProjectionMatrixInverse");
+      add_uniform(mat4_t, "gl_ModelViewMatrixTranspose");
+      add_uniform(mat4_t, "gl_ProjectionMatrixTranspose");
+      add_uniform(mat4_t, "gl_ModelViewProjectionMatrixTranspose");
+      add_uniform(mat4_t, "gl_ModelViewMatrixInverseTranspose");
+      add_uniform(mat4_t, "gl_ProjectionMatrixInverseTranspose");
+      add_uniform(mat4_t, "gl_ModelViewProjectionMatrixInverseTranspose");
+      add_uniform(float_t, "gl_NormalScale");
+      add_uniform(type("gl_LightModelParameters"), "gl_LightModel");
+      add_uniform(vec2_t, "gl_BumpRotMatrix0MESA");
+      add_uniform(vec2_t, "gl_BumpRotMatrix1MESA");
+      add_uniform(vec4_t, "gl_FogParamsOptimizedMESA");
+
+      const glsl_type *const mat4_array_type =
+	 array(mat4_t, state->Const.MaxTextureCoords);
+      add_uniform(mat4_array_type, "gl_TextureMatrix");
+      add_uniform(mat4_array_type, "gl_TextureMatrixInverse");
+      add_uniform(mat4_array_type, "gl_TextureMatrixTranspose");
+      add_uniform(mat4_array_type, "gl_TextureMatrixInverseTranspose");
+
+      add_uniform(array(vec4_t, state->Const.MaxClipPlanes), "gl_ClipPlane");
+      add_uniform(type("gl_PointParameters"), "gl_Point");
+
+      const glsl_type *const material_parameters_type =
+	 type("gl_MaterialParameters");
+      add_uniform(material_parameters_type, "gl_FrontMaterial");
+      add_uniform(material_parameters_type, "gl_BackMaterial");
+
+      add_uniform(array(type("gl_LightSourceParameters"),
+                        state->Const.MaxLights),
+                  "gl_LightSource");
+
+      const glsl_type *const light_model_products_type =
+         type("gl_LightModelProducts");
+      add_uniform(light_model_products_type, "gl_FrontLightModelProduct");
+      add_uniform(light_model_products_type, "gl_BackLightModelProduct");
+
+      const glsl_type *const light_products_type =
+         array(type("gl_LightProducts"), state->Const.MaxLights);
+      add_uniform(light_products_type, "gl_FrontLightProduct");
+      add_uniform(light_products_type, "gl_BackLightProduct");
+
+      add_uniform(array(vec4_t, state->Const.MaxTextureUnits),
+                  "gl_TextureEnvColor");
+
+      const glsl_type *const texcoords_vec4 =
+	 array(vec4_t, state->Const.MaxTextureCoords);
+      add_uniform(texcoords_vec4, "gl_EyePlaneS");
+      add_uniform(texcoords_vec4, "gl_EyePlaneT");
+      add_uniform(texcoords_vec4, "gl_EyePlaneR");
+      add_uniform(texcoords_vec4, "gl_EyePlaneQ");
+      add_uniform(texcoords_vec4, "gl_ObjectPlaneS");
+      add_uniform(texcoords_vec4, "gl_ObjectPlaneT");
+      add_uniform(texcoords_vec4, "gl_ObjectPlaneR");
+      add_uniform(texcoords_vec4, "gl_ObjectPlaneQ");
+
+      add_uniform(type("gl_FogParameters"), "gl_Fog");
+   }
+}
+
+
+/**
+ * Generate variables which only exist in vertex shaders.
+ */
+void
+builtin_variable_generator::generate_vs_special_vars()
+{
+   if (state->is_version(130, 300))
+      add_system_value(SYSTEM_VALUE_VERTEX_ID, int_t, "gl_VertexID");
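+   /* gl_InstanceIDARB and gl_InstanceID alias the same system value: the
+    * ARB-suffixed name exists only under ARB_draw_instanced, while the core
+    * name is also available from GLSL 1.40 / ES 3.00 onward.
+    */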
+   if (state->ARB_draw_instanced_enable)
+      add_system_value(SYSTEM_VALUE_INSTANCE_ID, int_t, "gl_InstanceIDARB");
+   if (state->ARB_draw_instanced_enable || state->is_version(140, 300))
+      add_system_value(SYSTEM_VALUE_INSTANCE_ID, int_t, "gl_InstanceID");
+   if (state->AMD_vertex_shader_layer_enable)
+      add_output(VARYING_SLOT_LAYER, int_t, "gl_Layer");
+   if (compatibility) {
+      add_input(VERT_ATTRIB_POS, vec4_t, "gl_Vertex");
+      add_input(VERT_ATTRIB_NORMAL, vec3_t, "gl_Normal");
+      add_input(VERT_ATTRIB_COLOR0, vec4_t, "gl_Color");
+      add_input(VERT_ATTRIB_COLOR1, vec4_t, "gl_SecondaryColor");
+      add_input(VERT_ATTRIB_TEX0, vec4_t, "gl_MultiTexCoord0");
+      add_input(VERT_ATTRIB_TEX1, vec4_t, "gl_MultiTexCoord1");
+      add_input(VERT_ATTRIB_TEX2, vec4_t, "gl_MultiTexCoord2");
+      add_input(VERT_ATTRIB_TEX3, vec4_t, "gl_MultiTexCoord3");
+      add_input(VERT_ATTRIB_TEX4, vec4_t, "gl_MultiTexCoord4");
+      add_input(VERT_ATTRIB_TEX5, vec4_t, "gl_MultiTexCoord5");
+      add_input(VERT_ATTRIB_TEX6, vec4_t, "gl_MultiTexCoord6");
+      add_input(VERT_ATTRIB_TEX7, vec4_t, "gl_MultiTexCoord7");
+      add_input(VERT_ATTRIB_FOG, float_t, "gl_FogCoord");
+   }
+}
+
+
+/**
+ * Generate variables which only exist in geometry shaders.
+ */
+void
+builtin_variable_generator::generate_gs_special_vars()
+{
+   add_output(VARYING_SLOT_LAYER, int_t, "gl_Layer");
+   if (state->ARB_viewport_array_enable)
+      add_output(VARYING_SLOT_VIEWPORT, int_t, "gl_ViewportIndex");
+   if (state->ARB_gpu_shader5_enable)
+      add_system_value(SYSTEM_VALUE_INVOCATION_ID, int_t, "gl_InvocationID");
+
+   /* Although gl_PrimitiveID appears in tessellation control and tessellation
+    * evaluation shaders, it has a different function there than it has in
+    * geometry shaders, so we treat it (and its counterpart gl_PrimitiveIDIn)
+    * as special geometry shader variables.
+    *
+    * Note that although the general convention of suffixing geometry shader
+    * input varyings with "In" was not adopted into GLSL 1.50, it is used in
+    * the specific case of gl_PrimitiveIDIn.  So we don't need to treat
+    * gl_PrimitiveIDIn as an {ARB,EXT}_geometry_shader4-only variable.
+    */
+   ir_variable *var;
+   var = add_input(VARYING_SLOT_PRIMITIVE_ID, int_t, "gl_PrimitiveIDIn");
+   var->data.interpolation = INTERP_QUALIFIER_FLAT;
+   var = add_output(VARYING_SLOT_PRIMITIVE_ID, int_t, "gl_PrimitiveID");
+   var->data.interpolation = INTERP_QUALIFIER_FLAT;
+}
+
+
+/**
+ * Generate variables which only exist in fragment shaders.
+ */
+void
+builtin_variable_generator::generate_fs_special_vars()
+{
+   add_input(VARYING_SLOT_POS, vec4_t, "gl_FragCoord");
+   add_input(VARYING_SLOT_FACE, bool_t, "gl_FrontFacing");
+   if (state->is_version(120, 100))
+      add_input(VARYING_SLOT_PNTC, vec2_t, "gl_PointCoord");
+
+   if (state->is_version(150, 0)) {
+      ir_variable *var =
+         add_input(VARYING_SLOT_PRIMITIVE_ID, int_t, "gl_PrimitiveID");
+      var->data.interpolation = INTERP_QUALIFIER_FLAT;
+   }
+
+   /* gl_FragColor and gl_FragData were deprecated starting in desktop GLSL
+    * 1.30, and were relegated to the compatibility profile in GLSL 4.20.
+    * They were removed from GLSL ES 3.00.
+    */
+   if (compatibility || !state->is_version(420, 300)) {
+      add_output(FRAG_RESULT_COLOR, vec4_t, "gl_FragColor");
+      add_output(FRAG_RESULT_DATA0,
+                 array(vec4_t, state->Const.MaxDrawBuffers), "gl_FragData");
+   }
+
+   /* gl_FragDepth has always been in desktop GLSL, but did not appear in GLSL
+    * ES 1.00.
+    */
+   if (state->is_version(110, 300))
+      add_output(FRAG_RESULT_DEPTH, float_t, "gl_FragDepth");
+
+   if (state->ARB_shader_stencil_export_enable) {
+      ir_variable *const var =
+         add_output(FRAG_RESULT_STENCIL, int_t, "gl_FragStencilRefARB");
+      if (state->ARB_shader_stencil_export_warn)
+         var->warn_extension = "GL_ARB_shader_stencil_export";
+   }
+
+   if (state->AMD_shader_stencil_export_enable) {
+      ir_variable *const var =
+         add_output(FRAG_RESULT_STENCIL, int_t, "gl_FragStencilRefAMD");
+      if (state->AMD_shader_stencil_export_warn)
+         var->warn_extension = "GL_AMD_shader_stencil_export";
+   }
+
+   if (state->ARB_sample_shading_enable) {
+      add_system_value(SYSTEM_VALUE_SAMPLE_ID, int_t, "gl_SampleID");
+      add_system_value(SYSTEM_VALUE_SAMPLE_POS, vec2_t, "gl_SamplePosition");
+      /* From the ARB_sample_shading specification:
+       *    "The number of elements in the array is ceil(<s>/32), where
+       *    <s> is the maximum number of color samples supported by the
+       *    implementation."
+       * Since no drivers expose more than 32x MSAA, we can simply set
+       * the array size to 1 rather than computing it.
+       */
+      add_output(FRAG_RESULT_SAMPLE_MASK, array(int_t, 1), "gl_SampleMask");
+   }
+
+   if (state->ARB_gpu_shader5_enable) {
+      add_system_value(SYSTEM_VALUE_SAMPLE_MASK_IN, array(int_t, 1), "gl_SampleMaskIn");
+   }
+}
+
+
+/**
+ * Generate variables which only exist in compute shaders.
+ */
+void
+builtin_variable_generator::generate_cs_special_vars()
+{
+   /* TODO: finish this. */
+}
+
+
+/**
+ * Add a single "varying" variable.  The variable's type and direction (input
+ * or output) are adjusted as appropriate for the type of shader being
+ * compiled.  For geometry shaders using {ARB,EXT}_geometry_shader4,
+ * name_as_gs_input is used for the input (to avoid ambiguity).
+ */
+void
+builtin_variable_generator::add_varying(int slot, const glsl_type *type,
+                                        const char *name,
+                                        const char *name_as_gs_input)
+{
+   switch (state->stage) {
+   case MESA_SHADER_GEOMETRY:
+      this->per_vertex_in.add_field(slot, type, name);
+      /* FALLTHROUGH */
+   case MESA_SHADER_VERTEX:
+      this->per_vertex_out.add_field(slot, type, name);
+      break;
+   case MESA_SHADER_FRAGMENT:
+      add_input(slot, type, name);
+      break;
+   case MESA_SHADER_COMPUTE:
+      /* Compute shaders don't have varyings. */
+      break;
+   }
+}
+
+
+/**
+ * Generate variables that are used to communicate data from one shader stage
+ * to the next ("varyings").
+ */
+void
+builtin_variable_generator::generate_varyings()
+{
+#define ADD_VARYING(loc, type, name) \
+   add_varying(loc, type, name, name "In")
+
+   /* gl_Position and gl_PointSize are not visible from fragment shaders. */
+   if (state->stage != MESA_SHADER_FRAGMENT) {
+      ADD_VARYING(VARYING_SLOT_POS, vec4_t, "gl_Position");
+      ADD_VARYING(VARYING_SLOT_PSIZ, float_t, "gl_PointSize");
+   }
+
+   if (state->is_version(130, 0)) {
+       ADD_VARYING(VARYING_SLOT_CLIP_DIST0, array(float_t, 0),
+                   "gl_ClipDistance");
+   }
+
+   if (compatibility) {
+      ADD_VARYING(VARYING_SLOT_TEX0, array(vec4_t, 0), "gl_TexCoord");
+      ADD_VARYING(VARYING_SLOT_FOGC, float_t, "gl_FogFragCoord");
+      if (state->stage == MESA_SHADER_FRAGMENT) {
+         ADD_VARYING(VARYING_SLOT_COL0, vec4_t, "gl_Color");
+         ADD_VARYING(VARYING_SLOT_COL1, vec4_t, "gl_SecondaryColor");
+      } else {
+         ADD_VARYING(VARYING_SLOT_CLIP_VERTEX, vec4_t, "gl_ClipVertex");
+         ADD_VARYING(VARYING_SLOT_COL0, vec4_t, "gl_FrontColor");
+         ADD_VARYING(VARYING_SLOT_BFC0, vec4_t, "gl_BackColor");
+         ADD_VARYING(VARYING_SLOT_COL1, vec4_t, "gl_FrontSecondaryColor");
+         ADD_VARYING(VARYING_SLOT_BFC1, vec4_t, "gl_BackSecondaryColor");
+      }
+   }
+
+   if (state->stage == MESA_SHADER_GEOMETRY) {
+      const glsl_type *per_vertex_in_type =
+         this->per_vertex_in.construct_interface_instance();
+      add_variable("gl_in", array(per_vertex_in_type, 0),
+                   ir_var_shader_in, -1);
+   }
+   if (state->stage == MESA_SHADER_VERTEX || state->stage == MESA_SHADER_GEOMETRY) {
+      const glsl_type *per_vertex_out_type =
+         this->per_vertex_out.construct_interface_instance();
+      const glsl_struct_field *fields = per_vertex_out_type->fields.structure;
+      for (unsigned i = 0; i < per_vertex_out_type->length; i++) {
+         ir_variable *var =
+            add_variable(fields[i].name, fields[i].type, ir_var_shader_out,
+                         fields[i].location);
+         var->data.interpolation = fields[i].interpolation;
+         var->data.centroid = fields[i].centroid;
+         var->data.sample = fields[i].sample;
+         var->init_interface_type(per_vertex_out_type);
+      }
+   }
+}
+
+
+}; /* Anonymous namespace */
+
+
+void
+_mesa_glsl_initialize_variables(exec_list *instructions,
+				struct _mesa_glsl_parse_state *state)
+{
+   builtin_variable_generator gen(instructions, state);
+
+   gen.generate_constants();
+   gen.generate_uniforms();
+
+   gen.generate_varyings();
+
+   switch (state->stage) {
+   case MESA_SHADER_VERTEX:
+      gen.generate_vs_special_vars();
+      break;
+   case MESA_SHADER_GEOMETRY:
+      gen.generate_gs_special_vars();
+      break;
+   case MESA_SHADER_FRAGMENT:
+      gen.generate_fs_special_vars();
+      break;
+   case MESA_SHADER_COMPUTE:
+      gen.generate_cs_special_vars();
+      break;
+   }
+}
diff --git a/icd/intel/compiler/shader/compiler_interface.cpp b/icd/intel/compiler/shader/compiler_interface.cpp
new file mode 100644
index 0000000..bf94223
--- /dev/null
+++ b/icd/intel/compiler/shader/compiler_interface.cpp
@@ -0,0 +1,487 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ * Author: Steve K <srk@LunarG.com>
+ *
+ */
+
+#include "icd-spv.h"
+#include "pipeline.h"
+#include "compiler_interface.h"
+#include "compiler/mesa-utils/src/glsl/ralloc.h"
+#include "compiler/mesa-utils/src/glsl/glsl_parser_extras.h"
+#include "compiler/shader/program.h"
+#include "compiler/mesa-utils/src/mesa/main/context.h"
+#include "compiler/mesa-utils/src/mesa/main/config.h"
+#include "compiler/shader/standalone_scaffolding.h"
+#include "compiler/pipeline/brw_wm.h"
+#include "compiler/pipeline/brw_shader.h"
+#include "SPIRV/spirv.hpp"
+
+/**
+ * Initialize vertex/fragment/geometry program limits.
+ * Based on Mesa's init_program_limits().
+ */
+void init_mesa_program_limits(struct gl_context *ctx, gl_shader_stage stage,
+                              struct gl_program_constants *prog)
+{
+    prog->MaxInstructions = MAX_PROGRAM_INSTRUCTIONS;
+    prog->MaxAluInstructions = MAX_PROGRAM_INSTRUCTIONS;
+    prog->MaxTexInstructions = MAX_PROGRAM_INSTRUCTIONS;
+    prog->MaxTexIndirections = MAX_PROGRAM_INSTRUCTIONS;
+    prog->MaxTemps = MAX_PROGRAM_TEMPS;
+    prog->MaxEnvParams = MAX_PROGRAM_ENV_PARAMS;
+    prog->MaxLocalParams = MAX_PROGRAM_LOCAL_PARAMS;
+    prog->MaxAddressOffset = MAX_PROGRAM_LOCAL_PARAMS;
+
+    switch (stage) {
+    case MESA_SHADER_VERTEX:
+        prog->MaxParameters = MAX_VERTEX_PROGRAM_PARAMS;
+        prog->MaxAttribs = MAX_VERTEX_GENERIC_ATTRIBS;
+        prog->MaxAddressRegs = MAX_VERTEX_PROGRAM_ADDRESS_REGS;
+        prog->MaxUniformComponents = 4 * MAX_UNIFORMS;
+        prog->MaxInputComponents = 0; /* value not used */
+        prog->MaxOutputComponents = 16 * 4; /* old limit not to break tnl and swrast */
+        break;
+    case MESA_SHADER_FRAGMENT:
+        prog->MaxParameters = MAX_NV_FRAGMENT_PROGRAM_PARAMS;
+        prog->MaxAttribs = MAX_NV_FRAGMENT_PROGRAM_INPUTS;
+        prog->MaxAddressRegs = MAX_FRAGMENT_PROGRAM_ADDRESS_REGS;
+        prog->MaxUniformComponents = 4 * MAX_UNIFORMS;
+        prog->MaxInputComponents = 16 * 4; /* old limit not to break tnl and swrast */
+        prog->MaxOutputComponents = 0; /* value not used */
+        break;
+    case MESA_SHADER_GEOMETRY:
+        prog->MaxParameters = MAX_VERTEX_PROGRAM_PARAMS;
+        prog->MaxAttribs = MAX_VERTEX_GENERIC_ATTRIBS;
+        prog->MaxAddressRegs = MAX_VERTEX_PROGRAM_ADDRESS_REGS;
+        prog->MaxUniformComponents = 4 * MAX_UNIFORMS;
+        prog->MaxInputComponents = 16 * 4; /* old limit not to break tnl and swrast */
+        prog->MaxOutputComponents = 16 * 4; /* old limit not to break tnl and swrast */
+        break;
+    case MESA_SHADER_COMPUTE:
+        prog->MaxParameters = 0; /* not meaningful for compute shaders */
+        prog->MaxAttribs = 0; /* not meaningful for compute shaders */
+        prog->MaxAddressRegs = 0; /* not meaningful for compute shaders */
+        prog->MaxUniformComponents = 4 * MAX_UNIFORMS;
+        prog->MaxInputComponents = 0; /* not meaningful for compute shaders */
+        prog->MaxOutputComponents = 0; /* not meaningful for compute shaders */
+        break;
+    default:
+        assert(0 && "Bad shader stage in init_mesa_program_limits()");
+    }
+
+    /* Set the native limits to zero.  This implies that there is no native
+     * support for shaders.  Let the drivers fill in the actual values.
+     */
+    prog->MaxNativeInstructions = 0;
+    prog->MaxNativeAluInstructions = 0;
+    prog->MaxNativeTexInstructions = 0;
+    prog->MaxNativeTexIndirections = 0;
+    prog->MaxNativeAttribs = 0;
+    prog->MaxNativeTemps = 0;
+    prog->MaxNativeAddressRegs = 0;
+    prog->MaxNativeParameters = 0;
+
+    /* Set GLSL datatype range/precision info assuming IEEE float values.
+     * Drivers should override these defaults as needed.
+     */
+    prog->MediumFloat.RangeMin = 127;
+    prog->MediumFloat.RangeMax = 127;
+    prog->MediumFloat.Precision = 23;
+    prog->LowFloat = prog->HighFloat = prog->MediumFloat;
+
+    /* Assume ints are stored as floats for now, since this is the least-common
+     * denominator.  The OpenGL ES spec implies (page 132) that the precision
+     * of integer types should be 0.  Practically speaking, an IEEE
+     * single-precision float has a 24-bit significand, so it can only store
+     * integers in the range [-0x01000000, 0x01000000] (+/- 2^24) without loss
+     * of precision.
+     */
+    prog->MediumInt.RangeMin = 24;
+    prog->MediumInt.RangeMax = 24;
+    prog->MediumInt.Precision = 0;
+    prog->LowInt = prog->HighInt = prog->MediumInt;
+
+    prog->MaxUniformBlocks = 12;
+    prog->MaxCombinedUniformComponents = (prog->MaxUniformComponents +
+                                          ctx->Const.MaxUniformBlockSize / 4 *
+                                          prog->MaxUniformBlocks);
+
+    prog->MaxAtomicBuffers = 0;
+    prog->MaxAtomicCounters = 0;
+}
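+
+// Worked example of the MaxCombinedUniformComponents formula above,
+// assuming the defaults set in initialize_mesa_constants() below:
+// MaxUniformBlockSize = 16384 bytes holds 16384 / 4 = 4096 components
+// per block, so each stage gets MaxUniformComponents + 4096 * 12 =
+// MaxUniformComponents + 49152 combined components.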
+
+// Copied from context.c:_mesa_init_constants
+void
+initialize_mesa_constants(struct gl_context *ctx)
+{
+   int i;
+   assert(ctx);
+
+   /* Constants, may be overridden (usually only reduced) by device drivers */
+   ctx->Const.MaxTextureMbytes = MAX_TEXTURE_MBYTES;
+   ctx->Const.MaxTextureLevels = MAX_TEXTURE_LEVELS;
+   ctx->Const.Max3DTextureLevels = MAX_3D_TEXTURE_LEVELS;
+   ctx->Const.MaxCubeTextureLevels = MAX_CUBE_TEXTURE_LEVELS;
+   ctx->Const.MaxTextureRectSize = MAX_TEXTURE_RECT_SIZE;
+   ctx->Const.MaxArrayTextureLayers = MAX_ARRAY_TEXTURE_LAYERS;
+   ctx->Const.MaxTextureCoordUnits = MAX_TEXTURE_COORD_UNITS;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits = MAX_TEXTURE_IMAGE_UNITS;
+   ctx->Const.MaxTextureUnits = MIN2(ctx->Const.MaxTextureCoordUnits,
+                                     ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits);
+   ctx->Const.MaxTextureMaxAnisotropy = MAX_TEXTURE_MAX_ANISOTROPY;
+   ctx->Const.MaxTextureLodBias = MAX_TEXTURE_LOD_BIAS;
+   ctx->Const.MaxTextureBufferSize = 65536;
+   ctx->Const.TextureBufferOffsetAlignment = 1;
+   ctx->Const.MaxArrayLockSize = MAX_ARRAY_LOCK_SIZE;
+   ctx->Const.SubPixelBits = SUB_PIXEL_BITS;
+   ctx->Const.MinPointSize = MIN_POINT_SIZE;
+   ctx->Const.MaxPointSize = MAX_POINT_SIZE;
+   ctx->Const.MinPointSizeAA = MIN_POINT_SIZE;
+   ctx->Const.MaxPointSizeAA = MAX_POINT_SIZE;
+   ctx->Const.PointSizeGranularity = (GLfloat) POINT_SIZE_GRANULARITY;
+   ctx->Const.MinLineWidth = MIN_LINE_WIDTH;
+   ctx->Const.MaxLineWidth = MAX_LINE_WIDTH;
+   ctx->Const.MinLineWidthAA = MIN_LINE_WIDTH;
+   ctx->Const.MaxLineWidthAA = MAX_LINE_WIDTH;
+   ctx->Const.LineWidthGranularity = (GLfloat) LINE_WIDTH_GRANULARITY;
+   ctx->Const.MaxClipPlanes = 6;
+   ctx->Const.MaxLights = MAX_LIGHTS;
+   ctx->Const.MaxShininess = 128.0;
+   ctx->Const.MaxSpotExponent = 128.0;
+   ctx->Const.MaxViewportWidth = MAX_VIEWPORT_WIDTH;
+   ctx->Const.MaxViewportHeight = MAX_VIEWPORT_HEIGHT;
+   ctx->Const.MinMapBufferAlignment = 64;
+
+   /* Driver must override these values if ARB_viewport_array is supported. */
+   ctx->Const.MaxViewports = 1;
+   ctx->Const.ViewportSubpixelBits = 0;
+   ctx->Const.ViewportBounds.Min = 0;
+   ctx->Const.ViewportBounds.Max = 0;
+
+   /** GL_ARB_uniform_buffer_object */
+   ctx->Const.MaxCombinedUniformBlocks = 36;
+   ctx->Const.MaxUniformBufferBindings = 36;
+   ctx->Const.MaxUniformBlockSize = 16384;
+   ctx->Const.UniformBufferOffsetAlignment = 1;
+
+   for (i = 0; i < MESA_SHADER_STAGES; i++)
+      init_mesa_program_limits(ctx, (gl_shader_stage)i, &ctx->Const.Program[i]);
+
+   ctx->Const.MaxProgramMatrices = MAX_PROGRAM_MATRICES;
+   ctx->Const.MaxProgramMatrixStackDepth = MAX_PROGRAM_MATRIX_STACK_DEPTH;
+
+   /* CheckArrayBounds is overridden by drivers/x11 for X server */
+   ctx->Const.CheckArrayBounds = GL_FALSE;
+
+   /* GL_ARB_draw_buffers */
+   ctx->Const.MaxDrawBuffers = MAX_DRAW_BUFFERS;
+
+   ctx->Const.MaxColorAttachments = MAX_COLOR_ATTACHMENTS;
+   ctx->Const.MaxRenderbufferSize = MAX_RENDERBUFFER_SIZE;
+
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxTextureImageUnits = MAX_TEXTURE_IMAGE_UNITS;
+   ctx->Const.MaxCombinedTextureImageUnits = MAX_COMBINED_TEXTURE_IMAGE_UNITS;
+   ctx->Const.MaxVarying = 16; /* old limit not to break tnl and swrast */
+   ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits = MAX_TEXTURE_IMAGE_UNITS;
+   ctx->Const.MaxGeometryOutputVertices = MAX_GEOMETRY_OUTPUT_VERTICES;
+   ctx->Const.MaxGeometryTotalOutputComponents = MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS;
+
+   /* Shading language version */
+   ctx->Const.GLSLVersion = 140;
+
+   /* GL_ARB_framebuffer_object */
+   ctx->Const.MaxSamples = 0;
+
+   /* GL_ARB_sync */
+   ctx->Const.MaxServerWaitTimeout = 0x1fff7fffffffULL;
+
+   /* GL_ATI_envmap_bumpmap */
+   ctx->Const.SupportedBumpUnits = SUPPORTED_ATI_BUMP_UNITS;
+
+   /* GL_EXT_provoking_vertex */
+   ctx->Const.QuadsFollowProvokingVertexConvention = GL_TRUE;
+
+   /* GL_EXT_transform_feedback */
+   ctx->Const.MaxTransformFeedbackBuffers = MAX_FEEDBACK_BUFFERS;
+   ctx->Const.MaxTransformFeedbackSeparateComponents = 4 * MAX_FEEDBACK_ATTRIBS;
+   ctx->Const.MaxTransformFeedbackInterleavedComponents = 4 * MAX_FEEDBACK_ATTRIBS;
+   ctx->Const.MaxVertexStreams = 1;
+
+   /* GL 3.2  */
+   ctx->Const.ProfileMask = (ctx->API == API_OPENGL_CORE || ctx->API == API_VK)
+                          ? GL_CONTEXT_CORE_PROFILE_BIT
+                          : GL_CONTEXT_COMPATIBILITY_PROFILE_BIT;
+
+   /** GL_EXT_gpu_shader4 */
+   ctx->Const.MinProgramTexelOffset = -8;
+   ctx->Const.MaxProgramTexelOffset = 7;
+
+   /* GL_ARB_texture_gather */
+   ctx->Const.MinProgramTextureGatherOffset = -8;
+   ctx->Const.MaxProgramTextureGatherOffset = 7;
+
+   /* GL_ARB_robustness */
+   ctx->Const.ResetStrategy = GL_NO_RESET_NOTIFICATION_ARB;
+
+   /* PrimitiveRestart */
+   ctx->Const.PrimitiveRestartInSoftware = GL_FALSE;
+
+   /* ES 3.0 or ARB_ES3_compatibility */
+   ctx->Const.MaxElementIndex = 0xffffffffu;
+
+   /* GL_ARB_texture_multisample */
+   ctx->Const.MaxColorTextureSamples = 1;
+   ctx->Const.MaxDepthTextureSamples = 1;
+   ctx->Const.MaxIntegerSamples = 1;
+
+   /* GL_ARB_shader_atomic_counters */
+   ctx->Const.MaxAtomicBufferBindings = MAX_COMBINED_ATOMIC_BUFFERS;
+   ctx->Const.MaxAtomicBufferSize = MAX_ATOMIC_COUNTERS * ATOMIC_COUNTER_SIZE;
+   ctx->Const.MaxCombinedAtomicBuffers = MAX_COMBINED_ATOMIC_BUFFERS;
+   ctx->Const.MaxCombinedAtomicCounters = MAX_ATOMIC_COUNTERS;
+
+   /* GL_ARB_vertex_attrib_binding */
+   ctx->Const.MaxVertexAttribRelativeOffset = 2047;
+   ctx->Const.MaxVertexAttribBindings = MAX_VERTEX_GENERIC_ATTRIBS;
+
+   /* GL_ARB_compute_shader */
+   ctx->Const.MaxComputeWorkGroupCount[0] = 65535;
+   ctx->Const.MaxComputeWorkGroupCount[1] = 65535;
+   ctx->Const.MaxComputeWorkGroupCount[2] = 65535;
+   ctx->Const.MaxComputeWorkGroupSize[0] = 1024;
+   ctx->Const.MaxComputeWorkGroupSize[1] = 1024;
+   ctx->Const.MaxComputeWorkGroupSize[2] = 64;
+   ctx->Const.MaxComputeWorkGroupInvocations = 1024;
+
+   /** GL_ARB_gpu_shader5 */
+   ctx->Const.MinFragmentInterpolationOffset = MIN_FRAGMENT_INTERPOLATION_OFFSET;
+   ctx->Const.MaxFragmentInterpolationOffset = MAX_FRAGMENT_INTERPOLATION_OFFSET;
+
+   ctx->Const.GlassMode = 0;
+}
+
+void initialize_mesa_context_to_defaults(struct gl_context *ctx)
+{
+   memset(ctx, 0, sizeof(*ctx));
+
+   ctx->API = API_VK;
+
+   ctx->Extensions.dummy_false = false;
+   ctx->Extensions.dummy_true = true;
+   ctx->Extensions.ARB_compute_shader = true;
+   ctx->Extensions.ARB_conservative_depth = true;
+   ctx->Extensions.ARB_draw_instanced = true;
+   ctx->Extensions.ARB_ES2_compatibility = true;
+   ctx->Extensions.ARB_ES3_compatibility = true;
+   ctx->Extensions.ARB_explicit_attrib_location = true;
+   ctx->Extensions.ARB_fragment_coord_conventions = true;
+   ctx->Extensions.ARB_gpu_shader5 = true;
+   ctx->Extensions.ARB_sample_shading = true;
+   ctx->Extensions.ARB_shader_bit_encoding = true;
+   ctx->Extensions.ARB_shader_stencil_export = true;
+   ctx->Extensions.ARB_shader_texture_lod = true;
+   ctx->Extensions.ARB_shading_language_420pack = true;
+   ctx->Extensions.ARB_shading_language_packing = true;
+   ctx->Extensions.ARB_texture_cube_map_array = true;
+   ctx->Extensions.ARB_texture_gather = true;
+   ctx->Extensions.ARB_texture_multisample = true;
+   ctx->Extensions.ARB_texture_query_levels = true;
+   ctx->Extensions.ARB_texture_query_lod = true;
+   ctx->Extensions.ARB_uniform_buffer_object = true;
+   ctx->Extensions.ARB_viewport_array = true;
+   ctx->Extensions.OES_EGL_image_external = true;
+   ctx->Extensions.OES_standard_derivatives = true;
+   ctx->Extensions.EXT_shader_integer_mix = true;
+   ctx->Extensions.EXT_texture3D = true;
+   ctx->Extensions.EXT_texture_array = true;
+   ctx->Extensions.NV_texture_rectangle = true;
+
+   // Set some initial constants, override as needed
+   initialize_mesa_constants( ctx );
+
+   /* Set up default shader compiler options. */
+   struct gl_shader_compiler_options options;
+   memset(&options, 0, sizeof(options));
+   options.MaxUnrollIterations = 32;
+   options.MaxIfDepth = UINT_MAX;
+
+   /* Default pragma settings */
+   options.DefaultPragmas.Optimize = true;
+
+   for (int sh = 0; sh < MESA_SHADER_STAGES; ++sh)
+      memcpy(&ctx->ShaderCompilerOptions[sh], &options, sizeof(options));
+
+
+   ctx->Driver.NewShader = _mesa_new_shader;
+   ctx->Driver.DeleteShader = _mesa_delete_shader;
+}
+
+
+extern "C" {
+
+// Invoke the front-end compiler to generate an independently linked
+// program object that contains Mesa HIR.
+struct intel_ir *shader_create_ir(const struct intel_gpu *gpu,
+                                  const void *code, size_t size,
+                                  VkShaderStageFlagBits stage)
+{
+    struct icd_spv_header header;
+    struct gl_context local_ctx;
+    struct gl_context *ctx = &local_ctx;
+
+    memcpy(&header, code, sizeof(header));
+    if (header.magic != ICD_SPV_MAGIC) {
+        return NULL;
+    }
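+
+    // Dispatch note (describing the branches below): version 0 means
+    // the bytes after the header are GLSL source and header.gen_magic
+    // encodes the shader stage; any other version means 'code' is a
+    // SPIR-V module and the stage comes from the 'stage' parameter.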
+
+    _mesa_create_shader_compiler();
+    initialize_mesa_context_to_defaults(ctx);
+
+    struct gl_shader_program *shader_program = brw_new_shader_program(ctx, 0);
+    assert(shader_program != NULL);
+
+    shader_program->InfoLog = ralloc_strdup(shader_program, "");
+    shader_program->Shaders =
+        reralloc(shader_program, shader_program->Shaders,
+                 struct gl_shader *, shader_program->NumShaders + 1);
+    assert(shader_program->Shaders != NULL);
+
+    struct gl_shader *shader = rzalloc(shader_program, struct gl_shader);
+
+    shader_program->Shaders[shader_program->NumShaders] = shader;
+    shader_program->NumShaders++;
+
+    if (header.version == 0) {
+        // version 0 means we really have GLSL source, not SPIR-V
+        shader->Source = (const char *) code + sizeof(header);
+
+        switch(header.gen_magic) {
+        case VK_SHADER_STAGE_VERTEX_BIT:
+            shader->Type = GL_VERTEX_SHADER;
+            break;
+        case VK_SHADER_STAGE_GEOMETRY_BIT:
+            shader->Type = GL_GEOMETRY_SHADER;
+            break;
+        case VK_SHADER_STAGE_FRAGMENT_BIT:
+            shader->Type = GL_FRAGMENT_SHADER;
+            break;
+        default:
+            assert(0);
+            break;
+        }
+    } else {
+
+        shader->Source = (const GLchar*)code;
+        shader->Size   = size / sizeof(unsigned);  // size in SPV words
+
+        switch (stage) {
+        case VK_SHADER_STAGE_VERTEX_BIT:
+            shader->Type = GL_VERTEX_SHADER;
+            break;
+        case VK_SHADER_STAGE_GEOMETRY_BIT:
+            shader->Type = GL_GEOMETRY_SHADER;
+            break;
+        case VK_SHADER_STAGE_FRAGMENT_BIT:
+            shader->Type = GL_FRAGMENT_SHADER;
+            break;
+        default:
+            assert(0);
+            break;
+        }
+    }
+
+    shader->Stage = _mesa_shader_enum_to_shader_stage(shader->Type);
+
+    shader_program->Type = shader->Stage;
+
+    bool dump_ast = false;
+    bool dump_SPV = false;
+    bool dump_hir = false;
+
+    bool strip_SPV = false;
+    bool canonicalize_SPV = false;
+
+    _mesa_glsl_compile_shader(ctx, shader, dump_ast, dump_SPV, dump_hir, strip_SPV, canonicalize_SPV);
+
+    if (strlen(shader->InfoLog) > 0) {
+        printf("Info log:\n%s\n", shader->InfoLog);
+        fflush(stdout);
+    }
+
+    if (!shader->CompileStatus) {
+        _mesa_destroy_shader_compiler();
+        return NULL;
+    }
+
+    assert(shader_program->NumShaders == 1);
+
+    // For VK, we are independently compiling and linking individual
+    // shaders, which matches this frontend's concept of separate
+    // shader objects (SSO).
+    shader_program->SeparateShader = true;
+
+    link_shaders(ctx, shader_program);
+
+    if (strlen(shader_program->InfoLog) > 0) {
+        printf("Info log for linking:\n%s\n", shader_program->InfoLog);
+        fflush(stdout);
+    }
+
+    if (!shader_program->LinkStatus) {
+        _mesa_destroy_shader_compiler();
+        return NULL;
+    }
+
+    _mesa_destroy_shader_compiler();
+
+    return (struct intel_ir *) shader_program;
+}
+
+void shader_create_ir_with_lock(const struct intel_gpu *gpu,
+                                const void *code, size_t size,
+                                VkShaderStageFlagBits stage,
+                                struct intel_ir **ir)
+{
+    // Wrap this path in a mutex until we can clean up initialization
+    static mtx_t mutex = _MTX_INITIALIZER_NP;
+    mtx_lock(&mutex);
+
+    if (!*ir)
+        *ir = shader_create_ir(gpu, code, size, stage);
+
+    mtx_unlock(&mutex);
+}
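+
+// Illustrative usage (hypothetical caller; the variable names are
+// assumptions, not part of this interface):
+//
+//     static struct intel_ir *ir;  // cached per shader
+//     shader_create_ir_with_lock(gpu, spv_words, spv_size,
+//                                VK_SHADER_STAGE_VERTEX_BIT, &ir);
+//
+// The IR is created at most once even with concurrent callers, and
+// must eventually be released with shader_destroy_ir().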
+
+void shader_destroy_ir(struct intel_ir *ir)
+{
+    struct gl_shader_program *sh_prog = (struct gl_shader_program *) ir;
+
+    for (unsigned i = 0; i < MESA_SHADER_STAGES; i++)
+       ralloc_free(sh_prog->_LinkedShaders[i]);
+
+    ralloc_free(sh_prog);
+}
+
+} // extern "C"
diff --git a/icd/intel/compiler/shader/compiler_interface.h b/icd/intel/compiler/shader/compiler_interface.h
new file mode 100644
index 0000000..d53130e
--- /dev/null
+++ b/icd/intel/compiler/shader/compiler_interface.h
@@ -0,0 +1,57 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Cody Northrop <cody@lunarg.com>
+ *
+ */
+
+#ifndef COMPILER_INTERFACE_H
+#define COMPILER_INTERFACE_H
+
+#include <icd.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct gl_context;
+
+void initialize_mesa_context_to_defaults(struct gl_context *ctx);
+
+struct intel_ir *shader_create_ir(const struct intel_gpu *gpu,
+                                  const void *code, size_t size,
+                                  VkShaderStageFlagBits stage);
+
+void shader_create_ir_with_lock(const struct intel_gpu *gpu,
+                                const void *code, size_t size,
+                                VkShaderStageFlagBits stage,
+                                struct intel_ir **ir);
+
+void shader_destroy_ir(struct intel_ir *ir);
+
+#ifdef __cplusplus
+} // extern "C"
+#endif
+
+#endif /* COMPILER_INTERFACE_H */
diff --git a/icd/intel/compiler/shader/glcpp/.gitignore b/icd/intel/compiler/shader/glcpp/.gitignore
new file mode 100644
index 0000000..24a7119
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/.gitignore
@@ -0,0 +1,6 @@
+glcpp
+glcpp-lex.c
+glcpp-parse.output
+glcpp-parse.c
+glcpp-parse.h
+tests/*.out
diff --git a/icd/intel/compiler/shader/glcpp/README b/icd/intel/compiler/shader/glcpp/README
new file mode 100644
index 0000000..5e82d0d
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/README
@@ -0,0 +1,30 @@
+glcpp -- GLSL "C" preprocessor
+
+This is a simple preprocessor designed to meet the preprocessing
+needs of the GLSL language. The requirements for this preprocessor are
+specified in the GLSL 1.30 specification available from:
+
+http://www.opengl.org/registry/doc/GLSLangSpec.Full.1.30.10.pdf
+
+This specification is not precise on some semantics, (for example,
+#define and #if), defining these merely "as is standard for C++
+preprocessors". To fill in these details, I've been using a draft of
+the C99 standard as available from:
+
+http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
+
+Any downstream compiler accepting output from glcpp should be prepared
+to encounter and deal with the following preprocessor directives:
+
+	#line
+	#pragma
+	#extension
+
+All other directives will be handled according to the GLSL specification
+and will not appear in the output.
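+
+As a minimal illustration (assuming default expansion behavior), given
+the input:
+
+	#define FOO(x) (2*(x))
+	int y = FOO(3);
+
+glcpp emits a blank line in place of the #define directive, so that
+line numbers are preserved, followed by:
+
+	int y = (2*(3));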
+
+Known limitations
+-----------------
+A file that ends with a function-like macro name as the last
+non-whitespace token will result in a parse error, (where it should be
+passed through as is).
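+
+For example, a file whose final non-whitespace token is "FOO", where
+FOO was defined as a function-like macro, hits this case: the
+preprocessor is still waiting for a "(" that never arrives.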
diff --git a/icd/intel/compiler/shader/glcpp/glcpp-lex.l b/icd/intel/compiler/shader/glcpp/glcpp-lex.l
new file mode 100644
index 0000000..188e454
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/glcpp-lex.l
@@ -0,0 +1,379 @@
+%{
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <ctype.h>
+
+#include "glcpp.h"
+#include "glcpp-parse.h"
+
+/* Flex annoyingly generates some functions without making them
+ * static. Let's declare them here. */
+int glcpp_get_column  (yyscan_t yyscanner);
+void glcpp_set_column (int column_no, yyscan_t yyscanner);
+
+#ifdef _MSC_VER
+#define YY_NO_UNISTD_H
+#endif
+
+#define YY_NO_INPUT
+
+#define YY_USER_ACTION							\
+	do {								\
+		if (parser->has_new_line_number)			\
+			yylineno = parser->new_line_number;		\
+		if (parser->has_new_source_number)			\
+			yylloc->source = parser->new_source_number;	\
+		yylloc->first_column = yycolumn + 1;			\
+		yylloc->first_line = yylloc->last_line = yylineno;	\
+		yycolumn += yyleng;					\
+		yylloc->last_column = yycolumn + 1;			\
+		parser->has_new_line_number = 0;			\
+		parser->has_new_source_number = 0;			\
+	} while(0);
+
+#define YY_USER_INIT			\
+	do {				\
+		yylineno = 1;		\
+		yycolumn = 1;		\
+		yylloc->source = 0;	\
+	} while(0)
+%}
+
+%option bison-bridge bison-locations reentrant noyywrap
+%option extra-type="glcpp_parser_t *"
+%option prefix="glcpp_"
+%option stack
+%option never-interactive
+
+%x DONE COMMENT UNREACHABLE SKIP DEFINE NEWLINE_CATCHUP
+
+SPACE		[[:space:]]
+NONSPACE	[^[:space:]]
+NEWLINE		[\n]
+HSPACE		[ \t]
+HASH		^{HSPACE}*#{HSPACE}*
+IDENTIFIER	[_a-zA-Z][_a-zA-Z0-9]*
+PUNCTUATION	[][(){}.&*~!/%<>^|;,=+-]
+
+/* The OTHER class is simply a catch-all for things that the CPP
+parser just doesn't care about. Since flex regular expressions that
+match longer strings take priority over those matching shorter
+strings, we have to be careful to avoid OTHER matching and hiding
+something that CPP does care about. So we simply exclude all
+characters that appear in any other expressions. */
+
+OTHER		[^][_#[:space:]#a-zA-Z0-9(){}.&*~!/%<>^|;,=+-]
+
+DIGITS			[0-9][0-9]*
+DECIMAL_INTEGER		[1-9][0-9]*[uU]?
+OCTAL_INTEGER		0[0-7]*[uU]?
+HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
+
+%%
+
+	glcpp_parser_t *parser = yyextra;
+
+	/* When we lex a multi-line comment, we replace it (as
+	 * specified) with a single space. But if the comment spanned
+	 * multiple lines, then subsequent parsing stages will not
+	 * count correct line numbers. To avoid this problem we keep
+	 * track of all newlines that were commented out by a
+	 * multi-line comment, and we emit a NEWLINE token for each at
+	 * the next legal opportunity, (which is when the lexer would
+	 * be emitting a NEWLINE token anyway).
+	 */
+	if (YY_START == NEWLINE_CATCHUP) {
+		if (parser->commented_newlines)
+			parser->commented_newlines--;
+		if (parser->commented_newlines == 0)
+			BEGIN INITIAL;
+		return NEWLINE;
+	}
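+
+	/* Illustration (not part of the lexer logic): a block comment
+	 * that swallows two newlines is handed to the parser as a
+	 * single SPACE token with commented_newlines == 2; the
+	 * NEWLINE_CATCHUP handling above then returns one extra NEWLINE
+	 * per swallowed newline once the next real newline is seen, so
+	 * later line numbers stay accurate.
+	 */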
+
+	/* Switching between the SKIP and INITIAL start states requires
+	 * some special handling. Typically, a lexer would change
+	 * start states with statements like "BEGIN SKIP" within the
+	 * lexer rules. We can't get away with that here, since we
+	 * need the parser to actually evaluate expressions for
+	 * directives like "#if".
+	 *
+	 * So, here, in code that will be executed on every call to
+	 * the lexer, and before any rules, we examine the skip_stack
+	 * as set by the parser to know whether to change from INITIAL
+	 * to SKIP or from SKIP back to INITIAL.
+	 *
+	 * Three cases cause us to switch out of the SKIP state and
+	 * back to the INITIAL state:
+	 *
+	 *	1. The top of the skip_stack is of type SKIP_NO_SKIP
+	 *	   This means we're still evaluating some #if
+	 *	   hierarchy, but we're on a branch of it where
+	 *	   content should not be skipped (such as "#if 1" or
+	 *	   "#else" or so).
+	 *
+	 *	2. The skip_stack is NULL meaning that we've reached
+	 *	   the last #endif.
+	 *
+	 *	3. The lexing_if bit is set. This indicates that we
+	 *	   are lexing the expression following an "#if" or
+	 *	   "#elif". Even inside an "#if 0" we need to lex this
+	 *	   expression so the parser can correctly update the
+	 *	   skip_stack state.
+	 */
+	if (YY_START == INITIAL || YY_START == SKIP) {
+		if (parser->lexing_if ||
+		    parser->skip_stack == NULL ||
+		    parser->skip_stack->type == SKIP_NO_SKIP)
+		{
+			BEGIN INITIAL;
+		} else {
+			BEGIN SKIP;
+		}
+	}
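+
+	/* Illustration: after "#if 0" the parser pushes a skip_stack
+	 * entry of type SKIP_TO_ELSE, so ordinary text lines match the
+	 * <SKIP> rules below and are discarded; when a later "#elif"
+	 * sets lexing_if, the check above flips back to INITIAL just
+	 * long enough to lex the #elif expression.
+	 */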
+
+	/* Single-line comments */
+"//"[^\n]* {
+}
+
+	/* Multi-line comments */
+"/*"                    { yy_push_state(COMMENT, yyscanner); }
+<COMMENT>[^*\n]*
+<COMMENT>[^*\n]*\n      { yylineno++; yycolumn = 0; parser->commented_newlines++; }
+<COMMENT>"*"+[^*/\n]*
+<COMMENT>"*"+[^*/\n]*\n { yylineno++; yycolumn = 0; parser->commented_newlines++; }
+<COMMENT>"*"+"/"        {
+	yy_pop_state(yyscanner);
+	if (yyextra->space_tokens)
+		return SPACE;
+}
+
+{HASH}version{HSPACE}+ {
+	yylval->str = ralloc_strdup (yyextra, yytext);
+	yyextra->space_tokens = 0;
+	return HASH_VERSION;
+}
+
+	/* glcpp doesn't further interpret #extension or #pragma directives.
+	 * Simply pass them through to the main compiler's lexer/parser. */
+{HASH}(extension|pragma)[^\n]+ {
+	if (parser->commented_newlines)
+		BEGIN NEWLINE_CATCHUP;
+	yylval->str = ralloc_strdup (yyextra, yytext);
+	yylineno++;
+	yycolumn = 0;
+	return OTHER;
+}
+
+{HASH}line{HSPACE}+ {
+	return HASH_LINE;
+}
+
+<SKIP,INITIAL>{
+{HASH}ifdef {
+	yyextra->lexing_if = 1;
+	yyextra->space_tokens = 0;
+	return HASH_IFDEF;
+}
+
+{HASH}ifndef {
+	yyextra->lexing_if = 1;
+	yyextra->space_tokens = 0;
+	return HASH_IFNDEF;
+}
+
+{HASH}if/[^_a-zA-Z0-9] {
+	yyextra->lexing_if = 1;
+	yyextra->space_tokens = 0;
+	return HASH_IF;
+}
+
+{HASH}elif/[^_a-zA-Z0-9] {
+	yyextra->lexing_if = 1;
+	yyextra->space_tokens = 0;
+	return HASH_ELIF;
+}
+
+{HASH}else {
+	yyextra->space_tokens = 0;
+	return HASH_ELSE;
+}
+
+{HASH}endif {
+	yyextra->space_tokens = 0;
+	return HASH_ENDIF;
+}
+}
+
+<SKIP>[^\n] {
+	if (parser->commented_newlines)
+		BEGIN NEWLINE_CATCHUP;
+}
+
+{HASH}error.* {
+	char *p;
+	for (p = yytext; !isalpha(p[0]); p++); /* skip "  #   " */
+	p += 5; /* skip "error" */
+	glcpp_error(yylloc, yyextra, "#error%s", p);
+}
+
+{HASH}define{HSPACE}+ {
+	yyextra->space_tokens = 0;
+	yy_push_state(DEFINE, yyscanner);
+	return HASH_DEFINE;
+}
+
+<DEFINE>{IDENTIFIER}/"(" {
+	yy_pop_state(yyscanner);
+	yylval->str = ralloc_strdup (yyextra, yytext);
+	return FUNC_IDENTIFIER;
+}
+
+<DEFINE>{IDENTIFIER} {
+	yy_pop_state(yyscanner);
+	yylval->str = ralloc_strdup (yyextra, yytext);
+	return OBJ_IDENTIFIER;
+}
+
+{HASH}undef {
+	yyextra->space_tokens = 0;
+	return HASH_UNDEF;
+}
+
+{HASH} {
+	yyextra->space_tokens = 0;
+	return HASH;
+}
+
+{DECIMAL_INTEGER} {
+	yylval->str = ralloc_strdup (yyextra, yytext);
+	return INTEGER_STRING;
+}
+
+{OCTAL_INTEGER} {
+	yylval->str = ralloc_strdup (yyextra, yytext);
+	return INTEGER_STRING;
+}
+
+{HEXADECIMAL_INTEGER} {
+	yylval->str = ralloc_strdup (yyextra, yytext);
+	return INTEGER_STRING;
+}
+
+"<<"  {
+	return LEFT_SHIFT;
+}
+
+">>" {
+	return RIGHT_SHIFT;
+}
+
+"<=" {
+	return LESS_OR_EQUAL;
+}
+
+">=" {
+	return GREATER_OR_EQUAL;
+}
+
+"==" {
+	return EQUAL;
+}
+
+"!=" {
+	return NOT_EQUAL;
+}
+
+"&&" {
+	return AND;
+}
+
+"||" {
+	return OR;
+}
+
+"##" {
+	if (parser->is_gles)
+		glcpp_error(yylloc, yyextra, "Token pasting (##) is illegal in GLES");
+	return PASTE;
+}
+
+"defined" {
+	return DEFINED;
+}
+
+{IDENTIFIER} {
+	yylval->str = ralloc_strdup (yyextra, yytext);
+	return IDENTIFIER;
+}
+
+{PUNCTUATION} {
+	return yytext[0];
+}
+
+{OTHER}+ {
+	yylval->str = ralloc_strdup (yyextra, yytext);
+	return OTHER;
+}
+
+{HSPACE} {
+	if (yyextra->space_tokens) {
+		return SPACE;
+	}
+}
+
+<SKIP,INITIAL>\n {
+	if (parser->commented_newlines) {
+		BEGIN NEWLINE_CATCHUP;
+	}
+	yyextra->lexing_if = 0;
+	yylineno++;
+	yycolumn = 0;
+	return NEWLINE;
+}
+
+	/* Handle missing newline at EOF. */
+<INITIAL><<EOF>> {
+	BEGIN DONE; /* Don't keep matching this rule forever. */
+	yyextra->lexing_if = 0;
+	return NEWLINE;
+}
+
+	/* We don't actually use the UNREACHABLE start condition. We
+	only have this action here so that we can pretend to call some
+	generated functions, (to avoid "defined but not used"
+	warnings). */
+<UNREACHABLE>. {
+	unput('.');
+	yy_top_state(yyextra);
+}
+
+%%
+
+void
+glcpp_lex_set_source_string(glcpp_parser_t *parser, const char *shader)
+{
+	yy_scan_string(shader, parser->scanner);
+}
diff --git a/icd/intel/compiler/shader/glcpp/glcpp-parse.y b/icd/intel/compiler/shader/glcpp/glcpp-parse.y
new file mode 100644
index 0000000..2be499a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/glcpp-parse.y
@@ -0,0 +1,2192 @@
+%{
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <assert.h>
+#include <inttypes.h>
+
+#include "glcpp.h"
+#include "main/core.h" /* for struct gl_extensions */
+#include "main/mtypes.h" /* for gl_api enum */
+
+static void
+yyerror (YYLTYPE *locp, glcpp_parser_t *parser, const char *error);
+
+static void
+_define_object_macro (glcpp_parser_t *parser,
+		      YYLTYPE *loc,
+		      const char *macro,
+		      token_list_t *replacements);
+
+static void
+_define_function_macro (glcpp_parser_t *parser,
+			YYLTYPE *loc,
+			const char *macro,
+			string_list_t *parameters,
+			token_list_t *replacements);
+
+static string_list_t *
+_string_list_create (void *ctx);
+
+static void
+_string_list_append_item (string_list_t *list, const char *str);
+
+static int
+_string_list_contains (string_list_t *list, const char *member, int *index);
+
+static int
+_string_list_length (string_list_t *list);
+
+static int
+_string_list_equal (string_list_t *a, string_list_t *b);
+
+static argument_list_t *
+_argument_list_create (void *ctx);
+
+static void
+_argument_list_append (argument_list_t *list, token_list_t *argument);
+
+static int
+_argument_list_length (argument_list_t *list);
+
+static token_list_t *
+_argument_list_member_at (argument_list_t *list, int index);
+
+/* Note: This function ralloc_steal()s the str pointer. */
+static token_t *
+_token_create_str (void *ctx, int type, char *str);
+
+static token_t *
+_token_create_ival (void *ctx, int type, int ival);
+
+static token_list_t *
+_token_list_create (void *ctx);
+
+static void
+_token_list_append (token_list_t *list, token_t *token);
+
+static void
+_token_list_append_list (token_list_t *list, token_list_t *tail);
+
+static int
+_token_list_equal_ignoring_space (token_list_t *a, token_list_t *b);
+
+static void
+_parser_active_list_push (glcpp_parser_t *parser,
+			  const char *identifier,
+			  token_node_t *marker);
+
+static void
+_parser_active_list_pop (glcpp_parser_t *parser);
+
+static int
+_parser_active_list_contains (glcpp_parser_t *parser, const char *identifier);
+
+/* Expand list, and begin lexing from the result (after first
+ * prefixing a token of type 'head_token_type').
+ */
+static void
+_glcpp_parser_expand_and_lex_from (glcpp_parser_t *parser,
+				   int head_token_type,
+				   token_list_t *list);
+
+/* Perform macro expansion in-place on the given list. */
+static void
+_glcpp_parser_expand_token_list (glcpp_parser_t *parser,
+				 token_list_t *list);
+
+static void
+_glcpp_parser_print_expanded_token_list (glcpp_parser_t *parser,
+					 token_list_t *list);
+
+static void
+_glcpp_parser_skip_stack_push_if (glcpp_parser_t *parser, YYLTYPE *loc,
+				  int condition);
+
+static void
+_glcpp_parser_skip_stack_change_if (glcpp_parser_t *parser, YYLTYPE *loc,
+				    const char *type, int condition);
+
+static void
+_glcpp_parser_skip_stack_pop (glcpp_parser_t *parser, YYLTYPE *loc);
+
+static void
+_glcpp_parser_handle_version_declaration(glcpp_parser_t *parser, intmax_t version,
+                                         const char *ident, bool explicitly_set);
+
+static int
+glcpp_parser_lex (YYSTYPE *yylval, YYLTYPE *yylloc, glcpp_parser_t *parser);
+
+static void
+glcpp_parser_lex_from (glcpp_parser_t *parser, token_list_t *list);
+
+static void
+add_builtin_define(glcpp_parser_t *parser, const char *name, int value);
+
+%}
+
+%pure-parser
+%error-verbose
+
+%locations
+%initial-action {
+	@$.first_line = 1;
+	@$.first_column = 1;
+	@$.last_line = 1;
+	@$.last_column = 1;
+	@$.source = 0;
+}
+
+%parse-param {glcpp_parser_t *parser}
+%lex-param {glcpp_parser_t *parser}
+
+%expect 0
+%token COMMA_FINAL DEFINED ELIF_EXPANDED HASH HASH_DEFINE FUNC_IDENTIFIER OBJ_IDENTIFIER HASH_ELIF HASH_ELSE HASH_ENDIF HASH_IF HASH_IFDEF HASH_IFNDEF HASH_LINE HASH_UNDEF HASH_VERSION IDENTIFIER IF_EXPANDED INTEGER INTEGER_STRING LINE_EXPANDED NEWLINE OTHER PLACEHOLDER SPACE
+%token PASTE
+%type <ival> expression INTEGER operator SPACE integer_constant
+%type <str> IDENTIFIER FUNC_IDENTIFIER OBJ_IDENTIFIER INTEGER_STRING OTHER
+%type <string_list> identifier_list
+%type <token> preprocessing_token conditional_token
+%type <token_list> pp_tokens replacement_list text_line conditional_tokens
+%left OR
+%left AND
+%left '|'
+%left '^'
+%left '&'
+%left EQUAL NOT_EQUAL
+%left '<' '>' LESS_OR_EQUAL GREATER_OR_EQUAL
+%left LEFT_SHIFT RIGHT_SHIFT
+%left '+' '-'
+%left '*' '/' '%'
+%right UNARY
+
+%%
+
+input:
+	/* empty */
+|	input line
+;
+
+line:
+	control_line {
+		ralloc_asprintf_rewrite_tail (&parser->output, &parser->output_length, "\n");
+	}
+|	HASH_LINE {
+		glcpp_parser_resolve_implicit_version(parser);
+	} pp_tokens NEWLINE {
+
+		if (parser->skip_stack == NULL ||
+		    parser->skip_stack->type == SKIP_NO_SKIP)
+		{
+			_glcpp_parser_expand_and_lex_from (parser,
+							   LINE_EXPANDED, $3);
+		}
+	}
+|	text_line {
+		_glcpp_parser_print_expanded_token_list (parser, $1);
+		ralloc_asprintf_rewrite_tail (&parser->output, &parser->output_length, "\n");
+		ralloc_free ($1);
+	}
+|	expanded_line
+|	HASH non_directive
+;
+
+expanded_line:
+	IF_EXPANDED expression NEWLINE {
+		_glcpp_parser_skip_stack_push_if (parser, & @1, $2);
+	}
+|	ELIF_EXPANDED expression NEWLINE {
+		_glcpp_parser_skip_stack_change_if (parser, & @1, "elif", $2);
+	}
+|	LINE_EXPANDED integer_constant NEWLINE {
+		parser->has_new_line_number = 1;
+		parser->new_line_number = $2;
+		ralloc_asprintf_rewrite_tail (&parser->output,
+					      &parser->output_length,
+					      "#line %" PRIiMAX "\n",
+					      $2);
+	}
+|	LINE_EXPANDED integer_constant integer_constant NEWLINE {
+		parser->has_new_line_number = 1;
+		parser->new_line_number = $2;
+		parser->has_new_source_number = 1;
+		parser->new_source_number = $3;
+		ralloc_asprintf_rewrite_tail (&parser->output,
+					      &parser->output_length,
+					      "#line %" PRIiMAX " %" PRIiMAX "\n",
+					      $2, $3);
+	}
+;
+
+define:
+	OBJ_IDENTIFIER replacement_list NEWLINE {
+		_define_object_macro (parser, & @1, $1, $2);
+	}
+|	FUNC_IDENTIFIER '(' ')' replacement_list NEWLINE {
+		_define_function_macro (parser, & @1, $1, NULL, $4);
+	}
+|	FUNC_IDENTIFIER '(' identifier_list ')' replacement_list NEWLINE {
+		_define_function_macro (parser, & @1, $1, $3, $5);
+	}
+;
+
+control_line:
+	HASH_DEFINE {
+		glcpp_parser_resolve_implicit_version(parser);
+	} define
+|	HASH_UNDEF {
+		glcpp_parser_resolve_implicit_version(parser);
+	} IDENTIFIER NEWLINE {
+		macro_t *macro = hash_table_find (parser->defines, $3);
+		if (macro) {
+			hash_table_remove (parser->defines, $3);
+			ralloc_free (macro);
+		}
+		ralloc_free ($3);
+	}
+|	HASH_IF {
+		glcpp_parser_resolve_implicit_version(parser);
+	} conditional_tokens NEWLINE {
+		/* Be careful to only evaluate the 'if' expression if
+		 * we are not skipping. When we are skipping, we
+		 * simply push a new 0-valued 'if' onto the skip
+		 * stack.
+		 *
+		 * This avoids generating diagnostics for invalid
+		 * expressions that are being skipped. */
+		if (parser->skip_stack == NULL ||
+		    parser->skip_stack->type == SKIP_NO_SKIP)
+		{
+			_glcpp_parser_expand_and_lex_from (parser,
+							   IF_EXPANDED, $3);
+		}
+		else
+		{
+			_glcpp_parser_skip_stack_push_if (parser, & @1, 0);
+			parser->skip_stack->type = SKIP_TO_ENDIF;
+		}
+	}
+|	HASH_IF NEWLINE {
+		/* #if without an expression is only an error if we
+		 * are not skipping. */
+		if (parser->skip_stack == NULL ||
+		    parser->skip_stack->type == SKIP_NO_SKIP)
+		{
+			glcpp_error(& @1, parser, "#if with no expression");
+		}
+		_glcpp_parser_skip_stack_push_if (parser, & @1, 0);
+	}
+|	HASH_IFDEF {
+		glcpp_parser_resolve_implicit_version(parser);
+	} IDENTIFIER junk NEWLINE {
+		macro_t *macro = hash_table_find (parser->defines, $3);
+		ralloc_free ($3);
+		_glcpp_parser_skip_stack_push_if (parser, & @1, macro != NULL);
+	}
+|	HASH_IFNDEF {
+		glcpp_parser_resolve_implicit_version(parser);
+	} IDENTIFIER junk NEWLINE {
+		macro_t *macro = hash_table_find (parser->defines, $3);
+		ralloc_free ($3);
+		_glcpp_parser_skip_stack_push_if (parser, & @2, macro == NULL);
+	}
+|	HASH_ELIF conditional_tokens NEWLINE {
+		/* Be careful to only evaluate the 'elif' expression
+		 * if we are not skipping. When we are skipping, we
+		 * simply change to a 0-valued 'elif' on the skip
+		 * stack.
+		 *
+		 * This avoids generating diagnostics for invalid
+		 * expressions that are being skipped. */
+		if (parser->skip_stack &&
+		    parser->skip_stack->type == SKIP_TO_ELSE)
+		{
+			_glcpp_parser_expand_and_lex_from (parser,
+							   ELIF_EXPANDED, $2);
+		}
+		else if (parser->skip_stack &&
+		    parser->skip_stack->has_else)
+		{
+			glcpp_error(& @1, parser, "#elif after #else");
+		}
+		else
+		{
+			_glcpp_parser_skip_stack_change_if (parser, & @1,
+							    "elif", 0);
+		}
+	}
+|	HASH_ELIF NEWLINE {
+		/* #elif without an expression is an error unless we
+		 * are skipping. */
+		if (parser->skip_stack &&
+		    parser->skip_stack->type == SKIP_TO_ELSE)
+		{
+			glcpp_error(& @1, parser, "#elif with no expression");
+		}
+		else if (parser->skip_stack &&
+		    parser->skip_stack->has_else)
+		{
+			glcpp_error(& @1, parser, "#elif after #else");
+		}
+		else
+		{
+			_glcpp_parser_skip_stack_change_if (parser, & @1,
+							    "elif", 0);
+			glcpp_warning(& @1, parser, "ignoring illegal #elif without expression");
+		}
+	}
+|	HASH_ELSE {
+		if (parser->skip_stack &&
+		    parser->skip_stack->has_else)
+		{
+			glcpp_error(& @1, parser, "multiple #else");
+		}
+		else
+		{
+			_glcpp_parser_skip_stack_change_if (parser, & @1, "else", 1);
+			if (parser->skip_stack)
+				parser->skip_stack->has_else = true;
+		}
+	} NEWLINE
+|	HASH_ENDIF {
+		_glcpp_parser_skip_stack_pop (parser, & @1);
+	} NEWLINE
+|	HASH_VERSION integer_constant NEWLINE {
+		if (parser->version_resolved) {
+			glcpp_error(& @1, parser, "#version must appear on the first line");
+		}
+		_glcpp_parser_handle_version_declaration(parser, $2, NULL, true);
+	}
+|	HASH_VERSION integer_constant IDENTIFIER NEWLINE {
+		if (parser->version_resolved) {
+			glcpp_error(& @1, parser, "#version must appear on the first line");
+		}
+		_glcpp_parser_handle_version_declaration(parser, $2, $3, true);
+	}
+|	HASH NEWLINE {
+		glcpp_parser_resolve_implicit_version(parser);
+	}
+;
+
+integer_constant:
+	INTEGER_STRING {
+		if (strlen ($1) >= 3 && strncmp ($1, "0x", 2) == 0) {
+			$$ = strtoll ($1 + 2, NULL, 16);
+		} else if ($1[0] == '0') {
+			$$ = strtoll ($1, NULL, 8);
+		} else {
+			$$ = strtoll ($1, NULL, 10);
+		}
+	}
+|	INTEGER {
+		$$ = $1;
+	}
+;
+
+expression:
+	integer_constant
+|	IDENTIFIER {
+		if (parser->is_gles)
+			glcpp_error(& @1, parser, "undefined macro %s in expression (illegal in GLES)", $1);
+		$$ = 0;
+	}
+|	expression OR expression {
+		$$ = $1 || $3;
+	}
+|	expression AND expression {
+		$$ = $1 && $3;
+	}
+|	expression '|' expression {
+		$$ = $1 | $3;
+	}
+|	expression '^' expression {
+		$$ = $1 ^ $3;
+	}
+|	expression '&' expression {
+		$$ = $1 & $3;
+	}
+|	expression NOT_EQUAL expression {
+		$$ = $1 != $3;
+	}
+|	expression EQUAL expression {
+		$$ = $1 == $3;
+	}
+|	expression GREATER_OR_EQUAL expression {
+		$$ = $1 >= $3;
+	}
+|	expression LESS_OR_EQUAL expression {
+		$$ = $1 <= $3;
+	}
+|	expression '>' expression {
+		$$ = $1 > $3;
+	}
+|	expression '<' expression {
+		$$ = $1 < $3;
+	}
+|	expression RIGHT_SHIFT expression {
+		$$ = $1 >> $3;
+	}
+|	expression LEFT_SHIFT expression {
+		$$ = $1 << $3;
+	}
+|	expression '-' expression {
+		$$ = $1 - $3;
+	}
+|	expression '+' expression {
+		$$ = $1 + $3;
+	}
+|	expression '%' expression {
+		if ($3 == 0) {
+			yyerror (& @1, parser,
+				 "zero modulus in preprocessor directive");
+		} else {
+			$$ = $1 % $3;
+		}
+	}
+|	expression '/' expression {
+		if ($3 == 0) {
+			yyerror (& @1, parser,
+				 "division by 0 in preprocessor directive");
+		} else {
+			$$ = $1 / $3;
+		}
+	}
+|	expression '*' expression {
+		$$ = $1 * $3;
+	}
+|	'!' expression %prec UNARY {
+		$$ = ! $2;
+	}
+|	'~' expression %prec UNARY {
+		$$ = ~ $2;
+	}
+|	'-' expression %prec UNARY {
+		$$ = - $2;
+	}
+|	'+' expression %prec UNARY {
+		$$ = + $2;
+	}
+|	'(' expression ')' {
+		$$ = $2;
+	}
+;
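+
+/* Illustrative note: via the IDENTIFIER rule above, an undefined macro
+ * evaluates to 0 on desktop GL, so "#if FOO + 1" with FOO undefined is
+ * taken as 0 + 1 == 1; under GLES the same input additionally raises
+ * an error. */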
+
+identifier_list:
+	IDENTIFIER {
+		$$ = _string_list_create (parser);
+		_string_list_append_item ($$, $1);
+		ralloc_steal ($$, $1);
+	}
+|	identifier_list ',' IDENTIFIER {
+		$$ = $1;
+		_string_list_append_item ($$, $3);
+		ralloc_steal ($$, $3);
+	}
+;
+
+text_line:
+	NEWLINE { $$ = NULL; }
+|	pp_tokens NEWLINE
+;
+
+non_directive:
+	pp_tokens NEWLINE {
+		yyerror (& @1, parser, "Invalid tokens after #");
+	}
+;
+
+replacement_list:
+	/* empty */ { $$ = NULL; }
+|	pp_tokens
+;
+
+junk:
+	/* empty */
+|	pp_tokens {
+		glcpp_warning(&@1, parser, "extra tokens at end of directive");
+	}
+;
+
+conditional_token:
+	/* Handle "defined" operator */
+	DEFINED IDENTIFIER {
+		int v = hash_table_find (parser->defines, $2) ? 1 : 0;
+		$$ = _token_create_ival (parser, INTEGER, v);
+	}
+|	DEFINED '(' IDENTIFIER ')' {
+		int v = hash_table_find (parser->defines, $3) ? 1 : 0;
+		$$ = _token_create_ival (parser, INTEGER, v);
+	}
+|	preprocessing_token
+;
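+
+/* Example: in "#if defined FOO" or "#if defined(FOO)", the two rules
+ * above replace the defined-operator phrase with an INTEGER token of
+ * 1 or 0 before the controlling expression is expanded and evaluated. */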
+
+conditional_tokens:
+	/* Exactly the same as pp_tokens, but using conditional_token */
+	conditional_token {
+		$$ = _token_list_create (parser);
+		_token_list_append ($$, $1);
+	}
+|	conditional_tokens conditional_token {
+		$$ = $1;
+		_token_list_append ($$, $2);
+	}
+;
+
+pp_tokens:
+	preprocessing_token {
+		parser->space_tokens = 1;
+		$$ = _token_list_create (parser);
+		_token_list_append ($$, $1);
+	}
+|	pp_tokens preprocessing_token {
+		$$ = $1;
+		_token_list_append ($$, $2);
+	}
+;
+
+preprocessing_token:
+	IDENTIFIER {
+		$$ = _token_create_str (parser, IDENTIFIER, $1);
+		$$->location = yylloc;
+	}
+|	INTEGER_STRING {
+		$$ = _token_create_str (parser, INTEGER_STRING, $1);
+		$$->location = yylloc;
+	}
+|	operator {
+		$$ = _token_create_ival (parser, $1, $1);
+		$$->location = yylloc;
+	}
+|	OTHER {
+		$$ = _token_create_str (parser, OTHER, $1);
+		$$->location = yylloc;
+	}
+|	SPACE {
+		$$ = _token_create_ival (parser, SPACE, SPACE);
+		$$->location = yylloc;
+	}
+;
+
+operator:
+	'['			{ $$ = '['; }
+|	']'			{ $$ = ']'; }
+|	'('			{ $$ = '('; }
+|	')'			{ $$ = ')'; }
+|	'{'			{ $$ = '{'; }
+|	'}'			{ $$ = '}'; }
+|	'.'			{ $$ = '.'; }
+|	'&'			{ $$ = '&'; }
+|	'*'			{ $$ = '*'; }
+|	'+'			{ $$ = '+'; }
+|	'-'			{ $$ = '-'; }
+|	'~'			{ $$ = '~'; }
+|	'!'			{ $$ = '!'; }
+|	'/'			{ $$ = '/'; }
+|	'%'			{ $$ = '%'; }
+|	LEFT_SHIFT		{ $$ = LEFT_SHIFT; }
+|	RIGHT_SHIFT		{ $$ = RIGHT_SHIFT; }
+|	'<'			{ $$ = '<'; }
+|	'>'			{ $$ = '>'; }
+|	LESS_OR_EQUAL		{ $$ = LESS_OR_EQUAL; }
+|	GREATER_OR_EQUAL	{ $$ = GREATER_OR_EQUAL; }
+|	EQUAL			{ $$ = EQUAL; }
+|	NOT_EQUAL		{ $$ = NOT_EQUAL; }
+|	'^'			{ $$ = '^'; }
+|	'|'			{ $$ = '|'; }
+|	AND			{ $$ = AND; }
+|	OR			{ $$ = OR; }
+|	';'			{ $$ = ';'; }
+|	','			{ $$ = ','; }
+|	'='			{ $$ = '='; }
+|	PASTE			{ $$ = PASTE; }
+;
+
+%%
+
+string_list_t *
+_string_list_create (void *ctx)
+{
+	string_list_t *list;
+
+	list = ralloc (ctx, string_list_t);
+	list->head = NULL;
+	list->tail = NULL;
+
+	return list;
+}
+
+void
+_string_list_append_item (string_list_t *list, const char *str)
+{
+	string_node_t *node;
+
+	node = ralloc (list, string_node_t);
+	node->str = ralloc_strdup (node, str);
+
+	node->next = NULL;
+
+	if (list->head == NULL) {
+		list->head = node;
+	} else {
+		list->tail->next = node;
+	}
+
+	list->tail = node;
+}
+
+int
+_string_list_contains (string_list_t *list, const char *member, int *index)
+{
+	string_node_t *node;
+	int i;
+
+	if (list == NULL)
+		return 0;
+
+	for (i = 0, node = list->head; node; i++, node = node->next) {
+		if (strcmp (node->str, member) == 0) {
+			if (index)
+				*index = i;
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
+int
+_string_list_length (string_list_t *list)
+{
+	int length = 0;
+	string_node_t *node;
+
+	if (list == NULL)
+		return 0;
+
+	for (node = list->head; node; node = node->next)
+		length++;
+
+	return length;
+}
+
+int
+_string_list_equal (string_list_t *a, string_list_t *b)
+{
+	string_node_t *node_a, *node_b;
+
+	if (a == NULL && b == NULL)
+		return 1;
+
+	if (a == NULL || b == NULL)
+		return 0;
+
+	for (node_a = a->head, node_b = b->head;
+	     node_a && node_b;
+	     node_a = node_a->next, node_b = node_b->next)
+	{
+		if (strcmp (node_a->str, node_b->str))
+			return 0;
+	}
+
+	/* Catch the case of lists being different lengths, (which
+	 * would cause the loop above to terminate after the shorter
+	 * list). */
+	return node_a == node_b;
+}
+
+argument_list_t *
+_argument_list_create (void *ctx)
+{
+	argument_list_t *list;
+
+	list = ralloc (ctx, argument_list_t);
+	list->head = NULL;
+	list->tail = NULL;
+
+	return list;
+}
+
+void
+_argument_list_append (argument_list_t *list, token_list_t *argument)
+{
+	argument_node_t *node;
+
+	node = ralloc (list, argument_node_t);
+	node->argument = argument;
+
+	node->next = NULL;
+
+	if (list->head == NULL) {
+		list->head = node;
+	} else {
+		list->tail->next = node;
+	}
+
+	list->tail = node;
+}
+
+int
+_argument_list_length (argument_list_t *list)
+{
+	int length = 0;
+	argument_node_t *node;
+
+	if (list == NULL)
+		return 0;
+
+	for (node = list->head; node; node = node->next)
+		length++;
+
+	return length;
+}
+
+token_list_t *
+_argument_list_member_at (argument_list_t *list, int index)
+{
+	argument_node_t *node;
+	int i;
+
+	if (list == NULL)
+		return NULL;
+
+	node = list->head;
+	for (i = 0; i < index; i++) {
+		node = node->next;
+		if (node == NULL)
+			break;
+	}
+
+	if (node)
+		return node->argument;
+
+	return NULL;
+}
+
+/* Note: This function ralloc_steal()s the str pointer. */
+token_t *
+_token_create_str (void *ctx, int type, char *str)
+{
+	token_t *token;
+
+	token = ralloc (ctx, token_t);
+	token->type = type;
+	token->value.str = str;
+
+	ralloc_steal (token, str);
+
+	return token;
+}
+
+token_t *
+_token_create_ival (void *ctx, int type, int ival)
+{
+	token_t *token;
+
+	token = ralloc (ctx, token_t);
+	token->type = type;
+	token->value.ival = ival;
+
+	return token;
+}
+
+token_list_t *
+_token_list_create (void *ctx)
+{
+	token_list_t *list;
+
+	list = ralloc (ctx, token_list_t);
+	list->head = NULL;
+	list->tail = NULL;
+	list->non_space_tail = NULL;
+
+	return list;
+}
+
+void
+_token_list_append (token_list_t *list, token_t *token)
+{
+	token_node_t *node;
+
+	node = ralloc (list, token_node_t);
+	node->token = token;
+	node->next = NULL;
+
+	if (list->head == NULL) {
+		list->head = node;
+	} else {
+		list->tail->next = node;
+	}
+
+	list->tail = node;
+	if (token->type != SPACE)
+		list->non_space_tail = node;
+}
+
+void
+_token_list_append_list (token_list_t *list, token_list_t *tail)
+{
+	if (tail == NULL || tail->head == NULL)
+		return;
+
+	if (list->head == NULL) {
+		list->head = tail->head;
+	} else {
+		list->tail->next = tail->head;
+	}
+
+	list->tail = tail->tail;
+	list->non_space_tail = tail->non_space_tail;
+}
+
+static token_list_t *
+_token_list_copy (void *ctx, token_list_t *other)
+{
+	token_list_t *copy;
+	token_node_t *node;
+
+	if (other == NULL)
+		return NULL;
+
+	copy = _token_list_create (ctx);
+	for (node = other->head; node; node = node->next) {
+		token_t *new_token = ralloc (copy, token_t);
+		*new_token = *node->token;
+		_token_list_append (copy, new_token);
+	}
+
+	return copy;
+}
+
+static void
+_token_list_trim_trailing_space (token_list_t *list)
+{
+	token_node_t *tail, *next;
+
+	if (list->non_space_tail) {
+		tail = list->non_space_tail->next;
+		list->non_space_tail->next = NULL;
+		list->tail = list->non_space_tail;
+
+		while (tail) {
+			next = tail->next;
+			ralloc_free (tail);
+			tail = next;
+		}
+	}
+}
+
+static int
+_token_list_is_empty_ignoring_space (token_list_t *l)
+{
+	token_node_t *n;
+
+	if (l == NULL)
+		return 1;
+
+	n = l->head;
+	while (n != NULL && n->token->type == SPACE)
+		n = n->next;
+
+	return n == NULL;
+}
+
+int
+_token_list_equal_ignoring_space (token_list_t *a, token_list_t *b)
+{
+	token_node_t *node_a, *node_b;
+
+	if (a == NULL || b == NULL) {
+		int a_empty = _token_list_is_empty_ignoring_space(a);
+		int b_empty = _token_list_is_empty_ignoring_space(b);
+		return a_empty == b_empty;
+	}
+
+	node_a = a->head;
+	node_b = b->head;
+
+	while (1)
+	{
+		if (node_a == NULL && node_b == NULL)
+			break;
+
+		if (node_a == NULL || node_b == NULL)
+			return 0;
+
+		if (node_a->token->type == SPACE) {
+			node_a = node_a->next;
+			continue;
+		}
+
+		if (node_b->token->type == SPACE) {
+			node_b = node_b->next;
+			continue;
+		}
+
+		if (node_a->token->type != node_b->token->type)
+			return 0;
+
+		switch (node_a->token->type) {
+		case INTEGER:
+			if (node_a->token->value.ival !=
+			    node_b->token->value.ival)
+			{
+				return 0;
+			}
+			break;
+		case IDENTIFIER:
+		case INTEGER_STRING:
+		case OTHER:
+			if (strcmp (node_a->token->value.str,
+				    node_b->token->value.str))
+			{
+				return 0;
+			}
+			break;
+		}
+
+		node_a = node_a->next;
+		node_b = node_b->next;
+	}
+
+	return 1;
+}
+
+static void
+_token_print (char **out, size_t *len, token_t *token)
+{
+	if (token->type < 256) {
+		ralloc_asprintf_rewrite_tail (out, len, "%c", token->type);
+		return;
+	}
+
+	switch (token->type) {
+	case INTEGER:
+		ralloc_asprintf_rewrite_tail (out, len, "%" PRIiMAX, token->value.ival);
+		break;
+	case IDENTIFIER:
+	case INTEGER_STRING:
+	case OTHER:
+		ralloc_asprintf_rewrite_tail (out, len, "%s", token->value.str);
+		break;
+	case SPACE:
+		ralloc_asprintf_rewrite_tail (out, len, " ");
+		break;
+	case LEFT_SHIFT:
+		ralloc_asprintf_rewrite_tail (out, len, "<<");
+		break;
+	case RIGHT_SHIFT:
+		ralloc_asprintf_rewrite_tail (out, len, ">>");
+		break;
+	case LESS_OR_EQUAL:
+		ralloc_asprintf_rewrite_tail (out, len, "<=");
+		break;
+	case GREATER_OR_EQUAL:
+		ralloc_asprintf_rewrite_tail (out, len, ">=");
+		break;
+	case EQUAL:
+		ralloc_asprintf_rewrite_tail (out, len, "==");
+		break;
+	case NOT_EQUAL:
+		ralloc_asprintf_rewrite_tail (out, len, "!=");
+		break;
+	case AND:
+		ralloc_asprintf_rewrite_tail (out, len, "&&");
+		break;
+	case OR:
+		ralloc_asprintf_rewrite_tail (out, len, "||");
+		break;
+	case PASTE:
+		ralloc_asprintf_rewrite_tail (out, len, "##");
+		break;
+	case COMMA_FINAL:
+		ralloc_asprintf_rewrite_tail (out, len, ",");
+		break;
+	case PLACEHOLDER:
+		/* Nothing to print. */
+		break;
+	default:
+		assert(!"Error: Don't know how to print token.");
+		break;
+	}
+}
+
+/* Return a new token (ralloc()ed off of 'token') formed by pasting
+ * 'token' and 'other'. Note that this function may return 'token' or
+ * 'other' directly rather than allocating anything new.
+ *
+ * Caution: Only very cursory error-checking is performed to see if
+ * the final result is a valid single token. */
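+/* Examples (illustrative): '<' pasted with '<' yields LEFT_SHIFT; the
+ * identifiers "ab" and "cd" paste to the identifier "abcd"; the
+ * integer 12 pasted with 34 yields the INTEGER_STRING "1234"; and
+ * pasting 12 with "ab" fails, since the result would no longer be an
+ * integer. */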
+static token_t *
+_token_paste (glcpp_parser_t *parser, token_t *token, token_t *other)
+{
+	token_t *combined = NULL;
+
+	/* Pasting a placeholder onto anything makes no change. */
+	if (other->type == PLACEHOLDER)
+		return token;
+
+	/* When 'token' is a placeholder, just return 'other'. */
+	if (token->type == PLACEHOLDER)
+		return other;
+
+	/* A very few single-character punctuators can be combined
+	 * with another to form a multi-character punctuator. */
+	switch (token->type) {
+	case '<':
+		if (other->type == '<')
+			combined = _token_create_ival (token, LEFT_SHIFT, LEFT_SHIFT);
+		else if (other->type == '=')
+			combined = _token_create_ival (token, LESS_OR_EQUAL, LESS_OR_EQUAL);
+		break;
+	case '>':
+		if (other->type == '>')
+			combined = _token_create_ival (token, RIGHT_SHIFT, RIGHT_SHIFT);
+		else if (other->type == '=')
+			combined = _token_create_ival (token, GREATER_OR_EQUAL, GREATER_OR_EQUAL);
+		break;
+	case '=':
+		if (other->type == '=')
+			combined = _token_create_ival (token, EQUAL, EQUAL);
+		break;
+	case '!':
+		if (other->type == '=')
+			combined = _token_create_ival (token, NOT_EQUAL, NOT_EQUAL);
+		break;
+	case '&':
+		if (other->type == '&')
+			combined = _token_create_ival (token, AND, AND);
+		break;
+	case '|':
+		if (other->type == '|')
+			combined = _token_create_ival (token, OR, OR);
+		break;
+	}
+
+	if (combined != NULL) {
+		/* Inherit the location from the first token */
+		combined->location = token->location;
+		return combined;
+	}
+
+	/* Two string-valued (or integer) tokens can usually just be
+	 * mashed together. (We also handle a string followed by an
+	 * integer here as well.)
+	 *
+	 * There are some exceptions here. Notably, if the first token
+	 * is an integer (or a string representing an integer), then
+	 * the second token must also be an integer or must be a
+	 * string representing an integer that begins with a digit.
+	 */
+	if ((token->type == IDENTIFIER || token->type == OTHER || token->type == INTEGER_STRING || token->type == INTEGER) &&
+	    (other->type == IDENTIFIER || other->type == OTHER || other->type == INTEGER_STRING || other->type == INTEGER))
+	{
+		char *str;
+		int combined_type;
+
+		/* Check that pasting onto an integer doesn't create a
+		 * non-integer, (that is, only digits can be
+		 * pasted). */
+		if (token->type == INTEGER_STRING || token->type == INTEGER)
+		{
+			switch (other->type) {
+			case INTEGER_STRING:
+				if (other->value.str[0] < '0' ||
+				    other->value.str[0] > '9')
+					goto FAIL;
+				break;
+			case INTEGER:
+				if (other->value.ival < 0)
+					goto FAIL;
+				break;
+			default:
+				goto FAIL;
+			}
+		}
+
+		if (token->type == INTEGER)
+			str = ralloc_asprintf (token, "%" PRIiMAX,
+					       token->value.ival);
+		else
+			str = ralloc_strdup (token, token->value.str);
+
+		if (other->type == INTEGER)
+			ralloc_asprintf_append (&str, "%" PRIiMAX,
+						other->value.ival);
+		else
+			ralloc_strcat (&str, other->value.str);
+
+		/* New token is same type as original token, unless we
+		 * started with an integer, in which case we will be
+		 * creating an integer-string. */
+		combined_type = token->type;
+		if (combined_type == INTEGER)
+			combined_type = INTEGER_STRING;
+
+		combined = _token_create_str (token, combined_type, str);
+		combined->location = token->location;
+		return combined;
+	}
+
+    FAIL:
+	glcpp_error (&token->location, parser, "");
+	ralloc_asprintf_rewrite_tail (&parser->info_log, &parser->info_log_length, "Pasting \"");
+	_token_print (&parser->info_log, &parser->info_log_length, token);
+	ralloc_asprintf_rewrite_tail (&parser->info_log, &parser->info_log_length, "\" and \"");
+	_token_print (&parser->info_log, &parser->info_log_length, other);
+	ralloc_asprintf_rewrite_tail (&parser->info_log, &parser->info_log_length, "\" does not give a valid preprocessing token.\n");
+
+	return token;
+}
+
+static void
+_token_list_print (glcpp_parser_t *parser, token_list_t *list)
+{
+	token_node_t *node;
+
+	if (list == NULL)
+		return;
+
+	for (node = list->head; node; node = node->next)
+		_token_print (&parser->output, &parser->output_length, node->token);
+}
+
+void
+yyerror (YYLTYPE *locp, glcpp_parser_t *parser, const char *error)
+{
+	glcpp_error(locp, parser, "%s", error);
+}
+
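+/* Illustrative note: a builtin define added this way behaves exactly
+ * as if the shader source began with "#define <name> <value>". */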
+static void add_builtin_define(glcpp_parser_t *parser,
+			       const char *name, int value)
+{
+   token_t *tok;
+   token_list_t *list;
+
+   tok = _token_create_ival (parser, INTEGER, value);
+
+   list = _token_list_create(parser);
+   _token_list_append(list, tok);
+   _define_object_macro(parser, NULL, name, list);
+}
+
+glcpp_parser_t *
+glcpp_parser_create (const struct gl_extensions *extensions, gl_api api)
+{
+	glcpp_parser_t *parser;
+
+	parser = ralloc (NULL, glcpp_parser_t);
+
+	glcpp_lex_init_extra (parser, &parser->scanner);
+	parser->defines = hash_table_ctor (32, hash_table_string_hash,
+					   hash_table_string_compare);
+	parser->active = NULL;
+	parser->lexing_if = 0;
+	parser->space_tokens = 1;
+	parser->newline_as_space = 0;
+	parser->in_control_line = 0;
+	parser->paren_count = 0;
+	parser->commented_newlines = 0;
+
+	parser->skip_stack = NULL;
+
+	parser->lex_from_list = NULL;
+	parser->lex_from_node = NULL;
+
+	parser->output = ralloc_strdup(parser, "");
+	parser->output_length = 0;
+	parser->info_log = ralloc_strdup(parser, "");
+	parser->info_log_length = 0;
+	parser->error = 0;
+
+	parser->extensions = extensions;
+	parser->api = api;
+	parser->version_resolved = false;
+
+	parser->has_new_line_number = 0;
+	parser->new_line_number = 1;
+	parser->has_new_source_number = 0;
+	parser->new_source_number = 0;
+
+	return parser;
+}
+
+void
+glcpp_parser_destroy (glcpp_parser_t *parser)
+{
+	glcpp_lex_destroy (parser->scanner);
+	hash_table_dtor (parser->defines);
+	ralloc_free (parser);
+}
+
+typedef enum function_status
+{
+	FUNCTION_STATUS_SUCCESS,
+	FUNCTION_NOT_A_FUNCTION,
+	FUNCTION_UNBALANCED_PARENTHESES
+} function_status_t;
+
+/* Find a set of function-like macro arguments by looking for a
+ * balanced set of parentheses.
+ *
+ * When called, 'node' should be the macro-name token; the opening
+ * parenthesis, (possibly preceded by SPACE tokens), is expected to
+ * follow it. Upon successful return *last will be the last consumed
+ * node, (corresponding to the closing right parenthesis).
+ *
+ * Return values:
+ *
+ *   FUNCTION_STATUS_SUCCESS:
+ *
+ *	Successfully parsed a set of function arguments.
+ *
+ *   FUNCTION_NOT_A_FUNCTION:
+ *
+ *	Macro name not followed by a '('. This is not an error; it
+ *	simply means the macro name should be treated as a non-macro.
+ *
+ *   FUNCTION_UNBALANCED_PARENTHESES
+ *
+ *	Macro name is not followed by a balanced set of parentheses.
+ */
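+/* Example (illustrative): for the invocation "FOO(a, (b,c), )" the
+ * parsed argument list is { "a", "(b,c)", "" }: the comma nested in
+ * parentheses does not separate arguments, and the final, empty
+ * argument is represented by an empty token list. */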
+static function_status_t
+_arguments_parse (argument_list_t *arguments,
+		  token_node_t *node,
+		  token_node_t **last)
+{
+	token_list_t *argument;
+	int paren_count;
+
+	node = node->next;
+
+	/* Ignore whitespace before first parenthesis. */
+	while (node && node->token->type == SPACE)
+		node = node->next;
+
+	if (node == NULL || node->token->type != '(')
+		return FUNCTION_NOT_A_FUNCTION;
+
+	node = node->next;
+
+	argument = _token_list_create (arguments);
+	_argument_list_append (arguments, argument);
+
+	for (paren_count = 1; node; node = node->next) {
+		if (node->token->type == '(')
+		{
+			paren_count++;
+		}
+		else if (node->token->type == ')')
+		{
+			paren_count--;
+			if (paren_count == 0)
+				break;
+		}
+
+		if (node->token->type == ',' &&
+			 paren_count == 1)
+		{
+			_token_list_trim_trailing_space (argument);
+			argument = _token_list_create (arguments);
+			_argument_list_append (arguments, argument);
+		}
+		else {
+			if (argument->head == NULL) {
+				/* Don't treat initial whitespace as
+				 * part of the argument. */
+				if (node->token->type == SPACE)
+					continue;
+			}
+			_token_list_append (argument, node->token);
+		}
+	}
+
+	if (paren_count)
+		return FUNCTION_UNBALANCED_PARENTHESES;
+
+	*last = node;
+
+	return FUNCTION_STATUS_SUCCESS;
+}
+
+static token_list_t *
+_token_list_create_with_one_ival (void *ctx, int type, int ival)
+{
+	token_list_t *list;
+	token_t *node;
+
+	list = _token_list_create (ctx);
+	node = _token_create_ival (list, type, ival);
+	_token_list_append (list, node);
+
+	return list;
+}
+
+static token_list_t *
+_token_list_create_with_one_space (void *ctx)
+{
+	return _token_list_create_with_one_ival (ctx, SPACE, SPACE);
+}
+
+static token_list_t *
+_token_list_create_with_one_integer (void *ctx, int ival)
+{
+	return _token_list_create_with_one_ival (ctx, INTEGER, ival);
+}
+
+/* Perform macro expansion on 'list', placing the resulting tokens
+ * into a new list which is initialized with a first token of type
+ * 'head_token_type'. Then begin lexing from the resulting list,
+ * (return to the current lexing source when this list is exhausted).
+ */
+static void
+_glcpp_parser_expand_and_lex_from (glcpp_parser_t *parser,
+				   int head_token_type,
+				   token_list_t *list)
+{
+	token_list_t *expanded;
+	token_t *token;
+
+	expanded = _token_list_create (parser);
+	token = _token_create_ival (parser, head_token_type, head_token_type);
+	_token_list_append (expanded, token);
+	_glcpp_parser_expand_token_list (parser, list);
+	_token_list_append_list (expanded, list);
+	glcpp_parser_lex_from (parser, expanded);
+}
+
+static void
+_glcpp_parser_apply_pastes (glcpp_parser_t *parser, token_list_t *list)
+{
+	token_node_t *node;
+
+	node = list->head;
+	while (node)
+	{
+		token_node_t *next_non_space;
+
+		/* Look ahead for a PASTE token, skipping space. */
+		next_non_space = node->next;
+		while (next_non_space && next_non_space->token->type == SPACE)
+			next_non_space = next_non_space->next;
+
+		if (next_non_space == NULL)
+			break;
+
+		if (next_non_space->token->type != PASTE) {
+			node = next_non_space;
+			continue;
+		}
+
+		/* Now find the next non-space token after the PASTE. */
+		next_non_space = next_non_space->next;
+		while (next_non_space && next_non_space->token->type == SPACE)
+			next_non_space = next_non_space->next;
+
+		if (next_non_space == NULL) {
+			yyerror (&node->token->location, parser, "'##' cannot appear at either end of a macro expansion\n");
+			return;
+		}
+
+		node->token = _token_paste (parser, node->token, next_non_space->token);
+		node->next = next_non_space->next;
+		if (next_non_space == list->tail)
+			list->tail = node;
+	}
+
+	list->non_space_tail = list->tail;
+}
+
+/* This is a helper function that's essentially part of the
+ * implementation of _glcpp_parser_expand_node. It shouldn't be called
+ * except by that function.
+ *
+ * Compute the complete expansion of node, (which is a function-like
+ * macro), and subsequent nodes which are arguments.
+ *
+ * Returns NULL if node is a simple token with no expansion, (that is,
+ * although 'node' corresponds to an identifier defined as a
+ * function-like macro, it is not followed with a parenthesized
+ * argument list).
+ *
+ * Otherwise, returns the token list that results from the expansion
+ * and sets *last to the last node in the list that was consumed by
+ * the expansion, (that is, the node of the closing right
+ * parenthesis).
+ */
+static token_list_t *
+_glcpp_parser_expand_function (glcpp_parser_t *parser,
+			       token_node_t *node,
+			       token_node_t **last)
+{
+	macro_t *macro;
+	const char *identifier;
+	argument_list_t *arguments;
+	function_status_t status;
+	token_list_t *substituted;
+	int parameter_index;
+
+	identifier = node->token->value.str;
+
+	macro = hash_table_find (parser->defines, identifier);
+
+	assert (macro->is_function);
+
+	arguments = _argument_list_create (parser);
+	status = _arguments_parse (arguments, node, last);
+
+	switch (status) {
+	case FUNCTION_STATUS_SUCCESS:
+		break;
+	case FUNCTION_NOT_A_FUNCTION:
+		return NULL;
+	case FUNCTION_UNBALANCED_PARENTHESES:
+		glcpp_error (&node->token->location, parser, "Macro %s call has unbalanced parentheses\n", identifier);
+		return NULL;
+	}
+
+	/* Replace a macro defined as empty with a SPACE token. */
+	if (macro->replacements == NULL) {
+		ralloc_free (arguments);
+		return _token_list_create_with_one_space (parser);
+	}
+
+	if (! ((_argument_list_length (arguments) == 
+		_string_list_length (macro->parameters)) ||
+	       (_string_list_length (macro->parameters) == 0 &&
+		_argument_list_length (arguments) == 1 &&
+		arguments->head->argument->head == NULL)))
+	{
+		glcpp_error (&node->token->location, parser,
+			      "Error: macro %s invoked with %d arguments (expected %d)\n",
+			      identifier,
+			      _argument_list_length (arguments),
+			      _string_list_length (macro->parameters));
+		return NULL;
+	}
+
+	/* Perform argument substitution on the replacement list. */
+	substituted = _token_list_create (arguments);
+
+	for (node = macro->replacements->head; node; node = node->next)
+	{
+		if (node->token->type == IDENTIFIER &&
+		    _string_list_contains (macro->parameters,
+					   node->token->value.str,
+					   &parameter_index))
+		{
+			token_list_t *argument;
+			argument = _argument_list_member_at (arguments,
+							     parameter_index);
+			/* Before substituting, we expand the argument
+			 * tokens, or append a placeholder token for
+			 * an empty argument. */
+			if (argument->head) {
+				token_list_t *expanded_argument;
+				expanded_argument = _token_list_copy (parser,
+								      argument);
+				_glcpp_parser_expand_token_list (parser,
+								 expanded_argument);
+				_token_list_append_list (substituted,
+							 expanded_argument);
+			} else {
+				token_t *new_token;
+
+				new_token = _token_create_ival (substituted,
+								PLACEHOLDER,
+								PLACEHOLDER);
+				_token_list_append (substituted, new_token);
+			}
+		} else {
+			_token_list_append (substituted, node->token);
+		}
+	}
+
+	/* After argument substitution, and before further expansion
+	 * below, implement token pasting. */
+
+	_token_list_trim_trailing_space (substituted);
+
+	_glcpp_parser_apply_pastes (parser, substituted);
+
+	return substituted;
+}
+
+/* Compute the complete expansion of node, (and subsequent nodes after
+ * 'node' in the case that 'node' is a function-like macro and
+ * subsequent nodes are arguments).
+ *
+ * Returns NULL if node is a simple token with no expansion.
+ *
+ * Otherwise, returns the token list that results from the expansion
+ * and sets *last to the last node in the list that was consumed by
+ * the expansion. Specifically, *last will be set as follows:
+ *
+ *	As 'node' in the case of object-like macro expansion.
+ *
+ *	As the token of the closing right parenthesis in the case of
+ *	function-like macro expansion.
+ */
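+/* Example (illustrative): given "#define N 4", the node for "N"
+ * expands to the single-token list "4", while "__LINE__" expands to an
+ * INTEGER token holding the node's own line number, with no hash-table
+ * lookup involved. */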
+static token_list_t *
+_glcpp_parser_expand_node (glcpp_parser_t *parser,
+			   token_node_t *node,
+			   token_node_t **last)
+{
+	token_t *token = node->token;
+	const char *identifier;
+	macro_t *macro;
+
+	/* We only expand identifiers */
+	if (token->type != IDENTIFIER) {
+		/* We change any COMMA into a COMMA_FINAL to prevent
+		 * it being mistaken for an argument separator
+		 * later. */
+		if (token->type == ',') {
+			token->type = COMMA_FINAL;
+			token->value.ival = COMMA_FINAL;
+		}
+
+		return NULL;
+	}
+
+	*last = node;
+	identifier = token->value.str;
+
+	/* Special handling for __LINE__ and __FILE__, (not through
+	 * the hash table). */
+	if (strcmp(identifier, "__LINE__") == 0)
+		return _token_list_create_with_one_integer (parser, node->token->location.first_line);
+
+	if (strcmp(identifier, "__FILE__") == 0)
+		return _token_list_create_with_one_integer (parser, node->token->location.source);
+
+	/* Look up this identifier in the hash table. */
+	macro = hash_table_find (parser->defines, identifier);
+
+	/* Not a macro, so no expansion needed. */
+	if (macro == NULL)
+		return NULL;
+
+	/* Finally, don't expand this macro if we're already actively
+	 * expanding it, (to avoid infinite recursion). */
+	if (_parser_active_list_contains (parser, identifier)) {
+		/* We change the token type here from IDENTIFIER to
+		 * OTHER to prevent any future expansion of this
+		 * unexpanded token. */
+		char *str;
+		token_list_t *expansion;
+		token_t *final;
+
+		str = ralloc_strdup (parser, token->value.str);
+		final = _token_create_str (parser, OTHER, str);
+		expansion = _token_list_create (parser);
+		_token_list_append (expansion, final);
+		return expansion;
+	}
+
+	if (! macro->is_function)
+	{
+		token_list_t *replacement;
+
+		/* Replace a macro defined as empty with a SPACE token. */
+		if (macro->replacements == NULL)
+			return _token_list_create_with_one_space (parser);
+
+		replacement = _token_list_copy (parser, macro->replacements);
+		_glcpp_parser_apply_pastes (parser, replacement);
+		return replacement;
+	}
+
+	return _glcpp_parser_expand_function (parser, node, last);
+}
+
+/* Push a new identifier onto the parser's active list.
+ *
+ * Here, 'marker' is the token node that appears in the list after the
+ * expansion of 'identifier'. That is, when the list iterator begins
+ * examining 'marker', then it is time to pop this node from the
+ * active stack.
+ */
+static void
+_parser_active_list_push (glcpp_parser_t *parser,
+			  const char *identifier,
+			  token_node_t *marker)
+{
+	active_list_t *node;
+
+	node = ralloc (parser->active, active_list_t);
+	node->identifier = ralloc_strdup (node, identifier);
+	node->marker = marker;
+	node->next = parser->active;
+
+	parser->active = node;
+}
+
+static void
+_parser_active_list_pop (glcpp_parser_t *parser)
+{
+	active_list_t *node = parser->active;
+
+	if (node == NULL) {
+		parser->active = NULL;
+		return;
+	}
+
+	node = parser->active->next;
+	ralloc_free (parser->active);
+
+	parser->active = node;
+}
+
+static int
+_parser_active_list_contains (glcpp_parser_t *parser, const char *identifier)
+{
+	active_list_t *node;
+
+	if (parser->active == NULL)
+		return 0;
+
+	for (node = parser->active; node; node = node->next)
+		if (strcmp (node->identifier, identifier) == 0)
+			return 1;
+
+	return 0;
+}
+
+/* Walk over the token list replacing nodes with their expansion.
+ * Whenever nodes are expanded the walk continues over the new nodes,
+ * expanding further as necessary. The results are placed in 'list'
+ * itself.
+ */
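+/* Example (illustrative): with "#define foo bar" and "#define bar foo",
+ * expanding "foo" yields "bar", which yields "foo" again; at that
+ * point "foo" is still on the active list, so it is emitted as-is and
+ * the walk terminates instead of recursing forever. */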
+static void
+_glcpp_parser_expand_token_list (glcpp_parser_t *parser,
+				 token_list_t *list)
+{
+	token_node_t *node_prev;
+	token_node_t *node, *last = NULL;
+	token_list_t *expansion;
+	active_list_t *active_initial = parser->active;
+
+	if (list == NULL)
+		return;
+
+	_token_list_trim_trailing_space (list);
+
+	node_prev = NULL;
+	node = list->head;
+
+	while (node) {
+
+		while (parser->active && parser->active->marker == node)
+			_parser_active_list_pop (parser);
+
+		expansion = _glcpp_parser_expand_node (parser, node, &last);
+		if (expansion) {
+			token_node_t *n;
+
+			for (n = node; n != last->next; n = n->next)
+				while (parser->active &&
+				       parser->active->marker == n)
+				{
+					_parser_active_list_pop (parser);
+				}
+
+			_parser_active_list_push (parser,
+						  node->token->value.str,
+						  last->next);
+
+			/* Splice expansion into list, supporting a
+			 * simple deletion if the expansion is
+			 * empty. */
+			if (expansion->head) {
+				if (node_prev)
+					node_prev->next = expansion->head;
+				else
+					list->head = expansion->head;
+				expansion->tail->next = last->next;
+				if (last == list->tail)
+					list->tail = expansion->tail;
+			} else {
+				if (node_prev)
+					node_prev->next = last->next;
+				else
+					list->head = last->next;
+				if (last == list->tail)
+					list->tail = NULL;
+			}
+		} else {
+			node_prev = node;
+		}
+		node = node_prev ? node_prev->next : list->head;
+	}
+
+	/* Remove any lingering effects of this invocation on the
+	 * active list. That is, pop until the list looks like it did
+	 * at the beginning of this function. */
+	while (parser->active && parser->active != active_initial)
+		_parser_active_list_pop (parser);
+
+	list->non_space_tail = list->tail;
+}
+
+void
+_glcpp_parser_print_expanded_token_list (glcpp_parser_t *parser,
+					 token_list_t *list)
+{
+	if (list == NULL)
+		return;
+
+	_glcpp_parser_expand_token_list (parser, list);
+
+	_token_list_trim_trailing_space (list);
+
+	_token_list_print (parser, list);
+}
+
+static void
+_check_for_reserved_macro_name (glcpp_parser_t *parser, YYLTYPE *loc,
+				const char *identifier)
+{
+	/* Section 3.3 (Preprocessor) of the GLSL 1.30 spec (and later) and
+	 * the GLSL ES spec (all versions) say:
+	 *
+	 *     "All macro names containing two consecutive underscores ( __ )
+	 *     are reserved for future use as predefined macro names. All
+	 *     macro names prefixed with "GL_" ("GL" followed by a single
+	 *     underscore) are also reserved."
+	 *
+	 * The intention is that names containing __ are reserved for internal
+	 * use by the implementation, and names prefixed with GL_ are reserved
+	 * for use by Khronos.  Since every extension adds a name prefixed
+	 * with GL_ (i.e., the name of the extension), that should be an
+	 * error.  Names simply containing __ are dangerous to use, but should
+	 * be allowed.
+	 *
+	 * A future version of the GLSL specification will clarify this.
+	 */
+	if (strstr(identifier, "__")) {
+		glcpp_warning(loc, parser,
+			      "Macro names containing \"__\" are reserved "
+			      "for use by the implementation.\n");
+	}
+	if (strncmp(identifier, "GL_", 3) == 0) {
+		glcpp_error (loc, parser, "Macro names starting with \"GL_\" are reserved.\n");
+	}
+}
+
+static int
+_macro_equal (macro_t *a, macro_t *b)
+{
+	if (a->is_function != b->is_function)
+		return 0;
+
+	if (a->is_function) {
+		if (! _string_list_equal (a->parameters, b->parameters))
+			return 0;
+	}
+
+	return _token_list_equal_ignoring_space (a->replacements,
+						 b->replacements);
+}
+
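+/* Consequence (illustrative): re-defining a macro with a replacement
+ * list differing only in whitespace, (e.g. "#define FOO  1" after
+ * "#define FOO 1"), is accepted silently; any other redefinition is
+ * reported as an error below. */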
+void
+_define_object_macro (glcpp_parser_t *parser,
+		      YYLTYPE *loc,
+		      const char *identifier,
+		      token_list_t *replacements)
+{
+	macro_t *macro, *previous;
+
+	if (loc != NULL)
+		_check_for_reserved_macro_name(parser, loc, identifier);
+
+	macro = ralloc (parser, macro_t);
+
+	macro->is_function = 0;
+	macro->parameters = NULL;
+	macro->identifier = ralloc_strdup (macro, identifier);
+	macro->replacements = replacements;
+	ralloc_steal (macro, replacements);
+
+	previous = hash_table_find (parser->defines, identifier);
+	if (previous) {
+		if (_macro_equal (macro, previous)) {
+			ralloc_free (macro);
+			return;
+		}
+		glcpp_error (loc, parser, "Redefinition of macro %s\n",
+			     identifier);
+	}
+
+	hash_table_insert (parser->defines, macro, identifier);
+}
+
+void
+_define_function_macro (glcpp_parser_t *parser,
+			YYLTYPE *loc,
+			const char *identifier,
+			string_list_t *parameters,
+			token_list_t *replacements)
+{
+	macro_t *macro, *previous;
+
+	_check_for_reserved_macro_name(parser, loc, identifier);
+
+	macro = ralloc (parser, macro_t);
+	ralloc_steal (macro, parameters);
+	ralloc_steal (macro, replacements);
+
+	macro->is_function = 1;
+	macro->parameters = parameters;
+	macro->identifier = ralloc_strdup (macro, identifier);
+	macro->replacements = replacements;
+	previous = hash_table_find (parser->defines, identifier);
+	if (previous) {
+		if (_macro_equal (macro, previous)) {
+			ralloc_free (macro);
+			return;
+		}
+		glcpp_error (loc, parser, "Redefinition of macro %s\n",
+			     identifier);
+	}
+
+	hash_table_insert (parser->defines, macro, identifier);
+}
+
+static int
+glcpp_parser_lex (YYSTYPE *yylval, YYLTYPE *yylloc, glcpp_parser_t *parser)
+{
+	token_node_t *node;
+	int ret;
+
+	if (parser->lex_from_list == NULL) {
+		ret = glcpp_lex (yylval, yylloc, parser->scanner);
+
+		/* XXX: This ugly block of code exists for the sole
+		 * purpose of converting a NEWLINE token into a SPACE
+		 * token, but only in the case where we have seen a
+		 * function-like macro name, but have not yet seen its
+		 * closing parenthesis.
+		 *
+		 * There's perhaps a more compact way to do this with
+		 * mid-rule actions in the grammar.
+		 *
+		 * I'm definitely not pleased with the complexity of
+		 * this code here.
+		 */
+		if (parser->newline_as_space)
+		{
+			if (ret == '(') {
+				parser->paren_count++;
+			} else if (ret == ')') {
+				parser->paren_count--;
+				if (parser->paren_count == 0)
+					parser->newline_as_space = 0;
+			} else if (ret == NEWLINE) {
+				ret = SPACE;
+			} else if (ret != SPACE) {
+				if (parser->paren_count == 0)
+					parser->newline_as_space = 0;
+			}
+		}
+		else if (parser->in_control_line)
+		{
+			if (ret == NEWLINE)
+				parser->in_control_line = 0;
+		}
+		else if (ret == HASH_DEFINE ||
+			   ret == HASH_UNDEF || ret == HASH_IF ||
+			   ret == HASH_IFDEF || ret == HASH_IFNDEF ||
+			   ret == HASH_ELIF || ret == HASH_ELSE ||
+			   ret == HASH_ENDIF || ret == HASH)
+		{
+			parser->in_control_line = 1;
+		}
+		else if (ret == IDENTIFIER)
+		{
+			macro_t *macro;
+			macro = hash_table_find (parser->defines,
+						 yylval->str);
+			if (macro && macro->is_function) {
+				parser->newline_as_space = 1;
+				parser->paren_count = 0;
+			}
+		}
+
+		return ret;
+	}
+
+	node = parser->lex_from_node;
+
+	if (node == NULL) {
+		ralloc_free (parser->lex_from_list);
+		parser->lex_from_list = NULL;
+		return NEWLINE;
+	}
+
+	*yylval = node->token->value;
+	ret = node->token->type;
+
+	parser->lex_from_node = node->next;
+
+	return ret;
+}
+
+static void
+glcpp_parser_lex_from (glcpp_parser_t *parser, token_list_t *list)
+{
+	token_node_t *node;
+
+	assert (parser->lex_from_list == NULL);
+
+	/* Copy list, eliminating any space tokens. */
+	parser->lex_from_list = _token_list_create (parser);
+
+	for (node = list->head; node; node = node->next) {
+		if (node->token->type == SPACE)
+			continue;
+		_token_list_append (parser->lex_from_list, node->token);
+	}
+
+	ralloc_free (list);
+
+	parser->lex_from_node = parser->lex_from_list->head;
+
+	/* It's possible the list consisted of nothing but whitespace. */
+	if (parser->lex_from_node == NULL) {
+		ralloc_free (parser->lex_from_list);
+		parser->lex_from_list = NULL;
+	}
+}
+
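+/* Illustrative note: a newly pushed skip node is SKIP_TO_ELSE when the
+ * #if condition is false, but SKIP_TO_ENDIF whenever the enclosing
+ * region is already being skipped, so an inner #else can never
+ * re-enable output inside a skipped outer branch. */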
+static void
+_glcpp_parser_skip_stack_push_if (glcpp_parser_t *parser, YYLTYPE *loc,
+				  int condition)
+{
+	skip_type_t current = SKIP_NO_SKIP;
+	skip_node_t *node;
+
+	if (parser->skip_stack)
+		current = parser->skip_stack->type;
+
+	node = ralloc (parser, skip_node_t);
+	node->loc = *loc;
+
+	if (current == SKIP_NO_SKIP) {
+		if (condition)
+			node->type = SKIP_NO_SKIP;
+		else
+			node->type = SKIP_TO_ELSE;
+	} else {
+		node->type = SKIP_TO_ENDIF;
+	}
+
+	node->has_else = false;
+	node->next = parser->skip_stack;
+	parser->skip_stack = node;
+}
+
+static void
+_glcpp_parser_skip_stack_change_if (glcpp_parser_t *parser, YYLTYPE *loc,
+				    const char *type, int condition)
+{
+	if (parser->skip_stack == NULL) {
+		glcpp_error (loc, parser, "%s without #if\n", type);
+		return;
+	}
+
+	if (parser->skip_stack->type == SKIP_TO_ELSE) {
+		if (condition)
+			parser->skip_stack->type = SKIP_NO_SKIP;
+	} else {
+		parser->skip_stack->type = SKIP_TO_ENDIF;
+	}
+}
+
+static void
+_glcpp_parser_skip_stack_pop (glcpp_parser_t *parser, YYLTYPE *loc)
+{
+	skip_node_t *node;
+
+	if (parser->skip_stack == NULL) {
+		glcpp_error (loc, parser, "#endif without #if\n");
+		return;
+	}
+
+	node = parser->skip_stack;
+	parser->skip_stack = node->next;
+	ralloc_free (node);
+}
+
+static void
+_glcpp_parser_handle_version_declaration(glcpp_parser_t *parser, intmax_t version,
+                                         const char *es_identifier,
+                                         bool explicitly_set)
+{
+	const struct gl_extensions *extensions = parser->extensions;
+
+	if (parser->version_resolved)
+		return;
+
+	parser->version_resolved = true;
+
+	add_builtin_define (parser, "__VERSION__", version);
+
+	parser->is_gles = (version == 100) ||
+			   (es_identifier &&
+			    (strcmp(es_identifier, "es") == 0));
+
+	/* Add pre-defined macros. */
+	if (parser->is_gles) {
+	   add_builtin_define(parser, "GL_ES", 1);
+	   add_builtin_define(parser, "GL_EXT_separate_shader_objects", 1);
+
+	   if (extensions != NULL) {
+	      if (extensions->OES_EGL_image_external)
+	         add_builtin_define(parser, "GL_OES_EGL_image_external", 1);
+	   }
+	} else {
+	   add_builtin_define(parser, "GL_ARB_draw_buffers", 1);
+	   add_builtin_define(parser, "GL_ARB_separate_shader_objects", 1);
+	   add_builtin_define(parser, "GL_ARB_texture_rectangle", 1);
+	   add_builtin_define(parser, "GL_AMD_shader_trinary_minmax", 1);
+
+	   if (extensions != NULL) {
+	      if (extensions->EXT_texture_array)
+	         add_builtin_define(parser, "GL_EXT_texture_array", 1);
+
+	      if (extensions->ARB_arrays_of_arrays)
+	          add_builtin_define(parser, "GL_ARB_arrays_of_arrays", 1);
+
+	      if (extensions->ARB_fragment_coord_conventions)
+	         add_builtin_define(parser, "GL_ARB_fragment_coord_conventions",
+				    1);
+
+	      if (extensions->ARB_explicit_attrib_location)
+	         add_builtin_define(parser, "GL_ARB_explicit_attrib_location", 1);
+
+	      if (extensions->ARB_shader_texture_lod)
+	         add_builtin_define(parser, "GL_ARB_shader_texture_lod", 1);
+
+	      if (extensions->ARB_draw_instanced)
+	         add_builtin_define(parser, "GL_ARB_draw_instanced", 1);
+
+	      if (extensions->ARB_conservative_depth) {
+	         add_builtin_define(parser, "GL_AMD_conservative_depth", 1);
+	         add_builtin_define(parser, "GL_ARB_conservative_depth", 1);
+	      }
+
+	      if (extensions->ARB_shader_bit_encoding)
+	         add_builtin_define(parser, "GL_ARB_shader_bit_encoding", 1);
+
+	      if (extensions->ARB_uniform_buffer_object)
+	         add_builtin_define(parser, "GL_ARB_uniform_buffer_object", 1);
+
+	      if (extensions->ARB_texture_cube_map_array)
+	         add_builtin_define(parser, "GL_ARB_texture_cube_map_array", 1);
+
+	      if (extensions->ARB_shading_language_packing)
+	         add_builtin_define(parser, "GL_ARB_shading_language_packing", 1);
+
+	      if (extensions->ARB_texture_multisample)
+	         add_builtin_define(parser, "GL_ARB_texture_multisample", 1);
+
+	      if (extensions->ARB_texture_query_levels)
+	         add_builtin_define(parser, "GL_ARB_texture_query_levels", 1);
+
+	      if (extensions->ARB_texture_query_lod)
+	         add_builtin_define(parser, "GL_ARB_texture_query_lod", 1);
+
+	      if (extensions->ARB_gpu_shader5)
+	         add_builtin_define(parser, "GL_ARB_gpu_shader5", 1);
+
+	      if (extensions->AMD_vertex_shader_layer)
+	         add_builtin_define(parser, "GL_AMD_vertex_shader_layer", 1);
+
+	      if (extensions->ARB_shading_language_420pack)
+	         add_builtin_define(parser, "GL_ARB_shading_language_420pack", 1);
+
+	      if (extensions->ARB_sample_shading)
+	         add_builtin_define(parser, "GL_ARB_sample_shading", 1);
+
+	      if (extensions->ARB_texture_gather)
+	         add_builtin_define(parser, "GL_ARB_texture_gather", 1);
+
+	      if (extensions->ARB_shader_atomic_counters)
+	         add_builtin_define(parser, "GL_ARB_shader_atomic_counters", 1);
+
+	      if (extensions->ARB_viewport_array)
+	         add_builtin_define(parser, "GL_ARB_viewport_array", 1);
+
+	      if (extensions->ARB_compute_shader)
+	         add_builtin_define(parser, "GL_ARB_compute_shader", 1);
+
+	      if (extensions->ARB_shader_image_load_store)
+	         add_builtin_define(parser, "GL_ARB_shader_image_load_store", 1);
+	   }
+	}
+
+	if (extensions != NULL) {
+	   if (extensions->EXT_shader_integer_mix)
+	      add_builtin_define(parser, "GL_EXT_shader_integer_mix", 1);
+	}
+
+	if (version >= 150)
+		add_builtin_define(parser, "GL_core_profile", 1);
+
+	/* Currently, all ES2/ES3 implementations support highp in the
+	 * fragment shader, so we always define this macro in ES2/ES3.
+	 * If we ever get a driver that doesn't support highp, we'll
+	 * need to add a flag to the gl_context and check that here.
+	 */
+	if (version >= 130 || parser->is_gles)
+		add_builtin_define (parser, "GL_FRAGMENT_PRECISION_HIGH", 1);
+
+	if (explicitly_set) {
+	   ralloc_asprintf_rewrite_tail (&parser->output, &parser->output_length,
+					 "#version %" PRIiMAX "%s%s", version,
+					 es_identifier ? " " : "",
+					 es_identifier ? es_identifier : "");
+	}
+}
+
+/* GLSL version if no version is explicitly specified. */
+#define IMPLICIT_GLSL_VERSION 110
+
+/* GLSL ES version if no version is explicitly specified. */
+#define IMPLICIT_GLSL_ES_VERSION 100
+
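+/* Illustrative note: a shader with no "#version" line is preprocessed
+ * as version 110 on desktop GL and version 100 on OpenGL ES; because
+ * explicitly_set is false here, no "#version" directive is echoed to
+ * the output. */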
+void
+glcpp_parser_resolve_implicit_version(glcpp_parser_t *parser)
+{
+	int language_version = parser->api == API_OPENGLES2 ?
+			       IMPLICIT_GLSL_ES_VERSION :
+			       IMPLICIT_GLSL_VERSION;
+
+	_glcpp_parser_handle_version_declaration(parser, language_version,
+						 NULL, false);
+}
diff --git a/icd/intel/compiler/shader/glcpp/glcpp.c b/icd/intel/compiler/shader/glcpp/glcpp.c
new file mode 100644
index 0000000..07b1500
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/glcpp.c
@@ -0,0 +1,175 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <getopt.h>
+#include <limits.h>	/* for CHAR_MAX */
+
+#include "glcpp.h"
+#include "main/mtypes.h"
+#include "main/shaderobj.h"
+
+extern int glcpp_parser_debug;
+
+void
+_mesa_reference_shader(struct gl_context *ctx, struct gl_shader **ptr,
+                       struct gl_shader *sh)
+{
+   (void) ctx;
+   *ptr = sh;
+}
+
+/* Read from fp until EOF and return a string of everything read.
+ */
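+/* Illustrative note: the buffer starts at CHUNK+1 bytes and doubles
+ * whenever another CHUNK might not fit, so total copying stays linear
+ * in the file size. */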
+static char *
+load_text_fp (void *ctx, FILE *fp)
+{
+#define CHUNK 4096
+	char *text = NULL;
+	size_t text_size = 0;
+	size_t total_read = 0;
+	size_t bytes;
+
+	while (1) {
+		if (total_read + CHUNK + 1 > text_size) {
+			text_size = text_size ? text_size * 2 : CHUNK + 1;
+			text = reralloc_size (ctx, text, text_size);
+			if (text == NULL) {
+				fprintf (stderr, "Out of memory\n");
+				return NULL;
+			}
+		}
+		bytes = fread (text + total_read, 1, CHUNK, fp);
+		total_read += bytes;
+
+		if (bytes < CHUNK) {
+			break;
+		}
+	}
+
+	text[total_read] = '\0';
+
+	return text;
+}
+
+static char *
+load_text_file(void *ctx, const char *filename)
+{
+	char *text;
+	FILE *fp;
+
+	if (filename == NULL || strcmp (filename, "-") == 0)
+		return load_text_fp (ctx, stdin);
+
+	fp = fopen (filename, "r");
+	if (fp == NULL) {
+		fprintf (stderr, "Failed to open file %s: %s\n",
+			 filename, strerror (errno));
+		return NULL;
+	}
+
+	text = load_text_fp (ctx, fp);
+
+	fclose(fp);
+
+	return text;
+}
+
+/* Initialize only those things that glcpp cares about.
+ */
+static void
+init_fake_gl_context (struct gl_context *gl_ctx)
+{
+	gl_ctx->API = API_OPENGL_COMPAT;
+	gl_ctx->Const.DisableGLSLLineContinuations = false;
+}
+
+static void
+usage (void)
+{
+	fprintf (stderr,
+		 "Usage: glcpp [OPTIONS] [--] [<filename>]\n"
+		 "\n"
+		 "Pre-process the given filename (stdin if no filename given).\n"
+		 "The following options are supported:\n"
+		 "    --disable-line-continuations      Do not interpret lines ending with a\n"
+		 "                                      backslash ('\\') as a line continuation.\n");
+}
+
+enum {
+	DISABLE_LINE_CONTINUATIONS_OPT = CHAR_MAX + 1
+};
+
+static const struct option
+long_options[] = {
+	{"disable-line-continuations", no_argument, 0, DISABLE_LINE_CONTINUATIONS_OPT },
+	{0,                            0,           0, 0 }
+};
+
+int
+main (int argc, char *argv[])
+{
+	char *filename = NULL;
+	void *ctx = ralloc(NULL, void*);
+	char *info_log = ralloc_strdup(ctx, "");
+	const char *shader;
+	int ret;
+	struct gl_context gl_ctx;
+	int c;
+
+	init_fake_gl_context (&gl_ctx);
+
+	while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1) {
+		switch (c) {
+		case DISABLE_LINE_CONTINUATIONS_OPT:
+			gl_ctx.Const.DisableGLSLLineContinuations = true;
+			break;
+		default:
+			usage ();
+			exit (1);
+		}
+	}
+
+	if (optind + 1 < argc) {
+		fprintf (stderr, "Unexpected argument: %s\n", argv[optind+1]);
+		usage ();
+		exit (1);
+	}
+	if (optind < argc) {
+		filename = argv[optind];
+	}
+
+	shader = load_text_file (ctx, filename);
+	if (shader == NULL)
+	   return 1;
+
+	ret = glcpp_preprocess(ctx, &shader, &info_log, NULL, &gl_ctx);
+
+	printf("%s", shader);
+	fprintf(stderr, "%s", info_log);
+
+	ralloc_free(ctx);
+
+	return ret;
+}
diff --git a/icd/intel/compiler/shader/glcpp/glcpp.h b/icd/intel/compiler/shader/glcpp/glcpp.h
new file mode 100644
index 0000000..79ccb23
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/glcpp.h
@@ -0,0 +1,240 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef GLCPP_H
+#define GLCPP_H
+
+#include <stdint.h>
+#include <stdbool.h>
+
+#include "main/mtypes.h"
+
+#include "../ralloc.h"
+
+#include "program/hash_table.h"
+
+#define yyscan_t void*
+
+/* Some data types used for parser values. */
+
+typedef struct string_node {
+	const char *str;
+	struct string_node *next;
+} string_node_t;
+
+typedef struct string_list {
+	string_node_t *head;
+	string_node_t *tail;
+} string_list_t;
+
+typedef struct token token_t;
+typedef struct token_list token_list_t;
+
+typedef union YYSTYPE
+{
+	intmax_t ival;
+	char *str;
+	string_list_t *string_list;
+	token_t *token;
+	token_list_t *token_list;
+} YYSTYPE;
+
+# define YYSTYPE_IS_TRIVIAL 1
+# define YYSTYPE_IS_DECLARED 1
+
+typedef struct YYLTYPE {
+   int first_line;
+   int first_column;
+   int last_line;
+   int last_column;
+   unsigned source;
+} YYLTYPE;
+# define YYLTYPE_IS_DECLARED 1
+# define YYLTYPE_IS_TRIVIAL 1
+
+# define YYLLOC_DEFAULT(Current, Rhs, N)			\
+do {								\
+   if (N)							\
+   {								\
+      (Current).first_line   = YYRHSLOC(Rhs, 1).first_line;	\
+      (Current).first_column = YYRHSLOC(Rhs, 1).first_column;	\
+      (Current).last_line    = YYRHSLOC(Rhs, N).last_line;	\
+      (Current).last_column  = YYRHSLOC(Rhs, N).last_column;	\
+   }								\
+   else								\
+   {								\
+      (Current).first_line   = (Current).last_line =		\
+	 YYRHSLOC(Rhs, 0).last_line;				\
+      (Current).first_column = (Current).last_column =		\
+	 YYRHSLOC(Rhs, 0).last_column;				\
+   }								\
+   (Current).source = 0;					\
+} while (0)
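+/* Illustrative note: with this default, a rule's location spans from
+ * the first right-hand-side token's first_line/first_column through
+ * the last token's last_line/last_column, and the source-string index
+ * is simply reset to 0. */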
+
+struct token {
+	int type;
+	YYSTYPE value;
+	YYLTYPE location;
+};
+
+typedef struct token_node {
+	token_t *token;
+	struct token_node *next;
+} token_node_t;
+
+struct token_list {
+	token_node_t *head;
+	token_node_t *tail;
+	token_node_t *non_space_tail;
+};
+
+typedef struct argument_node {
+	token_list_t *argument;
+	struct argument_node *next;
+} argument_node_t;
+
+typedef struct argument_list {
+	argument_node_t *head;
+	argument_node_t *tail;
+} argument_list_t;
+
+typedef struct glcpp_parser glcpp_parser_t;
+
+typedef enum {
+	TOKEN_CLASS_IDENTIFIER,
+	TOKEN_CLASS_IDENTIFIER_FINALIZED,
+	TOKEN_CLASS_FUNC_MACRO,
+	TOKEN_CLASS_OBJ_MACRO
+} token_class_t;
+
+token_class_t
+glcpp_parser_classify_token (glcpp_parser_t *parser,
+			     const char *identifier,
+			     int *parameter_index);
+
+typedef struct {
+	int is_function;
+	string_list_t *parameters;
+	const char *identifier;
+	token_list_t *replacements;
+} macro_t;
+
+typedef struct expansion_node {
+	macro_t *macro;
+	token_node_t *replacements;
+	struct expansion_node *next;
+} expansion_node_t;
+
+typedef enum skip_type {
+	SKIP_NO_SKIP,
+	SKIP_TO_ELSE,
+	SKIP_TO_ENDIF
+} skip_type_t;
+
+typedef struct skip_node {
+	skip_type_t type;
+	bool has_else;
+	YYLTYPE loc; /* location of the initial #if/#elif/... */
+	struct skip_node *next;
+} skip_node_t;
+
+typedef struct active_list {
+	const char *identifier;
+	token_node_t *marker;
+	struct active_list *next;
+} active_list_t;
+
+struct glcpp_parser {
+	yyscan_t scanner;
+	struct hash_table *defines;
+	active_list_t *active;
+	int lexing_if;
+	int space_tokens;
+	int newline_as_space;
+	int in_control_line;
+	int paren_count;
+	int commented_newlines;
+	skip_node_t *skip_stack;
+	token_list_t *lex_from_list;
+	token_node_t *lex_from_node;
+	char *output;
+	char *info_log;
+	size_t output_length;
+	size_t info_log_length;
+	int error;
+	const struct gl_extensions *extensions;
+	gl_api api;
+	bool version_resolved;
+	bool has_new_line_number;
+	int new_line_number;
+	bool has_new_source_number;
+	int new_source_number;
+	bool is_gles;
+};
+
+struct gl_extensions;
+
+glcpp_parser_t *
+glcpp_parser_create (const struct gl_extensions *extensions, gl_api api);
+
+int
+glcpp_parser_parse (glcpp_parser_t *parser);
+
+void
+glcpp_parser_destroy (glcpp_parser_t *parser);
+
+void
+glcpp_parser_resolve_implicit_version(glcpp_parser_t *parser);
+
+int
+glcpp_preprocess(void *ralloc_ctx, const char **shader, char **info_log,
+	   const struct gl_extensions *extensions, struct gl_context *g_ctx);
+
+/* Functions for writing to the info log */
+
+void
+glcpp_error (YYLTYPE *locp, glcpp_parser_t *parser, const char *fmt, ...);
+
+void
+glcpp_warning (YYLTYPE *locp, glcpp_parser_t *parser, const char *fmt, ...);
+
+/* Generated by glcpp-lex.l to glcpp-lex.c */
+
+int
+glcpp_lex_init_extra (glcpp_parser_t *parser, yyscan_t* scanner);
+
+void
+glcpp_lex_set_source_string(glcpp_parser_t *parser, const char *shader);
+
+int
+glcpp_lex (YYSTYPE *lvalp, YYLTYPE *llocp, yyscan_t scanner);
+
+int
+glcpp_lex_destroy (yyscan_t scanner);
+
+/* Generated by glcpp-parse.y to glcpp-parse.c */
+
+int
+yyparse (glcpp_parser_t *parser);
+
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/pp.c b/icd/intel/compiler/shader/glcpp/pp.c
new file mode 100644
index 0000000..4a623f8
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/pp.c
@@ -0,0 +1,164 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <assert.h>
+#include <string.h>
+#include <ctype.h>
+#include "glcpp.h"
+#include "main/core.h" /* for isblank() on MSVC */
+
+void
+glcpp_error (YYLTYPE *locp, glcpp_parser_t *parser, const char *fmt, ...)
+{
+	va_list ap;
+
+	parser->error = 1;
+	ralloc_asprintf_rewrite_tail(&parser->info_log,
+				     &parser->info_log_length,
+				     "%u:%u(%u): "
+				     "preprocessor error: ",
+				     locp->source,
+				     locp->first_line,
+				     locp->first_column);
+	va_start(ap, fmt);
+	ralloc_vasprintf_rewrite_tail(&parser->info_log,
+				      &parser->info_log_length,
+				      fmt, ap);
+	va_end(ap);
+	ralloc_asprintf_rewrite_tail(&parser->info_log,
+				     &parser->info_log_length, "\n");
+}
+
+void
+glcpp_warning (YYLTYPE *locp, glcpp_parser_t *parser, const char *fmt, ...)
+{
+	va_list ap;
+
+	ralloc_asprintf_rewrite_tail(&parser->info_log,
+				     &parser->info_log_length,
+				     "%u:%u(%u): "
+				     "preprocessor warning: ",
+				     locp->source,
+				     locp->first_line,
+				     locp->first_column);
+	va_start(ap, fmt);
+	ralloc_vasprintf_rewrite_tail(&parser->info_log,
+				      &parser->info_log_length,
+				      fmt, ap);
+	va_end(ap);
+	ralloc_asprintf_rewrite_tail(&parser->info_log,
+				     &parser->info_log_length, "\n");
+}
+
+/* Remove any line continuation characters in the shader, (whether in
+ * preprocessing directives or in GLSL code).
+ */
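+/* Example (illustrative): the two physical lines "#define FOO \" and
+ * "1" are collapsed into the single logical line "#define FOO 1", and
+ * one extra newline is re-inserted at the next line break so that
+ * later line numbers are preserved. */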
+static char *
+remove_line_continuations(glcpp_parser_t *ctx, const char *shader)
+{
+	char *clean = ralloc_strdup(ctx, "");
+	const char *backslash, *newline, *search_start;
+	int collapsed_newlines = 0;
+
+	search_start = shader;
+
+	while (true) {
+		backslash = strchr(search_start, '\\');
+
+		/* If we have previously collapsed any line-continuations,
+		 * then we want to insert additional newlines at the next
+		 * occurrence of a newline character to avoid changing any
+		 * line numbers.
+		 */
+		if (collapsed_newlines) {
+			newline = strchr(search_start, '\n');
+			if (newline &&
+			    (backslash == NULL || newline < backslash))
+			{
+				ralloc_strncat(&clean, shader,
+					       newline - shader + 1);
+				while (collapsed_newlines) {
+					ralloc_strcat(&clean, "\n");
+					collapsed_newlines--;
+				}
+				shader = newline + 1;
+				search_start = shader;
+			}
+		}
+
+		/* Check for NULL before computing backslash + 1,
+		 * (forming a pointer one past NULL is undefined). */
+		if (backslash == NULL)
+			break;
+
+		search_start = backslash + 1;
+
+		/* At each line continuation, (backslash followed by a
+		 * newline), copy all preceding text to the output, then
+		 * advance the shader pointer to the character after the
+		 * newline.
+		 */
+		if (backslash[1] == '\n' ||
+		    (backslash[1] == '\r' && backslash[2] == '\n'))
+		{
+			collapsed_newlines++;
+			ralloc_strncat(&clean, shader, backslash - shader);
+			if (backslash[1] == '\n')
+				shader = backslash + 2;
+			else
+				shader = backslash + 3;
+			search_start = shader;
+		}
+	}
+
+	ralloc_strcat(&clean, shader);
+
+	return clean;
+}
+
+int
+glcpp_preprocess(void *ralloc_ctx, const char **shader, char **info_log,
+	   const struct gl_extensions *extensions, struct gl_context *gl_ctx)
+{
+	int errors;
+	glcpp_parser_t *parser = glcpp_parser_create (extensions, gl_ctx->API);
+
+	if (! gl_ctx->Const.DisableGLSLLineContinuations)
+		*shader = remove_line_continuations(parser, *shader);
+
+	glcpp_lex_set_source_string (parser, *shader);
+
+	glcpp_parser_parse (parser);
+
+	if (parser->skip_stack)
+		glcpp_error (&parser->skip_stack->loc, parser, "Unterminated #if\n");
+
+	glcpp_parser_resolve_implicit_version(parser);
+
+	ralloc_strcat(info_log, parser->info_log);
+
+	ralloc_steal(ralloc_ctx, parser->output);
+	*shader = parser->output;
+
+	errors = parser->error;
+	glcpp_parser_destroy (parser);
+	return errors;
+}
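+
+/* Sketch of a call site (hypothetical names: mem_ctx, source, ctx):
+ *
+ *     char *log = ralloc_strdup(mem_ctx, "");
+ *     const char *src = source;
+ *     if (glcpp_preprocess(mem_ctx, &src, &log, &ctx->Extensions, ctx))
+ *             fprintf(stderr, "%s", log);
+ */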
diff --git a/icd/intel/compiler/shader/glcpp/tests/000-content-with-spaces.c b/icd/intel/compiler/shader/glcpp/tests/000-content-with-spaces.c
new file mode 100644
index 0000000..1f2320e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/000-content-with-spaces.c
@@ -0,0 +1 @@
+   this is  four 	tokens  with spaces
diff --git a/icd/intel/compiler/shader/glcpp/tests/000-content-with-spaces.c.expected b/icd/intel/compiler/shader/glcpp/tests/000-content-with-spaces.c.expected
new file mode 100644
index 0000000..5e17ec9
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/000-content-with-spaces.c.expected
@@ -0,0 +1,2 @@
+   this is  four  tokens  with spaces
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/001-define.c b/icd/intel/compiler/shader/glcpp/tests/001-define.c
new file mode 100644
index 0000000..cbf2fee
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/001-define.c
@@ -0,0 +1,2 @@
+#define foo 1
+foo
diff --git a/icd/intel/compiler/shader/glcpp/tests/001-define.c.expected b/icd/intel/compiler/shader/glcpp/tests/001-define.c.expected
new file mode 100644
index 0000000..878fd15
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/001-define.c.expected
@@ -0,0 +1,3 @@
+
+1
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/002-define-chain.c b/icd/intel/compiler/shader/glcpp/tests/002-define-chain.c
new file mode 100644
index 0000000..87d75c6
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/002-define-chain.c
@@ -0,0 +1,3 @@
+#define foo 1
+#define bar foo
+bar
diff --git a/icd/intel/compiler/shader/glcpp/tests/002-define-chain.c.expected b/icd/intel/compiler/shader/glcpp/tests/002-define-chain.c.expected
new file mode 100644
index 0000000..43d484d
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/002-define-chain.c.expected
@@ -0,0 +1,4 @@
+
+
+1
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/003-define-chain-reverse.c b/icd/intel/compiler/shader/glcpp/tests/003-define-chain-reverse.c
new file mode 100644
index 0000000..a18b724
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/003-define-chain-reverse.c
@@ -0,0 +1,3 @@
+#define bar foo
+#define foo 1
+bar
diff --git a/icd/intel/compiler/shader/glcpp/tests/003-define-chain-reverse.c.expected b/icd/intel/compiler/shader/glcpp/tests/003-define-chain-reverse.c.expected
new file mode 100644
index 0000000..43d484d
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/003-define-chain-reverse.c.expected
@@ -0,0 +1,4 @@
+
+
+1
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/004-define-recursive.c b/icd/intel/compiler/shader/glcpp/tests/004-define-recursive.c
new file mode 100644
index 0000000..2ac56ea
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/004-define-recursive.c
@@ -0,0 +1,6 @@
+#define foo bar
+#define bar baz
+#define baz foo
+foo
+bar
+baz
diff --git a/icd/intel/compiler/shader/glcpp/tests/004-define-recursive.c.expected b/icd/intel/compiler/shader/glcpp/tests/004-define-recursive.c.expected
new file mode 100644
index 0000000..4d2698b
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/004-define-recursive.c.expected
@@ -0,0 +1,7 @@
+
+
+
+foo
+bar
+baz
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/005-define-composite-chain.c b/icd/intel/compiler/shader/glcpp/tests/005-define-composite-chain.c
new file mode 100644
index 0000000..f5521df
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/005-define-composite-chain.c
@@ -0,0 +1,3 @@
+#define foo 1
+#define bar a foo
+bar
diff --git a/icd/intel/compiler/shader/glcpp/tests/005-define-composite-chain.c.expected b/icd/intel/compiler/shader/glcpp/tests/005-define-composite-chain.c.expected
new file mode 100644
index 0000000..c67358f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/005-define-composite-chain.c.expected
@@ -0,0 +1,4 @@
+
+
+a 1
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/006-define-composite-chain-reverse.c b/icd/intel/compiler/shader/glcpp/tests/006-define-composite-chain-reverse.c
new file mode 100644
index 0000000..4bb91a1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/006-define-composite-chain-reverse.c
@@ -0,0 +1,3 @@
+#define bar a foo
+#define foo 1
+bar
diff --git a/icd/intel/compiler/shader/glcpp/tests/006-define-composite-chain-reverse.c.expected b/icd/intel/compiler/shader/glcpp/tests/006-define-composite-chain-reverse.c.expected
new file mode 100644
index 0000000..c67358f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/006-define-composite-chain-reverse.c.expected
@@ -0,0 +1,4 @@
+
+
+a 1
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/007-define-composite-recursive.c b/icd/intel/compiler/shader/glcpp/tests/007-define-composite-recursive.c
new file mode 100644
index 0000000..5784565
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/007-define-composite-recursive.c
@@ -0,0 +1,6 @@
+#define foo a bar
+#define bar b baz
+#define baz c foo
+foo
+bar
+baz
diff --git a/icd/intel/compiler/shader/glcpp/tests/007-define-composite-recursive.c.expected b/icd/intel/compiler/shader/glcpp/tests/007-define-composite-recursive.c.expected
new file mode 100644
index 0000000..30fe4dc
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/007-define-composite-recursive.c.expected
@@ -0,0 +1,7 @@
+
+
+
+a b c foo
+b c a bar
+c a b baz
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/008-define-empty.c b/icd/intel/compiler/shader/glcpp/tests/008-define-empty.c
new file mode 100644
index 0000000..b1bd17e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/008-define-empty.c
@@ -0,0 +1,2 @@
+#define foo
+foo
diff --git a/icd/intel/compiler/shader/glcpp/tests/008-define-empty.c.expected b/icd/intel/compiler/shader/glcpp/tests/008-define-empty.c.expected
new file mode 100644
index 0000000..c0f53d7
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/008-define-empty.c.expected
@@ -0,0 +1,3 @@
+
+ 
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/009-undef.c b/icd/intel/compiler/shader/glcpp/tests/009-undef.c
new file mode 100644
index 0000000..3fc1fb4
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/009-undef.c
@@ -0,0 +1,4 @@
+#define foo 1
+foo
+#undef foo
+foo
diff --git a/icd/intel/compiler/shader/glcpp/tests/009-undef.c.expected b/icd/intel/compiler/shader/glcpp/tests/009-undef.c.expected
new file mode 100644
index 0000000..03a7061
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/009-undef.c.expected
@@ -0,0 +1,5 @@
+
+1
+
+foo
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/010-undef-re-define.c b/icd/intel/compiler/shader/glcpp/tests/010-undef-re-define.c
new file mode 100644
index 0000000..32ff737
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/010-undef-re-define.c
@@ -0,0 +1,6 @@
+#define foo 1
+foo
+#undef foo
+foo
+#define foo 2
+foo
diff --git a/icd/intel/compiler/shader/glcpp/tests/010-undef-re-define.c.expected b/icd/intel/compiler/shader/glcpp/tests/010-undef-re-define.c.expected
new file mode 100644
index 0000000..f4f7efd
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/010-undef-re-define.c.expected
@@ -0,0 +1,7 @@
+
+1
+
+foo
+
+2
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/011-define-func-empty.c b/icd/intel/compiler/shader/glcpp/tests/011-define-func-empty.c
new file mode 100644
index 0000000..d9ce13c
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/011-define-func-empty.c
@@ -0,0 +1,2 @@
+#define foo()
+foo()
diff --git a/icd/intel/compiler/shader/glcpp/tests/011-define-func-empty.c.expected b/icd/intel/compiler/shader/glcpp/tests/011-define-func-empty.c.expected
new file mode 100644
index 0000000..c0f53d7
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/011-define-func-empty.c.expected
@@ -0,0 +1,3 @@
+
+ 
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/012-define-func-no-args.c b/icd/intel/compiler/shader/glcpp/tests/012-define-func-no-args.c
new file mode 100644
index 0000000..c2bb730
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/012-define-func-no-args.c
@@ -0,0 +1,2 @@
+#define foo() bar
+foo()
diff --git a/icd/intel/compiler/shader/glcpp/tests/012-define-func-no-args.c.expected b/icd/intel/compiler/shader/glcpp/tests/012-define-func-no-args.c.expected
new file mode 100644
index 0000000..0353767
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/012-define-func-no-args.c.expected
@@ -0,0 +1,3 @@
+
+bar
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/013-define-func-1-arg-unused.c b/icd/intel/compiler/shader/glcpp/tests/013-define-func-1-arg-unused.c
new file mode 100644
index 0000000..f78fb8b
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/013-define-func-1-arg-unused.c
@@ -0,0 +1,2 @@
+#define foo(x) 1
+foo(bar)
diff --git a/icd/intel/compiler/shader/glcpp/tests/013-define-func-1-arg-unused.c.expected b/icd/intel/compiler/shader/glcpp/tests/013-define-func-1-arg-unused.c.expected
new file mode 100644
index 0000000..878fd15
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/013-define-func-1-arg-unused.c.expected
@@ -0,0 +1,3 @@
+
+1
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/014-define-func-2-arg-unused.c b/icd/intel/compiler/shader/glcpp/tests/014-define-func-2-arg-unused.c
new file mode 100644
index 0000000..11feb26
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/014-define-func-2-arg-unused.c
@@ -0,0 +1,2 @@
+#define foo(x,y) 1
+foo(bar,baz)
diff --git a/icd/intel/compiler/shader/glcpp/tests/014-define-func-2-arg-unused.c.expected b/icd/intel/compiler/shader/glcpp/tests/014-define-func-2-arg-unused.c.expected
new file mode 100644
index 0000000..878fd15
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/014-define-func-2-arg-unused.c.expected
@@ -0,0 +1,3 @@
+
+1
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/015-define-object-with-parens.c b/icd/intel/compiler/shader/glcpp/tests/015-define-object-with-parens.c
new file mode 100644
index 0000000..558da9c
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/015-define-object-with-parens.c
@@ -0,0 +1,4 @@
+#define foo ()1
+foo()
+#define bar ()2
+bar()
diff --git a/icd/intel/compiler/shader/glcpp/tests/015-define-object-with-parens.c.expected b/icd/intel/compiler/shader/glcpp/tests/015-define-object-with-parens.c.expected
new file mode 100644
index 0000000..d6f8cb9
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/015-define-object-with-parens.c.expected
@@ -0,0 +1,5 @@
+
+()1()
+
+()2()
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/016-define-func-1-arg.c b/icd/intel/compiler/shader/glcpp/tests/016-define-func-1-arg.c
new file mode 100644
index 0000000..a2e2404
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/016-define-func-1-arg.c
@@ -0,0 +1,2 @@
+#define foo(x) ((x)+1)
+foo(bar)
diff --git a/icd/intel/compiler/shader/glcpp/tests/016-define-func-1-arg.c.expected b/icd/intel/compiler/shader/glcpp/tests/016-define-func-1-arg.c.expected
new file mode 100644
index 0000000..7f1828a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/016-define-func-1-arg.c.expected
@@ -0,0 +1,3 @@
+
+((bar)+1)
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/017-define-func-2-args.c b/icd/intel/compiler/shader/glcpp/tests/017-define-func-2-args.c
new file mode 100644
index 0000000..c725383
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/017-define-func-2-args.c
@@ -0,0 +1,2 @@
+#define foo(x,y) ((x)*(y))
+foo(bar,baz)
diff --git a/icd/intel/compiler/shader/glcpp/tests/017-define-func-2-args.c.expected b/icd/intel/compiler/shader/glcpp/tests/017-define-func-2-args.c.expected
new file mode 100644
index 0000000..9f341da
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/017-define-func-2-args.c.expected
@@ -0,0 +1,3 @@
+
+((bar)*(baz))
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/018-define-func-macro-as-parameter.c b/icd/intel/compiler/shader/glcpp/tests/018-define-func-macro-as-parameter.c
new file mode 100644
index 0000000..668130b
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/018-define-func-macro-as-parameter.c
@@ -0,0 +1,3 @@
+#define x 0
+#define foo(x) x
+foo(1)
diff --git a/icd/intel/compiler/shader/glcpp/tests/018-define-func-macro-as-parameter.c.expected b/icd/intel/compiler/shader/glcpp/tests/018-define-func-macro-as-parameter.c.expected
new file mode 100644
index 0000000..43d484d
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/018-define-func-macro-as-parameter.c.expected
@@ -0,0 +1,4 @@
+
+
+1
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/019-define-func-1-arg-multi.c b/icd/intel/compiler/shader/glcpp/tests/019-define-func-1-arg-multi.c
new file mode 100644
index 0000000..c4e62b2
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/019-define-func-1-arg-multi.c
@@ -0,0 +1,2 @@
+#define foo(x) (x)
+foo(this is more than one word)
diff --git a/icd/intel/compiler/shader/glcpp/tests/019-define-func-1-arg-multi.c.expected b/icd/intel/compiler/shader/glcpp/tests/019-define-func-1-arg-multi.c.expected
new file mode 100644
index 0000000..4314fc8
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/019-define-func-1-arg-multi.c.expected
@@ -0,0 +1,3 @@
+
+(this is more than one word)
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/020-define-func-2-arg-multi.c b/icd/intel/compiler/shader/glcpp/tests/020-define-func-2-arg-multi.c
new file mode 100644
index 0000000..3049ad1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/020-define-func-2-arg-multi.c
@@ -0,0 +1,2 @@
+#define foo(x,y) x,two fish,red fish,y
+foo(one fish, blue fish)
diff --git a/icd/intel/compiler/shader/glcpp/tests/020-define-func-2-arg-multi.c.expected b/icd/intel/compiler/shader/glcpp/tests/020-define-func-2-arg-multi.c.expected
new file mode 100644
index 0000000..5648e4f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/020-define-func-2-arg-multi.c.expected
@@ -0,0 +1,3 @@
+
+one fish,two fish,red fish,blue fish
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/021-define-func-compose.c b/icd/intel/compiler/shader/glcpp/tests/021-define-func-compose.c
new file mode 100644
index 0000000..21ddd0e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/021-define-func-compose.c
@@ -0,0 +1,3 @@
+#define bar(x) (1+(x))
+#define foo(y) (2*(y))
+foo(bar(3))
diff --git a/icd/intel/compiler/shader/glcpp/tests/021-define-func-compose.c.expected b/icd/intel/compiler/shader/glcpp/tests/021-define-func-compose.c.expected
new file mode 100644
index 0000000..1d62105
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/021-define-func-compose.c.expected
@@ -0,0 +1,4 @@
+
+
+(2*((1+(3))))
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/022-define-func-arg-with-parens.c b/icd/intel/compiler/shader/glcpp/tests/022-define-func-arg-with-parens.c
new file mode 100644
index 0000000..c20d73a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/022-define-func-arg-with-parens.c
@@ -0,0 +1,2 @@
+#define foo(x) (x)
+foo(argument(including parens)for the win)
diff --git a/icd/intel/compiler/shader/glcpp/tests/022-define-func-arg-with-parens.c.expected b/icd/intel/compiler/shader/glcpp/tests/022-define-func-arg-with-parens.c.expected
new file mode 100644
index 0000000..66c1658
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/022-define-func-arg-with-parens.c.expected
@@ -0,0 +1,3 @@
+
+(argument(including parens)for the win)
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/023-define-extra-whitespace.c b/icd/intel/compiler/shader/glcpp/tests/023-define-extra-whitespace.c
new file mode 100644
index 0000000..7ebfed6
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/023-define-extra-whitespace.c
@@ -0,0 +1,8 @@
+#define noargs() 1 
+# define onearg(foo) foo 
+ # define  twoargs( x , y ) x y 
+	#	define	threeargs(	a	,	b	,	c	) a b c 
+noargs ( ) 
+onearg ( 2 ) 
+twoargs ( 3 , 4 ) 
+threeargs ( 5 , 6 , 7 ) 
diff --git a/icd/intel/compiler/shader/glcpp/tests/023-define-extra-whitespace.c.expected b/icd/intel/compiler/shader/glcpp/tests/023-define-extra-whitespace.c.expected
new file mode 100644
index 0000000..573829c
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/023-define-extra-whitespace.c.expected
@@ -0,0 +1,9 @@
+
+
+
+
+1
+2
+3 4
+5 6 7
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/024-define-chain-to-self-recursion.c b/icd/intel/compiler/shader/glcpp/tests/024-define-chain-to-self-recursion.c
new file mode 100644
index 0000000..e788adc
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/024-define-chain-to-self-recursion.c
@@ -0,0 +1,3 @@
+#define  foo foo
+#define  bar foo
+bar
diff --git a/icd/intel/compiler/shader/glcpp/tests/024-define-chain-to-self-recursion.c.expected b/icd/intel/compiler/shader/glcpp/tests/024-define-chain-to-self-recursion.c.expected
new file mode 100644
index 0000000..ad955fc
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/024-define-chain-to-self-recursion.c.expected
@@ -0,0 +1,4 @@
+
+
+foo
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/025-func-macro-as-non-macro.c b/icd/intel/compiler/shader/glcpp/tests/025-func-macro-as-non-macro.c
new file mode 100644
index 0000000..b433671
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/025-func-macro-as-non-macro.c
@@ -0,0 +1,2 @@
+#define foo(bar) bar
+foo bar
diff --git a/icd/intel/compiler/shader/glcpp/tests/025-func-macro-as-non-macro.c.expected b/icd/intel/compiler/shader/glcpp/tests/025-func-macro-as-non-macro.c.expected
new file mode 100644
index 0000000..960f445
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/025-func-macro-as-non-macro.c.expected
@@ -0,0 +1,3 @@
+
+foo bar
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/026-define-func-extra-newlines.c b/icd/intel/compiler/shader/glcpp/tests/026-define-func-extra-newlines.c
new file mode 100644
index 0000000..0d83740
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/026-define-func-extra-newlines.c
@@ -0,0 +1,6 @@
+#define foo(a) bar
+
+foo
+(
+1
+)
diff --git a/icd/intel/compiler/shader/glcpp/tests/026-define-func-extra-newlines.c.expected b/icd/intel/compiler/shader/glcpp/tests/026-define-func-extra-newlines.c.expected
new file mode 100644
index 0000000..f0888f2
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/026-define-func-extra-newlines.c.expected
@@ -0,0 +1,4 @@
+
+
+bar
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/027-define-chain-obj-to-func.c b/icd/intel/compiler/shader/glcpp/tests/027-define-chain-obj-to-func.c
new file mode 100644
index 0000000..5ccb52c
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/027-define-chain-obj-to-func.c
@@ -0,0 +1,3 @@
+#define failure() success
+#define foo failure()
+foo
diff --git a/icd/intel/compiler/shader/glcpp/tests/027-define-chain-obj-to-func.c.expected b/icd/intel/compiler/shader/glcpp/tests/027-define-chain-obj-to-func.c.expected
new file mode 100644
index 0000000..aef762e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/027-define-chain-obj-to-func.c.expected
@@ -0,0 +1,4 @@
+
+
+success
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/028-define-chain-obj-to-non-func.c b/icd/intel/compiler/shader/glcpp/tests/028-define-chain-obj-to-non-func.c
new file mode 100644
index 0000000..44962a7
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/028-define-chain-obj-to-non-func.c
@@ -0,0 +1,3 @@
+#define success() failure
+#define foo success
+foo
diff --git a/icd/intel/compiler/shader/glcpp/tests/028-define-chain-obj-to-non-func.c.expected b/icd/intel/compiler/shader/glcpp/tests/028-define-chain-obj-to-non-func.c.expected
new file mode 100644
index 0000000..aef762e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/028-define-chain-obj-to-non-func.c.expected
@@ -0,0 +1,4 @@
+
+
+success
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/029-define-chain-obj-to-func-with-args.c b/icd/intel/compiler/shader/glcpp/tests/029-define-chain-obj-to-func-with-args.c
new file mode 100644
index 0000000..261f7d2
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/029-define-chain-obj-to-func-with-args.c
@@ -0,0 +1,3 @@
+#define bar(failure) failure
+#define foo bar(success)
+foo
diff --git a/icd/intel/compiler/shader/glcpp/tests/029-define-chain-obj-to-func-with-args.c.expected b/icd/intel/compiler/shader/glcpp/tests/029-define-chain-obj-to-func-with-args.c.expected
new file mode 100644
index 0000000..aef762e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/029-define-chain-obj-to-func-with-args.c.expected
@@ -0,0 +1,4 @@
+
+
+success
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/030-define-chain-obj-to-func-compose.c b/icd/intel/compiler/shader/glcpp/tests/030-define-chain-obj-to-func-compose.c
new file mode 100644
index 0000000..e56fbef
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/030-define-chain-obj-to-func-compose.c
@@ -0,0 +1,4 @@
+#define baz(failure) failure
+#define bar(failure) failure
+#define foo bar(baz(success))
+foo
diff --git a/icd/intel/compiler/shader/glcpp/tests/030-define-chain-obj-to-func-compose.c.expected b/icd/intel/compiler/shader/glcpp/tests/030-define-chain-obj-to-func-compose.c.expected
new file mode 100644
index 0000000..729bdd1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/030-define-chain-obj-to-func-compose.c.expected
@@ -0,0 +1,5 @@
+
+
+
+success
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/031-define-chain-func-to-func-compose.c b/icd/intel/compiler/shader/glcpp/tests/031-define-chain-func-to-func-compose.c
new file mode 100644
index 0000000..3f4c874
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/031-define-chain-func-to-func-compose.c
@@ -0,0 +1,4 @@
+#define baz(failure) failure
+#define bar(failure) failure
+#define foo() bar(baz(success))
+foo()
diff --git a/icd/intel/compiler/shader/glcpp/tests/031-define-chain-func-to-func-compose.c.expected b/icd/intel/compiler/shader/glcpp/tests/031-define-chain-func-to-func-compose.c.expected
new file mode 100644
index 0000000..729bdd1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/031-define-chain-func-to-func-compose.c.expected
@@ -0,0 +1,5 @@
+
+
+
+success
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/032-define-func-self-recurse.c b/icd/intel/compiler/shader/glcpp/tests/032-define-func-self-recurse.c
new file mode 100644
index 0000000..b3ac70f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/032-define-func-self-recurse.c
@@ -0,0 +1,2 @@
+#define foo(a) foo(2*(a))
+foo(3)
diff --git a/icd/intel/compiler/shader/glcpp/tests/032-define-func-self-recurse.c.expected b/icd/intel/compiler/shader/glcpp/tests/032-define-func-self-recurse.c.expected
new file mode 100644
index 0000000..541d44d
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/032-define-func-self-recurse.c.expected
@@ -0,0 +1,3 @@
+
+foo(2*(3))
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/033-define-func-self-compose.c b/icd/intel/compiler/shader/glcpp/tests/033-define-func-self-compose.c
new file mode 100644
index 0000000..f65e482
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/033-define-func-self-compose.c
@@ -0,0 +1,2 @@
+#define foo(a) foo(2*(a))
+foo(foo(3))
diff --git a/icd/intel/compiler/shader/glcpp/tests/033-define-func-self-compose.c.expected b/icd/intel/compiler/shader/glcpp/tests/033-define-func-self-compose.c.expected
new file mode 100644
index 0000000..6ea6905
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/033-define-func-self-compose.c.expected
@@ -0,0 +1,3 @@
+
+foo(2*(foo(2*(3))))
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/034-define-func-self-compose-non-func.c b/icd/intel/compiler/shader/glcpp/tests/034-define-func-self-compose-non-func.c
new file mode 100644
index 0000000..209a5f7
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/034-define-func-self-compose-non-func.c
@@ -0,0 +1,2 @@
+#define foo(bar) bar
+foo(foo)
diff --git a/icd/intel/compiler/shader/glcpp/tests/034-define-func-self-compose-non-func.c.expected b/icd/intel/compiler/shader/glcpp/tests/034-define-func-self-compose-non-func.c.expected
new file mode 100644
index 0000000..24823b1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/034-define-func-self-compose-non-func.c.expected
@@ -0,0 +1,3 @@
+
+foo
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/035-define-func-self-compose-non-func-multi-token-argument.c b/icd/intel/compiler/shader/glcpp/tests/035-define-func-self-compose-non-func-multi-token-argument.c
new file mode 100644
index 0000000..c307fbe
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/035-define-func-self-compose-non-func-multi-token-argument.c
@@ -0,0 +1,2 @@
+#define foo(bar) bar
+foo(1+foo)
diff --git a/icd/intel/compiler/shader/glcpp/tests/035-define-func-self-compose-non-func-multi-token-argument.c.expected b/icd/intel/compiler/shader/glcpp/tests/035-define-func-self-compose-non-func-multi-token-argument.c.expected
new file mode 100644
index 0000000..137a9ea
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/035-define-func-self-compose-non-func-multi-token-argument.c.expected
@@ -0,0 +1,3 @@
+
+1+foo
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/036-define-func-non-macro-multi-token-argument.c b/icd/intel/compiler/shader/glcpp/tests/036-define-func-non-macro-multi-token-argument.c
new file mode 100644
index 0000000..b21ff33
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/036-define-func-non-macro-multi-token-argument.c
@@ -0,0 +1,3 @@
+#define bar success
+#define foo(x) x
+foo(more bar)
diff --git a/icd/intel/compiler/shader/glcpp/tests/036-define-func-non-macro-multi-token-argument.c.expected b/icd/intel/compiler/shader/glcpp/tests/036-define-func-non-macro-multi-token-argument.c.expected
new file mode 100644
index 0000000..ff6360b
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/036-define-func-non-macro-multi-token-argument.c.expected
@@ -0,0 +1,4 @@
+
+
+more success
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/037-finalize-unexpanded-macro.c b/icd/intel/compiler/shader/glcpp/tests/037-finalize-unexpanded-macro.c
new file mode 100644
index 0000000..b3a2f37
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/037-finalize-unexpanded-macro.c
@@ -0,0 +1,3 @@
+#define expand(x) expand(x once)
+#define foo(x) x
+foo(expand(just))
diff --git a/icd/intel/compiler/shader/glcpp/tests/037-finalize-unexpanded-macro.c.expected b/icd/intel/compiler/shader/glcpp/tests/037-finalize-unexpanded-macro.c.expected
new file mode 100644
index 0000000..cbadee8
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/037-finalize-unexpanded-macro.c.expected
@@ -0,0 +1,4 @@
+
+
+expand(just once)
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/038-func-arg-with-commas.c b/icd/intel/compiler/shader/glcpp/tests/038-func-arg-with-commas.c
new file mode 100644
index 0000000..1407c7d
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/038-func-arg-with-commas.c
@@ -0,0 +1,2 @@
+#define foo(x) success
+foo(argument (with,embedded , commas) -- tricky)
diff --git a/icd/intel/compiler/shader/glcpp/tests/038-func-arg-with-commas.c.expected b/icd/intel/compiler/shader/glcpp/tests/038-func-arg-with-commas.c.expected
new file mode 100644
index 0000000..5a28fb3
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/038-func-arg-with-commas.c.expected
@@ -0,0 +1,3 @@
+
+success
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/039-func-arg-obj-macro-with-comma.c b/icd/intel/compiler/shader/glcpp/tests/039-func-arg-obj-macro-with-comma.c
new file mode 100644
index 0000000..0f7fe63
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/039-func-arg-obj-macro-with-comma.c
@@ -0,0 +1,3 @@
+#define foo(a) (a)
+#define bar two,words
+foo(bar)
diff --git a/icd/intel/compiler/shader/glcpp/tests/039-func-arg-obj-macro-with-comma.c.expected b/icd/intel/compiler/shader/glcpp/tests/039-func-arg-obj-macro-with-comma.c.expected
new file mode 100644
index 0000000..b73869d
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/039-func-arg-obj-macro-with-comma.c.expected
@@ -0,0 +1,4 @@
+
+
+(two,words)
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/040-token-pasting.c b/icd/intel/compiler/shader/glcpp/tests/040-token-pasting.c
new file mode 100644
index 0000000..caab3ba
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/040-token-pasting.c
@@ -0,0 +1,2 @@
+#define paste(a,b) a ## b
+paste(one , token)
diff --git a/icd/intel/compiler/shader/glcpp/tests/040-token-pasting.c.expected b/icd/intel/compiler/shader/glcpp/tests/040-token-pasting.c.expected
new file mode 100644
index 0000000..36f6699
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/040-token-pasting.c.expected
@@ -0,0 +1,3 @@
+
+onetoken
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/041-if-0.c b/icd/intel/compiler/shader/glcpp/tests/041-if-0.c
new file mode 100644
index 0000000..2cab677
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/041-if-0.c
@@ -0,0 +1,5 @@
+success_1
+#if 0
+failure
+#endif
+success_2
diff --git a/icd/intel/compiler/shader/glcpp/tests/041-if-0.c.expected b/icd/intel/compiler/shader/glcpp/tests/041-if-0.c.expected
new file mode 100644
index 0000000..3800024
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/041-if-0.c.expected
@@ -0,0 +1,6 @@
+success_1
+
+
+
+success_2
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/042-if-1.c b/icd/intel/compiler/shader/glcpp/tests/042-if-1.c
new file mode 100644
index 0000000..874a25c
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/042-if-1.c
@@ -0,0 +1,5 @@
+success_1
+#if 1
+success_2
+#endif
+success_3
diff --git a/icd/intel/compiler/shader/glcpp/tests/042-if-1.c.expected b/icd/intel/compiler/shader/glcpp/tests/042-if-1.c.expected
new file mode 100644
index 0000000..e591044
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/042-if-1.c.expected
@@ -0,0 +1,6 @@
+success_1
+
+success_2
+
+success_3
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/043-if-0-else.c b/icd/intel/compiler/shader/glcpp/tests/043-if-0-else.c
new file mode 100644
index 0000000..323351f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/043-if-0-else.c
@@ -0,0 +1,7 @@
+success_1
+#if 0
+failure
+#else
+success_2
+#endif
+success_3
diff --git a/icd/intel/compiler/shader/glcpp/tests/043-if-0-else.c.expected b/icd/intel/compiler/shader/glcpp/tests/043-if-0-else.c.expected
new file mode 100644
index 0000000..ee9e677
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/043-if-0-else.c.expected
@@ -0,0 +1,8 @@
+success_1
+
+
+
+success_2
+
+success_3
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/044-if-1-else.c b/icd/intel/compiler/shader/glcpp/tests/044-if-1-else.c
new file mode 100644
index 0000000..28dfc25
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/044-if-1-else.c
@@ -0,0 +1,7 @@
+success_1
+#if 1
+success_2
+#else
+failure
+#endif
+success_3
diff --git a/icd/intel/compiler/shader/glcpp/tests/044-if-1-else.c.expected b/icd/intel/compiler/shader/glcpp/tests/044-if-1-else.c.expected
new file mode 100644
index 0000000..129f5c8
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/044-if-1-else.c.expected
@@ -0,0 +1,8 @@
+success_1
+
+success_2
+
+
+
+success_3
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/045-if-0-elif.c b/icd/intel/compiler/shader/glcpp/tests/045-if-0-elif.c
new file mode 100644
index 0000000..e50f686
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/045-if-0-elif.c
@@ -0,0 +1,11 @@
+success_1
+#if 0
+failure_1
+#elif 0
+failure_2
+#elif 1
+success_3
+#elif 1
+failure_3
+#endif
+success_4
diff --git a/icd/intel/compiler/shader/glcpp/tests/045-if-0-elif.c.expected b/icd/intel/compiler/shader/glcpp/tests/045-if-0-elif.c.expected
new file mode 100644
index 0000000..97a11b4
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/045-if-0-elif.c.expected
@@ -0,0 +1,12 @@
+success_1
+
+
+
+
+
+success_3
+
+
+
+success_4
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/046-if-1-elsif.c b/icd/intel/compiler/shader/glcpp/tests/046-if-1-elsif.c
new file mode 100644
index 0000000..130515a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/046-if-1-elsif.c
@@ -0,0 +1,11 @@
+success_1
+#if 1
+success_2
+#elif 0
+failure_1
+#elif 1
+failure_2
+#elif 0
+failure_3
+#endif
+success_3
diff --git a/icd/intel/compiler/shader/glcpp/tests/046-if-1-elsif.c.expected b/icd/intel/compiler/shader/glcpp/tests/046-if-1-elsif.c.expected
new file mode 100644
index 0000000..b928b91
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/046-if-1-elsif.c.expected
@@ -0,0 +1,12 @@
+success_1
+
+success_2
+
+
+
+
+
+
+
+success_3
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/047-if-elif-else.c b/icd/intel/compiler/shader/glcpp/tests/047-if-elif-else.c
new file mode 100644
index 0000000..e8f0838
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/047-if-elif-else.c
@@ -0,0 +1,11 @@
+success_1
+#if 0
+failure_1
+#elif 0
+failure_2
+#elif 0
+failure_3
+#else
+success_2
+#endif
+success_3
diff --git a/icd/intel/compiler/shader/glcpp/tests/047-if-elif-else.c.expected b/icd/intel/compiler/shader/glcpp/tests/047-if-elif-else.c.expected
new file mode 100644
index 0000000..e5b53a3
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/047-if-elif-else.c.expected
@@ -0,0 +1,12 @@
+success_1
+
+
+
+
+
+
+
+success_2
+
+success_3
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/048-if-nested.c b/icd/intel/compiler/shader/glcpp/tests/048-if-nested.c
new file mode 100644
index 0000000..fc4679c
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/048-if-nested.c
@@ -0,0 +1,11 @@
+success_1
+#if 0
+failure_1
+#if 1
+failure_2
+#else
+failure_3
+#endif
+failure_4
+#endif
+success_2
diff --git a/icd/intel/compiler/shader/glcpp/tests/048-if-nested.c.expected b/icd/intel/compiler/shader/glcpp/tests/048-if-nested.c.expected
new file mode 100644
index 0000000..c61fd0b
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/048-if-nested.c.expected
@@ -0,0 +1,12 @@
+success_1
+
+
+
+
+
+
+
+
+
+success_2
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/049-if-expression-precedence.c b/icd/intel/compiler/shader/glcpp/tests/049-if-expression-precedence.c
new file mode 100644
index 0000000..833ea03
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/049-if-expression-precedence.c
@@ -0,0 +1,5 @@
+#if 1 + 2 * 3 + - (25 % 17 - + 1)
+failure with operator precedence
+#else
+success
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/049-if-expression-precedence.c.expected b/icd/intel/compiler/shader/glcpp/tests/049-if-expression-precedence.c.expected
new file mode 100644
index 0000000..569debb
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/049-if-expression-precedence.c.expected
@@ -0,0 +1,6 @@
+
+
+
+success
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/050-if-defined.c b/icd/intel/compiler/shader/glcpp/tests/050-if-defined.c
new file mode 100644
index 0000000..34f0f95
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/050-if-defined.c
@@ -0,0 +1,17 @@
+#if defined foo
+failure_1
+#else
+success_1
+#endif
+#define foo
+#if defined foo
+success_2
+#else
+failure_2
+#endif
+#undef foo
+#if defined foo
+failure_3
+#else
+success_3
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/050-if-defined.c.expected b/icd/intel/compiler/shader/glcpp/tests/050-if-defined.c.expected
new file mode 100644
index 0000000..3f01955
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/050-if-defined.c.expected
@@ -0,0 +1,18 @@
+
+
+
+success_1
+
+
+
+success_2
+
+
+
+
+
+
+
+success_3
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/051-if-relational.c b/icd/intel/compiler/shader/glcpp/tests/051-if-relational.c
new file mode 100644
index 0000000..c3db488
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/051-if-relational.c
@@ -0,0 +1,35 @@
+#if 3 < 2
+failure_1
+#else
+success_1
+#endif
+
+#if 3 >= 2
+success_2
+#else
+failure_2
+#endif
+
+#if 2 + 3 <= 5
+success_3
+#else
+failure_3
+#endif
+
+#if 3 - 2 == 1
+success_3
+#else
+failure_3
+#endif
+
+#if 1 > 3
+failure_4
+#else
+success_4
+#endif
+
+#if 1 != 5
+success_5
+#else
+failure_5
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/051-if-relational.c.expected b/icd/intel/compiler/shader/glcpp/tests/051-if-relational.c.expected
new file mode 100644
index 0000000..d2b76f1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/051-if-relational.c.expected
@@ -0,0 +1,36 @@
+
+
+
+success_1
+
+
+
+success_2
+
+
+
+
+
+success_3
+
+
+
+
+
+success_3
+
+
+
+
+
+
+
+success_4
+
+
+
+success_5
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/052-if-bitwise.c b/icd/intel/compiler/shader/glcpp/tests/052-if-bitwise.c
new file mode 100644
index 0000000..2d8e45e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/052-if-bitwise.c
@@ -0,0 +1,20 @@
+#if (0xaaaaaaaa | 0x55555555) != 4294967295
+failure_1
+#else
+success_1
+#endif
+#if (0x12345678 ^ 0xfdecba98) == 4023971040
+success_2
+#else
+failure_2
+#endif
+#if (~ 0xdeadbeef) != -3735928560
+failure_3
+#else
+success_3
+#endif
+#if (0667 & 0733) == 403
+success_4
+#else
+failure_4
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/052-if-bitwise.c.expected b/icd/intel/compiler/shader/glcpp/tests/052-if-bitwise.c.expected
new file mode 100644
index 0000000..bb5d92e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/052-if-bitwise.c.expected
@@ -0,0 +1,21 @@
+
+
+
+success_1
+
+
+success_2
+
+
+
+
+
+
+success_3
+
+
+success_4
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/053-if-divide-and-shift.c b/icd/intel/compiler/shader/glcpp/tests/053-if-divide-and-shift.c
new file mode 100644
index 0000000..d24c54a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/053-if-divide-and-shift.c
@@ -0,0 +1,15 @@
+#if (15 / 2) != 7
+failure_1
+#else
+success_1
+#endif
+#if (1 << 12) == 4096
+success_2
+#else
+failure_2
+#endif
+#if (31762 >> 8) != 124
+failure_3
+#else
+success_3
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/053-if-divide-and-shift.c.expected b/icd/intel/compiler/shader/glcpp/tests/053-if-divide-and-shift.c.expected
new file mode 100644
index 0000000..f97e936
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/053-if-divide-and-shift.c.expected
@@ -0,0 +1,16 @@
+
+
+
+success_1
+
+
+success_2
+
+
+
+
+
+
+success_3
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/054-if-with-macros.c b/icd/intel/compiler/shader/glcpp/tests/054-if-with-macros.c
new file mode 100644
index 0000000..3da79a0
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/054-if-with-macros.c
@@ -0,0 +1,34 @@
+#define one 1
+#define two 2
+#define three 3
+#define five 5
+#if five < two
+failure_1
+#else
+success_1
+#endif
+#if three >= two
+success_2
+#else
+failure_2
+#endif
+#if two + three <= five
+success_3
+#else
+failure_3
+#endif
+#if five - two == three
+success_4
+#else
+failure_4
+#endif
+#if one > three
+failure_5
+#else
+success_5
+#endif
+#if one != five
+success_6
+#else
+failure_6
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/054-if-with-macros.c.expected b/icd/intel/compiler/shader/glcpp/tests/054-if-with-macros.c.expected
new file mode 100644
index 0000000..27ea496
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/054-if-with-macros.c.expected
@@ -0,0 +1,35 @@
+
+
+
+
+
+
+
+success_1
+
+
+success_2
+
+
+
+
+success_3
+
+
+
+
+success_4
+
+
+
+
+
+
+success_5
+
+
+success_6
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/055-define-chain-obj-to-func-parens-in-text.c b/icd/intel/compiler/shader/glcpp/tests/055-define-chain-obj-to-func-parens-in-text.c
new file mode 100644
index 0000000..00f2c23
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/055-define-chain-obj-to-func-parens-in-text.c
@@ -0,0 +1,3 @@
+#define failure() success
+#define foo failure
+foo()
diff --git a/icd/intel/compiler/shader/glcpp/tests/055-define-chain-obj-to-func-parens-in-text.c.expected b/icd/intel/compiler/shader/glcpp/tests/055-define-chain-obj-to-func-parens-in-text.c.expected
new file mode 100644
index 0000000..aef762e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/055-define-chain-obj-to-func-parens-in-text.c.expected
@@ -0,0 +1,4 @@
+
+
+success
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/056-macro-argument-with-comma.c b/icd/intel/compiler/shader/glcpp/tests/056-macro-argument-with-comma.c
new file mode 100644
index 0000000..58701d1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/056-macro-argument-with-comma.c
@@ -0,0 +1,4 @@
+#define bar with,embedded,commas
+#define function(x) success
+#define foo function
+foo(bar)
diff --git a/icd/intel/compiler/shader/glcpp/tests/056-macro-argument-with-comma.c.expected b/icd/intel/compiler/shader/glcpp/tests/056-macro-argument-with-comma.c.expected
new file mode 100644
index 0000000..729bdd1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/056-macro-argument-with-comma.c.expected
@@ -0,0 +1,5 @@
+
+
+
+success
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/057-empty-arguments.c b/icd/intel/compiler/shader/glcpp/tests/057-empty-arguments.c
new file mode 100644
index 0000000..6140232
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/057-empty-arguments.c
@@ -0,0 +1,6 @@
+#define zero() success
+zero()
+#define one(x) success
+one()
+#define two(x,y) success
+two(,)
diff --git a/icd/intel/compiler/shader/glcpp/tests/057-empty-arguments.c.expected b/icd/intel/compiler/shader/glcpp/tests/057-empty-arguments.c.expected
new file mode 100644
index 0000000..4e3aad5
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/057-empty-arguments.c.expected
@@ -0,0 +1,7 @@
+
+success
+
+success
+
+success
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/058-token-pasting-empty-arguments.c b/icd/intel/compiler/shader/glcpp/tests/058-token-pasting-empty-arguments.c
new file mode 100644
index 0000000..8ac260c
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/058-token-pasting-empty-arguments.c
@@ -0,0 +1,5 @@
+#define paste(x,y) x ## y
+paste(a,b)
+paste(a,)
+paste(,b)
+paste(,)
diff --git a/icd/intel/compiler/shader/glcpp/tests/058-token-pasting-empty-arguments.c.expected b/icd/intel/compiler/shader/glcpp/tests/058-token-pasting-empty-arguments.c.expected
new file mode 100644
index 0000000..a1c34e5
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/058-token-pasting-empty-arguments.c.expected
@@ -0,0 +1,6 @@
+
+ab
+a
+b
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/059-token-pasting-integer.c b/icd/intel/compiler/shader/glcpp/tests/059-token-pasting-integer.c
new file mode 100644
index 0000000..37b895a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/059-token-pasting-integer.c
@@ -0,0 +1,4 @@
+#define paste(x,y) x ## y
+paste(1,2)
+paste(1,000)
+paste(identifier,2)
diff --git a/icd/intel/compiler/shader/glcpp/tests/059-token-pasting-integer.c.expected b/icd/intel/compiler/shader/glcpp/tests/059-token-pasting-integer.c.expected
new file mode 100644
index 0000000..f1a2cd2
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/059-token-pasting-integer.c.expected
@@ -0,0 +1,5 @@
+
+12
+1000
+identifier2
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/060-left-paren-in-macro-right-paren-in-text.c b/icd/intel/compiler/shader/glcpp/tests/060-left-paren-in-macro-right-paren-in-text.c
new file mode 100644
index 0000000..ed80ea8
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/060-left-paren-in-macro-right-paren-in-text.c
@@ -0,0 +1,3 @@
+#define double(a) a*2
+#define foo double(
+foo 5)
diff --git a/icd/intel/compiler/shader/glcpp/tests/060-left-paren-in-macro-right-paren-in-text.c.expected b/icd/intel/compiler/shader/glcpp/tests/060-left-paren-in-macro-right-paren-in-text.c.expected
new file mode 100644
index 0000000..c1f0d24
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/060-left-paren-in-macro-right-paren-in-text.c.expected
@@ -0,0 +1,4 @@
+
+
+5*2
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/061-define-chain-obj-to-func-multi.c b/icd/intel/compiler/shader/glcpp/tests/061-define-chain-obj-to-func-multi.c
new file mode 100644
index 0000000..6dbfd1f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/061-define-chain-obj-to-func-multi.c
@@ -0,0 +1,5 @@
+#define foo(x) success
+#define bar foo
+#define baz bar
+#define joe baz
+joe (failure)
diff --git a/icd/intel/compiler/shader/glcpp/tests/061-define-chain-obj-to-func-multi.c.expected b/icd/intel/compiler/shader/glcpp/tests/061-define-chain-obj-to-func-multi.c.expected
new file mode 100644
index 0000000..111f7d1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/061-define-chain-obj-to-func-multi.c.expected
@@ -0,0 +1,6 @@
+
+
+
+
+success
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/062-if-0-skips-garbage.c b/icd/intel/compiler/shader/glcpp/tests/062-if-0-skips-garbage.c
new file mode 100644
index 0000000..d9e439b
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/062-if-0-skips-garbage.c
@@ -0,0 +1,5 @@
+#define foo(a,b)
+#if 0
+foo(bar)
+foo(
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/062-if-0-skips-garbage.c.expected b/icd/intel/compiler/shader/glcpp/tests/062-if-0-skips-garbage.c.expected
new file mode 100644
index 0000000..6fb66a5
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/062-if-0-skips-garbage.c.expected
@@ -0,0 +1,6 @@
+
+
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/063-comments.c b/icd/intel/compiler/shader/glcpp/tests/063-comments.c
new file mode 100644
index 0000000..e641d2f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/063-comments.c
@@ -0,0 +1,20 @@
+/* this is a comment */
+// so is this
+// */
+f = g/**//h;
+/*//*/l();
+m = n//**/o
++ p;
+/* this
+comment spans
+multiple lines and
+contains *** stars
+and slashes / *** /
+and other stuff.
+****/
+more code here
+/* Test that /* nested
+   comments */
+are not treated like comments.
+/*/ this is a comment */
+/*/*/
diff --git a/icd/intel/compiler/shader/glcpp/tests/063-comments.c.expected b/icd/intel/compiler/shader/glcpp/tests/063-comments.c.expected
new file mode 100644
index 0000000..1965c9b
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/063-comments.c.expected
@@ -0,0 +1,21 @@
+ 
+
+
+f = g /h;
+ l();
+m = n
++ p;
+ 
+
+
+
+
+
+
+more code here
+ 
+
+are not treated like comments.
+ 
+ 
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/064-version.c b/icd/intel/compiler/shader/glcpp/tests/064-version.c
new file mode 100644
index 0000000..2132648
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/064-version.c
@@ -0,0 +1,2 @@
+#version 130
+#define FOO
diff --git a/icd/intel/compiler/shader/glcpp/tests/064-version.c.expected b/icd/intel/compiler/shader/glcpp/tests/064-version.c.expected
new file mode 100644
index 0000000..3af7111
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/064-version.c.expected
@@ -0,0 +1,3 @@
+#version 130
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/065-if-defined-parens.c b/icd/intel/compiler/shader/glcpp/tests/065-if-defined-parens.c
new file mode 100644
index 0000000..48aa0f8
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/065-if-defined-parens.c
@@ -0,0 +1,17 @@
+#if defined(foo)
+failure_1
+#else
+success_1
+#endif
+#define foo
+#if defined ( foo )
+success_2
+#else
+failure_2
+#endif
+#undef foo
+#if defined (foo)
+failure_3
+#else
+success_3
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/065-if-defined-parens.c.expected b/icd/intel/compiler/shader/glcpp/tests/065-if-defined-parens.c.expected
new file mode 100644
index 0000000..3f01955
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/065-if-defined-parens.c.expected
@@ -0,0 +1,18 @@
+
+
+
+success_1
+
+
+
+success_2
+
+
+
+
+
+
+
+success_3
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/066-if-nospace-expression.c b/icd/intel/compiler/shader/glcpp/tests/066-if-nospace-expression.c
new file mode 100644
index 0000000..3b0b473
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/066-if-nospace-expression.c
@@ -0,0 +1,3 @@
+#if(1)
+success
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/066-if-nospace-expression.c.expected b/icd/intel/compiler/shader/glcpp/tests/066-if-nospace-expression.c.expected
new file mode 100644
index 0000000..0e84a7c
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/066-if-nospace-expression.c.expected
@@ -0,0 +1,4 @@
+
+success
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/067-nested-ifdef-ifndef.c b/icd/intel/compiler/shader/glcpp/tests/067-nested-ifdef-ifndef.c
new file mode 100644
index 0000000..f46cce4
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/067-nested-ifdef-ifndef.c
@@ -0,0 +1,40 @@
+#define D1
+#define D2
+
+#define result success
+
+#ifdef U1
+#ifdef U2
+#undef result
+#define result failure
+#endif
+#endif
+result
+
+#ifndef D1
+#ifndef D2
+#undef result
+#define result failure
+#endif
+#endif
+result
+
+#undef result
+#define result failure
+#ifdef D1
+#ifdef D2
+#undef result
+#define result success
+#endif
+#endif
+result
+
+#undef result
+#define result failure
+#ifndef U1
+#ifndef U2
+#undef result
+#define result success
+#endif
+#endif
+result
diff --git a/icd/intel/compiler/shader/glcpp/tests/067-nested-ifdef-ifndef.c.expected b/icd/intel/compiler/shader/glcpp/tests/067-nested-ifdef-ifndef.c.expected
new file mode 100644
index 0000000..3340daa
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/067-nested-ifdef-ifndef.c.expected
@@ -0,0 +1,41 @@
+
+
+
+
+
+
+
+
+
+
+
+success
+
+
+
+
+
+
+
+success
+
+
+
+
+
+
+
+
+
+success
+
+
+
+
+
+
+
+
+
+success
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/068-accidental-pasting.c b/icd/intel/compiler/shader/glcpp/tests/068-accidental-pasting.c
new file mode 100644
index 0000000..699ac51
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/068-accidental-pasting.c
@@ -0,0 +1,11 @@
+#define empty
+<empty<
+<empty=
+>empty>
+>empty=
+=empty=
+!empty=
+&empty&
+|empty|
++empty+
+-empty-
diff --git a/icd/intel/compiler/shader/glcpp/tests/068-accidental-pasting.c.expected b/icd/intel/compiler/shader/glcpp/tests/068-accidental-pasting.c.expected
new file mode 100644
index 0000000..ce41cd6
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/068-accidental-pasting.c.expected
@@ -0,0 +1,12 @@
+
+< <
+< =
+> >
+> =
+= =
+! =
+& &
+| |
++ +
+- -
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/069-repeated-argument.c b/icd/intel/compiler/shader/glcpp/tests/069-repeated-argument.c
new file mode 100644
index 0000000..2b46ead
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/069-repeated-argument.c
@@ -0,0 +1,2 @@
+#define double(x) x x
+double(1)
diff --git a/icd/intel/compiler/shader/glcpp/tests/069-repeated-argument.c.expected b/icd/intel/compiler/shader/glcpp/tests/069-repeated-argument.c.expected
new file mode 100644
index 0000000..755c4d4
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/069-repeated-argument.c.expected
@@ -0,0 +1,3 @@
+
+1 1
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/070-undefined-macro-in-expression.c b/icd/intel/compiler/shader/glcpp/tests/070-undefined-macro-in-expression.c
new file mode 100644
index 0000000..d15a484
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/070-undefined-macro-in-expression.c
@@ -0,0 +1,5 @@
+#if UNDEFINED_MACRO
+Failure
+#else
+Success
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/070-undefined-macro-in-expression.c.expected b/icd/intel/compiler/shader/glcpp/tests/070-undefined-macro-in-expression.c.expected
new file mode 100644
index 0000000..d5a8452
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/070-undefined-macro-in-expression.c.expected
@@ -0,0 +1,6 @@
+
+
+
+Success
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/071-punctuator.c b/icd/intel/compiler/shader/glcpp/tests/071-punctuator.c
new file mode 100644
index 0000000..959d682
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/071-punctuator.c
@@ -0,0 +1 @@
+a = b
diff --git a/icd/intel/compiler/shader/glcpp/tests/071-punctuator.c.expected b/icd/intel/compiler/shader/glcpp/tests/071-punctuator.c.expected
new file mode 100644
index 0000000..fee253b
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/071-punctuator.c.expected
@@ -0,0 +1,2 @@
+a = b
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/072-token-pasting-same-line.c b/icd/intel/compiler/shader/glcpp/tests/072-token-pasting-same-line.c
new file mode 100644
index 0000000..e421e9d
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/072-token-pasting-same-line.c
@@ -0,0 +1,2 @@
+#define paste(x) success_ ## x
+paste(1) paste(2) paste(3)
diff --git a/icd/intel/compiler/shader/glcpp/tests/072-token-pasting-same-line.c.expected b/icd/intel/compiler/shader/glcpp/tests/072-token-pasting-same-line.c.expected
new file mode 100644
index 0000000..c780b43
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/072-token-pasting-same-line.c.expected
@@ -0,0 +1,3 @@
+
+success_1 success_2 success_3
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/073-if-in-ifdef.c b/icd/intel/compiler/shader/glcpp/tests/073-if-in-ifdef.c
new file mode 100644
index 0000000..61a4809
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/073-if-in-ifdef.c
@@ -0,0 +1,4 @@
+#ifdef UNDEF
+#if UNDEF > 1
+#endif
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/073-if-in-ifdef.c.expected b/icd/intel/compiler/shader/glcpp/tests/073-if-in-ifdef.c.expected
new file mode 100644
index 0000000..3f2ff2d
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/073-if-in-ifdef.c.expected
@@ -0,0 +1,5 @@
+
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/074-elif-undef.c b/icd/intel/compiler/shader/glcpp/tests/074-elif-undef.c
new file mode 100644
index 0000000..67aac89
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/074-elif-undef.c
@@ -0,0 +1,3 @@
+#ifndef UNDEF
+#elif UNDEF < 0
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/074-elif-undef.c.expected b/icd/intel/compiler/shader/glcpp/tests/074-elif-undef.c.expected
new file mode 100644
index 0000000..fd40910
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/074-elif-undef.c.expected
@@ -0,0 +1,4 @@
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/075-elif-elif-undef.c b/icd/intel/compiler/shader/glcpp/tests/075-elif-elif-undef.c
new file mode 100644
index 0000000..264bc4f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/075-elif-elif-undef.c
@@ -0,0 +1,4 @@
+#ifndef UNDEF
+#elif UNDEF < 0
+#elif UNDEF == 3
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/075-elif-elif-undef.c.expected b/icd/intel/compiler/shader/glcpp/tests/075-elif-elif-undef.c.expected
new file mode 100644
index 0000000..3f2ff2d
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/075-elif-elif-undef.c.expected
@@ -0,0 +1,5 @@
+
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/076-elif-undef-nested.c b/icd/intel/compiler/shader/glcpp/tests/076-elif-undef-nested.c
new file mode 100644
index 0000000..ebd550e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/076-elif-undef-nested.c
@@ -0,0 +1,5 @@
+#ifdef UNDEF
+#if UNDEF == 4
+#elif UNDEF == 5
+#endif
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/076-elif-undef-nested.c.expected b/icd/intel/compiler/shader/glcpp/tests/076-elif-undef-nested.c.expected
new file mode 100644
index 0000000..6fb66a5
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/076-elif-undef-nested.c.expected
@@ -0,0 +1,6 @@
+
+
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/077-else-without-if.c b/icd/intel/compiler/shader/glcpp/tests/077-else-without-if.c
new file mode 100644
index 0000000..81f00bf
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/077-else-without-if.c
@@ -0,0 +1 @@
+#else
diff --git a/icd/intel/compiler/shader/glcpp/tests/077-else-without-if.c.expected b/icd/intel/compiler/shader/glcpp/tests/077-else-without-if.c.expected
new file mode 100644
index 0000000..d289b36
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/077-else-without-if.c.expected
@@ -0,0 +1,4 @@
+0:1(2): preprocessor error: else without #if
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/078-elif-without-if.c b/icd/intel/compiler/shader/glcpp/tests/078-elif-without-if.c
new file mode 100644
index 0000000..60466b3
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/078-elif-without-if.c
@@ -0,0 +1 @@
+#elif defined FOO
diff --git a/icd/intel/compiler/shader/glcpp/tests/078-elif-without-if.c.expected b/icd/intel/compiler/shader/glcpp/tests/078-elif-without-if.c.expected
new file mode 100644
index 0000000..7d41f0a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/078-elif-without-if.c.expected
@@ -0,0 +1,4 @@
+0:1(2): preprocessor error: elif without #if
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/079-endif-without-if.c b/icd/intel/compiler/shader/glcpp/tests/079-endif-without-if.c
new file mode 100644
index 0000000..69331c3
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/079-endif-without-if.c
@@ -0,0 +1 @@
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/079-endif-without-if.c.expected b/icd/intel/compiler/shader/glcpp/tests/079-endif-without-if.c.expected
new file mode 100644
index 0000000..08dd335
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/079-endif-without-if.c.expected
@@ -0,0 +1,4 @@
+0:1(2): preprocessor error: #endif without #if
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/080-if-without-expression.c b/icd/intel/compiler/shader/glcpp/tests/080-if-without-expression.c
new file mode 100644
index 0000000..a27ba36
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/080-if-without-expression.c
@@ -0,0 +1,4 @@
+/* Error message for unskipped #if with no expression. */
+#if
+#endif
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/080-if-without-expression.c.expected b/icd/intel/compiler/shader/glcpp/tests/080-if-without-expression.c.expected
new file mode 100644
index 0000000..768ba0f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/080-if-without-expression.c.expected
@@ -0,0 +1,6 @@
+0:2(1): preprocessor error: #if with no expression
+ 
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/081-elif-without-expression.c b/icd/intel/compiler/shader/glcpp/tests/081-elif-without-expression.c
new file mode 100644
index 0000000..79c7866
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/081-elif-without-expression.c
@@ -0,0 +1,3 @@
+#if 0
+#elif
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/081-elif-without-expression.c.expected b/icd/intel/compiler/shader/glcpp/tests/081-elif-without-expression.c.expected
new file mode 100644
index 0000000..974f0f5
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/081-elif-without-expression.c.expected
@@ -0,0 +1,5 @@
+0:2(1): preprocessor error: #elif with no expression
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/082-invalid-paste.c b/icd/intel/compiler/shader/glcpp/tests/082-invalid-paste.c
new file mode 100644
index 0000000..8b84d50
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/082-invalid-paste.c
@@ -0,0 +1,7 @@
+#define PASTE(x,y) x ## y
+PASTE(<,>)
+PASTE(0,abc)
+PASTE(1,=)
+PASTE(2,@)
+PASTE(3,-4)
+PASTE(4,+5.2)
diff --git a/icd/intel/compiler/shader/glcpp/tests/082-invalid-paste.c.expected b/icd/intel/compiler/shader/glcpp/tests/082-invalid-paste.c.expected
new file mode 100644
index 0000000..2dd21c0
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/082-invalid-paste.c.expected
@@ -0,0 +1,20 @@
+0:2(7): preprocessor error: 
+Pasting "<" and ">" does not give a valid preprocessing token.
+0:3(7): preprocessor error: 
+Pasting "0" and "abc" does not give a valid preprocessing token.
+0:4(7): preprocessor error: 
+Pasting "1" and "=" does not give a valid preprocessing token.
+0:5(7): preprocessor error: 
+Pasting "2" and "@" does not give a valid preprocessing token.
+0:6(7): preprocessor error: 
+Pasting "3" and "-" does not give a valid preprocessing token.
+0:7(7): preprocessor error: 
+Pasting "4" and "+" does not give a valid preprocessing token.
+
+<
+0
+1
+2
+34
+45.2
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/083-unterminated-if.c b/icd/intel/compiler/shader/glcpp/tests/083-unterminated-if.c
new file mode 100644
index 0000000..9180635
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/083-unterminated-if.c
@@ -0,0 +1,2 @@
+#if 1
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/083-unterminated-if.c.expected b/icd/intel/compiler/shader/glcpp/tests/083-unterminated-if.c.expected
new file mode 100644
index 0000000..a69f8ba
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/083-unterminated-if.c.expected
@@ -0,0 +1,5 @@
+0:1(7): preprocessor error: Unterminated #if
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/084-unbalanced-parentheses.c b/icd/intel/compiler/shader/glcpp/tests/084-unbalanced-parentheses.c
new file mode 100644
index 0000000..0789ba5
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/084-unbalanced-parentheses.c
@@ -0,0 +1,2 @@
+#define FUNC(x) (2*(x))
+FUNC(23
diff --git a/icd/intel/compiler/shader/glcpp/tests/084-unbalanced-parentheses.c.expected b/icd/intel/compiler/shader/glcpp/tests/084-unbalanced-parentheses.c.expected
new file mode 100644
index 0000000..af49a37
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/084-unbalanced-parentheses.c.expected
@@ -0,0 +1,2 @@
+0:2(8): preprocessor error: syntax error, unexpected $end
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/085-incorrect-argument-count.c b/icd/intel/compiler/shader/glcpp/tests/085-incorrect-argument-count.c
new file mode 100644
index 0000000..91bea60
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/085-incorrect-argument-count.c
@@ -0,0 +1,5 @@
+#define MULT(x,y) ((x)*(y))
+MULT()
+MULT(1)
+MULT(1,2,3)
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/085-incorrect-argument-count.c.expected b/icd/intel/compiler/shader/glcpp/tests/085-incorrect-argument-count.c.expected
new file mode 100644
index 0000000..1df30cb
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/085-incorrect-argument-count.c.expected
@@ -0,0 +1,12 @@
+0:2(1): preprocessor error: Error: macro MULT invoked with 1 arguments (expected 2)
+
+0:3(1): preprocessor error: Error: macro MULT invoked with 1 arguments (expected 2)
+
+0:4(1): preprocessor error: Error: macro MULT invoked with 3 arguments (expected 2)
+
+
+MULT()
+MULT(1)
+MULT(1,2,3)
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/086-reserved-macro-names.c b/icd/intel/compiler/shader/glcpp/tests/086-reserved-macro-names.c
new file mode 100644
index 0000000..a6b7201
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/086-reserved-macro-names.c
@@ -0,0 +1,3 @@
+#define __BAD reserved
+#define GL_ALSO_BAD() also reserved
+#define THIS__TOO__IS__BAD reserved
diff --git a/icd/intel/compiler/shader/glcpp/tests/086-reserved-macro-names.c.expected b/icd/intel/compiler/shader/glcpp/tests/086-reserved-macro-names.c.expected
new file mode 100644
index 0000000..5ca42a9
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/086-reserved-macro-names.c.expected
@@ -0,0 +1,10 @@
+0:1(10): preprocessor warning: Macro names containing "__" are reserved for use by the implementation.
+
+0:2(9): preprocessor error: Macro names starting with "GL_" are reserved.
+
+0:3(9): preprocessor warning: Macro names containing "__" are reserved for use by the implementation.
+
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/087-if-comments.c b/icd/intel/compiler/shader/glcpp/tests/087-if-comments.c
new file mode 100644
index 0000000..ce8dc43
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/087-if-comments.c
@@ -0,0 +1,5 @@
+#if (1 == 0) // dangerous comment
+fail
+#else
+win
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/087-if-comments.c.expected b/icd/intel/compiler/shader/glcpp/tests/087-if-comments.c.expected
new file mode 100644
index 0000000..827e548
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/087-if-comments.c.expected
@@ -0,0 +1,6 @@
+
+
+
+win
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/088-redefine-macro-legitimate.c b/icd/intel/compiler/shader/glcpp/tests/088-redefine-macro-legitimate.c
new file mode 100644
index 0000000..0e0666b
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/088-redefine-macro-legitimate.c
@@ -0,0 +1,5 @@
+#define abc 123
+#define abc 123
+
+#define foo(x) (x)+23
+#define foo(x) ( x ) + 23
diff --git a/icd/intel/compiler/shader/glcpp/tests/088-redefine-macro-legitimate.c.expected b/icd/intel/compiler/shader/glcpp/tests/088-redefine-macro-legitimate.c.expected
new file mode 100644
index 0000000..6fb66a5
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/088-redefine-macro-legitimate.c.expected
@@ -0,0 +1,6 @@
+
+
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/089-redefine-macro-error.c b/icd/intel/compiler/shader/glcpp/tests/089-redefine-macro-error.c
new file mode 100644
index 0000000..b3d1391
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/089-redefine-macro-error.c
@@ -0,0 +1,17 @@
+#define x y
+#define x z
+
+#define abc 123
+#define abc() 123
+
+#define foo() bar
+#define foo(x) bar
+
+#define bar() baz
+#define bar baz
+
+#define biff(a,b) a+b
+#define biff(a,b,c) a+b
+
+#define oper(a,b) a+b
+#define oper(a,b) a*b
diff --git a/icd/intel/compiler/shader/glcpp/tests/089-redefine-macro-error.c.expected b/icd/intel/compiler/shader/glcpp/tests/089-redefine-macro-error.c.expected
new file mode 100644
index 0000000..6209ead
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/089-redefine-macro-error.c.expected
@@ -0,0 +1,30 @@
+0:2(9): preprocessor error: Redefinition of macro x
+
+0:5(9): preprocessor error: Redefinition of macro abc
+
+0:8(9): preprocessor error: Redefinition of macro foo
+
+0:11(9): preprocessor error: Redefinition of macro bar
+
+0:14(9): preprocessor error: Redefinition of macro biff
+
+0:17(9): preprocessor error: Redefinition of macro oper
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/090-hash-error.c b/icd/intel/compiler/shader/glcpp/tests/090-hash-error.c
new file mode 100644
index 0000000..d19bb7f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/090-hash-error.c
@@ -0,0 +1 @@
+#error human error
diff --git a/icd/intel/compiler/shader/glcpp/tests/090-hash-error.c.expected b/icd/intel/compiler/shader/glcpp/tests/090-hash-error.c.expected
new file mode 100644
index 0000000..f2f1fbe
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/090-hash-error.c.expected
@@ -0,0 +1,3 @@
+0:1(2): preprocessor error: #error human error
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/091-hash-line.c b/icd/intel/compiler/shader/glcpp/tests/091-hash-line.c
new file mode 100644
index 0000000..26d7038
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/091-hash-line.c
@@ -0,0 +1,14 @@
+#line 0
+#error line 0 error
+#line 25
+#error line 25 error
+#line 0 1
+#error source 1, line 0 error
+#line 30 2
+#error source 2, line 30 error
+#line 45 2 /* A line with a comment */
+#define NINETY 90
+#define TWO 2
+#line NINETY TWO /* A #line line with macro expansion */
+#define FUNCTION_LIKE_MACRO(source, line) source line
+#line FUNCTION_LIKE_MACRO(180,2)
diff --git a/icd/intel/compiler/shader/glcpp/tests/091-hash-line.c.expected b/icd/intel/compiler/shader/glcpp/tests/091-hash-line.c.expected
new file mode 100644
index 0000000..48af0b2
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/091-hash-line.c.expected
@@ -0,0 +1,19 @@
+0:0(1): preprocessor error: #error line 0 error
+0:25(1): preprocessor error: #error line 25 error
+1:0(1): preprocessor error: #error source 1, line 0 error
+2:30(1): preprocessor error: #error source 2, line 30 error
+#line 0
+
+#line 25
+
+#line 0 1
+
+#line 30 2
+
+#line 45 2
+
+
+#line 90 2
+
+#line 180 2
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/092-redefine-macro-error-2.c b/icd/intel/compiler/shader/glcpp/tests/092-redefine-macro-error-2.c
new file mode 100644
index 0000000..3c161a5
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/092-redefine-macro-error-2.c
@@ -0,0 +1,5 @@
+#define A
+#define A 1
+
+#define B 1
+#define B
diff --git a/icd/intel/compiler/shader/glcpp/tests/092-redefine-macro-error-2.c.expected b/icd/intel/compiler/shader/glcpp/tests/092-redefine-macro-error-2.c.expected
new file mode 100644
index 0000000..0026f91
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/092-redefine-macro-error-2.c.expected
@@ -0,0 +1,10 @@
+0:2(9): preprocessor error: Redefinition of macro A
+
+0:5(9): preprocessor error: Redefinition of macro B
+
+
+
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/093-divide-by-zero.c b/icd/intel/compiler/shader/glcpp/tests/093-divide-by-zero.c
new file mode 100644
index 0000000..bf65d4f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/093-divide-by-zero.c
@@ -0,0 +1,2 @@
+#if (1 / 0)
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/093-divide-by-zero.c.expected b/icd/intel/compiler/shader/glcpp/tests/093-divide-by-zero.c.expected
new file mode 100644
index 0000000..08f183f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/093-divide-by-zero.c.expected
@@ -0,0 +1,4 @@
+0:1(13): preprocessor error: division by 0 in preprocessor directive
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/094-divide-by-zero-short-circuit.c b/icd/intel/compiler/shader/glcpp/tests/094-divide-by-zero-short-circuit.c
new file mode 100644
index 0000000..04497b1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/094-divide-by-zero-short-circuit.c
@@ -0,0 +1,13 @@
+/* glcpp is generating a division-by-zero error for this case.  It's
+ * easy to argue that it should be short-circuiting the evaluation and
+ * not generating the diagnostic (which happens to be what gcc does).
+ * But it doesn't seem like we should force this behavior on our
+ * pre-processor (and, as always, the GLSL specification of the
+ * pre-processor is too vague on this point).
+ *
+ * If a short-circuit evaluation optimization does get added to the
+ * pre-processor then it would be legitimate to update the expected file
+ * for this test.
+*/
+#if 1 || (1 / 0)
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/094-divide-by-zero-short-circuit.c.expected b/icd/intel/compiler/shader/glcpp/tests/094-divide-by-zero-short-circuit.c.expected
new file mode 100644
index 0000000..be20b7c
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/094-divide-by-zero-short-circuit.c.expected
@@ -0,0 +1,15 @@
+0:12(17): preprocessor error: division by 0 in preprocessor directive
+ 
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/095-recursive-define.c b/icd/intel/compiler/shader/glcpp/tests/095-recursive-define.c
new file mode 100644
index 0000000..801d90c
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/095-recursive-define.c
@@ -0,0 +1,3 @@
+#define A(a, b) B(a, b)
+#define C A(0, C)
+C
diff --git a/icd/intel/compiler/shader/glcpp/tests/095-recursive-define.c.expected b/icd/intel/compiler/shader/glcpp/tests/095-recursive-define.c.expected
new file mode 100644
index 0000000..c7aa18f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/095-recursive-define.c.expected
@@ -0,0 +1,4 @@
+
+
+B(0, C)
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/096-paste-twice.c b/icd/intel/compiler/shader/glcpp/tests/096-paste-twice.c
new file mode 100644
index 0000000..8da756f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/096-paste-twice.c
@@ -0,0 +1,3 @@
+#define paste_twice(a,b,c) a ## b ## c
+paste_twice(just, one, token)
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/096-paste-twice.c.expected b/icd/intel/compiler/shader/glcpp/tests/096-paste-twice.c.expected
new file mode 100644
index 0000000..e401941
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/096-paste-twice.c.expected
@@ -0,0 +1,4 @@
+
+justonetoken
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/097-paste-with-non-function-macro.c b/icd/intel/compiler/shader/glcpp/tests/097-paste-with-non-function-macro.c
new file mode 100644
index 0000000..0f46835
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/097-paste-with-non-function-macro.c
@@ -0,0 +1,3 @@
+#define PASTE_MACRO one ## token
+PASTE_MACRO
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/097-paste-with-non-function-macro.c.expected b/icd/intel/compiler/shader/glcpp/tests/097-paste-with-non-function-macro.c.expected
new file mode 100644
index 0000000..af92187
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/097-paste-with-non-function-macro.c.expected
@@ -0,0 +1,4 @@
+
+onetoken
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/098-elif-undefined.c b/icd/intel/compiler/shader/glcpp/tests/098-elif-undefined.c
new file mode 100644
index 0000000..1f520d4
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/098-elif-undefined.c
@@ -0,0 +1,7 @@
+#if 0
+Not this
+#elif UNDEFINED_MACRO
+Nor this
+#else
+Yes, this.
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/098-elif-undefined.c.expected b/icd/intel/compiler/shader/glcpp/tests/098-elif-undefined.c.expected
new file mode 100644
index 0000000..2af0a12
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/098-elif-undefined.c.expected
@@ -0,0 +1,8 @@
+
+
+
+
+
+Yes, this.
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/099-c99-example.c b/icd/intel/compiler/shader/glcpp/tests/099-c99-example.c
new file mode 100644
index 0000000..d1976b1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/099-c99-example.c
@@ -0,0 +1,17 @@
+#define  x      3
+#define  f(a)   f(x * (a))
+#undef   x
+#define  x      2
+#define  g      f
+#define  z      z[0]
+#define  h      g(~
+#define  m(a)   a(w)
+#define  w      0,1
+#define  t(a)   a
+#define  p()    int
+#define  q(x)   x
+#define  r(x,y) x ## y
+f(y+1) + f(f(z)) % t(t(g)(0) + t)(1);
+g(x +(3,4)-w) | h 5) & m
+       (f)^m(m);
+p() i[q()] = { q(1), r(2,3), r(4,), r(,5), r(,)};
diff --git a/icd/intel/compiler/shader/glcpp/tests/099-c99-example.c.expected b/icd/intel/compiler/shader/glcpp/tests/099-c99-example.c.expected
new file mode 100644
index 0000000..19be750
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/099-c99-example.c.expected
@@ -0,0 +1,17 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+f(2 * (y+1)) + f(2 * (f(2 * (z[0])))) % f(2 * (0)) + t(1);
+f(2 * (2 +(3,4)-0,1)) | f(2 * (~ 5)) & f(2 * (0,1))^m(0,1);
+int i[] = { 1, 23, 4, 5, };
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/100-macro-with-colon.c b/icd/intel/compiler/shader/glcpp/tests/100-macro-with-colon.c
new file mode 100644
index 0000000..31dbb9a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/100-macro-with-colon.c
@@ -0,0 +1,7 @@
+#define one 1
+#define two 2
+
+switch (1) {
+   case one + two:
+      break;
+}
diff --git a/icd/intel/compiler/shader/glcpp/tests/100-macro-with-colon.c.expected b/icd/intel/compiler/shader/glcpp/tests/100-macro-with-colon.c.expected
new file mode 100644
index 0000000..36f98aa
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/100-macro-with-colon.c.expected
@@ -0,0 +1,8 @@
+
+
+
+switch (1) {
+   case 1 + 2:
+      break;
+}
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/101-macros-used-twice.c b/icd/intel/compiler/shader/glcpp/tests/101-macros-used-twice.c
new file mode 100644
index 0000000..e169380
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/101-macros-used-twice.c
@@ -0,0 +1,16 @@
+#define object 1
+#define function(x) 1
+
+#if object
+once
+#endif
+#if object
+twice
+#endif
+
+#if function(0)
+once
+#endif
+#if function(0)
+once again
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/101-macros-used-twice.c.expected b/icd/intel/compiler/shader/glcpp/tests/101-macros-used-twice.c.expected
new file mode 100644
index 0000000..1a4bf15
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/101-macros-used-twice.c.expected
@@ -0,0 +1,17 @@
+
+
+
+
+once
+
+
+twice
+
+
+
+once
+
+
+once again
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/102-garbage-after-endif.c b/icd/intel/compiler/shader/glcpp/tests/102-garbage-after-endif.c
new file mode 100644
index 0000000..301779e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/102-garbage-after-endif.c
@@ -0,0 +1,2 @@
+#if 0
+#endif garbage
diff --git a/icd/intel/compiler/shader/glcpp/tests/102-garbage-after-endif.c.expected b/icd/intel/compiler/shader/glcpp/tests/102-garbage-after-endif.c.expected
new file mode 100644
index 0000000..d9f3bdc
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/102-garbage-after-endif.c.expected
@@ -0,0 +1,2 @@
+0:2(8): preprocessor error: syntax error, unexpected IDENTIFIER, expecting NEWLINE
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/103-garbage-after-else.c b/icd/intel/compiler/shader/glcpp/tests/103-garbage-after-else.c
new file mode 100644
index 0000000..c460fea
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/103-garbage-after-else.c
@@ -0,0 +1,3 @@
+#if 0
+#else garbage
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/103-garbage-after-else.c.expected b/icd/intel/compiler/shader/glcpp/tests/103-garbage-after-else.c.expected
new file mode 100644
index 0000000..f9f5f19
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/103-garbage-after-else.c.expected
@@ -0,0 +1,4 @@
+0:2(7): preprocessor error: syntax error, unexpected IDENTIFIER, expecting NEWLINE
+0:1(7): preprocessor error: Unterminated #if
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/104-hash-line-followed-by-code.c b/icd/intel/compiler/shader/glcpp/tests/104-hash-line-followed-by-code.c
new file mode 100644
index 0000000..3fbeec4
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/104-hash-line-followed-by-code.c
@@ -0,0 +1,2 @@
+#line 2
+int foo();
diff --git a/icd/intel/compiler/shader/glcpp/tests/104-hash-line-followed-by-code.c.expected b/icd/intel/compiler/shader/glcpp/tests/104-hash-line-followed-by-code.c.expected
new file mode 100644
index 0000000..e89a292
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/104-hash-line-followed-by-code.c.expected
@@ -0,0 +1,3 @@
+#line 2
+int foo();
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/105-multiline-hash-line.c b/icd/intel/compiler/shader/glcpp/tests/105-multiline-hash-line.c
new file mode 100644
index 0000000..da156c6
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/105-multiline-hash-line.c
@@ -0,0 +1,5 @@
+#define X(x) x
+#line X(	\
+	1	\
+       )
+#line 2
diff --git a/icd/intel/compiler/shader/glcpp/tests/105-multiline-hash-line.c.expected b/icd/intel/compiler/shader/glcpp/tests/105-multiline-hash-line.c.expected
new file mode 100644
index 0000000..fb8e150
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/105-multiline-hash-line.c.expected
@@ -0,0 +1,6 @@
+
+#line 1
+
+
+#line 2
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/106-multiline-hash-if.c b/icd/intel/compiler/shader/glcpp/tests/106-multiline-hash-if.c
new file mode 100644
index 0000000..929e93e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/106-multiline-hash-if.c
@@ -0,0 +1,6 @@
+#define X(x) x
+#if X(		\
+	1	\
+     )
+int foo();
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/106-multiline-hash-if.c.expected b/icd/intel/compiler/shader/glcpp/tests/106-multiline-hash-if.c.expected
new file mode 100644
index 0000000..6f5ff2e
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/106-multiline-hash-if.c.expected
@@ -0,0 +1,7 @@
+
+
+
+
+int foo();
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/107-multiline-hash-elif.c b/icd/intel/compiler/shader/glcpp/tests/107-multiline-hash-elif.c
new file mode 100644
index 0000000..8c1c67a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/107-multiline-hash-elif.c
@@ -0,0 +1,7 @@
+#define X(x) x
+#if 0
+#elif X(	\
+	1	\
+       )
+int foo();
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/107-multiline-hash-elif.c.expected b/icd/intel/compiler/shader/glcpp/tests/107-multiline-hash-elif.c.expected
new file mode 100644
index 0000000..68d489b
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/107-multiline-hash-elif.c.expected
@@ -0,0 +1,8 @@
+
+
+
+
+
+int foo();
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/108-no-space-after-hash-version.c b/icd/intel/compiler/shader/glcpp/tests/108-no-space-after-hash-version.c
new file mode 100644
index 0000000..0ce36f2
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/108-no-space-after-hash-version.c
@@ -0,0 +1 @@
+#version110
diff --git a/icd/intel/compiler/shader/glcpp/tests/108-no-space-after-hash-version.c.expected b/icd/intel/compiler/shader/glcpp/tests/108-no-space-after-hash-version.c.expected
new file mode 100644
index 0000000..da4544a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/108-no-space-after-hash-version.c.expected
@@ -0,0 +1,2 @@
+0:1(3): preprocessor error: Invalid tokens after #
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/109-no-space-after-hash-line.c b/icd/intel/compiler/shader/glcpp/tests/109-no-space-after-hash-line.c
new file mode 100644
index 0000000..f52966a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/109-no-space-after-hash-line.c
@@ -0,0 +1 @@
+#line2
diff --git a/icd/intel/compiler/shader/glcpp/tests/109-no-space-after-hash-line.c.expected b/icd/intel/compiler/shader/glcpp/tests/109-no-space-after-hash-line.c.expected
new file mode 100644
index 0000000..da4544a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/109-no-space-after-hash-line.c.expected
@@ -0,0 +1,2 @@
+0:1(3): preprocessor error: Invalid tokens after #
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/110-no-space-digits-after-hash-elif.c b/icd/intel/compiler/shader/glcpp/tests/110-no-space-digits-after-hash-elif.c
new file mode 100644
index 0000000..6d7d0f3
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/110-no-space-digits-after-hash-elif.c
@@ -0,0 +1,3 @@
+#if 1
+#elif110
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/110-no-space-digits-after-hash-elif.c.expected b/icd/intel/compiler/shader/glcpp/tests/110-no-space-digits-after-hash-elif.c.expected
new file mode 100644
index 0000000..6d5e9d1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/110-no-space-digits-after-hash-elif.c.expected
@@ -0,0 +1,4 @@
+0:2(2): preprocessor error: Invalid tokens after #
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/111-no-space-operator-after-hash-if.c b/icd/intel/compiler/shader/glcpp/tests/111-no-space-operator-after-hash-if.c
new file mode 100644
index 0000000..b341337
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/111-no-space-operator-after-hash-if.c
@@ -0,0 +1,19 @@
+#if(1)
+success
+#endif
+
+#if+1
+success
+#endif
+
+#if-1
+success
+#endif
+
+#if!1
+success
+#endif
+
+#if~1
+success
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/111-no-space-operator-after-hash-if.c.expected b/icd/intel/compiler/shader/glcpp/tests/111-no-space-operator-after-hash-if.c.expected
new file mode 100644
index 0000000..e083008
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/111-no-space-operator-after-hash-if.c.expected
@@ -0,0 +1,20 @@
+
+success
+
+
+
+success
+
+
+
+success
+
+
+
+
+
+
+
+success
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/112-no-space-operator-after-hash-elif.c b/icd/intel/compiler/shader/glcpp/tests/112-no-space-operator-after-hash-elif.c
new file mode 100644
index 0000000..e8221bc
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/112-no-space-operator-after-hash-elif.c
@@ -0,0 +1,24 @@
+#if 0
+#elif(1)
+success
+#endif
+
+#if 0
+#elif+1
+success
+#endif
+
+#if 0
+#elif-1
+success
+#endif
+
+#if 0
+#elif!1
+success
+#endif
+
+#if 0
+#elif~1
+success
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/112-no-space-operator-after-hash-elif.c.expected b/icd/intel/compiler/shader/glcpp/tests/112-no-space-operator-after-hash-elif.c.expected
new file mode 100644
index 0000000..3b5479a
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/112-no-space-operator-after-hash-elif.c.expected
@@ -0,0 +1,25 @@
+
+
+success
+
+
+
+
+success
+
+
+
+
+success
+
+
+
+
+
+
+
+
+
+success
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/113-line-and-file-macros.c b/icd/intel/compiler/shader/glcpp/tests/113-line-and-file-macros.c
new file mode 100644
index 0000000..369c487
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/113-line-and-file-macros.c
@@ -0,0 +1,7 @@
+1. Number of dalmatians: __LINE__ __FILE__ __LINE__
+2. Nominal visual acuity: __LINE__ __FILE__ / __LINE__ __FILE__
+3. Battle of Thermopylae, as film: __LINE__ __FILE__ __FILE__
+4. HTTP code for "Not Found": __LINE__ __FILE__ __LINE__
+5. Hexadecimal for 20560: __LINE__ __FILE__ __LINE__ __FILE__
+6: Zip code for Nortonville, KS: __LINE__ __LINE__ __FILE__ __LINE__ __FILE__
+7. James Bond, as a number: __FILE__ __FILE__ __LINE__
diff --git a/icd/intel/compiler/shader/glcpp/tests/113-line-and-file-macros.c.expected b/icd/intel/compiler/shader/glcpp/tests/113-line-and-file-macros.c.expected
new file mode 100644
index 0000000..3562fb9
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/113-line-and-file-macros.c.expected
@@ -0,0 +1,8 @@
+1. Number of dalmatians: 1 0 1
+2. Nominal visual acuity: 2 0 / 2 0
+3. Battle of Thermopylae, as film: 3 0 0
+4. HTTP code for "Not Found": 4 0 4
+5. Hexadecimal for 20560: 5 0 5 0
+6: Zip code for Nortonville, KS: 6 6 0 6 0
+7. James Bond, as a number: 0 0 7
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/114-paste-integer-tokens.c b/icd/intel/compiler/shader/glcpp/tests/114-paste-integer-tokens.c
new file mode 100644
index 0000000..d80d9c7
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/114-paste-integer-tokens.c
@@ -0,0 +1,7 @@
+#define PASTE3(a,b,c) a ## b ## c
+#define PASTE4(a,b,c,d) a ## b ## c ## d
+#define PASTE5(a,b,c,d,e) a ## b ## c ## d ## e
+4. HTTP code for "Not Found": PASTE3(__LINE__, __FILE__ , __LINE__)
+5. Hexadecimal for 20560: PASTE4(__LINE__, __FILE__, __LINE__, __FILE__)
+6: Zip code for Nortonville, KS: PASTE5(__LINE__, __LINE__, __FILE__, __LINE__,  __FILE__)
+7. James Bond, as a number: PASTE3(__FILE__, __FILE__, __LINE__)
diff --git a/icd/intel/compiler/shader/glcpp/tests/114-paste-integer-tokens.c.expected b/icd/intel/compiler/shader/glcpp/tests/114-paste-integer-tokens.c.expected
new file mode 100644
index 0000000..a3ad7da
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/114-paste-integer-tokens.c.expected
@@ -0,0 +1,8 @@
+
+
+
+4. HTTP code for "Not Found": 404
+5. Hexadecimal for 20560: 5050
+6: Zip code for Nortonville, KS: 66060
+7. James Bond, as a number: 007
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/115-line-continuations.c b/icd/intel/compiler/shader/glcpp/tests/115-line-continuations.c
new file mode 100644
index 0000000..105590d
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/115-line-continuations.c
@@ -0,0 +1,9 @@
+// This comment continues to the next line, hiding the define \
+#define CONTINUATION_UNSUPPORTED
+
+#ifdef CONTINUATION_UNSUPPORTED
+failure
+#else
+success
+#endif
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/115-line-continuations.c.expected b/icd/intel/compiler/shader/glcpp/tests/115-line-continuations.c.expected
new file mode 100644
index 0000000..f67ba1c
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/115-line-continuations.c.expected
@@ -0,0 +1,10 @@
+
+
+
+
+
+
+success
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/116-disable-line-continuations.c b/icd/intel/compiler/shader/glcpp/tests/116-disable-line-continuations.c
new file mode 100644
index 0000000..83d5ddf
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/116-disable-line-continuations.c
@@ -0,0 +1,13 @@
+// glcpp-args: --disable-line-continuations
+
+// This comment ends with a backslash \\
+#define NO_CONTINUATION
+
+#ifdef NO_CONTINUATION
+success
+#else
+failure
+#endif
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/116-disable-line-continuations.c.expected b/icd/intel/compiler/shader/glcpp/tests/116-disable-line-continuations.c.expected
new file mode 100644
index 0000000..9b9a8c5
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/116-disable-line-continuations.c.expected
@@ -0,0 +1,14 @@
+
+
+
+
+
+
+success
+
+
+
+
+
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/117-line-continuation-and-non-continuation-backslash.c b/icd/intel/compiler/shader/glcpp/tests/117-line-continuation-and-non-continuation-backslash.c
new file mode 100644
index 0000000..6a6f282
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/117-line-continuation-and-non-continuation-backslash.c
@@ -0,0 +1,12 @@
+/* This test case is the minimal case to replicate the bug reported here:
+ *
+ * https://bugs.freedesktop.org/show_bug.cgi?id=65112
+ *
+ * To trigger the bug, there must be a line-continuation sequence
+ * (backslash newline), then an additional newline character, and
+ * finally another backslash that is not part of a line-continuation
+ * sequence.
+ */
+\
+
+/* \ */
diff --git a/icd/intel/compiler/shader/glcpp/tests/117-line-continuation-and-non-continuation-backslash.c.expected b/icd/intel/compiler/shader/glcpp/tests/117-line-continuation-and-non-continuation-backslash.c.expected
new file mode 100644
index 0000000..292d651
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/117-line-continuation-and-non-continuation-backslash.c.expected
@@ -0,0 +1,13 @@
+ 
+
+
+
+
+
+
+
+
+
+
+ 
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/118-comment-becomes-space.c b/icd/intel/compiler/shader/glcpp/tests/118-comment-becomes-space.c
new file mode 100644
index 0000000..53e8039
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/118-comment-becomes-space.c
@@ -0,0 +1,4 @@
+#define FOO first/*
+*/second
+
+FOO
diff --git a/icd/intel/compiler/shader/glcpp/tests/118-comment-becomes-space.c.expected b/icd/intel/compiler/shader/glcpp/tests/118-comment-becomes-space.c.expected
new file mode 100644
index 0000000..2adf5d1
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/118-comment-becomes-space.c.expected
@@ -0,0 +1,5 @@
+
+
+
+first second
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/118-multiple-else.c b/icd/intel/compiler/shader/glcpp/tests/118-multiple-else.c
new file mode 100644
index 0000000..62ad49c
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/118-multiple-else.c
@@ -0,0 +1,6 @@
+#if 0
+#else
+int foo;
+#else
+int bar;
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/118-multiple-else.c.expected b/icd/intel/compiler/shader/glcpp/tests/118-multiple-else.c.expected
new file mode 100644
index 0000000..eaec481
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/118-multiple-else.c.expected
@@ -0,0 +1,8 @@
+0:4(1): preprocessor error: multiple #else
+
+
+int foo;
+
+int bar;
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/119-elif-after-else.c b/icd/intel/compiler/shader/glcpp/tests/119-elif-after-else.c
new file mode 100644
index 0000000..9b9e923
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/119-elif-after-else.c
@@ -0,0 +1,6 @@
+#if 0
+#else
+int foo;
+#elif 0
+int bar;
+#endif
diff --git a/icd/intel/compiler/shader/glcpp/tests/119-elif-after-else.c.expected b/icd/intel/compiler/shader/glcpp/tests/119-elif-after-else.c.expected
new file mode 100644
index 0000000..33f0513
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/119-elif-after-else.c.expected
@@ -0,0 +1,8 @@
+0:4(1): preprocessor error: #elif after #else
+
+
+int foo;
+
+int bar;
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/121-comment-bug-72686.c b/icd/intel/compiler/shader/glcpp/tests/121-comment-bug-72686.c
new file mode 100644
index 0000000..67ebe73
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/121-comment-bug-72686.c
@@ -0,0 +1,2 @@
+/*
+ */ //
diff --git a/icd/intel/compiler/shader/glcpp/tests/121-comment-bug-72686.c.expected b/icd/intel/compiler/shader/glcpp/tests/121-comment-bug-72686.c.expected
new file mode 100644
index 0000000..402a763
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/121-comment-bug-72686.c.expected
@@ -0,0 +1,3 @@
+  
+
+
diff --git a/icd/intel/compiler/shader/glcpp/tests/glcpp-test b/icd/intel/compiler/shader/glcpp/tests/glcpp-test
new file mode 100644
index 0000000..2d2687f
--- /dev/null
+++ b/icd/intel/compiler/shader/glcpp/tests/glcpp-test
@@ -0,0 +1,95 @@
+#!/bin/sh
+
+if [ ! -z "$srcdir" ]; then
+   testdir=$srcdir/glcpp/tests
+   glcpp=`pwd`/glcpp/glcpp
+else
+   testdir=.
+   glcpp=../glcpp
+fi
+
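+# On interrupt, remove any partially written valgrind log for the current
+# test and report failure.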
+trap 'rm $test.valgrind-errors; exit 1' INT QUIT
+
+usage ()
+{
+    cat <<EOF
+Usage: glcpp-test [options...]
+
+Run the test suite for Mesa's GLSL pre-processor.
+
+Valid options include:
+
+	--valgrind	Run the test suite a second time under valgrind
+EOF
+}
+
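+# A test may embed its own glcpp arguments in a comment of the form
+#     // glcpp-args: --disable-line-continuations
+# Everything after "glcpp-args:" is passed through to glcpp for that test.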
+test_specific_args ()
+{
+    test="$1"
+
+    grep 'glcpp-args:' "$test" | sed -e 's,^.*glcpp-args: *,,'
+}
+
+# Parse command-line options
+for option; do
+    if [ "${option}" = '--help' ] ; then
+	usage
+	exit 0
+    elif [ "${option}" = '--valgrind' ] ; then
+	do_valgrind=yes
+    else
+	echo "Unrecognized option: $option" >&2
+	echo >&2
+	usage
+	exit 1
+    fi
+done
+
+total=0
+pass=0
+clean=0
+
+echo "====== Testing for correctness ======"
+for test in $testdir/*.c; do
+    echo -n "Testing $test..."
+    $glcpp $(test_specific_args $test) < $test > $test.out 2>&1
+    total=$((total+1))
+    if cmp $test.expected $test.out >/dev/null 2>&1; then
+	echo "PASS"
+	pass=$((pass+1))
+    else
+	echo "FAIL"
+	diff -u $test.expected $test.out
+    fi
+done
+
+echo ""
+echo "$pass/$total tests returned correct results"
+echo ""
+
+if [ "$do_valgrind" = "yes" ]; then
+    echo "====== Testing for valgrind cleanliness ======"
+    for test in $testdir/*.c; do
+	echo -n "Testing $test with valgrind..."
+	valgrind --error-exitcode=31 --log-file=$test.valgrind-errors $glcpp $(test_specific_args $test) < $test >/dev/null 2>&1
+	if [ "$?" = "31" ]; then
+	    echo "ERRORS"
+	    cat $test.valgrind-errors
+	else
+	    echo "CLEAN"
+	    clean=$((clean+1))
+	    rm $test.valgrind-errors
+	fi
+    done
+
+    echo ""
+    echo "$pass/$total tests returned correct results"
+    echo "$clean/$total tests are valgrind-clean"
+fi
+
+if [ "$pass" = "$total" ] && [ "$do_valgrind" != "yes" ] || [ "$pass" = "$total" ]; then
+    exit 0
+else
+    exit 1
+fi
+
diff --git a/icd/intel/compiler/shader/glsl_glass_backend.cpp b/icd/intel/compiler/shader/glsl_glass_backend.cpp
new file mode 100644
index 0000000..6b3d7e7
--- /dev/null
+++ b/icd/intel/compiler/shader/glsl_glass_backend.cpp
@@ -0,0 +1,51 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Steve K <srk@LunarG.com>
+ *
+ */
+
+//===- glsl_glass_backend.cpp - Mesa customization of gla::BackEnd -----===//
+//
+// Customization of gla::BackEnd for Mesa
+//
+//===-------------------------------------------------------------------===//
+
+#include "glsl_glass_backend.h"
+
+namespace gla {
+
+//
+// factory for the Mesa LunarGLASS backend
+//
+BackEnd* GetMesaGlassBackEnd(const EShLanguage language)
+{
+    return new MesaGlassBackEnd(language);
+}
+
+void ReleaseMesaGlassBackEnd(BackEnd* backEnd)
+{
+    delete backEnd;
+}
+
+} // namespace gla
diff --git a/icd/intel/compiler/shader/glsl_glass_backend.h b/icd/intel/compiler/shader/glsl_glass_backend.h
new file mode 100644
index 0000000..c668406
--- /dev/null
+++ b/icd/intel/compiler/shader/glsl_glass_backend.h
@@ -0,0 +1,139 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Steve K <srk@LunarG.com>
+ *
+ */
+
+//===- glsl_glass_backend.h - Mesa customization of gla::BackEnd -----===//
+//
+// Customization of gla::BackEnd for Mesa
+//
+//===----------------------------------------------------------------------===//
+
+#include "Core/Backend.h"
+#include "glslang/Public/ShaderLang.h"
+
+namespace gla {
+
+class MesaGlassBackEnd : public gla::BackEnd {
+public:
+   MesaGlassBackEnd(const EShLanguage l) :
+       language(l)
+   {
+      // LunarGLASS decomposition.  Ask for everything except the below.
+      for (int d = 0; d < EDiCount; ++d)
+         decompose[d] = true;
+
+      // Turn off some decompositions we don't need
+      decompose[EDiDot]               =
+      decompose[EDiMin]               =
+      decompose[EDiMax]               =
+      decompose[EDiExp]               =
+      decompose[EDiLog]               =
+      decompose[EDiSign]              =
+      decompose[EDiAny]               =
+      decompose[EDiAll]               =
+      decompose[EDiNot]               =
+      decompose[EDiFraction]          =
+      decompose[EDiInverseSqrt]       =
+      decompose[EDiFma]               =
+      decompose[EDiModF]              =
+      decompose[EDiMix]               =
+      decompose[EDiFixedTransform]    =
+      decompose[EDiPackUnorm2x16]     =
+      decompose[EDiPackUnorm4x8]      =
+      decompose[EDiPackSnorm4x8]      =
+      decompose[EDiUnpackUnorm2x16]   =
+      decompose[EDiUnpackUnorm4x8]    =
+      decompose[EDiUnpackSnorm4x8]    =
+      decompose[EDiPackDouble2x32]    =
+      decompose[EDiUnpackDouble2x32]  =
+      decompose[EDiPowi]              =
+      decompose[EDiAsin]              =
+      decompose[EDiAcos]              =
+      decompose[EDiAtan]              =
+      decompose[EDiAtan2]             =
+      decompose[EDiSinh]              =
+      decompose[EDiCosh]              =
+      decompose[EDiTanh]              =
+      decompose[EDiASinh]             =
+      decompose[EDiACosh]             =
+      decompose[EDiATanh]             =
+      decompose[EDiIsNan]             =
+      decompose[EDiTextureProjection] = 
+      decompose[EDiRefract]           = 
+      decompose[EDiFaceForward]       =
+         false;
+
+      // Explicitly set the remaining ones for clarity: FilterWidth is
+      // decomposed for us, Clamp is not, even though the loop above already
+      // requests decomposition by default.
+      decompose[EDiClamp]       = false;
+      decompose[EDiFilterWidth] = true;
+   }
+   
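+   // Report the register form we want from LunarGLASS: one outer (SoA)
+   // slot with four inner (AoS) components, i.e. conventional vec4-style
+   // registers for every stage.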
+   void getRegisterForm(int& outerSoA, int& innerAoS)
+   {
+       switch (language)
+       {
+           case EShLangVertex:
+           case EShLangTessControl:
+           case EShLangTessEvaluation:
+           case EShLangGeometry:
+           case EShLangFragment:
+           case EShLangCompute:
+               outerSoA = 1;
+               innerAoS = 4;
+               break;
+           default:
+               assert(0); // TODO: error handling here
+       }
+   }
+
+   // We don't want phi functions
+   bool getRemovePhiFunctions() { return true; }
+
+   
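+   // With phi functions removed, ask for the phi copies to be declared up front.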
+   bool getDeclarePhiCopies() { return true; }
+
+   // Not all backends yet support conditional discards.  Ask LunarGLASS to
+   // avoid them.
+   bool hoistDiscards() { return false; }
+
+   // Ask LunarGlass for mat*vec & vec*mat intrinsics
+   bool useColumnBasedMatrixIntrinsics() { return true; }
+
+   bool useLogicalIo() { return true; }
+
+private:
+   const EShLanguage language;
+};
+
+//
+// factory for the Mesa LunarGLASS backend
+//
+BackEnd* GetMesaGlassBackEnd(const EShLanguage);
+void ReleaseMesaGlassBackEnd(BackEnd*);
+
+} // namespace gla
diff --git a/icd/intel/compiler/shader/glsl_glass_backend_translator.cpp b/icd/intel/compiler/shader/glsl_glass_backend_translator.cpp
new file mode 100644
index 0000000..b4a658e
--- /dev/null
+++ b/icd/intel/compiler/shader/glsl_glass_backend_translator.cpp
@@ -0,0 +1,4116 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: GregF <greg@LunarG.com>
+ * Author: Steve K <srk@LunarG.com>
+ *
+ */
+
+//===- glsl_glass_backend_translator.cpp - Mesa customization of gla::BackEndTranslator -----===//
+//
+// Customization of gla::BackEndTranslator for Mesa
+//
+//===----------------------------------------------------------------------------------------===//
+
+// LunarGLASS includes
+#include "Core/Revision.h"
+#include "Core/Exceptions.h"
+#include "Core/Util.h"
+#include "Core/BottomIR.h"
+#include "Core/Backend.h"
+#include "Core/PrivateManager.h"
+#include "Core/Options.h"
+#include "Core/metadata.h"
+#include "Core/Util.h"
+#include "glsl_glass_backend_translator.h"
+
+// Mesa includes
+#include "main/shaderobj.h"
+#include "ir.h"
+#include "glsl_parser_extras.h"
+#include "ast.h"
+
+// LLVM includes
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/Support/InstIterator.h"
+#include "llvm/Support/raw_ostream.h"
+#include "glslang/Public/ShaderLang.h"
+
+// Private includes
+#include <map>
+#include <iostream>
+
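+// Forward declaration of the Mesa HIR function emitter.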
+void emit_function(_mesa_glsl_parse_state *state, ir_function *f);
+
+// Anonymous namespace for function local things
+namespace {
+   using namespace llvm;
+
+   // **** Borrowed & modified from LunarGlass GLSL backend: ****
+   class MetaType {
+   public:
+      MetaType() : precision(gla::EMpNone),
+                   typeLayout(gla::EMtlNone),
+                   interpMethod(gla::EIMNone),
+                   interpLocation(gla::EILFragment),
+                   qualifier(gla::EVQNone),
+                   location(-1),
+                   binding(-1),
+                   matrix(false), notSigned(false), block(false), mdAggregate(0), mdSampler(0) { }
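+      // Note: builtIn and ioKind are not initialized here; the Crack*Md
+      // decoders below are expected to populate them.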
+      std::string                 name;
+      gla::EMdPrecision           precision;
+      gla::EMdBuiltIn             builtIn;
+      gla::EMdInputOutput         ioKind;
+      gla::EMdTypeLayout          typeLayout;
+      gla::EInterpolationMethod   interpMethod;
+      gla::EInterpolationLocation interpLocation;
+      gla::EVariableQualifier     qualifier;
+      int                         location;
+      int                         binding;
+      bool                        matrix;
+      bool                        notSigned;
+      bool                        block;
+      const llvm::MDNode*         mdAggregate;
+      const llvm::MDNode*         mdSampler;
+   };
+
+   // **** Borrowed & modified from LunarGlass GLSL backend original: ****
+   // Process the mdNode, decoding all type information and emitting qualifiers.
+   // Returning false means there was a problem.
+   bool decodeMdTypesEmitMdQualifiers(bool ioRoot, const llvm::MDNode* mdNode, const llvm::Type*& type, bool arrayChild, MetaType& metaType)
+   {
+      using namespace gla;
+
+      if (ioRoot) {
+         llvm::Type* proxyType;
+         unsigned int proxyQualifiers;
+         int proxyOffset;
+         int interpMode;
+         if (! CrackIOMd(mdNode, metaType.name, metaType.ioKind, proxyType, metaType.typeLayout,
+                         metaType.precision, metaType.location, metaType.mdSampler, metaType.mdAggregate, interpMode, metaType.builtIn, metaType.binding, proxyQualifiers, proxyOffset)) {
+            return false;
+         }
+
+         metaType.block =
+            metaType.ioKind == EMioUniformBlockMember ||
+            metaType.ioKind == EMioBufferBlockMember  ||
+            metaType.ioKind == EMioPipeOutBlock       ||
+            metaType.ioKind == EMioPipeInBlock;
+
+         if (type == 0)
+            type = proxyType;
+
+         // emit interpolation qualifier, if appropriate
+         switch (metaType.ioKind) {
+         case EMioPipeIn:   metaType.qualifier = EVQInput;   break;
+         case EMioPipeOut:  metaType.qualifier = EVQOutput;  break;
+         default:           metaType.qualifier = EVQUndef;   break;
+         }
+         if (metaType.qualifier != EVQUndef)
+            // interpMode is initialized by the call to CrackIOMd above.
+            #if defined(__GNUC__)
+            #pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
+            #endif
+            CrackInterpolationMode(interpMode, metaType.interpMethod, metaType.interpLocation);
+      } else {
+         if (! CrackAggregateMd(mdNode, metaType.name, metaType.typeLayout,
+                                metaType.precision, metaType.location, metaType.mdSampler, metaType.builtIn))
+            return false;
+         metaType.mdAggregate = mdNode;
+      }
+
+      metaType.matrix = metaType.typeLayout == EMtlRowMajorMatrix || metaType.typeLayout == EMtlColMajorMatrix;
+      metaType.notSigned = metaType.typeLayout == EMtlUnsigned;
+
+      return true;
+   }
+
+   // **** Borrowed from LunarGlass GLSL Backend
+   void StripSuffix(std::string& name, const char* suffix)
+   {
+      const int newSize = name.length() - strlen(suffix);
+      if (newSize < 0)
+         return;
+
+      if (name.compare(newSize, strlen(suffix), suffix) == 0)
+         name.resize(newSize);
+   }
+
+   // **** Borrowed from LunarGlass GLSL Backend
+   gla::EVariableQualifier MapGlaAddressSpace(const llvm::Value* value)
+   {
+      using namespace gla;
+      if (const llvm::PointerType* pointer = llvm::dyn_cast<llvm::PointerType>(value->getType())) {
+          switch (pointer->getAddressSpace()) {
+          case ResourceAddressSpace:
+              return EVQUniform;
+          case GlobalAddressSpace:
+              return EVQGlobal;
+          default:
+              if (pointer->getAddressSpace() >= ConstantAddressSpaceBase)
+                  return EVQUniform;
+
+              UnsupportedFunctionality("Address Space in Bottom IR: ", pointer->getAddressSpace());
+              break;
+          }
+      }
+
+      if (llvm::isa<llvm::Instruction>(value))
+          return EVQTemporary;
+
+      // Check for an undef before a constant (since Undef is a
+      // subclass of Constant)
+      if (AreAllUndefined(value)) {
+          return EVQUndef;
+      }
+
+      if (llvm::isa<llvm::Constant>(value)) {
+          return EVQConstant;
+      }
+
+      return EVQTemporary;
+   }
+
+
+   // **** Borrowed & modified from LunarGlass GLSL Backend
+   // Whether the given intrinsic's specified operand is the same as the passed
+   // value, and its type is a vector.
+   bool IsSameSource(const llvm::Value *source, const llvm::Value *prevSource)
+   {
+       return source && prevSource == source &&
+          source->getType()->getTypeID() == llvm::Type::VectorTyID;
+   }
+
+
+   //
+   // **** Borrowed from LunarGlass GLSL Backend
+   // Figure out how many I/O slots 'type' would fill up.
+   //
+   int CountSlots(const llvm::Type* type)
+   {
+      if (type->getTypeID() == llvm::Type::VectorTyID)
+         return 1;
+      else if (type->getTypeID() == llvm::Type::ArrayTyID) {
+         const llvm::ArrayType* arrayType = llvm::dyn_cast<const llvm::ArrayType>(type);
+         return (int)arrayType->getNumElements() * CountSlots(arrayType->getContainedType(0));
+      } else if (type->getTypeID() == llvm::Type::StructTyID) {
+         const llvm::StructType* structType = llvm::dyn_cast<const llvm::StructType>(type);
+         int slots = 0;
+         for (unsigned int f = 0; f < structType->getStructNumElements(); ++f)
+            slots += CountSlots(structType->getContainedType(f));
+
+         return slots;
+      }
+
+      return 1;
+   }
+
+   // Sampler types grew to include image types with formats.  We don't have image support yet.
+   static const int NumSamplerTypes     = 40;
+   static const int NumSamplerBaseTypes = 3;
+   static const int NumSamplerDims      = 7;
+   static const glsl_type* SamplerTypes[NumSamplerTypes][NumSamplerBaseTypes][NumSamplerDims][true+1][true+1];
+
+   // C++ before 11 doesn't have a static_assert.  This is a hack to provide
+   // one.  If/when this code is ever migrated to C++11, remove this, and use
+   // the language's native static_assert.
+   template <bool> struct ersatz_static_assert;
+   template <> struct ersatz_static_assert<true> { bool used; };
+
+   // If this fails during compilation, it means LunarGlass has added new
+   // sampler types we must react to.  Also, see comment above ersatz_static_assert.
+   ersatz_static_assert<NumSamplerTypes     == gla::EMsCount &&
+                        NumSamplerBaseTypes == gla::EMsbCount &&
+                        NumSamplerDims      == gla::EMsdCount> SamplerTypeMismatchWithLunarGlass;
+
+   /**
+    * -----------------------------------------------------------------------------
+    * Convert sampler type info from LunarGlass Metadata to glsl_type
+    * -----------------------------------------------------------------------------
+    */
+   const glsl_type* GetSamplerType(gla::EMdSampler samplerType,
+                                   gla::EMdSamplerBaseType baseType,
+                                   gla::EMdSamplerDim samplerDim,
+                                   bool isShadow, bool isArray)
+   {
+      // Bounds check so we don't overflow array!  The bools can't overflow.
+      if (samplerType >= NumSamplerTypes     ||
+          baseType    >= NumSamplerBaseTypes ||
+          samplerDim  >= NumSamplerDims) {
+         assert(0 && "Sampler type conversion error");
+         return 0;
+      }
+
+      return SamplerTypes[samplerType][baseType][samplerDim][isShadow][isArray];
+   }
+
+
+   /**
+    * -----------------------------------------------------------------------------
+    * Convert interpolation qualifiers to HIR version
+    * -----------------------------------------------------------------------------
+    */
+   glsl_interp_qualifier InterpolationQualifierToIR(gla::EInterpolationMethod im)
+   {
+      switch (im) {
+      case gla::EIMNoperspective: return INTERP_QUALIFIER_NOPERSPECTIVE;
+      case gla::EIMNone:          return INTERP_QUALIFIER_FLAT;
+      case gla::EIMSmooth:        return INTERP_QUALIFIER_SMOOTH;
+      default:                    return INTERP_QUALIFIER_NONE;
+      }
+   }
+
+   /**
+    * -----------------------------------------------------------------------------
+    * Convert type layout to HIR version
+    * -----------------------------------------------------------------------------
+    */
+    glsl_interface_packing TypeLayoutToIR(gla::EMdTypeLayout layout)
+    {
+       switch (layout) {
+       case gla::EMtlShared: return GLSL_INTERFACE_PACKING_SHARED;
+       case gla::EMtlPacked: return GLSL_INTERFACE_PACKING_PACKED;
+       case gla::EMtlStd430: assert(0 && "No HIR support yet");
+       case gla::EMtlStd140: // fall through...
+       default:              return GLSL_INTERFACE_PACKING_STD140;
+       }
+    }
+
+
+   /**
+    * -----------------------------------------------------------------------------
+    * Convert block mode
+    * -----------------------------------------------------------------------------
+    */
+    ir_variable_mode VariableQualifierToIR(gla::EVariableQualifier qualifier)
+    {
+       // TODO: ir_var_system_value
+
+       switch (qualifier) {
+       case gla::EVQUniform:   return ir_var_uniform;
+       case gla::EVQInput:     return ir_var_shader_in;
+       case gla::EVQOutput:    return ir_var_shader_out;
+       case gla::EVQConstant:  return ir_var_const_in;
+       case gla::EVQTemporary: return ir_var_temporary;
+       case gla::EVQGlobal:    return ir_var_auto;
+       default:                return ir_var_auto;
+       }
+    }
+
+
+   /**
+    * -----------------------------------------------------------------------------
+    * Deduce whether a metadata node is IO metadata or an aggregate.  Does not
+    * grok other MD types.
+    * -----------------------------------------------------------------------------
+    */
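+    // Per the checks below: IO metadata has a Value at operand 2 and an
+    // MDNode at operand 3; IO aggregate metadata carries a further MDNode
+    // at operand 4.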
+    inline bool isIoMd(const llvm::MDNode* mdNode)
+    {
+       return mdNode &&
+              mdNode->getNumOperands() > 3 &&
+              llvm::dyn_cast<const llvm::Value>(mdNode->getOperand(2)) &&
+              llvm::dyn_cast<const llvm::MDNode>(mdNode->getOperand(3));
+    }
+
+    inline bool isIoAggregateMd(const llvm::MDNode* mdNode)
+    {
+       return isIoMd(mdNode) && mdNode->getNumOperands() > 4 &&
+              llvm::dyn_cast<const llvm::MDNode>(mdNode->getOperand(4));
+    }
+
+} // anonymous namespace
+
+
+namespace gla {
+
+void MesaGlassTranslator::initSamplerTypes()
+{
+   using namespace gla;
+   memset(SamplerTypes, 0, sizeof(SamplerTypes));
+
+   SamplerTypeMismatchWithLunarGlass.used = true; // just to quiet a compiler warning about unused vars
+
+   // TODO: verify these, and fill out all types
+   // TODO: need EMsImage types in addition to EMsTexture
+
+   //           TYPE        BASETYPE   DIM        SHADOW  ARRAY    GLSL_SAMPLER_TYPE
+   SamplerTypes[EMsTexture][EMsbFloat][EMsd1D]    [false][false] = glsl_type::sampler1D_type;
+   SamplerTypes[EMsTexture][EMsbFloat][EMsd2D]    [false][false] = glsl_type::sampler2D_type;
+   SamplerTypes[EMsTexture][EMsbFloat][EMsd3D]    [false][false] = glsl_type::sampler3D_type;
+   SamplerTypes[EMsTexture][EMsbFloat][EMsdCube]  [false][false] = glsl_type::samplerCube_type;
+   SamplerTypes[EMsTexture][EMsbFloat][EMsdRect]  [false][false] = glsl_type::sampler2DRect_type;
+   SamplerTypes[EMsTexture][EMsbFloat][EMsdBuffer][false][false] = glsl_type::samplerBuffer_type;
+
+   SamplerTypes[EMsTexture][EMsbInt]  [EMsd1D]    [false][false] = glsl_type::isampler1D_type;
+   SamplerTypes[EMsTexture][EMsbInt]  [EMsd2D]    [false][false] = glsl_type::isampler2D_type;
+   SamplerTypes[EMsTexture][EMsbInt]  [EMsd3D]    [false][false] = glsl_type::isampler3D_type;
+   SamplerTypes[EMsTexture][EMsbInt]  [EMsdCube]  [false][false] = glsl_type::isamplerCube_type;
+   SamplerTypes[EMsTexture][EMsbInt]  [EMsdRect]  [false][false] = glsl_type::isampler2DRect_type;
+   SamplerTypes[EMsTexture][EMsbInt]  [EMsdBuffer][false][false] = glsl_type::isamplerBuffer_type;
+
+   SamplerTypes[EMsTexture][EMsbUint] [EMsd1D]    [false][false] = glsl_type::usampler1D_type;
+   SamplerTypes[EMsTexture][EMsbUint] [EMsd2D]    [false][false] = glsl_type::usampler2D_type;
+   SamplerTypes[EMsTexture][EMsbUint] [EMsd3D]    [false][false] = glsl_type::usampler3D_type;
+   SamplerTypes[EMsTexture][EMsbUint] [EMsdCube]  [false][false] = glsl_type::usamplerCube_type;
+   SamplerTypes[EMsTexture][EMsbUint] [EMsdRect]  [false][false] = glsl_type::usampler2DRect_type;
+   SamplerTypes[EMsTexture][EMsbUint] [EMsdBuffer][false][false] = glsl_type::usamplerBuffer_type;
+
+   SamplerTypes[EMsTexture][EMsbFloat][EMsd1D]    [true] [false] = glsl_type::sampler1DShadow_type;
+   SamplerTypes[EMsTexture][EMsbFloat][EMsd2D]    [true] [false] = glsl_type::sampler2DShadow_type;
+   SamplerTypes[EMsTexture][EMsbFloat][EMsdCube]  [true] [false] = glsl_type::samplerCubeShadow_type;
+   SamplerTypes[EMsTexture][EMsbFloat][EMsdRect]  [true] [false] = glsl_type::sampler2DRectShadow_type;
+
+   SamplerTypes[EMsTexture][EMsbFloat][EMsd1D]    [true] [true]  = glsl_type::sampler1DArrayShadow_type;
+   SamplerTypes[EMsTexture][EMsbFloat][EMsd2D]    [true] [true]  = glsl_type::sampler2DArrayShadow_type;
+   SamplerTypes[EMsTexture][EMsbFloat][EMsdCube]  [true] [true]  = glsl_type::samplerCubeArrayShadow_type;
+
+   SamplerTypes[EMsTexture][EMsbInt][EMsd1D]      [false][true]  = glsl_type::isampler1DArray_type;
+   SamplerTypes[EMsTexture][EMsbInt][EMsd2D]      [false][true]  = glsl_type::isampler2DArray_type;
+   SamplerTypes[EMsTexture][EMsbInt][EMsdCube]    [false][true]  = glsl_type::isamplerCubeArray_type;
+
+   SamplerTypes[EMsTexture][EMsbUint][EMsd1D]     [false][true]  = glsl_type::usampler1DArray_type;
+   SamplerTypes[EMsTexture][EMsbUint][EMsd2D]     [false][true]  = glsl_type::usampler2DArray_type;
+   SamplerTypes[EMsTexture][EMsbUint][EMsdCube]   [false][true]  = glsl_type::usamplerCubeArray_type;
+
+   SamplerTypes[EMsTexture][EMsbFloat][EMsd1D]    [false][true]  = glsl_type::sampler1DArray_type;
+   SamplerTypes[EMsTexture][EMsbFloat][EMsd2D]    [false][true]  = glsl_type::sampler2DArray_type;
+   SamplerTypes[EMsTexture][EMsbFloat][EMsdCube]  [false][true]  = glsl_type::samplerCubeArray_type;
+}
+
+/**
+ * -----------------------------------------------------------------------------
+ * Clean up
+ * -----------------------------------------------------------------------------
+ */
+MesaGlassTranslator::~MesaGlassTranslator()
+{
+   while (!toDelete.empty()) {
+      delete toDelete.back();
+      toDelete.pop_back();
+   }
+}
+
+// **** Borrowed from LunarGlass GLSL backend: ****
+// 'gep' is potentially a gep, either an instruction or a constantExpr.
+// See which one, if any, and return it.
+// Return 0 if not a gep.
+const llvm::GetElementPtrInst* MesaGlassTranslator::getGepAsInst(const llvm::Value* gep)
+{
+   const llvm::GetElementPtrInst* gepInst = llvm::dyn_cast<const llvm::GetElementPtrInst>(gep);
+   if (gepInst)
+      return gepInst;
+
+   // LLVM isn't always const correct.  I believe getAsInstruction() doesn't
+   // modify the original, so this is safe.
+   llvm::ConstantExpr *constantGep = const_cast<llvm::ConstantExpr*>(llvm::dyn_cast<llvm::ConstantExpr>(gep));
+
+   if (constantGep) {
+      const llvm::Instruction *instruction = constantGep->getAsInstruction();
+      toDelete.push_back(instruction);
+      gepInst = llvm::dyn_cast<const llvm::GetElementPtrInst>(instruction);
+   }
+
+   return gepInst;
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Handle error message
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::error(const char* msg) const
+{
+   infoLog += "ERROR: ";
+   infoLog += msg;
+   infoLog += "\n";
+
+   state->error = true;
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * initialize translation state
+ * cribbed from BottomToGLSL
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::start(llvm::Module& module)
+{
+   countReferences(module);
+
+   int mdInt;
+   switch (EShLanguage(manager->getStage())) {
+   case EShLangVertex:
+       break;
+
+   case EShLangGeometry:
+
+       // input primitives are not optional
+       state->gs_input_prim_type_specified = true;
+
+       switch (GetMdNamedInt(module, gla::InputPrimitiveMdName)) {
+       case EMlgPoints:
+           state->in_qualifier->prim_type = GL_POINTS;
+           break;
+       case EMlgLines:
+           state->in_qualifier->prim_type = GL_LINES;
+           break;
+       case EMlgLinesAdjacency:
+           state->in_qualifier->prim_type = GL_LINES_ADJACENCY;
+           break;
+       case EMlgTriangles:
+           state->in_qualifier->prim_type = GL_TRIANGLES;
+           break;
+       case EMlgTrianglesAdjacency:
+           state->in_qualifier->prim_type = GL_TRIANGLES_ADJACENCY;
+           break;
+       default:
+           assert(0 && "Unknown geometry shader input primitive");
+           break;
+       }
+
+       // invocations is optional
+       mdInt = GetMdNamedInt(module, gla::InvocationsMdName);
+       if (mdInt) {
+           state->in_qualifier->flags.q.invocations = true;
+           state->in_qualifier->invocations = mdInt;
+       }
+
+       // output primitives are not optional
+       state->out_qualifier->flags.q.prim_type = true;
+
+       switch (GetMdNamedInt(module, gla::OutputPrimitiveMdName)) {
+       case EMlgPoints:
+           state->out_qualifier->prim_type = GL_POINTS;
+           break;
+       case EMlgLineStrip:
+           state->out_qualifier->prim_type = GL_LINE_STRIP;
+           break;
+       case EMlgTriangleStrip:
+           state->out_qualifier->prim_type = GL_TRIANGLE_STRIP;
+           break;
+       default:
+           assert(0 && "Unknown geometry shader output primitive");
+           break;
+       }
+
+       // max_vertices is not optional
+       state->out_qualifier->flags.q.max_vertices = true;
+       state->out_qualifier->max_vertices = GetMdNamedInt(module, gla::NumVerticesMdName);
+
+       break;
+
+   case EShLangFragment:
+
+      state->fs_pixel_center_integer = GetMdNamedInt(module, PixelCenterIntegerMdName) ? 1 : 0;
+      state->fs_origin_upper_left    = GetMdNamedInt(module, OriginUpperLeftMdName)    ? 1 : 0;
+
+      break;
+
+   default:
+       assert(0 && "Unsupported stage");
+       break;
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * initialize translation state
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::initializeTranslation(gl_context *ctx, _mesa_glsl_parse_state *state, gl_shader *shader)
+{
+   infoLog.clear();
+
+   setContext(ctx);
+   setParseState(state);
+   setShader(shader);
+
+   ralloc_free(shader->ir);
+   shader->ir = new(shader) exec_list;
+
+   _mesa_glsl_initialize_types(state);
+   _mesa_glsl_initialize_variables(shader->ir, state);
+
+   // Seed our globalDeclMap with whatever _mesa_glsl_initialize_variables did.
+   seedGlobalDeclMap();
+
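+   // GLSL 1.10 kept user-defined functions and variables in separate namespaces.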
+   state->symbols->separate_function_namespace = state->language_version == 110;
+
+   state->current_function = NULL;
+
+   state->toplevel_ir = shader->ir;
+
+   instructionStack.push_back(shader->ir);
+
+   state->gs_input_prim_type_specified = false;
+   state->cs_input_local_size_specified = false;
+
+   /* Section 4.2 of the GLSL 1.20 specification states:
+    * "The built-in functions are scoped in a scope outside the global scope
+    *  users declare global variables in.  That is, a shader's global scope,
+    *  available for user-defined functions and global variables, is nested
+    *  inside the scope containing the built-in functions."
+    *
+    * Since built-in functions like ftransform() access built-in variables,
+    * it follows that those must be in the outer scope as well.
+    *
+    * We push scope here to create this nesting effect...but don't pop.
+    * This way, a shader's globals are still in the symbol table for use
+    * by the linker.
+    */
+   state->symbols->push_scope();
+
+   // Initialize builtin functions we might use (tx sampling, etc)
+   _mesa_glsl_initialize_builtin_functions();
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * finalize translation state
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::finalizeTranslation()
+{
+   /* Figure out if gl_FragCoord is actually used in fragment shader */
+   ir_variable *const var = state->symbols->get_variable("gl_FragCoord");
+   if (var != NULL)
+      state->fs_uses_gl_fragcoord = var->data.used;
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Seed the global declaration map from the initial declarations
+ * from _mesa_glsl_initialize_variables.
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::seedGlobalDeclMap()
+{
+   foreach_list_safe(node, shader->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var)
+         globalDeclMap[var->name] = var;
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * We count the references of each lvalue to know when to generate
+ * assignments and when to directly create tree nodes
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::countReferences(const llvm::Module& module)
+{
+   using namespace llvm;
+
+   // Iterate over all functions in module
+   for (Module::const_iterator func = module.begin(); func != module.end(); ++func) {
+      for (Function::const_iterator block = func->begin(); block != func->end(); ++block) {
+         for (BasicBlock::const_iterator inst = block->begin(); inst != block->end(); ++inst) {
+
+            // Keep track of which local values should be signed ints
+            switch ((&*inst)->getOpcode()) {
+            case llvm::Instruction::ZExt:   // fall through...
+            case llvm::Instruction::FPToSI: sintValues.insert(&*inst);                  break;
+            case llvm::Instruction::SIToFP: sintValues.insert((&*inst)->getOperand(0)); break;
+            case llvm::Instruction::SRem:   // fall through...
+            case llvm::Instruction::SDiv:
+               sintValues.insert((&*inst));
+               sintValues.insert((&*inst)->getOperand(0));
+               sintValues.insert((&*inst)->getOperand(1));
+               break;
+            default: break;
+            }
+
+            // Iterate over all operands in the instruction
+            for (User::const_op_iterator op = inst->op_begin(), opEnd = inst->op_end();
+                 op != opEnd; ++op) {
+
+               refCountMap[*op]++;
+
+               const llvm::Instruction* opInst = dyn_cast<llvm::Instruction>(op);
+
+               // If defined in different block than this use, force refcount large
+               // so we don't move the code across flow control.
+               if (opInst && opInst->getParent() != inst->getParent())
+                  refCountMap[*op] = 65536;
+            }
+         }
+      }
+   }
+
+   // For debugging: change to true to dump ref counts
+   static const bool debugReferenceCounts = false;
+
+   if (debugReferenceCounts) {
+      FILE* out = stderr;
+      std::string name;
+      llvm::raw_string_ostream nameStream(name);
+
+      fprintf(out, "RValue Ref Counts\n%-10s : %3s\n",
+              "RValue", "#");
+
+      for (tRefCountMap::const_iterator ref = refCountMap.begin();
+           ref != refCountMap.end(); ++ref) {
+         name.clear();
+         // ref->first->printAsOperand(nameStream);
+         fprintf(out, "%-10s : %3d\n", ref->first->getName().str().c_str(), ref->second);
+      }
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Return true if we have a valueMap entry for this LLVM value
+ * -----------------------------------------------------------------------------
+ */
+inline bool MesaGlassTranslator::valueEntryExists(const llvm::Value* value) const
+{
+   return valueMap.find(value) != valueMap.end();
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Return ref count of an rvalue
+ * -----------------------------------------------------------------------------
+ */
+
+inline unsigned MesaGlassTranslator::getRefCount(const llvm::Value* value) const
+{
+   const tRefCountMap::const_iterator found = refCountMap.find(value);
+
+   // Return 0 if not found, else the refcount
+   return found == refCountMap.end() ? 0 : found->second;
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Encapsulate creation of variables
+ * -----------------------------------------------------------------------------
+ */
+inline ir_variable*
+MesaGlassTranslator::newIRVariable(const glsl_type* type, const char* name,
+                                   int mode, bool declare)
+{
+   // If we have a global decl, use that.  HIR form expects use of the same
+   // ir_variable node as the global decl used.
+   const tGlobalDeclMap::const_iterator globalDecl = globalDeclMap.find(name);
+   if (globalDecl != globalDeclMap.end())
+      return globalDecl->second;
+
+   // ir_variable constructor strdups the name, so it's OK if it points to something ephemeral
+   ir_variable* var = new(shader) ir_variable(type, name, ir_variable_mode(mode));
+   var->data.used = true;
+
+   // TODO: handle interpolation modes, locations, precisions, etc etc etc
+   //       poke them into the right bits in the ir_variable
+   // var->data.centroid =
+   // var->data.sample =
+   // var->data.q.*
+   // var->data.invariant
+   // var->data.binding
+   // var->data.varying
+   // var->data.how_declared = ir_var_declared_normally;
+
+   // TODO: also set for consts and vertex attrs per comment ir.h:153?
+   const bool readOnly = (mode == ir_var_uniform);
+   if (readOnly)
+      var->data.read_only = true;
+
+   // TODO: To avoid redeclaring builtins, we have a hack to look at the name.
+   // Something better should be done here.  Because of short circuit evaluation,
+   // this is safe against name strings that aren't 3 characters long.
+   const bool isBuiltin = name && name[0] == 'g' && name[1] == 'l' && name[2] == '_';
+
+   // Declare the variable by adding it as either a global or local to the instruction list
+   const bool global = (mode == ir_var_uniform   ||
+                        mode == ir_var_shader_in ||
+                        mode == ir_var_shader_out);
+
+   if (!isBuiltin && (!global || declare))
+      addIRInstruction(var, global);
+
+   // Declare globals in the global decl map
+   if (declare)
+      globalDeclMap[name] = var;
+
+   return var;
+}
+
+inline ir_variable*
+MesaGlassTranslator::newIRVariable(const glsl_type* type, const std::string& name,
+                                   int mode, bool declare)
+{
+   return newIRVariable(type, name.c_str(), mode, declare);
+}
+
+inline ir_variable*
+MesaGlassTranslator::newIRVariable(const llvm::Type* type,
+                                   const llvm::Value* value,
+                                   const char* name,
+                                   int mode, bool declare)
+{
+   return newIRVariable(llvmTypeToHirType(type, 0, value), name, mode, declare);
+}
+
+inline ir_variable*
+MesaGlassTranslator::newIRVariable(const llvm::Type* type,
+                                   const llvm::Value* value, 
+                                   const std::string& name,
+                                   int mode, bool declare)
+{
+   return newIRVariable(llvmTypeToHirType(type, 0, value), name.c_str(), mode, declare);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Encapsulate creation of variable dereference
+ * -----------------------------------------------------------------------------
+ */
+inline ir_dereference*
+MesaGlassTranslator::newIRVariableDeref(const glsl_type* type, const char* name, int mode, bool declare)
+{
+   return new(shader) ir_dereference_variable(newIRVariable(type, name, mode, declare));
+}
+
+inline ir_dereference*
+MesaGlassTranslator::newIRVariableDeref(const glsl_type* type, const std::string& name, int mode, bool declare)
+{
+   return newIRVariableDeref(type, name.c_str(), mode, declare);
+}
+
+inline ir_dereference*
+MesaGlassTranslator::newIRVariableDeref(const llvm::Type* type,
+                                        const llvm::Value* value, 
+                                        const char* name, int mode,
+                                        bool declare)
+{
+   return newIRVariableDeref(llvmTypeToHirType(type, 0, value), name, mode, declare);
+}
+
+inline ir_dereference*
+MesaGlassTranslator::newIRVariableDeref(const llvm::Type* type,
+                                        const llvm::Value* value,
+                                        const std::string& name, int mode,
+                                        bool declare)
+{
+   return newIRVariableDeref(llvmTypeToHirType(type, 0, value), name.c_str(), mode, declare);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit IR texture intrinsics
+ *   (declare (uniform ) sampler2D Diffuse)
+ *   (call texture (var_ref texture_retval)  ((var_ref Diffuse) (var_ref VTexcoord) ))
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRTexture(const llvm::IntrinsicInst* llvmInst, bool gather)
+{
+   const unsigned texFlags        = GetConstantInt(llvmInst->getOperand(GetTextureOpIndex(ETOFlag)));
+   const llvm::Value* samplerType = llvmInst->getOperand(0);
+
+   std::string txName;
+   txName.reserve(20);
+
+   // Original-style shadowing returns vec4 while 2nd-generation shadowing
+   // returns float, so we have to stick to the old style in those cases.
+   const bool forceOldStyle = IsVector(llvmInst->getType()) && (texFlags & ETFShadow) && ((texFlags & ETFGather) == 0);
+
+   if (state->language_version >= 130 && !forceOldStyle) {
+      if (texFlags & ETFFetch)
+         txName = "texelFetch";
+      else
+         txName = "texture";
+   } else {
+      if (texFlags & ETFShadow)
+         txName = "shadow";
+      else
+         txName = "texture";
+
+      const int sampler = GetConstantInt(samplerType);
+
+      switch (sampler) {
+      case ESampler1D:        txName += "1D";     break;
+      case ESampler2D:        txName += "2D";     break;
+      case ESampler3D:        txName += "3D";     break;
+      case ESamplerCube:      txName += "Cube";   break;
+      case ESampler2DRect:    txName += "2DRect"; break;
+      default: error("Unexpected sampler type");  break;
+      }
+   }
+
+   if (texFlags & ETFProjected)
+      txName += "Proj";
+
+   // This is OK on the stack, because the ir_call constructor will move out of it.
+   exec_list parameters;
+
+   // Sampler operand
+   parameters.push_tail(getIRValue(llvmInst->getOperand(GetTextureOpIndex(ETOSamplerLoc))));
+
+   // Coordinate
+   parameters.push_tail(getIRValue(llvmInst->getOperand(GetTextureOpIndex(ETOCoord))));
+
+   // RefZ
+   if (texFlags & ETFRefZArg) {
+      parameters.push_tail(getIRValue(llvmInst->getOperand(GetTextureOpIndex(ETORefZ))));
+   }
+
+   // LOD
+   if (texFlags & ETFLod) {
+      txName += "Lod";
+      parameters.push_tail(getIRValue(llvmInst->getOperand(GetTextureOpIndex(ETOBiasLod))));
+   }
+
+   // dPdX/dPdY
+   if (IsGradientTexInst(llvmInst)) {
+      txName += "GradARB";
+      parameters.push_tail(getIRValue(llvmInst->getOperand(GetTextureOpIndex(ETODPdx))));
+      parameters.push_tail(getIRValue(llvmInst->getOperand(GetTextureOpIndex(ETODPdy))));
+   }
+
+   if (texFlags & ETFGather) {
+      txName += "Gather";
+   }
+
+   // Offsets
+   if (texFlags & ETFOffsetArg) {
+      if (texFlags & ETFOffsets) {
+         assert(0 && "TODO: Handle offsets"); // TODO;
+         txName += "Offsets";
+      } else {
+         txName += "Offset";
+         parameters.push_tail(getIRValue(llvmInst->getOperand(GetTextureOpIndex(ETOOffset))));
+      }
+   }
+
+   // BiasLOD
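+   // The ETOBiasLod operand slot carries either an explicit bias (a bias/LOD
+   // argument without ETFLod set) or a gather component argument.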
+   if (((texFlags & ETFBiasLodArg) != 0 && (texFlags & ETFLod) == 0) ||
+       (texFlags & ETFComponentArg)) {
+      parameters.push_tail(getIRValue(llvmInst->getOperand(GetTextureOpIndex(ETOBiasLod))));
+   }
+
+   // Find the right function signature to call
+   // This sets state->uses_builtin_functions
+   ir_function_signature *sig =
+      _mesa_glsl_find_builtin_function(state, txName.c_str(), &parameters);
+
+   if (sig == 0) {
+      return error("no matching texture signature found");
+   }
+
+   const std::string retName = std::string(txName) + "_retval";
+   ir_dereference* dest = newIRVariableDeref(llvmInst->getType(), llvmInst, retName, ir_var_auto);
+
+   ir_call *call = new(shader) ir_call(sig, dest->as_dereference_variable(), &parameters);
+
+   // TODO: Should we insert a prototype?
+
+   addIRInstruction(llvmInst, call);
+}
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit IR texture query intrinsics
+ *    (declare (uniform ) sampler2D surface)
+ *    (declare () ivec2 textureSize_retval)
+ *    (call textureSize (var_ref textureSize_retval)  ((var_ref surface) (var_ref lod) ))
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRTextureQuery(const llvm::IntrinsicInst* llvmInst)
+{
+   const char* name;
+
+   // This is OK on the stack, because the ir_call constructor will move out of it.
+   exec_list parameters;
+
+   // Sampler operand
+   parameters.push_tail(getIRValue(llvmInst->getOperand(GetTextureOpIndex(ETOSamplerLoc))));
+
+   // Inspired by LunarGlass GLSL backend
+   switch (llvmInst->getIntrinsicID()) {
+   case llvm::Intrinsic::gla_queryTextureSize:
+   case llvm::Intrinsic::gla_queryTextureSizeNoLod:
+      name = "textureSize";
+      if (llvmInst->getNumArgOperands() > 2)
+         parameters.push_tail(getIRValue(llvmInst->getOperand(2)));
+      break;
+
+   case llvm::Intrinsic::gla_fQueryTextureLod:
+      name = "textureLod";
+      parameters.push_tail(getIRValue(llvmInst->getOperand(2)));
+      break;
+
+   default:
+      name = "error";
+      error("unexpected texture query intrinsic");
+      break;
+   }
+
+   // The rest of this function is very similar to emitIRTexture
+
+   // Find the right function signature to call
+   // This sets state->uses_builtin_functions
+   ir_function_signature *sig =
+      _mesa_glsl_find_builtin_function(state, name, &parameters);
+
+   if (sig == 0) {
+      return error("no matching texture signature found");
+   }
+
+   const std::string retName = std::string(name) + "_retval";
+   ir_dereference* dest = newIRVariableDeref(llvmInst->getType(), llvmInst, retName, ir_var_auto);
+
+   ir_call *call = new(shader) ir_call(sig, dest->as_dereference_variable(), &parameters);
+
+   // TODO: Should we insert a prototype?
+
+   addIRInstruction(llvmInst, call);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Add a plain (non-conditional) discard
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addDiscard()
+{
+   addIRInstruction(new(shader) ir_discard);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Translate structs
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addStructType(llvm::StringRef, const llvm::Type*)
+{
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Translate const globals
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addGlobalConst(const llvm::GlobalVariable* global)
+{
+   return addGlobal(global);
+}
+
+/**
+ * -----------------------------------------------------------------------------
+ * Translate globals
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addGlobal(const llvm::GlobalVariable* global)
+{
+   const gla::EVariableQualifier qualifier = MapGlaAddressSpace(global);
+
+   switch (qualifier) {
+   case gla::EVQUniform:
+   case gla::EVQGlobal:
+   case gla::EVQTemporary: break;
+   default: return;  // nothing to do for these
+   }
+
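+   // A global's LLVM type is a pointer to its contents; unwrap to the pointee.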
+   llvm::Type* type;
+   if (const llvm::PointerType* pointer = llvm::dyn_cast<llvm::PointerType>(global->getType()))
+      type = pointer->getContainedType(0);
+   else
+      type = global->getType();
+
+   const ir_variable_mode irMode   = VariableQualifierToIR(qualifier);
+   const std::string      name     = global->getName();
+
+   // Remember the ir_variable_mode for this declaration
+   if (globalVarModeMap.find(name) == globalVarModeMap.end()) {
+      globalVarModeMap[name] = irMode;
+   } else {
+      // If it already existed in the map, it's due to an IO declaration,
+      // and we don't want to step on that, so bail out.
+      return;
+   }
+
+   // interface block members are hoisted to global scope in HIR.  Thus, we do
+   // not add globals for them.
+   if (anonBlocks.find(name) != anonBlocks.end())
+      return;
+
+   ir_rvalue*             varDeref = newIRVariableDeref(type, 0, name, irMode, true);
+   ir_variable*           var      = varDeref->as_dereference_variable()->variable_referenced();
+
+   // add initializers to main prologue or variable (for uniforms)
+   if (global->hasInitializer()) {
+      var->data.has_initializer = true;
+      ir_constant* constVal = newIRConstant(global->getInitializer());
+
+      if (qualifier == gla::EVQUniform) {
+         // Create uniform initializers
+         var->constant_value       = constVal;
+         var->constant_initializer = constVal->clone(shader, 0);
+      } else {
+         // Non-uniforms get explicit initialization
+         prologue.push_back(new(shader) ir_assignment(varDeref, constVal));
+      }
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Create global IR declarations for shaders ins, outs, and uniforms
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addIoDeclaration(gla::EVariableQualifier qualifier,
+                                           const llvm::MDNode* mdNode)
+{
+   const llvm::Type* mdRawType = 0;
+   MetaType metaType;
+   decodeMdTypesEmitMdQualifiers(true, mdNode, mdRawType, false, metaType);
+   const llvm::Type* mdType = mdRawType->getContainedType(0);
+
+   std::string name = metaType.name;
+   bool anonymous = false;
+
+   // The name may be missing either because the block is anonymous or because
+   // SPIR-V debug strings were stripped.  In that case, grab a name from the
+   // metadata type.
+   std::string bname;
+   if (name.empty()) {
+       name = mdNode->getOperand(2)->getName();
+       StripSuffix(name, "_typeProxy");
+       StripSuffix(name, "_shadow");
+       if (metaType.ioKind == gla::EMioUniformBlockMember)
+           anonymous = true;
+   }
+
+   // If it's a builtin, the name might not be correct due to SPIR-V stripping.
+   // Make sure we are using the standard name as the rest of the driver relies on this.
+   switch (metaType.builtIn) {
+   case EmbPosition:         bname = "gl_Position";         break;
+   case EmbPointSize:        bname = "gl_PointSize";        break;
+   case EmbClipDistance:     bname = "gl_ClipDistance";     break;
+   case EmbCullDistance:     bname = "gl_CullDistance";     break;
+   case EmbFragDepth:        bname = "gl_FragDepth";        break;
+   case EmbVertexId:         bname = "gl_VertexID";         break;
+   case EmbVertexIndex:      bname = "gl_VertexID";         break;
+   case EmbPrimitiveId:      bname = "gl_PrimitiveID";      break;
+   case EmbInstanceId:       bname = "gl_InstanceID";       break;
+   case EmbFace:             bname = "gl_FrontFacing";      break;
+   case EmbFragCoord:        bname = "gl_FragCoord";        break;
+   case EmbPointCoord:       bname = "gl_PointCoord";       break;
+   case EmbSampleId:         bname = "gl_SampleID";         break;
+   case EmbLayer:            bname = "gl_Layer";            break;
+   case EmbSamplePosition:   bname = "gl_SamplePosition";   break;
+   case EmbViewportIndex:    bname = "gl_ViewportIndex";    break;
+   case EmbHelperInvocation: bname = "gl_HelperInvocation"; break;
+   case EmbSampleMask:       bname = (metaType.ioKind == gla::EMioPipeIn)
+                                         ? "gl_SampleMaskIn"
+                                         : "gl_SampleMask";  break;
+   case EmbNone:
+       if (metaType.ioKind == gla::EMioPipeInBlock)
+            bname = "gl_in";
+       break;
+   default:
+       // Any builtins not in GLSL 4.5 will not come through SPIR-V and will
+       // have the correct name.
+       break;
+   } // switch
+
+   const glsl_type*       irInterfaceType = llvmTypeToHirType(mdType, mdNode);
+   const ir_variable_mode irVarMode       = VariableQualifierToIR(qualifier);
+
+   if (!bname.empty()) {
+       nameBuiltinMap[name] = bname;
+       globalVarModeMap[name] = irVarMode;
+       name = bname;
+   }
+
+   // Register name -> metadata mapping for this declaration
+   typenameMdMap[name]    = mdNode;
+   globalVarModeMap[name] = irVarMode;
+
+   // Because of short circuit evaluation, this is safe
+   const bool isPerVertex = irInterfaceType->name && strcmp(irInterfaceType->name, "gl_PerVertex") == 0;
+
+   // Mesa hoists interface block members to top level IO
+   if (isPerVertex) {
+      anonBlocks.insert(name);
+      return;
+   }
+
+   if (anonymous) {
+      // Create IR declaration for the interface
+      // For anonymous blocks, HIR hoists members to the global scope.  This is awkward.
+      if (irInterfaceType->is_interface()) {
+         anonBlocks.insert(name);
+
+         for (unsigned field=0; field < irInterfaceType->length; ++field) {
+            ir_variable* var = newIRVariable(irInterfaceType->fields.structure[field].type,
+                                             irInterfaceType->fields.structure[field].name,
+                                             irVarMode, true);
+
+            var->data.how_declared = ir_var_declared_in_block;
+            var->reinit_interface_type(irInterfaceType);
+
+            // TODO: irInterfaceType->fields.structure[field].row_major;
+
+            // location should not be set on a field of an anonymous block
+            assert(irInterfaceType->fields.structure[field].location == -1);
+
+            // Look up the binding info for each member
+            // LunarG TODO: This looks like a no-op
+            MetaType metaType;
+            decodeMdTypesEmitMdQualifiers(isIoMd(mdNode), mdNode, mdType, false, metaType);
+
+            if (metaType.binding != -1) {
+                var->data.explicit_binding = true;
+                // Note: binding is now two 16-bit values combined
+                var->data.binding = metaType.binding;
+            }
+         }
+      }
+
+      // TODO: how to set this appropriately?
+      //    var->data.how_declared = ir_var_declared_implicitly;
+   } else {
+      // Create IR declaration for the interface
+      newIRVariable(irInterfaceType, name, irVarMode, true);
+
+      // Register interface block, if it is one.  We want to use the interface
+      // name, not the type name, in this registration.
+      if (irInterfaceType->is_interface()) {
+         if (!state->symbols->add_interface(irInterfaceType->name, irInterfaceType, irVarMode)) {
+            return error("unable to add interface");
+         }
+      }
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Translate function arguments
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addArgument(const llvm::Value* value, bool /* last */)
+{
+   fnParameters.push_tail(getIRValue(value));
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Reset any pending function state
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::resetFnTranslationState()
+{
+   // We don't free memory for the fn name, because someone else owns it
+   fnName = 0;
+
+   // Destroy any pending fn return type
+   delete fnReturnType;
+   fnReturnType = 0;
+
+   // Destroy function signature
+   if (fnSignature) {
+      ralloc_free(fnSignature);
+      fnSignature = 0;
+   }
+
+   // Destroy function
+   if (fnFunction) {
+      ralloc_free(fnFunction);
+      fnFunction = 0;
+   }
+
+   // empty parameter list
+   fnParameters.make_empty();
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Convert struct types.  May call llvmTypeToHirType recursively.
+ * -----------------------------------------------------------------------------
+ */
+const glsl_type*
+MesaGlassTranslator::convertStructType(const llvm::StructType* structType,
+                                       llvm::StringRef         name,
+                                       llvm::StringRef         blockName,
+                                       const llvm::MDNode*     mdNode,
+                                       gla::EMdTypeLayout      parentTypeLayout,
+                                       gla::EVariableQualifier parentQualifier,
+                                       bool                    isBlock)
+{
+   const int containedTypeCount = structType->getNumContainedTypes();
+
+   // Allocate an array of glsl_struct_fields of size to hold all subtypes
+   glsl_struct_field *const fields = ralloc_array(shader, glsl_struct_field, containedTypeCount);
+
+   for (int index = 0; index < containedTypeCount; ++index) {
+      const llvm::MDNode* subMdAggregate =
+         mdNode ? llvm::dyn_cast<llvm::MDNode>(mdNode->getOperand(GetAggregateMdSubAggregateOp(index))) : 0;
+
+      std::string subName;
+      subName.reserve(20);
+
+      const llvm::Type* containedType  = structType->getContainedType(index);
+
+      MetaType metaType;
+      if (subMdAggregate)
+         decodeMdTypesEmitMdQualifiers(false, subMdAggregate, containedType, false, metaType);
+
+      // If there's metadata, use that field name.  Else, make up a field name.
+      if (mdNode) {
+         subName = mdNode->getOperand(GetAggregateMdNameOp(index))->getName().str();
+         if (subName.empty()) {
+             if (subMdAggregate && metaType.builtIn != EmbNone) {
+                 std::string bname;
+                 switch (metaType.builtIn) {
+                 case EmbPosition:     bname = "gl_Position";     break;
+                 case EmbPointSize:    bname = "gl_PointSize";    break;
+                 case EmbClipDistance: bname = "gl_ClipDistance"; break;
+                 default: error("unhandled builtIn");
+                 } // switch
+                 subName.assign(bname);
+             } else {
+                char anonFieldName[20]; // snprintf bounds the write; long block names are truncated
+                snprintf(anonFieldName, sizeof(anonFieldName), "%s%s%d", blockName.str().c_str(), "_gg_", index);
+                subName.assign(anonFieldName);
+                name = blockName;
+             }
+         }
+      } else {
+         char anonFieldName[20]; // enough for "anon_" + digits to hold maxint
+         snprintf(anonFieldName, sizeof(anonFieldName), "anon_%d", index);
+         subName.assign(anonFieldName);
+      }
+
+      // TODO: set arrayChild, etc properly
+
+      fields[index].name          = ralloc_strdup(shader, subName.c_str());
+      fields[index].type          = llvmTypeToHirType(containedType, subMdAggregate);
+
+      if (subMdAggregate) {
+
+         if (metaType.location < gla::MaxUserLayoutLocation) {
+            fields[index].location      = metaType.location;
+         } else {
+            fields[index].location      = -1;
+         }
+
+         fields[index].interpolation = InterpolationQualifierToIR(metaType.interpMethod);
+         fields[index].centroid      = metaType.interpLocation == EILCentroid;
+         fields[index].sample        = metaType.interpLocation == EILSample;
+         fields[index].row_major     = metaType.typeLayout == EMtlRowMajorMatrix;
+      }
+
+      // TODO: handle qualifiers
+   }
+
+   if (isBlock) {
+      return glsl_type::get_interface_instance(fields, containedTypeCount,
+                                               TypeLayoutToIR(parentTypeLayout),
+                                               ralloc_strdup(shader, name.str().c_str()));
+   } else {
+      return glsl_type::get_record_instance(fields, containedTypeCount,
+                                            ralloc_strdup(shader, name.str().c_str()));
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Convert an LLVM type to an HIR type
+ * -----------------------------------------------------------------------------
+ */
+const glsl_type*
+MesaGlassTranslator::llvmTypeToHirType(const llvm::Type* type,
+                                       const llvm::MDNode* mdNode,
+                                       const llvm::Value*  llvmValue,
+                                       const bool genUnsigned)
+{
+   // Find any aggregate metadata we may have stored away in a prior declaration
+   const tMDMap::const_iterator aggMdFromMap = typeMdAggregateMap.find(type);
+   const llvm::MDNode* mdAggNode = (aggMdFromMap != typeMdAggregateMap.end()) ? aggMdFromMap->second : 0;
+
+   if (!mdNode)
+      mdNode = mdAggNode;
+
+   const tTypeData typeData(type, mdNode, genUnsigned);
+
+   // See if we've already converted this type.  If so, use the one we did already.
+   const tTypeMap::const_iterator foundType = typeMap.find(typeData);
+
+   if (foundType != typeMap.end())
+      return foundType->second;
+   
+   // Otherwise, we must convert it.
+   const glsl_type* return_type = glsl_type::error_type;
+
+   const EMdTypeLayout mdType = mdNode ? GetMdTypeLayout(mdNode) : gla::EMtlNone;
+
+   const bool signedInt = !genUnsigned &&
+           ((llvmValue && sintValues.find(llvmValue) != sintValues.end()) ||
+               (mdType != EMtlUnsigned));
+
+   // TODO: handle precision, etc
+   switch (type->getTypeID())
+   {
+   case llvm::Type::VectorTyID:
+      {
+         glsl_base_type glslBaseType;
+
+         if (type->getContainedType(0) == type->getFloatTy(type->getContext()))
+            glslBaseType = GLSL_TYPE_FLOAT;
+         else if (gla::IsBoolean(type->getContainedType(0)))
+            glslBaseType = GLSL_TYPE_BOOL;
+         else if (gla::IsInteger(type->getContainedType(0)))
+            glslBaseType = signedInt ? GLSL_TYPE_INT : GLSL_TYPE_UINT;
+         else 
+            glslBaseType = GLSL_TYPE_VOID;
+
+         // TODO: Ugly const_cast necessary here at the moment.  Fix.
+         const int componentCount = gla::GetComponentCount(const_cast<llvm::Type*>(type));
+         return_type = glsl_type::get_instance(glslBaseType, componentCount, 1);
+         break;
+      }
+ 
+   case llvm::Type::ArrayTyID:
+      {
+         const llvm::ArrayType* arrayType     = llvm::dyn_cast<const llvm::ArrayType>(type);
+         const llvm::Type*      containedType = arrayType->getContainedType(0);
+         const int              arraySize     = arrayType->getNumElements();
+         assert(arrayType);
+
+         const bool isMat = (mdType == EMtlRowMajorMatrix || mdType == EMtlColMajorMatrix);
+
+         // Reconstruct matrix types from arrays of vectors, per metadata hints
+         if (isMat) {
+            const llvm::VectorType* vectorType = llvm::dyn_cast<llvm::VectorType>(containedType);
+
+            // For arrays of arrays of vectors, we want one more level of dereference.
+            if (vectorType)
+               return glslMatType(arraySize, vectorType->getNumElements());
+         }
+
+         // Metadata type will be the same
+         const glsl_type* containedIRType = llvmTypeToHirType(containedType, mdNode, llvmValue);
+
+         return_type = glsl_type::get_array_instance(containedIRType, arraySize);
+         break;
+      }
+
+   case llvm::Type::StructTyID:
+      {
+         const llvm::StructType* structType = llvm::dyn_cast<const llvm::StructType>(type);
+         assert(structType);
+
+         llvm::StringRef structName = structType->isLiteral() ? "" : structType->getName();
+
+         // Check for a top level uniform/input/output MD with an aggregate MD hanging off it
+         // TODO: set arrayChild properly
+         llvm::StringRef blockName;
+         std::string name;
+         MetaType metaType;
+         if (mdNode) {
+             name = mdNode->getOperand(2)->getName();
+             StripSuffix(name, "_typeProxy");
+             StripSuffix(name, "_shadow");
+             blockName = name;
+
+             decodeMdTypesEmitMdQualifiers(isIoMd(mdNode), mdNode, type, false, metaType);
+
+             if (metaType.ioKind == EMioPipeOutBlock)
+                 structName = "gl_PerVertex";
+             else if (metaType.ioKind == EMioPipeInBlock)
+                 structName = "gl_PerVertex.0";
+
+             // Convert IO metadata to aggregate metadata if needed.
+             if (isIoAggregateMd(mdNode))
+                mdNode = llvm::dyn_cast<const llvm::MDNode>(mdNode->getOperand(4));
+
+             // track the mapping between the type and its aggregate metadata
+             typeMdAggregateMap[structType] = mdNode;
+         }
+
+         return_type = convertStructType(structType, structName, blockName, mdNode,
+                                         metaType.typeLayout,
+                                         metaType.qualifier,
+                                         metaType.block);
+         break;
+      }
+
+   case llvm::Type::PointerTyID:
+      // TODO:
+      assert(0 && "unimplemented");
+      break;
+
+   case llvm::Type::IntegerTyID:
+      {
+         // Sampler type conversions
+         if (mdType == EMtlSampler) {
+            EMdSampler         mdSampler;
+            llvm::Type*        mdSamplerType; // named to avoid shadowing the enclosing mdType
+            EMdSamplerDim      mdSamplerDim;
+            bool               isArray;
+            bool               isShadow;
+            EMdSamplerBaseType mdBaseType;
+
+            // Handle aggregate member form
+            const int typeNodePos = isIoMd(mdNode) ? 3 : 1;
+
+            const llvm::MDNode* typeMdNode    = llvm::dyn_cast<const llvm::MDNode>(mdNode->getOperand(typeNodePos));
+            assert(typeMdNode);
+            const llvm::MDNode* samplerMdNode = llvm::dyn_cast<const llvm::MDNode>(typeMdNode->getOperand(3));
+            assert(samplerMdNode);
+
+            if (gla::CrackSamplerMd(samplerMdNode, mdSampler, mdSamplerType, mdSamplerDim, isArray, isShadow, mdBaseType))
+               return GetSamplerType(mdSampler, mdBaseType, mdSamplerDim, isShadow, isArray);
+         }
+
+         glsl_base_type baseType = signedInt ? GLSL_TYPE_INT : GLSL_TYPE_UINT;
+
+         if (gla::IsBoolean(type))
+            baseType = GLSL_TYPE_BOOL;
+
+         return_type = glsl_type::get_instance(baseType, 1, 1);
+         break;
+      }
+
+   case llvm::Type::FloatTyID:
+      return_type = glsl_type::get_instance(GLSL_TYPE_FLOAT, 1, 1);
+      break;
+
+   case llvm::Type::VoidTyID:
+      return_type = glsl_type::get_instance(GLSL_TYPE_VOID, 1, 1);
+      break;
+
+   default:
+      error("Unexpected LLVM type");
+      return_type = glsl_type::get_instance(GLSL_TYPE_FLOAT, 1, 1);
+   }
+
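+   // Memoize the conversion so subsequent lookups return the same glsl_type.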
+   return typeMap[typeData] = return_type;
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Translate function declaration to HIR
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::startFunctionDeclaration(const llvm::Type* type, llvm::StringRef name)
+{
+   // Initialize function state: we only ever translate one function at a time, as
+   // they cannot be nested.
+   resetFnTranslationState();
+   
+   // TODO: create parameter list, etc from
+   // ir_function and ir_function_signature
+
+   fnName       = ralloc_strdup(shader, name.str().c_str());
+   fnReturnType = llvmTypeToHirType(type->getContainedType(0));
+
+   // Create fn and signature objects
+   fnFunction = new(shader) ir_function(fnName);
+   fnSignature = new(shader) ir_function_signature(fnReturnType);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Finish function declaration
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::endFunctionDeclaration()
+{
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Start translation of function body
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::startFunctionBody()
+{
+   assert(state->current_function == NULL);
+
+   // Start building new instruction list
+   instructionStack.push_back(&fnSignature->body);
+
+   if (!state->symbols->add_function(fnFunction)) {
+      // Shouldn't fail, because semantic checking done by glslang
+      return error("unable to add function");
+   }
+
+   emit_function(state, fnFunction);
+   fnFunction->add_signature(fnSignature);
+
+   // Transfer the fn parameters to the HIR signature
+   fnSignature->replace_parameters(&fnParameters);
+
+   state->current_function = fnSignature;
+   state->symbols->push_scope();
+
+   // For main, prepend prologue, if any
+   while (!prologue.empty()) {
+      addIRInstruction(prologue.front());
+      prologue.pop_front();
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * End translation of function body
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::endFunctionBody()
+{
+   fnSignature->is_defined = true; // tell linker function is defined
+
+   assert(instructionStack.size() >= 2);  // at least 1 for shader, 1 for function
+   instructionStack.pop_back();
+
+   state->current_function = 0; // Reset things set in startFunctionBody
+   state->symbols->pop_scope(); // ...
+
+   // We've now handed off (to the HIR) ownership of these bits of translation
+   // state, so don't free them!
+   fnName = 0;
+   fnReturnType = 0;
+   fnSignature = 0;
+   fnFunction = 0;
+   fnParameters.make_empty();
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Create IR constant value from Mesa constant
+ * -----------------------------------------------------------------------------
+ */
+inline ir_rvalue* MesaGlassTranslator::newIRGlobal(const llvm::Value* value, const char* name)
+{
+   // TODO: maybe this can't be reached?  If we always see a Load first, that'll
+   // take care of it.
+   assert(0);
+   return 0;
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * See if this constant is a zero
+ * -----------------------------------------------------------------------------
+ */
+inline bool MesaGlassTranslator::isConstantZero(const llvm::Constant* constant) const
+{
+   // Code snippet borrowed from LunarGlass GLSL backend
+   if (!constant || IsUndef(constant))
+      return true;
+   else if (llvm::isa<llvm::ConstantAggregateZero>(constant))
+      return true;
+   else
+      return false;
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Create simple scalar constant
+ * -----------------------------------------------------------------------------
+ */
+template <typename T> inline T
+MesaGlassTranslator::newIRScalarConstant(const llvm::Constant* constant) const
+{
+   if (isConstantZero(constant))
+       return T(0);
+
+   const llvm::Type* type = constant->getType();
+
+   switch (type->getTypeID()) {
+   case llvm::Type::IntegerTyID:
+      {
+         if (gla::IsBoolean(type)) {
+            if (GetConstantInt(constant))
+               return T(true);
+            else
+               return T(false);
+         } else
+            return T(GetConstantInt(constant));
+      }
+
+   case llvm::Type::FloatTyID:
+      return T(GetConstantFloat(constant));
+
+   default:
+      error("error in constant conversion");
+      return T(0);
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Create IR constant value from Mesa constant
+ * -----------------------------------------------------------------------------
+ */
+inline ir_constant* MesaGlassTranslator::newIRConstant(const llvm::Value* value)
+{
+   const llvm::Constant* constant = llvm::dyn_cast<llvm::Constant>(value);
+   assert(constant);
+
+   const llvm::Type*     type     = constant->getType();
+
+   // Handle undefined constant values
+   if (llvm::isa<llvm::UndefValue>(constant))
+      return addIRUndefined(type);
+
+   const bool isZero = isConstantZero(constant);
+
+   if (isZero)
+      return ir_constant::zero(shader, llvmTypeToHirType(type));
+
+   switch (type->getTypeID())
+   {
+   case llvm::Type::VectorTyID:
+      {
+         ir_constant_data data;
+         const int count = gla::GetComponentCount(const_cast<llvm::Type*>(type));
+         const llvm::ConstantDataSequential* seqData    = llvm::dyn_cast<const llvm::ConstantDataSequential>(constant);
+         const llvm::ConstantVector*         vectorData = llvm::dyn_cast<const llvm::ConstantVector>(constant);
+
+         const llvm::Constant* splatValue = 0;
+
+         if (seqData)
+            splatValue = seqData->getSplatValue();
+
+         if (vectorData)
+            splatValue = vectorData->getSplatValue();
+
+         const glsl_type* glslType = 0;
+
+         // TODO: must we handle other llvm Constant subclasses?
+         for (int op=0; op<count; ++op) {
+            const llvm::Constant* opData =
+               isZero     ? 0 :
+               splatValue ? splatValue :
+               vectorData ? dyn_cast<const llvm::Constant>(vectorData->getOperand(op)) :
+               seqData ? seqData->getElementAsConstant(op) :
+               0;
+
+            if (type->getContainedType(0) == type->getFloatTy(type->getContext())) {
+               glslType = glsl_type::vec(count);
+               data.f[op] = newIRScalarConstant<float>(opData);
+            }
+            else if (gla::IsBoolean(type->getContainedType(0))) {
+               glslType = glsl_type::bvec(count);
+               data.b[op] = newIRScalarConstant<bool>(opData);
+            } else if (gla::IsInteger(type->getContainedType(0))) {
+               // TODO: distinguish unsigned from signed
+               glslType = glsl_type::ivec(count);
+               data.i[op] = newIRScalarConstant<int>(opData);
+            } else {
+               error("unexpected LLVM type");
+               glslType = glsl_type::vec(count);
+               data.f[op] = 0;
+            }
+         }
+
+         assert(glslType);
+
+         return new(shader) ir_constant(glslType, &data);
+      }
+ 
+   case llvm::Type::IntegerTyID:
+      // TODO: distinguish unsigned from signed, use getSExtValue or getZExtValue appropriately
+      if (gla::IsBoolean(type))
+         return new(shader) ir_constant(newIRScalarConstant<bool>(constant));
+      else
+         return new(shader) ir_constant(newIRScalarConstant<int>(constant));
+
+   case llvm::Type::FloatTyID:
+      return new(shader) ir_constant(newIRScalarConstant<float>(constant));
+
+   case llvm::Type::StructTyID:
+   case llvm::Type::ArrayTyID:
+      {
+         const llvm::Type* aggrType = 0;
+         int               aggrSize = 0;
+
+         if (const llvm::StructType* structType = llvm::dyn_cast<const llvm::StructType>(type)) {
+             aggrSize = structType->getNumElements();
+             aggrType = structType;
+         } else if (const llvm::ArrayType* arrayType = llvm::dyn_cast<const llvm::ArrayType>(type)) {
+             aggrSize = arrayType->getNumElements();
+             aggrType = arrayType;
+         } else {
+           assert(0 && "unexpected constant aggregate type");
+         }
+
+         const llvm::ConstantDataSequential* dataSequential = llvm::dyn_cast<llvm::ConstantDataSequential>(constant);
+
+         // Populate a list of entries
+         exec_list constantValues;
+         for (int element = 0; element < aggrSize; ++element) {
+            const llvm::Constant* constElement;
+            if (dataSequential)
+               constElement = dataSequential->getElementAsConstant(element);
+            else
+               constElement = llvm::dyn_cast<llvm::Constant>(constant->getOperand(element));
+
+            constantValues.push_tail(newIRConstant(constElement));
+         }
+
+         return new(shader) ir_constant(llvmTypeToHirType(aggrType, 0, constant),
+                                        &constantValues);
+      }
+
+   case llvm::Type::PointerTyID: // fall through...
+   case llvm::Type::VoidTyID:    // fall through
+   default:
+      error("unexpected constant type");
+      return new(shader) ir_constant(0.0f);
+   }
+}
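+
+// Illustrative sketch, not part of the translator: given a splatted LLVM
+// constant such as
+//    <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>
+// the VectorTyID case above fetches the splat value once and fills each
+// data.f[] slot from it, yielding the HIR equivalent of
+//    (constant vec4 (1.0 1.0 1.0 1.0))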
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Add a type-safe undefined value in case someone looks up a value that
+ * was never defined
+ * -----------------------------------------------------------------------------
+ */
+inline ir_constant* MesaGlassTranslator::addIRUndefined(const llvm::Type* type)
+{
+   // TODO: add infolog warning about undefined values
+
+   // Reinterpret the 0xdeadbeef bit pattern as a float through a union
+   // rather than a pointer cast, which would violate strict aliasing.
+   static const union { unsigned u; float f; } deadbeef = { 0xdeadbeef };
+   static const unsigned deadbeefU = deadbeef.u;
+   static const float    deadbeefF = deadbeef.f;
+
+   switch (type->getTypeID())
+   {
+   case llvm::Type::VectorTyID:
+      {
+         const int count = gla::GetComponentCount(const_cast<llvm::Type*>(type));
+
+         if (type->getContainedType(0) == type->getFloatTy(type->getContext()))
+            return new(shader) ir_constant(deadbeefF, count);
+         else if (gla::IsBoolean(type->getContainedType(0)))
+            return new(shader) ir_constant(false, count);
+         else if (type->getContainedType(0) == type->getInt32Ty(type->getContext()))
+            // TODO: distinguish unsigned from signed
+            return new(shader) ir_constant(deadbeefU, count);
+         else {
+            error("unexpected LLVM type");
+         }
+         return new(shader) ir_constant(deadbeefF);
+      }
+ 
+   case llvm::Type::IntegerTyID:
+      // TODO: distinguish unsigned from signed
+      return new(shader) ir_constant(deadbeefU);
+
+   case llvm::Type::FloatTyID:
+      return new(shader) ir_constant(deadbeefF);
+
+   case llvm::Type::ArrayTyID:
+      {
+         // TODO: this is likely not sufficient.  mats?  aofa?
+         const llvm::ArrayType* arrayType = llvm::dyn_cast<const llvm::ArrayType>(type);
+         assert(arrayType);
+
+         const int         arraySize     = arrayType->getNumElements();
+         const llvm::Type* containedType = arrayType->getContainedType(0);
+
+         exec_list* constants = new(shader) exec_list;
+
+         for (int e=0; e<arraySize; ++e) {
+            ir_constant* elementValue = addIRUndefined(containedType)->as_constant();
+            constants->push_tail(elementValue);
+         }
+
+         return new(shader) ir_constant(llvmTypeToHirType(type), constants);
+      }
+
+   case llvm::Type::StructTyID:
+      {
+         const llvm::StructType* structType = llvm::dyn_cast<const llvm::StructType>(type);
+         assert(structType);
+
+         const int structSize = structType->getStructNumElements();
+
+         exec_list* constants = new(shader) exec_list;
+
+         for (int f = 0; f < structSize; ++f) {
+            const llvm::Type* containedType = structType->getContainedType(f);
+            ir_constant* elementValue = addIRUndefined(containedType)->as_constant();
+            constants->push_tail(elementValue);
+         }
+
+         return new(shader) ir_constant(llvmTypeToHirType(type), constants);
+      }
+
+
+   case llvm::Type::PointerTyID: // fall through
+   case llvm::Type::VoidTyID:    // fall through
+   default:
+      error("unexpected LLVM type");
+      return new(shader) ir_constant(deadbeefF);
+   }
+}
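+
+// Illustrative sketch, not part of the translator: asking for a value that
+// was never defined yields a recognizable poison constant, e.g. an integer
+// vector gets 0xdeadbeef in every lane, making stray reads of undefined
+// values easy to spot in dumped IR.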
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Get an IR value, either by looking up an existing value or by making a
+ * new constant or global
+ * -----------------------------------------------------------------------------
+ */
+inline ir_rvalue*
+MesaGlassTranslator::getIRValue(const llvm::Value* llvmValue, ir_instruction* instruction)
+{
+   tValueMap::const_iterator location = valueMap.find(llvmValue);
+
+   // Insert into map if not there
+   if (location == valueMap.end()) {
+      // Handle constants
+      if (llvm::isa<llvm::Constant>(llvmValue))
+         instruction = newIRConstant(llvmValue);
+      // Handle global variables
+      else if (llvm::isa<llvm::GlobalVariable>(llvmValue))
+         instruction = newIRGlobal(llvmValue);
+
+      // If asking for something that doesn't exist, create default value
+      if (instruction == 0)
+         instruction = addIRUndefined(llvmValue->getType());
+
+      location = valueMap.insert(tValueMap::value_type(llvmValue, instruction)).first;
+   }
+
+   // TODO: is there any reason to clone assign/call returns?
+
+   ir_rvalue* rvalue;
+
+   // Return appropriate value
+   if (location->second->as_assignment()) {
+      // For assignments, return a ref to the variable being assigned
+      rvalue = location->second->as_assignment()->lhs;
+   } else if (location->second->as_call()) {
+      // for calls, return a ref to the return value
+      rvalue = location->second->as_call()->return_deref;
+   } else {
+      // For all others, return the value, or a dereference
+      if (location->second->as_rvalue()) {
+         rvalue = location->second->as_rvalue();
+      } else {
+         error("internal error fetching rvalue");
+         return 0;
+      }
+   }
+
+   return rvalue->clone(shader, 0);
+}
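+
+// Illustrative sketch, not part of the translator: the lookup above is
+// memoized, so
+//    ir_rvalue* a = getIRValue(llvmValue);  // may create a constant, global,
+//    ir_rvalue* b = getIRValue(llvmValue);  // or undefined placeholder once
+// both calls return fresh clones backed by the same cached instruction.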
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Make up a name for a new variable
+ * -----------------------------------------------------------------------------
+ */
+const char* MesaGlassTranslator::newName(const llvm::Value* llvmValue)
+{
+   static const char* baseLocalName = ".temp";
+
+   // TODO: This string concatenation and allocation is inefficient at runtime.
+   // Replace with something faster.
+   if (!llvmValue->getName().empty())
+      return ralloc_strdup(shader, (std::string(".") + llvmValue->getName().str()).c_str());
+
+   return baseLocalName;
+}
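+
+// Illustrative sketch, not part of the translator: an LLVM value named "foo"
+// becomes ".foo", while unnamed values all share the ".temp" base name.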
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Add instruction (Raw): don't add map entry, just append to inst list
+ * -----------------------------------------------------------------------------
+ */
+inline void
+MesaGlassTranslator::addIRInstruction(ir_instruction* instruction, bool global)
+{
+   // Set variable as assigned to, if it is.
+   if (instruction->as_assignment() &&
+       instruction->as_assignment()->lhs->as_dereference_variable()) {
+      instruction->as_assignment()->lhs->as_dereference_variable()->variable_referenced()->data.assigned = true;
+   }
+
+   if (global)
+      instructionStack.front()->push_head(instruction);
+   else
+      instructionStack.back()->push_tail(instruction);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Add instruction (cooked): add to list on top of instruction list stack
+ * -----------------------------------------------------------------------------
+ */
+inline void
+MesaGlassTranslator::addIRInstruction(const llvm::Value* llvmValue, ir_instruction* instruction)
+{
+   assert(instruction);
+   tValueMap::const_iterator location = valueMap.find(llvmValue);
+
+   // Verify SSA-ness
+   if (location != valueMap.end()) {
+      assert(0 && "SSA failure!  Shouldn't add same value twice.");
+   }
+
+   const unsigned refCount = getRefCount(llvmValue);
+
+   // These are instructions we must always insert directly
+   const bool directInsertion =
+      instruction->as_assignment() ||
+      instruction->as_call() ||
+      instruction->as_discard() ||
+      (instruction->as_rvalue() &&
+       instruction->as_rvalue()->type->is_sampler());
+
+   // We never insert temp assignments for these
+   const bool noInsertion =
+      instruction->as_variable() ||
+      instruction->as_dereference_variable();
+
+   // Add it to the instruction list if needed.  Otherwise, don't; we'll look
+   // it up later and use it in an expression tree
+   if ((refCount > 1 || directInsertion) && !noInsertion) {
+      // Create an assignment to a new local if needed
+      if (!directInsertion) {
+         ir_rvalue* rvalue = instruction->as_rvalue();
+
+         if (!rvalue) {
+            // Var<-var assignment
+            assert(instruction->as_variable());
+            rvalue = new(shader) ir_dereference_variable(instruction->as_variable());
+         }
+
+         ir_dereference* localVar = newIRVariableDeref(rvalue->type, newName(llvmValue), ir_var_auto);
+         
+         instruction = new(shader) ir_assignment(localVar, rvalue);
+      }
+
+      addIRInstruction(instruction);
+   }
+
+   valueMap[llvmValue] = instruction;
+}
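+
+// Illustrative sketch, not part of the translator: an rvalue referenced more
+// than once is pinned to a named local via an assignment such as
+//    (assign (xyzw) (var_ref .temp) (expression ...))
+// so it is evaluated only once, while a single-use rvalue stays in the value
+// map and is folded into the expression tree of its one consumer.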
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Add a builtin function call
+ * -----------------------------------------------------------------------------
+ */
+inline void
+MesaGlassTranslator::emitFn(const char* name, const llvm::Instruction* llvmInst)
+{
+   const llvm::CallInst* callInst = llvm::dyn_cast<llvm::CallInst>(llvmInst);
+
+   const unsigned numArgs = callInst ? callInst->getNumArgOperands() : llvmInst->getNumOperands();
+
+   // This is OK on the stack, because the ir_call constructor will move out of it.
+   exec_list parameters;
+   
+   for (unsigned i=0; i < numArgs; ++i)
+      parameters.push_tail(getIRValue(llvmInst->getOperand(i)));
+
+   // Find the right function signature to call
+   // This sets state->uses_builtin_functions
+   ir_function_signature *sig =
+      _mesa_glsl_find_builtin_function(state, name, &parameters);
+
+   if (sig == 0) {
+      return error("no matching function signature found");
+   }
+
+   const std::string retName = std::string(name) + "_retval";
+   ir_dereference* dest = newIRVariableDeref(llvmInst->getType(), llvmInst, retName, ir_var_auto);
+
+   ir_call *call = new(shader) ir_call(sig, dest->as_dereference_variable(), &parameters);
+   addIRInstruction(llvmInst, call);
+}
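+
+// Illustrative usage, not a new code path: the intrinsic dispatcher lowers
+// same-shape built-ins this way, e.g.
+//    case llvm::Intrinsic::gla_fTan: return emitFn("tan", llvmInst);
+// emitFn gathers the operands, finds the built-in signature, and binds the
+// result to a "<name>_retval" temporary.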
+
+ir_expression* MesaGlassTranslator::glass_to_ir_expression(const int irOp,
+                            const glsl_type* hirType,
+                            ir_rvalue* op0,
+                            ir_rvalue* op1,
+                            ir_rvalue* op2,
+                            const bool genUnsigned0,
+                            const bool genUnsigned1,
+                            const bool genUnsigned2)
+{
+    static const int maxOp = 3;
+    ir_rvalue* op[maxOp];
+
+    if (genUnsigned0 && op0->type->base_type == GLSL_TYPE_INT)
+    {
+       op[0] = new(shader) ir_expression(ir_unop_i2u, op0);
+    }
+    else if (!genUnsigned0 && op0->type->base_type == GLSL_TYPE_UINT)
+    {
+       op[0] = new(shader) ir_expression(ir_unop_u2i, op0);
+    }
+    else
+    {
+       op[0] = op0;
+    }
+
+    op[1] = op1;
+    if (op1)
+    {
+       if (genUnsigned1 && op1->type->base_type == GLSL_TYPE_INT)
+       {
+           op[1] = new(shader) ir_expression(ir_unop_i2u, op1);
+       }
+       else if (!genUnsigned1 && op1->type->base_type == GLSL_TYPE_UINT)
+       {
+           op[1] = new(shader) ir_expression(ir_unop_u2i, op1);
+       }
+    }
+
+    op[2] = op2;
+    if (op2)
+    {
+       if (genUnsigned2 && op2->type->base_type == GLSL_TYPE_INT)
+       {
+           op[2] = new(shader) ir_expression(ir_unop_i2u, op2);
+       }
+       else if (!genUnsigned2 && op2->type->base_type == GLSL_TYPE_UINT)
+       {
+           op[2] = new(shader) ir_expression(ir_unop_u2i, op2);
+       }
+    }
+
+    return new(shader) ir_expression(irOp, hirType, op[0], op[1], op[2]);
+}
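+
+// Illustrative sketch, not part of the translator ("uintType" is a stand-in
+// for the matching glsl_type): with genUnsigned0 and genUnsigned1 set, two
+// signed operands are rewrapped before the binop, so
+//    glass_to_ir_expression(ir_binop_min, uintType, op0, op1, 0, true, true);
+// builds (expression uint min (i2u op0) (i2u op1)), keeping operand
+// signedness consistent with the expression type.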
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit N-ary opcode
+ * -----------------------------------------------------------------------------
+ */
+template <int ops>
+inline void MesaGlassTranslator::emitOp(int irOp,
+                                        const llvm::Instruction* llvmInst,
+                                        const bool genUnsigned0,
+                                        const bool genUnsigned1,
+                                        const bool genUnsigned2)
+{
+   static const int maxOp = 3;
+   ir_rvalue* op[maxOp];
+   assert(ops <= maxOp);
+
+   /* Assumes genUnsigned2 -> genUnsigned1 -> genUnsigned0 */
+   assert(!genUnsigned2 || genUnsigned1);
+   assert(!genUnsigned1 || genUnsigned0);
+
+   const glsl_type *hirType = llvmTypeToHirType(llvmInst->getType(), 0, llvmInst, genUnsigned0);
+
+   for (int i=0; i<maxOp; ++i)
+   {
+      op[i] = ops > i ? getIRValue(llvmInst->getOperand(i)) : 0;
+   }
+
+   ir_rvalue* result = glass_to_ir_expression(irOp, hirType, op[0], op[1], op[2],
+                                        genUnsigned0, genUnsigned1, genUnsigned2);
+   assert(result && (op[0] || ops < 1) && (op[1] || ops < 2) && (op[2] || ops < 3));
+   
+   addIRInstruction(llvmInst, result);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Add alloc
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRalloca(const llvm::Instruction* llvmInst)
+{
+   llvm::Type* type = llvmInst->getType();
+   if (const llvm::PointerType* pointer = llvm::dyn_cast<llvm::PointerType>(type))
+      type = pointer->getContainedType(0);
+
+   ir_rvalue* var = newIRVariableDeref(type, llvmInst, newName(llvmInst), ir_var_auto);
+
+   addIRInstruction(llvmInst, var);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Add IR sign extension
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRSext(const llvm::Instruction* llvmInst)
+{
+   // TODO: handle arbitrary sign extension.  For now, only from 1-bit ints.
+   const glsl_type *hirType = llvmTypeToHirType(llvmInst->getType(), 0, llvmInst);
+   ir_rvalue* op[3];
+
+   const int count = gla::GetComponentCount(const_cast<llvm::Type*>(llvmInst->getType()));
+
+   op[0] = getIRValue(llvmInst->getOperand(0));
+
+   if (count == 1) {
+      op[1] = new(shader) ir_constant(int(0xffffffff));
+      op[2] = new(shader) ir_constant(int(0));
+   } else {
+      ir_constant_data data0, data1;
+      data1.i[0] = data1.i[1] = data1.i[2] = data1.i[3] = int(0xffffffff);
+      data0.i[0] = data0.i[1] = data0.i[2] = data0.i[3] = int(0x0);
+
+      op[1] = new(shader) ir_constant(glsl_type::ivec(count), &data1);
+      op[2] = new(shader) ir_constant(glsl_type::ivec(count), &data0);
+   }
+
+   // HIR wants the condition to be a vector matching the element count of the
+   // results, so we must insert a broadcast swizzle if need be.
+   if (op[0]->type->vector_elements != op[1]->type->vector_elements)
+      op[0] = new(shader) ir_swizzle(op[0], 0, 0, 0, 0, op[1]->type->vector_elements);
+
+   ir_rvalue* result = glass_to_ir_expression(ir_triop_csel, hirType, op[0], op[1], op[2]);
+   assert(result && op[0] && op[1] && op[2]);
+   
+   addIRInstruction(llvmInst, result);
+}
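+
+// Illustrative sketch, not part of the translator: sign-extending a bvec3
+// condition becomes a component-wise select,
+//    (expression ivec3 csel cond ivec3(-1) ivec3(0))
+// so true lanes extend to 0xffffffff and false lanes to 0.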
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit vertex
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIREmitVertex(const llvm::Instruction* llvmInst)
+{
+   addIRInstruction(new(shader) ir_emit_vertex());
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * End primitive
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIREndPrimitive(const llvm::Instruction* llvmInst)
+{
+   addIRInstruction(new(shader) ir_end_primitive());
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit an N-ary op of either logical or bitwise type, selecting the
+ * appropriate one of two IR opcodes.
+ * -----------------------------------------------------------------------------
+ */
+template <int ops>
+inline void MesaGlassTranslator::emitOpBit(int irLogicalOp, int irBitwiseOp,
+                                           const llvm::Instruction* llvmInst,
+                                           const bool genUnsigned)
+{
+   const bool isBool = gla::IsBoolean(llvmInst->getOperand(0)->getType());
+
+   return emitOp<ops>(isBool ? irLogicalOp : irBitwiseOp, llvmInst, genUnsigned /* genUnsigned */);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Convert an LLVM comparison predicate to an IR comparison opcode.
+ * The interface uses ints because C++ lacks forward declarations of enums.
+ * -----------------------------------------------------------------------------
+ */
+int MesaGlassTranslator::irCmpOp(int pred) const
+{
+   switch (pred) {
+   case llvm::FCmpInst::FCMP_OEQ: // fall through
+   case llvm::FCmpInst::FCMP_UEQ: // ...
+   case llvm::ICmpInst::ICMP_EQ:  return ir_binop_equal;
+
+   case llvm::FCmpInst::FCMP_ONE: // fall through
+   case llvm::FCmpInst::FCMP_UNE: // ...
+   case llvm::ICmpInst::ICMP_NE:  return ir_binop_nequal;
+
+   case llvm::FCmpInst::FCMP_OGT: // fall through
+   case llvm::FCmpInst::FCMP_UGT: // ...
+   case llvm::ICmpInst::ICMP_UGT: // ...
+   case llvm::ICmpInst::ICMP_SGT: return ir_binop_greater;
+
+   case llvm::FCmpInst::FCMP_OGE: // fall through...
+   case llvm::FCmpInst::FCMP_UGE: // ...
+   case llvm::ICmpInst::ICMP_UGE: // ...
+   case llvm::ICmpInst::ICMP_SGE: return ir_binop_gequal;
+
+   case llvm::FCmpInst::FCMP_OLT: // fall through...
+   case llvm::FCmpInst::FCMP_ULT: // ...
+   case llvm::ICmpInst::ICMP_ULT: // ...
+   case llvm::ICmpInst::ICMP_SLT: return ir_binop_less;
+
+   case llvm::FCmpInst::FCMP_OLE: // fall through...
+   case llvm::FCmpInst::FCMP_ULE: // ...
+   case llvm::ICmpInst::ICMP_ULE: // ...
+   case llvm::ICmpInst::ICMP_SLE: return ir_binop_lequal;
+      
+   default:
+      error("unknown comparison op");
+      return ir_binop_equal;
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Return whether a comparison predicate is unsigned.
+ * The interface uses ints because C++ lacks forward declarations of enums.
+ * -----------------------------------------------------------------------------
+ */
+bool MesaGlassTranslator::irCmpUnsigned(int pred) const
+{
+   switch (pred) {
+   case llvm::ICmpInst::ICMP_UGT: // fall through
+   case llvm::ICmpInst::ICMP_UGE: // ...
+   case llvm::ICmpInst::ICMP_ULT: // ...
+   case llvm::ICmpInst::ICMP_ULE: return true;
+
+   case llvm::FCmpInst::FCMP_OEQ: // fall through
+   case llvm::FCmpInst::FCMP_UEQ: // ...
+   case llvm::ICmpInst::ICMP_EQ:  // ...
+   case llvm::FCmpInst::FCMP_ONE: // ...
+   case llvm::FCmpInst::FCMP_UNE: // ...
+   case llvm::ICmpInst::ICMP_NE:  // ...
+   case llvm::FCmpInst::FCMP_OGT: // ...
+   case llvm::FCmpInst::FCMP_UGT: // ...
+   case llvm::ICmpInst::ICMP_SGT: // ...
+   case llvm::FCmpInst::FCMP_OGE: // ...
+   case llvm::FCmpInst::FCMP_UGE: // ...
+   case llvm::ICmpInst::ICMP_SGE: // ...
+   case llvm::FCmpInst::FCMP_OLT: // ...
+   case llvm::FCmpInst::FCMP_ULT: // ...
+   case llvm::ICmpInst::ICMP_SLT: // ...
+   case llvm::FCmpInst::FCMP_OLE: // ...
+   case llvm::FCmpInst::FCMP_ULE: // ...
+   case llvm::ICmpInst::ICMP_SLE: return false;
+
+   default:
+      error("unknown comparison op");
+      return false;
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit comparison op
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitCmp(const llvm::Instruction* llvmInst)
+{
+   const llvm::CmpInst* cmp = llvm::dyn_cast<llvm::CmpInst>(llvmInst);
+   
+   // TODO: do we need to handle arrays / structs here?
+
+   if (!cmp) {
+      return error("invalid comparison op");
+   }
+
+   const int cmpOp = irCmpOp(cmp->getPredicate());
+   const bool genUnsigned = irCmpUnsigned(cmp->getPredicate());
+
+   // TODO: handle arrays, structs
+   return emitOp<2>(cmpOp, llvmInst, genUnsigned, genUnsigned);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Add a conditional discard
+ * NOTE: CURRENTLY DISABLED pending backend support.  See:
+ *   brw_fs_visitor.cpp:fs_visitor::visit(ir_discard *ir)
+ * Comment "FINISHME" there.  When that's done, this can be enabled
+ * by removing our override of hoistDiscards
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRDiscardCond(const llvm::CallInst* llvmInst)
+{
+   ir_rvalue *op0      = getIRValue(llvmInst->getOperand(0));
+   ir_discard* discard = new(shader) ir_discard(op0);
+
+   addIRInstruction(llvmInst, discard);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Add ftransform
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRFTransform(const llvm::CallInst* llvmInst)
+{
+   exec_list parameters;
+
+   ir_function_signature *sig =
+      _mesa_glsl_find_builtin_function(state, "ftransform", &parameters);
+
+   // TODO: don't assume ir_var_temporary here
+   ir_dereference* dest = newIRVariableDeref(llvmInst->getType(), llvmInst,
+                                             newName(llvmInst), ir_var_temporary);
+
+   // TODO: Should we insert a prototype?
+   ir_call *call = new(shader) ir_call(sig, dest->as_dereference_variable(), &parameters);
+
+   addIRInstruction(llvmInst, call);
+}
+
+/**
+ * -----------------------------------------------------------------------------
+ * Declare phi output
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::declarePhiCopy(const llvm::Value* dst)
+{
+   valueMap[dst] = newIRVariableDeref(dst->getType(), dst, newName(dst), ir_var_auto);
+}
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit phi function aliases
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addPhiAlias(const llvm::Value* dst, const llvm::Value* src)
+{
+   valueMap[dst] = getIRValue(src);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit phi function copies
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addPhiCopy(const llvm::Value* dst, const llvm::Value* src)
+{
+   ir_rvalue* irDst = getIRValue(dst);
+   ir_rvalue* irSrc = getIRValue(src);
+
+   ir_assignment* assign  = new(shader) ir_assignment(irDst, irSrc);
+   addIRInstruction(assign);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit IR if statement from ir_rvalue
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::addIf(ir_rvalue* cond, bool invert)
+{
+   if (invert)
+      cond = glass_to_ir_expression(ir_unop_logic_not, cond->type, cond);
+
+   ir_if *const ifStmt = new(shader) ir_if(cond);
+
+   ifStack.push_back(ifStmt);
+   instructionStack.push_back(&ifStmt->then_instructions);
+   state->symbols->push_scope();
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit "if" (but not else clause)
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addIf(const llvm::Value* cond, bool invert)
+{
+   return addIf(getIRValue(cond), invert);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit else clause
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addElse()
+{
+   state->symbols->pop_scope();
+   instructionStack.pop_back();
+   instructionStack.push_back(&ifStack.back()->else_instructions);
+   state->symbols->push_scope();
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit endif
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addEndif()
+{
+   assert(!ifStack.empty());
+
+   state->symbols->pop_scope();
+   instructionStack.pop_back();
+
+   addIRInstruction(ifStack.back());
+
+   ifStack.pop_back();
+}
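+
+// Illustrative sketch, not part of the translator: addIf/addElse/addEndif
+// bracket the instruction stack, so a conditional is emitted as
+//    addIf(cond);   // push then_instructions
+//    ...            // emit "then" body
+//    addElse();     // swap in else_instructions
+//    ...            // emit "else" body
+//    addEndif();    // pop and append the finished ir_if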
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit switch
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addSwitch(const llvm::Value* cond)
+{
+   assert(0 && "TODO: ...");
+}
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit case
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addCase(int)
+{
+   assert(0 && "TODO: ...");
+}
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit default
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addDefault()
+{
+   assert(0 && "TODO: ...");
+}
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit endcase
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::endCase(bool withBreak)
+{
+   assert(0 && "TODO: ...");
+}
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit endswitch
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::endSwitch()
+{
+   assert(0 && "TODO: ...");
+}
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit conditional loop
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::beginConditionalLoop()
+{
+   assert(0 && "TODO: ...");
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit conditional loop
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::beginSimpleConditionalLoop(const llvm::CmpInst* cmp,
+                                                     const llvm::Value* op0,
+                                                     const llvm::Value* op1,
+                                                     bool invert)
+{
+   // Start loop body
+   beginLoop();
+
+   // TODO: Why don't we see both of these through normal LunarGlass callbacks?
+   // If we haven't already processed the extract elements, do that now.
+   if (!valueEntryExists(op0)) {
+      if (dyn_cast<const llvm::Instruction>(op0))
+         addInstruction(dyn_cast<const llvm::Instruction>(op0), false);
+   }
+
+   if (!valueEntryExists(op1)) {
+      if (dyn_cast<const llvm::Instruction>(op1))
+         addInstruction(dyn_cast<const llvm::Instruction>(op1), false);
+   }
+
+   emitCmp(cmp);  // emit loop comparison
+   addLoopExit(cmp, !invert);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit a for loop: (for i = 0; i <pred> bound; i += increment) ...
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::beginForLoop(const llvm::PHINode* phi,
+                                       llvm::ICmpInst::Predicate pred,
+                                       ir_rvalue* bound, ir_rvalue* increment)
+{
+   ir_rvalue* loopVar = getIRValue(phi);
+
+   // we don't have to initialize loopVar to 0: the BottomIR will produce that for us
+
+   // Start loop body
+   beginLoop();
+
+   // Add a conditional break
+   int break_pred;
+   switch(pred) {
+   default:
+       UnsupportedFunctionality("loop predicate");
+   case llvm::CmpInst::ICMP_NE:   break_pred = ir_binop_equal;   break;
+   case llvm::CmpInst::ICMP_SLE:  break_pred = ir_binop_greater; break;
+   case llvm::CmpInst::ICMP_ULT:
+   case llvm::CmpInst::ICMP_SLT:  break_pred = ir_binop_gequal;  break;
+   }
+
+   bool genUnsigned = irCmpUnsigned(pred);
+   ir_rvalue* cmp = glass_to_ir_expression(break_pred, glsl_type::bool_type,
+                                     loopVar, bound, 0, genUnsigned, genUnsigned);
+
+   addIRLoopExit(cmp);
+
+   // Create terminator statement (for ++index)
+   ir_assignment* terminator =
+      new(shader) ir_assignment(loopVar->clone(shader, 0),
+                                glass_to_ir_expression(ir_binop_add, loopVar->type,
+                                                 loopVar->clone(shader, 0),
+                                                 increment));
+
+   loopTerminatorStack.back() = terminator;
+}
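+
+// Illustrative sketch, not part of the translator: for (i = 0; i < bound;
+// i += increment) assembles to roughly
+//    (loop
+//       (if (expression bool >= (var_ref i) bound) ((break)))
+//       ...body...
+//       (assign (x) (var_ref i) (expression int + (var_ref i) increment)))
+// with the terminator held on loopTerminatorStack until endLoop() runs.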
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit a for loop: (for i = 0; i <pred> bound; i += increment) ...
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::beginForLoop(const llvm::PHINode* phi,
+                                       llvm::ICmpInst::Predicate pred,
+                                       unsigned bound, unsigned increment)
+{
+   return beginForLoop(phi, pred, 
+                       new(shader) ir_constant(bound),
+                       new(shader) ir_constant(increment));
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit inductive loop.  Phrased in terms of a for loop.
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::beginSimpleInductiveLoop(const llvm::PHINode* phi, const llvm::Value* count)
+{
+   return beginForLoop(phi, llvm::ICmpInst::ICMP_ULT,
+                       getIRValue(count),
+                       new(shader) ir_constant(1));
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit begin loop
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::beginLoop()
+{
+   ir_loop *const loopStmt = new(shader) ir_loop();
+
+   loopStack.push_back(loopStmt);
+   instructionStack.push_back(&loopStmt->body_instructions);
+   state->symbols->push_scope();
+
+   loopTerminatorStack.push_back(0);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit end loop
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::endLoop()
+{
+   // Handle any loop terminator instruction
+   if (loopTerminatorStack.back())
+      addIRInstruction(loopTerminatorStack.back());
+
+   loopTerminatorStack.pop_back();
+
+   state->symbols->pop_scope();
+   instructionStack.pop_back();
+
+   addIRInstruction(loopStack.back());
+
+   loopStack.pop_back();
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Add IR loop exit statement from ir_rvalue
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::addIRLoopExit(ir_rvalue* condition, bool invert)
+{
+   if (condition)
+      addIf(condition, invert);
+
+   ir_loop_jump *const loopExit = new(shader) ir_loop_jump(ir_loop_jump::jump_break);
+   addIRInstruction(loopExit);
+
+   if (condition)
+      addEndif();
+}
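+
+// Illustrative sketch, not part of the translator: a conditional exit emits
+//    (if (cond) ((break)))
+// while a null condition emits the bare (break).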
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Add loop exit
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addLoopExit(const llvm::Value* condition, bool invert)
+{
+   addIRLoopExit(condition ? getIRValue(condition) : 0, invert);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Add loop continue
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addLoopBack(const llvm::Value* condition, bool invert)
+{
+   ir_loop_jump *const loopContinue = new(shader) ir_loop_jump(ir_loop_jump::jump_continue);
+   addIRInstruction(loopContinue);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Translate call
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRCall(const llvm::CallInst* llvmInst)
+{
+   assert(0 && "TODO: handle call");
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Translate return
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRReturn(const llvm::Instruction* llvmInst, bool lastBlock)
+{
+   if (llvmInst->getNumOperands() > 0)
+      addIRInstruction(new(shader) ir_return(getIRValue(llvmInst->getOperand(0))));
+   else if (!lastBlock) // don't add return in last shader block.
+      addIRInstruction(new(shader) ir_return);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Track maximum array element used
+ * If index < 0, means we use the entire array (probably indirect indexing)
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::trackMaxArrayElement(ir_rvalue* rvalue, int index) const
+{
+   if (ir_dereference_variable *deref = rvalue->as_dereference_variable()) {
+      if (ir_variable* var = deref->variable_referenced()) {
+         if (index >= 0)
+            var->data.max_array_access = std::max(var->data.max_array_access, unsigned(index));
+         else
+            var->data.max_array_access = deref->type->length - 1;
+      }
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Translate swizzle intrinsics
+ *   (swiz components ...)
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRSwizzle(const llvm::IntrinsicInst* llvmInst)
+{
+   const llvm::Value*    src  = llvmInst->getOperand(0);
+   const llvm::Constant* mask = llvm::dyn_cast<llvm::Constant>(llvmInst->getOperand(1));
+   assert(mask);
+
+   // TODO: Is this sufficient for all types of swizzles?  E.g., broadcasting swizzles?
+
+   unsigned components[4] = { 0, 0, 0, 0 };
+   unsigned componentCount = gla::GetComponentCount(llvmInst);
+
+   for (unsigned i = 0; i < componentCount; ++i)
+      if (IsDefined(mask->getAggregateElement(i)))
+         components[i] = gla::GetConstantInt(mask->getAggregateElement(i));
+
+   ir_swizzle* swizzle = new(shader) ir_swizzle(getIRValue(src), components, componentCount);
+
+   addIRInstruction(llvmInst, swizzle);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Translate insertion intrinsics
+ *   turns into N instances of:
+ *   (assign (x) (var_ref var) (expression...))
+ * Handles vector or scalar sources
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRMultiInsert(const llvm::IntrinsicInst* llvmInst)
+{
+   const int wmask = gla::GetConstantInt(llvmInst->getOperand(1));
+
+   // TODO: optimize identity case? investigate general optimization
+   // in this space.  For example, if all components are overwritten from
+   // the same source, we can just pass on a swizzle and avoid the
+   // temp and the assignment.
+
+   ir_dereference* localTmp = newIRVariableDeref(llvmInst->getType(), llvmInst,
+                                                 newName(llvmInst), ir_var_auto);
+
+   ir_variable* var = localTmp->as_dereference_variable()->variable_referenced();
+
+   const bool writingAllComponents =
+      var->type->is_vector() &&
+      ((1<<var->type->components())-1) == wmask;
+
+   // Copy original value to new temporary
+   // TODO: is it better to just assign the not-written components?
+   if (!writingAllComponents) {
+      ir_assignment* assign = new(shader) ir_assignment(localTmp->clone(shader, 0),
+                                                        getIRValue(llvmInst->getOperand(0)));
+      addIRInstruction(assign);
+   }
+      
+   unsigned numComponents        = 0;
+   unsigned components[4]        = { 0, 0, 0, 0 };
+   int      partialMask          = 0;
+   const llvm::Value *prevSource = 0;
+
+   // TODO: this may not be general enough.  E.g., array operands?
+   for (int i = 0; i < 4; ++i) {
+      const llvm::Value* srcOperand = llvmInst->getOperand((i+1) * 2);
+      const llvm::Constant* swizOffset = llvm::dyn_cast<llvm::Constant>(llvmInst->getOperand(i*2 + 3));
+
+      // If source changed, do the ones we have so far
+      if (!IsSameSource(srcOperand, prevSource) && numComponents > 0) {
+         ir_swizzle*    swizzle = new(shader) ir_swizzle(getIRValue(prevSource), components, numComponents);
+         ir_assignment* assign  = new(shader) ir_assignment(localTmp->clone(shader, 0), swizzle, 0, partialMask);
+         addIRInstruction(assign);
+         numComponents = 0;
+         partialMask   = 0;
+      }
+
+      if (IsDefined(swizOffset)) {
+         prevSource = srcOperand;
+         components[numComponents++] = gla::GetConstantInt(swizOffset);
+         partialMask |= 1<<i;
+      }
+   }
+
+   // Do any remainder
+   if (numComponents > 0) {
+      ir_swizzle*    swizzle = new(shader) ir_swizzle(getIRValue(prevSource), components, numComponents);
+      ir_assignment* assign  = new(shader) ir_assignment(localTmp->clone(shader, 0), swizzle, 0, partialMask);
+      addIRInstruction(assign);
+   }
+
+   // TODO: maybe if it's all one source and we're overwriting the whole thing, it's just a
+   // swizzle, and we don't have to make a temp assignment.
+
+   // Add the local temp we created for this multi-insert
+   addIRInstruction(llvmInst, localTmp);
+}
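+
+// Illustrative sketch, not part of the translator: a multi-insert writing
+// x and y from srcA and z from srcB becomes two masked assignments,
+//    (assign (xy) (var_ref .tmp) (swiz xy (var_ref srcA)))
+//    (assign (z)  (var_ref .tmp) (swiz x  (var_ref srcB)))
+// flushed each time the loop above sees the source operand change.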
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit LLVM saturate intrinsic
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRSaturate(const llvm::CallInst* llvmInst)
+{
+   const glsl_type *hirType = llvmTypeToHirType(llvmInst->getType(), 0, llvmInst);
+
+   ir_rvalue* minOp = glass_to_ir_expression(ir_binop_min, hirType,
+                                                getIRValue(llvmInst->getOperand(0)),
+                                                new(shader) ir_constant(1.0f));
+
+   ir_rvalue* result = glass_to_ir_expression(ir_binop_max, hirType,
+                                                 minOp, new(shader) ir_constant(0.0f));
+                                           
+   addIRInstruction(llvmInst, result);
+}
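+
+// Illustrative sketch, not part of the translator: saturate(x) lowers to
+//    (expression float max (expression float min x 1.0) 0.0)
+// i.e. clamp(x, 0.0, 1.0) built from the two binops above.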
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit LLVM clamp intrinsic.  We prefer LunarGlass to decompose this for us,
+ * but it is kept here for testing purposes.
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRClamp(const llvm::CallInst* llvmInst,
+                                             const bool genUnsigned)
+{
+   const glsl_type *hirType = llvmTypeToHirType(llvmInst->getType(), 0, llvmInst, genUnsigned);
+
+   ir_rvalue* minOp = glass_to_ir_expression(ir_binop_max, hirType,
+                                       getIRValue(llvmInst->getOperand(0)),
+                                       getIRValue(llvmInst->getOperand(1)),
+                                       0, genUnsigned, genUnsigned);
+
+   ir_rvalue* result = glass_to_ir_expression(ir_binop_min, hirType,
+                                        minOp,
+                                        getIRValue(llvmInst->getOperand(2)),
+                                        0, genUnsigned, genUnsigned);
+
+   addIRInstruction(llvmInst, result);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Determine glsl matrix type
+ * -----------------------------------------------------------------------------
+ */
+inline const glsl_type* MesaGlassTranslator::glslMatType(int numCols, int numRows) const
+{
+   // If all columns are from the same source, we can use mat*vec or vec*mat
+   // Determine the glsl_type to use:
+   if (numRows == numCols) {
+      // Square matrix
+      switch (numRows) {
+      case 2: return glsl_type::mat2_type;
+      case 3: return glsl_type::mat3_type; 
+      case 4: return glsl_type::mat4_type; 
+      default:
+         error("bad matrix dimension");
+         return glsl_type::mat4_type;
+      }
+   } else {
+      // non-square matrix
+      if (numCols == 2 && numRows == 3)
+         return glsl_type::mat2x3_type;
+      else if (numCols == 2 && numRows == 4)
+         return glsl_type::mat2x4_type;
+      else if (numCols == 3 && numRows == 2)
+         return glsl_type::mat3x2_type;
+      else if (numCols == 3 && numRows == 4)
+         return glsl_type::mat3x4_type;
+      else if (numCols == 4 && numRows == 2)
+         return glsl_type::mat4x2_type;
+      else if (numCols == 4 && numRows == 3)
+         return glsl_type::mat4x3_type;
+      else {
+         error("bad matrix dimension");
+         return glsl_type::mat4_type;
+      }
+   }
+}
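+
+// Illustrative usage, hypothetical values: glslMatType(2, 4) returns
+// glsl_type::mat2x4_type, GLSL's column-major "2 columns by 4 rows".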
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Load, potentially from GEP chain offset
+ * typeOverride lets us overrule the internal type, for example, if a
+ * higher-level mat*vec intrinsic knows the type should be a matrix.
+ * -----------------------------------------------------------------------------
+ */
+inline ir_rvalue*
+MesaGlassTranslator::makeIRLoad(const llvm::Instruction* llvmInst, const glsl_type* typeOverride)
+{
+   const llvm::Value*             src     = llvmInst->getOperand(0);
+   const llvm::GetElementPtrInst* gepInst = getGepAsInst(src);
+
+   const llvm::MDNode*            mdNode  = llvmInst->getMetadata(UniformMdName);
+   std::string                    name;
+
+   // If this load doesn't have a metadata node, try to find one we've created
+   // during variable declarations.  We use rendezvous-by-name to that end.
+   // We might not find one though, e.g., if this load is not from a global.
+
+   if (gepInst)
+      name = gepInst->getOperand(0)->getName();
+   else 
+      name = llvmInst->getOperand(0)->getName();
+
+   // Look for builtin
+   tNameBuiltinMap::const_iterator got = nameBuiltinMap.find(name);
+   if (got != nameBuiltinMap.end())
+       name = got->second;
+
+   if (!mdNode && !name.empty()) {
+      auto it = typenameMdMap.find(name);
+      mdNode = (it == typenameMdMap.end()) ? 0 : it->second;
+   }
+
+   // Look up the ir_variable_mode we remembered during the global declaration
+   const ir_variable_mode irMode = ir_variable_mode(globalVarModeMap[name]);
+
+   // Handle types for both GEP chain loads and non-GEPs.
+   const llvm::Type* srcType =
+      (gepInst ? gepInst->getPointerOperand() : llvmInst->getOperand(0))->getType();
+
+   if (srcType->getTypeID() == llvm::Type::PointerTyID)
+      srcType = srcType->getContainedType(0);
+
+   const glsl_type* irType = llvmTypeToHirType(srcType, mdNode, llvmInst);
+
+   ir_rvalue* load;
+   ir_variable* ioVar = 0;
+   ir_rvalue* aggregate = 0;
+
+   // Handle GEP traversal
+   if (gepInst) {
+      const llvm::Value* gepSrc = gepInst->getPointerOperand();
+
+      // If this is the first load from this aggregate, make up a new one.
+      const tValueMap::const_iterator location = valueMap.find(gepSrc);
+      if (location == valueMap.end()) {
+          const llvm::MDNode* mdNode;
+          const llvm::Type*   mdType;
+          const llvm::Type*   mdAggregateType;
+
+          FindGepType(gepInst, mdType, mdAggregateType, mdNode);
+          const glsl_type* irType = llvmTypeToHirType(mdType, mdNode, llvmInst);
+
+          llvm::StringRef name = src->getName();
+
+          if (gepSrc) {
+             name = gepSrc->getName();
+
+             // Look for builtin
+             tNameBuiltinMap::const_iterator got = nameBuiltinMap.find(name);
+             if (got != nameBuiltinMap.end())
+                 name = got->second;
+          }
+
+          const ir_variable_mode irMode = ir_variable_mode(globalVarModeMap[name]);
+
+          ir_rvalue* irDst = newIRVariableDeref(irType, name, irMode);
+
+          valueMap.insert(tValueMap::value_type(gepSrc, irDst));
+      }
+
+      // TODO: For globals, do we really have to look this up from the
+      // global decl?  The global decl can't put any llvm::Value in the map,
+      // because it doesn't have a llvm::Value, so we can't use the normal
+      // scheme. This seems ugly: probably just overlooking the "right" way.
+      // Needs work...
+      if (globalDeclMap.find(name) != globalDeclMap.end() ||
+          anonBlocks.find(name) != anonBlocks.end()) {
+         aggregate = newIRVariableDeref(irType, name, irMode);
+      } else {
+         aggregate = getIRValue(gepSrc);
+      }
+   
+      load = traverseGEP(gepInst, aggregate, 0);
+   } else {
+       const llvm::Value* loadSrc = llvmInst->getOperand(0);
+
+       // Load from existing variable, or declare a new one
+       const tValueMap::const_iterator location = valueMap.find(loadSrc);
+       if (location == valueMap.end()) {
+           load = newIRVariableDeref(srcType, loadSrc, name, irMode);
+       } else {
+           load = getIRValue(loadSrc);
+       }
+   }
+
+   if (load->as_dereference_variable()) {
+      ioVar = load->as_dereference_variable()->variable_referenced();
+   } else if (aggregate) {
+      ioVar = aggregate->as_dereference_variable()->variable_referenced();
+   }
+
+   setIoParameters(ioVar, mdNode);
+
+   // Handle type overriding
+   if (typeOverride)
+      load->type = typeOverride;
+
+   return load;
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Create an IR matrix object by direct read from a uniform matrix,
+ * or moves of column vectors into a temp matrix.
+ * -----------------------------------------------------------------------------
+ */
+inline ir_rvalue*
+MesaGlassTranslator::intrinsicMat(const llvm::Instruction* llvmInst,
+                                  int firstColumn, int numCols, int numRows)
+{
+   const llvm::Value* matSource = 0;
+
+   // If the source is from a uniform, we'll make a new uniform matrix.
+   // Otherwise, we'll form a matrix by copying the columns into a new mat,
+   // and hope downstream copy propagation optimizes them away.
+   for (int col=0; col<numCols; ++col) {
+      const int colPos = col + firstColumn;
+
+      if (const llvm::ExtractValueInst* extractValue =
+          llvm::dyn_cast<const llvm::ExtractValueInst>(llvmInst->getOperand(colPos))) {
+
+         if (col == 0) {
+            matSource = extractValue->getOperand(0);
+         } else {
+            if (matSource != extractValue->getOperand(0))
+               matSource = 0;
+         }
+      }
+   }
+
+   const glsl_type* matType    = glslMatType(numCols, numRows);
+   assert(matType);
+
+   ir_rvalue* matrix = 0;
+
+   if (matSource) {
+      const llvm::Instruction* matSrcInst = dyn_cast<const llvm::Instruction>(matSource);
+      assert(matSrcInst && "matrix source should be an instruction");
+
+      const llvm::GetElementPtrInst* gepInst = getGepAsInst(matSrcInst->getOperand(0));
+      llvm::StringRef name;
+
+      if (gepInst)
+         name = gepInst->getOperand(0)->getName();
+      else 
+         name = llvmInst->getOperand(0)->getName();
+
+      const ir_variable_mode irMode = ir_variable_mode(globalVarModeMap[name]);
+
+      // Single mat source may have come from a uniform load, or a previous mat*mat
+      if (matSrcInst->getOpcode() == llvm::Instruction::Load &&
+          irMode == ir_var_uniform) {
+         matrix = makeIRLoad(dyn_cast<const llvm::Instruction>(matSource), matType);
+      } else if (matSrcInst->getOpcode() == llvm::Instruction::Call) {
+         const llvm::IntrinsicInst* intrinsic = dyn_cast<const llvm::IntrinsicInst>(matSrcInst);
+
+         if (intrinsic) {
+            switch (intrinsic->getIntrinsicID()) {
+            case llvm::Intrinsic::gla_fMatrix2TimesMatrix2:
+            case llvm::Intrinsic::gla_fMatrix2TimesMatrix3:
+            case llvm::Intrinsic::gla_fMatrix2TimesMatrix4:
+            case llvm::Intrinsic::gla_fMatrix3TimesMatrix2:
+            case llvm::Intrinsic::gla_fMatrix3TimesMatrix3:
+            case llvm::Intrinsic::gla_fMatrix3TimesMatrix4:
+            case llvm::Intrinsic::gla_fMatrix4TimesMatrix2:
+            case llvm::Intrinsic::gla_fMatrix4TimesMatrix3:
+            case llvm::Intrinsic::gla_fMatrix4TimesMatrix4:
+               matrix = getIRValue(matSource);
+               break;
+            default:
+               // we're going to load it the hard way
+               break;
+            }
+         }
+      }
+   }
+
+   if (!matrix) {
+      // If we can't use mat*vec directly, issue separate moves into a temp matrix,
+      // and multiply from that.
+      matrix = newIRVariableDeref(matType, ralloc_strdup(shader, ".mat.temp"), ir_var_auto);
+
+      for (int col=0; col < numCols; ++col) {
+         const int colPos = col + firstColumn;
+
+         ir_rvalue* indexVal = new(shader) ir_constant(col);
+         ir_rvalue* column   = new(shader) ir_dereference_array(matrix->clone(shader, 0), indexVal);
+
+         addIRInstruction(new(shader) ir_assignment(column, getIRValue(llvmInst->getOperand(colPos))));
+      }
+   }
+
+   return matrix;
+}
+
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * mat*mat intrinsics
+ * -----------------------------------------------------------------------------
+ */
+inline void
+MesaGlassTranslator::emitIRMatTimesMat(const llvm::Instruction* llvmInst,
+                                       int numLeftCols, int numRightCols)
+{
+   // Size of column vector
+   const int leftRows  =
+      gla::GetComponentCount(const_cast<llvm::Type*>(llvmInst->getOperand(0)->getType()));
+   const int rightRows =
+      gla::GetComponentCount(const_cast<llvm::Type*>(llvmInst->getOperand(numLeftCols)->getType()));
+
+   // LLVM produces a struct result, which isn't what we want to be making.  So, we'll
+   // override that with the type we know it ought to be.
+   const glsl_type* resultType = glslMatType(numLeftCols, rightRows);
+
+   ir_rvalue* leftMat  = intrinsicMat(llvmInst, 0, numLeftCols, leftRows);
+   ir_rvalue* rightMat = intrinsicMat(llvmInst, numLeftCols, numRightCols, rightRows);
+   
+   ir_rvalue* result   = glass_to_ir_expression(ir_binop_mul, resultType, leftMat, rightMat);
+
+   addIRInstruction(llvmInst, result);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * mat*Vec and vec*mat intrinsics
+ * -----------------------------------------------------------------------------
+ */
+inline void
+MesaGlassTranslator::emitIRMatMul(const llvm::Instruction* llvmInst, int numCols, bool matLeft)
+{
+   const int firstColumn = matLeft ? 0 : 1;
+   const int vecPos      = matLeft ? numCols : 0;
+
+   const llvm::Value* vecSource = llvmInst->getOperand(vecPos);
+
+   // Size of column vector
+   const int vecSize = 
+      gla::GetComponentCount(const_cast<llvm::Type*>(llvmInst->getOperand(vecPos)->getType()));
+
+   ir_rvalue* result = 0;
+
+   const glsl_type* resultType = llvmTypeToHirType(llvmInst->getType(), 0, llvmInst);
+
+   ir_rvalue* vector = getIRValue(vecSource);
+   ir_rvalue* matrix = intrinsicMat(llvmInst, firstColumn, numCols, vecSize);
+
+   // Issue the matrix multiply, in the requested order
+   if (matLeft)
+      result = glass_to_ir_expression(ir_binop_mul, resultType, matrix, vector);
+   else
+      result = glass_to_ir_expression(ir_binop_mul, resultType, vector, matrix);
+
+   addIRInstruction(llvmInst, result);
+}
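+
+// Illustrative sketch, not part of the translator: with matLeft == true the
+// columns occupy operands [0, numCols) and the vector is operand numCols, so
+// a mat4 * vec4 becomes
+//    (expression vec4 * (var_ref .mat.temp) (var_ref vec))
+// and matLeft == false swaps the operand order for vec * mat.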
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Translate intrinsic
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRIntrinsic(const llvm::IntrinsicInst* llvmInst)
+{
+   switch (llvmInst->getIntrinsicID()) {
+   case llvm::Intrinsic::invariant_end:                         return;  // ignore LLVM hints
+   case llvm::Intrinsic::invariant_start:                       return;  // ignore LLVM hints
+   case llvm::Intrinsic::lifetime_end:                          return;  // ignore LLVM hints
+   case llvm::Intrinsic::lifetime_start:                        return;  // ignore LLVM hints
+
+   // Handle Texturing ------------------------------------------------------------------------
+   case llvm::Intrinsic::gla_queryTextureSize:                  // ... fall through...
+   case llvm::Intrinsic::gla_queryTextureSizeNoLod:             // ...
+   case llvm::Intrinsic::gla_fQueryTextureLod:                  return emitIRTextureQuery(llvmInst);
+   // TODO: Goo: 430 Functionality: textureQueryLevels()
+   // case llvm::Intrinsic::gla_queryTextureLevels: // TODO:
+
+   case llvm::Intrinsic::gla_textureSample:                     // fall through...
+   case llvm::Intrinsic::gla_fTextureSample:                    // ...
+   case llvm::Intrinsic::gla_rTextureSample1:                   // ...
+   case llvm::Intrinsic::gla_fRTextureSample1:                  // ...
+   case llvm::Intrinsic::gla_rTextureSample2:                   // ...
+   case llvm::Intrinsic::gla_fRTextureSample2:                  // ...
+   case llvm::Intrinsic::gla_rTextureSample3:                   // ...
+   case llvm::Intrinsic::gla_fRTextureSample3:                  // ...
+   case llvm::Intrinsic::gla_rTextureSample4:                   // ...
+   case llvm::Intrinsic::gla_fRTextureSample4:                  // ...
+   case llvm::Intrinsic::gla_textureSampleLodRefZ:              // ...
+   case llvm::Intrinsic::gla_fTextureSampleLodRefZ:             // ...
+   case llvm::Intrinsic::gla_rTextureSampleLodRefZ1:            // ...
+   case llvm::Intrinsic::gla_fRTextureSampleLodRefZ1:           // ...
+   case llvm::Intrinsic::gla_rTextureSampleLodRefZ2:            // ...
+   case llvm::Intrinsic::gla_fRTextureSampleLodRefZ2:           // ...
+   case llvm::Intrinsic::gla_rTextureSampleLodRefZ3:            // ...
+   case llvm::Intrinsic::gla_fRTextureSampleLodRefZ3:           // ...
+   case llvm::Intrinsic::gla_rTextureSampleLodRefZ4:            // ...
+   case llvm::Intrinsic::gla_fRTextureSampleLodRefZ4:           // ...
+   case llvm::Intrinsic::gla_textureSampleLodRefZOffset:        // ...
+   case llvm::Intrinsic::gla_fTextureSampleLodRefZOffset:       // ...
+   case llvm::Intrinsic::gla_rTextureSampleLodRefZOffset1:      // ...
+   case llvm::Intrinsic::gla_fRTextureSampleLodRefZOffset1:     // ...
+   case llvm::Intrinsic::gla_rTextureSampleLodRefZOffset2:      // ...
+   case llvm::Intrinsic::gla_fRTextureSampleLodRefZOffset2:     // ...
+   case llvm::Intrinsic::gla_rTextureSampleLodRefZOffset3:      // ...
+   case llvm::Intrinsic::gla_fRTextureSampleLodRefZOffset3:     // ...
+   case llvm::Intrinsic::gla_rTextureSampleLodRefZOffset4:      // ...
+   case llvm::Intrinsic::gla_fRTextureSampleLodRefZOffset4:     // ...
+   case llvm::Intrinsic::gla_textureSampleLodRefZOffsetGrad:    // ...
+   case llvm::Intrinsic::gla_fTextureSampleLodRefZOffsetGrad:   // ...
+   case llvm::Intrinsic::gla_rTextureSampleLodRefZOffsetGrad1:  // ...
+   case llvm::Intrinsic::gla_fRTextureSampleLodRefZOffsetGrad1: // ...
+   case llvm::Intrinsic::gla_rTextureSampleLodRefZOffsetGrad2:  // ...
+   case llvm::Intrinsic::gla_fRTextureSampleLodRefZOffsetGrad2: // ...
+   case llvm::Intrinsic::gla_rTextureSampleLodRefZOffsetGrad3:  // ...
+   case llvm::Intrinsic::gla_fRTextureSampleLodRefZOffsetGrad3: // ...
+   case llvm::Intrinsic::gla_rTextureSampleLodRefZOffsetGrad4:  // ...
+   case llvm::Intrinsic::gla_fRTextureSampleLodRefZOffsetGrad4: // ...
+   case llvm::Intrinsic::gla_texelFetchOffset:                  // ... 
+   case llvm::Intrinsic::gla_fTexelFetchOffset:                 return emitIRTexture(llvmInst, false);
+   // Gather
+   case llvm::Intrinsic::gla_texelGather:                       // Fall through ...
+   case llvm::Intrinsic::gla_fTexelGather:                      // ...
+   case llvm::Intrinsic::gla_texelGatherOffset:                 // ...
+   case llvm::Intrinsic::gla_fTexelGatherOffset:                // ...
+   case llvm::Intrinsic::gla_texelGatherOffsets:                // ...
+   case llvm::Intrinsic::gla_fTexelGatherOffsets:               return emitIRTexture(llvmInst, true);
+
+   // Handle MultInsert -----------------------------------------------------------------------
+   case llvm::Intrinsic::gla_fMultiInsert:                      // fall through...
+   case llvm::Intrinsic::gla_multiInsert:                       return emitIRMultiInsert(llvmInst);
+
+   // Handle Swizzles -------------------------------------------------------------------------
+   case llvm::Intrinsic::gla_swizzle:                           // fall through...
+   case llvm::Intrinsic::gla_fSwizzle:                          return emitIRSwizzle(llvmInst);
+
+   // Handle FP and integer intrinsics --------------------------------------------------------
+   case llvm::Intrinsic::gla_abs:                               // fall through...
+   case llvm::Intrinsic::gla_fAbs:                              return emitOp<1>(ir_unop_abs, llvmInst);
+
+   case llvm::Intrinsic::gla_sMin:                              // fall through...
+   case llvm::Intrinsic::gla_fMin:                              return emitOp<2>(ir_binop_min, llvmInst);
+   case llvm::Intrinsic::gla_uMin:                              return emitOp<2>(ir_binop_min, llvmInst, true, true /* genUnsigned */);
+
+   case llvm::Intrinsic::gla_sMax:                              // fall through...
+   case llvm::Intrinsic::gla_fMax:                              return emitOp<2>(ir_binop_max, llvmInst);
+   case llvm::Intrinsic::gla_uMax:                              return emitOp<2>(ir_binop_max, llvmInst, true, true /* genUnsigned */);
+
+   case llvm::Intrinsic::gla_sClamp:                            // fall through...
+   case llvm::Intrinsic::gla_fClamp:                            return emitIRClamp(llvmInst);
+   case llvm::Intrinsic::gla_uClamp:                            return emitIRClamp(llvmInst, true /* genUnsigned */);
+
+   case llvm::Intrinsic::gla_fRadians:                          return emitFn("radians", llvmInst);
+   case llvm::Intrinsic::gla_fDegrees:                          return emitFn("degrees", llvmInst);
+   case llvm::Intrinsic::gla_fSin:                              return emitOp<1>(ir_unop_sin, llvmInst);
+   case llvm::Intrinsic::gla_fCos:                              return emitOp<1>(ir_unop_cos, llvmInst);
+   case llvm::Intrinsic::gla_fTan:                              return emitFn("tan", llvmInst);
+   case llvm::Intrinsic::gla_fAsin:                             return emitFn("asin", llvmInst);
+   case llvm::Intrinsic::gla_fAcos:                             return emitFn("acos", llvmInst);
+   case llvm::Intrinsic::gla_fAtan:                             // fall through...
+   case llvm::Intrinsic::gla_fAtan2:                            return emitFn("atan", llvmInst);
+   case llvm::Intrinsic::gla_fSinh:                             return emitFn("sinh", llvmInst);
+   case llvm::Intrinsic::gla_fCosh:                             return emitFn("cosh", llvmInst);
+   case llvm::Intrinsic::gla_fTanh:                             return emitFn("tanh", llvmInst);
+   case llvm::Intrinsic::gla_fAsinh:                            return emitFn("asinh", llvmInst);
+   case llvm::Intrinsic::gla_fAcosh:                            return emitFn("acosh", llvmInst);
+   case llvm::Intrinsic::gla_fAtanh:                            return emitFn("atanh", llvmInst);
+   case llvm::Intrinsic::gla_fPow:                              // fall through...
+   case llvm::Intrinsic::gla_fPowi:                             return emitOp<2>(ir_binop_pow,  llvmInst);
+   case llvm::Intrinsic::gla_fExp:                              return emitOp<1>(ir_unop_exp,   llvmInst);
+   case llvm::Intrinsic::gla_fLog:                              return emitOp<1>(ir_unop_log,   llvmInst);
+   case llvm::Intrinsic::gla_fExp2:                             return emitOp<1>(ir_unop_exp2,  llvmInst);
+   case llvm::Intrinsic::gla_fLog2:                             return emitOp<1>(ir_unop_log2,  llvmInst);
+   case llvm::Intrinsic::gla_fExp10:                            assert(0); return; // Let LunarGlass decompose
+   case llvm::Intrinsic::gla_fLog10:                            assert(0); return; // Let LunarGlass decompose
+   case llvm::Intrinsic::gla_fSqrt:                             return emitOp<1>(ir_unop_sqrt,  llvmInst);
+   case llvm::Intrinsic::gla_fInverseSqrt:                      return emitOp<1>(ir_unop_rsq,   llvmInst);
+   case llvm::Intrinsic::gla_fSign:                             // fall through...
+   case llvm::Intrinsic::gla_sign:                              return emitOp<1>(ir_unop_sign,  llvmInst);
+   case llvm::Intrinsic::gla_fFloor:                            return emitOp<1>(ir_unop_floor, llvmInst);
+   case llvm::Intrinsic::gla_fCeiling:                          return emitOp<1>(ir_unop_ceil,  llvmInst);
+   case llvm::Intrinsic::gla_fRoundEven:                        return emitOp<1>(ir_unop_round_even, llvmInst);
+   case llvm::Intrinsic::gla_fRoundZero:                        return emitOp<1>(ir_unop_trunc,  llvmInst);
+   case llvm::Intrinsic::gla_fRoundFast:                        return emitFn("round", llvmInst);
+   case llvm::Intrinsic::gla_fFraction:                         return emitOp<1>(ir_unop_fract, llvmInst);
+   case llvm::Intrinsic::gla_fModF:                             return emitFn("modf", llvmInst);
+   case llvm::Intrinsic::gla_fMix:                              return emitOp<3>(ir_triop_lrp,  llvmInst);
+   case llvm::Intrinsic::gla_fbMix:                             return emitFn("mix", llvmInst);
+   case llvm::Intrinsic::gla_fStep:                             return emitFn("step", llvmInst);
+   case llvm::Intrinsic::gla_fSmoothStep:                       return emitFn("smoothstep", llvmInst);
+   case llvm::Intrinsic::gla_fIsNan:                            return emitFn("isnan", llvmInst);
+   case llvm::Intrinsic::gla_fIsInf:                            return emitFn("isinf", llvmInst);
+   case llvm::Intrinsic::gla_fSaturate:                         return emitIRSaturate(llvmInst);
+   case llvm::Intrinsic::gla_fFma:                              return emitOp<3>(ir_triop_fma,  llvmInst);
+
+   // Handle integer-only ops -----------------------------------------------------------------
+   case llvm::Intrinsic::gla_addCarry:                          return emitOp<2>(ir_binop_carry, llvmInst);
+   case llvm::Intrinsic::gla_subBorrow:                         return emitOp<2>(ir_binop_borrow, llvmInst);
+   case llvm::Intrinsic::gla_umulExtended:                      return emitFn("umulExtended", llvmInst);
+   case llvm::Intrinsic::gla_smulExtended:                      return emitFn("smulExtended", llvmInst);
+
+   // Handle mat*vec & vec*mat intrinsics -----------------------------------------------------
+   case llvm::Intrinsic::gla_fMatrix2TimesVector:               return emitIRMatMul(llvmInst, 2, true);
+   case llvm::Intrinsic::gla_fMatrix3TimesVector:               return emitIRMatMul(llvmInst, 3, true);
+   case llvm::Intrinsic::gla_fMatrix4TimesVector:               return emitIRMatMul(llvmInst, 4, true);
+   case llvm::Intrinsic::gla_fVectorTimesMatrix2:               return emitIRMatMul(llvmInst, 2, false);
+   case llvm::Intrinsic::gla_fVectorTimesMatrix3:               return emitIRMatMul(llvmInst, 3, false);
+   case llvm::Intrinsic::gla_fVectorTimesMatrix4:               return emitIRMatMul(llvmInst, 4, false);
+
+   // Handle mat*mat --------------------------------------------------------------------------
+   case llvm::Intrinsic::gla_fMatrix2TimesMatrix2:              return emitIRMatTimesMat(llvmInst, 2, 2);
+   case llvm::Intrinsic::gla_fMatrix2TimesMatrix3:              return emitIRMatTimesMat(llvmInst, 2, 3);
+   case llvm::Intrinsic::gla_fMatrix2TimesMatrix4:              return emitIRMatTimesMat(llvmInst, 2, 4);
+   case llvm::Intrinsic::gla_fMatrix3TimesMatrix2:              return emitIRMatTimesMat(llvmInst, 3, 2);
+   case llvm::Intrinsic::gla_fMatrix3TimesMatrix3:              return emitIRMatTimesMat(llvmInst, 3, 3);
+   case llvm::Intrinsic::gla_fMatrix3TimesMatrix4:              return emitIRMatTimesMat(llvmInst, 3, 4);
+   case llvm::Intrinsic::gla_fMatrix4TimesMatrix2:              return emitIRMatTimesMat(llvmInst, 4, 2);
+   case llvm::Intrinsic::gla_fMatrix4TimesMatrix3:              return emitIRMatTimesMat(llvmInst, 4, 3);
+   case llvm::Intrinsic::gla_fMatrix4TimesMatrix4:              return emitIRMatTimesMat(llvmInst, 4, 4);
+
+   // Handle bit operations -------------------------------------------------------------------
+   case llvm::Intrinsic::gla_fFloatBitsToInt:                   return emitFn("floatBitsToInt", llvmInst);
+   case llvm::Intrinsic::gla_fIntBitsTofloat:                   return emitFn("intBitsToFloat", llvmInst);
+   case llvm::Intrinsic::gla_sBitFieldExtract:                  return emitOp<3>(ir_triop_bitfield_extract, llvmInst);
+   case llvm::Intrinsic::gla_uBitFieldExtract:                  return emitOp<3>(ir_triop_bitfield_extract, llvmInst, true, true, true /* genUnsigned */);
+   case llvm::Intrinsic::gla_bitFieldInsert:                    return emitOp<3>(ir_triop_bfi, llvmInst);
+   case llvm::Intrinsic::gla_bitReverse:                        return emitOp<1>(ir_unop_bitfield_reverse, llvmInst);
+   case llvm::Intrinsic::gla_bitCount:                          return emitOp<1>(ir_unop_bit_count, llvmInst);
+   case llvm::Intrinsic::gla_findLSB:                           return emitOp<1>(ir_unop_find_lsb, llvmInst);
+   case llvm::Intrinsic::gla_sFindMSB:                          return emitOp<1>(ir_unop_find_msb, llvmInst);
+   case llvm::Intrinsic::gla_uFindMSB:                          return emitOp<1>(ir_unop_find_msb, llvmInst, true /* genUnsigned */);
+
+   // Handle pack/unpack ----------------------------------------------------------------------
+   case llvm::Intrinsic::gla_fFrexp:                            return emitFn("frexp", llvmInst);
+   case llvm::Intrinsic::gla_fLdexp:                            return emitFn("ldexp", llvmInst);
+   case llvm::Intrinsic::gla_fPackUnorm2x16:                    return emitOp<1>(ir_unop_pack_unorm_2x16, llvmInst);
+   case llvm::Intrinsic::gla_fUnpackUnorm2x16:                  return emitOp<1>(ir_unop_unpack_unorm_2x16, llvmInst);
+
+   case llvm::Intrinsic::gla_fPackSnorm2x16:                    return emitOp<1>(ir_unop_pack_snorm_2x16, llvmInst);
+   case llvm::Intrinsic::gla_fUnpackSnorm2x16:                  return emitOp<1>(ir_unop_unpack_snorm_2x16, llvmInst);
+
+   case llvm::Intrinsic::gla_fPackHalf2x16:                     return emitOp<1>(ir_unop_pack_half_2x16, llvmInst);
+   case llvm::Intrinsic::gla_fUnpackHalf2x16:                   return emitOp<1>(ir_unop_unpack_half_2x16, llvmInst);
+
+   case llvm::Intrinsic::gla_fPackUnorm4x8:                     return emitOp<1>(ir_unop_pack_unorm_4x8, llvmInst);
+   case llvm::Intrinsic::gla_fPackSnorm4x8:                     return emitOp<1>(ir_unop_pack_snorm_4x8, llvmInst);
+
+   case llvm::Intrinsic::gla_fUnpackUnorm4x8:                   return emitOp<1>(ir_unop_unpack_unorm_4x8, llvmInst);
+   case llvm::Intrinsic::gla_fUnpackSnorm4x8:                   return emitOp<1>(ir_unop_unpack_snorm_4x8, llvmInst);
+
+   case llvm::Intrinsic::gla_fPackDouble2x32:                   assert(0); break; // TODO:
+   case llvm::Intrinsic::gla_fUnpackDouble2x32:                 assert(0); break; // TODO:
+
+   // Handle geometry ops ---------------------------------------------------------------------
+   case llvm::Intrinsic::gla_fLength:                           return emitFn("length", llvmInst);
+   case llvm::Intrinsic::gla_fDistance:                         return emitFn("distance", llvmInst);
+   case llvm::Intrinsic::gla_fDot2:                             // fall through...
+   case llvm::Intrinsic::gla_fDot3:                             // ...
+   case llvm::Intrinsic::gla_fDot4:                             return emitOp<2>(ir_binop_dot, llvmInst);
+   case llvm::Intrinsic::gla_fCross:                            return emitFn("cross", llvmInst);
+   case llvm::Intrinsic::gla_fNormalize:                        return emitFn("normalize", llvmInst);
+   case llvm::Intrinsic::gla_fNormalize3D:                      assert(0); break; // TODO:
+   case llvm::Intrinsic::gla_fLit:                              assert(0); break; // TODO:
+   case llvm::Intrinsic::gla_fFaceForward:                      return emitFn("faceforward", llvmInst);
+   case llvm::Intrinsic::gla_fReflect:                          return emitFn("reflect", llvmInst);
+   case llvm::Intrinsic::gla_fRefract:                          return emitFn("refract", llvmInst);
+
+   // Handle derivative and transform ---------------------------------------------------------
+   case llvm::Intrinsic::gla_fDFdx:                             return emitOp<1>(ir_unop_dFdx, llvmInst);
+   case llvm::Intrinsic::gla_fDFdy:                             return emitOp<1>(ir_unop_dFdy, llvmInst);
+   case llvm::Intrinsic::gla_fFilterWidth:                      return emitFn("fwidth", llvmInst);
+
+   // Handle vector logical ops ---------------------------------------------------------------
+   case llvm::Intrinsic::gla_not:                               return emitFn("not", llvmInst);
+   case llvm::Intrinsic::gla_any:                               return emitOp<1>(ir_unop_any, llvmInst);
+   case llvm::Intrinsic::gla_all:                               return emitFn("all", llvmInst);
+
+   case llvm::Intrinsic::gla_discardConditional:                return emitIRDiscardCond(llvmInst);
+
+   // Handle fixed transform ------------------------------------------------------------------
+   case llvm::Intrinsic::gla_fFixedTransform:                   return emitIRFTransform(llvmInst);
+
+   // Control ---------------------------------------------------------------------------------
+   case llvm::Intrinsic::gla_barrier:                           assert(0); break; // TODO:
+   case llvm::Intrinsic::gla_memoryBarrier:                     return emitFn("memoryBarrier", llvmInst);
+   case llvm::Intrinsic::gla_memoryBarrierAtomicCounter:        assert(0); break; // TODO:
+   case llvm::Intrinsic::gla_memoryBarrierBuffer:               assert(0); break; // TODO:
+   case llvm::Intrinsic::gla_memoryBarrierImage:                assert(0); break; // TODO:
+   case llvm::Intrinsic::gla_memoryBarrierShared:               assert(0); break; // TODO:
+   case llvm::Intrinsic::gla_groupMemoryBarrier:                assert(0); break; // TODO:
+
+   // Geometry --------------------------------------------------------------------------------
+   case llvm::Intrinsic::gla_emitVertex:                        return emitIREmitVertex(llvmInst);
+   case llvm::Intrinsic::gla_endPrimitive:                      return emitIREndPrimitive(llvmInst);
+   case llvm::Intrinsic::gla_emitStreamVertex:                  assert(0); break; // TODO:
+   case llvm::Intrinsic::gla_endStreamPrimitive:                assert(0); break; // TODO:
+
+   default:     return error("unknown intrinsic");
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Translate call or intrinsic
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRCallOrIntrinsic(const llvm::Instruction* llvmInst)
+{
+   if (const llvm::IntrinsicInst* intrinsic = llvm::dyn_cast<llvm::IntrinsicInst>(llvmInst)) {
+      emitIRIntrinsic(intrinsic);
+   } else {
+      const llvm::CallInst* call = llvm::dyn_cast<llvm::CallInst>(llvmInst);
+      assert(call);
+      emitIRCall(call);
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Vector component extraction
+ *   <result> = extractelement <n x <ty>> <val>, i32 <idx>    ; yields type <ty>
+ *   index can be constant or variable
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRExtractElement(const llvm::Instruction* llvmInst)
+{
+   ir_rvalue *vector   = getIRValue(llvmInst->getOperand(0));
+   ir_rvalue *index    = getIRValue(llvmInst->getOperand(1));
+   assert(vector && index);
+
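+   // A constant index can be expressed directly as a single-component
+   // swizzle; a variable index needs the general vector_extract expression.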
+   if (index->as_constant()) {
+      unsigned component = index->as_constant()->get_uint_component(0);
+      const unsigned components[4] = { component, component, component, component };
+      ir_swizzle* swizzle = new(shader) ir_swizzle(vector, components, 1);
+      addIRInstruction(llvmInst, swizzle);
+   } else {
+      emitOp<2>(ir_binop_vector_extract, llvmInst);
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Emit IR select (ternary op)
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRSelect(const llvm::Instruction* llvmInst)
+{
+   const glsl_type *hirType = llvmTypeToHirType(llvmInst->getType(), 0, llvmInst);
+   ir_rvalue* op[3];
+
+   op[0] = getIRValue(llvmInst->getOperand(0));
+   op[1] = getIRValue(llvmInst->getOperand(1));
+   op[2] = getIRValue(llvmInst->getOperand(2));
+
+   ir_rvalue* result;
+
+   // HIR can't select between arrays, matrices, or records from a single
+   // boolean, so create an if/else/endif in those cases.
+   if (op[1]->type->is_array() || op[1]->type->is_matrix() || op[1]->type->is_record()) {
+      result = newIRVariableDeref(hirType, newName(llvmInst), ir_var_auto);
+
+      addIf(op[0]); {
+         addIRInstruction(new(shader) ir_assignment(result, op[1]));
+      } addElse(); {
+         addIRInstruction(new(shader) ir_assignment(result->clone(shader, 0), op[2]));
+      } addEndif();
+   } else {
+      // HIR wants the condition to be a vector matching the element count of
+      // the operands, so insert a broadcast swizzle when needed.
+      if (op[1]->type->is_vector() && !op[0]->type->is_vector())
+         op[0] = new(shader) ir_swizzle(op[0], 0, 0, 0, 0, op[1]->type->vector_elements);
+
+      result = glass_to_ir_expression(ir_triop_csel, hirType, op[0], op[1], op[2]);
+   }
+   
+   assert(result && op[0] && op[1] && op[2]);
+
+   addIRInstruction(llvmInst, result);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Fix IR L-values
+ * When we see a GEP chain, we don't yet know whether it is destined for an
+ * L-value or R-value reference.  The HIR is mostly the same for both, so
+ * this usually works out, but vector component references require
+ * different HIR.  We handle that by converting the R-value form to the
+ * L-value form at the point of use, so we don't have to guess before we
+ * see how the chain is used.  (Alternatively, we could analyze the
+ * situation in a pre-pass.)
+ * -----------------------------------------------------------------------------
+ */
+ir_instruction* MesaGlassTranslator::fixIRLValue(ir_rvalue* lhs, ir_rvalue* rhs)
+{
+   // convert:
+   //  (assign (vector_extract vec comp) elementVal)
+   // to:
+   //  (assign vec (vector_insert vec elementval comp))
+   // TODO: Sometimes we might not need the assign
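+   // For example (an illustrative GLSL-level sketch): a store like "v[i] = x"
+   // reaches us with the R-value form (vector_extract v i) on the left-hand
+   // side; the rewrite below turns it into "v = vector_insert(v, x, i)".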
+
+   if (ir_expression* lval = lhs->as_expression()) {
+      if (lval->operation == ir_binop_vector_extract) {
+         ir_expression* vecIns =
+            glass_to_ir_expression(ir_triop_vector_insert,
+                                      lval->operands[0]->type,
+                                      lval->operands[0], 
+                                      rhs,
+                                      lval->operands[1]);
+
+         return new(shader) ir_assignment(lval->operands[0]->clone(shader, 0),
+                                          vecIns);
+      }
+   }
+
+   // Otherwise, return normal assignment
+   return new(shader) ir_assignment(lhs, rhs);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Load, potentially from GEP chain offset
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRLoad(const llvm::Instruction* llvmInst)
+{
+   addIRInstruction(llvmInst, makeIRLoad(llvmInst));
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * A binding is two 16-bit numbers combined: the upper 16 bits hold the
+ * descriptor set (biased by 1), and the lower 16 bits hold the binding.
+ * This function extracts and returns each of them.
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::unPackSetAndBinding(const int bindingIn, int& set, int& bindingOut)
+{
+    // Logic mirrored from LunarGLASS GLSL backend
+    set = (unsigned) bindingIn >> 16;
+    bindingOut = bindingIn & 0xFFFF;
+
+    // Unbias set, which was biased by 1 to distinguish between "set=0" and nothing.
+    bool setPresent = (set != 0);
+    if (setPresent)
+        --set;
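+
+    // Worked example (hypothetical values): bindingIn == 0x00030005 yields a
+    // raw set field of 3 and binding 5; unbiasing leaves set == 2, binding == 5.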
+}
+
+/**
+ * -----------------------------------------------------------------------------
+ * Set IO variable parameters (locations, interp modes, pixel origins, etc)
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::setIoParameters(ir_variable* ioVar, const llvm::MDNode* mdNode)
+{
+   if (ioVar && mdNode) {
+      const llvm::Type*   mdType = 0;
+      MetaType metaType;
+
+      // Glean information from metadata for intrinsic
+      decodeMdTypesEmitMdQualifiers(true, mdNode, mdType, false, metaType);
+
+      // ioVar->data.index         = slotOffset;
+      ioVar->data.used = true;
+      ioVar->data.origin_upper_left    = state->fs_origin_upper_left;
+      ioVar->data.pixel_center_integer = state->fs_pixel_center_integer;
+
+      if (metaType.block || metaType.mdSampler != 0) {
+          if (metaType.binding != -1) {
+              // If we have a sampler or a block, we need a binding
+              ioVar->data.explicit_binding  = true;
+              // Note: for sampler and UBO bindings, this is the combined
+              // set/binding value (two 16-bit halves); pass the whole thing.
+              ioVar->data.binding           = metaType.binding;
+          }
+      } else {
+          int location = metaType.location;
+          if (location >= 0 && location < gla::MaxUserLayoutLocation) {
+              ioVar->data.explicit_location = true;
+
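+              // Remap the abstract location into Mesa's slot namespaces:
+              // fragment outputs count from FRAG_RESULT_DATA0, vertex inputs
+              // from VERT_ATTRIB_GENERIC0, and other varyings from
+              // VARYING_SLOT_VAR0.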
+              if ((manager->getStage() == EShLangFragment) && metaType.qualifier == EVQOutput)
+                  ioVar->data.location      = location + FRAG_RESULT_DATA0;
+              else if ((manager->getStage() == EShLangVertex) && metaType.qualifier == EVQInput)
+                  ioVar->data.location      = location + VERT_ATTRIB_GENERIC0;
+              else
+                  ioVar->data.location      = location + VARYING_SLOT_VAR0;
+          }
+      }
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Store, potentially to GEP chain offset
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRStore(const llvm::Instruction* llvmInst)
+{
+   const llvm::Value*             src     = llvmInst->getOperand(0);
+   const llvm::Value*             dst     = llvmInst->getOperand(1);
+   const llvm::GetElementPtrInst* gepInst = getGepAsInst(dst);
+   const llvm::MDNode*            mdNode;
+   std::string                    name;
+
+   if (gepInst)
+      name = gepInst->getOperand(0)->getName();
+   else 
+      name = llvmInst->getOperand(1)->getName();
+
+   // Look for builtin
+   tNameBuiltinMap::const_iterator got = nameBuiltinMap.find(name);
+   if (got != nameBuiltinMap.end())
+       name = got->second;
+
+   mdNode = name.empty() ? 0 : typenameMdMap[name];
+
+   assert(llvm::isa<llvm::PointerType>(dst->getType()));
+
+   ir_rvalue* irDst;
+   ir_variable* ioVar = 0;
+   ir_rvalue* aggregate = 0;
+
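+   // Three destination cases follow: (1) a GEP chain, traversed to build the
+   // dereference; (2) a temporary pointer from another instruction, used
+   // directly; (3) a named global, dereferenced by name.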
+   if (gepInst) {
+      // This GEP-derived metadata is local; the outer, name-derived mdNode
+      // above is what later feeds setIoParameters.
+      const llvm::MDNode* gepMdNode;
+      const llvm::Type*   mdType;
+      const llvm::Type*   mdAggregateType;
+
+      FindGepType(gepInst, mdType, mdAggregateType, gepMdNode);
+
+      const glsl_type* irType = llvmTypeToHirType(mdType, gepMdNode, llvmInst);
+
+      const llvm::Value* gepSrc = gepInst->getPointerOperand();
+
+      // If this is the first write to this aggregate, make up a new one.
+      const tValueMap::const_iterator location = valueMap.find(gepSrc);
+      if (location == valueMap.end()) {
+
+         if (gepSrc) {
+            name = gepInst->getPointerOperand()->getName();
+
+            // Look for builtin
+            tNameBuiltinMap::const_iterator got = nameBuiltinMap.find(name);
+            if (got != nameBuiltinMap.end())
+                name = got->second;
+         }
+
+         const ir_variable_mode irMode = ir_variable_mode(globalVarModeMap[name]);
+
+         irDst = newIRVariableDeref(irType, name, irMode);
+
+         valueMap.insert(tValueMap::value_type(gepSrc, irDst));
+      }
+
+      aggregate = getIRValue(gepSrc);
+      irDst = traverseGEP(gepInst, aggregate, 0);
+
+   } else if (llvm::isa<llvm::Instruction>(dst)) { // temporary pointer can just be used directly
+      irDst = getIRValue(dst);
+
+   } else {
+      const glsl_type* irType = llvmTypeToHirType(dst->getType()->getContainedType(0), 0, llvmInst);
+
+      // llvm::StringRef name = dst->getName();
+      const ir_variable_mode irMode = ir_variable_mode(globalVarModeMap[name]);
+
+      irDst = newIRVariableDeref(irType, name, irMode);
+   }
+
+   if (irDst->as_dereference_variable()) {
+      ioVar = irDst->as_dereference_variable()->variable_referenced();
+   } else if (aggregate) {
+      ioVar = aggregate->as_dereference_variable()->variable_referenced();
+   }
+
+   setIoParameters(ioVar, mdNode);
+
+   addIRInstruction(llvmInst, fixIRLValue(irDst, getIRValue(src)));
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Vector component insertion
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRInsertElement(const llvm::Instruction* llvmInst)
+{
+   // ir_tripop_vector_insert
+   assert(0 && "TODO: handle insertElement");
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ *  Traverse one step of a dereference chain
+ * -----------------------------------------------------------------------------
+ */
+ir_rvalue* MesaGlassTranslator::dereferenceGep(const llvm::Type*& type, ir_rvalue* aggregate,
+                                               llvm::Value* operand, int index,
+                                               const llvm::MDNode*& mdAggregate,
+                                               EMdTypeLayout* mdTypeLayout)
+{
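+   // Index convention: when 'operand' is non-null (the GEP path), a constant
+   // operand yields a direct index and anything else marks the access as
+   // indirect (-1); when 'operand' is null (insertValue/extractValue paths),
+   // the caller supplies the literal index.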
+   if (operand) {
+      if (llvm::isa<const llvm::ConstantInt>(operand))
+         index = GetConstantInt(operand);
+      else
+         index = -1;
+   }
+
+   switch (type->getTypeID()) {
+   case llvm::Type::StructTyID:
+   case llvm::Type::ArrayTyID:
+   case llvm::Type::VectorTyID:
+      if (aggregate->type->is_array() || aggregate->type->is_matrix()) {
+         ir_rvalue* indexVal;
+
+         if (index < 0) { // Indirect indexing
+            indexVal = getIRValue(operand);
+            trackMaxArrayElement(aggregate, -1);
+         } else { // Direct indexing
+            indexVal = new(shader) ir_constant(index);
+            trackMaxArrayElement(aggregate, index);
+         }
+
+         type = type->getContainedType(0);
+
+         return new(shader) ir_dereference_array(aggregate, indexVal);
+      } else if (aggregate->type->is_vector()) {
+         type = type->getContainedType(0);
+
+         return glass_to_ir_expression(ir_binop_vector_extract,
+                                          llvmTypeToHirType(type), aggregate,
+                                          getIRValue(operand));
+      } else if (aggregate->type->is_record() ||
+                 aggregate->type->is_interface()) {
+         if (index < 0) {
+            error("non-constant index in structure dereference");
+            // On error, just select the first field, since we can't know what was really meant.
+            const char *field_name = ralloc_strdup(shader, aggregate->type->fields.structure[0].name);
+            return new(shader) ir_dereference_record(aggregate, field_name);
+         }
+
+         type = type->getContainedType(index);
+
+         // Dereference the metadata aggregate
+         if (mdAggregate) {
+            const int aggOp = GetAggregateMdSubAggregateOp(index);
+            if (int(mdAggregate->getNumOperands()) <= aggOp) {
+               error("bad MD aggregate index");
+               mdAggregate = 0;
+            } else {
+               mdAggregate = llvm::dyn_cast<llvm::MDNode>(mdAggregate->getOperand(aggOp));
+            }
+
+            if (mdAggregate && mdTypeLayout)
+               *mdTypeLayout = GetMdTypeLayout(mdAggregate);
+         }
+
+         if (aggregate->as_dereference_variable()) {
+            ir_variable* var = aggregate->as_dereference_variable()->variable_referenced();
+            
+            // interface block members are hoisted to global scope in HIR
+            if (anonBlocks.find(var->name) != anonBlocks.end()) {
+               return newIRVariableDeref(aggregate->type->fields.structure[index].type,
+                                         aggregate->type->fields.structure[index].name,
+                                         ir_variable_mode(globalVarModeMap[var->name]));
+            }
+         }
+
+         const char *field_name = ralloc_strdup(shader, aggregate->type->fields.structure[index].name);
+
+         // interface block members are hoisted to global scope in HIR
+         if (anonBlocks.find(aggregate->type->name) != anonBlocks.end()) {
+            const ir_variable_mode irMode = ir_variable_mode(globalVarModeMap[aggregate->type->name]);
+            return newIRVariableDeref(llvmTypeToHirType(type), field_name, irMode);
+         } else {
+            return new(shader) ir_dereference_record(aggregate, field_name);
+         }
+      }
+      // fall through to the error case if it wasn't an array, vector, or struct
+
+   default:
+      error("unexpected type in GEP");
+      return newIRVariableDeref(glsl_type::get_instance(GLSL_TYPE_FLOAT, 1, 1),
+                                "errorval", ir_var_auto);
+   }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Find a GEP's dereferenced type, aggregate type, and aggregate metadata
+ * -----------------------------------------------------------------------------
+ */
+
+inline void MesaGlassTranslator::FindGepType(const llvm::Instruction* llvmInst,
+                                             const llvm::Type*& type,
+                                             const llvm::Type*& aggregateType,
+                                             const llvm::MDNode*& mdNode)
+{
+   aggregateType = type = llvmInst->getOperand(0)->getType();
+
+   while (type->getTypeID() == llvm::Type::PointerTyID)
+      type = type->getContainedType(0);
+
+   while (aggregateType->getTypeID() == llvm::Type::PointerTyID || aggregateType->getTypeID() == llvm::Type::ArrayTyID)
+      aggregateType = aggregateType->getContainedType(0);
+   
+   auto it = typeMdAggregateMap.find(aggregateType);
+   mdNode = (it == typeMdAggregateMap.end()) ? 0 : it->second;
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Traverse GEP instruction chain
+ * -----------------------------------------------------------------------------
+ */
+inline ir_rvalue* MesaGlassTranslator::traverseGEP(const llvm::Instruction* llvmInst,
+                                                   ir_rvalue* aggregate,
+                                                   EMdTypeLayout* mdTypeLayout)
+{
+   // *** Function borrowed from LunarGlass GLSL backend, modified slightly ***
+   const llvm::MDNode* mdAggregate;
+   const llvm::Type*   mdType;
+   const llvm::Type*   mdAggregateType;
+
+   FindGepType(llvmInst, mdType, mdAggregateType, mdAggregate);
+
+   // Register type in the type map:
+   const llvm::StructType* structType = llvm::dyn_cast<const llvm::StructType>(mdAggregateType);
+   if (structType) {
+      const llvm::StringRef structName = structType->isLiteral() ? "" : structType->getName();
+      typenameMdMap[structName] = mdAggregate;
+   }
+
+   if (const llvm::GetElementPtrInst* gepInst = getGepAsInst(llvmInst)) {
+      // Start at operand 2 since indices 0 and 1 give you the base and are handled before traverseGep
+      const llvm::Type* gepType = gepInst->getPointerOperandType()->getContainedType(0);
+      for (unsigned int op = 2; op < gepInst->getNumOperands(); ++op)
+         aggregate = dereferenceGep(gepType, aggregate, gepInst->getOperand(op), -1, mdAggregate, mdTypeLayout);
+
+   } else if (const llvm::InsertValueInst* insertValueInst = llvm::dyn_cast<const llvm::InsertValueInst>(llvmInst)) {
+      const llvm::Type* gepType = insertValueInst->getAggregateOperand()->getType();            
+      for (llvm::InsertValueInst::idx_iterator iter = insertValueInst->idx_begin(), end = insertValueInst->idx_end();
+           iter != end; ++iter)
+         aggregate = dereferenceGep(gepType, aggregate, 0, *iter, mdAggregate);
+
+   } else if (const llvm::ExtractValueInst* extractValueInst = llvm::dyn_cast<const llvm::ExtractValueInst>(llvmInst)) {
+      const llvm::Type* gepType = extractValueInst->getAggregateOperand()->getType();  
+      for (llvm::ExtractValueInst::idx_iterator iter = extractValueInst->idx_begin(), end = extractValueInst->idx_end();
+           iter != end; ++iter)
+         aggregate = dereferenceGep(gepType, aggregate, 0, *iter, mdAggregate);
+
+   } else {
+      error("non-GEP in traverseGEP");
+      return newIRVariableDeref(glsl_type::get_instance(GLSL_TYPE_FLOAT, 1, 1),
+                                "errorval", ir_var_auto);
+   }
+
+   return aggregate;
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Array component or struct extraction
+ *   <result> = extractvalue <aggregate type> <val>, <idx>{, <idx>}*
+ *      (array_ref (var_ref ...) (constant int (0) ))
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRExtractValue(const llvm::Instruction* llvmInst)
+{
+   const llvm::ExtractValueInst* extractValueInst = llvm::dyn_cast<const llvm::ExtractValueInst>(llvmInst);
+   assert(extractValueInst);
+
+   const llvm::Value* aggregate = extractValueInst->getAggregateOperand();
+   assert(aggregate);
+
+   addIRInstruction(llvmInst, traverseGEP(extractValueInst, getIRValue(aggregate), 0));
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Array component insertion
+ * -----------------------------------------------------------------------------
+ */
+inline void MesaGlassTranslator::emitIRInsertValue(const llvm::Instruction* llvmInst)
+{
+   const llvm::InsertValueInst* insertValueInst = llvm::dyn_cast<const llvm::InsertValueInst>(llvmInst);
+   assert(insertValueInst);
+
+   ir_rvalue* aggregate  = getIRValue(insertValueInst->getAggregateOperand());
+   ir_rvalue* gep        = traverseGEP(insertValueInst, aggregate, 0);
+   ir_rvalue* src        = getIRValue(insertValueInst->getInsertedValueOperand());
+   ir_assignment* assign = new(shader) ir_assignment(gep, src);
+
+   addIRInstruction(assign);
+   addIRInstruction(llvmInst, aggregate);
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Translate instruction
+ * -----------------------------------------------------------------------------
+ */
+void MesaGlassTranslator::addInstruction(const llvm::Instruction* llvmInst, bool lastBlock, bool referencedOutsideScope)
+{
+    if (referencedOutsideScope) {
+       // emitGlaValueDeclaration(llvmInst, referencedOutsideScope);
+    }
+
+    switch (llvmInst->getOpcode()) {
+    case llvm::Instruction::Add:            // fall through...
+    case llvm::Instruction::FAdd:           return emitOp<2>(ir_binop_add,     llvmInst);
+    case llvm::Instruction::Sub:            // fall through...
+    case llvm::Instruction::FSub:           return emitOp<2>(ir_binop_sub,     llvmInst);
+    case llvm::Instruction::Mul:            // fall through...
+    case llvm::Instruction::FMul:           return emitOp<2>(ir_binop_mul,     llvmInst);
+    case llvm::Instruction::UDiv:           return emitOp<2>(ir_binop_div,     llvmInst, true, true /* genUnsigned */);
+    case llvm::Instruction::SDiv:           // fall through...
+    case llvm::Instruction::FDiv:           return emitOp<2>(ir_binop_div,     llvmInst);
+    case llvm::Instruction::URem:           return emitOp<2>(ir_binop_mod,     llvmInst, true, true /* genUnsigned */);
+    case llvm::Instruction::SRem:           return emitOp<2>(ir_binop_mod,     llvmInst);
+    case llvm::Instruction::FRem:           return emitFn("mod", llvmInst);
+    case llvm::Instruction::Shl:            return emitOp<2>(ir_binop_lshift,  llvmInst);
+    case llvm::Instruction::LShr:           return emitOp<2>(ir_binop_rshift,  llvmInst, true /* genUnsigned */);
+    case llvm::Instruction::AShr:           return emitOp<2>(ir_binop_rshift,  llvmInst);
+    case llvm::Instruction::And:            return emitOpBit<2>(ir_binop_logic_and, ir_binop_bit_and, llvmInst);
+    case llvm::Instruction::Or:             return emitOpBit<2>(ir_binop_logic_or,  ir_binop_bit_or,  llvmInst);
+    // TODO: Xor needs a dedicated emitter to handle the patterns LLVM
+    // lowers to it, such as unary negation.
+    case llvm::Instruction::Xor:            return emitOpBit<2>(ir_binop_logic_xor, ir_binop_bit_xor, llvmInst);
+    case llvm::Instruction::ICmp:           // fall through...
+    case llvm::Instruction::FCmp:           return emitCmp(llvmInst);
+    case llvm::Instruction::FPToUI:         return emitOp<1>(ir_unop_f2u,      llvmInst, true /* genUnsigned */);
+    // TODO: for ZExt, if we ever need conversions other than 1-bit to 32-bit,
+    // more smarts will be needed than blind use of ir_unop_b2i.
+    case llvm::Instruction::ZExt:           return emitOp<1>(ir_unop_b2i,      llvmInst);
+    case llvm::Instruction::SExt:           return emitIRSext(llvmInst);
+    case llvm::Instruction::FPToSI:         return emitOp<1>(ir_unop_f2i,      llvmInst);
+    case llvm::Instruction::UIToFP:         return emitOpBit<1>(ir_unop_b2f, ir_unop_u2f, llvmInst, true /* genUnsigned */);
+    case llvm::Instruction::SIToFP:         return emitOp<1>(ir_unop_i2f,      llvmInst);
+    case llvm::Instruction::Call:           return emitIRCallOrIntrinsic(llvmInst);
+    case llvm::Instruction::Ret:            return emitIRReturn(llvmInst, lastBlock);
+    case llvm::Instruction::Load:           return emitIRLoad(llvmInst);
+    case llvm::Instruction::Alloca:         return emitIRalloca(llvmInst);
+    case llvm::Instruction::Store:          return emitIRStore(llvmInst);
+    case llvm::Instruction::ExtractElement: return emitIRExtractElement(llvmInst);
+    case llvm::Instruction::InsertElement:  return emitIRInsertElement(llvmInst);
+    case llvm::Instruction::Select:         return emitIRSelect(llvmInst);
+    case llvm::Instruction::GetElementPtr:  break; // defer until we process the load.
+    case llvm::Instruction::ExtractValue:   return emitIRExtractValue(llvmInst);
+    case llvm::Instruction::InsertValue:    return emitIRInsertValue(llvmInst);
+    case llvm::Instruction::ShuffleVector:  assert(0 && "TODO: ShuffleVector"); break;
+    case llvm::Instruction::BitCast:        return;  // nothing to do: bitcast only casts pointer types
+
+    default:
+       llvmInst->dump();
+       return error("unexpected LLVM opcode");
+    }
+}
+
+
+/**
+ * -----------------------------------------------------------------------------
+ * Factory for Mesa LunarGLASS back-end translator
+ * -----------------------------------------------------------------------------
+ */
+BackEndTranslator* GetMesaGlassTranslator(Manager* manager)
+{
+    return new MesaGlassTranslator(manager);
+}
+
+/**
+ * -----------------------------------------------------------------------------
+ * Destroy back-end translator
+ * -----------------------------------------------------------------------------
+ */
+void ReleaseMesaGlassTranslator(BackEndTranslator* target)
+{
+    delete target;
+}
+
+} // namespace gla
diff --git a/icd/intel/compiler/shader/glsl_glass_backend_translator.h b/icd/intel/compiler/shader/glsl_glass_backend_translator.h
new file mode 100644
index 0000000..a596048
--- /dev/null
+++ b/icd/intel/compiler/shader/glsl_glass_backend_translator.h
@@ -0,0 +1,438 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: GregF <greg@LunarG.com>
+ * Author: Steve K <srk@LunarG.com>
+ *
+ */
+
+//===- glsl_glass_backend_translator.h - Mesa customization of gla::BackEndTranslator -----===//
+//
+// Customization of gla::BackEndTranslator for Mesa
+//
+//===--------------------------------------------------------------------------------------===//
+
+
+#include "Core/PrivateManager.h"
+#include "Core/Backend.h"
+#include "list.h"
+#include <map>
+#include <list>
+#include <vector>
+#include <tr1/unordered_map>
+#include <tr1/unordered_set>
+#include <tr1/tuple>
+
+// Forward declare to avoid pulling in entire unneeded definitions
+struct _mesa_glsl_parse_state;
+struct glsl_type;
+struct gl_shader;
+struct gl_context;
+struct ir_instruction;
+struct ir_function;
+struct ir_rvalue;
+struct ir_expression;
+struct ir_constant;
+struct ir_function_signature;
+struct ir_variable;
+struct ir_dereference;
+struct ir_if;
+struct ir_loop;
+struct exec_list;
+
+// Forward decls for LLVM
+namespace llvm {
+   class IntrinsicInst;
+} // namespace llvm
+
+namespace gla {
+    BackEndTranslator* GetMesaGlassTranslator(Manager*);
+    void ReleaseMesaGlassTranslator(gla::BackEndTranslator*);
+
+    BackEnd* GetDummyBackEnd();
+    void ReleaseMesaGlassBackEnd(gla::BackEnd*);
+
+    class MesaGlassTranslator : public gla::BackEndTranslator {
+    public:
+        MesaGlassTranslator(Manager* m) :
+            BackEndTranslator(m),
+            fnReturnType(0),
+            fnName(0),
+            fnFunction(0),
+            fnSignature(0),
+            state(0),
+            ctx(0)
+        { }
+
+        ~MesaGlassTranslator();
+
+        void initializeTranslation(gl_context *, _mesa_glsl_parse_state *, gl_shader *);
+        void finalizeTranslation();
+        void seedGlobalDeclMap();
+        const char* getInfoLog() const { return infoLog.c_str(); }
+
+        static void initSamplerTypes();
+
+    protected:
+        bool hoistUndefOperands()  {  return false; }
+
+        // Translation methods from LunarGlass ---------------------------------
+        void start(llvm::Module&);
+        void end(llvm::Module&) { /* nothing to do */ }
+
+        void addStructType(llvm::StringRef, const llvm::Type*);
+        void addGlobal(const llvm::GlobalVariable*);
+        void addGlobalConst(const llvm::GlobalVariable*);
+        void addIoDeclaration(gla::EVariableQualifier qualifier, const llvm::MDNode* mdNode);
+        void startFunctionDeclaration(const llvm::Type* type, llvm::StringRef name);
+        void addArgument(const llvm::Value* value, bool last);
+        void endFunctionDeclaration();
+        void startFunctionBody();
+        void endFunctionBody();
+        void addInstruction(const llvm::Instruction* llvmInstruction, bool lastBlock, bool referencedOutsideScope=false);
+        void declarePhiCopy(const llvm::Value* dst);
+        void addPhiCopy(const llvm::Value* dst, const llvm::Value* src);
+        void addPhiAlias(const llvm::Value* dst, const llvm::Value* src);
+        void addIf(const llvm::Value* cond, bool invert=false);
+        void addElse();
+        void addEndif();
+        void addSwitch(const llvm::Value* cond);
+        void addCase(int);
+        void addDefault();
+        void endCase(bool withBreak);
+        void endSwitch();
+        void beginConditionalLoop();
+        void beginSimpleConditionalLoop(const llvm::CmpInst* cmp, const llvm::Value* op1, const llvm::Value* op2, bool invert=false);
+        void beginForLoop(const llvm::PHINode* phi, llvm::ICmpInst::Predicate, unsigned bound, unsigned increment);
+        void beginSimpleInductiveLoop(const llvm::PHINode* phi, const llvm::Value* count);
+        void beginLoop();
+        void endLoop();
+        void addLoopExit(const llvm::Value* condition=NULL, bool invert=false);
+        void addLoopBack(const llvm::Value* condition=NULL, bool invert=false);
+        void addDiscard();
+        void print() { }
+
+        // Internal methods ----------------------------------------------------
+        void setParseState(_mesa_glsl_parse_state *s) { state = s; }
+        void setContext(gl_context *c) { ctx = c; }
+        void setShader(gl_shader *s) { shader = s; }
+        void resetFnTranslationState();
+        const llvm::GetElementPtrInst* getGepAsInst(const llvm::Value* gep);
+
+        int irCmpOp(int) const; // use int: enums can't be forward declared before C++11
+        bool irCmpUnsigned(int) const;
+
+        ir_expression *glass_to_ir_expression(const int, const glsl_type*, ir_rvalue* = NULL, ir_rvalue* = NULL, ir_rvalue* = NULL,
+                                    const bool = false, const bool = false, const bool = false);
+
+        // For loop with ir_rvalue inputs
+        void beginForLoop(const llvm::PHINode* phi, llvm::ICmpInst::Predicate, ir_rvalue* bound, ir_rvalue* increment);
+
+        // IR flavor of simple inductive loop
+        void beginSimpleInductiveLoop(const llvm::PHINode* phi, ir_rvalue*);
+
+        // Add IR if statement from ir_rvalue
+        inline void addIf(ir_rvalue* cond, bool invert=false);
+
+        // Add IR loop exit statement from ir_rvalue
+        inline void addIRLoopExit(ir_rvalue* condition=NULL, bool invert=false);
+
+        // Convert structure types (also used for blocks)
+        const glsl_type* convertStructType(const llvm::StructType*, llvm::StringRef name, llvm::StringRef blockName, const llvm::MDNode*,
+                                           gla::EMdTypeLayout, gla::EVariableQualifier, bool isBlock);
+
+        // Convert an LLVM type to an HIR type
+        const glsl_type* llvmTypeToHirType(const llvm::Type*, const llvm::MDNode* = 0, const llvm::Value* = 0, const bool = false);
+
+        // Emit vertex
+        inline void emitIREmitVertex(const llvm::Instruction*);
+
+        // End primitive
+        inline void emitIREndPrimitive(const llvm::Instruction*);
+
+        // Add IR sign extension
+        inline void emitIRSext(const llvm::Instruction*);
+
+        // Add alloc
+        inline void emitIRalloca(const llvm::Instruction*);
+
+        // Add a binary op
+        template <int ops> inline void emitOp(int /* ir_expression_operation */, const llvm::Instruction*,
+                                              const bool = false, const bool = false, const bool = false);
+
+        // Add a binary op of either logical or bitwise type
+        template <int ops> inline void emitOpBit(int /* logical_op */, int /* bitwise_op */, const llvm::Instruction*,
+                                                 const bool = false /* genUnsigned */);
+
+        // Add a builtin function call
+        inline void emitFn(const char* name, const llvm::Instruction*);
+
+        // Add a saturation.  TODO: Want to ask for a LunarGlass
+        // decomposition of this, which doesn't exist yet.
+        inline void emitIRSaturate(const llvm::CallInst*);
+
+        // Add a clamp.  TODO: We don't really want to do this here; better
+        // for LunarGlass to do it, but there's a defect in the LunarGlass decomposition
+        // exposed by taiji-shaders/shader_32.frag.  This is a workaround until
+        // that's sorted.
+        inline void emitIRClamp(const llvm::CallInst*, const bool = false);
+
+        // Add a conditional discard
+        inline void emitIRDiscardCond(const llvm::CallInst*);
+
+        // Add fixed transform
+        inline void emitIRFTransform(const llvm::CallInst*);
+
+        // Add a comparison op
+        inline void emitCmp(const llvm::Instruction*);
+
+        // Add a function call or intrinsic
+        inline void emitIRCall(const llvm::CallInst*);
+
+        // Add a return
+        inline void emitIRReturn(const llvm::Instruction*, bool lastBlock);
+
+        // Determine glsl matrix type
+        inline const glsl_type* glslMatType(int numCols, int numRows) const;
+
+        // Create an IR matrix object by direct read from a uniform matrix,
+        // or moves of column vectors into a temp matrix.
+        inline ir_rvalue* intrinsicMat(const llvm::Instruction*,
+                                       int firstColumn, int numCols, int numRows);
+
+        // Add IR mat*vec or vec*mat intrinsic
+        inline void emitIRMatMul(const llvm::Instruction*, int numCols, bool matLeft);
+
+        // Add IR mat*mat intrinsic
+        inline void emitIRMatTimesMat(const llvm::Instruction*, int numLeftCols, int numRightCols);
+
+        // IR intrinsics
+        inline void emitIRIntrinsic(const llvm::IntrinsicInst*);
+
+        // Track maximum array value used
+        inline void trackMaxArrayElement(ir_rvalue* deref, int index) const;
+
+        // IR texture intrinsics
+        inline void emitIRTexture(const llvm::IntrinsicInst*, bool gather);
+        inline void emitIRTextureQuery(const llvm::IntrinsicInst*);
+
+        // IR insertion intrinsics
+        inline void emitIRMultiInsert(const llvm::IntrinsicInst*);
+
+        // IR swizzle intrinsics
+        inline void emitIRSwizzle(const llvm::IntrinsicInst*);
+
+        // Load operation (but don't add to instruction stream)
+        inline ir_rvalue* makeIRLoad(const llvm::Instruction*, const glsl_type* = 0);
+
+        // Load operation
+        inline void emitIRLoad(const llvm::Instruction*);
+
+        // Store operation
+        inline void emitIRStore(const llvm::Instruction*);
+
+        // Add a function call or intrinsic
+        inline void emitIRCallOrIntrinsic(const llvm::Instruction*);
+
+        // Vector component extraction
+        inline void emitIRExtractElement(const llvm::Instruction*);
+
+        // ternary operator
+        inline void emitIRSelect(const llvm::Instruction*);
+
+        // Vector component insertion
+        inline void emitIRInsertElement(const llvm::Instruction*);
+
+        // Array component extraction
+        inline void emitIRExtractValue(const llvm::Instruction*);
+
+        inline void FindGepType(const llvm::Instruction*,
+                                const llvm::Type*&,
+                                const llvm::Type*&,
+                                const llvm::MDNode*&);
+
+        // Dereference one step of a GEP chain
+        inline ir_rvalue* dereferenceGep(const llvm::Type*&, ir_rvalue*, llvm::Value*,
+                                         int index, const llvm::MDNode*&, EMdTypeLayout* = 0);
+
+        // Traverse GEP instruction
+        inline ir_rvalue* traverseGEP(const llvm::Instruction*,
+                                      ir_rvalue* aggregate,
+                                      EMdTypeLayout* = 0);
+
+        // Array component insertion
+        inline void emitIRInsertValue(const llvm::Instruction*);
+
+        // See if this constant is a zero
+        inline bool isConstantZero(const llvm::Constant*) const;
+
+        // Create a simple scalar constant
+        template <typename T> inline T newIRScalarConstant(const llvm::Constant*) const;
+
+        // Add a constant value for this LLVM value
+        inline ir_constant* newIRConstant(const llvm::Value*);
+
+        // Add a global for this LLVM value
+        inline ir_rvalue* newIRGlobal(const llvm::Value*, const char* name = 0);
+
+        // Add type-safe undefined value in case someone looks up a not-defined value
+        inline ir_constant* addIRUndefined(const llvm::Type*);
+
+        // Add variable from LLVM value
+        inline ir_rvalue* getIRValue(const llvm::Value*, ir_instruction* = 0);
+
+        // Add instruction to top of instruction list stack
+        inline void addIRInstruction(const llvm::Value*, ir_instruction*);
+
+        // raw add instruction: don't add map entry, just append to inst list
+        inline void addIRInstruction(ir_instruction*, bool global = false);
+
+        // Return ref count of an rvalue
+        inline unsigned getRefCount(const llvm::Value*) const;
+
+        // Return true if we have a valueMap entry for this LLVM value
+        inline bool valueEntryExists(const llvm::Value*) const;
+
+        // Encapsulate creation of variables.  Ideally the int would be
+        // ir_variable_mode, but we can't forward declare an enum until C++11
+        inline ir_variable* newIRVariable(const glsl_type*, const char*, int mode, bool declare = false);
+        inline ir_variable* newIRVariable(const glsl_type*, const std::string&, int mode, bool declare = false);
+        inline ir_variable* newIRVariable(const llvm::Type* type, const llvm::Value*, const char*, int mode, bool declare = false);
+        inline ir_variable* newIRVariable(const llvm::Type* type, const llvm::Value*, const std::string&, int mode, bool declare = false);
+
+        // Encapsulate creation of variable dereference.  Ideally the int would be
+        // ir_variable_mode, but we can't forward declare an enum until C++11
+        inline ir_dereference* newIRVariableDeref(const glsl_type*, const char*, int mode, bool declare = false);
+        inline ir_dereference* newIRVariableDeref(const glsl_type*, const std::string&, int mode, bool declare = false);
+        inline ir_dereference* newIRVariableDeref(const llvm::Type* type, const llvm::Value*, const char*, int mode, bool declare = false);
+        inline ir_dereference* newIRVariableDeref(const llvm::Type* type, const llvm::Value*, const std::string&, int mode, bool declare = false);
+
+        // Fix up IR Lvalues (see comment in C++ code)
+        ir_instruction* fixIRLValue(ir_rvalue* lhs, ir_rvalue* rhs);
+
+        // Add error message
+        void error(const char* msg) const;
+
+        void unPackSetAndBinding(const int location, int& set, int& binding);
+
+        void setIoParameters(ir_variable* ioVar, const llvm::MDNode*);
+
+        // Data ----------------------------------------------------------------
+
+        // Map LLVM values to IR values
+        // We never need an ordered traversal.  Use unordered map for performance.
+        typedef std::tr1::unordered_map<const llvm::Value*, ir_instruction*> tValueMap;
+        tValueMap valueMap;
+
+        // structure to count rvalues per lvalue
+        typedef std::tr1::unordered_map<const llvm::Value*, unsigned> tRefCountMap;
+        tRefCountMap refCountMap;
+
+        // map from type names to the mdAggregate nodes that describe their types
+        typedef std::tr1::unordered_map<const llvm::Type*, const llvm::MDNode*> tMDMap;
+        tMDMap typeMdAggregateMap;
+
+        // Map from type names to metadata nodes
+        typedef std::tr1::unordered_map<std::string, const llvm::MDNode*> tTypenameMdMap;
+        tTypenameMdMap typenameMdMap;
+
+        typedef std::tr1::tuple<const llvm::Type*, const llvm::MDNode*, const bool> tTypeData;
+
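+        // Simple hash for the (type, metadata, flag) tuple: XOR of the raw
+        // pointer bits and the flag.  Collisions are possible but harmless,
+        // since this only keys the LLVM-to-HIR type map below.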
+        struct TypeHash {
+           size_t operator()(const tTypeData& p) const { return size_t(std::tr1::get<0>(p)) ^ size_t(std::tr1::get<1>(p)) ^ size_t(std::tr1::get<2>(p)); }
+        };
+
+        // map to track mapping from LLVM to HIR types
+        typedef std::tr1::unordered_map<tTypeData, const glsl_type*, TypeHash> tTypeMap;
+        tTypeMap typeMap;
+
+        // This is the declaration map.  We map uses of globals to their declarations
+        // using rendezvous-by-name.  HIR requires use of the exact ir_variable node.
+        typedef std::tr1::unordered_map<std::string, ir_variable*> tGlobalDeclMap;
+        tGlobalDeclMap globalDeclMap;
+
+        // For globals, we remember an ir_variable_mode from the global declaration.
+        // Alas, we store it as an int here since we can't forward declare an enum.
+        typedef std::tr1::unordered_map<std::string, int> tGlobalVarModeMap;
+        tGlobalVarModeMap globalVarModeMap;
+
+        // Some builtin variables may be declared and referenced under another name,
+        // especially when SPIR-V debug info is stripped.  Remember those mappings here.
+        typedef std::tr1::unordered_map<std::string, std::string> tNameBuiltinMap;
+        tNameBuiltinMap nameBuiltinMap;
+
+        // Certain HIR opcodes require proper sint/uint types, and that information is
+        // not preserved in LLVM.  LunarGlass provides it in metadata for IO and uniform
+        // data, but not arbitrary temp values.  This set tracks temporaries that must
+        // be sints.
+        typedef std::tr1::unordered_set<const llvm::Value*> tSintSet;
+        tSintSet sintValues;
+
+        // These are anonymous structs where we must hoist the members to the global scope,
+        // because HIR form doesn't match BottomIR form.  BottomIR holds them in a structure,
+        // while HIR makes each member a globally scoped variable.  This is awkward.
+        std::tr1::unordered_set<std::string> anonBlocks;
+
+        // Stack of instruction lists for creation of HIR tree
+        std::vector<exec_list *> instructionStack;
+
+        // Stack of if statements.  Used to find else clause instruction list
+        std::vector<ir_if *> ifStack;
+
+        // Stack of loop statements.
+        std::vector<ir_loop *> loopStack;
+
+        // Stack of loop terminator statements (e.g., for handling ++index).
+        std::vector<ir_instruction *> loopTerminatorStack;
+
+        // Global initializers, etc., that we'll add as a main() prologue
+        std::list<ir_instruction *> prologue;
+
+        // We count the references of each lvalue, to know when to generate
+        // assignments and when to directly create tree nodes
+        void countReferences(const llvm::Module&);
+
+        // Make up a name for a new variable
+        const char* newName(const llvm::Value*);
+
+        // list of llvm Values to free on exit
+        std::vector<const llvm::Value*> toDelete;
+
+        mutable std::string infoLog;
+
+        // Function translation state: we need these because the traversal of LLVM
+        // functions is distributed across multiple methods.
+        const glsl_type*         fnReturnType;
+        const char*              fnName;
+        ir_function*             fnFunction;
+        ir_function_signature*   fnSignature;
+        exec_list                fnParameters;
+
+        // Mesa state and context
+        _mesa_glsl_parse_state*  state;
+        gl_shader*               shader;
+        gl_context*              ctx;
+    }; // class MesaGlassTranslator
+} // namespace gla
+
diff --git a/icd/intel/compiler/shader/glsl_glass_manager.cpp b/icd/intel/compiler/shader/glsl_glass_manager.cpp
new file mode 100644
index 0000000..2588900
--- /dev/null
+++ b/icd/intel/compiler/shader/glsl_glass_manager.cpp
@@ -0,0 +1,90 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Steve K <srk@LunarG.com>
+ *
+ */
+
+//===- glsl_glass_manager.cpp - Mesa customization of PrivateManager -----===//
+//
+// Customization of gla::PrivateManager for Mesa
+//
+//===---------------------------------------------------------------------===//
+
+#include "glsl_glass_manager.h"
+#include "glsl_glass_backend.h"
+#include "glsl_glass_backend_translator.h"
+
+gla::MesaGlassManager::MesaGlassManager(const EShLanguage language)
+{
+   createNonreusable();
+   backEnd = gla::GetMesaGlassBackEnd(language);
+}
+
+gla::MesaGlassManager::~MesaGlassManager()
+{
+   freeNonreusable();
+   gla::ReleaseMesaGlassBackEnd(backEnd);
+}
+
+void gla::MesaGlassManager::clear()
+{
+   freeNonreusable();
+   createNonreusable();
+}
+
+void gla::MesaGlassManager::createContext()
+{
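+   // Replace any existing LLVMContext so each compile starts from a clean slate.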
+   delete context;
+   context = new llvm::LLVMContext;
+}
+
+void gla::MesaGlassManager::createNonreusable()
+{
+   backEndTranslator = gla::GetMesaGlassTranslator(this);
+}
+
+void gla::MesaGlassManager::freeNonreusable()
+{
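+   // Release the translator, anything queued on the free list, then the module and context.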
+   gla::ReleaseMesaGlassTranslator(backEndTranslator);
+   while (! freeList.empty()) {
+      delete freeList.back();
+      freeList.pop_back();
+   }
+   delete module;
+   module = 0;
+   delete context;
+   context = 0;
+}
+
+gla::Manager* gla::getManager(const EShLanguage language)
+{
+   return new gla::MesaGlassManager(language);
+}
+
+gla::MesaGlassTranslator* gla::MesaGlassManager::getBackendTranslator()
+{
+   // We know this must be a MesaGlassTranslator, because we only
+   // get them from our factory, which only makes MesaGlassTranslators.
+   return static_cast<gla::MesaGlassTranslator*>(backEndTranslator);
+}
diff --git a/icd/intel/compiler/shader/glsl_glass_manager.h b/icd/intel/compiler/shader/glsl_glass_manager.h
new file mode 100644
index 0000000..93c3651
--- /dev/null
+++ b/icd/intel/compiler/shader/glsl_glass_manager.h
@@ -0,0 +1,61 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Steve K <srk@LunarG.com>
+ *
+ */
+
+//===- glsl_glass_manager.h - Mesa customization of PrivateManager -----===//
+//
+// Customization of gla::PrivateManager for Mesa
+//
+//===-------------------------------------------------------------------===//
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wswitch"
+#pragma GCC diagnostic ignored "-Wunknown-pragmas"
+#include "Core/PrivateManager.h"
+#include "glslang/Public/ShaderLang.h"
+#pragma GCC diagnostic pop
+
+namespace gla {
+
+class MesaGlassTranslator;  // forward declare
+
+class MesaGlassManager : public gla::PrivateManager {
+public:
+   MesaGlassManager(const EShLanguage);
+   virtual ~MesaGlassManager();
+   virtual void clear();
+   void createContext();
+   MesaGlassTranslator* getBackendTranslator();
+
+protected:
+   void freeNonreusable();
+   void createNonreusable();
+};
+
+// We provide our own overload of getManager so callers can pass in the shader stage
+Manager* getManager(const EShLanguage);
+
+} // namespace gla
diff --git a/icd/intel/compiler/shader/glsl_lexer.ll b/icd/intel/compiler/shader/glsl_lexer.ll
new file mode 100644
index 0000000..7602351
--- /dev/null
+++ b/icd/intel/compiler/shader/glsl_lexer.ll
@@ -0,0 +1,578 @@
+%{
+/*
+ * Copyright © 2008, 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <ctype.h>
+#include <limits.h>
+#include "strtod.h"
+#include "ast.h"
+#include "glsl_parser_extras.h"
+#include "glsl_parser.h"
+
+static int classify_identifier(struct _mesa_glsl_parse_state *, const char *);
+
+#ifdef _MSC_VER
+#define YY_NO_UNISTD_H
+#endif
+
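+/* Run on every token: keep bison's yylloc in sync with the current line and column. */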
+#define YY_USER_ACTION						\
+   do {								\
+      yylloc->source = 0;					\
+      yylloc->first_column = yycolumn + 1;			\
+      yylloc->first_line = yylloc->last_line = yylineno + 1;	\
+      yycolumn += yyleng;					\
+      yylloc->last_column = yycolumn + 1;			\
+   } while(0);
+
+#define YY_USER_INIT yylineno = 0; yycolumn = 0;
+
+/* A macro for handling reserved words and keywords across language versions.
+ *
+ * Certain words start out as identifiers, become reserved words in
+ * later language revisions, and finally become language keywords.
+ * This may happen at different times in desktop GLSL and GLSL ES.
+ *
+ * For example, consider the following lexer rule:
+ * samplerBuffer       KEYWORD(130, 0, 140, 0, SAMPLERBUFFER)
+ *
+ * This means that "samplerBuffer" will be treated as:
+ * - a keyword (SAMPLERBUFFER token)         ...in GLSL >= 1.40
+ * - a reserved word - error                 ...in GLSL >= 1.30
+ * - an identifier                           ...in GLSL <  1.30 or GLSL ES
+ */
+#define KEYWORD(reserved_glsl, reserved_glsl_es,			\
+                allowed_glsl, allowed_glsl_es, token)			\
+   KEYWORD_WITH_ALT(reserved_glsl, reserved_glsl_es,			\
+                    allowed_glsl, allowed_glsl_es, false, token)
+
+/**
+ * Like the KEYWORD macro, but the word is also treated as a keyword
+ * if the given boolean expression is true.
+ */
+#define KEYWORD_WITH_ALT(reserved_glsl, reserved_glsl_es,		\
+                         allowed_glsl, allowed_glsl_es,			\
+                         alt_expr, token)				\
+   do {									\
+      if (yyextra->is_version(allowed_glsl, allowed_glsl_es)		\
+          || (alt_expr)) {						\
+	 return token;							\
+      } else if (yyextra->is_version(reserved_glsl,			\
+                                     reserved_glsl_es)) {		\
+	 _mesa_glsl_error(yylloc, yyextra,				\
+			  "illegal use of reserved word `%s'", yytext);	\
+	 return ERROR_TOK;						\
+      } else {								\
+	 yylval->identifier = strdup(yytext);				\
+	 return classify_identifier(yyextra, yytext);			\
+      }									\
+   } while (0)
+
+/**
+ * A macro for handling keywords that have been present in GLSL since
+ * its origin, but were changed into reserved words in GLSL 3.00 ES.
+ */
+#define DEPRECATED_ES_KEYWORD(token)					\
+   do {									\
+      if (yyextra->is_version(0, 300)) {				\
+	 _mesa_glsl_error(yylloc, yyextra,				\
+			  "illegal use of reserved word `%s'", yytext);	\
+	 return ERROR_TOK;						\
+      } else {								\
+         return token;							\
+      }									\
+   } while (0)
+
+static int
+literal_integer(char *text, int len, struct _mesa_glsl_parse_state *state,
+		YYSTYPE *lval, YYLTYPE *lloc, int base)
+{
+   bool is_uint = (text[len - 1] == 'u' ||
+		   text[len - 1] == 'U');
+   const char *digits = text;
+
+   /* Skip "0x" */
+   if (base == 16)
+      digits += 2;
+
+#ifdef _MSC_VER
+   unsigned __int64 value = _strtoui64(digits, NULL, base);
+#else
+   unsigned long long value = strtoull(digits, NULL, base);
+#endif
+
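+   /* Truncate to 32 bits; out-of-range literals are diagnosed below. */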
+   lval->n = (int)value;
+
+   if (value > UINT_MAX) {
+      /* Note that signed 0xffffffff is valid, not out of range! */
+      if (state->is_version(130, 300)) {
+	 _mesa_glsl_error(lloc, state,
+			  "literal value `%s' out of range", text);
+      } else {
+	 _mesa_glsl_warning(lloc, state,
+			    "literal value `%s' out of range", text);
+      }
+   } else if (base == 10 && !is_uint && (unsigned)value > (unsigned)INT_MAX + 1) {
+      /* Try to catch an unintentionally negative value.  Note that
+       * -2147483648 is parsed as -(2147483648), so the literal
+       * 2147483648 (INT_MAX + 1) must not trigger this warning.
+       */
+      _mesa_glsl_warning(lloc, state,
+			 "signed literal value `%s' is interpreted as %d",
+			 text, lval->n);
+   }
+   return is_uint ? UINTCONSTANT : INTCONSTANT;
+}
+
+#define LITERAL_INTEGER(base) \
+   literal_integer(yytext, yyleng, yyextra, yylval, yylloc, base)
+
+%}
+
+%option bison-bridge bison-locations reentrant noyywrap
+%option nounput noyy_top_state
+%option never-interactive
+%option prefix="_mesa_glsl_lexer_"
+%option extra-type="struct _mesa_glsl_parse_state *"
+
+%x PP PRAGMA
+
+DEC_INT		[1-9][0-9]*
+HEX_INT		0[xX][0-9a-fA-F]+
+OCT_INT		0[0-7]*
+INT		({DEC_INT}|{HEX_INT}|{OCT_INT})
+SPC		[ \t]*
+SPCP		[ \t]+
+HASH		^{SPC}#{SPC}
+%%
+
+[ \r\t]+		;
+
+    /* Preprocessor tokens. */ 
+^[ \t]*#[ \t]*$			;
+^[ \t]*#[ \t]*version		{ BEGIN PP; return VERSION_TOK; }
+^[ \t]*#[ \t]*extension		{ BEGIN PP; return EXTENSION; }
+{HASH}line{SPCP}{INT}{SPCP}{INT}{SPC}$ {
+				   /* Eat characters until the first digit is
+				    * encountered
+				    */
+				   char *ptr = yytext;
+				   while (!isdigit(*ptr))
+				      ptr++;
+
+				   /* Subtract one from the line number because
+				    * yylineno is zero-based instead of
+				    * one-based.
+				    */
+				   yylineno = strtol(ptr, &ptr, 0) - 1;
+				   yylloc->source = strtol(ptr, NULL, 0);
+				}
+{HASH}line{SPCP}{INT}{SPC}$	{
+				   /* Eat characters until the first digit is
+				    * encountered
+				    */
+				   char *ptr = yytext;
+				   while (!isdigit(*ptr))
+				      ptr++;
+
+				   /* Subtract one from the line number because
+				    * yylineno is zero-based instead of
+				    * one-based.
+				    */
+				   yylineno = strtol(ptr, &ptr, 0) - 1;
+				}
+^{SPC}#{SPC}pragma{SPCP}debug{SPC}\({SPC}on{SPC}\) {
+				  BEGIN PP;
+				  return PRAGMA_DEBUG_ON;
+				}
+^{SPC}#{SPC}pragma{SPCP}debug{SPC}\({SPC}off{SPC}\) {
+				  BEGIN PP;
+				  return PRAGMA_DEBUG_OFF;
+				}
+^{SPC}#{SPC}pragma{SPCP}optimize{SPC}\({SPC}on{SPC}\) {
+				  BEGIN PP;
+				  return PRAGMA_OPTIMIZE_ON;
+				}
+^{SPC}#{SPC}pragma{SPCP}optimize{SPC}\({SPC}off{SPC}\) {
+				  BEGIN PP;
+				  return PRAGMA_OPTIMIZE_OFF;
+				}
+^{SPC}#{SPC}pragma{SPCP}STDGL{SPCP}invariant{SPC}\({SPC}all{SPC}\) {
+				  BEGIN PP;
+				  return PRAGMA_INVARIANT_ALL;
+				}
+^{SPC}#{SPC}pragma{SPCP}	{ BEGIN PRAGMA; }
+
+<PRAGMA>\n			{ BEGIN 0; yylineno++; yycolumn = 0; }
+<PRAGMA>.			{ }
+
+<PP>\/\/[^\n]*			{ }
+<PP>[ \t\r]*			{ }
+<PP>:				return COLON;
+<PP>[_a-zA-Z][_a-zA-Z0-9]*	{
+				   yylval->identifier = strdup(yytext);
+				   return IDENTIFIER;
+				}
+<PP>[1-9][0-9]*			{
+				    yylval->n = strtol(yytext, NULL, 10);
+				    return INTCONSTANT;
+				}
+<PP>\n				{ BEGIN 0; yylineno++; yycolumn = 0; return EOL; }
+
+\n		{ yylineno++; yycolumn = 0; }
+
+attribute	DEPRECATED_ES_KEYWORD(ATTRIBUTE);
+const		return CONST_TOK;
+bool		return BOOL_TOK;
+float		return FLOAT_TOK;
+int		return INT_TOK;
+uint		KEYWORD(130, 300, 130, 300, UINT_TOK);
+
+break		return BREAK;
+continue	return CONTINUE;
+do		return DO;
+while		return WHILE;
+else		return ELSE;
+for		return FOR;
+if		return IF;
+discard		return DISCARD;
+return		return RETURN;
+
+bvec2		return BVEC2;
+bvec3		return BVEC3;
+bvec4		return BVEC4;
+ivec2		return IVEC2;
+ivec3		return IVEC3;
+ivec4		return IVEC4;
+uvec2		KEYWORD(130, 300, 130, 300, UVEC2);
+uvec3		KEYWORD(130, 300, 130, 300, UVEC3);
+uvec4		KEYWORD(130, 300, 130, 300, UVEC4);
+vec2		return VEC2;
+vec3		return VEC3;
+vec4		return VEC4;
+mat2		return MAT2X2;
+mat3		return MAT3X3;
+mat4		return MAT4X4;
+mat2x2		KEYWORD(120, 300, 120, 300, MAT2X2);
+mat2x3		KEYWORD(120, 300, 120, 300, MAT2X3);
+mat2x4		KEYWORD(120, 300, 120, 300, MAT2X4);
+mat3x2		KEYWORD(120, 300, 120, 300, MAT3X2);
+mat3x3		KEYWORD(120, 300, 120, 300, MAT3X3);
+mat3x4		KEYWORD(120, 300, 120, 300, MAT3X4);
+mat4x2		KEYWORD(120, 300, 120, 300, MAT4X2);
+mat4x3		KEYWORD(120, 300, 120, 300, MAT4X3);
+mat4x4		KEYWORD(120, 300, 120, 300, MAT4X4);
+
+in		return IN_TOK;
+out		return OUT_TOK;
+inout		return INOUT_TOK;
+uniform		return UNIFORM;
+varying		DEPRECATED_ES_KEYWORD(VARYING);
+centroid	KEYWORD(120, 300, 120, 300, CENTROID);
+invariant	KEYWORD(120, 100, 120, 100, INVARIANT);
+flat		KEYWORD(130, 100, 130, 300, FLAT);
+smooth		KEYWORD(130, 300, 130, 300, SMOOTH);
+noperspective	KEYWORD(130, 300, 130, 0, NOPERSPECTIVE);
+
+sampler1D	DEPRECATED_ES_KEYWORD(SAMPLER1D);
+sampler2D	return SAMPLER2D;
+sampler3D	return SAMPLER3D;
+samplerCube	return SAMPLERCUBE;
+sampler1DArray	KEYWORD(130, 300, 130, 0, SAMPLER1DARRAY);
+sampler2DArray	KEYWORD(130, 300, 130, 300, SAMPLER2DARRAY);
+sampler1DShadow	DEPRECATED_ES_KEYWORD(SAMPLER1DSHADOW);
+sampler2DShadow	return SAMPLER2DSHADOW;
+samplerCubeShadow	KEYWORD(130, 300, 130, 300, SAMPLERCUBESHADOW);
+sampler1DArrayShadow	KEYWORD(130, 300, 130, 0, SAMPLER1DARRAYSHADOW);
+sampler2DArrayShadow	KEYWORD(130, 300, 130, 300, SAMPLER2DARRAYSHADOW);
+isampler1D		KEYWORD(130, 300, 130, 0, ISAMPLER1D);
+isampler2D		KEYWORD(130, 300, 130, 300, ISAMPLER2D);
+isampler3D		KEYWORD(130, 300, 130, 300, ISAMPLER3D);
+isamplerCube		KEYWORD(130, 300, 130, 300, ISAMPLERCUBE);
+isampler1DArray		KEYWORD(130, 300, 130, 0, ISAMPLER1DARRAY);
+isampler2DArray		KEYWORD(130, 300, 130, 300, ISAMPLER2DARRAY);
+usampler1D		KEYWORD(130, 300, 130, 0, USAMPLER1D);
+usampler2D		KEYWORD(130, 300, 130, 300, USAMPLER2D);
+usampler3D		KEYWORD(130, 300, 130, 300, USAMPLER3D);
+usamplerCube		KEYWORD(130, 300, 130, 300, USAMPLERCUBE);
+usampler1DArray		KEYWORD(130, 300, 130, 0, USAMPLER1DARRAY);
+usampler2DArray		KEYWORD(130, 300, 130, 300, USAMPLER2DARRAY);
+
+   /* additional keywords in ARB_texture_multisample, included in GLSL 1.50 */
+   /* these are reserved but not defined in GLSL 3.00 */
+sampler2DMS        KEYWORD_WITH_ALT(150, 300, 150, 0, yyextra->ARB_texture_multisample_enable, SAMPLER2DMS);
+isampler2DMS       KEYWORD_WITH_ALT(150, 300, 150, 0, yyextra->ARB_texture_multisample_enable, ISAMPLER2DMS);
+usampler2DMS       KEYWORD_WITH_ALT(150, 300, 150, 0, yyextra->ARB_texture_multisample_enable, USAMPLER2DMS);
+sampler2DMSArray   KEYWORD_WITH_ALT(150, 300, 150, 0, yyextra->ARB_texture_multisample_enable, SAMPLER2DMSARRAY);
+isampler2DMSArray  KEYWORD_WITH_ALT(150, 300, 150, 0, yyextra->ARB_texture_multisample_enable, ISAMPLER2DMSARRAY);
+usampler2DMSArray  KEYWORD_WITH_ALT(150, 300, 150, 0, yyextra->ARB_texture_multisample_enable, USAMPLER2DMSARRAY);
+
+   /* keywords available with ARB_texture_cube_map_array_enable extension on desktop GLSL */
+samplerCubeArray   KEYWORD_WITH_ALT(400, 0, 400, 0, yyextra->ARB_texture_cube_map_array_enable, SAMPLERCUBEARRAY);
+isamplerCubeArray KEYWORD_WITH_ALT(400, 0, 400, 0, yyextra->ARB_texture_cube_map_array_enable, ISAMPLERCUBEARRAY);
+usamplerCubeArray KEYWORD_WITH_ALT(400, 0, 400, 0, yyextra->ARB_texture_cube_map_array_enable, USAMPLERCUBEARRAY);
+samplerCubeArrayShadow   KEYWORD_WITH_ALT(400, 0, 400, 0, yyextra->ARB_texture_cube_map_array_enable, SAMPLERCUBEARRAYSHADOW);
+
+samplerExternalOES		{
+			  if (yyextra->OES_EGL_image_external_enable)
+			     return SAMPLEREXTERNALOES;
+			  else
+			     return IDENTIFIER;
+		}
+
+   /* keywords available with ARB_shader_image_load_store */
+image1D         KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IMAGE1D);
+image2D         KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IMAGE2D);
+image3D         KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IMAGE3D);
+image2DRect     KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IMAGE2DRECT);
+imageCube       KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IMAGECUBE);
+imageBuffer     KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IMAGEBUFFER);
+image1DArray    KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IMAGE1DARRAY);
+image2DArray    KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IMAGE2DARRAY);
+imageCubeArray  KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IMAGECUBEARRAY);
+image2DMS       KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IMAGE2DMS);
+image2DMSArray  KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IMAGE2DMSARRAY);
+iimage1D        KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IIMAGE1D);
+iimage2D        KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IIMAGE2D);
+iimage3D        KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IIMAGE3D);
+iimage2DRect    KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IIMAGE2DRECT);
+iimageCube      KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IIMAGECUBE);
+iimageBuffer    KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IIMAGEBUFFER);
+iimage1DArray   KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IIMAGE1DARRAY);
+iimage2DArray   KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IIMAGE2DARRAY);
+iimageCubeArray KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IIMAGECUBEARRAY);
+iimage2DMS      KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IIMAGE2DMS);
+iimage2DMSArray KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, IIMAGE2DMSARRAY);
+uimage1D        KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, UIMAGE1D);
+uimage2D        KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, UIMAGE2D);
+uimage3D        KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, UIMAGE3D);
+uimage2DRect    KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, UIMAGE2DRECT);
+uimageCube      KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, UIMAGECUBE);
+uimageBuffer    KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, UIMAGEBUFFER);
+uimage1DArray   KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, UIMAGE1DARRAY);
+uimage2DArray   KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, UIMAGE2DARRAY);
+uimageCubeArray KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, UIMAGECUBEARRAY);
+uimage2DMS      KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, UIMAGE2DMS);
+uimage2DMSArray KEYWORD_WITH_ALT(130, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, UIMAGE2DMSARRAY);
+image1DShadow           KEYWORD(130, 300, 0, 0, IMAGE1DSHADOW);
+image2DShadow           KEYWORD(130, 300, 0, 0, IMAGE2DSHADOW);
+image1DArrayShadow      KEYWORD(130, 300, 0, 0, IMAGE1DARRAYSHADOW);
+image2DArrayShadow      KEYWORD(130, 300, 0, 0, IMAGE2DARRAYSHADOW);
+
+coherent	KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, COHERENT);
+volatile	KEYWORD_WITH_ALT(110, 100, 420, 0, yyextra->ARB_shader_image_load_store_enable, VOLATILE);
+restrict	KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, RESTRICT);
+readonly	KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, READONLY);
+writeonly	KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, WRITEONLY);
+
+atomic_uint     KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_atomic_counters_enable, ATOMIC_UINT);
+
+struct		return STRUCT;
+void		return VOID_TOK;
+
+layout		{
+		  if ((yyextra->is_version(140, 300))
+		      || yyextra->AMD_conservative_depth_enable
+		      || yyextra->ARB_conservative_depth_enable
+		      || yyextra->ARB_explicit_attrib_location_enable
+                      || yyextra->has_separate_shader_objects()
+		      || yyextra->ARB_uniform_buffer_object_enable
+		      || yyextra->ARB_fragment_coord_conventions_enable
+                      || yyextra->ARB_shading_language_420pack_enable
+                      || yyextra->ARB_compute_shader_enable) {
+		      return LAYOUT_TOK;
+		   } else {
+		      yylval->identifier = strdup(yytext);
+		      return classify_identifier(yyextra, yytext);
+		   }
+		}
+
+\+\+		return INC_OP;
+--		return DEC_OP;
+\<=		return LE_OP;
+>=		return GE_OP;
+==		return EQ_OP;
+!=		return NE_OP;
+&&		return AND_OP;
+\|\|		return OR_OP;
+"^^"		return XOR_OP;
+"<<"		return LEFT_OP;
+">>"		return RIGHT_OP;
+
+\*=		return MUL_ASSIGN;
+\/=		return DIV_ASSIGN;
+\+=		return ADD_ASSIGN;
+\%=		return MOD_ASSIGN;
+\<\<=		return LEFT_ASSIGN;
+>>=		return RIGHT_ASSIGN;
+&=		return AND_ASSIGN;
+"^="		return XOR_ASSIGN;
+\|=		return OR_ASSIGN;
+-=		return SUB_ASSIGN;
+
+[1-9][0-9]*[uU]?	{
+			    return LITERAL_INTEGER(10);
+			}
+0[xX][0-9a-fA-F]+[uU]?	{
+			    return LITERAL_INTEGER(16);
+			}
+0[0-7]*[uU]?		{
+			    return LITERAL_INTEGER(8);
+			}
+
+[0-9]+\.[0-9]+([eE][+-]?[0-9]+)?[fF]?	{
+			    yylval->real = glsl_strtof(yytext, NULL);
+			    return FLOATCONSTANT;
+			}
+\.[0-9]+([eE][+-]?[0-9]+)?[fF]?		{
+			    yylval->real = glsl_strtof(yytext, NULL);
+			    return FLOATCONSTANT;
+			}
+[0-9]+\.([eE][+-]?[0-9]+)?[fF]?		{
+			    yylval->real = glsl_strtof(yytext, NULL);
+			    return FLOATCONSTANT;
+			}
+[0-9]+[eE][+-]?[0-9]+[fF]?		{
+			    yylval->real = glsl_strtof(yytext, NULL);
+			    return FLOATCONSTANT;
+			}
+[0-9]+[fF]		{
+			    yylval->real = glsl_strtof(yytext, NULL);
+			    return FLOATCONSTANT;
+			}
+
+true			{
+			    yylval->n = 1;
+			    return BOOLCONSTANT;
+			}
+false			{
+			    yylval->n = 0;
+			    return BOOLCONSTANT;
+			}
+
+
+    /* Reserved words in GLSL 1.10. */
+asm		KEYWORD(110, 100, 0, 0, ASM);
+class		KEYWORD(110, 100, 0, 0, CLASS);
+union		KEYWORD(110, 100, 0, 0, UNION);
+enum		KEYWORD(110, 100, 0, 0, ENUM);
+typedef		KEYWORD(110, 100, 0, 0, TYPEDEF);
+template	KEYWORD(110, 100, 0, 0, TEMPLATE);
+this		KEYWORD(110, 100, 0, 0, THIS);
+packed		KEYWORD_WITH_ALT(110, 100, 140, 300, yyextra->ARB_uniform_buffer_object_enable, PACKED_TOK);
+goto		KEYWORD(110, 100, 0, 0, GOTO);
+switch		KEYWORD(110, 100, 130, 300, SWITCH);
+default		KEYWORD(110, 100, 130, 300, DEFAULT);
+inline		KEYWORD(110, 100, 0, 0, INLINE_TOK);
+noinline	KEYWORD(110, 100, 0, 0, NOINLINE);
+public		KEYWORD(110, 100, 0, 0, PUBLIC_TOK);
+static		KEYWORD(110, 100, 0, 0, STATIC);
+extern		KEYWORD(110, 100, 0, 0, EXTERN);
+external	KEYWORD(110, 100, 0, 0, EXTERNAL);
+interface	KEYWORD(110, 100, 0, 0, INTERFACE);
+long		KEYWORD(110, 100, 0, 0, LONG_TOK);
+short		KEYWORD(110, 100, 0, 0, SHORT_TOK);
+double		KEYWORD(110, 100, 400, 0, DOUBLE_TOK);
+half		KEYWORD(110, 100, 0, 0, HALF);
+fixed		KEYWORD(110, 100, 0, 0, FIXED_TOK);
+unsigned	KEYWORD(110, 100, 0, 0, UNSIGNED);
+input		KEYWORD(110, 100, 0, 0, INPUT_TOK);
+output		KEYWORD(110, 100, 0, 0, OUTPUT);
+hvec2		KEYWORD(110, 100, 0, 0, HVEC2);
+hvec3		KEYWORD(110, 100, 0, 0, HVEC3);
+hvec4		KEYWORD(110, 100, 0, 0, HVEC4);
+dvec2		KEYWORD(110, 100, 400, 0, DVEC2);
+dvec3		KEYWORD(110, 100, 400, 0, DVEC3);
+dvec4		KEYWORD(110, 100, 400, 0, DVEC4);
+fvec2		KEYWORD(110, 100, 0, 0, FVEC2);
+fvec3		KEYWORD(110, 100, 0, 0, FVEC3);
+fvec4		KEYWORD(110, 100, 0, 0, FVEC4);
+sampler2DRect		DEPRECATED_ES_KEYWORD(SAMPLER2DRECT);
+sampler3DRect		KEYWORD(110, 100, 0, 0, SAMPLER3DRECT);
+sampler2DRectShadow	DEPRECATED_ES_KEYWORD(SAMPLER2DRECTSHADOW);
+sizeof		KEYWORD(110, 100, 0, 0, SIZEOF);
+cast		KEYWORD(110, 100, 0, 0, CAST);
+namespace	KEYWORD(110, 100, 0, 0, NAMESPACE);
+using		KEYWORD(110, 100, 0, 0, USING);
+
+    /* Additional reserved words in GLSL 1.20. */
+lowp		KEYWORD(120, 100, 130, 100, LOWP);
+mediump		KEYWORD(120, 100, 130, 100, MEDIUMP);
+highp		KEYWORD(120, 100, 130, 100, HIGHP);
+precision	KEYWORD(120, 100, 130, 100, PRECISION);
+
+    /* Additional reserved words in GLSL 1.30. */
+case		KEYWORD(130, 300, 130, 300, CASE);
+common		KEYWORD(130, 300, 0, 0, COMMON);
+partition	KEYWORD(130, 300, 0, 0, PARTITION);
+active		KEYWORD(130, 300, 0, 0, ACTIVE);
+superp		KEYWORD(130, 100, 0, 0, SUPERP);
+samplerBuffer	KEYWORD(130, 300, 140, 0, SAMPLERBUFFER);
+filter		KEYWORD(130, 300, 0, 0, FILTER);
+row_major	KEYWORD_WITH_ALT(130, 0, 140, 0, yyextra->ARB_uniform_buffer_object_enable && !yyextra->es_shader, ROW_MAJOR);
+
+    /* Additional reserved words in GLSL 1.40 */
+isampler2DRect	KEYWORD(140, 300, 140, 0, ISAMPLER2DRECT);
+usampler2DRect	KEYWORD(140, 300, 140, 0, USAMPLER2DRECT);
+isamplerBuffer	KEYWORD(140, 300, 140, 0, ISAMPLERBUFFER);
+usamplerBuffer	KEYWORD(140, 300, 140, 0, USAMPLERBUFFER);
+
+    /* Additional reserved words in GLSL ES 3.00 */
+resource	KEYWORD(0, 300, 0, 0, RESOURCE);
+patch		KEYWORD(0, 300, 0, 0, PATCH);
+sample		KEYWORD_WITH_ALT(400, 300, 400, 0, yyextra->ARB_gpu_shader5_enable, SAMPLE);
+subroutine	KEYWORD(0, 300, 0, 0, SUBROUTINE);
+
+
+[_a-zA-Z][_a-zA-Z0-9]*	{
+			    struct _mesa_glsl_parse_state *state = yyextra;
+			    void *ctx = state;	
+			    yylval->identifier = ralloc_strdup(ctx, yytext);
+			    return classify_identifier(state, yytext);
+			}
+
+.			{ return yytext[0]; }
+
+%%
+
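+/* Classify a name for the grammar: IDENTIFIER if the symbol table already
+ * holds a variable or function by that name, TYPE_IDENTIFIER for a known
+ * type, and NEW_IDENTIFIER otherwise.
+ */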
+int
+classify_identifier(struct _mesa_glsl_parse_state *state, const char *name)
+{
+   if (state->symbols->get_variable(name) || state->symbols->get_function(name))
+      return IDENTIFIER;
+   else if (state->symbols->get_type(name))
+      return TYPE_IDENTIFIER;
+   else
+      return NEW_IDENTIFIER;
+}
+
+void
+_mesa_glsl_lexer_ctor(struct _mesa_glsl_parse_state *state, const char *string)
+{
+   yylex_init_extra(state, & state->scanner);
+   yy_scan_string(string, state->scanner);
+}
+
+void
+_mesa_glsl_lexer_dtor(struct _mesa_glsl_parse_state *state)
+{
+   yylex_destroy(state->scanner);
+}
diff --git a/icd/intel/compiler/shader/glsl_parser.yy b/icd/intel/compiler/shader/glsl_parser.yy
new file mode 100644
index 0000000..b09d6e5
--- /dev/null
+++ b/icd/intel/compiler/shader/glsl_parser.yy
@@ -0,0 +1,2558 @@
+%{
+/*
+ * Copyright © 2008, 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <assert.h>
+
+#include "ast.h"
+#include "glsl_parser_extras.h"
+#include "glsl_types.h"
+#include "main/context.h"
+
+#undef yyerror
+
+static void yyerror(YYLTYPE *loc, _mesa_glsl_parse_state *st, const char *msg)
+{
+   _mesa_glsl_error(loc, st, "%s", msg);
+}
+
+static int
+_mesa_glsl_lex(YYSTYPE *val, YYLTYPE *loc, _mesa_glsl_parse_state *state)
+{
+   return _mesa_glsl_lexer_lex(val, loc, state->scanner);
+}
+
+static bool match_layout_qualifier(const char *s1, const char *s2,
+                                   _mesa_glsl_parse_state *state)
+{
+   /* From the GLSL 1.50 spec, section 4.3.8 (Layout Qualifiers):
+    *
+    *     "The tokens in any layout-qualifier-id-list ... are not case
+    *     sensitive, unless explicitly noted otherwise."
+    *
+    * The text "unless explicitly noted otherwise" appears to be
+    * vacuous--no desktop GLSL spec (up through GLSL 4.40) notes
+    * otherwise.
+    *
+    * However, the GLSL ES 3.00 spec says, in section 4.3.8 (Layout
+    * Qualifiers):
+    *
+    *     "As for other identifiers, they are case sensitive."
+    *
+    * So we need to do a case-sensitive or a case-insensitive match,
+    * depending on whether we are compiling for GLSL ES.
+    */
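+   /* Note: like strcmp, both branches return zero when the names match. */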
+   if (state->es_shader)
+      return strcmp(s1, s2);
+   else
+      return strcasecmp(s1, s2);
+}
+%}
+
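+/* Require a conflict-free grammar: bison fails the build if any
+ * shift/reduce conflicts appear. */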
+%expect 0
+
+%pure-parser
+%error-verbose
+
+%locations
+%initial-action {
+   @$.first_line = 1;
+   @$.first_column = 1;
+   @$.last_line = 1;
+   @$.last_column = 1;
+   @$.source = 0;
+}
+
+%lex-param   {struct _mesa_glsl_parse_state *state}
+%parse-param {struct _mesa_glsl_parse_state *state}
+
+%union {
+   int n;
+   float real;
+   const char *identifier;
+
+   struct ast_type_qualifier type_qualifier;
+
+   ast_node *node;
+   ast_type_specifier *type_specifier;
+   ast_array_specifier *array_specifier;
+   ast_fully_specified_type *fully_specified_type;
+   ast_function *function;
+   ast_parameter_declarator *parameter_declarator;
+   ast_function_definition *function_definition;
+   ast_compound_statement *compound_statement;
+   ast_expression *expression;
+   ast_declarator_list *declarator_list;
+   ast_struct_specifier *struct_specifier;
+   ast_declaration *declaration;
+   ast_switch_body *switch_body;
+   ast_case_label *case_label;
+   ast_case_label_list *case_label_list;
+   ast_case_statement *case_statement;
+   ast_case_statement_list *case_statement_list;
+   ast_interface_block *interface_block;
+
+   struct {
+      ast_node *cond;
+      ast_expression *rest;
+   } for_rest_statement;
+
+   struct {
+      ast_node *then_statement;
+      ast_node *else_statement;
+   } selection_rest_statement;
+}
+
+%token ATTRIBUTE CONST_TOK BOOL_TOK FLOAT_TOK INT_TOK UINT_TOK
+%token BREAK CONTINUE DO ELSE FOR IF DISCARD RETURN SWITCH CASE DEFAULT
+%token BVEC2 BVEC3 BVEC4 IVEC2 IVEC3 IVEC4 UVEC2 UVEC3 UVEC4 VEC2 VEC3 VEC4
+%token CENTROID IN_TOK OUT_TOK INOUT_TOK UNIFORM VARYING
+%token NOPERSPECTIVE FLAT SMOOTH
+%token MAT2X2 MAT2X3 MAT2X4
+%token MAT3X2 MAT3X3 MAT3X4
+%token MAT4X2 MAT4X3 MAT4X4
+%token SAMPLER1D SAMPLER2D SAMPLER3D SAMPLERCUBE SAMPLER1DSHADOW SAMPLER2DSHADOW
+%token SAMPLERCUBESHADOW SAMPLER1DARRAY SAMPLER2DARRAY SAMPLER1DARRAYSHADOW
+%token SAMPLER2DARRAYSHADOW SAMPLERCUBEARRAY SAMPLERCUBEARRAYSHADOW
+%token ISAMPLER1D ISAMPLER2D ISAMPLER3D ISAMPLERCUBE
+%token ISAMPLER1DARRAY ISAMPLER2DARRAY ISAMPLERCUBEARRAY
+%token USAMPLER1D USAMPLER2D USAMPLER3D USAMPLERCUBE USAMPLER1DARRAY
+%token USAMPLER2DARRAY USAMPLERCUBEARRAY
+%token SAMPLER2DRECT ISAMPLER2DRECT USAMPLER2DRECT SAMPLER2DRECTSHADOW
+%token SAMPLERBUFFER ISAMPLERBUFFER USAMPLERBUFFER
+%token SAMPLER2DMS ISAMPLER2DMS USAMPLER2DMS
+%token SAMPLER2DMSARRAY ISAMPLER2DMSARRAY USAMPLER2DMSARRAY
+%token SAMPLEREXTERNALOES
+%token IMAGE1D IMAGE2D IMAGE3D IMAGE2DRECT IMAGECUBE IMAGEBUFFER
+%token IMAGE1DARRAY IMAGE2DARRAY IMAGECUBEARRAY IMAGE2DMS IMAGE2DMSARRAY
+%token IIMAGE1D IIMAGE2D IIMAGE3D IIMAGE2DRECT IIMAGECUBE IIMAGEBUFFER
+%token IIMAGE1DARRAY IIMAGE2DARRAY IIMAGECUBEARRAY IIMAGE2DMS IIMAGE2DMSARRAY
+%token UIMAGE1D UIMAGE2D UIMAGE3D UIMAGE2DRECT UIMAGECUBE UIMAGEBUFFER
+%token UIMAGE1DARRAY UIMAGE2DARRAY UIMAGECUBEARRAY UIMAGE2DMS UIMAGE2DMSARRAY
+%token IMAGE1DSHADOW IMAGE2DSHADOW IMAGE1DARRAYSHADOW IMAGE2DARRAYSHADOW
+%token COHERENT VOLATILE RESTRICT READONLY WRITEONLY
+%token ATOMIC_UINT
+%token STRUCT VOID_TOK WHILE
+%token <identifier> IDENTIFIER TYPE_IDENTIFIER NEW_IDENTIFIER
+%type <identifier> any_identifier
+%type <interface_block> instance_name_opt
+%token <real> FLOATCONSTANT
+%token <n> INTCONSTANT UINTCONSTANT BOOLCONSTANT
+%token <identifier> FIELD_SELECTION
+%token LEFT_OP RIGHT_OP
+%token INC_OP DEC_OP LE_OP GE_OP EQ_OP NE_OP
+%token AND_OP OR_OP XOR_OP MUL_ASSIGN DIV_ASSIGN ADD_ASSIGN
+%token MOD_ASSIGN LEFT_ASSIGN RIGHT_ASSIGN AND_ASSIGN XOR_ASSIGN OR_ASSIGN
+%token SUB_ASSIGN
+%token INVARIANT
+%token LOWP MEDIUMP HIGHP SUPERP PRECISION
+
+%token VERSION_TOK EXTENSION LINE COLON EOL INTERFACE OUTPUT
+%token PRAGMA_DEBUG_ON PRAGMA_DEBUG_OFF
+%token PRAGMA_OPTIMIZE_ON PRAGMA_OPTIMIZE_OFF
+%token PRAGMA_INVARIANT_ALL
+%token LAYOUT_TOK
+
+   /* Reserved words that are not actually used in the grammar.
+    */
+%token ASM CLASS UNION ENUM TYPEDEF TEMPLATE THIS PACKED_TOK GOTO
+%token INLINE_TOK NOINLINE PUBLIC_TOK STATIC EXTERN EXTERNAL
+%token LONG_TOK SHORT_TOK DOUBLE_TOK HALF FIXED_TOK UNSIGNED INPUT_TOK
+%token HVEC2 HVEC3 HVEC4 DVEC2 DVEC3 DVEC4 FVEC2 FVEC3 FVEC4
+%token SAMPLER3DRECT
+%token SIZEOF CAST NAMESPACE USING
+%token RESOURCE PATCH SAMPLE
+%token SUBROUTINE
+
+%token ERROR_TOK
+
+%token COMMON PARTITION ACTIVE FILTER ROW_MAJOR
+
+%type <identifier> variable_identifier
+%type <node> statement
+%type <node> statement_list
+%type <node> simple_statement
+%type <n> precision_qualifier
+%type <type_qualifier> type_qualifier
+%type <type_qualifier> auxiliary_storage_qualifier
+%type <type_qualifier> storage_qualifier
+%type <type_qualifier> interpolation_qualifier
+%type <type_qualifier> layout_qualifier
+%type <type_qualifier> layout_qualifier_id_list layout_qualifier_id
+%type <type_qualifier> interface_block_layout_qualifier
+%type <type_qualifier> interface_qualifier
+%type <type_specifier> type_specifier
+%type <type_specifier> type_specifier_nonarray
+%type <array_specifier> array_specifier
+%type <identifier> basic_type_specifier_nonarray
+%type <fully_specified_type> fully_specified_type
+%type <function> function_prototype
+%type <function> function_header
+%type <function> function_header_with_parameters
+%type <function> function_declarator
+%type <parameter_declarator> parameter_declarator
+%type <parameter_declarator> parameter_declaration
+%type <type_qualifier> parameter_qualifier
+%type <type_qualifier> parameter_direction_qualifier
+%type <type_specifier> parameter_type_specifier
+%type <function_definition> function_definition
+%type <compound_statement> compound_statement_no_new_scope
+%type <compound_statement> compound_statement
+%type <node> statement_no_new_scope
+%type <node> expression_statement
+%type <expression> expression
+%type <expression> primary_expression
+%type <expression> assignment_expression
+%type <expression> conditional_expression
+%type <expression> logical_or_expression
+%type <expression> logical_xor_expression
+%type <expression> logical_and_expression
+%type <expression> inclusive_or_expression
+%type <expression> exclusive_or_expression
+%type <expression> and_expression
+%type <expression> equality_expression
+%type <expression> relational_expression
+%type <expression> shift_expression
+%type <expression> additive_expression
+%type <expression> multiplicative_expression
+%type <expression> unary_expression
+%type <expression> constant_expression
+%type <expression> integer_expression
+%type <expression> postfix_expression
+%type <expression> function_call_header_with_parameters
+%type <expression> function_call_header_no_parameters
+%type <expression> function_call_header
+%type <expression> function_call_generic
+%type <expression> function_call_or_method
+%type <expression> function_call
+%type <expression> method_call_generic
+%type <expression> method_call_header_with_parameters
+%type <expression> method_call_header_no_parameters
+%type <expression> method_call_header
+%type <n> assignment_operator
+%type <n> unary_operator
+%type <expression> function_identifier
+%type <node> external_declaration
+%type <declarator_list> init_declarator_list
+%type <declarator_list> single_declaration
+%type <expression> initializer
+%type <expression> initializer_list
+%type <node> declaration
+%type <node> declaration_statement
+%type <node> jump_statement
+%type <node> interface_block
+%type <interface_block> basic_interface_block
+%type <struct_specifier> struct_specifier
+%type <declarator_list> struct_declaration_list
+%type <declarator_list> struct_declaration
+%type <declaration> struct_declarator
+%type <declaration> struct_declarator_list
+%type <declarator_list> member_list
+%type <declarator_list> member_declaration
+%type <node> selection_statement
+%type <selection_rest_statement> selection_rest_statement
+%type <node> switch_statement
+%type <switch_body> switch_body
+%type <case_label_list> case_label_list
+%type <case_label> case_label
+%type <case_statement> case_statement
+%type <case_statement_list> case_statement_list
+%type <node> iteration_statement
+%type <node> condition
+%type <node> conditionopt
+%type <node> for_init_statement
+%type <for_rest_statement> for_rest_statement
+%type <n> integer_constant
+%type <node> layout_defaults
+
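+   /* Same precedence, right-associative: the parser shifts ELSE rather than
+    * reducing, so each ELSE pairs with the nearest IF (dangling-else). */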
+%right THEN ELSE
+%%
+
+translation_unit:
+   version_statement extension_statement_list
+   {
+      _mesa_glsl_initialize_types(state);
+   }
+   external_declaration_list
+   {
+      delete state->symbols;
+      state->symbols = new(ralloc_parent(state)) glsl_symbol_table;
+      _mesa_glsl_initialize_types(state);
+   }
+   ;
+
+version_statement:
+   /* blank - no #version specified: defaults are already set */
+   | VERSION_TOK INTCONSTANT EOL
+   {
+      state->process_version_directive(&@2, $2, NULL);
+      if (state->error) {
+         YYERROR;
+      }
+   }
+   | VERSION_TOK INTCONSTANT any_identifier EOL
+   {
+      state->process_version_directive(&@2, $2, $3);
+      if (state->error) {
+         YYERROR;
+      }
+   }
+   ;
+
+pragma_statement:
+   PRAGMA_DEBUG_ON EOL
+   | PRAGMA_DEBUG_OFF EOL
+   | PRAGMA_OPTIMIZE_ON EOL
+   | PRAGMA_OPTIMIZE_OFF EOL
+   | PRAGMA_INVARIANT_ALL EOL
+   {
+      if (!state->is_version(120, 100)) {
+         _mesa_glsl_warning(& @1, state,
+                            "pragma `invariant(all)' not supported in %s "
+                            "(GLSL ES 1.00 or GLSL 1.20 required)",
+                            state->get_version_string());
+      } else {
+         state->all_invariant = true;
+      }
+   }
+   ;
+
+extension_statement_list:
+
+   | extension_statement_list extension_statement
+   ;
+
+any_identifier:
+   IDENTIFIER
+   | TYPE_IDENTIFIER
+   | NEW_IDENTIFIER
+   ;
+
+extension_statement:
+   EXTENSION any_identifier COLON any_identifier EOL
+   {
+      if (!_mesa_glsl_process_extension($2, & @2, $4, & @4, state)) {
+         YYERROR;
+      }
+   }
+   ;
+
+external_declaration_list:
+   external_declaration
+   {
+      /* FINISHME: The NULL test is required because pragmas are set to
+       * FINISHME: NULL. (See production rule for external_declaration.)
+       */
+      if ($1 != NULL)
+         state->translation_unit.push_tail(& $1->link);
+   }
+   | external_declaration_list external_declaration
+   {
+      /* FINISHME: The NULL test is required because pragmas are set to
+       * FINISHME: NULL. (See production rule for external_declaration.)
+       */
+      if ($2 != NULL)
+         state->translation_unit.push_tail(& $2->link);
+   }
+   ;
+
+variable_identifier:
+   IDENTIFIER
+   | NEW_IDENTIFIER
+   ;
+
+primary_expression:
+   variable_identifier
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_identifier, NULL, NULL, NULL);
+      $$->set_location(@1);
+      $$->primary_expression.identifier = $1;
+   }
+   | INTCONSTANT
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_int_constant, NULL, NULL, NULL);
+      $$->set_location(@1);
+      $$->primary_expression.int_constant = $1;
+   }
+   | UINTCONSTANT
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_uint_constant, NULL, NULL, NULL);
+      $$->set_location(@1);
+      $$->primary_expression.uint_constant = $1;
+   }
+   | FLOATCONSTANT
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_float_constant, NULL, NULL, NULL);
+      $$->set_location(@1);
+      $$->primary_expression.float_constant = $1;
+   }
+   | BOOLCONSTANT
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_bool_constant, NULL, NULL, NULL);
+      $$->set_location(@1);
+      $$->primary_expression.bool_constant = $1;
+   }
+   | '(' expression ')'
+   {
+      $$ = $2;
+   }
+   ;
+
+postfix_expression:
+   primary_expression
+   | postfix_expression '[' integer_expression ']'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_array_index, $1, $3, NULL);
+      $$->set_location_range(@1, @4);
+   }
+   | function_call
+   {
+      $$ = $1;
+   }
+   | postfix_expression '.' any_identifier
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_field_selection, $1, NULL, NULL);
+      $$->set_location_range(@1, @3);
+      $$->primary_expression.identifier = $3;
+   }
+   | postfix_expression INC_OP
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_post_inc, $1, NULL, NULL);
+      $$->set_location_range(@1, @2);
+   }
+   | postfix_expression DEC_OP
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_post_dec, $1, NULL, NULL);
+      $$->set_location_range(@1, @2);
+   }
+   ;
+
+integer_expression:
+   expression
+   ;
+
+function_call:
+   function_call_or_method
+   ;
+
+function_call_or_method:
+   function_call_generic
+   | postfix_expression '.' method_call_generic
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_field_selection, $1, $3, NULL);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+function_call_generic:
+   function_call_header_with_parameters ')'
+   | function_call_header_no_parameters ')'
+   ;
+
+function_call_header_no_parameters:
+   function_call_header VOID_TOK
+   | function_call_header
+   ;
+
+function_call_header_with_parameters:
+   function_call_header assignment_expression
+   {
+      $$ = $1;
+      $$->set_location(@1);
+      $$->expressions.push_tail(& $2->link);
+   }
+   | function_call_header_with_parameters ',' assignment_expression
+   {
+      $$ = $1;
+      $$->set_location(@1);
+      $$->expressions.push_tail(& $3->link);
+   }
+   ;
+
+   // Grammar Note: Constructors look like functions, but lexical
+   // analysis recognized most of them as keywords. They are now
+   // recognized through "type_specifier".
+function_call_header:
+   function_identifier '('
+   ;
+
+function_identifier:
+   type_specifier
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_function_expression($1);
+      $$->set_location(@1);
+      }
+   | variable_identifier
+   {
+      void *ctx = state;
+      ast_expression *callee = new(ctx) ast_expression($1);
+      callee->set_location(@1);
+      $$ = new(ctx) ast_function_expression(callee);
+      $$->set_location(@1);
+      }
+   | FIELD_SELECTION
+   {
+      void *ctx = state;
+      ast_expression *callee = new(ctx) ast_expression($1);
+      callee->set_location(@1);
+      $$ = new(ctx) ast_function_expression(callee);
+      $$->set_location(@1);
+      }
+   ;
+
+method_call_generic:
+   method_call_header_with_parameters ')'
+   | method_call_header_no_parameters ')'
+   ;
+
+method_call_header_no_parameters:
+   method_call_header VOID_TOK
+   | method_call_header
+   ;
+
+method_call_header_with_parameters:
+   method_call_header assignment_expression
+   {
+      $$ = $1;
+      $$->set_location(@1);
+      $$->expressions.push_tail(& $2->link);
+   }
+   | method_call_header_with_parameters ',' assignment_expression
+   {
+      $$ = $1;
+      $$->set_location(@1);
+      $$->expressions.push_tail(& $3->link);
+   }
+   ;
+
+   // Grammar Note: Constructors look like methods, but lexical
+   // analysis recognized most of them as keywords. They are now
+   // recognized through "type_specifier".
+method_call_header:
+   variable_identifier '('
+   {
+      void *ctx = state;
+      ast_expression *callee = new(ctx) ast_expression($1);
+      callee->set_location(@1);
+      $$ = new(ctx) ast_function_expression(callee);
+      $$->set_location(@1);
+   }
+   ;
+
+   // Grammar Note: No traditional style type casts.
+unary_expression:
+   postfix_expression
+   | INC_OP unary_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_pre_inc, $2, NULL, NULL);
+      $$->set_location(@1);
+   }
+   | DEC_OP unary_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_pre_dec, $2, NULL, NULL);
+      $$->set_location(@1);
+   }
+   | unary_operator unary_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression($1, $2, NULL, NULL);
+      $$->set_location_range(@1, @2);
+   }
+   ;
+
+   // Grammar Note: No '*' or '&' unary ops. Pointers are not supported.
+unary_operator:
+   '+'   { $$ = ast_plus; }
+   | '-' { $$ = ast_neg; }
+   | '!' { $$ = ast_logic_not; }
+   | '~' { $$ = ast_bit_not; }
+   ;
+
+multiplicative_expression:
+   unary_expression
+   | multiplicative_expression '*' unary_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_mul, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   | multiplicative_expression '/' unary_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_div, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   | multiplicative_expression '%' unary_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_mod, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+additive_expression:
+   multiplicative_expression
+   | additive_expression '+' multiplicative_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_add, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   | additive_expression '-' multiplicative_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_sub, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+shift_expression:
+   additive_expression
+   | shift_expression LEFT_OP additive_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_lshift, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   | shift_expression RIGHT_OP additive_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_rshift, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+relational_expression:
+   shift_expression
+   | relational_expression '<' shift_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_less, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   | relational_expression '>' shift_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_greater, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   | relational_expression LE_OP shift_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_lequal, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   | relational_expression GE_OP shift_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_gequal, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+equality_expression:
+   relational_expression
+   | equality_expression EQ_OP relational_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_equal, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   | equality_expression NE_OP relational_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_nequal, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+and_expression:
+   equality_expression
+   | and_expression '&' equality_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_bit_and, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+exclusive_or_expression:
+   and_expression
+   | exclusive_or_expression '^' and_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_bit_xor, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+inclusive_or_expression:
+   exclusive_or_expression
+   | inclusive_or_expression '|' exclusive_or_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_bit_or, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+logical_and_expression:
+   inclusive_or_expression
+   | logical_and_expression AND_OP inclusive_or_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_logic_and, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+logical_xor_expression:
+   logical_and_expression
+   | logical_xor_expression XOR_OP logical_and_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_logic_xor, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+logical_or_expression:
+   logical_xor_expression
+   | logical_or_expression OR_OP logical_xor_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_bin(ast_logic_or, $1, $3);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+conditional_expression:
+   logical_or_expression
+   | logical_or_expression '?' expression ':' assignment_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression(ast_conditional, $1, $3, $5);
+      $$->set_location_range(@1, @5);
+   }
+   ;
+
+assignment_expression:
+   conditional_expression
+   | unary_expression assignment_operator assignment_expression
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression($2, $1, $3, NULL);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+assignment_operator:
+   '='                { $$ = ast_assign; }
+   | MUL_ASSIGN       { $$ = ast_mul_assign; }
+   | DIV_ASSIGN       { $$ = ast_div_assign; }
+   | MOD_ASSIGN       { $$ = ast_mod_assign; }
+   | ADD_ASSIGN       { $$ = ast_add_assign; }
+   | SUB_ASSIGN       { $$ = ast_sub_assign; }
+   | LEFT_ASSIGN      { $$ = ast_ls_assign; }
+   | RIGHT_ASSIGN     { $$ = ast_rs_assign; }
+   | AND_ASSIGN       { $$ = ast_and_assign; }
+   | XOR_ASSIGN       { $$ = ast_xor_assign; }
+   | OR_ASSIGN        { $$ = ast_or_assign; }
+   ;
+
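+   // Grammar Note: the comma operator chains operands into a single
+   // ast_sequence node rather than nesting binary pairs.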
+expression:
+   assignment_expression
+   {
+      $$ = $1;
+   }
+   | expression ',' assignment_expression
+   {
+      void *ctx = state;
+      if ($1->oper != ast_sequence) {
+         $$ = new(ctx) ast_expression(ast_sequence, NULL, NULL, NULL);
+         $$->set_location_range(@1, @3);
+         $$->expressions.push_tail(& $1->link);
+      } else {
+         $$ = $1;
+      }
+
+      $$->expressions.push_tail(& $3->link);
+   }
+   ;
+
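+/* For illustration: the action above flattens comma expressions, so an
+ * expression such as
+ *
+ *    a = 1, b = 2, c = 3
+ *
+ * yields a single ast_sequence whose expressions list holds all three
+ * assignments, rather than a left-nested tree of two-element sequences.
+ */
+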
+constant_expression:
+   conditional_expression
+   ;
+
+declaration:
+   function_prototype ';'
+   {
+      state->symbols->pop_scope();
+      $$ = $1;
+   }
+   | init_declarator_list ';'
+   {
+      $$ = $1;
+   }
+   | PRECISION precision_qualifier type_specifier ';'
+   {
+      $3->default_precision = $2;
+      $$ = $3;
+   }
+   | interface_block
+   {
+      $$ = $1;
+   }
+   ;
+
+function_prototype:
+   function_declarator ')'
+   ;
+
+function_declarator:
+   function_header
+   | function_header_with_parameters
+   ;
+
+function_header_with_parameters:
+   function_header parameter_declaration
+   {
+      $$ = $1;
+      $$->parameters.push_tail(& $2->link);
+   }
+   | function_header_with_parameters ',' parameter_declaration
+   {
+      $$ = $1;
+      $$->parameters.push_tail(& $3->link);
+   }
+   ;
+
+function_header:
+   fully_specified_type variable_identifier '('
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_function();
+      $$->set_location(@2);
+      $$->return_type = $1;
+      $$->identifier = $2;
+
+      state->symbols->add_function(new(state) ir_function($2));
+      state->symbols->push_scope();
+   }
+   ;
+
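+/* Note: the parameter-list scope pushed here is popped either in the
+ * "function_prototype ';'" alternative of the declaration rule (for a
+ * bare prototype) or at the end of function_definition, once the body
+ * has been parsed.
+ */
+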
+parameter_declarator:
+   type_specifier any_identifier
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_parameter_declarator();
+      $$->set_location_range(@1, @2);
+      $$->type = new(ctx) ast_fully_specified_type();
+      $$->type->set_location(@1);
+      $$->type->specifier = $1;
+      $$->identifier = $2;
+   }
+   | type_specifier any_identifier array_specifier
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_parameter_declarator();
+      $$->set_location_range(@1, @3);
+      $$->type = new(ctx) ast_fully_specified_type();
+      $$->type->set_location(@1);
+      $$->type->specifier = $1;
+      $$->identifier = $2;
+      $$->array_specifier = $3;
+   }
+   ;
+
+parameter_declaration:
+   parameter_qualifier parameter_declarator
+   {
+      $$ = $2;
+      $$->type->qualifier = $1;
+   }
+   | parameter_qualifier parameter_type_specifier
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_parameter_declarator();
+      $$->set_location(@2);
+      $$->type = new(ctx) ast_fully_specified_type();
+      $$->type->set_location_range(@1, @2);
+      $$->type->qualifier = $1;
+      $$->type->specifier = $2;
+   }
+   ;
+
+parameter_qualifier:
+   /* empty */
+   {
+      memset(& $$, 0, sizeof($$));
+   }
+   | CONST_TOK parameter_qualifier
+   {
+      if ($2.flags.q.constant)
+         _mesa_glsl_error(&@1, state, "duplicate const qualifier");
+
+      $$ = $2;
+      $$.flags.q.constant = 1;
+   }
+   | parameter_direction_qualifier parameter_qualifier
+   {
+      if (($1.flags.q.in || $1.flags.q.out) && ($2.flags.q.in || $2.flags.q.out))
+         _mesa_glsl_error(&@1, state, "duplicate in/out/inout qualifier");
+
+      if (!state->ARB_shading_language_420pack_enable && $2.flags.q.constant)
+         _mesa_glsl_error(&@1, state, "const must be specified before "
+                          "in/out/inout");
+
+      $$ = $1;
+      $$.merge_qualifier(&@1, state, $2);
+   }
+   | precision_qualifier parameter_qualifier
+   {
+      if ($2.precision != ast_precision_none)
+         _mesa_glsl_error(&@1, state, "duplicate precision qualifier");
+
+      if (!state->ARB_shading_language_420pack_enable && $2.flags.i != 0)
+         _mesa_glsl_error(&@1, state, "precision qualifiers must come last");
+
+      $$ = $2;
+      $$.precision = $1;
+   }
+   ;
+
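+/* For illustration: without ARB_shading_language_420pack the checks above
+ * enforce qualifier order, e.g.
+ *
+ *    void f(const in float x);   // accepted
+ *    void f(in const float x);   // error: const must precede in/out/inout
+ *
+ * With the extension enabled, either order is accepted.
+ */
+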
+parameter_direction_qualifier:
+   IN_TOK
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.in = 1;
+   }
+   | OUT_TOK
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.out = 1;
+   }
+   | INOUT_TOK
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.in = 1;
+      $$.flags.q.out = 1;
+   }
+   ;
+
+parameter_type_specifier:
+   type_specifier
+   ;
+
+init_declarator_list:
+   single_declaration
+   | init_declarator_list ',' any_identifier
+   {
+      void *ctx = state;
+      ast_declaration *decl = new(ctx) ast_declaration($3, NULL, NULL);
+      decl->set_location(@3);
+
+      $$ = $1;
+      $$->declarations.push_tail(&decl->link);
+      state->symbols->add_variable(new(state) ir_variable(NULL, $3, ir_var_auto));
+   }
+   | init_declarator_list ',' any_identifier array_specifier
+   {
+      void *ctx = state;
+      ast_declaration *decl = new(ctx) ast_declaration($3, $4, NULL);
+      decl->set_location_range(@3, @4);
+
+      $$ = $1;
+      $$->declarations.push_tail(&decl->link);
+      state->symbols->add_variable(new(state) ir_variable(NULL, $3, ir_var_auto));
+   }
+   | init_declarator_list ',' any_identifier array_specifier '=' initializer
+   {
+      void *ctx = state;
+      ast_declaration *decl = new(ctx) ast_declaration($3, $4, $6);
+      decl->set_location_range(@3, @4);
+
+      $$ = $1;
+      $$->declarations.push_tail(&decl->link);
+      state->symbols->add_variable(new(state) ir_variable(NULL, $3, ir_var_auto));
+   }
+   | init_declarator_list ',' any_identifier '=' initializer
+   {
+      void *ctx = state;
+      ast_declaration *decl = new(ctx) ast_declaration($3, NULL, $5);
+      decl->set_location(@3);
+
+      $$ = $1;
+      $$->declarations.push_tail(&decl->link);
+      state->symbols->add_variable(new(state) ir_variable(NULL, $3, ir_var_auto));
+   }
+   ;
+
+   // Grammar Note: No 'enum' or 'typedef'.
+single_declaration:
+   fully_specified_type
+   {
+      void *ctx = state;
+      /* Empty declaration list is valid. */
+      $$ = new(ctx) ast_declarator_list($1);
+      $$->set_location(@1);
+   }
+   | fully_specified_type any_identifier
+   {
+      void *ctx = state;
+      ast_declaration *decl = new(ctx) ast_declaration($2, NULL, NULL);
+      decl->set_location(@2);
+
+      $$ = new(ctx) ast_declarator_list($1);
+      $$->set_location_range(@1, @2);
+      $$->declarations.push_tail(&decl->link);
+   }
+   | fully_specified_type any_identifier array_specifier
+   {
+      void *ctx = state;
+      ast_declaration *decl = new(ctx) ast_declaration($2, $3, NULL);
+      decl->set_location_range(@2, @3);
+
+      $$ = new(ctx) ast_declarator_list($1);
+      $$->set_location_range(@1, @3);
+      $$->declarations.push_tail(&decl->link);
+   }
+   | fully_specified_type any_identifier array_specifier '=' initializer
+   {
+      void *ctx = state;
+      ast_declaration *decl = new(ctx) ast_declaration($2, $3, $5);
+      decl->set_location_range(@2, @3);
+
+      $$ = new(ctx) ast_declarator_list($1);
+      $$->set_location_range(@1, @3);
+      $$->declarations.push_tail(&decl->link);
+   }
+   | fully_specified_type any_identifier '=' initializer
+   {
+      void *ctx = state;
+      ast_declaration *decl = new(ctx) ast_declaration($2, NULL, $4);
+      decl->set_location(@2);
+
+      $$ = new(ctx) ast_declarator_list($1);
+      $$->set_location_range(@1, @2);
+      $$->declarations.push_tail(&decl->link);
+   }
+   | INVARIANT variable_identifier // Vertex only.
+   {
+      void *ctx = state;
+      ast_declaration *decl = new(ctx) ast_declaration($2, NULL, NULL);
+      decl->set_location(@2);
+
+      $$ = new(ctx) ast_declarator_list(NULL);
+      $$->set_location_range(@1, @2);
+      $$->invariant = true;
+
+      $$->declarations.push_tail(&decl->link);
+   }
+   ;
+
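+/* For illustration: the INVARIANT alternative above covers invariant
+ * redeclaration of an existing variable, e.g. in a vertex shader:
+ *
+ *    invariant gl_Position;
+ */
+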
+fully_specified_type:
+   type_specifier
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_fully_specified_type();
+      $$->set_location(@1);
+      $$->specifier = $1;
+   }
+   | type_qualifier type_specifier
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_fully_specified_type();
+      $$->set_location_range(@1, @2);
+      $$->qualifier = $1;
+      $$->specifier = $2;
+   }
+   ;
+
+layout_qualifier:
+   LAYOUT_TOK '(' layout_qualifier_id_list ')'
+   {
+      $$ = $3;
+   }
+   ;
+
+layout_qualifier_id_list:
+   layout_qualifier_id
+   | layout_qualifier_id_list ',' layout_qualifier_id
+   {
+      $$ = $1;
+      if (!$$.merge_qualifier(& @3, state, $3)) {
+         YYERROR;
+      }
+   }
+   ;
+
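+/* For illustration: a comma-separated list is folded into one qualifier
+ * via merge_qualifier, so e.g.
+ *
+ *    layout(std140, column_major) uniform;
+ *
+ * ends up with both the std140 and column_major flags set on a single
+ * ast_type_qualifier.
+ */
+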
+integer_constant:
+   INTCONSTANT { $$ = $1; }
+   | UINTCONSTANT { $$ = $1; }
+   ;
+
+layout_qualifier_id:
+   any_identifier
+   {
+      memset(& $$, 0, sizeof($$));
+
+      /* Layout qualifiers for ARB_fragment_coord_conventions. */
+      if (!$$.flags.i && (state->ARB_fragment_coord_conventions_enable ||
+                          state->is_version(150, 0))) {
+         if (match_layout_qualifier($1, "origin_upper_left", state) == 0) {
+            $$.flags.q.origin_upper_left = 1;
+         } else if (match_layout_qualifier($1, "pixel_center_integer",
+                                           state) == 0) {
+            $$.flags.q.pixel_center_integer = 1;
+         }
+
+         if ($$.flags.i && state->ARB_fragment_coord_conventions_warn) {
+            _mesa_glsl_warning(& @1, state,
+                               "GL_ARB_fragment_coord_conventions layout "
+                               "identifier `%s' used", $1);
+         }
+      }
+
+      /* Layout qualifiers for AMD/ARB_conservative_depth. */
+      if (!$$.flags.i &&
+          (state->AMD_conservative_depth_enable ||
+           state->ARB_conservative_depth_enable)) {
+         if (match_layout_qualifier($1, "depth_any", state) == 0) {
+            $$.flags.q.depth_any = 1;
+         } else if (match_layout_qualifier($1, "depth_greater", state) == 0) {
+            $$.flags.q.depth_greater = 1;
+         } else if (match_layout_qualifier($1, "depth_less", state) == 0) {
+            $$.flags.q.depth_less = 1;
+         } else if (match_layout_qualifier($1, "depth_unchanged",
+                                           state) == 0) {
+            $$.flags.q.depth_unchanged = 1;
+         }
+
+         if ($$.flags.i && state->AMD_conservative_depth_warn) {
+            _mesa_glsl_warning(& @1, state,
+                               "GL_AMD_conservative_depth "
+                               "layout qualifier `%s' is used", $1);
+         }
+         if ($$.flags.i && state->ARB_conservative_depth_warn) {
+            _mesa_glsl_warning(& @1, state,
+                               "GL_ARB_conservative_depth "
+                               "layout qualifier `%s' is used", $1);
+         }
+      }
+
+      /* See also interface_block_layout_qualifier. */
+      if (!$$.flags.i && state->has_uniform_buffer_objects()) {
+         if (match_layout_qualifier($1, "std140", state) == 0) {
+            $$.flags.q.std140 = 1;
+         } else if (match_layout_qualifier($1, "shared", state) == 0) {
+            $$.flags.q.shared = 1;
+         } else if (match_layout_qualifier($1, "column_major", state) == 0) {
+            $$.flags.q.column_major = 1;
+         /* "row_major" is a reserved word in GLSL 1.30+. Its token is parsed
+          * below in the interface_block_layout_qualifier rule.
+          *
+          * It is not a reserved word in GLSL ES 3.00, so it's handled here as
+          * an identifier.
+          *
+          * Also, this takes care of alternate capitalizations of
+          * "row_major" (which is necessary because layout qualifiers
+          * are case-insensitive in desktop GLSL).
+          */
+         } else if (match_layout_qualifier($1, "row_major", state) == 0) {
+            $$.flags.q.row_major = 1;
+         /* "packed" is a reserved word in GLSL, and its token is
+          * parsed below in the interface_block_layout_qualifier rule.
+          * However, we must take care of alternate capitalizations of
+          * "packed", because layout qualifiers are case-insensitive
+          * in desktop GLSL.
+          */
+         } else if (match_layout_qualifier($1, "packed", state) == 0) {
+           $$.flags.q.packed = 1;
+         }
+
+         if ($$.flags.i && state->ARB_uniform_buffer_object_warn) {
+            _mesa_glsl_warning(& @1, state,
+                               "#version 140 / GL_ARB_uniform_buffer_object "
+                               "layout qualifier `%s' is used", $1);
+         }
+      }
+
+      /* Layout qualifiers for GLSL 1.50 geometry shaders. */
+      if (!$$.flags.i) {
+         static const struct {
+            const char *s;
+            GLenum e;
+         } map[] = {
+                 { "points", GL_POINTS },
+                 { "lines", GL_LINES },
+                 { "lines_adjacency", GL_LINES_ADJACENCY },
+                 { "line_strip", GL_LINE_STRIP },
+                 { "triangles", GL_TRIANGLES },
+                 { "triangles_adjacency", GL_TRIANGLES_ADJACENCY },
+                 { "triangle_strip", GL_TRIANGLE_STRIP },
+         };
+         for (unsigned i = 0; i < Elements(map); i++) {
+            if (match_layout_qualifier($1, map[i].s, state) == 0) {
+               $$.flags.q.prim_type = 1;
+               $$.prim_type = map[i].e;
+               break;
+            }
+         }
+
+         if ($$.flags.i && !state->is_version(150, 0)) {
+            _mesa_glsl_error(& @1, state, "#version 150 layout "
+                             "qualifier `%s' used", $1);
+         }
+      }
+
+      /* Layout qualifiers for ARB_shader_image_load_store. */
+      if (state->ARB_shader_image_load_store_enable ||
+          state->is_version(420, 0)) {
+         if (!$$.flags.i) {
+            static const struct {
+               const char *name;
+               GLenum format;
+               glsl_base_type base_type;
+            } map[] = {
+               { "rgba32f", GL_RGBA32F, GLSL_TYPE_FLOAT },
+               { "rgba16f", GL_RGBA16F, GLSL_TYPE_FLOAT },
+               { "rg32f", GL_RG32F, GLSL_TYPE_FLOAT },
+               { "rg16f", GL_RG16F, GLSL_TYPE_FLOAT },
+               { "r11f_g11f_b10f", GL_R11F_G11F_B10F, GLSL_TYPE_FLOAT },
+               { "r32f", GL_R32F, GLSL_TYPE_FLOAT },
+               { "r16f", GL_R16F, GLSL_TYPE_FLOAT },
+               { "rgba32ui", GL_RGBA32UI, GLSL_TYPE_UINT },
+               { "rgba16ui", GL_RGBA16UI, GLSL_TYPE_UINT },
+               { "rgb10_a2ui", GL_RGB10_A2UI, GLSL_TYPE_UINT },
+               { "rgba8ui", GL_RGBA8UI, GLSL_TYPE_UINT },
+               { "rg32ui", GL_RG32UI, GLSL_TYPE_UINT },
+               { "rg16ui", GL_RG16UI, GLSL_TYPE_UINT },
+               { "rg8ui", GL_RG8UI, GLSL_TYPE_UINT },
+               { "r32ui", GL_R32UI, GLSL_TYPE_UINT },
+               { "r16ui", GL_R16UI, GLSL_TYPE_UINT },
+               { "r8ui", GL_R8UI, GLSL_TYPE_UINT },
+               { "rgba32i", GL_RGBA32I, GLSL_TYPE_INT },
+               { "rgba16i", GL_RGBA16I, GLSL_TYPE_INT },
+               { "rgba8i", GL_RGBA8I, GLSL_TYPE_INT },
+               { "rg32i", GL_RG32I, GLSL_TYPE_INT },
+               { "rg16i", GL_RG16I, GLSL_TYPE_INT },
+               { "rg8i", GL_RG8I, GLSL_TYPE_INT },
+               { "r32i", GL_R32I, GLSL_TYPE_INT },
+               { "r16i", GL_R16I, GLSL_TYPE_INT },
+               { "r8i", GL_R8I, GLSL_TYPE_INT },
+               { "rgba16", GL_RGBA16, GLSL_TYPE_FLOAT },
+               { "rgb10_a2", GL_RGB10_A2, GLSL_TYPE_FLOAT },
+               { "rgba8", GL_RGBA8, GLSL_TYPE_FLOAT },
+               { "rg16", GL_RG16, GLSL_TYPE_FLOAT },
+               { "rg8", GL_RG8, GLSL_TYPE_FLOAT },
+               { "r16", GL_R16, GLSL_TYPE_FLOAT },
+               { "r8", GL_R8, GLSL_TYPE_FLOAT },
+               { "rgba16_snorm", GL_RGBA16_SNORM, GLSL_TYPE_FLOAT },
+               { "rgba8_snorm", GL_RGBA8_SNORM, GLSL_TYPE_FLOAT },
+               { "rg16_snorm", GL_RG16_SNORM, GLSL_TYPE_FLOAT },
+               { "rg8_snorm", GL_RG8_SNORM, GLSL_TYPE_FLOAT },
+               { "r16_snorm", GL_R16_SNORM, GLSL_TYPE_FLOAT },
+               { "r8_snorm", GL_R8_SNORM, GLSL_TYPE_FLOAT }
+            };
+
+            for (unsigned i = 0; i < Elements(map); i++) {
+               if (match_layout_qualifier($1, map[i].name, state) == 0) {
+                  $$.flags.q.explicit_image_format = 1;
+                  $$.image_format = map[i].format;
+                  $$.image_base_type = map[i].base_type;
+                  break;
+               }
+            }
+         }
+
+         if (!$$.flags.i &&
+             match_layout_qualifier($1, "early_fragment_tests", state) == 0) {
+            $$.flags.q.early_fragment_tests = 1;
+         }
+      }
+
+      if (!$$.flags.i) {
+         _mesa_glsl_error(& @1, state, "unrecognized layout identifier "
+                          "`%s'", $1);
+         YYERROR;
+      }
+   }
+   | any_identifier '=' integer_constant
+   {
+      memset(& $$, 0, sizeof($$));
+
+      if (match_layout_qualifier("location", $1, state) == 0) {
+         $$.flags.q.explicit_location = 1;
+
+         if ($$.flags.q.attribute == 1 &&
+             state->ARB_explicit_attrib_location_warn) {
+            _mesa_glsl_warning(& @1, state,
+                               "GL_ARB_explicit_attrib_location layout "
+                               "identifier `%s' used", $1);
+         }
+
+         if ($3 >= 0) {
+            $$.location = $3;
+         } else {
+             _mesa_glsl_error(& @3, state, "invalid location %d specified", $3);
+             YYERROR;
+         }
+      }
+
+      if (match_layout_qualifier("index", $1, state) == 0) {
+         $$.flags.q.explicit_index = 1;
+
+         if ($3 >= 0) {
+            $$.index = $3;
+         } else {
+            _mesa_glsl_error(& @3, state, "invalid index %d specified", $3);
+            YYERROR;
+         }
+      }
+
+      if ((state->ARB_shading_language_420pack_enable ||
+           state->ARB_shader_atomic_counters_enable) &&
+          match_layout_qualifier("binding", $1, state) == 0) {
+         $$.flags.q.explicit_binding = 1;
+         $$.binding = $3;
+      }
+
+      if (state->ARB_shader_atomic_counters_enable &&
+          match_layout_qualifier("offset", $1, state) == 0) {
+         $$.flags.q.explicit_offset = 1;
+         $$.offset = $3;
+      }
+
+      if (match_layout_qualifier("max_vertices", $1, state) == 0) {
+         $$.flags.q.max_vertices = 1;
+
+         if ($3 < 0) {
+            _mesa_glsl_error(& @3, state,
+                             "invalid max_vertices %d specified", $3);
+            YYERROR;
+         } else {
+            $$.max_vertices = $3;
+            if (!state->is_version(150, 0)) {
+               _mesa_glsl_error(& @3, state,
+                                "#version 150 max_vertices qualifier "
+                                "specified");
+            }
+         }
+      }
+
+      static const char * const local_size_qualifiers[3] = {
+         "local_size_x",
+         "local_size_y",
+         "local_size_z",
+      };
+      for (int i = 0; i < 3; i++) {
+         if (match_layout_qualifier(local_size_qualifiers[i], $1,
+                                    state) == 0) {
+            if ($3 <= 0) {
+               _mesa_glsl_error(& @3, state,
+                                "invalid %s of %d specified",
+                                local_size_qualifiers[i], $3);
+               YYERROR;
+            } else if (!state->is_version(430, 0) &&
+                       !state->ARB_compute_shader_enable) {
+               _mesa_glsl_error(& @3, state,
+                                "%s qualifier requires GLSL 4.30 or "
+                                "ARB_compute_shader",
+                                local_size_qualifiers[i]);
+               YYERROR;
+            } else {
+               $$.flags.q.local_size |= (1 << i);
+               $$.local_size[i] = $3;
+            }
+            break;
+         }
+      }
+
+      if (match_layout_qualifier("invocations", $1, state) == 0) {
+         $$.flags.q.invocations = 1;
+
+         if ($3 <= 0) {
+            _mesa_glsl_error(& @3, state,
+                             "invalid invocations %d specified", $3);
+            YYERROR;
+         } else if ($3 > MAX_GEOMETRY_SHADER_INVOCATIONS) {
+            _mesa_glsl_error(& @3, state,
+                             "invocations (%d) exceeds "
+                             "GL_MAX_GEOMETRY_SHADER_INVOCATIONS", $3);
+            YYERROR;
+         } else {
+            $$.invocations = $3;
+            if (!state->is_version(400, 0) &&
+                !state->ARB_gpu_shader5_enable) {
+               _mesa_glsl_error(& @3, state,
+                                "GL_ARB_gpu_shader5 invocations "
+                                "qualifier specified");
+            }
+         }
+      }
+
+      /* If the identifier didn't match any known layout identifiers,
+       * emit an error.
+       */
+      if (!$$.flags.i) {
+         _mesa_glsl_error(& @1, state, "unrecognized layout identifier "
+                          "`%s'", $1);
+         YYERROR;
+      }
+   }
+   | interface_block_layout_qualifier
+   {
+      $$ = $1;
+      /* Layout qualifiers for ARB_uniform_buffer_object. */
+      if ($$.flags.q.uniform && !state->has_uniform_buffer_objects()) {
+         _mesa_glsl_error(& @1, state,
+                          "#version 140 / GL_ARB_uniform_buffer_object "
+                          "layout qualifier `%s' is used", $1);
+      } else if ($$.flags.q.uniform && state->ARB_uniform_buffer_object_warn) {
+         _mesa_glsl_warning(& @1, state,
+                            "#version 140 / GL_ARB_uniform_buffer_object "
+                            "layout qualifier `%s' is used", $1);
+      }
+   }
+   ;
+
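+/* For illustration: the any_identifier '=' integer_constant alternative
+ * below handles qualifiers such as
+ *
+ *    layout(location = 2) in vec4 color;
+ *    layout(local_size_x = 8, local_size_y = 8) in;  // GLSL 4.30 / ARB_compute_shader
+ *    layout(invocations = 2) in;                     // GLSL 4.00 / ARB_gpu_shader5
+ */
+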
+/* This is a separate language rule because we parse these as tokens
+ * (due to them being reserved keywords) instead of identifiers like
+ * most qualifiers.  See the any_identifier path of
+ * layout_qualifier_id for the others.
+ *
+ * Note that since layout qualifiers are case-insensitive in desktop
+ * GLSL, all of these qualifiers need to be handled as identifiers as
+ * well (by the any_identifier path of layout_qualifier_id).
+ */
+interface_block_layout_qualifier:
+   ROW_MAJOR
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.row_major = 1;
+   }
+   | PACKED_TOK
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.packed = 1;
+   }
+   ;
+
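+/* For illustration: because desktop GLSL layout qualifier names are
+ * case-insensitive, a spelling like layout(Row_Major) must still be
+ * recognized.  The ROW_MAJOR and PACKED_TOK tokens cover the reserved
+ * lower-case spellings, while the any_identifier path of
+ * layout_qualifier_id handles the alternate capitalizations.
+ */
+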
+interpolation_qualifier:
+   SMOOTH
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.smooth = 1;
+   }
+   | FLAT
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.flat = 1;
+   }
+   | NOPERSPECTIVE
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.noperspective = 1;
+   }
+   ;
+
+type_qualifier:
+   /* Single qualifiers */
+   INVARIANT
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.invariant = 1;
+   }
+   | auxiliary_storage_qualifier
+   | storage_qualifier
+   | interpolation_qualifier
+   | layout_qualifier
+   | precision_qualifier
+   {
+      memset(&$$, 0, sizeof($$));
+      $$.precision = $1;
+   }
+
+   /* Multiple qualifiers:
+    * In GLSL 4.20, these can be specified in any order.  In earlier versions,
+    * they appear in this order (see GLSL 1.50 section 4.7 & comments below):
+    *
+    *    invariant interpolation auxiliary storage precision  ...or...
+    *    layout storage precision
+    *
+    * Each qualifier's rule ensures that the accumulated qualifiers on the right
+    * side don't contain any that must appear on the left hand side.
+    * For example, when processing a storage qualifier, we check that there are
+    * no auxiliary, interpolation, layout, or invariant qualifiers to the right.
+    * (An illustrative ordering example follows this rule.)
+    */
+   | INVARIANT type_qualifier
+   {
+      if ($2.flags.q.invariant)
+         _mesa_glsl_error(&@1, state, "duplicate \"invariant\" qualifier");
+
+      if ($2.has_layout()) {
+         _mesa_glsl_error(&@1, state,
+                          "\"invariant\" cannot be used with layout(...)");
+      }
+
+      $$ = $2;
+      $$.flags.q.invariant = 1;
+   }
+   | interpolation_qualifier type_qualifier
+   {
+      /* Section 4.3 of the GLSL 1.40 specification states:
+       * "...qualified with one of these interpolation qualifiers"
+       *
+       * GLSL 1.30 claims to allow "one or more", but insists that:
+       * "These interpolation qualifiers may only precede the qualifiers in,
+       *  centroid in, out, or centroid out in a declaration."
+       *
+       * ...which means that e.g. smooth can't precede smooth, so there can be
+       * only one after all, and the 1.40 text is a clarification, not a change.
+       */
+      if ($2.has_interpolation())
+         _mesa_glsl_error(&@1, state, "duplicate interpolation qualifier");
+
+      if ($2.has_layout()) {
+         _mesa_glsl_error(&@1, state, "interpolation qualifiers cannot be used "
+                          "with layout(...)");
+      }
+
+      if (!state->ARB_shading_language_420pack_enable && $2.flags.q.invariant) {
+         _mesa_glsl_error(&@1, state, "interpolation qualifiers must come "
+                          "after \"invariant\"");
+      }
+
+      $$ = $1;
+      $$.merge_qualifier(&@1, state, $2);
+   }
+   | layout_qualifier type_qualifier
+   {
+      /* The GLSL 1.50 grammar indicates that a layout(...) declaration can be
+       * used standalone or immediately before a storage qualifier.  It cannot
+       * be used with interpolation qualifiers or invariant.  There does not
+       * appear to be any text indicating that it must come before the storage
+       * qualifier, but it always seems to in examples.
+       */
+      if (!state->ARB_shading_language_420pack_enable && $2.has_layout())
+         _mesa_glsl_error(&@1, state, "duplicate layout(...) qualifiers");
+
+      if ($2.flags.q.invariant)
+         _mesa_glsl_error(&@1, state, "layout(...) cannot be used with "
+                          "the \"invariant\" qualifier");
+
+      if ($2.has_interpolation()) {
+         _mesa_glsl_error(&@1, state, "layout(...) cannot be used with "
+                          "interpolation qualifiers");
+      }
+
+      $$ = $1;
+      $$.merge_qualifier(&@1, state, $2);
+   }
+   | auxiliary_storage_qualifier type_qualifier
+   {
+      if ($2.has_auxiliary_storage()) {
+         _mesa_glsl_error(&@1, state,
+                          "duplicate auxiliary storage qualifier (centroid or sample)");
+      }
+
+      if (!state->ARB_shading_language_420pack_enable &&
+          ($2.flags.q.invariant || $2.has_interpolation() || $2.has_layout())) {
+         _mesa_glsl_error(&@1, state, "auxiliary storage qualifiers must come "
+                          "just before storage qualifiers");
+      }
+      $$ = $1;
+      $$.merge_qualifier(&@1, state, $2);
+   }
+   | storage_qualifier type_qualifier
+   {
+      /* Section 4.3 of the GLSL 1.20 specification states:
+       * "Variable declarations may have a storage qualifier specified..."
+       *  1.30 clarifies this to "may have one storage qualifier".
+       */
+      if ($2.has_storage())
+         _mesa_glsl_error(&@1, state, "duplicate storage qualifier");
+
+      if (!state->ARB_shading_language_420pack_enable &&
+          ($2.flags.q.invariant || $2.has_interpolation() || $2.has_layout() ||
+           $2.has_auxiliary_storage())) {
+         _mesa_glsl_error(&@1, state, "storage qualifiers must come after "
+                          "invariant, interpolation, layout and auxiliary "
+                          "storage qualifiers");
+      }
+
+      $$ = $1;
+      $$.merge_qualifier(&@1, state, $2);
+   }
+   | precision_qualifier type_qualifier
+   {
+      if ($2.precision != ast_precision_none)
+         _mesa_glsl_error(&@1, state, "duplicate precision qualifier");
+
+      if (!state->ARB_shading_language_420pack_enable && $2.flags.i != 0)
+         _mesa_glsl_error(&@1, state, "precision qualifiers must come last");
+
+      $$ = $2;
+      $$.precision = $1;
+   }
+   ;
+
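+/* Illustrative ordering example (referenced in the comment above): without
+ * ARB_shading_language_420pack,
+ *
+ *    centroid in highp vec4 pos;   // accepted: auxiliary, storage, precision
+ *    in centroid vec4 pos;         // error: storage must come after
+ *                                  // auxiliary storage qualifiers
+ *
+ * With the extension enabled, qualifiers may appear in any order.
+ */
+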
+auxiliary_storage_qualifier:
+   CENTROID
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.centroid = 1;
+   }
+   | SAMPLE
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.sample = 1;
+   }
+   /* TODO: "patch" also goes here someday. */
+   ;
+
+storage_qualifier:
+   CONST_TOK
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.constant = 1;
+   }
+   | ATTRIBUTE
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.attribute = 1;
+   }
+   | VARYING
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.varying = 1;
+   }
+   | IN_TOK
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.in = 1;
+   }
+   | OUT_TOK
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.out = 1;
+   }
+   | UNIFORM
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.uniform = 1;
+   }
+   | COHERENT
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.coherent = 1;
+   }
+   | VOLATILE
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q._volatile = 1;
+   }
+   | RESTRICT
+   {
+      STATIC_ASSERT(sizeof($$.flags.q) <= sizeof($$.flags.i));
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.restrict_flag = 1;
+   }
+   | READONLY
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.read_only = 1;
+   }
+   | WRITEONLY
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.write_only = 1;
+   }
+   ;
+
+array_specifier:
+   '[' ']'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_array_specifier(@1);
+      $$->set_location_range(@1, @2);
+   }
+   | '[' constant_expression ']'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_array_specifier(@1, $2);
+      $$->set_location_range(@1, @3);
+   }
+   | array_specifier '[' ']'
+   {
+      $$ = $1;
+
+      if (!state->ARB_arrays_of_arrays_enable) {
+         _mesa_glsl_error(& @1, state,
+                          "GL_ARB_arrays_of_arrays "
+                          "required for defining arrays of arrays");
+      } else {
+         _mesa_glsl_error(& @1, state,
+                          "only the outermost array dimension can "
+                          "be unsized");
+      }
+   }
+   | array_specifier '[' constant_expression ']'
+   {
+      $$ = $1;
+
+      if (!state->ARB_arrays_of_arrays_enable) {
+         _mesa_glsl_error(& @1, state,
+                          "GL_ARB_arrays_of_arrays "
+                          "required for defining arrays of arrays");
+      }
+
+      $$->add_dimension($3);
+   }
+   ;
+
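+/* For illustration (per the checks above, all of these need
+ * GL_ARB_arrays_of_arrays):
+ *
+ *    float a[3][2];    // accepted
+ *    float b[][2];     // accepted: only the outermost dimension is unsized
+ *    float c[2][];     // error: only the outermost dimension can be unsized
+ */
+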
+type_specifier:
+   type_specifier_nonarray
+   | type_specifier_nonarray array_specifier
+   {
+      $$ = $1;
+      $$->array_specifier = $2;
+   }
+   ;
+
+type_specifier_nonarray:
+   basic_type_specifier_nonarray
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_type_specifier($1);
+      $$->set_location(@1);
+   }
+   | struct_specifier
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_type_specifier($1);
+      $$->set_location(@1);
+   }
+   | TYPE_IDENTIFIER
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_type_specifier($1);
+      $$->set_location(@1);
+   }
+   ;
+
+basic_type_specifier_nonarray:
+   VOID_TOK                 { $$ = "void"; }
+   | FLOAT_TOK              { $$ = "float"; }
+   | INT_TOK                { $$ = "int"; }
+   | UINT_TOK               { $$ = "uint"; }
+   | BOOL_TOK               { $$ = "bool"; }
+   | VEC2                   { $$ = "vec2"; }
+   | VEC3                   { $$ = "vec3"; }
+   | VEC4                   { $$ = "vec4"; }
+   | BVEC2                  { $$ = "bvec2"; }
+   | BVEC3                  { $$ = "bvec3"; }
+   | BVEC4                  { $$ = "bvec4"; }
+   | IVEC2                  { $$ = "ivec2"; }
+   | IVEC3                  { $$ = "ivec3"; }
+   | IVEC4                  { $$ = "ivec4"; }
+   | UVEC2                  { $$ = "uvec2"; }
+   | UVEC3                  { $$ = "uvec3"; }
+   | UVEC4                  { $$ = "uvec4"; }
+   | MAT2X2                 { $$ = "mat2"; }
+   | MAT2X3                 { $$ = "mat2x3"; }
+   | MAT2X4                 { $$ = "mat2x4"; }
+   | MAT3X2                 { $$ = "mat3x2"; }
+   | MAT3X3                 { $$ = "mat3"; }
+   | MAT3X4                 { $$ = "mat3x4"; }
+   | MAT4X2                 { $$ = "mat4x2"; }
+   | MAT4X3                 { $$ = "mat4x3"; }
+   | MAT4X4                 { $$ = "mat4"; }
+   | SAMPLER1D              { $$ = "sampler1D"; }
+   | SAMPLER2D              { $$ = "sampler2D"; }
+   | SAMPLER2DRECT          { $$ = "sampler2DRect"; }
+   | SAMPLER3D              { $$ = "sampler3D"; }
+   | SAMPLERCUBE            { $$ = "samplerCube"; }
+   | SAMPLEREXTERNALOES     { $$ = "samplerExternalOES"; }
+   | SAMPLER1DSHADOW        { $$ = "sampler1DShadow"; }
+   | SAMPLER2DSHADOW        { $$ = "sampler2DShadow"; }
+   | SAMPLER2DRECTSHADOW    { $$ = "sampler2DRectShadow"; }
+   | SAMPLERCUBESHADOW      { $$ = "samplerCubeShadow"; }
+   | SAMPLER1DARRAY         { $$ = "sampler1DArray"; }
+   | SAMPLER2DARRAY         { $$ = "sampler2DArray"; }
+   | SAMPLER1DARRAYSHADOW   { $$ = "sampler1DArrayShadow"; }
+   | SAMPLER2DARRAYSHADOW   { $$ = "sampler2DArrayShadow"; }
+   | SAMPLERBUFFER          { $$ = "samplerBuffer"; }
+   | SAMPLERCUBEARRAY       { $$ = "samplerCubeArray"; }
+   | SAMPLERCUBEARRAYSHADOW { $$ = "samplerCubeArrayShadow"; }
+   | ISAMPLER1D             { $$ = "isampler1D"; }
+   | ISAMPLER2D             { $$ = "isampler2D"; }
+   | ISAMPLER2DRECT         { $$ = "isampler2DRect"; }
+   | ISAMPLER3D             { $$ = "isampler3D"; }
+   | ISAMPLERCUBE           { $$ = "isamplerCube"; }
+   | ISAMPLER1DARRAY        { $$ = "isampler1DArray"; }
+   | ISAMPLER2DARRAY        { $$ = "isampler2DArray"; }
+   | ISAMPLERBUFFER         { $$ = "isamplerBuffer"; }
+   | ISAMPLERCUBEARRAY      { $$ = "isamplerCubeArray"; }
+   | USAMPLER1D             { $$ = "usampler1D"; }
+   | USAMPLER2D             { $$ = "usampler2D"; }
+   | USAMPLER2DRECT         { $$ = "usampler2DRect"; }
+   | USAMPLER3D             { $$ = "usampler3D"; }
+   | USAMPLERCUBE           { $$ = "usamplerCube"; }
+   | USAMPLER1DARRAY        { $$ = "usampler1DArray"; }
+   | USAMPLER2DARRAY        { $$ = "usampler2DArray"; }
+   | USAMPLERBUFFER         { $$ = "usamplerBuffer"; }
+   | USAMPLERCUBEARRAY      { $$ = "usamplerCubeArray"; }
+   | SAMPLER2DMS            { $$ = "sampler2DMS"; }
+   | ISAMPLER2DMS           { $$ = "isampler2DMS"; }
+   | USAMPLER2DMS           { $$ = "usampler2DMS"; }
+   | SAMPLER2DMSARRAY       { $$ = "sampler2DMSArray"; }
+   | ISAMPLER2DMSARRAY      { $$ = "isampler2DMSArray"; }
+   | USAMPLER2DMSARRAY      { $$ = "usampler2DMSArray"; }
+   | IMAGE1D                { $$ = "image1D"; }
+   | IMAGE2D                { $$ = "image2D"; }
+   | IMAGE3D                { $$ = "image3D"; }
+   | IMAGE2DRECT            { $$ = "image2DRect"; }
+   | IMAGECUBE              { $$ = "imageCube"; }
+   | IMAGEBUFFER            { $$ = "imageBuffer"; }
+   | IMAGE1DARRAY           { $$ = "image1DArray"; }
+   | IMAGE2DARRAY           { $$ = "image2DArray"; }
+   | IMAGECUBEARRAY         { $$ = "imageCubeArray"; }
+   | IMAGE2DMS              { $$ = "image2DMS"; }
+   | IMAGE2DMSARRAY         { $$ = "image2DMSArray"; }
+   | IIMAGE1D               { $$ = "iimage1D"; }
+   | IIMAGE2D               { $$ = "iimage2D"; }
+   | IIMAGE3D               { $$ = "iimage3D"; }
+   | IIMAGE2DRECT           { $$ = "iimage2DRect"; }
+   | IIMAGECUBE             { $$ = "iimageCube"; }
+   | IIMAGEBUFFER           { $$ = "iimageBuffer"; }
+   | IIMAGE1DARRAY          { $$ = "iimage1DArray"; }
+   | IIMAGE2DARRAY          { $$ = "iimage2DArray"; }
+   | IIMAGECUBEARRAY        { $$ = "iimageCubeArray"; }
+   | IIMAGE2DMS             { $$ = "iimage2DMS"; }
+   | IIMAGE2DMSARRAY        { $$ = "iimage2DMSArray"; }
+   | UIMAGE1D               { $$ = "uimage1D"; }
+   | UIMAGE2D               { $$ = "uimage2D"; }
+   | UIMAGE3D               { $$ = "uimage3D"; }
+   | UIMAGE2DRECT           { $$ = "uimage2DRect"; }
+   | UIMAGECUBE             { $$ = "uimageCube"; }
+   | UIMAGEBUFFER           { $$ = "uimageBuffer"; }
+   | UIMAGE1DARRAY          { $$ = "uimage1DArray"; }
+   | UIMAGE2DARRAY          { $$ = "uimage2DArray"; }
+   | UIMAGECUBEARRAY        { $$ = "uimageCubeArray"; }
+   | UIMAGE2DMS             { $$ = "uimage2DMS"; }
+   | UIMAGE2DMSARRAY        { $$ = "uimage2DMSArray"; }
+   | ATOMIC_UINT            { $$ = "atomic_uint"; }
+   ;
+
+precision_qualifier:
+   HIGHP
+   {
+      state->check_precision_qualifiers_allowed(&@1);
+      $$ = ast_precision_high;
+   }
+   | MEDIUMP
+   {
+      state->check_precision_qualifiers_allowed(&@1);
+      $$ = ast_precision_medium;
+   }
+   | LOWP
+   {
+      state->check_precision_qualifiers_allowed(&@1);
+      $$ = ast_precision_low;
+   }
+   ;
+
+struct_specifier:
+   STRUCT any_identifier '{' struct_declaration_list '}'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_struct_specifier($2, $4);
+      $$->set_location_range(@2, @5);
+      state->symbols->add_type($2, glsl_type::void_type);
+   }
+   | STRUCT '{' struct_declaration_list '}'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_struct_specifier(NULL, $3);
+      $$->set_location_range(@2, @4);
+   }
+   ;
+
+struct_declaration_list:
+   struct_declaration
+   {
+      $$ = $1;
+      $1->link.self_link();
+   }
+   | struct_declaration_list struct_declaration
+   {
+      $$ = $1;
+      $$->link.insert_before(& $2->link);
+   }
+   ;
+
+struct_declaration:
+   fully_specified_type struct_declarator_list ';'
+   {
+      void *ctx = state;
+      ast_fully_specified_type *const type = $1;
+      type->set_location(@1);
+
+      if (type->qualifier.flags.i != 0)
+         _mesa_glsl_error(&@1, state,
+                          "only precision qualifiers may be applied to "
+                          "structure members");
+
+      $$ = new(ctx) ast_declarator_list(type);
+      $$->set_location(@2);
+
+      $$->declarations.push_degenerate_list_at_head(& $2->link);
+   }
+   ;
+
+struct_declarator_list:
+   struct_declarator
+   {
+      $$ = $1;
+      $1->link.self_link();
+   }
+   | struct_declarator_list ',' struct_declarator
+   {
+      $$ = $1;
+      $$->link.insert_before(& $3->link);
+   }
+   ;
+
+struct_declarator:
+   any_identifier
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_declaration($1, NULL, NULL);
+      $$->set_location(@1);
+   }
+   | any_identifier array_specifier
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_declaration($1, $2, NULL);
+      $$->set_location_range(@1, @2);
+   }
+   ;
+
+initializer:
+   assignment_expression
+   | '{' initializer_list '}'
+   {
+      $$ = $2;
+   }
+   | '{' initializer_list ',' '}'
+   {
+      $$ = $2;
+   }
+   ;
+
+initializer_list:
+   initializer
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_aggregate_initializer();
+      $$->set_location(@1);
+      $$->expressions.push_tail(& $1->link);
+   }
+   | initializer_list ',' initializer
+   {
+      $$ = $1;
+      $$->expressions.push_tail(& $3->link);
+   }
+   ;
+
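+/* For illustration: the grammar accepts brace-enclosed aggregate
+ * initializers, including a trailing comma, e.g.
+ *
+ *    vec2 a[2] = { vec2(0.0), vec2(1.0), };
+ *
+ * Whether such an initializer is legal for a given GLSL version is not
+ * checked here; the grammar merely builds the ast_aggregate_initializer.
+ */
+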
+declaration_statement:
+   declaration
+   ;
+
+   // Grammar Note: labeled statements for SWITCH only; 'goto' is not
+   // supported.
+statement:
+   compound_statement        { $$ = (ast_node *) $1; }
+   | simple_statement
+   ;
+
+simple_statement:
+   declaration_statement
+   | expression_statement
+   | selection_statement
+   | switch_statement
+   | iteration_statement
+   | jump_statement
+   ;
+
+compound_statement:
+   '{' '}'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_compound_statement(true, NULL);
+      $$->set_location_range(@1, @2);
+   }
+   | '{'
+   {
+      state->symbols->push_scope();
+   }
+   statement_list '}'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_compound_statement(true, $3);
+      $$->set_location_range(@1, @4);
+      state->symbols->pop_scope();
+   }
+   ;
+
+statement_no_new_scope:
+   compound_statement_no_new_scope { $$ = (ast_node *) $1; }
+   | simple_statement
+   ;
+
+compound_statement_no_new_scope:
+   '{' '}'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_compound_statement(false, NULL);
+      $$->set_location_range(@1, @2);
+   }
+   | '{' statement_list '}'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_compound_statement(false, $2);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+statement_list:
+   statement
+   {
+      if ($1 == NULL) {
+         _mesa_glsl_error(& @1, state, "<nil> statement");
+         assert($1 != NULL);
+      }
+
+      $$ = $1;
+      $$->link.self_link();
+   }
+   | statement_list statement
+   {
+      if ($2 == NULL) {
+         _mesa_glsl_error(& @2, state, "<nil> statement");
+         assert($2 != NULL);
+      }
+      $$ = $1;
+      $$->link.insert_before(& $2->link);
+   }
+   ;
+
+expression_statement:
+   ';'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_statement(NULL);
+      $$->set_location(@1);
+   }
+   | expression ';'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_expression_statement($1);
+      $$->set_location(@1);
+   }
+   ;
+
+selection_statement:
+   IF '(' expression ')' selection_rest_statement
+   {
+      $$ = new(state) ast_selection_statement($3, $5.then_statement,
+                                              $5.else_statement);
+      $$->set_location_range(@1, @5);
+   }
+   ;
+
+selection_rest_statement:
+   statement ELSE statement
+   {
+      $$.then_statement = $1;
+      $$.else_statement = $3;
+   }
+   | statement %prec THEN
+   {
+      $$.then_statement = $1;
+      $$.else_statement = NULL;
+   }
+   ;
+
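+/* For illustration: the %prec THEN marking above resolves the classic
+ * dangling-else ambiguity, so in
+ *
+ *    if (a) if (b) x(); else y();
+ *
+ * the else binds to the nearest if, i.e. to "if (b)".
+ */
+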
+condition:
+   expression
+   {
+      $$ = (ast_node *) $1;
+   }
+   | fully_specified_type any_identifier '=' initializer
+   {
+      void *ctx = state;
+      ast_declaration *decl = new(ctx) ast_declaration($2, NULL, $4);
+      ast_declarator_list *declarator = new(ctx) ast_declarator_list($1);
+      decl->set_location_range(@2, @4);
+      declarator->set_location(@1);
+
+      declarator->declarations.push_tail(&decl->link);
+      $$ = declarator;
+   }
+   ;
+
+/*
+ * The switch_statement grammar is based on the syntax described in the
+ * body of the GLSL spec, not in its appendix.
+ */
+switch_statement:
+   SWITCH '(' expression ')' switch_body
+   {
+      $$ = new(state) ast_switch_statement($3, $5);
+      $$->set_location_range(@1, @5);
+   }
+   ;
+
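+/* For illustration: a statement accepted by this grammar:
+ *
+ *    switch (i) {
+ *    case 0:
+ *       x = 1.0;
+ *       break;
+ *    default:
+ *       x = 0.0;
+ *    }
+ */
+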
+switch_body:
+   '{' '}'
+   {
+      $$ = new(state) ast_switch_body(NULL);
+      $$->set_location_range(@1, @2);
+   }
+   | '{' case_statement_list '}'
+   {
+      $$ = new(state) ast_switch_body($2);
+      $$->set_location_range(@1, @3);
+   }
+   ;
+
+case_label:
+   CASE expression ':'
+   {
+      $$ = new(state) ast_case_label($2);
+      $$->set_location(@2);
+   }
+   | DEFAULT ':'
+   {
+      $$ = new(state) ast_case_label(NULL);
+      $$->set_location(@2);
+   }
+   ;
+
+case_label_list:
+   case_label
+   {
+      ast_case_label_list *labels = new(state) ast_case_label_list();
+
+      labels->labels.push_tail(& $1->link);
+      $$ = labels;
+      $$->set_location(@1);
+   }
+   | case_label_list case_label
+   {
+      $$ = $1;
+      $$->labels.push_tail(& $2->link);
+   }
+   ;
+
+case_statement:
+   case_label_list statement
+   {
+      ast_case_statement *stmts = new(state) ast_case_statement($1);
+      stmts->set_location(@2);
+
+      stmts->stmts.push_tail(& $2->link);
+      $$ = stmts;
+   }
+   | case_statement statement
+   {
+      $$ = $1;
+      $$->stmts.push_tail(& $2->link);
+   }
+   ;
+
+case_statement_list:
+   case_statement
+   {
+      ast_case_statement_list *cases = new(state) ast_case_statement_list();
+      cases->set_location(@1);
+
+      cases->cases.push_tail(& $1->link);
+      $$ = cases;
+   }
+   | case_statement_list case_statement
+   {
+      $$ = $1;
+      $$->cases.push_tail(& $2->link);
+   }
+   ;
+
+iteration_statement:
+   WHILE '(' condition ')' statement_no_new_scope
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_iteration_statement(ast_iteration_statement::ast_while,
+                                            NULL, $3, NULL, $5);
+      $$->set_location_range(@1, @4);
+   }
+   | DO statement WHILE '(' expression ')' ';'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_iteration_statement(ast_iteration_statement::ast_do_while,
+                                            NULL, $5, NULL, $2);
+      $$->set_location_range(@1, @6);
+   }
+   | FOR '(' for_init_statement for_rest_statement ')' statement_no_new_scope
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_iteration_statement(ast_iteration_statement::ast_for,
+                                            $3, $4.cond, $4.rest, $6);
+      $$->set_location_range(@1, @6);
+   }
+   ;
+
+for_init_statement:
+   expression_statement
+   | declaration_statement
+   ;
+
+conditionopt:
+   condition
+   | /* empty */
+   {
+      $$ = NULL;
+   }
+   ;
+
+for_rest_statement:
+   conditionopt ';'
+   {
+      $$.cond = $1;
+      $$.rest = NULL;
+   }
+   | conditionopt ';' expression
+   {
+      $$.cond = $1;
+      $$.rest = $3;
+   }
+   ;
+
+   // Grammar Note: No 'goto'; gotos are not supported.
+jump_statement:
+   CONTINUE ';'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_jump_statement(ast_jump_statement::ast_continue, NULL);
+      $$->set_location(@1);
+   }
+   | BREAK ';'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_jump_statement(ast_jump_statement::ast_break, NULL);
+      $$->set_location(@1);
+   }
+   | RETURN ';'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_jump_statement(ast_jump_statement::ast_return, NULL);
+      $$->set_location(@1);
+   }
+   | RETURN expression ';'
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_jump_statement(ast_jump_statement::ast_return, $2);
+      $$->set_location_range(@1, @2);
+   }
+   | DISCARD ';' // Fragment shader only.
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_jump_statement(ast_jump_statement::ast_discard, NULL);
+      $$->set_location(@1);
+   }
+   ;
+
+external_declaration:
+   function_definition      { $$ = $1; }
+   | declaration            { $$ = $1; }
+   | pragma_statement       { $$ = NULL; }
+   | layout_defaults        { $$ = $1; }
+   ;
+
+function_definition:
+   function_prototype compound_statement_no_new_scope
+   {
+      void *ctx = state;
+      $$ = new(ctx) ast_function_definition();
+      $$->set_location_range(@1, @2);
+      $$->prototype = $1;
+      $$->body = $2;
+
+      state->symbols->pop_scope();
+   }
+   ;
+
+/* The optional layout qualifier ("layout_qualifieropt") is folded into this rule. */
+interface_block:
+   basic_interface_block
+   {
+      $$ = $1;
+   }
+   | layout_qualifier basic_interface_block
+   {
+      ast_interface_block *block = $2;
+      if (!block->layout.merge_qualifier(& @1, state, $1)) {
+         YYERROR;
+      }
+      $$ = block;
+   }
+   ;
+
+basic_interface_block:
+   interface_qualifier NEW_IDENTIFIER '{' member_list '}' instance_name_opt ';'
+   {
+      ast_interface_block *const block = $6;
+
+      block->block_name = $2;
+      block->declarations.push_degenerate_list_at_head(& $4->link);
+
+      if ($1.flags.q.uniform) {
+         if (!state->has_uniform_buffer_objects()) {
+            _mesa_glsl_error(& @1, state,
+                             "#version 140 / GL_ARB_uniform_buffer_object "
+                             "required for defining uniform blocks");
+         } else if (state->ARB_uniform_buffer_object_warn) {
+            _mesa_glsl_warning(& @1, state,
+                               "#version 140 / GL_ARB_uniform_buffer_object "
+                               "required for defining uniform blocks");
+         }
+      } else {
+         if (state->es_shader || state->language_version < 150) {
+            _mesa_glsl_error(& @1, state,
+                             "#version 150 required for using "
+                             "interface blocks");
+         }
+      }
+
+      /* From the GLSL 1.50.11 spec, section 4.3.7 ("Interface Blocks"):
+       * "It is illegal to have an input block in a vertex shader
+       *  or an output block in a fragment shader"
+       */
+      if ((state->stage == MESA_SHADER_VERTEX) && $1.flags.q.in) {
+         _mesa_glsl_error(& @1, state,
+                          "`in' interface block is not allowed for "
+                          "a vertex shader");
+      } else if ((state->stage == MESA_SHADER_FRAGMENT) && $1.flags.q.out) {
+         _mesa_glsl_error(& @1, state,
+                          "`out' interface block is not allowed for "
+                          "a fragment shader");
+      }
+
+      /* Since block arrays require names, and both features are added in
+       * the same language versions, we don't have to explicitly
+       * version-check both things.
+       */
+      if (block->instance_name != NULL) {
+         state->check_version(150, 300, & @1, "interface blocks with "
+                               "an instance name are not allowed");
+      }
+
+      uint64_t interface_type_mask;
+      struct ast_type_qualifier temp_type_qualifier;
+
+      /* Get a bitmask containing only the in/out/uniform flags, allowing us
+       * to ignore other irrelevant flags like interpolation qualifiers.
+       */
+      temp_type_qualifier.flags.i = 0;
+      temp_type_qualifier.flags.q.uniform = true;
+      temp_type_qualifier.flags.q.in = true;
+      temp_type_qualifier.flags.q.out = true;
+      interface_type_mask = temp_type_qualifier.flags.i;
+
+      /* Get the block's interface qualifier.  The interface_qualifier
+       * production rule guarantees that only one bit will be set (and
+       * it will be in/out/uniform).
+       */
+      uint64_t block_interface_qualifier = $1.flags.i;
+
+      block->layout.flags.i |= block_interface_qualifier;
+
+      foreach_list_typed (ast_declarator_list, member, link, &block->declarations) {
+         ast_type_qualifier& qualifier = member->type->qualifier;
+         if ((qualifier.flags.i & interface_type_mask) == 0) {
+            /* GLSLangSpec.1.50.11, 4.3.7 (Interface Blocks):
+             * "If no optional qualifier is used in a member declaration, the
+             *  qualifier of the variable is just in, out, or uniform as declared
+             *  by interface-qualifier."
+             */
+            qualifier.flags.i |= block_interface_qualifier;
+         } else if ((qualifier.flags.i & interface_type_mask) !=
+                    block_interface_qualifier) {
+            /* GLSLangSpec.1.50.11, 4.3.7 (Interface Blocks):
+             * "If optional qualifiers are used, they can include interpolation
+             *  and storage qualifiers and they must declare an input, output,
+             *  or uniform variable consistent with the interface qualifier of
+             *  the block."
+             */
+            _mesa_glsl_error(& @1, state,
+                             "uniform/in/out qualifier on "
+                             "interface block member does not match "
+                             "the interface block");
+         }
+      }
+
+      $$ = block;
+   }
+   ;
+
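+/* For illustration: a block matching this rule:
+ *
+ *    uniform Transforms {
+ *       mat4 mvp;
+ *    } xf;
+ *
+ * Per the checks above, uniform blocks need #version 140 /
+ * GL_ARB_uniform_buffer_object, and the "xf" instance name additionally
+ * needs #version 150 (or 300 es).
+ */
+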
+interface_qualifier:
+   IN_TOK
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.in = 1;
+   }
+   | OUT_TOK
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.out = 1;
+   }
+   | UNIFORM
+   {
+      memset(& $$, 0, sizeof($$));
+      $$.flags.q.uniform = 1;
+   }
+   ;
+
+instance_name_opt:
+   /* empty */
+   {
+      $$ = new(state) ast_interface_block(*state->default_uniform_qualifier,
+                                          NULL, NULL);
+   }
+   | NEW_IDENTIFIER
+   {
+      $$ = new(state) ast_interface_block(*state->default_uniform_qualifier,
+                                          $1, NULL);
+      $$->set_location(@1);
+   }
+   | NEW_IDENTIFIER array_specifier
+   {
+      $$ = new(state) ast_interface_block(*state->default_uniform_qualifier,
+                                          $1, $2);
+      $$->set_location_range(@1, @2);
+   }
+   ;
+
+member_list:
+   member_declaration
+   {
+      $$ = $1;
+      $1->link.self_link();
+   }
+   | member_declaration member_list
+   {
+      $$ = $1;
+      $2->link.insert_before(& $$->link);
+   }
+   ;
+
+member_declaration:
+   fully_specified_type struct_declarator_list ';'
+   {
+      void *ctx = state;
+      ast_fully_specified_type *type = $1;
+      type->set_location(@1);
+
+      if (type->qualifier.flags.q.attribute) {
+         _mesa_glsl_error(& @1, state,
+                          "keyword 'attribute' cannot be used with "
+                          "interface block member");
+      } else if (type->qualifier.flags.q.varying) {
+         _mesa_glsl_error(& @1, state,
+                          "keyword 'varying' cannot be used with "
+                          "interface block member");
+      }
+
+      $$ = new(ctx) ast_declarator_list(type);
+      $$->set_location(@2);
+
+      $$->declarations.push_degenerate_list_at_head(& $2->link);
+   }
+   ;
+
+layout_defaults:
+   layout_qualifier UNIFORM ';'
+   {
+      if (!state->default_uniform_qualifier->merge_qualifier(& @1, state, $1)) {
+         YYERROR;
+      }
+      $$ = NULL;
+   }
+
+   | layout_qualifier IN_TOK ';'
+   {
+      $$ = NULL;
+      if (!state->in_qualifier->merge_in_qualifier(& @1, state, $1, $$)) {
+         YYERROR;
+      }
+   }
+
+   | layout_qualifier OUT_TOK ';'
+   {
+      if (state->stage != MESA_SHADER_GEOMETRY) {
+         _mesa_glsl_error(& @1, state,
+                          "out layout qualifiers only valid in "
+                          "geometry shaders");
+      } else {
+         if ($1.flags.q.prim_type) {
+            /* Make sure this is a valid output primitive type. */
+            switch ($1.prim_type) {
+            case GL_POINTS:
+            case GL_LINE_STRIP:
+            case GL_TRIANGLE_STRIP:
+               break;
+            default:
+               _mesa_glsl_error(&@1, state, "invalid geometry shader output "
+                                "primitive type");
+               break;
+            }
+         }
+         if (!state->out_qualifier->merge_qualifier(& @1, state, $1))
+            YYERROR;
+      }
+      $$ = NULL;
+   }
+   ;
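+
+/* For illustration: geometry-shader layout defaults accepted above:
+ *
+ *    layout(triangles, invocations = 2) in;
+ *    layout(triangle_strip, max_vertices = 3) out;
+ *
+ * The out form is rejected outside geometry shaders, and only points,
+ * line_strip and triangle_strip are valid output primitive types.
+ */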
diff --git a/icd/intel/compiler/shader/glsl_parser_extras.cpp b/icd/intel/compiler/shader/glsl_parser_extras.cpp
new file mode 100644
index 0000000..b2b7ab3
--- /dev/null
+++ b/icd/intel/compiler/shader/glsl_parser_extras.cpp
@@ -0,0 +1,1893 @@
+/*
+ * Copyright © 2008, 2009 Intel Corporation
+ * Copyright (C) 2014 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <stdio.h>
+#include <stdarg.h>
+#include <string.h>
+#include <assert.h>
+
+// These #includes MUST be provided before glext.h, which pollutes the global namespace.
+#include "glslang/Include/ShHandle.h"
+#include "glslang/Public/ShaderLang.h"
+#include "Frontends/glslang/GlslangToTop.h"
+#include "Frontends/SPIRV/SpvToTop.h"
+#include "SPIRV/GlslangToSpv.h"
+#include "SPIRV/SPVRemapper.h"
+#include "SPIRV/GLSL.std.450.h"
+#include "SPIRV/disassemble.h"
+#include "SPIRV/doc.h"
+#include "glsl_glass_manager.h"
+#include "glsl_glass_backend_translator.h"
+
+extern "C" {
+#include "libfns.h" // LunarG ADD:
+#include "icd-utils.h"  // LunarG ADD:
+#include "main/context.h"
+#include "main/shaderobj.h"
+#include "program/prog_diskcache.h"
+}
+
+#include "ralloc.h"
+#include "ast.h"
+#include "glsl_parser_extras.h"
+//#include "glsl_parser.h"
+#include "ir_optimization.h"
+#include "loop_analysis.h"
+#include "threadpool.h"
+#include "SPIRV/spirv.hpp"
+#include "Core/Exceptions.h"
+
+
+/**
+ * Format a short human-readable description of the given GLSL version.
+ */
+const char *
+glsl_compute_version_string(void *mem_ctx, bool is_es, unsigned version)
+{
+   return ralloc_asprintf(mem_ctx, "GLSL%s %d.%02d", is_es ? " ES" : "",
+                          version / 100, version % 100);
+}
+
+
+static const unsigned known_desktop_glsl_versions[] =
+   { 110, 120, 130, 140, 150, 330, 400, 410, 420, 430, 440 };
+
+
+_mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx,
+					       gl_shader_stage stage,
+                                               void *mem_ctx)
+   : ctx(_ctx), cs_input_local_size_specified(false), cs_input_local_size(),
+     switch_state()
+{
+   assert(stage < MESA_SHADER_STAGES);
+   this->stage = stage;
+
+   this->scanner = NULL;
+   this->translation_unit.make_empty();
+   this->symbols = new(mem_ctx) glsl_symbol_table;
+
+   this->info_log = ralloc_strdup(mem_ctx, "");
+   this->error = false;
+   this->loop_nesting_ast = NULL;
+
+   this->struct_specifier_depth = 0;
+
+   this->uses_builtin_functions = false;
+
+   /* Set default language version and extensions */
+   this->language_version = ctx->Const.ForceGLSLVersion ?
+                            ctx->Const.ForceGLSLVersion : 110;
+   this->es_shader = false;
+   this->ARB_texture_rectangle_enable = true;
+
+   /* OpenGL ES 2.0 has different defaults from desktop GL. */
+   if (ctx->API == API_OPENGLES2) {
+      this->language_version = 100;
+      this->es_shader = true;
+      this->ARB_texture_rectangle_enable = false;
+   }
+
+   this->extensions = &ctx->Extensions;
+
+   this->Const.MaxLights = ctx->Const.MaxLights;
+   this->Const.MaxClipPlanes = ctx->Const.MaxClipPlanes;
+   this->Const.MaxTextureUnits = ctx->Const.MaxTextureUnits;
+   this->Const.MaxTextureCoords = ctx->Const.MaxTextureCoordUnits;
+   this->Const.MaxVertexAttribs = ctx->Const.Program[MESA_SHADER_VERTEX].MaxAttribs;
+   this->Const.MaxVertexUniformComponents = ctx->Const.Program[MESA_SHADER_VERTEX].MaxUniformComponents;
+   this->Const.MaxVertexTextureImageUnits = ctx->Const.Program[MESA_SHADER_VERTEX].MaxTextureImageUnits;
+   this->Const.MaxCombinedTextureImageUnits = ctx->Const.MaxCombinedTextureImageUnits;
+   this->Const.MaxTextureImageUnits = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits;
+   this->Const.MaxFragmentUniformComponents = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxUniformComponents;
+   this->Const.MinProgramTexelOffset = ctx->Const.MinProgramTexelOffset;
+   this->Const.MaxProgramTexelOffset = ctx->Const.MaxProgramTexelOffset;
+
+   this->Const.MaxDrawBuffers = ctx->Const.MaxDrawBuffers;
+
+   /* 1.50 constants */
+   this->Const.MaxVertexOutputComponents = ctx->Const.Program[MESA_SHADER_VERTEX].MaxOutputComponents;
+   this->Const.MaxGeometryInputComponents = ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxInputComponents;
+   this->Const.MaxGeometryOutputComponents = ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxOutputComponents;
+   this->Const.MaxFragmentInputComponents = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxInputComponents;
+   this->Const.MaxGeometryTextureImageUnits = ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits;
+   this->Const.MaxGeometryOutputVertices = ctx->Const.MaxGeometryOutputVertices;
+   this->Const.MaxGeometryTotalOutputComponents = ctx->Const.MaxGeometryTotalOutputComponents;
+   this->Const.MaxGeometryUniformComponents = ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxUniformComponents;
+
+   this->Const.MaxVertexAtomicCounters = ctx->Const.Program[MESA_SHADER_VERTEX].MaxAtomicCounters;
+   this->Const.MaxGeometryAtomicCounters = ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxAtomicCounters;
+   this->Const.MaxFragmentAtomicCounters = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxAtomicCounters;
+   this->Const.MaxCombinedAtomicCounters = ctx->Const.MaxCombinedAtomicCounters;
+   this->Const.MaxAtomicBufferBindings = ctx->Const.MaxAtomicBufferBindings;
+
+   /* Compute shader constants */
+   for (unsigned i = 0; i < Elements(this->Const.MaxComputeWorkGroupCount); i++)
+      this->Const.MaxComputeWorkGroupCount[i] = ctx->Const.MaxComputeWorkGroupCount[i];
+   for (unsigned i = 0; i < Elements(this->Const.MaxComputeWorkGroupSize); i++)
+      this->Const.MaxComputeWorkGroupSize[i] = ctx->Const.MaxComputeWorkGroupSize[i];
+
+   this->Const.MaxImageUnits = ctx->Const.MaxImageUnits;
+   this->Const.MaxCombinedImageUnitsAndFragmentOutputs = ctx->Const.MaxCombinedImageUnitsAndFragmentOutputs;
+   this->Const.MaxImageSamples = ctx->Const.MaxImageSamples;
+   this->Const.MaxVertexImageUniforms = ctx->Const.Program[MESA_SHADER_VERTEX].MaxImageUniforms;
+   this->Const.MaxGeometryImageUniforms = ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxImageUniforms;
+   this->Const.MaxFragmentImageUniforms = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxImageUniforms;
+   this->Const.MaxCombinedImageUniforms = ctx->Const.MaxCombinedImageUniforms;
+
+   this->current_function = NULL;
+   this->toplevel_ir = NULL;
+   this->found_return = false;
+   this->all_invariant = false;
+   this->user_structures = NULL;
+   this->num_user_structures = 0;
+
+   /* Populate the list of supported GLSL versions */
+   /* FINISHME: Once the OpenGL 3.0 'forward compatible' context or
+    * the OpenGL 3.2 Core context is supported, this logic will need to
+    * change.  Older versions of GLSL are no longer supported
+    * outside the compatibility contexts of 3.x.
+    */
+   this->num_supported_versions = 0;
+   if (_mesa_is_desktop_gl(ctx)) {
+      for (unsigned i = 0; i < ARRAY_SIZE(known_desktop_glsl_versions); i++) {
+         if (known_desktop_glsl_versions[i] <= ctx->Const.GLSLVersion) {
+            this->supported_versions[this->num_supported_versions].ver
+               = known_desktop_glsl_versions[i];
+            this->supported_versions[this->num_supported_versions].es = false;
+            this->num_supported_versions++;
+         }
+      }
+   }
+   if (ctx->API == API_OPENGLES2 || ctx->Extensions.ARB_ES2_compatibility) {
+      this->supported_versions[this->num_supported_versions].ver = 100;
+      this->supported_versions[this->num_supported_versions].es = true;
+      this->num_supported_versions++;
+   }
+   if (_mesa_is_gles3(ctx) || ctx->Extensions.ARB_ES3_compatibility) {
+      this->supported_versions[this->num_supported_versions].ver = 300;
+      this->supported_versions[this->num_supported_versions].es = true;
+      this->num_supported_versions++;
+   }
+   assert(this->num_supported_versions
+          <= ARRAY_SIZE(this->supported_versions));
+
+   /* Create a string for use in error messages to tell the user which GLSL
+    * versions are supported.
+    */
+   char *supported = ralloc_strdup(this, "");
+   for (unsigned i = 0; i < this->num_supported_versions; i++) {
+      unsigned ver = this->supported_versions[i].ver;
+      const char *const prefix = (i == 0)
+	 ? ""
+	 : ((i == this->num_supported_versions - 1) ? ", and " : ", ");
+      const char *const suffix = (this->supported_versions[i].es) ? " ES" : "";
+
+      ralloc_asprintf_append(& supported, "%s%u.%02u%s",
+			     prefix,
+			     ver / 100, ver % 100,
+			     suffix);
+   }
+
+   this->supported_version_string = supported;
+
+   if (ctx->Const.ForceGLSLExtensionsWarn)
+      _mesa_glsl_process_extension("all", NULL, "warn", NULL, this);
+
+   this->default_uniform_qualifier = new(this) ast_type_qualifier();
+   this->default_uniform_qualifier->flags.q.shared = 1;
+   this->default_uniform_qualifier->flags.q.column_major = 1;
+
+   this->fs_uses_gl_fragcoord = false;
+   this->fs_redeclares_gl_fragcoord = false;
+   this->fs_origin_upper_left = false;
+   this->fs_pixel_center_integer = false;
+   this->fs_redeclares_gl_fragcoord_with_no_layout_qualifiers = false;
+
+   this->gs_input_prim_type_specified = false;
+   this->gs_input_size = 0;
+   this->in_qualifier = new(this) ast_type_qualifier();
+   this->out_qualifier = new(this) ast_type_qualifier();
+   this->early_fragment_tests = false;
+   memset(this->atomic_counter_offsets, 0,
+          sizeof(this->atomic_counter_offsets));
+}
+
+/**
+ * Determine whether the current GLSL version is sufficiently high to support
+ * a certain feature, and generate an error message if it isn't.
+ *
+ * \param required_glsl_version and \c required_glsl_es_version are
+ * interpreted as they are in _mesa_glsl_parse_state::is_version().
+ *
+ * \param locp is the parser location where the error should be reported.
+ *
+ * \param fmt (and additional arguments) constitute a printf-style error
+ * message to report if the version check fails.  Information about the
+ * current and required GLSL versions will be appended.  So, for example, if
+ * the GLSL version being compiled is 1.20, and check_version(130, 300, locp,
+ * "foo unsupported") is called, the error message will be "foo unsupported in
+ * GLSL 1.20 (GLSL 1.30 or GLSL 3.00 ES required)".
+ */
+bool
+_mesa_glsl_parse_state::check_version(unsigned required_glsl_version,
+                                      unsigned required_glsl_es_version,
+                                      YYLTYPE *locp, const char *fmt, ...)
+{
+   if (this->is_version(required_glsl_version, required_glsl_es_version))
+      return true;
+
+   va_list args;
+   va_start(args, fmt);
+   char *problem = ralloc_vasprintf(this, fmt, args);
+   va_end(args);
+   const char *glsl_version_string
+      = glsl_compute_version_string(this, false, required_glsl_version);
+   const char *glsl_es_version_string
+      = glsl_compute_version_string(this, true, required_glsl_es_version);
+   const char *requirement_string = "";
+   if (required_glsl_version && required_glsl_es_version) {
+      requirement_string = ralloc_asprintf(this, " (%s or %s required)",
+                                           glsl_version_string,
+                                           glsl_es_version_string);
+   } else if (required_glsl_version) {
+      requirement_string = ralloc_asprintf(this, " (%s required)",
+                                           glsl_version_string);
+   } else if (required_glsl_es_version) {
+      requirement_string = ralloc_asprintf(this, " (%s required)",
+                                           glsl_es_version_string);
+   }
+   _mesa_glsl_error(locp, this, "%s in %s%s",
+                    problem, this->get_version_string(),
+                    requirement_string);
+
+   return false;
+}
+
+/**
+ * Process a GLSL #version directive.
+ *
+ * \param version is the integer that follows the #version token.
+ *
+ * \param ident is a string identifier that follows the integer, if any is
+ * present.  Otherwise NULL.
+ */
+void
+_mesa_glsl_parse_state::process_version_directive(YYLTYPE *locp, int version,
+                                                  const char *ident)
+{
+   bool es_token_present = false;
+   if (ident) {
+      if (strcmp(ident, "es") == 0) {
+         es_token_present = true;
+      } else if (version >= 150) {
+         if (strcmp(ident, "core") == 0) {
+            /* Accept the token.  There's no need to record that this is
+             * a core profile shader since that's the only profile we support.
+             */
+         } else if (strcmp(ident, "compatibility") == 0) {
+            _mesa_glsl_error(locp, this,
+                             "the compatibility profile is not supported");
+         } else {
+            _mesa_glsl_error(locp, this,
+                             "\"%s\" is not a valid shading language profile; "
+                             "if present, it must be \"core\"", ident);
+         }
+      } else {
+         _mesa_glsl_error(locp, this,
+                          "illegal text following version number");
+      }
+   }
+
+   this->es_shader = es_token_present;
+   if (version == 100) {
+      if (es_token_present) {
+         _mesa_glsl_error(locp, this,
+                          "GLSL 1.00 ES should be selected using "
+                          "`#version 100'");
+      } else {
+         this->es_shader = true;
+      }
+   }
+
+   if (this->es_shader) {
+      this->ARB_texture_rectangle_enable = false;
+   }
+
+   this->language_version = version;
+
+   bool supported = false;
+   for (unsigned i = 0; i < this->num_supported_versions; i++) {
+      if (this->supported_versions[i].ver == (unsigned) version
+          && this->supported_versions[i].es == this->es_shader) {
+         supported = true;
+         break;
+      }
+   }
+
+   if (!supported) {
+      _mesa_glsl_error(locp, this, "%s is not supported. "
+                       "Supported versions are: %s",
+                       this->get_version_string(),
+                       this->supported_version_string);
+
+      /* On exit, the language_version must be set to a valid value.
+       * Later calls to _mesa_glsl_initialize_types will misbehave if
+       * the version is invalid.
+       */
+      switch (this->ctx->API) {
+      case API_OPENGL_COMPAT:
+      case API_OPENGL_CORE:
+	 this->language_version = this->ctx->Const.GLSLVersion;
+	 break;
+
+      case API_OPENGLES:
+	 assert(!"Should not get here.");
+	 /* FALLTHROUGH */
+
+      case API_OPENGLES2:
+	 this->language_version = 100;
+	 break;
+      case API_VK:
+         break;
+      }
+   }
+}
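+/* Illustrative #version directives as handled above (examples only):
+ *
+ *    #version 110       -- desktop GLSL 1.10 (also the default when absent)
+ *    #version 100       -- GLSL 1.00 ES; an explicit "es" token here is an error
+ *    #version 300 es    -- GLSL 3.00 ES
+ *    #version 150 core  -- desktop GLSL 1.50; "compatibility" is rejected
+ */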
+
+
+/**
+ * Translate a gl_shader_stage to a short shader stage name for debug
+ * printouts and error messages.
+ */
+const char *
+_mesa_shader_stage_to_string(unsigned stage)
+{
+   switch (stage) {
+   case MESA_SHADER_VERTEX:   return "vertex";
+   case MESA_SHADER_FRAGMENT: return "fragment";
+   case MESA_SHADER_GEOMETRY: return "geometry";
+   }
+
+   assert(!"Should not get here.");
+   return "unknown";
+}
+
+/* This helper function will append the given message to the shader's
+   info log and report it via GL_ARB_debug_output. Per that extension,
+   'type' is one of the enum values classifying the message, and
+   'id' is the implementation-defined ID of the given message. */
+static void
+_mesa_glsl_msg(const YYLTYPE *locp, _mesa_glsl_parse_state *state,
+               GLenum type, const char *fmt, va_list ap)
+{
+   bool error = (type == MESA_DEBUG_TYPE_ERROR);
+   GLuint msg_id = 0;
+
+   assert(state->info_log != NULL);
+
+   /* Get the offset that the new message will be written to. */
+   int msg_offset = strlen(state->info_log);
+
+   ralloc_asprintf_append(&state->info_log, "%u:%u(%u): %s: ",
+					    locp->source,
+					    locp->first_line,
+					    locp->first_column,
+					    error ? "error" : "warning");
+   ralloc_vasprintf_append(&state->info_log, fmt, ap);
+
+   const char *const msg = &state->info_log[msg_offset];
+   struct gl_context *ctx = state->ctx;
+
+   /* Report the error via GL_ARB_debug_output. */
+   _mesa_shader_debug(ctx, type, &msg_id, msg, strlen(msg));
+
+   ralloc_strcat(&state->info_log, "\n");
+}
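+/* For reference, with the format string above a diagnostic at source string
+ * 0, line 12, column 3 is appended to the info log as:
+ *
+ *    0:12(3): error: <message text>
+ */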
+
+void
+_mesa_glsl_error(YYLTYPE *locp, _mesa_glsl_parse_state *state,
+		 const char *fmt, ...)
+{
+   va_list ap;
+
+   state->error = true;
+
+   va_start(ap, fmt);
+   _mesa_glsl_msg(locp, state, MESA_DEBUG_TYPE_ERROR, fmt, ap);
+   va_end(ap);
+}
+
+
+void
+_mesa_glsl_warning(const YYLTYPE *locp, _mesa_glsl_parse_state *state,
+		   const char *fmt, ...)
+{
+   va_list ap;
+
+   va_start(ap, fmt);
+   _mesa_glsl_msg(locp, state, MESA_DEBUG_TYPE_OTHER, fmt, ap);
+   va_end(ap);
+}
+
+
+/**
+ * Enum representing the possible behaviors that can be specified in
+ * an #extension directive.
+ */
+enum ext_behavior {
+   extension_disable,
+   extension_enable,
+   extension_require,
+   extension_warn
+};
+
+/**
+ * Element type for _mesa_glsl_supported_extensions
+ */
+struct _mesa_glsl_extension {
+   /**
+    * Name of the extension when referred to in a GLSL extension
+    * statement
+    */
+   const char *name;
+
+   /** True if this extension is available to desktop GL shaders */
+   bool avail_in_GL;
+
+   /** True if this extension is available to GLES shaders */
+   bool avail_in_ES;
+
+   /**
+    * Flag in the gl_extensions struct indicating whether this
+    * extension is supported by the driver, or
+    * &gl_extensions::dummy_true if supported by all drivers.
+    *
+    * Note: the type (GLboolean gl_extensions::*) is a "pointer to
+    * member" type, the type-safe alternative to the "offsetof" macro.
+    * In a nutshell:
+    *
+    * - foo bar::* p declares p to be an "offset" to a field of type
+    *   foo that exists within struct bar
+    * - &bar::baz computes the "offset" of field baz within struct bar
+    * - x.*p accesses the field of x that exists at "offset" p
+    * - x->*p is equivalent to (*x).*p
+    */
+   const GLboolean gl_extensions::* supported_flag;
+
+   /**
+    * Flag in the _mesa_glsl_parse_state struct that should be set
+    * when this extension is enabled.
+    *
+    * See note in _mesa_glsl_extension::supported_flag about "pointer
+    * to member" types.
+    */
+   bool _mesa_glsl_parse_state::* enable_flag;
+
+   /**
+    * Flag in the _mesa_glsl_parse_state struct that should be set
+    * when the shader requests "warn" behavior for this extension.
+    *
+    * See note in _mesa_glsl_extension::supported_flag about "pointer
+    * to member" types.
+    */
+   bool _mesa_glsl_parse_state::* warn_flag;
+
+
+   bool compatible_with_state(const _mesa_glsl_parse_state *state) const;
+   void set_flags(_mesa_glsl_parse_state *state, ext_behavior behavior) const;
+};
+
+#define EXT(NAME, GL, ES, SUPPORTED_FLAG)                   \
+   { "GL_" #NAME, GL, ES, &gl_extensions::SUPPORTED_FLAG,   \
+         &_mesa_glsl_parse_state::NAME##_enable,            \
+         &_mesa_glsl_parse_state::NAME##_warn }
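+
+/* For reference, EXT(ARB_draw_buffers, true, false, dummy_true) below
+ * expands to:
+ *
+ *    { "GL_ARB_draw_buffers", true, false, &gl_extensions::dummy_true,
+ *      &_mesa_glsl_parse_state::ARB_draw_buffers_enable,
+ *      &_mesa_glsl_parse_state::ARB_draw_buffers_warn }
+ */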
+
+/**
+ * Table of extensions that can be enabled/disabled within a shader,
+ * and the conditions under which they are supported.
+ */
+static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = {
+   /*                                  API availability */
+   /* name                             GL     ES         supported flag */
+
+   /* ARB extensions go here, sorted alphabetically.
+    */
+   EXT(ARB_arrays_of_arrays,           true,  false,     ARB_arrays_of_arrays),
+   EXT(ARB_compute_shader,             true,  false,     ARB_compute_shader),
+   EXT(ARB_conservative_depth,         true,  false,     ARB_conservative_depth),
+   EXT(ARB_draw_buffers,               true,  false,     dummy_true),
+   EXT(ARB_draw_instanced,             true,  false,     ARB_draw_instanced),
+   EXT(ARB_explicit_attrib_location,   true,  false,     ARB_explicit_attrib_location),
+   EXT(ARB_fragment_coord_conventions, true,  false,     ARB_fragment_coord_conventions),
+   EXT(ARB_gpu_shader5,                true,  false,     ARB_gpu_shader5),
+   EXT(ARB_sample_shading,             true,  false,     ARB_sample_shading),
+   EXT(ARB_separate_shader_objects,    true,  false,     dummy_true),
+   EXT(ARB_shader_atomic_counters,     true,  false,     ARB_shader_atomic_counters),
+   EXT(ARB_shader_bit_encoding,        true,  false,     ARB_shader_bit_encoding),
+   EXT(ARB_shader_image_load_store,    true,  false,     ARB_shader_image_load_store),
+   EXT(ARB_shader_stencil_export,      true,  false,     ARB_shader_stencil_export),
+   EXT(ARB_shader_texture_lod,         true,  false,     ARB_shader_texture_lod),
+   EXT(ARB_shading_language_420pack,   true,  false,     ARB_shading_language_420pack),
+   EXT(ARB_shading_language_packing,   true,  false,     ARB_shading_language_packing),
+   EXT(ARB_texture_cube_map_array,     true,  false,     ARB_texture_cube_map_array),
+   EXT(ARB_texture_gather,             true,  false,     ARB_texture_gather),
+   EXT(ARB_texture_multisample,        true,  false,     ARB_texture_multisample),
+   EXT(ARB_texture_query_levels,       true,  false,     ARB_texture_query_levels),
+   EXT(ARB_texture_query_lod,          true,  false,     ARB_texture_query_lod),
+   EXT(ARB_texture_rectangle,          true,  false,     dummy_true),
+   EXT(ARB_uniform_buffer_object,      true,  false,     ARB_uniform_buffer_object),
+   EXT(ARB_viewport_array,             true,  false,     ARB_viewport_array),
+
+   /* KHR extensions go here, sorted alphabetically.
+    */
+
+   /* OES extensions go here, sorted alphabetically.
+    */
+   EXT(OES_EGL_image_external,         false, true,      OES_EGL_image_external),
+   EXT(OES_standard_derivatives,       false, true,      OES_standard_derivatives),
+   EXT(OES_texture_3D,                 false, true,      EXT_texture3D),
+
+   /* All other extensions go here, sorted alphabetically.
+    */
+   EXT(AMD_conservative_depth,         true,  false,     ARB_conservative_depth),
+   EXT(AMD_shader_stencil_export,      true,  false,     ARB_shader_stencil_export),
+   EXT(AMD_shader_trinary_minmax,      true,  false,     dummy_true),
+   EXT(AMD_vertex_shader_layer,        true,  false,     AMD_vertex_shader_layer),
+   EXT(EXT_separate_shader_objects,    false, true,      dummy_true),
+   EXT(EXT_shader_integer_mix,         true,  true,      EXT_shader_integer_mix),
+   EXT(EXT_texture_array,              true,  false,     EXT_texture_array),
+};
+
+#undef EXT
+
+
+/**
+ * Determine whether a given extension is compatible with the target,
+ * API, and extension information in the current parser state.
+ */
+bool _mesa_glsl_extension::compatible_with_state(const _mesa_glsl_parse_state *
+                                                 state) const
+{
+   /* Check that this extension matches whether we are compiling
+    * for desktop GL or GLES.
+    */
+   if (state->es_shader) {
+      if (!this->avail_in_ES) return false;
+   } else {
+      if (!this->avail_in_GL) return false;
+   }
+
+   /* Check that this extension is supported by the OpenGL
+    * implementation.
+    *
+    * Note: the ->* operator indexes into state->extensions by the
+    * offset this->supported_flag.  See
+    * _mesa_glsl_extension::supported_flag for more info.
+    */
+   return state->extensions->*(this->supported_flag);
+}
+
+/**
+ * Set the appropriate flags in the parser state to establish the
+ * given behavior for this extension.
+ */
+void _mesa_glsl_extension::set_flags(_mesa_glsl_parse_state *state,
+                                     ext_behavior behavior) const
+{
+   /* Note: the ->* operator indexes into state by the
+    * offsets this->enable_flag and this->warn_flag.  See
+    * _mesa_glsl_extension::supported_flag for more info.
+    */
+   state->*(this->enable_flag) = (behavior != extension_disable);
+   state->*(this->warn_flag)   = (behavior == extension_warn);
+}
+
+/**
+ * Find an extension by name in _mesa_glsl_supported_extensions.  If
+ * the name is not found, return NULL.
+ */
+static const _mesa_glsl_extension *find_extension(const char *name)
+{
+   for (unsigned i = 0; i < Elements(_mesa_glsl_supported_extensions); ++i) {
+      if (strcmp(name, _mesa_glsl_supported_extensions[i].name) == 0) {
+         return &_mesa_glsl_supported_extensions[i];
+      }
+   }
+   return NULL;
+}
+
+
+bool
+_mesa_glsl_process_extension(const char *name, YYLTYPE *name_locp,
+			     const char *behavior_string, YYLTYPE *behavior_locp,
+			     _mesa_glsl_parse_state *state)
+{
+   ext_behavior behavior;
+   if (strcmp(behavior_string, "warn") == 0) {
+      behavior = extension_warn;
+   } else if (strcmp(behavior_string, "require") == 0) {
+      behavior = extension_require;
+   } else if (strcmp(behavior_string, "enable") == 0) {
+      behavior = extension_enable;
+   } else if (strcmp(behavior_string, "disable") == 0) {
+      behavior = extension_disable;
+   } else {
+      _mesa_glsl_error(behavior_locp, state,
+		       "unknown extension behavior `%s'",
+		       behavior_string);
+      return false;
+   }
+
+   if (strcmp(name, "all") == 0) {
+      if ((behavior == extension_enable) || (behavior == extension_require)) {
+	 _mesa_glsl_error(name_locp, state, "cannot %s all extensions",
+			  (behavior == extension_enable)
+			  ? "enable" : "require");
+	 return false;
+      } else {
+         for (unsigned i = 0;
+              i < Elements(_mesa_glsl_supported_extensions); ++i) {
+            const _mesa_glsl_extension *extension
+               = &_mesa_glsl_supported_extensions[i];
+            if (extension->compatible_with_state(state)) {
+               _mesa_glsl_supported_extensions[i].set_flags(state, behavior);
+            }
+         }
+      }
+   } else {
+      const _mesa_glsl_extension *extension = find_extension(name);
+      if (extension && extension->compatible_with_state(state)) {
+         extension->set_flags(state, behavior);
+      } else {
+         static const char fmt[] = "extension `%s' unsupported in %s shader";
+
+         if (behavior == extension_require) {
+            _mesa_glsl_error(name_locp, state, fmt,
+                             name, _mesa_shader_stage_to_string(state->stage));
+            return false;
+         } else {
+            _mesa_glsl_warning(name_locp, state, fmt,
+                               name, _mesa_shader_stage_to_string(state->stage));
+         }
+      }
+   }
+
+   return true;
+}
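+/* Illustrative #extension directives as handled above (examples only):
+ *
+ *    #extension GL_EXT_texture_array : require   -- hard error if unsupported
+ *    #extension GL_EXT_texture_array : warn      -- warn on each use
+ *    #extension all : disable                    -- reset to default behavior
+ *    #extension all : require                    -- rejected above
+ */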
+
+
+/**
+ * Recurses through <type> and <expr> if <expr> is an aggregate initializer
+ * and sets <expr>'s <constructor_type> field to <type>. Gives later functions
+ * (process_array_constructor, et al) sufficient information to do type
+ * checking.
+ *
+ * Operates on assignments involving an aggregate initializer. E.g.,
+ *
+ * vec4 pos = {1.0, -1.0, 0.0, 1.0};
+ *
+ * or more ridiculously,
+ *
+ * struct S {
+ *     vec4 v[2];
+ * };
+ *
+ * struct {
+ *     S a[2], b;
+ *     int c;
+ * } aggregate = {
+ *     {
+ *         {
+ *             {
+ *                 {1.0, 2.0, 3.0, 4.0}, // a[0].v[0]
+ *                 {5.0, 6.0, 7.0, 8.0}  // a[0].v[1]
+ *             } // a[0].v
+ *         }, // a[0]
+ *         {
+ *             {
+ *                 {1.0, 2.0, 3.0, 4.0}, // a[1].v[0]
+ *                 {5.0, 6.0, 7.0, 8.0}  // a[1].v[1]
+ *             } // a[1].v
+ *         } // a[1]
+ *     }, // a
+ *     {
+ *         {
+ *             {1.0, 2.0, 3.0, 4.0}, // b.v[0]
+ *             {5.0, 6.0, 7.0, 8.0}  // b.v[1]
+ *         } // b.v
+ *     }, // b
+ *     4 // c
+ * };
+ *
+ * This pass is necessary because the right-hand side of <type> e = { ... }
+ * doesn't contain sufficient information to determine if the types match.
+ */
+void
+_mesa_ast_set_aggregate_type(const glsl_type *type,
+                             ast_expression *expr)
+{
+   ast_aggregate_initializer *ai = (ast_aggregate_initializer *)expr;
+   ai->constructor_type = type;
+
+   /* If the aggregate is an array, recursively set its elements' types. */
+   if (type->is_array()) {
+      /* Each array element has the type type->element_type().
+       *
+       * E.g., if <type> is struct S[2], we want to set each element's type to
+       * struct S.
+       */
+      for (exec_node *expr_node = ai->expressions.head;
+           !expr_node->is_tail_sentinel();
+           expr_node = expr_node->next) {
+         ast_expression *expr = exec_node_data(ast_expression, expr_node,
+                                               link);
+
+         if (expr->oper == ast_aggregate)
+            _mesa_ast_set_aggregate_type(type->element_type(), expr);
+      }
+
+   /* If the aggregate is a struct, recursively set its fields' types. */
+   } else if (type->is_record()) {
+      exec_node *expr_node = ai->expressions.head;
+
+      /* Iterate through the struct's fields. */
+      for (unsigned i = 0; !expr_node->is_tail_sentinel() && i < type->length;
+           i++, expr_node = expr_node->next) {
+         ast_expression *expr = exec_node_data(ast_expression, expr_node,
+                                               link);
+
+         if (expr->oper == ast_aggregate) {
+            _mesa_ast_set_aggregate_type(type->fields.structure[i].type, expr);
+         }
+      }
+   /* If the aggregate is a matrix, set its columns' types. */
+   } else if (type->is_matrix()) {
+      for (exec_node *expr_node = ai->expressions.head;
+           !expr_node->is_tail_sentinel();
+           expr_node = expr_node->next) {
+         ast_expression *expr = exec_node_data(ast_expression, expr_node,
+                                               link);
+
+         if (expr->oper == ast_aggregate)
+            _mesa_ast_set_aggregate_type(type->column_type(), expr);
+      }
+   }
+}
+
+
+void
+_mesa_ast_type_qualifier_print(const struct ast_type_qualifier *q)
+{
+   if (q->flags.q.constant)
+      printf("const ");
+
+   if (q->flags.q.invariant)
+      printf("invariant ");
+
+   if (q->flags.q.attribute)
+      printf("attribute ");
+
+   if (q->flags.q.varying)
+      printf("varying ");
+
+   if (q->flags.q.in && q->flags.q.out)
+      printf("inout ");
+   else {
+      if (q->flags.q.in)
+	 printf("in ");
+
+      if (q->flags.q.out)
+	 printf("out ");
+   }
+
+   if (q->flags.q.centroid)
+      printf("centroid ");
+   if (q->flags.q.sample)
+      printf("sample ");
+   if (q->flags.q.uniform)
+      printf("uniform ");
+   if (q->flags.q.smooth)
+      printf("smooth ");
+   if (q->flags.q.flat)
+      printf("flat ");
+   if (q->flags.q.noperspective)
+      printf("noperspective ");
+}
+
+
+void
+ast_node::print(void) const
+{
+   printf("unhandled node ");
+}
+
+
+ast_node::ast_node(void)
+{
+   this->location.source = 0;
+   this->location.first_line = 0;
+   this->location.first_column = 0;
+   this->location.last_line = 0;
+   this->location.last_column = 0;
+}
+
+
+static void
+ast_opt_array_dimensions_print(const ast_array_specifier *array_specifier)
+{
+   if (array_specifier)
+      array_specifier->print();
+}
+
+
+void
+ast_compound_statement::print(void) const
+{
+   printf("{\n");
+
+   foreach_list_const(n, &this->statements) {
+      ast_node *ast = exec_node_data(ast_node, n, link);
+      ast->print();
+   }
+
+   printf("}\n");
+}
+
+
+ast_compound_statement::ast_compound_statement(int new_scope,
+					       ast_node *statements)
+{
+   this->new_scope = new_scope;
+
+   if (statements != NULL) {
+      this->statements.push_degenerate_list_at_head(&statements->link);
+   }
+}
+
+
+void
+ast_expression::print(void) const
+{
+   switch (oper) {
+   case ast_assign:
+   case ast_mul_assign:
+   case ast_div_assign:
+   case ast_mod_assign:
+   case ast_add_assign:
+   case ast_sub_assign:
+   case ast_ls_assign:
+   case ast_rs_assign:
+   case ast_and_assign:
+   case ast_xor_assign:
+   case ast_or_assign:
+      subexpressions[0]->print();
+      printf("%s ", operator_string(oper));
+      subexpressions[1]->print();
+      break;
+
+   case ast_field_selection:
+      subexpressions[0]->print();
+      printf(". %s ", primary_expression.identifier);
+      break;
+
+   case ast_plus:
+   case ast_neg:
+   case ast_bit_not:
+   case ast_logic_not:
+   case ast_pre_inc:
+   case ast_pre_dec:
+      printf("%s ", operator_string(oper));
+      subexpressions[0]->print();
+      break;
+
+   case ast_post_inc:
+   case ast_post_dec:
+      subexpressions[0]->print();
+      printf("%s ", operator_string(oper));
+      break;
+
+   case ast_conditional:
+      subexpressions[0]->print();
+      printf("? ");
+      subexpressions[1]->print();
+      printf(": ");
+      subexpressions[2]->print();
+      break;
+
+   case ast_array_index:
+      subexpressions[0]->print();
+      printf("[ ");
+      subexpressions[1]->print();
+      printf("] ");
+      break;
+
+   case ast_function_call: {
+      subexpressions[0]->print();
+      printf("( ");
+
+      foreach_list_const (n, &this->expressions) {
+	 if (n != this->expressions.get_head())
+	    printf(", ");
+
+	 ast_node *ast = exec_node_data(ast_node, n, link);
+	 ast->print();
+      }
+
+      printf(") ");
+      break;
+   }
+
+   case ast_identifier:
+      printf("%s ", primary_expression.identifier);
+      break;
+
+   case ast_int_constant:
+      printf("%d ", primary_expression.int_constant);
+      break;
+
+   case ast_uint_constant:
+      printf("%u ", primary_expression.uint_constant);
+      break;
+
+   case ast_float_constant:
+      printf("%f ", primary_expression.float_constant);
+      break;
+
+   case ast_bool_constant:
+      printf("%s ",
+	     primary_expression.bool_constant
+	     ? "true" : "false");
+      break;
+
+   case ast_sequence: {
+      printf("( ");
+      foreach_list_const(n, & this->expressions) {
+	 if (n != this->expressions.get_head())
+	    printf(", ");
+
+	 ast_node *ast = exec_node_data(ast_node, n, link);
+	 ast->print();
+      }
+      printf(") ");
+      break;
+   }
+
+   case ast_aggregate: {
+      printf("{ ");
+      foreach_list_const(n, & this->expressions) {
+	 if (n != this->expressions.get_head())
+	    printf(", ");
+
+	 ast_node *ast = exec_node_data(ast_node, n, link);
+	 ast->print();
+      }
+      printf("} ");
+      break;
+   }
+
+   default:
+      assert(0);
+      break;
+   }
+}
+
+ast_expression::ast_expression(int oper,
+			       ast_expression *ex0,
+			       ast_expression *ex1,
+			       ast_expression *ex2) :
+   primary_expression()
+{
+   this->oper = ast_operators(oper);
+   this->subexpressions[0] = ex0;
+   this->subexpressions[1] = ex1;
+   this->subexpressions[2] = ex2;
+   this->non_lvalue_description = NULL;
+}
+
+
+void
+ast_expression_statement::print(void) const
+{
+   if (expression)
+      expression->print();
+
+   printf("; ");
+}
+
+
+ast_expression_statement::ast_expression_statement(ast_expression *ex) :
+   expression(ex)
+{
+   /* empty */
+}
+
+
+void
+ast_function::print(void) const
+{
+   return_type->print();
+   printf(" %s (", identifier);
+
+   foreach_list_const(n, & this->parameters) {
+      ast_node *ast = exec_node_data(ast_node, n, link);
+      ast->print();
+   }
+
+   printf(")");
+}
+
+
+ast_function::ast_function(void)
+   : return_type(NULL), identifier(NULL), is_definition(false),
+     signature(NULL)
+{
+   /* empty */
+}
+
+
+void
+ast_fully_specified_type::print(void) const
+{
+   _mesa_ast_type_qualifier_print(& qualifier);
+   specifier->print();
+}
+
+
+void
+ast_parameter_declarator::print(void) const
+{
+   type->print();
+   if (identifier)
+      printf("%s ", identifier);
+   ast_opt_array_dimensions_print(array_specifier);
+}
+
+
+void
+ast_function_definition::print(void) const
+{
+   prototype->print();
+   body->print();
+}
+
+
+void
+ast_declaration::print(void) const
+{
+   printf("%s ", identifier);
+   ast_opt_array_dimensions_print(array_specifier);
+
+   if (initializer) {
+      printf("= ");
+      initializer->print();
+   }
+}
+
+
+ast_declaration::ast_declaration(const char *identifier,
+				 ast_array_specifier *array_specifier,
+				 ast_expression *initializer)
+{
+   this->identifier = identifier;
+   this->array_specifier = array_specifier;
+   this->initializer = initializer;
+}
+
+
+void
+ast_declarator_list::print(void) const
+{
+   assert(type || invariant);
+
+   if (type)
+      type->print();
+   else
+      printf("invariant ");
+
+   foreach_list_const (ptr, & this->declarations) {
+      if (ptr != this->declarations.get_head())
+	 printf(", ");
+
+      ast_node *ast = exec_node_data(ast_node, ptr, link);
+      ast->print();
+   }
+
+   printf("; ");
+}
+
+
+ast_declarator_list::ast_declarator_list(ast_fully_specified_type *type)
+{
+   this->type = type;
+   this->invariant = false;
+}
+
+void
+ast_jump_statement::print(void) const
+{
+   switch (mode) {
+   case ast_continue:
+      printf("continue; ");
+      break;
+   case ast_break:
+      printf("break; ");
+      break;
+   case ast_return:
+      printf("return ");
+      if (opt_return_value)
+	 opt_return_value->print();
+
+      printf("; ");
+      break;
+   case ast_discard:
+      printf("discard; ");
+      break;
+   }
+}
+
+
+ast_jump_statement::ast_jump_statement(int mode, ast_expression *return_value)
+   : opt_return_value(NULL)
+{
+   this->mode = ast_jump_modes(mode);
+
+   if (mode == ast_return)
+      opt_return_value = return_value;
+}
+
+
+void
+ast_selection_statement::print(void) const
+{
+   printf("if ( ");
+   condition->print();
+   printf(") ");
+
+   then_statement->print();
+
+   if (else_statement) {
+      printf("else ");
+      else_statement->print();
+   }
+
+}
+
+
+ast_selection_statement::ast_selection_statement(ast_expression *condition,
+						 ast_node *then_statement,
+						 ast_node *else_statement)
+{
+   this->condition = condition;
+   this->then_statement = then_statement;
+   this->else_statement = else_statement;
+}
+
+
+void
+ast_switch_statement::print(void) const
+{
+   printf("switch ( ");
+   test_expression->print();
+   printf(") ");
+
+   body->print();
+}
+
+
+ast_switch_statement::ast_switch_statement(ast_expression *test_expression,
+					   ast_node *body)
+{
+   this->test_expression = test_expression;
+   this->body = body;
+}
+
+
+void
+ast_switch_body::print(void) const
+{
+   printf("{\n");
+   if (stmts != NULL) {
+      stmts->print();
+   }
+   printf("}\n");
+}
+
+
+ast_switch_body::ast_switch_body(ast_case_statement_list *stmts)
+{
+   this->stmts = stmts;
+}
+
+
+void ast_case_label::print(void) const
+{
+   if (test_value != NULL) {
+      printf("case ");
+      test_value->print();
+      printf(": ");
+   } else {
+      printf("default: ");
+   }
+}
+
+
+ast_case_label::ast_case_label(ast_expression *test_value)
+{
+   this->test_value = test_value;
+}
+
+
+void ast_case_label_list::print(void) const
+{
+   foreach_list_const(n, & this->labels) {
+      ast_node *ast = exec_node_data(ast_node, n, link);
+      ast->print();
+   }
+   printf("\n");
+}
+
+
+ast_case_label_list::ast_case_label_list(void)
+{
+}
+
+
+void ast_case_statement::print(void) const
+{
+   labels->print();
+   foreach_list_const(n, & this->stmts) {
+      ast_node *ast = exec_node_data(ast_node, n, link);
+      ast->print();
+      printf("\n");
+   }
+}
+
+
+ast_case_statement::ast_case_statement(ast_case_label_list *labels)
+{
+   this->labels = labels;
+}
+
+
+void ast_case_statement_list::print(void) const
+{
+   foreach_list_const(n, & this->cases) {
+      ast_node *ast = exec_node_data(ast_node, n, link);
+      ast->print();
+   }
+}
+
+
+ast_case_statement_list::ast_case_statement_list(void)
+{
+}
+
+
+void
+ast_iteration_statement::print(void) const
+{
+   switch (mode) {
+   case ast_for:
+      printf("for( ");
+      if (init_statement)
+	 init_statement->print();
+      printf("; ");
+
+      if (condition)
+	 condition->print();
+      printf("; ");
+
+      if (rest_expression)
+	 rest_expression->print();
+      printf(") ");
+
+      body->print();
+      break;
+
+   case ast_while:
+      printf("while ( ");
+      if (condition)
+	 condition->print();
+      printf(") ");
+      body->print();
+      break;
+
+   case ast_do_while:
+      printf("do ");
+      body->print();
+      printf("while ( ");
+      if (condition)
+	 condition->print();
+      printf("); ");
+      break;
+   }
+}
+
+
+ast_iteration_statement::ast_iteration_statement(int mode,
+						 ast_node *init,
+						 ast_node *condition,
+						 ast_expression *rest_expression,
+						 ast_node *body)
+{
+   this->mode = ast_iteration_modes(mode);
+   this->init_statement = init;
+   this->condition = condition;
+   this->rest_expression = rest_expression;
+   this->body = body;
+}
+
+
+void
+ast_struct_specifier::print(void) const
+{
+   printf("struct %s { ", name);
+   foreach_list_const(n, &this->declarations) {
+      ast_node *ast = exec_node_data(ast_node, n, link);
+      ast->print();
+   }
+   printf("} ");
+}
+
+
+ast_struct_specifier::ast_struct_specifier(const char *identifier,
+					   ast_declarator_list *declarator_list)
+{
+   if (identifier == NULL) {
+      static mtx_t mutex = _MTX_INITIALIZER_NP;
+      static unsigned anon_count = 1;
+      unsigned count;
+
+      mtx_lock(&mutex);
+      count = anon_count++;
+      mtx_unlock(&mutex);
+
+      identifier = ralloc_asprintf(this, "#anon_struct_%04x", count);
+   }
+   name = identifier;
+   this->declarations.push_degenerate_list_at_head(&declarator_list->link);
+   is_declaration = true;
+}
+
+static void
+set_shader_inout_layout(struct gl_shader *shader,
+		     struct _mesa_glsl_parse_state *state)
+{
+   if (shader->Stage != MESA_SHADER_GEOMETRY) {
+      /* Should have been prevented by the parser. */
+      assert(!state->in_qualifier->flags.i);
+      assert(!state->out_qualifier->flags.i);
+   }
+
+   if (shader->Stage != MESA_SHADER_COMPUTE) {
+      /* Should have been prevented by the parser. */
+      assert(!state->cs_input_local_size_specified);
+   }
+
+   if (shader->Stage != MESA_SHADER_FRAGMENT) {
+      /* Should have been prevented by the parser. */
+      assert(!state->fs_uses_gl_fragcoord);
+      assert(!state->fs_redeclares_gl_fragcoord);
+      assert(!state->fs_pixel_center_integer);
+      assert(!state->fs_origin_upper_left);
+   }
+
+   switch (shader->Stage) {
+   case MESA_SHADER_GEOMETRY:
+      shader->Geom.VerticesOut = 0;
+      if (state->out_qualifier->flags.q.max_vertices)
+         shader->Geom.VerticesOut = state->out_qualifier->max_vertices;
+
+      if (state->gs_input_prim_type_specified) {
+         shader->Geom.InputType = state->in_qualifier->prim_type;
+      } else {
+         shader->Geom.InputType = PRIM_UNKNOWN;
+      }
+
+      if (state->out_qualifier->flags.q.prim_type) {
+         shader->Geom.OutputType = state->out_qualifier->prim_type;
+      } else {
+         shader->Geom.OutputType = PRIM_UNKNOWN;
+      }
+
+      shader->Geom.Invocations = 0;
+      if (state->in_qualifier->flags.q.invocations)
+         shader->Geom.Invocations = state->in_qualifier->invocations;
+      break;
+
+   case MESA_SHADER_COMPUTE:
+      if (state->cs_input_local_size_specified) {
+         for (int i = 0; i < 3; i++)
+            shader->Comp.LocalSize[i] = state->cs_input_local_size[i];
+      } else {
+         for (int i = 0; i < 3; i++)
+            shader->Comp.LocalSize[i] = 0;
+      }
+      break;
+
+   case MESA_SHADER_FRAGMENT:
+      shader->redeclares_gl_fragcoord = state->fs_redeclares_gl_fragcoord;
+      shader->uses_gl_fragcoord = state->fs_uses_gl_fragcoord;
+      shader->pixel_center_integer = state->fs_pixel_center_integer;
+      shader->origin_upper_left = state->fs_origin_upper_left;
+      shader->ARB_fragment_coord_conventions_enable =
+         state->ARB_fragment_coord_conventions_enable;
+      break;
+
+   default:
+      /* Nothing to do. */
+      break;
+   }
+}
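+
+/* Illustrative geometry shader layout qualifiers consumed above (examples
+ * only):
+ *
+ *    layout(triangles, invocations = 2) in;         // InputType, Invocations
+ *    layout(triangle_strip, max_vertices = 3) out;  // OutputType, VerticesOut
+ */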
+
+extern "C" {
+
+void _mesa_glslang_generate_resources(struct gl_context *ctx,
+                                      TBuiltInResource& resources)
+{
+   resources.maxLights = ctx->Const.MaxLights;
+   resources.maxClipPlanes = ctx->Const.MaxClipPlanes;
+   resources.maxTextureUnits = ctx->Const.MaxTextureUnits;
+   resources.maxTextureCoords = ctx->Const.MaxTextureCoordUnits;
+   resources.maxVertexAttribs = ctx->Const.Program[MESA_SHADER_VERTEX].MaxAttribs;
+   resources.maxVertexUniformComponents = ctx->Const.Program[MESA_SHADER_VERTEX].MaxUniformComponents;
+   resources.maxVaryingFloats = ctx->Const.MaxVarying * 4;
+   resources.maxVertexTextureImageUnits = ctx->Const.Program[MESA_SHADER_VERTEX].MaxTextureImageUnits;
+   resources.maxCombinedTextureImageUnits = ctx->Const.MaxCombinedTextureImageUnits;
+   resources.maxTextureImageUnits = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits;
+   resources.maxFragmentUniformComponents = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxUniformComponents;
+   resources.maxDrawBuffers = ctx->Const.MaxDrawBuffers;
+   resources.maxVertexUniformVectors = resources.maxVertexUniformComponents / 4;
+   resources.maxVaryingVectors = ctx->Const.MaxVarying;
+   resources.maxFragmentUniformVectors = resources.maxFragmentUniformComponents / 4;
+   resources.maxVertexOutputVectors = ctx->Const.MaxVarying;
+   resources.maxFragmentInputVectors = ctx->Const.MaxVarying - 1;
+   resources.minProgramTexelOffset = ctx->Const.MinProgramTexelOffset;
+   resources.maxProgramTexelOffset = ctx->Const.MaxProgramTexelOffset;
+   resources.maxClipDistances = 8;               // TODO: ...
+   resources.maxComputeWorkGroupCountX = ctx->Const.MaxComputeWorkGroupCount[0];
+   resources.maxComputeWorkGroupCountY = ctx->Const.MaxComputeWorkGroupCount[1];
+   resources.maxComputeWorkGroupCountZ = ctx->Const.MaxComputeWorkGroupCount[2];
+   resources.maxComputeWorkGroupSizeX = ctx->Const.MaxComputeWorkGroupSize[0];
+   resources.maxComputeWorkGroupSizeY = ctx->Const.MaxComputeWorkGroupSize[1];
+   resources.maxComputeWorkGroupSizeZ = ctx->Const.MaxComputeWorkGroupSize[2];
+   resources.maxComputeUniformComponents = 1024; // TODO: ...
+   resources.maxComputeTextureImageUnits = 16;   // TODO: ...
+   resources.maxComputeImageUniforms = 8;        // TODO: ...
+   resources.maxComputeAtomicCounters = 8;       // TODO: ...
+   resources.maxComputeAtomicCounterBuffers = 1; // TODO: ...
+   resources.maxVaryingComponents = ctx->Const.MaxVarying * 4;
+   resources.maxVertexOutputComponents = ctx->Const.Program[MESA_SHADER_VERTEX].MaxOutputComponents;
+   resources.maxGeometryInputComponents = ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxInputComponents;
+   resources.maxGeometryOutputComponents = ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxOutputComponents;
+   resources.maxFragmentInputComponents = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxInputComponents;
+   resources.maxImageUnits = ctx->Const.MaxImageUnits;
+   resources.maxCombinedImageUnitsAndFragmentOutputs = ctx->Const.MaxCombinedImageUnitsAndFragmentOutputs;
+   resources.maxImageSamples = ctx->Const.MaxImageSamples;
+   resources.maxVertexImageUniforms = ctx->Const.Program[MESA_SHADER_VERTEX].MaxImageUniforms;
+   resources.maxTessControlImageUniforms = 0;    // TODO: ...
+   resources.maxTessEvaluationImageUniforms = 0; // TODO: ...
+   resources.maxGeometryImageUniforms = ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxImageUniforms;
+   resources.maxFragmentImageUniforms = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxImageUniforms;
+   resources.maxCombinedImageUniforms = ctx->Const.MaxCombinedImageUniforms;
+   resources.maxGeometryTextureImageUnits = ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits;
+   resources.maxGeometryOutputVertices = ctx->Const.MaxGeometryOutputVertices;
+   resources.maxGeometryTotalOutputComponents = ctx->Const.MaxGeometryTotalOutputComponents;
+   resources.maxGeometryUniformComponents = ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxUniformComponents;
+   resources.maxGeometryVaryingComponents = 64;           // TODO: ...
+   resources.maxTessControlInputComponents = 128;         // TODO: ...
+   resources.maxTessControlOutputComponents = 128;        // TODO: ...
+   resources.maxTessControlTextureImageUnits = 16;        // TODO: ...
+   resources.maxTessControlUniformComponents = 1024;      // TODO: ...
+   resources.maxTessControlTotalOutputComponents = 4096;  // TODO: ...
+   resources.maxTessEvaluationInputComponents = 128;      // TODO: ...
+   resources.maxTessEvaluationOutputComponents = 128;     // TODO: ...
+   resources.maxTessEvaluationTextureImageUnits = 16;     // TODO: ...
+   resources.maxTessEvaluationUniformComponents = 1024;   // TODO: ...
+   resources.maxTessPatchComponents = 120;                // TODO: ...
+   resources.maxPatchVertices = 32;                       // TODO: ...
+   resources.maxTessGenLevel = 64;                        // TODO: ...
+   resources.maxViewports = ctx->Const.MaxViewports;
+   resources.maxVertexAtomicCounters = ctx->Const.Program[MESA_SHADER_VERTEX].MaxAtomicCounters;
+   resources.maxTessControlAtomicCounters = 0;            // TODO: ...
+   resources.maxTessEvaluationAtomicCounters = 0;         // TODO: ...
+   resources.maxGeometryAtomicCounters = ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxAtomicCounters;
+   resources.maxFragmentAtomicCounters = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxAtomicCounters;
+   resources.maxCombinedAtomicCounters = ctx->Const.MaxCombinedAtomicCounters;
+   resources.maxAtomicCounterBindings = ctx->Const.MaxAtomicBufferBindings;
+   resources.maxVertexAtomicCounterBuffers = 0;           // TODO: ...
+   resources.maxTessControlAtomicCounterBuffers = 0;      // TODO: ...
+   resources.maxTessEvaluationAtomicCounterBuffers = 0;   // TODO: ...
+   resources.maxGeometryAtomicCounterBuffers = 0;         // TODO: ...
+   resources.maxFragmentAtomicCounterBuffers = 1;         // TODO: ...
+   resources.maxCombinedAtomicCounterBuffers = 1;         // TODO: ...
+   resources.maxAtomicCounterBufferSize = ctx->Const.MaxAtomicBufferSize;
+   resources.maxTransformFeedbackBuffers = ctx->Const.MaxTransformFeedbackBuffers;
+   resources.maxTransformFeedbackInterleavedComponents = ctx->Const.MaxTransformFeedbackInterleavedComponents;
+   resources.limits.nonInductiveForLoops = 1;
+   resources.limits.whileLoops = 1;
+   resources.limits.doWhileLoops = 1;
+   resources.limits.generalUniformIndexing = 1;
+   resources.limits.generalAttributeMatrixVectorIndexing = 1;
+   resources.limits.generalVaryingIndexing = 1;
+   resources.limits.generalSamplerIndexing = 1;
+   resources.limits.generalVariableIndexing = 1;
+   resources.limits.generalConstantMatrixVectorIndexing = 1;
+}
+
+
+EShLanguage _mesa_shader_stage_to_glslang_stage(unsigned stage)
+{
+   switch (stage)
+   {
+   case MESA_SHADER_VERTEX:          return EShLangVertex;
+   case MESA_SHADER_GEOMETRY:        return EShLangGeometry;
+   case MESA_SHADER_FRAGMENT:        return EShLangFragment;
+   case MESA_SHADER_COMPUTE:         return EShLangCompute;
+// TODO:  case MESA_SHADER_TESS_CONTROL:    return EShLangTessControl;
+// TODO:  case MESA_SHADER_TESS_EVALUATION: return EShLangTessEvaluation;
+   default:
+      assert(!"Should not get here.");
+      return EShLangVertex;
+   }
+}
+
+void null_unsupported_functionality(const std::string& message, gla::EAbortType at) {
+   if (at == gla::EATAbort) {
+      std::cerr << std::endl << message << std::endl;
+      exit(1);
+   }
+}
+
+void
+_mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader,
+                          bool dump_ast, bool dump_SPV, bool dump_hir,
+                          bool strip_SPV, bool canonicalize_SPV)
+{
+   const char* infoLog = "";
+
+   _mesa_glsl_parse_state *state =
+      new(shader) _mesa_glsl_parse_state(ctx, shader->Stage, shader);
+
+   state->error = false;
+
+   // TODO: Don't create and delete the manager for each compile.  There
+   // should be one per potential compilation thread, re-used from a pool.
+   const EShLanguage glslang_stage = _mesa_shader_stage_to_glslang_stage(shader->Stage);
+   gla::Manager* glaManager = gla::getManager(glslang_stage);
+
+   // We know this is a MesaGlassManager, because it's from our factory
+   gla::MesaGlassManager* manager = static_cast<gla::MesaGlassManager*>(glaManager);
+
+   manager->options.optimizations.loopUnrollThreshold = 350;
+   manager->options.optimizations.flattenHoistThreshold = 25;
+   manager->options.optimizations.reassociate = ctx->Const.GlassEnableReassociation;
+
+   const bool useSPV = ((unsigned int *)shader->Source)[0] == spv::MagicNumber;
+
+   glslang::TShader*  glslang_shader = useSPV ? 0 : new glslang::TShader(glslang_stage);
+   glslang::TProgram* glslang_program = useSPV ? 0 : new glslang::TProgram;
+
+   if (!useSPV && (glslang_shader == 0 || glslang_program == 0))
+      state->error = true;
+
+   // Set the shader source into the TShader object
+   if (!useSPV)
+       glslang_shader->setStrings(&shader->Source, 1);
+
+   // glslang message reporting level
+   const EShMessages messages = EShMessages(EShMsgDefault | EShMsgRelaxedErrors);
+
+   // TODO: don't do this per compile.
+   TBuiltInResource resources;
+   _mesa_glslang_generate_resources(ctx, resources);
+
+   const int defaultVersion = state->es_shader ? 100 : 110;
+
+   if (!useSPV && !state->error) {
+      if (!glslang_shader->parse(&resources, defaultVersion, false, messages)) {
+         state->error = true;
+         infoLog = glslang_shader->getInfoLog();
+      }
+   }
+
+   if (!useSPV && !state->error)
+      glslang_program->addShader(glslang_shader);
+
+   // Run glslang linking step
+   if (!useSPV && !state->error && !glslang_program->link(messages)) {
+      state->error = true;
+      infoLog = glslang_program->getInfoLog();
+   }
+
+#ifndef DEBUG
+   //std::cerr << std::endl << "Register Handler" << std::endl;
+   gla::RegisterUnsupportedFunctionalityHandler((gla::UnsupportedFunctionalityHandler) null_unsupported_functionality);
+#endif
+
+   if (!state->error) {
+       if (!useSPV) {
+           for (int stage = 0; stage < EShLangCount; ++stage) {
+               glslang::TIntermediate* intermediate = glslang_program->getIntermediate((EShLanguage)stage);
+               if (! intermediate)
+                   continue;
+               TranslateGlslangToTop(*intermediate, *manager);
+           }
+       } else {
+           // Verify that the SPV really is SPV
+           if (((unsigned int *)shader->Source)[0] == spv::MagicNumber) {
+               std::vector<unsigned int> spirv;
+
+               spirv.reserve(shader->Size);
+               for (int x=0; x<shader->Size; ++x)
+                   spirv.push_back(((unsigned int *)shader->Source)[x]);
+
+               if (strip_SPV || canonicalize_SPV) {
+                   // remap is expensive, just call once with OR'd feature mask
+                   spv::spirvbin_t(0).remap(spirv,
+                       (strip_SPV        ? spv::spirvbin_t::STRIP         : 0) |
+                       (canonicalize_SPV ? spv::spirvbin_t::ALL_BUT_STRIP : 0));
+               }
+
+               if (dump_SPV) {
+                   // TODO: spirv-tools::Disassemble(std::cout, spirv);
+               }
+
+               gla::SpvToTop(spirv, *manager);
+           } else {
+               state->error = true;
+           }
+       }
+
+       if (dump_hir)
+           manager->dump("\nTop IR:\n");
+
+       // Top IR to bottom IR
+       manager->translateTopToBottom();
+
+       if (dump_hir)
+           manager->dump("\n\nBottom IR:\n");
+   }
+
+   if (!state->error) {
+      const bool prevEsShader = state->es_shader;
+
+      // We must provide some state that lower level functions depend upon
+      state->es_shader = (manager->getProfile() == EEsProfile);
+      state->language_version = useSPV ? 450 : manager->getVersion();
+
+      // TODO: enable key extensions, in advance of better integration,
+      // so that texture fn lookup can find them.  This isn't the right way to
+      // go about it.
+      state->ARB_texture_rectangle_enable  = true;
+      state->ARB_shader_texture_lod_enable = true;
+      state->ARB_draw_instanced_enable     = true;
+      state->EXT_texture_array_enable      = true;
+      state->ARB_texture_gather_enable     = true;
+      state->ARB_texture_cube_map_array_enable = true;
+
+      // Allocate space for new instruction list
+      // Set required state for translation to HIR
+      manager->getBackendTranslator()->initializeTranslation(ctx, state, shader);
+
+      // Translate Bottom IR to HIR
+      // TODO: Skip empty translation units:
+      //     !state->translation_unit.is_empty() probably not set up in this path
+      manager->translateBottomToTarget();
+
+      manager->getBackendTranslator()->finalizeTranslation();
+
+      state->es_shader = prevEsShader;
+
+      // TODO: ensure that state->error is set if the translator fails
+
+      if (dump_hir)
+         _mesa_print_ir(stdout, shader->ir, state);
+   }
+
+   // Validate resulting HIR
+   if (!state->error) {
+      validate_ir_tree(shader->ir);
+   } else {
+      if (!infoLog || infoLog[0] == '\0')
+         infoLog = manager->getBackendTranslator()->getInfoLog();
+   }
+
+   // Free old infolog if any
+   if (shader->InfoLog)
+      ralloc_free(shader->InfoLog);
+
+   shader->symbols                = state->symbols;
+   shader->CompileStatus          = !state->error;
+   shader->Version                = useSPV ? 450 : manager->getVersion();
+   shader->uses_builtin_functions = state->uses_builtin_functions;
+   shader->IsES                   = state->es_shader;
+
+   if (!state->error)
+      set_shader_inout_layout(shader, state);
+
+   // Return infolog from parser or linker
+   state->info_log = shader->InfoLog = ralloc_strdup(shader, infoLog);
+
+   if (shader->ir)
+      reparent_ir(shader->ir, shader->ir);
+
+   // clean up
+   if (state)
+      ralloc_free(state);
+
+   delete glslang_program;
+   delete glslang_shader;
+   delete manager;
+}
+
+} /* extern "C" */
+
+/**
+ * Run the set of common optimization passes
+ *
+ * \param ir                          List of instructions to be optimized
+ * \param linked                      Is the shader linked?  This enables
+ *                                    optimization passes that remove code at
+ *                                    global scope and could cause linking to
+ *                                    fail.
+ * \param uniform_locations_assigned  Have locations already been assigned for
+ *                                    uniforms?  This prevents the declarations
+ *                                    of unused uniforms from being removed.
+ *                                    The setting of this flag only matters if
+ *                                    \c linked is \c true.
+ * \param options                     The driver's preferred shader options.
+ * \param native_integers             Does the driver support integer types
+ *                                    natively?  Passed to the algebraic
+ *                                    optimization pass.
+ */
+bool
+do_common_optimization(exec_list *ir, bool linked,
+		       bool uniform_locations_assigned,
+                       const struct gl_shader_compiler_options *options,
+                       bool native_integers)
+{
+   GLboolean progress = GL_FALSE;
+
+   progress = lower_instructions(ir, SUB_TO_ADD_NEG) || progress;
+
+   if (linked) {
+      progress = do_function_inlining(ir) || progress;
+      progress = do_dead_functions(ir) || progress;
+      progress = do_structure_splitting(ir) || progress;
+   }
+   progress = do_if_simplification(ir) || progress;
+   progress = opt_flatten_nested_if_blocks(ir) || progress;
+   progress = do_copy_propagation(ir) || progress;
+   progress = do_copy_propagation_elements(ir) || progress;
+
+   if (options->OptimizeForAOS && !linked)
+      progress = opt_flip_matrices(ir) || progress;
+
+   if (linked && options->OptimizeForAOS) {
+      progress = do_vectorize(ir) || progress;
+   }
+
+   if (linked)
+      progress = do_dead_code(ir, uniform_locations_assigned) || progress;
+   else
+      progress = do_dead_code_unlinked(ir) || progress;
+
+   progress = do_dead_code_local(ir) || progress;
+   progress = do_tree_grafting(ir) || progress;
+   progress = do_constant_propagation(ir) || progress;
+   if (linked)
+      progress = do_constant_variable(ir) || progress;
+   else
+      progress = do_constant_variable_unlinked(ir) || progress;
+   progress = do_constant_folding(ir) || progress;
+   progress = do_cse(ir) || progress;
+   progress = do_algebraic(ir, native_integers) || progress;
+   progress = do_lower_jumps(ir) || progress;
+   progress = do_vec_index_to_swizzle(ir) || progress;
+   progress = lower_vector_insert(ir, false) || progress;
+   progress = do_swizzle_swizzle(ir) || progress;
+   progress = do_noop_swizzle(ir) || progress;
+
+   progress = optimize_split_arrays(ir, linked) || progress;
+   progress = optimize_redundant_jumps(ir) || progress;
+
+   loop_state *ls = analyze_loop_variables(ir);
+   if (ls->loop_found) {
+      progress = set_loop_controls(ir, ls) || progress;
+      progress = unroll_loops(ir, ls, options) || progress;
+   }
+   delete ls;
+
+   return progress;
+}
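+
+/* The passes above each return true when they change the IR, so callers
+ * typically iterate do_common_optimization() to a fixed point.  A minimal
+ * sketch, assuming 'shader' and 'opts' come from the caller's context:
+ *
+ *    bool progress;
+ *    do {
+ *       progress = do_common_optimization(shader->ir,
+ *                                         true,   // linked
+ *                                         false,  // uniform locations not yet assigned
+ *                                         opts,   // gl_shader_compiler_options
+ *                                         true);  // native integer support
+ *    } while (progress);
+ */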
+
+extern "C" {
+
+/**
+ * To be called at process creation time.  This does one-time
+ * compiler initialization.
+ */
+void
+_mesa_create_shader_compiler(void)
+{
+   static bool initialized = false;
+
+   // Initialize glslang and LunarGlass
+   if (!initialized) {
+      glslang::InitializeProcess();
+      gla::Manager::startMultithreaded();
+      gla::MesaGlassTranslator::initSamplerTypes();
+      initialized = true;
+   }
+}
+
+/**
+ * To be called at GL teardown time; this frees compiler data structures.
+ *
+ * After calling this, any previously compiled shaders and shader
+ * programs become invalid, so this should happen at approximately
+ * program exit.
+ */
+void
+_mesa_destroy_shader_compiler(void)
+{
+//   _mesa_glsl_destroy_threadpool();
+
+   _mesa_destroy_shader_compiler_caches();
+
+   _mesa_glsl_release_types();
+}
+
+/**
+ * Releases compiler caches to trade off performance for memory.
+ *
+ * Intended to be used with glReleaseShaderCompiler().
+ */
+void
+_mesa_destroy_shader_compiler_caches(void)
+{
+//   _mesa_glsl_wait_threadpool();
+   _mesa_glsl_release_builtin_functions();
+}
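+
+/* A sketch of the intended lifecycle of the three entry points above,
+ * assuming a driver that initializes the compiler once per process:
+ *
+ *    _mesa_create_shader_compiler();          // process start (idempotent)
+ *    ...compile and link shaders...
+ *    _mesa_destroy_shader_compiler_caches();  // e.g. glReleaseShaderCompiler()
+ *    ...further compilation is still legal here...
+ *    _mesa_destroy_shader_compiler();         // process exit; shaders invalid after
+ */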
+
+}
diff --git a/icd/intel/compiler/shader/glsl_symbol_table.cpp b/icd/intel/compiler/shader/glsl_symbol_table.cpp
new file mode 100644
index 0000000..a052362
--- /dev/null
+++ b/icd/intel/compiler/shader/glsl_symbol_table.cpp
@@ -0,0 +1,248 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "glsl_symbol_table.h"
+
+class symbol_table_entry {
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(symbol_table_entry);
+
+   bool add_interface(const glsl_type *i, enum ir_variable_mode mode)
+   {
+      const glsl_type **dest;
+
+      switch (mode) {
+      case ir_var_uniform:
+         dest = &ibu;
+         break;
+      case ir_var_shader_in:
+         dest = &ibi;
+         break;
+      case ir_var_shader_out:
+         dest = &ibo;
+         break;
+      default:
+         assert(!"Unsupported interface variable mode!");
+         return false;
+      }
+
+      if (*dest != NULL) {
+         return false;
+      } else {
+         *dest = i;
+         return true;
+      }
+   }
+
+   const glsl_type *get_interface(enum ir_variable_mode mode)
+   {
+      switch (mode) {
+      case ir_var_uniform:
+         return ibu;
+      case ir_var_shader_in:
+         return ibi;
+      case ir_var_shader_out:
+         return ibo;
+      default:
+         assert(!"Unsupported interface variable mode!");
+         return NULL;
+      }
+   }
+
+   symbol_table_entry(ir_variable *v)               :
+      v(v), f(0), t(0), ibu(0), ibi(0), ibo(0), a(0) {}
+   symbol_table_entry(ir_function *f)               :
+      v(0), f(f), t(0), ibu(0), ibi(0), ibo(0), a(0) {}
+   symbol_table_entry(const glsl_type *t)           :
+      v(0), f(0), t(t), ibu(0), ibi(0), ibo(0), a(0) {}
+   symbol_table_entry(const glsl_type *t, enum ir_variable_mode mode) :
+      v(0), f(0), t(0), ibu(0), ibi(0), ibo(0), a(0)
+   {
+      assert(t->is_interface());
+      add_interface(t, mode);
+   }
+   symbol_table_entry(const class ast_type_specifier *a):
+      v(0), f(0), t(0), ibu(0), ibi(0), ibo(0), a(a) {}
+
+   ir_variable *v;
+   ir_function *f;
+   const glsl_type *t;
+   const glsl_type *ibu;
+   const glsl_type *ibi;
+   const glsl_type *ibo;
+   const class ast_type_specifier *a;
+};
+
+glsl_symbol_table::glsl_symbol_table()
+{
+   this->separate_function_namespace = false;
+   this->table = _mesa_symbol_table_ctor();
+   this->mem_ctx = ralloc_context(NULL);
+}
+
+glsl_symbol_table::~glsl_symbol_table()
+{
+   _mesa_symbol_table_dtor(table);
+   ralloc_free(mem_ctx);
+}
+
+void glsl_symbol_table::push_scope()
+{
+   _mesa_symbol_table_push_scope(table);
+}
+
+void glsl_symbol_table::pop_scope()
+{
+   _mesa_symbol_table_pop_scope(table);
+}
+
+bool glsl_symbol_table::name_declared_this_scope(const char *name)
+{
+   return _mesa_symbol_table_symbol_scope(table, -1, name) == 0;
+}
+
+bool glsl_symbol_table::add_variable(ir_variable *v)
+{
+   if (this->separate_function_namespace) {
+      /* In 1.10, functions and variables have separate namespaces. */
+      symbol_table_entry *existing = get_entry(v->name);
+      if (name_declared_this_scope(v->name)) {
+	 /* If there's already an existing function (not a constructor!) in
+	  * the current scope, just update the existing entry to include 'v'.
+	  */
+	 if (existing->v == NULL && existing->t == NULL) {
+	    existing->v = v;
+	    return true;
+	 }
+      } else {
+	 /* If not declared at this scope, add a new entry.  But if an existing
+	  * entry includes a function, propagate that to this block - otherwise
+	  * the new variable declaration would shadow the function.
+	  */
+	 symbol_table_entry *entry = new(mem_ctx) symbol_table_entry(v);
+	 if (existing != NULL)
+	    entry->f = existing->f;
+	 int added = _mesa_symbol_table_add_symbol(table, -1, v->name, entry);
+	 assert(added == 0);
+	 (void)added;
+	 return true;
+      }
+      return false;
+   }
+
+   /* 1.20+ rules: */
+   symbol_table_entry *entry = new(mem_ctx) symbol_table_entry(v);
+   return _mesa_symbol_table_add_symbol(table, -1, v->name, entry) == 0;
+}
+
+bool glsl_symbol_table::add_type(const char *name, const glsl_type *t)
+{
+   symbol_table_entry *entry = new(mem_ctx) symbol_table_entry(t);
+   return _mesa_symbol_table_add_symbol(table, -1, name, entry) == 0;
+}
+
+bool glsl_symbol_table::add_interface(const char *name, const glsl_type *i,
+                                      enum ir_variable_mode mode)
+{
+   assert(i->is_interface());
+   symbol_table_entry *entry = get_entry(name);
+   if (entry == NULL) {
+      symbol_table_entry *entry =
+         new(mem_ctx) symbol_table_entry(i, mode);
+      bool add_interface_symbol_result =
+         _mesa_symbol_table_add_symbol(table, -1, name, entry) == 0;
+      assert(add_interface_symbol_result);
+      return add_interface_symbol_result;
+   } else {
+      return entry->add_interface(i, mode);
+   }
+}
+
+bool glsl_symbol_table::add_function(ir_function *f)
+{
+   if (this->separate_function_namespace && name_declared_this_scope(f->name)) {
+      /* In 1.10, functions and variables have separate namespaces. */
+      symbol_table_entry *existing = get_entry(f->name);
+      if ((existing->f == NULL) && (existing->t == NULL)) {
+	 existing->f = f;
+	 return true;
+      }
+   }
+   symbol_table_entry *entry = new(mem_ctx) symbol_table_entry(f);
+   return _mesa_symbol_table_add_symbol(table, -1, f->name, entry) == 0;
+}
+
+void glsl_symbol_table::add_global_function(ir_function *f)
+{
+   symbol_table_entry *entry = new(mem_ctx) symbol_table_entry(f);
+   int added = _mesa_symbol_table_add_global_symbol(table, -1, f->name, entry);
+   assert(added == 0);
+   (void)added;
+}
+
+ir_variable *glsl_symbol_table::get_variable(const char *name)
+{
+   symbol_table_entry *entry = get_entry(name);
+   return entry != NULL ? entry->v : NULL;
+}
+
+const glsl_type *glsl_symbol_table::get_type(const char *name)
+{
+   symbol_table_entry *entry = get_entry(name);
+   return entry != NULL ? entry->t : NULL;
+}
+
+const glsl_type *glsl_symbol_table::get_interface(const char *name,
+                                                  enum ir_variable_mode mode)
+{
+   symbol_table_entry *entry = get_entry(name);
+   return entry != NULL ? entry->get_interface(mode) : NULL;
+}
+
+ir_function *glsl_symbol_table::get_function(const char *name)
+{
+   symbol_table_entry *entry = get_entry(name);
+   return entry != NULL ? entry->f : NULL;
+}
+
+symbol_table_entry *glsl_symbol_table::get_entry(const char *name)
+{
+   return (symbol_table_entry *)
+      _mesa_symbol_table_find_symbol(table, -1, name);
+}
+
+void
+glsl_symbol_table::disable_variable(const char *name)
+{
+   /* Ideally we would remove the variable's entry from the symbol table, but
+    * that would be difficult.  Fortunately, since this is only used for
+    * built-in variables, it won't be possible for the shader to re-introduce
+    * the variable later, so all we really need to do is to make sure that
+    * further attempts to access it using get_variable() will return NULL.
+    */
+   symbol_table_entry *entry = get_entry(name);
+   if (entry != NULL) {
+      entry->v = NULL;
+   }
+}
diff --git a/icd/intel/compiler/shader/glsl_symbol_table.h b/icd/intel/compiler/shader/glsl_symbol_table.h
new file mode 100644
index 0000000..f323fc3
--- /dev/null
+++ b/icd/intel/compiler/shader/glsl_symbol_table.h
@@ -0,0 +1,137 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef GLSL_SYMBOL_TABLE
+#define GLSL_SYMBOL_TABLE
+
+#include <new>
+
+extern "C" {
+#include "program/symbol_table.h"
+}
+#include "ir.h"
+#include "glsl_types.h"
+
+class symbol_table_entry;
+
+/**
+ * Facade class for _mesa_symbol_table
+ *
+ * Wraps the existing \c _mesa_symbol_table data structure to enforce some
+ * type safety and some symbol table invariants.
+ */
+struct glsl_symbol_table {
+private:
+   static void
+   _glsl_symbol_table_destructor (glsl_symbol_table *table)
+   {
+      table->~glsl_symbol_table();
+   }
+
+public:
+   /* Callers of this ralloc-based new need not call delete. It's
+    * easier to just ralloc_free 'ctx' (or any of its ancestors). */
+   static void* operator new(size_t size, void *ctx)
+   {
+      void *table;
+
+      table = ralloc_size(ctx, size);
+      assert(table != NULL);
+
+      ralloc_set_destructor(table, (void (*)(void*)) _glsl_symbol_table_destructor);
+
+      return table;
+   }
+
+   /* If the user *does* call delete, that's OK, we will just
+    * ralloc_free in that case. Here, C++ will have already called the
+    * destructor so tell ralloc not to do that again. */
+   static void operator delete(void *table)
+   {
+      ralloc_set_destructor(table, NULL);
+      ralloc_free(table);
+   }
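+
+   /* Illustrative usage of the ralloc-based allocation above (the
+    * 'mem_ctx' parent context is an assumption of this sketch):
+    *
+    *    void *mem_ctx = ralloc_context(NULL);
+    *    glsl_symbol_table *syms = new(mem_ctx) glsl_symbol_table();
+    *    ...
+    *    ralloc_free(mem_ctx);  // runs ~glsl_symbol_table() via the destructor hook
+    */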
+   
+   glsl_symbol_table();
+   ~glsl_symbol_table();
+
+   /* In 1.10, functions and variables have separate namespaces. */
+   bool separate_function_namespace;
+
+   void push_scope();
+   void pop_scope();
+
+   /**
+    * Determine whether a name was declared at the current scope
+    */
+   bool name_declared_this_scope(const char *name);
+
+   /**
+    * \name Methods to add symbols to the table
+    *
+    * There is some temptation to rename all these functions to \c add_symbol
+    * or similar.  However, this breaks symmetry with the getter functions and
+    * reduces the clarity of the intention of code that uses these methods.
+    */
+   /*@{*/
+   bool add_variable(ir_variable *v);
+   bool add_type(const char *name, const glsl_type *t);
+   bool add_function(ir_function *f);
+   bool add_interface(const char *name, const glsl_type *i,
+                      enum ir_variable_mode mode);
+   /*@}*/
+
+   /**
+    * Add a function at global scope without checking for scoping conflicts.
+    */
+   void add_global_function(ir_function *f);
+
+   /**
+    * \name Methods to get symbols from the table
+    */
+   /*@{*/
+   ir_variable *get_variable(const char *name);
+   const glsl_type *get_type(const char *name);
+   ir_function *get_function(const char *name);
+   const glsl_type *get_interface(const char *name,
+                                  enum ir_variable_mode mode);
+   /*@}*/
+
+   /**
+    * Disable a previously-added variable so that it no longer appears to be
+    * in the symbol table.  This is necessary when gl_PerVertex is redeclared,
+    * to ensure that previously-available built-in variables are no longer
+    * available.
+    */
+   void disable_variable(const char *name);
+
+private:
+   symbol_table_entry *get_entry(const char *name);
+
+   struct _mesa_symbol_table *table;
+   void *mem_ctx;
+};
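+
+/* A minimal sketch of how the scoping API above is used during parsing
+ * (the 'var' ir_variable here is illustrative only):
+ *
+ *    glsl_symbol_table *syms = new(mem_ctx) glsl_symbol_table();
+ *    syms->push_scope();                        // enter a block
+ *    syms->add_variable(var);                   // declare 'var' in this scope
+ *    assert(syms->name_declared_this_scope(var->name));
+ *    syms->pop_scope();                         // leave the block
+ *    assert(syms->get_variable(var->name) == NULL);  // if not declared outside
+ */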
+
+#endif /* GLSL_SYMBOL_TABLE */
diff --git a/icd/intel/compiler/shader/glsl_types.cpp b/icd/intel/compiler/shader/glsl_types.cpp
new file mode 100644
index 0000000..fa0f298
--- /dev/null
+++ b/icd/intel/compiler/shader/glsl_types.cpp
@@ -0,0 +1,1219 @@
+/*
+ * Copyright © 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "libfns.h" // LunarG ADD:
+#include "glsl_symbol_table.h"
+#include "glsl_parser_extras.h"
+#include "glsl_types.h"
+extern "C" {
+#include "program/hash_table.h"
+}
+
+mtx_t glsl_type::mutex = _MTX_INITIALIZER_NP;
+hash_table *glsl_type::array_types = NULL;
+hash_table *glsl_type::record_types = NULL;
+hash_table *glsl_type::interface_types = NULL;
+void *glsl_type::mem_ctx = NULL;
+
+void
+glsl_type::init_ralloc_type_ctx(void)
+{
+   if (glsl_type::mem_ctx == NULL) {
+      glsl_type::mem_ctx = ralloc_autofree_context();
+      assert(glsl_type::mem_ctx != NULL);
+   }
+}
+
+glsl_type::glsl_type(GLenum gl_type,
+		     glsl_base_type base_type, unsigned vector_elements,
+		     unsigned matrix_columns, const char *name) :
+   gl_type(gl_type),
+   base_type(base_type),
+   sampler_dimensionality(0), sampler_shadow(0), sampler_array(0),
+   sampler_type(0), interface_packing(0),
+   vector_elements(vector_elements), matrix_columns(matrix_columns),
+   length(0)
+{
+   mtx_lock(&glsl_type::mutex);
+
+   init_ralloc_type_ctx();
+   assert(name != NULL);
+   this->name = ralloc_strdup(this->mem_ctx, name);
+
+   mtx_unlock(&glsl_type::mutex);
+
+   /* Either both dimensions are zero or neither dimension is zero.
+    */
+   assert((vector_elements == 0) == (matrix_columns == 0));
+   memset(& fields, 0, sizeof(fields));
+}
+
+glsl_type::glsl_type(GLenum gl_type, glsl_base_type base_type,
+		     enum glsl_sampler_dim dim, bool shadow, bool array,
+		     unsigned type, const char *name) :
+   gl_type(gl_type),
+   base_type(base_type),
+   sampler_dimensionality(dim), sampler_shadow(shadow),
+   sampler_array(array), sampler_type(type), interface_packing(0),
+   length(0)
+{
+   mtx_lock(&glsl_type::mutex);
+
+   init_ralloc_type_ctx();
+   assert(name != NULL);
+   this->name = ralloc_strdup(this->mem_ctx, name);
+
+   mtx_unlock(&glsl_type::mutex);
+
+   memset(& fields, 0, sizeof(fields));
+
+   if (base_type == GLSL_TYPE_SAMPLER) {
+      /* Samplers take no storage whatsoever. */
+      matrix_columns = vector_elements = 0;
+   } else {
+      matrix_columns = vector_elements = 1;
+   }
+}
+
+glsl_type::glsl_type(const glsl_struct_field *fields, unsigned num_fields,
+		     const char *name) :
+   gl_type(0),
+   base_type(GLSL_TYPE_STRUCT),
+   sampler_dimensionality(0), sampler_shadow(0), sampler_array(0),
+   sampler_type(0), interface_packing(0),
+   vector_elements(0), matrix_columns(0),
+   length(num_fields)
+{
+   unsigned int i;
+
+   mtx_lock(&glsl_type::mutex);
+
+   init_ralloc_type_ctx();
+   assert(name != NULL);
+   this->name = ralloc_strdup(this->mem_ctx, name);
+   this->fields.structure = ralloc_array(this->mem_ctx,
+					 glsl_struct_field, length);
+
+   for (i = 0; i < length; i++) {
+      this->fields.structure[i].type = fields[i].type;
+      this->fields.structure[i].name = ralloc_strdup(this->fields.structure,
+						     fields[i].name);
+      this->fields.structure[i].location = fields[i].location;
+      this->fields.structure[i].interpolation = fields[i].interpolation;
+      this->fields.structure[i].centroid = fields[i].centroid;
+      this->fields.structure[i].sample = fields[i].sample;
+      this->fields.structure[i].row_major = fields[i].row_major;
+   }
+
+   mtx_unlock(&glsl_type::mutex);
+}
+
+glsl_type::glsl_type(const glsl_struct_field *fields, unsigned num_fields,
+		     enum glsl_interface_packing packing, const char *name) :
+   gl_type(0),
+   base_type(GLSL_TYPE_INTERFACE),
+   sampler_dimensionality(0), sampler_shadow(0), sampler_array(0),
+   sampler_type(0), interface_packing((unsigned) packing),
+   vector_elements(0), matrix_columns(0),
+   length(num_fields)
+{
+   unsigned int i;
+
+   mtx_lock(&glsl_type::mutex);
+
+   init_ralloc_type_ctx();
+   assert(name != NULL);
+   this->name = ralloc_strdup(this->mem_ctx, name);
+   this->fields.structure = ralloc_array(this->mem_ctx,
+					 glsl_struct_field, length);
+   for (i = 0; i < length; i++) {
+      this->fields.structure[i].type = fields[i].type;
+      this->fields.structure[i].name = ralloc_strdup(this->fields.structure,
+						     fields[i].name);
+      this->fields.structure[i].location = fields[i].location;
+      this->fields.structure[i].interpolation = fields[i].interpolation;
+      this->fields.structure[i].centroid = fields[i].centroid;
+      this->fields.structure[i].sample = fields[i].sample;
+      this->fields.structure[i].row_major = fields[i].row_major;
+   }
+
+   mtx_unlock(&glsl_type::mutex);
+}
+
+
+void
+glsl_type::serialize(memory_writer &mem) const
+{
+   uint32_t data_len = 0;
+
+   mem.write_string(name);
+
+   unsigned start_pos = mem.position();
+   mem.write_uint32_t(data_len);
+
+   /* Used to tell the reader whether a user-defined type has already
+    * been serialized.
+    */
+   uint8_t user_type_exists = 0;
+
+   /* Serialize only user defined types. */
+   switch (base_type) {
+   case GLSL_TYPE_ARRAY:
+   case GLSL_TYPE_STRUCT:
+   case GLSL_TYPE_INTERFACE:
+      break;
+   default:
+      goto serialization_epilogue;
+   }
+
+   uint32_t type_id;
+   user_type_exists = mem.make_unique_id(this, &type_id);
+
+   mem.write_uint8_t(user_type_exists);
+   mem.write_uint32_t(type_id);
+
+   /* No need to write again. */
+   if (user_type_exists)
+      goto serialization_epilogue;
+
+   mem.write_uint32_t((uint32_t)length);
+   mem.write_uint8_t((uint8_t)base_type);
+   mem.write_uint8_t((uint8_t)interface_packing);
+
+   if (base_type == GLSL_TYPE_ARRAY) {
+      element_type()->serialize(mem);
+   } else {
+      glsl_struct_field *field = fields.structure;
+      for (unsigned i = 0; i < length; i++, field++) {
+         mem.write(field, sizeof(glsl_struct_field));
+         mem.write_string(field->name);
+         field->type->serialize(mem);
+      }
+   }
+
+serialization_epilogue:
+   /* Update the length of written data in 'start_pos'. */
+   data_len = mem.position() - start_pos - sizeof(data_len);
+   mem.overwrite(&data_len, sizeof(data_len), start_pos);
+}
+
+
+const glsl_type *
+deserialize_glsl_type(memory_map *map, struct _mesa_glsl_parse_state *state,
+                      struct hash_table *type_hash)
+{
+   char *name = map->read_string();
+   /* TODO: Understand how this can happen and fix */
+   if (!name)
+      return glsl_type::error_type;
+
+   uint32_t type_size = map->read_uint32_t();
+   const glsl_type *ret_type = glsl_type::error_type;
+
+   const glsl_type *existing_type =
+      state->symbols->get_type(name);
+
+   /* If type exists, move read pointer forward and return type. */
+   if (existing_type) {
+      map->ffwd(type_size);
+      return existing_type;
+   }
+
+   /* Has this user type been read and stored to hash already? */
+   uint8_t user_type_exists = map->read_uint8_t();
+   uint32_t type_id = map->read_uint32_t();
+
+   uint32_t length;
+   uint8_t base_type, interface_packing;
+
+   if (user_type_exists) {
+      hash_entry *entry =
+         _mesa_hash_table_search(type_hash, _mesa_hash_string(name),
+                                 (void*) (uintptr_t) type_id);
+
+      /* Return already read type from the hash. */
+      if (entry && entry->data)
+         return (const glsl_type *) entry->data;
+      else
+         goto type_serialization_error;
+   }
+
+   length = map->read_uint32_t();
+   base_type = map->read_uint8_t();
+   interface_packing = map->read_uint8_t();
+
+   if (base_type >= GLSL_TYPE_ERROR)
+      goto type_serialization_error;
+
+   /* Array type has additional element_type information. */
+   if (base_type == GLSL_TYPE_ARRAY) {
+      const glsl_type *element_type =
+         deserialize_glsl_type(map, state, type_hash);
+      if (!element_type)
+         goto type_serialization_error;
+
+      ret_type = glsl_type::get_array_instance(element_type, length);
+      goto return_type;
+   }
+
+   /* Structure and interface types have fields consisting of names and types. */
+   else if (base_type == GLSL_TYPE_STRUCT ||
+      base_type == GLSL_TYPE_INTERFACE) {
+      glsl_struct_field *fields = ralloc_array(NULL, glsl_struct_field, length);
+      if (!fields)
+         goto type_serialization_error;
+
+      for (unsigned k = 0; k < length; k++) {
+         map->read(&fields[k], sizeof(glsl_struct_field));
+         char *field_name = map->read_string();
+         fields[k].name = _mesa_strdup(field_name);
+         fields[k].type = deserialize_glsl_type(map, state, type_hash);
+         /* Break out of the loop if read errors occurred. */
+         if (map->errors())
+            goto type_serialization_error;
+      }
+
+      if (base_type == GLSL_TYPE_STRUCT)
+         ret_type = glsl_type::get_record_instance(fields, length, name);
+      else if (base_type == GLSL_TYPE_INTERFACE)
+         ret_type =
+            glsl_type::get_interface_instance(fields, length,
+                                              (glsl_interface_packing)
+                                              interface_packing, name);
+      /* Free allocated memory. */
+      for (unsigned k = 0; k < length; k++)
+         free((void *)fields[k].name);
+      ralloc_free(fields);
+
+      goto return_type;
+   } else if (base_type == GLSL_TYPE_UINT) {
+      /* Support uint as a user type if it wasn't available
+       * in builtin_type_versions.  This can happen during
+       * IR lowering passes.
+       */
+      ret_type = glsl_type::uint_type;
+      goto return_type;
+   }
+
+   /* Should never fall through to here! */
+type_serialization_error:
+   assert(!"error deserializing glsl_type");
+   return glsl_type::error_type;
+
+return_type:
+   /* Store the user type into the hash. */
+   _mesa_hash_table_insert(type_hash, _mesa_hash_string(name),
+                           (void*) (uintptr_t) type_id,
+                           (void*) ret_type);
+   return ret_type;
+}
+
+
+
+
+bool
+glsl_type::contains_sampler() const
+{
+   if (this->is_array()) {
+      return this->fields.array->contains_sampler();
+   } else if (this->is_record()) {
+      for (unsigned int i = 0; i < this->length; i++) {
+	 if (this->fields.structure[i].type->contains_sampler())
+	    return true;
+      }
+      return false;
+   } else {
+      return this->is_sampler();
+   }
+}
+
+
+bool
+glsl_type::contains_integer() const
+{
+   if (this->is_array()) {
+      return this->fields.array->contains_integer();
+   } else if (this->is_record()) {
+      for (unsigned int i = 0; i < this->length; i++) {
+	 if (this->fields.structure[i].type->contains_integer())
+	    return true;
+      }
+      return false;
+   } else {
+      return this->is_integer();
+   }
+}
+
+bool
+glsl_type::contains_opaque() const {
+   switch (base_type) {
+   case GLSL_TYPE_SAMPLER:
+   case GLSL_TYPE_IMAGE:
+   case GLSL_TYPE_ATOMIC_UINT:
+      return true;
+   case GLSL_TYPE_ARRAY:
+      return element_type()->contains_opaque();
+   case GLSL_TYPE_STRUCT:
+      for (unsigned int i = 0; i < length; i++) {
+         if (fields.structure[i].type->contains_opaque())
+            return true;
+      }
+      return false;
+   default:
+      return false;
+   }
+}
+
+gl_texture_index
+glsl_type::sampler_index() const
+{
+   const glsl_type *const t = (this->is_array()) ? this->fields.array : this;
+
+   assert(t->is_sampler());
+
+   switch (t->sampler_dimensionality) {
+   case GLSL_SAMPLER_DIM_1D:
+      return (t->sampler_array) ? TEXTURE_1D_ARRAY_INDEX : TEXTURE_1D_INDEX;
+   case GLSL_SAMPLER_DIM_2D:
+      return (t->sampler_array) ? TEXTURE_2D_ARRAY_INDEX : TEXTURE_2D_INDEX;
+   case GLSL_SAMPLER_DIM_3D:
+      return TEXTURE_3D_INDEX;
+   case GLSL_SAMPLER_DIM_CUBE:
+      return (t->sampler_array) ? TEXTURE_CUBE_ARRAY_INDEX : TEXTURE_CUBE_INDEX;
+   case GLSL_SAMPLER_DIM_RECT:
+      return TEXTURE_RECT_INDEX;
+   case GLSL_SAMPLER_DIM_BUF:
+      return TEXTURE_BUFFER_INDEX;
+   case GLSL_SAMPLER_DIM_EXTERNAL:
+      return TEXTURE_EXTERNAL_INDEX;
+   case GLSL_SAMPLER_DIM_MS:
+      return (t->sampler_array) ? TEXTURE_2D_MULTISAMPLE_ARRAY_INDEX : TEXTURE_2D_MULTISAMPLE_INDEX;
+   default:
+      assert(!"Should not get here.");
+      return TEXTURE_BUFFER_INDEX;
+   }
+}
+
+bool
+glsl_type::contains_image() const
+{
+   if (this->is_array()) {
+      return this->fields.array->contains_image();
+   } else if (this->is_record()) {
+      for (unsigned int i = 0; i < this->length; i++) {
+	 if (this->fields.structure[i].type->contains_image())
+	    return true;
+      }
+      return false;
+   } else {
+      return this->is_image();
+   }
+}
+
+const glsl_type *glsl_type::get_base_type() const
+{
+   switch (base_type) {
+   case GLSL_TYPE_UINT:
+      return uint_type;
+   case GLSL_TYPE_INT:
+      return int_type;
+   case GLSL_TYPE_FLOAT:
+      return float_type;
+   case GLSL_TYPE_BOOL:
+      return bool_type;
+   default:
+      return error_type;
+   }
+}
+
+
+const glsl_type *glsl_type::get_scalar_type() const
+{
+   const glsl_type *type = this;
+
+   /* Handle arrays */
+   while (type->base_type == GLSL_TYPE_ARRAY)
+      type = type->fields.array;
+
+   /* Handle vectors and matrices */
+   switch (type->base_type) {
+   case GLSL_TYPE_UINT:
+      return uint_type;
+   case GLSL_TYPE_INT:
+      return int_type;
+   case GLSL_TYPE_FLOAT:
+      return float_type;
+   case GLSL_TYPE_BOOL:
+      return bool_type;
+   default:
+      /* Handle everything else */
+      return type;
+   }
+}
+
+
+void
+_mesa_glsl_release_types(void)
+{
+   mtx_lock(&glsl_type::mutex);
+
+   if (glsl_type::array_types != NULL) {
+      hash_table_dtor(glsl_type::array_types);
+      glsl_type::array_types = NULL;
+   }
+
+   if (glsl_type::record_types != NULL) {
+      hash_table_dtor(glsl_type::record_types);
+      glsl_type::record_types = NULL;
+   }
+
+   mtx_unlock(&glsl_type::mutex);
+}
+
+
+glsl_type::glsl_type(const glsl_type *array, unsigned length) :
+   base_type(GLSL_TYPE_ARRAY),
+   sampler_dimensionality(0), sampler_shadow(0), sampler_array(0),
+   sampler_type(0), interface_packing(0),
+   vector_elements(0), matrix_columns(0),
+   name(NULL), length(length)
+{
+   this->fields.array = array;
+   /* Inherit the gl type of the base. The GL type is used for
+    * uniform/statevar handling in Mesa and the arrayness of the type
+    * is represented by the size rather than the type.
+    */
+   this->gl_type = array->gl_type;
+
+   /* Allow a maximum of 10 characters for the array size.  This is enough
+    * for 32-bits of ~0.  The extra 3 are for the '[', ']', and terminating
+    * NUL.
+    */
+   const unsigned name_length = strlen(array->name) + 10 + 3;
+
+   mtx_lock(&glsl_type::mutex);
+   char *const n = (char *) ralloc_size(this->mem_ctx, name_length);
+   mtx_unlock(&glsl_type::mutex);
+
+   if (length == 0)
+      snprintf(n, name_length, "%s[]", array->name);
+   else {
+      /* Insert the outermost dimension in the correct spot;
+       * otherwise the dimension order will be backwards.
+       */
+      const char *pos = strchr(array->name, '[');
+      if (pos) {
+         int idx = pos - array->name;
+         snprintf(n, idx+1, "%s", array->name);
+         snprintf(n + idx, name_length - idx, "[%u]%s",
+                  length, array->name + idx);
+      } else {
+         snprintf(n, name_length, "%s[%u]", array->name, length);
+      }
+   }
+
+   this->name = n;
+}
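+
+/* Examples of the name construction above:
+ *
+ *    float,   length 4  ->  "float[4]"
+ *    float,   length 0  ->  "float[]"      (unsized array)
+ *    vec4[2], length 3  ->  "vec4[3][2]"   (outermost dimension inserted first)
+ */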
+
+
+const glsl_type *
+glsl_type::vec(unsigned components)
+{
+   if (components == 0 || components > 4)
+      return error_type;
+
+   static const glsl_type *const ts[] = {
+      float_type, vec2_type, vec3_type, vec4_type
+   };
+   return ts[components - 1];
+}
+
+
+const glsl_type *
+glsl_type::ivec(unsigned components)
+{
+   if (components == 0 || components > 4)
+      return error_type;
+
+   static const glsl_type *const ts[] = {
+      int_type, ivec2_type, ivec3_type, ivec4_type
+   };
+   return ts[components - 1];
+}
+
+
+const glsl_type *
+glsl_type::uvec(unsigned components)
+{
+   if (components == 0 || components > 4)
+      return error_type;
+
+   static const glsl_type *const ts[] = {
+      uint_type, uvec2_type, uvec3_type, uvec4_type
+   };
+   return ts[components - 1];
+}
+
+
+const glsl_type *
+glsl_type::bvec(unsigned components)
+{
+   if (components == 0 || components > 4)
+      return error_type;
+
+   static const glsl_type *const ts[] = {
+      bool_type, bvec2_type, bvec3_type, bvec4_type
+   };
+   return ts[components - 1];
+}
+
+
+const glsl_type *
+glsl_type::get_instance(unsigned base_type, unsigned rows, unsigned columns)
+{
+   if (base_type == GLSL_TYPE_VOID)
+      return void_type;
+
+   if ((rows < 1) || (rows > 4) || (columns < 1) || (columns > 4))
+      return error_type;
+
+   /* Treat GLSL vectors as Nx1 matrices.
+    */
+   if (columns == 1) {
+      switch (base_type) {
+      case GLSL_TYPE_UINT:
+	 return uvec(rows);
+      case GLSL_TYPE_INT:
+	 return ivec(rows);
+      case GLSL_TYPE_FLOAT:
+	 return vec(rows);
+      case GLSL_TYPE_BOOL:
+	 return bvec(rows);
+      default:
+	 return error_type;
+      }
+   } else {
+      if ((base_type != GLSL_TYPE_FLOAT) || (rows == 1))
+	 return error_type;
+
+      /* GLSL matrix types are named mat{COLUMNS}x{ROWS}.  Only the following
+       * combinations are valid:
+       *
+       *   1 2 3 4
+       * 1
+       * 2   x x x
+       * 3   x x x
+       * 4   x x x
+       */
+#define IDX(c,r) (((c-1)*3) + (r-1))
+
+      switch (IDX(columns, rows)) {
+      case IDX(2,2): return mat2_type;
+      case IDX(2,3): return mat2x3_type;
+      case IDX(2,4): return mat2x4_type;
+      case IDX(3,2): return mat3x2_type;
+      case IDX(3,3): return mat3_type;
+      case IDX(3,4): return mat3x4_type;
+      case IDX(4,2): return mat4x2_type;
+      case IDX(4,3): return mat4x3_type;
+      case IDX(4,4): return mat4_type;
+      default: return error_type;
+      }
+   }
+
+   assert(!"Should not get here.");
+   return error_type;
+}
+
+
+const glsl_type *
+glsl_type::get_array_instance(const glsl_type *base, unsigned array_size)
+{
+   /* Generate a name using the base type pointer in the key.  This is
+    * done because the name of the base type may not be unique across
+    * shaders.  For example, two shaders may have different record types
+    * named 'foo'.
+    */
+   char key[128];
+   snprintf(key, sizeof(key), "%p[%u]", (void *) base, array_size);
+
+   mtx_lock(&glsl_type::mutex);
+
+   if (array_types == NULL) {
+      array_types = hash_table_ctor(64, hash_table_string_hash,
+				    hash_table_string_compare);
+   }
+
+   const glsl_type *t = (glsl_type *) hash_table_find(array_types, key);
+
+   if (t == NULL) {
+      mtx_unlock(&glsl_type::mutex);
+      t = new glsl_type(base, array_size);
+      mtx_lock(&glsl_type::mutex);
+
+      hash_table_insert(array_types, (void *) t, ralloc_strdup(mem_ctx, key));
+   }
+
+   assert(t->base_type == GLSL_TYPE_ARRAY);
+   assert(t->length == array_size);
+   assert(t->fields.array == base);
+
+   mtx_unlock(&glsl_type::mutex);
+
+   return t;
+}
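+
+/* Because array types are interned in 'array_types', requesting the same
+ * element type and size twice yields pointer-identical results, so type
+ * equality elsewhere in the compiler reduces to a pointer comparison:
+ *
+ *    const glsl_type *a = glsl_type::get_array_instance(glsl_type::vec4_type, 8);
+ *    const glsl_type *b = glsl_type::get_array_instance(glsl_type::vec4_type, 8);
+ *    assert(a == b);
+ */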
+
+
+bool
+glsl_type::record_compare(const glsl_type *b) const
+{
+   if (this->length != b->length)
+      return false;
+
+   if (this->interface_packing != b->interface_packing)
+      return false;
+
+   for (unsigned i = 0; i < this->length; i++) {
+      if (this->fields.structure[i].type != b->fields.structure[i].type)
+	 return false;
+      if (strcmp(this->fields.structure[i].name,
+		 b->fields.structure[i].name) != 0)
+	 return false;
+      if (this->fields.structure[i].row_major
+         != b->fields.structure[i].row_major)
+        return false;
+      if (this->fields.structure[i].location
+          != b->fields.structure[i].location)
+         return false;
+      if (this->fields.structure[i].interpolation
+          != b->fields.structure[i].interpolation)
+         return false;
+      if (this->fields.structure[i].centroid
+          != b->fields.structure[i].centroid)
+         return false;
+      if (this->fields.structure[i].sample
+          != b->fields.structure[i].sample)
+         return false;
+   }
+
+   return true;
+}
+
+
+int
+glsl_type::record_key_compare(const void *a, const void *b)
+{
+   const glsl_type *const key1 = (glsl_type *) a;
+   const glsl_type *const key2 = (glsl_type *) b;
+
+   /* Return zero if the types match (there is zero difference) or non-zero
+    * otherwise.
+    */
+   if (strcmp(key1->name, key2->name) != 0)
+      return 1;
+
+   return !key1->record_compare(key2);
+}
+
+
+unsigned
+glsl_type::record_key_hash(const void *a)
+{
+   const glsl_type *const key = (glsl_type *) a;
+   char hash_key[128];
+   unsigned size = 0;
+
+   size = snprintf(hash_key, sizeof(hash_key), "%08x", key->length);
+
+   for (unsigned i = 0; i < key->length; i++) {
+      if (size >= sizeof(hash_key))
+	 break;
+
+      size += snprintf(& hash_key[size], sizeof(hash_key) - size,
+		       "%p", (void *) key->fields.structure[i].type);
+   }
+
+   return hash_table_string_hash(& hash_key);
+}
+
+
+const glsl_type *
+glsl_type::get_record_instance(const glsl_struct_field *fields,
+			       unsigned num_fields,
+			       const char *name)
+{
+   const glsl_type key(fields, num_fields, name);
+
+   mtx_lock(&glsl_type::mutex);
+
+   if (record_types == NULL) {
+      record_types = hash_table_ctor(64, record_key_hash, record_key_compare);
+   }
+
+   const glsl_type *t = (glsl_type *) hash_table_find(record_types, & key);
+   if (t == NULL) {
+      mtx_unlock(&glsl_type::mutex);
+      t = new glsl_type(fields, num_fields, name);
+      mtx_lock(&glsl_type::mutex);
+
+      hash_table_insert(record_types, (void *) t, t);
+   }
+
+   assert(t->base_type == GLSL_TYPE_STRUCT);
+   assert(t->length == num_fields);
+   assert(strcmp(t->name, name) == 0);
+
+   mtx_unlock(&glsl_type::mutex);
+
+   return t;
+}
+
+
+const glsl_type *
+glsl_type::get_interface_instance(const glsl_struct_field *fields,
+				  unsigned num_fields,
+				  enum glsl_interface_packing packing,
+				  const char *block_name)
+{
+   const glsl_type key(fields, num_fields, packing, block_name);
+
+   mtx_lock(&glsl_type::mutex);
+
+   if (interface_types == NULL) {
+      interface_types = hash_table_ctor(64, record_key_hash, record_key_compare);
+   }
+
+   const glsl_type *t = (glsl_type *) hash_table_find(interface_types, & key);
+   if (t == NULL) {
+      mtx_unlock(&glsl_type::mutex);
+      t = new glsl_type(fields, num_fields, packing, block_name);
+      mtx_lock(&glsl_type::mutex);
+
+      hash_table_insert(interface_types, (void *) t, t);
+   }
+
+   assert(t->base_type == GLSL_TYPE_INTERFACE);
+   assert(t->length == num_fields);
+   assert(strcmp(t->name, block_name) == 0);
+
+   mtx_unlock(&glsl_type::mutex);
+
+   return t;
+}
+
+
+const glsl_type *
+glsl_type::field_type(const char *name) const
+{
+   if (this->base_type != GLSL_TYPE_STRUCT
+       && this->base_type != GLSL_TYPE_INTERFACE)
+      return error_type;
+
+   for (unsigned i = 0; i < this->length; i++) {
+      if (strcmp(name, this->fields.structure[i].name) == 0)
+	 return this->fields.structure[i].type;
+   }
+
+   return error_type;
+}
+
+
+int
+glsl_type::field_index(const char *name) const
+{
+   if (this->base_type != GLSL_TYPE_STRUCT
+       && this->base_type != GLSL_TYPE_INTERFACE)
+      return -1;
+
+   for (unsigned i = 0; i < this->length; i++) {
+      if (strcmp(name, this->fields.structure[i].name) == 0)
+	 return i;
+   }
+
+   return -1;
+}
+
+
+unsigned
+glsl_type::component_slots() const
+{
+   switch (this->base_type) {
+   case GLSL_TYPE_UINT:
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_BOOL:
+      return this->components();
+
+   case GLSL_TYPE_STRUCT:
+   case GLSL_TYPE_INTERFACE: {
+      unsigned size = 0;
+
+      for (unsigned i = 0; i < this->length; i++)
+	 size += this->fields.structure[i].type->component_slots();
+
+      return size;
+   }
+
+   case GLSL_TYPE_ARRAY:
+      return this->length * this->fields.array->component_slots();
+
+   case GLSL_TYPE_IMAGE:
+      return 1;
+
+   case GLSL_TYPE_SAMPLER:
+   case GLSL_TYPE_ATOMIC_UINT:
+   case GLSL_TYPE_VOID:
+   case GLSL_TYPE_ERROR:
+      break;
+   }
+
+   return 0;
+}
+
+bool
+glsl_type::can_implicitly_convert_to(const glsl_type *desired) const
+{
+   if (this == desired)
+      return true;
+
+   /* There is no conversion among matrix types. */
+   if (this->matrix_columns > 1 || desired->matrix_columns > 1)
+      return false;
+
+   /* int and uint can be converted to float. */
+   return desired->is_float()
+          && this->is_integer()
+          && this->vector_elements == desired->vector_elements;
+}
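+
+/* Examples of the conversion rule above:
+ *
+ *    int   -> float : allowed (integer widened to float)
+ *    ivec3 -> vec3  : allowed (element counts match)
+ *    float -> int   : not allowed (no implicit narrowing)
+ *    mat2  -> mat2  : allowed only because the types are identical
+ */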
+
+unsigned
+glsl_type::std140_base_alignment(bool row_major) const
+{
+   /* (1) If the member is a scalar consuming <N> basic machine units, the
+    *     base alignment is <N>.
+    *
+    * (2) If the member is a two- or four-component vector with components
+    *     consuming <N> basic machine units, the base alignment is 2<N> or
+    *     4<N>, respectively.
+    *
+    * (3) If the member is a three-component vector with components consuming
+    *     <N> basic machine units, the base alignment is 4<N>.
+    */
+   if (this->is_scalar() || this->is_vector()) {
+      switch (this->vector_elements) {
+      case 1:
+	 return 4;
+      case 2:
+	 return 8;
+      case 3:
+      case 4:
+	 return 16;
+      }
+   }
+
+   /* (4) If the member is an array of scalars or vectors, the base alignment
+    *     and array stride are set to match the base alignment of a single
+    *     array element, according to rules (1), (2), and (3), and rounded up
+    *     to the base alignment of a vec4. The array may have padding at the
+    *     end; the base offset of the member following the array is rounded up
+    *     to the next multiple of the base alignment.
+    *
+    * (6) If the member is an array of <S> column-major matrices with <C>
+    *     columns and <R> rows, the matrix is stored identically to a row of
+    *     <S>*<C> column vectors with <R> components each, according to rule
+    *     (4).
+    *
+    * (8) If the member is an array of <S> row-major matrices with <C> columns
+    *     and <R> rows, the matrix is stored identically to a row of <S>*<R>
+    *     row vectors with <C> components each, according to rule (4).
+    *
+    * (10) If the member is an array of <S> structures, the <S> elements of
+    *      the array are laid out in order, according to rule (9).
+    */
+   if (this->is_array()) {
+      if (this->fields.array->is_scalar() ||
+	  this->fields.array->is_vector() ||
+	  this->fields.array->is_matrix()) {
+	 return MAX2(this->fields.array->std140_base_alignment(row_major), 16);
+      } else {
+	 assert(this->fields.array->is_record());
+	 return this->fields.array->std140_base_alignment(row_major);
+      }
+   }
+
+   /* (5) If the member is a column-major matrix with <C> columns and
+    *     <R> rows, the matrix is stored identically to an array of
+    *     <C> column vectors with <R> components each, according to
+    *     rule (4).
+    *
+    * (7) If the member is a row-major matrix with <C> columns and <R>
+    *     rows, the matrix is stored identically to an array of <R>
+    *     row vectors with <C> components each, according to rule (4).
+    */
+   if (this->is_matrix()) {
+      const struct glsl_type *vec_type, *array_type;
+      int c = this->matrix_columns;
+      int r = this->vector_elements;
+
+      if (row_major) {
+	 vec_type = get_instance(GLSL_TYPE_FLOAT, c, 1);
+	 array_type = glsl_type::get_array_instance(vec_type, r);
+      } else {
+	 vec_type = get_instance(GLSL_TYPE_FLOAT, r, 1);
+	 array_type = glsl_type::get_array_instance(vec_type, c);
+      }
+
+      return array_type->std140_base_alignment(false);
+   }
+
+   /* (9) If the member is a structure, the base alignment of the
+    *     structure is <N>, where <N> is the largest base alignment
+    *     value of any of its members, and rounded up to the base
+    *     alignment of a vec4. The individual members of this
+    *     sub-structure are then assigned offsets by applying this set
+    *     of rules recursively, where the base offset of the first
+    *     member of the sub-structure is equal to the aligned offset
+    *     of the structure. The structure may have padding at the end;
+    *     the base offset of the member following the sub-structure is
+    *     rounded up to the next multiple of the base alignment of the
+    *     structure.
+    */
+   if (this->is_record()) {
+      unsigned base_alignment = 16;
+      for (unsigned i = 0; i < this->length; i++) {
+	 const struct glsl_type *field_type = this->fields.structure[i].type;
+	 base_alignment = MAX2(base_alignment,
+			       field_type->std140_base_alignment(row_major));
+      }
+      return base_alignment;
+   }
+
+   assert(!"not reached");
+   return -1;
+}
+
+unsigned
+glsl_type::std140_size(bool row_major) const
+{
+   /* (1) If the member is a scalar consuming <N> basic machine units, the
+    *     base alignment is <N>.
+    *
+    * (2) If the member is a two- or four-component vector with components
+    *     consuming <N> basic machine units, the base alignment is 2<N> or
+    *     4<N>, respectively.
+    *
+    * (3) If the member is a three-component vector with components consuming
+    *     <N> basic machine units, the base alignment is 4<N>.
+    */
+   if (this->is_scalar() || this->is_vector()) {
+      return this->vector_elements * 4;
+   }
+
+   /* (5) If the member is a column-major matrix with <C> columns and
+    *     <R> rows, the matrix is stored identically to an array of
+    *     <C> column vectors with <R> components each, according to
+    *     rule (4).
+    *
+    * (6) If the member is an array of <S> column-major matrices with <C>
+    *     columns and <R> rows, the matrix is stored identically to a row of
+    *     <S>*<C> column vectors with <R> components each, according to rule
+    *     (4).
+    *
+    * (7) If the member is a row-major matrix with <C> columns and <R>
+    *     rows, the matrix is stored identically to an array of <R>
+    *     row vectors with <C> components each, according to rule (4).
+    *
+    * (8) If the member is an array of <S> row-major matrices with <C> columns
+    *     and <R> rows, the matrix is stored identically to a row of <S>*<R>
+    *     row vectors with <C> components each, according to rule (4).
+    */
+   if (this->is_matrix() || (this->is_array() &&
+			     this->fields.array->is_matrix())) {
+      const struct glsl_type *element_type;
+      const struct glsl_type *vec_type;
+      unsigned int array_len;
+
+      if (this->is_array()) {
+	 element_type = this->fields.array;
+	 array_len = this->length;
+      } else {
+	 element_type = this;
+	 array_len = 1;
+      }
+
+      if (row_major) {
+	 vec_type = get_instance(GLSL_TYPE_FLOAT,
+				 element_type->matrix_columns, 1);
+	 array_len *= element_type->vector_elements;
+      } else {
+	 vec_type = get_instance(GLSL_TYPE_FLOAT,
+				 element_type->vector_elements, 1);
+	 array_len *= element_type->matrix_columns;
+      }
+      const glsl_type *array_type = glsl_type::get_array_instance(vec_type,
+								  array_len);
+
+      return array_type->std140_size(false);
+   }
+
+   /* (4) If the member is an array of scalars or vectors, the base alignment
+    *     and array stride are set to match the base alignment of a single
+    *     array element, according to rules (1), (2), and (3), and rounded up
+    *     to the base alignment of a vec4. The array may have padding at the
+    *     end; the base offset of the member following the array is rounded up
+    *     to the next multiple of the base alignment.
+    *
+    * (10) If the member is an array of <S> structures, the <S> elements of
+    *      the array are laid out in order, according to rule (9).
+    */
+   if (this->is_array()) {
+      if (this->fields.array->is_record()) {
+	 return this->length * this->fields.array->std140_size(row_major);
+      } else {
+	 unsigned element_base_align =
+	    this->fields.array->std140_base_alignment(row_major);
+	 return this->length * MAX2(element_base_align, 16);
+      }
+   }
+
+   /* (9) If the member is a structure, the base alignment of the
+    *     structure is <N>, where <N> is the largest base alignment
+    *     value of any of its members, and rounded up to the base
+    *     alignment of a vec4. The individual members of this
+    *     sub-structure are then assigned offsets by applying this set
+    *     of rules recursively, where the base offset of the first
+    *     member of the sub-structure is equal to the aligned offset
+    *     of the structure. The structure may have padding at the end;
+    *     the base offset of the member following the sub-structure is
+    *     rounded up to the next multiple of the base alignment of the
+    *     structure.
+    */
+   if (this->is_record()) {
+      unsigned size = 0;
+      for (unsigned i = 0; i < this->length; i++) {
+	 const struct glsl_type *field_type = this->fields.structure[i].type;
+	 unsigned align = field_type->std140_base_alignment(row_major);
+	 size = glsl_align(size, align);
+	 size += field_type->std140_size(row_major);
+      }
+      size = glsl_align(size,
+			this->fields.structure[0].type->std140_base_alignment(row_major));
+      return size;
+   }
+
+   assert(!"not reached");
+   return -1;
+}
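+
+/* A worked std140 example for the two functions above, assuming the
+ * column-major (row_major == false) case:
+ *
+ *    struct { vec3 a; float b; mat3 c; }
+ *
+ *    a: base alignment 16 (rule 3), size 12, offset  0
+ *    b: base alignment  4 (rule 1), size  4, offset 12
+ *    c: stored as 3 vec3 columns padded to vec4 strides (rules 4/5):
+ *       base alignment 16, size 48, offset 16
+ *
+ *    struct size = align(16 + 48, 16) = 64 bytes
+ */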
+
+
+unsigned
+glsl_type::count_attribute_slots() const
+{
+   /* From page 31 (page 37 of the PDF) of the GLSL 1.50 spec:
+    *
+    *     "A scalar input counts the same amount against this limit as a vec4,
+    *     so applications may want to consider packing groups of four
+    *     unrelated float inputs together into a vector to better utilize the
+    *     capabilities of the underlying hardware. A matrix input will use up
+    *     multiple locations.  The number of locations used will equal the
+    *     number of columns in the matrix."
+    *
+    * The spec does not explicitly say how arrays are counted.  However, it
+    * should be safe to assume the total number of slots consumed by an array
+    * is the number of entries in the array multiplied by the number of slots
+    * consumed by a single element of the array.
+    *
+    * The spec says nothing about how structs are counted, because vertex
+    * attributes are not allowed to be (or contain) structs.  However, Mesa
+    * allows varying structs; the number of varying slots taken up by a
+    * varying struct is simply equal to the sum of the number of slots taken
+    * up by each element.
+    */
+   switch (this->base_type) {
+   case GLSL_TYPE_UINT:
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_BOOL:
+      return this->matrix_columns;
+
+   case GLSL_TYPE_STRUCT:
+   case GLSL_TYPE_INTERFACE: {
+      unsigned size = 0;
+
+      for (unsigned i = 0; i < this->length; i++)
+         size += this->fields.structure[i].type->count_attribute_slots();
+
+      return size;
+   }
+
+   case GLSL_TYPE_ARRAY:
+      return this->length * this->fields.array->count_attribute_slots();
+
+   case GLSL_TYPE_SAMPLER:
+   case GLSL_TYPE_IMAGE:
+   case GLSL_TYPE_ATOMIC_UINT:
+   case GLSL_TYPE_VOID:
+   case GLSL_TYPE_ERROR:
+      break;
+   }
+
+   assert(!"Unexpected type in count_attribute_slots()");
+
+   return 0;
+}
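+
+/* Examples of the slot counting above:
+ *
+ *    float    -> 1 slot   (matrix_columns == 1)
+ *    vec4     -> 1 slot
+ *    mat4     -> 4 slots  (one per column)
+ *    vec2[3]  -> 3 slots  (array length * slots per element)
+ */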
+
+int
+glsl_type::coordinate_components() const
+{
+   int size;
+
+   switch (sampler_dimensionality) {
+   case GLSL_SAMPLER_DIM_1D:
+   case GLSL_SAMPLER_DIM_BUF:
+      size = 1;
+      break;
+   case GLSL_SAMPLER_DIM_2D:
+   case GLSL_SAMPLER_DIM_RECT:
+   case GLSL_SAMPLER_DIM_MS:
+   case GLSL_SAMPLER_DIM_EXTERNAL:
+      size = 2;
+      break;
+   case GLSL_SAMPLER_DIM_3D:
+   case GLSL_SAMPLER_DIM_CUBE:
+      size = 3;
+      break;
+   default:
+      assert(!"Should not get here.");
+      size = 1;
+      break;
+   }
+
+   /* Array textures need an additional component for the array index. */
+   if (sampler_array)
+      size += 1;
+
+   return size;
+}
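+
+/* Examples of the coordinate component counts above:
+ *
+ *    sampler1D        -> 1
+ *    sampler2D        -> 2
+ *    samplerCube      -> 3
+ *    sampler2DArray   -> 3  (2 + 1 for the layer index)
+ *    samplerCubeArray -> 4  (3 + 1 for the layer index)
+ */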
diff --git a/icd/intel/compiler/shader/hir_field_selection.cpp b/icd/intel/compiler/shader/hir_field_selection.cpp
new file mode 100644
index 0000000..1e92c89
--- /dev/null
+++ b/icd/intel/compiler/shader/hir_field_selection.cpp
@@ -0,0 +1,122 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ir.h"
+#include "program/symbol_table.h"
+#include "glsl_parser_extras.h"
+#include "ast.h"
+#include "glsl_types.h"
+
+ir_rvalue *
+_mesa_ast_field_selection_to_hir(const ast_expression *expr,
+				 exec_list *instructions,
+				 struct _mesa_glsl_parse_state *state)
+{
+   void *ctx = state;
+   ir_rvalue *result = NULL;
+   ir_rvalue *op;
+
+   op = expr->subexpressions[0]->hir(instructions, state);
+
+   /* There are two kinds of field selection.  There is the selection of a
+    * specific field from a structure, and there is the selection of a
+    * swizzle / mask from a vector.  Which is which is determined entirely
+    * by the base type of the thing to which the field selection operator is
+    * being applied.
+    */
+   YYLTYPE loc = expr->get_location();
+   if (op->type->is_error()) {
+      /* silently propagate the error */
+   } else if (op->type->base_type == GLSL_TYPE_STRUCT
+              || op->type->base_type == GLSL_TYPE_INTERFACE) {
+      result = new(ctx) ir_dereference_record(op,
+					      expr->primary_expression.identifier);
+
+      if (result->type->is_error()) {
+	 _mesa_glsl_error(& loc, state, "cannot access field `%s' of "
+			  "structure",
+			  expr->primary_expression.identifier);
+      }
+   } else if (expr->subexpressions[1] != NULL) {
+      /* Handle "method calls" in GLSL 1.20 - namely, array.length() */
+      state->check_version(120, 300, &loc, "methods not supported");
+
+      ast_expression *call = expr->subexpressions[1];
+      assert(call->oper == ast_function_call);
+
+      const char *method;
+      method = call->subexpressions[0]->primary_expression.identifier;
+
+      if (strcmp(method, "length") == 0) {
+         if (!call->expressions.is_empty())
+            _mesa_glsl_error(&loc, state, "length method takes no arguments");
+
+         if (op->type->is_array()) {
+            if (op->type->is_unsized_array())
+               _mesa_glsl_error(&loc, state, "length called on unsized array");
+
+            result = new(ctx) ir_constant(op->type->array_size());
+         } else if (op->type->is_vector()) {
+            if (state->ARB_shading_language_420pack_enable) {
+               /* .length() returns int. */
+               result = new(ctx) ir_constant((int) op->type->vector_elements);
+            } else {
+               _mesa_glsl_error(&loc, state, "length method on matrix only available"
+                                             "with ARB_shading_language_420pack");
+            }
+         } else if (op->type->is_matrix()) {
+            if (state->ARB_shading_language_420pack_enable) {
+               /* .length() returns int. */
+               result = new(ctx) ir_constant((int) op->type->matrix_columns);
+            } else {
+               _mesa_glsl_error(&loc, state, "length method on matrix only available"
+                                             "with ARB_shading_language_420pack");
+            }
+         }
+      } else {
+	 _mesa_glsl_error(&loc, state, "unknown method: `%s'", method);
+      }
+   } else if (op->type->is_vector() ||
+              (state->ARB_shading_language_420pack_enable &&
+               op->type->is_scalar())) {
+      ir_swizzle *swiz = ir_swizzle::create(op,
+					    expr->primary_expression.identifier,
+					    op->type->vector_elements);
+      if (swiz != NULL) {
+	 result = swiz;
+      } else {
+	 /* FINISHME: Logging of error messages should be moved into
+	  * FINISHME: ir_swizzle::create.  This allows the generation of more
+	  * FINISHME: specific error messages.
+	  */
+	 _mesa_glsl_error(& loc, state, "invalid swizzle / mask `%s'",
+			  expr->primary_expression.identifier);
+      }
+   } else {
+      _mesa_glsl_error(& loc, state, "cannot access field `%s' of "
+		       "non-structure / non-vector",
+		       expr->primary_expression.identifier);
+   }
+
+   return result ? result : ir_rvalue::error_value(ctx);
+}
diff --git a/icd/intel/compiler/shader/ir.cpp b/icd/intel/compiler/shader/ir.cpp
new file mode 100644
index 0000000..2a3a5c6
--- /dev/null
+++ b/icd/intel/compiler/shader/ir.cpp
@@ -0,0 +1,1896 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <string.h>
+#include "ir.h"
+#include "ir_visitor.h"
+#include "glsl_types.h"
+#include "libfns.h"
+
+ir_rvalue::ir_rvalue()
+{
+   this->type = glsl_type::error_type;
+}
+
+bool ir_rvalue::is_zero() const
+{
+   return false;
+}
+
+bool ir_rvalue::is_one() const
+{
+   return false;
+}
+
+bool ir_rvalue::is_negative_one() const
+{
+   return false;
+}
+
+bool ir_rvalue::is_basis() const
+{
+   return false;
+}
+
+/**
+ * Modify the swizzle mask to move one component to another
+ *
+ * \param m    IR swizzle to be modified
+ * \param from Component in the RHS that is to be swizzled
+ * \param to   Desired swizzle location of \c from
+ */
+static void
+update_rhs_swizzle(ir_swizzle_mask &m, unsigned from, unsigned to)
+{
+   switch (to) {
+   case 0: m.x = from; break;
+   case 1: m.y = from; break;
+   case 2: m.z = from; break;
+   case 3: m.w = from; break;
+   default: assert(!"Should not get here.");
+   }
+
+   m.num_components = MAX2(m.num_components, (to + 1));
+}
+
+void
+ir_assignment::set_lhs(ir_rvalue *lhs)
+{
+   void *mem_ctx = this;
+   bool swizzled = false;
+
+   while (lhs != NULL) {
+      ir_swizzle *swiz = lhs->as_swizzle();
+
+      if (swiz == NULL)
+	 break;
+
+      unsigned write_mask = 0;
+      ir_swizzle_mask rhs_swiz = { 0, 0, 0, 0, 0, 0 };
+
+      for (unsigned i = 0; i < swiz->mask.num_components; i++) {
+	 unsigned c = 0;
+
+	 switch (i) {
+	 case 0: c = swiz->mask.x; break;
+	 case 1: c = swiz->mask.y; break;
+	 case 2: c = swiz->mask.z; break;
+	 case 3: c = swiz->mask.w; break;
+	 default: assert(!"Should not get here.");
+	 }
+
+	 write_mask |= (((this->write_mask >> i) & 1) << c);
+	 update_rhs_swizzle(rhs_swiz, i, c);
+      }
+
+      this->write_mask = write_mask;
+      lhs = swiz->val;
+
+      this->rhs = new(mem_ctx) ir_swizzle(this->rhs, rhs_swiz);
+      swizzled = true;
+   }
+
+   if (swizzled) {
+      /* Now, RHS channels line up with the LHS writemask.  Collapse it
+       * to just the channels that will be written.
+       */
+      ir_swizzle_mask rhs_swiz = { 0, 0, 0, 0, 0, 0 };
+      int rhs_chan = 0;
+      for (int i = 0; i < 4; i++) {
+	 if (write_mask & (1 << i))
+	    update_rhs_swizzle(rhs_swiz, i, rhs_chan++);
+      }
+      this->rhs = new(mem_ctx) ir_swizzle(this->rhs, rhs_swiz);
+   }
+
+   assert((lhs == NULL) || lhs->as_dereference());
+
+   this->lhs = (ir_dereference *) lhs;
+}
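+
+/* Worked example of set_lhs() (added comment; not in the original source):
+ * for the GLSL assignment "v.zy = e" with a vec2 RHS, the ir_swizzle on the
+ * LHS is peeled off, the write mask becomes .yz, and the RHS is re-swizzled
+ * so that its channels line up with the components actually written:
+ *
+ *    (assign (yz) (var_ref v) (swiz yx (var_ref e)))
+ *
+ * i.e. v.y receives e.y and v.z receives e.x.
+ */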
+
+ir_variable *
+ir_assignment::whole_variable_written()
+{
+   ir_variable *v = this->lhs->whole_variable_referenced();
+
+   if (v == NULL)
+      return NULL;
+
+   if (v->type->is_scalar())
+      return v;
+
+   if (v->type->is_vector()) {
+      const unsigned mask = (1U << v->type->vector_elements) - 1;
+
+      if (mask != this->write_mask)
+	 return NULL;
+   }
+
+   /* Either all the vector components are assigned or the variable is some
+    * composite type (and the whole thing is assigned).
+    */
+   return v;
+}
+
+ir_assignment::ir_assignment(ir_dereference *lhs, ir_rvalue *rhs,
+			     ir_rvalue *condition, unsigned write_mask)
+{
+   this->ir_type = ir_type_assignment;
+   this->condition = condition;
+   this->rhs = rhs;
+   this->lhs = lhs;
+   this->write_mask = write_mask;
+
+   if (lhs->type->is_scalar() || lhs->type->is_vector()) {
+      int lhs_components = 0;
+      for (int i = 0; i < 4; i++) {
+	 if (write_mask & (1 << i))
+	    lhs_components++;
+      }
+
+      assert(lhs_components == this->rhs->type->vector_elements);
+   }
+}
+
+ir_assignment::ir_assignment(ir_rvalue *lhs, ir_rvalue *rhs,
+			     ir_rvalue *condition)
+{
+   this->ir_type = ir_type_assignment;
+   this->condition = condition;
+   this->rhs = rhs;
+
+   /* If the RHS is a vector type, assume that all components of the vector
+    * type are being written to the LHS.  The write mask comes from the RHS
+    * because we can have a case where the LHS is a vec4 and the RHS is a
+    * vec3.  In that case, the assignment is:
+    *
+    *     (assign (...) (xyz) (var_ref lhs) (var_ref rhs))
+    */
+   if (rhs->type->is_vector())
+      this->write_mask = (1U << rhs->type->vector_elements) - 1;
+   else if (rhs->type->is_scalar())
+      this->write_mask = 1;
+   else
+      this->write_mask = 0;
+
+   this->set_lhs(lhs);
+}
+
+ir_expression::ir_expression(int op, const struct glsl_type *type,
+			     ir_rvalue *op0, ir_rvalue *op1,
+			     ir_rvalue *op2, ir_rvalue *op3)
+{
+   this->ir_type = ir_type_expression;
+   this->type = type;
+   this->operation = ir_expression_operation(op);
+   this->operands[0] = op0;
+   this->operands[1] = op1;
+   this->operands[2] = op2;
+   this->operands[3] = op3;
+#ifndef NDEBUG
+   int num_operands = get_num_operands(this->operation);
+   for (int i = num_operands; i < 4; i++) {
+      assert(this->operands[i] == NULL);
+   }
+#endif
+}
+
+ir_expression::ir_expression(int op, ir_rvalue *op0)
+{
+   this->ir_type = ir_type_expression;
+
+   this->operation = ir_expression_operation(op);
+   this->operands[0] = op0;
+   this->operands[1] = NULL;
+   this->operands[2] = NULL;
+   this->operands[3] = NULL;
+
+   assert(op <= ir_last_unop);
+
+   switch (this->operation) {
+   case ir_unop_bit_not:
+   case ir_unop_logic_not:
+   case ir_unop_neg:
+   case ir_unop_abs:
+   case ir_unop_sign:
+   case ir_unop_rcp:
+   case ir_unop_rsq:
+   case ir_unop_sqrt:
+   case ir_unop_exp:
+   case ir_unop_log:
+   case ir_unop_exp2:
+   case ir_unop_log2:
+   case ir_unop_trunc:
+   case ir_unop_ceil:
+   case ir_unop_floor:
+   case ir_unop_fract:
+   case ir_unop_round_even:
+   case ir_unop_sin:
+   case ir_unop_cos:
+   case ir_unop_sin_reduced:
+   case ir_unop_cos_reduced:
+   case ir_unop_dFdx:
+   case ir_unop_dFdy:
+   case ir_unop_bitfield_reverse:
+      this->type = op0->type;
+      break;
+
+   case ir_unop_f2i:
+   case ir_unop_b2i:
+   case ir_unop_u2i:
+   case ir_unop_bitcast_f2i:
+   case ir_unop_bit_count:
+   case ir_unop_find_msb:
+   case ir_unop_find_lsb:
+      this->type = glsl_type::get_instance(GLSL_TYPE_INT,
+					   op0->type->vector_elements, 1);
+      break;
+
+   case ir_unop_b2f:
+   case ir_unop_i2f:
+   case ir_unop_u2f:
+   case ir_unop_bitcast_i2f:
+   case ir_unop_bitcast_u2f:
+      this->type = glsl_type::get_instance(GLSL_TYPE_FLOAT,
+					   op0->type->vector_elements, 1);
+      break;
+
+   case ir_unop_f2b:
+   case ir_unop_i2b:
+      this->type = glsl_type::get_instance(GLSL_TYPE_BOOL,
+					   op0->type->vector_elements, 1);
+      break;
+
+   case ir_unop_i2u:
+   case ir_unop_f2u:
+   case ir_unop_bitcast_f2u:
+      this->type = glsl_type::get_instance(GLSL_TYPE_UINT,
+					   op0->type->vector_elements, 1);
+      break;
+
+   case ir_unop_noise:
+   case ir_unop_unpack_half_2x16_split_x:
+   case ir_unop_unpack_half_2x16_split_y:
+      this->type = glsl_type::float_type;
+      break;
+
+   case ir_unop_any:
+      this->type = glsl_type::bool_type;
+      break;
+
+   case ir_unop_pack_snorm_2x16:
+   case ir_unop_pack_snorm_4x8:
+   case ir_unop_pack_unorm_2x16:
+   case ir_unop_pack_unorm_4x8:
+   case ir_unop_pack_half_2x16:
+      this->type = glsl_type::uint_type;
+      break;
+
+   case ir_unop_unpack_snorm_2x16:
+   case ir_unop_unpack_unorm_2x16:
+   case ir_unop_unpack_half_2x16:
+      this->type = glsl_type::vec2_type;
+      break;
+
+   case ir_unop_unpack_snorm_4x8:
+   case ir_unop_unpack_unorm_4x8:
+      this->type = glsl_type::vec4_type;
+      break;
+
+   default:
+      assert(!"not reached: missing automatic type setup for ir_expression");
+      this->type = op0->type;
+      break;
+   }
+}
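+
+/* Illustrative (added comment; not in the original source): this constructor
+ * infers the result type from the operand, so (f2i vec3_expr) yields an
+ * ivec3 and (f2b float_expr) yields a bool.
+ */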
+
+ir_expression::ir_expression(int op, ir_rvalue *op0, ir_rvalue *op1)
+{
+   this->ir_type = ir_type_expression;
+
+   this->operation = ir_expression_operation(op);
+   this->operands[0] = op0;
+   this->operands[1] = op1;
+   this->operands[2] = NULL;
+   this->operands[3] = NULL;
+
+   assert(op > ir_last_unop);
+
+   switch (this->operation) {
+   case ir_binop_all_equal:
+   case ir_binop_any_nequal:
+      this->type = glsl_type::bool_type;
+      break;
+
+   case ir_binop_add:
+   case ir_binop_sub:
+   case ir_binop_min:
+   case ir_binop_max:
+   case ir_binop_pow:
+   case ir_binop_mul:
+   case ir_binop_div:
+   case ir_binop_mod:
+      if (op0->type->is_scalar()) {
+	 this->type = op1->type;
+      } else if (op1->type->is_scalar()) {
+	 this->type = op0->type;
+      } else {
+	 /* FINISHME: matrix types */
+	 assert(!op0->type->is_matrix() && !op1->type->is_matrix());
+	 assert(op0->type == op1->type);
+	 this->type = op0->type;
+      }
+      break;
+
+   case ir_binop_logic_and:
+   case ir_binop_logic_xor:
+   case ir_binop_logic_or:
+   case ir_binop_bit_and:
+   case ir_binop_bit_xor:
+   case ir_binop_bit_or:
+       assert(!op0->type->is_matrix());
+       assert(!op1->type->is_matrix());
+      if (op0->type->is_scalar()) {
+         this->type = op1->type;
+      } else if (op1->type->is_scalar()) {
+         this->type = op0->type;
+      } else {
+          assert(op0->type->vector_elements == op1->type->vector_elements);
+          this->type = op0->type;
+      }
+      break;
+
+   case ir_binop_equal:
+   case ir_binop_nequal:
+   case ir_binop_lequal:
+   case ir_binop_gequal:
+   case ir_binop_less:
+   case ir_binop_greater:
+      assert(op0->type == op1->type);
+      this->type = glsl_type::get_instance(GLSL_TYPE_BOOL,
+					   op0->type->vector_elements, 1);
+      break;
+
+   case ir_binop_dot:
+      this->type = glsl_type::float_type;
+      break;
+
+   case ir_binop_pack_half_2x16_split:
+      this->type = glsl_type::uint_type;
+      break;
+
+   case ir_binop_imul_high:
+   case ir_binop_carry:
+   case ir_binop_borrow:
+   case ir_binop_lshift:
+   case ir_binop_rshift:
+   case ir_binop_bfm:
+   case ir_binop_ldexp:
+      this->type = op0->type;
+      break;
+
+   case ir_binop_vector_extract:
+      this->type = op0->type->get_scalar_type();
+      break;
+
+   default:
+      assert(!"not reached: missing automatic type setup for ir_expression");
+      this->type = glsl_type::float_type;
+   }
+}
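+
+/* Illustrative (added comment; not in the original source): mixed
+ * scalar/vector arithmetic such as (add float_expr vec4_expr) takes the
+ * vector type vec4, while a comparison such as (less vec3_a vec3_b)
+ * produces a per-component bvec3.
+ */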
+
+ir_expression::ir_expression(int op, ir_rvalue *op0, ir_rvalue *op1,
+                             ir_rvalue *op2)
+{
+   this->ir_type = ir_type_expression;
+
+   this->operation = ir_expression_operation(op);
+   this->operands[0] = op0;
+   this->operands[1] = op1;
+   this->operands[2] = op2;
+   this->operands[3] = NULL;
+
+   assert(op > ir_last_binop && op <= ir_last_triop);
+
+   switch (this->operation) {
+   case ir_triop_fma:
+   case ir_triop_lrp:
+   case ir_triop_bitfield_extract:
+   case ir_triop_vector_insert:
+      this->type = op0->type;
+      break;
+
+   case ir_triop_bfi:
+   case ir_triop_csel:
+      this->type = op1->type;
+      break;
+
+   default:
+      assert(!"not reached: missing automatic type setup for ir_expression");
+      this->type = glsl_type::float_type;
+   }
+}
+
+unsigned int
+ir_expression::get_num_operands(ir_expression_operation op)
+{
+   assert(op <= ir_last_opcode);
+
+   if (op <= ir_last_unop)
+      return 1;
+
+   if (op <= ir_last_binop)
+      return 2;
+
+   if (op <= ir_last_triop)
+      return 3;
+
+   if (op <= ir_last_quadop)
+      return 4;
+
+   assert(false);
+   return 0;
+}
+
+static const char *const operator_strs[] = {
+   "~",
+   "!",
+   "neg",
+   "abs",
+   "sign",
+   "rcp",
+   "rsq",
+   "sqrt",
+   "exp",
+   "log",
+   "exp2",
+   "log2",
+   "f2i",
+   "f2u",
+   "i2f",
+   "f2b",
+   "b2f",
+   "i2b",
+   "b2i",
+   "u2f",
+   "i2u",
+   "u2i",
+   "bitcast_i2f",
+   "bitcast_f2i",
+   "bitcast_u2f",
+   "bitcast_f2u",
+   "any",
+   "trunc",
+   "ceil",
+   "floor",
+   "fract",
+   "round_even",
+   "sin",
+   "cos",
+   "sin_reduced",
+   "cos_reduced",
+   "dFdx",
+   "dFdy",
+   "packSnorm2x16",
+   "packSnorm4x8",
+   "packUnorm2x16",
+   "packUnorm4x8",
+   "packHalf2x16",
+   "unpackSnorm2x16",
+   "unpackSnorm4x8",
+   "unpackUnorm2x16",
+   "unpackUnorm4x8",
+   "unpackHalf2x16",
+   "unpackHalf2x16_split_x",
+   "unpackHalf2x16_split_y",
+   "bitfield_reverse",
+   "bit_count",
+   "find_msb",
+   "find_lsb",
+   "noise",
+   "+",
+   "-",
+   "*",
+   "imul_high",
+   "/",
+   "carry",
+   "borrow",
+   "%",
+   "<",
+   ">",
+   "<=",
+   ">=",
+   "==",
+   "!=",
+   "all_equal",
+   "any_nequal",
+   "<<",
+   ">>",
+   "&",
+   "^",
+   "|",
+   "&&",
+   "^^",
+   "||",
+   "dot",
+   "min",
+   "max",
+   "pow",
+   "packHalf2x16_split",
+   "bfm",
+   "ubo_load",
+   "ldexp",
+   "vector_extract",
+   "fma",
+   "lrp",
+   "csel",
+   "bfi",
+   "bitfield_extract",
+   "vector_insert",
+   "bitfield_insert",
+   "vector",
+};
+
+const char *ir_expression::operator_string(ir_expression_operation op)
+{
+   assert((unsigned int) op < Elements(operator_strs));
+   assert(Elements(operator_strs) == (ir_quadop_vector + 1));
+   return operator_strs[op];
+}
+
+const char *ir_expression::operator_string()
+{
+   return operator_string(this->operation);
+}
+
+const char*
+depth_layout_string(ir_depth_layout layout)
+{
+   switch(layout) {
+   case ir_depth_layout_none:      return "";
+   case ir_depth_layout_any:       return "depth_any";
+   case ir_depth_layout_greater:   return "depth_greater";
+   case ir_depth_layout_less:      return "depth_less";
+   case ir_depth_layout_unchanged: return "depth_unchanged";
+
+   default:
+      assert(0);
+      return "";
+   }
+}
+
+ir_expression_operation
+ir_expression::get_operator(const char *str)
+{
+   const int operator_count = sizeof(operator_strs) / sizeof(operator_strs[0]);
+   for (int op = 0; op < operator_count; op++) {
+      if (strcmp(str, operator_strs[op]) == 0)
+	 return (ir_expression_operation) op;
+   }
+   return (ir_expression_operation) -1;
+}
+
+ir_constant::ir_constant()
+{
+   this->ir_type = ir_type_constant;
+}
+
+ir_constant::ir_constant(const struct glsl_type *type,
+			 const ir_constant_data *data)
+{
+   assert((type->base_type >= GLSL_TYPE_UINT)
+	  && (type->base_type <= GLSL_TYPE_BOOL));
+
+   this->ir_type = ir_type_constant;
+   this->type = type;
+   memcpy(& this->value, data, sizeof(this->value));
+}
+
+ir_constant::ir_constant(float f, unsigned vector_elements)
+{
+   assert(vector_elements <= 4);
+   this->ir_type = ir_type_constant;
+   this->type = glsl_type::get_instance(GLSL_TYPE_FLOAT, vector_elements, 1);
+   for (unsigned i = 0; i < vector_elements; i++) {
+      this->value.f[i] = f;
+   }
+   for (unsigned i = vector_elements; i < 16; i++)  {
+      this->value.f[i] = 0;
+   }
+}
+
+ir_constant::ir_constant(unsigned int u, unsigned vector_elements)
+{
+   assert(vector_elements <= 4);
+   this->ir_type = ir_type_constant;
+   this->type = glsl_type::get_instance(GLSL_TYPE_UINT, vector_elements, 1);
+   for (unsigned i = 0; i < vector_elements; i++) {
+      this->value.u[i] = u;
+   }
+   for (unsigned i = vector_elements; i < 16; i++) {
+      this->value.u[i] = 0;
+   }
+}
+
+ir_constant::ir_constant(int integer, unsigned vector_elements)
+{
+   assert(vector_elements <= 4);
+   this->ir_type = ir_type_constant;
+   this->type = glsl_type::get_instance(GLSL_TYPE_INT, vector_elements, 1);
+   for (unsigned i = 0; i < vector_elements; i++) {
+      this->value.i[i] = integer;
+   }
+   for (unsigned i = vector_elements; i < 16; i++) {
+      this->value.i[i] = 0;
+   }
+}
+
+ir_constant::ir_constant(bool b, unsigned vector_elements)
+{
+   assert(vector_elements <= 4);
+   this->ir_type = ir_type_constant;
+   this->type = glsl_type::get_instance(GLSL_TYPE_BOOL, vector_elements, 1);
+   for (unsigned i = 0; i < vector_elements; i++) {
+      this->value.b[i] = b;
+   }
+   for (unsigned i = vector_elements; i < 16; i++) {
+      this->value.b[i] = false;
+   }
+}
+
+ir_constant::ir_constant(const ir_constant *c, unsigned i)
+{
+   this->ir_type = ir_type_constant;
+   this->type = c->type->get_base_type();
+
+   switch (this->type->base_type) {
+   case GLSL_TYPE_UINT:  this->value.u[0] = c->value.u[i]; break;
+   case GLSL_TYPE_INT:   this->value.i[0] = c->value.i[i]; break;
+   case GLSL_TYPE_FLOAT: this->value.f[0] = c->value.f[i]; break;
+   case GLSL_TYPE_BOOL:  this->value.b[0] = c->value.b[i]; break;
+   default:              assert(!"Should not get here."); break;
+   }
+}
+
+ir_constant::ir_constant(const struct glsl_type *type, exec_list *value_list)
+{
+   this->ir_type = ir_type_constant;
+   this->type = type;
+
+   assert(type->is_scalar() || type->is_vector() || type->is_matrix()
+	  || type->is_record() || type->is_array());
+
+   if (type->is_array()) {
+      this->array_elements = ralloc_array(this, ir_constant *, type->length);
+      unsigned i = 0;
+      foreach_list(node, value_list) {
+	 ir_constant *value = (ir_constant *) node;
+	 assert(value->as_constant() != NULL);
+
+	 this->array_elements[i++] = value;
+      }
+      return;
+   }
+
+   /* If the constant is a record, the types of each of the entries in
+    * value_list must be a 1-for-1 match with the structure components.  Each
+    * entry must also be a constant.  Just move the nodes from the value_list
+    * to the list in the ir_constant.
+    */
+   /* FINISHME: Should there be some type checking and / or assertions here? */
+   /* FINISHME: Should the new constant take ownership of the nodes from
+    * FINISHME: value_list, or should it make copies?
+    */
+   if (type->is_record()) {
+      value_list->move_nodes_to(& this->components);
+      return;
+   }
+
+   for (unsigned i = 0; i < 16; i++) {
+      this->value.u[i] = 0;
+   }
+
+   ir_constant *value = (ir_constant *) (value_list->head);
+
+   /* Constructors with exactly one scalar argument are special for vectors
+    * and matrices.  For vectors, the scalar value is replicated to fill all
+    * the components.  For matrices, the scalar fills the components of the
+    * diagonal while the rest is filled with 0.
+    */
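+   /* For example (added comment; not in the original source): vec3(2.0)
+    * yields (2.0, 2.0, 2.0), while mat2(2.0) yields the columns (2.0, 0.0)
+    * and (0.0, 2.0).
+    */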
+   if (value->type->is_scalar() && value->next->is_tail_sentinel()) {
+      if (type->is_matrix()) {
+	 /* Matrix - fill diagonal (rest is already set to 0) */
+	 assert(type->base_type == GLSL_TYPE_FLOAT);
+	 for (unsigned i = 0; i < type->matrix_columns; i++)
+	    this->value.f[i * type->vector_elements + i] = value->value.f[0];
+      } else {
+	 /* Vector or scalar - fill all components */
+	 switch (type->base_type) {
+	 case GLSL_TYPE_UINT:
+	 case GLSL_TYPE_INT:
+	    for (unsigned i = 0; i < type->components(); i++)
+	       this->value.u[i] = value->value.u[0];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    for (unsigned i = 0; i < type->components(); i++)
+	       this->value.f[i] = value->value.f[0];
+	    break;
+	 case GLSL_TYPE_BOOL:
+	    for (unsigned i = 0; i < type->components(); i++)
+	       this->value.b[i] = value->value.b[0];
+	    break;
+	 default:
+	    assert(!"Should not get here.");
+	    break;
+	 }
+      }
+      return;
+   }
+
+   if (type->is_matrix() && value->type->is_matrix()) {
+      assert(value->next->is_tail_sentinel());
+
+      /* From section 5.4.2 of the GLSL 1.20 spec:
+       * "If a matrix is constructed from a matrix, then each component
+       *  (column i, row j) in the result that has a corresponding component
+       *  (column i, row j) in the argument will be initialized from there."
+       */
+      unsigned cols = MIN2(type->matrix_columns, value->type->matrix_columns);
+      unsigned rows = MIN2(type->vector_elements, value->type->vector_elements);
+      for (unsigned i = 0; i < cols; i++) {
+	 for (unsigned j = 0; j < rows; j++) {
+	    const unsigned src = i * value->type->vector_elements + j;
+	    const unsigned dst = i * type->vector_elements + j;
+	    this->value.f[dst] = value->value.f[src];
+	 }
+      }
+
+      /* "All other components will be initialized to the identity matrix." */
+      for (unsigned i = cols; i < type->matrix_columns; i++)
+	 this->value.f[i * type->vector_elements + i] = 1.0;
+
+      return;
+   }
+
+   /* Use each component from each entry in the value_list to initialize one
+    * component of the constant being constructed.
+    */
+   for (unsigned i = 0; i < type->components(); /* empty */) {
+      assert(value->as_constant() != NULL);
+      assert(!value->is_tail_sentinel());
+
+      for (unsigned j = 0; j < value->type->components(); j++) {
+	 switch (type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    this->value.u[i] = value->get_uint_component(j);
+	    break;
+	 case GLSL_TYPE_INT:
+	    this->value.i[i] = value->get_int_component(j);
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    this->value.f[i] = value->get_float_component(j);
+	    break;
+	 case GLSL_TYPE_BOOL:
+	    this->value.b[i] = value->get_bool_component(j);
+	    break;
+	 default:
+	    /* FINISHME: What to do?  Exceptions are not the answer.
+	     */
+	    break;
+	 }
+
+	 i++;
+	 if (i >= type->components())
+	    break;
+      }
+
+      value = (ir_constant *) value->next;
+   }
+}
+
+ir_constant *
+ir_constant::zero(void *mem_ctx, const glsl_type *type)
+{
+   assert(type->is_scalar() || type->is_vector() || type->is_matrix()
+	  || type->is_record() || type->is_array());
+
+   ir_constant *c = new(mem_ctx) ir_constant;
+   c->type = type;
+   memset(&c->value, 0, sizeof(c->value));
+
+   if (type->is_array()) {
+      c->array_elements = ralloc_array(c, ir_constant *, type->length);
+
+      for (unsigned i = 0; i < type->length; i++)
+	 c->array_elements[i] = ir_constant::zero(c, type->element_type());
+   }
+
+   if (type->is_record()) {
+      for (unsigned i = 0; i < type->length; i++) {
+	 ir_constant *comp = ir_constant::zero(mem_ctx, type->fields.structure[i].type);
+	 c->components.push_tail(comp);
+      }
+   }
+
+   return c;
+}
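+
+/* Hypothetical usage sketch (added comment; not in the original source):
+ *
+ *    ir_constant *zero_vec = ir_constant::zero(mem_ctx, glsl_type::vec4_type);
+ *
+ * yields the constant (0.0, 0.0, 0.0, 0.0); arrays and records are filled
+ * recursively with one zeroed constant per element or field.
+ */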
+
+bool
+ir_constant::get_bool_component(unsigned i) const
+{
+   switch (this->type->base_type) {
+   case GLSL_TYPE_UINT:  return this->value.u[i] != 0;
+   case GLSL_TYPE_INT:   return this->value.i[i] != 0;
+   case GLSL_TYPE_FLOAT: return ((int)this->value.f[i]) != 0;
+   case GLSL_TYPE_BOOL:  return this->value.b[i];
+   default:              assert(!"Should not get here."); break;
+   }
+
+   /* Must return something to make the compiler happy.  This is clearly an
+    * error case.
+    */
+   return false;
+}
+
+float
+ir_constant::get_float_component(unsigned i) const
+{
+   switch (this->type->base_type) {
+   case GLSL_TYPE_UINT:  return (float) this->value.u[i];
+   case GLSL_TYPE_INT:   return (float) this->value.i[i];
+   case GLSL_TYPE_FLOAT: return this->value.f[i];
+   case GLSL_TYPE_BOOL:  return this->value.b[i] ? 1.0f : 0.0f;
+   default:              assert(!"Should not get here."); break;
+   }
+
+   /* Must return something to make the compiler happy.  This is clearly an
+    * error case.
+    */
+   return 0.0;
+}
+
+int
+ir_constant::get_int_component(unsigned i) const
+{
+   switch (this->type->base_type) {
+   case GLSL_TYPE_UINT:  return this->value.u[i];
+   case GLSL_TYPE_INT:   return this->value.i[i];
+   case GLSL_TYPE_FLOAT: return (int) this->value.f[i];
+   case GLSL_TYPE_BOOL:  return this->value.b[i] ? 1 : 0;
+   default:              assert(!"Should not get here."); break;
+   }
+
+   /* Must return something to make the compiler happy.  This is clearly an
+    * error case.
+    */
+   return 0;
+}
+
+unsigned
+ir_constant::get_uint_component(unsigned i) const
+{
+   switch (this->type->base_type) {
+   case GLSL_TYPE_UINT:  return this->value.u[i];
+   case GLSL_TYPE_INT:   return this->value.i[i];
+   case GLSL_TYPE_FLOAT: return (unsigned) this->value.f[i];
+   case GLSL_TYPE_BOOL:  return this->value.b[i] ? 1 : 0;
+   default:              assert(!"Should not get here."); break;
+   }
+
+   /* Must return something to make the compiler happy.  This is clearly an
+    * error case.
+    */
+   return 0;
+}
+
+ir_constant *
+ir_constant::get_array_element(unsigned i) const
+{
+   assert(this->type->is_array());
+
+   /* From page 35 (page 41 of the PDF) of the GLSL 1.20 spec:
+    *
+    *     "Behavior is undefined if a shader subscripts an array with an index
+    *     less than 0 or greater than or equal to the size the array was
+    *     declared with."
+    *
+    * Most out-of-bounds accesses are removed before things could get this
+    * far, but a non-constant array index can still be constant-folded into
+    * an out-of-range value, so the index is clamped here.
+    */
+   if (int(i) < 0)
+      i = 0;
+   else if (i >= this->type->length)
+      i = this->type->length - 1;
+
+   return array_elements[i];
+}
+
+ir_constant *
+ir_constant::get_record_field(const char *name)
+{
+   int idx = this->type->field_index(name);
+
+   if (idx < 0)
+      return NULL;
+
+   if (this->components.is_empty())
+      return NULL;
+
+   exec_node *node = this->components.head;
+   for (int i = 0; i < idx; i++) {
+      node = node->next;
+
+      /* If the end of the list is encountered before the element matching the
+       * requested field is found, return NULL.
+       */
+      if (node->is_tail_sentinel())
+	 return NULL;
+   }
+
+   return (ir_constant *) node;
+}
+
+void
+ir_constant::copy_offset(ir_constant *src, int offset)
+{
+   switch (this->type->base_type) {
+   case GLSL_TYPE_UINT:
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_BOOL: {
+      unsigned int size = src->type->components();
+      assert (size <= this->type->components() - offset);
+      for (unsigned int i=0; i<size; i++) {
+	 switch (this->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    value.u[i+offset] = src->get_uint_component(i);
+	    break;
+	 case GLSL_TYPE_INT:
+	    value.i[i+offset] = src->get_int_component(i);
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    value.f[i+offset] = src->get_float_component(i);
+	    break;
+	 case GLSL_TYPE_BOOL:
+	    value.b[i+offset] = src->get_bool_component(i);
+	    break;
+	 default: // Shut up the compiler
+	    break;
+	 }
+      }
+      break;
+   }
+
+   case GLSL_TYPE_STRUCT: {
+      assert (src->type == this->type);
+      this->components.make_empty();
+      foreach_list(node, &src->components) {
+	 ir_constant *const orig = (ir_constant *) node;
+
+	 this->components.push_tail(orig->clone(this, NULL));
+      }
+      break;
+   }
+
+   case GLSL_TYPE_ARRAY: {
+      assert (src->type == this->type);
+      for (unsigned i = 0; i < this->type->length; i++) {
+	 this->array_elements[i] = src->array_elements[i]->clone(this, NULL);
+      }
+      break;
+   }
+
+   default:
+      assert(!"Should not get here.");
+      break;
+   }
+}
+
+void
+ir_constant::copy_masked_offset(ir_constant *src, int offset, unsigned int mask)
+{
+   assert (!type->is_array() && !type->is_record());
+
+   if (!type->is_vector() && !type->is_matrix()) {
+      offset = 0;
+      mask = 1;
+   }
+
+   int id = 0;
+   for (int i=0; i<4; i++) {
+      if (mask & (1 << i)) {
+	 switch (this->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    value.u[i+offset] = src->get_uint_component(id++);
+	    break;
+	 case GLSL_TYPE_INT:
+	    value.i[i+offset] = src->get_int_component(id++);
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    value.f[i+offset] = src->get_float_component(id++);
+	    break;
+	 case GLSL_TYPE_BOOL:
+	    value.b[i+offset] = src->get_bool_component(id++);
+	    break;
+	 default:
+	    assert(!"Should not get here.");
+	    return;
+	 }
+      }
+   }
+}
+
+bool
+ir_constant::has_value(const ir_constant *c) const
+{
+   if (this->type != c->type)
+      return false;
+
+   if (this->type->is_array()) {
+      for (unsigned i = 0; i < this->type->length; i++) {
+	 if (!this->array_elements[i]->has_value(c->array_elements[i]))
+	    return false;
+      }
+      return true;
+   }
+
+   if (this->type->base_type == GLSL_TYPE_STRUCT) {
+      const exec_node *a_node = this->components.head;
+      const exec_node *b_node = c->components.head;
+
+      while (!a_node->is_tail_sentinel()) {
+	 assert(!b_node->is_tail_sentinel());
+
+	 const ir_constant *const a_field = (ir_constant *) a_node;
+	 const ir_constant *const b_field = (ir_constant *) b_node;
+
+	 if (!a_field->has_value(b_field))
+	    return false;
+
+	 a_node = a_node->next;
+	 b_node = b_node->next;
+      }
+
+      return true;
+   }
+
+   for (unsigned i = 0; i < this->type->components(); i++) {
+      switch (this->type->base_type) {
+      case GLSL_TYPE_UINT:
+	 if (this->value.u[i] != c->value.u[i])
+	    return false;
+	 break;
+      case GLSL_TYPE_INT:
+	 if (this->value.i[i] != c->value.i[i])
+	    return false;
+	 break;
+      case GLSL_TYPE_FLOAT:
+	 if (this->value.f[i] != c->value.f[i])
+	    return false;
+	 break;
+      case GLSL_TYPE_BOOL:
+	 if (this->value.b[i] != c->value.b[i])
+	    return false;
+	 break;
+      default:
+	 assert(!"Should not get here.");
+	 return false;
+      }
+   }
+
+   return true;
+}
+
+bool
+ir_constant::is_value(float f, int i) const
+{
+   if (!this->type->is_scalar() && !this->type->is_vector())
+      return false;
+
+   /* When matching against a boolean type, only accept i values of 0 or 1. */
+   if (int(bool(i)) != i && this->type->is_boolean())
+      return false;
+
+   for (unsigned c = 0; c < this->type->vector_elements; c++) {
+      switch (this->type->base_type) {
+      case GLSL_TYPE_FLOAT:
+	 if (this->value.f[c] != f)
+	    return false;
+	 break;
+      case GLSL_TYPE_INT:
+	 if (this->value.i[c] != i)
+	    return false;
+	 break;
+      case GLSL_TYPE_UINT:
+	 if (this->value.u[c] != unsigned(i))
+	    return false;
+	 break;
+      case GLSL_TYPE_BOOL:
+	 if (this->value.b[c] != bool(i))
+	    return false;
+	 break;
+      default:
+	 /* The only other base types are structures, arrays, and samplers.
+	  * Samplers cannot be constants, and the others should have been
+	  * filtered out above.
+	  */
+	 assert(!"Should not get here.");
+	 return false;
+      }
+   }
+
+   return true;
+}
+
+bool
+ir_constant::is_zero() const
+{
+   return is_value(0.0, 0);
+}
+
+bool
+ir_constant::is_one() const
+{
+   return is_value(1.0, 1);
+}
+
+bool
+ir_constant::is_negative_one() const
+{
+   return is_value(-1.0, -1);
+}
+
+bool
+ir_constant::is_basis() const
+{
+   if (!this->type->is_scalar() && !this->type->is_vector())
+      return false;
+
+   if (this->type->is_boolean())
+      return false;
+
+   unsigned ones = 0;
+   for (unsigned c = 0; c < this->type->vector_elements; c++) {
+      switch (this->type->base_type) {
+      case GLSL_TYPE_FLOAT:
+	 if (this->value.f[c] == 1.0)
+	    ones++;
+	 else if (this->value.f[c] != 0.0)
+	    return false;
+	 break;
+      case GLSL_TYPE_INT:
+	 if (this->value.i[c] == 1)
+	    ones++;
+	 else if (this->value.i[c] != 0)
+	    return false;
+	 break;
+      case GLSL_TYPE_UINT:
+	 if (int(this->value.u[c]) == 1)
+	    ones++;
+	 else if (int(this->value.u[c]) != 0)
+	    return false;
+	 break;
+      default:
+	 /* The only other base types are structures, arrays, samplers, and
+	  * booleans.  Samplers cannot be constants, and the others should
+	  * have been filtered out above.
+	  */
+	 assert(!"Should not get here.");
+	 return false;
+      }
+   }
+
+   return ones == 1;
+}
+
+bool
+ir_constant::is_uint16_constant() const
+{
+   if (!type->is_integer())
+      return false;
+
+   return value.u[0] < (1 << 16);
+}
+
+ir_loop::ir_loop()
+{
+   this->ir_type = ir_type_loop;
+}
+
+
+ir_dereference_variable::ir_dereference_variable(ir_variable *var)
+{
+   assert(var != NULL);
+
+   this->ir_type = ir_type_dereference_variable;
+   this->var = var;
+   this->type = var->type;
+}
+
+
+ir_dereference_array::ir_dereference_array(ir_rvalue *value,
+					   ir_rvalue *array_index)
+{
+   this->ir_type = ir_type_dereference_array;
+   this->array_index = array_index;
+   this->set_array(value);
+}
+
+
+ir_dereference_array::ir_dereference_array(ir_variable *var,
+					   ir_rvalue *array_index)
+{
+   void *ctx = ralloc_parent(var);
+
+   this->ir_type = ir_type_dereference_array;
+   this->array_index = array_index;
+   this->set_array(new(ctx) ir_dereference_variable(var));
+}
+
+
+void
+ir_dereference_array::set_array(ir_rvalue *value)
+{
+   assert(value != NULL);
+
+   this->array = value;
+
+   const glsl_type *const vt = this->array->type;
+
+   if (vt->is_array()) {
+      type = vt->element_type();
+   } else if (vt->is_matrix()) {
+      type = vt->column_type();
+   } else if (vt->is_vector()) {
+      type = vt->get_base_type();
+   }
+}
+
+
+ir_dereference_record::ir_dereference_record(ir_rvalue *value,
+					     const char *field)
+{
+   assert(value != NULL);
+
+   this->ir_type = ir_type_dereference_record;
+   this->record = value;
+   this->field = ralloc_strdup(this, field);
+   this->type = this->record->type->field_type(field);
+}
+
+
+ir_dereference_record::ir_dereference_record(ir_variable *var,
+					     const char *field)
+{
+   void *ctx = ralloc_parent(var);
+
+   this->ir_type = ir_type_dereference_record;
+   this->record = new(ctx) ir_dereference_variable(var);
+   this->field = ralloc_strdup(this, field);
+   this->type = this->record->type->field_type(field);
+}
+
+bool
+ir_dereference::is_lvalue() const
+{
+   ir_variable *var = this->variable_referenced();
+
+   /* Every l-value dereference chain eventually ends in a variable.
+    */
+   if ((var == NULL) || var->data.read_only)
+      return false;
+
+   /* From section 4.1.7 of the GLSL 4.40 spec:
+    *
+    *   "Opaque variables cannot be treated as l-values; hence cannot
+    *    be used as out or inout function parameters, nor can they be
+    *    assigned into."
+    */
+   if (this->type->contains_opaque())
+      return false;
+
+   return true;
+}
+
+
+static const char * const tex_opcode_strs[] = { "tex", "txb", "txl", "txd", "txf", "txf_ms", "txs", "lod", "tg4", "query_levels" };
+
+const char *ir_texture::opcode_string()
+{
+   assert((unsigned int) op <
+	  sizeof(tex_opcode_strs) / sizeof(tex_opcode_strs[0]));
+   return tex_opcode_strs[op];
+}
+
+ir_texture_opcode
+ir_texture::get_opcode(const char *str)
+{
+   const int count = sizeof(tex_opcode_strs) / sizeof(tex_opcode_strs[0]);
+   for (int op = 0; op < count; op++) {
+      if (strcmp(str, tex_opcode_strs[op]) == 0)
+	 return (ir_texture_opcode) op;
+   }
+   return (ir_texture_opcode) -1;
+}
+
+
+void
+ir_texture::set_sampler(ir_dereference *sampler, const glsl_type *type)
+{
+   assert(sampler != NULL);
+   assert(type != NULL);
+   this->sampler = sampler;
+   this->type = type;
+
+   if (this->op == ir_txs || this->op == ir_query_levels) {
+      assert(type->base_type == GLSL_TYPE_INT);
+   } else if (this->op == ir_lod) {
+      assert(type->vector_elements == 2);
+      assert(type->base_type == GLSL_TYPE_FLOAT);
+   } else {
+      assert(sampler->type->sampler_type == (int) type->base_type);
+      if (sampler->type->sampler_shadow)
+	 assert(type->vector_elements == 4 || type->vector_elements == 1);
+      else
+	 assert(type->vector_elements == 4);
+   }
+}
+
+
+void
+ir_swizzle::init_mask(const unsigned *comp, unsigned count)
+{
+   assert((count >= 1) && (count <= 4));
+
+   memset(&this->mask, 0, sizeof(this->mask));
+   this->mask.num_components = count;
+
+   unsigned dup_mask = 0;
+   switch (count) {
+   case 4:
+      assert(comp[3] <= 3);
+      dup_mask |= (1U << comp[3])
+	 & ((1U << comp[0]) | (1U << comp[1]) | (1U << comp[2]));
+      this->mask.w = comp[3];
+      /* fall through */
+
+   case 3:
+      assert(comp[2] <= 3);
+      dup_mask |= (1U << comp[2])
+	 & ((1U << comp[0]) | (1U << comp[1]));
+      this->mask.z = comp[2];
+      /* fall through */
+
+   case 2:
+      assert(comp[1] <= 3);
+      dup_mask |= (1U << comp[1])
+	 & ((1U << comp[0]));
+      this->mask.y = comp[1];
+      /* fall through */
+
+   case 1:
+      assert(comp[0] <= 3);
+      this->mask.x = comp[0];
+   }
+
+   this->mask.has_duplicates = dup_mask != 0;
+
+   /* Based on the number of elements in the swizzle and the base type
+    * (i.e., float, int, unsigned, or bool) of the vector being swizzled,
+    * generate the type of the resulting value.
+    */
+   type = glsl_type::get_instance(val->type->base_type, mask.num_components, 1);
+}
+
+ir_swizzle::ir_swizzle(ir_rvalue *val, unsigned x, unsigned y, unsigned z,
+		       unsigned w, unsigned count)
+   : val(val)
+{
+   const unsigned components[4] = { x, y, z, w };
+   this->ir_type = ir_type_swizzle;
+   this->init_mask(components, count);
+}
+
+ir_swizzle::ir_swizzle(ir_rvalue *val, const unsigned *comp,
+		       unsigned count)
+   : val(val)
+{
+   this->ir_type = ir_type_swizzle;
+   this->init_mask(comp, count);
+}
+
+ir_swizzle::ir_swizzle(ir_rvalue *val, ir_swizzle_mask mask)
+{
+   this->ir_type = ir_type_swizzle;
+   this->val = val;
+   this->mask = mask;
+   this->type = glsl_type::get_instance(val->type->base_type,
+					mask.num_components, 1);
+}
+
+#define X 1
+#define R 5
+#define S 9
+#define I 13
+
+ir_swizzle *
+ir_swizzle::create(ir_rvalue *val, const char *str, unsigned vector_length)
+{
+   void *ctx = ralloc_parent(val);
+
+   /* For each possible swizzle character, this table encodes the value in
+    * \c idx_map that represents the 0th element of the vector.  For invalid
+    * swizzle characters (e.g., 'k'), a special value is used that will allow
+    * detection of errors.
+    */
+   static const unsigned char base_idx[26] = {
+   /* a  b  c  d  e  f  g  h  i  j  k  l  m */
+      R, R, I, I, I, I, R, I, I, I, I, I, I,
+   /* n  o  p  q  r  s  t  u  v  w  x  y  z */
+      I, I, S, S, R, S, S, I, I, X, X, X, X
+   };
+
+   /* Each valid swizzle character has an entry in the previous table.  This
+    * table encodes the base index encoded in the previous table plus the actual
+    * index of the swizzle character.  When processing swizzles, the first
+    * character in the string is indexed in the previous table.  Each character
+    * in the string is indexed in this table, and the value found there has the
+    * value from the first table subtracted.  The result must be in the range
+    * [0,3].
+    *
+    * For example, the string "wzyx" will get X from the first table.  Each of
+    * the characters will get X+3, X+2, X+1, and X+0 from this table.  After
+    * subtraction, the swizzle values are { 3, 2, 1, 0 }.
+    *
+    * The string "wzrg" will get X from the first table.  Each of the characters
+    * will get X+3, X+2, R+0, and R+1 from this table.  After subtraction, the
+    * swizzle values are { 3, 2, 4, 5 }.  Since 4 and 5 are outside the range
+    * [0,3], the error is detected.
+    */
+   static const unsigned char idx_map[26] = {
+   /* a    b    c    d    e    f    g    h    i    j    k    l    m */
+      R+3, R+2, 0,   0,   0,   0,   R+1, 0,   0,   0,   0,   0,   0,
+   /* n    o    p    q    r    s    t    u    v    w    x    y    z */
+      0,   0,   S+2, S+3, R+0, S+0, S+1, 0,   0,   X+3, X+0, X+1, X+2
+   };
+
+   int swiz_idx[4] = { 0, 0, 0, 0 };
+   unsigned i;
+
+
+   /* Validate the first character in the swizzle string and look up the base
+    * index value as described above.
+    */
+   if ((str[0] < 'a') || (str[0] > 'z'))
+      return NULL;
+
+   const unsigned base = base_idx[str[0] - 'a'];
+
+
+   for (i = 0; (i < 4) && (str[i] != '\0'); i++) {
+      /* Validate the next character, and, as described above, convert it to a
+       * swizzle index.
+       */
+      if ((str[i] < 'a') || (str[i] > 'z'))
+	 return NULL;
+
+      swiz_idx[i] = idx_map[str[i] - 'a'] - base;
+      if ((swiz_idx[i] < 0) || (swiz_idx[i] >= (int) vector_length))
+	 return NULL;
+   }
+
+   if (str[i] != '\0')
+	 return NULL;
+
+   return new(ctx) ir_swizzle(val, swiz_idx[0], swiz_idx[1], swiz_idx[2],
+			      swiz_idx[3], i);
+}
+
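+/* Illustrative (added comment; not in the original source): for a vec4 value
+ * v, ir_swizzle::create(v, "wzyx", 4) yields the reversed swizzle
+ * { 3, 2, 1, 0 }, while ir_swizzle::create(v, "xq", 4) returns NULL because
+ * "x" and "q" come from different naming sets.
+ */
+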
+#undef X
+#undef R
+#undef S
+#undef I
+
+ir_variable *
+ir_swizzle::variable_referenced() const
+{
+   return this->val->variable_referenced();
+}
+
+
+ir_variable::ir_variable(const struct glsl_type *type, const char *name,
+			 ir_variable_mode mode)
+   : max_ifc_array_access(NULL)
+{
+   this->ir_type = ir_type_variable;
+   this->type = type;
+   this->name = ralloc_strdup(this, name);
+   this->data.explicit_location = false;
+   this->data.has_initializer = false;
+   this->data.location = -1;
+   this->data.location_frac = 0;
+   this->warn_extension = NULL;
+   this->constant_value = NULL;
+   this->constant_initializer = NULL;
+   this->data.origin_upper_left = false;
+   this->data.pixel_center_integer = false;
+   this->data.depth_layout = ir_depth_layout_none;
+   this->data.used = false;
+   this->data.read_only = false;
+   this->data.centroid = false;
+   this->data.sample = false;
+   this->data.invariant = false;
+   this->data.how_declared = ir_var_declared_normally;
+   this->data.mode = mode;
+   this->data.interpolation = INTERP_QUALIFIER_NONE;
+   this->data.max_array_access = 0;
+   this->data.atomic.buffer_index = 0;
+   this->data.atomic.offset = 0;
+   this->data.image.read_only = false;
+   this->data.image.write_only = false;
+   this->data.image.coherent = false;
+   this->data.image._volatile = false;
+   this->data.image.restrict_flag = false;
+
+   if (type != NULL) {
+      if (type->base_type == GLSL_TYPE_SAMPLER)
+         this->data.read_only = true;
+
+      if (type->is_interface())
+         this->init_interface_type(type);
+      else if (type->is_array() && type->fields.array->is_interface())
+         this->init_interface_type(type->fields.array);
+   }
+}
+
+
+const char *
+interpolation_string(unsigned interpolation)
+{
+   switch (interpolation) {
+   case INTERP_QUALIFIER_NONE:          return "no";
+   case INTERP_QUALIFIER_SMOOTH:        return "smooth";
+   case INTERP_QUALIFIER_FLAT:          return "flat";
+   case INTERP_QUALIFIER_NOPERSPECTIVE: return "noperspective";
+   }
+
+   assert(!"Should not get here.");
+   return "";
+}
+
+
+glsl_interp_qualifier
+ir_variable::determine_interpolation_mode(bool flat_shade)
+{
+   if (this->data.interpolation != INTERP_QUALIFIER_NONE)
+      return (glsl_interp_qualifier) this->data.interpolation;
+   int location = this->data.location;
+   bool is_gl_Color =
+      location == VARYING_SLOT_COL0 || location == VARYING_SLOT_COL1;
+   if (flat_shade && is_gl_Color)
+      return INTERP_QUALIFIER_FLAT;
+   else
+      return INTERP_QUALIFIER_SMOOTH;
+}
+
+
+ir_function_signature::ir_function_signature(const glsl_type *return_type,
+                                             builtin_available_predicate b)
+   : return_type(return_type), is_defined(false), is_intrinsic(false),
+     builtin_avail(b), _function(NULL)
+{
+   this->ir_type = ir_type_function_signature;
+   this->origin = NULL;
+}
+
+
+bool
+ir_function_signature::is_builtin() const
+{
+   return builtin_avail != NULL;
+}
+
+
+bool
+ir_function_signature::is_builtin_available(const _mesa_glsl_parse_state *state) const
+{
+   /* We can't call the predicate without a state pointer, so just say that
+    * the signature is available.  At compile time, we need the filtering,
+    * but also receive a valid state pointer.  At link time, we're resolving
+    * imported built-in prototypes to their definitions, which will always
+    * be an exact match.  So we can skip the filtering.
+    */
+   if (state == NULL)
+      return true;
+
+   assert(builtin_avail != NULL);
+   return builtin_avail(state);
+}
+
+
+static bool
+modes_match(unsigned a, unsigned b)
+{
+   if (a == b)
+      return true;
+
+   /* Accept "in" vs. "const in" */
+   if ((a == ir_var_const_in && b == ir_var_function_in) ||
+       (b == ir_var_const_in && a == ir_var_function_in))
+      return true;
+
+   return false;
+}
+
+
+const char *
+ir_function_signature::qualifiers_match(exec_list *params)
+{
+   /* check that the qualifiers match. */
+   foreach_two_lists(a_node, &this->parameters, b_node, params) {
+      ir_variable *a = (ir_variable *) a_node;
+      ir_variable *b = (ir_variable *) b_node;
+
+      if (a->data.read_only != b->data.read_only ||
+	  !modes_match(a->data.mode, b->data.mode) ||
+	  a->data.interpolation != b->data.interpolation ||
+	  a->data.centroid != b->data.centroid ||
+          a->data.sample != b->data.sample ||
+          a->data.image.read_only != b->data.image.read_only ||
+          a->data.image.write_only != b->data.image.write_only ||
+          a->data.image.coherent != b->data.image.coherent ||
+          a->data.image._volatile != b->data.image._volatile ||
+          a->data.image.restrict_flag != b->data.image.restrict_flag) {
+
+	 /* parameter a's qualifiers don't match */
+	 return a->name;
+      }
+   }
+   return NULL;
+}
+
+
+void
+ir_function_signature::replace_parameters(exec_list *new_params)
+{
+   /* Destroy all of the previous parameter information.  If the previous
+    * parameter information comes from the function prototype, it may either
+    * specify incorrect parameter names or not have names at all.
+    */
+   new_params->move_nodes_to(&parameters);
+}
+
+
+ir_function::ir_function(const char *name)
+{
+   this->ir_type = ir_type_function;
+   this->name = ralloc_strdup(this, name);
+}
+
+
+bool
+ir_function::has_user_signature()
+{
+   foreach_list(n, &this->signatures) {
+      ir_function_signature *const sig = (ir_function_signature *) n;
+      if (!sig->is_builtin())
+	 return true;
+   }
+   return false;
+}
+
+
+ir_rvalue *
+ir_rvalue::error_value(void *mem_ctx)
+{
+   ir_rvalue *v = new(mem_ctx) ir_rvalue;
+
+   v->type = glsl_type::error_type;
+   return v;
+}
+
+
+void
+visit_exec_list(exec_list *list, ir_visitor *visitor)
+{
+   foreach_list_safe(n, list) {
+      ((ir_instruction *) n)->accept(visitor);
+   }
+}
+
+
+static void
+steal_memory(ir_instruction *ir, void *new_ctx)
+{
+   ir_variable *var = ir->as_variable();
+   ir_constant *constant = ir->as_constant();
+   if (var != NULL && var->constant_value != NULL)
+      steal_memory(var->constant_value, ir);
+
+   if (var != NULL && var->constant_initializer != NULL)
+      steal_memory(var->constant_initializer, ir);
+
+   /* The components of aggregate constants are not visited by the normal
+    * visitor, so steal their values by hand.
+    */
+   if (constant != NULL) {
+      if (constant->type->is_record()) {
+	 foreach_list(n, &constant->components) {
+	    ir_constant *field = (ir_constant *) n;
+	    steal_memory(field, ir);
+	 }
+      } else if (constant->type->is_array()) {
+	 for (unsigned int i = 0; i < constant->type->length; i++) {
+	    steal_memory(constant->array_elements[i], ir);
+	 }
+      }
+   }
+
+   ralloc_steal(new_ctx, ir);
+}
+
+
+void
+reparent_ir(exec_list *list, void *mem_ctx)
+{
+   foreach_list(node, list) {
+      visit_tree((ir_instruction *) node, steal_memory, mem_ctx);
+   }
+}
+
+
+static ir_rvalue *
+try_min_one(ir_rvalue *ir)
+{
+   ir_expression *expr = ir->as_expression();
+
+   if (!expr || expr->operation != ir_binop_min)
+      return NULL;
+
+   if (expr->operands[0]->is_one())
+      return expr->operands[1];
+
+   if (expr->operands[1]->is_one())
+      return expr->operands[0];
+
+   return NULL;
+}
+
+static ir_rvalue *
+try_max_zero(ir_rvalue *ir)
+{
+   ir_expression *expr = ir->as_expression();
+
+   if (!expr || expr->operation != ir_binop_max)
+      return NULL;
+
+   if (expr->operands[0]->is_zero())
+      return expr->operands[1];
+
+   if (expr->operands[1]->is_zero())
+      return expr->operands[0];
+
+   return NULL;
+}
+
+ir_rvalue *
+ir_rvalue::as_rvalue_to_saturate()
+{
+   ir_expression *expr = this->as_expression();
+
+   if (!expr)
+      return NULL;
+
+   ir_rvalue *max_zero = try_max_zero(expr);
+   if (max_zero) {
+      return try_min_one(max_zero);
+   } else {
+      ir_rvalue *min_one = try_min_one(expr);
+      if (min_one) {
+	 return try_max_zero(min_one);
+      }
+   }
+
+   return NULL;
+}
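+
+/* Illustrative (added comment; not in the original source): given the tree
+ * min(max(x, 0.0), 1.0), or the equivalent max(min(x, 1.0), 0.0) nesting,
+ * this returns x so that a backend can emit a single saturate modifier
+ * instead of two ALU operations.
+ */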
+
+
+unsigned
+vertices_per_prim(GLenum prim)
+{
+   switch (prim) {
+   case GL_POINTS:
+      return 1;
+   case GL_LINES:
+      return 2;
+   case GL_TRIANGLES:
+      return 3;
+   case GL_LINES_ADJACENCY:
+      return 4;
+   case GL_TRIANGLES_ADJACENCY:
+      return 6;
+   default:
+      assert(!"Bad primitive");
+      return 3;
+   }
+}
+
+/**
+ * Generate a string describing the mode of a variable
+ */
+const char *
+mode_string(const ir_variable *var)
+{
+   switch (var->data.mode) {
+   case ir_var_auto:
+      return (var->data.read_only) ? "global constant" : "global variable";
+
+   case ir_var_uniform:
+      return "uniform";
+
+   case ir_var_shader_in:
+      return "shader input";
+
+   case ir_var_shader_out:
+      return "shader output";
+
+   case ir_var_function_in:
+   case ir_var_const_in:
+      return "function input";
+
+   case ir_var_function_out:
+      return "function output";
+
+   case ir_var_function_inout:
+      return "function inout";
+
+   case ir_var_system_value:
+      return "shader input";
+
+   case ir_var_temporary:
+      return "compiler temporary";
+
+   case ir_var_mode_count:
+      break;
+   }
+
+   assert(!"Should not get here.");
+   return "invalid variable";
+}
diff --git a/icd/intel/compiler/shader/ir_basic_block.cpp b/icd/intel/compiler/shader/ir_basic_block.cpp
new file mode 100644
index 0000000..426fda2
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_basic_block.cpp
@@ -0,0 +1,104 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_basic_block.cpp
+ *
+ * Basic block analysis of instruction streams.
+ */
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_basic_block.h"
+#include "glsl_types.h"
+
+/**
+ * Calls a user function for every basic block in the instruction stream.
+ *
+ * Basic block analysis is pretty easy in our IR thanks to the lack of
+ * unstructured control flow.  We've got:
+ *
+ * ir_loop (for () {}, while () {}, do {} while ())
+ * ir_loop_jump (break, continue)
+ * ir_if () {}
+ * ir_return
+ * ir_call()
+ *
+ * Note that the basic blocks returned by this don't encompass all
+ * operations performed by the program -- for example, if conditions
+ * don't get returned, nor do the assignments that will be generated
+ * for ir_call parameters.
+ */
+void call_for_basic_blocks(exec_list *instructions,
+			   void (*callback)(ir_instruction *first,
+					    ir_instruction *last,
+					    void *data),
+			   void *data)
+{
+   ir_instruction *leader = NULL;
+   ir_instruction *last = NULL;
+
+   foreach_list(n, instructions) {
+      ir_instruction *ir = (ir_instruction *) n;
+      ir_if *ir_if;
+      ir_loop *ir_loop;
+      ir_function *ir_function;
+
+      if (!leader)
+	 leader = ir;
+
+      if ((ir_if = ir->as_if())) {
+	 callback(leader, ir, data);
+	 leader = NULL;
+
+	 call_for_basic_blocks(&ir_if->then_instructions, callback, data);
+	 call_for_basic_blocks(&ir_if->else_instructions, callback, data);
+      } else if ((ir_loop = ir->as_loop())) {
+	 callback(leader, ir, data);
+	 leader = NULL;
+	 call_for_basic_blocks(&ir_loop->body_instructions, callback, data);
+      } else if (ir->as_jump() || ir->as_call()) {
+	 callback(leader, ir, data);
+	 leader = NULL;
+      } else if ((ir_function = ir->as_function())) {
+	 /* A function definition doesn't interrupt our basic block
+	  * since execution doesn't go into it.  We should process the
+	  * bodies of its signatures for BBs, though.
+	  *
+	  * Note that we miss an opportunity for producing more
+	  * maximal BBs between the instructions that precede main()
+	  * and the body of main().  Perhaps those instructions ought
+	  * to live inside of main().
+	  */
+	 foreach_list(func_node, &ir_function->signatures) {
+	    ir_function_signature *ir_sig = (ir_function_signature *) func_node;
+
+	    call_for_basic_blocks(&ir_sig->body, callback, data);
+	 }
+      }
+      last = ir;
+   }
+   if (leader) {
+      callback(leader, last, data);
+   }
+}
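
The traversal above is easiest to see with a small driver.  A minimal
sketch (the callback and counter names here are illustrative, not part
of this patch) that counts the basic blocks in an instruction stream,
using the exact callback signature defined above:

static void
count_block_cb(ir_instruction *first, ir_instruction *last, void *data)
{
   unsigned *num_blocks = (unsigned *) data;

   /* first and last delimit the block; this example only counts them. */
   (void) first;
   (void) last;
   (*num_blocks)++;
}

static unsigned
count_basic_blocks(exec_list *instructions)
{
   unsigned num_blocks = 0;

   call_for_basic_blocks(instructions, count_block_cb, &num_blocks);
   return num_blocks;
}
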
diff --git a/icd/intel/compiler/shader/ir_basic_block.h b/icd/intel/compiler/shader/ir_basic_block.h
new file mode 100644
index 0000000..dbd678b
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_basic_block.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+void call_for_basic_blocks(exec_list *instructions,
+			   void (*callback)(ir_instruction *first,
+					    ir_instruction *last,
+					    void *data),
+			   void *data);
diff --git a/icd/intel/compiler/shader/ir_builder.cpp b/icd/intel/compiler/shader/ir_builder.cpp
new file mode 100644
index 0000000..f4a1c6e
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_builder.cpp
@@ -0,0 +1,558 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "ir_builder.h"
+#include "program/prog_instruction.h"
+
+using namespace ir_builder;
+
+namespace ir_builder {
+
+void
+ir_factory::emit(ir_instruction *ir)
+{
+   instructions->push_tail(ir);
+}
+
+ir_variable *
+ir_factory::make_temp(const glsl_type *type, const char *name)
+{
+   ir_variable *var;
+
+   var = new(mem_ctx) ir_variable(type, name, ir_var_temporary);
+   emit(var);
+
+   return var;
+}
+
+ir_assignment *
+assign(deref lhs, operand rhs, operand condition, int writemask)
+{
+   void *mem_ctx = ralloc_parent(lhs.val);
+
+   ir_assignment *assign = new(mem_ctx) ir_assignment(lhs.val,
+						      rhs.val,
+                                                      condition.val,
+                                                      writemask);
+
+   return assign;
+}
+
+ir_assignment *
+assign(deref lhs, operand rhs)
+{
+   return assign(lhs, rhs, (1 << lhs.val->type->vector_elements) - 1);
+}
+
+ir_assignment *
+assign(deref lhs, operand rhs, int writemask)
+{
+   return assign(lhs, rhs, (ir_rvalue *) NULL, writemask);
+}
+
+ir_assignment *
+assign(deref lhs, operand rhs, operand condition)
+{
+   return assign(lhs, rhs, condition, (1 << lhs.val->type->vector_elements) - 1);
+}
+
+ir_return *
+ret(operand retval)
+{
+   void *mem_ctx = ralloc_parent(retval.val);
+   return new(mem_ctx) ir_return(retval.val);
+}
+
+ir_swizzle *
+swizzle(operand a, int swizzle, int components)
+{
+   void *mem_ctx = ralloc_parent(a.val);
+
+   return new(mem_ctx) ir_swizzle(a.val,
+				  GET_SWZ(swizzle, 0),
+				  GET_SWZ(swizzle, 1),
+				  GET_SWZ(swizzle, 2),
+				  GET_SWZ(swizzle, 3),
+				  components);
+}
+
+ir_swizzle *
+swizzle_for_size(operand a, unsigned components)
+{
+   void *mem_ctx = ralloc_parent(a.val);
+
+   if (a.val->type->vector_elements < components)
+      components = a.val->type->vector_elements;
+
+   unsigned s[4] = { 0, 1, 2, 3 };
+   for (int i = components; i < 4; i++)
+      s[i] = components - 1;
+
+   return new(mem_ctx) ir_swizzle(a.val, s, components);
+}
+
+ir_swizzle *
+swizzle_xxxx(operand a)
+{
+   return swizzle(a, SWIZZLE_XXXX, 4);
+}
+
+ir_swizzle *
+swizzle_yyyy(operand a)
+{
+   return swizzle(a, SWIZZLE_YYYY, 4);
+}
+
+ir_swizzle *
+swizzle_zzzz(operand a)
+{
+   return swizzle(a, SWIZZLE_ZZZZ, 4);
+}
+
+ir_swizzle *
+swizzle_wwww(operand a)
+{
+   return swizzle(a, SWIZZLE_WWWW, 4);
+}
+
+ir_swizzle *
+swizzle_x(operand a)
+{
+   return swizzle(a, SWIZZLE_XXXX, 1);
+}
+
+ir_swizzle *
+swizzle_y(operand a)
+{
+   return swizzle(a, SWIZZLE_YYYY, 1);
+}
+
+ir_swizzle *
+swizzle_z(operand a)
+{
+   return swizzle(a, SWIZZLE_ZZZZ, 1);
+}
+
+ir_swizzle *
+swizzle_w(operand a)
+{
+   return swizzle(a, SWIZZLE_WWWW, 1);
+}
+
+ir_swizzle *
+swizzle_xy(operand a)
+{
+   return swizzle(a, SWIZZLE_XYZW, 2);
+}
+
+ir_swizzle *
+swizzle_xyz(operand a)
+{
+   return swizzle(a, SWIZZLE_XYZW, 3);
+}
+
+ir_swizzle *
+swizzle_xyzw(operand a)
+{
+   return swizzle(a, SWIZZLE_XYZW, 4);
+}
+
+ir_expression *
+expr(ir_expression_operation op, operand a)
+{
+   void *mem_ctx = ralloc_parent(a.val);
+
+   return new(mem_ctx) ir_expression(op, a.val);
+}
+
+ir_expression *
+expr(ir_expression_operation op, operand a, operand b)
+{
+   void *mem_ctx = ralloc_parent(a.val);
+
+   return new(mem_ctx) ir_expression(op, a.val, b.val);
+}
+
+ir_expression *
+expr(ir_expression_operation op, operand a, operand b, operand c)
+{
+   void *mem_ctx = ralloc_parent(a.val);
+
+   return new(mem_ctx) ir_expression(op, a.val, b.val, c.val);
+}
+
+ir_expression *add(operand a, operand b)
+{
+   return expr(ir_binop_add, a, b);
+}
+
+ir_expression *sub(operand a, operand b)
+{
+   return expr(ir_binop_sub, a, b);
+}
+
+ir_expression *min2(operand a, operand b)
+{
+   return expr(ir_binop_min, a, b);
+}
+
+ir_expression *max2(operand a, operand b)
+{
+   return expr(ir_binop_max, a, b);
+}
+
+ir_expression *mul(operand a, operand b)
+{
+   return expr(ir_binop_mul, a, b);
+}
+
+ir_expression *imul_high(operand a, operand b)
+{
+   return expr(ir_binop_imul_high, a, b);
+}
+
+ir_expression *div(operand a, operand b)
+{
+   return expr(ir_binop_div, a, b);
+}
+
+ir_expression *carry(operand a, operand b)
+{
+   return expr(ir_binop_carry, a, b);
+}
+
+ir_expression *borrow(operand a, operand b)
+{
+   return expr(ir_binop_borrow, a, b);
+}
+
+ir_expression *round_even(operand a)
+{
+   return expr(ir_unop_round_even, a);
+}
+
+/* dot for vectors, mul for scalars */
+ir_expression *dot(operand a, operand b)
+{
+   assert(a.val->type == b.val->type);
+
+   if (a.val->type->vector_elements == 1)
+      return expr(ir_binop_mul, a, b);
+
+   return expr(ir_binop_dot, a, b);
+}
+
+ir_expression*
+clamp(operand a, operand b, operand c)
+{
+   return expr(ir_binop_min, expr(ir_binop_max, a, b), c);
+}
+
+ir_expression *
+saturate(operand a)
+{
+   void *mem_ctx = ralloc_parent(a.val);
+
+   return expr(ir_binop_max,
+	       expr(ir_binop_min, a, new(mem_ctx) ir_constant(1.0f)),
+	       new(mem_ctx) ir_constant(0.0f));
+}
+
+ir_expression *
+abs(operand a)
+{
+   return expr(ir_unop_abs, a);
+}
+
+ir_expression *
+neg(operand a)
+{
+   return expr(ir_unop_neg, a);
+}
+
+ir_expression *
+sin(operand a)
+{
+   return expr(ir_unop_sin, a);
+}
+
+ir_expression *
+cos(operand a)
+{
+   return expr(ir_unop_cos, a);
+}
+
+ir_expression *
+exp(operand a)
+{
+   return expr(ir_unop_exp, a);
+}
+
+ir_expression *
+rsq(operand a)
+{
+   return expr(ir_unop_rsq, a);
+}
+
+ir_expression *
+sqrt(operand a)
+{
+   return expr(ir_unop_sqrt, a);
+}
+
+ir_expression *
+log(operand a)
+{
+   return expr(ir_unop_log, a);
+}
+
+ir_expression *
+sign(operand a)
+{
+   return expr(ir_unop_sign, a);
+}
+
+ir_expression*
+equal(operand a, operand b)
+{
+   return expr(ir_binop_equal, a, b);
+}
+
+ir_expression*
+nequal(operand a, operand b)
+{
+   return expr(ir_binop_nequal, a, b);
+}
+
+ir_expression*
+less(operand a, operand b)
+{
+   return expr(ir_binop_less, a, b);
+}
+
+ir_expression*
+greater(operand a, operand b)
+{
+   return expr(ir_binop_greater, a, b);
+}
+
+ir_expression*
+lequal(operand a, operand b)
+{
+   return expr(ir_binop_lequal, a, b);
+}
+
+ir_expression*
+gequal(operand a, operand b)
+{
+   return expr(ir_binop_gequal, a, b);
+}
+
+ir_expression*
+logic_not(operand a)
+{
+   return expr(ir_unop_logic_not, a);
+}
+
+ir_expression*
+logic_and(operand a, operand b)
+{
+   return expr(ir_binop_logic_and, a, b);
+}
+
+ir_expression*
+logic_or(operand a, operand b)
+{
+   return expr(ir_binop_logic_or, a, b);
+}
+
+ir_expression*
+bit_not(operand a)
+{
+   return expr(ir_unop_bit_not, a);
+}
+
+ir_expression*
+bit_and(operand a, operand b)
+{
+   return expr(ir_binop_bit_and, a, b);
+}
+
+ir_expression*
+bit_or(operand a, operand b)
+{
+   return expr(ir_binop_bit_or, a, b);
+}
+
+ir_expression*
+lshift(operand a, operand b)
+{
+   return expr(ir_binop_lshift, a, b);
+}
+
+ir_expression*
+rshift(operand a, operand b)
+{
+   return expr(ir_binop_rshift, a, b);
+}
+
+ir_expression*
+f2i(operand a)
+{
+   return expr(ir_unop_f2i, a);
+}
+
+ir_expression*
+bitcast_f2i(operand a)
+{
+   return expr(ir_unop_bitcast_f2i, a);
+}
+
+ir_expression*
+i2f(operand a)
+{
+   return expr(ir_unop_i2f, a);
+}
+
+ir_expression*
+bitcast_i2f(operand a)
+{
+   return expr(ir_unop_bitcast_i2f, a);
+}
+
+ir_expression*
+i2u(operand a)
+{
+   return expr(ir_unop_i2u, a);
+}
+
+ir_expression*
+u2i(operand a)
+{
+   return expr(ir_unop_u2i, a);
+}
+
+ir_expression*
+f2u(operand a)
+{
+   return expr(ir_unop_f2u, a);
+}
+
+ir_expression*
+bitcast_f2u(operand a)
+{
+   return expr(ir_unop_bitcast_f2u, a);
+}
+
+ir_expression*
+u2f(operand a)
+{
+   return expr(ir_unop_u2f, a);
+}
+
+ir_expression*
+bitcast_u2f(operand a)
+{
+   return expr(ir_unop_bitcast_u2f, a);
+}
+
+ir_expression*
+i2b(operand a)
+{
+   return expr(ir_unop_i2b, a);
+}
+
+ir_expression*
+b2i(operand a)
+{
+   return expr(ir_unop_b2i, a);
+}
+
+ir_expression *
+f2b(operand a)
+{
+   return expr(ir_unop_f2b, a);
+}
+
+ir_expression *
+b2f(operand a)
+{
+   return expr(ir_unop_b2f, a);
+}
+
+ir_expression *
+fma(operand a, operand b, operand c)
+{
+   return expr(ir_triop_fma, a, b, c);
+}
+
+ir_expression *
+lrp(operand x, operand y, operand a)
+{
+   return expr(ir_triop_lrp, x, y, a);
+}
+
+ir_expression *
+csel(operand a, operand b, operand c)
+{
+   return expr(ir_triop_csel, a, b, c);
+}
+
+ir_expression *
+bitfield_insert(operand a, operand b, operand c, operand d)
+{
+   void *mem_ctx = ralloc_parent(a.val);
+   return new(mem_ctx) ir_expression(ir_quadop_bitfield_insert,
+                                     a.val->type, a.val, b.val, c.val, d.val);
+}
+
+ir_if*
+if_tree(operand condition,
+        ir_instruction *then_branch)
+{
+   assert(then_branch != NULL);
+
+   void *mem_ctx = ralloc_parent(condition.val);
+
+   ir_if *result = new(mem_ctx) ir_if(condition.val);
+   result->then_instructions.push_tail(then_branch);
+   return result;
+}
+
+ir_if*
+if_tree(operand condition,
+        ir_instruction *then_branch,
+        ir_instruction *else_branch)
+{
+   assert(then_branch != NULL);
+   assert(else_branch != NULL);
+
+   void *mem_ctx = ralloc_parent(condition.val);
+
+   ir_if *result = new(mem_ctx) ir_if(condition.val);
+   result->then_instructions.push_tail(then_branch);
+   result->else_instructions.push_tail(else_branch);
+   return result;
+}
+
+} /* namespace ir_builder */
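
To show how the factory and the expression helpers above compose, here
is a hedged sketch that emits "tmp = saturate(x + y)" using only the
functions defined in this file.  It assumes float-typed x and y, and it
relies on the ir_factory constructor and the implicit operand/deref
conversions declared in ir_builder.h (not shown in this hunk):

void
emit_saturated_sum(void *mem_ctx, exec_list *instructions,
                   ir_variable *x, ir_variable *y)
{
   ir_builder::ir_factory body(instructions, mem_ctx);

   /* make_temp() both allocates the ir_variable and emits it. */
   ir_variable *tmp = body.make_temp(x->type, "tmp");

   body.emit(ir_builder::assign(tmp,
                                ir_builder::saturate(ir_builder::add(x, y))));
}
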
diff --git a/icd/intel/compiler/shader/ir_clone.cpp b/icd/intel/compiler/shader/ir_clone.cpp
new file mode 100644
index 0000000..c00adc5
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_clone.cpp
@@ -0,0 +1,448 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <string.h>
+#include "main/compiler.h"
+#include "ir.h"
+#include "glsl_types.h"
+#include "program/hash_table.h"
+
+ir_rvalue *
+ir_rvalue::clone(void *mem_ctx, struct hash_table *) const
+{
+   /* The only possible instantiation is the generic error value. */
+   return error_value(mem_ctx);
+}
+
+/**
+ * Duplicate an IR variable
+ */
+ir_variable *
+ir_variable::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_variable *var = new(mem_ctx) ir_variable(this->type, this->name,
+					       (ir_variable_mode) this->data.mode);
+
+   var->data.max_array_access = this->data.max_array_access;
+   if (this->is_interface_instance()) {
+      var->max_ifc_array_access =
+         rzalloc_array(var, unsigned, this->interface_type->length);
+      memcpy(var->max_ifc_array_access, this->max_ifc_array_access,
+             this->interface_type->length * sizeof(unsigned));
+   }
+
+   memcpy(&var->data, &this->data, sizeof(var->data));
+
+   var->warn_extension = this->warn_extension;
+
+   var->num_state_slots = this->num_state_slots;
+   if (this->state_slots) {
+      /* FINISHME: This really wants to use something like talloc_reference, but
+       * FINISHME: ralloc doesn't have any similar function.
+       */
+      var->state_slots = ralloc_array(var, ir_state_slot,
+				      this->num_state_slots);
+      memcpy(var->state_slots, this->state_slots,
+	     sizeof(this->state_slots[0]) * var->num_state_slots);
+   }
+
+   if (this->constant_value)
+      var->constant_value = this->constant_value->clone(mem_ctx, ht);
+
+   if (this->constant_initializer)
+      var->constant_initializer =
+	 this->constant_initializer->clone(mem_ctx, ht);
+
+   var->interface_type = this->interface_type;
+
+   if (ht) {
+      hash_table_insert(ht, var, (void *)const_cast<ir_variable *>(this));
+   }
+
+   return var;
+}
+
+ir_swizzle *
+ir_swizzle::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   return new(mem_ctx) ir_swizzle(this->val->clone(mem_ctx, ht), this->mask);
+}
+
+ir_return *
+ir_return::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_rvalue *new_value = NULL;
+
+   if (this->value)
+      new_value = this->value->clone(mem_ctx, ht);
+
+   return new(mem_ctx) ir_return(new_value);
+}
+
+ir_discard *
+ir_discard::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_rvalue *new_condition = NULL;
+
+   if (this->condition != NULL)
+      new_condition = this->condition->clone(mem_ctx, ht);
+
+   return new(mem_ctx) ir_discard(new_condition);
+}
+
+ir_loop_jump *
+ir_loop_jump::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   (void)ht;
+
+   return new(mem_ctx) ir_loop_jump(this->mode);
+}
+
+ir_if *
+ir_if::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_if *new_if = new(mem_ctx) ir_if(this->condition->clone(mem_ctx, ht));
+
+   foreach_list(n, &this->then_instructions) {
+      ir_instruction *ir = (ir_instruction *) n;
+      new_if->then_instructions.push_tail(ir->clone(mem_ctx, ht));
+   }
+
+   foreach_list(n, &this->else_instructions) {
+      ir_instruction *ir = (ir_instruction *) n;
+      new_if->else_instructions.push_tail(ir->clone(mem_ctx, ht));
+   }
+
+   return new_if;
+}
+
+ir_loop *
+ir_loop::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_loop *new_loop = new(mem_ctx) ir_loop();
+
+   foreach_list(n, &this->body_instructions) {
+      ir_instruction *ir = (ir_instruction *) n;
+      new_loop->body_instructions.push_tail(ir->clone(mem_ctx, ht));
+   }
+
+   return new_loop;
+}
+
+ir_call *
+ir_call::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_dereference_variable *new_return_ref = NULL;
+   if (this->return_deref != NULL)
+      new_return_ref = this->return_deref->clone(mem_ctx, ht);
+
+   exec_list new_parameters;
+
+   foreach_list(n, &this->actual_parameters) {
+      ir_instruction *ir = (ir_instruction *) n;
+      new_parameters.push_tail(ir->clone(mem_ctx, ht));
+   }
+
+   return new(mem_ctx) ir_call(this->callee, new_return_ref, &new_parameters);
+}
+
+ir_expression *
+ir_expression::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_rvalue *op[Elements(this->operands)] = { NULL, };
+   unsigned int i;
+
+   for (i = 0; i < get_num_operands(); i++) {
+      op[i] = this->operands[i]->clone(mem_ctx, ht);
+   }
+
+   return new(mem_ctx) ir_expression(this->operation, this->type,
+				     op[0], op[1], op[2], op[3]);
+}
+
+ir_dereference_variable *
+ir_dereference_variable::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_variable *new_var;
+
+   if (ht) {
+      new_var = (ir_variable *)hash_table_find(ht, this->var);
+      if (!new_var)
+	 new_var = this->var;
+   } else {
+      new_var = this->var;
+   }
+
+   return new(mem_ctx) ir_dereference_variable(new_var);
+}
+
+ir_dereference_array *
+ir_dereference_array::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   return new(mem_ctx) ir_dereference_array(this->array->clone(mem_ctx, ht),
+					    this->array_index->clone(mem_ctx,
+								     ht));
+}
+
+ir_dereference_record *
+ir_dereference_record::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   return new(mem_ctx) ir_dereference_record(this->record->clone(mem_ctx, ht),
+					     this->field);
+}
+
+ir_texture *
+ir_texture::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_texture *new_tex = new(mem_ctx) ir_texture(this->op);
+   new_tex->type = this->type;
+
+   new_tex->sampler = this->sampler->clone(mem_ctx, ht);
+   if (this->coordinate)
+      new_tex->coordinate = this->coordinate->clone(mem_ctx, ht);
+   if (this->projector)
+      new_tex->projector = this->projector->clone(mem_ctx, ht);
+   if (this->shadow_comparitor) {
+      new_tex->shadow_comparitor = this->shadow_comparitor->clone(mem_ctx, ht);
+   }
+
+   if (this->offset != NULL)
+      new_tex->offset = this->offset->clone(mem_ctx, ht);
+
+   switch (this->op) {
+   case ir_tex:
+   case ir_lod:
+   case ir_query_levels:
+      break;
+   case ir_txb:
+      new_tex->lod_info.bias = this->lod_info.bias->clone(mem_ctx, ht);
+      break;
+   case ir_txl:
+   case ir_txf:
+   case ir_txs:
+      new_tex->lod_info.lod = this->lod_info.lod->clone(mem_ctx, ht);
+      break;
+   case ir_txf_ms:
+      new_tex->lod_info.sample_index = this->lod_info.sample_index->clone(mem_ctx, ht);
+      break;
+   case ir_txd:
+      new_tex->lod_info.grad.dPdx = this->lod_info.grad.dPdx->clone(mem_ctx, ht);
+      new_tex->lod_info.grad.dPdy = this->lod_info.grad.dPdy->clone(mem_ctx, ht);
+      break;
+   case ir_tg4:
+      new_tex->lod_info.component = this->lod_info.component->clone(mem_ctx, ht);
+      break;
+   }
+
+   return new_tex;
+}
+
+ir_assignment *
+ir_assignment::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_rvalue *new_condition = NULL;
+
+   if (this->condition)
+      new_condition = this->condition->clone(mem_ctx, ht);
+
+   ir_assignment *cloned =
+      new(mem_ctx) ir_assignment(this->lhs->clone(mem_ctx, ht),
+                                 this->rhs->clone(mem_ctx, ht),
+                                 new_condition);
+   cloned->write_mask = this->write_mask;
+   return cloned;
+}
+
+ir_function *
+ir_function::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_function *copy = new(mem_ctx) ir_function(this->name);
+
+   foreach_list_const(node, &this->signatures) {
+      const ir_function_signature *const sig =
+	 (const ir_function_signature *const) node;
+
+      ir_function_signature *sig_copy = sig->clone(mem_ctx, ht);
+      copy->add_signature(sig_copy);
+
+      if (ht != NULL)
+	 hash_table_insert(ht, sig_copy,
+			   (void *)const_cast<ir_function_signature *>(sig));
+   }
+
+   return copy;
+}
+
+ir_function_signature *
+ir_function_signature::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_function_signature *copy = this->clone_prototype(mem_ctx, ht);
+
+   copy->is_defined = this->is_defined;
+
+   /* Clone the instruction list.
+    */
+   foreach_list_const(node, &this->body) {
+      const ir_instruction *const inst = (const ir_instruction *) node;
+
+      ir_instruction *const inst_copy = inst->clone(mem_ctx, ht);
+      copy->body.push_tail(inst_copy);
+   }
+
+   return copy;
+}
+
+ir_function_signature *
+ir_function_signature::clone_prototype(void *mem_ctx, struct hash_table *ht) const
+{
+   ir_function_signature *copy =
+      new(mem_ctx) ir_function_signature(this->return_type);
+
+   copy->is_defined = false;
+   copy->builtin_avail = this->builtin_avail;
+   copy->origin = this;
+
+   /* Clone the parameter list, but NOT the body.
+    */
+   foreach_list_const(node, &this->parameters) {
+      const ir_variable *const param = (const ir_variable *) node;
+
+      assert(const_cast<ir_variable *>(param)->as_variable() != NULL);
+
+      ir_variable *const param_copy = param->clone(mem_ctx, ht);
+      copy->parameters.push_tail(param_copy);
+   }
+
+   return copy;
+}
+
+ir_constant *
+ir_constant::clone(void *mem_ctx, struct hash_table *ht) const
+{
+   (void)ht;
+
+   switch (this->type->base_type) {
+   case GLSL_TYPE_UINT:
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_BOOL:
+      return new(mem_ctx) ir_constant(this->type, &this->value);
+
+   case GLSL_TYPE_STRUCT: {
+      ir_constant *c = new(mem_ctx) ir_constant;
+
+      c->type = this->type;
+      for (exec_node *node = this->components.head
+	      ; !node->is_tail_sentinel()
+	      ; node = node->next) {
+	 ir_constant *const orig = (ir_constant *) node;
+
+	 c->components.push_tail(orig->clone(mem_ctx, NULL));
+      }
+
+      return c;
+   }
+
+   case GLSL_TYPE_ARRAY: {
+      ir_constant *c = new(mem_ctx) ir_constant;
+
+      c->type = this->type;
+      c->array_elements = ralloc_array(c, ir_constant *, this->type->length);
+      for (unsigned i = 0; i < this->type->length; i++) {
+	 c->array_elements[i] = this->array_elements[i]->clone(mem_ctx, NULL);
+      }
+      return c;
+   }
+
+   case GLSL_TYPE_SAMPLER:
+   case GLSL_TYPE_IMAGE:
+   case GLSL_TYPE_ATOMIC_UINT:
+   case GLSL_TYPE_VOID:
+   case GLSL_TYPE_ERROR:
+   case GLSL_TYPE_INTERFACE:
+      assert(!"Should not get here.");
+      break;
+   }
+
+   return NULL;
+}
+
+
+class fixup_ir_call_visitor : public ir_hierarchical_visitor {
+public:
+   fixup_ir_call_visitor(struct hash_table *ht)
+   {
+      this->ht = ht;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_call *ir)
+   {
+      /* Try to find the function signature referenced by the ir_call in the
+       * table.  If it is found, replace it with the value from the table.
+       */
+      ir_function_signature *sig =
+	 (ir_function_signature *) hash_table_find(this->ht, ir->callee);
+      if (sig != NULL)
+	 ir->callee = sig;
+
+      /* Since this may be used before function call parameters are flattened,
+       * the children also need to be processed.
+       */
+      return visit_continue;
+   }
+
+private:
+   struct hash_table *ht;
+};
+
+
+static void
+fixup_function_calls(struct hash_table *ht, exec_list *instructions)
+{
+   fixup_ir_call_visitor v(ht);
+   v.run(instructions);
+}
+
+
+void
+clone_ir_list(void *mem_ctx, exec_list *out, const exec_list *in)
+{
+   struct hash_table *ht =
+      hash_table_ctor(0, hash_table_pointer_hash, hash_table_pointer_compare);
+
+   foreach_list_const(node, in) {
+      const ir_instruction *const original = (ir_instruction *) node;
+      ir_instruction *copy = original->clone(mem_ctx, ht);
+
+      out->push_tail(copy);
+   }
+
+   /* Make a pass over the cloned tree to fix up ir_call nodes to point to the
+    * cloned ir_function_signature nodes.  This cannot be done automatically
+    * during cloning because the ir_call might be a forward reference (i.e.,
+    * the function signature that it references may not have been cloned yet).
+    */
+   fixup_function_calls(ht, out);
+
+   hash_table_dtor(ht);
+}
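
As a usage note, clone_ir_list() is the entry point for deep-copying a
whole shader's IR; the two-pass ir_call fixup described in the comment
above happens internally.  A minimal sketch (the function and parameter
names are illustrative, not part of this patch):

void
copy_shader_ir(void *mem_ctx, exec_list *dst, const exec_list *src)
{
   dst->make_empty();                 /* start from an empty list       */
   clone_ir_list(mem_ctx, dst, src);  /* clone and fix up ir_call nodes */
}
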
diff --git a/icd/intel/compiler/shader/ir_constant_expression.cpp b/icd/intel/compiler/shader/ir_constant_expression.cpp
new file mode 100644
index 0000000..fd311d1
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_constant_expression.cpp
@@ -0,0 +1,1951 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_constant_expression.cpp
+ * Evaluate and process constant valued expressions
+ *
+ * In GLSL, constant valued expressions are used in several places.  These
+ * must be processed and evaluated very early in the compilation process.
+ *
+ *    * Sizes of arrays
+ *    * Initializers for uniforms
+ *    * Initializers for \c const variables
+ */
+
+#include <math.h>
+#include "libfns.h"
+#include "ir.h"
+#include "ir_visitor.h"
+#include "glsl_types.h"
+#include "program/hash_table.h"
+
+#if defined(_MSC_VER) && (_MSC_VER < 1800)
+static int isnormal(double x)
+{
+   return _fpclass(x) == _FPCLASS_NN || _fpclass(x) == _FPCLASS_PN;
+}
+#elif defined(__SUNPRO_CC)
+#include <ieeefp.h>
+static int isnormal(double x)
+{
+   return fpclass(x) == FP_NORMAL;
+}
+#endif
+
+#if defined(_MSC_VER)
+static double copysign(double x, double y)
+{
+   return _copysign(x, y);
+}
+#endif
+
+static float
+dot(ir_constant *op0, ir_constant *op1)
+{
+   assert(op0->type->is_float() && op1->type->is_float());
+
+   float result = 0;
+   for (unsigned c = 0; c < op0->type->components(); c++)
+      result += op0->value.f[c] * op1->value.f[c];
+
+   return result;
+}
+
+/* The memcpy approach is the only bit-cast method reliably supported by
+ * gcc.  Type-punning through unions in particular is iffy, and reading
+ * through a converted pointer is killed by strict aliasing.  OTOH, the
+ * compiler sees through the memcpy, so the resulting asm is reasonable.
+ */
+static float
+bitcast_u2f(unsigned int u)
+{
+   assert(sizeof(float) == sizeof(unsigned int));
+   float f;
+   memcpy(&f, &u, sizeof(f));
+   return f;
+}
+
+static unsigned int
+bitcast_f2u(float f)
+{
+   assert(sizeof(float) == sizeof(unsigned int));
+   unsigned int u;
+   memcpy(&u, &f, sizeof(f));
+   return u;
+}
+
+/**
+ * Evaluate one component of a floating-point 4x8 packing function.
+ */
+typedef uint8_t
+(*pack_1x8_func_t)(float);
+
+/**
+ * Evaluate one component of a floating-point 2x16 packing function.
+ */
+typedef uint16_t
+(*pack_1x16_func_t)(float);
+
+/**
+ * Evaluate one component of a floating-point 4x8 unpacking function.
+ */
+typedef float
+(*unpack_1x8_func_t)(uint8_t);
+
+/**
+ * Evaluate one component of a floating-point 2x16 unpacking function.
+ */
+typedef float
+(*unpack_1x16_func_t)(uint16_t);
+
+/**
+ * Evaluate a 2x16 floating-point packing function.
+ */
+static uint32_t
+pack_2x16(pack_1x16_func_t pack_1x16,
+          float x, float y)
+{
+   /* From section 8.4 of the GLSL ES 3.00 spec:
+    *
+    *    packSnorm2x16
+    *    -------------
+    *    The first component of the vector will be written to the least
+    *    significant bits of the output; the last component will be written to
+    *    the most significant bits.
+    *
+    * The specifications for the other packing functions contain similar
+    * language.
+    */
+   uint32_t u = 0;
+   u |= ((uint32_t) pack_1x16(x) << 0);
+   u |= ((uint32_t) pack_1x16(y) << 16);
+   return u;
+}
+
+/**
+ * Evaluate a 4x8 floating-point packing function.
+ */
+static uint32_t
+pack_4x8(pack_1x8_func_t pack_1x8,
+         float x, float y, float z, float w)
+{
+   /* From section 8.4 of the GLSL 4.30 spec:
+    *
+    *    packSnorm4x8
+    *    ------------
+    *    The first component of the vector will be written to the least
+    *    significant bits of the output; the last component will be written to
+    *    the most significant bits.
+    *
+    * The specifications for the other packing functions contain similar
+    * language.
+    */
+   uint32_t u = 0;
+   u |= ((uint32_t) pack_1x8(x) << 0);
+   u |= ((uint32_t) pack_1x8(y) << 8);
+   u |= ((uint32_t) pack_1x8(z) << 16);
+   u |= ((uint32_t) pack_1x8(w) << 24);
+   return u;
+}
+
+/**
+ * Evaluate a 2x16 floating-point unpacking function.
+ */
+static void
+unpack_2x16(unpack_1x16_func_t unpack_1x16,
+            uint32_t u,
+            float *x, float *y)
+{
+    /* From section 8.4 of the GLSL ES 3.00 spec:
+     *
+     *    unpackSnorm2x16
+     *    ---------------
+     *    The first component of the returned vector will be extracted from
+     *    the least significant bits of the input; the last component will be
+     *    extracted from the most significant bits.
+     *
+     * The specifications for the other unpacking functions contain similar
+     * language.
+     */
+   *x = unpack_1x16((uint16_t) (u & 0xffff));
+   *y = unpack_1x16((uint16_t) (u >> 16));
+}
+
+/**
+ * Evaluate a 4x8 floating-point unpacking function.
+ */
+static void
+unpack_4x8(unpack_1x8_func_t unpack_1x8, uint32_t u,
+           float *x, float *y, float *z, float *w)
+{
+    /* From section 8.4 of the GLSL 4.30 spec:
+     *
+     *    unpackSnorm4x8
+     *    --------------
+     *    The first component of the returned vector will be extracted from
+     *    the least significant bits of the input; the last component will be
+     *    extracted from the most significant bits.
+     *
+     * The specifications for the other unpacking functions contain similar
+     * language.
+     */
+   *x = unpack_1x8((uint8_t) (u & 0xff));
+   *y = unpack_1x8((uint8_t) (u >> 8));
+   *z = unpack_1x8((uint8_t) (u >> 16));
+   *w = unpack_1x8((uint8_t) (u >> 24));
+}
+
+/**
+ * Evaluate one component of packSnorm4x8.
+ */
+static uint8_t
+pack_snorm_1x8(float x)
+{
+    /* From section 8.4 of the GLSL 4.30 spec:
+     *
+     *    packSnorm4x8
+     *    ------------
+     *    The conversion for component c of v to fixed point is done as
+     *    follows:
+     *
+     *      packSnorm4x8: round(clamp(c, -1, +1) * 127.0)
+     *
+     * We must first cast the float to an int, because casting a negative
+     * float to a uint is undefined.
+     */
+   return (uint8_t) (int8_t)
+          _mesa_round_to_even(CLAMP(x, -1.0f, +1.0f) * 127.0f);
+}
+
+/**
+ * Evaluate one component of packSnorm2x16.
+ */
+static uint16_t
+pack_snorm_1x16(float x)
+{
+    /* From section 8.4 of the GLSL ES 3.00 spec:
+     *
+     *    packSnorm2x16
+     *    -------------
+     *    The conversion for component c of v to fixed point is done as
+     *    follows:
+     *
+     *      packSnorm2x16: round(clamp(c, -1, +1) * 32767.0)
+     *
+     * We must first cast the float to an int, because casting a negative
+     * float to a uint is undefined.
+     */
+   return (uint16_t) (int16_t)
+          _mesa_round_to_even(CLAMP(x, -1.0f, +1.0f) * 32767.0f);
+}
+
+/**
+ * Evaluate one component of unpackSnorm4x8.
+ */
+static float
+unpack_snorm_1x8(uint8_t u)
+{
+    /* From section 8.4 of the GLSL 4.30 spec:
+     *
+     *    unpackSnorm4x8
+     *    --------------
+     *    The conversion for unpacked fixed-point value f to floating point is
+     *    done as follows:
+     *
+     *       unpackSnorm4x8: clamp(f / 127.0, -1, +1)
+     */
+   return CLAMP((int8_t) u / 127.0f, -1.0f, +1.0f);
+}
+
+/**
+ * Evaluate one component of unpackSnorm2x16.
+ */
+static float
+unpack_snorm_1x16(uint16_t u)
+{
+    /* From section 8.4 of the GLSL ES 3.00 spec:
+     *
+     *    unpackSnorm2x16
+     *    ---------------
+     *    The conversion for unpacked fixed-point value f to floating point is
+     *    done as follows:
+     *
+     *       unpackSnorm2x16: clamp(f / 32767.0, -1, +1)
+     */
+   return CLAMP((int16_t) u / 32767.0f, -1.0f, +1.0f);
+}
+
+/**
+ * Evaluate one component of packUnorm4x8.
+ */
+static uint8_t
+pack_unorm_1x8(float x)
+{
+    /* From section 8.4 of the GLSL 4.30 spec:
+     *
+     *    packUnorm4x8
+     *    ------------
+     *    The conversion for component c of v to fixed point is done as
+     *    follows:
+     *
+     *       packUnorm4x8: round(clamp(c, 0, +1) * 255.0)
+     */
+   return (uint8_t) _mesa_round_to_even(CLAMP(x, 0.0f, 1.0f) * 255.0f);
+}
+
+/**
+ * Evaluate one component of packUnorm2x16.
+ */
+static uint16_t
+pack_unorm_1x16(float x)
+{
+    /* From section 8.4 of the GLSL ES 3.00 spec:
+     *
+     *    packUnorm2x16
+     *    -------------
+     *    The conversion for component c of v to fixed point is done as
+     *    follows:
+     *
+     *       packUnorm2x16: round(clamp(c, 0, +1) * 65535.0)
+     */
+   return (uint16_t) _mesa_round_to_even(CLAMP(x, 0.0f, 1.0f) * 65535.0f);
+}
+
+/**
+ * Evaluate one component of unpackUnorm4x8.
+ */
+static float
+unpack_unorm_1x8(uint8_t u)
+{
+    /* From section 8.4 of the GLSL 4.30 spec:
+     *
+     *    unpackUnorm4x8
+     *    --------------
+     *    The conversion for unpacked fixed-point value f to floating point is
+     *    done as follows:
+     *
+     *       unpackUnorm4x8: f / 255.0
+     */
+   return (float) u / 255.0f;
+}
+
+/**
+ * Evaluate one component of unpackUnorm2x16.
+ */
+static float
+unpack_unorm_1x16(uint16_t u)
+{
+    /* From section 8.4 of the GLSL ES 3.00 spec:
+     *
+     *    unpackUnorm2x16
+     *    ---------------
+     *    The conversion for unpacked fixed-point value f to floating point is
+     *    done as follows:
+     *
+     *       unpackUnorm2x16: f / 65535.0
+     */
+   return (float) u / 65535.0f;
+}
+
+/**
+ * Evaluate one component of packHalf2x16.
+ */
+static uint16_t
+pack_half_1x16(float x)
+{
+   return _mesa_float_to_half(x);
+}
+
+/**
+ * Evaluate one component of unpackHalf2x16.
+ */
+static float
+unpack_half_1x16(uint16_t u)
+{
+   return _mesa_half_to_float(u);
+}
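+
+/* Worked example of the layout rules quoted above (illustrative only,
+ * not part of the original sources): packing (x, y) = (-1.0f, +0.5f)
+ * with pack_2x16(pack_snorm_1x16, x, y) gives
+ *
+ *    pack_snorm_1x16(-1.0f) = round(clamp(-1.0, -1, +1) * 32767) = 0x8001
+ *    pack_snorm_1x16(+0.5f) = round(clamp(+0.5, -1, +1) * 32767) = 0x4000
+ *
+ * so x lands in the least significant half-word and y in the most
+ * significant one:
+ *
+ *    u = 0x8001 | (0x4000 << 16) = 0x40008001
+ *
+ * and unpack_2x16(unpack_snorm_1x16, u, &x, &y) recovers x = -1.0f
+ * exactly and y = 16384/32767, approximately +0.5f.
+ */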
+
+/**
+ * Get the constant that is ultimately referenced by an r-value, in a constant
+ * expression evaluation context.
+ *
+ * The offset is used when the reference is to a specific column of a matrix.
+ */
+static bool
+constant_referenced(const ir_dereference *deref,
+                    struct hash_table *variable_context,
+                    ir_constant *&store, int &offset)
+{
+   store = NULL;
+   offset = 0;
+
+   if (variable_context == NULL)
+      return false;
+
+   switch (deref->ir_type) {
+   case ir_type_dereference_array: {
+      const ir_dereference_array *const da =
+         (const ir_dereference_array *) deref;
+
+      ir_constant *const index_c =
+         da->array_index->constant_expression_value(variable_context);
+
+      if (!index_c || !index_c->type->is_scalar() || !index_c->type->is_integer())
+         break;
+
+      const int index = index_c->type->base_type == GLSL_TYPE_INT ?
+         index_c->get_int_component(0) :
+         index_c->get_uint_component(0);
+
+      ir_constant *substore;
+      int suboffset;
+
+      const ir_dereference *const deref = da->array->as_dereference();
+      if (!deref)
+         break;
+
+      if (!constant_referenced(deref, variable_context, substore, suboffset))
+         break;
+
+      const glsl_type *const vt = da->array->type;
+      if (vt->is_array()) {
+         store = substore->get_array_element(index);
+         offset = 0;
+      } else if (vt->is_matrix()) {
+         store = substore;
+         offset = index * vt->vector_elements;
+      } else if (vt->is_vector()) {
+         store = substore;
+         offset = suboffset + index;
+      }
+
+      break;
+   }
+
+   case ir_type_dereference_record: {
+      const ir_dereference_record *const dr =
+         (const ir_dereference_record *) deref;
+
+      const ir_dereference *const deref = dr->record->as_dereference();
+      if (!deref)
+         break;
+
+      ir_constant *substore;
+      int suboffset;
+
+      if (!constant_referenced(deref, variable_context, substore, suboffset))
+         break;
+
+      /* Record fields always start at the beginning of a constant, so
+       * the sub-offset we are dropping on the floor here must be zero.
+       */
+      assert(suboffset == 0);
+
+      store = substore->get_record_field(dr->field);
+      break;
+   }
+
+   case ir_type_dereference_variable: {
+      const ir_dereference_variable *const dv =
+         (const ir_dereference_variable *) deref;
+
+      store = (ir_constant *) hash_table_find(variable_context, dv->var);
+      break;
+   }
+
+   default:
+      assert(!"Should not get here.");
+      break;
+   }
+
+   return store != NULL;
+}
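+
+/* Illustrative example (not part of the original sources): for a
+ * dereference such as "m[1]", where "m" is a mat4 constant present in
+ * the variable context, the function above returns the ir_constant for
+ * "m" in "store" and sets offset = index * vector_elements = 4, the
+ * index of the first float of column 1 in the flattened matrix storage.
+ */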
+
+
+ir_constant *
+ir_rvalue::constant_expression_value(struct hash_table *)
+{
+   assert(this->type->is_error());
+   return NULL;
+}
+
+ir_constant *
+ir_expression::constant_expression_value(struct hash_table *variable_context)
+{
+   if (this->type->is_error())
+      return NULL;
+
+   ir_constant *op[Elements(this->operands)] = { NULL, };
+   ir_constant_data data;
+
+   memset(&data, 0, sizeof(data));
+
+   for (unsigned operand = 0; operand < this->get_num_operands(); operand++) {
+      op[operand] = this->operands[operand]->constant_expression_value(variable_context);
+      if (!op[operand])
+	 return NULL;
+   }
+
+   if (op[1] != NULL)
+      switch (this->operation) {
+      case ir_binop_lshift:
+      case ir_binop_rshift:
+      case ir_binop_ldexp:
+      case ir_binop_vector_extract:
+      case ir_triop_csel:
+      case ir_triop_bitfield_extract:
+         break;
+
+      default:
+         assert(op[0]->type->base_type == op[1]->type->base_type);
+         break;
+      }
+
+   bool op0_scalar = op[0]->type->is_scalar();
+   bool op1_scalar = op[1] != NULL && op[1]->type->is_scalar();
+
+   /* When iterating over a vector or matrix's components, we want to increase
+    * the loop counter.  However, for scalars, we want to stay at 0.
+    */
+   unsigned c0_inc = op0_scalar ? 0 : 1;
+   unsigned c1_inc = op1_scalar ? 0 : 1;
+   unsigned components;
+   if (op1_scalar || !op[1]) {
+      components = op[0]->type->components();
+   } else {
+      components = op[1]->type->components();
+   }
+
+   void *ctx = ralloc_parent(this);
+
+   /* Handle array operations here, rather than below. */
+   if (op[0]->type->is_array()) {
+      assert(op[1] != NULL && op[1]->type->is_array());
+      switch (this->operation) {
+      case ir_binop_all_equal:
+	 return new(ctx) ir_constant(op[0]->has_value(op[1]));
+      case ir_binop_any_nequal:
+	 return new(ctx) ir_constant(!op[0]->has_value(op[1]));
+      default:
+	 break;
+      }
+      return NULL;
+   }
+
+   switch (this->operation) {
+   case ir_unop_bit_not:
+       switch (op[0]->type->base_type) {
+       case GLSL_TYPE_INT:
+           for (unsigned c = 0; c < components; c++)
+               data.i[c] = ~ op[0]->value.i[c];
+           break;
+       case GLSL_TYPE_UINT:
+           for (unsigned c = 0; c < components; c++)
+               data.u[c] = ~ op[0]->value.u[c];
+           break;
+       default:
+           assert(0);
+       }
+       break;
+
+   case ir_unop_logic_not:
+      assert(op[0]->type->base_type == GLSL_TYPE_BOOL);
+      for (unsigned c = 0; c < op[0]->type->components(); c++)
+	 data.b[c] = !op[0]->value.b[c];
+      break;
+
+   case ir_unop_f2i:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.i[c] = (int) op[0]->value.f[c];
+      }
+      break;
+   case ir_unop_f2u:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+         data.u[c] = (unsigned) op[0]->value.f[c];
+      }
+      break;
+   case ir_unop_i2f:
+      assert(op[0]->type->base_type == GLSL_TYPE_INT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = (float) op[0]->value.i[c];
+      }
+      break;
+   case ir_unop_u2f:
+      assert(op[0]->type->base_type == GLSL_TYPE_UINT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = (float) op[0]->value.u[c];
+      }
+      break;
+   case ir_unop_b2f:
+      assert(op[0]->type->base_type == GLSL_TYPE_BOOL);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = op[0]->value.b[c] ? 1.0F : 0.0F;
+      }
+      break;
+   case ir_unop_f2b:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.b[c] = op[0]->value.f[c] != 0.0F ? true : false;
+      }
+      break;
+   case ir_unop_b2i:
+      assert(op[0]->type->base_type == GLSL_TYPE_BOOL);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.u[c] = op[0]->value.b[c] ? 1 : 0;
+      }
+      break;
+   case ir_unop_i2b:
+      assert(op[0]->type->is_integer());
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.b[c] = op[0]->value.u[c] ? true : false;
+      }
+      break;
+   case ir_unop_u2i:
+      assert(op[0]->type->base_type == GLSL_TYPE_UINT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.i[c] = op[0]->value.u[c];
+      }
+      break;
+   case ir_unop_i2u:
+      assert(op[0]->type->base_type == GLSL_TYPE_INT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.u[c] = op[0]->value.i[c];
+      }
+      break;
+   case ir_unop_bitcast_i2f:
+      assert(op[0]->type->base_type == GLSL_TYPE_INT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = bitcast_u2f(op[0]->value.i[c]);
+      }
+      break;
+   case ir_unop_bitcast_f2i:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.i[c] = bitcast_f2u(op[0]->value.f[c]);
+      }
+      break;
+   case ir_unop_bitcast_u2f:
+      assert(op[0]->type->base_type == GLSL_TYPE_UINT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = bitcast_u2f(op[0]->value.u[c]);
+      }
+      break;
+   case ir_unop_bitcast_f2u:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.u[c] = bitcast_f2u(op[0]->value.f[c]);
+      }
+      break;
+   case ir_unop_any:
+      assert(op[0]->type->is_boolean());
+      data.b[0] = false;
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 if (op[0]->value.b[c])
+	    data.b[0] = true;
+      }
+      break;
+
+   case ir_unop_trunc:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = truncf(op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_unop_round_even:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = _mesa_round_to_even(op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_unop_ceil:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = ceilf(op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_unop_floor:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = floorf(op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_unop_fract:
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 switch (this->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.u[c] = 0;
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.i[c] = 0;
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.f[c] = op[0]->value.f[c] - floor(op[0]->value.f[c]);
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+
+   case ir_unop_sin:
+   case ir_unop_sin_reduced:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = sinf(op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_unop_cos:
+   case ir_unop_cos_reduced:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = cosf(op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_unop_neg:
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 switch (this->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.u[c] = -((int) op[0]->value.u[c]);
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.i[c] = -op[0]->value.i[c];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.f[c] = -op[0]->value.f[c];
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+
+   case ir_unop_abs:
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 switch (this->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.u[c] = op[0]->value.u[c];
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.i[c] = op[0]->value.i[c];
+	    if (data.i[c] < 0)
+	       data.i[c] = -data.i[c];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.f[c] = fabs(op[0]->value.f[c]);
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+
+   case ir_unop_sign:
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 switch (this->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.u[c] = op[0]->value.u[c] > 0;
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.i[c] = (op[0]->value.i[c] > 0) - (op[0]->value.i[c] < 0);
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.f[c] = float((op[0]->value.f[c] > 0)-(op[0]->value.f[c] < 0));
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+
+   case ir_unop_rcp:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 switch (this->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    if (op[0]->value.u[c] != 0)
+	       data.u[c] = 1 / op[0]->value.u[c];
+	    break;
+	 case GLSL_TYPE_INT:
+	    if (op[0]->value.i[c] != 0)
+	       data.i[c] = 1 / op[0]->value.i[c];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    if (op[0]->value.f[c] != 0.0)
+	       data.f[c] = 1.0F / op[0]->value.f[c];
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+
+   case ir_unop_rsq:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = 1.0F / sqrtf(op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_unop_sqrt:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = sqrtf(op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_unop_exp:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = expf(op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_unop_exp2:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = exp2f(op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_unop_log:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = logf(op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_unop_log2:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = log2f(op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_unop_dFdx:
+   case ir_unop_dFdy:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = 0.0;
+      }
+      break;
+
+   case ir_unop_pack_snorm_2x16:
+      assert(op[0]->type == glsl_type::vec2_type);
+      data.u[0] = pack_2x16(pack_snorm_1x16,
+                            op[0]->value.f[0],
+                            op[0]->value.f[1]);
+      break;
+   case ir_unop_pack_snorm_4x8:
+      assert(op[0]->type == glsl_type::vec4_type);
+      data.u[0] = pack_4x8(pack_snorm_1x8,
+                           op[0]->value.f[0],
+                           op[0]->value.f[1],
+                           op[0]->value.f[2],
+                           op[0]->value.f[3]);
+      break;
+   case ir_unop_unpack_snorm_2x16:
+      assert(op[0]->type == glsl_type::uint_type);
+      unpack_2x16(unpack_snorm_1x16,
+                  op[0]->value.u[0],
+                  &data.f[0], &data.f[1]);
+      break;
+   case ir_unop_unpack_snorm_4x8:
+      assert(op[0]->type == glsl_type::uint_type);
+      unpack_4x8(unpack_snorm_1x8,
+                 op[0]->value.u[0],
+                 &data.f[0], &data.f[1], &data.f[2], &data.f[3]);
+      break;
+   case ir_unop_pack_unorm_2x16:
+      assert(op[0]->type == glsl_type::vec2_type);
+      data.u[0] = pack_2x16(pack_unorm_1x16,
+                            op[0]->value.f[0],
+                            op[0]->value.f[1]);
+      break;
+   case ir_unop_pack_unorm_4x8:
+      assert(op[0]->type == glsl_type::vec4_type);
+      data.u[0] = pack_4x8(pack_unorm_1x8,
+                           op[0]->value.f[0],
+                           op[0]->value.f[1],
+                           op[0]->value.f[2],
+                           op[0]->value.f[3]);
+      break;
+   case ir_unop_unpack_unorm_2x16:
+      assert(op[0]->type == glsl_type::uint_type);
+      unpack_2x16(unpack_unorm_1x16,
+                  op[0]->value.u[0],
+                  &data.f[0], &data.f[1]);
+      break;
+   case ir_unop_unpack_unorm_4x8:
+      assert(op[0]->type == glsl_type::uint_type);
+      unpack_4x8(unpack_unorm_1x8,
+                 op[0]->value.u[0],
+                 &data.f[0], &data.f[1], &data.f[2], &data.f[3]);
+      break;
+   case ir_unop_pack_half_2x16:
+      assert(op[0]->type == glsl_type::vec2_type);
+      data.u[0] = pack_2x16(pack_half_1x16,
+                            op[0]->value.f[0],
+                            op[0]->value.f[1]);
+      break;
+   case ir_unop_unpack_half_2x16:
+      assert(op[0]->type == glsl_type::uint_type);
+      unpack_2x16(unpack_half_1x16,
+                  op[0]->value.u[0],
+                  &data.f[0], &data.f[1]);
+      break;
+   case ir_binop_pow:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 data.f[c] = powf(op[0]->value.f[c], op[1]->value.f[c]);
+      }
+      break;
+
+   case ir_binop_dot:
+      data.f[0] = dot(op[0], op[1]);
+      break;
+
+   case ir_binop_min:
+      assert(op[0]->type == op[1]->type || op0_scalar || op1_scalar);
+      for (unsigned c = 0, c0 = 0, c1 = 0;
+	   c < components;
+	   c0 += c0_inc, c1 += c1_inc, c++) {
+
+	 switch (op[0]->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.u[c] = MIN2(op[0]->value.u[c0], op[1]->value.u[c1]);
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.i[c] = MIN2(op[0]->value.i[c0], op[1]->value.i[c1]);
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.f[c] = MIN2(op[0]->value.f[c0], op[1]->value.f[c1]);
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+
+      break;
+   case ir_binop_max:
+      assert(op[0]->type == op[1]->type || op0_scalar || op1_scalar);
+      for (unsigned c = 0, c0 = 0, c1 = 0;
+	   c < components;
+	   c0 += c0_inc, c1 += c1_inc, c++) {
+
+	 switch (op[0]->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.u[c] = MAX2(op[0]->value.u[c0], op[1]->value.u[c1]);
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.i[c] = MAX2(op[0]->value.i[c0], op[1]->value.i[c1]);
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.f[c] = MAX2(op[0]->value.f[c0], op[1]->value.f[c1]);
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+
+   case ir_binop_add:
+      assert(op[0]->type == op[1]->type || op0_scalar || op1_scalar);
+      for (unsigned c = 0, c0 = 0, c1 = 0;
+	   c < components;
+	   c0 += c0_inc, c1 += c1_inc, c++) {
+
+	 switch (op[0]->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.u[c] = op[0]->value.u[c0] + op[1]->value.u[c1];
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.i[c] = op[0]->value.i[c0] + op[1]->value.i[c1];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.f[c] = op[0]->value.f[c0] + op[1]->value.f[c1];
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+
+      break;
+   case ir_binop_sub:
+      assert(op[0]->type == op[1]->type || op0_scalar || op1_scalar);
+      for (unsigned c = 0, c0 = 0, c1 = 0;
+	   c < components;
+	   c0 += c0_inc, c1 += c1_inc, c++) {
+
+	 switch (op[0]->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.u[c] = op[0]->value.u[c0] - op[1]->value.u[c1];
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.i[c] = op[0]->value.i[c0] - op[1]->value.i[c1];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.f[c] = op[0]->value.f[c0] - op[1]->value.f[c1];
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+
+      break;
+   case ir_binop_mul:
+      /* Check for equal types, or unequal types involving scalars */
+      if ((op[0]->type == op[1]->type && !op[0]->type->is_matrix())
+	  || op0_scalar || op1_scalar) {
+	 for (unsigned c = 0, c0 = 0, c1 = 0;
+	      c < components;
+	      c0 += c0_inc, c1 += c1_inc, c++) {
+
+	    switch (op[0]->type->base_type) {
+	    case GLSL_TYPE_UINT:
+	       data.u[c] = op[0]->value.u[c0] * op[1]->value.u[c1];
+	       break;
+	    case GLSL_TYPE_INT:
+	       data.i[c] = op[0]->value.i[c0] * op[1]->value.i[c1];
+	       break;
+	    case GLSL_TYPE_FLOAT:
+	       data.f[c] = op[0]->value.f[c0] * op[1]->value.f[c1];
+	       break;
+	    default:
+	       assert(0);
+	    }
+	 }
+      } else {
+	 assert(op[0]->type->is_matrix() || op[1]->type->is_matrix());
+
+	 /* Multiply an N-by-M matrix with an M-by-P matrix.  Since either
+	  * matrix can be a GLSL vector, either N or P can be 1.
+	  *
+	  * For vec*mat, the vector is treated as a row vector.  This
+	  * means the vector is a 1-row x M-column matrix.
+	  *
+	  * For mat*vec, the vector is treated as a column vector.  Since
+	  * matrix_columns is 1 for vectors, this just works.
+	  */
+	 const unsigned n = op[0]->type->is_vector()
+	    ? 1 : op[0]->type->vector_elements;
+	 const unsigned m = op[1]->type->vector_elements;
+	 const unsigned p = op[1]->type->matrix_columns;
+	 for (unsigned j = 0; j < p; j++) {
+	    for (unsigned i = 0; i < n; i++) {
+	       for (unsigned k = 0; k < m; k++) {
+		  data.f[i+n*j] += op[0]->value.f[i+n*k]*op[1]->value.f[k+m*j];
+	       }
+	    }
+	 }
+      }
+
+      break;
+   case ir_binop_div:
+      /* FINISHME: Emit warning when division-by-zero is detected. */
+      assert(op[0]->type == op[1]->type || op0_scalar || op1_scalar);
+      for (unsigned c = 0, c0 = 0, c1 = 0;
+	   c < components;
+	   c0 += c0_inc, c1 += c1_inc, c++) {
+
+	 switch (op[0]->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    if (op[1]->value.u[c1] == 0) {
+	       data.u[c] = 0;
+	    } else {
+	       data.u[c] = op[0]->value.u[c0] / op[1]->value.u[c1];
+	    }
+	    break;
+	 case GLSL_TYPE_INT:
+	    if (op[1]->value.i[c1] == 0) {
+	       data.i[c] = 0;
+	    } else {
+	       data.i[c] = op[0]->value.i[c0] / op[1]->value.i[c1];
+	    }
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.f[c] = op[0]->value.f[c0] / op[1]->value.f[c1];
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+
+      break;
+   case ir_binop_mod:
+      /* FINISHME: Emit warning when division-by-zero is detected. */
+      assert(op[0]->type == op[1]->type || op0_scalar || op1_scalar);
+      for (unsigned c = 0, c0 = 0, c1 = 0;
+	   c < components;
+	   c0 += c0_inc, c1 += c1_inc, c++) {
+
+	 switch (op[0]->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    if (op[1]->value.u[c1] == 0) {
+	       data.u[c] = 0;
+	    } else {
+	       data.u[c] = op[0]->value.u[c0] % op[1]->value.u[c1];
+	    }
+	    break;
+	 case GLSL_TYPE_INT:
+	    if (op[1]->value.i[c1] == 0) {
+	       data.i[c] = 0;
+	    } else {
+	       data.i[c] = op[0]->value.i[c0] % op[1]->value.i[c1];
+	    }
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    /* We don't use fmod because it rounds toward zero; GLSL specifies
+	     * the use of floor.
+	     */
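+	    /* e.g. mod(-1.5, 2.0) = -1.5 - 2.0 * floor(-0.75)
+	     *                     = -1.5 + 2.0 = 0.5,
+	     * whereas fmodf(-1.5f, 2.0f) would return -1.5.
+	     */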
+	    data.f[c] = op[0]->value.f[c0] - op[1]->value.f[c1]
+	       * floorf(op[0]->value.f[c0] / op[1]->value.f[c1]);
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+
+      break;
+
+   case ir_binop_logic_and:
+      assert(op[0]->type->base_type == GLSL_TYPE_BOOL);
+      for (unsigned c = 0; c < op[0]->type->components(); c++)
+	 data.b[c] = op[0]->value.b[c] && op[1]->value.b[c];
+      break;
+   case ir_binop_logic_xor:
+      assert(op[0]->type->base_type == GLSL_TYPE_BOOL);
+      for (unsigned c = 0; c < op[0]->type->components(); c++)
+	 data.b[c] = op[0]->value.b[c] ^ op[1]->value.b[c];
+      break;
+   case ir_binop_logic_or:
+      assert(op[0]->type->base_type == GLSL_TYPE_BOOL);
+      for (unsigned c = 0; c < op[0]->type->components(); c++)
+	 data.b[c] = op[0]->value.b[c] || op[1]->value.b[c];
+      break;
+
+   case ir_binop_less:
+      assert(op[0]->type == op[1]->type);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 switch (op[0]->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.b[c] = op[0]->value.u[c] < op[1]->value.u[c];
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.b[c] = op[0]->value.i[c] < op[1]->value.i[c];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.b[c] = op[0]->value.f[c] < op[1]->value.f[c];
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+   case ir_binop_greater:
+      assert(op[0]->type == op[1]->type);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 switch (op[0]->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.b[c] = op[0]->value.u[c] > op[1]->value.u[c];
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.b[c] = op[0]->value.i[c] > op[1]->value.i[c];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.b[c] = op[0]->value.f[c] > op[1]->value.f[c];
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+   case ir_binop_lequal:
+      assert(op[0]->type == op[1]->type);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 switch (op[0]->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.b[c] = op[0]->value.u[c] <= op[1]->value.u[c];
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.b[c] = op[0]->value.i[c] <= op[1]->value.i[c];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.b[c] = op[0]->value.f[c] <= op[1]->value.f[c];
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+   case ir_binop_gequal:
+      assert(op[0]->type == op[1]->type);
+      for (unsigned c = 0; c < op[0]->type->components(); c++) {
+	 switch (op[0]->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.b[c] = op[0]->value.u[c] >= op[1]->value.u[c];
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.b[c] = op[0]->value.i[c] >= op[1]->value.i[c];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.b[c] = op[0]->value.f[c] >= op[1]->value.f[c];
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+   case ir_binop_equal:
+      assert(op[0]->type == op[1]->type);
+      for (unsigned c = 0; c < components; c++) {
+	 switch (op[0]->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.b[c] = op[0]->value.u[c] == op[1]->value.u[c];
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.b[c] = op[0]->value.i[c] == op[1]->value.i[c];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.b[c] = op[0]->value.f[c] == op[1]->value.f[c];
+	    break;
+	 case GLSL_TYPE_BOOL:
+	    data.b[c] = op[0]->value.b[c] == op[1]->value.b[c];
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+   case ir_binop_nequal:
+      assert(op[0]->type == op[1]->type);
+      for (unsigned c = 0; c < components; c++) {
+	 switch (op[0]->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	    data.b[c] = op[0]->value.u[c] != op[1]->value.u[c];
+	    break;
+	 case GLSL_TYPE_INT:
+	    data.b[c] = op[0]->value.i[c] != op[1]->value.i[c];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.b[c] = op[0]->value.f[c] != op[1]->value.f[c];
+	    break;
+	 case GLSL_TYPE_BOOL:
+	    data.b[c] = op[0]->value.b[c] != op[1]->value.b[c];
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+   case ir_binop_all_equal:
+      data.b[0] = op[0]->has_value(op[1]);
+      break;
+   case ir_binop_any_nequal:
+      data.b[0] = !op[0]->has_value(op[1]);
+      break;
+
+   case ir_binop_lshift:
+      for (unsigned c = 0, c0 = 0, c1 = 0;
+           c < components;
+           c0 += c0_inc, c1 += c1_inc, c++) {
+
+          if (op[0]->type->base_type == GLSL_TYPE_INT &&
+              op[1]->type->base_type == GLSL_TYPE_INT) {
+              data.i[c] = op[0]->value.i[c0] << op[1]->value.i[c1];
+
+          } else if (op[0]->type->base_type == GLSL_TYPE_INT &&
+                     op[1]->type->base_type == GLSL_TYPE_UINT) {
+              data.i[c] = op[0]->value.i[c0] << op[1]->value.u[c1];
+
+          } else if (op[0]->type->base_type == GLSL_TYPE_UINT &&
+                     op[1]->type->base_type == GLSL_TYPE_INT) {
+              data.u[c] = op[0]->value.u[c0] << op[1]->value.i[c1];
+
+          } else if (op[0]->type->base_type == GLSL_TYPE_UINT &&
+                     op[1]->type->base_type == GLSL_TYPE_UINT) {
+              data.u[c] = op[0]->value.u[c0] << op[1]->value.u[c1];
+          }
+      }
+      break;
+
+   case ir_binop_rshift:
+       for (unsigned c = 0, c0 = 0, c1 = 0;
+            c < components;
+            c0 += c0_inc, c1 += c1_inc, c++) {
+
+           if (op[0]->type->base_type == GLSL_TYPE_INT &&
+               op[1]->type->base_type == GLSL_TYPE_INT) {
+               data.i[c] = op[0]->value.i[c0] >> op[1]->value.i[c1];
+
+           } else if (op[0]->type->base_type == GLSL_TYPE_INT &&
+                      op[1]->type->base_type == GLSL_TYPE_UINT) {
+               data.i[c] = op[0]->value.i[c0] >> op[1]->value.u[c1];
+
+           } else if (op[0]->type->base_type == GLSL_TYPE_UINT &&
+                      op[1]->type->base_type == GLSL_TYPE_INT) {
+               data.u[c] = op[0]->value.u[c0] >> op[1]->value.i[c1];
+
+           } else if (op[0]->type->base_type == GLSL_TYPE_UINT &&
+                      op[1]->type->base_type == GLSL_TYPE_UINT) {
+               data.u[c] = op[0]->value.u[c0] >> op[1]->value.u[c1];
+           }
+       }
+       break;
+
+   case ir_binop_bit_and:
+      for (unsigned c = 0, c0 = 0, c1 = 0;
+           c < components;
+           c0 += c0_inc, c1 += c1_inc, c++) {
+
+          switch (op[0]->type->base_type) {
+          case GLSL_TYPE_INT:
+              data.i[c] = op[0]->value.i[c0] & op[1]->value.i[c1];
+              break;
+          case GLSL_TYPE_UINT:
+              data.u[c] = op[0]->value.u[c0] & op[1]->value.u[c1];
+              break;
+          default:
+              assert(0);
+          }
+      }
+      break;
+
+   case ir_binop_bit_or:
+      for (unsigned c = 0, c0 = 0, c1 = 0;
+           c < components;
+           c0 += c0_inc, c1 += c1_inc, c++) {
+
+          switch (op[0]->type->base_type) {
+          case GLSL_TYPE_INT:
+              data.i[c] = op[0]->value.i[c0] | op[1]->value.i[c1];
+              break;
+          case GLSL_TYPE_UINT:
+              data.u[c] = op[0]->value.u[c0] | op[1]->value.u[c1];
+              break;
+          default:
+              assert(0);
+          }
+      }
+      break;
+
+   case ir_binop_vector_extract: {
+      const int c = CLAMP(op[1]->value.i[0], 0,
+			  (int) op[0]->type->vector_elements - 1);
+
+      switch (op[0]->type->base_type) {
+      case GLSL_TYPE_UINT:
+         data.u[0] = op[0]->value.u[c];
+         break;
+      case GLSL_TYPE_INT:
+         data.i[0] = op[0]->value.i[c];
+         break;
+      case GLSL_TYPE_FLOAT:
+         data.f[0] = op[0]->value.f[c];
+         break;
+      case GLSL_TYPE_BOOL:
+         data.b[0] = op[0]->value.b[c];
+         break;
+      default:
+         assert(0);
+      }
+      break;
+   }
+
+   case ir_binop_bit_xor:
+      for (unsigned c = 0, c0 = 0, c1 = 0;
+           c < components;
+           c0 += c0_inc, c1 += c1_inc, c++) {
+
+          switch (op[0]->type->base_type) {
+          case GLSL_TYPE_INT:
+              data.i[c] = op[0]->value.i[c0] ^ op[1]->value.i[c1];
+              break;
+          case GLSL_TYPE_UINT:
+              data.u[c] = op[0]->value.u[c0] ^ op[1]->value.u[c1];
+              break;
+          default:
+              assert(0);
+          }
+      }
+      break;
+
+   case ir_unop_bitfield_reverse:
+      /* http://graphics.stanford.edu/~seander/bithacks.html#BitReverseObvious */
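+      /* e.g. reversing v = 0x00000001: the loop body never runs (v >>= 1
+       * clears it), leaving r = 1 and s = 31, and r <<= s gives 0x80000000.
+       */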
+      for (unsigned c = 0; c < components; c++) {
+         unsigned int v = op[0]->value.u[c]; // input bits to be reversed
+         unsigned int r = v; // r will be reversed bits of v; first get LSB of v
+         int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end
+
+         for (v >>= 1; v; v >>= 1) {
+            r <<= 1;
+            r |= v & 1;
+            s--;
+         }
+         r <<= s; // shift when v's highest bits are zero
+
+         data.u[c] = r;
+      }
+      break;
+
+   case ir_unop_bit_count:
+      for (unsigned c = 0; c < components; c++) {
+         unsigned count = 0;
+         unsigned v = op[0]->value.u[c];
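+         /* Kernighan's method: each v &= v - 1 clears the lowest set bit,
+          * so the loop iterates once per set bit.
+          */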
+
+         for (; v; count++) {
+            v &= v - 1;
+         }
+         data.u[c] = count;
+      }
+      break;
+
+   case ir_unop_find_msb:
+      for (unsigned c = 0; c < components; c++) {
+         int v = op[0]->value.i[c];
+
+         if (v == 0 || (op[0]->type->base_type == GLSL_TYPE_INT && v == -1))
+            data.i[c] = -1;
+         else {
+            int count = 0;
+            int top_bit = op[0]->type->base_type == GLSL_TYPE_UINT
+                          ? 0 : v & (1 << 31);
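+            /* e.g. for v = 0x00010000 the loop shifts v left 15 times before
+             * bit 31 differs from top_bit, giving 31 - 15 = 16.
+             */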
+
+            while (((v & (1 << 31)) == top_bit) && count != 32) {
+               count++;
+               v <<= 1;
+            }
+
+            data.i[c] = 31 - count;
+         }
+      }
+      break;
+
+   case ir_unop_find_lsb:
+      for (unsigned c = 0; c < components; c++) {
+         if (op[0]->value.i[c] == 0)
+            data.i[c] = -1;
+         else {
+            unsigned pos = 0;
+            unsigned v = op[0]->value.u[c];
+
+            for (; !(v & 1); v >>= 1) {
+               pos++;
+            }
+            data.u[c] = pos;
+         }
+      }
+      break;
+
+   case ir_triop_bitfield_extract: {
+      int offset = op[1]->value.i[0];
+      int bits = op[2]->value.i[0];
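+      /* The shifts below move the field to the top of the word and back
+       * down, e.g. offset = 4, bits = 8 gives value <<= 20 then
+       * value >>= 24, leaving input bits [11:4] (sign-extended for ints).
+       */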
+
+      for (unsigned c = 0; c < components; c++) {
+         if (bits == 0)
+            data.u[c] = 0;
+         else if (offset < 0 || bits < 0)
+            data.u[c] = 0; /* Undefined, per spec. */
+         else if (offset + bits > 32)
+            data.u[c] = 0; /* Undefined, per spec. */
+         else {
+            if (op[0]->type->base_type == GLSL_TYPE_INT) {
+               /* int so that the right shift will sign-extend. */
+               int value = op[0]->value.i[c];
+               value <<= 32 - bits - offset;
+               value >>= 32 - bits;
+               data.i[c] = value;
+            } else {
+               unsigned value = op[0]->value.u[c];
+               value <<= 32 - bits - offset;
+               value >>= 32 - bits;
+               data.u[c] = value;
+            }
+         }
+      }
+      break;
+   }
+
+   case ir_binop_bfm: {
+      int bits = op[0]->value.i[0];
+      int offset = op[1]->value.i[0];
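+      /* bfm builds the mask used by bitfieldInsert, e.g. bits = 8 and
+       * offset = 4 yield ((1 << 8) - 1) << 4 = 0x00000ff0.
+       */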
+
+      for (unsigned c = 0; c < components; c++) {
+         if (bits == 0)
+            data.u[c] = op[0]->value.u[c];
+         else if (offset < 0 || bits < 0)
+            data.u[c] = 0; /* Undefined for bitfieldInsert, per spec. */
+         else if (offset + bits > 32)
+            data.u[c] = 0; /* Undefined for bitfieldInsert, per spec. */
+         else
+            data.u[c] = ((1 << bits) - 1) << offset;
+      }
+      break;
+   }
+
+   case ir_binop_ldexp:
+      for (unsigned c = 0; c < components; c++) {
+         data.f[c] = ldexp(op[0]->value.f[c], op[1]->value.i[c]);
+         /* Flush subnormal values to zero. */
+         if (!isnormal(data.f[c]))
+            data.f[c] = copysign(0.0f, op[0]->value.f[c]);
+      }
+      break;
+
+   case ir_triop_fma:
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(op[1]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(op[2]->type->base_type == GLSL_TYPE_FLOAT);
+
+      for (unsigned c = 0; c < components; c++) {
+         data.f[c] = op[0]->value.f[c] * op[1]->value.f[c]
+                                       + op[2]->value.f[c];
+      }
+      break;
+
+   case ir_triop_lrp: {
+      assert(op[0]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(op[1]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(op[2]->type->base_type == GLSL_TYPE_FLOAT);
+
+      unsigned c2_inc = op[2]->type->is_scalar() ? 0 : 1;
+      for (unsigned c = 0, c2 = 0; c < components; c2 += c2_inc, c++) {
+         data.f[c] = op[0]->value.f[c] * (1.0f - op[2]->value.f[c2]) +
+                     (op[1]->value.f[c] * op[2]->value.f[c2]);
+      }
+      break;
+   }
+
+   case ir_triop_csel:
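+      /* Copying through the .u union member moves raw bits, so this one
+       * loop handles float, int, uint and bool operands alike.
+       */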
+      for (unsigned c = 0; c < components; c++) {
+         data.u[c] = op[0]->value.b[c] ? op[1]->value.u[c]
+                                       : op[2]->value.u[c];
+      }
+      break;
+
+   case ir_triop_vector_insert: {
+      const unsigned idx = op[2]->value.u[0];
+
+      memcpy(&data, &op[0]->value, sizeof(data));
+
+      switch (this->type->base_type) {
+      case GLSL_TYPE_INT:
+	 data.i[idx] = op[1]->value.i[0];
+	 break;
+      case GLSL_TYPE_UINT:
+	 data.u[idx] = op[1]->value.u[0];
+	 break;
+      case GLSL_TYPE_FLOAT:
+	 data.f[idx] = op[1]->value.f[0];
+	 break;
+      case GLSL_TYPE_BOOL:
+	 data.b[idx] = op[1]->value.b[0];
+	 break;
+      default:
+	 assert(!"Should not get here.");
+	 break;
+      }
+      break;
+   }
+
+   case ir_quadop_bitfield_insert: {
+      int offset = op[2]->value.i[0];
+      int bits = op[3]->value.i[0];
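+      /* e.g. base = 0xaaaaaaaa, insert = 0x5, offset = 4, bits = 4:
+       * insert_mask = 0xf0 and the result is 0xaaaaaa5a.
+       */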
+
+      for (unsigned c = 0; c < components; c++) {
+         if (bits == 0)
+            data.u[c] = op[0]->value.u[c];
+         else if (offset < 0 || bits < 0)
+            data.u[c] = 0; /* Undefined, per spec. */
+         else if (offset + bits > 32)
+            data.u[c] = 0; /* Undefined, per spec. */
+         else {
+            unsigned insert_mask = ((1 << bits) - 1) << offset;
+
+            unsigned insert = op[1]->value.u[c];
+            insert <<= offset;
+            insert &= insert_mask;
+
+            unsigned base = op[0]->value.u[c];
+            base &= ~insert_mask;
+
+            data.u[c] = base | insert;
+         }
+      }
+      break;
+   }
+
+   case ir_quadop_vector:
+      for (unsigned c = 0; c < this->type->vector_elements; c++) {
+	 switch (this->type->base_type) {
+	 case GLSL_TYPE_INT:
+	    data.i[c] = op[c]->value.i[0];
+	    break;
+	 case GLSL_TYPE_UINT:
+	    data.u[c] = op[c]->value.u[0];
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    data.f[c] = op[c]->value.f[0];
+	    break;
+	 default:
+	    assert(0);
+	 }
+      }
+      break;
+
+   default:
+      /* FINISHME: Should handle all expression types. */
+      return NULL;
+   }
+
+   return new(ctx) ir_constant(this->type, &data);
+}
+
+
+ir_constant *
+ir_texture::constant_expression_value(struct hash_table *)
+{
+   /* texture lookups aren't constant expressions */
+   return NULL;
+}
+
+
+ir_constant *
+ir_swizzle::constant_expression_value(struct hash_table *variable_context)
+{
+   ir_constant *v = this->val->constant_expression_value(variable_context);
+
+   if (v != NULL) {
+      ir_constant_data data = { { 0 } };
+
+      const unsigned swiz_idx[4] = {
+	 this->mask.x, this->mask.y, this->mask.z, this->mask.w
+      };
+
+      for (unsigned i = 0; i < this->mask.num_components; i++) {
+	 switch (v->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	 case GLSL_TYPE_INT:   data.u[i] = v->value.u[swiz_idx[i]]; break;
+	 case GLSL_TYPE_FLOAT: data.f[i] = v->value.f[swiz_idx[i]]; break;
+	 case GLSL_TYPE_BOOL:  data.b[i] = v->value.b[swiz_idx[i]]; break;
+	 default:              assert(!"Should not get here."); break;
+	 }
+      }
+
+      void *ctx = ralloc_parent(this);
+      return new(ctx) ir_constant(this->type, &data);
+   }
+   return NULL;
+}
+
+
+ir_constant *
+ir_dereference_variable::constant_expression_value(struct hash_table *variable_context)
+{
+   /* This may occur during compilation, when var->type is glsl_type::error_type. */
+   if (!var)
+      return NULL;
+
+   /* Give priority to the context hashtable, if it exists */
+   if (variable_context) {
+      ir_constant *value = (ir_constant *)hash_table_find(variable_context, var);
+      if (value)
+	 return value;
+   }
+
+   /* The constant_value of a uniform variable is its initializer,
+    * not the lifetime constant value of the uniform.
+    */
+   if (var->data.mode == ir_var_uniform)
+      return NULL;
+
+   if (!var->constant_value)
+      return NULL;
+
+   return var->constant_value->clone(ralloc_parent(var), NULL);
+}
+
+
+ir_constant *
+ir_dereference_array::constant_expression_value(struct hash_table *variable_context)
+{
+   ir_constant *array = this->array->constant_expression_value(variable_context);
+   ir_constant *idx = this->array_index->constant_expression_value(variable_context);
+
+   if ((array != NULL) && (idx != NULL)) {
+      void *ctx = ralloc_parent(this);
+      if (array->type->is_matrix()) {
+	 /* Array access of a matrix results in a vector.
+	  */
+	 const unsigned column = idx->value.u[0];
+
+	 const glsl_type *const column_type = array->type->column_type();
+
+	 /* Offset in the constant matrix to the first element of the column
+	  * to be extracted.
+	  */
+	 const unsigned mat_idx = column * column_type->vector_elements;
+
+	 ir_constant_data data = { { 0 } };
+
+	 switch (column_type->base_type) {
+	 case GLSL_TYPE_UINT:
+	 case GLSL_TYPE_INT:
+	    for (unsigned i = 0; i < column_type->vector_elements; i++)
+	       data.u[i] = array->value.u[mat_idx + i];
+
+	    break;
+
+	 case GLSL_TYPE_FLOAT:
+	    for (unsigned i = 0; i < column_type->vector_elements; i++)
+	       data.f[i] = array->value.f[mat_idx + i];
+
+	    break;
+
+	 default:
+	    assert(!"Should not get here.");
+	    break;
+	 }
+
+	 return new(ctx) ir_constant(column_type, &data);
+      } else if (array->type->is_vector()) {
+	 const unsigned component = idx->value.u[0];
+
+	 return new(ctx) ir_constant(array, component);
+      } else {
+	 const unsigned index = idx->value.u[0];
+	 return array->get_array_element(index)->clone(ctx, NULL);
+      }
+   }
+   return NULL;
+}
+
+
+ir_constant *
+ir_dereference_record::constant_expression_value(struct hash_table *)
+{
+   ir_constant *v = this->record->constant_expression_value();
+
+   return (v != NULL) ? v->get_record_field(this->field) : NULL;
+}
+
+
+ir_constant *
+ir_assignment::constant_expression_value(struct hash_table *)
+{
+   /* FINISHME: Handle CEs involving assignment (return RHS) */
+   return NULL;
+}
+
+
+ir_constant *
+ir_constant::constant_expression_value(struct hash_table *)
+{
+   return this;
+}
+
+
+ir_constant *
+ir_call::constant_expression_value(struct hash_table *variable_context)
+{
+   return this->callee->constant_expression_value(&this->actual_parameters, variable_context);
+}
+
+
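+/**
+ * Interpret a list of IR instructions as a constant expression.
+ *
+ * Each supported ir_type is annotated below with the s-expression form it
+ * handles.  Returns false as soon as anything non-constant or unhandled is
+ * encountered; on success, *result holds the value of any (return) that was
+ * executed, or NULL if the end of the list was reached.
+ */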
+bool ir_function_signature::constant_expression_evaluate_expression_list(const struct exec_list &body,
+									 struct hash_table *variable_context,
+									 ir_constant **result)
+{
+   foreach_list(n, &body) {
+      ir_instruction *inst = (ir_instruction *)n;
+      switch (inst->ir_type) {
+
+	 /* (declare () type symbol) */
+      case ir_type_variable: {
+	 ir_variable *var = inst->as_variable();
+	 hash_table_insert(variable_context, ir_constant::zero(this, var->type), var);
+	 break;
+      }
+
+	 /* (assign [condition] (write-mask) (ref) (value)) */
+      case ir_type_assignment: {
+	 ir_assignment *asg = inst->as_assignment();
+	 if (asg->condition) {
+	    ir_constant *cond = asg->condition->constant_expression_value(variable_context);
+	    if (!cond)
+	       return false;
+	    if (!cond->get_bool_component(0))
+	       break;
+	 }
+
+	 ir_constant *store = NULL;
+	 int offset = 0;
+
+	 if (!constant_referenced(asg->lhs, variable_context, store, offset))
+	    return false;
+
+	 ir_constant *value = asg->rhs->constant_expression_value(variable_context);
+
+	 if (!value)
+	    return false;
+
+	 store->copy_masked_offset(value, offset, asg->write_mask);
+	 break;
+      }
+
+	 /* (return (expression)) */
+      case ir_type_return:
+	 assert(result);
+	 *result = inst->as_return()->value->constant_expression_value(variable_context);
+	 return *result != NULL;
+
+	 /* (call name (ref) (params))*/
+      case ir_type_call: {
+	 ir_call *call = inst->as_call();
+
+	 /* Just say no to void functions in constant expressions.  We
+	  * don't need them at that point.
+	  */
+
+	 if (!call->return_deref)
+	    return false;
+
+	 ir_constant *store = NULL;
+	 int offset = 0;
+
+	 if (!constant_referenced(call->return_deref, variable_context,
+                                  store, offset))
+	    return false;
+
+	 ir_constant *value = call->constant_expression_value(variable_context);
+
+	 if (!value)
+	    return false;
+
+	 store->copy_offset(value, offset);
+	 break;
+      }
+
+	 /* (if condition (then-instructions) (else-instructions)) */
+      case ir_type_if: {
+	 ir_if *iif = inst->as_if();
+
+	 ir_constant *cond = iif->condition->constant_expression_value(variable_context);
+	 if (!cond || !cond->type->is_boolean())
+	    return false;
+
+	 exec_list &branch = cond->get_bool_component(0) ? iif->then_instructions : iif->else_instructions;
+
+	 *result = NULL;
+	 if (!constant_expression_evaluate_expression_list(branch, variable_context, result))
+	    return false;
+
+	 /* If there was a return in the branch chosen, drop out now. */
+	 if (*result)
+	    return true;
+
+	 break;
+      }
+
+	 /* Every other expression type, we drop out. */
+      default:
+	 return false;
+      }
+   }
+
+   /* Reaching the end of the block is not an error condition */
+   if (result)
+      *result = NULL;
+
+   return true;
+}
+
+ir_constant *
+ir_function_signature::constant_expression_value(exec_list *actual_parameters, struct hash_table *variable_context)
+{
+   const glsl_type *type = this->return_type;
+   if (type == glsl_type::void_type)
+      return NULL;
+
+   /* From the GLSL 1.20 spec, page 23:
+    * "Function calls to user-defined functions (non-built-in functions)
+    *  cannot be used to form constant expressions."
+    */
+   if (!this->is_builtin())
+      return NULL;
+
+   /*
+    * Of the builtin functions, only the texture lookups and the noise
+    * ones must not be used in constant expressions.  They all include
+    * specific opcodes so they don't need to be special-cased at this
+    * point.
+    */
+
+   /* Initialize the table of dereferenceable names with the function
+    * parameters.  Verify their const-ness on the way.
+    *
+    * We expect the number of parameters to have been checked earlier.
+    */
+   hash_table *deref_hash = hash_table_ctor(8, hash_table_pointer_hash,
+					    hash_table_pointer_compare);
+
+   /* If "origin" is non-NULL, then the function body is there.  So we
+    * have to use the variable objects from the object with the body,
+    * but the parameter instantiation on the current object.
+    */
+   const exec_node *parameter_info = origin ? origin->parameters.head : parameters.head;
+
+   foreach_list(n, actual_parameters) {
+      ir_constant *constant = ((ir_rvalue *) n)->constant_expression_value(variable_context);
+      if (constant == NULL) {
+         hash_table_dtor(deref_hash);
+         return NULL;
+      }
+
+      ir_variable *var = (ir_variable *)parameter_info;
+      hash_table_insert(deref_hash, constant, var);
+
+      parameter_info = parameter_info->next;
+   }
+
+   ir_constant *result = NULL;
+
+   /* Now run the builtin function until something non-constant
+    * happens or we get the result.
+    */
+   if (constant_expression_evaluate_expression_list(origin ? origin->body : body, deref_hash, &result) && result)
+      result = result->clone(ralloc_parent(this), NULL);
+
+   hash_table_dtor(deref_hash);
+
+   return result;
+}
diff --git a/icd/intel/compiler/shader/ir_deserializer.cpp b/icd/intel/compiler/shader/ir_deserializer.cpp
new file mode 100644
index 0000000..81d7c93
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_deserializer.cpp
@@ -0,0 +1,781 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ir_deserializer.h"
+
+/**
+ * Searches an exec_list for an ir_function with a matching signature.
+ */
+static ir_function *
+search_func(struct _mesa_glsl_parse_state *state, struct exec_list *list,
+            const char *name, struct exec_list *parameters)
+{
+   foreach_list_safe(node, list) {
+      ir_function *func = ((ir_instruction *) node)->as_function();
+      if (func && strcmp(name, func->name) == 0 &&
+          func->matching_signature(state, parameters))
+         return func;
+   }
+   return NULL;
+}
+
+
+/**
+ * Helper function to read a list of instructions.
+ */
+bool
+ir_deserializer::deserialize_list(exec_list *list)
+{
+   uint32_t list_len = map->read_uint32_t();
+   for (unsigned k = 0; k < list_len; k++)
+      if (!read_instruction(list))
+         return false;
+   return true;
+}
+
+
+ir_instruction *
+ir_deserializer::read_ir_variable()
+{
+   const glsl_type *type = deserialize_glsl_type(map, state, type_ht);
+
+   /* TODO: Understand how this can happen and fix */
+   if (type == glsl_type::error_type)
+      return NULL;
+
+   char *name = map->read_string();
+   int64_t unique_id = map->read_int64_t();
+   uint8_t mode = map->read_uint8_t();
+
+   ir_variable *var =
+      new(mem_ctx) ir_variable(type, name, (ir_variable_mode) mode);
+
+   if (!var)
+      return NULL;
+
+   map->read(&var->data, sizeof(var->data));
+
+   var->num_state_slots = map->read_uint32_t();
+   uint8_t has_constant_value = map->read_uint8_t();
+   uint8_t has_constant_initializer = map->read_uint8_t();
+
+   var->state_slots = NULL;
+
+   if (var->num_state_slots > 0) {
+
+      /* Validate num_state_slots against defined maximum. */
+      if (var->num_state_slots > MAX_NUM_STATE_SLOTS)
+         return NULL;
+
+      var->state_slots = ralloc_array(var, ir_state_slot, var->num_state_slots);
+
+      for (unsigned i = 0; i < var->num_state_slots; i++) {
+         var->state_slots[i].swizzle = map->read_int32_t();
+         for (int j = 0; j < 5; j++) {
+            var->state_slots[i].tokens[j] = map->read_int32_t();
+         }
+      }
+   }
+
+   if (has_constant_value)
+      var->constant_value = read_ir_constant();
+
+   if (has_constant_initializer)
+      var->constant_initializer = read_ir_constant();
+
+   uint8_t has_interface_type = map->read_uint8_t();
+
+   if (has_interface_type && (var->get_interface_type() == NULL))
+      var->init_interface_type(deserialize_glsl_type(map, state, type_ht));
+
+   /* Store the address of this variable so that variable dereference
+    * readers can find it later.
+    */
+   _mesa_hash_table_insert(var_ht, hash_value,
+                           (void*) (intptr_t) unique_id, var);
+   return var;
+}
+
+
+ir_instruction *
+ir_deserializer::read_ir_function(bool prototypes_only)
+{
+   char *name = map->read_string();
+   uint32_t num_signatures = map->read_uint32_t();
+
+   ir_function *f = new(mem_ctx) ir_function(name);
+   ir_function_signature *sig = NULL;
+   uint32_t next_signature = 0;
+
+   /* Add all signatures to the function. */
+   for (unsigned j = 0; j < num_signatures; j++) {
+
+      if (prototypes_only && next_signature) {
+         /* We're past the first iteration of the loop, so this function
+          * has more than one signature and we must skip ahead to the
+          * next one.
+          */
+         map->jump(next_signature);
+         continue;
+      }
+
+      /* Type equals ir_function_signature. */
+      uint8_t ir_type = map->read_uint8_t();
+      uint32_t len = map->read_uint32_t();
+
+      /* The next sig, if there is one, follows the entire function, not
+       * just the parameters.
+       */
+      if (prototypes_only)
+         next_signature = map->position() + len;
+
+      /* Used for debugging. */
+      (void) ir_type;
+
+      assert(ir_type == ir_type_function_signature);
+
+      uint8_t is_builtin = map->read_uint8_t();
+
+      uint8_t is_defined = map->read_uint8_t();
+
+      const glsl_type *return_type = deserialize_glsl_type(map, state, type_ht);
+      if (!return_type)
+         return NULL;
+
+      sig = new(mem_ctx) ir_function_signature(return_type);
+
+      /* is_defined should be true if original was, even if func is empty */
+      sig->is_defined = is_defined;
+
+      /* Fill function signature parameters. */
+      if (!deserialize_list(&sig->parameters))
+         return NULL;
+
+      /* Fill instructions for the function body. */
+      if (!prototypes_only) {
+         uint32_t body_count = map->read_uint32_t();
+         for (unsigned k = 0; k < body_count; k++)
+            if (!read_instruction(&sig->body, is_builtin != 0))
+               return NULL;
+      }
+
+      if (!is_builtin) {
+         f->add_signature(sig);
+      } else {
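+         /* For builtins we only verify that a matching builtin signature
+          * exists; the deserialized signature is still the one added to f.
+          */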
+         ir_function_signature *builtin_sig =
+            _mesa_glsl_find_builtin_function(state, name, &sig->parameters);
+
+         if (!builtin_sig)
+            return NULL;
+
+         f->add_signature(sig);
+      }
+
+      /* Break out of the loop if read errors occurred. */
+      if (map->errors())
+         return NULL;
+
+   } /* For each function signature. */
+
+   return f;
+}
+
+
+/* Reads in the node type and package length, and bails out if the type is
+ * not the expected one.
+ */
+#define VALIDATE_RVALUE(node_type)\
+   uint8_t ir_type = map->read_uint8_t();\
+   uint32_t len = map->read_uint32_t();\
+   (void) len;\
+   if (ir_type != node_type)\
+      return NULL;
+
+
+ir_dereference_array *
+ir_deserializer::read_ir_dereference_array()
+{
+   VALIDATE_RVALUE(ir_type_dereference_array);
+
+   ir_rvalue *array_rval = read_ir_rvalue();
+   ir_rvalue *index_rval = read_ir_rvalue();
+
+   if (array_rval && index_rval)
+      return new(mem_ctx) ir_dereference_array(array_rval, index_rval);
+
+   return NULL;
+}
+
+
+ir_dereference_record *
+ir_deserializer::read_ir_dereference_record()
+{
+   VALIDATE_RVALUE(ir_type_dereference_record);
+
+   char *name = map->read_string();
+
+   ir_rvalue *rval = read_ir_rvalue();
+
+   if (rval)
+      return new(mem_ctx) ir_dereference_record(rval, name);
+
+   return NULL;
+}
+
+
+/**
+ * Reads in a variable dereference and looks up the variable's
+ * address in a map by its unique_id.
+ */
+ir_dereference_variable *
+ir_deserializer::read_ir_dereference_variable()
+{
+   VALIDATE_RVALUE(ir_type_dereference_variable);
+
+   int64_t unique_id = map->read_int64_t();
+
+   hash_entry *entry = _mesa_hash_table_search(var_ht, hash_value,
+                                               (void*) (intptr_t) unique_id);
+   if (!entry)
+      return NULL;
+
+   return new(mem_ctx) ir_dereference_variable((ir_variable*) entry->data);
+}
+
+
+ir_constant *
+ir_deserializer::read_ir_constant()
+{
+   VALIDATE_RVALUE(ir_type_constant);
+
+   const glsl_type *constant_type = deserialize_glsl_type(map, state, type_ht);
+
+   /* TODO: Understand how this can happen and fix */
+   if (constant_type == glsl_type::error_type)
+      return NULL;
+
+   ir_constant_data data;
+   map->read(&data, sizeof(data));
+
+   /* Array of constants. */
+   if (constant_type->base_type == GLSL_TYPE_ARRAY) {
+
+      struct exec_list elements;
+      for (unsigned i = 0; i < constant_type->length; i++) {
+
+         ir_constant *con = read_ir_constant();
+
+         /* TODO: Understand how this can happen and fix */
+         if (!con)
+            return NULL;
+
+         /* Break out of the loop if read errors occurred. */
+         if (map->errors())
+            return NULL;
+
+         elements.push_tail(con);
+      }
+
+      return new(mem_ctx) ir_constant(constant_type, &elements);
+
+   } else if (constant_type->base_type == GLSL_TYPE_STRUCT) {
+      struct exec_list elements;
+      if (!deserialize_list(&elements))
+         return NULL;
+
+      return new(mem_ctx) ir_constant(constant_type, &elements);
+   }
+
+   return new(mem_ctx) ir_constant(constant_type, &data);
+}
+
+
+ir_swizzle *
+ir_deserializer::read_ir_swizzle()
+{
+   VALIDATE_RVALUE(ir_type_swizzle);
+
+   struct ir_swizzle_mask mask = {};
+   map->read(&mask, sizeof(ir_swizzle_mask));
+
+   ir_rvalue *rval = read_ir_rvalue();
+   if (rval)
+      return new(mem_ctx) ir_swizzle(rval, mask.x, mask.y, mask.z, mask.w,
+                                     mask.num_components);
+   return NULL;
+}
+
+
+ir_texture *
+ir_deserializer::read_ir_texture()
+{
+   VALIDATE_RVALUE(ir_type_texture);
+
+   int32_t op = map->read_int32_t();
+
+   ir_texture *new_tex = new(mem_ctx) ir_texture((ir_texture_opcode) op);
+
+   if (!new_tex)
+      return NULL;
+
+   const glsl_type *type = deserialize_glsl_type(map, state, type_ht);
+
+   ir_dereference *sampler = (ir_dereference *) read_ir_rvalue();
+
+   if (!sampler || !type)
+      return NULL;
+
+   new_tex->set_sampler(sampler, type);
+
+   new_tex->coordinate = read_ir_rvalue();
+   new_tex->projector = read_ir_rvalue();
+   new_tex->shadow_comparitor = read_ir_rvalue();
+   new_tex->offset = read_ir_rvalue();
+
+   memset(&new_tex->lod_info, 0, sizeof(new_tex->lod_info));
+
+   new_tex->lod_info.lod = read_ir_rvalue();
+   new_tex->lod_info.bias = read_ir_rvalue();
+   new_tex->lod_info.sample_index = read_ir_rvalue();
+   new_tex->lod_info.component = read_ir_rvalue();
+   new_tex->lod_info.grad.dPdx = read_ir_rvalue();
+   new_tex->lod_info.grad.dPdy = read_ir_rvalue();
+
+   return new_tex;
+}
+
+
+ir_expression *
+ir_deserializer::read_ir_expression()
+{
+   VALIDATE_RVALUE(ir_type_expression);
+
+   /* Type resulting from the operation. */
+   const glsl_type *rval_type = deserialize_glsl_type(map, state, type_ht);
+
+   /* Read operation type + all operands for creating ir_expression. */
+   uint32_t operation = map->read_uint32_t();
+   uint32_t operands = map->read_uint32_t();
+
+   /* Reject corrupt blobs; an ir_expression has at most 4 operands. */
+   if (operands > 4)
+      return NULL;
+
+   ir_rvalue *ir_rvalue_table[4] = { NULL };
+   for (unsigned k = 0; k < operands; k++) {
+      ir_rvalue_table[k] = read_ir_rvalue();
+      if (!ir_rvalue_table[k])
+         return NULL;
+   }
+
+   return new(mem_ctx) ir_expression(operation,
+      rval_type,
+      ir_rvalue_table[0],
+      ir_rvalue_table[1],
+      ir_rvalue_table[2],
+      ir_rvalue_table[3]);
+}
+
+
+ir_rvalue *
+ir_deserializer::read_ir_rvalue()
+{
+   uint8_t ir_type = map->read_uint8_t();
+
+   switch (ir_type) {
+   case ir_type_constant:
+      return read_ir_constant();
+   case ir_type_dereference_variable:
+      return read_ir_dereference_variable();
+   case ir_type_dereference_record:
+      return read_ir_dereference_record();
+   case ir_type_dereference_array:
+      return read_ir_dereference_array();
+   case ir_type_expression:
+      return read_ir_expression();
+   case ir_type_swizzle:
+      return read_ir_swizzle();
+   case ir_type_texture:
+      return read_ir_texture();
+   /* If the type is ir_type_unset, the rvalue is NULL. */
+   case ir_type_unset:
+   default:
+      return NULL;
+   }
+}
+
+
+ir_instruction *
+ir_deserializer::read_ir_assignment()
+{
+   uint32_t write_mask = map->read_uint8_t();
+
+   ir_dereference *lhs_deref = (ir_dereference *) read_ir_rvalue();
+
+   if (!lhs_deref)
+      return NULL;
+
+   ir_rvalue *cond = read_ir_rvalue();
+   ir_rvalue *rval = read_ir_rvalue();
+
+   if (rval)
+      return new(mem_ctx) ir_assignment(lhs_deref, rval, cond, write_mask);
+
+   return NULL;
+}
+
+
+ir_instruction *
+ir_deserializer::read_ir_if()
+{
+   ir_rvalue *cond = read_ir_rvalue();
+   /* TODO: Understand how this can happen and fix */
+   if (!cond)
+      return NULL;
+
+   ir_if *irif = new(mem_ctx) ir_if(cond);
+   if (!irif)
+      return NULL;
+
+   if (!deserialize_list(&irif->then_instructions))
+      return NULL;
+   if (!deserialize_list(&irif->else_instructions))
+      return NULL;
+
+   return irif;
+}
+
+
+ir_instruction *
+ir_deserializer::read_ir_return()
+{
+   return new(mem_ctx) ir_return(read_ir_rvalue());
+}
+
+
+/**
+ * Reads a call to an ir_function, finds the correct function
+ * signature in the prototypes list and creates the call.
+ */
+ir_instruction *
+ir_deserializer::read_ir_call()
+{
+   struct exec_list parameters;
+
+   char *name = map->read_string();
+
+   ir_dereference_variable *return_deref = (ir_dereference_variable *) read_ir_rvalue();
+
+   uint8_t list_len = map->read_uint8_t();
+
+   /* read call parameters */
+   for (unsigned k = 0; k < list_len; k++) {
+      ir_rvalue *rval = read_ir_rvalue();
+
+      if (!rval)
+         return NULL;
+
+      parameters.push_tail(rval);
+   }
+
+   uint8_t use_builtin = map->read_uint8_t();
+
+   if (use_builtin) {
+      ir_function_signature *builtin_sig =
+         _mesa_glsl_find_builtin_function(state, name, &parameters);
+
+      if (!builtin_sig)
+         return NULL;
+
+      ir_call *call = new(mem_ctx) ir_call(builtin_sig, return_deref,
+                                           &parameters);
+      call->use_builtin = true;
+      return call;
+   }
+
+   /* Find the function from the prototypes. */
+   ir_function *func = search_func(state, prototypes, name, &parameters);
+   if (func)
+      return new(mem_ctx) ir_call(func->matching_signature(state, &parameters),
+                                  return_deref, &parameters);
+
+   return NULL;
+}
+
+
+ir_instruction *
+ir_deserializer::read_ir_discard()
+{
+   return new(mem_ctx) ir_discard(read_ir_rvalue());
+}
+
+
+ir_instruction *
+ir_deserializer::read_ir_loop()
+{
+   ir_loop *loop = new(mem_ctx) ir_loop;
+
+   if (!deserialize_list(&loop->body_instructions))
+      return NULL;
+
+   return loop;
+}
+
+
+ir_instruction *
+ir_deserializer::read_ir_loop_jump()
+{
+   uint8_t mode = map->read_uint8_t();
+   return new(mem_ctx) ir_loop_jump((ir_loop_jump::jump_mode) mode);
+}
+
+
+ir_instruction *
+ir_deserializer::read_emit_vertex()
+{
+   return new(mem_ctx) ir_emit_vertex;
+}
+
+
+ir_instruction *
+ir_deserializer::read_end_primitive()
+{
+   return new(mem_ctx) ir_end_primitive;
+}
+
+
+bool
+ir_deserializer::read_instruction(struct exec_list *list, bool ignore)
+{
+   uint8_t ir_type = map->read_uint8_t();
+   uint32_t inst_dumpsize = map->read_uint32_t();
+
+   /* Reader wants to jump over this instruction. */
+   if (ignore) {
+      map->ffwd(inst_dumpsize);
+      return true;
+   }
+
+   ir_instruction *ir;
+
+   switch (ir_type) {
+   case ir_type_variable:
+      ir = read_ir_variable();
+      break;
+   case ir_type_assignment:
+      ir = read_ir_assignment();
+      break;
+   case ir_type_constant:
+      ir = read_ir_constant();
+      break;
+   case ir_type_function:
+      ir = read_ir_function();
+      break;
+   case ir_type_if:
+      ir = read_ir_if();
+      break;
+   case ir_type_return:
+      ir = read_ir_return();
+      break;
+   case ir_type_call:
+      ir = read_ir_call();
+      break;
+   case ir_type_discard:
+      ir = read_ir_discard();
+      break;
+   case ir_type_loop:
+      ir = read_ir_loop();
+      break;
+   case ir_type_loop_jump:
+      ir = read_ir_loop_jump();
+      break;
+   case ir_type_emit_vertex:
+      ir = read_emit_vertex();
+      break;
+   case ir_type_end_primitive:
+      ir = read_end_primitive();
+      break;
+   default:
+      if (ir_type <= ir_type_unset || ir_type >= ir_type_max)
+         assert(!"read invalid ir_type during IR deserialization\n");
+
+      return false;
+   }
+
+   if (!ir || map->errors())
+      return false;
+
+   list->push_tail(ir);
+   return true;
+}
+
+
+/**
+ * Go through the blob and read prototypes for the functions.
+ */
+bool
+ir_deserializer::read_prototypes(unsigned list_len)
+{
+   uint32_t ir_start = map->position();
+
+   for (unsigned k = 0; k < list_len; k++) {
+      uint8_t ir_type = map->read_uint8_t();
+      uint32_t len = map->read_uint32_t();
+
+      if (len > (map->size() - map->position())) {
+         /* We've run off the end for some reason.  Hopefully we got the
+          * prototypes we need, so move on.
+          * TODO: Understand how this can happen and fix
+          */
+         break;
+      }
+
+      if (ir_type != ir_type_function) {
+         map->ffwd(len);
+         continue;
+      }
+
+      ir_instruction *func = read_ir_function(true);
+      if (!func)
+         return false;
+
+      prototypes->push_tail(func);
+
+      /* Break out of the loop if read errors occurred. */
+      if (map->errors())
+         return false;
+   }
+
+   map->jump(ir_start);
+   return true;
+}
+
+
+/**
+ * Enable all GLSL extensions for the parse state. A patch to pack the
+ * enable bits into a struct was not accepted upstream, so we need to
+ * enable each individual bit like this.
+ */
+static void
+enable_glsl_extensions(struct _mesa_glsl_parse_state *state)
+{
+   state->ARB_arrays_of_arrays_enable = true;
+   state->ARB_compute_shader_enable = true;
+   state->ARB_conservative_depth_enable = true;
+   state->ARB_draw_buffers_enable = true;
+   state->ARB_draw_instanced_enable = true;
+   state->ARB_explicit_attrib_location_enable = true;
+   state->ARB_fragment_coord_conventions_enable = true;
+   state->ARB_gpu_shader5_enable = true;
+   state->ARB_sample_shading_enable = true;
+   state->ARB_separate_shader_objects_enable = true;
+   state->ARB_shader_atomic_counters_enable = true;
+   state->ARB_shader_bit_encoding_enable = true;
+   state->ARB_shader_image_load_store_enable = true;
+   state->ARB_shader_stencil_export_enable = true;
+   state->ARB_shader_texture_lod_enable = true;
+   state->ARB_shading_language_420pack_enable = true;
+   state->ARB_shading_language_packing_enable = true;
+   state->ARB_texture_cube_map_array_enable = true;
+   state->ARB_texture_gather_enable = true;
+   state->ARB_texture_multisample_enable = true;
+   state->ARB_texture_query_levels_enable = true;
+   state->ARB_texture_query_lod_enable = true;
+   state->ARB_texture_rectangle_enable = true;
+   state->ARB_uniform_buffer_object_enable = true;
+   state->ARB_viewport_array_enable = true;
+
+   state->OES_EGL_image_external_enable = true;
+   state->OES_standard_derivatives_enable = true;
+   state->OES_texture_3D_enable = true;
+
+   state->AMD_conservative_depth_enable = true;
+   state->AMD_shader_stencil_export_enable = true;
+   state->AMD_shader_trinary_minmax_enable = true;
+   state->AMD_vertex_shader_layer_enable = true;
+   state->EXT_separate_shader_objects_enable = true;
+   state->EXT_shader_integer_mix_enable = true;
+   state->EXT_texture_array_enable = true;
+}
+
+
+bool
+ir_deserializer::deserialize(struct gl_context *ctx, void *mem_ctx, gl_shader *shader, memory_map *map)
+{
+   assert(ctx);
+   bool success = false;
+   uint32_t exec_list_len;
+
+   /* Allocations use passed ralloc ctx during reading. */
+   this->mem_ctx = mem_ctx;
+   this->map = map;
+
+   /* Parse state is used to find builtin functions and existing types during
+    * reading. We enable all the extensions to have all possible builtin types
+    * and functions available.
+    */
+   state = new(mem_ctx)
+      _mesa_glsl_parse_state(ctx,
+                             _mesa_shader_enum_to_shader_stage(shader->Type),
+                             shader);
+
+   enable_glsl_extensions(state);
+
+   state->language_version = shader->Version;
+   state->uses_builtin_functions = true;
+   _mesa_glsl_initialize_builtin_functions();
+   _mesa_glsl_initialize_types(state);
+
+   prototypes = new(mem_ctx) exec_list;
+   shader->ir = new(shader) exec_list;
+
+   exec_list_len = map->read_uint32_t();
+
+   success = read_prototypes(exec_list_len);
+
+   /* Top level exec_list read loop, constructs a new list. */
+   while (exec_list_len && success) {
+      success = read_instruction(shader->ir);
+      exec_list_len--;
+   }
+
+   ralloc_free(prototypes);
+   ralloc_free(state);
+
+   if (!success)
+      goto error_deserialize;
+
+   shader->CompileStatus = GL_TRUE;
+
+   /* Allocates glsl_symbol_table internally. */
+   populate_symbol_table(shader);
+
+   validate_ir_tree(shader->ir);
+
+error_deserialize:
+   return success;
+}
diff --git a/icd/intel/compiler/shader/ir_deserializer.h b/icd/intel/compiler/shader/ir_deserializer.h
new file mode 100644
index 0000000..8b5ee45
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_deserializer.h
@@ -0,0 +1,129 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef IR_CACHE_DESERIALIZER_H
+#define IR_CACHE_DESERIALIZER_H
+
+#include "program/program.h"
+#include "glsl_parser_extras.h"
+#include "main/shaderobj.h"
+#include "linker.h"
+#include "memory_map.h"
+
+#ifdef __cplusplus
+
+/**
+ * Class to deserialize gl_shader IR from a binary data blob
+ *
+ * Deserialization is done with the help of the memory_map class, which
+ * takes care of the actual data reading. The class fills an existing
+ * gl_shader's ir exec_list with IR instructions from the binary blob.
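+ *
+ * A sketch of typical usage, assuming a memory_map already attached to
+ * the serialized blob (the map_blob() helper below is hypothetical):
+ *
+ *    memory_map map;
+ *    map_blob(&map, blob, blob_size);   // hypothetical setup helper
+ *    ir_deserializer d;
+ *    bool ok = d.deserialize(ctx, mem_ctx, shader, &map);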
+ */
+class ir_deserializer
+{
+public:
+   ir_deserializer() :
+      state(NULL),
+      mem_ctx(NULL),
+      map(NULL)
+   {
+      var_ht = _mesa_hash_table_create(0, int_equal);
+      type_ht = _mesa_hash_table_create(0, int_equal);
+      hash_value = _mesa_hash_data(this, sizeof(ir_deserializer));
+   }
+
+   ~ir_deserializer()
+   {
+      _mesa_hash_table_destroy(var_ht, NULL);
+      _mesa_hash_table_destroy(type_ht, NULL);
+   }
+
+   /* deserialize IR to gl_shader from mapped memory */
+   bool deserialize(struct gl_context *ctx, void *mem_ctx, gl_shader *shader, memory_map *map);
+
+private:
+
+   struct _mesa_glsl_parse_state *state;
+   void *mem_ctx;
+   memory_map *map;
+
+   /* pointer to list which contains prototypes of functions */
+   struct exec_list *prototypes;
+
+   bool read_prototypes(unsigned list_len);
+   bool read_instruction(struct exec_list *list, bool ignore = false);
+   bool deserialize_list(struct exec_list *list);
+
+   ir_instruction *read_ir_variable();
+   ir_instruction *read_ir_assignment();
+   ir_instruction *read_ir_function(bool prototypes_only = false);
+   ir_instruction *read_ir_if();
+   ir_instruction *read_ir_return();
+   ir_instruction *read_ir_call();
+   ir_instruction *read_ir_discard();
+   ir_instruction *read_ir_loop();
+   ir_instruction *read_ir_loop_jump();
+   ir_instruction *read_emit_vertex();
+   ir_instruction *read_end_primitive();
+
+   /* rvalue readers */
+   ir_rvalue *read_ir_rvalue();
+   ir_constant *read_ir_constant();
+   ir_swizzle *read_ir_swizzle();
+   ir_texture *read_ir_texture();
+   ir_expression *read_ir_expression();
+   ir_dereference_array *read_ir_dereference_array();
+   ir_dereference_record *read_ir_dereference_record();
+   ir_dereference_variable *read_ir_dereference_variable();
+
+   /**
+    * var_ht stores each created ir_variable under its unique_id so that
+    * ir_dereference_variable creation can find the variable later
+    */
+   struct hash_table *var_ht;
+   uint32_t hash_value;
+
+   /* to read user types only once */
+   struct hash_table *type_ht;
+
+   static bool int_equal(const void *a, const void *b)
+   {
+      return a == b;
+   }
+
+};
+
+
+#endif /* ifdef __cplusplus */
+
+#endif /* IR_CACHE_DESERIALIZER_H */
diff --git a/icd/intel/compiler/shader/ir_equals.cpp b/icd/intel/compiler/shader/ir_equals.cpp
new file mode 100644
index 0000000..65376cd
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_equals.cpp
@@ -0,0 +1,200 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ir.h"
+
+/**
+ * Helper for checking equality when one instruction might be NULL, since you
+ * can't access a's vtable in that case.
+ */
+static bool
+possibly_null_equals(ir_instruction *a, ir_instruction *b, enum ir_node_type ignore)
+{
+   if (!a || !b)
+      return !a && !b;
+
+   return a->equals(b, ignore);
+}
+
+/**
+ * The base equality function: Return not equal for anything we don't know
+ * about.
+ */
+bool
+ir_instruction::equals(ir_instruction *, enum ir_node_type)
+{
+   return false;
+}
+
+bool
+ir_constant::equals(ir_instruction *ir, enum ir_node_type)
+{
+   const ir_constant *other = ir->as_constant();
+   if (!other)
+      return false;
+
+   if (type != other->type)
+      return false;
+
+   for (unsigned i = 0; i < type->components(); i++) {
+      if (value.u[i] != other->value.u[i])
+         return false;
+   }
+
+   return true;
+}
+
+bool
+ir_dereference_variable::equals(ir_instruction *ir, enum ir_node_type)
+{
+   const ir_dereference_variable *other = ir->as_dereference_variable();
+   if (!other)
+      return false;
+
+   return var == other->var;
+}
+
+bool
+ir_dereference_array::equals(ir_instruction *ir, enum ir_node_type ignore)
+{
+   const ir_dereference_array *other = ir->as_dereference_array();
+   if (!other)
+      return false;
+
+   if (type != other->type)
+      return false;
+
+   if (!array->equals(other->array, ignore))
+      return false;
+
+   if (!array_index->equals(other->array_index, ignore))
+      return false;
+
+   return true;
+}
+
+bool
+ir_swizzle::equals(ir_instruction *ir, enum ir_node_type ignore)
+{
+   const ir_swizzle *other = ir->as_swizzle();
+   if (!other)
+      return false;
+
+   if (type != other->type)
+      return false;
+
+   if (ignore != ir_type_swizzle) {
+      if (mask.x != other->mask.x ||
+          mask.y != other->mask.y ||
+          mask.z != other->mask.z ||
+          mask.w != other->mask.w) {
+         return false;
+      }
+   }
+
+   return val->equals(other->val, ignore);
+}
+
+bool
+ir_texture::equals(ir_instruction *ir, enum ir_node_type ignore)
+{
+   const ir_texture *other = ir->as_texture();
+   if (!other)
+      return false;
+
+   if (type != other->type)
+      return false;
+
+   if (op != other->op)
+      return false;
+
+   if (!possibly_null_equals(coordinate, other->coordinate, ignore))
+      return false;
+
+   if (!possibly_null_equals(projector, other->projector, ignore))
+      return false;
+
+   if (!possibly_null_equals(shadow_comparitor, other->shadow_comparitor, ignore))
+      return false;
+
+   if (!possibly_null_equals(offset, other->offset, ignore))
+      return false;
+
+   if (!sampler->equals(other->sampler, ignore))
+      return false;
+
+   switch (op) {
+   case ir_tex:
+   case ir_lod:
+   case ir_query_levels:
+      break;
+   case ir_txb:
+      if (!lod_info.bias->equals(other->lod_info.bias, ignore))
+         return false;
+      break;
+   case ir_txl:
+   case ir_txf:
+   case ir_txs:
+      if (!lod_info.lod->equals(other->lod_info.lod, ignore))
+         return false;
+      break;
+   case ir_txd:
+      if (!lod_info.grad.dPdx->equals(other->lod_info.grad.dPdx, ignore) ||
+          !lod_info.grad.dPdy->equals(other->lod_info.grad.dPdy, ignore))
+         return false;
+      break;
+   case ir_txf_ms:
+      if (!lod_info.sample_index->equals(other->lod_info.sample_index, ignore))
+         return false;
+      break;
+   case ir_tg4:
+      if (!lod_info.component->equals(other->lod_info.component, ignore))
+         return false;
+      break;
+   default:
+      assert(!"Unrecognized texture op");
+   }
+
+   return true;
+}
+
+bool
+ir_expression::equals(ir_instruction *ir, enum ir_node_type ignore)
+{
+   const ir_expression *other = ir->as_expression();
+   if (!other)
+      return false;
+
+   if (type != other->type)
+      return false;
+
+   if (operation != other->operation)
+      return false;
+
+   for (unsigned i = 0; i < get_num_operands(); i++) {
+      if (!operands[i]->equals(other->operands[i], ignore))
+         return false;
+   }
+
+   return true;
+}
diff --git a/icd/intel/compiler/shader/ir_expression_flattening.cpp b/icd/intel/compiler/shader/ir_expression_flattening.cpp
new file mode 100644
index 0000000..c1cadb1
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_expression_flattening.cpp
@@ -0,0 +1,90 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_expression_flattening.cpp
+ *
+ * Takes the leaves of expression trees and makes them dereferences of
+ * assignments of the leaves to temporaries, according to a predicate.
+ *
+ * This is used for breaking down matrix operations, where it's easier to
+ * create a temporary and work on each of its vector components individually.
+ */
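+
+/* As an illustrative sketch (not from the upstream comment): with a
+ * predicate that matches every ir_expression, an assignment such as
+ *
+ *    (assign (x) (var_ref r)
+ *            (expression float mul (var_ref a)
+ *                        (expression float add (var_ref b) (var_ref c))))
+ *
+ * is rewritten so the matched subexpression is computed into a temporary
+ * first:
+ *
+ *    (declare (temporary) float flattening_tmp)
+ *    (assign (x) (var_ref flattening_tmp)
+ *            (expression float add (var_ref b) (var_ref c)))
+ *    (assign (x) (var_ref r)
+ *            (expression float mul (var_ref a) (var_ref flattening_tmp)))
+ */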
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_rvalue_visitor.h"
+#include "ir_expression_flattening.h"
+#include "glsl_types.h"
+
+class ir_expression_flattening_visitor : public ir_rvalue_visitor {
+public:
+   ir_expression_flattening_visitor(bool (*predicate)(ir_instruction *ir))
+   {
+      this->predicate = predicate;
+   }
+
+   virtual ~ir_expression_flattening_visitor()
+   {
+      /* empty */
+   }
+
+   void handle_rvalue(ir_rvalue **rvalue);
+   bool (*predicate)(ir_instruction *ir);
+};
+
+void
+do_expression_flattening(exec_list *instructions,
+			 bool (*predicate)(ir_instruction *ir))
+{
+   ir_expression_flattening_visitor v(predicate);
+
+   foreach_list(n, instructions) {
+      ir_instruction *ir = (ir_instruction *) n;
+
+      ir->accept(&v);
+   }
+}
+
+void
+ir_expression_flattening_visitor::handle_rvalue(ir_rvalue **rvalue)
+{
+   ir_variable *var;
+   ir_assignment *assign;
+   ir_rvalue *ir = *rvalue;
+
+   if (!ir || !this->predicate(ir))
+      return;
+
+   void *ctx = ralloc_parent(ir);
+
+   var = new(ctx) ir_variable(ir->type, "flattening_tmp", ir_var_temporary);
+   base_ir->insert_before(var);
+
+   assign = new(ctx) ir_assignment(new(ctx) ir_dereference_variable(var),
+				   ir,
+				   NULL);
+   base_ir->insert_before(assign);
+
+   *rvalue = new(ctx) ir_dereference_variable(var);
+}
diff --git a/icd/intel/compiler/shader/ir_function.cpp b/icd/intel/compiler/shader/ir_function.cpp
new file mode 100644
index 0000000..40cf589
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_function.cpp
@@ -0,0 +1,225 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "glsl_types.h"
+#include "ir.h"
+
+typedef enum {
+   PARAMETER_LIST_NO_MATCH,
+   PARAMETER_LIST_EXACT_MATCH,
+   PARAMETER_LIST_INEXACT_MATCH /**< Match requires implicit conversion. */
+} parameter_list_match_t;
+
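+/* For example (a sketch, not upstream text): given
+ *
+ *    void f(float x);
+ *
+ * a call with a float argument is PARAMETER_LIST_EXACT_MATCH; a call with an
+ * int argument is PARAMETER_LIST_INEXACT_MATCH via the int -> float implicit
+ * conversion; a call with two arguments is PARAMETER_LIST_NO_MATCH.
+ */
+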
+/**
+ * \brief Check if two parameter lists match.
+ *
+ * \param list_a Parameters of the function definition.
+ * \param list_b Actual parameters passed to the function.
+ * \see matching_signature()
+ */
+static parameter_list_match_t
+parameter_lists_match(const exec_list *list_a, const exec_list *list_b)
+{
+   const exec_node *node_a = list_a->head;
+   const exec_node *node_b = list_b->head;
+
+   /* This is set to true if there is an inexact match requiring an implicit
+    * conversion. */
+   bool inexact_match = false;
+
+   for (/* empty */
+	; !node_a->is_tail_sentinel()
+	; node_a = node_a->next, node_b = node_b->next) {
+      /* If all of the parameters from the other parameter list have been
+       * exhausted, the lists have different length and, by definition,
+       * do not match.
+       */
+      if (node_b->is_tail_sentinel())
+	 return PARAMETER_LIST_NO_MATCH;
+
+
+      const ir_variable *const param = (ir_variable *) node_a;
+      const ir_rvalue *const actual = (ir_rvalue *) node_b;
+
+      if (param->type == actual->type)
+	 continue;
+
+      /* Try to find an implicit conversion from actual to param. */
+      inexact_match = true;
+      switch ((enum ir_variable_mode)(param->data.mode)) {
+      case ir_var_auto:
+      case ir_var_uniform:
+      case ir_var_temporary:
+	 /* These are all error conditions.  It is invalid for a parameter to
+	  * a function to be declared as auto (not in, out, or inout) or
+	  * as uniform.
+	  */
+	 assert(0);
+	 return PARAMETER_LIST_NO_MATCH;
+
+      case ir_var_const_in:
+      case ir_var_function_in:
+	 if (!actual->type->can_implicitly_convert_to(param->type))
+	    return PARAMETER_LIST_NO_MATCH;
+	 break;
+
+      case ir_var_function_out:
+	 if (!param->type->can_implicitly_convert_to(actual->type))
+	    return PARAMETER_LIST_NO_MATCH;
+	 break;
+
+      case ir_var_function_inout:
+	 /* Since there are no bi-directional automatic conversions (e.g.,
+	  * there is int -> float but no float -> int), inout parameters must
+	  * be exact matches.
+	  */
+	 return PARAMETER_LIST_NO_MATCH;
+
+      default:
+	 assert(false);
+	 return PARAMETER_LIST_NO_MATCH;
+      }
+   }
+
+   /* If all of the parameters from the other parameter list have been
+    * exhausted, the lists have different length and, by definition, do not
+    * match.
+    */
+   if (!node_b->is_tail_sentinel())
+      return PARAMETER_LIST_NO_MATCH;
+
+   if (inexact_match)
+      return PARAMETER_LIST_INEXACT_MATCH;
+   else
+      return PARAMETER_LIST_EXACT_MATCH;
+}
+
+
+ir_function_signature *
+ir_function::matching_signature(_mesa_glsl_parse_state *state,
+                                const exec_list *actual_parameters)
+{
+   bool is_exact;
+   return matching_signature(state, actual_parameters, &is_exact);
+}
+
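+/* Ambiguity sketch (illustrative, not upstream text): given the signatures
+ *
+ *    void f(int x, float y);
+ *    void f(float x, int y);
+ *
+ * the call f(1, 2) matches both through the int -> float implicit
+ * conversion, so there are multiple inexact matches and
+ * matching_signature() returns NULL.
+ */
+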
+ir_function_signature *
+ir_function::matching_signature(_mesa_glsl_parse_state *state,
+                                const exec_list *actual_parameters,
+			        bool *is_exact)
+{
+   ir_function_signature *match = NULL;
+   bool multiple_inexact_matches = false;
+
+   /* From page 42 (page 49 of the PDF) of the GLSL 1.20 spec:
+    *
+    * "If an exact match is found, the other signatures are ignored, and
+    *  the exact match is used.  Otherwise, if no exact match is found, then
+    *  the implicit conversions in Section 4.1.10 "Implicit Conversions" will
+    *  be applied to the calling arguments if this can make their types match
+    *  a signature.  In this case, it is a semantic error if there are
+    *  multiple ways to apply these conversions to the actual arguments of a
+    *  call such that the call can be made to match multiple signatures."
+    */
+   foreach_list(n, &this->signatures) {
+      ir_function_signature *const sig = (ir_function_signature *) n;
+
+      /* Skip over any built-ins that aren't available in this shader. */
+      if (sig->is_builtin() && !sig->is_builtin_available(state))
+         continue;
+
+      switch (parameter_lists_match(& sig->parameters, actual_parameters)) {
+      case PARAMETER_LIST_EXACT_MATCH:
+	 *is_exact = true;
+	 return sig;
+      case PARAMETER_LIST_INEXACT_MATCH:
+	 if (match == NULL)
+	    match = sig;
+	 else
+	    multiple_inexact_matches = true;
+	 continue;
+      case PARAMETER_LIST_NO_MATCH:
+	 continue;
+      default:
+	 assert(false);
+	 return NULL;
+      }
+   }
+
+   /* There is no exact match (we would have returned it by now).  If there
+    * are multiple inexact matches, the call is ambiguous, which is an error.
+    *
+    * FINISHME: Report a decent error.  Returning NULL will likely result in
+    * FINISHME: a "no matching signature" error; it should report that the
+    * FINISHME: call is ambiguous.  But reporting errors from here is hard.
+    */
+   *is_exact = false;
+
+   if (multiple_inexact_matches)
+      return NULL;
+
+   return match;
+}
+
+
+static bool
+parameter_lists_match_exact(const exec_list *list_a, const exec_list *list_b)
+{
+   const exec_node *node_a = list_a->head;
+   const exec_node *node_b = list_b->head;
+
+   for (/* empty */
+	; !node_a->is_tail_sentinel() && !node_b->is_tail_sentinel()
+	; node_a = node_a->next, node_b = node_b->next) {
+      ir_variable *a = (ir_variable *) node_a;
+      ir_variable *b = (ir_variable *) node_b;
+
+      /* If the types of the parameters do not match, the parameter lists
+       * are different.
+       */
+      if (a->type != b->type)
+         return false;
+   }
+
+   /* Unless both lists are exhausted, they differ in length and, by
+    * definition, do not match.
+    */
+   return (node_a->is_tail_sentinel() == node_b->is_tail_sentinel());
+}
+
+ir_function_signature *
+ir_function::exact_matching_signature(_mesa_glsl_parse_state *state,
+                                      const exec_list *actual_parameters)
+{
+   foreach_list(n, &this->signatures) {
+      ir_function_signature *const sig = (ir_function_signature *) n;
+
+      /* Skip over any built-ins that aren't available in this shader. */
+      if (sig->is_builtin() && !sig->is_builtin_available(state))
+         continue;
+
+      if (parameter_lists_match_exact(&sig->parameters, actual_parameters))
+	 return sig;
+   }
+   return NULL;
+}
diff --git a/icd/intel/compiler/shader/ir_function_can_inline.cpp b/icd/intel/compiler/shader/ir_function_can_inline.cpp
new file mode 100644
index 0000000..7b15d5d
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_function_can_inline.cpp
@@ -0,0 +1,76 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_function_can_inline.cpp
+ *
+ * Determines if we can inline a function call using ir_function_inlining.cpp.
+ *
+ * The primary restriction is that we can't return from the function
+ * other than as the last instruction.  We could potentially work
+ * around this for some constructs by flattening control flow and
+ * moving the return to the end, or by using breaks from a do {} while
+ * (0) loop surrounding the function body.
+ */
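+
+/* A hedged example: a function whose only return is its final statement,
+ *
+ *    float sq(float x) { return x * x; }
+ *
+ * can be inlined, while an early return currently defeats this pass:
+ *
+ *    float f(float x) { if (x < 0.0) return 0.0; return x; }
+ */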
+
+#include "ir.h"
+
+class ir_function_can_inline_visitor : public ir_hierarchical_visitor {
+public:
+   ir_function_can_inline_visitor()
+   {
+      this->num_returns = 0;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_return *);
+
+   int num_returns;
+};
+
+ir_visitor_status
+ir_function_can_inline_visitor::visit_enter(ir_return *ir)
+{
+   (void) ir;
+   this->num_returns++;
+   return visit_continue;
+}
+
+bool
+can_inline(ir_call *call)
+{
+   ir_function_can_inline_visitor v;
+   const ir_function_signature *callee = call->callee;
+   if (!callee->is_defined)
+      return false;
+
+   v.run((exec_list *) &callee->body);
+
+   /* If the function is empty (no last instruction) or does not end with a
+    * return statement, we need to count the implicit return.
+    */
+   ir_instruction *last = (ir_instruction *)callee->body.get_tail();
+   if (last == NULL || !last->as_return())
+      v.num_returns++;
+
+   return v.num_returns == 1;
+}
diff --git a/icd/intel/compiler/shader/ir_function_detect_recursion.cpp b/icd/intel/compiler/shader/ir_function_detect_recursion.cpp
new file mode 100644
index 0000000..af985cd
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_function_detect_recursion.cpp
@@ -0,0 +1,360 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_function_detect_recursion.cpp
+ * Determine whether a shader contains static recursion.
+ *
+ * Consider the (possibly disjoint) graph of function calls in a shader.  If a
+ * program contains recursion, this graph will contain a cycle.  If a function
+ * is part of a cycle, it will have a caller and it will have a callee (it
+ * calls another function).
+ *
+ * To detect recursion, the function call graph is constructed.  The graph is
+ * repeatedly reduced by removing any function that either has no callees
+ * (leaf functions) or has no caller.  Eventually the only functions that
+ * remain will be the functions in the cycles.
+ *
+ * The GLSL spec is a bit wishy-washy about recursion.
+ *
+ * From page 39 (page 45 of the PDF) of the GLSL 1.10 spec:
+ *
+ *     "Behavior is undefined if recursion is used. Recursion means having any
+ *     function appearing more than once at any one time in the run-time stack
+ *     of function calls. That is, a function may not call itself either
+ *     directly or indirectly. Compilers may give diagnostic messages when
+ *     this is detectable at compile time, but not all such cases can be
+ *     detected at compile time."
+ *
+ * From page 79 (page 85 of the PDF):
+ *
+ *     "22) Should recursion be supported?
+ *
+ *      DISCUSSION: Probably not necessary, but another example of limiting
+ *      the language based on how it would directly map to hardware. One
+ *      thought is that recursion would benefit ray tracing shaders. On the
+ *      other hand, many recursion operations can also be implemented with the
+ *      user managing the recursion through arrays. RenderMan doesn't support
+ *      recursion. This could be added at a later date, if it proved to be
+ *      necessary.
+ *
+ *      RESOLVED on September 10, 2002: Implementations are not required to
+ *      support recursion.
+ *
+ *      CLOSED on September 10, 2002."
+ *
+ * From page 79 (page 85 of the PDF):
+ *
+ *     "56) Is it an error for an implementation to support recursion if the
+ *     specification says recursion is not supported?
+ *
+ *     ADDED on September 10, 2002.
+ *
+ *     DISCUSSION: This issues is related to Issue (22). If we say that
+ *     recursion (or some other piece of functionality) is not supported, is
+ *     it an error for an implementation to support it? Perhaps the
+ *     specification should remain silent on these kind of things so that they
+ *     could be gracefully added later as an extension or as part of the
+ *     standard.
+ *
+ *     RESOLUTION: Languages, in general, have programs that are not
+ *     well-formed in ways a compiler cannot detect. Portability is only
+ *     ensured for well-formed programs. Detecting recursion is an example of
+ *     this. The language will say a well-formed program may not recurse, but
+ *     compilers are not forced to detect that recursion may happen.
+ *
+ *     CLOSED: November 29, 2002."
+ *
+ * In GLSL 1.10 the behavior of recursion is undefined.  Compilers don't have
+ * to reject shaders (at compile-time or link-time) that contain recursion.
+ * Instead they could work, or crash, or kill a kitten.
+ *
+ * From page 44 (page 50 of the PDF) of the GLSL 1.20 spec:
+ *
+ *     "Recursion is not allowed, not even statically. Static recursion is
+ *     present if the static function call graph of the program contains
+ *     cycles."
+ *
+ * This language clears things up a bit, but it still leaves a lot of
+ * questions unanswered.
+ *
+ *     - Is the error generated at compile-time or link-time?
+ *
+ *     - Is it an error to have a recursive function that is never statically
+ *       called by main or any function called directly or indirectly by main?
+ *       Technically speaking, such a function is not in the "static function
+ *       call graph of the program" at all.
+ *
+ * \bug
+ * If a shader has multiple cycles, this algorithm may erroneously complain
+ * about functions that aren't in any cycle, but are in the part of the call
+ * tree that connects them.  For example, if the call graph consists of a
+ * cycle between A and B, and a cycle between D and E, and B also calls C
+ * which calls D, then this algorithm will report C as a function which "has
+ * static recursion" even though it is not part of any cycle.
+ *
+ * A better algorithm for cycle detection that doesn't have this drawback can
+ * be found here:
+ *
+ * http://en.wikipedia.org/wiki/Tarjan%E2%80%99s_strongly_connected_components_algorithm
+ *
+ * \author Ian Romanick <ian.d.romanick@intel.com>
+ */
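+
+/* A worked sketch of the reduction (illustrative, not from the upstream
+ * comment): for the call graph main -> A, A -> B, B -> A, the first pass
+ * removes main (it has no callers).  A and B each retain both a caller and
+ * a callee on every subsequent pass, so they survive the reduction and are
+ * reported as statically recursive.
+ */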
+#include "libfns.h"  // LunarG ADD: 
+#include "ir.h"
+#include "glsl_parser_extras.h"
+#include "linker.h"
+#include "program/hash_table.h"
+#include "program.h"
+
+namespace {
+
+struct call_node : public exec_node {
+   class function *func;
+};
+
+class function {
+public:
+   function(ir_function_signature *sig)
+      : sig(sig)
+   {
+      /* empty */
+   }
+
+   DECLARE_RALLOC_CXX_OPERATORS(function)
+
+   ir_function_signature *sig;
+
+   /** List of functions called by this function. */
+   exec_list callees;
+
+   /** List of functions that call this function. */
+   exec_list callers;
+};
+
+class has_recursion_visitor : public ir_hierarchical_visitor {
+public:
+   has_recursion_visitor()
+      : current(NULL)
+   {
+      progress = false;
+      this->mem_ctx = ralloc_context(NULL);
+      this->function_hash = hash_table_ctor(0, hash_table_pointer_hash,
+					    hash_table_pointer_compare);
+   }
+
+   ~has_recursion_visitor()
+   {
+      hash_table_dtor(this->function_hash);
+      ralloc_free(this->mem_ctx);
+   }
+
+   function *get_function(ir_function_signature *sig)
+   {
+      function *f = (function *) hash_table_find(this->function_hash, sig);
+      if (f == NULL) {
+	 f = new(mem_ctx) function(sig);
+	 hash_table_insert(this->function_hash, f, sig);
+      }
+
+      return f;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_function_signature *sig)
+   {
+      this->current = this->get_function(sig);
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit_leave(ir_function_signature *sig)
+   {
+      (void) sig;
+      this->current = NULL;
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_call *call)
+   {
+      /* At global scope this->current will be NULL.  Since there is no way to
+       * call global scope, it can never be part of a cycle.  Don't bother
+       * adding calls from global scope to the graph.
+       */
+      if (this->current == NULL)
+	 return visit_continue;
+
+      function *const target = this->get_function(call->callee);
+
+      /* Create a link from the caller to the callee.
+       */
+      call_node *node = new(mem_ctx) call_node;
+      node->func = target;
+      this->current->callees.push_tail(node);
+
+      /* Create a link from the callee to the caller.
+       */
+      node = new(mem_ctx) call_node;
+      node->func = this->current;
+      target->callers.push_tail(node);
+      return visit_continue;
+   }
+
+   function *current;
+   struct hash_table *function_hash;
+   void *mem_ctx;
+   bool progress;
+};
+
+} /* anonymous namespace */
+
+static void
+destroy_links(exec_list *list, function *f)
+{
+   foreach_list_safe(node, list) {
+      struct call_node *n = (struct call_node *) node;
+
+      /* If this is the right function, remove it.  Note that the loop must
+       * keep going after a match: a function can appear in the list multiple
+       * times if it calls, or is called by, the other function more than
+       * once.
+       */
+      if (n->func == f)
+	 n->remove();
+   }
+}
+
+
+/**
+ * Remove a function if it has either no incoming or no outgoing links.
+ */
+static void
+remove_unlinked_functions(const void *key, void *data, void *closure)
+{
+   has_recursion_visitor *visitor = (has_recursion_visitor *) closure;
+   function *f = (function *) data;
+
+   if (f->callers.is_empty() || f->callees.is_empty()) {
+      while (!f->callers.is_empty()) {
+	 struct call_node *n = (struct call_node *) f->callers.pop_head();
+	 destroy_links(& n->func->callees, f);
+      }
+
+      while (!f->callees.is_empty()) {
+	 struct call_node *n = (struct call_node *) f->callees.pop_head();
+	 destroy_links(& n->func->callers, f);
+      }
+
+      hash_table_remove(visitor->function_hash, key);
+      visitor->progress = true;
+   }
+}
+
+
+static void
+emit_errors_unlinked(const void *key, void *data, void *closure)
+{
+   struct _mesa_glsl_parse_state *state =
+      (struct _mesa_glsl_parse_state *) closure;
+   function *f = (function *) data;
+   YYLTYPE loc;
+
+   (void) key;
+
+   char *proto = prototype_string(f->sig->return_type,
+				  f->sig->function_name(),
+				  &f->sig->parameters);
+
+   memset(&loc, 0, sizeof(loc));
+   _mesa_glsl_error(&loc, state,
+		    "function `%s' has static recursion",
+		    proto);
+   ralloc_free(proto);
+}
+
+
+static void
+emit_errors_linked(const void *key, void *data, void *closure)
+{
+   struct gl_shader_program *prog =
+      (struct gl_shader_program *) closure;
+   function *f = (function *) data;
+
+   (void) key;
+
+   char *proto = prototype_string(f->sig->return_type,
+				  f->sig->function_name(),
+				  &f->sig->parameters);
+
+   linker_error(prog, "function `%s' has static recursion.\n", proto);
+   ralloc_free(proto);
+}
+
+
+void
+detect_recursion_unlinked(struct _mesa_glsl_parse_state *state,
+			  exec_list *instructions)
+{
+   has_recursion_visitor v;
+
+   /* Collect all of the information about which functions call which other
+    * functions.
+    */
+   v.run(instructions);
+
+   /* Remove from the set all of the functions that either have no caller or
+    * call no other functions.  Repeat until no functions are removed.
+    */
+   do {
+      v.progress = false;
+      hash_table_call_foreach(v.function_hash, remove_unlinked_functions, & v);
+   } while (v.progress);
+
+
+   /* At this point any functions still in the hash must be part of a cycle.
+    */
+   hash_table_call_foreach(v.function_hash, emit_errors_unlinked, state);
+}
+
+
+void
+detect_recursion_linked(struct gl_shader_program *prog,
+			exec_list *instructions)
+{
+   has_recursion_visitor v;
+
+   /* Collect all of the information about which functions call which other
+    * functions.
+    */
+   v.run(instructions);
+
+   /* Remove from the set all of the functions that either have no caller or
+    * call no other functions.  Repeat until no functions are removed.
+    */
+   do {
+      v.progress = false;
+      hash_table_call_foreach(v.function_hash, remove_unlinked_functions, & v);
+   } while (v.progress);
+
+
+   /* At this point any functions still in the hash must be part of a cycle.
+    */
+   hash_table_call_foreach(v.function_hash, emit_errors_linked, prog);
+}
diff --git a/icd/intel/compiler/shader/ir_function_inlining.h b/icd/intel/compiler/shader/ir_function_inlining.h
new file mode 100644
index 0000000..6db011b
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_function_inlining.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_function_inlining.h
+ *
+ * Replaces calls to functions with the body of the function.
+ */
+
+bool can_inline(ir_call *call);
diff --git a/icd/intel/compiler/shader/ir_hierarchical_visitor.cpp b/icd/intel/compiler/shader/ir_hierarchical_visitor.cpp
new file mode 100644
index 0000000..2e606dd
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_hierarchical_visitor.cpp
@@ -0,0 +1,324 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ir.h"
+#include "ir_hierarchical_visitor.h"
+
+ir_hierarchical_visitor::ir_hierarchical_visitor()
+{
+   this->base_ir = NULL;
+   this->callback = NULL;
+   this->data = NULL;
+   this->in_assignee = false;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit(ir_rvalue *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit(ir_variable *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit(ir_constant *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit(ir_loop_jump *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit(ir_emit_vertex *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit(ir_end_primitive *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit(ir_dereference_variable *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_loop *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_loop *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_function_signature *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_function_signature *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_function *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_function *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_expression *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_expression *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_texture *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_texture *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_swizzle *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_swizzle *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_dereference_array *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_dereference_array *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_dereference_record *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_dereference_record *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_assignment *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_assignment *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_call *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_call *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_return *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_return *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_discard *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_discard *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_enter(ir_if *ir)
+{
+   if (this->callback != NULL)
+      this->callback(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_hierarchical_visitor::visit_leave(ir_if *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+void
+ir_hierarchical_visitor::run(exec_list *instructions)
+{
+   visit_list_elements(this, instructions);
+}
+
+
+void
+visit_tree(ir_instruction *ir,
+	   void (*callback)(class ir_instruction *ir, void *data),
+	   void *data)
+{
+   ir_hierarchical_visitor v;
+
+   v.callback = callback;
+   v.data = data;
+
+   ir->accept(&v);
+}
diff --git a/icd/intel/compiler/shader/ir_hierarchical_visitor.h b/icd/intel/compiler/shader/ir_hierarchical_visitor.h
new file mode 100644
index 0000000..647d2e0
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_hierarchical_visitor.h
@@ -0,0 +1,189 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef IR_HIERARCHICAL_VISITOR_H
+#define IR_HIERARCHICAL_VISITOR_H
+
+/**
+ * Enumeration values returned by visit methods to guide processing
+ */
+enum ir_visitor_status {
+   visit_continue,		/**< Continue visiting as normal. */
+   visit_continue_with_parent,	/**< Don't visit siblings, continue w/parent. */
+   visit_stop			/**< Stop visiting immediately. */
+};
+
+
+#ifdef __cplusplus
+/**
+ * Base class of hierarchical visitors of IR instruction trees
+ *
+ * Hierarchical visitors differ from traditional visitors in a couple of
+ * important ways.  Rather than having a single \c visit method for each
+ * subclass in the composite, there are three kinds of visit methods.
+ * Leaf-node classes have a traditional \c visit method.  Internal-node
+ * classes have a \c visit_enter method, which is invoked just before
+ * processing child nodes, and a \c visit_leave method which is invoked just
+ * after processing child nodes.
+ *
+ * In addition, each visit method and the \c accept methods in the composite
+ * have a return value which guides the navigation.  Any of the visit methods
+ * can choose to continue visiting the tree as normal (by returning \c
+ * visit_continue), terminate visiting any further nodes immediately (by
+ * returning \c visit_stop), or stop visiting sibling nodes (by returning \c
+ * visit_continue_with_parent).
+ *
+ * These two changes combine to allow navigation of children to be implemented
+ * in the composite's \c accept method.  The \c accept method for a leaf-node
+ * class will simply call the \c visit method, as usual, and pass its return
+ * value on.  The \c accept method for internal-node classes will call the \c
+ * visit_enter method, call the \c accept method of each child node, and,
+ * finally, call the \c visit_leave method.  If any of these return a value
+ * other than \c visit_continue, the correct action must be taken.
+ *
+ * The final benefit is that the hierarchical visitor base class need not be
+ * abstract.  Default implementations of every \c visit, \c visit_enter, and
+ * \c visit_leave method can be provided.  By default each of these methods
+ * simply returns \c visit_continue.  This allows a significant reduction in
+ * derived class code.
+ *
+ * For more information about hierarchical visitors, see:
+ *
+ *    http://c2.com/cgi/wiki?HierarchicalVisitorPattern
+ *    http://c2.com/cgi/wiki?HierarchicalVisitorDiscussion
+ */
+
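+/* A minimal derived-visitor sketch (illustrative; not part of this header):
+ * count the ir_call nodes in an instruction stream.
+ *
+ *    class call_counter : public ir_hierarchical_visitor {
+ *    public:
+ *       call_counter() : count(0) { }
+ *
+ *       virtual ir_visitor_status visit_enter(ir_call *)
+ *       {
+ *          count++;
+ *          return visit_continue;
+ *       }
+ *
+ *       int count;
+ *    };
+ *
+ *    call_counter v;
+ *    v.run(instructions);
+ */
+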
+class ir_hierarchical_visitor {
+public:
+   ir_hierarchical_visitor();
+
+   /**
+    * \name Visit methods for leaf-node classes
+    */
+   /*@{*/
+   virtual ir_visitor_status visit(class ir_rvalue *);
+   virtual ir_visitor_status visit(class ir_variable *);
+   virtual ir_visitor_status visit(class ir_constant *);
+   virtual ir_visitor_status visit(class ir_loop_jump *);
+   virtual ir_visitor_status visit(class ir_emit_vertex *);
+   virtual ir_visitor_status visit(class ir_end_primitive *);
+
+   /**
+    * ir_dereference_variable isn't technically a leaf, but it is treated as a
+    * leaf here for a couple reasons.  By not automatically visiting the one
+    * child ir_variable node from the ir_dereference_variable, ir_variable
+    * nodes can always be handled as variable declarations.  Code that used
+    * non-hierarchical visitors had to set an "in a dereference" flag to
+    * determine how to handle an ir_variable.  By forcing the visitor to
+    * handle the ir_variable within the ir_dereference_variable visitor, this
+    * kludge can be avoided.
+    *
+    * In addition, I can envision no use for having separate enter and leave
+    * methods: anything that could be done in the enter and leave methods
+    * could just as well be done in the visit method.
+    */
+   virtual ir_visitor_status visit(class ir_dereference_variable *);
+   /*@}*/
+
+   /**
+    * \name Visit methods for internal-node classes
+    */
+   /*@{*/
+   virtual ir_visitor_status visit_enter(class ir_loop *);
+   virtual ir_visitor_status visit_leave(class ir_loop *);
+   virtual ir_visitor_status visit_enter(class ir_function_signature *);
+   virtual ir_visitor_status visit_leave(class ir_function_signature *);
+   virtual ir_visitor_status visit_enter(class ir_function *);
+   virtual ir_visitor_status visit_leave(class ir_function *);
+   virtual ir_visitor_status visit_enter(class ir_expression *);
+   virtual ir_visitor_status visit_leave(class ir_expression *);
+   virtual ir_visitor_status visit_enter(class ir_texture *);
+   virtual ir_visitor_status visit_leave(class ir_texture *);
+   virtual ir_visitor_status visit_enter(class ir_swizzle *);
+   virtual ir_visitor_status visit_leave(class ir_swizzle *);
+   virtual ir_visitor_status visit_enter(class ir_dereference_array *);
+   virtual ir_visitor_status visit_leave(class ir_dereference_array *);
+   virtual ir_visitor_status visit_enter(class ir_dereference_record *);
+   virtual ir_visitor_status visit_leave(class ir_dereference_record *);
+   virtual ir_visitor_status visit_enter(class ir_assignment *);
+   virtual ir_visitor_status visit_leave(class ir_assignment *);
+   virtual ir_visitor_status visit_enter(class ir_call *);
+   virtual ir_visitor_status visit_leave(class ir_call *);
+   virtual ir_visitor_status visit_enter(class ir_return *);
+   virtual ir_visitor_status visit_leave(class ir_return *);
+   virtual ir_visitor_status visit_enter(class ir_discard *);
+   virtual ir_visitor_status visit_leave(class ir_discard *);
+   virtual ir_visitor_status visit_enter(class ir_if *);
+   virtual ir_visitor_status visit_leave(class ir_if *);
+   /*@}*/
+
+
+   /**
+    * Utility function to process a linked list of instructions with a visitor
+    */
+   void run(struct exec_list *instructions);
+
+   /* Some visitors may need to insert new variable declarations and
+    * assignments for portions of a subtree, which means they need a
+    * pointer to the current instruction in the stream, not just their
+    * node in the tree rooted at that instruction.
+    *
+    * This is implemented by visit_list_elements -- if the visitor is
+    * not called by it, nothing good will happen.
+    */
+   class ir_instruction *base_ir;
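+
+   /* (Illustrative cross-reference: ir_expression_flattening.cpp is one such
+    * visitor; its handle_rvalue() calls base_ir->insert_before() to hoist a
+    * temporary declaration and assignment ahead of the statement being
+    * rewritten.)
+    */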
+
+   /**
+    * Callback function that is invoked on entry to each node visited.
+    *
+    * \warning
+    * Visitor classes derived from \c ir_hierarchical_visitor \b may choose
+    * \b not to invoke this function in their overrides.  This can be used,
+    * for example, to cause the callback to be invoked on every node type
+    * except one.
+    */
+   void (*callback)(class ir_instruction *ir, void *data);
+
+   /**
+    * Extra data parameter passed to the per-node callback function
+    */
+   void *data;
+
+   /**
+    * Currently in the LHS of an assignment?
+    *
+    * This is set and cleared by the \c ir_assignment::accept method.
+    */
+   bool in_assignee;
+};
+
+void visit_tree(ir_instruction *ir,
+		void (*callback)(class ir_instruction *ir, void *data),
+		void *data);
+
+ir_visitor_status visit_list_elements(ir_hierarchical_visitor *v, exec_list *l,
+                                      bool statement_list = true);
+#endif /* __cplusplus */
+
+#endif /* IR_HIERARCHICAL_VISITOR_H */
diff --git a/icd/intel/compiler/shader/ir_hv_accept.cpp b/icd/intel/compiler/shader/ir_hv_accept.cpp
new file mode 100644
index 0000000..2a1f70e
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_hv_accept.cpp
@@ -0,0 +1,416 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ir.h"
+
+/**
+ * \file ir_hv_accept.cpp
+ * Implementations of all hierarchical visitor accept methods for IR
+ * instructions.
+ */
+
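+/* Shared control-flow sketch (descriptive only): every internal-node
+ * accept() below follows the pattern
+ *
+ *    ir_visitor_status s = v->visit_enter(this);
+ *    if (s != visit_continue)
+ *       return (s == visit_continue_with_parent) ? visit_continue : s;
+ *    ... visit children, returning early on visit_stop ...
+ *    return v->visit_leave(this);
+ *
+ * so visit_continue_with_parent is consumed here: the children are skipped
+ * and the parent's traversal resumes normally.
+ */
+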
+/**
+ * Process a list of nodes using a hierarchical visitor.
+ *
+ * If statement_list is true (the default), this is a list of statements, so
+ * v->base_ir will be set to point to each statement just before iterating
+ * over it, and restored after iteration is complete.  If statement_list is
+ * false, this is a list that appears inside a statement (e.g. a parameter
+ * list), so v->base_ir will be left alone.
+ *
+ * \warning
+ * This function will operate correctly if a node being processed is removed
+ * from the list.  However, if nodes are added to the list after the node being
+ * processed, some of the added nodes may not be processed.
+ */
+ir_visitor_status
+visit_list_elements(ir_hierarchical_visitor *v, exec_list *l,
+                    bool statement_list)
+{
+   ir_instruction *prev_base_ir = v->base_ir;
+
+   foreach_list_safe(n, l) {
+      ir_instruction *const ir = (ir_instruction *) n;
+      if (statement_list)
+         v->base_ir = ir;
+      ir_visitor_status s = ir->accept(v);
+
+      if (s != visit_continue)
+	 return s;
+   }
+   if (statement_list)
+      v->base_ir = prev_base_ir;
+
+   return visit_continue;
+}
+
+
+ir_visitor_status
+ir_rvalue::accept(ir_hierarchical_visitor *v)
+{
+   return v->visit(this);
+}
+
+
+ir_visitor_status
+ir_variable::accept(ir_hierarchical_visitor *v)
+{
+   return v->visit(this);
+}
+
+
+ir_visitor_status
+ir_loop::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   s = visit_list_elements(v, &this->body_instructions);
+   if (s == visit_stop)
+      return s;
+
+   return v->visit_leave(this);
+}
+
+
+ir_visitor_status
+ir_loop_jump::accept(ir_hierarchical_visitor *v)
+{
+   return v->visit(this);
+}
+
+
+ir_visitor_status
+ir_function_signature::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   s = visit_list_elements(v, &this->parameters);
+   if (s == visit_stop)
+      return s;
+
+   s = visit_list_elements(v, &this->body);
+   return (s == visit_stop) ? s : v->visit_leave(this);
+}
+
+
+ir_visitor_status
+ir_function::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   s = visit_list_elements(v, &this->signatures, false);
+   return (s == visit_stop) ? s : v->visit_leave(this);
+}
+
+
+ir_visitor_status
+ir_expression::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   for (unsigned i = 0; i < this->get_num_operands(); i++) {
+      switch (this->operands[i]->accept(v)) {
+      case visit_continue:
+	 break;
+
+      case visit_continue_with_parent:
+	 // I wish for Java's labeled break-statement here.
+	 goto done;
+
+      case visit_stop:
+	 /* Propagate the stop request; s still holds visit_continue here. */
+	 return visit_stop;
+      }
+   }
+
+done:
+   return v->visit_leave(this);
+}
+
+ir_visitor_status
+ir_texture::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   s = this->sampler->accept(v);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   if (this->coordinate) {
+      s = this->coordinate->accept(v);
+      if (s != visit_continue)
+	 return (s == visit_continue_with_parent) ? visit_continue : s;
+   }
+
+   if (this->projector) {
+      s = this->projector->accept(v);
+      if (s != visit_continue)
+	 return (s == visit_continue_with_parent) ? visit_continue : s;
+   }
+
+   if (this->shadow_comparitor) {
+      s = this->shadow_comparitor->accept(v);
+      if (s != visit_continue)
+	 return (s == visit_continue_with_parent) ? visit_continue : s;
+   }
+
+   if (this->offset) {
+      s = this->offset->accept(v);
+      if (s != visit_continue)
+	 return (s == visit_continue_with_parent) ? visit_continue : s;
+   }
+
+   switch (this->op) {
+   case ir_tex:
+   case ir_lod:
+   case ir_query_levels:
+      break;
+   case ir_txb:
+      s = this->lod_info.bias->accept(v);
+      if (s != visit_continue)
+	 return (s == visit_continue_with_parent) ? visit_continue : s;
+      break;
+   case ir_txl:
+   case ir_txf:
+   case ir_txs:
+      s = this->lod_info.lod->accept(v);
+      if (s != visit_continue)
+	 return (s == visit_continue_with_parent) ? visit_continue : s;
+      break;
+   case ir_txf_ms:
+      s = this->lod_info.sample_index->accept(v);
+      if (s != visit_continue)
+         return (s == visit_continue_with_parent) ? visit_continue : s;
+      break;
+   case ir_txd:
+      s = this->lod_info.grad.dPdx->accept(v);
+      if (s != visit_continue)
+	 return (s == visit_continue_with_parent) ? visit_continue : s;
+
+      s = this->lod_info.grad.dPdy->accept(v);
+      if (s != visit_continue)
+	 return (s == visit_continue_with_parent) ? visit_continue : s;
+      break;
+   case ir_tg4:
+      s = this->lod_info.component->accept(v);
+      if (s != visit_continue)
+         return (s == visit_continue_with_parent) ? visit_continue : s;
+      break;
+   }
+
+   return (s == visit_stop) ? s : v->visit_leave(this);
+}
+
+
+ir_visitor_status
+ir_swizzle::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   s = this->val->accept(v);
+   return (s == visit_stop) ? s : v->visit_leave(this);
+}
+
+
+ir_visitor_status
+ir_dereference_variable::accept(ir_hierarchical_visitor *v)
+{
+   return v->visit(this);
+}
+
+
+ir_visitor_status
+ir_dereference_array::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   /* The array index is not the target of the assignment, so clear the
+    * 'in_assignee' flag.  Restore it after returning from the array index.
+    */
+   const bool was_in_assignee = v->in_assignee;
+   v->in_assignee = false;
+   s = this->array_index->accept(v);
+   v->in_assignee = was_in_assignee;
+
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   s = this->array->accept(v);
+   return (s == visit_stop) ? s : v->visit_leave(this);
+}
+
+
+ir_visitor_status
+ir_dereference_record::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   s = this->record->accept(v);
+   return (s == visit_stop) ? s : v->visit_leave(this);
+}
+
+
+ir_visitor_status
+ir_assignment::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   v->in_assignee = true;
+   s = this->lhs->accept(v);
+   v->in_assignee = false;
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   s = this->rhs->accept(v);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   if (this->condition)
+      s = this->condition->accept(v);
+
+   return (s == visit_stop) ? s : v->visit_leave(this);
+}
+
+
+ir_visitor_status
+ir_constant::accept(ir_hierarchical_visitor *v)
+{
+   return v->visit(this);
+}
+
+
+ir_visitor_status
+ir_call::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   if (this->return_deref != NULL) {
+      v->in_assignee = true;
+      s = this->return_deref->accept(v);
+      v->in_assignee = false;
+      if (s != visit_continue)
+	 return (s == visit_continue_with_parent) ? visit_continue : s;
+   }
+
+   s = visit_list_elements(v, &this->actual_parameters, false);
+   if (s == visit_stop)
+      return s;
+
+   return v->visit_leave(this);
+}
+
+
+ir_visitor_status
+ir_return::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   ir_rvalue *val = this->get_value();
+   if (val) {
+      s = val->accept(v);
+      if (s != visit_continue)
+	 return (s == visit_continue_with_parent) ? visit_continue : s;
+   }
+
+   return v->visit_leave(this);
+}
+
+
+ir_visitor_status
+ir_discard::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   if (this->condition != NULL) {
+      s = this->condition->accept(v);
+      if (s != visit_continue)
+	 return (s == visit_continue_with_parent) ? visit_continue : s;
+   }
+
+   return v->visit_leave(this);
+}
+
+
+ir_visitor_status
+ir_if::accept(ir_hierarchical_visitor *v)
+{
+   ir_visitor_status s = v->visit_enter(this);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   s = this->condition->accept(v);
+   if (s != visit_continue)
+      return (s == visit_continue_with_parent) ? visit_continue : s;
+
+   if (s != visit_continue_with_parent) {
+      s = visit_list_elements(v, &this->then_instructions);
+      if (s == visit_stop)
+	 return s;
+   }
+
+   if (s != visit_continue_with_parent) {
+      s = visit_list_elements(v, &this->else_instructions);
+      if (s == visit_stop)
+	 return s;
+   }
+
+   return v->visit_leave(this);
+}
+
+ir_visitor_status
+ir_emit_vertex::accept(ir_hierarchical_visitor *v)
+{
+   return v->visit(this);
+}
+
+
+ir_visitor_status
+ir_end_primitive::accept(ir_hierarchical_visitor *v)
+{
+   return v->visit(this);
+}
diff --git a/icd/intel/compiler/shader/ir_import_prototypes.cpp b/icd/intel/compiler/shader/ir_import_prototypes.cpp
new file mode 100644
index 0000000..b0429fb
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_import_prototypes.cpp
@@ -0,0 +1,125 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_import_prototypes.cpp
+ * Import function prototypes from one IR tree into another.
+ *
+ * \author Ian Romanick
+ */
+#include "ir.h"
+#include "glsl_symbol_table.h"
+
+namespace {
+
+/**
+ * Visitor used to import function prototypes
+ *
+ * Normally the \c clone method of either \c ir_function or
+ * \c ir_function_signature could be used.  However, we don't want a complete
+ * clone of the \c ir_function_signature.  We want everything \b except the
+ * body of the function.
+ */
+class import_prototype_visitor : public ir_hierarchical_visitor {
+public:
+   /**
+    * Construct a visitor that imports prototypes into \c list and records
+    * them in \c symbols, allocating out of \c mem_ctx.
+    */
+   import_prototype_visitor(exec_list *list, glsl_symbol_table *symbols,
+			    void *mem_ctx)
+   {
+      this->mem_ctx = mem_ctx;
+      this->list = list;
+      this->symbols = symbols;
+      this->function = NULL;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_function *ir)
+   {
+      assert(this->function == NULL);
+
+      this->function = this->symbols->get_function(ir->name);
+      if (!this->function) {
+	 this->function = new(this->mem_ctx) ir_function(ir->name);
+
+	 list->push_tail(this->function);
+
+	 /* Add the new function to the symbol table.
+	  */
+	 this->symbols->add_function(this->function);
+      }
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit_leave(ir_function *ir)
+   {
+      (void) ir;
+      assert(this->function != NULL);
+
+      this->function = NULL;
+      return visit_continue;
+   }
+
+   ir_visitor_status visit_enter(ir_function_signature *ir)
+   {
+      assert(this->function != NULL);
+
+      ir_function_signature *copy = ir->clone_prototype(mem_ctx, NULL);
+
+      this->function->add_signature(copy);
+
+      /* Do not process child nodes of the ir_function_signature.  There can
+       * never be any nodes inside the ir_function_signature that we care
+       * about.  Instead continue with the next sibling.
+       */
+      return visit_continue_with_parent;
+   }
+
+private:
+   exec_list *list;
+   ir_function *function;
+   glsl_symbol_table *symbols;
+   void *mem_ctx;
+};
+
+} /* anonymous namespace */
+
+/**
+ * Import function prototypes from one IR tree into another
+ *
+ * \param source   Source instruction stream containing functions whose
+ *                 prototypes are to be imported
+ * \param dest     Destination instruction stream where new \c ir_function and
+ *                 \c ir_function_signature nodes will be stored
+ * \param symbols  Symbol table where new functions will be stored
+ * \param mem_ctx  ralloc memory context used for new allocations
+ */
+void
+import_prototypes(const exec_list *source, exec_list *dest,
+		  glsl_symbol_table *symbols, void *mem_ctx)
+{
+   import_prototype_visitor v(dest, symbols, mem_ctx);
+
+   /* Making source be const is just extra documentation.
+    */
+   v.run(const_cast<exec_list *>(source));
+}
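+
+/* Usage sketch (hypothetical caller, not part of the upstream file):
+ *
+ *    exec_list prototypes;
+ *    import_prototypes(source_ir, &prototypes, symbols, mem_ctx);
+ *
+ * Afterward `prototypes` holds bodiless ir_function / ir_function_signature
+ * nodes for every function in `source_ir`; `source_ir` (an exec_list *),
+ * `symbols`, and `mem_ctx` are assumed from the calling context.
+ */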
diff --git a/icd/intel/compiler/shader/ir_print_visitor.cpp b/icd/intel/compiler/shader/ir_print_visitor.cpp
new file mode 100644
index 0000000..89288db
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_print_visitor.cpp
@@ -0,0 +1,566 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "icd-utils.h" // LunarG: ADD
+#include "ir_print_visitor.h"
+#include "glsl_types.h"
+#include "glsl_parser_extras.h"
+#include "main/macros.h"
+#include "program/hash_table.h"
+
+static void print_type(FILE *f, const glsl_type *t);
+
+void
+ir_instruction::print(void) const
+{
+   this->fprint(stdout);
+}
+
+void
+ir_instruction::fprint(FILE *f) const
+{
+   ir_instruction *deconsted = const_cast<ir_instruction *>(this);
+
+   ir_print_visitor v(f);
+   deconsted->accept(&v);
+}
+
+extern "C" {
+void
+_mesa_print_ir(FILE *f, exec_list *instructions,
+	       struct _mesa_glsl_parse_state *state)
+{
+   if (state) {
+      for (unsigned i = 0; i < state->num_user_structures; i++) {
+	 const glsl_type *const s = state->user_structures[i];
+
+	 fprintf(f, "(structure (%s) (%s@%p) (%u) (\n",
+                 s->name, s->name, (void *) s, s->length);
+
+	 for (unsigned j = 0; j < s->length; j++) {
+	    fprintf(f, "\t((");
+	    print_type(f, s->fields.structure[j].type);
+	    fprintf(f, ")(%s))\n", s->fields.structure[j].name);
+	 }
+
+	 fprintf(f, ")\n");
+      }
+   }
+
+   fprintf(f, "(\n");
+   foreach_list(n, instructions) {
+      ir_instruction *ir = (ir_instruction *) n;
+      ir->fprint(f);
+      if (ir->ir_type != ir_type_function)
+	 fprintf(f, "\n");
+   }
+   fprintf(f, "\n)");
+}
+
+} /* extern "C" */
+
+ir_print_visitor::ir_print_visitor(FILE *f)
+   : f(f)
+{
+   indentation = 0;
+   printable_names =
+      hash_table_ctor(32, hash_table_pointer_hash, hash_table_pointer_compare);
+   symbols = _mesa_symbol_table_ctor();
+   mem_ctx = ralloc_context(NULL);
+}
+
+ir_print_visitor::~ir_print_visitor()
+{
+   hash_table_dtor(printable_names);
+   _mesa_symbol_table_dtor(symbols);
+   ralloc_free(mem_ctx);
+}
+
+void ir_print_visitor::indent(void)
+{
+   for (int i = 0; i < indentation; i++)
+      fprintf(f, "  ");
+}
+
+const char *
+ir_print_visitor::unique_name(ir_variable *var)
+{
+   /* var->name can be NULL in function prototypes when a type is given for a
+    * parameter but no name is given.  In that case, just return an empty
+    * string.  Don't worry about tracking the generated name in the printable
+    * names hash because this is the only scope where it can ever appear.
+    */
+   if (var->name == NULL) {
+      static unsigned arg = 1;
+      return ralloc_asprintf(this->mem_ctx, "parameter@%u", arg++);
+   }
+
+   /* Do we already have a name for this variable? */
+   const char *name = (const char *) hash_table_find(this->printable_names, var);
+   if (name != NULL)
+      return name;
+
+   /* If there's no conflict, just use the original name */
+   if (_mesa_symbol_table_find_symbol(this->symbols, -1, var->name) == NULL) {
+      name = var->name;
+   } else {
+      static unsigned i = 1;
+      name = ralloc_asprintf(this->mem_ctx, "%s@%u", var->name, ++i);
+   }
+   hash_table_insert(this->printable_names, (void *) name, var);
+   _mesa_symbol_table_add_symbol(this->symbols, -1, name, var);
+   return name;
+}
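+
+/* Illustrative behavior: if two distinct ir_variables are both named "tmp",
+ * the first prints as "tmp" and the second as "tmp@2"; nameless parameters
+ * print as "parameter@1", "parameter@2", and so on.
+ */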
+
+static void
+print_type(FILE *f, const glsl_type *t)
+{
+   if (t->base_type == GLSL_TYPE_ARRAY) {
+      fprintf(f, "(array ");
+      print_type(f, t->fields.array);
+      fprintf(f, " %u)", t->length);
+   } else if ((t->base_type == GLSL_TYPE_STRUCT)
+	      && (strncmp("gl_", t->name, 3) != 0)) {
+      fprintf(f, "%s@%p", t->name, (void *) t);
+   } else {
+      fprintf(f, "%s", t->name);
+   }
+}
+
+void ir_print_visitor::visit(ir_rvalue *)
+{
+   fprintf(f, "error");
+}
+
+void ir_print_visitor::visit(ir_variable *ir)
+{
+   fprintf(f, "(declare ");
+
+   const char *const cent = (ir->data.centroid) ? "centroid " : "";
+   const char *const samp = (ir->data.sample) ? "sample " : "";
+   const char *const inv = (ir->data.invariant) ? "invariant " : "";
+   const char *const mode[] = { "", "uniform ", "shader_in ", "shader_out ",
+                                "in ", "out ", "inout ",
+			        "const_in ", "sys ", "temporary " };
+   STATIC_ASSERT(ARRAY_SIZE(mode) == ir_var_mode_count);
+   const char *const interp[] = { "", "smooth", "flat", "noperspective" };
+   STATIC_ASSERT(ARRAY_SIZE(interp) == INTERP_QUALIFIER_COUNT);
+
+   fprintf(f, "(%s%s%s%s%s) ",
+	  cent, samp, inv, mode[ir->data.mode], interp[ir->data.interpolation]);
+
+   print_type(f, ir->type);
+   fprintf(f, " %s)", unique_name(ir));
+}
+
+
+void ir_print_visitor::visit(ir_function_signature *ir)
+{
+   _mesa_symbol_table_push_scope(symbols);
+   fprintf(f, "(signature ");
+   indentation++;
+
+   print_type(f, ir->return_type);
+   fprintf(f, "\n");
+   indent();
+
+   fprintf(f, "(parameters\n");
+   indentation++;
+
+   foreach_list(n, &ir->parameters) {
+      ir_variable *const inst = (ir_variable *) n;
+
+      indent();
+      inst->accept(this);
+      fprintf(f, "\n");
+   }
+   indentation--;
+
+   indent();
+   fprintf(f, ")\n");
+
+   indent();
+
+   fprintf(f, "(\n");
+   indentation++;
+
+   foreach_list(n, &ir->body) {
+      ir_instruction *const inst = (ir_instruction *) n;
+
+      indent();
+      inst->accept(this);
+      fprintf(f, "\n");
+   }
+   indentation--;
+   indent();
+   fprintf(f, "))\n");
+   indentation--;
+   _mesa_symbol_table_pop_scope(symbols);
+}
+
+
+void ir_print_visitor::visit(ir_function *ir)
+{
+   fprintf(f, "(function %s\n", ir->name);
+   indentation++;
+   foreach_list(n, &ir->signatures) {
+      ir_function_signature *const sig = (ir_function_signature *) n;
+      indent();
+      sig->accept(this);
+      fprintf(f, "\n");
+   }
+   indentation--;
+   indent();
+   fprintf(f, ")\n\n");
+}
+
+
+void ir_print_visitor::visit(ir_expression *ir)
+{
+   fprintf(f, "(expression ");
+
+   print_type(f, ir->type);
+
+   fprintf(f, " %s ", ir->operator_string());
+
+   for (unsigned i = 0; i < ir->get_num_operands(); i++) {
+      ir->operands[i]->accept(this);
+   }
+
+   fprintf(f, ") ");
+}
+
+
+void ir_print_visitor::visit(ir_texture *ir)
+{
+   fprintf(f, "(%s ", ir->opcode_string());
+
+   print_type(f, ir->type);
+   fprintf(f, " ");
+
+   ir->sampler->accept(this);
+   fprintf(f, " ");
+
+   if (ir->op != ir_txs && ir->op != ir_query_levels) {
+      ir->coordinate->accept(this);
+
+      fprintf(f, " ");
+
+      if (ir->offset != NULL) {
+	 ir->offset->accept(this);
+      } else {
+	 fprintf(f, "0");
+      }
+
+      fprintf(f, " ");
+   }
+
+   if (ir->op != ir_txf && ir->op != ir_txf_ms &&
+       ir->op != ir_txs && ir->op != ir_tg4 &&
+       ir->op != ir_query_levels) {
+      if (ir->projector)
+	 ir->projector->accept(this);
+      else
+	 fprintf(f, "1");
+
+      if (ir->shadow_comparitor) {
+	 fprintf(f, " ");
+	 ir->shadow_comparitor->accept(this);
+      } else {
+	 fprintf(f, " ()");
+      }
+   }
+
+   fprintf(f, " ");
+   switch (ir->op)
+   {
+   case ir_tex:
+   case ir_lod:
+   case ir_query_levels:
+      break;
+   case ir_txb:
+      ir->lod_info.bias->accept(this);
+      break;
+   case ir_txl:
+   case ir_txf:
+   case ir_txs:
+      ir->lod_info.lod->accept(this);
+      break;
+   case ir_txf_ms:
+      ir->lod_info.sample_index->accept(this);
+      break;
+   case ir_txd:
+      fprintf(f, "(");
+      ir->lod_info.grad.dPdx->accept(this);
+      fprintf(f, " ");
+      ir->lod_info.grad.dPdy->accept(this);
+      fprintf(f, ")");
+      break;
+   case ir_tg4:
+      ir->lod_info.component->accept(this);
+      break;
+   }
+   fprintf(f, ")");
+}
+
+
+void ir_print_visitor::visit(ir_swizzle *ir)
+{
+   const unsigned swiz[4] = {
+      ir->mask.x,
+      ir->mask.y,
+      ir->mask.z,
+      ir->mask.w,
+   };
+
+   fprintf(f, "(swiz ");
+   for (unsigned i = 0; i < ir->mask.num_components; i++) {
+      fprintf(f, "%c", "xyzw"[swiz[i]]);
+   }
+   fprintf(f, " ");
+   ir->val->accept(this);
+   fprintf(f, ")");
+}
+
+
+void ir_print_visitor::visit(ir_dereference_variable *ir)
+{
+   ir_variable *var = ir->variable_referenced();
+   fprintf(f, "(var_ref %s) ", unique_name(var));
+}
+
+
+void ir_print_visitor::visit(ir_dereference_array *ir)
+{
+   fprintf(f, "(array_ref ");
+   ir->array->accept(this);
+   ir->array_index->accept(this);
+   fprintf(f, ") ");
+}
+
+
+void ir_print_visitor::visit(ir_dereference_record *ir)
+{
+   fprintf(f, "(record_ref ");
+   ir->record->accept(this);
+   fprintf(f, " %s) ", ir->field);
+}
+
+
+void ir_print_visitor::visit(ir_assignment *ir)
+{
+   fprintf(f, "(assign ");
+
+   if (ir->condition)
+      ir->condition->accept(this);
+
+   char mask[5];
+   unsigned j = 0;
+
+   for (unsigned i = 0; i < 4; i++) {
+      if ((ir->write_mask & (1 << i)) != 0) {
+	 mask[j] = "xyzw"[i];
+	 j++;
+      }
+   }
+   mask[j] = '\0';
+
+   fprintf(f, " (%s) ", mask);
+
+   ir->lhs->accept(this);
+
+   fprintf(f, " ");
+
+   ir->rhs->accept(this);
+   fprintf(f, ") ");
+}
+
+
+void ir_print_visitor::visit(ir_constant *ir)
+{
+   fprintf(f, "(constant ");
+   print_type(f, ir->type);
+   fprintf(f, " (");
+
+   if (ir->type->is_array()) {
+      for (unsigned i = 0; i < ir->type->length; i++)
+	 ir->get_array_element(i)->accept(this);
+   } else if (ir->type->is_record()) {
+      ir_constant *value = (ir_constant *) ir->components.get_head();
+      for (unsigned i = 0; i < ir->type->length; i++) {
+	 fprintf(f, "(%s ", ir->type->fields.structure[i].name);
+	 value->accept(this);
+	 fprintf(f, ")");
+
+	 value = (ir_constant *) value->next;
+      }
+   } else {
+      for (unsigned i = 0; i < ir->type->components(); i++) {
+	 if (i != 0)
+	    fprintf(f, " ");
+	 switch (ir->type->base_type) {
+	 case GLSL_TYPE_UINT:  fprintf(f, "%u", ir->value.u[i]); break;
+	 case GLSL_TYPE_INT:   fprintf(f, "%d", ir->value.i[i]); break;
+	 case GLSL_TYPE_FLOAT:
+            if (ir->value.f[i] == 0.0f)
+               /* 0.0 == -0.0, so print with %f to get the proper sign. */
+               fprintf(f, "%.1f", ir->value.f[i]);
+            else if (fabs(ir->value.f[i]) < 0.000001f)
+               fprintf(f, "%a", ir->value.f[i]);
+            else if (fabs(ir->value.f[i]) > 1000000.0f)
+               fprintf(f, "%e", ir->value.f[i]);
+            else
+               fprintf(f, "%f", ir->value.f[i]);
+            break;
+	 case GLSL_TYPE_BOOL:  fprintf(f, "%d", ir->value.b[i]); break;
+	 default: assert(0);
+	 }
+      }
+   }
+   fprintf(f, ")) ");
+}
+
+
+void
+ir_print_visitor::visit(ir_call *ir)
+{
+   fprintf(f, "(call %s ", ir->callee_name());
+   if (ir->return_deref)
+      ir->return_deref->accept(this);
+   fprintf(f, " (");
+   foreach_list(n, &ir->actual_parameters) {
+      ir_rvalue *const param = (ir_rvalue *) n;
+
+      param->accept(this);
+   }
+   fprintf(f, "))\n");
+}
+
+
+void
+ir_print_visitor::visit(ir_return *ir)
+{
+   fprintf(f, "(return");
+
+   ir_rvalue *const value = ir->get_value();
+   if (value) {
+      fprintf(f, " ");
+      value->accept(this);
+   }
+
+   fprintf(f, ")");
+}
+
+
+void
+ir_print_visitor::visit(ir_discard *ir)
+{
+   fprintf(f, "(discard ");
+
+   if (ir->condition != NULL) {
+      fprintf(f, " ");
+      ir->condition->accept(this);
+   }
+
+   fprintf(f, ")");
+}
+
+
+void
+ir_print_visitor::visit(ir_if *ir)
+{
+   fprintf(f, "(if ");
+   ir->condition->accept(this);
+
+   fprintf(f, "(\n");
+   indentation++;
+
+   foreach_list(n, &ir->then_instructions) {
+      ir_instruction *const inst = (ir_instruction *) n;
+
+      indent();
+      inst->accept(this);
+      fprintf(f, "\n");
+   }
+
+   indentation--;
+   indent();
+   fprintf(f, ")\n");
+
+   indent();
+   if (!ir->else_instructions.is_empty()) {
+      fprintf(f, "(\n");
+      indentation++;
+
+      foreach_list(n, &ir->else_instructions) {
+	 ir_instruction *const inst = (ir_instruction *) n;
+
+	 indent();
+	 inst->accept(this);
+	 fprintf(f, "\n");
+      }
+      indentation--;
+      indent();
+      fprintf(f, "))\n");
+   } else {
+      fprintf(f, "())\n");
+   }
+}
+
+
+void
+ir_print_visitor::visit(ir_loop *ir)
+{
+   fprintf(f, "(loop (\n");
+   indentation++;
+
+   foreach_list(n, &ir->body_instructions) {
+      ir_instruction *const inst = (ir_instruction *) n;
+
+      indent();
+      inst->accept(this);
+      fprintf(f, "\n");
+   }
+   indentation--;
+   indent();
+   fprintf(f, "))\n");
+}
+
+
+void
+ir_print_visitor::visit(ir_loop_jump *ir)
+{
+   fprintf(f, "%s", ir->is_break() ? "break" : "continue");
+}
+
+void
+ir_print_visitor::visit(ir_emit_vertex *)
+{
+   fprintf(f, "(emit-vertex)");
+}
+
+void
+ir_print_visitor::visit(ir_end_primitive *)
+{
+   fprintf(f, "(end-primitive)");
+}
diff --git a/icd/intel/compiler/shader/ir_print_visitor.h b/icd/intel/compiler/shader/ir_print_visitor.h
new file mode 100644
index 0000000..98f041d
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_print_visitor.h
@@ -0,0 +1,95 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef IR_PRINT_VISITOR_H
+#define IR_PRINT_VISITOR_H
+
+#include "ir.h"
+#include "ir_visitor.h"
+
+extern "C" {
+#include "program/symbol_table.h"
+}
+
+/**
+ * Visitor that prints an IR instruction tree as S-expressions
+ */
+class ir_print_visitor : public ir_visitor {
+public:
+   ir_print_visitor(FILE *f);
+   virtual ~ir_print_visitor();
+
+   void indent(void);
+
+   /**
+    * \name Visit methods
+    *
+    * As typical for the visitor pattern, there must be one \c visit method for
+    * each concrete subclass of \c ir_instruction.  Virtual base classes within
+    * the hierarchy should not have \c visit methods.
+    */
+   /*@{*/
+   virtual void visit(ir_rvalue *);
+   virtual void visit(ir_variable *);
+   virtual void visit(ir_function_signature *);
+   virtual void visit(ir_function *);
+   virtual void visit(ir_expression *);
+   virtual void visit(ir_texture *);
+   virtual void visit(ir_swizzle *);
+   virtual void visit(ir_dereference_variable *);
+   virtual void visit(ir_dereference_array *);
+   virtual void visit(ir_dereference_record *);
+   virtual void visit(ir_assignment *);
+   virtual void visit(ir_constant *);
+   virtual void visit(ir_call *);
+   virtual void visit(ir_return *);
+   virtual void visit(ir_discard *);
+   virtual void visit(ir_if *);
+   virtual void visit(ir_loop *);
+   virtual void visit(ir_loop_jump *);
+   virtual void visit(ir_emit_vertex *);
+   virtual void visit(ir_end_primitive *);
+   /*@}*/
+
+private:
+   /**
+    * Fetch/generate a unique name for ir_variable.
+    *
+    * GLSL IR permits multiple ir_variables to share the same name.  This works
+    * fine until we try to print it, when we really need a unique one.
+    */
+   const char *unique_name(ir_variable *var);
+
+   /** A mapping from ir_variable * -> unique printable names. */
+   hash_table *printable_names;
+   _mesa_symbol_table *symbols;
+
+   void *mem_ctx;
+   FILE *f;
+
+   int indentation;
+};
+
+#endif /* IR_PRINT_VISITOR_H */
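+
+/* Usage sketch (hypothetical caller): dump a single node or a whole
+ * instruction stream, assuming `ir` is an ir_instruction * and
+ * `instructions` is the shader's exec_list *.
+ *
+ *    ir->fprint(stderr);                           // one node
+ *    _mesa_print_ir(stderr, instructions, state);  // whole stream
+ */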
diff --git a/icd/intel/compiler/shader/ir_reader.cpp b/icd/intel/compiler/shader/ir_reader.cpp
new file mode 100644
index 0000000..28923f3
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_reader.cpp
@@ -0,0 +1,1131 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ir_reader.h"
+#include "glsl_parser_extras.h"
+#include "glsl_types.h"
+#include "s_expression.h"
+
+static const bool debug = false;
+
+namespace {
+
+class ir_reader {
+public:
+   ir_reader(_mesa_glsl_parse_state *);
+
+   void read(exec_list *instructions, const char *src, bool scan_for_protos);
+
+private:
+   void *mem_ctx;
+   _mesa_glsl_parse_state *state;
+
+   void ir_read_error(s_expression *, const char *fmt, ...);
+
+   const glsl_type *read_type(s_expression *);
+
+   void scan_for_prototypes(exec_list *, s_expression *);
+   ir_function *read_function(s_expression *, bool skip_body);
+   void read_function_sig(ir_function *, s_expression *, bool skip_body);
+
+   void read_instructions(exec_list *, s_expression *, ir_loop *);
+   ir_instruction *read_instruction(s_expression *, ir_loop *);
+   ir_variable *read_declaration(s_expression *);
+   ir_if *read_if(s_expression *, ir_loop *);
+   ir_loop *read_loop(s_expression *);
+   ir_call *read_call(s_expression *);
+   ir_return *read_return(s_expression *);
+   ir_rvalue *read_rvalue(s_expression *);
+   ir_assignment *read_assignment(s_expression *);
+   ir_expression *read_expression(s_expression *);
+   ir_swizzle *read_swizzle(s_expression *);
+   ir_constant *read_constant(s_expression *);
+   ir_texture *read_texture(s_expression *);
+   ir_emit_vertex *read_emit_vertex(s_expression *);
+   ir_end_primitive *read_end_primitive(s_expression *);
+
+   ir_dereference *read_dereference(s_expression *);
+   ir_dereference_variable *read_var_ref(s_expression *);
+};
+
+} /* anonymous namespace */
+
+ir_reader::ir_reader(_mesa_glsl_parse_state *state) : state(state)
+{
+   this->mem_ctx = state;
+}
+
+void
+_mesa_glsl_read_ir(_mesa_glsl_parse_state *state, exec_list *instructions,
+		   const char *src, bool scan_for_protos)
+{
+   ir_reader r(state);
+   r.read(instructions, src, scan_for_protos);
+}
+
+void
+ir_reader::read(exec_list *instructions, const char *src, bool scan_for_protos)
+{
+   void *sx_mem_ctx = ralloc_context(NULL);
+   s_expression *expr = s_expression::read_expression(sx_mem_ctx, src);
+   if (expr == NULL) {
+      ir_read_error(NULL, "couldn't parse S-Expression.");
+      return;
+   }
+
+   if (scan_for_protos) {
+      scan_for_prototypes(instructions, expr);
+      if (state->error)
+	 return;
+   }
+
+   read_instructions(instructions, expr, NULL);
+   ralloc_free(sx_mem_ctx);
+
+   if (debug)
+      validate_ir_tree(instructions);
+}
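+
+/* Example (hand-written sketch) of the S-expression form this reader
+ * accepts: a global declaration plus a trivial function definition.
+ *
+ *    ((declare (shader_in) float x)
+ *     (function main
+ *       (signature void (parameters)
+ *         ((return)))))
+ */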
+
+void
+ir_reader::ir_read_error(s_expression *expr, const char *fmt, ...)
+{
+   va_list ap;
+
+   state->error = true;
+
+   if (state->current_function != NULL)
+      ralloc_asprintf_append(&state->info_log, "In function %s:\n",
+			     state->current_function->function_name());
+   ralloc_strcat(&state->info_log, "error: ");
+
+   va_start(ap, fmt);
+   ralloc_vasprintf_append(&state->info_log, fmt, ap);
+   va_end(ap);
+   ralloc_strcat(&state->info_log, "\n");
+
+   if (expr != NULL) {
+      ralloc_strcat(&state->info_log, "...in this context:\n   ");
+      expr->print();
+      ralloc_strcat(&state->info_log, "\n\n");
+   }
+}
+
+const glsl_type *
+ir_reader::read_type(s_expression *expr)
+{
+   s_expression *s_base_type;
+   s_int *s_size;
+
+   s_pattern pat[] = { "array", s_base_type, s_size };
+   if (MATCH(expr, pat)) {
+      const glsl_type *base_type = read_type(s_base_type);
+      if (base_type == NULL) {
+	 ir_read_error(NULL, "when reading base type of array type");
+	 return NULL;
+      }
+
+      return glsl_type::get_array_instance(base_type, s_size->value());
+   }
+
+   s_symbol *type_sym = SX_AS_SYMBOL(expr);
+   if (type_sym == NULL) {
+      ir_read_error(expr, "expected <type>");
+      return NULL;
+   }
+
+   const glsl_type *type = state->symbols->get_type(type_sym->value());
+   if (type == NULL)
+      ir_read_error(expr, "invalid type: %s", type_sym->value());
+
+   return type;
+}
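+
+/* For example (illustrative): "float" or "vec4" resolve through the symbol
+ * table, while "(array vec4 3)" recursively builds the type vec4[3].
+ */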
+
+
+void
+ir_reader::scan_for_prototypes(exec_list *instructions, s_expression *expr)
+{
+   s_list *list = SX_AS_LIST(expr);
+   if (list == NULL) {
+      ir_read_error(expr, "Expected (<instruction> ...); found an atom.");
+      return;
+   }
+
+   foreach_list(n, &list->subexpressions) {
+      s_list *sub = SX_AS_LIST(n);
+      if (sub == NULL)
+	 continue; // not a (function ...); ignore it.
+
+      s_symbol *tag = SX_AS_SYMBOL(sub->subexpressions.get_head());
+      if (tag == NULL || strcmp(tag->value(), "function") != 0)
+	 continue; // not a (function ...); ignore it.
+
+      ir_function *f = read_function(sub, true);
+      if (f == NULL)
+	 return;
+      instructions->push_tail(f);
+   }
+}
+
+ir_function *
+ir_reader::read_function(s_expression *expr, bool skip_body)
+{
+   bool added = false;
+   s_symbol *name;
+
+   s_pattern pat[] = { "function", name };
+   if (!PARTIAL_MATCH(expr, pat)) {
+      ir_read_error(expr, "Expected (function <name> (signature ...) ...)");
+      return NULL;
+   }
+
+   ir_function *f = state->symbols->get_function(name->value());
+   if (f == NULL) {
+      f = new(mem_ctx) ir_function(name->value());
+      added = state->symbols->add_function(f);
+      assert(added);
+   }
+
+   /* Skip over "function" tag and function name (which are guaranteed to be
+    * present by the above PARTIAL_MATCH call).
+    */
+   exec_node *node = ((s_list *) expr)->subexpressions.head->next->next;
+   for (/* nothing */; !node->is_tail_sentinel(); node = node->next) {
+      s_expression *s_sig = (s_expression *) node;
+      read_function_sig(f, s_sig, skip_body);
+   }
+   return added ? f : NULL;
+}
+
+static bool
+always_available(const _mesa_glsl_parse_state *)
+{
+   return true;
+}
+
+void
+ir_reader::read_function_sig(ir_function *f, s_expression *expr, bool skip_body)
+{
+   s_expression *type_expr;
+   s_list *paramlist;
+   s_list *body_list;
+
+   s_pattern pat[] = { "signature", type_expr, paramlist, body_list };
+   if (!MATCH(expr, pat)) {
+      ir_read_error(expr, "Expected (signature <type> (parameters ...) "
+			  "(<instruction> ...))");
+      return;
+   }
+
+   const glsl_type *return_type = read_type(type_expr);
+   if (return_type == NULL)
+      return;
+
+   s_symbol *paramtag = SX_AS_SYMBOL(paramlist->subexpressions.get_head());
+   if (paramtag == NULL || strcmp(paramtag->value(), "parameters") != 0) {
+      ir_read_error(paramlist, "Expected (parameters ...)");
+      return;
+   }
+
+   // Read the parameters list into a temporary place.
+   exec_list hir_parameters;
+   state->symbols->push_scope();
+
+   /* Skip over the "parameters" tag. */
+   exec_node *node = paramlist->subexpressions.head->next;
+   for (/* nothing */; !node->is_tail_sentinel(); node = node->next) {
+      ir_variable *var = read_declaration((s_expression *) node);
+      if (var == NULL)
+	 return;
+
+      hir_parameters.push_tail(var);
+   }
+
+   ir_function_signature *sig =
+      f->exact_matching_signature(state, &hir_parameters);
+   if (sig == NULL && skip_body) {
+      /* If scanning for prototypes, generate a new signature. */
+      /* ir_reader doesn't know what languages support a given built-in, so
+       * just say that they're always available.  For now, other mechanisms
+       * guarantee the right built-ins are available.
+       */
+      sig = new(mem_ctx) ir_function_signature(return_type, always_available);
+      f->add_signature(sig);
+   } else if (sig != NULL) {
+      const char *badvar = sig->qualifiers_match(&hir_parameters);
+      if (badvar != NULL) {
+	 ir_read_error(expr, "function `%s' parameter `%s' qualifiers "
+		       "don't match prototype", f->name, badvar);
+	 return;
+      }
+
+      if (sig->return_type != return_type) {
+	 ir_read_error(expr, "function `%s' return type doesn't "
+		       "match prototype", f->name);
+	 return;
+      }
+   } else {
+      /* No prototype for this body exists - skip it. */
+      state->symbols->pop_scope();
+      return;
+   }
+   assert(sig != NULL);
+
+   sig->replace_parameters(&hir_parameters);
+
+   if (!skip_body && !body_list->subexpressions.is_empty()) {
+      if (sig->is_defined) {
+	 ir_read_error(expr, "function %s redefined", f->name);
+	 return;
+      }
+      state->current_function = sig;
+      read_instructions(&sig->body, body_list, NULL);
+      state->current_function = NULL;
+      sig->is_defined = true;
+   }
+
+   state->symbols->pop_scope();
+}
+
+void
+ir_reader::read_instructions(exec_list *instructions, s_expression *expr,
+			     ir_loop *loop_ctx)
+{
+   // Read in a list of instructions
+   s_list *list = SX_AS_LIST(expr);
+   if (list == NULL) {
+      ir_read_error(expr, "Expected (<instruction> ...); found an atom.");
+      return;
+   }
+
+   foreach_list(n, &list->subexpressions) {
+      s_expression *sub = (s_expression *) n;
+      ir_instruction *ir = read_instruction(sub, loop_ctx);
+      if (ir != NULL) {
+	 /* Global variable declarations should be moved to the top, before
+	  * any functions that might use them.  Functions are added to the
+	  * instruction stream when scanning for prototypes, so without this
+	  * hack, they always appear before variable declarations.
+	  */
+	 if (state->current_function == NULL && ir->as_variable() != NULL)
+	    instructions->push_head(ir);
+	 else
+	    instructions->push_tail(ir);
+      }
+   }
+}
+
+
+ir_instruction *
+ir_reader::read_instruction(s_expression *expr, ir_loop *loop_ctx)
+{
+   s_symbol *symbol = SX_AS_SYMBOL(expr);
+   if (symbol != NULL) {
+      if (strcmp(symbol->value(), "break") == 0 && loop_ctx != NULL)
+	 return new(mem_ctx) ir_loop_jump(ir_loop_jump::jump_break);
+      if (strcmp(symbol->value(), "continue") == 0 && loop_ctx != NULL)
+	 return new(mem_ctx) ir_loop_jump(ir_loop_jump::jump_continue);
+   }
+
+   s_list *list = SX_AS_LIST(expr);
+   if (list == NULL || list->subexpressions.is_empty()) {
+      ir_read_error(expr, "Invalid instruction.\n");
+      return NULL;
+   }
+
+   s_symbol *tag = SX_AS_SYMBOL(list->subexpressions.get_head());
+   if (tag == NULL) {
+      ir_read_error(expr, "expected instruction tag");
+      return NULL;
+   }
+
+   ir_instruction *inst = NULL;
+   if (strcmp(tag->value(), "declare") == 0) {
+      inst = read_declaration(list);
+   } else if (strcmp(tag->value(), "assign") == 0) {
+      inst = read_assignment(list);
+   } else if (strcmp(tag->value(), "if") == 0) {
+      inst = read_if(list, loop_ctx);
+   } else if (strcmp(tag->value(), "loop") == 0) {
+      inst = read_loop(list);
+   } else if (strcmp(tag->value(), "call") == 0) {
+      inst = read_call(list);
+   } else if (strcmp(tag->value(), "return") == 0) {
+      inst = read_return(list);
+   } else if (strcmp(tag->value(), "function") == 0) {
+      inst = read_function(list, false);
+   } else if (strcmp(tag->value(), "emit-vertex") == 0) {
+      inst = read_emit_vertex(list);
+   } else if (strcmp(tag->value(), "end-primitive") == 0) {
+      inst = read_end_primitive(list);
+   } else {
+      inst = read_rvalue(list);
+      if (inst == NULL)
+	 ir_read_error(NULL, "when reading instruction");
+   }
+   return inst;
+}
+
+ir_variable *
+ir_reader::read_declaration(s_expression *expr)
+{
+   s_list *s_quals;
+   s_expression *s_type;
+   s_symbol *s_name;
+
+   s_pattern pat[] = { "declare", s_quals, s_type, s_name };
+   if (!MATCH(expr, pat)) {
+      ir_read_error(expr, "expected (declare (<qualifiers>) <type> <name>)");
+      return NULL;
+   }
+
+   const glsl_type *type = read_type(s_type);
+   if (type == NULL)
+      return NULL;
+
+   ir_variable *var = new(mem_ctx) ir_variable(type, s_name->value(),
+					       ir_var_auto);
+
+   foreach_list(n, &s_quals->subexpressions) {
+      s_symbol *qualifier = SX_AS_SYMBOL(n);
+      if (qualifier == NULL) {
+	 ir_read_error(expr, "qualifier list must contain only symbols");
+	 return NULL;
+      }
+
+      // FINISHME: Check for duplicate/conflicting qualifiers.
+      if (strcmp(qualifier->value(), "centroid") == 0) {
+	 var->data.centroid = 1;
+      } else if (strcmp(qualifier->value(), "sample") == 0) {
+         var->data.sample = 1;
+      } else if (strcmp(qualifier->value(), "invariant") == 0) {
+	 var->data.invariant = 1;
+      } else if (strcmp(qualifier->value(), "uniform") == 0) {
+	 var->data.mode = ir_var_uniform;
+      } else if (strcmp(qualifier->value(), "auto") == 0) {
+	 var->data.mode = ir_var_auto;
+      } else if (strcmp(qualifier->value(), "in") == 0) {
+	 var->data.mode = ir_var_function_in;
+      } else if (strcmp(qualifier->value(), "shader_in") == 0) {
+         var->data.mode = ir_var_shader_in;
+      } else if (strcmp(qualifier->value(), "const_in") == 0) {
+	 var->data.mode = ir_var_const_in;
+      } else if (strcmp(qualifier->value(), "out") == 0) {
+	 var->data.mode = ir_var_function_out;
+      } else if (strcmp(qualifier->value(), "shader_out") == 0) {
+	 var->data.mode = ir_var_shader_out;
+      } else if (strcmp(qualifier->value(), "inout") == 0) {
+	 var->data.mode = ir_var_function_inout;
+      } else if (strcmp(qualifier->value(), "temporary") == 0) {
+	 var->data.mode = ir_var_temporary;
+      } else if (strcmp(qualifier->value(), "smooth") == 0) {
+	 var->data.interpolation = INTERP_QUALIFIER_SMOOTH;
+      } else if (strcmp(qualifier->value(), "flat") == 0) {
+	 var->data.interpolation = INTERP_QUALIFIER_FLAT;
+      } else if (strcmp(qualifier->value(), "noperspective") == 0) {
+	 var->data.interpolation = INTERP_QUALIFIER_NOPERSPECTIVE;
+      } else {
+	 ir_read_error(expr, "unknown qualifier: %s", qualifier->value());
+	 return NULL;
+      }
+   }
+
+   // Add the variable to the symbol table
+   state->symbols->add_variable(var);
+
+   return var;
+}
+
+
+ir_if *
+ir_reader::read_if(s_expression *expr, ir_loop *loop_ctx)
+{
+   s_expression *s_cond;
+   s_expression *s_then;
+   s_expression *s_else;
+
+   s_pattern pat[] = { "if", s_cond, s_then, s_else };
+   if (!MATCH(expr, pat)) {
+      ir_read_error(expr, "expected (if <condition> (<then>...) (<else>...))");
+      return NULL;
+   }
+
+   ir_rvalue *condition = read_rvalue(s_cond);
+   if (condition == NULL) {
+      ir_read_error(NULL, "when reading condition of (if ...)");
+      return NULL;
+   }
+
+   ir_if *iff = new(mem_ctx) ir_if(condition);
+
+   read_instructions(&iff->then_instructions, s_then, loop_ctx);
+   read_instructions(&iff->else_instructions, s_else, loop_ctx);
+   if (state->error) {
+      delete iff;
+      iff = NULL;
+   }
+   return iff;
+}
+
+
+ir_loop *
+ir_reader::read_loop(s_expression *expr)
+{
+   s_expression *s_body;
+
+   s_pattern loop_pat[] = { "loop", s_body };
+   if (!MATCH(expr, loop_pat)) {
+      ir_read_error(expr, "expected (loop <body>)");
+      return NULL;
+   }
+
+   ir_loop *loop = new(mem_ctx) ir_loop;
+
+   read_instructions(&loop->body_instructions, s_body, loop);
+   if (state->error) {
+      delete loop;
+      loop = NULL;
+   }
+   return loop;
+}
+
+
+ir_return *
+ir_reader::read_return(s_expression *expr)
+{
+   s_expression *s_retval;
+
+   s_pattern return_value_pat[] = { "return", s_retval};
+   s_pattern return_void_pat[] = { "return" };
+   if (MATCH(expr, return_value_pat)) {
+      ir_rvalue *retval = read_rvalue(s_retval);
+      if (retval == NULL) {
+         ir_read_error(NULL, "when reading return value");
+         return NULL;
+      }
+      return new(mem_ctx) ir_return(retval);
+   } else if (MATCH(expr, return_void_pat)) {
+      return new(mem_ctx) ir_return;
+   } else {
+      ir_read_error(expr, "expected (return <rvalue>) or (return)");
+      return NULL;
+   }
+}
+
+
+ir_rvalue *
+ir_reader::read_rvalue(s_expression *expr)
+{
+   s_list *list = SX_AS_LIST(expr);
+   if (list == NULL || list->subexpressions.is_empty())
+      return NULL;
+
+   s_symbol *tag = SX_AS_SYMBOL(list->subexpressions.get_head());
+   if (tag == NULL) {
+      ir_read_error(expr, "expected rvalue tag");
+      return NULL;
+   }
+
+   ir_rvalue *rvalue = read_dereference(list);
+   if (rvalue != NULL || state->error)
+      return rvalue;
+   else if (strcmp(tag->value(), "swiz") == 0) {
+      rvalue = read_swizzle(list);
+   } else if (strcmp(tag->value(), "expression") == 0) {
+      rvalue = read_expression(list);
+   } else if (strcmp(tag->value(), "constant") == 0) {
+      rvalue = read_constant(list);
+   } else {
+      rvalue = read_texture(list);
+      if (rvalue == NULL && !state->error)
+	 ir_read_error(expr, "unrecognized rvalue tag: %s", tag->value());
+   }
+
+   return rvalue;
+}
+
+ir_assignment *
+ir_reader::read_assignment(s_expression *expr)
+{
+   s_expression *cond_expr = NULL;
+   s_expression *lhs_expr, *rhs_expr;
+   s_list       *mask_list;
+
+   s_pattern pat4[] = { "assign",            mask_list, lhs_expr, rhs_expr };
+   s_pattern pat5[] = { "assign", cond_expr, mask_list, lhs_expr, rhs_expr };
+   if (!MATCH(expr, pat4) && !MATCH(expr, pat5)) {
+      ir_read_error(expr, "expected (assign [<condition>] (<write mask>) "
+			  "<lhs> <rhs>)");
+      return NULL;
+   }
+
+   ir_rvalue *condition = NULL;
+   if (cond_expr != NULL) {
+      condition = read_rvalue(cond_expr);
+      if (condition == NULL) {
+	 ir_read_error(NULL, "when reading condition of assignment");
+	 return NULL;
+      }
+   }
+
+   unsigned mask = 0;
+
+   s_symbol *mask_symbol;
+   s_pattern mask_pat[] = { mask_symbol };
+   if (MATCH(mask_list, mask_pat)) {
+      const char *mask_str = mask_symbol->value();
+      unsigned mask_length = strlen(mask_str);
+      if (mask_length > 4) {
+	 ir_read_error(expr, "invalid write mask: %s", mask_str);
+	 return NULL;
+      }
+
+      const unsigned idx_map[] = { 3, 0, 1, 2 }; /* w=bit 3, x=0, y=1, z=2 */
+
+      for (unsigned i = 0; i < mask_length; i++) {
+	 if (mask_str[i] < 'w' || mask_str[i] > 'z') {
+	    ir_read_error(expr, "write mask contains invalid character: %c",
+			  mask_str[i]);
+	    return NULL;
+	 }
+	 mask |= 1 << idx_map[mask_str[i] - 'w'];
+      }
+   } else if (!mask_list->subexpressions.is_empty()) {
+      ir_read_error(mask_list, "expected () or (<write mask>)");
+      return NULL;
+   }
+
+   ir_dereference *lhs = read_dereference(lhs_expr);
+   if (lhs == NULL) {
+      ir_read_error(NULL, "when reading left-hand side of assignment");
+      return NULL;
+   }
+
+   ir_rvalue *rhs = read_rvalue(rhs_expr);
+   if (rhs == NULL) {
+      ir_read_error(NULL, "when reading right-hand side of assignment");
+      return NULL;
+   }
+
+   if (mask == 0 && (lhs->type->is_vector() || lhs->type->is_scalar())) {
+      ir_read_error(expr, "non-zero write mask required.");
+      return NULL;
+   }
+
+   return new(mem_ctx) ir_assignment(lhs, rhs, condition, mask);
+}
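+
+/* Example (hand-written sketch): the S-expression
+ *
+ *    (assign (xw) (var_ref v) (constant vec2 (1.0 2.0)))
+ *
+ * yields an unconditional ir_assignment with write_mask 0b1001
+ * (bit 0 for x, bit 3 for w), writing v.x = 1.0 and v.w = 2.0.
+ */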
+
+ir_call *
+ir_reader::read_call(s_expression *expr)
+{
+   s_symbol *name;
+   s_list *params;
+   s_list *s_return = NULL;
+
+   ir_dereference_variable *return_deref = NULL;
+
+   s_pattern void_pat[] = { "call", name, params };
+   s_pattern non_void_pat[] = { "call", name, s_return, params };
+   if (MATCH(expr, non_void_pat)) {
+      return_deref = read_var_ref(s_return);
+      if (return_deref == NULL) {
+	 ir_read_error(s_return, "when reading a call's return storage");
+	 return NULL;
+      }
+   } else if (!MATCH(expr, void_pat)) {
+      ir_read_error(expr, "expected (call <name> [<deref>] (<param> ...))");
+      return NULL;
+   }
+
+   exec_list parameters;
+
+   foreach_list(n, &params->subexpressions) {
+      s_expression *expr = (s_expression *) n;
+      ir_rvalue *param = read_rvalue(expr);
+      if (param == NULL) {
+	 ir_read_error(expr, "when reading parameter to function call");
+	 return NULL;
+      }
+      parameters.push_tail(param);
+   }
+
+   ir_function *f = state->symbols->get_function(name->value());
+   if (f == NULL) {
+      ir_read_error(expr, "found call to undefined function %s",
+		    name->value());
+      return NULL;
+   }
+
+   ir_function_signature *callee = f->matching_signature(state, &parameters);
+   if (callee == NULL) {
+      ir_read_error(expr, "couldn't find matching signature for function "
+                    "%s", name->value());
+      return NULL;
+   }
+
+   if (callee->return_type == glsl_type::void_type && return_deref) {
+      ir_read_error(expr, "call has return value storage but void type");
+      return NULL;
+   } else if (callee->return_type != glsl_type::void_type && !return_deref) {
+      ir_read_error(expr, "call has non-void type but no return value storage");
+      return NULL;
+   }
+
+   return new(mem_ctx) ir_call(callee, return_deref, &parameters);
+}
+
+ir_expression *
+ir_reader::read_expression(s_expression *expr)
+{
+   s_expression *s_type;
+   s_symbol *s_op;
+   s_expression *s_arg[4] = {NULL};
+
+   s_pattern pat[] = { "expression", s_type, s_op, s_arg[0] };
+   if (!PARTIAL_MATCH(expr, pat)) {
+      ir_read_error(expr, "expected (expression <type> <operator> "
+			  "<operand> [<operand>] [<operand>] [<operand>])");
+      return NULL;
+   }
+   s_arg[1] = (s_expression *) s_arg[0]->next; // may be tail sentinel
+   s_arg[2] = (s_expression *) s_arg[1]->next; // may be tail sentinel or NULL
+   if (s_arg[2])
+      s_arg[3] = (s_expression *) s_arg[2]->next; // may be tail sentinel or NULL
+
+   const glsl_type *type = read_type(s_type);
+   if (type == NULL)
+      return NULL;
+
+   /* Read the operator */
+   ir_expression_operation op = ir_expression::get_operator(s_op->value());
+   if (op == (ir_expression_operation) -1) {
+      ir_read_error(expr, "invalid operator: %s", s_op->value());
+      return NULL;
+   }
+
+   int num_operands = -3; /* skip "expression" <type> <operation> */
+   foreach_list(n, &((s_list *) expr)->subexpressions)
+      ++num_operands;
+
+   int expected_operands = ir_expression::get_num_operands(op);
+   if (num_operands != expected_operands) {
+      ir_read_error(expr, "found %d expression operands, expected %d",
+                    num_operands, expected_operands);
+      return NULL;
+   }
+
+   ir_rvalue *arg[4] = {NULL};
+   for (int i = 0; i < num_operands; i++) {
+      arg[i] = read_rvalue(s_arg[i]);
+      if (arg[i] == NULL) {
+         ir_read_error(NULL, "when reading operand #%d of %s", i, s_op->value());
+         return NULL;
+      }
+   }
+
+   return new(mem_ctx) ir_expression(op, type, arg[0], arg[1], arg[2], arg[3]);
+}
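+
+/* Example (hand-written sketch):
+ *
+ *    (expression float + (var_ref a) (var_ref b))
+ *
+ * reads back as a two-operand ir_expression of type float; the operand
+ * count is validated against ir_expression::get_num_operands(op).
+ */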
+
+ir_swizzle *
+ir_reader::read_swizzle(s_expression *expr)
+{
+   s_symbol *swiz;
+   s_expression *sub;
+
+   s_pattern pat[] = { "swiz", swiz, sub };
+   if (!MATCH(expr, pat)) {
+      ir_read_error(expr, "expected (swiz <swizzle> <rvalue>)");
+      return NULL;
+   }
+
+   if (strlen(swiz->value()) > 4) {
+      ir_read_error(expr, "expected a valid swizzle; found %s", swiz->value());
+      return NULL;
+   }
+
+   ir_rvalue *rvalue = read_rvalue(sub);
+   if (rvalue == NULL)
+      return NULL;
+
+   ir_swizzle *ir = ir_swizzle::create(rvalue, swiz->value(),
+				       rvalue->type->vector_elements);
+   if (ir == NULL)
+      ir_read_error(expr, "invalid swizzle");
+
+   return ir;
+}
+
+ir_constant *
+ir_reader::read_constant(s_expression *expr)
+{
+   s_expression *type_expr;
+   s_list *values;
+
+   s_pattern pat[] = { "constant", type_expr, values };
+   if (!MATCH(expr, pat)) {
+      ir_read_error(expr, "expected (constant <type> (...))");
+      return NULL;
+   }
+
+   const glsl_type *type = read_type(type_expr);
+   if (type == NULL)
+      return NULL;
+
+   if (values == NULL) {
+      ir_read_error(expr, "expected (constant <type> (...))");
+      return NULL;
+   }
+
+   if (type->is_array()) {
+      unsigned elements_supplied = 0;
+      exec_list elements;
+      foreach_list(n, &values->subexpressions) {
+	 s_expression *elt = (s_expression *) n;
+	 ir_constant *ir_elt = read_constant(elt);
+	 if (ir_elt == NULL)
+	    return NULL;
+	 elements.push_tail(ir_elt);
+	 elements_supplied++;
+      }
+
+      if (elements_supplied != type->length) {
+	 ir_read_error(values, "expected exactly %u array elements, "
+		       "given %u", type->length, elements_supplied);
+	 return NULL;
+      }
+      return new(mem_ctx) ir_constant(type, &elements);
+   }
+
+   ir_constant_data data = { { 0 } };
+
+   // Read in list of values (at most 16).
+   unsigned k = 0;
+   foreach_list(n, &values->subexpressions) {
+      if (k >= 16) {
+	 ir_read_error(values, "expected at most 16 numbers");
+	 return NULL;
+      }
+
+      s_expression *expr = (s_expression *) n;
+
+      if (type->base_type == GLSL_TYPE_FLOAT) {
+	 s_number *value = SX_AS_NUMBER(expr);
+	 if (value == NULL) {
+	    ir_read_error(values, "expected numbers");
+	    return NULL;
+	 }
+	 data.f[k] = value->fvalue();
+      } else {
+	 s_int *value = SX_AS_INT(expr);
+	 if (value == NULL) {
+	    ir_read_error(values, "expected integers");
+	    return NULL;
+	 }
+
+	 switch (type->base_type) {
+	 case GLSL_TYPE_UINT: {
+	    data.u[k] = value->value();
+	    break;
+	 }
+	 case GLSL_TYPE_INT: {
+	    data.i[k] = value->value();
+	    break;
+	 }
+	 case GLSL_TYPE_BOOL: {
+	    data.b[k] = value->value();
+	    break;
+	 }
+	 default:
+	    ir_read_error(values, "unsupported constant type");
+	    return NULL;
+	 }
+      }
+      ++k;
+   }
+   if (k != type->components()) {
+      ir_read_error(values, "expected %u constant values, found %u",
+		    type->components(), k);
+      return NULL;
+   }
+
+   return new(mem_ctx) ir_constant(type, &data);
+}
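+
+/* Examples (hand-written sketches):
+ *
+ *    (constant float (1.0))
+ *    (constant vec4 (0.0 0.0 0.0 1.0))
+ *    (constant (array float 2) ((constant float (0.5))
+ *                               (constant float (1.5))))
+ */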
+
+ir_dereference_variable *
+ir_reader::read_var_ref(s_expression *expr)
+{
+   s_symbol *s_var;
+   s_pattern var_pat[] = { "var_ref", s_var };
+
+   if (MATCH(expr, var_pat)) {
+      ir_variable *var = state->symbols->get_variable(s_var->value());
+      if (var == NULL) {
+	 ir_read_error(expr, "undeclared variable: %s", s_var->value());
+	 return NULL;
+      }
+      return new(mem_ctx) ir_dereference_variable(var);
+   }
+   return NULL;
+}
+
+ir_dereference *
+ir_reader::read_dereference(s_expression *expr)
+{
+   s_expression *s_subject;
+   s_expression *s_index;
+   s_symbol *s_field;
+
+   s_pattern array_pat[] = { "array_ref", s_subject, s_index };
+   s_pattern record_pat[] = { "record_ref", s_subject, s_field };
+
+   ir_dereference_variable *var_ref = read_var_ref(expr);
+   if (var_ref != NULL) {
+      return var_ref;
+   } else if (MATCH(expr, array_pat)) {
+      ir_rvalue *subject = read_rvalue(s_subject);
+      if (subject == NULL) {
+	 ir_read_error(NULL, "when reading the subject of an array_ref");
+	 return NULL;
+      }
+
+      ir_rvalue *idx = read_rvalue(s_index);
+      if (idx == NULL) {
+	 ir_read_error(NULL, "when reading the index of an array_ref");
+	 return NULL;
+      }
+      return new(mem_ctx) ir_dereference_array(subject, idx);
+   } else if (MATCH(expr, record_pat)) {
+      ir_rvalue *subject = read_rvalue(s_subject);
+      if (subject == NULL) {
+	 ir_read_error(NULL, "when reading the subject of a record_ref");
+	 return NULL;
+      }
+      return new(mem_ctx) ir_dereference_record(subject, s_field->value());
+   }
+   return NULL;
+}
+
+ir_texture *
+ir_reader::read_texture(s_expression *expr)
+{
+   s_symbol *tag = NULL;
+   s_expression *s_type = NULL;
+   s_expression *s_sampler = NULL;
+   s_expression *s_coord = NULL;
+   s_expression *s_offset = NULL;
+   s_expression *s_proj = NULL;
+   s_list *s_shadow = NULL;
+   s_expression *s_lod = NULL;
+   s_expression *s_sample_index = NULL;
+   s_expression *s_component = NULL;
+
+   ir_texture_opcode op = ir_tex; /* silence warning */
+
+   s_pattern tex_pattern[] =
+      { "tex", s_type, s_sampler, s_coord, s_offset, s_proj, s_shadow };
+   s_pattern lod_pattern[] =
+      { "lod", s_type, s_sampler, s_coord };
+   s_pattern txf_pattern[] =
+      { "txf", s_type, s_sampler, s_coord, s_offset, s_lod };
+   s_pattern txf_ms_pattern[] =
+      { "txf_ms", s_type, s_sampler, s_coord, s_sample_index };
+   s_pattern txs_pattern[] =
+      { "txs", s_type, s_sampler, s_lod };
+   s_pattern tg4_pattern[] =
+      { "tg4", s_type, s_sampler, s_coord, s_offset, s_component };
+   s_pattern query_levels_pattern[] =
+      { "query_levels", s_type, s_sampler };
+   s_pattern other_pattern[] =
+      { tag, s_type, s_sampler, s_coord, s_offset, s_proj, s_shadow, s_lod };
+
+   if (MATCH(expr, lod_pattern)) {
+      op = ir_lod;
+   } else if (MATCH(expr, tex_pattern)) {
+      op = ir_tex;
+   } else if (MATCH(expr, txf_pattern)) {
+      op = ir_txf;
+   } else if (MATCH(expr, txf_ms_pattern)) {
+      op = ir_txf_ms;
+   } else if (MATCH(expr, txs_pattern)) {
+      op = ir_txs;
+   } else if (MATCH(expr, tg4_pattern)) {
+      op = ir_tg4;
+   } else if (MATCH(expr, query_levels_pattern)) {
+      op = ir_query_levels;
+   } else if (MATCH(expr, other_pattern)) {
+      op = ir_texture::get_opcode(tag->value());
+      if (op == -1)
+	 return NULL;
+   } else {
+      ir_read_error(NULL, "unexpected texture pattern %s", tag->value());
+      return NULL;
+   }
+
+   ir_texture *tex = new(mem_ctx) ir_texture(op);
+
+   // Read return type
+   const glsl_type *type = read_type(s_type);
+   if (type == NULL) {
+      ir_read_error(NULL, "when reading type in (%s ...)",
+		    tex->opcode_string());
+      return NULL;
+   }
+
+   // Read sampler (must be a deref)
+   ir_dereference *sampler = read_dereference(s_sampler);
+   if (sampler == NULL) {
+      ir_read_error(NULL, "when reading sampler in (%s ...)",
+		    tex->opcode_string());
+      return NULL;
+   }
+   tex->set_sampler(sampler, type);
+
+   if (op != ir_txs) {
+      // Read coordinate (any rvalue)
+      tex->coordinate = read_rvalue(s_coord);
+      if (tex->coordinate == NULL) {
+	 ir_read_error(NULL, "when reading coordinate in (%s ...)",
+		       tex->opcode_string());
+	 return NULL;
+      }
+
+      if (op != ir_txf_ms && op != ir_lod) {
+         // Read texel offset - either 0 or an rvalue.
+         s_int *si_offset = SX_AS_INT(s_offset);
+         if (si_offset == NULL || si_offset->value() != 0) {
+            tex->offset = read_rvalue(s_offset);
+            if (tex->offset == NULL) {
+               ir_read_error(s_offset, "expected 0 or an expression");
+               return NULL;
+            }
+         }
+      }
+   }
+
+   if (op != ir_txf && op != ir_txf_ms &&
+       op != ir_txs && op != ir_lod && op != ir_tg4 &&
+       op != ir_query_levels) {
+      s_int *proj_as_int = SX_AS_INT(s_proj);
+      if (proj_as_int && proj_as_int->value() == 1) {
+	 tex->projector = NULL;
+      } else {
+	 tex->projector = read_rvalue(s_proj);
+	 if (tex->projector == NULL) {
+	    ir_read_error(NULL, "when reading projective divide in (%s ..)",
+	                  tex->opcode_string());
+	    return NULL;
+	 }
+      }
+
+      if (s_shadow->subexpressions.is_empty()) {
+	 tex->shadow_comparitor = NULL;
+      } else {
+	 tex->shadow_comparitor = read_rvalue(s_shadow);
+	 if (tex->shadow_comparitor == NULL) {
+	    ir_read_error(NULL, "when reading shadow comparitor in (%s ..)",
+			  tex->opcode_string());
+	    return NULL;
+	 }
+      }
+   }
+
+   switch (op) {
+   case ir_txb:
+      tex->lod_info.bias = read_rvalue(s_lod);
+      if (tex->lod_info.bias == NULL) {
+	 ir_read_error(NULL, "when reading LOD bias in (txb ...)");
+	 return NULL;
+      }
+      break;
+   case ir_txl:
+   case ir_txf:
+   case ir_txs:
+      tex->lod_info.lod = read_rvalue(s_lod);
+      if (tex->lod_info.lod == NULL) {
+	 ir_read_error(NULL, "when reading LOD in (%s ...)",
+		       tex->opcode_string());
+	 return NULL;
+      }
+      break;
+   case ir_txf_ms:
+      tex->lod_info.sample_index = read_rvalue(s_sample_index);
+      if (tex->lod_info.sample_index == NULL) {
+         ir_read_error(NULL, "when reading sample_index in (txf_ms ...)");
+         return NULL;
+      }
+      break;
+   case ir_txd: {
+      s_expression *s_dx, *s_dy;
+      s_pattern dxdy_pat[] = { s_dx, s_dy };
+      if (!MATCH(s_lod, dxdy_pat)) {
+	 ir_read_error(s_lod, "expected (dPdx dPdy) in (txd ...)");
+	 return NULL;
+      }
+      tex->lod_info.grad.dPdx = read_rvalue(s_dx);
+      if (tex->lod_info.grad.dPdx == NULL) {
+	 ir_read_error(NULL, "when reading dPdx in (txd ...)");
+	 return NULL;
+      }
+      tex->lod_info.grad.dPdy = read_rvalue(s_dy);
+      if (tex->lod_info.grad.dPdy == NULL) {
+	 ir_read_error(NULL, "when reading dPdy in (txd ...)");
+	 return NULL;
+      }
+      break;
+   }
+   case ir_tg4:
+      tex->lod_info.component = read_rvalue(s_component);
+      if (tex->lod_info.component == NULL) {
+         ir_read_error(NULL, "when reading component in (tg4 ...)");
+         return NULL;
+      }
+      break;
+   default:
+      // tex and lod don't have any extra parameters.
+      break;
+   }
+   return tex;
+}
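+
+/* Example (hand-written sketch): a plain 2D texture fetch reads as
+ *
+ *    (tex vec4 (var_ref sampler) (var_ref coord) 0 1 ())
+ *
+ * i.e. no texel offset (0), no projective divide (1), and an empty
+ * shadow comparitor (()).
+ */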
+
+ir_emit_vertex *
+ir_reader::read_emit_vertex(s_expression *expr)
+{
+   s_pattern pat[] = { "emit-vertex" };
+
+   if (MATCH(expr, pat)) {
+      return new(mem_ctx) ir_emit_vertex();
+   }
+   ir_read_error(NULL, "when reading emit-vertex");
+   return NULL;
+}
+
+ir_end_primitive *
+ir_reader::read_end_primitive(s_expression *expr)
+{
+   s_pattern pat[] = { "end-primitive" };
+
+   if (MATCH(expr, pat)) {
+      return new(mem_ctx) ir_end_primitive();
+   }
+   ir_read_error(NULL, "when reading end-primitive");
+   return NULL;
+}
diff --git a/icd/intel/compiler/shader/ir_reader.h b/icd/intel/compiler/shader/ir_reader.h
new file mode 100644
index 0000000..aef2ca2
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_reader.h
@@ -0,0 +1,34 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef IR_READER_H
+#define IR_READER_H
+
+#include "ir.h"
+
+void _mesa_glsl_read_ir(_mesa_glsl_parse_state *state, exec_list *instructions,
+			const char *src, bool scan_for_prototypes);
+
+#endif /* IR_READER_H */
diff --git a/icd/intel/compiler/shader/ir_rvalue_visitor.cpp b/icd/intel/compiler/shader/ir_rvalue_visitor.cpp
new file mode 100644
index 0000000..fcbe944
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_rvalue_visitor.cpp
@@ -0,0 +1,259 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_rvalue_visitor.cpp
+ *
+ * Generic class to implement the common pattern we have of wanting to
+ * visit each ir_rvalue * and possibly change that node to a different
+ * class.
+ */
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_rvalue_visitor.h"
+#include "glsl_types.h"
+
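+/* A minimal usage sketch (illustrative, not part of the original file):
+ * passes that want to inspect or replace every rvalue derive from
+ * ir_rvalue_visitor and override handle_rvalue(), which receives the
+ * address of each rvalue slot (possibly pointing at NULL):
+ *
+ *    class my_rvalue_pass : public ir_rvalue_visitor {
+ *       virtual void handle_rvalue(ir_rvalue **rvalue)
+ *       {
+ *          if (*rvalue == NULL)
+ *             return;
+ *          // examine *rvalue, or store a replacement node through it
+ *       }
+ *    };
+ *
+ * The enter/leave visitor variants below differ only in whether rvalues
+ * are handled before or after each node's children are visited.
+ */
+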
+ir_visitor_status
+ir_rvalue_base_visitor::rvalue_visit(ir_expression *ir)
+{
+   unsigned int operand;
+
+   for (operand = 0; operand < ir->get_num_operands(); operand++) {
+      handle_rvalue(&ir->operands[operand]);
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_rvalue_base_visitor::rvalue_visit(ir_texture *ir)
+{
+   handle_rvalue(&ir->coordinate);
+   handle_rvalue(&ir->projector);
+   handle_rvalue(&ir->shadow_comparitor);
+   handle_rvalue(&ir->offset);
+
+   switch (ir->op) {
+   case ir_tex:
+   case ir_lod:
+   case ir_query_levels:
+      break;
+   case ir_txb:
+      handle_rvalue(&ir->lod_info.bias);
+      break;
+   case ir_txf:
+   case ir_txl:
+   case ir_txs:
+      handle_rvalue(&ir->lod_info.lod);
+      break;
+   case ir_txf_ms:
+      handle_rvalue(&ir->lod_info.sample_index);
+      break;
+   case ir_txd:
+      handle_rvalue(&ir->lod_info.grad.dPdx);
+      handle_rvalue(&ir->lod_info.grad.dPdy);
+      break;
+   case ir_tg4:
+      handle_rvalue(&ir->lod_info.component);
+      break;
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_rvalue_base_visitor::rvalue_visit(ir_swizzle *ir)
+{
+   handle_rvalue(&ir->val);
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_rvalue_base_visitor::rvalue_visit(ir_dereference_array *ir)
+{
+   /* The array index is not the target of the assignment, so clear the
+    * 'in_assignee' flag.  Restore it after returning from the array index.
+    */
+   const bool was_in_assignee = this->in_assignee;
+   this->in_assignee = false;
+   handle_rvalue(&ir->array_index);
+   this->in_assignee = was_in_assignee;
+
+   handle_rvalue(&ir->array);
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_rvalue_base_visitor::rvalue_visit(ir_dereference_record *ir)
+{
+   handle_rvalue(&ir->record);
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_rvalue_base_visitor::rvalue_visit(ir_assignment *ir)
+{
+   handle_rvalue(&ir->rhs);
+   handle_rvalue(&ir->condition);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_rvalue_base_visitor::rvalue_visit(ir_call *ir)
+{
+   foreach_list_safe(n, &ir->actual_parameters) {
+      ir_rvalue *param = (ir_rvalue *) n;
+      ir_rvalue *new_param = param;
+      handle_rvalue(&new_param);
+
+      if (new_param != param) {
+	 param->replace_with(new_param);
+      }
+   }
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_rvalue_base_visitor::rvalue_visit(ir_return *ir)
+{
+   handle_rvalue(&ir->value);
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_rvalue_base_visitor::rvalue_visit(ir_if *ir)
+{
+   handle_rvalue(&ir->condition);
+   return visit_continue;
+}
+
+
+ir_visitor_status
+ir_rvalue_visitor::visit_leave(ir_expression *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_visitor::visit_leave(ir_texture *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_visitor::visit_leave(ir_swizzle *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_visitor::visit_leave(ir_dereference_array *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_visitor::visit_leave(ir_dereference_record *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_visitor::visit_leave(ir_assignment *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_visitor::visit_leave(ir_call *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_visitor::visit_leave(ir_return *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_visitor::visit_leave(ir_if *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_enter_visitor::visit_enter(ir_expression *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_enter_visitor::visit_enter(ir_texture *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_enter_visitor::visit_enter(ir_swizzle *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_enter_visitor::visit_enter(ir_dereference_array *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_enter_visitor::visit_enter(ir_dereference_record *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_enter_visitor::visit_enter(ir_assignment *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_enter_visitor::visit_enter(ir_call *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_enter_visitor::visit_enter(ir_return *ir)
+{
+   return rvalue_visit(ir);
+}
+
+ir_visitor_status
+ir_rvalue_enter_visitor::visit_enter(ir_if *ir)
+{
+   return rvalue_visit(ir);
+}
diff --git a/icd/intel/compiler/shader/ir_serialize.cpp b/icd/intel/compiler/shader/ir_serialize.cpp
new file mode 100644
index 0000000..cc3f9bd
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_serialize.cpp
@@ -0,0 +1,343 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ir_serialize.h"
+
+
+/**
+ * Wraps serialization of an ir instruction: writes the ir_type and the
+ * length of the instruction's serialized data as a header for it.
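+ *
+ * The resulting layout for each instruction package is:
+ *
+ *    [ir_type : 1 byte][data_len : 4 bytes][data : data_len bytes]
+ *
+ * data_len is written as a zero placeholder first and backpatched via
+ * overwrite() once the data has been emitted.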
+ */
+void
+ir_instruction::serialize(memory_writer &mem)
+{
+   uint32_t data_len = 0;
+   uint8_t ir_type = this->ir_type;
+   mem.write_uint8_t(ir_type);
+
+   unsigned start_pos = mem.position();
+   mem.write_uint32_t(data_len);
+
+   this->serialize_data(mem);
+
+   data_len = mem.position() - start_pos - sizeof(data_len);
+   mem.overwrite(&data_len, sizeof(data_len), start_pos);
+}
+
+
+/**
+ * Wraps rvalue serialization, rvalue has its type or
+ * ir_type_unset written before it to indicate if value is NULL
+ */
+static void
+serialize_rvalue(ir_rvalue *val, memory_writer &mem)
+{
+   uint8_t ir_type = val ? val->ir_type : ir_type_unset;
+   mem.write_uint8_t(ir_type);
+
+   if (val)
+      val->serialize(mem);
+}
+
+
+/**
+ * Serializes an exec_list: writes the length of the list, then calls
+ * serialize() on each instruction in it.
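+ *
+ * Layout: [list_len : 4 bytes][list_len instruction packages]; the length
+ * is backpatched the same way as the per-instruction data length above.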
+ */
+void
+serialize_list(exec_list *list, memory_writer &mem)
+{
+   uint32_t list_len = 0;
+
+   unsigned start_pos = mem.position();
+   mem.write_uint32_t(list_len);
+
+   foreach_list(node, list) {
+      ((ir_instruction *)node)->serialize(mem);
+      list_len++;
+   }
+
+   mem.overwrite(&list_len, sizeof(list_len), start_pos);
+}
+
+
+void
+ir_variable::serialize_data(memory_writer &mem)
+{
+   /* name can be NULL, see ir_print_visitor for explanation */
+   const char *non_null_name = name ? name : "parameter";
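+   /* the variable's address doubles as its unique id; ir_dereference_variable
+    * records the same value so dereferences can be re-linked on load */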
+   int64_t unique_id = (int64_t) (intptr_t) this;
+   uint8_t mode = data.mode;
+   uint8_t has_constant_value = constant_value ? 1 : 0;
+   uint8_t has_constant_initializer = constant_initializer ? 1 : 0;
+
+   type->serialize(mem);
+
+   mem.write_string(non_null_name);
+   mem.write_int64_t(unique_id);
+   mem.write_uint8_t(mode);
+
+   mem.write(&data, sizeof(data));
+
+   mem.write_uint32_t(num_state_slots);
+   mem.write_uint8_t(has_constant_value);
+   mem.write_uint8_t(has_constant_initializer);
+
+   for (unsigned i = 0; i < num_state_slots; i++) {
+      mem.write_int32_t(state_slots[i].swizzle);
+      for (unsigned j = 0; j < 5; j++) {
+         mem.write_int32_t(state_slots[i].tokens[j]);
+      }
+   }
+
+   if (constant_value)
+      constant_value->serialize(mem);
+
+   if (constant_initializer)
+      constant_initializer->serialize(mem);
+
+   uint8_t has_interface_type = get_interface_type() ? 1 : 0;
+
+   mem.write_uint8_t(has_interface_type);
+   if (has_interface_type)
+      get_interface_type()->serialize(mem);
+}
+
+
+void
+ir_assignment::serialize_data(memory_writer &mem)
+{
+   uint8_t assignment_mask = write_mask;
+   mem.write_uint8_t(assignment_mask);
+
+   serialize_rvalue(lhs, mem);
+   serialize_rvalue(condition, mem);
+   serialize_rvalue(rhs, mem);
+}
+
+
+void
+ir_call::serialize_data(memory_writer &mem)
+{
+   mem.write_string(callee_name());
+
+   uint8_t list_len = 0;
+   uint8_t uses_builtin = use_builtin;
+
+   serialize_rvalue(return_deref, mem);
+
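+   /* the parameter count is backpatched like serialize_list() does, but it
+    * is stored in a single byte */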
+   unsigned start_pos = mem.position();
+   mem.write_uint8_t(list_len);
+
+   foreach_list(node, &this->actual_parameters) {
+      serialize_rvalue((ir_rvalue*)node, mem);
+      list_len++;
+   }
+
+   mem.overwrite(&list_len, sizeof(list_len), start_pos);
+   mem.write_uint8_t(uses_builtin);
+}
+
+
+void
+ir_constant::serialize_data(memory_writer &mem)
+{
+   type->serialize(mem);
+
+   mem.write(&value, sizeof(ir_constant_data));
+
+   if (array_elements) {
+      for (unsigned i = 0; i < type->length; i++)
+         array_elements[i]->serialize(mem);
+   }
+
+   /* struct constant, dump components exec_list */
+   if (!components.is_empty())
+      serialize_list(&components, mem);
+}
+
+
+void
+ir_dereference_array::serialize_data(memory_writer &mem)
+{
+   serialize_rvalue(array, mem);
+   serialize_rvalue(array_index, mem);
+}
+
+
+void
+ir_dereference_record::serialize_data(memory_writer &mem)
+{
+   mem.write_string(field);
+   serialize_rvalue(record, mem);
+}
+
+
+/**
+ * address of the variable is used as unique identifier for it
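+ * (a matching reader can re-associate this id with the ir_variable that
+ * recorded the same value in its own serialize_data)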
+ */
+void
+ir_dereference_variable::serialize_data(memory_writer &mem)
+{
+   mem.write_int64_t((int64_t) (intptr_t) var);
+}
+
+
+void
+ir_discard::serialize_data(memory_writer &mem)
+{
+   serialize_rvalue(condition, mem);
+}
+
+
+void
+ir_expression::serialize_data(memory_writer &mem)
+{
+   uint32_t num_operands = get_num_operands();
+   uint32_t exp_operation = operation;
+
+   type->serialize(mem);
+
+   mem.write_uint32_t(exp_operation);
+   mem.write_uint32_t(num_operands);
+
+   for (unsigned i = 0; i < num_operands; i++)
+      serialize_rvalue(operands[i], mem);
+}
+
+
+void
+ir_function::serialize_data(memory_writer &mem)
+{
+   mem.write_string(name);
+   serialize_list(&signatures, mem);
+}
+
+
+void
+ir_function_signature::serialize_data(memory_writer &mem)
+{
+   uint8_t builtin_func = is_builtin();
+   mem.write_uint8_t(builtin_func);
+
+   uint8_t is_defined = this->is_defined;
+   mem.write_uint8_t(is_defined);
+
+   /* dump the return type of function */
+   return_type->serialize(mem);
+
+   /* function parameters */
+   serialize_list(&parameters, mem);
+
+   /* function body */
+   serialize_list(&body, mem);
+}
+
+
+void
+ir_if::serialize_data(memory_writer &mem)
+{
+   serialize_rvalue(condition, mem);
+   serialize_list(&then_instructions, mem);
+   serialize_list(&else_instructions, mem);
+}
+
+
+void
+ir_loop::serialize_data(memory_writer &mem)
+{
+   serialize_list(&body_instructions, mem);
+}
+
+
+void
+ir_loop_jump::serialize_data(memory_writer &mem)
+{
+   uint8_t jump_mode = mode;
+   mem.write_uint8_t(jump_mode);
+}
+
+
+void
+ir_return::serialize_data(memory_writer &mem)
+{
+   serialize_rvalue(value, mem);
+}
+
+
+void
+ir_swizzle::serialize_data(memory_writer &mem)
+{
+   mem.write(&mask, sizeof(ir_swizzle_mask));
+   serialize_rvalue(val, mem);
+}
+
+
+void
+ir_texture::serialize_data(memory_writer &mem)
+{
+   mem.write_int32_t((int32_t)op);
+
+   type->serialize(mem);
+
+   /* sampler */
+   mem.write_uint8_t((uint8_t)sampler->ir_type);
+
+   sampler->serialize(mem);
+
+   serialize_rvalue(coordinate, mem);
+   serialize_rvalue(projector, mem);
+   serialize_rvalue(shadow_comparitor, mem);
+   serialize_rvalue(offset, mem);
+
+   /* lod_info structure */
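+   /* all six fields are written unconditionally; for any given op most are
+    * NULL, which serialize_rvalue() encodes as a single ir_type_unset byte */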
+   serialize_rvalue(lod_info.lod, mem);
+   serialize_rvalue(lod_info.bias, mem);
+   serialize_rvalue(lod_info.sample_index, mem);
+   serialize_rvalue(lod_info.component, mem);
+   serialize_rvalue(lod_info.grad.dPdx, mem);
+   serialize_rvalue(lod_info.grad.dPdy, mem);
+}
+
+
+void
+ir_emit_vertex::serialize_data(memory_writer &mem)
+{
+   /* no data */
+}
+
+
+void
+ir_end_primitive::serialize_data(memory_writer &mem)
+{
+   /* no data */
+}
+
+
+void
+ir_rvalue::serialize_data(memory_writer &mem)
+{
+   assert(0 && "unreachable");
+}
diff --git a/icd/intel/compiler/shader/ir_serialize.h b/icd/intel/compiler/shader/ir_serialize.h
new file mode 100644
index 0000000..d6c4741
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_serialize.h
@@ -0,0 +1,36 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef IR_SERIALIZE_H
+#define IR_SERIALIZE_H
+
+#include "ir.h"
+#include "memory_writer.h"
+#include "main/hash_table.h"
+
+void
+serialize_list(exec_list *list, memory_writer &mem);
+
+#endif /* IR_SERIALIZE_H */
diff --git a/icd/intel/compiler/shader/ir_set_program_inouts.cpp b/icd/intel/compiler/shader/ir_set_program_inouts.cpp
new file mode 100644
index 0000000..e2ec9b6
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_set_program_inouts.cpp
@@ -0,0 +1,354 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_set_program_inouts.cpp
+ *
+ * Sets the InputsRead and OutputsWritten of Mesa programs.
+ *
+ * Additionally, for fragment shaders, sets the InterpQualifier array, the
+ * IsCentroid and IsSample bitfields, and the UsesDFdy flag.
+ *
+ * Mesa programs (gl_program, not gl_shader_program) have a set of
+ * flags indicating which varyings are read and written.  Computing
+ * which are actually read from some sort of backend code can be
+ * tricky when variable array indexing involved.  So this pass
+ * provides support for setting InputsRead and OutputsWritten right
+ * from the GLSL IR.
+ */
+
+#include "libfns.h" // LunarG ADD:
+#include "ir.h"
+#include "ir_visitor.h"
+#include "glsl_types.h"
+
+namespace {
+
+class ir_set_program_inouts_visitor : public ir_hierarchical_visitor {
+public:
+   ir_set_program_inouts_visitor(struct gl_program *prog,
+                                 gl_shader_stage shader_stage)
+   {
+      this->prog = prog;
+      this->shader_stage = shader_stage;
+   }
+   ~ir_set_program_inouts_visitor()
+   {
+   }
+
+   virtual ir_visitor_status visit_enter(ir_dereference_array *);
+   virtual ir_visitor_status visit_enter(ir_function_signature *);
+   virtual ir_visitor_status visit_enter(ir_expression *);
+   virtual ir_visitor_status visit_enter(ir_discard *);
+   virtual ir_visitor_status visit_enter(ir_texture *);
+   virtual ir_visitor_status visit(ir_dereference_variable *);
+
+private:
+   void mark_whole_variable(ir_variable *var);
+   bool try_mark_partial_variable(ir_variable *var, ir_rvalue *index);
+
+   struct gl_program *prog;
+   gl_shader_stage shader_stage;
+};
+
+} /* anonymous namespace */
+
+static inline bool
+is_shader_inout(ir_variable *var)
+{
+   return var->data.mode == ir_var_shader_in ||
+          var->data.mode == ir_var_shader_out ||
+          var->data.mode == ir_var_system_value;
+}
+
+static void
+mark(struct gl_program *prog, ir_variable *var, int offset, int len,
+     bool is_fragment_shader)
+{
+   /* As of GLSL 1.20, varyings can only be floats, floating-point
+    * vectors or matrices, or arrays of them.  For Mesa programs using
+    * InputsRead/OutputsWritten, everything but matrices uses one
+    * slot, while matrices use a slot per column.  Presumably
+    * something doing a more clever packing would use something other
+    * than InputsRead/OutputsWritten.
+    */
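+
+   /* For example: marking a whole mat4 input means the caller passes
+    * len = 4 (one slot per column), setting four consecutive bits of
+    * InputsRead; a float or vec4 sets a single bit. */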
+
+   for (int i = 0; i < len; i++) {
+      GLbitfield64 bitfield =
+         BITFIELD64_BIT(var->data.location + var->data.index + offset + i);
+      if (var->data.mode == ir_var_shader_in) {
+	 prog->InputsRead |= bitfield;
+         if (is_fragment_shader) {
+            gl_fragment_program *fprog = (gl_fragment_program *) prog;
+            fprog->InterpQualifier[var->data.location +
+                                   var->data.index + offset + i] =
+               (glsl_interp_qualifier) var->data.interpolation;
+            if (var->data.centroid)
+               fprog->IsCentroid |= bitfield;
+            if (var->data.sample)
+               fprog->IsSample |= bitfield;
+         }
+      } else if (var->data.mode == ir_var_system_value) {
+         prog->SystemValuesRead |= bitfield;
+      } else {
+         assert(var->data.mode == ir_var_shader_out);
+	 prog->OutputsWritten |= bitfield;
+      }
+   }
+}
+
+/**
+ * Mark an entire variable as used.  Caller must ensure that the variable
+ * represents a shader input or output.
+ */
+void
+ir_set_program_inouts_visitor::mark_whole_variable(ir_variable *var)
+{
+   const glsl_type *type = var->type;
+   if (this->shader_stage == MESA_SHADER_GEOMETRY &&
+       var->data.mode == ir_var_shader_in && type->is_array()) {
+      type = type->fields.array;
+   }
+
+   mark(this->prog, var, 0, type->count_attribute_slots(),
+        this->shader_stage == MESA_SHADER_FRAGMENT);
+}
+
+/* Default handler: Mark all the locations in the variable as used. */
+ir_visitor_status
+ir_set_program_inouts_visitor::visit(ir_dereference_variable *ir)
+{
+   if (!is_shader_inout(ir->var))
+      return visit_continue;
+
+   mark_whole_variable(ir->var);
+
+   return visit_continue;
+}
+
+/**
+ * Try to mark a portion of the given variable as used.  Caller must ensure
+ * that the variable represents a shader input or output which can be indexed
+ * into in array fashion (an array or matrix).  For the purpose of geometry
+ * shader inputs (which are always arrays*), this means that the array element
+ * must be something that can be indexed into in array fashion.
+ *
+ * *Except gl_PrimitiveIDIn, as noted below.
+ *
+ * If the index can't be interpreted as a constant, or some other problem
+ * occurs, then nothing will be marked and false will be returned.
+ */
+bool
+ir_set_program_inouts_visitor::try_mark_partial_variable(ir_variable *var,
+                                                         ir_rvalue *index)
+{
+   const glsl_type *type = var->type;
+
+   if (this->shader_stage == MESA_SHADER_GEOMETRY &&
+       var->data.mode == ir_var_shader_in) {
+      /* The only geometry shader input that is not an array is
+       * gl_PrimitiveIDIn, and in that case, this code will never be reached,
+       * because gl_PrimitiveIDIn can't be indexed into in array fashion.
+       */
+      assert(type->is_array());
+      type = type->fields.array;
+   }
+
+   /* The code below only handles:
+    *
+    * - Indexing into matrices
+    * - Indexing into arrays of (matrices, vectors, or scalars)
+    *
+    * All other possibilities are either prohibited by GLSL (vertex inputs and
+    * fragment outputs can't be structs) or should have been eliminated by
+    * lowering passes (do_vec_index_to_swizzle() gets rid of indexing into
+    * vectors, and lower_packed_varyings() gets rid of structs that occur in
+    * varyings).
+    */
+   if (!(type->is_matrix() ||
+        (type->is_array() &&
+         (type->fields.array->is_numeric() ||
+          type->fields.array->is_boolean())))) {
+      assert(!"Unexpected indexing in ir_set_program_inouts");
+
+      /* For safety in release builds, in case we ever encounter unexpected
+       * indexing, give up and let the caller mark the whole variable as used.
+       */
+      return false;
+   }
+
+   ir_constant *index_as_constant = index->as_constant();
+   if (!index_as_constant)
+      return false;
+
+   unsigned elem_width;
+   unsigned num_elems;
+   if (type->is_array()) {
+      num_elems = type->length;
+      if (type->fields.array->is_matrix())
+         elem_width = type->fields.array->matrix_columns;
+      else
+         elem_width = 1;
+   } else {
+      num_elems = type->matrix_columns;
+      elem_width = 1;
+   }
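+
+   /* e.g. for "mat3 m[2]": num_elems = 2 and elem_width = 3, so a constant
+    * index of 1 marks the three slots at offset 3 (the second matrix) */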
+
+   if (index_as_constant->value.u[0] >= num_elems) {
+      /* Constant index outside the bounds of the matrix/array.  This could
+       * arise as a result of constant folding of a legal GLSL program.
+       *
+       * Even though the spec says that indexing outside the bounds of a
+       * matrix/array results in undefined behaviour, we don't want to pass
+       * out-of-range values to mark() (since this could result in slots that
+       * don't exist being marked as used), so just let the caller mark the
+       * whole variable as used.
+       */
+      return false;
+   }
+
+   mark(this->prog, var, index_as_constant->value.u[0] * elem_width,
+        elem_width, this->shader_stage == MESA_SHADER_FRAGMENT);
+   return true;
+}
+
+ir_visitor_status
+ir_set_program_inouts_visitor::visit_enter(ir_dereference_array *ir)
+{
+   /* Note: for geometry shader inputs, lower_named_interface_blocks may
+    * create 2D arrays, so we need to be able to handle those.  2D arrays
+    * shouldn't be able to crop up for any other reason.
+    */
+   if (ir_dereference_array * const inner_array =
+       ir->array->as_dereference_array()) {
+      /*          ir => foo[i][j]
+       * inner_array => foo[i]
+       */
+      if (ir_dereference_variable * const deref_var =
+          inner_array->array->as_dereference_variable()) {
+         if (this->shader_stage == MESA_SHADER_GEOMETRY &&
+             deref_var->var->data.mode == ir_var_shader_in) {
+            /* foo is a geometry shader input, so i is the vertex, and j the
+             * part of the input we're accessing.
+             */
+            if (try_mark_partial_variable(deref_var->var, ir->array_index))
+            {
+               /* We've now taken care of foo and j, but i might contain a
+                * subexpression that accesses shader inputs.  So manually
+                * visit i and then continue with the parent.
+                */
+               inner_array->array_index->accept(this);
+               return visit_continue_with_parent;
+            }
+         }
+      }
+   } else if (ir_dereference_variable * const deref_var =
+              ir->array->as_dereference_variable()) {
+      /* ir => foo[i], where foo is a variable. */
+      if (this->shader_stage == MESA_SHADER_GEOMETRY &&
+          deref_var->var->data.mode == ir_var_shader_in) {
+         /* foo is a geometry shader input, so i is the vertex, and we're
+          * accessing the entire input.
+          */
+         mark_whole_variable(deref_var->var);
+         /* We've now taken care of foo, but i might contain a subexpression
+          * that accesses shader inputs.  So manually visit i and then
+          * continue with the parent.
+          */
+         ir->array_index->accept(this);
+         return visit_continue_with_parent;
+      } else if (is_shader_inout(deref_var->var)) {
+         /* foo is a shader input/output, but not a geometry shader input,
+          * so i is the part of the input we're accessing.
+          */
+         if (try_mark_partial_variable(deref_var->var, ir->array_index))
+            return visit_continue_with_parent;
+      }
+   }
+
+   /* The expression is something we don't recognize.  Just visit its
+    * subexpressions.
+    */
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_set_program_inouts_visitor::visit_enter(ir_function_signature *ir)
+{
+   /* We don't want to descend into the function parameters and
+    * consider them as shader inputs or outputs.
+    */
+   visit_list_elements(this, &ir->body);
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_set_program_inouts_visitor::visit_enter(ir_expression *ir)
+{
+   if (this->shader_stage == MESA_SHADER_FRAGMENT &&
+       ir->operation == ir_unop_dFdy) {
+      gl_fragment_program *fprog = (gl_fragment_program *) prog;
+      fprog->UsesDFdy = true;
+   }
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_set_program_inouts_visitor::visit_enter(ir_discard *)
+{
+   /* discards are only allowed in fragment shaders. */
+   assert(this->shader_stage == MESA_SHADER_FRAGMENT);
+
+   gl_fragment_program *fprog = (gl_fragment_program *) prog;
+   fprog->UsesKill = true;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_set_program_inouts_visitor::visit_enter(ir_texture *ir)
+{
+   if (ir->op == ir_tg4)
+      prog->UsesGather = true;
+   return visit_continue;
+}
+
+void
+do_set_program_inouts(exec_list *instructions, struct gl_program *prog,
+                      gl_shader_stage shader_stage)
+{
+   ir_set_program_inouts_visitor v(prog, shader_stage);
+
+   prog->InputsRead = 0;
+   prog->OutputsWritten = 0;
+   prog->SystemValuesRead = 0;
+   if (shader_stage == MESA_SHADER_FRAGMENT) {
+      gl_fragment_program *fprog = (gl_fragment_program *) prog;
+      memset(fprog->InterpQualifier, 0, sizeof(fprog->InterpQualifier));
+      fprog->IsCentroid = 0;
+      fprog->IsSample = 0;
+      fprog->UsesDFdy = false;
+      fprog->UsesKill = false;
+   }
+   visit_list_elements(&v, instructions);
+}
diff --git a/icd/intel/compiler/shader/ir_uniform.h b/icd/intel/compiler/shader/ir_uniform.h
new file mode 100644
index 0000000..3508509
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_uniform.h
@@ -0,0 +1,193 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef IR_UNIFORM_H
+#define IR_UNIFORM_H
+
+
+/* stdbool.h is necessary because this file is included in both C and C++ code.
+ */
+#include <stdbool.h>
+
+#include "program/prog_parameter.h"  /* For union gl_constant_value. */
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum gl_uniform_driver_format {
+   uniform_native = 0,          /**< Store data in the native format. */
+   uniform_int_float,           /**< Store integer data as floats. */
+   uniform_bool_float,          /**< Store boolean data as floats. */
+
+   /**
+    * Store boolean data as integer using 1 for \c true.
+    */
+   uniform_bool_int_0_1,
+
+   /**
+    * Store boolean data as integer using ~0 for \c true.
+    */
+   uniform_bool_int_0_not0
+};
+
+struct gl_uniform_driver_storage {
+   /**
+    * Number of bytes from one array element to the next.
+    */
+   uint8_t element_stride;
+
+   /**
+    * Number of bytes from one vector in a matrix to the next.
+    */
+   uint8_t vector_stride;
+
+   /**
+    * Base format of the stored data.
+    *
+    * This field must have a value from \c GLSL_TYPE_UINT through \c
+    * GLSL_TYPE_SAMPLER.
+    */
+   uint8_t format;
+
+   /**
+    * Pointer to the base of the data.
+    */
+   void *data;
+};
+
+struct gl_opaque_uniform_index {
+   /**
+    * Base opaque uniform index
+    *
+    * If \c gl_uniform_storage::base_type is an opaque type, this
+    * represents its uniform index.  If \c
+    * gl_uniform_storage::array_elements is not zero, the array will
+    * use opaque uniform indices \c index through \c index + \c
+    * gl_uniform_storage::array_elements - 1, inclusive.
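+    * (For example, an array of three samplers with base index 2 occupies
+    * opaque indices 2, 3, and 4.)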
+    *
+    * Note that the index may be different in each shader stage.
+    */
+   uint8_t index;
+
+   /**
+    * Whether this opaque uniform is used in this shader stage.
+    */
+   bool active;
+};
+
+struct gl_uniform_storage {
+   char *name;
+   /** Type of this uniform data stored.
+    *
+    * In the case of an array, it's the type of a single array element.
+    */
+   const struct glsl_type *type;
+
+   /**
+    * The number of elements in this uniform.
+    *
+    * For non-arrays, this is always 0.  For arrays, the value is the size of
+    * the array.
+    */
+   unsigned array_elements;
+
+   /**
+    * Has this uniform ever been set?
+    */
+   bool initialized;
+
+   struct gl_opaque_uniform_index sampler[MESA_SHADER_STAGES];
+
+   struct gl_opaque_uniform_index image[MESA_SHADER_STAGES];
+
+   /**
+    * Storage used by the driver for the uniform
+    */
+   unsigned num_driver_storage;
+   struct gl_uniform_driver_storage *driver_storage;
+
+   /**
+    * Storage used by Mesa for the uniform
+    *
+    * This form of the uniform is used by Mesa's implementation of \c
+    * glGetUniform.  It can also be used by drivers to obtain the value of the
+    * uniform if the \c ::driver_storage interface is not used.
+    */
+   union gl_constant_value *storage;
+
+   /** Fields for GL_ARB_uniform_buffer_object
+    * @{
+    */
+
+   /**
+    * GL_UNIFORM_BLOCK_INDEX: index of the uniform block containing
+    * the uniform, or -1 for the default uniform block.  Note that the
+    * index is into the linked program's UniformBlocks[] array, not
+    * the linked shader's.
+    */
+   int block_index;
+
+   /** GL_UNIFORM_OFFSET: byte offset within the uniform block, or -1. */
+   int offset;
+
+   /**
+    * GL_UNIFORM_MATRIX_STRIDE: byte stride between columns or rows of
+    * a matrix.  Set to 0 for non-matrices in UBOs, or -1 for uniforms
+    * in the default uniform block.
+    */
+   int matrix_stride;
+
+   /**
+    * GL_UNIFORM_ARRAY_STRIDE: byte stride between elements of the
+    * array.  Set to zero for non-arrays in UBOs, or -1 for uniforms
+    * in the default uniform block.
+    */
+   int array_stride;
+
+   /** GL_UNIFORM_ROW_MAJOR: true iff it's a row-major matrix in a UBO */
+   bool row_major;
+
+   /** @} */
+
+   /**
+    * Index within gl_shader_program::AtomicBuffers[] of the atomic
+    * counter buffer this uniform is stored in, or -1 if this is not
+    * an atomic counter.
+    */
+   int atomic_buffer_index;
+
+   /**
+    * The 'base location' for this uniform in the uniform remap table. For
+    * arrays this is the first element in the array.
+    */
+   unsigned remap_location;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* IR_UNIFORM_H */
diff --git a/icd/intel/compiler/shader/ir_validate.cpp b/icd/intel/compiler/shader/ir_validate.cpp
new file mode 100644
index 0000000..71defc8
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_validate.cpp
@@ -0,0 +1,825 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_validate.cpp
+ *
+ * Attempts to verify that various invariants of the IR tree are true.
+ *
+ * In particular, at the moment it makes sure that no single
+ * ir_instruction node except for ir_variable appears multiple times
+ * in the ir tree.  ir_variable does appear multiple times: Once as a
+ * declaration in an exec_list, and multiple times as the endpoint of
+ * a dereference chain.
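+ *
+ * Nothing is repaired here: every failed check prints a diagnostic and
+ * calls abort(), so validation is strictly a debugging aid.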
+ */
+
+#include "ir.h"
+#include "ir_hierarchical_visitor.h"
+#include "program/hash_table.h"
+#include "glsl_types.h"
+
+namespace {
+
+class ir_validate : public ir_hierarchical_visitor {
+public:
+   ir_validate()
+   {
+      this->ht = hash_table_ctor(0, hash_table_pointer_hash,
+				 hash_table_pointer_compare);
+
+      this->current_function = NULL;
+
+      this->callback = ir_validate::validate_ir;
+      this->data = ht;
+   }
+
+   ~ir_validate()
+   {
+      hash_table_dtor(this->ht);
+   }
+
+   virtual ir_visitor_status visit(ir_variable *v);
+   virtual ir_visitor_status visit(ir_dereference_variable *ir);
+
+   virtual ir_visitor_status visit_enter(ir_if *ir);
+
+   virtual ir_visitor_status visit_enter(ir_function *ir);
+   virtual ir_visitor_status visit_leave(ir_function *ir);
+   virtual ir_visitor_status visit_enter(ir_function_signature *ir);
+
+   virtual ir_visitor_status visit_leave(ir_expression *ir);
+   virtual ir_visitor_status visit_leave(ir_swizzle *ir);
+
+   virtual ir_visitor_status visit_enter(class ir_dereference_array *);
+
+   virtual ir_visitor_status visit_enter(ir_assignment *ir);
+   virtual ir_visitor_status visit_enter(ir_call *ir);
+
+   static void validate_ir(ir_instruction *ir, void *data);
+
+   ir_function *current_function;
+
+   struct hash_table *ht;
+};
+
+} /* anonymous namespace */
+
+ir_visitor_status
+ir_validate::visit(ir_dereference_variable *ir)
+{
+   if ((ir->var == NULL) || (ir->var->as_variable() == NULL)) {
+      printf("ir_dereference_variable @ %p does not specify a variable %p\n",
+	     (void *) ir, (void *) ir->var);
+      abort();
+   }
+
+   if (hash_table_find(ht, ir->var) == NULL) {
+      printf("ir_dereference_variable @ %p specifies undeclared variable "
+	     "`%s' @ %p\n",
+	     (void *) ir, ir->var->name, (void *) ir->var);
+      abort();
+   }
+
+   this->validate_ir(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_validate::visit_enter(class ir_dereference_array *ir)
+{
+   if (!ir->array->type->is_array() && !ir->array->type->is_matrix()) {
+      printf("ir_dereference_array @ %p does not specify an array or a "
+             "matrix\n",
+             (void *) ir);
+      ir->print();
+      printf("\n");
+      abort();
+   }
+
+   if (!ir->array_index->type->is_scalar()) {
+      printf("ir_dereference_array @ %p does not have scalar index: %s\n",
+             (void *) ir, ir->array_index->type->name);
+      abort();
+   }
+
+   if (!ir->array_index->type->is_integer()) {
+      printf("ir_dereference_array @ %p does not have integer index: %s\n",
+             (void *) ir, ir->array_index->type->name);
+      abort();
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_validate::visit_enter(ir_if *ir)
+{
+   if (ir->condition->type != glsl_type::bool_type) {
+      printf("ir_if condition %s type instead of bool.\n",
+	     ir->condition->type->name);
+      ir->print();
+      printf("\n");
+      abort();
+   }
+
+   return visit_continue;
+}
+
+
+ir_visitor_status
+ir_validate::visit_enter(ir_function *ir)
+{
+   /* Function definitions cannot be nested.
+    */
+   if (this->current_function != NULL) {
+      printf("Function definition nested inside another function "
+	     "definition:\n");
+      printf("%s %p inside %s %p\n",
+	     ir->name, (void *) ir,
+	     this->current_function->name, (void *) this->current_function);
+      abort();
+   }
+
+   /* Store the current function hierarchy being traversed.  This is used
+    * by the function signature visitor to ensure that the signatures are
+    * linked with the correct functions.
+    */
+   this->current_function = ir;
+
+   this->validate_ir(ir, this->data);
+
+   /* Verify that all of the things stored in the list of signatures are,
+    * in fact, function signatures.
+    */
+   foreach_list(node, &ir->signatures) {
+      ir_instruction *sig = (ir_instruction *) node;
+
+      if (sig->ir_type != ir_type_function_signature) {
+	 printf("Non-signature in signature list of function `%s'\n",
+		ir->name);
+	 abort();
+      }
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_validate::visit_leave(ir_function *ir)
+{
+   assert(ralloc_parent(ir->name) == ir);
+
+   this->current_function = NULL;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_validate::visit_enter(ir_function_signature *ir)
+{
+   if (this->current_function != ir->function()) {
+      printf("Function signature nested inside wrong function "
+	     "definition:\n");
+      printf("%p inside %s %p instead of %s %p\n",
+	     (void *) ir,
+	     this->current_function->name, (void *) this->current_function,
+	     ir->function_name(), (void *) ir->function());
+      abort();
+   }
+
+   if (ir->return_type == NULL) {
+      printf("Function signature %p for function %s has NULL return type.\n",
+	     (void *) ir, ir->function_name());
+      abort();
+   }
+
+   this->validate_ir(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_validate::visit_leave(ir_expression *ir)
+{
+   switch (ir->operation) {
+   case ir_unop_bit_not:
+      assert(ir->operands[0]->type == ir->type);
+      break;
+   case ir_unop_logic_not:
+      assert(ir->type->base_type == GLSL_TYPE_BOOL);
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_BOOL);
+      break;
+
+   case ir_unop_neg:
+   case ir_unop_abs:
+   case ir_unop_sign:
+   case ir_unop_rcp:
+   case ir_unop_rsq:
+   case ir_unop_sqrt:
+      assert(ir->type == ir->operands[0]->type);
+      break;
+
+   case ir_unop_exp:
+   case ir_unop_log:
+   case ir_unop_exp2:
+   case ir_unop_log2:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(ir->type == ir->operands[0]->type);
+      break;
+
+   case ir_unop_f2i:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(ir->type->base_type == GLSL_TYPE_INT);
+      break;
+   case ir_unop_f2u:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(ir->type->base_type == GLSL_TYPE_UINT);
+      break;
+   case ir_unop_i2f:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_INT);
+      assert(ir->type->base_type == GLSL_TYPE_FLOAT);
+      break;
+   case ir_unop_f2b:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(ir->type->base_type == GLSL_TYPE_BOOL);
+      break;
+   case ir_unop_b2f:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_BOOL);
+      assert(ir->type->base_type == GLSL_TYPE_FLOAT);
+      break;
+   case ir_unop_i2b:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_INT);
+      assert(ir->type->base_type == GLSL_TYPE_BOOL);
+      break;
+   case ir_unop_b2i:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_BOOL);
+      assert(ir->type->base_type == GLSL_TYPE_INT);
+      break;
+   case ir_unop_u2f:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_UINT);
+      assert(ir->type->base_type == GLSL_TYPE_FLOAT);
+      break;
+   case ir_unop_i2u:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_INT);
+      assert(ir->type->base_type == GLSL_TYPE_UINT);
+      break;
+   case ir_unop_u2i:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_UINT);
+      assert(ir->type->base_type == GLSL_TYPE_INT);
+      break;
+   case ir_unop_bitcast_i2f:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_INT);
+      assert(ir->type->base_type == GLSL_TYPE_FLOAT);
+      break;
+   case ir_unop_bitcast_f2i:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(ir->type->base_type == GLSL_TYPE_INT);
+      break;
+   case ir_unop_bitcast_u2f:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_UINT);
+      assert(ir->type->base_type == GLSL_TYPE_FLOAT);
+      break;
+   case ir_unop_bitcast_f2u:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(ir->type->base_type == GLSL_TYPE_UINT);
+      break;
+
+   case ir_unop_any:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_BOOL);
+      assert(ir->type == glsl_type::bool_type);
+      break;
+
+   case ir_unop_trunc:
+   case ir_unop_round_even:
+   case ir_unop_ceil:
+   case ir_unop_floor:
+   case ir_unop_fract:
+   case ir_unop_sin:
+   case ir_unop_cos:
+   case ir_unop_sin_reduced:
+   case ir_unop_cos_reduced:
+   case ir_unop_dFdx:
+   case ir_unop_dFdy:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(ir->operands[0]->type == ir->type);
+      break;
+
+   case ir_unop_pack_snorm_2x16:
+   case ir_unop_pack_unorm_2x16:
+   case ir_unop_pack_half_2x16:
+      assert(ir->type == glsl_type::uint_type);
+      assert(ir->operands[0]->type == glsl_type::vec2_type);
+      break;
+
+   case ir_unop_pack_snorm_4x8:
+   case ir_unop_pack_unorm_4x8:
+      assert(ir->type == glsl_type::uint_type);
+      assert(ir->operands[0]->type == glsl_type::vec4_type);
+      break;
+
+   case ir_unop_unpack_snorm_2x16:
+   case ir_unop_unpack_unorm_2x16:
+   case ir_unop_unpack_half_2x16:
+      assert(ir->type == glsl_type::vec2_type);
+      assert(ir->operands[0]->type == glsl_type::uint_type);
+      break;
+
+   case ir_unop_unpack_snorm_4x8:
+   case ir_unop_unpack_unorm_4x8:
+      assert(ir->type == glsl_type::vec4_type);
+      assert(ir->operands[0]->type == glsl_type::uint_type);
+      break;
+
+   case ir_unop_unpack_half_2x16_split_x:
+   case ir_unop_unpack_half_2x16_split_y:
+      assert(ir->type == glsl_type::float_type);
+      assert(ir->operands[0]->type == glsl_type::uint_type);
+      break;
+
+   case ir_unop_bitfield_reverse:
+      assert(ir->operands[0]->type == ir->type);
+      assert(ir->type->is_integer());
+      break;
+
+   case ir_unop_bit_count:
+   case ir_unop_find_msb:
+   case ir_unop_find_lsb:
+      assert(ir->operands[0]->type->vector_elements == ir->type->vector_elements);
+      assert(ir->operands[0]->type->is_integer());
+      assert(ir->type->base_type == GLSL_TYPE_INT);
+      break;
+
+   case ir_unop_noise:
+      /* XXX what can we assert here? */
+      break;
+
+   case ir_binop_add:
+   case ir_binop_sub:
+   case ir_binop_mul:
+   case ir_binop_div:
+   case ir_binop_mod:
+   case ir_binop_min:
+   case ir_binop_max:
+   case ir_binop_pow:
+      assert(ir->operands[0]->type->base_type ==
+             ir->operands[1]->type->base_type);
+
+      if (ir->operands[0]->type->is_scalar())
+	 assert(ir->operands[1]->type == ir->type);
+      else if (ir->operands[1]->type->is_scalar())
+	 assert(ir->operands[0]->type == ir->type);
+      else if (ir->operands[0]->type->is_vector() &&
+	       ir->operands[1]->type->is_vector()) {
+	 assert(ir->operands[0]->type == ir->operands[1]->type);
+	 assert(ir->operands[0]->type == ir->type);
+      }
+      break;
+
+   case ir_binop_imul_high:
+      assert(ir->type == ir->operands[0]->type);
+      assert(ir->type == ir->operands[1]->type);
+      assert(ir->type->is_integer());
+      break;
+
+   case ir_binop_carry:
+   case ir_binop_borrow:
+      assert(ir->type == ir->operands[0]->type);
+      assert(ir->type == ir->operands[1]->type);
+      assert(ir->type->base_type == GLSL_TYPE_UINT);
+      break;
+
+   case ir_binop_less:
+   case ir_binop_greater:
+   case ir_binop_lequal:
+   case ir_binop_gequal:
+   case ir_binop_equal:
+   case ir_binop_nequal:
+      /* The semantics of the IR operators differ from the GLSL <, >, <=, >=,
+       * ==, and != operators.  The IR operators perform a component-wise
+       * comparison on scalar or vector types and return a boolean scalar or
+       * vector type of the same size.
+       */
+      assert(ir->type->base_type == GLSL_TYPE_BOOL);
+      assert(ir->operands[0]->type == ir->operands[1]->type);
+      assert(ir->operands[0]->type->is_vector()
+	     || ir->operands[0]->type->is_scalar());
+      assert(ir->operands[0]->type->vector_elements
+	     == ir->type->vector_elements);
+      break;
+
+   case ir_binop_all_equal:
+   case ir_binop_any_nequal:
+      /* GLSL == and != operate on scalars, vectors, matrices and arrays, and
+       * return a scalar boolean.  The IR matches that.
+       */
+      assert(ir->type == glsl_type::bool_type);
+      assert(ir->operands[0]->type == ir->operands[1]->type);
+      break;
+
+   case ir_binop_lshift:
+   case ir_binop_rshift:
+      assert(ir->operands[0]->type->is_integer() &&
+             ir->operands[1]->type->is_integer());
+      if (ir->operands[0]->type->is_scalar()) {
+          assert(ir->operands[1]->type->is_scalar());
+      }
+      if (ir->operands[0]->type->is_vector() &&
+          ir->operands[1]->type->is_vector()) {
+          assert(ir->operands[0]->type->components() ==
+                 ir->operands[1]->type->components());
+      }
+      assert(ir->type == ir->operands[0]->type);
+      break;
+
+   case ir_binop_bit_and:
+   case ir_binop_bit_xor:
+   case ir_binop_bit_or:
+       assert(ir->operands[0]->type->base_type ==
+              ir->operands[1]->type->base_type);
+       assert(ir->type->is_integer());
+       if (ir->operands[0]->type->is_vector() &&
+           ir->operands[1]->type->is_vector()) {
+           assert(ir->operands[0]->type->vector_elements ==
+                  ir->operands[1]->type->vector_elements);
+       }
+       break;
+
+   case ir_binop_logic_and:
+   case ir_binop_logic_xor:
+   case ir_binop_logic_or:
+      assert(ir->type == glsl_type::bool_type);
+      assert(ir->operands[0]->type == glsl_type::bool_type);
+      assert(ir->operands[1]->type == glsl_type::bool_type);
+      break;
+
+   case ir_binop_dot:
+      assert(ir->type == glsl_type::float_type);
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(ir->operands[0]->type->is_vector());
+      assert(ir->operands[0]->type == ir->operands[1]->type);
+      break;
+
+   case ir_binop_pack_half_2x16_split:
+      assert(ir->type == glsl_type::uint_type);
+      assert(ir->operands[0]->type == glsl_type::float_type);
+      assert(ir->operands[1]->type == glsl_type::float_type);
+      break;
+
+   case ir_binop_bfm:
+      assert(ir->type->is_integer());
+      assert(ir->operands[0]->type->is_integer());
+      assert(ir->operands[1]->type->is_integer());
+      break;
+
+   case ir_binop_ubo_load:
+      assert(ir->operands[0]->as_constant());
+      assert(ir->operands[0]->type == glsl_type::uint_type);
+
+      assert(ir->operands[1]->type == glsl_type::uint_type);
+      break;
+
+   case ir_binop_ldexp:
+      assert(ir->operands[0]->type == ir->type);
+      assert(ir->operands[0]->type->is_float());
+      assert(ir->operands[1]->type->base_type == GLSL_TYPE_INT);
+      assert(ir->operands[0]->type->components() ==
+             ir->operands[1]->type->components());
+      break;
+
+   case ir_binop_vector_extract:
+      assert(ir->operands[0]->type->is_vector());
+      assert(ir->operands[1]->type->is_scalar()
+             && ir->operands[1]->type->is_integer());
+      break;
+
+   case ir_triop_fma:
+      assert(ir->type->base_type == GLSL_TYPE_FLOAT);
+      assert(ir->type == ir->operands[0]->type);
+      assert(ir->type == ir->operands[1]->type);
+      assert(ir->type == ir->operands[2]->type);
+      break;
+
+   case ir_triop_lrp:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_FLOAT);
+      assert(ir->operands[0]->type == ir->operands[1]->type);
+      assert(ir->operands[2]->type == ir->operands[0]->type || ir->operands[2]->type == glsl_type::float_type);
+      break;
+
+   case ir_triop_csel:
+      assert(ir->operands[0]->type->base_type == GLSL_TYPE_BOOL);
+      assert(ir->type->vector_elements == ir->operands[0]->type->vector_elements);
+      assert(ir->type == ir->operands[1]->type);
+      assert(ir->type == ir->operands[2]->type);
+      break;
+
+   case ir_triop_bfi:
+      assert(ir->operands[0]->type->is_integer());
+      assert(ir->operands[1]->type == ir->operands[2]->type);
+      assert(ir->operands[1]->type == ir->type);
+      break;
+
+   case ir_triop_bitfield_extract:
+      assert(ir->operands[0]->type == ir->type);
+      assert(ir->operands[1]->type == glsl_type::int_type);
+      assert(ir->operands[2]->type == glsl_type::int_type);
+      break;
+
+   case ir_triop_vector_insert:
+      assert(ir->operands[0]->type->is_vector());
+      assert(ir->operands[1]->type->is_scalar());
+      assert(ir->operands[0]->type->base_type == ir->operands[1]->type->base_type);
+      assert(ir->operands[2]->type->is_scalar()
+             && ir->operands[2]->type->is_integer());
+      assert(ir->type == ir->operands[0]->type);
+      break;
+
+   case ir_quadop_bitfield_insert:
+      assert(ir->operands[0]->type == ir->type);
+      assert(ir->operands[1]->type == ir->type);
+      assert(ir->operands[2]->type == glsl_type::int_type);
+      assert(ir->operands[3]->type == glsl_type::int_type);
+      break;
+
+   case ir_quadop_vector:
+      /* The vector operator collects some number of scalars and generates a
+       * vector from them.
+       *
+       *  - All of the operands must be scalar.
+       *  - Number of operands must match the size of the resulting vector.
+       *  - Base type of the operands must match the base type of the result.
+       */
+      assert(ir->type->is_vector());
+      switch (ir->type->vector_elements) {
+      case 2:
+	 assert(ir->operands[0]->type->is_scalar());
+	 assert(ir->operands[0]->type->base_type == ir->type->base_type);
+	 assert(ir->operands[1]->type->is_scalar());
+	 assert(ir->operands[1]->type->base_type == ir->type->base_type);
+	 assert(ir->operands[2] == NULL);
+	 assert(ir->operands[3] == NULL);
+	 break;
+      case 3:
+	 assert(ir->operands[0]->type->is_scalar());
+	 assert(ir->operands[0]->type->base_type == ir->type->base_type);
+	 assert(ir->operands[1]->type->is_scalar());
+	 assert(ir->operands[1]->type->base_type == ir->type->base_type);
+	 assert(ir->operands[2]->type->is_scalar());
+	 assert(ir->operands[2]->type->base_type == ir->type->base_type);
+	 assert(ir->operands[3] == NULL);
+	 break;
+      case 4:
+	 assert(ir->operands[0]->type->is_scalar());
+	 assert(ir->operands[0]->type->base_type == ir->type->base_type);
+	 assert(ir->operands[1]->type->is_scalar());
+	 assert(ir->operands[1]->type->base_type == ir->type->base_type);
+	 assert(ir->operands[2]->type->is_scalar());
+	 assert(ir->operands[2]->type->base_type == ir->type->base_type);
+	 assert(ir->operands[3]->type->is_scalar());
+	 assert(ir->operands[3]->type->base_type == ir->type->base_type);
+	 break;
+      default:
+	 /* The is_vector assertion above should prevent execution from ever
+	  * getting here.
+	  */
+	 assert(!"Should not get here.");
+	 break;
+      }
+      }
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_validate::visit_leave(ir_swizzle *ir)
+{
+   unsigned int chans[4] = {ir->mask.x, ir->mask.y, ir->mask.z, ir->mask.w};
+
+   for (unsigned int i = 0; i < ir->type->vector_elements; i++) {
+      if (chans[i] >= ir->val->type->vector_elements) {
+	 printf("ir_swizzle @ %p specifies a channel not present "
+		"in the value.\n", (void *) ir);
+	 ir->print();
+	 abort();
+      }
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_validate::visit(ir_variable *ir)
+{
+   /* An ir_variable is the one thing that can (and will) appear multiple times
+    * in an IR tree.  It is added to the hashtable so that it can be used
+    * in the ir_dereference_variable handler to ensure that a variable is
+    * declared before it is dereferenced.
+    */
+   if (ir->name)
+      assert(ralloc_parent(ir->name) == ir);
+
+   hash_table_insert(ht, ir, ir);
+
+
+   /* If a variable is an array, verify that the maximum array index is in
+    * bounds.  There was once an error in AST-to-HIR conversion that set this
+    * to be out of bounds.
+    */
+   if (ir->type->array_size() > 0) {
+      if (ir->data.max_array_access >= ir->type->length) {
+	 printf("ir_variable has maximum access out of bounds (%d vs %d)\n",
+		ir->data.max_array_access, ir->type->length - 1);
+	 ir->print();
+	 abort();
+      }
+   }
+
+   /* If a variable is an interface block (or an array of interface blocks),
+    * verify that the maximum array index for each interface member is in
+    * bounds.
+    */
+   if (ir->is_interface_instance()) {
+      const glsl_struct_field *fields =
+         ir->get_interface_type()->fields.structure;
+      for (unsigned i = 0; i < ir->get_interface_type()->length; i++) {
+         if (fields[i].type->array_size() > 0) {
+            if (ir->max_ifc_array_access[i] >= fields[i].type->length) {
+               printf("ir_variable has maximum access out of bounds for "
+                      "field %s (%d vs %d)\n", fields[i].name,
+                      ir->max_ifc_array_access[i], fields[i].type->length);
+               ir->print();
+               abort();
+            }
+         }
+      }
+   }
+
+   if (ir->constant_initializer != NULL && !ir->data.has_initializer) {
+      printf("ir_variable didn't have an initializer, but has a constant "
+	     "initializer value.\n");
+      ir->print();
+      abort();
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_validate::visit_enter(ir_assignment *ir)
+{
+   const ir_dereference *const lhs = ir->lhs;
+   if (lhs->type->is_scalar() || lhs->type->is_vector()) {
+      if (ir->write_mask == 0) {
+	 printf("Assignment LHS is %s, but write mask is 0:\n",
+		lhs->type->is_scalar() ? "scalar" : "vector");
+	 ir->print();
+	 abort();
+      }
+
+      int lhs_components = 0;
+      for (int i = 0; i < 4; i++) {
+	 if (ir->write_mask & (1 << i))
+	    lhs_components++;
+      }
+
+      if (lhs_components != ir->rhs->type->vector_elements) {
+	 printf("Assignment LHS write mask channel count does not match\n"
+		"RHS vector size (%d LHS, %d RHS).\n",
+		lhs_components, ir->rhs->type->vector_elements);
+	 ir->print();
+	 abort();
+      }
+   }
+
+   this->validate_ir(ir, this->data);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_validate::visit_enter(ir_call *ir)
+{
+   ir_function_signature *const callee = ir->callee;
+
+   if (callee->ir_type != ir_type_function_signature) {
+      printf("IR called by ir_call is not ir_function_signature!\n");
+      abort();
+   }
+
+   if (ir->return_deref) {
+      if (ir->return_deref->type != callee->return_type) {
+	 printf("callee type %s does not match return storage type %s\n",
+	        callee->return_type->name, ir->return_deref->type->name);
+	 abort();
+      }
+   } else if (callee->return_type != glsl_type::void_type) {
+      printf("ir_call has non-void callee but no return storage\n");
+      abort();
+   }
+
+   const exec_node *formal_param_node = callee->parameters.head;
+   const exec_node *actual_param_node = ir->actual_parameters.head;
+   while (true) {
+      if (formal_param_node->is_tail_sentinel()
+          != actual_param_node->is_tail_sentinel()) {
+         printf("ir_call has the wrong number of parameters:\n");
+         goto dump_ir;
+      }
+      if (formal_param_node->is_tail_sentinel()) {
+         break;
+      }
+      const ir_variable *formal_param
+         = (const ir_variable *) formal_param_node;
+      const ir_rvalue *actual_param
+         = (const ir_rvalue *) actual_param_node;
+      if (formal_param->type != actual_param->type) {
+         printf("ir_call parameter type mismatch:\n");
+         goto dump_ir;
+      }
+      if (formal_param->data.mode == ir_var_function_out
+          || formal_param->data.mode == ir_var_function_inout) {
+         if (!actual_param->is_lvalue()) {
+            printf("ir_call out/inout parameters must be lvalues:\n");
+            goto dump_ir;
+         }
+      }
+      formal_param_node = formal_param_node->next;
+      actual_param_node = actual_param_node->next;
+   }
+
+   return visit_continue;
+
+dump_ir:
+   ir->print();
+   printf("callee:\n");
+   callee->print();
+   abort();
+   return visit_stop;
+}
+
+void
+ir_validate::validate_ir(ir_instruction *ir, void *data)
+{
+   struct hash_table *ht = (struct hash_table *) data;
+
+   if (hash_table_find(ht, ir)) {
+      printf("Instruction node present twice in ir tree:\n");
+      ir->print();
+      printf("\n");
+      abort();
+   }
+   hash_table_insert(ht, ir, ir);
+}
+
+void
+check_node_type(ir_instruction *ir, void *data)
+{
+   (void) data;
+
+   if (ir->ir_type <= ir_type_unset || ir->ir_type >= ir_type_max) {
+      printf("Instruction node with unset type\n");
+      ir->print(); printf("\n");
+   }
+   ir_rvalue *value = ir->as_rvalue();
+   if (value != NULL)
+      assert(value->type != glsl_type::error_type);
+}
+
+void
+validate_ir_tree(exec_list *instructions)
+{
+   /* We shouldn't have any reason to validate IR in a release build,
+    * and it's half composed of assert()s anyway which wouldn't do
+    * anything.
+    */
+#ifdef DEBUG
+   ir_validate v;
+
+   v.run(instructions);
+
+   foreach_list(n, instructions) {
+      ir_instruction *ir = (ir_instruction *) n;
+
+      visit_tree(ir, check_node_type, NULL);
+   }
+#endif
+}
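+
+/* Typical usage (a sketch; 'some_pass' is a hypothetical optimization pass):
+ *
+ *    some_pass(ir);
+ *    validate_ir_tree(ir);   // aborts early if the pass corrupted the IR
+ */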
diff --git a/icd/intel/compiler/shader/ir_variable_refcount.cpp b/icd/intel/compiler/shader/ir_variable_refcount.cpp
new file mode 100644
index 0000000..923eb1a
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_variable_refcount.cpp
@@ -0,0 +1,134 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_variable_refcount.cpp
+ *
+ * Provides a visitor which produces a list of variables referenced,
+ * how many times they were referenced and assigned, and whether they
+ * were defined in the scope.
+ */
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_variable_refcount.h"
+#include "glsl_types.h"
+#include "main/hash_table.h"
+
+ir_variable_refcount_visitor::ir_variable_refcount_visitor()
+{
+   this->mem_ctx = ralloc_context(NULL);
+   this->ht = _mesa_hash_table_create(NULL, _mesa_key_pointer_equal);
+}
+
+static void
+free_entry(struct hash_entry *entry)
+{
+   ir_variable_refcount_entry *ivre = (ir_variable_refcount_entry *) entry->data;
+   delete ivre;
+}
+
+ir_variable_refcount_visitor::~ir_variable_refcount_visitor()
+{
+   ralloc_free(this->mem_ctx);
+   _mesa_hash_table_destroy(this->ht, free_entry);
+}
+
+// Constructor: counts start at zero and no declaration has been seen yet.
+ir_variable_refcount_entry::ir_variable_refcount_entry(ir_variable *var)
+{
+   this->var = var;
+   assign = NULL;
+   assigned_count = 0;
+   declaration = false;
+   referenced_count = 0;
+}
+
+
+ir_variable_refcount_entry *
+ir_variable_refcount_visitor::get_variable_entry(ir_variable *var)
+{
+   assert(var);
+
+   struct hash_entry *e = _mesa_hash_table_search(this->ht,
+						    _mesa_hash_pointer(var),
+						    var);
+   if (e)
+      return (ir_variable_refcount_entry *)e->data;
+
+   ir_variable_refcount_entry *entry = new ir_variable_refcount_entry(var);
+   assert(entry->referenced_count == 0);
+   _mesa_hash_table_insert(this->ht, _mesa_hash_pointer(var), var, entry);
+
+   return entry;
+}
+
+
+ir_visitor_status
+ir_variable_refcount_visitor::visit(ir_variable *ir)
+{
+   ir_variable_refcount_entry *entry = this->get_variable_entry(ir);
+   if (entry)
+      entry->declaration = true;
+
+   return visit_continue;
+}
+
+
+ir_visitor_status
+ir_variable_refcount_visitor::visit(ir_dereference_variable *ir)
+{
+   ir_variable *const var = ir->variable_referenced();
+   ir_variable_refcount_entry *entry = this->get_variable_entry(var);
+
+   if (entry)
+      entry->referenced_count++;
+
+   return visit_continue;
+}
+
+
+ir_visitor_status
+ir_variable_refcount_visitor::visit_enter(ir_function_signature *ir)
+{
+   /* We don't want to descend into the function parameters and
+    * dead-code eliminate them, so just accept the body here.
+    */
+   visit_list_elements(this, &ir->body);
+   return visit_continue_with_parent;
+}
+
+
+ir_visitor_status
+ir_variable_refcount_visitor::visit_leave(ir_assignment *ir)
+{
+   ir_variable_refcount_entry *entry;
+   entry = this->get_variable_entry(ir->lhs->variable_referenced());
+   if (entry) {
+      entry->assigned_count++;
+      if (entry->assign == NULL)
+	 entry->assign = ir;
+   }
+
+   return visit_continue;
+}
diff --git a/icd/intel/compiler/shader/ir_variable_refcount.h b/icd/intel/compiler/shader/ir_variable_refcount.h
new file mode 100644
index 0000000..c15e811
--- /dev/null
+++ b/icd/intel/compiler/shader/ir_variable_refcount.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file ir_variable_refcount.h
+ *
+ * Provides a visitor which produces a list of variables referenced,
+ * how many times they were referenced and assigned, and whether they
+ * were defined in the scope.
+ */
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "glsl_types.h"
+
+class ir_variable_refcount_entry
+{
+public:
+   ir_variable_refcount_entry(ir_variable *var);
+
+   ir_variable *var; /* The key: the variable's pointer. */
+   ir_assignment *assign; /* An assignment to the variable, if any */
+
+   /** Number of times the variable is referenced, including assignments. */
+   unsigned referenced_count;
+
+   /** Number of times the variable is assigned. */
+   unsigned assigned_count;
+
+   bool declaration; /* If the variable had a decl in the instruction stream */
+};
+
+class ir_variable_refcount_visitor : public ir_hierarchical_visitor {
+public:
+   ir_variable_refcount_visitor(void);
+   ~ir_variable_refcount_visitor(void);
+
+   virtual ir_visitor_status visit(ir_variable *);
+   virtual ir_visitor_status visit(ir_dereference_variable *);
+
+   virtual ir_visitor_status visit_enter(ir_function_signature *);
+   virtual ir_visitor_status visit_leave(ir_assignment *);
+
+   ir_variable_refcount_entry *get_variable_entry(ir_variable *var);
+
+   struct hash_table *ht;
+
+   void *mem_ctx;
+};
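+
+/* Typical usage (a sketch; 'instructions' stands for the shader's exec_list):
+ *
+ *    ir_variable_refcount_visitor v;
+ *    v.run(instructions);
+ *    // v.ht now maps each ir_variable* to its ir_variable_refcount_entry.
+ */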
diff --git a/icd/intel/compiler/shader/libfns.h b/icd/intel/compiler/shader/libfns.h
new file mode 100644
index 0000000..d2760db
--- /dev/null
+++ b/icd/intel/compiler/shader/libfns.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright © 2010 LunarG Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+// LunarG ADD:
+
+#ifndef LIBFNS_H
+#define LIBFNS_H
+
+#include "main/macros.h"
+
+#endif // LIBFNS_H
diff --git a/icd/intel/compiler/shader/link_atomics.cpp b/icd/intel/compiler/shader/link_atomics.cpp
new file mode 100644
index 0000000..d92cdb1
--- /dev/null
+++ b/icd/intel/compiler/shader/link_atomics.cpp
@@ -0,0 +1,265 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "glsl_parser_extras.h"
+#include "ir.h"
+#include "ir_uniform.h"
+#include "linker.h"
+#include "program/hash_table.h"
+#include "main/macros.h"
+
+namespace {
+   /*
+    * Atomic counter as seen by the program.
+    */
+   struct active_atomic_counter {
+      unsigned id;
+      ir_variable *var;
+   };
+
+   /*
+    * Atomic counter buffer referenced by the program.  There is a one
+    * to one correspondence between these and the objects that can be
+    * queried using glGetActiveAtomicCounterBufferiv().
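+    *
+    * For instance, a GLSL declaration such as (illustrative only)
+    *    layout(binding = 0, offset = 0) uniform atomic_uint c;
+    * makes the buffer at binding 0 active, with a single counter in it.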
+    */
+   struct active_atomic_buffer {
+      active_atomic_buffer()
+         : counters(0), num_counters(0), stage_references(), size(0)
+      {}
+
+      ~active_atomic_buffer()
+      {
+         free(counters);
+      }
+
+      void push_back(unsigned id, ir_variable *var)
+      {
+         counters = (active_atomic_counter *)
+            realloc(counters, sizeof(active_atomic_counter) * (num_counters + 1));
+
+         counters[num_counters].id = id;
+         counters[num_counters].var = var;
+         num_counters++;
+      }
+
+      active_atomic_counter *counters;
+      unsigned num_counters;
+      unsigned stage_references[MESA_SHADER_STAGES];
+      unsigned size;
+   };
+
+   int
+   cmp_actives(const void *a, const void *b)
+   {
+      const active_atomic_counter *const first = (active_atomic_counter *) a;
+      const active_atomic_counter *const second = (active_atomic_counter *) b;
+
+      return int(first->var->data.atomic.offset) - int(second->var->data.atomic.offset);
+   }
+
+   bool
+   check_atomic_counters_overlap(const ir_variable *x, const ir_variable *y)
+   {
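+      /* Two counters overlap when either one's offset falls inside the
+       * other's [offset, offset + atomic_size) interval -- e.g. with 4-byte
+       * counters (illustrative), offsets 0 and 4 are disjoint while offsets
+       * 0 and 2 collide.
+       */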
+      return ((x->data.atomic.offset >= y->data.atomic.offset &&
+               x->data.atomic.offset < y->data.atomic.offset + y->type->atomic_size()) ||
+              (y->data.atomic.offset >= x->data.atomic.offset &&
+               y->data.atomic.offset < x->data.atomic.offset + x->type->atomic_size()));
+   }
+
+   active_atomic_buffer *
+   find_active_atomic_counters(struct gl_context *ctx,
+                               struct gl_shader_program *prog,
+                               unsigned *num_buffers)
+   {
+      active_atomic_buffer *const buffers =
+         new active_atomic_buffer[ctx->Const.MaxAtomicBufferBindings];
+
+      *num_buffers = 0;
+
+      for (unsigned i = 0; i < MESA_SHADER_STAGES; ++i) {
+         struct gl_shader *sh = prog->_LinkedShaders[i];
+         if (sh == NULL)
+            continue;
+
+         foreach_list(node, sh->ir) {
+            ir_variable *var = ((ir_instruction *)node)->as_variable();
+
+            if (var && var->type->contains_atomic()) {
+               unsigned id = 0;
+               bool found = prog->UniformHash->get(id, var->name);
+               assert(found);
+               (void) found;
+               active_atomic_buffer *buf = &buffers[var->data.binding];
+
+               /* If this is the first time the buffer is used, increment
+                * the counter of buffers used.
+                */
+               if (buf->size == 0)
+                  (*num_buffers)++;
+
+               buf->push_back(id, var);
+
+               buf->stage_references[i]++;
+               buf->size = MAX2(buf->size, var->data.atomic.offset +
+                                var->type->atomic_size());
+            }
+         }
+      }
+
+      for (unsigned i = 0; i < ctx->Const.MaxAtomicBufferBindings; i++) {
+         if (buffers[i].size == 0)
+            continue;
+
+         qsort(buffers[i].counters, buffers[i].num_counters,
+               sizeof(active_atomic_counter),
+               cmp_actives);
+
+         for (unsigned j = 1; j < buffers[i].num_counters; j++) {
+            /* If an overlapping counter is found, it must be a reference to the
+             * same counter from a different shader stage.
+             */
+            if (check_atomic_counters_overlap(buffers[i].counters[j-1].var,
+                                              buffers[i].counters[j].var)
+                && strcmp(buffers[i].counters[j-1].var->name,
+                          buffers[i].counters[j].var->name) != 0) {
+               linker_error(prog, "Atomic counter %s declared at offset %d "
+                            "which is already in use.",
+                            buffers[i].counters[j].var->name,
+                            buffers[i].counters[j].var->data.atomic.offset);
+            }
+         }
+      }
+      return buffers;
+   }
+}
+
+void
+link_assign_atomic_counter_resources(struct gl_context *ctx,
+                                     struct gl_shader_program *prog)
+{
+   unsigned num_buffers;
+   active_atomic_buffer *abs =
+      find_active_atomic_counters(ctx, prog, &num_buffers);
+
+   prog->AtomicBuffers = rzalloc_array(prog, gl_active_atomic_buffer,
+                                       num_buffers);
+   prog->NumAtomicBuffers = num_buffers;
+
+   unsigned i = 0;
+   for (unsigned binding = 0;
+        binding < ctx->Const.MaxAtomicBufferBindings;
+        binding++) {
+
+      /* If the binding was not used, skip.
+       */
+      if (abs[binding].size == 0)
+         continue;
+
+      active_atomic_buffer &ab = abs[binding];
+      gl_active_atomic_buffer &mab = prog->AtomicBuffers[i];
+
+      /* Assign buffer-specific fields. */
+      mab.Binding = binding;
+      mab.MinimumSize = ab.size;
+      mab.Uniforms = rzalloc_array(prog->AtomicBuffers, GLuint,
+                                   ab.num_counters);
+      mab.NumUniforms = ab.num_counters;
+
+      /* Assign counter-specific fields. */
+      for (unsigned j = 0; j < ab.num_counters; j++) {
+         ir_variable *const var = ab.counters[j].var;
+         const unsigned id = ab.counters[j].id;
+         gl_uniform_storage *const storage = &prog->UniformStorage[id];
+
+         mab.Uniforms[j] = id;
+         var->data.atomic.buffer_index = i;
+         storage->atomic_buffer_index = i;
+         storage->offset = var->data.atomic.offset;
+         storage->array_stride = (var->type->is_array() ?
+                                  var->type->element_type()->atomic_size() : 0);
+      }
+
+      /* Assign stage-specific fields. */
+      for (unsigned j = 0; j < MESA_SHADER_STAGES; ++j)
+         mab.StageReferences[j] =
+            (ab.stage_references[j] ? GL_TRUE : GL_FALSE);
+
+      i++;
+   }
+
+   delete [] abs;
+   assert(i == num_buffers);
+}
+
+void
+link_check_atomic_counter_resources(struct gl_context *ctx,
+                                    struct gl_shader_program *prog)
+{
+   unsigned num_buffers;
+   active_atomic_buffer *const abs =
+      find_active_atomic_counters(ctx, prog, &num_buffers);
+   unsigned atomic_counters[MESA_SHADER_STAGES] = {};
+   unsigned atomic_buffers[MESA_SHADER_STAGES] = {};
+   unsigned total_atomic_counters = 0;
+   unsigned total_atomic_buffers = 0;
+
+   /* Sum the required resources.  Note that this counts buffers and
+    * counters referenced by several shader stages multiple times
+    * against the combined limit -- that's the behavior the spec
+    * requires.
+    */
+   for (unsigned i = 0; i < ctx->Const.MaxAtomicBufferBindings; i++) {
+      if (abs[i].size == 0)
+         continue;
+
+      for (unsigned j = 0; j < MESA_SHADER_STAGES; ++j) {
+         const unsigned n = abs[i].stage_references[j];
+
+         if (n) {
+            atomic_counters[j] += n;
+            total_atomic_counters += n;
+            atomic_buffers[j]++;
+            total_atomic_buffers++;
+         }
+      }
+   }
+
+   /* Check that they are within the supported limits. */
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      if (atomic_counters[i] > ctx->Const.Program[i].MaxAtomicCounters)
+         linker_error(prog, "Too many %s shader atomic counters",
+                      _mesa_shader_stage_to_string(i));
+
+      if (atomic_buffers[i] > ctx->Const.Program[i].MaxAtomicBuffers)
+         linker_error(prog, "Too many %s shader atomic counter buffers",
+                      _mesa_shader_stage_to_string(i));
+   }
+
+   if (total_atomic_counters > ctx->Const.MaxCombinedAtomicCounters)
+      linker_error(prog, "Too many combined atomic counters");
+
+   if (total_atomic_buffers > ctx->Const.MaxCombinedAtomicBuffers)
+      linker_error(prog, "Too many combined atomic buffers");
+
+   delete [] abs;
+}
diff --git a/icd/intel/compiler/shader/link_functions.cpp b/icd/intel/compiler/shader/link_functions.cpp
new file mode 100644
index 0000000..e16567e
--- /dev/null
+++ b/icd/intel/compiler/shader/link_functions.cpp
@@ -0,0 +1,341 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "libfns.h" // LunarG ADD:
+#include "glsl_symbol_table.h"
+#include "glsl_parser_extras.h"
+#include "ir.h"
+#include "program.h"
+#include "program/hash_table.h"
+#include "linker.h"
+
+static ir_function_signature *
+find_matching_signature(const char *name, const exec_list *actual_parameters,
+			gl_shader **shader_list, unsigned num_shaders,
+			bool use_builtin);
+
+namespace {
+
+class call_link_visitor : public ir_hierarchical_visitor {
+public:
+   call_link_visitor(gl_shader_program *prog, gl_shader *linked,
+		     gl_shader **shader_list, unsigned num_shaders)
+   {
+      this->prog = prog;
+      this->shader_list = shader_list;
+      this->num_shaders = num_shaders;
+      this->success = true;
+      this->linked = linked;
+
+      this->locals = hash_table_ctor(0, hash_table_pointer_hash,
+				     hash_table_pointer_compare);
+   }
+
+   ~call_link_visitor()
+   {
+      hash_table_dtor(this->locals);
+   }
+
+   virtual ir_visitor_status visit(ir_variable *ir)
+   {
+      hash_table_insert(locals, ir, ir);
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_call *ir)
+   {
+      /* If ir is an ir_call from a function that was imported from another
+       * shader callee will point to an ir_function_signature in the original
+       * shader.  In this case the function signature MUST NOT BE MODIFIED.
+       * Doing so will modify the original shader.  This may prevent that
+       * shader from being linkable in other programs.
+       */
+      const ir_function_signature *const callee = ir->callee;
+      assert(callee != NULL);
+      const char *const name = callee->function_name();
+
+      /* Determine if the requested function signature already exists in the
+       * final linked shader.  If it does, use it as the target of the call.
+       */
+      ir_function_signature *sig =
+	 find_matching_signature(name, &callee->parameters, &linked, 1,
+				 ir->use_builtin);
+      if (sig != NULL) {
+	 ir->callee = sig;
+	 return visit_continue;
+      }
+
+      /* Try to find the signature in one of the other shaders that is being
+       * linked.  If it's not found there, return an error.
+       */
+      sig = find_matching_signature(name, &ir->actual_parameters, shader_list,
+				    num_shaders, ir->use_builtin);
+      if (sig == NULL) {
+	 /* FINISHME: Log the full signature of unresolved function.
+	  */
+	 linker_error(this->prog, "unresolved reference to function `%s'\n",
+		      name);
+	 this->success = false;
+	 return visit_stop;
+      }
+
+      /* Find the prototype information in the linked shader.  Generate any
+       * details that may be missing.
+       */
+      ir_function *f = linked->symbols->get_function(name);
+      if (f == NULL) {
+	 f = new(linked) ir_function(name);
+
+	 /* Add the new function to the linked IR.  Put it at the end
+          * so that it comes after any global variable declarations
+          * that it refers to.
+	  */
+	 linked->symbols->add_function(f);
+	 linked->ir->push_tail(f);
+      }
+
+      ir_function_signature *linked_sig =
+	 f->exact_matching_signature(NULL, &callee->parameters);
+      if ((linked_sig == NULL)
+	  || ((linked_sig != NULL)
+	      && (linked_sig->is_builtin() != ir->use_builtin))) {
+	 linked_sig = new(linked) ir_function_signature(callee->return_type);
+	 f->add_signature(linked_sig);
+      }
+
+      /* At this point linked_sig and callee may be the same.  If ir is an
+       * ir_call from linked then linked_sig and callee will be
+       * ir_function_signatures that have no definitions (is_defined is false).
+       */
+      assert(!linked_sig->is_defined);
+      assert(linked_sig->body.is_empty());
+
+      /* Create an in-place clone of the function definition.  This multistep
+       * process introduces some complexity here, but it has some advantages.
+       * The parameter list and the function body are cloned separately.
+       * The clone of the parameter list is used to prime the hashtable used
+       * to replace variable references in the cloned body.
+       *
+       * The big advantage is that the ir_function_signature does not change.
+       * This means that we don't have to process the rest of the IR tree to
+       * patch ir_call nodes.  In addition, there is no way to remove or
+       * replace signature stored in a function.  One could easily be added,
+       * but this avoids the need.
+       */
+      struct hash_table *ht = hash_table_ctor(0, hash_table_pointer_hash,
+					      hash_table_pointer_compare);
+      exec_list formal_parameters;
+      foreach_list_const(node, &sig->parameters) {
+	 const ir_instruction *const original = (ir_instruction *) node;
+	 assert(const_cast<ir_instruction *>(original)->as_variable());
+
+	 ir_instruction *copy = original->clone(linked, ht);
+	 formal_parameters.push_tail(copy);
+      }
+
+      linked_sig->replace_parameters(&formal_parameters);
+
+      if (sig->is_defined) {
+         foreach_list_const(node, &sig->body) {
+            const ir_instruction *const original = (ir_instruction *) node;
+
+            ir_instruction *copy = original->clone(linked, ht);
+            linked_sig->body.push_tail(copy);
+         }
+
+         linked_sig->is_defined = true;
+      }
+
+      hash_table_dtor(ht);
+
+      /* Patch references inside the function to things outside the function
+       * (i.e., function calls and global variables).
+       */
+      linked_sig->accept(this);
+
+      ir->callee = linked_sig;
+
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit_leave(ir_call *ir)
+   {
+      /* Traverse list of function parameters, and for array parameters
+       * propagate max_array_access. Otherwise arrays that are only referenced
+       * from inside functions via function parameters will be incorrectly
+       * optimized. This will lead to incorrect code being generated (or worse).
+       * Do it when leaving the node so the children would propagate their
+       * array accesses first.
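+       *
+       * For example (illustrative GLSL): given
+       *    float last(float a[8]) { return a[7]; }
+       * a call last(big) must raise big's max_array_access to at least 7,
+       * or big could otherwise be shrunk by array-size optimizations.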
+       */
+
+      const exec_node *formal_param_node = ir->callee->parameters.get_head();
+      if (formal_param_node) {
+         const exec_node *actual_param_node = ir->actual_parameters.get_head();
+         while (!actual_param_node->is_tail_sentinel()) {
+            ir_variable *formal_param = (ir_variable *) formal_param_node;
+            ir_rvalue *actual_param = (ir_rvalue *) actual_param_node;
+
+            formal_param_node = formal_param_node->get_next();
+            actual_param_node = actual_param_node->get_next();
+
+            if (formal_param->type->is_array()) {
+               ir_dereference_variable *deref = actual_param->as_dereference_variable();
+               if (deref && deref->var && deref->var->type->is_array()) {
+                  deref->var->data.max_array_access =
+                     MAX2(formal_param->data.max_array_access,
+                         deref->var->data.max_array_access);
+               }
+            }
+         }
+      }
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit(ir_dereference_variable *ir)
+   {
+      if (hash_table_find(locals, ir->var) == NULL) {
+	 /* The non-function variable must be a global, so try to find the
+	  * variable in the shader's symbol table.  If the variable is not
+	  * found, then it's a global that *MUST* be defined in the original
+	  * shader.
+	  */
+	 ir_variable *var = linked->symbols->get_variable(ir->var->name);
+	 if (var == NULL) {
+	    /* Clone the ir_variable that the dereference already has and add
+	     * it to the linked shader.
+	     */
+	    var = ir->var->clone(linked, NULL);
+	    linked->symbols->add_variable(var);
+	    linked->ir->push_head(var);
+	 } else {
+            if (var->type->is_array()) {
+               /* It is possible to have a global array declared in multiple
+                * shaders without a size.  The array is implicitly sized by
+                * the maximal access to it in *any* shader.  Because of this,
+                * we need to track the maximal access to the array as linking
+                * pulls more functions in that access the array.
+                */
+               var->data.max_array_access =
+                  MAX2(var->data.max_array_access,
+                       ir->var->data.max_array_access);
+
+               if (var->type->length == 0 && ir->var->type->length != 0)
+                  var->type = ir->var->type;
+            }
+            if (var->is_interface_instance()) {
+               /* Similarly, we need implicit sizes of arrays within interface
+                * blocks to be sized by the maximal access in *any* shader.
+                */
+               for (unsigned i = 0; i < var->get_interface_type()->length;
+                    i++) {
+                  var->max_ifc_array_access[i] =
+                     MAX2(var->max_ifc_array_access[i],
+                          ir->var->max_ifc_array_access[i]);
+               }
+            }
+	 }
+
+	 ir->var = var;
+      }
+
+      return visit_continue;
+   }
+
+   /** Was function linking successful? */
+   bool success;
+
+private:
+   /**
+    * Shader program being linked
+    *
+    * This is only used for logging error messages.
+    */
+   gl_shader_program *prog;
+
+   /** List of shaders available for linking. */
+   gl_shader **shader_list;
+
+   /** Number of shaders available for linking. */
+   unsigned num_shaders;
+
+   /**
+    * Final linked shader
+    *
+    * This is used two ways.  It is used to find global variables in the
+    * linked shader that are accessed by the function.  It is also used to add
+    * global variables from the shader where the function originated.
+    */
+   gl_shader *linked;
+
+   /**
+    * Table of variables local to the function.
+    */
+   hash_table *locals;
+};
+
+} /* anonymous namespace */
+
+/**
+ * Searches a list of shaders for a particular function definition
+ */
+ir_function_signature *
+find_matching_signature(const char *name, const exec_list *actual_parameters,
+			gl_shader **shader_list, unsigned num_shaders,
+			bool use_builtin)
+{
+   for (unsigned i = 0; i < num_shaders; i++) {
+      ir_function *const f = shader_list[i]->symbols->get_function(name);
+
+      if (f == NULL)
+	 continue;
+
+      ir_function_signature *sig =
+         f->matching_signature(NULL, actual_parameters);
+
+      if ((sig == NULL) ||
+          (!sig->is_defined && !sig->is_intrinsic))
+	 continue;
+
+      /* If this function expects to bind to a built-in function and the
+       * signature that we found isn't a built-in, keep looking.  Also keep
+       * looking if we expect a non-built-in but found a built-in.
+       */
+      if (use_builtin != sig->is_builtin())
+	 continue;
+
+      return sig;
+   }
+
+   return NULL;
+}
+
+
+bool
+link_function_calls(gl_shader_program *prog, gl_shader *main,
+		    gl_shader **shader_list, unsigned num_shaders)
+{
+   call_link_visitor v(prog, main, shader_list, num_shaders);
+
+   v.run(main->ir);
+   return v.success;
+}
diff --git a/icd/intel/compiler/shader/link_interface_blocks.cpp b/icd/intel/compiler/shader/link_interface_blocks.cpp
new file mode 100644
index 0000000..52552cc
--- /dev/null
+++ b/icd/intel/compiler/shader/link_interface_blocks.cpp
@@ -0,0 +1,385 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file link_interface_blocks.cpp
+ * Linker support for GLSL's interface blocks.
+ */
+
+#include "ir.h"
+#include "glsl_symbol_table.h"
+#include "linker.h"
+#include "main/macros.h"
+#include "program/hash_table.h"
+
+
+namespace {
+
+/**
+ * Information about a single interface block definition that we need to keep
+ * track of in order to check linkage rules.
+ *
+ * Note: this class is expected to be short lived, so it doesn't make copies
+ * of the strings it references; it simply borrows the pointers from the
+ * ir_variable class.
+ */
+struct interface_block_definition
+{
+   /**
+    * Extract an interface block definition from an ir_variable that
+    * represents either the interface instance (for named interfaces), or a
+    * member of the interface (for unnamed interfaces).
+    */
+   explicit interface_block_definition(const ir_variable *var)
+      : type(var->get_interface_type()),
+        instance_name(NULL),
+        array_size(-1)
+   {
+      if (var->is_interface_instance()) {
+         instance_name = var->name;
+         if (var->type->is_array())
+            array_size = var->type->length;
+      }
+      explicitly_declared = (var->data.how_declared != ir_var_declared_implicitly);
+   }
+
+   /**
+    * Interface block type
+    */
+   const glsl_type *type;
+
+   /**
+    * For a named interface block, the instance name.  Otherwise NULL.
+    */
+   const char *instance_name;
+
+   /**
+    * For an interface block array, the array size (or 0 if unsized).
+    * Otherwise -1.
+    */
+   int array_size;
+
+   /**
+    * True if this interface block was explicitly declared in the shader;
+    * false if it was an implicitly declared built-in interface block.
+    */
+   bool explicitly_declared;
+};
+
+
+/**
+ * Check if two interfaces match, according to intrastage interface matching
+ * rules.  If they do, and the first interface uses an unsized array, it will
+ * be updated to reflect the array size declared in the second interface.
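+ *
+ * For example (illustrative GLSL), two shaders of the same stage declaring
+ *    out Block { vec4 v; } blk[];
+ * and
+ *    out Block { vec4 v; } blk[4];
+ * match, and the first definition's array size becomes 4.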
+ */
+bool
+intrastage_match(interface_block_definition *a,
+                 const interface_block_definition *b,
+                 ir_variable_mode mode)
+{
+   /* Types must match. */
+   if (a->type != b->type) {
+      /* Exception: if both the interface blocks are implicitly declared,
+       * don't force their types to match.  They might mismatch due to the two
+       * shaders using different GLSL versions, and that's ok.
+       */
+      if (a->explicitly_declared || b->explicitly_declared)
+         return false;
+   }
+
+   /* Presence/absence of interface names must match. */
+   if ((a->instance_name == NULL) != (b->instance_name == NULL))
+      return false;
+
+   /* For uniforms, instance names need not match.  For shader ins/outs,
+    * it's not clear from the spec whether they need to match, but
+    * Mesa's implementation relies on them matching.
+    */
+   if (a->instance_name != NULL && mode != ir_var_uniform &&
+       strcmp(a->instance_name, b->instance_name) != 0) {
+      return false;
+   }
+
+   /* Array vs. nonarray must be consistent, and sizes must be
+    * consistent, with the exception that unsized arrays match sized
+    * arrays.
+    */
+   if ((a->array_size == -1) != (b->array_size == -1))
+      return false;
+   if (b->array_size != 0) {
+      if (a->array_size == 0)
+         a->array_size = b->array_size;
+      else if (a->array_size != b->array_size)
+         return false;
+   }
+
+   return true;
+}
+
+
+/**
+ * Check if two interfaces match, according to interstage (in/out) interface
+ * matching rules.
+ *
+ * If \c extra_array_level is true, then vertex-to-geometry shader matching
+ * rules are enforced (i.e. a successful match requires the consumer interface
+ * to be an array and the producer interface to be a non-array).
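+ *
+ * For example (illustrative GLSL), with the extra array level a vertex
+ * shader output
+ *    out Block { vec4 v; } blk;
+ * matches a geometry shader input
+ *    in Block { vec4 v; } blk[3];
+ * where the extra array dimension reflects the input primitive's vertices.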
+ */
+bool
+interstage_match(const interface_block_definition *producer,
+                 const interface_block_definition *consumer,
+                 bool extra_array_level)
+{
+   /* Unsized arrays should not occur during interstage linking.  They
+    * should have all been assigned a size by link_intrastage_shaders.
+    */
+   assert(consumer->array_size != 0);
+   assert(producer->array_size != 0);
+
+   /* Types must match. */
+   if (consumer->type != producer->type) {
+      /* Exception: if both the interface blocks are implicitly declared,
+       * don't force their types to match.  They might mismatch due to the two
+       * shaders using different GLSL versions, and that's ok.
+       */
+      if (consumer->explicitly_declared || producer->explicitly_declared)
+         return false;
+   }
+   if (extra_array_level) {
+      /* Consumer must be an array, and producer must not. */
+      if (consumer->array_size == -1)
+         return false;
+      if (producer->array_size != -1)
+         return false;
+   } else {
+      /* Array vs. nonarray must be consistent, and sizes must be consistent.
+       * Since unsized arrays have been ruled out, we can check this by just
+       * making sure the sizes are equal.
+       */
+      if (consumer->array_size != producer->array_size)
+         return false;
+   }
+   return true;
+}
+
+
+/**
+ * This class keeps track of a mapping from an interface block name to the
+ * necessary information about that interface block to determine whether to
+ * generate a link error.
+ *
+ * Note: this class is expected to be short lived, so it doesn't make copies
+ * of the strings it references; it simply borrows the pointers from the
+ * ir_variable class.
+ */
+class interface_block_definitions
+{
+public:
+   interface_block_definitions()
+      : mem_ctx(ralloc_context(NULL)),
+        ht(hash_table_ctor(0, hash_table_string_hash,
+                           hash_table_string_compare))
+   {
+   }
+
+   ~interface_block_definitions()
+   {
+      hash_table_dtor(ht);
+      ralloc_free(mem_ctx);
+   }
+
+   /**
+    * Lookup the interface definition having the given block name.  Return
+    * NULL if none is found.
+    */
+   interface_block_definition *lookup(const char *block_name)
+   {
+      return (interface_block_definition *) hash_table_find(ht, block_name);
+   }
+
+   /**
+    * Add a new interface definition.
+    */
+   void store(const interface_block_definition &def)
+   {
+      interface_block_definition *hash_entry =
+         rzalloc(mem_ctx, interface_block_definition);
+      *hash_entry = def;
+      hash_table_insert(ht, hash_entry, def.type->name);
+   }
+
+private:
+   /**
+    * Ralloc context for data structures allocated by this class.
+    */
+   void *mem_ctx;
+
+   /**
+    * Hash table mapping interface block name to an \c
+    * interface_block_definition struct.  interface_block_definition structs
+    * are allocated using \c mem_ctx.
+    */
+   hash_table *ht;
+};
+
+
+} /* anonymous namespace */
+
+
+void
+validate_intrastage_interface_blocks(struct gl_shader_program *prog,
+                                     const gl_shader **shader_list,
+                                     unsigned num_shaders)
+{
+   interface_block_definitions in_interfaces;
+   interface_block_definitions out_interfaces;
+   interface_block_definitions uniform_interfaces;
+
+   for (unsigned int i = 0; i < num_shaders; i++) {
+      if (shader_list[i] == NULL)
+         continue;
+
+      foreach_list(node, shader_list[i]->ir) {
+         ir_variable *var = ((ir_instruction *) node)->as_variable();
+         if (!var)
+            continue;
+
+         const glsl_type *iface_type = var->get_interface_type();
+
+         if (iface_type == NULL)
+            continue;
+
+         interface_block_definitions *definitions;
+         switch (var->data.mode) {
+         case ir_var_shader_in:
+            definitions = &in_interfaces;
+            break;
+         case ir_var_shader_out:
+            definitions = &out_interfaces;
+            break;
+         case ir_var_uniform:
+            definitions = &uniform_interfaces;
+            break;
+         default:
+            /* Only in, out, and uniform interfaces are legal, so we should
+             * never get here.
+             */
+            assert(!"illegal interface type");
+            continue;
+         }
+
+         const interface_block_definition def(var);
+         interface_block_definition *prev_def =
+            definitions->lookup(iface_type->name);
+
+         if (prev_def == NULL) {
+            /* This is the first time we've seen the interface, so save
+             * it into the appropriate data structure.
+             */
+            definitions->store(def);
+         } else if (!intrastage_match(prev_def, &def,
+                                      (ir_variable_mode) var->data.mode)) {
+            linker_error(prog, "definitions of interface block `%s' do not"
+                         " match\n", iface_type->name);
+            return;
+         }
+      }
+   }
+}
+
+void
+validate_interstage_inout_blocks(struct gl_shader_program *prog,
+                                 const gl_shader *producer,
+                                 const gl_shader *consumer)
+{
+   interface_block_definitions definitions;
+   const bool extra_array_level = consumer->Stage == MESA_SHADER_GEOMETRY;
+
+   /* Record the consumer's input interface blocks in the definitions table. */
+   foreach_list(node, consumer->ir) {
+      ir_variable *var = ((ir_instruction *) node)->as_variable();
+      if (!var || !var->get_interface_type() || var->data.mode != ir_var_shader_in)
+         continue;
+
+      definitions.store(interface_block_definition(var));
+   }
+
+   /* Verify that the producer's output interfaces match. */
+   foreach_list(node, producer->ir) {
+      ir_variable *var = ((ir_instruction *) node)->as_variable();
+      if (!var || !var->get_interface_type() || var->data.mode != ir_var_shader_out)
+         continue;
+
+      interface_block_definition *consumer_def =
+         definitions.lookup(var->get_interface_type()->name);
+
+      /* The consumer doesn't use this output block.  Ignore it. */
+      if (consumer_def == NULL)
+         continue;
+
+      const interface_block_definition producer_def(var);
+
+      if (!interstage_match(&producer_def, consumer_def, extra_array_level)) {
+         linker_error(prog, "definitions of interface block `%s' do not "
+                      "match\n", var->get_interface_type()->name);
+         return;
+      }
+   }
+}
+
+
+void
+validate_interstage_uniform_blocks(struct gl_shader_program *prog,
+                                   gl_shader **stages, int num_stages)
+{
+   interface_block_definitions definitions;
+
+   for (int i = 0; i < num_stages; i++) {
+      if (stages[i] == NULL)
+         continue;
+
+      const gl_shader *stage = stages[i];
+      foreach_list(node, stage->ir) {
+         ir_variable *var = ((ir_instruction *) node)->as_variable();
+         if (!var || !var->get_interface_type() || var->data.mode != ir_var_uniform)
+            continue;
+
+         interface_block_definition *old_def =
+            definitions.lookup(var->get_interface_type()->name);
+         const interface_block_definition new_def(var);
+         if (old_def == NULL) {
+            definitions.store(new_def);
+         } else {
+            /* Interstage uniform matching rules are the same as intrastage
+             * uniform matching rules (for uniforms, it is as though all
+             * shaders are in the same shader stage).
+             */
+            if (!intrastage_match(old_def, &new_def, ir_var_uniform)) {
+               linker_error(prog, "definitions of interface block `%s' do not "
+                            "match\n", var->get_interface_type()->name);
+               return;
+            }
+         }
+      }
+   }
+}
diff --git a/icd/intel/compiler/shader/link_uniform_block_active_visitor.cpp b/icd/intel/compiler/shader/link_uniform_block_active_visitor.cpp
new file mode 100644
index 0000000..566ef32
--- /dev/null
+++ b/icd/intel/compiler/shader/link_uniform_block_active_visitor.cpp
@@ -0,0 +1,170 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "link_uniform_block_active_visitor.h"
+#include "program.h"
+
+link_uniform_block_active *
+process_block(void *mem_ctx, struct hash_table *ht, ir_variable *var)
+{
+   const uint32_t h = _mesa_hash_string(var->get_interface_type()->name);
+   const hash_entry *const existing_block =
+      _mesa_hash_table_search(ht, h, var->get_interface_type()->name);
+
+   const glsl_type *const block_type = var->is_interface_instance()
+      ? var->type : var->get_interface_type();
+
+
+   /* If a block with this block-name has not previously been seen, add it.
+    * If a block with this block-name has been seen, it must be identical to
+    * the block currently being examined.
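+    *
+    * e.g. (illustrative GLSL) two shaders both containing
+    *    uniform Light { vec4 pos; };
+    * must agree on the block's type and on whether it has an instance name.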
+    */
+   if (existing_block == NULL) {
+      link_uniform_block_active *const b =
+	 rzalloc(mem_ctx, struct link_uniform_block_active);
+
+      b->type = block_type;
+      b->has_instance_name = var->is_interface_instance();
+
+      if (var->data.explicit_binding) {
+         b->has_binding = true;
+         b->binding = var->data.binding;
+      } else {
+         b->has_binding = false;
+         b->binding = 0;
+      }
+
+      _mesa_hash_table_insert(ht, h, var->get_interface_type()->name,
+			      (void *) b);
+      return b;
+   } else {
+      link_uniform_block_active *const b =
+	 (link_uniform_block_active *) existing_block->data;
+
+      if (b->type != block_type
+	  || b->has_instance_name != var->is_interface_instance())
+	 return NULL;
+      else
+	 return b;
+   }
+
+   assert(!"Should not get here.");
+   return NULL;
+}
+
+ir_visitor_status
+link_uniform_block_active_visitor::visit_enter(ir_dereference_array *ir)
+{
+   ir_dereference_variable *const d = ir->array->as_dereference_variable();
+   ir_variable *const var = (d == NULL) ? NULL : d->var;
+
+   /* If the r-value being dereferenced is not a variable (e.g., a field of a
+    * structure) or is not a uniform block instance, continue.
+    *
+    * WARNING: It is not enough for the variable to be part of uniform block.
+    * It must represent the entire block.  Arrays (or matrices) inside blocks
+    * that lack an instance name are handled by the ir_dereference_variable
+    * function.
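+    *
+    * e.g. (illustrative GLSL): given
+    *    uniform U { vec4 v; } u[3];
+    * a dereference of u[1] is handled here, whereas member arrays inside a
+    * block without an instance name reach the ir_dereference_variable path.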
+    */
+   if (var == NULL
+       || !var->is_in_uniform_block()
+       || !var->is_interface_instance())
+      return visit_continue;
+
+   /* Process the block.  Bail if there was an error.
+    */
+   link_uniform_block_active *const b =
+      process_block(this->mem_ctx, this->ht, var);
+   if (b == NULL) {
+      linker_error(prog,
+		   "uniform block `%s' has mismatching definitions",
+		   var->get_interface_type()->name);
+      this->success = false;
+      return visit_stop;
+   }
+
+   /* Block arrays must be declared with an instance name.
+    */
+   assert(b->has_instance_name);
+   assert((b->num_array_elements == 0) == (b->array_elements == NULL));
+   assert(b->type != NULL);
+
+   /* Determine whether or not this array index has already been added to the
+    * list of active array indices.  At this point all constant folding must
+    * have occurred, and the array index must be a constant.
+    */
+   ir_constant *c = ir->array_index->as_constant();
+   assert(c != NULL);
+
+   const unsigned idx = c->get_uint_component(0);
+
+   unsigned i;
+   for (i = 0; i < b->num_array_elements; i++) {
+      if (b->array_elements[i] == idx)
+	 break;
+   }
+
+   assert(i <= b->num_array_elements);
+
+   if (i == b->num_array_elements) {
+      b->array_elements = reralloc(this->mem_ctx,
+				   b->array_elements,
+				   unsigned,
+				   b->num_array_elements + 1);
+
+      b->array_elements[b->num_array_elements] = idx;
+
+      b->num_array_elements++;
+   }
+
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+link_uniform_block_active_visitor::visit(ir_dereference_variable *ir)
+{
+   ir_variable *var = ir->var;
+
+   if (!var->is_in_uniform_block())
+      return visit_continue;
+
+   assert(!var->is_interface_instance() || !var->type->is_array());
+
+   /* Process the block.  Bail if there was an error.
+    */
+   link_uniform_block_active *const b =
+      process_block(this->mem_ctx, this->ht, var);
+   if (b == NULL) {
+      linker_error(this->prog,
+		   "uniform block `%s' has mismatching definitions",
+		   var->get_interface_type()->name);
+      this->success = false;
+      return visit_stop;
+   }
+
+   assert(b->num_array_elements == 0);
+   assert(b->array_elements == NULL);
+   assert(b->type != NULL);
+
+   return visit_continue;
+}
diff --git a/icd/intel/compiler/shader/link_uniform_block_active_visitor.h b/icd/intel/compiler/shader/link_uniform_block_active_visitor.h
new file mode 100644
index 0000000..d76dbca
--- /dev/null
+++ b/icd/intel/compiler/shader/link_uniform_block_active_visitor.h
@@ -0,0 +1,65 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef LINK_UNIFORM_BLOCK_ACTIVE_VISITOR_H
+#define LINK_UNIFORM_BLOCK_ACTIVE_VISITOR_H
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "glsl_types.h"
+#include "main/hash_table.h"
+
+struct link_uniform_block_active {
+   const glsl_type *type;
+
+   unsigned *array_elements;
+   unsigned num_array_elements;
+
+   unsigned binding;
+
+   bool has_instance_name;
+   bool has_binding;
+};
+
+class link_uniform_block_active_visitor : public ir_hierarchical_visitor {
+public:
+   link_uniform_block_active_visitor(void *mem_ctx, struct hash_table *ht,
+				     struct gl_shader_program *prog)
+      : success(true), prog(prog), ht(ht), mem_ctx(mem_ctx)
+   {
+      /* empty */
+   }
+
+   virtual ir_visitor_status visit_enter(ir_dereference_array *);
+   virtual ir_visitor_status visit(ir_dereference_variable *);
+
+   bool success;
+
+private:
+   struct gl_shader_program *prog;
+   struct hash_table *ht;
+   void *mem_ctx;
+};
+
+#endif /* LINK_UNIFORM_BLOCK_ACTIVE_VISITOR_H */
diff --git a/icd/intel/compiler/shader/link_uniform_blocks.cpp b/icd/intel/compiler/shader/link_uniform_blocks.cpp
new file mode 100644
index 0000000..9169f94
--- /dev/null
+++ b/icd/intel/compiler/shader/link_uniform_blocks.cpp
@@ -0,0 +1,343 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "libfns.h" // LunarG ADD:
+#include "icd-utils.h" // LunarG ADD:
+#include "ir.h"
+#include "linker.h"
+#include "ir_uniform.h"
+#include "link_uniform_block_active_visitor.h"
+#include "main/hash_table.h"
+#include "program.h"
+
+namespace {
+
+class ubo_visitor : public program_resource_visitor {
+public:
+   ubo_visitor(void *mem_ctx, gl_uniform_buffer_variable *variables,
+               unsigned num_variables)
+      : index(0), offset(0), buffer_size(0), variables(variables),
+        num_variables(num_variables), mem_ctx(mem_ctx), is_array_instance(false)
+   {
+      /* empty */
+   }
+
+   void process(const glsl_type *type, const char *name)
+   {
+      this->offset = 0;
+      this->buffer_size = 0;
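+      /* Instance names of block arrays arrive as e.g. "Block[0]"; remember
+       * this so visit_field() can strip the subscript when building each
+       * member's IndexName.
+       */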
+      this->is_array_instance = strchr(name, ']') != NULL;
+      this->program_resource_visitor::process(type, name);
+   }
+
+   unsigned index;
+   unsigned offset;
+   unsigned buffer_size;
+   gl_uniform_buffer_variable *variables;
+   unsigned num_variables;
+   void *mem_ctx;
+   bool is_array_instance;
+
+private:
+   virtual void visit_field(const glsl_type *type, const char *name,
+                            bool row_major)
+   {
+      (void) type;
+      (void) name;
+      (void) row_major;
+      assert(!"Should not get here.");
+   }
+
+   virtual void visit_field(const glsl_type *type, const char *name,
+                            bool row_major, const glsl_type *record_type)
+   {
+      assert(this->index < this->num_variables);
+
+      gl_uniform_buffer_variable *v = &this->variables[this->index++];
+
+      v->Name = ralloc_strdup(mem_ctx, name);
+      v->Type = type;
+      v->RowMajor = row_major;
+
+      if (this->is_array_instance) {
+         v->IndexName = ralloc_strdup(mem_ctx, name);
+
+         char *open_bracket = strchr(v->IndexName, '[');
+         assert(open_bracket != NULL);
+
+         char *close_bracket = strchr(open_bracket, ']');
+         assert(close_bracket != NULL);
+
+         /* Length of the tail without the ']' but with the NUL.
+          */
+         unsigned len = strlen(close_bracket + 1) + 1;
+
+         memmove(open_bracket, close_bracket + 1, len);
+      } else {
+         v->IndexName = v->Name;
+      }
+
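+      /* Under std140, e.g., a float has a base alignment of 4 while a vec3
+       * or vec4 has a base alignment of 16, so a vec3 that follows a float
+       * starts at offset 16 rather than 4.
+       */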
+      const unsigned alignment = record_type
+	 ? record_type->std140_base_alignment(v->RowMajor)
+	 : type->std140_base_alignment(v->RowMajor);
+      unsigned size = type->std140_size(v->RowMajor);
+
+      this->offset = glsl_align(this->offset, alignment);
+      v->Offset = this->offset;
+      this->offset += size;
+
+      /* From the GL_ARB_uniform_buffer_object spec:
+       *
+       *     "For uniform blocks laid out according to [std140] rules, the
+       *      minimum buffer object size returned by the
+       *      UNIFORM_BLOCK_DATA_SIZE query is derived by taking the offset of
+       *      the last basic machine unit consumed by the last uniform of the
+       *      uniform block (including any end-of-array or end-of-structure
+       *      padding), adding one, and rounding up to the next multiple of
+       *      the base alignment required for a vec4."
+       */
+      this->buffer_size = glsl_align(this->offset, 16);
+   }
+
+   virtual void visit_field(const glsl_struct_field *field)
+   {
+      /* FINISHME: When support for doubles (dvec4, etc.) is added to the
+       * FINISHME: compiler, this may be incorrect for a structure in a UBO
+       * FINISHME: like struct s { struct { float f } s1; dvec4 v; };.
+       */
+      this->offset = glsl_align(this->offset,
+                                field->type->std140_base_alignment(false));
+   }
+};
+
+class count_block_size : public program_resource_visitor {
+public:
+   count_block_size() : num_active_uniforms(0)
+   {
+      /* empty */
+   }
+
+   unsigned num_active_uniforms;
+
+private:
+   virtual void visit_field(const glsl_type *type, const char *name,
+                            bool row_major)
+   {
+      (void) type;
+      (void) name;
+      (void) row_major;
+      this->num_active_uniforms++;
+   }
+};
+
+} /* anonymous namespace */
+
+struct block {
+   const glsl_type *type;
+   bool has_instance_name;
+};
+
+unsigned
+link_uniform_blocks(void *mem_ctx,
+                    struct gl_shader_program *prog,
+                    struct gl_shader **shader_list,
+                    unsigned num_shaders,
+                    struct gl_uniform_block **blocks_ret)
+{
+   /* This hash table will track all of the uniform blocks that have been
+    * encountered.  Since blocks with the same block-name must be the same,
+    * the hash is organized by block-name.
+    */
+   struct hash_table *block_hash =
+      _mesa_hash_table_create(mem_ctx, _mesa_key_string_equal);
+
+   /* Determine which uniform blocks are active.
+    */
+   link_uniform_block_active_visitor v(mem_ctx, block_hash, prog);
+   for (unsigned i = 0; i < num_shaders; i++) {
+      visit_list_elements(&v, shader_list[i]->ir);
+   }
+
+   /* Count the number of active uniform blocks.  Count the total number of
+    * active slots in those uniform blocks.
+    */
+   unsigned num_blocks = 0;
+   unsigned num_variables = 0;
+   count_block_size block_size;
+   struct hash_entry *entry;
+
+   hash_table_foreach (block_hash, entry) {
+      const struct link_uniform_block_active *const b =
+         (const struct link_uniform_block_active *) entry->data;
+
+      const glsl_type *const block_type =
+         b->type->is_array() ? b->type->fields.array : b->type;
+
+      assert((b->num_array_elements > 0) == b->type->is_array());
+
+      block_size.num_active_uniforms = 0;
+      block_size.process(block_type, "");
+
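+      /* Each active element of a block array counts as a separate block,
+       * each carrying a full copy of the block's uniforms.
+       */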
+      if (b->num_array_elements > 0) {
+         num_blocks += b->num_array_elements;
+         num_variables += b->num_array_elements
+            * block_size.num_active_uniforms;
+      } else {
+         num_blocks++;
+         num_variables += block_size.num_active_uniforms;
+      }
+   }
+
+   if (num_blocks == 0) {
+      assert(num_variables == 0);
+      _mesa_hash_table_destroy(block_hash, NULL);
+      return 0;
+   }
+
+   assert(num_variables != 0);
+
+   /* Allocate storage to hold all of the information related to uniform
+    * blocks that can be queried through the API.
+    */
+   gl_uniform_block *blocks =
+      ralloc_array(mem_ctx, gl_uniform_block, num_blocks);
+   gl_uniform_buffer_variable *variables =
+      ralloc_array(blocks, gl_uniform_buffer_variable, num_variables);
+
+   /* Add each variable from each uniform block to the API tracking
+    * structures.
+    */
+   unsigned i = 0;
+   ubo_visitor parcel(blocks, variables, num_variables);
+
+   STATIC_ASSERT(unsigned(GLSL_INTERFACE_PACKING_STD140)
+                 == unsigned(ubo_packing_std140));
+   STATIC_ASSERT(unsigned(GLSL_INTERFACE_PACKING_SHARED)
+                 == unsigned(ubo_packing_shared));
+   STATIC_ASSERT(unsigned(GLSL_INTERFACE_PACKING_PACKED)
+                 == unsigned(ubo_packing_packed));
+
+
+   hash_table_foreach (block_hash, entry) {
+      const struct link_uniform_block_active *const b =
+         (const struct link_uniform_block_active *) entry->data;
+      const glsl_type *block_type = b->type;
+
+      if (b->num_array_elements > 0) {
+         const char *const name = block_type->fields.array->name;
+
+         assert(b->has_instance_name);
+         for (unsigned j = 0; j < b->num_array_elements; j++) {
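+            /* Name each element after its index, e.g. an active block
+             * array `U' with active indices {0, 2} yields entries named
+             * "U[0]" and "U[2]".
+             */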
+            blocks[i].Name = ralloc_asprintf(blocks, "%s[%u]", name,
+                                             b->array_elements[j]);
+            blocks[i].Uniforms = &variables[parcel.index];
+
+            /* The GL_ARB_shading_language_420pack spec says:
+             *
+             *     "If the binding identifier is used with a uniform block
+             *     instanced as an array then the first element of the array
+             *     takes the specified block binding and each subsequent
+             *     element takes the next consecutive uniform block binding
+             *     point."
+             */
+            blocks[i].Binding = (b->has_binding) ? b->binding + j : 0;
+
+            blocks[i].UniformBufferSize = 0;
+            blocks[i]._Packing =
+               gl_uniform_block_packing(block_type->interface_packing);
+
+            parcel.process(block_type->fields.array,
+                           blocks[i].Name);
+
+            blocks[i].UniformBufferSize = parcel.buffer_size;
+
+            blocks[i].NumUniforms =
+               (unsigned)(ptrdiff_t)(&variables[parcel.index] - blocks[i].Uniforms);
+
+            i++;
+         }
+      } else {
+         blocks[i].Name = ralloc_strdup(blocks, block_type->name);
+         blocks[i].Uniforms = &variables[parcel.index];
+         blocks[i].Binding = (b->has_binding) ? b->binding : 0;
+         blocks[i].UniformBufferSize = 0;
+         blocks[i]._Packing =
+            gl_uniform_block_packing(block_type->interface_packing);
+
+         parcel.process(block_type,
+                        b->has_instance_name ? block_type->name : "");
+
+         blocks[i].UniformBufferSize = parcel.buffer_size;
+
+         blocks[i].NumUniforms =
+            (unsigned)(ptrdiff_t)(&variables[parcel.index] - blocks[i].Uniforms);
+
+         i++;
+      }
+   }
+
+   assert(parcel.index == num_variables);
+
+   _mesa_hash_table_destroy(block_hash, NULL);
+
+   *blocks_ret = blocks;
+   return num_blocks;
+}
+
+bool
+link_uniform_blocks_are_compatible(const gl_uniform_block *a,
+				   const gl_uniform_block *b)
+{
+   assert(strcmp(a->Name, b->Name) == 0);
+
+   /* Page 35 (page 42 of the PDF) in section 4.3.7 of the GLSL 1.50 spec says:
+    *
+    *     "Matched block names within an interface (as defined above) must
+    *     match in terms of having the same number of declarations with the
+    *     same sequence of types and the same sequence of member names, as
+    *     well as having the same member-wise layout qualification....if a
+    *     matching block is declared as an array, then the array sizes must
+    *     also match... Any mismatch will generate a link error."
+    *
+    * Arrays are not yet supported, so there is no check for that.
+    */
+   if (a->NumUniforms != b->NumUniforms)
+      return false;
+
+   if (a->_Packing != b->_Packing)
+      return false;
+
+   for (unsigned i = 0; i < a->NumUniforms; i++) {
+      if (strcmp(a->Uniforms[i].Name, b->Uniforms[i].Name) != 0)
+	 return false;
+
+      if (a->Uniforms[i].Type != b->Uniforms[i].Type)
+	 return false;
+
+      if (a->Uniforms[i].RowMajor != b->Uniforms[i].RowMajor)
+	 return false;
+   }
+
+   return true;
+}
diff --git a/icd/intel/compiler/shader/link_uniform_initializers.cpp b/icd/intel/compiler/shader/link_uniform_initializers.cpp
new file mode 100644
index 0000000..ae4cd71
--- /dev/null
+++ b/icd/intel/compiler/shader/link_uniform_initializers.cpp
@@ -0,0 +1,312 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "libfns.h" // LunarG ADD:
+#include "ir.h"
+#include "linker.h"
+#include "ir_uniform.h"
+#include "glsl_symbol_table.h"
+#include "program/hash_table.h"
+
+/* These functions are put in a "private" namespace instead of being marked
+ * static so that the unit tests can access them.  See
+ * http://code.google.com/p/googletest/wiki/AdvancedGuide#Testing_Private_Code
+ */
+namespace linker {
+
+gl_uniform_storage *
+get_storage(gl_uniform_storage *storage, unsigned num_storage,
+	    const char *name)
+{
+   for (unsigned int i = 0; i < num_storage; i++) {
+      if (strcmp(name, storage[i].name) == 0)
+	 return &storage[i];
+   }
+
+   return NULL;
+}
+
+static unsigned
+get_uniform_block_index(const gl_shader_program *shProg,
+                        const char *uniformBlockName)
+{
+   for (unsigned i = 0; i < shProg->NumUniformBlocks; i++) {
+      if (!strcmp(shProg->UniformBlocks[i].Name, uniformBlockName))
+	 return i;
+   }
+
+   return GL_INVALID_INDEX;
+}
+
+void
+copy_constant_to_storage(union gl_constant_value *storage,
+			 const ir_constant *val,
+			 const enum glsl_base_type base_type,
+			 const unsigned int elements)
+{
+   for (unsigned int i = 0; i < elements; i++) {
+      switch (base_type) {
+      case GLSL_TYPE_UINT:
+	 storage[i].u = val->value.u[i];
+	 break;
+      case GLSL_TYPE_INT:
+      case GLSL_TYPE_SAMPLER:
+	 storage[i].i = val->value.i[i];
+	 break;
+      case GLSL_TYPE_FLOAT:
+	 storage[i].f = val->value.f[i];
+	 break;
+      case GLSL_TYPE_BOOL:
+	 storage[i].b = int(val->value.b[i]);
+	 break;
+      case GLSL_TYPE_ARRAY:
+      case GLSL_TYPE_STRUCT:
+      case GLSL_TYPE_IMAGE:
+      case GLSL_TYPE_ATOMIC_UINT:
+      case GLSL_TYPE_INTERFACE:
+      case GLSL_TYPE_VOID:
+      case GLSL_TYPE_ERROR:
+	 /* All other types should have already been filtered by other
+	  * paths in the caller.
+	  */
+	 assert(!"Should not get here.");
+	 break;
+      }
+   }
+}
+
+void
+set_sampler_binding(gl_shader_program *prog, const char *name, int binding)
+{
+   struct gl_uniform_storage *const storage =
+      get_storage(prog->UniformStorage, prog->NumUserUniformStorage, name);
+
+   if (storage == NULL) {
+      assert(storage != NULL);
+      return;
+   }
+
+   const unsigned elements = MAX2(storage->array_elements, 1);
+
+   /* Section 4.4.4 (Opaque-Uniform Layout Qualifiers) of the GLSL 4.20 spec
+    * says:
+    *
+    *     "If the binding identifier is used with an array, the first element
+    *     of the array takes the specified unit and each subsequent element
+    *     takes the next consecutive unit."
+    */
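+   /* For example, "layout(binding = 2) uniform sampler2D s[3];" assigns
+    * units 2, 3, and 4.
+    */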
+   for (unsigned int i = 0; i < elements; i++) {
+      storage->storage[i].i = binding + i;
+   }
+
+   for (int sh = 0; sh < MESA_SHADER_STAGES; sh++) {
+      gl_shader *shader = prog->_LinkedShaders[sh];
+
+      if (shader && storage->sampler[sh].active) {
+         for (unsigned i = 0; i < elements; i++) {
+            unsigned index = storage->sampler[sh].index + i;
+
+            shader->SamplerUnits[index] = storage->storage[i].i;
+         }
+      }
+   }
+
+   storage->initialized = true;
+}
+
+void
+set_block_binding(gl_shader_program *prog, const char *block_name, int binding)
+{
+   const unsigned block_index = get_uniform_block_index(prog, block_name);
+
+   if (block_index == GL_INVALID_INDEX) {
+      assert(block_index != GL_INVALID_INDEX);
+      return;
+   }
+
+   /* Apply the binding to this block in every stage that uses it. */
+   for (int i = 0; i < MESA_SHADER_STAGES; i++) {
+      int stage_index = prog->UniformBlockStageIndex[i][block_index];
+
+      if (stage_index != -1) {
+         struct gl_shader *sh = prog->_LinkedShaders[i];
+         sh->UniformBlocks[stage_index].Binding = binding;
+      }
+   }
+}
+
+void
+set_uniform_initializer(void *mem_ctx, gl_shader_program *prog,
+			const char *name, const glsl_type *type,
+			ir_constant *val)
+{
+   if (type->is_record()) {
+      ir_constant *field_constant;
+
+      field_constant = (ir_constant *)val->components.get_head();
+
+      for (unsigned int i = 0; i < type->length; i++) {
+	 const glsl_type *field_type = type->fields.structure[i].type;
+	 const char *field_name = ralloc_asprintf(mem_ctx, "%s.%s", name,
+					    type->fields.structure[i].name);
+	 set_uniform_initializer(mem_ctx, prog, field_name,
+				 field_type, field_constant);
+	 field_constant = (ir_constant *)field_constant->next;
+      }
+      return;
+   } else if (type->is_array() && type->fields.array->is_record()) {
+      const glsl_type *const element_type = type->fields.array;
+
+      for (unsigned int i = 0; i < type->length; i++) {
+	 const char *element_name = ralloc_asprintf(mem_ctx, "%s[%d]", name, i);
+
+	 set_uniform_initializer(mem_ctx, prog, element_name,
+				 element_type, val->array_elements[i]);
+      }
+      return;
+   }
+
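+   /* Leaf case: `name' is now fully flattened, e.g. "s[1].a" for
+    * "uniform struct { vec4 a; } s[2];".
+    */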
+   struct gl_uniform_storage *const storage =
+      get_storage(prog->UniformStorage,
+		  prog->NumUserUniformStorage,
+		  name);
+   if (storage == NULL) {
+      assert(storage != NULL);
+      return;
+   }
+
+   if (val->type->is_array()) {
+      const enum glsl_base_type base_type =
+	 val->array_elements[0]->type->base_type;
+      const unsigned int elements = val->array_elements[0]->type->components();
+      unsigned int idx = 0;
+
+      assert(val->type->length >= storage->array_elements);
+      for (unsigned int i = 0; i < storage->array_elements; i++) {
+	 copy_constant_to_storage(& storage->storage[idx],
+				  val->array_elements[i],
+				  base_type,
+				  elements);
+
+	 idx += elements;
+      }
+   } else {
+      copy_constant_to_storage(storage->storage,
+			       val,
+			       val->type->base_type,
+			       val->type->components());
+
+      if (storage->type->is_sampler()) {
+         for (int sh = 0; sh < MESA_SHADER_STAGES; sh++) {
+            gl_shader *shader = prog->_LinkedShaders[sh];
+
+            if (shader && storage->sampler[sh].active) {
+               unsigned index = storage->sampler[sh].index;
+
+               shader->SamplerUnits[index] = storage->storage[0].i;
+            }
+         }
+      }
+   }
+
+   storage->initialized = true;
+}
+}
+
+void
+link_set_uniform_initializers(struct gl_shader_program *prog)
+{
+   void *mem_ctx = NULL;
+
+   for (unsigned int i = 0; i < MESA_SHADER_STAGES; i++) {
+      struct gl_shader *shader = prog->_LinkedShaders[i];
+
+      if (shader == NULL)
+	 continue;
+
+      foreach_list(node, shader->ir) {
+	 ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+	 if (!var || var->data.mode != ir_var_uniform)
+	    continue;
+
+	 if (!mem_ctx)
+	    mem_ctx = ralloc_context(NULL);
+
+         if (var->data.explicit_binding) {
+            const glsl_type *const type = var->type;
+
+            if (type->is_sampler()
+                || (type->is_array() && type->fields.array->is_sampler())) {
+               linker::set_sampler_binding(prog, var->name, var->data.binding);
+            } else if (var->is_in_uniform_block()) {
+               const glsl_type *const iface_type = var->get_interface_type();
+
+               /* If the variable is an array and it is an interface instance,
+                * we need to set the binding for each array element.  Just
+                * checking that the variable is an array is not sufficient.
+                * The variable could be an array inside a uniform block
+                * that lacks an instance name.  For example:
+                *
+                *     uniform U {
+                *         float f[4];
+                *     };
+                *
+                * In this case "f" would pass is_in_uniform_block (above) and
+                * type->is_array(), but it will fail is_interface_instance().
+                */
+               if (var->is_interface_instance() && var->type->is_array()) {
+                  for (unsigned i = 0; i < var->type->length; i++) {
+                     const char *name =
+                        ralloc_asprintf(mem_ctx, "%s[%u]", iface_type->name, i);
+
+                     /* Section 4.4.3 (Uniform Block Layout Qualifiers) of the
+                      * GLSL 4.20 spec says:
+                      *
+                      *     "If the binding identifier is used with a uniform
+                      *     block instanced as an array then the first element
+                      *     of the array takes the specified block binding and
+                      *     each subsequent element takes the next consecutive
+                      *     uniform block binding point."
+                      */
+                     linker::set_block_binding(prog, name,
+                                               var->data.binding + i);
+                  }
+               } else {
+                  linker::set_block_binding(prog, iface_type->name,
+                                            var->data.binding);
+               }
+            } else if (type->contains_atomic()) {
+               /* we don't actually need to do anything. */
+            } else {
+               assert(!"Explicit binding not on a sampler, UBO or atomic.");
+            }
+         } else if (var->constant_value) {
+            linker::set_uniform_initializer(mem_ctx, prog, var->name,
+                                            var->type, var->constant_value);
+         }
+      }
+   }
+
+   ralloc_free(mem_ctx);
+}
diff --git a/icd/intel/compiler/shader/link_uniforms.cpp b/icd/intel/compiler/shader/link_uniforms.cpp
new file mode 100644
index 0000000..eae87ab
--- /dev/null
+++ b/icd/intel/compiler/shader/link_uniforms.cpp
@@ -0,0 +1,959 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "libfns.h" // LunarG ADD:
+#include "icd-utils.h" // LunarG: ADD
+#include "ir.h"
+#include "linker.h"
+#include "ir_uniform.h"
+#include "glsl_symbol_table.h"
+#include "program/hash_table.h"
+#include "program.h"
+
+/**
+ * \file link_uniforms.cpp
+ * Assign locations for GLSL uniforms.
+ *
+ * \author Ian Romanick <ian.d.romanick@intel.com>
+ */
+
+/**
+ * Count the backing storage requirements for a type
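+ *
+ * A sampler counts as one slot, an array of N samplers as N, and any other
+ * type as one slot per component (e.g. 16 for a mat4).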
+ */
+static unsigned
+values_for_type(const glsl_type *type)
+{
+   if (type->is_sampler()) {
+      return 1;
+   } else if (type->is_array() && type->fields.array->is_sampler()) {
+      return type->array_size();
+   } else {
+      return type->component_slots();
+   }
+}
+
+void
+program_resource_visitor::process(const glsl_type *type, const char *name)
+{
+   assert(type->is_record()
+          || (type->is_array() && type->fields.array->is_record())
+          || type->is_interface()
+          || (type->is_array() && type->fields.array->is_interface()));
+
+   char *name_copy = ralloc_strdup(NULL, name);
+   recursion(type, &name_copy, strlen(name), false, NULL);
+   ralloc_free(name_copy);
+}
+
+void
+program_resource_visitor::process(ir_variable *var)
+{
+   const glsl_type *t = var->type;
+
+   /* false is always passed for the row_major parameter to the other
+    * processing functions because no information is available to do
+    * otherwise.  See the warning in linker.h.
+    */
+
+   /* Only strdup the name if we actually will need to modify it. */
+   if (var->data.from_named_ifc_block_array) {
+      /* lower_named_interface_blocks created this variable by lowering an
+       * interface block array to an array variable.  For example if the
+       * original source code was:
+       *
+       *     out Blk { vec4 bar } foo[3];
+       *
+       * Then the variable is now:
+       *
+       *     out vec4 bar[3];
+       *
+       * We need to visit each array element using the names constructed like
+       * so:
+       *
+       *     Blk[0].bar
+       *     Blk[1].bar
+       *     Blk[2].bar
+       */
+      assert(t->is_array());
+      const glsl_type *ifc_type = var->get_interface_type();
+      char *name = ralloc_strdup(NULL, ifc_type->name);
+      size_t name_length = strlen(name);
+      for (unsigned i = 0; i < t->length; i++) {
+         size_t new_length = name_length;
+         ralloc_asprintf_rewrite_tail(&name, &new_length, "[%u].%s", i,
+                                      var->name);
+         /* Note: row_major is only meaningful for uniform blocks, and
+          * lowering is only applied to non-uniform interface blocks, so we
+          * can safely pass false for row_major.
+          */
+         recursion(var->type, &name, new_length, false, NULL);
+      }
+      ralloc_free(name);
+   } else if (var->data.from_named_ifc_block_nonarray) {
+      /* lower_named_interface_blocks created this variable by lowering a
+       * named interface block (non-array) to an ordinary variable.  For
+       * example if the original source code was:
+       *
+       *     out Blk { vec4 bar } foo;
+       *
+       * Then the variable is now:
+       *
+       *     out vec4 bar;
+       *
+       * We need to visit this variable using the name:
+       *
+       *     Blk.bar
+       */
+      const glsl_type *ifc_type = var->get_interface_type();
+      char *name = ralloc_asprintf(NULL, "%s.%s", ifc_type->name, var->name);
+      /* Note: row_major is only meaningful for uniform blocks, and lowering
+       * is only applied to non-uniform interface blocks, so we can safely
+       * pass false for row_major.
+       */
+      recursion(var->type, &name, strlen(name), false, NULL);
+      ralloc_free(name);
+   } else if (t->is_record() || (t->is_array() && t->fields.array->is_record())) {
+      char *name = ralloc_strdup(NULL, var->name);
+      recursion(var->type, &name, strlen(name), false, NULL);
+      ralloc_free(name);
+   } else if (t->is_interface()) {
+      char *name = ralloc_strdup(NULL, var->type->name);
+      recursion(var->type, &name, strlen(name), false, NULL);
+      ralloc_free(name);
+   } else if (t->is_array() && t->fields.array->is_interface()) {
+      char *name = ralloc_strdup(NULL, var->type->fields.array->name);
+      recursion(var->type, &name, strlen(name), false, NULL);
+      ralloc_free(name);
+   } else {
+      this->visit_field(t, var->name, false, NULL);
+   }
+}
+
+void
+program_resource_visitor::recursion(const glsl_type *t, char **name,
+                                    size_t name_length, bool row_major,
+                                    const glsl_type *record_type)
+{
+   /* Records need to have each field processed individually.
+    *
+    * Arrays of records need to have each array element processed
+    * individually, then each field of the resulting array elements processed
+    * individually.
+    */
+   if (t->is_record() || t->is_interface()) {
+      if (record_type == NULL && t->is_record())
+         record_type = t;
+
+      for (unsigned i = 0; i < t->length; i++) {
+	 const char *field = t->fields.structure[i].name;
+	 size_t new_length = name_length;
+
+         if (t->fields.structure[i].type->is_record())
+            this->visit_field(&t->fields.structure[i]);
+
+         /* Append '.field' to the current variable name. */
+         if (name_length == 0) {
+            ralloc_asprintf_rewrite_tail(name, &new_length, "%s", field);
+         } else {
+            ralloc_asprintf_rewrite_tail(name, &new_length, ".%s", field);
+         }
+
+         recursion(t->fields.structure[i].type, name, new_length,
+                   t->fields.structure[i].row_major, record_type);
+
+         /* Only the first leaf-field of the record gets called with the
+          * record type pointer.
+          */
+         record_type = NULL;
+      }
+   } else if (t->is_array() && (t->fields.array->is_record()
+                                || t->fields.array->is_interface())) {
+      if (record_type == NULL && t->fields.array->is_record())
+         record_type = t->fields.array;
+
+      for (unsigned i = 0; i < t->length; i++) {
+	 size_t new_length = name_length;
+
+	 /* Append the subscript to the current variable name */
+	 ralloc_asprintf_rewrite_tail(name, &new_length, "[%u]", i);
+
+         recursion(t->fields.array, name, new_length, row_major,
+                   record_type);
+
+         /* Only the first leaf-field of the record gets called with the
+          * record type pointer.
+          */
+         record_type = NULL;
+      }
+   } else {
+      this->visit_field(t, *name, row_major, record_type);
+   }
+}
+
+void
+program_resource_visitor::visit_field(const glsl_type *type, const char *name,
+                                      bool row_major,
+                                      const glsl_type *)
+{
+   visit_field(type, name, row_major);
+}
+
+void
+program_resource_visitor::visit_field(const glsl_struct_field *field)
+{
+   (void) field;
+   /* empty */
+}
+
+namespace {
+
+/**
+ * Class to help calculate the storage requirements for a set of uniforms
+ *
+ * As uniforms are added to the active set the number of active uniforms and
+ * the storage requirements for those uniforms are accumulated.  The active
+ * uniforms are added to the hash table supplied to the constructor.
+ *
+ * If the same uniform is added multiple times (i.e., once for each shader
+ * target), it will only be accounted once.
+ */
+class count_uniform_size : public program_resource_visitor {
+public:
+   count_uniform_size(struct string_to_uint_map *map)
+      : num_active_uniforms(0), num_values(0), num_shader_samplers(0),
+        num_shader_images(0), num_shader_uniform_components(0),
+        is_ubo_var(false), map(map)
+   {
+      /* empty */
+   }
+
+   void start_shader()
+   {
+      this->num_shader_samplers = 0;
+      this->num_shader_images = 0;
+      this->num_shader_uniform_components = 0;
+   }
+
+   void process(ir_variable *var)
+   {
+      this->is_ubo_var = var->is_in_uniform_block();
+      if (var->is_interface_instance())
+         program_resource_visitor::process(var->get_interface_type(),
+                                           var->get_interface_type()->name);
+      else
+         program_resource_visitor::process(var);
+   }
+
+   /**
+    * Total number of active uniforms counted
+    */
+   unsigned num_active_uniforms;
+
+   /**
+    * Number of data values required to back the storage for the active uniforms
+    */
+   unsigned num_values;
+
+   /**
+    * Number of samplers used
+    */
+   unsigned num_shader_samplers;
+
+   /**
+    * Number of images used
+    */
+   unsigned num_shader_images;
+
+   /**
+    * Number of uniforms used in the current shader
+    */
+   unsigned num_shader_uniform_components;
+
+   bool is_ubo_var;
+
+private:
+   virtual void visit_field(const glsl_type *type, const char *name,
+                            bool row_major)
+   {
+      assert(!type->is_record());
+      assert(!(type->is_array() && type->fields.array->is_record()));
+      assert(!type->is_interface());
+      assert(!(type->is_array() && type->fields.array->is_interface()));
+
+      (void) row_major;
+
+      /* Count the number of samplers regardless of whether the uniform is
+       * already in the hash table.  The hash table prevents adding the same
+       * uniform for multiple shader targets, but in this case we want to
+       * count it for each shader target.
+       */
+      const unsigned values = values_for_type(type);
+      if (type->contains_sampler()) {
+	 this->num_shader_samplers +=
+	    type->is_array() ? type->array_size() : 1;
+      } else if (type->contains_image()) {
+         this->num_shader_images += values;
+
+         /* As drivers are likely to represent image uniforms as
+          * scalar indices, count them against the limit of uniform
+          * components in the default block.  The spec allows image
+          * uniforms to use up no more than one scalar slot.
+          */
+         this->num_shader_uniform_components += values;
+      } else {
+	 /* Accumulate the total number of uniform slots used by this shader.
+	  * Note that samplers do not count against this limit because they
+	  * don't use any storage on current hardware.
+	  */
+	 if (!is_ubo_var)
+	    this->num_shader_uniform_components += values;
+      }
+
+      /* If the uniform is already in the map, there's nothing more to do.
+       */
+      unsigned id;
+      if (this->map->get(id, name))
+	 return;
+
+      this->map->put(this->num_active_uniforms, name);
+
+      /* Each leaf uniform occupies one entry in the list of active
+       * uniforms.
+       */
+      this->num_active_uniforms++;
+      this->num_values += values;
+   }
+
+   struct string_to_uint_map *map;
+};
+
+} /* anonymous namespace */
+
+/**
+ * Class to help parcel out pieces of backing storage to uniforms
+ *
+ * Each uniform processed has some range of the \c gl_constant_value
+ * structures associated with it.  The association is done by finding
+ * the uniform in the \c string_to_uint_map and using the value from
+ * the map to connect that slot in the \c gl_uniform_storage table
+ * with the next available slot in the \c gl_constant_value array.
+ *
+ * \warning
+ * This class assumes that every uniform that will be processed is
+ * already in the \c string_to_uint_map.  In addition, it assumes that
+ * the \c gl_uniform_storage and \c gl_constant_value arrays are "big
+ * enough."
+ */
+class parcel_out_uniform_storage : public program_resource_visitor {
+public:
+   parcel_out_uniform_storage(struct string_to_uint_map *map,
+			      struct gl_uniform_storage *uniforms,
+			      union gl_constant_value *values)
+      : map(map), uniforms(uniforms), values(values)
+   {
+   }
+
+   void start_shader(gl_shader_stage shader_type)
+   {
+      assert(shader_type < MESA_SHADER_STAGES);
+      this->shader_type = shader_type;
+
+      this->shader_samplers_used = 0;
+      this->shader_shadow_samplers = 0;
+      this->next_sampler = 0;
+      this->next_image = 0;
+      memset(this->targets, 0, sizeof(this->targets));
+   }
+
+   void set_and_process(struct gl_shader_program *prog,
+			ir_variable *var)
+   {
+      ubo_block_index = -1;
+      if (var->is_in_uniform_block()) {
+         if (var->is_interface_instance() && var->type->is_array()) {
+            unsigned l = strlen(var->get_interface_type()->name);
+
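+            /* Block arrays are listed per element ("Blk[0]", "Blk[1]",
+             * ...), so match the block name up to the opening bracket.
+             */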
+            for (unsigned i = 0; i < prog->NumUniformBlocks; i++) {
+               if (strncmp(var->get_interface_type()->name,
+                           prog->UniformBlocks[i].Name,
+                           l) == 0
+                   && prog->UniformBlocks[i].Name[l] == '[') {
+                  ubo_block_index = i;
+                  break;
+               }
+            }
+         } else {
+            for (unsigned i = 0; i < prog->NumUniformBlocks; i++) {
+               if (strcmp(var->get_interface_type()->name,
+                          prog->UniformBlocks[i].Name) == 0) {
+                  ubo_block_index = i;
+                  break;
+               }
+	    }
+	 }
+	 assert(ubo_block_index != -1);
+
+         /* Uniform blocks that were specified with an instance name must be
+          * handled a little bit differently.  The name of the variable is the
+          * name used to reference the uniform block instead of being the name
+          * of a variable within the block.  Therefore, searching for the name
+          * within the block will fail.
+          */
+         if (var->is_interface_instance()) {
+            ubo_byte_offset = 0;
+            ubo_row_major = false;
+         } else {
+            const struct gl_uniform_block *const block =
+               &prog->UniformBlocks[ubo_block_index];
+
+            assert(var->data.location != -1);
+
+            const struct gl_uniform_buffer_variable *const ubo_var =
+               &block->Uniforms[var->data.location];
+
+            ubo_row_major = ubo_var->RowMajor;
+            ubo_byte_offset = ubo_var->Offset;
+         }
+
+         if (var->is_interface_instance())
+            process(var->get_interface_type(),
+                    var->get_interface_type()->name);
+         else
+            process(var);
+      } else
+         process(var);
+   }
+
+   int ubo_block_index;
+   int ubo_byte_offset;
+   bool ubo_row_major;
+   gl_shader_stage shader_type;
+
+private:
+   void handle_samplers(const glsl_type *base_type,
+                        struct gl_uniform_storage *uniform)
+   {
+      if (base_type->is_sampler()) {
+         uniform->sampler[shader_type].index = this->next_sampler;
+         uniform->sampler[shader_type].active = true;
+
+         /* Increment the sampler by 1 for non-arrays and by the number of
+          * array elements for arrays.
+          */
+         this->next_sampler +=
+               MAX2(1, uniform->array_elements);
+
+         const gl_texture_index target = base_type->sampler_index();
+         const unsigned shadow = base_type->sampler_shadow;
+         for (unsigned i = uniform->sampler[shader_type].index;
+              i < MIN2(this->next_sampler, MAX_SAMPLERS);
+              i++) {
+            this->targets[i] = target;
+            this->shader_samplers_used |= 1U << i;
+            this->shader_shadow_samplers |= shadow << i;
+         }
+      } else {
+         uniform->sampler[shader_type].index = ~0;
+         uniform->sampler[shader_type].active = false;
+      }
+   }
+
+   void handle_images(const glsl_type *base_type,
+                      struct gl_uniform_storage *uniform)
+   {
+      if (base_type->is_image()) {
+         uniform->image[shader_type].index = this->next_image;
+         uniform->image[shader_type].active = true;
+
+         /* Increment the image index by 1 for non-arrays and by the
+          * number of array elements for arrays.
+          */
+         this->next_image += MAX2(1, uniform->array_elements);
+
+      } else {
+         uniform->image[shader_type].index = ~0;
+         uniform->image[shader_type].active = false;
+      }
+   }
+
+   virtual void visit_field(const glsl_type *type, const char *name,
+                            bool row_major)
+   {
+      (void) type;
+      (void) name;
+      (void) row_major;
+      assert(!"Should not get here.");
+   }
+
+   virtual void visit_field(const glsl_type *type, const char *name,
+                            bool row_major, const glsl_type *record_type)
+   {
+      assert(!type->is_record());
+      assert(!(type->is_array() && type->fields.array->is_record()));
+      assert(!type->is_interface());
+      assert(!(type->is_array() && type->fields.array->is_interface()));
+
+      (void) row_major;
+
+      unsigned id;
+      bool found = this->map->get(id, name);
+      assert(found);
+
+      if (!found)
+	 return;
+
+      const glsl_type *base_type;
+      if (type->is_array()) {
+	 this->uniforms[id].array_elements = type->length;
+	 base_type = type->fields.array;
+      } else {
+	 this->uniforms[id].array_elements = 0;
+	 base_type = type;
+      }
+
+      /* This assigns uniform indices to sampler and image uniforms. */
+      handle_samplers(base_type, &this->uniforms[id]);
+      handle_images(base_type, &this->uniforms[id]);
+
+      /* If there is already storage associated with this uniform, it means
+       * that it was set while processing an earlier shader stage.  For
+       * example, we may be processing the uniform in the fragment shader, but
+       * the uniform was already processed in the vertex shader.
+       */
+      if (this->uniforms[id].storage != NULL) {
+         return;
+      }
+
+      this->uniforms[id].name = ralloc_strdup(this->uniforms, name);
+      this->uniforms[id].type = base_type;
+      this->uniforms[id].initialized = 0;
+      this->uniforms[id].num_driver_storage = 0;
+      this->uniforms[id].driver_storage = NULL;
+      this->uniforms[id].storage = this->values;
+      this->uniforms[id].atomic_buffer_index = -1;
+      if (this->ubo_block_index != -1) {
+	 this->uniforms[id].block_index = this->ubo_block_index;
+
+	 const unsigned alignment = record_type
+	    ? record_type->std140_base_alignment(ubo_row_major)
+	    : type->std140_base_alignment(ubo_row_major);
+	 this->ubo_byte_offset = glsl_align(this->ubo_byte_offset, alignment);
+	 this->uniforms[id].offset = this->ubo_byte_offset;
+	 this->ubo_byte_offset += type->std140_size(ubo_row_major);
+
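+	 /* std140 rounds each array element up to a vec4 slot, so e.g. a
+	  * float[4] member has an array stride of 16.
+	  */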
+	 if (type->is_array()) {
+	    this->uniforms[id].array_stride =
+	       glsl_align(type->fields.array->std140_size(ubo_row_major), 16);
+	 } else {
+	    this->uniforms[id].array_stride = 0;
+	 }
+
+	 if (type->is_matrix() ||
+	     (type->is_array() && type->fields.array->is_matrix())) {
+	    this->uniforms[id].matrix_stride = 16;
+	    this->uniforms[id].row_major = ubo_row_major;
+	 } else {
+	    this->uniforms[id].matrix_stride = 0;
+	    this->uniforms[id].row_major = false;
+	 }
+      } else {
+	 this->uniforms[id].block_index = -1;
+	 this->uniforms[id].offset = -1;
+	 this->uniforms[id].array_stride = -1;
+	 this->uniforms[id].matrix_stride = -1;
+	 this->uniforms[id].row_major = false;
+      }
+
+      this->values += values_for_type(type);
+   }
+
+   struct string_to_uint_map *map;
+
+   struct gl_uniform_storage *uniforms;
+   unsigned next_sampler;
+   unsigned next_image;
+
+public:
+   union gl_constant_value *values;
+
+   gl_texture_index targets[MAX_SAMPLERS];
+
+   /**
+    * Mask of samplers used by the current shader stage.
+    */
+   unsigned shader_samplers_used;
+
+   /**
+    * Mask of samplers used by the current shader stage for shadows.
+    */
+   unsigned shader_shadow_samplers;
+
+   bool isVK;
+};
+
+/**
+ * Merges a uniform block into an array of uniform blocks that may or
+ * may not already contain a copy of it.
+ *
+ * Returns the index of the new block in the array.
+ */
+int
+link_cross_validate_uniform_block(void *mem_ctx,
+				  struct gl_uniform_block **linked_blocks,
+				  unsigned int *num_linked_blocks,
+				  struct gl_uniform_block *new_block)
+{
+   for (unsigned int i = 0; i < *num_linked_blocks; i++) {
+      struct gl_uniform_block *old_block = &(*linked_blocks)[i];
+
+      if (strcmp(old_block->Name, new_block->Name) == 0)
+	 return link_uniform_blocks_are_compatible(old_block, new_block)
+	    ? i : -1;
+   }
+
+   *linked_blocks = reralloc(mem_ctx, *linked_blocks,
+			     struct gl_uniform_block,
+			     *num_linked_blocks + 1);
+   int linked_block_index = (*num_linked_blocks)++;
+   struct gl_uniform_block *linked_block = &(*linked_blocks)[linked_block_index];
+
+   memcpy(linked_block, new_block, sizeof(*new_block));
+   linked_block->Uniforms = ralloc_array(*linked_blocks,
+					 struct gl_uniform_buffer_variable,
+					 linked_block->NumUniforms);
+
+   memcpy(linked_block->Uniforms,
+	  new_block->Uniforms,
+	  sizeof(*linked_block->Uniforms) * linked_block->NumUniforms);
+
+   for (unsigned int i = 0; i < linked_block->NumUniforms; i++) {
+      struct gl_uniform_buffer_variable *ubo_var =
+	 &linked_block->Uniforms[i];
+
+      if (ubo_var->Name == ubo_var->IndexName) {
+         ubo_var->Name = ralloc_strdup(*linked_blocks, ubo_var->Name);
+         ubo_var->IndexName = ubo_var->Name;
+      } else {
+         ubo_var->Name = ralloc_strdup(*linked_blocks, ubo_var->Name);
+         ubo_var->IndexName = ralloc_strdup(*linked_blocks, ubo_var->IndexName);
+      }
+   }
+
+   return linked_block_index;
+}
+
+/**
+ * Walks the IR and update the references to uniform blocks in the
+ * ir_variables to point at linked shader's list (previously, they
+ * would point at the uniform block list in one of the pre-linked
+ * shaders).
+ */
+static void
+link_update_uniform_buffer_variables(struct gl_shader *shader)
+{
+   foreach_list(node, shader->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if ((var == NULL) || !var->is_in_uniform_block())
+	 continue;
+
+      assert(var->data.mode == ir_var_uniform);
+
+      if (var->is_interface_instance()) {
+         var->data.location = 0;
+         continue;
+      }
+
+      bool found = false;
+      char sentinel = '\0';
+
+      if (var->type->is_record()) {
+         sentinel = '.';
+      } else if (var->type->is_array()
+                 && var->type->fields.array->is_record()) {
+         sentinel = '[';
+      }
+
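+      /* The sentinel marks where the variable's own name ends inside each
+       * block member name: struct members appear as "var.field" and
+       * elements of an array of structs as "var[i].field".
+       */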
+      const unsigned l = strlen(var->name);
+      for (unsigned i = 0; i < shader->NumUniformBlocks; i++) {
+	 for (unsigned j = 0; j < shader->UniformBlocks[i].NumUniforms; j++) {
+            if (sentinel) {
+               const char *begin = shader->UniformBlocks[i].Uniforms[j].Name;
+               const char *end = strchr(begin, sentinel);
+
+               if (end == NULL)
+                  continue;
+
+               if (l != (end - begin))
+                  continue;
+
+               if (strncmp(var->name, begin, l) == 0) {
+                  found = true;
+                  var->data.location = j;
+                  break;
+               }
+            } else if (!strcmp(var->name,
+                               shader->UniformBlocks[i].Uniforms[j].Name)) {
+	       found = true;
+	       var->data.location = j;
+	       break;
+	    }
+	 }
+	 if (found)
+	    break;
+      }
+      assert(found);
+   }
+}
+
+void
+link_assign_uniform_block_offsets(struct gl_shader *shader)
+{
+   for (unsigned b = 0; b < shader->NumUniformBlocks; b++) {
+      struct gl_uniform_block *block = &shader->UniformBlocks[b];
+
+      unsigned offset = 0;
+      for (unsigned int i = 0; i < block->NumUniforms; i++) {
+	 struct gl_uniform_buffer_variable *ubo_var = &block->Uniforms[i];
+	 const struct glsl_type *type = ubo_var->Type;
+
+	 unsigned alignment = type->std140_base_alignment(ubo_var->RowMajor);
+	 unsigned size = type->std140_size(ubo_var->RowMajor);
+
+	 offset = glsl_align(offset, alignment);
+	 ubo_var->Offset = offset;
+	 offset += size;
+      }
+
+      /* From the GL_ARB_uniform_buffer_object spec:
+       *
+       *     "For uniform blocks laid out according to [std140] rules,
+       *      the minimum buffer object size returned by the
+       *      UNIFORM_BLOCK_DATA_SIZE query is derived by taking the
+       *      offset of the last basic machine unit consumed by the
+       *      last uniform of the uniform block (including any
+       *      end-of-array or end-of-structure padding), adding one,
+       *      and rounding up to the next multiple of the base
+       *      alignment required for a vec4."
+       */
+      block->UniformBufferSize = glsl_align(offset, 16);
+   }
+}
+
+/**
+ * Scan the program for image uniforms and store image unit access
+ * information into the gl_shader data structure.
+ */
+static void
+link_set_image_access_qualifiers(struct gl_shader_program *prog)
+{
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      gl_shader *sh = prog->_LinkedShaders[i];
+
+      if (sh == NULL)
+	 continue;
+
+      foreach_list(node, sh->ir) {
+	 ir_variable *var = ((ir_instruction *) node)->as_variable();
+
+         if (var && var->data.mode == ir_var_uniform &&
+             var->type->contains_image()) {
+            unsigned id = 0;
+            bool found = prog->UniformHash->get(id, var->name);
+            assert(found);
+            (void) found;
+            const gl_uniform_storage *storage = &prog->UniformStorage[id];
+            const unsigned index = storage->image[i].index;
+            const GLenum access = (var->data.image.read_only ? GL_READ_ONLY :
+                                   var->data.image.write_only ? GL_WRITE_ONLY :
+                                   GL_READ_WRITE);
+
+            for (unsigned j = 0; j < MAX2(1, storage->array_elements); ++j)
+               sh->ImageAccess[index + j] = access;
+         }
+      }
+   }
+}
+
+void
+link_assign_uniform_locations(struct gl_shader_program *prog)
+{
+   ralloc_free(prog->UniformStorage);
+   prog->UniformStorage = NULL;
+   prog->NumUserUniformStorage = 0;
+
+   ralloc_free(prog->UniformRemapTable);
+   prog->UniformRemapTable = NULL;
+   prog->NumUniformRemapTable = 0;
+
+   if (prog->UniformHash != NULL) {
+      prog->UniformHash->clear();
+   } else {
+      prog->UniformHash = new string_to_uint_map;
+   }
+
+   /* First pass: Count the uniform resources used by the user-defined
+    * uniforms.  While this happens, each active uniform will have an index
+    * assigned to it.
+    *
+    * Note: this is *NOT* the index that is returned to the application by
+    * glGetUniformLocation.
+    */
+   count_uniform_size uniform_size(prog->UniformHash);
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      struct gl_shader *sh = prog->_LinkedShaders[i];
+
+      if (sh == NULL)
+	 continue;
+
+      /* Uniforms that lack an initializer in the shader code have an initial
+       * value of zero.  This includes sampler uniforms.
+       *
+       * Page 24 (page 30 of the PDF) of the GLSL 1.20 spec says:
+       *
+       *     "The link time initial value is either the value of the variable's
+       *     initializer, if present, or 0 if no initializer is present. Sampler
+       *     types cannot have initializers."
+       */
+      memset(sh->SamplerUnits, 0, sizeof(sh->SamplerUnits));
+      memset(sh->ImageUnits, 0, sizeof(sh->ImageUnits));
+
+      link_update_uniform_buffer_variables(sh);
+
+      /* Reset various per-shader target counts.
+       */
+      uniform_size.start_shader();
+
+      foreach_list(node, sh->ir) {
+	 ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+	 if ((var == NULL) || (var->data.mode != ir_var_uniform))
+	    continue;
+
+	 /* FINISHME: Update code to process built-in uniforms!
+	  */
+	 if (strncmp("gl_", var->name, 3) == 0) {
+	    uniform_size.num_shader_uniform_components +=
+	       var->type->component_slots();
+	    continue;
+	 }
+
+	 uniform_size.process(var);
+      }
+
+      sh->num_samplers = uniform_size.num_shader_samplers;
+      sh->NumImages = uniform_size.num_shader_images;
+      sh->num_uniform_components = uniform_size.num_shader_uniform_components;
+
+      sh->num_combined_uniform_components = sh->num_uniform_components;
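+      /* UBO storage also counts against the combined limit, at one
+       * component per four bytes of buffer size.
+       */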
+      for (unsigned i = 0; i < sh->NumUniformBlocks; i++) {
+	 sh->num_combined_uniform_components +=
+	    sh->UniformBlocks[i].UniformBufferSize / 4;
+      }
+   }
+
+   const unsigned num_user_uniforms = uniform_size.num_active_uniforms;
+   const unsigned num_data_slots = uniform_size.num_values;
+
+   /* On the outside chance that there were no uniforms, bail out.
+    */
+   if (num_user_uniforms == 0)
+      return;
+
+   struct gl_uniform_storage *uniforms =
+      rzalloc_array(prog, struct gl_uniform_storage, num_user_uniforms);
+   union gl_constant_value *data =
+      rzalloc_array(uniforms, union gl_constant_value, num_data_slots);
+#ifndef NDEBUG
+   union gl_constant_value *data_end = &data[num_data_slots];
+#endif
+
+   parcel_out_uniform_storage parcel(prog->UniformHash, uniforms, data);
+
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      if (prog->_LinkedShaders[i] == NULL)
+	 continue;
+
+      parcel.start_shader((gl_shader_stage)i);
+
+      foreach_list(node, prog->_LinkedShaders[i]->ir) {
+	 ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+	 if ((var == NULL) || (var->data.mode != ir_var_uniform))
+	    continue;
+
+	 /* FINISHME: Update code to process built-in uniforms!
+	  */
+	 if (strncmp("gl_", var->name, 3) == 0)
+	    continue;
+
+	 parcel.set_and_process(prog, var);
+      }
+
+      prog->_LinkedShaders[i]->active_samplers = parcel.shader_samplers_used;
+      prog->_LinkedShaders[i]->shadow_samplers = parcel.shader_shadow_samplers;
+
+      STATIC_ASSERT(sizeof(prog->_LinkedShaders[i]->SamplerTargets) == sizeof(parcel.targets));
+      memcpy(prog->_LinkedShaders[i]->SamplerTargets, parcel.targets,
+             sizeof(prog->_LinkedShaders[i]->SamplerTargets));
+   }
+
+   /* Build the uniform remap table that is used to set/get uniform locations */
+   for (unsigned i = 0; i < num_user_uniforms; i++) {
+
+      /* how many new entries for this uniform? */
+      const unsigned entries = MAX2(1, uniforms[i].array_elements);
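+
+      /* e.g., "uniform vec4 u[3];" adds three consecutive entries, so the
+       * location of u[2] is the uniform's base remap location plus 2.
+       */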
+
+      /* resize remap table to fit new entries */
+      prog->UniformRemapTable =
+         reralloc(prog,
+                  prog->UniformRemapTable,
+                  gl_uniform_storage *,
+                  prog->NumUniformRemapTable + entries);
+
+      /* set pointers for this uniform */
+      for (unsigned j = 0; j < entries; j++)
+         prog->UniformRemapTable[prog->NumUniformRemapTable+j] = &uniforms[i];
+
+      /* set the base location in remap table for the uniform */
+      uniforms[i].remap_location = prog->NumUniformRemapTable;
+
+      prog->NumUniformRemapTable += entries;
+   }
+
+#ifndef NDEBUG
+   for (unsigned i = 0; i < num_user_uniforms; i++) {
+      assert(uniforms[i].storage != NULL);
+   }
+
+   assert(parcel.values == data_end);
+#endif
+
+   prog->NumUserUniformStorage = num_user_uniforms;
+   prog->UniformStorage = uniforms;
+
+   link_set_image_access_qualifiers(prog);
+   link_set_uniform_initializers(prog);
+
+   return;
+}
diff --git a/icd/intel/compiler/shader/link_varyings.cpp b/icd/intel/compiler/shader/link_varyings.cpp
new file mode 100644
index 0000000..2543045
--- /dev/null
+++ b/icd/intel/compiler/shader/link_varyings.cpp
@@ -0,0 +1,1531 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file link_varyings.cpp
+ *
+ * Linker functions related specifically to linking varyings between shader
+ * stages.
+ */
+
+
+#include "icd-utils.h" // LunarG: ADD
+#include "main/mtypes.h"
+#include "glsl_symbol_table.h"
+#include "glsl_parser_extras.h"
+#include "ir_optimization.h"
+#include "linker.h"
+#include "link_varyings.h"
+#include "main/macros.h"
+#include "program/hash_table.h"
+#include "program.h"
+
+
+/**
+ * Validate the types and qualifiers of an output from one stage against the
+ * matching input to another stage.
+ */
+static void
+cross_validate_types_and_qualifiers(struct gl_shader_program *prog,
+                                    const ir_variable *input,
+                                    const ir_variable *output,
+                                    gl_shader_stage consumer_stage,
+                                    gl_shader_stage producer_stage)
+{
+   /* Check that the types match between stages.
+    */
+   const glsl_type *type_to_match = input->type;
+   if (consumer_stage == MESA_SHADER_GEOMETRY) {
+      assert(type_to_match->is_array()); /* Enforced by ast_to_hir */
+      type_to_match = type_to_match->element_type();
+   }
+   if (type_to_match != output->type) {
+      /* There is a bit of a special case for gl_TexCoord.  This
+       * built-in is unsized by default.  Applications that access it
+       * with a variable index must redeclare it with a size.  There is
+       * some language in the GLSL spec that implies the fragment shader
+       * and vertex shader do not have to agree on this size.  Other
+       * drivers behave this way, and one or two applications seem to
+       * rely on it.
+       *
+       * Neither declaration needs to be modified here because the array
+       * sizes are fixed later when update_array_sizes is called.
+       *
+       * From page 48 (page 54 of the PDF) of the GLSL 1.10 spec:
+       *
+       *     "Unlike user-defined varying variables, the built-in
+       *     varying variables don't have a strict one-to-one
+       *     correspondence between the vertex language and the
+       *     fragment language."
+       */
+      if (!output->type->is_array()
+          || (strncmp("gl_", output->name, 3) != 0)) {
+         linker_error(prog,
+                      "%s shader output `%s' declared as type `%s', "
+                      "but %s shader input declared as type `%s'\n",
+                      _mesa_shader_stage_to_string(producer_stage),
+                      output->name,
+                      output->type->name,
+                      _mesa_shader_stage_to_string(consumer_stage),
+                      input->type->name);
+         return;
+      }
+   }
+
+   /* Check that all of the qualifiers match between stages.
+    */
+   if (input->data.centroid != output->data.centroid) {
+      linker_error(prog,
+                   "%s shader output `%s' %s centroid qualifier, "
+                   "but %s shader input %s centroid qualifier\n",
+                   _mesa_shader_stage_to_string(producer_stage),
+                   output->name,
+                   (output->data.centroid) ? "has" : "lacks",
+                   _mesa_shader_stage_to_string(consumer_stage),
+                   (input->data.centroid) ? "has" : "lacks");
+      return;
+   }
+
+   if (input->data.sample != output->data.sample) {
+      linker_error(prog,
+                   "%s shader output `%s' %s sample qualifier, "
+                   "but %s shader input %s sample qualifier\n",
+                   _mesa_shader_stage_to_string(producer_stage),
+                   output->name,
+                   (output->data.sample) ? "has" : "lacks",
+                   _mesa_shader_stage_to_string(consumer_stage),
+                   (input->data.sample) ? "has" : "lacks");
+      return;
+   }
+
+   if (input->data.invariant != output->data.invariant) {
+      linker_error(prog,
+                   "%s shader output `%s' %s invariant qualifier, "
+                   "but %s shader input %s invariant qualifier\n",
+                   _mesa_shader_stage_to_string(producer_stage),
+                   output->name,
+                   (output->data.invariant) ? "has" : "lacks",
+                   _mesa_shader_stage_to_string(consumer_stage),
+                   (input->data.invariant) ? "has" : "lacks");
+      return;
+   }
+
+   if (input->data.interpolation != output->data.interpolation) {
+      linker_error(prog,
+                   "%s shader output `%s' specifies %s "
+                   "interpolation qualifier, "
+                   "but %s shader input specifies %s "
+                   "interpolation qualifier\n",
+                   _mesa_shader_stage_to_string(producer_stage),
+                   output->name,
+                   interpolation_string(output->data.interpolation),
+                   _mesa_shader_stage_to_string(consumer_stage),
+                   interpolation_string(input->data.interpolation));
+      return;
+   }
+}
+
+/**
+ * Validate front and back color outputs against single color input
+ */
+static void
+cross_validate_front_and_back_color(struct gl_shader_program *prog,
+                                    const ir_variable *input,
+                                    const ir_variable *front_color,
+                                    const ir_variable *back_color,
+                                    gl_shader_stage consumer_stage,
+                                    gl_shader_stage producer_stage)
+{
+   if (front_color != NULL && front_color->data.assigned)
+      cross_validate_types_and_qualifiers(prog, input, front_color,
+                                          consumer_stage, producer_stage);
+
+   if (back_color != NULL && back_color->data.assigned)
+      cross_validate_types_and_qualifiers(prog, input, back_color,
+                                          consumer_stage, producer_stage);
+}
+
+/**
+ * Validate that outputs from one stage match inputs of another
+ */
+void
+cross_validate_outputs_to_inputs(struct gl_shader_program *prog,
+				 gl_shader *producer, gl_shader *consumer)
+{
+   glsl_symbol_table parameters;
+   ir_variable *explicit_locations[MAX_VARYING] = { NULL, };
+
+   /* Find all shader outputs in the "producer" stage.
+    */
+   foreach_list(node, producer->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if ((var == NULL) || (var->data.mode != ir_var_shader_out))
+	 continue;
+
+      if (!var->data.explicit_location
+          || var->data.location < VARYING_SLOT_VAR0)
+         parameters.add_variable(var);
+      else {
+         /* User-defined varyings with explicit locations are handled
+          * differently because they do not need to have matching names.
+          */
+         const unsigned idx = var->data.location - VARYING_SLOT_VAR0;
+
+         if (explicit_locations[idx] != NULL) {
+            linker_error(prog,
+                         "%s shader has multiple outputs explicitly "
+                         "assigned to location %d\n",
+                         _mesa_shader_stage_to_string(producer->Stage),
+                         idx);
+            return;
+         }
+
+         explicit_locations[idx] = var;
+      }
+   }
+
+
+   /* Find all shader inputs in the "consumer" stage.  Any variables that have
+    * matching outputs already in the symbol table must have the same type and
+    * qualifiers.
+    *
+    * Exception: if the consumer is the geometry shader, then the inputs
+    * should be arrays and the type of the array element should match the type
+    * of the corresponding producer output.
+    */
+   foreach_list(node, consumer->ir) {
+      ir_variable *const input = ((ir_instruction *) node)->as_variable();
+
+      if ((input == NULL) || (input->data.mode != ir_var_shader_in))
+	 continue;
+
+      if (strcmp(input->name, "gl_Color") == 0 && input->data.used) {
+         const ir_variable *const front_color =
+            parameters.get_variable("gl_FrontColor");
+
+         const ir_variable *const back_color =
+            parameters.get_variable("gl_BackColor");
+
+         cross_validate_front_and_back_color(prog, input,
+                                             front_color, back_color,
+                                             consumer->Stage, producer->Stage);
+      } else if (strcmp(input->name, "gl_SecondaryColor") == 0 && input->data.used) {
+         const ir_variable *const front_color =
+            parameters.get_variable("gl_FrontSecondaryColor");
+
+         const ir_variable *const back_color =
+            parameters.get_variable("gl_BackSecondaryColor");
+
+         cross_validate_front_and_back_color(prog, input,
+                                             front_color, back_color,
+                                             consumer->Stage, producer->Stage);
+      } else {
+         /* The rules for connecting inputs and outputs change in the presence
+          * of explicit locations.  In this case, we no longer care about the
+          * names of the variables.  Instead, we care only about the
+          * explicitly assigned location.
+          */
+         ir_variable *output = NULL;
+         if (input->data.explicit_location
+             && input->data.location >= VARYING_SLOT_VAR0) {
+            output = explicit_locations[input->data.location - VARYING_SLOT_VAR0];
+
+            if (output == NULL) {
+               linker_error(prog,
+                            "%s shader input `%s' with explicit location "
+                            "has no matching output\n",
+                            _mesa_shader_stage_to_string(consumer->Stage),
+                            input->name);
+            }
+         } else {
+            output = parameters.get_variable(input->name);
+         }
+
+         if (output != NULL) {
+            cross_validate_types_and_qualifiers(prog, input, output,
+                                                consumer->Stage, producer->Stage);
+         }
+      }
+   }
+}
+
+
+/**
+ * Initialize this object based on a string that was passed to
+ * glTransformFeedbackVaryings.
+ *
+ * If the input is malformed, this call still succeeds, but it sets
+ * this->var_name to a malformed input, so tfeedback_decl::find_candidate()
+ * will fail to find any matching variable.
+ */
+void
+tfeedback_decl::init(struct gl_context *ctx, const void *mem_ctx,
+                     const char *input)
+{
+   /* We don't have to be pedantic about what is a valid GLSL variable name,
+    * because any variable with an invalid name can't exist in the IR anyway.
+    */
+
+   this->location = -1;
+   this->orig_name = input;
+   this->is_clip_distance_mesa = false;
+   this->skip_components = 0;
+   this->next_buffer_separator = false;
+   this->matched_candidate = NULL;
+
+   if (ctx->Extensions.ARB_transform_feedback3) {
+      /* Parse gl_NextBuffer. */
+      if (strcmp(input, "gl_NextBuffer") == 0) {
+         this->next_buffer_separator = true;
+         return;
+      }
+
+      /* Parse gl_SkipComponents. */
+      if (strcmp(input, "gl_SkipComponents1") == 0)
+         this->skip_components = 1;
+      else if (strcmp(input, "gl_SkipComponents2") == 0)
+         this->skip_components = 2;
+      else if (strcmp(input, "gl_SkipComponents3") == 0)
+         this->skip_components = 3;
+      else if (strcmp(input, "gl_SkipComponents4") == 0)
+         this->skip_components = 4;
+
+      if (this->skip_components)
+         return;
+   }
+
+   /* Parse a declaration. */
+   const char *base_name_end;
+   long subscript = parse_program_resource_name(input, &base_name_end);
+   this->var_name = ralloc_strndup(mem_ctx, input, base_name_end - input);
+   if (subscript >= 0) {
+      this->array_subscript = subscript;
+      this->is_subscripted = true;
+   } else {
+      this->is_subscripted = false;
+   }
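+   /* Illustrative example (hypothetical input): "palette[2]" parses to
+    * var_name "palette" with array_subscript == 2 and is_subscripted set;
+    * a plain "palette" leaves is_subscripted false.
+    */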
+
+   /* For drivers that lower gl_ClipDistance to gl_ClipDistanceMESA, this
+    * class must behave specially to account for the fact that gl_ClipDistance
+    * is converted from a float[8] to a vec4[2].
+    */
+   if (ctx->ShaderCompilerOptions[MESA_SHADER_VERTEX].LowerClipDistance &&
+       strcmp(this->var_name, "gl_ClipDistance") == 0) {
+      this->is_clip_distance_mesa = true;
+   }
+}
+
+
+/**
+ * Determine whether two tfeedback_decl objects refer to the same variable and
+ * array index (if applicable).
+ */
+bool
+tfeedback_decl::is_same(const tfeedback_decl &x, const tfeedback_decl &y)
+{
+   assert(x.is_varying() && y.is_varying());
+
+   if (strcmp(x.var_name, y.var_name) != 0)
+      return false;
+   if (x.is_subscripted != y.is_subscripted)
+      return false;
+   if (x.is_subscripted && x.array_subscript != y.array_subscript)
+      return false;
+   return true;
+}
+
+
+/**
+ * Assign a location for this tfeedback_decl object based on the transform
+ * feedback candidate found by find_candidate.
+ *
+ * If an error occurs, the error is reported through linker_error() and false
+ * is returned.
+ */
+bool
+tfeedback_decl::assign_location(struct gl_context *ctx,
+                                struct gl_shader_program *prog)
+{
+   assert(this->is_varying());
+
+   unsigned fine_location
+      = this->matched_candidate->toplevel_var->data.location * 4
+      + this->matched_candidate->toplevel_var->data.location_frac
+      + this->matched_candidate->offset;
+
+   if (this->matched_candidate->type->is_array()) {
+      /* Array variable */
+      const unsigned matrix_cols =
+         this->matched_candidate->type->fields.array->matrix_columns;
+      const unsigned vector_elements =
+         this->matched_candidate->type->fields.array->vector_elements;
+      unsigned actual_array_size = this->is_clip_distance_mesa ?
+         prog->LastClipDistanceArraySize :
+         this->matched_candidate->type->array_size();
+
+      if (this->is_subscripted) {
+         /* Check array bounds. */
+         if (this->array_subscript >= actual_array_size) {
+            linker_error(prog, "Transform feedback varying %s has index "
+                         "%i, but the array size is %u.",
+                         this->orig_name, this->array_subscript,
+                         actual_array_size);
+            return false;
+         }
+         unsigned array_elem_size = this->is_clip_distance_mesa ?
+            1 : vector_elements * matrix_cols;
+         fine_location += array_elem_size * this->array_subscript;
+         this->size = 1;
+      } else {
+         this->size = actual_array_size;
+      }
+      this->vector_elements = vector_elements;
+      this->matrix_columns = matrix_cols;
+      if (this->is_clip_distance_mesa)
+         this->type = GL_FLOAT;
+      else
+         this->type = this->matched_candidate->type->fields.array->gl_type;
+   } else {
+      /* Regular variable (scalar, vector, or matrix) */
+      if (this->is_subscripted) {
+         linker_error(prog, "Transform feedback varying %s requested, "
+                      "but %s is not an array.",
+                      this->orig_name, this->var_name);
+         return false;
+      }
+      this->size = 1;
+      this->vector_elements = this->matched_candidate->type->vector_elements;
+      this->matrix_columns = this->matched_candidate->type->matrix_columns;
+      this->type = this->matched_candidate->type->gl_type;
+   }
+   this->location = fine_location / 4;
+   this->location_frac = fine_location % 4;
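+   /* Worked example (assumed values): a matched candidate whose toplevel
+    * variable sits at data.location 10 with location_frac 1 and offset 4
+    * gives fine_location == 10 * 4 + 1 + 4 == 45, which decodes to
+    * location 11 and location_frac 1.
+    */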
+
+   /* From GL_EXT_transform_feedback:
+    *   A program will fail to link if:
+    *
+    *   * the total number of components to capture in any varying
+    *     variable in <varyings> is greater than the constant
+    *     MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS_EXT and the
+    *     buffer mode is SEPARATE_ATTRIBS_EXT;
+    */
+   if (prog->TransformFeedback.BufferMode == GL_SEPARATE_ATTRIBS &&
+       this->num_components() >
+       ctx->Const.MaxTransformFeedbackSeparateComponents) {
+      linker_error(prog, "Transform feedback varying %s exceeds "
+                   "MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS.",
+                   this->orig_name);
+      return false;
+   }
+
+   return true;
+}
+
+
+unsigned
+tfeedback_decl::get_num_outputs() const
+{
+   if (!this->is_varying()) {
+      return 0;
+   }
+
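+   /* Round up to the number of 4-component slots spanned.  Worked example:
+    * a vec3 (3 components) starting at location_frac 2 covers components
+    * 2..4, so (3 + 2 + 3) / 4 == 2 output registers.
+    */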
+   return (this->num_components() + this->location_frac + 3)/4;
+}
+
+
+/**
+ * Update gl_transform_feedback_info to reflect this tfeedback_decl.
+ *
+ * If an error occurs, the error is reported through linker_error() and false
+ * is returned.
+ */
+bool
+tfeedback_decl::store(struct gl_context *ctx, struct gl_shader_program *prog,
+                      struct gl_transform_feedback_info *info,
+                      unsigned buffer, const unsigned max_outputs) const
+{
+   assert(!this->next_buffer_separator);
+
+   /* Handle gl_SkipComponents. */
+   if (this->skip_components) {
+      info->BufferStride[buffer] += this->skip_components;
+      return true;
+   }
+
+   /* From GL_EXT_transform_feedback:
+    *   A program will fail to link if:
+    *
+    *     * the total number of components to capture is greater than
+    *       the constant MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS_EXT
+    *       and the buffer mode is INTERLEAVED_ATTRIBS_EXT.
+    */
+   if (prog->TransformFeedback.BufferMode == GL_INTERLEAVED_ATTRIBS &&
+       info->BufferStride[buffer] + this->num_components() >
+       ctx->Const.MaxTransformFeedbackInterleavedComponents) {
+      linker_error(prog, "The MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS "
+                   "limit has been exceeded.");
+      return false;
+   }
+
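+   /* The loop below splits the capture into at most one output per slot.
+    * Illustrative example: a vec3 at location_frac 2 is emitted as two
+    * outputs of 2 and 1 components, advancing BufferStride[buffer] by 3.
+    */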
+   unsigned location = this->location;
+   unsigned location_frac = this->location_frac;
+   unsigned num_components = this->num_components();
+   while (num_components > 0) {
+      unsigned output_size = MIN2(num_components, 4 - location_frac);
+      assert(info->NumOutputs < max_outputs);
+      info->Outputs[info->NumOutputs].ComponentOffset = location_frac;
+      info->Outputs[info->NumOutputs].OutputRegister = location;
+      info->Outputs[info->NumOutputs].NumComponents = output_size;
+      info->Outputs[info->NumOutputs].OutputBuffer = buffer;
+      info->Outputs[info->NumOutputs].DstOffset = info->BufferStride[buffer];
+      ++info->NumOutputs;
+      info->BufferStride[buffer] += output_size;
+      num_components -= output_size;
+      location++;
+      location_frac = 0;
+   }
+
+   info->Varyings[info->NumVarying].Name = ralloc_strdup(prog, this->orig_name);
+   info->Varyings[info->NumVarying].Type = this->type;
+   info->Varyings[info->NumVarying].Size = this->size;
+   info->NumVarying++;
+
+   return true;
+}
+
+
+const tfeedback_candidate *
+tfeedback_decl::find_candidate(gl_shader_program *prog,
+                               hash_table *tfeedback_candidates)
+{
+   const char *name = this->is_clip_distance_mesa
+      ? "gl_ClipDistanceMESA" : this->var_name;
+   this->matched_candidate = (const tfeedback_candidate *)
+      hash_table_find(tfeedback_candidates, name);
+   if (!this->matched_candidate) {
+      /* From GL_EXT_transform_feedback:
+       *   A program will fail to link if:
+       *
+       *   * any variable name specified in the <varyings> array is not
+       *     declared as an output in the geometry shader (if present) or
+       *     the vertex shader (if no geometry shader is present);
+       */
+      linker_error(prog, "Transform feedback varying %s undeclared.",
+                   this->orig_name);
+   }
+   return this->matched_candidate;
+}
+
+
+/**
+ * Parse all the transform feedback declarations that were passed to
+ * glTransformFeedbackVaryings() and store them in tfeedback_decl objects.
+ *
+ * If an error occurs, the error is reported through linker_error() and false
+ * is returned.
+ */
+bool
+parse_tfeedback_decls(struct gl_context *ctx, struct gl_shader_program *prog,
+                      const void *mem_ctx, unsigned num_names,
+                      char **varying_names, tfeedback_decl *decls)
+{
+   for (unsigned i = 0; i < num_names; ++i) {
+      decls[i].init(ctx, mem_ctx, varying_names[i]);
+
+      if (!decls[i].is_varying())
+         continue;
+
+      /* From GL_EXT_transform_feedback:
+       *   A program will fail to link if:
+       *
+       *   * any two entries in the <varyings> array specify the same varying
+       *     variable;
+       *
+       * We interpret this to mean "any two entries in the <varyings> array
+       * specify the same varying variable and array index", since transform
+       * feedback of arrays would be useless otherwise.
+       */
+      for (unsigned j = 0; j < i; ++j) {
+         if (!decls[j].is_varying())
+            continue;
+
+         if (tfeedback_decl::is_same(decls[i], decls[j])) {
+            linker_error(prog, "Transform feedback varying %s specified "
+                         "more than once.", varying_names[i]);
+            return false;
+         }
+      }
+   }
+   return true;
+}
+
+
+/**
+ * Store transform feedback location assignments into
+ * prog->LinkedTransformFeedback based on the data stored in tfeedback_decls.
+ *
+ * If an error occurs, the error is reported through linker_error() and false
+ * is returned.
+ */
+bool
+store_tfeedback_info(struct gl_context *ctx, struct gl_shader_program *prog,
+                     unsigned num_tfeedback_decls,
+                     tfeedback_decl *tfeedback_decls)
+{
+   bool separate_attribs_mode =
+      prog->TransformFeedback.BufferMode == GL_SEPARATE_ATTRIBS;
+
+   ralloc_free(prog->LinkedTransformFeedback.Varyings);
+   ralloc_free(prog->LinkedTransformFeedback.Outputs);
+
+   memset(&prog->LinkedTransformFeedback, 0,
+          sizeof(prog->LinkedTransformFeedback));
+
+   prog->LinkedTransformFeedback.Varyings =
+      rzalloc_array(prog,
+		    struct gl_transform_feedback_varying_info,
+		    num_tfeedback_decls);
+
+   unsigned num_outputs = 0;
+   for (unsigned i = 0; i < num_tfeedback_decls; ++i)
+      num_outputs += tfeedback_decls[i].get_num_outputs();
+
+   prog->LinkedTransformFeedback.Outputs =
+      rzalloc_array(prog,
+                    struct gl_transform_feedback_output,
+                    num_outputs);
+
+   unsigned num_buffers = 0;
+
+   if (separate_attribs_mode) {
+      /* GL_SEPARATE_ATTRIBS */
+      for (unsigned i = 0; i < num_tfeedback_decls; ++i) {
+         if (!tfeedback_decls[i].store(ctx, prog, &prog->LinkedTransformFeedback,
+                                       num_buffers, num_outputs))
+            return false;
+
+         num_buffers++;
+      }
+   }
+   else {
+      /* GL_INTERLEAVED_ATTRIBS */
+      for (unsigned i = 0; i < num_tfeedback_decls; ++i) {
+         if (tfeedback_decls[i].is_next_buffer_separator()) {
+            num_buffers++;
+            continue;
+         }
+
+         if (!tfeedback_decls[i].store(ctx, prog,
+                                       &prog->LinkedTransformFeedback,
+                                       num_buffers, num_outputs))
+            return false;
+      }
+      num_buffers++;
+   }
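+   /* Illustrative example (hypothetical varyings): in interleaved mode,
+    * {"a", "gl_NextBuffer", "b"} stores "a" into buffer 0 and "b" into
+    * buffer 1; the trailing increment above then makes num_buffers == 2.
+    */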
+
+   assert(prog->LinkedTransformFeedback.NumOutputs == num_outputs);
+
+   prog->LinkedTransformFeedback.NumBuffers = num_buffers;
+   return true;
+}
+
+namespace {
+
+/**
+ * Data structure recording the relationship between outputs of one shader
+ * stage (the "producer") and inputs of another (the "consumer").
+ */
+class varying_matches
+{
+public:
+   varying_matches(bool disable_varying_packing, bool consumer_is_fs);
+   ~varying_matches();
+   void record(ir_variable *producer_var, ir_variable *consumer_var);
+   unsigned assign_locations();
+   void store_locations() const;
+
+private:
+   /**
+    * If true, this driver disables varying packing, so all varyings need to
+    * be aligned on slot boundaries, and take up a number of slots equal to
+    * their number of matrix columns times their array size.
+    */
+   const bool disable_varying_packing;
+
+   /**
+    * Enum representing the order in which varyings are packed within a
+    * packing class.
+    *
+    * Currently we pack vec4's first, then vec2's, then scalar values, then
+    * vec3's.  This order ensures that the only vectors that are at risk of
+    * having to be "double parked" (split between two adjacent varying slots)
+    * are the vec3's.
+    */
+   enum packing_order_enum {
+      PACKING_ORDER_VEC4,
+      PACKING_ORDER_VEC2,
+      PACKING_ORDER_SCALAR,
+      PACKING_ORDER_VEC3,
+   };
+
+   static unsigned compute_packing_class(const ir_variable *var);
+   static packing_order_enum compute_packing_order(const ir_variable *var);
+   static int match_comparator(const void *x_generic, const void *y_generic);
+
+   /**
+    * Structure recording the relationship between a single producer output
+    * and a single consumer input.
+    */
+   struct match {
+      /**
+       * Packing class for this varying, computed by compute_packing_class().
+       */
+      unsigned packing_class;
+
+      /**
+       * Packing order for this varying, computed by compute_packing_order().
+       */
+      packing_order_enum packing_order;
+      unsigned num_components;
+
+      /**
+       * The output variable in the producer stage.
+       */
+      ir_variable *producer_var;
+
+      /**
+       * The input variable in the consumer stage.
+       */
+      ir_variable *consumer_var;
+
+      /**
+       * The location which has been assigned for this varying.  This is
+       * expressed in multiples of a float, with the first generic varying
+       * (i.e. the one referred to by VARYING_SLOT_VAR0) represented by the
+       * value 0.
+       */
+      unsigned generic_location;
+   } *matches;
+
+   /**
+    * The number of elements in the \c matches array that are currently in
+    * use.
+    */
+   unsigned num_matches;
+
+   /**
+    * The number of elements that were set aside for the \c matches array when
+    * it was allocated.
+    */
+   unsigned matches_capacity;
+
+   const bool consumer_is_fs;
+};
+
+} /* anonymous namespace */
+
+varying_matches::varying_matches(bool disable_varying_packing,
+                                 bool consumer_is_fs)
+   : disable_varying_packing(disable_varying_packing),
+     consumer_is_fs(consumer_is_fs)
+{
+   /* Note: this initial capacity is rather arbitrarily chosen to be large
+    * enough for many cases without wasting an unreasonable amount of space.
+    * varying_matches::record() will resize the array if there are more than
+    * this number of varyings.
+    */
+   this->matches_capacity = 8;
+   this->matches = (match *)
+      malloc(sizeof(*this->matches) * this->matches_capacity);
+   this->num_matches = 0;
+}
+
+
+varying_matches::~varying_matches()
+{
+   free(this->matches);
+}
+
+
+/**
+ * Record the given producer/consumer variable pair in the list of variables
+ * that should later be assigned locations.
+ *
+ * It is permissible for \c consumer_var to be NULL (this happens if a
+ * variable is output by the producer and consumed by transform feedback, but
+ * not consumed by the consumer).
+ *
+ * If \c producer_var has already been paired up with a consumer_var, or
+ * producer_var is part of fixed pipeline functionality (and hence already has
+ * a location assigned), this function has no effect.
+ *
+ * Note: as a side effect this function may change the interpolation type of
+ * \c producer_var, but only when the change couldn't possibly affect
+ * rendering.
+ */
+void
+varying_matches::record(ir_variable *producer_var, ir_variable *consumer_var)
+{
+   assert(producer_var != NULL || consumer_var != NULL);
+
+   if ((producer_var && !producer_var->data.is_unmatched_generic_inout)
+       || (consumer_var && !consumer_var->data.is_unmatched_generic_inout)) {
+      /* Either a location already exists for this variable (since it is part
+       * of fixed functionality), or it has already been recorded as part of a
+       * previous match.
+       */
+      return;
+   }
+
+   if ((consumer_var == NULL && producer_var->type->contains_integer()) ||
+       !consumer_is_fs) {
+      /* Either this varying is not consumed by the fragment shader (so its
+       * interpolation type cannot possibly affect rendering), or it has no
+       * consumer at all and is (or contains) an integer.
+       *
+       * lower_packed_varyings requires all integer varyings to be flat,
+       * regardless of where they appear.  We can trivially satisfy that
+       * requirement by changing the interpolation type to flat here.
+       */
+      producer_var->data.centroid = false;
+      producer_var->data.sample = false;
+      producer_var->data.interpolation = INTERP_QUALIFIER_FLAT;
+
+      if (consumer_var) {
+         consumer_var->data.centroid = false;
+         consumer_var->data.sample = false;
+         consumer_var->data.interpolation = INTERP_QUALIFIER_FLAT;
+      }
+   }
+
+   if (this->num_matches == this->matches_capacity) {
+      this->matches_capacity *= 2;
+      this->matches = (match *)
+         realloc(this->matches,
+                 sizeof(*this->matches) * this->matches_capacity);
+   }
+
+   const ir_variable *const var = (producer_var != NULL)
+      ? producer_var : consumer_var;
+
+   this->matches[this->num_matches].packing_class
+      = this->compute_packing_class(var);
+   this->matches[this->num_matches].packing_order
+      = this->compute_packing_order(var);
+   if (this->disable_varying_packing) {
+      unsigned slots = var->type->is_array()
+         ? (var->type->length * var->type->fields.array->matrix_columns)
+         : var->type->matrix_columns;
+      this->matches[this->num_matches].num_components = 4 * slots;
+   } else {
+      this->matches[this->num_matches].num_components
+         = var->type->component_slots();
+   }
+   this->matches[this->num_matches].producer_var = producer_var;
+   this->matches[this->num_matches].consumer_var = consumer_var;
+   this->num_matches++;
+   if (producer_var)
+      producer_var->data.is_unmatched_generic_inout = 0;
+   if (consumer_var)
+      consumer_var->data.is_unmatched_generic_inout = 0;
+}
+
+
+/**
+ * Choose locations for all of the variable matches that were previously
+ * passed to varying_matches::record().
+ */
+unsigned
+varying_matches::assign_locations()
+{
+   /* Sort varying matches into an order that makes them easy to pack. */
+   qsort(this->matches, this->num_matches, sizeof(*this->matches),
+         &varying_matches::match_comparator);
+
+   unsigned generic_location = 0;
+
+   for (unsigned i = 0; i < this->num_matches; i++) {
+      /* Advance to the next slot if this varying has a different packing
+       * class than the previous one, and we're not already on a slot
+       * boundary.
+       */
+      if (i > 0 &&
+          this->matches[i - 1].packing_class
+          != this->matches[i].packing_class) {
+         generic_location = ALIGN(generic_location, 4);
+      }
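+      /* Example: if the previous packing class ended at component 5, the
+       * next class starts at ALIGN(5, 4) == 8, i.e. on a fresh slot.
+       */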
+
+      this->matches[i].generic_location = generic_location;
+
+      generic_location += this->matches[i].num_components;
+   }
+
+   return (generic_location + 3) / 4;
+}
+
+
+/**
+ * Update the producer and consumer shaders to reflect the locations
+ * assignments that were made by varying_matches::assign_locations().
+ */
+void
+varying_matches::store_locations() const
+{
+   for (unsigned i = 0; i < this->num_matches; i++) {
+      ir_variable *producer_var = this->matches[i].producer_var;
+      ir_variable *consumer_var = this->matches[i].consumer_var;
+      unsigned generic_location = this->matches[i].generic_location;
+      unsigned slot = generic_location / 4;
+      unsigned offset = generic_location % 4;
+
+      if (producer_var) {
+         producer_var->data.location = VARYING_SLOT_VAR0 + slot;
+         producer_var->data.location_frac = offset;
+      }
+
+      if (consumer_var) {
+         assert(consumer_var->data.location == -1);
+         consumer_var->data.location = VARYING_SLOT_VAR0 + slot;
+         consumer_var->data.location_frac = offset;
+      }
+   }
+}
+
+
+/**
+ * Compute the "packing class" of the given varying.  This is an unsigned
+ * integer with the property that two variables in the same packing class can
+ * be safely packed into the same vec4.
+ */
+unsigned
+varying_matches::compute_packing_class(const ir_variable *var)
+{
+   /* Without help from the back-end, there is no way to pack together
+    * variables with different interpolation types, because
+    * lower_packed_varyings must choose exactly one interpolation type for
+    * each packed varying it creates.
+    *
+    * However, we can safely pack together floats, ints, and uints, because:
+    *
+    * - varyings of base type "int" and "uint" must use the "flat"
+    *   interpolation type, which can only occur in GLSL 1.30 and above.
+    *
+    * - On platforms that support GLSL 1.30 and above, lower_packed_varyings
+    *   can store flat floats as ints without losing any information (using
+    *   the ir_unop_bitcast_* opcodes).
+    *
+    * Therefore, the packing class depends only on the interpolation type.
+    */
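+   /* Illustrative encoding (assuming Mesa's glsl_interp_qualifier values,
+    * e.g. INTERP_QUALIFIER_FLAT == 2): a sample-qualified flat varying
+    * (centroid 0, sample 1) yields (0 | 1 << 1) * 4 + 2 == 10.
+    */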
+   unsigned packing_class = var->data.centroid | (var->data.sample << 1);
+   packing_class *= 4;
+   packing_class += var->data.interpolation;
+   return packing_class;
+}
+
+
+/**
+ * Compute the "packing order" of the given varying.  This is a sort key we
+ * use to determine when to attempt to pack the given varying relative to
+ * other varyings in the same packing class.
+ */
+varying_matches::packing_order_enum
+varying_matches::compute_packing_order(const ir_variable *var)
+{
+   const glsl_type *element_type = var->type;
+
+   while (element_type->base_type == GLSL_TYPE_ARRAY) {
+      element_type = element_type->fields.array;
+   }
+
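+   /* Example: for `vec3 v[4]` the element type is vec3, so
+    * component_slots() % 4 == 3 and the order is PACKING_ORDER_VEC3;
+    * a mat2 element occupies 4 slots and maps to PACKING_ORDER_VEC4.
+    */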
+   switch (element_type->component_slots() % 4) {
+   case 1: return PACKING_ORDER_SCALAR;
+   case 2: return PACKING_ORDER_VEC2;
+   case 3: return PACKING_ORDER_VEC3;
+   case 0: return PACKING_ORDER_VEC4;
+   default:
+      assert(!"Unexpected value of vector_elements");
+      return PACKING_ORDER_VEC4;
+   }
+}
+
+
+/**
+ * Comparison function passed to qsort() to sort varyings by packing_class and
+ * then by packing_order.
+ */
+int
+varying_matches::match_comparator(const void *x_generic, const void *y_generic)
+{
+   const match *x = (const match *) x_generic;
+   const match *y = (const match *) y_generic;
+
+   if (x->packing_class != y->packing_class)
+      return x->packing_class - y->packing_class;
+   return x->packing_order - y->packing_order;
+}
+
+
+/**
+ * Is the given variable a varying variable to be counted against the
+ * limit in ctx->Const.MaxVarying?
+ * This includes variables such as texcoords, colors and generic
+ * varyings, but excludes variables such as gl_FrontFacing and gl_FragCoord.
+ */
+static bool
+is_varying_var(gl_shader_stage stage, const ir_variable *var)
+{
+   /* Only fragment shaders will take a varying variable as an input */
+   if (stage == MESA_SHADER_FRAGMENT &&
+       var->data.mode == ir_var_shader_in) {
+      switch (var->data.location) {
+      case VARYING_SLOT_POS:
+      case VARYING_SLOT_FACE:
+      case VARYING_SLOT_PNTC:
+         return false;
+      default:
+         return true;
+      }
+   }
+   return false;
+}
+
+
+/**
+ * Visitor class that generates tfeedback_candidate structs describing all
+ * possible targets of transform feedback.
+ *
+ * tfeedback_candidate structs are stored in the hash table
+ * tfeedback_candidates, which is passed to the constructor.  This hash table
+ * maps varying names to instances of the tfeedback_candidate struct.
+ */
+class tfeedback_candidate_generator : public program_resource_visitor
+{
+public:
+   tfeedback_candidate_generator(void *mem_ctx,
+                                 hash_table *tfeedback_candidates)
+      : mem_ctx(mem_ctx),
+        tfeedback_candidates(tfeedback_candidates),
+        toplevel_var(NULL),
+        varying_floats(0)
+   {
+   }
+
+   void process(ir_variable *var)
+   {
+      this->toplevel_var = var;
+      this->varying_floats = 0;
+      if (var->is_interface_instance())
+         program_resource_visitor::process(var->get_interface_type(),
+                                           var->get_interface_type()->name);
+      else
+         program_resource_visitor::process(var);
+   }
+
+private:
+   virtual void visit_field(const glsl_type *type, const char *name,
+                            bool row_major)
+   {
+      assert(!type->is_record());
+      assert(!(type->is_array() && type->fields.array->is_record()));
+      assert(!type->is_interface());
+      assert(!(type->is_array() && type->fields.array->is_interface()));
+
+      (void) row_major;
+
+      tfeedback_candidate *candidate
+         = rzalloc(this->mem_ctx, tfeedback_candidate);
+      candidate->toplevel_var = this->toplevel_var;
+      candidate->type = type;
+      candidate->offset = this->varying_floats;
+      hash_table_insert(this->tfeedback_candidates, candidate,
+                        ralloc_strdup(this->mem_ctx, name));
+      this->varying_floats += type->component_slots();
+   }
+
+   /**
+    * Memory context used to allocate hash table keys and values.
+    */
+   void * const mem_ctx;
+
+   /**
+    * Hash table in which tfeedback_candidate objects should be stored.
+    */
+   hash_table * const tfeedback_candidates;
+
+   /**
+    * Pointer to the toplevel variable that is being traversed.
+    */
+   ir_variable *toplevel_var;
+
+   /**
+    * Total number of varying floats that have been visited so far.  This is
+    * used to determine the offset to each varying within the toplevel
+    * variable.
+    */
+   unsigned varying_floats;
+};
+
+
+namespace linker {
+
+bool
+populate_consumer_input_sets(void *mem_ctx, exec_list *ir,
+                             hash_table *consumer_inputs,
+                             hash_table *consumer_interface_inputs,
+                             ir_variable *consumer_inputs_with_locations[VARYING_SLOT_MAX])
+{
+   memset(consumer_inputs_with_locations,
+          0,
+          sizeof(consumer_inputs_with_locations[0]) * VARYING_SLOT_MAX);
+
+   foreach_list(node, ir) {
+      ir_variable *const input_var = ((ir_instruction *) node)->as_variable();
+
+      if ((input_var != NULL) && (input_var->data.mode == ir_var_shader_in)) {
+         if (input_var->type->is_interface())
+            return false;
+
+         if (input_var->data.explicit_location) {
+            /* assign_varying_locations only cares about finding the
+             * ir_variable at the start of a contiguous location block.
+             *
+             *     - For !producer, consumer_inputs_with_locations isn't used.
+             *
+             *     - For !consumer, consumer_inputs_with_locations is empty.
+             *
+             * For consumer && producer, if you were trying to set some
+             * ir_variable to the middle of a location block on the other side
+             * of producer/consumer, cross_validate_outputs_to_inputs() should
+             * be link-erroring due to either type mismatch or location
+             * overlaps.  If the variables do match up, then they've got a
+             * matching data.location and you only looked at
+             * consumer_inputs_with_locations[var->data.location], not any
+             * following entries for the array/structure.
+             */
+            consumer_inputs_with_locations[input_var->data.location] =
+               input_var;
+         } else if (input_var->get_interface_type() != NULL) {
+            char *const iface_field_name =
+               ralloc_asprintf(mem_ctx, "%s.%s",
+                               input_var->get_interface_type()->name,
+                               input_var->name);
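+            /* Hypothetical example: an input in interface block `Light`
+             * named `dir` is keyed as "Light.dir", the same key that
+             * get_matching_input() builds for the matching output.
+             */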
+            hash_table_insert(consumer_interface_inputs, input_var,
+                              iface_field_name);
+         } else {
+            hash_table_insert(consumer_inputs, input_var,
+                              ralloc_strdup(mem_ctx, input_var->name));
+         }
+      }
+   }
+
+   return true;
+}
+
+/**
+ * Find a variable from the consumer that "matches" the specified variable
+ *
+ * This function only finds inputs with names that match.  There is no
+ * validation (here) that the types, etc. are compatible.
+ */
+ir_variable *
+get_matching_input(void *mem_ctx,
+                   const ir_variable *output_var,
+                   hash_table *consumer_inputs,
+                   hash_table *consumer_interface_inputs,
+                   ir_variable *consumer_inputs_with_locations[VARYING_SLOT_MAX])
+{
+   ir_variable *input_var;
+
+   if (output_var->data.explicit_location) {
+      input_var = consumer_inputs_with_locations[output_var->data.location];
+   } else if (output_var->get_interface_type() != NULL) {
+      char *const iface_field_name =
+         ralloc_asprintf(mem_ctx, "%s.%s",
+                         output_var->get_interface_type()->name,
+                         output_var->name);
+      input_var =
+         (ir_variable *) hash_table_find(consumer_interface_inputs,
+                                         iface_field_name);
+   } else {
+      input_var =
+         (ir_variable *) hash_table_find(consumer_inputs, output_var->name);
+   }
+
+   return (input_var == NULL || input_var->data.mode != ir_var_shader_in)
+      ? NULL : input_var;
+}
+
+}
+
+static int
+io_variable_cmp(const void *_a, const void *_b)
+{
+   const ir_variable *const a = *(const ir_variable **) _a;
+   const ir_variable *const b = *(const ir_variable **) _b;
+
+   if (a->data.explicit_location && b->data.explicit_location)
+      return b->data.location - a->data.location;
+
+   if (a->data.explicit_location && !b->data.explicit_location)
+      return 1;
+
+   if (!a->data.explicit_location && b->data.explicit_location)
+      return -1;
+
+   return -strcmp(a->name, b->name);
+}
+
+/**
+ * Sort the shader IO variables into canonical order
+ */
+static void
+canonicalize_shader_io(exec_list *ir, enum ir_variable_mode io_mode)
+{
+   ir_variable *var_table[MAX_PROGRAM_OUTPUTS * 4];
+   unsigned num_variables = 0;
+
+   foreach_list(node, ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var == NULL || var->data.mode != io_mode)
+         continue;
+
+      /* If we have already encountered more I/O variables than could
+       * successfully link, bail.
+       */
+      if (num_variables == ARRAY_SIZE(var_table))
+         return;
+
+      var_table[num_variables++] = var;
+   }
+
+   if (num_variables == 0)
+      return;
+
+   /* Sort the list in reverse order (io_variable_cmp handles this).  Later
+    * we're going to push the variables on to the IR list as a stack, so we
+    * want the last variable (in canonical order) to be first in the list.
+    */
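+   /* Example: two non-explicit outputs "a" and "b" compare via -strcmp and
+    * sort to {"b", "a"}; pushing each onto the head below then leaves them
+    * in canonical order a, b at the front of the IR list.
+    */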
+   qsort(var_table, num_variables, sizeof(var_table[0]), io_variable_cmp);
+
+   /* Remove each variable from its current location in the IR, and put it at
+    * the front.
+    */
+   for (unsigned i = 0; i < num_variables; i++) {
+      var_table[i]->remove();
+      ir->push_head(var_table[i]);
+   }
+}
+
+/**
+ * Assign locations for all variables that are produced in one pipeline stage
+ * (the "producer") and consumed in the next stage (the "consumer").
+ *
+ * Variables produced by the producer may also be consumed by transform
+ * feedback.
+ *
+ * \param num_tfeedback_decls is the number of declarations indicating
+ *        variables that may be consumed by transform feedback.
+ *
+ * \param tfeedback_decls is a pointer to an array of tfeedback_decl objects
+ *        representing the result of parsing the strings passed to
+ *        glTransformFeedbackVaryings().  assign_location() will be called for
+ *        each of these objects that matches one of the outputs of the
+ *        producer.
+ *
+ * \param gs_input_vertices if \c consumer is a geometry shader, this is the
+ *        number of input vertices it accepts.  Otherwise zero.
+ *
+ * When num_tfeedback_decls is nonzero, it is permissible for the consumer to
+ * be NULL.  In this case, varying locations are assigned solely based on the
+ * requirements of transform feedback.
+ */
+bool
+assign_varying_locations(struct gl_context *ctx,
+			 void *mem_ctx,
+			 struct gl_shader_program *prog,
+			 gl_shader *producer, gl_shader *consumer,
+                         unsigned num_tfeedback_decls,
+                         tfeedback_decl *tfeedback_decls,
+                         unsigned gs_input_vertices)
+{
+   varying_matches matches(ctx->Const.DisableVaryingPacking,
+                           consumer && consumer->Stage == MESA_SHADER_FRAGMENT);
+   hash_table *tfeedback_candidates
+      = hash_table_ctor(0, hash_table_string_hash, hash_table_string_compare);
+   hash_table *consumer_inputs
+      = hash_table_ctor(0, hash_table_string_hash, hash_table_string_compare);
+   hash_table *consumer_interface_inputs
+      = hash_table_ctor(0, hash_table_string_hash, hash_table_string_compare);
+   ir_variable *consumer_inputs_with_locations[VARYING_SLOT_MAX] = {
+      NULL,
+   };
+
+   /* Operate in a total of four passes.
+    *
+    * 1. Sort inputs / outputs into a canonical order.  This is necessary so
+    *    that inputs / outputs of separable shaders will be assigned
+    *    predictable locations regardless of the order in which declarations
+    *    appeared in the shader source.
+    *
+    * 2. Assign locations for any matching inputs and outputs.
+    *
+    * 3. Mark output variables in the producer that do not have locations as
+    *    not being outputs.  This lets the optimizer eliminate them.
+    *
+    * 4. Mark input variables in the consumer that do not have locations as
+    *    not being inputs.  This lets the optimizer eliminate them.
+    */
+   if (consumer)
+      canonicalize_shader_io(consumer->ir, ir_var_shader_in);
+
+   if (producer)
+      canonicalize_shader_io(producer->ir, ir_var_shader_out);
+
+   if (consumer
+       && !linker::populate_consumer_input_sets(mem_ctx,
+                                                consumer->ir,
+                                                consumer_inputs,
+                                                consumer_interface_inputs,
+                                                consumer_inputs_with_locations)) {
+      assert(!"populate_consumer_input_sets failed");
+      hash_table_dtor(tfeedback_candidates);
+      hash_table_dtor(consumer_inputs);
+      hash_table_dtor(consumer_interface_inputs);
+      return false;
+   }
+
+   if (producer) {
+      foreach_list(node, producer->ir) {
+         ir_variable *const output_var =
+            ((ir_instruction *) node)->as_variable();
+
+         if ((output_var == NULL) ||
+             (output_var->data.mode != ir_var_shader_out))
+            continue;
+
+         tfeedback_candidate_generator g(mem_ctx, tfeedback_candidates);
+         g.process(output_var);
+
+         ir_variable *const input_var =
+            linker::get_matching_input(mem_ctx, output_var, consumer_inputs,
+                                       consumer_interface_inputs,
+                                       consumer_inputs_with_locations);
+
+         /* If a matching input variable was found, add this output (and the
+          * input) to the set.  If this is a separable program and there is no
+          * consumer stage, add the output.
+          */
+         if (input_var || (prog->SeparateShader && consumer == NULL)) {
+            matches.record(output_var, input_var);
+         }
+      }
+   } else {
+      /* If there's no producer stage, then this must be a separable program.
+       * For example, we may have a program that has just a fragment shader.
+       * Later this program will be used with some arbitrary vertex (or
+       * geometry) shader program.  This means that locations must be assigned
+       * for all the inputs.
+       */
+      foreach_list(node, consumer->ir) {
+         ir_variable *const input_var =
+            ((ir_instruction *) node)->as_variable();
+
+         if ((input_var == NULL) ||
+             (input_var->data.mode != ir_var_shader_in))
+            continue;
+
+         matches.record(NULL, input_var);
+      }
+   }
+
+   for (unsigned i = 0; i < num_tfeedback_decls; ++i) {
+      if (!tfeedback_decls[i].is_varying())
+         continue;
+
+      const tfeedback_candidate *matched_candidate
+         = tfeedback_decls[i].find_candidate(prog, tfeedback_candidates);
+
+      if (matched_candidate == NULL) {
+         hash_table_dtor(tfeedback_candidates);
+         hash_table_dtor(consumer_inputs);
+         hash_table_dtor(consumer_interface_inputs);
+         return false;
+      }
+
+      if (matched_candidate->toplevel_var->data.is_unmatched_generic_inout)
+         matches.record(matched_candidate->toplevel_var, NULL);
+   }
+
+   const unsigned slots_used = matches.assign_locations();
+   matches.store_locations();
+
+   for (unsigned i = 0; i < num_tfeedback_decls; ++i) {
+      if (!tfeedback_decls[i].is_varying())
+         continue;
+
+      if (!tfeedback_decls[i].assign_location(ctx, prog)) {
+         hash_table_dtor(tfeedback_candidates);
+         hash_table_dtor(consumer_inputs);
+         hash_table_dtor(consumer_interface_inputs);
+         return false;
+      }
+   }
+
+   hash_table_dtor(tfeedback_candidates);
+   hash_table_dtor(consumer_inputs);
+   hash_table_dtor(consumer_interface_inputs);
+
+   if (ctx->Const.DisableVaryingPacking) {
+      /* Transform feedback code assumes varyings are packed, so if the driver
+       * has disabled varying packing, make sure it does not support transform
+       * feedback.
+       */
+      assert(!ctx->Extensions.EXT_transform_feedback);
+   } else {
+      if (producer) {
+         lower_packed_varyings(mem_ctx, slots_used, ir_var_shader_out,
+                               0, producer);
+      }
+      if (consumer) {
+         lower_packed_varyings(mem_ctx, slots_used, ir_var_shader_in,
+                               gs_input_vertices, consumer);
+      }
+   }
+
+   if (consumer && producer) {
+      foreach_list(node, consumer->ir) {
+         ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+         if (var && var->data.mode == ir_var_shader_in &&
+             var->data.is_unmatched_generic_inout) {
+            if (prog->Version <= 120) {
+               /* On page 25 (page 31 of the PDF) of the GLSL 1.20 spec:
+                *
+                *     Only those varying variables used (i.e. read) in
+                *     the fragment shader executable must be written to
+                *     by the vertex shader executable; declaring
+                *     superfluous varying variables in a vertex shader is
+                *     permissible.
+                *
+                * We interpret this text as meaning that the VS must
+                * write the variable for the FS to read it.  See
+                * "glsl1-varying read but not written" in piglit.
+                */
+
+               linker_error(prog, "%s shader varying %s not written "
+                            "by %s shader.\n",
+                            _mesa_shader_stage_to_string(consumer->Stage),
+			    var->name,
+                            _mesa_shader_stage_to_string(producer->Stage));
+            }
+
+            /* An 'in' variable is only really a shader input if its
+             * value is written by the previous stage.
+             */
+            var->data.mode = ir_var_auto;
+         }
+      }
+   }
+
+   return true;
+}
+
+bool
+check_against_output_limit(struct gl_context *ctx,
+                           struct gl_shader_program *prog,
+                           gl_shader *producer)
+{
+   unsigned output_vectors = 0;
+
+   foreach_list(node, producer->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var && var->data.mode == ir_var_shader_out &&
+          is_varying_var(producer->Stage, var)) {
+         output_vectors += var->type->count_attribute_slots();
+      }
+   }
+
+   assert(producer->Stage != MESA_SHADER_FRAGMENT);
+   unsigned max_output_components =
+      ctx->Const.Program[producer->Stage].MaxOutputComponents;
+
+   const unsigned output_components = output_vectors * 4;
+   if (output_components > max_output_components) {
+      if (ctx->API == API_OPENGLES2 || prog->IsES)
+         linker_error(prog, "shader uses too many output vectors "
+                      "(%u > %u)\n",
+                      output_vectors,
+                      max_output_components / 4);
+      else
+         linker_error(prog, "shader uses too many output components "
+                      "(%u > %u)\n",
+                      output_components,
+                      max_output_components);
+
+      return false;
+   }
+
+   return true;
+}
+
+bool
+check_against_input_limit(struct gl_context *ctx,
+                          struct gl_shader_program *prog,
+                          gl_shader *consumer)
+{
+   unsigned input_vectors = 0;
+
+   foreach_list(node, consumer->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var && var->data.mode == ir_var_shader_in &&
+          is_varying_var(consumer->Stage, var)) {
+         input_vectors += var->type->count_attribute_slots();
+      }
+   }
+
+   assert(consumer->Stage != MESA_SHADER_VERTEX);
+   unsigned max_input_components =
+      ctx->Const.Program[consumer->Stage].MaxInputComponents;
+
+   const unsigned input_components = input_vectors * 4;
+   if (input_components > max_input_components) {
+      if (ctx->API == API_OPENGLES2 || prog->IsES)
+         linker_error(prog, "shader uses too many input vectors "
+                      "(%u > %u)\n",
+                      input_vectors,
+                      max_input_components / 4);
+      else
+         linker_error(prog, "shader uses too many input components "
+                      "(%u > %u)\n",
+                      input_components,
+                      max_input_components);
+
+      return false;
+   }
+
+   return true;
+}
diff --git a/icd/intel/compiler/shader/link_varyings.h b/icd/intel/compiler/shader/link_varyings.h
new file mode 100644
index 0000000..6fa2681
--- /dev/null
+++ b/icd/intel/compiler/shader/link_varyings.h
@@ -0,0 +1,249 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef GLSL_LINK_VARYINGS_H
+#define GLSL_LINK_VARYINGS_H
+
+/**
+ * \file link_varyings.h
+ *
+ * Linker functions related specifically to linking varyings between shader
+ * stages.
+ */
+
+
+#include "main/glheader.h"
+
+
+struct gl_shader_program;
+struct gl_shader;
+class ir_variable;
+
+
+/**
+ * Data structure describing a varying which is available for use in transform
+ * feedback.
+ *
+ * For example, if the vertex shader contains:
+ *
+ *     struct S {
+ *       vec4 foo;
+ *       float[3] bar;
+ *     };
+ *
+ *     varying S[2] v;
+ *
+ * Then there would be tfeedback_candidate objects corresponding to the
+ * following varyings:
+ *
+ *     v[0].foo
+ *     v[0].bar
+ *     v[1].foo
+ *     v[1].bar
+ */
+struct tfeedback_candidate
+{
+   /**
+    * Toplevel variable containing this varying.  In the above example, this
+    * would point to the declaration of the varying v.
+    */
+   ir_variable *toplevel_var;
+
+   /**
+    * Type of this varying.  In the above example, this would point to the
+    * glsl_type for "vec4" or "float[3]".
+    */
+   const glsl_type *type;
+
+   /**
+    * Offset within the toplevel variable where this varying occurs (counted
+    * in multiples of the size of a float).
+    */
+   unsigned offset;
+};
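+
+/* Continuing the example above, and assuming the members are packed
+ * contiguously in float-sized units (vec4 foo = 4 floats, float[3] bar = 3
+ * floats, so each S spans 7 floats), the candidates would carry offsets:
+ *
+ *     v[0].foo -> 0      v[0].bar -> 4
+ *     v[1].foo -> 7      v[1].bar -> 11
+ */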
+
+
+/**
+ * Data structure tracking information about a transform feedback declaration
+ * during linking.
+ */
+class tfeedback_decl
+{
+public:
+   void init(struct gl_context *ctx, const void *mem_ctx, const char *input);
+   static bool is_same(const tfeedback_decl &x, const tfeedback_decl &y);
+   bool assign_location(struct gl_context *ctx,
+                        struct gl_shader_program *prog);
+   unsigned get_num_outputs() const;
+   bool store(struct gl_context *ctx, struct gl_shader_program *prog,
+              struct gl_transform_feedback_info *info, unsigned buffer,
+              const unsigned max_outputs) const;
+   const tfeedback_candidate *find_candidate(gl_shader_program *prog,
+                                             hash_table *tfeedback_candidates);
+
+   bool is_next_buffer_separator() const
+   {
+      return this->next_buffer_separator;
+   }
+
+   bool is_varying() const
+   {
+      return !this->next_buffer_separator && !this->skip_components;
+   }
+
+   /**
+    * The total number of varying components taken up by this variable.  Only
+    * valid if assign_location() has been called.
+    */
+   unsigned num_components() const
+   {
+      if (this->is_clip_distance_mesa)
+         return this->size;
+      else
+         return this->vector_elements * this->matrix_columns * this->size;
+   }
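+
+   /* For instance (illustrative values): a vec3[4] varying has
+    * vector_elements == 3, matrix_columns == 1 and size == 4, giving 12
+    * components; a single mat4 gives 4 * 4 * 1 == 16.
+    */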
+
+   unsigned get_location() const {
+      return this->location;
+   }
+
+private:
+   /**
+    * The name that was supplied to glTransformFeedbackVaryings.  Used for
+    * error reporting and glGetTransformFeedbackVarying().
+    */
+   const char *orig_name;
+
+   /**
+    * The name of the variable, parsed from orig_name.
+    */
+   const char *var_name;
+
+   /**
+    * True if the declaration in orig_name represents an array.
+    */
+   bool is_subscripted;
+
+   /**
+    * If is_subscripted is true, the subscript that was specified in orig_name.
+    */
+   unsigned array_subscript;
+
+   /**
+    * True if the variable is gl_ClipDistance and the driver lowers
+    * gl_ClipDistance to gl_ClipDistanceMESA.
+    */
+   bool is_clip_distance_mesa;
+
+   /**
+    * The vertex shader output location that the linker assigned for this
+    * variable.  -1 if a location hasn't been assigned yet.
+    */
+   int location;
+
+   /**
+    * If non-zero, then this variable may be packed along with other variables
+    * into a single varying slot, so this offset should be applied when
+    * accessing components.  For example, an offset of 1 means that the x
+    * component of this variable is actually stored in component y of the
+    * location specified by \c location.
+    *
+    * Only valid if location != -1.
+    */
+   unsigned location_frac;
+
+   /**
+    * If location != -1, the number of vector elements in this variable, or 1
+    * if this variable is a scalar.
+    */
+   unsigned vector_elements;
+
+   /**
+    * If location != -1, the number of matrix columns in this variable, or 1
+    * if this variable is not a matrix.
+    */
+   unsigned matrix_columns;
+
+   /** Type of the varying returned by glGetTransformFeedbackVarying() */
+   GLenum type;
+
+   /**
+    * If location != -1, the size that should be returned by
+    * glGetTransformFeedbackVarying().
+    */
+   unsigned size;
+
+   /**
+    * How many components to skip. If non-zero, this is
+    * gl_SkipComponents{1,2,3,4} from ARB_transform_feedback3.
+    */
+   unsigned skip_components;
+
+   /**
+    * Whether this is gl_NextBuffer from ARB_transform_feedback3.
+    */
+   bool next_buffer_separator;
+
+   /**
+    * If find_candidate() has been called, pointer to the tfeedback_candidate
+    * data structure that was found.  Otherwise NULL.
+    */
+   const tfeedback_candidate *matched_candidate;
+};
+
+
+void
+cross_validate_outputs_to_inputs(struct gl_shader_program *prog,
+				 gl_shader *producer, gl_shader *consumer);
+
+bool
+parse_tfeedback_decls(struct gl_context *ctx, struct gl_shader_program *prog,
+                      const void *mem_ctx, unsigned num_names,
+                      char **varying_names, tfeedback_decl *decls);
+
+bool
+store_tfeedback_info(struct gl_context *ctx, struct gl_shader_program *prog,
+                     unsigned num_tfeedback_decls,
+                     tfeedback_decl *tfeedback_decls);
+
+bool
+assign_varying_locations(struct gl_context *ctx,
+			 void *mem_ctx,
+			 struct gl_shader_program *prog,
+			 gl_shader *producer, gl_shader *consumer,
+                         unsigned num_tfeedback_decls,
+                         tfeedback_decl *tfeedback_decls,
+                         unsigned gs_input_vertices);
+
+bool
+check_against_output_limit(struct gl_context *ctx,
+                           struct gl_shader_program *prog,
+                           gl_shader *producer);
+
+bool
+check_against_input_limit(struct gl_context *ctx,
+                          struct gl_shader_program *prog,
+                          gl_shader *consumer);
+
+#endif /* GLSL_LINK_VARYINGS_H */
diff --git a/icd/intel/compiler/shader/linker.cpp b/icd/intel/compiler/shader/linker.cpp
new file mode 100644
index 0000000..a55c7c6
--- /dev/null
+++ b/icd/intel/compiler/shader/linker.cpp
@@ -0,0 +1,2644 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file linker.cpp
+ * GLSL linker implementation
+ *
+ * Given a set of shaders that are to be linked to generate a final program,
+ * there are three distinct stages.
+ *
+ * In the first stage shaders are partitioned into groups based on the shader
+ * type.  All shaders of a particular type (e.g., vertex shaders) are linked
+ * together.
+ *
+ *   - Undefined references in each shader are resolved to definitions in
+ *     another shader.
+ *   - Types and qualifiers of uniforms, outputs, and global variables defined
+ *     in multiple shaders with the same name are verified to be the same.
+ *   - Initializers for uniforms and global variables defined
+ *     in multiple shaders with the same name are verified to be the same.
+ *
+ * The result, in the terminology of the GLSL spec, is a set of shader
+ * executables for each processing unit.
+ *
+ * After the first stage is complete, a series of semantic checks are performed
+ * on each of the shader executables.
+ *
+ *   - Each shader executable must define a \c main function.
+ *   - Each vertex shader executable must write to \c gl_Position.
+ *   - Each fragment shader executable must write to either \c gl_FragData or
+ *     \c gl_FragColor.
+ *
+ * In the final stage individual shader executables are linked to create a
+ * complete executable.
+ *
+ *   - Types of uniforms defined in multiple shader stages with the same name
+ *     are verified to be the same.
+ *   - Initializers for uniforms defined in multiple shader stages with the
+ *     same name are verified to be the same.
+ *   - Types and qualifiers of outputs defined in one stage are verified to
+ *     be the same as the types and qualifiers of inputs defined with the same
+ *     name in a later stage.
+ *
+ * \author Ian Romanick <ian.d.romanick@intel.com>
+ */
+
+#include "libfns.h" // LunarG ADD:
+#include "glsl_symbol_table.h"
+#include "glsl_parser_extras.h"
+#include "ir.h"
+#include "program.h"
+#include "program/hash_table.h"
+#include "linker.h"
+#include "link_varyings.h"
+#include "ir_optimization.h"
+#include "ir_rvalue_visitor.h"
+#include "standalone_scaffolding.h" // LunarG ADD:
+
+extern "C" {
+#include "main/enums.h"
+}
+
+void linker_error(gl_shader_program *, const char *, ...);
+
+namespace {
+
+/**
+ * Visitor that determines whether or not a variable is ever written.
+ */
+class find_assignment_visitor : public ir_hierarchical_visitor {
+public:
+   find_assignment_visitor(const char *name)
+      : name(name), found(false)
+   {
+      /* empty */
+   }
+
+   virtual ir_visitor_status visit_enter(ir_assignment *ir)
+   {
+      ir_variable *const var = ir->lhs->variable_referenced();
+
+      if (strcmp(name, var->name) == 0) {
+	 found = true;
+	 return visit_stop;
+      }
+
+      return visit_continue_with_parent;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_call *ir)
+   {
+      foreach_two_lists(formal_node, &ir->callee->parameters,
+                        actual_node, &ir->actual_parameters) {
+	 ir_rvalue *param_rval = (ir_rvalue *) actual_node;
+	 ir_variable *sig_param = (ir_variable *) formal_node;
+
+	 if (sig_param->data.mode == ir_var_function_out ||
+	     sig_param->data.mode == ir_var_function_inout) {
+	    ir_variable *var = param_rval->variable_referenced();
+	    if (var && strcmp(name, var->name) == 0) {
+	       found = true;
+	       return visit_stop;
+	    }
+	 }
+      }
+
+      if (ir->return_deref != NULL) {
+	 ir_variable *const var = ir->return_deref->variable_referenced();
+
+	 if (strcmp(name, var->name) == 0) {
+	    found = true;
+	    return visit_stop;
+	 }
+      }
+
+      return visit_continue_with_parent;
+   }
+
+   bool variable_found()
+   {
+      return found;
+   }
+
+private:
+   const char *name;       /**< Find writes to a variable with this name. */
+   bool found;             /**< Was a write to the variable found? */
+};
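+
+/* Typical use, as in the validate_*_shader_executable() checks below:
+ *
+ *    find_assignment_visitor find("gl_Position");
+ *    find.run(shader->ir);
+ *    if (!find.variable_found())
+ *       linker_error(prog, "vertex shader does not write to `gl_Position'\n");
+ */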
+
+
+/**
+ * Visitor that determines whether or not a variable is ever read.
+ */
+class find_deref_visitor : public ir_hierarchical_visitor {
+public:
+   find_deref_visitor(const char *name)
+      : name(name), found(false)
+   {
+      /* empty */
+   }
+
+   virtual ir_visitor_status visit(ir_dereference_variable *ir)
+   {
+      if (strcmp(this->name, ir->var->name) == 0) {
+	 this->found = true;
+	 return visit_stop;
+      }
+
+      return visit_continue;
+   }
+
+   bool variable_found() const
+   {
+      return this->found;
+   }
+
+private:
+   const char *name;       /**< Find reads of a variable with this name. */
+   bool found;             /**< Was a read of the variable found? */
+};
+
+
+class geom_array_resize_visitor : public ir_hierarchical_visitor {
+public:
+   unsigned num_vertices;
+   gl_shader_program *prog;
+
+   geom_array_resize_visitor(unsigned num_vertices, gl_shader_program *prog)
+   {
+      this->num_vertices = num_vertices;
+      this->prog = prog;
+   }
+
+   virtual ~geom_array_resize_visitor()
+   {
+      /* empty */
+   }
+
+   virtual ir_visitor_status visit(ir_variable *var)
+   {
+      if (!var->type->is_array() || var->data.mode != ir_var_shader_in)
+         return visit_continue;
+
+      unsigned size = var->type->length;
+
+      /* Generate a link error if the shader has declared this array with an
+       * incorrect size.
+       */
+      if (size && size != this->num_vertices) {
+         linker_error(this->prog, "size of array %s declared as %u, "
+                      "but number of input vertices is %u\n",
+                      var->name, size, this->num_vertices);
+         return visit_continue;
+      }
+
+      /* Generate a link error if the shader attempts to access an input
+       * array using an index too large for its actual size assigned at link
+       * time.
+       */
+      if (var->data.max_array_access >= this->num_vertices) {
+         linker_error(this->prog, "geometry shader accesses element %i of "
+                      "%s, but only %i input vertices\n",
+                      var->data.max_array_access, var->name, this->num_vertices);
+         return visit_continue;
+      }
+
+      var->type = glsl_type::get_array_instance(var->type->element_type(),
+                                                this->num_vertices);
+      var->data.max_array_access = this->num_vertices - 1;
+
+      return visit_continue;
+   }
+
+   /* Dereferences of input variables need to be updated so that their type
+    * matches the newly assigned type of the variable they are accessing. */
+   virtual ir_visitor_status visit(ir_dereference_variable *ir)
+   {
+      ir->type = ir->var->type;
+      return visit_continue;
+   }
+
+   /* Dereferences of 2D input arrays need to be updated so that their type
+    * matches the newly assigned type of the array they are accessing. */
+   virtual ir_visitor_status visit_leave(ir_dereference_array *ir)
+   {
+      const glsl_type *const vt = ir->array->type;
+      if (vt->is_array())
+         ir->type = vt->element_type();
+      return visit_continue;
+   }
+};
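+
+/* Illustration: for layout(triangles) input, the linker runs this visitor
+ * with num_vertices == 3, so an unsized "in vec4 position[];" is retyped to
+ * vec4[3], while an explicit "in vec4 position[4];" triggers the size
+ * mismatch error above.
+ */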
+
+
+/**
+ * Visitor that determines whether or not a shader uses ir_end_primitive.
+ */
+class find_end_primitive_visitor : public ir_hierarchical_visitor {
+public:
+   find_end_primitive_visitor()
+      : found(false)
+   {
+      /* empty */
+   }
+
+   virtual ir_visitor_status visit(ir_end_primitive *)
+   {
+      found = true;
+      return visit_stop;
+   }
+
+   bool end_primitive_found()
+   {
+      return found;
+   }
+
+private:
+   bool found;
+};
+
+} /* anonymous namespace */
+
+void
+linker_error(gl_shader_program *prog, const char *fmt, ...)
+{
+   va_list ap;
+
+   ralloc_strcat(&prog->InfoLog, "error: ");
+   va_start(ap, fmt);
+   ralloc_vasprintf_append(&prog->InfoLog, fmt, ap);
+   va_end(ap);
+
+   prog->LinkStatus = false;
+}
+
+
+void
+linker_warning(gl_shader_program *prog, const char *fmt, ...)
+{
+   va_list ap;
+
+   ralloc_strcat(&prog->InfoLog, "warning: ");
+   va_start(ap, fmt);
+   ralloc_vasprintf_append(&prog->InfoLog, fmt, ap);
+   va_end(ap);
+
+}
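+
+/* Usage sketch: both helpers take printf-style arguments and append to
+ * prog->InfoLog; only linker_error() also clears prog->LinkStatus.
+ *
+ *    linker_error(prog, "function `%s' is multiply defined", f->name);
+ *    linker_warning(prog, "varying `%s' is never read\n", var->name);
+ */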
+
+
+/**
+ * Given a string identifying a program resource, break it into a base name
+ * and an optional array index in square brackets.
+ *
+ * If an array index is present, \c out_base_name_end is set to point to the
+ * "[" that precedes the array index, and the array index itself is returned
+ * as a long.
+ *
+ * If no array index is present (or if the array index is negative or
+ * malformed), \c out_base_name_end is set to point to the null terminator
+ * at the end of the input string, and -1 is returned.
+ *
+ * Only the final array index is parsed; if the string contains other array
+ * indices (or structure field accesses), they are left in the base name.
+ *
+ * No attempt is made to check that the base name is properly formed;
+ * typically the caller will look up the base name in a hash table, so
+ * ill-formed base names simply turn into hash table lookup failures.
+ */
+long
+parse_program_resource_name(const GLchar *name,
+                            const GLchar **out_base_name_end)
+{
+   /* Section 7.3.1 ("Program Interfaces") of the OpenGL 4.3 spec says:
+    *
+    *     "When an integer array element or block instance number is part of
+    *     the name string, it will be specified in decimal form without a "+"
+    *     or "-" sign or any extra leading zeroes. Additionally, the name
+    *     string will not include white space anywhere in the string."
+    */
+
+   const size_t len = strlen(name);
+   *out_base_name_end = name + len;
+
+   if (len == 0 || name[len-1] != ']')
+      return -1;
+
+   /* Walk backwards over the string looking for a non-digit character.  This
+    * had better be the opening bracket for an array index.
+    *
+    * Initially, i specifies the location of the ']'.  Since the string may
+    * contain only the ']' character, walk backwards very carefully.
+    */
+   unsigned i;
+   for (i = len - 1; (i > 0) && isdigit(name[i-1]); --i)
+      /* empty */ ;
+
+   if ((i == 0) || name[i-1] != '[')
+      return -1;
+
+   long array_index = strtol(&name[i], NULL, 10);
+   if (array_index < 0)
+      return -1;
+
+   *out_base_name_end = name + (i - 1);
+   return array_index;
+}
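+
+/* Example (hypothetical name): for the input "palette[12]" this returns 12
+ * and leaves *out_base_name_end pointing at the '[':
+ *
+ *    const GLchar *end;
+ *    long idx = parse_program_resource_name("palette[12]", &end);
+ *    // idx == 12; the base name is the 7 characters before *end
+ */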
+
+
+void
+link_invalidate_variable_locations(exec_list *ir)
+{
+   foreach_list(node, ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var == NULL)
+         continue;
+
+      /* Only assign locations for variables that lack an explicit location.
+       * Explicit locations are set for all built-in variables, generic vertex
+       * shader inputs (via layout(location=...)), and generic fragment shader
+       * outputs (also via layout(location=...)).
+       */
+      if (!var->data.explicit_location) {
+         var->data.location = -1;
+         var->data.location_frac = 0;
+      }
+
+      /* ir_variable::is_unmatched_generic_inout is used by the linker while
+       * connecting outputs from one stage to inputs of the next stage.
+       *
+       * There are two implicit assumptions here.  First, we assume that any
+       * built-in variable (i.e., non-generic in or out) will have
+       * explicit_location set.  Second, we assume that any generic in or out
+       * will not have explicit_location set.
+       *
+       * This second assumption will only be valid until
+       * GL_ARB_separate_shader_objects is supported.  When that extension is
+       * implemented, this function will need some modifications.
+       */
+      if (!var->data.explicit_location) {
+         var->data.is_unmatched_generic_inout = 1;
+      } else {
+         var->data.is_unmatched_generic_inout = 0;
+      }
+   }
+}
+
+
+/**
+ * Set UsesClipDistance and ClipDistanceArraySize based on the given shader.
+ *
+ * Also check for errors based on incorrect usage of gl_ClipVertex and
+ * gl_ClipDistance.
+ *
+ * Any error is reported via linker_error() rather than through a return value.
+ */
+static void
+analyze_clip_usage(struct gl_shader_program *prog,
+                   struct gl_shader *shader, GLboolean *UsesClipDistance,
+                   GLuint *ClipDistanceArraySize)
+{
+   *ClipDistanceArraySize = 0;
+
+   if (!prog->IsES && prog->Version >= 130) {
+      /* From section 7.1 (Vertex Shader Special Variables) of the
+       * GLSL 1.30 spec:
+       *
+       *   "It is an error for a shader to statically write both
+       *   gl_ClipVertex and gl_ClipDistance."
+       *
+       * This does not apply to GLSL ES shaders, since GLSL ES defines neither
+       * gl_ClipVertex nor gl_ClipDistance.
+       */
+      find_assignment_visitor clip_vertex("gl_ClipVertex");
+      find_assignment_visitor clip_distance("gl_ClipDistance");
+
+      clip_vertex.run(shader->ir);
+      clip_distance.run(shader->ir);
+      if (clip_vertex.variable_found() && clip_distance.variable_found()) {
+         linker_error(prog, "%s shader writes to both `gl_ClipVertex' "
+                      "and `gl_ClipDistance'\n",
+                      _mesa_shader_stage_to_string(shader->Stage));
+         return;
+      }
+      *UsesClipDistance = clip_distance.variable_found();
+      ir_variable *clip_distance_var =
+         shader->symbols->get_variable("gl_ClipDistance");
+      if (clip_distance_var)
+         *ClipDistanceArraySize = clip_distance_var->type->length;
+   } else {
+      *UsesClipDistance = false;
+   }
+}
+
+
+/**
+ * Verify that a vertex shader executable meets all semantic requirements.
+ *
+ * Also sets prog->Vert.UsesClipDistance and prog->Vert.ClipDistanceArraySize
+ * as a side effect.
+ *
+ * \param shader  Vertex shader executable to be verified
+ */
+void
+validate_vertex_shader_executable(struct gl_shader_program *prog,
+				  struct gl_shader *shader)
+{
+   if (shader == NULL)
+      return;
+
+   /* From the GLSL 1.10 spec, page 48:
+    *
+    *     "The variable gl_Position is available only in the vertex
+    *      language and is intended for writing the homogeneous vertex
+    *      position. All executions of a well-formed vertex shader
+    *      executable must write a value into this variable."
+    *
+    * while in GLSL 1.40 this text is changed to:
+    *
+    *     "The variable gl_Position is available only in the vertex
+    *      language and is intended for writing the homogeneous vertex
+    *      position. It can be written at any time during shader
+    *      execution. It may also be read back by a vertex shader
+    *      after being written. This value will be used by primitive
+    *      assembly, clipping, culling, and other fixed functionality
+    *      operations, if present, that operate on primitives after
+    *      vertex processing has occurred. Its value is undefined if
+    *      the vertex shader executable does not write gl_Position."
+    *
+    * GLSL ES 3.00 is similar to GLSL 1.40: failing to write to gl_Position is
+    * not an error.
+    */
+   if (prog->Version < (prog->IsES ? 300 : 140)) {
+      find_assignment_visitor find("gl_Position");
+      find.run(shader->ir);
+      if (!find.variable_found()) {
+	 linker_error(prog, "vertex shader does not write to `gl_Position'\n");
+	 return;
+      }
+   }
+
+   analyze_clip_usage(prog, shader, &prog->Vert.UsesClipDistance,
+                      &prog->Vert.ClipDistanceArraySize);
+}
+
+
+/**
+ * Verify that a fragment shader executable meets all semantic requirements
+ *
+ * \param shader  Fragment shader executable to be verified
+ */
+void
+validate_fragment_shader_executable(struct gl_shader_program *prog,
+				    struct gl_shader *shader)
+{
+   if (shader == NULL)
+      return;
+
+   find_assignment_visitor frag_color("gl_FragColor");
+   find_assignment_visitor frag_data("gl_FragData");
+
+   frag_color.run(shader->ir);
+   frag_data.run(shader->ir);
+
+   if (frag_color.variable_found() && frag_data.variable_found()) {
+      linker_error(prog,  "fragment shader writes to both "
+		   "`gl_FragColor' and `gl_FragData'\n");
+   }
+}
+
+/**
+ * Verify that a geometry shader executable meets all semantic requirements
+ *
+ * Also sets prog->Geom.VerticesIn, prog->Geom.UsesClipDistance, and
+ * prog->Geom.ClipDistanceArraySize as a side effect.
+ *
+ * \param shader Geometry shader executable to be verified
+ */
+void
+validate_geometry_shader_executable(struct gl_shader_program *prog,
+				    struct gl_shader *shader)
+{
+   if (shader == NULL)
+      return;
+
+   unsigned num_vertices = vertices_per_prim(prog->Geom.InputType);
+   prog->Geom.VerticesIn = num_vertices;
+
+   analyze_clip_usage(prog, shader, &prog->Geom.UsesClipDistance,
+                      &prog->Geom.ClipDistanceArraySize);
+
+   find_end_primitive_visitor end_primitive;
+   end_primitive.run(shader->ir);
+   prog->Geom.UsesEndPrimitive = end_primitive.end_primitive_found();
+}
+
+
+/**
+ * Perform validation of global variables used across multiple shaders
+ */
+void
+cross_validate_globals(struct gl_shader_program *prog,
+		       struct gl_shader **shader_list,
+		       unsigned num_shaders,
+		       bool uniforms_only)
+{
+   /* Examine all of the uniforms in all of the shaders and cross validate
+    * them.
+    */
+   glsl_symbol_table variables;
+   for (unsigned i = 0; i < num_shaders; i++) {
+      if (shader_list[i] == NULL)
+	 continue;
+
+      foreach_list(node, shader_list[i]->ir) {
+	 ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+	 if (var == NULL)
+	    continue;
+
+	 if (uniforms_only && (var->data.mode != ir_var_uniform))
+	    continue;
+
+	 /* Don't cross validate temporaries that are at global scope.  These
+	  * will eventually get pulled into the shader's 'main'.
+	  */
+	 if (var->data.mode == ir_var_temporary)
+	    continue;
+
+	 /* If a global with this name has already been seen, verify that the
+	  * new instance has the same type.  In addition, if the globals have
+	  * initializers, the values of the initializers must be the same.
+	  */
+	 ir_variable *const existing = variables.get_variable(var->name);
+	 if (existing != NULL) {
+	    if (var->type != existing->type) {
+	       /* Consider the types to be "the same" if both types are arrays
+		* of the same type and one of the arrays is implicitly sized.
+		* In addition, set the type of the linked variable to the
+		* explicitly sized array.
+		*/
+	       if (var->type->is_array()
+		   && existing->type->is_array()
+		   && (var->type->fields.array == existing->type->fields.array)
+		   && ((var->type->length == 0)
+		       || (existing->type->length == 0))) {
+		  if (var->type->length != 0) {
+		     existing->type = var->type;
+		  }
+               } else if (var->type->is_record()
+		   && existing->type->is_record()
+		   && existing->type->record_compare(var->type)) {
+		  existing->type = var->type;
+	       } else {
+		  linker_error(prog, "%s `%s' declared as type "
+			       "`%s' and type `%s'\n",
+			       mode_string(var),
+			       var->name, var->type->name,
+			       existing->type->name);
+		  return;
+	       }
+	    }
+
+	    if (var->data.explicit_location) {
+	       if (existing->data.explicit_location
+		   && (var->data.location != existing->data.location)) {
+		     linker_error(prog, "explicit locations for %s "
+				  "`%s' have differing values\n",
+				  mode_string(var), var->name);
+		     return;
+	       }
+
+	       existing->data.location = var->data.location;
+	       existing->data.explicit_location = true;
+	    }
+
+            /* From the GLSL 4.20 specification:
+             * "A link error will result if two compilation units in a program
+             *  specify different integer-constant bindings for the same
+             *  opaque-uniform name.  However, it is not an error to specify a
+             *  binding on some but not all declarations for the same name"
+             */
+            if (var->data.explicit_binding) {
+               if (existing->data.explicit_binding &&
+                   var->data.binding != existing->data.binding) {
+                  linker_error(prog, "explicit bindings for %s "
+                               "`%s' have differing values\n",
+                               mode_string(var), var->name);
+                  return;
+               }
+
+               existing->data.binding = var->data.binding;
+               existing->data.explicit_binding = true;
+            }
+
+            if (var->type->contains_atomic() &&
+                var->data.atomic.offset != existing->data.atomic.offset) {
+               linker_error(prog, "offset specifications for %s "
+                            "`%s' have differing values\n",
+                            mode_string(var), var->name);
+               return;
+            }
+
+	    /* Validate layout qualifiers for gl_FragDepth.
+	     *
+	     * From the AMD/ARB_conservative_depth specs:
+	     *
+	     *    "If gl_FragDepth is redeclared in any fragment shader in a
+	     *    program, it must be redeclared in all fragment shaders in
+	     *    that program that have static assignments to
+	     *    gl_FragDepth. All redeclarations of gl_FragDepth in all
+	     *    fragment shaders in a single program must have the same set
+	     *    of qualifiers."
+	     */
+	    if (strcmp(var->name, "gl_FragDepth") == 0) {
+	       bool layout_declared = var->data.depth_layout != ir_depth_layout_none;
+	       bool layout_differs =
+		  var->data.depth_layout != existing->data.depth_layout;
+
+	       if (layout_declared && layout_differs) {
+		  linker_error(prog,
+			       "All redeclarations of gl_FragDepth in all "
+			       "fragment shaders in a single program must have "
+			       "the same set of qualifiers.");
+	       }
+
+	       if (var->data.used && layout_differs) {
+		  linker_error(prog,
+			       "If gl_FragDepth is redeclared with a layout "
+			       "qualifier in any fragment shader, it must be "
+			       "redeclared with the same layout qualifier in "
+			       "all fragment shaders that have assignments to "
+			       "gl_FragDepth");
+	       }
+	    }
+
+	    /* Page 35 (page 41 of the PDF) of the GLSL 4.20 spec says:
+	     *
+	     *     "If a shared global has multiple initializers, the
+	     *     initializers must all be constant expressions, and they
+	     *     must all have the same value. Otherwise, a link error will
+	     *     result. (A shared global having only one initializer does
+	     *     not require that initializer to be a constant expression.)"
+	     *
+	     * Prior to 4.20, the GLSL spec simply said that initializers
+	     * must have the same value.  In the case of non-constant
+	     * initializers, this was impossible to determine.  As a result,
+	     * no vendor actually implemented that behavior.  The 4.20
+	     * behavior matches the implemented behavior of at least one other
+	     * vendor, so we'll implement that for all GLSL versions.
+	     */
+	    if (var->constant_initializer != NULL) {
+	       if (existing->constant_initializer != NULL) {
+		  if (!var->constant_initializer->has_value(existing->constant_initializer)) {
+		     linker_error(prog, "initializers for %s "
+				  "`%s' have differing values\n",
+				  mode_string(var), var->name);
+		     return;
+		  }
+	       } else {
+		  /* If the first-seen instance of a particular uniform did not
+		   * have an initializer but a later instance does, copy the
+		   * initializer to the version stored in the symbol table.
+		   */
+		  /* FINISHME: This is wrong.  The constant_value field should
+		   * FINISHME: not be modified!  Imagine a case where a shader
+		   * FINISHME: without an initializer is linked in two different
+		   * FINISHME: programs with shaders that have differing
+		   * FINISHME: initializers.  Linking with the first will
+		   * FINISHME: modify the shader, and linking with the second
+		   * FINISHME: will fail.
+		   */
+		  existing->constant_initializer =
+		     var->constant_initializer->clone(ralloc_parent(existing),
+						      NULL);
+	       }
+	    }
+
+	    if (var->data.has_initializer) {
+	       if (existing->data.has_initializer
+		   && (var->constant_initializer == NULL
+		       || existing->constant_initializer == NULL)) {
+		  linker_error(prog,
+			       "shared global variable `%s' has multiple "
+			       "non-constant initializers.\n",
+			       var->name);
+		  return;
+	       }
+
+	       /* Some instance had an initializer, so keep track of that.  This
+		* propagates the existence of any initializer (constant or
+		* otherwise) to the variable stored in the symbol table.
+		*/
+	       existing->data.has_initializer = true;
+	    }
+
+	    if (existing->data.invariant != var->data.invariant) {
+	       linker_error(prog, "declarations for %s `%s' have "
+			    "mismatching invariant qualifiers\n",
+			    mode_string(var), var->name);
+	       return;
+	    }
+            if (existing->data.centroid != var->data.centroid) {
+               linker_error(prog, "declarations for %s `%s' have "
+			    "mismatching centroid qualifiers\n",
+			    mode_string(var), var->name);
+               return;
+            }
+            if (existing->data.sample != var->data.sample) {
+               linker_error(prog, "declarations for %s `%s` have "
+                            "mismatching sample qualifiers\n",
+                            mode_string(var), var->name);
+               return;
+            }
+	 } else
+	    variables.add_variable(var);
+      }
+   }
+}
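+
+/* Example of a mismatch this catches: one shader declaring
+ * "uniform vec4 color;" while another declares "uniform vec3 color;"
+ * produces the error "uniform `color' declared as type `vec4' and type
+ * `vec3'" and aborts the cross validation.
+ */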
+
+
+/**
+ * Perform validation of uniforms used across multiple shader stages
+ */
+void
+cross_validate_uniforms(struct gl_shader_program *prog)
+{
+   cross_validate_globals(prog, prog->_LinkedShaders,
+                          MESA_SHADER_STAGES, true);
+}
+
+/**
+ * Accumulates the array of prog->UniformBlocks and checks that all
+ * definitions of blocks agree on their contents.
+ */
+static bool
+interstage_cross_validate_uniform_blocks(struct gl_shader_program *prog)
+{
+   unsigned max_num_uniform_blocks = 0;
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      if (prog->_LinkedShaders[i])
+	 max_num_uniform_blocks += prog->_LinkedShaders[i]->NumUniformBlocks;
+   }
+
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      struct gl_shader *sh = prog->_LinkedShaders[i];
+
+      prog->UniformBlockStageIndex[i] = ralloc_array(prog, int,
+						     max_num_uniform_blocks);
+      for (unsigned int j = 0; j < max_num_uniform_blocks; j++)
+	 prog->UniformBlockStageIndex[i][j] = -1;
+
+      if (sh == NULL)
+	 continue;
+
+      for (unsigned int j = 0; j < sh->NumUniformBlocks; j++) {
+	 int index = link_cross_validate_uniform_block(prog,
+						       &prog->UniformBlocks,
+						       &prog->NumUniformBlocks,
+						       &sh->UniformBlocks[j]);
+
+	 if (index == -1) {
+	    linker_error(prog, "uniform block `%s' has mismatching definitions",
+			 sh->UniformBlocks[j].Name);
+	    return false;
+	 }
+
+	 prog->UniformBlockStageIndex[i][index] = j;
+      }
+   }
+
+   return true;
+}
+
+
+/**
+ * Populates a shader's symbol table with all global declarations
+ */
+void
+populate_symbol_table(gl_shader *sh)
+{
+   sh->symbols = new(sh) glsl_symbol_table;
+
+   foreach_list(node, sh->ir) {
+      ir_instruction *const inst = (ir_instruction *) node;
+      ir_variable *var;
+      ir_function *func;
+
+      if ((func = inst->as_function()) != NULL) {
+	 sh->symbols->add_function(func);
+      } else if ((var = inst->as_variable()) != NULL) {
+	 sh->symbols->add_variable(var);
+      }
+   }
+}
+
+
+/**
+ * Remap variables referenced in an instruction tree
+ *
+ * This is used when instruction trees are cloned from one shader and placed in
+ * another.  These trees will contain references to \c ir_variable nodes that
+ * do not exist in the target shader.  This function finds these \c ir_variable
+ * references and replaces the references with matching variables in the target
+ * shader.
+ *
+ * If there is no matching variable in the target shader, a clone of the
+ * \c ir_variable is made and added to the target shader.  The new variable is
+ * added to \b both the instruction stream and the symbol table.
+ *
+ * \param inst         IR tree that is to be processed.
+ * \param symbols      Symbol table containing global scope symbols in the
+ *                     linked shader.
+ * \param instructions Instruction stream where new variable declarations
+ *                     should be added.
+ */
+void
+remap_variables(ir_instruction *inst, struct gl_shader *target,
+		hash_table *temps)
+{
+   class remap_visitor : public ir_hierarchical_visitor {
+   public:
+      remap_visitor(struct gl_shader *target, hash_table *temps)
+      {
+	 this->target = target;
+	 this->symbols = target->symbols;
+	 this->instructions = target->ir;
+	 this->temps = temps;
+      }
+
+      virtual ir_visitor_status visit(ir_dereference_variable *ir)
+      {
+	 if (ir->var->data.mode == ir_var_temporary) {
+	    ir_variable *var = (ir_variable *) hash_table_find(temps, ir->var);
+
+	    assert(var != NULL);
+	    ir->var = var;
+	    return visit_continue;
+	 }
+
+	 ir_variable *const existing =
+	    this->symbols->get_variable(ir->var->name);
+	 if (existing != NULL)
+	    ir->var = existing;
+	 else {
+	    ir_variable *copy = ir->var->clone(this->target, NULL);
+
+	    this->symbols->add_variable(copy);
+	    this->instructions->push_head(copy);
+	    ir->var = copy;
+	 }
+
+	 return visit_continue;
+      }
+
+   private:
+      struct gl_shader *target;
+      glsl_symbol_table *symbols;
+      exec_list *instructions;
+      hash_table *temps;
+   };
+
+   remap_visitor v(target, temps);
+
+   inst->accept(&v);
+}
+
+
+/**
+ * Move non-declarations from one instruction stream to another
+ *
+ * The intended usage pattern of this function is to pass the pointer to the
+ * head sentinel of a list (i.e., a pointer to the list cast to an \c exec_node
+ * pointer) for \c last and \c false for \c make_copies on the first
+ * call.  Successive calls pass the return value of the previous call for
+ * \c last and \c true for \c make_copies.
+ *
+ * \param instructions Source instruction stream
+ * \param last         Instruction after which new instructions should be
+ *                     inserted in the target instruction stream
+ * \param make_copies  Flag selecting whether instructions in \c instructions
+ *                     should be copied (via \c ir_instruction::clone) into the
+ *                     target list or moved.
+ *
+ * \return
+ * The new "last" instruction in the target instruction stream.  This pointer
+ * is suitable for use as the \c last parameter of a later call to this
+ * function.
+ */
+exec_node *
+move_non_declarations(exec_list *instructions, exec_node *last,
+		      bool make_copies, gl_shader *target)
+{
+   hash_table *temps = NULL;
+
+   if (make_copies)
+      temps = hash_table_ctor(0, hash_table_pointer_hash,
+			      hash_table_pointer_compare);
+
+   foreach_list_safe(node, instructions) {
+      ir_instruction *inst = (ir_instruction *) node;
+
+      if (inst->as_function())
+	 continue;
+
+      ir_variable *var = inst->as_variable();
+      if ((var != NULL) && (var->data.mode != ir_var_temporary))
+	 continue;
+
+      assert(inst->as_assignment()
+             || inst->as_call()
+             || inst->as_if() /* for initializers with the ?: operator */
+	     || ((var != NULL) && (var->data.mode == ir_var_temporary)));
+
+      if (make_copies) {
+	 inst = inst->clone(target, NULL);
+
+	 if (var != NULL)
+	    hash_table_insert(temps, inst, var);
+	 else
+	    remap_variables(inst, target, temps);
+      } else {
+	 inst->remove();
+      }
+
+      last->insert_after(inst);
+      last = inst;
+   }
+
+   if (make_copies)
+      hash_table_dtor(temps);
+
+   return last;
+}
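+
+/* Calling-pattern sketch for the contract described above (names are
+ * illustrative):
+ *
+ *    exec_node *last = (exec_node *) target_list;      // head sentinel
+ *    last = move_non_declarations(first->ir, last, false, linked);
+ *    last = move_non_declarations(second->ir, last, true, linked);
+ */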
+
+/**
+ * Get the function signature for main from a shader
+ */
+static ir_function_signature *
+get_main_function_signature(gl_shader *sh)
+{
+   ir_function *const f = sh->symbols->get_function("main");
+   if (f != NULL) {
+      exec_list void_parameters;
+
+      /* Look for the 'void main()' signature and ensure that it's defined.
+       * This keeps the linker from accidentally picking a shader that just
+       * contains a prototype for main.
+       *
+       * We don't have to check for multiple definitions of main (in multiple
+       * shaders) because that would have already been caught above.
+       */
+      ir_function_signature *sig = f->matching_signature(NULL, &void_parameters);
+      if ((sig != NULL) && sig->is_defined) {
+	 return sig;
+      }
+   }
+
+   return NULL;
+}
+
+
+/**
+ * This class is only used in link_intrastage_shaders() below but declaring
+ * it inside that function leads to compiler warnings with some versions of
+ * gcc.
+ */
+class array_sizing_visitor : public ir_hierarchical_visitor {
+public:
+   array_sizing_visitor()
+      : mem_ctx(ralloc_context(NULL)),
+        unnamed_interfaces(hash_table_ctor(0, hash_table_pointer_hash,
+                                           hash_table_pointer_compare))
+   {
+   }
+
+   ~array_sizing_visitor()
+   {
+      hash_table_dtor(this->unnamed_interfaces);
+      ralloc_free(this->mem_ctx);
+   }
+
+   virtual ir_visitor_status visit(ir_variable *var)
+   {
+      fixup_type(&var->type, var->data.max_array_access);
+      if (var->type->is_interface()) {
+         if (interface_contains_unsized_arrays(var->type)) {
+            const glsl_type *new_type =
+               resize_interface_members(var->type, var->max_ifc_array_access);
+            var->type = new_type;
+            var->change_interface_type(new_type);
+         }
+      } else if (var->type->is_array() &&
+                 var->type->fields.array->is_interface()) {
+         if (interface_contains_unsized_arrays(var->type->fields.array)) {
+            const glsl_type *new_type =
+               resize_interface_members(var->type->fields.array,
+                                        var->max_ifc_array_access);
+            var->change_interface_type(new_type);
+            var->type =
+               glsl_type::get_array_instance(new_type, var->type->length);
+         }
+      } else if (const glsl_type *ifc_type = var->get_interface_type()) {
+         /* Store a pointer to the variable in the unnamed_interfaces
+          * hashtable.
+          */
+         ir_variable **interface_vars = (ir_variable **)
+            hash_table_find(this->unnamed_interfaces, ifc_type);
+         if (interface_vars == NULL) {
+            interface_vars = rzalloc_array(mem_ctx, ir_variable *,
+                                           ifc_type->length);
+            hash_table_insert(this->unnamed_interfaces, interface_vars,
+                              ifc_type);
+         }
+         unsigned index = ifc_type->field_index(var->name);
+         assert(index < ifc_type->length);
+         assert(interface_vars[index] == NULL);
+         interface_vars[index] = var;
+      }
+      return visit_continue;
+   }
+
+   /**
+    * For each unnamed interface block that was discovered while running the
+    * visitor, adjust the interface type to reflect the newly assigned array
+    * sizes, and fix up the ir_variable nodes to point to the new interface
+    * type.
+    */
+   void fixup_unnamed_interface_types()
+   {
+      hash_table_call_foreach(this->unnamed_interfaces,
+                              fixup_unnamed_interface_type, NULL);
+   }
+
+private:
+   /**
+    * If the type pointed to by \c type represents an unsized array, replace
+    * it with a sized array whose size is determined by max_array_access.
+    */
+   static void fixup_type(const glsl_type **type, unsigned max_array_access)
+   {
+      if ((*type)->is_unsized_array()) {
+         *type = glsl_type::get_array_instance((*type)->fields.array,
+                                               max_array_access + 1);
+         assert(*type != NULL);
+      }
+   }
+
+   /**
+    * Determine whether the given interface type contains unsized arrays (if
+    * it doesn't, array_sizing_visitor doesn't need to process it).
+    */
+   static bool interface_contains_unsized_arrays(const glsl_type *type)
+   {
+      for (unsigned i = 0; i < type->length; i++) {
+         const glsl_type *elem_type = type->fields.structure[i].type;
+         if (elem_type->is_unsized_array())
+            return true;
+      }
+      return false;
+   }
+
+   /**
+    * Create a new interface type based on the given type, with unsized arrays
+    * replaced by sized arrays whose size is determined by
+    * max_ifc_array_access.
+    */
+   static const glsl_type *
+   resize_interface_members(const glsl_type *type,
+                            const unsigned *max_ifc_array_access)
+   {
+      unsigned num_fields = type->length;
+      glsl_struct_field *fields = new glsl_struct_field[num_fields];
+      memcpy(fields, type->fields.structure,
+             num_fields * sizeof(*fields));
+      for (unsigned i = 0; i < num_fields; i++) {
+         fixup_type(&fields[i].type, max_ifc_array_access[i]);
+      }
+      glsl_interface_packing packing =
+         (glsl_interface_packing) type->interface_packing;
+      const glsl_type *new_ifc_type =
+         glsl_type::get_interface_instance(fields, num_fields,
+                                           packing, type->name);
+      delete [] fields;
+      return new_ifc_type;
+   }
+
+   static void fixup_unnamed_interface_type(const void *key, void *data,
+                                            void *)
+   {
+      const glsl_type *ifc_type = (const glsl_type *) key;
+      ir_variable **interface_vars = (ir_variable **) data;
+      unsigned num_fields = ifc_type->length;
+      glsl_struct_field *fields = new glsl_struct_field[num_fields];
+      memcpy(fields, ifc_type->fields.structure,
+             num_fields * sizeof(*fields));
+      bool interface_type_changed = false;
+      for (unsigned i = 0; i < num_fields; i++) {
+         if (interface_vars[i] != NULL &&
+             fields[i].type != interface_vars[i]->type) {
+            fields[i].type = interface_vars[i]->type;
+            interface_type_changed = true;
+         }
+      }
+      if (!interface_type_changed) {
+         delete [] fields;
+         return;
+      }
+      glsl_interface_packing packing =
+         (glsl_interface_packing) ifc_type->interface_packing;
+      const glsl_type *new_ifc_type =
+         glsl_type::get_interface_instance(fields, num_fields, packing,
+                                           ifc_type->name);
+      delete [] fields;
+      for (unsigned i = 0; i < num_fields; i++) {
+         if (interface_vars[i] != NULL)
+            interface_vars[i]->change_interface_type(new_ifc_type);
+      }
+   }
+
+   /**
+    * Memory context used to allocate the data in \c unnamed_interfaces.
+    */
+   void *mem_ctx;
+
+   /**
+    * Hash table from const glsl_type * to an array of ir_variable *'s
+    * pointing to the ir_variables constituting each unnamed interface block.
+    */
+   hash_table *unnamed_interfaces;
+};
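+
+/* Example of the fixup performed by this visitor: a shader declaring
+ * "float weights[];" whose highest access is weights[7] has
+ * max_array_access == 7, so fixup_type() retypes the variable to float[8].
+ */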
+
+/**
+ * Performs the cross-validation of layout qualifiers specified in
+ * redeclaration of gl_FragCoord for the attached fragment shaders,
+ * and propagates them to the linked FS and linked shader program.
+ */
+static void
+link_fs_input_layout_qualifiers(struct gl_shader_program *prog,
+	                        struct gl_shader *linked_shader,
+	                        struct gl_shader **shader_list,
+	                        unsigned num_shaders)
+{
+   linked_shader->redeclares_gl_fragcoord = false;
+   linked_shader->uses_gl_fragcoord = false;
+   linked_shader->origin_upper_left = false;
+   linked_shader->pixel_center_integer = false;
+
+   if (linked_shader->Stage != MESA_SHADER_FRAGMENT ||
+       (prog->Version < 150 && !prog->ARB_fragment_coord_conventions_enable))
+      return;
+
+   for (unsigned i = 0; i < num_shaders; i++) {
+      struct gl_shader *shader = shader_list[i];
+      /* From the GLSL 1.50 spec, page 39:
+       *
+       *   "If gl_FragCoord is redeclared in any fragment shader in a program,
+       *    it must be redeclared in all the fragment shaders in that program
+       *    that have a static use of gl_FragCoord."
+       *
+       * Exclude the case where one of 'linked_shader' or 'shader' redeclares
+       * gl_FragCoord with no layout qualifiers while the other one doesn't
+       * redeclare it.  Strictly following the GLSL 1.50 spec's language, this
+       * should be a link error, but generating a link error here would be
+       * behaviour the spec never intended, and it could also break some
+       * applications.
+       */
+      if ((linked_shader->redeclares_gl_fragcoord
+           && !shader->redeclares_gl_fragcoord
+           && shader->uses_gl_fragcoord
+           && (linked_shader->origin_upper_left
+               || linked_shader->pixel_center_integer))
+          || (shader->redeclares_gl_fragcoord
+              && !linked_shader->redeclares_gl_fragcoord
+              && linked_shader->uses_gl_fragcoord
+              && (shader->origin_upper_left
+                  || shader->pixel_center_integer))) {
+             linker_error(prog, "fragment shader defined with conflicting "
+                         "layout qualifiers for gl_FragCoord\n");
+      }
+
+      /* From the GLSL 1.50 spec, page 39:
+       *
+       *   "All redeclarations of gl_FragCoord in all fragment shaders in a
+       *    single program must have the same set of qualifiers."
+       */
+      if (linked_shader->redeclares_gl_fragcoord && shader->redeclares_gl_fragcoord
+          && (shader->origin_upper_left != linked_shader->origin_upper_left
+          || shader->pixel_center_integer != linked_shader->pixel_center_integer)) {
+         linker_error(prog, "fragment shader defined with conflicting "
+                      "layout qualifiers for gl_FragCoord\n");
+      }
+
+      /* Update the linked shader state.  Note that uses_gl_fragcoord should
+       * accumulate the results.  The other values should replace.  If there
+       * are multiple redeclarations, all the fields except uses_gl_fragcoord
+       * are already known to be the same.
+       */
+      if (shader->redeclares_gl_fragcoord || shader->uses_gl_fragcoord) {
+         linked_shader->redeclares_gl_fragcoord =
+            shader->redeclares_gl_fragcoord;
+         linked_shader->uses_gl_fragcoord = linked_shader->uses_gl_fragcoord
+            || shader->uses_gl_fragcoord;
+         linked_shader->origin_upper_left = shader->origin_upper_left;
+         linked_shader->pixel_center_integer = shader->pixel_center_integer;
+      }
+   }
+}
+
+/**
+ * Performs the cross-validation of geometry shader max_vertices and
+ * primitive type layout qualifiers for the attached geometry shaders,
+ * and propagates them to the linked GS and linked shader program.
+ */
+static void
+link_gs_inout_layout_qualifiers(struct gl_shader_program *prog,
+				struct gl_shader *linked_shader,
+				struct gl_shader **shader_list,
+				unsigned num_shaders)
+{
+   linked_shader->Geom.VerticesOut = 0;
+   linked_shader->Geom.Invocations = 0;
+   linked_shader->Geom.InputType = PRIM_UNKNOWN;
+   linked_shader->Geom.OutputType = PRIM_UNKNOWN;
+
+   /* No in/out qualifiers defined for anything but GLSL 1.50+
+    * geometry shaders so far.
+    */
+   if (linked_shader->Stage != MESA_SHADER_GEOMETRY || prog->Version < 150)
+      return;
+
+   /* From the GLSL 1.50 spec, page 46:
+    *
+    *     "All geometry shader output layout declarations in a program
+    *      must declare the same layout and same value for
+    *      max_vertices. There must be at least one geometry output
+    *      layout declaration somewhere in a program, but not all
+    *      geometry shaders (compilation units) are required to
+    *      declare it."
+    */
+
+   for (unsigned i = 0; i < num_shaders; i++) {
+      struct gl_shader *shader = shader_list[i];
+
+      if (shader->Geom.InputType != PRIM_UNKNOWN) {
+	 if (linked_shader->Geom.InputType != PRIM_UNKNOWN &&
+	     linked_shader->Geom.InputType != shader->Geom.InputType) {
+	    linker_error(prog, "geometry shader defined with conflicting "
+			 "input types\n");
+	    return;
+	 }
+	 linked_shader->Geom.InputType = shader->Geom.InputType;
+      }
+
+      if (shader->Geom.OutputType != PRIM_UNKNOWN) {
+	 if (linked_shader->Geom.OutputType != PRIM_UNKNOWN &&
+	     linked_shader->Geom.OutputType != shader->Geom.OutputType) {
+	    linker_error(prog, "geometry shader defined with conflicting "
+			 "output types\n");
+	    return;
+	 }
+	 linked_shader->Geom.OutputType = shader->Geom.OutputType;
+      }
+
+      if (shader->Geom.VerticesOut != 0) {
+	 if (linked_shader->Geom.VerticesOut != 0 &&
+	     linked_shader->Geom.VerticesOut != shader->Geom.VerticesOut) {
+	    linker_error(prog, "geometry shader defined with conflicting "
+			 "output vertex count (%d and %d)\n",
+			 linked_shader->Geom.VerticesOut,
+			 shader->Geom.VerticesOut);
+	    return;
+	 }
+	 linked_shader->Geom.VerticesOut = shader->Geom.VerticesOut;
+      }
+
+      if (shader->Geom.Invocations != 0) {
+	 if (linked_shader->Geom.Invocations != 0 &&
+	     linked_shader->Geom.Invocations != shader->Geom.Invocations) {
+	    linker_error(prog, "geometry shader defined with conflicting "
+			 "invocation count (%d and %d)\n",
+			 linked_shader->Geom.Invocations,
+			 shader->Geom.Invocations);
+	    return;
+	 }
+	 linked_shader->Geom.Invocations = shader->Geom.Invocations;
+      }
+   }
+
+   /* Just do the intrastage -> interstage propagation right now,
+    * since we already know we're in the right type of shader program
+    * for doing it.
+    */
+   if (linked_shader->Geom.InputType == PRIM_UNKNOWN) {
+      linker_error(prog,
+		   "geometry shader didn't declare primitive input type\n");
+      return;
+   }
+   prog->Geom.InputType = linked_shader->Geom.InputType;
+
+   if (linked_shader->Geom.OutputType == PRIM_UNKNOWN) {
+      linker_error(prog,
+		   "geometry shader didn't declare primitive output type\n");
+      return;
+   }
+   prog->Geom.OutputType = linked_shader->Geom.OutputType;
+
+   if (linked_shader->Geom.VerticesOut == 0) {
+      linker_error(prog,
+		   "geometry shader didn't declare max_vertices\n");
+      return;
+   }
+   prog->Geom.VerticesOut = linked_shader->Geom.VerticesOut;
+
+   if (linked_shader->Geom.Invocations == 0)
+      linked_shader->Geom.Invocations = 1;
+
+   prog->Geom.Invocations = linked_shader->Geom.Invocations;
+}
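+
+/* Illustration: with "layout(points) in;" in one compilation unit and
+ * "layout(triangle_strip, max_vertices = 4) out;" in another, the linked
+ * shader ends up with the points input type, the triangle-strip output
+ * type, VerticesOut == 4, and Invocations defaulted to 1 as above.
+ */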
+
+
+/**
+ * Perform cross-validation of compute shader local_size_{x,y,z} layout
+ * qualifiers for the attached compute shaders, and propagate them to the
+ * linked CS and linked shader program.
+ */
+static void
+link_cs_input_layout_qualifiers(struct gl_shader_program *prog,
+                                struct gl_shader *linked_shader,
+                                struct gl_shader **shader_list,
+                                unsigned num_shaders)
+{
+   for (int i = 0; i < 3; i++)
+      linked_shader->Comp.LocalSize[i] = 0;
+
+   /* This function is called for all shader stages, but it only has an effect
+    * for compute shaders.
+    */
+   if (linked_shader->Stage != MESA_SHADER_COMPUTE)
+      return;
+
+   /* From the ARB_compute_shader spec, in the section describing local size
+    * declarations:
+    *
+    *     If multiple compute shaders attached to a single program object
+    *     declare local work-group size, the declarations must be identical;
+    *     otherwise a link-time error results. Furthermore, if a program
+    *     object contains any compute shaders, at least one must contain an
+    *     input layout qualifier specifying the local work sizes of the
+    *     program, or a link-time error will occur.
+    */
+   for (unsigned sh = 0; sh < num_shaders; sh++) {
+      struct gl_shader *shader = shader_list[sh];
+
+      if (shader->Comp.LocalSize[0] != 0) {
+         if (linked_shader->Comp.LocalSize[0] != 0) {
+            for (int i = 0; i < 3; i++) {
+               if (linked_shader->Comp.LocalSize[i] !=
+                   shader->Comp.LocalSize[i]) {
+                  linker_error(prog, "compute shader defined with conflicting "
+                               "local sizes\n");
+                  return;
+               }
+            }
+         }
+         for (int i = 0; i < 3; i++)
+            linked_shader->Comp.LocalSize[i] = shader->Comp.LocalSize[i];
+      }
+   }
+
+   /* Just do the intrastage -> interstage propagation right now,
+    * since we already know we're in the right type of shader program
+    * for doing it.
+    */
+   if (linked_shader->Comp.LocalSize[0] == 0) {
+      linker_error(prog, "compute shader didn't declare local size\n");
+      return;
+   }
+   for (int i = 0; i < 3; i++)
+      prog->Comp.LocalSize[i] = linked_shader->Comp.LocalSize[i];
+}
+
+
+/**
+ * Combine a group of shaders for a single stage to generate a linked shader
+ *
+ * \note
+ * If this function is supplied a single shader, it is cloned, and the new
+ * shader is returned.
+ */
+static struct gl_shader *
+link_intrastage_shaders(void *mem_ctx,
+			struct gl_context *ctx,
+			struct gl_shader_program *prog,
+			struct gl_shader **shader_list,
+			unsigned num_shaders)
+{
+   struct gl_uniform_block *uniform_blocks = NULL;
+
+   /* Check that global variables defined in multiple shaders are consistent.
+    */
+   cross_validate_globals(prog, shader_list, num_shaders, false);
+   if (!prog->LinkStatus)
+      return NULL;
+
+   /* Check that interface blocks defined in multiple shaders are consistent.
+    */
+   validate_intrastage_interface_blocks(prog, (const gl_shader **)shader_list,
+                                        num_shaders);
+   if (!prog->LinkStatus)
+      return NULL;
+
+   /* Link up uniform blocks defined within this stage. */
+   const unsigned num_uniform_blocks =
+      link_uniform_blocks(mem_ctx, prog, shader_list, num_shaders,
+                          &uniform_blocks);
+
+   /* Check that there is only a single definition of each function signature
+    * across all shaders.
+    */
+   for (unsigned i = 0; i < (num_shaders - 1); i++) {
+      foreach_list(node, shader_list[i]->ir) {
+	 ir_function *const f = ((ir_instruction *) node)->as_function();
+
+	 if (f == NULL)
+	    continue;
+
+	 for (unsigned j = i + 1; j < num_shaders; j++) {
+	    ir_function *const other =
+	       shader_list[j]->symbols->get_function(f->name);
+
+	    /* If the other shader has no function (and therefore no function
+	     * signatures) with the same name, skip to the next shader.
+	     */
+	    if (other == NULL)
+	       continue;
+
+	    foreach_list(n, &f->signatures) {
+	       ir_function_signature *sig = (ir_function_signature *) n;
+
+	       if (!sig->is_defined || sig->is_builtin())
+		  continue;
+
+	       ir_function_signature *other_sig =
+		  other->exact_matching_signature(NULL, &sig->parameters);
+
+	       if ((other_sig != NULL) && other_sig->is_defined
+		   && !other_sig->is_builtin()) {
+		  linker_error(prog, "function `%s' is multiply defined",
+			       f->name);
+		  return NULL;
+	       }
+	    }
+	 }
+      }
+   }
+
+   /* Find the shader that defines main, and make a clone of it.
+    *
+    * Starting with the clone, search for undefined references.  If one is
+    * found, find the shader that defines it.  Clone the reference and add
+    * it to the shader.  Repeat until there are no undefined references or
+    * until a reference cannot be resolved.
+    */
+   gl_shader *main = NULL;
+   for (unsigned i = 0; i < num_shaders; i++) {
+      if (get_main_function_signature(shader_list[i]) != NULL) {
+	 main = shader_list[i];
+	 break;
+      }
+   }
+
+   if (main == NULL) {
+      linker_error(prog, "%s shader lacks `main'\n",
+		   _mesa_shader_stage_to_string(shader_list[0]->Stage));
+      return NULL;
+   }
+
+   gl_shader *linked = ctx->Driver.NewShader(NULL, 0, main->Type);
+   linked->ir = new(linked) exec_list;
+   clone_ir_list(mem_ctx, linked->ir, main->ir);
+
+   linked->UniformBlocks = uniform_blocks;
+   linked->NumUniformBlocks = num_uniform_blocks;
+   ralloc_steal(linked, linked->UniformBlocks);
+
+   link_fs_input_layout_qualifiers(prog, linked, shader_list, num_shaders);
+   link_gs_inout_layout_qualifiers(prog, linked, shader_list, num_shaders);
+   link_cs_input_layout_qualifiers(prog, linked, shader_list, num_shaders);
+
+   populate_symbol_table(linked);
+
+   /* Get a pointer to the main function in the final linked shader (i.e., the
+    * copy of the original shader that contained the main function).
+    */
+   ir_function_signature *const main_sig = get_main_function_signature(linked);
+
+   /* Move any instructions other than variable declarations or function
+    * declarations into main.
+    */
+   exec_node *insertion_point =
+      move_non_declarations(linked->ir, (exec_node *) &main_sig->body, false,
+			    linked);
+
+   for (unsigned i = 0; i < num_shaders; i++) {
+      if (shader_list[i] == main)
+	 continue;
+
+      insertion_point = move_non_declarations(shader_list[i]->ir,
+					      insertion_point, true, linked);
+   }
+
+   /* Check if any shader needs built-in functions. */
+   bool need_builtins = false;
+   for (unsigned i = 0; i < num_shaders; i++) {
+      if (shader_list[i]->uses_builtin_functions) {
+         need_builtins = true;
+         break;
+      }
+   }
+
+   bool ok;
+   if (need_builtins) {
+      /* Make a temporary array one larger than shader_list, which will hold
+       * the built-in function shader as well.
+       */
+      gl_shader **linking_shaders = (gl_shader **)
+         calloc(num_shaders + 1, sizeof(gl_shader *));
+      memcpy(linking_shaders, shader_list, num_shaders * sizeof(gl_shader *));
+      linking_shaders[num_shaders] = _mesa_glsl_get_builtin_function_shader();
+
+      ok = link_function_calls(prog, linked, linking_shaders, num_shaders + 1);
+
+      free(linking_shaders);
+   } else {
+      ok = link_function_calls(prog, linked, shader_list, num_shaders);
+   }
+
+   if (!ok) {
+      ctx->Driver.DeleteShader(ctx, linked);
+      return NULL;
+   }
+
+   /* At this point linked should contain all of the linked IR, so
+    * validate it to make sure nothing went wrong.
+    */
+   validate_ir_tree(linked->ir);
+
+   /* Set the size of geometry shader input arrays */
+   if (linked->Stage == MESA_SHADER_GEOMETRY) {
+      unsigned num_vertices = vertices_per_prim(prog->Geom.InputType);
+      geom_array_resize_visitor input_resize_visitor(num_vertices, prog);
+      foreach_list(n, linked->ir) {
+         ir_instruction *ir = (ir_instruction *) n;
+         ir->accept(&input_resize_visitor);
+      }
+   }
+
+   /* Make a pass over all variable declarations to ensure that arrays with
+    * unspecified sizes have a size specified.  The size is inferred from the
+    * max_array_access field.
+    */
+   array_sizing_visitor v;
+   v.run(linked->ir);
+   v.fixup_unnamed_interface_types();
+
+   return linked;
+}
+
+/**
+ * Update the sizes of linked shader uniform arrays to the maximum
+ * array index used.
+ *
+ * From page 81 (page 95 of the PDF) of the OpenGL 2.1 spec:
+ *
+ *     If one or more elements of an array are active,
+ *     GetActiveUniform will return the name of the array in name,
+ *     subject to the restrictions listed above. The type of the array
+ *     is returned in type. The size parameter contains the highest
+ *     array element index used, plus one. The compiler or linker
+ *     determines the highest index used.  There will be only one
+ *     active uniform reported by the GL per uniform array.
+ *
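+ * For example (illustrative): a uniform declared as "float a[8]" whose
+ * highest statically used index across all linked stages is 3 is resized
+ * to a four-element array, because max_array_access will be 3.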
+ */
+static void
+update_array_sizes(struct gl_shader_program *prog)
+{
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      if (prog->_LinkedShaders[i] == NULL)
+         continue;
+
+      foreach_list(node, prog->_LinkedShaders[i]->ir) {
+	 ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+	 if ((var == NULL) || (var->data.mode != ir_var_uniform) ||
+	     !var->type->is_array())
+	    continue;
+
+	 /* GL_ARB_uniform_buffer_object says that std140 uniforms
+	  * will not be eliminated.  Since we always do std140, just
+	  * don't resize arrays in UBOs.
+          *
+          * Atomic counters are supposed to get deterministic
+          * locations assigned based on the declaration ordering and
+          * sizes, array compaction would mess that up.
+	  */
+	 if (var->is_in_uniform_block() || var->type->contains_atomic())
+	    continue;
+
+	 unsigned int size = var->data.max_array_access;
+	 for (unsigned j = 0; j < MESA_SHADER_STAGES; j++) {
+	    if (prog->_LinkedShaders[j] == NULL)
+	       continue;
+
+	    foreach_list(node2, prog->_LinkedShaders[j]->ir) {
+	       ir_variable *other_var = ((ir_instruction *) node2)->as_variable();
+	       if (!other_var)
+		  continue;
+
+	       if (strcmp(var->name, other_var->name) == 0 &&
+		   other_var->data.max_array_access > size) {
+		  size = other_var->data.max_array_access;
+	       }
+	    }
+	 }
+
+	 if (size + 1 != var->type->length) {
+	    /* If this is a built-in uniform (i.e., it's backed by some
+	     * fixed-function state), adjust the number of state slots to
+	     * match the new array size.  The number of slots per array entry
+	     * is not known.  It seems safe to assume that the total number of
+	     * slots is an integer multiple of the number of array elements.
+	     * Determine the number of slots per array element by dividing by
+	     * the old (total) size.
+	     */
+	    if (var->num_state_slots > 0) {
+	       var->num_state_slots = (size + 1)
+		  * (var->num_state_slots / var->type->length);
+	    }
+
+	    var->type = glsl_type::get_array_instance(var->type->fields.array,
+						      size + 1);
+	    /* FINISHME: We should update the types of array
+	     * dereferences of this variable now.
+	     */
+	 }
+      }
+   }
+}
+
+/**
+ * Find a contiguous set of available bits in a bitmask.
+ *
+ * \param used_mask     Bits representing used (1) and unused (0) locations
+ * \param needed_count  Number of contiguous bits needed.
+ *
+ * \return
+ * Base location of the available bits on success or -1 on failure.
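+ *
+ * For example (illustrative): with \c used_mask = 0xb (bits 0, 1 and 3 in
+ * use) and \c needed_count = 2, bits 4 and 5 form the first free contiguous
+ * pair, so the function returns 4.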
+ */
+int
+find_available_slots(unsigned used_mask, unsigned needed_count)
+{
+   unsigned needed_mask = (1 << needed_count) - 1;
+   const int max_bit_to_test = (8 * sizeof(used_mask)) - needed_count;
+
+   /* The comparison to 32 is redundant, but without it GCC emits "warning:
+    * cannot optimize possibly infinite loops" for the loop below.
+    */
+   if ((needed_count == 0) || (max_bit_to_test < 0) || (max_bit_to_test > 32))
+      return -1;
+
+   for (int i = 0; i <= max_bit_to_test; i++) {
+      if ((needed_mask & ~used_mask) == needed_mask)
+	 return i;
+
+      needed_mask <<= 1;
+   }
+
+   return -1;
+}
+
+
+/**
+ * Assign locations for either VS inputs or FS outputs
+ *
+ * \param prog          Shader program whose variables need locations assigned
+ * \param target_index  Selector for the program target to receive location
+ *                      assignments.  Must be either \c MESA_SHADER_VERTEX or
+ *                      \c MESA_SHADER_FRAGMENT.
+ * \param max_index     Maximum number of generic locations.  This corresponds
+ *                      to either the maximum number of draw buffers or the
+ *                      maximum number of generic attributes.
+ *
+ * \return
+ * If locations are successfully assigned, true is returned.  Otherwise an
+ * error is emitted to the shader link log and false is returned.
+ */
+bool
+assign_attribute_or_color_locations(gl_shader_program *prog,
+				    unsigned target_index,
+				    unsigned max_index)
+{
+   /* Mark invalid locations as being used.
+    */
+   unsigned used_locations = (max_index >= 32)
+      ? ~0 : ~((1 << max_index) - 1);
+
+   assert((target_index == MESA_SHADER_VERTEX)
+	  || (target_index == MESA_SHADER_FRAGMENT));
+
+   gl_shader *const sh = prog->_LinkedShaders[target_index];
+   if (sh == NULL)
+      return true;
+
+   /* Operate in a total of four passes.
+    *
+    * 1. Invalidate the location assignments for all vertex shader inputs.
+    *
+    * 2. Assign locations for inputs that have user-defined (via
+    *    glBindVertexAttribLocation) locations and outputs that have
+    *    user-defined locations (via glBindFragDataLocation).
+    *
+    * 3. Sort the attributes without assigned locations by number of slots
+    *    required in decreasing order.  Fragmentation caused by attribute
+    *    locations assigned by the application may prevent large attributes
+    *    from having enough contiguous space.
+    *
+    * 4. Assign locations to any inputs without assigned locations.
+    */
+
+   const int generic_base = (target_index == MESA_SHADER_VERTEX)
+      ? (int) VERT_ATTRIB_GENERIC0 : (int) FRAG_RESULT_DATA0;
+
+   const enum ir_variable_mode direction =
+      (target_index == MESA_SHADER_VERTEX)
+      ? ir_var_shader_in : ir_var_shader_out;
+
+   /* Temporary storage for the set of attributes that need locations assigned.
+    */
+   struct temp_attr {
+      unsigned slots;
+      ir_variable *var;
+
+      /* Used below in the call to qsort. */
+      static int compare(const void *a, const void *b)
+      {
+	 const temp_attr *const l = (const temp_attr *) a;
+	 const temp_attr *const r = (const temp_attr *) b;
+
+	 /* Reversed because we want a descending order sort below. */
+	 return r->slots - l->slots;
+      }
+   } to_assign[16];
+
+   unsigned num_attr = 0;
+
+   foreach_list(node, sh->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if ((var == NULL) || (var->data.mode != (unsigned) direction))
+	 continue;
+
+      if (var->data.explicit_location) {
+	 if ((var->data.location >= (int)(max_index + generic_base))
+	     || (var->data.location < 0)) {
+	    linker_error(prog,
+			 "invalid explicit location %d specified for `%s'\n",
+			 (var->data.location < 0)
+			 ? var->data.location
+                         : var->data.location - generic_base,
+			 var->name);
+	    return false;
+	 }
+      } else if (target_index == MESA_SHADER_VERTEX) {
+	 unsigned binding;
+
+	 if (prog->AttributeBindings->get(binding, var->name)) {
+	    assert(binding >= VERT_ATTRIB_GENERIC0);
+	    var->data.location = binding;
+            var->data.is_unmatched_generic_inout = 0;
+	 }
+      } else if (target_index == MESA_SHADER_FRAGMENT) {
+	 unsigned binding;
+	 unsigned index;
+
+	 if (prog->FragDataBindings->get(binding, var->name)) {
+	    assert(binding >= FRAG_RESULT_DATA0);
+	    var->data.location = binding;
+            var->data.is_unmatched_generic_inout = 0;
+
+	    if (prog->FragDataIndexBindings->get(index, var->name)) {
+	       var->data.index = index;
+	    }
+	 }
+      }
+
+      /* If the variable is not a built-in and has a location statically
+       * assigned in the shader (presumably via a layout qualifier), make sure
+       * that it doesn't collide with other assigned locations.  Otherwise,
+       * add it to the list of variables that need linker-assigned locations.
+       */
+      const unsigned slots = var->type->count_attribute_slots();
+      if (var->data.location != -1) {
+	 if (var->data.location >= generic_base && var->data.index < 1) {
+	    /* From page 61 of the OpenGL 4.0 spec:
+	     *
+	     *     "LinkProgram will fail if the attribute bindings assigned
+            *     by BindAttribLocation do not leave enough space to
+	     *     assign a location for an active matrix attribute or an
+	     *     active attribute array, both of which require multiple
+	     *     contiguous generic attributes."
+	     *
+            * I think the above text prohibits the aliasing of explicit and
+            * automatic assignments.  Aliasing is, however, allowed in manual
+            * assignments of attribute locations.  See the comments below for
+            * details.
+	     *
+	     * From OpenGL 4.0 spec, page 61:
+	     *
+	     *     "It is possible for an application to bind more than one
+	     *     attribute name to the same location. This is referred to as
+	     *     aliasing. This will only work if only one of the aliased
+	     *     attributes is active in the executable program, or if no
+	     *     path through the shader consumes more than one attribute of
+	     *     a set of attributes aliased to the same location. A link
+	     *     error can occur if the linker determines that every path
+	     *     through the shader consumes multiple aliased attributes,
+	     *     but implementations are not required to generate an error
+	     *     in this case."
+	     *
+	     * From GLSL 4.30 spec, page 54:
+	     *
+	     *    "A program will fail to link if any two non-vertex shader
+	     *     input variables are assigned to the same location. For
+	     *     vertex shaders, multiple input variables may be assigned
+	     *     to the same location using either layout qualifiers or via
+	     *     the OpenGL API. However, such aliasing is intended only to
+	     *     support vertex shaders where each execution path accesses
+	     *     at most one input per each location. Implementations are
+	     *     permitted, but not required, to generate link-time errors
+	     *     if they detect that every path through the vertex shader
+	     *     executable accesses multiple inputs assigned to any single
+	     *     location. For all shader types, a program will fail to link
+	     *     if explicit location assignments leave the linker unable
+	     *     to find space for other variables without explicit
+	     *     assignments."
+	     *
+	     * From OpenGL ES 3.0 spec, page 56:
+	     *
+	     *    "Binding more than one attribute name to the same location
+	     *     is referred to as aliasing, and is not permitted in OpenGL
+	     *     ES Shading Language 3.00 vertex shaders. LinkProgram will
+	     *     fail when this condition exists. However, aliasing is
+	     *     possible in OpenGL ES Shading Language 1.00 vertex shaders.
+	     *     This will only work if only one of the aliased attributes
+	     *     is active in the executable program, or if no path through
+	     *     the shader consumes more than one attribute of a set of
+	     *     attributes aliased to the same location. A link error can
+	     *     occur if the linker determines that every path through the
+	     *     shader consumes multiple aliased attributes, but implemen-
+	     *     tations are not required to generate an error in this case."
+	     *
+            * After looking at the above references from the OpenGL, OpenGL ES,
+            * and GLSL specifications, we allow aliasing of vertex input
+            * variables in OpenGL 2.0 (and above) and in OpenGL ES 2.0.
+	     *
+            * NOTE: This is not required by the spec, but it's worth mentioning
+	     * here that we're not doing anything to make sure that no path
+	     * through the vertex shader executable accesses multiple inputs
+	     * assigned to any single location.
+	     */
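+
+            /* An illustrative (hypothetical) example of the aliasing permitted
+             * by the quotes above, in a desktop GL vertex shader:
+             *
+             *    layout(location = 1) in vec4 a;
+             *    layout(location = 1) in vec4 b;   // OK if only one is used
+             *
+             * The same overlap on fragment shader outputs, or in a GLSL ES
+             * 3.00 vertex shader, produces the link error generated below.
+             */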
+
+	    /* Mask representing the contiguous slots that will be used by
+	     * this attribute.
+	     */
+	    const unsigned attr = var->data.location - generic_base;
+	    const unsigned use_mask = (1 << slots) - 1;
+            const char *const string = (target_index == MESA_SHADER_VERTEX)
+               ? "vertex shader input" : "fragment shader output";
+
+            /* Generate a link error if the requested locations for this
+             * attribute exceed the maximum allowed attribute location.
+             */
+            if (attr + slots > max_index) {
+               linker_error(prog,
+                           "insufficient contiguous locations "
+                           "available for %s `%s' %d %d %d", string,
+                           var->name, used_locations, use_mask, attr);
+               return false;
+            }
+
+	    /* Generate a link error if the set of bits requested for this
+	     * attribute overlaps any previously allocated bits.
+	     */
+	    if ((~(use_mask << attr) & used_locations) != used_locations) {
+               if (target_index == MESA_SHADER_FRAGMENT ||
+                   (prog->IsES && prog->Version >= 300)) {
+                  linker_error(prog,
+                               "overlapping location is assigned "
+                               "to %s `%s' %d %d %d\n", string,
+                               var->name, used_locations, use_mask, attr);
+                  return false;
+               } else {
+                  linker_warning(prog,
+                                 "overlapping location is assigned "
+                                 "to %s `%s' %d %d %d\n", string,
+                                 var->name, used_locations, use_mask, attr);
+               }
+	    }
+
+	    used_locations |= (use_mask << attr);
+	 }
+
+	 continue;
+      }
+
+      to_assign[num_attr].slots = slots;
+      to_assign[num_attr].var = var;
+      num_attr++;
+   }
+
+   /* If all of the attributes were assigned locations by the application (or
+    * are built-in attributes with fixed locations), return early.  This should
+    * be the common case.
+    */
+   if (num_attr == 0)
+      return true;
+
+   qsort(to_assign, num_attr, sizeof(to_assign[0]), temp_attr::compare);
+
+   if (target_index == MESA_SHADER_VERTEX) {
+      /* VERT_ATTRIB_GENERIC0 is a pseudo-alias for VERT_ATTRIB_POS.  It can
+       * only be explicitly assigned via glBindAttribLocation.  Mark it as
+       * reserved to prevent it from being automatically allocated below.
+       */
+      find_deref_visitor find("gl_Vertex");
+      find.run(sh->ir);
+      if (find.variable_found())
+	 used_locations |= (1 << 0);
+   }
+
+   for (unsigned i = 0; i < num_attr; i++) {
+      /* Mask representing the contiguous slots that will be used by this
+       * attribute.
+       */
+      const unsigned use_mask = (1 << to_assign[i].slots) - 1;
+
+      int location = find_available_slots(used_locations, to_assign[i].slots);
+
+      if (location < 0) {
+	 const char *const string = (target_index == MESA_SHADER_VERTEX)
+	    ? "vertex shader input" : "fragment shader output";
+
+	 linker_error(prog,
+		      "insufficient contiguous locations "
+		      "available for %s `%s'",
+		      string, to_assign[i].var->name);
+	 return false;
+      }
+
+      to_assign[i].var->data.location = generic_base + location;
+      to_assign[i].var->data.is_unmatched_generic_inout = 0;
+      used_locations |= (use_mask << location);
+   }
+
+   return true;
+}
+
+
+/**
+ * Demote shader inputs and outputs that are not used in other stages
+ */
+void
+demote_shader_inputs_and_outputs(gl_shader *sh, enum ir_variable_mode mode)
+{
+   foreach_list(node, sh->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if ((var == NULL) || (var->data.mode != int(mode)))
+	 continue;
+
+      /* A shader 'in' or 'out' variable is only really an input or output if
+       * its value is used by other shader stages.  This will cause the variable
+       * to have a location assigned.
+       */
+      if (var->data.is_unmatched_generic_inout) {
+	 var->data.mode = ir_var_auto;
+      }
+   }
+}
+
+
+/**
+ * Store the gl_FragDepth layout in the gl_shader_program struct.
+ */
+static void
+store_fragdepth_layout(struct gl_shader_program *prog)
+{
+   if (prog->_LinkedShaders[MESA_SHADER_FRAGMENT] == NULL) {
+      return;
+   }
+
+   struct exec_list *ir = prog->_LinkedShaders[MESA_SHADER_FRAGMENT]->ir;
+
+   /* We don't look up the gl_FragDepth symbol directly because if
+    * gl_FragDepth is not used in the shader, it's removed from the IR.
+    * However, the symbol won't be removed from the symbol table.
+    *
+    * We're only interested in the cases where the variable is NOT removed
+    * from the IR.
+    */
+   foreach_list(node, ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var == NULL || var->data.mode != ir_var_shader_out) {
+         continue;
+      }
+
+      if (strcmp(var->name, "gl_FragDepth") == 0) {
+         switch (var->data.depth_layout) {
+         case ir_depth_layout_none:
+            prog->FragDepthLayout = FRAG_DEPTH_LAYOUT_NONE;
+            return;
+         case ir_depth_layout_any:
+            prog->FragDepthLayout = FRAG_DEPTH_LAYOUT_ANY;
+            return;
+         case ir_depth_layout_greater:
+            prog->FragDepthLayout = FRAG_DEPTH_LAYOUT_GREATER;
+            return;
+         case ir_depth_layout_less:
+            prog->FragDepthLayout = FRAG_DEPTH_LAYOUT_LESS;
+            return;
+         case ir_depth_layout_unchanged:
+            prog->FragDepthLayout = FRAG_DEPTH_LAYOUT_UNCHANGED;
+            return;
+         default:
+            assert(0);
+            return;
+         }
+      }
+   }
+}
+
+/**
+ * Validate the resources used by a program versus the implementation limits
+ */
+static void
+check_resources(struct gl_context *ctx, struct gl_shader_program *prog)
+{
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      struct gl_shader *sh = prog->_LinkedShaders[i];
+
+      if (sh == NULL)
+	 continue;
+
+      if (sh->num_samplers > ctx->Const.Program[i].MaxTextureImageUnits) {
+	 linker_error(prog, "Too many %s shader texture samplers",
+		      _mesa_shader_stage_to_string(i));
+      }
+
+      if (sh->num_uniform_components >
+          ctx->Const.Program[i].MaxUniformComponents) {
+         if (ctx->Const.GLSLSkipStrictMaxUniformLimitCheck) {
+            linker_warning(prog, "Too many %s shader default uniform block "
+                           "components, but the driver will try to optimize "
+                           "them out; this is non-portable out-of-spec "
+			   "behavior\n",
+                           _mesa_shader_stage_to_string(i));
+         } else {
+            linker_error(prog, "Too many %s shader default uniform block "
+			 "components",
+                         _mesa_shader_stage_to_string(i));
+         }
+      }
+
+      if (sh->num_combined_uniform_components >
+	  ctx->Const.Program[i].MaxCombinedUniformComponents) {
+         if (ctx->Const.GLSLSkipStrictMaxUniformLimitCheck) {
+            linker_warning(prog, "Too many %s shader uniform components, "
+                           "but the driver will try to optimize them out; "
+                           "this is non-portable out-of-spec behavior\n",
+                           _mesa_shader_stage_to_string(i));
+         } else {
+            linker_error(prog, "Too many %s shader uniform components",
+                         _mesa_shader_stage_to_string(i));
+         }
+      }
+   }
+
+   unsigned blocks[MESA_SHADER_STAGES] = {0};
+   unsigned total_uniform_blocks = 0;
+
+   for (unsigned i = 0; i < prog->NumUniformBlocks; i++) {
+      for (unsigned j = 0; j < MESA_SHADER_STAGES; j++) {
+         if (prog->UniformBlockStageIndex[j][i] != -1) {
+            blocks[j]++;
+            total_uniform_blocks++;
+         }
+      }
+   }
+
+   /* Check the combined and per-stage limits once, after all of the blocks
+    * have been counted, so that each failure is reported only once.
+    */
+   if (total_uniform_blocks > ctx->Const.MaxCombinedUniformBlocks) {
+      linker_error(prog, "Too many combined uniform blocks (%d/%d)",
+                   prog->NumUniformBlocks,
+                   ctx->Const.MaxCombinedUniformBlocks);
+   } else {
+      for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+         const unsigned max_uniform_blocks =
+            ctx->Const.Program[i].MaxUniformBlocks;
+         if (blocks[i] > max_uniform_blocks) {
+            linker_error(prog, "Too many %s uniform blocks (%d/%d)",
+                         _mesa_shader_stage_to_string(i),
+                         blocks[i],
+                         max_uniform_blocks);
+            break;
+         }
+      }
+   }
+}
+
+/**
+ * Validate shader image resources.
+ */
+static void
+check_image_resources(struct gl_context *ctx, struct gl_shader_program *prog)
+{
+   unsigned total_image_units = 0;
+   unsigned fragment_outputs = 0;
+
+   if (!ctx->Extensions.ARB_shader_image_load_store)
+      return;
+
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      struct gl_shader *sh = prog->_LinkedShaders[i];
+
+      if (sh) {
+         if (sh->NumImages > ctx->Const.Program[i].MaxImageUniforms)
+            linker_error(prog, "Too many %s shader image uniforms",
+                         _mesa_shader_stage_to_string(i));
+
+         total_image_units += sh->NumImages;
+
+         if (i == MESA_SHADER_FRAGMENT) {
+            foreach_list(node, sh->ir) {
+               ir_variable *var = ((ir_instruction *)node)->as_variable();
+               if (var && var->data.mode == ir_var_shader_out)
+                  fragment_outputs += var->type->count_attribute_slots();
+            }
+         }
+      }
+   }
+
+   if (total_image_units > ctx->Const.MaxCombinedImageUniforms)
+      linker_error(prog, "Too many combined image uniforms");
+
+   if (total_image_units + fragment_outputs >
+       ctx->Const.MaxCombinedImageUnitsAndFragmentOutputs)
+      linker_error(prog, "Too many combined image uniforms and fragment outputs");
+}
+
+void
+link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
+{
+   tfeedback_decl *tfeedback_decls = NULL;
+   unsigned num_tfeedback_decls = prog->TransformFeedback.NumVarying;
+
+   void *mem_ctx = ralloc_context(NULL); // temporary linker context
+
+   prog->LinkStatus = true; /* All error paths will set this to false */
+   prog->Validated = false;
+   prog->_Used = false;
+
+   ralloc_free(prog->InfoLog);
+   prog->InfoLog = ralloc_strdup(NULL, "");
+
+   ralloc_free(prog->UniformBlocks);
+   prog->UniformBlocks = NULL;
+   prog->NumUniformBlocks = 0;
+   for (int i = 0; i < MESA_SHADER_STAGES; i++) {
+      ralloc_free(prog->UniformBlockStageIndex[i]);
+      prog->UniformBlockStageIndex[i] = NULL;
+   }
+
+   ralloc_free(prog->AtomicBuffers);
+   prog->AtomicBuffers = NULL;
+   prog->NumAtomicBuffers = 0;
+   prog->ARB_fragment_coord_conventions_enable = false;
+
+   /* Separate the shaders into groups based on their type.
+    */
+   struct gl_shader **shader_list[MESA_SHADER_STAGES];
+   unsigned num_shaders[MESA_SHADER_STAGES];
+
+   for (int i = 0; i < MESA_SHADER_STAGES; i++) {
+      shader_list[i] = (struct gl_shader **)
+         calloc(prog->NumShaders, sizeof(struct gl_shader *));
+      num_shaders[i] = 0;
+   }
+
+   unsigned min_version = UINT_MAX;
+   unsigned max_version = 0;
+   const bool is_es_prog =
+      prog->NumShaders > 0 && prog->Shaders[0]->IsES;
+   for (unsigned i = 0; i < prog->NumShaders; i++) {
+      min_version = MIN2(min_version, prog->Shaders[i]->Version);
+      max_version = MAX2(max_version, prog->Shaders[i]->Version);
+
+      if (prog->Shaders[i]->IsES != is_es_prog) {
+	 linker_error(prog, "all shaders must use same shading "
+		      "language version\n");
+	 goto done;
+      }
+
+      prog->ARB_fragment_coord_conventions_enable |=
+         prog->Shaders[i]->ARB_fragment_coord_conventions_enable;
+
+      gl_shader_stage shader_type = prog->Shaders[i]->Stage;
+      shader_list[shader_type][num_shaders[shader_type]] = prog->Shaders[i];
+      num_shaders[shader_type]++;
+   }
+
+   /* In desktop GLSL, different shader versions may be linked together.  In
+    * GLSL ES, all shader versions must be the same.
+    */
+   if (is_es_prog && min_version != max_version) {
+      linker_error(prog, "all shaders must use same shading "
+		   "language version\n");
+      goto done;
+   }
+
+   prog->Version = max_version;
+   prog->IsES = is_es_prog;
+
+   /* Geometry shaders have to be linked with vertex shaders.
+    */
+   if (num_shaders[MESA_SHADER_GEOMETRY] > 0 &&
+       num_shaders[MESA_SHADER_VERTEX] == 0 &&
+       !prog->SeparateShader) {
+      linker_error(prog, "Geometry shader must be linked with "
+		   "vertex shader\n");
+      goto done;
+   }
+
+   /* Compute shaders have additional restrictions. */
+   if (num_shaders[MESA_SHADER_COMPUTE] > 0 &&
+       num_shaders[MESA_SHADER_COMPUTE] != prog->NumShaders) {
+      linker_error(prog, "Compute shaders may not be linked with any other "
+                   "type of shader\n");
+      goto done;
+   }
+
+   for (unsigned int i = 0; i < MESA_SHADER_STAGES; i++) {
+      if (prog->_LinkedShaders[i] != NULL)
+	 ctx->Driver.DeleteShader(ctx, prog->_LinkedShaders[i]);
+
+      prog->_LinkedShaders[i] = NULL;
+   }
+
+   /* Link all shaders for a particular stage and validate the result.
+    */
+   for (int stage = 0; stage < MESA_SHADER_STAGES; stage++) {
+      if (num_shaders[stage] > 0) {
+         gl_shader *const sh =
+            link_intrastage_shaders(mem_ctx, ctx, prog, shader_list[stage],
+                                    num_shaders[stage]);
+
+         if (!prog->LinkStatus)
+            goto done;
+
+         switch (stage) {
+         case MESA_SHADER_VERTEX:
+            validate_vertex_shader_executable(prog, sh);
+            break;
+         case MESA_SHADER_GEOMETRY:
+            validate_geometry_shader_executable(prog, sh);
+            break;
+         case MESA_SHADER_FRAGMENT:
+            validate_fragment_shader_executable(prog, sh);
+            break;
+         }
+         if (!prog->LinkStatus)
+            goto done;
+
+         _mesa_reference_shader(ctx, &prog->_LinkedShaders[stage], sh);
+      }
+   }
+
+   if (num_shaders[MESA_SHADER_GEOMETRY] > 0)
+      prog->LastClipDistanceArraySize = prog->Geom.ClipDistanceArraySize;
+   else if (num_shaders[MESA_SHADER_VERTEX] > 0)
+      prog->LastClipDistanceArraySize = prog->Vert.ClipDistanceArraySize;
+   else
+      prog->LastClipDistanceArraySize = 0; /* Not used */
+
+   /* Here begins the inter-stage linking phase.  Some initial validation is
+    * performed, then locations are assigned for uniforms, attributes, and
+    * varyings.
+    */
+   cross_validate_uniforms(prog);
+   if (!prog->LinkStatus)
+      goto done;
+
+   unsigned prev;
+
+   for (prev = 0; prev <= MESA_SHADER_FRAGMENT; prev++) {
+      if (prog->_LinkedShaders[prev] != NULL)
+         break;
+   }
+
+   /* Validate the inputs of each stage with the output of the preceding
+    * stage.
+    */
+   for (unsigned i = prev + 1; i <= MESA_SHADER_FRAGMENT; i++) {
+      if (prog->_LinkedShaders[i] == NULL)
+         continue;
+
+      validate_interstage_inout_blocks(prog, prog->_LinkedShaders[prev],
+                                       prog->_LinkedShaders[i]);
+      if (!prog->LinkStatus)
+         goto done;
+
+      cross_validate_outputs_to_inputs(prog,
+                                       prog->_LinkedShaders[prev],
+                                       prog->_LinkedShaders[i]);
+      if (!prog->LinkStatus)
+         goto done;
+
+      prev = i;
+   }
+
+   /* Cross-validate uniform blocks between shader stages */
+   validate_interstage_uniform_blocks(prog, prog->_LinkedShaders,
+                                      MESA_SHADER_STAGES);
+   if (!prog->LinkStatus)
+      goto done;
+
+   for (unsigned int i = 0; i < MESA_SHADER_STAGES; i++) {
+      if (prog->_LinkedShaders[i] != NULL)
+         lower_named_interface_blocks(mem_ctx, prog->_LinkedShaders[i]);
+   }
+
+   /* Implement the GLSL 1.30+ rule for discard vs infinite loops.  Do
+    * this before optimization because we want most of the checks to get
+    * dropped thanks to constant propagation.
+    *
+    * This rule also applies to GLSL ES 3.00.
+    */
+   if (max_version >= (is_es_prog ? 300 : 130)) {
+      struct gl_shader *sh = prog->_LinkedShaders[MESA_SHADER_FRAGMENT];
+      if (sh) {
+	 lower_discard_flow(sh->ir);
+      }
+   }
+
+   if (!interstage_cross_validate_uniform_blocks(prog))
+      goto done;
+
+   /* Do common optimization before assigning storage for attributes,
+    * uniforms, and varyings.  Later optimization could possibly make
+    * some of that unused.
+    */
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      if (prog->_LinkedShaders[i] == NULL)
+	 continue;
+
+      detect_recursion_linked(prog, prog->_LinkedShaders[i]->ir);
+      if (!prog->LinkStatus)
+	 goto done;
+
+      if (ctx->ShaderCompilerOptions[i].LowerClipDistance) {
+         lower_clip_distance(prog->_LinkedShaders[i]);
+      }
+
+      while (do_common_optimization(prog->_LinkedShaders[i]->ir, true, false,
+                                    &ctx->ShaderCompilerOptions[i],
+                                    ctx->Const.NativeIntegers))
+	 ;
+   }
+
+   /* Mark all generic shader inputs and outputs as unpaired. */
+   for (unsigned i = MESA_SHADER_VERTEX; i <= MESA_SHADER_FRAGMENT; i++) {
+      if (prog->_LinkedShaders[i] != NULL) {
+         link_invalidate_variable_locations(prog->_LinkedShaders[i]->ir);
+      }
+   }
+
+   /* FINISHME: The value of the max_attribute_index parameter is
+    * FINISHME: implementation dependent based on the value of
+    * FINISHME: GL_MAX_VERTEX_ATTRIBS.  GL_MAX_VERTEX_ATTRIBS must be
+    * FINISHME: at least 16, so hardcode 16 for now.
+    */
+   if (!assign_attribute_or_color_locations(prog, MESA_SHADER_VERTEX, 16)) {
+      goto done;
+   }
+
+   if (!assign_attribute_or_color_locations(prog, MESA_SHADER_FRAGMENT,
+                                            MAX2(ctx->Const.MaxDrawBuffers,
+                                                 ctx->Const.MaxDualSourceDrawBuffers))) {
+      goto done;
+   }
+
+   unsigned first;
+   for (first = 0; first <= MESA_SHADER_FRAGMENT; first++) {
+      if (prog->_LinkedShaders[first] != NULL)
+	 break;
+   }
+
+   if (num_tfeedback_decls != 0) {
+      /* From GL_EXT_transform_feedback:
+       *   A program will fail to link if:
+       *
+       *   * the <count> specified by TransformFeedbackVaryingsEXT is
+       *     non-zero, but the program object has no vertex or geometry
+       *     shader;
+       */
+      if (first == MESA_SHADER_FRAGMENT) {
+         linker_error(prog, "Transform feedback varyings specified, but "
+                      "no vertex or geometry shader is present.");
+         goto done;
+      }
+
+      tfeedback_decls = ralloc_array(mem_ctx, tfeedback_decl,
+                                     prog->TransformFeedback.NumVarying);
+      if (!parse_tfeedback_decls(ctx, prog, mem_ctx, num_tfeedback_decls,
+                                 prog->TransformFeedback.VaryingNames,
+                                 tfeedback_decls))
+         goto done;
+   }
+
+   /* Linking the stages in the opposite order (from fragment to vertex)
+    * ensures that inter-shader outputs written to in an earlier stage are
+    * eliminated if they are (transitively) not used in a later stage.
+    */
+   int last, next;
+   for (last = MESA_SHADER_FRAGMENT; last >= 0; last--) {
+      if (prog->_LinkedShaders[last] != NULL)
+         break;
+   }
+
+   if (last >= 0 && last < MESA_SHADER_FRAGMENT) {
+      gl_shader *const sh = prog->_LinkedShaders[last];
+
+      if (num_tfeedback_decls != 0 || prog->SeparateShader) {
+         /* There was no fragment shader, but we still have to assign varying
+          * locations for use by transform feedback.
+          */
+         if (!assign_varying_locations(ctx, mem_ctx, prog,
+                                       sh, NULL,
+                                       num_tfeedback_decls, tfeedback_decls,
+                                       0))
+            goto done;
+      }
+
+      do_dead_builtin_varyings(ctx, sh, NULL,
+                               num_tfeedback_decls, tfeedback_decls);
+
+      if (!prog->SeparateShader)
+         demote_shader_inputs_and_outputs(sh, ir_var_shader_out);
+
+      /* Eliminate code that is now dead due to unused outputs being demoted.
+       */
+      while (do_dead_code(sh->ir, false))
+         ;
+   } else if (first == MESA_SHADER_FRAGMENT) {
+      /* If the program only contains a fragment shader...
+       */
+      gl_shader *const sh = prog->_LinkedShaders[first];
+
+      do_dead_builtin_varyings(ctx, NULL, sh,
+                               num_tfeedback_decls, tfeedback_decls);
+
+      if (prog->SeparateShader) {
+         if (!assign_varying_locations(ctx, mem_ctx, prog,
+                                       NULL /* producer */,
+                                       sh /* consumer */,
+                                       0 /* num_tfeedback_decls */,
+                                       NULL /* tfeedback_decls */,
+                                       0 /* gs_input_vertices */))
+            goto done;
+      } else
+         demote_shader_inputs_and_outputs(sh, ir_var_shader_in);
+
+      while (do_dead_code(sh->ir, false))
+         ;
+   }
+
+   next = last;
+   for (int i = next - 1; i >= 0; i--) {
+      if (prog->_LinkedShaders[i] == NULL)
+         continue;
+
+      gl_shader *const sh_i = prog->_LinkedShaders[i];
+      gl_shader *const sh_next = prog->_LinkedShaders[next];
+      unsigned gs_input_vertices =
+         next == MESA_SHADER_GEOMETRY ? prog->Geom.VerticesIn : 0;
+
+      if (!assign_varying_locations(ctx, mem_ctx, prog, sh_i, sh_next,
+                next == MESA_SHADER_FRAGMENT ? num_tfeedback_decls : 0,
+                tfeedback_decls, gs_input_vertices))
+         goto done;
+
+      do_dead_builtin_varyings(ctx, sh_i, sh_next,
+                next == MESA_SHADER_FRAGMENT ? num_tfeedback_decls : 0,
+                tfeedback_decls);
+
+      demote_shader_inputs_and_outputs(sh_i, ir_var_shader_out);
+      demote_shader_inputs_and_outputs(sh_next, ir_var_shader_in);
+
+      /* Eliminate code that is now dead due to unused outputs being demoted.
+       */
+      while (do_dead_code(sh_i->ir, false))
+         ;
+      while (do_dead_code(sh_next->ir, false))
+         ;
+
+      /* This must be done after all dead varyings are eliminated. */
+      if (!check_against_output_limit(ctx, prog, sh_i))
+         goto done;
+      if (!check_against_input_limit(ctx, prog, sh_next))
+         goto done;
+
+      next = i;
+   }
+
+   if (!store_tfeedback_info(ctx, prog, num_tfeedback_decls, tfeedback_decls))
+      goto done;
+
+   update_array_sizes(prog);
+   link_assign_uniform_locations(prog);
+   link_assign_atomic_counter_resources(ctx, prog);
+   store_fragdepth_layout(prog);
+
+   check_resources(ctx, prog);
+   check_image_resources(ctx, prog);
+   link_check_atomic_counter_resources(ctx, prog);
+
+   if (!prog->LinkStatus)
+      goto done;
+
+   /* OpenGL ES requires that a vertex shader and a fragment shader both be
+    * present in a linked program. GL_ARB_ES2_compatibility doesn't say
+    * anything about shader linking when one of the shaders (vertex or
+    * fragment shader) is absent. So, the extension shouldn't change the
+    * behavior specified in GLSL specification.
+    */
+   if (!prog->SeparateShader && ctx->API == API_OPENGLES2) {
+      if (prog->_LinkedShaders[MESA_SHADER_VERTEX] == NULL) {
+	 linker_error(prog, "program lacks a vertex shader\n");
+      } else if (prog->_LinkedShaders[MESA_SHADER_FRAGMENT] == NULL) {
+	 linker_error(prog, "program lacks a fragment shader\n");
+      }
+   }
+
+   /* FINISHME: Assign fragment shader output locations. */
+
+done:
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      free(shader_list[i]);
+      if (prog->_LinkedShaders[i] == NULL)
+	 continue;
+
+      /* Do a final validation step to make sure that the IR wasn't
+       * invalidated by any modifications performed after intrastage linking.
+       */
+      validate_ir_tree(prog->_LinkedShaders[i]->ir);
+
+      /* Retain any live IR, but trash the rest. */
+      reparent_ir(prog->_LinkedShaders[i]->ir, prog->_LinkedShaders[i]->ir);
+
+      /* The symbol table in the linked shaders may contain references to
+       * variables that were removed (e.g., unused uniforms).  Since it may
+       * contain junk, there is no possible valid use.  Delete it and set the
+       * pointer to NULL.
+       */
+      delete prog->_LinkedShaders[i]->symbols;
+      prog->_LinkedShaders[i]->symbols = NULL;
+   }
+
+   ralloc_free(mem_ctx);
+}
diff --git a/icd/intel/compiler/shader/linker.h b/icd/intel/compiler/shader/linker.h
new file mode 100644
index 0000000..f0a947b
--- /dev/null
+++ b/icd/intel/compiler/shader/linker.h
@@ -0,0 +1,185 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef GLSL_LINKER_H
+#define GLSL_LINKER_H
+
+void
+populate_symbol_table(gl_shader *sh);
+
+extern bool
+link_function_calls(gl_shader_program *prog, gl_shader *main,
+		    gl_shader **shader_list, unsigned num_shaders);
+
+extern void
+link_invalidate_variable_locations(exec_list *ir);
+
+extern void
+link_assign_uniform_locations(struct gl_shader_program *prog);
+
+extern void
+link_set_uniform_initializers(struct gl_shader_program *prog);
+
+extern int
+link_cross_validate_uniform_block(void *mem_ctx,
+				  struct gl_uniform_block **linked_blocks,
+				  unsigned int *num_linked_blocks,
+				  struct gl_uniform_block *new_block);
+
+void
+link_assign_uniform_block_offsets(struct gl_shader *shader);
+
+extern bool
+link_uniform_blocks_are_compatible(const gl_uniform_block *a,
+				   const gl_uniform_block *b);
+
+extern unsigned
+link_uniform_blocks(void *mem_ctx,
+                    struct gl_shader_program *prog,
+                    struct gl_shader **shader_list,
+                    unsigned num_shaders,
+                    struct gl_uniform_block **blocks_ret);
+
+void
+validate_intrastage_interface_blocks(struct gl_shader_program *prog,
+                                     const gl_shader **shader_list,
+                                     unsigned num_shaders);
+
+void
+validate_interstage_inout_blocks(struct gl_shader_program *prog,
+                                 const gl_shader *producer,
+                                 const gl_shader *consumer);
+
+void
+validate_interstage_uniform_blocks(struct gl_shader_program *prog,
+                                   gl_shader **stages, int num_stages);
+
+extern void
+link_assign_atomic_counter_resources(struct gl_context *ctx,
+                                     struct gl_shader_program *prog);
+
+extern void
+link_check_atomic_counter_resources(struct gl_context *ctx,
+                                    struct gl_shader_program *prog);
+
+/**
+ * Class for processing all of the leaf fields of a variable that corresponds
+ * to a program resource.
+ *
+ * The leaf fields are all the parts of the variable that the application
+ * could query using \c glGetProgramResourceIndex (or that could be returned
+ * by \c glGetProgramResourceName).
+ *
+ * Classes may derive from this class to implement specific functionality.
+ * This class only provides the mechanism to iterate over the leaves.  Derived
+ * classes must implement \c ::visit_field and may override \c ::process.
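+ *
+ * A minimal derived class might look like this (an illustrative sketch; the
+ * name \c leaf_name_printer is hypothetical):
+ *
+ *    class leaf_name_printer : public program_resource_visitor {
+ *       virtual void visit_field(const glsl_type *type, const char *name,
+ *                                bool row_major)
+ *       {
+ *          printf("%s\n", name);
+ *       }
+ *    };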
+ */
+class program_resource_visitor {
+public:
+   /**
+    * Begin processing a variable
+    *
+    * Classes that overload this function should call \c ::process from the
+    * base class to start the recursive processing of the variable.
+    *
+    * \param var  The variable that is to be processed
+    *
+    * Calls \c ::visit_field for each leaf of the variable.
+    *
+    * \warning
+    * When processing a uniform block, this entry should only be used in cases
+    * where the row / column ordering of matrices in the block does not
+    * matter.  For example, enumerating the names of members of the block, but
+    * not for determining the offsets of members.
+    */
+   void process(ir_variable *var);
+
+   /**
+    * Begin processing a variable of a structured type.
+    *
+    * This flavor of \c process should be used to handle structured types
+    * (i.e., structures, interfaces, or arrays thereof) that need special
+    * name handling.  A common usage is to handle cases where the block name
+    * (instead of the instance name) is used for an interface block.
+    *
+    * \param type  Type that is to be processed, associated with \c name
+    * \param name  Base name of the structured variable being processed
+    *
+    * \note
+    * \c type must be \c GLSL_TYPE_RECORD, \c GLSL_TYPE_INTERFACE, or an array
+    * thereof.
+    */
+   void process(const glsl_type *type, const char *name);
+
+protected:
+   /**
+    * Method invoked for each leaf of the variable
+    *
+    * \param type  Type of the field.
+    * \param name  Fully qualified name of the field.
+    * \param row_major  For a matrix type, whether it is stored row-major.
+    * \param record_type  Type of the record containing the field.
+    *
+    * The default implementation just calls the other \c visit_field method.
+    */
+   virtual void visit_field(const glsl_type *type, const char *name,
+                            bool row_major, const glsl_type *record_type);
+
+   /**
+    * Method invoked for each leaf of the variable
+    *
+    * \param type  Type of the field.
+    * \param name  Fully qualified name of the field.
+    * \param row_major  For a matrix type, whether it is stored row-major.
+    */
+   virtual void visit_field(const glsl_type *type, const char *name,
+                            bool row_major) = 0;
+
+   /**
+    * Visit a record before visiting its fields
+    *
+    * For structures-of-structures or interfaces-of-structures, this visits
+    * the inner structure before visiting its fields.
+    *
+    * The default implementation does nothing.
+    */
+   virtual void visit_field(const glsl_struct_field *field);
+
+private:
+   /**
+    * \param name_length  Length of the current name \b not including the
+    *                     terminating \c NUL character.
+    */
+   void recursion(const glsl_type *t, char **name, size_t name_length,
+                  bool row_major, const glsl_type *record_type);
+};
+
+void
+linker_error(gl_shader_program *prog, const char *fmt, ...);
+
+void
+linker_warning(gl_shader_program *prog, const char *fmt, ...);
+
+#endif /* GLSL_LINKER_H */
diff --git a/icd/intel/compiler/shader/list.h b/icd/intel/compiler/shader/list.h
new file mode 100644
index 0000000..694b686
--- /dev/null
+++ b/icd/intel/compiler/shader/list.h
@@ -0,0 +1,419 @@
+/*
+ * Copyright © 2008, 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file list.h
+ * \brief Doubly-linked list abstract container type.
+ *
+ * Each doubly-linked list has a sentinel head and tail node.  These nodes
+ * contain no data.  The head sentinel can be identified by its \c prev
+ * pointer being \c NULL.  The tail sentinel can be identified by its
+ * \c next pointer being \c NULL.
+ *
+ * A list is empty if either the head sentinel's \c next pointer points to the
+ * tail sentinel or the tail sentinel's \c prev pointer points to the head
+ * sentinel.
+ *
+ * Instead of tracking two separate \c node structures and a \c list structure
+ * that points to them, the sentinel nodes are in a single structure.  Noting
+ * that each sentinel node always has one \c NULL pointer, the \c NULL
+ * pointers occupy the same memory location.  The \c list structure
+ * contains the following:
+ *
+ *   - A \c head pointer that represents the \c next pointer of the
+ *     head sentinel node.
+ *   - A \c tail pointer that represents the \c prev pointer of the head
+ *     sentinel node and the \c next pointer of the tail sentinel node.  This
+ *     pointer is \b always \c NULL.
+ *   - A \c tail_pred pointer that represents the \c prev pointer of the
+ *     tail sentinel node.
+ *
+ * Therefore, if \c head->next is \c NULL or \c tail_pred->prev is \c NULL,
+ * the list is empty.
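+ *
+ * For an empty list the three pointers therefore look like this, which is
+ * exactly what \c make_empty() below establishes:
+ *
+ *      head      == (exec_node *) &tail
+ *      tail      == NULL
+ *      tail_pred == (exec_node *) &head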
+ *
+ * To anyone familiar with "exec lists" on the Amiga, this structure should
+ * be immediately recognizable.  See the following link for the original Amiga
+ * operating system documentation on the subject.
+ *
+ * http://www.natami.net/dev/Libraries_Manual_guide/node02D7.html
+ *
+ * \author Ian Romanick <ian.d.romanick@intel.com>
+ */
+
+#pragma once
+#ifndef LIST_CONTAINER_H
+#define LIST_CONTAINER_H
+
+#ifndef __cplusplus
+#include <stddef.h>
+#endif
+#include <assert.h>
+
+#include "ralloc.h"
+
+struct exec_node {
+   struct exec_node *next;
+   struct exec_node *prev;
+
+#ifdef __cplusplus
+   DECLARE_RALLOC_CXX_OPERATORS(exec_node)
+
+   exec_node() : next(NULL), prev(NULL)
+   {
+      /* empty */
+   }
+
+   const exec_node *get_next() const
+   {
+      return next;
+   }
+
+   exec_node *get_next()
+   {
+      return next;
+   }
+
+   const exec_node *get_prev() const
+   {
+      return prev;
+   }
+
+   exec_node *get_prev()
+   {
+      return prev;
+   }
+
+   void remove()
+   {
+      next->prev = prev;
+      prev->next = next;
+      next = NULL;
+      prev = NULL;
+   }
+
+   /**
+    * Link a node with itself
+    *
+    * This creates a sort of degenerate list that is occasionally useful.
+    */
+   void self_link()
+   {
+      next = this;
+      prev = this;
+   }
+
+   /**
+    * Insert a node in the list after the current node
+    */
+   void insert_after(exec_node *after)
+   {
+      after->next = this->next;
+      after->prev = this;
+
+      this->next->prev = after;
+      this->next = after;
+   }
+   /**
+    * Insert a node in the list before the current node
+    */
+   void insert_before(exec_node *before)
+   {
+      before->next = this;
+      before->prev = this->prev;
+
+      this->prev->next = before;
+      this->prev = before;
+   }
+
+   /**
+    * Insert another list in the list before the current node
+    */
+   void insert_before(struct exec_list *before);
+
+   /**
+    * Replace the current node with the given node.
+    */
+   void replace_with(exec_node *replacement)
+   {
+      replacement->prev = this->prev;
+      replacement->next = this->next;
+
+      this->prev->next = replacement;
+      this->next->prev = replacement;
+   }
+
+   /**
+    * Is this the sentinel at the tail of the list?
+    */
+   bool is_tail_sentinel() const
+   {
+      return this->next == NULL;
+   }
+
+   /**
+    * Is this the sentinel at the head of the list?
+    */
+   bool is_head_sentinel() const
+   {
+      return this->prev == NULL;
+   }
+#endif
+};
+
+
+#ifdef __cplusplus
+/* This macro will not work correctly if `t' uses virtual inheritance.  If you
+ * are using virtual inheritance, you deserve a slow and painful death.  Enjoy!
+ */
+#define exec_list_offsetof(t, f, p) \
+   (((char *) &((t *) p)->f) - ((char *) p))
+#else
+#define exec_list_offsetof(t, f, p) offsetof(t, f)
+#endif
+
+/**
+ * Get a pointer to the structure containing an exec_node
+ *
+ * Given a pointer to an \c exec_node embedded in a structure, get a pointer to
+ * the containing structure.
+ *
+ * \param type  Base type of the structure containing the node
+ * \param node  Pointer to the \c exec_node
+ * \param field Name of the field in \c type that is the embedded \c exec_node
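+ *
+ * For example (illustrative; the \c instruction type and its \c link field
+ * are hypothetical):
+ *
+ *    struct instruction {
+ *       int opcode;
+ *       struct exec_node link;
+ *    };
+ *
+ *    struct instruction *inst = exec_node_data(struct instruction, n, link);
+ *
+ * recovers the containing \c instruction from an embedded node pointer \c n.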
+ */
+#define exec_node_data(type, node, field) \
+   ((type *) (((char *) node) - exec_list_offsetof(type, field, node)))
+
+#ifdef __cplusplus
+struct exec_node;
+#endif
+
+struct exec_list {
+   struct exec_node *head;
+   struct exec_node *tail;
+   struct exec_node *tail_pred;
+
+#ifdef __cplusplus
+   DECLARE_RALLOC_CXX_OPERATORS(exec_list)
+
+   exec_list()
+   {
+      make_empty();
+   }
+
+   void make_empty()
+   {
+      head = (exec_node *) & tail;
+      tail = NULL;
+      tail_pred = (exec_node *) & head;
+   }
+
+   bool is_empty() const
+   {
+      /* There are three ways to test whether a list is empty or not.
+       *
+       * - Check to see if the \c head points to the \c tail.
+       * - Check to see if the \c tail_pred points to the \c head.
+       * - Check to see if the \c head is the sentinel node by testing whether its
+       *   \c next pointer is \c NULL.
+       *
+       * The first two methods tend to generate better code on modern systems
+       * because they save a pointer dereference.
+       */
+      return head == (exec_node *) &tail;
+   }
+
+   const exec_node *get_head() const
+   {
+      return !is_empty() ? head : NULL;
+   }
+
+   exec_node *get_head()
+   {
+      return !is_empty() ? head : NULL;
+   }
+
+   const exec_node *get_tail() const
+   {
+      return !is_empty() ? tail_pred : NULL;
+   }
+
+   exec_node *get_tail()
+   {
+      return !is_empty() ? tail_pred : NULL;
+   }
+
+   void push_head(exec_node *n)
+   {
+      n->next = head;
+      n->prev = (exec_node *) &head;
+
+      n->next->prev = n;
+      head = n;
+   }
+
+   void push_tail(exec_node *n)
+   {
+      n->next = (exec_node *) &tail;
+      n->prev = tail_pred;
+
+      n->prev->next = n;
+      tail_pred = n;
+   }
+
+   void push_degenerate_list_at_head(exec_node *n)
+   {
+      assert(n->prev->next == n);
+
+      n->prev->next = head;
+      head->prev = n->prev;
+      n->prev = (exec_node *) &head;
+      head = n;
+   }
+
+   /**
+    * Remove the first node from a list and return it
+    *
+    * \return
+    * The first node in the list or \c NULL if the list is empty.
+    *
+    * \sa exec_list::get_head
+    */
+   exec_node *pop_head()
+   {
+      exec_node *const n = this->get_head();
+      if (n != NULL)
+	 n->remove();
+
+      return n;
+   }
+
+   /**
+    * Move all of the nodes from this list to the target list
+    */
+   void move_nodes_to(exec_list *target)
+   {
+      if (is_empty()) {
+	 target->make_empty();
+      } else {
+	 target->head = head;
+	 target->tail = NULL;
+	 target->tail_pred = tail_pred;
+
+	 target->head->prev = (exec_node *) &target->head;
+	 target->tail_pred->next = (exec_node *) &target->tail;
+
+	 make_empty();
+      }
+   }
+
+   /**
+    * Append all nodes from the source list to the target list
+    */
+   void
+   append_list(exec_list *source)
+   {
+      if (source->is_empty())
+	 return;
+
+      /* Link the first node of the source with the last node of the target list.
+       */
+      this->tail_pred->next = source->head;
+      source->head->prev = this->tail_pred;
+
+      /* Make the tail of the source list be the tail of the target list.
+       */
+      this->tail_pred = source->tail_pred;
+      this->tail_pred->next = (exec_node *) &this->tail;
+
+      /* Make the source list empty for good measure.
+       */
+      source->make_empty();
+   }
+#endif
+};
+
+
+#ifdef __cplusplus
+inline void exec_node::insert_before(exec_list *before)
+{
+   if (before->is_empty())
+      return;
+
+   before->tail_pred->next = this;
+   before->head->prev = this->prev;
+
+   this->prev->next = before->head;
+   this->prev = before->tail_pred;
+
+   before->make_empty();
+}
+#endif
+
+/**
+ * This version is safe even if the current node is removed.
+ */
+#define foreach_list_safe(__node, __list)			     \
+   for (exec_node * __node = (__list)->head, * __next = __node->next \
+	; __next != NULL					     \
+	; __node = __next, __next = __next->next)
+
+#define foreach_list(__node, __list)			\
+   for (exec_node * __node = (__list)->head		\
+	; (__node)->next != NULL 			\
+	; (__node) = (__node)->next)
+
+/**
+ * Iterate through two lists at once.  Stops at the end of the shorter list.
+ *
+ * This is safe against either current node being removed or replaced.
+ */
+#define foreach_two_lists(__node1, __list1, __node2, __list2) \
+   for (exec_node * __node1 = (__list1)->head,                \
+                  * __node2 = (__list2)->head,                \
+                  * __next1 = __node1->next,                  \
+                  * __next2 = __node2->next                   \
+	; __next1 != NULL && __next2 != NULL                  \
+	; __node1 = __next1,                                  \
+          __node2 = __next2,                                  \
+          __next1 = __next1->next,                            \
+          __next2 = __next2->next)
+
+#define foreach_list_const(__node, __list)		\
+   for (const exec_node * __node = (__list)->head	\
+	; (__node)->next != NULL 			\
+	; (__node) = (__node)->next)
+
+#define foreach_list_typed(__type, __node, __field, __list)		\
+   for (__type * __node =						\
+	   exec_node_data(__type, (__list)->head, __field);		\
+	(__node)->__field.next != NULL; 				\
+	(__node) = exec_node_data(__type, (__node)->__field.next, __field))
+
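+/* Usage sketch (illustrative; `instr', `link', and `process' are
+ * hypothetical):
+ *
+ *    struct instr { struct exec_node link; int op; };
+ *    struct exec_list instructions;
+ *    ...
+ *    foreach_list_typed(struct instr, insn, link, &instructions) {
+ *       process(insn->op);
+ *    }
+ */
+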
+#define foreach_list_typed_const(__type, __node, __field, __list)	\
+   for (const __type * __node =						\
+	   exec_node_data(__type, (__list)->head, __field);		\
+	(__node)->__field.next != NULL; 				\
+	(__node) = exec_node_data(__type, (__node)->__field.next, __field))
+
+#endif /* LIST_CONTAINER_H */
diff --git a/icd/intel/compiler/shader/loop_analysis.cpp b/icd/intel/compiler/shader/loop_analysis.cpp
new file mode 100644
index 0000000..d6a9ac7
--- /dev/null
+++ b/icd/intel/compiler/shader/loop_analysis.cpp
@@ -0,0 +1,648 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "glsl_types.h"
+#include "loop_analysis.h"
+#include "ir_hierarchical_visitor.h"
+
+static bool is_loop_terminator(ir_if *ir);
+
+static bool all_expression_operands_are_loop_constant(ir_rvalue *,
+						      hash_table *);
+
+static ir_rvalue *get_basic_induction_increment(ir_assignment *, hash_table *);
+
+
+/**
+ * Record the fact that the given loop variable was referenced inside the loop.
+ *
+ * \arg in_assignee is true if the reference was on the LHS of an assignment.
+ *
+ * \arg in_conditional_code_or_nested_loop is true if the reference occurred
+ * inside an if statement or a nested loop.
+ *
+ * \arg current_assignment is the ir_assignment node that the loop variable is
+ * on the LHS of, if any (ignored if \c in_assignee is false).
+ */
+void
+loop_variable::record_reference(bool in_assignee,
+                                bool in_conditional_code_or_nested_loop,
+                                ir_assignment *current_assignment)
+{
+   if (in_assignee) {
+      assert(current_assignment != NULL);
+
+      if (in_conditional_code_or_nested_loop ||
+          current_assignment->condition != NULL) {
+         this->conditional_or_nested_assignment = true;
+      }
+
+      if (this->first_assignment == NULL) {
+         assert(this->num_assignments == 0);
+
+         this->first_assignment = current_assignment;
+      }
+
+      this->num_assignments++;
+   } else if (this->first_assignment == current_assignment) {
+      /* This catches the case where the variable is used in the RHS of an
+       * assignment where it is also in the LHS.
+       */
+      this->read_before_write = true;
+   }
+}
+
+
+loop_state::loop_state()
+{
+   this->ht = hash_table_ctor(0, hash_table_pointer_hash,
+			      hash_table_pointer_compare);
+   this->mem_ctx = ralloc_context(NULL);
+   this->loop_found = false;
+}
+
+
+loop_state::~loop_state()
+{
+   hash_table_dtor(this->ht);
+   ralloc_free(this->mem_ctx);
+}
+
+
+loop_variable_state *
+loop_state::insert(ir_loop *ir)
+{
+   loop_variable_state *ls = new(this->mem_ctx) loop_variable_state;
+
+   hash_table_insert(this->ht, ls, ir);
+   this->loop_found = true;
+
+   return ls;
+}
+
+
+loop_variable_state *
+loop_state::get(const ir_loop *ir)
+{
+   return (loop_variable_state *) hash_table_find(this->ht, ir);
+}
+
+
+loop_variable *
+loop_variable_state::get(const ir_variable *ir)
+{
+   return (loop_variable *) hash_table_find(this->var_hash, ir);
+}
+
+
+loop_variable *
+loop_variable_state::insert(ir_variable *var)
+{
+   void *mem_ctx = ralloc_parent(this);
+   loop_variable *lv = rzalloc(mem_ctx, loop_variable);
+
+   lv->var = var;
+
+   hash_table_insert(this->var_hash, lv, lv->var);
+   this->variables.push_tail(lv);
+
+   return lv;
+}
+
+
+loop_terminator *
+loop_variable_state::insert(ir_if *if_stmt)
+{
+   void *mem_ctx = ralloc_parent(this);
+   loop_terminator *t = new(mem_ctx) loop_terminator();
+
+   t->ir = if_stmt;
+   this->terminators.push_tail(t);
+
+   return t;
+}
+
+
+/**
+ * If the given variable already is recorded in the state for this loop,
+ * return the corresponding loop_variable object that records information
+ * about it.
+ *
+ * Otherwise, create a new loop_variable object to record information about
+ * the variable, and set its \c read_before_write field appropriately based on
+ * \c in_assignee.
+ *
+ * \arg in_assignee is true if this variable was encountered on the LHS of an
+ * assignment.
+ */
+loop_variable *
+loop_variable_state::get_or_insert(ir_variable *var, bool in_assignee)
+{
+   loop_variable *lv = this->get(var);
+
+   if (lv == NULL) {
+      lv = this->insert(var);
+      lv->read_before_write = !in_assignee;
+   }
+
+   return lv;
+}
+
+
+namespace {
+
+class loop_analysis : public ir_hierarchical_visitor {
+public:
+   loop_analysis(loop_state *loops);
+
+   virtual ir_visitor_status visit(ir_loop_jump *);
+   virtual ir_visitor_status visit(ir_dereference_variable *);
+
+   virtual ir_visitor_status visit_enter(ir_call *);
+
+   virtual ir_visitor_status visit_enter(ir_loop *);
+   virtual ir_visitor_status visit_leave(ir_loop *);
+   virtual ir_visitor_status visit_enter(ir_assignment *);
+   virtual ir_visitor_status visit_leave(ir_assignment *);
+   virtual ir_visitor_status visit_enter(ir_if *);
+   virtual ir_visitor_status visit_leave(ir_if *);
+
+   loop_state *loops;
+
+   int if_statement_depth;
+
+   ir_assignment *current_assignment;
+
+   exec_list state;
+};
+
+} /* anonymous namespace */
+
+loop_analysis::loop_analysis(loop_state *loops)
+   : loops(loops), if_statement_depth(0), current_assignment(NULL)
+{
+   /* empty */
+}
+
+
+ir_visitor_status
+loop_analysis::visit(ir_loop_jump *ir)
+{
+   (void) ir;
+
+   assert(!this->state.is_empty());
+
+   loop_variable_state *const ls =
+      (loop_variable_state *) this->state.get_head();
+
+   ls->num_loop_jumps++;
+
+   return visit_continue;
+}
+
+
+ir_visitor_status
+loop_analysis::visit_enter(ir_call *)
+{
+   /* Mark every loop that we're currently analyzing as containing an ir_call
+    * (even those at outer nesting levels).
+    */
+   foreach_list(node, &this->state) {
+      loop_variable_state *const ls = (loop_variable_state *) node;
+      ls->contains_calls = true;
+   }
+
+   return visit_continue_with_parent;
+}
+
+
+ir_visitor_status
+loop_analysis::visit(ir_dereference_variable *ir)
+{
+   /* If we're not somewhere inside a loop, there's nothing to do.
+    */
+   if (this->state.is_empty())
+      return visit_continue;
+
+   bool nested = false;
+
+   foreach_list(node, &this->state) {
+      loop_variable_state *const ls = (loop_variable_state *) node;
+
+      ir_variable *var = ir->variable_referenced();
+      loop_variable *lv = ls->get_or_insert(var, this->in_assignee);
+
+      lv->record_reference(this->in_assignee,
+                           nested || this->if_statement_depth > 0,
+                           this->current_assignment);
+      nested = true;
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+loop_analysis::visit_enter(ir_loop *ir)
+{
+   loop_variable_state *ls = this->loops->insert(ir);
+   this->state.push_head(ls);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+loop_analysis::visit_leave(ir_loop *ir)
+{
+   loop_variable_state *const ls =
+      (loop_variable_state *) this->state.pop_head();
+
+   /* Function calls may contain side effects.  These could alter any of our
+    * variables in ways that cannot be known, and may even terminate shader
+    * execution (say, calling discard in the fragment shader).  So we can't
+    * rely on any of our analysis about assignments to variables.
+    *
+    * We could perform some conservative analysis (prove there's no statically
+    * possible assignment, etc.) but it isn't worth it for now; function
+    * inlining will allow us to unroll loops anyway.
+    */
+   if (ls->contains_calls)
+      return visit_continue;
+
+   foreach_list(node, &ir->body_instructions) {
+      /* Skip over declarations at the start of a loop.
+       */
+      if (((ir_instruction *) node)->as_variable())
+	 continue;
+
+      ir_if *if_stmt = ((ir_instruction *) node)->as_if();
+
+      if ((if_stmt != NULL) && is_loop_terminator(if_stmt))
+	 ls->insert(if_stmt);
+      else
+	 break;
+   }
+
+
+   foreach_list_safe(node, &ls->variables) {
+      loop_variable *lv = (loop_variable *) node;
+
+      /* Move variables that are already marked as being loop constant to
+       * a separate list.  These trivially don't need to be tested.
+       */
+      if (lv->is_loop_constant()) {
+	 lv->remove();
+	 ls->constants.push_tail(lv);
+      }
+   }
+
+   /* Each variable assigned in the loop that isn't already marked as being loop
+    * constant might still be loop constant.  The requirements at this point
+    * are:
+    *
+    *    - Variable is written before it is read.
+    *
+    *    - Only one assignment to the variable.
+    *
+    *    - All operands on the RHS of the assignment are also loop constants.
+    *
+    * The last requirement is the reason for the progress loop.  A variable
+    * marked as a loop constant on one pass may allow other variables to be
+    * marked as loop constant on following passes.
+    */
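+   /* For example (illustrative GLSL): given the loop body
+    *
+    *    a = k;  b = a + 1.0;
+    *
+    * where k is never assigned inside the loop, `a' is classified as loop
+    * constant once its RHS is seen to be clean, which in turn allows `b' to
+    * be classified on a following pass.
+    */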
+   bool progress;
+   do {
+      progress = false;
+
+      foreach_list_safe(node, &ls->variables) {
+	 loop_variable *lv = (loop_variable *) node;
+
+	 if (lv->conditional_or_nested_assignment || (lv->num_assignments > 1))
+	    continue;
+
+	 /* Process the RHS of the assignment.  If all of the variables
+	  * accessed there are loop constants, then mark the RHS as clean.
+	  * A clean RHS may in turn allow this variable to be classified as
+	  * loop constant below.
+	  */
+	 ir_rvalue *const rhs = lv->first_assignment->rhs;
+	 if (all_expression_operands_are_loop_constant(rhs, ls->var_hash)) {
+	    lv->rhs_clean = true;
+
+	    if (lv->is_loop_constant()) {
+	       progress = true;
+
+	       lv->remove();
+	       ls->constants.push_tail(lv);
+	    }
+	 }
+      }
+   } while (progress);
+
+   /* The remaining variables that are not loop invariant might be loop
+    * induction variables.
+    */
+   foreach_list_safe(node, &ls->variables) {
+      loop_variable *lv = (loop_variable *) node;
+
+      /* If there is more than one assignment to a variable, it cannot be a
+       * loop induction variable.  This isn't strictly true, but this is a
+       * very simple induction variable detector, and it can't handle more
+       * complex cases.
+       */
+      if (lv->num_assignments > 1)
+	 continue;
+
+      /* All of the variables with zero assignments in the loop are loop
+       * invariant, and they should have already been filtered out.
+       */
+      assert(lv->num_assignments == 1);
+      assert(lv->first_assignment != NULL);
+
+      /* The assignment to the variable in the loop must be unconditional and
+       * not inside a nested loop.
+       */
+      if (lv->conditional_or_nested_assignment)
+	 continue;
+
+      /* Basic loop induction variables have a single assignment in the loop
+       * that has the form 'VAR = VAR + i' or 'VAR = VAR - i' where i is a
+       * loop invariant.
+       */
+      ir_rvalue *const inc =
+	 get_basic_induction_increment(lv->first_assignment, ls->var_hash);
+      if (inc != NULL) {
+	 lv->increment = inc;
+
+	 lv->remove();
+	 ls->induction_variables.push_tail(lv);
+      }
+   }
+
+   /* Search the loop terminating conditions for those of the form 'i < c'
+    * where i is a loop induction variable, c is a constant, and < is any
+    * relational operator.  From each of these we can infer an iteration count.
+    * Also figure out which terminator (if any) produces the smallest
+    * iteration count--this is the limiting terminator.
+    */
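+   /* For example (illustrative): `for (int i = 0; i < 4; i++)' is lowered to
+    * a loop whose terminator tests `i >= 4'; with an initial value of 0 and
+    * an increment of 1, calculate_iterations() infers t->iterations == 4.
+    */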
+   foreach_list(node, &ls->terminators) {
+      loop_terminator *t = (loop_terminator *) node;
+      ir_if *if_stmt = t->ir;
+
+      /* If-statements can be either 'if (expr)' or 'if (deref)'.  We only care
+       * about the former here.
+       */
+      ir_expression *cond = if_stmt->condition->as_expression();
+      if (cond == NULL)
+	 continue;
+
+      switch (cond->operation) {
+      case ir_binop_less:
+      case ir_binop_greater:
+      case ir_binop_lequal:
+      case ir_binop_gequal: {
+	 /* The expressions that we care about will either be of the form
+	  * 'counter < limit' or 'limit < counter'.  Figure out which is
+	  * which.
+	  */
+	 ir_rvalue *counter = cond->operands[0]->as_dereference_variable();
+	 ir_constant *limit = cond->operands[1]->as_constant();
+	 enum ir_expression_operation cmp = cond->operation;
+
+	 if (limit == NULL) {
+	    counter = cond->operands[1]->as_dereference_variable();
+	    limit = cond->operands[0]->as_constant();
+
+	    switch (cmp) {
+	    case ir_binop_less:    cmp = ir_binop_greater; break;
+	    case ir_binop_greater: cmp = ir_binop_less;    break;
+	    case ir_binop_lequal:  cmp = ir_binop_gequal;  break;
+	    case ir_binop_gequal:  cmp = ir_binop_lequal;  break;
+	    default: assert(!"Should not get here.");
+	    }
+	 }
+
+	 if ((counter == NULL) || (limit == NULL))
+	    break;
+
+	 ir_variable *var = counter->variable_referenced();
+
+	 ir_rvalue *init = find_initial_value(ir, var);
+
+         loop_variable *lv = ls->get(var);
+         if (lv != NULL && lv->is_induction_var()) {
+            t->iterations = calculate_iterations(init, limit, lv->increment,
+                                                 cmp);
+
+            if (t->iterations >= 0 &&
+                (ls->limiting_terminator == NULL ||
+                 t->iterations < ls->limiting_terminator->iterations)) {
+               ls->limiting_terminator = t;
+            }
+         }
+         break;
+      }
+
+      default:
+         break;
+      }
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+loop_analysis::visit_enter(ir_if *ir)
+{
+   (void) ir;
+
+   if (!this->state.is_empty())
+      this->if_statement_depth++;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+loop_analysis::visit_leave(ir_if *ir)
+{
+   (void) ir;
+
+   if (!this->state.is_empty())
+      this->if_statement_depth--;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+loop_analysis::visit_enter(ir_assignment *ir)
+{
+   /* If we're not somewhere inside a loop, there's nothing to do.
+    */
+   if (this->state.is_empty())
+      return visit_continue_with_parent;
+
+   this->current_assignment = ir;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+loop_analysis::visit_leave(ir_assignment *ir)
+{
+   /* Since the visit_enter exits with visit_continue_with_parent for this
+    * case, the loop state stack should never be empty here.
+    */
+   assert(!this->state.is_empty());
+
+   assert(this->current_assignment == ir);
+   this->current_assignment = NULL;
+
+   return visit_continue;
+}
+
+
+class examine_rhs : public ir_hierarchical_visitor {
+public:
+   examine_rhs(hash_table *loop_variables)
+   {
+      this->only_uses_loop_constants = true;
+      this->loop_variables = loop_variables;
+   }
+
+   virtual ir_visitor_status visit(ir_dereference_variable *ir)
+   {
+      loop_variable *lv =
+	 (loop_variable *) hash_table_find(this->loop_variables, ir->var);
+
+      assert(lv != NULL);
+
+      if (lv->is_loop_constant()) {
+	 return visit_continue;
+      } else {
+	 this->only_uses_loop_constants = false;
+	 return visit_stop;
+      }
+   }
+
+   hash_table *loop_variables;
+   bool only_uses_loop_constants;
+};
+
+
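+/**
+ * Check whether every variable referenced in an rvalue is a loop constant
+ *
+ * Walks \c ir with the \c examine_rhs visitor above and returns true only if
+ * every \c ir_dereference_variable encountered refers to a variable whose
+ * \c loop_variable entry satisfies \c is_loop_constant().
+ */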
+bool
+all_expression_operands_are_loop_constant(ir_rvalue *ir, hash_table *variables)
+{
+   examine_rhs v(variables);
+
+   ir->accept(&v);
+
+   return v.only_uses_loop_constants;
+}
+
+
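+/**
+ * Extract the increment from a basic induction-variable assignment
+ *
+ * Matches assignments of the form 'VAR = VAR + i' or 'VAR = VAR - i', where
+ * \c i is either a constant or a loop-constant variable (for subtraction,
+ * \c VAR must be the left operand).  For subtraction the increment is
+ * returned negated; e.g. (illustrative) 'i = i - 2' yields '(neg 2)'.
+ *
+ * \return The increment rvalue, or \c NULL if \c ir does not match.
+ */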
+ir_rvalue *
+get_basic_induction_increment(ir_assignment *ir, hash_table *var_hash)
+{
+   /* The RHS must be a binary expression.
+    */
+   ir_expression *const rhs = ir->rhs->as_expression();
+   if ((rhs == NULL)
+       || ((rhs->operation != ir_binop_add)
+	   && (rhs->operation != ir_binop_sub)))
+      return NULL;
+
+   /* One of the operands of the expression must be the variable assigned.
+    * If the operation is subtraction, the variable in question must be the
+    * "left" operand.
+    */
+   ir_variable *const var = ir->lhs->variable_referenced();
+
+   ir_variable *const op0 = rhs->operands[0]->variable_referenced();
+   ir_variable *const op1 = rhs->operands[1]->variable_referenced();
+
+   if (((op0 != var) && (op1 != var))
+       || ((op1 == var) && (rhs->operation == ir_binop_sub)))
+      return NULL;
+
+   ir_rvalue *inc = (op0 == var) ? rhs->operands[1] : rhs->operands[0];
+
+   if (inc->as_constant() == NULL) {
+      ir_variable *const inc_var = inc->variable_referenced();
+      if (inc_var != NULL) {
+	 loop_variable *lv =
+	    (loop_variable *) hash_table_find(var_hash, inc_var);
+
+	 if (!lv->is_loop_constant())
+	    inc = NULL;
+      } else
+	 inc = NULL;
+   }
+
+   if ((inc != NULL) && (rhs->operation == ir_binop_sub)) {
+      void *mem_ctx = ralloc_parent(ir);
+
+      inc = new(mem_ctx) ir_expression(ir_unop_neg,
+				       inc->type,
+				       inc->clone(mem_ctx, NULL),
+				       NULL);
+   }
+
+   return inc;
+}
+
+
+/**
+ * Detect whether an if-statement is a loop terminating condition
+ *
+ * Detects if-statements of the form
+ *
+ *  (if (expression bool ...) (break))
+ */
+bool
+is_loop_terminator(ir_if *ir)
+{
+   if (!ir->else_instructions.is_empty())
+      return false;
+
+   ir_instruction *const inst =
+      (ir_instruction *) ir->then_instructions.get_head();
+   if (inst == NULL)
+      return false;
+
+   if (inst->ir_type != ir_type_loop_jump)
+      return false;
+
+   ir_loop_jump *const jump = (ir_loop_jump *) inst;
+   if (jump->mode != ir_loop_jump::jump_break)
+      return false;
+
+   return true;
+}
+
+
+loop_state *
+analyze_loop_variables(exec_list *instructions)
+{
+   loop_state *loops = new loop_state;
+   loop_analysis v(loops);
+
+   v.run(instructions);
+   return v.loops;
+}
diff --git a/icd/intel/compiler/shader/loop_analysis.h b/icd/intel/compiler/shader/loop_analysis.h
new file mode 100644
index 0000000..295dc79
--- /dev/null
+++ b/icd/intel/compiler/shader/loop_analysis.h
@@ -0,0 +1,279 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef LOOP_ANALYSIS_H
+#define LOOP_ANALYSIS_H
+
+#include "ir.h"
+#include "program/hash_table.h"
+
+/**
+ * Analyze and classify all variables used in all loops in the instruction list
+ */
+extern class loop_state *
+analyze_loop_variables(exec_list *instructions);
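+
+/* Sketch of how these passes are typically chained by a caller (`options'
+ * and `progress' are supplied by that caller):
+ *
+ *    loop_state *ls = analyze_loop_variables(instructions);
+ *    if (ls->loop_found) {
+ *       progress = set_loop_controls(instructions, ls) || progress;
+ *       progress = unroll_loops(instructions, ls, options) || progress;
+ *    }
+ *    delete ls;
+ */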
+
+
+/**
+ * Remove redundant loop exit conditions
+ *
+ * Based on analysis of loop variables, this function tries to remove
+ * redundant sequences in the loop of the form
+ *
+ *  (if (expression bool ...) (break))
+ *
+ * For example, if it is provable that one loop exit condition will
+ * always be satisfied before another, the unnecessary exit condition will be
+ * removed.
+ */
+extern bool
+set_loop_controls(exec_list *instructions, loop_state *ls);
+
+
+extern bool
+unroll_loops(exec_list *instructions, loop_state *ls,
+             const struct gl_shader_compiler_options *options);
+
+ir_rvalue *
+find_initial_value(ir_loop *loop, ir_variable *var);
+
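+/* Worked example (illustrative): for a counter starting at 0, a limit of 10,
+ * an increment of 2, and a terminator comparison of ir_binop_gequal, this
+ * returns (10 - 0) / 2 == 5, after verifying that 0 + 5 * 2 >= 10 holds.
+ */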
+int
+calculate_iterations(ir_rvalue *from, ir_rvalue *to, ir_rvalue *increment,
+		     enum ir_expression_operation op);
+
+
+/**
+ * Tracking for all variables used in a loop
+ */
+class loop_variable_state : public exec_node {
+public:
+   class loop_variable *get(const ir_variable *);
+   class loop_variable *insert(ir_variable *);
+   class loop_variable *get_or_insert(ir_variable *, bool in_assignee);
+   class loop_terminator *insert(ir_if *);
+
+
+   /**
+    * Variables that have not yet been classified
+    */
+   exec_list variables;
+
+   /**
+    * Variables whose values are constant within the body of the loop
+    *
+    * This list contains \c loop_variable objects.
+    */
+   exec_list constants;
+
+   /**
+    * Induction variables for this loop
+    *
+    * This list contains \c loop_variable objects.
+    */
+   exec_list induction_variables;
+
+   /**
+    * Simple if-statements that lead to the termination of the loop
+    *
+    * This list contains \c loop_terminator objects.
+    *
+    * \sa is_loop_terminator
+    */
+   exec_list terminators;
+
+   /**
+    * If any of the terminators in \c terminators leads to termination of the
+    * loop after a constant number of iterations, this is the terminator that
+    * leads to termination after the smallest number of iterations.  Otherwise
+    * NULL.
+    */
+   loop_terminator *limiting_terminator;
+
+   /**
+    * Hash table containing all variables accessed in this loop
+    */
+   hash_table *var_hash;
+
+   /**
+    * Number of ir_loop_jump instructions that operate on this loop
+    */
+   unsigned num_loop_jumps;
+
+   /**
+    * Whether this loop contains any function calls.
+    */
+   bool contains_calls;
+
+   loop_variable_state()
+   {
+      this->num_loop_jumps = 0;
+      this->contains_calls = false;
+      this->var_hash = hash_table_ctor(0, hash_table_pointer_hash,
+				       hash_table_pointer_compare);
+      this->limiting_terminator = NULL;
+   }
+
+   ~loop_variable_state()
+   {
+      hash_table_dtor(this->var_hash);
+   }
+
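+   /* Allocate from the given ralloc context and register a destructor
+    * callback, so that ~loop_variable_state() (and therefore
+    * hash_table_dtor()) still runs when the context is freed, even though
+    * ralloc itself is plain C and knows nothing about C++ destructors.
+    */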
+   static void* operator new(size_t size, void *ctx)
+   {
+      void *lvs = ralloc_size(ctx, size);
+      assert(lvs != NULL);
+
+      ralloc_set_destructor(lvs, (void (*)(void*)) destructor);
+
+      return lvs;
+   }
+
+private:
+   static void
+   destructor(loop_variable_state *lvs)
+   {
+      lvs->~loop_variable_state();
+   }
+};
+
+
+class loop_variable : public exec_node {
+public:
+   /** The variable in question. */
+   ir_variable *var;
+
+   /** Is the variable read in the loop before it is written? */
+   bool read_before_write;
+
+   /** Are all variables in the RHS of the assignment loop constants? */
+   bool rhs_clean;
+
+   /**
+    * Is there an assignment to the variable that is conditional, or inside a
+    * nested loop?
+    */
+   bool conditional_or_nested_assignment;
+
+   /** Reference to the first assignment to the variable in the loop body. */
+   ir_assignment *first_assignment;
+
+   /** Number of assignments to the variable in the loop body. */
+   unsigned num_assignments;
+
+   /**
+    * Increment value for a loop induction variable
+    *
+    * If this is a loop induction variable, the amount by which the variable
+    * is incremented on each iteration through the loop.
+    *
+    * If this is not a loop induction variable, NULL.
+    */
+   ir_rvalue *increment;
+
+
+   inline bool is_induction_var() const
+   {
+      /* Induction variables always have a non-null increment, and vice
+       * versa.
+       */
+      return this->increment != NULL;
+   }
+
+
+   inline bool is_loop_constant() const
+   {
+      const bool is_const = (this->num_assignments == 0)
+	 || ((this->num_assignments == 1)
+	     && !this->conditional_or_nested_assignment
+	     && !this->read_before_write
+	     && this->rhs_clean);
+
+      /* If the RHS of *the* assignment is clean, then there must be exactly
+       * one assignment of the variable.
+       */
+      assert((this->rhs_clean && (this->num_assignments == 1))
+	     || !this->rhs_clean);
+
+      /* Variables that are marked read-only *MUST* be loop constant.
+       */
+      assert(!this->var->data.read_only
+            || (this->var->data.read_only && is_const));
+
+      return is_const;
+   }
+
+   void record_reference(bool in_assignee,
+                         bool in_conditional_code_or_nested_loop,
+                         ir_assignment *current_assignment);
+};
+
+
+class loop_terminator : public exec_node {
+public:
+   loop_terminator()
+      : ir(NULL), iterations(-1)
+   {
+   }
+
+   /**
+    * Statement which terminates the loop.
+    */
+   ir_if *ir;
+
+   /**
+    * The number of iterations after which the terminator is known to
+    * terminate the loop (if that is a fixed value).  Otherwise -1.
+    */
+   int iterations;
+};
+
+
+class loop_state {
+public:
+   ~loop_state();
+
+   /**
+    * Get the loop variable state data for a particular loop
+    */
+   loop_variable_state *get(const ir_loop *);
+
+   loop_variable_state *insert(ir_loop *ir);
+
+   bool loop_found;
+
+private:
+   loop_state();
+
+   /**
+    * Hash table containing all loops that have been analyzed.
+    */
+   hash_table *ht;
+
+   void *mem_ctx;
+
+   friend loop_state *analyze_loop_variables(exec_list *instructions);
+};
+
+#endif /* LOOP_ANALYSIS_H */
diff --git a/icd/intel/compiler/shader/loop_controls.cpp b/icd/intel/compiler/shader/loop_controls.cpp
new file mode 100644
index 0000000..3db06ad
--- /dev/null
+++ b/icd/intel/compiler/shader/loop_controls.cpp
@@ -0,0 +1,233 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <limits.h>
+#include "main/compiler.h"
+#include "glsl_types.h"
+#include "loop_analysis.h"
+#include "ir_hierarchical_visitor.h"
+
+/**
+ * Find an initializer of a variable outside a loop
+ *
+ * Works backwards from the loop to find the pre-loop value of the variable.
+ * This is used, for example, to find the initial value of loop induction
+ * variables.
+ *
+ * \param loop  Loop where \c var is an induction variable
+ * \param var   Variable whose initializer is to be found
+ *
+ * \return
+ * The \c ir_rvalue assigned to the variable outside the loop.  May return
+ * \c NULL if no initializer can be found.
+ */
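+/* For example (illustrative): if `int i = 0;' appears immediately before the
+ * loop, the initializer is seen here as an unconditional assignment to `i',
+ * and the constant 0 rvalue is returned.
+ */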
+ir_rvalue *
+find_initial_value(ir_loop *loop, ir_variable *var)
+{
+   for (exec_node *node = loop->prev;
+	!node->is_head_sentinel();
+	node = node->prev) {
+      ir_instruction *ir = (ir_instruction *) node;
+
+      switch (ir->ir_type) {
+      case ir_type_call:
+      case ir_type_loop:
+      case ir_type_loop_jump:
+      case ir_type_return:
+      case ir_type_if:
+	 return NULL;
+
+      case ir_type_function:
+      case ir_type_function_signature:
+	 assert(!"Should not get here.");
+	 return NULL;
+
+      case ir_type_assignment: {
+	 ir_assignment *assign = ir->as_assignment();
+	 ir_variable *assignee = assign->lhs->whole_variable_referenced();
+
+	 if (assignee == var)
+	    return (assign->condition != NULL) ? NULL : assign->rhs;
+
+	 break;
+      }
+
+      default:
+	 break;
+      }
+   }
+
+   return NULL;
+}
+
+
+int
+calculate_iterations(ir_rvalue *from, ir_rvalue *to, ir_rvalue *increment,
+		     enum ir_expression_operation op)
+{
+   if (from == NULL || to == NULL || increment == NULL)
+      return -1;
+
+   void *mem_ctx = ralloc_context(NULL);
+
+   ir_expression *const sub =
+      new(mem_ctx) ir_expression(ir_binop_sub, from->type, to, from);
+
+   ir_expression *const div =
+      new(mem_ctx) ir_expression(ir_binop_div, sub->type, sub, increment);
+
+   ir_constant *iter = div->constant_expression_value();
+
+   if (iter == NULL)
+      return -1;
+
+   if (!iter->type->is_integer()) {
+      ir_rvalue *cast =
+	 new(mem_ctx) ir_expression(ir_unop_f2i, glsl_type::int_type, iter,
+				    NULL);
+
+      iter = cast->constant_expression_value();
+   }
+
+   int iter_value = iter->get_int_component(0);
+
+   /* Make sure that the calculated number of iterations satisfies the exit
+    * condition.  This is needed to catch off-by-one errors and some types of
+    * ill-formed loops.  For example, we need to detect that the following
+    * loop does not have a maximum iteration count.
+    *
+    *    for (float x = 0.0; x != 0.9; x += 0.2)
+    *        ;
+    */
+   const int bias[] = { -1, 0, 1 };
+   bool valid_loop = false;
+
+   for (unsigned i = 0; i < Elements(bias); i++) {
+      iter = (increment->type->is_integer())
+	 ? new(mem_ctx) ir_constant(iter_value + bias[i])
+	 : new(mem_ctx) ir_constant(float(iter_value + bias[i]));
+
+      ir_expression *const mul =
+	 new(mem_ctx) ir_expression(ir_binop_mul, increment->type, iter,
+				    increment);
+
+      ir_expression *const add =
+	 new(mem_ctx) ir_expression(ir_binop_add, mul->type, mul, from);
+
+      ir_expression *const cmp =
+	 new(mem_ctx) ir_expression(op, glsl_type::bool_type, add, to);
+
+      ir_constant *const cmp_result = cmp->constant_expression_value();
+
+      assert(cmp_result != NULL);
+      if (cmp_result->get_bool_component(0)) {
+	 iter_value += bias[i];
+	 valid_loop = true;
+	 break;
+      }
+   }
+
+   ralloc_free(mem_ctx);
+   return (valid_loop) ? iter_value : -1;
+}
+
+namespace {
+
+class loop_control_visitor : public ir_hierarchical_visitor {
+public:
+   loop_control_visitor(loop_state *state)
+   {
+      this->state = state;
+      this->progress = false;
+   }
+
+   virtual ir_visitor_status visit_leave(ir_loop *ir);
+
+   loop_state *state;
+
+   bool progress;
+};
+
+} /* anonymous namespace */
+
+ir_visitor_status
+loop_control_visitor::visit_leave(ir_loop *ir)
+{
+   loop_variable_state *const ls = this->state->get(ir);
+
+   /* If we've entered a loop that hasn't been analyzed, something really,
+    * really bad has happened.
+    */
+   if (ls == NULL) {
+      assert(ls != NULL);
+      return visit_continue;
+   }
+
+   if (ls->limiting_terminator != NULL) {
+      /* If the limiting terminator has an iteration count of zero, then we've
+       * proven that the loop cannot run, so delete it.
+       */
+      int iterations = ls->limiting_terminator->iterations;
+      if (iterations == 0) {
+         ir->remove();
+         this->progress = true;
+         return visit_continue;
+      }
+   }
+
+   /* Remove the conditional break statements associated with all terminators
+    * that have a fixed iteration count, except for the one associated with
+    * the limiting terminator--that one needs to stay, since it is what
+    * actually terminates the loop.
+    */
+   foreach_list(node, &ls->terminators) {
+      loop_terminator *t = (loop_terminator *) node;
+
+      if (t->iterations < 0)
+         continue;
+
+      if (t != ls->limiting_terminator) {
+         t->ir->remove();
+
+         assert(ls->num_loop_jumps > 0);
+         ls->num_loop_jumps--;
+
+         this->progress = true;
+      }
+   }
+
+   return visit_continue;
+}
+
+
+bool
+set_loop_controls(exec_list *instructions, loop_state *ls)
+{
+   loop_control_visitor v(ls);
+
+   v.run(instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/loop_unroll.cpp b/icd/intel/compiler/shader/loop_unroll.cpp
new file mode 100644
index 0000000..1ce4d58
--- /dev/null
+++ b/icd/intel/compiler/shader/loop_unroll.cpp
@@ -0,0 +1,355 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "glsl_types.h"
+#include "loop_analysis.h"
+#include "ir_hierarchical_visitor.h"
+
+#include "main/mtypes.h"
+
+namespace {
+
+class loop_unroll_visitor : public ir_hierarchical_visitor {
+public:
+   loop_unroll_visitor(loop_state *state,
+                       const struct gl_shader_compiler_options *options)
+   {
+      this->state = state;
+      this->progress = false;
+      this->options = options;
+   }
+
+   virtual ir_visitor_status visit_leave(ir_loop *ir);
+   void simple_unroll(ir_loop *ir, int iterations);
+   void complex_unroll(ir_loop *ir, int iterations,
+                       bool continue_from_then_branch);
+   void splice_post_if_instructions(ir_if *ir_if, exec_list *splice_dest);
+
+   loop_state *state;
+
+   bool progress;
+   const struct gl_shader_compiler_options *options;
+};
+
+} /* anonymous namespace */
+
+static bool
+is_break(ir_instruction *ir)
+{
+   return ir != NULL && ir->ir_type == ir_type_loop_jump
+		     && ((ir_loop_jump *) ir)->is_break();
+}
+
+class loop_unroll_count : public ir_hierarchical_visitor {
+public:
+   int nodes;
+   /* If there are nested loops, the node count will be inaccurate. */
+   bool nested_loop;
+
+   loop_unroll_count(exec_list *list)
+   {
+      nodes = 0;
+      nested_loop = false;
+
+      run(list);
+   }
+
+   virtual ir_visitor_status visit_enter(ir_assignment *)
+   {
+      nodes++;
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_expression *)
+   {
+      nodes++;
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_loop *)
+   {
+      nested_loop = true;
+      return visit_continue;
+   }
+};
+
+
+/**
+ * Unroll a loop which does not contain any jumps.  For example, if the input
+ * is:
+ *
+ *     (loop (...) ...instrs...)
+ *
+ * And the iteration count is 3, the output will be:
+ *
+ *     ...instrs... ...instrs... ...instrs...
+ */
+void
+loop_unroll_visitor::simple_unroll(ir_loop *ir, int iterations)
+{
+   void *const mem_ctx = ralloc_parent(ir);
+
+   for (int i = 0; i < iterations; i++) {
+      exec_list copy_list;
+
+      copy_list.make_empty();
+      clone_ir_list(mem_ctx, &copy_list, &ir->body_instructions);
+
+      ir->insert_before(&copy_list);
+   }
+
+   /* The loop has been replaced by the unrolled copies.  Remove the original
+    * loop from the IR sequence.
+    */
+   ir->remove();
+
+   this->progress = true;
+}
+
+
+/**
+ * Unroll a loop whose last statement is an ir_if.  If \c
+ * continue_from_then_branch is true, the loop is repeated only when the
+ * "then" branch of the if is taken; otherwise it is repeated only when the
+ * "else" branch of the if is taken.
+ *
+ * For example, if the input is:
+ *
+ *     (loop (...)
+ *      ...body...
+ *      (if (cond)
+ *          (...then_instrs...)
+ *        (...else_instrs...)))
+ *
+ * And the iteration count is 3, and \c continue_from_then_branch is true,
+ * then the output will be:
+ *
+ *     ...body...
+ *     (if (cond)
+ *         (...then_instrs...
+ *          ...body...
+ *          (if (cond)
+ *              (...then_instrs...
+ *               ...body...
+ *               (if (cond)
+ *                   (...then_instrs...)
+ *                 (...else_instrs...)))
+ *            (...else_instrs...)))
+ *       (...else_instrs...))
+ */
+void
+loop_unroll_visitor::complex_unroll(ir_loop *ir, int iterations,
+                                    bool continue_from_then_branch)
+{
+   void *const mem_ctx = ralloc_parent(ir);
+   ir_instruction *ir_to_replace = ir;
+
+   for (int i = 0; i < iterations; i++) {
+      exec_list copy_list;
+
+      copy_list.make_empty();
+      clone_ir_list(mem_ctx, &copy_list, &ir->body_instructions);
+
+      ir_if *ir_if = ((ir_instruction *) copy_list.get_tail())->as_if();
+      assert(ir_if != NULL);
+
+      ir_to_replace->insert_before(&copy_list);
+      ir_to_replace->remove();
+
+      /* placeholder that will be removed in the next iteration */
+      ir_to_replace =
+         new(mem_ctx) ir_loop_jump(ir_loop_jump::jump_continue);
+
+      exec_list *const list = (continue_from_then_branch)
+         ? &ir_if->then_instructions : &ir_if->else_instructions;
+
+      list->push_tail(ir_to_replace);
+   }
+
+   ir_to_replace->remove();
+
+   this->progress = true;
+}
+
+
+/**
+ * Move all of the instructions which follow \c ir_if to the end of
+ * \c splice_dest.
+ *
+ * For example, in the code snippet:
+ *
+ *     (if (cond)
+ *         (...then_instructions...
+ *          break)
+ *       (...else_instructions...))
+ *     ...post_if_instructions...
+ *
+ * If \c ir_if points to the "if" instruction, and \c splice_dest points to
+ * (...else_instructions...), the code snippet is transformed into:
+ *
+ *     (if (cond)
+ *         (...then_instructions...
+ *          break)
+ *       (...else_instructions...
+ *        ...post_if_instructions...))
+ */
+void
+loop_unroll_visitor::splice_post_if_instructions(ir_if *ir_if,
+                                                 exec_list *splice_dest)
+{
+   while (!ir_if->get_next()->is_tail_sentinel()) {
+      ir_instruction *move_ir = (ir_instruction *) ir_if->get_next();
+
+      move_ir->remove();
+      splice_dest->push_tail(move_ir);
+   }
+}
+
+
+ir_visitor_status
+loop_unroll_visitor::visit_leave(ir_loop *ir)
+{
+   loop_variable_state *const ls = this->state->get(ir);
+   int iterations;
+
+   /* If we've entered a loop that hasn't been analyzed, something really,
+    * really bad has happened.
+    */
+   if (ls == NULL) {
+      assert(ls != NULL);
+      return visit_continue;
+   }
+
+   /* Don't try to unroll loops where the number of iterations is not known
+    * at compile-time.
+    */
+   if (ls->limiting_terminator == NULL)
+      return visit_continue;
+
+   iterations = ls->limiting_terminator->iterations;
+
+   const int max_iterations = options->MaxUnrollIterations;
+
+   /* Don't try to unroll loops that have zillions of iterations either.
+    */
+   if (iterations > max_iterations)
+      return visit_continue;
+
+   /* Don't try to unroll nested loops and loops with a huge body.
+    */
+   loop_unroll_count count(&ir->body_instructions);
+
+   if (count.nested_loop || count.nodes * iterations > max_iterations * 5)
+      return visit_continue;
+
+   /* Note: the limiting terminator contributes 1 to ls->num_loop_jumps.
+    * We'll be removing the limiting terminator before we unroll.
+    */
+   assert(ls->num_loop_jumps > 0);
+   unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1;
+
+   if (predicted_num_loop_jumps > 1)
+      return visit_continue;
+
+   if (predicted_num_loop_jumps == 0) {
+      ls->limiting_terminator->ir->remove();
+      simple_unroll(ir, iterations);
+      return visit_continue;
+   }
+
+   ir_instruction *last_ir = (ir_instruction *) ir->body_instructions.get_tail();
+   assert(last_ir != NULL);
+
+   if (is_break(last_ir)) {
+      /* If the only loop-jump is a break at the end of the loop, the loop
+       * will execute exactly once.  Remove the break and use the simple
+       * unroller with an iteration count of 1.
+       */
+      last_ir->remove();
+
+      ls->limiting_terminator->ir->remove();
+      simple_unroll(ir, 1);
+      return visit_continue;
+   }
+
+   foreach_list(node, &ir->body_instructions) {
+      /* recognize loops in the form produced by ir_lower_jumps */
+      ir_instruction *cur_ir = (ir_instruction *) node;
+
+      /* Skip the limiting terminator, since it will go away when we
+       * unroll.
+       */
+      if (cur_ir == ls->limiting_terminator->ir)
+         continue;
+
+      ir_if *ir_if = cur_ir->as_if();
+      if (ir_if != NULL) {
+         /* Determine which if-statement branch, if any, ends with a
+          * break.  The branch that did *not* have the break will get a
+          * temporary continue inserted in each iteration of the loop
+          * unroll.
+          *
+          * Note that since ls->num_loop_jumps is <= 1, it is impossible
+          * for both branches to end with a break.
+          */
+         ir_instruction *ir_if_last =
+            (ir_instruction *) ir_if->then_instructions.get_tail();
+
+         if (is_break(ir_if_last)) {
+            ls->limiting_terminator->ir->remove();
+            splice_post_if_instructions(ir_if, &ir_if->else_instructions);
+            ir_if_last->remove();
+            complex_unroll(ir, iterations, false);
+            return visit_continue;
+         } else {
+            ir_if_last =
+               (ir_instruction *) ir_if->else_instructions.get_tail();
+
+            if (is_break(ir_if_last)) {
+               ls->limiting_terminator->ir->remove();
+               splice_post_if_instructions(ir_if, &ir_if->then_instructions);
+               ir_if_last->remove();
+               complex_unroll(ir, iterations, true);
+               return visit_continue;
+            }
+         }
+      }
+   }
+
+   /* Did not find the break statement.  It must be in a complex if-nesting,
+    * so don't try to unroll.
+    */
+   return visit_continue;
+}
+
+
+bool
+unroll_loops(exec_list *instructions, loop_state *ls,
+             const struct gl_shader_compiler_options *options)
+{
+   loop_unroll_visitor v(ls, options);
+
+   v.run(instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/lower_clip_distance.cpp b/icd/intel/compiler/shader/lower_clip_distance.cpp
new file mode 100644
index 0000000..2d6138d
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_clip_distance.cpp
@@ -0,0 +1,549 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_clip_distance.cpp
+ *
+ * This pass accounts for the difference between the way
+ * gl_ClipDistance is declared in standard GLSL (as an array of
+ * floats), and the way it is frequently implemented in hardware (as
+ * a pair of vec4s, with four clip distances packed into each).
+ *
+ * The declaration of gl_ClipDistance is replaced with a declaration
+ * of gl_ClipDistanceMESA, and any references to gl_ClipDistance are
+ * translated to refer to gl_ClipDistanceMESA with the appropriate
+ * swizzling of array indices.  For instance:
+ *
+ *   gl_ClipDistance[i]
+ *
+ * is translated into:
+ *
+ *   gl_ClipDistanceMESA[i>>2][i&3]
+ *
+ * Since some hardware may not internally represent gl_ClipDistance as a pair
+ * of vec4's, this lowering pass is optional.  To enable it, set the
+ * LowerClipDistance flag in gl_shader_compiler_options to true.
+ */
+
+#include "glsl_symbol_table.h"
+#include "ir_rvalue_visitor.h"
+#include "ir.h"
+#include "program/prog_instruction.h" /* For WRITEMASK_* */
+
+namespace {
+
+class lower_clip_distance_visitor : public ir_rvalue_visitor {
+public:
+   explicit lower_clip_distance_visitor(gl_shader_stage shader_stage)
+      : progress(false), old_clip_distance_1d_var(NULL),
+        old_clip_distance_2d_var(NULL), new_clip_distance_1d_var(NULL),
+        new_clip_distance_2d_var(NULL), shader_stage(shader_stage)
+   {
+   }
+
+   virtual ir_visitor_status visit(ir_variable *);
+   void create_indices(ir_rvalue*, ir_rvalue *&, ir_rvalue *&);
+   bool is_clip_distance_vec8(ir_rvalue *ir);
+   ir_rvalue *lower_clip_distance_vec8(ir_rvalue *ir);
+   virtual ir_visitor_status visit_leave(ir_assignment *);
+   void visit_new_assignment(ir_assignment *ir);
+   virtual ir_visitor_status visit_leave(ir_call *);
+
+   virtual void handle_rvalue(ir_rvalue **rvalue);
+
+   void fix_lhs(ir_assignment *);
+
+   bool progress;
+
+   /**
+    * Pointer to the declaration of gl_ClipDistance, if found.
+    *
+    * Note:
+    *
+    * - the 2d_var is for geometry shader input only.
+    *
+    * - since gl_ClipDistance is available in geometry shaders as both an
+    *   input and an output, it's possible for both old_clip_distance_1d_var
+    *   and old_clip_distance_2d_var to be non-null.
+    */
+   ir_variable *old_clip_distance_1d_var;
+   ir_variable *old_clip_distance_2d_var;
+
+   /**
+    * Pointer to the newly-created gl_ClipDistanceMESA variable.
+    */
+   ir_variable *new_clip_distance_1d_var;
+   ir_variable *new_clip_distance_2d_var;
+
+   /**
+    * Type of shader we are compiling (e.g. MESA_SHADER_VERTEX)
+    */
+   const gl_shader_stage shader_stage;
+};
+
+} /* anonymous namespace */
+
+/**
+ * Replace any declaration of gl_ClipDistance as an array of floats with a
+ * declaration of gl_ClipDistanceMESA as an array of vec4's.
+ */
+ir_visitor_status
+lower_clip_distance_visitor::visit(ir_variable *ir)
+{
+   if (!ir->name || strcmp(ir->name, "gl_ClipDistance") != 0)
+      return visit_continue;
+   assert (ir->type->is_array());
+
+   if (!ir->type->element_type()->is_array()) {
+      /* 1D gl_ClipDistance (used for vertex and geometry output, and fragment
+       * input).
+       */
+      if (this->old_clip_distance_1d_var)
+         return visit_continue;
+
+      this->progress = true;
+      this->old_clip_distance_1d_var = ir;
+      assert (ir->type->element_type() == glsl_type::float_type);
+      unsigned new_size = (ir->type->array_size() + 3) / 4;
+
+      /* Clone the old var so that we inherit all of its properties */
+      this->new_clip_distance_1d_var = ir->clone(ralloc_parent(ir), NULL);
+
+      /* And change the properties that we need to change */
+      this->new_clip_distance_1d_var->name
+         = ralloc_strdup(this->new_clip_distance_1d_var,
+                         "gl_ClipDistanceMESA");
+      this->new_clip_distance_1d_var->type
+         = glsl_type::get_array_instance(glsl_type::vec4_type, new_size);
+      this->new_clip_distance_1d_var->data.max_array_access
+         = ir->data.max_array_access / 4;
+
+      ir->replace_with(this->new_clip_distance_1d_var);
+   } else {
+      /* 2D gl_ClipDistance (used for geometry input). */
+      assert(ir->data.mode == ir_var_shader_in &&
+             this->shader_stage == MESA_SHADER_GEOMETRY);
+      if (this->old_clip_distance_2d_var)
+         return visit_continue;
+
+      this->progress = true;
+      this->old_clip_distance_2d_var = ir;
+      assert (ir->type->element_type()->element_type() == glsl_type::float_type);
+      unsigned new_size = (ir->type->element_type()->array_size() + 3) / 4;
+
+      /* Clone the old var so that we inherit all of its properties */
+      this->new_clip_distance_2d_var = ir->clone(ralloc_parent(ir), NULL);
+
+      /* And change the properties that we need to change */
+      this->new_clip_distance_2d_var->name
+         = ralloc_strdup(this->new_clip_distance_2d_var, "gl_ClipDistanceMESA");
+      this->new_clip_distance_2d_var->type = glsl_type::get_array_instance(
+         glsl_type::get_array_instance(glsl_type::vec4_type,
+            new_size),
+         ir->type->array_size());
+      this->new_clip_distance_2d_var->data.max_array_access
+         = ir->data.max_array_access / 4;
+
+      ir->replace_with(this->new_clip_distance_2d_var);
+   }
+   return visit_continue;
+}
+
+
+/**
+ * Create the necessary GLSL rvalues to index into gl_ClipDistanceMESA based
+ * on the rvalue previously used to index into gl_ClipDistance.
+ *
+ * \param array_index Selects one of the vec4's in gl_ClipDistanceMESA
+ * \param swizzle_index Selects a component within the vec4 selected by
+ *        array_index.
+ */
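+/*
+ * For example (illustrative): a constant old index of 6 yields array_index 1
+ * and swizzle_index 2, i.e. gl_ClipDistanceMESA[1].z after lowering.
+ */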
+void
+lower_clip_distance_visitor::create_indices(ir_rvalue *old_index,
+                                            ir_rvalue *&array_index,
+                                            ir_rvalue *&swizzle_index)
+{
+   void *ctx = ralloc_parent(old_index);
+
+   /* Make sure old_index is a signed int so that the bitwise "shift" and
+    * "and" operations below type check properly.
+    */
+   if (old_index->type != glsl_type::int_type) {
+      assert (old_index->type == glsl_type::uint_type);
+      old_index = new(ctx) ir_expression(ir_unop_u2i, old_index);
+   }
+
+   ir_constant *old_index_constant = old_index->constant_expression_value();
+   if (old_index_constant) {
+      /* gl_ClipDistance is being accessed via a constant index.  Don't bother
+       * creating expressions to calculate the lowered indices.  Just create
+       * constants.
+       */
+      int const_val = old_index_constant->get_int_component(0);
+      array_index = new(ctx) ir_constant(const_val / 4);
+      swizzle_index = new(ctx) ir_constant(const_val % 4);
+   } else {
+      /* Create a variable to hold the value of old_index (so that we
+       * don't compute it twice).
+       */
+      ir_variable *old_index_var = new(ctx) ir_variable(
+         glsl_type::int_type, "clip_distance_index", ir_var_temporary);
+      this->base_ir->insert_before(old_index_var);
+      this->base_ir->insert_before(new(ctx) ir_assignment(
+         new(ctx) ir_dereference_variable(old_index_var), old_index));
+
+      /* Create the expression clip_distance_index / 4.  Do this as a bit
+       * shift because that's likely to be more efficient.
+       */
+      array_index = new(ctx) ir_expression(
+         ir_binop_rshift, new(ctx) ir_dereference_variable(old_index_var),
+         new(ctx) ir_constant(2));
+
+      /* Create the expression clip_distance_index % 4.  Do this as a bitwise
+       * AND because that's likely to be more efficient.
+       */
+      swizzle_index = new(ctx) ir_expression(
+         ir_binop_bit_and, new(ctx) ir_dereference_variable(old_index_var),
+         new(ctx) ir_constant(3));
+   }
+}
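+
+/* Worked example (illustrative only, not literal compiler output): for the
+ * constant access gl_ClipDistance[5], const_val / 4 == 1 and
+ * const_val % 4 == 1, so the access lowers to component "y" of the second
+ * vec4:
+ *
+ *     gl_ClipDistance[5]  =>  gl_ClipDistanceMESA[1].y
+ *
+ * For a dynamic index i, (i >> 2) and (i & 3) are generated instead; these
+ * agree with /4 and %4 for the non-negative indices that are valid here.
+ */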
+
+
+/**
+ * Determine whether the given rvalue describes an array of 8 floats that
+ * needs to be lowered to an array of 2 vec4's; that is, determine whether it
+ * matches one of the following patterns:
+ *
+ * - gl_ClipDistance (if gl_ClipDistance is 1D)
+ * - gl_ClipDistance[i] (if gl_ClipDistance is 2D)
+ */
+bool
+lower_clip_distance_visitor::is_clip_distance_vec8(ir_rvalue *ir)
+{
+   /* Note that geometry shaders contain gl_ClipDistance both as an input
+    * (which is a 2D array) and an output (which is a 1D array), so it's
+    * possible for both this->old_clip_distance_1d_var and
+    * this->old_clip_distance_2d_var to be non-NULL in the same shader.
+    */
+
+   if (this->old_clip_distance_1d_var) {
+      ir_dereference_variable *var_ref = ir->as_dereference_variable();
+      if (var_ref && var_ref->var == this->old_clip_distance_1d_var)
+         return true;
+   }
+   if (this->old_clip_distance_2d_var) {
+      /* 2D clip distance is only possible as a geometry input */
+      assert(this->shader_stage == MESA_SHADER_GEOMETRY);
+
+      ir_dereference_array *array_ref = ir->as_dereference_array();
+      if (array_ref) {
+         ir_dereference_variable *var_ref =
+            array_ref->array->as_dereference_variable();
+         if (var_ref && var_ref->var == this->old_clip_distance_2d_var)
+            return true;
+      }
+   }
+   return false;
+}
+
+
+/**
+ * If the given ir satisfies is_clip_distance_vec8(), return new ir
+ * representing its lowered equivalent.  That is, map:
+ *
+ * - gl_ClipDistance    => gl_ClipDistanceMESA    (if gl_ClipDistance is 1D)
+ * - gl_ClipDistance[i] => gl_ClipDistanceMESA[i] (if gl_ClipDistance is 2D)
+ *
+ * Otherwise return NULL.
+ */
+ir_rvalue *
+lower_clip_distance_visitor::lower_clip_distance_vec8(ir_rvalue *ir)
+{
+   if (this->old_clip_distance_1d_var) {
+      ir_dereference_variable *var_ref = ir->as_dereference_variable();
+      if (var_ref && var_ref->var == this->old_clip_distance_1d_var) {
+         return new(ralloc_parent(ir))
+            ir_dereference_variable(this->new_clip_distance_1d_var);
+      }
+   }
+   if (this->old_clip_distance_2d_var) {
+      /* 2D clip distance is only possible as a geometry input */
+      assert(this->shader_stage == MESA_SHADER_GEOMETRY);
+
+      ir_dereference_array *array_ref = ir->as_dereference_array();
+      if (array_ref) {
+         ir_dereference_variable *var_ref =
+            array_ref->array->as_dereference_variable();
+         if (var_ref && var_ref->var == this->old_clip_distance_2d_var) {
+            return new(ralloc_parent(ir))
+               ir_dereference_array(this->new_clip_distance_2d_var,
+                                    array_ref->array_index);
+         }
+      }
+   }
+   return NULL;
+}
+
+
+void
+lower_clip_distance_visitor::handle_rvalue(ir_rvalue **rv)
+{
+   if (*rv == NULL)
+      return;
+
+   ir_dereference_array *const array_deref = (*rv)->as_dereference_array();
+   if (array_deref == NULL)
+      return;
+
+   /* Replace any expression that indexes one of the floats in gl_ClipDistance
+    * with an expression that indexes into one of the vec4's in
+    * gl_ClipDistanceMESA and accesses the appropriate component.
+    */
+   ir_rvalue *lowered_vec8 =
+      this->lower_clip_distance_vec8(array_deref->array);
+   if (lowered_vec8 != NULL) {
+      this->progress = true;
+      ir_rvalue *array_index;
+      ir_rvalue *swizzle_index;
+      this->create_indices(array_deref->array_index, array_index, swizzle_index);
+      void *mem_ctx = ralloc_parent(array_deref);
+
+      ir_dereference_array *const new_array_deref =
+         new(mem_ctx) ir_dereference_array(lowered_vec8, array_index);
+
+      ir_expression *const expr =
+         new(mem_ctx) ir_expression(ir_binop_vector_extract,
+                                    new_array_deref,
+                                    swizzle_index);
+
+      *rv = expr;
+   }
+}
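+
+/* As a sketch of the rewrite above (illustrative only), a read such as
+ *
+ *     x = gl_ClipDistance[i];
+ *
+ * becomes
+ *
+ *     x = vector_extract(gl_ClipDistanceMESA[i >> 2], i & 3);
+ */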
+
+void
+lower_clip_distance_visitor::fix_lhs(ir_assignment *ir)
+{
+   if (ir->lhs->ir_type == ir_type_expression) {
+      void *mem_ctx = ralloc_parent(ir);
+      ir_expression *const expr = (ir_expression *) ir->lhs;
+
+      /* The expression must be of the form:
+       *
+       *     (vector_extract gl_ClipDistanceMESA[i], j).
+       */
+      assert(expr->operation == ir_binop_vector_extract);
+      assert(expr->operands[0]->ir_type == ir_type_dereference_array);
+      assert(expr->operands[0]->type == glsl_type::vec4_type);
+
+      ir_dereference *const new_lhs = (ir_dereference *) expr->operands[0];
+      ir->rhs = new(mem_ctx) ir_expression(ir_triop_vector_insert,
+					   glsl_type::vec4_type,
+					   new_lhs->clone(mem_ctx, NULL),
+					   ir->rhs,
+					   expr->operands[1]);
+      ir->set_lhs(new_lhs);
+      ir->write_mask = WRITEMASK_XYZW;
+   }
+}
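+
+/* Sketch of the fix-up above (illustrative only): an assignment whose LHS
+ * was rewritten into a vector_extract, conceptually
+ *
+ *     vector_extract(gl_ClipDistanceMESA[i], j) = value;
+ *
+ * becomes a whole-vector write instead:
+ *
+ *     gl_ClipDistanceMESA[i] =
+ *        vector_insert(gl_ClipDistanceMESA[i], value, j);
+ */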
+
+/**
+ * Replace any assignment having the 1D gl_ClipDistance (undereferenced) as
+ * its LHS or RHS with a sequence of assignments, one for each component of
+ * the array.  Each of these assignments is lowered to refer to
+ * gl_ClipDistanceMESA as appropriate.
+ *
+ * We need to do a similar replacement for 2D gl_ClipDistance, however since
+ * it's an input, the only case we need to address is where a 1D slice of it
+ * is the entire RHS of an assignment, e.g.:
+ *
+ *     foo = gl_in[i].gl_ClipDistance
+ */
+ir_visitor_status
+lower_clip_distance_visitor::visit_leave(ir_assignment *ir)
+{
+   /* First invoke the base class visitor.  This causes handle_rvalue() to be
+    * called on ir->rhs and ir->condition.
+    */
+   ir_rvalue_visitor::visit_leave(ir);
+
+   if (this->is_clip_distance_vec8(ir->lhs) ||
+       this->is_clip_distance_vec8(ir->rhs)) {
+      /* LHS or RHS of the assignment is the entire 1D gl_ClipDistance array
+       * (or a 1D slice of a 2D gl_ClipDistance input array).  Since we are
+       * reshaping gl_ClipDistance from an array of floats to an array of
+       * vec4's, this isn't going to work as a bulk assignment anymore, so
+       * unroll it to element-by-element assignments and lower each of them.
+       *
+       * Note: to unroll into element-by-element assignments, we need to make
+       * clones of the LHS and RHS.  This is safe because expressions and
+       * l-values are side-effect free.
+       */
+      void *ctx = ralloc_parent(ir);
+      int array_size = ir->lhs->type->array_size();
+      for (int i = 0; i < array_size; ++i) {
+         ir_dereference_array *new_lhs = new(ctx) ir_dereference_array(
+            ir->lhs->clone(ctx, NULL), new(ctx) ir_constant(i));
+         ir_dereference_array *new_rhs = new(ctx) ir_dereference_array(
+            ir->rhs->clone(ctx, NULL), new(ctx) ir_constant(i));
+         this->handle_rvalue((ir_rvalue **) &new_rhs);
+
+         /* Handle the LHS after creating the new assignment.  This must
+          * happen in this order because handle_rvalue may replace the old LHS
+          * with an ir_expression of ir_binop_vector_extract.  Since this is
+          * not a valid l-value, this will cause an assertion in the
+          * ir_assignment constructor to fail.
+          *
+          * If this occurs, replace the mangled LHS with a dereference of the
+          * vector, and replace the RHS with an ir_triop_vector_insert.
+          */
+         ir_assignment *const assign = new(ctx) ir_assignment(new_lhs, new_rhs);
+         this->handle_rvalue((ir_rvalue **) &assign->lhs);
+         this->fix_lhs(assign);
+
+         this->base_ir->insert_before(assign);
+      }
+      ir->remove();
+
+      return visit_continue;
+   }
+
+   /* Handle the LHS as if it were an r-value.  Normally
+    * rvalue_visit(ir_assignment *) only visits the RHS, but we need to lower
+    * expressions in the LHS as well.
+    *
+    * This may cause the LHS to get replaced with an ir_expression of
+    * ir_binop_vector_extract.  If this occurs, replace it with a dereference
+    * of the vector, and replace the RHS with an ir_triop_vector_insert.
+    */
+   handle_rvalue((ir_rvalue **)&ir->lhs);
+   this->fix_lhs(ir);
+
+   return rvalue_visit(ir);
+}
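+
+/* To illustrate the unrolling above (not literal output): with
+ * gl_ClipDistance declared as float[8], the bulk copy
+ *
+ *     gl_ClipDistance = foo;
+ *
+ * expands to eight element assignments, each lowered individually:
+ *
+ *     gl_ClipDistanceMESA[0].x = foo[0];
+ *     ...
+ *     gl_ClipDistanceMESA[1].w = foo[7];
+ */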
+
+
+/**
+ * Set up base_ir properly and call visit_leave() on a newly created
+ * ir_assignment node.  This is used in cases where we have to insert an
+ * ir_assignment in a place where we know the hierarchical visitor won't see
+ * it.
+ */
+void
+lower_clip_distance_visitor::visit_new_assignment(ir_assignment *ir)
+{
+   ir_instruction *old_base_ir = this->base_ir;
+   this->base_ir = ir;
+   ir->accept(this);
+   this->base_ir = old_base_ir;
+}
+
+
+/**
+ * If a 1D gl_ClipDistance variable appears as an argument in an ir_call
+ * expression, replace it with a temporary variable, and make sure the ir_call
+ * is preceded and/or followed by assignments that copy the contents of the
+ * temporary variable to and/or from gl_ClipDistance.  Each of these
+ * assignments is then lowered to refer to gl_ClipDistanceMESA.
+ *
+ * We need to do a similar replacement for 2D gl_ClipDistance, however since
+ * it's an input, the only case we need to address is where a 1D slice of it
+ * is passed as an "in" parameter to an ir_call, e.g.:
+ *
+ *     foo(gl_in[i].gl_ClipDistance)
+ */
+ir_visitor_status
+lower_clip_distance_visitor::visit_leave(ir_call *ir)
+{
+   void *ctx = ralloc_parent(ir);
+
+   const exec_node *formal_param_node = ir->callee->parameters.head;
+   const exec_node *actual_param_node = ir->actual_parameters.head;
+   while (!actual_param_node->is_tail_sentinel()) {
+      ir_variable *formal_param = (ir_variable *) formal_param_node;
+      ir_rvalue *actual_param = (ir_rvalue *) actual_param_node;
+
+      /* Advance formal_param_node and actual_param_node now so that we can
+       * safely replace actual_param with another node, if necessary, below.
+       */
+      formal_param_node = formal_param_node->next;
+      actual_param_node = actual_param_node->next;
+
+      if (this->is_clip_distance_vec8(actual_param)) {
+         /* User is trying to pass the whole 1D gl_ClipDistance array (or a 1D
+          * slice of a 2D gl_ClipDistance array) to a function call.  Since we
+          * are reshaping gl_ClipDistance from an array of floats to an array
+          * of vec4's, this isn't going to work anymore, so use a temporary
+          * array instead.
+          */
+         ir_variable *temp_clip_distance = new(ctx) ir_variable(
+            actual_param->type, "temp_clip_distance", ir_var_temporary);
+         this->base_ir->insert_before(temp_clip_distance);
+         actual_param->replace_with(
+            new(ctx) ir_dereference_variable(temp_clip_distance));
+         if (formal_param->data.mode == ir_var_function_in
+             || formal_param->data.mode == ir_var_function_inout) {
+            /* Copy from gl_ClipDistance to the temporary before the call.
+             * Since we are going to insert this copy before the current
+             * instruction, we need to visit it afterwards to make sure it
+             * gets lowered.
+             */
+            ir_assignment *new_assignment = new(ctx) ir_assignment(
+               new(ctx) ir_dereference_variable(temp_clip_distance),
+               actual_param->clone(ctx, NULL));
+            this->base_ir->insert_before(new_assignment);
+            this->visit_new_assignment(new_assignment);
+         }
+         if (formal_param->data.mode == ir_var_function_out
+             || formal_param->data.mode == ir_var_function_inout) {
+            /* Copy from the temporary to gl_ClipDistance after the call.
+             * Since visit_list_elements() has already decided which
+             * instruction it's going to visit next, we need to visit
+             * afterwards to make sure it gets lowered.
+             */
+            ir_assignment *new_assignment = new(ctx) ir_assignment(
+               actual_param->clone(ctx, NULL),
+               new(ctx) ir_dereference_variable(temp_clip_distance));
+            this->base_ir->insert_after(new_assignment);
+            this->visit_new_assignment(new_assignment);
+         }
+      }
+   }
+
+   return rvalue_visit(ir);
+}
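+
+/* Illustrative expansion of the copy-in/copy-out above for an "inout"
+ * parameter (not literal compiler output):
+ *
+ *     temp_clip_distance = gl_ClipDistance;   // lowered copy-in
+ *     foo(temp_clip_distance);
+ *     gl_ClipDistance = temp_clip_distance;   // lowered copy-out
+ */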
+
+
+bool
+lower_clip_distance(gl_shader *shader)
+{
+   lower_clip_distance_visitor v(shader->Stage);
+
+   visit_list_elements(&v, shader->ir);
+
+   if (v.new_clip_distance_1d_var)
+      shader->symbols->add_variable(v.new_clip_distance_1d_var);
+   if (v.new_clip_distance_2d_var)
+      shader->symbols->add_variable(v.new_clip_distance_2d_var);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/lower_discard.cpp b/icd/intel/compiler/shader/lower_discard.cpp
new file mode 100644
index 0000000..f2757d1
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_discard.cpp
@@ -0,0 +1,201 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_discard.cpp
+ *
+ * This pass moves discards out of if-statements.
+ *
+ * Case 1: The "then" branch contains a conditional discard:
+ * ---------------------------------------------------------
+ *
+ *    if (cond1) {
+ *	 s1;
+ *	 discard cond2;
+ *	 s2;
+ *    } else {
+ *	 s3;
+ *    }
+ *
+ * becomes:
+ *
+ *    temp = false;
+ *    if (cond1) {
+ *	 s1;
+ *	 temp = cond2;
+ *	 s2;
+ *    } else {
+ *	 s3;
+ *    }
+ *    discard temp;
+ *
+ * Case 2: The "else" branch contains a conditional discard:
+ * ---------------------------------------------------------
+ *
+ *    if (cond1) {
+ *	 s1;
+ *    } else {
+ *	 s2;
+ *	 discard cond2;
+ *	 s3;
+ *    }
+ *
+ * becomes:
+ *
+ *    temp = false;
+ *    if (cond1) {
+ *	 s1;
+ *    } else {
+ *	 s2;
+ *	 temp = cond2;
+ *	 s3;
+ *    }
+ *    discard temp;
+ *
+ * Case 3: Both branches contain a conditional discard:
+ * ----------------------------------------------------
+ *
+ *    if (cond1) {
+ *	 s1;
+ *	 discard cond2;
+ *	 s2;
+ *    } else {
+ *	 s3;
+ *	 discard cond3;
+ *	 s4;
+ *    }
+ *
+ * becomes:
+ *
+ *    temp = false;
+ *    if (cond1) {
+ *	 s1;
+ *	 temp = cond2;
+ *	 s2;
+ *    } else {
+ *	 s3;
+ *	 temp = cond3;
+ *	 s4;
+ *    }
+ *    discard temp;
+ *
+ * If there are multiple conditional discards, we need only deal with one of
+ * them.  Repeatedly applying this pass will take care of the others.
+ *
+ * Unconditional discards are treated as having a condition of "true".
+ */
+
+#include "glsl_types.h"
+#include "ir.h"
+
+namespace {
+
+class lower_discard_visitor : public ir_hierarchical_visitor {
+public:
+   lower_discard_visitor()
+   {
+      this->progress = false;
+   }
+
+   ir_visitor_status visit_leave(ir_if *);
+
+   bool progress;
+};
+
+} /* anonymous namespace */
+
+bool
+lower_discard(exec_list *instructions)
+{
+   lower_discard_visitor v;
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
+
+
+static ir_discard *
+find_discard(exec_list &instructions)
+{
+   foreach_list(n, &instructions) {
+      ir_discard *ir = ((ir_instruction *) n)->as_discard();
+      if (ir != NULL)
+	 return ir;
+   }
+   return NULL;
+}
+
+
+static void
+replace_discard(void *mem_ctx, ir_variable *var, ir_discard *ir)
+{
+   ir_rvalue *condition = ir->condition;
+
+   /* For unconditional discards, use "true" as the condition. */
+   if (condition == NULL)
+      condition = new(mem_ctx) ir_constant(true);
+
+   ir_assignment *assignment =
+      new(mem_ctx) ir_assignment(new(mem_ctx) ir_dereference_variable(var),
+				 condition, NULL);
+
+   ir->replace_with(assignment);
+}
+
+
+ir_visitor_status
+lower_discard_visitor::visit_leave(ir_if *ir)
+{
+   ir_discard *then_discard = find_discard(ir->then_instructions);
+   ir_discard *else_discard = find_discard(ir->else_instructions);
+
+   if (then_discard == NULL && else_discard == NULL)
+      return visit_continue;
+
+   void *mem_ctx = ralloc_parent(ir);
+
+   ir_variable *temp = new(mem_ctx) ir_variable(glsl_type::bool_type,
+						"discard_cond_temp",
+						ir_var_temporary);
+   ir_assignment *temp_initializer =
+      new(mem_ctx) ir_assignment(new(mem_ctx) ir_dereference_variable(temp),
+				 new(mem_ctx) ir_constant(false), NULL);
+
+   ir->insert_before(temp);
+   ir->insert_before(temp_initializer);
+
+   if (then_discard != NULL)
+      replace_discard(mem_ctx, temp, then_discard);
+
+   if (else_discard != NULL)
+      replace_discard(mem_ctx, temp, else_discard);
+
+   ir_discard *discard = then_discard != NULL ? then_discard : else_discard;
+   discard->condition = new(mem_ctx) ir_dereference_variable(temp);
+   ir->insert_after(discard);
+
+   this->progress = true;
+
+   return visit_continue;
+}
diff --git a/icd/intel/compiler/shader/lower_discard_flow.cpp b/icd/intel/compiler/shader/lower_discard_flow.cpp
new file mode 100644
index 0000000..1bc56d7
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_discard_flow.cpp
@@ -0,0 +1,148 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/** @file lower_discard_flow.cpp
+ *
+ * Implements the GLSL 1.30 revision 9 rule for fragment shader
+ * discard handling:
+ *
+ *     "Control flow exits the shader, and subsequent implicit or
+ *      explicit derivatives are undefined when this control flow is
+ *      non-uniform (meaning different fragments within the primitive
+ *      take different control paths)."
+ *
+ * There seem to be two conflicting things here.  "Control flow exits
+ * the shader" sounds like the discarded fragments should effectively
+ * jump to the end of the shader, but that breaks derivatives in the
+ * case of uniform control flow and causes rendering failure in the
+ * bushes in Unigine Tropics.
+ *
+ * The question, then, is whether the intent was "loops stop at the
+ * point that the only active channels left are discarded pixels" or
+ * "discarded pixels become inactive at the point that control flow
+ * returns to the top of a loop".  This implements the second
+ * interpretation.
+ */
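+
+/* Under that interpretation, a fragment shader loop such as (illustrative
+ * sketch, not literal output)
+ *
+ *     for (...) {
+ *        if (cond)
+ *           discard;
+ *        ...
+ *     }
+ *
+ * behaves as if transformed into
+ *
+ *     bool discarded = false;
+ *     for (...) {
+ *        if (cond) {
+ *           discarded = true;
+ *           discard;
+ *        }
+ *        ...
+ *        if (discarded)
+ *           break;
+ *     }
+ */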
+
+#include "glsl_types.h"
+#include "ir.h"
+#include "program/hash_table.h"
+
+namespace {
+
+class lower_discard_flow_visitor : public ir_hierarchical_visitor {
+public:
+   lower_discard_flow_visitor(ir_variable *discarded)
+   : discarded(discarded)
+   {
+      mem_ctx = ralloc_parent(discarded);
+   }
+
+   ~lower_discard_flow_visitor()
+   {
+   }
+
+   ir_visitor_status visit_enter(ir_discard *ir);
+   ir_visitor_status visit_enter(ir_loop_jump *ir);
+   ir_visitor_status visit_enter(ir_loop *ir);
+   ir_visitor_status visit_enter(ir_function_signature *ir);
+
+   ir_if *generate_discard_break();
+
+   ir_variable *discarded;
+   void *mem_ctx;
+};
+
+} /* anonymous namespace */
+
+ir_visitor_status
+lower_discard_flow_visitor::visit_enter(ir_loop_jump *ir)
+{
+   if (ir->mode != ir_loop_jump::jump_continue)
+      return visit_continue;
+
+   ir->insert_before(generate_discard_break());
+
+   return visit_continue;
+}
+
+ir_visitor_status
+lower_discard_flow_visitor::visit_enter(ir_discard *ir)
+{
+   ir_dereference *lhs = new(mem_ctx) ir_dereference_variable(discarded);
+   ir_rvalue *rhs = new(mem_ctx) ir_constant(true);
+   ir_assignment *assign = new(mem_ctx) ir_assignment(lhs, rhs);
+   ir->insert_before(assign);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+lower_discard_flow_visitor::visit_enter(ir_loop *ir)
+{
+   ir->body_instructions.push_tail(generate_discard_break());
+
+   return visit_continue;
+}
+
+ir_visitor_status
+lower_discard_flow_visitor::visit_enter(ir_function_signature *ir)
+{
+   if (strcmp(ir->function_name(), "main") != 0)
+      return visit_continue;
+
+   ir_dereference *lhs = new(mem_ctx) ir_dereference_variable(discarded);
+   ir_rvalue *rhs = new(mem_ctx) ir_constant(false);
+   ir_assignment *assign = new(mem_ctx) ir_assignment(lhs, rhs);
+   ir->body.push_head(assign);
+
+   return visit_continue;
+}
+
+ir_if *
+lower_discard_flow_visitor::generate_discard_break()
+{
+   ir_rvalue *if_condition = new(mem_ctx) ir_dereference_variable(discarded);
+   ir_if *if_inst = new(mem_ctx) ir_if(if_condition);
+
+   ir_instruction *br = new(mem_ctx) ir_loop_jump(ir_loop_jump::jump_break);
+   if_inst->then_instructions.push_tail(br);
+
+   return if_inst;
+}
+
+void
+lower_discard_flow(exec_list *ir)
+{
+   void *mem_ctx = ir;
+
+   ir_variable *var = new(mem_ctx) ir_variable(glsl_type::bool_type,
+					       "discarded",
+					       ir_var_temporary);
+
+   ir->push_head(var);
+
+   lower_discard_flow_visitor v(var);
+
+   visit_list_elements(&v, ir);
+}
diff --git a/icd/intel/compiler/shader/lower_if_to_cond_assign.cpp b/icd/intel/compiler/shader/lower_if_to_cond_assign.cpp
new file mode 100644
index 0000000..f15b217
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_if_to_cond_assign.cpp
@@ -0,0 +1,256 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_if_to_cond_assign.cpp
+ *
+ * This attempts to flatten if-statements to conditional assignments for
+ * GPUs with limited or no flow control support.
+ *
+ * It can't handle other control flow being inside of its block, such
+ * as calls or loops.  Hopefully loop unrolling and inlining will take
+ * care of those.
+ *
+ * Drivers for GPUs with no control flow support should simply call
+ *
+ *    lower_if_to_cond_assign(instructions)
+ *
+ * to attempt to flatten all if-statements.
+ *
+ * Some GPUs (such as i965 prior to gen6) do support control flow, but have a
+ * maximum nesting depth N.  Drivers for such hardware can call
+ *
+ *    lower_if_to_cond_assign(instructions, N)
+ *
+ * to attempt to flatten any if-statements appearing at depth > N.
+ */
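+
+/* A minimal sketch of the flattening (illustrative, not literal output):
+ *
+ *     if (cond) a = 1.0; else a = 2.0;
+ *
+ * becomes roughly
+ *
+ *     bool then_cond = cond;
+ *     bool else_cond = !then_cond;
+ *     a = 1.0;   (assignment predicated on then_cond)
+ *     a = 2.0;   (assignment predicated on else_cond)
+ *
+ * Both blocks execute unconditionally; correctness relies on every moved
+ * instruction being a conditional assignment.
+ */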
+
+#include "glsl_types.h"
+#include "ir.h"
+#include "program/hash_table.h"
+
+namespace {
+
+class ir_if_to_cond_assign_visitor : public ir_hierarchical_visitor {
+public:
+   ir_if_to_cond_assign_visitor(unsigned max_depth)
+   {
+      this->progress = false;
+      this->max_depth = max_depth;
+      this->depth = 0;
+
+      this->condition_variables = hash_table_ctor(0, hash_table_pointer_hash,
+						  hash_table_pointer_compare);
+   }
+
+   ~ir_if_to_cond_assign_visitor()
+   {
+      hash_table_dtor(this->condition_variables);
+   }
+
+   ir_visitor_status visit_enter(ir_if *);
+   ir_visitor_status visit_leave(ir_if *);
+
+   bool progress;
+   unsigned max_depth;
+   unsigned depth;
+
+   struct hash_table *condition_variables;
+};
+
+} /* anonymous namespace */
+
+bool
+lower_if_to_cond_assign(exec_list *instructions, unsigned max_depth)
+{
+   if (max_depth == UINT_MAX)
+      return false;
+
+   ir_if_to_cond_assign_visitor v(max_depth);
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
+
+void
+check_control_flow(ir_instruction *ir, void *data)
+{
+   bool *found_control_flow = (bool *)data;
+   switch (ir->ir_type) {
+   case ir_type_call:
+   case ir_type_discard:
+   case ir_type_loop:
+   case ir_type_loop_jump:
+   case ir_type_return:
+      *found_control_flow = true;
+      break;
+   default:
+      break;
+   }
+}
+
+void
+move_block_to_cond_assign(void *mem_ctx,
+			  ir_if *if_ir, ir_rvalue *cond_expr,
+			  exec_list *instructions,
+			  struct hash_table *ht)
+{
+   foreach_list_safe(node, instructions) {
+      ir_instruction *ir = (ir_instruction *) node;
+
+      if (ir->ir_type == ir_type_assignment) {
+	 ir_assignment *assign = (ir_assignment *)ir;
+
+	 if (hash_table_find(ht, assign) == NULL) {
+	    hash_table_insert(ht, assign, assign);
+
+	    /* If the LHS of the assignment is a condition variable that was
+	     * previously added, insert an additional assignment of false to
+	     * the variable.
+	     */
+	    const bool assign_to_cv =
+	       hash_table_find(ht, assign->lhs->variable_referenced()) != NULL;
+
+	    if (!assign->condition) {
+	       if (assign_to_cv) {
+		  assign->rhs =
+		     new(mem_ctx) ir_expression(ir_binop_logic_and,
+						glsl_type::bool_type,
+						cond_expr->clone(mem_ctx, NULL),
+						assign->rhs);
+	       } else {
+		  assign->condition = cond_expr->clone(mem_ctx, NULL);
+	       }
+	    } else {
+	       assign->condition =
+		  new(mem_ctx) ir_expression(ir_binop_logic_and,
+					     glsl_type::bool_type,
+					     cond_expr->clone(mem_ctx, NULL),
+					     assign->condition);
+	    }
+	 }
+      }
+
+      /* Now, move from the if block to the block surrounding it. */
+      ir->remove();
+      if_ir->insert_before(ir);
+   }
+}
+
+ir_visitor_status
+ir_if_to_cond_assign_visitor::visit_enter(ir_if *ir)
+{
+   (void) ir;
+   this->depth++;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_if_to_cond_assign_visitor::visit_leave(ir_if *ir)
+{
+   /* Only flatten when beyond the GPU's maximum supported nesting depth. */
+   if (this->depth-- <= this->max_depth)
+      return visit_continue;
+
+   bool found_control_flow = false;
+   ir_assignment *assign;
+
+   /* Check that both blocks don't contain anything we can't support. */
+   foreach_list(n, &ir->then_instructions) {
+      ir_instruction *then_ir = (ir_instruction *) n;
+      visit_tree(then_ir, check_control_flow, &found_control_flow);
+   }
+   foreach_list(n, &ir->else_instructions) {
+      ir_instruction *else_ir = (ir_instruction *) n;
+      visit_tree(else_ir, check_control_flow, &found_control_flow);
+   }
+   if (found_control_flow)
+      return visit_continue;
+
+   void *mem_ctx = ralloc_parent(ir);
+
+   /* Store the condition to a variable.  Move all of the instructions from
+    * the then-clause of the if-statement.  Use the condition variable as a
+    * condition for all assignments.
+    */
+   ir_variable *const then_var =
+      new(mem_ctx) ir_variable(glsl_type::bool_type,
+			       "if_to_cond_assign_then",
+			       ir_var_temporary);
+   ir->insert_before(then_var);
+
+   ir_dereference_variable *then_cond =
+      new(mem_ctx) ir_dereference_variable(then_var);
+
+   assign = new(mem_ctx) ir_assignment(then_cond, ir->condition);
+   ir->insert_before(assign);
+
+   move_block_to_cond_assign(mem_ctx, ir, then_cond,
+			     &ir->then_instructions,
+			     this->condition_variables);
+
+   /* Add the new condition variable to the hash table.  This allows us to
+    * find this variable when lowering other (enclosing) if-statements.
+    */
+   hash_table_insert(this->condition_variables, then_var, then_var);
+
+   /* If there are instructions in the else-clause, store the inverse of the
+    * condition to a variable.  Move all of the instructions from the
+    * else-clause of the if-statement.  Use the (inverse) condition variable
+    * as a condition for all assignments.
+    */
+   if (!ir->else_instructions.is_empty()) {
+      ir_variable *const else_var =
+	 new(mem_ctx) ir_variable(glsl_type::bool_type,
+				  "if_to_cond_assign_else",
+				  ir_var_temporary);
+      ir->insert_before(else_var);
+
+      ir_dereference_variable *else_cond =
+	 new(mem_ctx) ir_dereference_variable(else_var);
+
+      ir_rvalue *inverse =
+	 new(mem_ctx) ir_expression(ir_unop_logic_not,
+				    then_cond->clone(mem_ctx, NULL));
+
+      assign = new(mem_ctx) ir_assignment(else_cond, inverse);
+      ir->insert_before(assign);
+
+      move_block_to_cond_assign(mem_ctx, ir, else_cond,
+				&ir->else_instructions,
+				this->condition_variables);
+
+      /* Add the new condition variable to the hash table.  This allows us to
+       * find this variable when lowering other (enclosing) if-statements.
+       */
+      hash_table_insert(this->condition_variables, else_var, else_var);
+   }
+
+   ir->remove();
+
+   this->progress = true;
+
+   return visit_continue;
+}
diff --git a/icd/intel/compiler/shader/lower_instructions.cpp b/icd/intel/compiler/shader/lower_instructions.cpp
new file mode 100644
index 0000000..8d6d630
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_instructions.cpp
@@ -0,0 +1,548 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_instructions.cpp
+ *
+ * Many GPUs lack native instructions for certain expression operations, and
+ * must replace them with some other expression tree.  This pass lowers some
+ * of the most common cases, allowing the lowering code to be implemented once
+ * rather than in each driver backend.
+ *
+ * Currently supported transformations:
+ * - SUB_TO_ADD_NEG
+ * - DIV_TO_MUL_RCP
+ * - INT_DIV_TO_MUL_RCP
+ * - EXP_TO_EXP2
+ * - POW_TO_EXP2
+ * - LOG_TO_LOG2
+ * - MOD_TO_FRACT
+ * - LDEXP_TO_ARITH
+ * - BITFIELD_INSERT_TO_BFM_BFI
+ * - CARRY_TO_ARITH
+ * - BORROW_TO_ARITH
+ *
+ * SUB_TO_ADD_NEG:
+ * ---------------
+ * Breaks an ir_binop_sub expression down to add(op0, neg(op1))
+ *
+ * This simplifies expression reassociation, and for many backends
+ * there is no subtract operation separate from adding the negation.
+ * For backends with native subtract operations, they will probably
+ * want to recognize add(op0, neg(op1)) or the other way around to
+ * produce a subtract anyway.
+ *
+ * DIV_TO_MUL_RCP and INT_DIV_TO_MUL_RCP:
+ * --------------------------------------
+ * Breaks an ir_binop_div expression down to op0 * (rcp(op1)).
+ *
+ * Many GPUs don't have a divide instruction (945 and 965 included),
+ * but they do have an RCP instruction to compute an approximate
+ * reciprocal.  By breaking the operation down, constant reciprocals
+ * can get constant folded.
+ *
+ * DIV_TO_MUL_RCP only lowers floating point division; INT_DIV_TO_MUL_RCP
+ * handles the integer case, converting to and from floating point so that
+ * RCP is possible.
+ *
+ * EXP_TO_EXP2 and LOG_TO_LOG2:
+ * ----------------------------
+ * Many GPUs don't have a base e log or exponent instruction, but they
+ * do have base 2 versions, so this pass converts exp and log to exp2
+ * and log2 operations.
+ *
+ * POW_TO_EXP2:
+ * -----------
+ * Many older GPUs don't have an x**y instruction.  For these GPUs, convert
+ * x**y to 2**(y * log2(x)).
+ *
+ * MOD_TO_FRACT:
+ * -------------
+ * Breaks an ir_binop_mod expression down to (op1 * fract(op0 / op1))
+ *
+ * Many GPUs don't have a MOD instruction (945 and 965 included), and
+ * if we have to break it down like this anyway, it gives an
+ * opportunity to do things like constant fold the (1.0 / op1) easily.
+ *
+ * LDEXP_TO_ARITH:
+ * ---------------
+ * Converts ir_binop_ldexp to arithmetic and bit operations.
+ *
+ * BITFIELD_INSERT_TO_BFM_BFI:
+ * ---------------------------
+ * Breaks ir_quadop_bitfield_insert into ir_binop_bfm (bitfield mask) and
+ * ir_triop_bfi (bitfield insert).
+ *
+ * Many GPUs implement the bitfieldInsert() built-in from ARB_gpu_shader5
+ * with a pair of instructions.
+ *
+ * CARRY_TO_ARITH:
+ * ---------------
+ * Converts ir_carry into (x + y) < x.
+ *
+ * BORROW_TO_ARITH:
+ * ----------------
+ * Converts ir_borrow into (x < y).
+ *
+ */
+
+#include "libfns.h" // LunarG ADD:
+#include "glsl_types.h"
+#include "ir.h"
+#include "ir_builder.h"
+#include "ir_optimization.h"
+
+using namespace ir_builder;
+
+namespace {
+
+class lower_instructions_visitor : public ir_hierarchical_visitor {
+public:
+   lower_instructions_visitor(unsigned lower)
+      : progress(false), lower(lower) { }
+
+   ir_visitor_status visit_leave(ir_expression *);
+
+   bool progress;
+
+private:
+   unsigned lower; /**< Bitfield of which operations to lower */
+
+   void sub_to_add_neg(ir_expression *);
+   void div_to_mul_rcp(ir_expression *);
+   void int_div_to_mul_rcp(ir_expression *);
+   void mod_to_fract(ir_expression *);
+   void exp_to_exp2(ir_expression *);
+   void pow_to_exp2(ir_expression *);
+   void log_to_log2(ir_expression *);
+   void bitfield_insert_to_bfm_bfi(ir_expression *);
+   void ldexp_to_arith(ir_expression *);
+   void carry_to_arith(ir_expression *);
+   void borrow_to_arith(ir_expression *);
+};
+
+} /* anonymous namespace */
+
+/**
+ * Determine if a particular type of lowering should occur
+ */
+#define lowering(x) (this->lower & x)
+
+bool
+lower_instructions(exec_list *instructions, unsigned what_to_lower)
+{
+   lower_instructions_visitor v(what_to_lower);
+
+   visit_list_elements(&v, instructions);
+   return v.progress;
+}
+
+void
+lower_instructions_visitor::sub_to_add_neg(ir_expression *ir)
+{
+   ir->operation = ir_binop_add;
+   ir->operands[1] = new(ir) ir_expression(ir_unop_neg, ir->operands[1]->type,
+					   ir->operands[1], NULL);
+   this->progress = true;
+}
+
+void
+lower_instructions_visitor::div_to_mul_rcp(ir_expression *ir)
+{
+   assert(ir->operands[1]->type->is_float());
+
+   /* New expression for the 1.0 / op1 */
+   ir_rvalue *expr;
+   expr = new(ir) ir_expression(ir_unop_rcp,
+				ir->operands[1]->type,
+				ir->operands[1]);
+
+   /* op0 / op1 -> op0 * (1.0 / op1) */
+   ir->operation = ir_binop_mul;
+   ir->operands[1] = expr;
+
+   this->progress = true;
+}
+
+void
+lower_instructions_visitor::int_div_to_mul_rcp(ir_expression *ir)
+{
+   assert(ir->operands[1]->type->is_integer());
+
+   /* Be careful with integer division -- we need to do it as a
+    * float and re-truncate, since rcp(n > 1) of an integer would
+    * just be 0.
+    */
+   ir_rvalue *op0, *op1;
+   const struct glsl_type *vec_type;
+
+   vec_type = glsl_type::get_instance(GLSL_TYPE_FLOAT,
+				      ir->operands[1]->type->vector_elements,
+				      ir->operands[1]->type->matrix_columns);
+
+   if (ir->operands[1]->type->base_type == GLSL_TYPE_INT)
+      op1 = new(ir) ir_expression(ir_unop_i2f, vec_type, ir->operands[1], NULL);
+   else
+      op1 = new(ir) ir_expression(ir_unop_u2f, vec_type, ir->operands[1], NULL);
+
+   op1 = new(ir) ir_expression(ir_unop_rcp, op1->type, op1, NULL);
+
+   vec_type = glsl_type::get_instance(GLSL_TYPE_FLOAT,
+				      ir->operands[0]->type->vector_elements,
+				      ir->operands[0]->type->matrix_columns);
+
+   if (ir->operands[0]->type->base_type == GLSL_TYPE_INT)
+      op0 = new(ir) ir_expression(ir_unop_i2f, vec_type, ir->operands[0], NULL);
+   else
+      op0 = new(ir) ir_expression(ir_unop_u2f, vec_type, ir->operands[0], NULL);
+
+   vec_type = glsl_type::get_instance(GLSL_TYPE_FLOAT,
+				      ir->type->vector_elements,
+				      ir->type->matrix_columns);
+
+   op0 = new(ir) ir_expression(ir_binop_mul, vec_type, op0, op1);
+
+   if (ir->operands[1]->type->base_type == GLSL_TYPE_INT) {
+      ir->operation = ir_unop_f2i;
+      ir->operands[0] = op0;
+   } else {
+      ir->operation = ir_unop_i2u;
+      ir->operands[0] = new(ir) ir_expression(ir_unop_f2i, op0);
+   }
+   ir->operands[1] = NULL;
+
+   this->progress = true;
+}
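+
+/* Worked example (illustrative): the signed division 7 / 2 lowers to
+ * f2i(i2f(7) * rcp(i2f(2))) = f2i(7.0 * 0.5) = f2i(3.5) = 3, matching
+ * truncating integer division provided the approximate RCP is accurate
+ * enough at these magnitudes.
+ */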
+
+void
+lower_instructions_visitor::exp_to_exp2(ir_expression *ir)
+{
+   ir_constant *log2_e = new(ir) ir_constant(float(M_LOG2E));
+
+   ir->operation = ir_unop_exp2;
+   ir->operands[0] = new(ir) ir_expression(ir_binop_mul, ir->operands[0]->type,
+					   ir->operands[0], log2_e);
+   this->progress = true;
+}
+
+void
+lower_instructions_visitor::pow_to_exp2(ir_expression *ir)
+{
+   ir_expression *const log2_x =
+      new(ir) ir_expression(ir_unop_log2, ir->operands[0]->type,
+			    ir->operands[0]);
+
+   ir->operation = ir_unop_exp2;
+   ir->operands[0] = new(ir) ir_expression(ir_binop_mul, ir->operands[1]->type,
+					   ir->operands[1], log2_x);
+   ir->operands[1] = NULL;
+   this->progress = true;
+}
+
+void
+lower_instructions_visitor::log_to_log2(ir_expression *ir)
+{
+   ir->operation = ir_binop_mul;
+   ir->operands[0] = new(ir) ir_expression(ir_unop_log2, ir->operands[0]->type,
+					   ir->operands[0], NULL);
+   ir->operands[1] = new(ir) ir_constant(float(1.0 / M_LOG2E));
+   this->progress = true;
+}
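+
+/* These rewrites use the identities exp(x) == exp2(x * log2(e)) and
+ * log(x) == log2(x) * (1 / log2(e)).  Quick check (illustrative):
+ * log(e) lowers to log2(e) * 0.6931..., i.e. 1.4427... * 0.6931... == 1.
+ */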
+
+void
+lower_instructions_visitor::mod_to_fract(ir_expression *ir)
+{
+   ir_variable *temp = new(ir) ir_variable(ir->operands[1]->type, "mod_b",
+					   ir_var_temporary);
+   this->base_ir->insert_before(temp);
+
+   ir_assignment *const assign =
+      new(ir) ir_assignment(new(ir) ir_dereference_variable(temp),
+			    ir->operands[1], NULL);
+
+   this->base_ir->insert_before(assign);
+
+   ir_expression *const div_expr =
+      new(ir) ir_expression(ir_binop_div, ir->operands[0]->type,
+			    ir->operands[0],
+			    new(ir) ir_dereference_variable(temp));
+
+   /* Don't generate new IR that would need to be lowered in an additional
+    * pass.
+    */
+   if (lowering(DIV_TO_MUL_RCP))
+      div_to_mul_rcp(div_expr);
+
+   ir_rvalue *expr = new(ir) ir_expression(ir_unop_fract,
+					   ir->operands[0]->type,
+					   div_expr,
+					   NULL);
+
+   ir->operation = ir_binop_mul;
+   ir->operands[0] = new(ir) ir_dereference_variable(temp);
+   ir->operands[1] = expr;
+   this->progress = true;
+}
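+
+/* Worked example (illustrative): mod(5.5, 2.0) lowers to
+ * 2.0 * fract(5.5 / 2.0) == 2.0 * fract(2.75) == 2.0 * 0.75 == 1.5,
+ * which matches the GLSL definition x - y * floor(x / y).
+ */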
+
+void
+lower_instructions_visitor::bitfield_insert_to_bfm_bfi(ir_expression *ir)
+{
+   /* Translates
+    *    ir_quadop_bitfield_insert base insert offset bits
+    * into
+    *    ir_triop_bfi (ir_binop_bfm bits offset) insert base
+    */
+
+   ir_rvalue *base_expr = ir->operands[0];
+
+   ir->operation = ir_triop_bfi;
+   ir->operands[0] = new(ir) ir_expression(ir_binop_bfm,
+                                           ir->type->get_base_type(),
+                                           ir->operands[3],
+                                           ir->operands[2]);
+   /* ir->operands[1] is still the value to insert. */
+   ir->operands[2] = base_expr;
+   ir->operands[3] = NULL;
+
+   this->progress = true;
+}
+
+void
+lower_instructions_visitor::ldexp_to_arith(ir_expression *ir)
+{
+   /* Translates
+    *    ir_binop_ldexp x exp
+    * into
+    *
+    *    extracted_biased_exp = rshift(bitcast_f2i(abs(x)), exp_shift);
+    *    resulting_biased_exp = extracted_biased_exp + exp;
+    *
+    *    if (resulting_biased_exp < 1) {
+    *       return copysign(0.0, x);
+    *    }
+    *
+    *    return bitcast_u2f((bitcast_f2u(x) & sign_mantissa_mask) |
+    *                       lshift(i2u(resulting_biased_exp), exp_shift));
+    *
+    * which we can't actually implement as such, since the GLSL IR doesn't
+    * have vectorized if-statements. We actually implement it without branches
+    * using conditional-select:
+    *
+    *    extracted_biased_exp = rshift(bitcast_f2i(abs(x)), exp_shift);
+    *    resulting_biased_exp = extracted_biased_exp + exp;
+    *
+    *    is_not_zero_or_underflow = gequal(resulting_biased_exp, 1);
+    *    x = csel(is_not_zero_or_underflow, x, copysign(0.0f, x));
+    *    resulting_biased_exp = csel(is_not_zero_or_underflow,
+    *                                resulting_biased_exp, 0);
+    *
+    *    return bitcast_u2f((bitcast_f2u(x) & sign_mantissa_mask) |
+    *                       lshift(i2u(resulting_biased_exp), exp_shift));
+    */
+
+   const unsigned vec_elem = ir->type->vector_elements;
+
+   /* Types */
+   const glsl_type *ivec = glsl_type::get_instance(GLSL_TYPE_INT, vec_elem, 1);
+   const glsl_type *bvec = glsl_type::get_instance(GLSL_TYPE_BOOL, vec_elem, 1);
+
+   /* Constants */
+   ir_constant *zeroi = ir_constant::zero(ir, ivec);
+
+   ir_constant *sign_mask = new(ir) ir_constant(0x80000000u, vec_elem);
+
+   ir_constant *exp_shift = new(ir) ir_constant(23);
+   ir_constant *exp_width = new(ir) ir_constant(8);
+
+   /* Temporary variables */
+   ir_variable *x = new(ir) ir_variable(ir->type, "x", ir_var_temporary);
+   ir_variable *exp = new(ir) ir_variable(ivec, "exp", ir_var_temporary);
+
+   ir_variable *zero_sign_x = new(ir) ir_variable(ir->type, "zero_sign_x",
+                                                  ir_var_temporary);
+
+   ir_variable *extracted_biased_exp =
+      new(ir) ir_variable(ivec, "extracted_biased_exp", ir_var_temporary);
+   ir_variable *resulting_biased_exp =
+      new(ir) ir_variable(ivec, "resulting_biased_exp", ir_var_temporary);
+
+   ir_variable *is_not_zero_or_underflow =
+      new(ir) ir_variable(bvec, "is_not_zero_or_underflow", ir_var_temporary);
+
+   ir_instruction &i = *base_ir;
+
+   /* Copy <x> and <exp> arguments. */
+   i.insert_before(x);
+   i.insert_before(assign(x, ir->operands[0]));
+   i.insert_before(exp);
+   i.insert_before(assign(exp, ir->operands[1]));
+
+   /* Extract the biased exponent from <x>. */
+   i.insert_before(extracted_biased_exp);
+   i.insert_before(assign(extracted_biased_exp,
+                          rshift(bitcast_f2i(abs(x)), exp_shift)));
+
+   i.insert_before(resulting_biased_exp);
+   i.insert_before(assign(resulting_biased_exp,
+                          add(extracted_biased_exp, exp)));
+
+   /* Test if result is ±0.0, subnormal, or underflow by checking if the
+    * resulting biased exponent would be less than 0x1. If so, the result is
+    * 0.0 with the sign of x. (Actually, invert the conditions so that
+    * immediate values are the second arguments, which is better for i965)
+    */
+   i.insert_before(zero_sign_x);
+   i.insert_before(assign(zero_sign_x,
+                          bitcast_u2f(bit_and(bitcast_f2u(x), sign_mask))));
+
+   i.insert_before(is_not_zero_or_underflow);
+   i.insert_before(assign(is_not_zero_or_underflow,
+                          gequal(resulting_biased_exp,
+                                  new(ir) ir_constant(0x1, vec_elem))));
+   i.insert_before(assign(x, csel(is_not_zero_or_underflow,
+                                  x, zero_sign_x)));
+   i.insert_before(assign(resulting_biased_exp,
+                          csel(is_not_zero_or_underflow,
+                               resulting_biased_exp, zeroi)));
+
+   /* We could test for overflows by checking if the resulting biased exponent
+    * would be greater than 0xFE. Turns out we don't need to because the GLSL
+    * spec says:
+    *
+    *    "If this product is too large to be represented in the
+    *     floating-point type, the result is undefined."
+    */
+
+   ir_constant *exp_shift_clone = exp_shift->clone(ir, NULL);
+   ir->operation = ir_unop_bitcast_i2f;
+   ir->operands[0] = bitfield_insert(bitcast_f2i(x), resulting_biased_exp,
+                                     exp_shift_clone, exp_width);
+   ir->operands[1] = NULL;
+
+   /* Don't generate new IR that would need to be lowered in an additional
+    * pass.
+    */
+   if (lowering(BITFIELD_INSERT_TO_BFM_BFI))
+      bitfield_insert_to_bfm_bfi(ir->operands[0]->as_expression());
+
+   this->progress = true;
+}
+
+void
+lower_instructions_visitor::carry_to_arith(ir_expression *ir)
+{
+   /* Translates
+    *   ir_binop_carry x y
+    * into
+    *   sum = ir_binop_add x y
+    *   bcarry = ir_binop_less sum x
+    *   carry = ir_unop_b2i bcarry
+    */
+
+   ir_rvalue *x_clone = ir->operands[0]->clone(ir, NULL);
+   ir->operation = ir_unop_i2u;
+   ir->operands[0] = b2i(less(add(ir->operands[0], ir->operands[1]), x_clone));
+   ir->operands[1] = NULL;
+
+   this->progress = true;
+}
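+
+/* Worked example (illustrative): for x == 0xffffffffu and y == 2u the sum
+ * wraps to 1u, and (1u < 0xffffffffu) is true, so the lowered expression
+ * yields a carry of 1u, the expected carry-out of the addition.
+ */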
+
+void
+lower_instructions_visitor::borrow_to_arith(ir_expression *ir)
+{
+   /* Translates
+    *   ir_binop_borrow x y
+    * into
+    *   bcarry = ir_binop_less x y
+    *   carry = ir_unop_b2i bcarry
+    */
+
+   ir->operation = ir_unop_i2u;
+   ir->operands[0] = b2i(less(ir->operands[0], ir->operands[1]));
+   ir->operands[1] = NULL;
+
+   this->progress = true;
+}
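+
+/* Worked example (illustrative): for x == 3u and y == 5u, (3u < 5u) is
+ * true, so the lowered expression yields a borrow of 1u; when x >= y it
+ * yields 0u, the expected borrow-out of x - y.
+ */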
+
+ir_visitor_status
+lower_instructions_visitor::visit_leave(ir_expression *ir)
+{
+   switch (ir->operation) {
+   case ir_binop_sub:
+      if (lowering(SUB_TO_ADD_NEG))
+	 sub_to_add_neg(ir);
+      break;
+
+   case ir_binop_div:
+      if (ir->operands[1]->type->is_integer() && lowering(INT_DIV_TO_MUL_RCP))
+	 int_div_to_mul_rcp(ir);
+      else if (ir->operands[1]->type->is_float() && lowering(DIV_TO_MUL_RCP))
+	 div_to_mul_rcp(ir);
+      break;
+
+   case ir_unop_exp:
+      if (lowering(EXP_TO_EXP2))
+	 exp_to_exp2(ir);
+      break;
+
+   case ir_unop_log:
+      if (lowering(LOG_TO_LOG2))
+	 log_to_log2(ir);
+      break;
+
+   case ir_binop_mod:
+      if (lowering(MOD_TO_FRACT) && ir->type->is_float())
+	 mod_to_fract(ir);
+      break;
+
+   case ir_binop_pow:
+      if (lowering(POW_TO_EXP2))
+	 pow_to_exp2(ir);
+      break;
+
+   case ir_quadop_bitfield_insert:
+      if (lowering(BITFIELD_INSERT_TO_BFM_BFI))
+         bitfield_insert_to_bfm_bfi(ir);
+      break;
+
+   case ir_binop_ldexp:
+      if (lowering(LDEXP_TO_ARITH))
+         ldexp_to_arith(ir);
+      break;
+
+   case ir_binop_carry:
+      if (lowering(CARRY_TO_ARITH))
+         carry_to_arith(ir);
+      break;
+
+   case ir_binop_borrow:
+      if (lowering(BORROW_TO_ARITH))
+         borrow_to_arith(ir);
+      break;
+
+   default:
+      return visit_continue;
+   }
+
+   return visit_continue;
+}
diff --git a/icd/intel/compiler/shader/lower_jumps.cpp b/icd/intel/compiler/shader/lower_jumps.cpp
new file mode 100644
index 0000000..02f65f0
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_jumps.cpp
@@ -0,0 +1,1022 @@
+/*
+ * Copyright © 2010 Luca Barbieri
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_jumps.cpp
+ *
+ * This pass lowers jumps (break, continue, and return) to if/else structures.
+ *
+ * It can be asked to:
+ * 1. Pull jumps out of ifs where possible
+ * 2. Remove all "continue"s, replacing them with an "execute flag"
+ * 3. Replace all "break" with a single conditional one at the end of the loop
+ * 4. Replace all "return"s with a single return at the end of the function,
+ *    for the main function and/or other functions
+ *
+ * Applying this pass gives several benefits:
+ * 1. All functions can be inlined.
+ * 2. nv40 and other pre-DX10 chips without "continue" can be supported
+ * 3. nv30 and other pre-DX10 chips with no control flow at all are better
+ *    supported
+ *
+ * Continues are lowered by adding a per-loop "execute flag", initialized to
+ * true, that when cleared inhibits all execution until the end of the loop.
+ *
+ * Breaks are lowered to continues, plus setting a "break flag" that is
+ * checked at the end of the loop and triggers the single remaining "break".
+ *
+ * Returns are lowered to breaks/continues, plus adding a "return flag" that
+ * causes loops to break again out of their enclosing loops until all the
+ * loops are exited: then the "execute flag" logic will ignore everything
+ * until the end of the function.
+ *
+ * Note that "continue" and "return" can also be implemented by adding
+ * a dummy loop and using break.
+ * However, this is bad for hardware with limited nesting depth, prevents
+ * further optimization, and thus is not currently performed.
+ */
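+
+/* A minimal sketch of the "execute flag" lowering of continue
+ * (illustrative, not literal output):
+ *
+ *     loop {
+ *        if (c)
+ *           continue;
+ *        s;
+ *     }
+ *
+ * becomes roughly
+ *
+ *     loop {
+ *        bool execute_flag = true;
+ *        if (c)
+ *           execute_flag = false;
+ *        if (execute_flag)
+ *           s;
+ *     }
+ */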
+
+#include "glsl_types.h"
+#include <string.h>
+#include "ir.h"
+
+/**
+ * Enum recording the result of analyzing how control flow might exit
+ * an IR node.
+ *
+ * Each possible value of jump_strength indicates a strictly stronger
+ * guarantee on control flow than the previous value.
+ *
+ * The ordering of strengths roughly reflects the way jumps are
+ * lowered: jumps with higher strength tend to be lowered to jumps of
+ * lower strength.  Accordingly, strength is used as a heuristic to
+ * determine which lowering to perform first.
+ *
+ * This enum is also used by get_jump_strength() to categorize
+ * instructions as either break, continue, return, or other.  When
+ * used in this fashion, strength_always_clears_execute_flag is not
+ * used.
+ *
+ * The control flow analysis made by this optimization pass makes two
+ * simplifying assumptions:
+ *
+ * - It ignores discard instructions, since they are lowered by a
+ *   separate pass (lower_discard.cpp).
+ *
+ * - It assumes it is always possible for control to flow from a loop
+ *   to the instruction immediately following it.  Technically, this
+ *   is not true (since all execution paths through the loop might
+ *   jump back to the top, or return from the function).
+ *
+ * Both of these simplifying assumptions are safe, since they can never
+ * cause reachable code to be incorrectly classified as unreachable;
+ * they can only do the opposite.
+ */
+enum jump_strength
+{
+   /**
+    * Analysis has produced no guarantee on how control flow might
+    * exit this IR node.  It might fall out the bottom (with or
+    * without clearing the execute flag, if present), or it might
+    * continue to the top of the innermost enclosing loop, break out
+    * of it, or return from the function.
+    */
+   strength_none,
+
+   /**
+    * The only way control can fall out the bottom of this node is
+    * through a code path that clears the execute flag.  It might also
+    * continue to the top of the innermost enclosing loop, break out
+    * of it, or return from the function.
+    */
+   strength_always_clears_execute_flag,
+
+   /**
+    * Control cannot fall out the bottom of this node.  It might
+    * continue to the top of the innermost enclosing loop, break out
+    * of it, or return from the function.
+    */
+   strength_continue,
+
+   /**
+    * Control cannot fall out the bottom of this node, or continue the
+    * top of the innermost enclosing loop.  It can only break out of
+    * it or return from the function.
+    */
+   strength_break,
+
+   /**
+    * Control cannot fall out the bottom of this node, continue to the
+    * top of the innermost enclosing loop, or break out of it.  It can
+    * only return from the function.
+    */
+   strength_return
+};
+
+namespace {
+
+struct block_record
+{
+   /* minimum jump strength (of lowered IR, not pre-lowering IR)
+    *
+    * If the block ends with a jump, this must be the strength of that jump
+    * (otherwise the jump would be dead and would already have been deleted).
+    *
+    * If the block doesn't end with a jump, it can still differ from
+    * strength_none if all paths through it lead to some jump (e.g. an if
+    * with a return in one branch and a break in the other, when neither is
+    * being lowered).  Note that identical jumps are usually unified, though.
+    */
+   jump_strength min_strength;
+
+   /* can anything clear the execute flag? */
+   bool may_clear_execute_flag;
+
+   block_record()
+   {
+      this->min_strength = strength_none;
+      this->may_clear_execute_flag = false;
+   }
+};
+
+struct loop_record
+{
+   ir_function_signature* signature;
+   ir_loop* loop;
+
+   /* used to avoid lowering the break used to represent lowered breaks */
+   unsigned nesting_depth;
+   bool in_if_at_the_end_of_the_loop;
+
+   bool may_set_return_flag;
+
+   ir_variable* break_flag;
+   ir_variable* execute_flag; /* cleared to emulate continue */
+
+   loop_record(ir_function_signature* p_signature = 0, ir_loop* p_loop = 0)
+   {
+      this->signature = p_signature;
+      this->loop = p_loop;
+      this->nesting_depth = 0;
+      this->in_if_at_the_end_of_the_loop = false;
+      this->may_set_return_flag = false;
+      this->break_flag = 0;
+      this->execute_flag = 0;
+   }
+
+   ir_variable* get_execute_flag()
+   {
+      /* also supported for the "function loop" */
+      if(!this->execute_flag) {
+         exec_list& list = this->loop ? this->loop->body_instructions : signature->body;
+         this->execute_flag = new(this->signature) ir_variable(glsl_type::bool_type, "execute_flag", ir_var_temporary);
+         list.push_head(new(this->signature) ir_assignment(new(this->signature) ir_dereference_variable(execute_flag), new(this->signature) ir_constant(true), 0));
+         list.push_head(this->execute_flag);
+      }
+      return this->execute_flag;
+   }
+
+   ir_variable* get_break_flag()
+   {
+      assert(this->loop);
+      if(!this->break_flag) {
+         this->break_flag = new(this->signature) ir_variable(glsl_type::bool_type, "break_flag", ir_var_temporary);
+         this->loop->insert_before(this->break_flag);
+         this->loop->insert_before(new(this->signature) ir_assignment(new(this->signature) ir_dereference_variable(break_flag), new(this->signature) ir_constant(false), 0));
+      }
+      return this->break_flag;
+   }
+};
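+
+/* Illustration only (variable names match the temporaries created above):
+ * once get_break_flag() and get_execute_flag() have both run, the loop
+ * conceptually looks like this GLSL:
+ *
+ *    bool break_flag = false;
+ *    loop {
+ *       bool execute_flag = true;
+ *       ...body, with lowered continues clearing execute_flag
+ *          and lowered breaks setting break_flag...
+ *    }
+ */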
+
+struct function_record
+{
+   ir_function_signature* signature;
+   ir_variable* return_flag; /* used to break out of all loops and then jump to the return instruction */
+   ir_variable* return_value;
+   bool lower_return;
+   unsigned nesting_depth;
+
+   function_record(ir_function_signature* p_signature = 0,
+                   bool lower_return = false)
+   {
+      this->signature = p_signature;
+      this->return_flag = 0;
+      this->return_value = 0;
+      this->nesting_depth = 0;
+      this->lower_return = lower_return;
+   }
+
+   ir_variable* get_return_flag()
+   {
+      if(!this->return_flag) {
+         this->return_flag = new(this->signature) ir_variable(glsl_type::bool_type, "return_flag", ir_var_temporary);
+         this->signature->body.push_head(new(this->signature) ir_assignment(new(this->signature) ir_dereference_variable(return_flag), new(this->signature) ir_constant(false), 0));
+         this->signature->body.push_head(this->return_flag);
+      }
+      return this->return_flag;
+   }
+
+   ir_variable* get_return_value()
+   {
+      if(!this->return_value) {
+         assert(!this->signature->return_type->is_void());
+         return_value = new(this->signature) ir_variable(this->signature->return_type, "return_value", ir_var_temporary);
+         this->signature->body.push_head(this->return_value);
+      }
+      return this->return_value;
+   }
+};
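+
+/* Illustration only: for a non-void function, get_return_flag() and
+ * get_return_value() conceptually rewrite
+ *
+ *    float f() { ... }
+ *
+ * into
+ *
+ *    float f() {
+ *       float return_value;
+ *       bool return_flag = false;
+ *       ...
+ *    }
+ *
+ * with lowered returns storing into return_value and setting return_flag,
+ * and visit(ir_function_signature *) appending the single canonical
+ * "return return_value;" at the end of the body.
+ */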
+
+struct ir_lower_jumps_visitor : public ir_control_flow_visitor {
+   /* Postconditions: on exit of any visit() function:
+    *
+    * ANALYSIS: this->block.min_strength,
+    * this->block.may_clear_execute_flag, and
+    * this->loop.may_set_return_flag are updated to reflect the
+    * characteristics of the visited statement.
+    *
+    * DEAD_CODE_ELIMINATION: If this->block.min_strength is not
+    * strength_none, the visited node is at the end of its exec_list.
+    * In other words, any unreachable statements that follow the
+    * visited statement in its exec_list have been removed.
+    *
+    * CONTAINED_JUMPS_LOWERED: If the visited statement contains other
+    * statements, then should_lower_jump() is false for all of the
+    * return, break, or continue statements it contains.
+    *
+    * Note that visiting a jump does not lower it.  That is the
+    * responsibility of the statement (or function signature) that
+    * contains the jump.
+    */
+
+   bool progress;
+
+   struct function_record function;
+   struct loop_record loop;
+   struct block_record block;
+
+   bool pull_out_jumps;
+   bool lower_continue;
+   bool lower_break;
+   bool lower_sub_return;
+   bool lower_main_return;
+
+   ir_lower_jumps_visitor()
+      : progress(false),
+        pull_out_jumps(false),
+        lower_continue(false),
+        lower_break(false),
+        lower_sub_return(false),
+        lower_main_return(false)
+   {
+   }
+
+   void truncate_after_instruction(exec_node *ir)
+   {
+      if (!ir)
+         return;
+
+      while (!ir->get_next()->is_tail_sentinel()) {
+         ((ir_instruction *)ir->get_next())->remove();
+         this->progress = true;
+      }
+   }
+
+   void move_outer_block_inside(ir_instruction *ir, exec_list *inner_block)
+   {
+      while (!ir->get_next()->is_tail_sentinel()) {
+         ir_instruction *move_ir = (ir_instruction *)ir->get_next();
+
+         move_ir->remove();
+         inner_block->push_tail(move_ir);
+      }
+   }
+
+   /**
+    * Insert the instructions necessary to lower a return statement,
+    * before the given return instruction.
+    */
+   void insert_lowered_return(ir_return *ir)
+   {
+      ir_variable* return_flag = this->function.get_return_flag();
+      if(!this->function.signature->return_type->is_void()) {
+         ir_variable* return_value = this->function.get_return_value();
+         ir->insert_before(
+            new(ir) ir_assignment(
+               new (ir) ir_dereference_variable(return_value),
+               ir->value));
+      }
+      ir->insert_before(
+         new(ir) ir_assignment(
+            new (ir) ir_dereference_variable(return_flag),
+            new (ir) ir_constant(true)));
+      this->loop.may_set_return_flag = true;
+   }
+
+   /**
+    * If the given instruction is a return, lower it to instructions
+    * that store the return value (if there is one), set the return
+    * flag, and then break.
+    *
+    * It is safe to pass NULL to this function.
+    */
+   void lower_return_unconditionally(ir_instruction *ir)
+   {
+      if (get_jump_strength(ir) != strength_return) {
+         return;
+      }
+      insert_lowered_return((ir_return*)ir);
+      ir->replace_with(new(ir) ir_loop_jump(ir_loop_jump::jump_break));
+   }
+
+   /**
+    * Create the necessary instruction to replace a break instruction.
+    */
+   ir_instruction *create_lowered_break()
+   {
+      void *ctx = this->function.signature;
+      return new(ctx) ir_assignment(
+          new(ctx) ir_dereference_variable(this->loop.get_break_flag()),
+          new(ctx) ir_constant(true),
+          0);
+   }
+
+   /**
+    * If the given instruction is a break, lower it to an instruction
+    * that sets the break flag, without consulting
+    * should_lower_jump().
+    *
+    * It is safe to pass NULL to this function.
+    */
+   void lower_break_unconditionally(ir_instruction *ir)
+   {
+      if (get_jump_strength(ir) != strength_break) {
+         return;
+      }
+      ir->replace_with(create_lowered_break());
+   }
+
+   /**
+    * If the block ends in a conditional or unconditional break, lower
+    * it, even though should_lower_jump() says it needn't be lowered.
+    */
+   void lower_final_breaks(exec_list *block)
+   {
+      ir_instruction *ir = (ir_instruction *) block->get_tail();
+      lower_break_unconditionally(ir);
+      ir_if *ir_if = ir->as_if();
+      if (ir_if) {
+          lower_break_unconditionally(
+              (ir_instruction *) ir_if->then_instructions.get_tail());
+          lower_break_unconditionally(
+              (ir_instruction *) ir_if->else_instructions.get_tail());
+      }
+   }
+
+   virtual void visit(class ir_loop_jump * ir)
+   {
+      /* Eliminate all instructions after each one, since they are
+       * unreachable.  This satisfies the DEAD_CODE_ELIMINATION
+       * postcondition.
+       */
+      truncate_after_instruction(ir);
+
+      /* Set this->block.min_strength based on this instruction.  This
+       * satisfies the ANALYSIS postcondition.  It is not necessary to
+       * update this->block.may_clear_execute_flag or
+       * this->loop.may_set_return_flag, because an unlowered jump
+       * instruction can't change any flags.
+       */
+      this->block.min_strength = ir->is_break() ? strength_break : strength_continue;
+
+      /* The CONTAINED_JUMPS_LOWERED postcondition is already
+       * satisfied, because jump statements can't contain other
+       * statements.
+       */
+   }
+
+   virtual void visit(class ir_return * ir)
+   {
+      /* Eliminate all instructions after each one, since they are
+       * unreachable.  This satisfies the DEAD_CODE_ELIMINATION
+       * postcondition.
+       */
+      truncate_after_instruction(ir);
+
+      /* Set this->block.min_strength based on this instruction.  This
+       * satisfies the ANALYSIS postcondition.  It is not necessary to
+       * update this->block.may_clear_execute_flag or
+       * this->loop.may_set_return_flag, because an unlowered return
+       * instruction can't change any flags.
+       */
+      this->block.min_strength = strength_return;
+
+      /* The CONTAINED_JUMPS_LOWERED postcondition is already
+       * satisfied, because jump statements can't contain other
+       * statements.
+       */
+   }
+
+   virtual void visit(class ir_discard * ir)
+   {
+      /* Nothing needs to be done.  The ANALYSIS and
+       * DEAD_CODE_ELIMINATION postconditions are already satisfied,
+       * because discard statements are ignored by this optimization
+       * pass.  The CONTAINED_JUMPS_LOWERED postcondition is already
+       * satisfied, because discard statements can't contain other
+       * statements.
+       */
+      (void) ir;
+   }
+
+   enum jump_strength get_jump_strength(ir_instruction* ir)
+   {
+      if(!ir)
+         return strength_none;
+      else if(ir->ir_type == ir_type_loop_jump) {
+         if(((ir_loop_jump*)ir)->is_break())
+            return strength_break;
+         else
+            return strength_continue;
+      } else if(ir->ir_type == ir_type_return)
+         return strength_return;
+      else
+         return strength_none;
+   }
+
+   bool should_lower_jump(ir_jump* ir)
+   {
+      unsigned strength = get_jump_strength(ir);
+      bool lower;
+      switch(strength)
+      {
+      case strength_none:
+         lower = false; /* don't change this, code relies on it */
+         break;
+      case strength_continue:
+         lower = lower_continue;
+         break;
+      case strength_break:
+         assert(this->loop.loop);
+         /* never lower "canonical break" */
+         if(ir->get_next()->is_tail_sentinel() && (this->loop.nesting_depth == 0
+               || (this->loop.nesting_depth == 1 && this->loop.in_if_at_the_end_of_the_loop)))
+            lower = false;
+         else
+            lower = lower_break;
+         break;
+      case strength_return:
+         /* never lower a return at the end of the function */
+         if(this->function.nesting_depth == 0 && ir->get_next()->is_tail_sentinel())
+            lower = false;
+         else
+            lower = this->function.lower_return;
+         break;
+      }
+      return lower;
+   }
+
+   block_record visit_block(exec_list* list)
+   {
+      /* Note: since visiting a node may change that node's next
+       * pointer, we can't use visit_exec_list(), because
+       * visit_exec_list() caches the node's next pointer before
+       * visiting it.  So we use foreach_list() instead.
+       *
+       * foreach_list() isn't safe if the node being visited gets
+       * removed, but fortunately this visitor doesn't do that.
+       */
+
+      block_record saved_block = this->block;
+      this->block = block_record();
+      foreach_list(node, list) {
+         ((ir_instruction *) node)->accept(this);
+      }
+      block_record ret = this->block;
+      this->block = saved_block;
+      return ret;
+   }
+
+   virtual void visit(ir_if *ir)
+   {
+      if(this->loop.nesting_depth == 0 && ir->get_next()->is_tail_sentinel())
+         this->loop.in_if_at_the_end_of_the_loop = true;
+
+      ++this->function.nesting_depth;
+      ++this->loop.nesting_depth;
+
+      block_record block_records[2];
+      ir_jump* jumps[2];
+
+      /* Recursively lower nested jumps.  This satisfies the
+       * CONTAINED_JUMPS_LOWERED postcondition, except in the case of
+       * unconditional jumps at the end of ir->then_instructions and
+       * ir->else_instructions, which are handled below.
+       */
+      block_records[0] = visit_block(&ir->then_instructions);
+      block_records[1] = visit_block(&ir->else_instructions);
+
+retry: /* we get here if we put code after the if inside a branch */
+
+      /* Determine which of ir->then_instructions and
+       * ir->else_instructions end with an unconditional jump.
+       */
+      for(unsigned i = 0; i < 2; ++i) {
+         exec_list& list = i ? ir->else_instructions : ir->then_instructions;
+         jumps[i] = 0;
+         if(!list.is_empty() && get_jump_strength((ir_instruction*)list.get_tail()))
+            jumps[i] = (ir_jump*)list.get_tail();
+      }
+
+      /* Loop until we have satisfied the CONTAINED_JUMPS_LOWERED
+       * postcondition by lowering jumps in both then_instructions and
+       * else_instructions.
+       */
+      for(;;) {
+         /* Determine the types of the jumps that terminate
+          * ir->then_instructions and ir->else_instructions.
+          */
+         jump_strength jump_strengths[2];
+
+         for(unsigned i = 0; i < 2; ++i) {
+            if(jumps[i]) {
+               jump_strengths[i] = block_records[i].min_strength;
+               assert(jump_strengths[i] == get_jump_strength(jumps[i]));
+            } else
+               jump_strengths[i] = strength_none;
+         }
+
+         /* If both code paths end in a jump, and the jumps are the
+          * same, and we are pulling out jumps, replace them with a
+          * single jump that comes after the if instruction.  The new
+          * jump will be visited next, and it will be lowered if
+          * necessary by the loop or conditional that encloses it.
+          */
+         if(pull_out_jumps && jump_strengths[0] == jump_strengths[1]) {
+            bool unify = true;
+            if(jump_strengths[0] == strength_continue)
+               ir->insert_after(new(ir) ir_loop_jump(ir_loop_jump::jump_continue));
+            else if(jump_strengths[0] == strength_break)
+               ir->insert_after(new(ir) ir_loop_jump(ir_loop_jump::jump_break));
+            /* FINISHME: unify returns with identical expressions */
+            else if(jump_strengths[0] == strength_return && this->function.signature->return_type->is_void())
+               ir->insert_after(new(ir) ir_return(NULL));
+            else
+               unify = false;
+
+            if(unify) {
+               jumps[0]->remove();
+               jumps[1]->remove();
+               this->progress = true;
+
+               /* Update jumps[] to reflect the fact that the jumps
+                * are gone, and update block_records[] to reflect the
+                * fact that control can now flow to the next
+                * instruction.
+                */
+               jumps[0] = 0;
+               jumps[1] = 0;
+               block_records[0].min_strength = strength_none;
+               block_records[1].min_strength = strength_none;
+
+               /* The CONTAINED_JUMPS_LOWERED postcondition is now
+                * satisfied, so we can break out of the loop.
+                */
+               break;
+            }
+         }
+
+         /* Lower a jump: if both need to be lowered, start with the
+          * strongest one, so that we might later unify the lowered version
+          * with the other one.
+          */
+         bool should_lower[2];
+         for(unsigned i = 0; i < 2; ++i)
+            should_lower[i] = should_lower_jump(jumps[i]);
+
+         int lower;
+         if(should_lower[1] && should_lower[0])
+            lower = jump_strengths[1] > jump_strengths[0];
+         else if(should_lower[0])
+            lower = 0;
+         else if(should_lower[1])
+            lower = 1;
+         else
+            /* Neither code path ends in a jump that needs to be
+             * lowered, so the CONTAINED_JUMPS_LOWERED postcondition
+             * is satisfied and we can break out of the loop.
+             */
+            break;
+
+         if(jump_strengths[lower] == strength_return) {
+            /* To lower a return, we create a return flag (if the
+             * function doesn't have one already) and add instructions
+             * that: 1. store the return value (if this function has a
+             * non-void return) and 2. set the return flag
+             */
+            insert_lowered_return((ir_return*)jumps[lower]);
+            if(this->loop.loop) {
+               /* If we are in a loop, replace the return instruction
+                * with a break instruction, and then loop so that the
+                * break instruction can be lowered if necessary.
+                */
+               ir_loop_jump* lowered = 0;
+               lowered = new(ir) ir_loop_jump(ir_loop_jump::jump_break);
+               /* Note: we must update block_records and jumps to
+                * reflect the fact that the control path has been
+                * altered from a return to a break.
+                */
+               block_records[lower].min_strength = strength_break;
+               jumps[lower]->replace_with(lowered);
+               jumps[lower] = lowered;
+            } else {
+               /* If we are not in a loop, we then proceed as we would
+                * for a continue statement (set the execute flag to
+                * false to prevent the rest of the function from
+                * executing).
+                */
+               goto lower_continue;
+            }
+            this->progress = true;
+         } else if(jump_strengths[lower] == strength_break) {
+            /* To lower a break, we create a break flag (if the loop
+             * doesn't have one already) and add an instruction that
+             * sets it.
+             *
+             * Then we proceed as we would for a continue statement
+             * (set the execute flag to false to prevent the rest of
+             * the loop body from executing).
+             *
+             * The visit() function for the loop will ensure that the
+             * break flag is checked after executing the loop body.
+             */
+            jumps[lower]->insert_before(create_lowered_break());
+            goto lower_continue;
+         } else if(jump_strengths[lower] == strength_continue) {
+lower_continue:
+            /* To lower a continue, we create an execute flag (if the
+             * loop doesn't have one already) and replace the continue
+             * with an instruction that clears it.
+             *
+             * Note that this code path gets exercised when lowering
+             * return statements that are not inside a loop, so
+             * this->loop must be initialized even outside of loops.
+             */
+            ir_variable* execute_flag = this->loop.get_execute_flag();
+            jumps[lower]->replace_with(new(ir) ir_assignment(new (ir) ir_dereference_variable(execute_flag), new (ir) ir_constant(false), 0));
+            /* Note: we must update block_records and jumps to reflect
+             * the fact that the control path has been altered to an
+             * instruction that clears the execute flag.
+             */
+            jumps[lower] = 0;
+            block_records[lower].min_strength = strength_always_clears_execute_flag;
+            block_records[lower].may_clear_execute_flag = true;
+            this->progress = true;
+
+            /* Let the loop run again, in case the other branch of the
+             * if needs to be lowered too.
+             */
+         }
+      }
+
+      /* move a jump out if possible */
+      if(pull_out_jumps) {
+         /* If one of the branches ends in a jump, and control cannot
+          * fall out the bottom of the other branch, then we can move
+          * the jump after the if.
+          *
+          * Set move_out to the branch we are moving a jump out of.
+          */
+         int move_out = -1;
+         if(jumps[0] && block_records[1].min_strength >= strength_continue)
+            move_out = 0;
+         else if(jumps[1] && block_records[0].min_strength >= strength_continue)
+            move_out = 1;
+
+         if(move_out >= 0)
+         {
+            jumps[move_out]->remove();
+            ir->insert_after(jumps[move_out]);
+            /* Note: we must update block_records and jumps to reflect
+             * the fact that the jump has been moved out of the if.
+             */
+            jumps[move_out] = 0;
+            block_records[move_out].min_strength = strength_none;
+            this->progress = true;
+         }
+      }
+
+      /* Now satisfy the ANALYSIS postcondition by setting
+       * this->block.min_strength and
+       * this->block.may_clear_execute_flag based on the
+       * characteristics of the two branches.
+       */
+      if(block_records[0].min_strength < block_records[1].min_strength)
+         this->block.min_strength = block_records[0].min_strength;
+      else
+         this->block.min_strength = block_records[1].min_strength;
+      this->block.may_clear_execute_flag = this->block.may_clear_execute_flag || block_records[0].may_clear_execute_flag || block_records[1].may_clear_execute_flag;
+
+      /* Now we need to clean up the instructions that follow the
+       * if.
+       *
+       * If those instructions are unreachable, then satisfy the
+       * DEAD_CODE_ELIMINATION postcondition by eliminating them.
+       * Otherwise that postcondition is already satisfied.
+       */
+      if(this->block.min_strength)
+         truncate_after_instruction(ir);
+      else if(this->block.may_clear_execute_flag)
+      {
+         /* If the "if" instruction might clear the execute flag, then
+          * we need to guard any instructions that follow so that they
+          * are only executed if the execute flag is set.
+          *
+          * If one of the branches of the "if" always clears the
+          * execute flag, and the other branch never clears it, then
+          * this is easy: just move all the instructions following the
+          * "if" into the branch that never clears it.
+          */
+         int move_into = -1;
+         if(block_records[0].min_strength && !block_records[1].may_clear_execute_flag)
+            move_into = 1;
+         else if(block_records[1].min_strength && !block_records[0].may_clear_execute_flag)
+            move_into = 0;
+
+         if(move_into >= 0) {
+            assert(!block_records[move_into].min_strength && !block_records[move_into].may_clear_execute_flag); /* otherwise, we just truncated */
+
+            exec_list* list = move_into ? &ir->else_instructions : &ir->then_instructions;
+            exec_node* next = ir->get_next();
+            if(!next->is_tail_sentinel()) {
+               move_outer_block_inside(ir, list);
+
+               /* If any instructions moved, then we need to visit
+                * them (since they are now inside the "if").  Since
+                * block_records[move_into] is in its default state
+                * (see assertion above), we can safely replace
+                * block_records[move_into] with the result of this
+                * analysis.
+                */
+               exec_list list;
+               list.head = next;
+               block_records[move_into] = visit_block(&list);
+
+               /*
+                * Then we need to re-start our jump lowering, since one
+                * of the instructions we moved might be a jump that
+                * needs to be lowered.
+                */
+               this->progress = true;
+               goto retry;
+            }
+         } else {
+            /* If we get here, then the simple case didn't apply; we
+             * need to actually guard the instructions that follow.
+             *
+             * To avoid creating unnecessarily-deep nesting, first
+             * look through the instructions that follow and unwrap
+             * any instructions that are already wrapped in the
+             * appropriate guard.
+             */
+            ir_instruction* ir_after;
+            for(ir_after = (ir_instruction*)ir->get_next(); !ir_after->is_tail_sentinel();)
+            {
+               ir_if* ir_if = ir_after->as_if();
+               if(ir_if && ir_if->else_instructions.is_empty()) {
+                  ir_dereference_variable* ir_if_cond_deref = ir_if->condition->as_dereference_variable();
+                  if(ir_if_cond_deref && ir_if_cond_deref->var == this->loop.execute_flag) {
+                     ir_instruction* ir_next = (ir_instruction*)ir_after->get_next();
+                     ir_after->insert_before(&ir_if->then_instructions);
+                     ir_after->remove();
+                     ir_after = ir_next;
+                     continue;
+                  }
+               }
+               ir_after = (ir_instruction*)ir_after->get_next();
+
+               /* only set this if we find any unprotected instruction */
+               this->progress = true;
+            }
+
+            /* Then, wrap all the instructions that follow in a single
+             * guard.
+             */
+            if(!ir->get_next()->is_tail_sentinel()) {
+               assert(this->loop.execute_flag);
+               ir_if* if_execute = new(ir) ir_if(new(ir) ir_dereference_variable(this->loop.execute_flag));
+               move_outer_block_inside(ir, &if_execute->then_instructions);
+               ir->insert_after(if_execute);
+            }
+         }
+      }
+      --this->loop.nesting_depth;
+      --this->function.nesting_depth;
+   }
+
+   virtual void visit(ir_loop *ir)
+   {
+      /* Visit the body of the loop, with a fresh data structure in
+       * this->loop so that the analysis we do here won't bleed into
+       * enclosing loops.
+       *
+       * We assume that all code after a loop is reachable from the
+       * loop (see comments on enum jump_strength), so the
+       * DEAD_CODE_ELIMINATION postcondition is automatically
+       * satisfied, as is the block.min_strength portion of the
+       * ANALYSIS postcondition.
+       *
+       * The block.may_clear_execute_flag portion of the ANALYSIS
+       * postcondition is automatically satisfied because execute
+       * flags do not propagate outside of loops.
+       *
+       * The loop.may_set_return_flag portion of the ANALYSIS
+       * postcondition is handled below.
+       */
+      ++this->function.nesting_depth;
+      loop_record saved_loop = this->loop;
+      this->loop = loop_record(this->function.signature, ir);
+
+      /* Recursively lower nested jumps.  This satisfies the
+       * CONTAINED_JUMPS_LOWERED postcondition, except in the case of
+       * an unconditional continue or return at the bottom of the
+       * loop, which are handled below.
+       */
+      block_record body = visit_block(&ir->body_instructions);
+
+      /* If the loop ends in an unconditional continue, eliminate it
+       * because it is redundant.
+       */
+      ir_instruction *ir_last
+         = (ir_instruction *) ir->body_instructions.get_tail();
+      if (get_jump_strength(ir_last) == strength_continue) {
+         ir_last->remove();
+      }
+
+      /* If the loop ends in an unconditional return, and we are
+       * lowering returns, lower it.
+       */
+      if (this->function.lower_return)
+         lower_return_unconditionally(ir_last);
+
+      if(body.min_strength >= strength_break) {
+         /* FINISHME: If the min_strength of the loop body is
+          * strength_break or strength_return, that means that it
+          * isn't a loop at all, since control flow always leaves the
+          * body of the loop via break or return.  In principle the
+          * loop could be eliminated in this case.  This optimization
+          * is not implemented yet.
+          */
+      }
+
+      if(this->loop.break_flag) {
+         /* We only get here if we are lowering breaks */
+         assert (lower_break);
+
+         /* If a break flag was generated while visiting the body of
+          * the loop, then at least one break was lowered, so we need
+          * to generate an if statement at the end of the loop that
+          * does a "break" if the break flag is set.  The break we
+          * generate won't violate the CONTAINED_JUMPS_LOWERED
+          * postcondition, because should_lower_jump() always returns
+          * false for a break that happens at the end of a loop.
+          *
+          * However, if the loop already ends in a conditional or
+          * unconditional break, then we need to lower that break,
+          * because it won't be at the end of the loop anymore.
+          */
+         lower_final_breaks(&ir->body_instructions);
+
+         ir_if* break_if = new(ir) ir_if(new(ir) ir_dereference_variable(this->loop.break_flag));
+         break_if->then_instructions.push_tail(new(ir) ir_loop_jump(ir_loop_jump::jump_break));
+         ir->body_instructions.push_tail(break_if);
+      }
+
+      /* If the body of the loop may set the return flag, then at
+       * least one return was lowered to a break, so we need to ensure
+       * that the return flag is checked after the body of the loop is
+       * executed.
+       */
+      if(this->loop.may_set_return_flag) {
+         assert(this->function.return_flag);
+         /* Generate the if statement to check the return flag */
+         ir_if* return_if = new(ir) ir_if(new(ir) ir_dereference_variable(this->function.return_flag));
+         /* Note: we also need to propagate the knowledge that the
+          * return flag may get set to the outer context.  This
+          * satisfies the loop.may_set_return_flag part of the
+          * ANALYSIS postcondition.
+          */
+         saved_loop.may_set_return_flag = true;
+         if(saved_loop.loop)
+            /* If this loop is nested inside another one, then the if
+             * statement that we generated should break out of that
+             * loop if the return flag is set.  Caller will lower that
+             * break statement if necessary.
+             */
+            return_if->then_instructions.push_tail(new(ir) ir_loop_jump(ir_loop_jump::jump_break));
+         else
+            /* Otherwise, all we need to do is ensure that the
+             * instructions that follow are only executed if the
+             * return flag is clear.  We can do that by moving those
+             * instructions into the else clause of the generated if
+             * statement.
+             */
+            move_outer_block_inside(ir, &return_if->else_instructions);
+         ir->insert_after(return_if);
+      }
+
+      this->loop = saved_loop;
+      --this->function.nesting_depth;
+   }
+
+   virtual void visit(ir_function_signature *ir)
+   {
+      /* these are not strictly necessary */
+      assert(!this->function.signature);
+      assert(!this->loop.loop);
+
+      bool lower_return;
+      if (strcmp(ir->function_name(), "main") == 0)
+         lower_return = lower_main_return;
+      else
+         lower_return = lower_sub_return;
+
+      function_record saved_function = this->function;
+      loop_record saved_loop = this->loop;
+      this->function = function_record(ir, lower_return);
+      this->loop = loop_record(ir);
+
+      assert(!this->loop.loop);
+
+      /* Visit the body of the function to lower any jumps that occur
+       * in it, except possibly an unconditional return statement at
+       * the end of it.
+       */
+      visit_block(&ir->body);
+
+      /* If the body ended in an unconditional return of non-void,
+       * then we don't need to lower it because it's the one canonical
+       * return.
+       *
+       * If the body ended in a return of void, eliminate it because
+       * it is redundant.
+       */
+      if (ir->return_type->is_void() &&
+          get_jump_strength((ir_instruction *) ir->body.get_tail())) {
+         ir_jump *jump = (ir_jump *) ir->body.get_tail();
+         assert (jump->ir_type == ir_type_return);
+         jump->remove();
+      }
+
+      if(this->function.return_value)
+         ir->body.push_tail(new(ir) ir_return(new (ir) ir_dereference_variable(this->function.return_value)));
+
+      this->loop = saved_loop;
+      this->function = saved_function;
+   }
+
+   virtual void visit(class ir_function * ir)
+   {
+      visit_block(&ir->signatures);
+   }
+};
+
+} /* anonymous namespace */
+
+bool
+do_lower_jumps(exec_list *instructions, bool pull_out_jumps, bool lower_sub_return, bool lower_main_return, bool lower_continue, bool lower_break)
+{
+   ir_lower_jumps_visitor v;
+   v.pull_out_jumps = pull_out_jumps;
+   v.lower_continue = lower_continue;
+   v.lower_break = lower_break;
+   v.lower_sub_return = lower_sub_return;
+   v.lower_main_return = lower_main_return;
+
+   bool progress_ever = false;
+   do {
+      v.progress = false;
+      visit_exec_list(instructions, &v);
+      progress_ever = v.progress || progress_ever;
+   } while (v.progress);
+
+   return progress_ever;
+}
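+
+/* Usage sketch (hypothetical caller, not part of this file): a driver that
+ * supports loops but no other jump instructions might invoke the pass as
+ *
+ *    // pull_out_jumps, lower_sub_return, lower_main_return,
+ *    // lower_continue, lower_break
+ *    progress = do_lower_jumps(shader->ir, true, true, true, true, true);
+ *
+ * do_lower_jumps() already repeats internally until it makes no further
+ * progress, so callers do not need to loop over it themselves.
+ */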
diff --git a/icd/intel/compiler/shader/lower_mat_op_to_vec.cpp b/icd/intel/compiler/shader/lower_mat_op_to_vec.cpp
new file mode 100644
index 0000000..105ee0d
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_mat_op_to_vec.cpp
@@ -0,0 +1,432 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_mat_op_to_vec.cpp
+ *
+ * Breaks matrix operation expressions down to a series of vector operations.
+ *
+ * Generally this is how we have to codegen matrix operations for a
+ * GPU, so this gives us the chance to constant fold operations on a
+ * column or row.
+ */
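+
+/* For example (illustrative only), a mat2-times-vec2 multiply
+ *
+ *    result = m * v;
+ *
+ * is broken down by this pass into per-column vector operations:
+ *
+ *    result = m[0] * v.x + m[1] * v.y;
+ */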
+
+#include "ir.h"
+#include "ir_expression_flattening.h"
+#include "glsl_types.h"
+
+namespace {
+
+class ir_mat_op_to_vec_visitor : public ir_hierarchical_visitor {
+public:
+   ir_mat_op_to_vec_visitor()
+   {
+      this->made_progress = false;
+      this->mem_ctx = NULL;
+   }
+
+   ir_visitor_status visit_leave(ir_assignment *);
+
+   ir_dereference *get_column(ir_dereference *val, int col);
+   ir_rvalue *get_element(ir_dereference *val, int col, int row);
+
+   void do_mul_mat_mat(ir_dereference *result,
+		       ir_dereference *a, ir_dereference *b);
+   void do_mul_mat_vec(ir_dereference *result,
+		       ir_dereference *a, ir_dereference *b);
+   void do_mul_vec_mat(ir_dereference *result,
+		       ir_dereference *a, ir_dereference *b);
+   void do_mul_mat_scalar(ir_dereference *result,
+			  ir_dereference *a, ir_dereference *b);
+   void do_equal_mat_mat(ir_dereference *result, ir_dereference *a,
+			 ir_dereference *b, bool test_equal);
+
+   void *mem_ctx;
+   bool made_progress;
+};
+
+} /* anonymous namespace */
+
+static bool
+mat_op_to_vec_predicate(ir_instruction *ir)
+{
+   ir_expression *expr = ir->as_expression();
+   unsigned int i;
+
+   if (!expr)
+      return false;
+
+   for (i = 0; i < expr->get_num_operands(); i++) {
+      if (expr->operands[i]->type->is_matrix())
+	 return true;
+   }
+
+   return false;
+}
+
+bool
+do_mat_op_to_vec(exec_list *instructions)
+{
+   ir_mat_op_to_vec_visitor v;
+
+   /* Pull out any matrix expression to a separate assignment to a
+    * temp.  This will make our handling of the breakdown to
+    * operations on the matrix's vector components much easier.
+    */
+   do_expression_flattening(instructions, mat_op_to_vec_predicate);
+
+   visit_list_elements(&v, instructions);
+
+   return v.made_progress;
+}
+
+ir_rvalue *
+ir_mat_op_to_vec_visitor::get_element(ir_dereference *val, int col, int row)
+{
+   val = get_column(val, col);
+
+   return new(mem_ctx) ir_swizzle(val, row, 0, 0, 0, 1);
+}
+
+ir_dereference *
+ir_mat_op_to_vec_visitor::get_column(ir_dereference *val, int col)
+{
+   val = val->clone(mem_ctx, NULL);
+
+   if (val->type->is_matrix()) {
+      val = new(mem_ctx) ir_dereference_array(val,
+					      new(mem_ctx) ir_constant(col));
+   }
+
+   return val;
+}
+
+void
+ir_mat_op_to_vec_visitor::do_mul_mat_mat(ir_dereference *result,
+					 ir_dereference *a,
+					 ir_dereference *b)
+{
+   unsigned b_col, i;
+   ir_assignment *assign;
+   ir_expression *expr;
+
+   for (b_col = 0; b_col < b->type->matrix_columns; b_col++) {
+      /* first column */
+      expr = new(mem_ctx) ir_expression(ir_binop_mul,
+					get_column(a, 0),
+					get_element(b, b_col, 0));
+
+      /* following columns */
+      for (i = 1; i < a->type->matrix_columns; i++) {
+	 ir_expression *mul_expr;
+
+	 mul_expr = new(mem_ctx) ir_expression(ir_binop_mul,
+					       get_column(a, i),
+					       get_element(b, b_col, i));
+	 expr = new(mem_ctx) ir_expression(ir_binop_add,
+					   expr,
+					   mul_expr);
+      }
+
+      assign = new(mem_ctx) ir_assignment(get_column(result, b_col), expr);
+      base_ir->insert_before(assign);
+   }
+}
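+
+/* Illustration: for 2x2 matrices, the loop above emits, per column of b,
+ *
+ *    result[0] = a[0] * b[0].x + a[1] * b[0].y;
+ *    result[1] = a[0] * b[1].x + a[1] * b[1].y;
+ *
+ * i.e. each result column is a linear combination of the columns of a,
+ * weighted by the elements of the corresponding column of b.
+ */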
+
+void
+ir_mat_op_to_vec_visitor::do_mul_mat_vec(ir_dereference *result,
+					 ir_dereference *a,
+					 ir_dereference *b)
+{
+   unsigned i;
+   ir_assignment *assign;
+   ir_expression *expr;
+
+   /* first column */
+   expr = new(mem_ctx) ir_expression(ir_binop_mul,
+				     get_column(a, 0),
+				     get_element(b, 0, 0));
+
+   /* following columns */
+   for (i = 1; i < a->type->matrix_columns; i++) {
+      ir_expression *mul_expr;
+
+      mul_expr = new(mem_ctx) ir_expression(ir_binop_mul,
+					    get_column(a, i),
+					    get_element(b, 0, i));
+      expr = new(mem_ctx) ir_expression(ir_binop_add, expr, mul_expr);
+   }
+
+   result = result->clone(mem_ctx, NULL);
+   assign = new(mem_ctx) ir_assignment(result, expr);
+   base_ir->insert_before(assign);
+}
+
+void
+ir_mat_op_to_vec_visitor::do_mul_vec_mat(ir_dereference *result,
+					 ir_dereference *a,
+					 ir_dereference *b)
+{
+   unsigned i;
+
+   for (i = 0; i < b->type->matrix_columns; i++) {
+      ir_rvalue *column_result;
+      ir_expression *column_expr;
+      ir_assignment *column_assign;
+
+      column_result = result->clone(mem_ctx, NULL);
+      column_result = new(mem_ctx) ir_swizzle(column_result, i, 0, 0, 0, 1);
+
+      column_expr = new(mem_ctx) ir_expression(ir_binop_dot,
+					       a->clone(mem_ctx, NULL),
+					       get_column(b, i));
+
+      column_assign = new(mem_ctx) ir_assignment(column_result,
+						 column_expr);
+      base_ir->insert_before(column_assign);
+   }
+}
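+
+/* Illustration: a vec2-times-mat2 multiply becomes one dot product per
+ * column of the matrix:
+ *
+ *    result.x = dot(v, m[0]);
+ *    result.y = dot(v, m[1]);
+ */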
+
+void
+ir_mat_op_to_vec_visitor::do_mul_mat_scalar(ir_dereference *result,
+					    ir_dereference *a,
+					    ir_dereference *b)
+{
+   unsigned i;
+
+   for (i = 0; i < a->type->matrix_columns; i++) {
+      ir_expression *column_expr;
+      ir_assignment *column_assign;
+
+      column_expr = new(mem_ctx) ir_expression(ir_binop_mul,
+					       get_column(a, i),
+					       b->clone(mem_ctx, NULL));
+
+      column_assign = new(mem_ctx) ir_assignment(get_column(result, i),
+						 column_expr);
+      base_ir->insert_before(column_assign);
+   }
+}
+
+void
+ir_mat_op_to_vec_visitor::do_equal_mat_mat(ir_dereference *result,
+					   ir_dereference *a,
+					   ir_dereference *b,
+					   bool test_equal)
+{
+   /* This essentially implements the following GLSL:
+    *
+    * bool equal(mat4 a, mat4 b)
+    * {
+    *   return !any(bvec4(a[0] != b[0],
+    *                     a[1] != b[1],
+    *                     a[2] != b[2],
+    *                     a[3] != b[3]));
+    * }
+    *
+    * bool nequal(mat4 a, mat4 b)
+    * {
+    *   return any(bvec4(a[0] != b[0],
+    *                    a[1] != b[1],
+    *                    a[2] != b[2],
+    *                    a[3] != b[3]));
+    * }
+    */
+   const unsigned columns = a->type->matrix_columns;
+   const glsl_type *const bvec_type =
+      glsl_type::get_instance(GLSL_TYPE_BOOL, columns, 1);
+
+   ir_variable *const tmp_bvec =
+      new(this->mem_ctx) ir_variable(bvec_type, "mat_cmp_bvec",
+				     ir_var_temporary);
+   this->base_ir->insert_before(tmp_bvec);
+
+   for (unsigned i = 0; i < columns; i++) {
+      ir_expression *const cmp =
+	 new(this->mem_ctx) ir_expression(ir_binop_any_nequal,
+					  get_column(a, i),
+					  get_column(b, i));
+
+      ir_dereference *const lhs =
+	 new(this->mem_ctx) ir_dereference_variable(tmp_bvec);
+
+      ir_assignment *const assign =
+	 new(this->mem_ctx) ir_assignment(lhs, cmp, NULL, (1U << i));
+
+      this->base_ir->insert_before(assign);
+   }
+
+   ir_rvalue *const val = new(this->mem_ctx) ir_dereference_variable(tmp_bvec);
+   ir_expression *any = new(this->mem_ctx) ir_expression(ir_unop_any, val);
+
+   if (test_equal)
+      any = new(this->mem_ctx) ir_expression(ir_unop_logic_not, any);
+
+   ir_assignment *const assign =
+      new(mem_ctx) ir_assignment(result->clone(mem_ctx, NULL), any);
+   base_ir->insert_before(assign);
+}
+
+static bool
+has_matrix_operand(const ir_expression *expr, unsigned &columns)
+{
+   for (unsigned i = 0; i < expr->get_num_operands(); i++) {
+      if (expr->operands[i]->type->is_matrix()) {
+	 columns = expr->operands[i]->type->matrix_columns;
+	 return true;
+      }
+   }
+
+   return false;
+}
+
+
+ir_visitor_status
+ir_mat_op_to_vec_visitor::visit_leave(ir_assignment *orig_assign)
+{
+   ir_expression *orig_expr = orig_assign->rhs->as_expression();
+   unsigned int i, matrix_columns = 1;
+   ir_dereference *op[2];
+
+   if (!orig_expr)
+      return visit_continue;
+
+   if (!has_matrix_operand(orig_expr, matrix_columns))
+      return visit_continue;
+
+   assert(orig_expr->get_num_operands() <= 2);
+
+   mem_ctx = ralloc_parent(orig_assign);
+
+   ir_dereference_variable *result =
+      orig_assign->lhs->as_dereference_variable();
+   assert(result);
+
+   /* Store the expression operands in temps so we can use them
+    * multiple times.
+    */
+   for (i = 0; i < orig_expr->get_num_operands(); i++) {
+      ir_assignment *assign;
+      ir_dereference *deref = orig_expr->operands[i]->as_dereference();
+
+      /* Avoid making a temporary if we don't need one to avoid aliasing. */
+      if (deref &&
+	  deref->variable_referenced() != result->variable_referenced()) {
+	 op[i] = deref;
+	 continue;
+      }
+
+      /* Otherwise (the operand is not a dereference, or it aliases the
+       * result), store it in a temporary.
+       */
+      ir_variable *var = new(mem_ctx) ir_variable(orig_expr->operands[i]->type,
+						  "mat_op_to_vec",
+						  ir_var_temporary);
+      base_ir->insert_before(var);
+
+      /* Note that we use this dereference for the assignment.  That means
+       * that others that want to use op[i] have to clone the deref.
+       */
+      op[i] = new(mem_ctx) ir_dereference_variable(var);
+      assign = new(mem_ctx) ir_assignment(op[i], orig_expr->operands[i]);
+      base_ir->insert_before(assign);
+   }
+
+   /* OK, time to break down this matrix operation. */
+   switch (orig_expr->operation) {
+   case ir_unop_neg: {
+      /* Apply the operation to each column. */
+      for (i = 0; i < matrix_columns; i++) {
+	 ir_expression *column_expr;
+	 ir_assignment *column_assign;
+
+	 column_expr = new(mem_ctx) ir_expression(orig_expr->operation,
+						  get_column(op[0], i));
+
+	 column_assign = new(mem_ctx) ir_assignment(get_column(result, i),
+						    column_expr);
+	 assert(column_assign->write_mask != 0);
+	 base_ir->insert_before(column_assign);
+      }
+      break;
+   }
+   case ir_binop_add:
+   case ir_binop_sub:
+   case ir_binop_div:
+   case ir_binop_mod: {
+      /* For most operations, the matrix version simply walks the columns
+       * and applies the operation to each one (get_column() passes
+       * non-matrix operands through unchanged).
+       */
+      for (i = 0; i < matrix_columns; i++) {
+	 ir_expression *column_expr;
+	 ir_assignment *column_assign;
+
+	 column_expr = new(mem_ctx) ir_expression(orig_expr->operation,
+						  get_column(op[0], i),
+						  get_column(op[1], i));
+
+	 column_assign = new(mem_ctx) ir_assignment(get_column(result, i),
+						    column_expr);
+	 assert(column_assign->write_mask != 0);
+	 base_ir->insert_before(column_assign);
+      }
+      break;
+   }
+   case ir_binop_mul:
+      if (op[0]->type->is_matrix()) {
+	 if (op[1]->type->is_matrix()) {
+	    do_mul_mat_mat(result, op[0], op[1]);
+	 } else if (op[1]->type->is_vector()) {
+	    do_mul_mat_vec(result, op[0], op[1]);
+	 } else {
+	    assert(op[1]->type->is_scalar());
+	    do_mul_mat_scalar(result, op[0], op[1]);
+	 }
+      } else {
+	 assert(op[1]->type->is_matrix());
+	 if (op[0]->type->is_vector()) {
+	    do_mul_vec_mat(result, op[0], op[1]);
+	 } else {
+	    assert(op[0]->type->is_scalar());
+	    do_mul_mat_scalar(result, op[1], op[0]);
+	 }
+      }
+      break;
+
+   case ir_binop_all_equal:
+   case ir_binop_any_nequal:
+      do_equal_mat_mat(result, op[1], op[0],
+		       (orig_expr->operation == ir_binop_all_equal));
+      break;
+
+   default:
+      printf("FINISHME: Handle matrix operation for %s\n",
+	     orig_expr->operator_string());
+      abort();
+   }
+   orig_assign->remove();
+   this->made_progress = true;
+
+   return visit_continue;
+}
diff --git a/icd/intel/compiler/shader/lower_named_interface_blocks.cpp b/icd/intel/compiler/shader/lower_named_interface_blocks.cpp
new file mode 100644
index 0000000..04e0d36
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_named_interface_blocks.cpp
@@ -0,0 +1,249 @@
+/*
+ * Copyright (c) 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_named_interface_blocks.cpp
+ *
+ * This lowering pass converts all interface blocks with instance names
+ * into interface blocks without an instance name.
+ *
+ * For example, the following shader:
+ *
+ *   out block {
+ *     float block_var;
+ *   } inst_name;
+ *
+ *   main()
+ *   {
+ *     inst_name.block_var = 0.0;
+ *   }
+ *
+ * Is rewritten to:
+ *
+ *   out block {
+ *     float block_var;
+ *   };
+ *
+ *   main()
+ *   {
+ *     block_var = 0.0;
+ *   }
+ *
+ * This takes place after the shader code has already been verified with
+ * the interface name in place.
+ *
+ * The linking phase will use the interface block name rather than the
+ * interface's instance name when linking interfaces.
+ *
+ * This modification to the IR allows the existing dead code elimination
+ * pass to work with interface blocks without changes.
+ */
+
+#include "glsl_symbol_table.h"
+#include "ir.h"
+#include "ir_optimization.h"
+#include "ir_rvalue_visitor.h"
+#include "program/hash_table.h"
+
+namespace {
+
+class flatten_named_interface_blocks_declarations : public ir_rvalue_visitor
+{
+public:
+   void * const mem_ctx;
+   hash_table *interface_namespace;
+
+   flatten_named_interface_blocks_declarations(void *mem_ctx)
+      : mem_ctx(mem_ctx),
+        interface_namespace(NULL)
+   {
+   }
+
+   void run(exec_list *instructions);
+
+   virtual ir_visitor_status visit_leave(ir_assignment *);
+   virtual void handle_rvalue(ir_rvalue **rvalue);
+};
+
+} /* anonymous namespace */
+
+void
+flatten_named_interface_blocks_declarations::run(exec_list *instructions)
+{
+   interface_namespace = hash_table_ctor(0, hash_table_string_hash,
+                                         hash_table_string_compare);
+
+   /* First pass: adjust instance block variables with an instance name
+    * to not have an instance name.
+    *
+    * The interface block variables are stored in the interface_namespace
+    * hash table so they can be used in the second pass.
+    */
+   foreach_list_safe(node, instructions) {
+      ir_variable *var = ((ir_instruction *) node)->as_variable();
+      if (!var || !var->is_interface_instance())
+         continue;
+
+      /* It should be possible to handle uniforms during this pass,
+       * but, this will require changes to the other uniform block
+       * support code.
+       */
+      if (var->data.mode == ir_var_uniform)
+         continue;
+
+      const glsl_type * iface_t = var->type;
+      const glsl_type * array_t = NULL;
+      exec_node *insert_pos = var;
+
+      if (iface_t->is_array()) {
+         array_t = iface_t;
+         iface_t = array_t->fields.array;
+      }
+
+      assert (iface_t->is_interface());
+
+      for (unsigned i = 0; i < iface_t->length; i++) {
+         const char * field_name = iface_t->fields.structure[i].name;
+         char *iface_field_name =
+            ralloc_asprintf(mem_ctx, "%s.%s.%s",
+                            iface_t->name, var->name, field_name);
+
+         ir_variable *found_var =
+            (ir_variable *) hash_table_find(interface_namespace,
+                                            iface_field_name);
+         if (!found_var) {
+            ir_variable *new_var;
+            char *var_name =
+               ralloc_strdup(mem_ctx, iface_t->fields.structure[i].name);
+            if (array_t == NULL) {
+               new_var =
+                  new(mem_ctx) ir_variable(iface_t->fields.structure[i].type,
+                                           var_name,
+                                           (ir_variable_mode) var->data.mode);
+               new_var->data.from_named_ifc_block_nonarray = 1;
+            } else {
+               const glsl_type *new_array_type =
+                  glsl_type::get_array_instance(
+                     iface_t->fields.structure[i].type,
+                     array_t->length);
+               new_var =
+                  new(mem_ctx) ir_variable(new_array_type,
+                                           var_name,
+                                           (ir_variable_mode) var->data.mode);
+               new_var->data.from_named_ifc_block_array = 1;
+            }
+            new_var->data.location = iface_t->fields.structure[i].location;
+            new_var->data.explicit_location = (new_var->data.location >= 0);
+            new_var->data.interpolation =
+               iface_t->fields.structure[i].interpolation;
+            new_var->data.centroid = iface_t->fields.structure[i].centroid;
+            new_var->data.sample = iface_t->fields.structure[i].sample;
+
+            new_var->init_interface_type(iface_t);
+            hash_table_insert(interface_namespace, new_var,
+                              iface_field_name);
+            insert_pos->insert_after(new_var);
+            insert_pos = new_var;
+         }
+      }
+      var->remove();
+   }
+
+   /* Second pass: visit all ir_dereference_record instances, and if they
+    * reference an interface block, then flatten the reference out.
+    */
+   visit_list_elements(this, instructions);
+   hash_table_dtor(interface_namespace);
+   interface_namespace = NULL;
+}
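+
+/* Illustration (names from the example in the file comment): the first
+ * pass above stores the flattened variable under the key
+ * "block.inst_name.block_var"; the second pass rebuilds the same key from
+ * each ir_dereference_record it visits and swaps the dereference for the
+ * flattened variable found there.
+ */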
+
+ir_visitor_status
+flatten_named_interface_blocks_declarations::visit_leave(ir_assignment *ir)
+{
+   ir_dereference_record *lhs_rec = ir->lhs->as_dereference_record();
+   if (lhs_rec) {
+      ir_rvalue *lhs_rec_tmp = lhs_rec;
+      handle_rvalue(&lhs_rec_tmp);
+      if (lhs_rec_tmp != lhs_rec) {
+         ir->set_lhs(lhs_rec_tmp);
+      }
+   }
+   return rvalue_visit(ir);
+}
+
+void
+flatten_named_interface_blocks_declarations::handle_rvalue(ir_rvalue **rvalue)
+{
+   if (*rvalue == NULL)
+      return;
+
+   ir_dereference_record *ir = (*rvalue)->as_dereference_record();
+   if (ir == NULL)
+      return;
+
+   ir_variable *var = ir->variable_referenced();
+   if (var == NULL)
+      return;
+
+   if (!var->is_interface_instance())
+      return;
+
+   /* It should be possible to handle uniforms during this pass,
+    * but, this will require changes to the other uniform block
+    * support code.
+    */
+   if (var->data.mode == ir_var_uniform)
+      return;
+
+   if (var->get_interface_type() != NULL) {
+      char *iface_field_name =
+         ralloc_asprintf(mem_ctx, "%s.%s.%s", var->get_interface_type()->name,
+                         var->name, ir->field);
+      /* Find the variable in the set of flattened interface blocks */
+      ir_variable *found_var =
+         (ir_variable *) hash_table_find(interface_namespace,
+                                         iface_field_name);
+      assert(found_var);
+
+      ir_dereference_variable *deref_var =
+         new(mem_ctx) ir_dereference_variable(found_var);
+
+      ir_dereference_array *deref_array =
+         ir->record->as_dereference_array();
+      if (deref_array != NULL) {
+         *rvalue =
+            new(mem_ctx) ir_dereference_array(deref_var,
+                                              deref_array->array_index);
+      } else {
+         *rvalue = deref_var;
+      }
+   }
+}
+
+void
+lower_named_interface_blocks(void *mem_ctx, gl_shader *shader)
+{
+   flatten_named_interface_blocks_declarations v_decl(mem_ctx);
+   v_decl.run(shader->ir);
+}
+
diff --git a/icd/intel/compiler/shader/lower_noise.cpp b/icd/intel/compiler/shader/lower_noise.cpp
new file mode 100644
index 0000000..85f59b6
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_noise.cpp
@@ -0,0 +1,71 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_noise.cpp
+ * IR lower pass to remove noise opcodes.
+ *
+ * \author Ian Romanick <ian.d.romanick@intel.com>
+ */
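+
+/* In effect (illustrative GLSL), a noise call such as
+ *
+ *    float n = noise1(p);
+ *
+ * is replaced with a constant zero of the matching type:
+ *
+ *    float n = 0.0;
+ */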
+
+#include "ir.h"
+#include "ir_rvalue_visitor.h"
+
+class lower_noise_visitor : public ir_rvalue_visitor {
+public:
+   lower_noise_visitor() : progress(false)
+   {
+      /* empty */
+   }
+
+   void handle_rvalue(ir_rvalue **rvalue)
+   {
+      if (!*rvalue)
+	 return;
+
+      ir_expression *expr = (*rvalue)->as_expression();
+      if (!expr)
+	 return;
+
+      /* In the future, ir_unop_noise may be replaced by a call to a function
+       * that implements noise.  No hardware has a noise instruction.
+       */
+      if (expr->operation == ir_unop_noise) {
+	 *rvalue = ir_constant::zero(ralloc_parent(expr), expr->type);
+	 this->progress = true;
+      }
+   }
+
+   bool progress;
+};
+
+
+bool
+lower_noise(exec_list *instructions)
+{
+   lower_noise_visitor v;
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/lower_offset_array.cpp b/icd/intel/compiler/shader/lower_offset_array.cpp
new file mode 100644
index 0000000..0c235ed
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_offset_array.cpp
@@ -0,0 +1,90 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_offset_array.cpp
+ *
+ * IR lower pass to decompose an ir_texture ir_tg4 that has an array of
+ * offsets into four ir_tg4s, each with a single ivec2 offset, selecting the
+ * .w component of each and packing the four values into a gvec4.
+ *
+ * \author Chris Forbes <chrisf@ijw.co.nz>
+ */
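+
+/* In GLSL terms (a sketch), a gather such as
+ *
+ *    result = textureGatherOffsets(sampler, coord, offsets);
+ *
+ * is decomposed into
+ *
+ *    result.x = textureGatherOffset(sampler, coord, offsets[0]).w;
+ *    result.y = textureGatherOffset(sampler, coord, offsets[1]).w;
+ *    result.z = textureGatherOffset(sampler, coord, offsets[2]).w;
+ *    result.w = textureGatherOffset(sampler, coord, offsets[3]).w;
+ */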
+
+#include "glsl_types.h"
+#include "ir.h"
+#include "ir_builder.h"
+#include "ir_optimization.h"
+#include "ir_rvalue_visitor.h"
+
+using namespace ir_builder;
+
+class brw_lower_offset_array_visitor : public ir_rvalue_visitor {
+public:
+   brw_lower_offset_array_visitor()
+   {
+      progress = false;
+   }
+
+   void handle_rvalue(ir_rvalue **rv);
+
+   bool progress;
+};
+
+void
+brw_lower_offset_array_visitor::handle_rvalue(ir_rvalue **rv)
+{
+   if (*rv == NULL || (*rv)->ir_type != ir_type_texture)
+      return;
+
+   ir_texture *ir = (ir_texture *) *rv;
+   if (ir->op != ir_tg4 || !ir->offset || !ir->offset->type->is_array())
+      return;
+
+   void *mem_ctx = ralloc_parent(ir);
+
+   ir_variable *var = new (mem_ctx) ir_variable(ir->type, "result", ir_var_auto);
+   base_ir->insert_before(var);
+
+   for (int i = 0; i < 4; i++) {
+      ir_texture *tex = ir->clone(mem_ctx, NULL);
+      tex->offset = new (mem_ctx) ir_dereference_array(tex->offset,
+            new (mem_ctx) ir_constant(i));
+
+      base_ir->insert_before(assign(var, swizzle_w(tex), 1 << i));
+   }
+
+   *rv = new (mem_ctx) ir_dereference_variable(var);
+
+   progress = true;
+}
+
+bool
+lower_offset_arrays(exec_list *instructions)
+{
+   brw_lower_offset_array_visitor v;
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/lower_output_reads.cpp b/icd/intel/compiler/shader/lower_output_reads.cpp
new file mode 100644
index 0000000..afe1776
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_output_reads.cpp
@@ -0,0 +1,173 @@
+/*
+ * Copyright © 2012 Vincent Lejeune
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ir.h"
+#include "program/hash_table.h"
+
+/**
+ * \file lower_output_reads.cpp
+ *
+ * In GLSL, shader output variables (such as varyings) can be both read and
+ * written.  However, on some hardware, reading an output register causes
+ * trouble.
+ *
+ * This pass creates temporary shadow copies of every (used) shader output,
+ * and replaces all accesses to use those instead.  It also adds code to the
+ * main() function to copy the final values to the actual shader outputs.
+ */
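+
+/* A sketch of the transformation in GLSL-level pseudocode:
+ *
+ *    out vec4 color;
+ *    void main() { color = a; color *= b; }
+ *
+ * becomes, in effect,
+ *
+ *    out vec4 color;
+ *    vec4 tmp;
+ *    void main() { tmp = a; tmp *= b; color = tmp; }
+ *
+ * so the output itself is only ever written, never read.
+ */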
+
+namespace {
+
+class output_read_remover : public ir_hierarchical_visitor {
+protected:
+   /**
+    * A hash table mapping from the original ir_variable shader outputs
+    * (ir_var_shader_out mode) to the new temporaries to be used instead.
+    */
+   hash_table *replacements;
+
+   void *mem_ctx;
+public:
+   output_read_remover();
+   ~output_read_remover();
+   virtual ir_visitor_status visit(class ir_dereference_variable *);
+   virtual ir_visitor_status visit(class ir_emit_vertex *);
+   virtual ir_visitor_status visit_leave(class ir_return *);
+   virtual ir_visitor_status visit_leave(class ir_function_signature *);
+};
+
+} /* anonymous namespace */
+
+/**
+ * Hash function for the output variables - computes the hash of the name.
+ * NOTE: We hash the name string so that the hash doesn't depend on any
+ * run-to-run factors such as pointer values; otherwise output_read_remover
+ * could emit the copy-back assignments in a nondeterministic order.
+ *
+ * NOTE: If you want to reuse this function, take into account that variable
+ * names are generally non-unique.
+ */
+static unsigned
+hash_table_var_hash(const void *key)
+{
+   const ir_variable * var = static_cast<const ir_variable *>(key);
+   return hash_table_string_hash(var->name);
+}
+
+output_read_remover::output_read_remover()
+{
+   mem_ctx = ralloc_context(NULL);
+   replacements =
+      hash_table_ctor(0, hash_table_var_hash, hash_table_pointer_compare);
+}
+
+output_read_remover::~output_read_remover()
+{
+   hash_table_dtor(replacements);
+   ralloc_free(mem_ctx);
+}
+
+ir_visitor_status
+output_read_remover::visit(ir_dereference_variable *ir)
+{
+   if (ir->var->data.mode != ir_var_shader_out)
+      return visit_continue;
+
+   ir_variable *temp = (ir_variable *) hash_table_find(replacements, ir->var);
+
+   /* If we don't have an existing temporary, create one. */
+   if (temp == NULL) {
+      void *var_ctx = ralloc_parent(ir->var);
+      temp = new(var_ctx) ir_variable(ir->var->type, ir->var->name,
+                                      ir_var_temporary);
+      hash_table_insert(replacements, temp, ir->var);
+      ir->var->insert_after(temp);
+   }
+
+   /* Update the dereference to use the temporary */
+   ir->var = temp;
+
+   return visit_continue;
+}
+
+/**
+ * Create an assignment to copy a temporary value back to the actual output.
+ */
+static ir_assignment *
+copy(void *ctx, ir_variable *output, ir_variable *temp)
+{
+   ir_dereference_variable *lhs = new(ctx) ir_dereference_variable(output);
+   ir_dereference_variable *rhs = new(ctx) ir_dereference_variable(temp);
+   return new(ctx) ir_assignment(lhs, rhs);
+}
+
+/** Insert a copy-back assignment before a "return" statement or a call to
+ * EmitVertex().
+ */
+static void
+emit_return_copy(const void *key, void *data, void *closure)
+{
+   ir_return *ir = (ir_return *) closure;
+   ir->insert_before(copy(ir, (ir_variable *) key, (ir_variable *) data));
+}
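+
+/* Note that visit(ir_emit_vertex *) below reuses emit_return_copy with an
+ * ir_emit_vertex as the closure; only ir_instruction::insert_before() is
+ * used, so the ir_return cast above is harmless in practice.
+ */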
+
+/** Insert a copy-back assignment at the end of the main() function */
+static void
+emit_main_copy(const void *key, void *data, void *closure)
+{
+   ir_function_signature *sig = (ir_function_signature *) closure;
+   sig->body.push_tail(copy(sig, (ir_variable *) key, (ir_variable *) data));
+}
+
+ir_visitor_status
+output_read_remover::visit_leave(ir_return *ir)
+{
+   hash_table_call_foreach(replacements, emit_return_copy, ir);
+   return visit_continue;
+}
+
+ir_visitor_status
+output_read_remover::visit(ir_emit_vertex *ir)
+{
+   hash_table_call_foreach(replacements, emit_return_copy, ir);
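+   /* Output values are undefined after EmitVertex(), so drop the shadow
+    * copies; any reads that follow will create fresh temporaries.
+    */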
+   hash_table_clear(replacements);
+   return visit_continue;
+}
+
+ir_visitor_status
+output_read_remover::visit_leave(ir_function_signature *sig)
+{
+   if (strcmp(sig->function_name(), "main") != 0)
+      return visit_continue;
+
+   hash_table_call_foreach(replacements, emit_main_copy, sig);
+   return visit_continue;
+}
+
+void
+lower_output_reads(exec_list *instructions)
+{
+   output_read_remover v;
+   visit_list_elements(&v, instructions);
+}
diff --git a/icd/intel/compiler/shader/lower_packed_varyings.cpp b/icd/intel/compiler/shader/lower_packed_varyings.cpp
new file mode 100644
index 0000000..e865474
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_packed_varyings.cpp
@@ -0,0 +1,681 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_packed_varyings.cpp
+ *
+ * This lowering pass generates GLSL code that manually packs varyings into
+ * vec4 slots, for the benefit of back-ends that don't support packed varyings
+ * natively.
+ *
+ * For example, the following shader:
+ *
+ *   out mat3x2 foo;  // location=4, location_frac=0
+ *   out vec3 bar[2]; // location=5, location_frac=2
+ *
+ *   main()
+ *   {
+ *     ...
+ *   }
+ *
+ * Is rewritten to:
+ *
+ *   mat3x2 foo;
+ *   vec3 bar[2];
+ *   out vec4 packed4; // location=4, location_frac=0
+ *   out vec4 packed5; // location=5, location_frac=0
+ *   out vec4 packed6; // location=6, location_frac=0
+ *
+ *   main()
+ *   {
+ *     ...
+ *     packed4.xy = foo[0];
+ *     packed4.zw = foo[1];
+ *     packed5.xy = foo[2];
+ *     packed5.zw = bar[0].xy;
+ *     packed6.x = bar[0].z;
+ *     packed6.yzw = bar[1];
+ *   }
+ *
+ * This lowering pass properly handles "double parking" of a varying vector
+ * across two varying slots.  For example, in the code above, two of the
+ * components of bar[0] are stored in packed5, and the remaining component is
+ * stored in packed6.
+ *
+ * Note that in theory, the extra instructions may cause some loss of
+ * performance.  However, hopefully in most cases the performance loss will
+ * either be absorbed by a later optimization pass, or it will be offset by
+ * memory bandwidth savings (because fewer varyings are used).
+ *
+ * This lowering pass also packs flat floats, ints, and uints together, by
+ * using ivec4 as the base type of flat "varyings", and using appropriate
+ * casts to convert floats and uints into ints.
+ *
+ * This lowering pass also handles varyings whose type is a struct or an array
+ * of struct.  Structs are packed in order and with no gaps, so there may be a
+ * performance penalty due to structure elements being double-parked.
+ *
+ * Lowering of geometry shader inputs is slightly more complex, since geometry
+ * inputs are always arrays, so we need to lower arrays to arrays.  For
+ * example, the following input:
+ *
+ *   in struct Foo {
+ *     float f;
+ *     vec3 v;
+ *     vec2 a[2];
+ *   } arr[3];         // location=4, location_frac=0
+ *
+ * Would get lowered like this if it occurred in a fragment shader:
+ *
+ *   struct Foo {
+ *     float f;
+ *     vec3 v;
+ *     vec2 a[2];
+ *   } arr[3];
+ *   in vec4 packed4;  // location=4, location_frac=0
+ *   in vec4 packed5;  // location=5, location_frac=0
+ *   in vec4 packed6;  // location=6, location_frac=0
+ *   in vec4 packed7;  // location=7, location_frac=0
+ *   in vec4 packed8;  // location=8, location_frac=0
+ *   in vec4 packed9;  // location=9, location_frac=0
+ *
+ *   main()
+ *   {
+ *     arr[0].f = packed4.x;
+ *     arr[0].v = packed4.yzw;
+ *     arr[0].a[0] = packed5.xy;
+ *     arr[0].a[1] = packed5.zw;
+ *     arr[1].f = packed6.x;
+ *     arr[1].v = packed6.yzw;
+ *     arr[1].a[0] = packed7.xy;
+ *     arr[1].a[1] = packed7.zw;
+ *     arr[2].f = packed8.x;
+ *     arr[2].v = packed8.yzw;
+ *     arr[2].a[0] = packed9.xy;
+ *     arr[2].a[1] = packed9.zw;
+ *     ...
+ *   }
+ *
+ * But it would get lowered like this if it occurred in a geometry shader:
+ *
+ *   struct Foo {
+ *     float f;
+ *     vec3 v;
+ *     vec2 a[2];
+ *   } arr[3];
+ *   in vec4 packed4[3];  // location=4, location_frac=0
+ *   in vec4 packed5[3];  // location=5, location_frac=0
+ *
+ *   main()
+ *   {
+ *     arr[0].f = packed4[0].x;
+ *     arr[0].v = packed4[0].yzw;
+ *     arr[0].a[0] = packed5[0].xy;
+ *     arr[0].a[1] = packed5[0].zw;
+ *     arr[1].f = packed4[1].x;
+ *     arr[1].v = packed4[1].yzw;
+ *     arr[1].a[0] = packed5[1].xy;
+ *     arr[1].a[1] = packed5[1].zw;
+ *     arr[2].f = packed4[2].x;
+ *     arr[2].v = packed4[2].yzw;
+ *     arr[2].a[0] = packed5[2].xy;
+ *     arr[2].a[1] = packed5[2].zw;
+ *     ...
+ *   }
+ */
+
+#include "glsl_symbol_table.h"
+#include "ir.h"
+#include "ir_optimization.h"
+
+namespace {
+
+/**
+ * Visitor that performs varying packing.  For each varying declared in the
+ * shader, this visitor determines whether it needs to be packed.  If so, it
+ * demotes it to an ordinary global, creates new packed varyings, and
+ * generates assignments to convert between the original varying and the
+ * packed varying.
+ */
+class lower_packed_varyings_visitor
+{
+public:
+   lower_packed_varyings_visitor(void *mem_ctx, unsigned locations_used,
+                                 ir_variable_mode mode,
+                                 unsigned gs_input_vertices,
+                                 exec_list *out_instructions);
+
+   void run(exec_list *instructions);
+
+private:
+   ir_assignment *bitwise_assign_pack(ir_rvalue *lhs, ir_rvalue *rhs);
+   ir_assignment *bitwise_assign_unpack(ir_rvalue *lhs, ir_rvalue *rhs);
+   unsigned lower_rvalue(ir_rvalue *rvalue, unsigned fine_location,
+                         ir_variable *unpacked_var, const char *name,
+                         bool gs_input_toplevel, unsigned vertex_index);
+   unsigned lower_arraylike(ir_rvalue *rvalue, unsigned array_size,
+                            unsigned fine_location,
+                            ir_variable *unpacked_var, const char *name,
+                            bool gs_input_toplevel, unsigned vertex_index);
+   ir_dereference *get_packed_varying_deref(unsigned location,
+                                            ir_variable *unpacked_var,
+                                            const char *name,
+                                            unsigned vertex_index);
+   bool needs_lowering(ir_variable *var);
+
+   /**
+    * Memory context used to allocate new instructions for the shader.
+    */
+   void * const mem_ctx;
+
+   /**
+    * Number of generic varying slots which are used by this shader.  This is
+    * used to allocate temporary intermediate data structures.  If any varying
+    * used by this shader has a location greater than or equal to
+    * VARYING_SLOT_VAR0 + locations_used, an assertion will fire.
+    */
+   const unsigned locations_used;
+
+   /**
+    * Array of pointers to the packed varyings that have been created for each
+    * generic varying slot.  NULL entries in this array indicate varying slots
+    * for which a packed varying has not been created yet.
+    */
+   ir_variable **packed_varyings;
+
+   /**
+    * Type of varying which is being lowered in this pass (either
+    * ir_var_shader_in or ir_var_shader_out).
+    */
+   const ir_variable_mode mode;
+
+   /**
+    * If we are currently lowering geometry shader inputs, the number of input
+    * vertices the geometry shader accepts.  Otherwise zero.
+    */
+   const unsigned gs_input_vertices;
+
+   /**
+    * Exec list into which the visitor should insert the packing instructions.
+    * Caller provides this list; it should insert the instructions into the
+    * appropriate place in the shader once the visitor has finished running.
+    */
+   exec_list *out_instructions;
+};
+
+} /* anonymous namespace */
+
+lower_packed_varyings_visitor::lower_packed_varyings_visitor(
+      void *mem_ctx, unsigned locations_used, ir_variable_mode mode,
+      unsigned gs_input_vertices, exec_list *out_instructions)
+   : mem_ctx(mem_ctx),
+     locations_used(locations_used),
+     packed_varyings((ir_variable **)
+                     rzalloc_array_size(mem_ctx, sizeof(*packed_varyings),
+                                        locations_used)),
+     mode(mode),
+     gs_input_vertices(gs_input_vertices),
+     out_instructions(out_instructions)
+{
+}
+
+void
+lower_packed_varyings_visitor::run(exec_list *instructions)
+{
+   foreach_list(node, instructions) {
+      ir_variable *var = ((ir_instruction *) node)->as_variable();
+      if (var == NULL)
+         continue;
+
+      if (var->data.mode != this->mode ||
+          var->data.location < VARYING_SLOT_VAR0 ||
+          !this->needs_lowering(var))
+         continue;
+
+      /* This lowering pass is only capable of packing floats and ints
+       * together when their interpolation mode is "flat".  Therefore, to be
+       * safe, caller should ensure that integral varyings always use flat
+       * interpolation, even when this is not required by GLSL.
+       */
+      assert(var->data.interpolation == INTERP_QUALIFIER_FLAT ||
+             !var->type->contains_integer());
+
+      /* Change the old varying into an ordinary global. */
+      var->data.mode = ir_var_auto;
+
+      /* Create a reference to the old varying. */
+      ir_dereference_variable *deref
+         = new(this->mem_ctx) ir_dereference_variable(var);
+
+      /* Recursively pack or unpack it. */
+      this->lower_rvalue(deref, var->data.location * 4 + var->data.location_frac, var,
+                         var->name, this->gs_input_vertices != 0, 0);
+   }
+}
+
+
+/**
+ * Make an ir_assignment from \c rhs to \c lhs, performing appropriate
+ * bitcasts if necessary to match up types.
+ *
+ * This function is called when packing varyings.
+ */
+ir_assignment *
+lower_packed_varyings_visitor::bitwise_assign_pack(ir_rvalue *lhs,
+                                                   ir_rvalue *rhs)
+{
+   if (lhs->type->base_type != rhs->type->base_type) {
+      /* Since we only mix types in flat varyings, and we always store flat
+       * varyings as type ivec4, we need only produce conversions from (uint
+       * or float) to int.
+       */
+      assert(lhs->type->base_type == GLSL_TYPE_INT);
+      switch (rhs->type->base_type) {
+      case GLSL_TYPE_UINT:
+         rhs = new(this->mem_ctx)
+            ir_expression(ir_unop_u2i, lhs->type, rhs);
+         break;
+      case GLSL_TYPE_FLOAT:
+         rhs = new(this->mem_ctx)
+            ir_expression(ir_unop_bitcast_f2i, lhs->type, rhs);
+         break;
+      default:
+         assert(!"Unexpected type conversion while lowering varyings");
+         break;
+      }
+   }
+   return new(this->mem_ctx) ir_assignment(lhs, rhs);
+}
+
+
+/**
+ * Make an ir_assignment from \c rhs to \c lhs, performing appropriate
+ * bitcasts if necessary to match up types.
+ *
+ * This function is called when unpacking varyings.
+ */
+ir_assignment *
+lower_packed_varyings_visitor::bitwise_assign_unpack(ir_rvalue *lhs,
+                                                     ir_rvalue *rhs)
+{
+   if (lhs->type->base_type != rhs->type->base_type) {
+      /* Since we only mix types in flat varyings, and we always store flat
+       * varyings as type ivec4, we need only produce conversions from int to
+       * (uint or float).
+       */
+      assert(rhs->type->base_type == GLSL_TYPE_INT);
+      switch (lhs->type->base_type) {
+      case GLSL_TYPE_UINT:
+         rhs = new(this->mem_ctx)
+            ir_expression(ir_unop_i2u, lhs->type, rhs);
+         break;
+      case GLSL_TYPE_FLOAT:
+         rhs = new(this->mem_ctx)
+            ir_expression(ir_unop_bitcast_i2f, lhs->type, rhs);
+         break;
+      default:
+         assert(!"Unexpected type conversion while lowering varyings");
+         break;
+      }
+   }
+   return new(this->mem_ctx) ir_assignment(lhs, rhs);
+}
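+
+/* For example (an illustrative sketch): packing a flat float varying f into
+ * an ivec4 slot produces packed.x = bitcast_f2i(f), and the matching unpack
+ * on the consumer side produces f = bitcast_i2f(packed.x), so the bit
+ * pattern crosses the interface unchanged.  Which component is used depends
+ * on location_frac.
+ */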
+
+
+/**
+ * Recursively pack or unpack the given varying (or portion of a varying) by
+ * traversing all of its constituent vectors.
+ *
+ * \param fine_location is the location where the first constituent vector
+ * should be packed--the word "fine" indicates that this location is expressed
+ * in multiples of a float, rather than multiples of a vec4 as is used
+ * elsewhere in Mesa.
+ *
+ * \param gs_input_toplevel should be set to true if we are lowering geometry
+ * shader inputs, and we are currently lowering the whole input variable
+ * (i.e. we are lowering the array whose index selects the vertex).
+ *
+ * \param vertex_index: if we are lowering geometry shader inputs, and the
+ * level of the array that we are currently lowering is *not* the top level,
+ * then this indicates which vertex we are currently lowering.  Otherwise it
+ * is ignored.
+ *
+ * \return the location where the next constituent vector (after this one)
+ * should be packed.
+ */
+unsigned
+lower_packed_varyings_visitor::lower_rvalue(ir_rvalue *rvalue,
+                                            unsigned fine_location,
+                                            ir_variable *unpacked_var,
+                                            const char *name,
+                                            bool gs_input_toplevel,
+                                            unsigned vertex_index)
+{
+   /* When gs_input_toplevel is set, we should be looking at a geometry shader
+    * input array.
+    */
+   assert(!gs_input_toplevel || rvalue->type->is_array());
+
+   if (rvalue->type->is_record()) {
+      for (unsigned i = 0; i < rvalue->type->length; i++) {
+         if (i != 0)
+            rvalue = rvalue->clone(this->mem_ctx, NULL);
+         const char *field_name = rvalue->type->fields.structure[i].name;
+         ir_dereference_record *dereference_record = new(this->mem_ctx)
+            ir_dereference_record(rvalue, field_name);
+         char *deref_name
+            = ralloc_asprintf(this->mem_ctx, "%s.%s", name, field_name);
+         fine_location = this->lower_rvalue(dereference_record, fine_location,
+                                            unpacked_var, deref_name, false,
+                                            vertex_index);
+      }
+      return fine_location;
+   } else if (rvalue->type->is_array()) {
+      /* Arrays are packed/unpacked by considering each array element in
+       * sequence.
+       */
+      return this->lower_arraylike(rvalue, rvalue->type->array_size(),
+                                   fine_location, unpacked_var, name,
+                                   gs_input_toplevel, vertex_index);
+   } else if (rvalue->type->is_matrix()) {
+      /* Matrices are packed/unpacked by considering each column vector in
+       * sequence.
+       */
+      return this->lower_arraylike(rvalue, rvalue->type->matrix_columns,
+                                   fine_location, unpacked_var, name,
+                                   false, vertex_index);
+   } else if (rvalue->type->vector_elements + fine_location % 4 > 4) {
+      /* This vector is going to be "double parked" across two varying slots,
+       * so handle it as two separate assignments.
+       */
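+      /* E.g. a vec3 with fine_location % 4 == 2: its .xy lands in the
+       * current slot's .zw, and its .z spills into the next slot's .x.
+       */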
+      unsigned left_components = 4 - fine_location % 4;
+      unsigned right_components
+         = rvalue->type->vector_elements - left_components;
+      unsigned left_swizzle_values[4] = { 0, 0, 0, 0 };
+      unsigned right_swizzle_values[4] = { 0, 0, 0, 0 };
+      char left_swizzle_name[4] = { 0, 0, 0, 0 };
+      char right_swizzle_name[4] = { 0, 0, 0, 0 };
+      for (unsigned i = 0; i < left_components; i++) {
+         left_swizzle_values[i] = i;
+         left_swizzle_name[i] = "xyzw"[i];
+      }
+      for (unsigned i = 0; i < right_components; i++) {
+         right_swizzle_values[i] = i + left_components;
+         right_swizzle_name[i] = "xyzw"[i + left_components];
+      }
+      ir_swizzle *left_swizzle = new(this->mem_ctx)
+         ir_swizzle(rvalue, left_swizzle_values, left_components);
+      ir_swizzle *right_swizzle = new(this->mem_ctx)
+         ir_swizzle(rvalue->clone(this->mem_ctx, NULL), right_swizzle_values,
+                    right_components);
+      char *left_name
+         = ralloc_asprintf(this->mem_ctx, "%s.%s", name, left_swizzle_name);
+      char *right_name
+         = ralloc_asprintf(this->mem_ctx, "%s.%s", name, right_swizzle_name);
+      fine_location = this->lower_rvalue(left_swizzle, fine_location,
+                                         unpacked_var, left_name, false,
+                                         vertex_index);
+      return this->lower_rvalue(right_swizzle, fine_location, unpacked_var,
+                                right_name, false, vertex_index);
+   } else {
+      /* No special handling is necessary; pack the rvalue into the
+       * varying.
+       */
+      unsigned swizzle_values[4] = { 0, 0, 0, 0 };
+      unsigned components = rvalue->type->vector_elements;
+      unsigned location = fine_location / 4;
+      unsigned location_frac = fine_location % 4;
+      for (unsigned i = 0; i < components; ++i)
+         swizzle_values[i] = i + location_frac;
+      ir_dereference *packed_deref =
+         this->get_packed_varying_deref(location, unpacked_var, name,
+                                        vertex_index);
+      ir_swizzle *swizzle = new(this->mem_ctx)
+         ir_swizzle(packed_deref, swizzle_values, components);
+      if (this->mode == ir_var_shader_out) {
+         ir_assignment *assignment
+            = this->bitwise_assign_pack(swizzle, rvalue);
+         this->out_instructions->push_tail(assignment);
+      } else {
+         ir_assignment *assignment
+            = this->bitwise_assign_unpack(rvalue, swizzle);
+         this->out_instructions->push_tail(assignment);
+      }
+      return fine_location + components;
+   }
+}
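+
+/* For example, a vec2 varying at location VARYING_SLOT_VAR1 with
+ * location_frac 2 enters lower_rvalue with fine_location
+ * VARYING_SLOT_VAR1 * 4 + 2 and returns VARYING_SLOT_VAR1 * 4 + 4 as the
+ * next free fine location.
+ */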
+
+/**
+ * Recursively pack or unpack a varying for which we need to iterate over its
+ * constituent elements, accessing each one using an ir_dereference_array.
+ * This takes care of both arrays and matrices, since ir_dereference_array
+ * treats a matrix like an array of its column vectors.
+ *
+ * \param gs_input_toplevel should be set to true if we are lowering geometry
+ * shader inputs, and we are currently lowering the whole input variable
+ * (i.e. we are lowering the array whose index selects the vertex).
+ *
+ * \param vertex_index: if we are lowering geometry shader inputs, and the
+ * level of the array that we are currently lowering is *not* the top level,
+ * then this indicates which vertex we are currently lowering.  Otherwise it
+ * is ignored.
+ */
+unsigned
+lower_packed_varyings_visitor::lower_arraylike(ir_rvalue *rvalue,
+                                               unsigned array_size,
+                                               unsigned fine_location,
+                                               ir_variable *unpacked_var,
+                                               const char *name,
+                                               bool gs_input_toplevel,
+                                               unsigned vertex_index)
+{
+   for (unsigned i = 0; i < array_size; i++) {
+      if (i != 0)
+         rvalue = rvalue->clone(this->mem_ctx, NULL);
+      ir_constant *constant = new(this->mem_ctx) ir_constant(i);
+      ir_dereference_array *dereference_array = new(this->mem_ctx)
+         ir_dereference_array(rvalue, constant);
+      if (gs_input_toplevel) {
+         /* Geometry shader inputs are a special case.  Instead of storing
+          * each element of the array at a different location, all elements
+          * are at the same location, but with a different vertex index.
+          */
+         (void) this->lower_rvalue(dereference_array, fine_location,
+                                   unpacked_var, name, false, i);
+      } else {
+         char *subscripted_name
+            = ralloc_asprintf(this->mem_ctx, "%s[%d]", name, i);
+         fine_location =
+            this->lower_rvalue(dereference_array, fine_location,
+                               unpacked_var, subscripted_name,
+                               false, vertex_index);
+      }
+   }
+   return fine_location;
+}
+
+/**
+ * Retrieve the packed varying corresponding to the given varying location.
+ * If no packed varying has been created for the given varying location yet,
+ * create it and add it to the shader before returning it.
+ *
+ * The newly created varying inherits its interpolation parameters from \c
+ * unpacked_var.  Its base type is ivec4 if we are lowering a flat varying,
+ * vec4 otherwise.
+ *
+ * \param vertex_index: if we are lowering geometry shader inputs, then this
+ * indicates which vertex we are currently lowering.  Otherwise it is ignored.
+ */
+ir_dereference *
+lower_packed_varyings_visitor::get_packed_varying_deref(
+      unsigned location, ir_variable *unpacked_var, const char *name,
+      unsigned vertex_index)
+{
+   unsigned slot = location - VARYING_SLOT_VAR0;
+   assert(slot < locations_used);
+   if (this->packed_varyings[slot] == NULL) {
+      char *packed_name = ralloc_asprintf(this->mem_ctx, "packed:%s", name);
+      const glsl_type *packed_type;
+      if (unpacked_var->data.interpolation == INTERP_QUALIFIER_FLAT)
+         packed_type = glsl_type::ivec4_type;
+      else
+         packed_type = glsl_type::vec4_type;
+      if (this->gs_input_vertices != 0) {
+         packed_type =
+            glsl_type::get_array_instance(packed_type,
+                                          this->gs_input_vertices);
+      }
+      ir_variable *packed_var = new(this->mem_ctx)
+         ir_variable(packed_type, packed_name, this->mode);
+      if (this->gs_input_vertices != 0) {
+         /* Prevent update_array_sizes() from messing with the size of the
+          * array.
+          */
+         packed_var->data.max_array_access = this->gs_input_vertices - 1;
+      }
+      packed_var->data.centroid = unpacked_var->data.centroid;
+      packed_var->data.sample = unpacked_var->data.sample;
+      packed_var->data.interpolation = unpacked_var->data.interpolation;
+      packed_var->data.location = location;
+      unpacked_var->insert_before(packed_var);
+      this->packed_varyings[slot] = packed_var;
+   } else {
+      /* For geometry shader inputs, only update the packed variable name the
+       * first time we visit each component.
+       */
+      if (this->gs_input_vertices == 0 || vertex_index == 0) {
+         ralloc_asprintf_append((char **) &this->packed_varyings[slot]->name,
+                                ",%s", name);
+      }
+   }
+
+   ir_dereference *deref = new(this->mem_ctx)
+      ir_dereference_variable(this->packed_varyings[slot]);
+   if (this->gs_input_vertices != 0) {
+      /* When lowering GS inputs, the packed variable is an array, so we need
+       * to dereference it using vertex_index.
+       */
+      ir_constant *constant = new(this->mem_ctx) ir_constant(vertex_index);
+      deref = new(this->mem_ctx) ir_dereference_array(deref, constant);
+   }
+   return deref;
+}
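+
+/* The packed variable's name records everything packed into it, e.g. a name
+ * like "packed:foo,bar[0],bar[1]" (illustrative), which makes the effect of
+ * this pass easy to see in IR dumps.
+ */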
+
+bool
+lower_packed_varyings_visitor::needs_lowering(ir_variable *var)
+{
+   /* Things composed of vec4's and varyings with explicitly assigned
+    * locations don't need lowering.  Everything else does.
+    */
+   if (var->data.explicit_location)
+      return false;
+
+   const glsl_type *type = var->type;
+   if (this->gs_input_vertices != 0) {
+      assert(type->is_array());
+      type = type->element_type();
+   }
+   if (type->is_array())
+      type = type->fields.array;
+   if (type->vector_elements == 4)
+      return false;
+   return true;
+}
+
+
+/**
+ * Visitor that splices varying packing code before every use of EmitVertex()
+ * in a geometry shader.
+ */
+class lower_packed_varyings_gs_splicer : public ir_hierarchical_visitor
+{
+public:
+   explicit lower_packed_varyings_gs_splicer(void *mem_ctx,
+                                             const exec_list *instructions);
+
+   virtual ir_visitor_status visit(ir_emit_vertex *ev);
+
+private:
+   /**
+    * Memory context used to allocate new instructions for the shader.
+    */
+   void * const mem_ctx;
+
+   /**
+    * Instructions that should be spliced into place before each EmitVertex()
+    * call.
+    */
+   const exec_list *instructions;
+};
+
+
+lower_packed_varyings_gs_splicer::lower_packed_varyings_gs_splicer(
+      void *mem_ctx, const exec_list *instructions)
+   : mem_ctx(mem_ctx), instructions(instructions)
+{
+}
+
+
+ir_visitor_status
+lower_packed_varyings_gs_splicer::visit(ir_emit_vertex *ev)
+{
+   foreach_list(node, this->instructions) {
+      ir_instruction *ir = (ir_instruction *) node;
+      ev->insert_before(ir->clone(this->mem_ctx, NULL));
+   }
+   return visit_continue;
+}
+
+
+void
+lower_packed_varyings(void *mem_ctx, unsigned locations_used,
+                      ir_variable_mode mode, unsigned gs_input_vertices,
+                      gl_shader *shader)
+{
+   exec_list *instructions = shader->ir;
+   ir_function *main_func = shader->symbols->get_function("main");
+   exec_list void_parameters;
+   ir_function_signature *main_func_sig
+      = main_func->matching_signature(NULL, &void_parameters);
+   exec_list new_instructions;
+   lower_packed_varyings_visitor visitor(mem_ctx, locations_used, mode,
+                                         gs_input_vertices, &new_instructions);
+   visitor.run(instructions);
+   if (mode == ir_var_shader_out) {
+      if (shader->Stage == MESA_SHADER_GEOMETRY) {
+         /* For geometry shaders, outputs need to be lowered before each call
+          * to EmitVertex()
+          */
+         lower_packed_varyings_gs_splicer splicer(mem_ctx, &new_instructions);
+         splicer.run(instructions);
+      } else {
+         /* For other shader types, outputs need to be lowered at the end of
+          * main()
+          */
+         main_func_sig->body.append_list(&new_instructions);
+      }
+   } else {
+      /* Shader inputs need to be lowered at the beginning of main() */
+      main_func_sig->body.head->insert_before(&new_instructions);
+   }
+}
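+
+/* A hypothetical caller sketch (names here are assumptions, not part of
+ * this file): during linking, a producer/consumer pair might be lowered as
+ *
+ *    lower_packed_varyings(mem_ctx, locations_used, ir_var_shader_out,
+ *                          0, producer);
+ *    lower_packed_varyings(mem_ctx, locations_used, ir_var_shader_in,
+ *                          consumer_is_gs ? num_input_vertices : 0,
+ *                          consumer);
+ */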
diff --git a/icd/intel/compiler/shader/lower_packing_builtins.cpp b/icd/intel/compiler/shader/lower_packing_builtins.cpp
new file mode 100644
index 0000000..db73c7b
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_packing_builtins.cpp
@@ -0,0 +1,1314 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "ir.h"
+#include "ir_builder.h"
+#include "ir_optimization.h"
+#include "ir_rvalue_visitor.h"
+
+namespace {
+
+using namespace ir_builder;
+
+/**
+ * A visitor that lowers built-in floating-point pack/unpack expressions
+ * such as packSnorm2x16.
+ */
+class lower_packing_builtins_visitor : public ir_rvalue_visitor {
+public:
+   /**
+    * \param op_mask is a bitmask of `enum lower_packing_builtins_op`
+    */
+   explicit lower_packing_builtins_visitor(int op_mask)
+      : op_mask(op_mask),
+        progress(false)
+   {
+      /* Mutually exclusive options. */
+      assert(!((op_mask & LOWER_PACK_HALF_2x16) &&
+               (op_mask & LOWER_PACK_HALF_2x16_TO_SPLIT)));
+
+      assert(!((op_mask & LOWER_UNPACK_HALF_2x16) &&
+               (op_mask & LOWER_UNPACK_HALF_2x16_TO_SPLIT)));
+
+      factory.instructions = &factory_instructions;
+   }
+
+   virtual ~lower_packing_builtins_visitor()
+   {
+      assert(factory_instructions.is_empty());
+   }
+
+   bool get_progress() { return progress; }
+
+   void handle_rvalue(ir_rvalue **rvalue)
+   {
+      if (!*rvalue)
+	 return;
+
+      ir_expression *expr = (*rvalue)->as_expression();
+      if (!expr)
+	 return;
+
+      enum lower_packing_builtins_op lowering_op =
+         choose_lowering_op(expr->operation);
+
+      if (lowering_op == LOWER_PACK_UNPACK_NONE)
+         return;
+
+      setup_factory(ralloc_parent(expr));
+
+      ir_rvalue *op0 = expr->operands[0];
+      ralloc_steal(factory.mem_ctx, op0);
+
+      switch (lowering_op) {
+      case LOWER_PACK_SNORM_2x16:
+         *rvalue = lower_pack_snorm_2x16(op0);
+         break;
+      case LOWER_PACK_SNORM_4x8:
+         *rvalue = lower_pack_snorm_4x8(op0);
+         break;
+      case LOWER_PACK_UNORM_2x16:
+         *rvalue = lower_pack_unorm_2x16(op0);
+         break;
+      case LOWER_PACK_UNORM_4x8:
+         *rvalue = lower_pack_unorm_4x8(op0);
+         break;
+      case LOWER_PACK_HALF_2x16:
+         *rvalue = lower_pack_half_2x16(op0);
+         break;
+      case LOWER_PACK_HALF_2x16_TO_SPLIT:
+         *rvalue = split_pack_half_2x16(op0);
+         break;
+      case LOWER_UNPACK_SNORM_2x16:
+         *rvalue = lower_unpack_snorm_2x16(op0);
+         break;
+      case LOWER_UNPACK_SNORM_4x8:
+         *rvalue = lower_unpack_snorm_4x8(op0);
+         break;
+      case LOWER_UNPACK_UNORM_2x16:
+         *rvalue = lower_unpack_unorm_2x16(op0);
+         break;
+      case LOWER_UNPACK_UNORM_4x8:
+         *rvalue = lower_unpack_unorm_4x8(op0);
+         break;
+      case LOWER_UNPACK_HALF_2x16:
+         *rvalue = lower_unpack_half_2x16(op0);
+         break;
+      case LOWER_UNPACK_HALF_2x16_TO_SPLIT:
+         *rvalue = split_unpack_half_2x16(op0);
+         break;
+      case LOWER_PACK_UNPACK_NONE:
+         assert(!"not reached");
+         break;
+      }
+
+      teardown_factory();
+      progress = true;
+   }
+
+private:
+   const int op_mask;
+   bool progress;
+   ir_factory factory;
+   exec_list factory_instructions;
+
+   /**
+    * Determine the needed lowering operation by filtering \a expr_op
+    * through \ref op_mask.
+    */
+   enum lower_packing_builtins_op
+   choose_lowering_op(ir_expression_operation expr_op)
+   {
+      /* C++ regards int and enum as fundamentally different types.
+       * So, we can't simply return from each case; we must cast the return
+       * value.
+       */
+      int result;
+
+      switch (expr_op) {
+      case ir_unop_pack_snorm_2x16:
+         result = op_mask & LOWER_PACK_SNORM_2x16;
+         break;
+      case ir_unop_pack_snorm_4x8:
+         result = op_mask & LOWER_PACK_SNORM_4x8;
+         break;
+      case ir_unop_pack_unorm_2x16:
+         result = op_mask & LOWER_PACK_UNORM_2x16;
+         break;
+      case ir_unop_pack_unorm_4x8:
+         result = op_mask & LOWER_PACK_UNORM_4x8;
+         break;
+      case ir_unop_pack_half_2x16:
+         result = op_mask & (LOWER_PACK_HALF_2x16 | LOWER_PACK_HALF_2x16_TO_SPLIT);
+         break;
+      case ir_unop_unpack_snorm_2x16:
+         result = op_mask & LOWER_UNPACK_SNORM_2x16;
+         break;
+      case ir_unop_unpack_snorm_4x8:
+         result = op_mask & LOWER_UNPACK_SNORM_4x8;
+         break;
+      case ir_unop_unpack_unorm_2x16:
+         result = op_mask & LOWER_UNPACK_UNORM_2x16;
+         break;
+      case ir_unop_unpack_unorm_4x8:
+         result = op_mask & LOWER_UNPACK_UNORM_4x8;
+         break;
+      case ir_unop_unpack_half_2x16:
+         result = op_mask & (LOWER_UNPACK_HALF_2x16 | LOWER_UNPACK_HALF_2x16_TO_SPLIT);
+         break;
+      default:
+         result = LOWER_PACK_UNPACK_NONE;
+         break;
+      }
+
+      return static_cast<enum lower_packing_builtins_op>(result);
+   }
+
+   void
+   setup_factory(void *mem_ctx)
+   {
+      assert(factory.mem_ctx == NULL);
+      assert(factory.instructions->is_empty());
+
+      factory.mem_ctx = mem_ctx;
+   }
+
+   void
+   teardown_factory()
+   {
+      base_ir->insert_before(factory.instructions);
+      assert(factory.instructions->is_empty());
+      factory.mem_ctx = NULL;
+   }
+
+   template <typename T>
+   ir_constant*
+   constant(T x)
+   {
+      return factory.constant(x);
+   }
+
+   /**
+    * \brief Pack two uint16's into a single uint32.
+    *
+    * Interpret the given uvec2 as a uint16 pair. Pack the pair into a uint32
+    * where the least significant bits specify the first element of the pair.
+    * Return the uint32.
+    */
+   ir_rvalue*
+   pack_uvec2_to_uint(ir_rvalue *uvec2_rval)
+   {
+      assert(uvec2_rval->type == glsl_type::uvec2_type);
+
+      /* uvec2 u = UVEC2_RVAL; */
+      ir_variable *u = factory.make_temp(glsl_type::uvec2_type,
+                                          "tmp_pack_uvec2_to_uint");
+      factory.emit(assign(u, uvec2_rval));
+
+      /* return (u.y << 16) | (u.x & 0xffff); */
+      return bit_or(lshift(swizzle_y(u), constant(16u)),
+                    bit_and(swizzle_x(u), constant(0xffffu)));
+   }
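+
+   /* Worked example (sketch): pack_uvec2_to_uint of uvec2(0x1234u, 0xabcdu)
+    * yields (0xabcdu << 16) | (0x1234u & 0xffffu) == 0xabcd1234u.
+    */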
+
+   /**
+    * \brief Pack four uint8's into a single uint32.
+    *
+    * Interpret the given uvec4 as a uint8 4-tuple. Pack the 4-tuple into a
+    * uint32 where the least significant bits specify the first element of the
+    * 4-tuple. Return the uint32.
+    */
+   ir_rvalue*
+   pack_uvec4_to_uint(ir_rvalue *uvec4_rval)
+   {
+      assert(uvec4_rval->type == glsl_type::uvec4_type);
+
+      /* uvec4 u = UVEC4_RVAL; */
+      ir_variable *u = factory.make_temp(glsl_type::uvec4_type,
+                                          "tmp_pack_uvec4_to_uint");
+      factory.emit(assign(u, bit_and(uvec4_rval, constant(0xffu))));
+
+      /* return (u.w << 24) | (u.z << 16) | (u.y << 8) | u.x; */
+      return bit_or(bit_or(lshift(swizzle_w(u), constant(24u)),
+                           lshift(swizzle_z(u), constant(16u))),
+                    bit_or(lshift(swizzle_y(u), constant(8u)),
+                           swizzle_x(u)));
+   }
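+
+   /* Worked example (sketch): pack_uvec4_to_uint of
+    * uvec4(0x12u, 0x34u, 0x56u, 0x78u) yields
+    * (0x78u << 24) | (0x56u << 16) | (0x34u << 8) | 0x12u == 0x78563412u.
+    */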
+
+   /**
+    * \brief Unpack a uint32 into two uint16's.
+    *
+    * Interpret the given uint32 as a uint16 pair where the uint32's least
+    * significant bits specify the pair's first element. Return the uint16
+    * pair as a uvec2.
+    */
+   ir_rvalue*
+   unpack_uint_to_uvec2(ir_rvalue *uint_rval)
+   {
+      assert(uint_rval->type == glsl_type::uint_type);
+
+      /* uint u = UINT_RVAL; */
+      ir_variable *u = factory.make_temp(glsl_type::uint_type,
+                                          "tmp_unpack_uint_to_uvec2_u");
+      factory.emit(assign(u, uint_rval));
+
+      /* uvec2 u2; */
+      ir_variable *u2 = factory.make_temp(glsl_type::uvec2_type,
+                                           "tmp_unpack_uint_to_uvec2_u2");
+
+      /* u2.x = u & 0xffffu; */
+      factory.emit(assign(u2, bit_and(u, constant(0xffffu)), WRITEMASK_X));
+
+      /* u2.y = u >> 16u; */
+      factory.emit(assign(u2, rshift(u, constant(16u)), WRITEMASK_Y));
+
+      return deref(u2).val;
+   }
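+
+   /* Worked example (sketch): unpack_uint_to_uvec2 of 0xabcd1234u yields
+    * uvec2(0x1234u, 0xabcdu), the inverse of pack_uvec2_to_uint above.
+    */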
+
+   /**
+    * \brief Unpack a uint32 into four uint8's.
+    *
+    * Interpret the given uint32 as a uint8 4-tuple where the uint32's least
+    * significant bits specify the 4-tuple's first element. Return the uint8
+    * 4-tuple as a uvec4.
+    */
+   ir_rvalue*
+   unpack_uint_to_uvec4(ir_rvalue *uint_rval)
+   {
+      assert(uint_rval->type == glsl_type::uint_type);
+
+      /* uint u = UINT_RVAL; */
+      ir_variable *u = factory.make_temp(glsl_type::uint_type,
+                                          "tmp_unpack_uint_to_uvec4_u");
+      factory.emit(assign(u, uint_rval));
+
+      /* uvec4 u4; */
+      ir_variable *u4 = factory.make_temp(glsl_type::uvec4_type,
+                                           "tmp_unpack_uint_to_uvec4_u4");
+
+      /* u4.x = u & 0xffu; */
+      factory.emit(assign(u4, bit_and(u, constant(0xffu)), WRITEMASK_X));
+
+      /* u4.y = (u >> 8u) & 0xffu; */
+      factory.emit(assign(u4, bit_and(rshift(u, constant(8u)),
+                                      constant(0xffu)), WRITEMASK_Y));
+
+      /* u4.z = (u >> 16u) & 0xffu; */
+      factory.emit(assign(u4, bit_and(rshift(u, constant(16u)),
+                                      constant(0xffu)), WRITEMASK_Z));
+
+      /* u4.w = (u >> 24u); */
+      factory.emit(assign(u4, rshift(u, constant(24u)), WRITEMASK_W));
+
+      return deref(u4).val;
+   }
+
+   /**
+    * \brief Lower a packSnorm2x16 expression.
+    *
+    * \param vec2_rval is packSnorm2x16's input
+    * \return packSnorm2x16's output as a uint rvalue
+    */
+   ir_rvalue*
+   lower_pack_snorm_2x16(ir_rvalue *vec2_rval)
+   {
+      /* From page 88 (94 of pdf) of the GLSL ES 3.00 spec:
+       *
+       *    highp uint packSnorm2x16(vec2 v)
+       *    --------------------------------
+       *    First, converts each component of the normalized floating-point value
+       *    v into 16-bit integer values. Then, the results are packed into the
+       *    returned 32-bit unsigned integer.
+       *
+       *    The conversion for component c of v to fixed point is done as
+       *    follows:
+       *
+       *       packSnorm2x16: round(clamp(c, -1, +1) * 32767.0)
+       *
+       *    The first component of the vector will be written to the least
+       *    significant bits of the output; the last component will be written to
+       *    the most significant bits.
+       *
+       * This function generates IR that approximates the following pseudo-GLSL:
+       *
+       *     return pack_uvec2_to_uint(
+       *         uvec2(ivec2(
+       *           round(clamp(VEC2_RVALUE, -1.0f, 1.0f) * 32767.0f))));
+       *
+       * It is necessary to first convert the vec2 to ivec2 rather than directly
+       * converting vec2 to uvec2 because the latter conversion is undefined.
+       * From page 56 (62 of pdf) of the GLSL ES 3.00 spec: "It is undefined to
+       * convert a negative floating point value to an uint".
+       */
+      assert(vec2_rval->type == glsl_type::vec2_type);
+
+      ir_rvalue *result = pack_uvec2_to_uint(
+            i2u(f2i(round_even(mul(clamp(vec2_rval,
+                                         constant(-1.0f),
+                                         constant(1.0f)),
+                                   constant(32767.0f))))));
+
+      assert(result->type == glsl_type::uint_type);
+      return result;
+   }
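+
+   /* Worked example (sketch): packSnorm2x16(vec2(-1.0, 1.0)) clamps and
+    * scales to ivec2(-32767, 32767), i.e. the int16 pair (0x8001, 0x7fff),
+    * so the packed result is 0x7fff8001u.
+    */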
+
+   /**
+    * \brief Lower a packSnorm4x8 expression.
+    *
+    * \param vec4_rval is packSnorm4x8's input
+    * \return packSnorm4x8's output as a uint rvalue
+    */
+   ir_rvalue*
+   lower_pack_snorm_4x8(ir_rvalue *vec4_rval)
+   {
+      /* From page 137 (143 of pdf) of the GLSL 4.30 spec:
+       *
+       *    highp uint packSnorm4x8(vec4 v)
+       *    -------------------------------
+       *    First, converts each component of the normalized floating-point value
+       *    v into 8-bit integer values. Then, the results are packed into the
+       *    returned 32-bit unsigned integer.
+       *
+       *    The conversion for component c of v to fixed point is done as
+       *    follows:
+       *
+       *       packSnorm4x8: round(clamp(c, -1, +1) * 127.0)
+       *
+       *    The first component of the vector will be written to the least
+       *    significant bits of the output; the last component will be written to
+       *    the most significant bits.
+       *
+       * This function generates IR that approximates the following pseudo-GLSL:
+       *
+       *     return pack_uvec4_to_uint(
+       *         uvec4(ivec4(
+       *           round(clamp(VEC4_RVALUE, -1.0f, 1.0f) * 127.0f))));
+       *
+       * It is necessary to first convert the vec4 to ivec4 rather than directly
+       * converting vec4 to uvec4 because the latter conversion is undefined.
+       * From page 87 (93 of pdf) of the GLSL 4.30 spec: "It is undefined to
+       * convert a negative floating point value to an uint".
+       */
+      assert(vec4_rval->type == glsl_type::vec4_type);
+
+      ir_rvalue *result = pack_uvec4_to_uint(
+            i2u(f2i(round_even(mul(clamp(vec4_rval,
+                                         constant(-1.0f),
+                                         constant(1.0f)),
+                                   constant(127.0f))))));
+
+      assert(result->type == glsl_type::uint_type);
+      return result;
+   }
+
+   /**
+    * \brief Lower an unpackSnorm2x16 expression.
+    *
+    * \param uint_rval is unpackSnorm2x16's input
+    * \return unpackSnorm2x16's output as a vec2 rvalue
+    */
+   ir_rvalue*
+   lower_unpack_snorm_2x16(ir_rvalue *uint_rval)
+   {
+      /* From page 88 (94 of pdf) of the GLSL ES 3.00 spec:
+       *
+       *    highp vec2 unpackSnorm2x16 (highp uint p)
+       *    -----------------------------------------
+       *    First, unpacks a single 32-bit unsigned integer p into a pair of
+       *    16-bit unsigned integers. Then, each component is converted to
+       *    a normalized floating-point value to generate the returned
+       *    two-component vector.
+       *
+       *    The conversion for unpacked fixed-point value f to floating point is
+       *    done as follows:
+       *
+       *       unpackSnorm2x16: clamp(f / 32767.0, -1,+1)
+       *
+       *    The first component of the returned vector will be extracted from the
+       *    least significant bits of the input; the last component will be
+       *    extracted from the most significant bits.
+       *
+       * This function generates IR that approximates the following pseudo-GLSL:
+       *
+       *    return clamp(
+       *       ((ivec2(unpack_uint_to_uvec2(UINT_RVALUE)) << 16) >> 16) / 32767.0f,
+       *       -1.0f, 1.0f);
+       *
+       * The above IR may appear unnecessarily complex, but the intermediate
+       * conversion to ivec2 and the bit shifts are necessary to correctly unpack
+       * negative floats.
+       *
+       * To see why, consider packing and then unpacking vec2(-1.0, 0.0).
+       * packSnorm2x16 encodes -1.0 as the int16 0xffff. During unpacking, we
+       * place that int16 into an int32, which results in the *positive* integer
+       * 0x0000ffff.  The int16's sign bit becomes, in the int32, the rather
+       * unimportant bit 16. We must now extend the int16's sign bit into bits
+       * 17-32, which is accomplished by left-shifting then right-shifting.
+       */
+
+      assert(uint_rval->type == glsl_type::uint_type);
+
+      ir_rvalue *result =
+        clamp(div(i2f(rshift(lshift(u2i(unpack_uint_to_uvec2(uint_rval)),
+                                    constant(16)),
+                             constant(16u))),
+                  constant(32767.0f)),
+              constant(-1.0f),
+              constant(1.0f));
+
+      assert(result->type == glsl_type::vec2_type);
+      return result;
+   }
+
+   /**
+    * \brief Lower an unpackSnorm4x8 expression.
+    *
+    * \param uint_rval is unpackSnorm4x8's input
+    * \return unpackSnorm4x8's output as a vec4 rvalue
+    */
+   ir_rvalue*
+   lower_unpack_snorm_4x8(ir_rvalue *uint_rval)
+   {
+      /* From page 137 (143 of pdf) of the GLSL 4.30 spec:
+       *
+       *    highp vec4 unpackSnorm4x8 (highp uint p)
+       *    ----------------------------------------
+       *    First, unpacks a single 32-bit unsigned integer p into four
+       *    8-bit unsigned integers. Then, each component is converted to
+       *    a normalized floating-point value to generate the returned
+       *    four-component vector.
+       *
+       *    The conversion for unpacked fixed-point value f to floating point is
+       *    done as follows:
+       *
+       *       unpackSnorm4x8: clamp(f / 127.0, -1, +1)
+       *
+       *    The first component of the returned vector will be extracted from the
+       *    least significant bits of the input; the last component will be
+       *    extracted from the most significant bits.
+       *
+       * This function generates IR that approximates the following pseudo-GLSL:
+       *
+       *    return clamp(
+       *       ((ivec4(unpack_uint_to_uvec4(UINT_RVALUE)) << 24) >> 24) / 127.0f,
+       *       -1.0f, 1.0f);
+       *
+       * The above IR may appear unnecessarily complex, but the intermediate
+       * conversion to ivec4 and the bit shifts are necessary to correctly unpack
+       * negative floats.
+       *
+       * To see why, consider packing and then unpacking vec4(-1.0, 0.0, 0.0,
+       * 0.0). packSnorm4x8 encodes -1.0 as the int8 0xff. During unpacking, we
+       * place that int8 into an int32, which results in the *positive* integer
+       * 0x000000ff.  The int8's sign bit becomes, in the int32, the rather
+       * unimportant bit 8. We must now extend the int8's sign bit into bits
+       * 9-32, which is accomplished by left-shifting then right-shifting.
+       */
+
+      assert(uint_rval->type == glsl_type::uint_type);
+
+      ir_rvalue *result =
+        clamp(div(i2f(rshift(lshift(u2i(unpack_uint_to_uvec4(uint_rval)),
+                                    constant(24u)),
+                             constant(24u))),
+                  constant(127.0f)),
+              constant(-1.0f),
+              constant(1.0f));
+
+      assert(result->type == glsl_type::vec4_type);
+      return result;
+   }
+
+   /**
+    * \brief Lower a packUnorm2x16 expression.
+    *
+    * \param vec2_rval is packUnorm2x16's input
+    * \return packUnorm2x16's output as a uint rvalue
+    */
+   ir_rvalue*
+   lower_pack_unorm_2x16(ir_rvalue *vec2_rval)
+   {
+      /* From page 88 (94 of pdf) of the GLSL ES 3.00 spec:
+       *
+       *    highp uint packUnorm2x16 (vec2 v)
+       *    ---------------------------------
+       *    First, converts each component of the normalized floating-point value
+       *    v into 16-bit integer values. Then, the results are packed into the
+       *    returned 32-bit unsigned integer.
+       *
+       *    The conversion for component c of v to fixed point is done as
+       *    follows:
+       *
+       *       packUnorm2x16: round(clamp(c, 0, +1) * 65535.0)
+       *
+       *    The first component of the vector will be written to the least
+       *    significant bits of the output; the last component will be written to
+       *    the most significant bits.
+       *
+       * This function generates IR that approximates the following pseudo-GLSL:
+       *
+       *     return pack_uvec2_to_uint(uvec2(
+       *                round(clamp(VEC2_RVALUE, 0.0f, 1.0f) * 65535.0f)));
+       *
+       * Here it is safe to directly convert the vec2 to uvec2 because the
+       * vec2 has been clamped to a non-negative range.
+       */
+
+      assert(vec2_rval->type == glsl_type::vec2_type);
+
+      ir_rvalue *result = pack_uvec2_to_uint(
+         f2u(round_even(mul(saturate(vec2_rval), constant(65535.0f)))));
+
+      assert(result->type == glsl_type::uint_type);
+      return result;
+   }
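+
+   /* Worked example (illustrative, not part of the generated IR):
+    * packUnorm2x16(vec2(0.5, 1.0)) computes round(0.5 * 65535.0) =
+    * round(32767.5) = 32768 = 0x8000 under round-to-even, and
+    * round(1.0 * 65535.0) = 65535 = 0xffff.  The second component lands in
+    * the most significant bits, giving 0xffff8000u.
+    */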
+
+   /**
+    * \brief Lower a packUnorm4x8 expression.
+    *
+    * \param vec4_rval is packUnorm4x8's input
+    * \return packUnorm4x8's output as a uint rvalue
+    */
+   ir_rvalue*
+   lower_pack_unorm_4x8(ir_rvalue *vec4_rval)
+   {
+      /* From page 137 (143 of pdf) of the GLSL 4.30 spec:
+       *
+       *    highp uint packUnorm4x8 (vec4 v)
+       *    --------------------------------
+       *    First, converts each component of the normalized floating-point value
+       *    v into 8-bit integer values. Then, the results are packed into the
+       *    returned 32-bit unsigned integer.
+       *
+       *    The conversion for component c of v to fixed point is done as
+       *    follows:
+       *
+       *       packUnorm4x8: round(clamp(c, 0, +1) * 255.0)
+       *
+       *    The first component of the vector will be written to the least
+       *    significant bits of the output; the last component will be written to
+       *    the most significant bits.
+       *
+       * This function generates IR that approximates the following pseudo-GLSL:
+       *
+       *     return pack_uvec4_to_uint(uvec4(
+       *                round(clamp(VEC4_RVALUE, 0.0f, 1.0f) * 255.0f)));
+       *
+       * Here it is safe to directly convert the vec4 to uvec4 because the
+       * vec4 has been clamped to a non-negative range.
+       */
+
+      assert(vec4_rval->type == glsl_type::vec4_type);
+
+      ir_rvalue *result = pack_uvec4_to_uint(
+         f2u(round_even(mul(saturate(vec4_rval), constant(255.0f)))));
+
+      assert(result->type == glsl_type::uint_type);
+      return result;
+   }
+
+   /**
+    * \brief Lower an unpackUnorm2x16 expression.
+    *
+    * \param uint_rval is unpackUnorm2x16's input
+    * \return unpackUnorm2x16's output as a vec2 rvalue
+    */
+   ir_rvalue*
+   lower_unpack_unorm_2x16(ir_rvalue *uint_rval)
+   {
+      /* From page 89 (95 of pdf) of the GLSL ES 3.00 spec:
+       *
+       *    highp vec2 unpackUnorm2x16 (highp uint p)
+       *    -----------------------------------------
+       *    First, unpacks a single 32-bit unsigned integer p into a pair of
+       *    16-bit unsigned integers. Then, each component is converted to
+       *    a normalized floating-point value to generate the returned
+       *    two-component vector.
+       *
+       *    The conversion for unpacked fixed-point value f to floating point is
+       *    done as follows:
+       *
+       *       unpackUnorm2x16: f / 65535.0
+       *
+       *    The first component of the returned vector will be extracted from the
+       *    least significant bits of the input; the last component will be
+       *    extracted from the most significant bits.
+       *
+       * This function generates IR that approximates the following pseudo-GLSL:
+       *
+       *     return vec2(unpack_uint_to_uvec2(UINT_RVALUE)) / 65535.0;
+       */
+
+      assert(uint_rval->type == glsl_type::uint_type);
+
+      ir_rvalue *result = div(u2f(unpack_uint_to_uvec2(uint_rval)),
+                              constant(65535.0f));
+
+      assert(result->type == glsl_type::vec2_type);
+      return result;
+   }
+
+   /**
+    * \brief Lower an unpackUnorm4x8 expression.
+    *
+    * \param uint_rval is unpackUnorm4x8's input
+    * \return unpackUnorm4x8's output as a vec4 rvalue
+    */
+   ir_rvalue*
+   lower_unpack_unorm_4x8(ir_rvalue *uint_rval)
+   {
+      /* From page 137 (143 of pdf) of the GLSL 4.30 spec:
+       *
+       *    highp vec4 unpackUnorm4x8 (highp uint p)
+       *    ----------------------------------------
+       *    First, unpacks a single 32-bit unsigned integer p into four
+       *    8-bit unsigned integers. Then, each component is converted to
+       *    a normalized floating-point value to generate the returned
+       *    four-component vector.
+       *
+       *    The conversion for unpacked fixed-point value f to floating point is
+       *    done as follows:
+       *
+       *       unpackUnorm4x8: f / 255.0
+       *
+       *    The first component of the returned vector will be extracted from the
+       *    least significant bits of the input; the last component will be
+       *    extracted from the most significant bits.
+       *
+       * This function generates IR that approximates the following pseudo-GLSL:
+       *
+       *     return vec4(unpack_uint_to_uvec4(UINT_RVALUE)) / 255.0;
+       */
+
+      assert(uint_rval->type == glsl_type::uint_type);
+
+      ir_rvalue *result = div(u2f(unpack_uint_to_uvec4(uint_rval)),
+                              constant(255.0f));
+
+      assert(result->type == glsl_type::vec4_type);
+      return result;
+   }
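+
+   /* Worked example (illustrative, not part of the generated IR):
+    * unpackUnorm4x8(0xff000080u) unpacks the bytes (0x80, 0x00, 0x00, 0xff)
+    * from least to most significant and divides each by 255.0, giving
+    * approximately vec4(0.502, 0.0, 0.0, 1.0).
+    */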
+
+   /**
+    * \brief Lower the component-wise calculation of packHalf2x16.
+    *
+    * \param f_rval is one component of packHalf2x16's input
+    * \param e_rval is the unshifted exponent bits of f_rval
+    * \param m_rval is the unshifted mantissa bits of f_rval
+    *
+    * \return a uint rvalue that encodes a float16 in its lower 16 bits
+    */
+   ir_rvalue*
+   pack_half_1x16_nosign(ir_rvalue *f_rval,
+                         ir_rvalue *e_rval,
+                         ir_rvalue *m_rval)
+   {
+      assert(e_rval->type == glsl_type::uint_type);
+      assert(m_rval->type == glsl_type::uint_type);
+
+      /* uint u16; */
+      ir_variable *u16 = factory.make_temp(glsl_type::uint_type,
+                                           "tmp_pack_half_1x16_u16");
+
+      /* float f = FLOAT_RVAL; */
+      ir_variable *f = factory.make_temp(glsl_type::float_type,
+                                          "tmp_pack_half_1x16_f");
+      factory.emit(assign(f, f_rval));
+
+      /* uint e = E_RVAL; */
+      ir_variable *e = factory.make_temp(glsl_type::uint_type,
+                                          "tmp_pack_half_1x16_e");
+      factory.emit(assign(e, e_rval));
+
+      /* uint m = M_RVAL; */
+      ir_variable *m = factory.make_temp(glsl_type::uint_type,
+                                          "tmp_pack_half_1x16_m");
+      factory.emit(assign(m, m_rval));
+
+      /* Preliminaries
+       * -------------
+       *
+       * For a float16, the bit layout is:
+       *
+       *   sign:     15
+       *   exponent: 10:14
+       *   mantissa: 0:9
+       *
+       * Let f16 be a float16 value. The sign, exponent, and mantissa
+       * determine its value thus:
+       *
+       *   if e16 = 0 and m16 = 0, then zero:       (-1)^s16 * 0                               (1)
+       *   if e16 = 0 and m16 != 0, then subnormal: (-1)^s16 * 2^(e16 - 14) * (m16 / 2^10)     (2)
+       *   if 0 < e16 < 31, then normal:            (-1)^s16 * 2^(e16 - 15) * (1 + m16 / 2^10) (3)
+       *   if e16 = 31 and m16 = 0, then infinite:  (-1)^s16 * inf                             (4)
+       *   if e16 = 31 and m16 != 0, then           NaN                                        (5)
+       *
+       * where 0 <= m16 < 2^10.
+       *
+       * For a float32, the bit layout is:
+       *
+       *   sign:     31
+       *   exponent: 23:30
+       *   mantissa: 0:22
+       *
+       * Let f32 be a float32 value. The sign, exponent, and mantissa
+       * determine its value thus:
+       *
+       *   if e32 = 0 and m32 = 0, then zero:        (-1)^s * 0                                (10)
+       *   if e32 = 0 and m32 != 0, then subnormal:  (-1)^s * 2^(e32 - 126) * (m32 / 2^23)     (11)
+       *   if 0 < e32 < 255, then normal:            (-1)^s * 2^(e32 - 127) * (1 + m32 / 2^23) (12)
+       *   if e32 = 255 and m32 = 0, then infinite:  (-1)^s * inf                              (13)
+       *   if e32 = 255 and m32 != 0, then           NaN                                       (14)
+       *
+       * where 0 <= m32 < 2^23.
+       *
+       * The minimum and maximum normal float16 values are
+       *
+       *   min_norm16 = 2^(1 - 15) * (1 + 0 / 2^10) = 2^(-14)   (20)
+       *   max_norm16 = 2^(30 - 15) * (1 + 1023 / 2^10)         (21)
+       *
+       * The step at max_norm16 is
+       *
+       *   max_step16 = 2^5                                     (22)
+       *
+       * Observe that the float16 boundary values in equations 20-21 lie in the
+       * range of normal float32 values.
+       *
+       *
+       * Rounding Behavior
+       * -----------------
+       * Not all float32 values can be exactly represented as a float16. We
+       * round all such intermediate float32 values to the nearest float16; if
+       * the float32 is exactly between two float16 values, we round to the one
+       * with an even mantissa. This rounding behavior has several benefits:
+       *
+       *   - It has no sign bias.
+       *
+       *   - It reproduces the behavior of real hardware: opcode F32TO16 in Intel's
+       *     GPU ISA.
+       *
+       *   - By reproducing the behavior of the GPU (at least on Intel hardware),
+       *     compile-time evaluation of constant packHalf2x16 GLSL expressions will
+       *     result in the same value as if the expression were executed on the
+       *     GPU.
+       *
+       * Calculation
+       * -----------
+       * Our task is to compute s16, e16, m16 given f32.  Since this function
+       * ignores the sign bit, assume that s32 = s16 = 0.  There are several
+       * cases to consider.
+       */
+
+      factory.emit(
+
+         /* Case 1) f32 is NaN
+          *
+          *   The resultant f16 will also be NaN.
+          */
+
+         /* if (e32 == 255 && m32 != 0) { */
+         if_tree(logic_and(equal(e, constant(0xffu << 23u)),
+                           logic_not(equal(m, constant(0u)))),
+
+            assign(u16, constant(0x7fffu)),
+
+         /* Case 2) f32 lies in the range [0, min_norm16).
+          *
+          *   The resultant float16 will be either zero, subnormal, or normal.
+          *
+          *   Solving
+          *
+          *     f32 = min_norm16       (30)
+          *
+          *   gives
+          *
+          *     e32 = 113 and m32 = 0  (31)
+          *
+          *   Therefore this case occurs if and only if
+          *
+          *     e32 < 113              (32)
+          */
+
+         /* } else if (e32 < 113) { */
+         if_tree(less(e, constant(113u << 23u)),
+
+            /* u16 = uint(round_to_even(abs(f32) * float(1u << 24u))); */
+            assign(u16, f2u(round_even(mul(expr(ir_unop_abs, f),
+                                           constant((float) (1 << 24)))))),
+
+         /* Case 3) f32 lies in the range
+          *         [min_norm16, max_norm16 + max_step16).
+          *
+          *   The resultant float16 will be either normal or infinite.
+          *
+          *   Solving
+          *
+          *     f32 = max_norm16 + max_step16           (40)
+          *         = 2^15 * (1 + 1023 / 2^10) + 2^5    (41)
+          *         = 2^16                              (42)
+          *   gives
+          *
+          *     e32 = 143 and m32 = 0                   (43)
+          *
+          *   We already solved the boundary condition f32 = min_norm16 above
+          *   in equation 31. Therefore this case occurs if and only if
+          *
+          *     113 <= e32 and e32 < 143
+          */
+
+         /* } else if (e32 < 143) { */
+         if_tree(less(e, constant(143u << 23u)),
+
+            /* The addition below handles the case where the mantissa rounds
+             * up to 1024 and bumps the exponent.
+             *
+             * u16 = ((e - (112u << 23u)) >> 13u)
+             *     + round_to_even(float(m) / float(1u << 13u));
+             */
+            assign(u16, add(rshift(sub(e, constant(112u << 23u)),
+                                   constant(13u)),
+                            f2u(round_even(
+                                  div(u2f(m), constant((float) (1 << 13))))))),
+
+         /* Case 4) f32 lies in the range [max_norm16 + max_step16, inf].
+          *
+          *   The resultant float16 will be infinite.
+          *
+          *   The cases above caught all float32 values in the range
+          *   [0, max_norm16 + max_step16), so this is the fall-through case.
+          */
+
+         /* } else { */
+
+            assign(u16, constant(31u << 10u))))));
+
+         /* } */
+
+      return deref(u16).val;
+   }
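+
+   /* Worked example for Case 3 (illustrative, not part of the generated
+    * IR): for f32 = 1.0f we have e = 127u << 23u and m = 0u.  Then
+    * (e - (112u << 23u)) >> 13u = (15u << 23u) >> 13u = 15u << 10u =
+    * 0x3c00u, and the mantissa term contributes nothing, so u16 = 0x3c00u,
+    * exactly the float16 encoding of 1.0.
+    */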
+
+   /**
+    * \brief Lower a packHalf2x16 expression.
+    *
+    * \param vec2_rval is packHalf2x16's input
+    * \return packHalf2x16's output as a uint rvalue
+    */
+   ir_rvalue*
+   lower_pack_half_2x16(ir_rvalue *vec2_rval)
+   {
+      /* From page 89 (95 of pdf) of the GLSL ES 3.00 spec:
+       *
+       *    highp uint packHalf2x16 (mediump vec2 v)
+       *    ----------------------------------------
+       *    Returns an unsigned integer obtained by converting the components of
+       *    a two-component floating-point vector to the 16-bit floating-point
+       *    representation found in the OpenGL ES Specification, and then packing
+       *    these two 16-bit integers into a 32-bit unsigned integer.
+       *
+       *    The first vector component specifies the 16 least-significant bits
+       *    of the result; the second component specifies the 16 most-significant
+       *    bits.
+       */
+
+      assert(vec2_rval->type == glsl_type::vec2_type);
+
+      /* vec2 f = VEC2_RVAL; */
+      ir_variable *f = factory.make_temp(glsl_type::vec2_type,
+                                         "tmp_pack_half_2x16_f");
+      factory.emit(assign(f, vec2_rval));
+
+      /* uvec2 f32 = bitcast_f2u(f); */
+      ir_variable *f32 = factory.make_temp(glsl_type::uvec2_type,
+                                            "tmp_pack_half_2x16_f32");
+      factory.emit(assign(f32, expr(ir_unop_bitcast_f2u, f)));
+
+      /* uvec2 f16; */
+      ir_variable *f16 = factory.make_temp(glsl_type::uvec2_type,
+                                        "tmp_pack_half_2x16_f16");
+
+      /* Get f32's unshifted exponent bits.
+       *
+       *   uvec2 e = f32 & 0x7f800000u;
+       */
+      ir_variable *e = factory.make_temp(glsl_type::uvec2_type,
+                                          "tmp_pack_half_2x16_e");
+      factory.emit(assign(e, bit_and(f32, constant(0x7f800000u))));
+
+      /* Get f32's unshifted mantissa bits.
+       *
+       *   uvec2 m = f32 & 0x007fffffu;
+       */
+      ir_variable *m = factory.make_temp(glsl_type::uvec2_type,
+                                          "tmp_pack_half_2x16_m");
+      factory.emit(assign(m, bit_and(f32, constant(0x007fffffu))));
+
+      /* Set f16's exponent and mantissa bits.
+       *
+       *   f16.x = pack_half_1x16_nosign(e.x, m.x);
+       *   f16.y = pack_half_1x16_nosign(e.y, m.y);
+       */
+      factory.emit(assign(f16, pack_half_1x16_nosign(swizzle_x(f),
+                                                     swizzle_x(e),
+                                                     swizzle_x(m)),
+                           WRITEMASK_X));
+      factory.emit(assign(f16, pack_half_1x16_nosign(swizzle_y(f),
+                                                     swizzle_y(e),
+                                                     swizzle_y(m)),
+                           WRITEMASK_Y));
+
+      /* Set f16's sign bits.
+       *
+       *   f16 |= (f32 & (1u << 31u)) >> 16u;
+       */
+      factory.emit(
+         assign(f16, bit_or(f16,
+                            rshift(bit_and(f32, constant(1u << 31u)),
+                                   constant(16u)))));
+
+
+      /* return (f16.y << 16u) | f16.x; */
+      ir_rvalue *result = bit_or(lshift(swizzle_y(f16),
+                                        constant(16u)),
+                                 swizzle_x(f16));
+
+      assert(result->type == glsl_type::uint_type);
+      return result;
+   }
+
+   /**
+    * \brief Split packHalf2x16's vec2 operand into two floats.
+    *
+    * \param vec2_rval is packHalf2x16's input
+    * \return a uint rvalue
+    *
+    * Some code generators, such as the i965 fragment shader, require that all
+    * vector expressions be lowered to a sequence of scalar expressions.
+    * However, packHalf2x16 cannot be scalarized by the same mechanism as
+    * a true vector operation because its input and output have a differing
+    * number of vector components.
+    *
+    * This method scalarizes packHalf2x16 by transforming it from a unary
+    * operation having vector input to a binary operation having scalar input.
+    * That is, it transforms
+    *
+    *    packHalf2x16(VEC2_RVAL);
+    *
+    * into
+    *
+    *    vec2 v = VEC2_RVAL;
+    *    return packHalf2x16_split(v.x, v.y);
+    */
+   ir_rvalue*
+   split_pack_half_2x16(ir_rvalue *vec2_rval)
+   {
+      assert(vec2_rval->type == glsl_type::vec2_type);
+
+      ir_variable *v = factory.make_temp(glsl_type::vec2_type,
+                                         "tmp_split_pack_half_2x16_v");
+      factory.emit(assign(v, vec2_rval));
+
+      return expr(ir_binop_pack_half_2x16_split, swizzle_x(v), swizzle_y(v));
+   }
+
+   /**
+    * \brief Lower the component-wise calculation of unpackHalf2x16.
+    *
+    * Given a uint that encodes a float16 in its lower 16 bits, this function
+    * returns a uint that encodes a float32 with the same value. The sign bit
+    * of the float16 is ignored.
+    *
+    * \param e_rval is the unshifted exponent bits of a float16
+    * \param m_rval is the unshifted mantissa bits of a float16
+    * \return a uint rvalue that encodes a float32
+    */
+   ir_rvalue*
+   unpack_half_1x16_nosign(ir_rvalue *e_rval, ir_rvalue *m_rval)
+   {
+      assert(e_rval->type == glsl_type::uint_type);
+      assert(m_rval->type == glsl_type::uint_type);
+
+      /* uint u32; */
+      ir_variable *u32 = factory.make_temp(glsl_type::uint_type,
+                                           "tmp_unpack_half_1x16_u32");
+
+      /* uint e = E_RVAL; */
+      ir_variable *e = factory.make_temp(glsl_type::uint_type,
+                                          "tmp_unpack_half_1x16_e");
+      factory.emit(assign(e, e_rval));
+
+      /* uint m = M_RVAL; */
+      ir_variable *m = factory.make_temp(glsl_type::uint_type,
+                                          "tmp_unpack_half_1x16_m");
+      factory.emit(assign(m, m_rval));
+
+      /* Preliminaries
+       * -------------
+       *
+       * For a float16, the bit layout is:
+       *
+       *   sign:     15
+       *   exponent: 10:14
+       *   mantissa: 0:9
+       *
+       * Let f16 be a float16 value. The sign, exponent, and mantissa
+       * determine its value thus:
+       *
+       *   if e16 = 0 and m16 = 0, then zero:       (-1)^s16 * 0                               (1)
+       *   if e16 = 0 and m16 != 0, then subnormal: (-1)^s16 * 2^(e16 - 14) * (m16 / 2^10)     (2)
+       *   if 0 < e16 < 31, then normal:            (-1)^s16 * 2^(e16 - 15) * (1 + m16 / 2^10) (3)
+       *   if e16 = 31 and m16 = 0, then infinite:  (-1)^s16 * inf                             (4)
+       *   if e16 = 31 and m16 != 0, then           NaN                                        (5)
+       *
+       * where 0 <= m16 < 2^10.
+       *
+       * For a float32, the bit layout is:
+       *
+       *   sign: 31
+       *   exponent: 23:30
+       *   mantissa: 0:22
+       *
+       * Let f32 be a float32 value. The sign, exponent, and mantissa
+       * determine its value thus:
+       *
+       *   if e32 = 0 and m32 = 0, then zero:        (-1)^s * 0                                (10)
+       *   if e32 = 0 and m32 != 0, then subnormal:  (-1)^s * 2^(e32 - 126) * (m32 / 2^23)     (11)
+       *   if 0 < e32 < 255, then normal:            (-1)^s * 2^(e32 - 127) * (1 + m32 / 2^23) (12)
+       *   if e32 = 255 and m32 = 0, then infinite:  (-1)^s * inf                              (13)
+       *   if e32 = 255 and m32 != 0, then           NaN                                       (14)
+       *
+       * where 0 <= m32 < 2^23.
+       *
+       * Calculation
+       * -----------
+       * Our task is to compute s32, e32, m32 given f16.  Since this function
+       * ignores the sign bit, assume that s32 = s16 = 0.  There are several
+       * cases to consider.
+       */
+
+      factory.emit(
+
+         /* Case 1) f16 is zero or subnormal.
+          *
+          *   The simplest method of calculating f32 in this case is
+          *
+          *     f32 = f16                       (20)
+          *         = 2^(-14) * (m16 / 2^10)    (21)
+          *         = m16 / 2^24                 (22)
+          */
+
+         /* if (e16 == 0) { */
+         if_tree(equal(e, constant(0u)),
+
+            /* u32 = bitcast_f2u(float(m) / float(1 << 24)); */
+            assign(u32, expr(ir_unop_bitcast_f2u,
+                                div(u2f(m), constant((float)(1 << 24))))),
+
+         /* Case 2) f16 is normal.
+          *
+          *   The equation
+          *
+          *     f32 = f16                              (30)
+          *     2^(e32 - 127) * (1 + m32 / 2^23) =     (31)
+          *       2^(e16 - 15) * (1 + m16 / 2^10)
+          *
+          *   can be decomposed into two
+          *
+          *     2^(e32 - 127) = 2^(e16 - 15)           (32)
+          *     1 + m32 / 2^23 = 1 + m16 / 2^10        (33)
+          *
+          *   which solve to
+          *
+          *     e32 = e16 + 112                        (34)
+          *     m32 = m16 * 2^13                       (35)
+          */
+
+         /* } else if (e16 < 31) { */
+         if_tree(less(e, constant(31u << 10u)),
+
+              /* u32 = ((e + (112 << 10)) | m) << 13;
+               */
+              assign(u32, lshift(bit_or(add(e, constant(112u << 10u)), m),
+                                 constant(13u))),
+
+
+         /* Case 3) f16 is infinite. */
+         if_tree(equal(m, constant(0u)),
+
+                 assign(u32, constant(255u << 23u)),
+
+         /* Case 4) f16 is NaN. */
+         /* } else { */
+
+            assign(u32, constant(0x7fffffffu))))));
+
+         /* } */
+
+      return deref(u32).val;
+   }
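+
+   /* Worked example for Case 2 (illustrative, not part of the generated
+    * IR): for the float16 1.0 we have e = 15u << 10u and m = 0u.  Then
+    * ((e + (112u << 10u)) | m) << 13u = (127u << 10u) << 13u =
+    * 127u << 23u = 0x3f800000u, the float32 encoding of 1.0.
+    */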
+
+   /**
+    * \brief Lower an unpackHalf2x16 expression.
+    *
+    * \param uint_rval is unpackHalf2x16's input
+    * \return unpackHalf2x16's output as a vec2 rvalue
+    */
+   ir_rvalue*
+   lower_unpack_half_2x16(ir_rvalue *uint_rval)
+   {
+      /* From page 89 (95 of pdf) of the GLSL ES 3.00 spec:
+       *
+       *    mediump vec2 unpackHalf2x16 (highp uint v)
+       *    ------------------------------------------
+       *    Returns a two-component floating-point vector with components
+       *    obtained by unpacking a 32-bit unsigned integer into a pair of 16-bit
+       *    values, interpreting those values as 16-bit floating-point numbers
+       *    according to the OpenGL ES Specification, and converting them to
+       *    32-bit floating-point values.
+       *
+       *    The first component of the vector is obtained from the
+       *    16 least-significant bits of v; the second component is obtained
+       *    from the 16 most-significant bits of v.
+       */
+      assert(uint_rval->type == glsl_type::uint_type);
+
+      /* uint u = RVALUE;
+       * uvec2 f16 = uvec2(u & 0xffffu, u >> 16u);
+       */
+      ir_variable *f16 = factory.make_temp(glsl_type::uvec2_type,
+                                            "tmp_unpack_half_2x16_f16");
+      factory.emit(assign(f16, unpack_uint_to_uvec2(uint_rval)));
+
+      /* uvec2 f32; */
+      ir_variable *f32 = factory.make_temp(glsl_type::uvec2_type,
+                                            "tmp_unpack_half_2x16_f32");
+
+      /* Get f16's unshifted exponent bits.
+       *
+       *    uvec2 e = f16 & 0x7c00u;
+       */
+      ir_variable *e = factory.make_temp(glsl_type::uvec2_type,
+                                          "tmp_unpack_half_2x16_e");
+      factory.emit(assign(e, bit_and(f16, constant(0x7c00u))));
+
+      /* Get f16's unshifted mantissa bits.
+       *
+       *    uvec2 m = f16 & 0x03ffu;
+       */
+      ir_variable *m = factory.make_temp(glsl_type::uvec2_type,
+                                          "tmp_unpack_half_2x16_m");
+      factory.emit(assign(m, bit_and(f16, constant(0x03ffu))));
+
+      /* Set f32's exponent and mantissa bits.
+       *
+       *   f32.x = unpack_half_1x16_nosign(e.x, m.x);
+       *   f32.y = unpack_half_1x16_nosign(e.y, m.y);
+       */
+      factory.emit(assign(f32, unpack_half_1x16_nosign(swizzle_x(e),
+                                                       swizzle_x(m)),
+                           WRITEMASK_X));
+      factory.emit(assign(f32, unpack_half_1x16_nosign(swizzle_y(e),
+                                                       swizzle_y(m)),
+                           WRITEMASK_Y));
+
+      /* Set f32's sign bit.
+       *
+       *    f32 |= (f16 & 0x8000u) << 16u;
+       */
+      factory.emit(assign(f32, bit_or(f32,
+                                       lshift(bit_and(f16,
+                                                      constant(0x8000u)),
+                                              constant(16u)))));
+
+      /* return bitcast_u2f(f32); */
+      ir_rvalue *result = expr(ir_unop_bitcast_u2f, f32);
+      assert(result->type == glsl_type::vec2_type);
+      return result;
+   }
+
+   /**
+    * \brief Split unpackHalf2x16 into two operations.
+    *
+    * \param uint_rval is unpackHalf2x16's input
+    * \return a vec2 rvalue
+    *
+    * Some code generators, such as the i965 fragment shader, require that all
+    * vector expressions be lowered to a sequence of scalar expressions.
+    * However, unpackHalf2x16 cannot be scalarized by the same method as
+    * a true vector operation because the number of components of its input
+    * and output differs.
+    *
+    * This method scalarizes unpackHalf2x16 by transforming it from a single
+    * operation having vec2 output to a pair of operations each having float
+    * output. That is, it transforms
+    *
+    *   unpackHalf2x16(UINT_RVAL)
+    *
+    * into
+    *
+    *   uint u = UINT_RVAL;
+    *   vec2 v;
+    *
+    *   v.x = unpackHalf2x16_split_x(u);
+    *   v.y = unpackHalf2x16_split_y(u);
+    *
+    *   return v;
+    */
+   ir_rvalue*
+   split_unpack_half_2x16(ir_rvalue *uint_rval)
+   {
+      assert(uint_rval->type == glsl_type::uint_type);
+
+      /* uint u = uint_rval; */
+      ir_variable *u = factory.make_temp(glsl_type::uint_type,
+                                          "tmp_split_unpack_half_2x16_u");
+      factory.emit(assign(u, uint_rval));
+
+      /* vec2 v; */
+      ir_variable *v = factory.make_temp(glsl_type::vec2_type,
+                                          "tmp_split_unpack_half_2x16_v");
+
+      /* v.x = unpack_half_2x16_split_x(u); */
+      factory.emit(assign(v, expr(ir_unop_unpack_half_2x16_split_x, u),
+                           WRITEMASK_X));
+
+      /* v.y = unpack_half_2x16_split_y(u); */
+      factory.emit(assign(v, expr(ir_unop_unpack_half_2x16_split_y, u),
+                           WRITEMASK_Y));
+
+      return deref(v).val;
+   }
+};
+
+} // namespace anonymous
+
+/**
+ * \brief Lower the builtin packing functions.
+ *
+ * \param op_mask is a bitmask of `enum lower_packing_builtins_op`.
+ */
+bool
+lower_packing_builtins(exec_list *instructions, int op_mask)
+{
+   lower_packing_builtins_visitor v(op_mask);
+   visit_list_elements(&v, instructions, true);
+   return v.get_progress();
+}
diff --git a/icd/intel/compiler/shader/lower_texture_projection.cpp b/icd/intel/compiler/shader/lower_texture_projection.cpp
new file mode 100644
index 0000000..16d6376
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_texture_projection.cpp
@@ -0,0 +1,103 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_texture_projection.cpp
+ *
+ * IR lower pass to perform the division of texture coordinates by the texture
+ * projector if present.
+ *
+ * Many GPUs have a texture sampling opcode that takes the projector
+ * and does the divide internally, thus the presence of the projector
+ * in the IR.  For GPUs that don't, this saves the driver needing the
+ * logic for handling the divide.
+ *
+ * \author Eric Anholt <eric@anholt.net>
+ */
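+
+/* As a sketch of the transformation (pseudo-GLSL): a projective fetch such
+ * as
+ *
+ *    texture2DProj(sampler, P);      // implicitly divides P.xy by P.w
+ *
+ * becomes roughly
+ *
+ *    float projector = 1.0 / P.w;
+ *    texture2D(sampler, P.xy * projector);
+ *
+ * with the shadow comparitor, when present, scaled by the same factor.
+ */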
+
+#include "ir.h"
+
+namespace {
+
+class lower_texture_projection_visitor : public ir_hierarchical_visitor {
+public:
+   lower_texture_projection_visitor()
+   {
+      progress = false;
+   }
+
+   ir_visitor_status visit_leave(ir_texture *ir);
+
+   bool progress;
+};
+
+} /* anonymous namespace */
+
+ir_visitor_status
+lower_texture_projection_visitor::visit_leave(ir_texture *ir)
+{
+   if (!ir->projector)
+      return visit_continue;
+
+   void *mem_ctx = ralloc_parent(ir);
+
+   ir_variable *var = new(mem_ctx) ir_variable(ir->projector->type,
+					       "projector", ir_var_auto);
+   base_ir->insert_before(var);
+   ir_dereference *deref = new(mem_ctx) ir_dereference_variable(var);
+   ir_expression *expr = new(mem_ctx) ir_expression(ir_unop_rcp,
+						    ir->projector->type,
+						    ir->projector,
+						    NULL);
+   ir_assignment *assign = new(mem_ctx) ir_assignment(deref, expr, NULL);
+   base_ir->insert_before(assign);
+
+   deref = new(mem_ctx) ir_dereference_variable(var);
+   ir->coordinate = new(mem_ctx) ir_expression(ir_binop_mul,
+					       ir->coordinate->type,
+					       ir->coordinate,
+					       deref);
+
+   if (ir->shadow_comparitor) {
+      deref = new(mem_ctx) ir_dereference_variable(var);
+      ir->shadow_comparitor = new(mem_ctx) ir_expression(ir_binop_mul,
+						  ir->shadow_comparitor->type,
+						  ir->shadow_comparitor,
+						  deref);
+   }
+
+   ir->projector = NULL;
+
+   progress = true;
+   return visit_continue;
+}
+
+bool
+do_lower_texture_projection(exec_list *instructions)
+{
+   lower_texture_projection_visitor v;
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/lower_ubo_reference.cpp b/icd/intel/compiler/shader/lower_ubo_reference.cpp
new file mode 100644
index 0000000..90e65bd
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_ubo_reference.cpp
@@ -0,0 +1,392 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_ubo_reference.cpp
+ *
+ * IR lower pass to replace dereferences of variables in a uniform
+ * buffer object with usage of ir_binop_ubo_load expressions, each of
+ * which can read data up to the size of a vec4.
+ *
+ * This relieves drivers of the responsibility to deal with tricky UBO
+ * layout issues like std140 structures and row_major matrices on
+ * their own.
+ */
+
+#include "ir.h"
+#include "ir_builder.h"
+#include "ir_rvalue_visitor.h"
+#include "main/macros.h"
+
+using namespace ir_builder;
+
+namespace {
+class lower_ubo_reference_visitor : public ir_rvalue_enter_visitor {
+public:
+   lower_ubo_reference_visitor(struct gl_shader *shader)
+   : shader(shader)
+   {
+   }
+
+   void handle_rvalue(ir_rvalue **rvalue);
+   void emit_ubo_loads(ir_dereference *deref, ir_variable *base_offset,
+		       unsigned int deref_offset);
+   ir_expression *ubo_load(const struct glsl_type *type,
+			   ir_rvalue *offset);
+
+   void *mem_ctx;
+   struct gl_shader *shader;
+   struct gl_uniform_buffer_variable *ubo_var;
+   unsigned uniform_block;
+   bool progress;
+};
+
+/**
+ * Determine the name of the interface block field
+ *
+ * This is the name of the specific member as it would appear in the
+ * \c gl_uniform_buffer_variable::Name field in the shader's
+ * \c UniformBlocks array.
+ */
+static const char *
+interface_field_name(void *mem_ctx, char *base_name, ir_dereference *d)
+{
+   ir_constant *previous_index = NULL;
+
+   while (d != NULL) {
+      switch (d->ir_type) {
+      case ir_type_dereference_variable: {
+         ir_dereference_variable *v = (ir_dereference_variable *) d;
+         if (previous_index
+             && v->var->is_interface_instance()
+             && v->var->type->is_array())
+            return ralloc_asprintf(mem_ctx,
+                                   "%s[%d]",
+                                   base_name,
+                                   previous_index->get_uint_component(0));
+         else
+            return base_name;
+
+         break;
+      }
+
+      case ir_type_dereference_record: {
+         ir_dereference_record *r = (ir_dereference_record *) d;
+
+         d = r->record->as_dereference();
+         break;
+      }
+
+      case ir_type_dereference_array: {
+         ir_dereference_array *a = (ir_dereference_array *) d;
+
+         d = a->array->as_dereference();
+         previous_index = a->array_index->as_constant();
+         break;
+      }
+
+      default:
+         assert(!"Should not get here.");
+         break;
+      }
+   }
+
+   assert(!"Should not get here.");
+   return NULL;
+}
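+
+/* For example (an illustrative sketch): for an interface instance array
+ * declared as `uniform Block { ... } b[2];`, a dereference of b[1].member
+ * walks the deref chain, records the constant index 1, and returns
+ * "Block[1]", the name under which the block appears in UniformBlocks.
+ */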
+
+void
+lower_ubo_reference_visitor::handle_rvalue(ir_rvalue **rvalue)
+{
+   if (!*rvalue)
+      return;
+
+   ir_dereference *deref = (*rvalue)->as_dereference();
+   if (!deref)
+      return;
+
+   ir_variable *var = deref->variable_referenced();
+   if (!var || !var->is_in_uniform_block())
+      return;
+
+   mem_ctx = ralloc_parent(*rvalue);
+
+   const char *const field_name =
+      interface_field_name(mem_ctx, (char *) var->get_interface_type()->name,
+                           deref);
+
+   this->uniform_block = -1;
+   for (unsigned i = 0; i < shader->NumUniformBlocks; i++) {
+      if (strcmp(field_name, shader->UniformBlocks[i].Name) == 0) {
+         this->uniform_block = i;
+
+         struct gl_uniform_block *block = &shader->UniformBlocks[i];
+
+         this->ubo_var = var->is_interface_instance()
+            ? &block->Uniforms[0] : &block->Uniforms[var->data.location];
+
+         break;
+      }
+   }
+
+   assert(this->uniform_block != (unsigned) -1);
+
+   ir_rvalue *offset = new(mem_ctx) ir_constant(0u);
+   unsigned const_offset = 0;
+   bool row_major = ubo_var->RowMajor;
+
+   /* Calculate the offset to the start of the region of the UBO
+    * dereferenced by *rvalue.  This may be a variable offset if an
+    * array dereference has a variable index.
+    */
+   while (deref) {
+      switch (deref->ir_type) {
+      case ir_type_dereference_variable: {
+	 const_offset += ubo_var->Offset;
+	 deref = NULL;
+	 break;
+      }
+
+      case ir_type_dereference_array: {
+	 ir_dereference_array *deref_array = (ir_dereference_array *)deref;
+	 unsigned array_stride;
+	 if (deref_array->array->type->is_matrix() && row_major) {
+	    /* When loading a vector out of a row major matrix, the
+	     * step between the columns (vectors) is the size of a
+	     * float, while the step between the rows (elements of a
+	     * vector) is handled below in emit_ubo_loads.
+	     */
+	    array_stride = 4;
+         } else if (deref_array->type->is_interface()) {
+            /* We're processing an array dereference of an interface instance
+	     * array.  The thing being dereferenced *must* be a variable
+	     * dereference because interfaces cannot be embedded in other
+	     * types.  In terms of calculating the offsets for the lowering
+	     * pass, we don't care about the array index.  All elements of an
+	     * interface instance array will have the same offsets relative to
+	     * the base of the block that backs them.
+             */
+            assert(deref_array->array->as_dereference_variable());
+            deref = deref_array->array->as_dereference();
+            break;
+	 } else {
+	    array_stride = deref_array->type->std140_size(row_major);
+	    array_stride = glsl_align(array_stride, 16);
+	 }
+
+         ir_rvalue *array_index = deref_array->array_index;
+         if (array_index->type->base_type == GLSL_TYPE_INT)
+            array_index = i2u(array_index);
+
+	 ir_constant *const_index = array_index->as_constant();
+	 if (const_index) {
+	    const_offset += array_stride * const_index->value.u[0];
+	 } else {
+	    offset = add(offset,
+			 mul(array_index,
+			     new(mem_ctx) ir_constant(array_stride)));
+	 }
+	 deref = deref_array->array->as_dereference();
+	 break;
+      }
+
+      case ir_type_dereference_record: {
+	 ir_dereference_record *deref_record = (ir_dereference_record *)deref;
+	 const glsl_type *struct_type = deref_record->record->type;
+	 unsigned intra_struct_offset = 0;
+
+	 unsigned max_field_align = 16;
+	 for (unsigned int i = 0; i < struct_type->length; i++) {
+	    const glsl_type *type = struct_type->fields.structure[i].type;
+	    unsigned field_align = type->std140_base_alignment(row_major);
+	    max_field_align = MAX2(field_align, max_field_align);
+	    intra_struct_offset = glsl_align(intra_struct_offset, field_align);
+
+	    if (strcmp(struct_type->fields.structure[i].name,
+		       deref_record->field) == 0)
+	       break;
+	    intra_struct_offset += type->std140_size(row_major);
+	 }
+
+	 const_offset = glsl_align(const_offset, max_field_align);
+	 const_offset += intra_struct_offset;
+
+	 deref = deref_record->record->as_dereference();
+	 break;
+      }
+      default:
+	 assert(!"not reached");
+	 deref = NULL;
+	 break;
+      }
+   }
+
+   /* Now that we've calculated the offset to the start of the
+    * dereference, walk over the type and emit loads into a temporary.
+    */
+   const glsl_type *type = (*rvalue)->type;
+   ir_variable *load_var = new(mem_ctx) ir_variable(type,
+						    "ubo_load_temp",
+						    ir_var_temporary);
+   base_ir->insert_before(load_var);
+
+   ir_variable *load_offset = new(mem_ctx) ir_variable(glsl_type::uint_type,
+						       "ubo_load_temp_offset",
+						       ir_var_temporary);
+   base_ir->insert_before(load_offset);
+   base_ir->insert_before(assign(load_offset, offset));
+
+   deref = new(mem_ctx) ir_dereference_variable(load_var);
+   emit_ubo_loads(deref, load_offset, const_offset);
+   *rvalue = deref;
+
+   progress = true;
+}
+
+ir_expression *
+lower_ubo_reference_visitor::ubo_load(const glsl_type *type,
+				      ir_rvalue *offset)
+{
+   return new(mem_ctx)
+      ir_expression(ir_binop_ubo_load,
+		    type,
+		    new(mem_ctx) ir_constant(this->uniform_block),
+		    offset);
+
+}
+
+/**
+ * Takes LHS and emits a series of assignments into its components
+ * from the UBO variable at variable_offset + deref_offset.
+ *
+ * Recursively calls itself to break the deref down to the point that
+ * the ir_binop_ubo_load expressions generated are contiguous scalars
+ * or vectors.
+ */
+void
+lower_ubo_reference_visitor::emit_ubo_loads(ir_dereference *deref,
+					    ir_variable *base_offset,
+					    unsigned int deref_offset)
+{
+   if (deref->type->is_record()) {
+      unsigned int field_offset = 0;
+
+      for (unsigned i = 0; i < deref->type->length; i++) {
+	 const struct glsl_struct_field *field =
+	    &deref->type->fields.structure[i];
+	 ir_dereference *field_deref =
+	    new(mem_ctx) ir_dereference_record(deref->clone(mem_ctx, NULL),
+					       field->name);
+
+	 field_offset =
+	    glsl_align(field_offset,
+		       field->type->std140_base_alignment(ubo_var->RowMajor));
+
+	 emit_ubo_loads(field_deref, base_offset, deref_offset + field_offset);
+
+	 field_offset += field->type->std140_size(ubo_var->RowMajor);
+      }
+      return;
+   }
+
+   if (deref->type->is_array()) {
+      unsigned array_stride =
+	 glsl_align(deref->type->fields.array->std140_size(ubo_var->RowMajor),
+		    16);
+
+      for (unsigned i = 0; i < deref->type->length; i++) {
+	 ir_constant *element = new(mem_ctx) ir_constant(i);
+	 ir_dereference *element_deref =
+	    new(mem_ctx) ir_dereference_array(deref->clone(mem_ctx, NULL),
+					      element);
+	 emit_ubo_loads(element_deref, base_offset,
+			deref_offset + i * array_stride);
+      }
+      return;
+   }
+
+   if (deref->type->is_matrix()) {
+      for (unsigned i = 0; i < deref->type->matrix_columns; i++) {
+	 ir_constant *col = new(mem_ctx) ir_constant(i);
+	 ir_dereference *col_deref =
+	    new(mem_ctx) ir_dereference_array(deref->clone(mem_ctx, NULL),
+					      col);
+
+	 /* std140 always rounds the stride of arrays (and matrices) up
+	  * to a vec4, so the stride between matrix columns (or rows) is
+	  * always 16 bytes.
+	  */
+	 emit_ubo_loads(col_deref, base_offset, deref_offset + i * 16);
+      }
+      return;
+   }
+
+   assert(deref->type->is_scalar() ||
+	  deref->type->is_vector());
+
+   if (!ubo_var->RowMajor) {
+      ir_rvalue *offset = add(base_offset,
+			      new(mem_ctx) ir_constant(deref_offset));
+      base_ir->insert_before(assign(deref->clone(mem_ctx, NULL),
+				    ubo_load(deref->type, offset)));
+   } else {
+      /* We're dereffing a column out of a row-major matrix, so we
+       * gather the vector from each stored row.
+       */
+      assert(deref->type->base_type == GLSL_TYPE_FLOAT);
+      /* Matrices, row_major or not, are stored as if they were
+       * arrays of vectors of the appropriate size in std140.
+       * Arrays have their strides rounded up to a vec4, so the
+       * matrix stride is always 16.
+       */
+      unsigned matrix_stride = 16;
+
+      for (unsigned i = 0; i < deref->type->vector_elements; i++) {
+	 ir_rvalue *chan_offset =
+	    add(base_offset,
+		new(mem_ctx) ir_constant(deref_offset + i * matrix_stride));
+
+	 base_ir->insert_before(assign(deref->clone(mem_ctx, NULL),
+				       ubo_load(glsl_type::float_type,
+						chan_offset),
+				       (1U << i)));
+      }
+   }
+}
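+
+/* std140 layout sketch (illustrative): for a block such as
+ *
+ *    uniform Block { float f; vec3 v; mat2 m; };
+ *
+ * f sits at byte offset 0, v is aligned to 16 bytes and sits at offset 16,
+ * and m, whose column stride rounds up to a vec4, occupies offsets 32 and
+ * 48.  A read of m[1] therefore lowers to an ir_binop_ubo_load of a vec2
+ * at byte offset 48 within the block.
+ */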
+
+} /* unnamed namespace */
+
+void
+lower_ubo_reference(struct gl_shader *shader, exec_list *instructions)
+{
+   lower_ubo_reference_visitor v(shader);
+
+   /* Loop over the instructions lowering references, because lowering a
+    * dereference of a UBO array that uses a UBO dereference as its index
+    * produces a collection of instructions, all of which have cloned UBO
+    * dereferences for that array index.
+    */
+   do {
+      v.progress = false;
+      visit_list_elements(&v, instructions);
+   } while (v.progress);
+}
diff --git a/icd/intel/compiler/shader/lower_variable_index_to_cond_assign.cpp b/icd/intel/compiler/shader/lower_variable_index_to_cond_assign.cpp
new file mode 100644
index 0000000..d40d188
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_variable_index_to_cond_assign.cpp
@@ -0,0 +1,549 @@
+/*
+ * Copyright © 2010 Luca Barbieri
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_variable_index_to_cond_assign.cpp
+ *
+ * Turns non-constant indexing into array types to a series of
+ * conditional moves of each element into a temporary.
+ *
+ * Pre-DX10 GPUs often don't have a native way to do this operation,
+ * and this works around that.
+ *
+ * The lowering process proceeds as follows.  Each non-constant index
+ * found in an r-value is converted to a canonical form \c array[i].  Each
+ * element of the array is conditionally assigned to a temporary by comparing
+ * \c i to a constant index.  This is done by cloning the canonical form and
+ * replacing all occurrences of \c i with a constant.  Each remaining occurrence
+ * of the canonical form in the IR is replaced with a dereference of the
+ * temporary variable.
+ *
+ * L-values with non-constant indices are handled similarly.  In this case,
+ * the RHS of the assignment is assigned to a temporary.  The non-constant
+ * index is replace with the canonical form (just like for r-values).  The
+ * temporary is conditionally assigned to each element of the canonical form
+ * by comparing \c i with each index.  The same clone-and-replace scheme is
+ * used.
+ */
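+
+/* As a sketch (pseudo-GLSL), reading a four-element array with the
+ * non-constant index i
+ *
+ *    value = array[i];
+ *
+ * is lowered to roughly
+ *
+ *    value = array[0];
+ *    bvec3 cond = equal(ivec3(i), ivec3(1, 2, 3));
+ *    value = cond.x ? array[1] : value;
+ *    value = cond.y ? array[2] : value;
+ *    value = cond.z ? array[3] : value;
+ *
+ * where the first element is read unconditionally and each remaining
+ * element is selected by a conditional assignment.
+ */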
+
+#include "ir.h"
+#include "ir_rvalue_visitor.h"
+#include "ir_optimization.h"
+#include "glsl_types.h"
+#include "main/macros.h"
+
+/**
+ * Generate a comparison value for a block of indices
+ *
+ * Lowering passes for non-constant indexing of arrays, matrices, or vectors
+ * can use this to generate blocks of index comparison values.
+ *
+ * \param instructions  List where new instructions will be appended
+ * \param index         \c ir_variable containing the desired index
+ * \param base          Base value for this block of comparisons
+ * \param components    Number of unique index values to compare.  This must
+ *                      be on the range [1, 4].
+ * \param mem_ctx       ralloc memory context to be used for all allocations.
+ *
+ * \returns
+ * An \c ir_rvalue that \b must be cloned for each use in conditional
+ * assignments, etc.
+ */
+ir_rvalue *
+compare_index_block(exec_list *instructions, ir_variable *index,
+		    unsigned base, unsigned components, void *mem_ctx)
+{
+   ir_rvalue *broadcast_index = new(mem_ctx) ir_dereference_variable(index);
+
+   assert(index->type->is_scalar());
+   assert(index->type->base_type == GLSL_TYPE_INT);
+   assert(components >= 1 && components <= 4);
+
+   if (components > 1) {
+      const ir_swizzle_mask m = { 0, 0, 0, 0, components, false };
+      broadcast_index = new(mem_ctx) ir_swizzle(broadcast_index, m);
+   }
+
+   /* Compare the desired index value with the next block of four indices.
+    */
+   ir_constant_data test_indices_data;
+   memset(&test_indices_data, 0, sizeof(test_indices_data));
+   test_indices_data.i[0] = base;
+   test_indices_data.i[1] = base + 1;
+   test_indices_data.i[2] = base + 2;
+   test_indices_data.i[3] = base + 3;
+
+   ir_constant *const test_indices =
+      new(mem_ctx) ir_constant(broadcast_index->type,
+			       &test_indices_data);
+
+   ir_rvalue *const condition_val =
+      new(mem_ctx) ir_expression(ir_binop_equal,
+				 glsl_type::bvec(components),
+				 broadcast_index,
+				 test_indices);
+
+   ir_variable *const condition =
+      new(mem_ctx) ir_variable(condition_val->type,
+			       "dereference_condition",
+			       ir_var_temporary);
+   instructions->push_tail(condition);
+
+   ir_rvalue *const cond_deref =
+      new(mem_ctx) ir_dereference_variable(condition);
+   instructions->push_tail(new(mem_ctx) ir_assignment(cond_deref, condition_val, 0));
+
+   return cond_deref;
+}
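+
+/* For example (illustrative): compare_index_block(list, i, 4, 4, ctx)
+ * appends roughly
+ *
+ *    bvec4 dereference_condition = equal(ivec4(i), ivec4(4, 5, 6, 7));
+ *
+ * and returns a dereference of dereference_condition for the caller to
+ * clone into each conditional assignment.
+ */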
+
+static inline bool
+is_array_or_matrix(const ir_rvalue *ir)
+{
+   return (ir->type->is_array() || ir->type->is_matrix());
+}
+
+namespace {
+/**
+ * Replace a dereference of a variable with a specified r-value
+ *
+ * Each time a dereference of the specified value is replaced, the r-value
+ * tree is cloned.
+ */
+class deref_replacer : public ir_rvalue_visitor {
+public:
+   deref_replacer(const ir_variable *variable_to_replace, ir_rvalue *value)
+      : variable_to_replace(variable_to_replace), value(value),
+	progress(false)
+   {
+      assert(this->variable_to_replace != NULL);
+      assert(this->value != NULL);
+   }
+
+   virtual void handle_rvalue(ir_rvalue **rvalue)
+   {
+      ir_dereference_variable *const dv = (*rvalue)->as_dereference_variable();
+
+      if ((dv != NULL) && (dv->var == this->variable_to_replace)) {
+	 this->progress = true;
+	 *rvalue = this->value->clone(ralloc_parent(*rvalue), NULL);
+      }
+   }
+
+   const ir_variable *variable_to_replace;
+   ir_rvalue *value;
+   bool progress;
+};
+
+/**
+ * Find a variable index dereference of an array in an rvalue tree
+ */
+class find_variable_index : public ir_hierarchical_visitor {
+public:
+   find_variable_index()
+      : deref(NULL)
+   {
+      /* empty */
+   }
+
+   virtual ir_visitor_status visit_enter(ir_dereference_array *ir)
+   {
+      if (is_array_or_matrix(ir->array)
+	  && (ir->array_index->as_constant() == NULL)) {
+	 this->deref = ir;
+	 return visit_stop;
+      }
+
+      return visit_continue;
+   }
+
+   /**
+    * First array dereference found in the tree that has a non-constant index.
+    */
+   ir_dereference_array *deref;
+};
+
+struct assignment_generator
+{
+   ir_instruction* base_ir;
+   ir_dereference *rvalue;
+   ir_variable *old_index;
+   bool is_write;
+   unsigned int write_mask;
+   ir_variable* var;
+
+   assignment_generator()
+      : base_ir(NULL),
+        rvalue(NULL),
+        old_index(NULL),
+        is_write(false),
+        write_mask(0),
+        var(NULL)
+   {
+   }
+
+   void generate(unsigned i, ir_rvalue* condition, exec_list *list) const
+   {
+      /* Just clone the rest of the deref chain when trying to get at the
+       * underlying variable.
+       */
+      void *mem_ctx = ralloc_parent(base_ir);
+
+      /* Clone the old r-value in its entirety.  Then replace any occurrences of
+       * the old variable index with the new constant index.
+       */
+      ir_dereference *element = this->rvalue->clone(mem_ctx, NULL);
+      ir_constant *const index = new(mem_ctx) ir_constant(i);
+      deref_replacer r(this->old_index, index);
+      element->accept(&r);
+      assert(r.progress);
+
+      /* Generate a conditional assignment to (or from) the constant indexed
+       * array dereference.
+       */
+      ir_rvalue *variable = new(mem_ctx) ir_dereference_variable(this->var);
+      ir_assignment *const assignment = (is_write)
+	 ? new(mem_ctx) ir_assignment(element, variable, condition, write_mask)
+	 : new(mem_ctx) ir_assignment(variable, element, condition);
+
+      list->push_tail(assignment);
+   }
+};
+
+struct switch_generator
+{
+   /* make TFunction a template parameter if you need to use other generators */
+   typedef assignment_generator TFunction;
+   const TFunction& generator;
+
+   ir_variable* index;
+   unsigned linear_sequence_max_length;
+   unsigned condition_components;
+
+   void *mem_ctx;
+
+   switch_generator(const TFunction& generator, ir_variable *index,
+		    unsigned linear_sequence_max_length,
+		    unsigned condition_components)
+      : generator(generator), index(index),
+	linear_sequence_max_length(linear_sequence_max_length),
+	condition_components(condition_components)
+   {
+      this->mem_ctx = ralloc_parent(index);
+   }
+
+   void linear_sequence(unsigned begin, unsigned end, exec_list *list)
+   {
+      if (begin == end)
+         return;
+
+      /* If the array access is a read, read the first element of this subregion
+       * unconditionally.  The remaining tests will possibly overwrite this
+       * value with one of the other array elements.
+       *
+       * This optimization cannot be done for writes because it will cause the
+       * first element of the subregion to be written possibly *in addition* to
+       * one of the other elements.
+       */
+      unsigned first;
+      if (!this->generator.is_write) {
+	 this->generator.generate(begin, 0, list);
+	 first = begin + 1;
+      } else {
+	 first = begin;
+      }
+
+      for (unsigned i = first; i < end; i += 4) {
+         const unsigned comps = MIN2(condition_components, end - i);
+
+	 ir_rvalue *const cond_deref =
+	    compare_index_block(list, index, i, comps, this->mem_ctx);
+
+         if (comps == 1) {
+            this->generator.generate(i, cond_deref->clone(this->mem_ctx, NULL),
+				     list);
+         } else {
+            for (unsigned j = 0; j < comps; j++) {
+	       ir_rvalue *const cond_swiz =
+		  new(this->mem_ctx) ir_swizzle(cond_deref->clone(this->mem_ctx, NULL),
+						j, 0, 0, 0, 1);
+
+               this->generator.generate(i + j, cond_swiz, list);
+            }
+         }
+      }
+   }
+
+   void bisect(unsigned begin, unsigned end, exec_list *list)
+   {
+      unsigned middle = (begin + end) >> 1;
+
+      assert(index->type->is_integer());
+
+      ir_constant *const middle_c = (index->type->base_type == GLSL_TYPE_UINT)
+	 ? new(this->mem_ctx) ir_constant((unsigned)middle)
+         : new(this->mem_ctx) ir_constant((int)middle);
+
+
+      ir_dereference_variable *deref =
+	 new(this->mem_ctx) ir_dereference_variable(this->index);
+
+      ir_expression *less =
+	 new(this->mem_ctx) ir_expression(ir_binop_less, glsl_type::bool_type,
+					  deref, middle_c);
+
+      ir_if *if_less = new(this->mem_ctx) ir_if(less);
+
+      generate(begin, middle, &if_less->then_instructions);
+      generate(middle, end, &if_less->else_instructions);
+
+      list->push_tail(if_less);
+   }
+
+   void generate(unsigned begin, unsigned end, exec_list *list)
+   {
+      unsigned length = end - begin;
+      if (length <= this->linear_sequence_max_length)
+         return linear_sequence(begin, end, list);
+      else
+         return bisect(begin, end, list);
+   }
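+
+   /* For example (illustrative): with linear_sequence_max_length = 4, a
+    * 16-element array lowers to a balanced if-tree:
+    *
+    *    if (i < 8) { if (i < 4) <linear 0..3> else <linear 4..7> }
+    *    else { if (i < 12) <linear 8..11> else <linear 12..15> }
+    */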
+};
+
+/**
+ * Visitor class that lowers non-constant array indexing to conditional
+ * assignments.
+ */
+
+class variable_index_to_cond_assign_visitor : public ir_rvalue_visitor {
+public:
+   variable_index_to_cond_assign_visitor(bool lower_input,
+					 bool lower_output,
+					 bool lower_temp,
+					 bool lower_uniform)
+   {
+      this->progress = false;
+      this->lower_inputs = lower_input;
+      this->lower_outputs = lower_output;
+      this->lower_temps = lower_temp;
+      this->lower_uniforms = lower_uniform;
+   }
+
+   bool progress;
+   bool lower_inputs;
+   bool lower_outputs;
+   bool lower_temps;
+   bool lower_uniforms;
+
+   bool storage_type_needs_lowering(ir_dereference_array *deref) const
+   {
+      /* If a variable isn't eventually the target of this dereference, then
+       * it must be a constant or some sort of anonymous temporary storage.
+       *
+       * FINISHME: Is this correct?  Most drivers treat arrays of constants as
+       * FINISHME: uniforms.  It seems like this should do the same.
+       */
+      const ir_variable *const var = deref->array->variable_referenced();
+      if (var == NULL)
+	 return this->lower_temps;
+
+      switch (var->data.mode) {
+      case ir_var_auto:
+      case ir_var_temporary:
+	 return this->lower_temps;
+      case ir_var_uniform:
+	 return this->lower_uniforms;
+      case ir_var_function_in:
+      case ir_var_const_in:
+         return this->lower_temps;
+      case ir_var_shader_in:
+         return this->lower_inputs;
+      case ir_var_function_out:
+         return this->lower_temps;
+      case ir_var_shader_out:
+         return this->lower_outputs;
+      case ir_var_function_inout:
+	 return this->lower_temps;
+      }
+
+      assert(!"Should not get here.");
+      return false;
+   }
+
+   bool needs_lowering(ir_dereference_array *deref) const
+   {
+      if (deref == NULL || deref->array_index->as_constant()
+	  || !is_array_or_matrix(deref->array))
+	 return false;
+
+      return this->storage_type_needs_lowering(deref);
+   }
+
+   ir_variable *convert_dereference_array(ir_dereference_array *orig_deref,
+					  ir_assignment* orig_assign,
+					  ir_dereference *orig_base)
+   {
+      assert(is_array_or_matrix(orig_deref->array));
+
+      const unsigned length = (orig_deref->array->type->is_array())
+         ? orig_deref->array->type->length
+         : orig_deref->array->type->matrix_columns;
+
+      void *const mem_ctx = ralloc_parent(base_ir);
+
+      /* Temporary storage for either the result of the dereference of
+       * the array, or the RHS that's being assigned into the
+       * dereference of the array.
+       */
+      ir_variable *var;
+
+      if (orig_assign) {
+	 var = new(mem_ctx) ir_variable(orig_assign->rhs->type,
+					"dereference_array_value",
+					ir_var_temporary);
+	 base_ir->insert_before(var);
+
+	 ir_dereference *lhs = new(mem_ctx) ir_dereference_variable(var);
+	 ir_assignment *assign = new(mem_ctx) ir_assignment(lhs,
+							    orig_assign->rhs,
+							    NULL);
+
+         base_ir->insert_before(assign);
+      } else {
+	 var = new(mem_ctx) ir_variable(orig_deref->type,
+					"dereference_array_value",
+					ir_var_temporary);
+	 base_ir->insert_before(var);
+      }
+
+      /* Store the index to a temporary to avoid reusing its tree. */
+      ir_variable *index =
+	 new(mem_ctx) ir_variable(orig_deref->array_index->type,
+				  "dereference_array_index", ir_var_temporary);
+      base_ir->insert_before(index);
+
+      ir_dereference *lhs = new(mem_ctx) ir_dereference_variable(index);
+      ir_assignment *assign =
+	 new(mem_ctx) ir_assignment(lhs, orig_deref->array_index, NULL);
+      base_ir->insert_before(assign);
+
+      orig_deref->array_index = lhs->clone(mem_ctx, NULL);
+
+      assignment_generator ag;
+      ag.rvalue = orig_base;
+      ag.base_ir = base_ir;
+      ag.old_index = index;
+      ag.var = var;
+      if (orig_assign) {
+	 ag.is_write = true;
+	 ag.write_mask = orig_assign->write_mask;
+      } else {
+	 ag.is_write = false;
+      }
+
+      switch_generator sg(ag, index, 4, 4);
+
+      /* If the original assignment has a condition, respect that original
+       * condition!  This is accomplished by wrapping the new conditional
+       * assignments in an if-statement that uses the original condition.
+       */
+      if ((orig_assign != NULL) && (orig_assign->condition != NULL)) {
+	 /* No need to clone the condition because the IR that it hangs on is
+	  * going to be removed from the instruction sequence.
+	  */
+	 ir_if *if_stmt = new(mem_ctx) ir_if(orig_assign->condition);
+
+	 sg.generate(0, length, &if_stmt->then_instructions);
+	 base_ir->insert_before(if_stmt);
+      } else {
+	 exec_list list;
+
+	 sg.generate(0, length, &list);
+	 base_ir->insert_before(&list);
+      }
+
+      return var;
+   }
+
+   virtual void handle_rvalue(ir_rvalue **pir)
+   {
+      if (this->in_assignee)
+	 return;
+
+      if (!*pir)
+         return;
+
+      ir_dereference_array* orig_deref = (*pir)->as_dereference_array();
+      if (needs_lowering(orig_deref)) {
+         ir_variable *var =
+	    convert_dereference_array(orig_deref, NULL, orig_deref);
+         assert(var);
+         *pir = new(ralloc_parent(base_ir)) ir_dereference_variable(var);
+         this->progress = true;
+      }
+   }
+
+   ir_visitor_status
+   visit_leave(ir_assignment *ir)
+   {
+      ir_rvalue_visitor::visit_leave(ir);
+
+      find_variable_index f;
+      ir->lhs->accept(&f);
+
+      if ((f.deref != NULL) && storage_type_needs_lowering(f.deref)) {
+         convert_dereference_array(f.deref, ir, ir->lhs);
+         ir->remove();
+         this->progress = true;
+      }
+
+      return visit_continue;
+   }
+};
+
+} /* anonymous namespace */
+
+bool
+lower_variable_index_to_cond_assign(exec_list *instructions,
+				    bool lower_input,
+				    bool lower_output,
+				    bool lower_temp,
+				    bool lower_uniform)
+{
+   variable_index_to_cond_assign_visitor v(lower_input,
+					   lower_output,
+					   lower_temp,
+					   lower_uniform);
+
+   /* Continue lowering until no progress is made.  If there are multiple
+    * levels of indirection (e.g., non-constant indexing of array elements and
+    * matrix columns of an array of matrices), each pass will only lower one
+    * level of indirection.
+    */
+   bool progress_ever = false;
+   do {
+      v.progress = false;
+      visit_list_elements(&v, instructions);
+      progress_ever = v.progress || progress_ever;
+   } while (v.progress);
+
+   return progress_ever;
+}
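+
+/* Illustrative sketch (not part of the pass itself): given GLSL such as
+ *
+ *    float a[4];
+ *    int i;                      // not a constant expression
+ *    gl_FragColor.x = a[i];
+ *
+ * the read of a[i] is lowered to conditional assignments roughly equivalent
+ * to:
+ *
+ *    float t = a[0];             // reads copy element 0 unconditionally
+ *    if (i == 1) t = a[1];
+ *    if (i == 2) t = a[2];
+ *    if (i == 3) t = a[3];
+ *    gl_FragColor.x = t;
+ */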
diff --git a/icd/intel/compiler/shader/lower_vec_index_to_cond_assign.cpp b/icd/intel/compiler/shader/lower_vec_index_to_cond_assign.cpp
new file mode 100644
index 0000000..fe6a3f2
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_vec_index_to_cond_assign.cpp
@@ -0,0 +1,238 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_vec_index_to_cond_assign.cpp
+ *
+ * Turns variable indexing of vector types into a series of conditional
+ * moves of each channel's swizzle into a temporary.
+ *
+ * Most GPUs don't have a native way to do this operation, and this
+ * works around that.  For drivers using both this pass and
+ * ir_vec_index_to_swizzle, there's a risk that this pass will happen
+ * before sufficient constant folding to find that the array index is
+ * constant.  However, we hope that other optimization passes,
+ * particularly constant folding of assignment conditions and copy
+ * propagation, will result in the same code in the end.
+ */
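+
+/* Illustrative sketch (not part of the pass itself): a non-constant vector
+ * index such as
+ *
+ *    vec4 v;  int i;
+ *    float x = v[i];
+ *
+ * becomes, roughly,
+ *
+ *    bvec4 cond = equal(ivec4(i), ivec4(0, 1, 2, 3));
+ *    float t;
+ *    if (cond.x) t = v.x;        // emitted as conditional assignments
+ *    if (cond.y) t = v.y;
+ *    if (cond.z) t = v.z;
+ *    if (cond.w) t = v.w;
+ *    float x = t;
+ */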
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_optimization.h"
+#include "glsl_types.h"
+
+namespace {
+
+/**
+ * Visitor class for replacing non-constant vector indexing with sequences of
+ * conditional assignments.
+ */
+
+class ir_vec_index_to_cond_assign_visitor : public ir_hierarchical_visitor {
+public:
+   ir_vec_index_to_cond_assign_visitor()
+   {
+      progress = false;
+   }
+
+   ir_rvalue *convert_vec_index_to_cond_assign(void *mem_ctx,
+                                               ir_rvalue *orig_vector,
+                                               ir_rvalue *orig_index,
+                                               const glsl_type *type);
+
+   ir_rvalue *convert_vector_extract_to_cond_assign(ir_rvalue *ir);
+
+   virtual ir_visitor_status visit_enter(ir_expression *);
+   virtual ir_visitor_status visit_enter(ir_swizzle *);
+   virtual ir_visitor_status visit_leave(ir_assignment *);
+   virtual ir_visitor_status visit_enter(ir_return *);
+   virtual ir_visitor_status visit_enter(ir_call *);
+   virtual ir_visitor_status visit_enter(ir_if *);
+
+   bool progress;
+};
+
+} /* anonymous namespace */
+
+ir_rvalue *
+ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(void *mem_ctx,
+                                                                      ir_rvalue *orig_vector,
+                                                                      ir_rvalue *orig_index,
+                                                                      const glsl_type *type)
+{
+   ir_assignment *assign, *value_assign;
+   ir_variable *index, *var, *value;
+   ir_dereference *deref, *deref_value;
+   unsigned i;
+
+
+   exec_list list;
+
+   /* Store the index to a temporary to avoid reusing its tree. */
+   index = new(base_ir) ir_variable(glsl_type::int_type,
+				    "vec_index_tmp_i",
+				    ir_var_temporary);
+   list.push_tail(index);
+   deref = new(base_ir) ir_dereference_variable(index);
+   assign = new(base_ir) ir_assignment(deref, orig_index, NULL);
+   list.push_tail(assign);
+
+   /* Store the value in a temporary to avoid duplicating the (possibly
+    * matrix-typed) source expression. */
+   value = new(base_ir) ir_variable(orig_vector->type, "vec_value_tmp",
+                                    ir_var_temporary);
+   list.push_tail(value);
+   deref_value = new(base_ir) ir_dereference_variable(value);
+   value_assign = new(base_ir) ir_assignment(deref_value, orig_vector);
+   list.push_tail(value_assign);
+
+   /* Temporary where we store whichever value we swizzle out. */
+   var = new(base_ir) ir_variable(type, "vec_index_tmp_v",
+				  ir_var_temporary);
+   list.push_tail(var);
+
+   /* Generate a single comparison condition "mask" for all of the components
+    * in the vector.
+    */
+   ir_rvalue *const cond_deref =
+      compare_index_block(&list, index, 0,
+                          orig_vector->type->vector_elements,
+			  mem_ctx);
+
+   /* Generate a conditional move of each vector element to the temp. */
+   for (i = 0; i < orig_vector->type->vector_elements; i++) {
+      ir_rvalue *condition_swizzle =
+         new(base_ir) ir_swizzle(cond_deref->clone(mem_ctx, NULL),
+                                 i, 0, 0, 0, 1);
+
+      /* Just clone the rest of the deref chain when trying to get at the
+       * underlying variable.
+       */
+      ir_rvalue *swizzle =
+	 new(base_ir) ir_swizzle(deref_value->clone(mem_ctx, NULL),
+				 i, 0, 0, 0, 1);
+
+      deref = new(base_ir) ir_dereference_variable(var);
+      assign = new(base_ir) ir_assignment(deref, swizzle, condition_swizzle);
+      list.push_tail(assign);
+   }
+
+   /* Put all of the new instructions in the IR stream before the old
+    * instruction.
+    */
+   base_ir->insert_before(&list);
+
+   this->progress = true;
+   return new(base_ir) ir_dereference_variable(var);
+}
+
+ir_rvalue *
+ir_vec_index_to_cond_assign_visitor::convert_vector_extract_to_cond_assign(ir_rvalue *ir)
+{
+   ir_expression *const expr = ir->as_expression();
+
+   if (expr == NULL || expr->operation != ir_binop_vector_extract)
+      return ir;
+
+   return convert_vec_index_to_cond_assign(ralloc_parent(ir),
+                                           expr->operands[0],
+                                           expr->operands[1],
+                                           ir->type);
+}
+
+ir_visitor_status
+ir_vec_index_to_cond_assign_visitor::visit_enter(ir_expression *ir)
+{
+   unsigned int i;
+
+   for (i = 0; i < ir->get_num_operands(); i++) {
+      ir->operands[i] = convert_vector_extract_to_cond_assign(ir->operands[i]);
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vec_index_to_cond_assign_visitor::visit_enter(ir_swizzle *ir)
+{
+   /* Can't be hit from normal GLSL, since you can't swizzle a scalar (which
+    * is what indexing a vector produces).  But maybe at some point we'll end
+    * up using swizzling of scalars for vector construction.
+    */
+   ir->val = convert_vector_extract_to_cond_assign(ir->val);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vec_index_to_cond_assign_visitor::visit_leave(ir_assignment *ir)
+{
+   ir->rhs = convert_vector_extract_to_cond_assign(ir->rhs);
+
+   if (ir->condition) {
+      ir->condition = convert_vector_extract_to_cond_assign(ir->condition);
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vec_index_to_cond_assign_visitor::visit_enter(ir_call *ir)
+{
+   foreach_list_safe(n, &ir->actual_parameters) {
+      ir_rvalue *param = (ir_rvalue *) n;
+      ir_rvalue *new_param = convert_vector_extract_to_cond_assign(param);
+
+      if (new_param != param) {
+	 param->replace_with(new_param);
+      }
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vec_index_to_cond_assign_visitor::visit_enter(ir_return *ir)
+{
+   if (ir->value) {
+      ir->value = convert_vector_extract_to_cond_assign(ir->value);
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vec_index_to_cond_assign_visitor::visit_enter(ir_if *ir)
+{
+   ir->condition = convert_vector_extract_to_cond_assign(ir->condition);
+
+   return visit_continue;
+}
+
+bool
+do_vec_index_to_cond_assign(exec_list *instructions)
+{
+   ir_vec_index_to_cond_assign_visitor v;
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/lower_vec_index_to_swizzle.cpp b/icd/intel/compiler/shader/lower_vec_index_to_swizzle.cpp
new file mode 100644
index 0000000..b5bb00c
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_vec_index_to_swizzle.cpp
@@ -0,0 +1,172 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_vec_index_to_swizzle.cpp
+ *
+ * Turns constant indexing of vector types into swizzles.  This lets
+ * other swizzle-aware optimization passes catch these constructs, and
+ * spares codegen backends from having to handle this case.
+ */
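+
+/* Illustrative sketch (not part of the pass itself): once the index folds to
+ * a constant, a vector extract such as
+ *
+ *    vec4 v;
+ *    float x = v[2];
+ *
+ * is rewritten to the swizzle
+ *
+ *    float x = v.z;
+ */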
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_optimization.h"
+#include "glsl_types.h"
+#include "main/macros.h"
+
+/**
+ * Visitor class for replacing constant-indexed vector extracts with swizzles.
+ */
+
+namespace {
+
+class ir_vec_index_to_swizzle_visitor : public ir_hierarchical_visitor {
+public:
+   ir_vec_index_to_swizzle_visitor()
+   {
+      progress = false;
+   }
+
+   ir_rvalue *convert_vector_extract_to_swizzle(ir_rvalue *val);
+
+   virtual ir_visitor_status visit_enter(ir_expression *);
+   virtual ir_visitor_status visit_enter(ir_swizzle *);
+   virtual ir_visitor_status visit_enter(ir_assignment *);
+   virtual ir_visitor_status visit_enter(ir_return *);
+   virtual ir_visitor_status visit_enter(ir_call *);
+   virtual ir_visitor_status visit_enter(ir_if *);
+
+   bool progress;
+};
+
+} /* anonymous namespace */
+
+ir_rvalue *
+ir_vec_index_to_swizzle_visitor::convert_vector_extract_to_swizzle(ir_rvalue *ir)
+{
+   ir_expression *const expr = ir->as_expression();
+   if (expr == NULL || expr->operation != ir_binop_vector_extract)
+      return ir;
+
+   ir_constant *const idx = expr->operands[1]->constant_expression_value();
+   if (idx == NULL)
+      return ir;
+
+   void *ctx = ralloc_parent(ir);
+   this->progress = true;
+
+   /* Page 40 of the GLSL 1.20 spec says:
+    *
+    *     "When indexing with non-constant expressions, behavior is undefined
+    *     if the index is negative, or greater than or equal to the size of
+    *     the vector."
+    *
+    * The quoted spec text mentions non-constant expressions, but this code
+    * operates on constants.  These constants are the result of non-constant
+    * expressions that have been optimized to constants.  The common case here
+    * is a loop counter from an unrolled loop that is used to index a vector.
+    *
+    * The ir_swizzle constructor gets angry if the index is negative or too
+    * large.  For simplicity's sake, just clamp the index to [0, size-1].
+    */
+   const int i = CLAMP(idx->value.i[0], 0,
+                       (int) expr->operands[0]->type->vector_elements - 1);
+
+   return new(ctx) ir_swizzle(expr->operands[0], i, 0, 0, 0, 1);
+}
+
+ir_visitor_status
+ir_vec_index_to_swizzle_visitor::visit_enter(ir_expression *ir)
+{
+   unsigned int i;
+
+   for (i = 0; i < ir->get_num_operands(); i++) {
+      ir->operands[i] = convert_vector_extract_to_swizzle(ir->operands[i]);
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vec_index_to_swizzle_visitor::visit_enter(ir_swizzle *ir)
+{
+   /* Can't be hit from normal GLSL, since you can't swizzle a scalar (which
+    * is what indexing a vector produces).  But maybe at some point we'll end
+    * up using swizzling of scalars for vector construction.
+    */
+   ir->val = convert_vector_extract_to_swizzle(ir->val);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vec_index_to_swizzle_visitor::visit_enter(ir_assignment *ir)
+{
+   ir->rhs = convert_vector_extract_to_swizzle(ir->rhs);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vec_index_to_swizzle_visitor::visit_enter(ir_call *ir)
+{
+   foreach_list_safe(n, &ir->actual_parameters) {
+      ir_rvalue *param = (ir_rvalue *) n;
+      ir_rvalue *new_param = convert_vector_extract_to_swizzle(param);
+
+      if (new_param != param) {
+	 param->replace_with(new_param);
+      }
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vec_index_to_swizzle_visitor::visit_enter(ir_return *ir)
+{
+   if (ir->value) {
+      ir->value = convert_vector_extract_to_swizzle(ir->value);
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_vec_index_to_swizzle_visitor::visit_enter(ir_if *ir)
+{
+   ir->condition = convert_vector_extract_to_swizzle(ir->condition);
+
+   return visit_continue;
+}
+
+bool
+do_vec_index_to_swizzle(exec_list *instructions)
+{
+   ir_vec_index_to_swizzle_visitor v;
+
+   v.run(instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/lower_vector.cpp b/icd/intel/compiler/shader/lower_vector.cpp
new file mode 100644
index 0000000..d8eb55e
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_vector.cpp
@@ -0,0 +1,228 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_vector.cpp
+ * IR lowering pass to remove some types of ir_quadop_vector
+ *
+ * \author Ian Romanick <ian.d.romanick@intel.com>
+ */
+
+#include "ir.h"
+#include "ir_rvalue_visitor.h"
+
+namespace {
+
+class lower_vector_visitor : public ir_rvalue_visitor {
+public:
+   lower_vector_visitor() : dont_lower_swz(false), progress(false)
+   {
+      /* empty */
+   }
+
+   void handle_rvalue(ir_rvalue **rvalue);
+
+   /**
+    * When true, SWZ-like (extended swizzle) expressions are left unlowered.
+    */
+   bool dont_lower_swz;
+
+   bool progress;
+};
+
+} /* anonymous namespace */
+
+/**
+ * Determine if an IR expression tree looks like an extended swizzle
+ *
+ * Extended swizzles consist of accesses of a single vector source (with
+ * optional per-component negation) and the constants -1, 0, or 1.
+ */
+bool
+is_extended_swizzle(ir_expression *ir)
+{
+   /* Track any variables that are accessed by this expression.
+    */
+   ir_variable *var = NULL;
+
+   assert(ir->operation == ir_quadop_vector);
+
+   for (unsigned i = 0; i < ir->type->vector_elements; i++) {
+      ir_rvalue *op = ir->operands[i];
+
+      while (op != NULL) {
+	 switch (op->ir_type) {
+	 case ir_type_constant: {
+	    const ir_constant *const c = op->as_constant();
+
+	    if (!c->is_one() && !c->is_zero() && !c->is_negative_one())
+	       return false;
+
+	    op = NULL;
+	    break;
+	 }
+
+	 case ir_type_dereference_variable: {
+	    ir_dereference_variable *const d = (ir_dereference_variable *) op;
+
+	    if ((var != NULL) && (var != d->var))
+	       return false;
+
+	    var = d->var;
+	    op = NULL;
+	    break;
+	 }
+
+	 case ir_type_expression: {
+	    ir_expression *const ex = (ir_expression *) op;
+
+	    if (ex->operation != ir_unop_neg)
+	       return false;
+
+	    op = ex->operands[0];
+	    break;
+	 }
+
+	 case ir_type_swizzle:
+	    op = ((ir_swizzle *) op)->val;
+	    break;
+
+	 default:
+	    return false;
+	 }
+      }
+   }
+
+   return true;
+}
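+
+/* Illustrative examples (not part of the pass itself): with a single source
+ * vector v, an expression such as
+ *
+ *    vec4(v.z, -v.x, 0.0, 1.0)
+ *
+ * is an extended swizzle (one vector source, optional per-component negation,
+ * constants -1, 0, or 1), whereas vec4(v.x, w.y, 0.0, 1.0) is not, because
+ * it reads two different variables.
+ */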
+
+void
+lower_vector_visitor::handle_rvalue(ir_rvalue **rvalue)
+{
+   if (!*rvalue)
+      return;
+
+   ir_expression *expr = (*rvalue)->as_expression();
+   if ((expr == NULL) || (expr->operation != ir_quadop_vector))
+      return;
+
+   if (this->dont_lower_swz && is_extended_swizzle(expr))
+      return;
+
+   /* FINISHME: Is this the right thing to use for the ralloc context?
+    */
+   void *const mem_ctx = expr;
+
+   assert(expr->type->vector_elements == expr->get_num_operands());
+
+   /* Generate a temporary with the same type as the ir_quadop_vector
+    * expression.
+    */
+   ir_variable *const temp =
+      new(mem_ctx) ir_variable(expr->type, "vecop_tmp", ir_var_temporary);
+
+   this->base_ir->insert_before(temp);
+
+   /* Counter of the number of components collected so far.
+    */
+   unsigned assigned;
+
+   /* Write-mask in the destination covering the components counted by
+    * 'assigned'.
+    */
+   unsigned write_mask;
+
+
+   /* Generate up to four assignments to that variable.  Try to group component
+    * assignments together:
+    *
+    * - All constant components can be assigned at once.
+    * - All assignments of components from a single variable with the same
+    *   unary operator can be assigned at once.
+    */
+   ir_constant_data d = { { 0 } };
+
+   assigned = 0;
+   write_mask = 0;
+   for (unsigned i = 0; i < expr->type->vector_elements; i++) {
+      const ir_constant *const c = expr->operands[i]->as_constant();
+
+      if (c == NULL)
+	 continue;
+
+      switch (expr->type->base_type) {
+      case GLSL_TYPE_UINT:  d.u[assigned] = c->value.u[0]; break;
+      case GLSL_TYPE_INT:   d.i[assigned] = c->value.i[0]; break;
+      case GLSL_TYPE_FLOAT: d.f[assigned] = c->value.f[0]; break;
+      case GLSL_TYPE_BOOL:  d.b[assigned] = c->value.b[0]; break;
+      default:              assert(!"Should not get here."); break;
+      }
+
+      write_mask |= (1U << i);
+      assigned++;
+   }
+
+   assert((write_mask == 0) == (assigned == 0));
+
+   /* If there were constant values, generate an assignment.
+    */
+   if (assigned > 0) {
+      ir_constant *const c =
+	 new(mem_ctx) ir_constant(glsl_type::get_instance(expr->type->base_type,
+							  assigned, 1),
+				  &d);
+      ir_dereference *const lhs = new(mem_ctx) ir_dereference_variable(temp);
+      ir_assignment *const assign =
+	 new(mem_ctx) ir_assignment(lhs, c, NULL, write_mask);
+
+      this->base_ir->insert_before(assign);
+   }
+
+   /* FINISHME: This should try to coalesce assignments.
+    */
+   for (unsigned i = 0; i < expr->type->vector_elements; i++) {
+      if (expr->operands[i]->ir_type == ir_type_constant)
+	 continue;
+
+      ir_dereference *const lhs = new(mem_ctx) ir_dereference_variable(temp);
+      ir_assignment *const assign =
+	 new(mem_ctx) ir_assignment(lhs, expr->operands[i], NULL, (1U << i));
+
+      this->base_ir->insert_before(assign);
+      assigned++;
+   }
+
+   assert(assigned == expr->type->vector_elements);
+
+   *rvalue = new(mem_ctx) ir_dereference_variable(temp);
+   this->progress = true;
+}
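+
+/* Illustrative sketch (not part of the pass itself): lowering
+ * vec4(a, b, 0.0, 1.0) builds a temporary with masked writes, roughly:
+ *
+ *    vec4 vecop_tmp;
+ *    vecop_tmp.zw = vec2(0.0, 1.0);   // all constant components at once
+ *    vecop_tmp.x = a;
+ *    vecop_tmp.y = b;
+ *
+ * and the original expression is replaced by a dereference of vecop_tmp.
+ */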
+
+bool
+lower_quadop_vector(exec_list *instructions, bool dont_lower_swz)
+{
+   lower_vector_visitor v;
+
+   v.dont_lower_swz = dont_lower_swz;
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/lower_vector_insert.cpp b/icd/intel/compiler/shader/lower_vector_insert.cpp
new file mode 100644
index 0000000..6d7cfa9
--- /dev/null
+++ b/icd/intel/compiler/shader/lower_vector_insert.cpp
@@ -0,0 +1,142 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include "ir.h"
+#include "ir_builder.h"
+#include "ir_rvalue_visitor.h"
+#include "ir_optimization.h"
+
+using namespace ir_builder;
+
+namespace {
+
+class vector_insert_visitor : public ir_rvalue_visitor {
+public:
+   vector_insert_visitor(bool lower_nonconstant_index)
+      : progress(false), lower_nonconstant_index(lower_nonconstant_index)
+   {
+      factory.instructions = &factory_instructions;
+   }
+
+   virtual ~vector_insert_visitor()
+   {
+      assert(factory_instructions.is_empty());
+   }
+
+   virtual void handle_rvalue(ir_rvalue **rv);
+
+   ir_factory factory;
+   exec_list factory_instructions;
+   bool progress;
+   bool lower_nonconstant_index;
+};
+
+} /* anonymous namespace */
+
+void
+vector_insert_visitor::handle_rvalue(ir_rvalue **rv)
+{
+   if (*rv == NULL || (*rv)->ir_type != ir_type_expression)
+      return;
+
+   ir_expression *const expr = (ir_expression *) *rv;
+
+   if (likely(expr->operation != ir_triop_vector_insert))
+      return;
+
+   factory.mem_ctx = ralloc_parent(expr);
+
+   ir_constant *const idx = expr->operands[2]->constant_expression_value();
+   if (idx != NULL) {
+      /* Replace (vector_insert (vec) (scalar) (index)) with a dereference of
+       * a new temporary.  The new temporary gets assigned as
+       *
+       *     t = vec
+       *     t.mask = scalar
+       *
+       * where mask is the component selected by index.
+       */
+      ir_variable *const temp =
+         factory.make_temp(expr->operands[0]->type, "vec_tmp");
+
+      const int mask = 1 << idx->value.i[0];
+
+      factory.emit(assign(temp, expr->operands[0]));
+      factory.emit(assign(temp, expr->operands[1], mask));
+
+      this->progress = true;
+      *rv = new(factory.mem_ctx) ir_dereference_variable(temp);
+   } else if (this->lower_nonconstant_index) {
+      /* Replace (vector_insert (vec) (scalar) (index)) with a dereference of
+       * a new temporary.  The new temporary gets assigned as
+       *
+       *     t = vec
+       *     if (index == 0)
+       *         t.x = scalar
+       *     if (index == 1)
+       *         t.y = scalar
+       *     if (index == 2)
+       *         t.z = scalar
+       *     if (index == 3)
+       *         t.w = scalar
+       */
+      ir_variable *const temp =
+         factory.make_temp(expr->operands[0]->type, "vec_tmp");
+
+      ir_variable *const src_temp =
+         factory.make_temp(expr->operands[1]->type, "src_temp");
+
+      factory.emit(assign(temp, expr->operands[0]));
+      factory.emit(assign(src_temp, expr->operands[1]));
+
+      for (unsigned i = 0; i < expr->type->vector_elements; i++) {
+         ir_constant *const cmp_index =
+            new(factory.mem_ctx) ir_constant(int(i));
+
+         ir_variable *const cmp_result =
+            factory.make_temp(glsl_type::bool_type, "index_condition");
+
+         factory.emit(assign(cmp_result,
+                             equal(expr->operands[2]->clone(factory.mem_ctx,
+                                                            NULL),
+                                   cmp_index)));
+
+         factory.emit(if_tree(cmp_result,
+                              assign(temp, src_temp, WRITEMASK_X << i)));
+      }
+
+      this->progress = true;
+      *rv = new(factory.mem_ctx) ir_dereference_variable(temp);
+   }
+
+   base_ir->insert_before(factory.instructions);
+}
+
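+/* Context sketch (an assumption about the callers, not stated in this file):
+ * ir_triop_vector_insert typically arises from assigning to a vector
+ * component through a non-constant index, e.g. GLSL "v[i] = s;", which the
+ * front end expresses as v = vector_insert(v, s, i).  The expansions above
+ * turn that back into masked or conditional component writes.
+ */
+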
+bool
+lower_vector_insert(exec_list *instructions, bool lower_nonconstant_index)
+{
+   vector_insert_visitor v(lower_nonconstant_index);
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/main.cpp b/icd/intel/compiler/shader/main.cpp
new file mode 100644
index 0000000..91d1d67
--- /dev/null
+++ b/icd/intel/compiler/shader/main.cpp
@@ -0,0 +1,253 @@
+/*
+ * Copyright © 2008, 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <getopt.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <string.h>
+#include <inttypes.h>
+
+/** @file main.cpp
+ *
+ * This file is the main() routine and scaffolding for producing
+ * builtin_compiler (which doesn't include builtins itself and is used
+ * to generate the profile information for builtin_function.cpp), and
+ * for glsl_compiler (which does include builtins and can be used to
+ * compile GLSL code offline and examine the resulting GLSL IR).
+ */
+
+#include "gpu.h"
+#include "pipeline.h"
+#include "compiler_interface.h"
+#include "compiler/mesa-utils/src/glsl/ralloc.h"
+#include "pipeline_compiler_interface.h"
+
+
+static char* load_spv_file(const char *filename, size_t *psize)
+{
+    long int size;
+    void *shader_code;
+
+    FILE *fp = fopen(filename, "rb");
+    if (!fp) return NULL;
+
+    fseek(fp, 0L, SEEK_END);
+    size = ftell(fp);
+
+    fseek(fp, 0L, SEEK_SET);
+
+    shader_code = malloc(size);
+    size_t tmp = fread(shader_code, size, 1, fp);
+    (void) tmp;
+
+    *psize = size;
+
+    return (char *) shader_code;
+}
+
+
+static char* load_glsl_file(const char *filename, size_t *psize, VkShaderStageFlagBits stage)
+{
+    long int size;
+    void *shader_code;
+
+    FILE *fp = fopen(filename, "r");
+    if (!fp) return NULL;
+
+    fseek(fp, 0L, SEEK_END);
+    size = ftell(fp) + sizeof(icd_spv_header) + 1;
+
+    fseek(fp, 0L, SEEK_SET);
+
+    shader_code = malloc(size);
+    size_t s = fread((char *)shader_code + sizeof(icd_spv_header), size - sizeof(icd_spv_header), 1, fp);
+    (void) s;
+    ((char *)shader_code)[size-1] = 0;
+
+    icd_spv_header* header = (icd_spv_header*)shader_code;
+    header->magic = ICD_SPV_MAGIC;
+    header->version = 0; // not SPV
+    header->gen_magic = stage;
+
+    *psize = size;
+
+    return (char *) shader_code;
+}
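+
+/* The buffer produced above is laid out as
+ *
+ *    [icd_spv_header][GLSL source text][NUL]
+ *
+ * with header->version == 0 marking the payload as GLSL rather than SPIR-V,
+ * and gen_magic carrying the shader stage.
+ */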
+
+int dump_ast = 0;
+int dump_hir = 0;
+int dump_lir = 0;
+int do_link = 0;
+
+const struct option compiler_opts[] = {
+   { "dump-ast", no_argument, &dump_ast, 1 },
+   { "dump-hir", no_argument, &dump_hir, 1 },
+   { "dump-lir", no_argument, &dump_lir, 1 },
+   { "link",     no_argument, &do_link,  1 },
+   { "version",  required_argument, NULL, 'v' },
+   { NULL, 0, NULL, 0 }
+};
+
+
+bool checkFileName(char* fileName)
+{
+    const unsigned fileNameLength = strlen(fileName);
+    if (fileNameLength < 5 ||
+            strncmp(".spv", &fileName[fileNameLength - 4], 4) != 0) {
+        printf("file must be .spv, .vert, .geom, or .frag\n");
+        return false;
+    }
+
+    return true;
+}
+
+
+bool checkFileExt(char* fileName, const char* ext)
+{
+    const size_t fileNameLength = strlen(fileName);
+    const size_t extLength = strlen(ext);
+    if (fileNameLength < extLength ||
+            strncmp(ext, &fileName[fileNameLength - extLength], extLength) != 0) {
+        return false;
+    }
+
+    return true;
+}
+
+int main(int argc, char **argv)
+{
+   int status = EXIT_SUCCESS;
+
+   switch (argc) {
+   case 2:
+       {
+           // Call vkCreateShader on the single shader
+
+           printf("Frontend compile %s\n", argv[1]);
+           fflush(stdout);
+
+           void *shaderCode = 0;
+           size_t size = 0;
+           VkShaderStageFlagBits stage = VK_SHADER_STAGE_VERTEX_BIT;
+
+           if (checkFileExt(argv[1], "vert.spv")) {
+               shaderCode = load_spv_file(argv[1], &size);
+               stage = VK_SHADER_STAGE_VERTEX_BIT;
+           } else if (checkFileExt(argv[1], "frag.spv")) {
+               shaderCode = load_spv_file(argv[1], &size);
+               stage = VK_SHADER_STAGE_FRAGMENT_BIT;
+           } else if (checkFileExt(argv[1], "geom.spv")) {
+               shaderCode = load_spv_file(argv[1], &size);
+               stage = VK_SHADER_STAGE_GEOMETRY_BIT;
+           } else if (checkFileExt(argv[1], ".spv")) {
+               shaderCode = load_spv_file(argv[1], &size);
+           } else if (checkFileExt(argv[1], ".vert")) {
+               stage = VK_SHADER_STAGE_VERTEX_BIT;
+           } else if (checkFileExt(argv[1], ".geom")) {
+               stage = VK_SHADER_STAGE_GEOMETRY_BIT;
+           } else if (checkFileExt(argv[1], ".frag")) {
+               stage = VK_SHADER_STAGE_FRAGMENT_BIT;
+           } else {
+               return EXIT_FAILURE;
+           }
+
+           if (!shaderCode)
+               shaderCode = load_glsl_file(argv[1], &size, stage);
+
+           assert(shaderCode);
+
+           struct intel_ir *shader_program = shader_create_ir(NULL, shaderCode, size, stage);
+           assert(shader_program);
+
+           // Set up only the fields needed for backend compile
+           struct intel_gpu gpu = { 0 };
+           gpu.gen_opaque = INTEL_GEN(7.5);
+           gpu.gt = 3;
+
+           printf("Backend compile %s\n", argv[1]);
+           fflush(stdout);
+
+           // struct timespec before;
+           // clock_gettime(CLOCK_MONOTONIC, &before);
+           // uint64_t beforeNanoSeconds = before.tv_nsec + before.tv_sec*INT64_C(1000000000);
+
+           struct intel_pipeline_shader pipe_shader;
+           VkResult ret = intel_pipeline_shader_compile(&pipe_shader, &gpu, NULL, NULL, shader_program);
+
+           // struct timespec after;
+           // clock_gettime(CLOCK_MONOTONIC, &after);
+           // uint64_t afterNanoSeconds = after.tv_nsec + after.tv_sec*INT64_C(1000000000);
+           // printf("file: %s, intel_pipeline_shader_compile = %" PRIu64 " milliseconds\n", argv[1], (afterNanoSeconds - beforeNanoSeconds)/1000000);
+           // fflush(stdout);
+
+           if (ret != VK_SUCCESS)
+               return ret;
+
+           intel_pipeline_shader_cleanup(&pipe_shader, &gpu);
+           shader_destroy_ir(shader_program);
+        }
+       break;
+   case 3:
+       // Call vkCreateShader on both shaders, then call vkCreateGraphicsPipeline?
+       // Only need to hook this up if we start invoking the backend once for the whole pipeline
+
+       printf("Multiple shaders not hooked up yet\n");
+       break;
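+
+       /* NOTE: the break above makes the two-shader code below unreachable;
+        * it is kept for when multi-shader compiles get hooked up.
+        */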
+
+       // Ensure both filenames have a .spv extension
+       if (!checkFileName(argv[1]))
+           return EXIT_FAILURE;
+
+       if (!checkFileName(argv[2]))
+           return EXIT_FAILURE;
+
+       void *shaderCode[2];
+       size_t size[2];
+       struct intel_ir *result[2];
+
+       // Compile first shader
+       shaderCode[0] = load_spv_file(argv[1], &size[0]);
+       assert(shaderCode[0]);
+       printf("Compiling %s\n", argv[1]);
+       result[0] = shader_create_ir(NULL, shaderCode[0], size[0], VK_SHADER_STAGE_VERTEX_BIT);
+       assert(result[0]);
+
+       // Compile second shader
+       shaderCode[1] = load_spv_file(argv[2], &size[1]);
+       assert(shaderCode[1]);
+       printf("Compiling %s\n", argv[2]);
+       result[1] = shader_create_ir(NULL, shaderCode[1], size[1], VK_SHADER_STAGE_FRAGMENT_BIT);
+       assert(result[1]);
+
+
+       shader_destroy_ir(result[0]);
+       shader_destroy_ir(result[1]);
+
+       break;
+   case 0:
+   case 1:
+   default:
+       printf("Please provide one .spv, .vert or .frag file as input\n");
+       break;
+   }
+
+   return status;
+}
diff --git a/icd/intel/compiler/shader/memory_map.h b/icd/intel/compiler/shader/memory_map.h
new file mode 100644
index 0000000..fc13134
--- /dev/null
+++ b/icd/intel/compiler/shader/memory_map.h
@@ -0,0 +1,237 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef MEMORY_MAP_H
+#define MEMORY_MAP_H
+
+#include <fcntl.h>
+#include <unistd.h>
+
+#ifdef _POSIX_MAPPED_FILES
+#include <sys/mman.h>
+#include <sys/stat.h>
+#endif
+
+#include <stdint.h>
+#include <string.h>
+#include "ralloc.h"
+
+#ifdef __cplusplus
+
+/**
+ * Helper class to read data
+ *
+ * The class can read either from user-given memory or from a file.  On Linux,
+ * file reading wraps the POSIX functions for mapping a file into the
+ * process's address space.  Other OSes may need a different implementation.
+ */
+class memory_map
+{
+public:
+   memory_map() :
+      error(false),
+      mode(memory_map::READ_MEM),
+      cache_size(0),
+      cache_mmap(NULL),
+      cache_mmap_p(NULL)
+   {
+      mem_ctx = ralloc_context(NULL);
+   }
+
+   /* read from disk */
+   int map(const char *path)
+   {
+#ifdef _POSIX_MAPPED_FILES
+      struct stat stat_info;
+      if (stat(path, &stat_info) != 0)
+         return -1;
+
+      mode = memory_map::READ_MAP;
+      cache_size = stat_info.st_size;
+
+      int fd = open(path, O_RDONLY);
+      if (fd >= 0) {
+         cache_mmap_p = cache_mmap = (char *)
+            mmap(NULL, cache_size, PROT_READ, MAP_PRIVATE, fd, 0);
+         close(fd);
+         return (cache_mmap == MAP_FAILED) ? -1 : 0;
+      }
+#else
+      /* Implementation for systems without mmap(). */
+      FILE *in = fopen(path, "r");
+      if (in) {
+         fseek(in, 0, SEEK_END);
+         cache_size = ftell(in);
+         rewind(in);
+
+         cache_mmap = ralloc_array(mem_ctx, char, cache_size);
+
+         if (!cache_mmap)
+            return -1;
+
+         if (fread(cache_mmap, cache_size, 1, in) != 1) {
+            ralloc_free(cache_mmap);
+            cache_mmap = NULL;
+         }
+         cache_mmap_p = cache_mmap;
+         fclose(in);
+
+         return (cache_mmap == NULL) ? -1 : 0;
+      }
+#endif
+      return -1;
+   }
+
+   /* read from memory */
+   void map(const void *memory, size_t size)
+   {
+      cache_mmap_p = cache_mmap = (char *) memory;
+      cache_size = size;
+   }
+
+   ~memory_map() {
+#ifdef _POSIX_MAPPED_FILES
+      if (cache_mmap && mode == READ_MAP) {
+         munmap(cache_mmap, cache_size);
+      }
+#endif
+      ralloc_free(mem_ctx);
+   }
+
+   /* move read pointer forward */
+   inline void ffwd(int len)
+   {
+      cache_mmap_p += len;
+   }
+
+   inline void jump(unsigned pos)
+   {
+      cache_mmap_p = cache_mmap + pos;
+   }
+
+   /**
+    * safety check to avoid reading over cache_size,
+    * returns true if it is safe to continue reading
+    */
+   bool safe_read(unsigned size)
+   {
+      if (position() + size > cache_size)
+         error = true;
+      return !error;
+   }
+
+   /* position of read pointer */
+   inline uint32_t position()
+   {
+      return cache_mmap_p - cache_mmap;
+   }
+
+   inline char *read_string()
+   {
+      uint32_t len = read_uint32_t();
+
+      /* NULL pointer is supported */
+      if (len == 0)
+         return NULL;
+
+      /* don't read off the end of cache */
+      /* TODO: Understand how this can happen and fix */
+      if (len + position() > cache_size) {
+         error = true;
+         return NULL;
+      }
+
+      /* verify that last character is terminator */
+      if (*(cache_mmap_p + len - 1) != '\0') {
+         error = true;
+         return NULL;
+      }
+
+      char *str = ralloc_array(mem_ctx, char, len);
+      memcpy(str, cache_mmap_p, len);
+      ffwd(len);
+      return str;
+   }
+
+/**
+ * read functions per type
+ */
+#define DECL_READER(type) type read_ ##type () {\
+   if (!safe_read(sizeof(type)))\
+      return 0;\
+   ffwd(sizeof(type));\
+   return *(type *) (cache_mmap_p - sizeof(type));\
+}
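+
+/* Each DECL_READER(type) invocation below defines a typed reader; for
+ * example, DECL_READER(uint32_t) expands to
+ *
+ *    uint32_t read_uint32_t() {
+ *       if (!safe_read(sizeof(uint32_t)))
+ *          return 0;
+ *       ffwd(sizeof(uint32_t));
+ *       return *(uint32_t *) (cache_mmap_p - sizeof(uint32_t));
+ *    }
+ */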
+
+   DECL_READER(int32_t);
+   DECL_READER(int64_t);
+   DECL_READER(uint8_t);
+   DECL_READER(uint32_t);
+
+   inline uint8_t read_bool()
+   {
+      return read_uint8_t();
+   }
+
+   inline void read(void *dst, size_t size)
+   {
+      if (!safe_read(size))
+         return;
+      memcpy(dst, cache_mmap_p, size);
+      ffwd(size);
+   }
+
+   /* total size of mapped memory */
+   inline int32_t size()
+   {
+      return cache_size;
+   }
+
+   inline bool errors()
+   {
+      return error;
+   }
+
+private:
+
+   void *mem_ctx;
+
+   /* if errors have occurred during reading */
+   bool error;
+
+   /* specifies if we are reading mapped memory or user passed mem */
+   enum read_mode {
+      READ_MEM = 0,
+      READ_MAP
+   };
+
+   int32_t mode;
+   unsigned cache_size;
+   char *cache_mmap;
+   char *cache_mmap_p;
+};
+#endif /* ifdef __cplusplus */
+
+#endif /* MEMORY_MAP_H */
diff --git a/icd/intel/compiler/shader/memory_writer.h b/icd/intel/compiler/shader/memory_writer.h
new file mode 100644
index 0000000..f98d118
--- /dev/null
+++ b/icd/intel/compiler/shader/memory_writer.h
@@ -0,0 +1,204 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef MEMORY_WRITER_H
+#define MEMORY_WRITER_H
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "main/hash_table.h"
+
+#ifdef __cplusplus
+/**
+ * Helper class for writing data to memory
+ *
+ * This class maintains a dynamically-sized memory buffer and allows
+ * for data to be efficiently appended to it with automatic resizing.
+ */
+class memory_writer
+{
+public:
+   memory_writer() :
+      memory(NULL),
+      curr_size(0),
+      pos(0),
+      unique_id_counter(0)
+   {
+      data_hash = _mesa_hash_table_create(0, int_equal);
+      hash_value = _mesa_hash_data(this, sizeof(memory_writer));
+   }
+
+   ~memory_writer()
+   {
+      free(memory);
+      _mesa_hash_table_destroy(data_hash, NULL);
+   }
+
+   /* user wants to claim the memory */
+   char *release_memory(size_t *size)
+   {
+      /* final realloc to free allocated but unused memory */
+      char *result = (char *) realloc(memory, pos);
+      *size = pos;
+      memory = NULL;
+      curr_size = 0;
+      pos = 0;
+      return result;
+   }
+
+/**
+ * write functions per type
+ */
+#define DECL_WRITER(type) void write_ ##type (const type data) {\
+   write(&data, sizeof(type));\
+}
+
+   DECL_WRITER(int32_t);
+   DECL_WRITER(int64_t);
+   DECL_WRITER(uint8_t);
+   DECL_WRITER(uint32_t);
+
+   void write_bool(bool data)
+   {
+      uint8_t val = data;
+      write_uint8_t(val);
+   }
+
+   /* write function that reallocates more memory if required */
+   void write(const void *data, int size)
+   {
+      if (!memory || pos > (curr_size - size))
+         if (!grow(size)) {
+            assert(!"Out of memory while serializing a shader");
+            return;
+         }
+
+      memcpy(memory + pos, data, size);
+
+      pos += size;
+   }
+
+   void overwrite(const void *data, int size, int offset)
+   {
+      if (offset < 0 || offset + size > pos) {
+         assert(!"Attempt to write out of bounds while serializing a shader");
+         return;
+      }
+
+      memcpy(memory + offset, data, size);
+   }
+
+   /* The length is written to make reading safe; we write len + 1 so that
+    * "" and NULL can be distinguished.
+    */
+   void write_string(const char *str)
+   {
+      uint32_t len = str ? strlen(str) + 1 : 0;
+      write_uint32_t(len);
+
+      /* serialize string + terminator for more convenient parsing. */
+      if (str)
+         write(str, len);
+   }
+
+   unsigned position()
+   {
+      return pos;
+   }
+
+   /**
+    * Convert the given pointer into a small integer unique ID.  In other
+    * words, if make_unique_id() has previously been called with this pointer,
+    * return the same ID that was returned last time.  If this is the first
+    * call to make_unique_id() with this pointer, return a fresh ID.
+    *
+    * Return value is true if the pointer has been seen before, false
+    * otherwise.
+    */
+   bool make_unique_id(const void *ptr, uint32_t *id_out)
+   {
+      hash_entry *entry =
+         _mesa_hash_table_search(data_hash, _mesa_hash_pointer(ptr), ptr);
+      if (entry != NULL) {
+         *id_out = (uint32_t) (intptr_t) entry->data;
+         return true;
+      } else {
+         /* Note: hashtable uses 0 to represent "entry not found" so our
+          * unique ID's need to start at 1.  Hence, preincrement
+          * unique_id_counter.
+          */
+         *id_out = ++this->unique_id_counter;
+         _mesa_hash_table_insert(data_hash, _mesa_hash_pointer(ptr), ptr,
+                                 (void *) (intptr_t) *id_out);
+         return false;
+      }
+   }
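+
+   /* Usage sketch (an illustration, not from the original header): callers
+    * can use the returned flag to serialize shared objects only once:
+    *
+    *    uint32_t id;
+    *    if (writer.make_unique_id(ptr, &id))
+    *       writer.write_uint32_t(id);      // seen before: back-reference
+    *    else {
+    *       writer.write_uint32_t(id);      // first sighting: fresh id...
+    *       // ...followed by the full serialized object
+    *    }
+    */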
+
+private:
+
+   /* reallocate more memory */
+   bool grow(int size)
+   {
+      unsigned new_size = 2 * (curr_size + size);
+      char *more_mem = (char *) realloc(memory, new_size);
+      if (more_mem == NULL) {
+         free(memory);
+         memory = NULL;
+         return false;
+      } else {
+         memory = more_mem;
+         curr_size = new_size;
+         return true;
+      }
+   }
+
+   /* allocated memory */
+   char *memory;
+
+   /* current size of the whole allocation */
+   int curr_size;
+
+   /* write position / size of the data written */
+   int pos;
+
+   /* this hash can be used to refer to data already written
+    * to skip sequential writes of the same data
+    */
+   struct hash_table *data_hash;
+   uint32_t hash_value;
+   unsigned unique_id_counter;
+
+   static bool int_equal(const void *a, const void *b)
+   {
+      return a == b;
+   }
+
+};
+
+#endif /* ifdef __cplusplus */
+
+#endif /* MEMORY_WRITER_H */
diff --git a/icd/intel/compiler/shader/opt_algebraic.cpp b/icd/intel/compiler/shader/opt_algebraic.cpp
new file mode 100644
index 0000000..9d55392
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_algebraic.cpp
@@ -0,0 +1,657 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_algebraic.cpp
+ *
+ * Takes advantage of associativity, commutativity, and other algebraic
+ * properties to simplify expressions.
+ */
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_rvalue_visitor.h"
+#include "ir_optimization.h"
+#include "ir_builder.h"
+#include "glsl_types.h"
+
+using namespace ir_builder;
+
+namespace {
+
+/**
+ * Visitor class for simplifying expressions using algebraic identities.
+ */
+
+class ir_algebraic_visitor : public ir_rvalue_visitor {
+public:
+   ir_algebraic_visitor(bool native_integers)
+   {
+      this->progress = false;
+      this->mem_ctx = NULL;
+      this->native_integers = native_integers;
+   }
+
+   virtual ~ir_algebraic_visitor()
+   {
+   }
+
+   ir_rvalue *handle_expression(ir_expression *ir);
+   void handle_rvalue(ir_rvalue **rvalue);
+   bool reassociate_constant(ir_expression *ir1,
+			     int const_index,
+			     ir_constant *constant,
+			     ir_expression *ir2);
+   void reassociate_operands(ir_expression *ir1,
+			     int op1,
+			     ir_expression *ir2,
+			     int op2);
+   ir_rvalue *swizzle_if_required(ir_expression *expr,
+				  ir_rvalue *operand);
+
+   void *mem_ctx;
+
+   bool native_integers;
+   bool progress;
+};
+
+} /* unnamed namespace */
+
+static inline bool
+is_vec_zero(ir_constant *ir)
+{
+   return (ir == NULL) ? false : ir->is_zero();
+}
+
+static inline bool
+is_vec_one(ir_constant *ir)
+{
+   return (ir == NULL) ? false : ir->is_one();
+}
+
+static inline bool
+is_vec_two(ir_constant *ir)
+{
+   return (ir == NULL) ? false : ir->is_value(2.0, 2);
+}
+
+static inline bool
+is_vec_negative_one(ir_constant *ir)
+{
+   return (ir == NULL) ? false : ir->is_negative_one();
+}
+
+static inline bool
+is_vec_basis(ir_constant *ir)
+{
+   return (ir == NULL) ? false : ir->is_basis();
+}
+
+static void
+update_type(ir_expression *ir)
+{
+   if (ir->operands[0]->type->is_vector())
+      ir->type = ir->operands[0]->type;
+   else
+      ir->type = ir->operands[1]->type;
+}
+
+void
+ir_algebraic_visitor::reassociate_operands(ir_expression *ir1,
+					   int op1,
+					   ir_expression *ir2,
+					   int op2)
+{
+   ir_rvalue *temp = ir2->operands[op2];
+   ir2->operands[op2] = ir1->operands[op1];
+   ir1->operands[op1] = temp;
+
+   /* Update the type of ir2.  The type of ir1 won't have changed --
+    * base types matched, and at least one of the operands of the 2
+    * binops is still a vector if any of them were.
+    */
+   update_type(ir2);
+
+   this->progress = true;
+}
+
+/**
+ * Reassociates a constant down a tree of adds or multiplies.
+ *
+ * Consider (2 * (a * (b * 0.5))).  We want to end up with a * b.
+ */
+bool
+ir_algebraic_visitor::reassociate_constant(ir_expression *ir1, int const_index,
+					   ir_constant *constant,
+					   ir_expression *ir2)
+{
+   if (!ir2 || ir1->operation != ir2->operation)
+      return false;
+
+   /* Don't want to even think about matrices. */
+   if (ir1->operands[0]->type->is_matrix() ||
+       ir1->operands[1]->type->is_matrix() ||
+       ir2->operands[0]->type->is_matrix() ||
+       ir2->operands[1]->type->is_matrix())
+      return false;
+
+   ir_constant *ir2_const[2];
+   ir2_const[0] = ir2->operands[0]->constant_expression_value();
+   ir2_const[1] = ir2->operands[1]->constant_expression_value();
+
+   if (ir2_const[0] && ir2_const[1])
+      return false;
+
+   if (ir2_const[0]) {
+      reassociate_operands(ir1, const_index, ir2, 1);
+      return true;
+   } else if (ir2_const[1]) {
+      reassociate_operands(ir1, const_index, ir2, 0);
+      return true;
+   }
+
+   if (reassociate_constant(ir1, const_index, constant,
+			    ir2->operands[0]->as_expression())) {
+      update_type(ir2);
+      return true;
+   }
+
+   if (reassociate_constant(ir1, const_index, constant,
+			    ir2->operands[1]->as_expression())) {
+      update_type(ir2);
+      return true;
+   }
+
+   return false;
+}
+
+/* When eliminating an expression and just returning one of its operands,
+ * we may need to swizzle that operand out to a vector if the expression
+ * was of vector type.
+ */
+ir_rvalue *
+ir_algebraic_visitor::swizzle_if_required(ir_expression *expr,
+					  ir_rvalue *operand)
+{
+   if (expr->type->is_vector() && operand->type->is_scalar()) {
+      return new(mem_ctx) ir_swizzle(operand, 0, 0, 0, 0,
+				     expr->type->vector_elements);
+   } else
+      return operand;
+}
+
+ir_rvalue *
+ir_algebraic_visitor::handle_expression(ir_expression *ir)
+{
+   ir_constant *op_const[4] = {NULL, NULL, NULL, NULL};
+   ir_expression *op_expr[4] = {NULL, NULL, NULL, NULL};
+   unsigned int i;
+
+   assert(ir->get_num_operands() <= 4);
+   for (i = 0; i < ir->get_num_operands(); i++) {
+      if (ir->operands[i]->type->is_matrix())
+	 return ir;
+
+      op_const[i] = ir->operands[i]->constant_expression_value();
+      op_expr[i] = ir->operands[i]->as_expression();
+   }
+
+   if (this->mem_ctx == NULL)
+      this->mem_ctx = ralloc_parent(ir);
+
+   switch (ir->operation) {
+   case ir_unop_bit_not:
+      if (op_expr[0] && op_expr[0]->operation == ir_unop_bit_not)
+         return op_expr[0]->operands[0];
+      break;
+
+   case ir_unop_abs:
+      if (op_expr[0] == NULL)
+	 break;
+
+      switch (op_expr[0]->operation) {
+      case ir_unop_abs:
+      case ir_unop_neg:
+         return abs(op_expr[0]->operands[0]);
+      default:
+         break;
+      }
+      break;
+
+   case ir_unop_neg:
+      if (op_expr[0] == NULL)
+	 break;
+
+      if (op_expr[0]->operation == ir_unop_neg) {
+         return op_expr[0]->operands[0];
+      }
+      break;
+
+   case ir_unop_exp:
+      if (op_expr[0] == NULL)
+	 break;
+
+      if (op_expr[0]->operation == ir_unop_log) {
+         return op_expr[0]->operands[0];
+      }
+      break;
+
+   case ir_unop_log:
+      if (op_expr[0] == NULL)
+	 break;
+
+      if (op_expr[0]->operation == ir_unop_exp) {
+         return op_expr[0]->operands[0];
+      }
+      break;
+
+   case ir_unop_exp2:
+      if (op_expr[0] == NULL)
+	 break;
+
+      if (op_expr[0]->operation == ir_unop_log2) {
+         return op_expr[0]->operands[0];
+      }
+      break;
+
+   case ir_unop_log2:
+      if (op_expr[0] == NULL)
+	 break;
+
+      if (op_expr[0]->operation == ir_unop_exp2) {
+         return op_expr[0]->operands[0];
+      }
+      break;
+
+   case ir_unop_logic_not: {
+      enum ir_expression_operation new_op = ir_unop_logic_not;
+
+      if (op_expr[0] == NULL)
+	 break;
+
+      switch (op_expr[0]->operation) {
+      case ir_binop_less:    new_op = ir_binop_gequal;  break;
+      case ir_binop_greater: new_op = ir_binop_lequal;  break;
+      case ir_binop_lequal:  new_op = ir_binop_greater; break;
+      case ir_binop_gequal:  new_op = ir_binop_less;    break;
+      case ir_binop_equal:   new_op = ir_binop_nequal;  break;
+      case ir_binop_nequal:  new_op = ir_binop_equal;   break;
+      case ir_binop_all_equal:   new_op = ir_binop_any_nequal;  break;
+      case ir_binop_any_nequal:  new_op = ir_binop_all_equal;   break;
+
+      default:
+	 /* The default case handler is here to silence a warning from GCC.
+	  */
+	 break;
+      }
+
+      if (new_op != ir_unop_logic_not) {
+	 return new(mem_ctx) ir_expression(new_op,
+					   ir->type,
+					   op_expr[0]->operands[0],
+					   op_expr[0]->operands[1]);
+      }
+
+      break;
+   }
+
+   case ir_binop_add:
+      if (is_vec_zero(op_const[0]))
+	 return ir->operands[1];
+      if (is_vec_zero(op_const[1]))
+	 return ir->operands[0];
+
+      /* Reassociate addition of constants so that we can do constant
+       * folding.
+       */
+      if (op_const[0] && !op_const[1])
+	 reassociate_constant(ir, 0, op_const[0], op_expr[1]);
+      if (op_const[1] && !op_const[0])
+	 reassociate_constant(ir, 1, op_const[1], op_expr[0]);
+
+      /* Replace (-x + y) * a + x and commutative variations with lrp(x, y, a).
+       *
+       * (-x + y) * a + x
+       * (x * -a) + (y * a) + x
+       * x + (x * -a) + (y * a)
+       * x * (1 - a) + y * a
+       * lrp(x, y, a)
+       */
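+      /* Note: ir_triop_lrp corresponds to GLSL's mix(x, y, a), so this
+       * rewrite collapses the whole add/multiply chain into a single
+       * operation on hardware with a native lrp instruction.
+       */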
+      for (int mul_pos = 0; mul_pos < 2; mul_pos++) {
+         ir_expression *mul = op_expr[mul_pos];
+
+         if (!mul || mul->operation != ir_binop_mul)
+            continue;
+
+         /* Multiply found on one of the operands. Now check for an
+          * inner addition operation.
+          */
+         for (int inner_add_pos = 0; inner_add_pos < 2; inner_add_pos++) {
+            ir_expression *inner_add =
+               mul->operands[inner_add_pos]->as_expression();
+
+            if (!inner_add || inner_add->operation != ir_binop_add)
+               continue;
+
+            /* Inner addition found on one of the operands. Now check for
+             * one of the operands of the inner addition to be the negative
+             * of x_operand.
+             */
+            for (int neg_pos = 0; neg_pos < 2; neg_pos++) {
+               ir_expression *neg =
+                  inner_add->operands[neg_pos]->as_expression();
+
+               if (!neg || neg->operation != ir_unop_neg)
+                  continue;
+
+               ir_rvalue *x_operand = ir->operands[1 - mul_pos];
+
+               if (!neg->operands[0]->equals(x_operand))
+                  continue;
+
+               ir_rvalue *y_operand = inner_add->operands[1 - neg_pos];
+               ir_rvalue *a_operand = mul->operands[1 - inner_add_pos];
+
+               if (x_operand->type != y_operand->type ||
+                   x_operand->type != a_operand->type)
+                  continue;
+
+               return lrp(x_operand, y_operand, a_operand);
+            }
+         }
+      }
+      break;
+
+   case ir_binop_sub:
+      if (is_vec_zero(op_const[0]))
+	 return neg(ir->operands[1]);
+      if (is_vec_zero(op_const[1]))
+	 return ir->operands[0];
+      break;
+
+   case ir_binop_mul:
+      if (is_vec_one(op_const[0]))
+	 return ir->operands[1];
+      if (is_vec_one(op_const[1]))
+	 return ir->operands[0];
+
+      if (is_vec_zero(op_const[0]) || is_vec_zero(op_const[1]))
+	 return ir_constant::zero(ir, ir->type);
+
+      if (is_vec_negative_one(op_const[0]))
+         return neg(ir->operands[1]);
+      if (is_vec_negative_one(op_const[1]))
+         return neg(ir->operands[0]);
+
+
+      /* Reassociate multiplication of constants so that we can do
+       * constant folding.
+       */
+      if (op_const[0] && !op_const[1])
+	 reassociate_constant(ir, 0, op_const[0], op_expr[1]);
+      if (op_const[1] && !op_const[0])
+	 reassociate_constant(ir, 1, op_const[1], op_expr[0]);
+
+      break;
+
+   case ir_binop_div:
+      if (is_vec_one(op_const[0]) && ir->type->base_type == GLSL_TYPE_FLOAT) {
+	 return new(mem_ctx) ir_expression(ir_unop_rcp,
+					   ir->operands[1]->type,
+					   ir->operands[1],
+					   NULL);
+      }
+      if (is_vec_one(op_const[1]))
+	 return ir->operands[0];
+      break;
+
+   case ir_binop_dot:
+      if (is_vec_zero(op_const[0]) || is_vec_zero(op_const[1]))
+	 return ir_constant::zero(mem_ctx, ir->type);
+
+      if (is_vec_basis(op_const[0])) {
+	 unsigned component = 0;
+	 for (unsigned c = 0; c < op_const[0]->type->vector_elements; c++) {
+	    if (op_const[0]->value.f[c] == 1.0)
+	       component = c;
+	 }
+	 return new(mem_ctx) ir_swizzle(ir->operands[1], component, 0, 0, 0, 1);
+      }
+      if (is_vec_basis(op_const[1])) {
+	 unsigned component = 0;
+	 for (unsigned c = 0; c < op_const[1]->type->vector_elements; c++) {
+	    if (op_const[1]->value.f[c] == 1.0)
+	       component = c;
+	 }
+	 return new(mem_ctx) ir_swizzle(ir->operands[0], component, 0, 0, 0, 1);
+      }
+      break;
+
+   case ir_binop_less:
+   case ir_binop_lequal:
+   case ir_binop_greater:
+   case ir_binop_gequal:
+   case ir_binop_equal:
+   case ir_binop_nequal:
+      for (int add_pos = 0; add_pos < 2; add_pos++) {
+         ir_expression *add = op_expr[add_pos];
+
+         if (!add || add->operation != ir_binop_add)
+            continue;
+
+         ir_constant *zero = op_const[1 - add_pos];
+         if (!is_vec_zero(zero))
+            continue;
+
+         return new(mem_ctx) ir_expression(ir->operation,
+                                           add->operands[0],
+                                           neg(add->operands[1]));
+      }
+      break;
+
+   case ir_binop_rshift:
+   case ir_binop_lshift:
+      /* 0 >> x == 0 and 0 << x == 0 */
+      if (is_vec_zero(op_const[0]))
+         return ir->operands[0];
+      /* x >> 0 == x and x << 0 == x */
+      if (is_vec_zero(op_const[1]))
+         return ir->operands[0];
+      break;
+
+   case ir_binop_logic_and:
+      if (is_vec_one(op_const[0])) {
+	 return ir->operands[1];
+      } else if (is_vec_one(op_const[1])) {
+	 return ir->operands[0];
+      } else if (is_vec_zero(op_const[0]) || is_vec_zero(op_const[1])) {
+	 return ir_constant::zero(mem_ctx, ir->type);
+      } else if (op_expr[0] && op_expr[0]->operation == ir_unop_logic_not &&
+                 op_expr[1] && op_expr[1]->operation == ir_unop_logic_not) {
+         /* De Morgan's Law:
+          *    (not A) and (not B) === not (A or B)
+          */
+         return logic_not(logic_or(op_expr[0]->operands[0],
+                                   op_expr[1]->operands[0]));
+      } else if (ir->operands[0]->equals(ir->operands[1])) {
+         /* (a && a) == a */
+         return ir->operands[0];
+      }
+      break;
+
+   case ir_binop_logic_xor:
+      if (is_vec_zero(op_const[0])) {
+	 return ir->operands[1];
+      } else if (is_vec_zero(op_const[1])) {
+	 return ir->operands[0];
+      } else if (is_vec_one(op_const[0])) {
+	 return logic_not(ir->operands[1]);
+      } else if (is_vec_one(op_const[1])) {
+	 return logic_not(ir->operands[0]);
+      } else if (ir->operands[0]->equals(ir->operands[1])) {
+         /* (a ^^ a) == false */
+	 return ir_constant::zero(mem_ctx, ir->type);
+      }
+      break;
+
+   case ir_binop_logic_or:
+      if (is_vec_zero(op_const[0])) {
+	 return ir->operands[1];
+      } else if (is_vec_zero(op_const[1])) {
+	 return ir->operands[0];
+      } else if (is_vec_one(op_const[0]) || is_vec_one(op_const[1])) {
+	 ir_constant_data data;
+
+	 for (unsigned i = 0; i < 16; i++)
+	    data.b[i] = true;
+
+	 return new(mem_ctx) ir_constant(ir->type, &data);
+      } else if (op_expr[0] && op_expr[0]->operation == ir_unop_logic_not &&
+                 op_expr[1] && op_expr[1]->operation == ir_unop_logic_not) {
+         /* De Morgan's Law:
+          *    (not A) or (not B) === not (A and B)
+          */
+         return logic_not(logic_and(op_expr[0]->operands[0],
+                                    op_expr[1]->operands[0]));
+      } else if (ir->operands[0]->equals(ir->operands[1])) {
+         /* (a || a) == a */
+         return ir->operands[0];
+      }
+      break;
+
+   case ir_binop_pow:
+      /* 1^x == 1 */
+      if (is_vec_one(op_const[0]))
+         return op_const[0];
+
+      /* x^1 == x */
+      if (is_vec_one(op_const[1]))
+         return ir->operands[0];
+
+      /* pow(2,x) == exp2(x) */
+      if (is_vec_two(op_const[0]))
+         return expr(ir_unop_exp2, ir->operands[1]);
+
+      if (is_vec_two(op_const[1])) {
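+         /* Store the base in a temporary so it is evaluated only once;
+          * returning mul(ir->operands[0], ir->operands[0]) directly
+          * would duplicate the operand's expression tree.
+          */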
+         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
+                                              ir_var_temporary);
+         base_ir->insert_before(x);
+         base_ir->insert_before(assign(x, ir->operands[0]));
+         return mul(x, x);
+      }
+
+      break;
+
+   case ir_unop_rcp:
+      if (op_expr[0] && op_expr[0]->operation == ir_unop_rcp)
+	 return op_expr[0]->operands[0];
+
+      /* While ir_to_mesa.cpp will lower sqrt(x) to rcp(rsq(x)), it does so at
+       * its IR level, so we can always apply this transformation.
+       */
+      if (op_expr[0] && op_expr[0]->operation == ir_unop_rsq)
+         return sqrt(op_expr[0]->operands[0]);
+
+      /* As far as we know, all backends are OK with rsq. */
+      if (op_expr[0] && op_expr[0]->operation == ir_unop_sqrt) {
+	 return rsq(op_expr[0]->operands[0]);
+      }
+
+      break;
+
+   case ir_triop_fma:
+      /* Operands are op0 * op1 + op2. */
+      if (is_vec_zero(op_const[0]) || is_vec_zero(op_const[1])) {
+         return ir->operands[2];
+      } else if (is_vec_zero(op_const[2])) {
+         return mul(ir->operands[0], ir->operands[1]);
+      } else if (is_vec_one(op_const[0])) {
+         return add(ir->operands[1], ir->operands[2]);
+      } else if (is_vec_one(op_const[1])) {
+         return add(ir->operands[0], ir->operands[2]);
+      }
+      break;
+
+   case ir_triop_lrp:
+      /* Operands are (x, y, a). */
+      if (is_vec_zero(op_const[2])) {
+         return ir->operands[0];
+      } else if (is_vec_one(op_const[2])) {
+         return ir->operands[1];
+      } else if (ir->operands[0]->equals(ir->operands[1])) {
+         return ir->operands[0];
+      } else if (is_vec_zero(op_const[0])) {
+         return mul(ir->operands[1], ir->operands[2]);
+      } else if (is_vec_zero(op_const[1])) {
+         unsigned op2_components = ir->operands[2]->type->vector_elements;
+         ir_constant *one = new(mem_ctx) ir_constant(1.0f, op2_components);
+         return mul(ir->operands[0], add(one, neg(ir->operands[2])));
+      }
+      break;
+
+   case ir_triop_csel:
+      if (is_vec_one(op_const[0]))
+	 return ir->operands[1];
+      if (is_vec_zero(op_const[0]))
+	 return ir->operands[2];
+      break;
+
+   default:
+      break;
+   }
+
+   return ir;
+}
+
+void
+ir_algebraic_visitor::handle_rvalue(ir_rvalue **rvalue)
+{
+   if (!*rvalue)
+      return;
+
+   ir_expression *expr = (*rvalue)->as_expression();
+   if (!expr || expr->operation == ir_quadop_vector)
+      return;
+
+   ir_rvalue *new_rvalue = handle_expression(expr);
+   if (new_rvalue == *rvalue)
+      return;
+
+   /* If the expr used to be some vec OP scalar returning a vector, and the
+    * optimization gave us back a scalar, we still need to turn it into a
+    * vector.
+    */
+   *rvalue = swizzle_if_required(expr, new_rvalue);
+
+   this->progress = true;
+}
+
+bool
+do_algebraic(exec_list *instructions, bool native_integers)
+{
+   ir_algebraic_visitor v(native_integers);
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/opt_array_splitting.cpp b/icd/intel/compiler/shader/opt_array_splitting.cpp
new file mode 100644
index 0000000..97d3a57
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_array_splitting.cpp
@@ -0,0 +1,409 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_array_splitting.cpp
+ *
+ * If an array is always dereferenced with a constant index, then
+ * split it apart into its elements, making it more amenable to other
+ * optimization passes.
+ *
+ * This skips uniform/varying arrays, which would need careful
+ * handling due to their ir->location fields tying them to the GL API
+ * and other shader stages.
+ */
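+
+/* For example (an illustrative sketch; the _0/_1 suffixes match the
+ * "%s_%d" naming used in optimize_split_arrays() below):
+ *
+ *    vec4 arr[2];             vec4 arr_0, arr_1;
+ *    arr[0] = a;       ==>    arr_0 = a;
+ *    arr[1] = b;              arr_1 = b;
+ *    use(arr[1]);             use(arr_1);
+ *
+ * The split elements are plain temporaries, so passes like copy
+ * propagation and dead code elimination can then act on them.
+ */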
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_rvalue_visitor.h"
+#include "glsl_types.h"
+
+static bool debug = false;
+
+namespace {
+
+namespace opt_array_splitting {
+
+class variable_entry : public exec_node
+{
+public:
+   variable_entry(ir_variable *var)
+   {
+      this->var = var;
+      this->split = true;
+      this->declaration = false;
+      this->components = NULL;
+      this->mem_ctx = NULL;
+      if (var->type->is_array())
+	 this->size = var->type->length;
+      else
+	 this->size = var->type->matrix_columns;
+   }
+
+   ir_variable *var; /* The key: the variable's pointer. */
+   unsigned size; /* array length or matrix columns */
+
+   /** Whether this array should be split or not. */
+   bool split;
+
+   /* If the variable had a decl we can work with in the instruction
+    * stream.  We can't do splitting on function arguments, which
+    * don't get this variable set.
+    */
+   bool declaration;
+
+   ir_variable **components;
+
+   /** ralloc_parent(this->var) -- the shader's talloc context. */
+   void *mem_ctx;
+};
+
+} /* namespace opt_array_splitting */
+
+using namespace opt_array_splitting;
+
+/**
+ * This class does a walk over the tree, coming up with the set of
+ * variables that could be split by looking to see if they are arrays
+ * that are only ever constant-index dereferenced.
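+ *
+ * For example, accesses like arr[0] or arr[2] keep a variable
+ * splittable, while a single variable-indexed access such as arr[i]
+ * clears entry->split for the whole array.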
+ */
+class ir_array_reference_visitor : public ir_hierarchical_visitor {
+public:
+   ir_array_reference_visitor(void)
+   {
+      this->mem_ctx = ralloc_context(NULL);
+      this->variable_list.make_empty();
+   }
+
+   ~ir_array_reference_visitor(void)
+   {
+      ralloc_free(mem_ctx);
+   }
+
+   bool get_split_list(exec_list *instructions, bool linked);
+
+   virtual ir_visitor_status visit(ir_variable *);
+   virtual ir_visitor_status visit(ir_dereference_variable *);
+   virtual ir_visitor_status visit_enter(ir_dereference_array *);
+   virtual ir_visitor_status visit_enter(ir_function_signature *);
+
+   variable_entry *get_variable_entry(ir_variable *var);
+
+   /* List of variable_entry */
+   exec_list variable_list;
+
+   void *mem_ctx;
+};
+
+} /* unnamed namespace */
+
+variable_entry *
+ir_array_reference_visitor::get_variable_entry(ir_variable *var)
+{
+   assert(var);
+
+   if (var->data.mode != ir_var_auto &&
+       var->data.mode != ir_var_temporary)
+      return NULL;
+
+   if (!(var->type->is_array() || var->type->is_matrix()))
+      return NULL;
+
+   /* If the array hasn't been sized yet, we can't split it.  After
+    * linking, this should be resolved.
+    */
+   if (var->type->is_unsized_array())
+      return NULL;
+
+   foreach_list(n, &this->variable_list) {
+      variable_entry *entry = (variable_entry *) n;
+      if (entry->var == var)
+	 return entry;
+   }
+
+   variable_entry *entry = new(mem_ctx) variable_entry(var);
+   this->variable_list.push_tail(entry);
+   return entry;
+}
+
+
+ir_visitor_status
+ir_array_reference_visitor::visit(ir_variable *ir)
+{
+   variable_entry *entry = this->get_variable_entry(ir);
+
+   if (entry)
+      entry->declaration = true;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_array_reference_visitor::visit(ir_dereference_variable *ir)
+{
+   variable_entry *entry = this->get_variable_entry(ir->var);
+
+   /* If we made it to here without seeing an ir_dereference_array,
+    * then the dereference of this array didn't have a constant index
+    * (see the visit_continue_with_parent below), so we can't split
+    * the variable.
+    */
+   if (entry)
+      entry->split = false;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_array_reference_visitor::visit_enter(ir_dereference_array *ir)
+{
+   ir_dereference_variable *deref = ir->array->as_dereference_variable();
+   if (!deref)
+      return visit_continue;
+
+   variable_entry *entry = this->get_variable_entry(deref->var);
+
+   /* If the access to the array has a variable index, we wouldn't
+    * know which split variable this dereference should go to.
+    */
+   if (entry && !ir->array_index->as_constant())
+      entry->split = false;
+
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_array_reference_visitor::visit_enter(ir_function_signature *ir)
+{
+   /* We don't have logic for array-splitting function arguments,
+    * so just look at the body instructions and not the parameter
+    * declarations.
+    */
+   visit_list_elements(this, &ir->body);
+   return visit_continue_with_parent;
+}
+
+bool
+ir_array_reference_visitor::get_split_list(exec_list *instructions,
+					   bool linked)
+{
+   visit_list_elements(this, instructions);
+
+   /* If the shaders aren't linked yet, we can't mess with global
+    * declarations, which need to be matched by name across shaders.
+    */
+   if (!linked) {
+      foreach_list(node, instructions) {
+	 ir_variable *var = ((ir_instruction *)node)->as_variable();
+	 if (var) {
+	    variable_entry *entry = get_variable_entry(var);
+	    if (entry)
+	       entry->remove();
+	 }
+      }
+   }
+
+   /* Trim out variables we found that we can't split. */
+   foreach_list_safe(n, &variable_list) {
+      variable_entry *entry = (variable_entry *) n;
+
+      if (debug) {
+	 printf("array %s@%p: decl %d, split %d\n",
+		entry->var->name, (void *) entry->var, entry->declaration,
+		entry->split);
+      }
+
+      if (!(entry->declaration && entry->split)) {
+	 entry->remove();
+      }
+   }
+
+   return !variable_list.is_empty();
+}
+
+/**
+ * This class rewrites the dereferences of arrays that have been split
+ * to use the newly created ir_variables for each component.
+ */
+class ir_array_splitting_visitor : public ir_rvalue_visitor {
+public:
+   ir_array_splitting_visitor(exec_list *vars)
+   {
+      this->variable_list = vars;
+   }
+
+   virtual ~ir_array_splitting_visitor()
+   {
+   }
+
+   virtual ir_visitor_status visit_leave(ir_assignment *);
+
+   void split_deref(ir_dereference **deref);
+   void handle_rvalue(ir_rvalue **rvalue);
+   variable_entry *get_splitting_entry(ir_variable *var);
+
+   exec_list *variable_list;
+};
+
+variable_entry *
+ir_array_splitting_visitor::get_splitting_entry(ir_variable *var)
+{
+   assert(var);
+
+   foreach_list(n, this->variable_list) {
+      variable_entry *entry = (variable_entry *) n;
+      if (entry->var == var) {
+	 return entry;
+      }
+   }
+
+   return NULL;
+}
+
+void
+ir_array_splitting_visitor::split_deref(ir_dereference **deref)
+{
+   ir_dereference_array *deref_array = (*deref)->as_dereference_array();
+   if (!deref_array)
+      return;
+
+   ir_dereference_variable *deref_var = deref_array->array->as_dereference_variable();
+   if (!deref_var)
+      return;
+   ir_variable *var = deref_var->var;
+
+   variable_entry *entry = get_splitting_entry(var);
+   if (!entry)
+      return;
+
+   ir_constant *constant = deref_array->array_index->as_constant();
+   assert(constant);
+
+   if (constant->value.i[0] < (int)entry->size) {
+      *deref = new(entry->mem_ctx)
+	 ir_dereference_variable(entry->components[constant->value.i[0]]);
+   } else {
+      /* There was a constant array access beyond the end of the
+       * array.  This might have happened due to constant folding
+       * after the initial parse.  This produces an undefined value,
+       * but shouldn't crash.  Just give them an uninitialized
+       * variable.
+       */
+      ir_variable *temp = new(entry->mem_ctx) ir_variable(deref_array->type,
+							  "undef",
+							  ir_var_temporary);
+      entry->components[0]->insert_before(temp);
+      *deref = new(entry->mem_ctx) ir_dereference_variable(temp);
+   }
+}
+
+void
+ir_array_splitting_visitor::handle_rvalue(ir_rvalue **rvalue)
+{
+   if (!*rvalue)
+      return;
+
+   ir_dereference *deref = (*rvalue)->as_dereference();
+
+   if (!deref)
+      return;
+
+   split_deref(&deref);
+   *rvalue = deref;
+}
+
+ir_visitor_status
+ir_array_splitting_visitor::visit_leave(ir_assignment *ir)
+{
+   /* The normal rvalue visitor skips the LHS of assignments, but we
+    * need to process those just the same.
+    */
+   ir_rvalue *lhs = ir->lhs;
+
+   handle_rvalue(&lhs);
+   ir->lhs = lhs->as_dereference();
+
+   ir->lhs->accept(this);
+
+   handle_rvalue(&ir->rhs);
+   ir->rhs->accept(this);
+
+   if (ir->condition) {
+      handle_rvalue(&ir->condition);
+      ir->condition->accept(this);
+   }
+
+   return visit_continue;
+}
+
+bool
+optimize_split_arrays(exec_list *instructions, bool linked)
+{
+   ir_array_reference_visitor refs;
+   if (!refs.get_split_list(instructions, linked))
+      return false;
+
+   void *mem_ctx = ralloc_context(NULL);
+
+   /* Replace the decls of the arrays to be split with their split
+    * components.
+    */
+   foreach_list(n, &refs.variable_list) {
+      variable_entry *entry = (variable_entry *) n;
+      const struct glsl_type *type = entry->var->type;
+      const struct glsl_type *subtype;
+
+      if (type->is_matrix())
+	 subtype = type->column_type();
+      else
+	 subtype = type->fields.array;
+
+      entry->mem_ctx = ralloc_parent(entry->var);
+
+      entry->components = ralloc_array(mem_ctx,
+				       ir_variable *,
+				       entry->size);
+
+      for (unsigned int i = 0; i < entry->size; i++) {
+	 const char *name = ralloc_asprintf(mem_ctx, "%s_%d",
+					    entry->var->name, i);
+
+	 entry->components[i] =
+	    new(entry->mem_ctx) ir_variable(subtype, name, ir_var_temporary);
+	 entry->var->insert_before(entry->components[i]);
+      }
+
+      entry->var->remove();
+   }
+
+   ir_array_splitting_visitor split(&refs.variable_list);
+   visit_list_elements(&split, instructions);
+
+   if (debug)
+      _mesa_print_ir(stdout, instructions, NULL);
+
+   ralloc_free(mem_ctx);
+
+   return true;
+}
diff --git a/icd/intel/compiler/shader/opt_constant_folding.cpp b/icd/intel/compiler/shader/opt_constant_folding.cpp
new file mode 100644
index 0000000..d0e5754
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_constant_folding.cpp
@@ -0,0 +1,161 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_constant_folding.cpp
+ * Replace constant-valued expressions with references to constant values.
+ */
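+
+/* For example, an expression tree such as (3.0 * 2.0) + x is rewritten
+ * to 6.0 + x: the (3.0 * 2.0) subtree, whose operands are all constant,
+ * is evaluated once via constant_expression_value() and replaced by the
+ * resulting ir_constant.
+ */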
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_rvalue_visitor.h"
+#include "ir_optimization.h"
+#include "glsl_types.h"
+
+namespace {
+
+/**
+ * Visitor class for replacing expressions with ir_constant values.
+ */
+
+class ir_constant_folding_visitor : public ir_rvalue_visitor {
+public:
+   ir_constant_folding_visitor()
+   {
+      this->progress = false;
+   }
+
+   virtual ~ir_constant_folding_visitor()
+   {
+      /* empty */
+   }
+
+   virtual ir_visitor_status visit_enter(ir_assignment *ir);
+   virtual ir_visitor_status visit_enter(ir_call *ir);
+
+   virtual void handle_rvalue(ir_rvalue **rvalue);
+
+   bool progress;
+};
+
+} /* unnamed namespace */
+
+void
+ir_constant_folding_visitor::handle_rvalue(ir_rvalue **rvalue)
+{
+   if (*rvalue == NULL || (*rvalue)->ir_type == ir_type_constant)
+      return;
+
+   /* Note that we do rvalue visiting on leaving.  So if an
+    * expression has a non-constant operand, no need to go looking
+    * down it to find if it's constant.  This cuts the time of this
+    * pass down drastically.
+    */
+   ir_expression *expr = (*rvalue)->as_expression();
+   if (expr) {
+      for (unsigned int i = 0; i < expr->get_num_operands(); i++) {
+	 if (!expr->operands[i]->as_constant())
+	    return;
+      }
+   }
+
+   ir_constant *constant = (*rvalue)->constant_expression_value();
+   if (constant) {
+      *rvalue = constant;
+      this->progress = true;
+   } else {
+      (*rvalue)->accept(this);
+   }
+}
+
+ir_visitor_status
+ir_constant_folding_visitor::visit_enter(ir_assignment *ir)
+{
+   ir->rhs->accept(this);
+   handle_rvalue(&ir->rhs);
+
+   if (ir->condition) {
+      ir->condition->accept(this);
+      handle_rvalue(&ir->condition);
+
+      ir_constant *const_val = ir->condition->as_constant();
+      /* If the condition is constant, either remove the condition or
+       * remove the never-executed assignment.
+       */
+      if (const_val) {
+	 if (const_val->value.b[0])
+	    ir->condition = NULL;
+	 else
+	    ir->remove();
+	 this->progress = true;
+      }
+   }
+
+   /* Don't descend into the LHS because we want it to stay as a
+    * variable dereference.  FINISHME: We probably should descend,
+    * though, to fold constant array indices.
+    */
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_constant_folding_visitor::visit_enter(ir_call *ir)
+{
+   /* Attempt to constant fold parameters */
+   foreach_two_lists(formal_node, &ir->callee->parameters,
+                     actual_node, &ir->actual_parameters) {
+      ir_rvalue *param_rval = (ir_rvalue *) actual_node;
+      ir_variable *sig_param = (ir_variable *) formal_node;
+
+      if (sig_param->data.mode == ir_var_function_in
+          || sig_param->data.mode == ir_var_const_in) {
+	 ir_rvalue *new_param = param_rval;
+
+	 handle_rvalue(&new_param);
+	 if (new_param != param_rval) {
+	    param_rval->replace_with(new_param);
+	 }
+      }
+   }
+
+   /* Next, see if the call can be replaced with an assignment of a constant */
+   ir_constant *const_val = ir->constant_expression_value();
+
+   if (const_val != NULL) {
+      ir_assignment *assignment =
+	 new(ralloc_parent(ir)) ir_assignment(ir->return_deref, const_val);
+      ir->replace_with(assignment);
+   }
+
+   return visit_continue_with_parent;
+}
+
+bool
+do_constant_folding(exec_list *instructions)
+{
+   ir_constant_folding_visitor constant_folding;
+
+   visit_list_elements(&constant_folding, instructions);
+
+   return constant_folding.progress;
+}
diff --git a/icd/intel/compiler/shader/opt_constant_propagation.cpp b/icd/intel/compiler/shader/opt_constant_propagation.cpp
new file mode 100644
index 0000000..18f5da6
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_constant_propagation.cpp
@@ -0,0 +1,469 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_constant_propagation.cpp
+ *
+ * Tracks assignments of constants to channels of variables, and replaces
+ * usage of those constant channels with direct usage of the constants.
+ *
+ * This can lead to constant folding and algebraic optimizations in
+ * those later expressions, while causing no increase in instruction
+ * count (due to constants being generally free to load from a
+ * constant push buffer or as instruction immediate values) and
+ * possibly reducing register pressure.
+ */
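+
+/* For example (an illustrative sketch):
+ *
+ *    v.x = 3.0;
+ *    v.y = 4.0;
+ *    r = v.x + v.y;    ==>    r = 3.0 + 4.0;
+ *
+ * Tracking is per channel (see the write masks below), so a later
+ * write to v.y alone kills only the .y entry; v.x remains available.
+ */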
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_rvalue_visitor.h"
+#include "ir_basic_block.h"
+#include "ir_optimization.h"
+#include "glsl_types.h"
+
+namespace {
+
+class acp_entry : public exec_node
+{
+public:
+   acp_entry(ir_variable *var, unsigned write_mask, ir_constant *constant)
+   {
+      assert(var);
+      assert(constant);
+      this->var = var;
+      this->write_mask = write_mask;
+      this->constant = constant;
+      this->initial_values = write_mask;
+   }
+
+   acp_entry(const acp_entry *src)
+   {
+      this->var = src->var;
+      this->write_mask = src->write_mask;
+      this->constant = src->constant;
+      this->initial_values = src->initial_values;
+   }
+
+   ir_variable *var;
+   ir_constant *constant;
+   unsigned write_mask;
+
+   /** Mask of values initially available in the constant. */
+   unsigned initial_values;
+};
+
+
+class kill_entry : public exec_node
+{
+public:
+   kill_entry(ir_variable *var, unsigned write_mask)
+   {
+      assert(var);
+      this->var = var;
+      this->write_mask = write_mask;
+   }
+
+   ir_variable *var;
+   unsigned write_mask;
+};
+
+class ir_constant_propagation_visitor : public ir_rvalue_visitor {
+public:
+   ir_constant_propagation_visitor()
+   {
+      progress = false;
+      killed_all = false;
+      mem_ctx = ralloc_context(0);
+      this->acp = new(mem_ctx) exec_list;
+      this->kills = new(mem_ctx) exec_list;
+   }
+   ~ir_constant_propagation_visitor()
+   {
+      ralloc_free(mem_ctx);
+   }
+
+   virtual ir_visitor_status visit_enter(class ir_loop *);
+   virtual ir_visitor_status visit_enter(class ir_function_signature *);
+   virtual ir_visitor_status visit_enter(class ir_function *);
+   virtual ir_visitor_status visit_leave(class ir_assignment *);
+   virtual ir_visitor_status visit_enter(class ir_call *);
+   virtual ir_visitor_status visit_enter(class ir_if *);
+
+   void add_constant(ir_assignment *ir);
+   void kill(ir_variable *ir, unsigned write_mask);
+   void handle_if_block(exec_list *instructions);
+   void handle_rvalue(ir_rvalue **rvalue);
+
+   /** List of acp_entry: The available constants to propagate */
+   exec_list *acp;
+
+   /**
+    * List of kill_entry: The masks of variables whose values were
+    * killed in this block.
+    */
+   exec_list *kills;
+
+   bool progress;
+
+   bool killed_all;
+
+   void *mem_ctx;
+};
+
+
+void
+ir_constant_propagation_visitor::handle_rvalue(ir_rvalue **rvalue)
+{
+   if (this->in_assignee || !*rvalue)
+      return;
+
+   const glsl_type *type = (*rvalue)->type;
+   if (!type->is_scalar() && !type->is_vector())
+      return;
+
+   ir_swizzle *swiz = NULL;
+   ir_dereference_variable *deref = (*rvalue)->as_dereference_variable();
+   if (!deref) {
+      swiz = (*rvalue)->as_swizzle();
+      if (!swiz)
+	 return;
+
+      deref = swiz->val->as_dereference_variable();
+      if (!deref)
+	 return;
+   }
+
+   ir_constant_data data;
+   memset(&data, 0, sizeof(data));
+
+   for (unsigned int i = 0; i < type->components(); i++) {
+      int channel;
+      acp_entry *found = NULL;
+
+      if (swiz) {
+	 switch (i) {
+	 case 0: channel = swiz->mask.x; break;
+	 case 1: channel = swiz->mask.y; break;
+	 case 2: channel = swiz->mask.z; break;
+	 case 3: channel = swiz->mask.w; break;
+	 default: assert(!"shouldn't be reached"); channel = 0; break;
+	 }
+      } else {
+	 channel = i;
+      }
+
+      foreach_list(n, this->acp) {
+	 acp_entry *entry = (acp_entry *) n;
+	 if (entry->var == deref->var && entry->write_mask & (1 << channel)) {
+	    found = entry;
+	    break;
+	 }
+      }
+
+      if (!found)
+	 return;
+
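+      /* The constant stores only the channels present in the entry's
+       * initial write mask, packed in ascending channel order, so count
+       * the earlier mask bits to find this channel's slot.
+       */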
+      int rhs_channel = 0;
+      for (int j = 0; j < 4; j++) {
+	 if (j == channel)
+	    break;
+	 if (found->initial_values & (1 << j))
+	    rhs_channel++;
+      }
+
+      switch (type->base_type) {
+      case GLSL_TYPE_FLOAT:
+	 data.f[i] = found->constant->value.f[rhs_channel];
+	 break;
+      case GLSL_TYPE_INT:
+	 data.i[i] = found->constant->value.i[rhs_channel];
+	 break;
+      case GLSL_TYPE_UINT:
+	 data.u[i] = found->constant->value.u[rhs_channel];
+	 break;
+      case GLSL_TYPE_BOOL:
+	 data.b[i] = found->constant->value.b[rhs_channel];
+	 break;
+      default:
+	 assert(!"not reached");
+	 break;
+      }
+   }
+
+   *rvalue = new(ralloc_parent(deref)) ir_constant(type, &data);
+   this->progress = true;
+}
+
+ir_visitor_status
+ir_constant_propagation_visitor::visit_enter(ir_function_signature *ir)
+{
+   /* Treat entry into a function signature as a completely separate
+    * block.  Any instructions at global scope will be shuffled into
+    * main() at link time, so they're irrelevant to us.
+    */
+   exec_list *orig_acp = this->acp;
+   exec_list *orig_kills = this->kills;
+   bool orig_killed_all = this->killed_all;
+
+   this->acp = new(mem_ctx) exec_list;
+   this->kills = new(mem_ctx) exec_list;
+   this->killed_all = false;
+
+   visit_list_elements(this, &ir->body);
+
+   this->kills = orig_kills;
+   this->acp = orig_acp;
+   this->killed_all = orig_killed_all;
+
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_constant_propagation_visitor::visit_leave(ir_assignment *ir)
+{
+   if (this->in_assignee)
+      return visit_continue;
+
+   unsigned kill_mask = ir->write_mask;
+   if (ir->lhs->as_dereference_array()) {
+      /* The LHS of the assignment uses an array indexing operator (e.g. v[i]
+       * = ...;).  Since we only try to constant propagate vectors and
+       * scalars, this means that either (a) array indexing is being used to
+       * select a vector component, or (b) the variable in question is neither
+       * a scalar nor a vector, so we don't care about it.  In the former case,
+       * we want to kill the whole vector, since in general we can't predict
+       * which vector component will be selected by array indexing.  In the
+       * latter case, it doesn't matter what we do, so go ahead and kill the
+       * whole variable anyway.
+       *
+       * Note that if the array index is constant (e.g. v[2] = ...;), we could
+       * in principle be smarter, but we don't need to, because a future
+       * optimization pass will convert it to a simple assignment with the
+       * correct mask.
+       */
+      kill_mask = ~0;
+   }
+   kill(ir->lhs->variable_referenced(), kill_mask);
+
+   add_constant(ir);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_constant_propagation_visitor::visit_enter(ir_function *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_constant_propagation_visitor::visit_enter(ir_call *ir)
+{
+   /* Do constant propagation on call parameters, but skip any out params */
+   foreach_two_lists(formal_node, &ir->callee->parameters,
+                     actual_node, &ir->actual_parameters) {
+      ir_variable *sig_param = (ir_variable *) formal_node;
+      ir_rvalue *param = (ir_rvalue *) actual_node;
+      if (sig_param->data.mode != ir_var_function_out
+          && sig_param->data.mode != ir_var_function_inout) {
+	 ir_rvalue *new_param = param;
+	 handle_rvalue(&new_param);
+         if (new_param != param)
+	    param->replace_with(new_param);
+	 else
+	    param->accept(this);
+      }
+   }
+
+   /* Since we're unlinked, we don't (necessarily) know the side effects of
+    * this call.  So kill all copies.
+    */
+   acp->make_empty();
+   this->killed_all = true;
+
+   return visit_continue_with_parent;
+}
+
+void
+ir_constant_propagation_visitor::handle_if_block(exec_list *instructions)
+{
+   exec_list *orig_acp = this->acp;
+   exec_list *orig_kills = this->kills;
+   bool orig_killed_all = this->killed_all;
+
+   this->acp = new(mem_ctx) exec_list;
+   this->kills = new(mem_ctx) exec_list;
+   this->killed_all = false;
+
+   /* Populate the initial acp with a copy of the original */
+   foreach_list(n, orig_acp) {
+      acp_entry *a = (acp_entry *) n;
+      this->acp->push_tail(new(this->mem_ctx) acp_entry(a));
+   }
+
+   visit_list_elements(this, instructions);
+
+   if (this->killed_all) {
+      orig_acp->make_empty();
+   }
+
+   exec_list *new_kills = this->kills;
+   this->kills = orig_kills;
+   this->acp = orig_acp;
+   this->killed_all = this->killed_all || orig_killed_all;
+
+   foreach_list(n, new_kills) {
+      kill_entry *k = (kill_entry *) n;
+      kill(k->var, k->write_mask);
+   }
+}
+
+ir_visitor_status
+ir_constant_propagation_visitor::visit_enter(ir_if *ir)
+{
+   ir->condition->accept(this);
+   handle_rvalue(&ir->condition);
+
+   handle_if_block(&ir->then_instructions);
+   handle_if_block(&ir->else_instructions);
+
+   /* handle_if_block() already descended into the children. */
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_constant_propagation_visitor::visit_enter(ir_loop *ir)
+{
+   exec_list *orig_acp = this->acp;
+   exec_list *orig_kills = this->kills;
+   bool orig_killed_all = this->killed_all;
+
+   /* FINISHME: For now, the initial acp for loops is totally empty.
+    * We could go through once, then go through again with the acp
+    * cloned minus the killed entries after the first run through.
+    */
+   this->acp = new(mem_ctx) exec_list;
+   this->kills = new(mem_ctx) exec_list;
+   this->killed_all = false;
+
+   visit_list_elements(this, &ir->body_instructions);
+
+   if (this->killed_all) {
+      orig_acp->make_empty();
+   }
+
+   exec_list *new_kills = this->kills;
+   this->kills = orig_kills;
+   this->acp = orig_acp;
+   this->killed_all = this->killed_all || orig_killed_all;
+
+   foreach_list(n, new_kills) {
+      kill_entry *k = (kill_entry *) n;
+      kill(k->var, k->write_mask);
+   }
+
+   /* already descended into the children. */
+   return visit_continue_with_parent;
+}
+
+void
+ir_constant_propagation_visitor::kill(ir_variable *var, unsigned write_mask)
+{
+   assert(var != NULL);
+
+   /* We don't track non-vectors. */
+   if (!var->type->is_vector() && !var->type->is_scalar())
+      return;
+
+   /* Remove any entries currently in the ACP for this kill. */
+   foreach_list_safe(n, this->acp) {
+      acp_entry *entry = (acp_entry *) n;
+
+      if (entry->var == var) {
+	 entry->write_mask &= ~write_mask;
+	 if (entry->write_mask == 0)
+	    entry->remove();
+      }
+   }
+
+   /* Add this writemask of the variable to the list of killed
+    * variables in this block.
+    */
+   foreach_list(n, this->kills) {
+      kill_entry *entry = (kill_entry *) n;
+
+      if (entry->var == var) {
+	 entry->write_mask |= write_mask;
+	 return;
+      }
+   }
+   /* Not already in the list.  Make new entry. */
+   this->kills->push_tail(new(this->mem_ctx) kill_entry(var, write_mask));
+}
+
+/**
+ * Adds an entry to the available constant list if it's a plain assignment
+ * of a constant to a variable.
+ */
+void
+ir_constant_propagation_visitor::add_constant(ir_assignment *ir)
+{
+   acp_entry *entry;
+
+   if (ir->condition)
+      return;
+
+   if (!ir->write_mask)
+      return;
+
+   ir_dereference_variable *deref = ir->lhs->as_dereference_variable();
+   ir_constant *constant = ir->rhs->as_constant();
+
+   if (!deref || !constant)
+      return;
+
+   /* Only do constant propagation on vectors.  Constant matrices,
+    * arrays, or structures would require more work elsewhere.
+    */
+   if (!deref->var->type->is_vector() && !deref->var->type->is_scalar())
+      return;
+
+   entry = new(this->mem_ctx) acp_entry(deref->var, ir->write_mask, constant);
+   this->acp->push_tail(entry);
+}
+
+} /* unnamed namespace */
+
+/**
+ * Does a constant propagation pass on the code present in the instruction stream.
+ */
+bool
+do_constant_propagation(exec_list *instructions)
+{
+   ir_constant_propagation_visitor v;
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/opt_constant_variable.cpp b/icd/intel/compiler/shader/opt_constant_variable.cpp
new file mode 100644
index 0000000..961b8aa
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_constant_variable.cpp
@@ -0,0 +1,209 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_constant_variable.cpp
+ *
+ * Marks variables assigned a single constant value over the course
+ * of the program as constant.
+ *
+ * The goal here is to trigger further constant folding and then dead
+ * code elimination.  This is common with vector/matrix constructors
+ * and calls to builtin functions.
+ */
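+
+/* For example (an illustrative sketch): given
+ *
+ *    vec2 v = vec2(1.0, 2.0);
+ *
+ * with no other writes to v, the variable has exactly one
+ * unconditional whole-variable constant assignment, so its
+ * constant_value is set and later passes can fold uses of v.
+ */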
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_optimization.h"
+#include "glsl_types.h"
+
+namespace {
+
+struct assignment_entry {
+   exec_node link;
+   int assignment_count;
+   ir_variable *var;
+   ir_constant *constval;
+   bool our_scope;
+};
+
+class ir_constant_variable_visitor : public ir_hierarchical_visitor {
+public:
+   virtual ir_visitor_status visit_enter(ir_dereference_variable *);
+   virtual ir_visitor_status visit(ir_variable *);
+   virtual ir_visitor_status visit_enter(ir_assignment *);
+   virtual ir_visitor_status visit_enter(ir_call *);
+
+   exec_list list;
+};
+
+} /* unnamed namespace */
+
+static struct assignment_entry *
+get_assignment_entry(ir_variable *var, exec_list *list)
+{
+   struct assignment_entry *entry;
+
+   foreach_list_typed(struct assignment_entry, entry, link, list) {
+      if (entry->var == var)
+	 return entry;
+   }
+
+   entry = (struct assignment_entry *)calloc(1, sizeof(*entry));
+   entry->var = var;
+   list->push_head(&entry->link);
+   return entry;
+}
+
+ir_visitor_status
+ir_constant_variable_visitor::visit(ir_variable *ir)
+{
+   struct assignment_entry *entry = get_assignment_entry(ir, &this->list);
+   entry->our_scope = true;
+   return visit_continue;
+}
+
+/* Skip derefs of variables so that we can detect declarations. */
+ir_visitor_status
+ir_constant_variable_visitor::visit_enter(ir_dereference_variable *ir)
+{
+   (void)ir;
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_constant_variable_visitor::visit_enter(ir_assignment *ir)
+{
+   ir_constant *constval;
+   struct assignment_entry *entry;
+
+   entry = get_assignment_entry(ir->lhs->variable_referenced(), &this->list);
+   assert(entry);
+   entry->assignment_count++;
+
+   /* If it's already constant, don't do the work. */
+   if (entry->var->constant_value)
+      return visit_continue;
+
+   /* OK, now find if we actually have all the right conditions for
+    * this to be a constant value assigned to the var.
+    */
+   if (ir->condition)
+      return visit_continue;
+
+   ir_variable *var = ir->whole_variable_written();
+   if (!var)
+      return visit_continue;
+
+   constval = ir->rhs->constant_expression_value();
+   if (!constval)
+      return visit_continue;
+
+   /* Mark this entry as having a constant assignment (if the
+    * assignment count doesn't go >1).  do_constant_variable will fix
+    * up the variable with the constant value later.
+    */
+   entry->constval = constval;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_constant_variable_visitor::visit_enter(ir_call *ir)
+{
+   /* Mark any out parameters as assigned to */
+   foreach_two_lists(formal_node, &ir->callee->parameters,
+                     actual_node, &ir->actual_parameters) {
+      ir_rvalue *param_rval = (ir_rvalue *) actual_node;
+      ir_variable *param = (ir_variable *) formal_node;
+
+      if (param->data.mode == ir_var_function_out ||
+	  param->data.mode == ir_var_function_inout) {
+	 ir_variable *var = param_rval->variable_referenced();
+	 struct assignment_entry *entry;
+
+	 assert(var);
+	 entry = get_assignment_entry(var, &this->list);
+	 entry->assignment_count++;
+      }
+   }
+
+   /* Mark the return storage as having been assigned to */
+   if (ir->return_deref != NULL) {
+      ir_variable *var = ir->return_deref->variable_referenced();
+      struct assignment_entry *entry;
+
+      assert(var);
+      entry = get_assignment_entry(var, &this->list);
+      entry->assignment_count++;
+   }
+
+   return visit_continue;
+}
+
+/**
+ * Does a constant-variable marking pass on the code present in the
+ * instruction stream.
+ */
+bool
+do_constant_variable(exec_list *instructions)
+{
+   bool progress = false;
+   ir_constant_variable_visitor v;
+
+   v.run(instructions);
+
+   while (!v.list.is_empty()) {
+
+      struct assignment_entry *entry;
+      entry = exec_node_data(struct assignment_entry, v.list.head, link);
+
+      if (entry->assignment_count == 1 && entry->constval && entry->our_scope) {
+	 entry->var->constant_value = entry->constval;
+	 progress = true;
+      }
+      entry->link.remove();
+      free(entry);
+   }
+
+   return progress;
+}
+
+bool
+do_constant_variable_unlinked(exec_list *instructions)
+{
+   bool progress = false;
+
+   foreach_list(n, instructions) {
+      ir_instruction *ir = (ir_instruction *) n;
+      ir_function *f = ir->as_function();
+      if (f) {
+	 foreach_list(signode, &f->signatures) {
+	    ir_function_signature *sig = (ir_function_signature *) signode;
+	    if (do_constant_variable(&sig->body))
+	       progress = true;
+	 }
+      }
+   }
+
+   return progress;
+}
diff --git a/icd/intel/compiler/shader/opt_copy_propagation.cpp b/icd/intel/compiler/shader/opt_copy_propagation.cpp
new file mode 100644
index 0000000..195cc8b
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_copy_propagation.cpp
@@ -0,0 +1,349 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_copy_propagation.cpp
+ *
+ * Moves usage of recently-copied variables to the previous copy of
+ * the variable.
+ *
+ * This should reduce the number of MOV instructions in the generated
+ * programs unless copy propagation is also done on the LIR, and may
+ * help anyway by triggering other optimizations that live in the HIR.
+ */
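+
+/* For example:
+ *
+ *    b = a;
+ *    c = b;    ==>    c = a;
+ *
+ * b may then become dead and removable by dead code elimination.
+ */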
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_basic_block.h"
+#include "ir_optimization.h"
+#include "glsl_types.h"
+
+namespace {
+
+class acp_entry : public exec_node
+{
+public:
+   acp_entry(ir_variable *lhs, ir_variable *rhs)
+   {
+      assert(lhs);
+      assert(rhs);
+      this->lhs = lhs;
+      this->rhs = rhs;
+   }
+
+   ir_variable *lhs;
+   ir_variable *rhs;
+};
+
+
+class kill_entry : public exec_node
+{
+public:
+   kill_entry(ir_variable *var)
+   {
+      assert(var);
+      this->var = var;
+   }
+
+   ir_variable *var;
+};
+
+class ir_copy_propagation_visitor : public ir_hierarchical_visitor {
+public:
+   ir_copy_propagation_visitor()
+   {
+      progress = false;
+      mem_ctx = ralloc_context(0);
+      this->acp = new(mem_ctx) exec_list;
+      this->kills = new(mem_ctx) exec_list;
+   }
+   ~ir_copy_propagation_visitor()
+   {
+      ralloc_free(mem_ctx);
+   }
+
+   virtual ir_visitor_status visit(class ir_dereference_variable *);
+   virtual ir_visitor_status visit_enter(class ir_loop *);
+   virtual ir_visitor_status visit_enter(class ir_function_signature *);
+   virtual ir_visitor_status visit_enter(class ir_function *);
+   virtual ir_visitor_status visit_leave(class ir_assignment *);
+   virtual ir_visitor_status visit_enter(class ir_call *);
+   virtual ir_visitor_status visit_enter(class ir_if *);
+
+   void add_copy(ir_assignment *ir);
+   void kill(ir_variable *ir);
+   void handle_if_block(exec_list *instructions);
+
+   /** List of acp_entry: The available copies to propagate */
+   exec_list *acp;
+   /**
+    * List of kill_entry: The variables whose values were killed in this
+    * block.
+    */
+   exec_list *kills;
+
+   bool progress;
+
+   bool killed_all;
+
+   void *mem_ctx;
+};
+
+} /* unnamed namespace */
+
+ir_visitor_status
+ir_copy_propagation_visitor::visit_enter(ir_function_signature *ir)
+{
+   /* Treat entry into a function signature as a completely separate
+    * block.  Any instructions at global scope will be shuffled into
+    * main() at link time, so they're irrelevant to us.
+    */
+   exec_list *orig_acp = this->acp;
+   exec_list *orig_kills = this->kills;
+   bool orig_killed_all = this->killed_all;
+
+   this->acp = new(mem_ctx) exec_list;
+   this->kills = new(mem_ctx) exec_list;
+   this->killed_all = false;
+
+   visit_list_elements(this, &ir->body);
+
+   this->kills = orig_kills;
+   this->acp = orig_acp;
+   this->killed_all = orig_killed_all;
+
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_copy_propagation_visitor::visit_leave(ir_assignment *ir)
+{
+   kill(ir->lhs->variable_referenced());
+
+   add_copy(ir);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_copy_propagation_visitor::visit_enter(ir_function *ir)
+{
+   (void) ir;
+   return visit_continue;
+}
+
+/**
+ * Replaces dereferences of ACP LHS variables with ACP RHS variables.
+ *
+ * This is where the actual copy propagation occurs.  Note that the
+ * rewriting of ir_dereference means that the ir_dereference instance
+ * must not be shared by multiple IR operations!
+ */
+ir_visitor_status
+ir_copy_propagation_visitor::visit(ir_dereference_variable *ir)
+{
+   if (this->in_assignee)
+      return visit_continue;
+
+   ir_variable *var = ir->var;
+
+   foreach_list(n, this->acp) {
+      acp_entry *entry = (acp_entry *) n;
+
+      if (var == entry->lhs) {
+	 ir->var = entry->rhs;
+	 this->progress = true;
+	 break;
+      }
+   }
+
+   return visit_continue;
+}
+
+
+ir_visitor_status
+ir_copy_propagation_visitor::visit_enter(ir_call *ir)
+{
+   /* Do copy propagation on call parameters, but skip any out params */
+   foreach_two_lists(formal_node, &ir->callee->parameters,
+                     actual_node, &ir->actual_parameters) {
+      ir_variable *sig_param = (ir_variable *) formal_node;
+      ir_rvalue *ir = (ir_rvalue *) actual_node;
+      if (sig_param->data.mode != ir_var_function_out
+          && sig_param->data.mode != ir_var_function_inout) {
+         ir->accept(this);
+      }
+   }
+
+   /* Since we're unlinked, we don't (necessarily) know the side effects of
+    * this call.  So kill all copies.
+    */
+   acp->make_empty();
+   this->killed_all = true;
+
+   return visit_continue_with_parent;
+}
+
+void
+ir_copy_propagation_visitor::handle_if_block(exec_list *instructions)
+{
+   exec_list *orig_acp = this->acp;
+   exec_list *orig_kills = this->kills;
+   bool orig_killed_all = this->killed_all;
+
+   this->acp = new(mem_ctx) exec_list;
+   this->kills = new(mem_ctx) exec_list;
+   this->killed_all = false;
+
+   /* Populate the initial acp with a copy of the original */
+   foreach_list(n, orig_acp) {
+      acp_entry *a = (acp_entry *) n;
+      this->acp->push_tail(new(this->mem_ctx) acp_entry(a->lhs, a->rhs));
+   }
+
+   visit_list_elements(this, instructions);
+
+   if (this->killed_all) {
+      orig_acp->make_empty();
+   }
+
+   exec_list *new_kills = this->kills;
+   this->kills = orig_kills;
+   this->acp = orig_acp;
+   this->killed_all = this->killed_all || orig_killed_all;
+
+   foreach_list(n, new_kills) {
+      kill_entry *k = (kill_entry *) n;
+      kill(k->var);
+   }
+}
+
+ir_visitor_status
+ir_copy_propagation_visitor::visit_enter(ir_if *ir)
+{
+   ir->condition->accept(this);
+
+   handle_if_block(&ir->then_instructions);
+   handle_if_block(&ir->else_instructions);
+
+   /* handle_if_block() already descended into the children. */
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_copy_propagation_visitor::visit_enter(ir_loop *ir)
+{
+   exec_list *orig_acp = this->acp;
+   exec_list *orig_kills = this->kills;
+   bool orig_killed_all = this->killed_all;
+
+   /* FINISHME: For now, the initial acp for loops is totally empty.
+    * We could go through once, then go through again with the acp
+    * cloned minus the killed entries after the first run through.
+    */
+   this->acp = new(mem_ctx) exec_list;
+   this->kills = new(mem_ctx) exec_list;
+   this->killed_all = false;
+
+   visit_list_elements(this, &ir->body_instructions);
+
+   if (this->killed_all) {
+      orig_acp->make_empty();
+   }
+
+   exec_list *new_kills = this->kills;
+   this->kills = orig_kills;
+   this->acp = orig_acp;
+   this->killed_all = this->killed_all || orig_killed_all;
+
+   foreach_list(n, new_kills) {
+      kill_entry *k = (kill_entry *) n;
+      kill(k->var);
+   }
+
+   /* already descended into the children. */
+   return visit_continue_with_parent;
+}
+
+void
+ir_copy_propagation_visitor::kill(ir_variable *var)
+{
+   assert(var != NULL);
+
+   /* Remove any entries currently in the ACP for this kill. */
+   foreach_list_safe(n, acp) {
+      acp_entry *entry = (acp_entry *) n;
+
+      if (entry->lhs == var || entry->rhs == var) {
+	 entry->remove();
+      }
+   }
+
+   /* Add the LHS variable to the list of killed variables in this block.
+    */
+   this->kills->push_tail(new(this->mem_ctx) kill_entry(var));
+}
+
+/**
+ * Adds an entry to the available copy list if it's a plain assignment
+ * of a variable to a variable.
+ */
+void
+ir_copy_propagation_visitor::add_copy(ir_assignment *ir)
+{
+   acp_entry *entry;
+
+   if (ir->condition)
+      return;
+
+   ir_variable *lhs_var = ir->whole_variable_written();
+   ir_variable *rhs_var = ir->rhs->whole_variable_referenced();
+
+   if ((lhs_var != NULL) && (rhs_var != NULL)) {
+      if (lhs_var == rhs_var) {
+	 /* This is a dumb assignment, but we've conveniently noticed
+	  * it here.  Removing it now would mess up the loop iteration
+	  * calling us.  Just flag it to not execute, and someone else
+	  * will clean up the mess.
+	  */
+	 ir->condition = new(ralloc_parent(ir)) ir_constant(false);
+	 this->progress = true;
+      } else {
+	 entry = new(this->mem_ctx) acp_entry(lhs_var, rhs_var);
+	 this->acp->push_tail(entry);
+      }
+   }
+}
+
+/**
+ * Does a copy propagation pass on the code present in the instruction stream.
+ */
+bool
+do_copy_propagation(exec_list *instructions)
+{
+   ir_copy_propagation_visitor v;
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/opt_copy_propagation_elements.cpp b/icd/intel/compiler/shader/opt_copy_propagation_elements.cpp
new file mode 100644
index 0000000..ffd095a
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_copy_propagation_elements.cpp
@@ -0,0 +1,495 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_copy_propagation_elements.cpp
+ *
+ * Replaces usage of recently-copied components of variables with the
+ * previous copy of the variable.
+ *
+ * This pass can be compared with opt_copy_propagation, which operates
+ * on arbitrary whole-variable copies.  However, in order to handle
+ * the copy propagation of swizzled variables or writemasked writes,
+ * we want to track things on a channel-wise basis.  I found that
+ * trying to mix the swizzled/writemasked support here with the
+ * whole-variable stuff in opt_copy_propagation.cpp just made a mess,
+ * so this is separate despite the ACP handling being somewhat
+ * similar.
+ *
+ * This should reduce the number of MOV instructions in the generated
+ * programs unless copy propagation is also done on the LIR, and may
+ * help anyway by triggering other optimizations that live in the HIR.
+ */
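+
+/* An illustrative sketch (hypothetical IR, written as GLSL-like pseudocode):
+ * given
+ *
+ *    v.xy = t.zw;
+ *    u = v.yyxx;
+ *
+ * the ACP records that channels x and y of v hold t.z and t.w, so the
+ * second statement can be rewritten as u = t.wwzz.
+ */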
+
+#include "ir.h"
+#include "ir_rvalue_visitor.h"
+#include "ir_basic_block.h"
+#include "ir_optimization.h"
+#include "glsl_types.h"
+
+static bool debug = false;
+
+namespace {
+
+class acp_entry : public exec_node
+{
+public:
+   acp_entry(ir_variable *lhs, ir_variable *rhs, int write_mask, int swizzle[4])
+   {
+      this->lhs = lhs;
+      this->rhs = rhs;
+      this->write_mask = write_mask;
+      memcpy(this->swizzle, swizzle, sizeof(this->swizzle));
+   }
+
+   acp_entry(acp_entry *a)
+   {
+      this->lhs = a->lhs;
+      this->rhs = a->rhs;
+      this->write_mask = a->write_mask;
+      memcpy(this->swizzle, a->swizzle, sizeof(this->swizzle));
+   }
+
+   ir_variable *lhs;
+   ir_variable *rhs;
+   unsigned int write_mask;
+   int swizzle[4];
+};
+
+
+class kill_entry : public exec_node
+{
+public:
+   kill_entry(ir_variable *var, int write_mask)
+   {
+      this->var = var;
+      this->write_mask = write_mask;
+   }
+
+   ir_variable *var;
+   unsigned int write_mask;
+};
+
+class ir_copy_propagation_elements_visitor : public ir_rvalue_visitor {
+public:
+   ir_copy_propagation_elements_visitor()
+   {
+      this->progress = false;
+      this->killed_all = false;
+      this->mem_ctx = ralloc_context(NULL);
+      this->shader_mem_ctx = NULL;
+      this->acp = new(mem_ctx) exec_list;
+      this->kills = new(mem_ctx) exec_list;
+   }
+   ~ir_copy_propagation_elements_visitor()
+   {
+      ralloc_free(mem_ctx);
+   }
+
+   virtual ir_visitor_status visit_enter(class ir_loop *);
+   virtual ir_visitor_status visit_enter(class ir_function_signature *);
+   virtual ir_visitor_status visit_leave(class ir_assignment *);
+   virtual ir_visitor_status visit_enter(class ir_call *);
+   virtual ir_visitor_status visit_enter(class ir_if *);
+   virtual ir_visitor_status visit_leave(class ir_swizzle *);
+
+   void handle_rvalue(ir_rvalue **rvalue);
+
+   void add_copy(ir_assignment *ir);
+   void kill(kill_entry *k);
+   void handle_if_block(exec_list *instructions);
+
+   /** List of acp_entry: The available copies to propagate */
+   exec_list *acp;
+   /**
+    * List of kill_entry: The variables whose values were killed in this
+    * block.
+    */
+   exec_list *kills;
+
+   bool progress;
+
+   bool killed_all;
+
+   /* Context for our local data structures. */
+   void *mem_ctx;
+   /* Context for allocating new shader nodes. */
+   void *shader_mem_ctx;
+};
+
+} /* unnamed namespace */
+
+ir_visitor_status
+ir_copy_propagation_elements_visitor::visit_enter(ir_function_signature *ir)
+{
+   /* Treat entry into a function signature as a completely separate
+    * block.  Any instructions at global scope will be shuffled into
+    * main() at link time, so they're irrelevant to us.
+    */
+   exec_list *orig_acp = this->acp;
+   exec_list *orig_kills = this->kills;
+   bool orig_killed_all = this->killed_all;
+
+   this->acp = new(mem_ctx) exec_list;
+   this->kills = new(mem_ctx) exec_list;
+   this->killed_all = false;
+
+   visit_list_elements(this, &ir->body);
+
+   this->kills = orig_kills;
+   this->acp = orig_acp;
+   this->killed_all = orig_killed_all;
+
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_copy_propagation_elements_visitor::visit_leave(ir_assignment *ir)
+{
+   ir_dereference_variable *lhs = ir->lhs->as_dereference_variable();
+   ir_variable *var = ir->lhs->variable_referenced();
+
+   if (var->type->is_scalar() || var->type->is_vector()) {
+      kill_entry *k;
+
+      if (lhs)
+	 k = new(mem_ctx) kill_entry(var, ir->write_mask);
+      else
+	 k = new(mem_ctx) kill_entry(var, ~0);
+
+      kill(k);
+   }
+
+   add_copy(ir);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_copy_propagation_elements_visitor::visit_leave(ir_swizzle *)
+{
+   /* Don't visit the values of swizzles since they are handled while
+    * visiting the swizzle itself.
+    */
+   return visit_continue;
+}
+
+/**
+ * Replaces dereferences of ACP LHS variables (copy destinations) with
+ * their ACP RHS variables (copy sources).
+ *
+ * This is where the actual copy propagation occurs.  Note that the
+ * rewriting of ir_dereference means that the ir_dereference instance
+ * must not be shared by multiple IR operations!
+ */
+void
+ir_copy_propagation_elements_visitor::handle_rvalue(ir_rvalue **ir)
+{
+   int swizzle_chan[4];
+   ir_dereference_variable *deref_var;
+   ir_variable *source[4] = {NULL, NULL, NULL, NULL};
+   int source_chan[4] = {0, 0, 0, 0};
+   int chans;
+
+   if (!*ir)
+      return;
+
+   ir_swizzle *swizzle = (*ir)->as_swizzle();
+   if (swizzle) {
+      deref_var = swizzle->val->as_dereference_variable();
+      if (!deref_var)
+	 return;
+
+      swizzle_chan[0] = swizzle->mask.x;
+      swizzle_chan[1] = swizzle->mask.y;
+      swizzle_chan[2] = swizzle->mask.z;
+      swizzle_chan[3] = swizzle->mask.w;
+      chans = swizzle->type->vector_elements;
+   } else {
+      deref_var = (*ir)->as_dereference_variable();
+      if (!deref_var)
+	 return;
+
+      swizzle_chan[0] = 0;
+      swizzle_chan[1] = 1;
+      swizzle_chan[2] = 2;
+      swizzle_chan[3] = 3;
+      chans = deref_var->type->vector_elements;
+   }
+
+   if (this->in_assignee)
+      return;
+
+   ir_variable *var = deref_var->var;
+
+   /* Try to find ACP entries covering swizzle_chan[], hoping they're
+    * the same source variable.
+    */
+   foreach_list(n, this->acp) {
+      acp_entry *entry = (acp_entry *) n;
+
+      if (var == entry->lhs) {
+	 for (int c = 0; c < chans; c++) {
+	    if (entry->write_mask & (1 << swizzle_chan[c])) {
+	       source[c] = entry->rhs;
+	       source_chan[c] = entry->swizzle[swizzle_chan[c]];
+	    }
+	 }
+      }
+   }
+
+   /* Make sure all channels are copying from the same source variable. */
+   if (!source[0])
+      return;
+   for (int c = 1; c < chans; c++) {
+      if (source[c] != source[0])
+	 return;
+   }
+
+   if (!shader_mem_ctx)
+      shader_mem_ctx = ralloc_parent(deref_var);
+
+   if (debug) {
+      printf("Copy propagation from:\n");
+      (*ir)->print();
+   }
+
+   deref_var = new(shader_mem_ctx) ir_dereference_variable(source[0]);
+   *ir = new(shader_mem_ctx) ir_swizzle(deref_var,
+					source_chan[0],
+					source_chan[1],
+					source_chan[2],
+					source_chan[3],
+					chans);
+
+   if (debug) {
+      printf("to:\n");
+      (*ir)->print();
+      printf("\n");
+   }
+}
+
+
+ir_visitor_status
+ir_copy_propagation_elements_visitor::visit_enter(ir_call *ir)
+{
+   /* Do copy propagation on call parameters, but skip any out params */
+   foreach_two_lists(formal_node, &ir->callee->parameters,
+                     actual_node, &ir->actual_parameters) {
+      ir_variable *sig_param = (ir_variable *) formal_node;
+      ir_rvalue *ir = (ir_rvalue *) actual_node;
+      if (sig_param->data.mode != ir_var_function_out
+          && sig_param->data.mode != ir_var_function_inout) {
+         ir->accept(this);
+      }
+   }
+
+   /* Since we're unlinked, we don't (necessarily) know the side effects of
+    * this call.  So kill all copies.
+    */
+   acp->make_empty();
+   this->killed_all = true;
+
+   return visit_continue_with_parent;
+}
+
+void
+ir_copy_propagation_elements_visitor::handle_if_block(exec_list *instructions)
+{
+   exec_list *orig_acp = this->acp;
+   exec_list *orig_kills = this->kills;
+   bool orig_killed_all = this->killed_all;
+
+   this->acp = new(mem_ctx) exec_list;
+   this->kills = new(mem_ctx) exec_list;
+   this->killed_all = false;
+
+   /* Populate the initial acp with a copy of the original */
+   foreach_list(n, orig_acp) {
+      acp_entry *a = (acp_entry *) n;
+      this->acp->push_tail(new(this->mem_ctx) acp_entry(a));
+   }
+
+   visit_list_elements(this, instructions);
+
+   if (this->killed_all) {
+      orig_acp->make_empty();
+   }
+
+   exec_list *new_kills = this->kills;
+   this->kills = orig_kills;
+   this->acp = orig_acp;
+   this->killed_all = this->killed_all || orig_killed_all;
+
+   /* Move the new kills into the parent block's list, removing them
+    * from the parent's ACP list in the process.
+    */
+   foreach_list_safe(node, new_kills) {
+      kill_entry *k = (kill_entry *)node;
+      kill(k);
+   }
+}
+
+ir_visitor_status
+ir_copy_propagation_elements_visitor::visit_enter(ir_if *ir)
+{
+   ir->condition->accept(this);
+
+   handle_if_block(&ir->then_instructions);
+   handle_if_block(&ir->else_instructions);
+
+   /* handle_if_block() already descended into the children. */
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_copy_propagation_elements_visitor::visit_enter(ir_loop *ir)
+{
+   exec_list *orig_acp = this->acp;
+   exec_list *orig_kills = this->kills;
+   bool orig_killed_all = this->killed_all;
+
+   /* FINISHME: For now, the initial acp for loops is totally empty.
+    * We could go through once, then go through again with the acp
+    * cloned minus the killed entries after the first run through.
+    */
+   this->acp = new(mem_ctx) exec_list;
+   this->kills = new(mem_ctx) exec_list;
+   this->killed_all = false;
+
+   visit_list_elements(this, &ir->body_instructions);
+
+   if (this->killed_all) {
+      orig_acp->make_empty();
+   }
+
+   exec_list *new_kills = this->kills;
+   this->kills = orig_kills;
+   this->acp = orig_acp;
+   this->killed_all = this->killed_all || orig_killed_all;
+
+   foreach_list_safe(node, new_kills) {
+      kill_entry *k = (kill_entry *)node;
+      kill(k);
+   }
+
+   /* already descended into the children. */
+   return visit_continue_with_parent;
+}
+
+/* Remove any entries currently in the ACP for this kill. */
+void
+ir_copy_propagation_elements_visitor::kill(kill_entry *k)
+{
+   foreach_list_safe(node, acp) {
+      acp_entry *entry = (acp_entry *)node;
+
+      if (entry->lhs == k->var) {
+	 entry->write_mask = entry->write_mask & ~k->write_mask;
+	 if (entry->write_mask == 0) {
+	    entry->remove();
+	    continue;
+	 }
+      }
+      if (entry->rhs == k->var) {
+	 entry->remove();
+      }
+   }
+
+   /* If we were on a list, remove ourselves before inserting */
+   if (k->next)
+      k->remove();
+
+   this->kills->push_tail(k);
+}
+
+/**
+ * Adds directly-copied channels between vector variables to the available
+ * copy propagation list.
+ */
+void
+ir_copy_propagation_elements_visitor::add_copy(ir_assignment *ir)
+{
+   acp_entry *entry;
+   int orig_swizzle[4] = {0, 1, 2, 3};
+   int swizzle[4];
+
+   if (ir->condition)
+      return;
+
+   ir_dereference_variable *lhs = ir->lhs->as_dereference_variable();
+   if (!lhs || !(lhs->type->is_scalar() || lhs->type->is_vector()))
+      return;
+
+   ir_dereference_variable *rhs = ir->rhs->as_dereference_variable();
+   if (!rhs) {
+      ir_swizzle *swiz = ir->rhs->as_swizzle();
+      if (!swiz)
+	 return;
+
+      rhs = swiz->val->as_dereference_variable();
+      if (!rhs)
+	 return;
+
+      orig_swizzle[0] = swiz->mask.x;
+      orig_swizzle[1] = swiz->mask.y;
+      orig_swizzle[2] = swiz->mask.z;
+      orig_swizzle[3] = swiz->mask.w;
+   }
+
+   /* Move the swizzle channels out to the positions they match in the
+    * destination.  We don't want to have to rewrite the swizzle[]
+    * array every time we clear a bit of the write_mask.
+    */
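+   /* A hypothetical example: for "v.zw = t.xy", write_mask is 0xc and
+    * orig_swizzle begins {x, y}, so the loop below sets swizzle[2] = x and
+    * swizzle[3] = y: each entry lands on the destination channel it feeds.
+    */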
+   int j = 0;
+   for (int i = 0; i < 4; i++) {
+      if (ir->write_mask & (1 << i))
+	 swizzle[i] = orig_swizzle[j++];
+   }
+
+   int write_mask = ir->write_mask;
+   if (lhs->var == rhs->var) {
+      /* If this is a copy from the variable to itself, then we need
+       * to be sure not to include the updated channels from this
+       * instruction in the set of new source channels to be
+       * copy-propagated from.
+       */
+      for (int i = 0; i < 4; i++) {
+	 if (ir->write_mask & (1 << orig_swizzle[i]))
+	    write_mask &= ~(1 << i);
+      }
+   }
+
+   entry = new(this->mem_ctx) acp_entry(lhs->var, rhs->var, write_mask,
+					swizzle);
+   this->acp->push_tail(entry);
+}
+
+bool
+do_copy_propagation_elements(exec_list *instructions)
+{
+   ir_copy_propagation_elements_visitor v;
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/opt_cse.cpp b/icd/intel/compiler/shader/opt_cse.cpp
new file mode 100644
index 0000000..1b8782b
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_cse.cpp
@@ -0,0 +1,423 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_cse.cpp
+ *
+ * Constant subexpression elimination at the GLSL IR level.
+ *
+ * Compare to brw_fs_cse.cpp for a more complete CSE implementation.  This one
+ * is generic and handles texture operations, but it's rather simple currently
+ * and doesn't support modification of variables in the available expressions
+ * list, so it can't do variables other than uniforms or shader inputs.
+ */
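+
+/* An illustrative sketch (hypothetical shader code, with a and b uniforms):
+ *
+ *    x = (a + b) * a;
+ *    y = (a + b) * b;
+ *
+ * is turned into
+ *
+ *    cse = a + b;
+ *    x = cse * a;
+ *    y = cse * b;
+ *
+ * when the second occurrence matches the available expression from the
+ * first.
+ */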
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_rvalue_visitor.h"
+#include "ir_basic_block.h"
+#include "ir_optimization.h"
+#include "ir_builder.h"
+#include "glsl_types.h"
+
+using namespace ir_builder;
+
+static bool debug = false;
+
+namespace {
+
+/**
+ * This is the record of an available expression for common subexpression
+ * elimination.
+ */
+class ae_entry : public exec_node
+{
+public:
+   ae_entry(ir_instruction *base_ir, ir_rvalue **val)
+      : val(val), base_ir(base_ir)
+   {
+      assert(val);
+      assert(*val);
+      assert(base_ir);
+
+      var = NULL;
+   }
+
+   /**
+    * The pointer to the expression that we might be able to reuse
+    *
+    * Note the double pointer -- this is the place in the base_ir expression
+    * tree that we would rewrite to move the expression out to a new variable
+    * assignment.
+    */
+   ir_rvalue **val;
+
+   /**
+    * Root instruction in the basic block where the expression appeared.
+    *
+    * This is used so that we can insert the new variable declaration into the
+    * instruction stream (since *val is just somewhere in base_ir's expression
+    * tree).
+    */
+   ir_instruction *base_ir;
+
+   /**
+    * The variable that the expression has been stored in, if it's been CSEd
+    * once already.
+    */
+   ir_variable *var;
+};
+
+class cse_visitor : public ir_rvalue_visitor {
+public:
+   cse_visitor(exec_list *validate_instructions)
+      : validate_instructions(validate_instructions)
+   {
+      progress = false;
+      mem_ctx = ralloc_context(NULL);
+      this->ae = new(mem_ctx) exec_list;
+   }
+   ~cse_visitor()
+   {
+      ralloc_free(mem_ctx);
+   }
+
+   virtual ir_visitor_status visit_enter(ir_function_signature *ir);
+   virtual ir_visitor_status visit_enter(ir_loop *ir);
+   virtual ir_visitor_status visit_enter(ir_if *ir);
+   virtual ir_visitor_status visit_enter(ir_call *ir);
+   virtual void handle_rvalue(ir_rvalue **rvalue);
+
+   bool progress;
+
+private:
+   void *mem_ctx;
+
+   ir_rvalue *try_cse(ir_rvalue *rvalue);
+   void add_to_ae(ir_rvalue **rvalue);
+
+   /** List of ae_entry: The available expressions to reuse */
+   exec_list *ae;
+
+   /**
+    * The whole shader, so that we can validate_ir_tree in debug mode.
+    *
+    * This proved quite useful when trying to get the tree manipulation
+    * right.
+    */
+   exec_list *validate_instructions;
+};
+
+/**
+ * Visitor to walk an expression tree to check that all variables referenced
+ * are constants.
+ */
+class is_cse_candidate_visitor : public ir_hierarchical_visitor
+{
+public:
+
+   is_cse_candidate_visitor()
+      : ok(true)
+   {
+   }
+
+   virtual ir_visitor_status visit(ir_dereference_variable *ir);
+
+   bool ok;
+};
+
+
+class contains_rvalue_visitor : public ir_rvalue_visitor
+{
+public:
+
+   contains_rvalue_visitor(ir_rvalue *val)
+      : val(val)
+   {
+      found = false;
+   }
+
+   virtual void handle_rvalue(ir_rvalue **rvalue);
+
+   bool found;
+
+private:
+   ir_rvalue *val;
+};
+
+} /* unnamed namespace */
+
+static void
+dump_ae(exec_list *ae)
+{
+   int i = 0;
+
+   printf("CSE: AE contents:\n");
+   foreach_list(node, ae) {
+      ae_entry *entry = (ae_entry *)node;
+
+      printf("CSE:   AE %2d (%p): ", i, entry);
+      (*entry->val)->print();
+      printf("\n");
+
+      if (entry->var)
+         printf("CSE:     in var %p:\n", entry->var);
+
+      i++;
+   }
+}
+
+ir_visitor_status
+is_cse_candidate_visitor::visit(ir_dereference_variable *ir)
+{
+   /* Currently, since we don't handle kills of the ae based on variables
+    * getting assigned, we can only handle constant variables.
+    */
+   if (ir->var->data.read_only) {
+      return visit_continue;
+   } else {
+      ok = false;
+      return visit_stop;
+   }
+}
+
+void
+contains_rvalue_visitor::handle_rvalue(ir_rvalue **rvalue)
+{
+   if (*rvalue == val)
+      found = true;
+}
+
+static bool
+contains_rvalue(ir_rvalue *haystack, ir_rvalue *needle)
+{
+   contains_rvalue_visitor v(needle);
+   haystack->accept(&v);
+   return v.found;
+}
+
+static bool
+is_cse_candidate(ir_rvalue *ir)
+{
+   /* Our temporary variable assignment generation isn't ready to handle
+    * anything bigger than a vector.
+    */
+   if (!ir->type->is_vector() && !ir->type->is_scalar())
+      return false;
+
+   /* Only handle expressions and textures currently.  We may want to extend
+    * to variable-index array dereferences at some point.
+    */
+   switch (ir->ir_type) {
+   case ir_type_expression:
+   case ir_type_texture:
+      break;
+   default:
+      return false;
+   }
+
+   is_cse_candidate_visitor v;
+
+   ir->accept(&v);
+
+   return v.ok;
+}
+
+/**
+ * Tries to find and return a reference to a previous computation of a given
+ * expression.
+ *
+ * Walk the list of available expressions checking if any of them match the
+ * rvalue, and if so, move the previous copy of the expression to a temporary
+ * and return a reference of the temporary.
+ */
+ir_rvalue *
+cse_visitor::try_cse(ir_rvalue *rvalue)
+{
+   foreach_list(node, ae) {
+      ae_entry *entry = (ae_entry *)node;
+
+      if (debug) {
+         printf("Comparing to AE %p: ", entry);
+         (*entry->val)->print();
+         printf("\n");
+      }
+
+      if (!rvalue->equals(*entry->val))
+         continue;
+
+      if (debug) {
+         printf("CSE: Replacing: ");
+         (*entry->val)->print();
+         printf("\n");
+         printf("CSE:      with: ");
+         rvalue->print();
+         printf("\n");
+      }
+
+      if (!entry->var) {
+         ir_instruction *base_ir = entry->base_ir;
+
+         ir_variable *var = new(rvalue) ir_variable(rvalue->type,
+                                                    "cse",
+                                                    ir_var_auto);
+
+         /* Write the previous expression result into a new variable. */
+         base_ir->insert_before(var);
+         ir_assignment *assignment = assign(var, *entry->val);
+         base_ir->insert_before(assignment);
+
+         /* Replace the expression in the original tree with a deref of the
+          * variable, but keep tracking the expression for further reuse.
+          */
+         *entry->val = new(rvalue) ir_dereference_variable(var);
+         entry->val = &assignment->rhs;
+
+         entry->var = var;
+
+         /* Update the base_irs in the AE list.  We have to be sure that
+          * they're correct -- expressions from our base_ir that weren't moved
+          * need to stay in this base_ir (so that later consumption of them
+          * puts new variables between our new variable and our base_ir), but
+          * expressions from our base_ir that we *did* move need base_ir
+          * updated so that any further elimination from inside gets its new
+          * assignments put before our new assignment.
+          */
+         foreach_list(fixup_node, ae) {
+            ae_entry *fixup_entry = (ae_entry *)fixup_node;
+            if (contains_rvalue(assignment->rhs, *fixup_entry->val))
+               fixup_entry->base_ir = assignment;
+         }
+
+         if (debug)
+            dump_ae(ae);
+      }
+
+      /* Replace the expression in our current tree with the variable. */
+      return new(rvalue) ir_dereference_variable(entry->var);
+   }
+
+   return NULL;
+}
+
+/** Add the rvalue to the list of available expressions for CSE. */
+void
+cse_visitor::add_to_ae(ir_rvalue **rvalue)
+{
+   if (debug) {
+      printf("CSE: Add to AE: ");
+      (*rvalue)->print();
+      printf("\n");
+   }
+
+   ae->push_tail(new(mem_ctx) ae_entry(base_ir, rvalue));
+
+   if (debug)
+      dump_ae(ae);
+}
+
+void
+cse_visitor::handle_rvalue(ir_rvalue **rvalue)
+{
+   if (!*rvalue)
+      return;
+
+   if (debug) {
+      printf("CSE: handle_rvalue ");
+      (*rvalue)->print();
+      printf("\n");
+   }
+
+   if (!is_cse_candidate(*rvalue))
+      return;
+
+   ir_rvalue *new_rvalue = try_cse(*rvalue);
+   if (new_rvalue) {
+      *rvalue = new_rvalue;
+      progress = true;
+
+      if (debug)
+         validate_ir_tree(validate_instructions);
+   } else {
+      add_to_ae(rvalue);
+   }
+}
+
+ir_visitor_status
+cse_visitor::visit_enter(ir_if *ir)
+{
+   handle_rvalue(&ir->condition);
+
+   ae->make_empty();
+   visit_list_elements(this, &ir->then_instructions);
+
+   ae->make_empty();
+   visit_list_elements(this, &ir->else_instructions);
+
+   ae->make_empty();
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+cse_visitor::visit_enter(ir_function_signature *ir)
+{
+   ae->make_empty();
+   visit_list_elements(this, &ir->body);
+
+   ae->make_empty();
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+cse_visitor::visit_enter(ir_loop *ir)
+{
+   ae->make_empty();
+   visit_list_elements(this, &ir->body_instructions);
+
+   ae->make_empty();
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+cse_visitor::visit_enter(ir_call *)
+{
+   /* Because call is an exec_list of ir_rvalues, handle_rvalue gets passed a
+    * pointer to the (ir_rvalue *) on the stack.  Since we save those pointers
+    * in the AE list, we can't let handle_rvalue get called.
+    */
+   return visit_continue_with_parent;
+}
+
+/**
+ * Does a (uniform-value) constant subexpression elimination pass on the code
+ * present in the instruction stream.
+ */
+bool
+do_cse(exec_list *instructions)
+{
+   cse_visitor v(instructions);
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/opt_dead_builtin_varyings.cpp b/icd/intel/compiler/shader/opt_dead_builtin_varyings.cpp
new file mode 100644
index 0000000..d6f791e
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_dead_builtin_varyings.cpp
@@ -0,0 +1,588 @@
+/*
+ * Copyright © 2013 Marek Olšák <maraeo@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_dead_builtin_varyings.cpp
+ *
+ * This eliminates the built-in shader outputs which are either not written
+ * at all or not used by the next stage. It also eliminates unused elements
+ * of gl_TexCoord inputs, which reduces the overall varying usage.
+ * The varyings handled here are the primary and secondary color, the fog,
+ * and the texture coordinates (gl_TexCoord).
+ *
+ * This pass is necessary because the Mesa GLSL linker cannot eliminate
+ * built-in varyings the way it eliminates user-defined varyings: the
+ * built-in varyings have pre-assigned locations. Also, the elimination
+ * of unused gl_TexCoord elements requires its own lowering pass anyway.
+ *
+ * It's implemented by replacing all occurrences of dead varyings with
+ * temporary variables, which creates dead code. It is recommended to run
+ * a dead-code elimination pass after this.
+ *
+ * If any texture coordinate slots can be eliminated, the gl_TexCoord array is
+ * broken down into separate vec4 variables with locations equal to
+ * VARYING_SLOT_TEX0 + i.
+ *
+ * The same is done for the gl_FragData fragment shader output.
+ */
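+
+/* An illustrative sketch (hypothetical shaders): if the vertex shader
+ * writes gl_TexCoord[0] and gl_TexCoord[2] with constant indices but the
+ * fragment shader only reads gl_TexCoord[0], the producer's array is
+ * broken up roughly as
+ *
+ *    gl_out_TexCoord0 = ...;        // real output at VARYING_SLOT_TEX0
+ *    gl_out_TexCoord2_dummy = ...;  // temporary; dead code for later passes
+ */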
+
+#include "libfns.h" // LunarG ADD:
+#include "icd-utils.h" // LunarG: ADD
+#include "ir.h"
+#include "ir_rvalue_visitor.h"
+#include "ir_optimization.h"
+#include "ir_print_visitor.h"
+#include "glsl_types.h"
+#include "link_varyings.h"
+
+namespace {
+
+/**
+ * This obtains detailed information about built-in varyings from shader code.
+ */
+class varying_info_visitor : public ir_hierarchical_visitor {
+public:
+   /* "mode" can be either ir_var_shader_in or ir_var_shader_out */
+   varying_info_visitor(ir_variable_mode mode, bool find_frag_outputs = false)
+      : lower_texcoord_array(true),
+        texcoord_array(NULL),
+        texcoord_usage(0),
+        find_frag_outputs(find_frag_outputs),
+        lower_fragdata_array(true),
+        fragdata_array(NULL),
+        fragdata_usage(0),
+        color_usage(0),
+        tfeedback_color_usage(0),
+        fog(NULL),
+        has_fog(false),
+        tfeedback_has_fog(false),
+        mode(mode)
+   {
+      memset(color, 0, sizeof(color));
+      memset(backcolor, 0, sizeof(backcolor));
+   }
+
+   virtual ir_visitor_status visit_enter(ir_dereference_array *ir)
+   {
+      ir_variable *var = ir->variable_referenced();
+
+      if (!var || var->data.mode != this->mode)
+         return visit_continue;
+
+      if (this->find_frag_outputs && var->data.location == FRAG_RESULT_DATA0) {
+         this->fragdata_array = var;
+
+         ir_constant *index = ir->array_index->as_constant();
+         if (index == NULL) {
+            /* This is variable indexing. */
+            this->fragdata_usage |= (1 << var->type->array_size()) - 1;
+            this->lower_fragdata_array = false;
+         }
+         else {
+            this->fragdata_usage |= 1 << index->get_uint_component(0);
+         }
+
+         /* Don't visit the leaves of ir_dereference_array. */
+         return visit_continue_with_parent;
+      }
+
+      if (!this->find_frag_outputs && var->data.location == VARYING_SLOT_TEX0) {
+         this->texcoord_array = var;
+
+         ir_constant *index = ir->array_index->as_constant();
+         if (index == NULL) {
+            /* There is variable indexing, so we can't lower the texcoord array.
+             */
+            this->texcoord_usage |= (1 << var->type->array_size()) - 1;
+            this->lower_texcoord_array = false;
+         }
+         else {
+            this->texcoord_usage |= 1 << index->get_uint_component(0);
+         }
+
+         /* Don't visit the leaves of ir_dereference_array. */
+         return visit_continue_with_parent;
+      }
+
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit(ir_dereference_variable *ir)
+   {
+      ir_variable *var = ir->variable_referenced();
+
+      if (var->data.mode != this->mode || !var->type->is_array())
+         return visit_continue;
+
+      if (this->find_frag_outputs && var->data.location == FRAG_RESULT_DATA0) {
+         /* This is a whole array dereference. */
+         this->fragdata_usage |= (1 << var->type->array_size()) - 1;
+         this->lower_fragdata_array = false;
+         return visit_continue;
+      }
+
+      if (!this->find_frag_outputs && var->data.location == VARYING_SLOT_TEX0) {
+         /* This is a whole array dereference like "gl_TexCoord = x;",
+          * so there's probably no point in lowering it.
+          */
+         this->texcoord_usage |= (1 << var->type->array_size()) - 1;
+         this->lower_texcoord_array = false;
+      }
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit(ir_variable *var)
+   {
+      if (var->data.mode != this->mode)
+         return visit_continue;
+
+      /* Nothing to do here for fragment outputs. */
+      if (this->find_frag_outputs)
+         return visit_continue;
+
+      /* Handle colors and fog. */
+      switch (var->data.location) {
+      case VARYING_SLOT_COL0:
+         this->color[0] = var;
+         this->color_usage |= 1;
+         break;
+      case VARYING_SLOT_COL1:
+         this->color[1] = var;
+         this->color_usage |= 2;
+         break;
+      case VARYING_SLOT_BFC0:
+         this->backcolor[0] = var;
+         this->color_usage |= 1;
+         break;
+      case VARYING_SLOT_BFC1:
+         this->backcolor[1] = var;
+         this->color_usage |= 2;
+         break;
+      case VARYING_SLOT_FOGC:
+         this->fog = var;
+         this->has_fog = true;
+         break;
+      }
+
+      return visit_continue;
+   }
+
+   void get(exec_list *ir,
+            unsigned num_tfeedback_decls,
+            tfeedback_decl *tfeedback_decls)
+   {
+      /* Handle the transform feedback varyings. */
+      for (unsigned i = 0; i < num_tfeedback_decls; i++) {
+         if (!tfeedback_decls[i].is_varying())
+            continue;
+
+         unsigned location = tfeedback_decls[i].get_location();
+
+         switch (location) {
+         case VARYING_SLOT_COL0:
+         case VARYING_SLOT_BFC0:
+            this->tfeedback_color_usage |= 1;
+            break;
+         case VARYING_SLOT_COL1:
+         case VARYING_SLOT_BFC1:
+            this->tfeedback_color_usage |= 2;
+            break;
+         case VARYING_SLOT_FOGC:
+            this->tfeedback_has_fog = true;
+            break;
+         default:
+            if (location >= VARYING_SLOT_TEX0 &&
+                location <= VARYING_SLOT_TEX7) {
+               this->lower_texcoord_array = false;
+            }
+         }
+      }
+
+      /* Process the shader. */
+      visit_list_elements(this, ir);
+
+      if (!this->texcoord_array) {
+         this->lower_texcoord_array = false;
+      }
+      if (!this->fragdata_array) {
+         this->lower_fragdata_array = false;
+      }
+   }
+
+   bool lower_texcoord_array;
+   ir_variable *texcoord_array;
+   unsigned texcoord_usage; /* bitmask */
+
+   bool find_frag_outputs; /* false if it's looking for varyings */
+   bool lower_fragdata_array;
+   ir_variable *fragdata_array;
+   unsigned fragdata_usage; /* bitmask */
+
+   ir_variable *color[2];
+   ir_variable *backcolor[2];
+   unsigned color_usage; /* bitmask */
+   unsigned tfeedback_color_usage; /* bitmask */
+
+   ir_variable *fog;
+   bool has_fog;
+   bool tfeedback_has_fog;
+
+   ir_variable_mode mode;
+};
+
+
+/**
+ * This replaces unused varyings with temporary variables.
+ *
+ * If "ir" is the producer, the "external" usage should come from
+ * the consumer. It also works the other way around. If either one is
+ * missing, set the "external" usage to a full mask.
+ */
+class replace_varyings_visitor : public ir_rvalue_visitor {
+public:
+   replace_varyings_visitor(exec_list *ir,
+                            const varying_info_visitor *info,
+                            unsigned external_texcoord_usage,
+                            unsigned external_color_usage,
+                            bool external_has_fog)
+      : info(info), new_fog(NULL)
+   {
+      void *const ctx = ir;
+
+      memset(this->new_fragdata, 0, sizeof(this->new_fragdata));
+      memset(this->new_texcoord, 0, sizeof(this->new_texcoord));
+      memset(this->new_color, 0, sizeof(this->new_color));
+      memset(this->new_backcolor, 0, sizeof(this->new_backcolor));
+
+      const char *mode_str =
+         info->mode == ir_var_shader_in ? "in" : "out";
+
+      /* Handle texcoord outputs.
+       *
+       * We're going to break down the gl_TexCoord array into separate
+       * variables. First, add declarations of the new variables all
+       * occurrences of gl_TexCoord will be replaced with.
+       */
+      if (info->lower_texcoord_array) {
+         prepare_array(ir, this->new_texcoord, ARRAY_SIZE(this->new_texcoord),
+                       VARYING_SLOT_TEX0, "TexCoord", mode_str,
+                       info->texcoord_usage, external_texcoord_usage);
+      }
+
+      /* Handle gl_FragData in the same way as gl_TexCoord. */
+      if (info->lower_fragdata_array) {
+         prepare_array(ir, this->new_fragdata, ARRAY_SIZE(this->new_fragdata),
+                       FRAG_RESULT_DATA0, "FragData", mode_str,
+                       info->fragdata_usage, (1 << MAX_DRAW_BUFFERS) - 1);
+      }
+
+      /* Create dummy variables which will replace set-but-unused color and
+       * fog outputs.
+       */
+      external_color_usage |= info->tfeedback_color_usage;
+
+      for (int i = 0; i < 2; i++) {
+         char name[32];
+
+         if (!(external_color_usage & (1 << i))) {
+            if (info->color[i]) {
+               snprintf(name, 32, "gl_%s_FrontColor%i_dummy", mode_str, i);
+               this->new_color[i] =
+                  new (ctx) ir_variable(glsl_type::vec4_type, name,
+                                        ir_var_temporary);
+            }
+
+            if (info->backcolor[i]) {
+               snprintf(name, 32, "gl_%s_BackColor%i_dummy", mode_str, i);
+               this->new_backcolor[i] =
+                  new (ctx) ir_variable(glsl_type::vec4_type, name,
+                                        ir_var_temporary);
+            }
+         }
+      }
+
+      if (!external_has_fog && !info->tfeedback_has_fog &&
+          info->fog) {
+         char name[32];
+
+         snprintf(name, 32, "gl_%s_FogFragCoord_dummy", mode_str);
+         this->new_fog = new (ctx) ir_variable(glsl_type::float_type, name,
+                                               ir_var_temporary);
+      }
+
+      /* Now do the replacing. */
+      visit_list_elements(this, ir);
+   }
+
+   void prepare_array(exec_list *ir,
+                      struct ir_variable **new_var,
+                      int max_elements, unsigned start_location,
+                      const char *var_name, const char *mode_str,
+                      unsigned usage, unsigned external_usage)
+   {
+      void *const ctx = ir;
+
+      for (int i = max_elements-1; i >= 0; i--) {
+         if (usage & (1 << i)) {
+            char name[32];
+
+            if (!(external_usage & (1 << i))) {
+               /* This varying is unused in the next stage. Declare
+                * a temporary instead of an output. */
+               snprintf(name, 32, "gl_%s_%s%i_dummy", mode_str, var_name, i);
+               new_var[i] =
+                  new (ctx) ir_variable(glsl_type::vec4_type, name,
+                                        ir_var_temporary);
+            }
+            else {
+               snprintf(name, 32, "gl_%s_%s%i", mode_str, var_name, i);
+               new_var[i] =
+                  new(ctx) ir_variable(glsl_type::vec4_type, name,
+                                       this->info->mode);
+               new_var[i]->data.location = start_location + i;
+               new_var[i]->data.explicit_location = true;
+               new_var[i]->data.explicit_index = 0;
+            }
+
+            ir->head->insert_before(new_var[i]);
+         }
+      }
+   }
+
+   virtual ir_visitor_status visit(ir_variable *var)
+   {
+      /* Remove the gl_TexCoord array. */
+      if (this->info->lower_texcoord_array &&
+          var == this->info->texcoord_array) {
+         var->remove();
+      }
+
+      /* Remove the gl_FragData array. */
+      if (this->info->lower_fragdata_array &&
+          var == this->info->fragdata_array) {
+         var->remove();
+      }
+
+      /* Replace set-but-unused color and fog outputs with dummy variables. */
+      for (int i = 0; i < 2; i++) {
+         if (var == this->info->color[i] && this->new_color[i]) {
+            var->replace_with(this->new_color[i]);
+         }
+         if (var == this->info->backcolor[i] &&
+             this->new_backcolor[i]) {
+            var->replace_with(this->new_backcolor[i]);
+         }
+      }
+
+      if (var == this->info->fog && this->new_fog) {
+         var->replace_with(this->new_fog);
+      }
+
+      return visit_continue;
+   }
+
+   virtual void handle_rvalue(ir_rvalue **rvalue)
+   {
+      if (!*rvalue)
+         return;
+
+      void *ctx = ralloc_parent(*rvalue);
+
+      /* Replace an array dereference gl_TexCoord[i] with a single
+       * variable dereference representing gl_TexCoord[i].
+       */
+      if (this->info->lower_texcoord_array) {
+         /* gl_TexCoord[i] occurrence */
+         ir_dereference_array *const da = (*rvalue)->as_dereference_array();
+
+         if (da && da->variable_referenced() ==
+             this->info->texcoord_array) {
+            unsigned i = da->array_index->as_constant()->get_uint_component(0);
+
+            *rvalue = new(ctx) ir_dereference_variable(this->new_texcoord[i]);
+            return;
+         }
+      }
+
+      /* Same for gl_FragData. */
+      if (this->info->lower_fragdata_array) {
+         /* gl_FragData[i] occurrence */
+         ir_dereference_array *const da = (*rvalue)->as_dereference_array();
+
+         if (da && da->variable_referenced() == this->info->fragdata_array) {
+            unsigned i = da->array_index->as_constant()->get_uint_component(0);
+
+            *rvalue = new(ctx) ir_dereference_variable(this->new_fragdata[i]);
+            return;
+         }
+      }
+
+      /* Replace set-but-unused color and fog outputs with dummy variables. */
+      ir_dereference_variable *const dv = (*rvalue)->as_dereference_variable();
+      if (!dv)
+         return;
+
+      ir_variable *var = dv->variable_referenced();
+
+      for (int i = 0; i < 2; i++) {
+         if (var == this->info->color[i] && this->new_color[i]) {
+            *rvalue = new(ctx) ir_dereference_variable(this->new_color[i]);
+            return;
+         }
+         if (var == this->info->backcolor[i] &&
+             this->new_backcolor[i]) {
+            *rvalue = new(ctx) ir_dereference_variable(this->new_backcolor[i]);
+            return;
+         }
+      }
+
+      if (var == this->info->fog && this->new_fog) {
+         *rvalue = new(ctx) ir_dereference_variable(this->new_fog);
+      }
+   }
+
+   virtual ir_visitor_status visit_leave(ir_assignment *ir)
+   {
+      handle_rvalue(&ir->rhs);
+      handle_rvalue(&ir->condition);
+
+      /* We have to use set_lhs when changing the LHS of an assignment. */
+      ir_rvalue *lhs = ir->lhs;
+
+      handle_rvalue(&lhs);
+      if (lhs != ir->lhs) {
+         ir->set_lhs(lhs);
+      }
+
+      return visit_continue;
+   }
+
+private:
+   const varying_info_visitor *info;
+   ir_variable *new_fragdata[MAX_DRAW_BUFFERS];
+   ir_variable *new_texcoord[MAX_TEXTURE_COORD_UNITS];
+   ir_variable *new_color[2];
+   ir_variable *new_backcolor[2];
+   ir_variable *new_fog;
+};
+
+} /* anonymous namespace */
+
+static void
+lower_texcoord_array(exec_list *ir, const varying_info_visitor *info)
+{
+   replace_varyings_visitor(ir, info,
+                            (1 << MAX_TEXTURE_COORD_UNITS) - 1,
+                            1 | 2, true);
+}
+
+static void
+lower_fragdata_array(exec_list *ir)
+{
+   varying_info_visitor info(ir_var_shader_out, true);
+   info.get(ir, 0, NULL);
+
+   replace_varyings_visitor(ir, &info, 0, 0, 0);
+}
+
+
+void
+do_dead_builtin_varyings(struct gl_context *ctx,
+                         gl_shader *producer, gl_shader *consumer,
+                         unsigned num_tfeedback_decls,
+                         tfeedback_decl *tfeedback_decls)
+{
+   /* Lower the gl_FragData array to separate variables. */
+   if (consumer && consumer->Stage == MESA_SHADER_FRAGMENT) {
+      lower_fragdata_array(consumer->ir);
+   }
+
+   /* Lowering of built-in varyings has no effect with the core context and
+    * GLES2, because they are not available there.
+    */
+   if (ctx->API == API_OPENGL_CORE ||
+       ctx->API == API_VK ||
+       ctx->API == API_OPENGLES2) {
+      return;
+   }
+
+   /* Information about built-in varyings. */
+   varying_info_visitor producer_info(ir_var_shader_out);
+   varying_info_visitor consumer_info(ir_var_shader_in);
+
+   if (producer) {
+      producer_info.get(producer->ir, num_tfeedback_decls, tfeedback_decls);
+
+      if (!consumer) {
+         /* At least eliminate unused gl_TexCoord elements. */
+         if (producer_info.lower_texcoord_array) {
+            lower_texcoord_array(producer->ir, &producer_info);
+         }
+         return;
+      }
+   }
+
+   if (consumer) {
+      consumer_info.get(consumer->ir, 0, NULL);
+
+      if (!producer) {
+         /* At least eliminate unused gl_TexCoord elements. */
+         if (consumer_info.lower_texcoord_array) {
+            lower_texcoord_array(consumer->ir, &consumer_info);
+         }
+         return;
+      }
+   }
+
+   /* Eliminate the outputs unused by the consumer. */
+   if (producer_info.lower_texcoord_array ||
+       producer_info.color_usage ||
+       producer_info.has_fog) {
+      replace_varyings_visitor(producer->ir,
+                               &producer_info,
+                               consumer_info.texcoord_usage,
+                               consumer_info.color_usage,
+                               consumer_info.has_fog);
+   }
+
+   /* The gl_TexCoord fragment shader inputs can be initialized
+    * by GL_COORD_REPLACE, so we can't eliminate them.
+    *
+    * This doesn't prevent elimination of the gl_TexCoord elements which
+    * are not read by the fragment shader. We want to eliminate those anyway.
+    */
+   if (consumer->Stage == MESA_SHADER_FRAGMENT) {
+      producer_info.texcoord_usage = (1 << MAX_TEXTURE_COORD_UNITS) - 1;
+   }
+
+   /* Eliminate the inputs uninitialized by the producer. */
+   if (consumer_info.lower_texcoord_array ||
+       consumer_info.color_usage ||
+       consumer_info.has_fog) {
+      replace_varyings_visitor(consumer->ir,
+                               &consumer_info,
+                               producer_info.texcoord_usage,
+                               producer_info.color_usage,
+                               producer_info.has_fog);
+   }
+}
diff --git a/icd/intel/compiler/shader/opt_dead_code.cpp b/icd/intel/compiler/shader/opt_dead_code.cpp
new file mode 100644
index 0000000..8adf52f
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_dead_code.cpp
@@ -0,0 +1,150 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_dead_code.cpp
+ *
+ * Eliminates dead assignments and variable declarations from the code.
+ */
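+
+/* An illustrative sketch (hypothetical code): a variable whose every
+ * reference is an assignment is dead, so in
+ *
+ *    float t;
+ *    t = x * y;   // t is never read afterwards
+ *
+ * the assignment is removed first, and a later pass removes the
+ * declaration once nothing references it.
+ */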
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_variable_refcount.h"
+#include "glsl_types.h"
+#include "main/hash_table.h"
+
+static bool debug = false;
+
+/**
+ * Do a dead code pass over the instructions and everything those
+ * instructions reference.
+ *
+ * Note that this will remove assignments to globals, so it is not suitable
+ * for usage on an unlinked instruction stream.
+ */
+bool
+do_dead_code(exec_list *instructions, bool uniform_locations_assigned)
+{
+   ir_variable_refcount_visitor v;
+   bool progress = false;
+
+   v.run(instructions);
+
+   struct hash_entry *e;
+   hash_table_foreach(v.ht, e) {
+      ir_variable_refcount_entry *entry = (ir_variable_refcount_entry *)e->data;
+
+      /* Since each assignment is a reference, the referenced count must be
+       * greater than or equal to the assignment count.  If they are equal,
+       * then all of the references are assignments, and the variable is
+       * dead.
+       *
+       * Note that if the variable is neither assigned nor referenced, both
+       * counts will be zero and will be caught by the equality test.
+       */
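+      /* For instance (hypothetical): a variable with referenced_count == 3
+       * and assigned_count == 3 is only ever written, and is therefore dead.
+       */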
+      assert(entry->referenced_count >= entry->assigned_count);
+
+      if (debug) {
+	 printf("%s@%p: %d refs, %d assigns, %sdeclared in our scope\n",
+		entry->var->name, (void *) entry->var,
+		entry->referenced_count, entry->assigned_count,
+		entry->declaration ? "" : "not ");
+      }
+
+      if ((entry->referenced_count > entry->assigned_count)
+	  || !entry->declaration)
+	 continue;
+
+      if (entry->assign) {
+	 /* Remove a single dead assignment to the variable we found.
+	  * Don't do so if it's a shader or function output, though.
+	  */
+	 if (entry->var->data.mode != ir_var_function_out &&
+	     entry->var->data.mode != ir_var_function_inout &&
+             entry->var->data.mode != ir_var_shader_out) {
+	    entry->assign->remove();
+	    progress = true;
+
+	    if (debug) {
+	       printf("Removed assignment to %s@%p\n",
+		      entry->var->name, (void *) entry->var);
+	    }
+	 }
+      } else {
+	 /* If there are no assignments or references to the variable left,
+	  * then we can remove its declaration.
+	  */
+
+	 /* uniform initializers are precious, and could get used by another
+	  * stage.  Also, once uniform locations have been assigned, the
+	  * declaration cannot be deleted.
+	  */
+	 if (entry->var->data.mode == ir_var_uniform &&
+	     (uniform_locations_assigned ||
+	      entry->var->constant_value))
+	    continue;
+
+	 entry->var->remove();
+	 progress = true;
+
+	 if (debug) {
+	    printf("Removed declaration of %s@%p\n",
+		   entry->var->name, (void *) entry->var);
+	 }
+      }
+   }
+
+   return progress;
+}
+
+/**
+ * Does a dead code pass on the functions present in the instruction stream.
+ *
+ * This is suitable for use while the program is not linked, as it will
+ * ignore variable declarations (and the assignments to them) for variables
+ * with global scope.
+ */
+bool
+do_dead_code_unlinked(exec_list *instructions)
+{
+   bool progress = false;
+
+   foreach_list(n, instructions) {
+      ir_instruction *ir = (ir_instruction *) n;
+      ir_function *f = ir->as_function();
+      if (f) {
+	 foreach_list(signode, &f->signatures) {
+	    ir_function_signature *sig = (ir_function_signature *) signode;
+	    /* The setting of the uniform_locations_assigned flag here is
+	     * irrelevant.  If there is a uniform declaration encountered
+	     * inside the body of the function, something has already gone
+	     * terribly, terribly wrong.
+	     */
+	    if (do_dead_code(&sig->body, false))
+	       progress = true;
+	 }
+      }
+   }
+
+   return progress;
+}
diff --git a/icd/intel/compiler/shader/opt_dead_code_local.cpp b/icd/intel/compiler/shader/opt_dead_code_local.cpp
new file mode 100644
index 0000000..c27c526
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_dead_code_local.cpp
@@ -0,0 +1,340 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_dead_code_local.cpp
+ *
+ * Eliminates local dead assignments from the code.
+ *
+ * This operates on basic blocks, tracking assignments and finding whether
+ * they're read before the variable is completely reassigned.
+ *
+ * Compare this to opt_dead_code.cpp, which operates globally looking
+ * for assignments to variables that are never read.
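+ *
+ * As an illustrative GLSL-level sketch (not code from this file), within
+ * a single basic block:
+ *
+ *    x = a;       // dead: x is completely rewritten below before any read
+ *    x = b;
+ *    y = x;
+ *
+ * the pass deletes the first assignment, leaving:
+ *
+ *    x = b;
+ *    y = x;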
+ */
+
+#include "ir.h"
+#include "ir_basic_block.h"
+#include "ir_optimization.h"
+#include "glsl_types.h"
+
+static bool debug = false;
+
+namespace {
+
+class assignment_entry : public exec_node
+{
+public:
+   assignment_entry(ir_variable *lhs, ir_assignment *ir)
+   {
+      assert(lhs);
+      assert(ir);
+      this->lhs = lhs;
+      this->ir = ir;
+      this->unused = ir->write_mask;
+   }
+
+   ir_variable *lhs;
+   ir_assignment *ir;
+
+   /* bitmask of xyzw channels written that haven't been used so far. */
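+   /* (Bit 0 is x, matching ir_assignment::write_mask, so e.g. an
+    * assignment to v.xz starts out here as 0b0101.)
+    */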
+   int unused;
+};
+
+class kill_for_derefs_visitor : public ir_hierarchical_visitor {
+public:
+   kill_for_derefs_visitor(exec_list *assignments)
+   {
+      this->assignments = assignments;
+   }
+
+   void use_channels(ir_variable *const var, int used)
+   {
+      foreach_list_safe(n, this->assignments) {
+	 assignment_entry *entry = (assignment_entry *) n;
+
+	 if (entry->lhs == var) {
+	    if (var->type->is_scalar() || var->type->is_vector()) {
+	       if (debug)
+		  printf("used %s (0x%01x - 0x%01x)\n", entry->lhs->name,
+			 entry->unused, used & 0xf);
+	       entry->unused &= ~used;
+	       if (!entry->unused)
+		  entry->remove();
+	    } else {
+	       if (debug)
+		  printf("used %s\n", entry->lhs->name);
+	       entry->remove();
+	    }
+	 }
+      }
+   }
+
+   virtual ir_visitor_status visit(ir_dereference_variable *ir)
+   {
+      use_channels(ir->var, ~0);
+
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit(ir_swizzle *ir)
+   {
+      ir_dereference_variable *deref = ir->val->as_dereference_variable();
+      if (!deref)
+	 return visit_continue;
+
+      int used = 0;
+      used |= 1 << ir->mask.x;
+      used |= 1 << ir->mask.y;
+      used |= 1 << ir->mask.z;
+      used |= 1 << ir->mask.w;
+
+      use_channels(deref->var, used);
+
+      return visit_continue_with_parent;
+   }
+
+   virtual ir_visitor_status visit(ir_emit_vertex *)
+   {
+      /* For the purpose of dead code elimination, emitting a vertex counts as
+       * "reading" all of the currently assigned output variables.
+       */
+      foreach_list_safe(n, this->assignments) {
+         assignment_entry *entry = (assignment_entry *) n;
+         if (entry->lhs->data.mode == ir_var_shader_out) {
+            if (debug)
+               printf("kill %s\n", entry->lhs->name);
+            entry->remove();
+         }
+      }
+
+      return visit_continue;
+   }
+
+private:
+   exec_list *assignments;
+};
+
+class array_index_visit : public ir_hierarchical_visitor {
+public:
+   array_index_visit(ir_hierarchical_visitor *v)
+   {
+      this->visitor = v;
+   }
+
+   virtual ir_visitor_status visit_enter(class ir_dereference_array *ir)
+   {
+      ir->array_index->accept(visitor);
+      return visit_continue;
+   }
+
+   static void run(ir_instruction *ir, ir_hierarchical_visitor *v)
+   {
+      array_index_visit top_visit(v);
+      ir->accept(& top_visit);
+   }
+
+   ir_hierarchical_visitor *visitor;
+};
+
+} /* unnamed namespace */
+
+/**
+ * Processes one assignment: kills the entries for anything the assignment
+ * reads, trims or removes earlier writes that it completely overwrites,
+ * and then adds the assignment itself to the list of removal candidates.
+ */
+static bool
+process_assignment(void *ctx, ir_assignment *ir, exec_list *assignments)
+{
+   ir_variable *var = NULL;
+   bool progress = false;
+   kill_for_derefs_visitor v(assignments);
+
+   /* Kill assignment entries for things used to produce this assignment. */
+   ir->rhs->accept(&v);
+   if (ir->condition) {
+      ir->condition->accept(&v);
+   }
+
+   /* Kill assignment entries used as array indices.
+    */
+   array_index_visit::run(ir->lhs, &v);
+   var = ir->lhs->variable_referenced();
+   assert(var);
+
+   /* Now, check if we did a whole-variable assignment. */
+   if (!ir->condition) {
+      ir_dereference_variable *deref_var = ir->lhs->as_dereference_variable();
+
+      /* If it's a vector type, we can do per-channel elimination of
+       * use of the RHS.
+       */
+      if (deref_var && (deref_var->var->type->is_scalar() ||
+			deref_var->var->type->is_vector())) {
+
+	 if (debug)
+	    printf("looking for %s.0x%01x to remove\n", var->name,
+		   ir->write_mask);
+
+	 foreach_list_safe(n, assignments) {
+	    assignment_entry *entry = (assignment_entry *) n;
+
+	    if (entry->lhs != var)
+	       continue;
+
+	    int remove = entry->unused & ir->write_mask;
+	    if (debug) {
+	       printf("%s 0x%01x - 0x%01x = 0x%01x\n",
+		      var->name,
+		      entry->ir->write_mask,
+		      remove, entry->ir->write_mask & ~remove);
+	    }
+	    if (remove) {
+	       progress = true;
+
+	       if (debug) {
+		  printf("rewriting:\n  ");
+		  entry->ir->print();
+		  printf("\n");
+	       }
+
+	       entry->ir->write_mask &= ~remove;
+	       entry->unused &= ~remove;
+	       if (entry->ir->write_mask == 0) {
+		  /* Delete the dead assignment. */
+		  entry->ir->remove();
+		  entry->remove();
+	       } else {
+		  void *mem_ctx = ralloc_parent(entry->ir);
+		  /* Reswizzle the RHS arguments according to the new
+		   * write_mask.
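+		   *
+		   * E.g. (an illustrative case): if the old mask wrote xz
+		   * and the x write was just removed, the surviving z value
+		   * was the second of the old RHS's two channels, so the
+		   * new RHS becomes old_rhs.y (a one-channel swizzle).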
+		   */
+		  unsigned components[4];
+		  unsigned channels = 0;
+		  unsigned next = 0;
+
+		  for (int i = 0; i < 4; i++) {
+		     if ((entry->ir->write_mask | remove) & (1 << i)) {
+			if (!(remove & (1 << i)))
+			   components[channels++] = next;
+			next++;
+		     }
+		  }
+
+		  entry->ir->rhs = new(mem_ctx) ir_swizzle(entry->ir->rhs,
+							   components,
+							   channels);
+		  if (debug) {
+		     printf("to:\n  ");
+		     entry->ir->print();
+		     printf("\n");
+		  }
+	       }
+	    }
+	 }
+      } else if (ir->whole_variable_written() != NULL) {
+	 /* We did a whole-variable assignment.  So, any instruction in
+	  * the assignment list with the same LHS is dead.
+	  */
+	 if (debug)
+	    printf("looking for %s to remove\n", var->name);
+	 foreach_list_safe(n, assignments) {
+	    assignment_entry *entry = (assignment_entry *) n;
+
+	    if (entry->lhs == var) {
+	       if (debug)
+		  printf("removing %s\n", var->name);
+	       entry->ir->remove();
+	       entry->remove();
+	       progress = true;
+	    }
+	 }
+      }
+   }
+
+   /* Add this instruction to the assignment list available to be removed. */
+   assignment_entry *entry = new(ctx) assignment_entry(var, ir);
+   assignments->push_tail(entry);
+
+   if (debug) {
+      printf("add %s\n", var->name);
+
+      printf("current entries\n");
+      foreach_list(n, assignments) {
+	 assignment_entry *entry = (assignment_entry *) n;
+
+	 printf("    %s (0x%01x)\n", entry->lhs->name, entry->unused);
+      }
+   }
+
+   return progress;
+}
+
+static void
+dead_code_local_basic_block(ir_instruction *first,
+			     ir_instruction *last,
+			     void *data)
+{
+   ir_instruction *ir, *ir_next;
+   /* List of assignment_entry */
+   exec_list assignments;
+   bool *out_progress = (bool *)data;
+   bool progress = false;
+
+   void *ctx = ralloc_context(NULL);
+   /* Loop safely, since process_assignment() may remove instructions from
+    * the block as it goes.
+    */
+   for (ir = first, ir_next = (ir_instruction *)first->next;;
+	ir = ir_next, ir_next = (ir_instruction *)ir->next) {
+      ir_assignment *ir_assign = ir->as_assignment();
+
+      if (debug) {
+	 ir->print();
+	 printf("\n");
+      }
+
+      if (ir_assign) {
+	 progress = process_assignment(ctx, ir_assign, &assignments) || progress;
+      } else {
+	 kill_for_derefs_visitor kill(&assignments);
+	 ir->accept(&kill);
+      }
+
+      if (ir == last)
+	 break;
+   }
+   *out_progress = progress;
+   ralloc_free(ctx);
+}
+
+/**
+ * Does a dead-code pass over the assignments in each basic block of the
+ * instruction stream.
+ */
+bool
+do_dead_code_local(exec_list *instructions)
+{
+   bool progress = false;
+
+   call_for_basic_blocks(instructions, dead_code_local_basic_block, &progress);
+
+   return progress;
+}
diff --git a/icd/intel/compiler/shader/opt_dead_functions.cpp b/icd/intel/compiler/shader/opt_dead_functions.cpp
new file mode 100644
index 0000000..8bb278e
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_dead_functions.cpp
@@ -0,0 +1,156 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_dead_functions.cpp
+ *
+ * Eliminates unused functions from the linked program.
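+ *
+ * As an illustrative sketch, after linking:
+ *
+ *    float helper(float x) { return x * 2.0; }   // never called: removed
+ *    void main() { gl_FragColor = vec4(1.0); }   // "main" is always kept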
+ */
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_expression_flattening.h"
+#include "glsl_types.h"
+
+namespace {
+
+class signature_entry : public exec_node
+{
+public:
+   signature_entry(ir_function_signature *sig)
+   {
+      this->signature = sig;
+      this->used = false;
+   }
+
+   ir_function_signature *signature;
+   bool used;
+};
+
+class ir_dead_functions_visitor : public ir_hierarchical_visitor {
+public:
+   ir_dead_functions_visitor()
+   {
+      this->mem_ctx = ralloc_context(NULL);
+   }
+
+   ~ir_dead_functions_visitor()
+   {
+      ralloc_free(this->mem_ctx);
+   }
+
+   virtual ir_visitor_status visit_enter(ir_function_signature *);
+   virtual ir_visitor_status visit_enter(ir_call *);
+
+   signature_entry *get_signature_entry(ir_function_signature *var);
+
+   /* List of signature_entry */
+   exec_list signature_list;
+   void *mem_ctx;
+};
+
+} /* unnamed namespace */
+
+signature_entry *
+ir_dead_functions_visitor::get_signature_entry(ir_function_signature *sig)
+{
+   foreach_list(n, &this->signature_list) {
+      signature_entry *entry = (signature_entry *) n;
+      if (entry->signature == sig)
+	 return entry;
+   }
+
+   signature_entry *entry = new(mem_ctx) signature_entry(sig);
+   this->signature_list.push_tail(entry);
+   return entry;
+}
+
+
+ir_visitor_status
+ir_dead_functions_visitor::visit_enter(ir_function_signature *ir)
+{
+   signature_entry *entry = this->get_signature_entry(ir);
+
+   if (strcmp(ir->function_name(), "main") == 0) {
+      entry->used = true;
+   }
+
+   return visit_continue;
+}
+
+
+ir_visitor_status
+ir_dead_functions_visitor::visit_enter(ir_call *ir)
+{
+   signature_entry *entry = this->get_signature_entry(ir->callee);
+
+   entry->used = true;
+
+   return visit_continue;
+}
+
+bool
+do_dead_functions(exec_list *instructions)
+{
+   ir_dead_functions_visitor v;
+   bool progress = false;
+
+   visit_list_elements(&v, instructions);
+
+   /* Now that we've figured out which function signatures are used, remove
+    * the unused ones, and remove function definitions that have no more
+    * signatures.
+    */
+   foreach_list_safe(n, &v.signature_list) {
+      signature_entry *entry = (signature_entry *) n;
+
+      if (!entry->used) {
+	 entry->signature->remove();
+	 delete entry->signature;
+	 progress = true;
+      }
+      delete(entry);
+   }
+
+   /* We don't do this above, at the point where we nuke a signature,
+    * because we only have const pointers to the enclosing functions there.
+    */
+   foreach_list_safe(n, instructions) {
+      ir_instruction *ir = (ir_instruction *) n;
+      ir_function *func = ir->as_function();
+
+      if (func && func->signatures.is_empty()) {
+	 /* At this point (post-linking), the symbol table is no
+	  * longer in use, so not removing the function from the
+	  * symbol table should be OK.
+	  */
+	 func->remove();
+	 delete func;
+	 progress = true;
+      }
+   }
+
+   return progress;
+}
diff --git a/icd/intel/compiler/shader/opt_flatten_nested_if_blocks.cpp b/icd/intel/compiler/shader/opt_flatten_nested_if_blocks.cpp
new file mode 100644
index 0000000..c702102
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_flatten_nested_if_blocks.cpp
@@ -0,0 +1,103 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_flatten_nested_if_blocks.cpp
+ *
+ * Flattens nested if blocks such as:
+ *
+ * if (x) {
+ *    if (y) {
+ *       ...
+ *    }
+ * }
+ *
+ * into a single if block with a combined condition:
+ *
+ * if (x && y) {
+ *    ...
+ * }
+ */
+
+#include "ir.h"
+#include "ir_builder.h"
+
+using namespace ir_builder;
+
+namespace {
+
+class nested_if_flattener : public ir_hierarchical_visitor {
+public:
+   nested_if_flattener()
+   {
+      progress = false;
+   }
+
+   ir_visitor_status visit_leave(ir_if *);
+   ir_visitor_status visit_enter(ir_assignment *);
+
+   bool progress;
+};
+
+} /* unnamed namespace */
+
+/* We only care about the top level "if" instructions, so don't
+ * descend into expressions.
+ */
+ir_visitor_status
+nested_if_flattener::visit_enter(ir_assignment *ir)
+{
+   (void) ir;
+   return visit_continue_with_parent;
+}
+
+bool
+opt_flatten_nested_if_blocks(exec_list *instructions)
+{
+   nested_if_flattener v;
+
+   v.run(instructions);
+   return v.progress;
+}
+
+
+ir_visitor_status
+nested_if_flattener::visit_leave(ir_if *ir)
+{
+   /* Only handle a single ir_if within the then clause of an ir_if.  No extra
+    * instructions, no else clauses, nothing.
+    */
+   if (ir->then_instructions.is_empty() || !ir->else_instructions.is_empty())
+      return visit_continue;
+
+   ir_if *inner = ((ir_instruction *) ir->then_instructions.head)->as_if();
+   if (!inner || !inner->next->is_tail_sentinel() ||
+       !inner->else_instructions.is_empty())
+      return visit_continue;
+
+   ir->condition = logic_and(ir->condition, inner->condition);
+   inner->then_instructions.move_nodes_to(&ir->then_instructions);
+
+   progress = true;
+   return visit_continue;
+}
diff --git a/icd/intel/compiler/shader/opt_flip_matrices.cpp b/icd/intel/compiler/shader/opt_flip_matrices.cpp
new file mode 100644
index 0000000..9044fd6
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_flip_matrices.cpp
@@ -0,0 +1,124 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_flip_matrices.cpp
+ *
+ * Convert (matrix * vector) operations to (vector * matrixTranspose),
+ * which can be done using dot products rather than multiplies and adds.
+ * On some hardware, this is more efficient.
+ *
+ * This currently only does the conversion for built-in matrices which
+ * already have transposed equivalents.  Namely, gl_ModelViewProjectionMatrix
+ * and gl_TextureMatrix.
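+ *
+ * As an illustrative sketch of the rewrite:
+ *
+ *    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
+ *
+ * becomes
+ *
+ *    gl_Position = gl_Vertex * gl_ModelViewProjectionMatrixTranspose;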
+ */
+#include "ir.h"
+#include "ir_optimization.h"
+#include "main/macros.h"
+
+namespace {
+class matrix_flipper : public ir_hierarchical_visitor {
+public:
+   matrix_flipper(exec_list *instructions)
+   {
+      progress = false;
+      mvp_transpose = NULL;
+      texmat_transpose = NULL;
+
+      foreach_list(n, instructions) {
+         ir_instruction *ir = (ir_instruction *) n;
+         ir_variable *var = ir->as_variable();
+         if (!var)
+            continue;
+         if (strcmp(var->name, "gl_ModelViewProjectionMatrixTranspose") == 0)
+            mvp_transpose = var;
+         if (strcmp(var->name, "gl_TextureMatrixTranspose") == 0)
+            texmat_transpose = var;
+      }
+   }
+
+   ir_visitor_status visit_enter(ir_expression *ir);
+
+   bool progress;
+
+private:
+   ir_variable *mvp_transpose;
+   ir_variable *texmat_transpose;
+};
+}
+
+ir_visitor_status
+matrix_flipper::visit_enter(ir_expression *ir)
+{
+   if (ir->operation != ir_binop_mul ||
+       !ir->operands[0]->type->is_matrix() ||
+       !ir->operands[1]->type->is_vector())
+      return visit_continue;
+
+   ir_variable *mat_var = ir->operands[0]->variable_referenced();
+   if (!mat_var)
+      return visit_continue;
+
+   if (mvp_transpose &&
+       strcmp(mat_var->name, "gl_ModelViewProjectionMatrix") == 0) {
+#ifndef NDEBUG
+      ir_dereference_variable *deref = ir->operands[0]->as_dereference_variable();
+      assert(deref && deref->var == mat_var);
+#endif
+
+      void *mem_ctx = ralloc_parent(ir);
+
+      ir->operands[0] = ir->operands[1];
+      ir->operands[1] = new(mem_ctx) ir_dereference_variable(mvp_transpose);
+
+      progress = true;
+   } else if (texmat_transpose &&
+              strcmp(mat_var->name, "gl_TextureMatrix") == 0) {
+      ir_dereference_array *array_ref = ir->operands[0]->as_dereference_array();
+      assert(array_ref != NULL);
+      ir_dereference_variable *var_ref = array_ref->array->as_dereference_variable();
+      assert(var_ref && var_ref->var == mat_var);
+
+      ir->operands[0] = ir->operands[1];
+      ir->operands[1] = array_ref;
+
+      var_ref->var = texmat_transpose;
+
+      texmat_transpose->data.max_array_access =
+         MAX2(texmat_transpose->data.max_array_access, mat_var->data.max_array_access);
+
+      progress = true;
+   }
+
+   return visit_continue;
+}
+
+bool
+opt_flip_matrices(struct exec_list *instructions)
+{
+   matrix_flipper v(instructions);
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/opt_function_inlining.cpp b/icd/intel/compiler/shader/opt_function_inlining.cpp
new file mode 100644
index 0000000..964209d
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_function_inlining.cpp
@@ -0,0 +1,365 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_function_inlining.cpp
+ *
+ * Replaces calls to functions with the body of the function.
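+ *
+ * As an illustrative sketch (names invented here), a call such as
+ *
+ *    y = square(a);
+ *
+ * turns into a copy-in of the 'in' parameter plus a clone of the body,
+ * with 'return' rewritten into an assignment to the call's destination:
+ *
+ *    float x = a;
+ *    y = x * x;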
+ */
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_function_inlining.h"
+#include "ir_expression_flattening.h"
+#include "glsl_types.h"
+#include "program/hash_table.h"
+
+static void
+do_variable_replacement(exec_list *instructions,
+                        ir_variable *orig,
+                        ir_dereference *repl);
+
+namespace {
+
+class ir_function_inlining_visitor : public ir_hierarchical_visitor {
+public:
+   ir_function_inlining_visitor()
+   {
+      progress = false;
+   }
+
+   virtual ~ir_function_inlining_visitor()
+   {
+      /* empty */
+   }
+
+   virtual ir_visitor_status visit_enter(ir_expression *);
+   virtual ir_visitor_status visit_enter(ir_call *);
+   virtual ir_visitor_status visit_enter(ir_return *);
+   virtual ir_visitor_status visit_enter(ir_texture *);
+   virtual ir_visitor_status visit_enter(ir_swizzle *);
+
+   bool progress;
+};
+
+} /* unnamed namespace */
+
+bool
+do_function_inlining(exec_list *instructions)
+{
+   ir_function_inlining_visitor v;
+
+   v.run(instructions);
+
+   return v.progress;
+}
+
+static void
+replace_return_with_assignment(ir_instruction *ir, void *data)
+{
+   void *ctx = ralloc_parent(ir);
+   ir_dereference *orig_deref = (ir_dereference *) data;
+   ir_return *ret = ir->as_return();
+
+   if (ret) {
+      if (ret->value) {
+	 ir_rvalue *lhs = orig_deref->clone(ctx, NULL);
+	 ret->replace_with(new(ctx) ir_assignment(lhs, ret->value, NULL));
+      } else {
+	 /* An un-valued return has to be the last return, or we shouldn't
+	  * have reached here (see can_inline()).
+	  */
+	 assert(ret->next->is_tail_sentinel());
+	 ret->remove();
+      }
+   }
+}
+
+void
+ir_call::generate_inline(ir_instruction *next_ir)
+{
+   void *ctx = ralloc_parent(this);
+   ir_variable **parameters;
+   int num_parameters;
+   int i;
+   struct hash_table *ht;
+
+   ht = hash_table_ctor(0, hash_table_pointer_hash, hash_table_pointer_compare);
+
+   num_parameters = 0;
+   foreach_list(n, &this->callee->parameters)
+      num_parameters++;
+
+   parameters = new ir_variable *[num_parameters];
+
+   /* Generate the declarations for the parameters to our inlined code,
+    * and set up the mapping of real function body variables to ours.
+    */
+   i = 0;
+   foreach_two_lists(formal_node, &this->callee->parameters,
+                     actual_node, &this->actual_parameters) {
+      ir_variable *sig_param = (ir_variable *) formal_node;
+      ir_rvalue *param = (ir_rvalue *) actual_node;
+
+      /* Generate a new variable for the parameter. */
+      if (sig_param->type->contains_opaque()) {
+	 /* For opaque types, we want the inlined variable references
+	  * referencing the passed in variable, since that will have
+	  * the location information, which an assignment of an opaque
+	  * variable wouldn't.  Fix it up below.
+	  */
+	 parameters[i] = NULL;
+      } else {
+	 parameters[i] = sig_param->clone(ctx, ht);
+	 parameters[i]->data.mode = ir_var_auto;
+
+	 /* Remove the read-only decoration because we're going to write
+	  * directly to this variable.  If the cloned variable is left
+	  * read-only and the inlined function is inside a loop, the loop
+	  * analysis code will get confused.
+	  */
+	 parameters[i]->data.read_only = false;
+	 next_ir->insert_before(parameters[i]);
+      }
+
+      /* Move the actual param into our param variable if it's an 'in' type. */
+      if (parameters[i] && (sig_param->data.mode == ir_var_function_in ||
+			    sig_param->data.mode == ir_var_const_in ||
+			    sig_param->data.mode == ir_var_function_inout)) {
+	 ir_assignment *assign;
+
+	 assign = new(ctx) ir_assignment(new(ctx) ir_dereference_variable(parameters[i]),
+					 param, NULL);
+	 next_ir->insert_before(assign);
+      }
+
+      ++i;
+   }
+
+   exec_list new_instructions;
+
+   /* Generate the inlined body of the function to a new list */
+   foreach_list(n, &callee->body) {
+      ir_instruction *ir = (ir_instruction *) n;
+      ir_instruction *new_ir = ir->clone(ctx, ht);
+
+      new_instructions.push_tail(new_ir);
+      visit_tree(new_ir, replace_return_with_assignment, this->return_deref);
+   }
+
+   /* If any opaque types were passed in, replace any deref of the
+    * opaque variable with a deref of the argument.
+    */
+   foreach_two_lists(formal_node, &this->callee->parameters,
+                     actual_node, &this->actual_parameters) {
+      ir_rvalue *const param = (ir_rvalue *) actual_node;
+      ir_variable *sig_param = (ir_variable *) formal_node;
+
+      if (sig_param->type->contains_opaque()) {
+	 ir_dereference *deref = param->as_dereference();
+
+	 assert(deref);
+	 do_variable_replacement(&new_instructions, sig_param, deref);
+      }
+   }
+
+   /* Now push those new instructions in. */
+   next_ir->insert_before(&new_instructions);
+
+   /* Copy back the value of any 'out' parameters from the function body
+    * variables to our own.
+    */
+   i = 0;
+   foreach_two_lists(formal_node, &this->callee->parameters,
+                     actual_node, &this->actual_parameters) {
+      ir_rvalue *const param = (ir_rvalue *) actual_node;
+      const ir_variable *const sig_param = (ir_variable *) formal_node;
+
+      /* Move our param variable into the actual param if it's an 'out' type. */
+      if (parameters[i] && (sig_param->data.mode == ir_var_function_out ||
+			    sig_param->data.mode == ir_var_function_inout)) {
+	 ir_assignment *assign;
+
+	 assign = new(ctx) ir_assignment(param->clone(ctx, NULL)->as_rvalue(),
+					 new(ctx) ir_dereference_variable(parameters[i]),
+					 NULL);
+	 next_ir->insert_before(assign);
+      }
+
+      ++i;
+   }
+
+   delete [] parameters;
+
+   hash_table_dtor(ht);
+}
+
+
+ir_visitor_status
+ir_function_inlining_visitor::visit_enter(ir_expression *ir)
+{
+   (void) ir;
+   return visit_continue_with_parent;
+}
+
+
+ir_visitor_status
+ir_function_inlining_visitor::visit_enter(ir_return *ir)
+{
+   (void) ir;
+   return visit_continue_with_parent;
+}
+
+
+ir_visitor_status
+ir_function_inlining_visitor::visit_enter(ir_texture *ir)
+{
+   (void) ir;
+   return visit_continue_with_parent;
+}
+
+
+ir_visitor_status
+ir_function_inlining_visitor::visit_enter(ir_swizzle *ir)
+{
+   (void) ir;
+   return visit_continue_with_parent;
+}
+
+
+ir_visitor_status
+ir_function_inlining_visitor::visit_enter(ir_call *ir)
+{
+   if (can_inline(ir)) {
+      ir->generate_inline(ir);
+      ir->remove();
+      this->progress = true;
+   }
+
+   return visit_continue;
+}
+
+
+/**
+ * Replaces references to the "orig" variable with a clone of "repl."
+ *
+ * From the spec, opaque types can appear in the tree as function
+ * (non-out) parameters and as the result of array indexing and
+ * structure field selection.  In our builtin implementation, they
+ * also appear in the sampler field of an ir_tex instruction.
+ */
+
+class ir_variable_replacement_visitor : public ir_hierarchical_visitor {
+public:
+   ir_variable_replacement_visitor(ir_variable *orig, ir_dereference *repl)
+   {
+      this->orig = orig;
+      this->repl = repl;
+   }
+
+   virtual ~ir_variable_replacement_visitor()
+   {
+   }
+
+   virtual ir_visitor_status visit_leave(ir_call *);
+   virtual ir_visitor_status visit_leave(ir_dereference_array *);
+   virtual ir_visitor_status visit_leave(ir_dereference_record *);
+   virtual ir_visitor_status visit_leave(ir_texture *);
+
+   void replace_deref(ir_dereference **deref);
+   void replace_rvalue(ir_rvalue **rvalue);
+
+   ir_variable *orig;
+   ir_dereference *repl;
+};
+
+void
+ir_variable_replacement_visitor::replace_deref(ir_dereference **deref)
+{
+   ir_dereference_variable *deref_var = (*deref)->as_dereference_variable();
+   if (deref_var && deref_var->var == this->orig) {
+      *deref = this->repl->clone(ralloc_parent(*deref), NULL);
+   }
+}
+
+void
+ir_variable_replacement_visitor::replace_rvalue(ir_rvalue **rvalue)
+{
+   if (!*rvalue)
+      return;
+
+   ir_dereference *deref = (*rvalue)->as_dereference();
+
+   if (!deref)
+      return;
+
+   replace_deref(&deref);
+   *rvalue = deref;
+}
+
+ir_visitor_status
+ir_variable_replacement_visitor::visit_leave(ir_texture *ir)
+{
+   replace_deref(&ir->sampler);
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_variable_replacement_visitor::visit_leave(ir_dereference_array *ir)
+{
+   replace_rvalue(&ir->array);
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_variable_replacement_visitor::visit_leave(ir_dereference_record *ir)
+{
+   replace_rvalue(&ir->record);
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_variable_replacement_visitor::visit_leave(ir_call *ir)
+{
+   foreach_list_safe(n, &ir->actual_parameters) {
+      ir_rvalue *param = (ir_rvalue *) n;
+      ir_rvalue *new_param = param;
+      replace_rvalue(&new_param);
+
+      if (new_param != param) {
+	 param->replace_with(new_param);
+      }
+   }
+   return visit_continue;
+}
+
+static void
+do_variable_replacement(exec_list *instructions,
+                        ir_variable *orig,
+                        ir_dereference *repl)
+{
+   ir_variable_replacement_visitor v(orig, repl);
+
+   visit_list_elements(&v, instructions);
+}
diff --git a/icd/intel/compiler/shader/opt_if_simplification.cpp b/icd/intel/compiler/shader/opt_if_simplification.cpp
new file mode 100644
index 0000000..e05f031
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_if_simplification.cpp
@@ -0,0 +1,126 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_if_simplification.cpp
+ *
+ * Moves constant branches of if statements out to the surrounding
+ * instruction stream, and inverts if conditionals to avoid empty
+ * "then" blocks.
+ */
+
+#include "ir.h"
+
+namespace {
+
+class ir_if_simplification_visitor : public ir_hierarchical_visitor {
+public:
+   ir_if_simplification_visitor()
+   {
+      this->made_progress = false;
+   }
+
+   ir_visitor_status visit_leave(ir_if *);
+   ir_visitor_status visit_enter(ir_assignment *);
+
+   bool made_progress;
+};
+
+} /* unnamed namespace */
+
+/* We only care about the top level "if" instructions, so don't
+ * descend into expressions.
+ */
+ir_visitor_status
+ir_if_simplification_visitor::visit_enter(ir_assignment *ir)
+{
+   (void) ir;
+   return visit_continue_with_parent;
+}
+
+bool
+do_if_simplification(exec_list *instructions)
+{
+   ir_if_simplification_visitor v;
+
+   v.run(instructions);
+   return v.made_progress;
+}
+
+
+ir_visitor_status
+ir_if_simplification_visitor::visit_leave(ir_if *ir)
+{
+   /* If the if statement has nothing on either side, remove it. */
+   if (ir->then_instructions.is_empty() &&
+       ir->else_instructions.is_empty()) {
+      ir->remove();
+      this->made_progress = true;
+      return visit_continue;
+   }
+
+   /* FINISHME: Ideally there would be a way to note that the condition results
+    * FINISHME: in a constant before processing both of the other subtrees.
+    * FINISHME: This can probably be done with some flags, but it would take
+    * FINISHME: some work to get right.
+    */
+   ir_constant *condition_constant = ir->condition->constant_expression_value();
+   if (condition_constant) {
+      /* Move the contents of the one branch of the conditional
+       * that matters out.
+       */
+      if (condition_constant->value.b[0]) {
+         ir->insert_before(&ir->then_instructions);
+      } else {
+         ir->insert_before(&ir->else_instructions);
+      }
+      ir->remove();
+      this->made_progress = true;
+      return visit_continue;
+   }
+
+   /* Turn:
+    *
+    *     if (cond) {
+    *     } else {
+    *         do_work();
+    *     }
+    *
+    * into:
+    *
+    *     if (!cond)
+    *         do_work();
+    *
+    * which avoids control flow for "else" (which is usually more
+    * expensive than normal operations), and the "not" can usually be
+    * folded into the generation of "cond" anyway.
+    */
+   if (ir->then_instructions.is_empty()) {
+      ir->condition = new(ralloc_parent(ir->condition))
+	 ir_expression(ir_unop_logic_not, ir->condition);
+      ir->else_instructions.move_nodes_to(&ir->then_instructions);
+      this->made_progress = true;
+   }
+
+   return visit_continue;
+}
diff --git a/icd/intel/compiler/shader/opt_noop_swizzle.cpp b/icd/intel/compiler/shader/opt_noop_swizzle.cpp
new file mode 100644
index 0000000..586ad5e
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_noop_swizzle.cpp
@@ -0,0 +1,83 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_noop_swizzle.cpp
+ *
+ * If a swizzle doesn't change the order or count of components, then
+ * remove the swizzle so that other optimization passes see the value
+ * behind it.
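+ *
+ * For example, for a vec4 v the swizzle v.xyzw reads every component in
+ * its original order, so it is rewritten to plain v (likewise v.xy on a
+ * vec2).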
+ */
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_rvalue_visitor.h"
+#include "glsl_types.h"
+
+namespace {
+
+class ir_noop_swizzle_visitor : public ir_rvalue_visitor {
+public:
+   ir_noop_swizzle_visitor()
+   {
+      this->progress = false;
+   }
+
+   void handle_rvalue(ir_rvalue **rvalue);
+   bool progress;
+};
+
+} /* unnamed namespace */
+
+void
+ir_noop_swizzle_visitor::handle_rvalue(ir_rvalue **rvalue)
+{
+   if (!*rvalue)
+      return;
+
+   ir_swizzle *swiz = (*rvalue)->as_swizzle();
+   if (!swiz || swiz->type != swiz->val->type)
+      return;
+
+   int elems = swiz->val->type->vector_elements;
+   if (swiz->mask.x != 0)
+      return;
+   if (elems >= 2 && swiz->mask.y != 1)
+      return;
+   if (elems >= 3 && swiz->mask.z != 2)
+      return;
+   if (elems >= 4 && swiz->mask.w != 3)
+      return;
+
+   this->progress = true;
+   *rvalue = swiz->val;
+}
+
+bool
+do_noop_swizzle(exec_list *instructions)
+{
+   ir_noop_swizzle_visitor v;
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/opt_redundant_jumps.cpp b/icd/intel/compiler/shader/opt_redundant_jumps.cpp
new file mode 100644
index 0000000..ee384d0
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_redundant_jumps.cpp
@@ -0,0 +1,124 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_redundant_jumps.cpp
+ * Remove certain types of redundant jumps
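+ *
+ * As an illustrative sketch, inside a loop body:
+ *
+ *    if (cond) { a = b; break; } else { a = c; break; }
+ *
+ * becomes
+ *
+ *    if (cond) { a = b; } else { a = c; }
+ *    break;
+ *
+ * and a 'continue' that is already the last instruction of a loop body
+ * is simply dropped.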
+ */
+
+#include "ir.h"
+
+namespace {
+
+class redundant_jumps_visitor : public ir_hierarchical_visitor {
+public:
+   redundant_jumps_visitor()
+   {
+      this->progress = false;
+   }
+
+   virtual ir_visitor_status visit_leave(ir_if *);
+   virtual ir_visitor_status visit_leave(ir_loop *);
+   virtual ir_visitor_status visit_enter(ir_assignment *);
+
+   bool progress;
+};
+
+} /* unnamed namespace */
+
+/* We only care about the top level instructions, so don't descend
+ * into expressions.
+ */
+ir_visitor_status
+redundant_jumps_visitor::visit_enter(ir_assignment *)
+{
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+redundant_jumps_visitor::visit_leave(ir_if *ir)
+{
+   /* If the last instruction in both branches is a 'break' or a 'continue',
+    * pull it out of the branches and insert it after the if-statement.  Note
+    * that both must be the same type (either 'break' or 'continue').
+    */
+   ir_instruction *const last_then =
+      (ir_instruction *) ir->then_instructions.get_tail();
+   ir_instruction *const last_else =
+      (ir_instruction *) ir->else_instructions.get_tail();
+
+   if ((last_then == NULL) || (last_else == NULL))
+      return visit_continue;
+
+   if ((last_then->ir_type != ir_type_loop_jump)
+       || (last_else->ir_type != ir_type_loop_jump))
+      return visit_continue;
+
+   ir_loop_jump *const then_jump = (ir_loop_jump *) last_then;
+   ir_loop_jump *const else_jump = (ir_loop_jump *) last_else;
+
+   if (then_jump->mode != else_jump->mode)
+      return visit_continue;
+
+   then_jump->remove();
+   else_jump->remove();
+   this->progress = true;
+
+   ir->insert_after(then_jump);
+
+   /* If both branches of the if-statement are now empty, remove the
+    * if-statement.
+    */
+   if (ir->then_instructions.is_empty() && ir->else_instructions.is_empty())
+      ir->remove();
+
+   return visit_continue;
+}
+
+
+ir_visitor_status
+redundant_jumps_visitor::visit_leave(ir_loop *ir)
+{
+   /* If the last instruction of a loop body is a 'continue', remove it.
+    */
+   ir_instruction *const last =
+      (ir_instruction *) ir->body_instructions.get_tail();
+
+   if (last && (last->ir_type == ir_type_loop_jump)
+       && (((ir_loop_jump *) last)->mode == ir_loop_jump::jump_continue)) {
+      last->remove();
+      this->progress = true;
+   }
+
+   return visit_continue;
+}
+
+
+bool
+optimize_redundant_jumps(exec_list *instructions)
+{
+   redundant_jumps_visitor v;
+
+   v.run(instructions);
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/opt_structure_splitting.cpp b/icd/intel/compiler/shader/opt_structure_splitting.cpp
new file mode 100644
index 0000000..1ec537b
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_structure_splitting.cpp
@@ -0,0 +1,371 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_structure_splitting.cpp
+ *
+ * If a structure is only ever referenced by its components, then
+ * split those components out to individual variables so they can be
+ * handled normally by other optimization passes.
+ *
+ * This skips structures like uniforms, which need to be accessible as
+ * structures for their access by the GL.
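+ *
+ * As an illustrative sketch (the split names follow the "%s_%s" pattern
+ * used below):
+ *
+ *    struct { float a; vec3 b; } s;
+ *    s.a = 1.0;
+ *
+ * becomes
+ *
+ *    float s_a;
+ *    vec3  s_b;
+ *    s_a = 1.0;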
+ */
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_rvalue_visitor.h"
+#include "glsl_types.h"
+
+namespace {
+
+static bool debug = false;
+
+class variable_entry : public exec_node
+{
+public:
+   variable_entry(ir_variable *var)
+   {
+      this->var = var;
+      this->whole_structure_access = 0;
+      this->declaration = false;
+      this->components = NULL;
+      this->mem_ctx = NULL;
+   }
+
+   ir_variable *var; /* The key: the variable's pointer. */
+
+   /** Number of times the variable is dereferenced as a whole structure
+    * (rather than through one of its fields).
+    */
+   unsigned whole_structure_access;
+
+   /* Whether the variable's declaration appears in the instruction stream.
+    * We can't split function arguments, which never get this flag set.
+    */
+   bool declaration;
+
+   ir_variable **components;
+
+   /** ralloc_parent(this->var) -- the shader's ralloc context. */
+   void *mem_ctx;
+};
+
+
+class ir_structure_reference_visitor : public ir_hierarchical_visitor {
+public:
+   ir_structure_reference_visitor(void)
+   {
+      this->mem_ctx = ralloc_context(NULL);
+      this->variable_list.make_empty();
+   }
+
+   ~ir_structure_reference_visitor(void)
+   {
+      ralloc_free(mem_ctx);
+   }
+
+   virtual ir_visitor_status visit(ir_variable *);
+   virtual ir_visitor_status visit(ir_dereference_variable *);
+   virtual ir_visitor_status visit_enter(ir_dereference_record *);
+   virtual ir_visitor_status visit_enter(ir_assignment *);
+   virtual ir_visitor_status visit_enter(ir_function_signature *);
+
+   variable_entry *get_variable_entry(ir_variable *var);
+
+   /* List of variable_entry */
+   exec_list variable_list;
+
+   void *mem_ctx;
+};
+
+variable_entry *
+ir_structure_reference_visitor::get_variable_entry(ir_variable *var)
+{
+   assert(var);
+
+   if (!var->type->is_record() || var->data.mode == ir_var_uniform
+       || var->data.mode == ir_var_shader_in || var->data.mode == ir_var_shader_out)
+      return NULL;
+
+   foreach_list(n, &this->variable_list) {
+      variable_entry *entry = (variable_entry *) n;
+      if (entry->var == var)
+	 return entry;
+   }
+
+   variable_entry *entry = new(mem_ctx) variable_entry(var);
+   this->variable_list.push_tail(entry);
+   return entry;
+}
+
+
+ir_visitor_status
+ir_structure_reference_visitor::visit(ir_variable *ir)
+{
+   variable_entry *entry = this->get_variable_entry(ir);
+
+   if (entry)
+      entry->declaration = true;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_structure_reference_visitor::visit(ir_dereference_variable *ir)
+{
+   ir_variable *const var = ir->variable_referenced();
+   variable_entry *entry = this->get_variable_entry(var);
+
+   if (entry)
+      entry->whole_structure_access++;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_structure_reference_visitor::visit_enter(ir_dereference_record *ir)
+{
+   (void) ir;
+   /* Don't descend into the ir_dereference_variable below. */
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_structure_reference_visitor::visit_enter(ir_assignment *ir)
+{
+   /* If there are no structure references yet, no need to bother with
+    * processing the expression tree.
+    */
+   if (this->variable_list.is_empty())
+      return visit_continue_with_parent;
+
+   if (ir->lhs->as_dereference_variable() &&
+       ir->rhs->as_dereference_variable() &&
+       !ir->condition) {
+      /* We'll split copies of a structure to copies of components, so don't
+       * descend to the ir_dereference_variables.
+       */
+      return visit_continue_with_parent;
+   }
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_structure_reference_visitor::visit_enter(ir_function_signature *ir)
+{
+   /* We don't have logic for structure-splitting function arguments,
+    * so just look at the body instructions and not the parameter
+    * declarations.
+    */
+   visit_list_elements(this, &ir->body);
+   return visit_continue_with_parent;
+}
+
+class ir_structure_splitting_visitor : public ir_rvalue_visitor {
+public:
+   ir_structure_splitting_visitor(exec_list *vars)
+   {
+      this->variable_list = vars;
+   }
+
+   virtual ~ir_structure_splitting_visitor()
+   {
+   }
+
+   virtual ir_visitor_status visit_leave(ir_assignment *);
+
+   void split_deref(ir_dereference **deref);
+   void handle_rvalue(ir_rvalue **rvalue);
+   variable_entry *get_splitting_entry(ir_variable *var);
+
+   exec_list *variable_list;
+};
+
+variable_entry *
+ir_structure_splitting_visitor::get_splitting_entry(ir_variable *var)
+{
+   assert(var);
+
+   if (!var->type->is_record())
+      return NULL;
+
+   foreach_list(n, this->variable_list) {
+      variable_entry *entry = (variable_entry *) n;
+      if (entry->var == var) {
+	 return entry;
+      }
+   }
+
+   return NULL;
+}
+
+void
+ir_structure_splitting_visitor::split_deref(ir_dereference **deref)
+{
+   if ((*deref)->ir_type != ir_type_dereference_record)
+      return;
+
+   ir_dereference_record *deref_record = (ir_dereference_record *)*deref;
+   ir_dereference_variable *deref_var = deref_record->record->as_dereference_variable();
+   if (!deref_var)
+      return;
+
+   variable_entry *entry = get_splitting_entry(deref_var->var);
+   if (!entry)
+      return;
+
+   unsigned int i;
+   for (i = 0; i < entry->var->type->length; i++) {
+      if (strcmp(deref_record->field,
+		 entry->var->type->fields.structure[i].name) == 0)
+	 break;
+   }
+   assert(i != entry->var->type->length);
+
+   *deref = new(entry->mem_ctx) ir_dereference_variable(entry->components[i]);
+}
+
+void
+ir_structure_splitting_visitor::handle_rvalue(ir_rvalue **rvalue)
+{
+   if (!*rvalue)
+      return;
+
+   ir_dereference *deref = (*rvalue)->as_dereference();
+
+   if (!deref)
+      return;
+
+   split_deref(&deref);
+   *rvalue = deref;
+}
+
+ir_visitor_status
+ir_structure_splitting_visitor::visit_leave(ir_assignment *ir)
+{
+   ir_dereference_variable *lhs_deref = ir->lhs->as_dereference_variable();
+   ir_dereference_variable *rhs_deref = ir->rhs->as_dereference_variable();
+   variable_entry *lhs_entry = lhs_deref ? get_splitting_entry(lhs_deref->var) : NULL;
+   variable_entry *rhs_entry = rhs_deref ? get_splitting_entry(rhs_deref->var) : NULL;
+   const glsl_type *type = ir->rhs->type;
+
+   if ((lhs_entry || rhs_entry) && !ir->condition) {
+      for (unsigned int i = 0; i < type->length; i++) {
+	 ir_dereference *new_lhs, *new_rhs;
+	 void *mem_ctx = lhs_entry ? lhs_entry->mem_ctx : rhs_entry->mem_ctx;
+
+	 if (lhs_entry) {
+	    new_lhs = new(mem_ctx) ir_dereference_variable(lhs_entry->components[i]);
+	 } else {
+	    new_lhs = new(mem_ctx)
+	       ir_dereference_record(ir->lhs->clone(mem_ctx, NULL),
+				     type->fields.structure[i].name);
+	 }
+
+	 if (rhs_entry) {
+	    new_rhs = new(mem_ctx) ir_dereference_variable(rhs_entry->components[i]);
+	 } else {
+	    new_rhs = new(mem_ctx)
+	       ir_dereference_record(ir->rhs->clone(mem_ctx, NULL),
+				     type->fields.structure[i].name);
+	 }
+
+	 ir->insert_before(new(mem_ctx) ir_assignment(new_lhs,
+						      new_rhs,
+						      NULL));
+      }
+      ir->remove();
+   } else {
+      handle_rvalue(&ir->rhs);
+      split_deref(&ir->lhs);
+   }
+
+   handle_rvalue(&ir->condition);
+
+   return visit_continue;
+}
+
+} /* unnamed namespace */
+
+bool
+do_structure_splitting(exec_list *instructions)
+{
+   ir_structure_reference_visitor refs;
+
+   visit_list_elements(&refs, instructions);
+
+   /* Trim out variables we can't split. */
+   foreach_list_safe(n, &refs.variable_list) {
+      variable_entry *entry = (variable_entry *) n;
+
+      if (debug) {
+	 printf("structure %s@%p: decl %d, whole_access %d\n",
+		entry->var->name, (void *) entry->var, entry->declaration,
+		entry->whole_structure_access);
+      }
+
+      if (!entry->declaration || entry->whole_structure_access) {
+	 entry->remove();
+      }
+   }
+
+   if (refs.variable_list.is_empty())
+      return false;
+
+   void *mem_ctx = ralloc_context(NULL);
+
+   /* Replace the decls of the structures to be split with their split
+    * components.
+    */
+   foreach_list_safe(n, &refs.variable_list) {
+      variable_entry *entry = (variable_entry *) n;
+      const struct glsl_type *type = entry->var->type;
+
+      entry->mem_ctx = ralloc_parent(entry->var);
+
+      entry->components = ralloc_array(mem_ctx,
+				       ir_variable *,
+				       type->length);
+
+      for (unsigned int i = 0; i < entry->var->type->length; i++) {
+	 const char *name = ralloc_asprintf(mem_ctx, "%s_%s",
+					    entry->var->name,
+					    type->fields.structure[i].name);
+
+	 entry->components[i] =
+	    new(entry->mem_ctx) ir_variable(type->fields.structure[i].type,
+					    name,
+					    ir_var_temporary);
+	 entry->var->insert_before(entry->components[i]);
+      }
+
+      entry->var->remove();
+   }
+
+   ir_structure_splitting_visitor split(&refs.variable_list);
+   visit_list_elements(&split, instructions);
+
+   ralloc_free(mem_ctx);
+
+   return true;
+}
diff --git a/icd/intel/compiler/shader/opt_swizzle_swizzle.cpp b/icd/intel/compiler/shader/opt_swizzle_swizzle.cpp
new file mode 100644
index 0000000..7564c6b
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_swizzle_swizzle.cpp
@@ -0,0 +1,97 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_swizzle_swizzle.cpp
+ *
+ * Eliminates the second swizzle in a swizzle chain.
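+ *
+ * For example (illustrative): in (v.wzyx).xy the outer swizzle selects
+ * components 0 and 1 of the inner result, so the chain folds to v.wz.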
+ */
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_optimization.h"
+#include "glsl_types.h"
+
+namespace {
+
+class ir_swizzle_swizzle_visitor : public ir_hierarchical_visitor {
+public:
+   ir_swizzle_swizzle_visitor()
+   {
+      progress = false;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_swizzle *);
+
+   bool progress;
+};
+
+} /* unnamed namespace */
+
+ir_visitor_status
+ir_swizzle_swizzle_visitor::visit_enter(ir_swizzle *ir)
+{
+   int mask2[4];
+
+   ir_swizzle *swiz2 = ir->val->as_swizzle();
+   if (!swiz2)
+      return visit_continue;
+
+   memset(&mask2, 0, sizeof(mask2));
+   if (swiz2->mask.num_components >= 1)
+      mask2[0] = swiz2->mask.x;
+   if (swiz2->mask.num_components >= 2)
+      mask2[1] = swiz2->mask.y;
+   if (swiz2->mask.num_components >= 3)
+      mask2[2] = swiz2->mask.z;
+   if (swiz2->mask.num_components >= 4)
+      mask2[3] = swiz2->mask.w;
+
+   if (ir->mask.num_components >= 1)
+      ir->mask.x = mask2[ir->mask.x];
+   if (ir->mask.num_components >= 2)
+      ir->mask.y = mask2[ir->mask.y];
+   if (ir->mask.num_components >= 3)
+      ir->mask.z = mask2[ir->mask.z];
+   if (ir->mask.num_components >= 4)
+      ir->mask.w = mask2[ir->mask.w];
+
+   ir->val = swiz2->val;
+
+   this->progress = true;
+
+   return visit_continue;
+}
+
+/**
+ * Collapses chained swizzles in the code present in the instruction stream.
+ */
+bool
+do_swizzle_swizzle(exec_list *instructions)
+{
+   ir_swizzle_swizzle_visitor v;
+
+   v.run(instructions);
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/opt_tree_grafting.cpp b/icd/intel/compiler/shader/opt_tree_grafting.cpp
new file mode 100644
index 0000000..d47613c
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_tree_grafting.cpp
@@ -0,0 +1,403 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_tree_grafting.cpp
+ *
+ * Takes assignments to variables that are dereferenced only once and
+ * pastes the RHS expression into where the variable is dereferenced.
+ *
+ * In the process of various operations like function inlining and
+ * tertiary op handling, we'll end up with our expression trees having
+ * been chopped up into a series of assignments of short expressions
+ * to temps.  Other passes like ir_algebraic.cpp would prefer to see
+ * the deepest expression trees they can to try to optimize them.
+ *
+ * This is a lot like copy propagation.  In comparison, copy
+ * propagation only acts on plain copies, not arbitrary expressions on
+ * the RHS.  Generally, we wouldn't want to go pasting some
+ * complicated expression everywhere it got used, though, so we don't
+ * handle expressions in that pass.
+ *
+ * The hard part is making sure we don't move an expression across
+ * some other assignments that would change the value of the
+ * expression.  So we split this into two passes: First, find the
+ * variables in our scope which are written to once and read once, and
+ * then go through basic blocks seeing if we find an opportunity to
+ * move those expressions safely.
+ */
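+
+/* A minimal before/after sketch (editor's illustration).  Given a temporary
+ * t written once and read once:
+ *
+ *    (assign (x) (var_ref t) (expression float log2 (swiz x (var_ref v))))
+ *    (assign (x) (var_ref r) (expression float neg (var_ref t)))
+ *
+ * grafting produces:
+ *
+ *    (assign (x) (var_ref r)
+ *            (expression float neg
+ *               (expression float log2 (swiz x (var_ref v)))))
+ */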
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_variable_refcount.h"
+#include "ir_basic_block.h"
+#include "ir_optimization.h"
+#include "glsl_types.h"
+
+namespace {
+
+static bool debug = false;
+
+class ir_tree_grafting_visitor : public ir_hierarchical_visitor {
+public:
+   ir_tree_grafting_visitor(ir_assignment *graft_assign,
+			    ir_variable *graft_var)
+   {
+      this->progress = false;
+      this->graft_assign = graft_assign;
+      this->graft_var = graft_var;
+   }
+
+   virtual ir_visitor_status visit_leave(class ir_assignment *);
+   virtual ir_visitor_status visit_enter(class ir_call *);
+   virtual ir_visitor_status visit_enter(class ir_expression *);
+   virtual ir_visitor_status visit_enter(class ir_function *);
+   virtual ir_visitor_status visit_enter(class ir_function_signature *);
+   virtual ir_visitor_status visit_enter(class ir_if *);
+   virtual ir_visitor_status visit_enter(class ir_loop *);
+   virtual ir_visitor_status visit_enter(class ir_swizzle *);
+   virtual ir_visitor_status visit_enter(class ir_texture *);
+
+   ir_visitor_status check_graft(ir_instruction *ir, ir_variable *var);
+
+   bool do_graft(ir_rvalue **rvalue);
+
+   bool progress;
+   ir_variable *graft_var;
+   ir_assignment *graft_assign;
+};
+
+struct find_deref_info {
+   ir_variable *var;
+   bool found;
+};
+
+void
+dereferences_variable_callback(ir_instruction *ir, void *data)
+{
+   struct find_deref_info *info = (struct find_deref_info *)data;
+   ir_dereference_variable *deref = ir->as_dereference_variable();
+
+   if (deref && deref->var == info->var)
+      info->found = true;
+}
+
+static bool
+dereferences_variable(ir_instruction *ir, ir_variable *var)
+{
+   struct find_deref_info info;
+
+   info.var = var;
+   info.found = false;
+
+   visit_tree(ir, dereferences_variable_callback, &info);
+
+   return info.found;
+}
+
+bool
+ir_tree_grafting_visitor::do_graft(ir_rvalue **rvalue)
+{
+   if (!*rvalue)
+      return false;
+
+   ir_dereference_variable *deref = (*rvalue)->as_dereference_variable();
+
+   if (!deref || deref->var != this->graft_var)
+      return false;
+
+   if (debug) {
+      fprintf(stderr, "GRAFTING:\n");
+      this->graft_assign->fprint(stderr);
+      fprintf(stderr, "\n");
+      fprintf(stderr, "TO:\n");
+      (*rvalue)->fprint(stderr);
+      fprintf(stderr, "\n");
+   }
+
+   this->graft_assign->remove();
+   *rvalue = this->graft_assign->rhs;
+
+   this->progress = true;
+   return true;
+}
+
+ir_visitor_status
+ir_tree_grafting_visitor::visit_enter(ir_loop *ir)
+{
+   (void)ir;
+   /* Do not traverse into the body of the loop since that is a
+    * different basic block.
+    */
+   return visit_stop;
+}
+
+/**
+ * Check if we can continue grafting after writing to a variable.  If the
+ * expression we're trying to graft references the variable, we must stop.
+ *
+ * \param ir   An instruction that writes to a variable.
+ * \param var  The variable being updated.
+ */
+ir_visitor_status
+ir_tree_grafting_visitor::check_graft(ir_instruction *ir, ir_variable *var)
+{
+   if (dereferences_variable(this->graft_assign->rhs, var)) {
+      if (debug) {
+	 fprintf(stderr, "graft killed by: ");
+	 ir->fprint(stderr);
+	 fprintf(stderr, "\n");
+      }
+      return visit_stop;
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_tree_grafting_visitor::visit_leave(ir_assignment *ir)
+{
+   if (do_graft(&ir->rhs) ||
+       do_graft(&ir->condition))
+      return visit_stop;
+
+   /* If this assignment updates a variable used in the assignment
+    * we're trying to graft, then we're done.
+    */
+   return check_graft(ir, ir->lhs->variable_referenced());
+}
+
+ir_visitor_status
+ir_tree_grafting_visitor::visit_enter(ir_function *ir)
+{
+   (void) ir;
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_tree_grafting_visitor::visit_enter(ir_function_signature *ir)
+{
+   (void)ir;
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_tree_grafting_visitor::visit_enter(ir_call *ir)
+{
+   foreach_two_lists(formal_node, &ir->callee->parameters,
+                     actual_node, &ir->actual_parameters) {
+      ir_variable *sig_param = (ir_variable *) formal_node;
+      ir_rvalue *ir = (ir_rvalue *) actual_node;
+      ir_rvalue *new_ir = ir;
+
+      if (sig_param->data.mode != ir_var_function_in
+          && sig_param->data.mode != ir_var_const_in) {
+	 if (check_graft(ir, sig_param) == visit_stop)
+	    return visit_stop;
+	 continue;
+      }
+
+      if (do_graft(&new_ir)) {
+	 ir->replace_with(new_ir);
+	 return visit_stop;
+      }
+   }
+
+   if (ir->return_deref && check_graft(ir, ir->return_deref->var) == visit_stop)
+      return visit_stop;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_tree_grafting_visitor::visit_enter(ir_expression *ir)
+{
+   for (unsigned int i = 0; i < ir->get_num_operands(); i++) {
+      if (do_graft(&ir->operands[i]))
+	 return visit_stop;
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_tree_grafting_visitor::visit_enter(ir_if *ir)
+{
+   if (do_graft(&ir->condition))
+      return visit_stop;
+
+   /* Do not traverse into the body of the if-statement since that is a
+    * different basic block.
+    */
+   return visit_continue_with_parent;
+}
+
+ir_visitor_status
+ir_tree_grafting_visitor::visit_enter(ir_swizzle *ir)
+{
+   if (do_graft(&ir->val))
+      return visit_stop;
+
+   return visit_continue;
+}
+
+ir_visitor_status
+ir_tree_grafting_visitor::visit_enter(ir_texture *ir)
+{
+   if (do_graft(&ir->coordinate) ||
+       do_graft(&ir->projector) ||
+       do_graft(&ir->offset) ||
+       do_graft(&ir->shadow_comparitor))
+	 return visit_stop;
+
+   switch (ir->op) {
+   case ir_tex:
+   case ir_lod:
+   case ir_query_levels:
+      break;
+   case ir_txb:
+      if (do_graft(&ir->lod_info.bias))
+	 return visit_stop;
+      break;
+   case ir_txf:
+   case ir_txl:
+   case ir_txs:
+      if (do_graft(&ir->lod_info.lod))
+	 return visit_stop;
+      break;
+   case ir_txf_ms:
+      if (do_graft(&ir->lod_info.sample_index))
+         return visit_stop;
+      break;
+   case ir_txd:
+      if (do_graft(&ir->lod_info.grad.dPdx) ||
+	  do_graft(&ir->lod_info.grad.dPdy))
+	 return visit_stop;
+      break;
+   case ir_tg4:
+      if (do_graft(&ir->lod_info.component))
+         return visit_stop;
+      break;
+   }
+
+   return visit_continue;
+}
+
+struct tree_grafting_info {
+   ir_variable_refcount_visitor *refs;
+   bool progress;
+};
+
+static bool
+try_tree_grafting(ir_assignment *start,
+		  ir_variable *lhs_var,
+		  ir_instruction *bb_last)
+{
+   ir_tree_grafting_visitor v(start, lhs_var);
+
+   if (debug) {
+      fprintf(stderr, "trying to graft: ");
+      lhs_var->fprint(stderr);
+      fprintf(stderr, "\n");
+   }
+
+   for (ir_instruction *ir = (ir_instruction *)start->next;
+	ir != bb_last->next;
+	ir = (ir_instruction *)ir->next) {
+
+      if (debug) {
+	 fprintf(stderr, "- ");
+	 ir->fprint(stderr);
+	 fprintf(stderr, "\n");
+      }
+
+      ir_visitor_status s = ir->accept(&v);
+      if (s == visit_stop)
+	 return v.progress;
+   }
+
+   return false;
+}
+
+static void
+tree_grafting_basic_block(ir_instruction *bb_first,
+			  ir_instruction *bb_last,
+			  void *data)
+{
+   struct tree_grafting_info *info = (struct tree_grafting_info *)data;
+   ir_instruction *ir, *next;
+
+   for (ir = bb_first, next = (ir_instruction *)ir->next;
+	ir != bb_last->next;
+	ir = next, next = (ir_instruction *)ir->next) {
+      ir_assignment *assign = ir->as_assignment();
+
+      if (!assign)
+	 continue;
+
+      ir_variable *lhs_var = assign->whole_variable_written();
+      if (!lhs_var)
+	 continue;
+
+      if (lhs_var->data.mode == ir_var_function_out ||
+	  lhs_var->data.mode == ir_var_function_inout ||
+          lhs_var->data.mode == ir_var_shader_out)
+	 continue;
+
+      ir_variable_refcount_entry *entry = info->refs->get_variable_entry(lhs_var);
+
+      if (!entry->declaration ||
+	  entry->assigned_count != 1 ||
+	  entry->referenced_count != 2)
+	 continue;
+
+      assert(assign == entry->assign);
+
+      /* Found a possibly graftable assignment.  Now walk through the rest
+       * of the basic block to see whether the dereference occurs there and
+       * whether anything in between interferes with grafting the expression.
+       */
+      info->progress |= try_tree_grafting(assign, lhs_var, bb_last);
+   }
+}
+
+} /* unnamed namespace */
+
+/**
+ * Does a tree grafting pass on the code present in the instruction stream.
+ */
+bool
+do_tree_grafting(exec_list *instructions)
+{
+   ir_variable_refcount_visitor refs;
+   struct tree_grafting_info info;
+
+   info.progress = false;
+   info.refs = &refs;
+
+   visit_list_elements(info.refs, instructions);
+
+   call_for_basic_blocks(instructions, tree_grafting_basic_block, &info);
+
+   return info.progress;
+}
diff --git a/icd/intel/compiler/shader/opt_vectorize.cpp b/icd/intel/compiler/shader/opt_vectorize.cpp
new file mode 100644
index 0000000..f9a3b61
--- /dev/null
+++ b/icd/intel/compiler/shader/opt_vectorize.cpp
@@ -0,0 +1,395 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file opt_vectorize.cpp
+ *
+ * Combines scalar assignments of the same expression (modulo swizzle) to
+ * multiple channels of the same variable into a single vectorized expression
+ * and assignment.
+ *
+ * Many generated shaders contain scalarized code. That is, they contain
+ *
+ * r1.x = log2(v0.x);
+ * r1.y = log2(v0.y);
+ * r1.z = log2(v0.z);
+ *
+ * rather than
+ *
+ * r1.xyz = log2(v0.xyz);
+ *
+ * We look for consecutive assignments of the same expression (modulo swizzle)
+ * to each channel of the same variable.
+ *
+ * For instance, we want to convert these three scalar operations
+ *
+ * (assign (x) (var_ref r1) (expression float log2 (swiz x (var_ref v0))))
+ * (assign (y) (var_ref r1) (expression float log2 (swiz y (var_ref v0))))
+ * (assign (z) (var_ref r1) (expression float log2 (swiz z (var_ref v0))))
+ *
+ * into a single vector operation
+ *
+ * (assign (xyz) (var_ref r1) (expression vec3 log2 (swiz xyz (var_ref v0))))
+ */
+
+#include "ir.h"
+#include "ir_visitor.h"
+#include "ir_optimization.h"
+#include "glsl_types.h"
+#include "program/prog_instruction.h"
+
+namespace {
+
+class ir_vectorize_visitor : public ir_hierarchical_visitor {
+public:
+   void clear()
+   {
+      assignment[0] = NULL;
+      assignment[1] = NULL;
+      assignment[2] = NULL;
+      assignment[3] = NULL;
+      current_assignment = NULL;
+      last_assignment = NULL;
+      channels = 0;
+      has_swizzle = false;
+   }
+
+   ir_vectorize_visitor()
+   {
+      clear();
+      progress = false;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_assignment *);
+   virtual ir_visitor_status visit_enter(ir_swizzle *);
+   virtual ir_visitor_status visit_enter(ir_dereference_array *);
+   virtual ir_visitor_status visit_enter(ir_expression *);
+   virtual ir_visitor_status visit_enter(ir_if *);
+   virtual ir_visitor_status visit_enter(ir_loop *);
+
+   virtual ir_visitor_status visit_leave(ir_assignment *);
+
+   void try_vectorize();
+
+   ir_assignment *assignment[4];
+   ir_assignment *current_assignment, *last_assignment;
+   unsigned channels;
+   bool has_swizzle;
+
+   bool progress;
+};
+
+} /* unnamed namespace */
+
+/**
+ * Rewrites the swizzles and types of a right-hand side of an assignment.
+ *
+ * From the example above, this function would be called (by visit_tree()) on
+ * the nodes of the tree (expression float log2 (swiz z   (var_ref v0))),
+ * rewriting it into     (expression vec3  log2 (swiz xyz (var_ref v0))).
+ *
+ * The function operates on ir_expressions (and its operands) and ir_swizzles.
+ * For expressions it sets a new type and swizzles any non-expression and non-
+ * swizzle scalar operands into appropriately sized vector arguments. For
+ * example, if combining
+ *
+ * (assign (x) (var_ref r1) (expression float + (swiz x (var_ref v0) (var_ref v1))))
+ * (assign (y) (var_ref r1) (expression float + (swiz y (var_ref v0) (var_ref v1))))
+ *
+ * where v1 is a scalar, rewrite_swizzle() would insert a swizzle on
+ * (var_ref v1) such that the final result was
+ *
+ * (assign (xy) (var_ref r1) (expression vec2 + (swiz xy (var_ref v0))
+ *                                              (swiz xx (var_ref v1))))
+ *
+ * For swizzles, it sets a new type, and if the variable being swizzled is a
+ * vector it overwrites the swizzle mask with the ir_swizzle_mask passed as the
+ * data parameter. If the swizzled variable is scalar, then the swizzle was
+ * added by an earlier call to rewrite_swizzle() on an expression, so the
+ * mask should not be modified.
+ */
+static void
+rewrite_swizzle(ir_instruction *ir, void *data)
+{
+   ir_swizzle_mask *mask = (ir_swizzle_mask *)data;
+
+   switch (ir->ir_type) {
+   case ir_type_swizzle: {
+      ir_swizzle *swz = (ir_swizzle *)ir;
+      if (swz->val->type->is_vector()) {
+         swz->mask = *mask;
+      }
+      swz->type = glsl_type::get_instance(swz->type->base_type,
+                                          mask->num_components, 1);
+      break;
+   }
+   case ir_type_expression: {
+      ir_expression *expr = (ir_expression *)ir;
+      expr->type = glsl_type::get_instance(expr->type->base_type,
+                                           mask->num_components, 1);
+      for (unsigned i = 0; i < 4; i++) {
+         if (expr->operands[i]) {
+            ir_rvalue *rval = expr->operands[i]->as_rvalue();
+            if (rval && rval->type->is_scalar() &&
+                !rval->as_expression() && !rval->as_swizzle()) {
+               expr->operands[i] = new(ir) ir_swizzle(rval, 0, 0, 0, 0,
+                                                      mask->num_components);
+            }
+         }
+      }
+      break;
+   }
+   default:
+      break;
+   }
+}
+
+/**
+ * Attempt to vectorize the previously saved assignments, and clear them from
+ * consideration.
+ *
+ * If the assignments are able to be combined, it modifies in-place the last
+ * assignment seen to be an equivalent vector form of the scalar assignments.
+ * It then removes the other now obsolete scalar assignments.
+ */
+void
+ir_vectorize_visitor::try_vectorize()
+{
+   if (this->last_assignment && this->channels > 1) {
+      ir_swizzle_mask mask = {0, 0, 0, 0, channels, 0};
+
+      this->last_assignment->write_mask = 0;
+
+      for (unsigned i = 0, j = 0; i < 4; i++) {
+         if (this->assignment[i]) {
+            this->last_assignment->write_mask |= 1 << i;
+
+            if (this->assignment[i] != this->last_assignment) {
+               this->assignment[i]->remove();
+            }
+
+            switch (j) {
+            case 0: mask.x = i; break;
+            case 1: mask.y = i; break;
+            case 2: mask.z = i; break;
+            case 3: mask.w = i; break;
+            }
+
+            j++;
+         }
+      }
+
+      visit_tree(this->last_assignment->rhs, rewrite_swizzle, &mask);
+
+      this->progress = true;
+   }
+   clear();
+}
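+
+/* Editor's example: if channels x and z were collected, write_mask becomes
+ * 0x5 and the RHS is rewritten with the two-component swizzle xz
+ * (mask.x = 0, mask.y = 2), folding two scalar assignments into one.
+ */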
+
+/**
+ * Returns whether the write mask is a single channel.
+ */
+static bool
+single_channel_write_mask(unsigned write_mask)
+{
+   return write_mask != 0 && (write_mask & (write_mask - 1)) == 0;
+}
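+
+/* Editor's note: write_mask & (write_mask - 1) clears the lowest set bit,
+ * so the result is zero exactly when at most one bit is set.  For example,
+ * WRITEMASK_Y (0x2) passes, while WRITEMASK_X | WRITEMASK_Y (0x3) yields
+ * 0x3 & 0x2 != 0 and fails.
+ */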
+
+/**
+ * Translates single-channeled write mask to single-channeled swizzle.
+ */
+static unsigned
+write_mask_to_swizzle(unsigned write_mask)
+{
+   switch (write_mask) {
+   case WRITEMASK_X: return SWIZZLE_X;
+   case WRITEMASK_Y: return SWIZZLE_Y;
+   case WRITEMASK_Z: return SWIZZLE_Z;
+   case WRITEMASK_W: return SWIZZLE_W;
+   }
+   assert(!"not reached");
+   unreachable();
+}
+
+/**
+ * Returns whether a single-channeled write mask matches a swizzle.
+ */
+static bool
+write_mask_matches_swizzle(unsigned write_mask,
+                           const ir_swizzle *swz)
+{
+   return ((write_mask == WRITEMASK_X && swz->mask.x == SWIZZLE_X) ||
+           (write_mask == WRITEMASK_Y && swz->mask.x == SWIZZLE_Y) ||
+           (write_mask == WRITEMASK_Z && swz->mask.x == SWIZZLE_Z) ||
+           (write_mask == WRITEMASK_W && swz->mask.x == SWIZZLE_W));
+}
+
+/**
+ * Upon entering an ir_assignment, attempt to vectorize the currently tracked
+ * assignments if the current assignment is not suitable. Keep a pointer to
+ * the current assignment.
+ */
+ir_visitor_status
+ir_vectorize_visitor::visit_enter(ir_assignment *ir)
+{
+   ir_dereference *lhs = this->last_assignment != NULL ?
+                         this->last_assignment->lhs : NULL;
+   ir_rvalue *rhs = this->last_assignment != NULL ?
+                    this->last_assignment->rhs : NULL;
+
+   if (ir->condition ||
+       this->channels >= 4 ||
+       !single_channel_write_mask(ir->write_mask) ||
+       this->assignment[write_mask_to_swizzle(ir->write_mask)] != NULL ||
+       (lhs && !ir->lhs->equals(lhs)) ||
+       (rhs && !ir->rhs->equals(rhs, ir_type_swizzle))) {
+      try_vectorize();
+   }
+
+   this->current_assignment = ir;
+
+   return visit_continue;
+}
+
+/**
+ * Upon entering an ir_swizzle, set ::has_swizzle if we're visiting from an
+ * ir_assignment (i.e., that ::current_assignment is set) and the swizzle mask
+ * matches the current assignment's write mask.
+ *
+ * If the write mask doesn't match the swizzle mask, remove the current
+ * assignment from further consideration.
+ */
+ir_visitor_status
+ir_vectorize_visitor::visit_enter(ir_swizzle *ir)
+{
+   if (this->current_assignment) {
+      if (write_mask_matches_swizzle(this->current_assignment->write_mask, ir)) {
+         this->has_swizzle = true;
+      } else {
+         this->current_assignment = NULL;
+      }
+   }
+   return visit_continue;
+}
+
+/* Upon entering an ir_dereference_array, remove the current assignment from
+ * further consideration. Since the index of an array dereference must be
+ * scalar, we are not able to vectorize it.
+ *
+ * FINISHME: If all of the scalar indices are identical we could vectorize.
+ */
+ir_visitor_status
+ir_vectorize_visitor::visit_enter(ir_dereference_array *)
+{
+   this->current_assignment = NULL;
+   return visit_continue_with_parent;
+}
+
+/**
+ * Upon entering an ir_expression, remove the current assignment from further
+ * consideration if the expression operates horizontally on vectors.
+ */
+ir_visitor_status
+ir_vectorize_visitor::visit_enter(ir_expression *ir)
+{
+   if (ir->is_horizontal()) {
+      this->current_assignment = NULL;
+      return visit_continue_with_parent;
+   }
+   return visit_continue;
+}
+
+/* Since there is no statement to visit between the "then" and "else"
+ * instructions, try to vectorize before, in between, and after them to avoid
+ * combining statements from different basic blocks.
+ */
+ir_visitor_status
+ir_vectorize_visitor::visit_enter(ir_if *ir)
+{
+   try_vectorize();
+
+   visit_list_elements(this, &ir->then_instructions);
+   try_vectorize();
+
+   visit_list_elements(this, &ir->else_instructions);
+   try_vectorize();
+
+   return visit_continue_with_parent;
+}
+
+/* Since there is no statement to visit between the instructions in the body of
+ * the loop and the instructions after it, try to vectorize before and after the
+ * body to avoid combining statements from different basic blocks.
+ */
+ir_visitor_status
+ir_vectorize_visitor::visit_enter(ir_loop *ir)
+{
+   try_vectorize();
+
+   visit_list_elements(this, &ir->body_instructions);
+   try_vectorize();
+
+   return visit_continue_with_parent;
+}
+
+/**
+ * Upon leaving an ir_assignment, save a pointer to it in ::assignment[] if
+ * the swizzle mask(s) found were appropriate. Also save a pointer in
+ * ::last_assignment so that we can compare future assignments with it.
+ *
+ * Finally, clear ::current_assignment and ::has_swizzle.
+ */
+ir_visitor_status
+ir_vectorize_visitor::visit_leave(ir_assignment *ir)
+{
+   if (this->has_swizzle && this->current_assignment) {
+      assert(this->current_assignment == ir);
+
+      unsigned channel = write_mask_to_swizzle(this->current_assignment->write_mask);
+      this->assignment[channel] = ir;
+      this->channels++;
+
+      this->last_assignment = this->current_assignment;
+   }
+   this->current_assignment = NULL;
+   this->has_swizzle = false;
+   return visit_continue;
+}
+
+/**
+ * Combines scalar assignments of the same expression (modulo swizzle) to
+ * multiple channels of the same variable into a single vectorized expression
+ * and assignment.
+ */
+bool
+do_vectorize(exec_list *instructions)
+{
+   ir_vectorize_visitor v;
+
+   v.run(instructions);
+
+   /* Try to vectorize the last assignments seen. */
+   v.try_vectorize();
+
+   return v.progress;
+}
diff --git a/icd/intel/compiler/shader/ossource.cpp b/icd/intel/compiler/shader/ossource.cpp
new file mode 100644
index 0000000..4657e26
--- /dev/null
+++ b/icd/intel/compiler/shader/ossource.cpp
@@ -0,0 +1,201 @@
+//
+//Copyright (C) 2002-2005  3Dlabs Inc. Ltd.
+//All rights reserved.
+//
+//Redistribution and use in source and binary forms, with or without
+//modification, are permitted provided that the following conditions
+//are met:
+//
+//    Redistributions of source code must retain the above copyright
+//    notice, this list of conditions and the following disclaimer.
+//
+//    Redistributions in binary form must reproduce the above
+//    copyright notice, this list of conditions and the following
+//    disclaimer in the documentation and/or other materials provided
+//    with the distribution.
+//
+//    Neither the name of 3Dlabs Inc. Ltd. nor the names of its
+//    contributors may be used to endorse or promote products derived
+//    from this software without specific prior written permission.
+//
+//THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+//"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+//LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+//FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+//COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+//INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+//BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+//LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+//CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+//LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+//ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+//POSSIBILITY OF SUCH DAMAGE.
+//
+
+//
+// This file contains the Linux-specific functions
+//
+
+#include "glslang/OSDependent/osinclude.h"
+#include "glsl_parser_extras.h"
+
+#include <pthread.h>
+#include <semaphore.h>
+#include <assert.h>
+#include <errno.h>
+#include <stdint.h>
+
+namespace glslang {
+
+bool InitProcess();
+bool InitThread();
+bool DetachThread();
+bool DetachProcess();
+
+//
+// Thread cleanup
+//
+
+//
+// Wrapper for Linux call to DetachThread.  This is required as pthread_cleanup_push() expects 
+// the cleanup routine to return void.
+// 
+void DetachThreadLinux(void *)
+{
+	DetachThread();
+}
+
+
+//
+// Registers cleanup handler, sets cancel type and state, and executes the thread specific
+// cleanup handler.  This function will be called in the Standalone.cpp for regression 
+// testing.  When OpenGL applications are run with the driver code, Linux OS does the 
+// thread cleanup.
+// 
+void OS_CleanupThreadData(void)
+{
+	int old_cancel_state, old_cancel_type;
+	void *cleanupArg = NULL;
+
+	//
+	// Set thread cancel state and push cleanup handler.
+	//
+	pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old_cancel_state);
+	pthread_cleanup_push(DetachThreadLinux, (void *) cleanupArg);
+
+	//
+	// Put the thread in deferred cancellation mode.
+	//
+	pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_cancel_type);
+
+	//
+	// Pop cleanup handler and execute it prior to unregistering the cleanup handler.
+	//
+	pthread_cleanup_pop(1);
+
+	//
+	// Restore the thread's previous cancellation mode.
+	//
+	pthread_setcanceltype(old_cancel_type, NULL);
+}
+
+
+//
+// Thread Local Storage Operations
+//
+
+inline OS_TLSIndex PthreadKeyToTLSIndex(pthread_key_t key)
+{
+       return (OS_TLSIndex)((uintptr_t)key + 1);
+}
+
+inline pthread_key_t TLSIndexToPthreadKey(OS_TLSIndex nIndex)
+{
+       return (pthread_key_t)((uintptr_t)nIndex - 1);
+}
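+
+// Editor's note: the +1/-1 bias keeps a valid pthread_key_t of 0 from
+// colliding with OS_INVALID_TLS_INDEX (assumed to be 0 in osinclude.h).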
+
+OS_TLSIndex OS_AllocTLSIndex()
+{
+	pthread_key_t pPoolIndex;
+
+	//
+	// Create global pool key.
+	//
+	if ((pthread_key_create(&pPoolIndex, NULL)) != 0) {
+		assert(0 && "OS_AllocTLSIndex(): Unable to allocate Thread Local Storage");
+		return OS_INVALID_TLS_INDEX;
+	}
+	else
+		return PthreadKeyToTLSIndex(pPoolIndex);
+}
+
+
+bool OS_SetTLSValue(OS_TLSIndex nIndex, void *lpvValue)
+{
+	if (nIndex == OS_INVALID_TLS_INDEX) {
+		assert(0 && "OS_SetTLSValue(): Invalid TLS Index");
+		return false;
+	}
+
+	if (pthread_setspecific(TLSIndexToPthreadKey(nIndex), lpvValue) == 0)
+		return true;
+	else
+		return false;
+}
+
+void* OS_GetTLSValue(OS_TLSIndex nIndex)
+{
+       //
+       // This function should return 0 if nIndex is invalid.
+       //
+       assert(nIndex != OS_INVALID_TLS_INDEX);
+       return pthread_getspecific(TLSIndexToPthreadKey(nIndex)); 
+}
+
+bool OS_FreeTLSIndex(OS_TLSIndex nIndex)
+{
+	if (nIndex == OS_INVALID_TLS_INDEX) {
+		assert(0 && "OS_FreeTLSIndex(): Invalid TLS Index");
+		return false;
+	}
+
+	//
+	// Delete the global pool key.
+	//
+	if (pthread_key_delete(TLSIndexToPthreadKey(nIndex)) == 0)
+		return true;
+	else
+		return false;
+}
+
+mtx_t GlslangGlobalLock = _MTX_INITIALIZER_NP;
+
+void InitGlobalLock() {
+}
+
+void GetGlobalLock() {
+   mtx_lock(&GlslangGlobalLock);
+}
+
+void ReleaseGlobalLock() {
+   mtx_unlock(&GlslangGlobalLock);
+}
+
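+//
+// Editor's note: the remaining entry points are no-op stubs; this build does
+// not create worker threads, and they exist only to satisfy the
+// glslang/OSDependent/osinclude.h interface.
+//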
+void* OS_CreateThread(TThreadEntrypoint entry)
+{
+    return 0;
+}
+
+void OS_WaitForAllThreads(void* threads, int numThreads)
+{
+}
+
+void OS_Sleep(int milliseconds)
+{
+}
+
+void OS_DumpMemoryCounters()
+{
+}
+
+} // end namespace glslang
diff --git a/icd/intel/compiler/shader/program.h b/icd/intel/compiler/shader/program.h
new file mode 100644
index 0000000..0dd721e
--- /dev/null
+++ b/icd/intel/compiler/shader/program.h
@@ -0,0 +1,53 @@
+/*
+ * Copyright (C) 1999-2008  Brian Paul   All Rights Reserved.
+ * Copyright (C) 2009  VMware, Inc.  All Rights Reserved.
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "libfns.h"  // LunarG ADD:
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+extern void
+_mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader,
+			  bool dump_ast, bool dump_SPV, bool dump_hir,
+			  bool strip_SPV, bool canonicalize_SPV);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+extern void
+link_shaders(struct gl_context *ctx, struct gl_shader_program *prog);
+
+extern void
+linker_error(struct gl_shader_program *prog, const char *fmt, ...)
+   PRINTFLIKE(2, 3);
+
+extern void
+linker_warning(struct gl_shader_program *prog, const char *fmt, ...)
+   PRINTFLIKE(2, 3);
+
+extern long
+parse_program_resource_name(const GLchar *name,
+                            const GLchar **out_base_name_end);
diff --git a/icd/intel/compiler/shader/s_expression.cpp b/icd/intel/compiler/shader/s_expression.cpp
new file mode 100644
index 0000000..6906ff0
--- /dev/null
+++ b/icd/intel/compiler/shader/s_expression.cpp
@@ -0,0 +1,219 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <assert.h>
+#include <limits>
+#include "s_expression.h"
+
+s_symbol::s_symbol(const char *str, size_t n)
+{
+   /* Assume the given string is already nul-terminated and in memory that
+    * will live as long as this node.
+    */
+   assert(str[n] == '\0');
+   this->str = str;
+}
+
+s_list::s_list()
+{
+}
+
+static void
+skip_whitespace(const char *&src, char *&symbol_buffer)
+{
+   size_t n = strspn(src, " \v\t\r\n");
+   src += n;
+   symbol_buffer += n;
+   /* Also skip Scheme-style comments: semi-colon 'til end of line */
+   if (src[0] == ';') {
+      n = strcspn(src, "\n");
+      src += n;
+      symbol_buffer += n;
+      skip_whitespace(src, symbol_buffer);
+   }
+}
+
+static s_expression *
+read_atom(void *ctx, const char *&src, char *&symbol_buffer)
+{
+   s_expression *expr = NULL;
+
+   skip_whitespace(src, symbol_buffer);
+
+   size_t n = strcspn(src, "( \v\t\r\n);");
+   if (n == 0)
+      return NULL; // no atom
+
+   // Check for the special symbol '+INF', which means +Infinity.  Note: C99
+   // requires strtof to parse '+INF' as +Infinity, but we still support some
+   // non-C99-compliant compilers (e.g. MSVC).
+   if (n == 4 && strncmp(src, "+INF", 4) == 0) {
+      expr = new(ctx) s_float(std::numeric_limits<float>::infinity());
+   } else {
+      // Check if the atom is a number.
+      char *float_end = NULL;
+      float f = glsl_strtof(src, &float_end);
+      if (float_end != src) {
+         char *int_end = NULL;
+         int i = strtol(src, &int_end, 10);
+         // If strtof matched more characters, it must have a decimal part
+         if (float_end > int_end)
+            expr = new(ctx) s_float(f);
+         else
+            expr = new(ctx) s_int(i);
+      } else {
+         // Not a number; return a symbol.
+         symbol_buffer[n] = '\0';
+         expr = new(ctx) s_symbol(symbol_buffer, n);
+      }
+   }
+
+   src += n;
+   symbol_buffer += n;
+
+   return expr;
+}
+
+static s_expression *
+__read_expression(void *ctx, const char *&src, char *&symbol_buffer)
+{
+   s_expression *atom = read_atom(ctx, src, symbol_buffer);
+   if (atom != NULL)
+      return atom;
+
+   skip_whitespace(src, symbol_buffer);
+   if (src[0] == '(') {
+      ++src;
+      ++symbol_buffer;
+
+      s_list *list = new(ctx) s_list;
+      s_expression *expr;
+
+      while ((expr = __read_expression(ctx, src, symbol_buffer)) != NULL) {
+	 list->subexpressions.push_tail(expr);
+      }
+      skip_whitespace(src, symbol_buffer);
+      if (src[0] != ')') {
+	 printf("Unclosed expression (check your parentheses).\n");
+	 return NULL;
+      }
+      ++src;
+      ++symbol_buffer;
+      return list;
+   }
+   return NULL;
+}
+
+s_expression *
+s_expression::read_expression(void *ctx, const char *&src)
+{
+   assert(src != NULL);
+
+   /* When we encounter a Symbol, we need to save a nul-terminated copy of
+    * the string.  However, ralloc_strndup'ing every individual Symbol is
+    * extremely expensive.  We could avoid this by simply overwriting the
+    * next character (guaranteed to be whitespace, parens, or semicolon) with
+    * a nul-byte.  But overwriting non-whitespace would mess up parsing.
+    *
+    * So, just copy the whole buffer ahead of time.  Walk both, leaving the
+    * original source string unmodified, and altering the copy to contain the
+    * necessary nul-bytes whenever we encounter a symbol.
+    */
+   char *symbol_buffer = ralloc_strdup(ctx, src);
+   return __read_expression(ctx, src, symbol_buffer);
+}
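+
+/* Example (editor's sketch): given src = "(add 1 2.0)", read_expression
+ * returns an s_list whose subexpressions are the s_symbol "add", the
+ * s_int 1 and the s_float 2.0.  For "1", strtof and strtol consume the
+ * same characters, so it stays an int; "2.0" parses further as a float.
+ */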
+
+void s_int::print()
+{
+   printf("%d", this->val);
+}
+
+void s_float::print()
+{
+   printf("%f", this->val);
+}
+
+void s_symbol::print()
+{
+   printf("%s", this->str);
+}
+
+void s_list::print()
+{
+   printf("(");
+   foreach_list(n, &this->subexpressions) {
+      s_expression *expr = (s_expression *) n;
+      expr->print();
+      if (!expr->next->is_tail_sentinel())
+	 printf(" ");
+   }
+   printf(")");
+}
+
+// --------------------------------------------------
+
+bool
+s_pattern::match(s_expression *expr)
+{
+   switch (type)
+   {
+   case EXPR:   *p_expr = expr; break;
+   case LIST:   if (expr->is_list())   *p_list   = (s_list *)   expr; break;
+   case SYMBOL: if (expr->is_symbol()) *p_symbol = (s_symbol *) expr; break;
+   case NUMBER: if (expr->is_number()) *p_number = (s_number *) expr; break;
+   case INT:    if (expr->is_int())    *p_int    = (s_int *)    expr; break;
+   case STRING:
+      s_symbol *sym = SX_AS_SYMBOL(expr);
+      if (sym != NULL && strcmp(sym->value(), literal) == 0)
+	 return true;
+      return false;
+   };
+
+   return *p_expr == expr;
+}
+
+bool
+s_match(s_expression *top, unsigned n, s_pattern *pattern, bool partial)
+{
+   s_list *list = SX_AS_LIST(top);
+   if (list == NULL)
+      return false;
+
+   unsigned i = 0;
+   foreach_list(node, &list->subexpressions) {
+      if (i >= n)
+	 return partial; /* More actual items than the pattern expected */
+
+      s_expression *expr = (s_expression *) node;
+      if (expr == NULL || !pattern[i].match(expr))
+	 return false;
+
+      i++;
+   }
+
+   if (i < n)
+      return false; /* Fewer actual items than the pattern expected */
+
+   return true;
+}
diff --git a/icd/intel/compiler/shader/s_expression.h b/icd/intel/compiler/shader/s_expression.h
new file mode 100644
index 0000000..b3751e4
--- /dev/null
+++ b/icd/intel/compiler/shader/s_expression.h
@@ -0,0 +1,180 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef S_EXPRESSION_H
+#define S_EXPRESSION_H
+
+#include "libfns.h" // LunarG ADD:
+#include "strtod.h"
+#include "list.h"
+
+/* Type-safe downcasting macros (also safe to pass NULL) */
+#define SX_AS_(t,x) ((x) && ((s_expression*) x)->is_##t()) ? ((s_##t*) (x)) \
+                                                           : NULL
+#define SX_AS_LIST(x)   SX_AS_(list, x)
+#define SX_AS_SYMBOL(x) SX_AS_(symbol, x)
+#define SX_AS_NUMBER(x) SX_AS_(number, x)
+#define SX_AS_INT(x)    SX_AS_(int, x)
+
+/* Pattern matching macros */
+#define MATCH(list, pat) s_match(list, Elements(pat), pat, false)
+#define PARTIAL_MATCH(list, pat) s_match(list, Elements(pat), pat, true)
+
+/* For our purposes, S-Expressions are:
+ * - <int>
+ * - <float>
+ * - symbol
+ * - (expr1 expr2 ... exprN)     where exprN is an S-Expression
+ *
+ * Unlike LISP/Scheme, we do not support (foo . bar) pairs.
+ */
+class s_expression : public exec_node
+{
+public:
+   /**
+    * Read an S-Expression from the given string.
+    * Advances the supplied pointer to just after the expression read.
+    *
+    * Any allocation will be performed with 'ctx' as the ralloc owner.
+    */
+   static s_expression *read_expression(void *ctx, const char *&src);
+
+   /**
+    * Print out an S-Expression.  Useful for debugging.
+    */
+   virtual void print() = 0;
+
+   virtual bool is_list()   const { return false; }
+   virtual bool is_symbol() const { return false; }
+   virtual bool is_number() const { return false; }
+   virtual bool is_int()    const { return false; }
+
+protected:
+   s_expression() { }
+};
+
+/* Atoms */
+
+class s_number : public s_expression
+{
+public:
+   bool is_number() const { return true; }
+
+   virtual float fvalue() = 0;
+
+protected:
+   s_number() { }
+};
+
+class s_int : public s_number
+{
+public:
+   s_int(int x) : val(x) { }
+
+   bool is_int() const { return true; }
+
+   float fvalue() { return float(this->val); }
+   int value() { return this->val; }
+
+   void print();
+
+private:
+   int val;
+};
+
+class s_float : public s_number
+{
+public:
+   s_float(float x) : val(x) { }
+
+   float fvalue() { return this->val; }
+
+   void print();
+
+private:
+   float val;
+};
+
+class s_symbol : public s_expression
+{
+public:
+   s_symbol(const char *, size_t);
+
+   bool is_symbol() const { return true; }
+
+   const char *value() { return this->str; }
+
+   void print();
+
+private:
+   const char *str;
+};
+
+/* Lists of expressions: (expr1 ... exprN) */
+class s_list : public s_expression
+{
+public:
+   s_list();
+
+   virtual bool is_list() const { return true; }
+
+   void print();
+
+   exec_list subexpressions;
+};
+
+// ------------------------------------------------------------
+
+/**
+ * Part of a pattern to match - essentially a record holding a pointer to the
+ * storage for the component to match, along with the appropriate type.
+ */
+class s_pattern {
+public:
+   s_pattern(s_expression *&s) : p_expr(&s),   type(EXPR)   { }
+   s_pattern(s_list       *&s) : p_list(&s),   type(LIST)   { }
+   s_pattern(s_symbol     *&s) : p_symbol(&s), type(SYMBOL) { }
+   s_pattern(s_number     *&s) : p_number(&s), type(NUMBER) { }
+   s_pattern(s_int        *&s) : p_int(&s),    type(INT)    { }
+   s_pattern(const char *str)  : literal(str), type(STRING) { }
+
+   bool match(s_expression *expr);
+
+private:
+   union {
+      s_expression **p_expr;
+      s_list       **p_list;
+      s_symbol     **p_symbol;
+      s_number     **p_number;
+      s_int        **p_int;
+      const char *literal;
+   };
+   enum { EXPR, LIST, SYMBOL, NUMBER, INT, STRING } type;
+};
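+
+/* Usage sketch (editor's illustration, hypothetical names): to destructure
+ * a two-element list (fcn_name arg0), bind the pieces and test in one go:
+ *
+ *    s_symbol *name;
+ *    s_expression *arg0;
+ *    s_pattern pat[] = { name, arg0 };
+ *    if (MATCH(some_list, pat)) {
+ *       // use name->value() and arg0 here
+ *    }
+ */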
+
+bool
+s_match(s_expression *top, unsigned n, s_pattern *pattern, bool partial);
+
+#endif /* S_EXPRESSION_H */
diff --git a/icd/intel/compiler/shader/shader_cache.h b/icd/intel/compiler/shader/shader_cache.h
new file mode 100644
index 0000000..4af26d3
--- /dev/null
+++ b/icd/intel/compiler/shader/shader_cache.h
@@ -0,0 +1,126 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef SHADER_CACHE_H
+#define SHADER_CACHE_H
+
+#include "main/shaderobj.h"
+#include "main/uniforms.h"
+#include "main/macros.h"
+#include "program/hash_table.h"
+#include "ir.h"
+
+#ifdef __cplusplus
+const uint32_t cache_validation_data[] = {
+   ir_type_max,
+   GLSL_TYPE_ERROR,
+   sizeof(long),
+   sizeof(gl_shader),
+   sizeof(gl_program),
+   sizeof(gl_shader_program),
+   sizeof(gl_uniform_storage),
+   sizeof(gl_program_parameter_list),
+   sizeof(gl_program_parameter),
+   sizeof(ir_variable),
+   sizeof(ir_assignment),
+   sizeof(ir_call),
+   sizeof(ir_constant),
+   sizeof(ir_dereference_array),
+   sizeof(ir_dereference_record),
+   sizeof(ir_dereference_variable),
+   sizeof(ir_discard),
+   sizeof(ir_expression),
+   sizeof(ir_function),
+   sizeof(ir_function_signature),
+   sizeof(ir_if),
+   sizeof(ir_loop),
+   sizeof(ir_loop_jump),
+   sizeof(ir_return),
+   sizeof(ir_swizzle),
+   sizeof(ir_texture),
+   sizeof(ir_emit_vertex),
+   sizeof(ir_end_primitive)
+};
+#define num_cache_validation_data_items \
+   (sizeof(cache_validation_data)/sizeof(uint32_t))
+#endif
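+
+/* Editor's note: the array above acts as a layout fingerprint.
+ * validate_binary_cache() (see shader_deserialize.cpp) compares it against
+ * the copy stored in a cache entry and rejects entries produced by a build
+ * where any of these sizes or enum bounds differ.
+ */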
+
+/* C API for the cache */
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum {
+   MESA_SHADER_DESERIALIZE_READ_ERROR = -1,
+   MESA_SHADER_DESERIALIZE_VERSION_ERROR = -2,
+};
+
+const char *
+mesa_get_shader_cache_magic();
+
+char *
+mesa_shader_serialize(struct gl_context *ctx, struct gl_shader *shader,
+                      size_t *size, bool shader_only);
+
+char *
+mesa_program_serialize(struct gl_context *ctx, struct gl_shader_program *prog,
+                       size_t *size);
+
+int
+mesa_program_deserialize(struct gl_context *ctx, struct gl_shader_program *prog,
+                         const GLvoid *data, size_t size);
+
+int
+mesa_program_load(struct gl_context *ctx, struct gl_shader_program *prog,
+                  const char *path);
+
+struct gl_shader *
+read_single_shader(struct gl_context *ctx, struct gl_shader *shader,
+                   const char* path);
+
+int
+mesa_shader_load(struct gl_context *ctx, struct gl_shader *shader,
+                 const char *path);
+
+struct gl_shader *
+mesa_shader_deserialize(struct gl_context *ctx, struct gl_shader *shader,
+                        const char* path);
+
+/**
+ * Features not currently supported by the caches.
+ */
+bool
+supported_by_program_cache(struct gl_shader_program *prog, bool is_write);
+
+bool
+supported_by_shader_cache(struct gl_shader *shader, bool is_write);
+
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif /* SHADER_CACHE_H */
diff --git a/icd/intel/compiler/shader/shader_deserialize.cpp b/icd/intel/compiler/shader/shader_deserialize.cpp
new file mode 100644
index 0000000..e6a1feb
--- /dev/null
+++ b/icd/intel/compiler/shader/shader_deserialize.cpp
@@ -0,0 +1,513 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "shader_cache.h"
+#include "ir_deserializer.h"
+#include "main/context.h"
+#include "main/shaderapi.h"
+
+#if 0
+static struct gl_program_parameter_list*
+deserialize_program_parameters(memory_map &map)
+{
+   struct gl_program_parameter_list *list = _mesa_new_parameter_list();
+   uint8_t par_amount = map.read_uint8_t();
+
+   if (par_amount == 0)
+      return list;
+
+   for (unsigned i = 0; i < par_amount; i++) {
+
+      struct gl_program_parameter par;
+      gl_constant_value values[4];
+
+      char *name = map.read_string();
+      map.read(&par, sizeof(struct gl_program_parameter));
+      map.read(values, 4 * sizeof(gl_constant_value));
+
+      _mesa_add_parameter(list, par.Type, name, par.Size, par.DataType,
+                          values, par.StateIndexes);
+   }
+   list->StateFlags = map.read_uint32_t();
+
+   return list;
+}
+
+
+/**
+ * gl_program contains post-link data populated by the driver
+ */
+static bool
+deserialize_gl_program(struct gl_shader *shader, memory_map &map)
+{
+   map.read(shader->Program, sizeof(struct gl_program));
+   char *str = map.read_string();
+   shader->Program->String = (GLubyte*) _mesa_strdup(str);
+
+   shader->Program->Parameters = deserialize_program_parameters(map);
+
+   if (map.errors())
+      return false;
+
+   return true;
+}
+#endif
+
+static bool
+read_hash_table(struct string_to_uint_map *hash, memory_map *map)
+{
+   if (map->errors())
+      return false;
+
+   uint32_t size = map->read_uint32_t();
+
+   for (unsigned i = 0; i < size; i++) {
+
+      char *key = map->read_string();
+      uint32_t value = map->read_uint32_t();
+
+      /* put() adds a +1 bias to the value (see hash_table.h); we compensate
+       * for that here when reading
+       */
+      hash->put(value-1, key);
+
+      /* break out in case of read errors */
+      if (map->errors())
+         return false;
+   }
+   return true;
+}
+
+
+static void
+read_uniform_storage(void *mem_ctx, gl_uniform_storage *uni, memory_map &map)
+{
+   map.read(uni, sizeof(gl_uniform_storage));
+
+   char *name = map.read_string();
+   uni->name = _mesa_strdup(name);
+
+   /* type is resolved later */
+   uni->type = NULL;
+   uni->driver_storage = NULL;
+
+   uint32_t size = map.read_uint32_t();
+
+   uni->storage = rzalloc_array(mem_ctx, union gl_constant_value, size);
+
+   /* initialize to zero for now, initializers will be propagated later */
+   memset(uni->storage, 0, size * sizeof(union gl_constant_value));
+
+   /* driver uniform storage gets generated and propagated later */
+   uni->driver_storage = NULL;
+   uni->num_driver_storage = 0;
+}
+
+
+static ir_variable *
+search_var(struct exec_list *list, const char *name)
+{
+   foreach_list_safe(node, list) {
+      ir_variable *var = ((ir_instruction *) node)->as_variable();
+      if (var && strcmp(name, var->name) == 0)
+         return var;
+   }
+   return NULL;
+}
+
+
+/**
+ * Resolve glsl_types for uniform_storage
+ */
+static void
+resolve_uniform_types(struct gl_shader_program *prog, struct gl_shader *sha)
+{
+   /* for each storage, find corresponding uniform from the shader */
+   for (unsigned i = 0; i < prog->NumUserUniformStorage; i++) {
+      ir_variable *var = search_var(sha->ir, prog->UniformStorage[i].name);
+
+      if (var) {
+         /* for arrays, uniform storage type contains the element type */
+         if (var->type->is_array())
+            prog->UniformStorage[i].type = var->type->element_type();
+         else
+            prog->UniformStorage[i].type = var->type;
+      }
+   }
+}
+
+
+static bool
+validate_binary_cache(struct gl_context *ctx, memory_map &map)
+{
+   assert(ctx);
+   uint32_t data[num_cache_validation_data_items];
+
+   // Read the cache sentinel and ensure it is non-zero, which means the
+   // cache entry is complete.
+   uint32_t cache_sentinel = 0;
+   map.read(&cache_sentinel, sizeof(cache_sentinel));
+   if (cache_sentinel == 0)
+      return false;
+
+   map.read(&data, sizeof(cache_validation_data));
+
+   /* validation data (common struct sizes) must match */
+   if (memcmp(&data, cache_validation_data, sizeof(cache_validation_data)))
+      return false;
+
+   char *cache_magic_id = map.read_string();
+   char *cache_vendor = map.read_string();
+   char *cache_renderer = map.read_string();
+
+   const char *mesa_magic_id = mesa_get_shader_cache_magic();
+
+   const char *mesa_vendor =
+      (const char *) ctx->Driver.GetString(ctx, GL_VENDOR);
+   const char *mesa_renderer =
+      (const char *) ctx->Driver.GetString(ctx, GL_RENDERER);
+
+   // Ensure we didn't get something bad back from cache somehow
+   if (!(cache_magic_id && cache_vendor && cache_renderer &&
+         mesa_magic_id && mesa_vendor && mesa_renderer))
+      return false;
+
+   /* check if cache was created with another driver */
+   if (strcmp(mesa_vendor, cache_vendor) ||
+       strcmp(mesa_renderer, cache_renderer))
+      return false;
+
+   /* check against different version of mesa */
+   if (strcmp(cache_magic_id, mesa_magic_id))
+      return false;
+
+   return true;
+}
+
+
+/* read and deserialize a gl_shader */
+static struct gl_shader *
+read_shader_in_place(struct gl_context *ctx, void *mem_ctx, memory_map &map, ir_deserializer &s, struct gl_shader *shader)
+{
+   assert(ctx);
+   struct gl_program *prog = NULL;
+   gl_shader_stage stage;
+
+   // If shader is NULL, we are in the program path and need to check the
+   // sentinel.  In the shader-only path, the sentinel has already been read
+   // by validate_binary_cache.
+   if (shader == NULL) {
+      uint32_t shader_sentinel = map.read_uint32_t();
+      if (shader_sentinel == 0)
+         return NULL;
+   }
+
+   uint32_t shader_size = map.read_uint32_t();
+   uint32_t type = map.read_uint32_t();
+
+   GLuint name;
+   GLint refcount;
+   const char* source;
+
+   bool shader_cleanup = false;
+   bool program_cleanup = false;
+
+   /* Useful information for debugging. */
+   (void) shader_size;
+
+   /* verify that type is supported */
+   switch (type) {
+      case GL_VERTEX_SHADER:
+      case GL_FRAGMENT_SHADER:
+      case GL_GEOMETRY_SHADER:
+         break;
+      default:
+         goto error_deserialize;
+   }
+
+   if (!shader) {
+      shader_cleanup = true;
+      shader = ctx->Driver.NewShader(NULL, 0, type);
+   }
+
+   if (!shader)
+      return NULL;
+
+   /* Set the fields we already know */
+   name = shader->Name;
+   refcount = shader->RefCount;
+   source = shader->Source;
+
+   /* If anything has been added to gl_shader, we need to account for it in
+    * deserialization (e.g. it makes no sense to cache a mutex id).
+    * Below is the size for mesa-10.2.5 with deferred compilation.
+    */
+   #if defined(i386) || defined(__i386__)
+      STATIC_ASSERT(sizeof(gl_shader) == 400);
+   #else
+      STATIC_ASSERT(sizeof(gl_shader) == 464);
+   #endif
+
+   /* Reading individual fields and structs would slow us down here. This is
+    * slightly dangerous though and we need to take care to initialize any
+    * pointers properly.
+    *
+    * Read up to (but exclude) gl_shader::Mutex to avoid races.
+    */
+   map.read(shader, offsetof(struct gl_shader, Mutex));
+   map.ffwd(sizeof(struct gl_shader) - offsetof(struct gl_shader, Mutex));
+
+   /* verify that type from header matches */
+   if (shader->Type != type)
+      goto error_deserialize;
+
+   /* Set correct name and refcount. */
+   shader->Name = name;
+   shader->RefCount = refcount;
+   shader->Source = source;
+
+   /* clear all pointer fields, only data preserved */
+   shader->Label = NULL;
+   shader->Program = NULL;
+   shader->InfoLog = ralloc_strdup(mem_ctx, "");
+   shader->UniformBlocks = NULL;
+   shader->ir = NULL;
+   shader->symbols = NULL;
+
+   stage = _mesa_shader_enum_to_shader_stage(shader->Type);
+
+   prog =
+      ctx->Driver.NewProgram(ctx, _mesa_shader_stage_to_program(stage),
+                             shader->Name);
+
+   if (!prog)
+      goto error_deserialize;
+   else
+      program_cleanup = true;
+
+   _mesa_reference_program(ctx, &shader->Program, prog);
+
+   /* IR tree */
+   if (!s.deserialize(ctx, mem_ctx, shader, &map))
+      goto error_deserialize;
+
+   return shader;
+
+error_deserialize:
+
+   /* Drop the program reference before deleting the shader; the reverse
+    * order would touch shader->Program in freed memory.
+    */
+   if (prog && program_cleanup)
+      _mesa_reference_program(ctx, &shader->Program, NULL);
+
+   if (shader && shader_cleanup)
+      ctx->Driver.DeleteShader(ctx, shader);
+
+   return NULL;
+}
+
+
+static gl_shader *
+read_shader(struct gl_context *ctx, void *mem_ctx, memory_map &map, ir_deserializer &s)
+{
+   assert(ctx);
+   struct gl_shader *shader = NULL;
+   shader = read_shader_in_place(ctx, mem_ctx, map, s, shader);
+   return shader;
+}
+
+
+/* Read and deserialize a gl_shader.  This version is used externally
+ * without requiring creation of an ir_deserializer.
+ */
+struct gl_shader *
+read_single_shader(struct gl_context *ctx, struct gl_shader *shader, const char* path)
+{
+   assert(ctx);
+
+   ir_deserializer s;
+
+   memory_map map;
+
+   if (map.map(path))
+      return NULL;
+
+   if (validate_binary_cache(ctx, map) == false) {
+      /* Cache binary produced with a different Mesa, remove it. */
+      unlink(path);
+      return NULL;
+   }
+
+   /* The shader itself serves as the ralloc memory context. */
+   return read_shader_in_place(ctx, shader, map, s, shader);
+}
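+
+/* Hypothetical caller of the single-shader path (sketch only; the path
+ * and the fallback are illustrative, not part of this patch):
+ *
+ *   struct gl_shader *cached =
+ *      read_single_shader(ctx, shader, "/tmp/cache/shader.bin");
+ *   if (!cached) {
+ *      // cache miss or stale entry (already unlinked); compile normally
+ *   }
+ */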
+
+
+static int
+deserialize_program(struct gl_context *ctx, struct gl_shader_program *prog, memory_map &map)
+{
+   assert(ctx);
+
+   if (validate_binary_cache(ctx, map) == false)
+      return MESA_SHADER_DESERIALIZE_VERSION_ERROR;
+
+   struct gl_shader_program tmp_prog;
+
+   /* If anything has been added to gl_shader_program, we need to grok it for
+    * deserialization, i.e. does it make sense to cache a mutex id?  No.
+    * Below is the size for mesa-10.2.5 with deferred compilation.
+    */
+   #if defined(i386) || defined(__i386__)
+      STATIC_ASSERT(sizeof(gl_shader_program) == 268);
+   #else
+      STATIC_ASSERT(sizeof(gl_shader_program) == 408);
+   #endif
+
+   /* Read up to (but exclude) gl_shader_program::Mutex to avoid races */
+   map.read(&tmp_prog, offsetof(gl_shader_program, Mutex));
+   map.ffwd(sizeof(gl_shader_program) - offsetof(gl_shader_program, Mutex));
+
+   /* Cache does not support compatibility extensions
+    * like ARB_ES3_compatibility (yet).
+    */
+   if (_mesa_is_desktop_gl(ctx) && tmp_prog.IsES)
+      return MESA_SHADER_DESERIALIZE_READ_ERROR;
+
+   prog->Type = tmp_prog.Type;
+   prog->Version = tmp_prog.Version;
+   prog->IsES = tmp_prog.IsES;
+   prog->NumUserUniformStorage = tmp_prog.NumUserUniformStorage;
+   prog->NumUniformRemapTable = tmp_prog.NumUniformRemapTable;
+   prog->LastClipDistanceArraySize = tmp_prog.LastClipDistanceArraySize;
+   prog->FragDepthLayout = tmp_prog.FragDepthLayout;
+
+   prog->UniformStorage = NULL;
+   prog->Label = NULL;
+
+   prog->UniformHash = new string_to_uint_map;
+
+   /* these were already allocated by _mesa_init_shader_program */
+   read_hash_table(prog->AttributeBindings, &map);
+   read_hash_table(prog->FragDataBindings, &map);
+   read_hash_table(prog->FragDataIndexBindings, &map);
+
+   read_hash_table(prog->UniformHash, &map);
+
+   if (map.errors())
+      return MESA_SHADER_DESERIALIZE_READ_ERROR;
+
+   memcpy(&prog->Geom, &tmp_prog.Geom, sizeof(prog->Geom));
+   memcpy(&prog->Vert, &tmp_prog.Vert, sizeof(prog->Vert));
+
+   /* uniform storage */
+   prog->UniformStorage = rzalloc_array(prog, struct gl_uniform_storage,
+                                        prog->NumUserUniformStorage);
+
+   for (unsigned i = 0; i < prog->NumUserUniformStorage; i++)
+      read_uniform_storage(prog, &prog->UniformStorage[i], map);
+
+   prog->UniformRemapTable =
+      rzalloc_array(prog, gl_uniform_storage *, prog->NumUniformRemapTable);
+
+   /* assign remap entries from UniformStorage */
+   for (unsigned i = 0; i < prog->NumUserUniformStorage; i++) {
+      unsigned entries = MAX2(1, prog->UniformStorage[i].array_elements);
+      for (unsigned j = 0; j < entries; j++)
+         prog->UniformRemapTable[prog->UniformStorage[i].remap_location + j] =
+            &prog->UniformStorage[i];
+   }
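+
+   /* For example, a uniform "vec4 u[3]" with remap_location L ends up with
+    * UniformRemapTable[L], [L+1] and [L+2] all pointing at the same
+    * gl_uniform_storage entry (illustrative values).
+    */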
+
+   /* number of linked shaders contained in the binary */
+   uint8_t shader_amount = map.read_uint8_t();
+
+   /* use the same deserializer so the type_hash is shared across shader stages */
+   ir_deserializer s;
+
+   /* initialize the list; the cache may contain only a subset of the stages */
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++)
+      prog->_LinkedShaders[i] = NULL;
+
+   /* Only a read error or a failure constructing the gl_program can change this. */
+   prog->LinkStatus = true;
+
+   /* read _LinkedShaders */
+   for (unsigned i = 0; i < shader_amount; i++) {
+      uint32_t index = map.read_uint32_t();
+
+      struct gl_shader *sha = read_shader(ctx, prog, map, s);
+
+      if (!sha)
+         return MESA_SHADER_DESERIALIZE_READ_ERROR;
+
+      resolve_uniform_types(prog, sha);
+
+      _mesa_reference_shader(ctx, &prog->_LinkedShaders[index], sha);
+
+      struct gl_program *linked_prog = ctx->Driver.GetProgram(ctx, prog, sha);
+      if (linked_prog) {
+         _mesa_copy_linked_program_data((gl_shader_stage) index,
+                                        prog, linked_prog);
+         _mesa_reference_program(ctx, &sha->Program, linked_prog);
+         _mesa_reference_program(ctx, &linked_prog, NULL);
+      }
+   }
+
+   /* set default values for uniforms that have an initializer */
+   link_set_uniform_initializers(prog);
+
+   prog->_Linked = GL_TRUE;
+
+   return 0;
+}
+
+/**
+ * Deserialize gl_shader_program structure
+ */
+extern "C" int
+mesa_program_deserialize(struct gl_context *ctx, struct gl_shader_program *prog,
+                         const GLvoid *data, size_t size)
+{
+   assert(ctx);
+   memory_map map;
+   map.map((const void*) data, size);
+   return deserialize_program(ctx, prog, map);
+}
+
+
+extern "C" int
+mesa_program_load(struct gl_context *ctx, struct gl_shader_program *prog,
+                  const char *path)
+{
+   assert(ctx);
+   memory_map map;
+   int result = 0;
+
+   if (map.map(path))
+      return -1;
+   result = deserialize_program(ctx, prog, map);
+
+   /* Cache binary produced with a different Mesa, remove it. */
+   if (result == MESA_SHADER_DESERIALIZE_VERSION_ERROR)
+      unlink(path);
+
+   return result;
+}
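+
+/* Hypothetical load-or-relink flow built on mesa_program_load (sketch;
+ * cache_path and relink_program are illustrative, not part of this patch):
+ *
+ *   if (mesa_program_load(ctx, prog, cache_path) != 0) {
+ *      // miss, read error, or stale version (the stale file was unlinked)
+ *      relink_program(ctx, prog);
+ *   }
+ */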
diff --git a/icd/intel/compiler/shader/shader_serialize.cpp b/icd/intel/compiler/shader/shader_serialize.cpp
new file mode 100644
index 0000000..d0dd2e0
--- /dev/null
+++ b/icd/intel/compiler/shader/shader_serialize.cpp
@@ -0,0 +1,229 @@
+/* -*- c++ -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "shader_cache.h"
+#include "ir_serialize.h"
+
+#define MESA_SHADER_CACHE_MAGIC __DATE__ " " __TIME__
+
+const char *
+mesa_get_shader_cache_magic()
+{
+   return MESA_SHADER_CACHE_MAGIC;
+}
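+
+/* The magic is the compiler's build timestamp, e.g. "Oct  1 2015 12:00:00"
+ * (illustrative value), so any rebuild changes it and previously written
+ * cache entries fail validation on the read side and are unlinked.
+ */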
+
+/**
+ * Serializes a gl_shader structure, writing the shader header
+ * information and the exec_list of instructions.
+ */
+extern "C" char *
+mesa_shader_serialize(struct gl_context *ctx, struct gl_shader *shader, size_t *size, bool shader_only)
+{
+   assert(ctx);
+
+   if (!supported_by_shader_cache(shader, true /* is_write */))
+      return NULL;
+
+   *size = 0;
+
+   memory_writer blob;
+
+   // Reserve the sentinel slot up front; it stays 0 until writing is
+   // complete and is patched with the final size below, which lets
+   // readers detect truncated blobs cheaply.
+   assert(blob.position() == 0);
+   uint32_t shader_sentinel = 0;
+   blob.write_uint32_t(shader_sentinel);
+
+   if (shader_only) {
+      /* If serializing only a shader, we must populate the cache identifiers ourselves */
+      blob.write(&cache_validation_data, sizeof(cache_validation_data));
+
+      blob.write_string(mesa_get_shader_cache_magic());
+      blob.write_string((const char *)ctx->Driver.GetString(ctx, GL_VENDOR));
+      blob.write_string((const char *)ctx->Driver.GetString(ctx, GL_RENDERER));
+   }
+
+   int32_t start_pos = blob.position();
+   uint32_t shader_data_len = 0;
+   uint32_t shader_type = shader->Type;
+
+   blob.write_uint32_t(shader_data_len);
+   blob.write_uint32_t(shader_type);
+
+   blob.write(shader, sizeof(struct gl_shader));
+
+   /* dump all shader instructions */
+   serialize_list(shader->ir, blob);
+
+   shader_data_len = blob.position() -
+      start_pos - sizeof(shader_data_len);
+   blob.overwrite(&shader_data_len, sizeof(shader_data_len), start_pos);
+
+   // patch the sentinel with the shader length; non-zero marks the blob complete
+   shader_sentinel = shader_data_len;
+   blob.overwrite(&shader_sentinel, sizeof(shader_sentinel), 0);
+
+   *size = blob.position();
+
+   return blob.release_memory(size);
+}
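+
+/* Hypothetical write side of the shader cache round-trip (sketch; the
+ * file handling is illustrative, not part of this patch):
+ *
+ *   size_t size = 0;
+ *   char *blob = mesa_shader_serialize(ctx, shader, &size, true);
+ *   if (blob) {
+ *      FILE *f = fopen(path, "wb");
+ *      if (f) {
+ *         fwrite(blob, 1, size, f);
+ *         fclose(f);
+ *      }
+ *      free(blob);
+ *   }
+ */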
+
+
+/**
+ * Helper structure for hash serialization; the number of items is
+ * accumulated in item_count while the table is iterated.
+ */
+struct hash_serialize_data
+{
+   hash_serialize_data(void *memory_writer) :
+      writer(memory_writer),
+      item_count(0) { }
+
+   void *writer;
+   uint32_t item_count;
+};
+
+
+static void
+serialize_hash(const void *key, void *data, void *closure)
+{
+   hash_serialize_data *s_data = (hash_serialize_data *) closure;
+   memory_writer *blob = (memory_writer *) s_data->writer;
+
+   uint32_t value = ((intptr_t)data);
+
+   blob->write_string((char *)key);
+   blob->write_uint32_t(value);
+
+   s_data->item_count++;
+}
+
+
+static void
+serialize_hash_table(struct string_to_uint_map *map, memory_writer *blob)
+{
+   struct hash_serialize_data data(blob);
+   int32_t pos = blob->position();
+   blob->write_uint32_t(data.item_count);
+
+   map->iterate(serialize_hash, &data);
+
+   blob->overwrite(&data.item_count, sizeof(data.item_count), pos);
+}
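+
+/* Resulting layout per table (sketch): a uint32_t item_count followed by
+ * item_count (string key, uint32_t value) pairs; the count placeholder is
+ * patched via overwrite() once iteration has finished, mirroring the
+ * sentinel trick used for whole blobs.
+ */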
+
+
+static void
+serialize_uniform_storage(gl_uniform_storage *uni, memory_writer &blob)
+{
+   blob.write(uni, sizeof(gl_uniform_storage));
+   blob.write_string(uni->name);
+
+   /* Note: the type is not serialized; it is resolved during parsing. */
+
+   /* how many elements (1 if not array) * how many components in the type */
+   const unsigned elements = MAX2(1, uni->array_elements);
+   uint32_t size = elements * MAX2(1, uni->type->components());
+
+   blob.write_uint32_t(size);
+}
+
+/**
+ * Serialize gl_shader_program structure
+ */
+extern "C" char *
+mesa_program_serialize(struct gl_context *ctx, struct gl_shader_program *prog, size_t *size)
+{
+   assert(ctx);
+
+   if (!supported_by_program_cache(prog, true /* is_write */))
+      return NULL;
+
+   memory_writer blob;
+
+   // Reserve the sentinel slot up front; it is patched with the final
+   // size once the whole program blob has been written.
+   assert(blob.position() == 0);
+   uint32_t program_sentinel = 0;
+   blob.write_uint32_t(program_sentinel);
+
+   blob.write(&cache_validation_data, sizeof(cache_validation_data));
+
+   blob.write_string(mesa_get_shader_cache_magic());
+   blob.write_string((const char *)ctx->Driver.GetString(ctx, GL_VENDOR));
+   blob.write_string((const char *)ctx->Driver.GetString(ctx, GL_RENDERER));
+
+   blob.write(prog, sizeof(gl_shader_program));
+
+   /* hash tables */
+   serialize_hash_table(prog->AttributeBindings, &blob);
+   serialize_hash_table(prog->FragDataBindings, &blob);
+   serialize_hash_table(prog->FragDataIndexBindings, &blob);
+   serialize_hash_table(prog->UniformHash, &blob);
+
+   /* uniform storage */
+   if (prog->UniformStorage) {
+      for (unsigned i = 0; i < prog->NumUserUniformStorage; i++)
+         serialize_uniform_storage(&prog->UniformStorage[i], blob);
+   }
+
+   uint8_t shader_amount = 0;
+   unsigned shader_amount_pos = blob.position();
+   blob.write_uint8_t(shader_amount);
+
+   /* _LinkedShaders IR */
+   for (uint32_t i = 0; i < MESA_SHADER_STAGES; i++) {
+      size_t sha_size = 0;
+
+      if (!prog->_LinkedShaders[i])
+         continue;
+
+      /* Set used GLSL version and IsES flag from gl_shader_program,
+       * this is required when deserializing the data.
+       */
+      prog->_LinkedShaders[i]->Version = prog->Version;
+      prog->_LinkedShaders[i]->IsES = prog->IsES;
+
+      char *data = mesa_shader_serialize(ctx, prog->_LinkedShaders[i], &sha_size, false /* shader_only */);
+
+      if (!data)
+         return NULL;
+
+      shader_amount++;
+
+      /* index in _LinkedShaders list + shader blob; data is known to be
+       * non-NULL here, so no further check is needed */
+      blob.write_uint32_t(i);
+      blob.write(data, sha_size);
+      free(data);
+   }
+
+   blob.overwrite(&shader_amount, sizeof(shader_amount), shader_amount_pos);
+
+   *size = blob.position();
+
+   // patch the sentinel with the program size; non-zero marks the blob complete
+   program_sentinel = *size;
+   blob.overwrite(&program_sentinel, sizeof(program_sentinel), 0);
+
+   return blob.release_memory(size);
+}
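+
+/* Hypothetical in-memory round-trip for a linked program (sketch; error
+ * handling elided, fresh_prog is illustrative):
+ *
+ *   size_t size = 0;
+ *   char *blob = mesa_program_serialize(ctx, prog, &size);
+ *   if (blob) {
+ *      int err = mesa_program_deserialize(ctx, fresh_prog, blob, size);
+ *      free(blob);
+ *   }
+ */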
diff --git a/icd/intel/compiler/shader/standalone_scaffolding.cpp b/icd/intel/compiler/shader/standalone_scaffolding.cpp
new file mode 100644
index 0000000..3c4093e
--- /dev/null
+++ b/icd/intel/compiler/shader/standalone_scaffolding.cpp
@@ -0,0 +1,188 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/* This file declares stripped-down versions of functions that
+ * normally exist outside of the glsl folder, so that they can be used
+ * when running the GLSL compiler standalone (for unit testing or
+ * compiling builtins).
+ */
+
+#include "standalone_scaffolding.h"
+
+#include <assert.h>
+#include <string.h>
+#include "ralloc.h"
+
+//void
+//_mesa_warning(struct gl_context *ctx, const char *fmt, ...)
+//{
+//    va_list vargs;
+//    (void) ctx;
+
+//    va_start(vargs, fmt);
+
+//    /* This output is not thread-safe, but that's good enough for the
+//     * standalone compiler.
+//     */
+//    fprintf(stderr, "Mesa warning: ");
+//    vfprintf(stderr, fmt, vargs);
+//    fprintf(stderr, "\n");
+
+//    va_end(vargs);
+//}
+
+void
+_mesa_reference_shader(struct gl_context *ctx, struct gl_shader **ptr,
+                       struct gl_shader *sh)
+{
+   (void) ctx;
+   *ptr = sh;
+}
+
+void
+_mesa_reference_program(struct gl_context *ctx, struct gl_program **ptr,
+                       struct gl_program *prog)
+{
+   (void) ctx;
+   *ptr = prog;
+}
+
+//void
+//_mesa_shader_debug(struct gl_context *, GLenum, GLuint *id,
+//                   const char *, int)
+//{
+//}
+
+struct gl_shader *
+_mesa_new_shader(struct gl_context *ctx, GLuint name, GLenum type)
+{
+   struct gl_shader *shader;
+
+   (void) ctx;
+
+   assert(type == GL_VERTEX_SHADER ||
+          type == GL_GEOMETRY_SHADER ||
+          type == GL_FRAGMENT_SHADER);
+
+   shader = rzalloc(NULL, struct gl_shader);
+   if (shader) {
+      shader->Type = type;
+      shader->Stage = _mesa_shader_enum_to_shader_stage(type);
+      shader->Name = name;
+      // LunarG: VK does not use reference counts
+//      shader->RefCount = 1;
+   }
+   return shader;
+}
+
+void _mesa_delete_shader(struct gl_context *ctx, struct gl_shader *sh)
+{
+   free((void *)sh->Source);
+   free(sh->Label);
+   _mesa_reference_program(ctx, &sh->Program, NULL);
+   ralloc_free(sh);
+}
+
+
+void initialize_context_to_defaults(struct gl_context *ctx, gl_api api)
+{
+   memset(ctx, 0, sizeof(*ctx));
+
+   ctx->API = api;
+
+   ctx->Extensions.dummy_false = false;
+   ctx->Extensions.dummy_true = true;
+   ctx->Extensions.ARB_compute_shader = true;
+   ctx->Extensions.ARB_conservative_depth = true;
+   ctx->Extensions.ARB_draw_instanced = true;
+   ctx->Extensions.ARB_ES2_compatibility = true;
+   ctx->Extensions.ARB_ES3_compatibility = true;
+   ctx->Extensions.ARB_explicit_attrib_location = true;
+   ctx->Extensions.ARB_fragment_coord_conventions = true;
+   ctx->Extensions.ARB_gpu_shader5 = true;
+   ctx->Extensions.ARB_sample_shading = true;
+   ctx->Extensions.ARB_shader_bit_encoding = true;
+   ctx->Extensions.ARB_shader_stencil_export = true;
+   ctx->Extensions.ARB_shader_texture_lod = true;
+   ctx->Extensions.ARB_shading_language_420pack = true;
+   ctx->Extensions.ARB_shading_language_packing = true;
+   ctx->Extensions.ARB_texture_cube_map_array = true;
+   ctx->Extensions.ARB_texture_gather = true;
+   ctx->Extensions.ARB_texture_multisample = true;
+   ctx->Extensions.ARB_texture_query_levels = true;
+   ctx->Extensions.ARB_texture_query_lod = true;
+   ctx->Extensions.ARB_uniform_buffer_object = true;
+   ctx->Extensions.ARB_viewport_array = true;
+
+   ctx->Extensions.OES_EGL_image_external = true;
+   ctx->Extensions.OES_standard_derivatives = true;
+
+   ctx->Extensions.EXT_shader_integer_mix = true;
+   ctx->Extensions.EXT_texture3D = true;
+   ctx->Extensions.EXT_texture_array = true;
+
+   ctx->Extensions.NV_texture_rectangle = true;
+
+   ctx->Const.GLSLVersion = 120;
+
+   /* 1.20 minimums. */
+   ctx->Const.MaxLights = 8;
+   ctx->Const.MaxClipPlanes = 6;
+   ctx->Const.MaxTextureUnits = 2;
+   ctx->Const.MaxTextureCoordUnits = 2;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxAttribs = 16;
+
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxUniformComponents = 512;
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxOutputComponents = 32;
+   ctx->Const.MaxVarying = 8; /* == gl_MaxVaryingFloats / 4 */
+   ctx->Const.Program[MESA_SHADER_VERTEX].MaxTextureImageUnits = 0;
+   ctx->Const.MaxCombinedTextureImageUnits = 2;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits = 2;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxUniformComponents = 64;
+   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxInputComponents = 32;
+
+   ctx->Const.MaxDrawBuffers = 1;
+   ctx->Const.MaxComputeWorkGroupCount[0] = 65535;
+   ctx->Const.MaxComputeWorkGroupCount[1] = 65535;
+   ctx->Const.MaxComputeWorkGroupCount[2] = 65535;
+   ctx->Const.MaxComputeWorkGroupSize[0] = 1024;
+   ctx->Const.MaxComputeWorkGroupSize[1] = 1024;
+   ctx->Const.MaxComputeWorkGroupSize[2] = 64;
+   ctx->Const.MaxComputeWorkGroupInvocations = 1024;
+   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 16;
+   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxUniformComponents = 1024;
+   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxInputComponents = 0; /* not used */
+   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxOutputComponents = 0; /* not used */
+
+   /* Set up default shader compiler options. */
+   struct gl_shader_compiler_options options;
+   memset(&options, 0, sizeof(options));
+   options.MaxUnrollIterations = 32;
+   options.MaxIfDepth = UINT_MAX;
+
+   /* Default pragma settings */
+   options.DefaultPragmas.Optimize = true;
+
+   for (int sh = 0; sh < MESA_SHADER_STAGES; ++sh)
+      memcpy(&ctx->ShaderCompilerOptions[sh], &options, sizeof(options));
+}
diff --git a/icd/intel/compiler/shader/standalone_scaffolding.h b/icd/intel/compiler/shader/standalone_scaffolding.h
new file mode 100644
index 0000000..ce5d4e3
--- /dev/null
+++ b/icd/intel/compiler/shader/standalone_scaffolding.h
@@ -0,0 +1,89 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/* This file declares stripped-down versions of functions that
+ * normally exist outside of the glsl folder, so that they can be used
+ * when running the GLSL compiler standalone (for unit testing or
+ * compiling builtins).
+ */
+
+#pragma once
+#ifndef STANDALONE_SCAFFOLDING_H
+#define STANDALONE_SCAFFOLDING_H
+
+#include <assert.h>
+#include "main/mtypes.h"
+
+extern "C" void
+_mesa_warning(struct gl_context *ctx, const char *fmtString, ... );
+
+extern "C" void
+_mesa_reference_shader(struct gl_context *ctx, struct gl_shader **ptr,
+                       struct gl_shader *sh);
+
+// LunarG ADD:
+extern "C" void
+_mesa_reference_program(struct gl_context *ctx, struct gl_program **ptr,
+                       struct gl_program *prog);
+
+extern "C" struct gl_shader *
+_mesa_new_shader(struct gl_context *ctx, GLuint name, GLenum type);
+
+// LunarG ADD:
+extern "C" void
+_mesa_delete_shader(struct gl_context *ctx, struct gl_shader *sh);
+
+extern "C" void
+_mesa_shader_debug(struct gl_context *ctx, GLenum type, GLuint *id,
+                   const char *msg, int len);
+
+static inline gl_shader_stage
+_mesa_shader_enum_to_shader_stage(GLenum v)
+{
+   switch (v) {
+   case GL_VERTEX_SHADER:
+      return MESA_SHADER_VERTEX;
+   case GL_FRAGMENT_SHADER:
+      return MESA_SHADER_FRAGMENT;
+   case GL_GEOMETRY_SHADER:
+      return MESA_SHADER_GEOMETRY;
+   case GL_COMPUTE_SHADER:
+      return MESA_SHADER_COMPUTE;
+   default:
+      assert(!"bad value in _mesa_shader_enum_to_shader_stage()");
+      return MESA_SHADER_VERTEX;
+   }
+}
+
+/**
+ * Initialize the given gl_context structure to a reasonable set of
+ * defaults representing the minimum capabilities required by the
+ * OpenGL spec.
+ *
+ * This is used when compiling builtin functions and in testing, when
+ * we don't have a connection to an actual driver.
+ */
+void initialize_context_to_defaults(struct gl_context *ctx, gl_api api);
+
+
+#endif /* STANDALONE_SCAFFOLDING_H */
diff --git a/icd/intel/compiler/shader/standalone_utils.c b/icd/intel/compiler/shader/standalone_utils.c
new file mode 100644
index 0000000..7d64fe2
--- /dev/null
+++ b/icd/intel/compiler/shader/standalone_utils.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright © 2008, 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <getopt.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <string.h>
+#include <inttypes.h>
+
+/** @file standalone_utils.c
+ *
+ * This file contains the allocation and logging utilities needed
+ * by the standalone compiler.
+ */
+
+#include "instance.h"
+
+
+void *intel_alloc(const void *handle,
+                  size_t size, size_t alignment,
+                  VkSystemAllocationScope scope)
+{
+    assert(intel_handle_validate(handle));
+    return icd_instance_alloc(((const struct intel_handle *) handle)->instance->icd,
+                              size, alignment, scope);
+}
+
+void intel_free(const void *handle, void *ptr)
+{
+    assert(intel_handle_validate(handle));
+    icd_instance_free(((const struct intel_handle *) handle)->instance->icd, ptr);
+}
+
+void intel_logv(const void *handle,
+                VkFlags msg_flags,
+                VkDebugReportObjectTypeEXT obj_type, uint64_t src_object,
+                size_t location, int32_t msg_code,
+                const char *format, va_list ap)
+{
+    char msg[256];
+    int ret;
+
+    ret = vsnprintf(msg, sizeof(msg), format, ap);
+    if (ret < 0 || (size_t) ret >= sizeof(msg))
+        msg[sizeof(msg) - 1] = '\0';
+
+    assert(intel_handle_validate(handle));
+    icd_instance_log(((const struct intel_handle *) handle)->instance->icd,
+                     msg_flags,
+                     obj_type, src_object,              /* obj_type, object */
+                     location, msg_code,                /* location, msg_code */
+                     msg);
+}
diff --git a/icd/intel/compiler/shader/strtod.cpp b/icd/intel/compiler/shader/strtod.cpp
new file mode 100644
index 0000000..1ac29ec
--- /dev/null
+++ b/icd/intel/compiler/shader/strtod.cpp
@@ -0,0 +1,80 @@
+/*
+ * Copyright 2010 VMware, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#include <stdlib.h>
+
+#ifdef _GNU_SOURCE
+#include <locale.h>
+#ifdef __APPLE__
+#include <xlocale.h>
+#endif
+#endif
+
+#include "strtod.h"
+
+#if defined(_GNU_SOURCE) && !defined(__CYGWIN__) && !defined(__FreeBSD__) && \
+   !defined(__HAIKU__) && !defined(__UCLIBC__)
+#define GLSL_HAVE_LOCALE_T
+#endif
+
+#ifdef GLSL_HAVE_LOCALE_T
+static struct locale_initializer {
+   locale_initializer() { loc = newlocale(LC_CTYPE_MASK, "C", NULL); }
+   locale_t loc;
+} loc_init;
+#endif
+
+
+/**
+ * Wrapper around strtod which uses the "C" locale so the decimal
+ * point is always '.'
+ */
+double
+glsl_strtod(const char *s, char **end)
+{
+#ifdef GLSL_HAVE_LOCALE_T
+   return strtod_l(s, end, loc_init.loc);
+#else
+   return strtod(s, end);
+#endif
+}
+
+
+/**
+ * Wrapper around strtof which uses the "C" locale so the decimal
+ * point is always '.'
+ */
+float
+glsl_strtof(const char *s, char **end)
+{
+#ifdef GLSL_HAVE_LOCALE_T
+   return strtof_l(s, end, loc_init.loc);
+#elif _XOPEN_SOURCE >= 600 || _ISOC99_SOURCE
+   return strtof(s, end);
+#else
+   return (float) strtod(s, end);
+#endif
+}
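+
+/* Why the "C"-locale wrappers matter (sketch): under a locale such as
+ * de_DE, where ',' is the decimal separator, plain strtod("1.5", ...)
+ * stops parsing at the '.', while the wrappers always accept the full
+ * GLSL literal:
+ *
+ *   char *end;
+ *   double d = glsl_strtod("1.5", &end);   // d == 1.5, *end == '\0'
+ */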
diff --git a/icd/intel/compiler/shader/strtod.h b/icd/intel/compiler/shader/strtod.h
new file mode 100644
index 0000000..ad847db
--- /dev/null
+++ b/icd/intel/compiler/shader/strtod.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright 2010 VMware, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef STRTOD_H
+#define STRTOD_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+extern double
+glsl_strtod(const char *s, char **end);
+
+extern float
+glsl_strtof(const char *s, char **end);
+
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif
diff --git a/icd/intel/compiler/shader/test.cpp b/icd/intel/compiler/shader/test.cpp
new file mode 100644
index 0000000..b1ff92e
--- /dev/null
+++ b/icd/intel/compiler/shader/test.cpp
@@ -0,0 +1,78 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file test.cpp
+ *
+ * Standalone tests for the GLSL compiler.
+ *
+ * This file provides a standalone executable which can be used to
+ * test components of the GLSL compiler.
+ *
+ * Each test is a function with the same signature as main().  The
+ * main function interprets its first argument as the name of the test
+ * to run, strips out that argument, and then calls the test function.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_optpass.h"
+
+/**
+ * Print proper usage and exit with failure.
+ */
+static void
+usage_fail(const char *name)
+{
+   printf("*** usage: %s <command> <options>\n", name);
+   printf("\n");
+   printf("Possible commands are:\n");
+   printf("  optpass: test an optimization pass in isolation\n");
+   exit(EXIT_FAILURE);
+}
+
+static const char *extract_command_from_argv(int *argc, char **argv)
+{
+   if (*argc < 2) {
+      usage_fail(argv[0]);
+   }
+   const char *command = argv[1];
+   --*argc;
+   memmove(&argv[1], &argv[2], (*argc) * sizeof(argv[1]));
+   return command;
+}
+
+int main(int argc, char **argv)
+{
+   const char *command = extract_command_from_argv(&argc, argv);
+   if (strcmp(command, "optpass") == 0) {
+      return test_optpass(argc, argv);
+   } else {
+      usage_fail(argv[0]);
+   }
+
+   /* Execution should never reach here. */
+   return EXIT_FAILURE;
+}
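+
+/* Typical invocation of this executable (illustrative command line):
+ *
+ *   glsl_test optpass --input-glsl do_dead_code < shader.vert
+ *
+ * where the shader source is read from stdin and "optpass" dispatches
+ * to test_optpass().
+ */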
diff --git a/icd/intel/compiler/shader/test_optpass.cpp b/icd/intel/compiler/shader/test_optpass.cpp
new file mode 100644
index 0000000..db5cb26
--- /dev/null
+++ b/icd/intel/compiler/shader/test_optpass.cpp
@@ -0,0 +1,274 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file test_optpass.cpp
+ *
+ * Standalone test for optimization passes.
+ *
+ * This file provides the "optpass" command for the standalone
+ * glsl_test app.  It accepts either GLSL or high-level IR as input,
+ * and performs the optimiation passes specified on the command line.
+ * It outputs the IR, both before and after optimiations.
+ */
+
+#include <string>
+#include <iostream>
+#include <sstream>
+#include <getopt.h>
+
+#include "ast.h"
+#include "ir_optimization.h"
+#include "program.h"
+#include "ir_reader.h"
+#include "standalone_scaffolding.h"
+
+using namespace std;
+
+static string read_stdin_to_eof()
+{
+   stringbuf sb;
+   cin.get(sb, '\0');
+   return sb.str();
+}
+
+static GLboolean
+do_optimization(struct exec_list *ir, const char *optimization,
+                const struct gl_shader_compiler_options *options)
+{
+   int int_0;
+   int int_1;
+   int int_2;
+   int int_3;
+   int int_4;
+
+   if (sscanf(optimization, "do_common_optimization ( %d ) ", &int_0) == 1) {
+      return do_common_optimization(ir, int_0 != 0, false, options, true);
+   } else if (strcmp(optimization, "do_algebraic") == 0) {
+      return do_algebraic(ir, true);
+   } else if (strcmp(optimization, "do_constant_folding") == 0) {
+      return do_constant_folding(ir);
+   } else if (strcmp(optimization, "do_constant_variable") == 0) {
+      return do_constant_variable(ir);
+   } else if (strcmp(optimization, "do_constant_variable_unlinked") == 0) {
+      return do_constant_variable_unlinked(ir);
+   } else if (strcmp(optimization, "do_copy_propagation") == 0) {
+      return do_copy_propagation(ir);
+   } else if (strcmp(optimization, "do_copy_propagation_elements") == 0) {
+      return do_copy_propagation_elements(ir);
+   } else if (strcmp(optimization, "do_constant_propagation") == 0) {
+      return do_constant_propagation(ir);
+   } else if (strcmp(optimization, "do_dead_code") == 0) {
+      return do_dead_code(ir, false);
+   } else if (strcmp(optimization, "do_dead_code_local") == 0) {
+      return do_dead_code_local(ir);
+   } else if (strcmp(optimization, "do_dead_code_unlinked") == 0) {
+      return do_dead_code_unlinked(ir);
+   } else if (strcmp(optimization, "do_dead_functions") == 0) {
+      return do_dead_functions(ir);
+   } else if (strcmp(optimization, "do_function_inlining") == 0) {
+      return do_function_inlining(ir);
+   } else if (sscanf(optimization,
+                     "do_lower_jumps ( %d , %d , %d , %d , %d ) ",
+                     &int_0, &int_1, &int_2, &int_3, &int_4) == 5) {
+      return do_lower_jumps(ir, int_0 != 0, int_1 != 0, int_2 != 0,
+                            int_3 != 0, int_4 != 0);
+   } else if (strcmp(optimization, "do_lower_texture_projection") == 0) {
+      return do_lower_texture_projection(ir);
+   } else if (strcmp(optimization, "do_if_simplification") == 0) {
+      return do_if_simplification(ir);
+   } else if (sscanf(optimization, "lower_if_to_cond_assign ( %d ) ",
+                     &int_0) == 1) {
+      return lower_if_to_cond_assign(ir, int_0);
+   } else if (strcmp(optimization, "do_mat_op_to_vec") == 0) {
+      return do_mat_op_to_vec(ir);
+   } else if (strcmp(optimization, "do_noop_swizzle") == 0) {
+      return do_noop_swizzle(ir);
+   } else if (strcmp(optimization, "do_structure_splitting") == 0) {
+      return do_structure_splitting(ir);
+   } else if (strcmp(optimization, "do_swizzle_swizzle") == 0) {
+      return do_swizzle_swizzle(ir);
+   } else if (strcmp(optimization, "do_tree_grafting") == 0) {
+      return do_tree_grafting(ir);
+   } else if (strcmp(optimization, "do_vec_index_to_cond_assign") == 0) {
+      return do_vec_index_to_cond_assign(ir);
+   } else if (strcmp(optimization, "do_vec_index_to_swizzle") == 0) {
+      return do_vec_index_to_swizzle(ir);
+   } else if (strcmp(optimization, "lower_discard") == 0) {
+      return lower_discard(ir);
+   } else if (sscanf(optimization, "lower_instructions ( %d ) ",
+                     &int_0) == 1) {
+      return lower_instructions(ir, int_0);
+   } else if (strcmp(optimization, "lower_noise") == 0) {
+      return lower_noise(ir);
+   } else if (sscanf(optimization, "lower_variable_index_to_cond_assign "
+                     "( %d , %d , %d , %d ) ", &int_0, &int_1, &int_2,
+                     &int_3) == 4) {
+      return lower_variable_index_to_cond_assign(ir, int_0 != 0, int_1 != 0,
+                                                 int_2 != 0, int_3 != 0);
+   } else if (sscanf(optimization, "lower_quadop_vector ( %d ) ",
+                     &int_0) == 1) {
+      return lower_quadop_vector(ir, int_0 != 0);
+   } else if (strcmp(optimization, "optimize_redundant_jumps") == 0) {
+      return optimize_redundant_jumps(ir);
+   } else {
+      printf("Unrecognized optimization %s\n", optimization);
+      exit(EXIT_FAILURE);
+      return false;
+   }
+}
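+
+/* Examples of pass strings accepted above (matching the strcmp/sscanf
+ * patterns):
+ *
+ *   "do_dead_code"
+ *   "do_common_optimization ( 1 )"
+ *   "do_lower_jumps ( 1 , 1 , 1 , 0 , 0 )"
+ */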
+
+static GLboolean
+do_optimization_passes(struct exec_list *ir, char **optimizations,
+                       int num_optimizations, bool quiet,
+                       const struct gl_shader_compiler_options *options)
+{
+   GLboolean overall_progress = false;
+
+   for (int i = 0; i < num_optimizations; ++i) {
+      const char *optimization = optimizations[i];
+      if (!quiet) {
+         printf("*** Running optimization %s...", optimization);
+      }
+      GLboolean progress = do_optimization(ir, optimization, options);
+      if (!quiet) {
+         printf("%s\n", progress ? "progress" : "no progress");
+      }
+      validate_ir_tree(ir);
+
+      overall_progress = overall_progress || progress;
+   }
+
+   return overall_progress;
+}
+
+int test_optpass(int argc, char **argv)
+{
+   int input_format_ir = 0; /* 0=glsl, 1=ir */
+   int loop = 0;
+   int shader_type = GL_VERTEX_SHADER;
+   int quiet = 0;
+
+   const struct option optpass_opts[] = {
+      { "input-ir", no_argument, &input_format_ir, 1 },
+      { "input-glsl", no_argument, &input_format_ir, 0 },
+      { "loop", no_argument, &loop, 1 },
+      { "vertex-shader", no_argument, &shader_type, GL_VERTEX_SHADER },
+      { "fragment-shader", no_argument, &shader_type, GL_FRAGMENT_SHADER },
+      { "quiet", no_argument, &quiet, 1 },
+      { NULL, 0, NULL, 0 }
+   };
+
+   int idx = 0;
+   int c;
+   while ((c = getopt_long(argc, argv, "", optpass_opts, &idx)) != -1) {
+      if (c != 0) {
+         printf("*** usage: %s optpass <optimizations> <options>\n", argv[0]);
+         printf("\n");
+         printf("Possible options are:\n");
+         printf("  --input-ir: input format is IR\n");
+         printf("  --input-glsl: input format is GLSL (the default)\n");
+         printf("  --loop: run optimizations repeatedly until no progress\n");
+         printf("  --vertex-shader: test with a vertex shader (the default)\n");
+         printf("  --fragment-shader: test with a fragment shader\n");
+         exit(EXIT_FAILURE);
+      }
+   }
+
+   struct gl_context local_ctx;
+   struct gl_context *ctx = &local_ctx;
+   initialize_context_to_defaults(ctx, API_OPENGL_COMPAT);
+
+   ctx->Driver.NewShader = _mesa_new_shader;
+
+   struct gl_shader *shader = rzalloc(NULL, struct gl_shader);
+   shader->Type = shader_type;
+   shader->Stage = _mesa_shader_enum_to_shader_stage(shader_type);
+
+   string input = read_stdin_to_eof();
+
+   struct _mesa_glsl_parse_state *state
+      = new(shader) _mesa_glsl_parse_state(ctx, shader->Stage, shader);
+
+   if (input_format_ir) {
+      shader->ir = new(shader) exec_list;
+      _mesa_glsl_initialize_types(state);
+      _mesa_glsl_read_ir(state, shader->ir, input.c_str(), true);
+   } else {
+      shader->Source = input.c_str();
+      const char *source = shader->Source;
+      state->error = glcpp_preprocess(state, &source, &state->info_log,
+                                state->extensions, ctx) != 0;
+
+      if (!state->error) {
+         _mesa_glsl_lexer_ctor(state, source);
+         _mesa_glsl_parse(state);
+         _mesa_glsl_lexer_dtor(state);
+      }
+
+      shader->ir = new(shader) exec_list;
+      if (!state->error && !state->translation_unit.is_empty())
+         _mesa_ast_to_hir(shader->ir, state);
+   }
+
+   /* Print out the initial IR */
+   if (!state->error && !quiet) {
+      printf("*** pre-optimization IR:\n");
+      _mesa_print_ir(stdout, shader->ir, state);
+      printf("\n--\n");
+   }
+
+   /* Optimization passes */
+   if (!state->error) {
+      GLboolean progress;
+      const struct gl_shader_compiler_options *options =
+         &ctx->ShaderCompilerOptions[_mesa_shader_enum_to_shader_stage(shader_type)];
+      do {
+         progress = do_optimization_passes(shader->ir, &argv[optind],
+                                           argc - optind, quiet != 0, options);
+      } while (loop && progress);
+   }
+
+   /* Print out the resulting IR */
+   if (!state->error) {
+      if (!quiet) {
+         printf("*** resulting IR:\n");
+      }
+      _mesa_print_ir(stdout, shader->ir, state);
+      if (!quiet) {
+         printf("\n--\n");
+      }
+   }
+
+   if (state->error) {
+      printf("*** error(s) occurred:\n");
+      printf("%s\n", state->info_log);
+      printf("--\n");
+   }
+
+   /* Capture the status before freeing: state is ralloc'ed out of shader,
+    * so reading state->error after the frees would be a use-after-free.
+    */
+   const int status = state->error;
+
+   ralloc_free(state);
+   ralloc_free(shader);
+
+   return status;
+}
+
diff --git a/icd/intel/compiler/shader/test_optpass.h b/icd/intel/compiler/shader/test_optpass.h
new file mode 100644
index 0000000..923ccf3
--- /dev/null
+++ b/icd/intel/compiler/shader/test_optpass.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+#ifndef TEST_OPTPASS_H
+#define TEST_OPTPASS_H
+
+int test_optpass(int argc, char **argv);
+
+#endif /* TEST_OPTPASS_H */
diff --git a/icd/intel/compiler/shader/tests/.gitignore b/icd/intel/compiler/shader/tests/.gitignore
new file mode 100644
index 0000000..15ce248
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/.gitignore
@@ -0,0 +1,4 @@
+ralloc-test
+uniform-initializer-test
+sampler-types-test
+general-ir-test
diff --git a/icd/intel/compiler/shader/tests/builtin_variable_test.cpp b/icd/intel/compiler/shader/tests/builtin_variable_test.cpp
new file mode 100644
index 0000000..3fdfce5
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/builtin_variable_test.cpp
@@ -0,0 +1,394 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <gtest/gtest.h>
+#include "standalone_scaffolding.h"
+#include "main/compiler.h"
+#include "main/mtypes.h"
+#include "main/macros.h"
+#include "ralloc.h"
+#include "ir.h"
+#include "glsl_parser_extras.h"
+#include "glsl_symbol_table.h"
+
+class common_builtin : public ::testing::Test {
+public:
+   common_builtin(GLenum shader_type)
+      : shader_type(shader_type)
+   {
+      /* empty */
+   }
+
+   virtual void SetUp();
+   virtual void TearDown();
+
+   void string_starts_with_prefix(const char *str, const char *prefix);
+   void names_start_with_gl();
+   void uniforms_and_system_values_dont_have_explicit_location();
+   void constants_are_constant();
+   void no_invalid_variable_modes();
+
+   GLenum shader_type;
+   struct _mesa_glsl_parse_state *state;
+   struct gl_shader *shader;
+   void *mem_ctx;
+   gl_context ctx;
+   exec_list ir;
+};
+
+void
+common_builtin::SetUp()
+{
+   this->mem_ctx = ralloc_context(NULL);
+   this->ir.make_empty();
+
+   initialize_context_to_defaults(&this->ctx, API_OPENGL_COMPAT);
+
+   this->shader = rzalloc(this->mem_ctx, gl_shader);
+   this->shader->Type = this->shader_type;
+   this->shader->Stage = _mesa_shader_enum_to_shader_stage(this->shader_type);
+
+   this->state =
+      new(mem_ctx) _mesa_glsl_parse_state(&this->ctx, this->shader->Stage,
+                                          this->shader);
+
+   _mesa_glsl_initialize_types(this->state);
+   _mesa_glsl_initialize_variables(&this->ir, this->state);
+}
+
+void
+common_builtin::TearDown()
+{
+   ralloc_free(this->mem_ctx);
+   this->mem_ctx = NULL;
+}
+
+void
+common_builtin::string_starts_with_prefix(const char *str, const char *prefix)
+{
+   const size_t len = strlen(prefix);
+   char *const name_prefix = new char[len + 1];
+
+   strncpy(name_prefix, str, len);
+   name_prefix[len] = '\0';
+   EXPECT_STREQ(prefix, name_prefix) << "Bad name " << str;
+
+   delete [] name_prefix;
+}
+
+void
+common_builtin::names_start_with_gl()
+{
+   foreach_list(node, &this->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      string_starts_with_prefix(var->name, "gl_");
+   }
+}
+
+void
+common_builtin::uniforms_and_system_values_dont_have_explicit_location()
+{
+   foreach_list(node, &this->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var->data.mode != ir_var_uniform && var->data.mode != ir_var_system_value)
+         continue;
+
+      EXPECT_FALSE(var->data.explicit_location);
+      EXPECT_EQ(-1, var->data.location);
+   }
+}
+
+void
+common_builtin::constants_are_constant()
+{
+   foreach_list(node, &this->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var->data.mode != ir_var_auto)
+         continue;
+
+      EXPECT_FALSE(var->data.explicit_location);
+      EXPECT_EQ(-1, var->data.location);
+      EXPECT_TRUE(var->data.read_only);
+   }
+}
+
+void
+common_builtin::no_invalid_variable_modes()
+{
+   foreach_list(node, &this->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      switch (var->data.mode) {
+      case ir_var_auto:
+      case ir_var_uniform:
+      case ir_var_shader_in:
+      case ir_var_shader_out:
+      case ir_var_system_value:
+         break;
+
+      default:
+         ADD_FAILURE() << "Built-in variable " << var->name
+                       << " has an invalid mode " << int(var->data.mode);
+         break;
+      }
+   }
+}
+
+/************************************************************/
+
+class vertex_builtin : public common_builtin {
+public:
+   vertex_builtin()
+      : common_builtin(GL_VERTEX_SHADER)
+   {
+      /* empty */
+   }
+};
+
+TEST_F(vertex_builtin, names_start_with_gl)
+{
+   common_builtin::names_start_with_gl();
+}
+
+TEST_F(vertex_builtin, inputs_have_explicit_location)
+{
+   foreach_list(node, &this->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var->data.mode != ir_var_shader_in)
+         continue;
+
+      EXPECT_TRUE(var->data.explicit_location);
+      EXPECT_NE(-1, var->data.location);
+      EXPECT_GT(VERT_ATTRIB_GENERIC0, var->data.location);
+      EXPECT_EQ(0u, var->data.location_frac);
+   }
+}
+
+TEST_F(vertex_builtin, outputs_have_explicit_location)
+{
+   foreach_list(node, &this->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var->data.mode != ir_var_shader_out)
+         continue;
+
+      EXPECT_TRUE(var->data.explicit_location);
+      EXPECT_NE(-1, var->data.location);
+      EXPECT_GT(VARYING_SLOT_VAR0, var->data.location);
+      EXPECT_EQ(0u, var->data.location_frac);
+
+      /* Several varyings only exist in the fragment shader.  Be sure that no
+       * outputs with these locations exist.
+       */
+      EXPECT_NE(VARYING_SLOT_PNTC, var->data.location);
+      EXPECT_NE(VARYING_SLOT_FACE, var->data.location);
+      EXPECT_NE(VARYING_SLOT_PRIMITIVE_ID, var->data.location);
+   }
+}
+
+TEST_F(vertex_builtin, uniforms_and_system_values_dont_have_explicit_location)
+{
+   common_builtin::uniforms_and_system_values_dont_have_explicit_location();
+}
+
+TEST_F(vertex_builtin, constants_are_constant)
+{
+   common_builtin::constants_are_constant();
+}
+
+TEST_F(vertex_builtin, no_invalid_variable_modes)
+{
+   common_builtin::no_invalid_variable_modes();
+}
+
+/********************************************************************/
+
+class fragment_builtin : public common_builtin {
+public:
+   fragment_builtin()
+      : common_builtin(GL_FRAGMENT_SHADER)
+   {
+      /* empty */
+   }
+};
+
+TEST_F(fragment_builtin, names_start_with_gl)
+{
+   common_builtin::names_start_with_gl();
+}
+
+TEST_F(fragment_builtin, inputs_have_explicit_location)
+{
+   foreach_list(node, &this->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var->data.mode != ir_var_shader_in)
+	 continue;
+
+      EXPECT_TRUE(var->data.explicit_location);
+      EXPECT_NE(-1, var->data.location);
+      EXPECT_GT(VARYING_SLOT_VAR0, var->data.location);
+      EXPECT_EQ(0u, var->data.location_frac);
+
+      /* Several varyings only exist in the vertex / geometry shader.  Be sure
+       * that no inputs with these locations exist.
+       */
+      EXPECT_TRUE(_mesa_varying_slot_in_fs((gl_varying_slot) var->data.location));
+   }
+}
+
+TEST_F(fragment_builtin, outputs_have_explicit_location)
+{
+   foreach_list(node, &this->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var->data.mode != ir_var_shader_out)
+	 continue;
+
+      EXPECT_TRUE(var->data.explicit_location);
+      EXPECT_NE(-1, var->data.location);
+
+      /* gl_FragData[] has location FRAG_RESULT_DATA0.  Locations beyond that
+       * are invalid.
+       */
+      EXPECT_GE(FRAG_RESULT_DATA0, var->data.location);
+
+      EXPECT_EQ(0u, var->data.location_frac);
+   }
+}
+
+TEST_F(fragment_builtin, uniforms_and_system_values_dont_have_explicit_location)
+{
+   common_builtin::uniforms_and_system_values_dont_have_explicit_location();
+}
+
+TEST_F(fragment_builtin, constants_are_constant)
+{
+   common_builtin::constants_are_constant();
+}
+
+TEST_F(fragment_builtin, no_invalid_variable_modes)
+{
+   common_builtin::no_invalid_variable_modes();
+}
+
+/********************************************************************/
+
+class geometry_builtin : public common_builtin {
+public:
+   geometry_builtin()
+      : common_builtin(GL_GEOMETRY_SHADER)
+   {
+      /* empty */
+   }
+};
+
+TEST_F(geometry_builtin, names_start_with_gl)
+{
+   common_builtin::names_start_with_gl();
+}
+
+TEST_F(geometry_builtin, inputs_have_explicit_location)
+{
+   foreach_list(node, &this->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var->data.mode != ir_var_shader_in)
+	 continue;
+
+      if (var->is_interface_instance()) {
+         EXPECT_STREQ("gl_in", var->name);
+         EXPECT_FALSE(var->data.explicit_location);
+         EXPECT_EQ(-1, var->data.location);
+
+         ASSERT_TRUE(var->type->is_array());
+
+         const glsl_type *const instance_type = var->type->fields.array;
+
+         for (unsigned i = 0; i < instance_type->length; i++) {
+            const glsl_struct_field *const input =
+               &instance_type->fields.structure[i];
+
+            string_starts_with_prefix(input->name, "gl_");
+            EXPECT_NE(-1, input->location);
+            EXPECT_GT(VARYING_SLOT_VAR0, input->location);
+
+            /* Several varyings only exist in the fragment shader.  Be sure
+             * that no inputs with these locations exist.
+             */
+            EXPECT_NE(VARYING_SLOT_PNTC, input->location);
+            EXPECT_NE(VARYING_SLOT_FACE, input->location);
+         }
+      } else {
+         EXPECT_TRUE(var->data.explicit_location);
+         EXPECT_NE(-1, var->data.location);
+         EXPECT_GT(VARYING_SLOT_VAR0, var->data.location);
+         EXPECT_EQ(0u, var->data.location_frac);
+      }
+
+      /* Several varyings only exist in the fragment shader.  Be sure that no
+       * inputs with these locations exist.
+       */
+      EXPECT_NE(VARYING_SLOT_PNTC, var->data.location);
+      EXPECT_NE(VARYING_SLOT_FACE, var->data.location);
+   }
+}
+
+TEST_F(geometry_builtin, outputs_have_explicit_location)
+{
+   foreach_list(node, &this->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var->data.mode != ir_var_shader_out)
+	 continue;
+
+      EXPECT_TRUE(var->data.explicit_location);
+      EXPECT_NE(-1, var->data.location);
+      EXPECT_GT(VARYING_SLOT_VAR0, var->data.location);
+      EXPECT_EQ(0u, var->data.location_frac);
+
+      /* Several varyings only exist in the fragment shader.  Be sure that no
+       * outputs with these locations exist.
+       */
+      EXPECT_NE(VARYING_SLOT_PNTC, var->data.location);
+      EXPECT_NE(VARYING_SLOT_FACE, var->data.location);
+   }
+}
+
+TEST_F(geometry_builtin, uniforms_and_system_values_dont_have_explicit_location)
+{
+   common_builtin::uniforms_and_system_values_dont_have_explicit_location();
+}
+
+TEST_F(geometry_builtin, constants_are_constant)
+{
+   common_builtin::constants_are_constant();
+}
+
+TEST_F(geometry_builtin, no_invalid_variable_modes)
+{
+   common_builtin::no_invalid_variable_modes();
+}
diff --git a/icd/intel/compiler/shader/tests/common.c b/icd/intel/compiler/shader/tests/common.c
new file mode 100644
index 0000000..d69f54d
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/common.c
@@ -0,0 +1,30 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <stdio.h>
+#include "main/errors.h"
+
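+/* _mesa_error_no_memory() is normally provided by Mesa's main/errors.c.
+ * This stand-in lets the standalone compiler tests link without pulling in
+ * the rest of main/.
+ */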
+void
+_mesa_error_no_memory(const char *caller)
+{
+   fprintf(stderr, "Mesa error: out of memory in %s\n", caller);
+}
diff --git a/icd/intel/compiler/shader/tests/compare_ir b/icd/intel/compiler/shader/tests/compare_ir
new file mode 100644
index 0000000..a40fc81
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/compare_ir
@@ -0,0 +1,59 @@
+#!/usr/bin/env python
+# coding=utf-8
+#
+# Copyright © 2011 Intel Corporation
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice (including the next
+# paragraph) shall be included in all copies or substantial portions of the
+# Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+# DEALINGS IN THE SOFTWARE.
+
+# Compare two files containing IR code.  Ignore formatting differences
+# and declaration order.
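+#
+# Exit status 0 means the two files parse to the same s-expressions once
+# declarations are sorted; otherwise a unified diff of the canonicalized
+# forms is printed and the exit status is 1.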
+
+import os
+import os.path
+import subprocess
+import sys
+import tempfile
+
+from sexps import *
+
+if len(sys.argv) != 3:
+    print 'Usage: compare_ir <file1> <file2>'
+    exit(1)
+
+with open(sys.argv[1]) as f:
+    ir1 = sort_decls(parse_sexp(f.read()))
+with open(sys.argv[2]) as f:
+    ir2 = sort_decls(parse_sexp(f.read()))
+
+if ir1 == ir2:
+    exit(0)
+else:
+    file1, path1 = tempfile.mkstemp(os.path.basename(sys.argv[1]))
+    file2, path2 = tempfile.mkstemp(os.path.basename(sys.argv[2]))
+    try:
+        os.write(file1, '{0}\n'.format(sexp_to_string(ir1)))
+        os.close(file1)
+        os.write(file2, '{0}\n'.format(sexp_to_string(ir2)))
+        os.close(file2)
+        subprocess.call(['diff', '-u', path1, path2])
+    finally:
+        os.remove(path1)
+        os.remove(path2)
+    exit(1)
diff --git a/icd/intel/compiler/shader/tests/copy_constant_to_storage_tests.cpp b/icd/intel/compiler/shader/tests/copy_constant_to_storage_tests.cpp
new file mode 100644
index 0000000..6ab2084
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/copy_constant_to_storage_tests.cpp
@@ -0,0 +1,294 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <gtest/gtest.h>
+#include "main/compiler.h"
+#include "main/mtypes.h"
+#include "main/macros.h"
+#include "ralloc.h"
+#include "uniform_initializer_utils.h"
+
+namespace linker {
+extern void
+copy_constant_to_storage(union gl_constant_value *storage,
+			 const ir_constant *val,
+			 const enum glsl_base_type base_type,
+			 const unsigned int elements);
+}
+
+class copy_constant_to_storage : public ::testing::Test {
+public:
+   void int_test(unsigned rows);
+   void uint_test(unsigned rows);
+   void bool_test(unsigned rows);
+   void sampler_test();
+   void float_test(unsigned columns, unsigned rows);
+
+   virtual void SetUp();
+   virtual void TearDown();
+
+   gl_constant_value storage[17];
+   void *mem_ctx;
+};
+
+void
+copy_constant_to_storage::SetUp()
+{
+   this->mem_ctx = ralloc_context(NULL);
+}
+
+void
+copy_constant_to_storage::TearDown()
+{
+   ralloc_free(this->mem_ctx);
+   this->mem_ctx = NULL;
+}
+
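+/* Each *_test helper below follows the same red-zone pattern: generate_data()
+ * produces a random constant, the storage slots past the constant's
+ * components are filled with sentinel values, the constant is copied into
+ * storage, and verify_data() is expected to check both the copied components
+ * and that the sentinels were left untouched.
+ */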
+void
+copy_constant_to_storage::int_test(unsigned rows)
+{
+   ir_constant *val;
+   generate_data(mem_ctx, GLSL_TYPE_INT, 1, rows, val);
+
+   const unsigned red_zone_size = Elements(storage) - val->type->components();
+   fill_storage_array_with_sentinels(storage,
+				     val->type->components(),
+				     red_zone_size);
+
+   linker::copy_constant_to_storage(storage,
+				    val,
+				    val->type->base_type,
+				    val->type->components());
+
+   verify_data(storage, 0, val, red_zone_size);
+}
+
+void
+copy_constant_to_storage::uint_test(unsigned rows)
+{
+   ir_constant *val;
+   generate_data(mem_ctx, GLSL_TYPE_UINT, 1, rows, val);
+
+   const unsigned red_zone_size = Elements(storage) - val->type->components();
+   fill_storage_array_with_sentinels(storage,
+				     val->type->components(),
+				     red_zone_size);
+
+   linker::copy_constant_to_storage(storage,
+				    val,
+				    val->type->base_type,
+				    val->type->components());
+
+   verify_data(storage, 0, val, red_zone_size);
+}
+
+void
+copy_constant_to_storage::float_test(unsigned columns, unsigned rows)
+{
+   ir_constant *val;
+   generate_data(mem_ctx, GLSL_TYPE_FLOAT, columns, rows, val);
+
+   const unsigned red_zone_size = Elements(storage) - val->type->components();
+   fill_storage_array_with_sentinels(storage,
+				     val->type->components(),
+				     red_zone_size);
+
+   linker::copy_constant_to_storage(storage,
+				    val,
+				    val->type->base_type,
+				    val->type->components());
+
+   verify_data(storage, 0, val, red_zone_size);
+}
+
+void
+copy_constant_to_storage::bool_test(unsigned rows)
+{
+   ir_constant *val;
+   generate_data(mem_ctx, GLSL_TYPE_BOOL, 1, rows, val);
+
+   const unsigned red_zone_size = Elements(storage) - val->type->components();
+   fill_storage_array_with_sentinels(storage,
+				     val->type->components(),
+				     red_zone_size);
+
+   linker::copy_constant_to_storage(storage,
+				    val,
+				    val->type->base_type,
+				    val->type->components());
+
+   verify_data(storage, 0, val, red_zone_size);
+}
+
+/**
+ * The only difference between this test and int_test is that the base type
+ * passed to \c linker::copy_constant_to_storage is hard-coded to \c
+ * GLSL_TYPE_SAMPLER instead of using the base type from the constant.
+ */
+void
+copy_constant_to_storage::sampler_test(void)
+{
+   ir_constant *val;
+   generate_data(mem_ctx, GLSL_TYPE_INT, 1, 1, val);
+
+   const unsigned red_zone_size = Elements(storage) - val->type->components();
+   fill_storage_array_with_sentinels(storage,
+				     val->type->components(),
+				     red_zone_size);
+
+   linker::copy_constant_to_storage(storage,
+				    val,
+				    GLSL_TYPE_SAMPLER,
+				    val->type->components());
+
+   verify_data(storage, 0, val, red_zone_size);
+}
+
+TEST_F(copy_constant_to_storage, bool_uniform)
+{
+   bool_test(1);
+}
+
+TEST_F(copy_constant_to_storage, bvec2_uniform)
+{
+   bool_test(2);
+}
+
+TEST_F(copy_constant_to_storage, bvec3_uniform)
+{
+   bool_test(3);
+}
+
+TEST_F(copy_constant_to_storage, bvec4_uniform)
+{
+   bool_test(4);
+}
+
+TEST_F(copy_constant_to_storage, int_uniform)
+{
+   int_test(1);
+}
+
+TEST_F(copy_constant_to_storage, ivec2_uniform)
+{
+   int_test(2);
+}
+
+TEST_F(copy_constant_to_storage, ivec3_uniform)
+{
+   int_test(3);
+}
+
+TEST_F(copy_constant_to_storage, ivec4_uniform)
+{
+   int_test(4);
+}
+
+TEST_F(copy_constant_to_storage, uint_uniform)
+{
+   uint_test(1);
+}
+
+TEST_F(copy_constant_to_storage, uvec2_uniform)
+{
+   uint_test(2);
+}
+
+TEST_F(copy_constant_to_storage, uvec3_uniform)
+{
+   uint_test(3);
+}
+
+TEST_F(copy_constant_to_storage, uvec4_uniform)
+{
+   uint_test(4);
+}
+
+TEST_F(copy_constant_to_storage, float_uniform)
+{
+   float_test(1, 1);
+}
+
+TEST_F(copy_constant_to_storage, vec2_uniform)
+{
+   float_test(1, 2);
+}
+
+TEST_F(copy_constant_to_storage, vec3_uniform)
+{
+   float_test(1, 3);
+}
+
+TEST_F(copy_constant_to_storage, vec4_uniform)
+{
+   float_test(1, 4);
+}
+
+TEST_F(copy_constant_to_storage, mat2x2_uniform)
+{
+   float_test(2, 2);
+}
+
+TEST_F(copy_constant_to_storage, mat2x3_uniform)
+{
+   float_test(2, 3);
+}
+
+TEST_F(copy_constant_to_storage, mat2x4_uniform)
+{
+   float_test(2, 4);
+}
+
+TEST_F(copy_constant_to_storage, mat3x2_uniform)
+{
+   float_test(3, 2);
+}
+
+TEST_F(copy_constant_to_storage, mat3x3_uniform)
+{
+   float_test(3, 3);
+}
+
+TEST_F(copy_constant_to_storage, mat3x4_uniform)
+{
+   float_test(3, 4);
+}
+
+TEST_F(copy_constant_to_storage, mat4x2_uniform)
+{
+   float_test(4, 2);
+}
+
+TEST_F(copy_constant_to_storage, mat4x3_uniform)
+{
+   float_test(4, 3);
+}
+
+TEST_F(copy_constant_to_storage, mat4x4_uniform)
+{
+   float_test(4, 4);
+}
+
+TEST_F(copy_constant_to_storage, sampler_uniform)
+{
+   sampler_test();
+}
diff --git a/icd/intel/compiler/shader/tests/general_ir_test.cpp b/icd/intel/compiler/shader/tests/general_ir_test.cpp
new file mode 100644
index 0000000..7bff56b
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/general_ir_test.cpp
@@ -0,0 +1,90 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include "icd-utils.h" // LunarG ADD:
+#include <gtest/gtest.h>
+#include "main/compiler.h"
+#include "main/mtypes.h"
+#include "main/macros.h"
+#include "ralloc.h"
+#include "ir.h"
+
+TEST(ir_variable_constructor, interface)
+{
+   void *mem_ctx = ralloc_context(NULL);
+
+   static const glsl_struct_field f[] = {
+      {
+         glsl_type::vec(4),
+         "v",
+         false
+      }
+   };
+
+   const glsl_type *const interface =
+      glsl_type::get_interface_instance(f,
+                                        ARRAY_SIZE(f),
+                                        GLSL_INTERFACE_PACKING_STD140,
+                                        "simple_interface");
+
+   static const char name[] = "named_instance";
+
+   ir_variable *const v =
+      new(mem_ctx) ir_variable(interface, name, ir_var_uniform);
+
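+   /* The test expects the constructor to make its own copy of the name:
+    * the contents match (EXPECT_STREQ) while the pointers differ
+    * (EXPECT_NE).
+    */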
+   EXPECT_STREQ(name, v->name);
+   EXPECT_NE(name, v->name);
+   EXPECT_EQ(interface, v->type);
+   EXPECT_EQ(interface, v->get_interface_type());
+}
+
+TEST(ir_variable_constructor, interface_array)
+{
+   void *mem_ctx = ralloc_context(NULL);
+
+   static const glsl_struct_field f[] = {
+      {
+         glsl_type::vec(4),
+         "v",
+         false
+      }
+   };
+
+   const glsl_type *const interface =
+      glsl_type::get_interface_instance(f,
+                                        ARRAY_SIZE(f),
+                                        GLSL_INTERFACE_PACKING_STD140,
+                                        "simple_interface");
+
+   const glsl_type *const interface_array =
+      glsl_type::get_array_instance(interface, 2);
+
+   static const char name[] = "array_instance";
+
+   ir_variable *const v =
+      new(mem_ctx) ir_variable(interface_array, name, ir_var_uniform);
+
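+   /* Same copy-of-the-name check as in the interface test above. */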
+   EXPECT_STREQ(name, v->name);
+   EXPECT_NE(name, v->name);
+   EXPECT_EQ(interface_array, v->type);
+   EXPECT_EQ(interface, v->get_interface_type());
+}
diff --git a/icd/intel/compiler/shader/tests/invalidate_locations_test.cpp b/icd/intel/compiler/shader/tests/invalidate_locations_test.cpp
new file mode 100644
index 0000000..997592f
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/invalidate_locations_test.cpp
@@ -0,0 +1,196 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <gtest/gtest.h>
+#include "main/compiler.h"
+#include "main/mtypes.h"
+#include "main/macros.h"
+#include "ralloc.h"
+#include "ir.h"
+#include "linker.h"
+
+/**
+ * \file invalidate_locations_test.cpp
+ *
+ * Test that link_invalidate_variable_locations() handles shader stage
+ * inputs and outputs as expected.
+ */
+
+class invalidate_locations : public ::testing::Test {
+public:
+   virtual void SetUp();
+   virtual void TearDown();
+
+   void *mem_ctx;
+   exec_list ir;
+};
+
+void
+invalidate_locations::SetUp()
+{
+   this->mem_ctx = ralloc_context(NULL);
+   this->ir.make_empty();
+}
+
+void
+invalidate_locations::TearDown()
+{
+   ralloc_free(this->mem_ctx);
+   this->mem_ctx = NULL;
+}
+
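+/* Each test below builds a single ir_variable, primes its location fields,
+ * runs link_invalidate_variable_locations(), and checks which of those
+ * fields survive the invalidation.
+ */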
+TEST_F(invalidate_locations, simple_vertex_in_generic)
+{
+   ir_variable *const var =
+      new(mem_ctx) ir_variable(glsl_type::vec(4),
+                               "a",
+                               ir_var_shader_in);
+
+   EXPECT_FALSE(var->data.explicit_location);
+   EXPECT_EQ(-1, var->data.location);
+
+   var->data.location = VERT_ATTRIB_GENERIC0;
+   var->data.location_frac = 2;
+
+   ir.push_tail(var);
+
+   link_invalidate_variable_locations(&ir);
+
+   EXPECT_EQ(-1, var->data.location);
+   EXPECT_EQ(0u, var->data.location_frac);
+   EXPECT_FALSE(var->data.explicit_location);
+   EXPECT_TRUE(var->data.is_unmatched_generic_inout);
+}
+
+TEST_F(invalidate_locations, explicit_location_vertex_in_generic)
+{
+   ir_variable *const var =
+      new(mem_ctx) ir_variable(glsl_type::vec(4),
+                               "a",
+                               ir_var_shader_in);
+
+   EXPECT_FALSE(var->data.explicit_location);
+   EXPECT_EQ(-1, var->data.location);
+
+   var->data.location = VERT_ATTRIB_GENERIC0;
+   var->data.explicit_location = true;
+
+   ir.push_tail(var);
+
+   link_invalidate_variable_locations(&ir);
+
+   EXPECT_EQ(VERT_ATTRIB_GENERIC0, var->data.location);
+   EXPECT_EQ(0u, var->data.location_frac);
+   EXPECT_TRUE(var->data.explicit_location);
+   EXPECT_FALSE(var->data.is_unmatched_generic_inout);
+}
+
+TEST_F(invalidate_locations, explicit_location_frac_vertex_in_generic)
+{
+   ir_variable *const var =
+      new(mem_ctx) ir_variable(glsl_type::vec(4),
+                               "a",
+                               ir_var_shader_in);
+
+   EXPECT_FALSE(var->data.explicit_location);
+   EXPECT_EQ(-1, var->data.location);
+
+   var->data.location = VERT_ATTRIB_GENERIC0;
+   var->data.location_frac = 2;
+   var->data.explicit_location = true;
+
+   ir.push_tail(var);
+
+   link_invalidate_variable_locations(&ir);
+
+   EXPECT_EQ(VERT_ATTRIB_GENERIC0, var->data.location);
+   EXPECT_EQ(2u, var->data.location_frac);
+   EXPECT_TRUE(var->data.explicit_location);
+   EXPECT_FALSE(var->data.is_unmatched_generic_inout);
+}
+
+TEST_F(invalidate_locations, vertex_in_builtin)
+{
+   ir_variable *const var =
+      new(mem_ctx) ir_variable(glsl_type::vec(4),
+                               "gl_Vertex",
+                               ir_var_shader_in);
+
+   EXPECT_FALSE(var->data.explicit_location);
+   EXPECT_EQ(-1, var->data.location);
+
+   var->data.location = VERT_ATTRIB_POS;
+   var->data.explicit_location = true;
+
+   ir.push_tail(var);
+
+   link_invalidate_variable_locations(&ir);
+
+   EXPECT_EQ(VERT_ATTRIB_POS, var->data.location);
+   EXPECT_EQ(0u, var->data.location_frac);
+   EXPECT_TRUE(var->data.explicit_location);
+   EXPECT_FALSE(var->data.is_unmatched_generic_inout);
+}
+
+TEST_F(invalidate_locations, simple_vertex_out_generic)
+{
+   ir_variable *const var =
+      new(mem_ctx) ir_variable(glsl_type::vec(4),
+                               "a",
+                               ir_var_shader_out);
+
+   EXPECT_FALSE(var->data.explicit_location);
+   EXPECT_EQ(-1, var->data.location);
+
+   var->data.location = VARYING_SLOT_VAR0;
+
+   ir.push_tail(var);
+
+   link_invalidate_variable_locations(&ir);
+
+   EXPECT_EQ(-1, var->data.location);
+   EXPECT_EQ(0u, var->data.location_frac);
+   EXPECT_FALSE(var->data.explicit_location);
+   EXPECT_TRUE(var->data.is_unmatched_generic_inout);
+}
+
+TEST_F(invalidate_locations, vertex_out_builtin)
+{
+   ir_variable *const var =
+      new(mem_ctx) ir_variable(glsl_type::vec(4),
+                               "gl_FrontColor",
+                               ir_var_shader_out);
+
+   EXPECT_FALSE(var->data.explicit_location);
+   EXPECT_EQ(-1, var->data.location);
+
+   var->data.location = VARYING_SLOT_COL0;
+   var->data.explicit_location = true;
+
+   ir.push_tail(var);
+
+   link_invalidate_variable_locations(&ir);
+
+   EXPECT_EQ(VARYING_SLOT_COL0, var->data.location);
+   EXPECT_EQ(0u, var->data.location_frac);
+   EXPECT_TRUE(var->data.explicit_location);
+   EXPECT_FALSE(var->data.is_unmatched_generic_inout);
+}
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/.gitignore b/icd/intel/compiler/shader/tests/lower_jumps/.gitignore
new file mode 100644
index 0000000..f47cb20
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/.gitignore
@@ -0,0 +1 @@
+*.out
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/create_test_cases.py b/icd/intel/compiler/shader/tests/lower_jumps/create_test_cases.py
new file mode 100644
index 0000000..9974681
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/create_test_cases.py
@@ -0,0 +1,643 @@
+# coding=utf-8
+#
+# Copyright © 2011 Intel Corporation
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice (including the next
+# paragraph) shall be included in all copies or substantial portions of the
+# Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+# DEALINGS IN THE SOFTWARE.
+
+import os
+import os.path
+import re
+import subprocess
+import sys
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..')) # For access to sexps.py, which is in parent dir
+from sexps import *
+
+def make_test_case(f_name, ret_type, body):
+    """Create a simple optimization test case consisting of a single
+    function with the given name, return type, and body.
+
+    Global declarations are automatically created for any undeclared
+    variables that are referenced by the function.  All undeclared
+    variables are assumed to be floats.
+    """
+    check_sexp(body)
+    declarations = {}
+    def make_declarations(sexp, already_declared = ()):
+        if isinstance(sexp, list):
+            if len(sexp) == 2 and sexp[0] == 'var_ref':
+                if sexp[1] not in already_declared:
+                    declarations[sexp[1]] = [
+                        'declare', ['in'], 'float', sexp[1]]
+            elif len(sexp) == 4 and sexp[0] == 'assign':
+                assert sexp[2][0] == 'var_ref'
+                if sexp[2][1] not in already_declared:
+                    declarations[sexp[2][1]] = [
+                        'declare', ['out'], 'float', sexp[2][1]]
+                make_declarations(sexp[3], already_declared)
+            else:
+                already_declared = set(already_declared)
+                for s in sexp:
+                    if isinstance(s, list) and len(s) >= 4 and \
+                            s[0] == 'declare':
+                        already_declared.add(s[3])
+                    else:
+                        make_declarations(s, already_declared)
+    make_declarations(body)
+    return declarations.values() + \
+        [['function', f_name, ['signature', ret_type, ['parameters'], body]]]
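+
+# For illustration, the call made by test_lower_returns_1 below,
+#
+#     make_test_case('main', 'void', assign_x('a', const_float(1)) + return_())
+#
+# emits the global declaration ['declare', ['out'], 'float', 'a'] followed by
+# the (function main ...) signature wrapping the two statements.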
+
+
+# The following functions can be used to build expressions.
+
+def const_float(value):
+    """Create an expression representing the given floating point value."""
+    return ['constant', 'float', ['{0:.6f}'.format(value)]]
+
+def const_bool(value):
+    """Create an expression representing the given boolean value.
+
+    If value is not a boolean, it is converted to a boolean.  So, for
+    instance, const_bool(1) is equivalent to const_bool(True).
+    """
+    return ['constant', 'bool', ['{0}'.format(1 if value else 0)]]
+
+def gt_zero(var_name):
+    """Create Construct the expression var_name > 0"""
+    return ['expression', 'bool', '>', ['var_ref', var_name], const_float(0)]
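+
+# e.g. gt_zero('b') builds the s-expression that prints as
+# (expression bool > (var_ref b) (constant float (0.000000))).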
+
+
+# The following functions can be used to build complex control flow
+# statements.  All of these functions return statement lists (even
+# those which only create a single statement), so that statements can
+# be sequenced together using the '+' operator.
+
+def return_(value = None):
+    """Create a return statement."""
+    if value is not None:
+        return [['return', value]]
+    else:
+        return [['return']]
+
+def break_():
+    """Create a break statement."""
+    return ['break']
+
+def continue_():
+    """Create a continue statement."""
+    return ['continue']
+
+def simple_if(var_name, then_statements, else_statements = None):
+    """Create a statement of the form
+
+    if (var_name > 0.0) {
+       <then_statements>
+    } else {
+       <else_statements>
+    }
+
+    else_statements may be omitted.
+    """
+    if else_statements is None:
+        else_statements = []
+    check_sexp(then_statements)
+    check_sexp(else_statements)
+    return [['if', gt_zero(var_name), then_statements, else_statements]]
+
+def loop(statements):
+    """Create a loop containing the given statements as its loop
+    body.
+    """
+    check_sexp(statements)
+    return [['loop', [], [], [], [], statements]]
+
+def declare_temp(var_type, var_name):
+    """Create a declaration of the form
+
+    (declare (temporary) <var_type> <var_name>)
+    """
+    return [['declare', ['temporary'], var_type, var_name]]
+
+def assign_x(var_name, value):
+    """Create a statement that assigns <value> to the variable
+    <var_name>.  The assignment uses the mask (x).
+    """
+    check_sexp(value)
+    return [['assign', ['x'], ['var_ref', var_name], value]]
+
+def complex_if(var_prefix, statements):
+    """Create a statement of the form
+
+    if (<var_prefix>a > 0.0) {
+       if (<var_prefix>b > 0.0) {
+          <statements>
+       }
+    }
+
+    This is useful in testing jump lowering, because if <statements>
+    ends in a jump, lower_jumps.cpp won't try to combine this
+    construct with the code that follows it, as it might do for a
+    simple if.
+
+    All variables used in the if statement are prefixed with
+    var_prefix.  This can be used to ensure uniqueness.
+    """
+    check_sexp(statements)
+    return simple_if(var_prefix + 'a', simple_if(var_prefix + 'b', statements))
+
+def declare_execute_flag():
+    """Create the statements that lower_jumps.cpp uses to declare and
+    initialize the temporary boolean execute_flag.
+    """
+    return declare_temp('bool', 'execute_flag') + \
+        assign_x('execute_flag', const_bool(True))
+
+def declare_return_flag():
+    """Create the statements that lower_jumps.cpp uses to declare and
+    initialize the temporary boolean return_flag.
+    """
+    return declare_temp('bool', 'return_flag') + \
+        assign_x('return_flag', const_bool(False))
+
+def declare_return_value():
+    """Create the statements that lower_jumps.cpp uses to declare and
+    initialize the temporary variable return_value.  Assume that
+    return_value is a float.
+    """
+    return declare_temp('float', 'return_value')
+
+def declare_break_flag():
+    """Create the statements that lower_jumps.cpp uses to declare and
+    initialize the temporary boolean break_flag.
+    """
+    return declare_temp('bool', 'break_flag') + \
+        assign_x('break_flag', const_bool(False))
+
+def lowered_return_simple(value = None):
+    """Create the statements that lower_jumps.cpp lowers a return
+    statement to, in situations where it does not need to clear the
+    execute flag.
+    """
+    if value:
+        result = assign_x('return_value', value)
+    else:
+        result = []
+    return result + assign_x('return_flag', const_bool(True))
+
+def lowered_return(value = None):
+    """Create the statements that lower_jumps.cpp lowers a return
+    statement to, in situations where it needs to clear the execute
+    flag.
+    """
+    return lowered_return_simple(value) + \
+        assign_x('execute_flag', const_bool(False))
+
+def lowered_continue():
+    """Create the statement that lower_jumps.cpp lowers a continue
+    statement to.
+    """
+    return assign_x('execute_flag', const_bool(False))
+
+def lowered_break_simple():
+    """Create the statement that lower_jumps.cpp lowers a break
+    statement to, in situations where it does not need to clear the
+    execute flag.
+    """
+    return assign_x('break_flag', const_bool(True))
+
+def lowered_break():
+    """Create the statement that lower_jumps.cpp lowers a break
+    statement to, in situations where it needs to clear the execute
+    flag.
+    """
+    return lowered_break_simple() + assign_x('execute_flag', const_bool(False))
+
+def if_execute_flag(statements):
+    """Wrap statements in an if test so that they will only execute if
+    execute_flag is True.
+    """
+    check_sexp(statements)
+    return [['if', ['var_ref', 'execute_flag'], statements, []]]
+
+def if_not_return_flag(statements):
+    """Wrap statements in an if test so that they will only execute if
+    return_flag is False.
+    """
+    check_sexp(statements)
+    return [['if', ['var_ref', 'return_flag'], [], statements]]
+
+def final_return():
+    """Create the return statement that lower_jumps.cpp places at the
+    end of a function when lowering returns.
+    """
+    return [['return', ['var_ref', 'return_value']]]
+
+def final_break():
+    """Create the conditional break statement that lower_jumps.cpp
+    places at the end of a function when lowering breaks.
+    """
+    return [['if', ['var_ref', 'break_flag'], break_(), []]]
+
+def bash_quote(*args):
+    """Quote the arguments appropriately so that bash will understand
+    each argument as a single word.
+    """
+    def quote_word(word):
+        for c in word:
+            if not (c.isalpha() or c.isdigit() or c in '@%_-+=:,./'):
+                break
+        else:
+            if not word:
+                return "''"
+            return word
+        return "'{0}'".format(word.replace("'", "'\"'\"'"))
+    return ' '.join(quote_word(word) for word in args)
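+
+# e.g. bash_quote('a b', '-c') returns "'a b' -c"; embedded single quotes are
+# escaped as '"'"'.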
+
+def create_test_case(doc_string, input_sexp, expected_sexp, test_name,
+                     pull_out_jumps=False, lower_sub_return=False,
+                     lower_main_return=False, lower_continue=False,
+                     lower_break=False):
+    """Create a test case that verifies that do_lower_jumps transforms
+    the given code in the expected way.
+    """
+    doc_lines = [line.strip() for line in doc_string.splitlines()]
+    doc_string = ''.join('# {0}\n'.format(line) for line in doc_lines if line != '')
+    check_sexp(input_sexp)
+    check_sexp(expected_sexp)
+    input_str = sexp_to_string(sort_decls(input_sexp))
+    expected_output = sexp_to_string(sort_decls(expected_sexp))
+
+    optimization = (
+        'do_lower_jumps({0:d}, {1:d}, {2:d}, {3:d}, {4:d})'.format(
+            pull_out_jumps, lower_sub_return, lower_main_return,
+            lower_continue, lower_break))
+    args = ['../../glsl_test', 'optpass', '--quiet', '--input-ir', optimization]
+    test_file = '{0}.opt_test'.format(test_name)
+    with open(test_file, 'w') as f:
+        f.write('#!/usr/bin/env bash\n#\n# This file was generated by create_test_cases.py.\n#\n')
+        f.write(doc_string)
+        f.write('{0} <<EOF\n'.format(bash_quote(*args)))
+        f.write('{0}\nEOF\n'.format(input_str))
+    os.chmod(test_file, 0774)
+    expected_file = '{0}.opt_test.expected'.format(test_name)
+    with open(expected_file, 'w') as f:
+        f.write('{0}\n'.format(expected_output))
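+
+# Each call therefore emits a pair of files: <test_name>.opt_test, an
+# executable bash script (see e.g. lower_breaks_1.opt_test below), and
+# <test_name>.opt_test.expected holding the canonicalized expected IR.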
+
+def test_lower_returns_main():
+    doc_string = """Test that do_lower_jumps respects the lower_main_return
+    flag in deciding whether to lower returns in the main
+    function.
+    """
+    input_sexp = make_test_case('main', 'void', (
+            complex_if('', return_())
+            ))
+    expected_sexp = make_test_case('main', 'void', (
+            declare_execute_flag() +
+            declare_return_flag() +
+            complex_if('', lowered_return())
+            ))
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_returns_main_true',
+                     lower_main_return=True)
+    create_test_case(doc_string, input_sexp, input_sexp, 'lower_returns_main_false',
+                     lower_main_return=False)
+
+def test_lower_returns_sub():
+    doc_string = """Test that do_lower_jumps respects the lower_sub_return flag
+    in deciding whether to lower returns in subroutines.
+    """
+    input_sexp = make_test_case('sub', 'void', (
+            complex_if('', return_())
+            ))
+    expected_sexp = make_test_case('sub', 'void', (
+            declare_execute_flag() +
+            declare_return_flag() +
+            complex_if('', lowered_return())
+            ))
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_returns_sub_true',
+                     lower_sub_return=True)
+    create_test_case(doc_string, input_sexp, input_sexp, 'lower_returns_sub_false',
+                     lower_sub_return=False)
+
+def test_lower_returns_1():
+    doc_string = """Test that a void return at the end of a function is
+    eliminated.
+    """
+    input_sexp = make_test_case('main', 'void', (
+            assign_x('a', const_float(1)) +
+            return_()
+            ))
+    expected_sexp = make_test_case('main', 'void', (
+            assign_x('a', const_float(1))
+            ))
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_returns_1',
+                     lower_main_return=True)
+
+def test_lower_returns_2():
+    doc_string = """Test that lowering is not performed on a non-void return at
+    the end of a subroutine.
+    """
+    input_sexp = make_test_case('sub', 'float', (
+            assign_x('a', const_float(1)) +
+            return_(const_float(1))
+            ))
+    create_test_case(doc_string, input_sexp, input_sexp, 'lower_returns_2',
+                     lower_sub_return=True)
+
+def test_lower_returns_3():
+    doc_string = """Test lowering of returns when there is one nested inside a
+    complex structure of ifs, and one at the end of a function.
+
+    In this case, the latter return needs to be lowered because it
+    will not be at the end of the function once the final return
+    is inserted.
+    """
+    input_sexp = make_test_case('sub', 'float', (
+            complex_if('', return_(const_float(1))) +
+            return_(const_float(2))
+            ))
+    expected_sexp = make_test_case('sub', 'float', (
+            declare_execute_flag() +
+            declare_return_value() +
+            declare_return_flag() +
+            complex_if('', lowered_return(const_float(1))) +
+            if_execute_flag(lowered_return(const_float(2))) +
+            final_return()
+            ))
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_returns_3',
+                     lower_sub_return=True)
+
+def test_lower_returns_4():
+    doc_string = """Test that returns are properly lowered when they occur in
+    both branches of an if-statement.
+    """
+    input_sexp = make_test_case('sub', 'float', (
+            simple_if('a', return_(const_float(1)),
+                      return_(const_float(2)))
+            ))
+    expected_sexp = make_test_case('sub', 'float', (
+            declare_execute_flag() +
+            declare_return_value() +
+            declare_return_flag() +
+            simple_if('a', lowered_return(const_float(1)),
+                      lowered_return(const_float(2))) +
+            final_return()
+            ))
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_returns_4',
+                     lower_sub_return=True)
+
+def test_lower_unified_returns():
+    doc_string = """If both branches of an if statement end in a return, and
+    pull_out_jumps is True, then those returns should be lifted
+    outside the if and then properly lowered.
+
+    Verify that this lowering occurs during the same pass as the
+    lowering of other returns by checking that extra temporary
+    variables aren't generated.
+    """
+    input_sexp = make_test_case('main', 'void', (
+            complex_if('a', return_()) +
+            simple_if('b', simple_if('c', return_(), return_()))
+            ))
+    expected_sexp = make_test_case('main', 'void', (
+            declare_execute_flag() +
+            declare_return_flag() +
+            complex_if('a', lowered_return()) +
+            if_execute_flag(simple_if('b', (simple_if('c', [], []) +
+                                            lowered_return())))
+            ))
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_unified_returns',
+                     lower_main_return=True, pull_out_jumps=True)
+
+def test_lower_pulled_out_jump():
+    doc_string = """If one branch of an if ends in a jump, and control cannot
+    fall out the bottom of the other branch, and pull_out_jumps is
+    True, then the jump is lifted outside the if.
+
+    Verify that this lowering occurs during the same pass as the
+    lowering of other jumps by checking that extra temporary
+    variables aren't generated.
+    """
+    input_sexp = make_test_case('main', 'void', (
+            complex_if('a', return_()) +
+            loop(simple_if('b', simple_if('c', break_(), continue_()),
+                           return_())) +
+            assign_x('d', const_float(1))
+            ))
+    # Note: optimization produces two other effects: the break
+    # gets lifted out of the if statements, and the code after the
+    # loop gets guarded so that it only executes if the return
+    # flag is clear.
+    expected_sexp = make_test_case('main', 'void', (
+            declare_execute_flag() +
+            declare_return_flag() +
+            complex_if('a', lowered_return()) +
+            if_execute_flag(
+                loop(simple_if('b', simple_if('c', [], continue_()),
+                               lowered_return_simple()) +
+                     break_()) +
+                if_not_return_flag(assign_x('d', const_float(1))))
+            ))
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_pulled_out_jump',
+                     lower_main_return=True, pull_out_jumps=True)
+
+def test_lower_breaks_1():
+    doc_string = """If a loop contains an unconditional break at the bottom of
+    it, it should not be lowered."""
+    input_sexp = make_test_case('main', 'void', (
+            loop(assign_x('a', const_float(1)) +
+                 break_())
+            ))
+    expected_sexp = input_sexp
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_breaks_1', lower_break=True)
+
+def test_lower_breaks_2():
+    doc_string = """If a loop contains a conditional break at the bottom of it,
+    it should not be lowered if it is in the then-clause.
+    """
+    input_sexp = make_test_case('main', 'void', (
+            loop(assign_x('a', const_float(1)) +
+                 simple_if('b', break_()))
+            ))
+    expected_sexp = input_sexp
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_breaks_2', lower_break=True)
+
+def test_lower_breaks_3():
+    doc_string = """If a loop contains a conditional break at the bottom of it,
+    it should not be lowered if it is in the then-clause, even if
+    there are statements preceding the break.
+    """
+    input_sexp = make_test_case('main', 'void', (
+            loop(assign_x('a', const_float(1)) +
+                 simple_if('b', (assign_x('c', const_float(1)) +
+                                 break_())))
+            ))
+    expected_sexp = input_sexp
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_breaks_3', lower_break=True)
+
+def test_lower_breaks_4():
+    doc_string = """If a loop contains a conditional break at the bottom of it,
+    it should not be lowered if it is in the else-clause.
+    """
+    input_sexp = make_test_case('main', 'void', (
+            loop(assign_x('a', const_float(1)) +
+                 simple_if('b', [], break_()))
+            ))
+    expected_sexp = input_sexp
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_breaks_4', lower_break=True)
+
+def test_lower_breaks_5():
+    doc_string = """If a loop contains a conditional break at the bottom of it,
+    it should not be lowered if it is in the else-clause, even if
+    there are statements preceding the break.
+    """
+    input_sexp = make_test_case('main', 'void', (
+            loop(assign_x('a', const_float(1)) +
+                 simple_if('b', [], (assign_x('c', const_float(1)) +
+                                     break_())))
+            ))
+    expected_sexp = input_sexp
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_breaks_5', lower_break=True)
+
+def test_lower_breaks_6():
+    doc_string = """If a loop contains conditional breaks and continues, and
+    ends in an unconditional break, then the unconditional break
+    needs to be lowered, because it will no longer be at the end
+    of the loop after the final break is added.
+    """
+    input_sexp = make_test_case('main', 'void', (
+            loop(simple_if('a', (complex_if('b', continue_()) +
+                                 complex_if('c', break_()))) +
+                 break_())
+            ))
+    expected_sexp = make_test_case('main', 'void', (
+            declare_break_flag() +
+            loop(declare_execute_flag() +
+                 simple_if(
+                    'a',
+                    (complex_if('b', lowered_continue()) +
+                     if_execute_flag(
+                            complex_if('c', lowered_break())))) +
+                 if_execute_flag(lowered_break_simple()) +
+                 final_break())
+            ))
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_breaks_6',
+                     lower_break=True, lower_continue=True)
+
+def test_lower_guarded_conditional_break():
+    doc_string = """Normally a conditional break at the end of a loop isn't
+    lowered; however, if the conditional break gets placed inside
+    an if(execute_flag) because of earlier lowering of continues,
+    then the break needs to be lowered.
+    """
+    input_sexp = make_test_case('main', 'void', (
+            loop(complex_if('a', continue_()) +
+                 simple_if('b', break_()))
+            ))
+    expected_sexp = make_test_case('main', 'void', (
+            declare_break_flag() +
+            loop(declare_execute_flag() +
+                 complex_if('a', lowered_continue()) +
+                 if_execute_flag(simple_if('b', lowered_break())) +
+                 final_break())
+            ))
+    create_test_case(doc_string, input_sexp, expected_sexp, 'lower_guarded_conditional_break',
+                     lower_break=True, lower_continue=True)
+
+def test_remove_continue_at_end_of_loop():
+    doc_string = """Test that a redundant continue-statement at the end of a
+    loop is removed.
+    """
+    input_sexp = make_test_case('main', 'void', (
+            loop(assign_x('a', const_float(1)) +
+                 continue_())
+            ))
+    expected_sexp = make_test_case('main', 'void', (
+            loop(assign_x('a', const_float(1)))
+            ))
+    create_test_case(doc_string, input_sexp, expected_sexp, 'remove_continue_at_end_of_loop')
+
+def test_lower_return_void_at_end_of_loop():
+    doc_string = """Test that a return of void at the end of a loop is properly
+    lowered.
+    """
+    input_sexp = make_test_case('main', 'void', (
+            loop(assign_x('a', const_float(1)) +
+                 return_()) +
+            assign_x('b', const_float(2))
+            ))
+    expected_sexp = make_test_case('main', 'void', (
+            declare_return_flag() +
+            loop(assign_x('a', const_float(1)) +
+                 lowered_return_simple() +
+                 break_()) +
+            if_not_return_flag(assign_x('b', const_float(2)))
+            ))
+    create_test_case(doc_string, input_sexp, input_sexp, 'return_void_at_end_of_loop_lower_nothing')
+    create_test_case(doc_string, input_sexp, expected_sexp, 'return_void_at_end_of_loop_lower_return',
+                     lower_main_return=True)
+    create_test_case(doc_string, input_sexp, expected_sexp, 'return_void_at_end_of_loop_lower_return_and_break',
+                     lower_main_return=True, lower_break=True)
+
+def test_lower_return_non_void_at_end_of_loop():
+    doc_string = """Test that a non-void return at the end of a loop is
+    properly lowered.
+    """
+    input_sexp = make_test_case('sub', 'float', (
+            loop(assign_x('a', const_float(1)) +
+                 return_(const_float(2))) +
+            assign_x('b', const_float(3)) +
+            return_(const_float(4))
+            ))
+    expected_sexp = make_test_case('sub', 'float', (
+            declare_execute_flag() +
+            declare_return_value() +
+            declare_return_flag() +
+            loop(assign_x('a', const_float(1)) +
+                 lowered_return_simple(const_float(2)) +
+                 break_()) +
+            if_not_return_flag(assign_x('b', const_float(3)) +
+                               lowered_return(const_float(4))) +
+            final_return()
+            ))
+    create_test_case(doc_string, input_sexp, input_sexp, 'return_non_void_at_end_of_loop_lower_nothing')
+    create_test_case(doc_string, input_sexp, expected_sexp, 'return_non_void_at_end_of_loop_lower_return',
+                     lower_sub_return=True)
+    create_test_case(doc_string, input_sexp, expected_sexp, 'return_non_void_at_end_of_loop_lower_return_and_break',
+                     lower_sub_return=True, lower_break=True)
+
+if __name__ == '__main__':
+    test_lower_returns_main()
+    test_lower_returns_sub()
+    test_lower_returns_1()
+    test_lower_returns_2()
+    test_lower_returns_3()
+    test_lower_returns_4()
+    test_lower_unified_returns()
+    test_lower_pulled_out_jump()
+    test_lower_breaks_1()
+    test_lower_breaks_2()
+    test_lower_breaks_3()
+    test_lower_breaks_4()
+    test_lower_breaks_5()
+    test_lower_breaks_6()
+    test_lower_guarded_conditional_break()
+    test_remove_continue_at_end_of_loop()
+    test_lower_return_void_at_end_of_loop()
+    test_lower_return_non_void_at_end_of_loop()
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_1.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_1.opt_test
new file mode 100644
index 0000000..b412ba8
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_1.opt_test
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# If a loop contains an unconditional break at the bottom of
+# it, it should not be lowered.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 0, 0, 1)' <<EOF
+((declare (out) float a)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000))) break))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_1.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_1.opt_test.expected
new file mode 100644
index 0000000..56ef3e4
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_1.opt_test.expected
@@ -0,0 +1,5 @@
+((declare (out) float a)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000))) break))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_2.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_2.opt_test
new file mode 100644
index 0000000..f5de803
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_2.opt_test
@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# If a loop contains a conditional break at the bottom of it,
+# it should not be lowered if it is in the then-clause.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 0, 0, 1)' <<EOF
+((declare (in) float b) (declare (out) float a)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (if (expression bool > (var_ref b) (constant float (0.000000))) (break)
+       ())))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_2.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_2.opt_test.expected
new file mode 100644
index 0000000..dc231f9
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_2.opt_test.expected
@@ -0,0 +1,7 @@
+((declare (in) float b) (declare (out) float a)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (if (expression bool > (var_ref b) (constant float (0.0))) (break)
+       ())))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_3.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_3.opt_test
new file mode 100644
index 0000000..60368bc
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_3.opt_test
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# If a loop contains a conditional break at the bottom of it,
+# it should not be lowered if it is in the then-clause, even if
+# there are statements preceding the break.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 0, 0, 1)' <<EOF
+((declare (in) float b) (declare (out) float a) (declare (out) float c)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (if (expression bool > (var_ref b) (constant float (0.000000)))
+       ((assign (x) (var_ref c) (constant float (1.000000))) break)
+       ())))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_3.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_3.opt_test.expected
new file mode 100644
index 0000000..8131b66
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_3.opt_test.expected
@@ -0,0 +1,8 @@
+((declare (in) float b) (declare (out) float a) (declare (out) float c)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (if (expression bool > (var_ref b) (constant float (0.0)))
+       ((assign (x) (var_ref c) (constant float (1.000000))) break)
+       ())))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_4.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_4.opt_test
new file mode 100644
index 0000000..cde3197
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_4.opt_test
@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# If a loop contains a conditional break at the bottom of it,
+# it should not be lowered if it is in the else-clause.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 0, 0, 1)' <<EOF
+((declare (in) float b) (declare (out) float a)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (if (expression bool > (var_ref b) (constant float (0.000000))) ()
+       (break))))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_4.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_4.opt_test.expected
new file mode 100644
index 0000000..94dcb37
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_4.opt_test.expected
@@ -0,0 +1,7 @@
+((declare (in) float b) (declare (out) float a)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (if (expression bool > (var_ref b) (constant float (0.0))) ()
+       (break))))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_5.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_5.opt_test
new file mode 100644
index 0000000..157b589
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_5.opt_test
@@ -0,0 +1,16 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# If a loop contains a conditional break at the bottom of it,
+# it should not be lowered if it is in the else-clause, even if
+# there are statements preceding the break.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 0, 0, 1)' <<EOF
+((declare (in) float b) (declare (out) float a) (declare (out) float c)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (if (expression bool > (var_ref b) (constant float (0.000000))) ()
+       ((assign (x) (var_ref c) (constant float (1.000000))) break))))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_5.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_5.opt_test.expected
new file mode 100644
index 0000000..5b46ccb
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_5.opt_test.expected
@@ -0,0 +1,7 @@
+((declare (in) float b) (declare (out) float a) (declare (out) float c)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (if (expression bool > (var_ref b) (constant float (0.0))) ()
+       ((assign (x) (var_ref c) (constant float (1.000000))) break))))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_6.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_6.opt_test
new file mode 100644
index 0000000..4767df1
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_6.opt_test
@@ -0,0 +1,29 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# If a loop contains conditional breaks and continues, and
+# ends in an unconditional break, then the unconditional break
+# needs to be lowered, because it will no longer be at the end
+# of the loop after the final break is added.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 0, 1, 1)' <<EOF
+((declare (in) float a) (declare (in) float ba) (declare (in) float bb)
+ (declare (in) float ca)
+ (declare (in) float cb)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((if (expression bool > (var_ref a) (constant float (0.000000)))
+       ((if (expression bool > (var_ref ba) (constant float (0.000000)))
+         ((if (expression bool > (var_ref bb) (constant float (0.000000)))
+           (continue)
+           ()))
+         ())
+        (if (expression bool > (var_ref ca) (constant float (0.000000)))
+         ((if (expression bool > (var_ref cb) (constant float (0.000000)))
+           (break)
+           ()))
+         ()))
+       ())
+      break))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_6.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_6.opt_test.expected
new file mode 100644
index 0000000..967ce64
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_breaks_6.opt_test.expected
@@ -0,0 +1,29 @@
+((declare (in) float a) (declare (in) float ba) (declare (in) float bb)
+ (declare (in) float ca)
+ (declare (in) float cb)
+ (function main
+  (signature void (parameters)
+   ((declare (temporary) bool break_flag)
+    (assign (x) (var_ref break_flag) (constant bool (0)))
+    (loop
+     ((declare (temporary) bool execute_flag)
+      (assign (x) (var_ref execute_flag) (constant bool (1)))
+      (if (expression bool > (var_ref a) (constant float (0.0)))
+       ((if (expression bool > (var_ref ba) (constant float (0.0)))
+         ((if (expression bool > (var_ref bb) (constant float (0.0)))
+           ((assign (x) (var_ref execute_flag) (constant bool (0))))
+           ()))
+         ())
+        (if (var_ref execute_flag)
+         ((if (expression bool > (var_ref ca) (constant float (0.0)))
+           ((if (expression bool > (var_ref cb) (constant float (0.0)))
+             ((assign (x) (var_ref break_flag) (constant bool (1)))
+              (assign (x) (var_ref execute_flag) (constant bool (0))))
+             ()))
+           ()))
+         ()))
+       ())
+      (if (var_ref execute_flag)
+       ((assign (x) (var_ref break_flag) (constant bool (1))))
+       ())
+      (if (var_ref break_flag) (break) ())))))))
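The expected output above shows the subtlety described in the test's header comment: once the conditional jumps are rewritten into break_flag/execute_flag updates and a flag check is appended, the once-trailing unconditional break is no longer at the end of the loop, so it too becomes a guarded flag write. A condensed C sketch of just that loop tail (an illustration of the pattern in the IR above, not code from the tree):

    #include <stdbool.h>

    /* Condensed sketch of the lowered loop tail shown in the expected IR
     * above: the original trailing unconditional break is rewritten as a
     * guarded flag write, and one appended break remains. */
    void loop_tail_sketch(void)
    {
       bool break_flag = false;
       while (true) {
          bool execute_flag = true;
          /* ... lowered conditional continues/breaks clear execute_flag
           * and may set break_flag (elided here) ... */
          if (execute_flag)
             break_flag = true;   /* was the unconditional break at loop end */
          if (break_flag)
             break;               /* the single break appended by the pass */
       }
    }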
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_guarded_conditional_break.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_guarded_conditional_break.opt_test
new file mode 100644
index 0000000..164914a
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_guarded_conditional_break.opt_test
@@ -0,0 +1,21 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Normally a conditional break at the end of a loop isn't
+# lowered, however if the conditional break gets placed inside
+# an if(execute_flag) because of earlier lowering of continues,
+# then the break needs to be lowered.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 0, 1, 1)' <<EOF
+((declare (in) float aa) (declare (in) float ab) (declare (in) float b)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((if (expression bool > (var_ref aa) (constant float (0.000000)))
+       ((if (expression bool > (var_ref ab) (constant float (0.000000)))
+         (continue)
+         ()))
+       ())
+      (if (expression bool > (var_ref b) (constant float (0.000000))) (break)
+       ())))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_guarded_conditional_break.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_guarded_conditional_break.opt_test.expected
new file mode 100644
index 0000000..841073e
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_guarded_conditional_break.opt_test.expected
@@ -0,0 +1,20 @@
+((declare (in) float aa) (declare (in) float ab) (declare (in) float b)
+ (function main
+  (signature void (parameters)
+   ((declare (temporary) bool break_flag)
+    (assign (x) (var_ref break_flag) (constant bool (0)))
+    (loop
+     ((declare (temporary) bool execute_flag)
+      (assign (x) (var_ref execute_flag) (constant bool (1)))
+      (if (expression bool > (var_ref aa) (constant float (0.0)))
+       ((if (expression bool > (var_ref ab) (constant float (0.0)))
+         ((assign (x) (var_ref execute_flag) (constant bool (0))))
+         ()))
+       ())
+      (if (var_ref execute_flag)
+       ((if (expression bool > (var_ref b) (constant float (0.0)))
+         ((assign (x) (var_ref break_flag) (constant bool (1)))
+          (assign (x) (var_ref execute_flag) (constant bool (0))))
+         ()))
+       ())
+      (if (var_ref break_flag) (break) ())))))))
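Read as structured code, the transformation above is: the continue becomes a cleared execute_flag, and the conditional break, now trapped inside the if (execute_flag) guard, must itself be lowered. A minimal C rendering of the expected IR (temporary names copied from the IR; illustrative only):

    #include <stdbool.h>

    /* C rendering of the expected IR above.  The per-iteration
     * execute_flag replaces the continue; the guarded conditional break
     * becomes a break_flag write; one real break remains at the end. */
    void lowered_main(float aa, float ab, float b)
    {
       bool break_flag = false;
       while (true) {
          bool execute_flag = true;
          if (aa > 0.0f) {
             if (ab > 0.0f)
                execute_flag = false;   /* was: continue */
          }
          if (execute_flag) {
             if (b > 0.0f) {
                break_flag = true;      /* was: break */
                execute_flag = false;
             }
          }
          if (break_flag)
             break;
       }
    }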
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_pulled_out_jump.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_pulled_out_jump.opt_test
new file mode 100644
index 0000000..1a5c096
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_pulled_out_jump.opt_test
@@ -0,0 +1,28 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# If one branch of an if ends in a jump, and control cannot
+# fall out the bottom of the other branch, and pull_out_jumps is
+# True, then the jump is lifted outside the if.
+# Verify that this lowering occurs during the same pass as the
+# lowering of other jumps by checking that extra temporary
+# variables aren't generated.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(1, 0, 1, 0, 0)' <<EOF
+((declare (in) float aa) (declare (in) float ab) (declare (in) float b)
+ (declare (in) float c)
+ (declare (out) float d)
+ (function main
+  (signature void (parameters)
+   ((if (expression bool > (var_ref aa) (constant float (0.000000)))
+     ((if (expression bool > (var_ref ab) (constant float (0.000000)))
+       ((return))
+       ()))
+     ())
+    (loop
+     ((if (expression bool > (var_ref b) (constant float (0.000000)))
+       ((if (expression bool > (var_ref c) (constant float (0.000000))) (break)
+         (continue)))
+       ((return)))))
+    (assign (x) (var_ref d) (constant float (1.000000)))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_pulled_out_jump.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_pulled_out_jump.opt_test.expected
new file mode 100644
index 0000000..cf2ef3f
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_pulled_out_jump.opt_test.expected
@@ -0,0 +1,25 @@
+((declare (in) float aa) (declare (in) float ab) (declare (in) float b)
+ (declare (in) float c)
+ (declare (out) float d)
+ (function main
+  (signature void (parameters)
+   ((declare (temporary) bool execute_flag)
+    (assign (x) (var_ref execute_flag) (constant bool (1)))
+    (declare (temporary) bool return_flag)
+    (assign (x) (var_ref return_flag) (constant bool (0)))
+    (if (expression bool > (var_ref aa) (constant float (0.0)))
+     ((if (expression bool > (var_ref ab) (constant float (0.0)))
+       ((assign (x) (var_ref return_flag) (constant bool (1)))
+        (assign (x) (var_ref execute_flag) (constant bool (0))))
+       ()))
+     ())
+    (if (var_ref execute_flag)
+     ((loop
+       ((if (expression bool > (var_ref b) (constant float (0.0)))
+         ((if (expression bool > (var_ref c) (constant float (0.0))) ()
+           (continue)))
+         ((assign (x) (var_ref return_flag) (constant bool (1)))))
+        break))
+      (if (var_ref return_flag) ()
+       ((assign (x) (var_ref d) (constant float (1.000000))))))
+     ())))))
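The pull_out_jumps case above is the one place a jump is moved rather than flagged: in the input, every path through the loop's if either jumps or can reach the end of the body, so the break is hoisted to the end of the loop while the returns are still lowered to flags. A C rendering of the expected IR (illustrative; names mirror the IR temporaries):

    #include <stdbool.h>

    /* C rendering of the expected IR above: returns become flag writes,
     * and the break is pulled out of the if instead of growing a
     * break_flag temporary of its own. */
    void lowered_main(float aa, float ab, float b, float c, float *d)
    {
       bool execute_flag = true;
       bool return_flag = false;
       if (aa > 0.0f) {
          if (ab > 0.0f) {
             return_flag = true;      /* was: return */
             execute_flag = false;
          }
       }
       if (execute_flag) {
          while (true) {
             if (b > 0.0f) {
                if (c > 0.0f) {
                   /* was: break -- now falls through to the hoisted one */
                } else {
                   continue;
                }
             } else {
                return_flag = true;   /* was: return */
             }
             break;                   /* the pulled-out jump */
          }
          if (!return_flag)
             *d = 1.0f;               /* d stands in for the (out) variable */
       }
    }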
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_1.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_1.opt_test
new file mode 100644
index 0000000..e73c512
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_1.opt_test
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that a void return at the end of a function is
+# eliminated.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 1, 0, 0)' <<EOF
+((declare (out) float a)
+ (function main
+  (signature void (parameters)
+   ((assign (x) (var_ref a) (constant float (1.000000))) (return)))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_1.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_1.opt_test.expected
new file mode 100644
index 0000000..7c3919c
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_1.opt_test.expected
@@ -0,0 +1,4 @@
+((declare (out) float a)
+ (function main
+  (signature void (parameters)
+   ((assign (x) (var_ref a) (constant float (1.000000)))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_2.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_2.opt_test
new file mode 100644
index 0000000..da2dc7e
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_2.opt_test
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that lowering is not performed on a non-void return at
+# the end of a subroutine.

+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 1, 0, 0, 0)' <<EOF
+((declare (out) float a)
+ (function sub
+  (signature float (parameters)
+   ((assign (x) (var_ref a) (constant float (1.000000)))
+    (return (constant float (1.000000)))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_2.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_2.opt_test.expected
new file mode 100644
index 0000000..7777927
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_2.opt_test.expected
@@ -0,0 +1,5 @@
+((declare (out) float a)
+ (function sub
+  (signature float (parameters)
+   ((assign (x) (var_ref a) (constant float (1.000000)))
+    (return (constant float (1.000000)))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_3.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_3.opt_test
new file mode 100644
index 0000000..9509781
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_3.opt_test
@@ -0,0 +1,20 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test lowering of returns when there is one nested inside a
+# complex structure of ifs, and one at the end of a function.
+# In this case, the latter return needs to be lowered because it
+# will not be at the end of the function once the final return
+# is inserted.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 1, 0, 0, 0)' <<EOF
+((declare (in) float a) (declare (in) float b)
+ (function sub
+  (signature float (parameters)
+   ((if (expression bool > (var_ref a) (constant float (0.000000)))
+     ((if (expression bool > (var_ref b) (constant float (0.000000)))
+       ((return (constant float (1.000000))))
+       ()))
+     ())
+    (return (constant float (2.000000)))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_3.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_3.opt_test.expected
new file mode 100644
index 0000000..5b62bbc
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_3.opt_test.expected
@@ -0,0 +1,21 @@
+((declare (in) float a) (declare (in) float b)
+ (function sub
+  (signature float (parameters)
+   ((declare (temporary) bool execute_flag)
+    (assign (x) (var_ref execute_flag) (constant bool (1)))
+    (declare (temporary) float return_value)
+    (declare (temporary) bool return_flag)
+    (assign (x) (var_ref return_flag) (constant bool (0)))
+    (if (expression bool > (var_ref a) (constant float (0.0)))
+     ((if (expression bool > (var_ref b) (constant float (0.0)))
+       ((assign (x) (var_ref return_value) (constant float (1.000000)))
+        (assign (x) (var_ref return_flag) (constant bool (1)))
+        (assign (x) (var_ref execute_flag) (constant bool (0))))
+       ()))
+     ())
+    (if (var_ref execute_flag)
+     ((assign (x) (var_ref return_value) (constant float (2.000000)))
+      (assign (x) (var_ref return_flag) (constant bool (1)))
+      (assign (x) (var_ref execute_flag) (constant bool (0))))
+     ())
+    (return (var_ref return_value))))))
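In structured-code terms, the lowering above threads every return through a trio of temporaries: return_value carries the would-be result, return_flag records that a return happened, and execute_flag suppresses the code a real return would have skipped. A C rendering of the expected IR (illustrative only):

    #include <stdbool.h>

    /* C rendering of the expected IR above: both returns become
     * assignments plus flag updates, leaving a single return at the
     * very end of the function. */
    float lowered_sub(float a, float b)
    {
       bool execute_flag = true;
       float return_value;
       bool return_flag = false;   /* mirrors the IR; not read again here */
       if (a > 0.0f) {
          if (b > 0.0f) {
             return_value = 1.0f;  /* was: return 1.0 */
             return_flag = true;
             execute_flag = false;
          }
       }
       if (execute_flag) {
          return_value = 2.0f;     /* was: return 2.0 */
          return_flag = true;
          execute_flag = false;
       }
       return return_value;
    }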
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_4.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_4.opt_test
new file mode 100644
index 0000000..c5bb9c8
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_4.opt_test
@@ -0,0 +1,14 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that returns are properly lowered when they occur in
+# both branches of an if-statement.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 1, 0, 0, 0)' <<EOF
+((declare (in) float a)
+ (function sub
+  (signature float (parameters)
+   ((if (expression bool > (var_ref a) (constant float (0.000000)))
+     ((return (constant float (1.000000))))
+     ((return (constant float (2.000000)))))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_4.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_4.opt_test.expected
new file mode 100644
index 0000000..07c6842
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_4.opt_test.expected
@@ -0,0 +1,16 @@
+((declare (in) float a)
+ (function sub
+  (signature float (parameters)
+   ((declare (temporary) bool execute_flag)
+    (assign (x) (var_ref execute_flag) (constant bool (1)))
+    (declare (temporary) float return_value)
+    (declare (temporary) bool return_flag)
+    (assign (x) (var_ref return_flag) (constant bool (0)))
+    (if (expression bool > (var_ref a) (constant float (0.0)))
+     ((assign (x) (var_ref return_value) (constant float (1.000000)))
+      (assign (x) (var_ref return_flag) (constant bool (1)))
+      (assign (x) (var_ref execute_flag) (constant bool (0))))
+     ((assign (x) (var_ref return_value) (constant float (2.000000)))
+      (assign (x) (var_ref return_flag) (constant bool (1)))
+      (assign (x) (var_ref execute_flag) (constant bool (0)))))
+    (return (var_ref return_value))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_main_false.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_main_false.opt_test
new file mode 100644
index 0000000..fdb1d0e
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_main_false.opt_test
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that do_lower_jumps respects the lower_main_return
+# flag in deciding whether to lower returns in the main
+# function.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 0, 0, 0)' <<EOF
+((declare (in) float a) (declare (in) float b)
+ (function main
+  (signature void (parameters)
+   ((if (expression bool > (var_ref a) (constant float (0.000000)))
+     ((if (expression bool > (var_ref b) (constant float (0.000000)))
+       ((return))
+       ()))
+     ())))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_main_false.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_main_false.opt_test.expected
new file mode 100644
index 0000000..7e3fe31
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_main_false.opt_test.expected
@@ -0,0 +1,8 @@
+((declare (in) float a) (declare (in) float b)
+ (function main
+  (signature void (parameters)
+   ((if (expression bool > (var_ref a) (constant float (0.0)))
+     ((if (expression bool > (var_ref b) (constant float (0.0)))
+       ((return))
+       ()))
+     ())))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_main_true.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_main_true.opt_test
new file mode 100644
index 0000000..939ec8b
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_main_true.opt_test
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that do_lower_jumps respects the lower_main_return
+# flag in deciding whether to lower returns in the main
+# function.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 1, 0, 0)' <<EOF
+((declare (in) float a) (declare (in) float b)
+ (function main
+  (signature void (parameters)
+   ((if (expression bool > (var_ref a) (constant float (0.000000)))
+     ((if (expression bool > (var_ref b) (constant float (0.000000)))
+       ((return))
+       ()))
+     ())))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_main_true.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_main_true.opt_test.expected
new file mode 100644
index 0000000..b47f5a4
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_main_true.opt_test.expected
@@ -0,0 +1,13 @@
+((declare (in) float a) (declare (in) float b)
+ (function main
+  (signature void (parameters)
+   ((declare (temporary) bool execute_flag)
+    (assign (x) (var_ref execute_flag) (constant bool (1)))
+    (declare (temporary) bool return_flag)
+    (assign (x) (var_ref return_flag) (constant bool (0)))
+    (if (expression bool > (var_ref a) (constant float (0.0)))
+     ((if (expression bool > (var_ref b) (constant float (0.0)))
+       ((assign (x) (var_ref return_flag) (constant bool (1)))
+        (assign (x) (var_ref execute_flag) (constant bool (0))))
+       ()))
+     ())))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_sub_false.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_sub_false.opt_test
new file mode 100644
index 0000000..92a4e8a
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_sub_false.opt_test
@@ -0,0 +1,16 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that do_lower_jumps respects the lower_sub_return flag
+# in deciding whether to lower returns in subroutines.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 0, 0, 0)' <<EOF
+((declare (in) float a) (declare (in) float b)
+ (function sub
+  (signature void (parameters)
+   ((if (expression bool > (var_ref a) (constant float (0.000000)))
+     ((if (expression bool > (var_ref b) (constant float (0.000000)))
+       ((return))
+       ()))
+     ())))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_sub_false.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_sub_false.opt_test.expected
new file mode 100644
index 0000000..7424968
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_sub_false.opt_test.expected
@@ -0,0 +1,8 @@
+((declare (in) float a) (declare (in) float b)
+ (function sub
+  (signature void (parameters)
+   ((if (expression bool > (var_ref a) (constant float (0.0)))
+     ((if (expression bool > (var_ref b) (constant float (0.0)))
+       ((return))
+       ()))
+     ())))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_sub_true.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_sub_true.opt_test
new file mode 100644
index 0000000..789414e
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_sub_true.opt_test
@@ -0,0 +1,16 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that do_lower_jumps respects the lower_sub_return flag
+# in deciding whether to lower returns in subroutines.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 1, 0, 0, 0)' <<EOF
+((declare (in) float a) (declare (in) float b)
+ (function sub
+  (signature void (parameters)
+   ((if (expression bool > (var_ref a) (constant float (0.000000)))
+     ((if (expression bool > (var_ref b) (constant float (0.000000)))
+       ((return))
+       ()))
+     ())))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_sub_true.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_sub_true.opt_test.expected
new file mode 100644
index 0000000..1a3eae5
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_returns_sub_true.opt_test.expected
@@ -0,0 +1,13 @@
+((declare (in) float a) (declare (in) float b)
+ (function sub
+  (signature void (parameters)
+   ((declare (temporary) bool execute_flag)
+    (assign (x) (var_ref execute_flag) (constant bool (1)))
+    (declare (temporary) bool return_flag)
+    (assign (x) (var_ref return_flag) (constant bool (0)))
+    (if (expression bool > (var_ref a) (constant float (0.0)))
+     ((if (expression bool > (var_ref b) (constant float (0.0)))
+       ((assign (x) (var_ref return_flag) (constant bool (1)))
+        (assign (x) (var_ref execute_flag) (constant bool (0))))
+       ()))
+     ())))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_unified_returns.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/lower_unified_returns.opt_test
new file mode 100644
index 0000000..5d6e51c
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_unified_returns.opt_test
@@ -0,0 +1,26 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# If both branches of an if statement end in a return, and
+# pull_out_jumps is True, then those returns should be lifted
+# outside the if and then properly lowered.
+# Verify that this lowering occurs during the same pass as the
+# lowering of other returns by checking that extra temporary
+# variables aren't generated.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(1, 0, 1, 0, 0)' <<EOF
+((declare (in) float aa) (declare (in) float ab) (declare (in) float b)
+ (declare (in) float c)
+ (function main
+  (signature void (parameters)
+   ((if (expression bool > (var_ref aa) (constant float (0.000000)))
+     ((if (expression bool > (var_ref ab) (constant float (0.000000)))
+       ((return))
+       ()))
+     ())
+    (if (expression bool > (var_ref b) (constant float (0.000000)))
+     ((if (expression bool > (var_ref c) (constant float (0.000000)))
+       ((return))
+       ((return))))
+     ())))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/lower_unified_returns.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/lower_unified_returns.opt_test.expected
new file mode 100644
index 0000000..c0b51e1
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/lower_unified_returns.opt_test.expected
@@ -0,0 +1,21 @@
+((declare (in) float aa) (declare (in) float ab) (declare (in) float b)
+ (declare (in) float c)
+ (function main
+  (signature void (parameters)
+   ((declare (temporary) bool execute_flag)
+    (assign (x) (var_ref execute_flag) (constant bool (1)))
+    (declare (temporary) bool return_flag)
+    (assign (x) (var_ref return_flag) (constant bool (0)))
+    (if (expression bool > (var_ref aa) (constant float (0.0)))
+     ((if (expression bool > (var_ref ab) (constant float (0.0)))
+       ((assign (x) (var_ref return_flag) (constant bool (1)))
+        (assign (x) (var_ref execute_flag) (constant bool (0))))
+       ()))
+     ())
+    (if (var_ref execute_flag)
+     ((if (expression bool > (var_ref b) (constant float (0.0)))
+       ((if (expression bool > (var_ref c) (constant float (0.0))) () ())
+        (assign (x) (var_ref return_flag) (constant bool (1)))
+        (assign (x) (var_ref execute_flag) (constant bool (0))))
+       ()))
+     ())))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/remove_continue_at_end_of_loop.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/remove_continue_at_end_of_loop.opt_test
new file mode 100644
index 0000000..8403bb2
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/remove_continue_at_end_of_loop.opt_test
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that a redundant continue-statement at the end of a
+# loop is removed.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 0, 0, 0)' <<EOF
+((declare (out) float a)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000))) continue))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/remove_continue_at_end_of_loop.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/remove_continue_at_end_of_loop.opt_test.expected
new file mode 100644
index 0000000..98b74d7
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/remove_continue_at_end_of_loop.opt_test.expected
@@ -0,0 +1,5 @@
+((declare (out) float a)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_nothing.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_nothing.opt_test
new file mode 100644
index 0000000..1f62e73
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_nothing.opt_test
@@ -0,0 +1,16 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that a non-void return at the end of a loop is
+# left untouched when no lowering flags are enabled.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 0, 0, 0)' <<EOF
+((declare (out) float a) (declare (out) float b)
+ (function sub
+  (signature float (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (return (constant float (2.000000)))))
+    (assign (x) (var_ref b) (constant float (3.000000)))
+    (return (constant float (4.000000)))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_nothing.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_nothing.opt_test.expected
new file mode 100644
index 0000000..040d383
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_nothing.opt_test.expected
@@ -0,0 +1,8 @@
+((declare (out) float a) (declare (out) float b)
+ (function sub
+  (signature float (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (return (constant float (2.000000)))))
+    (assign (x) (var_ref b) (constant float (3.000000)))
+    (return (constant float (4.000000)))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_return.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_return.opt_test
new file mode 100644
index 0000000..42c4e75
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_return.opt_test
@@ -0,0 +1,16 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that a non-void return at the end of a loop is
+# properly lowered.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 1, 0, 0, 0)' <<EOF
+((declare (out) float a) (declare (out) float b)
+ (function sub
+  (signature float (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (return (constant float (2.000000)))))
+    (assign (x) (var_ref b) (constant float (3.000000)))
+    (return (constant float (4.000000)))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_return.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_return.opt_test.expected
new file mode 100644
index 0000000..792cbf6
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_return.opt_test.expected
@@ -0,0 +1,19 @@
+((declare (out) float a) (declare (out) float b)
+ (function sub
+  (signature float (parameters)
+   ((declare (temporary) bool execute_flag)
+    (assign (x) (var_ref execute_flag) (constant bool (1)))
+    (declare (temporary) float return_value)
+    (declare (temporary) bool return_flag)
+    (assign (x) (var_ref return_flag) (constant bool (0)))
+    (loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (assign (x) (var_ref return_value) (constant float (2.000000)))
+      (assign (x) (var_ref return_flag) (constant bool (1)))
+      break))
+    (if (var_ref return_flag) ()
+     ((assign (x) (var_ref b) (constant float (3.000000)))
+      (assign (x) (var_ref return_value) (constant float (4.000000)))
+      (assign (x) (var_ref return_flag) (constant bool (1)))
+      (assign (x) (var_ref execute_flag) (constant bool (0)))))
+    (return (var_ref return_value))))))
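The same three temporaries handle a return inside a loop: the return becomes a flagged break, and the code after the loop runs only when the loop exited without "returning". A C rendering of the expected IR above (a and b stand in for the IR's (out) variables; illustrative only):

    #include <stdbool.h>

    /* C rendering of the expected IR above: the in-loop return turns into
     * return_value/return_flag writes plus a break, and the post-loop
     * code is guarded on !return_flag. */
    float lowered_sub(float *a, float *b)
    {
       bool execute_flag = true;   /* mirrors the IR; only cleared below */
       float return_value;
       bool return_flag = false;
       while (true) {
          *a = 1.0f;
          return_value = 2.0f;     /* was: return 2.0 */
          return_flag = true;
          break;
       }
       if (!return_flag) {
          *b = 3.0f;
          return_value = 4.0f;     /* was: return 4.0 */
          return_flag = true;
          execute_flag = false;
       }
       return return_value;
    }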
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_return_and_break.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_return_and_break.opt_test
new file mode 100644
index 0000000..b3eef39
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_return_and_break.opt_test
@@ -0,0 +1,16 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that a non-void return at the end of a loop is
+# properly lowered.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 1, 0, 0, 1)' <<EOF
+((declare (out) float a) (declare (out) float b)
+ (function sub
+  (signature float (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (return (constant float (2.000000)))))
+    (assign (x) (var_ref b) (constant float (3.000000)))
+    (return (constant float (4.000000)))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_return_and_break.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_return_and_break.opt_test.expected
new file mode 100644
index 0000000..792cbf6
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/return_non_void_at_end_of_loop_lower_return_and_break.opt_test.expected
@@ -0,0 +1,19 @@
+((declare (out) float a) (declare (out) float b)
+ (function sub
+  (signature float (parameters)
+   ((declare (temporary) bool execute_flag)
+    (assign (x) (var_ref execute_flag) (constant bool (1)))
+    (declare (temporary) float return_value)
+    (declare (temporary) bool return_flag)
+    (assign (x) (var_ref return_flag) (constant bool (0)))
+    (loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (assign (x) (var_ref return_value) (constant float (2.000000)))
+      (assign (x) (var_ref return_flag) (constant bool (1)))
+      break))
+    (if (var_ref return_flag) ()
+     ((assign (x) (var_ref b) (constant float (3.000000)))
+      (assign (x) (var_ref return_value) (constant float (4.000000)))
+      (assign (x) (var_ref return_flag) (constant bool (1)))
+      (assign (x) (var_ref execute_flag) (constant bool (0)))))
+    (return (var_ref return_value))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_nothing.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_nothing.opt_test
new file mode 100644
index 0000000..0408282
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_nothing.opt_test
@@ -0,0 +1,14 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that a return of void at the end of a loop is
+# left untouched when no lowering flags are enabled.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 0, 0, 0)' <<EOF
+((declare (out) float a) (declare (out) float b)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000))) (return)))
+    (assign (x) (var_ref b) (constant float (2.000000)))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_nothing.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_nothing.opt_test.expected
new file mode 100644
index 0000000..569213e
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_nothing.opt_test.expected
@@ -0,0 +1,6 @@
+((declare (out) float a) (declare (out) float b)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000))) (return)))
+    (assign (x) (var_ref b) (constant float (2.000000)))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_return.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_return.opt_test
new file mode 100644
index 0000000..a7e65c8
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_return.opt_test
@@ -0,0 +1,14 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that a return of void at the end of a loop is properly
+# lowered.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 1, 0, 0)' <<EOF
+((declare (out) float a) (declare (out) float b)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000))) (return)))
+    (assign (x) (var_ref b) (constant float (2.000000)))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_return.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_return.opt_test.expected
new file mode 100644
index 0000000..66f3aec
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_return.opt_test.expected
@@ -0,0 +1,11 @@
+((declare (out) float a) (declare (out) float b)
+ (function main
+  (signature void (parameters)
+   ((declare (temporary) bool return_flag)
+    (assign (x) (var_ref return_flag) (constant bool (0)))
+    (loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (assign (x) (var_ref return_flag) (constant bool (1)))
+      break))
+    (if (var_ref return_flag) ()
+     ((assign (x) (var_ref b) (constant float (2.000000)))))))))
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_return_and_break.opt_test b/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_return_and_break.opt_test
new file mode 100644
index 0000000..7a5efe5
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_return_and_break.opt_test
@@ -0,0 +1,14 @@
+#!/usr/bin/env bash
+#
+# This file was generated by create_test_cases.py.
+#
+# Test that a return of void at the end of a loop is properly
+# lowered.
+../../glsl_test optpass --quiet --input-ir 'do_lower_jumps(0, 0, 1, 0, 1)' <<EOF
+((declare (out) float a) (declare (out) float b)
+ (function main
+  (signature void (parameters)
+   ((loop
+     ((assign (x) (var_ref a) (constant float (1.000000))) (return)))
+    (assign (x) (var_ref b) (constant float (2.000000)))))))
+EOF
diff --git a/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_return_and_break.opt_test.expected b/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_return_and_break.opt_test.expected
new file mode 100644
index 0000000..66f3aec
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/lower_jumps/return_void_at_end_of_loop_lower_return_and_break.opt_test.expected
@@ -0,0 +1,11 @@
+((declare (out) float a) (declare (out) float b)
+ (function main
+  (signature void (parameters)
+   ((declare (temporary) bool return_flag)
+    (assign (x) (var_ref return_flag) (constant bool (0)))
+    (loop
+     ((assign (x) (var_ref a) (constant float (1.000000)))
+      (assign (x) (var_ref return_flag) (constant bool (1)))
+      break))
+    (if (var_ref return_flag) ()
+     ((assign (x) (var_ref b) (constant float (2.000000)))))))))
diff --git a/icd/intel/compiler/shader/tests/optimization-test b/icd/intel/compiler/shader/tests/optimization-test
new file mode 100644
index 0000000..8ca7776
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/optimization-test
@@ -0,0 +1,34 @@
+#!/usr/bin/env bash
+
+if [ ! -z "$srcdir" ]; then
+   compare_ir=`pwd`/tests/compare_ir
+else
+   compare_ir=./compare_ir
+fi
+
+total=0
+pass=0
+
+echo "====== Testing optimization passes ======"
+for test in `find . -iname '*.opt_test'`; do
+    echo -n "Testing $test..."
+    (cd `dirname "$test"`; ./`basename "$test"`) > "$test.out" 2>&1
+    total=$((total+1))
+    if $PYTHON2 $PYTHON_FLAGS $compare_ir "$test.expected" "$test.out" >/dev/null 2>&1; then
+        echo "PASS"
+        pass=$((pass+1))
+    else
+        echo "FAIL"
+        $PYTHON2 $PYTHON_FLAGS $compare_ir "$test.expected" "$test.out"
+    fi
+done
+
+echo ""
+echo "$pass/$total tests returned correct results"
+echo ""
+
+if [[ $pass == $total ]]; then
+    exit 0
+else
+    exit 1
+fi
diff --git a/icd/intel/compiler/shader/tests/ralloc_test.cpp b/icd/intel/compiler/shader/tests/ralloc_test.cpp
new file mode 100644
index 0000000..c0a870a
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/ralloc_test.cpp
@@ -0,0 +1,38 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <gtest/gtest.h>
+#include <string.h>
+
+#include "ralloc.h"
+
+/**
+ * \name Basic functionality
+ */
+/*@{*/
+TEST(ralloc_test, null_parent)
+{
+   void *mem_ctx = ralloc_context(NULL);
+
+   EXPECT_EQ(NULL, ralloc_parent(mem_ctx));
+}
+/*@}*/
diff --git a/icd/intel/compiler/shader/tests/sampler_types_test.cpp b/icd/intel/compiler/shader/tests/sampler_types_test.cpp
new file mode 100644
index 0000000..86d329a
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/sampler_types_test.cpp
@@ -0,0 +1,101 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <gtest/gtest.h>
+#include "main/compiler.h"
+#include "main/mtypes.h"
+#include "main/macros.h"
+#include "ralloc.h"
+#include "ir.h"
+
+/**
+ * \file sampler_types_test.cpp
+ *
+ * Test that built-in sampler types have the right properties.
+ */
+
+#define ARRAY    EXPECT_TRUE(type->sampler_array);
+#define NONARRAY EXPECT_FALSE(type->sampler_array);
+#define SHADOW   EXPECT_TRUE(type->sampler_shadow);
+#define COLOR    EXPECT_FALSE(type->sampler_shadow);
+
+#define T(TYPE, DIM, DATA_TYPE, ARR, SHAD, COMPS)           \
+TEST(sampler_types, TYPE)                                   \
+{                                                           \
+   const glsl_type *type = glsl_type::TYPE##_type;          \
+   EXPECT_EQ(GLSL_TYPE_SAMPLER, type->base_type);           \
+   EXPECT_EQ(DIM, type->sampler_dimensionality);            \
+   EXPECT_EQ(DATA_TYPE, type->sampler_type);                \
+   ARR;                                                     \
+   SHAD;                                                    \
+   EXPECT_EQ(COMPS, type->coordinate_components());         \
+}
+
+T( sampler1D,        GLSL_SAMPLER_DIM_1D,   GLSL_TYPE_FLOAT, NONARRAY, COLOR,  1)
+T( sampler2D,        GLSL_SAMPLER_DIM_2D,   GLSL_TYPE_FLOAT, NONARRAY, COLOR,  2)
+T( sampler3D,        GLSL_SAMPLER_DIM_3D,   GLSL_TYPE_FLOAT, NONARRAY, COLOR,  3)
+T( samplerCube,      GLSL_SAMPLER_DIM_CUBE, GLSL_TYPE_FLOAT, NONARRAY, COLOR,  3)
+T( sampler1DArray,   GLSL_SAMPLER_DIM_1D,   GLSL_TYPE_FLOAT, ARRAY,    COLOR,  2)
+T( sampler2DArray,   GLSL_SAMPLER_DIM_2D,   GLSL_TYPE_FLOAT, ARRAY,    COLOR,  3)
+T( samplerCubeArray, GLSL_SAMPLER_DIM_CUBE, GLSL_TYPE_FLOAT, ARRAY,    COLOR,  4)
+T( sampler2DRect,    GLSL_SAMPLER_DIM_RECT, GLSL_TYPE_FLOAT, NONARRAY, COLOR,  2)
+T( samplerBuffer,    GLSL_SAMPLER_DIM_BUF,  GLSL_TYPE_FLOAT, NONARRAY, COLOR,  1)
+T( sampler2DMS,      GLSL_SAMPLER_DIM_MS,   GLSL_TYPE_FLOAT, NONARRAY, COLOR,  2)
+T( sampler2DMSArray, GLSL_SAMPLER_DIM_MS,   GLSL_TYPE_FLOAT, ARRAY,    COLOR,  3)
+T(isampler1D,        GLSL_SAMPLER_DIM_1D,   GLSL_TYPE_INT,   NONARRAY, COLOR,  1)
+T(isampler2D,        GLSL_SAMPLER_DIM_2D,   GLSL_TYPE_INT,   NONARRAY, COLOR,  2)
+T(isampler3D,        GLSL_SAMPLER_DIM_3D,   GLSL_TYPE_INT,   NONARRAY, COLOR,  3)
+T(isamplerCube,      GLSL_SAMPLER_DIM_CUBE, GLSL_TYPE_INT,   NONARRAY, COLOR,  3)
+T(isampler1DArray,   GLSL_SAMPLER_DIM_1D,   GLSL_TYPE_INT,   ARRAY,    COLOR,  2)
+T(isampler2DArray,   GLSL_SAMPLER_DIM_2D,   GLSL_TYPE_INT,   ARRAY,    COLOR,  3)
+T(isamplerCubeArray, GLSL_SAMPLER_DIM_CUBE, GLSL_TYPE_INT,   ARRAY,    COLOR,  4)
+T(isampler2DRect,    GLSL_SAMPLER_DIM_RECT, GLSL_TYPE_INT,   NONARRAY, COLOR,  2)
+T(isamplerBuffer,    GLSL_SAMPLER_DIM_BUF,  GLSL_TYPE_INT,   NONARRAY, COLOR,  1)
+T(isampler2DMS,      GLSL_SAMPLER_DIM_MS,   GLSL_TYPE_INT,   NONARRAY, COLOR,  2)
+T(isampler2DMSArray, GLSL_SAMPLER_DIM_MS,   GLSL_TYPE_INT,   ARRAY,    COLOR,  3)
+T(usampler1D,        GLSL_SAMPLER_DIM_1D,   GLSL_TYPE_UINT,  NONARRAY, COLOR,  1)
+T(usampler2D,        GLSL_SAMPLER_DIM_2D,   GLSL_TYPE_UINT,  NONARRAY, COLOR,  2)
+T(usampler3D,        GLSL_SAMPLER_DIM_3D,   GLSL_TYPE_UINT,  NONARRAY, COLOR,  3)
+T(usamplerCube,      GLSL_SAMPLER_DIM_CUBE, GLSL_TYPE_UINT,  NONARRAY, COLOR,  3)
+T(usampler1DArray,   GLSL_SAMPLER_DIM_1D,   GLSL_TYPE_UINT,  ARRAY,    COLOR,  2)
+T(usampler2DArray,   GLSL_SAMPLER_DIM_2D,   GLSL_TYPE_UINT,  ARRAY,    COLOR,  3)
+T(usamplerCubeArray, GLSL_SAMPLER_DIM_CUBE, GLSL_TYPE_UINT,  ARRAY,    COLOR,  4)
+T(usampler2DRect,    GLSL_SAMPLER_DIM_RECT, GLSL_TYPE_UINT,  NONARRAY, COLOR,  2)
+T(usamplerBuffer,    GLSL_SAMPLER_DIM_BUF,  GLSL_TYPE_UINT,  NONARRAY, COLOR,  1)
+T(usampler2DMS,      GLSL_SAMPLER_DIM_MS,   GLSL_TYPE_UINT,  NONARRAY, COLOR,  2)
+T(usampler2DMSArray, GLSL_SAMPLER_DIM_MS,   GLSL_TYPE_UINT,  ARRAY,    COLOR,  3)
+
+T(sampler1DShadow,   GLSL_SAMPLER_DIM_1D,   GLSL_TYPE_FLOAT, NONARRAY, SHADOW, 1)
+T(sampler2DShadow,   GLSL_SAMPLER_DIM_2D,   GLSL_TYPE_FLOAT, NONARRAY, SHADOW, 2)
+T(samplerCubeShadow, GLSL_SAMPLER_DIM_CUBE, GLSL_TYPE_FLOAT, NONARRAY, SHADOW, 3)
+
+T(sampler1DArrayShadow,
+  GLSL_SAMPLER_DIM_1D, GLSL_TYPE_FLOAT, ARRAY, SHADOW, 2)
+T(sampler2DArrayShadow,
+  GLSL_SAMPLER_DIM_2D, GLSL_TYPE_FLOAT, ARRAY, SHADOW, 3)
+T(samplerCubeArrayShadow,
+  GLSL_SAMPLER_DIM_CUBE, GLSL_TYPE_FLOAT, ARRAY, SHADOW, 4)
+T(sampler2DRectShadow,
+  GLSL_SAMPLER_DIM_RECT, GLSL_TYPE_FLOAT, NONARRAY, SHADOW, 2)
+
+T(samplerExternalOES,
+  GLSL_SAMPLER_DIM_EXTERNAL, GLSL_TYPE_FLOAT, NONARRAY, COLOR, 2)
diff --git a/icd/intel/compiler/shader/tests/set_uniform_initializer_tests.cpp b/icd/intel/compiler/shader/tests/set_uniform_initializer_tests.cpp
new file mode 100644
index 0000000..be202b3
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/set_uniform_initializer_tests.cpp
@@ -0,0 +1,593 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <gtest/gtest.h>
+#include "main/compiler.h"
+#include "main/mtypes.h"
+#include "main/macros.h"
+#include "ralloc.h"
+#include "uniform_initializer_utils.h"
+
+namespace linker {
+extern void
+set_uniform_initializer(void *mem_ctx, gl_shader_program *prog,
+			const char *name, const glsl_type *type,
+			ir_constant *val);
+}
+
+class set_uniform_initializer : public ::testing::Test {
+public:
+   virtual void SetUp();
+   virtual void TearDown();
+
+   /**
+    * Index of the uniform to be tested.
+    *
+    * All of the \c set_uniform_initializer tests create several slots for
+    * uniforms.  All but one of the slots is fake.  This field holds the index
+    * of the slot for the uniform being tested.
+    */
+   unsigned actual_index;
+
+   /**
+    * Name of the uniform to be tested.
+    */
+   const char *name;
+
+   /**
+    * Shader program used in the test.
+    */
+   struct gl_shader_program *prog;
+
+   /**
+    * Ralloc memory context used for all temporary allocations.
+    */
+   void *mem_ctx;
+};
+
+void
+set_uniform_initializer::SetUp()
+{
+   this->mem_ctx = ralloc_context(NULL);
+   this->prog = rzalloc(NULL, struct gl_shader_program);
+
+   /* Set default values used by the test cases.
+    */
+   this->actual_index = 1;
+   this->name = "i";
+}
+
+void
+set_uniform_initializer::TearDown()
+{
+   ralloc_free(this->mem_ctx);
+   this->mem_ctx = NULL;
+
+   ralloc_free(this->prog);
+   this->prog = NULL;
+}
+
+/**
+ * Create some uniform storage for a program.
+ *
+ * \param prog          Program to get some storage
+ * \param num_storage   Total number of storage slots
+ * \param index_to_set  Storage slot that will actually get a value
+ * \param name          Name for the actual storage slot
+ * \param type          Type for the elements of the actual storage slot
+ * \param array_size    Size for the array of the actual storage slot.  This
+ *                      should be zero for non-arrays.
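+ *
+ * The components past the requested data are seeded with sentinel values
+ * (via fill_storage_array_with_sentinels) so the tests can detect writes
+ * that run past the end of the slot -- a simple red zone.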
+ */
+static unsigned
+establish_uniform_storage(struct gl_shader_program *prog, unsigned num_storage,
+			  unsigned index_to_set, const char *name,
+			  const glsl_type *type, unsigned array_size)
+{
+   const unsigned elements = MAX2(1, array_size);
+   const unsigned data_components = elements * type->components();
+   const unsigned total_components = MAX2(17, (data_components
+					       + type->components()));
+   const unsigned red_zone_components = total_components - data_components;
+
+   prog->UniformStorage = rzalloc_array(prog, struct gl_uniform_storage,
+					num_storage);
+   prog->NumUserUniformStorage = num_storage;
+
+   prog->UniformStorage[index_to_set].name = (char *) name;
+   prog->UniformStorage[index_to_set].type = type;
+   prog->UniformStorage[index_to_set].array_elements = array_size;
+   prog->UniformStorage[index_to_set].initialized = false;
+   for (int sh = 0; sh < MESA_SHADER_STAGES; sh++) {
+      prog->UniformStorage[index_to_set].sampler[sh].index = ~0;
+      prog->UniformStorage[index_to_set].sampler[sh].active = false;
+   }
+   prog->UniformStorage[index_to_set].num_driver_storage = 0;
+   prog->UniformStorage[index_to_set].driver_storage = NULL;
+   prog->UniformStorage[index_to_set].storage =
+      rzalloc_array(prog, union gl_constant_value, total_components);
+
+   fill_storage_array_with_sentinels(prog->UniformStorage[index_to_set].storage,
+				     data_components,
+				     red_zone_components);
+
+   for (unsigned i = 0; i < num_storage; i++) {
+      if (i == index_to_set)
+	 continue;
+
+      prog->UniformStorage[i].name = (char *) "invalid slot";
+      prog->UniformStorage[i].type = glsl_type::void_type;
+      prog->UniformStorage[i].array_elements = 0;
+      prog->UniformStorage[i].initialized = false;
+      for (int sh = 0; sh < MESA_SHADER_STAGES; sh++) {
+         prog->UniformStorage[i].sampler[sh].index = ~0;
+         prog->UniformStorage[i].sampler[sh].active = false;
+      }
+      prog->UniformStorage[i].num_driver_storage = 0;
+      prog->UniformStorage[i].driver_storage = NULL;
+      prog->UniformStorage[i].storage = NULL;
+   }
+
+   return red_zone_components;
+}
+
+/**
+ * Verify that the correct uniform is marked as having been initialized.
+ */
+static void
+verify_initialization(struct gl_shader_program *prog, unsigned actual_index)
+{
+   for (unsigned i = 0; i < prog->NumUserUniformStorage; i++) {
+      if (i == actual_index) {
+	 EXPECT_TRUE(prog->UniformStorage[actual_index].initialized);
+      } else {
+	 EXPECT_FALSE(prog->UniformStorage[i].initialized);
+      }
+   }
+}
+
+static void
+non_array_test(void *mem_ctx, struct gl_shader_program *prog,
+	       unsigned actual_index, const char *name,
+	       enum glsl_base_type base_type,
+	       unsigned columns, unsigned rows)
+{
+   const glsl_type *const type =
+      glsl_type::get_instance(base_type, rows, columns);
+
+   unsigned red_zone_components =
+      establish_uniform_storage(prog, 3, actual_index, name, type, 0);
+
+   ir_constant *val;
+   generate_data(mem_ctx, base_type, columns, rows, val);
+
+   linker::set_uniform_initializer(mem_ctx, prog, name, type, val);
+
+   verify_initialization(prog, actual_index);
+   verify_data(prog->UniformStorage[actual_index].storage, 0, val,
+	       red_zone_components);
+}
+
+TEST_F(set_uniform_initializer, int_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_INT, 1, 1);
+}
+
+TEST_F(set_uniform_initializer, ivec2_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_INT, 1, 2);
+}
+
+TEST_F(set_uniform_initializer, ivec3_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_INT, 1, 3);
+}
+
+TEST_F(set_uniform_initializer, ivec4_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_INT, 1, 4);
+}
+
+TEST_F(set_uniform_initializer, uint_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_UINT, 1, 1);
+}
+
+TEST_F(set_uniform_initializer, uvec2_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_UINT, 1, 2);
+}
+
+TEST_F(set_uniform_initializer, uvec3_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_UINT, 1, 3);
+}
+
+TEST_F(set_uniform_initializer, uvec4_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_UINT, 1, 4);
+}
+
+TEST_F(set_uniform_initializer, bool_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_BOOL, 1, 1);
+}
+
+TEST_F(set_uniform_initializer, bvec2_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_BOOL, 1, 2);
+}
+
+TEST_F(set_uniform_initializer, bvec3_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_BOOL, 1, 3);
+}
+
+TEST_F(set_uniform_initializer, bvec4_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_BOOL, 1, 4);
+}
+
+TEST_F(set_uniform_initializer, float_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 1, 1);
+}
+
+TEST_F(set_uniform_initializer, vec2_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 1, 2);
+}
+
+TEST_F(set_uniform_initializer, vec3_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 1, 3);
+}
+
+TEST_F(set_uniform_initializer, vec4_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 1, 4);
+}
+
+TEST_F(set_uniform_initializer, mat2x2_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 2, 2);
+}
+
+TEST_F(set_uniform_initializer, mat2x3_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 2, 3);
+}
+
+TEST_F(set_uniform_initializer, mat2x4_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 2, 4);
+}
+
+TEST_F(set_uniform_initializer, mat3x2_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 3, 2);
+}
+
+TEST_F(set_uniform_initializer, mat3x3_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 3, 3);
+}
+
+TEST_F(set_uniform_initializer, mat3x4_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 3, 4);
+}
+
+TEST_F(set_uniform_initializer, mat4x2_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 4, 2);
+}
+
+TEST_F(set_uniform_initializer, mat4x3_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 4, 3);
+}
+
+TEST_F(set_uniform_initializer, mat4x4_uniform)
+{
+   non_array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 4, 4);
+}
+
+static void
+array_test(void *mem_ctx, struct gl_shader_program *prog,
+	   unsigned actual_index, const char *name,
+	   enum glsl_base_type base_type,
+	   unsigned columns, unsigned rows, unsigned array_size,
+	   unsigned excess_data_size)
+{
+   const glsl_type *const element_type =
+      glsl_type::get_instance(base_type, rows, columns);
+
+   const unsigned red_zone_components =
+      establish_uniform_storage(prog, 3, actual_index, name, element_type,
+				array_size);
+
+   /* The constant value generated may have more array elements than the
+    * uniform that it initializes.  In the real compiler and linker this can
+    * happen when a uniform array is compacted because some of the tail
+    * elements are not used.  In this case, the type of the uniform will be
+    * modified, but the initializer will not.
+    */
+   ir_constant *val;
+   generate_array_data(mem_ctx, base_type, columns, rows,
+		       array_size + excess_data_size, val);
+
+   linker::set_uniform_initializer(mem_ctx, prog, name, element_type, val);
+
+   verify_initialization(prog, actual_index);
+   verify_data(prog->UniformStorage[actual_index].storage, array_size,
+	       val, red_zone_components);
+}
+
+TEST_F(set_uniform_initializer, int_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_INT, 1, 1, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, ivec2_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_INT, 1, 2, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, ivec3_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_INT, 1, 3, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, ivec4_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_INT, 1, 4, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, uint_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_UINT, 1, 1, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, uvec2_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_UINT, 1, 2, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, uvec3_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_UINT, 1, 3, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, uvec4_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_UINT, 1, 4, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, bool_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_BOOL, 1, 1, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, bvec2_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_BOOL, 1, 2, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, bvec3_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_BOOL, 1, 3, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, bvec4_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_BOOL, 1, 4, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, float_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 1, 1, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, vec2_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 1, 2, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, vec3_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 1, 3, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, vec4_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 1, 4, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, mat2x2_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 2, 2, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, mat2x3_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 2, 3, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, mat2x4_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 2, 4, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, mat3x2_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 3, 2, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, mat3x3_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 3, 3, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, mat3x4_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 3, 4, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, mat4x2_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 4, 2, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, mat4x3_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 4, 3, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, mat4x4_array_uniform)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 4, 4, 4, 0);
+}
+
+TEST_F(set_uniform_initializer, int_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_INT, 1, 1, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, ivec2_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_INT, 1, 2, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, ivec3_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_INT, 1, 3, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, ivec4_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_INT, 1, 4, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, uint_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_UINT, 1, 1, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, uvec2_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_UINT, 1, 2, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, uvec3_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_UINT, 1, 3, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, uvec4_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_UINT, 1, 4, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, bool_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_BOOL, 1, 1, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, bvec2_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_BOOL, 1, 2, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, bvec3_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_BOOL, 1, 3, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, bvec4_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_BOOL, 1, 4, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, float_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 1, 1, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, vec2_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 1, 2, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, vec3_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 1, 3, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, vec4_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 1, 4, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, mat2x2_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 2, 2, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, mat2x3_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 2, 3, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, mat2x4_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 2, 4, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, mat3x2_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 3, 2, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, mat3x3_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 3, 3, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, mat3x4_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 3, 4, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, mat4x2_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 4, 2, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, mat4x3_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 4, 3, 4, 5);
+}
+
+TEST_F(set_uniform_initializer, mat4x4_array_uniform_excess_initializer)
+{
+   array_test(mem_ctx, prog, actual_index, name, GLSL_TYPE_FLOAT, 4, 4, 4, 5);
+}
diff --git a/icd/intel/compiler/shader/tests/sexps.py b/icd/intel/compiler/shader/tests/sexps.py
new file mode 100644
index 0000000..a714af8
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/sexps.py
@@ -0,0 +1,103 @@
+# coding=utf-8
+#
+# Copyright © 2011 Intel Corporation
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice (including the next
+# paragraph) shall be included in all copies or substantial portions of the
+# Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+# DEALINGS IN THE SOFTWARE.
+
+# This file contains helper functions for manipulating sexps in Python.
+#
+# We represent a sexp in Python using nested lists containing strings.
+# So, for example, the sexp (constant float (1.000000)) is represented
+# as ['constant', 'float', ['1.000000']].
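+#
+# A round-trip sketch using the helpers below (assuming this module is
+# importable as sexps):
+#
+#   >>> from sexps import parse_sexp, sexp_to_string
+#   >>> parse_sexp('(constant float (1.000000))')
+#   ['constant', 'float', ['1.000000']]
+#   >>> sexp_to_string(['constant', 'float', ['1.000000']])
+#   '(constant float (1.000000))'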
+
+import re
+
+def check_sexp(sexp):
+    """Verify that the argument is a proper sexp.
+
+    That is, raise an exception if the argument is not a string or a
+    list, or if it contains anything that is not a string or a list at
+    any nesting level.
+    """
+    if isinstance(sexp, list):
+        for s in sexp:
+            check_sexp(s)
+    elif not isinstance(sexp, basestring):
+        raise Exception('Not a sexp: {0!r}'.format(sexp))
+
+def parse_sexp(sexp):
+    """Convert a string, of the form that would be output by mesa,
+    into a sexp represented as nested lists containing strings.
+    """
+    sexp_token_regexp = re.compile(
+        '[a-zA-Z_]+(@[0-9]+)?|[0-9]+(\\.[0-9]+)?|[^ \n]')
+    stack = [[]]
+    for match in sexp_token_regexp.finditer(sexp):
+        token = match.group(0)
+        if token == '(':
+            stack.append([])
+        elif token == ')':
+            if len(stack) == 1:
+                raise Exception('Unmatched )')
+            sexp = stack.pop()
+            stack[-1].append(sexp)
+        else:
+            stack[-1].append(token)
+    if len(stack) != 1:
+        raise Exception('Unmatched (')
+    if len(stack[0]) != 1:
+        raise Exception('Multiple sexps')
+    return stack[0][0]
+
+def sexp_to_string(sexp):
+    """Convert a sexp, represented as nested lists containing strings,
+    into a single string of the form parseable by mesa.
+    """
+    if isinstance(sexp, basestring):
+        return sexp
+    assert isinstance(sexp, list)
+    result = ''
+    for s in sexp:
+        sub_result = sexp_to_string(s)
+        if result == '':
+            result = sub_result
+        elif '\n' not in result and '\n' not in sub_result and \
+                len(result) + len(sub_result) + 1 <= 70:
+            result += ' ' + sub_result
+        else:
+            result += '\n' + sub_result
+    return '({0})'.format(result.replace('\n', '\n '))
+
+def sort_decls(sexp):
+    """Sort all toplevel variable declarations in sexp.
+
+    This is used to work around the fact that
+    ir_reader::read_instructions reorders declarations.
+    """
+    assert isinstance(sexp, list)
+    decls = []
+    other_code = []
+    for s in sexp:
+        if isinstance(s, list) and len(s) >= 4 and s[0] == 'declare':
+            decls.append(s)
+        else:
+            other_code.append(s)
+    return sorted(decls) + other_code
+
diff --git a/icd/intel/compiler/shader/tests/threadpool_test.cpp b/icd/intel/compiler/shader/tests/threadpool_test.cpp
new file mode 100644
index 0000000..63f55c5
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/threadpool_test.cpp
@@ -0,0 +1,137 @@
+/*
+ * Copyright © 2014 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <gtest/gtest.h>
+#include <string.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <time.h>
+#include <unistd.h>
+#include "c11/threads.h"
+
+#include "threadpool.h"
+
+#define NUM_THREADS 10
+#define OPS_PER_THREAD 100
+#define MAX_TASKS 10
+
+static void
+race_cb(void *data)
+{
+   usleep(1000 * 5);
+}
+
+static int
+race_random_op(void *data)
+{
+   struct _mesa_threadpool *pool = (struct _mesa_threadpool *) data;
+   struct _mesa_threadpool_task *tasks[MAX_TASKS];
+   int num_tasks = 0;
+   int num_ops = 0;
+   int i;
+
+   while (num_ops < OPS_PER_THREAD) {
+      int op = (random() % 1000);
+
+      if (op < 480) { /* 48% */
+         if (num_tasks < MAX_TASKS) {
+            tasks[num_tasks++] =
+               _mesa_threadpool_queue_task(pool, race_cb, NULL);
+         }
+      }
+      else if (op < 980) { /* 50% */
+         if (num_tasks)
+            _mesa_threadpool_complete_task(pool, tasks[--num_tasks]);
+      }
+      else if (op < 995) { /* 1.5% */
+         for (i = 0; i < num_tasks; i++)
+            _mesa_threadpool_complete_task(pool, tasks[i]);
+         num_tasks = 0;
+      }
+      else { /* 0.5% */
+         _mesa_threadpool_join(pool, (op < 998));
+      }
+
+      num_ops++;
+   }
+
+   for (i = 0; i < num_tasks; i++)
+      _mesa_threadpool_complete_task(pool, tasks[i]);
+
+   _mesa_threadpool_unref(pool);
+
+   return 0;
+}
+
+/**
+ * \name Thread safety
+ */
+/*@{*/
+TEST(threadpool_test, race)
+{
+   struct _mesa_threadpool *pool;
+   thrd_t threads[NUM_THREADS];
+   int i;
+
+   srandom(time(NULL));
+   pool = _mesa_threadpool_create(4);
+   for (i = 0; i < NUM_THREADS; i++) {
+      thrd_create(&threads[i], race_random_op,
+            (void *) _mesa_threadpool_ref(pool));
+   }
+   _mesa_threadpool_unref(pool);
+
+   for (i = 0; i < NUM_THREADS; i++)
+      thrd_join(threads[i], NULL);
+
+   /* this is not really a unit test */
+   EXPECT_TRUE(true);
+}
+
+static void
+basic_cb(void *data)
+{
+   int *val = (int *) data;
+
+   usleep(1000 * 5);
+   *val = 1;
+}
+
+TEST(threadpool_test, basic)
+{
+   struct _mesa_threadpool *pool;
+   struct _mesa_threadpool_task *tasks[2];
+   int vals[2];
+
+   pool = _mesa_threadpool_create(2);
+
+   vals[0] = vals[1] = 0;
+   tasks[0] = _mesa_threadpool_queue_task(pool, basic_cb, (void *) &vals[0]);
+   tasks[1] = _mesa_threadpool_queue_task(pool, basic_cb, (void *) &vals[1]);
+   _mesa_threadpool_complete_task(pool, tasks[0]);
+   _mesa_threadpool_complete_task(pool, tasks[1]);
+   EXPECT_TRUE(vals[0] == 1 && vals[1] == 1);
+
+   _mesa_threadpool_unref(pool);
+}
+
+/*@}*/
diff --git a/icd/intel/compiler/shader/tests/uniform_initializer_utils.cpp b/icd/intel/compiler/shader/tests/uniform_initializer_utils.cpp
new file mode 100644
index 0000000..5e86c24
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/uniform_initializer_utils.cpp
@@ -0,0 +1,239 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <gtest/gtest.h>
+#include "main/mtypes.h"
+#include "main/macros.h"
+#include "ralloc.h"
+#include "uniform_initializer_utils.h"
+#include <stdio.h>
+
+void
+fill_storage_array_with_sentinels(gl_constant_value *storage,
+				  unsigned data_size,
+				  unsigned red_zone_size)
+{
+   for (unsigned i = 0; i < data_size; i++)
+      storage[i].u = 0xDEADBEEF;
+
+   for (unsigned i = 0; i < red_zone_size; i++)
+      storage[data_size + i].u = 0xBADDC0DE;
+}
+
+/**
+ * Verify that markers past the end of the real uniform are unmodified
+ */
+static ::testing::AssertionResult
+red_zone_is_intact(gl_constant_value *storage,
+		   unsigned data_size,
+		   unsigned red_zone_size)
+{
+   for (unsigned i = 0; i < red_zone_size; i++) {
+      const unsigned idx = data_size + i;
+
+      if (storage[idx].u != 0xBADDC0DE)
+	 return ::testing::AssertionFailure()
+	    << "storage[" << idx << "].u = "  << storage[idx].u
+	    << ", exepected data values = " << data_size
+	    << ", red-zone size = " << red_zone_size;
+   }
+
+   return ::testing::AssertionSuccess();
+}
+
+static const int values[] = {
+   2, 0, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53
+};
+
+/**
+ * Generate a single data element.
+ *
+ * This is used by both \c generate_data and \c generate_array_data to
+ * create the data.
+ */
+static void
+generate_data_element(void *mem_ctx, const glsl_type *type,
+		      ir_constant *&val, unsigned data_index_base)
+{
+   /* Set the initial data values for the generated constant.
+    */
+   ir_constant_data data;
+   memset(&data, 0, sizeof(data));
+   for (unsigned i = 0; i < type->components(); i++) {
+      const unsigned idx = (i + data_index_base) % Elements(values);
+      switch (type->base_type) {
+      case GLSL_TYPE_UINT:
+      case GLSL_TYPE_INT:
+      case GLSL_TYPE_SAMPLER:
+	 data.i[i] = values[idx];
+	 break;
+      case GLSL_TYPE_FLOAT:
+	 data.f[i] = float(values[idx]);
+	 break;
+      case GLSL_TYPE_BOOL:
+	 data.b[i] = bool(values[idx]);
+	 break;
+      case GLSL_TYPE_ATOMIC_UINT:
+      case GLSL_TYPE_STRUCT:
+      case GLSL_TYPE_ARRAY:
+      case GLSL_TYPE_VOID:
+      case GLSL_TYPE_ERROR:
+      case GLSL_TYPE_INTERFACE:
+	 ASSERT_TRUE(false);
+	 break;
+      }
+   }
+
+   /* Generate and verify the constant.
+    */
+   val = new(mem_ctx) ir_constant(type, &data);
+
+   for (unsigned i = 0; i < type->components(); i++) {
+      switch (type->base_type) {
+      case GLSL_TYPE_UINT:
+      case GLSL_TYPE_INT:
+      case GLSL_TYPE_SAMPLER:
+	 ASSERT_EQ(data.i[i], val->value.i[i]);
+	 break;
+      case GLSL_TYPE_FLOAT:
+	 ASSERT_EQ(data.f[i], val->value.f[i]);
+	 break;
+      case GLSL_TYPE_BOOL:
+	 ASSERT_EQ(data.b[i], val->value.b[i]);
+	 break;
+      case GLSL_TYPE_ATOMIC_UINT:
+      case GLSL_TYPE_STRUCT:
+      case GLSL_TYPE_ARRAY:
+      case GLSL_TYPE_VOID:
+      case GLSL_TYPE_ERROR:
+      case GLSL_TYPE_INTERFACE:
+	 ASSERT_TRUE(false);
+	 break;
+      }
+   }
+}
+
+void
+generate_data(void *mem_ctx, enum glsl_base_type base_type,
+	      unsigned columns, unsigned rows,
+	      ir_constant *&val)
+{
+   /* Determine what the type of the generated constant should be.
+    */
+   const glsl_type *const type =
+      glsl_type::get_instance(base_type, rows, columns);
+   ASSERT_FALSE(type->is_error());
+
+   generate_data_element(mem_ctx, type, val, 0);
+}
+
+void
+generate_array_data(void *mem_ctx, enum glsl_base_type base_type,
+		    unsigned columns, unsigned rows, unsigned array_size,
+		    ir_constant *&val)
+{
+   /* Determine what the type of the generated constant should be.
+    */
+   const glsl_type *const element_type =
+      glsl_type::get_instance(base_type, rows, columns);
+   ASSERT_FALSE(element_type->is_error());
+
+   const glsl_type *const array_type =
+      glsl_type::get_array_instance(element_type, array_size);
+   ASSERT_FALSE(array_type->is_error());
+
+   /* Set the initial data values for the generated constant.
+    */
+   exec_list values_for_array;
+   for (unsigned i = 0; i < array_size; i++) {
+      ir_constant *element;
+
+      generate_data_element(mem_ctx, element_type, element, i);
+      values_for_array.push_tail(element);
+   }
+
+   val = new(mem_ctx) ir_constant(array_type, &values_for_array);
+}
+
+/**
+ * Verify that the data stored for the uniform matches the initializer
+ *
+ * \param storage              Backing storage for the uniform
+ * \param storage_array_size  Array size of the backing storage.  This must be
+ *                            less than or equal to the array size of the type
+ *                            of \c val.  If \c val is not an array, this must
+ *                            be zero.
+ * \param val                 Value of the initializer for the uniform.
+ * \param red_zone_size       Number of red-zone (sentinel) components
+ *                            expected past the end of the data.
+ */
+void
+verify_data(gl_constant_value *storage, unsigned storage_array_size,
+	    ir_constant *val, unsigned red_zone_size)
+{
+   if (val->type->base_type == GLSL_TYPE_ARRAY) {
+      const glsl_type *const element_type = val->array_elements[0]->type;
+
+      for (unsigned i = 0; i < storage_array_size; i++) {
+	 verify_data(storage + (i * element_type->components()), 0,
+		     val->array_elements[i], 0);
+      }
+
+      const unsigned components = element_type->components();
+
+      if (red_zone_size > 0) {
+	 EXPECT_TRUE(red_zone_is_intact(storage,
+					storage_array_size * components,
+					red_zone_size));
+      }
+   } else {
+      ASSERT_EQ(0u, storage_array_size);
+      for (unsigned i = 0; i < val->type->components(); i++) {
+	 switch (val->type->base_type) {
+	 case GLSL_TYPE_UINT:
+	 case GLSL_TYPE_INT:
+	 case GLSL_TYPE_SAMPLER:
+	    EXPECT_EQ(val->value.i[i], storage[i].i);
+	    break;
+	 case GLSL_TYPE_FLOAT:
+	    EXPECT_EQ(val->value.f[i], storage[i].f);
+	    break;
+	 case GLSL_TYPE_BOOL:
+	    EXPECT_EQ(int(val->value.b[i]), storage[i].i);
+	    break;
+         case GLSL_TYPE_ATOMIC_UINT:
+	 case GLSL_TYPE_STRUCT:
+	 case GLSL_TYPE_ARRAY:
+	 case GLSL_TYPE_VOID:
+	 case GLSL_TYPE_ERROR:
+	 case GLSL_TYPE_INTERFACE:
+	    ASSERT_TRUE(false);
+	    break;
+	 }
+      }
+
+      if (red_zone_size > 0) {
+	 EXPECT_TRUE(red_zone_is_intact(storage,
+					val->type->components(),
+					red_zone_size));
+      }
+   }
+}
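+
+/*
+ * A sketch of how these helpers fit together (hypothetical caller code;
+ * "storage" must hold components + red_zone_size gl_constant_values):
+ *
+ *    ir_constant *val;
+ *    generate_data(mem_ctx, GLSL_TYPE_FLOAT, 1, 4, val);    // a vec4
+ *    fill_storage_array_with_sentinels(storage, 4, red_zone_size);
+ *    ...copy the initializer into storage[0..3]...
+ *    verify_data(storage, 0, val, red_zone_size);
+ */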
diff --git a/icd/intel/compiler/shader/tests/uniform_initializer_utils.h b/icd/intel/compiler/shader/tests/uniform_initializer_utils.h
new file mode 100644
index 0000000..f8c06d2
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/uniform_initializer_utils.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#pragma once
+
+#include "program/prog_parameter.h"
+#include "ir.h"
+#include "ir_uniform.h"
+
+extern void
+fill_storage_array_with_sentinels(gl_constant_value *storage,
+				  unsigned data_size,
+				  unsigned red_zone_size);
+
+extern void
+generate_data(void *mem_ctx, enum glsl_base_type base_type,
+	      unsigned columns, unsigned rows,
+	      ir_constant *&val);
+
+extern void
+generate_array_data(void *mem_ctx, enum glsl_base_type base_type,
+		    unsigned columns, unsigned rows, unsigned array_size,
+		    ir_constant *&val);
+
+extern void
+verify_data(gl_constant_value *storage, unsigned storage_array_size,
+	    ir_constant *val, unsigned red_zone_size);
diff --git a/icd/intel/compiler/shader/tests/varyings_test.cpp b/icd/intel/compiler/shader/tests/varyings_test.cpp
new file mode 100644
index 0000000..662fc0e
--- /dev/null
+++ b/icd/intel/compiler/shader/tests/varyings_test.cpp
@@ -0,0 +1,357 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include <gtest/gtest.h>
+#include "main/compiler.h"
+#include "main/mtypes.h"
+#include "main/macros.h"
+#include "ralloc.h"
+#include "ir.h"
+#include "program/hash_table.h"
+
+/**
+ * \file varyings_test.cpp
+ *
+ * Test various aspects of linking shader stage inputs and outputs.
+ */
+
+namespace linker {
+bool
+populate_consumer_input_sets(void *mem_ctx, exec_list *ir,
+                             hash_table *consumer_inputs,
+                             hash_table *consumer_interface_inputs,
+                             ir_variable *consumer_inputs_with_locations[VARYING_SLOT_MAX]);
+
+ir_variable *
+get_matching_input(void *mem_ctx,
+                   const ir_variable *output_var,
+                   hash_table *consumer_inputs,
+                   hash_table *consumer_interface_inputs,
+                   ir_variable *consumer_inputs_with_locations[VARYING_SLOT_MAX]);
+}
+
+class link_varyings : public ::testing::Test {
+public:
+   link_varyings();
+
+   virtual void SetUp();
+   virtual void TearDown();
+
+   char *interface_field_name(const glsl_type *iface, unsigned field = 0)
+   {
+      return ralloc_asprintf(mem_ctx,
+                             "%s.%s",
+                             iface->name,
+                             iface->fields.structure[field].name);
+   }
+
+   void *mem_ctx;
+   exec_list ir;
+   hash_table *consumer_inputs;
+   hash_table *consumer_interface_inputs;
+
+   const glsl_type *simple_interface;
+   ir_variable *junk[VARYING_SLOT_MAX];
+};
+
+link_varyings::link_varyings()
+{
+   static const glsl_struct_field f[] = {
+      {
+         glsl_type::vec(4),
+         "v",
+         false,
+         0,
+         0,
+         0,
+         0
+      }
+   };
+
+   this->simple_interface =
+      glsl_type::get_interface_instance(f,
+                                        ARRAY_SIZE(f),
+                                        GLSL_INTERFACE_PACKING_STD140,
+                                        "simple_interface");
+}
+
+void
+link_varyings::SetUp()
+{
+   this->mem_ctx = ralloc_context(NULL);
+   this->ir.make_empty();
+
+   this->consumer_inputs
+      = hash_table_ctor(0, hash_table_string_hash, hash_table_string_compare);
+
+   this->consumer_interface_inputs
+      = hash_table_ctor(0, hash_table_string_hash, hash_table_string_compare);
+}
+
+void
+link_varyings::TearDown()
+{
+   ralloc_free(this->mem_ctx);
+   this->mem_ctx = NULL;
+
+   hash_table_dtor(this->consumer_inputs);
+   this->consumer_inputs = NULL;
+   hash_table_dtor(this->consumer_interface_inputs);
+   this->consumer_interface_inputs = NULL;
+}
+
+/**
+ * Hash table callback function that counts the elements in the table
+ *
+ * \sa num_elements
+ */
+static void
+ht_count_callback(const void *, void *, void *closure)
+{
+   unsigned int *counter = (unsigned int *) closure;
+
+   (*counter)++;
+}
+
+/**
+ * Helper function to count the number of elements in a hash table.
+ */
+static unsigned
+num_elements(hash_table *ht)
+{
+   unsigned int counter = 0;
+
+   hash_table_call_foreach(ht, ht_count_callback, (void *) &counter);
+
+   return counter;
+}
+
+/**
+ * Helper function to determine whether a hash table is empty.
+ */
+static bool
+is_empty(hash_table *ht)
+{
+   return num_elements(ht) == 0;
+}
+
+TEST_F(link_varyings, single_simple_input)
+{
+   ir_variable *const v =
+      new(mem_ctx) ir_variable(glsl_type::vec(4),
+                               "a",
+                               ir_var_shader_in);
+
+   ir.push_tail(v);
+
+   ASSERT_TRUE(linker::populate_consumer_input_sets(mem_ctx,
+                                                    &ir,
+                                                    consumer_inputs,
+                                                    consumer_interface_inputs,
+                                                    junk));
+
+   EXPECT_EQ((void *) v, hash_table_find(consumer_inputs, "a"));
+   EXPECT_EQ(1u, num_elements(consumer_inputs));
+   EXPECT_TRUE(is_empty(consumer_interface_inputs));
+}
+
+TEST_F(link_varyings, gl_ClipDistance)
+{
+   const glsl_type *const array_8_of_float =
+      glsl_type::get_array_instance(glsl_type::vec(1), 8);
+
+   ir_variable *const clipdistance =
+      new(mem_ctx) ir_variable(array_8_of_float,
+                               "gl_ClipDistance",
+                               ir_var_shader_in);
+
+   clipdistance->data.explicit_location = true;
+   clipdistance->data.location = VARYING_SLOT_CLIP_DIST0;
+   clipdistance->data.explicit_index = 0;
+
+   ir.push_tail(clipdistance);
+
+   ASSERT_TRUE(linker::populate_consumer_input_sets(mem_ctx,
+                                                    &ir,
+                                                    consumer_inputs,
+                                                    consumer_interface_inputs,
+                                                    junk));
+
+   EXPECT_EQ(clipdistance, junk[VARYING_SLOT_CLIP_DIST0]);
+   EXPECT_TRUE(is_empty(consumer_inputs));
+   EXPECT_TRUE(is_empty(consumer_interface_inputs));
+}
+
+TEST_F(link_varyings, single_interface_input)
+{
+   ir_variable *const v =
+      new(mem_ctx) ir_variable(simple_interface->fields.structure[0].type,
+                               simple_interface->fields.structure[0].name,
+                               ir_var_shader_in);
+
+   v->init_interface_type(simple_interface);
+
+   ir.push_tail(v);
+
+   ASSERT_TRUE(linker::populate_consumer_input_sets(mem_ctx,
+                                                    &ir,
+                                                    consumer_inputs,
+                                                    consumer_interface_inputs,
+                                                    junk));
+   char *const full_name = interface_field_name(simple_interface);
+
+   EXPECT_EQ((void *) v, hash_table_find(consumer_interface_inputs, full_name));
+   EXPECT_EQ(1u, num_elements(consumer_interface_inputs));
+   EXPECT_TRUE(is_empty(consumer_inputs));
+}
+
+TEST_F(link_varyings, one_interface_and_one_simple_input)
+{
+   ir_variable *const v =
+      new(mem_ctx) ir_variable(glsl_type::vec(4),
+                               "a",
+                               ir_var_shader_in);
+
+   ir.push_tail(v);
+
+   ir_variable *const iface =
+      new(mem_ctx) ir_variable(simple_interface->fields.structure[0].type,
+                               simple_interface->fields.structure[0].name,
+                               ir_var_shader_in);
+
+   iface->init_interface_type(simple_interface);
+
+   ir.push_tail(iface);
+
+   ASSERT_TRUE(linker::populate_consumer_input_sets(mem_ctx,
+                                                    &ir,
+                                                    consumer_inputs,
+                                                    consumer_interface_inputs,
+                                                    junk));
+
+   char *const iface_field_name = interface_field_name(simple_interface);
+
+   EXPECT_EQ((void *) iface, hash_table_find(consumer_interface_inputs,
+                                             iface_field_name));
+   EXPECT_EQ(1u, num_elements(consumer_interface_inputs));
+
+   EXPECT_EQ((void *) v, hash_table_find(consumer_inputs, "a"));
+   EXPECT_EQ(1u, num_elements(consumer_inputs));
+}
+
+TEST_F(link_varyings, invalid_interface_input)
+{
+   ir_variable *const v =
+      new(mem_ctx) ir_variable(simple_interface,
+                               "named_interface",
+                               ir_var_shader_in);
+
+   ASSERT_EQ(simple_interface, v->get_interface_type());
+
+   ir.push_tail(v);
+
+   EXPECT_FALSE(linker::populate_consumer_input_sets(mem_ctx,
+                                                     &ir,
+                                                     consumer_inputs,
+                                                     consumer_interface_inputs,
+                                                     junk));
+}
+
+TEST_F(link_varyings, interface_field_doesnt_match_noninterface)
+{
+   char *const iface_field_name = interface_field_name(simple_interface);
+
+   /* The input shader has a single input variable named "a.v".
+    */
+   ir_variable *const in_v =
+      new(mem_ctx) ir_variable(glsl_type::vec(4),
+                               iface_field_name,
+                               ir_var_shader_in);
+
+   ir.push_tail(in_v);
+
+   ASSERT_TRUE(linker::populate_consumer_input_sets(mem_ctx,
+                                                    &ir,
+                                                    consumer_inputs,
+                                                    consumer_interface_inputs,
+                                                    junk));
+
+   /* Create an output variable, "v", that is part of an interface block named
+    * "a".  They should not match.
+    */
+   ir_variable *const out_v =
+      new(mem_ctx) ir_variable(simple_interface->fields.structure[0].type,
+                               simple_interface->fields.structure[0].name,
+                               ir_var_shader_out);
+
+   out_v->init_interface_type(simple_interface);
+
+   ir_variable *const match =
+      linker::get_matching_input(mem_ctx,
+                                 out_v,
+                                 consumer_inputs,
+                                 consumer_interface_inputs,
+                                 junk);
+
+   EXPECT_EQ(NULL, match);
+}
+
+TEST_F(link_varyings, interface_field_doesnt_match_noninterface_vice_versa)
+{
+   char *const iface_field_name = interface_field_name(simple_interface);
+
+   /* The input shader has a single variable, "v", that is part of an interface
+    * block named "a".
+    */
+   ir_variable *const in_v =
+      new(mem_ctx) ir_variable(simple_interface->fields.structure[0].type,
+                               simple_interface->fields.structure[0].name,
+                               ir_var_shader_in);
+
+   in_v->init_interface_type(simple_interface);
+
+   ir.push_tail(in_v);
+
+   ASSERT_TRUE(linker::populate_consumer_input_sets(mem_ctx,
+                                                    &ir,
+                                                    consumer_inputs,
+                                                    consumer_interface_inputs,
+                                                    junk));
+
+   /* Create an output variable "a.v".  They should not match.
+    */
+   ir_variable *const out_v =
+      new(mem_ctx) ir_variable(glsl_type::vec(4),
+                               iface_field_name,
+                               ir_var_shader_out);
+
+   ir_variable *const match =
+      linker::get_matching_input(mem_ctx,
+                                 out_v,
+                                 consumer_inputs,
+                                 consumer_interface_inputs,
+                                 junk);
+
+   EXPECT_EQ(NULL, match);
+}
diff --git a/icd/intel/compiler/shader/threadpool.c b/icd/intel/compiler/shader/threadpool.c
new file mode 100644
index 0000000..d6ed8c1
--- /dev/null
+++ b/icd/intel/compiler/shader/threadpool.c
@@ -0,0 +1,548 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2014  LunarG, Inc.   All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdbool.h>
+#include "c11/threads.h"
+#include "main/compiler.h"
+#include "main/simple_list.h"
+#include "threadpool.h"
+
+enum _mesa_threadpool_control {
+   MESA_THREADPOOL_NORMAL,    /* threads wait when there is no task */
+   MESA_THREADPOOL_QUIT,      /* threads quit when there is no task */
+   MESA_THREADPOOL_QUIT_NOW   /* threads quit as soon as possible */
+};
+
+enum _mesa_threadpool_task_state {
+   MESA_THREADPOOL_TASK_PENDING,    /* task is on the pending list */
+   MESA_THREADPOOL_TASK_ACTIVE,     /* task is being worked on */
+   MESA_THREADPOOL_TASK_COMPLETED,  /* task has been completed */
+   MESA_THREADPOOL_TASK_CANCELLED   /* task is cancelled */
+};
+
+struct _mesa_threadpool_task {
+   /* these are protected by the pool's mutex */
+   struct simple_node link; /* must be the first */
+   enum _mesa_threadpool_task_state state;
+   cnd_t completed;
+
+   void (*func)(void *);
+   void *data;
+};
+
+struct _mesa_threadpool {
+   mtx_t mutex;
+   int refcnt;
+   bool shutdown;
+
+   enum _mesa_threadpool_control thread_control;
+   thrd_t *threads;
+   int num_threads, max_threads;
+   int idle_threads; /* number of threads that are idle */
+   cnd_t thread_wakeup;
+   cnd_t thread_joined;
+
+   struct simple_node pending_tasks;
+   int num_pending_tasks;
+   int num_tasks;
+};
+
+static struct _mesa_threadpool_task *
+task_create(void)
+{
+   struct _mesa_threadpool_task *task;
+
+   task = malloc(sizeof(*task));
+   if (!task)
+      return NULL;
+
+   if (cnd_init(&task->completed)) {
+      free(task);
+      return NULL;
+   }
+
+   task->state = MESA_THREADPOOL_TASK_PENDING;
+
+   return task;
+}
+
+static void
+task_destroy(struct _mesa_threadpool_task *task)
+{
+   cnd_destroy(&task->completed);
+   free(task);
+}
+
+static void
+pool_exec_task(struct _mesa_threadpool *pool,
+               struct _mesa_threadpool_task *task)
+{
+   assert(task->state == MESA_THREADPOOL_TASK_PENDING);
+
+   remove_from_list(&task->link);
+   pool->num_pending_tasks--;
+
+   task->state = MESA_THREADPOOL_TASK_ACTIVE;
+
+   /* do the work! */
+   mtx_unlock(&pool->mutex);
+   task->func(task->data);
+   mtx_lock(&pool->mutex);
+
+   task->state = MESA_THREADPOOL_TASK_COMPLETED;
+}
+
+static int
+_mesa_threadpool_worker(void *arg)
+{
+   struct _mesa_threadpool *pool = (struct _mesa_threadpool *) arg;
+
+   mtx_lock(&pool->mutex);
+
+   while (true) {
+      struct _mesa_threadpool_task *task;
+
+      /* wait until there are tasks */
+      while (is_empty_list(&pool->pending_tasks) &&
+             pool->thread_control == MESA_THREADPOOL_NORMAL) {
+         pool->idle_threads++;
+         cnd_wait(&pool->thread_wakeup, &pool->mutex);
+         pool->idle_threads--;
+      }
+
+      if (pool->thread_control != MESA_THREADPOOL_NORMAL) {
+         if (pool->thread_control == MESA_THREADPOOL_QUIT_NOW ||
+             is_empty_list(&pool->pending_tasks))
+            break;
+      }
+
+      assert(!is_empty_list(&pool->pending_tasks));
+      task = (struct _mesa_threadpool_task *)
+         first_elem(&pool->pending_tasks);
+
+      pool_exec_task(pool, task);
+      cnd_signal(&task->completed);
+   }
+
+   mtx_unlock(&pool->mutex);
+
+   return 0;
+}
+
+/**
+ * Queue a new task.
+ */
+struct _mesa_threadpool_task *
+_mesa_threadpool_queue_task(struct _mesa_threadpool *pool,
+                            void (*func)(void *), void *data)
+{
+   struct _mesa_threadpool_task *task;
+
+   task = task_create();
+   if (!task)
+      return NULL;
+
+   task->func = func;
+   task->data = data;
+
+   mtx_lock(&pool->mutex);
+
+   if (unlikely(pool->shutdown)) {
+      mtx_unlock(&pool->mutex);
+      task_destroy(task);
+      return NULL;
+   }
+
+   /* someone is joining with the threads */
+   while (unlikely(pool->thread_control != MESA_THREADPOOL_NORMAL))
+      cnd_wait(&pool->thread_joined, &pool->mutex);
+
+   /* spawn threads as needed */
+   if (pool->idle_threads <= pool->num_pending_tasks &&
+       pool->num_threads < pool->max_threads) {
+      int err;
+
+      err = thrd_create(&pool->threads[pool->num_threads],
+                        _mesa_threadpool_worker, (void *) pool);
+      if (!err)
+         pool->num_threads++;
+
+      if (!pool->num_threads) {
+         mtx_unlock(&pool->mutex);
+         task_destroy(task);
+         return NULL;
+      }
+   }
+
+   insert_at_tail(&pool->pending_tasks, &task->link);
+   pool->num_tasks++;
+   pool->num_pending_tasks++;
+   cnd_signal(&pool->thread_wakeup);
+
+   mtx_unlock(&pool->mutex);
+
+   return task;
+}
+
+/**
+ * Wait for the tasks to complete, then destroy them.
+ */
+static bool
+pool_wait_tasks(struct _mesa_threadpool *pool,
+                struct _mesa_threadpool_task **tasks,
+                int num_tasks)
+{
+   bool all_completed = true;
+   int i;
+
+   for (i = 0; i < num_tasks; i++) {
+      struct _mesa_threadpool_task *task = tasks[i];
+
+      while (task->state != MESA_THREADPOOL_TASK_COMPLETED &&
+             task->state != MESA_THREADPOOL_TASK_CANCELLED)
+         cnd_wait(&task->completed, &pool->mutex);
+
+      if (task->state != MESA_THREADPOOL_TASK_COMPLETED)
+         all_completed = false;
+
+      task_destroy(task);
+   }
+
+   pool->num_tasks -= num_tasks;
+
+   return all_completed;
+}
+
+/**
+ * Wait for \p tasks to complete, and destroy them.  If some of \p tasks
+ * cannot be completed, return false.
+ *
+ * This function can be called from within the worker threads.
+ */
+bool
+_mesa_threadpool_complete_tasks(struct _mesa_threadpool *pool,
+                                struct _mesa_threadpool_task **tasks,
+                                int num_tasks)
+{
+   bool prioritized = false, completed;
+   int i;
+
+   mtx_lock(&pool->mutex);
+
+   /* we need to do something about tasks that are pending */
+   for (i = 0; i < num_tasks; i++) {
+      struct _mesa_threadpool_task *task = tasks[i];
+
+      if (task->state != MESA_THREADPOOL_TASK_PENDING)
+         continue;
+
+      /* move them to the head so that they are executed next */
+      if (!prioritized) {
+         int j;
+
+         for (j = i + 1; j < num_tasks; j++) {
+            if (tasks[j]->state == MESA_THREADPOOL_TASK_PENDING)
+               move_to_head(&pool->pending_tasks, &tasks[j]->link);
+         }
+         prioritized = true;
+      }
+
+      /*
+       * Execute the task right away: waiting for a worker thread would be no
+       * faster.  More importantly, when this is called from within a worker
+       * thread, there may be no idle thread available to execute it.
+       */
+      pool_exec_task(pool, task);
+   }
+
+   completed = pool_wait_tasks(pool, tasks, num_tasks);
+
+   mtx_unlock(&pool->mutex);
+
+   return completed;
+}
+
+/**
+ * This is equivalent to calling \p _mesa_threadpool_complete_tasks with one
+ * task.
+ */
+bool
+_mesa_threadpool_complete_task(struct _mesa_threadpool *pool,
+                               struct _mesa_threadpool_task *task)
+{
+   bool completed;
+
+   mtx_lock(&pool->mutex);
+
+   if (task->state == MESA_THREADPOOL_TASK_PENDING)
+      pool_exec_task(pool, task);
+
+   completed = pool_wait_tasks(pool, &task, 1);
+
+   mtx_unlock(&pool->mutex);
+
+   return completed;
+}
+
+static void
+pool_cancel_pending_tasks(struct _mesa_threadpool *pool)
+{
+   struct simple_node *node, *temp;
+
+   if (is_empty_list(&pool->pending_tasks))
+      return;
+
+   foreach_s(node, temp, &pool->pending_tasks) {
+      struct _mesa_threadpool_task *task =
+         (struct _mesa_threadpool_task *) node;
+
+      remove_from_list(&task->link);
+      task->state = MESA_THREADPOOL_TASK_CANCELLED;
+
+      /* in case some thread is already waiting */
+      cnd_signal(&task->completed);
+   }
+
+   pool->num_pending_tasks = 0;
+}
+
+static void
+pool_join_threads(struct _mesa_threadpool *pool, bool graceful)
+{
+   int joined_threads = 0;
+
+   if (!pool->num_threads)
+      return;
+
+   pool->thread_control = (graceful) ?
+      MESA_THREADPOOL_QUIT : MESA_THREADPOOL_QUIT_NOW;
+
+   while (joined_threads < pool->num_threads) {
+      int i = joined_threads, num_threads = pool->num_threads;
+
+      cnd_broadcast(&pool->thread_wakeup);
+      mtx_unlock(&pool->mutex);
+      while (i < num_threads)
+         thrd_join(pool->threads[i++], NULL);
+      mtx_lock(&pool->mutex);
+
+      joined_threads = num_threads;
+   }
+
+   pool->thread_control = MESA_THREADPOOL_NORMAL;
+   pool->num_threads = 0;
+   assert(!pool->idle_threads);
+}
+
+/**
+ * Join with all pool threads.  When \p graceful is true, wait for the pending
+ * tasks to be completed.
+ */
+void
+_mesa_threadpool_join(struct _mesa_threadpool *pool, bool graceful)
+{
+   mtx_lock(&pool->mutex);
+
+   /* someone is already joining with the threads */
+   while (unlikely(pool->thread_control != MESA_THREADPOOL_NORMAL))
+      cnd_wait(&pool->thread_joined, &pool->mutex);
+
+   if (pool->num_threads) {
+      pool_join_threads(pool, graceful);
+      /* wake up whoever is waiting */
+      cnd_broadcast(&pool->thread_joined);
+   }
+
+   if (!graceful)
+      pool_cancel_pending_tasks(pool);
+
+   assert(pool->num_threads == 0);
+   assert(is_empty_list(&pool->pending_tasks) && !pool->num_pending_tasks);
+
+   mtx_unlock(&pool->mutex);
+}
+
+/**
+ * After this call, no task can be queued.
+ */
+static void
+_mesa_threadpool_set_shutdown(struct _mesa_threadpool *pool)
+{
+   mtx_lock(&pool->mutex);
+   pool->shutdown = true;
+   mtx_unlock(&pool->mutex);
+}
+
+/**
+ * Decrease the reference count.  Destroy \p pool when the reference count
+ * reaches zero.
+ */
+void
+_mesa_threadpool_unref(struct _mesa_threadpool *pool)
+{
+   bool destroy = false;
+
+   mtx_lock(&pool->mutex);
+   pool->refcnt--;
+   destroy = (pool->refcnt == 0);
+   mtx_unlock(&pool->mutex);
+
+   if (destroy) {
+      _mesa_threadpool_join(pool, false);
+
+      if (pool->num_tasks) {
+         fprintf(stderr, "thread pool destroyed with %d tasks\n",
+               pool->num_tasks);
+      }
+
+      free(pool->threads);
+      cnd_destroy(&pool->thread_joined);
+      cnd_destroy(&pool->thread_wakeup);
+      mtx_destroy(&pool->mutex);
+      free(pool);
+   }
+}
+
+/**
+ * Increase the reference count.
+ */
+struct _mesa_threadpool *
+_mesa_threadpool_ref(struct _mesa_threadpool *pool)
+{
+   mtx_lock(&pool->mutex);
+   pool->refcnt++;
+   mtx_unlock(&pool->mutex);
+
+   return pool;
+}
+
+/**
+ * Create a thread pool.  Since threads are spawned only as needed, creating
+ * a pool is inexpensive.
+ */
+struct _mesa_threadpool *
+_mesa_threadpool_create(int max_threads)
+{
+   struct _mesa_threadpool *pool;
+
+   if (max_threads < 1)
+      return NULL;
+
+   pool = calloc(1, sizeof(*pool));
+   if (!pool)
+      return NULL;
+
+   if (mtx_init(&pool->mutex, mtx_plain)) {
+      free(pool);
+      return NULL;
+   }
+
+   pool->refcnt = 1;
+
+   if (cnd_init(&pool->thread_wakeup)) {
+      mtx_destroy(&pool->mutex);
+      free(pool);
+      return NULL;
+   }
+
+   if (cnd_init(&pool->thread_joined)) {
+      cnd_destroy(&pool->thread_wakeup);
+      mtx_destroy(&pool->mutex);
+      free(pool);
+      return NULL;
+   }
+
+   pool->thread_control = MESA_THREADPOOL_NORMAL;
+
+   pool->threads = malloc(sizeof(pool->threads[0]) * max_threads);
+   if (!pool->threads) {
+      cnd_destroy(&pool->thread_joined);
+      cnd_destroy(&pool->thread_wakeup);
+      mtx_destroy(&pool->mutex);
+      free(pool);
+      return NULL;
+   }
+
+   pool->max_threads = max_threads;
+
+   make_empty_list(&pool->pending_tasks);
+
+   return pool;
+}
+
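+/*
+ * A sketch of the typical pool lifecycle (some_func and some_data are
+ * placeholders; error handling omitted):
+ *
+ *    struct _mesa_threadpool *pool = _mesa_threadpool_create(4);
+ *    struct _mesa_threadpool_task *task =
+ *       _mesa_threadpool_queue_task(pool, some_func, some_data);
+ *    ...do other work...
+ *    _mesa_threadpool_complete_task(pool, task);
+ *    _mesa_threadpool_unref(pool);
+ */
+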
+static mtx_t threadpool_lock = _MTX_INITIALIZER_NP;
+static struct _mesa_threadpool *threadpool;
+
+/**
+ * Get the singleton GLSL thread pool.  \p max_threads is honored only by the
+ * first call to this function.
+ */
+struct _mesa_threadpool *
+_mesa_glsl_get_threadpool(int max_threads)
+{
+   mtx_lock(&threadpool_lock);
+   if (!threadpool)
+      threadpool = _mesa_threadpool_create(max_threads);
+   if (threadpool)
+      _mesa_threadpool_ref(threadpool);
+   mtx_unlock(&threadpool_lock);
+
+   return threadpool;
+}
+
+/**
+ * Wait until all tasks are completed and threads are joined.
+ */
+void
+_mesa_glsl_wait_threadpool(void)
+{
+   mtx_lock(&threadpool_lock);
+   if (threadpool)
+      _mesa_threadpool_join(threadpool, true);
+   mtx_unlock(&threadpool_lock);
+}
+
+/**
+ * Destroy the GLSL thread pool.
+ */
+void
+_mesa_glsl_destroy_threadpool(void)
+{
+   mtx_lock(&threadpool_lock);
+   if (threadpool) {
+      /*
+       * This is called from _mesa_destroy_shader_compiler().  No new task is
+       * allowed from this point on, but contexts, which also hold references
+       * to the pool, can still complete tasks that have already been queued.
+       */
+      _mesa_threadpool_set_shutdown(threadpool);
+
+      _mesa_threadpool_join(threadpool, false);
+      _mesa_threadpool_unref(threadpool);
+      threadpool = NULL;
+   }
+   mtx_unlock(&threadpool_lock);
+}
diff --git a/icd/intel/desc.c b/icd/intel/desc.c
new file mode 100644
index 0000000..2f3802b
--- /dev/null
+++ b/icd/intel/desc.c
@@ -0,0 +1,1012 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ * Author: Tony Barbour <tony@LunarG.com>
+ *
+ */
+
+#include "buf.h"
+#include "cmd.h"
+#include "dev.h"
+#include "gpu.h"
+#include "img.h"
+#include "sampler.h"
+#include "view.h"
+#include "desc.h"
+
+enum intel_desc_surface_type {
+    INTEL_DESC_SURFACE_UNUSED,
+    INTEL_DESC_SURFACE_BUF,
+    INTEL_DESC_SURFACE_IMG,
+};
+
+struct intel_desc_surface {
+    const struct intel_mem *mem;
+    bool read_only;
+
+    enum intel_desc_surface_type type;
+    union {
+        const void *unused;
+        struct intel_buf_view buf;
+        const struct intel_img_view *img;
+    } u;
+};
+
+struct intel_desc_sampler {
+    const struct intel_sampler *sampler;
+};
+
+bool intel_desc_iter_init_for_binding(struct intel_desc_iter *iter,
+                                      const struct intel_desc_layout *layout,
+                                      uint32_t binding_index, uint32_t array_base)
+{
+    const struct intel_desc_layout_binding *binding;
+    uint32_t i;
+
+    /* should we waste some memory to get rid of this loop? */
+    for (i = 0; i < layout->binding_count; i++) {
+        if (layout->bindings[i].binding == binding_index)
+            break;
+    }
+
+    if (i >= layout->binding_count ||
+        array_base >= layout->bindings[i].array_size)
+        return false;
+
+    binding = &layout->bindings[i];
+
+    iter->type = binding->type;
+    iter->increment = binding->increment;
+    iter->size = binding->array_size;
+
+    intel_desc_offset_mad(&iter->begin, &binding->increment,
+            &binding->offset, array_base);
+    intel_desc_offset_add(&iter->end, &iter->begin, &binding->increment);
+    iter->cur = array_base;
+
+    return true;
+}
+
+static bool desc_iter_init_for_writing(struct intel_desc_iter *iter,
+                                       const struct intel_desc_set *set,
+                                       uint32_t binding_index, uint32_t array_base)
+{
+    if (!intel_desc_iter_init_for_binding(iter, set->layout,
+                binding_index, array_base))
+        return false;
+
+    intel_desc_offset_add(&iter->begin, &iter->begin, &set->region_begin);
+    intel_desc_offset_add(&iter->end, &iter->end, &set->region_begin);
+
+    return true;
+}
+
+bool intel_desc_iter_advance(struct intel_desc_iter *iter)
+{
+    if (iter->cur >= iter->size)
+        return false;
+
+    iter->cur++;
+
+    iter->begin = iter->end;
+    intel_desc_offset_add(&iter->end, &iter->end, &iter->increment);
+
+    return true;
+}
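+
+/*
+ * A sketch of walking array elements of a binding with the iterator
+ * (layout, binding_index, and count are placeholders; count must not
+ * exceed the binding's array size):
+ *
+ *    struct intel_desc_iter iter;
+ *    uint32_t i;
+ *    if (intel_desc_iter_init_for_binding(&iter, layout, binding_index, 0)) {
+ *       for (i = 0; i < count; i++) {
+ *          ...[iter.begin, iter.end) spans array element i...
+ *          intel_desc_iter_advance(&iter);
+ *       }
+ *    }
+ */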
+
+static void desc_region_init_desc_sizes(struct intel_desc_region *region,
+                                        const struct intel_gpu *gpu)
+{
+    region->surface_desc_size = sizeof(struct intel_desc_surface);
+    region->sampler_desc_size = sizeof(struct intel_desc_sampler);
+}
+
+VkResult intel_desc_region_create(struct intel_dev *dev,
+                                    struct intel_desc_region **region_ret)
+{
+    const uint32_t surface_count = 1024*1024;
+    const uint32_t sampler_count = 1024*1024;
+    struct intel_desc_region *region;
+
+    region = intel_alloc(dev, sizeof(*region), sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_DEVICE);
+    if (!region)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    memset(region, 0, sizeof(*region));
+
+    desc_region_init_desc_sizes(region, dev->gpu);
+
+    intel_desc_offset_set(&region->size,
+            region->surface_desc_size * surface_count,
+            region->sampler_desc_size * sampler_count);
+
+    region->surfaces = intel_alloc(dev, region->size.surface,
+            64, VK_SYSTEM_ALLOCATION_SCOPE_DEVICE);
+    if (!region->surfaces) {
+        intel_free(dev, region);
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    }
+
+    region->samplers = intel_alloc(dev, region->size.sampler,
+            64, VK_SYSTEM_ALLOCATION_SCOPE_DEVICE);
+    if (!region->samplers) {
+        intel_free(dev, region->surfaces);
+        intel_free(dev, region);
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    }
+
+    *region_ret = region;
+
+    return VK_SUCCESS;
+}
+
+void intel_desc_region_destroy(struct intel_dev *dev,
+                               struct intel_desc_region *region)
+{
+    intel_free(dev, region->samplers);
+    intel_free(dev, region->surfaces);
+    intel_free(dev, region);
+}
+
+/**
+ * Get the size of a descriptor in the region.
+ */
+static VkResult desc_region_get_desc_size(const struct intel_desc_region *region,
+                                            VkDescriptorType type,
+                                            struct intel_desc_offset *size)
+{
+    uint32_t surface_size = 0, sampler_size = 0;
+
+    switch (type) {
+    case VK_DESCRIPTOR_TYPE_SAMPLER:
+        sampler_size = region->sampler_desc_size;
+        break;
+    case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:
+        surface_size = region->surface_desc_size;
+        sampler_size = region->sampler_desc_size;
+        break;
+    case VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE:
+    case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE:
+    case VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER:
+    case VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER:
+    case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER:
+    case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER:
+    case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
+    case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:
+    case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT:
+        surface_size = region->surface_desc_size;
+        break;
+    default:
+        assert(!"unknown descriptor type");
+        break;
+    }
+
+    intel_desc_offset_set(size, surface_size, sampler_size);
+
+    return VK_SUCCESS;
+}
+
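+/**
+ * Reserve a range of the region for a descriptor pool.  The range is
+ * sized conservatively: the total of pPoolSizes is multiplied by
+ * max_sets, so that every set could hold the pool's full complement of
+ * descriptors.  Allocation only bumps region->cur; freed ranges are
+ * never recycled.
+ */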
+VkResult intel_desc_region_alloc(struct intel_desc_region *region,
+                                 uint32_t max_sets,
+                                 const VkDescriptorPoolCreateInfo *info,
+                                 struct intel_desc_offset *begin,
+                                 struct intel_desc_offset *end)
+{
+    uint32_t surface_size = 0, sampler_size = 0;
+    struct intel_desc_offset alloc;
+    uint32_t i;
+
+    /* calculate sizes needed */
+    for (i = 0; i < info->poolSizeCount; i++) {
+        const VkDescriptorPoolSize *tc = &info->pPoolSizes[i];
+        struct intel_desc_offset size;
+        VkResult ret;
+
+        ret = desc_region_get_desc_size(region, tc->type, &size);
+        if (ret != VK_SUCCESS)
+            return ret;
+
+        surface_size += size.surface * tc->descriptorCount;
+        sampler_size += size.sampler * tc->descriptorCount;
+    }
+
+    surface_size *= max_sets;
+    sampler_size *= max_sets;
+
+    intel_desc_offset_set(&alloc, surface_size, sampler_size);
+
+    *begin = region->cur;
+    intel_desc_offset_add(end, &region->cur, &alloc);
+
+    if (!intel_desc_offset_within(end, &region->size))
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    /* increment the writer pointer */
+    region->cur = *end;
+
+    return VK_SUCCESS;
+}
+
+static void desc_region_validate_begin_end(const struct intel_desc_region *region,
+                                           const struct intel_desc_offset *begin,
+                                           const struct intel_desc_offset *end)
+{
+    assert(begin->surface % region->surface_desc_size == 0 &&
+           begin->sampler % region->sampler_desc_size == 0);
+    assert(end->surface % region->surface_desc_size == 0 &&
+           end->sampler % region->sampler_desc_size == 0);
+    assert(intel_desc_offset_within(end, &region->size));
+}
+
+void intel_desc_region_free(struct intel_desc_region *region,
+                            const struct intel_desc_offset *begin,
+                            const struct intel_desc_offset *end)
+{
+    desc_region_validate_begin_end(region, begin, end);
+
+    /* nothing to reclaim: the region is a linear allocator and never
+     * rewinds, so the range stays reserved until the region is destroyed
+     */
+}
+
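+/**
+ * Write the given surface and sampler descriptors into the range.
+ * Either array may be NULL when its half of the range is empty.
+ */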
+void intel_desc_region_update(struct intel_desc_region *region,
+                              const struct intel_desc_offset *begin,
+                              const struct intel_desc_offset *end,
+                              const struct intel_desc_surface *surfaces,
+                              const struct intel_desc_sampler *samplers)
+{
+    desc_region_validate_begin_end(region, begin, end);
+
+    if (begin->surface < end->surface) {
+        memcpy((char *) region->surfaces + begin->surface, surfaces,
+                end->surface - begin->surface);
+    }
+
+    if (begin->sampler < end->sampler) {
+        memcpy((char *) region->samplers + begin->sampler, samplers,
+                end->sampler - begin->sampler);
+    }
+}
+
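+/**
+ * Copy descriptors from one range of the region to another.  The two
+ * ranges must not overlap, and the source must lie entirely within the
+ * region.
+ */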
+void intel_desc_region_copy(struct intel_desc_region *region,
+                            const struct intel_desc_offset *begin,
+                            const struct intel_desc_offset *end,
+                            const struct intel_desc_offset *src)
+{
+    struct intel_desc_offset src_end;
+    const struct intel_desc_surface *surfaces;
+    const struct intel_desc_sampler *samplers;
+
+    /* no overlap */
+    assert(intel_desc_offset_within(src, begin) ||
+           intel_desc_offset_within(end, src));
+
+    /* no read past region */
+    intel_desc_offset_sub(&src_end, end, begin);
+    intel_desc_offset_add(&src_end, src, &src_end);
+    assert(intel_desc_offset_within(&src_end, &region->size));
+
+    surfaces = (const struct intel_desc_surface *)
+        ((const char *) region->surfaces + src->surface);
+    samplers = (const struct intel_desc_sampler *)
+        ((const char *) region->samplers + src->sampler);
+
+    intel_desc_region_update(region, begin, end, surfaces, samplers);
+}
+
+void intel_desc_region_read_surface(const struct intel_desc_region *region,
+                                    const struct intel_desc_offset *offset,
+                                    VkShaderStageFlagBits stage,
+                                    const struct intel_mem **mem,
+                                    bool *read_only,
+                                    const uint32_t **cmd,
+                                    uint32_t *cmd_len)
+{
+    const struct intel_desc_surface *desc;
+    struct intel_desc_offset end;
+
+    intel_desc_offset_set(&end,
+            offset->surface + region->surface_desc_size, offset->sampler);
+    desc_region_validate_begin_end(region, offset, &end);
+
+    desc = (const struct intel_desc_surface *)
+        ((const char *) region->surfaces + offset->surface);
+
+    *mem = desc->mem;
+    *read_only = desc->read_only;
+    switch (desc->type) {
+    case INTEL_DESC_SURFACE_BUF:
+        *cmd = (stage == VK_SHADER_STAGE_FRAGMENT_BIT) ?
+            desc->u.buf.fs_cmd : desc->u.buf.cmd;
+        *cmd_len = desc->u.buf.cmd_len;
+        break;
+    case INTEL_DESC_SURFACE_IMG:
+        *cmd = desc->u.img->cmd;
+        *cmd_len = desc->u.img->cmd_len;
+        break;
+    case INTEL_DESC_SURFACE_UNUSED:
+    default:
+        *cmd = NULL;
+        *cmd_len = 0;
+        break;
+    }
+}
+
+void intel_desc_region_read_sampler(const struct intel_desc_region *region,
+                                    const struct intel_desc_offset *offset,
+                                    const struct intel_sampler **sampler)
+{
+    const struct intel_desc_sampler *desc;
+    struct intel_desc_offset end;
+
+    intel_desc_offset_set(&end,
+            offset->surface, offset->sampler + region->sampler_desc_size);
+    desc_region_validate_begin_end(region, offset, &end);
+
+    desc = (const struct intel_desc_sampler *)
+        ((const char *) region->samplers + offset->sampler);
+
+    *sampler = desc->sampler;
+}
+
+static void desc_pool_destroy(struct intel_obj *obj)
+{
+    struct intel_desc_pool *pool = intel_desc_pool_from_obj(obj);
+
+    intel_desc_pool_destroy(pool);
+}
+
+VkResult intel_desc_pool_create(struct intel_dev *dev,
+                                const VkDescriptorPoolCreateInfo *info,
+                                struct intel_desc_pool **pool_ret)
+{
+    struct intel_desc_pool *pool;
+    VkResult ret;
+
+    pool = (struct intel_desc_pool *) intel_base_create(&dev->base.handle,
+            sizeof(*pool), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_POOL_EXT,
+            info, 0);
+    if (!pool)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    pool->dev = dev;
+
+    ret = intel_desc_region_alloc(dev->desc_region, info->maxSets, info,
+            &pool->region_begin, &pool->region_end);
+    if (ret != VK_SUCCESS) {
+        intel_base_destroy(&pool->obj.base);
+        return ret;
+    }
+
+    /* point to head */
+    pool->cur = pool->region_begin;
+
+    pool->obj.destroy = desc_pool_destroy;
+
+    *pool_ret = pool;
+
+    return VK_SUCCESS;
+}
+
+void intel_desc_pool_destroy(struct intel_desc_pool *pool)
+{
+    intel_desc_region_free(pool->dev->desc_region,
+            &pool->region_begin, &pool->region_end);
+    intel_base_destroy(&pool->obj.base);
+}
+
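+/**
+ * Carve a layout-sized range out of the pool.  The pool, like the
+ * region, is a bump allocator: freeing an individual set returns
+ * nothing, and only intel_desc_pool_reset() rewinds the pointer.
+ */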
+VkResult intel_desc_pool_alloc(struct intel_desc_pool *pool,
+                                 const struct intel_desc_layout *layout,
+                                 struct intel_desc_offset *begin,
+                                 struct intel_desc_offset *end)
+{
+    *begin = pool->cur;
+    intel_desc_offset_add(end, &pool->cur, &layout->region_size);
+
+    if (!intel_desc_offset_within(end, &pool->region_end))
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    /* increment the writer pointer */
+    pool->cur = *end;
+
+    return VK_SUCCESS;
+}
+
+void intel_desc_pool_reset(struct intel_desc_pool *pool)
+{
+    /* reset to head */
+    pool->cur = pool->region_begin;
+}
+
+static void desc_set_destroy(struct intel_obj *obj)
+{
+    struct intel_desc_set *set = intel_desc_set_from_obj(obj);
+
+    intel_desc_set_destroy(set);
+}
+
+VkResult intel_desc_set_create(struct intel_dev *dev,
+                               struct intel_desc_pool *pool,
+                               const struct intel_desc_layout *layout,
+                               struct intel_desc_set **set_ret)
+{
+    struct intel_desc_set *set;
+    VkResult ret;
+
+    set = (struct intel_desc_set *) intel_base_create(&dev->base.handle,
+            sizeof(*set), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_SET_EXT,
+            NULL, 0);
+    if (!set)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    set->region = dev->desc_region;
+    ret = intel_desc_pool_alloc(pool, layout,
+            &set->region_begin, &set->region_end);
+    if (ret != VK_SUCCESS) {
+        intel_base_destroy(&set->obj.base);
+        return ret;
+    }
+
+    set->layout = layout;
+
+    set->obj.destroy = desc_set_destroy;
+
+    *set_ret = set;
+
+    return VK_SUCCESS;
+}
+
+void intel_desc_set_destroy(struct intel_desc_set *set)
+{
+    intel_base_destroy(&set->obj.base);
+}
+
+static bool desc_set_img_layout_read_only(VkImageLayout layout)
+{
+    switch (layout) {
+    case VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL:
+    case VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL:
+    case VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL:
+        return true;
+    default:
+        return false;
+    }
+}
+
+static void desc_set_write_sampler(struct intel_desc_set *set,
+                                   const struct intel_desc_iter *iter,
+                                   const struct intel_sampler *sampler)
+{
+    struct intel_desc_sampler desc;
+
+    desc.sampler = sampler;
+    intel_desc_region_update(set->region, &iter->begin, &iter->end,
+            NULL, &desc);
+}
+
+static void desc_set_write_combined_image_sampler(struct intel_desc_set *set,
+                                                  const struct intel_desc_iter *iter,
+                                                  const struct intel_img_view *img_view,
+                                                  VkImageLayout img_layout,
+                                                  const struct intel_sampler *sampler)
+{
+    struct intel_desc_surface view_desc;
+    struct intel_desc_sampler sampler_desc;
+
+    view_desc.mem = img_view->img->obj.mem;
+    view_desc.read_only = desc_set_img_layout_read_only(img_layout);
+    view_desc.type = INTEL_DESC_SURFACE_IMG;
+    view_desc.u.img = img_view;
+
+    sampler_desc.sampler = sampler;
+
+    intel_desc_region_update(set->region, &iter->begin, &iter->end,
+            &view_desc, &sampler_desc);
+}
+
+static void desc_set_write_image(struct intel_desc_set *set,
+                                 const struct intel_desc_iter *iter,
+                                 const struct intel_img_view *img_view,
+                                 VkImageLayout img_layout)
+{
+    struct intel_desc_surface desc;
+
+    desc.mem = img_view->img->obj.mem;
+    desc.read_only = desc_set_img_layout_read_only(img_layout);
+    desc.type = INTEL_DESC_SURFACE_IMG;
+    desc.u.img = img_view;
+    intel_desc_region_update(set->region, &iter->begin, &iter->end,
+            &desc, NULL);
+}
+
+static void desc_set_write_buffer(struct intel_desc_set *set,
+                                  const struct intel_desc_iter *iter,
+                                  const struct intel_buf_view *buf_view)
+{
+    struct intel_desc_surface desc;
+
+    desc.mem = buf_view->buf->obj.mem;
+    desc.read_only = false;
+    desc.type = INTEL_DESC_SURFACE_BUF;
+    memcpy((void*)&desc.u.buf, buf_view, sizeof(*buf_view));
+    intel_desc_region_update(set->region, &iter->begin, &iter->end,
+            &desc, NULL);
+}
+
+static void desc_layout_destroy(struct intel_obj *obj)
+{
+    struct intel_desc_layout *layout = intel_desc_layout_from_obj(obj);
+
+    intel_desc_layout_destroy(layout);
+}
+
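+/**
+ * Fill in layout->bindings.  Each binding records the offset of its
+ * first element and a per-element increment, so that element N lives at
+ * offset + increment * N.  When all of a binding's immutable samplers
+ * are identical, a single shared sampler descriptor is used and the
+ * sampler increment is forced to zero.
+ */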
+static VkResult desc_layout_init_bindings(struct intel_desc_layout *layout,
+                                            const struct intel_desc_region *region,
+                                            const VkDescriptorSetLayoutCreateInfo *info)
+{
+    struct intel_desc_offset offset;
+    uint32_t i;
+    VkResult ret;
+
+    intel_desc_offset_set(&offset, 0, 0);
+
+    /* allocate bindings */
+    layout->bindings = intel_alloc(layout, sizeof(layout->bindings[0]) *
+            info->bindingCount, sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!layout->bindings)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    memset(layout->bindings, 0,
+            sizeof(layout->bindings[0]) * info->bindingCount);
+    layout->binding_count = info->bindingCount;
+
+    /* initialize bindings */
+    for (i = 0; i < info->bindingCount; i++) {
+        const VkDescriptorSetLayoutBinding *lb = &info->pBindings[i];
+        struct intel_desc_layout_binding *binding = &layout->bindings[i];
+        struct intel_desc_offset size;
+
+        switch (lb->descriptorType) {
+        case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
+        case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:
+            layout->dynamic_desc_count += lb->descriptorCount;
+            break;
+        default:
+            break;
+        }
+
+        /* lb->stageFlags does not gain us anything */
+        binding->binding = lb->binding;
+        binding->type = lb->descriptorType;
+        binding->array_size = lb->descriptorCount;
+        binding->offset = offset;
+
+        ret = desc_region_get_desc_size(region,
+                lb->descriptorType, &size);
+        if (ret != VK_SUCCESS)
+            return ret;
+
+        binding->increment = size;
+
+        /* copy immutable samplers */
+        if (lb->pImmutableSamplers) {
+            bool shared = true;
+            uint32_t j;
+
+            for (j = 1; j < lb->descriptorCount; j++) {
+                if (lb->pImmutableSamplers[j] != lb->pImmutableSamplers[0]) {
+                    shared = false;
+                    break;
+                }
+            }
+
+            if (shared) {
+                binding->shared_immutable_sampler =
+                    intel_sampler((VkSampler) lb->pImmutableSamplers[0]);
+                /* set sampler offset increment to 0 */
+                intel_desc_offset_set(&binding->increment,
+                        binding->increment.surface, 0);
+            } else {
+                binding->immutable_samplers = intel_alloc(layout,
+                        sizeof(binding->immutable_samplers[0]) * lb->descriptorCount,
+                        sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+                if (!binding->immutable_samplers)
+                    return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+                for (j = 0; j < lb->descriptorCount; j++) {
+                    binding->immutable_samplers[j] =
+                        intel_sampler((VkSampler) lb->pImmutableSamplers[j]);
+                }
+            }
+        }
+
+        /* advance the offset by increment * (count - 1) plus one descriptor */
+        intel_desc_offset_mad(&size, &binding->increment, &size,
+                lb->descriptorCount - 1);
+        intel_desc_offset_add(&offset, &offset, &size);
+    }
+
+    layout->region_size = offset;
+
+    return VK_SUCCESS;
+}
+
+VkResult intel_desc_layout_create(struct intel_dev *dev,
+                                    const VkDescriptorSetLayoutCreateInfo *info,
+                                    struct intel_desc_layout **layout_ret)
+{
+    struct intel_desc_layout *layout;
+    VkResult ret;
+
+    layout = (struct intel_desc_layout *) intel_base_create(&dev->base.handle,
+            sizeof(*layout), dev->base.dbg,
+            VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_SET_LAYOUT_EXT, info, 0);
+    if (!layout)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    ret = desc_layout_init_bindings(layout, dev->desc_region, info);
+    if (ret != VK_SUCCESS) {
+        intel_desc_layout_destroy(layout);
+        return ret;
+    }
+
+    layout->obj.destroy = desc_layout_destroy;
+
+    *layout_ret = layout;
+
+    return VK_SUCCESS;
+}
+
+void intel_desc_layout_destroy(struct intel_desc_layout *layout)
+{
+    uint32_t i;
+
+    for (i = 0; i < layout->binding_count; i++) {
+        struct intel_desc_layout_binding *binding = &layout->bindings[i];
+
+        if (binding->immutable_samplers)
+            intel_free(layout, binding->immutable_samplers);
+    }
+    intel_free(layout, layout->bindings);
+    intel_base_destroy(&layout->obj.base);
+}
+
+static void pipeline_layout_destroy(struct intel_obj *obj)
+{
+    struct intel_pipeline_layout *pipeline_layout =
+        intel_pipeline_layout_from_obj(obj);
+
+    intel_pipeline_layout_destroy(pipeline_layout);
+}
+
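+/**
+ * Gather the set layouts and precompute each set's dynamic descriptor
+ * base index as a running total of the preceding layouts'
+ * dynamic_desc_count.
+ */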
+VkResult intel_pipeline_layout_create(struct intel_dev                   *dev,
+                                      const VkPipelineLayoutCreateInfo   *pPipelineCreateInfo,
+                                      struct intel_pipeline_layout      **pipeline_layout_ret)
+{
+    struct intel_pipeline_layout *pipeline_layout;
+    uint32_t count = pPipelineCreateInfo->setLayoutCount;
+    uint32_t i;
+
+    pipeline_layout = (struct intel_pipeline_layout *) intel_base_create(
+        &dev->base.handle, sizeof(*pipeline_layout), dev->base.dbg,
+        VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_LAYOUT_EXT, NULL, 0);
+
+    if (!pipeline_layout)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    if (count > 0) {
+        pipeline_layout->layouts = intel_alloc(pipeline_layout,
+                                               sizeof(pipeline_layout->layouts[0]) * count,
+                                               sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+        if (!pipeline_layout->layouts) {
+            intel_pipeline_layout_destroy(pipeline_layout);
+            return VK_ERROR_OUT_OF_HOST_MEMORY;
+        }
+
+        pipeline_layout->dynamic_desc_indices = intel_alloc(pipeline_layout,
+                sizeof(pipeline_layout->dynamic_desc_indices[0]) * count,
+                sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+        if (!pipeline_layout->dynamic_desc_indices) {
+            intel_pipeline_layout_destroy(pipeline_layout);
+            return VK_ERROR_OUT_OF_HOST_MEMORY;
+        }
+    }
+
+    for (i = 0; i < count; i++) {
+        pipeline_layout->layouts[i] = intel_desc_layout(pPipelineCreateInfo->pSetLayouts[i]);
+        pipeline_layout->dynamic_desc_indices[i] = pipeline_layout->total_dynamic_desc_count;
+
+        pipeline_layout->total_dynamic_desc_count +=
+            pipeline_layout->layouts[i]->dynamic_desc_count;
+    }
+
+    pipeline_layout->layout_count = count;
+
+    pipeline_layout->obj.destroy = pipeline_layout_destroy;
+
+    *pipeline_layout_ret = pipeline_layout;
+
+    return VK_SUCCESS;
+}
+
+void intel_pipeline_layout_destroy(struct intel_pipeline_layout *pipeline_layout)
+{
+    if (pipeline_layout->dynamic_desc_indices)
+        intel_free(pipeline_layout, pipeline_layout->dynamic_desc_indices);
+    if (pipeline_layout->layouts)
+        intel_free(pipeline_layout, pipeline_layout->layouts);
+    intel_base_destroy(&pipeline_layout->obj.base);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateDescriptorSetLayout(
+    VkDevice                                device,
+    const VkDescriptorSetLayoutCreateInfo*  pCreateInfo,
+    const VkAllocationCallbacks*            pAllocator,
+    VkDescriptorSetLayout*                  pSetLayout)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_desc_layout_create(dev, pCreateInfo,
+            (struct intel_desc_layout **) pSetLayout);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyDescriptorSetLayout(
+    VkDevice                                device,
+    VkDescriptorSetLayout                   descriptorSetLayout,
+    const VkAllocationCallbacks*            pAllocator)
+{
+    struct intel_obj *obj = intel_obj(descriptorSetLayout);
+
+    obj->destroy(obj);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreatePipelineLayout(
+    VkDevice                                device,
+    const VkPipelineLayoutCreateInfo*       pCreateInfo,
+    const VkAllocationCallbacks*            pAllocator,
+    VkPipelineLayout*                       pPipelineLayout)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_pipeline_layout_create(dev,
+                                        pCreateInfo,
+                                        (struct intel_pipeline_layout **) pPipelineLayout);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyPipelineLayout(
+    VkDevice                                device,
+    VkPipelineLayout                        pipelineLayout,
+    const VkAllocationCallbacks*            pAllocator)
+{
+    struct intel_obj *obj = intel_obj(pipelineLayout);
+
+    obj->destroy(obj);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateDescriptorPool(
+    VkDevice                                    device,
+    const VkDescriptorPoolCreateInfo*           pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkDescriptorPool*                           pDescriptorPool)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_desc_pool_create(dev, pCreateInfo,
+            (struct intel_desc_pool **) pDescriptorPool);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyDescriptorPool(
+    VkDevice                                device,
+    VkDescriptorPool                        descriptorPool,
+    const VkAllocationCallbacks*            pAllocator)
+{
+    struct intel_obj *obj = intel_obj(descriptorPool);
+
+    obj->destroy(obj);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkResetDescriptorPool(
+    VkDevice                                  device,
+    VkDescriptorPool                          descriptorPool,
+    VkDescriptorPoolResetFlags                flags)
+{
+    struct intel_desc_pool *pool = intel_desc_pool(descriptorPool);
+
+    intel_desc_pool_reset(pool);
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkAllocateDescriptorSets(
+    VkDevice                                    device,
+    const VkDescriptorSetAllocateInfo*          pAllocateInfo,
+    VkDescriptorSet*                            pDescriptorSets)
+{
+    struct intel_desc_pool *pool = intel_desc_pool(pAllocateInfo->descriptorPool);
+    struct intel_dev *dev = pool->dev;
+    VkResult ret = VK_SUCCESS;
+    uint32_t i;
+
+    for (i = 0; i < pAllocateInfo->descriptorSetCount; i++) {
+        const struct intel_desc_layout *layout =
+            intel_desc_layout(pAllocateInfo->pSetLayouts[i]);
+
+        ret = intel_desc_set_create(dev, pool, layout,
+                (struct intel_desc_set **) &pDescriptorSets[i]);
+        if (ret != VK_SUCCESS)
+            break;
+    }
+
+    return ret;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkFreeDescriptorSets(
+    VkDevice                                    device,
+    VkDescriptorPool                            descriptorPool,
+    uint32_t                                    descriptorSetCount,
+    const VkDescriptorSet*                      pDescriptorSets)
+{
+    uint32_t i;
+
+    for (i = 0; i < descriptorSetCount; i++) {
+        intel_desc_set_destroy(intel_desc_set(pDescriptorSets[i]));
+    }
+    return VK_SUCCESS;
+}
+
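+/*
+ * Writes walk each destination binding with an intel_desc_iter; copies
+ * are delegated to intel_desc_region_copy().  Buffer descriptors carry
+ * no user-created view object, so a transient intel_buf_view is
+ * initialized on the stack for each VkDescriptorBufferInfo.
+ */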
+VKAPI_ATTR void VKAPI_CALL vkUpdateDescriptorSets(
+    VkDevice                                    device,
+    uint32_t                                    descriptorWriteCount,
+    const VkWriteDescriptorSet*                 pDescriptorWrites,
+    uint32_t                                    descriptorCopyCount,
+    const VkCopyDescriptorSet*                  pDescriptorCopies)
+{
+    uint32_t i, j;
+
+    for (i = 0; i < descriptorWriteCount; i++) {
+        const VkWriteDescriptorSet *write = &pDescriptorWrites[i];
+        struct intel_desc_set *set = intel_desc_set(write->dstSet);
+        const struct intel_desc_layout_binding *binding;
+        struct intel_desc_iter iter;
+
+        desc_iter_init_for_writing(&iter, set, write->dstBinding,
+                    write->dstArrayElement);
+
+        switch (write->descriptorType) {
+        case VK_DESCRIPTOR_TYPE_SAMPLER:
+            for (j = 0; j < write->descriptorCount; j++) {
+                const VkDescriptorImageInfo *info = &write->pImageInfo[j];
+                const struct intel_sampler *sampler =
+                    intel_sampler(info->sampler);
+
+                desc_set_write_sampler(set, &iter, sampler);
+
+                intel_desc_iter_advance(&iter);
+            }
+            break;
+        case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:
+            binding = &set->layout->bindings[write->dstBinding];
+
+            /* write the shared immutable sampler */
+            if (binding->shared_immutable_sampler) {
+                struct intel_desc_offset end;
+                struct intel_desc_sampler sampler_desc;
+
+                assert(!iter.increment.sampler);
+                intel_desc_offset_set(&end, iter.begin.surface,
+                        iter.begin.sampler + set->region->sampler_desc_size);
+
+                sampler_desc.sampler = binding->shared_immutable_sampler;
+                intel_desc_region_update(set->region, &iter.begin, &end,
+                        NULL, &sampler_desc);
+            }
+
+            for (j = 0; j < write->descriptorCount; j++) {
+                const VkDescriptorImageInfo *info = &write->pImageInfo[j];
+                const struct intel_img_view *img_view =
+                    intel_img_view(info->imageView);
+                const struct intel_sampler *sampler =
+                    (binding->immutable_samplers) ?
+                    binding->immutable_samplers[write->dstArrayElement + j] :
+                    intel_sampler(info->sampler);
+
+                desc_set_write_combined_image_sampler(set, &iter,
+                        img_view, info->imageLayout, sampler);
+
+                intel_desc_iter_advance(&iter);
+            }
+            break;
+        case VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE:
+        case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE:
+            for (j = 0; j < write->descriptorCount; j++) {
+                const VkDescriptorImageInfo *info = &write->pImageInfo[j];
+                const struct intel_img_view *img_view =
+                    intel_img_view(info->imageView);
+
+                desc_set_write_image(set, &iter, img_view, info->imageLayout);
+
+                intel_desc_iter_advance(&iter);
+            }
+            break;
+        case VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER:
+        case VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER:
+            for (j = 0; j < write->descriptorCount; j++) {
+                const struct intel_buf_view *buf_view =
+                    intel_buf_view(write->pTexelBufferView[j]);
+
+                desc_set_write_buffer(set, &iter, buf_view);
+
+                intel_desc_iter_advance(&iter);
+            }
+            break;
+        case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER:
+        case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER:
+        case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
+        case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:
+            {
+                struct intel_dev *dev = intel_dev(device);
+                VkBufferViewCreateInfo view_info;
+                memset(&view_info, 0, sizeof(view_info));
+                view_info.sType = VK_STRUCTURE_TYPE_BUFFER_VIEW_CREATE_INFO;
+
+                for (j = 0; j < write->descriptorCount; j++) {
+                    const VkDescriptorBufferInfo *info = &write->pBufferInfo[j];
+                    struct intel_buf_view buf_view;
+
+                    view_info.buffer = info->buffer;
+                    view_info.offset = info->offset;
+                    view_info.range = info->range;
+
+                    intel_buf_view_init(dev, &view_info, &buf_view, true);
+
+                    desc_set_write_buffer(set, &iter, &buf_view);
+
+                    intel_desc_iter_advance(&iter);
+                }
+            }
+            break;
+        default:
+            break;
+        }
+    }
+
+    for (i = 0; i < descriptorCopyCount; i++) {
+        const VkCopyDescriptorSet *copy = &pDescriptorCopies[i];
+        const struct intel_desc_set *src_set = intel_desc_set(copy->srcSet);
+        const struct intel_desc_set *dst_set = intel_desc_set(copy->dstSet);
+        struct intel_desc_iter src_iter, dst_iter;
+        struct intel_desc_offset src_begin, dst_begin;
+
+        desc_iter_init_for_writing(&src_iter, src_set,
+                    copy->srcBinding, copy->srcArrayElement);
+        desc_iter_init_for_writing(&dst_iter, dst_set,
+                    copy->dstBinding, copy->dstArrayElement);
+
+        /* save the begin offsets */
+        src_begin = src_iter.begin;
+        dst_begin = dst_iter.begin;
+
+        intel_desc_region_copy(dst_set->region, &dst_begin,
+                &dst_iter.end, &src_begin);
+    }
+}
diff --git a/icd/intel/desc.h b/icd/intel/desc.h
new file mode 100644
index 0000000..1932651
--- /dev/null
+++ b/icd/intel/desc.h
@@ -0,0 +1,285 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#ifndef DESC_H
+#define DESC_H
+
+#include "intel.h"
+#include "obj.h"
+
+struct intel_cmd;
+struct intel_dev;
+
+/**
+ * Opaque descriptors.
+ */
+struct intel_desc_surface;
+struct intel_desc_sampler;
+
+/**
+ * Descriptor region offset (or size) in bytes.
+ */
+struct intel_desc_offset {
+    uint32_t surface;
+    uint32_t sampler;
+};
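+
+/*
+ * The two components advance in lockstep: e.g., one combined
+ * image/sampler descriptor consumes both surface_desc_size and
+ * sampler_desc_size bytes, while a plain sampler consumes only the
+ * latter.
+ */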
+
+struct intel_desc_iter {
+    VkDescriptorType type;
+    struct intel_desc_offset increment;
+    uint32_t size;
+
+    struct intel_desc_offset begin;
+    struct intel_desc_offset end;
+    uint32_t cur;
+};
+
+/**
+ * Per-device descriptor region.
+ */
+struct intel_desc_region {
+    /* this is not an intel_obj */
+
+    uint32_t surface_desc_size;
+    uint32_t sampler_desc_size;
+
+    /* surface descriptors (in system memory until RS is enabled) */
+    struct intel_desc_surface *surfaces;
+    /* sampler descriptors */
+    struct intel_desc_sampler *samplers;
+
+    struct intel_desc_offset size;
+    struct intel_desc_offset cur;
+};
+
+struct intel_desc_pool {
+    struct intel_obj obj;
+
+    struct intel_dev *dev;
+
+    /* point to a contiguous area in the device's region */
+    struct intel_desc_offset region_begin;
+    struct intel_desc_offset region_end;
+
+    struct intel_desc_offset cur;
+};
+
+struct intel_desc_layout;
+
+struct intel_desc_set {
+    struct intel_obj obj;
+
+    /* suballocated from a pool */
+    struct intel_desc_region *region;
+    struct intel_desc_offset region_begin;
+    struct intel_desc_offset region_end;
+
+    const struct intel_desc_layout *layout;
+};
+
+struct intel_desc_layout {
+    struct intel_obj obj;
+
+    /* homogeneous bindings in this layout */
+    struct intel_desc_layout_binding {
+        uint32_t binding;
+        VkDescriptorType type;
+        uint32_t array_size;
+        const struct intel_sampler **immutable_samplers;
+        const struct intel_sampler *shared_immutable_sampler;
+
+        /* to initialize intel_desc_iter */
+        struct intel_desc_offset offset;
+        struct intel_desc_offset increment;
+    } *bindings;
+    uint32_t binding_count;
+
+    /* count of _DYNAMIC descriptors */
+    uint32_t dynamic_desc_count;
+
+    /* the size of the layout in the region */
+    struct intel_desc_offset region_size;
+};
+
+struct intel_pipeline_layout {
+    struct intel_obj obj;
+
+    struct intel_desc_layout **layouts;
+    uint32_t *dynamic_desc_indices;
+    uint32_t layout_count;
+
+    uint32_t total_dynamic_desc_count;
+};
+
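+/*
+ * Handle conversions: a Vulkan handle stores a pointer to its driver
+ * struct, and the *(type **) &handle form reinterprets the handle value
+ * in place, which also covers builds where non-dispatchable handles are
+ * a distinct 64-bit type.
+ */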
+static inline struct intel_desc_pool *intel_desc_pool(VkDescriptorPool pool)
+{
+    return *(struct intel_desc_pool **) &pool;
+}
+
+static inline struct intel_desc_pool *intel_desc_pool_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_desc_pool *) obj;
+}
+
+static inline struct intel_desc_set *intel_desc_set(VkDescriptorSet set)
+{
+    return *(struct intel_desc_set **) &set;
+}
+
+static inline struct intel_desc_set *intel_desc_set_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_desc_set *) obj;
+}
+
+static inline struct intel_desc_layout *intel_desc_layout(VkDescriptorSetLayout layout)
+{
+    return *(struct intel_desc_layout **) &layout;
+}
+
+static inline struct intel_desc_layout *intel_desc_layout_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_desc_layout *) obj;
+}
+
+static inline struct intel_pipeline_layout *intel_pipeline_layout(VkPipelineLayout pipeline_layout)
+{
+    return *(struct intel_pipeline_layout **) &pipeline_layout;
+}
+
+static inline struct intel_pipeline_layout *intel_pipeline_layout_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_pipeline_layout *) obj;
+}
+
+static inline void intel_desc_offset_set(struct intel_desc_offset *offset,
+                                         uint32_t surface_offset,
+                                         uint32_t sampler_offset)
+{
+    offset->surface = surface_offset;
+    offset->sampler = sampler_offset;
+}
+
+static inline void intel_desc_offset_add(struct intel_desc_offset *offset,
+                                         const struct intel_desc_offset *lhs,
+                                         const struct intel_desc_offset *rhs)
+{
+    offset->surface = lhs->surface + rhs->surface;
+    offset->sampler = lhs->sampler + rhs->sampler;
+}
+
+static inline void intel_desc_offset_sub(struct intel_desc_offset *offset,
+                                         const struct intel_desc_offset *lhs,
+                                         const struct intel_desc_offset *rhs)
+{
+    offset->surface = lhs->surface - rhs->surface;
+    offset->sampler = lhs->sampler - rhs->sampler;
+}
+
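+/* offset = lhs * lhs_scale + rhs, component-wise; this is how element N
+ * of a binding is located: binding->offset + binding->increment * N
+ */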
+static inline void intel_desc_offset_mad(struct intel_desc_offset *offset,
+                                         const struct intel_desc_offset *lhs,
+                                         const struct intel_desc_offset *rhs,
+                                         uint32_t lhs_scale)
+{
+    offset->surface = lhs->surface * lhs_scale + rhs->surface;
+    offset->sampler = lhs->sampler * lhs_scale + rhs->sampler;
+}
+
+static inline bool intel_desc_offset_within(const struct intel_desc_offset *offset,
+                                            const struct intel_desc_offset *other)
+{
+    return (offset->surface <= other->surface &&
+            offset->sampler <= other->sampler);
+}
+
+bool intel_desc_iter_init_for_binding(struct intel_desc_iter *iter,
+                                      const struct intel_desc_layout *layout,
+                                      uint32_t binding_index, uint32_t array_base);
+
+bool intel_desc_iter_advance(struct intel_desc_iter *iter);
+
+VkResult intel_desc_region_create(struct intel_dev *dev,
+                                    struct intel_desc_region **region_ret);
+void intel_desc_region_destroy(struct intel_dev *dev,
+                               struct intel_desc_region *region);
+
+VkResult intel_desc_region_alloc(struct intel_desc_region *region,
+                                 uint32_t max_sets,
+                                 const VkDescriptorPoolCreateInfo *info,
+                                 struct intel_desc_offset *begin,
+                                 struct intel_desc_offset *end);
+void intel_desc_region_free(struct intel_desc_region *region,
+                            const struct intel_desc_offset *begin,
+                            const struct intel_desc_offset *end);
+
+void intel_desc_region_clear(struct intel_desc_region *region,
+                             const struct intel_desc_offset *begin,
+                             const struct intel_desc_offset *end);
+
+void intel_desc_region_update(struct intel_desc_region *region,
+                              const struct intel_desc_offset *begin,
+                              const struct intel_desc_offset *end,
+                              const struct intel_desc_surface *surfaces,
+                              const struct intel_desc_sampler *samplers);
+
+void intel_desc_region_copy(struct intel_desc_region *region,
+                            const struct intel_desc_offset *begin,
+                            const struct intel_desc_offset *end,
+                            const struct intel_desc_offset *src);
+
+void intel_desc_region_read_surface(const struct intel_desc_region *region,
+                                    const struct intel_desc_offset *offset,
+                                    VkShaderStageFlagBits stage,
+                                    const struct intel_mem **mem,
+                                    bool *read_only,
+                                    const uint32_t **cmd,
+                                    uint32_t *cmd_len);
+void intel_desc_region_read_sampler(const struct intel_desc_region *region,
+                                    const struct intel_desc_offset *offset,
+                                    const struct intel_sampler **sampler);
+
+VkResult intel_desc_pool_create(struct intel_dev *dev,
+                                  const VkDescriptorPoolCreateInfo *info,
+                                  struct intel_desc_pool **pool_ret);
+void intel_desc_pool_destroy(struct intel_desc_pool *pool);
+
+VkResult intel_desc_pool_alloc(struct intel_desc_pool *pool,
+                                 const struct intel_desc_layout *layout,
+                                 struct intel_desc_offset *begin,
+                                 struct intel_desc_offset *end);
+void intel_desc_pool_reset(struct intel_desc_pool *pool);
+
+VkResult intel_desc_set_create(struct intel_dev *dev,
+                                 struct intel_desc_pool *pool,
+                                 const struct intel_desc_layout *layout,
+                                 struct intel_desc_set **set_ret);
+void intel_desc_set_destroy(struct intel_desc_set *set);
+
+VkResult intel_desc_layout_create(struct intel_dev *dev,
+                                    const VkDescriptorSetLayoutCreateInfo *info,
+                                    struct intel_desc_layout **layout_ret);
+void intel_desc_layout_destroy(struct intel_desc_layout *layout);
+
+VkResult intel_pipeline_layout_create(struct intel_dev *dev,
+                                          const VkPipelineLayoutCreateInfo  *pPipelineCreateInfo,
+                                          struct intel_pipeline_layout **pipeline_layout_ret);
+void intel_pipeline_layout_destroy(struct intel_pipeline_layout *pipeline_layout);
+
+#endif /* DESC_H */
diff --git a/icd/intel/dev.c b/icd/intel/dev.c
new file mode 100644
index 0000000..09e237e
--- /dev/null
+++ b/icd/intel/dev.c
@@ -0,0 +1,254 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#include <stdarg.h>
+#include "kmd/winsys.h"
+#include "desc.h"
+#include "gpu.h"
+#include "pipeline.h"
+#include "queue.h"
+#include "dev.h"
+
+static void dev_destroy_meta_shaders(struct intel_dev *dev)
+{
+    uint32_t i;
+
+    for (i = 0; i < ARRAY_SIZE(dev->cmd_meta_shaders); i++) {
+        if (!dev->cmd_meta_shaders[i])
+            break;
+
+        intel_pipeline_shader_destroy(dev, dev->cmd_meta_shaders[i]);
+        dev->cmd_meta_shaders[i] = NULL;
+    }
+}
+
+static bool dev_create_meta_shaders(struct intel_dev *dev)
+{
+    uint32_t i;
+
+    for (i = 0; i < ARRAY_SIZE(dev->cmd_meta_shaders); i++) {
+        struct intel_pipeline_shader *sh;
+
+        sh = intel_pipeline_shader_create_meta(dev, i);
+        if (!sh) {
+            dev_destroy_meta_shaders(dev);
+            return false;
+        }
+
+        dev->cmd_meta_shaders[i] = sh;
+    }
+
+    return true;
+}
+
+static VkResult dev_create_queues(struct intel_dev *dev,
+                                  const VkDeviceQueueCreateInfo *queues,
+                                  uint32_t count)
+{
+    uint32_t i;
+
+    for (i = 0; i < count; i++) {
+        const VkDeviceQueueCreateInfo *q = &queues[i];
+        VkResult ret = VK_SUCCESS;
+
+        assert((q->queueFamilyIndex < INTEL_GPU_ENGINE_COUNT &&
+            q->queueCount == 1 && !dev->queues[q->queueFamilyIndex]) && "Invalid Queue request");
+        /* Help catch places where we forgot to initialize pQueuePriorities */
+        assert(q->pQueuePriorities);
+        ret = intel_queue_create(dev, q->queueFamilyIndex,
+                    &dev->queues[q->queueFamilyIndex]);
+
+        if (ret != VK_SUCCESS) {
+            uint32_t j;
+            for (j = 0; j < i; j++)
+                intel_queue_destroy(dev->queues[j]);
+
+            return ret;
+        }
+    }
+
+    return VK_SUCCESS;
+}
+
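+/**
+ * Create the sole virtual device of the GPU.  This initializes the
+ * winsys, a 4 KiB scratch bo for meta operations, the meta shaders, the
+ * per-device descriptor region, the default sample patterns, and the
+ * queues; any failure unwinds through intel_dev_destroy().
+ */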
+VkResult intel_dev_create(struct intel_gpu *gpu,
+                          const VkDeviceCreateInfo *info,
+                          struct intel_dev **dev_ret)
+{
+    struct intel_dev *dev;
+    uint32_t i;
+    VkResult ret;
+
+    // ICD limited to a single virtual device
+    if (gpu->winsys)
+        return VK_ERROR_INITIALIZATION_FAILED;
+
+    dev = (struct intel_dev *) intel_base_create(&gpu->handle,
+            sizeof(*dev), false,
+            VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT, info, sizeof(struct intel_dev_dbg));
+    if (!dev)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    for (i = 0; i < info->enabledExtensionCount; i++) {
+        const enum intel_phy_dev_ext_type ext =
+            intel_gpu_lookup_phy_dev_extension(gpu,
+                    info->ppEnabledExtensionNames[i]);
+
+        if (ext != INTEL_PHY_DEV_EXT_INVALID)
+            dev->phy_dev_exts[ext] = true;
+    }
+
+    dev->gpu = gpu;
+
+    ret = intel_gpu_init_winsys(gpu);
+    if (ret != VK_SUCCESS) {
+        intel_dev_destroy(dev);
+        return ret;
+    }
+
+    dev->winsys = gpu->winsys;
+
+    dev->cmd_scratch_bo = intel_winsys_alloc_bo(dev->winsys,
+            "command buffer scratch", 4096, false);
+    if (!dev->cmd_scratch_bo) {
+        intel_dev_destroy(dev);
+        return VK_ERROR_OUT_OF_DEVICE_MEMORY;
+    }
+
+    if (!dev_create_meta_shaders(dev)) {
+        intel_dev_destroy(dev);
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    }
+
+    ret = intel_desc_region_create(dev, &dev->desc_region);
+    if (ret != VK_SUCCESS) {
+        intel_dev_destroy(dev);
+        return ret;
+    }
+
+    intel_pipeline_init_default_sample_patterns(dev,
+            (uint8_t *) &dev->sample_pattern_1x,
+            (uint8_t *) &dev->sample_pattern_2x,
+            (uint8_t *) &dev->sample_pattern_4x,
+            (uint8_t *) dev->sample_pattern_8x,
+            (uint8_t *) dev->sample_pattern_16x);
+
+    ret = dev_create_queues(dev, info->pQueueCreateInfos,
+            info->queueCreateInfoCount);
+    if (ret != VK_SUCCESS) {
+        intel_dev_destroy(dev);
+        return ret;
+    }
+
+    *dev_ret = dev;
+
+    return VK_SUCCESS;
+}
+
+void intel_dev_destroy(struct intel_dev *dev)
+{
+    struct intel_gpu *gpu = dev->gpu;
+    uint32_t i;
+
+    for (i = 0; i < ARRAY_SIZE(dev->queues); i++) {
+        if (dev->queues[i])
+            intel_queue_destroy(dev->queues[i]);
+    }
+
+    if (dev->desc_region)
+        intel_desc_region_destroy(dev, dev->desc_region);
+
+    dev_destroy_meta_shaders(dev);
+
+    intel_bo_unref(dev->cmd_scratch_bo);
+
+    intel_base_destroy(&dev->base);
+
+    if (gpu->winsys)
+        intel_gpu_cleanup_winsys(gpu);
+}
+
+void intel_dev_log(struct intel_dev *dev,
+                   VkFlags msg_flags,
+                   struct intel_base *src_object,
+                   size_t location,
+                   int32_t msg_code,
+                   const char *format, ...)
+{
+    va_list ap;
+
+    va_start(ap, format);
+    intel_logv(dev, msg_flags,
+               (src_object->dbg ? src_object->dbg->type : 0),
+               (uint64_t) src_object,
+               location, msg_code,
+               format, ap);
+    va_end(ap);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateDevice(
+    VkPhysicalDevice                    gpu_,
+    const VkDeviceCreateInfo*           pCreateInfo,
+    const VkAllocationCallbacks*        pAllocator,
+    VkDevice*                           pDevice)
+{
+    struct intel_gpu *gpu = intel_gpu(gpu_);
+
+    return intel_dev_create(gpu, pCreateInfo, (struct intel_dev **) pDevice);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyDevice(
+    VkDevice                                  device,
+    const VkAllocationCallbacks*              pAllocator)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    intel_dev_destroy(dev);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetDeviceQueue(
+    VkDevice                                  device,
+    uint32_t                                  queueFamilyIndex,
+    uint32_t                                  queueIndex,
+    VkQueue*                                  pQueue)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    *pQueue = (VkQueue) dev->queues[queueFamilyIndex];
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkDeviceWaitIdle(
+    VkDevice                                  device)
+{
+    struct intel_dev *dev = intel_dev(device);
+    VkResult ret = VK_SUCCESS;
+    uint32_t i;
+
+    for (i = 0; i < ARRAY_SIZE(dev->queues); i++) {
+        if (dev->queues[i]) {
+            const VkResult r = intel_queue_wait(dev->queues[i], -1);
+            if (r != VK_SUCCESS)
+                ret = r;
+        }
+    }
+
+    return ret;
+}
diff --git a/icd/intel/dev.h b/icd/intel/dev.h
new file mode 100644
index 0000000..bf84886
--- /dev/null
+++ b/icd/intel/dev.h
@@ -0,0 +1,188 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#ifndef DEV_H
+#define DEV_H
+
+#include "intel.h"
+#include "gpu.h"
+#include "obj.h"
+
+struct intel_desc_region;
+struct intel_pipeline_shader;
+struct intel_queue;
+struct intel_winsys;
+
+enum intel_dev_meta_shader {
+    /*
+     * This expects an ivec2 to be pushed:
+     *
+     *  .x is memory offset
+     *  .y is fill value
+     *
+     * as well as GEN6_VFCOMP_STORE_VID.
+     */
+    INTEL_DEV_META_VS_FILL_MEM,
+
+    /*
+     * These expect an ivec2 to be pushed:
+     *
+     *  .x is dst memory offset
+     *  .y is src memory offset
+     *
+     * as well as GEN6_VFCOMP_STORE_VID.
+     */
+    INTEL_DEV_META_VS_COPY_MEM,
+    INTEL_DEV_META_VS_COPY_MEM_UNALIGNED,
+
+    /*
+     * This expects an ivec4 to be pushed:
+     *
+     *  .xy is added to fargment coord to form (u, v)
+     *  .z is extent width
+     *  .w is dst memory offset
+     *
+     * as well as GEN6_VFCOMP_STORE_VID.
+     */
+    INTEL_DEV_META_VS_COPY_R8_TO_MEM,
+    INTEL_DEV_META_VS_COPY_R16_TO_MEM,
+    INTEL_DEV_META_VS_COPY_R32_TO_MEM,
+    INTEL_DEV_META_VS_COPY_R32G32_TO_MEM,
+    INTEL_DEV_META_VS_COPY_R32G32B32A32_TO_MEM,
+
+    /*
+     * These expect an ivec4 to be pushed:
+     *
+     *  .xy is added to fragment coord to form (u, v)
+     *  .z is ai
+     *  .w is lod
+     */
+    INTEL_DEV_META_FS_COPY_MEM,             /* ld_lz(u)             */
+    INTEL_DEV_META_FS_COPY_1D,              /* ld(u, lod)           */
+    INTEL_DEV_META_FS_COPY_1D_ARRAY,        /* ld(u, lod, ai)       */
+    INTEL_DEV_META_FS_COPY_2D,              /* ld(u, lod, v)        */
+    INTEL_DEV_META_FS_COPY_2D_ARRAY,        /* ld(u, lod, v, ai)    */
+    INTEL_DEV_META_FS_COPY_2D_MS,           /* ld_mcs() + ld2dms()  */
+
+    /*
+     * These expect a second ivec4 to be pushed:
+     *
+     *  .x is memory offset
+     *  .y is extent width
+     *
+     * The second ivec4 is to convert linear fragment coord to (u, v).
+     */
+    INTEL_DEV_META_FS_COPY_1D_TO_MEM,       /* ld(u, lod)           */
+    INTEL_DEV_META_FS_COPY_1D_ARRAY_TO_MEM, /* ld(u, lod, ai)       */
+    INTEL_DEV_META_FS_COPY_2D_TO_MEM,       /* ld(u, lod, v)        */
+    INTEL_DEV_META_FS_COPY_2D_ARRAY_TO_MEM, /* ld(u, lod, v, ai)    */
+    INTEL_DEV_META_FS_COPY_2D_MS_TO_MEM,    /* ld_mcs() + ld2dms()  */
+
+    /*
+     * This expects an ivec4 to be pushed:
+     *
+     *  .xy is added to fragment coord to form (u, v)
+     *  .z is extent width
+     *
+     * .z is used to linearize (u, v).
+     */
+    INTEL_DEV_META_FS_COPY_MEM_TO_IMG,      /* ld_lz(u)             */
+
+    /*
+     * These expect the clear value to be pushed, and set fragment color or
+     * depth to the clear value.
+     */
+    INTEL_DEV_META_FS_CLEAR_COLOR,
+    INTEL_DEV_META_FS_CLEAR_DEPTH,
+
+    /*
+     * These expect an ivec4 to be pushed:
+     *
+     *  .xy is added to fragment coord to form (u, v)
+     *
+     * All samples are fetched and averaged.  The fragment color is set to the
+     * averaged value.
+     */
+    INTEL_DEV_META_FS_RESOLVE_2X,
+    INTEL_DEV_META_FS_RESOLVE_4X,
+    INTEL_DEV_META_FS_RESOLVE_8X,
+    INTEL_DEV_META_FS_RESOLVE_16X,
+
+    INTEL_DEV_META_SHADER_COUNT,
+};
+
+struct intel_dev_dbg {
+    struct intel_base_dbg base;
+};
+
+struct intel_dev {
+    struct intel_base base;
+
+    bool phy_dev_exts[INTEL_PHY_DEV_EXT_COUNT];
+
+    struct intel_gpu *gpu;
+    struct intel_winsys *winsys;
+
+    struct intel_bo *cmd_scratch_bo;
+    struct intel_pipeline_shader *cmd_meta_shaders[INTEL_DEV_META_SHADER_COUNT];
+
+    struct intel_desc_region *desc_region;
+
+    uint32_t sample_pattern_1x;
+    uint32_t sample_pattern_2x;
+    uint32_t sample_pattern_4x;
+    uint32_t sample_pattern_8x[2];
+    uint32_t sample_pattern_16x[4];
+
+    struct intel_queue *queues[INTEL_GPU_ENGINE_COUNT];
+};
+
+static inline struct intel_dev *intel_dev(VkDevice dev)
+{
+    return (struct intel_dev *) dev;
+}
+
+static inline struct intel_dev_dbg *intel_dev_dbg(struct intel_dev *dev)
+{
+    return (struct intel_dev_dbg *) dev->base.dbg;
+}
+
+VkResult intel_dev_create(struct intel_gpu *gpu,
+                            const VkDeviceCreateInfo *info,
+                            struct intel_dev **dev_ret);
+void intel_dev_destroy(struct intel_dev *dev);
+
+void intel_dev_log(struct intel_dev *dev,
+                   VkFlags msg_flags,
+                   struct intel_base *src_object,
+                   size_t location,
+                   int32_t msg_code,
+                   const char *format, ...);
+
+static inline const struct intel_pipeline_shader *intel_dev_get_meta_shader(const struct intel_dev *dev,
+                                                                            enum intel_dev_meta_shader id)
+{
+    assert(id < INTEL_DEV_META_SHADER_COUNT);
+    return dev->cmd_meta_shaders[id];
+}
+
+#endif /* DEV_H */
diff --git a/icd/intel/event.c b/icd/intel/event.c
new file mode 100644
index 0000000..a62724f
--- /dev/null
+++ b/icd/intel/event.c
@@ -0,0 +1,185 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ *
+ */
+
+#include "dev.h"
+#include "mem.h"
+#include "event.h"
+
+static VkResult event_map(struct intel_event *event, uint32_t **ptr_ret)
+{
+    void *ptr;
+
+    /*
+     * This is an unsynchronized mapping.  It does not look like we want
+     * a synchronized mapping, but it is also unclear what happens when
+     * the GPU writes to the same location concurrently.  We need
+     * atomicity here.
+     */
+    ptr = intel_mem_map(event->obj.mem, 0);
+    if (!ptr)
+        return VK_ERROR_MEMORY_MAP_FAILED;
+
+    *ptr_ret = (uint32_t *) ((uint8_t *) ptr + event->obj.offset);
+
+    return VK_SUCCESS;
+}
+
+static void event_unmap(struct intel_event *event)
+{
+    intel_mem_unmap(event->obj.mem);
+}
+
+static VkResult event_write(struct intel_event *event, uint32_t val)
+{
+    VkResult ret;
+    uint32_t *ptr;
+
+    ret = event_map(event, &ptr);
+    if (ret == VK_SUCCESS) {
+        *ptr = val;
+        event_unmap(event);
+    }
+
+    return ret;
+}
+
+static VkResult event_read(struct intel_event *event, uint32_t *val)
+{
+    VkResult ret;
+    uint32_t *ptr;
+
+    ret = event_map(event, &ptr);
+    if (ret == VK_SUCCESS) {
+        *val = *ptr;
+        event_unmap(event);
+    }
+
+    return ret;
+}
+
+static void event_destroy(struct intel_obj *obj)
+{
+    struct intel_event *event = intel_event_from_obj(obj);
+
+    intel_event_destroy(event);
+}
+
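+/**
+ * An event is backed by a 4-byte word in its own memory allocation;
+ * set, reset, and status queries map the memory and access the word
+ * directly on the CPU.
+ */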
+VkResult intel_event_create(struct intel_dev *dev,
+                              const VkEventCreateInfo *info,
+                              struct intel_event **event_ret)
+{
+    struct intel_event *event;
+    VkMemoryAllocateInfo mem_reqs;
+
+    event = (struct intel_event *) intel_base_create(&dev->base.handle,
+            sizeof(*event), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_EVENT_EXT, info, 0);
+    if (!event)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    mem_reqs.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
+    mem_reqs.pNext = NULL;
+    mem_reqs.allocationSize = 4; // We know the allocation is page aligned
+    mem_reqs.memoryTypeIndex = 0;
+    intel_mem_alloc(dev, &mem_reqs, &event->obj.mem);
+
+    event->obj.destroy = event_destroy;
+
+    *event_ret = event;
+
+    return VK_SUCCESS;
+}
+
+void intel_event_destroy(struct intel_event *event)
+{
+    intel_base_destroy(&event->obj.base);
+}
+
+VkResult intel_event_set(struct intel_event *event)
+{
+    return event_write(event, 1);
+}
+
+VkResult intel_event_reset(struct intel_event *event)
+{
+    return event_write(event, 0);
+}
+
+VkResult intel_event_get_status(struct intel_event *event)
+{
+    VkResult ret;
+    uint32_t val;
+
+    ret = event_read(event, &val);
+    if (ret != VK_SUCCESS)
+        return ret;
+
+    return (val) ? VK_EVENT_SET : VK_EVENT_RESET;
+}
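+
+/*
+ * Illustrative host-side use of the helpers above (not part of the driver):
+ *
+ *     intel_event_set(event);
+ *     assert(intel_event_get_status(event) == VK_EVENT_SET);
+ *     intel_event_reset(event);
+ *     assert(intel_event_get_status(event) == VK_EVENT_RESET);
+ */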
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateEvent(
+    VkDevice                                    device,
+    const VkEventCreateInfo*                    pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkEvent*                                    pEvent)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_event_create(dev, pCreateInfo,
+            (struct intel_event **) pEvent);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyEvent(
+    VkDevice                                    device,
+    VkEvent                                     event,
+    const VkAllocationCallbacks*                pAllocator)
+{
+    struct intel_obj *obj = intel_obj(event);
+
+    intel_mem_free(obj->mem);
+    obj->destroy(obj);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetEventStatus(
+    VkDevice                                  device,
+    VkEvent                                   event_)
+{
+    struct intel_event *event = intel_event(event_);
+
+    return intel_event_get_status(event);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkSetEvent(
+    VkDevice                                  device,
+    VkEvent                                   event_)
+{
+    struct intel_event *event = intel_event(event_);
+
+    return intel_event_set(event);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkResetEvent(
+    VkDevice                                  device,
+    VkEvent                                   event_)
+{
+    struct intel_event *event = intel_event(event_);
+
+    return intel_event_reset(event);
+}
diff --git a/icd/intel/event.h b/icd/intel/event.h
new file mode 100644
index 0000000..8d37751
--- /dev/null
+++ b/icd/intel/event.h
@@ -0,0 +1,53 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ *
+ */
+
+#ifndef EVENT_H
+#define EVENT_H
+
+#include "intel.h"
+#include "obj.h"
+
+struct intel_dev;
+
+struct intel_event {
+    struct intel_obj obj;
+};
+
+static inline struct intel_event *intel_event(VkEvent event)
+{
+    return *(struct intel_event **) &event;
+}
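+
+/*
+ * VkEvent is a non-dispatchable handle whose bits carry the driver object
+ * pointer, so the cast above simply reinterprets the handle.  The same idiom
+ * is used for the other handle types in this ICD (see fb.h and fence.h).
+ */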
+
+static inline struct intel_event *intel_event_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_event *) obj;
+}
+
+VkResult intel_event_create(struct intel_dev *dev,
+                              const VkEventCreateInfo *info,
+                              struct intel_event **event_ret);
+void intel_event_destroy(struct intel_event *event);
+
+VkResult intel_event_set(struct intel_event *event);
+VkResult intel_event_reset(struct intel_event *event);
+VkResult intel_event_get_status(struct intel_event *event);
+
+#endif /* EVENT_H */
diff --git a/icd/intel/extension_info.h b/icd/intel/extension_info.h
new file mode 100644
index 0000000..8fa4124
--- /dev/null
+++ b/icd/intel/extension_info.h
@@ -0,0 +1,53 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ *
+ */
+
+#ifndef EXTENSION_INFO_H
+#define EXTENSION_INFO_H
+
+#include "intel.h"
+
+enum intel_phy_dev_ext_type {
+    INTEL_PHY_DEV_EXT_WSI_SWAPCHAIN,
+
+    INTEL_PHY_DEV_EXT_COUNT,
+    INTEL_PHY_DEV_EXT_INVALID = INTEL_PHY_DEV_EXT_COUNT,
+};
+
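+/*
+ * Note: despite its INTEL_PHY_DEV_ prefix, the debug report extension below
+ * is advertised as a global (instance-level) extension.
+ */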
+enum intel_global_ext_type {
+    INTEL_PHY_DEV_EXT_DEBUG_REPORT,
+    INTEL_GLOBAL_EXT_WSI_SURFACE,
+#ifdef VK_USE_PLATFORM_XCB_KHR
+    INTEL_GLOBAL_EXT_WSI_XCB_SURFACE,
+#endif // VK_USE_PLATFORM_XCB_KHR
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+    INTEL_GLOBAL_EXT_WSI_XLIB_SURFACE,
+#endif // VK_USE_PLATFORM_XLIB_KHR
+    INTEL_GLOBAL_EXT_COUNT,
+    INTEL_GLOBAL_EXT_INVALID = INTEL_GLOBAL_EXT_COUNT,
+};
+
+extern const VkExtensionProperties intel_phy_dev_gpu_exts[INTEL_PHY_DEV_EXT_COUNT];
+extern const VkExtensionProperties intel_global_exts[INTEL_GLOBAL_EXT_COUNT];
+
+bool compare_vk_extension_properties(
+        const VkExtensionProperties *op1,
+        const char                  *extensionName);
+
+#endif /* EXTENSION_INFO_H */
diff --git a/icd/intel/extension_utils.c b/icd/intel/extension_utils.c
new file mode 100644
index 0000000..d08ce69
--- /dev/null
+++ b/icd/intel/extension_utils.c
@@ -0,0 +1,75 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ * Author: Ian Elliott <ian@lunarg.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ *
+ */
+
+#include "extension_info.h"
+
+const VkExtensionProperties intel_global_exts[INTEL_GLOBAL_EXT_COUNT] = {
+    {
+        .extensionName = VK_EXT_DEBUG_REPORT_EXTENSION_NAME,
+        .specVersion = VK_EXT_DEBUG_REPORT_SPEC_VERSION,
+    },
+    {
+        .extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
+        .specVersion = VK_KHR_SURFACE_SPEC_VERSION,
+    }
+#ifdef VK_USE_PLATFORM_XCB_KHR
+    ,{
+        .extensionName = VK_KHR_XCB_SURFACE_EXTENSION_NAME,
+        .specVersion = VK_KHR_XCB_SURFACE_SPEC_VERSION,
+    }
+#endif
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+    ,{
+        .extensionName = VK_KHR_XLIB_SURFACE_EXTENSION_NAME,
+        .specVersion = VK_KHR_XLIB_SURFACE_SPEC_VERSION,
+    }
+#endif
+#if 0
+#ifdef VK_USE_PLATFORM_MIR_KHR
+    ,{
+        .extensionName = VK_KHR_MIR_SURFACE_EXTENSION_NAME,
+        .specVersion = VK_KHR_MIR_SURFACE_SPEC_VERSION,
+    }
+#endif
+#ifdef VK_USE_PLATFORM_WAYLAND_KHR
+    ,{
+        .extensionName = VK_KHR_WAYLAND_SURFACE_EXTENSION_NAME,
+        .specVersion = VK_KHR_WAYLAND_SURFACE_SPEC_VERSION,
+    }
+#endif
+#endif
+};
+
+const VkExtensionProperties intel_phy_dev_gpu_exts[INTEL_PHY_DEV_EXT_COUNT] = {
+    {
+        .extensionName = VK_KHR_SWAPCHAIN_EXTENSION_NAME,
+        .specVersion = VK_KHR_SWAPCHAIN_SPEC_VERSION,
+    }
+};
+
+bool compare_vk_extension_properties(
+        const VkExtensionProperties *op1,
+        const char *extensionName)
+{
+    return strcmp(op1->extensionName, extensionName) == 0;
+}
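+
+/*
+ * Typical (illustrative) use: scan one of the static tables above for a
+ * requested extension name, e.g.
+ *
+ *     for (i = 0; i < INTEL_GLOBAL_EXT_COUNT; i++) {
+ *         if (compare_vk_extension_properties(&intel_global_exts[i], name))
+ *             return (enum intel_global_ext_type) i;
+ *     }
+ */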
diff --git a/icd/intel/fb.c b/icd/intel/fb.c
new file mode 100644
index 0000000..3331f6f
--- /dev/null
+++ b/icd/intel/fb.c
@@ -0,0 +1,250 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chris Forbes <chrisf@ijw.co.nz>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ * Author: Tony Barbour <tony@LunarG.com>
+ *
+ */
+
+#include "dev.h"
+#include "obj.h"
+#include "view.h"
+#include "img.h"
+#include "fb.h"
+
+static void fb_destroy(struct intel_obj *obj)
+{
+    struct intel_fb *fb = intel_fb_from_obj(obj);
+
+    intel_fb_destroy(fb);
+}
+
+VkResult intel_fb_create(struct intel_dev *dev,
+                         const VkFramebufferCreateInfo *info,
+                         const VkAllocationCallbacks *allocator,
+                         struct intel_fb **fb_ret)
+{
+    struct intel_fb *fb;
+    uint32_t width, height, array_size, i;
+
+    fb = (struct intel_fb *) intel_base_create(&dev->base.handle,
+            sizeof(*fb), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_FRAMEBUFFER_EXT, info, 0);
+    if (!fb)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    fb->view_count = info->attachmentCount;
+    fb->views = intel_alloc(fb, sizeof(fb->views[0]) * fb->view_count, sizeof(int),
+            VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!fb->views) {
+        intel_fb_destroy(fb);
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    }
+
+    width = info->width;
+    height = info->height;
+    array_size = info->layers;
+
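+    /* clamp the framebuffer layer count to the smallest attachment view */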
+    for (i = 0; i < info->attachmentCount; i++) {
+        const VkImageView *att = &info->pAttachments[i];
+        const struct intel_img_view *view = intel_img_view(*att);
+
+        if (array_size > view->att_view.array_size)
+            array_size = view->att_view.array_size;
+
+        fb->views[i] = &view->att_view;
+    }
+
+    fb->width = width;
+    fb->height = height;
+    fb->array_size = array_size;
+
+    fb->obj.destroy = fb_destroy;
+
+    *fb_ret = fb;
+
+    return VK_SUCCESS;
+}
+
+void intel_fb_destroy(struct intel_fb *fb)
+{
+    if (fb->views)
+        intel_free(fb, fb->views);
+    intel_base_destroy(&fb->obj.base);
+}
+
+static void render_pass_destroy(struct intel_obj *obj)
+{
+    struct intel_render_pass *rp = intel_render_pass_from_obj(obj);
+
+    intel_render_pass_destroy(rp);
+}
+
+VkResult intel_render_pass_create(struct intel_dev *dev,
+                                  const VkRenderPassCreateInfo *info,
+                                  const VkAllocationCallbacks *allocator,
+                                  struct intel_render_pass **rp_ret)
+{
+    struct intel_render_pass *rp;
+    uint32_t i;
+
+    // TODO: Add support for subpass dependencies
+    assert(!(info->dependencyCount) && "No ICD support for subpass dependencies");
+
+    rp = (struct intel_render_pass *) intel_base_create(&dev->base.handle,
+            sizeof(*rp), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_RENDER_PASS_EXT, info, 0);
+    if (!rp)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    rp->attachment_count = info->attachmentCount;
+    rp->subpass_count = info->subpassCount;
+
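+    /* a single allocation backs both arrays; rp->subpasses aliases its tail */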
+    rp->attachments = intel_alloc(rp,
+            sizeof(rp->attachments[0]) * rp->attachment_count +
+            sizeof(rp->subpasses[0]) * rp->subpass_count, sizeof(int),
+            VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!rp->attachments) {
+        intel_render_pass_destroy(rp);
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    }
+
+    rp->subpasses = (struct intel_render_pass_subpass *)
+        (rp->attachments + rp->attachment_count);
+
+    rp->obj.destroy = render_pass_destroy;
+
+    for (i = 0; i < info->attachmentCount; i++) {
+        struct intel_render_pass_attachment *att = &rp->attachments[i];
+
+        att->format = info->pAttachments[i].format;
+        att->sample_count = (uint32_t) info->pAttachments[i].samples;
+        att->initial_layout = info->pAttachments[i].initialLayout;
+        att->final_layout = info->pAttachments[i].finalLayout;
+
+        att->clear_on_load = (info->pAttachments[i].loadOp ==
+                VK_ATTACHMENT_LOAD_OP_CLEAR);
+        att->disable_store = (info->pAttachments[i].storeOp ==
+                VK_ATTACHMENT_STORE_OP_DONT_CARE);
+
+        att->stencil_clear_on_load = (info->pAttachments[i].stencilLoadOp ==
+                VK_ATTACHMENT_LOAD_OP_CLEAR);
+        att->stencil_disable_store = (info->pAttachments[i].stencilStoreOp ==
+                VK_ATTACHMENT_STORE_OP_DONT_CARE);
+    }
+
+    for (i = 0; i < rp->subpass_count; i++) {
+        const VkSubpassDescription *subpass_info = &info->pSubpasses[i];
+        struct intel_render_pass_subpass *subpass = &rp->subpasses[i];
+        uint32_t j;
+
+        // TODO: Add support for Input Attachment References
+        assert(!(subpass_info->inputAttachmentCount) && "No ICD support for Input Attachments");
+
+        for (j = 0; j < subpass_info->colorAttachmentCount; j++) {
+            const VkAttachmentReference *color_ref =
+                &subpass_info->pColorAttachments[j];
+            const VkAttachmentReference *resolve_ref =
+                (subpass_info->pResolveAttachments) ?
+                &subpass_info->pResolveAttachments[j] : NULL;
+
+            subpass->color_indices[j] = color_ref->attachment;
+            subpass->resolve_indices[j] = (resolve_ref) ?
+                resolve_ref->attachment : VK_ATTACHMENT_UNUSED;
+            subpass->color_layouts[j] = color_ref->layout;
+        }
+
+        subpass->color_count = subpass_info->colorAttachmentCount;
+
+        if (subpass_info->pDepthStencilAttachment) {
+            subpass->ds_index =
+                subpass_info->pDepthStencilAttachment->attachment;
+            subpass->ds_layout =
+                subpass_info->pDepthStencilAttachment->layout;
+        } else {
+            subpass->ds_index = VK_ATTACHMENT_UNUSED;
+            subpass->ds_layout = VK_IMAGE_LAYOUT_UNDEFINED;
+        }
+
+        switch (subpass->ds_layout) {
+        case VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL:
+        case VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL:
+            subpass->ds_optimal = true;
+            break;
+        default:
+            subpass->ds_optimal = false;
+            break;
+        }
+    }
+
+    *rp_ret = rp;
+
+    return VK_SUCCESS;
+}
+
+void intel_render_pass_destroy(struct intel_render_pass *rp)
+{
+    if (rp->attachments)
+        intel_free(rp, rp->attachments);
+
+    intel_base_destroy(&rp->obj.base);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateFramebuffer(
+    VkDevice                                    device,
+    const VkFramebufferCreateInfo*              pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkFramebuffer*                              pFramebuffer)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_fb_create(dev, pCreateInfo, pAllocator,
+            (struct intel_fb **) pFramebuffer);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyFramebuffer(
+    VkDevice                                    device,
+    VkFramebuffer                               framebuffer,
+    const VkAllocationCallbacks*                pAllocator)
+{
+    struct intel_obj *obj = intel_obj(framebuffer);
+
+    obj->destroy(obj);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateRenderPass(
+    VkDevice                                    device,
+    const VkRenderPassCreateInfo*               pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkRenderPass*                               pRenderPass)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_render_pass_create(dev, pCreateInfo, pAllocator,
+            (struct intel_render_pass **) pRenderPass);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyRenderPass(
+    VkDevice                                    device,
+    VkRenderPass                                renderPass,
+    const VkAllocationCallbacks*                pAllocator)
+{
+    struct intel_obj *obj = intel_obj(renderPass);
+
+    obj->destroy(obj);
+}
diff --git a/icd/intel/fb.h b/icd/intel/fb.h
new file mode 100644
index 0000000..56ad0b2
--- /dev/null
+++ b/icd/intel/fb.h
@@ -0,0 +1,108 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Chris Forbes <chrisf@ijw.co.nz>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ *
+ */
+
+#ifndef FB_H
+#define FB_H
+
+#include "intel.h"
+#include "obj.h"
+
+struct intel_fb {
+    struct intel_obj obj;
+
+    const struct intel_att_view **views;
+    uint32_t view_count;
+
+    uint32_t width;
+    uint32_t height;
+    uint32_t array_size;
+};
+
+struct intel_render_pass_attachment {
+    VkFormat format;
+    uint32_t sample_count;
+
+    VkImageLayout initial_layout;
+    VkImageLayout final_layout;
+
+    bool clear_on_load;
+    bool disable_store;
+
+    bool stencil_clear_on_load;
+    bool stencil_disable_store;
+};
+
+struct intel_render_pass_subpass {
+    uint32_t color_indices[INTEL_MAX_RENDER_TARGETS];
+    uint32_t resolve_indices[INTEL_MAX_RENDER_TARGETS];
+    VkImageLayout color_layouts[INTEL_MAX_RENDER_TARGETS];
+    uint32_t color_count;
+
+    uint32_t ds_index;
+    VkImageLayout ds_layout;
+    bool ds_optimal;
+};
+
+struct intel_render_pass {
+    struct intel_obj obj;
+
+    struct intel_render_pass_attachment *attachments;
+    uint32_t attachment_count;
+
+    struct intel_render_pass_subpass *subpasses;
+    uint32_t subpass_count;
+};
+
+static inline struct intel_fb *intel_fb(VkFramebuffer fb)
+{
+    return *(struct intel_fb **) &fb;
+}
+
+static inline struct intel_fb *intel_fb_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_fb *) obj;
+}
+
+static inline struct intel_render_pass *intel_render_pass(VkRenderPass rp)
+{
+    return *(struct intel_render_pass **) &rp;
+}
+
+static inline struct intel_render_pass *intel_render_pass_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_render_pass *) obj;
+}
+
+VkResult intel_fb_create(struct intel_dev *dev,
+                         const VkFramebufferCreateInfo *pInfo,
+                         const VkAllocationCallbacks *allocator,
+                         struct intel_fb **fb_ret);
+void intel_fb_destroy(struct intel_fb *fb);
+
+VkResult intel_render_pass_create(struct intel_dev *dev,
+                                  const VkRenderPassCreateInfo *pInfo,
+                                  const VkAllocationCallbacks *allocator,
+                                  struct intel_render_pass **rp_ret);
+void intel_render_pass_destroy(struct intel_render_pass *rp);
+
+#endif /* FB_H */
diff --git a/icd/intel/fence.c b/icd/intel/fence.c
new file mode 100644
index 0000000..9e13c19
--- /dev/null
+++ b/icd/intel/fence.c
@@ -0,0 +1,194 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ * Author: Tony Barbour <tony@LunarG.com>
+ *
+ */
+
+#include "kmd/winsys.h"
+#include "cmd.h"
+#include "dev.h"
+#include "wsi.h"
+#include "fence.h"
+#include "instance.h"
+
+static void fence_destroy(struct intel_obj *obj)
+{
+    struct intel_fence *fence = intel_fence_from_obj(obj);
+
+    intel_fence_destroy(fence);
+}
+
+VkResult intel_fence_create(struct intel_dev *dev,
+                              const VkFenceCreateInfo *info,
+                              struct intel_fence **fence_ret)
+{
+    struct intel_fence *fence;
+
+    fence = (struct intel_fence *) intel_base_create(&dev->base.handle,
+            sizeof(*fence), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_FENCE_EXT, info, 0);
+    if (!fence)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    if (dev->base.handle.instance->global_exts[INTEL_GLOBAL_EXT_WSI_SURFACE]) {
+        VkResult ret = intel_wsi_fence_init(fence);
+        if (ret != VK_SUCCESS) {
+            intel_fence_destroy(fence);
+            return ret;
+        }
+    }
+
+    fence->obj.destroy = fence_destroy;
+
+    fence->signaled = (info->flags & VK_FENCE_CREATE_SIGNALED_BIT);
+
+    *fence_ret = fence;
+
+    return VK_SUCCESS;
+}
+
+void intel_fence_destroy(struct intel_fence *fence)
+{
+    if (fence->wsi_data)
+        intel_wsi_fence_cleanup(fence);
+
+    intel_bo_unref(fence->seqno_bo);
+
+    intel_base_destroy(&fence->obj.base);
+}
+
+void intel_fence_copy(struct intel_fence *fence,
+                      const struct intel_fence *src)
+{
+    intel_wsi_fence_copy(fence, src);
+    intel_fence_set_seqno(fence, src->seqno_bo);
+}
+
+void intel_fence_set_seqno(struct intel_fence *fence,
+                           struct intel_bo *seqno_bo)
+{
+    intel_bo_unref(fence->seqno_bo);
+    fence->seqno_bo = intel_bo_ref(seqno_bo);
+
+    fence->signaled = false;
+}
+
+VkResult intel_fence_wait(struct intel_fence *fence, int64_t timeout_ns)
+{
+    VkResult ret;
+
+    ret = intel_wsi_fence_wait(fence, timeout_ns);
+    if (ret != VK_SUCCESS)
+        return ret;
+
+    if (fence->signaled) {
+        return VK_SUCCESS;
+    }
+
+    if (fence->seqno_bo) {
+        ret = (intel_bo_wait(fence->seqno_bo, timeout_ns)) ?
+            VK_NOT_READY : VK_SUCCESS;
+        if (ret == VK_SUCCESS) {
+            fence->signaled = true;
+        }
+        return ret;
+    }
+
+    assert(0 && "Invalid fence status");
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateFence(
+    VkDevice                                    device,
+    const VkFenceCreateInfo*                    pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkFence*                                    pFence)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_fence_create(dev, pCreateInfo,
+            (struct intel_fence **) pFence);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyFence(
+    VkDevice                                    device,
+    VkFence                                     fence,
+    const VkAllocationCallbacks*                pAllocator)
+{
+    struct intel_obj *obj = intel_obj(fence);
+
+    obj->destroy(obj);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetFenceStatus(
+    VkDevice                                  device,
+    VkFence                                   fence_)
+{
+    struct intel_fence *fence = intel_fence(fence_);
+
+    return intel_fence_wait(fence, 0);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkWaitForFences(
+    VkDevice                                    device,
+    uint32_t                                    fenceCount,
+    const VkFence*                              pFences,
+    VkBool32                                    waitAll,
+    uint64_t                                    timeout)
+{
+    VkResult ret = VK_SUCCESS;
+    uint32_t i;
+
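+    /*
+     * Each fence is waited on in turn.  With waitAll == false, the loop
+     * returns as soon as any fence signals, though the first wait may
+     * still consume the full timeout before later fences are examined.
+     */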
+    for (i = 0; i < fenceCount; i++) {
+        struct intel_fence *fence = intel_fence(pFences[i]);
+        int64_t ns;
+        VkResult r;
+
+        /* timeout is in nanoseconds; values above INT64_MAX become -1 */
+        ns = (timeout <= (uint64_t) INT64_MAX) ? timeout : -1;
+        r = intel_fence_wait(fence, ns);
+
+        if (!waitAll && r == VK_SUCCESS)
+            return VK_SUCCESS;
+
+        /* Translate return value according to spec */
+        if (r == VK_NOT_READY)
+            r = VK_TIMEOUT;
+
+        if (r != VK_SUCCESS)
+            ret = r;
+    }
+
+    return ret;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkResetFences(
+    VkDevice                                  device,
+    uint32_t                                  fenceCount,
+    const VkFence*                            pFences)
+{
+    uint32_t i;
+
+    for (i = 0; i < fenceCount; i++) {
+        struct intel_fence *fence = intel_fence(pFences[i]);
+        fence->signaled = false;
+    }
+
+    return VK_SUCCESS;
+}
diff --git a/icd/intel/fence.h b/icd/intel/fence.h
new file mode 100644
index 0000000..9ce72f9
--- /dev/null
+++ b/icd/intel/fence.h
@@ -0,0 +1,63 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ *
+ */
+
+#ifndef FENCE_H
+#define FENCE_H
+
+#include "intel.h"
+#include "obj.h"
+
+struct intel_bo;
+struct intel_dev;
+
+struct intel_fence {
+    struct intel_obj obj;
+
+    struct intel_bo *seqno_bo;
+    bool signaled;
+
+    void *wsi_data;
+};
+
+static inline struct intel_fence *intel_fence(VkFence fence)
+{
+    return *(struct intel_fence **) &fence;
+}
+
+static inline struct intel_fence *intel_fence_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_fence *) obj;
+}
+
+VkResult intel_fence_create(struct intel_dev *dev,
+                              const VkFenceCreateInfo *info,
+                              struct intel_fence **fence_ret);
+void intel_fence_destroy(struct intel_fence *fence);
+
+VkResult intel_fence_wait(struct intel_fence *fence, int64_t timeout_ns);
+
+void intel_fence_copy(struct intel_fence *fence,
+                      const struct intel_fence *src);
+
+void intel_fence_set_seqno(struct intel_fence *fence,
+                           struct intel_bo *seqno_bo);
+
+#endif /* FENCE_H */
diff --git a/icd/intel/format.c b/icd/intel/format.c
new file mode 100644
index 0000000..86c5a12
--- /dev/null
+++ b/icd/intel/format.c
@@ -0,0 +1,733 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ *
+ */
+
+#include "genhw/genhw.h"
+#include "dev.h"
+#include "gpu.h"
+#include "format.h"
+
+struct intel_vf_cap {
+   int vertex_element;
+};
+
+struct intel_sampler_cap {
+   int sampling;
+   int filtering;
+   int shadow_map;
+   int chroma_key;
+};
+
+struct intel_dp_cap {
+   int rt_write;
+   int rt_write_blending;
+   int typed_write;
+   int media_color_processing;
+};
+
+/*
+ * This table is based on:
+ *
+ *  - the Sandy Bridge PRM, volume 4 part 1, page 88-97
+ *  - the Ivy Bridge PRM, volume 2 part 1, page 97-99
+ *  - the Haswell PRM, volume 7, page 467-470
+ */
+static const struct intel_vf_cap intel_vf_caps[] = {
+#define CAP(vertex_element) { INTEL_GEN(vertex_element) }
+   [GEN6_FORMAT_R32G32B32A32_FLOAT]       = CAP(  1),
+   [GEN6_FORMAT_R32G32B32A32_SINT]        = CAP(  1),
+   [GEN6_FORMAT_R32G32B32A32_UINT]        = CAP(  1),
+   [GEN6_FORMAT_R32G32B32A32_UNORM]       = CAP(  1),
+   [GEN6_FORMAT_R32G32B32A32_SNORM]       = CAP(  1),
+   [GEN6_FORMAT_R64G64_FLOAT]             = CAP(  1),
+   [GEN6_FORMAT_R32G32B32A32_SSCALED]     = CAP(  1),
+   [GEN6_FORMAT_R32G32B32A32_USCALED]     = CAP(  1),
+   [GEN6_FORMAT_R32G32B32A32_SFIXED]      = CAP(7.5),
+   [GEN6_FORMAT_R32G32B32_FLOAT]          = CAP(  1),
+   [GEN6_FORMAT_R32G32B32_SINT]           = CAP(  1),
+   [GEN6_FORMAT_R32G32B32_UINT]           = CAP(  1),
+   [GEN6_FORMAT_R32G32B32_UNORM]          = CAP(  1),
+   [GEN6_FORMAT_R32G32B32_SNORM]          = CAP(  1),
+   [GEN6_FORMAT_R32G32B32_SSCALED]        = CAP(  1),
+   [GEN6_FORMAT_R32G32B32_USCALED]        = CAP(  1),
+   [GEN6_FORMAT_R32G32B32_SFIXED]         = CAP(7.5),
+   [GEN6_FORMAT_R16G16B16A16_UNORM]       = CAP(  1),
+   [GEN6_FORMAT_R16G16B16A16_SNORM]       = CAP(  1),
+   [GEN6_FORMAT_R16G16B16A16_SINT]        = CAP(  1),
+   [GEN6_FORMAT_R16G16B16A16_UINT]        = CAP(  1),
+   [GEN6_FORMAT_R16G16B16A16_FLOAT]       = CAP(  1),
+   [GEN6_FORMAT_R32G32_FLOAT]             = CAP(  1),
+   [GEN6_FORMAT_R32G32_SINT]              = CAP(  1),
+   [GEN6_FORMAT_R32G32_UINT]              = CAP(  1),
+   [GEN6_FORMAT_R32G32_UNORM]             = CAP(  1),
+   [GEN6_FORMAT_R32G32_SNORM]             = CAP(  1),
+   [GEN6_FORMAT_R64_FLOAT]                = CAP(  1),
+   [GEN6_FORMAT_R16G16B16A16_SSCALED]     = CAP(  1),
+   [GEN6_FORMAT_R16G16B16A16_USCALED]     = CAP(  1),
+   [GEN6_FORMAT_R32G32_SSCALED]           = CAP(  1),
+   [GEN6_FORMAT_R32G32_USCALED]           = CAP(  1),
+   [GEN6_FORMAT_R32G32_SFIXED]            = CAP(7.5),
+   [GEN6_FORMAT_B8G8R8A8_UNORM]           = CAP(  1),
+   [GEN6_FORMAT_R10G10B10A2_UNORM]        = CAP(  1),
+   [GEN6_FORMAT_R10G10B10A2_UINT]         = CAP(  1),
+   [GEN6_FORMAT_R10G10B10_SNORM_A2_UNORM] = CAP(  1),
+   [GEN6_FORMAT_R8G8B8A8_UNORM]           = CAP(  1),
+   [GEN6_FORMAT_R8G8B8A8_SNORM]           = CAP(  1),
+   [GEN6_FORMAT_R8G8B8A8_SINT]            = CAP(  1),
+   [GEN6_FORMAT_R8G8B8A8_UINT]            = CAP(  1),
+   [GEN6_FORMAT_R16G16_UNORM]             = CAP(  1),
+   [GEN6_FORMAT_R16G16_SNORM]             = CAP(  1),
+   [GEN6_FORMAT_R16G16_SINT]              = CAP(  1),
+   [GEN6_FORMAT_R16G16_UINT]              = CAP(  1),
+   [GEN6_FORMAT_R16G16_FLOAT]             = CAP(  1),
+   [GEN6_FORMAT_B10G10R10A2_UNORM]        = CAP(7.5),
+   [GEN6_FORMAT_R11G11B10_FLOAT]          = CAP(  1),
+   [GEN6_FORMAT_R32_SINT]                 = CAP(  1),
+   [GEN6_FORMAT_R32_UINT]                 = CAP(  1),
+   [GEN6_FORMAT_R32_FLOAT]                = CAP(  1),
+   [GEN6_FORMAT_R32_UNORM]                = CAP(  1),
+   [GEN6_FORMAT_R32_SNORM]                = CAP(  1),
+   [GEN6_FORMAT_R10G10B10X2_USCALED]      = CAP(  1),
+   [GEN6_FORMAT_R8G8B8A8_SSCALED]         = CAP(  1),
+   [GEN6_FORMAT_R8G8B8A8_USCALED]         = CAP(  1),
+   [GEN6_FORMAT_R16G16_SSCALED]           = CAP(  1),
+   [GEN6_FORMAT_R16G16_USCALED]           = CAP(  1),
+   [GEN6_FORMAT_R32_SSCALED]              = CAP(  1),
+   [GEN6_FORMAT_R32_USCALED]              = CAP(  1),
+   [GEN6_FORMAT_R8G8_UNORM]               = CAP(  1),
+   [GEN6_FORMAT_R8G8_SNORM]               = CAP(  1),
+   [GEN6_FORMAT_R8G8_SINT]                = CAP(  1),
+   [GEN6_FORMAT_R8G8_UINT]                = CAP(  1),
+   [GEN6_FORMAT_R16_UNORM]                = CAP(  1),
+   [GEN6_FORMAT_R16_SNORM]                = CAP(  1),
+   [GEN6_FORMAT_R16_SINT]                 = CAP(  1),
+   [GEN6_FORMAT_R16_UINT]                 = CAP(  1),
+   [GEN6_FORMAT_R16_FLOAT]                = CAP(  1),
+   [GEN6_FORMAT_R8G8_SSCALED]             = CAP(  1),
+   [GEN6_FORMAT_R8G8_USCALED]             = CAP(  1),
+   [GEN6_FORMAT_R16_SSCALED]              = CAP(  1),
+   [GEN6_FORMAT_R16_USCALED]              = CAP(  1),
+   [GEN6_FORMAT_R8_UNORM]                 = CAP(  1),
+   [GEN6_FORMAT_R8_SNORM]                 = CAP(  1),
+   [GEN6_FORMAT_R8_SINT]                  = CAP(  1),
+   [GEN6_FORMAT_R8_UINT]                  = CAP(  1),
+   [GEN6_FORMAT_R8_SSCALED]               = CAP(  1),
+   [GEN6_FORMAT_R8_USCALED]               = CAP(  1),
+   [GEN6_FORMAT_R8G8B8_UNORM]             = CAP(  1),
+   [GEN6_FORMAT_R8G8B8_SNORM]             = CAP(  1),
+   [GEN6_FORMAT_R8G8B8_SSCALED]           = CAP(  1),
+   [GEN6_FORMAT_R8G8B8_USCALED]           = CAP(  1),
+   [GEN6_FORMAT_R64G64B64A64_FLOAT]       = CAP(  1),
+   [GEN6_FORMAT_R64G64B64_FLOAT]          = CAP(  1),
+   [GEN6_FORMAT_R16G16B16_FLOAT]          = CAP(  6),
+   [GEN6_FORMAT_R16G16B16_UNORM]          = CAP(  1),
+   [GEN6_FORMAT_R16G16B16_SNORM]          = CAP(  1),
+   [GEN6_FORMAT_R16G16B16_SSCALED]        = CAP(  1),
+   [GEN6_FORMAT_R16G16B16_USCALED]        = CAP(  1),
+   [GEN6_FORMAT_R16G16B16_UINT]           = CAP(7.5),
+   [GEN6_FORMAT_R16G16B16_SINT]           = CAP(7.5),
+   [GEN6_FORMAT_R32_SFIXED]               = CAP(7.5),
+   [GEN6_FORMAT_R10G10B10A2_SNORM]        = CAP(7.5),
+   [GEN6_FORMAT_R10G10B10A2_USCALED]      = CAP(7.5),
+   [GEN6_FORMAT_R10G10B10A2_SSCALED]      = CAP(7.5),
+   [GEN6_FORMAT_R10G10B10A2_SINT]         = CAP(7.5),
+   [GEN6_FORMAT_B10G10R10A2_SNORM]        = CAP(7.5),
+   [GEN6_FORMAT_B10G10R10A2_USCALED]      = CAP(7.5),
+   [GEN6_FORMAT_B10G10R10A2_SSCALED]      = CAP(7.5),
+   [GEN6_FORMAT_B10G10R10A2_UINT]         = CAP(7.5),
+   [GEN6_FORMAT_B10G10R10A2_SINT]         = CAP(7.5),
+   [GEN6_FORMAT_R8G8B8_UINT]              = CAP(7.5),
+   [GEN6_FORMAT_R8G8B8_SINT]              = CAP(7.5),
+#undef CAP
+};
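+
+/*
+ * Reading the tables in this file: CAP(1) means supported on every GEN this
+ * driver targets, while e.g. CAP(7.5) marks support starting with GEN 7.5
+ * (Haswell).  INTEL_GEN() is assumed here to encode these versions as
+ * integers that can be compared against the device's GEN at runtime.
+ */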
+
+/*
+ * This table is based on:
+ *
+ *  - the Sandy Bridge PRM, volume 4 part 1, page 88-97
+ *  - the Ivy Bridge PRM, volume 4 part 1, page 84-87
+ */
+static const struct intel_sampler_cap intel_sampler_caps[] = {
+#define CAP(sampling, filtering, shadow_map, chroma_key) \
+   { INTEL_GEN(sampling), INTEL_GEN(filtering), INTEL_GEN(shadow_map), INTEL_GEN(chroma_key) }
+   [GEN6_FORMAT_R32G32B32A32_FLOAT]       = CAP(  1,   5,   0,   0),
+   [GEN6_FORMAT_R32G32B32A32_SINT]        = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R32G32B32A32_UINT]        = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R32G32B32X32_FLOAT]       = CAP(  1,   5,   0,   0),
+   [GEN6_FORMAT_R32G32B32_FLOAT]          = CAP(  1,   5,   0,   0),
+   [GEN6_FORMAT_R32G32B32_SINT]           = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R32G32B32_UINT]           = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R16G16B16A16_UNORM]       = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R16G16B16A16_SNORM]       = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R16G16B16A16_SINT]        = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R16G16B16A16_UINT]        = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R16G16B16A16_FLOAT]       = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R32G32_FLOAT]             = CAP(  1,   5,   0,   0),
+   [GEN6_FORMAT_R32G32_SINT]              = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R32G32_UINT]              = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R32_FLOAT_X8X24_TYPELESS] = CAP(  1,   5,   1,   0),
+   [GEN6_FORMAT_X32_TYPELESS_G8X24_UINT]  = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_L32A32_FLOAT]             = CAP(  1,   5,   0,   0),
+   [GEN6_FORMAT_R16G16B16X16_UNORM]       = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R16G16B16X16_FLOAT]       = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_A32X32_FLOAT]             = CAP(  1,   5,   0,   0),
+   [GEN6_FORMAT_L32X32_FLOAT]             = CAP(  1,   5,   0,   0),
+   [GEN6_FORMAT_I32X32_FLOAT]             = CAP(  1,   5,   0,   0),
+   [GEN6_FORMAT_B8G8R8A8_UNORM]           = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_B8G8R8A8_UNORM_SRGB]      = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R10G10B10A2_UNORM]        = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R10G10B10A2_UNORM_SRGB]   = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R10G10B10A2_UINT]         = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R10G10B10_SNORM_A2_UNORM] = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R8G8B8A8_UNORM]           = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R8G8B8A8_UNORM_SRGB]      = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R8G8B8A8_SNORM]           = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R8G8B8A8_SINT]            = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R8G8B8A8_UINT]            = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R16G16_UNORM]             = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R16G16_SNORM]             = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R16G16_SINT]              = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R16G16_UINT]              = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R16G16_FLOAT]             = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_B10G10R10A2_UNORM]        = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_B10G10R10A2_UNORM_SRGB]   = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R11G11B10_FLOAT]          = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R32_SINT]                 = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R32_UINT]                 = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R32_FLOAT]                = CAP(  1,   5,   1,   0),
+   [GEN6_FORMAT_R24_UNORM_X8_TYPELESS]    = CAP(  1,   5,   1,   0),
+   [GEN6_FORMAT_X24_TYPELESS_G8_UINT]     = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_L16A16_UNORM]             = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_I24X8_UNORM]              = CAP(  1,   5,   1,   0),
+   [GEN6_FORMAT_L24X8_UNORM]              = CAP(  1,   5,   1,   0),
+   [GEN6_FORMAT_A24X8_UNORM]              = CAP(  1,   5,   1,   0),
+   [GEN6_FORMAT_I32_FLOAT]                = CAP(  1,   5,   1,   0),
+   [GEN6_FORMAT_L32_FLOAT]                = CAP(  1,   5,   1,   0),
+   [GEN6_FORMAT_A32_FLOAT]                = CAP(  1,   5,   1,   0),
+   [GEN6_FORMAT_B8G8R8X8_UNORM]           = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_B8G8R8X8_UNORM_SRGB]      = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R8G8B8X8_UNORM]           = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R8G8B8X8_UNORM_SRGB]      = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R9G9B9E5_SHAREDEXP]       = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_B10G10R10X2_UNORM]        = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_L16A16_FLOAT]             = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_B5G6R5_UNORM]             = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_B5G6R5_UNORM_SRGB]        = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_B5G5R5A1_UNORM]           = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_B5G5R5A1_UNORM_SRGB]      = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_B4G4R4A4_UNORM]           = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_B4G4R4A4_UNORM_SRGB]      = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R8G8_UNORM]               = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R8G8_SNORM]               = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_R8G8_SINT]                = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R8G8_UINT]                = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R16_UNORM]                = CAP(  1,   1,   1,   0),
+   [GEN6_FORMAT_R16_SNORM]                = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R16_SINT]                 = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R16_UINT]                 = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R16_FLOAT]                = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_A8P8_UNORM_PALETTE0]      = CAP(  5,   5,   0,   0),
+   [GEN6_FORMAT_A8P8_UNORM_PALETTE1]      = CAP(  5,   5,   0,   0),
+   [GEN6_FORMAT_I16_UNORM]                = CAP(  1,   1,   1,   0),
+   [GEN6_FORMAT_L16_UNORM]                = CAP(  1,   1,   1,   0),
+   [GEN6_FORMAT_A16_UNORM]                = CAP(  1,   1,   1,   0),
+   [GEN6_FORMAT_L8A8_UNORM]               = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_I16_FLOAT]                = CAP(  1,   1,   1,   0),
+   [GEN6_FORMAT_L16_FLOAT]                = CAP(  1,   1,   1,   0),
+   [GEN6_FORMAT_A16_FLOAT]                = CAP(  1,   1,   1,   0),
+   [GEN6_FORMAT_L8A8_UNORM_SRGB]          = CAP(4.5, 4.5,   0,   0),
+   [GEN6_FORMAT_R5G5_SNORM_B6_UNORM]      = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_P8A8_UNORM_PALETTE0]      = CAP(  5,   5,   0,   0),
+   [GEN6_FORMAT_P8A8_UNORM_PALETTE1]      = CAP(  5,   5,   0,   0),
+   [GEN6_FORMAT_R8_UNORM]                 = CAP(  1,   1,   0, 4.5),
+   [GEN6_FORMAT_R8_SNORM]                 = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R8_SINT]                  = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_R8_UINT]                  = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_A8_UNORM]                 = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_I8_UNORM]                 = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_L8_UNORM]                 = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_P4A4_UNORM_PALETTE0]      = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_A4P4_UNORM_PALETTE0]      = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_P8_UNORM_PALETTE0]        = CAP(4.5, 4.5,   0,   0),
+   [GEN6_FORMAT_L8_UNORM_SRGB]            = CAP(4.5, 4.5,   0,   0),
+   [GEN6_FORMAT_P8_UNORM_PALETTE1]        = CAP(4.5, 4.5,   0,   0),
+   [GEN6_FORMAT_P4A4_UNORM_PALETTE1]      = CAP(4.5, 4.5,   0,   0),
+   [GEN6_FORMAT_A4P4_UNORM_PALETTE1]      = CAP(4.5, 4.5,   0,   0),
+   [GEN6_FORMAT_DXT1_RGB_SRGB]            = CAP(4.5, 4.5,   0,   0),
+   [GEN6_FORMAT_R1_UNORM]                 = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_YCRCB_NORMAL]             = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_YCRCB_SWAPUVY]            = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_P2_UNORM_PALETTE0]        = CAP(4.5, 4.5,   0,   0),
+   [GEN6_FORMAT_P2_UNORM_PALETTE1]        = CAP(4.5, 4.5,   0,   0),
+   [GEN6_FORMAT_BC1_UNORM]                = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_BC2_UNORM]                = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_BC3_UNORM]                = CAP(  1,   1,   0,   1),
+   [GEN6_FORMAT_BC4_UNORM]                = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_BC5_UNORM]                = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_BC1_UNORM_SRGB]           = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_BC2_UNORM_SRGB]           = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_BC3_UNORM_SRGB]           = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_MONO8]                    = CAP(  1,   0,   0,   0),
+   [GEN6_FORMAT_YCRCB_SWAPUV]             = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_YCRCB_SWAPY]              = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_DXT1_RGB]                 = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_FXT1]                     = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_BC4_SNORM]                = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_BC5_SNORM]                = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R16G16B16_FLOAT]          = CAP(  5,   5,   0,   0),
+   [GEN6_FORMAT_BC6H_SF16]                = CAP(  7,   7,   0,   0),
+   [GEN6_FORMAT_BC7_UNORM]                = CAP(  7,   7,   0,   0),
+   [GEN6_FORMAT_BC7_UNORM_SRGB]           = CAP(  7,   7,   0,   0),
+   [GEN6_FORMAT_BC6H_UF16]                = CAP(  7,   7,   0,   0),
+#undef CAP
+};
+
+/*
+ * This table is based on:
+ *
+ *  - the Sandy Bridge PRM, volume 4 part 1, page 88-97
+ *  - the Ivy Bridge PRM, volume 4 part 1, page 172, 252-253, and 277-278
+ *  - the Haswell PRM, volume 7, page 262-264
+ */
+static const struct intel_dp_cap intel_dp_caps[] = {
+#define CAP(rt_write, rt_write_blending, typed_write, media_color_processing) \
+   { INTEL_GEN(rt_write), INTEL_GEN(rt_write_blending), INTEL_GEN(typed_write), INTEL_GEN(media_color_processing) }
+   [GEN6_FORMAT_R32G32B32A32_FLOAT]       = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_R32G32B32A32_SINT]        = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R32G32B32A32_UINT]        = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R16G16B16A16_UNORM]       = CAP(  1, 4.5,   7,   6),
+   [GEN6_FORMAT_R16G16B16A16_SNORM]       = CAP(  1,   6,   7,   0),
+   [GEN6_FORMAT_R16G16B16A16_SINT]        = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R16G16B16A16_UINT]        = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R16G16B16A16_FLOAT]       = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_R32G32_FLOAT]             = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_R32G32_SINT]              = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R32G32_UINT]              = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_B8G8R8A8_UNORM]           = CAP(  1,   1,   7,   6),
+   [GEN6_FORMAT_B8G8R8A8_UNORM_SRGB]      = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R10G10B10A2_UNORM]        = CAP(  1,   1,   7,   6),
+   [GEN6_FORMAT_R10G10B10A2_UNORM_SRGB]   = CAP(  0,   0,   0,   6),
+   [GEN6_FORMAT_R10G10B10A2_UINT]         = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R8G8B8A8_UNORM]           = CAP(  1,   1,   7,   6),
+   [GEN6_FORMAT_R8G8B8A8_UNORM_SRGB]      = CAP(  1,   1,   0,   6),
+   [GEN6_FORMAT_R8G8B8A8_SNORM]           = CAP(  1,   6,   7,   0),
+   [GEN6_FORMAT_R8G8B8A8_SINT]            = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R8G8B8A8_UINT]            = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R16G16_UNORM]             = CAP(  1, 4.5,   7,   0),
+   [GEN6_FORMAT_R16G16_SNORM]             = CAP(  1,   6,   7,   0),
+   [GEN6_FORMAT_R16G16_SINT]              = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R16G16_UINT]              = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R16G16_FLOAT]             = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_B10G10R10A2_UNORM]        = CAP(  1,   1,   7,   6),
+   [GEN6_FORMAT_B10G10R10A2_UNORM_SRGB]   = CAP(  1,   1,   0,   6),
+   [GEN6_FORMAT_R11G11B10_FLOAT]          = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_R32_SINT]                 = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R32_UINT]                 = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R32_FLOAT]                = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_B8G8R8X8_UNORM]           = CAP(  0,   0,   0,   6),
+   [GEN6_FORMAT_B5G6R5_UNORM]             = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_B5G6R5_UNORM_SRGB]        = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_B5G5R5A1_UNORM]           = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_B5G5R5A1_UNORM_SRGB]      = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_B4G4R4A4_UNORM]           = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_B4G4R4A4_UNORM_SRGB]      = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R8G8_UNORM]               = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_R8G8_SNORM]               = CAP(  1,   6,   7,   0),
+   [GEN6_FORMAT_R8G8_SINT]                = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R8G8_UINT]                = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R16_UNORM]                = CAP(  1, 4.5,   7,   7),
+   [GEN6_FORMAT_R16_SNORM]                = CAP(  1,   6,   7,   0),
+   [GEN6_FORMAT_R16_SINT]                 = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R16_UINT]                 = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R16_FLOAT]                = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_B5G5R5X1_UNORM]           = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_B5G5R5X1_UNORM_SRGB]      = CAP(  1,   1,   0,   0),
+   [GEN6_FORMAT_R8_UNORM]                 = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_R8_SNORM]                 = CAP(  1,   6,   7,   0),
+   [GEN6_FORMAT_R8_SINT]                  = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_R8_UINT]                  = CAP(  1,   0,   7,   0),
+   [GEN6_FORMAT_A8_UNORM]                 = CAP(  1,   1,   7,   0),
+   [GEN6_FORMAT_YCRCB_NORMAL]             = CAP(  1,   0,   0,   6),
+   [GEN6_FORMAT_YCRCB_SWAPUVY]            = CAP(  1,   0,   0,   6),
+   [GEN6_FORMAT_YCRCB_SWAPUV]             = CAP(  1,   0,   0,   6),
+   [GEN6_FORMAT_YCRCB_SWAPY]              = CAP(  1,   0,   0,   6),
+#undef CAP
+};
+
+static const int intel_color_mapping[VK_FORMAT_RANGE_SIZE] = {
+    [VK_FORMAT_R4G4_UNORM_PACK8]           = 0,
+    [VK_FORMAT_R4G4B4A4_UNORM_PACK16]       = 0,
+    [VK_FORMAT_B4G4R4A4_UNORM_PACK16]       = 0,
+    [VK_FORMAT_R5G6B5_UNORM_PACK16]         = 0,
+    [VK_FORMAT_B5G6R5_UNORM_PACK16]         = GEN6_FORMAT_B5G6R5_UNORM,
+    [VK_FORMAT_R5G5B5A1_UNORM_PACK16]       = 0,
+    [VK_FORMAT_B5G5R5A1_UNORM_PACK16]       = 0,
+    [VK_FORMAT_A1R5G5B5_UNORM_PACK16]       = 0,
+    [VK_FORMAT_R8_UNORM]             = GEN6_FORMAT_R8_UNORM,
+    [VK_FORMAT_R8_SNORM]             = GEN6_FORMAT_R8_SNORM,
+    [VK_FORMAT_R8_USCALED]           = GEN6_FORMAT_R8_USCALED,
+    [VK_FORMAT_R8_SSCALED]           = GEN6_FORMAT_R8_SSCALED,
+    [VK_FORMAT_R8_UINT]              = GEN6_FORMAT_R8_UINT,
+    [VK_FORMAT_R8_SINT]              = GEN6_FORMAT_R8_SINT,
+    [VK_FORMAT_R8_SRGB]              = 0,
+    [VK_FORMAT_R8G8_UNORM]           = GEN6_FORMAT_R8G8_UNORM,
+    [VK_FORMAT_R8G8_SNORM]           = GEN6_FORMAT_R8G8_SNORM,
+    [VK_FORMAT_R8G8_USCALED]         = GEN6_FORMAT_R8G8_USCALED,
+    [VK_FORMAT_R8G8_SSCALED]         = GEN6_FORMAT_R8G8_SSCALED,
+    [VK_FORMAT_R8G8_UINT]            = GEN6_FORMAT_R8G8_UINT,
+    [VK_FORMAT_R8G8_SINT]            = GEN6_FORMAT_R8G8_SINT,
+    [VK_FORMAT_R8G8_SRGB]            = 0,
+    [VK_FORMAT_R8G8B8_UNORM]         = GEN6_FORMAT_R8G8B8_UNORM,
+    [VK_FORMAT_R8G8B8_SNORM]         = GEN6_FORMAT_R8G8B8_SNORM,
+    [VK_FORMAT_R8G8B8_USCALED]       = GEN6_FORMAT_R8G8B8_USCALED,
+    [VK_FORMAT_R8G8B8_SSCALED]       = GEN6_FORMAT_R8G8B8_SSCALED,
+    [VK_FORMAT_R8G8B8_UINT]          = GEN6_FORMAT_R8G8B8_UINT,
+    [VK_FORMAT_R8G8B8_SINT]          = GEN6_FORMAT_R8G8B8_SINT,
+    [VK_FORMAT_R8G8B8_SRGB]          = GEN6_FORMAT_R8G8B8_UNORM_SRGB,
+    [VK_FORMAT_B8G8R8_UNORM]         = 0,
+    [VK_FORMAT_B8G8R8_SNORM]         = 0,
+    [VK_FORMAT_B8G8R8_USCALED]       = 0,
+    [VK_FORMAT_B8G8R8_SSCALED]       = 0,
+    [VK_FORMAT_B8G8R8_UINT]          = 0,
+    [VK_FORMAT_B8G8R8_SINT]          = 0,
+    [VK_FORMAT_B8G8R8_SRGB]          = GEN6_FORMAT_B5G6R5_UNORM_SRGB,
+    [VK_FORMAT_R8G8B8A8_UNORM]       = GEN6_FORMAT_R8G8B8A8_UNORM,
+    [VK_FORMAT_R8G8B8A8_SNORM]       = GEN6_FORMAT_R8G8B8A8_SNORM,
+    [VK_FORMAT_R8G8B8A8_USCALED]     = GEN6_FORMAT_R8G8B8A8_USCALED,
+    [VK_FORMAT_R8G8B8A8_SSCALED]     = GEN6_FORMAT_R8G8B8A8_SSCALED,
+    [VK_FORMAT_R8G8B8A8_UINT]        = GEN6_FORMAT_R8G8B8A8_UINT,
+    [VK_FORMAT_R8G8B8A8_SINT]        = GEN6_FORMAT_R8G8B8A8_SINT,
+    [VK_FORMAT_R8G8B8A8_SRGB]        = GEN6_FORMAT_R8G8B8A8_UNORM_SRGB,
+    [VK_FORMAT_B8G8R8A8_UNORM]       = GEN6_FORMAT_B8G8R8A8_UNORM,
+    [VK_FORMAT_B8G8R8A8_SNORM]       = 0,
+    [VK_FORMAT_B8G8R8A8_USCALED]     = 0,
+    [VK_FORMAT_B8G8R8A8_SSCALED]     = 0,
+    [VK_FORMAT_B8G8R8A8_UINT]        = 0,
+    [VK_FORMAT_B8G8R8A8_SINT]        = 0,
+    [VK_FORMAT_B8G8R8A8_SRGB]        = GEN6_FORMAT_B8G8R8A8_UNORM_SRGB,
+    [VK_FORMAT_A8B8G8R8_UNORM_PACK32]       = 0,
+    [VK_FORMAT_A8B8G8R8_SNORM_PACK32]       = 0,
+    [VK_FORMAT_A8B8G8R8_USCALED_PACK32]     = 0,
+    [VK_FORMAT_A8B8G8R8_SSCALED_PACK32]     = 0,
+    [VK_FORMAT_A8B8G8R8_UINT_PACK32]        = 0,
+    [VK_FORMAT_A8B8G8R8_SINT_PACK32]        = 0,
+    [VK_FORMAT_A8B8G8R8_SRGB_PACK32]        = 0,
+    [VK_FORMAT_A2R10G10B10_UNORM_PACK32]    = GEN6_FORMAT_B10G10R10A2_UNORM,
+    [VK_FORMAT_A2R10G10B10_SNORM_PACK32]    = GEN6_FORMAT_B10G10R10A2_SNORM,
+    [VK_FORMAT_A2R10G10B10_USCALED_PACK32]  = GEN6_FORMAT_B10G10R10A2_USCALED,
+    [VK_FORMAT_A2R10G10B10_SSCALED_PACK32]  = GEN6_FORMAT_B10G10R10A2_SSCALED,
+    [VK_FORMAT_A2R10G10B10_UINT_PACK32]     = GEN6_FORMAT_B10G10R10A2_UINT,
+    [VK_FORMAT_A2R10G10B10_SINT_PACK32]     = GEN6_FORMAT_B10G10R10A2_SINT,
+    [VK_FORMAT_A2B10G10R10_UNORM_PACK32]    = GEN6_FORMAT_R10G10B10A2_UNORM,
+    [VK_FORMAT_A2B10G10R10_SNORM_PACK32]    = GEN6_FORMAT_R10G10B10A2_SNORM,
+    [VK_FORMAT_A2B10G10R10_USCALED_PACK32]  = GEN6_FORMAT_R10G10B10A2_USCALED,
+    [VK_FORMAT_A2B10G10R10_SSCALED_PACK32]  = GEN6_FORMAT_R10G10B10A2_SSCALED,
+    [VK_FORMAT_A2B10G10R10_UINT_PACK32]     = GEN6_FORMAT_R10G10B10A2_UINT,
+    [VK_FORMAT_A2B10G10R10_SINT_PACK32]     = GEN6_FORMAT_R10G10B10A2_SINT,
+    [VK_FORMAT_R16_UNORM]            = GEN6_FORMAT_R16_UNORM,
+    [VK_FORMAT_R16_SNORM]            = GEN6_FORMAT_R16_SNORM,
+    [VK_FORMAT_R16_USCALED]          = GEN6_FORMAT_R16_USCALED,
+    [VK_FORMAT_R16_SSCALED]          = GEN6_FORMAT_R16_SSCALED,
+    [VK_FORMAT_R16_UINT]             = GEN6_FORMAT_R16_UINT,
+    [VK_FORMAT_R16_SINT]             = GEN6_FORMAT_R16_SINT,
+    [VK_FORMAT_R16_SFLOAT]           = GEN6_FORMAT_R16_FLOAT,
+    [VK_FORMAT_R16G16_UNORM]         = GEN6_FORMAT_R16G16_UNORM,
+    [VK_FORMAT_R16G16_SNORM]         = GEN6_FORMAT_R16G16_SNORM,
+    [VK_FORMAT_R16G16_USCALED]       = GEN6_FORMAT_R16G16_USCALED,
+    [VK_FORMAT_R16G16_SSCALED]       = GEN6_FORMAT_R16G16_SSCALED,
+    [VK_FORMAT_R16G16_UINT]          = GEN6_FORMAT_R16G16_UINT,
+    [VK_FORMAT_R16G16_SINT]          = GEN6_FORMAT_R16G16_SINT,
+    [VK_FORMAT_R16G16_SFLOAT]        = GEN6_FORMAT_R16G16_FLOAT,
+    [VK_FORMAT_R16G16B16_UNORM]      = GEN6_FORMAT_R16G16B16_UNORM,
+    [VK_FORMAT_R16G16B16_SNORM]      = GEN6_FORMAT_R16G16B16_SNORM,
+    [VK_FORMAT_R16G16B16_USCALED]    = GEN6_FORMAT_R16G16B16_USCALED,
+    [VK_FORMAT_R16G16B16_SSCALED]    = GEN6_FORMAT_R16G16B16_SSCALED,
+    [VK_FORMAT_R16G16B16_UINT]       = GEN6_FORMAT_R16G16B16_UINT,
+    [VK_FORMAT_R16G16B16_SINT]       = GEN6_FORMAT_R16G16B16_SINT,
+    [VK_FORMAT_R16G16B16_SFLOAT]     = 0,
+    [VK_FORMAT_R16G16B16A16_UNORM]   = GEN6_FORMAT_R16G16B16A16_UNORM,
+    [VK_FORMAT_R16G16B16A16_SNORM]   = GEN6_FORMAT_R16G16B16A16_SNORM,
+    [VK_FORMAT_R16G16B16A16_USCALED] = GEN6_FORMAT_R16G16B16A16_USCALED,
+    [VK_FORMAT_R16G16B16A16_SSCALED] = GEN6_FORMAT_R16G16B16A16_SSCALED,
+    [VK_FORMAT_R16G16B16A16_UINT]    = GEN6_FORMAT_R16G16B16A16_UINT,
+    [VK_FORMAT_R16G16B16A16_SINT]    = GEN6_FORMAT_R16G16B16A16_SINT,
+    [VK_FORMAT_R16G16B16A16_SFLOAT]  = GEN6_FORMAT_R16G16B16A16_FLOAT,
+    [VK_FORMAT_R32_UINT]             = GEN6_FORMAT_R32_UINT,
+    [VK_FORMAT_R32_SINT]             = GEN6_FORMAT_R32_SINT,
+    [VK_FORMAT_R32_SFLOAT]           = GEN6_FORMAT_R32_FLOAT,
+    [VK_FORMAT_R32G32_UINT]          = GEN6_FORMAT_R32G32_UINT,
+    [VK_FORMAT_R32G32_SINT]          = GEN6_FORMAT_R32G32_SINT,
+    [VK_FORMAT_R32G32_SFLOAT]        = GEN6_FORMAT_R32G32_FLOAT,
+    [VK_FORMAT_R32G32B32_UINT]       = GEN6_FORMAT_R32G32B32_UINT,
+    [VK_FORMAT_R32G32B32_SINT]       = GEN6_FORMAT_R32G32B32_SINT,
+    [VK_FORMAT_R32G32B32_SFLOAT]     = GEN6_FORMAT_R32G32B32_FLOAT,
+    [VK_FORMAT_R32G32B32A32_UINT]    = GEN6_FORMAT_R32G32B32A32_UINT,
+    [VK_FORMAT_R32G32B32A32_SINT]    = GEN6_FORMAT_R32G32B32A32_SINT,
+    [VK_FORMAT_R32G32B32A32_SFLOAT]  = GEN6_FORMAT_R32G32B32A32_FLOAT,
+    [VK_FORMAT_R64_UINT]             = 0,
+    [VK_FORMAT_R64_SINT]             = 0,
+    [VK_FORMAT_R64_SFLOAT]           = GEN6_FORMAT_R64_FLOAT,
+    [VK_FORMAT_R64G64_UINT]          = 0,
+    [VK_FORMAT_R64G64_SINT]          = 0,
+    [VK_FORMAT_R64G64_SFLOAT]        = GEN6_FORMAT_R64G64_FLOAT,
+    [VK_FORMAT_R64G64B64_UINT]       = 0,
+    [VK_FORMAT_R64G64B64_SINT]       = 0,
+    [VK_FORMAT_R64G64B64_SFLOAT]     = GEN6_FORMAT_R64G64B64_FLOAT,
+    [VK_FORMAT_R64G64B64A64_UINT]    = 0,
+    [VK_FORMAT_R64G64B64A64_SINT]    = 0,
+    [VK_FORMAT_R64G64B64A64_SFLOAT]  = GEN6_FORMAT_R64G64B64A64_FLOAT,
+    [VK_FORMAT_B10G11R11_UFLOAT_PACK32]     = GEN6_FORMAT_R11G11B10_FLOAT,
+    [VK_FORMAT_E5B9G9R9_UFLOAT_PACK32]      = GEN6_FORMAT_R9G9B9E5_SHAREDEXP,
+    [VK_FORMAT_BC1_RGB_UNORM_BLOCK]        = GEN6_FORMAT_DXT1_RGB,
+    [VK_FORMAT_BC1_RGB_SRGB_BLOCK]         = GEN6_FORMAT_DXT1_RGB_SRGB,
+    [VK_FORMAT_BC2_UNORM_BLOCK]            = GEN6_FORMAT_BC2_UNORM,
+    [VK_FORMAT_BC1_RGBA_UNORM_BLOCK]       = GEN6_FORMAT_BC1_UNORM,
+    [VK_FORMAT_BC1_RGBA_SRGB_BLOCK]        = GEN6_FORMAT_BC1_UNORM_SRGB,
+    [VK_FORMAT_BC2_SRGB_BLOCK]             = GEN6_FORMAT_BC2_UNORM_SRGB,
+    [VK_FORMAT_BC3_UNORM_BLOCK]            = GEN6_FORMAT_BC3_UNORM,
+    [VK_FORMAT_BC3_SRGB_BLOCK]             = GEN6_FORMAT_BC3_UNORM_SRGB,
+    [VK_FORMAT_BC4_UNORM_BLOCK]            = GEN6_FORMAT_BC4_UNORM,
+    [VK_FORMAT_BC4_SNORM_BLOCK]            = GEN6_FORMAT_BC4_SNORM,
+    [VK_FORMAT_BC5_UNORM_BLOCK]            = GEN6_FORMAT_BC5_UNORM,
+    [VK_FORMAT_BC5_SNORM_BLOCK]            = GEN6_FORMAT_BC5_SNORM,
+    [VK_FORMAT_BC6H_UFLOAT_BLOCK]          = GEN6_FORMAT_BC6H_UF16,
+    [VK_FORMAT_BC6H_SFLOAT_BLOCK]          = GEN6_FORMAT_BC6H_SF16,
+    [VK_FORMAT_BC7_UNORM_BLOCK]            = GEN6_FORMAT_BC7_UNORM,
+    [VK_FORMAT_BC7_SRGB_BLOCK]             = GEN6_FORMAT_BC7_UNORM_SRGB,
+    /* TODO: Implement for remaining compressed formats. */
+    [VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK]    = 0,
+    [VK_FORMAT_ETC2_R8G8B8A1_UNORM_BLOCK]  = 0,
+    [VK_FORMAT_ETC2_R8G8B8A8_UNORM_BLOCK]  = 0,
+    [VK_FORMAT_EAC_R11_UNORM_BLOCK]        = 0,
+    [VK_FORMAT_EAC_R11_SNORM_BLOCK]        = 0,
+    [VK_FORMAT_EAC_R11G11_UNORM_BLOCK]     = 0,
+    [VK_FORMAT_EAC_R11G11_SNORM_BLOCK]     = 0,
+    [VK_FORMAT_ASTC_4x4_UNORM_BLOCK]       = 0,
+    [VK_FORMAT_ASTC_4x4_SRGB_BLOCK]        = 0,
+    [VK_FORMAT_ASTC_5x4_UNORM_BLOCK]       = 0,
+    [VK_FORMAT_ASTC_5x4_SRGB_BLOCK]        = 0,
+    [VK_FORMAT_ASTC_5x5_UNORM_BLOCK]       = 0,
+    [VK_FORMAT_ASTC_5x5_SRGB_BLOCK]        = 0,
+    [VK_FORMAT_ASTC_6x5_UNORM_BLOCK]       = 0,
+    [VK_FORMAT_ASTC_6x5_SRGB_BLOCK]        = 0,
+    [VK_FORMAT_ASTC_6x6_UNORM_BLOCK]       = 0,
+    [VK_FORMAT_ASTC_6x6_SRGB_BLOCK]        = 0,
+    [VK_FORMAT_ASTC_8x5_UNORM_BLOCK]       = 0,
+    [VK_FORMAT_ASTC_8x5_SRGB_BLOCK]        = 0,
+    [VK_FORMAT_ASTC_8x6_UNORM_BLOCK]       = 0,
+    [VK_FORMAT_ASTC_8x6_SRGB_BLOCK]        = 0,
+    [VK_FORMAT_ASTC_8x8_UNORM_BLOCK]       = 0,
+    [VK_FORMAT_ASTC_8x8_SRGB_BLOCK]        = 0,
+    [VK_FORMAT_ASTC_10x5_UNORM_BLOCK]      = 0,
+    [VK_FORMAT_ASTC_10x5_SRGB_BLOCK]       = 0,
+    [VK_FORMAT_ASTC_10x6_UNORM_BLOCK]      = 0,
+    [VK_FORMAT_ASTC_10x6_SRGB_BLOCK]       = 0,
+    [VK_FORMAT_ASTC_10x8_UNORM_BLOCK]      = 0,
+    [VK_FORMAT_ASTC_10x8_SRGB_BLOCK]       = 0,
+    [VK_FORMAT_ASTC_10x10_UNORM_BLOCK]     = 0,
+    [VK_FORMAT_ASTC_10x10_SRGB_BLOCK]      = 0,
+    [VK_FORMAT_ASTC_12x10_UNORM_BLOCK]     = 0,
+    [VK_FORMAT_ASTC_12x10_SRGB_BLOCK]      = 0,
+    [VK_FORMAT_ASTC_12x12_UNORM_BLOCK]     = 0,
+    [VK_FORMAT_ASTC_12x12_SRGB_BLOCK]      = 0,
+};
+
+int intel_format_translate_color(const struct intel_gpu *gpu,
+                                 VkFormat format)
+{
+    int fmt;
+
+    assert(!icd_format_is_undef(format));
+    assert(!icd_format_is_ds(format));
+
+    fmt = intel_color_mapping[format];
+
+    /* GEN6_FORMAT_R32G32B32A32_FLOAT happens to be 0 */
+    if (format == VK_FORMAT_R32G32B32A32_SFLOAT)
+        assert(fmt == 0);
+    else if (!fmt)
+        fmt = -1;
+
+    return fmt;
+}
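+
+/* Usage sketch (illustrative, not part of the driver): because
+ * GEN6_FORMAT_R32G32B32A32_FLOAT is encoded as 0, a return value of 0 is a
+ * valid translation; only -1 means "no hardware encoding":
+ *
+ *     int fmt = intel_format_translate_color(gpu, VK_FORMAT_R8G8B8A8_UNORM);
+ *     if (fmt < 0)
+ *         ;  // format cannot be used as a color surface on this GPU
+ */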
+
+static VkFlags intel_format_get_color_features(const struct intel_gpu *gpu,
+                                               VkFormat format)
+{
+    const int fmt = intel_format_translate_color(gpu, format);
+    const struct intel_vf_cap *vf;
+    const struct intel_sampler_cap *sampler;
+    const struct intel_dp_cap *dp;
+    VkFlags features;
+
+    if (fmt < 0)
+        return 0;
+
+    sampler = (fmt < ARRAY_SIZE(intel_sampler_caps)) ?
+        &intel_sampler_caps[fmt] : NULL;
+    vf = (fmt < ARRAY_SIZE(intel_vf_caps)) ? &intel_vf_caps[fmt] : NULL;
+    dp = (fmt < ARRAY_SIZE(intel_dp_caps)) ? &intel_dp_caps[fmt] : NULL;
+
+    features = VK_FORMAT_FEATURE_STORAGE_IMAGE_BIT;
+
+#define TEST(gpu, func, cap) ((func) && (func)->cap && \
+        intel_gpu_gen(gpu) >= (func)->cap)
+    if (TEST(gpu, vf, vertex_element)) {
+        /* no feature bit to set */
+    }
+
+    if (TEST(gpu, sampler, sampling)) {
+        if (icd_format_is_int(format) ||
+            TEST(gpu, sampler, filtering))
+            features |= VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT | VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT;
+    }
+
+    if (TEST(gpu, dp, typed_write))
+        features |= VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT | VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT;
+
+    if (TEST(gpu, dp, rt_write)) {
+        features |= VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT;
+
+        if (TEST(gpu, dp, rt_write_blending))
+            features |= VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BLEND_BIT;
+
+        if (features & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT) {
+            features |= VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT | VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT;
+        }
+    }
+#undef TEST
+
+    return features;
+}
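+
+/* Expansion note (for reference): each *_cap table entry stores the first
+ * hardware generation that supports the capability, so
+ *
+ *     TEST(gpu, sampler, filtering)
+ *
+ * expands to
+ *
+ *     (sampler) && (sampler)->filtering &&
+ *         intel_gpu_gen(gpu) >= (sampler)->filtering
+ *
+ * i.e. the cap entry exists, is non-zero, and this GPU meets the minimum gen.
+ */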
+
+static VkFlags intel_format_get_ds_features(const struct intel_gpu *gpu,
+                                            VkFormat format)
+{
+    VkFlags features;
+
+    assert(icd_format_is_ds(format));
+
+    switch (format) {
+    case VK_FORMAT_S8_UINT:
+        features = VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT;
+        break;
+    case VK_FORMAT_D16_UNORM:
+    case VK_FORMAT_X8_D24_UNORM_PACK32:
+    case VK_FORMAT_D32_SFLOAT:
+        features = VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT;
+        break;
+    case VK_FORMAT_D16_UNORM_S8_UINT:
+    case VK_FORMAT_D24_UNORM_S8_UINT:
+    case VK_FORMAT_D32_SFLOAT_S8_UINT:
+        features = VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT;
+        break;
+    default:
+        features = 0;
+        break;
+    }
+
+    return features;
+}
+
+static VkFlags intel_format_get_raw_features(const struct intel_gpu *gpu,
+                                             VkFormat format)
+{
+    return (format == VK_FORMAT_UNDEFINED) ?
+        VK_FORMAT_FEATURE_STORAGE_IMAGE_BIT : 0;
+}
+
+// These format feature flags are supported by buffers.  The remaining flags
+// are supported by optimal- and linear-tiling images.
+static VkFlags bufferFormatsFlagMask =  VK_FORMAT_FEATURE_UNIFORM_TEXEL_BUFFER_BIT        |
+                                        VK_FORMAT_FEATURE_STORAGE_TEXEL_BUFFER_BIT        |
+                                        VK_FORMAT_FEATURE_STORAGE_TEXEL_BUFFER_ATOMIC_BIT |
+                                        VK_FORMAT_FEATURE_VERTEX_BUFFER_BIT;
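+
+/* Worked example (hypothetical feature set): if a color format reports
+ *
+ *     features = VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT |
+ *                VK_FORMAT_FEATURE_VERTEX_BUFFER_BIT;
+ *
+ * then (features & ~bufferFormatsFlagMask) keeps COLOR_ATTACHMENT_BIT for the
+ * linear/optimal tiling fields, while (features & bufferFormatsFlagMask)
+ * routes VERTEX_BUFFER_BIT to bufferFeatures.
+ */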
+
+static void intel_format_get_props(const struct intel_gpu *gpu,
+                                   VkFormat format,
+                                   VkFormatProperties *props)
+{
+    if (icd_format_is_undef(format)) {
+        props->linearTilingFeatures  = intel_format_get_raw_features(gpu, format);
+        props->optimalTilingFeatures = 0;
+        props->bufferFeatures        = 0;
+    } else if (icd_format_is_color(format)) {
+        VkFlags formatFlags = 0;
+        formatFlags = intel_format_get_color_features(gpu, format);
+        props->linearTilingFeatures  = formatFlags & ~bufferFormatsFlagMask;
+        props->optimalTilingFeatures = formatFlags & ~bufferFormatsFlagMask;
+        props->bufferFeatures        = formatFlags &  bufferFormatsFlagMask;
+    } else if (icd_format_is_ds(format)) {
+        props->linearTilingFeatures  = 0;
+        props->optimalTilingFeatures = intel_format_get_ds_features(gpu, format);
+        props->bufferFeatures        = 0;
+    } else {
+        props->linearTilingFeatures  = 0;
+        props->optimalTilingFeatures = 0;
+        props->bufferFeatures        = 0;
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetPhysicalDeviceFormatProperties(
+    VkPhysicalDevice                          physicalDevice,
+    VkFormat                                  format,
+    VkFormatProperties*                       pFormatInfo)
+{
+    const struct intel_gpu *gpu = intel_gpu(physicalDevice);
+
+    intel_format_get_props(gpu, format, pFormatInfo);
+}
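+
+/* Application-side usage sketch (standard Vulkan, not specific to this ICD):
+ *
+ *     VkFormatProperties props;
+ *     vkGetPhysicalDeviceFormatProperties(physicalDevice,
+ *                                         VK_FORMAT_R8G8B8A8_UNORM, &props);
+ *     if (props.optimalTilingFeatures & VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT)
+ *         ;  // usable as a color attachment with optimal tiling
+ */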
+
+// From the Ivy Bridge PRM, volume 1 part 1, page 105:
+//
+//     "In addition to restrictions on maximum height, width, and depth,
+//      surfaces are also restricted to a maximum size in bytes. This
+//      maximum is 2 GB for all products and all surface types."
+static const size_t intel_max_resource_size = 1u << 31;
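+/* Sanity note: 1u << 31 bytes == 2^31 == 2 GiB, matching the PRM quote. */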
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceImageFormatProperties(
+    VkPhysicalDevice                            physicalDevice,
+    VkFormat                                    format,
+    VkImageType                                 type,
+    VkImageTiling                               tiling,
+    VkImageUsageFlags                           usage,
+    VkImageCreateFlags                          flags,
+    VkImageFormatProperties*                    pImageFormatProperties)
+{
+    memset(pImageFormatProperties, 0, sizeof(*pImageFormatProperties));
+
+    // TODO: Add support for specific formats. For now, repeat the info from
+    //       the limits in gpu.c for ALL formats.
+    // Format-specific values may be a superset of the returned device limits.
+    pImageFormatProperties->maxExtent.width  = 8192;
+    pImageFormatProperties->maxExtent.height = 8192;
+    pImageFormatProperties->maxExtent.depth  = 8192;
+    pImageFormatProperties->maxArrayLayers   = 2048;
+    pImageFormatProperties->maxMipLevels     = 14;
+    pImageFormatProperties->sampleCounts     = VK_SAMPLE_COUNT_1_BIT | VK_SAMPLE_COUNT_2_BIT |
+                                               VK_SAMPLE_COUNT_4_BIT | VK_SAMPLE_COUNT_8_BIT;
+    pImageFormatProperties->maxResourceSize  = intel_max_resource_size;
+
+    return VK_SUCCESS;
+}
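+
+/* Application-side usage sketch (standard Vulkan, not specific to this ICD):
+ *
+ *     VkImageFormatProperties img_props;
+ *     VkResult res = vkGetPhysicalDeviceImageFormatProperties(
+ *         physicalDevice, VK_FORMAT_R8G8B8A8_UNORM, VK_IMAGE_TYPE_2D,
+ *         VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_SAMPLED_BIT, 0, &img_props);
+ *     // on VK_SUCCESS, img_props bounds what vkCreateImage will accept
+ */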
diff --git a/icd/intel/format.h b/icd/intel/format.h
new file mode 100644
index 0000000..85caa04
--- /dev/null
+++ b/icd/intel/format.h
@@ -0,0 +1,54 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Jeremy Hayes <jeremy@lunarg.com>
+ *
+ */
+
+#ifndef FORMAT_H
+#define FORMAT_H
+
+#include "intel.h"
+
+struct intel_gpu;
+
+static inline bool intel_format_has_depth(const struct intel_gpu *gpu,
+                                          VkFormat format)
+{
+    bool has_depth = false;
+
+    switch (format) {
+    case VK_FORMAT_D16_UNORM:
+    case VK_FORMAT_X8_D24_UNORM_PACK32:
+    case VK_FORMAT_D32_SFLOAT:
+    /* VK_FORMAT_D16_UNORM_S8_UINT is unsupported */
+    case VK_FORMAT_D24_UNORM_S8_UINT:
+    case VK_FORMAT_D32_SFLOAT_S8_UINT:
+        has_depth = true;
+        break;
+    default:
+        break;
+    }
+
+    return has_depth;
+}
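+
+/* Usage sketch (illustrative; `fmt` is a hypothetical VkFormat variable):
+ *
+ *     if (intel_format_has_depth(gpu, fmt))
+ *         ;  // take the depth/stencil surface path instead of the color path
+ */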
+
+int intel_format_translate_color(const struct intel_gpu *gpu,
+                                 VkFormat format);
+
+#endif /* FORMAT_H */
diff --git a/icd/intel/genhw/gen_blitter.xml.h b/icd/intel/genhw/gen_blitter.xml.h
new file mode 100644
index 0000000..6d58e02
--- /dev/null
+++ b/icd/intel/genhw/gen_blitter.xml.h
@@ -0,0 +1,129 @@
+#ifndef GEN_BLITTER_XML
+#define GEN_BLITTER_XML
+
+/* Autogenerated file, DO NOT EDIT manually!
+
+This file was generated by the rules-ng-ng headergen tool in this git repository:
+https://github.com/olvaffe/envytools/
+git clone https://github.com/olvaffe/envytools.git
+
+Copyright (C) 2014-2015 by the following authors:
+- Chia-I Wu <olvaffe@gmail.com> (olv)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice (including the
+next paragraph) shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+
+#define GEN6_BLITTER_TYPE__MASK					0xe0000000
+#define GEN6_BLITTER_TYPE__SHIFT				29
+#define GEN6_BLITTER_TYPE_BLITTER				(0x2 << 29)
+#define GEN6_BLITTER_OPCODE__MASK				0x1fc00000
+#define GEN6_BLITTER_OPCODE__SHIFT				22
+#define GEN6_BLITTER_OPCODE_COLOR_BLT				(0x40 << 22)
+#define GEN6_BLITTER_OPCODE_SRC_COPY_BLT			(0x43 << 22)
+#define GEN6_BLITTER_OPCODE_XY_COLOR_BLT			(0x50 << 22)
+#define GEN6_BLITTER_OPCODE_XY_SRC_COPY_BLT			(0x53 << 22)
+#define GEN6_BLITTER_BR00_WRITE_A				(0x1 << 21)
+#define GEN6_BLITTER_BR00_WRITE_RGB				(0x1 << 20)
+#define GEN6_BLITTER_BR00_SRC_TILED				(0x1 << 15)
+#define GEN6_BLITTER_BR00_DST_TILED				(0x1 << 11)
+#define GEN6_BLITTER_LENGTH__MASK				0x0000003f
+#define GEN6_BLITTER_LENGTH__SHIFT				0
+#define GEN6_BLITTER_BR13_DIR_RTL				(0x1 << 30)
+#define GEN6_BLITTER_BR13_CLIP_ENABLE				(0x1 << 30)
+#define GEN6_BLITTER_BR13_FORMAT__MASK				0x03000000
+#define GEN6_BLITTER_BR13_FORMAT__SHIFT				24
+#define GEN6_BLITTER_BR13_FORMAT_8				(0x0 << 24)
+#define GEN6_BLITTER_BR13_FORMAT_565				(0x1 << 24)
+#define GEN6_BLITTER_BR13_FORMAT_1555				(0x2 << 24)
+#define GEN6_BLITTER_BR13_FORMAT_8888				(0x3 << 24)
+#define GEN6_BLITTER_BR13_ROP__MASK				0x00ff0000
+#define GEN6_BLITTER_BR13_ROP__SHIFT				16
+#define GEN6_BLITTER_BR13_ROP_SRCCOPY				(0xcc << 16)
+#define GEN6_BLITTER_BR13_ROP_PATCOPY				(0xf0 << 16)
+#define GEN6_BLITTER_BR13_DST_PITCH__MASK			0x0000ffff
+#define GEN6_BLITTER_BR13_DST_PITCH__SHIFT			0
+#define GEN6_BLITTER_BR11_SRC_PITCH__MASK			0x0000ffff
+#define GEN6_BLITTER_BR11_SRC_PITCH__SHIFT			0
+#define GEN6_BLITTER_BR14_DST_HEIGHT__MASK			0xffff0000
+#define GEN6_BLITTER_BR14_DST_HEIGHT__SHIFT			16
+#define GEN6_BLITTER_BR14_DST_WIDTH__MASK			0x0000ffff
+#define GEN6_BLITTER_BR14_DST_WIDTH__SHIFT			0
+#define GEN6_BLITTER_BR22_DST_Y1__MASK				0xffff0000
+#define GEN6_BLITTER_BR22_DST_Y1__SHIFT				16
+#define GEN6_BLITTER_BR22_DST_X1__MASK				0x0000ffff
+#define GEN6_BLITTER_BR22_DST_X1__SHIFT				0
+#define GEN6_BLITTER_BR23_DST_Y2__MASK				0xffff0000
+#define GEN6_BLITTER_BR23_DST_Y2__SHIFT				16
+#define GEN6_BLITTER_BR23_DST_X2__MASK				0x0000ffff
+#define GEN6_BLITTER_BR23_DST_X2__SHIFT				0
+#define GEN6_BLITTER_BR26_SRC_Y1__MASK				0xffff0000
+#define GEN6_BLITTER_BR26_SRC_Y1__SHIFT				16
+#define GEN6_BLITTER_BR26_SRC_X1__MASK				0x0000ffff
+#define GEN6_BLITTER_BR26_SRC_X1__SHIFT				0
+#define GEN6_COLOR_BLT__SIZE					6
+
+
+
+
+
+
+
+
+#define GEN6_SRC_COPY_BLT__SIZE					8
+
+
+
+
+
+
+
+
+
+
+
+#define GEN6_XY_COLOR_BLT__SIZE					7
+
+
+
+
+
+
+
+
+
+#define GEN6_XY_SRC_COPY_BLT__SIZE				10
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+#endif /* GEN_BLITTER_XML */
diff --git a/icd/intel/genhw/gen_eu_isa.xml.h b/icd/intel/genhw/gen_eu_isa.xml.h
new file mode 100644
index 0000000..f91fa5f
--- /dev/null
+++ b/icd/intel/genhw/gen_eu_isa.xml.h
@@ -0,0 +1,563 @@
+#ifndef GEN_EU_ISA_XML
+#define GEN_EU_ISA_XML
+
+/* Autogenerated file, DO NOT EDIT manually!
+
+This file was generated by the rules-ng-ng headergen tool in this git repository:
+https://github.com/olvaffe/envytools/
+git clone https://github.com/olvaffe/envytools.git
+
+Copyright (C) 2014-2015 by the following authors:
+- Chia-I Wu <olvaffe@gmail.com> (olv)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice (including the
+next paragraph) shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+
+enum gen_eu_opcode {
+    GEN6_OPCODE_ILLEGAL					      = 0x0,
+    GEN6_OPCODE_MOV					      = 0x1,
+    GEN6_OPCODE_SEL					      = 0x2,
+    GEN6_OPCODE_MOVI					      = 0x3,
+    GEN6_OPCODE_NOT					      = 0x4,
+    GEN6_OPCODE_AND					      = 0x5,
+    GEN6_OPCODE_OR					      = 0x6,
+    GEN6_OPCODE_XOR					      = 0x7,
+    GEN6_OPCODE_SHR					      = 0x8,
+    GEN6_OPCODE_SHL					      = 0x9,
+    GEN6_OPCODE_DIM					      = 0xa,
+    GEN6_OPCODE_ASR					      = 0xc,
+    GEN6_OPCODE_CMP					      = 0x10,
+    GEN6_OPCODE_CMPN					      = 0x11,
+    GEN7_OPCODE_CSEL					      = 0x12,
+    GEN7_OPCODE_F32TO16					      = 0x13,
+    GEN7_OPCODE_F16TO32					      = 0x14,
+    GEN7_OPCODE_BFREV					      = 0x17,
+    GEN7_OPCODE_BFE					      = 0x18,
+    GEN7_OPCODE_BFI1					      = 0x19,
+    GEN7_OPCODE_BFI2					      = 0x1a,
+    GEN6_OPCODE_JMPI					      = 0x20,
+    GEN7_OPCODE_BRD					      = 0x21,
+    GEN6_OPCODE_IF					      = 0x22,
+    GEN7_OPCODE_BRC					      = 0x23,
+    GEN6_OPCODE_ELSE					      = 0x24,
+    GEN6_OPCODE_ENDIF					      = 0x25,
+    GEN6_OPCODE_CASE					      = 0x26,
+    GEN6_OPCODE_WHILE					      = 0x27,
+    GEN6_OPCODE_BREAK					      = 0x28,
+    GEN6_OPCODE_CONT					      = 0x29,
+    GEN6_OPCODE_HALT					      = 0x2a,
+    GEN75_OPCODE_CALLA					      = 0x2b,
+    GEN6_OPCODE_CALL					      = 0x2c,
+    GEN6_OPCODE_RETURN					      = 0x2d,
+    GEN8_OPCODE_GOTO					      = 0x2e,
+    GEN6_OPCODE_WAIT					      = 0x30,
+    GEN6_OPCODE_SEND					      = 0x31,
+    GEN6_OPCODE_SENDC					      = 0x32,
+    GEN6_OPCODE_MATH					      = 0x38,
+    GEN6_OPCODE_ADD					      = 0x40,
+    GEN6_OPCODE_MUL					      = 0x41,
+    GEN6_OPCODE_AVG					      = 0x42,
+    GEN6_OPCODE_FRC					      = 0x43,
+    GEN6_OPCODE_RNDU					      = 0x44,
+    GEN6_OPCODE_RNDD					      = 0x45,
+    GEN6_OPCODE_RNDE					      = 0x46,
+    GEN6_OPCODE_RNDZ					      = 0x47,
+    GEN6_OPCODE_MAC					      = 0x48,
+    GEN6_OPCODE_MACH					      = 0x49,
+    GEN6_OPCODE_LZD					      = 0x4a,
+    GEN7_OPCODE_FBH					      = 0x4b,
+    GEN7_OPCODE_FBL					      = 0x4c,
+    GEN7_OPCODE_CBIT					      = 0x4d,
+    GEN7_OPCODE_ADDC					      = 0x4e,
+    GEN7_OPCODE_SUBB					      = 0x4f,
+    GEN6_OPCODE_SAD2					      = 0x50,
+    GEN6_OPCODE_SADA2					      = 0x51,
+    GEN6_OPCODE_DP4					      = 0x54,
+    GEN6_OPCODE_DPH					      = 0x55,
+    GEN6_OPCODE_DP3					      = 0x56,
+    GEN6_OPCODE_DP2					      = 0x57,
+    GEN6_OPCODE_LINE					      = 0x59,
+    GEN6_OPCODE_PLN					      = 0x5a,
+    GEN6_OPCODE_MAD					      = 0x5b,
+    GEN6_OPCODE_LRP					      = 0x5c,
+    GEN6_OPCODE_NOP					      = 0x7e,
+};
+
+enum gen_eu_access_mode {
+    GEN6_ALIGN_1					      = 0x0,
+    GEN6_ALIGN_16					      = 0x1,
+};
+
+enum gen_eu_mask_control {
+    GEN6_MASKCTRL_NORMAL				      = 0x0,
+    GEN6_MASKCTRL_NOMASK				      = 0x1,
+};
+
+enum gen_eu_dependency_control {
+    GEN6_DEPCTRL_NORMAL					      = 0x0,
+    GEN6_DEPCTRL_NODDCLR				      = 0x1,
+    GEN6_DEPCTRL_NODDCHK				      = 0x2,
+    GEN6_DEPCTRL_NEITHER				      = 0x3,
+};
+
+enum gen_eu_quarter_control {
+    GEN6_QTRCTRL_1Q					      = 0x0,
+    GEN6_QTRCTRL_2Q					      = 0x1,
+    GEN6_QTRCTRL_3Q					      = 0x2,
+    GEN6_QTRCTRL_4Q					      = 0x3,
+    GEN6_QTRCTRL_1H					      = 0x0,
+    GEN6_QTRCTRL_2H					      = 0x2,
+};
+
+enum gen_eu_thread_control {
+    GEN6_THREADCTRL_NORMAL				      = 0x0,
+    GEN6_THREADCTRL_ATOMIC				      = 0x1,
+    GEN6_THREADCTRL_SWITCH				      = 0x2,
+};
+
+enum gen_eu_predicate_control {
+    GEN6_PREDCTRL_NONE					      = 0x0,
+    GEN6_PREDCTRL_NORMAL				      = 0x1,
+    GEN6_PREDCTRL_ANYV					      = 0x2,
+    GEN6_PREDCTRL_ALLV					      = 0x3,
+    GEN6_PREDCTRL_ANY2H					      = 0x4,
+    GEN6_PREDCTRL_ALL2H					      = 0x5,
+    GEN6_PREDCTRL_X					      = 0x2,
+    GEN6_PREDCTRL_Y					      = 0x3,
+    GEN6_PREDCTRL_Z					      = 0x4,
+    GEN6_PREDCTRL_W					      = 0x5,
+    GEN6_PREDCTRL_ANY4H					      = 0x6,
+    GEN6_PREDCTRL_ALL4H					      = 0x7,
+    GEN6_PREDCTRL_ANY8H					      = 0x8,
+    GEN6_PREDCTRL_ALL8H					      = 0x9,
+    GEN6_PREDCTRL_ANY16H				      = 0xa,
+    GEN6_PREDCTRL_ALL16H				      = 0xb,
+    GEN7_PREDCTRL_ANY32H				      = 0xc,
+    GEN7_PREDCTRL_ALL32H				      = 0xd,
+};
+
+enum gen_eu_exec_size {
+    GEN6_EXECSIZE_1					      = 0x0,
+    GEN6_EXECSIZE_2					      = 0x1,
+    GEN6_EXECSIZE_4					      = 0x2,
+    GEN6_EXECSIZE_8					      = 0x3,
+    GEN6_EXECSIZE_16					      = 0x4,
+    GEN6_EXECSIZE_32					      = 0x5,
+};
+
+enum gen_eu_condition_modifier {
+    GEN6_COND_NONE					      = 0x0,
+    GEN6_COND_Z						      = 0x1,
+    GEN6_COND_NZ					      = 0x2,
+    GEN6_COND_G						      = 0x3,
+    GEN6_COND_GE					      = 0x4,
+    GEN6_COND_L						      = 0x5,
+    GEN6_COND_LE					      = 0x6,
+    GEN6_COND_O						      = 0x8,
+    GEN6_COND_U						      = 0x9,
+};
+
+enum gen_eu_math_function_control {
+    GEN6_MATH_INV					      = 0x1,
+    GEN6_MATH_LOG					      = 0x2,
+    GEN6_MATH_EXP					      = 0x3,
+    GEN6_MATH_SQRT					      = 0x4,
+    GEN6_MATH_RSQ					      = 0x5,
+    GEN6_MATH_SIN					      = 0x6,
+    GEN6_MATH_COS					      = 0x7,
+    GEN6_MATH_FDIV					      = 0x9,
+    GEN6_MATH_POW					      = 0xa,
+    GEN6_MATH_INT_DIV					      = 0xb,
+    GEN6_MATH_INT_DIV_QUOTIENT				      = 0xc,
+    GEN6_MATH_INT_DIV_REMAINDER				      = 0xd,
+    GEN8_MATH_INVM					      = 0xe,
+    GEN8_MATH_RSQRTM					      = 0xf,
+};
+
+enum gen_eu_shared_function_id {
+    GEN6_SFID_NULL					      = 0x0,
+    GEN6_SFID_SAMPLER					      = 0x2,
+    GEN6_SFID_GATEWAY					      = 0x3,
+    GEN6_SFID_DP_SAMPLER				      = 0x4,
+    GEN6_SFID_DP_RC					      = 0x5,
+    GEN6_SFID_URB					      = 0x6,
+    GEN6_SFID_SPAWNER					      = 0x7,
+    GEN6_SFID_VME					      = 0x8,
+    GEN6_SFID_DP_CC					      = 0x9,
+    GEN7_SFID_DP_DC0					      = 0xa,
+    GEN7_SFID_PI					      = 0xb,
+    GEN75_SFID_DP_DC1					      = 0xc,
+};
+
+enum gen_eu_reg_file {
+    GEN6_FILE_ARF					      = 0x0,
+    GEN6_FILE_GRF					      = 0x1,
+    GEN6_FILE_MRF					      = 0x2,
+    GEN6_FILE_IMM					      = 0x3,
+};
+
+enum gen_eu_reg_type {
+    GEN6_TYPE_UD					      = 0x0,
+    GEN6_TYPE_D						      = 0x1,
+    GEN6_TYPE_UW					      = 0x2,
+    GEN6_TYPE_W						      = 0x3,
+    GEN6_TYPE_UB					      = 0x4,
+    GEN6_TYPE_B						      = 0x5,
+    GEN7_TYPE_DF					      = 0x6,
+    GEN6_TYPE_F						      = 0x7,
+    GEN8_TYPE_UQ					      = 0x8,
+    GEN8_TYPE_Q						      = 0x9,
+    GEN8_TYPE_HF					      = 0xa,
+    GEN6_TYPE_UV_IMM					      = 0x4,
+    GEN6_TYPE_VF_IMM					      = 0x5,
+    GEN6_TYPE_V_IMM					      = 0x6,
+    GEN8_TYPE_DF_IMM					      = 0xa,
+    GEN8_TYPE_HF_IMM					      = 0xb,
+    GEN7_TYPE_F_3SRC					      = 0x0,
+    GEN7_TYPE_D_3SRC					      = 0x1,
+    GEN7_TYPE_UD_3SRC					      = 0x2,
+    GEN7_TYPE_DF_3SRC					      = 0x3,
+};
+
+enum gen_eu_vertical_stride {
+    GEN6_VERTSTRIDE_0					      = 0x0,
+    GEN6_VERTSTRIDE_1					      = 0x1,
+    GEN6_VERTSTRIDE_2					      = 0x2,
+    GEN6_VERTSTRIDE_4					      = 0x3,
+    GEN6_VERTSTRIDE_8					      = 0x4,
+    GEN6_VERTSTRIDE_16					      = 0x5,
+    GEN6_VERTSTRIDE_32					      = 0x6,
+    GEN6_VERTSTRIDE_VXH					      = 0xf,
+};
+
+enum gen_eu_width {
+    GEN6_WIDTH_1					      = 0x0,
+    GEN6_WIDTH_2					      = 0x1,
+    GEN6_WIDTH_4					      = 0x2,
+    GEN6_WIDTH_8					      = 0x3,
+    GEN6_WIDTH_16					      = 0x4,
+};
+
+enum gen_eu_horizontal_stride {
+    GEN6_HORZSTRIDE_0					      = 0x0,
+    GEN6_HORZSTRIDE_1					      = 0x1,
+    GEN6_HORZSTRIDE_2					      = 0x2,
+    GEN6_HORZSTRIDE_4					      = 0x3,
+};
+
+enum gen_eu_addressing_mode {
+    GEN6_ADDRMODE_DIRECT				      = 0x0,
+    GEN6_ADDRMODE_INDIRECT				      = 0x1,
+};
+
+enum gen_eu_swizzle {
+    GEN6_SWIZZLE_X					      = 0x0,
+    GEN6_SWIZZLE_Y					      = 0x1,
+    GEN6_SWIZZLE_Z					      = 0x2,
+    GEN6_SWIZZLE_W					      = 0x3,
+};
+
+enum gen_eu_arf_reg {
+    GEN6_ARF_NULL					      = 0x0,
+    GEN6_ARF_A0						      = 0x10,
+    GEN6_ARF_ACC0					      = 0x20,
+    GEN6_ARF_F0						      = 0x30,
+    GEN6_ARF_SR0					      = 0x70,
+    GEN6_ARF_CR0					      = 0x80,
+    GEN6_ARF_N0						      = 0x90,
+    GEN6_ARF_IP						      = 0xa0,
+    GEN6_ARF_TDR					      = 0xb0,
+    GEN7_ARF_TM0					      = 0xc0,
+};
+
+#define GEN6_INST_SATURATE					(0x1 << 31)
+#define GEN6_INST_DEBUGCTRL					(0x1 << 30)
+#define GEN6_INST_CMPTCTRL					(0x1 << 29)
+#define GEN8_INST_BRANCHCTRL					(0x1 << 28)
+#define GEN6_INST_ACCWRCTRL					(0x1 << 28)
+#define GEN6_INST_CONDMODIFIER__MASK				0x0f000000
+#define GEN6_INST_CONDMODIFIER__SHIFT				24
+#define GEN6_INST_SFID__MASK					0x0f000000
+#define GEN6_INST_SFID__SHIFT					24
+#define GEN6_INST_FC__MASK					0x0f000000
+#define GEN6_INST_FC__SHIFT					24
+#define GEN6_INST_EXECSIZE__MASK				0x00e00000
+#define GEN6_INST_EXECSIZE__SHIFT				21
+#define GEN6_INST_PREDINV					(0x1 << 20)
+#define GEN6_INST_PREDCTRL__MASK				0x000f0000
+#define GEN6_INST_PREDCTRL__SHIFT				16
+#define GEN6_INST_THREADCTRL__MASK				0x0000c000
+#define GEN6_INST_THREADCTRL__SHIFT				14
+#define GEN6_INST_QTRCTRL__MASK					0x00003000
+#define GEN6_INST_QTRCTRL__SHIFT				12
+#define GEN6_INST_DEPCTRL__MASK					0x00000c00
+#define GEN6_INST_DEPCTRL__SHIFT				10
+#define GEN6_INST_MASKCTRL__MASK				0x00000200
+#define GEN6_INST_MASKCTRL__SHIFT				9
+#define GEN8_INST_NIBCTRL					(0x1 << 11)
+#define GEN8_INST_DEPCTRL__MASK					0x00000600
+#define GEN8_INST_DEPCTRL__SHIFT				9
+#define GEN6_INST_ACCESSMODE__MASK				0x00000100
+#define GEN6_INST_ACCESSMODE__SHIFT				8
+#define GEN6_INST_OPCODE__MASK					0x0000007f
+#define GEN6_INST_OPCODE__SHIFT					0
+#define GEN6_INST_DST_ADDRMODE__MASK				0x80000000
+#define GEN6_INST_DST_ADDRMODE__SHIFT				31
+#define GEN6_INST_DST_HORZSTRIDE__MASK				0x60000000
+#define GEN6_INST_DST_HORZSTRIDE__SHIFT				29
+#define GEN6_INST_DST_REG__MASK					0x1fe00000
+#define GEN6_INST_DST_REG__SHIFT				21
+#define GEN6_INST_DST_SUBREG__MASK				0x001f0000
+#define GEN6_INST_DST_SUBREG__SHIFT				16
+#define GEN6_INST_DST_ADDR_SUBREG__MASK				0x1c000000
+#define GEN6_INST_DST_ADDR_SUBREG__SHIFT			26
+#define GEN6_INST_DST_ADDR_IMM__MASK				0x03ff0000
+#define GEN6_INST_DST_ADDR_IMM__SHIFT				16
+#define GEN8_INST_DST_ADDR_SUBREG__MASK				0x1e000000
+#define GEN8_INST_DST_ADDR_SUBREG__SHIFT			25
+#define GEN8_INST_DST_ADDR_IMM__MASK				0x01ff0000
+#define GEN8_INST_DST_ADDR_IMM__SHIFT				16
+#define GEN6_INST_DST_SUBREG_ALIGN16__MASK			0x00100000
+#define GEN6_INST_DST_SUBREG_ALIGN16__SHIFT			20
+#define GEN6_INST_DST_SUBREG_ALIGN16__SHR			4
+#define GEN6_INST_DST_ADDR_IMM_ALIGN16__MASK			0x03f00000
+#define GEN6_INST_DST_ADDR_IMM_ALIGN16__SHIFT			20
+#define GEN6_INST_DST_ADDR_IMM_ALIGN16__SHR			4
+#define GEN8_INST_DST_ADDR_IMM_ALIGN16__MASK			0x01f00000
+#define GEN8_INST_DST_ADDR_IMM_ALIGN16__SHIFT			20
+#define GEN8_INST_DST_ADDR_IMM_ALIGN16__SHR			4
+#define GEN6_INST_DST_WRITEMASK__MASK				0x000f0000
+#define GEN6_INST_DST_WRITEMASK__SHIFT				16
+#define GEN7_INST_NIBCTRL					(0x1 << 15)
+#define GEN6_INST_SRC1_TYPE__MASK				0x00007000
+#define GEN6_INST_SRC1_TYPE__SHIFT				12
+#define GEN6_INST_SRC1_FILE__MASK				0x00000c00
+#define GEN6_INST_SRC1_FILE__SHIFT				10
+#define GEN6_INST_SRC0_TYPE__MASK				0x00000380
+#define GEN6_INST_SRC0_TYPE__SHIFT				7
+#define GEN6_INST_SRC0_FILE__MASK				0x00000060
+#define GEN6_INST_SRC0_FILE__SHIFT				5
+#define GEN6_INST_DST_TYPE__MASK				0x0000001c
+#define GEN6_INST_DST_TYPE__SHIFT				2
+#define GEN6_INST_DST_FILE__MASK				0x00000003
+#define GEN6_INST_DST_FILE__SHIFT				0
+#define GEN8_INST_DST_ADDR_IMM_BIT9__MASK			0x00008000
+#define GEN8_INST_DST_ADDR_IMM_BIT9__SHIFT			15
+#define GEN8_INST_DST_ADDR_IMM_BIT9__SHR			9
+#define GEN8_INST_SRC0_TYPE__MASK				0x00007800
+#define GEN8_INST_SRC0_TYPE__SHIFT				11
+#define GEN8_INST_SRC0_FILE__MASK				0x00000600
+#define GEN8_INST_SRC0_FILE__SHIFT				9
+#define GEN8_INST_DST_TYPE__MASK				0x000001e0
+#define GEN8_INST_DST_TYPE__SHIFT				5
+#define GEN8_INST_DST_FILE__MASK				0x00000018
+#define GEN8_INST_DST_FILE__SHIFT				3
+#define GEN8_INST_MASKCTRL__MASK				0x00000004
+#define GEN8_INST_MASKCTRL__SHIFT				2
+#define GEN8_INST_FLAG_REG__MASK				0x00000002
+#define GEN8_INST_FLAG_REG__SHIFT				1
+#define GEN8_INST_FLAG_SUBREG__MASK				0x00000001
+#define GEN8_INST_FLAG_SUBREG__SHIFT				0
+#define GEN7_INST_FLAG_REG__MASK				0x04000000
+#define GEN7_INST_FLAG_REG__SHIFT				26
+#define GEN6_INST_FLAG_SUBREG__MASK				0x02000000
+#define GEN6_INST_FLAG_SUBREG__SHIFT				25
+#define GEN8_INST_SRC0_ADDR_IMM_BIT9__MASK			0x80000000
+#define GEN8_INST_SRC0_ADDR_IMM_BIT9__SHIFT			31
+#define GEN8_INST_SRC0_ADDR_IMM_BIT9__SHR			9
+#define GEN8_INST_SRC1_TYPE__MASK				0x78000000
+#define GEN8_INST_SRC1_TYPE__SHIFT				27
+#define GEN8_INST_SRC1_FILE__MASK				0x06000000
+#define GEN8_INST_SRC1_FILE__SHIFT				25
+#define GEN8_INST_SRC1_ADDR_IMM_BIT9__MASK			0x02000000
+#define GEN8_INST_SRC1_ADDR_IMM_BIT9__SHIFT			25
+#define GEN8_INST_SRC1_ADDR_IMM_BIT9__SHR			9
+#define GEN6_INST_SRC_VERTSTRIDE__MASK				0x01e00000
+#define GEN6_INST_SRC_VERTSTRIDE__SHIFT				21
+#define GEN6_INST_SRC_WIDTH__MASK				0x001c0000
+#define GEN6_INST_SRC_WIDTH__SHIFT				18
+#define GEN6_INST_SRC_HORZSTRIDE__MASK				0x00030000
+#define GEN6_INST_SRC_HORZSTRIDE__SHIFT				16
+#define GEN6_INST_SRC_SWIZZLE_W__MASK				0x000c0000
+#define GEN6_INST_SRC_SWIZZLE_W__SHIFT				18
+#define GEN6_INST_SRC_SWIZZLE_Z__MASK				0x00030000
+#define GEN6_INST_SRC_SWIZZLE_Z__SHIFT				16
+#define GEN6_INST_SRC_ADDRMODE__MASK				0x00008000
+#define GEN6_INST_SRC_ADDRMODE__SHIFT				15
+#define GEN6_INST_SRC_NEGATE					(0x1 << 14)
+#define GEN6_INST_SRC_ABSOLUTE					(0x1 << 13)
+#define GEN6_INST_SRC_REG__MASK					0x00001fe0
+#define GEN6_INST_SRC_REG__SHIFT				5
+#define GEN6_INST_SRC_SUBREG__MASK				0x0000001f
+#define GEN6_INST_SRC_SUBREG__SHIFT				0
+#define GEN6_INST_SRC_ADDR_SUBREG__MASK				0x00001c00
+#define GEN6_INST_SRC_ADDR_SUBREG__SHIFT			10
+#define GEN6_INST_SRC_ADDR_IMM__MASK				0x000003ff
+#define GEN6_INST_SRC_ADDR_IMM__SHIFT				0
+#define GEN8_INST_SRC_ADDR_SUBREG__MASK				0x00001e00
+#define GEN8_INST_SRC_ADDR_SUBREG__SHIFT			9
+#define GEN8_INST_SRC_ADDR_IMM__MASK				0x000001ff
+#define GEN8_INST_SRC_ADDR_IMM__SHIFT				0
+#define GEN6_INST_SRC_SUBREG_ALIGN16__MASK			0x00000010
+#define GEN6_INST_SRC_SUBREG_ALIGN16__SHIFT			4
+#define GEN6_INST_SRC_SUBREG_ALIGN16__SHR			4
+#define GEN6_INST_SRC_ADDR_IMM_ALIGN16__MASK			0x000003f0
+#define GEN6_INST_SRC_ADDR_IMM_ALIGN16__SHIFT			4
+#define GEN6_INST_SRC_ADDR_IMM_ALIGN16__SHR			4
+#define GEN8_INST_SRC_ADDR_IMM_ALIGN16__MASK			0x000001f0
+#define GEN8_INST_SRC_ADDR_IMM_ALIGN16__SHIFT			4
+#define GEN8_INST_SRC_ADDR_IMM_ALIGN16__SHR			4
+#define GEN6_INST_SRC_SWIZZLE_Y__MASK				0x0000000c
+#define GEN6_INST_SRC_SWIZZLE_Y__SHIFT				2
+#define GEN6_INST_SRC_SWIZZLE_X__MASK				0x00000003
+#define GEN6_INST_SRC_SWIZZLE_X__SHIFT				0
+#define GEN6_3SRC_DST_REG__MASK					0xff000000
+#define GEN6_3SRC_DST_REG__SHIFT				24
+#define GEN6_3SRC_DST_SUBREG__MASK				0x00e00000
+#define GEN6_3SRC_DST_SUBREG__SHIFT				21
+#define GEN6_3SRC_DST_SUBREG__SHR				2
+#define GEN6_3SRC_DST_WRITEMASK__MASK				0x001e0000
+#define GEN6_3SRC_DST_WRITEMASK__SHIFT				17
+#define GEN7_3SRC_NIBCTRL					(0x1 << 15)
+#define GEN7_3SRC_DST_TYPE__MASK				0x00003000
+#define GEN7_3SRC_DST_TYPE__SHIFT				12
+#define GEN7_3SRC_SRC_TYPE__MASK				0x00000c00
+#define GEN7_3SRC_SRC_TYPE__SHIFT				10
+#define GEN6_3SRC_SRC2_NEGATE					(0x1 << 9)
+#define GEN6_3SRC_SRC2_ABSOLUTE					(0x1 << 8)
+#define GEN6_3SRC_SRC1_NEGATE					(0x1 << 7)
+#define GEN6_3SRC_SRC1_ABSOLUTE					(0x1 << 6)
+#define GEN6_3SRC_SRC0_NEGATE					(0x1 << 5)
+#define GEN6_3SRC_SRC0_ABSOLUTE					(0x1 << 4)
+#define GEN7_3SRC_FLAG_REG__MASK				0x00000004
+#define GEN7_3SRC_FLAG_REG__SHIFT				2
+#define GEN6_3SRC_FLAG_SUBREG__MASK				0x00000002
+#define GEN6_3SRC_FLAG_SUBREG__SHIFT				1
+#define GEN6_3SRC_DST_FILE_MRF					(0x1 << 0)
+#define GEN8_3SRC_DST_TYPE__MASK				0x0001c000
+#define GEN8_3SRC_DST_TYPE__SHIFT				14
+#define GEN8_3SRC_SRC_TYPE__MASK				0x00003800
+#define GEN8_3SRC_SRC_TYPE__SHIFT				11
+#define GEN8_3SRC_SRC2_NEGATE					(0x1 << 10)
+#define GEN8_3SRC_SRC2_ABSOLUTE					(0x1 << 9)
+#define GEN8_3SRC_SRC1_NEGATE					(0x1 << 8)
+#define GEN8_3SRC_SRC1_ABSOLUTE					(0x1 << 7)
+#define GEN8_3SRC_SRC0_NEGATE					(0x1 << 6)
+#define GEN8_3SRC_SRC0_ABSOLUTE					(0x1 << 5)
+#define GEN8_3SRC_MASKCTRL__MASK				0x00000004
+#define GEN8_3SRC_MASKCTRL__SHIFT				2
+#define GEN8_3SRC_FLAG_REG__MASK				0x00000002
+#define GEN8_3SRC_FLAG_REG__SHIFT				1
+#define GEN8_3SRC_FLAG_SUBREG__MASK				0x00000001
+#define GEN8_3SRC_FLAG_SUBREG__SHIFT				0
+#define GEN6_3SRC_SRC_REG__MASK					0x000ff000
+#define GEN6_3SRC_SRC_REG__SHIFT				12
+#define GEN6_3SRC_SRC_SUBREG__MASK				0x00000e00
+#define GEN6_3SRC_SRC_SUBREG__SHIFT				9
+#define GEN6_3SRC_SRC_SUBREG__SHR				2
+#define GEN6_3SRC_SRC_SWIZZLE_W__MASK				0x00000180
+#define GEN6_3SRC_SRC_SWIZZLE_W__SHIFT				7
+#define GEN6_3SRC_SRC_SWIZZLE_Z__MASK				0x00000060
+#define GEN6_3SRC_SRC_SWIZZLE_Z__SHIFT				5
+#define GEN6_3SRC_SRC_SWIZZLE_Y__MASK				0x00000018
+#define GEN6_3SRC_SRC_SWIZZLE_Y__SHIFT				3
+#define GEN6_3SRC_SRC_SWIZZLE_X__MASK				0x00000006
+#define GEN6_3SRC_SRC_SWIZZLE_X__SHIFT				1
+#define GEN6_3SRC_SRC_REPCTRL					(0x1 << 0)
+#define GEN6_COMPACT_SRC1_REG__MASK			0xff00000000000000ULL
+#define GEN6_COMPACT_SRC1_REG__SHIFT				56
+#define GEN6_COMPACT_SRC0_REG__MASK			0x00ff000000000000ULL
+#define GEN6_COMPACT_SRC0_REG__SHIFT				48
+#define GEN6_COMPACT_DST_REG__MASK			0x0000ff0000000000ULL
+#define GEN6_COMPACT_DST_REG__SHIFT				40
+#define GEN6_COMPACT_SRC1_INDEX__MASK			0x000000f800000000ULL
+#define GEN6_COMPACT_SRC1_INDEX__SHIFT				35
+#define GEN6_COMPACT_SRC0_INDEX__MASK			0x00000007c0000000ULL
+#define GEN6_COMPACT_SRC0_INDEX__SHIFT				30
+#define GEN6_COMPACT_CMPTCTRL					(0x1 << 29)
+#define GEN6_COMPACT_FLAG_SUBREG__MASK				0x10000000
+#define GEN6_COMPACT_FLAG_SUBREG__SHIFT				28
+#define GEN6_COMPACT_CONDMODIFIER__MASK				0x0f000000
+#define GEN6_COMPACT_CONDMODIFIER__SHIFT			24
+#define GEN6_COMPACT_ACCWRCTRL					(0x1 << 23)
+#define GEN6_COMPACT_SUBREG_INDEX__MASK				0x007c0000
+#define GEN6_COMPACT_SUBREG_INDEX__SHIFT			18
+#define GEN6_COMPACT_DATATYPE_INDEX__MASK			0x0003e000
+#define GEN6_COMPACT_DATATYPE_INDEX__SHIFT			13
+#define GEN6_COMPACT_CONTROL_INDEX__MASK			0x00001f00
+#define GEN6_COMPACT_CONTROL_INDEX__SHIFT			8
+#define GEN6_COMPACT_DEBUGCTRL					(0x1 << 7)
+#define GEN6_COMPACT_OPCODE__MASK				0x0000007f
+#define GEN6_COMPACT_OPCODE__SHIFT				0
+#define GEN8_COMPACT_3SRC_SRC2_REG__MASK		0xfe00000000000000ULL
+#define GEN8_COMPACT_3SRC_SRC2_REG__SHIFT			57
+#define GEN8_COMPACT_3SRC_SRC2_REG__SHR				1
+#define GEN8_COMPACT_3SRC_SRC1_REG__MASK		0x01fc000000000000ULL
+#define GEN8_COMPACT_3SRC_SRC1_REG__SHIFT			50
+#define GEN8_COMPACT_3SRC_SRC1_REG__SHR				1
+#define GEN8_COMPACT_3SRC_SRC0_REG__MASK		0x0003f80000000000ULL
+#define GEN8_COMPACT_3SRC_SRC0_REG__SHIFT			43
+#define GEN8_COMPACT_3SRC_SRC0_REG__SHR				1
+#define GEN8_COMPACT_3SRC_SRC2_SUBREG__MASK		0x0000070000000000ULL
+#define GEN8_COMPACT_3SRC_SRC2_SUBREG__SHIFT			40
+#define GEN8_COMPACT_3SRC_SRC2_SUBREG__SHR			2
+#define GEN8_COMPACT_3SRC_SRC1_SUBREG__MASK		0x000000e000000000ULL
+#define GEN8_COMPACT_3SRC_SRC1_SUBREG__SHIFT			37
+#define GEN8_COMPACT_3SRC_SRC1_SUBREG__SHR			2
+#define GEN8_COMPACT_3SRC_SRC0_SUBREG__MASK		0x0000001c00000000ULL
+#define GEN8_COMPACT_3SRC_SRC0_SUBREG__SHIFT			34
+#define GEN8_COMPACT_3SRC_SRC0_SUBREG__SHR			2
+#define GEN8_COMPACT_3SRC_SRC2_REPCTRL				(0x1ULL << 33)
+#define GEN8_COMPACT_3SRC_SRC1_REPCTRL				(0x1ULL << 32)
+#define GEN8_COMPACT_3SRC_SATURATE				(0x1 << 31)
+#define GEN8_COMPACT_3SRC_DEBUGCTRL				(0x1 << 30)
+#define GEN8_COMPACT_3SRC_CMPTCTRL				(0x1 << 29)
+#define GEN8_COMPACT_3SRC_SRC0_REPCTRL				(0x1 << 28)
+#define GEN8_COMPACT_3SRC_DST_REG__MASK				0x0007f000
+#define GEN8_COMPACT_3SRC_DST_REG__SHIFT			12
+#define GEN8_COMPACT_3SRC_SOURCE_INDEX__MASK			0x00000c00
+#define GEN8_COMPACT_3SRC_SOURCE_INDEX__SHIFT			10
+#define GEN8_COMPACT_3SRC_CONTROL_INDEX__MASK			0x00000300
+#define GEN8_COMPACT_3SRC_CONTROL_INDEX__SHIFT			8
+#define GEN8_COMPACT_3SRC_OPCODE__MASK				0x0000007f
+#define GEN8_COMPACT_3SRC_OPCODE__SHIFT				0
+
+
+
+
+
+
+
+
+#define GEN6_3SRC_SRC_2__MASK				0x7ffffc0000000000ULL
+#define GEN6_3SRC_SRC_2__SHIFT					42
+#define GEN6_3SRC_SRC_1__MASK				0x000003ffffe00000ULL
+#define GEN6_3SRC_SRC_1__SHIFT					21
+#define GEN6_3SRC_SRC_0__MASK					0x001fffff
+#define GEN6_3SRC_SRC_0__SHIFT					0
+
+
+
+
+
+
+#endif /* GEN_EU_ISA_XML */
diff --git a/icd/intel/genhw/gen_eu_message.xml.h b/icd/intel/genhw/gen_eu_message.xml.h
new file mode 100644
index 0000000..fe8b269
--- /dev/null
+++ b/icd/intel/genhw/gen_eu_message.xml.h
@@ -0,0 +1,329 @@
+#ifndef GEN_EU_MESSAGE_XML
+#define GEN_EU_MESSAGE_XML
+
+/* Autogenerated file, DO NOT EDIT manually!
+
+This file was generated by the rules-ng-ng headergen tool in this git repository:
+https://github.com/olvaffe/envytools/
+git clone https://github.com/olvaffe/envytools.git
+
+Copyright (C) 2014-2015 by the following authors:
+- Chia-I Wu <olvaffe@gmail.com> (olv)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice (including the
+next paragraph) shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+
+enum gen_eu_urb_op {
+    GEN6_MSG_URB_WRITE					      = 0x0,
+    GEN6_MSG_URB_FF_SYNC				      = 0x1,
+    GEN7_MSG_URB_WRITE_HWORD				      = 0x0,
+    GEN7_MSG_URB_WRITE_OWORD				      = 0x1,
+    GEN7_MSG_URB_READ_HWORD				      = 0x2,
+    GEN7_MSG_URB_READ_OWORD				      = 0x3,
+    GEN7_MSG_URB_ATOMIC_MOV				      = 0x4,
+    GEN7_MSG_URB_ATOMIC_INC				      = 0x5,
+    GEN8_MSG_URB_SIMD8_WRITE				      = 0x7,
+};
+
+enum gen_eu_pi_simd {
+    GEN7_MSG_PI_SIMD8					      = 0x0,
+    GEN7_MSG_PI_SIMD16					      = 0x1,
+};
+
+enum gen_eu_pi_op {
+    GEN7_MSG_PI_EVAL_SNAPPED_IMM			      = 0x0,
+    GEN7_MSG_PI_EVAL_SINDEX				      = 0x1,
+    GEN7_MSG_PI_EVAL_CENTROID				      = 0x2,
+    GEN7_MSG_PI_EVAL_SNAPPED				      = 0x3,
+};
+
+enum gen_eu_sampler_simd {
+    GEN6_MSG_SAMPLER_SIMD4X2				      = 0x0,
+    GEN9_MSG_SAMPLER_SIMD8D				      = 0x0,
+    GEN6_MSG_SAMPLER_SIMD8				      = 0x1,
+    GEN6_MSG_SAMPLER_SIMD16				      = 0x2,
+    GEN6_MSG_SAMPLER_SIMD32_64				      = 0x3,
+};
+
+enum gen_eu_sampler_op {
+    GEN6_MSG_SAMPLER_SAMPLE				      = 0x0,
+    GEN6_MSG_SAMPLER_SAMPLE_B				      = 0x1,
+    GEN6_MSG_SAMPLER_SAMPLE_L				      = 0x2,
+    GEN6_MSG_SAMPLER_SAMPLE_C				      = 0x3,
+    GEN6_MSG_SAMPLER_SAMPLE_D				      = 0x4,
+    GEN6_MSG_SAMPLER_SAMPLE_B_C				      = 0x5,
+    GEN6_MSG_SAMPLER_SAMPLE_L_C				      = 0x6,
+    GEN6_MSG_SAMPLER_LD					      = 0x7,
+    GEN6_MSG_SAMPLER_GATHER4				      = 0x8,
+    GEN6_MSG_SAMPLER_LOD				      = 0x9,
+    GEN6_MSG_SAMPLER_RESINFO				      = 0xa,
+    GEN6_MSG_SAMPLER_SAMPLEINFO				      = 0xb,
+    GEN7_MSG_SAMPLER_GATHER4_C				      = 0x10,
+    GEN7_MSG_SAMPLER_GATHER4_PO				      = 0x11,
+    GEN7_MSG_SAMPLER_GATHER4_PO_C			      = 0x12,
+    GEN7_MSG_SAMPLER_SAMPLE_D_C				      = 0x14,
+    GEN7_MSG_SAMPLER_SAMPLE_LZ				      = 0x18,
+    GEN7_MSG_SAMPLER_SAMPLE_C_LC			      = 0x19,
+    GEN7_MSG_SAMPLER_LD_LZ				      = 0x1a,
+    GEN7_MSG_SAMPLER_LD_MCS				      = 0x1d,
+    GEN7_MSG_SAMPLER_LD2DMS				      = 0x1e,
+    GEN7_MSG_SAMPLER_LD2DSS				      = 0x1f,
+};
+
+enum gen_eu_dp_op {
+    GEN6_MSG_DP_OWORD_BLOCK_READ			      = 0x0,
+    GEN6_MSG_DP_RT_UNORM_READ				      = 0x1,
+    GEN6_MSG_DP_OWORD_DUAL_BLOCK_READ			      = 0x2,
+    GEN6_MSG_DP_MEDIA_BLOCK_READ			      = 0x4,
+    GEN6_MSG_DP_UNALIGNED_OWORD_BLOCK_READ		      = 0x5,
+    GEN6_MSG_DP_DWORD_SCATTERED_READ			      = 0x6,
+    GEN6_MSG_DP_DWORD_ATOMIC_WRITE			      = 0x7,
+    GEN6_MSG_DP_OWORD_BLOCK_WRITE			      = 0x8,
+    GEN6_MSG_DP_OWORD_DUAL_BLOCK_WRITE			      = 0x9,
+    GEN6_MSG_DP_MEDIA_BLOCK_WRITE			      = 0xa,
+    GEN6_MSG_DP_DWORD_SCATTERED_WRITE			      = 0xb,
+    GEN6_MSG_DP_RT_WRITE				      = 0xc,
+    GEN6_MSG_DP_SVB_WRITE				      = 0xd,
+    GEN6_MSG_DP_RT_UNORM_WRITE				      = 0xe,
+    GEN7_MSG_DP_SAMPLER_UNALIGNED_OWORD_BLOCK_READ	      = 0x1,
+    GEN7_MSG_DP_SAMPLER_MEDIA_BLOCK_READ		      = 0x4,
+    GEN7_MSG_DP_RC_MEDIA_BLOCK_READ			      = 0x4,
+    GEN7_MSG_DP_RC_TYPED_SURFACE_READ			      = 0x5,
+    GEN7_MSG_DP_RC_TYPED_ATOMIC_OP			      = 0x6,
+    GEN7_MSG_DP_RC_MEMORY_FENCE				      = 0x7,
+    GEN7_MSG_DP_RC_MEDIA_BLOCK_WRITE			      = 0xa,
+    GEN7_MSG_DP_RC_RT_WRITE				      = 0xc,
+    GEN7_MSG_DP_RC_TYPED_SURFACE_WRITE			      = 0xd,
+    GEN7_MSG_DP_CC_OWORD_BLOCK_READ			      = 0x0,
+    GEN7_MSG_DP_CC_UNALIGNED_OWORD_BLOCK_READ		      = 0x1,
+    GEN7_MSG_DP_CC_OWORD_DUAL_BLOCK_READ		      = 0x2,
+    GEN7_MSG_DP_CC_DWORD_SCATTERED_READ			      = 0x3,
+    GEN7_MSG_DP_DC0_OWORD_BLOCK_READ			      = 0x0,
+    GEN7_MSG_DP_DC0_UNALIGNED_OWORD_BLOCK_READ		      = 0x1,
+    GEN7_MSG_DP_DC0_OWORD_DUAL_BLOCK_READ		      = 0x2,
+    GEN7_MSG_DP_DC0_DWORD_SCATTERED_READ		      = 0x3,
+    GEN7_MSG_DP_DC0_BYTE_SCATTERED_READ			      = 0x4,
+    GEN7_MSG_DP_DC0_UNTYPED_SURFACE_READ		      = 0x5,
+    GEN7_MSG_DP_DC0_UNTYPED_ATOMIC_OP			      = 0x6,
+    GEN7_MSG_DP_DC0_MEMORY_FENCE			      = 0x7,
+    GEN7_MSG_DP_DC0_OWORD_BLOCK_WRITE			      = 0x8,
+    GEN7_MSG_DP_DC0_OWORD_DUAL_BLOCK_WRITE		      = 0xa,
+    GEN7_MSG_DP_DC0_DWORD_SCATTERED_WRITE		      = 0xb,
+    GEN7_MSG_DP_DC0_BYTE_SCATTERED_WRITE		      = 0xc,
+    GEN7_MSG_DP_DC0_UNTYPED_SURFACE_WRITE		      = 0xd,
+    GEN75_MSG_DP_SAMPLER_READ_SURFACE_INFO		      = 0x0,
+    GEN75_MSG_DP_SAMPLER_UNALIGNED_OWORD_BLOCK_READ	      = 0x1,
+    GEN75_MSG_DP_SAMPLER_MEDIA_BLOCK_READ		      = 0x4,
+    GEN75_MSG_DP_RC_MEDIA_BLOCK_READ			      = 0x4,
+    GEN75_MSG_DP_RC_MEMORY_FENCE			      = 0x7,
+    GEN75_MSG_DP_RC_MEDIA_BLOCK_WRITE			      = 0xa,
+    GEN75_MSG_DP_RC_RT_WRITE				      = 0xc,
+    GEN75_MSG_DP_CC_OWORD_BLOCK_READ			      = 0x0,
+    GEN75_MSG_DP_CC_UNALIGNED_OWORD_BLOCK_READ		      = 0x1,
+    GEN75_MSG_DP_CC_OWORD_DUAL_BLOCK_READ		      = 0x2,
+    GEN75_MSG_DP_CC_DWORD_SCATTERED_READ		      = 0x3,
+    GEN75_MSG_DP_DC0_OWORD_BLOCK_READ			      = 0x0,
+    GEN75_MSG_DP_DC0_UNALIGNED_OWORD_BLOCK_READ		      = 0x1,
+    GEN75_MSG_DP_DC0_OWORD_DUAL_BLOCK_READ		      = 0x2,
+    GEN75_MSG_DP_DC0_DWORD_SCATTERED_READ		      = 0x3,
+    GEN75_MSG_DP_DC0_BYTE_SCATTERED_READ		      = 0x4,
+    GEN75_MSG_DP_DC0_MEMORY_FENCE			      = 0x7,
+    GEN75_MSG_DP_DC0_OWORD_BLOCK_WRITE			      = 0x8,
+    GEN75_MSG_DP_DC0_OWORD_DUAL_BLOCK_WRITE		      = 0xa,
+    GEN75_MSG_DP_DC0_DWORD_SCATTERED_WRITE		      = 0xb,
+    GEN75_MSG_DP_DC0_BYTE_SCATTERED_WRITE		      = 0xc,
+    GEN75_MSG_DP_DC1_UNTYPED_SURFACE_READ		      = 0x1,
+    GEN75_MSG_DP_DC1_UNTYPED_ATOMIC_OP			      = 0x2,
+    GEN75_MSG_DP_DC1_UNTYPED_ATOMIC_OP_SIMD4X2		      = 0x3,
+    GEN75_MSG_DP_DC1_MEDIA_BLOCK_READ			      = 0x4,
+    GEN75_MSG_DP_DC1_TYPED_SURFACE_READ			      = 0x5,
+    GEN75_MSG_DP_DC1_TYPED_ATOMIC_OP			      = 0x6,
+    GEN75_MSG_DP_DC1_TYPED_ATOMIC_OP_SIMD4X2		      = 0x7,
+    GEN75_MSG_DP_DC1_UNTYPED_SURFACE_WRITE		      = 0x9,
+    GEN75_MSG_DP_DC1_MEDIA_BLOCK_WRITE			      = 0xa,
+    GEN75_MSG_DP_DC1_ATOMIC_COUNTER_OP			      = 0xb,
+    GEN75_MSG_DP_DC1_ATOMIC_COUNTER_OP_SIMD4X2		      = 0xc,
+    GEN75_MSG_DP_DC1_TYPED_SURFACE_WRITE		      = 0xd,
+};
+
+enum gen_eu_dp_aop {
+    GEN7_MSG_DP_AOP_CMPWR8B				      = 0x0,
+    GEN7_MSG_DP_AOP_AND					      = 0x1,
+    GEN7_MSG_DP_AOP_OR					      = 0x2,
+    GEN7_MSG_DP_AOP_XOR					      = 0x3,
+    GEN7_MSG_DP_AOP_MOV					      = 0x4,
+    GEN7_MSG_DP_AOP_INC					      = 0x5,
+    GEN7_MSG_DP_AOP_DEC					      = 0x6,
+    GEN7_MSG_DP_AOP_ADD					      = 0x7,
+    GEN7_MSG_DP_AOP_SUB					      = 0x8,
+    GEN7_MSG_DP_AOP_REVSUB				      = 0x9,
+    GEN7_MSG_DP_AOP_IMAX				      = 0xa,
+    GEN7_MSG_DP_AOP_IMIN				      = 0xb,
+    GEN7_MSG_DP_AOP_UMAX				      = 0xc,
+    GEN7_MSG_DP_AOP_UMIN				      = 0xd,
+    GEN7_MSG_DP_AOP_CMPWR				      = 0xe,
+    GEN7_MSG_DP_AOP_PREDEC				      = 0xf,
+};
+
+#define GEN6_MSG_EOT						(0x1 << 31)
+#define GEN6_MSG_MLEN__MASK					0x1e000000
+#define GEN6_MSG_MLEN__SHIFT					25
+#define GEN6_MSG_RLEN__MASK					0x01f00000
+#define GEN6_MSG_RLEN__SHIFT					20
+#define GEN6_MSG_HEADER_PRESENT					(0x1 << 19)
+#define GEN6_MSG_FUNCTION_CONTROL__MASK				0x0007ffff
+#define GEN6_MSG_FUNCTION_CONTROL__SHIFT			0
+#define GEN6_MSG_URB_COMPLETE					(0x1 << 15)
+#define GEN6_MSG_URB_USED					(0x1 << 14)
+#define GEN6_MSG_URB_ALLOCATE					(0x1 << 13)
+#define GEN6_MSG_URB_INTERLEAVED				(0x1 << 10)
+#define GEN6_MSG_URB_OFFSET__MASK				0x000003f0
+#define GEN6_MSG_URB_OFFSET__SHIFT				4
+#define GEN6_MSG_URB_OP__MASK					0x0000000f
+#define GEN6_MSG_URB_OP__SHIFT					0
+#define GEN7_MSG_URB_PER_SLOT_OFFSET				(0x1 << 16)
+#define GEN7_MSG_URB_COMPLETE					(0x1 << 15)
+#define GEN7_MSG_URB_INTERLEAVED				(0x1 << 14)
+#define GEN7_MSG_URB_GLOBAL_OFFSET__MASK			0x00003ff8
+#define GEN7_MSG_URB_GLOBAL_OFFSET__SHIFT			3
+#define GEN7_MSG_URB_OP__MASK					0x00000007
+#define GEN7_MSG_URB_OP__SHIFT					0
+#define GEN8_MSG_URB_PER_SLOT_OFFSET				(0x1 << 17)
+#define GEN8_MSG_URB_INTERLEAVED				(0x1 << 15)
+#define GEN8_MSG_URB_GLOBAL_OFFSET__MASK			0x00007ff0
+#define GEN8_MSG_URB_GLOBAL_OFFSET__SHIFT			4
+#define GEN8_MSG_URB_OP__MASK					0x0000000f
+#define GEN8_MSG_URB_OP__SHIFT					0
+#define GEN7_MSG_PI_SIMD__MASK					0x00010000
+#define GEN7_MSG_PI_SIMD__SHIFT					16
+#define GEN7_MSG_PI_LINEAR_INTERP				(0x1 << 14)
+#define GEN7_MSG_PI_OP__MASK					0x00003000
+#define GEN7_MSG_PI_OP__SHIFT					12
+#define GEN7_MSG_PI_SLOTGRP_HI					(0x1 << 11)
+#define GEN7_MSG_PI_OFFSET_Y__MASK				0x000000f0
+#define GEN7_MSG_PI_OFFSET_Y__SHIFT				4
+#define GEN7_MSG_PI_OFFSET_X__MASK				0x0000000f
+#define GEN7_MSG_PI_OFFSET_X__SHIFT				0
+#define GEN7_MSG_PI_SAMPLE_INDEX__MASK				0x000000f0
+#define GEN7_MSG_PI_SAMPLE_INDEX__SHIFT				4
+#define GEN6_MSG_SAMPLER_SIMD__MASK				0x00030000
+#define GEN6_MSG_SAMPLER_SIMD__SHIFT				16
+#define GEN6_MSG_SAMPLER_OP__MASK				0x0000f000
+#define GEN6_MSG_SAMPLER_OP__SHIFT				12
+#define GEN7_MSG_SAMPLER_SIMD__MASK				0x00060000
+#define GEN7_MSG_SAMPLER_SIMD__SHIFT				17
+#define GEN7_MSG_SAMPLER_OP__MASK				0x0001f000
+#define GEN7_MSG_SAMPLER_OP__SHIFT				12
+#define GEN6_MSG_SAMPLER_INDEX__MASK				0x00000f00
+#define GEN6_MSG_SAMPLER_INDEX__SHIFT				8
+#define GEN6_MSG_SAMPLER_SURFACE__MASK				0x000000ff
+#define GEN6_MSG_SAMPLER_SURFACE__SHIFT				0
+#define GEN6_MSG_DP_SEND_WRITE_COMMIT				(0x1 << 17)
+#define GEN6_MSG_DP_OP__MASK					0x0001e000
+#define GEN6_MSG_DP_OP__SHIFT					13
+#define GEN6_MSG_DP_CTRL__MASK					0x00001f00
+#define GEN6_MSG_DP_CTRL__SHIFT					8
+#define GEN7_MSG_DP_CATEGORY					(0x1 << 18)
+#define GEN7_MSG_DP_OP__MASK					0x0003c000
+#define GEN7_MSG_DP_OP__SHIFT					14
+#define GEN7_MSG_DP_CTRL__MASK					0x00003f00
+#define GEN7_MSG_DP_CTRL__SHIFT					8
+#define GEN7_MSG_DP_OWORD_BLOCK_READ_INVALIDATE			(0x1 << 13)
+#define GEN6_MSG_DP_OWORD_BLOCK_SIZE__MASK			0x00000700
+#define GEN6_MSG_DP_OWORD_BLOCK_SIZE__SHIFT			8
+#define GEN6_MSG_DP_OWORD_BLOCK_SIZE_1_LO			(0x0 << 8)
+#define GEN6_MSG_DP_OWORD_BLOCK_SIZE_1_HI			(0x1 << 8)
+#define GEN6_MSG_DP_OWORD_BLOCK_SIZE_2				(0x2 << 8)
+#define GEN6_MSG_DP_OWORD_BLOCK_SIZE_4				(0x3 << 8)
+#define GEN6_MSG_DP_OWORD_BLOCK_SIZE_8				(0x4 << 8)
+#define GEN6_MSG_DP_UNALIGNED_OWORD_BLOCK_SIZE__MASK		0x00000700
+#define GEN6_MSG_DP_UNALIGNED_OWORD_BLOCK_SIZE__SHIFT		8
+#define GEN6_MSG_DP_UNALIGNED_OWORD_BLOCK_SIZE_1_LO		(0x0 << 8)
+#define GEN6_MSG_DP_UNALIGNED_OWORD_BLOCK_SIZE_1_HI		(0x1 << 8)
+#define GEN6_MSG_DP_UNALIGNED_OWORD_BLOCK_SIZE_2		(0x2 << 8)
+#define GEN6_MSG_DP_UNALIGNED_OWORD_BLOCK_SIZE_4		(0x3 << 8)
+#define GEN6_MSG_DP_UNALIGNED_OWORD_BLOCK_SIZE_8		(0x4 << 8)
+#define GEN7_MSG_DP_OWORD_DUAL_BLOCK_READ_INVALIDATE		(0x1 << 13)
+#define GEN6_MSG_DP_OWORD_DUAL_BLOCK_SIZE__MASK			0x00000300
+#define GEN6_MSG_DP_OWORD_DUAL_BLOCK_SIZE__SHIFT		8
+#define GEN6_MSG_DP_OWORD_DUAL_BLOCK_SIZE_1			(0x0 << 8)
+#define GEN6_MSG_DP_OWORD_DUAL_BLOCK_SIZE_4			(0x2 << 8)
+#define GEN7_MSG_DP_DWORD_SCATTERED_READ_INVALIDATE		(0x1 << 13)
+#define GEN6_MSG_DP_DWORD_SCATTERED_BLOCK_SIZE__MASK		0x00000300
+#define GEN6_MSG_DP_DWORD_SCATTERED_BLOCK_SIZE__SHIFT		8
+#define GEN6_MSG_DP_DWORD_SCATTERED_BLOCK_SIZE_8		(0x2 << 8)
+#define GEN6_MSG_DP_DWORD_SCATTERED_BLOCK_SIZE_16		(0x3 << 8)
+#define GEN6_MSG_DP_BYTE_SCATTERED_DATA_SIZE__MASK		0x00000600
+#define GEN6_MSG_DP_BYTE_SCATTERED_DATA_SIZE__SHIFT		9
+#define GEN6_MSG_DP_BYTE_SCATTERED_DATA_SIZE_1			(0x0 << 9)
+#define GEN6_MSG_DP_BYTE_SCATTERED_DATA_SIZE_2			(0x1 << 9)
+#define GEN6_MSG_DP_BYTE_SCATTERED_DATA_SIZE_4			(0x2 << 9)
+#define GEN6_MSG_DP_BYTE_SCATTERED_MODE__MASK			0x00000100
+#define GEN6_MSG_DP_BYTE_SCATTERED_MODE__SHIFT			8
+#define GEN6_MSG_DP_BYTE_SCATTERED_MODE_SIMD8			(0x0 << 8)
+#define GEN6_MSG_DP_BYTE_SCATTERED_MODE_SIMD16			(0x1 << 8)
+#define GEN6_MSG_DP_RT_LAST					(0x1 << 12)
+#define GEN6_MSG_DP_RT_SLOTGRP_HI				(0x1 << 11)
+#define GEN6_MSG_DP_RT_MODE__MASK				0x00000700
+#define GEN6_MSG_DP_RT_MODE__SHIFT				8
+#define GEN6_MSG_DP_RT_MODE_SIMD16				(0x0 << 8)
+#define GEN6_MSG_DP_RT_MODE_SIMD16_REPDATA			(0x1 << 8)
+#define GEN6_MSG_DP_RT_MODE_SIMD8_DUALSRC_LO			(0x2 << 8)
+#define GEN6_MSG_DP_RT_MODE_SIMD8_DUALSRC_HI			(0x3 << 8)
+#define GEN6_MSG_DP_RT_MODE_SIMD8_LO				(0x4 << 8)
+#define GEN6_MSG_DP_RT_MODE_SIMD8_IMAGE_WR			(0x5 << 8)
+#define GEN7_MSG_DP_TYPED_SLOTGRP_HI				(0x1 << 13)
+#define GEN7_MSG_DP_TYPED_MASK__MASK				0x00000f00
+#define GEN7_MSG_DP_TYPED_MASK__SHIFT				8
+#define GEN7_MSG_DP_UNTYPED_MODE__MASK				0x00003000
+#define GEN7_MSG_DP_UNTYPED_MODE__SHIFT				12
+#define GEN7_MSG_DP_UNTYPED_MODE_SIMD4X2			(0x0 << 12)
+#define GEN7_MSG_DP_UNTYPED_MODE_SIMD16				(0x1 << 12)
+#define GEN7_MSG_DP_UNTYPED_MODE_SIMD8				(0x2 << 12)
+#define GEN7_MSG_DP_UNTYPED_MASK__MASK				0x00000f00
+#define GEN7_MSG_DP_UNTYPED_MASK__SHIFT				8
+#define GEN7_MSG_DP_ATOMIC_RETURN_DATA_ENABLE			(0x1 << 13)
+#define GEN7_MSG_DP_ATOMIC_TYPED_SLOTGRP_HI			(0x1 << 12)
+#define GEN7_MSG_DP_ATOMIC_UNTYPED_MODE__MASK			0x00001000
+#define GEN7_MSG_DP_ATOMIC_UNTYPED_MODE__SHIFT			12
+#define GEN7_MSG_DP_ATOMIC_UNTYPED_MODE_SIMD16			(0x0 << 12)
+#define GEN7_MSG_DP_ATOMIC_UNTYPED_MODE_SIMD8			(0x1 << 12)
+#define GEN7_MSG_DP_ATOMIC_OP__MASK				0x00000f00
+#define GEN7_MSG_DP_ATOMIC_OP__SHIFT				8
+#define GEN6_MSG_DP_SURFACE__MASK				0x000000ff
+#define GEN6_MSG_DP_SURFACE__SHIFT				0
+#define GEN6_MSG_TS_RESOURCE_SELECT__MASK			0x00000010
+#define GEN6_MSG_TS_RESOURCE_SELECT__SHIFT			4
+#define GEN6_MSG_TS_RESOURCE_SELECT_CHILD			(0x0 << 4)
+#define GEN6_MSG_TS_RESOURCE_SELECT_ROOT			(0x1 << 4)
+#define GEN6_MSG_TS_RESOURCE_SELECT_DEREF			(0x0 << 4)
+#define GEN6_MSG_TS_RESOURCE_SELECT_NO_DEREF			(0x1 << 4)
+#define GEN6_MSG_TS_REQUESTER_TYPE__MASK			0x00000002
+#define GEN6_MSG_TS_REQUESTER_TYPE__SHIFT			1
+#define GEN6_MSG_TS_REQUESTER_TYPE_ROOT				(0x0 << 1)
+#define GEN6_MSG_TS_REQUESTER_TYPE_CHILD			(0x1 << 1)
+#define GEN6_MSG_TS_OPCODE__MASK				0x00000001
+#define GEN6_MSG_TS_OPCODE__SHIFT				0
+#define GEN6_MSG_TS_OPCODE_DEREF				0x0
+#define GEN6_MSG_TS_OPCODE_SPAWN				0x1
+
+#endif /* GEN_EU_MESSAGE_XML */
diff --git a/icd/intel/genhw/gen_mi.xml.h b/icd/intel/genhw/gen_mi.xml.h
new file mode 100644
index 0000000..24d726a
--- /dev/null
+++ b/icd/intel/genhw/gen_mi.xml.h
@@ -0,0 +1,261 @@
+#ifndef GEN_MI_XML
+#define GEN_MI_XML
+
+/* Autogenerated file, DO NOT EDIT manually!
+
+This file was generated by the rules-ng-ng headergen tool in this git repository:
+https://github.com/olvaffe/envytools/
+git clone https://github.com/olvaffe/envytools.git
+
+Copyright (C) 2014-2015 by the following authors:
+- Chia-I Wu <olvaffe@gmail.com> (olv)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice (including the
+next paragraph) shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+
+enum gen_mi_alu_opcode {
+    GEN75_MI_ALU_NOOP					      = 0x0,
+    GEN75_MI_ALU_LOAD					      = 0x80,
+    GEN75_MI_ALU_LOADINV				      = 0x480,
+    GEN75_MI_ALU_LOAD0					      = 0x81,
+    GEN75_MI_ALU_LOAD1					      = 0x481,
+    GEN75_MI_ALU_ADD					      = 0x100,
+    GEN75_MI_ALU_SUB					      = 0x101,
+    GEN75_MI_ALU_AND					      = 0x102,
+    GEN75_MI_ALU_OR					      = 0x103,
+    GEN75_MI_ALU_XOR					      = 0x104,
+    GEN75_MI_ALU_STORE					      = 0x180,
+    GEN75_MI_ALU_STOREINV				      = 0x580,
+};
+
+enum gen_mi_alu_operand {
+    GEN75_MI_ALU_R0					      = 0x0,
+    GEN75_MI_ALU_R1					      = 0x1,
+    GEN75_MI_ALU_R2					      = 0x2,
+    GEN75_MI_ALU_R3					      = 0x3,
+    GEN75_MI_ALU_R4					      = 0x4,
+    GEN75_MI_ALU_R5					      = 0x5,
+    GEN75_MI_ALU_R6					      = 0x6,
+    GEN75_MI_ALU_R7					      = 0x7,
+    GEN75_MI_ALU_R8					      = 0x8,
+    GEN75_MI_ALU_R9					      = 0x9,
+    GEN75_MI_ALU_R10					      = 0xa,
+    GEN75_MI_ALU_R11					      = 0xb,
+    GEN75_MI_ALU_R12					      = 0xc,
+    GEN75_MI_ALU_R13					      = 0xd,
+    GEN75_MI_ALU_R14					      = 0xe,
+    GEN75_MI_ALU_R15					      = 0xf,
+    GEN75_MI_ALU_SRCA					      = 0x20,
+    GEN75_MI_ALU_SRCB					      = 0x21,
+    GEN75_MI_ALU_ACCU					      = 0x31,
+    GEN75_MI_ALU_ZF					      = 0x32,
+    GEN75_MI_ALU_CF					      = 0x33,
+};
+
+#define GEN6_MI_TYPE__MASK					0xe0000000
+#define GEN6_MI_TYPE__SHIFT					29
+#define GEN6_MI_TYPE_MI						(0x0 << 29)
+#define GEN6_MI_OPCODE__MASK					0x1f800000
+#define GEN6_MI_OPCODE__SHIFT					23
+#define GEN6_MI_OPCODE_MI_NOOP					(0x0 << 23)
+#define GEN75_MI_OPCODE_MI_SET_PREDICATE			(0x1 << 23)
+#define GEN75_MI_OPCODE_MI_RS_CONTROL				(0x6 << 23)
+#define GEN75_MI_OPCODE_MI_URB_ATOMIC_ALLOC			(0x9 << 23)
+#define GEN6_MI_OPCODE_MI_BATCH_BUFFER_END			(0xa << 23)
+#define GEN7_MI_OPCODE_MI_PREDICATE				(0xc << 23)
+#define GEN7_MI_OPCODE_MI_URB_CLEAR				(0x19 << 23)
+#define GEN75_MI_OPCODE_MI_MATH					(0x1a << 23)
+#define GEN6_MI_OPCODE_MI_STORE_DATA_IMM			(0x20 << 23)
+#define GEN6_MI_OPCODE_MI_LOAD_REGISTER_IMM			(0x22 << 23)
+#define GEN6_MI_OPCODE_MI_STORE_REGISTER_MEM			(0x24 << 23)
+#define GEN6_MI_OPCODE_MI_FLUSH_DW				(0x26 << 23)
+#define GEN6_MI_OPCODE_MI_REPORT_PERF_COUNT			(0x28 << 23)
+#define GEN7_MI_OPCODE_MI_LOAD_REGISTER_MEM			(0x29 << 23)
+#define GEN75_MI_OPCODE_MI_LOAD_REGISTER_REG			(0x2a << 23)
+#define GEN75_MI_OPCODE_MI_LOAD_URB_MEM				(0x2c << 23)
+#define GEN75_MI_OPCODE_MI_STORE_URB_MEM			(0x2d << 23)
+#define GEN6_MI_OPCODE_MI_BATCH_BUFFER_START			(0x31 << 23)
+#define GEN6_MI_LENGTH__MASK					0x0000003f
+#define GEN6_MI_LENGTH__SHIFT					0
+#define GEN6_MI_NOOP__SIZE					1
+
+#define GEN75_MI_SET_PREDICATE__SIZE				1
+#define GEN75_MI_SET_PREDICATE_DW0_PREDICATE__MASK		0x00000003
+#define GEN75_MI_SET_PREDICATE_DW0_PREDICATE__SHIFT		0
+#define GEN75_MI_SET_PREDICATE_DW0_PREDICATE_ALWAYS		0x0
+#define GEN75_MI_SET_PREDICATE_DW0_PREDICATE_ON_CLEAR		0x1
+#define GEN75_MI_SET_PREDICATE_DW0_PREDICATE_ON_SET		0x2
+#define GEN75_MI_SET_PREDICATE_DW0_PREDICATE_DISABLE		0x3
+
+#define GEN75_MI_RS_CONTROL__SIZE				1
+#define GEN75_MI_RS_CONTROL_DW0_ENABLE				(0x1 << 0)
+
+#define GEN75_MI_URB_ATOMIC_ALLOC__SIZE				1
+#define GEN75_MI_URB_ATOMIC_ALLOC_DW0_OFFSET__MASK		0x000ff000
+#define GEN75_MI_URB_ATOMIC_ALLOC_DW0_OFFSET__SHIFT		12
+#define GEN75_MI_URB_ATOMIC_ALLOC_DW0_SIZE__MASK		0x000001ff
+#define GEN75_MI_URB_ATOMIC_ALLOC_DW0_SIZE__SHIFT		0
+
+#define GEN6_MI_BATCH_BUFFER_END__SIZE				1
+
+#define GEN7_MI_PREDICATE__SIZE					1
+#define GEN7_MI_PREDICATE_DW0_LOADOP__MASK			0x000000c0
+#define GEN7_MI_PREDICATE_DW0_LOADOP__SHIFT			6
+#define GEN7_MI_PREDICATE_DW0_LOADOP_KEEP			(0x0 << 6)
+#define GEN7_MI_PREDICATE_DW0_LOADOP_LOAD			(0x2 << 6)
+#define GEN7_MI_PREDICATE_DW0_LOADOP_LOADINV			(0x3 << 6)
+#define GEN7_MI_PREDICATE_DW0_COMBINEOP__MASK			0x00000018
+#define GEN7_MI_PREDICATE_DW0_COMBINEOP__SHIFT			3
+#define GEN7_MI_PREDICATE_DW0_COMBINEOP_SET			(0x0 << 3)
+#define GEN7_MI_PREDICATE_DW0_COMBINEOP_AND			(0x1 << 3)
+#define GEN7_MI_PREDICATE_DW0_COMBINEOP_OR			(0x2 << 3)
+#define GEN7_MI_PREDICATE_DW0_COMBINEOP_XOR			(0x3 << 3)
+#define GEN7_MI_PREDICATE_DW0_COMPAREOP__MASK			0x00000003
+#define GEN7_MI_PREDICATE_DW0_COMPAREOP__SHIFT			0
+#define GEN7_MI_PREDICATE_DW0_COMPAREOP_TRUE			0x0
+#define GEN7_MI_PREDICATE_DW0_COMPAREOP_FALSE			0x1
+#define GEN7_MI_PREDICATE_DW0_COMPAREOP_SRCS_EQUAL		0x2
+#define GEN7_MI_PREDICATE_DW0_COMPAREOP_DELTAS_EQUAL		0x3
+
+#define GEN7_MI_URB_CLEAR__SIZE					2
+
+#define GEN7_MI_URB_CLEAR_DW1_LENGTH__MASK			0x3fff0000
+#define GEN7_MI_URB_CLEAR_DW1_LENGTH__SHIFT			16
+#define GEN7_MI_URB_CLEAR_DW1_OFFSET__MASK			0x00007fff
+#define GEN7_MI_URB_CLEAR_DW1_OFFSET__SHIFT			0
+
+#define GEN75_MI_MATH__SIZE					65
+
+#define GEN75_MI_MATH_DW_OP__MASK				0xfff00000
+#define GEN75_MI_MATH_DW_OP__SHIFT				20
+#define GEN75_MI_MATH_DW_SRC1__MASK				0x000ffc00
+#define GEN75_MI_MATH_DW_SRC1__SHIFT				10
+#define GEN75_MI_MATH_DW_SRC2__MASK				0x000007ff
+#define GEN75_MI_MATH_DW_SRC2__SHIFT				0
+
+#define GEN6_MI_STORE_DATA_IMM__SIZE				6
+#define GEN6_MI_STORE_DATA_IMM_DW0_USE_GGTT			(0x1 << 22)
+
+
+#define GEN6_MI_STORE_DATA_IMM_DW2_ADDR__MASK			0xfffffffc
+#define GEN6_MI_STORE_DATA_IMM_DW2_ADDR__SHIFT			2
+#define GEN6_MI_STORE_DATA_IMM_DW2_ADDR__SHR			2
+
+
+
+
+#define GEN6_MI_LOAD_REGISTER_IMM__SIZE				3
+#define GEN6_MI_LOAD_REGISTER_IMM_DW0_WRITE_DISABLES__MASK	0x00000f00
+#define GEN6_MI_LOAD_REGISTER_IMM_DW0_WRITE_DISABLES__SHIFT	8
+
+#define GEN6_MI_LOAD_REGISTER_IMM_DW1_REG__MASK			0x007ffffc
+#define GEN6_MI_LOAD_REGISTER_IMM_DW1_REG__SHIFT		2
+#define GEN6_MI_LOAD_REGISTER_IMM_DW1_REG__SHR			2
+
+
+#define GEN6_MI_STORE_REGISTER_MEM__SIZE			4
+#define GEN6_MI_STORE_REGISTER_MEM_DW0_USE_GGTT			(0x1 << 22)
+#define GEN75_MI_STORE_REGISTER_MEM_DW0_PREDICATE_ENABLE	(0x1 << 21)
+
+#define GEN6_MI_STORE_REGISTER_MEM_DW1_REG__MASK		0x007ffffc
+#define GEN6_MI_STORE_REGISTER_MEM_DW1_REG__SHIFT		2
+#define GEN6_MI_STORE_REGISTER_MEM_DW1_REG__SHR			2
+
+#define GEN6_MI_STORE_REGISTER_MEM_DW2_ADDR__MASK		0xfffffffc
+#define GEN6_MI_STORE_REGISTER_MEM_DW2_ADDR__SHIFT		2
+#define GEN6_MI_STORE_REGISTER_MEM_DW2_ADDR__SHR		2
+
+
+#define GEN6_MI_FLUSH_DW__SIZE					4
+
+
+
+
+#define GEN6_MI_REPORT_PERF_COUNT__SIZE				3
+
+#define GEN6_MI_REPORT_PERF_COUNT_DW1_CORE_MODE_ENABLE		(0x1 << 4)
+#define GEN6_MI_REPORT_PERF_COUNT_DW1_USE_GGTT			(0x1 << 0)
+#define GEN6_MI_REPORT_PERF_COUNT_DW1_ADDR__MASK		0xffffffc0
+#define GEN6_MI_REPORT_PERF_COUNT_DW1_ADDR__SHIFT		6
+#define GEN6_MI_REPORT_PERF_COUNT_DW1_ADDR__SHR			6
+
+
+#define GEN7_MI_LOAD_REGISTER_MEM__SIZE				4
+#define GEN7_MI_LOAD_REGISTER_MEM_DW0_USE_GGTT			(0x1 << 22)
+#define GEN7_MI_LOAD_REGISTER_MEM_DW0_ASYNC_MODE_ENABLE		(0x1 << 21)
+
+#define GEN7_MI_LOAD_REGISTER_MEM_DW1_REG__MASK			0x007ffffc
+#define GEN7_MI_LOAD_REGISTER_MEM_DW1_REG__SHIFT		2
+#define GEN7_MI_LOAD_REGISTER_MEM_DW1_REG__SHR			2
+
+#define GEN7_MI_LOAD_REGISTER_MEM_DW2_ADDR__MASK		0xfffffffc
+#define GEN7_MI_LOAD_REGISTER_MEM_DW2_ADDR__SHIFT		2
+#define GEN7_MI_LOAD_REGISTER_MEM_DW2_ADDR__SHR			2
+
+
+#define GEN75_MI_LOAD_REGISTER_REG__SIZE			3
+
+#define GEN75_MI_LOAD_REGISTER_REG_DW1_SRC_REG__MASK		0x007ffffc
+#define GEN75_MI_LOAD_REGISTER_REG_DW1_SRC_REG__SHIFT		2
+#define GEN75_MI_LOAD_REGISTER_REG_DW1_SRC_REG__SHR		2
+
+#define GEN75_MI_LOAD_REGISTER_REG_DW2_DST_REG__MASK		0x007ffffc
+#define GEN75_MI_LOAD_REGISTER_REG_DW2_DST_REG__SHIFT		2
+#define GEN75_MI_LOAD_REGISTER_REG_DW2_DST_REG__SHR		2
+
+#define GEN75_MI_LOAD_URB_MEM__SIZE				4
+
+#define GEN75_MI_LOAD_URB_MEM_DW1_ADDR__MASK			0x00007ffc
+#define GEN75_MI_LOAD_URB_MEM_DW1_ADDR__SHIFT			2
+#define GEN75_MI_LOAD_URB_MEM_DW1_ADDR__SHR			2
+
+#define GEN75_MI_LOAD_URB_MEM_DW2_ADDR__MASK			0xffffffc0
+#define GEN75_MI_LOAD_URB_MEM_DW2_ADDR__SHIFT			6
+#define GEN75_MI_LOAD_URB_MEM_DW2_ADDR__SHR			6
+
+
+#define GEN75_MI_STORE_URB_MEM__SIZE				4
+
+#define GEN75_MI_STORE_URB_MEM_DW1_ADDR__MASK			0x00007ffc
+#define GEN75_MI_STORE_URB_MEM_DW1_ADDR__SHIFT			2
+#define GEN75_MI_STORE_URB_MEM_DW1_ADDR__SHR			2
+
+#define GEN75_MI_STORE_URB_MEM_DW2_ADDR__MASK			0xffffffc0
+#define GEN75_MI_STORE_URB_MEM_DW2_ADDR__SHIFT			6
+#define GEN75_MI_STORE_URB_MEM_DW2_ADDR__SHR			6
+
+
+#define GEN6_MI_BATCH_BUFFER_START__SIZE			3
+#define GEN75_MI_BATCH_BUFFER_START_DW0_SECOND_LEVEL		(0x1 << 22)
+#define GEN75_MI_BATCH_BUFFER_START_DW0_ADD_OFFSET_ENABLE	(0x1 << 16)
+#define GEN75_MI_BATCH_BUFFER_START_DW0_PREDICATION_ENABLE	(0x1 << 15)
+#define GEN75_MI_BATCH_BUFFER_START_DW0_NON_PRIVILEGED		(0x1 << 13)
+#define GEN6_MI_BATCH_BUFFER_START_DW0_CLEAR_COMMAND_BUFFER	(0x1 << 11)
+#define GEN6_MI_BATCH_BUFFER_START_DW0_USE_PPGTT		(0x1 << 8)
+
+#define GEN6_MI_BATCH_BUFFER_START_DW1_ADDR__MASK		0xfffffffc
+#define GEN6_MI_BATCH_BUFFER_START_DW1_ADDR__SHIFT		2
+#define GEN6_MI_BATCH_BUFFER_START_DW1_ADDR__SHR		2
+
+
+
+#endif /* GEN_MI_XML */
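
With gen_mi.xml.h in place, DW0 of an MI command is its GEN*_MI_OPCODE_* value OR'd with a biased length in the GEN6_MI_LENGTH field; MI commands conventionally encode total dwords minus two, though treat that bias as an assumption here rather than something the header states. A minimal sketch, where gen6_emit_lri() is a hypothetical helper:

    /* Emit MI_LOAD_REGISTER_IMM (3 dwords on gen6/7) into a batch buffer. */
    static void gen6_emit_lri(uint32_t *batch, uint32_t reg, uint32_t val)
    {
        batch[0] = GEN6_MI_OPCODE_MI_LOAD_REGISTER_IMM |
                   ((GEN6_MI_LOAD_REGISTER_IMM__SIZE - 2) & GEN6_MI_LENGTH__MASK);
        batch[1] = reg & GEN6_MI_LOAD_REGISTER_IMM_DW1_REG__MASK; /* 4-byte aligned MMIO offset */
        batch[2] = val;
    }

    /* One MI_MATH ALU dword built from the enums above:
     * load accumulator input SRCA from GPR R0. */
    uint32_t alu = (GEN75_MI_ALU_LOAD << GEN75_MI_MATH_DW_OP__SHIFT) |
                   (GEN75_MI_ALU_SRCA << GEN75_MI_MATH_DW_SRC1__SHIFT) |
                   (GEN75_MI_ALU_R0   << GEN75_MI_MATH_DW_SRC2__SHIFT);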
diff --git a/icd/intel/genhw/gen_regs.xml.h b/icd/intel/genhw/gen_regs.xml.h
new file mode 100644
index 0000000..2bdd72b
--- /dev/null
+++ b/icd/intel/genhw/gen_regs.xml.h
@@ -0,0 +1,172 @@
+#ifndef GEN_REGS_XML
+#define GEN_REGS_XML
+
+/* Autogenerated file, DO NOT EDIT manually!
+
+This file was generated by the rules-ng-ng headergen tool in this git repository:
+https://github.com/olvaffe/envytools/
+git clone https://github.com/olvaffe/envytools.git
+
+Copyright (C) 2014-2015 by the following authors:
+- Chia-I Wu <olvaffe@gmail.com> (olv)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice (including the
+next paragraph) shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+
+#define GEN6_REG_MASK__MASK					0xffff0000
+#define GEN6_REG_MASK__SHIFT					16
+#define GEN6_REG__SIZE						0x400000
+#define GEN7_REG_HS_INVOCATION_COUNT				0x2300
+
+#define GEN7_REG_DS_INVOCATION_COUNT				0x2308
+
+#define GEN6_REG_IA_VERTICES_COUNT				0x2310
+
+#define GEN6_REG_IA_PRIMITIVES_COUNT				0x2318
+
+#define GEN6_REG_VS_INVOCATION_COUNT				0x2320
+
+#define GEN6_REG_GS_INVOCATION_COUNT				0x2328
+
+#define GEN6_REG_GS_PRIMITIVES_COUNT				0x2330
+
+#define GEN6_REG_CL_INVOCATION_COUNT				0x2338
+
+#define GEN6_REG_CL_PRIMITIVES_COUNT				0x2340
+
+#define GEN6_REG_PS_INVOCATION_COUNT				0x2348
+
+#define GEN6_REG_PS_DEPTH_COUNT					0x2350
+
+#define GEN6_REG_TIMESTAMP					0x2358
+
+#define GEN6_REG_OACONTROL					0x2360
+#define GEN6_REG_OACONTROL_COUNTER_SELECT__MASK			0x0000001c
+#define GEN6_REG_OACONTROL_COUNTER_SELECT__SHIFT		2
+#define GEN6_REG_OACONTROL_PERFORMANCE_COUNTER_ENABLE		(0x1 << 0)
+
+
+#define GEN7_REG_MI_PREDICATE_SRC0				0x2400
+
+#define GEN7_REG_MI_PREDICATE_SRC1				0x2408
+
+#define GEN7_REG_MI_PREDICATE_DATA				0x2410
+
+#define GEN7_REG_MI_PREDICATE_RESULT				0x2418
+
+#define GEN75_REG_MI_PREDICATE_RESULT_1				0x241c
+
+#define GEN75_REG_MI_PREDICATE_RESULT_2				0x2214
+
+#define GEN7_REG_3DPRIM_END_OFFSET				0x2420
+
+#define GEN7_REG_3DPRIM_START_VERTEX				0x2430
+
+#define GEN7_REG_3DPRIM_VERTEX_COUNT				0x2434
+
+#define GEN7_REG_3DPRIM_INSTANCE_COUNT				0x2438
+
+#define GEN7_REG_3DPRIM_START_INSTANCE				0x243c
+
+#define GEN7_REG_3DPRIM_BASE_VERTEX				0x2440
+
+#define GEN75_REG_CS_GPR(i0)					(0x2600 + 0x8*(i0))
+#define GEN75_REG_CS_GPR__ESIZE					0x8
+#define GEN75_REG_CS_GPR__LEN					0x10
+
+
+#define GEN6_REG_SO_PRIM_STORAGE_NEEDED				0x2280
+
+#define GEN6_REG_SO_NUM_PRIMS_WRITTEN				0x2288
+
+
+#define GEN7_REG_SO_NUM_PRIMS_WRITTEN(i0)			(0x5200 + 0x8*(i0))
+#define GEN7_REG_SO_NUM_PRIMS_WRITTEN__ESIZE			0x8
+#define GEN7_REG_SO_NUM_PRIMS_WRITTEN__LEN			0x4
+
+#define GEN7_REG_SO_PRIM_STORAGE_NEEDED(i0)			(0x5240 + 0x8*(i0))
+#define GEN7_REG_SO_PRIM_STORAGE_NEEDED__ESIZE			0x8
+#define GEN7_REG_SO_PRIM_STORAGE_NEEDED__LEN			0x4
+
+#define GEN7_REG_SO_WRITE_OFFSET(i0)				(0x5280 + 0x8*(i0))
+#define GEN7_REG_SO_WRITE_OFFSET__ESIZE				0x8
+#define GEN7_REG_SO_WRITE_OFFSET__LEN				0x4
+
+
+#define GEN7_REG_CACHE_MODE_0					0x7000
+#define GEN7_REG_CACHE_MODE_0_HIZ_RAW_STALL_OPT_DISABLE		(0x1 << 2)
+
+#define GEN7_REG_CACHE_MODE_1					0x7004
+#define GEN8_REG_CACHE_MODE_1_HIZ_NP_EARLY_Z_FAILS_DISABLE	(0x1 << 13)
+#define GEN8_REG_CACHE_MODE_1_HIZ_NP_PMA_FIX_ENABLE		(0x1 << 11)
+
+
+#define GEN8_REG_L3CNTLREG					0x7034
+
+
+#define GEN7_REG_L3SQCREG1					0xb010
+#define GEN7_REG_L3SQCREG1_CON4DCUNC				(0x1 << 24)
+#define GEN7_REG_L3SQCREG1_SQGHPCI__MASK			0x00ff0000
+#define GEN7_REG_L3SQCREG1_SQGHPCI__SHIFT			16
+#define GEN7_REG_L3SQCREG1_SQGHPCI_18_6				(0x73 << 16)
+#define GEN75_REG_L3SQCREG1_SQGPCI__MASK			0x00f80000
+#define GEN75_REG_L3SQCREG1_SQGPCI__SHIFT			19
+#define GEN75_REG_L3SQCREG1_SQGPCI_24				(0xc << 19)
+#define GEN75_REG_L3SQCREG1_SQHPCI__MASK			0x0007c000
+#define GEN75_REG_L3SQCREG1_SQHPCI__SHIFT			14
+#define GEN75_REG_L3SQCREG1_SQHPCI_8				(0x4 << 14)
+
+#define GEN7_REG_L3SQCREG2					0xb014
+
+#define GEN7_REG_L3SQCREG3					0xb018
+
+#define GEN7_REG_L3CNTLREG1					0xb01c
+
+#define GEN7_REG_L3CNTLREG2					0xb020
+#define GEN7_REG_L3CNTLREG2_DCWASLMB				(0x1 << 27)
+#define GEN7_REG_L3CNTLREG2_DCWASS__MASK			0x07e00000
+#define GEN7_REG_L3CNTLREG2_DCWASS__SHIFT			21
+#define GEN7_REG_L3CNTLREG2_ROCPSLMB				(0x1 << 20)
+#define GEN7_REG_L3CNTLREG2_RDOCPL__MASK			0x000fc000
+#define GEN7_REG_L3CNTLREG2_RDOCPL__SHIFT			14
+#define GEN7_REG_L3CNTLREG2_URBSLMB				(0x1 << 7)
+#define GEN7_REG_L3CNTLREG2_URBALL__MASK			0x0000007e
+#define GEN7_REG_L3CNTLREG2_URBALL__SHIFT			1
+#define GEN7_REG_L3CNTLREG2_SLMMENB				(0x1 << 0)
+
+#define GEN7_REG_L3CNTLREG3					0xb024
+#define GEN7_REG_L3CNTLREG3_TWALSLMB				(0x1 << 21)
+#define GEN7_REG_L3CNTLREG3_TXWYALL__MASK			0x001f8000
+#define GEN7_REG_L3CNTLREG3_TXWYALL__SHIFT			15
+#define GEN7_REG_L3CNTLREG3_CWASLMB				(0x1 << 14)
+#define GEN7_REG_L3CNTLREG3_CTWYALL__MASK			0x00003f00
+#define GEN7_REG_L3CNTLREG3_CTWYALL__SHIFT			8
+#define GEN7_REG_L3CNTLREG3_ISWYSLMB				(0x1 << 7)
+#define GEN7_REG_L3CNTLREG3_ISWYALL__MASK			0x0000007e
+#define GEN7_REG_L3CNTLREG3_ISWYALL__SHIFT			1
+
+#define GEN6_REG_BCS_SWCTRL					0x22200
+#define GEN6_REG_BCS_SWCTRL_DST_TILING_Y			(0x1 << 1)
+#define GEN6_REG_BCS_SWCTRL_SRC_TILING_Y			(0x1 << 0)
+
+
+#endif /* GEN_REGS_XML */
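
gen_regs.xml.h pairs naturally with the MI header above: the register offsets feed the REG fields of MI_LOAD/STORE_REGISTER_* commands, and indexed registers are exposed as function-like macros. A minimal sketch capturing the GPU timestamp to memory; the 3-dword gen6/7 form and the length-minus-2 bias are assumptions for illustration (the header's __SIZE of 4 covers the wider gen8 form):

    /* Snapshot GEN6_REG_TIMESTAMP into a 4-byte-aligned graphics address. */
    static void gen7_emit_timestamp_store(uint32_t *batch, uint32_t addr)
    {
        batch[0] = GEN6_MI_OPCODE_MI_STORE_REGISTER_MEM | (3 - 2);
        batch[1] = GEN6_REG_TIMESTAMP & GEN6_MI_STORE_REGISTER_MEM_DW1_REG__MASK;
        batch[2] = addr & GEN6_MI_STORE_REGISTER_MEM_DW2_ADDR__MASK;
    }

    /* Indexed registers expand to per-instance offsets, e.g. HSW GPR 4: */
    uint32_t gpr4 = GEN75_REG_CS_GPR(4); /* 0x2600 + 0x8*4 = 0x2620 */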
diff --git a/icd/intel/genhw/gen_render.xml.h b/icd/intel/genhw/gen_render.xml.h
new file mode 100644
index 0000000..2e86ba9
--- /dev/null
+++ b/icd/intel/genhw/gen_render.xml.h
@@ -0,0 +1,294 @@
+#ifndef GEN_RENDER_XML
+#define GEN_RENDER_XML
+
+/* Autogenerated file, DO NOT EDIT manually!
+
+This file was generated by the rules-ng-ng headergen tool in this git repository:
+https://github.com/olvaffe/envytools/
+git clone https://github.com/olvaffe/envytools.git
+
+Copyright (C) 2014-2015 by the following authors:
+- Chia-I Wu <olvaffe@gmail.com> (olv)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice (including the
+next paragraph) shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+
+#define GEN6_RENDER_TYPE__MASK					0xe0000000
+#define GEN6_RENDER_TYPE__SHIFT					29
+#define GEN6_RENDER_TYPE_RENDER					(0x3 << 29)
+#define GEN6_RENDER_SUBTYPE__MASK				0x18000000
+#define GEN6_RENDER_SUBTYPE__SHIFT				27
+#define GEN6_RENDER_SUBTYPE_COMMON				(0x0 << 27)
+#define GEN6_RENDER_SUBTYPE_SINGLE_DW				(0x1 << 27)
+#define GEN6_RENDER_SUBTYPE_MEDIA				(0x2 << 27)
+#define GEN6_RENDER_SUBTYPE_3D					(0x3 << 27)
+#define GEN6_RENDER_OPCODE__MASK				0x07ff0000
+#define GEN6_RENDER_OPCODE__SHIFT				16
+#define GEN6_RENDER_OPCODE_STATE_BASE_ADDRESS			(0x101 << 16)
+#define GEN6_RENDER_OPCODE_STATE_SIP				(0x102 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_VF_STATISTICS		(0xb << 16)
+#define GEN6_RENDER_OPCODE_PIPELINE_SELECT			(0x104 << 16)
+#define GEN6_RENDER_OPCODE_MEDIA_VFE_STATE			(0x0 << 16)
+#define GEN6_RENDER_OPCODE_MEDIA_CURBE_LOAD			(0x1 << 16)
+#define GEN6_RENDER_OPCODE_MEDIA_INTERFACE_DESCRIPTOR_LOAD	(0x2 << 16)
+#define GEN6_RENDER_OPCODE_MEDIA_STATE_FLUSH			(0x4 << 16)
+#define GEN7_RENDER_OPCODE_GPGPU_WALKER				(0x105 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS	(0x1 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_SAMPLER_STATE_POINTERS	(0x2 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_CLEAR_PARAMS			(0x4 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_URB				(0x5 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_DEPTH_BUFFER			(0x5 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_STENCIL_BUFFER		(0x6 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_HIER_DEPTH_BUFFER		(0x7 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_VERTEX_BUFFERS		(0x8 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_VERTEX_ELEMENTS		(0x9 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_INDEX_BUFFER			(0xa << 16)
+#define GEN75_RENDER_OPCODE_3DSTATE_VF				(0xc << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_VIEWPORT_STATE_POINTERS	(0xd << 16)
+#define GEN8_RENDER_OPCODE_3DSTATE_MULTISAMPLE			(0xd << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_CC_STATE_POINTERS		(0xe << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_SCISSOR_STATE_POINTERS	(0xf << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_VS				(0x10 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_GS				(0x11 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_CLIP				(0x12 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_SF				(0x13 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_WM				(0x14 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_CONSTANT_VS			(0x15 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_CONSTANT_GS			(0x16 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_CONSTANT_PS			(0x17 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_SAMPLE_MASK			(0x18 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_CONSTANT_HS			(0x19 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_CONSTANT_DS			(0x1a << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_HS				(0x1b << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_TE				(0x1c << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_DS				(0x1d << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_STREAMOUT			(0x1e << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_SBE				(0x1f << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_PS				(0x20 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP	(0x21 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_VIEWPORT_STATE_POINTERS_CC	(0x23 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_BLEND_STATE_POINTERS		(0x24 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_DEPTH_STENCIL_STATE_POINTERS	(0x25 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS_VS	(0x26 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS_HS	(0x27 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS_DS	(0x28 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS_GS	(0x29 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_BINDING_TABLE_POINTERS_PS	(0x2a << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_SAMPLER_STATE_POINTERS_VS	(0x2b << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_SAMPLER_STATE_POINTERS_HS	(0x2c << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_SAMPLER_STATE_POINTERS_DS	(0x2d << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_SAMPLER_STATE_POINTERS_GS	(0x2e << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_SAMPLER_STATE_POINTERS_PS	(0x2f << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_URB_VS			(0x30 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_URB_HS			(0x31 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_URB_DS			(0x32 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_URB_GS			(0x33 << 16)
+#define GEN8_RENDER_OPCODE_3DSTATE_VF_INSTANCING		(0x49 << 16)
+#define GEN8_RENDER_OPCODE_3DSTATE_VF_SGVS			(0x4a << 16)
+#define GEN8_RENDER_OPCODE_3DSTATE_VF_TOPOLOGY			(0x4b << 16)
+#define GEN8_RENDER_OPCODE_3DSTATE_WM_CHROMAKEY			(0x4c << 16)
+#define GEN8_RENDER_OPCODE_3DSTATE_PS_BLEND			(0x4d << 16)
+#define GEN8_RENDER_OPCODE_3DSTATE_WM_DEPTH_STENCIL		(0x4e << 16)
+#define GEN8_RENDER_OPCODE_3DSTATE_PS_EXTRA			(0x4f << 16)
+#define GEN8_RENDER_OPCODE_3DSTATE_RASTER			(0x50 << 16)
+#define GEN8_RENDER_OPCODE_3DSTATE_SBE_SWIZ			(0x51 << 16)
+#define GEN8_RENDER_OPCODE_3DSTATE_WM_HZ_OP			(0x52 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_DRAWING_RECTANGLE		(0x100 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_DEPTH_BUFFER			(0x105 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_POLY_STIPPLE_OFFSET		(0x106 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_POLY_STIPPLE_PATTERN		(0x107 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_LINE_STIPPLE			(0x108 << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_AA_LINE_PARAMETERS		(0x10a << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_GS_SVB_INDEX			(0x10b << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_MULTISAMPLE			(0x10d << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_STENCIL_BUFFER		(0x10e << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_HIER_DEPTH_BUFFER		(0x10f << 16)
+#define GEN6_RENDER_OPCODE_3DSTATE_CLEAR_PARAMS			(0x110 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_PUSH_CONSTANT_ALLOC_VS	(0x112 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_PUSH_CONSTANT_ALLOC_HS	(0x113 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_PUSH_CONSTANT_ALLOC_DS	(0x114 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_PUSH_CONSTANT_ALLOC_GS	(0x115 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_PUSH_CONSTANT_ALLOC_PS	(0x116 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_SO_DECL_LIST			(0x117 << 16)
+#define GEN7_RENDER_OPCODE_3DSTATE_SO_BUFFER			(0x118 << 16)
+#define GEN8_RENDER_OPCODE_3DSTATE_SAMPLE_PATTERN		(0x11c << 16)
+#define GEN6_RENDER_OPCODE_PIPE_CONTROL				(0x200 << 16)
+#define GEN6_RENDER_OPCODE_3DPRIMITIVE				(0x300 << 16)
+#define GEN6_RENDER_LENGTH__MASK				0x000000ff
+#define GEN6_RENDER_LENGTH__SHIFT				0
+#define GEN6_MOCS_LLC__MASK					0x00000003
+#define GEN6_MOCS_LLC__SHIFT					0
+#define GEN6_MOCS_LLC_PTE					0x0
+#define GEN6_MOCS_LLC_UC					0x1
+#define GEN6_MOCS_LLC_WB					0x2
+#define GEN7_MOCS_LLC__MASK					0x00000002
+#define GEN7_MOCS_LLC__SHIFT					1
+#define GEN7_MOCS_LLC_PTE					(0x0 << 1)
+#define GEN7_MOCS_LLC_WB					(0x1 << 1)
+#define GEN75_MOCS_LLC__MASK					0x00000006
+#define GEN75_MOCS_LLC__SHIFT					1
+#define GEN75_MOCS_LLC_PTE					(0x0 << 1)
+#define GEN75_MOCS_LLC_UC					(0x1 << 1)
+#define GEN75_MOCS_LLC_WB					(0x2 << 1)
+#define GEN75_MOCS_LLC_ELLC					(0x3 << 1)
+#define GEN7_MOCS_L3__MASK					0x00000001
+#define GEN7_MOCS_L3__SHIFT					0
+#define GEN7_MOCS_L3_UC						0x0
+#define GEN7_MOCS_L3_WB						0x1
+#define GEN8_MOCS_MT__MASK					0x00000060
+#define GEN8_MOCS_MT__SHIFT					5
+#define GEN8_MOCS_MT_PTE					(0x0 << 5)
+#define GEN8_MOCS_MT_UC						(0x1 << 5)
+#define GEN8_MOCS_MT_WT						(0x2 << 5)
+#define GEN8_MOCS_MT_WB						(0x3 << 5)
+#define GEN8_MOCS_CT__MASK					0x00000018
+#define GEN8_MOCS_CT__SHIFT					3
+#define GEN8_MOCS_CT_ELLC					(0x0 << 3)
+#define GEN8_MOCS_CT_LLC_ONLY					(0x1 << 3)
+#define GEN8_MOCS_CT_LLC					(0x2 << 3)
+#define GEN8_MOCS_CT_L3						(0x3 << 3)
+#define GEN9_MOCS__MASK						0x0000007f
+#define GEN9_MOCS__SHIFT					0
+#define GEN9_MOCS_MT_WT_CT_L3					0x5
+#define GEN9_MOCS_MT_WB_CT_L3					0x9
+#define GEN6_SBA_ADDR__MASK					0xfffff000
+#define GEN6_SBA_ADDR__SHIFT					12
+#define GEN6_SBA_ADDR__SHR					12
+#define GEN6_SBA_MOCS__MASK					0x00000f00
+#define GEN6_SBA_MOCS__SHIFT					8
+#define GEN8_SBA_MOCS__MASK					0x000007f0
+#define GEN8_SBA_MOCS__SHIFT					4
+#define GEN6_SBA_ADDR_MODIFIED					(0x1 << 0)
+#define GEN6_BINDING_TABLE_ADDR__MASK				0x0000ffe0
+#define GEN6_BINDING_TABLE_ADDR__SHIFT				5
+#define GEN6_BINDING_TABLE_ADDR__SHR				5
+#define GEN6_STATE_BASE_ADDRESS__SIZE				19
+
+
+#define GEN6_SBA_DW1_GENERAL_STATELESS_MOCS__MASK		0x000000f0
+#define GEN6_SBA_DW1_GENERAL_STATELESS_MOCS__SHIFT		4
+#define GEN6_SBA_DW1_GENERAL_STATELESS_FORCE_WRITE_THRU		(0x1 << 3)
+
+
+
+
+
+
+
+
+
+
+
+
+
+#define GEN8_SBA_DW3_STATELESS_MOCS__MASK			0x007f0000
+#define GEN8_SBA_DW3_STATELESS_MOCS__SHIFT			16
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+#define GEN6_STATE_SIP__SIZE					3
+
+
+#define GEN6_SIP_DW1_KERNEL_ADDR__MASK				0xfffffff0
+#define GEN6_SIP_DW1_KERNEL_ADDR__SHIFT				4
+#define GEN6_SIP_DW1_KERNEL_ADDR__SHR				4
+
+
+#define GEN6_PIPELINE_SELECT__SIZE				1
+
+#define GEN6_PIPELINE_SELECT_DW0_SELECT__MASK			0x00000003
+#define GEN6_PIPELINE_SELECT_DW0_SELECT__SHIFT			0
+#define GEN6_PIPELINE_SELECT_DW0_SELECT_3D			0x0
+#define GEN6_PIPELINE_SELECT_DW0_SELECT_MEDIA			0x1
+#define GEN7_PIPELINE_SELECT_DW0_SELECT_GPGPU			0x2
+#define GEN9_PIPELINE_SELECT_DW0_SELECT__MASK			0x00000700
+#define GEN9_PIPELINE_SELECT_DW0_SELECT__SHIFT			8
+#define GEN9_PIPELINE_SELECT_DW0_SELECT_3D			(0x3 << 8)
+
+#define GEN6_PIPE_CONTROL__SIZE					6
+
+
+#define GEN7_PIPE_CONTROL_USE_GGTT				(0x1 << 24)
+#define GEN7_PIPE_CONTROL_LRI_WRITE__MASK			0x00800000
+#define GEN7_PIPE_CONTROL_LRI_WRITE__SHIFT			23
+#define GEN7_PIPE_CONTROL_LRI_WRITE_NONE			(0x0 << 23)
+#define GEN7_PIPE_CONTROL_LRI_WRITE_IMM				(0x1 << 23)
+#define GEN6_PIPE_CONTROL_PROTECTED_MEMORY_ENABLE		(0x1 << 22)
+#define GEN6_PIPE_CONTROL_STORE_DATA_INDEX			(0x1 << 21)
+#define GEN6_PIPE_CONTROL_CS_STALL				(0x1 << 20)
+#define GEN6_PIPE_CONTROL_GLOBAL_SNAPSHOT_COUNT_RESET		(0x1 << 19)
+#define GEN6_PIPE_CONTROL_TLB_INVALIDATE			(0x1 << 18)
+#define GEN6_PIPE_CONTROL_SYNC_GFDT_SURFACE			(0x1 << 17)
+#define GEN6_PIPE_CONTROL_GENERIC_MEDIA_STATE_CLEAR		(0x1 << 16)
+#define GEN6_PIPE_CONTROL_WRITE__MASK				0x0000c000
+#define GEN6_PIPE_CONTROL_WRITE__SHIFT				14
+#define GEN6_PIPE_CONTROL_WRITE_NONE				(0x0 << 14)
+#define GEN6_PIPE_CONTROL_WRITE_IMM				(0x1 << 14)
+#define GEN6_PIPE_CONTROL_WRITE_PS_DEPTH_COUNT			(0x2 << 14)
+#define GEN6_PIPE_CONTROL_WRITE_TIMESTAMP			(0x3 << 14)
+#define GEN6_PIPE_CONTROL_DEPTH_STALL				(0x1 << 13)
+#define GEN6_PIPE_CONTROL_RENDER_CACHE_FLUSH			(0x1 << 12)
+#define GEN6_PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE		(0x1 << 11)
+#define GEN6_PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE		(0x1 << 10)
+#define GEN6_PIPE_CONTROL_INDIRECT_STATE_POINTERS_DISABLE	(0x1 << 9)
+#define GEN6_PIPE_CONTROL_NOTIFY_ENABLE				(0x1 << 8)
+#define GEN7_PIPE_CONTROL_WRITE_IMM_FLUSH			(0x1 << 7)
+#define GEN6_PIPE_CONTROL_PROTECTED_MEMORY_APP_ID__MASK		0x00000040
+#define GEN6_PIPE_CONTROL_PROTECTED_MEMORY_APP_ID__SHIFT	6
+#define GEN7_PIPE_CONTROL_DC_FLUSH				(0x1 << 5)
+#define GEN6_PIPE_CONTROL_VF_CACHE_INVALIDATE			(0x1 << 4)
+#define GEN6_PIPE_CONTROL_CONSTANT_CACHE_INVALIDATE		(0x1 << 3)
+#define GEN6_PIPE_CONTROL_STATE_CACHE_INVALIDATE		(0x1 << 2)
+#define GEN6_PIPE_CONTROL_PIXEL_SCOREBOARD_STALL		(0x1 << 1)
+#define GEN6_PIPE_CONTROL_DEPTH_CACHE_FLUSH			(0x1 << 0)
+
+#define GEN6_PIPE_CONTROL_DW2_USE_GGTT				(0x1 << 2)
+#define GEN6_PIPE_CONTROL_DW2_ADDR__MASK			0xfffffff8
+#define GEN6_PIPE_CONTROL_DW2_ADDR__SHIFT			3
+#define GEN6_PIPE_CONTROL_DW2_ADDR__SHR				3
+
+#define GEN7_PIPE_CONTROL_DW2_ADDR__MASK			0xfffffffc
+#define GEN7_PIPE_CONTROL_DW2_ADDR__SHIFT			2
+#define GEN7_PIPE_CONTROL_DW2_ADDR__SHR				2
+
+#define GEN8_PIPE_CONTROL_DW2_ADDR__MASK			0xfffffffc
+#define GEN8_PIPE_CONTROL_DW2_ADDR__SHIFT			2
+#define GEN8_PIPE_CONTROL_DW2_ADDR__SHR				2
+
+
+
+
+
+#endif /* GEN_RENDER_XML */
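
Render-pipeline commands split DW0 across separate TYPE, SUBTYPE, OPCODE and LENGTH fields, so a full header is several of these defines OR'd together. A minimal sketch of a PIPE_CONTROL header and flush flags; note that __SIZE here is the maximum across generations and the dwords-minus-2 length bias is again an assumption for illustration:

    /* DW0: command header for PIPE_CONTROL on the 3D pipeline. */
    uint32_t dw0 = GEN6_RENDER_TYPE_RENDER | GEN6_RENDER_SUBTYPE_3D |
                   GEN6_RENDER_OPCODE_PIPE_CONTROL |
                   ((GEN6_PIPE_CONTROL__SIZE - 2) & GEN6_RENDER_LENGTH__MASK);

    /* DW1: single-bit flags compose by OR, e.g. a stalling cache flush. */
    uint32_t dw1 = GEN6_PIPE_CONTROL_CS_STALL |
                   GEN6_PIPE_CONTROL_RENDER_CACHE_FLUSH |
                   GEN6_PIPE_CONTROL_DEPTH_CACHE_FLUSH;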
diff --git a/icd/intel/genhw/gen_render_3d.xml.h b/icd/intel/genhw/gen_render_3d.xml.h
new file mode 100644
index 0000000..d25542e
--- /dev/null
+++ b/icd/intel/genhw/gen_render_3d.xml.h
@@ -0,0 +1,1797 @@
+#ifndef GEN_RENDER_3D_XML
+#define GEN_RENDER_3D_XML
+
+/* Autogenerated file, DO NOT EDIT manually!
+
+This file was generated by the rules-ng-ng headergen tool in this git repository:
+https://github.com/olvaffe/envytools/
+git clone https://github.com/olvaffe/envytools.git
+
+Copyright (C) 2014-2015 by the following authors:
+- Chia-I Wu <olvaffe@gmail.com> (olv)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice (including the
+next paragraph) shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+
+enum gen_prim_type {
+    GEN6_3DPRIM_POINTLIST				      = 0x1,
+    GEN6_3DPRIM_LINELIST				      = 0x2,
+    GEN6_3DPRIM_LINESTRIP				      = 0x3,
+    GEN6_3DPRIM_TRILIST					      = 0x4,
+    GEN6_3DPRIM_TRISTRIP				      = 0x5,
+    GEN6_3DPRIM_TRIFAN					      = 0x6,
+    GEN6_3DPRIM_QUADLIST				      = 0x7,
+    GEN6_3DPRIM_QUADSTRIP				      = 0x8,
+    GEN6_3DPRIM_LINELIST_ADJ				      = 0x9,
+    GEN6_3DPRIM_LINESTRIP_ADJ				      = 0xa,
+    GEN6_3DPRIM_TRILIST_ADJ				      = 0xb,
+    GEN6_3DPRIM_TRISTRIP_ADJ				      = 0xc,
+    GEN6_3DPRIM_TRISTRIP_REVERSE			      = 0xd,
+    GEN6_3DPRIM_POLYGON					      = 0xe,
+    GEN6_3DPRIM_RECTLIST				      = 0xf,
+    GEN6_3DPRIM_LINELOOP				      = 0x10,
+    GEN6_3DPRIM_POINTLIST_BF				      = 0x11,
+    GEN6_3DPRIM_LINESTRIP_CONT				      = 0x12,
+    GEN6_3DPRIM_LINESTRIP_BF				      = 0x13,
+    GEN6_3DPRIM_LINESTRIP_CONT_BF			      = 0x14,
+    GEN6_3DPRIM_TRIFAN_NOSTIPPLE			      = 0x16,
+    GEN7_3DPRIM_PATCHLIST_1				      = 0x20,
+    GEN7_3DPRIM_PATCHLIST_2				      = 0x21,
+    GEN7_3DPRIM_PATCHLIST_3				      = 0x22,
+    GEN7_3DPRIM_PATCHLIST_4				      = 0x23,
+    GEN7_3DPRIM_PATCHLIST_5				      = 0x24,
+    GEN7_3DPRIM_PATCHLIST_6				      = 0x25,
+    GEN7_3DPRIM_PATCHLIST_7				      = 0x26,
+    GEN7_3DPRIM_PATCHLIST_8				      = 0x27,
+    GEN7_3DPRIM_PATCHLIST_9				      = 0x28,
+    GEN7_3DPRIM_PATCHLIST_10				      = 0x29,
+    GEN7_3DPRIM_PATCHLIST_11				      = 0x2a,
+    GEN7_3DPRIM_PATCHLIST_12				      = 0x2b,
+    GEN7_3DPRIM_PATCHLIST_13				      = 0x2c,
+    GEN7_3DPRIM_PATCHLIST_14				      = 0x2d,
+    GEN7_3DPRIM_PATCHLIST_15				      = 0x2e,
+    GEN7_3DPRIM_PATCHLIST_16				      = 0x2f,
+    GEN7_3DPRIM_PATCHLIST_17				      = 0x30,
+    GEN7_3DPRIM_PATCHLIST_18				      = 0x31,
+    GEN7_3DPRIM_PATCHLIST_19				      = 0x32,
+    GEN7_3DPRIM_PATCHLIST_20				      = 0x33,
+    GEN7_3DPRIM_PATCHLIST_21				      = 0x34,
+    GEN7_3DPRIM_PATCHLIST_22				      = 0x35,
+    GEN7_3DPRIM_PATCHLIST_23				      = 0x36,
+    GEN7_3DPRIM_PATCHLIST_24				      = 0x37,
+    GEN7_3DPRIM_PATCHLIST_25				      = 0x38,
+    GEN7_3DPRIM_PATCHLIST_26				      = 0x39,
+    GEN7_3DPRIM_PATCHLIST_27				      = 0x3a,
+    GEN7_3DPRIM_PATCHLIST_28				      = 0x3b,
+    GEN7_3DPRIM_PATCHLIST_29				      = 0x3c,
+    GEN7_3DPRIM_PATCHLIST_30				      = 0x3d,
+    GEN7_3DPRIM_PATCHLIST_31				      = 0x3e,
+    GEN7_3DPRIM_PATCHLIST_32				      = 0x3f,
+};
+
+enum gen_state_alignment {
+    GEN6_ALIGNMENT_COLOR_CALC_STATE			      = 0x40,
+    GEN6_ALIGNMENT_DEPTH_STENCIL_STATE			      = 0x40,
+    GEN6_ALIGNMENT_BLEND_STATE				      = 0x40,
+    GEN6_ALIGNMENT_CLIP_VIEWPORT			      = 0x20,
+    GEN6_ALIGNMENT_SF_VIEWPORT				      = 0x20,
+    GEN7_ALIGNMENT_SF_CLIP_VIEWPORT			      = 0x40,
+    GEN6_ALIGNMENT_CC_VIEWPORT				      = 0x20,
+    GEN6_ALIGNMENT_SCISSOR_RECT				      = 0x20,
+    GEN6_ALIGNMENT_BINDING_TABLE_STATE			      = 0x20,
+    GEN6_ALIGNMENT_SAMPLER_BORDER_COLOR_STATE		      = 0x20,
+    GEN8_ALIGNMENT_SAMPLER_BORDER_COLOR_STATE		      = 0x40,
+    GEN6_ALIGNMENT_SAMPLER_STATE			      = 0x20,
+    GEN6_ALIGNMENT_SURFACE_STATE			      = 0x20,
+    GEN8_ALIGNMENT_SURFACE_STATE			      = 0x40,
+};
+
+enum gen_vf_component {
+    GEN6_VFCOMP_NOSTORE					      = 0x0,
+    GEN6_VFCOMP_STORE_SRC				      = 0x1,
+    GEN6_VFCOMP_STORE_0					      = 0x2,
+    GEN6_VFCOMP_STORE_1_FP				      = 0x3,
+    GEN6_VFCOMP_STORE_1_INT				      = 0x4,
+    GEN6_VFCOMP_STORE_VID				      = 0x5,
+    GEN6_VFCOMP_STORE_IID				      = 0x6,
+};
+
+enum gen_depth_format {
+    GEN6_ZFORMAT_D32_FLOAT_S8X24_UINT			      = 0x0,
+    GEN6_ZFORMAT_D32_FLOAT				      = 0x1,
+    GEN6_ZFORMAT_D24_UNORM_S8_UINT			      = 0x2,
+    GEN6_ZFORMAT_D24_UNORM_X8_UINT			      = 0x3,
+    GEN6_ZFORMAT_D16_UNORM				      = 0x5,
+};
+
+#define GEN6_INTERP_NONPERSPECTIVE_SAMPLE			(0x1 << 5)
+#define GEN6_INTERP_NONPERSPECTIVE_CENTROID			(0x1 << 4)
+#define GEN6_INTERP_NONPERSPECTIVE_PIXEL			(0x1 << 3)
+#define GEN6_INTERP_PERSPECTIVE_SAMPLE				(0x1 << 2)
+#define GEN6_INTERP_PERSPECTIVE_CENTROID			(0x1 << 1)
+#define GEN6_INTERP_PERSPECTIVE_PIXEL				(0x1 << 0)
+#define GEN6_PS_DISPATCH_32					(0x1 << 2)
+#define GEN6_PS_DISPATCH_16					(0x1 << 1)
+#define GEN6_PS_DISPATCH_8					(0x1 << 0)
+#define GEN6_THREADDISP_SPF					(0x1 << 31)
+#define GEN6_THREADDISP_VME					(0x1 << 30)
+#define GEN6_THREADDISP_SAMPLER_COUNT__MASK			0x38000000
+#define GEN6_THREADDISP_SAMPLER_COUNT__SHIFT			27
+#define GEN7_THREADDISP_DENORMAL__MASK				0x04000000
+#define GEN7_THREADDISP_DENORMAL__SHIFT				26
+#define GEN7_THREADDISP_DENORMAL_FTZ				(0x0 << 26)
+#define GEN7_THREADDISP_DENORMAL_RET				(0x1 << 26)
+#define GEN6_THREADDISP_BINDING_TABLE_SIZE__MASK		0x03fc0000
+#define GEN6_THREADDISP_BINDING_TABLE_SIZE__SHIFT		18
+#define GEN6_THREADDISP_PRIORITY_HIGH				(0x1 << 17)
+#define GEN6_THREADDISP_FP_MODE_ALT				(0x1 << 16)
+#define GEN7_ROUNDING_MODE__MASK				0x0000c000
+#define GEN7_ROUNDING_MODE__SHIFT				14
+#define GEN7_ROUNDING_MODE_RTNE					(0x0 << 14)
+#define GEN7_ROUNDING_MODE_RU					(0x1 << 14)
+#define GEN7_ROUNDING_MODE_RD					(0x2 << 14)
+#define GEN7_ROUNDING_MODE_RTZ					(0x3 << 14)
+#define GEN6_THREADDISP_ILLEGAL_CODE_EXCEPTION			(0x1 << 13)
+#define GEN75_THREADDISP_ACCESS_UAV				(0x1 << 12)
+#define GEN6_THREADDISP_MASK_STACK_EXCEPTION			(0x1 << 11)
+#define GEN6_THREADDISP_SOFTWARE_EXCEPTION			(0x1 << 7)
+#define GEN6_THREADSCRATCH_ADDR__MASK				0xfffffc00
+#define GEN6_THREADSCRATCH_ADDR__SHIFT				10
+#define GEN6_THREADSCRATCH_ADDR__SHR				10
+#define GEN6_THREADSCRATCH_SPACE_PER_THREAD__MASK		0x0000000f
+#define GEN6_THREADSCRATCH_SPACE_PER_THREAD__SHIFT		0
+#define GEN6_3DSTATE_VF_STATISTICS__SIZE			1
+
+#define GEN6_VF_STATS_DW0_ENABLE				(0x1 << 0)
+
+#define GEN6_3DSTATE_BINDING_TABLE_POINTERS__SIZE		4
+
+#define GEN6_BINDING_TABLE_PTR_DW0_PS_CHANGED			(0x1 << 12)
+#define GEN6_BINDING_TABLE_PTR_DW0_GS_CHANGED			(0x1 << 9)
+#define GEN6_BINDING_TABLE_PTR_DW0_VS_CHANGED			(0x1 << 8)
+
+
+
+
+#define GEN6_3DSTATE_SAMPLER_STATE_POINTERS__SIZE		4
+
+#define GEN6_SAMPLER_PTR_DW0_PS_CHANGED				(0x1 << 12)
+#define GEN6_SAMPLER_PTR_DW0_GS_CHANGED				(0x1 << 9)
+#define GEN6_SAMPLER_PTR_DW0_VS_CHANGED				(0x1 << 8)
+
+#define GEN6_SAMPLER_PTR_DW1_VS_ADDR__MASK			0xffffffe0
+#define GEN6_SAMPLER_PTR_DW1_VS_ADDR__SHIFT			5
+#define GEN6_SAMPLER_PTR_DW1_VS_ADDR__SHR			5
+
+#define GEN6_SAMPLER_PTR_DW2_GS_ADDR__MASK			0xffffffe0
+#define GEN6_SAMPLER_PTR_DW2_GS_ADDR__SHIFT			5
+#define GEN6_SAMPLER_PTR_DW2_GS_ADDR__SHR			5
+
+#define GEN6_SAMPLER_PTR_DW3_PS_ADDR__MASK			0xffffffe0
+#define GEN6_SAMPLER_PTR_DW3_PS_ADDR__SHIFT			5
+#define GEN6_SAMPLER_PTR_DW3_PS_ADDR__SHR			5
+
+#define GEN6_3DSTATE_URB__SIZE					3
+
+
+#define GEN6_URB_DW1_VS_ENTRY_SIZE__MASK			0x00ff0000
+#define GEN6_URB_DW1_VS_ENTRY_SIZE__SHIFT			16
+#define GEN6_URB_DW1_VS_ENTRY_COUNT__MASK			0x0000ffff
+#define GEN6_URB_DW1_VS_ENTRY_COUNT__SHIFT			0
+#define GEN6_URB_DW1_VS_ENTRY_COUNT__ALIGN			4
+
+#define GEN6_URB_DW2_GS_ENTRY_COUNT__MASK			0x0003ff00
+#define GEN6_URB_DW2_GS_ENTRY_COUNT__SHIFT			8
+#define GEN6_URB_DW2_GS_ENTRY_COUNT__ALIGN			4
+#define GEN6_URB_DW2_GS_ENTRY_SIZE__MASK			0x00000007
+#define GEN6_URB_DW2_GS_ENTRY_SIZE__SHIFT			0
+
+#define GEN7_3DSTATE_URB_ANY__SIZE				2
+
+
+#define GEN7_URB_DW1_OFFSET__MASK				0x3e000000
+#define GEN7_URB_DW1_OFFSET__SHIFT				25
+#define GEN7_URB_DW1_ENTRY_SIZE__MASK				0x01ff0000
+#define GEN7_URB_DW1_ENTRY_SIZE__SHIFT				16
+#define GEN7_URB_DW1_ENTRY_COUNT__MASK				0x0000ffff
+#define GEN7_URB_DW1_ENTRY_COUNT__SHIFT				0
+
+#define GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_ANY__SIZE		2
+
+
+#define GEN7_PCB_ALLOC_DW1_OFFSET__MASK				0x000f0000
+#define GEN7_PCB_ALLOC_DW1_OFFSET__SHIFT			16
+#define GEN7_PCB_ALLOC_DW1_SIZE__MASK				0x0000001f
+#define GEN7_PCB_ALLOC_DW1_SIZE__SHIFT				0
+
+#define GEN75_PCB_ALLOC_DW1_OFFSET__MASK			0x001f0000
+#define GEN75_PCB_ALLOC_DW1_OFFSET__SHIFT			16
+#define GEN75_PCB_ALLOC_DW1_SIZE__MASK				0x0000003f
+#define GEN75_PCB_ALLOC_DW1_SIZE__SHIFT				0
+
+#define GEN6_3DSTATE_VERTEX_BUFFERS__SIZE			133
+
+
+
+#define GEN6_VB_DW0_INDEX__MASK					0xfc000000
+#define GEN6_VB_DW0_INDEX__SHIFT				26
+#define GEN8_VB_DW0_MOCS__MASK					0x007f0000
+#define GEN8_VB_DW0_MOCS__SHIFT					16
+#define GEN6_VB_DW0_ACCESS__MASK				0x00100000
+#define GEN6_VB_DW0_ACCESS__SHIFT				20
+#define GEN6_VB_DW0_ACCESS_VERTEXDATA				(0x0 << 20)
+#define GEN6_VB_DW0_ACCESS_INSTANCEDATA				(0x1 << 20)
+#define GEN6_VB_DW0_MOCS__MASK					0x000f0000
+#define GEN6_VB_DW0_MOCS__SHIFT					16
+#define GEN7_VB_DW0_ADDR_MODIFIED				(0x1 << 14)
+#define GEN6_VB_DW0_IS_NULL					(0x1 << 13)
+#define GEN6_VB_DW0_CACHE_INVALIDATE				(0x1 << 12)
+#define GEN6_VB_DW0_PITCH__MASK					0x00000fff
+#define GEN6_VB_DW0_PITCH__SHIFT				0
+
+
+
+
+
+
+
+#define GEN6_3DSTATE_VERTEX_ELEMENTS__SIZE			69
+
+
+
+#define GEN6_VE_DW0_VB_INDEX__MASK				0xfc000000
+#define GEN6_VE_DW0_VB_INDEX__SHIFT				26
+#define GEN6_VE_DW0_VALID					(0x1 << 25)
+#define GEN6_VE_DW0_FORMAT__MASK				0x01ff0000
+#define GEN6_VE_DW0_FORMAT__SHIFT				16
+#define GEN6_VE_DW0_EDGE_FLAG_ENABLE				(0x1 << 15)
+#define GEN6_VE_DW0_VB_OFFSET__MASK				0x000007ff
+#define GEN6_VE_DW0_VB_OFFSET__SHIFT				0
+#define GEN75_VE_DW0_VB_OFFSET__MASK				0x00000fff
+#define GEN75_VE_DW0_VB_OFFSET__SHIFT				0
+
+#define GEN6_VE_DW1_COMP0__MASK					0x70000000
+#define GEN6_VE_DW1_COMP0__SHIFT				28
+#define GEN6_VE_DW1_COMP1__MASK					0x07000000
+#define GEN6_VE_DW1_COMP1__SHIFT				24
+#define GEN6_VE_DW1_COMP2__MASK					0x00700000
+#define GEN6_VE_DW1_COMP2__SHIFT				20
+#define GEN6_VE_DW1_COMP3__MASK					0x00070000
+#define GEN6_VE_DW1_COMP3__SHIFT				16
+
+#define GEN6_3DSTATE_INDEX_BUFFER__SIZE				5
+
+#define GEN6_IB_DW0_MOCS__MASK					0x0000f000
+#define GEN6_IB_DW0_MOCS__SHIFT					12
+#define GEN6_IB_DW0_CUT_INDEX_ENABLE				(0x1 << 10)
+#define GEN6_IB_DW0_FORMAT__MASK				0x00000300
+#define GEN6_IB_DW0_FORMAT__SHIFT				8
+#define GEN6_IB_DW0_FORMAT_BYTE					(0x0 << 8)
+#define GEN6_IB_DW0_FORMAT_WORD					(0x1 << 8)
+#define GEN6_IB_DW0_FORMAT_DWORD				(0x2 << 8)
+
+
+
+
+
+#define GEN8_IB_DW1_FORMAT__MASK				0x00000300
+#define GEN8_IB_DW1_FORMAT__SHIFT				8
+#define GEN8_IB_DW1_FORMAT_BYTE					(0x0 << 8)
+#define GEN8_IB_DW1_FORMAT_WORD					(0x1 << 8)
+#define GEN8_IB_DW1_FORMAT_DWORD				(0x2 << 8)
+#define GEN8_IB_DW1_MOCS__MASK					0x0000007f
+#define GEN8_IB_DW1_MOCS__SHIFT					0
+
+
+
+
+#define GEN75_3DSTATE_VF__SIZE					2
+
+#define GEN75_VF_DW0_CUT_INDEX_ENABLE				(0x1 << 8)
+
+
+#define GEN8_3DSTATE_VF_INSTANCING__SIZE			3
+
+
+#define GEN8_INSTANCING_DW1_ENABLE				(0x1 << 8)
+#define GEN8_INSTANCING_DW1_VB_INDEX__MASK			0x0000003f
+#define GEN8_INSTANCING_DW1_VB_INDEX__SHIFT			0
+
+
+#define GEN8_3DSTATE_VF_SGVS__SIZE				2
+
+
+#define GEN8_SGVS_DW1_IID_ENABLE				(0x1 << 31)
+#define GEN8_SGVS_DW1_IID_VE_COMP__MASK				0x60000000
+#define GEN8_SGVS_DW1_IID_VE_COMP__SHIFT			29
+#define GEN8_SGVS_DW1_IID_VE_INDEX__MASK			0x003f0000
+#define GEN8_SGVS_DW1_IID_VE_INDEX__SHIFT			16
+#define GEN8_SGVS_DW1_VID_ENABLE				(0x1 << 15)
+#define GEN8_SGVS_DW1_VID_VE_COMP__MASK				0x00006000
+#define GEN8_SGVS_DW1_VID_VE_COMP__SHIFT			13
+#define GEN8_SGVS_DW1_VID_VE_INDEX__MASK			0x0000003f
+#define GEN8_SGVS_DW1_VID_VE_INDEX__SHIFT			0
+
+#define GEN8_3DSTATE_VF_TOPOLOGY__SIZE				2
+
+
+#define GEN8_TOPOLOGY_DW1_TYPE__MASK				0x0000003f
+#define GEN8_TOPOLOGY_DW1_TYPE__SHIFT				0
+
+#define GEN6_3DSTATE_VIEWPORT_STATE_POINTERS__SIZE		4
+
+#define GEN6_VP_PTR_DW0_CC_CHANGED				(0x1 << 12)
+#define GEN6_VP_PTR_DW0_SF_CHANGED				(0x1 << 11)
+#define GEN6_VP_PTR_DW0_CLIP_CHANGED				(0x1 << 10)
+
+#define GEN6_VP_PTR_DW1_CLIP_ADDR__MASK				0xffffffe0
+#define GEN6_VP_PTR_DW1_CLIP_ADDR__SHIFT			5
+#define GEN6_VP_PTR_DW1_CLIP_ADDR__SHR				5
+
+#define GEN6_VP_PTR_DW2_SF_ADDR__MASK				0xffffffe0
+#define GEN6_VP_PTR_DW2_SF_ADDR__SHIFT				5
+#define GEN6_VP_PTR_DW2_SF_ADDR__SHR				5
+
+#define GEN6_VP_PTR_DW3_CC_ADDR__MASK				0xffffffe0
+#define GEN6_VP_PTR_DW3_CC_ADDR__SHIFT				5
+#define GEN6_VP_PTR_DW3_CC_ADDR__SHR				5
+
+#define GEN6_3DSTATE_CC_STATE_POINTERS__SIZE			4
+
+
+#define GEN6_CC_PTR_DW1_BLEND_CHANGED				(0x1 << 0)
+#define GEN6_CC_PTR_DW1_BLEND_ADDR__MASK			0xffffffc0
+#define GEN6_CC_PTR_DW1_BLEND_ADDR__SHIFT			6
+#define GEN6_CC_PTR_DW1_BLEND_ADDR__SHR				6
+
+#define GEN6_CC_PTR_DW2_ZS_CHANGED				(0x1 << 0)
+#define GEN6_CC_PTR_DW2_ZS_ADDR__MASK				0xffffffc0
+#define GEN6_CC_PTR_DW2_ZS_ADDR__SHIFT				6
+#define GEN6_CC_PTR_DW2_ZS_ADDR__SHR				6
+
+#define GEN6_CC_PTR_DW3_CC_CHANGED				(0x1 << 0)
+#define GEN6_CC_PTR_DW3_CC_ADDR__MASK				0xffffffc0
+#define GEN6_CC_PTR_DW3_CC_ADDR__SHIFT				6
+#define GEN6_CC_PTR_DW3_CC_ADDR__SHR				6
+
+#define GEN6_3DSTATE_SCISSOR_STATE_POINTERS__SIZE		2
+
+
+#define GEN6_SCISSOR_PTR_DW1_ADDR__MASK				0xffffffe0
+#define GEN6_SCISSOR_PTR_DW1_ADDR__SHIFT			5
+#define GEN6_SCISSOR_PTR_DW1_ADDR__SHR				5
+
+#define GEN7_3DSTATE_POINTERS_ANY__SIZE				2
+
+
+
+#define GEN6_3DSTATE_VS__SIZE					9
+
+
+#define GEN6_VS_DW1_KERNEL_ADDR__MASK				0xffffffc0
+#define GEN6_VS_DW1_KERNEL_ADDR__SHIFT				6
+#define GEN6_VS_DW1_KERNEL_ADDR__SHR				6
+
+
+
+#define GEN6_VS_DW4_URB_GRF_START__MASK				0x01f00000
+#define GEN6_VS_DW4_URB_GRF_START__SHIFT			20
+#define GEN6_VS_DW4_URB_READ_LEN__MASK				0x0001f800
+#define GEN6_VS_DW4_URB_READ_LEN__SHIFT				11
+#define GEN6_VS_DW4_URB_READ_OFFSET__MASK			0x000003f0
+#define GEN6_VS_DW4_URB_READ_OFFSET__SHIFT			4
+
+#define GEN6_VS_DW5_MAX_THREADS__MASK				0xfe000000
+#define GEN6_VS_DW5_MAX_THREADS__SHIFT				25
+#define GEN75_VS_DW5_MAX_THREADS__MASK				0xff800000
+#define GEN75_VS_DW5_MAX_THREADS__SHIFT				23
+#define GEN6_VS_DW5_STATISTICS					(0x1 << 10)
+#define GEN6_VS_DW5_CACHE_DISABLE				(0x1 << 1)
+#define GEN6_VS_DW5_VS_ENABLE					(0x1 << 0)
+
+
+
+#define GEN8_VS_DW1_KERNEL_ADDR__MASK				0xffffffc0
+#define GEN8_VS_DW1_KERNEL_ADDR__SHIFT				6
+#define GEN8_VS_DW1_KERNEL_ADDR__SHR				6
+
+
+
+
+
+#define GEN8_VS_DW6_URB_GRF_START__MASK				0x01f00000
+#define GEN8_VS_DW6_URB_GRF_START__SHIFT			20
+#define GEN8_VS_DW6_URB_READ_LEN__MASK				0x0001f800
+#define GEN8_VS_DW6_URB_READ_LEN__SHIFT				11
+#define GEN8_VS_DW6_URB_READ_OFFSET__MASK			0x000003f0
+#define GEN8_VS_DW6_URB_READ_OFFSET__SHIFT			4
+
+#define GEN8_VS_DW7_MAX_THREADS__MASK				0xff800000
+#define GEN8_VS_DW7_MAX_THREADS__SHIFT				23
+#define GEN8_VS_DW7_STATISTICS					(0x1 << 10)
+#define GEN8_VS_DW7_SIMD8_ENABLE				(0x1 << 2)
+#define GEN8_VS_DW7_CACHE_DISABLE				(0x1 << 1)
+#define GEN8_VS_DW7_VS_ENABLE					(0x1 << 0)
+
+#define GEN8_VS_DW8_URB_WRITE_OFFSET__MASK			0x03e00000
+#define GEN8_VS_DW8_URB_WRITE_OFFSET__SHIFT			21
+#define GEN8_VS_DW8_URB_WRITE_LEN__MASK				0x001f0000
+#define GEN8_VS_DW8_URB_WRITE_LEN__SHIFT			16
+#define GEN8_VS_DW8_UCP_CLIP_ENABLES__MASK			0x0000ff00
+#define GEN8_VS_DW8_UCP_CLIP_ENABLES__SHIFT			8
+
+#define GEN7_3DSTATE_HS__SIZE					9
+
+
+#define GEN7_HS_DW1_DISPATCH_MAX_THREADS__MASK			0x0000007f
+#define GEN7_HS_DW1_DISPATCH_MAX_THREADS__SHIFT			0
+#define GEN75_HS_DW1_DISPATCH_MAX_THREADS__MASK			0x000000ff
+#define GEN75_HS_DW1_DISPATCH_MAX_THREADS__SHIFT		0
+
+#define GEN7_HS_DW2_HS_ENABLE					(0x1 << 31)
+#define GEN7_HS_DW2_STATISTICS					(0x1 << 29)
+#define GEN7_HS_DW2_INSTANCE_COUNT__MASK			0x0000000f
+#define GEN7_HS_DW2_INSTANCE_COUNT__SHIFT			0
+
+#define GEN7_HS_DW3_KERNEL_ADDR__MASK				0xffffffc0
+#define GEN7_HS_DW3_KERNEL_ADDR__SHIFT				6
+#define GEN7_HS_DW3_KERNEL_ADDR__SHR				6
+
+
+#define GEN7_HS_DW5_SPF						(0x1 << 27)
+#define GEN7_HS_DW5_VME						(0x1 << 26)
+#define GEN75_HS_DW5_ACCESS_UAV					(0x1 << 25)
+#define GEN7_HS_DW5_INCLUDE_VERTEX_HANDLES			(0x1 << 24)
+#define GEN7_HS_DW5_URB_GRF_START__MASK				0x00f80000
+#define GEN7_HS_DW5_URB_GRF_START__SHIFT			19
+#define GEN7_HS_DW5_URB_READ_LEN__MASK				0x0001f800
+#define GEN7_HS_DW5_URB_READ_LEN__SHIFT				11
+#define GEN7_HS_DW5_URB_READ_OFFSET__MASK			0x000003f0
+#define GEN7_HS_DW5_URB_READ_OFFSET__SHIFT			4
+
+#define GEN7_HS_DW6_URB_SEMAPHORE_ADDR__MASK			0x00000fff
+#define GEN7_HS_DW6_URB_SEMAPHORE_ADDR__SHIFT			0
+#define GEN7_HS_DW6_URB_SEMAPHORE_ADDR__SHR			6
+#define GEN75_HS_DW6_URB_SEMAPHORE_ADDR__MASK			0x00001fff
+#define GEN75_HS_DW6_URB_SEMAPHORE_ADDR__SHIFT			0
+#define GEN75_HS_DW6_URB_SEMAPHORE_ADDR__SHR			6
+
+
+
+#define GEN8_HS_DW1_DISPATCH_MAX_THREADS__MASK			0x000000ff
+#define GEN8_HS_DW1_DISPATCH_MAX_THREADS__SHIFT			0
+
+#define GEN8_HS_DW2_HS_ENABLE					(0x1 << 31)
+#define GEN8_HS_DW2_STATISTICS					(0x1 << 29)
+#define GEN8_HS_DW2_INSTANCE_COUNT__MASK			0x0000000f
+#define GEN8_HS_DW2_INSTANCE_COUNT__SHIFT			0
+
+#define GEN8_HS_DW3_KERNEL_ADDR__MASK				0xffffffc0
+#define GEN8_HS_DW3_KERNEL_ADDR__SHIFT				6
+#define GEN8_HS_DW3_KERNEL_ADDR__SHR				6
+
+
+
+
+#define GEN8_HS_DW7_SPF						(0x1 << 27)
+#define GEN8_HS_DW7_VME						(0x1 << 26)
+#define GEN8_HS_DW7_ACCESS_UAV					(0x1 << 25)
+#define GEN8_HS_DW7_INCLUDE_VERTEX_HANDLES			(0x1 << 24)
+#define GEN8_HS_DW7_URB_GRF_START__MASK				0x00f80000
+#define GEN8_HS_DW7_URB_GRF_START__SHIFT			19
+#define GEN8_HS_DW7_URB_READ_LEN__MASK				0x0001f800
+#define GEN8_HS_DW7_URB_READ_LEN__SHIFT				11
+#define GEN8_HS_DW7_URB_READ_OFFSET__MASK			0x000003f0
+#define GEN8_HS_DW7_URB_READ_OFFSET__SHIFT			4
+
+#define GEN8_HS_DW8_URB_SEMAPHORE_ADDR__MASK			0x00001fff
+#define GEN8_HS_DW8_URB_SEMAPHORE_ADDR__SHIFT			0
+#define GEN8_HS_DW8_URB_SEMAPHORE_ADDR__SHR			6
+
+#define GEN7_3DSTATE_TE__SIZE					4
+
+
+#define GEN7_TE_DW1_PARTITIONING__MASK				0x00003000
+#define GEN7_TE_DW1_PARTITIONING__SHIFT				12
+#define GEN7_TE_DW1_PARTITIONING_INTEGER			(0x0 << 12)
+#define GEN7_TE_DW1_PARTITIONING_ODD_FRACTIONAL			(0x1 << 12)
+#define GEN7_TE_DW1_PARTITIONING_EVEN_FRACTIONAL		(0x2 << 12)
+#define GEN7_TE_DW1_OUTPUT_TOPO__MASK				0x00000300
+#define GEN7_TE_DW1_OUTPUT_TOPO__SHIFT				8
+#define GEN7_TE_DW1_OUTPUT_TOPO_POINT				(0x0 << 8)
+#define GEN7_TE_DW1_OUTPUT_TOPO_LINE				(0x1 << 8)
+#define GEN7_TE_DW1_OUTPUT_TOPO_TRI_CW				(0x2 << 8)
+#define GEN7_TE_DW1_OUTPUT_TOPO_TRI_CCW				(0x3 << 8)
+#define GEN7_TE_DW1_DOMAIN__MASK				0x00000030
+#define GEN7_TE_DW1_DOMAIN__SHIFT				4
+#define GEN7_TE_DW1_DOMAIN_QUAD					(0x0 << 4)
+#define GEN7_TE_DW1_DOMAIN_TRI					(0x1 << 4)
+#define GEN7_TE_DW1_DOMAIN_ISOLINE				(0x2 << 4)
+#define GEN7_TE_DW1_MODE__MASK					0x00000006
+#define GEN7_TE_DW1_MODE__SHIFT					1
+#define GEN7_TE_DW1_MODE_HW					(0x0 << 1)
+#define GEN7_TE_DW1_MODE_SW					(0x1 << 1)
+#define GEN7_TE_DW1_TE_ENABLE					(0x1 << 0)
+
+
+
+#define GEN7_3DSTATE_DS__SIZE					11
+
+
+#define GEN7_DS_DW1_KERNEL_ADDR__MASK				0xffffffc0
+#define GEN7_DS_DW1_KERNEL_ADDR__SHIFT				6
+#define GEN7_DS_DW1_KERNEL_ADDR__SHR				6
+
+
+
+#define GEN7_DS_DW4_URB_GRF_START__MASK				0x01f00000
+#define GEN7_DS_DW4_URB_GRF_START__SHIFT			20
+#define GEN7_DS_DW4_URB_READ_LEN__MASK				0x0003f800
+#define GEN7_DS_DW4_URB_READ_LEN__SHIFT				11
+#define GEN7_DS_DW4_URB_READ_OFFSET__MASK			0x000003f0
+#define GEN7_DS_DW4_URB_READ_OFFSET__SHIFT			4
+
+#define GEN7_DS_DW5_MAX_THREADS__MASK				0xfe000000
+#define GEN7_DS_DW5_MAX_THREADS__SHIFT				25
+#define GEN75_DS_DW5_MAX_THREADS__MASK				0x3fe00000
+#define GEN75_DS_DW5_MAX_THREADS__SHIFT				21
+#define GEN7_DS_DW5_STATISTICS					(0x1 << 10)
+#define GEN7_DS_DW5_COMPUTE_W					(0x1 << 2)
+#define GEN7_DS_DW5_CACHE_DISABLE				(0x1 << 1)
+#define GEN7_DS_DW5_DS_ENABLE					(0x1 << 0)
+
+
+
+#define GEN8_DS_DW1_KERNEL_ADDR__MASK				0xffffffc0
+#define GEN8_DS_DW1_KERNEL_ADDR__SHIFT				6
+#define GEN8_DS_DW1_KERNEL_ADDR__SHR				6
+
+
+
+
+
+#define GEN8_DS_DW6_URB_GRF_START__MASK				0x01f00000
+#define GEN8_DS_DW6_URB_GRF_START__SHIFT			20
+#define GEN8_DS_DW6_URB_READ_LEN__MASK				0x0003f800
+#define GEN8_DS_DW6_URB_READ_LEN__SHIFT				11
+#define GEN8_DS_DW6_URB_READ_OFFSET__MASK			0x000003f0
+#define GEN8_DS_DW6_URB_READ_OFFSET__SHIFT			4
+
+#define GEN8_DS_DW7_MAX_THREADS__MASK				0x3fe00000
+#define GEN8_DS_DW7_MAX_THREADS__SHIFT				21
+#define GEN8_DS_DW7_STATISTICS					(0x1 << 10)
+#define GEN8_DS_DW7_COMPUTE_W					(0x1 << 2)
+#define GEN8_DS_DW7_CACHE_DISABLE				(0x1 << 1)
+#define GEN8_DS_DW7_DS_ENABLE					(0x1 << 0)
+
+#define GEN8_DS_DW8_URB_WRITE_OFFSET__MASK			0x03e00000
+#define GEN8_DS_DW8_URB_WRITE_OFFSET__SHIFT			21
+#define GEN8_DS_DW8_URB_WRITE_LEN__MASK				0x001f0000
+#define GEN8_DS_DW8_URB_WRITE_LEN__SHIFT			16
+#define GEN8_DS_DW8_UCP_CLIP_ENABLES__MASK			0x0000ff00
+#define GEN8_DS_DW8_UCP_CLIP_ENABLES__SHIFT			8
+
+
+
+#define GEN6_3DSTATE_GS__SIZE					10
+
+
+#define GEN6_GS_DW1_KERNEL_ADDR__MASK				0xffffffc0
+#define GEN6_GS_DW1_KERNEL_ADDR__SHIFT				6
+#define GEN6_GS_DW1_KERNEL_ADDR__SHR				6
+
+
+
+#define GEN6_GS_DW4_URB_READ_LEN__MASK				0x0001f800
+#define GEN6_GS_DW4_URB_READ_LEN__SHIFT				11
+#define GEN6_GS_DW4_URB_READ_OFFSET__MASK			0x000003f0
+#define GEN6_GS_DW4_URB_READ_OFFSET__SHIFT			4
+#define GEN6_GS_DW4_URB_GRF_START__MASK				0x0000000f
+#define GEN6_GS_DW4_URB_GRF_START__SHIFT			0
+
+#define GEN6_GS_DW5_MAX_THREADS__MASK				0xfe000000
+#define GEN6_GS_DW5_MAX_THREADS__SHIFT				25
+#define GEN6_GS_DW5_STATISTICS					(0x1 << 10)
+#define GEN6_GS_DW5_SO_STATISTICS				(0x1 << 9)
+#define GEN6_GS_DW5_RENDER_ENABLE				(0x1 << 8)
+
+#define GEN6_GS_DW6_REORDER_ENABLE				(0x1 << 30)
+#define GEN6_GS_DW6_DISCARD_ADJACENCY				(0x1 << 29)
+#define GEN6_GS_DW6_SVBI_PAYLOAD_ENABLE				(0x1 << 28)
+#define GEN6_GS_DW6_SVBI_POST_INC_ENABLE			(0x1 << 27)
+#define GEN6_GS_DW6_SVBI_POST_INC_VAL__MASK			0x03ff0000
+#define GEN6_GS_DW6_SVBI_POST_INC_VAL__SHIFT			16
+#define GEN6_GS_DW6_GS_ENABLE					(0x1 << 15)
+
+
+
+#define GEN7_GS_DW1_KERNEL_ADDR__MASK				0xffffffc0
+#define GEN7_GS_DW1_KERNEL_ADDR__SHIFT				6
+#define GEN7_GS_DW1_KERNEL_ADDR__SHR				6
+
+
+
+#define GEN7_GS_DW4_OUTPUT_SIZE__MASK				0x1f800000
+#define GEN7_GS_DW4_OUTPUT_SIZE__SHIFT				23
+#define GEN7_GS_DW4_OUTPUT_TOPO__MASK				0x007e0000
+#define GEN7_GS_DW4_OUTPUT_TOPO__SHIFT				17
+#define GEN7_GS_DW4_URB_READ_LEN__MASK				0x0001f800
+#define GEN7_GS_DW4_URB_READ_LEN__SHIFT				11
+#define GEN7_GS_DW4_INCLUDE_VERTEX_HANDLES			(0x1 << 10)
+#define GEN7_GS_DW4_URB_READ_OFFSET__MASK			0x000003f0
+#define GEN7_GS_DW4_URB_READ_OFFSET__SHIFT			4
+#define GEN7_GS_DW4_URB_GRF_START__MASK				0x0000000f
+#define GEN7_GS_DW4_URB_GRF_START__SHIFT			0
+
+#define GEN7_GS_DW5_MAX_THREADS__MASK				0xfe000000
+#define GEN7_GS_DW5_MAX_THREADS__SHIFT				25
+#define GEN7_GS_DW5_GSCTRL__MASK				0x01000000
+#define GEN7_GS_DW5_GSCTRL__SHIFT				24
+#define GEN7_GS_DW5_GSCTRL_CUT					(0x0 << 24)
+#define GEN7_GS_DW5_GSCTRL_SID					(0x1 << 24)
+#define GEN75_GS_DW5_MAX_THREADS__MASK				0xff000000
+#define GEN75_GS_DW5_MAX_THREADS__SHIFT				24
+#define GEN7_GS_DW5_CONTROL_DATA_HEADER_SIZE__MASK		0x00f00000
+#define GEN7_GS_DW5_CONTROL_DATA_HEADER_SIZE__SHIFT		20
+#define GEN7_GS_DW5_INSTANCE_CONTROL__MASK			0x000f8000
+#define GEN7_GS_DW5_INSTANCE_CONTROL__SHIFT			15
+#define GEN7_GS_DW5_DEFAULT_STREAM_ID__MASK			0x00006000
+#define GEN7_GS_DW5_DEFAULT_STREAM_ID__SHIFT			13
+#define GEN7_GS_DW5_DISPATCH_MODE__MASK				0x00001800
+#define GEN7_GS_DW5_DISPATCH_MODE__SHIFT			11
+#define GEN7_GS_DW5_DISPATCH_MODE_SINGLE			(0x0 << 11)
+#define GEN7_GS_DW5_DISPATCH_MODE_DUAL_INSTANCE			(0x1 << 11)
+#define GEN7_GS_DW5_DISPATCH_MODE_DUAL_OBJECT			(0x2 << 11)
+#define GEN7_GS_DW5_STATISTICS					(0x1 << 10)
+#define GEN7_GS_DW5_INVOCATION_INCR__MASK			0x000003e0
+#define GEN7_GS_DW5_INVOCATION_INCR__SHIFT			5
+#define GEN7_GS_DW5_INCLUDE_PRIMITIVE_ID			(0x1 << 4)
+#define GEN7_GS_DW5_HINT					(0x1 << 3)
+#define GEN7_GS_DW5_REORDER_ENABLE				(0x1 << 2)
+#define GEN75_GS_DW5_REORDER__MASK				0x00000004
+#define GEN75_GS_DW5_REORDER__SHIFT				2
+#define GEN75_GS_DW5_REORDER_LEADING				(0x0 << 2)
+#define GEN75_GS_DW5_REORDER_TRAILING				(0x1 << 2)
+#define GEN7_GS_DW5_DISCARD_ADJACENCY				(0x1 << 1)
+#define GEN7_GS_DW5_GS_ENABLE					(0x1 << 0)
+
+#define GEN75_GS_DW6_GSCTRL__MASK				0x80000000
+#define GEN75_GS_DW6_GSCTRL__SHIFT				31
+#define GEN75_GS_DW6_GSCTRL_CUT					(0x0 << 31)
+#define GEN75_GS_DW6_GSCTRL_SID					(0x1 << 31)
+#define GEN7_GS_DW6_URB_SEMAPHORE_ADDR__MASK			0x00000fff
+#define GEN7_GS_DW6_URB_SEMAPHORE_ADDR__SHIFT			0
+#define GEN7_GS_DW6_URB_SEMAPHORE_ADDR__SHR			6
+#define GEN75_GS_DW6_URB_SEMAPHORE_ADDR__MASK			0x00001fff
+#define GEN75_GS_DW6_URB_SEMAPHORE_ADDR__SHIFT			0
+#define GEN75_GS_DW6_URB_SEMAPHORE_ADDR__SHR			6
+
+
+
+#define GEN8_GS_DW1_KERNEL_ADDR__MASK				0xffffffc0
+#define GEN8_GS_DW1_KERNEL_ADDR__SHIFT				6
+#define GEN8_GS_DW1_KERNEL_ADDR__SHR				6
+
+
+#define GEN8_GS_DW3_EXPECTED_VERTEX_COUNT__MASK			0x0000007f
+#define GEN8_GS_DW3_EXPECTED_VERTEX_COUNT__SHIFT		0
+
+
+
+#define GEN8_GS_DW6_OUTPUT_SIZE__MASK				0x1f800000
+#define GEN8_GS_DW6_OUTPUT_SIZE__SHIFT				23
+#define GEN8_GS_DW6_OUTPUT_TOPO__MASK				0x007e0000
+#define GEN8_GS_DW6_OUTPUT_TOPO__SHIFT				17
+#define GEN8_GS_DW6_URB_READ_LEN__MASK				0x0001f800
+#define GEN8_GS_DW6_URB_READ_LEN__SHIFT				11
+#define GEN8_GS_DW6_INCLUDE_VERTEX_HANDLES			(0x1 << 10)
+#define GEN8_GS_DW6_URB_READ_OFFSET__MASK			0x000003f0
+#define GEN8_GS_DW6_URB_READ_OFFSET__SHIFT			4
+#define GEN8_GS_DW6_URB_GRF_START__MASK				0x0000000f
+#define GEN8_GS_DW6_URB_GRF_START__SHIFT			0
+
+#define GEN8_GS_DW7_MAX_THREADS__MASK				0xff000000
+#define GEN8_GS_DW7_MAX_THREADS__SHIFT				24
+#define GEN8_GS_DW7_CONTROL_DATA_HEADER_SIZE__MASK		0x00f00000
+#define GEN8_GS_DW7_CONTROL_DATA_HEADER_SIZE__SHIFT		20
+#define GEN8_GS_DW7_INSTANCE_CONTROL__MASK			0x000f8000
+#define GEN8_GS_DW7_INSTANCE_CONTROL__SHIFT			15
+#define GEN8_GS_DW7_DEFAULT_STREAM_ID__MASK			0x00006000
+#define GEN8_GS_DW7_DEFAULT_STREAM_ID__SHIFT			13
+#define GEN8_GS_DW7_DISPATCH_MODE__MASK				0x00001800
+#define GEN8_GS_DW7_DISPATCH_MODE__SHIFT			11
+#define GEN8_GS_DW7_DISPATCH_MODE_SINGLE			(0x0 << 11)
+#define GEN8_GS_DW7_DISPATCH_MODE_DUAL_INSTANCE			(0x1 << 11)
+#define GEN8_GS_DW7_DISPATCH_MODE_DUAL_OBJECT			(0x2 << 11)
+#define GEN8_GS_DW7_STATISTICS					(0x1 << 10)
+#define GEN8_GS_DW7_INVOCATION_INCR__MASK			0x000003e0
+#define GEN8_GS_DW7_INVOCATION_INCR__SHIFT			5
+#define GEN8_GS_DW7_INCLUDE_PRIMITIVE_ID			(0x1 << 4)
+#define GEN8_GS_DW7_HINT					(0x1 << 3)
+#define GEN8_GS_DW7_REORDER__MASK				0x00000004
+#define GEN8_GS_DW7_REORDER__SHIFT				2
+#define GEN8_GS_DW7_REORDER_LEADING				(0x0 << 2)
+#define GEN8_GS_DW7_REORDER_TRAILING				(0x1 << 2)
+#define GEN8_GS_DW7_DISCARD_ADJACENCY				(0x1 << 1)
+#define GEN8_GS_DW7_GS_ENABLE					(0x1 << 0)
+
+#define GEN8_GS_DW8_GSCTRL__MASK				0x80000000
+#define GEN8_GS_DW8_GSCTRL__SHIFT				31
+#define GEN8_GS_DW8_GSCTRL_CUT					(0x0 << 31)
+#define GEN8_GS_DW8_GSCTRL_SID					(0x1 << 31)
+#define GEN8_GS_DW8_URB_SEMAPHORE_ADDR__MASK			0x00001fff
+#define GEN8_GS_DW8_URB_SEMAPHORE_ADDR__SHIFT			0
+#define GEN8_GS_DW8_URB_SEMAPHORE_ADDR__SHR			6
+#define GEN9_GS_DW8_MAX_THREADS__MASK				0x00001fff
+#define GEN9_GS_DW8_MAX_THREADS__SHIFT				0
+
+#define GEN8_GS_DW9_URB_WRITE_OFFSET__MASK			0x03e00000
+#define GEN8_GS_DW9_URB_WRITE_OFFSET__SHIFT			21
+#define GEN8_GS_DW9_URB_WRITE_LEN__MASK				0x001f0000
+#define GEN8_GS_DW9_URB_WRITE_LEN__SHIFT			16
+#define GEN8_GS_DW9_UCP_CLIP_ENABLES__MASK			0x0000ff00
+#define GEN8_GS_DW9_UCP_CLIP_ENABLES__SHIFT			8
+
+#define GEN7_3DSTATE_STREAMOUT__SIZE				5
+
+
+#define GEN7_SO_DW1_SO_ENABLE					(0x1 << 31)
+#define GEN7_SO_DW1_RENDER_DISABLE				(0x1 << 30)
+#define GEN7_SO_DW1_RENDER_STREAM_SELECT__MASK			0x18000000
+#define GEN7_SO_DW1_RENDER_STREAM_SELECT__SHIFT			27
+#define GEN7_SO_DW1_REORDER__MASK				0x04000000
+#define GEN7_SO_DW1_REORDER__SHIFT				26
+#define GEN7_SO_DW1_REORDER_LEADING				(0x0 << 26)
+#define GEN7_SO_DW1_REORDER_TRAILING				(0x1 << 26)
+#define GEN7_SO_DW1_STATISTICS					(0x1 << 25)
+#define GEN7_SO_DW1_BUFFER_ENABLES__MASK			0x00000f00
+#define GEN7_SO_DW1_BUFFER_ENABLES__SHIFT			8
+
+#define GEN7_SO_DW2_STREAM3_READ_OFFSET__MASK			0x20000000
+#define GEN7_SO_DW2_STREAM3_READ_OFFSET__SHIFT			29
+#define GEN7_SO_DW2_STREAM3_READ_LEN__MASK			0x1f000000
+#define GEN7_SO_DW2_STREAM3_READ_LEN__SHIFT			24
+#define GEN7_SO_DW2_STREAM2_READ_OFFSET__MASK			0x00200000
+#define GEN7_SO_DW2_STREAM2_READ_OFFSET__SHIFT			21
+#define GEN7_SO_DW2_STREAM2_READ_LEN__MASK			0x001f0000
+#define GEN7_SO_DW2_STREAM2_READ_LEN__SHIFT			16
+#define GEN7_SO_DW2_STREAM1_READ_OFFSET__MASK			0x00002000
+#define GEN7_SO_DW2_STREAM1_READ_OFFSET__SHIFT			13
+#define GEN7_SO_DW2_STREAM1_READ_LEN__MASK			0x00001f00
+#define GEN7_SO_DW2_STREAM1_READ_LEN__SHIFT			8
+#define GEN7_SO_DW2_STREAM0_READ_OFFSET__MASK			0x00000020
+#define GEN7_SO_DW2_STREAM0_READ_OFFSET__SHIFT			5
+#define GEN7_SO_DW2_STREAM0_READ_LEN__MASK			0x0000001f
+#define GEN7_SO_DW2_STREAM0_READ_LEN__SHIFT			0
+
+#define GEN8_SO_DW3_BUFFER1_PITCH__MASK				0x0fff0000
+#define GEN8_SO_DW3_BUFFER1_PITCH__SHIFT			16
+#define GEN8_SO_DW3_BUFFER0_PITCH__MASK				0x00000fff
+#define GEN8_SO_DW3_BUFFER0_PITCH__SHIFT			0
+
+#define GEN8_SO_DW4_BUFFER3_PITCH__MASK				0x0fff0000
+#define GEN8_SO_DW4_BUFFER3_PITCH__SHIFT			16
+#define GEN8_SO_DW4_BUFFER2_PITCH__MASK				0x00000fff
+#define GEN8_SO_DW4_BUFFER2_PITCH__SHIFT			0
+
+#define GEN7_3DSTATE_SO_DECL_LIST__SIZE				259
+
+
+#define GEN7_SO_DECL_DW1_STREAM3_BUFFER_SELECTS__MASK		0x0000f000
+#define GEN7_SO_DECL_DW1_STREAM3_BUFFER_SELECTS__SHIFT		12
+#define GEN7_SO_DECL_DW1_STREAM2_BUFFER_SELECTS__MASK		0x00000f00
+#define GEN7_SO_DECL_DW1_STREAM2_BUFFER_SELECTS__SHIFT		8
+#define GEN7_SO_DECL_DW1_STREAM1_BUFFER_SELECTS__MASK		0x000000f0
+#define GEN7_SO_DECL_DW1_STREAM1_BUFFER_SELECTS__SHIFT		4
+#define GEN7_SO_DECL_DW1_STREAM0_BUFFER_SELECTS__MASK		0x0000000f
+#define GEN7_SO_DECL_DW1_STREAM0_BUFFER_SELECTS__SHIFT		0
+
+#define GEN7_SO_DECL_DW2_STREAM3_ENTRY_COUNT__MASK		0xff000000
+#define GEN7_SO_DECL_DW2_STREAM3_ENTRY_COUNT__SHIFT		24
+#define GEN7_SO_DECL_DW2_STREAM2_ENTRY_COUNT__MASK		0x00ff0000
+#define GEN7_SO_DECL_DW2_STREAM2_ENTRY_COUNT__SHIFT		16
+#define GEN7_SO_DECL_DW2_STREAM1_ENTRY_COUNT__MASK		0x0000ff00
+#define GEN7_SO_DECL_DW2_STREAM1_ENTRY_COUNT__SHIFT		8
+#define GEN7_SO_DECL_DW2_STREAM0_ENTRY_COUNT__MASK		0x000000ff
+#define GEN7_SO_DECL_DW2_STREAM0_ENTRY_COUNT__SHIFT		0
+
+#define GEN7_SO_DECL_HIGH__MASK					0xffff0000
+#define GEN7_SO_DECL_HIGH__SHIFT				16
+#define GEN7_SO_DECL_OUTPUT_SLOT__MASK				0x00003000
+#define GEN7_SO_DECL_OUTPUT_SLOT__SHIFT				12
+#define GEN7_SO_DECL_HOLE_FLAG					(0x1 << 11)
+#define GEN7_SO_DECL_REG_INDEX__MASK				0x000003f0
+#define GEN7_SO_DECL_REG_INDEX__SHIFT				4
+#define GEN7_SO_DECL_COMPONENT_MASK__MASK			0x0000000f
+#define GEN7_SO_DECL_COMPONENT_MASK__SHIFT			0
+
+#define GEN7_3DSTATE_SO_BUFFER__SIZE				8
+
+
+#define GEN8_SO_BUF_DW1_ENABLE					(0x1 << 31)
+#define GEN7_SO_BUF_DW1_INDEX__MASK				0x60000000
+#define GEN7_SO_BUF_DW1_INDEX__SHIFT				29
+#define GEN7_SO_BUF_DW1_MOCS__MASK				0x1e000000
+#define GEN7_SO_BUF_DW1_MOCS__SHIFT				25
+#define GEN8_SO_BUF_DW1_MOCS__MASK				0x1fc00000
+#define GEN8_SO_BUF_DW1_MOCS__SHIFT				22
+#define GEN8_SO_BUF_DW1_OFFSET_WRITE_ENABLE			(0x1 << 21)
+#define GEN8_SO_BUF_DW1_OFFSET_ENABLE				(0x1 << 20)
+#define GEN7_SO_BUF_DW1_PITCH__MASK				0x00000fff
+#define GEN7_SO_BUF_DW1_PITCH__SHIFT				0
+
+#define GEN7_SO_BUF_DW2_START_ADDR__MASK			0xfffffffc
+#define GEN7_SO_BUF_DW2_START_ADDR__SHIFT			2
+#define GEN7_SO_BUF_DW2_START_ADDR__SHR				2
+
+#define GEN7_SO_BUF_DW3_END_ADDR__MASK				0xfffffffc
+#define GEN7_SO_BUF_DW3_END_ADDR__SHIFT				2
+#define GEN7_SO_BUF_DW3_END_ADDR__SHR				2
+
+#define GEN8_SO_BUF_DW2_ADDR__MASK				0xfffffffc
+#define GEN8_SO_BUF_DW2_ADDR__SHIFT				2
+#define GEN8_SO_BUF_DW2_ADDR__SHR				2
+
+
+
+#define GEN8_SO_BUF_DW5_OFFSET_ADDR__MASK			0xfffffffc
+#define GEN8_SO_BUF_DW5_OFFSET_ADDR__SHIFT			2
+#define GEN8_SO_BUF_DW5_OFFSET_ADDR__SHR			2
+
+
+
+#define GEN6_3DSTATE_CLIP__SIZE					4
+
+
+#define GEN7_CLIP_DW1_FRONTWINDING__MASK			0x00100000
+#define GEN7_CLIP_DW1_FRONTWINDING__SHIFT			20
+#define GEN7_CLIP_DW1_FRONTWINDING_CW				(0x0 << 20)
+#define GEN7_CLIP_DW1_FRONTWINDING_CCW				(0x1 << 20)
+#define GEN7_CLIP_DW1_SUBPIXEL__MASK				0x00080000
+#define GEN7_CLIP_DW1_SUBPIXEL__SHIFT				19
+#define GEN7_CLIP_DW1_SUBPIXEL_8BITS				(0x0 << 19)
+#define GEN7_CLIP_DW1_SUBPIXEL_4BITS				(0x1 << 19)
+#define GEN7_CLIP_DW1_EARLY_CULL_ENABLE				(0x1 << 18)
+#define GEN7_CLIP_DW1_CULLMODE__MASK				0x00030000
+#define GEN7_CLIP_DW1_CULLMODE__SHIFT				16
+#define GEN7_CLIP_DW1_CULLMODE_BOTH				(0x0 << 16)
+#define GEN7_CLIP_DW1_CULLMODE_NONE				(0x1 << 16)
+#define GEN7_CLIP_DW1_CULLMODE_FRONT				(0x2 << 16)
+#define GEN7_CLIP_DW1_CULLMODE_BACK				(0x3 << 16)
+#define GEN6_CLIP_DW1_STATISTICS				(0x1 << 10)
+#define GEN6_CLIP_DW1_UCP_CULL_ENABLES__MASK			0x000000ff
+#define GEN6_CLIP_DW1_UCP_CULL_ENABLES__SHIFT			0
+
+#define GEN6_CLIP_DW2_CLIP_ENABLE				(0x1 << 31)
+#define GEN6_CLIP_DW2_APIMODE__MASK				0x40000000
+#define GEN6_CLIP_DW2_APIMODE__SHIFT				30
+#define GEN6_CLIP_DW2_APIMODE_OGL				(0x0 << 30)
+#define GEN6_CLIP_DW2_APIMODE_D3D				(0x1 << 30)
+#define GEN6_CLIP_DW2_XY_TEST_ENABLE				(0x1 << 28)
+#define GEN6_CLIP_DW2_Z_TEST_ENABLE				(0x1 << 27)
+#define GEN6_CLIP_DW2_GB_TEST_ENABLE				(0x1 << 26)
+#define GEN6_CLIP_DW2_UCP_CLIP_ENABLES__MASK			0x00ff0000
+#define GEN6_CLIP_DW2_UCP_CLIP_ENABLES__SHIFT			16
+#define GEN6_CLIP_DW2_CLIPMODE__MASK				0x0000e000
+#define GEN6_CLIP_DW2_CLIPMODE__SHIFT				13
+#define GEN6_CLIP_DW2_CLIPMODE_NORMAL				(0x0 << 13)
+#define GEN6_CLIP_DW2_CLIPMODE_REJECT_ALL			(0x3 << 13)
+#define GEN6_CLIP_DW2_CLIPMODE_ACCEPT_ALL			(0x4 << 13)
+#define GEN6_CLIP_DW2_PERSPECTIVE_DIVIDE_DISABLE		(0x1 << 9)
+#define GEN6_CLIP_DW2_NONPERSPECTIVE_BARYCENTRIC_ENABLE		(0x1 << 8)
+#define GEN6_CLIP_DW2_TRI_PROVOKE__MASK				0x00000030
+#define GEN6_CLIP_DW2_TRI_PROVOKE__SHIFT			4
+#define GEN6_CLIP_DW2_LINE_PROVOKE__MASK			0x0000000c
+#define GEN6_CLIP_DW2_LINE_PROVOKE__SHIFT			2
+#define GEN6_CLIP_DW2_TRIFAN_PROVOKE__MASK			0x00000003
+#define GEN6_CLIP_DW2_TRIFAN_PROVOKE__SHIFT			0
+
+#define GEN6_CLIP_DW3_MIN_POINT_WIDTH__MASK			0x0ffe0000
+#define GEN6_CLIP_DW3_MIN_POINT_WIDTH__SHIFT			17
+#define GEN6_CLIP_DW3_MIN_POINT_WIDTH__RADIX			3
+#define GEN6_CLIP_DW3_MAX_POINT_WIDTH__MASK			0x0001ffc0
+#define GEN6_CLIP_DW3_MAX_POINT_WIDTH__SHIFT			6
+#define GEN6_CLIP_DW3_MAX_POINT_WIDTH__RADIX			3
+#define GEN6_CLIP_DW3_RTAINDEX_FORCED_ZERO			(0x1 << 5)
+#define GEN6_CLIP_DW3_MAX_VPINDEX__MASK				0x0000000f
+#define GEN6_CLIP_DW3_MAX_VPINDEX__SHIFT			0
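+
+/* Hand-added note (our reading of the naming convention, not part of the
+ * generated file): a __RADIX suffix appears to give the fractional bit count
+ * of a fixed-point field, so a floating-point point width would be encoded
+ * roughly as:
+ *
+ *   bits = (unsigned int) (width * (1 << GEN6_CLIP_DW3_MAX_POINT_WIDTH__RADIX));
+ *   dw3 |= (bits << GEN6_CLIP_DW3_MAX_POINT_WIDTH__SHIFT) &
+ *          GEN6_CLIP_DW3_MAX_POINT_WIDTH__MASK;
+ */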
+
+#define GEN6_3DSTATE_SF_DW1_DW3__SIZE				3
+
+#define GEN7_SF_DW1_DEPTH_FORMAT__MASK				0x00007000
+#define GEN7_SF_DW1_DEPTH_FORMAT__SHIFT				12
+#define GEN9_SF_DW1_LINE_WIDTH__MASK				0x3ffff000
+#define GEN9_SF_DW1_LINE_WIDTH__SHIFT				12
+#define GEN9_SF_DW1_LINE_WIDTH__RADIX				7
+#define GEN7_SF_DW1_LEGACY_DEPTH_OFFSET				(0x1 << 11)
+#define GEN7_SF_DW1_STATISTICS					(0x1 << 10)
+#define GEN7_SF_DW1_DEPTH_OFFSET_SOLID				(0x1 << 9)
+#define GEN7_SF_DW1_DEPTH_OFFSET_WIREFRAME			(0x1 << 8)
+#define GEN7_SF_DW1_DEPTH_OFFSET_POINT				(0x1 << 7)
+#define GEN7_SF_DW1_FRONTFACE__MASK				0x00000060
+#define GEN7_SF_DW1_FRONTFACE__SHIFT				5
+#define GEN7_SF_DW1_FRONTFACE_SOLID				(0x0 << 5)
+#define GEN7_SF_DW1_FRONTFACE_WIREFRAME				(0x1 << 5)
+#define GEN7_SF_DW1_FRONTFACE_POINT				(0x2 << 5)
+#define GEN7_SF_DW1_BACKFACE__MASK				0x00000018
+#define GEN7_SF_DW1_BACKFACE__SHIFT				3
+#define GEN7_SF_DW1_BACKFACE_SOLID				(0x0 << 3)
+#define GEN7_SF_DW1_BACKFACE_WIREFRAME				(0x1 << 3)
+#define GEN7_SF_DW1_BACKFACE_POINT				(0x2 << 3)
+#define GEN7_SF_DW1_VIEWPORT_ENABLE				(0x1 << 1)
+#define GEN7_SF_DW1_FRONTWINDING__MASK				0x00000001
+#define GEN7_SF_DW1_FRONTWINDING__SHIFT				0
+#define GEN7_SF_DW1_FRONTWINDING_CW				0x0
+#define GEN7_SF_DW1_FRONTWINDING_CCW				0x1
+
+#define GEN7_SF_DW2_AA_LINE_ENABLE				(0x1 << 31)
+#define GEN7_SF_DW2_CULLMODE__MASK				0x60000000
+#define GEN7_SF_DW2_CULLMODE__SHIFT				29
+#define GEN7_SF_DW2_CULLMODE_BOTH				(0x0 << 29)
+#define GEN7_SF_DW2_CULLMODE_NONE				(0x1 << 29)
+#define GEN7_SF_DW2_CULLMODE_FRONT				(0x2 << 29)
+#define GEN7_SF_DW2_CULLMODE_BACK				(0x3 << 29)
+#define GEN7_SF_DW2_LINE_WIDTH__MASK				0x0ffc0000
+#define GEN7_SF_DW2_LINE_WIDTH__SHIFT				18
+#define GEN7_SF_DW2_LINE_WIDTH__RADIX				7
+#define GEN7_SF_DW2_AA_LINE_CAP__MASK				0x00030000
+#define GEN7_SF_DW2_AA_LINE_CAP__SHIFT				16
+#define GEN7_SF_DW2_AA_LINE_CAP_0_5				(0x0 << 16)
+#define GEN7_SF_DW2_AA_LINE_CAP_1_0				(0x1 << 16)
+#define GEN7_SF_DW2_AA_LINE_CAP_2_0				(0x2 << 16)
+#define GEN7_SF_DW2_AA_LINE_CAP_4_0				(0x3 << 16)
+#define GEN75_SF_DW2_LINE_STIPPLE_ENABLE			(0x1 << 14)
+#define GEN7_SF_DW2_SCISSOR_ENABLE				(0x1 << 11)
+#define GEN7_SF_DW2_MSRASTMODE__MASK				0x00000300
+#define GEN7_SF_DW2_MSRASTMODE__SHIFT				8
+#define GEN7_SF_DW2_MSRASTMODE_OFF_PIXEL			(0x0 << 8)
+#define GEN7_SF_DW2_MSRASTMODE_OFF_PATTERN			(0x1 << 8)
+#define GEN7_SF_DW2_MSRASTMODE_ON_PIXEL				(0x2 << 8)
+#define GEN7_SF_DW2_MSRASTMODE_ON_PATTERN			(0x3 << 8)
+
+#define GEN7_SF_DW3_LINE_LAST_PIXEL_ENABLE			(0x1 << 31)
+#define GEN7_SF_DW3_TRI_PROVOKE__MASK				0x60000000
+#define GEN7_SF_DW3_TRI_PROVOKE__SHIFT				29
+#define GEN7_SF_DW3_LINE_PROVOKE__MASK				0x18000000
+#define GEN7_SF_DW3_LINE_PROVOKE__SHIFT				27
+#define GEN7_SF_DW3_TRIFAN_PROVOKE__MASK			0x06000000
+#define GEN7_SF_DW3_TRIFAN_PROVOKE__SHIFT			25
+#define GEN7_SF_DW3_TRUE_AA_LINE_DISTANCE			(0x1 << 14)
+#define GEN7_SF_DW3_SUBPIXEL__MASK				0x00001000
+#define GEN7_SF_DW3_SUBPIXEL__SHIFT				12
+#define GEN7_SF_DW3_SUBPIXEL_8BITS				(0x0 << 12)
+#define GEN7_SF_DW3_SUBPIXEL_4BITS				(0x1 << 12)
+#define GEN7_SF_DW3_USE_POINT_WIDTH				(0x1 << 11)
+#define GEN7_SF_DW3_POINT_WIDTH__MASK				0x000007ff
+#define GEN7_SF_DW3_POINT_WIDTH__SHIFT				0
+#define GEN7_SF_DW3_POINT_WIDTH__RADIX				3
+
+#define GEN7_3DSTATE_SBE_DW1__SIZE				13
+
+#define GEN8_SBE_DW1_USE_URB_READ_LEN				(0x1 << 29)
+#define GEN8_SBE_DW1_USE_URB_READ_OFFSET			(0x1 << 28)
+#define GEN7_SBE_DW1_ATTR_SWIZZLE__MASK				0x10000000
+#define GEN7_SBE_DW1_ATTR_SWIZZLE__SHIFT			28
+#define GEN7_SBE_DW1_ATTR_SWIZZLE_0_15				(0x0 << 28)
+#define GEN7_SBE_DW1_ATTR_SWIZZLE_16_31				(0x1 << 28)
+#define GEN7_SBE_DW1_ATTR_COUNT__MASK				0x0fc00000
+#define GEN7_SBE_DW1_ATTR_COUNT__SHIFT				22
+#define GEN7_SBE_DW1_ATTR_SWIZZLE_ENABLE			(0x1 << 21)
+#define GEN7_SBE_DW1_POINT_SPRITE_TEXCOORD__MASK		0x00100000
+#define GEN7_SBE_DW1_POINT_SPRITE_TEXCOORD__SHIFT		20
+#define GEN7_SBE_DW1_POINT_SPRITE_TEXCOORD_UPPERLEFT		(0x0 << 20)
+#define GEN7_SBE_DW1_POINT_SPRITE_TEXCOORD_LOWERLEFT		(0x1 << 20)
+#define GEN7_SBE_DW1_URB_READ_LEN__MASK				0x0000f800
+#define GEN7_SBE_DW1_URB_READ_LEN__SHIFT			11
+#define GEN7_SBE_DW1_URB_READ_OFFSET__MASK			0x000003f0
+#define GEN7_SBE_DW1_URB_READ_OFFSET__SHIFT			4
+#define GEN8_SBE_DW1_URB_READ_OFFSET__MASK			0x000007e0
+#define GEN8_SBE_DW1_URB_READ_OFFSET__SHIFT			5
+
+#define GEN8_3DSTATE_SBE_SWIZ_DW1_DW8__SIZE			8
+
+#define GEN8_SBE_SWIZ_HIGH__MASK				0xffff0000
+#define GEN8_SBE_SWIZ_HIGH__SHIFT				16
+#define GEN8_SBE_SWIZ_OVERRIDE_W				(0x1 << 15)
+#define GEN8_SBE_SWIZ_OVERRIDE_Z				(0x1 << 14)
+#define GEN8_SBE_SWIZ_OVERRIDE_Y				(0x1 << 13)
+#define GEN8_SBE_SWIZ_OVERRIDE_X				(0x1 << 12)
+#define GEN8_SBE_SWIZ_CONST__MASK				0x00000600
+#define GEN8_SBE_SWIZ_CONST__SHIFT				9
+#define GEN8_SBE_SWIZ_CONST_0000				(0x0 << 9)
+#define GEN8_SBE_SWIZ_CONST_0001_FLOAT				(0x1 << 9)
+#define GEN8_SBE_SWIZ_CONST_1111_FLOAT				(0x2 << 9)
+#define GEN8_SBE_SWIZ_CONST_PRIM_ID				(0x3 << 9)
+#define GEN8_SBE_SWIZ_INPUTATTR__MASK				0x000000c0
+#define GEN8_SBE_SWIZ_INPUTATTR__SHIFT				6
+#define GEN8_SBE_SWIZ_INPUTATTR_NORMAL				(0x0 << 6)
+#define GEN8_SBE_SWIZ_INPUTATTR_FACING				(0x1 << 6)
+#define GEN8_SBE_SWIZ_INPUTATTR_W				(0x2 << 6)
+#define GEN8_SBE_SWIZ_INPUTATTR_FACING_W			(0x3 << 6)
+#define GEN8_SBE_SWIZ_URB_ENTRY_OFFSET__MASK			0x0000001f
+#define GEN8_SBE_SWIZ_URB_ENTRY_OFFSET__SHIFT			0
+
+#define GEN6_3DSTATE_SF__SIZE					20
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+#define GEN7_3DSTATE_SBE__SIZE					14
+
+
+
+
+
+
+
+
+
+
+
+
+#define GEN9_SBE_DW_ACTIVE_COMPONENT__MASK			0x00000003
+#define GEN9_SBE_DW_ACTIVE_COMPONENT__SHIFT			0
+#define GEN9_SBE_DW_ACTIVE_COMPONENT_NONE			0x0
+#define GEN9_SBE_DW_ACTIVE_COMPONENT_XY				0x1
+#define GEN9_SBE_DW_ACTIVE_COMPONENT_XYZ			0x2
+#define GEN9_SBE_DW_ACTIVE_COMPONENT_XYZW			0x3
+
+#define GEN8_3DSTATE_SBE_SWIZ__SIZE				11
+
+
+
+
+#define GEN8_3DSTATE_RASTER__SIZE				5
+
+
+#define GEN9_RASTER_DW1_Z_TEST_FAR_ENABLE			(0x1 << 26)
+#define GEN8_RASTER_DW1_FRONTWINDING__MASK			0x00200000
+#define GEN8_RASTER_DW1_FRONTWINDING__SHIFT			21
+#define GEN8_RASTER_DW1_FRONTWINDING_CW				(0x0 << 21)
+#define GEN8_RASTER_DW1_FRONTWINDING_CCW			(0x1 << 21)
+#define GEN8_RASTER_DW1_CULLMODE__MASK				0x00030000
+#define GEN8_RASTER_DW1_CULLMODE__SHIFT				16
+#define GEN8_RASTER_DW1_CULLMODE_BOTH				(0x0 << 16)
+#define GEN8_RASTER_DW1_CULLMODE_NONE				(0x1 << 16)
+#define GEN8_RASTER_DW1_CULLMODE_FRONT				(0x2 << 16)
+#define GEN8_RASTER_DW1_CULLMODE_BACK				(0x3 << 16)
+#define GEN8_RASTER_DW1_SMOOTH_POINT_ENABLE			(0x1 << 13)
+#define GEN8_RASTER_DW1_API_MULTISAMPLE_ENABLE			(0x1 << 12)
+#define GEN8_RASTER_DW1_DEPTH_OFFSET_SOLID			(0x1 << 9)
+#define GEN8_RASTER_DW1_DEPTH_OFFSET_WIREFRAME			(0x1 << 8)
+#define GEN8_RASTER_DW1_DEPTH_OFFSET_POINT			(0x1 << 7)
+#define GEN8_RASTER_DW1_FRONTFACE__MASK				0x00000060
+#define GEN8_RASTER_DW1_FRONTFACE__SHIFT			5
+#define GEN8_RASTER_DW1_FRONTFACE_SOLID				(0x0 << 5)
+#define GEN8_RASTER_DW1_FRONTFACE_WIREFRAME			(0x1 << 5)
+#define GEN8_RASTER_DW1_FRONTFACE_POINT				(0x2 << 5)
+#define GEN8_RASTER_DW1_BACKFACE__MASK				0x00000018
+#define GEN8_RASTER_DW1_BACKFACE__SHIFT				3
+#define GEN8_RASTER_DW1_BACKFACE_SOLID				(0x0 << 3)
+#define GEN8_RASTER_DW1_BACKFACE_WIREFRAME			(0x1 << 3)
+#define GEN8_RASTER_DW1_BACKFACE_POINT				(0x2 << 3)
+#define GEN8_RASTER_DW1_AA_LINE_ENABLE				(0x1 << 2)
+#define GEN8_RASTER_DW1_SCISSOR_ENABLE				(0x1 << 1)
+#define GEN8_RASTER_DW1_Z_TEST_ENABLE				(0x1 << 0)
+#define GEN9_RASTER_DW1_Z_TEST_NEAR_ENABLE			(0x1 << 0)
+
+
+
+
+#define GEN6_3DSTATE_WM__SIZE					9
+
+
+#define GEN6_WM_DW1_KERNEL0_ADDR__MASK				0xffffffc0
+#define GEN6_WM_DW1_KERNEL0_ADDR__SHIFT				6
+#define GEN6_WM_DW1_KERNEL0_ADDR__SHR				6
+
+
+
+#define GEN6_WM_DW4_STATISTICS					(0x1 << 31)
+#define GEN6_WM_DW4_DEPTH_CLEAR					(0x1 << 30)
+#define GEN6_WM_DW4_DEPTH_RESOLVE				(0x1 << 28)
+#define GEN6_WM_DW4_HIZ_RESOLVE					(0x1 << 27)
+#define GEN6_WM_DW4_URB_GRF_START0__MASK			0x007f0000
+#define GEN6_WM_DW4_URB_GRF_START0__SHIFT			16
+#define GEN6_WM_DW4_URB_GRF_START1__MASK			0x00007f00
+#define GEN6_WM_DW4_URB_GRF_START1__SHIFT			8
+#define GEN6_WM_DW4_URB_GRF_START2__MASK			0x0000007f
+#define GEN6_WM_DW4_URB_GRF_START2__SHIFT			0
+
+#define GEN6_WM_DW5_MAX_THREADS__MASK				0xfe000000
+#define GEN6_WM_DW5_MAX_THREADS__SHIFT				25
+#define GEN6_WM_DW5_LEGACY_LINE_RAST				(0x1 << 23)
+#define GEN6_WM_DW5_PS_KILL_PIXEL				(0x1 << 22)
+#define GEN6_WM_DW5_PS_COMPUTE_DEPTH				(0x1 << 21)
+#define GEN6_WM_DW5_PS_USE_DEPTH				(0x1 << 20)
+#define GEN6_WM_DW5_PS_DISPATCH_ENABLE				(0x1 << 19)
+#define GEN6_WM_DW5_AA_LINE_CAP__MASK				0x00030000
+#define GEN6_WM_DW5_AA_LINE_CAP__SHIFT				16
+#define GEN6_WM_DW5_AA_LINE_CAP_0_5				(0x0 << 16)
+#define GEN6_WM_DW5_AA_LINE_CAP_1_0				(0x1 << 16)
+#define GEN6_WM_DW5_AA_LINE_CAP_2_0				(0x2 << 16)
+#define GEN6_WM_DW5_AA_LINE_CAP_4_0				(0x3 << 16)
+#define GEN6_WM_DW5_AA_LINE_WIDTH__MASK				0x0000c000
+#define GEN6_WM_DW5_AA_LINE_WIDTH__SHIFT			14
+#define GEN6_WM_DW5_AA_LINE_WIDTH_0_5				(0x0 << 14)
+#define GEN6_WM_DW5_AA_LINE_WIDTH_1_0				(0x1 << 14)
+#define GEN6_WM_DW5_AA_LINE_WIDTH_2_0				(0x2 << 14)
+#define GEN6_WM_DW5_AA_LINE_WIDTH_4_0				(0x3 << 14)
+#define GEN6_WM_DW5_POLY_STIPPLE_ENABLE				(0x1 << 13)
+#define GEN6_WM_DW5_LINE_STIPPLE_ENABLE				(0x1 << 11)
+#define GEN6_WM_DW5_PS_COMPUTE_OMASK				(0x1 << 9)
+#define GEN6_WM_DW5_PS_USE_W					(0x1 << 8)
+#define GEN6_WM_DW5_PS_DUAL_SOURCE_BLEND			(0x1 << 7)
+#define GEN6_WM_DW5_PS_DISPATCH_MODE__MASK			0x00000007
+#define GEN6_WM_DW5_PS_DISPATCH_MODE__SHIFT			0
+
+#define GEN6_WM_DW6_SF_ATTR_COUNT__MASK				0x03f00000
+#define GEN6_WM_DW6_SF_ATTR_COUNT__SHIFT			20
+#define GEN6_WM_DW6_PS_POSOFFSET__MASK				0x000c0000
+#define GEN6_WM_DW6_PS_POSOFFSET__SHIFT				18
+#define GEN6_WM_DW6_PS_POSOFFSET_NONE				(0x0 << 18)
+#define GEN6_WM_DW6_PS_POSOFFSET_CENTROID			(0x2 << 18)
+#define GEN6_WM_DW6_PS_POSOFFSET_SAMPLE				(0x3 << 18)
+#define GEN6_WM_DW6_ZW_INTERP__MASK				0x00030000
+#define GEN6_WM_DW6_ZW_INTERP__SHIFT				16
+#define GEN6_WM_DW6_ZW_INTERP_PIXEL				(0x0 << 16)
+#define GEN6_WM_DW6_ZW_INTERP_CENTROID				(0x2 << 16)
+#define GEN6_WM_DW6_ZW_INTERP_SAMPLE				(0x3 << 16)
+#define GEN6_WM_DW6_BARYCENTRIC_INTERP__MASK			0x0000fc00
+#define GEN6_WM_DW6_BARYCENTRIC_INTERP__SHIFT			10
+#define GEN6_WM_DW6_POINT_RASTRULE__MASK			0x00000200
+#define GEN6_WM_DW6_POINT_RASTRULE__SHIFT			9
+#define GEN6_WM_DW6_POINT_RASTRULE_UPPER_LEFT			(0x0 << 9)
+#define GEN6_WM_DW6_POINT_RASTRULE_UPPER_RIGHT			(0x1 << 9)
+#define GEN6_WM_DW6_MSRASTMODE__MASK				0x00000006
+#define GEN6_WM_DW6_MSRASTMODE__SHIFT				1
+#define GEN6_WM_DW6_MSRASTMODE_OFF_PIXEL			(0x0 << 1)
+#define GEN6_WM_DW6_MSRASTMODE_OFF_PATTERN			(0x1 << 1)
+#define GEN6_WM_DW6_MSRASTMODE_ON_PIXEL				(0x2 << 1)
+#define GEN6_WM_DW6_MSRASTMODE_ON_PATTERN			(0x3 << 1)
+#define GEN6_WM_DW6_MSDISPMODE__MASK				0x00000001
+#define GEN6_WM_DW6_MSDISPMODE__SHIFT				0
+#define GEN6_WM_DW6_MSDISPMODE_PERSAMPLE			0x0
+#define GEN6_WM_DW6_MSDISPMODE_PERPIXEL				0x1
+
+#define GEN6_WM_DW7_KERNEL1_ADDR__MASK				0xffffffc0
+#define GEN6_WM_DW7_KERNEL1_ADDR__SHIFT				6
+#define GEN6_WM_DW7_KERNEL1_ADDR__SHR				6
+
+#define GEN6_WM_DW8_KERNEL2_ADDR__MASK				0xffffffc0
+#define GEN6_WM_DW8_KERNEL2_ADDR__SHIFT				6
+#define GEN6_WM_DW8_KERNEL2_ADDR__SHR				6
+
+
+#define GEN7_WM_DW1_STATISTICS					(0x1 << 31)
+#define GEN7_WM_DW1_DEPTH_CLEAR					(0x1 << 30)
+#define GEN7_WM_DW1_PS_DISPATCH_ENABLE				(0x1 << 29)
+#define GEN7_WM_DW1_DEPTH_RESOLVE				(0x1 << 28)
+#define GEN7_WM_DW1_HIZ_RESOLVE					(0x1 << 27)
+#define GEN7_WM_DW1_LEGACY_LINE_RAST				(0x1 << 26)
+#define GEN7_WM_DW1_PS_KILL_PIXEL				(0x1 << 25)
+#define GEN7_WM_DW1_PSCDEPTH__MASK				0x01800000
+#define GEN7_WM_DW1_PSCDEPTH__SHIFT				23
+#define GEN7_WM_DW1_PSCDEPTH_OFF				(0x0 << 23)
+#define GEN7_WM_DW1_PSCDEPTH_ON					(0x1 << 23)
+#define GEN7_WM_DW1_PSCDEPTH_ON_GE				(0x2 << 23)
+#define GEN7_WM_DW1_PSCDEPTH_ON_LE				(0x3 << 23)
+#define GEN7_WM_DW1_EDSC__MASK					0x00600000
+#define GEN7_WM_DW1_EDSC__SHIFT					21
+#define GEN7_WM_DW1_EDSC_NORMAL					(0x0 << 21)
+#define GEN7_WM_DW1_EDSC_PSEXEC					(0x1 << 21)
+#define GEN7_WM_DW1_EDSC_PREPS					(0x2 << 21)
+#define GEN7_WM_DW1_PS_USE_DEPTH				(0x1 << 20)
+#define GEN7_WM_DW1_PS_USE_W					(0x1 << 19)
+#define GEN7_WM_DW1_ZW_INTERP__MASK				0x00060000
+#define GEN7_WM_DW1_ZW_INTERP__SHIFT				17
+#define GEN7_WM_DW1_ZW_INTERP_PIXEL				(0x0 << 17)
+#define GEN7_WM_DW1_ZW_INTERP_CENTROID				(0x2 << 17)
+#define GEN7_WM_DW1_ZW_INTERP_SAMPLE				(0x3 << 17)
+#define GEN7_WM_DW1_BARYCENTRIC_INTERP__MASK			0x0001f800
+#define GEN7_WM_DW1_BARYCENTRIC_INTERP__SHIFT			11
+#define GEN7_WM_DW1_PS_USE_COVERAGE_MASK			(0x1 << 10)
+#define GEN7_WM_DW1_AA_LINE_CAP__MASK				0x00000300
+#define GEN7_WM_DW1_AA_LINE_CAP__SHIFT				8
+#define GEN7_WM_DW1_AA_LINE_CAP_0_5				(0x0 << 8)
+#define GEN7_WM_DW1_AA_LINE_CAP_1_0				(0x1 << 8)
+#define GEN7_WM_DW1_AA_LINE_CAP_2_0				(0x2 << 8)
+#define GEN7_WM_DW1_AA_LINE_CAP_4_0				(0x3 << 8)
+#define GEN7_WM_DW1_AA_LINE_WIDTH__MASK				0x000000c0
+#define GEN7_WM_DW1_AA_LINE_WIDTH__SHIFT			6
+#define GEN7_WM_DW1_AA_LINE_WIDTH_0_5				(0x0 << 6)
+#define GEN7_WM_DW1_AA_LINE_WIDTH_1_0				(0x1 << 6)
+#define GEN7_WM_DW1_AA_LINE_WIDTH_2_0				(0x2 << 6)
+#define GEN7_WM_DW1_AA_LINE_WIDTH_4_0				(0x3 << 6)
+#define GEN75_WM_DW1_RT_INDEPENDENT_RAST			(0x1 << 5)
+#define GEN7_WM_DW1_POLY_STIPPLE_ENABLE				(0x1 << 4)
+#define GEN7_WM_DW1_LINE_STIPPLE_ENABLE				(0x1 << 3)
+#define GEN7_WM_DW1_POINT_RASTRULE__MASK			0x00000004
+#define GEN7_WM_DW1_POINT_RASTRULE__SHIFT			2
+#define GEN7_WM_DW1_POINT_RASTRULE_UPPER_LEFT			(0x0 << 2)
+#define GEN7_WM_DW1_POINT_RASTRULE_UPPER_RIGHT			(0x1 << 2)
+#define GEN7_WM_DW1_MSRASTMODE__MASK				0x00000003
+#define GEN7_WM_DW1_MSRASTMODE__SHIFT				0
+#define GEN7_WM_DW1_MSRASTMODE_OFF_PIXEL			0x0
+#define GEN7_WM_DW1_MSRASTMODE_OFF_PATTERN			0x1
+#define GEN7_WM_DW1_MSRASTMODE_ON_PIXEL				0x2
+#define GEN7_WM_DW1_MSRASTMODE_ON_PATTERN			0x3
+
+#define GEN7_WM_DW2_MSDISPMODE__MASK				0x80000000
+#define GEN7_WM_DW2_MSDISPMODE__SHIFT				31
+#define GEN7_WM_DW2_MSDISPMODE_PERSAMPLE			(0x0 << 31)
+#define GEN7_WM_DW2_MSDISPMODE_PERPIXEL				(0x1 << 31)
+#define GEN75_WM_DW2_PS_UAV_ONLY				(0x1 << 30)
+
+#define GEN8_3DSTATE_WM_CHROMAKEY__SIZE				2
+
+
+
+#define GEN8_3DSTATE_WM_DEPTH_STENCIL__SIZE			4
+
+
+#define GEN8_ZS_DW1_STENCIL0_FAIL_OP__MASK			0xe0000000
+#define GEN8_ZS_DW1_STENCIL0_FAIL_OP__SHIFT			29
+#define GEN8_ZS_DW1_STENCIL0_ZFAIL_OP__MASK			0x1c000000
+#define GEN8_ZS_DW1_STENCIL0_ZFAIL_OP__SHIFT			26
+#define GEN8_ZS_DW1_STENCIL0_ZPASS_OP__MASK			0x03800000
+#define GEN8_ZS_DW1_STENCIL0_ZPASS_OP__SHIFT			23
+#define GEN8_ZS_DW1_STENCIL1_FUNC__MASK				0x00700000
+#define GEN8_ZS_DW1_STENCIL1_FUNC__SHIFT			20
+#define GEN8_ZS_DW1_STENCIL1_FAIL_OP__MASK			0x000e0000
+#define GEN8_ZS_DW1_STENCIL1_FAIL_OP__SHIFT			17
+#define GEN8_ZS_DW1_STENCIL1_ZFAIL_OP__MASK			0x0001c000
+#define GEN8_ZS_DW1_STENCIL1_ZFAIL_OP__SHIFT			14
+#define GEN8_ZS_DW1_STENCIL1_ZPASS_OP__MASK			0x00003800
+#define GEN8_ZS_DW1_STENCIL1_ZPASS_OP__SHIFT			11
+#define GEN8_ZS_DW1_STENCIL0_FUNC__MASK				0x00000700
+#define GEN8_ZS_DW1_STENCIL0_FUNC__SHIFT			8
+#define GEN8_ZS_DW1_DEPTH_FUNC__MASK				0x000000e0
+#define GEN8_ZS_DW1_DEPTH_FUNC__SHIFT				5
+#define GEN8_ZS_DW1_STENCIL1_ENABLE				(0x1 << 4)
+#define GEN8_ZS_DW1_STENCIL_TEST_ENABLE				(0x1 << 3)
+#define GEN8_ZS_DW1_STENCIL_WRITE_ENABLE			(0x1 << 2)
+#define GEN8_ZS_DW1_DEPTH_TEST_ENABLE				(0x1 << 1)
+#define GEN8_ZS_DW1_DEPTH_WRITE_ENABLE				(0x1 << 0)
+
+#define GEN8_ZS_DW2_STENCIL0_VALUEMASK__MASK			0xff000000
+#define GEN8_ZS_DW2_STENCIL0_VALUEMASK__SHIFT			24
+#define GEN8_ZS_DW2_STENCIL0_WRITEMASK__MASK			0x00ff0000
+#define GEN8_ZS_DW2_STENCIL0_WRITEMASK__SHIFT			16
+#define GEN8_ZS_DW2_STENCIL1_VALUEMASK__MASK			0x0000ff00
+#define GEN8_ZS_DW2_STENCIL1_VALUEMASK__SHIFT			8
+#define GEN8_ZS_DW2_STENCIL1_WRITEMASK__MASK			0x000000ff
+#define GEN8_ZS_DW2_STENCIL1_WRITEMASK__SHIFT			0
+
+#define GEN9_ZS_DW3_STENCIL0_REF__MASK				0x0000ff00
+#define GEN9_ZS_DW3_STENCIL0_REF__SHIFT				8
+#define GEN9_ZS_DW3_STENCIL1_REF__MASK				0x000000ff
+#define GEN9_ZS_DW3_STENCIL1_REF__SHIFT				0
+
+#define GEN8_3DSTATE_WM_HZ_OP__SIZE				5
+
+
+#define GEN8_WM_HZ_DW1_STENCIL_CLEAR				(0x1 << 31)
+#define GEN8_WM_HZ_DW1_DEPTH_CLEAR				(0x1 << 30)
+#define GEN8_WM_HZ_DW1_DEPTH_RESOLVE				(0x1 << 28)
+#define GEN8_WM_HZ_DW1_HIZ_RESOLVE				(0x1 << 27)
+#define GEN8_WM_HZ_DW1_PIXEL_OFFSET_ENABLE			(0x1 << 26)
+#define GEN8_WM_HZ_DW1_FULL_SURFACE_DEPTH_CLEAR			(0x1 << 25)
+#define GEN8_WM_HZ_DW1_STENCIL_CLEAR_VALUE__MASK		0x00ff0000
+#define GEN8_WM_HZ_DW1_STENCIL_CLEAR_VALUE__SHIFT		16
+#define GEN8_WM_HZ_DW1_NUMSAMPLES__MASK				0x0000e000
+#define GEN8_WM_HZ_DW1_NUMSAMPLES__SHIFT			13
+#define GEN8_WM_HZ_DW1_NUMSAMPLES_1				(0x0 << 13)
+#define GEN8_WM_HZ_DW1_NUMSAMPLES_2				(0x1 << 13)
+#define GEN8_WM_HZ_DW1_NUMSAMPLES_4				(0x2 << 13)
+#define GEN8_WM_HZ_DW1_NUMSAMPLES_8				(0x3 << 13)
+#define GEN8_WM_HZ_DW1_NUMSAMPLES_16				(0x4 << 13)
+
+#define GEN8_WM_HZ_DW2_RECT_MIN_Y__MASK				0xffff0000
+#define GEN8_WM_HZ_DW2_RECT_MIN_Y__SHIFT			16
+#define GEN8_WM_HZ_DW2_RECT_MIN_X__MASK				0x0000ffff
+#define GEN8_WM_HZ_DW2_RECT_MIN_X__SHIFT			0
+
+#define GEN8_WM_HZ_DW3_RECT_MAX_Y__MASK				0xffff0000
+#define GEN8_WM_HZ_DW3_RECT_MAX_Y__SHIFT			16
+#define GEN8_WM_HZ_DW3_RECT_MAX_X__MASK				0x0000ffff
+#define GEN8_WM_HZ_DW3_RECT_MAX_X__SHIFT			0
+
+#define GEN8_WM_HZ_DW4_SAMPLE_MASK__MASK			0x0000ffff
+#define GEN8_WM_HZ_DW4_SAMPLE_MASK__SHIFT			0
+
+#define GEN7_3DSTATE_PS__SIZE					12
+
+
+#define GEN7_PS_DW1_KERNEL0_ADDR__MASK				0xffffffc0
+#define GEN7_PS_DW1_KERNEL0_ADDR__SHIFT				6
+#define GEN7_PS_DW1_KERNEL0_ADDR__SHR				6
+
+
+
+#define GEN7_PS_DW4_MAX_THREADS__MASK				0xff000000
+#define GEN7_PS_DW4_MAX_THREADS__SHIFT				24
+#define GEN75_PS_DW4_MAX_THREADS__MASK				0xff800000
+#define GEN75_PS_DW4_MAX_THREADS__SHIFT				23
+#define GEN75_PS_DW4_SAMPLE_MASK__MASK				0x000ff000
+#define GEN75_PS_DW4_SAMPLE_MASK__SHIFT				12
+#define GEN7_PS_DW4_PUSH_CONSTANT_ENABLE			(0x1 << 11)
+#define GEN7_PS_DW4_ATTR_ENABLE					(0x1 << 10)
+#define GEN7_PS_DW4_COMPUTE_OMASK				(0x1 << 9)
+#define GEN7_PS_DW4_RT_FAST_CLEAR				(0x1 << 8)
+#define GEN7_PS_DW4_DUAL_SOURCE_BLEND				(0x1 << 7)
+#define GEN7_PS_DW4_RT_RESOLVE					(0x1 << 6)
+#define GEN75_PS_DW4_ACCESS_UAV					(0x1 << 5)
+#define GEN7_PS_DW4_POSOFFSET__MASK				0x00000018
+#define GEN7_PS_DW4_POSOFFSET__SHIFT				3
+#define GEN7_PS_DW4_POSOFFSET_NONE				(0x0 << 3)
+#define GEN7_PS_DW4_POSOFFSET_CENTROID				(0x2 << 3)
+#define GEN7_PS_DW4_POSOFFSET_SAMPLE				(0x3 << 3)
+#define GEN7_PS_DW4_DISPATCH_MODE__MASK				0x00000007
+#define GEN7_PS_DW4_DISPATCH_MODE__SHIFT			0
+
+#define GEN7_PS_DW5_URB_GRF_START0__MASK			0x007f0000
+#define GEN7_PS_DW5_URB_GRF_START0__SHIFT			16
+#define GEN7_PS_DW5_URB_GRF_START1__MASK			0x00007f00
+#define GEN7_PS_DW5_URB_GRF_START1__SHIFT			8
+#define GEN7_PS_DW5_URB_GRF_START2__MASK			0x0000007f
+#define GEN7_PS_DW5_URB_GRF_START2__SHIFT			0
+
+#define GEN7_PS_DW6_KERNEL1_ADDR__MASK				0xffffffc0
+#define GEN7_PS_DW6_KERNEL1_ADDR__SHIFT				6
+#define GEN7_PS_DW6_KERNEL1_ADDR__SHR				6
+
+#define GEN7_PS_DW7_KERNEL2_ADDR__MASK				0xffffffc0
+#define GEN7_PS_DW7_KERNEL2_ADDR__SHIFT				6
+#define GEN7_PS_DW7_KERNEL2_ADDR__SHR				6
+
+
+
+#define GEN8_PS_DW1_KERNEL0_ADDR__MASK				0xffffffc0
+#define GEN8_PS_DW1_KERNEL0_ADDR__SHIFT				6
+#define GEN8_PS_DW1_KERNEL0_ADDR__SHR				6
+
+
+
+
+
+#define GEN8_PS_DW6_MAX_THREADS__MASK				0xff800000
+#define GEN8_PS_DW6_MAX_THREADS__SHIFT				23
+#define GEN8_PS_DW6_PUSH_CONSTANT_ENABLE			(0x1 << 11)
+#define GEN8_PS_DW6_RT_FAST_CLEAR				(0x1 << 8)
+#define GEN8_PS_DW6_RT_RESOLVE					(0x1 << 6)
+#define GEN8_PS_DW6_POSOFFSET__MASK				0x00000018
+#define GEN8_PS_DW6_POSOFFSET__SHIFT				3
+#define GEN8_PS_DW6_POSOFFSET_NONE				(0x0 << 3)
+#define GEN8_PS_DW6_POSOFFSET_CENTROID				(0x2 << 3)
+#define GEN8_PS_DW6_POSOFFSET_SAMPLE				(0x3 << 3)
+#define GEN8_PS_DW6_DISPATCH_MODE__MASK				0x00000007
+#define GEN8_PS_DW6_DISPATCH_MODE__SHIFT			0
+
+#define GEN8_PS_DW7_URB_GRF_START0__MASK			0x007f0000
+#define GEN8_PS_DW7_URB_GRF_START0__SHIFT			16
+#define GEN8_PS_DW7_URB_GRF_START1__MASK			0x00007f00
+#define GEN8_PS_DW7_URB_GRF_START1__SHIFT			8
+#define GEN8_PS_DW7_URB_GRF_START2__MASK			0x0000007f
+#define GEN8_PS_DW7_URB_GRF_START2__SHIFT			0
+
+#define GEN8_PS_DW8_KERNEL1_ADDR__MASK				0xffffffc0
+#define GEN8_PS_DW8_KERNEL1_ADDR__SHIFT				6
+#define GEN8_PS_DW8_KERNEL1_ADDR__SHR				6
+
+
+#define GEN8_PS_DW10_KERNEL2_ADDR__MASK				0xffffffc0
+#define GEN8_PS_DW10_KERNEL2_ADDR__SHIFT			6
+#define GEN8_PS_DW10_KERNEL2_ADDR__SHR				6
+
+
+#define GEN8_3DSTATE_PS_EXTRA__SIZE				2
+
+
+#define GEN8_PSX_DW1_DISPATCH_ENABLE				(0x1 << 31)
+#define GEN8_PSX_DW1_UAV_ONLY					(0x1 << 30)
+#define GEN8_PSX_DW1_COMPUTE_OMASK				(0x1 << 29)
+#define GEN8_PSX_DW1_KILL_PIXEL					(0x1 << 28)
+#define GEN8_PSX_DW1_PSCDEPTH__MASK				0x0c000000
+#define GEN8_PSX_DW1_PSCDEPTH__SHIFT				26
+#define GEN8_PSX_DW1_PSCDEPTH_OFF				(0x0 << 26)
+#define GEN8_PSX_DW1_PSCDEPTH_ON				(0x1 << 26)
+#define GEN8_PSX_DW1_PSCDEPTH_ON_GE				(0x2 << 26)
+#define GEN8_PSX_DW1_PSCDEPTH_ON_LE				(0x3 << 26)
+#define GEN8_PSX_DW1_FORCE_COMPUTE_DEPTH			(0x1 << 25)
+#define GEN8_PSX_DW1_USE_DEPTH					(0x1 << 24)
+#define GEN8_PSX_DW1_USE_W					(0x1 << 23)
+#define GEN8_PSX_DW1_ATTR_ENABLE				(0x1 << 8)
+#define GEN8_PSX_DW1_DISABLE_ALPHA_TO_COVERAGE			(0x1 << 7)
+#define GEN8_PSX_DW1_PER_SAMPLE					(0x1 << 6)
+#define GEN8_PSX_DW1_COMPUTE_STENCIL				(0x1 << 5)
+#define GEN8_PSX_DW1_ACCESS_UAV					(0x1 << 2)
+#define GEN8_PSX_DW1_USE_COVERAGE_MASK				(0x1 << 1)
+
+#define GEN8_3DSTATE_PS_BLEND__SIZE				2
+
+
+#define GEN8_PS_BLEND_DW1_ALPHA_TO_COVERAGE			(0x1 << 31)
+#define GEN8_PS_BLEND_DW1_WRITABLE_RT				(0x1 << 30)
+#define GEN8_PS_BLEND_DW1_BLEND_ENABLE				(0x1 << 29)
+#define GEN8_PS_BLEND_DW1_SRC_ALPHA_FACTOR__MASK		0x1f000000
+#define GEN8_PS_BLEND_DW1_SRC_ALPHA_FACTOR__SHIFT		24
+#define GEN8_PS_BLEND_DW1_DST_ALPHA_FACTOR__MASK		0x00f80000
+#define GEN8_PS_BLEND_DW1_DST_ALPHA_FACTOR__SHIFT		19
+#define GEN8_PS_BLEND_DW1_SRC_COLOR_FACTOR__MASK		0x0007c000
+#define GEN8_PS_BLEND_DW1_SRC_COLOR_FACTOR__SHIFT		14
+#define GEN8_PS_BLEND_DW1_DST_COLOR_FACTOR__MASK		0x00003e00
+#define GEN8_PS_BLEND_DW1_DST_COLOR_FACTOR__SHIFT		9
+#define GEN8_PS_BLEND_DW1_ALPHA_TEST_ENABLE			(0x1 << 8)
+#define GEN8_PS_BLEND_DW1_INDEPENDENT_ALPHA_ENABLE		(0x1 << 7)
+
+#define GEN6_3DSTATE_CONSTANT_ANY__SIZE				11
+
+#define GEN6_CONSTANT_DW0_BUFFER_ENABLES__MASK			0x0000f000
+#define GEN6_CONSTANT_DW0_BUFFER_ENABLES__SHIFT			12
+#define GEN6_CONSTANT_DW0_MOCS__MASK				0x00000f00
+#define GEN6_CONSTANT_DW0_MOCS__SHIFT				8
+
+#define GEN6_CONSTANT_DW_ADDR_READ_LEN__MASK			0x0000001f
+#define GEN6_CONSTANT_DW_ADDR_READ_LEN__SHIFT			0
+#define GEN6_CONSTANT_DW_ADDR_ADDR__MASK			0xffffffe0
+#define GEN6_CONSTANT_DW_ADDR_ADDR__SHIFT			5
+#define GEN6_CONSTANT_DW_ADDR_ADDR__SHR				5
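+
+/* Hand-added note (our reading, not part of the generated file): an __SHR
+ * suffix marks an address field stored as addr >> SHR, i.e. the address must
+ * be (1 << SHR)-byte aligned. Since SHIFT == SHR here, the packed form is
+ * simply:
+ *
+ *   assert((addr & 0x1f) == 0);
+ *   dw |= addr & GEN6_CONSTANT_DW_ADDR_ADDR__MASK;
+ */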
+
+
+
+#define GEN7_CONSTANT_DW1_BUFFER1_READ_LEN__MASK		0xffff0000
+#define GEN7_CONSTANT_DW1_BUFFER1_READ_LEN__SHIFT		16
+#define GEN7_CONSTANT_DW1_BUFFER0_READ_LEN__MASK		0x0000ffff
+#define GEN7_CONSTANT_DW1_BUFFER0_READ_LEN__SHIFT		0
+
+#define GEN7_CONSTANT_DW2_BUFFER3_READ_LEN__MASK		0xffff0000
+#define GEN7_CONSTANT_DW2_BUFFER3_READ_LEN__SHIFT		16
+#define GEN7_CONSTANT_DW2_BUFFER2_READ_LEN__MASK		0x0000ffff
+#define GEN7_CONSTANT_DW2_BUFFER2_READ_LEN__SHIFT		0
+
+#define GEN7_CONSTANT_DW_ADDR_MOCS__MASK			0x0000001f
+#define GEN7_CONSTANT_DW_ADDR_MOCS__SHIFT			0
+#define GEN7_CONSTANT_DW_ADDR_ADDR__MASK			0xffffffe0
+#define GEN7_CONSTANT_DW_ADDR_ADDR__SHIFT			5
+#define GEN7_CONSTANT_DW_ADDR_ADDR__SHR				5
+
+#define GEN8_CONSTANT_DW_ADDR_ADDR__MASK			0xffffffe0
+#define GEN8_CONSTANT_DW_ADDR_ADDR__SHIFT			5
+#define GEN8_CONSTANT_DW_ADDR_ADDR__SHR				5
+
+#define GEN6_3DSTATE_SAMPLE_MASK__SIZE				2
+
+
+#define GEN6_SAMPLE_MASK_DW1_VAL__MASK				0x0000000f
+#define GEN6_SAMPLE_MASK_DW1_VAL__SHIFT				0
+#define GEN7_SAMPLE_MASK_DW1_VAL__MASK				0x000000ff
+#define GEN7_SAMPLE_MASK_DW1_VAL__SHIFT				0
+#define GEN8_SAMPLE_MASK_DW1_VAL__MASK				0x0000ffff
+#define GEN8_SAMPLE_MASK_DW1_VAL__SHIFT				0
+
+#define GEN6_3DSTATE_DRAWING_RECTANGLE__SIZE			4
+
+
+#define GEN6_DRAWING_RECTANGLE_DW1_MIN_Y__MASK			0xffff0000
+#define GEN6_DRAWING_RECTANGLE_DW1_MIN_Y__SHIFT			16
+#define GEN6_DRAWING_RECTANGLE_DW1_MIN_X__MASK			0x0000ffff
+#define GEN6_DRAWING_RECTANGLE_DW1_MIN_X__SHIFT			0
+
+#define GEN6_DRAWING_RECTANGLE_DW2_MAX_Y__MASK			0xffff0000
+#define GEN6_DRAWING_RECTANGLE_DW2_MAX_Y__SHIFT			16
+#define GEN6_DRAWING_RECTANGLE_DW2_MAX_X__MASK			0x0000ffff
+#define GEN6_DRAWING_RECTANGLE_DW2_MAX_X__SHIFT			0
+
+#define GEN6_DRAWING_RECTANGLE_DW3_ORIGIN_Y__MASK		0xffff0000
+#define GEN6_DRAWING_RECTANGLE_DW3_ORIGIN_Y__SHIFT		16
+#define GEN6_DRAWING_RECTANGLE_DW3_ORIGIN_X__MASK		0x0000ffff
+#define GEN6_DRAWING_RECTANGLE_DW3_ORIGIN_X__SHIFT		0
+
+#define GEN6_3DSTATE_DEPTH_BUFFER__SIZE				8
+
+
+#define GEN6_DEPTH_DW1_TYPE__MASK				0xe0000000
+#define GEN6_DEPTH_DW1_TYPE__SHIFT				29
+#define GEN6_DEPTH_DW1_TILING__MASK				0x0c000000
+#define GEN6_DEPTH_DW1_TILING__SHIFT				26
+#define GEN6_DEPTH_DW1_STR_MODE__MASK				0x01800000
+#define GEN6_DEPTH_DW1_STR_MODE__SHIFT				23
+#define GEN6_DEPTH_DW1_HIZ_ENABLE				(0x1 << 22)
+#define GEN6_DEPTH_DW1_SEPARATE_STENCIL				(0x1 << 21)
+#define GEN6_DEPTH_DW1_FORMAT__MASK				0x001c0000
+#define GEN6_DEPTH_DW1_FORMAT__SHIFT				18
+#define GEN6_DEPTH_DW1_PITCH__MASK				0x0001ffff
+#define GEN6_DEPTH_DW1_PITCH__SHIFT				0
+
+
+#define GEN6_DEPTH_DW3_HEIGHT__MASK				0xfff80000
+#define GEN6_DEPTH_DW3_HEIGHT__SHIFT				19
+#define GEN6_DEPTH_DW3_WIDTH__MASK				0x0007ffc0
+#define GEN6_DEPTH_DW3_WIDTH__SHIFT				6
+#define GEN6_DEPTH_DW3_LOD__MASK				0x0000003c
+#define GEN6_DEPTH_DW3_LOD__SHIFT				2
+#define GEN6_DEPTH_DW3_MIPLAYOUT__MASK				0x00000002
+#define GEN6_DEPTH_DW3_MIPLAYOUT__SHIFT				1
+#define GEN6_DEPTH_DW3_MIPLAYOUT_BELOW				(0x0 << 1)
+#define GEN6_DEPTH_DW3_MIPLAYOUT_RIGHT				(0x1 << 1)
+
+#define GEN6_DEPTH_DW4_DEPTH__MASK				0xffe00000
+#define GEN6_DEPTH_DW4_DEPTH__SHIFT				21
+#define GEN6_DEPTH_DW4_MIN_ARRAY_ELEMENT__MASK			0x001ffc00
+#define GEN6_DEPTH_DW4_MIN_ARRAY_ELEMENT__SHIFT			10
+#define GEN6_DEPTH_DW4_RT_VIEW_EXTENT__MASK			0x000003fe
+#define GEN6_DEPTH_DW4_RT_VIEW_EXTENT__SHIFT			1
+
+#define GEN6_DEPTH_DW5_OFFSET_Y__MASK				0xffff0000
+#define GEN6_DEPTH_DW5_OFFSET_Y__SHIFT				16
+#define GEN6_DEPTH_DW5_OFFSET_X__MASK				0x0000ffff
+#define GEN6_DEPTH_DW5_OFFSET_X__SHIFT				0
+
+#define GEN6_DEPTH_DW6_MOCS__MASK				0xf8000000
+#define GEN6_DEPTH_DW6_MOCS__SHIFT				27
+
+
+
+#define GEN7_DEPTH_DW1_TYPE__MASK				0xe0000000
+#define GEN7_DEPTH_DW1_TYPE__SHIFT				29
+#define GEN7_DEPTH_DW1_DEPTH_WRITE_ENABLE			(0x1 << 28)
+#define GEN7_DEPTH_DW1_STENCIL_WRITE_ENABLE			(0x1 << 27)
+#define GEN7_DEPTH_DW1_HIZ_ENABLE				(0x1 << 22)
+#define GEN7_DEPTH_DW1_FORMAT__MASK				0x001c0000
+#define GEN7_DEPTH_DW1_FORMAT__SHIFT				18
+#define GEN7_DEPTH_DW1_PITCH__MASK				0x0003ffff
+#define GEN7_DEPTH_DW1_PITCH__SHIFT				0
+
+
+#define GEN7_DEPTH_DW3_HEIGHT__MASK				0xfffc0000
+#define GEN7_DEPTH_DW3_HEIGHT__SHIFT				18
+#define GEN7_DEPTH_DW3_WIDTH__MASK				0x0003fff0
+#define GEN7_DEPTH_DW3_WIDTH__SHIFT				4
+#define GEN7_DEPTH_DW3_LOD__MASK				0x0000000f
+#define GEN7_DEPTH_DW3_LOD__SHIFT				0
+
+#define GEN7_DEPTH_DW4_DEPTH__MASK				0xffe00000
+#define GEN7_DEPTH_DW4_DEPTH__SHIFT				21
+#define GEN7_DEPTH_DW4_MIN_ARRAY_ELEMENT__MASK			0x001ffc00
+#define GEN7_DEPTH_DW4_MIN_ARRAY_ELEMENT__SHIFT			10
+#define GEN7_DEPTH_DW4_MOCS__MASK				0x0000000f
+#define GEN7_DEPTH_DW4_MOCS__SHIFT				0
+
+#define GEN7_DEPTH_DW5_OFFSET_Y__MASK				0xffff0000
+#define GEN7_DEPTH_DW5_OFFSET_Y__SHIFT				16
+#define GEN7_DEPTH_DW5_OFFSET_X__MASK				0x0000ffff
+#define GEN7_DEPTH_DW5_OFFSET_X__SHIFT				0
+
+#define GEN7_DEPTH_DW6_RT_VIEW_EXTENT__MASK			0xffe00000
+#define GEN7_DEPTH_DW6_RT_VIEW_EXTENT__SHIFT			21
+
+
+
+#define GEN8_DEPTH_DW1_TYPE__MASK				0xe0000000
+#define GEN8_DEPTH_DW1_TYPE__SHIFT				29
+#define GEN8_DEPTH_DW1_DEPTH_WRITE_ENABLE			(0x1 << 28)
+#define GEN8_DEPTH_DW1_STENCIL_WRITE_ENABLE			(0x1 << 27)
+#define GEN8_DEPTH_DW1_HIZ_ENABLE				(0x1 << 22)
+#define GEN8_DEPTH_DW1_FORMAT__MASK				0x001c0000
+#define GEN8_DEPTH_DW1_FORMAT__SHIFT				18
+#define GEN8_DEPTH_DW1_PITCH__MASK				0x0003ffff
+#define GEN8_DEPTH_DW1_PITCH__SHIFT				0
+
+
+
+#define GEN8_DEPTH_DW4_HEIGHT__MASK				0xfffc0000
+#define GEN8_DEPTH_DW4_HEIGHT__SHIFT				18
+#define GEN8_DEPTH_DW4_WIDTH__MASK				0x0003fff0
+#define GEN8_DEPTH_DW4_WIDTH__SHIFT				4
+#define GEN8_DEPTH_DW4_LOD__MASK				0x0000000f
+#define GEN8_DEPTH_DW4_LOD__SHIFT				0
+
+#define GEN8_DEPTH_DW5_DEPTH__MASK				0xffe00000
+#define GEN8_DEPTH_DW5_DEPTH__SHIFT				21
+#define GEN8_DEPTH_DW5_MIN_ARRAY_ELEMENT__MASK			0x001ffc00
+#define GEN8_DEPTH_DW5_MIN_ARRAY_ELEMENT__SHIFT			10
+#define GEN8_DEPTH_DW5_MOCS__MASK				0x0000007f
+#define GEN8_DEPTH_DW5_MOCS__SHIFT				0
+
+#define GEN8_DEPTH_DW6_OFFSET_Y__MASK				0xffff0000
+#define GEN8_DEPTH_DW6_OFFSET_Y__SHIFT				16
+#define GEN8_DEPTH_DW6_OFFSET_X__MASK				0x0000ffff
+#define GEN8_DEPTH_DW6_OFFSET_X__SHIFT				0
+
+#define GEN8_DEPTH_DW7_RT_VIEW_EXTENT__MASK			0xffe00000
+#define GEN8_DEPTH_DW7_RT_VIEW_EXTENT__SHIFT			21
+#define GEN8_DEPTH_DW7_QPITCH__MASK				0x00007fff
+#define GEN8_DEPTH_DW7_QPITCH__SHIFT				0
+
+#define GEN6_3DSTATE_POLY_STIPPLE_OFFSET__SIZE			2
+
+
+#define GEN6_POLY_STIPPLE_OFFSET_DW1_X__MASK			0x00001f00
+#define GEN6_POLY_STIPPLE_OFFSET_DW1_X__SHIFT			8
+#define GEN6_POLY_STIPPLE_OFFSET_DW1_Y__MASK			0x0000001f
+#define GEN6_POLY_STIPPLE_OFFSET_DW1_Y__SHIFT			0
+
+#define GEN6_3DSTATE_POLY_STIPPLE_PATTERN__SIZE			33
+
+
+
+#define GEN6_3DSTATE_LINE_STIPPLE__SIZE				3
+
+
+#define GEN6_LINE_STIPPLE_DW1_PATTERN__MASK			0x0000ffff
+#define GEN6_LINE_STIPPLE_DW1_PATTERN__SHIFT			0
+
+#define GEN6_LINE_STIPPLE_DW2_INVERSE_REPEAT_COUNT__MASK	0xffff0000
+#define GEN6_LINE_STIPPLE_DW2_INVERSE_REPEAT_COUNT__SHIFT	16
+#define GEN6_LINE_STIPPLE_DW2_INVERSE_REPEAT_COUNT__RADIX	13
+#define GEN7_LINE_STIPPLE_DW2_INVERSE_REPEAT_COUNT__MASK	0xffff8000
+#define GEN7_LINE_STIPPLE_DW2_INVERSE_REPEAT_COUNT__SHIFT	15
+#define GEN7_LINE_STIPPLE_DW2_INVERSE_REPEAT_COUNT__RADIX	16
+#define GEN6_LINE_STIPPLE_DW2_REPEAT_COUNT__MASK		0x000001ff
+#define GEN6_LINE_STIPPLE_DW2_REPEAT_COUNT__SHIFT		0
+
+#define GEN6_3DSTATE_AA_LINE_PARAMETERS__SIZE			3
+
+
+#define GEN6_AA_LINE_DW1_BIAS__MASK				0x00ff0000
+#define GEN6_AA_LINE_DW1_BIAS__SHIFT				16
+#define GEN6_AA_LINE_DW1_BIAS__RADIX				8
+#define GEN6_AA_LINE_DW1_SLOPE__MASK				0x000000ff
+#define GEN6_AA_LINE_DW1_SLOPE__SHIFT				0
+#define GEN6_AA_LINE_DW1_SLOPE__RADIX				8
+
+#define GEN6_AA_LINE_DW2_CAP_BIAS__MASK				0x00ff0000
+#define GEN6_AA_LINE_DW2_CAP_BIAS__SHIFT			16
+#define GEN6_AA_LINE_DW2_CAP_BIAS__RADIX			8
+#define GEN6_AA_LINE_DW2_CAP_SLOPE__MASK			0x000000ff
+#define GEN6_AA_LINE_DW2_CAP_SLOPE__SHIFT			0
+#define GEN6_AA_LINE_DW2_CAP_SLOPE__RADIX			8
+
+#define GEN6_3DSTATE_GS_SVB_INDEX__SIZE				4
+
+
+#define GEN6_SVBI_DW1_INDEX__MASK				0x60000000
+#define GEN6_SVBI_DW1_INDEX__SHIFT				29
+#define GEN6_SVBI_DW1_LOAD_INTERNAL_VERTEX_COUNT		(0x1 << 0)
+
+
+
+#define GEN6_3DSTATE_MULTISAMPLE__SIZE				4
+
+
+#define GEN75_MULTISAMPLE_DW1_DX9_MULTISAMPLE_ENABLE		(0x1 << 5)
+#define GEN6_MULTISAMPLE_DW1_PIXLOC__MASK			0x00000010
+#define GEN6_MULTISAMPLE_DW1_PIXLOC__SHIFT			4
+#define GEN6_MULTISAMPLE_DW1_PIXLOC_CENTER			(0x0 << 4)
+#define GEN6_MULTISAMPLE_DW1_PIXLOC_UL_CORNER			(0x1 << 4)
+#define GEN6_MULTISAMPLE_DW1_NUMSAMPLES__MASK			0x0000000e
+#define GEN6_MULTISAMPLE_DW1_NUMSAMPLES__SHIFT			1
+#define GEN6_MULTISAMPLE_DW1_NUMSAMPLES_1			(0x0 << 1)
+#define GEN8_MULTISAMPLE_DW1_NUMSAMPLES_2			(0x1 << 1)
+#define GEN6_MULTISAMPLE_DW1_NUMSAMPLES_4			(0x2 << 1)
+#define GEN7_MULTISAMPLE_DW1_NUMSAMPLES_8			(0x3 << 1)
+#define GEN8_MULTISAMPLE_DW1_NUMSAMPLES_16			(0x4 << 1)
+
+
+
+#define GEN8_3DSTATE_SAMPLE_PATTERN__SIZE			9
+
+
+
+
+
+#define GEN8_SAMPLE_PATTERN_DW8_1X__MASK			0x00ff0000
+#define GEN8_SAMPLE_PATTERN_DW8_1X__SHIFT			16
+#define GEN8_SAMPLE_PATTERN_DW8_2X__MASK			0x0000ffff
+#define GEN8_SAMPLE_PATTERN_DW8_2X__SHIFT			0
+
+#define GEN6_3DSTATE_STENCIL_BUFFER__SIZE			5
+
+
+#define GEN75_STENCIL_DW1_STENCIL_BUFFER_ENABLE			(0x1 << 31)
+#define GEN6_STENCIL_DW1_MOCS__MASK				0x1e000000
+#define GEN6_STENCIL_DW1_MOCS__SHIFT				25
+#define GEN8_STENCIL_DW1_MOCS__MASK				0x1fc00000
+#define GEN8_STENCIL_DW1_MOCS__SHIFT				22
+#define GEN6_STENCIL_DW1_PITCH__MASK				0x0001ffff
+#define GEN6_STENCIL_DW1_PITCH__SHIFT				0
+
+
+
+#define GEN8_STENCIL_DW4_QPITCH__MASK				0x00007fff
+#define GEN8_STENCIL_DW4_QPITCH__SHIFT				0
+
+#define GEN6_3DSTATE_HIER_DEPTH_BUFFER__SIZE			5
+
+
+#define GEN6_HIZ_DW1_MOCS__MASK					0x1e000000
+#define GEN6_HIZ_DW1_MOCS__SHIFT				25
+#define GEN8_HIZ_DW1_MOCS__MASK					0xfe000000
+#define GEN8_HIZ_DW1_MOCS__SHIFT				25
+#define GEN6_HIZ_DW1_PITCH__MASK				0x0001ffff
+#define GEN6_HIZ_DW1_PITCH__SHIFT				0
+
+
+
+#define GEN8_HIZ_DW4_QPITCH__MASK				0x00007fff
+#define GEN8_HIZ_DW4_QPITCH__SHIFT				0
+
+#define GEN6_3DSTATE_CLEAR_PARAMS__SIZE				3
+
+#define GEN6_CLEAR_PARAMS_DW0_VALID				(0x1 << 15)
+
+
+
+#define GEN7_CLEAR_PARAMS_DW2_VALID				(0x1 << 0)
+
+#define GEN6_3DPRIMITIVE__SIZE					7
+
+#define GEN6_3DPRIM_DW0_ACCESS__MASK				0x00008000
+#define GEN6_3DPRIM_DW0_ACCESS__SHIFT				15
+#define GEN6_3DPRIM_DW0_ACCESS_SEQUENTIAL			(0x0 << 15)
+#define GEN6_3DPRIM_DW0_ACCESS_RANDOM				(0x1 << 15)
+#define GEN6_3DPRIM_DW0_TYPE__MASK				0x00007c00
+#define GEN6_3DPRIM_DW0_TYPE__SHIFT				10
+#define GEN6_3DPRIM_DW0_USE_INTERNAL_VERTEX_COUNT		(0x1 << 9)
+
+
+
+
+
+
+
+#define GEN7_3DPRIM_DW0_INDIRECT_PARAM_ENABLE			(0x1 << 10)
+#define GEN75_3DPRIM_DW0_UAV_COHERENCY_REQUIRED			(0x1 << 9)
+#define GEN7_3DPRIM_DW0_PREDICATE_ENABLE			(0x1 << 8)
+
+#define GEN7_3DPRIM_DW1_END_OFFSET_ENABLE			(0x1 << 9)
+#define GEN7_3DPRIM_DW1_ACCESS__MASK				0x00000100
+#define GEN7_3DPRIM_DW1_ACCESS__SHIFT				8
+#define GEN7_3DPRIM_DW1_ACCESS_SEQUENTIAL			(0x0 << 8)
+#define GEN7_3DPRIM_DW1_ACCESS_RANDOM				(0x1 << 8)
+#define GEN7_3DPRIM_DW1_TYPE__MASK				0x0000003f
+#define GEN7_3DPRIM_DW1_TYPE__SHIFT				0
+
+
+
+
+
+
+
+#endif /* GEN_RENDER_3D_XML */
diff --git a/icd/intel/genhw/gen_render_dynamic.xml.h b/icd/intel/genhw/gen_render_dynamic.xml.h
new file mode 100644
index 0000000..6d815be
--- /dev/null
+++ b/icd/intel/genhw/gen_render_dynamic.xml.h
@@ -0,0 +1,526 @@
+#ifndef GEN_RENDER_DYNAMIC_XML
+#define GEN_RENDER_DYNAMIC_XML
+
+/* Autogenerated file, DO NOT EDIT manually!
+
+This file was generated by the rules-ng-ng headergen tool in this git repository:
+https://github.com/olvaffe/envytools/
+git clone https://github.com/olvaffe/envytools.git
+
+Copyright (C) 2014-2015 by the following authors:
+- Chia-I Wu <olvaffe@gmail.com> (olv)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice (including the
+next paragraph) shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
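+
+/* Hand-added illustrative sketch, not part of the generated file: the
+ * definitions below follow a FIELD__MASK / FIELD__SHIFT naming convention,
+ * so reading and writing a field reduces to the two helpers sketched here.
+ * The helper names are hypothetical (ours, not the generator's).
+ */
+static inline unsigned int
+gen_field_pack(unsigned int value, unsigned int mask, unsigned int shift)
+{
+   /* move the value into position and clamp it to the field's bit range */
+   return (value << shift) & mask;
+}
+
+static inline unsigned int
+gen_field_unpack(unsigned int dword, unsigned int mask, unsigned int shift)
+{
+   /* isolate the field's bits, then normalize them down to bit 0 */
+   return (dword & mask) >> shift;
+}
+
+/* e.g. gen_field_unpack(dw1, GEN6_ZS_DW1_STENCIL0_WRITEMASK__MASK,
+ *                       GEN6_ZS_DW1_STENCIL0_WRITEMASK__SHIFT)
+ */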
+
+
+enum gen_compare_function {
+    GEN6_COMPAREFUNCTION_ALWAYS				      = 0x0,
+    GEN6_COMPAREFUNCTION_NEVER				      = 0x1,
+    GEN6_COMPAREFUNCTION_LESS				      = 0x2,
+    GEN6_COMPAREFUNCTION_EQUAL				      = 0x3,
+    GEN6_COMPAREFUNCTION_LEQUAL				      = 0x4,
+    GEN6_COMPAREFUNCTION_GREATER			      = 0x5,
+    GEN6_COMPAREFUNCTION_NOTEQUAL			      = 0x6,
+    GEN6_COMPAREFUNCTION_GEQUAL				      = 0x7,
+};
+
+enum gen_stencil_op {
+    GEN6_STENCILOP_KEEP					      = 0x0,
+    GEN6_STENCILOP_ZERO					      = 0x1,
+    GEN6_STENCILOP_REPLACE				      = 0x2,
+    GEN6_STENCILOP_INCRSAT				      = 0x3,
+    GEN6_STENCILOP_DECRSAT				      = 0x4,
+    GEN6_STENCILOP_INCR					      = 0x5,
+    GEN6_STENCILOP_DECR					      = 0x6,
+    GEN6_STENCILOP_INVERT				      = 0x7,
+};
+
+enum gen_blend_factor {
+    GEN6_BLENDFACTOR_ONE				      = 0x1,
+    GEN6_BLENDFACTOR_SRC_COLOR				      = 0x2,
+    GEN6_BLENDFACTOR_SRC_ALPHA				      = 0x3,
+    GEN6_BLENDFACTOR_DST_ALPHA				      = 0x4,
+    GEN6_BLENDFACTOR_DST_COLOR				      = 0x5,
+    GEN6_BLENDFACTOR_SRC_ALPHA_SATURATE			      = 0x6,
+    GEN6_BLENDFACTOR_CONST_COLOR			      = 0x7,
+    GEN6_BLENDFACTOR_CONST_ALPHA			      = 0x8,
+    GEN6_BLENDFACTOR_SRC1_COLOR				      = 0x9,
+    GEN6_BLENDFACTOR_SRC1_ALPHA				      = 0xa,
+    GEN6_BLENDFACTOR_ZERO				      = 0x11,
+    GEN6_BLENDFACTOR_INV_SRC_COLOR			      = 0x12,
+    GEN6_BLENDFACTOR_INV_SRC_ALPHA			      = 0x13,
+    GEN6_BLENDFACTOR_INV_DST_ALPHA			      = 0x14,
+    GEN6_BLENDFACTOR_INV_DST_COLOR			      = 0x15,
+    GEN6_BLENDFACTOR_INV_CONST_COLOR			      = 0x17,
+    GEN6_BLENDFACTOR_INV_CONST_ALPHA			      = 0x18,
+    GEN6_BLENDFACTOR_INV_SRC1_COLOR			      = 0x19,
+    GEN6_BLENDFACTOR_INV_SRC1_ALPHA			      = 0x1a,
+};
+
+enum gen_blend_function {
+    GEN6_BLENDFUNCTION_ADD				      = 0x0,
+    GEN6_BLENDFUNCTION_SUBTRACT				      = 0x1,
+    GEN6_BLENDFUNCTION_REVERSE_SUBTRACT			      = 0x2,
+    GEN6_BLENDFUNCTION_MIN				      = 0x3,
+    GEN6_BLENDFUNCTION_MAX				      = 0x4,
+};
+
+enum gen_logicop_function {
+    GEN6_LOGICOP_CLEAR					      = 0x0,
+    GEN6_LOGICOP_NOR					      = 0x1,
+    GEN6_LOGICOP_AND_INVERTED				      = 0x2,
+    GEN6_LOGICOP_COPY_INVERTED				      = 0x3,
+    GEN6_LOGICOP_AND_REVERSE				      = 0x4,
+    GEN6_LOGICOP_INVERT					      = 0x5,
+    GEN6_LOGICOP_XOR					      = 0x6,
+    GEN6_LOGICOP_NAND					      = 0x7,
+    GEN6_LOGICOP_AND					      = 0x8,
+    GEN6_LOGICOP_EQUIV					      = 0x9,
+    GEN6_LOGICOP_NOOP					      = 0xa,
+    GEN6_LOGICOP_OR_INVERTED				      = 0xb,
+    GEN6_LOGICOP_COPY					      = 0xc,
+    GEN6_LOGICOP_OR_REVERSE				      = 0xd,
+    GEN6_LOGICOP_OR					      = 0xe,
+    GEN6_LOGICOP_SET					      = 0xf,
+};
+
+enum gen_sampler_mip_filter {
+    GEN6_MIPFILTER_NONE					      = 0x0,
+    GEN6_MIPFILTER_NEAREST				      = 0x1,
+    GEN6_MIPFILTER_LINEAR				      = 0x3,
+};
+
+enum gen_sampler_map_filter {
+    GEN6_MAPFILTER_NEAREST				      = 0x0,
+    GEN6_MAPFILTER_LINEAR				      = 0x1,
+    GEN6_MAPFILTER_ANISOTROPIC				      = 0x2,
+    GEN6_MAPFILTER_MONO					      = 0x6,
+};
+
+enum gen_sampler_aniso_ratio {
+    GEN6_ANISORATIO_2					      = 0x0,
+    GEN6_ANISORATIO_4					      = 0x1,
+    GEN6_ANISORATIO_6					      = 0x2,
+    GEN6_ANISORATIO_8					      = 0x3,
+    GEN6_ANISORATIO_10					      = 0x4,
+    GEN6_ANISORATIO_12					      = 0x5,
+    GEN6_ANISORATIO_14					      = 0x6,
+    GEN6_ANISORATIO_16					      = 0x7,
+};
+
+enum gen_sampler_texcoord_mode {
+    GEN6_TEXCOORDMODE_WRAP				      = 0x0,
+    GEN6_TEXCOORDMODE_MIRROR				      = 0x1,
+    GEN6_TEXCOORDMODE_CLAMP				      = 0x2,
+    GEN6_TEXCOORDMODE_CUBE				      = 0x3,
+    GEN6_TEXCOORDMODE_CLAMP_BORDER			      = 0x4,
+    GEN6_TEXCOORDMODE_MIRROR_ONCE			      = 0x5,
+    GEN8_TEXCOORDMODE_HALF_BORDER			      = 0x6,
+};
+
+enum gen_sampler_key_filter {
+    GEN6_KEYFILTER_KILL_ON_ANY_MATCH			      = 0x0,
+    GEN6_KEYFILTER_REPLACE_BLACK			      = 0x1,
+};
+
+#define GEN6_COLOR_CALC_STATE__SIZE				6
+
+#define GEN6_CC_DW0_STENCIL0_REF__MASK				0xff000000
+#define GEN6_CC_DW0_STENCIL0_REF__SHIFT				24
+#define GEN6_CC_DW0_STENCIL1_REF__MASK				0x00ff0000
+#define GEN6_CC_DW0_STENCIL1_REF__SHIFT				16
+#define GEN6_CC_DW0_ROUND_DISABLE_DISABLE			(0x1 << 15)
+#define GEN6_CC_DW0_ALPHATEST__MASK				0x00000001
+#define GEN6_CC_DW0_ALPHATEST__SHIFT				0
+#define GEN6_CC_DW0_ALPHATEST_UNORM8				0x0
+#define GEN6_CC_DW0_ALPHATEST_FLOAT32				0x1
+
+
+
+
+
+
+#define GEN6_DEPTH_STENCIL_STATE__SIZE				3
+
+#define GEN6_ZS_DW0_STENCIL_TEST_ENABLE				(0x1 << 31)
+#define GEN6_ZS_DW0_STENCIL0_FUNC__MASK				0x70000000
+#define GEN6_ZS_DW0_STENCIL0_FUNC__SHIFT			28
+#define GEN6_ZS_DW0_STENCIL0_FAIL_OP__MASK			0x0e000000
+#define GEN6_ZS_DW0_STENCIL0_FAIL_OP__SHIFT			25
+#define GEN6_ZS_DW0_STENCIL0_ZFAIL_OP__MASK			0x01c00000
+#define GEN6_ZS_DW0_STENCIL0_ZFAIL_OP__SHIFT			22
+#define GEN6_ZS_DW0_STENCIL0_ZPASS_OP__MASK			0x00380000
+#define GEN6_ZS_DW0_STENCIL0_ZPASS_OP__SHIFT			19
+#define GEN6_ZS_DW0_STENCIL_WRITE_ENABLE			(0x1 << 18)
+#define GEN6_ZS_DW0_STENCIL1_ENABLE				(0x1 << 15)
+#define GEN6_ZS_DW0_STENCIL1_FUNC__MASK				0x00007000
+#define GEN6_ZS_DW0_STENCIL1_FUNC__SHIFT			12
+#define GEN6_ZS_DW0_STENCIL1_FAIL_OP__MASK			0x00000e00
+#define GEN6_ZS_DW0_STENCIL1_FAIL_OP__SHIFT			9
+#define GEN6_ZS_DW0_STENCIL1_ZFAIL_OP__MASK			0x000001c0
+#define GEN6_ZS_DW0_STENCIL1_ZFAIL_OP__SHIFT			6
+#define GEN6_ZS_DW0_STENCIL1_ZPASS_OP__MASK			0x00000038
+#define GEN6_ZS_DW0_STENCIL1_ZPASS_OP__SHIFT			3
+
+#define GEN6_ZS_DW1_STENCIL0_VALUEMASK__MASK			0xff000000
+#define GEN6_ZS_DW1_STENCIL0_VALUEMASK__SHIFT			24
+#define GEN6_ZS_DW1_STENCIL0_WRITEMASK__MASK			0x00ff0000
+#define GEN6_ZS_DW1_STENCIL0_WRITEMASK__SHIFT			16
+#define GEN6_ZS_DW1_STENCIL1_VALUEMASK__MASK			0x0000ff00
+#define GEN6_ZS_DW1_STENCIL1_VALUEMASK__SHIFT			8
+#define GEN6_ZS_DW1_STENCIL1_WRITEMASK__MASK			0x000000ff
+#define GEN6_ZS_DW1_STENCIL1_WRITEMASK__SHIFT			0
+
+#define GEN6_ZS_DW2_DEPTH_TEST_ENABLE				(0x1 << 31)
+#define GEN6_ZS_DW2_DEPTH_FUNC__MASK				0x38000000
+#define GEN6_ZS_DW2_DEPTH_FUNC__SHIFT				27
+#define GEN6_ZS_DW2_DEPTH_WRITE_ENABLE				(0x1 << 26)
+
+#define GEN6_BLEND_STATE__SIZE					17
+
+
+#define GEN6_RT_DW0_BLEND_ENABLE				(0x1 << 31)
+#define GEN6_RT_DW0_INDEPENDENT_ALPHA_ENABLE			(0x1 << 30)
+#define GEN6_RT_DW0_ALPHA_FUNC__MASK				0x1c000000
+#define GEN6_RT_DW0_ALPHA_FUNC__SHIFT				26
+#define GEN6_RT_DW0_SRC_ALPHA_FACTOR__MASK			0x01f00000
+#define GEN6_RT_DW0_SRC_ALPHA_FACTOR__SHIFT			20
+#define GEN6_RT_DW0_DST_ALPHA_FACTOR__MASK			0x000f8000
+#define GEN6_RT_DW0_DST_ALPHA_FACTOR__SHIFT			15
+#define GEN6_RT_DW0_COLOR_FUNC__MASK				0x00003800
+#define GEN6_RT_DW0_COLOR_FUNC__SHIFT				11
+#define GEN6_RT_DW0_SRC_COLOR_FACTOR__MASK			0x000003e0
+#define GEN6_RT_DW0_SRC_COLOR_FACTOR__SHIFT			5
+#define GEN6_RT_DW0_DST_COLOR_FACTOR__MASK			0x0000001f
+#define GEN6_RT_DW0_DST_COLOR_FACTOR__SHIFT			0
+
+#define GEN6_RT_DW1_ALPHA_TO_COVERAGE				(0x1 << 31)
+#define GEN6_RT_DW1_ALPHA_TO_ONE				(0x1 << 30)
+#define GEN6_RT_DW1_ALPHA_TO_COVERAGE_DITHER			(0x1 << 29)
+#define GEN6_RT_DW1_WRITE_DISABLE_A				(0x1 << 27)
+#define GEN6_RT_DW1_WRITE_DISABLE_R				(0x1 << 26)
+#define GEN6_RT_DW1_WRITE_DISABLE_G				(0x1 << 25)
+#define GEN6_RT_DW1_WRITE_DISABLE_B				(0x1 << 24)
+#define GEN6_RT_DW1_LOGICOP_ENABLE				(0x1 << 22)
+#define GEN6_RT_DW1_LOGICOP_FUNC__MASK				0x003c0000
+#define GEN6_RT_DW1_LOGICOP_FUNC__SHIFT				18
+#define GEN6_RT_DW1_ALPHA_TEST_ENABLE				(0x1 << 16)
+#define GEN6_RT_DW1_ALPHA_TEST_FUNC__MASK			0x0000e000
+#define GEN6_RT_DW1_ALPHA_TEST_FUNC__SHIFT			13
+#define GEN6_RT_DW1_DITHER_ENABLE				(0x1 << 12)
+#define GEN6_RT_DW1_X_DITHER_OFFSET__MASK			0x00000c00
+#define GEN6_RT_DW1_X_DITHER_OFFSET__SHIFT			10
+#define GEN6_RT_DW1_Y_DITHER_OFFSET__MASK			0x00000300
+#define GEN6_RT_DW1_Y_DITHER_OFFSET__SHIFT			8
+#define GEN6_RT_DW1_COLORCLAMP__MASK				0x0000000c
+#define GEN6_RT_DW1_COLORCLAMP__SHIFT				2
+#define GEN6_RT_DW1_COLORCLAMP_UNORM				(0x0 << 2)
+#define GEN6_RT_DW1_COLORCLAMP_SNORM				(0x1 << 2)
+#define GEN6_RT_DW1_COLORCLAMP_RTFORMAT				(0x2 << 2)
+#define GEN6_RT_DW1_PRE_BLEND_CLAMP				(0x1 << 1)
+#define GEN6_RT_DW1_POST_BLEND_CLAMP				(0x1 << 0)
+
+
+#define GEN8_BLEND_DW0_ALPHA_TO_COVERAGE			(0x1 << 31)
+#define GEN8_BLEND_DW0_INDEPENDENT_ALPHA_ENABLE			(0x1 << 30)
+#define GEN8_BLEND_DW0_ALPHA_TO_ONE				(0x1 << 29)
+#define GEN8_BLEND_DW0_ALPHA_TO_COVERAGE_DITHER			(0x1 << 28)
+#define GEN8_BLEND_DW0_ALPHA_TEST_ENABLE			(0x1 << 27)
+#define GEN8_BLEND_DW0_ALPHA_TEST_FUNC__MASK			0x07000000
+#define GEN8_BLEND_DW0_ALPHA_TEST_FUNC__SHIFT			24
+#define GEN8_BLEND_DW0_DITHER_ENABLE				(0x1 << 23)
+#define GEN8_BLEND_DW0_X_DITHER_OFFSET__MASK			0x00600000
+#define GEN8_BLEND_DW0_X_DITHER_OFFSET__SHIFT			21
+#define GEN8_BLEND_DW0_Y_DITHER_OFFSET__MASK			0x00180000
+#define GEN8_BLEND_DW0_Y_DITHER_OFFSET__SHIFT			19
+
+
+#define GEN8_RT_DW0_BLEND_ENABLE				(0x1 << 31)
+#define GEN8_RT_DW0_SRC_COLOR_FACTOR__MASK			0x7c000000
+#define GEN8_RT_DW0_SRC_COLOR_FACTOR__SHIFT			26
+#define GEN8_RT_DW0_DST_COLOR_FACTOR__MASK			0x03e00000
+#define GEN8_RT_DW0_DST_COLOR_FACTOR__SHIFT			21
+#define GEN8_RT_DW0_COLOR_FUNC__MASK				0x001c0000
+#define GEN8_RT_DW0_COLOR_FUNC__SHIFT				18
+#define GEN8_RT_DW0_SRC_ALPHA_FACTOR__MASK			0x0003e000
+#define GEN8_RT_DW0_SRC_ALPHA_FACTOR__SHIFT			13
+#define GEN8_RT_DW0_DST_ALPHA_FACTOR__MASK			0x00001f00
+#define GEN8_RT_DW0_DST_ALPHA_FACTOR__SHIFT			8
+#define GEN8_RT_DW0_ALPHA_FUNC__MASK				0x000000e0
+#define GEN8_RT_DW0_ALPHA_FUNC__SHIFT				5
+#define GEN8_RT_DW0_WRITE_DISABLE_A				(0x1 << 3)
+#define GEN8_RT_DW0_WRITE_DISABLE_R				(0x1 << 2)
+#define GEN8_RT_DW0_WRITE_DISABLE_G				(0x1 << 1)
+#define GEN8_RT_DW0_WRITE_DISABLE_B				(0x1 << 0)
+
+#define GEN8_RT_DW1_LOGICOP_ENABLE				(0x1 << 31)
+#define GEN8_RT_DW1_LOGICOP_FUNC__MASK				0x78000000
+#define GEN8_RT_DW1_LOGICOP_FUNC__SHIFT				27
+#define GEN8_RT_DW1_PRE_BLEND_CLAMP_SRC_ONLY			(0x1 << 4)
+#define GEN8_RT_DW1_COLORCLAMP__MASK				0x0000000c
+#define GEN8_RT_DW1_COLORCLAMP__SHIFT				2
+#define GEN8_RT_DW1_COLORCLAMP_UNORM				(0x0 << 2)
+#define GEN8_RT_DW1_COLORCLAMP_SNORM				(0x1 << 2)
+#define GEN8_RT_DW1_COLORCLAMP_RTFORMAT				(0x2 << 2)
+#define GEN8_RT_DW1_PRE_BLEND_CLAMP				(0x1 << 1)
+#define GEN8_RT_DW1_POST_BLEND_CLAMP				(0x1 << 0)
+
+#define GEN6_CLIP_VIEWPORT__SIZE				64
+
+
+
+
+
+
+#define GEN6_SF_VIEWPORT__SIZE					128
+
+
+
+
+
+
+
+
+
+
+#define GEN7_SF_CLIP_VIEWPORT__SIZE				256
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+#define GEN6_CC_VIEWPORT__SIZE					32
+
+
+
+
+#define GEN6_SCISSOR_RECT__SIZE					32
+
+
+#define GEN6_SCISSOR_DW0_MIN_Y__MASK				0xffff0000
+#define GEN6_SCISSOR_DW0_MIN_Y__SHIFT				16
+#define GEN6_SCISSOR_DW0_MIN_X__MASK				0x0000ffff
+#define GEN6_SCISSOR_DW0_MIN_X__SHIFT				0
+
+#define GEN6_SCISSOR_DW1_MAX_Y__MASK				0xffff0000
+#define GEN6_SCISSOR_DW1_MAX_Y__SHIFT				16
+#define GEN6_SCISSOR_DW1_MAX_X__MASK				0x0000ffff
+#define GEN6_SCISSOR_DW1_MAX_X__SHIFT				0
+
+#define GEN6_SAMPLER_BORDER_COLOR_STATE__SIZE			20
+
+#define GEN6_BORDER_COLOR_DW0_A__MASK				0xff000000
+#define GEN6_BORDER_COLOR_DW0_A__SHIFT				24
+#define GEN6_BORDER_COLOR_DW0_B__MASK				0x00ff0000
+#define GEN6_BORDER_COLOR_DW0_B__SHIFT				16
+#define GEN6_BORDER_COLOR_DW0_G__MASK				0x0000ff00
+#define GEN6_BORDER_COLOR_DW0_G__SHIFT				8
+#define GEN6_BORDER_COLOR_DW0_R__MASK				0x000000ff
+#define GEN6_BORDER_COLOR_DW0_R__SHIFT				0
+
+
+
+
+
+#define GEN6_BORDER_COLOR_DW5_G__MASK				0xffff0000
+#define GEN6_BORDER_COLOR_DW5_G__SHIFT				16
+#define GEN6_BORDER_COLOR_DW5_R__MASK				0x0000ffff
+#define GEN6_BORDER_COLOR_DW5_R__SHIFT				0
+
+#define GEN6_BORDER_COLOR_DW6_A__MASK				0xffff0000
+#define GEN6_BORDER_COLOR_DW6_A__SHIFT				16
+#define GEN6_BORDER_COLOR_DW6_B__MASK				0x0000ffff
+#define GEN6_BORDER_COLOR_DW6_B__SHIFT				0
+
+#define GEN6_BORDER_COLOR_DW7_G__MASK				0xffff0000
+#define GEN6_BORDER_COLOR_DW7_G__SHIFT				16
+#define GEN6_BORDER_COLOR_DW7_R__MASK				0x0000ffff
+#define GEN6_BORDER_COLOR_DW7_R__SHIFT				0
+
+#define GEN6_BORDER_COLOR_DW8_A__MASK				0xffff0000
+#define GEN6_BORDER_COLOR_DW8_A__SHIFT				16
+#define GEN6_BORDER_COLOR_DW8_B__MASK				0x0000ffff
+#define GEN6_BORDER_COLOR_DW8_B__SHIFT				0
+
+#define GEN6_BORDER_COLOR_DW9_G__MASK				0xffff0000
+#define GEN6_BORDER_COLOR_DW9_G__SHIFT				16
+#define GEN6_BORDER_COLOR_DW9_R__MASK				0x0000ffff
+#define GEN6_BORDER_COLOR_DW9_R__SHIFT				0
+
+#define GEN6_BORDER_COLOR_DW10_A__MASK				0xffff0000
+#define GEN6_BORDER_COLOR_DW10_A__SHIFT				16
+#define GEN6_BORDER_COLOR_DW10_B__MASK				0x0000ffff
+#define GEN6_BORDER_COLOR_DW10_B__SHIFT				0
+
+#define GEN6_BORDER_COLOR_DW11_A__MASK				0xff000000
+#define GEN6_BORDER_COLOR_DW11_A__SHIFT				24
+#define GEN6_BORDER_COLOR_DW11_B__MASK				0x00ff0000
+#define GEN6_BORDER_COLOR_DW11_B__SHIFT				16
+#define GEN6_BORDER_COLOR_DW11_G__MASK				0x0000ff00
+#define GEN6_BORDER_COLOR_DW11_G__SHIFT				8
+#define GEN6_BORDER_COLOR_DW11_R__MASK				0x000000ff
+#define GEN6_BORDER_COLOR_DW11_R__SHIFT				0
+
+
+
+
+
+
+
+
+#define GEN6_SAMPLER_STATE__SIZE				4
+
+#define GEN6_SAMPLER_DW0_DISABLE				(0x1 << 31)
+#define GEN7_SAMPLER_DW0_BORDER_COLOR_MODE__MASK		0x20000000
+#define GEN7_SAMPLER_DW0_BORDER_COLOR_MODE__SHIFT		29
+#define GEN7_SAMPLER_DW0_BORDER_COLOR_MODE_DX10_OGL		(0x0 << 29)
+#define GEN7_SAMPLER_DW0_BORDER_COLOR_MODE_DX9			(0x1 << 29)
+#define GEN6_SAMPLER_DW0_LOD_PRECLAMP_ENABLE			(0x1 << 28)
+#define GEN6_SAMPLER_DW0_MIN_MAG_NOT_EQUAL			(0x1 << 27)
+#define GEN8_SAMPLER_DW0_LOD_PRECLAMP_ENABLE__MASK		0x18000000
+#define GEN8_SAMPLER_DW0_LOD_PRECLAMP_ENABLE__SHIFT		27
+#define GEN6_SAMPLER_DW0_BASE_LOD__MASK				0x07c00000
+#define GEN6_SAMPLER_DW0_BASE_LOD__SHIFT			22
+#define GEN6_SAMPLER_DW0_MIP_FILTER__MASK			0x00300000
+#define GEN6_SAMPLER_DW0_MIP_FILTER__SHIFT			20
+#define GEN6_SAMPLER_DW0_MAG_FILTER__MASK			0x000e0000
+#define GEN6_SAMPLER_DW0_MAG_FILTER__SHIFT			17
+#define GEN6_SAMPLER_DW0_MIN_FILTER__MASK			0x0001c000
+#define GEN6_SAMPLER_DW0_MIN_FILTER__SHIFT			14
+#define GEN6_SAMPLER_DW0_LOD_BIAS__MASK				0x00003ff8
+#define GEN6_SAMPLER_DW0_LOD_BIAS__SHIFT			3
+#define GEN6_SAMPLER_DW0_LOD_BIAS__RADIX			6
+#define GEN6_SAMPLER_DW0_SHADOW_FUNC__MASK			0x00000007
+#define GEN6_SAMPLER_DW0_SHADOW_FUNC__SHIFT			0
+#define GEN7_SAMPLER_DW0_LOD_BIAS__MASK				0x00003ffe
+#define GEN7_SAMPLER_DW0_LOD_BIAS__SHIFT			1
+#define GEN7_SAMPLER_DW0_LOD_BIAS__RADIX			8
+#define GEN7_SAMPLER_DW0_ANISO_ALGO__MASK			0x00000001
+#define GEN7_SAMPLER_DW0_ANISO_ALGO__SHIFT			0
+#define GEN7_SAMPLER_DW0_ANISO_ALGO_LEGACY			0x0
+#define GEN7_SAMPLER_DW0_ANISO_ALGO_EWA				0x1
+
+#define GEN6_SAMPLER_DW1_MIN_LOD__MASK				0xffc00000
+#define GEN6_SAMPLER_DW1_MIN_LOD__SHIFT				22
+#define GEN6_SAMPLER_DW1_MIN_LOD__RADIX				6
+#define GEN6_SAMPLER_DW1_MAX_LOD__MASK				0x003ff000
+#define GEN6_SAMPLER_DW1_MAX_LOD__SHIFT				12
+#define GEN6_SAMPLER_DW1_MAX_LOD__RADIX				6
+#define GEN6_SAMPLER_DW1_CUBECTRLMODE__MASK			0x00000200
+#define GEN6_SAMPLER_DW1_CUBECTRLMODE__SHIFT			9
+#define GEN6_SAMPLER_DW1_CUBECTRLMODE_PROGRAMMED		(0x0 << 9)
+#define GEN6_SAMPLER_DW1_CUBECTRLMODE_OVERRIDE			(0x1 << 9)
+#define GEN6_SAMPLER_DW1_U_WRAP__MASK				0x000001c0
+#define GEN6_SAMPLER_DW1_U_WRAP__SHIFT				6
+#define GEN6_SAMPLER_DW1_V_WRAP__MASK				0x00000038
+#define GEN6_SAMPLER_DW1_V_WRAP__SHIFT				3
+#define GEN6_SAMPLER_DW1_R_WRAP__MASK				0x00000007
+#define GEN6_SAMPLER_DW1_R_WRAP__SHIFT				0
+
+#define GEN7_SAMPLER_DW1_MIN_LOD__MASK				0xfff00000
+#define GEN7_SAMPLER_DW1_MIN_LOD__SHIFT				20
+#define GEN7_SAMPLER_DW1_MIN_LOD__RADIX				8
+#define GEN7_SAMPLER_DW1_MAX_LOD__MASK				0x000fff00
+#define GEN7_SAMPLER_DW1_MAX_LOD__SHIFT				8
+#define GEN7_SAMPLER_DW1_MAX_LOD__RADIX				8
+#define GEN8_SAMPLER_DW1_CHROMAKEY_ENABLE			(0x1 << 7)
+#define GEN8_SAMPLER_DW1_CHROMAKEY_INDEX__MASK			0x00000060
+#define GEN8_SAMPLER_DW1_CHROMAKEY_INDEX__SHIFT			5
+#define GEN8_SAMPLER_DW1_CHROMAKEY_MODE__MASK			0x00000010
+#define GEN8_SAMPLER_DW1_CHROMAKEY_MODE__SHIFT			4
+#define GEN7_SAMPLER_DW1_SHADOW_FUNC__MASK			0x0000000e
+#define GEN7_SAMPLER_DW1_SHADOW_FUNC__SHIFT			1
+#define GEN7_SAMPLER_DW1_CUBECTRLMODE__MASK			0x00000001
+#define GEN7_SAMPLER_DW1_CUBECTRLMODE__SHIFT			0
+#define GEN7_SAMPLER_DW1_CUBECTRLMODE_PROGRAMMED		0x0
+#define GEN7_SAMPLER_DW1_CUBECTRLMODE_OVERRIDE			0x1
+
+#define GEN6_SAMPLER_DW2_BORDER_COLOR_ADDR__MASK		0xffffffe0
+#define GEN6_SAMPLER_DW2_BORDER_COLOR_ADDR__SHIFT		5
+#define GEN6_SAMPLER_DW2_BORDER_COLOR_ADDR__SHR			5
+
+#define GEN8_SAMPLER_DW2_SEP_FILTER_COEFF_TABLE_SIZE__MASK	0xc0000000
+#define GEN8_SAMPLER_DW2_SEP_FILTER_COEFF_TABLE_SIZE__SHIFT	30
+#define GEN8_SAMPLER_DW2_SEP_FILTER_WIDTH__MASK			0x30000000
+#define GEN8_SAMPLER_DW2_SEP_FILTER_WIDTH__SHIFT		28
+#define GEN8_SAMPLER_DW2_SEP_FILTER_HEIGHT__MASK		0x0c000000
+#define GEN8_SAMPLER_DW2_SEP_FILTER_HEIGHT__SHIFT		26
+#define GEN8_SAMPLER_DW2_INDIRECT_STATE_ADDR__MASK		0x00ffffc0
+#define GEN8_SAMPLER_DW2_INDIRECT_STATE_ADDR__SHIFT		6
+#define GEN8_SAMPLER_DW2_INDIRECT_STATE_ADDR__SHR		6
+#define GEN8_SAMPLER_DW2_FLEXIBLE_FILTER_MODE			(0x1 << 4)
+#define GEN8_SAMPLER_DW2_FLEXIBLE_FILTER_COEFF_SIZE		(0x1 << 3)
+#define GEN8_SAMPLER_DW2_FLEXIBLE_FILTER_HALIGN			(0x1 << 2)
+#define GEN8_SAMPLER_DW2_FLEXIBLE_FILTER_VALIGN			(0x1 << 1)
+#define GEN8_SAMPLER_DW2_LOD_CLAMP_MAG_MODE			(0x1 << 0)
+
+#define GEN8_SAMPLER_DW3_NON_SEP_FILTER_FOOTPRINT_MASK__MASK	0xff000000
+#define GEN8_SAMPLER_DW3_NON_SEP_FILTER_FOOTPRINT_MASK__SHIFT	24
+#define GEN6_SAMPLER_DW3_CHROMAKEY_ENABLE			(0x1 << 25)
+#define GEN6_SAMPLER_DW3_CHROMAKEY_INDEX__MASK			0x01800000
+#define GEN6_SAMPLER_DW3_CHROMAKEY_INDEX__SHIFT			23
+#define GEN6_SAMPLER_DW3_CHROMAKEY_MODE__MASK			0x00400000
+#define GEN6_SAMPLER_DW3_CHROMAKEY_MODE__SHIFT			22
+#define GEN6_SAMPLER_DW3_MAX_ANISO__MASK			0x00380000
+#define GEN6_SAMPLER_DW3_MAX_ANISO__SHIFT			19
+#define GEN6_SAMPLER_DW3_U_MAG_ROUND				(0x1 << 18)
+#define GEN6_SAMPLER_DW3_U_MIN_ROUND				(0x1 << 17)
+#define GEN6_SAMPLER_DW3_V_MAG_ROUND				(0x1 << 16)
+#define GEN6_SAMPLER_DW3_V_MIN_ROUND				(0x1 << 15)
+#define GEN6_SAMPLER_DW3_R_MAG_ROUND				(0x1 << 14)
+#define GEN6_SAMPLER_DW3_R_MIN_ROUND				(0x1 << 13)
+#define GEN7_SAMPLER_DW3_TRIQUAL__MASK				0x00001800
+#define GEN7_SAMPLER_DW3_TRIQUAL__SHIFT				11
+#define GEN7_SAMPLER_DW3_TRIQUAL_FULL				(0x0 << 11)
+#define GEN75_SAMPLER_DW3_TRIQUAL_HIGH				(0x1 << 11)
+#define GEN7_SAMPLER_DW3_TRIQUAL_MED				(0x2 << 11)
+#define GEN7_SAMPLER_DW3_TRIQUAL_LOW				(0x3 << 11)
+#define GEN7_SAMPLER_DW3_NON_NORMALIZED_COORD			(0x1 << 10)
+#define GEN7_SAMPLER_DW3_U_WRAP__MASK				0x000001c0
+#define GEN7_SAMPLER_DW3_U_WRAP__SHIFT				6
+#define GEN7_SAMPLER_DW3_V_WRAP__MASK				0x00000038
+#define GEN7_SAMPLER_DW3_V_WRAP__SHIFT				3
+#define GEN7_SAMPLER_DW3_R_WRAP__MASK				0x00000007
+#define GEN7_SAMPLER_DW3_R_WRAP__SHIFT				0
+#define GEN6_SAMPLER_DW3_NON_NORMALIZED_COORD			(0x1 << 0)
+
+
+#endif /* GEN_RENDER_DYNAMIC_XML */
diff --git a/icd/intel/genhw/gen_render_media.xml.h b/icd/intel/genhw/gen_render_media.xml.h
new file mode 100644
index 0000000..55d830b
--- /dev/null
+++ b/icd/intel/genhw/gen_render_media.xml.h
@@ -0,0 +1,311 @@
+#ifndef GEN_RENDER_MEDIA_XML
+#define GEN_RENDER_MEDIA_XML
+
+/* Autogenerated file, DO NOT EDIT manually!
+
+This file was generated by the rules-ng-ng headergen tool in this git repository:
+https://github.com/olvaffe/envytools/
+git clone https://github.com/olvaffe/envytools.git
+
+Copyright (C) 2014-2015 by the following authors:
+- Chia-I Wu <olvaffe@gmail.com> (olv)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice (including the
+next paragraph) shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+
+#define GEN6_INTERFACE_DESCRIPTOR_DATA__SIZE			8
+
+#define GEN6_IDRT_DW0_KERNEL_ADDR__MASK				0xffffffc0
+#define GEN6_IDRT_DW0_KERNEL_ADDR__SHIFT			6
+#define GEN6_IDRT_DW0_KERNEL_ADDR__SHR				6
+
+#define GEN6_IDRT_DW1_SPF					(0x1 << 18)
+#define GEN6_IDRT_DW1_PRIORITY_HIGH				(0x1 << 17)
+#define GEN6_IDRT_DW1_FP_MODE_ALT				(0x1 << 16)
+#define GEN6_IDRT_DW1_ILLEGAL_CODE_EXCEPTION			(0x1 << 13)
+#define GEN6_IDRT_DW1_MASK_STACK_EXCEPTION			(0x1 << 11)
+#define GEN6_IDRT_DW1_SOFTWARE_EXCEPTION			(0x1 << 7)
+
+#define GEN6_IDRT_DW2_SAMPLER_COUNT__MASK			0x0000001c
+#define GEN6_IDRT_DW2_SAMPLER_COUNT__SHIFT			2
+#define GEN6_IDRT_DW2_SAMPLER_ADDR__MASK			0xffffffe0
+#define GEN6_IDRT_DW2_SAMPLER_ADDR__SHIFT			5
+#define GEN6_IDRT_DW2_SAMPLER_ADDR__SHR				5
+
+#define GEN6_IDRT_DW3_BINDING_TABLE_SIZE__MASK			0x0000001f
+#define GEN6_IDRT_DW3_BINDING_TABLE_SIZE__SHIFT			0
+
+#define GEN6_IDRT_DW4_CURBE_READ_LEN__MASK			0xffff0000
+#define GEN6_IDRT_DW4_CURBE_READ_LEN__SHIFT			16
+#define GEN6_IDRT_DW4_CURBE_READ_OFFSET__MASK			0x0000ffff
+#define GEN6_IDRT_DW4_CURBE_READ_OFFSET__SHIFT			0
+
+#define GEN6_IDRT_DW5_BARRIER_ID__MASK				0x0000000f
+#define GEN6_IDRT_DW5_BARRIER_ID__SHIFT				0
+
+#define GEN7_IDRT_DW5_BARRIER_RETURN_GRF__MASK			0xff000000
+#define GEN7_IDRT_DW5_BARRIER_RETURN_GRF__SHIFT			24
+#define GEN7_IDRT_DW5_ROUNDING_MODE__MASK			0x00c00000
+#define GEN7_IDRT_DW5_ROUNDING_MODE__SHIFT			22
+#define GEN7_IDRT_DW5_ROUNDING_MODE_RTNE			(0x0 << 22)
+#define GEN7_IDRT_DW5_ROUNDING_MODE_RU				(0x1 << 22)
+#define GEN7_IDRT_DW5_ROUNDING_MODE_RD				(0x2 << 22)
+#define GEN7_IDRT_DW5_ROUNDING_MODE_RTZ				(0x3 << 22)
+#define GEN7_IDRT_DW5_BARRIER_ENABLE				(0x1 << 21)
+#define GEN7_IDRT_DW5_SLM_SIZE__MASK				0x001f0000
+#define GEN7_IDRT_DW5_SLM_SIZE__SHIFT				16
+#define GEN7_IDRT_DW5_BARRIER_RETURN_BYTE__MASK			0x0000ff00
+#define GEN7_IDRT_DW5_BARRIER_RETURN_BYTE__SHIFT		8
+#define GEN7_IDRT_DW5_THREAD_GROUP_SIZE__MASK			0x000000ff
+#define GEN7_IDRT_DW5_THREAD_GROUP_SIZE__SHIFT			0
+
+#define GEN75_IDRT_DW6_CROSS_THREAD_CURBE_READ_LEN__MASK	0x000000ff
+#define GEN75_IDRT_DW6_CROSS_THREAD_CURBE_READ_LEN__SHIFT	0
+
+
+
+#define GEN8_IDRT_DW0_KERNEL_ADDR__MASK				0xffffffc0
+#define GEN8_IDRT_DW0_KERNEL_ADDR__SHIFT			6
+#define GEN8_IDRT_DW0_KERNEL_ADDR__SHR				6
+
+
+#define GEN8_IDRT_DW2_THREAD_PREEMPTION_DISABLE			(0x1 << 20)
+#define GEN8_IDRT_DW2_DENORM__MASK				0x00080000
+#define GEN8_IDRT_DW2_DENORM__SHIFT				19
+#define GEN8_IDRT_DW2_DENORM_FTZ				(0x0 << 19)
+#define GEN8_IDRT_DW2_DENORM_RET				(0x1 << 19)
+#define GEN8_IDRT_DW2_SPF					(0x1 << 18)
+#define GEN8_IDRT_DW2_PRIORITY_HIGH				(0x1 << 17)
+#define GEN8_IDRT_DW2_FP_MODE_ALT				(0x1 << 16)
+#define GEN8_IDRT_DW2_ILLEGAL_CODE_EXCEPTION			(0x1 << 13)
+#define GEN8_IDRT_DW2_MASK_STACK_EXCEPTION			(0x1 << 11)
+#define GEN8_IDRT_DW2_SOFTWARE_EXCEPTION			(0x1 << 7)
+
+#define GEN8_IDRT_DW3_SAMPLER_COUNT__MASK			0x0000001c
+#define GEN8_IDRT_DW3_SAMPLER_COUNT__SHIFT			2
+#define GEN8_IDRT_DW3_SAMPLER_ADDR__MASK			0xffffffe0
+#define GEN8_IDRT_DW3_SAMPLER_ADDR__SHIFT			5
+#define GEN8_IDRT_DW3_SAMPLER_ADDR__SHR				5
+
+#define GEN8_IDRT_DW4_BINDING_TABLE_SIZE__MASK			0x0000001f
+#define GEN8_IDRT_DW4_BINDING_TABLE_SIZE__SHIFT			0
+
+#define GEN8_IDRT_DW5_CURBE_READ_LEN__MASK			0xffff0000
+#define GEN8_IDRT_DW5_CURBE_READ_LEN__SHIFT			16
+
+#define GEN8_IDRT_DW6_ROUNDING_MODE__MASK			0x00c00000
+#define GEN8_IDRT_DW6_ROUNDING_MODE__SHIFT			22
+#define GEN8_IDRT_DW6_ROUNDING_MODE_RTNE			(0x0 << 22)
+#define GEN8_IDRT_DW6_ROUNDING_MODE_RU				(0x1 << 22)
+#define GEN8_IDRT_DW6_ROUNDING_MODE_RD				(0x2 << 22)
+#define GEN8_IDRT_DW6_ROUNDING_MODE_RTZ				(0x3 << 22)
+#define GEN8_IDRT_DW6_BARRIER_ENABLE				(0x1 << 21)
+#define GEN8_IDRT_DW6_SLM_SIZE__MASK				0x001f0000
+#define GEN8_IDRT_DW6_SLM_SIZE__SHIFT				16
+#define GEN8_IDRT_DW6_THREAD_GROUP_SIZE__MASK			0x000000ff
+#define GEN8_IDRT_DW6_THREAD_GROUP_SIZE__SHIFT			0
+
+#define GEN8_IDRT_DW7_CROSS_THREAD_CURBE_READ_LEN__MASK		0x000000ff
+#define GEN8_IDRT_DW7_CROSS_THREAD_CURBE_READ_LEN__SHIFT	0
+
+#define GEN6_MEDIA_VFE_STATE__SIZE				9
+
+
+#define GEN6_VFE_DW1_SCRATCH_STACK_SIZE__MASK			0x000000f0
+#define GEN6_VFE_DW1_SCRATCH_STACK_SIZE__SHIFT			4
+#define GEN6_VFE_DW1_SCRATCH_SPACE_PER_THREAD__MASK		0x0000000f
+#define GEN6_VFE_DW1_SCRATCH_SPACE_PER_THREAD__SHIFT		0
+#define GEN6_VFE_DW1_SCRATCH_ADDR__MASK				0xfffffc00
+#define GEN6_VFE_DW1_SCRATCH_ADDR__SHIFT			10
+#define GEN6_VFE_DW1_SCRATCH_ADDR__SHR				10
+
+#define GEN6_VFE_DW2_MAX_THREADS__MASK				0xffff0000
+#define GEN6_VFE_DW2_MAX_THREADS__SHIFT				16
+#define GEN6_VFE_DW2_URB_ENTRY_COUNT__MASK			0x0000ff00
+#define GEN6_VFE_DW2_URB_ENTRY_COUNT__SHIFT			8
+#define GEN6_VFE_DW2_RESET_GATEWAY_TIMER			(0x1 << 7)
+#define GEN6_VFE_DW2_BYPASS_GATEWAY_CONTROL			(0x1 << 6)
+#define GEN6_VFE_DW2_FAST_PREEMPT				(0x1 << 5)
+#define GEN7_VFE_DW2_GATEWAY_MMIO__MASK				0x00000018
+#define GEN7_VFE_DW2_GATEWAY_MMIO__SHIFT			3
+#define GEN7_VFE_DW2_GATEWAY_MMIO_NONE				(0x0 << 3)
+#define GEN7_VFE_DW2_GATEWAY_MMIO_ANY				(0x2 << 3)
+#define GEN7_VFE_DW2_GPGPU_MODE					(0x1 << 2)
+
+#define GEN75_VFE_DW3_HALF_SLICE_DISABLE__MASK			0x00000003
+#define GEN75_VFE_DW3_HALF_SLICE_DISABLE__SHIFT			0
+#define GEN75_VFE_DW3_HALF_SLICE_DISABLE_NONE			0x0
+#define GEN75_VFE_DW3_HALF_SLICE_DISABLE_23			0x1
+#define GEN75_VFE_DW3_HALF_SLICE_DISABLE_123			0x3
+
+#define GEN6_VFE_DW4_URB_ENTRY_SIZE__MASK			0xffff0000
+#define GEN6_VFE_DW4_URB_ENTRY_SIZE__SHIFT			16
+#define GEN6_VFE_DW4_CURBE_SIZE__MASK				0x0000ffff
+#define GEN6_VFE_DW4_CURBE_SIZE__SHIFT				0
+
+#define GEN6_VFE_DW5_SCOREBOARD_ENABLE				(0x1 << 31)
+#define GEN6_VFE_DW5_SCOREBOARD_TYPE__MASK			0x40000000
+#define GEN6_VFE_DW5_SCOREBOARD_TYPE__SHIFT			30
+#define GEN6_VFE_DW5_SCOREBOARD_TYPE_STALLING			(0x0 << 30)
+#define GEN6_VFE_DW5_SCOREBOARD_TYPE_NON_STALLING		(0x1 << 30)
+#define GEN6_VFE_DW5_SCOREBOARD_MASK__MASK			0x000000ff
+#define GEN6_VFE_DW5_SCOREBOARD_MASK__SHIFT			0
+
+
+
+
+#define GEN8_VFE_DW1_SCRATCH_STACK_SIZE__MASK			0x000000f0
+#define GEN8_VFE_DW1_SCRATCH_STACK_SIZE__SHIFT			4
+#define GEN8_VFE_DW1_SCRATCH_SPACE_PER_THREAD__MASK		0x0000000f
+#define GEN8_VFE_DW1_SCRATCH_SPACE_PER_THREAD__SHIFT		0
+#define GEN8_VFE_DW1_SCRATCH_ADDR__MASK				0xfffffc00
+#define GEN8_VFE_DW1_SCRATCH_ADDR__SHIFT			10
+#define GEN8_VFE_DW1_SCRATCH_ADDR__SHR				10
+
+
+#define GEN8_VFE_DW3_MAX_THREADS__MASK				0xffff0000
+#define GEN8_VFE_DW3_MAX_THREADS__SHIFT				16
+#define GEN8_VFE_DW3_URB_ENTRY_COUNT__MASK			0x0000ff00
+#define GEN8_VFE_DW3_URB_ENTRY_COUNT__SHIFT			8
+#define GEN8_VFE_DW3_RESET_GATEWAY_TIMER			(0x1 << 7)
+#define GEN8_VFE_DW3_BYPASS_GATEWAY_CONTROL			(0x1 << 6)
+
+#define GEN8_VFE_DW4_HALF_SLICE_DISABLE__MASK			0x00000003
+#define GEN8_VFE_DW4_HALF_SLICE_DISABLE__SHIFT			0
+#define GEN8_VFE_DW4_HALF_SLICE_DISABLE_NONE			0x0
+#define GEN8_VFE_DW4_HALF_SLICE_DISABLE_23			0x1
+#define GEN8_VFE_DW4_HALF_SLICE_DISABLE_123			0x3
+
+#define GEN8_VFE_DW5_URB_ENTRY_SIZE__MASK			0xffff0000
+#define GEN8_VFE_DW5_URB_ENTRY_SIZE__SHIFT			16
+#define GEN8_VFE_DW5_CURBE_SIZE__MASK				0x0000ffff
+#define GEN8_VFE_DW5_CURBE_SIZE__SHIFT				0
+
+#define GEN8_VFE_DW6_SCOREBOARD_ENABLE				(0x1 << 31)
+#define GEN8_VFE_DW6_SCOREBOARD_TYPE__MASK			0x40000000
+#define GEN8_VFE_DW6_SCOREBOARD_TYPE__SHIFT			30
+#define GEN8_VFE_DW6_SCOREBOARD_TYPE_STALLING			(0x0 << 30)
+#define GEN8_VFE_DW6_SCOREBOARD_TYPE_NON_STALLING		(0x1 << 30)
+#define GEN8_VFE_DW6_SCOREBOARD_MASK__MASK			0x000000ff
+#define GEN8_VFE_DW6_SCOREBOARD_MASK__SHIFT			0
+
+
+#define GEN6_MEDIA_CURBE_LOAD__SIZE				4
+
+
+
+#define GEN6_CURBE_LOAD_DW2_LEN__MASK				0x0001ffff
+#define GEN6_CURBE_LOAD_DW2_LEN__SHIFT				0
+
+#define GEN6_CURBE_LOAD_DW3_ADDR__MASK				0xffffffe0
+#define GEN6_CURBE_LOAD_DW3_ADDR__SHIFT				5
+#define GEN6_CURBE_LOAD_DW3_ADDR__SHR				5
+
+#define GEN6_MEDIA_INTERFACE_DESCRIPTOR_LOAD__SIZE		4
+
+
+
+#define GEN6_IDRT_LOAD_DW2_LEN__MASK				0x0001ffff
+#define GEN6_IDRT_LOAD_DW2_LEN__SHIFT				0
+
+#define GEN6_IDRT_LOAD_DW3_ADDR__MASK				0xffffffe0
+#define GEN6_IDRT_LOAD_DW3_ADDR__SHIFT				5
+#define GEN6_IDRT_LOAD_DW3_ADDR__SHR				5
+
+#define GEN6_MEDIA_STATE_FLUSH__SIZE				2
+
+
+#define GEN6_MEDIA_FLUSH_DW1_THREAD_COUNT_WATERMARK__MASK	0x00ff0000
+#define GEN6_MEDIA_FLUSH_DW1_THREAD_COUNT_WATERMARK__SHIFT	16
+#define GEN6_MEDIA_FLUSH_DW1_BARRIER_MASK__MASK			0x0000ffff
+#define GEN6_MEDIA_FLUSH_DW1_BARRIER_MASK__SHIFT		0
+
+#define GEN7_MEDIA_FLUSH_DW1_DISABLE_PREEMPTION			(0x1 << 8)
+#define GEN75_MEDIA_FLUSH_DW1_FLUSH_TO_GO			(0x1 << 7)
+#define GEN7_MEDIA_FLUSH_DW1_WATERMARK_REQUIRED			(0x1 << 6)
+#define GEN7_MEDIA_FLUSH_DW1_IDRT_OFFSET__MASK			0x0000003f
+#define GEN7_MEDIA_FLUSH_DW1_IDRT_OFFSET__SHIFT			0
+
+#define GEN7_GPGPU_WALKER__SIZE					15
+
+#define GEN7_GPGPU_DW0_INDIRECT_PARAM_ENABLE			(0x1 << 10)
+#define GEN7_GPGPU_DW0_PREDICATE_ENABLE				(0x1 << 8)
+
+#define GEN7_GPGPU_DW1_IDRT_OFFSET__MASK			0x0000003f
+#define GEN7_GPGPU_DW1_IDRT_OFFSET__SHIFT			0
+
+#define GEN7_GPGPU_DW2_SIMD_SIZE__MASK				0xc0000000
+#define GEN7_GPGPU_DW2_SIMD_SIZE__SHIFT				30
+#define GEN7_GPGPU_DW2_SIMD_SIZE_SIMD8				(0x0 << 30)
+#define GEN7_GPGPU_DW2_SIMD_SIZE_SIMD16				(0x1 << 30)
+#define GEN7_GPGPU_DW2_SIMD_SIZE_SIMD32				(0x2 << 30)
+#define GEN7_GPGPU_DW2_THREAD_MAX_Z__MASK			0x003f0000
+#define GEN7_GPGPU_DW2_THREAD_MAX_Z__SHIFT			16
+#define GEN7_GPGPU_DW2_THREAD_MAX_Y__MASK			0x00003f00
+#define GEN7_GPGPU_DW2_THREAD_MAX_Y__SHIFT			8
+#define GEN7_GPGPU_DW2_THREAD_MAX_X__MASK			0x0000003f
+#define GEN7_GPGPU_DW2_THREAD_MAX_X__SHIFT			0
+
+
+
+
+
+
+
+
+
+
+#define GEN8_GPGPU_DW0_INDIRECT_PARAM_ENABLE			(0x1 << 10)
+#define GEN8_GPGPU_DW0_PREDICATE_ENABLE				(0x1 << 8)
+
+#define GEN8_GPGPU_DW1_IDRT_OFFSET__MASK			0x0000003f
+#define GEN8_GPGPU_DW1_IDRT_OFFSET__SHIFT			0
+
+
+#define GEN8_GPGPU_DW3_INDIRECT_ADDR__MASK			0xffffffe0
+#define GEN8_GPGPU_DW3_INDIRECT_ADDR__SHIFT			5
+#define GEN8_GPGPU_DW3_INDIRECT_ADDR__SHR			5
+
+#define GEN8_GPGPU_DW4_SIMD_SIZE__MASK				0xc0000000
+#define GEN8_GPGPU_DW4_SIMD_SIZE__SHIFT				30
+#define GEN8_GPGPU_DW4_SIMD_SIZE_SIMD8				(0x0 << 30)
+#define GEN8_GPGPU_DW4_SIMD_SIZE_SIMD16				(0x1 << 30)
+#define GEN8_GPGPU_DW4_SIMD_SIZE_SIMD32				(0x2 << 30)
+#define GEN8_GPGPU_DW4_THREAD_MAX_Z__MASK			0x003f0000
+#define GEN8_GPGPU_DW4_THREAD_MAX_Z__SHIFT			16
+#define GEN8_GPGPU_DW4_THREAD_MAX_Y__MASK			0x00003f00
+#define GEN8_GPGPU_DW4_THREAD_MAX_Y__SHIFT			8
+#define GEN8_GPGPU_DW4_THREAD_MAX_X__MASK			0x0000003f
+#define GEN8_GPGPU_DW4_THREAD_MAX_X__SHIFT			0
+
+
+
+
+
+
+
+
+
+
+
+
+#endif /* GEN_RENDER_MEDIA_XML */
diff --git a/icd/intel/genhw/gen_render_surface.xml.h b/icd/intel/genhw/gen_render_surface.xml.h
new file mode 100644
index 0000000..7c2349f
--- /dev/null
+++ b/icd/intel/genhw/gen_render_surface.xml.h
@@ -0,0 +1,518 @@
+#ifndef GEN_RENDER_SURFACE_XML
+#define GEN_RENDER_SURFACE_XML
+
+/* Autogenerated file, DO NOT EDIT manually!
+
+This file was generated by the rules-ng-ng headergen tool in this git repository:
+https://github.com/olvaffe/envytools/
+git clone https://github.com/olvaffe/envytools.git
+
+Copyright (C) 2014-2015 by the following authors:
+- Chia-I Wu <olvaffe@gmail.com> (olv)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice (including the
+next paragraph) shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+
+enum gen_surface_format {
+    GEN6_FORMAT_R32G32B32A32_FLOAT			      = 0x0,
+    GEN6_FORMAT_R32G32B32A32_SINT			      = 0x1,
+    GEN6_FORMAT_R32G32B32A32_UINT			      = 0x2,
+    GEN6_FORMAT_R32G32B32A32_UNORM			      = 0x3,
+    GEN6_FORMAT_R32G32B32A32_SNORM			      = 0x4,
+    GEN6_FORMAT_R64G64_FLOAT				      = 0x5,
+    GEN6_FORMAT_R32G32B32X32_FLOAT			      = 0x6,
+    GEN6_FORMAT_R32G32B32A32_SSCALED			      = 0x7,
+    GEN6_FORMAT_R32G32B32A32_USCALED			      = 0x8,
+    GEN6_FORMAT_R32G32B32A32_SFIXED			      = 0x20,
+    GEN6_FORMAT_R64G64_PASSTHRU				      = 0x21,
+    GEN6_FORMAT_R32G32B32_FLOAT				      = 0x40,
+    GEN6_FORMAT_R32G32B32_SINT				      = 0x41,
+    GEN6_FORMAT_R32G32B32_UINT				      = 0x42,
+    GEN6_FORMAT_R32G32B32_UNORM				      = 0x43,
+    GEN6_FORMAT_R32G32B32_SNORM				      = 0x44,
+    GEN6_FORMAT_R32G32B32_SSCALED			      = 0x45,
+    GEN6_FORMAT_R32G32B32_USCALED			      = 0x46,
+    GEN6_FORMAT_R32G32B32_SFIXED			      = 0x50,
+    GEN6_FORMAT_R16G16B16A16_UNORM			      = 0x80,
+    GEN6_FORMAT_R16G16B16A16_SNORM			      = 0x81,
+    GEN6_FORMAT_R16G16B16A16_SINT			      = 0x82,
+    GEN6_FORMAT_R16G16B16A16_UINT			      = 0x83,
+    GEN6_FORMAT_R16G16B16A16_FLOAT			      = 0x84,
+    GEN6_FORMAT_R32G32_FLOAT				      = 0x85,
+    GEN6_FORMAT_R32G32_SINT				      = 0x86,
+    GEN6_FORMAT_R32G32_UINT				      = 0x87,
+    GEN6_FORMAT_R32_FLOAT_X8X24_TYPELESS		      = 0x88,
+    GEN6_FORMAT_X32_TYPELESS_G8X24_UINT			      = 0x89,
+    GEN6_FORMAT_L32A32_FLOAT				      = 0x8a,
+    GEN6_FORMAT_R32G32_UNORM				      = 0x8b,
+    GEN6_FORMAT_R32G32_SNORM				      = 0x8c,
+    GEN6_FORMAT_R64_FLOAT				      = 0x8d,
+    GEN6_FORMAT_R16G16B16X16_UNORM			      = 0x8e,
+    GEN6_FORMAT_R16G16B16X16_FLOAT			      = 0x8f,
+    GEN6_FORMAT_A32X32_FLOAT				      = 0x90,
+    GEN6_FORMAT_L32X32_FLOAT				      = 0x91,
+    GEN6_FORMAT_I32X32_FLOAT				      = 0x92,
+    GEN6_FORMAT_R16G16B16A16_SSCALED			      = 0x93,
+    GEN6_FORMAT_R16G16B16A16_USCALED			      = 0x94,
+    GEN6_FORMAT_R32G32_SSCALED				      = 0x95,
+    GEN6_FORMAT_R32G32_USCALED				      = 0x96,
+    GEN6_FORMAT_R32G32_SFIXED				      = 0xa0,
+    GEN6_FORMAT_R64_PASSTHRU				      = 0xa1,
+    GEN6_FORMAT_B8G8R8A8_UNORM				      = 0xc0,
+    GEN6_FORMAT_B8G8R8A8_UNORM_SRGB			      = 0xc1,
+    GEN6_FORMAT_R10G10B10A2_UNORM			      = 0xc2,
+    GEN6_FORMAT_R10G10B10A2_UNORM_SRGB			      = 0xc3,
+    GEN6_FORMAT_R10G10B10A2_UINT			      = 0xc4,
+    GEN6_FORMAT_R10G10B10_SNORM_A2_UNORM		      = 0xc5,
+    GEN6_FORMAT_R8G8B8A8_UNORM				      = 0xc7,
+    GEN6_FORMAT_R8G8B8A8_UNORM_SRGB			      = 0xc8,
+    GEN6_FORMAT_R8G8B8A8_SNORM				      = 0xc9,
+    GEN6_FORMAT_R8G8B8A8_SINT				      = 0xca,
+    GEN6_FORMAT_R8G8B8A8_UINT				      = 0xcb,
+    GEN6_FORMAT_R16G16_UNORM				      = 0xcc,
+    GEN6_FORMAT_R16G16_SNORM				      = 0xcd,
+    GEN6_FORMAT_R16G16_SINT				      = 0xce,
+    GEN6_FORMAT_R16G16_UINT				      = 0xcf,
+    GEN6_FORMAT_R16G16_FLOAT				      = 0xd0,
+    GEN6_FORMAT_B10G10R10A2_UNORM			      = 0xd1,
+    GEN6_FORMAT_B10G10R10A2_UNORM_SRGB			      = 0xd2,
+    GEN6_FORMAT_R11G11B10_FLOAT				      = 0xd3,
+    GEN6_FORMAT_R32_SINT				      = 0xd6,
+    GEN6_FORMAT_R32_UINT				      = 0xd7,
+    GEN6_FORMAT_R32_FLOAT				      = 0xd8,
+    GEN6_FORMAT_R24_UNORM_X8_TYPELESS			      = 0xd9,
+    GEN6_FORMAT_X24_TYPELESS_G8_UINT			      = 0xda,
+    GEN6_FORMAT_L32_UNORM				      = 0xdd,
+    GEN6_FORMAT_A32_UNORM				      = 0xde,
+    GEN6_FORMAT_L16A16_UNORM				      = 0xdf,
+    GEN6_FORMAT_I24X8_UNORM				      = 0xe0,
+    GEN6_FORMAT_L24X8_UNORM				      = 0xe1,
+    GEN6_FORMAT_A24X8_UNORM				      = 0xe2,
+    GEN6_FORMAT_I32_FLOAT				      = 0xe3,
+    GEN6_FORMAT_L32_FLOAT				      = 0xe4,
+    GEN6_FORMAT_A32_FLOAT				      = 0xe5,
+    GEN6_FORMAT_X8B8_UNORM_G8R8_SNORM			      = 0xe6,
+    GEN6_FORMAT_A8X8_UNORM_G8R8_SNORM			      = 0xe7,
+    GEN6_FORMAT_B8X8_UNORM_G8R8_SNORM			      = 0xe8,
+    GEN6_FORMAT_B8G8R8X8_UNORM				      = 0xe9,
+    GEN6_FORMAT_B8G8R8X8_UNORM_SRGB			      = 0xea,
+    GEN6_FORMAT_R8G8B8X8_UNORM				      = 0xeb,
+    GEN6_FORMAT_R8G8B8X8_UNORM_SRGB			      = 0xec,
+    GEN6_FORMAT_R9G9B9E5_SHAREDEXP			      = 0xed,
+    GEN6_FORMAT_B10G10R10X2_UNORM			      = 0xee,
+    GEN6_FORMAT_L16A16_FLOAT				      = 0xf0,
+    GEN6_FORMAT_R32_UNORM				      = 0xf1,
+    GEN6_FORMAT_R32_SNORM				      = 0xf2,
+    GEN6_FORMAT_R10G10B10X2_USCALED			      = 0xf3,
+    GEN6_FORMAT_R8G8B8A8_SSCALED			      = 0xf4,
+    GEN6_FORMAT_R8G8B8A8_USCALED			      = 0xf5,
+    GEN6_FORMAT_R16G16_SSCALED				      = 0xf6,
+    GEN6_FORMAT_R16G16_USCALED				      = 0xf7,
+    GEN6_FORMAT_R32_SSCALED				      = 0xf8,
+    GEN6_FORMAT_R32_USCALED				      = 0xf9,
+    GEN6_FORMAT_B5G6R5_UNORM				      = 0x100,
+    GEN6_FORMAT_B5G6R5_UNORM_SRGB			      = 0x101,
+    GEN6_FORMAT_B5G5R5A1_UNORM				      = 0x102,
+    GEN6_FORMAT_B5G5R5A1_UNORM_SRGB			      = 0x103,
+    GEN6_FORMAT_B4G4R4A4_UNORM				      = 0x104,
+    GEN6_FORMAT_B4G4R4A4_UNORM_SRGB			      = 0x105,
+    GEN6_FORMAT_R8G8_UNORM				      = 0x106,
+    GEN6_FORMAT_R8G8_SNORM				      = 0x107,
+    GEN6_FORMAT_R8G8_SINT				      = 0x108,
+    GEN6_FORMAT_R8G8_UINT				      = 0x109,
+    GEN6_FORMAT_R16_UNORM				      = 0x10a,
+    GEN6_FORMAT_R16_SNORM				      = 0x10b,
+    GEN6_FORMAT_R16_SINT				      = 0x10c,
+    GEN6_FORMAT_R16_UINT				      = 0x10d,
+    GEN6_FORMAT_R16_FLOAT				      = 0x10e,
+    GEN6_FORMAT_A8P8_UNORM_PALETTE0			      = 0x10f,
+    GEN6_FORMAT_A8P8_UNORM_PALETTE1			      = 0x110,
+    GEN6_FORMAT_I16_UNORM				      = 0x111,
+    GEN6_FORMAT_L16_UNORM				      = 0x112,
+    GEN6_FORMAT_A16_UNORM				      = 0x113,
+    GEN6_FORMAT_L8A8_UNORM				      = 0x114,
+    GEN6_FORMAT_I16_FLOAT				      = 0x115,
+    GEN6_FORMAT_L16_FLOAT				      = 0x116,
+    GEN6_FORMAT_A16_FLOAT				      = 0x117,
+    GEN6_FORMAT_L8A8_UNORM_SRGB				      = 0x118,
+    GEN6_FORMAT_R5G5_SNORM_B6_UNORM			      = 0x119,
+    GEN6_FORMAT_B5G5R5X1_UNORM				      = 0x11a,
+    GEN6_FORMAT_B5G5R5X1_UNORM_SRGB			      = 0x11b,
+    GEN6_FORMAT_R8G8_SSCALED				      = 0x11c,
+    GEN6_FORMAT_R8G8_USCALED				      = 0x11d,
+    GEN6_FORMAT_R16_SSCALED				      = 0x11e,
+    GEN6_FORMAT_R16_USCALED				      = 0x11f,
+    GEN6_FORMAT_P8A8_UNORM_PALETTE0			      = 0x122,
+    GEN6_FORMAT_P8A8_UNORM_PALETTE1			      = 0x123,
+    GEN6_FORMAT_A1B5G5R5_UNORM				      = 0x124,
+    GEN6_FORMAT_A4B4G4R4_UNORM				      = 0x125,
+    GEN6_FORMAT_L8A8_UINT				      = 0x126,
+    GEN6_FORMAT_L8A8_SINT				      = 0x127,
+    GEN6_FORMAT_R8_UNORM				      = 0x140,
+    GEN6_FORMAT_R8_SNORM				      = 0x141,
+    GEN6_FORMAT_R8_SINT					      = 0x142,
+    GEN6_FORMAT_R8_UINT					      = 0x143,
+    GEN6_FORMAT_A8_UNORM				      = 0x144,
+    GEN6_FORMAT_I8_UNORM				      = 0x145,
+    GEN6_FORMAT_L8_UNORM				      = 0x146,
+    GEN6_FORMAT_P4A4_UNORM_PALETTE0			      = 0x147,
+    GEN6_FORMAT_A4P4_UNORM_PALETTE0			      = 0x148,
+    GEN6_FORMAT_R8_SSCALED				      = 0x149,
+    GEN6_FORMAT_R8_USCALED				      = 0x14a,
+    GEN6_FORMAT_P8_UNORM_PALETTE0			      = 0x14b,
+    GEN6_FORMAT_L8_UNORM_SRGB				      = 0x14c,
+    GEN6_FORMAT_P8_UNORM_PALETTE1			      = 0x14d,
+    GEN6_FORMAT_P4A4_UNORM_PALETTE1			      = 0x14e,
+    GEN6_FORMAT_A4P4_UNORM_PALETTE1			      = 0x14f,
+    GEN6_FORMAT_Y8_UNORM				      = 0x150,
+    GEN6_FORMAT_L8_UINT					      = 0x152,
+    GEN6_FORMAT_L8_SINT					      = 0x153,
+    GEN6_FORMAT_I8_UINT					      = 0x154,
+    GEN6_FORMAT_I8_SINT					      = 0x155,
+    GEN6_FORMAT_DXT1_RGB_SRGB				      = 0x180,
+    GEN6_FORMAT_R1_UNORM				      = 0x181,
+    GEN6_FORMAT_YCRCB_NORMAL				      = 0x182,
+    GEN6_FORMAT_YCRCB_SWAPUVY				      = 0x183,
+    GEN6_FORMAT_P2_UNORM_PALETTE0			      = 0x184,
+    GEN6_FORMAT_P2_UNORM_PALETTE1			      = 0x185,
+    GEN6_FORMAT_BC1_UNORM				      = 0x186,
+    GEN6_FORMAT_BC2_UNORM				      = 0x187,
+    GEN6_FORMAT_BC3_UNORM				      = 0x188,
+    GEN6_FORMAT_BC4_UNORM				      = 0x189,
+    GEN6_FORMAT_BC5_UNORM				      = 0x18a,
+    GEN6_FORMAT_BC1_UNORM_SRGB				      = 0x18b,
+    GEN6_FORMAT_BC2_UNORM_SRGB				      = 0x18c,
+    GEN6_FORMAT_BC3_UNORM_SRGB				      = 0x18d,
+    GEN6_FORMAT_MONO8					      = 0x18e,
+    GEN6_FORMAT_YCRCB_SWAPUV				      = 0x18f,
+    GEN6_FORMAT_YCRCB_SWAPY				      = 0x190,
+    GEN6_FORMAT_DXT1_RGB				      = 0x191,
+    GEN6_FORMAT_FXT1					      = 0x192,
+    GEN6_FORMAT_R8G8B8_UNORM				      = 0x193,
+    GEN6_FORMAT_R8G8B8_SNORM				      = 0x194,
+    GEN6_FORMAT_R8G8B8_SSCALED				      = 0x195,
+    GEN6_FORMAT_R8G8B8_USCALED				      = 0x196,
+    GEN6_FORMAT_R64G64B64A64_FLOAT			      = 0x197,
+    GEN6_FORMAT_R64G64B64_FLOAT				      = 0x198,
+    GEN6_FORMAT_BC4_SNORM				      = 0x199,
+    GEN6_FORMAT_BC5_SNORM				      = 0x19a,
+    GEN6_FORMAT_R16G16B16_FLOAT				      = 0x19b,
+    GEN6_FORMAT_R16G16B16_UNORM				      = 0x19c,
+    GEN6_FORMAT_R16G16B16_SNORM				      = 0x19d,
+    GEN6_FORMAT_R16G16B16_SSCALED			      = 0x19e,
+    GEN6_FORMAT_R16G16B16_USCALED			      = 0x19f,
+    GEN6_FORMAT_BC6H_SF16				      = 0x1a1,
+    GEN6_FORMAT_BC7_UNORM				      = 0x1a2,
+    GEN6_FORMAT_BC7_UNORM_SRGB				      = 0x1a3,
+    GEN6_FORMAT_BC6H_UF16				      = 0x1a4,
+    GEN6_FORMAT_PLANAR_420_8				      = 0x1a5,
+    GEN6_FORMAT_R8G8B8_UNORM_SRGB			      = 0x1a8,
+    GEN6_FORMAT_ETC1_RGB8				      = 0x1a9,
+    GEN6_FORMAT_ETC2_RGB8				      = 0x1aa,
+    GEN6_FORMAT_EAC_R11					      = 0x1ab,
+    GEN6_FORMAT_EAC_RG11				      = 0x1ac,
+    GEN6_FORMAT_EAC_SIGNED_R11				      = 0x1ad,
+    GEN6_FORMAT_EAC_SIGNED_RG11				      = 0x1ae,
+    GEN6_FORMAT_ETC2_SRGB8				      = 0x1af,
+    GEN6_FORMAT_R16G16B16_UINT				      = 0x1b0,
+    GEN6_FORMAT_R16G16B16_SINT				      = 0x1b1,
+    GEN6_FORMAT_R32_SFIXED				      = 0x1b2,
+    GEN6_FORMAT_R10G10B10A2_SNORM			      = 0x1b3,
+    GEN6_FORMAT_R10G10B10A2_USCALED			      = 0x1b4,
+    GEN6_FORMAT_R10G10B10A2_SSCALED			      = 0x1b5,
+    GEN6_FORMAT_R10G10B10A2_SINT			      = 0x1b6,
+    GEN6_FORMAT_B10G10R10A2_SNORM			      = 0x1b7,
+    GEN6_FORMAT_B10G10R10A2_USCALED			      = 0x1b8,
+    GEN6_FORMAT_B10G10R10A2_SSCALED			      = 0x1b9,
+    GEN6_FORMAT_B10G10R10A2_UINT			      = 0x1ba,
+    GEN6_FORMAT_B10G10R10A2_SINT			      = 0x1bb,
+    GEN6_FORMAT_R64G64B64A64_PASSTHRU			      = 0x1bc,
+    GEN6_FORMAT_R64G64B64_PASSTHRU			      = 0x1bd,
+    GEN6_FORMAT_ETC2_RGB8_PTA				      = 0x1c0,
+    GEN6_FORMAT_ETC2_SRGB8_PTA				      = 0x1c1,
+    GEN6_FORMAT_ETC2_EAC_RGBA8				      = 0x1c2,
+    GEN6_FORMAT_ETC2_EAC_SRGB8_A8			      = 0x1c3,
+    GEN6_FORMAT_R8G8B8_UINT				      = 0x1c8,
+    GEN6_FORMAT_R8G8B8_SINT				      = 0x1c9,
+    GEN6_FORMAT_RAW					      = 0x1ff,
+};
+
+enum gen_surface_type {
+    GEN6_SURFTYPE_1D					      = 0x0,
+    GEN6_SURFTYPE_2D					      = 0x1,
+    GEN6_SURFTYPE_3D					      = 0x2,
+    GEN6_SURFTYPE_CUBE					      = 0x3,
+    GEN6_SURFTYPE_BUFFER				      = 0x4,
+    GEN7_SURFTYPE_STRBUF				      = 0x5,
+    GEN6_SURFTYPE_NULL					      = 0x7,
+};
+
+enum gen_surface_tiling {
+    GEN6_TILING_NONE					      = 0x0,
+    GEN8_TILING_W					      = 0x1,
+    GEN6_TILING_X					      = 0x2,
+    GEN6_TILING_Y					      = 0x3,
+};
+
+enum gen_surface_clear_color {
+    GEN7_CLEAR_COLOR_ZERO				      = 0x0,
+    GEN7_CLEAR_COLOR_ONE				      = 0x1,
+};
+
+enum gen_surface_scs {
+    GEN75_SCS_ZERO					      = 0x0,
+    GEN75_SCS_ONE					      = 0x1,
+    GEN75_SCS_RED					      = 0x4,
+    GEN75_SCS_GREEN					      = 0x5,
+    GEN75_SCS_BLUE					      = 0x6,
+    GEN75_SCS_ALPHA					      = 0x7,
+};
+
+#define GEN6_SURFACE_STATE__SIZE				16
+
+#define GEN6_SURFACE_DW0_TYPE__MASK				0xe0000000
+#define GEN6_SURFACE_DW0_TYPE__SHIFT				29
+#define GEN6_SURFACE_DW0_FORMAT__MASK				0x07fc0000
+#define GEN6_SURFACE_DW0_FORMAT__SHIFT				18
+#define GEN6_SURFACE_DW0_VSTRIDE				(0x1 << 12)
+#define GEN6_SURFACE_DW0_VSTRIDE_OFFSET				(0x1 << 11)
+#define GEN6_SURFACE_DW0_MIPLAYOUT__MASK			0x00000400
+#define GEN6_SURFACE_DW0_MIPLAYOUT__SHIFT			10
+#define GEN6_SURFACE_DW0_MIPLAYOUT_BELOW			(0x0 << 10)
+#define GEN6_SURFACE_DW0_MIPLAYOUT_RIGHT			(0x1 << 10)
+#define GEN6_SURFACE_DW0_CUBE_MAP_CORNER_MODE			(0x1 << 9)
+#define GEN6_SURFACE_DW0_RENDER_CACHE_RW			(0x1 << 8)
+#define GEN6_SURFACE_DW0_MEDIA_BOUNDARY_PIXEL_MODE__MASK	0x000000c0
+#define GEN6_SURFACE_DW0_MEDIA_BOUNDARY_PIXEL_MODE__SHIFT	6
+#define GEN6_SURFACE_DW0_CUBE_FACE_ENABLES__MASK		0x0000003f
+#define GEN6_SURFACE_DW0_CUBE_FACE_ENABLES__SHIFT		0
+
+
+#define GEN6_SURFACE_DW2_HEIGHT__MASK				0xfff80000
+#define GEN6_SURFACE_DW2_HEIGHT__SHIFT				19
+#define GEN6_SURFACE_DW2_WIDTH__MASK				0x0007ffc0
+#define GEN6_SURFACE_DW2_WIDTH__SHIFT				6
+#define GEN6_SURFACE_DW2_MIP_COUNT_LOD__MASK			0x0000003c
+#define GEN6_SURFACE_DW2_MIP_COUNT_LOD__SHIFT			2
+#define GEN6_SURFACE_DW2_RTROTATE__MASK				0x00000003
+#define GEN6_SURFACE_DW2_RTROTATE__SHIFT			0
+#define GEN6_SURFACE_DW2_RTROTATE_0DEG				0x0
+#define GEN6_SURFACE_DW2_RTROTATE_90DEG				0x1
+#define GEN6_SURFACE_DW2_RTROTATE_270DEG			0x3
+
+#define GEN6_SURFACE_DW3_DEPTH__MASK				0xffe00000
+#define GEN6_SURFACE_DW3_DEPTH__SHIFT				21
+#define GEN6_SURFACE_DW3_PITCH__MASK				0x000ffff8
+#define GEN6_SURFACE_DW3_PITCH__SHIFT				3
+#define GEN6_SURFACE_DW3_TILING__MASK				0x00000003
+#define GEN6_SURFACE_DW3_TILING__SHIFT				0
+
+#define GEN6_SURFACE_DW4_MIN_LOD__MASK				0xf0000000
+#define GEN6_SURFACE_DW4_MIN_LOD__SHIFT				28
+#define GEN6_SURFACE_DW4_MIN_ARRAY_ELEMENT__MASK		0x0ffe0000
+#define GEN6_SURFACE_DW4_MIN_ARRAY_ELEMENT__SHIFT		17
+#define GEN6_SURFACE_DW4_RT_VIEW_EXTENT__MASK			0x0001ff00
+#define GEN6_SURFACE_DW4_RT_VIEW_EXTENT__SHIFT			8
+#define GEN6_SURFACE_DW4_MULTISAMPLECOUNT__MASK			0x00000070
+#define GEN6_SURFACE_DW4_MULTISAMPLECOUNT__SHIFT		4
+#define GEN6_SURFACE_DW4_MULTISAMPLECOUNT_1			(0x0 << 4)
+#define GEN6_SURFACE_DW4_MULTISAMPLECOUNT_4			(0x2 << 4)
+#define GEN6_SURFACE_DW4_MSPOS_INDEX__MASK			0x00000007
+#define GEN6_SURFACE_DW4_MSPOS_INDEX__SHIFT			0
+
+#define GEN6_SURFACE_DW5_X_OFFSET__MASK				0xfe000000
+#define GEN6_SURFACE_DW5_X_OFFSET__SHIFT			25
+#define GEN6_SURFACE_DW5_X_OFFSET__SHR				2
+#define GEN6_SURFACE_DW5_VALIGN__MASK				0x01000000
+#define GEN6_SURFACE_DW5_VALIGN__SHIFT				24
+#define GEN6_SURFACE_DW5_VALIGN_2				(0x0 << 24)
+#define GEN6_SURFACE_DW5_VALIGN_4				(0x1 << 24)
+#define GEN6_SURFACE_DW5_Y_OFFSET__MASK				0x00f00000
+#define GEN6_SURFACE_DW5_Y_OFFSET__SHIFT			20
+#define GEN6_SURFACE_DW5_Y_OFFSET__SHR				1
+#define GEN6_SURFACE_DW5_MOCS__MASK				0x000f0000
+#define GEN6_SURFACE_DW5_MOCS__SHIFT				16
+
+
+#define GEN7_SURFACE_DW0_TYPE__MASK				0xe0000000
+#define GEN7_SURFACE_DW0_TYPE__SHIFT				29
+#define GEN7_SURFACE_DW0_IS_ARRAY				(0x1 << 28)
+#define GEN7_SURFACE_DW0_FORMAT__MASK				0x07fc0000
+#define GEN7_SURFACE_DW0_FORMAT__SHIFT				18
+#define GEN7_SURFACE_DW0_VALIGN__MASK				0x00030000
+#define GEN7_SURFACE_DW0_VALIGN__SHIFT				16
+#define GEN7_SURFACE_DW0_VALIGN_2				(0x0 << 16)
+#define GEN7_SURFACE_DW0_VALIGN_4				(0x1 << 16)
+#define GEN8_SURFACE_DW0_VALIGN_8				(0x2 << 16)
+#define GEN8_SURFACE_DW0_VALIGN_16				(0x3 << 16)
+#define GEN7_SURFACE_DW0_HALIGN__MASK				0x00008000
+#define GEN7_SURFACE_DW0_HALIGN__SHIFT				15
+#define GEN7_SURFACE_DW0_HALIGN_4				(0x0 << 15)
+#define GEN7_SURFACE_DW0_HALIGN_8				(0x1 << 15)
+#define GEN7_SURFACE_DW0_TILING__MASK				0x00006000
+#define GEN7_SURFACE_DW0_TILING__SHIFT				13
+#define GEN7_SURFACE_DW0_VSTRIDE				(0x1 << 12)
+#define GEN7_SURFACE_DW0_VSTRIDE_OFFSET				(0x1 << 11)
+#define GEN7_SURFACE_DW0_ARYSPC__MASK				0x00000400
+#define GEN7_SURFACE_DW0_ARYSPC__SHIFT				10
+#define GEN7_SURFACE_DW0_ARYSPC_FULL				(0x0 << 10)
+#define GEN7_SURFACE_DW0_ARYSPC_LOD0				(0x1 << 10)
+#define GEN8_SURFACE_DW0_HALIGN__MASK				0x0000c000
+#define GEN8_SURFACE_DW0_HALIGN__SHIFT				14
+#define GEN8_SURFACE_DW0_HALIGN_4				(0x1 << 14)
+#define GEN8_SURFACE_DW0_HALIGN_8				(0x2 << 14)
+#define GEN8_SURFACE_DW0_HALIGN_16				(0x3 << 14)
+#define GEN8_SURFACE_DW0_TILING__MASK				0x00003000
+#define GEN8_SURFACE_DW0_TILING__SHIFT				12
+#define GEN8_SURFACE_DW0_VSTRIDE				(0x1 << 11)
+#define GEN8_SURFACE_DW0_VSTRIDE_OFFSET				(0x1 << 10)
+#define GEN8_SURFACE_DW0_SAMPLER_L2_BYPASS_MODE			(0x1 << 9)
+#define GEN7_SURFACE_DW0_RENDER_CACHE_RW			(0x1 << 8)
+#define GEN7_SURFACE_DW0_MEDIA_BOUNDARY_PIXEL_MODE__MASK	0x000000c0
+#define GEN7_SURFACE_DW0_MEDIA_BOUNDARY_PIXEL_MODE__SHIFT	6
+#define GEN7_SURFACE_DW0_CUBE_FACE_ENABLES__MASK		0x0000003f
+#define GEN7_SURFACE_DW0_CUBE_FACE_ENABLES__SHIFT		0
+
+
+#define GEN8_SURFACE_DW1_MOCS__MASK				0x7f000000
+#define GEN8_SURFACE_DW1_MOCS__SHIFT				24
+#define GEN8_SURFACE_DW1_BASE_LOD__MASK				0x00f80000
+#define GEN8_SURFACE_DW1_BASE_LOD__SHIFT			19
+#define GEN8_SURFACE_DW1_QPITCH__MASK				0x00007fff
+#define GEN8_SURFACE_DW1_QPITCH__SHIFT				0
+
+#define GEN7_SURFACE_DW2_HEIGHT__MASK				0x3fff0000
+#define GEN7_SURFACE_DW2_HEIGHT__SHIFT				16
+#define GEN7_SURFACE_DW2_WIDTH__MASK				0x00003fff
+#define GEN7_SURFACE_DW2_WIDTH__SHIFT				0
+
+#define GEN7_SURFACE_DW3_DEPTH__MASK				0xffe00000
+#define GEN7_SURFACE_DW3_DEPTH__SHIFT				21
+#define GEN75_SURFACE_DW3_INTEGER_SURFACE_FORMAT__MASK		0x001c0000
+#define GEN75_SURFACE_DW3_INTEGER_SURFACE_FORMAT__SHIFT		18
+#define GEN7_SURFACE_DW3_PITCH__MASK				0x0003ffff
+#define GEN7_SURFACE_DW3_PITCH__SHIFT				0
+
+#define GEN7_SURFACE_DW4_RTROTATE__MASK				0x60000000
+#define GEN7_SURFACE_DW4_RTROTATE__SHIFT			29
+#define GEN7_SURFACE_DW4_RTROTATE_0DEG				(0x0 << 29)
+#define GEN7_SURFACE_DW4_RTROTATE_90DEG				(0x1 << 29)
+#define GEN7_SURFACE_DW4_RTROTATE_270DEG			(0x3 << 29)
+#define GEN7_SURFACE_DW4_MIN_ARRAY_ELEMENT__MASK		0x1ffc0000
+#define GEN7_SURFACE_DW4_MIN_ARRAY_ELEMENT__SHIFT		18
+#define GEN7_SURFACE_DW4_RT_VIEW_EXTENT__MASK			0x0003ff80
+#define GEN7_SURFACE_DW4_RT_VIEW_EXTENT__SHIFT			7
+#define GEN7_SURFACE_DW4_MSFMT__MASK				0x00000040
+#define GEN7_SURFACE_DW4_MSFMT__SHIFT				6
+#define GEN7_SURFACE_DW4_MSFMT_MSS				(0x0 << 6)
+#define GEN7_SURFACE_DW4_MSFMT_DEPTH_STENCIL			(0x1 << 6)
+#define GEN7_SURFACE_DW4_MULTISAMPLECOUNT__MASK			0x00000038
+#define GEN7_SURFACE_DW4_MULTISAMPLECOUNT__SHIFT		3
+#define GEN7_SURFACE_DW4_MULTISAMPLECOUNT_1			(0x0 << 3)
+#define GEN8_SURFACE_DW4_MULTISAMPLECOUNT_2			(0x1 << 3)
+#define GEN7_SURFACE_DW4_MULTISAMPLECOUNT_4			(0x2 << 3)
+#define GEN7_SURFACE_DW4_MULTISAMPLECOUNT_8			(0x3 << 3)
+#define GEN8_SURFACE_DW4_MULTISAMPLECOUNT_16			(0x4 << 3)
+#define GEN7_SURFACE_DW4_MSPOS_INDEX__MASK			0x00000007
+#define GEN7_SURFACE_DW4_MSPOS_INDEX__SHIFT			0
+#define GEN7_SURFACE_DW4_MIN_ARRAY_ELEMENT_STRBUF__MASK		0x07ffffff
+#define GEN7_SURFACE_DW4_MIN_ARRAY_ELEMENT_STRBUF__SHIFT	0
+
+#define GEN7_SURFACE_DW5_X_OFFSET__MASK				0xfe000000
+#define GEN7_SURFACE_DW5_X_OFFSET__SHIFT			25
+#define GEN7_SURFACE_DW5_X_OFFSET__SHR				2
+#define GEN7_SURFACE_DW5_Y_OFFSET__MASK				0x00f00000
+#define GEN7_SURFACE_DW5_Y_OFFSET__SHIFT			20
+#define GEN7_SURFACE_DW5_Y_OFFSET__SHR				1
+#define GEN7_SURFACE_DW5_MOCS__MASK				0x000f0000
+#define GEN7_SURFACE_DW5_MOCS__SHIFT				16
+#define GEN8_SURFACE_DW5_Y_OFFSET__MASK				0x00e00000
+#define GEN8_SURFACE_DW5_Y_OFFSET__SHIFT			21
+#define GEN8_SURFACE_DW5_Y_OFFSET__SHR				1
+#define GEN8_SURFACE_DW5_CUBE_EWA				(0x1 << 20)
+#define GEN8_SURFACE_DW5_COHERENCY_TYPE				(0x1 << 14)
+#define GEN7_SURFACE_DW5_MIN_LOD__MASK				0x000000f0
+#define GEN7_SURFACE_DW5_MIN_LOD__SHIFT				4
+#define GEN7_SURFACE_DW5_MIP_COUNT_LOD__MASK			0x0000000f
+#define GEN7_SURFACE_DW5_MIP_COUNT_LOD__SHIFT			0
+
+#define GEN8_SURFACE_DW6_SEPARATE_UV_ENABLE			(0x1 << 31)
+#define GEN7_SURFACE_DW6_UV_X_OFFSET__MASK			0x3fff0000
+#define GEN7_SURFACE_DW6_UV_X_OFFSET__SHIFT			16
+#define GEN7_SURFACE_DW6_UV_Y_OFFSET__MASK			0x00003fff
+#define GEN7_SURFACE_DW6_UV_Y_OFFSET__SHIFT			0
+#define GEN7_SURFACE_DW6_MCS_ADDR__MASK				0xfffff000
+#define GEN7_SURFACE_DW6_MCS_ADDR__SHIFT			12
+#define GEN7_SURFACE_DW6_MCS_ADDR__SHR				12
+#define GEN8_SURFACE_DW6_AUX_QPITCH__MASK			0x7fff0000
+#define GEN8_SURFACE_DW6_AUX_QPITCH__SHIFT			16
+#define GEN7_SURFACE_DW6_AUX_PITCH__MASK			0x00000ff8
+#define GEN7_SURFACE_DW6_AUX_PITCH__SHIFT			3
+#define GEN7_SURFACE_DW6_APPEND_COUNTER_ADDR__MASK		0xffffffc0
+#define GEN7_SURFACE_DW6_APPEND_COUNTER_ADDR__SHIFT		6
+#define GEN7_SURFACE_DW6_APPEND_COUNTER_ADDR__SHR		6
+#define GEN7_SURFACE_DW6_AUX_MODE__MASK				0x00000007
+#define GEN7_SURFACE_DW6_AUX_MODE__SHIFT			0
+#define GEN7_SURFACE_DW6_AUX_MODE_NONE				0x0
+#define GEN7_SURFACE_DW6_AUX_MODE_MCS				0x1
+#define GEN7_SURFACE_DW6_AUX_MODE_APPEND			0x2
+#define GEN8_SURFACE_DW6_AUX_MODE_HIZ				0x3
+
+#define GEN7_SURFACE_DW7_CC_R__MASK				0x80000000
+#define GEN7_SURFACE_DW7_CC_R__SHIFT				31
+#define GEN7_SURFACE_DW7_CC_G__MASK				0x40000000
+#define GEN7_SURFACE_DW7_CC_G__SHIFT				30
+#define GEN7_SURFACE_DW7_CC_B__MASK				0x20000000
+#define GEN7_SURFACE_DW7_CC_B__SHIFT				29
+#define GEN7_SURFACE_DW7_CC_A__MASK				0x10000000
+#define GEN7_SURFACE_DW7_CC_A__SHIFT				28
+#define GEN75_SURFACE_DW7_SCS_R__MASK				0x0e000000
+#define GEN75_SURFACE_DW7_SCS_R__SHIFT				25
+#define GEN75_SURFACE_DW7_SCS_G__MASK				0x01c00000
+#define GEN75_SURFACE_DW7_SCS_G__SHIFT				22
+#define GEN75_SURFACE_DW7_SCS_B__MASK				0x00380000
+#define GEN75_SURFACE_DW7_SCS_B__SHIFT				19
+#define GEN75_SURFACE_DW7_SCS_A__MASK				0x00070000
+#define GEN75_SURFACE_DW7_SCS_A__SHIFT				16
+#define GEN7_SURFACE_DW7_RES_MIN_LOD__MASK			0x00000fff
+#define GEN7_SURFACE_DW7_RES_MIN_LOD__SHIFT			0
+
+
+
+
+
+
+
+
+
+#define GEN6_BINDING_TABLE_STATE__SIZE				256
+
+#define GEN6_BINDING_TABLE_DW_ADDR__MASK			0xffffffe0
+#define GEN6_BINDING_TABLE_DW_ADDR__SHIFT			5
+#define GEN6_BINDING_TABLE_DW_ADDR__SHR				5
+
+#define GEN8_BINDING_TABLE_DW_ADDR__MASK			0xffffffc0
+#define GEN8_BINDING_TABLE_DW_ADDR__SHIFT			6
+#define GEN8_BINDING_TABLE_DW_ADDR__SHR				6
+
+
+#endif /* GEN_RENDER_SURFACE_XML */
diff --git a/icd/intel/genhw/genhw.h b/icd/intel/genhw/genhw.h
new file mode 100644
index 0000000..144595a
--- /dev/null
+++ b/icd/intel/genhw/genhw.h
@@ -0,0 +1,260 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef GENHW_H
+#define GENHW_H
+
+#include <stdbool.h>
+#include <stdint.h>
+#include <assert.h>
+
+#include "gen_regs.xml.h"
+#include "gen_mi.xml.h"
+#include "gen_blitter.xml.h"
+#include "gen_render.xml.h"
+#include "gen_render_surface.xml.h"
+#include "gen_render_dynamic.xml.h"
+#include "gen_render_3d.xml.h"
+#include "gen_render_media.xml.h"
+#include "gen_eu_isa.xml.h"
+#include "gen_eu_message.xml.h"
+
+#define GEN_MI_CMD(gen, op) (GEN6_MI_TYPE_MI | gen ## _MI_OPCODE_ ## op)
+#define GEN6_MI_CMD(op) GEN_MI_CMD(GEN6, op)
+#define GEN7_MI_CMD(op) GEN_MI_CMD(GEN7, op)
+
+#define GEN_BLITTER_CMD(gen, op) \
+   (GEN6_BLITTER_TYPE_BLITTER | gen ## _BLITTER_OPCODE_ ## op)
+#define GEN6_BLITTER_CMD(op) GEN_BLITTER_CMD(GEN6, op)
+
+#define GEN_RENDER_CMD(subtype, gen, op)  \
+   (GEN6_RENDER_TYPE_RENDER |             \
+    GEN6_RENDER_SUBTYPE_ ## subtype |     \
+    gen ## _RENDER_OPCODE_ ## op)
+#define GEN6_RENDER_CMD(subtype, op) GEN_RENDER_CMD(subtype, GEN6, op)
+#define GEN7_RENDER_CMD(subtype, op) GEN_RENDER_CMD(subtype, GEN7, op)
+#define GEN75_RENDER_CMD(subtype, op) GEN_RENDER_CMD(subtype, GEN75, op)
+#define GEN8_RENDER_CMD(subtype, op) GEN_RENDER_CMD(subtype, GEN8, op)
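+
+/*
+ * A minimal usage sketch (the 3D subtype and 3DSTATE opcode tokens are
+ * assumed to come from the generated headers included above, as is the usual
+ * render-command length bias of 2): a command's header dword is just the
+ * type, subtype, and opcode OR'd together, with the biased body length
+ * packed into the low bits by the batch writer, e.g.
+ *
+ *   dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_DRAWING_RECTANGLE) | (cmd_len - 2);
+ */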
+
+#define GEN_EXTRACT(bits, field) (((bits) & field ## __MASK) >> field ## __SHIFT)
+#define GEN_SHIFT32(bits, field) gen_shift32(bits, field ## __MASK, field ## __SHIFT)
+
+static inline uint32_t
+gen_shift32(uint32_t bits, uint32_t mask, int shift)
+{
+   bits <<= shift;
+
+   assert((bits & mask) == bits);
+   return bits & mask;
+}
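+
+/*
+ * A small sketch of the field helpers using tokens from the generated
+ * surface header (the hardware's minus-one encoding of width/height is an
+ * assumption here, not something these headers state): pack a surface's
+ * size into SURFACE_STATE DW2, then extract it again.
+ *
+ *   uint32_t dw2 = GEN_SHIFT32(width - 1, GEN7_SURFACE_DW2_WIDTH) |
+ *                  GEN_SHIFT32(height - 1, GEN7_SURFACE_DW2_HEIGHT);
+ *   uint32_t w = GEN_EXTRACT(dw2, GEN7_SURFACE_DW2_WIDTH) + 1;
+ *
+ * Fields suffixed __SHR hold pre-shifted addresses: shift the byte address
+ * right by the __SHR amount before packing it with GEN_SHIFT32.
+ */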
+
+static inline bool
+gen_is_snb(int devid)
+{
+   return (devid == 0x0102 || /* GT1 desktop */
+           devid == 0x0112 || /* GT2 desktop */
+           devid == 0x0122 || /* GT2_PLUS desktop */
+           devid == 0x0106 || /* GT1 mobile */
+           devid == 0x0116 || /* GT2 mobile */
+           devid == 0x0126 || /* GT2_PLUS mobile */
+           devid == 0x010a);  /* GT1 server */
+}
+
+static inline int
+gen_get_snb_gt(int devid)
+{
+   assert(gen_is_snb(devid));
+   return (devid & 0x30) ? 2 : 1;
+}
+
+static inline bool
+gen_is_ivb(int devid)
+{
+   return (devid == 0x0152 || /* GT1 desktop */
+           devid == 0x0162 || /* GT2 desktop */
+           devid == 0x0156 || /* GT1 mobile */
+           devid == 0x0166 || /* GT2 mobile */
+           devid == 0x015a || /* GT1 server */
+           devid == 0x016a);  /* GT2 server */
+}
+
+static inline int
+gen_get_ivb_gt(int devid)
+{
+   assert(gen_is_ivb(devid));
+   return (devid & 0x30) >> 4;
+}
+
+static inline bool
+gen_is_hsw(int devid)
+{
+   return (devid == 0x0402 || /* GT1 desktop */
+           devid == 0x0412 || /* GT2 desktop */
+           devid == 0x0422 || /* GT3 desktop */
+           devid == 0x0406 || /* GT1 mobile */
+           devid == 0x0416 || /* GT2 mobile */
+           devid == 0x0426 || /* GT3 mobile */
+           devid == 0x040a || /* GT1 server */
+           devid == 0x041a || /* GT2 server */
+           devid == 0x042a || /* GT3 server */
+           devid == 0x040b || /* GT1 reserved */
+           devid == 0x041b || /* GT2 reserved */
+           devid == 0x042b || /* GT3 reserved */
+           devid == 0x040e || /* GT1 reserved */
+           devid == 0x041e || /* GT2 reserved */
+           devid == 0x042e || /* GT3 reserved */
+           devid == 0x0c02 || /* SDV */
+           devid == 0x0c12 ||
+           devid == 0x0c22 ||
+           devid == 0x0c06 ||
+           devid == 0x0c16 ||
+           devid == 0x0c26 ||
+           devid == 0x0c0a ||
+           devid == 0x0c1a ||
+           devid == 0x0c2a ||
+           devid == 0x0c0b ||
+           devid == 0x0c1b ||
+           devid == 0x0c2b ||
+           devid == 0x0c0e ||
+           devid == 0x0c1e ||
+           devid == 0x0c2e ||
+           devid == 0x0a02 || /* ULT */
+           devid == 0x0a12 ||
+           devid == 0x0a22 ||
+           devid == 0x0a06 ||
+           devid == 0x0a16 ||
+           devid == 0x0a26 ||
+           devid == 0x0a0a ||
+           devid == 0x0a1a ||
+           devid == 0x0a2a ||
+           devid == 0x0a0b ||
+           devid == 0x0a1b ||
+           devid == 0x0a2b ||
+           devid == 0x0a0e ||
+           devid == 0x0a1e ||
+           devid == 0x0a2e ||
+           devid == 0x0d02 || /* CRW */
+           devid == 0x0d12 ||
+           devid == 0x0d22 ||
+           devid == 0x0d06 ||
+           devid == 0x0d16 ||
+           devid == 0x0d26 ||
+           devid == 0x0d0a ||
+           devid == 0x0d1a ||
+           devid == 0x0d2a ||
+           devid == 0x0d0b ||
+           devid == 0x0d1b ||
+           devid == 0x0d2b ||
+           devid == 0x0d0e ||
+           devid == 0x0d1e ||
+           devid == 0x0d2e);
+}
+
+static inline int
+gen_get_hsw_gt(int devid)
+{
+   assert(gen_is_hsw(devid));
+   return ((devid & 0x30) >> 4) + 1;
+}
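+
+/* For example, the GT2 desktop part 0x0412: (0x0412 & 0x30) >> 4 is 1, so
+ * this returns 2; on HSW, bits 5:4 of the device id encode GT minus one.
+ */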
+
+static inline bool
+gen_is_bdw(int devid)
+{
+   return (devid == 0x1602 || /* GT1 ULT */
+           devid == 0x1606 || /* GT1 ULT */
+           devid == 0x160a || /* GT1 server */
+           devid == 0x160b || /* GT1 Iris */
+           devid == 0x160d || /* GT1 workstation */
+           devid == 0x160e || /* GT1 ULX */
+           devid == 0x1612 || /* GT2 */
+           devid == 0x1616 ||
+           devid == 0x161a ||
+           devid == 0x161b ||
+           devid == 0x161d ||
+           devid == 0x161e ||
+           devid == 0x1622 || /* GT3 */
+           devid == 0x1626 ||
+           devid == 0x162a ||
+           devid == 0x162b ||
+           devid == 0x162d ||
+           devid == 0x162e);
+}
+
+static inline int
+gen_get_bdw_gt(int devid)
+{
+   assert(gen_is_bdw(devid));
+   return ((devid & 0x30) >> 4) + 1;
+}
+
+static inline bool
+gen_is_vlv(int devid)
+{
+   return (devid == 0x0f30 ||
+           devid == 0x0f31 ||
+           devid == 0x0f32 ||
+           devid == 0x0f33 ||
+           devid == 0x0157 ||
+           devid == 0x0155);
+}
+
+static inline bool
+gen_is_chv(int devid)
+{
+   return (devid == 0x22b0 ||
+           devid == 0x22b1 ||
+           devid == 0x22b2 ||
+           devid == 0x22b3);
+}
+
+static inline bool
+gen_is_atom(int devid)
+{
+   return (gen_is_vlv(devid) ||
+           gen_is_chv(devid));
+}
+
+static inline bool
+gen_is_desktop(int devid)
+{
+   assert(!gen_is_atom(devid));
+   return ((devid & 0xf) == 0x2);
+}
+
+static inline bool
+gen_is_mobile(int devid)
+{
+   assert(!gen_is_atom(devid));
+   return ((devid & 0xf) == 0x6);
+}
+
+static inline bool
+gen_is_server(int devid)
+{
+   assert(!gen_is_atom(devid));
+   return ((devid & 0xf) == 0xa);
+}
+
+#endif /* GENHW_H */
diff --git a/icd/intel/gpu.c b/icd/intel/gpu.c
new file mode 100644
index 0000000..4034c7f
--- /dev/null
+++ b/icd/intel/gpu.c
@@ -0,0 +1,555 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Chris Forbes <chrisf@ijw.co.nz>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ * Author: Tobin Ehlis <tobin@lunarg.com>
+ *
+ */
+
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "genhw/genhw.h"
+#include "kmd/winsys.h"
+#include "queue.h"
+#include "gpu.h"
+#include "instance.h"
+#include "wsi.h"
+#include "icd.h"
+
+static int gpu_open_primary_node(struct intel_gpu *gpu)
+{
+    if (gpu->primary_fd_internal < 0)
+        gpu->primary_fd_internal = open(gpu->primary_node, O_RDWR);
+
+    return gpu->primary_fd_internal;
+}
+
+static void gpu_close_primary_node(struct intel_gpu *gpu)
+{
+    if (gpu->primary_fd_internal >= 0) {
+        close(gpu->primary_fd_internal);
+        gpu->primary_fd_internal = -1;
+    }
+}
+
+static int gpu_open_render_node(struct intel_gpu *gpu)
+{
+    if (gpu->render_fd_internal < 0 && gpu->render_node) {
+        gpu->render_fd_internal = open(gpu->render_node, O_RDWR);
+        if (gpu->render_fd_internal < 0) {
+            intel_log(gpu, VK_DEBUG_REPORT_ERROR_BIT_EXT, 0, VK_NULL_HANDLE, 0,
+                    0, "failed to open %s", gpu->render_node);
+        }
+    }
+
+    return gpu->render_fd_internal;
+}
+
+static void gpu_close_render_node(struct intel_gpu *gpu)
+{
+    if (gpu->render_fd_internal >= 0) {
+        close(gpu->render_fd_internal);
+        gpu->render_fd_internal = -1;
+    }
+}
+
+static const char *gpu_get_name(const struct intel_gpu *gpu)
+{
+    const char *name = NULL;
+
+    if (gen_is_hsw(gpu->devid)) {
+        if (gen_is_desktop(gpu->devid))
+            name = "Intel(R) Haswell Desktop";
+        else if (gen_is_mobile(gpu->devid))
+            name = "Intel(R) Haswell Mobile";
+        else if (gen_is_server(gpu->devid))
+            name = "Intel(R) Haswell Server";
+    }
+    else if (gen_is_ivb(gpu->devid)) {
+        if (gen_is_desktop(gpu->devid))
+            name = "Intel(R) Ivybridge Desktop";
+        else if (gen_is_mobile(gpu->devid))
+            name = "Intel(R) Ivybridge Mobile";
+        else if (gen_is_server(gpu->devid))
+            name = "Intel(R) Ivybridge Server";
+    }
+    else if (gen_is_snb(gpu->devid)) {
+        if (gen_is_desktop(gpu->devid))
+            name = "Intel(R) Sandybridge Desktop";
+        else if (gen_is_mobile(gpu->devid))
+            name = "Intel(R) Sandybridge Mobile";
+        else if (gen_is_server(gpu->devid))
+            name = "Intel(R) Sandybridge Server";
+    }
+
+    if (!name)
+        name = "Unknown Intel Chipset";
+
+    return name;
+}
+
+void intel_gpu_destroy(struct intel_gpu *gpu)
+{
+    intel_wsi_gpu_cleanup(gpu);
+
+    intel_gpu_cleanup_winsys(gpu);
+
+    intel_free(gpu, gpu->primary_node);
+    intel_free(gpu, gpu);
+}
+
+static int devid_to_gen(int devid)
+{
+    int gen;
+
+    if (gen_is_hsw(devid))
+        gen = INTEL_GEN(7.5);
+    else if (gen_is_ivb(devid))
+        gen = INTEL_GEN(7);
+    else if (gen_is_snb(devid))
+        gen = INTEL_GEN(6);
+    else
+        gen = -1;
+
+#ifdef INTEL_GEN_SPECIALIZED
+    if (gen != INTEL_GEN(INTEL_GEN_SPECIALIZED))
+        gen = -1;
+#endif
+
+    return gen;
+}
+
+VkResult intel_gpu_create(const struct intel_instance *instance, int devid,
+                            const char *primary_node, const char *render_node,
+                            struct intel_gpu **gpu_ret)
+{
+    const int gen = devid_to_gen(devid);
+    size_t primary_len, render_len;
+    struct intel_gpu *gpu;
+
+    if (gen < 0) {
+        intel_log(instance, VK_DEBUG_REPORT_WARNING_BIT_EXT, 0,
+                VK_NULL_HANDLE, 0, 0, "unsupported device id 0x%04x", devid);
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    gpu = intel_alloc(instance, sizeof(*gpu), sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
+    if (!gpu)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    memset(gpu, 0, sizeof(*gpu));
+    /* there is no VK_DBG_OBJECT_GPU */
+    intel_handle_init(&gpu->handle, VK_DEBUG_REPORT_OBJECT_TYPE_PHYSICAL_DEVICE_EXT, instance);
+
+    gpu->devid = devid;
+
+    primary_len = strlen(primary_node);
+    render_len = (render_node) ? strlen(render_node) : 0;
+
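+    /* one allocation holds both nul-terminated node paths back to back */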
+    gpu->primary_node = intel_alloc(gpu, primary_len + 1 +
+            ((render_len) ? (render_len + 1) : 0), sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
+    if (!gpu->primary_node) {
+        intel_free(instance, gpu);
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    }
+
+    memcpy(gpu->primary_node, primary_node, primary_len + 1);
+
+    if (render_node) {
+        gpu->render_node = gpu->primary_node + primary_len + 1;
+        memcpy(gpu->render_node, render_node, render_len + 1);
+    } else {
+        gpu->render_node = gpu->primary_node;
+    }
+
+    gpu->gen_opaque = gen;
+
+    switch (intel_gpu_gen(gpu)) {
+    case INTEL_GEN(7.5):
+        gpu->gt = gen_get_hsw_gt(devid);
+        break;
+    case INTEL_GEN(7):
+        gpu->gt = gen_get_ivb_gt(devid);
+        break;
+    case INTEL_GEN(6):
+        gpu->gt = gen_get_snb_gt(devid);
+        break;
+    }
+
+    /* 150K dwords */
+    gpu->max_batch_buffer_size = sizeof(uint32_t) * 150*1024;
+
+    /* the winsys is prepared for one reloc every two dwords, then minus 2 */
+    gpu->batch_buffer_reloc_count =
+        gpu->max_batch_buffer_size / sizeof(uint32_t) / 2 - 2;
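+    /* with the sizes above, that is 614400 / 4 / 2 - 2 = 76798 relocs */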
+
+    gpu->primary_fd_internal = -1;
+    gpu->render_fd_internal = -1;
+
+    *gpu_ret = gpu;
+
+    return VK_SUCCESS;
+}
+
+void intel_gpu_get_limits(VkPhysicalDeviceLimits *pLimits)
+{
+    // TODO: fill out more limits
+    memset(pLimits, 0, sizeof(*pLimits));
+
+    // no size limit, but no bound buffer can exceed 2GB
+    pLimits->maxBoundDescriptorSets         = 1;
+    pLimits->maxComputeWorkGroupInvocations = 512;
+
+    // incremented every 80ns
+    pLimits->timestampPeriod = 80.0f;
+
+    // hardware is limited to 16 viewports
+    pLimits->maxViewports        = INTEL_MAX_VIEWPORTS;
+    pLimits->maxColorAttachments = INTEL_MAX_RENDER_TARGETS;
+
+    // ?
+    pLimits->maxImageDimension1D   = 8192;
+    pLimits->maxImageDimension2D   = 8192;
+    pLimits->maxImageDimension3D   = 8192;
+    pLimits->maxImageDimensionCube = 8192;
+    pLimits->maxImageArrayLayers   = 2048;
+    pLimits->maxTexelBufferElements = 128 * 1024 * 1024;  // 128M texels hard limit
+    pLimits->maxUniformBufferRange = 64 * 1024;          // not hard limit
+
+    /* HW has two per-stage resource tables:
+     * - samplers: 16 per stage on IVB; HSW+ can address more in blocks of
+     *   16 via a shader workaround, since the table base pointer used by
+     *   the sampler hardware is under shader software control.
+     *
+     * - binding table entries: 250 total on all gens, shared between
+     *   textures, RTs, images, SSBOs, UBOs, and so on.  The top few
+     *   indices (250-255) are reserved for 'stateless' access with
+     *   various cache options, and for SLM access.
+     */
+    pLimits->maxPerStageDescriptorSamplers       = 16;        // technically more on HSW+, see above
+    pLimits->maxDescriptorSetSamplers            = 16;
+
+    pLimits->maxPerStageDescriptorUniformBuffers = 128;
+    pLimits->maxDescriptorSetUniformBuffers      = 128;
+
+    pLimits->maxPerStageDescriptorSampledImages  = 128;
+    pLimits->maxDescriptorSetSampledImages       = 128;
+
+    // storage images and buffers not implemented; left at zero
+
+    pLimits->maxPerStageResources  = 250;
+
+    // required to support at least two queue priorities
+    pLimits->discreteQueuePriorities = 2;
+}
+
+void intel_gpu_get_props(const struct intel_gpu *gpu,
+                         VkPhysicalDeviceProperties *props)
+{
+    const char *name;
+    size_t name_len;
+
+    props->apiVersion = INTEL_API_VERSION;
+    props->driverVersion = INTEL_DRIVER_VERSION;
+
+    props->vendorID = 0x8086;
+    props->deviceID = gpu->devid;
+
+    props->deviceType = VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU;
+
+    /* copy GPU name */
+    name = gpu_get_name(gpu);
+    name_len = strlen(name);
+    if (name_len > sizeof(props->deviceName) - 1)
+        name_len = sizeof(props->deviceName) - 1;
+    memcpy(props->deviceName, name, name_len);
+    props->deviceName[name_len] = '\0';
+
+    intel_gpu_get_limits(&props->limits);
+
+    intel_gpu_get_sparse_properties(&props->sparseProperties);
+}
+
+void intel_gpu_get_queue_props(const struct intel_gpu *gpu,
+                               enum intel_gpu_engine_type engine,
+                               VkQueueFamilyProperties *props)
+{
+    switch (engine) {
+    case INTEL_GPU_ENGINE_3D:
+        props->queueFlags = VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT;
+        props->queueCount = 1;
+        props->timestampValidBits = 0;
+        props->minImageTransferGranularity.width = 1;
+        props->minImageTransferGranularity.height = 1;
+        props->minImageTransferGranularity.depth = 1;
+        break;
+    default:
+        assert(!"unknown engine type");
+        return;
+    }
+}
+
+void intel_gpu_get_memory_props(const struct intel_gpu *gpu,
+                                VkPhysicalDeviceMemoryProperties *props)
+{
+    memset(props, 0, sizeof(VkPhysicalDeviceMemoryProperties));
+    props->memoryTypeCount = INTEL_MEMORY_TYPE_COUNT;
+    props->memoryHeapCount = INTEL_MEMORY_HEAP_COUNT;
+
+    // For now, Intel will support one memory type
+    for (uint32_t i = 0; i < props->memoryTypeCount; i++) {
+        assert(props->memoryTypeCount == 1);
+        props->memoryTypes[i].propertyFlags = INTEL_MEMORY_PROPERTY_ALL;
+        props->memoryTypes[i].heapIndex     = i;
+    }
+
+    // For now, Intel will support a single heap with all available memory
+    for (uint32_t i = 0; i < props->memoryHeapCount; i++) {
+        assert(props->memoryHeapCount == 1);
+        props->memoryHeaps[i].size = INTEL_MEMORY_HEAP_SIZE;
+    }
+}
+
+int intel_gpu_get_max_threads(const struct intel_gpu *gpu,
+                              VkShaderStageFlagBits stage)
+{
+    switch (intel_gpu_gen(gpu)) {
+    case INTEL_GEN(7.5):
+        switch (stage) {
+        case VK_SHADER_STAGE_VERTEX_BIT:
+            return (gpu->gt >= 2) ? 280 : 70;
+        case VK_SHADER_STAGE_GEOMETRY_BIT:
+            /* values from ilo_gpe_init_gs_cso_gen7 */
+            return (gpu->gt >= 2) ? 256 : 70;
+        case VK_SHADER_STAGE_FRAGMENT_BIT:
+            return (gpu->gt == 3) ? 408 :
+                   (gpu->gt == 2) ? 204 : 102;
+        default:
+            break;
+        }
+        break;
+    case INTEL_GEN(7):
+        switch (stage) {
+        case VK_SHADER_STAGE_VERTEX_BIT:
+            return (gpu->gt == 2) ? 128 : 36;
+        case VK_SHADER_STAGE_GEOMETRY_BIT:
+            /* values from ilo_gpe_init_gs_cso_gen7 */
+            return (gpu->gt == 2) ? 128 : 36;
+        case VK_SHADER_STAGE_FRAGMENT_BIT:
+            return (gpu->gt == 2) ? 172 : 48;
+        default:
+            break;
+        }
+        break;
+    case INTEL_GEN(6):
+        switch (stage) {
+        case VK_SHADER_STAGE_VERTEX_BIT:
+            return (gpu->gt == 2) ? 60 : 24;
+        case VK_SHADER_STAGE_GEOMETRY_BIT:
+            /* values from ilo_gpe_init_gs_cso_gen6 */
+            return (gpu->gt == 2) ? 28 : 21;
+        case VK_SHADER_STAGE_FRAGMENT_BIT:
+            return (gpu->gt == 2) ? 80 : 40;
+        default:
+            break;
+        }
+        break;
+    default:
+        break;
+    }
+
+    intel_log(gpu, VK_DEBUG_REPORT_ERROR_BIT_EXT, 0, VK_NULL_HANDLE,
+            0, 0, "unknown Gen or shader stage");
+
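+    /* conservative fallbacks so callers still get usable thread counts */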
+    switch (stage) {
+    case VK_SHADER_STAGE_VERTEX_BIT:
+        return 1;
+    case VK_SHADER_STAGE_GEOMETRY_BIT:
+        return 1;
+    case VK_SHADER_STAGE_FRAGMENT_BIT:
+        return 4;
+    default:
+        return 1;
+    }
+}
+
+int intel_gpu_get_primary_fd(struct intel_gpu *gpu)
+{
+    return gpu_open_primary_node(gpu);
+}
+
+VkResult intel_gpu_init_winsys(struct intel_gpu *gpu)
+{
+    int fd;
+
+    assert(!gpu->winsys);
+
+    fd = gpu_open_render_node(gpu);
+    if (fd < 0)
+        return VK_ERROR_INITIALIZATION_FAILED;
+
+    gpu->winsys = intel_winsys_create_for_fd(gpu->handle.instance->icd, fd);
+    if (!gpu->winsys) {
+        intel_log(gpu, VK_DEBUG_REPORT_ERROR_BIT_EXT, 0,
+                VK_NULL_HANDLE, 0, 0, "failed to create GPU winsys");
+        gpu_close_render_node(gpu);
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    return VK_SUCCESS;
+}
+
+void intel_gpu_cleanup_winsys(struct intel_gpu *gpu)
+{
+    if (gpu->winsys) {
+        intel_winsys_destroy(gpu->winsys);
+        gpu->winsys = NULL;
+    }
+
+    gpu_close_primary_node(gpu);
+    gpu_close_render_node(gpu);
+}
+
+enum intel_phy_dev_ext_type intel_gpu_lookup_phy_dev_extension(
+        const struct intel_gpu *gpu,
+        const char *ext)
+{
+    uint32_t type;
+    uint32_t array_size = ARRAY_SIZE(intel_phy_dev_gpu_exts);
+
+    for (type = 0; type < array_size; type++) {
+        if (compare_vk_extension_properties(&intel_phy_dev_gpu_exts[type], ext))
+            break;
+    }
+
+    assert(type < array_size || type == INTEL_PHY_DEV_EXT_INVALID);
+
+    return type;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetPhysicalDeviceProperties(
+    VkPhysicalDevice gpu_,
+    VkPhysicalDeviceProperties* pProperties)
+{
+    struct intel_gpu *gpu = intel_gpu(gpu_);
+
+    intel_gpu_get_props(gpu, pProperties);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetPhysicalDeviceQueueFamilyProperties(
+    VkPhysicalDevice gpu_,
+    uint32_t* pQueueFamilyPropertyCount,
+    VkQueueFamilyProperties* pProperties)
+{
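+    /* two-call idiom: a NULL pProperties queries the family count */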
+    struct intel_gpu *gpu = intel_gpu(gpu_);
+    uint32_t engine;
+
+    if (pProperties == NULL) {
+        *pQueueFamilyPropertyCount = INTEL_GPU_ENGINE_COUNT;
+        return;
+    }
+
+    if (*pQueueFamilyPropertyCount > INTEL_GPU_ENGINE_COUNT)
+        *pQueueFamilyPropertyCount = INTEL_GPU_ENGINE_COUNT;
+
+    for (engine = 0; engine < *pQueueFamilyPropertyCount; engine++) {
+        intel_gpu_get_queue_props(gpu, engine, pProperties);
+        pProperties++;
+    }
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetPhysicalDeviceMemoryProperties(
+    VkPhysicalDevice gpu_,
+    VkPhysicalDeviceMemoryProperties* pProperties)
+{
+    struct intel_gpu *gpu = intel_gpu(gpu_);
+
+    intel_gpu_get_memory_props(gpu, pProperties);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetPhysicalDeviceFeatures(
+                                               VkPhysicalDevice physicalDevice,
+                                               VkPhysicalDeviceFeatures* pFeatures)
+{
+    /* TODO: fill out features */
+    memset(pFeatures, 0, sizeof(*pFeatures));
+    pFeatures->shaderClipDistance = 1;
+    pFeatures->occlusionQueryPrecise = 1;
+}
+
+void intel_gpu_get_sparse_properties(VkPhysicalDeviceSparseProperties *pProps)
+{
+    memset(pProps, 0, sizeof(*pProps));
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateDeviceExtensionProperties(
+        VkPhysicalDevice                            physicalDevice,
+        const char*                                 pLayerName,
+        uint32_t*                                   pPropertyCount,
+        VkExtensionProperties*                      pProperties)
+{
+    uint32_t copy_size;
+    uint32_t extension_count = ARRAY_SIZE(intel_phy_dev_gpu_exts);
+
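+    /* two-call idiom: a NULL pProperties queries the count; otherwise copy
+     * up to *pPropertyCount entries and signal truncation with VK_INCOMPLETE */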
+    if (pProperties == NULL) {
+        *pPropertyCount = INTEL_PHY_DEV_EXT_COUNT;
+        return VK_SUCCESS;
+    }
+
+    copy_size = *pPropertyCount < extension_count ? *pPropertyCount : extension_count;
+    memcpy(pProperties, intel_phy_dev_gpu_exts, copy_size * sizeof(VkExtensionProperties));
+    *pPropertyCount = copy_size;
+    if (copy_size < extension_count) {
+        return VK_INCOMPLETE;
+    }
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateDeviceLayerProperties(
+        VkPhysicalDevice                            physicalDevice,
+        uint32_t*                                   pPropertyCount,
+        VkLayerProperties*                          pProperties)
+{
+    *pPropertyCount = 0;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetPhysicalDeviceSparseImageFormatProperties(
+    VkPhysicalDevice                            physicalDevice,
+    VkFormat                                    format,
+    VkImageType                                 type,
+    VkSampleCountFlagBits                       samples,
+    VkImageUsageFlags                           usage,
+    VkImageTiling                               tiling,
+    uint32_t*                                   pPropertyCount,
+    VkSparseImageFormatProperties*              pProperties)
+{
+    *pPropertyCount = 0;
+}
+
+ICD_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vk_icdNegotiateLoaderICDInterfaceVersion(uint32_t *pVersion)
+{
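+    /* clamp to version 2, the highest loader/ICD interface this ICD supports */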
+    if (*pVersion > 2) {
+      *pVersion = 2;
+    }
+    return VK_SUCCESS;
+}
diff --git a/icd/intel/gpu.h b/icd/intel/gpu.h
new file mode 100644
index 0000000..4400bda
--- /dev/null
+++ b/icd/intel/gpu.h
@@ -0,0 +1,120 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ * Author: Tobin Ehlis <tobin@lunarg.com>
+ *
+ */
+
+#ifndef GPU_H
+#define GPU_H
+
+#include "intel.h"
+#include "extension_info.h"
+
+#define INTEL_GPU_ASSERT(gpu, min_gen, max_gen)   \
+       assert(intel_gpu_gen(gpu) >= INTEL_GEN(min_gen) && \
+              intel_gpu_gen(gpu) <= INTEL_GEN(max_gen))
+
+enum intel_gpu_engine_type {
+    /* TODO BLT support */
+    INTEL_GPU_ENGINE_3D,
+
+    INTEL_GPU_ENGINE_COUNT
+};
+
+struct intel_instance;
+struct intel_wsi_display;
+struct intel_winsys;
+
+/*
+ * intel_gpu is the only object that does not inherit from intel_base.
+ */
+struct intel_gpu {
+    struct intel_handle handle;
+
+    struct intel_gpu *next;
+
+    int devid;          /* PCI device ID */
+    char *primary_node; /* path to the primary node */
+    char *render_node;  /* path to the render node */
+    int gen_opaque;     /* always read this with intel_gpu_gen() */
+    int gt;
+
+    VkDeviceSize max_batch_buffer_size;
+    uint32_t batch_buffer_reloc_count;
+
+    /*
+     * The enabled hardware features can be limited by the kernel.  These
+     * mutable fds allow us to talk to the kernel before the device is
+     * created.
+     */
+    int primary_fd_internal;
+    int render_fd_internal;
+
+    struct intel_winsys *winsys;
+
+    struct intel_wsi_display **displays;
+    uint32_t display_count;
+};
+
+static inline struct intel_gpu *intel_gpu(VkPhysicalDevice gpu)
+{
+    return (struct intel_gpu *) gpu;
+}
+
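+/* specialized builds (INTEL_GEN_SPECIALIZED) let the compiler constant-fold
+ * all gen checks */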
+static inline int intel_gpu_gen(const struct intel_gpu *gpu)
+{
+#ifdef INTEL_GEN_SPECIALIZED
+    return INTEL_GEN(INTEL_GEN_SPECIALIZED);
+#else
+    return gpu->gen_opaque;
+#endif
+}
+
+VkResult intel_gpu_create(const struct intel_instance *instance, int devid,
+                            const char *primary_node, const char *render_node,
+                            struct intel_gpu **gpu_ret);
+void intel_gpu_destroy(struct intel_gpu *gpu);
+
+
+void intel_gpu_get_props(const struct intel_gpu *gpu,
+                         VkPhysicalDeviceProperties *props);
+
+void intel_gpu_get_sparse_properties(VkPhysicalDeviceSparseProperties *pProps);
+
+void intel_gpu_get_limits(VkPhysicalDeviceLimits *pLimits);
+
+void intel_gpu_get_queue_props(const struct intel_gpu *gpu,
+                               enum intel_gpu_engine_type engine,
+                               VkQueueFamilyProperties *props);
+void intel_gpu_get_memory_props(const struct intel_gpu *gpu,
+                                VkPhysicalDeviceMemoryProperties *props);
+
+int intel_gpu_get_max_threads(const struct intel_gpu *gpu,
+                              VkShaderStageFlagBits stage);
+
+int intel_gpu_get_primary_fd(struct intel_gpu *gpu);
+
+VkResult intel_gpu_init_winsys(struct intel_gpu *gpu);
+void intel_gpu_cleanup_winsys(struct intel_gpu *gpu);
+
+enum intel_phy_dev_ext_type intel_gpu_lookup_phy_dev_extension(const struct intel_gpu *gpu,
+        const char *extensionName);
+
+#endif /* GPU_H */
diff --git a/icd/intel/img.c b/icd/intel/img.c
new file mode 100644
index 0000000..1e8f232b
--- /dev/null
+++ b/icd/intel/img.c
@@ -0,0 +1,191 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ *
+ */
+
+#include "kmd/winsys.h"
+#include "dev.h"
+#include "gpu.h"
+#include "wsi.h"
+#include "img.h"
+
+/*
+ * From the Ivy Bridge PRM, volume 1 part 1, page 105:
+ *
+ *     "In addition to restrictions on maximum height, width, and depth,
+ *      surfaces are also restricted to a maximum size in bytes. This
+ *      maximum is 2 GB for all products and all surface types."
+ */
+static const size_t intel_max_resource_size = 1u << 31;
+
+static void img_destroy(struct intel_obj *obj)
+{
+    struct intel_img *img = intel_img_from_obj(obj);
+
+    intel_img_destroy(img);
+}
+
+static VkResult img_get_memory_requirements(struct intel_base *base, VkMemoryRequirements *pRequirements)
+{
+    struct intel_img *img = intel_img_from_base(base);
+
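+    /* images are bound to memory at page (4 KiB) granularity */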
+    pRequirements->size = img->total_size;
+    pRequirements->alignment = 4096;
+    pRequirements->memoryTypeBits = (1 << INTEL_MEMORY_TYPE_COUNT) - 1;
+
+    return VK_SUCCESS;
+}
+
+VkResult intel_img_create(struct intel_dev *dev,
+                          const VkImageCreateInfo *info,
+                          const VkAllocationCallbacks *allocator,
+                          bool scanout,
+                          struct intel_img **img_ret)
+{
+    struct intel_img *img;
+    struct intel_layout *layout;
+
+    img = (struct intel_img *) intel_base_create(&dev->base.handle,
+            sizeof(*img), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_EXT, info, 0);
+    if (!img)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    layout = &img->layout;
+
+    img->type = info->imageType;
+    img->depth = info->extent.depth;
+    img->mip_levels = info->mipLevels;
+    img->array_size = info->arrayLayers;
+    img->usage = info->usage;
+    img->sample_count = (uint32_t) info->samples;
+    intel_layout_init(layout, dev, info, scanout);
+
+    img->total_size = img->layout.bo_stride * img->layout.bo_height;
+
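+    /* append the aux surface (e.g. HiZ), page-aligned, to the same bo */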
+    if (layout->aux != INTEL_LAYOUT_AUX_NONE) {
+        img->aux_offset = u_align(img->total_size, 4096);
+        img->total_size = img->aux_offset +
+            layout->aux_stride * layout->aux_height;
+    }
+
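+    /* with separate stencil, carve a page-aligned S8 region out of the same
+     * bo and track it with its own layout */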
+    if (layout->separate_stencil) {
+        VkImageCreateInfo s8_info;
+
+        img->s8_layout = intel_alloc(img, sizeof(*img->s8_layout), sizeof(int),
+                VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+        if (!img->s8_layout) {
+            intel_img_destroy(img);
+            return VK_ERROR_OUT_OF_HOST_MEMORY;
+        }
+
+        s8_info = *info;
+        s8_info.format = VK_FORMAT_S8_UINT;
+        /* no stencil texturing */
+        s8_info.usage &= ~VK_IMAGE_USAGE_SAMPLED_BIT;
+        assert(icd_format_is_ds(info->format));
+
+        intel_layout_init(img->s8_layout, dev, &s8_info, scanout);
+
+        img->s8_offset = u_align(img->total_size, 4096);
+        img->total_size = img->s8_offset +
+            img->s8_layout->bo_stride * img->s8_layout->bo_height;
+    }
+
+    if (scanout) {
+        VkResult ret = intel_wsi_img_init(img);
+        if (ret != VK_SUCCESS) {
+            intel_img_destroy(img);
+            return ret;
+        }
+    }
+
+    img->obj.destroy = img_destroy;
+    img->obj.base.get_memory_requirements = img_get_memory_requirements;
+
+    *img_ret = img;
+
+    return VK_SUCCESS;
+}
+
+void intel_img_destroy(struct intel_img *img)
+{
+    if (img->wsi_data)
+        intel_wsi_img_cleanup(img);
+
+    if (img->s8_layout)
+        intel_free(img, img->s8_layout);
+
+    intel_base_destroy(&img->obj.base);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateImage(
+    VkDevice                                    device,
+    const VkImageCreateInfo*                    pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkImage*                                    pImage)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_img_create(dev, pCreateInfo, pAllocator, false,
+            (struct intel_img **) pImage);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyImage(
+    VkDevice                                    device,
+    VkImage                                     image,
+    const VkAllocationCallbacks*                pAllocator)
+{
+    struct intel_obj *obj = intel_obj(image);
+
+    obj->destroy(obj);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetImageSubresourceLayout(
+    VkDevice                                    device,
+    VkImage                                     image,
+    const VkImageSubresource*                   pSubresource,
+    VkSubresourceLayout*                        pLayout)
+{
+    const struct intel_img *img = intel_img(image);
+    unsigned x, y;
+
+    intel_layout_get_slice_pos(&img->layout, pSubresource->mipLevel,
+                               pSubresource->arrayLayer, &x, &y);
+    intel_layout_pos_to_mem(&img->layout, x, y, &x, &y);
+
+    pLayout->offset = intel_layout_mem_to_linear(&img->layout, x, y);
+    pLayout->size = intel_layout_get_slice_size(&img->layout,
+                                               pSubresource->mipLevel);
+    pLayout->rowPitch = img->layout.bo_stride;
+    pLayout->depthPitch = intel_layout_get_slice_stride(&img->layout,
+                                                       pSubresource->mipLevel);
+    pLayout->arrayPitch = intel_layout_get_slice_stride(&img->layout,
+                                                       pSubresource->arrayLayer);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetImageSparseMemoryRequirements(
+    VkDevice                                    device,
+    VkImage                                     image,
+    uint32_t*                                   pSparseMemoryRequirementCount,
+    VkSparseImageMemoryRequirements*            pSparseMemoryRequirements)
+{
+    *pSparseMemoryRequirementCount = 0;
+}
diff --git a/icd/intel/img.h b/icd/intel/img.h
new file mode 100644
index 0000000..301f7bc
--- /dev/null
+++ b/icd/intel/img.h
@@ -0,0 +1,81 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ *
+ */
+
+#ifndef IMG_H
+#define IMG_H
+
+#include "kmd/winsys.h"
+#include "intel.h"
+#include "layout.h"
+#include "obj.h"
+
+struct intel_img {
+    struct intel_obj obj;
+
+    VkImageType type;
+    int32_t depth;
+    uint32_t mip_levels;
+    uint32_t array_size;
+    VkFlags usage;
+    uint32_t sample_count;
+    struct intel_layout layout;
+
+    /* layout of separate stencil */
+    struct intel_layout *s8_layout;
+
+    size_t total_size;
+    size_t aux_offset;
+    size_t s8_offset;
+
+    void *wsi_data;
+};
+
+static inline struct intel_img *intel_img(VkImage image)
+{
+    return *(struct intel_img **) &image;
+}
+
+static inline struct intel_img *intel_img_from_base(struct intel_base *base)
+{
+    return (struct intel_img *) base;
+}
+
+static inline struct intel_img *intel_img_from_obj(struct intel_obj *obj)
+{
+    return intel_img_from_base(&obj->base);
+}
+
+VkResult intel_img_create(struct intel_dev *dev,
+                          const VkImageCreateInfo *info,
+                          const VkAllocationCallbacks *allocator,
+                          bool scanout,
+                          struct intel_img **img_ret);
+
+void intel_img_destroy(struct intel_img *img);
+
+static inline bool intel_img_can_enable_hiz(const struct intel_img *img,
+                                            uint32_t level)
+{
+    return (img->layout.aux == INTEL_LAYOUT_AUX_HIZ &&
+            img->layout.aux_enables & (1 << level));
+}
+
+#endif /* IMG_H */
diff --git a/icd/intel/instance.c b/icd/intel/instance.c
new file mode 100644
index 0000000..d924c0b
--- /dev/null
+++ b/icd/intel/instance.c
@@ -0,0 +1,360 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ *
+ */
+
+#include "icd-enumerate-drm.h"
+#include "gpu.h"
+#include "instance.h"
+
+static int intel_devid_override;
+int intel_debug = -1;
+
+void *intel_alloc(const void *handle,
+                                size_t size, size_t alignment,
+                                VkSystemAllocationScope scope)
+{
+    assert(intel_handle_validate(handle));
+    return icd_instance_alloc(((const struct intel_handle *) handle)->instance->icd,
+            size, alignment, scope);
+}
+
+void intel_free(const void *handle, void *ptr)
+{
+    assert(intel_handle_validate(handle));
+    icd_instance_free(((const struct intel_handle *) handle)->instance->icd, ptr);
+}
+
+void intel_logv(const void *handle,
+                VkFlags msg_flags,
+                VkDebugReportObjectTypeEXT obj_type, uint64_t src_object,
+                size_t location, int32_t msg_code,
+                const char *format, va_list ap)
+{
+    char msg[256];
+    int ret;
+
+    ret = vsnprintf(msg, sizeof(msg), format, ap);
+    if (ret < 0 || (size_t) ret >= sizeof(msg))
+        msg[sizeof(msg) - 1] = '\0';
+
+    assert(intel_handle_validate(handle));
+    icd_instance_log(((const struct intel_handle *) handle)->instance->icd,
+                     msg_flags,
+                     obj_type, src_object,              /* obj_type, object */
+                     location, msg_code,                /* location, msg_code */
+                     msg);
+}
+
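+/*
+ * Parse VK_INTEL_DEBUG, a comma-separated list of debug options, e.g.
+ * VK_INTEL_DEBUG=batch,nohw.  A hex token such as "0x0166" additionally
+ * overrides the PCI device id and implies nohw.
+ */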
+static void intel_debug_init(void)
+{
+    const char *env;
+
+    if (intel_debug >= 0)
+        return;
+
+    intel_debug = 0;
+
+    /* parse comma-separated debug options */
+    env = getenv("VK_INTEL_DEBUG");
+    while (env) {
+        const char *p = strchr(env, ',');
+        size_t len;
+
+        if (p)
+            len = p - env;
+        else
+            len = strlen(env);
+
+        if (len > 0) {
+            if (strncmp(env, "batch", len) == 0) {
+                intel_debug |= INTEL_DEBUG_BATCH;
+            } else if (strncmp(env, "nohw", len) == 0) {
+                intel_debug |= INTEL_DEBUG_NOHW;
+            } else if (strncmp(env, "nocache", len) == 0) {
+                intel_debug |= INTEL_DEBUG_NOCACHE;
+            } else if (strncmp(env, "nohiz", len) == 0) {
+                intel_debug |= INTEL_DEBUG_NOHIZ;
+            } else if (strncmp(env, "hang", len) == 0) {
+                intel_debug |= INTEL_DEBUG_HANG;
+            } else if (strncmp(env, "0x", 2) == 0) {
+                intel_debug |= INTEL_DEBUG_NOHW;
+                intel_devid_override = strtol(env, NULL, 16);
+            }
+        }
+
+        if (!p)
+            break;
+
+        env = p + 1;
+    }
+}
+
+static void intel_instance_add_gpu(struct intel_instance *instance,
+                                   struct intel_gpu *gpu)
+{
+    gpu->next = instance->gpus;
+    instance->gpus = gpu;
+}
+
+static void intel_instance_remove_gpus(struct intel_instance *instance)
+{
+    struct intel_gpu *gpu = instance->gpus;
+
+    while (gpu) {
+        struct intel_gpu *next = gpu->next;
+
+        intel_gpu_destroy(gpu);
+        gpu = next;
+    }
+
+    instance->gpus = NULL;
+}
+
+static void intel_instance_destroy(struct intel_instance *instance)
+{
+    struct icd_instance *icd = instance->icd;
+
+    intel_instance_remove_gpus(instance);
+    icd_instance_free(icd, instance);
+
+    icd_instance_destroy(icd);
+}
+
+static VkResult intel_instance_create(
+        const VkInstanceCreateInfo* info,
+        const VkAllocationCallbacks* allocator,
+        struct intel_instance **pInstance)
+{
+    struct intel_instance *instance;
+    struct icd_instance *icd;
+    uint32_t i;
+
+    intel_debug_init();
+
+    icd = icd_instance_create(info->pApplicationInfo, allocator);
+    if (!icd)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    instance = icd_instance_alloc(icd, sizeof(*instance), sizeof(int),
+            VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
+    if (!instance) {
+        icd_instance_destroy(icd);
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    }
+
+    memset(instance, 0, sizeof(*instance));
+    intel_handle_init(&instance->handle, VK_DEBUG_REPORT_OBJECT_TYPE_INSTANCE_EXT, instance);
+
+    instance->icd = icd;
+
+    for (i = 0; i < info->enabledExtensionCount; i++) {
+        const enum intel_global_ext_type ext =
+            intel_gpu_lookup_global_extension(
+                    info->ppEnabledExtensionNames[i]);
+
+        if (ext != INTEL_GLOBAL_EXT_INVALID) {
+            instance->global_exts[ext] = true;
+        } else {
+            /* Fail creation if extensions are specified that the ICD
+             * cannot satisfy.  The loader filters out extensions and
+             * layers not meant for the ICD.
+             */
+            intel_instance_destroy(instance);
+            return VK_ERROR_EXTENSION_NOT_PRESENT;
+        }
+    }
+
+    /*
+     * This ICD does not support any layers.
+     */
+    if (info->enabledLayerCount > 0) {
+        intel_instance_destroy(instance);
+        return VK_ERROR_LAYER_NOT_PRESENT;
+    }
+
+    *pInstance = instance;
+
+    return VK_SUCCESS;
+}
+
+enum intel_global_ext_type intel_gpu_lookup_global_extension(
+        const char *extensionName)
+{
+    enum intel_global_ext_type type;
+
+    for (type = 0; type < ARRAY_SIZE(intel_global_exts); type++) {
+        if (compare_vk_extension_properties(&intel_global_exts[type], extensionName))
+            break;
+    }
+
+    assert(type < INTEL_GLOBAL_EXT_COUNT || type == INTEL_GLOBAL_EXT_INVALID);
+
+    return type;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateInstance(
+    const VkInstanceCreateInfo*               pCreateInfo,
+    const VkAllocationCallbacks*              pAllocator,
+    VkInstance*                               pInstance)
+{
+    return intel_instance_create(pCreateInfo, pAllocator,
+            (struct intel_instance **) pInstance);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyInstance(
+    VkInstance                                pInstance,
+    const VkAllocationCallbacks*              pAllocator)
+{
+    struct intel_instance *instance = intel_instance(pInstance);
+
+    intel_instance_destroy(instance);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateInstanceExtensionProperties(
+        const char*                           pLayerName,
+        uint32_t*                             pPropertyCount,
+        VkExtensionProperties*                pProperties)
+{
+    uint32_t copy_size;
+
+    if (pProperties == NULL) {
+        *pPropertyCount = INTEL_GLOBAL_EXT_COUNT;
+        return VK_SUCCESS;
+    }
+
+    copy_size = *pPropertyCount < INTEL_GLOBAL_EXT_COUNT ? *pPropertyCount : INTEL_GLOBAL_EXT_COUNT;
+    memcpy(pProperties, intel_global_exts, copy_size * sizeof(VkExtensionProperties));
+    *pPropertyCount = copy_size;
+    if (copy_size < INTEL_GLOBAL_EXT_COUNT) {
+        return VK_INCOMPLETE;
+    }
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateInstanceLayerProperties(
+        uint32_t*                             pPropertyCount,
+        VkLayerProperties*                    pProperties)
+{
+    *pPropertyCount = 0;
+    return VK_SUCCESS;
+}
+
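+/* cache the first enumerated GPU so repeated enumerations hand back the same
+ * handle instead of re-probing DRM */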
+static VkPhysicalDevice physicalGPU = 0;
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEnumeratePhysicalDevices(
+    VkInstance                                instance_,
+    uint32_t*                                 pPhysicalDeviceCount,
+    VkPhysicalDevice*                         pPhysicalDevices)
+{
+    struct intel_instance *instance = intel_instance(instance_);
+    struct icd_drm_device *devices, *dev;
+    VkResult ret;
+    uint32_t count;
+
+    if (pPhysicalDevices == NULL) {
+        *pPhysicalDeviceCount = 1;
+        return VK_SUCCESS;
+    }
+
+    if (physicalGPU) {
+        *pPhysicalDeviceCount = 1;
+        pPhysicalDevices[0] = physicalGPU;
+        return VK_SUCCESS;
+    }
+
+    intel_instance_remove_gpus(instance);
+
+    devices = icd_drm_enumerate(instance->icd, 0x8086);
+
+    count = 0;
+    dev = devices;
+    while (dev) {
+        const char *primary_node, *render_node;
+        int devid;
+        struct intel_gpu *gpu;
+
+        primary_node = icd_drm_get_devnode(dev, ICD_DRM_MINOR_LEGACY);
+        if (!primary_node) {
+            dev = dev->next;
+            continue;
+        }
+
+        render_node = icd_drm_get_devnode(dev, ICD_DRM_MINOR_RENDER);
+
+        devid = (intel_devid_override) ? intel_devid_override : dev->devid;
+        ret = intel_gpu_create(instance, devid,
+                primary_node, render_node, &gpu);
+        if (ret == VK_SUCCESS) {
+            intel_instance_add_gpu(instance, gpu);
+
+            physicalGPU = (VkPhysicalDevice) gpu;
+            if (count < *pPhysicalDeviceCount)
+                pPhysicalDevices[count++] = (VkPhysicalDevice) gpu;
+            if (count >= *pPhysicalDeviceCount)
+                break;
+        }
+
+        dev = dev->next;
+    }
+
+    icd_drm_release(instance->icd, devices);
+
+    *pPhysicalDeviceCount = count;
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateDebugReportCallbackEXT(
+    VkInstance                                instance,
+    const VkDebugReportCallbackCreateInfoEXT  *pCreateInfo,
+    const VkAllocationCallbacks*              pAllocator,
+    VkDebugReportCallbackEXT*                 pCallback)
+{
+    struct intel_instance *inst = intel_instance(instance);
+
+    return icd_instance_create_logger(inst->icd, pCreateInfo, pAllocator, pCallback);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyDebugReportCallbackEXT(
+    VkInstance                                instance,
+    VkDebugReportCallbackEXT                  callback,
+    const VkAllocationCallbacks               *pAllocator)
+{
+    struct intel_instance *inst = intel_instance(instance);
+
+    icd_instance_destroy_logger(inst->icd, callback, pAllocator);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDebugReportMessageEXT(
+    VkInstance                                instance,
+    VkDebugReportFlagsEXT                     flags,
+    VkDebugReportObjectTypeEXT                objType,
+    uint64_t                                  object,
+    size_t                                    location,
+    int32_t                                   msgCode,
+    const char*                               pLayerPrefix,
+    const char*                               pMsg)
+{
+    // Intentionally does nothing.
+    // Loader will call registered callbacks after all
+    // ICDs have been notified.
+}
diff --git a/icd/intel/instance.h b/icd/intel/instance.h
new file mode 100644
index 0000000..ec4e605
--- /dev/null
+++ b/icd/intel/instance.h
@@ -0,0 +1,45 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#ifndef INSTANCE_H
+#define INSTANCE_H
+
+#include "intel.h"
+#include "extension_info.h"
+
+struct intel_gpu;
+
+struct intel_instance {
+    struct intel_handle handle;
+
+    struct icd_instance *icd;
+
+    struct intel_gpu *gpus;
+    bool global_exts[INTEL_GLOBAL_EXT_COUNT];
+};
+
+static inline struct intel_instance *intel_instance(VkInstance instance)
+{
+    return (struct intel_instance *) instance;
+}
+
+enum intel_global_ext_type intel_gpu_lookup_global_extension(const char *extensionName);
+#endif /* INSTANCE_H */
diff --git a/icd/intel/intel.h b/icd/intel/intel.h
new file mode 100644
index 0000000..03d2b4b
--- /dev/null
+++ b/icd/intel/intel.h
@@ -0,0 +1,149 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Jeremy Hayes <jeremy@lunarg.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ *
+ */
+
+#ifndef INTEL_H
+#define INTEL_H
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <string.h>
+#include <assert.h>
+
+#include <vulkan/vulkan.h>
+#include <vulkan/vk_icd.h>
+
+#include "icd.h"
+#include "icd-spv.h"
+#include "icd-format.h"
+#include "icd-instance.h"
+#include "icd-utils.h"
+
+#define INTEL_API_VERSION VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION)
+#define INTEL_DRIVER_VERSION 0
+
+#define INTEL_GEN(gen) ((int) ((gen) * 100))
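+/* e.g. INTEL_GEN(7.5) == 750, so gens compare as plain integers */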
+
+#define INTEL_MAX_VERTEX_BINDING_COUNT 33
+#define INTEL_MAX_VERTEX_ELEMENT_COUNT (INTEL_MAX_VERTEX_BINDING_COUNT + 1)
+#define INTEL_MAX_RENDER_TARGETS 8
+#define INTEL_MAX_VIEWPORTS 16
+
+#define INTEL_MEMORY_PROPERTY_ALL (VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |\
+    VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT)
+#define INTEL_MEMORY_HEAP_COUNT   1
+#define INTEL_MEMORY_HEAP_SIZE    (2u << 30)    /* 2 GiB */
+#define INTEL_MEMORY_TYPE_COUNT   1
+
+enum intel_debug_flags {
+    INTEL_DEBUG_BATCH       = 1 << 0,
+
+    INTEL_DEBUG_NOHW        = 1 << 20,
+    INTEL_DEBUG_NOCACHE     = 1 << 21,
+    INTEL_DEBUG_NOHIZ       = 1 << 22,
+    INTEL_DEBUG_HANG        = 1 << 23,
+};
+
+struct intel_instance;
+
+struct intel_handle {
+    /* the loader expects a "void *" at the beginning */
+    void *loader_data;
+
+    uint32_t magic;
+    const struct intel_instance *instance;
+};
+
+extern int intel_debug;
+
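+/* ASCII "INTL"; a handle's magic is this base plus its debug object type */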
+static const uint32_t intel_handle_magic = 0x494e544c;
+
+static inline void intel_handle_init(struct intel_handle *handle,
+                                     VkDebugReportObjectTypeEXT type,
+                                     const struct intel_instance *instance)
+{
+    set_loader_magic_value((void *)handle);
+
+    handle->magic = intel_handle_magic + type;
+    handle->instance = instance;
+}
+
+/**
+ * Return true if \p handle is a valid intel_handle.  This assumes the first
+ * sizeof(intel_handle) bytes are readable and do not happen to contain one
+ * of our magic values by accident.
+ */
+static inline bool intel_handle_validate(const void *handle)
+{
+//    const uint32_t handle_type =
+//        ((const struct intel_handle *) handle)->magic - intel_handle_magic;
+
+    /* TODO: this does not work for extensions, needs adjusting */
+//    return (handle_type <= VK_DEBUG_REPORT_OBJECT_TYPE_END_RANGE);
+    return true;
+}
+
+/**
+ * Return true if \p handle is a valid intel_handle of \p type.
+ *
+ * \see intel_handle_validate().
+ */
+static inline bool intel_handle_validate_type(const void *handle,
+                                              VkDebugReportObjectTypeEXT type)
+{
+    const uint32_t handle_type =
+        ((const struct intel_handle *) handle)->magic - intel_handle_magic;
+
+    return (handle_type == (uint32_t) type);
+}
+
+void *intel_alloc(const void *handle,
+                  size_t size, size_t alignment,
+                  VkSystemAllocationScope scope);
+
+void intel_free(const void *handle, void *ptr);
+
+void intel_logv(const void *handle,
+                VkFlags msg_flags,
+                VkDebugReportObjectTypeEXT obj_type, uint64_t src_object,
+                size_t location, int32_t msg_code,
+                const char *format, va_list ap);
+
+static inline void intel_log(const void *handle,
+                             VkFlags msg_flags,
+                             VkDebugReportObjectTypeEXT obj_type, uint64_t src_object,
+                             size_t location, int32_t msg_code,
+                             const char *format, ...)
+{
+    va_list ap;
+
+    va_start(ap, format);
+    intel_logv(handle, msg_flags, obj_type, src_object,
+               location, msg_code, format, ap);
+    va_end(ap);
+}
+
+#endif /* INTEL_H */
diff --git a/icd/intel/intel_icd.json b/icd/intel/intel_icd.json
new file mode 100644
index 0000000..d963b7a
--- /dev/null
+++ b/icd/intel/intel_icd.json
@@ -0,0 +1,7 @@
+{
+    "file_format_version": "1.0.0",
+    "ICD": {
+        "library_path": "./libVK_i965.so",
+        "api_version": "1.0.21"
+    }
+}
diff --git a/icd/intel/kmd/CMakeLists.txt b/icd/intel/kmd/CMakeLists.txt
new file mode 100644
index 0000000..baeb3e6
--- /dev/null
+++ b/icd/intel/kmd/CMakeLists.txt
@@ -0,0 +1,53 @@
+set(sources "")
+set(include_dirs "")
+set(definitions "")
+set(libraries icd)
+
+if(UNIX)
+    set(libdrm_sources
+        libdrm/xf86drm.c
+        libdrm/xf86drmHash.c
+        libdrm/xf86drmMode.c
+        libdrm/xf86drmRandom.c
+        libdrm/intel/intel_bufmgr.c
+        libdrm/intel/intel_bufmgr_gem.c
+        libdrm/intel/intel_decode.c)
+
+    list(APPEND sources
+        winsys_drm.c
+        ${libdrm_sources})
+
+    list(APPEND include_dirs
+        libdrm
+        libdrm/include/drm
+        libdrm/intel)
+
+    list(APPEND definitions
+        -D_GNU_SOURCE
+        -DHAVE_LIBDRM_ATOMIC_PRIMITIVES=1)
+
+    set_source_files_properties(${libdrm_sources} PROPERTIES COMPILE_FLAGS
+        "-std=gnu99 -Wno-type-limits -Wno-unused-variable")
+
+    find_package(PthreadStubs REQUIRED)
+    list(APPEND include_dirs ${PTHREADSTUBS_INCLUDE_DIRS})
+    list(APPEND libraries ${PTHREADSTUBS_LIBRARIES})
+
+    find_package(PCIAccess REQUIRED)
+    list(APPEND include_dirs ${PCIACCESS_INCLUDE_DIRS})
+    list(APPEND libraries ${PCIACCESS_LIBRARIES})
+
+    find_package(Valgrind)
+    if(VALGRIND_FOUND AND CMAKE_BUILD_TYPE STREQUAL "Debug")
+        list(APPEND definitions -DHAVE_VALGRIND=1)
+        list(APPEND include_dirs ${VALGRIND_INCLUDE_DIRS})
+        list(APPEND libraries ${VALGRIND_LIBRARIES})
+    endif()
+endif()
+
+add_library(intelkmd STATIC ${sources})
+target_include_directories(intelkmd PRIVATE ${include_dirs})
+target_include_directories(intelkmd PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
+target_compile_definitions(intelkmd PRIVATE ${definitions})
+target_link_libraries(intelkmd ${libraries})
+set_target_properties(intelkmd PROPERTIES POSITION_INDEPENDENT_CODE ON)
diff --git a/icd/intel/kmd/libdrm/include/drm/drm.h b/icd/intel/kmd/libdrm/include/drm/drm.h
new file mode 100644
index 0000000..229a29f
--- /dev/null
+++ b/icd/intel/kmd/libdrm/include/drm/drm.h
@@ -0,0 +1,856 @@
+/**
+ * \file drm.h
+ * Header for the Direct Rendering Manager
+ *
+ * \author Rickard E. (Rik) Faith <faith@valinux.com>
+ *
+ * \par Acknowledgments:
+ * Dec 1999, Richard Henderson <rth@twiddle.net>, move to generic \c cmpxchg.
+ */
+
+/*
+ * Copyright 1999 Precision Insight, Inc., Cedar Park, Texas.
+ * Copyright 2000 VA Linux Systems, Inc., Sunnyvale, California.
+ * All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _DRM_H_
+#define _DRM_H_
+
+#if defined(__linux__)
+
+#include <linux/types.h>
+#include <asm/ioctl.h>
+typedef unsigned int drm_handle_t;
+
+#else /* One of the BSDs */
+
+#include <sys/ioccom.h>
+#include <sys/types.h>
+typedef int8_t   __s8;
+typedef uint8_t  __u8;
+typedef int16_t  __s16;
+typedef uint16_t __u16;
+typedef int32_t  __s32;
+typedef uint32_t __u32;
+typedef int64_t  __s64;
+typedef uint64_t __u64;
+typedef unsigned long drm_handle_t;
+
+#endif
+
+#define DRM_NAME	"drm"	  /**< Name in kernel, /dev, and /proc */
+#define DRM_MIN_ORDER	5	  /**< At least 2^5 bytes = 32 bytes */
+#define DRM_MAX_ORDER	22	  /**< Up to 2^22 bytes = 4MB */
+#define DRM_RAM_PERCENT 10	  /**< How much system ram can we lock? */
+
+#define _DRM_LOCK_HELD	0x80000000U /**< Hardware lock is held */
+#define _DRM_LOCK_CONT	0x40000000U /**< Hardware lock is contended */
+#define _DRM_LOCK_IS_HELD(lock)	   ((lock) & _DRM_LOCK_HELD)
+#define _DRM_LOCK_IS_CONT(lock)	   ((lock) & _DRM_LOCK_CONT)
+#define _DRM_LOCKING_CONTEXT(lock) ((lock) & ~(_DRM_LOCK_HELD|_DRM_LOCK_CONT))
+
+typedef unsigned int drm_context_t;
+typedef unsigned int drm_drawable_t;
+typedef unsigned int drm_magic_t;
+
+/**
+ * Cliprect.
+ *
+ * \warning: If you change this structure, make sure you change
+ * XF86DRIClipRectRec in the server as well
+ *
+ * \note KW: Actually it's illegal to change either for
+ * backwards-compatibility reasons.
+ */
+struct drm_clip_rect {
+	unsigned short x1;
+	unsigned short y1;
+	unsigned short x2;
+	unsigned short y2;
+};
+
+/**
+ * Drawable information.
+ */
+struct drm_drawable_info {
+	unsigned int num_rects;
+	struct drm_clip_rect *rects;
+};
+
+/**
+ * Texture region.
+ */
+struct drm_tex_region {
+	unsigned char next;
+	unsigned char prev;
+	unsigned char in_use;
+	unsigned char padding;
+	unsigned int age;
+};
+
+/**
+ * Hardware lock.
+ *
+ * The lock structure is a simple cache-line aligned integer.  To avoid
+ * processor bus contention on a multiprocessor system, there should not be any
+ * other data stored in the same cache line.
+ */
+struct drm_hw_lock {
+	__volatile__ unsigned int lock;		/**< lock variable */
+	char padding[60];			/**< Pad to cache line */
+};
+
+/**
+ * DRM_IOCTL_VERSION ioctl argument type.
+ *
+ * \sa drmGetVersion().
+ */
+struct drm_version {
+	int version_major;	  /**< Major version */
+	int version_minor;	  /**< Minor version */
+	int version_patchlevel;	  /**< Patch level */
+	size_t name_len;	  /**< Length of name buffer */
+	char *name;	  /**< Name of driver */
+	size_t date_len;	  /**< Length of date buffer */
+	char *date;	  /**< User-space buffer to hold date */
+	size_t desc_len;	  /**< Length of desc buffer */
+	char *desc;	  /**< User-space buffer to hold desc */
+};
+
+/**
+ * DRM_IOCTL_GET_UNIQUE ioctl argument type.
+ *
+ * \sa drmGetBusid() and drmSetBusId().
+ */
+struct drm_unique {
+	size_t unique_len;	  /**< Length of unique */
+	char *unique;	  /**< Unique name for driver instantiation */
+};
+
+struct drm_list {
+	int count;		  /**< Length of user-space structures */
+	struct drm_version *version;
+};
+
+struct drm_block {
+	int unused;
+};
+
+/**
+ * DRM_IOCTL_CONTROL ioctl argument type.
+ *
+ * \sa drmCtlInstHandler() and drmCtlUninstHandler().
+ */
+struct drm_control {
+	enum {
+		DRM_ADD_COMMAND,
+		DRM_RM_COMMAND,
+		DRM_INST_HANDLER,
+		DRM_UNINST_HANDLER
+	} func;
+	int irq;
+};
+
+/**
+ * Type of memory to map.
+ */
+enum drm_map_type {
+	_DRM_FRAME_BUFFER = 0,	  /**< WC (no caching), no core dump */
+	_DRM_REGISTERS = 1,	  /**< no caching, no core dump */
+	_DRM_SHM = 2,		  /**< shared, cached */
+	_DRM_AGP = 3,		  /**< AGP/GART */
+	_DRM_SCATTER_GATHER = 4,  /**< Scatter/gather memory for PCI DMA */
+	_DRM_CONSISTENT = 5,	  /**< Consistent memory for PCI DMA */
+	_DRM_GEM = 6		  /**< GEM object */
+};
+
+/**
+ * Memory mapping flags.
+ */
+enum drm_map_flags {
+	_DRM_RESTRICTED = 0x01,	     /**< Cannot be mapped to user-virtual */
+	_DRM_READ_ONLY = 0x02,
+	_DRM_LOCKED = 0x04,	     /**< shared, cached, locked */
+	_DRM_KERNEL = 0x08,	     /**< kernel requires access */
+	_DRM_WRITE_COMBINING = 0x10, /**< use write-combining if available */
+	_DRM_CONTAINS_LOCK = 0x20,   /**< SHM page that contains lock */
+	_DRM_REMOVABLE = 0x40,	     /**< Removable mapping */
+	_DRM_DRIVER = 0x80	     /**< Managed by driver */
+};
+
+struct drm_ctx_priv_map {
+	unsigned int ctx_id;	 /**< Context requesting private mapping */
+	void *handle;		 /**< Handle of map */
+};
+
+/**
+ * DRM_IOCTL_GET_MAP, DRM_IOCTL_ADD_MAP and DRM_IOCTL_RM_MAP ioctls
+ * argument type.
+ *
+ * \sa drmAddMap().
+ */
+struct drm_map {
+	unsigned long offset;	 /**< Requested physical address (0 for SAREA)*/
+	unsigned long size;	 /**< Requested physical size (bytes) */
+	enum drm_map_type type;	 /**< Type of memory to map */
+	enum drm_map_flags flags;	 /**< Flags */
+	void *handle;		 /**< User-space: "Handle" to pass to mmap() */
+				 /**< Kernel-space: kernel-virtual address */
+	int mtrr;		 /**< MTRR slot used */
+	/*   Private data */
+};
+
+/**
+ * DRM_IOCTL_GET_CLIENT ioctl argument type.
+ */
+struct drm_client {
+	int idx;		/**< Which client desired? */
+	int auth;		/**< Is client authenticated? */
+	unsigned long pid;	/**< Process ID */
+	unsigned long uid;	/**< User ID */
+	unsigned long magic;	/**< Magic */
+	unsigned long iocs;	/**< Ioctl count */
+};
+
+enum drm_stat_type {
+	_DRM_STAT_LOCK,
+	_DRM_STAT_OPENS,
+	_DRM_STAT_CLOSES,
+	_DRM_STAT_IOCTLS,
+	_DRM_STAT_LOCKS,
+	_DRM_STAT_UNLOCKS,
+	_DRM_STAT_VALUE,	/**< Generic value */
+	_DRM_STAT_BYTE,		/**< Generic byte counter (1024bytes/K) */
+	_DRM_STAT_COUNT,	/**< Generic non-byte counter (1000/k) */
+
+	_DRM_STAT_IRQ,		/**< IRQ */
+	_DRM_STAT_PRIMARY,	/**< Primary DMA bytes */
+	_DRM_STAT_SECONDARY,	/**< Secondary DMA bytes */
+	_DRM_STAT_DMA,		/**< DMA */
+	_DRM_STAT_SPECIAL,	/**< Special DMA (e.g., priority or polled) */
+	_DRM_STAT_MISSED	/**< Missed DMA opportunity */
+	    /* Add to the *END* of the list */
+};
+
+/**
+ * DRM_IOCTL_GET_STATS ioctl argument type.
+ */
+struct drm_stats {
+	unsigned long count;
+	struct {
+		unsigned long value;
+		enum drm_stat_type type;
+	} data[15];
+};
+
+/**
+ * Hardware locking flags.
+ */
+enum drm_lock_flags {
+	_DRM_LOCK_READY = 0x01,	     /**< Wait until hardware is ready for DMA */
+	_DRM_LOCK_QUIESCENT = 0x02,  /**< Wait until hardware quiescent */
+	_DRM_LOCK_FLUSH = 0x04,	     /**< Flush this context's DMA queue first */
+	_DRM_LOCK_FLUSH_ALL = 0x08,  /**< Flush all DMA queues first */
+	/* These *HALT* flags aren't supported yet
+	   -- they will be used to support the
+	   full-screen DGA-like mode. */
+	_DRM_HALT_ALL_QUEUES = 0x10, /**< Halt all current and future queues */
+	_DRM_HALT_CUR_QUEUES = 0x20  /**< Halt all current queues */
+};
+
+/**
+ * DRM_IOCTL_LOCK, DRM_IOCTL_UNLOCK and DRM_IOCTL_FINISH ioctl argument type.
+ *
+ * \sa drmGetLock() and drmUnlock().
+ */
+struct drm_lock {
+	int context;
+	enum drm_lock_flags flags;
+};
+
+/**
+ * DMA flags
+ *
+ * \warning
+ * These values \e must match xf86drm.h.
+ *
+ * \sa drm_dma.
+ */
+enum drm_dma_flags {
+	/* Flags for DMA buffer dispatch */
+	_DRM_DMA_BLOCK = 0x01,	      /**<
+				       * Block until buffer dispatched.
+				       *
+				       * \note The buffer may not yet have
+				       * been processed by the hardware --
+				       * getting a hardware lock with the
+				       * hardware quiescent will ensure
+				       * that the buffer has been
+				       * processed.
+				       */
+	_DRM_DMA_WHILE_LOCKED = 0x02, /**< Dispatch while lock held */
+	_DRM_DMA_PRIORITY = 0x04,     /**< High priority dispatch */
+
+	/* Flags for DMA buffer request */
+	_DRM_DMA_WAIT = 0x10,	      /**< Wait for free buffers */
+	_DRM_DMA_SMALLER_OK = 0x20,   /**< Smaller-than-requested buffers OK */
+	_DRM_DMA_LARGER_OK = 0x40     /**< Larger-than-requested buffers OK */
+};
+
+/**
+ * DRM_IOCTL_ADD_BUFS and DRM_IOCTL_MARK_BUFS ioctl argument type.
+ *
+ * \sa drmAddBufs().
+ */
+struct drm_buf_desc {
+	int count;		 /**< Number of buffers of this size */
+	int size;		 /**< Size in bytes */
+	int low_mark;		 /**< Low water mark */
+	int high_mark;		 /**< High water mark */
+	enum {
+		_DRM_PAGE_ALIGN = 0x01,	/**< Align on page boundaries for DMA */
+		_DRM_AGP_BUFFER = 0x02,	/**< Buffer is in AGP space */
+		_DRM_SG_BUFFER = 0x04,	/**< Scatter/gather memory buffer */
+		_DRM_FB_BUFFER = 0x08,	/**< Buffer is in frame buffer */
+		_DRM_PCI_BUFFER_RO = 0x10 /**< Map PCI DMA buffer read-only */
+	} flags;
+	unsigned long agp_start; /**<
+				  * Start address of where the AGP buffers are
+				  * in the AGP aperture
+				  */
+};
+
+/**
+ * DRM_IOCTL_INFO_BUFS ioctl argument type.
+ */
+struct drm_buf_info {
+	int count;		/**< Entries in list */
+	struct drm_buf_desc *list;
+};
+
+/**
+ * DRM_IOCTL_FREE_BUFS ioctl argument type.
+ */
+struct drm_buf_free {
+	int count;
+	int *list;
+};
+
+/**
+ * Buffer information
+ *
+ * \sa drm_buf_map.
+ */
+struct drm_buf_pub {
+	int idx;		       /**< Index into the master buffer list */
+	int total;		       /**< Buffer size */
+	int used;		       /**< Amount of buffer in use (for DMA) */
+	void *address;	       /**< Address of buffer */
+};
+
+/**
+ * DRM_IOCTL_MAP_BUFS ioctl argument type.
+ */
+struct drm_buf_map {
+	int count;		/**< Length of the buffer list */
+#ifdef __cplusplus
+	void *virt;
+#else
+	void *virtual;		/**< Mmap'd area in user-virtual */
+#endif
+	struct drm_buf_pub *list;	/**< Buffer information */
+};
+
+/**
+ * DRM_IOCTL_DMA ioctl argument type.
+ *
+ * Indices here refer to the offset into the buffer list in drm_buf_get.
+ *
+ * \sa drmDMA().
+ */
+struct drm_dma {
+	int context;			  /**< Context handle */
+	int send_count;			  /**< Number of buffers to send */
+	int *send_indices;	  /**< List of handles to buffers */
+	int *send_sizes;		  /**< Lengths of data to send */
+	enum drm_dma_flags flags;	  /**< Flags */
+	int request_count;		  /**< Number of buffers requested */
+	int request_size;		  /**< Desired size for buffers */
+	int *request_indices;	  /**< Buffer information */
+	int *request_sizes;
+	int granted_count;		  /**< Number of buffers granted */
+};
+
+enum drm_ctx_flags {
+	_DRM_CONTEXT_PRESERVED = 0x01,
+	_DRM_CONTEXT_2DONLY = 0x02
+};
+
+/**
+ * DRM_IOCTL_ADD_CTX ioctl argument type.
+ *
+ * \sa drmCreateContext() and drmDestroyContext().
+ */
+struct drm_ctx {
+	drm_context_t handle;
+	enum drm_ctx_flags flags;
+};
+
+/**
+ * DRM_IOCTL_RES_CTX ioctl argument type.
+ */
+struct drm_ctx_res {
+	int count;
+	struct drm_ctx *contexts;
+};
+
+/**
+ * DRM_IOCTL_ADD_DRAW and DRM_IOCTL_RM_DRAW ioctl argument type.
+ */
+struct drm_draw {
+	drm_drawable_t handle;
+};
+
+/**
+ * DRM_IOCTL_UPDATE_DRAW ioctl argument type.
+ */
+typedef enum {
+	DRM_DRAWABLE_CLIPRECTS
+} drm_drawable_info_type_t;
+
+struct drm_update_draw {
+	drm_drawable_t handle;
+	unsigned int type;
+	unsigned int num;
+	unsigned long long data;
+};
+
+/**
+ * DRM_IOCTL_GET_MAGIC and DRM_IOCTL_AUTH_MAGIC ioctl argument type.
+ */
+struct drm_auth {
+	drm_magic_t magic;
+};
+
+/**
+ * DRM_IOCTL_IRQ_BUSID ioctl argument type.
+ *
+ * \sa drmGetInterruptFromBusID().
+ */
+struct drm_irq_busid {
+	int irq;	/**< IRQ number */
+	int busnum;	/**< bus number */
+	int devnum;	/**< device number */
+	int funcnum;	/**< function number */
+};
+
+enum drm_vblank_seq_type {
+	_DRM_VBLANK_ABSOLUTE = 0x0,	/**< Wait for specific vblank sequence number */
+	_DRM_VBLANK_RELATIVE = 0x1,	/**< Wait for given number of vblanks */
+	_DRM_VBLANK_EVENT = 0x4000000,   /**< Send event instead of blocking */
+	_DRM_VBLANK_FLIP = 0x8000000,   /**< Scheduled buffer swap should flip */
+	_DRM_VBLANK_NEXTONMISS = 0x10000000,	/**< If missed, wait for next vblank */
+	_DRM_VBLANK_SECONDARY = 0x20000000,	/**< Secondary display controller */
+	_DRM_VBLANK_SIGNAL = 0x40000000	/**< Send signal instead of blocking, unsupported */
+};
+
+#define _DRM_VBLANK_TYPES_MASK (_DRM_VBLANK_ABSOLUTE | _DRM_VBLANK_RELATIVE)
+#define _DRM_VBLANK_FLAGS_MASK (_DRM_VBLANK_EVENT | _DRM_VBLANK_SIGNAL | \
+				_DRM_VBLANK_SECONDARY | _DRM_VBLANK_NEXTONMISS)
+
+struct drm_wait_vblank_request {
+	enum drm_vblank_seq_type type;
+	unsigned int sequence;
+	unsigned long signal;
+};
+
+struct drm_wait_vblank_reply {
+	enum drm_vblank_seq_type type;
+	unsigned int sequence;
+	long tval_sec;
+	long tval_usec;
+};
+
+/**
+ * DRM_IOCTL_WAIT_VBLANK ioctl argument type.
+ *
+ * \sa drmWaitVBlank().
+ */
+union drm_wait_vblank {
+	struct drm_wait_vblank_request request;
+	struct drm_wait_vblank_reply reply;
+};
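
As a usage sketch (an illustration, not part of the header): the request member is filled in by the caller and the reply member is filled in by the kernel on return. This assumes an already-open DRM node; libdrm's drmWaitVBlank() wraps this same union.

```c
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include "drm.h"

/* Block until one more vblank has occurred on the first CRTC.
 * Sketch only: fd is assumed to be an open DRM node such as
 * /dev/dri/card0. Returns 0 on success, -1 on ioctl failure. */
static int wait_one_vblank(int fd)
{
	union drm_wait_vblank vbl;

	memset(&vbl, 0, sizeof(vbl));
	vbl.request.type = _DRM_VBLANK_RELATIVE; /* relative to current count */
	vbl.request.sequence = 1;                /* one vblank from now */
	if (ioctl(fd, DRM_IOCTL_WAIT_VBLANK, &vbl) != 0)
		return -1;
	/* On success the reply member of the union is valid. */
	printf("vblank %u at %ld.%06ld\n", vbl.reply.sequence,
	       vbl.reply.tval_sec, vbl.reply.tval_usec);
	return 0;
}
```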
+
+#define _DRM_PRE_MODESET 1
+#define _DRM_POST_MODESET 2
+
+/**
+ * DRM_IOCTL_MODESET_CTL ioctl argument type
+ *
+ * \sa drmModesetCtl().
+ */
+struct drm_modeset_ctl {
+	__u32 crtc;
+	__u32 cmd;
+};
+
+/**
+ * DRM_IOCTL_AGP_ENABLE ioctl argument type.
+ *
+ * \sa drmAgpEnable().
+ */
+struct drm_agp_mode {
+	unsigned long mode;	/**< AGP mode */
+};
+
+/**
+ * DRM_IOCTL_AGP_ALLOC and DRM_IOCTL_AGP_FREE ioctls argument type.
+ *
+ * \sa drmAgpAlloc() and drmAgpFree().
+ */
+struct drm_agp_buffer {
+	unsigned long size;	/**< In bytes -- will round to page boundary */
+	unsigned long handle;	/**< Used for binding / unbinding */
+	unsigned long type;	/**< Type of memory to allocate */
+	unsigned long physical;	/**< Physical used by i810 */
+};
+
+/**
+ * DRM_IOCTL_AGP_BIND and DRM_IOCTL_AGP_UNBIND ioctls argument type.
+ *
+ * \sa drmAgpBind() and drmAgpUnbind().
+ */
+struct drm_agp_binding {
+	unsigned long handle;	/**< From drm_agp_buffer */
+	unsigned long offset;	/**< In bytes -- will round to page boundary */
+};
+
+/**
+ * DRM_IOCTL_AGP_INFO ioctl argument type.
+ *
+ * \sa drmAgpVersionMajor(), drmAgpVersionMinor(), drmAgpGetMode(),
+ * drmAgpBase(), drmAgpSize(), drmAgpMemoryUsed(), drmAgpMemoryAvail(),
+ * drmAgpVendorId() and drmAgpDeviceId().
+ */
+struct drm_agp_info {
+	int agp_version_major;
+	int agp_version_minor;
+	unsigned long mode;
+	unsigned long aperture_base;	/* physical address */
+	unsigned long aperture_size;	/* bytes */
+	unsigned long memory_allowed;	/* bytes */
+	unsigned long memory_used;
+
+	/* PCI information */
+	unsigned short id_vendor;
+	unsigned short id_device;
+};
+
+/**
+ * DRM_IOCTL_SG_ALLOC ioctl argument type.
+ */
+struct drm_scatter_gather {
+	unsigned long size;	/**< In bytes -- will round to page boundary */
+	unsigned long handle;	/**< Used for mapping / unmapping */
+};
+
+/**
+ * DRM_IOCTL_SET_VERSION ioctl argument type.
+ */
+struct drm_set_version {
+	int drm_di_major;
+	int drm_di_minor;
+	int drm_dd_major;
+	int drm_dd_minor;
+};
+
+/** DRM_IOCTL_GEM_CLOSE ioctl argument type */
+struct drm_gem_close {
+	/** Handle of the object to be closed. */
+	__u32 handle;
+	__u32 pad;
+};
+
+/** DRM_IOCTL_GEM_FLINK ioctl argument type */
+struct drm_gem_flink {
+	/** Handle for the object being named */
+	__u32 handle;
+
+	/** Returned global name */
+	__u32 name;
+};
+
+/** DRM_IOCTL_GEM_OPEN ioctl argument type */
+struct drm_gem_open {
+	/** Name of object being opened */
+	__u32 name;
+
+	/** Returned handle for the object */
+	__u32 handle;
+
+	/** Returned size of the object */
+	__u64 size;
+};
+
+/** DRM_IOCTL_GET_CAP ioctl argument type */
+struct drm_get_cap {
+	__u64 capability;
+	__u64 value;
+};
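
To show the request/reply pattern this struct follows (a hedged sketch, not from the header): the caller sets capability and the kernel writes value back.

```c
#include <sys/ioctl.h>
#include "drm.h"

/* Query whether the driver supports dumb scanout buffers.
 * Sketch only; fd is an open DRM node. Returns 1 if supported. */
static int supports_dumb_buffers(int fd)
{
	struct drm_get_cap cap = { .capability = DRM_CAP_DUMB_BUFFER };

	if (ioctl(fd, DRM_IOCTL_GET_CAP, &cap) != 0)
		return 0;
	return cap.value != 0;
}
```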
+
+/**
+ * DRM_CLIENT_CAP_STEREO_3D
+ *
+ * If set to 1, the DRM core will expose the stereo 3D capabilities of the
+ * monitor by advertising the supported 3D layouts in the flags of struct
+ * drm_mode_modeinfo.
+ */
+#define DRM_CLIENT_CAP_STEREO_3D	1
+
+/**
+ * DRM_CLIENT_CAP_UNIVERSAL_PLANES
+ *
+ * If set to 1, the DRM core will expose the full universal plane list
+ * (including primary and cursor planes).
+ */
+#define DRM_CLIENT_CAP_UNIVERSAL_PLANES 2
+
+/** DRM_IOCTL_SET_CLIENT_CAP ioctl argument type */
+struct drm_set_client_cap {
+	__u64 capability;
+	__u64 value;
+};
+
+#define DRM_CLOEXEC O_CLOEXEC
+struct drm_prime_handle {
+	__u32 handle;
+
+	/** Flags; only applicable for the handle->fd direction */
+	__u32 flags;
+
+	/** Returned dmabuf file descriptor */
+	__s32 fd;
+};
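
A sketch of the PRIME export direction (GEM handle to dmabuf fd); the import direction uses DRM_IOCTL_PRIME_FD_TO_HANDLE with the same struct. Assumes handle names a live GEM object on this fd.

```c
#include <sys/ioctl.h>
#include "drm.h"

/* Export a GEM handle as a dmabuf file descriptor. Sketch only. */
static int gem_handle_to_dmabuf_fd(int fd, __u32 handle, int *out_fd)
{
	struct drm_prime_handle args = {0};

	args.handle = handle;
	args.flags = DRM_CLOEXEC;	/* close-on-exec on the new fd */
	if (ioctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args) != 0)
		return -1;
	*out_fd = args.fd;
	return 0;
}
```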
+
+#include "drm_mode.h"
+
+#define DRM_IOCTL_BASE			'd'
+#define DRM_IO(nr)			_IO(DRM_IOCTL_BASE,nr)
+#define DRM_IOR(nr,type)		_IOR(DRM_IOCTL_BASE,nr,type)
+#define DRM_IOW(nr,type)		_IOW(DRM_IOCTL_BASE,nr,type)
+#define DRM_IOWR(nr,type)		_IOWR(DRM_IOCTL_BASE,nr,type)
+
+#define DRM_IOCTL_VERSION		DRM_IOWR(0x00, struct drm_version)
+#define DRM_IOCTL_GET_UNIQUE		DRM_IOWR(0x01, struct drm_unique)
+#define DRM_IOCTL_GET_MAGIC		DRM_IOR( 0x02, struct drm_auth)
+#define DRM_IOCTL_IRQ_BUSID		DRM_IOWR(0x03, struct drm_irq_busid)
+#define DRM_IOCTL_GET_MAP               DRM_IOWR(0x04, struct drm_map)
+#define DRM_IOCTL_GET_CLIENT            DRM_IOWR(0x05, struct drm_client)
+#define DRM_IOCTL_GET_STATS             DRM_IOR( 0x06, struct drm_stats)
+#define DRM_IOCTL_SET_VERSION		DRM_IOWR(0x07, struct drm_set_version)
+#define DRM_IOCTL_MODESET_CTL           DRM_IOW(0x08, struct drm_modeset_ctl)
+#define DRM_IOCTL_GEM_CLOSE		DRM_IOW (0x09, struct drm_gem_close)
+#define DRM_IOCTL_GEM_FLINK		DRM_IOWR(0x0a, struct drm_gem_flink)
+#define DRM_IOCTL_GEM_OPEN		DRM_IOWR(0x0b, struct drm_gem_open)
+#define DRM_IOCTL_GET_CAP		DRM_IOWR(0x0c, struct drm_get_cap)
+#define DRM_IOCTL_SET_CLIENT_CAP	DRM_IOW( 0x0d, struct drm_set_client_cap)
+
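These wrappers expand to the kernel's generic _IO* encoders, packing the transfer direction, payload size, the 'd' base character, and the command number into a single request value. As an illustration (a sketch, not part of the header), the classic DRI authentication handshake uses the drm_auth ioctls, DRM_IOCTL_GET_MAGIC above and DRM_IOCTL_AUTH_MAGIC just below:

```c
#include <sys/ioctl.h>
#include "drm.h"

/* Client side: fetch a magic token to hand to the DRM master.
 * DRM_IOCTL_GET_MAGIC is _IOR('d', 0x02, struct drm_auth), so the
 * kernel copies the struct back out to userspace. Sketch only. */
static int get_auth_magic(int fd, drm_magic_t *magic)
{
	struct drm_auth auth = {0};

	if (ioctl(fd, DRM_IOCTL_GET_MAGIC, &auth) != 0)
		return -1;
	*magic = auth.magic;
	return 0;
}

/* Master side: authenticate a client's magic token. */
static int auth_magic(int master_fd, drm_magic_t magic)
{
	struct drm_auth auth = { .magic = magic };

	return ioctl(master_fd, DRM_IOCTL_AUTH_MAGIC, &auth);
}
```
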
+#define DRM_IOCTL_SET_UNIQUE		DRM_IOW( 0x10, struct drm_unique)
+#define DRM_IOCTL_AUTH_MAGIC		DRM_IOW( 0x11, struct drm_auth)
+#define DRM_IOCTL_BLOCK			DRM_IOWR(0x12, struct drm_block)
+#define DRM_IOCTL_UNBLOCK		DRM_IOWR(0x13, struct drm_block)
+#define DRM_IOCTL_CONTROL		DRM_IOW( 0x14, struct drm_control)
+#define DRM_IOCTL_ADD_MAP		DRM_IOWR(0x15, struct drm_map)
+#define DRM_IOCTL_ADD_BUFS		DRM_IOWR(0x16, struct drm_buf_desc)
+#define DRM_IOCTL_MARK_BUFS		DRM_IOW( 0x17, struct drm_buf_desc)
+#define DRM_IOCTL_INFO_BUFS		DRM_IOWR(0x18, struct drm_buf_info)
+#define DRM_IOCTL_MAP_BUFS		DRM_IOWR(0x19, struct drm_buf_map)
+#define DRM_IOCTL_FREE_BUFS		DRM_IOW( 0x1a, struct drm_buf_free)
+
+#define DRM_IOCTL_RM_MAP		DRM_IOW( 0x1b, struct drm_map)
+
+#define DRM_IOCTL_SET_SAREA_CTX		DRM_IOW( 0x1c, struct drm_ctx_priv_map)
+#define DRM_IOCTL_GET_SAREA_CTX 	DRM_IOWR(0x1d, struct drm_ctx_priv_map)
+
+#define DRM_IOCTL_SET_MASTER            DRM_IO(0x1e)
+#define DRM_IOCTL_DROP_MASTER           DRM_IO(0x1f)
+
+#define DRM_IOCTL_ADD_CTX		DRM_IOWR(0x20, struct drm_ctx)
+#define DRM_IOCTL_RM_CTX		DRM_IOWR(0x21, struct drm_ctx)
+#define DRM_IOCTL_MOD_CTX		DRM_IOW( 0x22, struct drm_ctx)
+#define DRM_IOCTL_GET_CTX		DRM_IOWR(0x23, struct drm_ctx)
+#define DRM_IOCTL_SWITCH_CTX		DRM_IOW( 0x24, struct drm_ctx)
+#define DRM_IOCTL_NEW_CTX		DRM_IOW( 0x25, struct drm_ctx)
+#define DRM_IOCTL_RES_CTX		DRM_IOWR(0x26, struct drm_ctx_res)
+#define DRM_IOCTL_ADD_DRAW		DRM_IOWR(0x27, struct drm_draw)
+#define DRM_IOCTL_RM_DRAW		DRM_IOWR(0x28, struct drm_draw)
+#define DRM_IOCTL_DMA			DRM_IOWR(0x29, struct drm_dma)
+#define DRM_IOCTL_LOCK			DRM_IOW( 0x2a, struct drm_lock)
+#define DRM_IOCTL_UNLOCK		DRM_IOW( 0x2b, struct drm_lock)
+#define DRM_IOCTL_FINISH		DRM_IOW( 0x2c, struct drm_lock)
+
+#define DRM_IOCTL_PRIME_HANDLE_TO_FD    DRM_IOWR(0x2d, struct drm_prime_handle)
+#define DRM_IOCTL_PRIME_FD_TO_HANDLE    DRM_IOWR(0x2e, struct drm_prime_handle)
+
+#define DRM_IOCTL_AGP_ACQUIRE		DRM_IO(  0x30)
+#define DRM_IOCTL_AGP_RELEASE		DRM_IO(  0x31)
+#define DRM_IOCTL_AGP_ENABLE		DRM_IOW( 0x32, struct drm_agp_mode)
+#define DRM_IOCTL_AGP_INFO		DRM_IOR( 0x33, struct drm_agp_info)
+#define DRM_IOCTL_AGP_ALLOC		DRM_IOWR(0x34, struct drm_agp_buffer)
+#define DRM_IOCTL_AGP_FREE		DRM_IOW( 0x35, struct drm_agp_buffer)
+#define DRM_IOCTL_AGP_BIND		DRM_IOW( 0x36, struct drm_agp_binding)
+#define DRM_IOCTL_AGP_UNBIND		DRM_IOW( 0x37, struct drm_agp_binding)
+
+#define DRM_IOCTL_SG_ALLOC		DRM_IOWR(0x38, struct drm_scatter_gather)
+#define DRM_IOCTL_SG_FREE		DRM_IOW( 0x39, struct drm_scatter_gather)
+
+#define DRM_IOCTL_WAIT_VBLANK		DRM_IOWR(0x3a, union drm_wait_vblank)
+
+#define DRM_IOCTL_UPDATE_DRAW		DRM_IOW(0x3f, struct drm_update_draw)
+
+#define DRM_IOCTL_MODE_GETRESOURCES	DRM_IOWR(0xA0, struct drm_mode_card_res)
+#define DRM_IOCTL_MODE_GETCRTC		DRM_IOWR(0xA1, struct drm_mode_crtc)
+#define DRM_IOCTL_MODE_SETCRTC		DRM_IOWR(0xA2, struct drm_mode_crtc)
+#define DRM_IOCTL_MODE_CURSOR		DRM_IOWR(0xA3, struct drm_mode_cursor)
+#define DRM_IOCTL_MODE_GETGAMMA		DRM_IOWR(0xA4, struct drm_mode_crtc_lut)
+#define DRM_IOCTL_MODE_SETGAMMA		DRM_IOWR(0xA5, struct drm_mode_crtc_lut)
+#define DRM_IOCTL_MODE_GETENCODER	DRM_IOWR(0xA6, struct drm_mode_get_encoder)
+#define DRM_IOCTL_MODE_GETCONNECTOR	DRM_IOWR(0xA7, struct drm_mode_get_connector)
+#define DRM_IOCTL_MODE_ATTACHMODE	DRM_IOWR(0xA8, struct drm_mode_mode_cmd)
+#define DRM_IOCTL_MODE_DETACHMODE	DRM_IOWR(0xA9, struct drm_mode_mode_cmd)
+
+#define DRM_IOCTL_MODE_GETPROPERTY	DRM_IOWR(0xAA, struct drm_mode_get_property)
+#define DRM_IOCTL_MODE_SETPROPERTY	DRM_IOWR(0xAB, struct drm_mode_connector_set_property)
+#define DRM_IOCTL_MODE_GETPROPBLOB	DRM_IOWR(0xAC, struct drm_mode_get_blob)
+#define DRM_IOCTL_MODE_GETFB		DRM_IOWR(0xAD, struct drm_mode_fb_cmd)
+#define DRM_IOCTL_MODE_ADDFB		DRM_IOWR(0xAE, struct drm_mode_fb_cmd)
+#define DRM_IOCTL_MODE_RMFB		DRM_IOWR(0xAF, unsigned int)
+#define DRM_IOCTL_MODE_PAGE_FLIP	DRM_IOWR(0xB0, struct drm_mode_crtc_page_flip)
+#define DRM_IOCTL_MODE_DIRTYFB		DRM_IOWR(0xB1, struct drm_mode_fb_dirty_cmd)
+
+#define DRM_IOCTL_MODE_CREATE_DUMB DRM_IOWR(0xB2, struct drm_mode_create_dumb)
+#define DRM_IOCTL_MODE_MAP_DUMB    DRM_IOWR(0xB3, struct drm_mode_map_dumb)
+#define DRM_IOCTL_MODE_DESTROY_DUMB    DRM_IOWR(0xB4, struct drm_mode_destroy_dumb)
+#define DRM_IOCTL_MODE_GETPLANERESOURCES DRM_IOWR(0xB5, struct drm_mode_get_plane_res)
+#define DRM_IOCTL_MODE_GETPLANE	DRM_IOWR(0xB6, struct drm_mode_get_plane)
+#define DRM_IOCTL_MODE_SETPLANE	DRM_IOWR(0xB7, struct drm_mode_set_plane)
+#define DRM_IOCTL_MODE_ADDFB2		DRM_IOWR(0xB8, struct drm_mode_fb_cmd2)
+#define DRM_IOCTL_MODE_OBJ_GETPROPERTIES	DRM_IOWR(0xB9, struct drm_mode_obj_get_properties)
+#define DRM_IOCTL_MODE_OBJ_SETPROPERTY	DRM_IOWR(0xBA, struct drm_mode_obj_set_property)
+#define DRM_IOCTL_MODE_CURSOR2		DRM_IOWR(0xBB, struct drm_mode_cursor2)
+
+/**
+ * Device-specific ioctls should only be in their respective headers.
+ * The device-specific ioctl range is from 0x40 to 0x99.
+ * Generic IOCTLs restart at 0xA0.
+ *
+ * \sa drmCommandNone(), drmCommandRead(), drmCommandWrite(), and
+ * drmCommandReadWrite().
+ */
+#define DRM_COMMAND_BASE                0x40
+#define DRM_COMMAND_END			0xA0
+
+/**
+ * Header for events written back to userspace on the drm fd.  The
+ * type defines the type of event, the length specifies the total
+ * length of the event (including the header), and user_data is
+ * typically a 64 bit value passed with the ioctl that triggered the
+ * event.  A read on the drm fd will always only return complete
+ * events, that is, if for example the read buffer is 100 bytes, and
+ * there are two 64 byte events pending, only one will be returned.
+ *
+ * Event types 0 - 0x7fffffff are generic drm events, 0x80000000 and
+ * up are chipset specific.
+ */
+struct drm_event {
+	__u32 type;
+	__u32 length;
+};
+
+#define DRM_EVENT_VBLANK 0x01
+#define DRM_EVENT_FLIP_COMPLETE 0x02
+
+struct drm_event_vblank {
+	struct drm_event base;
+	__u64 user_data;
+	__u32 tv_sec;
+	__u32 tv_usec;
+	__u32 sequence;
+	__u32 reserved;
+};
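
Because reads on the fd return only whole events, the stream can be walked header by header. A minimal sketch, assuming the fd is known to be readable:

```c
#include <string.h>
#include <unistd.h>
#include "drm.h"

/* Parse one read()'s worth of events from the DRM fd. Sketch only. */
static void drain_drm_events(int fd)
{
	char buf[1024];
	ssize_t len = read(fd, buf, sizeof(buf));
	ssize_t off = 0;

	while (len > 0 && off + (ssize_t)sizeof(struct drm_event) <= len) {
		struct drm_event ev;

		memcpy(&ev, buf + off, sizeof(ev));
		if (ev.length < sizeof(ev))
			break;	/* malformed; bail out */
		if (ev.type == DRM_EVENT_VBLANK ||
		    ev.type == DRM_EVENT_FLIP_COMPLETE) {
			struct drm_event_vblank vbl;

			memcpy(&vbl, buf + off, sizeof(vbl));
			/* vbl.user_data carries the value passed with the
			 * ioctl that armed this event. */
		}
		off += ev.length;	/* length includes the header */
	}
}
```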
+
+#define DRM_CAP_DUMB_BUFFER 0x1
+#define DRM_CAP_VBLANK_HIGH_CRTC   0x2
+#define DRM_CAP_DUMB_PREFERRED_DEPTH 0x3
+#define DRM_CAP_DUMB_PREFER_SHADOW 0x4
+#define DRM_CAP_PRIME 0x5
+#define DRM_CAP_TIMESTAMP_MONOTONIC 0x6
+#define DRM_CAP_ASYNC_PAGE_FLIP 0x7
+
+#define DRM_PRIME_CAP_IMPORT 0x1
+#define DRM_PRIME_CAP_EXPORT 0x2
+
+/* typedef area */
+typedef struct drm_clip_rect drm_clip_rect_t;
+typedef struct drm_drawable_info drm_drawable_info_t;
+typedef struct drm_tex_region drm_tex_region_t;
+typedef struct drm_hw_lock drm_hw_lock_t;
+typedef struct drm_version drm_version_t;
+typedef struct drm_unique drm_unique_t;
+typedef struct drm_list drm_list_t;
+typedef struct drm_block drm_block_t;
+typedef struct drm_control drm_control_t;
+typedef enum drm_map_type drm_map_type_t;
+typedef enum drm_map_flags drm_map_flags_t;
+typedef struct drm_ctx_priv_map drm_ctx_priv_map_t;
+typedef struct drm_map drm_map_t;
+typedef struct drm_client drm_client_t;
+typedef enum drm_stat_type drm_stat_type_t;
+typedef struct drm_stats drm_stats_t;
+typedef enum drm_lock_flags drm_lock_flags_t;
+typedef struct drm_lock drm_lock_t;
+typedef enum drm_dma_flags drm_dma_flags_t;
+typedef struct drm_buf_desc drm_buf_desc_t;
+typedef struct drm_buf_info drm_buf_info_t;
+typedef struct drm_buf_free drm_buf_free_t;
+typedef struct drm_buf_pub drm_buf_pub_t;
+typedef struct drm_buf_map drm_buf_map_t;
+typedef struct drm_dma drm_dma_t;
+typedef union drm_wait_vblank drm_wait_vblank_t;
+typedef struct drm_agp_mode drm_agp_mode_t;
+typedef enum drm_ctx_flags drm_ctx_flags_t;
+typedef struct drm_ctx drm_ctx_t;
+typedef struct drm_ctx_res drm_ctx_res_t;
+typedef struct drm_draw drm_draw_t;
+typedef struct drm_update_draw drm_update_draw_t;
+typedef struct drm_auth drm_auth_t;
+typedef struct drm_irq_busid drm_irq_busid_t;
+typedef enum drm_vblank_seq_type drm_vblank_seq_type_t;
+
+typedef struct drm_agp_buffer drm_agp_buffer_t;
+typedef struct drm_agp_binding drm_agp_binding_t;
+typedef struct drm_agp_info drm_agp_info_t;
+typedef struct drm_scatter_gather drm_scatter_gather_t;
+typedef struct drm_set_version drm_set_version_t;
+
+#endif
diff --git a/icd/intel/kmd/libdrm/include/drm/drm_mode.h b/icd/intel/kmd/libdrm/include/drm/drm_mode.h
new file mode 100644
index 0000000..e7c394b
--- /dev/null
+++ b/icd/intel/kmd/libdrm/include/drm/drm_mode.h
@@ -0,0 +1,510 @@
+/*
+ * Copyright (c) 2007 Dave Airlie <airlied@linux.ie>
+ * Copyright (c) 2007 Jakob Bornecrantz <wallbraker@gmail.com>
+ * Copyright (c) 2008 Red Hat Inc.
+ * Copyright (c) 2007-2008 Tungsten Graphics, Inc., Cedar Park, TX., USA
+ * Copyright (c) 2007-2008 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef _DRM_MODE_H
+#define _DRM_MODE_H
+
+#define DRM_DISPLAY_INFO_LEN	32
+#define DRM_CONNECTOR_NAME_LEN	32
+#define DRM_DISPLAY_MODE_LEN	32
+#define DRM_PROP_NAME_LEN	32
+
+#define DRM_MODE_TYPE_BUILTIN	(1<<0)
+#define DRM_MODE_TYPE_CLOCK_C	((1<<1) | DRM_MODE_TYPE_BUILTIN)
+#define DRM_MODE_TYPE_CRTC_C	((1<<2) | DRM_MODE_TYPE_BUILTIN)
+#define DRM_MODE_TYPE_PREFERRED	(1<<3)
+#define DRM_MODE_TYPE_DEFAULT	(1<<4)
+#define DRM_MODE_TYPE_USERDEF	(1<<5)
+#define DRM_MODE_TYPE_DRIVER	(1<<6)
+
+/* Video mode flags */
+/* bit compatible with the xorg definitions. */
+#define DRM_MODE_FLAG_PHSYNC			(1<<0)
+#define DRM_MODE_FLAG_NHSYNC			(1<<1)
+#define DRM_MODE_FLAG_PVSYNC			(1<<2)
+#define DRM_MODE_FLAG_NVSYNC			(1<<3)
+#define DRM_MODE_FLAG_INTERLACE			(1<<4)
+#define DRM_MODE_FLAG_DBLSCAN			(1<<5)
+#define DRM_MODE_FLAG_CSYNC			(1<<6)
+#define DRM_MODE_FLAG_PCSYNC			(1<<7)
+#define DRM_MODE_FLAG_NCSYNC			(1<<8)
+#define DRM_MODE_FLAG_HSKEW			(1<<9) /* hskew provided */
+#define DRM_MODE_FLAG_BCAST			(1<<10)
+#define DRM_MODE_FLAG_PIXMUX			(1<<11)
+#define DRM_MODE_FLAG_DBLCLK			(1<<12)
+#define DRM_MODE_FLAG_CLKDIV2			(1<<13)
+#define DRM_MODE_FLAG_3D_MASK			(0x1f<<14)
+#define  DRM_MODE_FLAG_3D_NONE			(0<<14)
+#define  DRM_MODE_FLAG_3D_FRAME_PACKING		(1<<14)
+#define  DRM_MODE_FLAG_3D_FIELD_ALTERNATIVE	(2<<14)
+#define  DRM_MODE_FLAG_3D_LINE_ALTERNATIVE	(3<<14)
+#define  DRM_MODE_FLAG_3D_SIDE_BY_SIDE_FULL	(4<<14)
+#define  DRM_MODE_FLAG_3D_L_DEPTH		(5<<14)
+#define  DRM_MODE_FLAG_3D_L_DEPTH_GFX_GFX_DEPTH	(6<<14)
+#define  DRM_MODE_FLAG_3D_TOP_AND_BOTTOM	(7<<14)
+#define  DRM_MODE_FLAG_3D_SIDE_BY_SIDE_HALF	(8<<14)
+
+
+/* DPMS flags */
+/* bit compatible with the xorg definitions. */
+#define DRM_MODE_DPMS_ON	0
+#define DRM_MODE_DPMS_STANDBY	1
+#define DRM_MODE_DPMS_SUSPEND	2
+#define DRM_MODE_DPMS_OFF	3
+
+/* Scaling mode options */
+#define DRM_MODE_SCALE_NONE		0 /* Unmodified timing (display or
+					     software can still scale) */
+#define DRM_MODE_SCALE_FULLSCREEN	1 /* Full screen, ignore aspect */
+#define DRM_MODE_SCALE_CENTER		2 /* Centered, no scaling */
+#define DRM_MODE_SCALE_ASPECT		3 /* Full screen, preserve aspect */
+
+/* Dithering mode options */
+#define DRM_MODE_DITHERING_OFF	0
+#define DRM_MODE_DITHERING_ON	1
+#define DRM_MODE_DITHERING_AUTO 2
+
+/* Dirty info options */
+#define DRM_MODE_DIRTY_OFF      0
+#define DRM_MODE_DIRTY_ON       1
+#define DRM_MODE_DIRTY_ANNOTATE 2
+
+struct drm_mode_modeinfo {
+	__u32 clock;
+	__u16 hdisplay, hsync_start, hsync_end, htotal, hskew;
+	__u16 vdisplay, vsync_start, vsync_end, vtotal, vscan;
+
+	__u32 vrefresh;
+
+	__u32 flags;
+	__u32 type;
+	char name[DRM_DISPLAY_MODE_LEN];
+};
+
+struct drm_mode_card_res {
+	__u64 fb_id_ptr;
+	__u64 crtc_id_ptr;
+	__u64 connector_id_ptr;
+	__u64 encoder_id_ptr;
+	__u32 count_fbs;
+	__u32 count_crtcs;
+	__u32 count_connectors;
+	__u32 count_encoders;
+	__u32 min_width, max_width;
+	__u32 min_height, max_height;
+};
+
+struct drm_mode_crtc {
+	__u64 set_connectors_ptr;
+	__u32 count_connectors;
+
+	__u32 crtc_id; /**< Id */
+	__u32 fb_id; /**< Id of framebuffer */
+
+	__u32 x, y; /**< Position on the framebuffer */
+
+	__u32 gamma_size;
+	__u32 mode_valid;
+	struct drm_mode_modeinfo mode;
+};
+
+#define DRM_MODE_PRESENT_TOP_FIELD     (1<<0)
+#define DRM_MODE_PRESENT_BOTTOM_FIELD  (1<<1)
+
+/* Planes blend with or override other bits on the CRTC */
+struct drm_mode_set_plane {
+	__u32 plane_id;
+	__u32 crtc_id;
+	__u32 fb_id; /* fb object contains surface format type */
+	__u32 flags;
+
+	/* Signed dest location allows it to be partially off screen */
+	__s32 crtc_x, crtc_y;
+	__u32 crtc_w, crtc_h;
+
+	/* Source values are 16.16 fixed point */
+	__u32 src_x, src_y;
+	__u32 src_h, src_w;
+};
+
+struct drm_mode_get_plane {
+	__u32 plane_id;
+
+	__u32 crtc_id;
+	__u32 fb_id;
+
+	__u32 possible_crtcs;
+	__u32 gamma_size;
+
+	__u32 count_format_types;
+	__u64 format_type_ptr;
+};
+
+struct drm_mode_get_plane_res {
+	__u64 plane_id_ptr;
+	__u32 count_planes;
+};
+
+#define DRM_MODE_ENCODER_NONE	0
+#define DRM_MODE_ENCODER_DAC	1
+#define DRM_MODE_ENCODER_TMDS	2
+#define DRM_MODE_ENCODER_LVDS	3
+#define DRM_MODE_ENCODER_TVDAC	4
+#define DRM_MODE_ENCODER_VIRTUAL 5
+#define DRM_MODE_ENCODER_DSI	6
+#define DRM_MODE_ENCODER_DPMST	7
+
+struct drm_mode_get_encoder {
+	__u32 encoder_id;
+	__u32 encoder_type;
+
+	__u32 crtc_id; /**< Id of crtc */
+
+	__u32 possible_crtcs;
+	__u32 possible_clones;
+};
+
+/* This is for connectors with multiple signal types. */
+/* Try to match DRM_MODE_CONNECTOR_X as closely as possible. */
+#define DRM_MODE_SUBCONNECTOR_Automatic	0
+#define DRM_MODE_SUBCONNECTOR_Unknown	0
+#define DRM_MODE_SUBCONNECTOR_DVID	3
+#define DRM_MODE_SUBCONNECTOR_DVIA	4
+#define DRM_MODE_SUBCONNECTOR_Composite	5
+#define DRM_MODE_SUBCONNECTOR_SVIDEO	6
+#define DRM_MODE_SUBCONNECTOR_Component	8
+#define DRM_MODE_SUBCONNECTOR_SCART	9
+
+#define DRM_MODE_CONNECTOR_Unknown	0
+#define DRM_MODE_CONNECTOR_VGA		1
+#define DRM_MODE_CONNECTOR_DVII		2
+#define DRM_MODE_CONNECTOR_DVID		3
+#define DRM_MODE_CONNECTOR_DVIA		4
+#define DRM_MODE_CONNECTOR_Composite	5
+#define DRM_MODE_CONNECTOR_SVIDEO	6
+#define DRM_MODE_CONNECTOR_LVDS		7
+#define DRM_MODE_CONNECTOR_Component	8
+#define DRM_MODE_CONNECTOR_9PinDIN	9
+#define DRM_MODE_CONNECTOR_DisplayPort	10
+#define DRM_MODE_CONNECTOR_HDMIA	11
+#define DRM_MODE_CONNECTOR_HDMIB	12
+#define DRM_MODE_CONNECTOR_TV		13
+#define DRM_MODE_CONNECTOR_eDP		14
+#define DRM_MODE_CONNECTOR_VIRTUAL      15
+#define DRM_MODE_CONNECTOR_DSI		16
+
+struct drm_mode_get_connector {
+
+	__u64 encoders_ptr;
+	__u64 modes_ptr;
+	__u64 props_ptr;
+	__u64 prop_values_ptr;
+
+	__u32 count_modes;
+	__u32 count_props;
+	__u32 count_encoders;
+
+	__u32 encoder_id; /**< Current Encoder */
+	__u32 connector_id; /**< Id */
+	__u32 connector_type;
+	__u32 connector_type_id;
+
+	__u32 connection;
+	__u32 mm_width, mm_height; /**< Width x height in millimeters */
+	__u32 subpixel;
+};
+
+#define DRM_MODE_PROP_PENDING	(1<<0)
+#define DRM_MODE_PROP_RANGE	(1<<1)
+#define DRM_MODE_PROP_IMMUTABLE	(1<<2)
+#define DRM_MODE_PROP_ENUM	(1<<3) /* enumerated type with text strings */
+#define DRM_MODE_PROP_BLOB	(1<<4)
+#define DRM_MODE_PROP_BITMASK	(1<<5) /* bitmask of enumerated types */
+
+/* non-extended types: legacy bitmask, one bit per type: */
+#define DRM_MODE_PROP_LEGACY_TYPE  ( \
+		DRM_MODE_PROP_RANGE | \
+		DRM_MODE_PROP_ENUM | \
+		DRM_MODE_PROP_BLOB | \
+		DRM_MODE_PROP_BITMASK)
+
+/* extended-types: rather than continue to consume a bit per type,
+ * grab a chunk of the bits to use as integer type id.
+ */
+#define DRM_MODE_PROP_EXTENDED_TYPE	0x0000ffc0
+#define DRM_MODE_PROP_TYPE(n)		((n) << 6)
+#define DRM_MODE_PROP_OBJECT		DRM_MODE_PROP_TYPE(1)
+#define DRM_MODE_PROP_SIGNED_RANGE	DRM_MODE_PROP_TYPE(2)
+
+struct drm_mode_property_enum {
+	__u64 value;
+	char name[DRM_PROP_NAME_LEN];
+};
+
+struct drm_mode_get_property {
+	__u64 values_ptr; /* values and blob lengths */
+	__u64 enum_blob_ptr; /* enum and blob id ptrs */
+
+	__u32 prop_id;
+	__u32 flags;
+	char name[DRM_PROP_NAME_LEN];
+
+	__u32 count_values;
+	__u32 count_enum_blobs;
+};
+
+struct drm_mode_connector_set_property {
+	__u64 value;
+	__u32 prop_id;
+	__u32 connector_id;
+};
+
+#define DRM_MODE_OBJECT_CRTC 0xcccccccc
+#define DRM_MODE_OBJECT_CONNECTOR 0xc0c0c0c0
+#define DRM_MODE_OBJECT_ENCODER 0xe0e0e0e0
+#define DRM_MODE_OBJECT_MODE 0xdededede
+#define DRM_MODE_OBJECT_PROPERTY 0xb0b0b0b0
+#define DRM_MODE_OBJECT_FB 0xfbfbfbfb
+#define DRM_MODE_OBJECT_BLOB 0xbbbbbbbb
+#define DRM_MODE_OBJECT_PLANE 0xeeeeeeee
+
+struct drm_mode_obj_get_properties {
+	__u64 props_ptr;
+	__u64 prop_values_ptr;
+	__u32 count_props;
+	__u32 obj_id;
+	__u32 obj_type;
+};
+
+struct drm_mode_obj_set_property {
+	__u64 value;
+	__u32 prop_id;
+	__u32 obj_id;
+	__u32 obj_type;
+};
+
+struct drm_mode_get_blob {
+	__u32 blob_id;
+	__u32 length;
+	__u64 data;
+};
+
+struct drm_mode_fb_cmd {
+	__u32 fb_id;
+	__u32 width, height;
+	__u32 pitch;
+	__u32 bpp;
+	__u32 depth;
+	/* driver specific handle */
+	__u32 handle;
+};
+
+#define DRM_MODE_FB_INTERLACED (1<<0) /* for interlaced framebuffers */
+
+struct drm_mode_fb_cmd2 {
+	__u32 fb_id;
+	__u32 width, height;
+	__u32 pixel_format; /* fourcc code from drm_fourcc.h */
+	__u32 flags;
+
+	/*
+	 * In case of planar formats, this ioctl allows up to 4
+	 * buffer objects with offsets and pitches per plane.
+	 * The pitch and offset order is dictated by the fourcc,
+	 * e.g. NV12 (http://fourcc.org/yuv.php#NV12) is described as:
+	 *
+	 *   YUV 4:2:0 image with a plane of 8 bit Y samples
+	 *   followed by an interleaved U/V plane containing
+	 *   8 bit 2x2 subsampled colour difference samples.
+	 *
+	 * So it would consist of Y as offset[0] and UV as
+	 * offset[1].  Note that offset[0] will generally
+	 * be 0.
+	 */
+	__u32 handles[4];
+	__u32 pitches[4]; /* pitch for each plane */
+	__u32 offsets[4]; /* offset of each plane */
+};
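
To make the planar layout concrete, a hedged sketch of filling this struct for an NV12 buffer stored in a single GEM object, with the UV plane placed immediately after the Y plane. The fourcc value is spelled out inline here; normally it would come from drm_fourcc.h as DRM_FORMAT_NV12. bo_handle, width, height, and pitch are assumed to describe a valid, sufficiently large buffer.

```c
#include <sys/ioctl.h>
#include "drm.h"

/* Add an NV12 framebuffer backed by one GEM object. Sketch only. */
static int add_nv12_fb(int fd, __u32 bo_handle,
		       __u32 width, __u32 height, __u32 pitch,
		       __u32 *fb_id)
{
	struct drm_mode_fb_cmd2 f = {0};

	f.width = width;
	f.height = height;
	/* fourcc 'NV12'; matches DRM_FORMAT_NV12 from drm_fourcc.h */
	f.pixel_format = (__u32)'N' | (__u32)'V' << 8 |
			 (__u32)'1' << 16 | (__u32)'2' << 24;
	f.handles[0] = bo_handle;	/* Y plane */
	f.pitches[0] = pitch;
	f.offsets[0] = 0;
	f.handles[1] = bo_handle;	/* interleaved UV plane */
	f.pitches[1] = pitch;		/* same pitch; 2x2 subsampled */
	f.offsets[1] = pitch * height;	/* UV follows Y */
	if (ioctl(fd, DRM_IOCTL_MODE_ADDFB2, &f) != 0)
		return -1;
	*fb_id = f.fb_id;
	return 0;
}
```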
+
+#define DRM_MODE_FB_DIRTY_ANNOTATE_COPY 0x01
+#define DRM_MODE_FB_DIRTY_ANNOTATE_FILL 0x02
+#define DRM_MODE_FB_DIRTY_FLAGS         0x03
+
+/*
+ * Mark a region of a framebuffer as dirty.
+ *
+ * Some hardware does not automatically update display contents
+ * when hardware or software draws to a framebuffer. This ioctl
+ * allows userspace to tell the kernel and the hardware which
+ * regions of the framebuffer have changed.
+ *
+ * The kernel or hardware is free to update more than just the
+ * region specified by the clip rects. The kernel or hardware
+ * may also delay and/or coalesce several dirty calls into a
+ * single update.
+ *
+ * Userspace may annotate the updates; the annotations are a
+ * promise made by the caller that the change is either a copy
+ * of pixels or a fill of a single color in the region specified.
+ *
+ * If the DRM_MODE_FB_DIRTY_ANNOTATE_COPY flag is given, the
+ * number of updated regions is half of the num_clips given,
+ * with the clip rects paired as src and dst. The width and
+ * height of each pair must match.
+ *
+ * If the DRM_MODE_FB_DIRTY_ANNOTATE_FILL flag is given, the caller
+ * promises that the region specified by the clip rects is filled
+ * completely with a single color as given in the color argument.
+ */
+
+struct drm_mode_fb_dirty_cmd {
+	__u32 fb_id;
+	__u32 flags;
+	__u32 color;
+	__u32 num_clips;
+	__u64 clips_ptr;
+};
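
A sketch of the unannotated case: report one changed rectangle and let the driver propagate it to the display. struct drm_clip_rect comes from the core drm.h header; fb_id is assumed to name a live framebuffer.

```c
#include <sys/ioctl.h>
#include "drm.h"

/* Flush one dirty rectangle of a framebuffer to the display.
 * Sketch only; no copy/fill annotation flags are used. */
static int flush_dirty_rect(int fd, __u32 fb_id,
			    __u16 x1, __u16 y1, __u16 x2, __u16 y2)
{
	struct drm_clip_rect clip = { x1, y1, x2, y2 };
	struct drm_mode_fb_dirty_cmd dirty = {0};

	dirty.fb_id = fb_id;
	dirty.num_clips = 1;
	dirty.clips_ptr = (__u64)(unsigned long)&clip;
	return ioctl(fd, DRM_IOCTL_MODE_DIRTYFB, &dirty);
}
```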
+
+struct drm_mode_mode_cmd {
+	__u32 connector_id;
+	struct drm_mode_modeinfo mode;
+};
+
+#define DRM_MODE_CURSOR_BO	(1<<0)
+#define DRM_MODE_CURSOR_MOVE	(1<<1)
+
+/*
+ * Depending on the value in flags, different members are used.
+ *
+ * CURSOR_BO uses
+ *    crtc
+ *    width
+ *    height
+ *    handle - if 0, turns the cursor off
+ *
+ * CURSOR_MOVE uses
+ *    crtc
+ *    x
+ *    y
+ */
+struct drm_mode_cursor {
+	__u32 flags;
+	__u32 crtc_id;
+	__s32 x;
+	__s32 y;
+	__u32 width;
+	__u32 height;
+	/* driver specific handle */
+	__u32 handle;
+};
+
+struct drm_mode_cursor2 {
+	__u32 flags;
+	__u32 crtc_id;
+	__s32 x;
+	__s32 y;
+	__u32 width;
+	__u32 height;
+	/* driver specific handle */
+	__u32 handle;
+	__s32 hot_x;
+	__s32 hot_y;
+};
+
+struct drm_mode_crtc_lut {
+	__u32 crtc_id;
+	__u32 gamma_size;
+
+	/* pointers to arrays */
+	__u64 red;
+	__u64 green;
+	__u64 blue;
+};
+
+#define DRM_MODE_PAGE_FLIP_EVENT 0x01
+#define DRM_MODE_PAGE_FLIP_ASYNC 0x02
+#define DRM_MODE_PAGE_FLIP_FLAGS (DRM_MODE_PAGE_FLIP_EVENT|DRM_MODE_PAGE_FLIP_ASYNC)
+
+/*
+ * Request a page flip on the specified crtc.
+ *
+ * This ioctl will ask KMS to schedule a page flip for the specified
+ * crtc.  Once any pending rendering targeting the specified fb (as of
+ * ioctl time) has completed, the crtc will be reprogrammed to display
+ * that fb after the next vertical refresh.  The ioctl returns
+ * immediately, but subsequent rendering to the current fb will block
+ * in the execbuffer ioctl until the page flip happens.  If a page
+ * flip is already pending as the ioctl is called, EBUSY will be
+ * returned.
+ *
+ * The ioctl supports one flag, DRM_MODE_PAGE_FLIP_EVENT, which will
+ * request that drm sends back a vblank event (see drm.h: struct
+ * drm_event_vblank) when the page flip is done.  The user_data field
+ * passed in with this ioctl will be returned as the user_data field
+ * in the vblank event struct.
+ *
+ * The reserved field must be zero until we figure out something
+ * clever to use it for.
+ */
+
+struct drm_mode_crtc_page_flip {
+	__u32 crtc_id;
+	__u32 fb_id;
+	__u32 flags;
+	__u32 reserved;
+	__u64 user_data;
+};
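
A sketch combining the flag and the struct: schedule a flip to fb_id and ask for a completion event; the user_data cookie comes back in the drm_event_vblank delivered on the fd (see the event structs in drm.h). crtc_id and fb_id are assumed valid.

```c
#include <sys/ioctl.h>
#include "drm.h"

/* Queue a page flip and request a completion event. Sketch only;
 * EBUSY means a flip is already pending on this crtc. */
static int queue_page_flip(int fd, __u32 crtc_id, __u32 fb_id,
			   __u64 cookie)
{
	struct drm_mode_crtc_page_flip flip = {0};

	flip.crtc_id = crtc_id;
	flip.fb_id = fb_id;
	flip.flags = DRM_MODE_PAGE_FLIP_EVENT;
	flip.user_data = cookie;	/* echoed back in the vblank event */
	return ioctl(fd, DRM_IOCTL_MODE_PAGE_FLIP, &flip);
}
```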
+
+/* create a dumb scanout buffer */
+struct drm_mode_create_dumb {
+        __u32 height;
+        __u32 width;
+        __u32 bpp;
+        __u32 flags;
+        /* handle, pitch, size will be returned */
+        __u32 handle;
+        __u32 pitch;
+        __u64 size;
+};
+
+/* set up for mmap of a dumb scanout buffer */
+struct drm_mode_map_dumb {
+        /** Handle for the object being mapped. */
+        __u32 handle;
+        __u32 pad;
+        /**
+         * Fake offset to use for subsequent mmap call
+         *
+         * This is a fixed-size type for 32/64 compatibility.
+         */
+        __u64 offset;
+};
+
+struct drm_mode_destroy_dumb {
+	__u32 handle;
+};
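
Taken together, the three dumb-buffer structs give the usual CPU-visible scanout path: create the buffer, fetch its fake mmap offset, then mmap through the DRM fd. A minimal sketch, assuming a driver that supports dumb buffers (see DRM_CAP_DUMB_BUFFER); error unwinding of the created handle is omitted.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include "drm.h"

/* Create and map a width x height 32 bpp dumb buffer. Sketch only.
 * On success *map points at *size bytes of CPU-writable memory. */
static int create_dumb_mapped(int fd, __u32 width, __u32 height,
			      __u32 *handle, __u32 *pitch,
			      __u64 *size, void **map)
{
	struct drm_mode_create_dumb create = {0};
	struct drm_mode_map_dumb mreq = {0};

	create.width = width;
	create.height = height;
	create.bpp = 32;
	if (ioctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &create) != 0)
		return -1;
	mreq.handle = create.handle;
	if (ioctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &mreq) != 0)
		return -1;
	*map = mmap(NULL, create.size, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, mreq.offset);
	if (*map == MAP_FAILED)
		return -1;
	*handle = create.handle;
	*pitch = create.pitch;
	*size = create.size;
	return 0;
}
```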
+
+#endif
diff --git a/icd/intel/kmd/libdrm/include/drm/i915_drm.h b/icd/intel/kmd/libdrm/include/drm/i915_drm.h
new file mode 100644
index 0000000..a0bda34
--- /dev/null
+++ b/icd/intel/kmd/libdrm/include/drm/i915_drm.h
@@ -0,0 +1,1107 @@
+/*
+ * Copyright 2003 Tungsten Graphics, Inc., Cedar Park, Texas.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL TUNGSTEN GRAPHICS AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _I915_DRM_H_
+#define _I915_DRM_H_
+
+#include <drm.h>
+
+/* Please note that modifications to all structs defined here are
+ * subject to backwards-compatibility constraints.
+ */
+
+/**
+ * DOC: uevents generated by i915 on its device node
+ *
+ * I915_L3_PARITY_UEVENT - Generated when the driver receives a parity mismatch
+ *	event from the gpu l3 cache. Additional information supplied is ROW,
+ *	BANK, SUBBANK, SLICE of the affected cacheline. Userspace should keep
+ *	track of these events, and if a specific cache-line seems to have a
+ *	persistent error, remap it with the l3 remapping tool supplied in
+ *	intel-gpu-tools.  The value supplied with the event is always 1.
+ *
+ * I915_ERROR_UEVENT - Generated upon error detection, currently only via
+ *	hangcheck. The error detection event is a good indicator of when things
+ *	began to go badly. The value supplied with the event is a 1 upon error
+ *	detection, and a 0 upon reset completion, signifying no more error
+ *	exists. NOTE: Disabling hangcheck or reset via module parameter will
+ *	cause the related events to not be seen.
+ *
+ * I915_RESET_UEVENT - Event is generated just before an attempt to reset
+ *	the GPU. The value supplied with the event is always 1. NOTE: Disabling
+ *	reset via module parameter will cause this event to not be seen.
+ */
+#define I915_L3_PARITY_UEVENT		"L3_PARITY_ERROR"
+#define I915_ERROR_UEVENT		"ERROR"
+#define I915_RESET_UEVENT		"RESET"
+
+/* Each region is a minimum of 16k, and there are at most 255 of them.
+ */
+#define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
+				 * of chars for next/prev indices */
+#define I915_LOG_MIN_TEX_REGION_SIZE 14
+
+typedef struct _drm_i915_init {
+	enum {
+		I915_INIT_DMA = 0x01,
+		I915_CLEANUP_DMA = 0x02,
+		I915_RESUME_DMA = 0x03
+	} func;
+	unsigned int mmio_offset;
+	int sarea_priv_offset;
+	unsigned int ring_start;
+	unsigned int ring_end;
+	unsigned int ring_size;
+	unsigned int front_offset;
+	unsigned int back_offset;
+	unsigned int depth_offset;
+	unsigned int w;
+	unsigned int h;
+	unsigned int pitch;
+	unsigned int pitch_bits;
+	unsigned int back_pitch;
+	unsigned int depth_pitch;
+	unsigned int cpp;
+	unsigned int chipset;
+} drm_i915_init_t;
+
+typedef struct _drm_i915_sarea {
+	struct drm_tex_region texList[I915_NR_TEX_REGIONS + 1];
+	int last_upload;	/* last time texture was uploaded */
+	int last_enqueue;	/* last time a buffer was enqueued */
+	int last_dispatch;	/* age of the most recently dispatched buffer */
+	int ctxOwner;		/* last context to upload state */
+	int texAge;
+	int pf_enabled;		/* is pageflipping allowed? */
+	int pf_active;
+	int pf_current_page;	/* which buffer is being displayed? */
+	int perf_boxes;		/* performance boxes to be displayed */
+	int width, height;      /* screen size in pixels */
+
+	drm_handle_t front_handle;
+	int front_offset;
+	int front_size;
+
+	drm_handle_t back_handle;
+	int back_offset;
+	int back_size;
+
+	drm_handle_t depth_handle;
+	int depth_offset;
+	int depth_size;
+
+	drm_handle_t tex_handle;
+	int tex_offset;
+	int tex_size;
+	int log_tex_granularity;
+	int pitch;
+	int rotation;           /* 0, 90, 180 or 270 */
+	int rotated_offset;
+	int rotated_size;
+	int rotated_pitch;
+	int virtualX, virtualY;
+
+	unsigned int front_tiled;
+	unsigned int back_tiled;
+	unsigned int depth_tiled;
+	unsigned int rotated_tiled;
+	unsigned int rotated2_tiled;
+
+	int pipeA_x;
+	int pipeA_y;
+	int pipeA_w;
+	int pipeA_h;
+	int pipeB_x;
+	int pipeB_y;
+	int pipeB_w;
+	int pipeB_h;
+
+	/* fill out some space for old userspace triple buffer */
+	drm_handle_t unused_handle;
+	__u32 unused1, unused2, unused3;
+
+	/* buffer object handles for static buffers. May change
+	 * over the lifetime of the client.
+	 */
+	__u32 front_bo_handle;
+	__u32 back_bo_handle;
+	__u32 unused_bo_handle;
+	__u32 depth_bo_handle;
+
+} drm_i915_sarea_t;
+
+/* due to userspace building against these headers we need some compat here */
+#define planeA_x pipeA_x
+#define planeA_y pipeA_y
+#define planeA_w pipeA_w
+#define planeA_h pipeA_h
+#define planeB_x pipeB_x
+#define planeB_y pipeB_y
+#define planeB_w pipeB_w
+#define planeB_h pipeB_h
+
+/* Flags for perf_boxes
+ */
+#define I915_BOX_RING_EMPTY    0x1
+#define I915_BOX_FLIP          0x2
+#define I915_BOX_WAIT          0x4
+#define I915_BOX_TEXTURE_LOAD  0x8
+#define I915_BOX_LOST_CONTEXT  0x10
+
+/* I915 specific ioctls
+ * The device specific ioctl range is 0x40 to 0x79.
+ */
+#define DRM_I915_INIT		0x00
+#define DRM_I915_FLUSH		0x01
+#define DRM_I915_FLIP		0x02
+#define DRM_I915_BATCHBUFFER	0x03
+#define DRM_I915_IRQ_EMIT	0x04
+#define DRM_I915_IRQ_WAIT	0x05
+#define DRM_I915_GETPARAM	0x06
+#define DRM_I915_SETPARAM	0x07
+#define DRM_I915_ALLOC		0x08
+#define DRM_I915_FREE		0x09
+#define DRM_I915_INIT_HEAP	0x0a
+#define DRM_I915_CMDBUFFER	0x0b
+#define DRM_I915_DESTROY_HEAP	0x0c
+#define DRM_I915_SET_VBLANK_PIPE	0x0d
+#define DRM_I915_GET_VBLANK_PIPE	0x0e
+#define DRM_I915_VBLANK_SWAP	0x0f
+#define DRM_I915_HWS_ADDR	0x11
+#define DRM_I915_GEM_INIT	0x13
+#define DRM_I915_GEM_EXECBUFFER	0x14
+#define DRM_I915_GEM_PIN	0x15
+#define DRM_I915_GEM_UNPIN	0x16
+#define DRM_I915_GEM_BUSY	0x17
+#define DRM_I915_GEM_THROTTLE	0x18
+#define DRM_I915_GEM_ENTERVT	0x19
+#define DRM_I915_GEM_LEAVEVT	0x1a
+#define DRM_I915_GEM_CREATE	0x1b
+#define DRM_I915_GEM_PREAD	0x1c
+#define DRM_I915_GEM_PWRITE	0x1d
+#define DRM_I915_GEM_MMAP	0x1e
+#define DRM_I915_GEM_SET_DOMAIN	0x1f
+#define DRM_I915_GEM_SW_FINISH	0x20
+#define DRM_I915_GEM_SET_TILING	0x21
+#define DRM_I915_GEM_GET_TILING	0x22
+#define DRM_I915_GEM_GET_APERTURE 0x23
+#define DRM_I915_GEM_MMAP_GTT	0x24
+#define DRM_I915_GET_PIPE_FROM_CRTC_ID	0x25
+#define DRM_I915_GEM_MADVISE	0x26
+#define DRM_I915_OVERLAY_PUT_IMAGE	0x27
+#define DRM_I915_OVERLAY_ATTRS	0x28
+#define DRM_I915_GEM_EXECBUFFER2	0x29
+#define DRM_I915_GET_SPRITE_COLORKEY	0x2a
+#define DRM_I915_SET_SPRITE_COLORKEY	0x2b
+#define DRM_I915_GEM_WAIT	0x2c
+#define DRM_I915_GEM_CONTEXT_CREATE	0x2d
+#define DRM_I915_GEM_CONTEXT_DESTROY	0x2e
+#define DRM_I915_GEM_SET_CACHING	0x2f
+#define DRM_I915_GEM_GET_CACHING	0x30
+#define DRM_I915_REG_READ		0x31
+#define DRM_I915_GET_RESET_STATS	0x32
+#define DRM_I915_GEM_USERPTR		0x33
+#define DRM_I915_GEM_CONTEXT_GETPARAM	0x34
+#define DRM_I915_GEM_CONTEXT_SETPARAM	0x35
+
+#define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
+#define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
+#define DRM_IOCTL_I915_FLIP		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLIP)
+#define DRM_IOCTL_I915_BATCHBUFFER	DRM_IOW( DRM_COMMAND_BASE + DRM_I915_BATCHBUFFER, drm_i915_batchbuffer_t)
+#define DRM_IOCTL_I915_IRQ_EMIT         DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_IRQ_EMIT, drm_i915_irq_emit_t)
+#define DRM_IOCTL_I915_IRQ_WAIT         DRM_IOW( DRM_COMMAND_BASE + DRM_I915_IRQ_WAIT, drm_i915_irq_wait_t)
+#define DRM_IOCTL_I915_GETPARAM         DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GETPARAM, drm_i915_getparam_t)
+#define DRM_IOCTL_I915_SETPARAM         DRM_IOW( DRM_COMMAND_BASE + DRM_I915_SETPARAM, drm_i915_setparam_t)
+#define DRM_IOCTL_I915_ALLOC            DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_ALLOC, drm_i915_mem_alloc_t)
+#define DRM_IOCTL_I915_FREE             DRM_IOW( DRM_COMMAND_BASE + DRM_I915_FREE, drm_i915_mem_free_t)
+#define DRM_IOCTL_I915_INIT_HEAP        DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT_HEAP, drm_i915_mem_init_heap_t)
+#define DRM_IOCTL_I915_CMDBUFFER	DRM_IOW( DRM_COMMAND_BASE + DRM_I915_CMDBUFFER, drm_i915_cmdbuffer_t)
+#define DRM_IOCTL_I915_DESTROY_HEAP	DRM_IOW( DRM_COMMAND_BASE + DRM_I915_DESTROY_HEAP, drm_i915_mem_destroy_heap_t)
+#define DRM_IOCTL_I915_SET_VBLANK_PIPE	DRM_IOW( DRM_COMMAND_BASE + DRM_I915_SET_VBLANK_PIPE, drm_i915_vblank_pipe_t)
+#define DRM_IOCTL_I915_GET_VBLANK_PIPE	DRM_IOR( DRM_COMMAND_BASE + DRM_I915_GET_VBLANK_PIPE, drm_i915_vblank_pipe_t)
+#define DRM_IOCTL_I915_VBLANK_SWAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_VBLANK_SWAP, drm_i915_vblank_swap_t)
+#define DRM_IOCTL_I915_HWS_ADDR		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_HWS_ADDR, struct drm_i915_gem_init)
+#define DRM_IOCTL_I915_GEM_INIT		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_INIT, struct drm_i915_gem_init)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER, struct drm_i915_gem_execbuffer)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER2	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER2, struct drm_i915_gem_execbuffer2)
+#define DRM_IOCTL_I915_GEM_PIN		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_PIN, struct drm_i915_gem_pin)
+#define DRM_IOCTL_I915_GEM_UNPIN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_UNPIN, struct drm_i915_gem_unpin)
+#define DRM_IOCTL_I915_GEM_BUSY		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_BUSY, struct drm_i915_gem_busy)
+#define DRM_IOCTL_I915_GEM_SET_CACHING		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_SET_CACHING, struct drm_i915_gem_caching)
+#define DRM_IOCTL_I915_GEM_GET_CACHING		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_GET_CACHING, struct drm_i915_gem_caching)
+#define DRM_IOCTL_I915_GEM_THROTTLE	DRM_IO ( DRM_COMMAND_BASE + DRM_I915_GEM_THROTTLE)
+#define DRM_IOCTL_I915_GEM_ENTERVT	DRM_IO(DRM_COMMAND_BASE + DRM_I915_GEM_ENTERVT)
+#define DRM_IOCTL_I915_GEM_LEAVEVT	DRM_IO(DRM_COMMAND_BASE + DRM_I915_GEM_LEAVEVT)
+#define DRM_IOCTL_I915_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE, struct drm_i915_gem_create)
+#define DRM_IOCTL_I915_GEM_PREAD	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_PREAD, struct drm_i915_gem_pread)
+#define DRM_IOCTL_I915_GEM_PWRITE	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_PWRITE, struct drm_i915_gem_pwrite)
+#define DRM_IOCTL_I915_GEM_MMAP		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_MMAP, struct drm_i915_gem_mmap)
+#define DRM_IOCTL_I915_GEM_MMAP_GTT	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_MMAP_GTT, struct drm_i915_gem_mmap_gtt)
+#define DRM_IOCTL_I915_GEM_SET_DOMAIN	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_SET_DOMAIN, struct drm_i915_gem_set_domain)
+#define DRM_IOCTL_I915_GEM_SW_FINISH	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_SW_FINISH, struct drm_i915_gem_sw_finish)
+#define DRM_IOCTL_I915_GEM_SET_TILING	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_SET_TILING, struct drm_i915_gem_set_tiling)
+#define DRM_IOCTL_I915_GEM_GET_TILING	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_GET_TILING, struct drm_i915_gem_get_tiling)
+#define DRM_IOCTL_I915_GEM_GET_APERTURE	DRM_IOR  (DRM_COMMAND_BASE + DRM_I915_GEM_GET_APERTURE, struct drm_i915_gem_get_aperture)
+#define DRM_IOCTL_I915_GET_PIPE_FROM_CRTC_ID DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GET_PIPE_FROM_CRTC_ID, struct drm_i915_get_pipe_from_crtc_id)
+#define DRM_IOCTL_I915_GEM_MADVISE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_MADVISE, struct drm_i915_gem_madvise)
+#define DRM_IOCTL_I915_OVERLAY_PUT_IMAGE	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_OVERLAY_PUT_IMAGE, struct drm_intel_overlay_put_image)
+#define DRM_IOCTL_I915_OVERLAY_ATTRS	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_OVERLAY_ATTRS, struct drm_intel_overlay_attrs)
+#define DRM_IOCTL_I915_SET_SPRITE_COLORKEY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_SET_SPRITE_COLORKEY, struct drm_intel_sprite_colorkey)
+#define DRM_IOCTL_I915_GET_SPRITE_COLORKEY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_SET_SPRITE_COLORKEY, struct drm_intel_sprite_colorkey)
+#define DRM_IOCTL_I915_GEM_WAIT		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT, struct drm_i915_gem_wait)
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
+#define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
+#define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
+#define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
+#define DRM_IOCTL_I915_GEM_USERPTR			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_USERPTR, struct drm_i915_gem_userptr)
+#define DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_GETPARAM, struct drm_i915_gem_context_param)
+#define DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_SETPARAM, struct drm_i915_gem_context_param)
+
+/* Allow drivers to submit batchbuffers directly to hardware, relying
+ * on the security mechanisms provided by hardware.
+ */
+typedef struct drm_i915_batchbuffer {
+	int start;		/* agp offset */
+	int used;		/* nr bytes in use */
+	int DR1;		/* hw flags for GFX_OP_DRAWRECT_INFO */
+	int DR4;		/* window origin for GFX_OP_DRAWRECT_INFO */
+	int num_cliprects;	/* multipass with multiple cliprects? */
+	struct drm_clip_rect *cliprects;	/* pointer to userspace cliprects */
+} drm_i915_batchbuffer_t;
+
+/* As above, but pass a pointer to userspace buffer which can be
+ * validated by the kernel prior to sending to hardware.
+ */
+typedef struct _drm_i915_cmdbuffer {
+	char *buf;	/* pointer to userspace command buffer */
+	int sz;			/* nr bytes in buf */
+	int DR1;		/* hw flags for GFX_OP_DRAWRECT_INFO */
+	int DR4;		/* window origin for GFX_OP_DRAWRECT_INFO */
+	int num_cliprects;	/* multipass with multiple cliprects? */
+	struct drm_clip_rect *cliprects;	/* pointer to userspace cliprects */
+} drm_i915_cmdbuffer_t;
+
+/* Userspace can request & wait on irqs:
+ */
+typedef struct drm_i915_irq_emit {
+	int *irq_seq;
+} drm_i915_irq_emit_t;
+
+typedef struct drm_i915_irq_wait {
+	int irq_seq;
+} drm_i915_irq_wait_t;
+
+/* Ioctl to query kernel params:
+ */
+#define I915_PARAM_IRQ_ACTIVE            1
+#define I915_PARAM_ALLOW_BATCHBUFFER     2
+#define I915_PARAM_LAST_DISPATCH         3
+#define I915_PARAM_CHIPSET_ID            4
+#define I915_PARAM_HAS_GEM               5
+#define I915_PARAM_NUM_FENCES_AVAIL      6
+#define I915_PARAM_HAS_OVERLAY           7
+#define I915_PARAM_HAS_PAGEFLIPPING	 8
+#define I915_PARAM_HAS_EXECBUF2          9
+#define I915_PARAM_HAS_BSD		 10
+#define I915_PARAM_HAS_BLT		 11
+#define I915_PARAM_HAS_RELAXED_FENCING	 12
+#define I915_PARAM_HAS_COHERENT_RINGS	 13
+#define I915_PARAM_HAS_EXEC_CONSTANTS	 14
+#define I915_PARAM_HAS_RELAXED_DELTA	 15
+#define I915_PARAM_HAS_GEN7_SOL_RESET	 16
+#define I915_PARAM_HAS_LLC     	 	 17
+#define I915_PARAM_HAS_ALIASING_PPGTT	 18
+#define I915_PARAM_HAS_WAIT_TIMEOUT	 19
+#define I915_PARAM_HAS_SEMAPHORES	 20
+#define I915_PARAM_HAS_PRIME_VMAP_FLUSH	 21
+#define I915_PARAM_HAS_VEBOX		 22
+#define I915_PARAM_HAS_SECURE_BATCHES	 23
+#define I915_PARAM_HAS_PINNED_BATCHES	 24
+#define I915_PARAM_HAS_EXEC_NO_RELOC	 25
+#define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
+#define I915_PARAM_HAS_WT     	 	 27
+#define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_HAS_COHERENT_PHYS_GTT 29
+#define I915_PARAM_MMAP_VERSION          30
+#define I915_PARAM_HAS_BSD2		 31
+#define I915_PARAM_REVISION              32
+#define I915_PARAM_SUBSLICE_TOTAL	 33
+#define I915_PARAM_EU_TOTAL		 34
+
+typedef struct drm_i915_getparam {
+	int param;
+	int *value;
+} drm_i915_getparam_t;
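
These device-specific requests live in the 0x40-0x99 command window reserved in drm.h, so each ioctl number is built as DRM_COMMAND_BASE plus the i915 command code. A sketch of querying one parameter:

```c
#include <sys/ioctl.h>
#include "i915_drm.h"

/* Read the PCI chipset id via I915_PARAM_CHIPSET_ID. Sketch only;
 * fd must be an i915 DRM node or the ioctl fails. */
static int i915_chipset_id(int fd, int *id)
{
	drm_i915_getparam_t gp = {0};

	gp.param = I915_PARAM_CHIPSET_ID;
	gp.value = id;	/* kernel writes the result here */
	return ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
}
```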
+
+/* Ioctl to set kernel params:
+ */
+#define I915_SETPARAM_USE_MI_BATCHBUFFER_START            1
+#define I915_SETPARAM_TEX_LRU_LOG_GRANULARITY             2
+#define I915_SETPARAM_ALLOW_BATCHBUFFER                   3
+#define I915_SETPARAM_NUM_USED_FENCES                     4
+
+typedef struct drm_i915_setparam {
+	int param;
+	int value;
+} drm_i915_setparam_t;
+
+/* A memory manager for regions of shared memory:
+ */
+#define I915_MEM_REGION_AGP 1
+
+typedef struct drm_i915_mem_alloc {
+	int region;
+	int alignment;
+	int size;
+	int *region_offset;	/* offset from start of fb or agp */
+} drm_i915_mem_alloc_t;
+
+typedef struct drm_i915_mem_free {
+	int region;
+	int region_offset;
+} drm_i915_mem_free_t;
+
+typedef struct drm_i915_mem_init_heap {
+	int region;
+	int size;
+	int start;
+} drm_i915_mem_init_heap_t;
+
+/* Allow memory manager to be torn down and re-initialized (eg on
+ * rotate):
+ */
+typedef struct drm_i915_mem_destroy_heap {
+	int region;
+} drm_i915_mem_destroy_heap_t;
+
+/* Allow X server to configure which pipes to monitor for vblank signals
+ */
+#define	DRM_I915_VBLANK_PIPE_A	1
+#define	DRM_I915_VBLANK_PIPE_B	2
+
+typedef struct drm_i915_vblank_pipe {
+	int pipe;
+} drm_i915_vblank_pipe_t;
+
+/* Schedule buffer swap at given vertical blank:
+ */
+typedef struct drm_i915_vblank_swap {
+	drm_drawable_t drawable;
+	enum drm_vblank_seq_type seqtype;
+	unsigned int sequence;
+} drm_i915_vblank_swap_t;
+
+typedef struct drm_i915_hws_addr {
+	__u64 addr;
+} drm_i915_hws_addr_t;
+
+struct drm_i915_gem_init {
+	/**
+	 * Beginning offset in the GTT to be managed by the DRM memory
+	 * manager.
+	 */
+	__u64 gtt_start;
+	/**
+	 * Ending offset in the GTT to be managed by the DRM memory
+	 * manager.
+	 */
+	__u64 gtt_end;
+};
+
+struct drm_i915_gem_create {
+	/**
+	 * Requested size for the object.
+	 *
+	 * The (page-aligned) allocated size for the object will be returned.
+	 */
+	__u64 size;
+	/**
+	 * Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+	__u32 pad;
+};
+
+struct drm_i915_gem_pread {
+	/** Handle for the object being read. */
+	__u32 handle;
+	__u32 pad;
+	/** Offset into the object to read from */
+	__u64 offset;
+	/** Length of data to read */
+	__u64 size;
+	/**
+	 * Pointer to write the data into.
+	 *
+	 * This is a fixed-size type for 32/64 compatibility.
+	 */
+	__u64 data_ptr;
+};
+
+struct drm_i915_gem_pwrite {
+	/** Handle for the object being written to. */
+	__u32 handle;
+	__u32 pad;
+	/** Offset into the object to write to */
+	__u64 offset;
+	/** Length of data to write */
+	__u64 size;
+	/**
+	 * Pointer to read the data from.
+	 *
+	 * This is a fixed-size type for 32/64 compatibility.
+	 */
+	__u64 data_ptr;
+};
+
+struct drm_i915_gem_mmap {
+	/** Handle for the object being mapped. */
+	__u32 handle;
+	__u32 pad;
+	/** Offset in the object to map. */
+	__u64 offset;
+	/**
+	 * Length of data to map.
+	 *
+	 * The value will be page-aligned.
+	 */
+	__u64 size;
+	/**
+	 * Returned pointer the data was mapped at.
+	 *
+	 * This is a fixed-size type for 32/64 compatibility.
+	 */
+	__u64 addr_ptr;
+
+	/**
+	 * Flags for extended behaviour.
+	 *
+	 * Added in version 2.
+	 */
+	__u64 flags;
+#define I915_MMAP_WC 0x1
+};
+
+struct drm_i915_gem_mmap_gtt {
+	/** Handle for the object being mapped. */
+	__u32 handle;
+	__u32 pad;
+	/**
+	 * Fake offset to use for subsequent mmap call
+	 *
+	 * This is a fixed-size type for 32/64 compatibility.
+	 */
+	__u64 offset;
+};
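
The returned offset is not a real file position; it is a cookie that routes a subsequent mmap of the DRM fd through the GTT aperture. A sketch, assuming handle names a GEM object:

```c
#include <sys/ioctl.h>
#include <sys/mman.h>
#include "i915_drm.h"

/* Map a GEM object through the GTT. Sketch only; size must cover
 * the object (e.g. the size returned by DRM_IOCTL_I915_GEM_CREATE). */
static void *gem_mmap_gtt(int fd, __u32 handle, __u64 size)
{
	struct drm_i915_gem_mmap_gtt mmap_arg = {0};
	void *ptr;

	mmap_arg.handle = handle;
	if (ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_GTT, &mmap_arg) != 0)
		return NULL;
	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, mmap_arg.offset);
	return ptr == MAP_FAILED ? NULL : ptr;
}
```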
+
+struct drm_i915_gem_set_domain {
+	/** Handle for the object */
+	__u32 handle;
+
+	/** New read domains */
+	__u32 read_domains;
+
+	/** New write domain */
+	__u32 write_domain;
+};
+
+struct drm_i915_gem_sw_finish {
+	/** Handle for the object */
+	__u32 handle;
+};
+
+struct drm_i915_gem_relocation_entry {
+	/**
+	 * Handle of the buffer being pointed to by this relocation entry.
+	 *
+	 * It's appealing to make this an index into the mm_validate_entry
+	 * list to refer to the buffer, but this allows the driver to create
+	 * a relocation list for state buffers and not re-write it per
+	 * exec using the buffer.
+	 */
+	__u32 target_handle;
+
+	/**
+	 * Value to be added to the offset of the target buffer to make up
+	 * the relocation entry.
+	 */
+	__u32 delta;
+
+	/** Offset in the buffer the relocation entry will be written into */
+	__u64 offset;
+
+	/**
+	 * Offset value of the target buffer that the relocation entry was last
+	 * written as.
+	 *
+	 * If the buffer has the same offset as last time, we can skip syncing
+	 * and writing the relocation.  This value is written back out by
+	 * the execbuffer ioctl when the relocation is written.
+	 */
+	__u64 presumed_offset;
+
+	/**
+	 * Target memory domains read by this operation.
+	 */
+	__u32 read_domains;
+
+	/**
+	 * Target memory domains written by this operation.
+	 *
+	 * Note that only one domain may be written by the whole
+	 * execbuffer operation, so that where there are conflicts,
+	 * the application will get -EINVAL back.
+	 */
+	__u32 write_domain;
+};
+
+/** @{
+ * Intel memory domains
+ *
+ * Most of these just align with the various caches in
+ * the system and are used to flush and invalidate as
+ * objects end up cached in different domains.
+ */
+/** CPU cache */
+#define I915_GEM_DOMAIN_CPU		0x00000001
+/** Render cache, used by 2D and 3D drawing */
+#define I915_GEM_DOMAIN_RENDER		0x00000002
+/** Sampler cache, used by texture engine */
+#define I915_GEM_DOMAIN_SAMPLER		0x00000004
+/** Command queue, used to load batch buffers */
+#define I915_GEM_DOMAIN_COMMAND		0x00000008
+/** Instruction cache, used by shader programs */
+#define I915_GEM_DOMAIN_INSTRUCTION	0x00000010
+/** Vertex address cache */
+#define I915_GEM_DOMAIN_VERTEX		0x00000020
+/** GTT domain - aperture and scanout */
+#define I915_GEM_DOMAIN_GTT		0x00000040
+/** @} */
+
+struct drm_i915_gem_exec_object {
+	/**
+	 * User's handle for a buffer to be bound into the GTT for this
+	 * operation.
+	 */
+	__u32 handle;
+
+	/** Number of relocations to be performed on this buffer */
+	__u32 relocation_count;
+	/**
+	 * Pointer to array of struct drm_i915_gem_relocation_entry containing
+	 * the relocations to be performed in this buffer.
+	 */
+	__u64 relocs_ptr;
+
+	/** Required alignment in graphics aperture */
+	__u64 alignment;
+
+	/**
+	 * Returned value of the updated offset of the object, for future
+	 * presumed_offset writes.
+	 */
+	__u64 offset;
+};
+
+struct drm_i915_gem_execbuffer {
+	/**
+	 * List of buffers to be validated with their relocations to be
+	 * performed on them.
+	 *
+	 * This is a pointer to an array of struct drm_i915_gem_validate_entry.
+	 *
+	 * These buffers must be listed in an order such that all relocations
+	 * a buffer is performing refer to buffers that have already appeared
+	 * in the validate list.
+	 */
+	__u64 buffers_ptr;
+	__u32 buffer_count;
+
+	/** Offset in the batchbuffer to start execution from. */
+	__u32 batch_start_offset;
+	/** Bytes used in batchbuffer from batch_start_offset */
+	__u32 batch_len;
+	__u32 DR1;
+	__u32 DR4;
+	__u32 num_cliprects;
+	/** This is a struct drm_clip_rect *cliprects */
+	__u64 cliprects_ptr;
+};
+
+struct drm_i915_gem_exec_object2 {
+	/**
+	 * User's handle for a buffer to be bound into the GTT for this
+	 * operation.
+	 */
+	__u32 handle;
+
+	/** Number of relocations to be performed on this buffer */
+	__u32 relocation_count;
+	/**
+	 * Pointer to array of struct drm_i915_gem_relocation_entry containing
+	 * the relocations to be performed in this buffer.
+	 */
+	__u64 relocs_ptr;
+
+	/** Required alignment in graphics aperture */
+	__u64 alignment;
+
+	/**
+	 * Returned value of the updated offset of the object, for future
+	 * presumed_offset writes.
+	 */
+	__u64 offset;
+
+#define EXEC_OBJECT_NEEDS_FENCE (1<<0)
+#define EXEC_OBJECT_NEEDS_GTT	(1<<1)
+#define EXEC_OBJECT_WRITE	(1<<2)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
+	__u64 flags;
+
+	__u64 rsvd1;
+	__u64 rsvd2;
+};
+
+struct drm_i915_gem_execbuffer2 {
+	/**
+	 * List of gem_exec_object2 structs
+	 */
+	__u64 buffers_ptr;
+	__u32 buffer_count;
+
+	/** Offset in the batchbuffer to start execution from. */
+	__u32 batch_start_offset;
+	/** Bytes used in batchbuffer from batch_start_offset */
+	__u32 batch_len;
+	__u32 DR1;
+	__u32 DR4;
+	__u32 num_cliprects;
+	/** This is a struct drm_clip_rect *cliprects */
+	__u64 cliprects_ptr;
+#define I915_EXEC_RING_MASK              (7<<0)
+#define I915_EXEC_DEFAULT                (0<<0)
+#define I915_EXEC_RENDER                 (1<<0)
+#define I915_EXEC_BSD                    (2<<0)
+#define I915_EXEC_BLT                    (3<<0)
+#define I915_EXEC_VEBOX                  (4<<0)
+
+/* Used for switching the constants addressing mode on gen4+ RENDER ring.
+ * Gen6+ only supports relative addressing to dynamic state (default) and
+ * absolute addressing.
+ *
+ * These flags are ignored for the BSD and BLT rings.
+ */
+#define I915_EXEC_CONSTANTS_MASK 	(3<<6)
+#define I915_EXEC_CONSTANTS_REL_GENERAL (0<<6) /* default */
+#define I915_EXEC_CONSTANTS_ABSOLUTE 	(1<<6)
+#define I915_EXEC_CONSTANTS_REL_SURFACE (2<<6) /* gen4/5 only */
+	__u64 flags;
+	__u64 rsvd1; /* now used for context info */
+	__u64 rsvd2;
+};
+
+/** Resets the SO write offset registers for transform feedback on gen7. */
+#define I915_EXEC_GEN7_SOL_RESET	(1<<8)
+
+/** Request a privileged ("secure") batch buffer. Note only available for
+ * DRM_ROOT_ONLY | DRM_MASTER processes.
+ */
+#define I915_EXEC_SECURE		(1<<9)
+
+/** Inform the kernel that the batch is and will always be pinned. This
+ * negates the requirement for a workaround to be performed to avoid
+ * an incoherent CS (such as can be found on 830/845). If this flag is
+ * not passed, the kernel will endeavour to make sure the batch is
+ * coherent with the CS before execution. If this flag is passed,
+ * userspace assumes the responsibility for ensuring the same.
+ */
+#define I915_EXEC_IS_PINNED		(1<<10)
+
+/** Provide a hint to the kernel that the command stream and auxiliary
+ * state buffers already hold the correct presumed addresses and so the
+ * relocation process may be skipped if no buffers need to be moved in
+ * preparation for the execbuffer.
+ */
+#define I915_EXEC_NO_RELOC		(1<<11)
+
+/** Use the reloc.handle as an index into the exec object array rather
+ * than as the per-file handle.
+ */
+#define I915_EXEC_HANDLE_LUT		(1<<12)
+
+/** Used for switching BSD rings on the platforms with two BSD rings */
+#define I915_EXEC_BSD_MASK		(3<<13)
+#define I915_EXEC_BSD_DEFAULT		(0<<13) /* default ping-pong mode */
+#define I915_EXEC_BSD_RING1		(1<<13)
+#define I915_EXEC_BSD_RING2		(2<<13)
+
+#define __I915_EXEC_UNKNOWN_FLAGS -(1<<15)
+
+#define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
+#define i915_execbuffer2_set_context_id(eb2, context) \
+	(eb2).rsvd1 = context & I915_EXEC_CONTEXT_ID_MASK
+#define i915_execbuffer2_get_context_id(eb2) \
+	((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
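These helper macros stash the context id in rsvd1, as noted above. A sketch of tying a submission to a context created with DRM_IOCTL_I915_GEM_CONTEXT_CREATE; buffer setup is omitted and eb2 is assumed to have buffers_ptr, buffer_count, batch_len, and flags already filled in by the caller:

```c
#include <sys/ioctl.h>
#include "i915_drm.h"

/* Submit an already-populated execbuffer under a given context.
 * Sketch only. */
static int submit_with_context(int fd,
			       struct drm_i915_gem_execbuffer2 *eb2,
			       __u32 ctx_id)
{
	i915_execbuffer2_set_context_id(*eb2, ctx_id);
	return ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, eb2);
}
```
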
+
+struct drm_i915_gem_pin {
+	/** Handle of the buffer to be pinned. */
+	__u32 handle;
+	__u32 pad;
+
+	/** alignment required within the aperture */
+	__u64 alignment;
+
+	/** Returned GTT offset of the buffer. */
+	__u64 offset;
+};
+
+struct drm_i915_gem_unpin {
+	/** Handle of the buffer to be unpinned. */
+	__u32 handle;
+	__u32 pad;
+};
+
+struct drm_i915_gem_busy {
+	/** Handle of the buffer to check for busy */
+	__u32 handle;
+
+	/** Return busy status (1 if busy, 0 if idle).
+	 * The high word is used to indicate on which rings the object
+	 * currently resides:
+	 *  16:31 - busy (r or r/w) rings (16 render, 17 bsd, 18 blt, etc)
+	 */
+	__u32 busy;
+};
+
+/**
+ * I915_CACHING_NONE
+ *
+ * GPU access is not coherent with cpu caches. Default for machines without an
+ * LLC.
+ */
+#define I915_CACHING_NONE		0
+/**
+ * I915_CACHING_CACHED
+ *
+ * GPU access is coherent with cpu caches and furthermore the data is cached in
+ * last-level caches shared between cpu cores and the gpu GT. Default on
+ * machines with HAS_LLC.
+ */
+#define I915_CACHING_CACHED		1
+/**
+ * I915_CACHING_DISPLAY
+ *
+ * Special GPU caching mode which is coherent with the scanout engines.
+ * Transparently falls back to I915_CACHING_NONE on platforms where no special
+ * cache mode (like write-through or gfdt flushing) is available. The kernel
+ * automatically sets this mode when using a buffer as a scanout target.
+ * Userspace can manually set this mode to avoid a costly stall and clflush in
+ * the hotpath of drawing the first frame.
+ */
+#define I915_CACHING_DISPLAY		2
+
+struct drm_i915_gem_caching {
+	/**
+	 * Handle of the buffer to set/get the caching level of. */
+	__u32 handle;
+
+	/**
+	 * Caching level to apply or return value
+	 *
+	 * bits0-15 are for generic caching control (i.e. the above defined
+	 * values). bits16-31 are reserved for platform-specific variations
+	 * (e.g. l3$ caching on gen7). */
+	__u32 caching;
+};
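+
+/* Usage sketch (illustrative; DRM_IOCTL_I915_GEM_SET_CACHING is assumed to
+ * be defined with the other ioctl numbers earlier in this header): opting a
+ * buffer into LLC caching on machines that have one:
+ *
+ *	struct drm_i915_gem_caching arg;
+ *	memset(&arg, 0, sizeof(arg));
+ *	arg.handle = handle;
+ *	arg.caching = I915_CACHING_CACHED;
+ *	drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_CACHING, &arg);
+ */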
+
+#define I915_TILING_NONE	0
+#define I915_TILING_X		1
+#define I915_TILING_Y		2
+
+#define I915_BIT_6_SWIZZLE_NONE		0
+#define I915_BIT_6_SWIZZLE_9		1
+#define I915_BIT_6_SWIZZLE_9_10		2
+#define I915_BIT_6_SWIZZLE_9_11		3
+#define I915_BIT_6_SWIZZLE_9_10_11	4
+/* Not seen by userland */
+#define I915_BIT_6_SWIZZLE_UNKNOWN	5
+/* Seen by userland. */
+#define I915_BIT_6_SWIZZLE_9_17		6
+#define I915_BIT_6_SWIZZLE_9_10_17	7
+
+struct drm_i915_gem_set_tiling {
+	/** Handle of the buffer to have its tiling state updated */
+	__u32 handle;
+
+	/**
+	 * Tiling mode for the object (I915_TILING_NONE, I915_TILING_X,
+	 * I915_TILING_Y).
+	 *
+	 * This value is to be set on request, and will be updated by the
+	 * kernel on successful return with the actual chosen tiling layout.
+	 *
+	 * The tiling mode may be demoted to I915_TILING_NONE when the system
+	 * has bit 6 swizzling that can't be managed correctly by GEM.
+	 *
+	 * Buffer contents become undefined when changing tiling_mode.
+	 */
+	__u32 tiling_mode;
+
+	/**
+	 * Stride in bytes for the object when in I915_TILING_X or
+	 * I915_TILING_Y.
+	 */
+	__u32 stride;
+
+	/**
+	 * Returned address bit 6 swizzling required for CPU access through
+	 * mmap mapping.
+	 */
+	__u32 swizzle_mode;
+};
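+
+/* Usage sketch (illustrative): requesting X tiling and honouring whatever
+ * the kernel actually granted, since tiling_mode is updated in place and
+ * may be demoted to I915_TILING_NONE:
+ *
+ *	struct drm_i915_gem_set_tiling arg;
+ *	memset(&arg, 0, sizeof(arg));
+ *	arg.handle = handle;
+ *	arg.tiling_mode = I915_TILING_X;
+ *	arg.stride = stride;
+ *	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_TILING, &arg) == 0)
+ *		tiling = arg.tiling_mode;	\/\* what we actually got \*\/
+ */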
+
+struct drm_i915_gem_get_tiling {
+	/** Handle of the buffer to get tiling state for. */
+	__u32 handle;
+
+	/**
+	 * Current tiling mode for the object (I915_TILING_NONE, I915_TILING_X,
+	 * I915_TILING_Y).
+	 */
+	__u32 tiling_mode;
+
+	/**
+	 * Returned address bit 6 swizzling required for CPU access through
+	 * mmap mapping.
+	 */
+	__u32 swizzle_mode;
+
+	/**
+	 * Returned address bit 6 swizzling required for CPU access through
+	 * mmap mapping whilst bound.
+	 */
+	__u32 phys_swizzle_mode;
+};
+
+struct drm_i915_gem_get_aperture {
+	/** Total size of the aperture used by i915_gem_execbuffer, in bytes */
+	__u64 aper_size;
+
+	/**
+	 * Available space in the aperture used by i915_gem_execbuffer, in
+	 * bytes
+	 */
+	__u64 aper_available_size;
+};
+
+struct drm_i915_get_pipe_from_crtc_id {
+	/** ID of CRTC being requested */
+	__u32 crtc_id;
+
+	/** pipe of requested CRTC */
+	__u32 pipe;
+};
+
+#define I915_MADV_WILLNEED 0
+#define I915_MADV_DONTNEED 1
+#define __I915_MADV_PURGED 2 /* internal state */
+
+struct drm_i915_gem_madvise {
+	/** Handle of the buffer to change the backing store advice */
+	__u32 handle;
+
+	/* Advice: either the buffer will be needed again in the near future,
+	 *         or won't be and could be discarded under memory pressure.
+	 */
+	__u32 madv;
+
+	/** Whether the backing store still exists. */
+	__u32 retained;
+};
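+
+/* Usage sketch (illustrative): marking an idle cached buffer purgeable and
+ * reclaiming it later; if retained comes back 0 the kernel discarded the
+ * backing store and the old contents are gone:
+ *
+ *	struct drm_i915_gem_madvise madv;
+ *	memset(&madv, 0, sizeof(madv));
+ *	madv.handle = handle;
+ *	madv.madv = I915_MADV_DONTNEED;		\/\* while cached \*\/
+ *	drmIoctl(fd, DRM_IOCTL_I915_GEM_MADVISE, &madv);
+ *	...
+ *	madv.madv = I915_MADV_WILLNEED;		\/\* before reuse \*\/
+ *	drmIoctl(fd, DRM_IOCTL_I915_GEM_MADVISE, &madv);
+ *	if (!madv.retained)
+ *		... reallocate; the contents were purged ...
+ */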
+
+/* flags */
+#define I915_OVERLAY_TYPE_MASK 		0xff
+#define I915_OVERLAY_YUV_PLANAR 	0x01
+#define I915_OVERLAY_YUV_PACKED 	0x02
+#define I915_OVERLAY_RGB		0x03
+
+#define I915_OVERLAY_DEPTH_MASK		0xff00
+#define I915_OVERLAY_RGB24		0x1000
+#define I915_OVERLAY_RGB16		0x2000
+#define I915_OVERLAY_RGB15		0x3000
+#define I915_OVERLAY_YUV422		0x0100
+#define I915_OVERLAY_YUV411		0x0200
+#define I915_OVERLAY_YUV420		0x0300
+#define I915_OVERLAY_YUV410		0x0400
+
+#define I915_OVERLAY_SWAP_MASK		0xff0000
+#define I915_OVERLAY_NO_SWAP		0x000000
+#define I915_OVERLAY_UV_SWAP		0x010000
+#define I915_OVERLAY_Y_SWAP		0x020000
+#define I915_OVERLAY_Y_AND_UV_SWAP	0x030000
+
+#define I915_OVERLAY_FLAGS_MASK		0xff000000
+#define I915_OVERLAY_ENABLE		0x01000000
+
+struct drm_intel_overlay_put_image {
+	/* various flags and src format description */
+	__u32 flags;
+	/* source picture description */
+	__u32 bo_handle;
+	/* stride values and offsets are in bytes, buffer relative */
+	__u16 stride_Y; /* stride for packed formats */
+	__u16 stride_UV;
+	__u32 offset_Y; /* offset for packed formats */
+	__u32 offset_U;
+	__u32 offset_V;
+	/* in pixels */
+	__u16 src_width;
+	__u16 src_height;
+	/* to compensate the scaling factors for partially covered surfaces */
+	__u16 src_scan_width;
+	__u16 src_scan_height;
+	/* output crtc description */
+	__u32 crtc_id;
+	__u16 dst_x;
+	__u16 dst_y;
+	__u16 dst_width;
+	__u16 dst_height;
+};
+
+/* flags */
+#define I915_OVERLAY_UPDATE_ATTRS	(1<<0)
+#define I915_OVERLAY_UPDATE_GAMMA	(1<<1)
+struct drm_intel_overlay_attrs {
+	__u32 flags;
+	__u32 color_key;
+	__s32 brightness;
+	__u32 contrast;
+	__u32 saturation;
+	__u32 gamma0;
+	__u32 gamma1;
+	__u32 gamma2;
+	__u32 gamma3;
+	__u32 gamma4;
+	__u32 gamma5;
+};
+
+/*
+ * Intel sprite handling
+ *
+ * Color keying works with a min/mask/max tuple.  Both source and destination
+ * color keying is allowed.
+ *
+ * Source keying:
+ * Sprite pixels within the min & max values, masked against the color channels
+ * specified in the mask field, will be transparent.  All other pixels will
+ * be displayed on top of the primary plane.  For RGB surfaces, only the min
+ * and mask fields will be used; ranged compares are not allowed.
+ *
+ * Destination keying:
+ * Primary plane pixels that match the min value, masked against the color
+ * channels specified in the mask field, will be replaced by corresponding
+ * pixels from the sprite plane.
+ *
+ * Note that source & destination keying are exclusive; only one can be
+ * active on a given plane.
+ */
+
+#define I915_SET_COLORKEY_NONE		(1<<0) /* disable color key matching */
+#define I915_SET_COLORKEY_DESTINATION	(1<<1)
+#define I915_SET_COLORKEY_SOURCE	(1<<2)
+struct drm_intel_sprite_colorkey {
+	__u32 plane_id;
+	__u32 min_value;
+	__u32 channel_mask;
+	__u32 max_value;
+	__u32 flags;
+};
+
+struct drm_i915_gem_wait {
+	/** Handle of BO we shall wait on */
+	__u32 bo_handle;
+	__u32 flags;
+	/** Number of nanoseconds to wait. Returns time remaining. */
+	__s64 timeout_ns;
+};
+
+struct drm_i915_gem_context_create {
+	/* output: id of new context */
+	__u32 ctx_id;
+	__u32 pad;
+};
+
+struct drm_i915_gem_context_destroy {
+	__u32 ctx_id;
+	__u32 pad;
+};
+
+struct drm_i915_reg_read {
+	__u64 offset;
+	__u64 val; /* Return value */
+};
+
+struct drm_i915_reset_stats {
+	__u32 ctx_id;
+	__u32 flags;
+
+	/* All resets since boot/module reload, for all contexts */
+	__u32 reset_count;
+
+	/* Number of batches lost when active in GPU, for this context */
+	__u32 batch_active;
+
+	/* Number of batches lost pending for execution, for this context */
+	__u32 batch_pending;
+
+	__u32 pad;
+};
+
+struct drm_i915_gem_userptr {
+	__u64 user_ptr;
+	__u64 user_size;
+	__u32 flags;
+#define I915_USERPTR_READ_ONLY 0x1
+#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
+	/**
+	 * Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+};
+
+struct drm_i915_gem_context_param {
+	__u32 ctx_id;
+	__u32 size;
+	__u64 param;
+#define I915_CONTEXT_PARAM_BAN_PERIOD 0x1
+	__u64 value;
+};
+
+#endif /* _I915_DRM_H_ */
diff --git a/icd/intel/kmd/libdrm/intel/intel_aub.h b/icd/intel/kmd/libdrm/intel/intel_aub.h
new file mode 100644
index 0000000..5f0aba8
--- /dev/null
+++ b/icd/intel/kmd/libdrm/intel/intel_aub.h
@@ -0,0 +1,153 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+/** @file intel_aub.h
+ *
+ * The AUB file is a file format used by Intel's internal simulation
+ * and other validation tools.  It can be used at various levels by a
+ * driver to input state to the simulated hardware or a replaying
+ * debugger.
+ *
+ * We choose to dump AUB files using the trace block format for ease
+ * of implementation -- dump out the blocks of memory as plain blobs
+ * and insert ring commands to execute the batchbuffer blob.
+ */
+
+#ifndef _INTEL_AUB_H
+#define _INTEL_AUB_H
+
+#define AUB_MI_NOOP			(0)
+#define AUB_MI_BATCH_BUFFER_START 	(0x31 << 23)
+#define AUB_PIPE_CONTROL		(0x7a000002)
+
+/* DW0: instruction type. */
+
+#define CMD_AUB			(7 << 29)
+
+#define CMD_AUB_HEADER		(CMD_AUB | (1 << 23) | (0x05 << 16))
+/* DW1 */
+# define AUB_HEADER_MAJOR_SHIFT		24
+# define AUB_HEADER_MINOR_SHIFT		16
+
+#define CMD_AUB_TRACE_HEADER_BLOCK (CMD_AUB | (1 << 23) | (0x41 << 16))
+#define CMD_AUB_DUMP_BMP           (CMD_AUB | (1 << 23) | (0x9e << 16))
+
+/* DW1 */
+#define AUB_TRACE_OPERATION_MASK	0x000000ff
+#define AUB_TRACE_OP_COMMENT		0x00000000
+#define AUB_TRACE_OP_DATA_WRITE		0x00000001
+#define AUB_TRACE_OP_COMMAND_WRITE	0x00000002
+#define AUB_TRACE_OP_MMIO_WRITE		0x00000003
+// operation = TRACE_DATA_WRITE, Type =
+#define AUB_TRACE_TYPE_MASK		0x0000ff00
+#define AUB_TRACE_TYPE_NOTYPE		(0 << 8)
+#define AUB_TRACE_TYPE_BATCH		(1 << 8)
+#define AUB_TRACE_TYPE_VERTEX_BUFFER	(5 << 8)
+#define AUB_TRACE_TYPE_2D_MAP		(6 << 8)
+#define AUB_TRACE_TYPE_CUBE_MAP		(7 << 8)
+#define AUB_TRACE_TYPE_VOLUME_MAP	(9 << 8)
+#define AUB_TRACE_TYPE_1D_MAP		(10 << 8)
+#define AUB_TRACE_TYPE_CONSTANT_BUFFER	(11 << 8)
+#define AUB_TRACE_TYPE_CONSTANT_URB	(12 << 8)
+#define AUB_TRACE_TYPE_INDEX_BUFFER	(13 << 8)
+#define AUB_TRACE_TYPE_GENERAL		(14 << 8)
+#define AUB_TRACE_TYPE_SURFACE		(15 << 8)
+
+
+// operation = TRACE_COMMAND_WRITE, Type =
+#define AUB_TRACE_TYPE_RING_HWB		(1 << 8)
+#define AUB_TRACE_TYPE_RING_PRB0	(2 << 8)
+#define AUB_TRACE_TYPE_RING_PRB1	(3 << 8)
+#define AUB_TRACE_TYPE_RING_PRB2	(4 << 8)
+
+// Address space
+#define AUB_TRACE_ADDRESS_SPACE_MASK	0x00ff0000
+#define AUB_TRACE_MEMTYPE_GTT		(0 << 16)
+#define AUB_TRACE_MEMTYPE_LOCAL		(1 << 16)
+#define AUB_TRACE_MEMTYPE_NONLOCAL	(2 << 16)
+#define AUB_TRACE_MEMTYPE_PCI		(3 << 16)
+#define AUB_TRACE_MEMTYPE_GTT_ENTRY     (4 << 16)
+
+/* DW2 */
+
+/**
+ * aub_state_struct_type enum values are encoded with the top 16 bits
+ * representing the type to be delivered to the .aub file, and the bottom 16
+ * bits representing the subtype.  This macro performs the encoding.
+ */
+#define ENCODE_SS_TYPE(type, subtype) (((type) << 16) | (subtype))
+
+enum aub_state_struct_type {
+   AUB_TRACE_VS_STATE =			ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 1),
+   AUB_TRACE_GS_STATE =			ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 2),
+   AUB_TRACE_CLIP_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 3),
+   AUB_TRACE_SF_STATE =			ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 4),
+   AUB_TRACE_WM_STATE =			ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 5),
+   AUB_TRACE_CC_STATE =			ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 6),
+   AUB_TRACE_CLIP_VP_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 7),
+   AUB_TRACE_SF_VP_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 8),
+   AUB_TRACE_CC_VP_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0x9),
+   AUB_TRACE_SAMPLER_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0xa),
+   AUB_TRACE_KERNEL_INSTRUCTIONS =	ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0xb),
+   AUB_TRACE_SCRATCH_SPACE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0xc),
+   AUB_TRACE_SAMPLER_DEFAULT_COLOR =	ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0xd),
+
+   AUB_TRACE_SCISSOR_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0x15),
+   AUB_TRACE_BLEND_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0x16),
+   AUB_TRACE_DEPTH_STENCIL_STATE =	ENCODE_SS_TYPE(AUB_TRACE_TYPE_GENERAL, 0x17),
+
+   AUB_TRACE_VERTEX_BUFFER =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_VERTEX_BUFFER, 0),
+   AUB_TRACE_BINDING_TABLE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_SURFACE, 0x100),
+   AUB_TRACE_SURFACE_STATE =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_SURFACE, 0x200),
+   AUB_TRACE_VS_CONSTANTS =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_CONSTANT_BUFFER, 0),
+   AUB_TRACE_WM_CONSTANTS =		ENCODE_SS_TYPE(AUB_TRACE_TYPE_CONSTANT_BUFFER, 1),
+};
+
+#undef ENCODE_SS_TYPE
+
+/**
+ * Decode an aub_state_struct_type value to determine the type that should be
+ * stored in the .aub file.
+ */
+static inline uint32_t AUB_TRACE_TYPE(enum aub_state_struct_type ss_type)
+{
+   return (ss_type & 0xFFFF0000) >> 16;
+}
+
+/**
+ * Decode an aub_state_struct_type value to determine the subtype that should be
+ * stored in the .aub file.
+ */
+static inline uint32_t AUB_TRACE_SUBTYPE(enum aub_state_struct_type ss_type)
+{
+   return ss_type & 0xFFFF;
+}
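+
+/* Round-trip sketch (illustrative): the two helpers simply undo
+ * ENCODE_SS_TYPE. For AUB_TRACE_SURFACE_STATE, which was encoded as
+ * ENCODE_SS_TYPE(AUB_TRACE_TYPE_SURFACE, 0x200):
+ *
+ *	AUB_TRACE_TYPE(AUB_TRACE_SURFACE_STATE)    == 0x0f00  (15 << 8)
+ *	AUB_TRACE_SUBTYPE(AUB_TRACE_SURFACE_STATE) == 0x0200
+ */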
+
+/* DW3: address */
+/* DW4: len */
+
+#endif /* _INTEL_AUB_H */
diff --git a/icd/intel/kmd/libdrm/intel/intel_bufmgr.c b/icd/intel/kmd/libdrm/intel/intel_bufmgr.c
new file mode 100644
index 0000000..234cd13
--- /dev/null
+++ b/icd/intel/kmd/libdrm/intel/intel_bufmgr.c
@@ -0,0 +1,354 @@
+/*
+ * Copyright © 2007 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
+#include <string.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <assert.h>
+#include <errno.h>
+#include <drm.h>
+#include <i915_drm.h>
+#include <pciaccess.h>
+#include "libdrm.h"
+#include "intel_bufmgr.h"
+#include "intel_bufmgr_priv.h"
+#include "xf86drm.h"
+
+/** @file intel_bufmgr.c
+ *
+ * Convenience functions for buffer management methods.
+ */
+
+drm_public drm_intel_bo *
+drm_intel_bo_alloc(drm_intel_bufmgr *bufmgr, const char *name,
+		   unsigned long size, unsigned int alignment)
+{
+	return bufmgr->bo_alloc(bufmgr, name, size, alignment);
+}
+
+drm_public drm_intel_bo *
+drm_intel_bo_alloc_for_render(drm_intel_bufmgr *bufmgr, const char *name,
+			      unsigned long size, unsigned int alignment)
+{
+	return bufmgr->bo_alloc_for_render(bufmgr, name, size, alignment);
+}
+
+drm_public drm_intel_bo *
+drm_intel_bo_alloc_userptr(drm_intel_bufmgr *bufmgr,
+			   const char *name, void *addr,
+			   uint32_t tiling_mode,
+			   uint32_t stride,
+			   unsigned long size,
+			   unsigned long flags)
+{
+	if (bufmgr->bo_alloc_userptr)
+		return bufmgr->bo_alloc_userptr(bufmgr, name, addr, tiling_mode,
+						stride, size, flags);
+	return NULL;
+}
+
+drm_public drm_intel_bo *
+drm_intel_bo_alloc_tiled(drm_intel_bufmgr *bufmgr, const char *name,
+                        int x, int y, int cpp, uint32_t *tiling_mode,
+                        unsigned long *pitch, unsigned long flags)
+{
+	return bufmgr->bo_alloc_tiled(bufmgr, name, x, y, cpp,
+				      tiling_mode, pitch, flags);
+}
+
+drm_public void
+drm_intel_bo_reference(drm_intel_bo *bo)
+{
+	bo->bufmgr->bo_reference(bo);
+}
+
+drm_public void
+drm_intel_bo_unreference(drm_intel_bo *bo)
+{
+	if (bo == NULL)
+		return;
+
+	bo->bufmgr->bo_unreference(bo);
+}
+
+drm_public int
+drm_intel_bo_map(drm_intel_bo *buf, int write_enable)
+{
+	return buf->bufmgr->bo_map(buf, write_enable);
+}
+
+drm_public int
+drm_intel_bo_unmap(drm_intel_bo *buf)
+{
+	return buf->bufmgr->bo_unmap(buf);
+}
+
+drm_public int
+drm_intel_bo_subdata(drm_intel_bo *bo, unsigned long offset,
+		     unsigned long size, const void *data)
+{
+	return bo->bufmgr->bo_subdata(bo, offset, size, data);
+}
+
+drm_public int
+drm_intel_bo_get_subdata(drm_intel_bo *bo, unsigned long offset,
+			 unsigned long size, void *data)
+{
+	int ret;
+	if (bo->bufmgr->bo_get_subdata)
+		return bo->bufmgr->bo_get_subdata(bo, offset, size, data);
+
+	if (size == 0 || data == NULL)
+		return 0;
+
+	ret = drm_intel_bo_map(bo, 0);
+	if (ret)
+		return ret;
+	memcpy(data, (unsigned char *)bo->virtual + offset, size);
+	drm_intel_bo_unmap(bo);
+	return 0;
+}
+
+drm_public void
+drm_intel_bo_wait_rendering(drm_intel_bo *bo)
+{
+	bo->bufmgr->bo_wait_rendering(bo);
+}
+
+drm_public void
+drm_intel_bufmgr_destroy(drm_intel_bufmgr *bufmgr)
+{
+	bufmgr->destroy(bufmgr);
+}
+
+drm_public int
+drm_intel_bo_exec(drm_intel_bo *bo, int used,
+		  drm_clip_rect_t * cliprects, int num_cliprects, int DR4)
+{
+	return bo->bufmgr->bo_exec(bo, used, cliprects, num_cliprects, DR4);
+}
+
+drm_public int
+drm_intel_bo_mrb_exec(drm_intel_bo *bo, int used,
+		drm_clip_rect_t *cliprects, int num_cliprects, int DR4,
+		unsigned int rings)
+{
+	if (bo->bufmgr->bo_mrb_exec)
+		return bo->bufmgr->bo_mrb_exec(bo, used,
+					cliprects, num_cliprects, DR4,
+					rings);
+
+	switch (rings) {
+	case I915_EXEC_DEFAULT:
+	case I915_EXEC_RENDER:
+		return bo->bufmgr->bo_exec(bo, used,
+					   cliprects, num_cliprects, DR4);
+	default:
+		return -ENODEV;
+	}
+}
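+
+/* Usage sketch (illustrative): callers select the ring through the flags
+ * argument; on bufmgrs without a bo_mrb_exec hook only the render ring
+ * works and anything else fails with -ENODEV:
+ *
+ *	ret = drm_intel_bo_mrb_exec(batch_bo, used, NULL, 0, 0, I915_EXEC_BLT);
+ */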
+
+drm_public void
+drm_intel_bufmgr_set_debug(drm_intel_bufmgr *bufmgr, int enable_debug)
+{
+	bufmgr->debug = enable_debug;
+}
+
+drm_public int
+drm_intel_bufmgr_check_aperture_space(drm_intel_bo ** bo_array, int count)
+{
+	return bo_array[0]->bufmgr->check_aperture_space(bo_array, count);
+}
+
+drm_public int
+drm_intel_bo_flink(drm_intel_bo *bo, uint32_t * name)
+{
+	if (bo->bufmgr->bo_flink)
+		return bo->bufmgr->bo_flink(bo, name);
+
+	return -ENODEV;
+}
+
+drm_public int
+drm_intel_bo_emit_reloc(drm_intel_bo *bo, uint32_t offset,
+			drm_intel_bo *target_bo, uint32_t target_offset,
+			uint32_t read_domains, uint32_t write_domain)
+{
+	return bo->bufmgr->bo_emit_reloc(bo, offset,
+					 target_bo, target_offset,
+					 read_domains, write_domain);
+}
+
+/* For fence registers, not GL fences */
+drm_public int
+drm_intel_bo_emit_reloc_fence(drm_intel_bo *bo, uint32_t offset,
+			      drm_intel_bo *target_bo, uint32_t target_offset,
+			      uint32_t read_domains, uint32_t write_domain)
+{
+	return bo->bufmgr->bo_emit_reloc_fence(bo, offset,
+					       target_bo, target_offset,
+					       read_domains, write_domain);
+}
+
+
+drm_public int
+drm_intel_bo_pin(drm_intel_bo *bo, uint32_t alignment)
+{
+	if (bo->bufmgr->bo_pin)
+		return bo->bufmgr->bo_pin(bo, alignment);
+
+	return -ENODEV;
+}
+
+drm_public int
+drm_intel_bo_unpin(drm_intel_bo *bo)
+{
+	if (bo->bufmgr->bo_unpin)
+		return bo->bufmgr->bo_unpin(bo);
+
+	return -ENODEV;
+}
+
+drm_public int
+drm_intel_bo_set_tiling(drm_intel_bo *bo, uint32_t * tiling_mode,
+			uint32_t stride)
+{
+	if (bo->bufmgr->bo_set_tiling)
+		return bo->bufmgr->bo_set_tiling(bo, tiling_mode, stride);
+
+	*tiling_mode = I915_TILING_NONE;
+	return 0;
+}
+
+drm_public int
+drm_intel_bo_get_tiling(drm_intel_bo *bo, uint32_t * tiling_mode,
+			uint32_t * swizzle_mode)
+{
+	if (bo->bufmgr->bo_get_tiling)
+		return bo->bufmgr->bo_get_tiling(bo, tiling_mode, swizzle_mode);
+
+	*tiling_mode = I915_TILING_NONE;
+	*swizzle_mode = I915_BIT_6_SWIZZLE_NONE;
+	return 0;
+}
+
+drm_public int
+drm_intel_bo_disable_reuse(drm_intel_bo *bo)
+{
+	if (bo->bufmgr->bo_disable_reuse)
+		return bo->bufmgr->bo_disable_reuse(bo);
+	return 0;
+}
+
+drm_public int
+drm_intel_bo_is_reusable(drm_intel_bo *bo)
+{
+	if (bo->bufmgr->bo_is_reusable)
+		return bo->bufmgr->bo_is_reusable(bo);
+	return 0;
+}
+
+drm_public int
+drm_intel_bo_busy(drm_intel_bo *bo)
+{
+	if (bo->bufmgr->bo_busy)
+		return bo->bufmgr->bo_busy(bo);
+	return 0;
+}
+
+drm_public int
+drm_intel_bo_madvise(drm_intel_bo *bo, int madv)
+{
+	if (bo->bufmgr->bo_madvise)
+		return bo->bufmgr->bo_madvise(bo, madv);
+	return -1;
+}
+
+drm_public int
+drm_intel_bo_references(drm_intel_bo *bo, drm_intel_bo *target_bo)
+{
+	return bo->bufmgr->bo_references(bo, target_bo);
+}
+
+drm_public int
+drm_intel_get_pipe_from_crtc_id(drm_intel_bufmgr *bufmgr, int crtc_id)
+{
+	if (bufmgr->get_pipe_from_crtc_id)
+		return bufmgr->get_pipe_from_crtc_id(bufmgr, crtc_id);
+	return -1;
+}
+
+static size_t
+drm_intel_probe_agp_aperture_size(int fd)
+{
+	struct pci_device *pci_dev;
+	size_t size = 0;
+	int ret;
+
+	ret = pci_system_init();
+	if (ret)
+		goto err;
+
+	/* XXX handle multiple adaptors? */
+	pci_dev = pci_device_find_by_slot(0, 0, 2, 0);
+	if (pci_dev == NULL)
+		goto err;
+
+	ret = pci_device_probe(pci_dev);
+	if (ret)
+		goto err;
+
+	size = pci_dev->regions[2].size;
+err:
+	pci_system_cleanup();
+	return size;
+}
+
+drm_public int
+drm_intel_get_aperture_sizes(int fd, size_t *mappable, size_t *total)
+{
+	struct drm_i915_gem_get_aperture aperture;
+	int ret;
+
+	ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_GET_APERTURE, &aperture);
+	if (ret)
+		return ret;
+
+	*mappable = 0;
+	/* XXX add a query for the kernel value? */
+	if (*mappable == 0)
+		*mappable = drm_intel_probe_agp_aperture_size(fd);
+	if (*mappable == 0)
+		*mappable = 64 * 1024 * 1024; /* minimum possible value */
+	*total = aperture.aper_size;
+	return 0;
+}
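+
+/* Usage sketch (illustrative): the mappable value is a best-effort probe
+ * (PCI BAR size, else a 64 MiB floor), so treat it as an estimate:
+ *
+ *	size_t mappable, total;
+ *	if (drm_intel_get_aperture_sizes(fd, &mappable, &total) == 0)
+ *		printf("aperture: %zu of %zu bytes mappable\n",
+ *		       mappable, total);
+ */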
diff --git a/icd/intel/kmd/libdrm/intel/intel_bufmgr.h b/icd/intel/kmd/libdrm/intel/intel_bufmgr.h
new file mode 100644
index 0000000..2dabd03
--- /dev/null
+++ b/icd/intel/kmd/libdrm/intel/intel_bufmgr.h
@@ -0,0 +1,313 @@
+/*
+ * Copyright © 2008-2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+/**
+ * @file intel_bufmgr.h
+ *
+ * Public definitions of Intel-specific bufmgr functions.
+ */
+
+#ifndef INTEL_BUFMGR_H
+#define INTEL_BUFMGR_H
+
+#include <stdio.h>
+#include <stdint.h>
+
+struct drm_clip_rect;
+
+typedef struct _drm_intel_bufmgr drm_intel_bufmgr;
+typedef struct _drm_intel_context drm_intel_context;
+typedef struct _drm_intel_bo drm_intel_bo;
+
+struct _drm_intel_bo {
+	/**
+	 * Size in bytes of the buffer object.
+	 *
+	 * The size may be larger than the size originally requested for the
+	 * allocation, such as being aligned to page size.
+	 */
+	unsigned long size;
+
+	/**
+	 * Alignment requirement for object
+	 *
+	 * Used for GTT mapping & pinning the object.
+	 */
+	unsigned long align;
+
+	/**
+	 * Deprecated field containing (possibly the low 32-bits of) the last
+	 * seen virtual card address.  Use offset64 instead.
+	 */
+	unsigned long offset;
+
+	/**
+	 * Virtual address for accessing the buffer data.  Only valid while
+	 * mapped.
+	 */
+#ifdef __cplusplus
+	void *virt;
+#else
+	void *virtual;
+#endif
+
+	/** Buffer manager context associated with this buffer object */
+	drm_intel_bufmgr *bufmgr;
+
+	/**
+	 * MM-specific handle for accessing object
+	 */
+	int handle;
+
+	/**
+	 * Last seen card virtual address (offset from the beginning of the
+	 * aperture) for the object.  This should be used to fill relocation
+	 * entries when calling drm_intel_bo_emit_reloc()
+	 */
+	uint64_t offset64;
+};
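+
+/* Illustrative note: offset64 is the address relocation entries should be
+ * built from. A typical pattern (sketch; emit_dword is a hypothetical batch
+ * emission helper) records the reloc and emits the presumed address in the
+ * same dword:
+ *
+ *	drm_intel_bo_emit_reloc(batch_bo, batch_used,
+ *				target_bo, delta,
+ *				I915_GEM_DOMAIN_RENDER, 0);
+ *	emit_dword(target_bo->offset64 + delta);
+ */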
+
+enum aub_dump_bmp_format {
+	AUB_DUMP_BMP_FORMAT_8BIT = 1,
+	AUB_DUMP_BMP_FORMAT_ARGB_4444 = 4,
+	AUB_DUMP_BMP_FORMAT_ARGB_0888 = 6,
+	AUB_DUMP_BMP_FORMAT_ARGB_8888 = 7,
+};
+
+typedef struct _drm_intel_aub_annotation {
+	uint32_t type;
+	uint32_t subtype;
+	uint32_t ending_offset;
+} drm_intel_aub_annotation;
+
+#define BO_ALLOC_FOR_RENDER (1<<0)
+
+drm_intel_bo *drm_intel_bo_alloc(drm_intel_bufmgr *bufmgr, const char *name,
+				 unsigned long size, unsigned int alignment);
+drm_intel_bo *drm_intel_bo_alloc_for_render(drm_intel_bufmgr *bufmgr,
+					    const char *name,
+					    unsigned long size,
+					    unsigned int alignment);
+drm_intel_bo *drm_intel_bo_alloc_userptr(drm_intel_bufmgr *bufmgr,
+					const char *name,
+					void *addr, uint32_t tiling_mode,
+					uint32_t stride, unsigned long size,
+					unsigned long flags);
+drm_intel_bo *drm_intel_bo_alloc_tiled(drm_intel_bufmgr *bufmgr,
+				       const char *name,
+				       int x, int y, int cpp,
+				       uint32_t *tiling_mode,
+				       unsigned long *pitch,
+				       unsigned long flags);
+void drm_intel_bo_reference(drm_intel_bo *bo);
+void drm_intel_bo_unreference(drm_intel_bo *bo);
+int drm_intel_bo_map(drm_intel_bo *bo, int write_enable);
+int drm_intel_bo_unmap(drm_intel_bo *bo);
+
+int drm_intel_bo_subdata(drm_intel_bo *bo, unsigned long offset,
+			 unsigned long size, const void *data);
+int drm_intel_bo_get_subdata(drm_intel_bo *bo, unsigned long offset,
+			     unsigned long size, void *data);
+void drm_intel_bo_wait_rendering(drm_intel_bo *bo);
+
+void drm_intel_bufmgr_set_debug(drm_intel_bufmgr *bufmgr, int enable_debug);
+void drm_intel_bufmgr_destroy(drm_intel_bufmgr *bufmgr);
+int drm_intel_bo_exec(drm_intel_bo *bo, int used,
+		      struct drm_clip_rect *cliprects, int num_cliprects, int DR4);
+int drm_intel_bo_mrb_exec(drm_intel_bo *bo, int used,
+			struct drm_clip_rect *cliprects, int num_cliprects, int DR4,
+			unsigned int flags);
+int drm_intel_bufmgr_check_aperture_space(drm_intel_bo ** bo_array, int count);
+
+int drm_intel_bo_emit_reloc(drm_intel_bo *bo, uint32_t offset,
+			    drm_intel_bo *target_bo, uint32_t target_offset,
+			    uint32_t read_domains, uint32_t write_domain);
+int drm_intel_bo_emit_reloc_fence(drm_intel_bo *bo, uint32_t offset,
+				  drm_intel_bo *target_bo,
+				  uint32_t target_offset,
+				  uint32_t read_domains, uint32_t write_domain);
+int drm_intel_bo_pin(drm_intel_bo *bo, uint32_t alignment);
+int drm_intel_bo_unpin(drm_intel_bo *bo);
+int drm_intel_bo_set_tiling(drm_intel_bo *bo, uint32_t * tiling_mode,
+			    uint32_t stride);
+int drm_intel_bo_get_tiling(drm_intel_bo *bo, uint32_t * tiling_mode,
+			    uint32_t * swizzle_mode);
+int drm_intel_bo_flink(drm_intel_bo *bo, uint32_t * name);
+int drm_intel_bo_busy(drm_intel_bo *bo);
+int drm_intel_bo_madvise(drm_intel_bo *bo, int madv);
+
+int drm_intel_bo_disable_reuse(drm_intel_bo *bo);
+int drm_intel_bo_is_reusable(drm_intel_bo *bo);
+int drm_intel_bo_references(drm_intel_bo *bo, drm_intel_bo *target_bo);
+
+/* drm_intel_bufmgr_gem.c */
+drm_intel_bufmgr *drm_intel_bufmgr_gem_init(int fd, int batch_size);
+drm_intel_bo *drm_intel_bo_gem_create_from_name(drm_intel_bufmgr *bufmgr,
+						const char *name,
+						unsigned int handle);
+void drm_intel_bufmgr_gem_enable_reuse(drm_intel_bufmgr *bufmgr);
+void drm_intel_bufmgr_gem_enable_fenced_relocs(drm_intel_bufmgr *bufmgr);
+void drm_intel_bufmgr_gem_set_vma_cache_size(drm_intel_bufmgr *bufmgr,
+					     int limit);
+int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo);
+int drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo);
+
+int drm_intel_gem_bo_map_unsynchronized_non_gtt(drm_intel_bo *bo);
+
+int drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo);
+void drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start);
+void drm_intel_gem_bo_start_gtt_access(drm_intel_bo *bo, int write_enable);
+
+void
+drm_intel_bufmgr_gem_set_aub_filename(drm_intel_bufmgr *bufmgr,
+				      const char *filename);
+void drm_intel_bufmgr_gem_set_aub_dump(drm_intel_bufmgr *bufmgr, int enable);
+void drm_intel_gem_bo_aub_dump_bmp(drm_intel_bo *bo,
+				   int x1, int y1, int width, int height,
+				   enum aub_dump_bmp_format format,
+				   int pitch, int offset);
+void
+drm_intel_bufmgr_gem_set_aub_annotations(drm_intel_bo *bo,
+					 drm_intel_aub_annotation *annotations,
+					 unsigned count);
+
+int drm_intel_get_pipe_from_crtc_id(drm_intel_bufmgr *bufmgr, int crtc_id);
+
+int drm_intel_get_aperture_sizes(int fd, size_t *mappable, size_t *total);
+int drm_intel_bufmgr_gem_get_devid(drm_intel_bufmgr *bufmgr);
+int drm_intel_gem_bo_wait(drm_intel_bo *bo, int64_t timeout_ns);
+
+drm_intel_context *drm_intel_gem_context_create(drm_intel_bufmgr *bufmgr);
+void drm_intel_gem_context_destroy(drm_intel_context *ctx);
+int drm_intel_gem_bo_context_exec(drm_intel_bo *bo, drm_intel_context *ctx,
+				  int used, unsigned int flags);
+
+int drm_intel_bo_gem_export_to_prime(drm_intel_bo *bo, int *prime_fd);
+drm_intel_bo *drm_intel_bo_gem_create_from_prime(drm_intel_bufmgr *bufmgr,
+						int prime_fd, int size);
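+
+/* Lifecycle sketch (illustrative) using the API declared above; fd is an
+ * open DRM device and error handling is elided:
+ *
+ *	drm_intel_bufmgr *bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
+ *	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
+ *	drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "scratch", 4096, 4096);
+ *	drm_intel_bo_map(bo, 1);
+ *	memset(bo->virtual, 0, 4096);
+ *	drm_intel_bo_unmap(bo);
+ *	drm_intel_bo_unreference(bo);
+ *	drm_intel_bufmgr_destroy(bufmgr);
+ */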
+
+/* drm_intel_bufmgr_fake.c */
+drm_intel_bufmgr *drm_intel_bufmgr_fake_init(int fd,
+					     unsigned long low_offset,
+					     void *low_virtual,
+					     unsigned long size,
+					     volatile unsigned int
+					     *last_dispatch);
+void drm_intel_bufmgr_fake_set_last_dispatch(drm_intel_bufmgr *bufmgr,
+					     volatile unsigned int
+					     *last_dispatch);
+void drm_intel_bufmgr_fake_set_exec_callback(drm_intel_bufmgr *bufmgr,
+					     int (*exec) (drm_intel_bo *bo,
+							  unsigned int used,
+							  void *priv),
+					     void *priv);
+void drm_intel_bufmgr_fake_set_fence_callback(drm_intel_bufmgr *bufmgr,
+					      unsigned int (*emit) (void *priv),
+					      void (*wait) (unsigned int fence,
+							    void *priv),
+					      void *priv);
+drm_intel_bo *drm_intel_bo_fake_alloc_static(drm_intel_bufmgr *bufmgr,
+					     const char *name,
+					     unsigned long offset,
+					     unsigned long size, void *virt);
+void drm_intel_bo_fake_disable_backing_store(drm_intel_bo *bo,
+					     void (*invalidate_cb) (drm_intel_bo
+								    * bo,
+								    void *ptr),
+					     void *ptr);
+
+void drm_intel_bufmgr_fake_contended_lock_take(drm_intel_bufmgr *bufmgr);
+void drm_intel_bufmgr_fake_evict_all(drm_intel_bufmgr *bufmgr);
+
+struct drm_intel_decode *drm_intel_decode_context_alloc(uint32_t devid);
+void drm_intel_decode_context_free(struct drm_intel_decode *ctx);
+void drm_intel_decode_set_batch_pointer(struct drm_intel_decode *ctx,
+					void *data, uint32_t hw_offset,
+					int count);
+void drm_intel_decode_set_dump_past_end(struct drm_intel_decode *ctx,
+					int dump_past_end);
+void drm_intel_decode_set_head_tail(struct drm_intel_decode *ctx,
+				    uint32_t head, uint32_t tail);
+void drm_intel_decode_set_output_file(struct drm_intel_decode *ctx, FILE *out);
+void drm_intel_decode(struct drm_intel_decode *ctx);
+
+int drm_intel_reg_read(drm_intel_bufmgr *bufmgr,
+		       uint32_t offset,
+		       uint64_t *result);
+
+int drm_intel_get_reset_stats(drm_intel_context *ctx,
+			      uint32_t *reset_count,
+			      uint32_t *active,
+			      uint32_t *pending);
+
+int drm_intel_get_subslice_total(int fd, unsigned int *subslice_total);
+int drm_intel_get_eu_total(int fd, unsigned int *eu_total);
+
+/** @{ Compatibility defines to keep old code building despite the symbol rename
+ * from dri_* to drm_intel_*
+ */
+#define dri_bo drm_intel_bo
+#define dri_bufmgr drm_intel_bufmgr
+#define dri_bo_alloc drm_intel_bo_alloc
+#define dri_bo_reference drm_intel_bo_reference
+#define dri_bo_unreference drm_intel_bo_unreference
+#define dri_bo_map drm_intel_bo_map
+#define dri_bo_unmap drm_intel_bo_unmap
+#define dri_bo_subdata drm_intel_bo_subdata
+#define dri_bo_get_subdata drm_intel_bo_get_subdata
+#define dri_bo_wait_rendering drm_intel_bo_wait_rendering
+#define dri_bufmgr_set_debug drm_intel_bufmgr_set_debug
+#define dri_bufmgr_destroy drm_intel_bufmgr_destroy
+#define dri_bo_exec drm_intel_bo_exec
+#define dri_bufmgr_check_aperture_space drm_intel_bufmgr_check_aperture_space
+#define dri_bo_emit_reloc(reloc_bo, read, write, target_offset,		\
+			  reloc_offset, target_bo)			\
+	drm_intel_bo_emit_reloc(reloc_bo, reloc_offset,			\
+				target_bo, target_offset,		\
+				read, write);
+#define dri_bo_pin drm_intel_bo_pin
+#define dri_bo_unpin drm_intel_bo_unpin
+#define dri_bo_get_tiling drm_intel_bo_get_tiling
+#define dri_bo_set_tiling(bo, mode) drm_intel_bo_set_tiling(bo, mode, 0)
+#define dri_bo_flink drm_intel_bo_flink
+#define intel_bufmgr_gem_init drm_intel_bufmgr_gem_init
+#define intel_bo_gem_create_from_name drm_intel_bo_gem_create_from_name
+#define intel_bufmgr_gem_enable_reuse drm_intel_bufmgr_gem_enable_reuse
+#define intel_bufmgr_fake_init drm_intel_bufmgr_fake_init
+#define intel_bufmgr_fake_set_last_dispatch drm_intel_bufmgr_fake_set_last_dispatch
+#define intel_bufmgr_fake_set_exec_callback drm_intel_bufmgr_fake_set_exec_callback
+#define intel_bufmgr_fake_set_fence_callback drm_intel_bufmgr_fake_set_fence_callback
+#define intel_bo_fake_alloc_static drm_intel_bo_fake_alloc_static
+#define intel_bo_fake_disable_backing_store drm_intel_bo_fake_disable_backing_store
+#define intel_bufmgr_fake_contended_lock_take drm_intel_bufmgr_fake_contended_lock_take
+#define intel_bufmgr_fake_evict_all drm_intel_bufmgr_fake_evict_all
+
+/** @} */
+
+#endif /* INTEL_BUFMGR_H */
diff --git a/icd/intel/kmd/libdrm/intel/intel_bufmgr_gem.c b/icd/intel/kmd/libdrm/intel/intel_bufmgr_gem.c
new file mode 100644
index 0000000..6743226
--- /dev/null
+++ b/icd/intel/kmd/libdrm/intel/intel_bufmgr_gem.c
@@ -0,0 +1,3740 @@
+/**************************************************************************
+ *
+ * Copyright © 2007 Red Hat Inc.
+ * Copyright © 2007-2012 Intel Corporation
+ * Copyright 2006 Tungsten Graphics, Inc., Bismarck, ND., USA
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ *
+ **************************************************************************/
+/*
+ * Authors: Thomas Hellström <thomas-at-tungstengraphics-dot-com>
+ *          Keith Whitwell <keithw-at-tungstengraphics-dot-com>
+ *	    Eric Anholt <eric@anholt.net>
+ *	    Dave Airlie <airlied@linux.ie>
+ */
+
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
+#include <xf86drm.h>
+#include <xf86atomic.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <assert.h>
+#include <pthread.h>
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <stdbool.h>
+
+#include "errno.h"
+#ifndef ETIME
+#define ETIME ETIMEDOUT
+#endif
+#include "libdrm.h"
+#include "libdrm_lists.h"
+#include "intel_bufmgr.h"
+#include "intel_bufmgr_priv.h"
+#include "intel_chipset.h"
+#include "intel_aub.h"
+#include "string.h"
+
+#include "i915_drm.h"
+
+#ifdef HAVE_VALGRIND
+#include <valgrind.h>
+#include <memcheck.h>
+#define VG(x) x
+#else
+#define VG(x)
+#endif
+
+#define memclear(s) memset(&s, 0, sizeof(s))
+
+#define DBG(...) do {					\
+	if (bufmgr_gem->bufmgr.debug)			\
+		fprintf(stderr, __VA_ARGS__);		\
+} while (0)
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
+typedef struct _drm_intel_bo_gem drm_intel_bo_gem;
+
+struct drm_intel_gem_bo_bucket {
+	drmMMListHead head;
+	unsigned long size;
+};
+
+typedef struct _drm_intel_bufmgr_gem {
+	drm_intel_bufmgr bufmgr;
+
+	atomic_t refcount;
+
+	int fd;
+
+	int max_relocs;
+
+	pthread_mutex_t lock;
+
+	struct drm_i915_gem_exec_object *exec_objects;
+	struct drm_i915_gem_exec_object2 *exec2_objects;
+	drm_intel_bo **exec_bos;
+	int exec_size;
+	int exec_count;
+
+	/** Array of lists of cached gem objects of power-of-two sizes */
+	struct drm_intel_gem_bo_bucket cache_bucket[14 * 4];
+	int num_buckets;
+	time_t time;
+
+	drmMMListHead managers;
+
+	drmMMListHead named;
+	drmMMListHead vma_cache;
+	int vma_count, vma_open, vma_max;
+
+	uint64_t gtt_size;
+	int available_fences;
+	int pci_device;
+	int gen;
+	unsigned int has_bsd : 1;
+	unsigned int has_blt : 1;
+	unsigned int has_relaxed_fencing : 1;
+	unsigned int has_llc : 1;
+	unsigned int has_wait_timeout : 1;
+	unsigned int bo_reuse : 1;
+	unsigned int no_exec : 1;
+	unsigned int has_vebox : 1;
+	bool fenced_relocs;
+
+	char *aub_filename;
+	FILE *aub_file;
+	uint32_t aub_offset;
+} drm_intel_bufmgr_gem;
+
+#define DRM_INTEL_RELOC_FENCE (1<<0)
+
+typedef struct _drm_intel_reloc_target_info {
+	drm_intel_bo *bo;
+	int flags;
+} drm_intel_reloc_target;
+
+struct _drm_intel_bo_gem {
+	drm_intel_bo bo;
+
+	atomic_t refcount;
+	uint32_t gem_handle;
+	const char *name;
+
+	/**
+	 * Kernel-assigned global name for this object
+	 *
+	 * List contains both flink named and prime fd'd objects
+	 */
+	unsigned int global_name;
+	drmMMListHead name_list;
+
+	/**
+	 * Index of the buffer within the validation list while preparing a
+	 * batchbuffer execution.
+	 */
+	int validate_index;
+
+	/**
+	 * Current tiling mode
+	 */
+	uint32_t tiling_mode;
+	uint32_t swizzle_mode;
+	unsigned long stride;
+
+	time_t free_time;
+
+	/** Array passed to the DRM containing relocation information. */
+	struct drm_i915_gem_relocation_entry *relocs;
+	/**
+	 * Array of info structs corresponding to relocs[i].target_handle etc
+	 */
+	drm_intel_reloc_target *reloc_target_info;
+	/** Number of entries in relocs */
+	int reloc_count;
+	/** Mapped address for the buffer, saved across map/unmap cycles */
+	void *mem_virtual;
+	/** GTT virtual address for the buffer, saved across map/unmap cycles */
+	void *gtt_virtual;
+	/**
+	 * Virtual address of the buffer allocated by user, used for userptr
+	 * objects only.
+	 */
+	void *user_virtual;
+	int map_count;
+	drmMMListHead vma_list;
+
+	/** BO cache list */
+	drmMMListHead head;
+
+	/**
+	 * Boolean of whether this BO and its children have been included in
+	 * the current drm_intel_bufmgr_check_aperture_space() total.
+	 */
+	bool included_in_check_aperture;
+
+	/**
+	 * Boolean of whether this buffer has been used as a relocation
+	 * target and had its size accounted for, and thus can't have any
+	 * further relocations added to it.
+	 */
+	bool used_as_reloc_target;
+
+	/**
+	 * Boolean of whether we have encountered an error whilst building the relocation tree.
+	 */
+	bool has_error;
+
+	/**
+	 * Boolean of whether this buffer can be re-used
+	 */
+	bool reusable;
+
+	/**
+	 * Boolean of whether the GPU is definitely not accessing the buffer.
+	 *
+	 * This is only valid when reusable, since non-reusable
+	 * buffers are those that have been shared with other
+	 * processes, so we don't know their state.
+	 */
+	bool idle;
+
+	/**
+	 * Boolean of whether this buffer was allocated with userptr
+	 */
+	bool is_userptr;
+
+	/**
+	 * Size in bytes of this buffer and its relocation descendants.
+	 *
+	 * Used to avoid costly tree walking in
+	 * drm_intel_bufmgr_check_aperture_space() in the common case.
+	 */
+	int reloc_tree_size;
+
+	/**
+	 * Number of potential fence registers required by this buffer and its
+	 * relocations.
+	 */
+	int reloc_tree_fences;
+
+	/** Whether we may need to do the SW_FINISH ioctl on unmap. */
+	bool mapped_cpu_write;
+
+	uint32_t aub_offset;
+
+	drm_intel_aub_annotation *aub_annotations;
+	unsigned aub_annotation_count;
+};
+
+static unsigned int
+drm_intel_gem_estimate_batch_space(drm_intel_bo ** bo_array, int count);
+
+static unsigned int
+drm_intel_gem_compute_batch_space(drm_intel_bo ** bo_array, int count);
+
+static int
+drm_intel_gem_bo_get_tiling(drm_intel_bo *bo, uint32_t * tiling_mode,
+			    uint32_t * swizzle_mode);
+
+static int
+drm_intel_gem_bo_set_tiling_internal(drm_intel_bo *bo,
+				     uint32_t tiling_mode,
+				     uint32_t stride);
+
+static void drm_intel_gem_bo_unreference_locked_timed(drm_intel_bo *bo,
+						      time_t time);
+
+static void drm_intel_gem_bo_unreference(drm_intel_bo *bo);
+
+static void drm_intel_gem_bo_free(drm_intel_bo *bo);
+
+static unsigned long
+drm_intel_gem_bo_tile_size(drm_intel_bufmgr_gem *bufmgr_gem, unsigned long size,
+			   uint32_t *tiling_mode)
+{
+	unsigned long min_size, max_size;
+	unsigned long i;
+
+	if (*tiling_mode == I915_TILING_NONE)
+		return size;
+
+	/* 965+ just need multiples of page size for tiling */
+	if (bufmgr_gem->gen >= 4)
+		return ROUND_UP_TO(size, 4096);
+
+	/* Older chips need powers of two, of at least 512k or 1M */
+	if (bufmgr_gem->gen == 3) {
+		min_size = 1024*1024;
+		max_size = 128*1024*1024;
+	} else {
+		min_size = 512*1024;
+		max_size = 64*1024*1024;
+	}
+
+	if (size > max_size) {
+		*tiling_mode = I915_TILING_NONE;
+		return size;
+	}
+
+	/* Do we need to allocate every page for the fence? */
+	if (bufmgr_gem->has_relaxed_fencing)
+		return ROUND_UP_TO(size, 4096);
+
+	for (i = min_size; i < size; i <<= 1)
+		;
+
+	return i;
+}
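+
+/* Worked example (illustrative): a 600 KiB (614400-byte) X-tiled request on
+ * gen3 rounds up to the next power of two at or above the 1 MiB minimum, so
+ * this returns 1 MiB; on gen4+ the same request only rounds to the
+ * 4096-byte page size and stays at 614400 bytes.
+ */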
+
+/*
+ * Round a given pitch up to the minimum required for X tiling on a
+ * given chip.  We use 512 as the minimum to allow for a later tiling
+ * change.
+ */
+static unsigned long
+drm_intel_gem_bo_tile_pitch(drm_intel_bufmgr_gem *bufmgr_gem,
+			    unsigned long pitch, uint32_t *tiling_mode)
+{
+	unsigned long tile_width;
+	unsigned long i;
+
+	/* If untiled, then just align it so that we can do rendering
+	 * to it with the 3D engine.
+	 */
+	if (*tiling_mode == I915_TILING_NONE)
+		return ALIGN(pitch, 64);
+
+	if (*tiling_mode == I915_TILING_X
+			|| (IS_915(bufmgr_gem->pci_device)
+			    && *tiling_mode == I915_TILING_Y))
+		tile_width = 512;
+	else
+		tile_width = 128;
+
+	/* 965 is flexible */
+	if (bufmgr_gem->gen >= 4)
+		return ROUND_UP_TO(pitch, tile_width);
+
+	/* The older hardware has a maximum pitch of 8192 with tiled
+	 * surfaces, so fallback to untiled if it's too large.
+	 */
+	if (pitch > 8192) {
+		*tiling_mode = I915_TILING_NONE;
+		return ALIGN(pitch, 64);
+	}
+
+	/* Pre-965 needs power of two tile width */
+	for (i = tile_width; i < pitch; i <<= 1)
+		;
+
+	return i;
+}
+
+static struct drm_intel_gem_bo_bucket *
+drm_intel_gem_bo_bucket_for_size(drm_intel_bufmgr_gem *bufmgr_gem,
+				 unsigned long size)
+{
+	int i;
+
+	for (i = 0; i < bufmgr_gem->num_buckets; i++) {
+		struct drm_intel_gem_bo_bucket *bucket =
+		    &bufmgr_gem->cache_bucket[i];
+		if (bucket->size >= size) {
+			return bucket;
+		}
+	}
+
+	return NULL;
+}
+
+static void
+drm_intel_gem_dump_validation_list(drm_intel_bufmgr_gem *bufmgr_gem)
+{
+	int i, j;
+
+	for (i = 0; i < bufmgr_gem->exec_count; i++) {
+		drm_intel_bo *bo = bufmgr_gem->exec_bos[i];
+		drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+		if (bo_gem->relocs == NULL) {
+			DBG("%2d: %d (%s)\n", i, bo_gem->gem_handle,
+			    bo_gem->name);
+			continue;
+		}
+
+		for (j = 0; j < bo_gem->reloc_count; j++) {
+			drm_intel_bo *target_bo = bo_gem->reloc_target_info[j].bo;
+			drm_intel_bo_gem *target_gem =
+			    (drm_intel_bo_gem *) target_bo;
+
+			DBG("%2d: %d (%s)@0x%08llx -> "
+			    "%d (%s)@0x%08lx + 0x%08x\n",
+			    i,
+			    bo_gem->gem_handle, bo_gem->name,
+			    (unsigned long long)bo_gem->relocs[j].offset,
+			    target_gem->gem_handle,
+			    target_gem->name,
+			    target_bo->offset64,
+			    bo_gem->relocs[j].delta);
+		}
+	}
+}
+
+static inline void
+drm_intel_gem_bo_reference(drm_intel_bo *bo)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+	atomic_inc(&bo_gem->refcount);
+}
+
+/**
+ * Adds the given buffer to the list of buffers to be validated (moved into the
+ * appropriate memory type) with the next batch submission.
+ *
+ * If a buffer is validated multiple times in a batch submission, it ends up
+ * with the intersection of the memory type flags and the union of the
+ * access flags.
+ */
+static void
+drm_intel_add_validate_buffer(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int index;
+
+	if (bo_gem->validate_index != -1)
+		return;
+
+	/* Extend the array of validation entries as necessary. */
+	if (bufmgr_gem->exec_count == bufmgr_gem->exec_size) {
+		int new_size = bufmgr_gem->exec_size * 2;
+
+		if (new_size == 0)
+			new_size = 5;
+
+		bufmgr_gem->exec_objects =
+		    realloc(bufmgr_gem->exec_objects,
+			    sizeof(*bufmgr_gem->exec_objects) * new_size);
+		bufmgr_gem->exec_bos =
+		    realloc(bufmgr_gem->exec_bos,
+			    sizeof(*bufmgr_gem->exec_bos) * new_size);
+		bufmgr_gem->exec_size = new_size;
+	}
+
+	index = bufmgr_gem->exec_count;
+	bo_gem->validate_index = index;
+	/* Fill in array entry */
+	bufmgr_gem->exec_objects[index].handle = bo_gem->gem_handle;
+	bufmgr_gem->exec_objects[index].relocation_count = bo_gem->reloc_count;
+	bufmgr_gem->exec_objects[index].relocs_ptr = (uintptr_t) bo_gem->relocs;
+	bufmgr_gem->exec_objects[index].alignment = 0;
+	bufmgr_gem->exec_objects[index].offset = 0;
+	bufmgr_gem->exec_bos[index] = bo;
+	bufmgr_gem->exec_count++;
+}
+
+static void
+drm_intel_add_validate_buffer2(drm_intel_bo *bo, int need_fence)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *)bo;
+	int index;
+
+	if (bo_gem->validate_index != -1) {
+		if (need_fence)
+			bufmgr_gem->exec2_objects[bo_gem->validate_index].flags |=
+				EXEC_OBJECT_NEEDS_FENCE;
+		return;
+	}
+
+	/* Extend the array of validation entries as necessary. */
+	if (bufmgr_gem->exec_count == bufmgr_gem->exec_size) {
+		int new_size = bufmgr_gem->exec_size * 2;
+
+		if (new_size == 0)
+			new_size = 5;
+
+		bufmgr_gem->exec2_objects =
+			realloc(bufmgr_gem->exec2_objects,
+				sizeof(*bufmgr_gem->exec2_objects) * new_size);
+		bufmgr_gem->exec_bos =
+			realloc(bufmgr_gem->exec_bos,
+				sizeof(*bufmgr_gem->exec_bos) * new_size);
+		bufmgr_gem->exec_size = new_size;
+	}
+
+	index = bufmgr_gem->exec_count;
+	bo_gem->validate_index = index;
+	/* Fill in array entry */
+	bufmgr_gem->exec2_objects[index].handle = bo_gem->gem_handle;
+	bufmgr_gem->exec2_objects[index].relocation_count = bo_gem->reloc_count;
+	bufmgr_gem->exec2_objects[index].relocs_ptr = (uintptr_t)bo_gem->relocs;
+	bufmgr_gem->exec2_objects[index].alignment = 0;
+	bufmgr_gem->exec2_objects[index].offset = 0;
+	bufmgr_gem->exec_bos[index] = bo;
+	bufmgr_gem->exec2_objects[index].flags = 0;
+	bufmgr_gem->exec2_objects[index].rsvd1 = 0;
+	bufmgr_gem->exec2_objects[index].rsvd2 = 0;
+	if (need_fence) {
+		bufmgr_gem->exec2_objects[index].flags |=
+			EXEC_OBJECT_NEEDS_FENCE;
+	}
+	bufmgr_gem->exec_count++;
+}
+
+#define RELOC_BUF_SIZE(x) ((I915_RELOC_HEADER + x * I915_RELOC0_STRIDE) * \
+	sizeof(uint32_t))
+
+static void
+drm_intel_bo_gem_set_in_aperture_size(drm_intel_bufmgr_gem *bufmgr_gem,
+				      drm_intel_bo_gem *bo_gem)
+{
+	int size;
+
+	assert(!bo_gem->used_as_reloc_target);
+
+	/* The older chipsets are far less flexible in terms of tiling,
+	 * and require tiled buffers to be size-aligned in the aperture.
+	 * This means that in the worst possible case we will need a hole
+	 * twice as large as the object in order for it to fit into the
+	 * aperture. Optimal packing is for wimps.
+	 */
+	size = bo_gem->bo.size;
+	if (bufmgr_gem->gen < 4 && bo_gem->tiling_mode != I915_TILING_NONE) {
+		int min_size;
+
+		if (bufmgr_gem->has_relaxed_fencing) {
+			if (bufmgr_gem->gen == 3)
+				min_size = 1024*1024;
+			else
+				min_size = 512*1024;
+
+			while (min_size < size)
+				min_size *= 2;
+		} else
+			min_size = size;
+
+		/* Account for worst-case alignment. */
+		size = 2 * min_size;
+	}
+
+	bo_gem->reloc_tree_size = size;
+}
+
+static int
+drm_intel_setup_reloc_list(drm_intel_bo *bo)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	unsigned int max_relocs = bufmgr_gem->max_relocs;
+
+	if (bo->size / 4 < max_relocs)
+		max_relocs = bo->size / 4;
+
+	bo_gem->relocs = malloc(max_relocs *
+				sizeof(struct drm_i915_gem_relocation_entry));
+	bo_gem->reloc_target_info = malloc(max_relocs *
+					   sizeof(drm_intel_reloc_target));
+	if (bo_gem->relocs == NULL || bo_gem->reloc_target_info == NULL) {
+		bo_gem->has_error = true;
+
+		free(bo_gem->relocs);
+		bo_gem->relocs = NULL;
+
+		free(bo_gem->reloc_target_info);
+		bo_gem->reloc_target_info = NULL;
+
+		return 1;
+	}
+
+	return 0;
+}
+
+static int
+drm_intel_gem_bo_busy(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_busy busy;
+	int ret;
+
+	if (bo_gem->reusable && bo_gem->idle)
+		return false;
+
+	memclear(busy);
+	busy.handle = bo_gem->gem_handle;
+
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GEM_BUSY, &busy);
+	if (ret == 0) {
+		bo_gem->idle = !busy.busy;
+		return busy.busy;
+	}
+
+	return false;
+}
+
+static int
+drm_intel_gem_bo_madvise_internal(drm_intel_bufmgr_gem *bufmgr_gem,
+				  drm_intel_bo_gem *bo_gem, int state)
+{
+	struct drm_i915_gem_madvise madv;
+
+	memclear(madv);
+	madv.handle = bo_gem->gem_handle;
+	madv.madv = state;
+	madv.retained = 1;
+	drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GEM_MADVISE, &madv);
+
+	return madv.retained;
+}
+
+static int
+drm_intel_gem_bo_madvise(drm_intel_bo *bo, int madv)
+{
+	return drm_intel_gem_bo_madvise_internal
+		((drm_intel_bufmgr_gem *) bo->bufmgr,
+		 (drm_intel_bo_gem *) bo,
+		 madv);
+}
+
+/* drop the oldest entries that have been purged by the kernel */
+static void
+drm_intel_gem_bo_cache_purge_bucket(drm_intel_bufmgr_gem *bufmgr_gem,
+				    struct drm_intel_gem_bo_bucket *bucket)
+{
+	while (!DRMLISTEMPTY(&bucket->head)) {
+		drm_intel_bo_gem *bo_gem;
+
+		bo_gem = DRMLISTENTRY(drm_intel_bo_gem,
+				      bucket->head.next, head);
+		if (drm_intel_gem_bo_madvise_internal
+		    (bufmgr_gem, bo_gem, I915_MADV_DONTNEED))
+			break;
+
+		DRMLISTDEL(&bo_gem->head);
+		drm_intel_gem_bo_free(&bo_gem->bo);
+	}
+}
+
+static drm_intel_bo *
+drm_intel_gem_bo_alloc_internal(drm_intel_bufmgr *bufmgr,
+				const char *name,
+				unsigned long size,
+				unsigned long flags,
+				uint32_t tiling_mode,
+				unsigned long stride)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bufmgr;
+	drm_intel_bo_gem *bo_gem;
+	unsigned int page_size = getpagesize();
+	int ret;
+	struct drm_intel_gem_bo_bucket *bucket;
+	bool alloc_from_cache;
+	unsigned long bo_size;
+	bool for_render = false;
+
+	if (flags & BO_ALLOC_FOR_RENDER)
+		for_render = true;
+
+	/* Round the allocated size up to a power of two number of pages. */
+	bucket = drm_intel_gem_bo_bucket_for_size(bufmgr_gem, size);
+
+	/* If we don't have caching at this size, don't actually round the
+	 * allocation up.
+	 */
+	if (bucket == NULL) {
+		bo_size = size;
+		if (bo_size < page_size)
+			bo_size = page_size;
+	} else {
+		bo_size = bucket->size;
+	}
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+	/* Get a buffer out of the cache if available */
+retry:
+	alloc_from_cache = false;
+	if (bucket != NULL && !DRMLISTEMPTY(&bucket->head)) {
+		if (for_render) {
+			/* Allocate new render-target BOs from the tail (MRU)
+			 * of the list, as it will likely be hot in the GPU
+			 * cache and in the aperture for us.
+			 */
+			bo_gem = DRMLISTENTRY(drm_intel_bo_gem,
+					      bucket->head.prev, head);
+			DRMLISTDEL(&bo_gem->head);
+			alloc_from_cache = true;
+		} else {
+			/* For non-render-target BOs (where we're probably
+			 * going to map it first thing in order to fill it
+			 * with data), check if the last BO in the cache is
+			 * unbusy, and only reuse in that case. Otherwise,
+			 * allocating a new buffer is probably faster than
+			 * waiting for the GPU to finish.
+			 */
+			bo_gem = DRMLISTENTRY(drm_intel_bo_gem,
+					      bucket->head.next, head);
+			if (!drm_intel_gem_bo_busy(&bo_gem->bo)) {
+				alloc_from_cache = true;
+				DRMLISTDEL(&bo_gem->head);
+			}
+		}
+
+		if (alloc_from_cache) {
+			if (!drm_intel_gem_bo_madvise_internal
+			    (bufmgr_gem, bo_gem, I915_MADV_WILLNEED)) {
+				drm_intel_gem_bo_free(&bo_gem->bo);
+				drm_intel_gem_bo_cache_purge_bucket(bufmgr_gem,
+								    bucket);
+				goto retry;
+			}
+
+			if (drm_intel_gem_bo_set_tiling_internal(&bo_gem->bo,
+								 tiling_mode,
+								 stride)) {
+				drm_intel_gem_bo_free(&bo_gem->bo);
+				goto retry;
+			}
+		}
+	}
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	if (!alloc_from_cache) {
+		struct drm_i915_gem_create create;
+
+		bo_gem = calloc(1, sizeof(*bo_gem));
+		if (!bo_gem)
+			return NULL;
+
+		bo_gem->bo.size = bo_size;
+
+		memclear(create);
+		create.size = bo_size;
+
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_CREATE,
+			       &create);
+		bo_gem->gem_handle = create.handle;
+		bo_gem->bo.handle = bo_gem->gem_handle;
+		if (ret != 0) {
+			free(bo_gem);
+			return NULL;
+		}
+		bo_gem->bo.bufmgr = bufmgr;
+
+		bo_gem->tiling_mode = I915_TILING_NONE;
+		bo_gem->swizzle_mode = I915_BIT_6_SWIZZLE_NONE;
+		bo_gem->stride = 0;
+
+		/* drm_intel_gem_bo_free calls DRMLISTDEL() for an uninitialized
+		   list (vma_list), so better set the list head here */
+		DRMINITLISTHEAD(&bo_gem->name_list);
+		DRMINITLISTHEAD(&bo_gem->vma_list);
+		if (drm_intel_gem_bo_set_tiling_internal(&bo_gem->bo,
+							 tiling_mode,
+							 stride)) {
+		    drm_intel_gem_bo_free(&bo_gem->bo);
+		    return NULL;
+		}
+	}
+
+	bo_gem->name = name;
+	atomic_set(&bo_gem->refcount, 1);
+	bo_gem->validate_index = -1;
+	bo_gem->reloc_tree_fences = 0;
+	bo_gem->used_as_reloc_target = false;
+	bo_gem->has_error = false;
+	bo_gem->reusable = true;
+	bo_gem->aub_annotations = NULL;
+	bo_gem->aub_annotation_count = 0;
+
+	drm_intel_bo_gem_set_in_aperture_size(bufmgr_gem, bo_gem);
+
+	DBG("bo_create: buf %d (%s) %ldb\n",
+	    bo_gem->gem_handle, bo_gem->name, size);
+
+	return &bo_gem->bo;
+}
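+
+/* Illustrative sketch (not part of this change): callers reach the cached
+ * allocation path above through the public wrappers.  A typical
+ * allocate/use/release cycle, assuming an initialized bufmgr with reuse
+ * enabled, looks like:
+ *
+ *	drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "scratch", 8192, 4096);
+ *	if (bo != NULL) {
+ *		// ... use the buffer ...
+ *		drm_intel_bo_unreference(bo);	// may land in a cache bucket
+ *	}
+ *
+ * The unreference parks the BO in the bucket chosen by
+ * drm_intel_gem_bo_bucket_for_size(), so a later same-size alloc can be
+ * satisfied without another DRM_IOCTL_I915_GEM_CREATE.
+ */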
+
+static drm_intel_bo *
+drm_intel_gem_bo_alloc_for_render(drm_intel_bufmgr *bufmgr,
+				  const char *name,
+				  unsigned long size,
+				  unsigned int alignment)
+{
+	return drm_intel_gem_bo_alloc_internal(bufmgr, name, size,
+					       BO_ALLOC_FOR_RENDER,
+					       I915_TILING_NONE, 0);
+}
+
+static drm_intel_bo *
+drm_intel_gem_bo_alloc(drm_intel_bufmgr *bufmgr,
+		       const char *name,
+		       unsigned long size,
+		       unsigned int alignment)
+{
+	return drm_intel_gem_bo_alloc_internal(bufmgr, name, size, 0,
+					       I915_TILING_NONE, 0);
+}
+
+static drm_intel_bo *
+drm_intel_gem_bo_alloc_tiled(drm_intel_bufmgr *bufmgr, const char *name,
+			     int x, int y, int cpp, uint32_t *tiling_mode,
+			     unsigned long *pitch, unsigned long flags)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bufmgr;
+	unsigned long size, stride;
+	uint32_t tiling;
+
+	do {
+		unsigned long aligned_y, height_alignment;
+
+		tiling = *tiling_mode;
+
+		/* If we're tiled, our allocations are in 8 or 32-row blocks,
+		 * so failure to align our height means that we won't allocate
+		 * enough pages.
+		 *
+		 * If we're untiled, we still have to align to 2 rows high
+		 * because the data port accesses 2x2 blocks even if the
+		 * bottom row isn't to be rendered, so failure to align means
+		 * we could walk off the end of the GTT and fault.  This is
+		 * documented on 965, and may be the case on older chipsets
+		 * too, so we try to be careful.
+		 */
+		aligned_y = y;
+		height_alignment = 2;
+
+		if ((bufmgr_gem->gen == 2) && tiling != I915_TILING_NONE)
+			height_alignment = 16;
+		else if (tiling == I915_TILING_X
+			|| (IS_915(bufmgr_gem->pci_device)
+			    && tiling == I915_TILING_Y))
+			height_alignment = 8;
+		else if (tiling == I915_TILING_Y)
+			height_alignment = 32;
+		aligned_y = ALIGN(y, height_alignment);
+
+		stride = x * cpp;
+		stride = drm_intel_gem_bo_tile_pitch(bufmgr_gem, stride, tiling_mode);
+		size = stride * aligned_y;
+		size = drm_intel_gem_bo_tile_size(bufmgr_gem, size, tiling_mode);
+	} while (*tiling_mode != tiling);
+	*pitch = stride;
+
+	if (tiling == I915_TILING_NONE)
+		stride = 0;
+
+	return drm_intel_gem_bo_alloc_internal(bufmgr, name, size, flags,
+					       tiling, stride);
+}
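+
+/* Illustrative sketch (not part of this change): the tiling mode is an
+ * in/out parameter, so callers re-check it after allocation.  For a
+ * hypothetical 1024x768, 4-byte-per-pixel surface:
+ *
+ *	uint32_t tiling = I915_TILING_X;
+ *	unsigned long pitch;
+ *	drm_intel_bo *bo = drm_intel_bo_alloc_tiled(bufmgr, "surface",
+ *						    1024, 768, 4,
+ *						    &tiling, &pitch, 0);
+ *	if (bo != NULL && tiling != I915_TILING_X)
+ *		fall_back_to_linear();	// hypothetical
+ */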
+
+static drm_intel_bo *
+drm_intel_gem_bo_alloc_userptr(drm_intel_bufmgr *bufmgr,
+				const char *name,
+				void *addr,
+				uint32_t tiling_mode,
+				uint32_t stride,
+				unsigned long size,
+				unsigned long flags)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bufmgr;
+	drm_intel_bo_gem *bo_gem;
+	int ret;
+	struct drm_i915_gem_userptr userptr;
+
+	/* Tiling with userptr surfaces is not supported
+	 * on all hardware, so refuse it for the time being.
+	 */
+	if (tiling_mode != I915_TILING_NONE)
+		return NULL;
+
+	bo_gem = calloc(1, sizeof(*bo_gem));
+	if (!bo_gem)
+		return NULL;
+
+	bo_gem->bo.size = size;
+
+	memclear(userptr);
+	userptr.user_ptr = (__u64)((unsigned long)addr);
+	userptr.user_size = size;
+	userptr.flags = flags;
+
+	ret = drmIoctl(bufmgr_gem->fd,
+			DRM_IOCTL_I915_GEM_USERPTR,
+			&userptr);
+	if (ret != 0) {
+		DBG("bo_create_userptr: "
+		    "ioctl failed with user ptr %p size 0x%lx, "
+		    "user flags 0x%lx\n", addr, size, flags);
+		free(bo_gem);
+		return NULL;
+	}
+
+	bo_gem->gem_handle = userptr.handle;
+	bo_gem->bo.handle = bo_gem->gem_handle;
+	bo_gem->bo.bufmgr    = bufmgr;
+	bo_gem->is_userptr   = true;
+	bo_gem->bo.virtual   = addr;
+	/* Save the address provided by user */
+	bo_gem->user_virtual = addr;
+	bo_gem->tiling_mode  = I915_TILING_NONE;
+	bo_gem->swizzle_mode = I915_BIT_6_SWIZZLE_NONE;
+	bo_gem->stride       = 0;
+
+	DRMINITLISTHEAD(&bo_gem->name_list);
+	DRMINITLISTHEAD(&bo_gem->vma_list);
+
+	bo_gem->name = name;
+	atomic_set(&bo_gem->refcount, 1);
+	bo_gem->validate_index = -1;
+	bo_gem->reloc_tree_fences = 0;
+	bo_gem->used_as_reloc_target = false;
+	bo_gem->has_error = false;
+	bo_gem->reusable = false;
+
+	drm_intel_bo_gem_set_in_aperture_size(bufmgr_gem, bo_gem);
+
+	DBG("bo_create_userptr: "
+	    "ptr %p buf %d (%s) size %ldb, stride 0x%x, tile mode %d\n",
+		addr, bo_gem->gem_handle, bo_gem->name,
+		size, stride, tiling_mode);
+
+	return &bo_gem->bo;
+}
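+
+/* Illustrative sketch (not part of this change): userptr BOs wrap existing
+ * page-aligned client memory, reached through the public
+ * drm_intel_bo_alloc_userptr wrapper:
+ *
+ *	size_t sz = 4 * 4096;
+ *	void *mem = NULL;
+ *	if (posix_memalign(&mem, 4096, sz) == 0) {
+ *		drm_intel_bo *bo = drm_intel_bo_alloc_userptr(bufmgr, "user",
+ *							      mem,
+ *							      I915_TILING_NONE,
+ *							      0, sz, 0);
+ *		(void)bo;
+ *	}
+ *
+ * Note that the function above rejects any tiling mode other than
+ * I915_TILING_NONE and marks the resulting BO non-reusable.
+ */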
+
+/**
+ * Returns a drm_intel_bo wrapping the given buffer object handle.
+ *
+ * This can be used when one application needs to pass a buffer object
+ * to another.
+ */
+drm_public drm_intel_bo *
+drm_intel_bo_gem_create_from_name(drm_intel_bufmgr *bufmgr,
+				  const char *name,
+				  unsigned int handle)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bufmgr;
+	drm_intel_bo_gem *bo_gem;
+	int ret;
+	struct drm_gem_open open_arg;
+	struct drm_i915_gem_get_tiling get_tiling;
+	drmMMListHead *list;
+
+	/* At the moment most applications only have a few named bos.
+	 * For instance, in a DRI client only the render buffers passed
+	 * between X and the client are named. And since X returns the
+	 * alternating names for the front/back buffers, a linear search
+	 * provides a sufficiently fast match.
+	 */
+	pthread_mutex_lock(&bufmgr_gem->lock);
+	for (list = bufmgr_gem->named.next;
+	     list != &bufmgr_gem->named;
+	     list = list->next) {
+		bo_gem = DRMLISTENTRY(drm_intel_bo_gem, list, name_list);
+		if (bo_gem->global_name == handle) {
+			drm_intel_gem_bo_reference(&bo_gem->bo);
+			pthread_mutex_unlock(&bufmgr_gem->lock);
+			return &bo_gem->bo;
+		}
+	}
+
+	memclear(open_arg);
+	open_arg.name = handle;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_GEM_OPEN,
+		       &open_arg);
+	if (ret != 0) {
+		DBG("Couldn't reference %s handle 0x%08x: %s\n",
+		    name, handle, strerror(errno));
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return NULL;
+	}
+	/* Now see if someone has used a prime handle to get this
+	 * object from the kernel before by looking through the list
+	 * again for a matching gem_handle.
+	 */
+	for (list = bufmgr_gem->named.next;
+	     list != &bufmgr_gem->named;
+	     list = list->next) {
+		bo_gem = DRMLISTENTRY(drm_intel_bo_gem, list, name_list);
+		if (bo_gem->gem_handle == open_arg.handle) {
+			drm_intel_gem_bo_reference(&bo_gem->bo);
+			pthread_mutex_unlock(&bufmgr_gem->lock);
+			return &bo_gem->bo;
+		}
+	}
+
+	bo_gem = calloc(1, sizeof(*bo_gem));
+	if (!bo_gem) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return NULL;
+	}
+
+	bo_gem->bo.size = open_arg.size;
+	bo_gem->bo.offset = 0;
+	bo_gem->bo.offset64 = 0;
+	bo_gem->bo.virtual = NULL;
+	bo_gem->bo.bufmgr = bufmgr;
+	bo_gem->name = name;
+	atomic_set(&bo_gem->refcount, 1);
+	bo_gem->validate_index = -1;
+	bo_gem->gem_handle = open_arg.handle;
+	bo_gem->bo.handle = open_arg.handle;
+	bo_gem->global_name = handle;
+	bo_gem->reusable = false;
+
+	memclear(get_tiling);
+	get_tiling.handle = bo_gem->gem_handle;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_GET_TILING,
+		       &get_tiling);
+	if (ret != 0) {
+		drm_intel_gem_bo_unreference(&bo_gem->bo);
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return NULL;
+	}
+	bo_gem->tiling_mode = get_tiling.tiling_mode;
+	bo_gem->swizzle_mode = get_tiling.swizzle_mode;
+	/* XXX stride is unknown */
+	drm_intel_bo_gem_set_in_aperture_size(bufmgr_gem, bo_gem);
+
+	DRMINITLISTHEAD(&bo_gem->vma_list);
+	DRMLISTADDTAIL(&bo_gem->name_list, &bufmgr_gem->named);
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+	DBG("bo_create_from_handle: %d (%s)\n", handle, bo_gem->name);
+
+	return &bo_gem->bo;
+}
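+
+/* Illustrative sketch (not part of this change): the flink/open pair this
+ * function completes.  The exporting process publishes a global name and
+ * the importing process wraps it:
+ *
+ *	uint32_t name;
+ *	drm_intel_bo_flink(bo, &name);		// exporter side
+ *	// ... pass "name" over some IPC channel ...
+ *	drm_intel_bo *shared =
+ *		drm_intel_bo_gem_create_from_name(bufmgr, "shared", name);
+ *
+ * The list walks above ensure that importing the same name (or gem handle)
+ * twice yields one bo rather than two aliases of the kernel object.
+ */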
+
+static void
+drm_intel_gem_bo_free(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_gem_close close;
+	int ret;
+
+	DRMLISTDEL(&bo_gem->vma_list);
+	if (bo_gem->mem_virtual) {
+		VG(VALGRIND_FREELIKE_BLOCK(bo_gem->mem_virtual, 0));
+		drm_munmap(bo_gem->mem_virtual, bo_gem->bo.size);
+		bufmgr_gem->vma_count--;
+	}
+	if (bo_gem->gtt_virtual) {
+		drm_munmap(bo_gem->gtt_virtual, bo_gem->bo.size);
+		bufmgr_gem->vma_count--;
+	}
+
+	/* Close this object */
+	memclear(close);
+	close.handle = bo_gem->gem_handle;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_GEM_CLOSE, &close);
+	if (ret != 0) {
+		DBG("DRM_IOCTL_GEM_CLOSE %d failed (%s): %s\n",
+		    bo_gem->gem_handle, bo_gem->name, strerror(errno));
+	}
+	free(bo_gem->aub_annotations);
+	free(bo);
+}
+
+static void
+drm_intel_gem_bo_mark_mmaps_incoherent(drm_intel_bo *bo)
+{
+#if HAVE_VALGRIND
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+	if (bo_gem->mem_virtual)
+		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->mem_virtual, bo->size);
+
+	if (bo_gem->gtt_virtual)
+		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->gtt_virtual, bo->size);
+#endif
+}
+
+/** Frees all cached buffers significantly older than @time. */
+static void
+drm_intel_gem_cleanup_bo_cache(drm_intel_bufmgr_gem *bufmgr_gem, time_t time)
+{
+	int i;
+
+	if (bufmgr_gem->time == time)
+		return;
+
+	for (i = 0; i < bufmgr_gem->num_buckets; i++) {
+		struct drm_intel_gem_bo_bucket *bucket =
+		    &bufmgr_gem->cache_bucket[i];
+
+		while (!DRMLISTEMPTY(&bucket->head)) {
+			drm_intel_bo_gem *bo_gem;
+
+			bo_gem = DRMLISTENTRY(drm_intel_bo_gem,
+					      bucket->head.next, head);
+			if (time - bo_gem->free_time <= 1)
+				break;
+
+			DRMLISTDEL(&bo_gem->head);
+
+			drm_intel_gem_bo_free(&bo_gem->bo);
+		}
+	}
+
+	bufmgr_gem->time = time;
+}
+
+static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
+{
+	int limit;
+
+	DBG("%s: cached=%d, open=%d, limit=%d\n", __FUNCTION__,
+	    bufmgr_gem->vma_count, bufmgr_gem->vma_open, bufmgr_gem->vma_max);
+
+	if (bufmgr_gem->vma_max < 0)
+		return;
+
+	/* We may need to evict a few entries in order to create new mmaps */
+	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
+	if (limit < 0)
+		limit = 0;
+
+	while (bufmgr_gem->vma_count > limit) {
+		drm_intel_bo_gem *bo_gem;
+
+		bo_gem = DRMLISTENTRY(drm_intel_bo_gem,
+				      bufmgr_gem->vma_cache.next,
+				      vma_list);
+		assert(bo_gem->map_count == 0);
+		DRMLISTDELINIT(&bo_gem->vma_list);
+
+		if (bo_gem->mem_virtual) {
+			drm_munmap(bo_gem->mem_virtual, bo_gem->bo.size);
+			bo_gem->mem_virtual = NULL;
+			bufmgr_gem->vma_count--;
+		}
+		if (bo_gem->gtt_virtual) {
+			drm_munmap(bo_gem->gtt_virtual, bo_gem->bo.size);
+			bo_gem->gtt_virtual = NULL;
+			bufmgr_gem->vma_count--;
+		}
+	}
+}
+
+static void drm_intel_gem_bo_close_vma(drm_intel_bufmgr_gem *bufmgr_gem,
+				       drm_intel_bo_gem *bo_gem)
+{
+	bufmgr_gem->vma_open--;
+	DRMLISTADDTAIL(&bo_gem->vma_list, &bufmgr_gem->vma_cache);
+	if (bo_gem->mem_virtual)
+		bufmgr_gem->vma_count++;
+	if (bo_gem->gtt_virtual)
+		bufmgr_gem->vma_count++;
+	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
+}
+
+static void drm_intel_gem_bo_open_vma(drm_intel_bufmgr_gem *bufmgr_gem,
+				      drm_intel_bo_gem *bo_gem)
+{
+	bufmgr_gem->vma_open++;
+	DRMLISTDEL(&bo_gem->vma_list);
+	if (bo_gem->mem_virtual)
+		bufmgr_gem->vma_count--;
+	if (bo_gem->gtt_virtual)
+		bufmgr_gem->vma_count--;
+	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
+}
+
+static void
+drm_intel_gem_bo_unreference_final(drm_intel_bo *bo, time_t time)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_intel_gem_bo_bucket *bucket;
+	int i;
+
+	/* Unreference all the target buffers */
+	for (i = 0; i < bo_gem->reloc_count; i++) {
+		if (bo_gem->reloc_target_info[i].bo != bo) {
+			drm_intel_gem_bo_unreference_locked_timed(bo_gem->
+								  reloc_target_info[i].bo,
+								  time);
+		}
+	}
+	bo_gem->reloc_count = 0;
+	bo_gem->used_as_reloc_target = false;
+
+	DBG("bo_unreference final: %d (%s)\n",
+	    bo_gem->gem_handle, bo_gem->name);
+
+	/* release memory associated with this object */
+	if (bo_gem->reloc_target_info) {
+		free(bo_gem->reloc_target_info);
+		bo_gem->reloc_target_info = NULL;
+	}
+	if (bo_gem->relocs) {
+		free(bo_gem->relocs);
+		bo_gem->relocs = NULL;
+	}
+
+	/* Clear any left-over mappings */
+	if (bo_gem->map_count) {
+		DBG("bo freed with non-zero map-count %d\n", bo_gem->map_count);
+		bo_gem->map_count = 0;
+		drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+	}
+
+	DRMLISTDEL(&bo_gem->name_list);
+
+	bucket = drm_intel_gem_bo_bucket_for_size(bufmgr_gem, bo->size);
+	/* Put the buffer into our internal cache for reuse if we can. */
+	if (bufmgr_gem->bo_reuse && bo_gem->reusable && bucket != NULL &&
+	    drm_intel_gem_bo_madvise_internal(bufmgr_gem, bo_gem,
+					      I915_MADV_DONTNEED)) {
+		bo_gem->free_time = time;
+
+		bo_gem->name = NULL;
+		bo_gem->validate_index = -1;
+
+		DRMLISTADDTAIL(&bo_gem->head, &bucket->head);
+	} else {
+		drm_intel_gem_bo_free(bo);
+	}
+}
+
+static void drm_intel_gem_bo_unreference_locked_timed(drm_intel_bo *bo,
+						      time_t time)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+	assert(atomic_read(&bo_gem->refcount) > 0);
+	if (atomic_dec_and_test(&bo_gem->refcount))
+		drm_intel_gem_bo_unreference_final(bo, time);
+}
+
+static void drm_intel_gem_bo_unreference(drm_intel_bo *bo)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+	assert(atomic_read(&bo_gem->refcount) > 0);
+
+	if (atomic_add_unless(&bo_gem->refcount, -1, 1)) {
+		drm_intel_bufmgr_gem *bufmgr_gem =
+		    (drm_intel_bufmgr_gem *) bo->bufmgr;
+		struct timespec time;
+
+		clock_gettime(CLOCK_MONOTONIC, &time);
+
+		pthread_mutex_lock(&bufmgr_gem->lock);
+
+		if (atomic_dec_and_test(&bo_gem->refcount)) {
+			drm_intel_gem_bo_unreference_final(bo, time.tv_sec);
+			drm_intel_gem_cleanup_bo_cache(bufmgr_gem, time.tv_sec);
+		}
+
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+	}
+}
+
+static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	if (bo_gem->is_userptr) {
+		/* Return the same user ptr */
+		bo->virtual = bo_gem->user_virtual;
+		return 0;
+	}
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	if (bo_gem->map_count++ == 0)
+		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
+
+	if (!bo_gem->mem_virtual) {
+		struct drm_i915_gem_mmap mmap_arg;
+
+		DBG("bo_map: %d (%s), map_count=%d\n",
+		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
+
+		memclear(mmap_arg);
+		mmap_arg.handle = bo_gem->gem_handle;
+		mmap_arg.size = bo->size;
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_MMAP,
+			       &mmap_arg);
+		if (ret != 0) {
+			ret = -errno;
+			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
+			    __FILE__, __LINE__, bo_gem->gem_handle,
+			    bo_gem->name, strerror(errno));
+			if (--bo_gem->map_count == 0)
+				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+			pthread_mutex_unlock(&bufmgr_gem->lock);
+			return ret;
+		}
+		VG(VALGRIND_MALLOCLIKE_BLOCK(mmap_arg.addr_ptr, mmap_arg.size, 0, 1));
+		bo_gem->mem_virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
+	}
+	DBG("bo_map: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
+	    bo_gem->mem_virtual);
+	bo->virtual = bo_gem->mem_virtual;
+
+	memclear(set_domain);
+	set_domain.handle = bo_gem->gem_handle;
+	set_domain.read_domains = I915_GEM_DOMAIN_CPU;
+	if (write_enable)
+		set_domain.write_domain = I915_GEM_DOMAIN_CPU;
+	else
+		set_domain.write_domain = 0;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
+		       &set_domain);
+	if (ret != 0) {
+		DBG("%s:%d: Error setting to CPU domain %d: %s\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle,
+		    strerror(errno));
+	}
+
+	if (write_enable)
+		bo_gem->mapped_cpu_write = true;
+
+	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_virtual, bo->size));
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return 0;
+}
+
+static int map_bo(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	if (bo_gem->map_count++ == 0)
+		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
+
+	if (!bo_gem->mem_virtual) {
+		struct drm_i915_gem_mmap mmap_arg;
+
+		DBG("bo_map: %d (%s), map_count=%d\n",
+		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
+
+		memclear(mmap_arg);
+		mmap_arg.handle = bo_gem->gem_handle;
+		mmap_arg.size = bo->size;
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_MMAP,
+			       &mmap_arg);
+		if (ret != 0) {
+			ret = -errno;
+			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
+			    __FILE__, __LINE__, bo_gem->gem_handle,
+			    bo_gem->name, strerror(errno));
+			if (--bo_gem->map_count == 0)
+				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+			return ret;
+		}
+		VG(VALGRIND_MALLOCLIKE_BLOCK(mmap_arg.addr_ptr, mmap_arg.size, 0, 1));
+		bo_gem->mem_virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
+	}
+	DBG("bo_map: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
+	    bo_gem->mem_virtual);
+	bo->virtual = bo_gem->mem_virtual;
+
+	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_virtual, bo->size));
+
+	return 0;
+}
+
+drm_public int
+drm_intel_gem_bo_map_unsynchronized_non_gtt(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	if (!bufmgr_gem->has_llc)
+		return drm_intel_gem_bo_map(bo, true);
+
+	if (bo_gem->is_userptr) {
+		/* Return the same user ptr */
+		bo->virtual = bo_gem->user_virtual;
+		return 0;
+	}
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+	ret = map_bo(bo);
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
+
+static int
+map_gtt(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	if (bo_gem->is_userptr)
+		return -EINVAL;
+
+	if (bo_gem->map_count++ == 0)
+		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
+
+	/* Get a mapping of the buffer if we haven't before. */
+	if (bo_gem->gtt_virtual == NULL) {
+		struct drm_i915_gem_mmap_gtt mmap_arg;
+
+		DBG("bo_map_gtt: mmap %d (%s), map_count=%d\n",
+		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
+
+		memclear(mmap_arg);
+		mmap_arg.handle = bo_gem->gem_handle;
+
+		/* Get the fake offset back... */
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_MMAP_GTT,
+			       &mmap_arg);
+		if (ret != 0) {
+			ret = -errno;
+			DBG("%s:%d: Error preparing buffer map %d (%s): %s .\n",
+			    __FILE__, __LINE__,
+			    bo_gem->gem_handle, bo_gem->name,
+			    strerror(errno));
+			if (--bo_gem->map_count == 0)
+				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+			return ret;
+		}
+
+		/* and mmap it */
+		bo_gem->gtt_virtual = drm_mmap(0, bo->size, PROT_READ | PROT_WRITE,
+					       MAP_SHARED, bufmgr_gem->fd,
+					       mmap_arg.offset);
+		if (bo_gem->gtt_virtual == MAP_FAILED) {
+			bo_gem->gtt_virtual = NULL;
+			ret = -errno;
+			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
+			    __FILE__, __LINE__,
+			    bo_gem->gem_handle, bo_gem->name,
+			    strerror(errno));
+			if (--bo_gem->map_count == 0)
+				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+			return ret;
+		}
+	}
+
+	bo->virtual = bo_gem->gtt_virtual;
+
+	DBG("bo_map_gtt: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
+	    bo_gem->gtt_virtual);
+
+	return 0;
+}
+
+drm_public int
+drm_intel_gem_bo_map_gtt(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_gtt(bo);
+	if (ret) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return ret;
+	}
+
+	/* Now move it to the GTT domain so that the GPU and CPU
+	 * caches are flushed and the GPU isn't actively using the
+	 * buffer.
+	 *
+	 * The pagefault handler does this domain change for us when
+	 * it has unbound the BO from the GTT, but it's up to us to
+	 * tell it when we're about to use the buffer if we have done
+	 * rendering and it still happens to be bound to the GTT.
+	 */
+	memclear(set_domain);
+	set_domain.handle = bo_gem->gem_handle;
+	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
+		       &set_domain);
+	if (ret != 0) {
+		DBG("%s:%d: Error setting domain %d: %s\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle,
+		    strerror(errno));
+	}
+
+	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->gtt_virtual, bo->size));
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return 0;
+}
+
+/**
+ * Performs a mapping of the buffer object like the normal GTT
+ * mapping, but avoids waiting for the GPU to be done reading from or
+ * rendering to the buffer.
+ *
+ * This is used in the implementation of GL_ARB_map_buffer_range: The
+ * user asks to create a buffer, then does a mapping, fills some
+ * space, runs a drawing command, then asks to map it again without
+ * synchronizing because it guarantees that it won't write over the
+ * data that the GPU is busy using (or, more specifically, that if it
+ * does write over the data, it acknowledges that rendering is
+ * undefined).
+ */
+
+drm_public int
+drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+#ifdef HAVE_VALGRIND
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+#endif
+	int ret;
+
+	/* If the CPU cache isn't coherent with the GTT, then use a
+	 * regular synchronized mapping.  The problem is that we don't
+	 * track where the buffer was last used on the CPU side in
+	 * terms of drm_intel_bo_map vs drm_intel_gem_bo_map_gtt, so
+	 * we would potentially corrupt the buffer even when the user
+	 * does reasonable things.
+	 */
+	if (!bufmgr_gem->has_llc)
+		return drm_intel_gem_bo_map_gtt(bo);
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_gtt(bo);
+	if (ret == 0) {
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->gtt_virtual, bo->size));
+	}
+
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
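+
+/* Illustrative sketch (not part of this change): the map-without-stall
+ * pattern described above, roughly as a GL_ARB_map_buffer_range-style
+ * caller might use it ("free_offset", "data" and "len" are hypothetical).
+ * The caller promises not to touch bytes the GPU is still reading:
+ *
+ *	if (drm_intel_gem_bo_map_unsynchronized(bo) == 0) {
+ *		memcpy((char *)bo->virtual + free_offset, data, len);
+ *		drm_intel_gem_bo_unmap_gtt(bo);
+ *	}
+ */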
+
+static int drm_intel_gem_bo_unmap(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret = 0;
+
+	if (bo == NULL)
+		return 0;
+
+	if (bo_gem->is_userptr)
+		return 0;
+
+	bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	if (bo_gem->map_count <= 0) {
+		DBG("attempted to unmap an unmapped bo\n");
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		/* Preserve the old behaviour of just treating this as a
+		 * no-op rather than reporting the error.
+		 */
+		return 0;
+	}
+
+	if (bo_gem->mapped_cpu_write) {
+		struct drm_i915_gem_sw_finish sw_finish;
+
+		/* Cause a flush to happen if the buffer's pinned for
+		 * scanout, so the results show up in a timely manner.
+		 * Unlike GTT set domains, this only does work if the
+		 * buffer should be scanout-related.
+		 */
+		memclear(sw_finish);
+		sw_finish.handle = bo_gem->gem_handle;
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_SW_FINISH,
+			       &sw_finish);
+		ret = ret == -1 ? -errno : 0;
+
+		bo_gem->mapped_cpu_write = false;
+	}
+
+	/* We need to unmap after every invocation as we cannot track
+	 * an open vma for every bo, as that will exhaust the system
+	 * limits and cause later failures.
+	 */
+	if (--bo_gem->map_count == 0) {
+		drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+		bo->virtual = NULL;
+	}
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
+
+drm_public int
+drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo)
+{
+	return drm_intel_gem_bo_unmap(bo);
+}
+
+static int
+drm_intel_gem_bo_subdata(drm_intel_bo *bo, unsigned long offset,
+			 unsigned long size, const void *data)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_pwrite pwrite;
+	int ret;
+
+	if (bo_gem->is_userptr)
+		return -EINVAL;
+
+	memclear(pwrite);
+	pwrite.handle = bo_gem->gem_handle;
+	pwrite.offset = offset;
+	pwrite.size = size;
+	pwrite.data_ptr = (uint64_t) (uintptr_t) data;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_PWRITE,
+		       &pwrite);
+	if (ret != 0) {
+		ret = -errno;
+		DBG("%s:%d: Error writing data to buffer %d: (%d %d) %s .\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle, (int)offset,
+		    (int)size, strerror(errno));
+	}
+
+	return ret;
+}
+
+static int
+drm_intel_gem_get_pipe_from_crtc_id(drm_intel_bufmgr *bufmgr, int crtc_id)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bufmgr;
+	struct drm_i915_get_pipe_from_crtc_id get_pipe_from_crtc_id;
+	int ret;
+
+	memclear(get_pipe_from_crtc_id);
+	get_pipe_from_crtc_id.crtc_id = crtc_id;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GET_PIPE_FROM_CRTC_ID,
+		       &get_pipe_from_crtc_id);
+	if (ret != 0) {
+		/* We return -1 here to signal that we don't
+		 * know which pipe is associated with this crtc.
+		 * This lets the caller know that this information
+		 * isn't available; using the wrong pipe for
+		 * vblank waiting can cause the chipset to lock up.
+		 */
+		return -1;
+	}
+
+	return get_pipe_from_crtc_id.pipe;
+}
+
+static int
+drm_intel_gem_bo_get_subdata(drm_intel_bo *bo, unsigned long offset,
+			     unsigned long size, void *data)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_pread pread;
+	int ret;
+
+	if (bo_gem->is_userptr)
+		return -EINVAL;
+
+	memclear(pread);
+	pread.handle = bo_gem->gem_handle;
+	pread.offset = offset;
+	pread.size = size;
+	pread.data_ptr = (uint64_t) (uintptr_t) data;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_PREAD,
+		       &pread);
+	if (ret != 0) {
+		ret = -errno;
+		DBG("%s:%d: Error reading data from buffer %d: (%d %d) %s .\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle, (int)offset,
+		    (int)size, strerror(errno));
+	}
+
+	return ret;
+}
+
+/** Waits for all GPU rendering with the object to have completed. */
+static void
+drm_intel_gem_bo_wait_rendering(drm_intel_bo *bo)
+{
+	drm_intel_gem_bo_start_gtt_access(bo, 1);
+}
+
+/**
+ * Waits on a BO for the given amount of time.
+ *
+ * @bo: buffer object to wait for
+ * @timeout_ns: amount of time to wait in nanoseconds.
+ *   If value is less than 0, an infinite wait will occur.
+ *
+ * Returns 0 if the wait was successful, i.e. the last batch referencing the
+ * object has completed within the allotted time. Otherwise a negative return
+ * value describes the error, -ETIME in particular meaning the wait failed to
+ * yield the desired result.
+ *
+ * Similar to drm_intel_gem_bo_wait_rendering, except that a timeout parameter
+ * allows the operation to give up after a certain amount of time. Another
+ * subtle difference is that the internal locking semantics differ (this
+ * variant does not hold the lock for the duration of the wait). This makes
+ * the wait subject to a larger userspace race window.
+ *
+ * The implementation shall wait until the object is no longer actively
+ * referenced within a batch buffer at the time of the call. The wait does
+ * not guarantee that the buffer won't be re-issued via another thread or a
+ * flinked handle. Userspace must make sure this race does not occur if such
+ * precision is important.
+ *
+ * Note that some kernels have broken the infinite-wait-for-negative-values
+ * promise; upgrade to the latest stable kernel if this is the case.
+ */
+drm_public int
+drm_intel_gem_bo_wait(drm_intel_bo *bo, int64_t timeout_ns)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_wait wait;
+	int ret;
+
+	if (!bufmgr_gem->has_wait_timeout) {
+		DBG("%s:%d: Timed wait is not supported. Falling back to "
+		    "infinite wait\n", __FILE__, __LINE__);
+		if (timeout_ns) {
+			drm_intel_gem_bo_wait_rendering(bo);
+			return 0;
+		} else {
+			return drm_intel_gem_bo_busy(bo) ? -ETIME : 0;
+		}
+	}
+
+	memclear(wait);
+	wait.bo_handle = bo_gem->gem_handle;
+	wait.timeout_ns = timeout_ns;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GEM_WAIT, &wait);
+	if (ret == -1)
+		return -errno;
+
+	return ret;
+}
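+
+/* Illustrative sketch (not part of this change): a bounded wait with the
+ * -ETIME handling described above, here capped at 100 ms:
+ *
+ *	int ret = drm_intel_gem_bo_wait(bo, 100 * 1000 * 1000);
+ *	if (ret == -ETIME)
+ *		handle_timeout();	// hypothetical: object still busy
+ *	else if (ret < 0)
+ *		handle_error(ret);	// hypothetical
+ */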
+
+/**
+ * Sets the object to the GTT read and possibly write domain, used by the X
+ * 2D driver in the absence of kernel support to do drm_intel_gem_bo_map_gtt().
+ *
+ * In combination with drm_intel_gem_bo_pin() and manual fence management, we
+ * can do tiled pixmaps this way.
+ */
+drm_public void
+drm_intel_gem_bo_start_gtt_access(drm_intel_bo *bo, int write_enable)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	memclear(set_domain);
+	set_domain.handle = bo_gem->gem_handle;
+	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+	set_domain.write_domain = write_enable ? I915_GEM_DOMAIN_GTT : 0;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
+		       &set_domain);
+	if (ret != 0) {
+		DBG("%s:%d: Error setting memory domains %d (%08x %08x): %s .\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle,
+		    set_domain.read_domains, set_domain.write_domain,
+		    strerror(errno));
+	}
+}
+
+static void
+drm_intel_bufmgr_gem_destroy(drm_intel_bufmgr *bufmgr)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bufmgr;
+	int i;
+
+	free(bufmgr_gem->exec2_objects);
+	free(bufmgr_gem->exec_objects);
+	free(bufmgr_gem->exec_bos);
+	free(bufmgr_gem->aub_filename);
+
+	pthread_mutex_destroy(&bufmgr_gem->lock);
+
+	/* Free any cached buffer objects we were going to reuse */
+	for (i = 0; i < bufmgr_gem->num_buckets; i++) {
+		struct drm_intel_gem_bo_bucket *bucket =
+		    &bufmgr_gem->cache_bucket[i];
+		drm_intel_bo_gem *bo_gem;
+
+		while (!DRMLISTEMPTY(&bucket->head)) {
+			bo_gem = DRMLISTENTRY(drm_intel_bo_gem,
+					      bucket->head.next, head);
+			DRMLISTDEL(&bo_gem->head);
+
+			drm_intel_gem_bo_free(&bo_gem->bo);
+		}
+	}
+
+	free(bufmgr);
+}
+
+/**
+ * Adds the target buffer to the validation list and adds the relocation
+ * to the reloc_buffer's relocation list.
+ *
+ * The relocation entry at the given offset must already contain the
+ * precomputed relocation value, because the kernel will optimize out
+ * the relocation entry write when the buffer hasn't moved from the
+ * last known offset in target_bo.
+ */
+static int
+do_bo_emit_reloc(drm_intel_bo *bo, uint32_t offset,
+		 drm_intel_bo *target_bo, uint32_t target_offset,
+		 uint32_t read_domains, uint32_t write_domain,
+		 bool need_fence)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	drm_intel_bo_gem *target_bo_gem = (drm_intel_bo_gem *) target_bo;
+	bool fenced_command;
+
+	if (bo_gem->has_error)
+		return -ENOMEM;
+
+	if (target_bo_gem->has_error) {
+		bo_gem->has_error = true;
+		return -ENOMEM;
+	}
+
+	/* We never use HW fences for rendering on 965+ */
+	if (bufmgr_gem->gen >= 4)
+		need_fence = false;
+
+	fenced_command = need_fence;
+	if (target_bo_gem->tiling_mode == I915_TILING_NONE)
+		need_fence = false;
+
+	/* Create a new relocation list if needed */
+	if (bo_gem->relocs == NULL && drm_intel_setup_reloc_list(bo))
+		return -ENOMEM;
+
+	/* Check overflow */
+	assert(bo_gem->reloc_count < bufmgr_gem->max_relocs);
+
+	/* Check args */
+	assert(offset <= bo->size - 4);
+	assert((write_domain & (write_domain - 1)) == 0);
+
+	/* An object needing a fence is a tiled buffer, so it won't have
+	 * relocs to other buffers.
+	 */
+	if (need_fence) {
+		assert(target_bo_gem->reloc_count == 0);
+		target_bo_gem->reloc_tree_fences = 1;
+	}
+
+	/* Make sure that we're not adding a reloc to something whose size has
+	 * already been accounted for.
+	 */
+	assert(!bo_gem->used_as_reloc_target);
+	if (target_bo_gem != bo_gem) {
+		target_bo_gem->used_as_reloc_target = true;
+		bo_gem->reloc_tree_size += target_bo_gem->reloc_tree_size;
+		bo_gem->reloc_tree_fences += target_bo_gem->reloc_tree_fences;
+	}
+
+	bo_gem->relocs[bo_gem->reloc_count].offset = offset;
+	bo_gem->relocs[bo_gem->reloc_count].delta = target_offset;
+	bo_gem->relocs[bo_gem->reloc_count].target_handle =
+	    target_bo_gem->gem_handle;
+	bo_gem->relocs[bo_gem->reloc_count].read_domains = read_domains;
+	bo_gem->relocs[bo_gem->reloc_count].write_domain = write_domain;
+	bo_gem->relocs[bo_gem->reloc_count].presumed_offset = target_bo->offset64;
+
+	bo_gem->reloc_target_info[bo_gem->reloc_count].bo = target_bo;
+	if (target_bo != bo)
+		drm_intel_gem_bo_reference(target_bo);
+	if (fenced_command)
+		bo_gem->reloc_target_info[bo_gem->reloc_count].flags =
+			DRM_INTEL_RELOC_FENCE;
+	else
+		bo_gem->reloc_target_info[bo_gem->reloc_count].flags = 0;
+
+	bo_gem->reloc_count++;
+
+	return 0;
+}
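+
+/* Illustrative sketch (not part of this change): batchbuffer code reaches
+ * this helper through the public drm_intel_bo_emit_reloc().  Patching a
+ * pointer at byte "offset" inside a batch might look like:
+ *
+ *	drm_intel_bo_emit_reloc(batch_bo, offset,
+ *				target_bo, 0,
+ *				I915_GEM_DOMAIN_RENDER,
+ *				I915_GEM_DOMAIN_RENDER);
+ *
+ * The caller must already have written the presumed value
+ * (target_bo->offset64 + delta) at "offset", since the kernel skips the
+ * relocation write when the target has not moved.
+ */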
+
+static int
+drm_intel_gem_bo_emit_reloc(drm_intel_bo *bo, uint32_t offset,
+			    drm_intel_bo *target_bo, uint32_t target_offset,
+			    uint32_t read_domains, uint32_t write_domain)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bo->bufmgr;
+
+	return do_bo_emit_reloc(bo, offset, target_bo, target_offset,
+				read_domains, write_domain,
+				!bufmgr_gem->fenced_relocs);
+}
+
+static int
+drm_intel_gem_bo_emit_reloc_fence(drm_intel_bo *bo, uint32_t offset,
+				  drm_intel_bo *target_bo,
+				  uint32_t target_offset,
+				  uint32_t read_domains, uint32_t write_domain)
+{
+	return do_bo_emit_reloc(bo, offset, target_bo, target_offset,
+				read_domains, write_domain, true);
+}
+
+drm_public int
+drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+	return bo_gem->reloc_count;
+}
+
+/**
+ * Removes existing relocation entries in the BO after "start".
+ *
+ * This allows a user to avoid a two-step process for state setup with
+ * counting up all the buffer objects and doing a
+ * drm_intel_bufmgr_check_aperture_space() before emitting any of the
+ * relocations for the state setup.  Instead, save the state of the
+ * batchbuffer including drm_intel_gem_get_reloc_count(), emit all the
+ * state, and then check if it still fits in the aperture.
+ *
+ * Any further drm_intel_bufmgr_check_aperture_space() queries
+ * involving this buffer in the tree are undefined after this call.
+ */
+drm_public void
+drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int i;
+	struct timespec time;
+
+	clock_gettime(CLOCK_MONOTONIC, &time);
+
+	assert(bo_gem->reloc_count >= start);
+
+	/* Unreference the cleared target buffers */
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	for (i = start; i < bo_gem->reloc_count; i++) {
+		drm_intel_bo_gem *target_bo_gem = (drm_intel_bo_gem *) bo_gem->reloc_target_info[i].bo;
+		if (&target_bo_gem->bo != bo) {
+			bo_gem->reloc_tree_fences -= target_bo_gem->reloc_tree_fences;
+			drm_intel_gem_bo_unreference_locked_timed(&target_bo_gem->bo,
+								  time.tv_sec);
+		}
+	}
+	bo_gem->reloc_count = start;
+
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+}
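+
+/* Illustrative sketch (not part of this change): the single-pass state
+ * setup described in the comment above, with a hypothetical emit_state()
+ * helper:
+ *
+ *	int start = drm_intel_gem_bo_get_reloc_count(batch_bo);
+ *	emit_state(batch_bo);	// hypothetical: adds relocations
+ *	if (drm_intel_bufmgr_check_aperture_space(&batch_bo, 1) != 0)
+ *		drm_intel_gem_bo_clear_relocs(batch_bo, start);	// roll back
+ */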
+
+/**
+ * Walk the tree of relocations rooted at BO and accumulate the list of
+ * validations to be performed and update the relocation buffers with
+ * index values into the validation list.
+ */
+static void
+drm_intel_gem_bo_process_reloc(drm_intel_bo *bo)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int i;
+
+	if (bo_gem->relocs == NULL)
+		return;
+
+	for (i = 0; i < bo_gem->reloc_count; i++) {
+		drm_intel_bo *target_bo = bo_gem->reloc_target_info[i].bo;
+
+		if (target_bo == bo)
+			continue;
+
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+
+		/* Continue walking the tree depth-first. */
+		drm_intel_gem_bo_process_reloc(target_bo);
+
+		/* Add the target to the validate list */
+		drm_intel_add_validate_buffer(target_bo);
+	}
+}
+
+static void
+drm_intel_gem_bo_process_reloc2(drm_intel_bo *bo)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *)bo;
+	int i;
+
+	if (bo_gem->relocs == NULL)
+		return;
+
+	for (i = 0; i < bo_gem->reloc_count; i++) {
+		drm_intel_bo *target_bo = bo_gem->reloc_target_info[i].bo;
+		int need_fence;
+
+		if (target_bo == bo)
+			continue;
+
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+
+		/* Continue walking the tree depth-first. */
+		drm_intel_gem_bo_process_reloc2(target_bo);
+
+		need_fence = (bo_gem->reloc_target_info[i].flags &
+			      DRM_INTEL_RELOC_FENCE);
+
+		/* Add the target to the validate list */
+		drm_intel_add_validate_buffer2(target_bo, need_fence);
+	}
+}
+
+static void
+drm_intel_update_buffer_offsets(drm_intel_bufmgr_gem *bufmgr_gem)
+{
+	int i;
+
+	for (i = 0; i < bufmgr_gem->exec_count; i++) {
+		drm_intel_bo *bo = bufmgr_gem->exec_bos[i];
+		drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+		/* Update the buffer offset */
+		if (bufmgr_gem->exec_objects[i].offset != bo->offset64) {
+			DBG("BO %d (%s) migrated: 0x%08lx -> 0x%08llx\n",
+			    bo_gem->gem_handle, bo_gem->name, bo->offset64,
+			    (unsigned long long)bufmgr_gem->exec_objects[i].
+			    offset);
+			bo->offset64 = bufmgr_gem->exec_objects[i].offset;
+			bo->offset = bufmgr_gem->exec_objects[i].offset;
+		}
+	}
+}
+
+static void
+drm_intel_update_buffer_offsets2 (drm_intel_bufmgr_gem *bufmgr_gem)
+{
+	int i;
+
+	for (i = 0; i < bufmgr_gem->exec_count; i++) {
+		drm_intel_bo *bo = bufmgr_gem->exec_bos[i];
+		drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *)bo;
+
+		/* Update the buffer offset */
+		if (bufmgr_gem->exec2_objects[i].offset != bo->offset64) {
+			DBG("BO %d (%s) migrated: 0x%08lx -> 0x%08llx\n",
+			    bo_gem->gem_handle, bo_gem->name, bo->offset64,
+			    (unsigned long long)bufmgr_gem->exec2_objects[i].offset);
+			bo->offset64 = bufmgr_gem->exec2_objects[i].offset;
+			bo->offset = bufmgr_gem->exec2_objects[i].offset;
+		}
+	}
+}
+
+static void
+aub_out(drm_intel_bufmgr_gem *bufmgr_gem, uint32_t data)
+{
+	fwrite(&data, 1, 4, bufmgr_gem->aub_file);
+}
+
+static void
+aub_out_data(drm_intel_bufmgr_gem *bufmgr_gem, void *data, size_t size)
+{
+	fwrite(data, 1, size, bufmgr_gem->aub_file);
+}
+
+static void
+aub_write_bo_data(drm_intel_bo *bo, uint32_t offset, uint32_t size)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	uint32_t *data;
+	unsigned int i;
+
+	data = malloc(bo->size);
+	if (data == NULL)
+		return;
+	drm_intel_bo_get_subdata(bo, offset, size, data);
+
+	/* Easy mode: write out bo with no relocations */
+	if (!bo_gem->reloc_count) {
+		aub_out_data(bufmgr_gem, data, size);
+		free(data);
+		return;
+	}
+
+	/* Otherwise, handle the relocations while writing. */
+	for (i = 0; i < size / 4; i++) {
+		int r;
+		for (r = 0; r < bo_gem->reloc_count; r++) {
+			struct drm_i915_gem_relocation_entry *reloc;
+			drm_intel_reloc_target *info;
+
+			reloc = &bo_gem->relocs[r];
+			info = &bo_gem->reloc_target_info[r];
+
+			if (reloc->offset == offset + i * 4) {
+				drm_intel_bo_gem *target_gem;
+				uint32_t val;
+
+				target_gem = (drm_intel_bo_gem *)info->bo;
+
+				val = reloc->delta;
+				val += target_gem->aub_offset;
+
+				aub_out(bufmgr_gem, val);
+				data[i] = val;
+				break;
+			}
+		}
+		if (r == bo_gem->reloc_count) {
+			/* no relocation, just the data */
+			aub_out(bufmgr_gem, data[i]);
+		}
+	}
+
+	free(data);
+}
+
+static void
+aub_bo_get_address(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+	/* Give the object a graphics address in the AUB file.  We
+	 * don't just use the GEM object address because we do AUB
+	 * dumping before execution -- we want to successfully log
+	 * when the hardware might hang, and we might even want to aub
+	 * capture for a driver trying to execute on a different
+	 * generation of hardware by disabling the actual kernel exec
+	 * call.
+	 */
+	bo_gem->aub_offset = bufmgr_gem->aub_offset;
+	bufmgr_gem->aub_offset += bo->size;
+	/* XXX: Handle aperture overflow. */
+	assert(bufmgr_gem->aub_offset < 256 * 1024 * 1024);
+}
+
+static void
+aub_write_trace_block(drm_intel_bo *bo, uint32_t type, uint32_t subtype,
+		      uint32_t offset, uint32_t size)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+	aub_out(bufmgr_gem,
+		CMD_AUB_TRACE_HEADER_BLOCK |
+		((bufmgr_gem->gen >= 8 ? 6 : 5) - 2));
+	aub_out(bufmgr_gem,
+		AUB_TRACE_MEMTYPE_GTT | type | AUB_TRACE_OP_DATA_WRITE);
+	aub_out(bufmgr_gem, subtype);
+	aub_out(bufmgr_gem, bo_gem->aub_offset + offset);
+	aub_out(bufmgr_gem, size);
+	if (bufmgr_gem->gen >= 8)
+		aub_out(bufmgr_gem, 0);
+	aub_write_bo_data(bo, offset, size);
+}
+
+/**
+ * Break up large objects into multiple writes.  Otherwise a 128 KB VBO
+ * would overflow the 16-bit size field in the packet header and
+ * everything goes badly after that.
+ */
+static void
+aub_write_large_trace_block(drm_intel_bo *bo, uint32_t type, uint32_t subtype,
+			    uint32_t offset, uint32_t size)
+{
+	uint32_t block_size;
+	uint32_t sub_offset;
+
+	for (sub_offset = 0; sub_offset < size; sub_offset += block_size) {
+		block_size = size - sub_offset;
+
+		if (block_size > 8 * 4096)
+			block_size = 8 * 4096;
+
+		aub_write_trace_block(bo, type, subtype, offset + sub_offset,
+				      block_size);
+	}
+}
+
+static void
+aub_write_bo(drm_intel_bo *bo)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	uint32_t offset = 0;
+	unsigned i;
+
+	aub_bo_get_address(bo);
+
+	/* Write out each annotated section separately. */
+	for (i = 0; i < bo_gem->aub_annotation_count; ++i) {
+		drm_intel_aub_annotation *annotation =
+			&bo_gem->aub_annotations[i];
+		uint32_t ending_offset = annotation->ending_offset;
+		if (ending_offset > bo->size)
+			ending_offset = bo->size;
+		if (ending_offset > offset) {
+			aub_write_large_trace_block(bo, annotation->type,
+						    annotation->subtype,
+						    offset,
+						    ending_offset - offset);
+			offset = ending_offset;
+		}
+	}
+
+	/* Write out any remaining unannotated data */
+	if (offset < bo->size) {
+		aub_write_large_trace_block(bo, AUB_TRACE_TYPE_NOTYPE, 0,
+					    offset, bo->size - offset);
+	}
+}
+
+/*
+ * Make a ring buffer on the fly and dump it
+ */
+static void
+aub_build_dump_ringbuffer(drm_intel_bufmgr_gem *bufmgr_gem,
+			  uint32_t batch_buffer, int ring_flag)
+{
+	uint32_t ringbuffer[4096];
+	int ring = AUB_TRACE_TYPE_RING_PRB0; /* The default ring */
+	int ring_count = 0;
+
+	if (ring_flag == I915_EXEC_BSD)
+		ring = AUB_TRACE_TYPE_RING_PRB1;
+	else if (ring_flag == I915_EXEC_BLT)
+		ring = AUB_TRACE_TYPE_RING_PRB2;
+
+	/* Make a ring buffer to execute our batchbuffer. */
+	memset(ringbuffer, 0, sizeof(ringbuffer));
+	if (bufmgr_gem->gen >= 8) {
+		ringbuffer[ring_count++] = AUB_MI_BATCH_BUFFER_START | (3 - 2);
+		ringbuffer[ring_count++] = batch_buffer;
+		ringbuffer[ring_count++] = 0;
+	} else {
+		ringbuffer[ring_count++] = AUB_MI_BATCH_BUFFER_START;
+		ringbuffer[ring_count++] = batch_buffer;
+	}
+
+	/* Write out the ring.  This appears to trigger execution of
+	 * the ring in the simulator.
+	 */
+	aub_out(bufmgr_gem,
+		CMD_AUB_TRACE_HEADER_BLOCK |
+		((bufmgr_gem->gen >= 8 ? 6 : 5) - 2));
+	aub_out(bufmgr_gem,
+		AUB_TRACE_MEMTYPE_GTT | ring | AUB_TRACE_OP_COMMAND_WRITE);
+	aub_out(bufmgr_gem, 0); /* general/surface subtype */
+	aub_out(bufmgr_gem, bufmgr_gem->aub_offset);
+	aub_out(bufmgr_gem, ring_count * 4);
+	if (bufmgr_gem->gen >= 8)
+		aub_out(bufmgr_gem, 0);
+
+	/* FIXME: Need some flush operations here? */
+	aub_out_data(bufmgr_gem, ringbuffer, ring_count * 4);
+
+	/* Update offset pointer */
+	bufmgr_gem->aub_offset += 4096;
+}
+
+drm_public void
+drm_intel_gem_bo_aub_dump_bmp(drm_intel_bo *bo,
+			      int x1, int y1, int width, int height,
+			      enum aub_dump_bmp_format format,
+			      int pitch, int offset)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *)bo;
+	uint32_t cpp;
+
+	switch (format) {
+	case AUB_DUMP_BMP_FORMAT_8BIT:
+		cpp = 1;
+		break;
+	case AUB_DUMP_BMP_FORMAT_ARGB_4444:
+		cpp = 2;
+		break;
+	case AUB_DUMP_BMP_FORMAT_ARGB_0888:
+	case AUB_DUMP_BMP_FORMAT_ARGB_8888:
+		cpp = 4;
+		break;
+	default:
+		printf("Unknown AUB dump format %d\n", format);
+		return;
+	}
+
+	if (!bufmgr_gem->aub_file)
+		return;
+
+	aub_out(bufmgr_gem, CMD_AUB_DUMP_BMP | 4);
+	aub_out(bufmgr_gem, (y1 << 16) | x1);
+	aub_out(bufmgr_gem,
+		(format << 24) |
+		(cpp << 19) |
+		pitch / 4);
+	aub_out(bufmgr_gem, (height << 16) | width);
+	aub_out(bufmgr_gem, bo_gem->aub_offset + offset);
+	aub_out(bufmgr_gem,
+		((bo_gem->tiling_mode != I915_TILING_NONE) ? (1 << 2) : 0) |
+		((bo_gem->tiling_mode == I915_TILING_Y) ? (1 << 3) : 0));
+}
+
+static void
+aub_exec(drm_intel_bo *bo, int ring_flag, int used)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int i;
+	bool batch_buffer_needs_annotations;
+
+	if (!bufmgr_gem->aub_file)
+		return;
+
+	/* If the batch buffer is not annotated, annotate it as best
+	 * we can.
+	 */
+	batch_buffer_needs_annotations = bo_gem->aub_annotation_count == 0;
+	if (batch_buffer_needs_annotations) {
+		drm_intel_aub_annotation annotations[2] = {
+			{ AUB_TRACE_TYPE_BATCH, 0, used },
+			{ AUB_TRACE_TYPE_NOTYPE, 0, bo->size }
+		};
+		drm_intel_bufmgr_gem_set_aub_annotations(bo, annotations, 2);
+	}
+
+	/* Write out all buffers to AUB memory */
+	for (i = 0; i < bufmgr_gem->exec_count; i++) {
+		aub_write_bo(bufmgr_gem->exec_bos[i]);
+	}
+
+	/* Remove any annotations we added */
+	if (batch_buffer_needs_annotations)
+		drm_intel_bufmgr_gem_set_aub_annotations(bo, NULL, 0);
+
+	/* Dump ring buffer */
+	aub_build_dump_ringbuffer(bufmgr_gem, bo_gem->aub_offset, ring_flag);
+
+	fflush(bufmgr_gem->aub_file);
+
+	/*
+	 * One frame has been dumped, so reset the aub_offset for the next frame.
+	 *
+	 * FIXME: Can we do this?
+	 */
+	bufmgr_gem->aub_offset = 0x10000;
+}
+
+static int
+drm_intel_gem_bo_exec(drm_intel_bo *bo, int used,
+		      drm_clip_rect_t * cliprects, int num_cliprects, int DR4)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_execbuffer execbuf;
+	int ret, i;
+
+	if (bo_gem->has_error)
+		return -ENOMEM;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+	/* Update indices and set up the validate list. */
+	drm_intel_gem_bo_process_reloc(bo);
+
+	/* Add the batch buffer to the validation list.  There are no
+	 * relocations pointing to it.
+	 */
+	drm_intel_add_validate_buffer(bo);
+
+	memclear(execbuf);
+	execbuf.buffers_ptr = (uintptr_t) bufmgr_gem->exec_objects;
+	execbuf.buffer_count = bufmgr_gem->exec_count;
+	execbuf.batch_start_offset = 0;
+	execbuf.batch_len = used;
+	execbuf.cliprects_ptr = (uintptr_t) cliprects;
+	execbuf.num_cliprects = num_cliprects;
+	execbuf.DR1 = 0;
+	execbuf.DR4 = DR4;
+
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_EXECBUFFER,
+		       &execbuf);
+	if (ret != 0) {
+		ret = -errno;
+		if (errno == ENOSPC) {
+			DBG("Execbuffer fails to pin. "
+			    "Estimate: %u. Actual: %u. Available: %u\n",
+			    drm_intel_gem_estimate_batch_space(bufmgr_gem->exec_bos,
+							       bufmgr_gem->
+							       exec_count),
+			    drm_intel_gem_compute_batch_space(bufmgr_gem->exec_bos,
+							      bufmgr_gem->
+							      exec_count),
+			    (unsigned int)bufmgr_gem->gtt_size);
+		}
+	}
+	drm_intel_update_buffer_offsets(bufmgr_gem);
+
+	if (bufmgr_gem->bufmgr.debug)
+		drm_intel_gem_dump_validation_list(bufmgr_gem);
+
+	for (i = 0; i < bufmgr_gem->exec_count; i++) {
+		drm_intel_bo *bo = bufmgr_gem->exec_bos[i];
+		drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+		bo_gem->idle = false;
+
+		/* Disconnect the buffer from the validate list */
+		bo_gem->validate_index = -1;
+		bufmgr_gem->exec_bos[i] = NULL;
+	}
+	bufmgr_gem->exec_count = 0;
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
+
+static int
+do_exec2(drm_intel_bo *bo, int used, drm_intel_context *ctx,
+	 drm_clip_rect_t *cliprects, int num_cliprects, int DR4,
+	 unsigned int flags)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bo->bufmgr;
+	struct drm_i915_gem_execbuffer2 execbuf;
+	int ret = 0;
+	int i;
+
+	switch (flags & 0x7) {
+	default:
+		return -EINVAL;
+	case I915_EXEC_BLT:
+		if (!bufmgr_gem->has_blt)
+			return -EINVAL;
+		break;
+	case I915_EXEC_BSD:
+		if (!bufmgr_gem->has_bsd)
+			return -EINVAL;
+		break;
+	case I915_EXEC_VEBOX:
+		if (!bufmgr_gem->has_vebox)
+			return -EINVAL;
+		break;
+	case I915_EXEC_RENDER:
+	case I915_EXEC_DEFAULT:
+		break;
+	}
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+	/* Update indices and set up the validate list. */
+	drm_intel_gem_bo_process_reloc2(bo);
+
+	/* Add the batch buffer to the validation list.  There are no relocations
+	 * pointing to it.
+	 */
+	drm_intel_add_validate_buffer2(bo, 0);
+
+	memclear(execbuf);
+	execbuf.buffers_ptr = (uintptr_t)bufmgr_gem->exec2_objects;
+	execbuf.buffer_count = bufmgr_gem->exec_count;
+	execbuf.batch_start_offset = 0;
+	execbuf.batch_len = used;
+	execbuf.cliprects_ptr = (uintptr_t)cliprects;
+	execbuf.num_cliprects = num_cliprects;
+	execbuf.DR1 = 0;
+	execbuf.DR4 = DR4;
+	execbuf.flags = flags;
+	if (ctx == NULL)
+		i915_execbuffer2_set_context_id(execbuf, 0);
+	else
+		i915_execbuffer2_set_context_id(execbuf, ctx->ctx_id);
+	execbuf.rsvd2 = 0;
+
+	aub_exec(bo, flags, used);
+
+	if (bufmgr_gem->no_exec)
+		goto skip_execution;
+
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_EXECBUFFER2,
+		       &execbuf);
+	if (ret != 0) {
+		ret = -errno;
+		if (ret == -ENOSPC) {
+			DBG("Execbuffer fails to pin. "
+			    "Estimate: %u. Actual: %u. Available: %u\n",
+			    drm_intel_gem_estimate_batch_space(bufmgr_gem->exec_bos,
+							       bufmgr_gem->exec_count),
+			    drm_intel_gem_compute_batch_space(bufmgr_gem->exec_bos,
+							      bufmgr_gem->exec_count),
+			    (unsigned int) bufmgr_gem->gtt_size);
+		}
+	}
+	drm_intel_update_buffer_offsets2(bufmgr_gem);
+
+skip_execution:
+	if (bufmgr_gem->bufmgr.debug)
+		drm_intel_gem_dump_validation_list(bufmgr_gem);
+
+	for (i = 0; i < bufmgr_gem->exec_count; i++) {
+		drm_intel_bo *bo = bufmgr_gem->exec_bos[i];
+		drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *)bo;
+
+		bo_gem->idle = false;
+
+		/* Disconnect the buffer from the validate list */
+		bo_gem->validate_index = -1;
+		bufmgr_gem->exec_bos[i] = NULL;
+	}
+	bufmgr_gem->exec_count = 0;
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
+
+static int
+drm_intel_gem_bo_exec2(drm_intel_bo *bo, int used,
+		       drm_clip_rect_t *cliprects, int num_cliprects,
+		       int DR4)
+{
+	return do_exec2(bo, used, NULL, cliprects, num_cliprects, DR4,
+			I915_EXEC_RENDER);
+}
+
+static int
+drm_intel_gem_bo_mrb_exec2(drm_intel_bo *bo, int used,
+			drm_clip_rect_t *cliprects, int num_cliprects, int DR4,
+			unsigned int flags)
+{
+	return do_exec2(bo, used, NULL, cliprects, num_cliprects, DR4,
+			flags);
+}
+
+drm_public int
+drm_intel_gem_bo_context_exec(drm_intel_bo *bo, drm_intel_context *ctx,
+			      int used, unsigned int flags)
+{
+	return do_exec2(bo, used, ctx, NULL, 0, 0, flags);
+}
+
+static int
+drm_intel_gem_bo_pin(drm_intel_bo *bo, uint32_t alignment)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_pin pin;
+	int ret;
+
+	memclear(pin);
+	pin.handle = bo_gem->gem_handle;
+	pin.alignment = alignment;
+
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_PIN,
+		       &pin);
+	if (ret != 0)
+		return -errno;
+
+	bo->offset64 = pin.offset;
+	bo->offset = pin.offset;
+	return 0;
+}
+
+static int
+drm_intel_gem_bo_unpin(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_unpin unpin;
+	int ret;
+
+	memclear(unpin);
+	unpin.handle = bo_gem->gem_handle;
+
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GEM_UNPIN, &unpin);
+	if (ret != 0)
+		return -errno;
+
+	return 0;
+}
+
+static int
+drm_intel_gem_bo_set_tiling_internal(drm_intel_bo *bo,
+				     uint32_t tiling_mode,
+				     uint32_t stride)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_tiling set_tiling;
+	int ret;
+
+	if (bo_gem->global_name == 0 &&
+	    tiling_mode == bo_gem->tiling_mode &&
+	    stride == bo_gem->stride)
+		return 0;
+
+	memset(&set_tiling, 0, sizeof(set_tiling));
+	do {
+		/* set_tiling is slightly broken and overwrites the
+		 * input on the error path, so we have to open code
+		 * drmIoctl.
+		 */
+		set_tiling.handle = bo_gem->gem_handle;
+		set_tiling.tiling_mode = tiling_mode;
+		set_tiling.stride = stride;
+
+		ret = ioctl(bufmgr_gem->fd,
+			    DRM_IOCTL_I915_GEM_SET_TILING,
+			    &set_tiling);
+	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));
+	if (ret == -1)
+		return -errno;
+
+	bo_gem->tiling_mode = set_tiling.tiling_mode;
+	bo_gem->swizzle_mode = set_tiling.swizzle_mode;
+	bo_gem->stride = set_tiling.stride;
+	return 0;
+}
+
+static int
+drm_intel_gem_bo_set_tiling(drm_intel_bo *bo, uint32_t * tiling_mode,
+			    uint32_t stride)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	/* Tiling with userptr surfaces is not supported
+	 * on all hardware so refuse it for time being.
+	 */
+	if (bo_gem->is_userptr)
+		return -EINVAL;
+
+	/* Linear buffers have no stride. By ensuring that we only ever use
+	 * stride 0 with linear buffers, we simplify our code.
+	 */
+	if (*tiling_mode == I915_TILING_NONE)
+		stride = 0;
+
+	ret = drm_intel_gem_bo_set_tiling_internal(bo, *tiling_mode, stride);
+	if (ret == 0)
+		drm_intel_bo_gem_set_in_aperture_size(bufmgr_gem, bo_gem);
+
+	*tiling_mode = bo_gem->tiling_mode;
+	return ret;
+}
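+
+/* Illustrative sketch (not part of this change): as with the tiled
+ * allocator, the tiling mode is in/out, so callers must accept a downgrade:
+ *
+ *	uint32_t tiling = I915_TILING_Y;
+ *	if (drm_intel_bo_set_tiling(bo, &tiling, pitch) == 0 &&
+ *	    tiling != I915_TILING_Y)
+ *		fall_back_to_x_or_linear();	// hypothetical
+ */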
+
+static int
+drm_intel_gem_bo_get_tiling(drm_intel_bo *bo, uint32_t * tiling_mode,
+			    uint32_t * swizzle_mode)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+	*tiling_mode = bo_gem->tiling_mode;
+	*swizzle_mode = bo_gem->swizzle_mode;
+	return 0;
+}
+
+drm_public drm_intel_bo *
+drm_intel_bo_gem_create_from_prime(drm_intel_bufmgr *bufmgr, int prime_fd, int size)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bufmgr;
+	int ret;
+	uint32_t handle;
+	drm_intel_bo_gem *bo_gem;
+	struct drm_i915_gem_get_tiling get_tiling;
+	drmMMListHead *list;
+
+	ret = drmPrimeFDToHandle(bufmgr_gem->fd, prime_fd, &handle);
+
+	/*
+	 * See if the kernel has already returned this buffer to us. Just as
+	 * for named buffers, we must not create two bos pointing at the same
+	 * kernel object.
+	 */
+	pthread_mutex_lock(&bufmgr_gem->lock);
+	for (list = bufmgr_gem->named.next;
+	     list != &bufmgr_gem->named;
+	     list = list->next) {
+		bo_gem = DRMLISTENTRY(drm_intel_bo_gem, list, name_list);
+		if (bo_gem->gem_handle == handle) {
+			drm_intel_gem_bo_reference(&bo_gem->bo);
+			pthread_mutex_unlock(&bufmgr_gem->lock);
+			return &bo_gem->bo;
+		}
+	}
+
+	if (ret) {
+		fprintf(stderr, "drmPrimeFDToHandle failed: %d (errno %d)\n",
+			ret, errno);
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return NULL;
+	}
+
+	bo_gem = calloc(1, sizeof(*bo_gem));
+	if (!bo_gem) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return NULL;
+	}
+	/* Determine size of bo.  The fd-to-handle ioctl really should
+	 * return the size, but it doesn't.  If we have kernel 3.12 or
+	 * later, we can lseek on the prime fd to get the size.  Older
+	 * kernels will just fail, in which case we fall back to the
+	 * provided (estimated or guessed) size. */
+	ret = lseek(prime_fd, 0, SEEK_END);
+	if (ret != -1)
+		bo_gem->bo.size = ret;
+	else
+		bo_gem->bo.size = size;
+
+	bo_gem->bo.handle = handle;
+	bo_gem->bo.bufmgr = bufmgr;
+
+	bo_gem->gem_handle = handle;
+
+	atomic_set(&bo_gem->refcount, 1);
+
+	bo_gem->name = "prime";
+	bo_gem->validate_index = -1;
+	bo_gem->reloc_tree_fences = 0;
+	bo_gem->used_as_reloc_target = false;
+	bo_gem->has_error = false;
+	bo_gem->reusable = false;
+
+	DRMINITLISTHEAD(&bo_gem->vma_list);
+	DRMLISTADDTAIL(&bo_gem->name_list, &bufmgr_gem->named);
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	memclear(get_tiling);
+	get_tiling.handle = bo_gem->gem_handle;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_GET_TILING,
+		       &get_tiling);
+	if (ret != 0) {
+		drm_intel_gem_bo_unreference(&bo_gem->bo);
+		return NULL;
+	}
+	bo_gem->tiling_mode = get_tiling.tiling_mode;
+	bo_gem->swizzle_mode = get_tiling.swizzle_mode;
+	/* XXX stride is unknown */
+	drm_intel_bo_gem_set_in_aperture_size(bufmgr_gem, bo_gem);
+
+	return &bo_gem->bo;
+}
+
+drm_public int
+drm_intel_bo_gem_export_to_prime(drm_intel_bo *bo, int *prime_fd)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+	if (DRMLISTEMPTY(&bo_gem->name_list))
+		DRMLISTADDTAIL(&bo_gem->name_list, &bufmgr_gem->named);
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	if (drmPrimeHandleToFD(bufmgr_gem->fd, bo_gem->gem_handle,
+			       DRM_CLOEXEC, prime_fd) != 0)
+		return -errno;
+
+	bo_gem->reusable = false;
+
+	return 0;
+}
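+
+/* Illustrative sketch (not upstream code): a typical PRIME round trip using
+ * the two entry points above; error handling omitted:
+ *
+ *	int prime_fd;
+ *	if (drm_intel_bo_gem_export_to_prime(bo, &prime_fd) == 0) {
+ *		// hand prime_fd to another driver/process, or re-import:
+ *		drm_intel_bo *imported =
+ *			drm_intel_bo_gem_create_from_prime(bufmgr, prime_fd,
+ *							   bo->size);
+ *	}
+ */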
+
+static int
+drm_intel_gem_bo_flink(drm_intel_bo *bo, uint32_t * name)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	if (!bo_gem->global_name) {
+		struct drm_gem_flink flink;
+
+		memclear(flink);
+		flink.handle = bo_gem->gem_handle;
+
+		pthread_mutex_lock(&bufmgr_gem->lock);
+
+		ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_GEM_FLINK, &flink);
+		if (ret != 0) {
+			pthread_mutex_unlock(&bufmgr_gem->lock);
+			return -errno;
+		}
+
+		bo_gem->global_name = flink.name;
+		bo_gem->reusable = false;
+
+		if (DRMLISTEMPTY(&bo_gem->name_list))
+			DRMLISTADDTAIL(&bo_gem->name_list, &bufmgr_gem->named);
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+	}
+
+	*name = bo_gem->global_name;
+	return 0;
+}
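+
+/* Illustrative sketch: sharing a buffer through a flink name.  The consumer
+ * side would use the public drm_intel_bo_gem_create_from_name(), defined
+ * earlier in this file:
+ *
+ *	uint32_t name;
+ *	if (drm_intel_bo_flink(bo, &name) == 0) {
+ *		// publish 'name'; another client can then do:
+ *		// drm_intel_bo *shared =
+ *		//	drm_intel_bo_gem_create_from_name(bufmgr, "shared", name);
+ *	}
+ */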
+
+/**
+ * Enables unlimited caching of buffer objects for reuse.
+ *
+ * This is potentially very memory expensive, as the cache at each bucket
+ * size is only bounded by how many buffers of that size we've managed to have
+ * in flight at once.
+ */
+drm_public void
+drm_intel_bufmgr_gem_enable_reuse(drm_intel_bufmgr *bufmgr)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bufmgr;
+
+	bufmgr_gem->bo_reuse = true;
+}
+
+/**
+ * Enable use of fenced reloc type.
+ *
+ * New code should enable this to avoid unnecessary fence register
+ * allocation.  If this option is not enabled, all relocs will have a fence
+ * register allocated.
+ */
+drm_public void
+drm_intel_bufmgr_gem_enable_fenced_relocs(drm_intel_bufmgr *bufmgr)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bufmgr;
+
+	if (bufmgr_gem->bufmgr.bo_exec == drm_intel_gem_bo_exec2)
+		bufmgr_gem->fenced_relocs = true;
+}
+
+/**
+ * Return the additional aperture space required by the tree of buffer objects
+ * rooted at bo.
+ */
+static int
+drm_intel_gem_bo_get_aperture_space(drm_intel_bo *bo)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int i;
+	int total = 0;
+
+	if (bo == NULL || bo_gem->included_in_check_aperture)
+		return 0;
+
+	total += bo->size;
+	bo_gem->included_in_check_aperture = true;
+
+	for (i = 0; i < bo_gem->reloc_count; i++)
+		total +=
+		    drm_intel_gem_bo_get_aperture_space(bo_gem->
+							reloc_target_info[i].bo);
+
+	return total;
+}
+
+/**
+ * Count the number of buffers in this list that need a fence reg
+ *
+ * If the count is greater than the number of available regs, we'll have
+ * to ask the caller to resubmit a batch with fewer tiled buffers.
+ *
+ * This function over-counts if the same buffer is used multiple times.
+ */
+static unsigned int
+drm_intel_gem_total_fences(drm_intel_bo ** bo_array, int count)
+{
+	int i;
+	unsigned int total = 0;
+
+	for (i = 0; i < count; i++) {
+		drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo_array[i];
+
+		if (bo_gem == NULL)
+			continue;
+
+		total += bo_gem->reloc_tree_fences;
+	}
+	return total;
+}
+
+/**
+ * Clear the flag set by drm_intel_gem_bo_get_aperture_space() so we're ready
+ * for the next drm_intel_bufmgr_check_aperture_space() call.
+ */
+static void
+drm_intel_gem_bo_clear_aperture_space_flag(drm_intel_bo *bo)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int i;
+
+	if (bo == NULL || !bo_gem->included_in_check_aperture)
+		return;
+
+	bo_gem->included_in_check_aperture = false;
+
+	for (i = 0; i < bo_gem->reloc_count; i++)
+		drm_intel_gem_bo_clear_aperture_space_flag(bo_gem->
+							   reloc_target_info[i].bo);
+}
+
+/**
+ * Return a conservative estimate for the amount of aperture required
+ * for a collection of buffers. This may double-count some buffers.
+ */
+static unsigned int
+drm_intel_gem_estimate_batch_space(drm_intel_bo **bo_array, int count)
+{
+	int i;
+	unsigned int total = 0;
+
+	for (i = 0; i < count; i++) {
+		drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo_array[i];
+		if (bo_gem != NULL)
+			total += bo_gem->reloc_tree_size;
+	}
+	return total;
+}
+
+/**
+ * Return the amount of aperture needed for a collection of buffers.
+ * This avoids double counting any buffers, at the cost of looking
+ * at every buffer in the set.
+ */
+static unsigned int
+drm_intel_gem_compute_batch_space(drm_intel_bo **bo_array, int count)
+{
+	int i;
+	unsigned int total = 0;
+	drm_intel_bufmgr_gem *bufmgr_gem;
+
+	if (count == 0)
+	    return 0;
+
+	/* Protect accesses to aperture counting flags.
+	 */
+	bufmgr_gem = (drm_intel_bufmgr_gem *) bo_array[0]->bufmgr;
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	for (i = 0; i < count; i++) {
+		total += drm_intel_gem_bo_get_aperture_space(bo_array[i]);
+		/* For the first buffer object in the array, we get an
+		 * accurate count back for its reloc_tree size (since nothing
+		 * had been flagged as being counted yet).  We can save that
+		 * value out as a more conservative reloc_tree_size that
+		 * avoids double-counting target buffers.  Since the first
+		 * buffer happens to usually be the batch buffer in our
+		 * callers, this can pull us back from doing the tree
+		 * walk on every new batch emit.
+		 */
+		if (i == 0) {
+			drm_intel_bo_gem *bo_gem =
+			    (drm_intel_bo_gem *) bo_array[i];
+			bo_gem->reloc_tree_size = total;
+		}
+	}
+
+	for (i = 0; i < count; i++)
+		drm_intel_gem_bo_clear_aperture_space_flag(bo_array[i]);
+
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return total;
+}
+
+/**
+ * Return -1 if the batchbuffer should be flushed before attempting to
+ * emit rendering referencing the buffers pointed to by bo_array.
+ *
+ * This is required because if we try to emit a batchbuffer with relocations
+ * to a tree of buffers that won't simultaneously fit in the aperture,
+ * the rendering will return an error at a point where the software is not
+ * prepared to recover from it.
+ *
+ * However, we also want to emit the batchbuffer significantly before we reach
+ * the limit, as a series of batchbuffers each of which references buffers
+ * covering almost all of the aperture means that at each emit we end up
+ * waiting to evict a buffer from the last rendering, and we get synchronous
+ * performance.  By emitting smaller batchbuffers, we eat some CPU overhead to
+ * get better parallelism.
+ */
+static int
+drm_intel_gem_check_aperture_space(drm_intel_bo **bo_array, int count)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem =
+	    (drm_intel_bufmgr_gem *) bo_array[0]->bufmgr;
+	unsigned int total = 0;
+	unsigned int threshold = bufmgr_gem->gtt_size * 3 / 4;
+	int total_fences;
+
+	/* Check for fence reg constraints if necessary */
+	if (bufmgr_gem->available_fences) {
+		total_fences = drm_intel_gem_total_fences(bo_array, count);
+		if (total_fences > bufmgr_gem->available_fences)
+			return -ENOSPC;
+	}
+
+	total = drm_intel_gem_estimate_batch_space(bo_array, count);
+
+	if (total > threshold)
+		total = drm_intel_gem_compute_batch_space(bo_array, count);
+
+	if (total > threshold) {
+		DBG("check_space: overflowed available aperture, "
+		    "%dkb vs %dkb\n",
+		    total / 1024, (int)bufmgr_gem->gtt_size / 1024);
+		return -ENOSPC;
+	} else {
+		DBG("drm_check_space: total %dkb vs bufmgr %dkb\n", total / 1024,
+		    (int)bufmgr_gem->gtt_size / 1024);
+		return 0;
+	}
+}
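+
+/* Illustrative sketch: how a batch emitter might consume this check through
+ * the public drm_intel_bufmgr_check_aperture_space().  flush_batch() here is
+ * a hypothetical caller-side helper:
+ *
+ *	drm_intel_bo *bos[] = { batch_bo, target_bo };
+ *	if (drm_intel_bufmgr_check_aperture_space(bos, 2) != 0) {
+ *		flush_batch();	// submit what we have, then retry
+ *	}
+ */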
+
+/*
+ * Disable buffer reuse for objects which are shared with the kernel
+ * as scanout buffers
+ */
+static int
+drm_intel_gem_bo_disable_reuse(drm_intel_bo *bo)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+	bo_gem->reusable = false;
+	return 0;
+}
+
+static int
+drm_intel_gem_bo_is_reusable(drm_intel_bo *bo)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+	return bo_gem->reusable;
+}
+
+static int
+_drm_intel_gem_bo_references(drm_intel_bo *bo, drm_intel_bo *target_bo)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int i;
+
+	for (i = 0; i < bo_gem->reloc_count; i++) {
+		if (bo_gem->reloc_target_info[i].bo == target_bo)
+			return 1;
+		if (bo == bo_gem->reloc_target_info[i].bo)
+			continue;
+		if (_drm_intel_gem_bo_references(bo_gem->reloc_target_info[i].bo,
+						target_bo))
+			return 1;
+	}
+
+	return 0;
+}
+
+/** Return true if target_bo is referenced by bo's relocation tree. */
+static int
+drm_intel_gem_bo_references(drm_intel_bo *bo, drm_intel_bo *target_bo)
+{
+	drm_intel_bo_gem *target_bo_gem = (drm_intel_bo_gem *) target_bo;
+
+	if (bo == NULL || target_bo == NULL)
+		return 0;
+	if (target_bo_gem->used_as_reloc_target)
+		return _drm_intel_gem_bo_references(bo, target_bo);
+	return 0;
+}
+
+static void
+add_bucket(drm_intel_bufmgr_gem *bufmgr_gem, int size)
+{
+	unsigned int i = bufmgr_gem->num_buckets;
+
+	assert(i < ARRAY_SIZE(bufmgr_gem->cache_bucket));
+
+	DRMINITLISTHEAD(&bufmgr_gem->cache_bucket[i].head);
+	bufmgr_gem->cache_bucket[i].size = size;
+	bufmgr_gem->num_buckets++;
+}
+
+static void
+init_cache_buckets(drm_intel_bufmgr_gem *bufmgr_gem)
+{
+	unsigned long size, cache_max_size = 64 * 1024 * 1024;
+
+	/* OK, so power of two buckets was too wasteful of memory.
+	 * Give 3 other sizes between each power of two, to hopefully
+	 * cover things accurately enough.  (The alternative is
+	 * probably to just go for exact matching of sizes, and assume
+	 * that for things like composited window resize the tiled
+	 * width/height alignment and rounding of sizes to pages will
+	 * get us useful cache hit rates anyway)
+	 */
+	add_bucket(bufmgr_gem, 4096);
+	add_bucket(bufmgr_gem, 4096 * 2);
+	add_bucket(bufmgr_gem, 4096 * 3);
+
+	/* Initialize the linked lists for BO reuse cache. */
+	for (size = 4 * 4096; size <= cache_max_size; size *= 2) {
+		add_bucket(bufmgr_gem, size);
+
+		add_bucket(bufmgr_gem, size + size * 1 / 4);
+		add_bucket(bufmgr_gem, size + size * 2 / 4);
+		add_bucket(bufmgr_gem, size + size * 3 / 4);
+	}
+}
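+
+/* For reference, the calls above produce bucket sizes of 4, 8 and 12KB, then
+ * for every power of two from 16KB to 64MB the power of two itself plus
+ * three evenly spaced intermediate sizes: 16, 20, 24, 28, 32, 40, 48, 56,
+ * 64KB and so on.
+ */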
+
+drm_public void
+drm_intel_bufmgr_gem_set_vma_cache_size(drm_intel_bufmgr *bufmgr, int limit)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bufmgr;
+
+	bufmgr_gem->vma_max = limit;
+
+	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
+}
+
+/**
+ * Get the PCI ID for the device.  This can be overridden by setting the
+ * INTEL_DEVID_OVERRIDE environment variable to the desired ID.
+ */
+static int
+get_pci_device_id(drm_intel_bufmgr_gem *bufmgr_gem)
+{
+	char *devid_override;
+	int devid = 0;
+	int ret;
+	drm_i915_getparam_t gp;
+
+	if (geteuid() == getuid()) {
+		devid_override = getenv("INTEL_DEVID_OVERRIDE");
+		if (devid_override) {
+			bufmgr_gem->no_exec = true;
+			return strtod(devid_override, NULL);
+		}
+	}
+
+	memclear(gp);
+	gp.param = I915_PARAM_CHIPSET_ID;
+	gp.value = &devid;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	if (ret) {
+		fprintf(stderr, "get chip id failed: %d [%d]\n", ret, errno);
+		fprintf(stderr, "param: %d, val: %d\n", gp.param, *gp.value);
+	}
+	return devid;
+}
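+
+/* Illustrative usage of the override (shell, not C; the binary name is made
+ * up).  Forcing the bufmgr to treat the device as a Haswell GT2 desktop
+ * part:
+ *
+ *	INTEL_DEVID_OVERRIDE=0x0412 ./my_test
+ *
+ * Note the override also sets no_exec above, so batches are never actually
+ * submitted to the hardware.
+ */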
+
+drm_public int
+drm_intel_bufmgr_gem_get_devid(drm_intel_bufmgr *bufmgr)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bufmgr;
+
+	return bufmgr_gem->pci_device;
+}
+
+/**
+ * Sets the AUB filename.
+ *
+ * This function has to be called before drm_intel_bufmgr_gem_set_aub_dump()
+ * for it to have any effect.
+ */
+drm_public void
+drm_intel_bufmgr_gem_set_aub_filename(drm_intel_bufmgr *bufmgr,
+				      const char *filename)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bufmgr;
+
+	free(bufmgr_gem->aub_filename);
+	if (filename)
+		bufmgr_gem->aub_filename = strdup(filename);
+}
+
+/**
+ * Sets up AUB dumping.
+ *
+ * This is a trace file format that can be used with the simulator.
+ * Packets are emitted in a format somewhat like GPU command packets.
+ * You can set up a GTT and upload your objects into the referenced
+ * space, then send off batchbuffers and get BMPs out the other end.
+ */
+drm_public void
+drm_intel_bufmgr_gem_set_aub_dump(drm_intel_bufmgr *bufmgr, int enable)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bufmgr;
+	int entry = 0x200003;
+	int i;
+	int gtt_size = 0x10000;
+	const char *filename;
+
+	if (!enable) {
+		if (bufmgr_gem->aub_file) {
+			fclose(bufmgr_gem->aub_file);
+			bufmgr_gem->aub_file = NULL;
+		}
+		return;
+	}
+
+	if (geteuid() != getuid())
+		return;
+
+	if (bufmgr_gem->aub_filename)
+		filename = bufmgr_gem->aub_filename;
+	else
+		filename = "intel.aub";
+	bufmgr_gem->aub_file = fopen(filename, "w+");
+	if (!bufmgr_gem->aub_file)
+		return;
+
+	/* Start allocating objects from just after the GTT. */
+	bufmgr_gem->aub_offset = gtt_size;
+
+	/* Start with a (required) version packet. */
+	aub_out(bufmgr_gem, CMD_AUB_HEADER | (13 - 2));
+	aub_out(bufmgr_gem,
+		(4 << AUB_HEADER_MAJOR_SHIFT) |
+		(0 << AUB_HEADER_MINOR_SHIFT));
+	for (i = 0; i < 8; i++) {
+		aub_out(bufmgr_gem, 0); /* app name */
+	}
+	aub_out(bufmgr_gem, 0); /* timestamp */
+	aub_out(bufmgr_gem, 0); /* timestamp */
+	aub_out(bufmgr_gem, 0); /* comment len */
+
+	/* Set up the GTT. The max we can handle is 256M */
+	aub_out(bufmgr_gem, CMD_AUB_TRACE_HEADER_BLOCK | ((bufmgr_gem->gen >= 8 ? 6 : 5) - 2));
+	/* Need to use GTT_ENTRY type for recent emulator */
+	aub_out(bufmgr_gem, AUB_TRACE_MEMTYPE_GTT_ENTRY | 0 | AUB_TRACE_OP_DATA_WRITE);
+	aub_out(bufmgr_gem, 0); /* subtype */
+	aub_out(bufmgr_gem, 0); /* offset */
+	aub_out(bufmgr_gem, gtt_size); /* size */
+	if (bufmgr_gem->gen >= 8)
+		aub_out(bufmgr_gem, 0);
+	for (i = 0x000; i < gtt_size; i += 4, entry += 0x1000) {
+		aub_out(bufmgr_gem, entry);
+	}
+}
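+
+/* Illustrative sketch: enabling AUB capture.  As documented on
+ * drm_intel_bufmgr_gem_set_aub_filename() above, the filename must be set
+ * before dumping is enabled:
+ *
+ *	drm_intel_bufmgr_gem_set_aub_filename(bufmgr, "trace.aub");
+ *	drm_intel_bufmgr_gem_set_aub_dump(bufmgr, 1);
+ *	// ... submit work ...
+ *	drm_intel_bufmgr_gem_set_aub_dump(bufmgr, 0);	// closes the file
+ */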
+
+drm_public drm_intel_context *
+drm_intel_gem_context_create(drm_intel_bufmgr *bufmgr)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bufmgr;
+	struct drm_i915_gem_context_create create;
+	drm_intel_context *context = NULL;
+	int ret;
+
+	context = calloc(1, sizeof(*context));
+	if (!context)
+		return NULL;
+
+	memclear(create);
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &create);
+	if (ret != 0) {
+		DBG("DRM_IOCTL_I915_GEM_CONTEXT_CREATE failed: %s\n",
+		    strerror(errno));
+		free(context);
+		return NULL;
+	}
+
+	context->ctx_id = create.ctx_id;
+	context->bufmgr = bufmgr;
+
+	return context;
+}
+
+drm_public void
+drm_intel_gem_context_destroy(drm_intel_context *ctx)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem;
+	struct drm_i915_gem_context_destroy destroy;
+	int ret;
+
+	if (ctx == NULL)
+		return;
+
+	memclear(destroy);
+
+	bufmgr_gem = (drm_intel_bufmgr_gem *)ctx->bufmgr;
+	destroy.ctx_id = ctx->ctx_id;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GEM_CONTEXT_DESTROY,
+		       &destroy);
+	if (ret != 0)
+		fprintf(stderr, "DRM_IOCTL_I915_GEM_CONTEXT_DESTROY failed: %s\n",
+			strerror(errno));
+
+	free(ctx);
+}
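+
+/* Illustrative sketch: typical lifetime of a hardware context, combining the
+ * two functions above with the context-aware exec entry point declared in
+ * intel_bufmgr.h:
+ *
+ *	drm_intel_context *ctx = drm_intel_gem_context_create(bufmgr);
+ *	if (ctx) {
+ *		drm_intel_gem_bo_context_exec(batch_bo, ctx, used_bytes, 0);
+ *		drm_intel_gem_context_destroy(ctx);
+ *	}
+ */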
+
+drm_public int
+drm_intel_get_reset_stats(drm_intel_context *ctx,
+			  uint32_t *reset_count,
+			  uint32_t *active,
+			  uint32_t *pending)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem;
+	struct drm_i915_reset_stats stats;
+	int ret;
+
+	if (ctx == NULL)
+		return -EINVAL;
+
+	memclear(stats);
+
+	bufmgr_gem = (drm_intel_bufmgr_gem *)ctx->bufmgr;
+	stats.ctx_id = ctx->ctx_id;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GET_RESET_STATS,
+		       &stats);
+	if (ret == 0) {
+		if (reset_count != NULL)
+			*reset_count = stats.reset_count;
+
+		if (active != NULL)
+			*active = stats.batch_active;
+
+		if (pending != NULL)
+			*pending = stats.batch_pending;
+	}
+
+	return ret;
+}
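+
+/* Illustrative sketch: polling reset statistics to detect whether this
+ * context was involved in a GPU hang:
+ *
+ *	uint32_t resets, active, pending;
+ *	if (drm_intel_get_reset_stats(ctx, &resets, &active, &pending) == 0 &&
+ *	    active > 0) {
+ *		// a batch from this context was executing when a hang occurred
+ *	}
+ */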
+
+drm_public int
+drm_intel_reg_read(drm_intel_bufmgr *bufmgr,
+		   uint32_t offset,
+		   uint64_t *result)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bufmgr;
+	struct drm_i915_reg_read reg_read;
+	int ret;
+
+	memclear(reg_read);
+	reg_read.offset = offset;
+
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_REG_READ, &reg_read);
+
+	*result = reg_read.val;
+	return ret;
+}
+
+drm_public int
+drm_intel_get_subslice_total(int fd, unsigned int *subslice_total)
+{
+	drm_i915_getparam_t gp;
+	int ret;
+
+	memclear(gp);
+	gp.value = (int*)subslice_total;
+	gp.param = I915_PARAM_SUBSLICE_TOTAL;
+	ret = drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	if (ret)
+		return -errno;
+
+	return 0;
+}
+
+drm_public int
+drm_intel_get_eu_total(int fd, unsigned int *eu_total)
+{
+	drm_i915_getparam_t gp;
+	int ret;
+
+	memclear(gp);
+	gp.value = (int*)eu_total;
+	gp.param = I915_PARAM_EU_TOTAL;
+	ret = drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	if (ret)
+		return -errno;
+
+	return 0;
+}
+
+/**
+ * Annotate the given bo for use in aub dumping.
+ *
+ * \param annotations is an array of drm_intel_aub_annotation objects
+ * describing the type of data in various sections of the bo.  Each
+ * element of the array specifies the type and subtype of a section of
+ * the bo, and the past-the-end offset of that section.  The elements
+ * of \c annotations must be sorted so that ending_offset is
+ * increasing.
+ *
+ * \param count is the number of elements in the \c annotations array.
+ * If \c count is zero, then \c annotations will not be dereferenced.
+ *
+ * Annotations are copied into a private data structure, so caller may
+ * re-use the memory pointed to by \c annotations after the call
+ * returns.
+ *
+ * Annotations are stored for the lifetime of the bo; to reset to the
+ * default state (no annotations), call this function with a \c count
+ * of zero.
+ */
+drm_public void
+drm_intel_bufmgr_gem_set_aub_annotations(drm_intel_bo *bo,
+					 drm_intel_aub_annotation *annotations,
+					 unsigned count)
+{
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	unsigned size = sizeof(*annotations) * count;
+	drm_intel_aub_annotation *new_annotations =
+		count > 0 ? realloc(bo_gem->aub_annotations, size) : NULL;
+	if (new_annotations == NULL) {
+		free(bo_gem->aub_annotations);
+		bo_gem->aub_annotations = NULL;
+		bo_gem->aub_annotation_count = 0;
+		return;
+	}
+	memcpy(new_annotations, annotations, size);
+	bo_gem->aub_annotations = new_annotations;
+	bo_gem->aub_annotation_count = count;
+}
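+
+/* Illustrative sketch (the offsets are made up): labelling the first 4KB of
+ * a batch bo as commands and the remainder as untyped data.  ending_offset
+ * must be increasing across the array:
+ *
+ *	drm_intel_aub_annotation notes[] = {
+ *		{ AUB_TRACE_TYPE_BATCH, 0, 4096 },
+ *		{ AUB_TRACE_TYPE_NOTYPE, 0, bo->size },
+ *	};
+ *	drm_intel_bufmgr_gem_set_aub_annotations(bo, notes, 2);
+ */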
+
+static pthread_mutex_t bufmgr_list_mutex = PTHREAD_MUTEX_INITIALIZER;
+static drmMMListHead bufmgr_list = { &bufmgr_list, &bufmgr_list };
+
+static drm_intel_bufmgr_gem *
+drm_intel_bufmgr_gem_find(int fd)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem;
+
+	DRMLISTFOREACHENTRY(bufmgr_gem, &bufmgr_list, managers) {
+		if (bufmgr_gem->fd == fd) {
+			atomic_inc(&bufmgr_gem->refcount);
+			return bufmgr_gem;
+		}
+	}
+
+	return NULL;
+}
+
+static void
+drm_intel_bufmgr_gem_unref(drm_intel_bufmgr *bufmgr)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bufmgr;
+
+	if (atomic_add_unless(&bufmgr_gem->refcount, -1, 1)) {
+		pthread_mutex_lock(&bufmgr_list_mutex);
+
+		if (atomic_dec_and_test(&bufmgr_gem->refcount)) {
+			DRMLISTDEL(&bufmgr_gem->managers);
+			drm_intel_bufmgr_gem_destroy(bufmgr);
+		}
+
+		pthread_mutex_unlock(&bufmgr_list_mutex);
+	}
+}
+
+static bool
+has_userptr(drm_intel_bufmgr_gem *bufmgr_gem)
+{
+	int ret;
+	void *ptr;
+	long pgsz;
+	struct drm_i915_gem_userptr userptr;
+	struct drm_gem_close close_bo;
+
+	pgsz = sysconf(_SC_PAGESIZE);
+	assert(pgsz > 0);
+
+	ret = posix_memalign(&ptr, pgsz, pgsz);
+	if (ret) {
+		DBG("Failed to get a page (%ld) for userptr detection!\n",
+			pgsz);
+		return false;
+	}
+
+	memclear(userptr);
+	userptr.user_ptr = (__u64)(unsigned long)ptr;
+	userptr.user_size = pgsz;
+
+retry:
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GEM_USERPTR, &userptr);
+	if (ret) {
+		if (errno == ENODEV && userptr.flags == 0) {
+			userptr.flags = I915_USERPTR_UNSYNCHRONIZED;
+			goto retry;
+		}
+		free(ptr);
+		return false;
+	}
+
+	memclear(close_bo);
+	close_bo.handle = userptr.handle;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_GEM_CLOSE, &close_bo);
+	free(ptr);
+	if (ret) {
+		fprintf(stderr, "Failed to release test userptr object! (%d) "
+				"i915 kernel driver may not be sane!\n", errno);
+		return false;
+	}
+
+	return true;
+}
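+
+/* Illustrative sketch: when the probe above succeeds, clients can wrap
+ * page-aligned memory in a bo through the public
+ * drm_intel_bo_alloc_userptr() wrapper:
+ *
+ *	void *ptr;
+ *	if (posix_memalign(&ptr, 4096, 65536) == 0) {
+ *		drm_intel_bo *bo = drm_intel_bo_alloc_userptr(bufmgr, "wrap",
+ *				ptr, I915_TILING_NONE, 0, 65536, 0);
+ *	}
+ */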
+
+/**
+ * Initializes the GEM buffer manager, which uses the kernel to allocate, map,
+ * and manage buffer objects.
+ *
+ * \param fd File descriptor of the opened DRM device.
+ */
+drm_public drm_intel_bufmgr *
+drm_intel_bufmgr_gem_init(int fd, int batch_size)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem;
+	struct drm_i915_gem_get_aperture aperture;
+	drm_i915_getparam_t gp;
+	int ret, tmp;
+	bool exec2 = false;
+
+	pthread_mutex_lock(&bufmgr_list_mutex);
+
+	bufmgr_gem = drm_intel_bufmgr_gem_find(fd);
+	if (bufmgr_gem)
+		goto exit;
+
+	bufmgr_gem = calloc(1, sizeof(*bufmgr_gem));
+	if (bufmgr_gem == NULL)
+		goto exit;
+
+	bufmgr_gem->fd = fd;
+	atomic_set(&bufmgr_gem->refcount, 1);
+
+	if (pthread_mutex_init(&bufmgr_gem->lock, NULL) != 0) {
+		free(bufmgr_gem);
+		bufmgr_gem = NULL;
+		goto exit;
+	}
+
+	memclear(aperture);
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_GET_APERTURE,
+		       &aperture);
+
+	if (ret == 0)
+		bufmgr_gem->gtt_size = aperture.aper_available_size;
+	else {
+		fprintf(stderr, "DRM_IOCTL_I915_GEM_GET_APERTURE failed: %s\n",
+			strerror(errno));
+		bufmgr_gem->gtt_size = 128 * 1024 * 1024;
+		fprintf(stderr, "Assuming %dkB available aperture size.\n"
+			"May lead to reduced performance or incorrect "
+			"rendering.\n",
+			(int)bufmgr_gem->gtt_size / 1024);
+	}
+
+	bufmgr_gem->pci_device = get_pci_device_id(bufmgr_gem);
+
+	if (IS_GEN2(bufmgr_gem->pci_device))
+		bufmgr_gem->gen = 2;
+	else if (IS_GEN3(bufmgr_gem->pci_device))
+		bufmgr_gem->gen = 3;
+	else if (IS_GEN4(bufmgr_gem->pci_device))
+		bufmgr_gem->gen = 4;
+	else if (IS_GEN5(bufmgr_gem->pci_device))
+		bufmgr_gem->gen = 5;
+	else if (IS_GEN6(bufmgr_gem->pci_device))
+		bufmgr_gem->gen = 6;
+	else if (IS_GEN7(bufmgr_gem->pci_device))
+		bufmgr_gem->gen = 7;
+	else if (IS_GEN8(bufmgr_gem->pci_device))
+		bufmgr_gem->gen = 8;
+	else if (IS_GEN9(bufmgr_gem->pci_device))
+		bufmgr_gem->gen = 9;
+	else {
+		free(bufmgr_gem);
+		bufmgr_gem = NULL;
+		goto exit;
+	}
+
+	if (IS_GEN3(bufmgr_gem->pci_device) &&
+	    bufmgr_gem->gtt_size > 256*1024*1024) {
+		/* The unmappable part of gtt on gen 3 (i.e. above 256MB) can't
+		 * be used for tiled blits. To simplify the accounting, just
+		 * subtract the unmappable part (fixed to 256MB on all known
+		 * gen3 devices) if the kernel advertises it. */
+		bufmgr_gem->gtt_size -= 256*1024*1024;
+	}
+
+	memclear(gp);
+	gp.value = &tmp;
+
+	gp.param = I915_PARAM_HAS_EXECBUF2;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	if (!ret)
+		exec2 = true;
+
+	gp.param = I915_PARAM_HAS_BSD;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	bufmgr_gem->has_bsd = ret == 0;
+
+	gp.param = I915_PARAM_HAS_BLT;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	bufmgr_gem->has_blt = ret == 0;
+
+	gp.param = I915_PARAM_HAS_RELAXED_FENCING;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	bufmgr_gem->has_relaxed_fencing = ret == 0;
+
+	if (has_userptr(bufmgr_gem))
+		bufmgr_gem->bufmgr.bo_alloc_userptr =
+			drm_intel_gem_bo_alloc_userptr;
+
+	gp.param = I915_PARAM_HAS_WAIT_TIMEOUT;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	bufmgr_gem->has_wait_timeout = ret == 0;
+
+	gp.param = I915_PARAM_HAS_LLC;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	if (ret != 0) {
+		/* Kernel does not support the HAS_LLC query; fall back to GPU
+		 * generation detection and assume that we have LLC on GEN6/7.
+		 */
+		bufmgr_gem->has_llc = (IS_GEN6(bufmgr_gem->pci_device) ||
+				IS_GEN7(bufmgr_gem->pci_device));
+	} else
+		bufmgr_gem->has_llc = *gp.value;
+
+	gp.param = I915_PARAM_HAS_VEBOX;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	bufmgr_gem->has_vebox = (ret == 0) && (*gp.value > 0);
+
+	if (bufmgr_gem->gen < 4) {
+		gp.param = I915_PARAM_NUM_FENCES_AVAIL;
+		gp.value = &bufmgr_gem->available_fences;
+		ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+		if (ret) {
+			fprintf(stderr, "get fences failed: %d [%d]\n", ret,
+				errno);
+			fprintf(stderr, "param: %d, val: %d\n", gp.param,
+				*gp.value);
+			bufmgr_gem->available_fences = 0;
+		} else {
+			/* XXX The kernel reports the total number of fences,
+			 * including any that may be pinned.
+			 *
+			 * We presume that there will be at least one pinned
+			 * fence for the scanout buffer, but there may be more
+			 * than one scanout and the user may be manually
+			 * pinning buffers. Let's move to execbuffer2 and
+			 * thereby forget the insanity of using fences...
+			 */
+			bufmgr_gem->available_fences -= 2;
+			if (bufmgr_gem->available_fences < 0)
+				bufmgr_gem->available_fences = 0;
+		}
+	}
+
+	/* Let's go with one relocation per every 2 dwords (but round down a bit
+	 * since a power of two will mean an extra page allocation for the reloc
+	 * buffer).
+	 *
+	 * Every 4 was too few for the blender benchmark.
+	 */
+	bufmgr_gem->max_relocs = batch_size / sizeof(uint32_t) / 2 - 2;
+
+	bufmgr_gem->bufmgr.bo_alloc = drm_intel_gem_bo_alloc;
+	bufmgr_gem->bufmgr.bo_alloc_for_render =
+	    drm_intel_gem_bo_alloc_for_render;
+	bufmgr_gem->bufmgr.bo_alloc_tiled = drm_intel_gem_bo_alloc_tiled;
+	bufmgr_gem->bufmgr.bo_reference = drm_intel_gem_bo_reference;
+	bufmgr_gem->bufmgr.bo_unreference = drm_intel_gem_bo_unreference;
+	bufmgr_gem->bufmgr.bo_map = drm_intel_gem_bo_map;
+	bufmgr_gem->bufmgr.bo_unmap = drm_intel_gem_bo_unmap;
+	bufmgr_gem->bufmgr.bo_subdata = drm_intel_gem_bo_subdata;
+	bufmgr_gem->bufmgr.bo_get_subdata = drm_intel_gem_bo_get_subdata;
+	bufmgr_gem->bufmgr.bo_wait_rendering = drm_intel_gem_bo_wait_rendering;
+	bufmgr_gem->bufmgr.bo_emit_reloc = drm_intel_gem_bo_emit_reloc;
+	bufmgr_gem->bufmgr.bo_emit_reloc_fence = drm_intel_gem_bo_emit_reloc_fence;
+	bufmgr_gem->bufmgr.bo_pin = drm_intel_gem_bo_pin;
+	bufmgr_gem->bufmgr.bo_unpin = drm_intel_gem_bo_unpin;
+	bufmgr_gem->bufmgr.bo_get_tiling = drm_intel_gem_bo_get_tiling;
+	bufmgr_gem->bufmgr.bo_set_tiling = drm_intel_gem_bo_set_tiling;
+	bufmgr_gem->bufmgr.bo_flink = drm_intel_gem_bo_flink;
+	/* Use the new one if available */
+	if (exec2) {
+		bufmgr_gem->bufmgr.bo_exec = drm_intel_gem_bo_exec2;
+		bufmgr_gem->bufmgr.bo_mrb_exec = drm_intel_gem_bo_mrb_exec2;
+	} else
+		bufmgr_gem->bufmgr.bo_exec = drm_intel_gem_bo_exec;
+	bufmgr_gem->bufmgr.bo_busy = drm_intel_gem_bo_busy;
+	bufmgr_gem->bufmgr.bo_madvise = drm_intel_gem_bo_madvise;
+	bufmgr_gem->bufmgr.destroy = drm_intel_bufmgr_gem_unref;
+	bufmgr_gem->bufmgr.debug = 0;
+	bufmgr_gem->bufmgr.check_aperture_space =
+	    drm_intel_gem_check_aperture_space;
+	bufmgr_gem->bufmgr.bo_disable_reuse = drm_intel_gem_bo_disable_reuse;
+	bufmgr_gem->bufmgr.bo_is_reusable = drm_intel_gem_bo_is_reusable;
+	bufmgr_gem->bufmgr.get_pipe_from_crtc_id =
+	    drm_intel_gem_get_pipe_from_crtc_id;
+	bufmgr_gem->bufmgr.bo_references = drm_intel_gem_bo_references;
+
+	DRMINITLISTHEAD(&bufmgr_gem->named);
+	init_cache_buckets(bufmgr_gem);
+
+	DRMINITLISTHEAD(&bufmgr_gem->vma_cache);
+	bufmgr_gem->vma_max = -1; /* unlimited by default */
+
+	DRMLISTADD(&bufmgr_gem->managers, &bufmgr_list);
+
+exit:
+	pthread_mutex_unlock(&bufmgr_list_mutex);
+
+	return bufmgr_gem != NULL ? &bufmgr_gem->bufmgr : NULL;
+}
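+
+/* Illustrative sketch: typical client initialization, assuming an
+ * already-opened DRM fd (the device path varies and the 4096-byte batch size
+ * is arbitrary):
+ *
+ *	int fd = open("/dev/dri/renderD128", O_RDWR);
+ *	drm_intel_bufmgr *bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
+ *	if (bufmgr)
+ *		drm_intel_bufmgr_gem_enable_reuse(bufmgr);
+ */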
diff --git a/icd/intel/kmd/libdrm/intel/intel_bufmgr_priv.h b/icd/intel/kmd/libdrm/intel/intel_bufmgr_priv.h
new file mode 100644
index 0000000..59ebd18
--- /dev/null
+++ b/icd/intel/kmd/libdrm/intel/intel_bufmgr_priv.h
@@ -0,0 +1,304 @@
+/*
+ * Copyright © 2008 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eric Anholt <eric@anholt.net>
+ *
+ */
+
+/**
+ * @file intel_bufmgr_priv.h
+ *
+ * Private definitions of Intel-specific bufmgr functions and structures.
+ */
+
+#ifndef INTEL_BUFMGR_PRIV_H
+#define INTEL_BUFMGR_PRIV_H
+
+/**
+ * Context for a buffer manager instance.
+ *
+ * Contains public methods followed by private storage for the buffer manager.
+ */
+struct _drm_intel_bufmgr {
+	/**
+	 * Allocate a buffer object.
+	 *
+	 * Buffer objects are not necessarily initially mapped into CPU virtual
+	 * address space or graphics device aperture.  They must be mapped
+	 * using bo_map() or drm_intel_gem_bo_map_gtt() to be used by the CPU.
+	 */
+	drm_intel_bo *(*bo_alloc) (drm_intel_bufmgr *bufmgr, const char *name,
+				   unsigned long size, unsigned int alignment);
+
+	/**
+	 * Allocate a buffer object, hinting that it will be used as a
+	 * render target.
+	 *
+	 * This is otherwise the same as bo_alloc.
+	 */
+	drm_intel_bo *(*bo_alloc_for_render) (drm_intel_bufmgr *bufmgr,
+					      const char *name,
+					      unsigned long size,
+					      unsigned int alignment);
+
+	/**
+	 * Allocate a buffer object from an existing user accessible
+	 * address malloc'd with the provided size.
+	 * Alignment is used when mapping to the gtt.
+	 * Flags may be I915_USERPTR_READ_ONLY or I915_USERPTR_UNSYNCHRONIZED.
+	 */
+	drm_intel_bo *(*bo_alloc_userptr)(drm_intel_bufmgr *bufmgr,
+					  const char *name, void *addr,
+					  uint32_t tiling_mode, uint32_t stride,
+					  unsigned long size,
+					  unsigned long flags);
+
+	/**
+	 * Allocate a tiled buffer object.
+	 *
+	 * Alignment for tiled objects is set automatically; the 'flags'
+	 * argument provides a hint about how the object will be used initially.
+	 *
+	 * Valid tiling formats are:
+	 *  I915_TILING_NONE
+	 *  I915_TILING_X
+	 *  I915_TILING_Y
+	 *
+	 * Note the tiling format may be rejected; callers should check the
+	 * 'tiling_mode' field on return, as well as the pitch value, which
+	 * may have been rounded up to accommodate tiling restrictions.
+	 */
+	drm_intel_bo *(*bo_alloc_tiled) (drm_intel_bufmgr *bufmgr,
+					 const char *name,
+					 int x, int y, int cpp,
+					 uint32_t *tiling_mode,
+					 unsigned long *pitch,
+					 unsigned long flags);
+
+	/** Takes a reference on a buffer object */
+	void (*bo_reference) (drm_intel_bo *bo);
+
+	/**
+	 * Releases a reference on a buffer object, freeing the data if
+	 * no references remain.
+	 */
+	void (*bo_unreference) (drm_intel_bo *bo);
+
+	/**
+	 * Maps the buffer into userspace.
+	 *
+	 * This function will block waiting for any existing execution on the
+	 * buffer to complete first.  The resulting mapping is available at
+	 * buf->virtual.
+	 */
+	int (*bo_map) (drm_intel_bo *bo, int write_enable);
+
+	/**
+	 * Reduces the refcount on the userspace mapping of the buffer
+	 * object.
+	 */
+	int (*bo_unmap) (drm_intel_bo *bo);
+
+	/**
+	 * Write data into an object.
+	 *
+	 * This is an optional function, if missing,
+	 * drm_intel_bo will map/memcpy/unmap.
+	 */
+	int (*bo_subdata) (drm_intel_bo *bo, unsigned long offset,
+			   unsigned long size, const void *data);
+
+	/**
+	 * Read data from an object
+	 *
+	 * This is an optional function, if missing,
+	 * drm_intel_bo will map/memcpy/unmap.
+	 */
+	int (*bo_get_subdata) (drm_intel_bo *bo, unsigned long offset,
+			       unsigned long size, void *data);
+
+	/**
+	 * Waits for rendering to an object by the GPU to have completed.
+	 *
+	 * This is not required for any access to the BO by bo_map,
+	 * bo_subdata, etc.  It is merely a way for the driver to implement
+	 * glFinish.
+	 */
+	void (*bo_wait_rendering) (drm_intel_bo *bo);
+
+	/**
+	 * Tears down the buffer manager instance.
+	 */
+	void (*destroy) (drm_intel_bufmgr *bufmgr);
+
+	/**
+	 * Add relocation entry in reloc_buf, which will be updated with the
+	 * target buffer's real offset on command submission.
+	 *
+	 * Relocations remain in place for the lifetime of the buffer object.
+	 *
+	 * \param bo Buffer to write the relocation into.
+	 * \param offset Byte offset within reloc_bo of the pointer to
+	 *			target_bo.
+	 * \param target_bo Buffer whose offset should be written into the
+	 *                  relocation entry.
+	 * \param target_offset Constant value to be added to target_bo's
+	 *			offset in relocation entry.
+	 * \param read_domains GEM read domains which the buffer will be
+	 *			read into by the command that this relocation
+	 *			is part of.
+	 * \param write_domain GEM write domain which the buffer will be
+	 *			dirtied in by the command that this
+	 *			relocation is part of.
+	 */
+	int (*bo_emit_reloc) (drm_intel_bo *bo, uint32_t offset,
+			      drm_intel_bo *target_bo, uint32_t target_offset,
+			      uint32_t read_domains, uint32_t write_domain);
+	int (*bo_emit_reloc_fence)(drm_intel_bo *bo, uint32_t offset,
+				   drm_intel_bo *target_bo,
+				   uint32_t target_offset,
+				   uint32_t read_domains,
+				   uint32_t write_domain);
+
+	/** Executes the command buffer pointed to by bo. */
+	int (*bo_exec) (drm_intel_bo *bo, int used,
+			drm_clip_rect_t *cliprects, int num_cliprects,
+			int DR4);
+
+	/** Executes the command buffer pointed to by bo on the selected
+	 * ring buffer.
+	 */
+	int (*bo_mrb_exec) (drm_intel_bo *bo, int used,
+			    drm_clip_rect_t *cliprects, int num_cliprects,
+			    int DR4, unsigned flags);
+
+	/**
+	 * Pin a buffer to the aperture and fix the offset until unpinned
+	 *
+	 * \param buf Buffer to pin
+	 * \param alignment Required alignment for aperture, in bytes
+	 */
+	int (*bo_pin) (drm_intel_bo *bo, uint32_t alignment);
+
+	/**
+	 * Unpin a buffer from the aperture, allowing it to be removed
+	 *
+	 * \param buf Buffer to unpin
+	 */
+	int (*bo_unpin) (drm_intel_bo *bo);
+
+	/**
+	 * Ask that the buffer be placed in tiling mode
+	 *
+	 * \param buf Buffer to set tiling mode for
+	 * \param tiling_mode desired, and returned tiling mode
+	 */
+	int (*bo_set_tiling) (drm_intel_bo *bo, uint32_t * tiling_mode,
+			      uint32_t stride);
+
+	/**
+	 * Get the current tiling (and resulting swizzling) mode for the bo.
+	 *
+	 * \param buf Buffer to get tiling mode for
+	 * \param tiling_mode returned tiling mode
+	 * \param swizzle_mode returned swizzling mode
+	 */
+	int (*bo_get_tiling) (drm_intel_bo *bo, uint32_t * tiling_mode,
+			      uint32_t * swizzle_mode);
+
+	/**
+	 * Create a visible name for a buffer which can be used by other apps
+	 *
+	 * \param buf Buffer to create a name for
+	 * \param name Returned name
+	 */
+	int (*bo_flink) (drm_intel_bo *bo, uint32_t * name);
+
+	/**
+	 * Returns 1 if mapping the buffer for write could cause the process
+	 * to block, due to the object being active in the GPU.
+	 */
+	int (*bo_busy) (drm_intel_bo *bo);
+
+	/**
+	 * Specify the volatility of the buffer.
+	 * \param bo Buffer whose purgeable status is to be set
+	 * \param madv The purgeable status
+	 *
+	 * Use I915_MADV_DONTNEED to mark the buffer as purgeable, and it will be
+	 * reclaimed under memory pressure. If you subsequently require the buffer,
+	 * then you must pass I915_MADV_WILLNEED to mark the buffer as required.
+	 *
+	 * Returns 1 if the buffer was retained, or 0 if it was discarded whilst
+	 * marked as I915_MADV_DONTNEED.
+	 */
+	int (*bo_madvise) (drm_intel_bo *bo, int madv);
+
+	int (*check_aperture_space) (drm_intel_bo ** bo_array, int count);
+
+	/**
+	 * Disable buffer reuse for buffers which will be shared in some way,
+	 * as with scanout buffers. When the buffer reference count goes to
+	 * zero, it will be freed and not placed in the reuse list.
+	 *
+	 * \param bo Buffer to disable reuse for
+	 */
+	int (*bo_disable_reuse) (drm_intel_bo *bo);
+
+	/**
+	 * Query whether a buffer is reusable.
+	 *
+	 * \param bo Buffer to query
+	 */
+	int (*bo_is_reusable) (drm_intel_bo *bo);
+
+	/**
+	 * Return the pipe associated with a crtc_id so that vblank
+	 * synchronization can use the correct data in the request.
+	 * This is only supported for KMS and GEM at this point; when
+	 * unsupported, this function returns -1 and leaves the decision
+	 * of what to do in that case to the caller.
+	 *
+	 * \param bufmgr the associated buffer manager
+	 * \param crtc_id the crtc identifier
+	 */
+	int (*get_pipe_from_crtc_id) (drm_intel_bufmgr *bufmgr, int crtc_id);
+
+	/** Returns true if target_bo is in the relocation tree rooted at bo. */
+	int (*bo_references) (drm_intel_bo *bo, drm_intel_bo *target_bo);
+
+	/** Enables verbose debugging printouts */
+	int debug;
+};
+
+struct _drm_intel_context {
+	unsigned int ctx_id;
+	struct _drm_intel_bufmgr *bufmgr;
+};
+
+#define ALIGN(value, alignment)	(((value) + (alignment) - 1) & ~((alignment) - 1))
+#define ROUND_UP_TO(x, y)	(((x) + (y) - 1) / (y) * (y))
+#define ROUND_UP_TO_MB(x)	ROUND_UP_TO((x), 1024*1024)
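+
+/* Worked example (not part of the original header): ALIGN() assumes a
+ * power-of-two alignment, while ROUND_UP_TO() works for any multiple:
+ *
+ *	ALIGN(5000, 4096)	== 8192
+ *	ROUND_UP_TO(5000, 1000)	== 5000
+ *	ROUND_UP_TO_MB(5000)	== 1048576
+ */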
+
+#endif /* INTEL_BUFMGR_PRIV_H */
diff --git a/icd/intel/kmd/libdrm/intel/intel_chipset.h b/icd/intel/kmd/libdrm/intel/intel_chipset.h
new file mode 100644
index 0000000..e22a867
--- /dev/null
+++ b/icd/intel/kmd/libdrm/intel/intel_chipset.h
@@ -0,0 +1,376 @@
+/*
+ *
+ * Copyright 2003 Tungsten Graphics, Inc., Cedar Park, Texas.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL TUNGSTEN GRAPHICS AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _INTEL_CHIPSET_H
+#define _INTEL_CHIPSET_H
+
+#define PCI_CHIP_I810			0x7121
+#define PCI_CHIP_I810_DC100		0x7123
+#define PCI_CHIP_I810_E			0x7125
+#define PCI_CHIP_I815			0x1132
+
+#define PCI_CHIP_I830_M			0x3577
+#define PCI_CHIP_845_G			0x2562
+#define PCI_CHIP_I855_GM		0x3582
+#define PCI_CHIP_I865_G			0x2572
+
+#define PCI_CHIP_I915_G			0x2582
+#define PCI_CHIP_E7221_G		0x258A
+#define PCI_CHIP_I915_GM		0x2592
+#define PCI_CHIP_I945_G			0x2772
+#define PCI_CHIP_I945_GM		0x27A2
+#define PCI_CHIP_I945_GME		0x27AE
+
+#define PCI_CHIP_Q35_G			0x29B2
+#define PCI_CHIP_G33_G			0x29C2
+#define PCI_CHIP_Q33_G			0x29D2
+
+#define PCI_CHIP_IGD_GM			0xA011
+#define PCI_CHIP_IGD_G			0xA001
+
+#define IS_IGDGM(devid)		((devid) == PCI_CHIP_IGD_GM)
+#define IS_IGDG(devid)		((devid) == PCI_CHIP_IGD_G)
+#define IS_IGD(devid)		(IS_IGDG(devid) || IS_IGDGM(devid))
+
+#define PCI_CHIP_I965_G			0x29A2
+#define PCI_CHIP_I965_Q			0x2992
+#define PCI_CHIP_I965_G_1		0x2982
+#define PCI_CHIP_I946_GZ		0x2972
+#define PCI_CHIP_I965_GM		0x2A02
+#define PCI_CHIP_I965_GME		0x2A12
+
+#define PCI_CHIP_GM45_GM		0x2A42
+
+#define PCI_CHIP_IGD_E_G		0x2E02
+#define PCI_CHIP_Q45_G			0x2E12
+#define PCI_CHIP_G45_G			0x2E22
+#define PCI_CHIP_G41_G			0x2E32
+
+#define PCI_CHIP_ILD_G			0x0042
+#define PCI_CHIP_ILM_G			0x0046
+
+#define PCI_CHIP_SANDYBRIDGE_GT1	0x0102 /* desktop */
+#define PCI_CHIP_SANDYBRIDGE_GT2	0x0112
+#define PCI_CHIP_SANDYBRIDGE_GT2_PLUS	0x0122
+#define PCI_CHIP_SANDYBRIDGE_M_GT1	0x0106 /* mobile */
+#define PCI_CHIP_SANDYBRIDGE_M_GT2	0x0116
+#define PCI_CHIP_SANDYBRIDGE_M_GT2_PLUS	0x0126
+#define PCI_CHIP_SANDYBRIDGE_S		0x010A /* server */
+
+#define PCI_CHIP_IVYBRIDGE_GT1		0x0152 /* desktop */
+#define PCI_CHIP_IVYBRIDGE_GT2		0x0162
+#define PCI_CHIP_IVYBRIDGE_M_GT1	0x0156 /* mobile */
+#define PCI_CHIP_IVYBRIDGE_M_GT2	0x0166
+#define PCI_CHIP_IVYBRIDGE_S		0x015a /* server */
+#define PCI_CHIP_IVYBRIDGE_S_GT2	0x016a /* server */
+
+#define PCI_CHIP_HASWELL_GT1		0x0402 /* Desktop */
+#define PCI_CHIP_HASWELL_GT2		0x0412
+#define PCI_CHIP_HASWELL_GT3		0x0422
+#define PCI_CHIP_HASWELL_M_GT1		0x0406 /* Mobile */
+#define PCI_CHIP_HASWELL_M_GT2		0x0416
+#define PCI_CHIP_HASWELL_M_GT3		0x0426
+#define PCI_CHIP_HASWELL_S_GT1		0x040A /* Server */
+#define PCI_CHIP_HASWELL_S_GT2		0x041A
+#define PCI_CHIP_HASWELL_S_GT3		0x042A
+#define PCI_CHIP_HASWELL_B_GT1		0x040B /* Reserved */
+#define PCI_CHIP_HASWELL_B_GT2		0x041B
+#define PCI_CHIP_HASWELL_B_GT3		0x042B
+#define PCI_CHIP_HASWELL_E_GT1		0x040E /* Reserved */
+#define PCI_CHIP_HASWELL_E_GT2		0x041E
+#define PCI_CHIP_HASWELL_E_GT3		0x042E
+#define PCI_CHIP_HASWELL_SDV_GT1	0x0C02 /* Desktop */
+#define PCI_CHIP_HASWELL_SDV_GT2	0x0C12
+#define PCI_CHIP_HASWELL_SDV_GT3	0x0C22
+#define PCI_CHIP_HASWELL_SDV_M_GT1	0x0C06 /* Mobile */
+#define PCI_CHIP_HASWELL_SDV_M_GT2	0x0C16
+#define PCI_CHIP_HASWELL_SDV_M_GT3	0x0C26
+#define PCI_CHIP_HASWELL_SDV_S_GT1	0x0C0A /* Server */
+#define PCI_CHIP_HASWELL_SDV_S_GT2	0x0C1A
+#define PCI_CHIP_HASWELL_SDV_S_GT3	0x0C2A
+#define PCI_CHIP_HASWELL_SDV_B_GT1	0x0C0B /* Reserved */
+#define PCI_CHIP_HASWELL_SDV_B_GT2	0x0C1B
+#define PCI_CHIP_HASWELL_SDV_B_GT3	0x0C2B
+#define PCI_CHIP_HASWELL_SDV_E_GT1	0x0C0E /* Reserved */
+#define PCI_CHIP_HASWELL_SDV_E_GT2	0x0C1E
+#define PCI_CHIP_HASWELL_SDV_E_GT3	0x0C2E
+#define PCI_CHIP_HASWELL_ULT_GT1	0x0A02 /* Desktop */
+#define PCI_CHIP_HASWELL_ULT_GT2	0x0A12
+#define PCI_CHIP_HASWELL_ULT_GT3	0x0A22
+#define PCI_CHIP_HASWELL_ULT_M_GT1	0x0A06 /* Mobile */
+#define PCI_CHIP_HASWELL_ULT_M_GT2	0x0A16
+#define PCI_CHIP_HASWELL_ULT_M_GT3	0x0A26
+#define PCI_CHIP_HASWELL_ULT_S_GT1	0x0A0A /* Server */
+#define PCI_CHIP_HASWELL_ULT_S_GT2	0x0A1A
+#define PCI_CHIP_HASWELL_ULT_S_GT3	0x0A2A
+#define PCI_CHIP_HASWELL_ULT_B_GT1	0x0A0B /* Reserved */
+#define PCI_CHIP_HASWELL_ULT_B_GT2	0x0A1B
+#define PCI_CHIP_HASWELL_ULT_B_GT3	0x0A2B
+#define PCI_CHIP_HASWELL_ULT_E_GT1	0x0A0E /* Reserved */
+#define PCI_CHIP_HASWELL_ULT_E_GT2	0x0A1E
+#define PCI_CHIP_HASWELL_ULT_E_GT3	0x0A2E
+#define PCI_CHIP_HASWELL_CRW_GT1	0x0D02 /* Desktop */
+#define PCI_CHIP_HASWELL_CRW_GT2	0x0D12
+#define PCI_CHIP_HASWELL_CRW_GT3	0x0D22
+#define PCI_CHIP_HASWELL_CRW_M_GT1	0x0D06 /* Mobile */
+#define PCI_CHIP_HASWELL_CRW_M_GT2	0x0D16
+#define PCI_CHIP_HASWELL_CRW_M_GT3	0x0D26
+#define PCI_CHIP_HASWELL_CRW_S_GT1	0x0D0A /* Server */
+#define PCI_CHIP_HASWELL_CRW_S_GT2	0x0D1A
+#define PCI_CHIP_HASWELL_CRW_S_GT3	0x0D2A
+#define PCI_CHIP_HASWELL_CRW_B_GT1	0x0D0B /* Reserved */
+#define PCI_CHIP_HASWELL_CRW_B_GT2	0x0D1B
+#define PCI_CHIP_HASWELL_CRW_B_GT3	0x0D2B
+#define PCI_CHIP_HASWELL_CRW_E_GT1	0x0D0E /* Reserved */
+#define PCI_CHIP_HASWELL_CRW_E_GT2	0x0D1E
+#define PCI_CHIP_HASWELL_CRW_E_GT3	0x0D2E
+#define BDW_SPARE			0x2
+#define BDW_ULT				0x6
+#define BDW_SERVER			0xa
+#define BDW_IRIS			0xb
+#define BDW_WORKSTATION			0xd
+#define BDW_ULX				0xe
+
+#define PCI_CHIP_VALLEYVIEW_PO		0x0f30 /* VLV PO board */
+#define PCI_CHIP_VALLEYVIEW_1		0x0f31
+#define PCI_CHIP_VALLEYVIEW_2		0x0f32
+#define PCI_CHIP_VALLEYVIEW_3		0x0f33
+
+#define PCI_CHIP_CHERRYVIEW_0		0x22b0
+#define PCI_CHIP_CHERRYVIEW_1		0x22b1
+#define PCI_CHIP_CHERRYVIEW_2		0x22b2
+#define PCI_CHIP_CHERRYVIEW_3		0x22b3
+
+#define PCI_CHIP_SKYLAKE_ULT_GT2	0x1916
+#define PCI_CHIP_SKYLAKE_ULT_GT1	0x1906
+#define PCI_CHIP_SKYLAKE_ULT_GT3	0x1926
+#define PCI_CHIP_SKYLAKE_ULT_GT2F	0x1921
+#define PCI_CHIP_SKYLAKE_ULX_GT1	0x190E
+#define PCI_CHIP_SKYLAKE_ULX_GT2	0x191E
+#define PCI_CHIP_SKYLAKE_DT_GT2		0x1912
+#define PCI_CHIP_SKYLAKE_DT_GT1		0x1902
+#define PCI_CHIP_SKYLAKE_HALO_GT2	0x191B
+#define PCI_CHIP_SKYLAKE_HALO_GT3	0x192B
+#define PCI_CHIP_SKYLAKE_HALO_GT1 	0x190B
+#define PCI_CHIP_SKYLAKE_SRV_GT2	0x191A
+#define PCI_CHIP_SKYLAKE_SRV_GT3	0x192A
+#define PCI_CHIP_SKYLAKE_SRV_GT1	0x190A
+#define PCI_CHIP_SKYLAKE_WKS_GT2 	0x191D
+
+#define IS_MOBILE(devid)	((devid) == PCI_CHIP_I855_GM || \
+				 (devid) == PCI_CHIP_I915_GM || \
+				 (devid) == PCI_CHIP_I945_GM || \
+				 (devid) == PCI_CHIP_I945_GME || \
+				 (devid) == PCI_CHIP_I965_GM || \
+				 (devid) == PCI_CHIP_I965_GME || \
+				 (devid) == PCI_CHIP_GM45_GM || IS_IGD(devid) || \
+				 (devid) == PCI_CHIP_IVYBRIDGE_M_GT1 || \
+				 (devid) == PCI_CHIP_IVYBRIDGE_M_GT2)
+
+#define IS_G45(devid)		((devid) == PCI_CHIP_IGD_E_G || \
+				 (devid) == PCI_CHIP_Q45_G || \
+				 (devid) == PCI_CHIP_G45_G || \
+				 (devid) == PCI_CHIP_G41_G)
+#define IS_GM45(devid)		((devid) == PCI_CHIP_GM45_GM)
+#define IS_G4X(devid)		(IS_G45(devid) || IS_GM45(devid))
+
+#define IS_ILD(devid)		((devid) == PCI_CHIP_ILD_G)
+#define IS_ILM(devid)		((devid) == PCI_CHIP_ILM_G)
+
+#define IS_915(devid)		((devid) == PCI_CHIP_I915_G || \
+				 (devid) == PCI_CHIP_E7221_G || \
+				 (devid) == PCI_CHIP_I915_GM)
+
+#define IS_945GM(devid)		((devid) == PCI_CHIP_I945_GM || \
+				 (devid) == PCI_CHIP_I945_GME)
+
+#define IS_945(devid)		((devid) == PCI_CHIP_I945_G || \
+				 (devid) == PCI_CHIP_I945_GM || \
+				 (devid) == PCI_CHIP_I945_GME || \
+				 IS_G33(devid))
+
+#define IS_G33(devid)		((devid) == PCI_CHIP_G33_G || \
+				 (devid) == PCI_CHIP_Q33_G || \
+				 (devid) == PCI_CHIP_Q35_G || IS_IGD(devid))
+
+#define IS_GEN2(devid)		((devid) == PCI_CHIP_I830_M || \
+				 (devid) == PCI_CHIP_845_G || \
+				 (devid) == PCI_CHIP_I855_GM || \
+				 (devid) == PCI_CHIP_I865_G)
+
+#define IS_GEN3(devid)		(IS_945(devid) || IS_915(devid))
+
+#define IS_GEN4(devid)		((devid) == PCI_CHIP_I965_G || \
+				 (devid) == PCI_CHIP_I965_Q || \
+				 (devid) == PCI_CHIP_I965_G_1 || \
+				 (devid) == PCI_CHIP_I965_GM || \
+				 (devid) == PCI_CHIP_I965_GME || \
+				 (devid) == PCI_CHIP_I946_GZ || \
+				 IS_G4X(devid))
+
+#define IS_GEN5(devid)		(IS_ILD(devid) || IS_ILM(devid))
+
+#define IS_GEN6(devid)		((devid) == PCI_CHIP_SANDYBRIDGE_GT1 || \
+				 (devid) == PCI_CHIP_SANDYBRIDGE_GT2 || \
+				 (devid) == PCI_CHIP_SANDYBRIDGE_GT2_PLUS || \
+				 (devid) == PCI_CHIP_SANDYBRIDGE_M_GT1 || \
+				 (devid) == PCI_CHIP_SANDYBRIDGE_M_GT2 || \
+				 (devid) == PCI_CHIP_SANDYBRIDGE_M_GT2_PLUS || \
+				 (devid) == PCI_CHIP_SANDYBRIDGE_S)
+
+#define IS_GEN7(devid)		(IS_IVYBRIDGE(devid) || \
+				 IS_HASWELL(devid) || \
+				 IS_VALLEYVIEW(devid))
+
+#define IS_IVYBRIDGE(devid)	((devid) == PCI_CHIP_IVYBRIDGE_GT1 || \
+				 (devid) == PCI_CHIP_IVYBRIDGE_GT2 || \
+				 (devid) == PCI_CHIP_IVYBRIDGE_M_GT1 || \
+				 (devid) == PCI_CHIP_IVYBRIDGE_M_GT2 || \
+				 (devid) == PCI_CHIP_IVYBRIDGE_S || \
+				 (devid) == PCI_CHIP_IVYBRIDGE_S_GT2)
+
+#define IS_VALLEYVIEW(devid)	((devid) == PCI_CHIP_VALLEYVIEW_PO || \
+				 (devid) == PCI_CHIP_VALLEYVIEW_1 || \
+				 (devid) == PCI_CHIP_VALLEYVIEW_2 || \
+				 (devid) == PCI_CHIP_VALLEYVIEW_3)
+
+#define IS_HSW_GT1(devid)	((devid) == PCI_CHIP_HASWELL_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_M_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_S_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_B_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_E_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_M_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_S_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_B_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_E_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_M_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_S_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_B_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_E_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_M_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_S_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_B_GT1 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_E_GT1)
+#define IS_HSW_GT2(devid)	((devid) == PCI_CHIP_HASWELL_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_M_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_S_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_B_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_E_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_M_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_S_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_B_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_E_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_M_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_S_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_B_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_E_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_M_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_S_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_B_GT2 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_E_GT2)
+#define IS_HSW_GT3(devid)	((devid) == PCI_CHIP_HASWELL_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_M_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_S_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_B_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_E_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_M_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_S_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_B_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_SDV_E_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_M_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_S_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_B_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_ULT_E_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_M_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_S_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_B_GT3 || \
+				 (devid) == PCI_CHIP_HASWELL_CRW_E_GT3)
+
+#define IS_HASWELL(devid)	(IS_HSW_GT1(devid) || \
+				 IS_HSW_GT2(devid) || \
+				 IS_HSW_GT3(devid))
+
+#define IS_BROADWELL(devid)     (((devid & 0xff00) != 0x1600) ? 0 : \
+				(((devid & 0x00f0) >> 4) > 3) ? 0 : \
+				((devid & 0x000f) == BDW_SPARE) ? 1 : \
+				((devid & 0x000f) == BDW_ULT) ? 1 : \
+				((devid & 0x000f) == BDW_IRIS) ? 1 : \
+				((devid & 0x000f) == BDW_SERVER) ? 1 : \
+				((devid & 0x000f) == BDW_WORKSTATION) ? 1 : \
+				((devid & 0x000f) == BDW_ULX) ? 1 : 0)
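+
+/* Worked example: for devid 0x1616 (a Broadwell-U GT2 part), the high byte
+ * matches 0x1600, the GT field ((devid >> 4) & 0xf) is 1 (not above 3), and
+ * the low nibble is 0x6 == BDW_ULT, so IS_BROADWELL(0x1616) yields 1.
+ */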
+
+#define IS_CHERRYVIEW(devid)	((devid) == PCI_CHIP_CHERRYVIEW_0 || \
+				 (devid) == PCI_CHIP_CHERRYVIEW_1 || \
+				 (devid) == PCI_CHIP_CHERRYVIEW_2 || \
+				 (devid) == PCI_CHIP_CHERRYVIEW_3)
+
+#define IS_GEN8(devid)		(IS_BROADWELL(devid) || \
+				 IS_CHERRYVIEW(devid))
+
+#define IS_SKL_GT1(devid)	((devid) == PCI_CHIP_SKYLAKE_ULT_GT1	|| \
+				 (devid) == PCI_CHIP_SKYLAKE_ULX_GT1	|| \
+				 (devid) == PCI_CHIP_SKYLAKE_DT_GT1	|| \
+				 (devid) == PCI_CHIP_SKYLAKE_HALO_GT1	|| \
+				 (devid) == PCI_CHIP_SKYLAKE_SRV_GT1)
+
+#define IS_SKL_GT2(devid)	((devid) == PCI_CHIP_SKYLAKE_ULT_GT2	|| \
+				 (devid) == PCI_CHIP_SKYLAKE_ULT_GT2F	|| \
+				 (devid) == PCI_CHIP_SKYLAKE_ULX_GT2	|| \
+				 (devid) == PCI_CHIP_SKYLAKE_DT_GT2	|| \
+				 (devid) == PCI_CHIP_SKYLAKE_HALO_GT2	|| \
+				 (devid) == PCI_CHIP_SKYLAKE_SRV_GT2	|| \
+				 (devid) == PCI_CHIP_SKYLAKE_WKS_GT2)
+
+#define IS_SKL_GT3(devid)	((devid) == PCI_CHIP_SKYLAKE_ULT_GT3	|| \
+				 (devid) == PCI_CHIP_SKYLAKE_HALO_GT3	|| \
+				 (devid) == PCI_CHIP_SKYLAKE_SRV_GT3)
+
+#define IS_SKYLAKE(devid)	(IS_SKL_GT1(devid) || \
+				 IS_SKL_GT2(devid) || \
+				 IS_SKL_GT3(devid))
+
+#define IS_GEN9(devid)		IS_SKYLAKE(devid)
+
+#define IS_9XX(dev)		(IS_GEN3(dev) || \
+				 IS_GEN4(dev) || \
+				 IS_GEN5(dev) || \
+				 IS_GEN6(dev) || \
+				 IS_GEN7(dev) || \
+				 IS_GEN8(dev) || \
+				 IS_GEN9(dev))
+
+
+#endif /* _INTEL_CHIPSET_H */
diff --git a/icd/intel/kmd/libdrm/intel/intel_decode.c b/icd/intel/kmd/libdrm/intel/intel_decode.c
new file mode 100644
index 0000000..86d47d1
--- /dev/null
+++ b/icd/intel/kmd/libdrm/intel/intel_decode.c
@@ -0,0 +1,3985 @@
+/*
+ * Copyright © 2009-2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdarg.h>
+#include <string.h>
+
+#include "libdrm.h"
+#include "xf86drm.h"
+#include "intel_chipset.h"
+#include "intel_bufmgr.h"
+
+/* Struct for tracking drm_intel_decode state. */
+struct drm_intel_decode {
+	/** stdio file where the output should land.  Defaults to stdout. */
+	FILE *out;
+
+	/** PCI device ID. */
+	uint32_t devid;
+
+	/**
+	 * Shorthand device identifier: 3 is 915, 4 is 965, 5 is
+	 * Ironlake, etc.
+	 */
+	int gen;
+
+	/** GPU address of the start of the current packet. */
+	uint32_t hw_offset;
+	/** CPU virtual address of the start of the current packet. */
+	uint32_t *data;
+	/** DWORDs of remaining batchbuffer data starting from the packet. */
+	uint32_t count;
+
+	/** GPU address of the start of the batchbuffer data. */
+	uint32_t base_hw_offset;
+	/** CPU Virtual address of the start of the batchbuffer data. */
+	uint32_t *base_data;
+	/** Number of DWORDs of batchbuffer data. */
+	uint32_t base_count;
+
+	/** @{
+	 * GPU head and tail pointers, which will be noted in the dump, or ~0.
+	 */
+	uint32_t head, tail;
+	/** @} */
+
+	/**
+	 * Whether to dump the dwords after MI_BATCHBUFFER_END.
+	 *
+	 * This sometimes provides clues in corrupted batchbuffers,
+	 * and is used by the intel-gpu-tools.
+	 */
+	bool dump_past_end;
+
+	bool overflowed;
+};
+
+static FILE *out;
+static uint32_t saved_s2 = 0, saved_s4 = 0;
+static char saved_s2_set = 0, saved_s4_set = 0;
+static uint32_t head_offset = 0xffffffff;	/* undefined */
+static uint32_t tail_offset = 0xffffffff;	/* undefined */
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(A) (sizeof(A)/sizeof(A[0]))
+#endif
+
+#define BUFFER_FAIL(_count, _len, _name) do {			\
+    fprintf(out, "Buffer size too small in %s (%d < %d)\n",	\
+	    (_name), (_count), (_len));				\
+    return _count;						\
+} while (0)
+
+static float int_as_float(uint32_t intval)
+{
+	union intfloat {
+		uint32_t i;
+		float f;
+	} uval;
+
+	uval.i = intval;
+	return uval.f;
+}
+
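+/*
+ * Print one annotated dword of the current packet: its GPU address, a
+ * HEAD/TAIL marker when that address matches the ring pointers, the
+ * raw dword value, and then the caller's printf-style description.
+ * An index beyond the batch reports a single overflow error instead
+ * of reading out of bounds.
+ */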
+static void DRM_PRINTFLIKE(3, 4)
+instr_out(struct drm_intel_decode *ctx, unsigned int index,
+	  const char *fmt, ...)
+{
+	va_list va;
+	const char *parseinfo;
+	uint32_t offset = ctx->hw_offset + index * 4;
+
+	if (index >= ctx->count) {
+		if (!ctx->overflowed) {
+			fprintf(out, "ERROR: Decode attempted to continue beyond end of batchbuffer\n");
+			ctx->overflowed = true;
+		}
+		return;
+	}
+
+	if (offset == head_offset)
+		parseinfo = "HEAD";
+	else if (offset == tail_offset)
+		parseinfo = "TAIL";
+	else
+		parseinfo = "    ";
+
+	fprintf(out, "0x%08x: %s 0x%08x: %s", offset, parseinfo,
+		ctx->data[index], index == 0 ? "" : "   ");
+	va_start(va, fmt);
+	vfprintf(out, fmt, va);
+	va_end(va);
+}
+
+static int
+decode_MI_SET_CONTEXT(struct drm_intel_decode *ctx)
+{
+	uint32_t data = ctx->data[1];
+	if (ctx->gen > 7)
+		return 1;
+
+	instr_out(ctx, 0, "MI_SET_CONTEXT\n");
+	instr_out(ctx, 1, "gtt offset = 0x%x%s%s\n",
+		  data & ~0xfff,
+		  data & (1<<1)? ", Force Restore": "",
+		  data & (1<<0)? ", Restore Inhibit": "");
+
+	return 2;
+}
+
+static int
+decode_MI_WAIT_FOR_EVENT(struct drm_intel_decode *ctx)
+{
+	const char *cc_wait;
+	int cc_shift = 0;
+	uint32_t data = ctx->data[0];
+
+	if (ctx->gen <= 5)
+		cc_shift = 9;
+	else
+		cc_shift = 16;
+
+	switch ((data >> cc_shift) & 0x1f) {
+	case 1:
+		cc_wait = ", cc wait 1";
+		break;
+	case 2:
+		cc_wait = ", cc wait 2";
+		break;
+	case 3:
+		cc_wait = ", cc wait 3";
+		break;
+	case 4:
+		cc_wait = ", cc wait 4";
+		break;
+	case 5:
+		cc_wait = ", cc wait 5";
+		break;
+	default:
+		cc_wait = "";
+		break;
+	}
+
+	if (ctx->gen <= 5) {
+		instr_out(ctx, 0, "MI_WAIT_FOR_EVENT%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+			  data & (1<<18)? ", pipe B start vblank wait": "",
+			  data & (1<<17)? ", pipe A start vblank wait": "",
+			  data & (1<<16)? ", overlay flip pending wait": "",
+			  data & (1<<14)? ", pipe B hblank wait": "",
+			  data & (1<<13)? ", pipe A hblank wait": "",
+			  cc_wait,
+			  data & (1<<8)? ", plane C pending flip wait": "",
+			  data & (1<<7)? ", pipe B vblank wait": "",
+			  data & (1<<6)? ", plane B pending flip wait": "",
+			  data & (1<<5)? ", pipe B scan line wait": "",
+			  data & (1<<4)? ", fbc idle wait": "",
+			  data & (1<<3)? ", pipe A vblank wait": "",
+			  data & (1<<2)? ", plane A pending flip wait": "",
+			  data & (1<<1)? ", plane A scan line wait": "");
+	} else {
+		instr_out(ctx, 0, "MI_WAIT_FOR_EVENT%s%s%s%s%s%s%s%s%s%s%s%s\n",
+			  data & (1<<20)? ", sprite C pending flip wait": "", /* ivb */
+			  cc_wait,
+			  data & (1<<13)? ", pipe B hblank wait": "",
+			  data & (1<<11)? ", pipe B vblank wait": "",
+			  data & (1<<10)? ", sprite B pending flip wait": "",
+			  data & (1<<9)? ", plane B pending flip wait": "",
+			  data & (1<<8)? ", plane B scan line wait": "",
+			  data & (1<<5)? ", pipe A hblank wait": "",
+			  data & (1<<3)? ", pipe A vblank wait": "",
+			  data & (1<<2)? ", sprite A pending flip wait": "",
+			  data & (1<<1)? ", plane A pending flip wait": "",
+			  data & (1<<0)? ", plane A scan line wait": "");
+	}
+
+	return 1;
+}
+
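+/*
+ * Decode a single MI (memory interface) packet.  The opcode lives in
+ * bits 28:23 of the header dword; the table below gives each opcode's
+ * legal length range (packet length is the header's low bits plus two)
+ * and, where needed, a specialized decoder.  Returns the number of
+ * dwords consumed, or -1 at MI_BATCH_BUFFER_END so the caller can
+ * stop.
+ */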
+static int
+decode_mi(struct drm_intel_decode *ctx)
+{
+	unsigned int opcode, len = -1;
+	const char *post_sync_op = "";
+	uint32_t *data = ctx->data;
+
+	struct {
+		uint32_t opcode;
+		int len_mask;
+		unsigned int min_len;
+		unsigned int max_len;
+		const char *name;
+		int (*func)(struct drm_intel_decode *ctx);
+	} opcodes_mi[] = {
+		{ 0x08, 0, 1, 1, "MI_ARB_ON_OFF" },
+		{ 0x0a, 0, 1, 1, "MI_BATCH_BUFFER_END" },
+		{ 0x30, 0x3f, 3, 3, "MI_BATCH_BUFFER" },
+		{ 0x31, 0x3f, 2, 2, "MI_BATCH_BUFFER_START" },
+		{ 0x14, 0x3f, 3, 3, "MI_DISPLAY_BUFFER_INFO" },
+		{ 0x04, 0, 1, 1, "MI_FLUSH" },
+		{ 0x22, 0x1f, 3, 3, "MI_LOAD_REGISTER_IMM" },
+		{ 0x13, 0x3f, 2, 2, "MI_LOAD_SCAN_LINES_EXCL" },
+		{ 0x12, 0x3f, 2, 2, "MI_LOAD_SCAN_LINES_INCL" },
+		{ 0x00, 0, 1, 1, "MI_NOOP" },
+		{ 0x11, 0x3f, 2, 2, "MI_OVERLAY_FLIP" },
+		{ 0x07, 0, 1, 1, "MI_REPORT_HEAD" },
+		{ 0x18, 0x3f, 2, 2, "MI_SET_CONTEXT", decode_MI_SET_CONTEXT },
+		{ 0x20, 0x3f, 3, 4, "MI_STORE_DATA_IMM" },
+		{ 0x21, 0x3f, 3, 4, "MI_STORE_DATA_INDEX" },
+		{ 0x24, 0x3f, 3, 3, "MI_STORE_REGISTER_MEM" },
+		{ 0x02, 0, 1, 1, "MI_USER_INTERRUPT" },
+		{ 0x03, 0, 1, 1, "MI_WAIT_FOR_EVENT", decode_MI_WAIT_FOR_EVENT },
+		{ 0x16, 0x7f, 3, 3, "MI_SEMAPHORE_MBOX" },
+		{ 0x26, 0x1f, 3, 4, "MI_FLUSH_DW" },
+		{ 0x28, 0x3f, 3, 3, "MI_REPORT_PERF_COUNT" },
+		{ 0x29, 0xff, 3, 3, "MI_LOAD_REGISTER_MEM" },
+		{ 0x0b, 0, 1, 1, "MI_SUSPEND_FLUSH"},
+	}, *opcode_mi = NULL;
+
+	/* check instruction length */
+	for (opcode = 0; opcode < sizeof(opcodes_mi) / sizeof(opcodes_mi[0]);
+	     opcode++) {
+		if ((data[0] & 0x1f800000) >> 23 == opcodes_mi[opcode].opcode) {
+			len = 1;
+			if (opcodes_mi[opcode].max_len > 1) {
+				len =
+				    (data[0] & opcodes_mi[opcode].len_mask) + 2;
+				if (len < opcodes_mi[opcode].min_len
+				    || len > opcodes_mi[opcode].max_len) {
+					fprintf(out,
+						"Bad length (%d) in %s, [%d, %d]\n",
+						len, opcodes_mi[opcode].name,
+						opcodes_mi[opcode].min_len,
+						opcodes_mi[opcode].max_len);
+				}
+			}
+			opcode_mi = &opcodes_mi[opcode];
+			break;
+		}
+	}
+
+	if (opcode_mi && opcode_mi->func)
+		return opcode_mi->func(ctx);
+
+	switch ((data[0] & 0x1f800000) >> 23) {
+	case 0x0a:
+		instr_out(ctx, 0, "MI_BATCH_BUFFER_END\n");
+		return -1;
+	case 0x16:
+		instr_out(ctx, 0, "MI_SEMAPHORE_MBOX%s%s%s%s %u\n",
+			  data[0] & (1 << 22) ? " global gtt," : "",
+			  data[0] & (1 << 21) ? " update semaphore," : "",
+			  data[0] & (1 << 20) ? " compare semaphore," : "",
+			  data[0] & (1 << 18) ? " use compare reg" : "",
+			  (data[0] & (0x3 << 16)) >> 16);
+		instr_out(ctx, 1, "value\n");
+		instr_out(ctx, 2, "address\n");
+		return len;
+	case 0x21:
+		instr_out(ctx, 0, "MI_STORE_DATA_INDEX%s\n",
+			  data[0] & (1 << 21) ? " use per-process HWS," : "");
+		instr_out(ctx, 1, "index\n");
+		instr_out(ctx, 2, "dword\n");
+		if (len == 4)
+			instr_out(ctx, 3, "upper dword\n");
+		return len;
+	case 0x00:
+		if (data[0] & (1 << 22))
+			instr_out(ctx, 0,
+				  "MI_NOOP write NOPID reg, val=0x%x\n",
+				  data[0] & ((1 << 22) - 1));
+		else
+			instr_out(ctx, 0, "MI_NOOP\n");
+		return len;
+	case 0x26:
+		switch (data[0] & (0x3 << 14)) {
+		case (0 << 14):
+			post_sync_op = "no write";
+			break;
+		case (1 << 14):
+			post_sync_op = "write data";
+			break;
+		case (2 << 14):
+			post_sync_op = "reserved";
+			break;
+		case (3 << 14):
+			post_sync_op = "write TIMESTAMP";
+			break;
+		}
+		instr_out(ctx, 0,
+			  "MI_FLUSH_DW%s%s%s%s post_sync_op='%s' %s%s\n",
+			  data[0] & (1 << 22) ?
+			  " enable protected mem (BCS-only)," : "",
+			  data[0] & (1 << 21) ? " store in hws," : "",
+			  data[0] & (1 << 18) ? " invalidate tlb," : "",
+			  data[0] & (1 << 17) ? " flush gfdt," : "",
+			  post_sync_op,
+			  data[0] & (1 << 8) ? " enable notify interrupt," : "",
+			  data[0] & (1 << 7) ?
+			  " invalidate video state (BCS-only)," : "");
+		if (data[0] & (1 << 21))
+			instr_out(ctx, 1, "hws index\n");
+		else
+			instr_out(ctx, 1, "address\n");
+		instr_out(ctx, 2, "dword\n");
+		if (len == 4)
+			instr_out(ctx, 3, "upper dword\n");
+		return len;
+	}
+
+	for (opcode = 0; opcode < sizeof(opcodes_mi) / sizeof(opcodes_mi[0]);
+	     opcode++) {
+		if ((data[0] & 0x1f800000) >> 23 == opcodes_mi[opcode].opcode) {
+			unsigned int i;
+
+			instr_out(ctx, 0, "%s\n",
+				  opcodes_mi[opcode].name);
+			for (i = 1; i < len; i++) {
+				instr_out(ctx, i, "dword %d\n", i);
+			}
+
+			return len;
+		}
+	}
+
+	instr_out(ctx, 0, "MI UNKNOWN\n");
+	return 1;
+}
+
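+/*
+ * Helpers for the BR00/BR01 header dwords shared by most XY_* blit
+ * packets: BR00 carries the RGB/alpha write enables and the
+ * source/destination tiling bits, BR01 the color depth, destination
+ * pitch, and raster operation.
+ */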
+static void
+decode_2d_br00(struct drm_intel_decode *ctx, const char *cmd)
+{
+	instr_out(ctx, 0,
+		  "%s (rgb %sabled, alpha %sabled, src tile %d, dst tile %d)\n",
+		  cmd,
+		  (ctx->data[0] & (1 << 20)) ? "en" : "dis",
+		  (ctx->data[0] & (1 << 21)) ? "en" : "dis",
+		  (ctx->data[0] >> 15) & 1,
+		  (ctx->data[0] >> 11) & 1);
+}
+
+static void
+decode_2d_br01(struct drm_intel_decode *ctx)
+{
+	const char *format;
+	switch ((ctx->data[1] >> 24) & 0x3) {
+	case 0:
+		format = "8";
+		break;
+	case 1:
+		format = "565";
+		break;
+	case 2:
+		format = "1555";
+		break;
+	case 3:
+		format = "8888";
+		break;
+	}
+
+	instr_out(ctx, 1,
+		  "format %s, pitch %d, rop 0x%02x, "
+		  "clipping %sabled, %s%s \n",
+		  format,
+		  (short)(ctx->data[1] & 0xffff),
+		  (ctx->data[1] >> 16) & 0xff,
+		  ctx->data[1] & (1 << 30) ? "en" : "dis",
+		  ctx->data[1] & (1 << 31) ? "solid pattern enabled, " : "",
+		  ctx->data[1] & (1 << 31) ?
+		  "mono pattern transparency enabled, " : "");
+
+}
+
+static int
+decode_2d(struct drm_intel_decode *ctx)
+{
+	unsigned int opcode, len;
+	uint32_t *data = ctx->data;
+
+	struct {
+		uint32_t opcode;
+		unsigned int min_len;
+		unsigned int max_len;
+		const char *name;
+	} opcodes_2d[] = {
+		{ 0x40, 5, 5, "COLOR_BLT" },
+		{ 0x43, 6, 6, "SRC_COPY_BLT" },
+		{ 0x01, 8, 8, "XY_SETUP_BLT" },
+		{ 0x11, 9, 9, "XY_SETUP_MONO_PATTERN_SL_BLT" },
+		{ 0x03, 3, 3, "XY_SETUP_CLIP_BLT" },
+		{ 0x24, 2, 2, "XY_PIXEL_BLT" },
+		{ 0x25, 3, 3, "XY_SCANLINES_BLT" },
+		{ 0x26, 4, 4, "Y_TEXT_BLT" },
+		{ 0x31, 5, 134, "XY_TEXT_IMMEDIATE_BLT" },
+		{ 0x50, 6, 6, "XY_COLOR_BLT" },
+		{ 0x51, 6, 6, "XY_PAT_BLT" },
+		{ 0x76, 8, 8, "XY_PAT_CHROMA_BLT" },
+		{ 0x72, 7, 135, "XY_PAT_BLT_IMMEDIATE" },
+		{ 0x77, 9, 137, "XY_PAT_CHROMA_BLT_IMMEDIATE" },
+		{ 0x52, 9, 9, "XY_MONO_PAT_BLT" },
+		{ 0x59, 7, 7, "XY_MONO_PAT_FIXED_BLT" },
+		{ 0x53, 8, 8, "XY_SRC_COPY_BLT" },
+		{ 0x54, 8, 8, "XY_MONO_SRC_COPY_BLT" },
+		{ 0x71, 9, 137, "XY_MONO_SRC_COPY_IMMEDIATE_BLT" },
+		{ 0x55, 9, 9, "XY_FULL_BLT" },
+		{ 0x55, 9, 137, "XY_FULL_IMMEDIATE_PATTERN_BLT" },
+		{ 0x56, 9, 9, "XY_FULL_MONO_SRC_BLT" },
+		{ 0x75, 10, 138, "XY_FULL_MONO_SRC_IMMEDIATE_PATTERN_BLT" },
+		{ 0x57, 12, 12, "XY_FULL_MONO_PATTERN_BLT" },
+		{ 0x58, 12, 12, "XY_FULL_MONO_PATTERN_MONO_SRC_BLT"},
+	};
+
+	switch ((data[0] & 0x1fc00000) >> 22) {
+	case 0x25:
+		instr_out(ctx, 0,
+			  "XY_SCANLINES_BLT (pattern seed (%d, %d), dst tile %d)\n",
+			  (data[0] >> 12) & 0x8,
+			  (data[0] >> 8) & 0x8, (data[0] >> 11) & 1);
+
+		len = (data[0] & 0x000000ff) + 2;
+		if (len != 3)
+			fprintf(out, "Bad count in XY_SCANLINES_BLT\n");
+
+		instr_out(ctx, 1, "dest (%d,%d)\n",
+			  data[1] & 0xffff, data[1] >> 16);
+		instr_out(ctx, 2, "dest (%d,%d)\n",
+			  data[2] & 0xffff, data[2] >> 16);
+		return len;
+	case 0x01:
+		decode_2d_br00(ctx, "XY_SETUP_BLT");
+
+		len = (data[0] & 0x000000ff) + 2;
+		if (len != 8)
+			fprintf(out, "Bad count in XY_SETUP_BLT\n");
+
+		decode_2d_br01(ctx);
+		instr_out(ctx, 2, "cliprect (%d,%d)\n",
+			  data[2] & 0xffff, data[2] >> 16);
+		instr_out(ctx, 3, "cliprect (%d,%d)\n",
+			  data[3] & 0xffff, data[3] >> 16);
+		instr_out(ctx, 4, "setup dst offset 0x%08x\n",
+			  data[4]);
+		instr_out(ctx, 5, "setup background color\n");
+		instr_out(ctx, 6, "setup foreground color\n");
+		instr_out(ctx, 7, "color pattern offset\n");
+		return len;
+	case 0x03:
+		decode_2d_br00(ctx, "XY_SETUP_CLIP_BLT");
+
+		len = (data[0] & 0x000000ff) + 2;
+		if (len != 3)
+			fprintf(out, "Bad count in XY_SETUP_CLIP_BLT\n");
+
+		instr_out(ctx, 1, "cliprect (%d,%d)\n",
+			  data[1] & 0xffff, data[1] >> 16);
+		instr_out(ctx, 2, "cliprect (%d,%d)\n",
+			  data[2] & 0xffff, data[2] >> 16);
+		return len;
+	case 0x11:
+		decode_2d_br00(ctx, "XY_SETUP_MONO_PATTERN_SL_BLT");
+
+		len = (data[0] & 0x000000ff) + 2;
+		if (len != 9)
+			fprintf(out,
+				"Bad count in XY_SETUP_MONO_PATTERN_SL_BLT\n");
+
+		decode_2d_br01(ctx);
+		instr_out(ctx, 2, "cliprect (%d,%d)\n",
+			  data[2] & 0xffff, data[2] >> 16);
+		instr_out(ctx, 3, "cliprect (%d,%d)\n",
+			  data[3] & 0xffff, data[3] >> 16);
+		instr_out(ctx, 4, "setup dst offset 0x%08x\n",
+			  data[4]);
+		instr_out(ctx, 5, "setup background color\n");
+		instr_out(ctx, 6, "setup foreground color\n");
+		instr_out(ctx, 7, "mono pattern dw0\n");
+		instr_out(ctx, 8, "mono pattern dw1\n");
+		return len;
+	case 0x50:
+		decode_2d_br00(ctx, "XY_COLOR_BLT");
+
+		len = (data[0] & 0x000000ff) + 2;
+		if (len != 6)
+			fprintf(out, "Bad count in XY_COLOR_BLT\n");
+
+		decode_2d_br01(ctx);
+		instr_out(ctx, 2, "(%d,%d)\n",
+			  data[2] & 0xffff, data[2] >> 16);
+		instr_out(ctx, 3, "(%d,%d)\n",
+			  data[3] & 0xffff, data[3] >> 16);
+		instr_out(ctx, 4, "offset 0x%08x\n", data[4]);
+		instr_out(ctx, 5, "color\n");
+		return len;
+	case 0x53:
+		decode_2d_br00(ctx, "XY_SRC_COPY_BLT");
+
+		len = (data[0] & 0x000000ff) + 2;
+		if (len != 8)
+			fprintf(out, "Bad count in XY_SRC_COPY_BLT\n");
+
+		decode_2d_br01(ctx);
+		instr_out(ctx, 2, "dst (%d,%d)\n",
+			  data[2] & 0xffff, data[2] >> 16);
+		instr_out(ctx, 3, "dst (%d,%d)\n",
+			  data[3] & 0xffff, data[3] >> 16);
+		instr_out(ctx, 4, "dst offset 0x%08x\n", data[4]);
+		instr_out(ctx, 5, "src (%d,%d)\n",
+			  data[5] & 0xffff, data[5] >> 16);
+		instr_out(ctx, 6, "src pitch %d\n",
+			  (short)(data[6] & 0xffff));
+		instr_out(ctx, 7, "src offset 0x%08x\n", data[7]);
+		return len;
+	}
+
+	for (opcode = 0; opcode < sizeof(opcodes_2d) / sizeof(opcodes_2d[0]);
+	     opcode++) {
+		if ((data[0] & 0x1fc00000) >> 22 == opcodes_2d[opcode].opcode) {
+			unsigned int i;
+
+			len = 1;
+			instr_out(ctx, 0, "%s\n",
+				  opcodes_2d[opcode].name);
+			if (opcodes_2d[opcode].max_len > 1) {
+				len = (data[0] & 0x000000ff) + 2;
+				if (len < opcodes_2d[opcode].min_len ||
+				    len > opcodes_2d[opcode].max_len) {
+					fprintf(out, "Bad count in %s\n",
+						opcodes_2d[opcode].name);
+				}
+			}
+
+			for (i = 1; i < len; i++) {
+				instr_out(ctx, i, "dword %d\n", i);
+			}
+
+			return len;
+		}
+	}
+
+	instr_out(ctx, 0, "2D UNKNOWN\n");
+	return 1;
+}
+
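+/*
+ * Decode the 3D opcode-0x1c subgroup of single-dword state packets
+ * (mostly i830-era toggles); the sub-opcode sits in bits 23:19.
+ */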
+static int
+decode_3d_1c(struct drm_intel_decode *ctx)
+{
+	uint32_t *data = ctx->data;
+	uint32_t opcode;
+
+	opcode = (data[0] & 0x00f80000) >> 19;
+
+	switch (opcode) {
+	case 0x11:
+		instr_out(ctx, 0,
+			  "3DSTATE_DEPTH_SUBRECTANGLE_DISABLE\n");
+		return 1;
+	case 0x10:
+		instr_out(ctx, 0, "3DSTATE_SCISSOR_ENABLE %s\n",
+			  data[0] & 1 ? "enabled" : "disabled");
+		return 1;
+	case 0x01:
+		instr_out(ctx, 0, "3DSTATE_MAP_COORD_SET_I830\n");
+		return 1;
+	case 0x0a:
+		instr_out(ctx, 0, "3DSTATE_MAP_CUBE_I830\n");
+		return 1;
+	case 0x05:
+		instr_out(ctx, 0, "3DSTATE_MAP_TEX_STREAM_I830\n");
+		return 1;
+	}
+
+	instr_out(ctx, 0, "3D UNKNOWN: 3d_1c opcode = 0x%x\n",
+		  opcode);
+	return 1;
+}
+
+/** Sets the string dstname to describe the destination of the PS instruction */
+static void
+i915_get_instruction_dst(uint32_t *data, int i, char *dstname, int do_mask)
+{
+	uint32_t a0 = data[i];
+	int dst_nr = (a0 >> 14) & 0xf;
+	char dstmask[8];
+	const char *sat;
+
+	if (do_mask) {
+		if (((a0 >> 10) & 0xf) == 0xf) {
+			dstmask[0] = 0;
+		} else {
+			int dstmask_index = 0;
+
+			dstmask[dstmask_index++] = '.';
+			if (a0 & (1 << 10))
+				dstmask[dstmask_index++] = 'x';
+			if (a0 & (1 << 11))
+				dstmask[dstmask_index++] = 'y';
+			if (a0 & (1 << 12))
+				dstmask[dstmask_index++] = 'z';
+			if (a0 & (1 << 13))
+				dstmask[dstmask_index++] = 'w';
+			dstmask[dstmask_index++] = 0;
+		}
+
+		if (a0 & (1 << 22))
+			sat = ".sat";
+		else
+			sat = "";
+	} else {
+		dstmask[0] = 0;
+		sat = "";
+	}
+
+	switch ((a0 >> 19) & 0x7) {
+	case 0:
+		if (dst_nr > 15)
+			fprintf(out, "bad destination reg R%d\n", dst_nr);
+		sprintf(dstname, "R%d%s%s", dst_nr, dstmask, sat);
+		break;
+	case 4:
+		if (dst_nr > 0)
+			fprintf(out, "bad destination reg oC%d\n", dst_nr);
+		sprintf(dstname, "oC%s%s", dstmask, sat);
+		break;
+	case 5:
+		if (dst_nr > 0)
+			fprintf(out, "bad destination reg oD%d\n", dst_nr);
+		sprintf(dstname, "oD%s%s", dstmask, sat);
+		break;
+	case 6:
+		if (dst_nr > 3)
+			fprintf(out, "bad destination reg U%d\n", dst_nr);
+		sprintf(dstname, "U%d%s%s", dst_nr, dstmask, sat);
+		break;
+	default:
+		sprintf(dstname, "RESERVED");
+		break;
+	}
+}
+
+static const char *
+i915_get_channel_swizzle(uint32_t select)
+{
+	switch (select & 0x7) {
+	case 0:
+		return (select & 8) ? "-x" : "x";
+	case 1:
+		return (select & 8) ? "-y" : "y";
+	case 2:
+		return (select & 8) ? "-z" : "z";
+	case 3:
+		return (select & 8) ? "-w" : "w";
+	case 4:
+		return (select & 8) ? "-0" : "0";
+	case 5:
+		return (select & 8) ? "-1" : "1";
+	default:
+		return (select & 8) ? "-bad" : "bad";
+	}
+}
+
+static void
+i915_get_instruction_src_name(uint32_t src_type, uint32_t src_nr, char *name)
+{
+	switch (src_type) {
+	case 0:
+		sprintf(name, "R%d", src_nr);
+		if (src_nr > 15)
+			fprintf(out, "bad src reg %s\n", name);
+		break;
+	case 1:
+		if (src_nr < 8)
+			sprintf(name, "T%d", src_nr);
+		else if (src_nr == 8)
+			sprintf(name, "DIFFUSE");
+		else if (src_nr == 9)
+			sprintf(name, "SPECULAR");
+		else if (src_nr == 10)
+			sprintf(name, "FOG");
+		else {
+			fprintf(out, "bad src reg T%d\n", src_nr);
+			sprintf(name, "RESERVED");
+		}
+		break;
+	case 2:
+		sprintf(name, "C%d", src_nr);
+		if (src_nr > 31)
+			fprintf(out, "bad src reg %s\n", name);
+		break;
+	case 4:
+		sprintf(name, "oC");
+		if (src_nr > 0)
+			fprintf(out, "bad src reg oC%d\n", src_nr);
+		break;
+	case 5:
+		sprintf(name, "oD");
+		if (src_nr > 0)
+			fprintf(out, "bad src reg oD%d\n", src_nr);
+		break;
+	case 6:
+		sprintf(name, "U%d", src_nr);
+		if (src_nr > 3)
+			fprintf(out, "bad src reg %s\n", name);
+		break;
+	default:
+		fprintf(out, "bad src reg type %d\n", src_type);
+		sprintf(name, "RESERVED");
+		break;
+	}
+}
+
+static void i915_get_instruction_src0(uint32_t *data, int i, char *srcname)
+{
+	uint32_t a0 = data[i];
+	uint32_t a1 = data[i + 1];
+	int src_nr = (a0 >> 2) & 0x1f;
+	const char *swizzle_x = i915_get_channel_swizzle((a1 >> 28) & 0xf);
+	const char *swizzle_y = i915_get_channel_swizzle((a1 >> 24) & 0xf);
+	const char *swizzle_z = i915_get_channel_swizzle((a1 >> 20) & 0xf);
+	const char *swizzle_w = i915_get_channel_swizzle((a1 >> 16) & 0xf);
+	char swizzle[100];
+
+	i915_get_instruction_src_name((a0 >> 7) & 0x7, src_nr, srcname);
+	sprintf(swizzle, ".%s%s%s%s", swizzle_x, swizzle_y, swizzle_z,
+		swizzle_w);
+	if (strcmp(swizzle, ".xyzw") != 0)
+		strcat(srcname, swizzle);
+}
+
+static void i915_get_instruction_src1(uint32_t *data, int i, char *srcname)
+{
+	uint32_t a1 = data[i + 1];
+	uint32_t a2 = data[i + 2];
+	int src_nr = (a1 >> 8) & 0x1f;
+	const char *swizzle_x = i915_get_channel_swizzle((a1 >> 4) & 0xf);
+	const char *swizzle_y = i915_get_channel_swizzle((a1 >> 0) & 0xf);
+	const char *swizzle_z = i915_get_channel_swizzle((a2 >> 28) & 0xf);
+	const char *swizzle_w = i915_get_channel_swizzle((a2 >> 24) & 0xf);
+	char swizzle[100];
+
+	i915_get_instruction_src_name((a1 >> 13) & 0x7, src_nr, srcname);
+	sprintf(swizzle, ".%s%s%s%s", swizzle_x, swizzle_y, swizzle_z,
+		swizzle_w);
+	if (strcmp(swizzle, ".xyzw") != 0)
+		strcat(srcname, swizzle);
+}
+
+static void i915_get_instruction_src2(uint32_t *data, int i, char *srcname)
+{
+	uint32_t a2 = data[i + 2];
+	int src_nr = (a2 >> 16) & 0x1f;
+	const char *swizzle_x = i915_get_channel_swizzle((a2 >> 12) & 0xf);
+	const char *swizzle_y = i915_get_channel_swizzle((a2 >> 8) & 0xf);
+	const char *swizzle_z = i915_get_channel_swizzle((a2 >> 4) & 0xf);
+	const char *swizzle_w = i915_get_channel_swizzle((a2 >> 0) & 0xf);
+	char swizzle[100];
+
+	i915_get_instruction_src_name((a2 >> 21) & 0x7, src_nr, srcname);
+	sprintf(swizzle, ".%s%s%s%s", swizzle_x, swizzle_y, swizzle_z,
+		swizzle_w);
+	if (strcmp(swizzle, ".xyzw") != 0)
+		strcat(srcname, swizzle);
+}
+
+static void
+i915_get_instruction_addr(uint32_t src_type, uint32_t src_nr, char *name)
+{
+	switch (src_type) {
+	case 0:
+		sprintf(name, "R%d", src_nr);
+		if (src_nr > 15)
+			fprintf(out, "bad src reg %s\n", name);
+		break;
+	case 1:
+		if (src_nr < 8)
+			sprintf(name, "T%d", src_nr);
+		else if (src_nr == 8)
+			sprintf(name, "DIFFUSE");
+		else if (src_nr == 9)
+			sprintf(name, "SPECULAR");
+		else if (src_nr == 10)
+			sprintf(name, "FOG");
+		else {
+			fprintf(out, "bad src reg T%d\n", src_nr);
+			sprintf(name, "RESERVED");
+		}
+		break;
+	case 4:
+		sprintf(name, "oC");
+		if (src_nr > 0)
+			fprintf(out, "bad src reg oC%d\n", src_nr);
+		break;
+	case 5:
+		sprintf(name, "oD");
+		if (src_nr > 0)
+			fprintf(out, "bad src reg oD%d\n", src_nr);
+		break;
+	default:
+		fprintf(out, "bad src reg type %d\n", src_type);
+		sprintf(name, "RESERVED");
+		break;
+	}
+}
+
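+/*
+ * i915 fragment-program instructions are a fixed three dwords
+ * (A0/A1/A2).  The alu1/alu2/alu3 helpers below print the mnemonic
+ * with one, two, or three source operands and emit the remaining
+ * dwords of the triple as bare continuation lines.
+ */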
+static void
+i915_decode_alu1(struct drm_intel_decode *ctx,
+		 int i, char *instr_prefix, const char *op_name)
+{
+	char dst[100], src0[100];
+
+	i915_get_instruction_dst(ctx->data, i, dst, 1);
+	i915_get_instruction_src0(ctx->data, i, src0);
+
+	instr_out(ctx, i++, "%s: %s %s, %s\n", instr_prefix,
+		  op_name, dst, src0);
+	instr_out(ctx, i++, "%s\n", instr_prefix);
+	instr_out(ctx, i++, "%s\n", instr_prefix);
+}
+
+static void
+i915_decode_alu2(struct drm_intel_decode *ctx,
+		 int i, char *instr_prefix, const char *op_name)
+{
+	char dst[100], src0[100], src1[100];
+
+	i915_get_instruction_dst(ctx->data, i, dst, 1);
+	i915_get_instruction_src0(ctx->data, i, src0);
+	i915_get_instruction_src1(ctx->data, i, src1);
+
+	instr_out(ctx, i++, "%s: %s %s, %s, %s\n", instr_prefix,
+		  op_name, dst, src0, src1);
+	instr_out(ctx, i++, "%s\n", instr_prefix);
+	instr_out(ctx, i++, "%s\n", instr_prefix);
+}
+
+static void
+i915_decode_alu3(struct drm_intel_decode *ctx,
+		 int i, char *instr_prefix, const char *op_name)
+{
+	char dst[100], src0[100], src1[100], src2[100];
+
+	i915_get_instruction_dst(ctx->data, i, dst, 1);
+	i915_get_instruction_src0(ctx->data, i, src0);
+	i915_get_instruction_src1(ctx->data, i, src1);
+	i915_get_instruction_src2(ctx->data, i, src2);
+
+	instr_out(ctx, i++, "%s: %s %s, %s, %s, %s\n", instr_prefix,
+		  op_name, dst, src0, src1, src2);
+	instr_out(ctx, i++, "%s\n", instr_prefix);
+	instr_out(ctx, i++, "%s\n", instr_prefix);
+}
+
+static void
+i915_decode_tex(struct drm_intel_decode *ctx, int i,
+		const char *instr_prefix, const char *tex_name)
+{
+	uint32_t t0 = ctx->data[i];
+	uint32_t t1 = ctx->data[i + 1];
+	char dst_name[100];
+	char addr_name[100];
+	int sampler_nr;
+
+	i915_get_instruction_dst(ctx->data, i, dst_name, 0);
+	i915_get_instruction_addr((t1 >> 24) & 0x7,
+				  (t1 >> 17) & 0xf, addr_name);
+	sampler_nr = t0 & 0xf;
+
+	instr_out(ctx, i++, "%s: %s %s, S%d, %s\n", instr_prefix,
+		  tex_name, dst_name, sampler_nr, addr_name);
+	instr_out(ctx, i++, "%s\n", instr_prefix);
+	instr_out(ctx, i++, "%s\n", instr_prefix);
+}
+
+static void
+i915_decode_dcl(struct drm_intel_decode *ctx, int i, char *instr_prefix)
+{
+	uint32_t d0 = ctx->data[i];
+	const char *sampletype;
+	int dcl_nr = (d0 >> 14) & 0xf;
+	const char *dcl_x = d0 & (1 << 10) ? "x" : "";
+	const char *dcl_y = d0 & (1 << 11) ? "y" : "";
+	const char *dcl_z = d0 & (1 << 12) ? "z" : "";
+	const char *dcl_w = d0 & (1 << 13) ? "w" : "";
+	char dcl_mask[10];
+
+	switch ((d0 >> 19) & 0x3) {
+	case 1:
+		sprintf(dcl_mask, ".%s%s%s%s", dcl_x, dcl_y, dcl_z, dcl_w);
+		if (strcmp(dcl_mask, ".") == 0)
+			fprintf(out, "bad (empty) dcl mask\n");
+
+		if (dcl_nr > 10)
+			fprintf(out, "bad T%d dcl register number\n", dcl_nr);
+		if (dcl_nr < 8) {
+			if (strcmp(dcl_mask, ".x") != 0 &&
+			    strcmp(dcl_mask, ".xy") != 0 &&
+			    strcmp(dcl_mask, ".xz") != 0 &&
+			    strcmp(dcl_mask, ".w") != 0 &&
+			    strcmp(dcl_mask, ".xyzw") != 0) {
+				fprintf(out, "bad T%d.%s dcl mask\n", dcl_nr,
+					dcl_mask);
+			}
+			instr_out(ctx, i++, "%s: DCL T%d%s\n",
+				  instr_prefix, dcl_nr, dcl_mask);
+		} else {
+			if (strcmp(dcl_mask, ".xz") == 0)
+				fprintf(out, "errataed bad dcl mask %s\n",
+					dcl_mask);
+			else if (strcmp(dcl_mask, ".xw") == 0)
+				fprintf(out, "errataed bad dcl mask %s\n",
+					dcl_mask);
+			else if (strcmp(dcl_mask, ".xzw") == 0)
+				fprintf(out, "errataed bad dcl mask %s\n",
+					dcl_mask);
+
+			if (dcl_nr == 8) {
+				instr_out(ctx, i++,
+					  "%s: DCL DIFFUSE%s\n", instr_prefix,
+					  dcl_mask);
+			} else if (dcl_nr == 9) {
+				instr_out(ctx, i++,
+					  "%s: DCL SPECULAR%s\n", instr_prefix,
+					  dcl_mask);
+			} else if (dcl_nr == 10) {
+				instr_out(ctx, i++,
+					  "%s: DCL FOG%s\n", instr_prefix,
+					  dcl_mask);
+			}
+		}
+		instr_out(ctx, i++, "%s\n", instr_prefix);
+		instr_out(ctx, i++, "%s\n", instr_prefix);
+		break;
+	case 3:
+		switch ((d0 >> 22) & 0x3) {
+		case 0:
+			sampletype = "2D";
+			break;
+		case 1:
+			sampletype = "CUBE";
+			break;
+		case 2:
+			sampletype = "3D";
+			break;
+		default:
+			sampletype = "RESERVED";
+			break;
+		}
+		if (dcl_nr > 15)
+			fprintf(out, "bad S%d dcl register number\n", dcl_nr);
+		instr_out(ctx, i++, "%s: DCL S%d %s\n",
+			  instr_prefix, dcl_nr, sampletype);
+		instr_out(ctx, i++, "%s\n", instr_prefix);
+		instr_out(ctx, i++, "%s\n", instr_prefix);
+		break;
+	default:
+		instr_out(ctx, i++, "%s: DCL RESERVED%d\n",
+			  instr_prefix, dcl_nr);
+		instr_out(ctx, i++, "%s\n", instr_prefix);
+		instr_out(ctx, i++, "%s\n", instr_prefix);
+	}
+}
+
+static void
+i915_decode_instruction(struct drm_intel_decode *ctx,
+			int i, char *instr_prefix)
+{
+	switch ((ctx->data[i] >> 24) & 0x1f) {
+	case 0x0:
+		instr_out(ctx, i++, "%s: NOP\n", instr_prefix);
+		instr_out(ctx, i++, "%s\n", instr_prefix);
+		instr_out(ctx, i++, "%s\n", instr_prefix);
+		break;
+	case 0x01:
+		i915_decode_alu2(ctx, i, instr_prefix, "ADD");
+		break;
+	case 0x02:
+		i915_decode_alu1(ctx, i, instr_prefix, "MOV");
+		break;
+	case 0x03:
+		i915_decode_alu2(ctx, i, instr_prefix, "MUL");
+		break;
+	case 0x04:
+		i915_decode_alu3(ctx, i, instr_prefix, "MAD");
+		break;
+	case 0x05:
+		i915_decode_alu3(ctx, i, instr_prefix, "DP2ADD");
+		break;
+	case 0x06:
+		i915_decode_alu2(ctx, i, instr_prefix, "DP3");
+		break;
+	case 0x07:
+		i915_decode_alu2(ctx, i, instr_prefix, "DP4");
+		break;
+	case 0x08:
+		i915_decode_alu1(ctx, i, instr_prefix, "FRC");
+		break;
+	case 0x09:
+		i915_decode_alu1(ctx, i, instr_prefix, "RCP");
+		break;
+	case 0x0a:
+		i915_decode_alu1(ctx, i, instr_prefix, "RSQ");
+		break;
+	case 0x0b:
+		i915_decode_alu1(ctx, i, instr_prefix, "EXP");
+		break;
+	case 0x0c:
+		i915_decode_alu1(ctx, i, instr_prefix, "LOG");
+		break;
+	case 0x0d:
+		i915_decode_alu2(ctx, i, instr_prefix, "CMP");
+		break;
+	case 0x0e:
+		i915_decode_alu2(ctx, i, instr_prefix, "MIN");
+		break;
+	case 0x0f:
+		i915_decode_alu2(ctx, i, instr_prefix, "MAX");
+		break;
+	case 0x10:
+		i915_decode_alu1(ctx, i, instr_prefix, "FLR");
+		break;
+	case 0x11:
+		i915_decode_alu1(ctx, i, instr_prefix, "MOD");
+		break;
+	case 0x12:
+		i915_decode_alu1(ctx, i, instr_prefix, "TRC");
+		break;
+	case 0x13:
+		i915_decode_alu2(ctx, i, instr_prefix, "SGE");
+		break;
+	case 0x14:
+		i915_decode_alu2(ctx, i, instr_prefix, "SLT");
+		break;
+	case 0x15:
+		i915_decode_tex(ctx, i, instr_prefix, "TEXLD");
+		break;
+	case 0x16:
+		i915_decode_tex(ctx, i, instr_prefix, "TEXLDP");
+		break;
+	case 0x17:
+		i915_decode_tex(ctx, i, instr_prefix, "TEXLDB");
+		break;
+	case 0x19:
+		i915_decode_dcl(ctx, i, instr_prefix);
+		break;
+	default:
+		instr_out(ctx, i++, "%s: unknown\n", instr_prefix);
+		instr_out(ctx, i++, "%s\n", instr_prefix);
+		instr_out(ctx, i++, "%s\n", instr_prefix);
+		break;
+	}
+}
+
+static const char *
+decode_compare_func(uint32_t op)
+{
+	switch (op & 0x7) {
+	case 0:
+		return "always";
+	case 1:
+		return "never";
+	case 2:
+		return "less";
+	case 3:
+		return "equal";
+	case 4:
+		return "lequal";
+	case 5:
+		return "greater";
+	case 6:
+		return "notequal";
+	case 7:
+		return "gequal";
+	}
+	return "";
+}
+
+static const char *
+decode_stencil_op(uint32_t op)
+{
+	switch (op & 0x7) {
+	case 0:
+		return "keep";
+	case 1:
+		return "zero";
+	case 2:
+		return "replace";
+	case 3:
+		return "incr_sat";
+	case 4:
+		return "decr_sat";
+	case 5:
+		return "greater";
+	case 6:
+		return "incr";
+	case 7:
+		return "decr";
+	}
+	return "";
+}
+
+#if 0
+static const char *
+decode_logic_op(uint32_t op)
+{
+	switch (op & 0xf) {
+	case 0:
+		return "clear";
+	case 1:
+		return "nor";
+	case 2:
+		return "and_inv";
+	case 3:
+		return "copy_inv";
+	case 4:
+		return "and_rvrse";
+	case 5:
+		return "inv";
+	case 6:
+		return "xor";
+	case 7:
+		return "nand";
+	case 8:
+		return "and";
+	case 9:
+		return "equiv";
+	case 10:
+		return "noop";
+	case 11:
+		return "or_inv";
+	case 12:
+		return "copy";
+	case 13:
+		return "or_rvrse";
+	case 14:
+		return "or";
+	case 15:
+		return "set";
+	}
+	return "";
+}
+#endif
+
+static const char *
+decode_blend_fact(uint32_t op)
+{
+	switch (op & 0xf) {
+	case 1:
+		return "zero";
+	case 2:
+		return "one";
+	case 3:
+		return "src_colr";
+	case 4:
+		return "inv_src_colr";
+	case 5:
+		return "src_alpha";
+	case 6:
+		return "inv_src_alpha";
+	case 7:
+		return "dst_alpha";
+	case 8:
+		return "inv_dst_alpha";
+	case 9:
+		return "dst_colr";
+	case 10:
+		return "inv_dst_colr";
+	case 11:
+		return "src_alpha_sat";
+	case 12:
+		return "cnst_colr";
+	case 13:
+		return "inv_cnst_colr";
+	case 14:
+		return "cnst_alpha";
+	case 15:
+		return "inv_const_alpha";
+	}
+	return "";
+}
+
+static const char *
+decode_tex_coord_mode(uint32_t mode)
+{
+	switch (mode & 0x7) {
+	case 0:
+		return "wrap";
+	case 1:
+		return "mirror";
+	case 2:
+		return "clamp_edge";
+	case 3:
+		return "cube";
+	case 4:
+		return "clamp_border";
+	case 5:
+		return "mirror_once";
+	}
+	return "";
+}
+
+static const char *
+decode_sample_filter(uint32_t mode)
+{
+	switch (mode & 0x7) {
+	case 0:
+		return "nearest";
+	case 1:
+		return "linear";
+	case 2:
+		return "anisotropic";
+	case 3:
+		return "4x4_1";
+	case 4:
+		return "4x4_2";
+	case 5:
+		return "4x4_flat";
+	case 6:
+		return "6x5_mono";
+	}
+	return "";
+}
+
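+/*
+ * Decode the 3D opcode-0x1d subgroup (state loads and buffer setup);
+ * the sub-opcode is bits 23:16 of the header.  The load-state packets
+ * are special-cased because their length is implied by per-unit enable
+ * bits rather than by a plain count, so the decoder recomputes the
+ * expected dword count and cross-checks it against the header length.
+ */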
+static int
+decode_3d_1d(struct drm_intel_decode *ctx)
+{
+	unsigned int len, i, c, idx, word, map, sampler, instr;
+	const char *format, *zformat, *type;
+	uint32_t opcode;
+	uint32_t *data = ctx->data;
+	uint32_t devid = ctx->devid;
+
+	struct {
+		uint32_t opcode;
+		int i830_only;
+		unsigned int min_len;
+		unsigned int max_len;
+		const char *name;
+	} opcodes_3d_1d[] = {
+		{ 0x86, 0, 4, 4, "3DSTATE_CHROMA_KEY" },
+		{ 0x88, 0, 2, 2, "3DSTATE_CONSTANT_BLEND_COLOR" },
+		{ 0x99, 0, 2, 2, "3DSTATE_DEFAULT_DIFFUSE" },
+		{ 0x9a, 0, 2, 2, "3DSTATE_DEFAULT_SPECULAR" },
+		{ 0x98, 0, 2, 2, "3DSTATE_DEFAULT_Z" },
+		{ 0x97, 0, 2, 2, "3DSTATE_DEPTH_OFFSET_SCALE" },
+		{ 0x9d, 0, 65, 65, "3DSTATE_FILTER_COEFFICIENTS_4X4" },
+		{ 0x9e, 0, 4, 4, "3DSTATE_MONO_FILTER" },
+		{ 0x89, 0, 4, 4, "3DSTATE_FOG_MODE" },
+		{ 0x8f, 0, 2, 16, "3DSTATE_MAP_PALETTE_LOAD_32" },
+		{ 0x83, 0, 2, 2, "3DSTATE_SPAN_STIPPLE" },
+		{ 0x8c, 1, 2, 2, "3DSTATE_MAP_COORD_TRANSFORM_I830" },
+		{ 0x8b, 1, 2, 2, "3DSTATE_MAP_VERTEX_TRANSFORM_I830" },
+		{ 0x8d, 1, 3, 3, "3DSTATE_W_STATE_I830" },
+		{ 0x01, 1, 2, 2, "3DSTATE_COLOR_FACTOR_I830" },
+		{ 0x02, 1, 2, 2, "3DSTATE_MAP_COORD_SETBIND_I830"},
+	}, *opcode_3d_1d;
+
+	opcode = (data[0] & 0x00ff0000) >> 16;
+
+	switch (opcode) {
+	case 0x07:
+		/* This instruction is unusual.  A 0 length means just
+		 * 1 DWORD instead of 2.  The 0 length is specified in
+		 * one place to be unsupported, but stated to be
+		 * required in another, and 0 length LOAD_INDIRECTs
+		 * appear to cause no harm at least.
+		 */
+		instr_out(ctx, 0, "3DSTATE_LOAD_INDIRECT\n");
+		len = (data[0] & 0x000000ff) + 1;
+		i = 1;
+		if (data[0] & (0x01 << 8)) {
+			instr_out(ctx, i++, "SIS.0\n");
+			instr_out(ctx, i++, "SIS.1\n");
+		}
+		if (data[0] & (0x02 << 8)) {
+			instr_out(ctx, i++, "DIS.0\n");
+		}
+		if (data[0] & (0x04 << 8)) {
+			instr_out(ctx, i++, "SSB.0\n");
+			instr_out(ctx, i++, "SSB.1\n");
+		}
+		if (data[0] & (0x08 << 8)) {
+			instr_out(ctx, i++, "MSB.0\n");
+			instr_out(ctx, i++, "MSB.1\n");
+		}
+		if (data[0] & (0x10 << 8)) {
+			instr_out(ctx, i++, "PSP.0\n");
+			instr_out(ctx, i++, "PSP.1\n");
+		}
+		if (data[0] & (0x20 << 8)) {
+			instr_out(ctx, i++, "PSC.0\n");
+			instr_out(ctx, i++, "PSC.1\n");
+		}
+		if (len != i) {
+			fprintf(out, "Bad count in 3DSTATE_LOAD_INDIRECT\n");
+			return len;
+		}
+		return len;
+	case 0x04:
+		instr_out(ctx, 0,
+			  "3DSTATE_LOAD_STATE_IMMEDIATE_1\n");
+		len = (data[0] & 0x0000000f) + 2;
+		i = 1;
+		for (word = 0; word <= 8; word++) {
+			if (data[0] & (1 << (4 + word))) {
+				/* save vertex state for decode */
+				if (!IS_GEN2(devid)) {
+					int tex_num;
+
+					if (word == 2) {
+						saved_s2_set = 1;
+						saved_s2 = data[i];
+					}
+					if (word == 4) {
+						saved_s4_set = 1;
+						saved_s4 = data[i];
+					}
+
+					switch (word) {
+					case 0:
+						instr_out(ctx, i,
+							  "S0: vbo offset: 0x%08x%s\n",
+							  data[i] & (~1),
+							  data[i] & 1 ?
+							  ", auto cache invalidate disabled"
+							  : "");
+						break;
+					case 1:
+						instr_out(ctx, i,
+							  "S1: vertex width: %i, vertex pitch: %i\n",
+							  (data[i] >> 24) &
+							  0x3f,
+							  (data[i] >> 16) &
+							  0x3f);
+						break;
+					case 2:
+						instr_out(ctx, i,
+							  "S2: texcoord formats: ");
+						for (tex_num = 0; tex_num < 8; tex_num++) {
+							switch ((data[i] >> tex_num * 4) & 0xf) {
+							case 0:
+								fprintf(out, "%i=2D ", tex_num);
+								break;
+							case 1:
+								fprintf(out, "%i=3D ", tex_num);
+								break;
+							case 2:
+								fprintf(out, "%i=4D ", tex_num);
+								break;
+							case 3:
+								fprintf(out, "%i=1D ", tex_num);
+								break;
+							case 4:
+								fprintf(out, "%i=2D_16 ", tex_num);
+								break;
+							case 5:
+								fprintf(out, "%i=4D_16 ", tex_num);
+								break;
+							case 0xf:
+								fprintf(out, "%i=NP ", tex_num);
+								break;
+							}
+						}
+						fprintf(out, "\n");
+
+						break;
+					case 3:
+						instr_out(ctx, i,
+							  "S3: not documented\n");
+						break;
+					case 4:
+						{
+							const char *cullmode = "";
+							const char *vfmt_xyzw = "";
+
+							switch ((data[i] >> 13) & 0x3) {
+							case 0:
+								cullmode = "both";
+								break;
+							case 1:
+								cullmode = "none";
+								break;
+							case 2:
+								cullmode = "cw";
+								break;
+							case 3:
+								cullmode = "ccw";
+								break;
+							}
+							switch (data[i] & (7 << 6 | 1 << 2)) {
+							case 1 << 6:
+								vfmt_xyzw = "XYZ,";
+								break;
+							case 2 << 6:
+								vfmt_xyzw = "XYZW,";
+								break;
+							case 3 << 6:
+								vfmt_xyzw = "XY,";
+								break;
+							case 4 << 6:
+								vfmt_xyzw = "XYW,";
+								break;
+							case 1 << 6 | 1 << 2:
+								vfmt_xyzw = "XYZF,";
+								break;
+							case 2 << 6 | 1 << 2:
+								vfmt_xyzw = "XYZWF,";
+								break;
+							case 3 << 6 | 1 << 2:
+								vfmt_xyzw = "XYF,";
+								break;
+							case 4 << 6 | 1 << 2:
+								vfmt_xyzw = "XYWF,";
+								break;
+							}
+							instr_out(ctx, i,
+								  "S4: point_width=%i, line_width=%.1f,"
+								  "%s%s%s%s%s cullmode=%s, vfmt=%s%s%s%s%s%s "
+								  "%s%s%s%s%s\n",
+								  (data[i] >> 23) & 0x1ff,
+								  ((data[i] >> 19) & 0xf) / 2.0,
+								  data[i] & (0xf << 15) ? " flatshade=" : "",
+								  data[i] & (1 << 18) ? "Alpha," : "",
+								  data[i] & (1 << 17) ? "Fog," : "",
+								  data[i] & (1 << 16) ? "Specular," : "",
+								  data[i] & (1 << 15) ? "Color," : "",
+								  cullmode,
+								  data[i] & (1 << 12) ? "PointWidth," : "",
+								  data[i] & (1 << 11) ? "SpecFog," : "",
+								  data[i] & (1 << 10) ? "Color," : "",
+								  data[i] & (1 << 9) ? "DepthOfs," : "",
+								  vfmt_xyzw,
+								  data[i] & (1 << 9) ? "FogParam," : "",
+								  data[i] & (1 << 5) ? "force default diffuse, " : "",
+								  data[i] & (1 << 4) ? "force default specular, " : "",
+								  data[i] & (1 << 3) ? "local depth ofs enable, " : "",
+								  data[i] & (1 << 1) ? "point sprite enable, " : "",
+								  data[i] & (1 << 0) ? "line AA enable, " : "");
+							break;
+						}
+					case 5:
+						instr_out(ctx, i,
+							  "S5:%s%s%s%s%s"
+							  "%s%s%s%s stencil_ref=0x%x, stencil_test=%s, "
+							  "stencil_fail=%s, stencil_pass_z_fail=%s, "
+							  "stencil_pass_z_pass=%s, %s%s%s%s\n",
+							  data[i] & (0xf << 28) ? " write_disable=" : "",
+							  data[i] & (1 << 31) ? "Alpha," : "",
+							  data[i] & (1 << 30) ? "Red," : "",
+							  data[i] & (1 << 29) ? "Green," : "",
+							  data[i] & (1 << 28) ? "Blue," : "",
+							  data[i] & (1 << 27) ? " force default point size," : "",
+							  data[i] & (1 << 26) ? " last pixel enable," : "",
+							  data[i] & (1 << 25) ? " global depth ofs enable," : "",
+							  data[i] & (1 << 24) ? " fog enable," : "",
+							  (data[i] >> 16) & 0xff,
+							  decode_compare_func(data[i] >> 13),
+							  decode_stencil_op(data[i] >> 10),
+							  decode_stencil_op(data[i] >> 7),
+							  decode_stencil_op(data[i] >> 4),
+							  data[i] & (1 << 3) ? "stencil write enable, " : "",
+							  data[i] & (1 << 2) ? "stencil test enable, " : "",
+							  data[i] & (1 << 1) ? "color dither enable, " : "",
+							  data[i] & (1 << 0) ? "logicop enable, " : "");
+						break;
+					case 6:
+						instr_out(ctx, i,
+							  "S6: %salpha_test=%s, alpha_ref=0x%x, "
+							  "depth_test=%s, %ssrc_blnd_fct=%s, dst_blnd_fct=%s, "
+							  "%s%stristrip_provoking_vertex=%i\n",
+							  data[i] & (1 << 31) ? "alpha test enable, " : "",
+							  decode_compare_func(data[i] >> 28),
+							  data[i] & (0xff << 20),
+							  decode_compare_func(data[i] >> 16),
+							  data[i] & (1 << 15) ? "cbuf blend enable, " : "",
+							  decode_blend_fact(data[i] >> 8),
+							  decode_blend_fact(data[i] >> 4),
+							  data[i] & (1 << 3) ? "depth write enable, " : "",
+							  data[i] & (1 << 2) ? "cbuf write enable, " : "",
+							  data[i] & (0x3));
+						break;
+					case 7:
+						instr_out(ctx, i,
+							  "S7: depth offset constant: 0x%08x\n",
+							  data[i]);
+						break;
+					}
+				} else {
+					instr_out(ctx, i,
+						  "S%d: 0x%08x\n", word, data[i]);
+				}
+				i++;
+			}
+		}
+		if (len != i) {
+			fprintf(out,
+				"Bad count in 3DSTATE_LOAD_STATE_IMMEDIATE_1\n");
+		}
+		return len;
+	case 0x03:
+		instr_out(ctx, 0,
+			  "3DSTATE_LOAD_STATE_IMMEDIATE_2\n");
+		len = (data[0] & 0x0000000f) + 2;
+		i = 1;
+		for (word = 6; word <= 14; word++) {
+			if (data[0] & (1 << word)) {
+				if (word == 6)
+					instr_out(ctx, i++,
+						  "TBCF\n");
+				else if (word >= 7 && word <= 10) {
+					instr_out(ctx, i++,
+						  "TB%dC\n", word - 7);
+					instr_out(ctx, i++,
+						  "TB%dA\n", word - 7);
+				} else if (word >= 11 && word <= 14) {
+					instr_out(ctx, i,
+						  "TM%dS0: offset=0x%08x, %s\n",
+						  word - 11,
+						  data[i] & 0xfffffffe,
+						  data[i] & 1 ? "use fence" :
+						  "");
+					i++;
+					instr_out(ctx, i,
+						  "TM%dS1: height=%i, width=%i, %s\n",
+						  word - 11, data[i] >> 21,
+						  (data[i] >> 10) & 0x3ff,
+						  data[i] & 2 ? (data[i] & 1 ?
+								 "y-tiled" :
+								 "x-tiled") :
+						  "");
+					i++;
+					instr_out(ctx, i,
+						  "TM%dS2: pitch=%i, \n",
+						  word - 11,
+						  ((data[i] >> 21) + 1) * 4);
+					i++;
+					instr_out(ctx, i++,
+						  "TM%dS3\n", word - 11);
+					instr_out(ctx, i++,
+						  "TM%dS4: dflt color\n",
+						  word - 11);
+				}
+			}
+		}
+		if (len != i) {
+			fprintf(out,
+				"Bad count in 3DSTATE_LOAD_STATE_IMMEDIATE_2\n");
+		}
+		return len;
+	case 0x00:
+		instr_out(ctx, 0, "3DSTATE_MAP_STATE\n");
+		len = (data[0] & 0x0000003f) + 2;
+		instr_out(ctx, 1, "mask\n");
+
+		i = 2;
+		for (map = 0; map <= 15; map++) {
+			if (data[1] & (1 << map)) {
+				int width, height, pitch, dword;
+				const char *tiling;
+
+				dword = data[i];
+				instr_out(ctx, i++,
+					  "map %d MS2 %s%s%s\n", map,
+					  dword & (1 << 31) ?
+					  "untrusted surface, " : "",
+					  dword & (1 << 1) ?
+					  "vertical line stride enable, " : "",
+					  dword & (1 << 0) ?
+					  "vertical ofs enable, " : "");
+
+				dword = data[i];
+				width = ((dword >> 10) & ((1 << 11) - 1)) + 1;
+				height = ((dword >> 21) & ((1 << 11) - 1)) + 1;
+
+				tiling = "none";
+				if (dword & (1 << 2))
+					tiling = "fenced";
+				else if (dword & (1 << 1))
+					tiling = dword & (1 << 0) ? "Y" : "X";
+				type = " BAD";
+				format = "BAD";
+				switch ((dword >> 7) & 0x7) {
+				case 1:
+					type = "8b";
+					switch ((dword >> 3) & 0xf) {
+					case 0:
+						format = "I";
+						break;
+					case 1:
+						format = "L";
+						break;
+					case 4:
+						format = "A";
+						break;
+					case 5:
+						format = " mono";
+						break;
+					}
+					break;
+				case 2:
+					type = "16b";
+					switch ((dword >> 3) & 0xf) {
+					case 0:
+						format = " rgb565";
+						break;
+					case 1:
+						format = " argb1555";
+						break;
+					case 2:
+						format = " argb4444";
+						break;
+					case 5:
+						format = " ay88";
+						break;
+					case 6:
+						format = " bump655";
+						break;
+					case 7:
+						format = "I";
+						break;
+					case 8:
+						format = "L";
+						break;
+					case 9:
+						format = "A";
+						break;
+					}
+					break;
+				case 3:
+					type = "32b";
+					switch ((dword >> 3) & 0xf) {
+					case 0:
+						format = " argb8888";
+						break;
+					case 1:
+						format = " abgr8888";
+						break;
+					case 2:
+						format = " xrgb8888";
+						break;
+					case 3:
+						format = " xbgr8888";
+						break;
+					case 4:
+						format = " qwvu8888";
+						break;
+					case 5:
+						format = " axvu8888";
+						break;
+					case 6:
+						format = " lxvu8888";
+						break;
+					case 7:
+						format = " xlvu8888";
+						break;
+					case 8:
+						format = " argb2101010";
+						break;
+					case 9:
+						format = " abgr2101010";
+						break;
+					case 10:
+						format = " awvu2101010";
+						break;
+					case 11:
+						format = " gr1616";
+						break;
+					case 12:
+						format = " vu1616";
+						break;
+					case 13:
+						format = " xI824";
+						break;
+					case 14:
+						format = " xA824";
+						break;
+					case 15:
+						format = " xL824";
+						break;
+					}
+					break;
+				case 5:
+					type = "422";
+					switch ((dword >> 3) & 0xf) {
+					case 0:
+						format = " yuv_swapy";
+						break;
+					case 1:
+						format = " yuv";
+						break;
+					case 2:
+						format = " yuv_swapuv";
+						break;
+					case 3:
+						format = " yuv_swapuvy";
+						break;
+					}
+					break;
+				case 6:
+					type = "compressed";
+					switch ((dword >> 3) & 0x7) {
+					case 0:
+						format = " dxt1";
+						break;
+					case 1:
+						format = " dxt2_3";
+						break;
+					case 2:
+						format = " dxt4_5";
+						break;
+					case 3:
+						format = " fxt1";
+						break;
+					case 4:
+						format = " dxt1_rb";
+						break;
+					}
+					break;
+				case 7:
+					type = "4b indexed";
+					switch ((dword >> 3) & 0xf) {
+					case 7:
+						format = " argb8888";
+						break;
+					}
+					break;
+				}
+				dword = data[i];
+				instr_out(ctx, i++,
+					  "map %d MS3 [width=%d, height=%d, format=%s%s, tiling=%s%s]\n",
+					  map, width, height, type, format,
+					  tiling,
+					  dword & (1 << 9) ? " palette select" :
+					  "");
+
+				dword = data[i];
+				pitch =
+				    4 * (((dword >> 21) & ((1 << 11) - 1)) + 1);
+				instr_out(ctx, i++,
+					  "map %d MS4 [pitch=%d, max_lod=%i, vol_depth=%i, cube_face_ena=%x, %s]\n",
+					  map, pitch, (dword >> 9) & 0x3f,
+					  dword & 0xff, (dword >> 15) & 0x3f,
+					  dword & (1 << 8) ? "miplayout legacy"
+					  : "miplayout right");
+			}
+		}
+		if (len != i) {
+			fprintf(out, "Bad count in 3DSTATE_MAP_STATE\n");
+			return len;
+		}
+		return len;
+	case 0x06:
+		instr_out(ctx, 0,
+			  "3DSTATE_PIXEL_SHADER_CONSTANTS\n");
+		len = (data[0] & 0x000000ff) + 2;
+
+		i = 2;
+		for (c = 0; c <= 31; c++) {
+			if (data[1] & (1 << c)) {
+				instr_out(ctx, i, "C%d.X = %f\n", c,
+					  int_as_float(data[i]));
+				i++;
+				instr_out(ctx, i, "C%d.Y = %f\n",
+					  c, int_as_float(data[i]));
+				i++;
+				instr_out(ctx, i, "C%d.Z = %f\n",
+					  c, int_as_float(data[i]));
+				i++;
+				instr_out(ctx, i, "C%d.W = %f\n",
+					  c, int_as_float(data[i]));
+				i++;
+			}
+		}
+		if (len != i) {
+			fprintf(out,
+				"Bad count in 3DSTATE_PIXEL_SHADER_CONSTANTS\n");
+		}
+		return len;
+	case 0x05:
+		instr_out(ctx, 0, "3DSTATE_PIXEL_SHADER_PROGRAM\n");
+		len = (data[0] & 0x000000ff) + 2;
+		if ((len - 1) % 3 != 0 || len > 370) {
+			fprintf(out,
+				"Bad count in 3DSTATE_PIXEL_SHADER_PROGRAM\n");
+		}
+		i = 1;
+		for (instr = 0; instr < (len - 1) / 3; instr++) {
+			char instr_prefix[10];
+
+			sprintf(instr_prefix, "PS%03d", instr);
+			i915_decode_instruction(ctx, i,
+						instr_prefix);
+			i += 3;
+		}
+		return len;
+	case 0x01:
+		if (IS_GEN2(devid))
+			break;
+		instr_out(ctx, 0, "3DSTATE_SAMPLER_STATE\n");
+		instr_out(ctx, 1, "mask\n");
+		len = (data[0] & 0x0000003f) + 2;
+		i = 2;
+		for (sampler = 0; sampler <= 15; sampler++) {
+			if (data[1] & (1 << sampler)) {
+				uint32_t dword;
+				const char *mip_filter = "";
+
+				dword = data[i];
+				switch ((dword >> 20) & 0x3) {
+				case 0:
+					mip_filter = "none";
+					break;
+				case 1:
+					mip_filter = "nearest";
+					break;
+				case 3:
+					mip_filter = "linear";
+					break;
+				}
+				instr_out(ctx, i++,
+					  "sampler %d SS2:%s%s%s "
+					  "base_mip_level=%i, mip_filter=%s, mag_filter=%s, min_filter=%s "
+					  "lod_bias=%.2f,%s max_aniso=%i, shadow_func=%s\n",
+					  sampler,
+					  dword & (1 << 31) ? " reverse gamma,"
+					  : "",
+					  dword & (1 << 30) ? " packed2planar,"
+					  : "",
+					  dword & (1 << 29) ?
+					  " colorspace conversion," : "",
+					  (dword >> 22) & 0x1f, mip_filter,
+					  decode_sample_filter(dword >> 17),
+					  decode_sample_filter(dword >> 14),
+					  ((dword >> 5) & 0x1ff) / (0x10 * 1.0),
+					  dword & (1 << 4) ? " shadow," : "",
+					  dword & (1 << 3) ? 4 : 2,
+					  decode_compare_func(dword));
+				dword = data[i];
+				instr_out(ctx, i++,
+					  "sampler %d SS3: min_lod=%.2f,%s "
+					  "tcmode_x=%s, tcmode_y=%s, tcmode_z=%s,%s texmap_idx=%i,%s\n",
+					  sampler,
+					  ((dword >> 24) & 0xff) / (0x10 * 1.0),
+					  dword & (1 << 17) ?
+					  " kill pixel enable," : "",
+					  decode_tex_coord_mode(dword >> 12),
+					  decode_tex_coord_mode(dword >> 9),
+					  decode_tex_coord_mode(dword >> 6),
+					  dword & (1 << 5) ?
+					  " normalized coords," : "",
+					  (dword >> 1) & 0xf,
+					  dword & (1 << 0) ? " deinterlacer," :
+					  "");
+				dword = data[i];
+				instr_out(ctx, i++,
+					  "sampler %d SS4: border color\n",
+					  sampler);
+			}
+		}
+		if (len != i) {
+			fprintf(out, "Bad count in 3DSTATE_SAMPLER_STATE\n");
+		}
+		return len;
+	case 0x85:
+		len = (data[0] & 0x0000000f) + 2;
+
+		if (len != 2)
+			fprintf(out,
+				"Bad count in 3DSTATE_DEST_BUFFER_VARIABLES\n");
+
+		instr_out(ctx, 0,
+			  "3DSTATE_DEST_BUFFER_VARIABLES\n");
+
+		switch ((data[1] >> 8) & 0xf) {
+		case 0x0:
+			format = "g8";
+			break;
+		case 0x1:
+			format = "x1r5g5b5";
+			break;
+		case 0x2:
+			format = "r5g6b5";
+			break;
+		case 0x3:
+			format = "a8r8g8b8";
+			break;
+		case 0x4:
+			format = "ycrcb_swapy";
+			break;
+		case 0x5:
+			format = "ycrcb_normal";
+			break;
+		case 0x6:
+			format = "ycrcb_swapuv";
+			break;
+		case 0x7:
+			format = "ycrcb_swapuvy";
+			break;
+		case 0x8:
+			format = "a4r4g4b4";
+			break;
+		case 0x9:
+			format = "a1r5g5b5";
+			break;
+		case 0xa:
+			format = "a2r10g10b10";
+			break;
+		default:
+			format = "BAD";
+			break;
+		}
+		switch ((data[1] >> 2) & 0x3) {
+		case 0x0:
+			zformat = "u16";
+			break;
+		case 0x1:
+			zformat = "f16";
+			break;
+		case 0x2:
+			zformat = "u24x8";
+			break;
+		default:
+			zformat = "BAD";
+			break;
+		}
+		instr_out(ctx, 1,
+			  "%s format, %s depth format, early Z %sabled\n",
+			  format, zformat,
+			  (data[1] & (1 << 31)) ? "en" : "dis");
+		return len;
+
+	case 0x8e:
+		{
+			const char *name, *tiling;
+
+			len = (data[0] & 0x0000000f) + 2;
+			if (len != 3)
+				fprintf(out,
+					"Bad count in 3DSTATE_BUFFER_INFO\n");
+
+			switch ((data[1] >> 24) & 0x7) {
+			case 0x3:
+				name = "color";
+				break;
+			case 0x7:
+				name = "depth";
+				break;
+			default:
+				name = "unknown";
+				break;
+			}
+
+			tiling = "none";
+			if (data[1] & (1 << 23))
+				tiling = "fenced";
+			else if (data[1] & (1 << 22))
+				tiling = data[1] & (1 << 21) ? "Y" : "X";
+
+			instr_out(ctx, 0, "3DSTATE_BUFFER_INFO\n");
+			instr_out(ctx, 1,
+				  "%s, tiling = %s, pitch=%d\n", name, tiling,
+				  data[1] & 0xffff);
+
+			instr_out(ctx, 2, "address\n");
+			return len;
+		}
+	case 0x81:
+		len = (data[0] & 0x0000000f) + 2;
+
+		if (len != 3)
+			fprintf(out,
+				"Bad count in 3DSTATE_SCISSOR_RECTANGLE\n");
+
+		instr_out(ctx, 0, "3DSTATE_SCISSOR_RECTANGLE\n");
+		instr_out(ctx, 1, "(%d,%d)\n",
+			  data[1] & 0xffff, data[1] >> 16);
+		instr_out(ctx, 2, "(%d,%d)\n",
+			  data[2] & 0xffff, data[2] >> 16);
+
+		return len;
+	case 0x80:
+		len = (data[0] & 0x0000000f) + 2;
+
+		if (len != 5)
+			fprintf(out,
+				"Bad count in 3DSTATE_DRAWING_RECTANGLE\n");
+
+		instr_out(ctx, 0, "3DSTATE_DRAWING_RECTANGLE\n");
+		instr_out(ctx, 1, "%s\n",
+			  data[1] & (1 << 30) ? "depth ofs disabled " : "");
+		instr_out(ctx, 2, "(%d,%d)\n",
+			  data[2] & 0xffff, data[2] >> 16);
+		instr_out(ctx, 3, "(%d,%d)\n",
+			  data[3] & 0xffff, data[3] >> 16);
+		instr_out(ctx, 4, "(%d,%d)\n",
+			  data[4] & 0xffff, data[4] >> 16);
+
+		return len;
+	case 0x9c:
+		len = (data[0] & 0x0000000f) + 2;
+
+		if (len != 7)
+			fprintf(out, "Bad count in 3DSTATE_CLEAR_PARAMETERS\n");
+
+		instr_out(ctx, 0, "3DSTATE_CLEAR_PARAMETERS\n");
+		instr_out(ctx, 1, "prim_type=%s, clear=%s%s%s\n",
+			  data[1] & (1 << 16) ? "CLEAR_RECT" : "ZONE_INIT",
+			  data[1] & (1 << 2) ? "color," : "",
+			  data[1] & (1 << 1) ? "depth," : "",
+			  data[1] & (1 << 0) ? "stencil," : "");
+		instr_out(ctx, 2, "clear color\n");
+		instr_out(ctx, 3, "clear depth/stencil\n");
+		instr_out(ctx, 4, "color value (rgba8888)\n");
+		instr_out(ctx, 5, "depth value %f\n",
+			  int_as_float(data[5]));
+		instr_out(ctx, 6, "clear stencil\n");
+		return len;
+	}
+
+	for (idx = 0; idx < ARRAY_SIZE(opcodes_3d_1d); idx++) {
+		opcode_3d_1d = &opcodes_3d_1d[idx];
+		if (opcode_3d_1d->i830_only && !IS_GEN2(devid))
+			continue;
+
+		if (((data[0] & 0x00ff0000) >> 16) == opcode_3d_1d->opcode) {
+			len = 1;
+
+			instr_out(ctx, 0, "%s\n",
+				  opcode_3d_1d->name);
+			if (opcode_3d_1d->max_len > 1) {
+				len = (data[0] & 0x0000ffff) + 2;
+				if (len < opcode_3d_1d->min_len ||
+				    len > opcode_3d_1d->max_len) {
+					fprintf(out, "Bad count in %s\n",
+						opcode_3d_1d->name);
+				}
+			}
+
+			for (i = 1; i < len; i++) {
+				instr_out(ctx, i, "dword %d\n", i);
+			}
+
+			return len;
+		}
+	}
+
+	instr_out(ctx, 0, "3D UNKNOWN: 3d_1d opcode = 0x%x\n",
+		  opcode);
+	return 1;
+}
+
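+/*
+ * Decode 3DPRIMITIVE.  Bit 23 of the header selects indirect
+ * (vertex-buffer) versus inline vertices; the inline vertex layout is
+ * reconstructed from the saved S2/S4 state words.  CLEAR_RECT
+ * primitives temporarily force a known XY format, which is why
+ * saved_s2/saved_s4 are restored on the way out.
+ */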
+static int
+decode_3d_primitive(struct drm_intel_decode *ctx)
+{
+	uint32_t *data = ctx->data;
+	uint32_t count = ctx->count;
+	char immediate = (data[0] & (1 << 23)) == 0;
+	unsigned int len, i, j, ret;
+	const char *primtype;
+	int original_s2 = saved_s2;
+	int original_s4 = saved_s4;
+
+	switch ((data[0] >> 18) & 0xf) {
+	case 0x0:
+		primtype = "TRILIST";
+		break;
+	case 0x1:
+		primtype = "TRISTRIP";
+		break;
+	case 0x2:
+		primtype = "TRISTRIP_REVERSE";
+		break;
+	case 0x3:
+		primtype = "TRIFAN";
+		break;
+	case 0x4:
+		primtype = "POLYGON";
+		break;
+	case 0x5:
+		primtype = "LINELIST";
+		break;
+	case 0x6:
+		primtype = "LINESTRIP";
+		break;
+	case 0x7:
+		primtype = "RECTLIST";
+		break;
+	case 0x8:
+		primtype = "POINTLIST";
+		break;
+	case 0x9:
+		primtype = "DIB";
+		break;
+	case 0xa:
+		primtype = "CLEAR_RECT";
+		saved_s4 = 3 << 6;
+		saved_s2 = ~0;
+		break;
+	default:
+		primtype = "unknown";
+		break;
+	}
+
+	/* XXX: 3DPRIM_DIB not supported */
+	if (immediate) {
+		len = (data[0] & 0x0003ffff) + 2;
+		instr_out(ctx, 0, "3DPRIMITIVE inline %s\n",
+			  primtype);
+		if (count < len)
+			BUFFER_FAIL(count, len, "3DPRIMITIVE inline");
+		if (!saved_s2_set || !saved_s4_set) {
+			fprintf(out, "unknown vertex format\n");
+			for (i = 1; i < len; i++) {
+				instr_out(ctx, i,
+					  "           vertex data (%f float)\n",
+					  int_as_float(data[i]));
+			}
+		} else {
+			unsigned int vertex = 0;
+			for (i = 1; i < len;) {
+				unsigned int tc;
+
+#define VERTEX_OUT(fmt, ...) do {					\
+    if (i < len)							\
+	instr_out(ctx, i, " V%d."fmt"\n", vertex, __VA_ARGS__); \
+    else								\
+	fprintf(out, " missing data in V%d\n", vertex);			\
+    i++;								\
+} while (0)
+
+				VERTEX_OUT("X = %f", int_as_float(data[i]));
+				VERTEX_OUT("Y = %f", int_as_float(data[i]));
+				switch (saved_s4 >> 6 & 0x7) {
+				case 0x1:
+					VERTEX_OUT("Z = %f",
+						   int_as_float(data[i]));
+					break;
+				case 0x2:
+					VERTEX_OUT("Z = %f",
+						   int_as_float(data[i]));
+					VERTEX_OUT("W = %f",
+						   int_as_float(data[i]));
+					break;
+				case 0x3:
+					break;
+				case 0x4:
+					VERTEX_OUT("W = %f",
+						   int_as_float(data[i]));
+					break;
+				default:
+					fprintf(out, "bad S4 position mask\n");
+				}
+
+				if (saved_s4 & (1 << 10)) {
+					VERTEX_OUT("color = (A=0x%02x, R=0x%02x, "
+						   "G=0x%02x, B=0x%02x)",
+						   data[i] >> 24,
+						   (data[i] >> 16) & 0xff,
+						   (data[i] >> 8) & 0xff,
+						   data[i] & 0xff);
+				}
+				if (saved_s4 & (1 << 11)) {
+					VERTEX_OUT("spec = (A=0x%02x, R=0x%02x, "
+						   "G=0x%02x, B=0x%02x)",
+						   data[i] >> 24,
+						   (data[i] >> 16) & 0xff,
+						   (data[i] >> 8) & 0xff,
+						   data[i] & 0xff);
+				}
+				if (saved_s4 & (1 << 12))
+					VERTEX_OUT("width = 0x%08x", data[i]);
+
+				for (tc = 0; tc <= 7; tc++) {
+					switch ((saved_s2 >> (tc * 4)) & 0xf) {
+					case 0x0:
+						VERTEX_OUT("T%d.X = %f", tc, int_as_float(data[i]));
+						VERTEX_OUT("T%d.Y = %f", tc, int_as_float(data[i]));
+						break;
+					case 0x1:
+						VERTEX_OUT("T%d.X = %f", tc, int_as_float(data[i]));
+						VERTEX_OUT("T%d.Y = %f", tc, int_as_float(data[i]));
+						VERTEX_OUT("T%d.Z = %f", tc, int_as_float(data[i]));
+						break;
+					case 0x2:
+						VERTEX_OUT("T%d.X = %f", tc, int_as_float(data[i]));
+						VERTEX_OUT("T%d.Y = %f", tc, int_as_float(data[i]));
+						VERTEX_OUT("T%d.Z = %f", tc, int_as_float(data[i]));
+						VERTEX_OUT("T%d.W = %f", tc, int_as_float(data[i]));
+						break;
+					case 0x3:
+						VERTEX_OUT("T%d.X = %f", tc, int_as_float(data[i]));
+						break;
+					case 0x4:
+						VERTEX_OUT("T%d.XY = 0x%08x half-float", tc, data[i]);
+						break;
+					case 0x5:
+						VERTEX_OUT("T%d.XY = 0x%08x half-float", tc, data[i]);
+						VERTEX_OUT("T%d.ZW = 0x%08x half-float", tc, data[i]);
+						break;
+					case 0xf:
+						break;
+					default:
+						fprintf(out, "bad S2.T%d format\n", tc);
+					}
+				}
+				vertex++;
+			}
+		}
+
+		ret = len;
+	} else {
+		/* indirect vertices */
+		len = data[0] & 0x0000ffff;	/* index count */
+		if (data[0] & (1 << 17)) {
+			/* random vertex access */
+			if (count < (len + 1) / 2 + 1) {
+				BUFFER_FAIL(count, (len + 1) / 2 + 1,
+					    "3DPRIMITIVE random indirect");
+			}
+			instr_out(ctx, 0,
+				  "3DPRIMITIVE random indirect %s (%d)\n",
+				  primtype, len);
+			if (len == 0) {
+				/* vertex indices continue until 0xffff is
+				 * found
+				 */
+				for (i = 1; i < count; i++) {
+					if ((data[i] & 0xffff) == 0xffff) {
+						instr_out(ctx, i,
+							  "    indices: (terminator)\n");
+						ret = i;
+						goto out;
+					} else if ((data[i] >> 16) == 0xffff) {
+						instr_out(ctx, i,
+							  "    indices: 0x%04x, (terminator)\n",
+							  data[i] & 0xffff);
+						ret = i;
+						goto out;
+					} else {
+						instr_out(ctx, i,
+							  "    indices: 0x%04x, 0x%04x\n",
+							  data[i] & 0xffff,
+							  data[i] >> 16);
+					}
+				}
+				fprintf(out,
+					"3DPRIMITIVE: no terminator found in index buffer\n");
+				ret = count;
+				goto out;
+			} else {
+				/* fixed size vertex index buffer */
+				for (j = 1, i = 0; i < len; i += 2, j++) {
+					/* an odd index count leaves a
+					 * single index in the last
+					 * DWORD
+					 */
+					if (i == len - 1) {
+						instr_out(ctx, j,
+							  "    indices: 0x%04x\n",
+							  data[j] & 0xffff);
+					} else {
+						instr_out(ctx, j,
+							  "    indices: 0x%04x, 0x%04x\n",
+							  data[j] & 0xffff,
+							  data[j] >> 16);
+					}
+				}
+			}
+			ret = (len + 1) / 2 + 1;
+			goto out;
+		} else {
+			/* sequential vertex access */
+			instr_out(ctx, 0,
+				  "3DPRIMITIVE sequential indirect %s, %d starting from "
+				  "%d\n", primtype, len, data[1] & 0xffff);
+			instr_out(ctx, 1, "           start\n");
+			ret = 2;
+			goto out;
+		}
+	}
+
+out:
+	saved_s2 = original_s2;
+	saved_s4 = original_s4;
+	return ret;
+}
+
+static int
+decode_3d(struct drm_intel_decode *ctx)
+{
+	uint32_t opcode;
+	unsigned int idx;
+	uint32_t *data = ctx->data;
+
+	struct {
+		uint32_t opcode;
+		unsigned int min_len;
+		unsigned int max_len;
+		const char *name;
+	} opcodes_3d[] = {
+		{ 0x06, 1, 1, "3DSTATE_ANTI_ALIASING" },
+		{ 0x08, 1, 1, "3DSTATE_BACKFACE_STENCIL_OPS" },
+		{ 0x09, 1, 1, "3DSTATE_BACKFACE_STENCIL_MASKS" },
+		{ 0x16, 1, 1, "3DSTATE_COORD_SET_BINDINGS" },
+		{ 0x15, 1, 1, "3DSTATE_FOG_COLOR" },
+		{ 0x0b, 1, 1, "3DSTATE_INDEPENDENT_ALPHA_BLEND" },
+		{ 0x0d, 1, 1, "3DSTATE_MODES_4" },
+		{ 0x0c, 1, 1, "3DSTATE_MODES_5" },
+		{ 0x07, 1, 1, "3DSTATE_RASTERIZATION_RULES"},
+	}, *opcode_3d;
+
+	opcode = (data[0] & 0x1f000000) >> 24;
+
+	switch (opcode) {
+	case 0x1f:
+		return decode_3d_primitive(ctx);
+	case 0x1d:
+		return decode_3d_1d(ctx);
+	case 0x1c:
+		return decode_3d_1c(ctx);
+	}
+
+	for (idx = 0; idx < ARRAY_SIZE(opcodes_3d); idx++) {
+		opcode_3d = &opcodes_3d[idx];
+		if (opcode == opcode_3d->opcode) {
+			unsigned int len = 1, i;
+
+			instr_out(ctx, 0, "%s\n", opcode_3d->name);
+			if (opcode_3d->max_len > 1) {
+				len = (data[0] & 0xff) + 2;
+				if (len < opcode_3d->min_len ||
+				    len > opcode_3d->max_len) {
+					fprintf(out, "Bad count in %s\n",
+						opcode_3d->name);
+				}
+			}
+
+			for (i = 1; i < len; i++) {
+				instr_out(ctx, i, "dword %d\n", i);
+			}
+			return len;
+		}
+	}
+
+	instr_out(ctx, 0, "3D UNKNOWN: 3d opcode = 0x%x\n", opcode);
+	return 1;
+}
+
+static const char *get_965_surfacetype(unsigned int surfacetype)
+{
+	switch (surfacetype) {
+	case 0:
+		return "1D";
+	case 1:
+		return "2D";
+	case 2:
+		return "3D";
+	case 3:
+		return "CUBE";
+	case 4:
+		return "BUFFER";
+	case 7:
+		return "NULL";
+	default:
+		return "unknown";
+	}
+}
+
+static const char *get_965_depthformat(unsigned int depthformat)
+{
+	switch (depthformat) {
+	case 0:
+		return "s8_z24float";
+	case 1:
+		return "z32float";
+	case 2:
+		return "z24s8";
+	case 5:
+		return "z16";
+	default:
+		return "unknown";
+	}
+}
+
+static const char *get_965_element_component(uint32_t data, int component)
+{
+	uint32_t component_control = (data >> (16 + (3 - component) * 4)) & 0x7;
+
+	switch (component_control) {
+	case 0:
+		return "nostore";
+	case 1:
+		switch (component) {
+		case 0:
+			return "X";
+		case 1:
+			return "Y";
+		case 2:
+			return "Z";
+		case 3:
+			return "W";
+		default:
+			return "fail";
+		}
+	case 2:
+		return "0.0";
+	case 3:
+		return "1.0";
+	case 4:
+		return "0x1";
+	case 5:
+		return "VID";
+	default:
+		return "fail";
+	}
+}
+
+static const char *get_965_prim_type(uint32_t primtype)
+{
+	switch (primtype) {
+	case 0x01:
+		return "point list";
+	case 0x02:
+		return "line list";
+	case 0x03:
+		return "line strip";
+	case 0x04:
+		return "tri list";
+	case 0x05:
+		return "tri strip";
+	case 0x06:
+		return "tri fan";
+	case 0x07:
+		return "quad list";
+	case 0x08:
+		return "quad strip";
+	case 0x09:
+		return "line list adj";
+	case 0x0a:
+		return "line strip adj";
+	case 0x0b:
+		return "tri list adj";
+	case 0x0c:
+		return "tri strip adj";
+	case 0x0d:
+		return "tri strip reverse";
+	case 0x0e:
+		return "polygon";
+	case 0x0f:
+		return "rect list";
+	case 0x10:
+		return "line loop";
+	case 0x11:
+		return "point list bf";
+	case 0x12:
+		return "line strip cont";
+	case 0x13:
+		return "line strip bf";
+	case 0x14:
+		return "line strip cont bf";
+	case 0x15:
+		return "tri fan no stipple";
+	default:
+		return "fail";
+	}
+}
+
+static int
+i965_decode_urb_fence(struct drm_intel_decode *ctx, int len)
+{
+	uint32_t vs_fence, clip_fence, gs_fence, sf_fence, vfe_fence, cs_fence;
+	uint32_t *data = ctx->data;
+
+	if (len != 3)
+		fprintf(out, "Bad count in URB_FENCE\n");
+
+	vs_fence = data[1] & 0x3ff;
+	gs_fence = (data[1] >> 10) & 0x3ff;
+	clip_fence = (data[1] >> 20) & 0x3ff;
+	sf_fence = data[2] & 0x3ff;
+	vfe_fence = (data[2] >> 10) & 0x3ff;
+	cs_fence = (data[2] >> 20) & 0x7ff;
+
+	instr_out(ctx, 0, "URB_FENCE: %s%s%s%s%s%s\n",
+		  (data[0] >> 13) & 1 ? "cs " : "",
+		  (data[0] >> 12) & 1 ? "vfe " : "",
+		  (data[0] >> 11) & 1 ? "sf " : "",
+		  (data[0] >> 10) & 1 ? "clip " : "",
+		  (data[0] >> 9) & 1 ? "gs " : "",
+		  (data[0] >> 8) & 1 ? "vs " : "");
+	instr_out(ctx, 1,
+		  "vs fence: %d, clip_fence: %d, gs_fence: %d\n",
+		  vs_fence, clip_fence, gs_fence);
+	instr_out(ctx, 2,
+		  "sf fence: %d, vfe_fence: %d, cs_fence: %d\n",
+		  sf_fence, vfe_fence, cs_fence);
+	if (gs_fence < vs_fence)
+		fprintf(out, "gs fence < vs fence!\n");
+	if (clip_fence < gs_fence)
+		fprintf(out, "clip fence < gs fence!\n");
+	if (sf_fence < clip_fence)
+		fprintf(out, "sf fence < clip fence!\n");
+	if (cs_fence < sf_fence)
+		fprintf(out, "cs fence < sf fence!\n");
+
+	return len;
+}
+
+static void
+state_base_out(struct drm_intel_decode *ctx, unsigned int index,
+	       const char *name)
+{
+	if (ctx->data[index] & 1) {
+		instr_out(ctx, index,
+			  "%s state base address 0x%08x\n", name,
+			  ctx->data[index] & ~1);
+	} else {
+		instr_out(ctx, index, "%s state base not updated\n",
+			  name);
+	}
+}
+
+static void
+state_max_out(struct drm_intel_decode *ctx, unsigned int index,
+	      const char *name)
+{
+	if (ctx->data[index] & 1) {
+		if (ctx->data[index] == 1) {
+			instr_out(ctx, index,
+				  "%s state upper bound disabled\n", name);
+		} else {
+			instr_out(ctx, index,
+				  "%s state upper bound 0x%08x\n", name,
+				  ctx->data[index] & ~1);
+		}
+	} else {
+		instr_out(ctx, index,
+			  "%s state upper bound not updated\n", name);
+	}
+}
+
+static int
+gen7_3DSTATE_VIEWPORT_STATE_POINTERS_CC(struct drm_intel_decode *ctx)
+{
+	instr_out(ctx, 0, "3DSTATE_VIEWPORT_STATE_POINTERS_CC\n");
+	instr_out(ctx, 1, "pointer to CC viewport\n");
+
+	return 2;
+}
+
+static int
+gen7_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP(struct drm_intel_decode *ctx)
+{
+	instr_out(ctx, 0, "3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP\n");
+	instr_out(ctx, 1, "pointer to SF_CLIP viewport\n");
+
+	return 2;
+}
+
+static int
+gen7_3DSTATE_BLEND_STATE_POINTERS(struct drm_intel_decode *ctx)
+{
+	instr_out(ctx, 0, "3DSTATE_BLEND_STATE_POINTERS\n");
+	instr_out(ctx, 1, "pointer to BLEND_STATE at 0x%08x (%s)\n",
+		  ctx->data[1] & ~1,
+		  (ctx->data[1] & 1) ? "changed" : "unchanged");
+
+	return 2;
+}
+
+static int
+gen7_3DSTATE_DEPTH_STENCIL_STATE_POINTERS(struct drm_intel_decode *ctx)
+{
+	instr_out(ctx, 0, "3DSTATE_DEPTH_STENCIL_STATE_POINTERS\n");
+	instr_out(ctx, 1,
+		  "pointer to DEPTH_STENCIL_STATE at 0x%08x (%s)\n",
+		  ctx->data[1] & ~1,
+		  (ctx->data[1] & 1) ? "changed" : "unchanged");
+
+	return 2;
+}
+
+static int
+gen7_3DSTATE_HIER_DEPTH_BUFFER(struct drm_intel_decode *ctx)
+{
+	instr_out(ctx, 0, "3DSTATE_HIER_DEPTH_BUFFER\n");
+	instr_out(ctx, 1, "pitch %db\n",
+		  (ctx->data[1] & 0x1ffff) + 1);
+	instr_out(ctx, 2, "pointer to HiZ buffer\n");
+
+	return 3;
+}
+
+static int
+gen6_3DSTATE_CC_STATE_POINTERS(struct drm_intel_decode *ctx)
+{
+	instr_out(ctx, 0, "3DSTATE_CC_STATE_POINTERS\n");
+	instr_out(ctx, 1, "blend change %d\n", ctx->data[1] & 1);
+	instr_out(ctx, 2, "depth stencil change %d\n",
+		  ctx->data[2] & 1);
+	instr_out(ctx, 3, "cc change %d\n", ctx->data[3] & 1);
+
+	return 4;
+}
+
+static int
+gen7_3DSTATE_CC_STATE_POINTERS(struct drm_intel_decode *ctx)
+{
+	instr_out(ctx, 0, "3DSTATE_CC_STATE_POINTERS\n");
+	instr_out(ctx, 1, "pointer to COLOR_CALC_STATE at 0x%08x "
+		  "(%s)\n",
+		  ctx->data[1] & ~1,
+		  (ctx->data[1] & 1) ? "changed" : "unchanged");
+
+	return 2;
+}
+
+static int
+gen7_3DSTATE_URB_unit(struct drm_intel_decode *ctx, const char *unit)
+{
+	int start_kb = ((ctx->data[1] >> 25) & 0x3f) * 8;
+	/* the field is # of 512-bit (64-byte) rows - 1; the total is
+	 * printed in bytes
+	 */
+	int entry_size = (((ctx->data[1] >> 16) & 0x1ff) + 1);
+	int nr_entries = ctx->data[1] & 0xffff;
+
+	instr_out(ctx, 0, "3DSTATE_URB_%s\n", unit);
+	instr_out(ctx, 1,
+		  "%dKB start, size=%d 64B rows, nr_entries=%d, total size %dB\n",
+		  start_kb, entry_size, nr_entries, nr_entries * 64 * entry_size);
+
+	return 2;
+}
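+
+/* Worked example (illustrative value, not from a real trace): for
+ * ctx->data[1] == 0x04020020 the fields above decode as
+ *	start_kb   = ((dw >> 25) & 0x3f) * 8  = 2 * 8 = 16 (KB)
+ *	entry_size = ((dw >> 16) & 0x1ff) + 1 = 2 + 1 = 3  (64B rows)
+ *	nr_entries = dw & 0xffff              = 0x20  = 32
+ * for a total URB allocation of 32 * 64 * 3 = 6144 bytes.
+ */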
+
+static int
+gen7_3DSTATE_URB_VS(struct drm_intel_decode *ctx)
+{
+	return gen7_3DSTATE_URB_unit(ctx, "VS");
+}
+
+static int
+gen7_3DSTATE_URB_HS(struct drm_intel_decode *ctx)
+{
+	return gen7_3DSTATE_URB_unit(ctx, "HS");
+}
+
+static int
+gen7_3DSTATE_URB_DS(struct drm_intel_decode *ctx)
+{
+	return gen7_3DSTATE_URB_unit(ctx, "DS");
+}
+
+static int
+gen7_3DSTATE_URB_GS(struct drm_intel_decode *ctx)
+{
+	return gen7_3DSTATE_URB_unit(ctx, "GS");
+}
+
+static int
+gen7_3DSTATE_CONSTANT(struct drm_intel_decode *ctx, const char *unit)
+{
+	int rlen[4];
+
+	rlen[0] = (ctx->data[1] >> 0) & 0xffff;
+	rlen[1] = (ctx->data[1] >> 16) & 0xffff;
+	rlen[2] = (ctx->data[2] >> 0) & 0xffff;
+	rlen[3] = (ctx->data[2] >> 16) & 0xffff;
+
+	instr_out(ctx, 0, "3DSTATE_CONSTANT_%s\n", unit);
+	instr_out(ctx, 1, "len 0 = %d, len 1 = %d\n", rlen[0], rlen[1]);
+	instr_out(ctx, 2, "len 2 = %d, len 3 = %d\n", rlen[2], rlen[3]);
+	instr_out(ctx, 3, "pointer to constbuf 0\n");
+	instr_out(ctx, 4, "pointer to constbuf 1\n");
+	instr_out(ctx, 5, "pointer to constbuf 2\n");
+	instr_out(ctx, 6, "pointer to constbuf 3\n");
+
+	return 7;
+}
+
+static int
+gen7_3DSTATE_CONSTANT_VS(struct drm_intel_decode *ctx)
+{
+	return gen7_3DSTATE_CONSTANT(ctx, "VS");
+}
+
+static int
+gen7_3DSTATE_CONSTANT_GS(struct drm_intel_decode *ctx)
+{
+	return gen7_3DSTATE_CONSTANT(ctx, "GS");
+}
+
+static int
+gen7_3DSTATE_CONSTANT_PS(struct drm_intel_decode *ctx)
+{
+	return gen7_3DSTATE_CONSTANT(ctx, "PS");
+}
+
+static int
+gen7_3DSTATE_CONSTANT_DS(struct drm_intel_decode *ctx)
+{
+	return gen7_3DSTATE_CONSTANT(ctx, "DS");
+}
+
+static int
+gen7_3DSTATE_CONSTANT_HS(struct drm_intel_decode *ctx)
+{
+	return gen7_3DSTATE_CONSTANT(ctx, "HS");
+}
+
+
+static int
+gen6_3DSTATE_WM(struct drm_intel_decode *ctx)
+{
+	instr_out(ctx, 0, "3DSTATE_WM\n");
+	instr_out(ctx, 1, "kernel start pointer 0\n");
+	instr_out(ctx, 2,
+		  "SPF=%d, VME=%d, Sampler Count %d, "
+		  "Binding table count %d\n",
+		  (ctx->data[2] >> 31) & 1,
+		  (ctx->data[2] >> 30) & 1,
+		  (ctx->data[2] >> 27) & 7,
+		  (ctx->data[2] >> 18) & 0xff);
+	instr_out(ctx, 3, "scratch offset\n");
+	instr_out(ctx, 4,
+		  "Depth Clear %d, Depth Resolve %d, HiZ Resolve %d, "
+		  "Dispatch GRF start[0] %d, start[1] %d, start[2] %d\n",
+		  (ctx->data[4] & (1 << 30)) != 0,
+		  (ctx->data[4] & (1 << 28)) != 0,
+		  (ctx->data[4] & (1 << 27)) != 0,
+		  (ctx->data[4] >> 16) & 0x7f,
+		  (ctx->data[4] >> 8) & 0x7f,
+		  (ctx->data[4] & 0x7f));
+	instr_out(ctx, 5,
+		  "MaxThreads %d, PS KillPixel %d, PS computed Z %d, "
+		  "PS use sourceZ %d, Thread Dispatch %d, PS use sourceW %d, "
+		  "Dispatch32 %d, Dispatch16 %d, Dispatch8 %d\n",
+		  ((ctx->data[5] >> 25) & 0x7f) + 1,
+		  (ctx->data[5] & (1 << 22)) != 0,
+		  (ctx->data[5] & (1 << 21)) != 0,
+		  (ctx->data[5] & (1 << 20)) != 0,
+		  (ctx->data[5] & (1 << 19)) != 0,
+		  (ctx->data[5] & (1 << 8)) != 0,
+		  (ctx->data[5] & (1 << 2)) != 0,
+		  (ctx->data[5] & (1 << 1)) != 0,
+		  (ctx->data[5] & (1 << 0)) != 0);
+	instr_out(ctx, 6,
+		  "Num SF output %d, Pos XY offset %d, ZW interp mode %d, "
+		  "Barycentric interp mode 0x%x, Point raster rule %d, "
+		  "Multisample mode %d, "
+		  "Multisample Dispatch mode %d\n",
+		  (ctx->data[6] >> 20) & 0x3f,
+		  (ctx->data[6] >> 18) & 3,
+		  (ctx->data[6] >> 16) & 3,
+		  (ctx->data[6] >> 10) & 0x3f,
+		  (ctx->data[6] & (1 << 9)) != 0,
+		  (ctx->data[6] >> 1) & 3,
+		  (ctx->data[6] & 1));
+	instr_out(ctx, 7, "kernel start pointer 1\n");
+	instr_out(ctx, 8, "kernel start pointer 2\n");
+
+	return 9;
+}
+
+static int
+gen7_3DSTATE_WM(struct drm_intel_decode *ctx)
+{
+	const char *computed_depth = "";
+	const char *early_depth = "";
+	const char *zw_interp = "";
+
+	switch ((ctx->data[1] >> 23) & 0x3) {
+	case 0:
+		computed_depth = "";
+		break;
+	case 1:
+		computed_depth = "computed depth";
+		break;
+	case 2:
+		computed_depth = "computed depth >=";
+		break;
+	case 3:
+		computed_depth = "computed depth <=";
+		break;
+	}
+
+	switch ((ctx->data[1] >> 21) & 0x3) {
+	case 0:
+		early_depth = "";
+		break;
+	case 1:
+		early_depth = ", EDSC_PSEXEC";
+		break;
+	case 2:
+		early_depth = ", EDSC_PREPS";
+		break;
+	case 3:
+		early_depth = ", BAD EDSC";
+		break;
+	}
+
+	switch ((ctx->data[1] >> 17) & 0x3) {
+	case 0:
+		zw_interp = "";
+		break;
+	case 1:
+		zw_interp = ", BAD ZW interp";
+		break;
+	case 2:
+		zw_interp = ", ZW centroid";
+		break;
+	case 3:
+		zw_interp = ", ZW sample";
+		break;
+	}
+
+	instr_out(ctx, 0, "3DSTATE_WM\n");
+	instr_out(ctx, 1, "(%s%s%s%s%s%s)%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+		  (ctx->data[1] & (1 << 11)) ? "PP " : "",
+		  (ctx->data[1] & (1 << 12)) ? "PC " : "",
+		  (ctx->data[1] & (1 << 13)) ? "PS " : "",
+		  (ctx->data[1] & (1 << 14)) ? "NPP " : "",
+		  (ctx->data[1] & (1 << 15)) ? "NPC " : "",
+		  (ctx->data[1] & (1 << 16)) ? "NPS " : "",
+		  (ctx->data[1] & (1 << 30)) ? ", depth clear" : "",
+		  (ctx->data[1] & (1 << 29)) ? "" : ", disabled",
+		  (ctx->data[1] & (1 << 28)) ? ", depth resolve" : "",
+		  (ctx->data[1] & (1 << 27)) ? ", hiz resolve" : "",
+		  (ctx->data[1] & (1 << 25)) ? ", kill" : "",
+		  computed_depth,
+		  early_depth,
+		  zw_interp,
+		  (ctx->data[1] & (1 << 20)) ? ", source depth" : "",
+		  (ctx->data[1] & (1 << 19)) ? ", source W" : "",
+		  (ctx->data[1] & (1 << 10)) ? ", coverage" : "",
+		  (ctx->data[1] & (1 << 4)) ? ", poly stipple" : "",
+		  (ctx->data[1] & (1 << 3)) ? ", line stipple" : "",
+		  (ctx->data[1] & (1 << 2)) ? ", point UL" : ", point UR"
+		  );
+	instr_out(ctx, 2, "MS\n");
+
+	return 3;
+}
+
+static int
+gen4_3DPRIMITIVE(struct drm_intel_decode *ctx)
+{
+	instr_out(ctx, 0,
+		  "3DPRIMITIVE: %s %s\n",
+		  get_965_prim_type((ctx->data[0] >> 10) & 0x1f),
+		  (ctx->data[0] & (1 << 15)) ? "random" : "sequential");
+	instr_out(ctx, 1, "vertex count\n");
+	instr_out(ctx, 2, "start vertex\n");
+	instr_out(ctx, 3, "instance count\n");
+	instr_out(ctx, 4, "start instance\n");
+	instr_out(ctx, 5, "index bias\n");
+
+	return 6;
+}
+
+static int
+gen7_3DPRIMITIVE(struct drm_intel_decode *ctx)
+{
+	bool indirect = !!(ctx->data[0] & (1 << 10));
+
+	instr_out(ctx, 0,
+		  "3DPRIMITIVE: %s%s\n",
+		  indirect ? " indirect" : "",
+		  (ctx->data[0] & (1 << 8)) ? " predicated" : "");
+	instr_out(ctx, 1, "%s %s\n",
+		  get_965_prim_type(ctx->data[1] & 0x3f),
+		  (ctx->data[1] & (1 << 8)) ? "random" : "sequential");
+	instr_out(ctx, 2, indirect ? "ignored" : "vertex count\n");
+	instr_out(ctx, 3, indirect ? "ignored" : "start vertex\n");
+	instr_out(ctx, 4, indirect ? "ignored" : "instance count\n");
+	instr_out(ctx, 5, indirect ? "ignored" : "start instance\n");
+	instr_out(ctx, 6, indirect ? "ignored" : "index bias\n");
+
+	return 7;
+}
+
+static int
+decode_3d_965(struct drm_intel_decode *ctx)
+{
+	uint32_t opcode;
+	unsigned int len;
+	unsigned int i, j, sba_len;
+	const char *desc1 = NULL;
+	uint32_t *data = ctx->data;
+	uint32_t devid = ctx->devid;
+
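+	/* Table of fixed-function 3D packet opcodes: a gen of 0 in an
+	 * entry matches any generation, and entries with a func
+	 * pointer defer to a dedicated per-packet decoder below.
+	 */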
+	struct {
+		uint32_t opcode;
+		uint32_t len_mask;
+		unsigned int min_len;
+		unsigned int max_len;
+		const char *name;
+		int gen;
+		int (*func)(struct drm_intel_decode *ctx);
+	} opcodes_3d[] = {
+		{ 0x6000, 0x00ff, 3, 3, "URB_FENCE" },
+		{ 0x6001, 0xffff, 2, 2, "CS_URB_STATE" },
+		{ 0x6002, 0x00ff, 2, 2, "CONSTANT_BUFFER" },
+		{ 0x6101, 0xffff, 6, 10, "STATE_BASE_ADDRESS" },
+		{ 0x6102, 0xffff, 2, 2, "STATE_SIP" },
+		{ 0x6104, 0xffff, 1, 1, "3DSTATE_PIPELINE_SELECT" },
+		{ 0x680b, 0xffff, 1, 1, "3DSTATE_VF_STATISTICS" },
+		{ 0x6904, 0xffff, 1, 1, "3DSTATE_PIPELINE_SELECT" },
+		{ 0x7800, 0xffff, 7, 7, "3DSTATE_PIPELINED_POINTERS" },
+		{ 0x7801, 0x00ff, 4, 6, "3DSTATE_BINDING_TABLE_POINTERS" },
+		{ 0x7802, 0x00ff, 4, 4, "3DSTATE_SAMPLER_STATE_POINTERS" },
+		{ 0x7805, 0x00ff, 7, 7, "3DSTATE_DEPTH_BUFFER", 7 },
+		{ 0x7805, 0x00ff, 3, 3, "3DSTATE_URB" },
+		{ 0x7804, 0x00ff, 3, 3, "3DSTATE_CLEAR_PARAMS" },
+		{ 0x7806, 0x00ff, 3, 3, "3DSTATE_STENCIL_BUFFER" },
+		{ 0x790f, 0x00ff, 3, 3, "3DSTATE_HIER_DEPTH_BUFFER", 6 },
+		{ 0x7807, 0x00ff, 3, 3, "3DSTATE_HIER_DEPTH_BUFFER", 7, gen7_3DSTATE_HIER_DEPTH_BUFFER },
+		{ 0x7808, 0x00ff, 5, 257, "3DSTATE_VERTEX_BUFFERS" },
+		{ 0x7809, 0x00ff, 3, 256, "3DSTATE_VERTEX_ELEMENTS" },
+		{ 0x780a, 0x00ff, 3, 3, "3DSTATE_INDEX_BUFFER" },
+		{ 0x780b, 0xffff, 1, 1, "3DSTATE_VF_STATISTICS" },
+		{ 0x780d, 0x00ff, 4, 4, "3DSTATE_VIEWPORT_STATE_POINTERS" },
+		{ 0x780e, 0xffff, 4, 4, NULL, 6, gen6_3DSTATE_CC_STATE_POINTERS },
+		{ 0x780e, 0x00ff, 2, 2, NULL, 7, gen7_3DSTATE_CC_STATE_POINTERS },
+		{ 0x780f, 0x00ff, 2, 2, "3DSTATE_SCISSOR_POINTERS" },
+		{ 0x7810, 0x00ff, 6, 6, "3DSTATE_VS" },
+		{ 0x7811, 0x00ff, 7, 7, "3DSTATE_GS" },
+		{ 0x7812, 0x00ff, 4, 4, "3DSTATE_CLIP" },
+		{ 0x7813, 0x00ff, 20, 20, "3DSTATE_SF", 6 },
+		{ 0x7813, 0x00ff, 7, 7, "3DSTATE_SF", 7 },
+		{ 0x7814, 0x00ff, 3, 3, "3DSTATE_WM", 7, gen7_3DSTATE_WM },
+		{ 0x7814, 0x00ff, 9, 9, "3DSTATE_WM", 6, gen6_3DSTATE_WM },
+		{ 0x7815, 0x00ff, 5, 5, "3DSTATE_CONSTANT_VS_STATE", 6 },
+		{ 0x7815, 0x00ff, 7, 7, "3DSTATE_CONSTANT_VS", 7, gen7_3DSTATE_CONSTANT_VS },
+		{ 0x7816, 0x00ff, 5, 5, "3DSTATE_CONSTANT_GS_STATE", 6 },
+		{ 0x7816, 0x00ff, 7, 7, "3DSTATE_CONSTANT_GS", 7, gen7_3DSTATE_CONSTANT_GS },
+		{ 0x7817, 0x00ff, 5, 5, "3DSTATE_CONSTANT_PS_STATE", 6 },
+		{ 0x7817, 0x00ff, 7, 7, "3DSTATE_CONSTANT_PS", 7, gen7_3DSTATE_CONSTANT_PS },
+		{ 0x7818, 0xffff, 2, 2, "3DSTATE_SAMPLE_MASK" },
+		{ 0x7819, 0x00ff, 7, 7, "3DSTATE_CONSTANT_HS", 7, gen7_3DSTATE_CONSTANT_HS },
+		{ 0x781a, 0x00ff, 7, 7, "3DSTATE_CONSTANT_DS", 7, gen7_3DSTATE_CONSTANT_DS },
+		{ 0x781b, 0x00ff, 7, 7, "3DSTATE_HS" },
+		{ 0x781c, 0x00ff, 4, 4, "3DSTATE_TE" },
+		{ 0x781d, 0x00ff, 6, 6, "3DSTATE_DS" },
+		{ 0x781e, 0x00ff, 3, 3, "3DSTATE_STREAMOUT" },
+		{ 0x781f, 0x00ff, 14, 14, "3DSTATE_SBE" },
+		{ 0x7820, 0x00ff, 8, 8, "3DSTATE_PS" },
+		{ 0x7821, 0x00ff, 2, 2, NULL, 7, gen7_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP },
+		{ 0x7823, 0x00ff, 2, 2, NULL, 7, gen7_3DSTATE_VIEWPORT_STATE_POINTERS_CC },
+		{ 0x7824, 0x00ff, 2, 2, NULL, 7, gen7_3DSTATE_BLEND_STATE_POINTERS },
+		{ 0x7825, 0x00ff, 2, 2, NULL, 7, gen7_3DSTATE_DEPTH_STENCIL_STATE_POINTERS },
+		{ 0x7826, 0x00ff, 2, 2, "3DSTATE_BINDING_TABLE_POINTERS_VS" },
+		{ 0x7827, 0x00ff, 2, 2, "3DSTATE_BINDING_TABLE_POINTERS_HS" },
+		{ 0x7828, 0x00ff, 2, 2, "3DSTATE_BINDING_TABLE_POINTERS_DS" },
+		{ 0x7829, 0x00ff, 2, 2, "3DSTATE_BINDING_TABLE_POINTERS_GS" },
+		{ 0x782a, 0x00ff, 2, 2, "3DSTATE_BINDING_TABLE_POINTERS_PS" },
+		{ 0x782b, 0x00ff, 2, 2, "3DSTATE_SAMPLER_STATE_POINTERS_VS" },
+		{ 0x782c, 0x00ff, 2, 2, "3DSTATE_SAMPLER_STATE_POINTERS_HS" },
+		{ 0x782d, 0x00ff, 2, 2, "3DSTATE_SAMPLER_STATE_POINTERS_DS" },
+		{ 0x782e, 0x00ff, 2, 2, "3DSTATE_SAMPLER_STATE_POINTERS_GS" },
+		{ 0x782f, 0x00ff, 2, 2, "3DSTATE_SAMPLER_STATE_POINTERS_PS" },
+		{ 0x7830, 0x00ff, 2, 2, NULL, 7, gen7_3DSTATE_URB_VS },
+		{ 0x7831, 0x00ff, 2, 2, NULL, 7, gen7_3DSTATE_URB_HS },
+		{ 0x7832, 0x00ff, 2, 2, NULL, 7, gen7_3DSTATE_URB_DS },
+		{ 0x7833, 0x00ff, 2, 2, NULL, 7, gen7_3DSTATE_URB_GS },
+		{ 0x7900, 0xffff, 4, 4, "3DSTATE_DRAWING_RECTANGLE" },
+		{ 0x7901, 0xffff, 5, 5, "3DSTATE_CONSTANT_COLOR" },
+		{ 0x7905, 0xffff, 5, 7, "3DSTATE_DEPTH_BUFFER" },
+		{ 0x7906, 0xffff, 2, 2, "3DSTATE_POLY_STIPPLE_OFFSET" },
+		{ 0x7907, 0xffff, 33, 33, "3DSTATE_POLY_STIPPLE_PATTERN" },
+		{ 0x7908, 0xffff, 3, 3, "3DSTATE_LINE_STIPPLE" },
+		{ 0x7909, 0xffff, 2, 2, "3DSTATE_GLOBAL_DEPTH_OFFSET_CLAMP" },
+		{ 0x7909, 0xffff, 2, 2, "3DSTATE_CLEAR_PARAMS" },
+		{ 0x790a, 0xffff, 3, 3, "3DSTATE_AA_LINE_PARAMETERS" },
+		{ 0x790b, 0xffff, 4, 4, "3DSTATE_GS_SVB_INDEX" },
+		{ 0x790d, 0xffff, 3, 3, "3DSTATE_MULTISAMPLE", 6 },
+		{ 0x790d, 0xffff, 4, 4, "3DSTATE_MULTISAMPLE", 7 },
+		{ 0x7910, 0x00ff, 2, 2, "3DSTATE_CLEAR_PARAMS" },
+		{ 0x7912, 0x00ff, 2, 2, "3DSTATE_PUSH_CONSTANT_ALLOC_VS" },
+		{ 0x7913, 0x00ff, 2, 2, "3DSTATE_PUSH_CONSTANT_ALLOC_HS" },
+		{ 0x7914, 0x00ff, 2, 2, "3DSTATE_PUSH_CONSTANT_ALLOC_DS" },
+		{ 0x7915, 0x00ff, 2, 2, "3DSTATE_PUSH_CONSTANT_ALLOC_GS" },
+		{ 0x7916, 0x00ff, 2, 2, "3DSTATE_PUSH_CONSTANT_ALLOC_PS" },
+		{ 0x7917, 0x00ff, 2, 2+128*2, "3DSTATE_SO_DECL_LIST" },
+		{ 0x7918, 0x00ff, 4, 4, "3DSTATE_SO_BUFFER" },
+		{ 0x7a00, 0x00ff, 4, 6, "PIPE_CONTROL" },
+		{ 0x7b00, 0x00ff, 7, 7, NULL, 7, gen7_3DPRIMITIVE },
+		{ 0x7b00, 0x00ff, 6, 6, NULL, 0, gen4_3DPRIMITIVE },
+	}, *opcode_3d = NULL;
+
+	opcode = (data[0] & 0xffff0000) >> 16;
+
+	for (i = 0; i < ARRAY_SIZE(opcodes_3d); i++) {
+		if (opcode != opcodes_3d[i].opcode)
+			continue;
+
+		/* If it's marked as not our gen, skip. */
+		if (opcodes_3d[i].gen && opcodes_3d[i].gen != ctx->gen)
+			continue;
+
+		opcode_3d = &opcodes_3d[i];
+		break;
+	}
+
+	if (opcode_3d) {
+		if (opcode_3d->max_len == 1)
+			len = 1;
+		else
+			len = (data[0] & opcode_3d->len_mask) + 2;
+
+		if (len < opcode_3d->min_len ||
+		    len > opcode_3d->max_len) {
+			fprintf(out, "Bad length %d in %s, expected %d-%d\n",
+				len, opcode_3d->name,
+				opcode_3d->min_len, opcode_3d->max_len);
+		}
+	} else {
+		len = (data[0] & 0x0000ffff) + 2;
+	}
+
+	switch (opcode) {
+	case 0x6000:
+		return i965_decode_urb_fence(ctx, len);
+	case 0x6001:
+		instr_out(ctx, 0, "CS_URB_STATE\n");
+		instr_out(ctx, 1,
+			  "entry_size: %d [%d bytes], n_entries: %d\n",
+			  (data[1] >> 4) & 0x1f,
+			  (((data[1] >> 4) & 0x1f) + 1) * 64, data[1] & 0x7);
+		return len;
+	case 0x6002:
+		instr_out(ctx, 0, "CONSTANT_BUFFER: %s\n",
+			  (data[0] >> 8) & 1 ? "valid" : "invalid");
+		instr_out(ctx, 1,
+			  "offset: 0x%08x, length: %d bytes\n", data[1] & ~0x3f,
+			  ((data[1] & 0x3f) + 1) * 64);
+		return len;
+	case 0x6101:
+		i = 0;
+		instr_out(ctx, 0, "STATE_BASE_ADDRESS\n");
+		i++;
+
+		if (IS_GEN6(devid) || IS_GEN7(devid))
+			sba_len = 10;
+		else if (IS_GEN5(devid))
+			sba_len = 8;
+		else
+			sba_len = 6;
+		if (len != sba_len)
+			fprintf(out, "Bad count in STATE_BASE_ADDRESS\n");
+
+		state_base_out(ctx, i++, "general");
+		state_base_out(ctx, i++, "surface");
+		if (IS_GEN6(devid) || IS_GEN7(devid))
+			state_base_out(ctx, i++, "dynamic");
+		state_base_out(ctx, i++, "indirect");
+		if (IS_GEN5(devid) || IS_GEN6(devid) || IS_GEN7(devid))
+			state_base_out(ctx, i++, "instruction");
+
+		state_max_out(ctx, i++, "general");
+		if (IS_GEN6(devid) || IS_GEN7(devid))
+			state_max_out(ctx, i++, "dynamic");
+		state_max_out(ctx, i++, "indirect");
+		if (IS_GEN5(devid) || IS_GEN6(devid) || IS_GEN7(devid))
+			state_max_out(ctx, i++, "instruction");
+
+		return len;
+	case 0x7800:
+		instr_out(ctx, 0, "3DSTATE_PIPELINED_POINTERS\n");
+		instr_out(ctx, 1, "VS state\n");
+		instr_out(ctx, 2, "GS state\n");
+		instr_out(ctx, 3, "Clip state\n");
+		instr_out(ctx, 4, "SF state\n");
+		instr_out(ctx, 5, "WM state\n");
+		instr_out(ctx, 6, "CC state\n");
+		return len;
+	case 0x7801:
+		if (len != 6 && len != 4)
+			fprintf(out,
+				"Bad count in 3DSTATE_BINDING_TABLE_POINTERS\n");
+		if (len == 6) {
+			instr_out(ctx, 0,
+				  "3DSTATE_BINDING_TABLE_POINTERS\n");
+			instr_out(ctx, 1, "VS binding table\n");
+			instr_out(ctx, 2, "GS binding table\n");
+			instr_out(ctx, 3, "Clip binding table\n");
+			instr_out(ctx, 4, "SF binding table\n");
+			instr_out(ctx, 5, "WM binding table\n");
+		} else {
+			instr_out(ctx, 0,
+				  "3DSTATE_BINDING_TABLE_POINTERS: VS mod %d, "
+				  "GS mod %d, PS mod %d\n",
+				  (data[0] & (1 << 8)) != 0,
+				  (data[0] & (1 << 9)) != 0,
+				  (data[0] & (1 << 12)) != 0);
+			instr_out(ctx, 1, "VS binding table\n");
+			instr_out(ctx, 2, "GS binding table\n");
+			instr_out(ctx, 3, "WM binding table\n");
+		}
+
+		return len;
+	case 0x7802:
+		instr_out(ctx, 0,
+			  "3DSTATE_SAMPLER_STATE_POINTERS: VS mod %d, "
+			  "GS mod %d, PS mod %d\n", (data[0] & (1 << 8)) != 0,
+			  (data[0] & (1 << 9)) != 0,
+			  (data[0] & (1 << 12)) != 0);
+		instr_out(ctx, 1, "VS sampler state\n");
+		instr_out(ctx, 2, "GS sampler state\n");
+		instr_out(ctx, 3, "WM sampler state\n");
+		return len;
+	case 0x7805:
+		/* Actually 3DSTATE_DEPTH_BUFFER on gen7. */
+		if (ctx->gen == 7)
+			break;
+
+		instr_out(ctx, 0, "3DSTATE_URB\n");
+		instr_out(ctx, 1,
+			  "VS entries %d, alloc size %d (1024bit row)\n",
+			  data[1] & 0xffff, ((data[1] >> 16) & 0x07f) + 1);
+		instr_out(ctx, 2,
+			  "GS entries %d, alloc size %d (1024bit row)\n",
+			  (data[2] >> 8) & 0x3ff, (data[2] & 7) + 1);
+		return len;
+
+	case 0x7808:
+		if ((len - 1) % 4 != 0)
+			fprintf(out, "Bad count in 3DSTATE_VERTEX_BUFFERS\n");
+		instr_out(ctx, 0, "3DSTATE_VERTEX_BUFFERS\n");
+
+		for (i = 1; i < len;) {
+			int idx, access;
+			if (IS_GEN6(devid)) {
+				idx = 26;
+				access = 20;
+			} else {
+				idx = 27;
+				access = 26;
+			}
+			instr_out(ctx, i,
+				  "buffer %d: %s, pitch %db\n", data[i] >> idx,
+				  data[i] & (1 << access) ? "random" :
+				  "sequential", data[i] & 0x07ff);
+			i++;
+			instr_out(ctx, i++, "buffer address\n");
+			instr_out(ctx, i++, "max index\n");
+			instr_out(ctx, i++, "mbz\n");
+		}
+		return len;
+
+	case 0x7809:
+		if ((len + 1) % 2 != 0)
+			fprintf(out, "Bad count in 3DSTATE_VERTEX_ELEMENTS\n");
+		instr_out(ctx, 0, "3DSTATE_VERTEX_ELEMENTS\n");
+
+		for (i = 1; i < len;) {
+			instr_out(ctx, i,
+				  "buffer %d: %svalid, type 0x%04x, "
+				  "src offset 0x%04x bytes\n",
+				  data[i] >> ((IS_GEN6(devid) || IS_GEN7(devid)) ? 26 : 27),
+				  data[i] & (1 << ((IS_GEN6(devid) || IS_GEN7(devid)) ? 25 : 26)) ?
+				  "" : "in", (data[i] >> 16) & 0x1ff,
+				  data[i] & 0x07ff);
+			i++;
+			instr_out(ctx, i, "(%s, %s, %s, %s), "
+				  "dst offset 0x%02x bytes\n",
+				  get_965_element_component(data[i], 0),
+				  get_965_element_component(data[i], 1),
+				  get_965_element_component(data[i], 2),
+				  get_965_element_component(data[i], 3),
+				  (data[i] & 0xff) * 4);
+			i++;
+		}
+		return len;
+
+	case 0x780d:
+		instr_out(ctx, 0,
+			  "3DSTATE_VIEWPORT_STATE_POINTERS\n");
+		instr_out(ctx, 1, "clip\n");
+		instr_out(ctx, 2, "sf\n");
+		instr_out(ctx, 3, "cc\n");
+		return len;
+
+	case 0x780a:
+		instr_out(ctx, 0, "3DSTATE_INDEX_BUFFER\n");
+		instr_out(ctx, 1, "beginning buffer address\n");
+		instr_out(ctx, 2, "ending buffer address\n");
+		return len;
+
+	case 0x780f:
+		instr_out(ctx, 0, "3DSTATE_SCISSOR_POINTERS\n");
+		instr_out(ctx, 1, "scissor rect offset\n");
+		return len;
+
+	case 0x7810:
+		instr_out(ctx, 0, "3DSTATE_VS\n");
+		instr_out(ctx, 1, "kernel pointer\n");
+		instr_out(ctx, 2,
+			  "SPF=%d, VME=%d, Sampler Count %d, "
+			  "Binding table count %d\n", (data[2] >> 31) & 1,
+			  (data[2] >> 30) & 1, (data[2] >> 27) & 7,
+			  (data[2] >> 18) & 0xff);
+		instr_out(ctx, 3, "scratch offset\n");
+		instr_out(ctx, 4,
+			  "Dispatch GRF start %d, VUE read length %d, "
+			  "VUE read offset %d\n", (data[4] >> 20) & 0x1f,
+			  (data[4] >> 11) & 0x3f, (data[4] >> 4) & 0x3f);
+		instr_out(ctx, 5,
+			  "Max Threads %d, Vertex Cache %sable, "
+			  "VS func %sable\n", ((data[5] >> 25) & 0x7f) + 1,
+			  (data[5] & (1 << 1)) != 0 ? "dis" : "en",
+			  (data[5] & 1) != 0 ? "en" : "dis");
+		return len;
+
+	case 0x7811:
+		instr_out(ctx, 0, "3DSTATE_GS\n");
+		instr_out(ctx, 1, "kernel pointer\n");
+		instr_out(ctx, 2,
+			  "SPF=%d, VME=%d, Sampler Count %d, "
+			  "Binding table count %d\n", (data[2] >> 31) & 1,
+			  (data[2] >> 30) & 1, (data[2] >> 27) & 7,
+			  (data[2] >> 18) & 0xff);
+		instr_out(ctx, 3, "scratch offset\n");
+		instr_out(ctx, 4,
+			  "Dispatch GRF start %d, VUE read length %d, "
+			  "VUE read offset %d\n", (data[4] & 0xf),
+			  (data[4] >> 11) & 0x3f, (data[4] >> 4) & 0x3f);
+		instr_out(ctx, 5,
+			  "Max Threads %d, Rendering %sable\n",
+			  ((data[5] >> 25) & 0x7f) + 1,
+			  (data[5] & (1 << 8)) != 0 ? "en" : "dis");
+		instr_out(ctx, 6,
+			  "Reorder %sable, Discard Adjacency %sable, "
+			  "GS %sable\n",
+			  (data[6] & (1 << 30)) != 0 ? "en" : "dis",
+			  (data[6] & (1 << 29)) != 0 ? "en" : "dis",
+			  (data[6] & (1 << 15)) != 0 ? "en" : "dis");
+		return len;
+
+	case 0x7812:
+		instr_out(ctx, 0, "3DSTATE_CLIP\n");
+		instr_out(ctx, 1,
+			  "UserClip distance cull test mask 0x%x\n",
+			  data[1] & 0xff);
+		instr_out(ctx, 2,
+			  "Clip %sable, API mode %s, Viewport XY test %sable, "
+			  "Viewport Z test %sable, Guardband test %sable, Clip mode %d, "
+			  "Perspective Divide %sable, Non-Perspective Barycentric %sable, "
+			  "Tri Provoking %d, Line Provoking %d, Trifan Provoking %d\n",
+			  (data[2] & (1 << 31)) != 0 ? "en" : "dis",
+			  (data[2] & (1 << 30)) != 0 ? "D3D" : "OGL",
+			  (data[2] & (1 << 28)) != 0 ? "en" : "dis",
+			  (data[2] & (1 << 27)) != 0 ? "en" : "dis",
+			  (data[2] & (1 << 26)) != 0 ? "en" : "dis",
+			  (data[2] >> 13) & 7,
+			  (data[2] & (1 << 9)) != 0 ? "dis" : "en",
+			  (data[2] & (1 << 8)) != 0 ? "en" : "dis",
+			  (data[2] >> 4) & 3, (data[2] >> 2) & 3,
+			  (data[2] & 3));
+		instr_out(ctx, 3,
+			  "Min PointWidth %d, Max PointWidth %d, "
+			  "Force Zero RTAIndex %sable, Max VPIndex %d\n",
+			  (data[3] >> 17) & 0x7ff, (data[3] >> 6) & 0x7ff,
+			  (data[3] & (1 << 5)) != 0 ? "en" : "dis",
+			  (data[3] & 0xf));
+		return len;
+
+	case 0x7813:
+		if (ctx->gen == 7)
+			break;
+
+		instr_out(ctx, 0, "3DSTATE_SF\n");
+		instr_out(ctx, 1,
+			  "Attrib Out %d, Attrib Swizzle %sable, VUE read length %d, "
+			  "VUE read offset %d\n", (data[1] >> 22) & 0x3f,
+			  (data[1] & (1 << 21)) != 0 ? "en" : "dis",
+			  (data[1] >> 11) & 0x1f, (data[1] >> 4) & 0x3f);
+		instr_out(ctx, 2,
+			  "Legacy Global DepthBias %sable, FrontFace fill %d, BF fill %d, "
+			  "VP transform %sable, FrontWinding_%s\n",
+			  (data[2] & (1 << 11)) != 0 ? "en" : "dis",
+			  (data[2] >> 5) & 3, (data[2] >> 3) & 3,
+			  (data[2] & (1 << 1)) != 0 ? "en" : "dis",
+			  (data[2] & 1) != 0 ? "CCW" : "CW");
+		instr_out(ctx, 3,
+			  "AA %sable, CullMode %d, Scissor %sable, Multisample mode %d\n",
+			  (data[3] & (1 << 31)) != 0 ? "en" : "dis",
+			  (data[3] >> 29) & 3,
+			  (data[3] & (1 << 11)) != 0 ? "en" : "dis",
+			  (data[3] >> 8) & 3);
+		instr_out(ctx, 4,
+			  "Last Pixel %sable, SubPixel Precision %d, Use PixelWidth %d\n",
+			  (data[4] & (1 << 31)) != 0 ? "en" : "dis",
+			  (data[4] & (1 << 12)) != 0 ? 4 : 8,
+			  (data[4] & (1 << 11)) != 0);
+		instr_out(ctx, 5,
+			  "Global Depth Offset Constant %f\n",
+			  *(float *)(&data[5]));
+		instr_out(ctx, 6, "Global Depth Offset Scale %f\n",
+			  *(float *)(&data[6]));
+		instr_out(ctx, 7, "Global Depth Offset Clamp %f\n",
+			  *(float *)(&data[7]));
+
+		for (i = 0, j = 0; i < 8; i++, j += 2)
+			instr_out(ctx, i + 8,
+				  "Attrib %d (Override %s%s%s%s, Const Source %d, Swizzle Select %d, "
+				  "Source %d); Attrib %d (Override %s%s%s%s, Const Source %d, Swizzle Select %d, Source %d)\n",
+				  j + 1,
+				  (data[8 + i] & (1 << 31)) != 0 ? "W" : "",
+				  (data[8 + i] & (1 << 30)) != 0 ? "Z" : "",
+				  (data[8 + i] & (1 << 29)) != 0 ? "Y" : "",
+				  (data[8 + i] & (1 << 28)) != 0 ? "X" : "",
+				  (data[8 + i] >> 25) & 3,
+				  (data[8 + i] >> 22) & 3,
+				  (data[8 + i] >> 16) & 0x1f, j,
+				  (data[8 + i] & (1 << 15)) != 0 ? "W" : "",
+				  (data[8 + i] & (1 << 14)) != 0 ? "Z" : "",
+				  (data[8 + i] & (1 << 13)) != 0 ? "Y" : "",
+				  (data[8 + i] & (1 << 12)) != 0 ? "X" : "",
+				  (data[8 + i] >> 9) & 3,
+				  (data[8 + i] >> 6) & 3, (data[8 + i] & 0x1f));
+		instr_out(ctx, 16,
+			  "Point Sprite TexCoord Enable\n");
+		instr_out(ctx, 17, "Const Interp Enable\n");
+		instr_out(ctx, 18,
+			  "Attrib 7-0 WrapShortest Enable\n");
+		instr_out(ctx, 19,
+			  "Attrib 15-8 WrapShortest Enable\n");
+
+		return len;
+
+	case 0x7900:
+		instr_out(ctx, 0, "3DSTATE_DRAWING_RECTANGLE\n");
+		instr_out(ctx, 1, "top left: %d,%d\n",
+			  data[1] & 0xffff, (data[1] >> 16) & 0xffff);
+		instr_out(ctx, 2, "bottom right: %d,%d\n",
+			  data[2] & 0xffff, (data[2] >> 16) & 0xffff);
+		instr_out(ctx, 3, "origin: %d,%d\n",
+			  (int)data[3] & 0xffff, ((int)data[3] >> 16) & 0xffff);
+
+		return len;
+
+	case 0x7905:
+		instr_out(ctx, 0, "3DSTATE_DEPTH_BUFFER\n");
+		if (IS_GEN5(devid) || IS_GEN6(devid))
+			instr_out(ctx, 1,
+				  "%s, %s, pitch = %d bytes, %stiled, HiZ %d, Separate Stencil %d\n",
+				  get_965_surfacetype(data[1] >> 29),
+				  get_965_depthformat((data[1] >> 18) & 0x7),
+				  (data[1] & 0x0001ffff) + 1,
+				  data[1] & (1 << 27) ? "" : "not ",
+				  (data[1] & (1 << 22)) != 0,
+				  (data[1] & (1 << 21)) != 0);
+		else
+			instr_out(ctx, 1,
+				  "%s, %s, pitch = %d bytes, %stiled\n",
+				  get_965_surfacetype(data[1] >> 29),
+				  get_965_depthformat((data[1] >> 18) & 0x7),
+				  (data[1] & 0x0001ffff) + 1,
+				  data[1] & (1 << 27) ? "" : "not ");
+		instr_out(ctx, 2, "depth offset\n");
+		instr_out(ctx, 3, "%dx%d\n",
+			  ((data[3] & 0x0007ffc0) >> 6) + 1,
+			  ((data[3] & 0xfff80000) >> 19) + 1);
+		instr_out(ctx, 4, "volume depth\n");
+		if (len >= 6)
+			instr_out(ctx, 5, "\n");
+		if (len >= 7) {
+			if (IS_GEN6(devid))
+				instr_out(ctx, 6, "\n");
+			else
+				instr_out(ctx, 6,
+					  "render target view extent\n");
+		}
+
+		return len;
+
+	case 0x7a00:
+		if (IS_GEN6(devid) || IS_GEN7(devid)) {
+			unsigned int i;
+			if (len != 4 && len != 5)
+				fprintf(out, "Bad count in PIPE_CONTROL\n");
+
+			switch ((data[1] >> 14) & 0x3) {
+			case 0:
+				desc1 = "no write";
+				break;
+			case 1:
+				desc1 = "qword write";
+				break;
+			case 2:
+				desc1 = "PS_DEPTH_COUNT write";
+				break;
+			case 3:
+				desc1 = "TIMESTAMP write";
+				break;
+			}
+			instr_out(ctx, 0, "PIPE_CONTROL\n");
+			instr_out(ctx, 1,
+				  "%s, %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+				  desc1,
+				  data[1] & (1 << 20) ? "cs stall, " : "",
+				  data[1] & (1 << 19) ?
+				  "global snapshot count reset, " : "",
+				  data[1] & (1 << 18) ? "tlb invalidate, " : "",
+				  data[1] & (1 << 17) ? "gfdt flush, " : "",
+				  data[1] & (1 << 16) ? "media state clear, " :
+				  "",
+				  data[1] & (1 << 13) ? "depth stall, " : "",
+				  data[1] & (1 << 12) ?
+				  "render target cache flush, " : "",
+				  data[1] & (1 << 11) ?
+				  "instruction cache invalidate, " : "",
+				  data[1] & (1 << 10) ?
+				  "texture cache invalidate, " : "",
+				  data[1] & (1 << 9) ?
+				  "indirect state invalidate, " : "",
+				  data[1] & (1 << 8) ? "notify irq, " : "",
+				  data[1] & (1 << 7) ? "PIPE_CONTROL flush, " :
+				  "",
+				  data[1] & (1 << 6) ? "protect mem app_id, " :
+				  "", data[1] & (1 << 5) ? "DC flush, " : "",
+				  data[1] & (1 << 4) ? "vf fetch invalidate, " :
+				  "",
+				  data[1] & (1 << 3) ?
+				  "constant cache invalidate, " : "",
+				  data[1] & (1 << 2) ?
+				  "state cache invalidate, " : "",
+				  data[1] & (1 << 1) ? "stall at scoreboard, " :
+				  "",
+				  data[1] & (1 << 0) ? "depth cache flush, " :
+				  "");
+			if (len == 5) {
+				instr_out(ctx, 2,
+					  "destination address\n");
+				instr_out(ctx, 3,
+					  "immediate dword low\n");
+				instr_out(ctx, 4,
+					  "immediate dword high\n");
+			} else {
+				for (i = 2; i < len; i++) {
+					instr_out(ctx, i, "\n");
+				}
+			}
+			return len;
+		} else {
+			if (len != 4)
+				fprintf(out, "Bad count in PIPE_CONTROL\n");
+
+			switch ((data[0] >> 14) & 0x3) {
+			case 0:
+				desc1 = "no write";
+				break;
+			case 1:
+				desc1 = "qword write";
+				break;
+			case 2:
+				desc1 = "PS_DEPTH_COUNT write";
+				break;
+			case 3:
+				desc1 = "TIMESTAMP write";
+				break;
+			}
+			instr_out(ctx, 0,
+				  "PIPE_CONTROL: %s, %sdepth stall, %sRC write flush, "
+				  "%sinst flush\n",
+				  desc1,
+				  data[0] & (1 << 13) ? "" : "no ",
+				  data[0] & (1 << 12) ? "" : "no ",
+				  data[0] & (1 << 11) ? "" : "no ");
+			instr_out(ctx, 1, "destination address\n");
+			instr_out(ctx, 2, "immediate dword low\n");
+			instr_out(ctx, 3, "immediate dword high\n");
+			return len;
+		}
+	}
+
+	if (opcode_3d) {
+		if (opcode_3d->func) {
+			return opcode_3d->func(ctx);
+		} else {
+			unsigned int i;
+
+			instr_out(ctx, 0, "%s\n", opcode_3d->name);
+
+			for (i = 1; i < len; i++) {
+				instr_out(ctx, i, "dword %d\n", i);
+			}
+			return len;
+		}
+	}
+
+	instr_out(ctx, 0, "3D UNKNOWN: 3d_965 opcode = 0x%x\n",
+		  opcode);
+	return 1;
+}
+
+static int
+decode_3d_i830(struct drm_intel_decode *ctx)
+{
+	unsigned int idx;
+	uint32_t opcode;
+	uint32_t *data = ctx->data;
+
+	struct {
+		uint32_t opcode;
+		unsigned int min_len;
+		unsigned int max_len;
+		const char *name;
+	} opcodes_3d[] = {
+		{ 0x02, 1, 1, "3DSTATE_MODES_3" },
+		{ 0x03, 1, 1, "3DSTATE_ENABLES_1" },
+		{ 0x04, 1, 1, "3DSTATE_ENABLES_2" },
+		{ 0x05, 1, 1, "3DSTATE_VFT0" },
+		{ 0x06, 1, 1, "3DSTATE_AA" },
+		{ 0x07, 1, 1, "3DSTATE_RASTERIZATION_RULES" },
+		{ 0x08, 1, 1, "3DSTATE_MODES_1" },
+		{ 0x09, 1, 1, "3DSTATE_STENCIL_TEST" },
+		{ 0x0a, 1, 1, "3DSTATE_VFT1" },
+		{ 0x0b, 1, 1, "3DSTATE_INDPT_ALPHA_BLEND" },
+		{ 0x0c, 1, 1, "3DSTATE_MODES_5" },
+		{ 0x0d, 1, 1, "3DSTATE_MAP_BLEND_OP" },
+		{ 0x0e, 1, 1, "3DSTATE_MAP_BLEND_ARG" },
+		{ 0x0f, 1, 1, "3DSTATE_MODES_2" },
+		{ 0x15, 1, 1, "3DSTATE_FOG_COLOR" },
+		{ 0x16, 1, 1, "3DSTATE_MODES_4"},
+	}, *opcode_3d;
+
+	opcode = (data[0] & 0x1f000000) >> 24;
+
+	switch (opcode) {
+	case 0x1f:
+		return decode_3d_primitive(ctx);
+	case 0x1d:
+		return decode_3d_1d(ctx);
+	case 0x1c:
+		return decode_3d_1c(ctx);
+	}
+
+	for (idx = 0; idx < ARRAY_SIZE(opcodes_3d); idx++) {
+		opcode_3d = &opcodes_3d[idx];
+		if (opcode == opcode_3d->opcode) {
+			unsigned int len = 1, i;
+
+			instr_out(ctx, 0, "%s\n", opcode_3d->name);
+			if (opcode_3d->max_len > 1) {
+				len = (data[0] & 0xff) + 2;
+				if (len < opcode_3d->min_len ||
+				    len > opcode_3d->max_len) {
+					fprintf(out, "Bad count in %s\n",
+						opcode_3d->name);
+				}
+			}
+
+			for (i = 1; i < len; i++) {
+				instr_out(ctx, i, "dword %d\n", i);
+			}
+			return len;
+		}
+	}
+
+	instr_out(ctx, 0, "3D UNKNOWN: 3d_i830 opcode = 0x%x\n",
+		  opcode);
+	return 1;
+}
+
+drm_public struct drm_intel_decode *
+drm_intel_decode_context_alloc(uint32_t devid)
+{
+	struct drm_intel_decode *ctx;
+
+	ctx = calloc(1, sizeof(struct drm_intel_decode));
+	if (!ctx)
+		return NULL;
+
+	ctx->devid = devid;
+	ctx->out = stdout;
+
+	if (IS_GEN9(devid))
+		ctx->gen = 9;
+	else if (IS_GEN8(devid))
+		ctx->gen = 8;
+	else if (IS_GEN7(devid))
+		ctx->gen = 7;
+	else if (IS_GEN6(devid))
+		ctx->gen = 6;
+	else if (IS_GEN5(devid))
+		ctx->gen = 5;
+	else if (IS_GEN4(devid))
+		ctx->gen = 4;
+	else if (IS_9XX(devid))
+		ctx->gen = 3;
+	else {
+		assert(IS_GEN2(devid));
+		ctx->gen = 2;
+	}
+
+	return ctx;
+}
+
+drm_public void
+drm_intel_decode_context_free(struct drm_intel_decode *ctx)
+{
+	free(ctx);
+}
+
+drm_public void
+drm_intel_decode_set_dump_past_end(struct drm_intel_decode *ctx,
+				   int dump_past_end)
+{
+	ctx->dump_past_end = !!dump_past_end;
+}
+
+drm_public void
+drm_intel_decode_set_batch_pointer(struct drm_intel_decode *ctx,
+				   void *data, uint32_t hw_offset, int count)
+{
+	ctx->base_data = data;
+	ctx->base_hw_offset = hw_offset;
+	ctx->base_count = count;
+}
+
+drm_public void
+drm_intel_decode_set_head_tail(struct drm_intel_decode *ctx,
+			       uint32_t head, uint32_t tail)
+{
+	ctx->head = head;
+	ctx->tail = tail;
+}
+
+drm_public void
+drm_intel_decode_set_output_file(struct drm_intel_decode *ctx,
+				 FILE *out)
+{
+	ctx->out = out;
+}
+
+/**
+ * Decodes the batch buffer set on the context, writing the output to
+ * the context's output file (stdout by default).
+ *
+ * The buffer contents, DWORD count, and hardware offset must first be
+ * supplied via drm_intel_decode_set_batch_pointer().
+ */
+drm_public void
+drm_intel_decode(struct drm_intel_decode *ctx)
+{
+	int ret;
+	unsigned int index = 0;
+	uint32_t devid;
+	int size;
+	void *temp;
+
+	if (!ctx)
+		return;
+
+	size = ctx->base_count * 4;
+
+	/* Put a scratch page full of obviously undefined data after
+	 * the batchbuffer.  This lets us avoid a bunch of length
+	 * checking in statically sized packets.
+	 */
+	temp = malloc(size + 4096);
+	if (!temp)
+		return;
+	memcpy(temp, ctx->base_data, size);
+	memset((char *)temp + size, 0xd0, 4096);
+	ctx->data = temp;
+
+	ctx->hw_offset = ctx->base_hw_offset;
+	ctx->count = ctx->base_count;
+
+	devid = ctx->devid;
+	head_offset = ctx->head;
+	tail_offset = ctx->tail;
+	out = ctx->out;
+
+	saved_s2_set = 0;
+	saved_s4_set = 1;
+
+	while (ctx->count > 0) {
+		index = 0;
+
+		switch ((ctx->data[index] & 0xe0000000) >> 29) {
+		case 0x0:
+			ret = decode_mi(ctx);
+
+			/* If MI_BATCHBUFFER_END was hit, dump the
+			 * rest of the buffer undecoded in case we
+			 * someday want it for debugging, but don't
+			 * decode it since that would just confuse
+			 * things in the common case.
+			 */
+			if (ret == -1) {
+				if (ctx->dump_past_end) {
+					index++;
+				} else {
+					for (index = index + 1; index < ctx->count;
+					     index++) {
+						instr_out(ctx, index, "\n");
+					}
+				}
+			} else
+				index += ret;
+			break;
+		case 0x2:
+			index += decode_2d(ctx);
+			break;
+		case 0x3:
+			if (IS_9XX(devid) && !IS_GEN3(devid)) {
+				index += decode_3d_965(ctx);
+			} else if (IS_GEN3(devid)) {
+				index += decode_3d(ctx);
+			} else {
+				index += decode_3d_i830(ctx);
+			}
+			break;
+		default:
+			instr_out(ctx, index, "UNKNOWN\n");
+			index++;
+			break;
+		}
+		fflush(out);
+
+		if (ctx->count < index)
+			break;
+
+		ctx->count -= index;
+		ctx->data += index;
+		ctx->hw_offset += 4 * index;
+	}
+
+	free(temp);
+}
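+
+/* Minimal usage sketch for this decoder API ("batch", "hw_offset" and
+ * "n_dwords" are hypothetical caller-side names):
+ *
+ *	struct drm_intel_decode *ctx = drm_intel_decode_context_alloc(devid);
+ *	if (ctx) {
+ *		drm_intel_decode_set_batch_pointer(ctx, batch, hw_offset,
+ *						   n_dwords);
+ *		drm_intel_decode(ctx);
+ *		drm_intel_decode_context_free(ctx);
+ *	}
+ */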
diff --git a/icd/intel/kmd/libdrm/libdrm.h b/icd/intel/kmd/libdrm/libdrm.h
new file mode 100644
index 0000000..6c3cd59
--- /dev/null
+++ b/icd/intel/kmd/libdrm/libdrm.h
@@ -0,0 +1,89 @@
+/*
+ * Copyright © 2014 NVIDIA Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef LIBDRM_LIBDRM_H
+#define LIBDRM_LIBDRM_H
+
+#if defined(HAVE_VISIBILITY)
+#  define drm_private __attribute__((visibility("hidden")))
+#  define drm_public __attribute__((visibility("default")))
+#else
+#  define drm_private
+#  define drm_public
+#endif
+
+
+/**
+ * Static (compile-time) assertion.
+ * Basically, use COND to dimension an array.  If COND is false/zero the
+ * array size will be -1 and we'll get a compilation error.
+ */
+#define STATIC_ASSERT(COND) \
+   do { \
+      (void) sizeof(char [1 - 2*!(COND)]); \
+   } while (0)
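+
+/* Usage sketch: the macro expands to a statement, so it must appear
+ * in function scope.  For example,
+ *
+ *	STATIC_ASSERT(sizeof(uint32_t) == 4);
+ *
+ * compiles cleanly, while an assertion on a false condition produces
+ * a negative-array-size compile error.
+ */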
+
+
+#include <sys/mman.h>
+
+#if defined(ANDROID) && !defined(__LP64__)
+#include <errno.h> /* for EINVAL */
+
+extern void *__mmap2(void *, size_t, int, int, int, size_t);
+
+static inline void *drm_mmap(void *addr, size_t length, int prot, int flags,
+                             int fd, loff_t offset)
+{
+   /* offset must be aligned to 4096 (not necessarily the page size) */
+   if (offset & 4095) {
+      errno = EINVAL;
+      return MAP_FAILED;
+   }
+
+   return __mmap2(addr, length, prot, flags, fd, (size_t) (offset >> 12));
+}
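+
+/* __mmap2() takes its offset argument in 4096-byte units rather than
+ * bytes, hence the alignment check and the right shift by 12 above.
+ */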
+
+#  define drm_munmap(addr, length) \
+              munmap(addr, length)
+
+
+#else
+
+/* assume large file support exists */
+#  define drm_mmap(addr, length, prot, flags, fd, offset) \
+              mmap(addr, length, prot, flags, fd, offset)
+
+
+static inline int drm_munmap(void *addr, size_t length)
+{
+   /* Copied from configure code generated by AC_SYS_LARGEFILE */
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + \
+                     (((off_t) 1 << 31) << 31))
+   STATIC_ASSERT(LARGE_OFF_T % 2147483629 == 721 &&
+                 LARGE_OFF_T % 2147483647 == 1);
+#undef LARGE_OFF_T
+
+   return munmap(addr, length);
+}
+#endif
+
+#endif
diff --git a/icd/intel/kmd/libdrm/libdrm_lists.h b/icd/intel/kmd/libdrm/libdrm_lists.h
new file mode 100644
index 0000000..8926d8d
--- /dev/null
+++ b/icd/intel/kmd/libdrm/libdrm_lists.h
@@ -0,0 +1,118 @@
+/**************************************************************************
+ *
+ * Copyright 2006 Tungsten Graphics, Inc., Bismarck, ND. USA.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ */
+
+/*
+ * List macros heavily inspired by the Linux kernel list handling,
+ * including the DRMLISTFOREACH* iteration macros.
+ */
+
+#include <stddef.h>
+
+typedef struct _drmMMListHead
+{
+    struct _drmMMListHead *prev;
+    struct _drmMMListHead *next;
+} drmMMListHead;
+
+#define DRMINITLISTHEAD(__item)		       \
+  do{					       \
+    (__item)->prev = (__item);		       \
+    (__item)->next = (__item);		       \
+  } while (0)
+
+#define DRMLISTADD(__item, __list)		\
+  do {						\
+    (__item)->prev = (__list);			\
+    (__item)->next = (__list)->next;		\
+    (__list)->next->prev = (__item);		\
+    (__list)->next = (__item);			\
+  } while (0)
+
+#define DRMLISTADDTAIL(__item, __list)		\
+  do {						\
+    (__item)->next = (__list);			\
+    (__item)->prev = (__list)->prev;		\
+    (__list)->prev->next = (__item);		\
+    (__list)->prev = (__item);			\
+  } while(0)
+
+#define DRMLISTDEL(__item)			\
+  do {						\
+    (__item)->prev->next = (__item)->next;	\
+    (__item)->next->prev = (__item)->prev;	\
+  } while(0)
+
+#define DRMLISTDELINIT(__item)			\
+  do {						\
+    (__item)->prev->next = (__item)->next;	\
+    (__item)->next->prev = (__item)->prev;	\
+    (__item)->next = (__item);			\
+    (__item)->prev = (__item);			\
+  } while(0)
+
+#define DRMLISTENTRY(__type, __item, __field)   \
+    ((__type *)(((char *) (__item)) - offsetof(__type, __field)))
+
+#define DRMLISTEMPTY(__item) ((__item)->next == (__item))
+
+#define DRMLISTSINGLE(__list) \
+	(!DRMLISTEMPTY(__list) && ((__list)->next == (__list)->prev))
+
+#define DRMLISTFOREACH(__item, __list)					\
+	for ((__item) = (__list)->next;					\
+	     (__item) != (__list); (__item) = (__item)->next)
+
+#define DRMLISTFOREACHSAFE(__item, __temp, __list)			\
+	for ((__item) = (__list)->next, (__temp) = (__item)->next;	\
+	     (__item) != (__list);					\
+	     (__item) = (__temp), (__temp) = (__item)->next)
+
+#define DRMLISTFOREACHSAFEREVERSE(__item, __temp, __list)		\
+	for ((__item) = (__list)->prev, (__temp) = (__item)->prev;	\
+	     (__item) != (__list);					\
+	     (__item) = (__temp), (__temp) = (__item)->prev)
+
+#define DRMLISTFOREACHENTRY(__item, __list, __head)                            \
+	for ((__item) = DRMLISTENTRY(typeof(*__item), (__list)->next, __head); \
+	     &(__item)->__head != (__list);                                    \
+	     (__item) = DRMLISTENTRY(typeof(*__item),                          \
+				     (__item)->__head.next, __head))
+
+#define DRMLISTFOREACHENTRYSAFE(__item, __temp, __list, __head)                \
+	for ((__item) = DRMLISTENTRY(typeof(*__item), (__list)->next, __head), \
+	     (__temp) = DRMLISTENTRY(typeof(*__item),                          \
+				     (__item)->__head.next, __head);           \
+	     &(__item)->__head != (__list);                                    \
+	     (__item) = (__temp),                                              \
+	     (__temp) = DRMLISTENTRY(typeof(*__item),                          \
+				     (__temp)->__head.next, __head))
+
+#define DRMLISTJOIN(__list, __join) if (!DRMLISTEMPTY(__list)) {	\
+	(__list)->next->prev = (__join);				\
+	(__list)->prev->next = (__join)->next;				\
+	(__join)->next->prev = (__list)->prev;				\
+	(__join)->next = (__list)->next;				\
+}
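+
+/* Usage sketch (with a hypothetical struct item embedding a node):
+ *
+ *	struct item {
+ *		int value;
+ *		drmMMListHead link;
+ *	};
+ *
+ *	drmMMListHead head;
+ *	struct item *it;
+ *
+ *	DRMINITLISTHEAD(&head);
+ *	DRMLISTADDTAIL(&some_item->link, &head);
+ *	DRMLISTFOREACHENTRY(it, &head, link)
+ *		printf("%d\n", it->value);
+ */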
diff --git a/icd/intel/kmd/libdrm/xf86atomic.h b/icd/intel/kmd/libdrm/xf86atomic.h
new file mode 100644
index 0000000..194554c
--- /dev/null
+++ b/icd/intel/kmd/libdrm/xf86atomic.h
@@ -0,0 +1,117 @@
+/*
+ * Copyright © 2009 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chris Wilson <chris@chris-wilson.co.uk>
+ *
+ */
+
+/**
+ * @file xf86atomic.h
+ *
+ * Private definitions for atomic operations
+ */
+
+#ifndef LIBDRM_ATOMICS_H
+#define LIBDRM_ATOMICS_H
+
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
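+/* Three interchangeable backends implement the same atomic_t API
+ * below: GCC __sync builtins, libatomic_ops, and the Solaris/NetBSD
+ * <sys/atomic.h> primitives.  Exactly one of them must end up
+ * defining HAS_ATOMIC_OPS.
+ */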
+#if HAVE_LIBDRM_ATOMIC_PRIMITIVES
+
+#define HAS_ATOMIC_OPS 1
+
+typedef struct {
+	int atomic;
+} atomic_t;
+
+# define atomic_read(x) ((x)->atomic)
+# define atomic_set(x, val) ((x)->atomic = (val))
+# define atomic_inc(x) ((void) __sync_fetch_and_add (&(x)->atomic, 1))
+# define atomic_inc_return(x) (__sync_add_and_fetch (&(x)->atomic, 1))
+# define atomic_dec_and_test(x) (__sync_add_and_fetch (&(x)->atomic, -1) == 0)
+# define atomic_add(x, v) ((void) __sync_add_and_fetch(&(x)->atomic, (v)))
+# define atomic_dec(x, v) ((void) __sync_sub_and_fetch(&(x)->atomic, (v)))
+# define atomic_cmpxchg(x, oldv, newv) __sync_val_compare_and_swap (&(x)->atomic, oldv, newv)
+
+#endif
+
+#if HAVE_LIB_ATOMIC_OPS
+#include <atomic_ops.h>
+
+#define HAS_ATOMIC_OPS 1
+
+typedef struct {
+	AO_t atomic;
+} atomic_t;
+
+# define atomic_read(x) AO_load_full(&(x)->atomic)
+# define atomic_set(x, val) AO_store_full(&(x)->atomic, (val))
+# define atomic_inc(x) ((void) AO_fetch_and_add1_full(&(x)->atomic))
+# define atomic_inc_return(x) (AO_fetch_and_add1_full(&(x)->atomic) + 1)
+# define atomic_add(x, v) ((void) AO_fetch_and_add_full(&(x)->atomic, (v)))
+# define atomic_dec(x, v) ((void) AO_fetch_and_add_full(&(x)->atomic, -(v)))
+# define atomic_dec_and_test(x) (AO_fetch_and_sub1_full(&(x)->atomic) == 1)
+# define atomic_cmpxchg(x, oldv, newv) AO_compare_and_swap_full(&(x)->atomic, oldv, newv)
+
+#endif
+
+#if (defined(__sun) || defined(__NetBSD__)) && !defined(HAS_ATOMIC_OPS)  /* Solaris & OpenSolaris & NetBSD */
+
+#include <sys/atomic.h>
+#define HAS_ATOMIC_OPS 1
+
+#if defined(__NetBSD__)
+#define LIBDRM_ATOMIC_TYPE int
+#else
+#define LIBDRM_ATOMIC_TYPE uint_t
+#endif
+
+typedef struct { LIBDRM_ATOMIC_TYPE atomic; } atomic_t;
+
+# define atomic_read(x) (int) ((x)->atomic)
+# define atomic_set(x, val) ((x)->atomic = (LIBDRM_ATOMIC_TYPE)(val))
+# define atomic_inc(x) (atomic_inc_uint (&(x)->atomic))
+# define atomic_inc_return(x) (atomic_inc_uint_nv(&(x)->atomic))
+# define atomic_dec_and_test(x) (atomic_dec_uint_nv(&(x)->atomic) == 0)
+# define atomic_add(x, v) (atomic_add_int(&(x)->atomic, (v)))
+# define atomic_dec(x, v) (atomic_add_int(&(x)->atomic, -(v)))
+# define atomic_cmpxchg(x, oldv, newv) atomic_cas_uint (&(x)->atomic, oldv, newv)
+
+#endif
+
+#if ! HAS_ATOMIC_OPS
+#error libdrm requires atomic operations, please define them for your CPU/compiler.
+#endif
+
+static inline int atomic_add_unless(atomic_t *v, int add, int unless)
+{
+	int c, old;
+	c = atomic_read(v);
+	while (c != unless && (old = atomic_cmpxchg(v, c, c + add)) != c)
+		c = old;
+	return c == unless;
+}
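+
+/*
+ * Illustrative sketch (not part of the original patch): a typical use of
+ * these primitives is reference counting.  Note that atomic_add_unless()
+ * assumes atomic_cmpxchg() returns the *previous* value, as the __sync and
+ * Solaris implementations above do.
+ *
+ *	typedef struct { atomic_t refcount; } object_t;
+ *
+ *	static void object_ref(object_t *obj)
+ *	{
+ *		atomic_inc(&obj->refcount);
+ *	}
+ *
+ *	static void object_unref(object_t *obj)
+ *	{
+ *		if (atomic_dec_and_test(&obj->refcount))
+ *			free(obj);
+ *	}
+ */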
+
+#endif
diff --git a/icd/intel/kmd/libdrm/xf86drm.c b/icd/intel/kmd/libdrm/xf86drm.c
new file mode 100644
index 0000000..ffc53b8
--- /dev/null
+++ b/icd/intel/kmd/libdrm/xf86drm.c
@@ -0,0 +1,2819 @@
+/**
+ * \file xf86drm.c 
+ * User-level interface to DRM device
+ *
+ * \author Rickard E. (Rik) Faith <faith@valinux.com>
+ * \author Kevin E. Martin <martin@valinux.com>
+ */
+
+/*
+ * Copyright 1999 Precision Insight, Inc., Cedar Park, Texas.
+ * Copyright 2000 VA Linux Systems, Inc., Sunnyvale, California.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <strings.h>
+#include <ctype.h>
+#include <dirent.h>
+#include <stddef.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <signal.h>
+#include <time.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#define stat_t struct stat
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <stdarg.h>
+#ifdef HAVE_SYS_MKDEV_H
+# include <sys/mkdev.h> /* defines major(), minor(), and makedev() on Solaris */
+#endif
+
+/* Not all systems have MAP_FAILED defined */
+#ifndef MAP_FAILED
+#define MAP_FAILED ((void *)-1)
+#endif
+
+#include "xf86drm.h"
+#include "libdrm.h"
+
+#if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || defined(__DragonFly__)
+#define DRM_MAJOR 145
+#endif
+
+#ifdef __NetBSD__
+#define DRM_MAJOR 34
+#endif
+
+#ifdef __OpenBSD__
+#define DRM_MAJOR 81
+#endif
+
+#ifndef DRM_MAJOR
+#define DRM_MAJOR 226		/* Linux */
+#endif
+
+/*
+ * This definition needs to be changed on some systems if dev_t is a structure.
+ * If there is a header file we can get it from, that would be best.
+ */
+#ifndef makedev
+#define makedev(x,y)    ((dev_t)(((x) << 8) | (y)))
+#endif
+
+#define DRM_MSG_VERBOSITY 3
+
+#define memclear(s) memset(&s, 0, sizeof(s))
+
+static drmServerInfoPtr drm_server_info;
+
+void drmSetServerInfo(drmServerInfoPtr info)
+{
+    drm_server_info = info;
+}
+
+/**
+ * Output a message to stderr.
+ *
+ * \param format printf() like format string.
+ *
+ * \internal
+ * This function is a wrapper around vfprintf().
+ */
+
+static int DRM_PRINTFLIKE(1, 0)
+drmDebugPrint(const char *format, va_list ap)
+{
+    return vfprintf(stderr, format, ap);
+}
+
+void
+drmMsg(const char *format, ...)
+{
+    va_list	ap;
+    const char *env;
+    if (((env = getenv("LIBGL_DEBUG")) && strstr(env, "verbose")) || drm_server_info)
+    {
+	va_start(ap, format);
+	if (drm_server_info) {
+	  drm_server_info->debug_print(format,ap);
+	} else {
+	  drmDebugPrint(format, ap);
+	}
+	va_end(ap);
+    }
+}
+
+static void *drmHashTable = NULL; /* Context switch callbacks */
+
+void *drmGetHashTable(void)
+{
+    return drmHashTable;
+}
+
+void *drmMalloc(int size)
+{
+    void *pt;
+    if ((pt = malloc(size)))
+	memset(pt, 0, size);
+    return pt;
+}
+
+void drmFree(void *pt)
+{
+    if (pt)
+	free(pt);
+}
+
+/**
+ * Call ioctl, restarting if it is interrupted
+ */
+int
+drmIoctl(int fd, unsigned long request, void *arg)
+{
+    int	ret;
+
+    do {
+	ret = ioctl(fd, request, arg);
+    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
+    return ret;
+}
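+
+/*
+ * Usage sketch (illustrative, not part of the original patch): callers rely
+ * on drmIoctl() to hide EINTR/EAGAIN restarts, e.g.:
+ *
+ *	drm_auth_t auth;
+ *	memclear(auth);
+ *	if (drmIoctl(fd, DRM_IOCTL_GET_MAGIC, &auth) != 0)
+ *		return -errno;
+ */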
+
+static unsigned long drmGetKeyFromFd(int fd)
+{
+    stat_t     st;
+
+    st.st_rdev = 0;
+    fstat(fd, &st);
+    return st.st_rdev;
+}
+
+drmHashEntry *drmGetEntry(int fd)
+{
+    unsigned long key = drmGetKeyFromFd(fd);
+    void          *value;
+    drmHashEntry  *entry;
+
+    if (!drmHashTable)
+	drmHashTable = drmHashCreate();
+
+    if (drmHashLookup(drmHashTable, key, &value)) {
+	entry           = drmMalloc(sizeof(*entry));
+	entry->fd       = fd;
+	entry->f        = NULL;
+	entry->tagTable = drmHashCreate();
+	drmHashInsert(drmHashTable, key, entry);
+    } else {
+	entry = value;
+    }
+    return entry;
+}
+
+/**
+ * Compare two busid strings
+ *
+ * \param id1 first bus ID.
+ * \param id2 second bus ID.
+ * \param pci_domain_ok whether the kernel interface supports PCI domains.
+ *
+ * \return 1 if matched, 0 otherwise.
+ *
+ * \internal
+ * This function compares two bus ID strings.  It understands the older
+ * PCI:b:d:f format and the newer pci:oooo:bb:dd.f format.  In the format, o is
+ * domain, b is bus, d is device, f is function.
+ */
+static int drmMatchBusID(const char *id1, const char *id2, int pci_domain_ok)
+{
+    /* First, check if the IDs are exactly the same */
+    if (strcasecmp(id1, id2) == 0)
+	return 1;
+
+    /* Try to match old/new-style PCI bus IDs. */
+    if (strncasecmp(id1, "pci", 3) == 0) {
+	unsigned int o1, b1, d1, f1;
+	unsigned int o2, b2, d2, f2;
+	int ret;
+
+	ret = sscanf(id1, "pci:%04x:%02x:%02x.%u", &o1, &b1, &d1, &f1);
+	if (ret != 4) {
+	    o1 = 0;
+	    ret = sscanf(id1, "PCI:%u:%u:%u", &b1, &d1, &f1);
+	    if (ret != 3)
+		return 0;
+	}
+
+	ret = sscanf(id2, "pci:%04x:%02x:%02x.%u", &o2, &b2, &d2, &f2);
+	if (ret != 4) {
+	    o2 = 0;
+	    ret = sscanf(id2, "PCI:%u:%u:%u", &b2, &d2, &f2);
+	    if (ret != 3)
+		return 0;
+	}
+
+	/* If domains aren't properly supported by the kernel interface,
+	 * just ignore them, which sucks less than picking a totally random
+	 * card with "open by name"
+	 */
+	if (!pci_domain_ok)
+		o1 = o2 = 0;
+
+	if ((o1 != o2) || (b1 != b2) || (d1 != d2) || (f1 != f2))
+	    return 0;
+	else
+	    return 1;
+    }
+    return 0;
+}
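+
+/*
+ * Example (illustrative, not part of the original patch):
+ * drmMatchBusID("PCI:1:0:0", "pci:0000:01:00.0", 1) returns 1, since the
+ * old-style ID parses with a domain of 0 and both then refer to domain 0,
+ * bus 1, device 0, function 0.
+ */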
+
+/**
+ * Handles error checking for chown call.
+ *
+ * \param path path to the file.
+ * \param owner user ID of the new owner.
+ * \param group group ID of the new group.
+ *
+ * \return zero if success or -1 if failure.
+ *
+ * \internal
+ * Checks for failure.  If the failure was caused by a signal, chown() is
+ * called again.  On any other failure, an error message is output via
+ * drmMsg().
+ */
+#if !defined(UDEV)
+static int chown_check_return(const char *path, uid_t owner, gid_t group)
+{
+	int rv;
+
+	do {
+		rv = chown(path, owner, group);
+	} while (rv != 0 && errno == EINTR);
+
+	if (rv == 0)
+		return 0;
+
+	drmMsg("Failed to change owner or group for file %s! %d: %s\n",
+			path, errno, strerror(errno));
+	return -1;
+}
+#endif
+
+/**
+ * Open the DRM device, creating it if necessary.
+ *
+ * \param dev major and minor numbers of the device.
+ * \param minor minor number of the device.
+ * 
+ * \return a file descriptor on success, or a negative value on error.
+ *
+ * \internal
+ * Assembles the device name from \p minor and opens it, creating the device
+ * special file node with the major and minor numbers specified by \p dev (and
+ * the parent directory, if necessary) when the caller is root.
+ */
+static int drmOpenDevice(dev_t dev, int minor, int type)
+{
+    stat_t          st;
+    const char      *dev_name;
+    char            buf[64];
+    int             fd;
+    mode_t          devmode = DRM_DEV_MODE, serv_mode;
+    gid_t           serv_group;
+#if !defined(UDEV)
+    int             isroot  = !geteuid();
+    uid_t           user    = DRM_DEV_UID;
+    gid_t           group   = DRM_DEV_GID;
+#endif
+
+    switch (type) {
+    case DRM_NODE_PRIMARY:
+	    dev_name = DRM_DEV_NAME;
+	    break;
+    case DRM_NODE_CONTROL:
+	    dev_name = DRM_CONTROL_DEV_NAME;
+	    break;
+    case DRM_NODE_RENDER:
+	    dev_name = DRM_RENDER_DEV_NAME;
+	    break;
+    default:
+	    return -EINVAL;
+    };
+
+    sprintf(buf, dev_name, DRM_DIR_NAME, minor);
+    drmMsg("drmOpenDevice: node name is %s\n", buf);
+
+    if (drm_server_info) {
+	drm_server_info->get_perms(&serv_group, &serv_mode);
+	devmode  = serv_mode ? serv_mode : DRM_DEV_MODE;
+	devmode &= ~(S_IXUSR|S_IXGRP|S_IXOTH);
+    }
+
+#if !defined(UDEV)
+    if (stat(DRM_DIR_NAME, &st)) {
+	if (!isroot)
+	    return DRM_ERR_NOT_ROOT;
+	mkdir(DRM_DIR_NAME, DRM_DEV_DIRMODE);
+	chown_check_return(DRM_DIR_NAME, 0, 0); /* root:root */
+	chmod(DRM_DIR_NAME, DRM_DEV_DIRMODE);
+    }
+
+    /* Check if the device node exists and create it if necessary. */
+    if (stat(buf, &st)) {
+	if (!isroot)
+	    return DRM_ERR_NOT_ROOT;
+	remove(buf);
+	mknod(buf, S_IFCHR | devmode, dev);
+    }
+
+    if (drm_server_info) {
+	group = (serv_group >= 0) ? serv_group : DRM_DEV_GID;
+	chown_check_return(buf, user, group);
+	chmod(buf, devmode);
+    }
+#else
+    /* if we modprobed then wait for udev */
+    {
+	int udev_count = 0;
+wait_for_udev:
+        if (stat(DRM_DIR_NAME, &st)) {
+		usleep(20);
+		udev_count++;
+
+		if (udev_count == 50)
+			return -1;
+		goto wait_for_udev;
+	}
+
+    	if (stat(buf, &st)) {
+		usleep(20);
+		udev_count++;
+
+		if (udev_count == 50)
+			return -1;
+		goto wait_for_udev;
+    	}
+    }
+#endif
+
+    fd = open(buf, O_RDWR, 0);
+    drmMsg("drmOpenDevice: open result is %d, (%s)\n",
+		fd, fd < 0 ? strerror(errno) : "OK");
+    if (fd >= 0)
+	return fd;
+
+#if !defined(UDEV)
+    /* Check if the device node is not what we expect it to be, and recreate it
+     * and try again if so.
+     */
+    if (st.st_rdev != dev) {
+	if (!isroot)
+	    return DRM_ERR_NOT_ROOT;
+	remove(buf);
+	mknod(buf, S_IFCHR | devmode, dev);
+	if (drm_server_info) {
+	    chown_check_return(buf, user, group);
+	    chmod(buf, devmode);
+	}
+    }
+    fd = open(buf, O_RDWR, 0);
+    drmMsg("drmOpenDevice: open result is %d, (%s)\n",
+		fd, fd < 0 ? strerror(errno) : "OK");
+    if (fd >= 0)
+	return fd;
+
+    drmMsg("drmOpenDevice: Open failed\n");
+    remove(buf);
+#endif
+    return -errno;
+}
+
+
+/**
+ * Open the DRM device
+ *
+ * \param minor device minor number.
+ * \param create if set, allow the device to be created.
+ *
+ * \return a file descriptor on success, or a negative value on error.
+ * 
+ * \internal
+ * Calls drmOpenDevice() if \p create is set, otherwise assembles the device
+ * name from \p minor and opens it.
+ */
+static int drmOpenMinor(int minor, int create, int type)
+{
+    int  fd;
+    char buf[64];
+    const char *dev_name;
+    
+    if (create)
+	return drmOpenDevice(makedev(DRM_MAJOR, minor), minor, type);
+    
+    switch (type) {
+    case DRM_NODE_PRIMARY:
+	    dev_name = DRM_DEV_NAME;
+	    break;
+    case DRM_NODE_CONTROL:
+	    dev_name = DRM_CONTROL_DEV_NAME;
+	    break;
+    case DRM_NODE_RENDER:
+	    dev_name = DRM_RENDER_DEV_NAME;
+	    break;
+    default:
+	    return -EINVAL;
+    };
+
+    sprintf(buf, dev_name, DRM_DIR_NAME, minor);
+    if ((fd = open(buf, O_RDWR, 0)) >= 0)
+	return fd;
+    return -errno;
+}
+
+
+/**
+ * Determine whether the DRM kernel driver has been loaded.
+ * 
+ * \return 1 if the DRM driver is loaded, 0 otherwise.
+ *
+ * \internal 
+ * Determine the presence of the kernel driver by attempting to open the 0
+ * minor and get version information.  For backward compatibility with older
+ * Linux implementations, /proc/dri is also checked.
+ */
+int drmAvailable(void)
+{
+    drmVersionPtr version;
+    int           retval = 0;
+    int           fd;
+
+    if ((fd = drmOpenMinor(0, 1, DRM_NODE_PRIMARY)) < 0) {
+#ifdef __linux__
+	/* Try proc for backward Linux compatibility */
+	if (!access("/proc/dri/0", R_OK))
+	    return 1;
+#endif
+	return 0;
+    }
+    
+    if ((version = drmGetVersion(fd))) {
+	retval = 1;
+	drmFreeVersion(version);
+    }
+    close(fd);
+
+    return retval;
+}
+
+static int drmGetMinorBase(int type)
+{
+    switch (type) {
+    case DRM_NODE_PRIMARY:
+        return 0;
+    case DRM_NODE_CONTROL:
+        return 64;
+    case DRM_NODE_RENDER:
+        return 128;
+    default:
+        return -1;
+    };
+}
+
+static int drmGetMinorType(int minor)
+{
+    int type = minor >> 6;
+
+    if (minor < 0)
+        return -1;
+
+    switch (type) {
+    case DRM_NODE_PRIMARY:
+    case DRM_NODE_CONTROL:
+    case DRM_NODE_RENDER:
+        return type;
+    default:
+        return -1;
+    }
+}
+
+static const char *drmGetMinorName(int type)
+{
+    switch (type) {
+    case DRM_NODE_PRIMARY:
+        return "card";
+    case DRM_NODE_CONTROL:
+        return "controlD";
+    case DRM_NODE_RENDER:
+        return "renderD";
+    default:
+        return NULL;
+    }
+}
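+
+/*
+ * Note (illustrative, not part of the original patch): the three node types
+ * partition the minor number space into blocks of 64, which is what the
+ * "minor >> 6" in drmGetMinorType() decodes:
+ *
+ *	/dev/dri/card0       minors   0..63   (DRM_NODE_PRIMARY)
+ *	/dev/dri/controlD64  minors  64..127  (DRM_NODE_CONTROL)
+ *	/dev/dri/renderD128  minors 128..191  (DRM_NODE_RENDER)
+ */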
+
+/**
+ * Open the device by bus ID.
+ *
+ * \param busid bus ID.
+ * \param type device node type.
+ *
+ * \return a file descriptor on success, or a negative value on error.
+ *
+ * \internal
+ * This function attempts to open every possible minor (up to DRM_MAX_MINOR),
+ * comparing the device bus ID with the one supplied.
+ *
+ * \sa drmOpenMinor() and drmGetBusid().
+ */
+static int drmOpenByBusid(const char *busid, int type)
+{
+    int        i, pci_domain_ok = 1;
+    int        fd;
+    const char *buf;
+    drmSetVersion sv;
+    int        base = drmGetMinorBase(type);
+
+    if (base < 0)
+        return -1;
+
+    drmMsg("drmOpenByBusid: Searching for BusID %s\n", busid);
+    for (i = base; i < base + DRM_MAX_MINOR; i++) {
+	fd = drmOpenMinor(i, 1, type);
+	drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
+	if (fd >= 0) {
+	    /* We need to try for 1.4 first for proper PCI domain support
+	     * and if that fails, we know the kernel is busted
+	     */
+	    sv.drm_di_major = 1;
+	    sv.drm_di_minor = 4;
+	    sv.drm_dd_major = -1;	/* Don't care */
+	    sv.drm_dd_minor = -1;	/* Don't care */
+	    if (drmSetInterfaceVersion(fd, &sv)) {
+#ifndef __alpha__
+		pci_domain_ok = 0;
+#endif
+		sv.drm_di_major = 1;
+		sv.drm_di_minor = 1;
+		sv.drm_dd_major = -1;       /* Don't care */
+		sv.drm_dd_minor = -1;       /* Don't care */
+		drmMsg("drmOpenByBusid: Interface 1.4 failed, trying 1.1\n");
+		drmSetInterfaceVersion(fd, &sv);
+	    }
+	    buf = drmGetBusid(fd);
+	    drmMsg("drmOpenByBusid: drmGetBusid reports %s\n", buf);
+	    if (buf && drmMatchBusID(buf, busid, pci_domain_ok)) {
+		drmFreeBusid(buf);
+		return fd;
+	    }
+	    if (buf)
+		drmFreeBusid(buf);
+	    close(fd);
+	}
+    }
+    return -1;
+}
+
+
+/**
+ * Open the device by name.
+ *
+ * \param name driver name.
+ * \param type the device node type.
+ * 
+ * \return a file descriptor on success, or a negative value on error.
+ * 
+ * \internal
+ * This function opens the first minor number that matches the driver name and
+ * isn't already in use.  If it's in use, it will already have a bus ID
+ * assigned.
+ * 
+ * \sa drmOpenMinor(), drmGetVersion() and drmGetBusid().
+ */
+static int drmOpenByName(const char *name, int type)
+{
+    int           i;
+    int           fd;
+    drmVersionPtr version;
+    char *        id;
+    int           base = drmGetMinorBase(type);
+
+    if (base < 0)
+        return -1;
+
+    /*
+     * Open the first minor number that matches the driver name and isn't
+     * already in use.  If it's in use it will have a busid assigned already.
+     */
+    for (i = base; i < base + DRM_MAX_MINOR; i++) {
+	if ((fd = drmOpenMinor(i, 1, type)) >= 0) {
+	    if ((version = drmGetVersion(fd))) {
+		if (!strcmp(version->name, name)) {
+		    drmFreeVersion(version);
+		    id = drmGetBusid(fd);
+		    drmMsg("drmGetBusid returned '%s'\n", id ? id : "NULL");
+		    if (!id || !*id) {
+			if (id)
+			    drmFreeBusid(id);
+			return fd;
+		    } else {
+			drmFreeBusid(id);
+		    }
+		} else {
+		    drmFreeVersion(version);
+		}
+	    }
+	    close(fd);
+	}
+    }
+
+#ifdef __linux__
+    /* Backward-compatibility /proc support */
+    for (i = 0; i < 8; i++) {
+	char proc_name[64], buf[512];
+	char *driver, *pt, *devstring;
+	int  retcode;
+	
+	sprintf(proc_name, "/proc/dri/%d/name", i);
+	if ((fd = open(proc_name, 0, 0)) >= 0) {
+	    retcode = read(fd, buf, sizeof(buf)-1);
+	    close(fd);
+	    if (retcode) {
+		buf[retcode-1] = '\0';
+		for (driver = pt = buf; *pt && *pt != ' '; ++pt)
+		    ;
+		if (*pt) { /* Device is next */
+		    *pt = '\0';
+		    if (!strcmp(driver, name)) { /* Match */
+			for (devstring = ++pt; *pt && *pt != ' '; ++pt)
+			    ;
+			if (*pt) { /* Found busid */
+			    return drmOpenByBusid(++pt, type);
+			} else { /* No busid */
+			    return drmOpenDevice(strtol(devstring, NULL, 0),i, type);
+			}
+		    }
+		}
+	    }
+	}
+    }
+#endif
+
+    return -1;
+}
+
+
+/**
+ * Open the DRM device.
+ *
+ * Looks up the specified name and bus ID, and opens the device found.  The
+ * entry in /dev/dri is created if necessary and if called by root.
+ *
+ * \param name driver name. Not referenced if bus ID is supplied.
+ * \param busid bus ID. Zero if not known.
+ * 
+ * \return a file descriptor on success, or a negative value on error.
+ * 
+ * \internal
+ * It calls drmOpenByBusid() if \p busid is specified or drmOpenByName()
+ * otherwise.
+ */
+int drmOpen(const char *name, const char *busid)
+{
+    return drmOpenWithType(name, busid, DRM_NODE_PRIMARY);
+}
+
+/**
+ * Open the DRM device with specified type.
+ *
+ * Looks up the specified name and bus ID, and opens the device found.  The
+ * entry in /dev/dri is created if necessary and if called by root.
+ *
+ * \param name driver name. Not referenced if bus ID is supplied.
+ * \param busid bus ID. Zero if not known.
+ * \param type the device node type to open, PRIMARY, CONTROL or RENDER
+ *
+ * \return a file descriptor on success, or a negative value on error.
+ *
+ * \internal
+ * It calls drmOpenByBusid() if \p busid is specified or drmOpenByName()
+ * otherwise.
+ */
+int drmOpenWithType(const char *name, const char *busid, int type)
+{
+    if (!drmAvailable() && name != NULL && drm_server_info) {
+	/* try to load the kernel module */
+	if (!drm_server_info->load_module(name)) {
+	    drmMsg("[drm] failed to load kernel module \"%s\"\n", name);
+	    return -1;
+	}
+    }
+
+    if (busid) {
+	int fd = drmOpenByBusid(busid, type);
+	if (fd >= 0)
+	    return fd;
+    }
+    
+    if (name)
+	return drmOpenByName(name, type);
+
+    return -1;
+}
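+
+/*
+ * Usage sketch (illustrative; the driver name "i915" and the bus ID are just
+ * example values):
+ *
+ *	int fd = drmOpen("i915", NULL);                 // open by driver name
+ *	if (fd < 0)
+ *		fd = drmOpenWithType(NULL, "pci:0000:00:02.0",
+ *				     DRM_NODE_RENDER);  // or by bus ID
+ */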
+
+int drmOpenControl(int minor)
+{
+    return drmOpenMinor(minor, 0, DRM_NODE_CONTROL);
+}
+
+int drmOpenRender(int minor)
+{
+    return drmOpenMinor(minor, 0, DRM_NODE_RENDER);
+}
+
+/**
+ * Free the version information returned by drmGetVersion().
+ *
+ * \param v pointer to the version information.
+ *
+ * \internal
+ * It frees the memory pointed to by \p %v as well as all the non-null string
+ * pointers in it.
+ */
+void drmFreeVersion(drmVersionPtr v)
+{
+    if (!v)
+	return;
+    drmFree(v->name);
+    drmFree(v->date);
+    drmFree(v->desc);
+    drmFree(v);
+}
+
+
+/**
+ * Free the non-public version information returned by the kernel.
+ *
+ * \param v pointer to the version information.
+ *
+ * \internal
+ * Used by drmGetVersion() to free the memory pointed to by \p %v as well as
+ * all the non-null string pointers in it.
+ */
+static void drmFreeKernelVersion(drm_version_t *v)
+{
+    if (!v)
+	return;
+    drmFree(v->name);
+    drmFree(v->date);
+    drmFree(v->desc);
+    drmFree(v);
+}
+
+
+/**
+ * Copy version information.
+ * 
+ * \param d destination pointer.
+ * \param s source pointer.
+ * 
+ * \internal
+ * Used by drmGetVersion() to translate the information returned by the ioctl
+ * interface in a private structure into the public structure counterpart.
+ */
+static void drmCopyVersion(drmVersionPtr d, const drm_version_t *s)
+{
+    d->version_major      = s->version_major;
+    d->version_minor      = s->version_minor;
+    d->version_patchlevel = s->version_patchlevel;
+    d->name_len           = s->name_len;
+    d->name               = strdup(s->name);
+    d->date_len           = s->date_len;
+    d->date               = strdup(s->date);
+    d->desc_len           = s->desc_len;
+    d->desc               = strdup(s->desc);
+}
+
+
+/**
+ * Query the driver version information.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return pointer to a drmVersion structure which should be freed with
+ * drmFreeVersion().
+ * 
+ * \note Similar information is available via /proc/dri.
+ * 
+ * \internal
+ * It gets the version information via successive DRM_IOCTL_VERSION ioctls,
+ * first with zeros to get the string lengths, and then the actual strings.
+ * It also null-terminates them, since they might not be already.
+ */
+drmVersionPtr drmGetVersion(int fd)
+{
+    drmVersionPtr retval;
+    drm_version_t *version = drmMalloc(sizeof(*version));
+
+    memclear(*version);
+
+    if (drmIoctl(fd, DRM_IOCTL_VERSION, version)) {
+	drmFreeKernelVersion(version);
+	return NULL;
+    }
+
+    if (version->name_len)
+	version->name    = drmMalloc(version->name_len + 1);
+    if (version->date_len)
+	version->date    = drmMalloc(version->date_len + 1);
+    if (version->desc_len)
+	version->desc    = drmMalloc(version->desc_len + 1);
+
+    if (drmIoctl(fd, DRM_IOCTL_VERSION, version)) {
+	drmMsg("DRM_IOCTL_VERSION: %s\n", strerror(errno));
+	drmFreeKernelVersion(version);
+	return NULL;
+    }
+
+    /* The results might not be null-terminated strings, so terminate them. */
+    if (version->name_len) version->name[version->name_len] = '\0';
+    if (version->date_len) version->date[version->date_len] = '\0';
+    if (version->desc_len) version->desc[version->desc_len] = '\0';
+
+    retval = drmMalloc(sizeof(*retval));
+    drmCopyVersion(retval, version);
+    drmFreeKernelVersion(version);
+    return retval;
+}
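+
+/*
+ * Usage sketch (illustrative, not part of the original patch):
+ *
+ *	drmVersionPtr v = drmGetVersion(fd);
+ *	if (v) {
+ *		printf("%s %d.%d.%d\n", v->name, v->version_major,
+ *		       v->version_minor, v->version_patchlevel);
+ *		drmFreeVersion(v);
+ *	}
+ */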
+
+
+/**
+ * Get version information for the DRM user space library.
+ * 
+ * This version number is driver independent.
+ * 
+ * \param fd file descriptor.
+ *
+ * \return version information.
+ * 
+ * \internal
+ * This function allocates and fills a drm_version structure with a hard coded
+ * version number.
+ */
+drmVersionPtr drmGetLibVersion(int fd)
+{
+    drm_version_t *version = drmMalloc(sizeof(*version));
+
+    /* Version history:
+     *   NOTE THIS MUST NOT GO ABOVE VERSION 1.X due to drivers needing it
+     *   revision 1.0.x = original DRM interface with no drmGetLibVersion
+     *                    entry point and many drm<Device> extensions
+     *   revision 1.1.x = added drmCommand entry points for device extensions
+     *                    added drmGetLibVersion to identify libdrm.a version
+     *   revision 1.2.x = added drmSetInterfaceVersion
+     *                    modified drmOpen to handle both busid and name
+     *   revision 1.3.x = added server + memory manager
+     */
+    version->version_major      = 1;
+    version->version_minor      = 3;
+    version->version_patchlevel = 0;
+
+    return (drmVersionPtr)version;
+}
+
+int drmGetCap(int fd, uint64_t capability, uint64_t *value)
+{
+	struct drm_get_cap cap;
+	int ret;
+
+	memclear(cap);
+	cap.capability = capability;
+
+	ret = drmIoctl(fd, DRM_IOCTL_GET_CAP, &cap);
+	if (ret)
+		return ret;
+
+	*value = cap.value;
+	return 0;
+}
+
+int drmSetClientCap(int fd, uint64_t capability, uint64_t value)
+{
+	struct drm_set_client_cap cap;
+
+	memclear(cap);
+	cap.capability = capability;
+	cap.value = value;
+
+	return drmIoctl(fd, DRM_IOCTL_SET_CLIENT_CAP, &cap);
+}
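+
+/*
+ * Usage sketch (illustrative; DRM_CAP_DUMB_BUFFER comes from the kernel DRM
+ * headers and is assumed to be available here):
+ *
+ *	uint64_t has_dumb = 0;
+ *	if (drmGetCap(fd, DRM_CAP_DUMB_BUFFER, &has_dumb) == 0 && has_dumb)
+ *		printf("driver supports dumb buffer allocation\n");
+ */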
+
+/**
+ * Free the bus ID information.
+ *
+ * \param busid bus ID information string as given by drmGetBusid().
+ *
+ * \internal
+ * This function just frees the memory pointed to by \p busid.
+ */
+void drmFreeBusid(const char *busid)
+{
+    drmFree((void *)busid);
+}
+
+
+/**
+ * Get the bus ID of the device.
+ *
+ * \param fd file descriptor.
+ *
+ * \return bus ID string.
+ *
+ * \internal
+ * This function gets the bus ID via successive DRM_IOCTL_GET_UNIQUE ioctls to
+ * get the string length and data, passing the arguments in a drm_unique
+ * structure.
+ */
+char *drmGetBusid(int fd)
+{
+    drm_unique_t u;
+
+    memclear(u);
+
+    if (drmIoctl(fd, DRM_IOCTL_GET_UNIQUE, &u))
+	return NULL;
+    u.unique = drmMalloc(u.unique_len + 1);
+    if (drmIoctl(fd, DRM_IOCTL_GET_UNIQUE, &u)) {
+	drmFree(u.unique);	/* don't leak the buffer on failure */
+	return NULL;
+    }
+    u.unique[u.unique_len] = '\0';
+
+    return u.unique;
+}
+
+
+/**
+ * Set the bus ID of the device.
+ *
+ * \param fd file descriptor.
+ * \param busid bus ID string.
+ *
+ * \return zero on success, negative on failure.
+ *
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_SET_UNIQUE ioctl, passing
+ * the arguments in a drm_unique structure.
+ */
+int drmSetBusid(int fd, const char *busid)
+{
+    drm_unique_t u;
+
+    memclear(u);
+    u.unique     = (char *)busid;
+    u.unique_len = strlen(busid);
+
+    if (drmIoctl(fd, DRM_IOCTL_SET_UNIQUE, &u)) {
+	return -errno;
+    }
+    return 0;
+}
+
+int drmGetMagic(int fd, drm_magic_t * magic)
+{
+    drm_auth_t auth;
+
+    memclear(auth);
+
+    *magic = 0;
+    if (drmIoctl(fd, DRM_IOCTL_GET_MAGIC, &auth))
+	return -errno;
+    *magic = auth.magic;
+    return 0;
+}
+
+int drmAuthMagic(int fd, drm_magic_t magic)
+{
+    drm_auth_t auth;
+
+    memclear(auth);
+    auth.magic = magic;
+    if (drmIoctl(fd, DRM_IOCTL_AUTH_MAGIC, &auth))
+	return -errno;
+    return 0;
+}
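+
+/*
+ * Usage sketch (illustrative): DRM authentication is a two-step handshake.
+ * An unauthenticated client obtains a magic token and hands it to the DRM
+ * master (e.g. the X server), which validates it:
+ *
+ *	drm_magic_t magic;
+ *	drmGetMagic(client_fd, &magic);   // client side
+ *	drmAuthMagic(master_fd, magic);   // master side
+ */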
+
+/**
+ * Specifies a range of memory that is available for mapping by a
+ * non-root process.
+ *
+ * \param fd file descriptor.
+ * \param offset usually the physical address. The actual meaning depends on
+ * the \p type parameter. See below.
+ * \param size of the memory in bytes.
+ * \param type type of the memory to be mapped.
+ * \param flags combination of several flags to modify the function actions.
+ * \param handle will be set to a value that may be used as the offset
+ * parameter for mmap().
+ * 
+ * \return zero on success or a negative value on error.
+ *
+ * \par Mapping the frame buffer
+ * For the frame buffer
+ * - \p offset will be the physical address of the start of the frame buffer,
+ * - \p size will be the size of the frame buffer in bytes, and
+ * - \p type will be DRM_FRAME_BUFFER.
+ *
+ * \par
+ * The area mapped will be uncached. If MTRR support is available in the
+ * kernel, the frame buffer area will be set to write combining. 
+ *
+ * \par Mapping the MMIO register area
+ * For the MMIO register area,
+ * - \p offset will be the physical address of the start of the register area,
+ * - \p size will be the size of the register area in bytes, and
+ * - \p type will be DRM_REGISTERS.
+ * \par
+ * The area mapped will be uncached. 
+ * 
+ * \par Mapping the SAREA
+ * For the SAREA,
+ * - \p offset will be ignored and should be set to zero,
+ * - \p size will be the desired size of the SAREA in bytes,
+ * - \p type will be DRM_SHM.
+ * 
+ * \par
+ * A shared memory area of the requested size will be created and locked in
+ * kernel memory. This area may be mapped into client-space by using the handle
+ * returned. 
+ * 
+ * \note May only be called by root.
+ *
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_ADD_MAP ioctl, passing
+ * the arguments in a drm_map structure.
+ */
+int drmAddMap(int fd, drm_handle_t offset, drmSize size, drmMapType type,
+	      drmMapFlags flags, drm_handle_t *handle)
+{
+    drm_map_t map;
+
+    memclear(map);
+    map.offset  = offset;
+    map.size    = size;
+    map.type    = type;
+    map.flags   = flags;
+    if (drmIoctl(fd, DRM_IOCTL_ADD_MAP, &map))
+	return -errno;
+    if (handle)
+	*handle = (drm_handle_t)(uintptr_t)map.handle;
+    return 0;
+}
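+
+/*
+ * Usage sketch (illustrative; the one-page SAREA size is an assumption here,
+ * real callers use a driver-specific value):
+ *
+ *	drm_handle_t handle;
+ *	drmAddress sarea;
+ *	if (drmAddMap(fd, 0, getpagesize(), DRM_SHM, DRM_CONTAINS_LOCK,
+ *		      &handle) == 0)
+ *		drmMap(fd, handle, getpagesize(), &sarea);
+ */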
+
+int drmRmMap(int fd, drm_handle_t handle)
+{
+    drm_map_t map;
+
+    memclear(map);
+    map.handle = (void *)(uintptr_t)handle;
+
+    if(drmIoctl(fd, DRM_IOCTL_RM_MAP, &map))
+	return -errno;
+    return 0;
+}
+
+/**
+ * Make buffers available for DMA transfers.
+ * 
+ * \param fd file descriptor.
+ * \param count number of buffers.
+ * \param size size of each buffer.
+ * \param flags buffer allocation flags.
+ * \param agp_offset offset in the AGP aperture 
+ *
+ * \return number of buffers allocated, negative on error.
+ *
+ * \internal
+ * This function is a wrapper around DRM_IOCTL_ADD_BUFS ioctl.
+ *
+ * \sa drm_buf_desc.
+ */
+int drmAddBufs(int fd, int count, int size, drmBufDescFlags flags,
+	       int agp_offset)
+{
+    drm_buf_desc_t request;
+
+    memclear(request);
+    request.count     = count;
+    request.size      = size;
+    request.flags     = flags;
+    request.agp_start = agp_offset;
+
+    if (drmIoctl(fd, DRM_IOCTL_ADD_BUFS, &request))
+	return -errno;
+    return request.count;
+}
+
+int drmMarkBufs(int fd, double low, double high)
+{
+    drm_buf_info_t info;
+    int            i;
+
+    memclear(info);
+
+    if (drmIoctl(fd, DRM_IOCTL_INFO_BUFS, &info))
+	return -EINVAL;
+
+    if (!info.count)
+	return -EINVAL;
+
+    if (!(info.list = drmMalloc(info.count * sizeof(*info.list))))
+	return -ENOMEM;
+
+    if (drmIoctl(fd, DRM_IOCTL_INFO_BUFS, &info)) {
+	int retval = -errno;
+	drmFree(info.list);
+	return retval;
+    }
+
+    for (i = 0; i < info.count; i++) {
+	info.list[i].low_mark  = low  * info.list[i].count;
+	info.list[i].high_mark = high * info.list[i].count;
+	if (drmIoctl(fd, DRM_IOCTL_MARK_BUFS, &info.list[i])) {
+	    int retval = -errno;
+	    drmFree(info.list);
+	    return retval;
+	}
+    }
+    drmFree(info.list);
+
+    return 0;
+}
+
+/**
+ * Free buffers.
+ *
+ * \param fd file descriptor.
+ * \param count number of buffers to free.
+ * \param list list of buffers to be freed.
+ *
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \note This function is primarily used for debugging.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_FREE_BUFS ioctl, passing
+ * the arguments in a drm_buf_free structure.
+ */
+int drmFreeBufs(int fd, int count, int *list)
+{
+    drm_buf_free_t request;
+
+    memclear(request);
+    request.count = count;
+    request.list  = list;
+    if (drmIoctl(fd, DRM_IOCTL_FREE_BUFS, &request))
+	return -errno;
+    return 0;
+}
+
+
+/**
+ * Close the device.
+ *
+ * \param fd file descriptor.
+ *
+ * \internal
+ * This function closes the file descriptor.
+ */
+int drmClose(int fd)
+{
+    unsigned long key    = drmGetKeyFromFd(fd);
+    drmHashEntry  *entry = drmGetEntry(fd);
+
+    drmHashDestroy(entry->tagTable);
+    entry->fd       = 0;
+    entry->f        = NULL;
+    entry->tagTable = NULL;
+
+    drmHashDelete(drmHashTable, key);
+    drmFree(entry);
+
+    return close(fd);
+}
+
+
+/**
+ * Map a region of memory.
+ *
+ * \param fd file descriptor.
+ * \param handle handle returned by drmAddMap().
+ * \param size size in bytes. Must match the size used by drmAddMap().
+ * \param address will contain the user-space virtual address where the mapping
+ * begins.
+ *
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper for mmap().
+ */
+int drmMap(int fd, drm_handle_t handle, drmSize size, drmAddressPtr address)
+{
+    static unsigned long pagesize_mask = 0;
+
+    if (fd < 0)
+	return -EINVAL;
+
+    if (!pagesize_mask)
+	pagesize_mask = getpagesize() - 1;
+
+    size = (size + pagesize_mask) & ~pagesize_mask;
+
+    *address = drm_mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, handle);
+    if (*address == MAP_FAILED)
+	return -errno;
+    return 0;
+}
+
+
+/**
+ * Unmap mappings obtained with drmMap().
+ *
+ * \param address address as given by drmMap().
+ * \param size size in bytes. Must match the size used by drmMap().
+ * 
+ * \return zero on success, or a negative value on failure.
+ *
+ * \internal
+ * This function is a wrapper for munmap().
+ */
+int drmUnmap(drmAddress address, drmSize size)
+{
+    return drm_munmap(address, size);
+}
+
+drmBufInfoPtr drmGetBufInfo(int fd)
+{
+    drm_buf_info_t info;
+    drmBufInfoPtr  retval;
+    int            i;
+
+    memclear(info);
+
+    if (drmIoctl(fd, DRM_IOCTL_INFO_BUFS, &info))
+	return NULL;
+
+    if (info.count) {
+	if (!(info.list = drmMalloc(info.count * sizeof(*info.list))))
+	    return NULL;
+
+	if (drmIoctl(fd, DRM_IOCTL_INFO_BUFS, &info)) {
+	    drmFree(info.list);
+	    return NULL;
+	}
+
+	retval = drmMalloc(sizeof(*retval));
+	retval->count = info.count;
+	retval->list  = drmMalloc(info.count * sizeof(*retval->list));
+	for (i = 0; i < info.count; i++) {
+	    retval->list[i].count     = info.list[i].count;
+	    retval->list[i].size      = info.list[i].size;
+	    retval->list[i].low_mark  = info.list[i].low_mark;
+	    retval->list[i].high_mark = info.list[i].high_mark;
+	}
+	drmFree(info.list);
+	return retval;
+    }
+    return NULL;
+}
+
+/**
+ * Map all DMA buffers into client-virtual space.
+ *
+ * \param fd file descriptor.
+ *
+ * \return a pointer to a ::drmBufMap structure.
+ *
+ * \note The client may not use these buffers until obtaining buffer indices
+ * with drmDMA().
+ * 
+ * \internal
+ * This function calls the DRM_IOCTL_MAP_BUFS ioctl and copies the returned
+ * information about the buffers in a drm_buf_map structure into the
+ * client-visible data structures.
+ */ 
+drmBufMapPtr drmMapBufs(int fd)
+{
+    drm_buf_map_t bufs;
+    drmBufMapPtr  retval;
+    int           i;
+
+    memclear(bufs);
+    if (drmIoctl(fd, DRM_IOCTL_MAP_BUFS, &bufs))
+	return NULL;
+
+    if (!bufs.count)
+	return NULL;
+
+    if (!(bufs.list = drmMalloc(bufs.count * sizeof(*bufs.list))))
+	return NULL;
+
+    if (drmIoctl(fd, DRM_IOCTL_MAP_BUFS, &bufs)) {
+	drmFree(bufs.list);
+	return NULL;
+    }
+
+    retval = drmMalloc(sizeof(*retval));
+    retval->count = bufs.count;
+    retval->list  = drmMalloc(bufs.count * sizeof(*retval->list));
+    for (i = 0; i < bufs.count; i++) {
+	retval->list[i].idx     = bufs.list[i].idx;
+	retval->list[i].total   = bufs.list[i].total;
+	retval->list[i].used    = 0;
+	retval->list[i].address = bufs.list[i].address;
+    }
+
+    drmFree(bufs.list);
+
+    return retval;
+}
+
+
+/**
+ * Unmap buffers allocated with drmMapBufs().
+ *
+ * \return zero on success, or negative value on failure.
+ *
+ * \internal
+ * Calls munmap() for every buffer stored in \p bufs and frees the
+ * memory allocated by drmMapBufs().
+ */
+int drmUnmapBufs(drmBufMapPtr bufs)
+{
+    int i;
+
+    for (i = 0; i < bufs->count; i++) {
+	drm_munmap(bufs->list[i].address, bufs->list[i].total);
+    }
+
+    drmFree(bufs->list);
+    drmFree(bufs);
+
+    return 0;
+}
+
+
+#define DRM_DMA_RETRY		16
+
+/**
+ * Reserve DMA buffers.
+ *
+ * \param fd file descriptor.
+ * \param request pointer to a drmDMAReq structure describing the transfer.
+ * 
+ * \return zero on success, or a negative value on failure.
+ *
+ * \internal
+ * Assembles the arguments into a drm_dma structure and keeps issuing the
+ * DRM_IOCTL_DMA ioctl until success or until maximum number of retries.
+ */
+int drmDMA(int fd, drmDMAReqPtr request)
+{
+    drm_dma_t dma;
+    int ret, i = 0;
+
+    dma.context         = request->context;
+    dma.send_count      = request->send_count;
+    dma.send_indices    = request->send_list;
+    dma.send_sizes      = request->send_sizes;
+    dma.flags           = request->flags;
+    dma.request_count   = request->request_count;
+    dma.request_size    = request->request_size;
+    dma.request_indices = request->request_list;
+    dma.request_sizes   = request->request_sizes;
+    dma.granted_count   = 0;
+
+    do {
+	ret = ioctl( fd, DRM_IOCTL_DMA, &dma );
+    } while ( ret && errno == EAGAIN && i++ < DRM_DMA_RETRY );
+
+    if ( ret == 0 ) {
+	request->granted_count = dma.granted_count;
+	return 0;
+    } else {
+	return -errno;
+    }
+}
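+
+/*
+ * Usage sketch (illustrative; ctx and the 4096-byte size are placeholder
+ * values): requesting a single DMA buffer with no buffers to send back:
+ *
+ *	drmDMAReq req;
+ *	int idx, size;
+ *	memset(&req, 0, sizeof(req));
+ *	req.context       = ctx;
+ *	req.request_count = 1;
+ *	req.request_size  = 4096;
+ *	req.request_list  = &idx;
+ *	req.request_sizes = &size;
+ *	if (drmDMA(fd, &req) == 0 && req.granted_count == 1)
+ *		printf("granted buffer %d of %d bytes\n", idx, size);
+ */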
+
+
+/**
+ * Obtain heavyweight hardware lock.
+ *
+ * \param fd file descriptor.
+ * \param context context.
+ * \param flags flags that determine the state of the hardware when the function
+ * returns.
+ * 
+ * \return always zero.
+ * 
+ * \internal
+ * This function translates the arguments into a drm_lock structure and issues
+ * the DRM_IOCTL_LOCK ioctl until the lock is successfully acquired.
+ */
+int drmGetLock(int fd, drm_context_t context, drmLockFlags flags)
+{
+    drm_lock_t lock;
+
+    memclear(lock);
+    lock.context = context;
+    lock.flags   = 0;
+    if (flags & DRM_LOCK_READY)      lock.flags |= _DRM_LOCK_READY;
+    if (flags & DRM_LOCK_QUIESCENT)  lock.flags |= _DRM_LOCK_QUIESCENT;
+    if (flags & DRM_LOCK_FLUSH)      lock.flags |= _DRM_LOCK_FLUSH;
+    if (flags & DRM_LOCK_FLUSH_ALL)  lock.flags |= _DRM_LOCK_FLUSH_ALL;
+    if (flags & DRM_HALT_ALL_QUEUES) lock.flags |= _DRM_HALT_ALL_QUEUES;
+    if (flags & DRM_HALT_CUR_QUEUES) lock.flags |= _DRM_HALT_CUR_QUEUES;
+
+    while (drmIoctl(fd, DRM_IOCTL_LOCK, &lock))
+	;
+    return 0;
+}
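+
+/*
+ * Usage sketch (illustrative): the lock bracket around hardware access in a
+ * legacy DRI client looks like:
+ *
+ *	drmGetLock(fd, ctx, DRM_LOCK_READY);
+ *	// ... touch the hardware ...
+ *	drmUnlock(fd, ctx);
+ */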
+
+/**
+ * Release the hardware lock.
+ *
+ * \param fd file descriptor.
+ * \param context context.
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_UNLOCK ioctl, passing the
+ * argument in a drm_lock structure.
+ */
+int drmUnlock(int fd, drm_context_t context)
+{
+    drm_lock_t lock;
+
+    memclear(lock);
+    lock.context = context;
+    return drmIoctl(fd, DRM_IOCTL_UNLOCK, &lock);
+}
+
+drm_context_t *drmGetReservedContextList(int fd, int *count)
+{
+    drm_ctx_res_t res;
+    drm_ctx_t     *list;
+    drm_context_t * retval;
+    int           i;
+
+    memclear(res);
+    if (drmIoctl(fd, DRM_IOCTL_RES_CTX, &res))
+	return NULL;
+
+    if (!res.count)
+	return NULL;
+
+    if (!(list   = drmMalloc(res.count * sizeof(*list))))
+	return NULL;
+    if (!(retval = drmMalloc(res.count * sizeof(*retval)))) {
+	drmFree(list);
+	return NULL;
+    }
+
+    res.contexts = list;
+    if (drmIoctl(fd, DRM_IOCTL_RES_CTX, &res)) {
+	drmFree(list);		/* don't leak on failure */
+	drmFree(retval);
+	return NULL;
+    }
+
+    for (i = 0; i < res.count; i++)
+	retval[i] = list[i].handle;
+    drmFree(list);
+
+    *count = res.count;
+    return retval;
+}
+
+void drmFreeReservedContextList(drm_context_t *pt)
+{
+    drmFree(pt);
+}
+
+/**
+ * Create context.
+ *
+ * Used by the X server during GLXContext initialization. This causes
+ * per-context kernel-level resources to be allocated.
+ *
+ * \param fd file descriptor.
+ * \param handle is set on success. To be used by the client when requesting DMA
+ * dispatch with drmDMA().
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \note May only be called by root.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_ADD_CTX ioctl, passing the
+ * argument in a drm_ctx structure.
+ */
+int drmCreateContext(int fd, drm_context_t *handle)
+{
+    drm_ctx_t ctx;
+
+    memclear(ctx);
+    if (drmIoctl(fd, DRM_IOCTL_ADD_CTX, &ctx))
+	return -errno;
+    *handle = ctx.handle;
+    return 0;
+}
+
+int drmSwitchToContext(int fd, drm_context_t context)
+{
+    drm_ctx_t ctx;
+
+    memclear(ctx);
+    ctx.handle = context;
+    if (drmIoctl(fd, DRM_IOCTL_SWITCH_CTX, &ctx))
+	return -errno;
+    return 0;
+}
+
+int drmSetContextFlags(int fd, drm_context_t context, drm_context_tFlags flags)
+{
+    drm_ctx_t ctx;
+
+    /*
+     * Context preserving means that no context switches are done between DMA
+     * buffers from one context and the next.  This is suitable for use in the
+     * X server (which promises to maintain hardware context), or in the
+     * client-side library when buffers are swapped on behalf of two threads.
+     */
+    memclear(ctx);
+    ctx.handle = context;
+    if (flags & DRM_CONTEXT_PRESERVED)
+	ctx.flags |= _DRM_CONTEXT_PRESERVED;
+    if (flags & DRM_CONTEXT_2DONLY)
+	ctx.flags |= _DRM_CONTEXT_2DONLY;
+    if (drmIoctl(fd, DRM_IOCTL_MOD_CTX, &ctx))
+	return -errno;
+    return 0;
+}
+
+int drmGetContextFlags(int fd, drm_context_t context,
+                       drm_context_tFlagsPtr flags)
+{
+    drm_ctx_t ctx;
+
+    memclear(ctx);
+    ctx.handle = context;
+    if (drmIoctl(fd, DRM_IOCTL_GET_CTX, &ctx))
+	return -errno;
+    *flags = 0;
+    if (ctx.flags & _DRM_CONTEXT_PRESERVED)
+	*flags |= DRM_CONTEXT_PRESERVED;
+    if (ctx.flags & _DRM_CONTEXT_2DONLY)
+	*flags |= DRM_CONTEXT_2DONLY;
+    return 0;
+}
+
+/**
+ * Destroy context.
+ *
+ * Free any kernel-level resources allocated with drmCreateContext() associated
+ * with the context.
+ * 
+ * \param fd file descriptor.
+ * \param handle handle given by drmCreateContext().
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \note May only be called by root.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_RM_CTX ioctl, passing the
+ * argument in a drm_ctx structure.
+ */
+int drmDestroyContext(int fd, drm_context_t handle)
+{
+    drm_ctx_t ctx;
+
+    memclear(ctx);
+    ctx.handle = handle;
+    if (drmIoctl(fd, DRM_IOCTL_RM_CTX, &ctx))
+	return -errno;
+    return 0;
+}
+
+int drmCreateDrawable(int fd, drm_drawable_t *handle)
+{
+    drm_draw_t draw;
+
+    memclear(draw);
+    if (drmIoctl(fd, DRM_IOCTL_ADD_DRAW, &draw))
+	return -errno;
+    *handle = draw.handle;
+    return 0;
+}
+
+int drmDestroyDrawable(int fd, drm_drawable_t handle)
+{
+    drm_draw_t draw;
+
+    memclear(draw);
+    draw.handle = handle;
+    if (drmIoctl(fd, DRM_IOCTL_RM_DRAW, &draw))
+	return -errno;
+    return 0;
+}
+
+int drmUpdateDrawableInfo(int fd, drm_drawable_t handle,
+			   drm_drawable_info_type_t type, unsigned int num,
+			   void *data)
+{
+    drm_update_draw_t update;
+
+    memclear(update);
+    update.handle = handle;
+    update.type = type;
+    update.num = num;
+    update.data = (unsigned long long)(unsigned long)data;
+
+    if (drmIoctl(fd, DRM_IOCTL_UPDATE_DRAW, &update))
+	return -errno;
+
+    return 0;
+}
+
+/**
+ * Acquire the AGP device.
+ *
+ * Must be called before any of the other AGP related calls.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_ACQUIRE ioctl.
+ */
+int drmAgpAcquire(int fd)
+{
+    if (drmIoctl(fd, DRM_IOCTL_AGP_ACQUIRE, NULL))
+	return -errno;
+    return 0;
+}
+
+
+/**
+ * Release the AGP device.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_RELEASE ioctl.
+ */
+int drmAgpRelease(int fd)
+{
+    if (drmIoctl(fd, DRM_IOCTL_AGP_RELEASE, NULL))
+	return -errno;
+    return 0;
+}
+
+
+/**
+ * Set the AGP mode.
+ *
+ * \param fd file descriptor.
+ * \param mode AGP mode.
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_ENABLE ioctl, passing the
+ * argument in a drm_agp_mode structure.
+ */
+int drmAgpEnable(int fd, unsigned long mode)
+{
+    drm_agp_mode_t m;
+
+    memclear(m);
+    m.mode = mode;
+    if (drmIoctl(fd, DRM_IOCTL_AGP_ENABLE, &m))
+	return -errno;
+    return 0;
+}
+
+
+/**
+ * Allocate a chunk of AGP memory.
+ *
+ * \param fd file descriptor.
+ * \param size requested memory size in bytes. Will be rounded to page boundary.
+ * \param type type of memory to allocate.
+ * \param address if not zero, will be set to the physical address of the
+ * allocated memory.
+ * \param handle on success will be set to a handle of the allocated memory.
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_ALLOC ioctl, passing the
+ * arguments in a drm_agp_buffer structure.
+ */
+int drmAgpAlloc(int fd, unsigned long size, unsigned long type,
+		unsigned long *address, drm_handle_t *handle)
+{
+    drm_agp_buffer_t b;
+
+    memclear(b);
+    *handle = DRM_AGP_NO_HANDLE;
+    b.size   = size;
+    b.type   = type;
+    if (drmIoctl(fd, DRM_IOCTL_AGP_ALLOC, &b))
+	return -errno;
+    if (address != 0UL)
+	*address = b.physical;
+    *handle = b.handle;
+    return 0;
+}
+
+
+/**
+ * Free a chunk of AGP memory.
+ *
+ * \param fd file descriptor.
+ * \param handle handle to the allocated memory, as given by drmAgpAllocate().
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_FREE ioctl, passing the
+ * argument in a drm_agp_buffer structure.
+ */
+int drmAgpFree(int fd, drm_handle_t handle)
+{
+    drm_agp_buffer_t b;
+
+    memclear(b);
+    b.handle = handle;
+    if (drmIoctl(fd, DRM_IOCTL_AGP_FREE, &b))
+	return -errno;
+    return 0;
+}
+
+
+/**
+ * Bind a chunk of AGP memory.
+ *
+ * \param fd file descriptor.
+ * \param handle handle to the allocated memory, as given by drmAgpAllocate().
+ * \param offset offset in bytes. It will be rounded to a page boundary.
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_BIND ioctl, passing the
+ * argument in a drm_agp_binding structure.
+ */
+int drmAgpBind(int fd, drm_handle_t handle, unsigned long offset)
+{
+    drm_agp_binding_t b;
+
+    memclear(b);
+    b.handle = handle;
+    b.offset = offset;
+    if (drmIoctl(fd, DRM_IOCTL_AGP_BIND, &b))
+	return -errno;
+    return 0;
+}
+
+
+/**
+ * Unbind a chunk of AGP memory.
+ *
+ * \param fd file descriptor.
+ * \param handle handle to the allocated memory, as given by drmAgpAllocate().
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_UNBIND ioctl, passing
+ * the argument in a drm_agp_binding structure.
+ */
+int drmAgpUnbind(int fd, drm_handle_t handle)
+{
+    drm_agp_binding_t b;
+
+    memclear(b);
+    b.handle = handle;
+    if (drmIoctl(fd, DRM_IOCTL_AGP_UNBIND, &b))
+	return -errno;
+    return 0;
+}
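+
+/*
+ * Usage sketch (illustrative): the AGP calls above are normally used in this
+ * order; the allocation size and type are placeholder values:
+ *
+ *	drm_handle_t h;
+ *	drmAgpAcquire(fd);
+ *	drmAgpEnable(fd, drmAgpGetMode(fd));
+ *	drmAgpAlloc(fd, 1024 * 1024, 0, NULL, &h);
+ *	drmAgpBind(fd, h, 0);
+ *	// ... use the memory ...
+ *	drmAgpUnbind(fd, h);
+ *	drmAgpFree(fd, h);
+ *	drmAgpRelease(fd);
+ */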
+
+
+/**
+ * Get AGP driver major version number.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return major version number on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_INFO ioctl, getting the
+ * necessary information in a drm_agp_info structure.
+ */
+int drmAgpVersionMajor(int fd)
+{
+    drm_agp_info_t i;
+
+    memclear(i);
+
+    if (drmIoctl(fd, DRM_IOCTL_AGP_INFO, &i))
+	return -errno;
+    return i.agp_version_major;
+}
+
+
+/**
+ * Get AGP driver minor version number.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return minor version number on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_INFO ioctl, getting the
+ * necessary information in a drm_agp_info structure.
+ */
+int drmAgpVersionMinor(int fd)
+{
+    drm_agp_info_t i;
+
+    memclear(i);
+
+    if (drmIoctl(fd, DRM_IOCTL_AGP_INFO, &i))
+	return -errno;
+    return i.agp_version_minor;
+}
+
+
+/**
+ * Get AGP mode.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return mode on success, or zero on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_INFO ioctl, getting the
+ * necessary information in a drm_agp_info structure.
+ */
+unsigned long drmAgpGetMode(int fd)
+{
+    drm_agp_info_t i;
+
+    memclear(i);
+
+    if (drmIoctl(fd, DRM_IOCTL_AGP_INFO, &i))
+	return 0;
+    return i.mode;
+}
+
+
+/**
+ * Get AGP aperture base.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return aperture base on success, zero on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_INFO ioctl, getting the
+ * necessary information in a drm_agp_info structure.
+ */
+unsigned long drmAgpBase(int fd)
+{
+    drm_agp_info_t i;
+
+    memclear(i);
+
+    if (drmIoctl(fd, DRM_IOCTL_AGP_INFO, &i))
+	return 0;
+    return i.aperture_base;
+}
+
+
+/**
+ * Get AGP aperture size.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return aperture size on success, zero on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_INFO ioctl, getting the
+ * necessary information in a drm_agp_info structure.
+ */
+unsigned long drmAgpSize(int fd)
+{
+    drm_agp_info_t i;
+
+    memclear(i);
+
+    if (drmIoctl(fd, DRM_IOCTL_AGP_INFO, &i))
+	return 0;
+    return i.aperture_size;
+}
+
+
+/**
+ * Get used AGP memory.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return memory used on success, or zero on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_INFO ioctl, getting the
+ * necessary information in a drm_agp_info structure.
+ */
+unsigned long drmAgpMemoryUsed(int fd)
+{
+    drm_agp_info_t i;
+
+    memclear(i);
+
+    if (drmIoctl(fd, DRM_IOCTL_AGP_INFO, &i))
+	return 0;
+    return i.memory_used;
+}
+
+
+/**
+ * Get available AGP memory.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return memory available on success, or zero on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_INFO ioctl, getting the
+ * necessary information in a drm_agp_info structure.
+ */
+unsigned long drmAgpMemoryAvail(int fd)
+{
+    drm_agp_info_t i;
+
+    memclear(i);
+
+    if (drmIoctl(fd, DRM_IOCTL_AGP_INFO, &i))
+	return 0;
+    return i.memory_allowed;
+}
+
+
+/**
+ * Get hardware vendor ID.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return vendor ID on success, or zero on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_INFO ioctl, getting the
+ * necessary information in a drm_agp_info structure.
+ */
+unsigned int drmAgpVendorId(int fd)
+{
+    drm_agp_info_t i;
+
+    memclear(i);
+
+    if (drmIoctl(fd, DRM_IOCTL_AGP_INFO, &i))
+	return 0;
+    return i.id_vendor;
+}
+
+
+/**
+ * Get hardware device ID.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return device ID on success, or zero on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_AGP_INFO ioctl, getting the
+ * necessary information in a drm_agp_info structure.
+ */
+unsigned int drmAgpDeviceId(int fd)
+{
+    drm_agp_info_t i;
+
+    memclear(i);
+
+    if (drmIoctl(fd, DRM_IOCTL_AGP_INFO, &i))
+	return 0;
+    return i.id_device;
+}
+
+int drmScatterGatherAlloc(int fd, unsigned long size, drm_handle_t *handle)
+{
+    drm_scatter_gather_t sg;
+
+    memclear(sg);
+
+    *handle = 0;
+    sg.size   = size;
+    if (drmIoctl(fd, DRM_IOCTL_SG_ALLOC, &sg))
+	return -errno;
+    *handle = sg.handle;
+    return 0;
+}
+
+int drmScatterGatherFree(int fd, drm_handle_t handle)
+{
+    drm_scatter_gather_t sg;
+
+    memclear(sg);
+    sg.handle = handle;
+    if (drmIoctl(fd, DRM_IOCTL_SG_FREE, &sg))
+	return -errno;
+    return 0;
+}
+
+/**
+ * Wait for VBLANK.
+ *
+ * \param fd file descriptor.
+ * \param vbl pointer to a drmVBlank structure.
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_WAIT_VBLANK ioctl.
+ */
+int drmWaitVBlank(int fd, drmVBlankPtr vbl)
+{
+    struct timespec timeout, cur;
+    int ret;
+
+    ret = clock_gettime(CLOCK_MONOTONIC, &timeout);
+    if (ret < 0) {
+	fprintf(stderr, "clock_gettime failed: %s\n", strerror(errno));
+	goto out;
+    }
+    timeout.tv_sec++;
+
+    do {
+       ret = ioctl(fd, DRM_IOCTL_WAIT_VBLANK, vbl);
+       vbl->request.type &= ~DRM_VBLANK_RELATIVE;
+       if (ret && errno == EINTR) {
+	       clock_gettime(CLOCK_MONOTONIC, &cur);
+	       /* Timeout after 1s */
+	       if (cur.tv_sec > timeout.tv_sec + 1 ||
+		   (cur.tv_sec == timeout.tv_sec && cur.tv_nsec >=
+		    timeout.tv_nsec)) {
+		       errno = EBUSY;
+		       ret = -1;
+		       break;
+	       }
+       }
+    } while (ret && errno == EINTR);
+
+out:
+    return ret;
+}
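+
+/*
+ * Usage sketch (illustrative): wait for the next vertical blank on the first
+ * CRTC:
+ *
+ *	drmVBlank vbl;
+ *	memset(&vbl, 0, sizeof(vbl));
+ *	vbl.request.type     = DRM_VBLANK_RELATIVE;
+ *	vbl.request.sequence = 1;
+ *	drmWaitVBlank(fd, &vbl);
+ */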
+
+int drmError(int err, const char *label)
+{
+    switch (err) {
+    case DRM_ERR_NO_DEVICE:
+	fprintf(stderr, "%s: no device\n", label);
+	break;
+    case DRM_ERR_NO_ACCESS:
+	fprintf(stderr, "%s: no access\n", label);
+	break;
+    case DRM_ERR_NOT_ROOT:
+	fprintf(stderr, "%s: not root\n", label);
+	break;
+    case DRM_ERR_INVALID:
+	fprintf(stderr, "%s: invalid args\n", label);
+	break;
+    default:
+	if (err < 0)
+	    err = -err;
+	fprintf( stderr, "%s: error %d (%s)\n", label, err, strerror(err) );
+	break;
+    }
+
+    return 1;
+}
+
+/**
+ * Install IRQ handler.
+ *
+ * \param fd file descriptor.
+ * \param irq IRQ number.
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_CONTROL ioctl, passing the
+ * argument in a drm_control structure.
+ */
+int drmCtlInstHandler(int fd, int irq)
+{
+    drm_control_t ctl;
+
+    memclear(ctl);
+    ctl.func  = DRM_INST_HANDLER;
+    ctl.irq   = irq;
+    if (drmIoctl(fd, DRM_IOCTL_CONTROL, &ctl))
+	return -errno;
+    return 0;
+}
+
+
+/**
+ * Uninstall IRQ handler.
+ *
+ * \param fd file descriptor.
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_CONTROL ioctl, passing the
+ * argument in a drm_control structure.
+ */
+int drmCtlUninstHandler(int fd)
+{
+    drm_control_t ctl;
+
+    memclear(ctl);
+    ctl.func  = DRM_UNINST_HANDLER;
+    ctl.irq   = 0;
+    if (drmIoctl(fd, DRM_IOCTL_CONTROL, &ctl))
+	return -errno;
+    return 0;
+}
+
+int drmFinish(int fd, int context, drmLockFlags flags)
+{
+    drm_lock_t lock;
+
+    memclear(lock);
+    lock.context = context;
+    if (flags & DRM_LOCK_READY)      lock.flags |= _DRM_LOCK_READY;
+    if (flags & DRM_LOCK_QUIESCENT)  lock.flags |= _DRM_LOCK_QUIESCENT;
+    if (flags & DRM_LOCK_FLUSH)      lock.flags |= _DRM_LOCK_FLUSH;
+    if (flags & DRM_LOCK_FLUSH_ALL)  lock.flags |= _DRM_LOCK_FLUSH_ALL;
+    if (flags & DRM_HALT_ALL_QUEUES) lock.flags |= _DRM_HALT_ALL_QUEUES;
+    if (flags & DRM_HALT_CUR_QUEUES) lock.flags |= _DRM_HALT_CUR_QUEUES;
+    if (drmIoctl(fd, DRM_IOCTL_FINISH, &lock))
+	return -errno;
+    return 0;
+}
+
+/**
+ * Get IRQ from bus ID.
+ *
+ * \param fd file descriptor.
+ * \param busnum bus number.
+ * \param devnum device number.
+ * \param funcnum function number.
+ * 
+ * \return IRQ number on success, or a negative value on failure.
+ * 
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_IRQ_BUSID ioctl, passing the
+ * arguments in a drm_irq_busid structure.
+ */
+int drmGetInterruptFromBusID(int fd, int busnum, int devnum, int funcnum)
+{
+    drm_irq_busid_t p;
+
+    memclear(p);
+    p.busnum  = busnum;
+    p.devnum  = devnum;
+    p.funcnum = funcnum;
+    if (drmIoctl(fd, DRM_IOCTL_IRQ_BUSID, &p))
+	return -errno;
+    return p.irq;
+}
+
+int drmAddContextTag(int fd, drm_context_t context, void *tag)
+{
+    drmHashEntry  *entry = drmGetEntry(fd);
+
+    if (drmHashInsert(entry->tagTable, context, tag)) {
+	drmHashDelete(entry->tagTable, context);
+	drmHashInsert(entry->tagTable, context, tag);
+    }
+    return 0;
+}
+
+int drmDelContextTag(int fd, drm_context_t context)
+{
+    drmHashEntry  *entry = drmGetEntry(fd);
+
+    return drmHashDelete(entry->tagTable, context);
+}
+
+void *drmGetContextTag(int fd, drm_context_t context)
+{
+    drmHashEntry  *entry = drmGetEntry(fd);
+    void          *value;
+
+    if (drmHashLookup(entry->tagTable, context, &value))
+	return NULL;
+
+    return value;
+}
+
+int drmAddContextPrivateMapping(int fd, drm_context_t ctx_id,
+                                drm_handle_t handle)
+{
+    drm_ctx_priv_map_t map;
+
+    memclear(map);
+    map.ctx_id = ctx_id;
+    map.handle = (void *)(uintptr_t)handle;
+
+    if (drmIoctl(fd, DRM_IOCTL_SET_SAREA_CTX, &map))
+	return -errno;
+    return 0;
+}
+
+int drmGetContextPrivateMapping(int fd, drm_context_t ctx_id,
+                                drm_handle_t *handle)
+{
+    drm_ctx_priv_map_t map;
+
+    memclear(map);
+    map.ctx_id = ctx_id;
+
+    if (drmIoctl(fd, DRM_IOCTL_GET_SAREA_CTX, &map))
+	return -errno;
+    if (handle)
+	*handle = (drm_handle_t)(uintptr_t)map.handle;
+
+    return 0;
+}
+
+int drmGetMap(int fd, int idx, drm_handle_t *offset, drmSize *size,
+	      drmMapType *type, drmMapFlags *flags, drm_handle_t *handle,
+	      int *mtrr)
+{
+    drm_map_t map;
+
+    memclear(map);
+    map.offset = idx;
+    if (drmIoctl(fd, DRM_IOCTL_GET_MAP, &map))
+	return -errno;
+    *offset = map.offset;
+    *size   = map.size;
+    *type   = map.type;
+    *flags  = map.flags;
+    *handle = (unsigned long)map.handle;
+    *mtrr   = map.mtrr;
+    return 0;
+}
+
+int drmGetClient(int fd, int idx, int *auth, int *pid, int *uid,
+		 unsigned long *magic, unsigned long *iocs)
+{
+    drm_client_t client;
+
+    memclear(client);
+    client.idx = idx;
+    if (drmIoctl(fd, DRM_IOCTL_GET_CLIENT, &client))
+	return -errno;
+    *auth      = client.auth;
+    *pid       = client.pid;
+    *uid       = client.uid;
+    *magic     = client.magic;
+    *iocs      = client.iocs;
+    return 0;
+}
+
+int drmGetStats(int fd, drmStatsT *stats)
+{
+    drm_stats_t s;
+    unsigned    i;
+
+    memclear(s);
+    if (drmIoctl(fd, DRM_IOCTL_GET_STATS, &s))
+	return -errno;
+
+    memset(stats, 0, sizeof(*stats));
+    if (s.count > sizeof(stats->data)/sizeof(stats->data[0]))
+	return -1;
+
+#define SET_VALUE                              \
+    stats->data[i].long_format = "%-20.20s";   \
+    stats->data[i].rate_format = "%8.8s";      \
+    stats->data[i].isvalue     = 1;            \
+    stats->data[i].verbose     = 0
+
+#define SET_COUNT                              \
+    stats->data[i].long_format = "%-20.20s";   \
+    stats->data[i].rate_format = "%5.5s";      \
+    stats->data[i].isvalue     = 0;            \
+    stats->data[i].mult_names  = "kgm";        \
+    stats->data[i].mult        = 1000;         \
+    stats->data[i].verbose     = 0
+
+#define SET_BYTE                               \
+    stats->data[i].long_format = "%-20.20s";   \
+    stats->data[i].rate_format = "%5.5s";      \
+    stats->data[i].isvalue     = 0;            \
+    stats->data[i].mult_names  = "KGM";        \
+    stats->data[i].mult        = 1024;         \
+    stats->data[i].verbose     = 0
+
+
+    stats->count = s.count;
+    for (i = 0; i < s.count; i++) {
+	stats->data[i].value = s.data[i].value;
+	switch (s.data[i].type) {
+	case _DRM_STAT_LOCK:
+	    stats->data[i].long_name = "Lock";
+	    stats->data[i].rate_name = "Lock";
+	    SET_VALUE;
+	    break;
+	case _DRM_STAT_OPENS:
+	    stats->data[i].long_name = "Opens";
+	    stats->data[i].rate_name = "O";
+	    SET_COUNT;
+	    stats->data[i].verbose   = 1;
+	    break;
+	case _DRM_STAT_CLOSES:
+	    stats->data[i].long_name = "Closes";
+	    stats->data[i].rate_name = "Cls/s";
+	    SET_COUNT;
+	    stats->data[i].verbose   = 1;
+	    break;
+	case _DRM_STAT_IOCTLS:
+	    stats->data[i].long_name = "Ioctls";
+	    stats->data[i].rate_name = "Ioc/s";
+	    SET_COUNT;
+	    break;
+	case _DRM_STAT_LOCKS:
+	    stats->data[i].long_name = "Locks";
+	    stats->data[i].rate_name = "Lck/s";
+	    SET_COUNT;
+	    break;
+	case _DRM_STAT_UNLOCKS:
+	    stats->data[i].long_name = "Unlocks";
+	    stats->data[i].rate_name = "Unl/s";
+	    SET_COUNT;
+	    break;
+	case _DRM_STAT_IRQ:
+	    stats->data[i].long_name = "IRQs";
+	    stats->data[i].rate_name = "IRQ/s";
+	    SET_COUNT;
+	    break;
+	case _DRM_STAT_PRIMARY:
+	    stats->data[i].long_name = "Primary Bytes";
+	    stats->data[i].rate_name = "PB/s";
+	    SET_BYTE;
+	    break;
+	case _DRM_STAT_SECONDARY:
+	    stats->data[i].long_name = "Secondary Bytes";
+	    stats->data[i].rate_name = "SB/s";
+	    SET_BYTE;
+	    break;
+	case _DRM_STAT_DMA:
+	    stats->data[i].long_name = "DMA";
+	    stats->data[i].rate_name = "DMA/s";
+	    SET_COUNT;
+	    break;
+	case _DRM_STAT_SPECIAL:
+	    stats->data[i].long_name = "Special DMA";
+	    stats->data[i].rate_name = "dma/s";
+	    SET_COUNT;
+	    break;
+	case _DRM_STAT_MISSED:
+	    stats->data[i].long_name = "Miss";
+	    stats->data[i].rate_name = "Ms/s";
+	    SET_COUNT;
+	    break;
+	case _DRM_STAT_VALUE:
+	    stats->data[i].long_name = "Value";
+	    stats->data[i].rate_name = "Value";
+	    SET_VALUE;
+	    break;
+	case _DRM_STAT_BYTE:
+	    stats->data[i].long_name = "Bytes";
+	    stats->data[i].rate_name = "B/s";
+	    SET_BYTE;
+	    break;
+	case _DRM_STAT_COUNT:
+	default:
+	    stats->data[i].long_name = "Count";
+	    stats->data[i].rate_name = "Cnt/s";
+	    SET_COUNT;
+	    break;
+	}
+    }
+    return 0;
+}
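+
+/*
+ * Consumption sketch for drmGetStats() (illustrative; assumes an open fd):
+ * the long_name/long_format pairs filled in above are intended to be fed
+ * straight into printf-style formatting.
+ *
+ *     drmStatsT      st;
+ *     unsigned long  i;
+ *
+ *     if (drmGetStats(fd, &st) == 0)
+ *         for (i = 0; i < st.count; i++)
+ *             printf("%-20.20s %lu\n", st.data[i].long_name,
+ *                    st.data[i].value);
+ */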
+
+/**
+ * Issue a set-version ioctl.
+ *
+ * \param fd file descriptor.
+ * \param version requested interface version; updated on return with the
+ * version the kernel actually selected.
+ *
+ * \return zero on success, or a negative value on failure.
+ *
+ * \internal
+ * This function is a wrapper around the DRM_IOCTL_SET_VERSION ioctl, passing
+ * the arguments in a drm_set_version structure.
+ */
+int drmSetInterfaceVersion(int fd, drmSetVersion *version)
+{
+    int retcode = 0;
+    drm_set_version_t sv;
+
+    memclear(sv);
+    sv.drm_di_major = version->drm_di_major;
+    sv.drm_di_minor = version->drm_di_minor;
+    sv.drm_dd_major = version->drm_dd_major;
+    sv.drm_dd_minor = version->drm_dd_minor;
+
+    if (drmIoctl(fd, DRM_IOCTL_SET_VERSION, &sv)) {
+	retcode = -errno;
+    }
+
+    version->drm_di_major = sv.drm_di_major;
+    version->drm_di_minor = sv.drm_di_minor;
+    version->drm_dd_major = sv.drm_dd_major;
+    version->drm_dd_minor = sv.drm_dd_minor;
+
+    return retcode;
+}
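+
+/*
+ * Usage sketch (illustrative; requesting DRM interface 1.4 is an assumption
+ * for the example, and -1 asks the kernel to leave that component alone):
+ *
+ *     drmSetVersion sv;
+ *
+ *     sv.drm_di_major = 1;
+ *     sv.drm_di_minor = 4;
+ *     sv.drm_dd_major = -1;
+ *     sv.drm_dd_minor = -1;
+ *     if (drmSetInterfaceVersion(fd, &sv))
+ *         fprintf(stderr, "interface 1.4 not supported\n");
+ */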
+
+/**
+ * Send a device-specific command.
+ *
+ * \param fd file descriptor.
+ * \param drmCommandIndex command index 
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * It issues an ioctl given by
+ * \code DRM_COMMAND_BASE + drmCommandIndex \endcode.
+ */
+int drmCommandNone(int fd, unsigned long drmCommandIndex)
+{
+    unsigned long request;
+
+    request = DRM_IO( DRM_COMMAND_BASE + drmCommandIndex);
+
+    if (drmIoctl(fd, request, NULL)) {
+	return -errno;
+    }
+    return 0;
+}
+
+
+/**
+ * Send a device-specific read command.
+ *
+ * \param fd file descriptor.
+ * \param drmCommandIndex command index 
+ * \param data destination pointer of the data to be read.
+ * \param size size of the data to be read.
+ * 
+ * \return zero on success, or a negative value on failure.
+ *
+ * \internal
+ * It issues a read ioctl given by 
+ * \code DRM_COMMAND_BASE + drmCommandIndex \endcode.
+ */
+int drmCommandRead(int fd, unsigned long drmCommandIndex, void *data,
+                   unsigned long size)
+{
+    unsigned long request;
+
+    request = DRM_IOC( DRM_IOC_READ, DRM_IOCTL_BASE, 
+	DRM_COMMAND_BASE + drmCommandIndex, size);
+
+    if (drmIoctl(fd, request, data)) {
+	return -errno;
+    }
+    return 0;
+}
+
+
+/**
+ * Send a device-specific write command.
+ *
+ * \param fd file descriptor.
+ * \param drmCommandIndex command index 
+ * \param data source pointer of the data to be written.
+ * \param size size of the data to be written.
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * It issues a write ioctl given by 
+ * \code DRM_COMMAND_BASE + drmCommandIndex \endcode.
+ */
+int drmCommandWrite(int fd, unsigned long drmCommandIndex, void *data,
+                    unsigned long size)
+{
+    unsigned long request;
+
+    request = DRM_IOC( DRM_IOC_WRITE, DRM_IOCTL_BASE, 
+	DRM_COMMAND_BASE + drmCommandIndex, size);
+
+    if (drmIoctl(fd, request, data)) {
+	return -errno;
+    }
+    return 0;
+}
+
+
+/**
+ * Send a device-specific read-write command.
+ *
+ * \param fd file descriptor.
+ * \param drmCommandIndex command index 
+ * \param data source pointer of the data to be read and written.
+ * \param size size of the data to be read and written.
+ * 
+ * \return zero on success, or a negative value on failure.
+ * 
+ * \internal
+ * It issues a read-write ioctl given by 
+ * \code DRM_COMMAND_BASE + drmCommandIndex \endcode.
+ */
+int drmCommandWriteRead(int fd, unsigned long drmCommandIndex, void *data,
+                        unsigned long size)
+{
+    unsigned long request;
+
+    request = DRM_IOC( DRM_IOC_READ|DRM_IOC_WRITE, DRM_IOCTL_BASE, 
+	DRM_COMMAND_BASE + drmCommandIndex, size);
+
+    if (drmIoctl(fd, request, data))
+	return -errno;
+    return 0;
+}
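+
+/*
+ * Usage sketch for the drmCommand*() wrappers (illustrative; the command
+ * index 0x02 and the payload layout are hypothetical, driver-specific
+ * values, not definitions from this file):
+ *
+ *     struct my_getparam { int param; int value; } gp = { 1, 0 };
+ *
+ *     if (drmCommandWriteRead(fd, 0x02, &gp, sizeof(gp)) == 0)
+ *         printf("param 1 = %d\n", gp.value);
+ */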
+
+#define DRM_MAX_FDS 16
+static struct {
+    char *BusID;
+    int fd;
+    int refcount;
+    int type;
+} connection[DRM_MAX_FDS];
+
+static int nr_fds = 0;
+
+int drmOpenOnce(void *unused, 
+		const char *BusID,
+		int *newlyopened)
+{
+    return drmOpenOnceWithType(BusID, newlyopened, DRM_NODE_PRIMARY);
+}
+
+int drmOpenOnceWithType(const char *BusID, int *newlyopened, int type)
+{
+    int i;
+    int fd;
+   
+    for (i = 0; i < nr_fds; i++)
+	if ((strcmp(BusID, connection[i].BusID) == 0) &&
+	    (connection[i].type == type)) {
+	    connection[i].refcount++;
+	    *newlyopened = 0;
+	    return connection[i].fd;
+	}
+
+    fd = drmOpenWithType(NULL, BusID, type);
+    /* Report a fresh open even when the fd cannot be cached below. */
+    *newlyopened = 1;
+    if (fd <= 0 || nr_fds == DRM_MAX_FDS)
+	return fd;
+
+    connection[nr_fds].BusID = strdup(BusID);
+    connection[nr_fds].fd = fd;
+    connection[nr_fds].refcount = 1;
+    connection[nr_fds].type = type;
+
+    if (0)
+	fprintf(stderr, "saved connection %d for %s %d\n", 
+		nr_fds, connection[nr_fds].BusID, 
+		strcmp(BusID, connection[nr_fds].BusID));
+
+    nr_fds++;
+
+    return fd;
+}
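+
+/*
+ * Usage sketch (illustrative; the PCI bus ID string is an assumed example):
+ * repeated opens of the same BusID/type pair share one cached fd, so every
+ * drmOpenOnce() must be balanced by a drmCloseOnce().
+ *
+ *     int newly;
+ *     int fd = drmOpenOnce(NULL, "pci:0000:00:02.0", &newly);
+ *     ...
+ *     drmCloseOnce(fd);   // really closes only when the refcount drops to 0
+ */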
+
+void drmCloseOnce(int fd)
+{
+    int i;
+
+    for (i = 0; i < nr_fds; i++) {
+	if (fd == connection[i].fd) {
+	    if (--connection[i].refcount == 0) {
+		drmClose(connection[i].fd);
+		free(connection[i].BusID);
+	    
+		if (i < --nr_fds) 
+		    connection[i] = connection[nr_fds];
+
+		return;
+	    }
+	}
+    }
+}
+
+int drmSetMaster(int fd)
+{
+	return drmIoctl(fd, DRM_IOCTL_SET_MASTER, NULL);
+}
+
+int drmDropMaster(int fd)
+{
+	return drmIoctl(fd, DRM_IOCTL_DROP_MASTER, NULL);
+}
+
+char *drmGetDeviceNameFromFd(int fd)
+{
+	char name[128];
+	struct stat sbuf;
+	dev_t d;
+	int i;
+
+	/* The whole drmOpen thing is a fiasco and we need to find a way
+	 * back to just using open(2).  For now, however, lets just make
+	 * things worse with even more ad hoc directory walking code to
+	 * discover the device file name. */
+
+	fstat(fd, &sbuf);
+	d = sbuf.st_rdev;
+
+	for (i = 0; i < DRM_MAX_MINOR; i++) {
+		snprintf(name, sizeof name, DRM_DEV_NAME, DRM_DIR_NAME, i);
+		if (stat(name, &sbuf) == 0 && sbuf.st_rdev == d)
+			break;
+	}
+	if (i == DRM_MAX_MINOR)
+		return NULL;
+
+	return strdup(name);
+}
+
+int drmGetNodeTypeFromFd(int fd)
+{
+	struct stat sbuf;
+	int maj, min, type;
+
+	if (fstat(fd, &sbuf))
+		return -1;
+
+	maj = major(sbuf.st_rdev);
+	min = minor(sbuf.st_rdev);
+
+	if (maj != DRM_MAJOR || !S_ISCHR(sbuf.st_mode)) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	type = drmGetMinorType(min);
+	if (type == -1)
+		errno = ENODEV;
+	return type;
+}
+
+int drmPrimeHandleToFD(int fd, uint32_t handle, uint32_t flags, int *prime_fd)
+{
+	struct drm_prime_handle args;
+	int ret;
+
+	memclear(args);
+	args.handle = handle;
+	args.flags = flags;
+	ret = drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args);
+	if (ret)
+		return ret;
+
+	*prime_fd = args.fd;
+	return 0;
+}
+
+int drmPrimeFDToHandle(int fd, int prime_fd, uint32_t *handle)
+{
+	struct drm_prime_handle args;
+	int ret;
+
+	memclear(args);
+	args.fd = prime_fd;
+	ret = drmIoctl(fd, DRM_IOCTL_PRIME_FD_TO_HANDLE, &args);
+	if (ret)
+		return ret;
+
+	*handle = args.handle;
+	return 0;
+}
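+
+/*
+ * PRIME round-trip sketch (illustrative; assumes `handle` is a valid GEM
+ * handle on fd_a and that fd_a and fd_b are open DRM device fds):
+ *
+ *     int prime_fd;
+ *     uint32_t imported;
+ *
+ *     if (drmPrimeHandleToFD(fd_a, handle, DRM_CLOEXEC, &prime_fd) == 0 &&
+ *         drmPrimeFDToHandle(fd_b, prime_fd, &imported) == 0)
+ *         ;   // `imported` now names the same buffer on fd_b
+ */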
+
+static char *drmGetMinorNameForFD(int fd, int type)
+{
+#ifdef __linux__
+	DIR *sysdir;
+	struct dirent *pent, *ent;
+	struct stat sbuf;
+	const char *name = drmGetMinorName(type);
+	int len;
+	char dev_name[64], buf[64];
+	long name_max;
+	int maj, min;
+
+	if (!name)
+		return NULL;
+
+	len = strlen(name);
+
+	if (fstat(fd, &sbuf))
+		return NULL;
+
+	maj = major(sbuf.st_rdev);
+	min = minor(sbuf.st_rdev);
+
+	if (maj != DRM_MAJOR || !S_ISCHR(sbuf.st_mode))
+		return NULL;
+
+	snprintf(buf, sizeof(buf), "/sys/dev/char/%d:%d/device/drm", maj, min);
+
+	sysdir = opendir(buf);
+	if (!sysdir)
+		return NULL;
+
+	name_max = fpathconf(dirfd(sysdir), _PC_NAME_MAX);
+	if (name_max == -1)
+		goto out_close_dir;
+
+	pent = malloc(offsetof(struct dirent, d_name) + name_max + 1);
+	if (pent == NULL)
+		 goto out_close_dir;
+
+	while (readdir_r(sysdir, pent, &ent) == 0 && ent != NULL) {
+		if (strncmp(ent->d_name, name, len) == 0) {
+			free(pent);
+			closedir(sysdir);
+
+			snprintf(dev_name, sizeof(dev_name), DRM_DIR_NAME "/%s",
+				 ent->d_name);
+			return strdup(dev_name);
+		}
+	}
+
+	free(pent);
+
+out_close_dir:
+	closedir(sysdir);
+#endif
+	return NULL;
+}
+
+char *drmGetPrimaryDeviceNameFromFd(int fd)
+{
+	return drmGetMinorNameForFD(fd, DRM_NODE_PRIMARY);
+}
+
+char *drmGetRenderDeviceNameFromFd(int fd)
+{
+	return drmGetMinorNameForFD(fd, DRM_NODE_RENDER);
+}
diff --git a/icd/intel/kmd/libdrm/xf86drm.h b/icd/intel/kmd/libdrm/xf86drm.h
new file mode 100644
index 0000000..40c55c9
--- /dev/null
+++ b/icd/intel/kmd/libdrm/xf86drm.h
@@ -0,0 +1,759 @@
+/**
+ * \file xf86drm.h 
+ * OS-independent header for DRM user-level library interface.
+ *
+ * \author Rickard E. (Rik) Faith <faith@valinux.com>
+ */
+ 
+/*
+ * Copyright 1999, 2000 Precision Insight, Inc., Cedar Park, Texas.
+ * Copyright 2000 VA Linux Systems, Inc., Sunnyvale, California.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _XF86DRM_H_
+#define _XF86DRM_H_
+
+#include <stdarg.h>
+#include <sys/types.h>
+#include <stdint.h>
+#include <drm.h>
+
+#if defined(__cplusplus) || defined(c_plusplus)
+extern "C" {
+#endif
+
+#ifndef DRM_MAX_MINOR
+#define DRM_MAX_MINOR   16
+#endif
+
+#if defined(__linux__)
+
+#define DRM_IOCTL_NR(n)		_IOC_NR(n)
+#define DRM_IOC_VOID		_IOC_NONE
+#define DRM_IOC_READ		_IOC_READ
+#define DRM_IOC_WRITE		_IOC_WRITE
+#define DRM_IOC_READWRITE	(_IOC_READ|_IOC_WRITE)
+#define DRM_IOC(dir, group, nr, size) _IOC(dir, group, nr, size)
+
+#else /* One of the *BSDs */
+
+#include <sys/ioccom.h>
+#define DRM_IOCTL_NR(n)         ((n) & 0xff)
+#define DRM_IOC_VOID            IOC_VOID
+#define DRM_IOC_READ            IOC_OUT
+#define DRM_IOC_WRITE           IOC_IN
+#define DRM_IOC_READWRITE       IOC_INOUT
+#define DRM_IOC(dir, group, nr, size) _IOC(dir, group, nr, size)
+
+#endif
+
+				/* Defaults, if nothing set in xf86config */
+#define DRM_DEV_UID	 0
+#define DRM_DEV_GID	 0
+/* Default /dev/dri directory permissions 0755 */
+#define DRM_DEV_DIRMODE	 	\
+	(S_IRUSR|S_IWUSR|S_IXUSR|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH)
+#define DRM_DEV_MODE	 (S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP)
+
+#define DRM_DIR_NAME  "/dev/dri"
+#define DRM_DEV_NAME  "%s/card%d"
+#define DRM_CONTROL_DEV_NAME  "%s/controlD%d"
+#define DRM_RENDER_DEV_NAME  "%s/renderD%d"
+#define DRM_PROC_NAME "/proc/dri/" /* For backward Linux compatibility */
+
+#define DRM_ERR_NO_DEVICE  (-1001)
+#define DRM_ERR_NO_ACCESS  (-1002)
+#define DRM_ERR_NOT_ROOT   (-1003)
+#define DRM_ERR_INVALID    (-1004)
+#define DRM_ERR_NO_FD      (-1005)
+
+#define DRM_AGP_NO_HANDLE 0
+
+typedef unsigned int  drmSize,     *drmSizePtr;	    /**< For mapped regions */
+typedef void          *drmAddress, **drmAddressPtr; /**< For mapped regions */
+
+#if (__GNUC__ >= 3)
+#define DRM_PRINTFLIKE(f, a) __attribute__ ((format(__printf__, f, a)))
+#else
+#define DRM_PRINTFLIKE(f, a)
+#endif
+
+typedef struct _drmServerInfo {
+  int (*debug_print)(const char *format, va_list ap) DRM_PRINTFLIKE(1,0);
+  int (*load_module)(const char *name);
+  void (*get_perms)(gid_t *, mode_t *);
+} drmServerInfo, *drmServerInfoPtr;
+
+typedef struct drmHashEntry {
+    int      fd;
+    void     (*f)(int, void *, void *);
+    void     *tagTable;
+} drmHashEntry;
+
+extern int drmIoctl(int fd, unsigned long request, void *arg);
+extern void *drmGetHashTable(void);
+extern drmHashEntry *drmGetEntry(int fd);
+
+/**
+ * Driver version information.
+ *
+ * \sa drmGetVersion() and drmSetVersion().
+ */
+typedef struct _drmVersion {
+    int     version_major;        /**< Major version */
+    int     version_minor;        /**< Minor version */
+    int     version_patchlevel;   /**< Patch level */
+    int     name_len; 	          /**< Length of name buffer */
+    char    *name;	          /**< Name of driver */
+    int     date_len;             /**< Length of date buffer */
+    char    *date;                /**< User-space buffer to hold date */
+    int     desc_len;	          /**< Length of desc buffer */
+    char    *desc;                /**< User-space buffer to hold desc */
+} drmVersion, *drmVersionPtr;
+
+typedef struct _drmStats {
+    unsigned long count;	     /**< Number of data */
+    struct {
+	unsigned long value;	     /**< Value from kernel */
+	const char    *long_format;  /**< Suggested format for long_name */
+	const char    *long_name;    /**< Long name for value */
+	const char    *rate_format;  /**< Suggested format for rate_name */
+	const char    *rate_name;    /**< Short name for value per second */
+	int           isvalue;       /**< True if value (vs. counter) */
+	const char    *mult_names;   /**< Multiplier names (e.g., "KGM") */
+	int           mult;          /**< Multiplier value (e.g., 1024) */
+	int           verbose;       /**< Suggest only in verbose output */
+    } data[15];
+} drmStatsT;
+
+
+				/* All of these enums *MUST* match with the
+                                   kernel implementation -- so do *NOT*
+                                   change them!  (The drmlib implementation
+                                   will just copy the flags instead of
+                                   translating them.) */
+typedef enum {
+    DRM_FRAME_BUFFER    = 0,      /**< WC, no caching, no core dump */
+    DRM_REGISTERS       = 1,      /**< no caching, no core dump */
+    DRM_SHM             = 2,      /**< shared, cached */
+    DRM_AGP             = 3,	  /**< AGP/GART */
+    DRM_SCATTER_GATHER  = 4,	  /**< PCI scatter/gather */
+    DRM_CONSISTENT      = 5	  /**< PCI consistent */
+} drmMapType;
+
+typedef enum {
+    DRM_RESTRICTED      = 0x0001, /**< Cannot be mapped to client-virtual */
+    DRM_READ_ONLY       = 0x0002, /**< Read-only in client-virtual */
+    DRM_LOCKED          = 0x0004, /**< Physical pages locked */
+    DRM_KERNEL          = 0x0008, /**< Kernel requires access */
+    DRM_WRITE_COMBINING = 0x0010, /**< Use write-combining, if available */
+    DRM_CONTAINS_LOCK   = 0x0020, /**< SHM page that contains lock */
+    DRM_REMOVABLE	= 0x0040  /**< Removable mapping */
+} drmMapFlags;
+
+/**
+ * \warning These values *MUST* match drm.h
+ */
+typedef enum {
+    /** \name Flags for DMA buffer dispatch */
+    /*@{*/
+    DRM_DMA_BLOCK        = 0x01, /**< 
+				  * Block until buffer dispatched.
+				  * 
+				  * \note the buffer may not yet have been
+				  * processed by the hardware -- getting a
+				  * hardware lock with the hardware quiescent
+				  * will ensure that the buffer has been
+				  * processed.
+				  */
+    DRM_DMA_WHILE_LOCKED = 0x02, /**< Dispatch while lock held */
+    DRM_DMA_PRIORITY     = 0x04, /**< High priority dispatch */
+    /*@}*/
+
+    /** \name Flags for DMA buffer request */
+    /*@{*/
+    DRM_DMA_WAIT         = 0x10, /**< Wait for free buffers */
+    DRM_DMA_SMALLER_OK   = 0x20, /**< Smaller-than-requested buffers OK */
+    DRM_DMA_LARGER_OK    = 0x40  /**< Larger-than-requested buffers OK */
+    /*@}*/
+} drmDMAFlags;
+
+typedef enum {
+    DRM_PAGE_ALIGN       = 0x01,
+    DRM_AGP_BUFFER       = 0x02,
+    DRM_SG_BUFFER        = 0x04,
+    DRM_FB_BUFFER        = 0x08,
+    DRM_PCI_BUFFER_RO    = 0x10
+} drmBufDescFlags;
+
+typedef enum {
+    DRM_LOCK_READY      = 0x01, /**< Wait until hardware is ready for DMA */
+    DRM_LOCK_QUIESCENT  = 0x02, /**< Wait until hardware quiescent */
+    DRM_LOCK_FLUSH      = 0x04, /**< Flush this context's DMA queue first */
+    DRM_LOCK_FLUSH_ALL  = 0x08, /**< Flush all DMA queues first */
+				/* These *HALT* flags aren't supported yet
+                                   -- they will be used to support the
+                                   full-screen DGA-like mode. */
+    DRM_HALT_ALL_QUEUES = 0x10, /**< Halt all current and future queues */
+    DRM_HALT_CUR_QUEUES = 0x20  /**< Halt all current queues */
+} drmLockFlags;
+
+typedef enum {
+    DRM_CONTEXT_PRESERVED = 0x01, /**< This context is preserved and
+				     never swapped. */
+    DRM_CONTEXT_2DONLY    = 0x02  /**< This context is for 2D rendering only. */
+} drm_context_tFlags, *drm_context_tFlagsPtr;
+
+typedef struct _drmBufDesc {
+    int              count;	  /**< Number of buffers of this size */
+    int              size;	  /**< Size in bytes */
+    int              low_mark;	  /**< Low water mark */
+    int              high_mark;	  /**< High water mark */
+} drmBufDesc, *drmBufDescPtr;
+
+typedef struct _drmBufInfo {
+    int              count;	  /**< Number of buffers described in list */
+    drmBufDescPtr    list;	  /**< List of buffer descriptions */
+} drmBufInfo, *drmBufInfoPtr;
+
+typedef struct _drmBuf {
+    int              idx;	  /**< Index into the master buffer list */
+    int              total;	  /**< Buffer size */
+    int              used;	  /**< Amount of buffer in use (for DMA) */
+    drmAddress       address;	  /**< Address */
+} drmBuf, *drmBufPtr;
+
+/**
+ * Buffer mapping information.
+ *
+ * Used by drmMapBufs() and drmUnmapBufs() to store information about the
+ * mapped buffers.
+ */
+typedef struct _drmBufMap {
+    int              count;	  /**< Number of buffers mapped */
+    drmBufPtr        list;	  /**< Buffers */
+} drmBufMap, *drmBufMapPtr;
+
+typedef struct _drmLock {
+    volatile unsigned int lock;
+    char                      padding[60];
+    /* This is big enough for most current (and future?) architectures:
+       DEC Alpha:              32 bytes
+       Intel Merced:           ?
+       Intel P5/PPro/PII/PIII: 32 bytes
+       Intel StrongARM:        32 bytes
+       Intel i386/i486:        16 bytes
+       MIPS:                   32 bytes (?)
+       Motorola 68k:           16 bytes
+       Motorola PowerPC:       32 bytes
+       Sun SPARC:              32 bytes
+    */
+} drmLock, *drmLockPtr;
+
+/**
+ * Indices here refer to the offset into
+ * list in drmBufInfo
+ */
+typedef struct _drmDMAReq {
+    drm_context_t    context;  	  /**< Context handle */
+    int           send_count;     /**< Number of buffers to send */
+    int           *send_list;     /**< List of handles to buffers */
+    int           *send_sizes;    /**< Lengths of data to send, in bytes */
+    drmDMAFlags   flags;          /**< Flags */
+    int           request_count;  /**< Number of buffers requested */
+    int           request_size;	  /**< Desired size of buffers requested */
+    int           *request_list;  /**< Buffer information */
+    int           *request_sizes; /**< Minimum acceptable sizes */
+    int           granted_count;  /**< Number of buffers granted at this size */
+} drmDMAReq, *drmDMAReqPtr;
+
+typedef struct _drmRegion {
+    drm_handle_t     handle;
+    unsigned int  offset;
+    drmSize       size;
+    drmAddress    map;
+} drmRegion, *drmRegionPtr;
+
+typedef struct _drmTextureRegion {
+    unsigned char next;
+    unsigned char prev;
+    unsigned char in_use;
+    unsigned char padding;	/**< Explicitly pad this out */
+    unsigned int  age;
+} drmTextureRegion, *drmTextureRegionPtr;
+
+
+typedef enum {
+    DRM_VBLANK_ABSOLUTE = 0x0,	/**< Wait for specific vblank sequence number */
+    DRM_VBLANK_RELATIVE = 0x1,	/**< Wait for given number of vblanks */
+    /* bits 1-6 are reserved for high crtcs */
+    DRM_VBLANK_HIGH_CRTC_MASK = 0x0000003e,
+    DRM_VBLANK_EVENT = 0x4000000,	/**< Send event instead of blocking */
+    DRM_VBLANK_FLIP = 0x8000000,	/**< Scheduled buffer swap should flip */
+    DRM_VBLANK_NEXTONMISS = 0x10000000,	/**< If missed, wait for next vblank */
+    DRM_VBLANK_SECONDARY = 0x20000000,	/**< Secondary display controller */
+    DRM_VBLANK_SIGNAL   = 0x40000000	/* Send signal instead of blocking */
+} drmVBlankSeqType;
+#define DRM_VBLANK_HIGH_CRTC_SHIFT 1
+
+typedef struct _drmVBlankReq {
+	drmVBlankSeqType type;
+	unsigned int sequence;
+	unsigned long signal;
+} drmVBlankReq, *drmVBlankReqPtr;
+
+typedef struct _drmVBlankReply {
+	drmVBlankSeqType type;
+	unsigned int sequence;
+	long tval_sec;
+	long tval_usec;
+} drmVBlankReply, *drmVBlankReplyPtr;
+
+typedef union _drmVBlank {
+	drmVBlankReq request;
+	drmVBlankReply reply;
+} drmVBlank, *drmVBlankPtr;
+
+typedef struct _drmSetVersion {
+	int drm_di_major;
+	int drm_di_minor;
+	int drm_dd_major;
+	int drm_dd_minor;
+} drmSetVersion, *drmSetVersionPtr;
+
+#define __drm_dummy_lock(lock) (*(__volatile__ unsigned int *)lock)
+
+#define DRM_LOCK_HELD  0x80000000U /**< Hardware lock is held */
+#define DRM_LOCK_CONT  0x40000000U /**< Hardware lock is contended */
+
+#if defined(__GNUC__) && (__GNUC__ >= 2)
+# if defined(__i386) || defined(__AMD64__) || defined(__x86_64__) || defined(__amd64__)
+				/* Reflect changes here to drmP.h */
+#define DRM_CAS(lock,old,new,__ret)                                    \
+	do {                                                           \
+                int __dummy;	/* Can't mark eax as clobbered */      \
+		__asm__ __volatile__(                                  \
+			"lock ; cmpxchg %4,%1\n\t"                     \
+                        "setnz %0"                                     \
+			: "=d" (__ret),                                \
+   			  "=m" (__drm_dummy_lock(lock)),               \
+                          "=a" (__dummy)                               \
+			: "2" (old),                                   \
+			  "r" (new));                                  \
+	} while (0)
+
+#elif defined(__alpha__)
+
+#define	DRM_CAS(lock, old, new, ret)		\
+	do {					\
+		int tmp, old32;			\
+		__asm__ __volatile__(		\
+		"	addl	$31, %5, %3\n"	\
+		"1:	ldl_l	%0, %2\n"	\
+		"	cmpeq	%0, %3, %1\n"	\
+		"	beq	%1, 2f\n"	\
+		"	mov	%4, %0\n"	\
+		"	stl_c	%0, %2\n"	\
+		"	beq	%0, 3f\n"	\
+		"	mb\n"			\
+		"2:	cmpeq	%1, 0, %1\n"	\
+		".subsection 2\n"		\
+		"3:	br	1b\n"		\
+		".previous"			\
+		: "=&r"(tmp), "=&r"(ret),	\
+		  "=m"(__drm_dummy_lock(lock)),	\
+		  "=&r"(old32)			\
+		: "r"(new), "r"(old)		\
+		: "memory");			\
+	} while (0)
+
+#elif defined(__sparc__)
+
+#define DRM_CAS(lock,old,new,__ret)				\
+do {	register unsigned int __old __asm("o0");		\
+	register unsigned int __new __asm("o1");		\
+	register volatile unsigned int *__lock __asm("o2");	\
+	__old = old;						\
+	__new = new;						\
+	__lock = (volatile unsigned int *)lock;			\
+	__asm__ __volatile__(					\
+		/*"cas [%2], %3, %0"*/				\
+		".word 0xd3e29008\n\t"				\
+		/*"membar #StoreStore | #StoreLoad"*/		\
+		".word 0x8143e00a"				\
+		: "=&r" (__new)					\
+		: "0" (__new),					\
+		  "r" (__lock),					\
+		  "r" (__old)					\
+		: "memory");					\
+	__ret = (__new != __old);				\
+} while(0)
+
+#elif defined(__ia64__)
+
+#ifdef __INTEL_COMPILER
+/* this currently generates bad code (missing stop bits)... */
+#include <ia64intrin.h>
+
+#define DRM_CAS(lock,old,new,__ret)					      \
+	do {								      \
+		unsigned long __result, __old = (old) & 0xffffffff;		\
+		__mf();							      	\
+		__result = _InterlockedCompareExchange_acq(&__drm_dummy_lock(lock), (new), __old);\
+		__ret = (__result) != (__old);					\
+/*		__ret = (__sync_val_compare_and_swap(&__drm_dummy_lock(lock), \
+						     (old), (new))	      \
+			 != (old));					      */\
+	} while (0)
+
+#else
+#define DRM_CAS(lock,old,new,__ret)					  \
+	do {								  \
+		unsigned int __result, __old = (old);			  \
+		__asm__ __volatile__(					  \
+			"mf\n"						  \
+			"mov ar.ccv=%2\n"				  \
+			";;\n"						  \
+			"cmpxchg4.acq %0=%1,%3,ar.ccv"			  \
+			: "=r" (__result), "=m" (__drm_dummy_lock(lock))  \
+			: "r" ((unsigned long)__old), "r" (new)			  \
+			: "memory");					  \
+		__ret = (__result) != (__old);				  \
+	} while (0)
+
+#endif
+
+#elif defined(__powerpc__)
+
+#define DRM_CAS(lock,old,new,__ret)			\
+	do {						\
+		__asm__ __volatile__(			\
+			"sync;"				\
+			"0:    lwarx %0,0,%1;"		\
+			"      xor. %0,%3,%0;"		\
+			"      bne 1f;"			\
+			"      stwcx. %2,0,%1;"		\
+			"      bne- 0b;"		\
+			"1:    "			\
+			"sync;"				\
+		: "=&r"(__ret)				\
+		: "r"(lock), "r"(new), "r"(old)		\
+		: "cr0", "memory");			\
+	} while (0)
+
+#endif /* architecture */
+#endif /* __GNUC__ >= 2 */
+
+#ifndef DRM_CAS
+#define DRM_CAS(lock,old,new,ret) do { ret=1; } while (0) /* FAST LOCK FAILS */
+#endif
+
+#if defined(__alpha__)
+#define DRM_CAS_RESULT(_result)		long _result
+#elif defined(__powerpc__)
+#define DRM_CAS_RESULT(_result)		int _result
+#else
+#define DRM_CAS_RESULT(_result)		char _result
+#endif
+
+#define DRM_LIGHT_LOCK(fd,lock,context)                                \
+	do {                                                           \
+                DRM_CAS_RESULT(__ret);                                 \
+		DRM_CAS(lock,context,DRM_LOCK_HELD|context,__ret);     \
+                if (__ret) drmGetLock(fd,context,0);                   \
+        } while(0)
+
+				/* This one counts fast locks -- for
+                                   benchmarking only. */
+#define DRM_LIGHT_LOCK_COUNT(fd,lock,context,count)                    \
+	do {                                                           \
+                DRM_CAS_RESULT(__ret);                                 \
+		DRM_CAS(lock,context,DRM_LOCK_HELD|context,__ret);     \
+                if (__ret) drmGetLock(fd,context,0);                   \
+                else       ++count;                                    \
+        } while(0)
+
+#define DRM_LOCK(fd,lock,context,flags)                                \
+	do {                                                           \
+		if (flags) drmGetLock(fd,context,flags);               \
+		else       DRM_LIGHT_LOCK(fd,lock,context);            \
+	} while(0)
+
+#define DRM_UNLOCK(fd,lock,context)                                    \
+	do {                                                           \
+                DRM_CAS_RESULT(__ret);                                 \
+		DRM_CAS(lock,DRM_LOCK_HELD|context,context,__ret);     \
+                if (__ret) drmUnlock(fd,context);                      \
+        } while(0)
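+
+/*
+ * Locking sketch (illustrative; assumes `lock` points at the shared drmLock
+ * word, e.g. in the SAREA, and `ctx` is this client's context handle):
+ *
+ *     DRM_LOCK(fd, lock, ctx, 0);   // fast path via DRM_CAS, ioctl fallback
+ *     ... program the hardware ...
+ *     DRM_UNLOCK(fd, lock, ctx);
+ */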
+
+				/* Simple spin locks */
+#define DRM_SPINLOCK(spin,val)                                         \
+	do {                                                           \
+            DRM_CAS_RESULT(__ret);                                     \
+	    do {                                                       \
+		DRM_CAS(spin,0,val,__ret);                             \
+		if (__ret) while ((spin)->lock);                       \
+	    } while (__ret);                                           \
+	} while(0)
+
+#define DRM_SPINLOCK_TAKE(spin,val)                                    \
+	do {                                                           \
+            DRM_CAS_RESULT(__ret);                                     \
+            int  cur;                                                  \
+	    do {                                                       \
+                cur = (*spin).lock;                                    \
+		DRM_CAS(spin,cur,val,__ret);                           \
+	    } while (__ret);                                           \
+	} while(0)
+
+#define DRM_SPINLOCK_COUNT(spin,val,count,__ret)                       \
+	do {                                                           \
+            int  __i;                                                  \
+            __ret = 1;                                                 \
+            for (__i = 0; __ret && __i < count; __i++) {               \
+		DRM_CAS(spin,0,val,__ret);                             \
+		if (__ret) for (;__i < count && (spin)->lock; __i++);  \
+	    }                                                          \
+	} while(0)
+
+#define DRM_SPINUNLOCK(spin,val)                                       \
+	do {                                                           \
+            DRM_CAS_RESULT(__ret);                                     \
+            if ((*spin).lock == val) { /* else server stole lock */    \
+	        do {                                                   \
+		    DRM_CAS(spin,val,0,__ret);                         \
+	        } while (__ret);                                       \
+            }                                                          \
+	} while(0)
+
+
+
+/* General user-level programmer's API: unprivileged */
+extern int           drmAvailable(void);
+extern int           drmOpen(const char *name, const char *busid);
+
+#define DRM_NODE_PRIMARY 0
+#define DRM_NODE_CONTROL 1
+#define DRM_NODE_RENDER  2
+extern int           drmOpenWithType(const char *name, const char *busid,
+                                     int type);
+
+extern int           drmOpenControl(int minor);
+extern int           drmOpenRender(int minor);
+extern int           drmClose(int fd);
+extern drmVersionPtr drmGetVersion(int fd);
+extern drmVersionPtr drmGetLibVersion(int fd);
+extern int           drmGetCap(int fd, uint64_t capability, uint64_t *value);
+extern void          drmFreeVersion(drmVersionPtr);
+extern int           drmGetMagic(int fd, drm_magic_t * magic);
+extern char          *drmGetBusid(int fd);
+extern int           drmGetInterruptFromBusID(int fd, int busnum, int devnum,
+					      int funcnum);
+extern int           drmGetMap(int fd, int idx, drm_handle_t *offset,
+			       drmSize *size, drmMapType *type,
+			       drmMapFlags *flags, drm_handle_t *handle,
+			       int *mtrr);
+extern int           drmGetClient(int fd, int idx, int *auth, int *pid,
+				  int *uid, unsigned long *magic,
+				  unsigned long *iocs);
+extern int           drmGetStats(int fd, drmStatsT *stats);
+extern int           drmSetInterfaceVersion(int fd, drmSetVersion *version);
+extern int           drmCommandNone(int fd, unsigned long drmCommandIndex);
+extern int           drmCommandRead(int fd, unsigned long drmCommandIndex,
+                                    void *data, unsigned long size);
+extern int           drmCommandWrite(int fd, unsigned long drmCommandIndex,
+                                     void *data, unsigned long size);
+extern int           drmCommandWriteRead(int fd, unsigned long drmCommandIndex,
+                                         void *data, unsigned long size);
+
+/* General user-level programmer's API: X server (root) only  */
+extern void          drmFreeBusid(const char *busid);
+extern int           drmSetBusid(int fd, const char *busid);
+extern int           drmAuthMagic(int fd, drm_magic_t magic);
+extern int           drmAddMap(int fd,
+			       drm_handle_t offset,
+			       drmSize size,
+			       drmMapType type,
+			       drmMapFlags flags,
+			       drm_handle_t * handle);
+extern int	     drmRmMap(int fd, drm_handle_t handle);
+extern int	     drmAddContextPrivateMapping(int fd, drm_context_t ctx_id,
+						 drm_handle_t handle);
+
+extern int           drmAddBufs(int fd, int count, int size,
+				drmBufDescFlags flags,
+				int agp_offset);
+extern int           drmMarkBufs(int fd, double low, double high);
+extern int           drmCreateContext(int fd, drm_context_t * handle);
+extern int           drmSetContextFlags(int fd, drm_context_t context,
+					drm_context_tFlags flags);
+extern int           drmGetContextFlags(int fd, drm_context_t context,
+					drm_context_tFlagsPtr flags);
+extern int           drmAddContextTag(int fd, drm_context_t context, void *tag);
+extern int           drmDelContextTag(int fd, drm_context_t context);
+extern void          *drmGetContextTag(int fd, drm_context_t context);
+extern drm_context_t * drmGetReservedContextList(int fd, int *count);
+extern void          drmFreeReservedContextList(drm_context_t *);
+extern int           drmSwitchToContext(int fd, drm_context_t context);
+extern int           drmDestroyContext(int fd, drm_context_t handle);
+extern int           drmCreateDrawable(int fd, drm_drawable_t * handle);
+extern int           drmDestroyDrawable(int fd, drm_drawable_t handle);
+extern int           drmUpdateDrawableInfo(int fd, drm_drawable_t handle,
+					   drm_drawable_info_type_t type,
+					   unsigned int num, void *data);
+extern int           drmCtlInstHandler(int fd, int irq);
+extern int           drmCtlUninstHandler(int fd);
+extern int           drmSetClientCap(int fd, uint64_t capability,
+				     uint64_t value);
+
+/* General user-level programmer's API: authenticated client and/or X */
+extern int           drmMap(int fd,
+			    drm_handle_t handle,
+			    drmSize size,
+			    drmAddressPtr address);
+extern int           drmUnmap(drmAddress address, drmSize size);
+extern drmBufInfoPtr drmGetBufInfo(int fd);
+extern drmBufMapPtr  drmMapBufs(int fd);
+extern int           drmUnmapBufs(drmBufMapPtr bufs);
+extern int           drmDMA(int fd, drmDMAReqPtr request);
+extern int           drmFreeBufs(int fd, int count, int *list);
+extern int           drmGetLock(int fd,
+			        drm_context_t context,
+			        drmLockFlags flags);
+extern int           drmUnlock(int fd, drm_context_t context);
+extern int           drmFinish(int fd, int context, drmLockFlags flags);
+extern int	     drmGetContextPrivateMapping(int fd, drm_context_t ctx_id, 
+						 drm_handle_t * handle);
+
+/* AGP/GART support: X server (root) only */
+extern int           drmAgpAcquire(int fd);
+extern int           drmAgpRelease(int fd);
+extern int           drmAgpEnable(int fd, unsigned long mode);
+extern int           drmAgpAlloc(int fd, unsigned long size,
+				 unsigned long type, unsigned long *address,
+				 drm_handle_t *handle);
+extern int           drmAgpFree(int fd, drm_handle_t handle);
+extern int 	     drmAgpBind(int fd, drm_handle_t handle,
+				unsigned long offset);
+extern int           drmAgpUnbind(int fd, drm_handle_t handle);
+
+/* AGP/GART info: authenticated client and/or X */
+extern int           drmAgpVersionMajor(int fd);
+extern int           drmAgpVersionMinor(int fd);
+extern unsigned long drmAgpGetMode(int fd);
+extern unsigned long drmAgpBase(int fd); /* Physical location */
+extern unsigned long drmAgpSize(int fd); /* Bytes */
+extern unsigned long drmAgpMemoryUsed(int fd);
+extern unsigned long drmAgpMemoryAvail(int fd);
+extern unsigned int  drmAgpVendorId(int fd);
+extern unsigned int  drmAgpDeviceId(int fd);
+
+/* PCI scatter/gather support: X server (root) only */
+extern int           drmScatterGatherAlloc(int fd, unsigned long size,
+					   drm_handle_t *handle);
+extern int           drmScatterGatherFree(int fd, drm_handle_t handle);
+
+extern int           drmWaitVBlank(int fd, drmVBlankPtr vbl);
+
+/* Support routines */
+extern void          drmSetServerInfo(drmServerInfoPtr info);
+extern int           drmError(int err, const char *label);
+extern void          *drmMalloc(int size);
+extern void          drmFree(void *pt);
+
+/* Hash table routines */
+extern void *drmHashCreate(void);
+extern int  drmHashDestroy(void *t);
+extern int  drmHashLookup(void *t, unsigned long key, void **value);
+extern int  drmHashInsert(void *t, unsigned long key, void *value);
+extern int  drmHashDelete(void *t, unsigned long key);
+extern int  drmHashFirst(void *t, unsigned long *key, void **value);
+extern int  drmHashNext(void *t, unsigned long *key, void **value);
+
+/* PRNG routines */
+extern void          *drmRandomCreate(unsigned long seed);
+extern int           drmRandomDestroy(void *state);
+extern unsigned long drmRandom(void *state);
+extern double        drmRandomDouble(void *state);
+
+/* Skip list routines */
+
+extern void *drmSLCreate(void);
+extern int  drmSLDestroy(void *l);
+extern int  drmSLLookup(void *l, unsigned long key, void **value);
+extern int  drmSLInsert(void *l, unsigned long key, void *value);
+extern int  drmSLDelete(void *l, unsigned long key);
+extern int  drmSLNext(void *l, unsigned long *key, void **value);
+extern int  drmSLFirst(void *l, unsigned long *key, void **value);
+extern void drmSLDump(void *l);
+extern int  drmSLLookupNeighbors(void *l, unsigned long key,
+				 unsigned long *prev_key, void **prev_value,
+				 unsigned long *next_key, void **next_value);
+
+extern int drmOpenOnce(void *unused, const char *BusID, int *newlyopened);
+extern int drmOpenOnceWithType(const char *BusID, int *newlyopened, int type);
+extern void drmCloseOnce(int fd);
+extern void drmMsg(const char *format, ...) DRM_PRINTFLIKE(1, 2);
+
+extern int drmSetMaster(int fd);
+extern int drmDropMaster(int fd);
+
+#define DRM_EVENT_CONTEXT_VERSION 2
+
+typedef struct _drmEventContext {
+
+	/* This struct is versioned so we can add more pointers if we
+	 * add more events. */
+	int version;
+
+	void (*vblank_handler)(int fd,
+			       unsigned int sequence, 
+			       unsigned int tv_sec,
+			       unsigned int tv_usec,
+			       void *user_data);
+
+	void (*page_flip_handler)(int fd,
+				  unsigned int sequence,
+				  unsigned int tv_sec,
+				  unsigned int tv_usec,
+				  void *user_data);
+
+} drmEventContext, *drmEventContextPtr;
+
+extern int drmHandleEvent(int fd, drmEventContextPtr evctx);
+
+extern char *drmGetDeviceNameFromFd(int fd);
+extern int drmGetNodeTypeFromFd(int fd);
+
+extern int drmPrimeHandleToFD(int fd, uint32_t handle, uint32_t flags, int *prime_fd);
+extern int drmPrimeFDToHandle(int fd, int prime_fd, uint32_t *handle);
+
+extern char *drmGetPrimaryDeviceNameFromFd(int fd);
+extern char *drmGetRenderDeviceNameFromFd(int fd);
+
+#if defined(__cplusplus) || defined(c_plusplus)
+}
+#endif
+
+#endif
diff --git a/icd/intel/kmd/libdrm/xf86drmHash.c b/icd/intel/kmd/libdrm/xf86drmHash.c
new file mode 100644
index 0000000..f287e61
--- /dev/null
+++ b/icd/intel/kmd/libdrm/xf86drmHash.c
@@ -0,0 +1,253 @@
+/* xf86drmHash.c -- Small hash table support for integer -> integer mapping
+ * Created: Sun Apr 18 09:35:45 1999 by faith@precisioninsight.com
+ *
+ * Copyright 1999 Precision Insight, Inc., Cedar Park, Texas.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Rickard E. (Rik) Faith <faith@valinux.com>
+ *
+ * DESCRIPTION
+ *
+ * This file contains a straightforward implementation of a fixed-sized
+ * hash table using self-organizing linked lists [Knuth73, pp. 398-399] for
+ * collision resolution.  There are two potentially interesting things
+ * about this implementation:
+ *
+ * 1) The table is power-of-two sized.  Prime sized tables are more
+ * traditional, but do not have a significant advantage over power-of-two
+ * sized tables, especially when double hashing is not used for collision
+ * resolution.
+ *
+ * 2) The hash computation uses a table of random integers [Hanson97,
+ * pp. 39-41].
+ *
+ * FUTURE ENHANCEMENTS
+ *
+ * With a table size of 512, the current implementation is sufficient for a
+ * few hundred keys.  Since this is well above the expected size of the
+ * tables for which this implementation was designed, the implementation of
+ * dynamic hash tables was postponed until the need arises.  A common (and
+ * naive) approach to dynamic hash table implementation simply creates a
+ * new hash table when necessary, rehashes all the data into the new table,
+ * and destroys the old table.  The approach in [Larson88] is superior in
+ * two ways: 1) only a portion of the table is expanded when needed,
+ * distributing the expansion cost over several insertions, and 2) portions
+ * of the table can be locked, enabling a scalable thread-safe
+ * implementation.
+ *
+ * REFERENCES
+ *
+ * [Hanson97] David R. Hanson.  C Interfaces and Implementations:
+ * Techniques for Creating Reusable Software.  Reading, Massachusetts:
+ * Addison-Wesley, 1997.
+ *
+ * [Knuth73] Donald E. Knuth. The Art of Computer Programming.  Volume 3:
+ * Sorting and Searching.  Reading, Massachusetts: Addison-Wesley, 1973.
+ *
+ * [Larson88] Per-Ake Larson. "Dynamic Hash Tables".  CACM 31(4), April
+ * 1988, pp. 446-457.
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "xf86drm.h"
+#include "xf86drmHash.h"
+
+#define HASH_MAGIC 0xdeadbeef
+
+static unsigned long HashHash(unsigned long key)
+{
+    unsigned long        hash = 0;
+    unsigned long        tmp  = key;
+    static int           init = 0;
+    static unsigned long scatter[256];
+    int                  i;
+
+    if (!init) {
+	void *state;
+	state = drmRandomCreate(37);
+	for (i = 0; i < 256; i++) scatter[i] = drmRandom(state);
+	drmRandomDestroy(state);
+	++init;
+    }
+
+    while (tmp) {
+	hash = (hash << 1) + scatter[tmp & 0xff];
+	tmp >>= 8;
+    }
+
+    hash %= HASH_SIZE;
+#if DEBUG
+    printf( "Hash(%lu) = %lu\n", key, hash);
+#endif
+    return hash;
+}
+
+void *drmHashCreate(void)
+{
+    HashTablePtr table;
+    int          i;
+
+    table           = drmMalloc(sizeof(*table));
+    if (!table) return NULL;
+    table->magic    = HASH_MAGIC;
+    table->entries  = 0;
+    table->hits     = 0;
+    table->partials = 0;
+    table->misses   = 0;
+
+    for (i = 0; i < HASH_SIZE; i++) table->buckets[i] = NULL;
+    return table;
+}
+
+int drmHashDestroy(void *t)
+{
+    HashTablePtr  table = (HashTablePtr)t;
+    HashBucketPtr bucket;
+    HashBucketPtr next;
+    int           i;
+
+    if (table->magic != HASH_MAGIC) return -1; /* Bad magic */
+
+    for (i = 0; i < HASH_SIZE; i++) {
+	for (bucket = table->buckets[i]; bucket;) {
+	    next = bucket->next;
+	    drmFree(bucket);
+	    bucket = next;
+	}
+    }
+    drmFree(table);
+    return 0;
+}
+
+/* Find the bucket and organize the list so that this bucket is at the
+   top. */
+
+static HashBucketPtr HashFind(HashTablePtr table,
+			      unsigned long key, unsigned long *h)
+{
+    unsigned long hash = HashHash(key);
+    HashBucketPtr prev = NULL;
+    HashBucketPtr bucket;
+
+    if (h) *h = hash;
+
+    for (bucket = table->buckets[hash]; bucket; bucket = bucket->next) {
+	if (bucket->key == key) {
+	    if (prev) {
+				/* Organize */
+		prev->next           = bucket->next;
+		bucket->next         = table->buckets[hash];
+		table->buckets[hash] = bucket;
+		++table->partials;
+	    } else {
+		++table->hits;
+	    }
+	    return bucket;
+	}
+	prev = bucket;
+    }
+    ++table->misses;
+    return NULL;
+}
+
+int drmHashLookup(void *t, unsigned long key, void **value)
+{
+    HashTablePtr  table = (HashTablePtr)t;
+    HashBucketPtr bucket;
+
+    if (!table || table->magic != HASH_MAGIC) return -1; /* Bad magic */
+
+    bucket = HashFind(table, key, NULL);
+    if (!bucket) return 1;	/* Not found */
+    *value = bucket->value;
+    return 0;			/* Found */
+}
+
+int drmHashInsert(void *t, unsigned long key, void *value)
+{
+    HashTablePtr  table = (HashTablePtr)t;
+    HashBucketPtr bucket;
+    unsigned long hash;
+
+    if (table->magic != HASH_MAGIC) return -1; /* Bad magic */
+
+    if (HashFind(table, key, &hash)) return 1; /* Already in table */
+
+    bucket               = drmMalloc(sizeof(*bucket));
+    if (!bucket) return -1;	/* Error */
+    bucket->key          = key;
+    bucket->value        = value;
+    bucket->next         = table->buckets[hash];
+    table->buckets[hash] = bucket;
+#if DEBUG
+    printf("Inserted %lu at %lu/%p\n", key, hash, bucket);
+#endif
+    return 0;			/* Added to table */
+}
+
+int drmHashDelete(void *t, unsigned long key)
+{
+    HashTablePtr  table = (HashTablePtr)t;
+    unsigned long hash;
+    HashBucketPtr bucket;
+
+    if (table->magic != HASH_MAGIC) return -1; /* Bad magic */
+
+    bucket = HashFind(table, key, &hash);
+
+    if (!bucket) return 1;	/* Not found */
+
+    table->buckets[hash] = bucket->next;
+    drmFree(bucket);
+    return 0;
+}
+
+int drmHashNext(void *t, unsigned long *key, void **value)
+{
+    HashTablePtr  table = (HashTablePtr)t;
+
+    while (table->p0 < HASH_SIZE) {
+	if (table->p1) {
+	    *key       = table->p1->key;
+	    *value     = table->p1->value;
+	    table->p1  = table->p1->next;
+	    return 1;
+	}
+	table->p1 = table->buckets[table->p0];
+	++table->p0;
+    }
+    return 0;
+}
+
+int drmHashFirst(void *t, unsigned long *key, void **value)
+{
+    HashTablePtr  table = (HashTablePtr)t;
+
+    if (table->magic != HASH_MAGIC) return -1; /* Bad magic */
+
+    table->p0 = 0;
+    table->p1 = table->buckets[0];
+    return drmHashNext(table, key, value);
+}
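+
+/*
+ * Usage sketch for the hash API (illustrative; the key and value are
+ * arbitrary examples):
+ *
+ *     void *t = drmHashCreate();
+ *     unsigned long key;
+ *     void *val;
+ *     int more;
+ *
+ *     drmHashInsert(t, 42, "forty-two");
+ *     for (more = drmHashFirst(t, &key, &val); more;
+ *          more = drmHashNext(t, &key, &val))
+ *         printf("%lu -> %p\n", key, val);
+ *     drmHashDestroy(t);
+ */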
diff --git a/icd/intel/kmd/libdrm/xf86drmHash.h b/icd/intel/kmd/libdrm/xf86drmHash.h
new file mode 100644
index 0000000..38fd84b
--- /dev/null
+++ b/icd/intel/kmd/libdrm/xf86drmHash.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright 1999 Precision Insight, Inc., Cedar Park, Texas.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Rickard E. (Rik) Faith <faith@valinux.com>
+ */
+
+#define HASH_SIZE  512		/* Good for about 100 entries */
+				/* If you change this value, you probably
+                                   have to change the HashHash hashing
+                                   function! */
+
+typedef struct HashBucket {
+    unsigned long     key;
+    void              *value;
+    struct HashBucket *next;
+} HashBucket, *HashBucketPtr;
+
+typedef struct HashTable {
+    unsigned long    magic;
+    unsigned long    entries;
+    unsigned long    hits;	/* At top of linked list */
+    unsigned long    partials;	/* Not at top of linked list */
+    unsigned long    misses;	/* Not in table */
+    HashBucketPtr    buckets[HASH_SIZE];
+    int              p0;
+    HashBucketPtr    p1;
+} HashTable, *HashTablePtr;
diff --git a/icd/intel/kmd/libdrm/xf86drmMode.c b/icd/intel/kmd/libdrm/xf86drmMode.c
new file mode 100644
index 0000000..fd625a9
--- /dev/null
+++ b/icd/intel/kmd/libdrm/xf86drmMode.c
@@ -0,0 +1,1132 @@
+/*
+ * \file xf86drmMode.c
+ * User-level interface to the DRM modesetting ioctls.
+ *
+ * \author Jakob Bornecrantz <wallbraker@gmail.com>
+ *
+ * \par Acknowledgements:
+ * Feb 2007, Dave Airlie <airlied@linux.ie>
+ */
+
+/*
+ * Copyright (c) 2007-2008 Tungsten Graphics, Inc., Cedar Park, Texas.
+ * Copyright (c) 2007-2008 Dave Airlie <airlied@linux.ie>
+ * Copyright (c) 2007-2008 Jakob Bornecrantz <wallbraker@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+/*
+ * TODO: the types we are after are defined in different headers on different
+ * platforms; find which headers to include to get uint32_t.
+ */
+#include <stdint.h>
+#include <sys/ioctl.h>
+#include <stdio.h>
+
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
+#include "xf86drmMode.h"
+#include "xf86drm.h"
+#include <drm.h>
+#include <string.h>
+#include <dirent.h>
+#include <unistd.h>
+#include <errno.h>
+
+#ifdef HAVE_VALGRIND
+#include <valgrind.h>
+#include <memcheck.h>
+#define VG(x) x
+#else
+#define VG(x)
+#endif
+
+#define memclear(s) memset(&s, 0, sizeof(s))
+
+#define U642VOID(x) ((void *)(unsigned long)(x))
+#define VOID2U64(x) ((uint64_t)(unsigned long)(x))
+
+static inline int DRM_IOCTL(int fd, unsigned long cmd, void *arg)
+{
+	int ret = drmIoctl(fd, cmd, arg);
+	return ret < 0 ? -errno : ret;
+}
+
+/*
+ * Util functions
+ */
+
+static void* drmAllocCpy(char *array, int count, int entry_size)
+{
+	char *r;
+	int i;
+
+	if (!count || !array || !entry_size)
+		return 0;
+
+	if (!(r = drmMalloc(count*entry_size)))
+		return 0;
+
+	for (i = 0; i < count; i++)
+		memcpy(r+(entry_size*i), array+(entry_size*i), entry_size);
+
+	return r;
+}
+
+/*
+ * A couple of free functions.
+ */
+
+void drmModeFreeModeInfo(drmModeModeInfoPtr ptr)
+{
+	if (!ptr)
+		return;
+
+	drmFree(ptr);
+}
+
+void drmModeFreeResources(drmModeResPtr ptr)
+{
+	if (!ptr)
+		return;
+
+	drmFree(ptr->fbs);
+	drmFree(ptr->crtcs);
+	drmFree(ptr->connectors);
+	drmFree(ptr->encoders);
+	drmFree(ptr);
+
+}
+
+void drmModeFreeFB(drmModeFBPtr ptr)
+{
+	if (!ptr)
+		return;
+
+	/* we might add more frees later. */
+	drmFree(ptr);
+}
+
+void drmModeFreeCrtc(drmModeCrtcPtr ptr)
+{
+	if (!ptr)
+		return;
+
+	drmFree(ptr);
+
+}
+
+void drmModeFreeConnector(drmModeConnectorPtr ptr)
+{
+	if (!ptr)
+		return;
+
+	drmFree(ptr->encoders);
+	drmFree(ptr->prop_values);
+	drmFree(ptr->props);
+	drmFree(ptr->modes);
+	drmFree(ptr);
+
+}
+
+void drmModeFreeEncoder(drmModeEncoderPtr ptr)
+{
+	drmFree(ptr);
+}
+
+/*
+ * ModeSetting functions.
+ */
+
+drmModeResPtr drmModeGetResources(int fd)
+{
+	struct drm_mode_card_res res, counts;
+	drmModeResPtr r = 0;
+
+retry:
+	memclear(res);
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETRESOURCES, &res))
+		return 0;
+
+	counts = res;
+
+	if (res.count_fbs) {
+		res.fb_id_ptr = VOID2U64(drmMalloc(res.count_fbs*sizeof(uint32_t)));
+		if (!res.fb_id_ptr)
+			goto err_allocs;
+	}
+	if (res.count_crtcs) {
+		res.crtc_id_ptr = VOID2U64(drmMalloc(res.count_crtcs*sizeof(uint32_t)));
+		if (!res.crtc_id_ptr)
+			goto err_allocs;
+	}
+	if (res.count_connectors) {
+		res.connector_id_ptr = VOID2U64(drmMalloc(res.count_connectors*sizeof(uint32_t)));
+		if (!res.connector_id_ptr)
+			goto err_allocs;
+	}
+	if (res.count_encoders) {
+		res.encoder_id_ptr = VOID2U64(drmMalloc(res.count_encoders*sizeof(uint32_t)));
+		if (!res.encoder_id_ptr)
+			goto err_allocs;
+	}
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETRESOURCES, &res))
+		goto err_allocs;
+
+	/* The number of available connectors etc. may have changed with a
+	 * hotplug event in between the ioctls, in which case the field is
+	 * silently ignored by the kernel.
+	 */
+	if (counts.count_fbs < res.count_fbs ||
+	    counts.count_crtcs < res.count_crtcs ||
+	    counts.count_connectors < res.count_connectors ||
+	    counts.count_encoders < res.count_encoders)
+	{
+		drmFree(U642VOID(res.fb_id_ptr));
+		drmFree(U642VOID(res.crtc_id_ptr));
+		drmFree(U642VOID(res.connector_id_ptr));
+		drmFree(U642VOID(res.encoder_id_ptr));
+
+		goto retry;
+	}
+
+	/*
+	 * return
+	 */
+	if (!(r = drmMalloc(sizeof(*r))))
+		goto err_allocs;
+
+	r->min_width     = res.min_width;
+	r->max_width     = res.max_width;
+	r->min_height    = res.min_height;
+	r->max_height    = res.max_height;
+	r->count_fbs     = res.count_fbs;
+	r->count_crtcs   = res.count_crtcs;
+	r->count_connectors = res.count_connectors;
+	r->count_encoders = res.count_encoders;
+
+	r->fbs        = drmAllocCpy(U642VOID(res.fb_id_ptr), res.count_fbs, sizeof(uint32_t));
+	r->crtcs      = drmAllocCpy(U642VOID(res.crtc_id_ptr), res.count_crtcs, sizeof(uint32_t));
+	r->connectors = drmAllocCpy(U642VOID(res.connector_id_ptr), res.count_connectors, sizeof(uint32_t));
+	r->encoders   = drmAllocCpy(U642VOID(res.encoder_id_ptr), res.count_encoders, sizeof(uint32_t));
+	if ((res.count_fbs && !r->fbs) ||
+	    (res.count_crtcs && !r->crtcs) ||
+	    (res.count_connectors && !r->connectors) ||
+	    (res.count_encoders && !r->encoders))
+	{
+		drmFree(r->fbs);
+		drmFree(r->crtcs);
+		drmFree(r->connectors);
+		drmFree(r->encoders);
+		drmFree(r);
+		r = 0;
+	}
+
+err_allocs:
+	drmFree(U642VOID(res.fb_id_ptr));
+	drmFree(U642VOID(res.crtc_id_ptr));
+	drmFree(U642VOID(res.connector_id_ptr));
+	drmFree(U642VOID(res.encoder_id_ptr));
+
+	return r;
+}
+
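+/*
+ * Usage sketch: a minimal consumer of drmModeGetResources() that walks the
+ * connector list and frees everything afterwards.  Assumes fd is an open
+ * DRM device node (e.g. /dev/dri/card0).
+ *
+ *   drmModeResPtr res = drmModeGetResources(fd);
+ *   if (res) {
+ *       int i;
+ *       for (i = 0; i < res->count_connectors; i++) {
+ *           drmModeConnectorPtr conn =
+ *               drmModeGetConnector(fd, res->connectors[i]);
+ *           if (!conn)
+ *               continue;
+ *           if (conn->connection == DRM_MODE_CONNECTED)
+ *               printf("connector %u: %d modes\n",
+ *                      conn->connector_id, conn->count_modes);
+ *           drmModeFreeConnector(conn);
+ *       }
+ *       drmModeFreeResources(res);
+ *   }
+ */
+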
+int drmModeAddFB(int fd, uint32_t width, uint32_t height, uint8_t depth,
+                 uint8_t bpp, uint32_t pitch, uint32_t bo_handle,
+		 uint32_t *buf_id)
+{
+	struct drm_mode_fb_cmd f;
+	int ret;
+
+	memclear(f);
+	f.width  = width;
+	f.height = height;
+	f.pitch  = pitch;
+	f.bpp    = bpp;
+	f.depth  = depth;
+	f.handle = bo_handle;
+
+	if ((ret = DRM_IOCTL(fd, DRM_IOCTL_MODE_ADDFB, &f)))
+		return ret;
+
+	*buf_id = f.fb_id;
+	return 0;
+}
+
+int drmModeAddFB2(int fd, uint32_t width, uint32_t height,
+		  uint32_t pixel_format, uint32_t bo_handles[4],
+		  uint32_t pitches[4], uint32_t offsets[4],
+		  uint32_t *buf_id, uint32_t flags)
+{
+	struct drm_mode_fb_cmd2 f;
+	int ret;
+
+	memclear(f);
+	f.width  = width;
+	f.height = height;
+	f.pixel_format = pixel_format;
+	f.flags = flags;
+	memcpy(f.handles, bo_handles, 4 * sizeof(bo_handles[0]));
+	memcpy(f.pitches, pitches, 4 * sizeof(pitches[0]));
+	memcpy(f.offsets, offsets, 4 * sizeof(offsets[0]));
+
+	if ((ret = DRM_IOCTL(fd, DRM_IOCTL_MODE_ADDFB2, &f)))
+		return ret;
+
+	*buf_id = f.fb_id;
+	return 0;
+}
+
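+/*
+ * Usage sketch for drmModeAddFB2(): wrapping a single-plane buffer object
+ * in a framebuffer.  DRM_FORMAT_XRGB8888 comes from drm_fourcc.h (assumed
+ * here); handle and pitch are the bo handle and stride from allocation.
+ *
+ *   uint32_t fb_id;
+ *   uint32_t handles[4] = { handle };
+ *   uint32_t pitches[4] = { pitch };
+ *   uint32_t offsets[4] = { 0 };
+ *   int ret = drmModeAddFB2(fd, width, height, DRM_FORMAT_XRGB8888,
+ *                           handles, pitches, offsets, &fb_id, 0);
+ *   if (ret)
+ *       fprintf(stderr, "AddFB2 failed: %d\n", ret);
+ */
+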
+int drmModeRmFB(int fd, uint32_t bufferId)
+{
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_RMFB, &bufferId);
+}
+
+drmModeFBPtr drmModeGetFB(int fd, uint32_t buf)
+{
+	struct drm_mode_fb_cmd info;
+	drmModeFBPtr r;
+
+	memclear(info);
+	info.fb_id = buf;
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETFB, &info))
+		return NULL;
+
+	if (!(r = drmMalloc(sizeof(*r))))
+		return NULL;
+
+	r->fb_id = info.fb_id;
+	r->width = info.width;
+	r->height = info.height;
+	r->pitch = info.pitch;
+	r->bpp = info.bpp;
+	r->handle = info.handle;
+	r->depth = info.depth;
+
+	return r;
+}
+
+int drmModeDirtyFB(int fd, uint32_t bufferId,
+		   drmModeClipPtr clips, uint32_t num_clips)
+{
+	struct drm_mode_fb_dirty_cmd dirty;
+
+	memclear(dirty);
+	dirty.fb_id = bufferId;
+	dirty.clips_ptr = VOID2U64(clips);
+	dirty.num_clips = num_clips;
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_DIRTYFB, &dirty);
+}
+
+
+/*
+ * Crtc functions
+ */
+
+drmModeCrtcPtr drmModeGetCrtc(int fd, uint32_t crtcId)
+{
+	struct drm_mode_crtc crtc;
+	drmModeCrtcPtr r;
+
+	memclear(crtc);
+	crtc.crtc_id = crtcId;
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETCRTC, &crtc))
+		return 0;
+
+	/*
+	 * return
+	 */
+
+	if (!(r = drmMalloc(sizeof(*r))))
+		return 0;
+
+	r->crtc_id         = crtc.crtc_id;
+	r->x               = crtc.x;
+	r->y               = crtc.y;
+	r->mode_valid      = crtc.mode_valid;
+	if (r->mode_valid) {
+		memcpy(&r->mode, &crtc.mode, sizeof(struct drm_mode_modeinfo));
+		r->width = crtc.mode.hdisplay;
+		r->height = crtc.mode.vdisplay;
+	}
+	r->buffer_id       = crtc.fb_id;
+	r->gamma_size      = crtc.gamma_size;
+	return r;
+}
+
+
+int drmModeSetCrtc(int fd, uint32_t crtcId, uint32_t bufferId,
+                   uint32_t x, uint32_t y, uint32_t *connectors, int count,
+		   drmModeModeInfoPtr mode)
+{
+	struct drm_mode_crtc crtc;
+
+	memclear(crtc);
+	crtc.x             = x;
+	crtc.y             = y;
+	crtc.crtc_id       = crtcId;
+	crtc.fb_id         = bufferId;
+	crtc.set_connectors_ptr = VOID2U64(connectors);
+	crtc.count_connectors = count;
+	if (mode) {
+	  memcpy(&crtc.mode, mode, sizeof(struct drm_mode_modeinfo));
+	  crtc.mode_valid = 1;
+	}
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_SETCRTC, &crtc);
+}
+
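+/*
+ * Usage sketch: a minimal modeset with drmModeSetCrtc(), driving a single
+ * connected connector with its first mode.  Assumes fd, crtc_id, fb_id and
+ * a connected drmModeConnectorPtr conn were obtained as sketched above.
+ *
+ *   uint32_t conn_id = conn->connector_id;
+ *   int ret = drmModeSetCrtc(fd, crtc_id, fb_id, 0, 0,
+ *                            &conn_id, 1, &conn->modes[0]);
+ *
+ * The (0, 0) pair is the scanout origin within the framebuffer; modes[0]
+ * is typically the connector's preferred mode.
+ */
+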
+/*
+ * Cursor manipulation
+ */
+
+int drmModeSetCursor(int fd, uint32_t crtcId, uint32_t bo_handle, uint32_t width, uint32_t height)
+{
+	struct drm_mode_cursor arg;
+
+	memclear(arg);
+	arg.flags = DRM_MODE_CURSOR_BO;
+	arg.crtc_id = crtcId;
+	arg.width = width;
+	arg.height = height;
+	arg.handle = bo_handle;
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_CURSOR, &arg);
+}
+
+int drmModeSetCursor2(int fd, uint32_t crtcId, uint32_t bo_handle, uint32_t width, uint32_t height, int32_t hot_x, int32_t hot_y)
+{
+	struct drm_mode_cursor2 arg;
+
+	memclear(arg);
+	arg.flags = DRM_MODE_CURSOR_BO;
+	arg.crtc_id = crtcId;
+	arg.width = width;
+	arg.height = height;
+	arg.handle = bo_handle;
+	arg.hot_x = hot_x;
+	arg.hot_y = hot_y;
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_CURSOR2, &arg);
+}
+
+int drmModeMoveCursor(int fd, uint32_t crtcId, int x, int y)
+{
+	struct drm_mode_cursor arg;
+
+	memclear(arg);
+	arg.flags = DRM_MODE_CURSOR_MOVE;
+	arg.crtc_id = crtcId;
+	arg.x = x;
+	arg.y = y;
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_CURSOR, &arg);
+}
+
+/*
+ * Encoder get
+ */
+drmModeEncoderPtr drmModeGetEncoder(int fd, uint32_t encoder_id)
+{
+	struct drm_mode_get_encoder enc;
+	drmModeEncoderPtr r = NULL;
+
+	memclear(enc);
+	enc.encoder_id = encoder_id;
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETENCODER, &enc))
+		return 0;
+
+	if (!(r = drmMalloc(sizeof(*r))))
+		return 0;
+
+	r->encoder_id = enc.encoder_id;
+	r->crtc_id = enc.crtc_id;
+	r->encoder_type = enc.encoder_type;
+	r->possible_crtcs = enc.possible_crtcs;
+	r->possible_clones = enc.possible_clones;
+
+	return r;
+}
+
+/*
+ * Connector manipulation
+ */
+
+drmModeConnectorPtr drmModeGetConnector(int fd, uint32_t connector_id)
+{
+	struct drm_mode_get_connector conn, counts;
+	drmModeConnectorPtr r = NULL;
+
+retry:
+	memclear(conn);
+	conn.connector_id = connector_id;
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETCONNECTOR, &conn))
+		return 0;
+
+	counts = conn;
+
+	if (conn.count_props) {
+		conn.props_ptr = VOID2U64(drmMalloc(conn.count_props*sizeof(uint32_t)));
+		if (!conn.props_ptr)
+			goto err_allocs;
+		conn.prop_values_ptr = VOID2U64(drmMalloc(conn.count_props*sizeof(uint64_t)));
+		if (!conn.prop_values_ptr)
+			goto err_allocs;
+	}
+
+	if (conn.count_modes) {
+		conn.modes_ptr = VOID2U64(drmMalloc(conn.count_modes*sizeof(struct drm_mode_modeinfo)));
+		if (!conn.modes_ptr)
+			goto err_allocs;
+	}
+
+	if (conn.count_encoders) {
+		conn.encoders_ptr = VOID2U64(drmMalloc(conn.count_encoders*sizeof(uint32_t)));
+		if (!conn.encoders_ptr)
+			goto err_allocs;
+	}
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETCONNECTOR, &conn))
+		goto err_allocs;
+
+	/* The number of available connectors etc. may have changed with a
+	 * hotplug event in between the ioctls, in which case the field is
+	 * silently ignored by the kernel.
+	 */
+	if (counts.count_props < conn.count_props ||
+	    counts.count_modes < conn.count_modes ||
+	    counts.count_encoders < conn.count_encoders) {
+		drmFree(U642VOID(conn.props_ptr));
+		drmFree(U642VOID(conn.prop_values_ptr));
+		drmFree(U642VOID(conn.modes_ptr));
+		drmFree(U642VOID(conn.encoders_ptr));
+
+		goto retry;
+	}
+
+	if(!(r = drmMalloc(sizeof(*r)))) {
+		goto err_allocs;
+	}
+
+	r->connector_id = conn.connector_id;
+	r->encoder_id = conn.encoder_id;
+	r->connection   = conn.connection;
+	r->mmWidth      = conn.mm_width;
+	r->mmHeight     = conn.mm_height;
+	/* convert subpixel from kernel to userspace */
+	r->subpixel     = conn.subpixel + 1;
+	r->count_modes  = conn.count_modes;
+	r->count_props  = conn.count_props;
+	r->props        = drmAllocCpy(U642VOID(conn.props_ptr), conn.count_props, sizeof(uint32_t));
+	r->prop_values  = drmAllocCpy(U642VOID(conn.prop_values_ptr), conn.count_props, sizeof(uint64_t));
+	r->modes        = drmAllocCpy(U642VOID(conn.modes_ptr), conn.count_modes, sizeof(struct drm_mode_modeinfo));
+	r->count_encoders = conn.count_encoders;
+	r->encoders     = drmAllocCpy(U642VOID(conn.encoders_ptr), conn.count_encoders, sizeof(uint32_t));
+	r->connector_type  = conn.connector_type;
+	r->connector_type_id = conn.connector_type_id;
+
+	if ((r->count_props && !r->props) ||
+	    (r->count_props && !r->prop_values) ||
+	    (r->count_modes && !r->modes) ||
+	    (r->count_encoders && !r->encoders)) {
+		drmFree(r->props);
+		drmFree(r->prop_values);
+		drmFree(r->modes);
+		drmFree(r->encoders);
+		drmFree(r);
+		r = 0;
+	}
+
+err_allocs:
+	drmFree(U642VOID(conn.prop_values_ptr));
+	drmFree(U642VOID(conn.props_ptr));
+	drmFree(U642VOID(conn.modes_ptr));
+	drmFree(U642VOID(conn.encoders_ptr));
+
+	return r;
+}
+
+int drmModeAttachMode(int fd, uint32_t connector_id, drmModeModeInfoPtr mode_info)
+{
+	struct drm_mode_mode_cmd res;
+
+	memclear(res);
+	memcpy(&res.mode, mode_info, sizeof(struct drm_mode_modeinfo));
+	res.connector_id = connector_id;
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_ATTACHMODE, &res);
+}
+
+int drmModeDetachMode(int fd, uint32_t connector_id, drmModeModeInfoPtr mode_info)
+{
+	struct drm_mode_mode_cmd res;
+
+	memclear(res);
+	memcpy(&res.mode, mode_info, sizeof(struct drm_mode_modeinfo));
+	res.connector_id = connector_id;
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_DETACHMODE, &res);
+}
+
+
+drmModePropertyPtr drmModeGetProperty(int fd, uint32_t property_id)
+{
+	struct drm_mode_get_property prop;
+	drmModePropertyPtr r;
+
+	memclear(prop);
+	prop.prop_id = property_id;
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETPROPERTY, &prop))
+		return 0;
+
+	if (prop.count_values)
+		prop.values_ptr = VOID2U64(drmMalloc(prop.count_values * sizeof(uint64_t)));
+
+	if (prop.count_enum_blobs && (prop.flags & (DRM_MODE_PROP_ENUM | DRM_MODE_PROP_BITMASK)))
+		prop.enum_blob_ptr = VOID2U64(drmMalloc(prop.count_enum_blobs * sizeof(struct drm_mode_property_enum)));
+
+	if (prop.count_enum_blobs && (prop.flags & DRM_MODE_PROP_BLOB)) {
+		prop.values_ptr = VOID2U64(drmMalloc(prop.count_enum_blobs * sizeof(uint32_t)));
+		prop.enum_blob_ptr = VOID2U64(drmMalloc(prop.count_enum_blobs * sizeof(uint32_t)));
+	}
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETPROPERTY, &prop)) {
+		r = NULL;
+		goto err_allocs;
+	}
+
+	if (!(r = drmMalloc(sizeof(*r))))
+		goto err_allocs;
+
+	r->prop_id = prop.prop_id;
+	r->count_values = prop.count_values;
+
+	r->flags = prop.flags;
+	if (prop.count_values)
+		r->values = drmAllocCpy(U642VOID(prop.values_ptr), prop.count_values, sizeof(uint64_t));
+	if (prop.flags & (DRM_MODE_PROP_ENUM | DRM_MODE_PROP_BITMASK)) {
+		r->count_enums = prop.count_enum_blobs;
+		r->enums = drmAllocCpy(U642VOID(prop.enum_blob_ptr), prop.count_enum_blobs, sizeof(struct drm_mode_property_enum));
+	} else if (prop.flags & DRM_MODE_PROP_BLOB) {
+		r->values = drmAllocCpy(U642VOID(prop.values_ptr), prop.count_enum_blobs, sizeof(uint32_t));
+		r->blob_ids = drmAllocCpy(U642VOID(prop.enum_blob_ptr), prop.count_enum_blobs, sizeof(uint32_t));
+		r->count_blobs = prop.count_enum_blobs;
+	}
+	strncpy(r->name, prop.name, DRM_PROP_NAME_LEN);
+	r->name[DRM_PROP_NAME_LEN-1] = 0;
+
+err_allocs:
+	drmFree(U642VOID(prop.values_ptr));
+	drmFree(U642VOID(prop.enum_blob_ptr));
+
+	return r;
+}
+
+void drmModeFreeProperty(drmModePropertyPtr ptr)
+{
+	if (!ptr)
+		return;
+
+	drmFree(ptr->values);
+	drmFree(ptr->enums);
+	drmFree(ptr);
+}
+
+drmModePropertyBlobPtr drmModeGetPropertyBlob(int fd, uint32_t blob_id)
+{
+	struct drm_mode_get_blob blob;
+	drmModePropertyBlobPtr r;
+
+	memclear(blob);
+	blob.blob_id = blob_id;
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETPROPBLOB, &blob))
+		return NULL;
+
+	if (blob.length)
+		blob.data = VOID2U64(drmMalloc(blob.length));
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETPROPBLOB, &blob)) {
+		r = NULL;
+		goto err_allocs;
+	}
+
+	if (!(r = drmMalloc(sizeof(*r))))
+		goto err_allocs;
+
+	r->id = blob.blob_id;
+	r->length = blob.length;
+	r->data = drmAllocCpy(U642VOID(blob.data), 1, blob.length);
+
+err_allocs:
+	drmFree(U642VOID(blob.data));
+	return r;
+}
+
+void drmModeFreePropertyBlob(drmModePropertyBlobPtr ptr)
+{
+	if (!ptr)
+		return;
+
+	drmFree(ptr->data);
+	drmFree(ptr);
+}
+
+int drmModeConnectorSetProperty(int fd, uint32_t connector_id, uint32_t property_id,
+			     uint64_t value)
+{
+	struct drm_mode_connector_set_property osp;
+
+	memclear(osp);
+	osp.connector_id = connector_id;
+	osp.prop_id = property_id;
+	osp.value = value;
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_SETPROPERTY, &osp);
+}
+
+/*
+ * Checks whether a modesetting-capable driver has attached to the PCI id.
+ * Returns 0 if modesetting is supported,
+ *  -EINVAL on an invalid bus id,
+ *  -ENOSYS if there is no modesetting support.
+ */
+int drmCheckModesettingSupported(const char *busid)
+{
+#if defined (__linux__)
+	char pci_dev_dir[1024];
+	int domain, bus, dev, func;
+	DIR *sysdir;
+	struct dirent *dent;
+	int found = 0, ret;
+
+	ret = sscanf(busid, "pci:%04x:%02x:%02x.%d", &domain, &bus, &dev, &func);
+	if (ret != 4)
+		return -EINVAL;
+
+	sprintf(pci_dev_dir, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/drm",
+		domain, bus, dev, func);
+
+	sysdir = opendir(pci_dev_dir);
+	if (sysdir) {
+		dent = readdir(sysdir);
+		while (dent) {
+			if (!strncmp(dent->d_name, "controlD", 8)) {
+				found = 1;
+				break;
+			}
+
+			dent = readdir(sysdir);
+		}
+		closedir(sysdir);
+		if (found)
+			return 0;
+	}
+
+	sprintf(pci_dev_dir, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/",
+		domain, bus, dev, func);
+
+	sysdir = opendir(pci_dev_dir);
+	if (!sysdir)
+		return -EINVAL;
+
+	dent = readdir(sysdir);
+	while (dent) {
+		if (!strncmp(dent->d_name, "drm:controlD", 12)) {
+			found = 1;
+			break;
+		}
+
+		dent = readdir(sysdir);
+	}
+
+	closedir(sysdir);
+	if (found)
+		return 0;
+#elif defined (__FreeBSD__) || defined (__FreeBSD_kernel__)
+	char kbusid[1024], sbusid[1024];
+	char oid[128];
+	int domain, bus, dev, func;
+	int i, modesetting, ret;
+	size_t len;
+
+	ret = sscanf(busid, "pci:%04x:%02x:%02x.%d", &domain, &bus, &dev,
+	    &func);
+	if (ret != 4)
+		return -EINVAL;
+	snprintf(kbusid, sizeof(kbusid), "pci:%04x:%02x:%02x.%d", domain, bus,
+	    dev, func);
+
+	/* How many GPUs do we expect in the machine ? */
+	for (i = 0; i < 16; i++) {
+		snprintf(oid, sizeof(oid), "hw.dri.%d.busid", i);
+		len = sizeof(sbusid);
+		ret = sysctlbyname(oid, sbusid, &len, NULL, 0);
+		if (ret == -1) {
+			if (errno == ENOENT)
+				continue;
+			return -EINVAL;
+		}
+		if (strcmp(sbusid, kbusid) != 0)
+			continue;
+		snprintf(oid, sizeof(oid), "hw.dri.%d.modesetting", i);
+		len = sizeof(modesetting);
+		ret = sysctlbyname(oid, &modesetting, &len, NULL, 0);
+		if (ret == -1 || len != sizeof(modesetting))
+			return -EINVAL;
+		return (modesetting ? 0 : -ENOSYS);
+	}
+#elif defined(__DragonFly__)
+	return 0;
+#endif
+	return -ENOSYS;
+
+}
+
+int drmModeCrtcGetGamma(int fd, uint32_t crtc_id, uint32_t size,
+			uint16_t *red, uint16_t *green, uint16_t *blue)
+{
+	struct drm_mode_crtc_lut l;
+
+	memclear(l);
+	l.crtc_id = crtc_id;
+	l.gamma_size = size;
+	l.red = VOID2U64(red);
+	l.green = VOID2U64(green);
+	l.blue = VOID2U64(blue);
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_GETGAMMA, &l);
+}
+
+int drmModeCrtcSetGamma(int fd, uint32_t crtc_id, uint32_t size,
+			uint16_t *red, uint16_t *green, uint16_t *blue)
+{
+	struct drm_mode_crtc_lut l;
+
+	memclear(l);
+	l.crtc_id = crtc_id;
+	l.gamma_size = size;
+	l.red = VOID2U64(red);
+	l.green = VOID2U64(green);
+	l.blue = VOID2U64(blue);
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_SETGAMMA, &l);
+}
+
+int drmHandleEvent(int fd, drmEventContextPtr evctx)
+{
+	char buffer[1024];
+	int len, i;
+	struct drm_event *e;
+	struct drm_event_vblank *vblank;
+
+	/* The DRM read semantics guarantee that we always get only
+	 * complete events. */
+
+	len = read(fd, buffer, sizeof buffer);
+	if (len == 0)
+		return 0;
+	if (len < (int)sizeof *e)
+		return -1;
+
+	i = 0;
+	while (i < len) {
+		e = (struct drm_event *) &buffer[i];
+		switch (e->type) {
+		case DRM_EVENT_VBLANK:
+			if (evctx->version < 1 ||
+			    evctx->vblank_handler == NULL)
+				break;
+			vblank = (struct drm_event_vblank *) e;
+			evctx->vblank_handler(fd,
+					      vblank->sequence, 
+					      vblank->tv_sec,
+					      vblank->tv_usec,
+					      U642VOID (vblank->user_data));
+			break;
+		case DRM_EVENT_FLIP_COMPLETE:
+			if (evctx->version < 2 ||
+			    evctx->page_flip_handler == NULL)
+				break;
+			vblank = (struct drm_event_vblank *) e;
+			evctx->page_flip_handler(fd,
+						 vblank->sequence,
+						 vblank->tv_sec,
+						 vblank->tv_usec,
+						 U642VOID (vblank->user_data));
+			break;
+		default:
+			break;
+		}
+		i += e->length;
+	}
+
+	return 0;
+}
+
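+/*
+ * Usage sketch: draining events with drmHandleEvent().  A page flip is
+ * requested below with DRM_MODE_PAGE_FLIP_EVENT (from drm_mode.h) so the
+ * kernel queues a DRM_EVENT_FLIP_COMPLETE; flip_done is a hypothetical
+ * handler matching the drmEventContext signature.
+ *
+ *   static void flip_done(int fd, unsigned int seq, unsigned int sec,
+ *                         unsigned int usec, void *data)
+ *   {
+ *       *(int *)data = 1;
+ *   }
+ *
+ *   int done = 0;
+ *   drmEventContext evctx;
+ *   memset(&evctx, 0, sizeof(evctx));
+ *   evctx.version = 2;
+ *   evctx.page_flip_handler = flip_done;
+ *   drmModePageFlip(fd, crtc_id, fb_id, DRM_MODE_PAGE_FLIP_EVENT, &done);
+ *   while (!done) {
+ *       fd_set fds;
+ *       FD_ZERO(&fds);
+ *       FD_SET(fd, &fds);
+ *       if (select(fd + 1, &fds, NULL, NULL, NULL) > 0)
+ *           drmHandleEvent(fd, &evctx);
+ *   }
+ */
+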
+int drmModePageFlip(int fd, uint32_t crtc_id, uint32_t fb_id,
+		    uint32_t flags, void *user_data)
+{
+	struct drm_mode_crtc_page_flip flip;
+
+	memclear(flip);
+	flip.fb_id = fb_id;
+	flip.crtc_id = crtc_id;
+	flip.user_data = VOID2U64(user_data);
+	flip.flags = flags;
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_PAGE_FLIP, &flip);
+}
+
+int drmModeSetPlane(int fd, uint32_t plane_id, uint32_t crtc_id,
+		    uint32_t fb_id, uint32_t flags,
+		    int32_t crtc_x, int32_t crtc_y,
+		    uint32_t crtc_w, uint32_t crtc_h,
+		    uint32_t src_x, uint32_t src_y,
+		    uint32_t src_w, uint32_t src_h)
+{
+	struct drm_mode_set_plane s;
+
+	memclear(s);
+	s.plane_id = plane_id;
+	s.crtc_id = crtc_id;
+	s.fb_id = fb_id;
+	s.flags = flags;
+	s.crtc_x = crtc_x;
+	s.crtc_y = crtc_y;
+	s.crtc_w = crtc_w;
+	s.crtc_h = crtc_h;
+	s.src_x = src_x;
+	s.src_y = src_y;
+	s.src_w = src_w;
+	s.src_h = src_h;
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_SETPLANE, &s);
+}
+
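+/*
+ * Usage sketch: the src_* arguments to drmModeSetPlane() are in 16.16
+ * fixed point, so scanning out a full width x height source region of the
+ * framebuffer looks like:
+ *
+ *   drmModeSetPlane(fd, plane_id, crtc_id, fb_id, 0,
+ *                   0, 0, width, height,
+ *                   0, 0, width << 16, height << 16);
+ */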
+
+drmModePlanePtr drmModeGetPlane(int fd, uint32_t plane_id)
+{
+	struct drm_mode_get_plane ovr, counts;
+	drmModePlanePtr r = 0;
+
+retry:
+	memclear(ovr);
+	ovr.plane_id = plane_id;
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETPLANE, &ovr))
+		return 0;
+
+	counts = ovr;
+
+	if (ovr.count_format_types) {
+		ovr.format_type_ptr = VOID2U64(drmMalloc(ovr.count_format_types *
+							 sizeof(uint32_t)));
+		if (!ovr.format_type_ptr)
+			goto err_allocs;
+	}
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETPLANE, &ovr))
+		goto err_allocs;
+
+	if (counts.count_format_types < ovr.count_format_types) {
+		drmFree(U642VOID(ovr.format_type_ptr));
+		goto retry;
+	}
+
+	if (!(r = drmMalloc(sizeof(*r))))
+		goto err_allocs;
+
+	r->count_formats = ovr.count_format_types;
+	r->plane_id = ovr.plane_id;
+	r->crtc_id = ovr.crtc_id;
+	r->fb_id = ovr.fb_id;
+	r->possible_crtcs = ovr.possible_crtcs;
+	r->gamma_size = ovr.gamma_size;
+	r->formats = drmAllocCpy(U642VOID(ovr.format_type_ptr),
+				 ovr.count_format_types, sizeof(uint32_t));
+	if (ovr.count_format_types && !r->formats) {
+		drmFree(r->formats);
+		drmFree(r);
+		r = 0;
+	}
+
+err_allocs:
+	drmFree(U642VOID(ovr.format_type_ptr));
+
+	return r;
+}
+
+void drmModeFreePlane(drmModePlanePtr ptr)
+{
+	if (!ptr)
+		return;
+
+	drmFree(ptr->formats);
+	drmFree(ptr);
+}
+
+drmModePlaneResPtr drmModeGetPlaneResources(int fd)
+{
+	struct drm_mode_get_plane_res res, counts;
+	drmModePlaneResPtr r = 0;
+
+retry:
+	memclear(res);
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETPLANERESOURCES, &res))
+		return 0;
+
+	counts = res;
+
+	if (res.count_planes) {
+		res.plane_id_ptr = VOID2U64(drmMalloc(res.count_planes *
+							sizeof(uint32_t)));
+		if (!res.plane_id_ptr)
+			goto err_allocs;
+	}
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_GETPLANERESOURCES, &res))
+		goto err_allocs;
+
+	if (counts.count_planes < res.count_planes) {
+		drmFree(U642VOID(res.plane_id_ptr));
+		goto retry;
+	}
+
+	if (!(r = drmMalloc(sizeof(*r))))
+		goto err_allocs;
+
+	r->count_planes = res.count_planes;
+	r->planes = drmAllocCpy(U642VOID(res.plane_id_ptr),
+				  res.count_planes, sizeof(uint32_t));
+	if (res.count_planes && !r->planes) {
+		drmFree(r->planes);
+		drmFree(r);
+		r = 0;
+	}
+
+err_allocs:
+	drmFree(U642VOID(res.plane_id_ptr));
+
+	return r;
+}
+
+void drmModeFreePlaneResources(drmModePlaneResPtr ptr)
+{
+	if (!ptr)
+		return;
+
+	drmFree(ptr->planes);
+	drmFree(ptr);
+}
+
+drmModeObjectPropertiesPtr drmModeObjectGetProperties(int fd,
+						      uint32_t object_id,
+						      uint32_t object_type)
+{
+	struct drm_mode_obj_get_properties properties;
+	drmModeObjectPropertiesPtr ret = NULL;
+	uint32_t count;
+
+retry:
+	memclear(properties);
+	properties.obj_id = object_id;
+	properties.obj_type = object_type;
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_OBJ_GETPROPERTIES, &properties))
+		return 0;
+
+	count = properties.count_props;
+
+	if (count) {
+		properties.props_ptr = VOID2U64(drmMalloc(count *
+							  sizeof(uint32_t)));
+		if (!properties.props_ptr)
+			goto err_allocs;
+		properties.prop_values_ptr = VOID2U64(drmMalloc(count *
+						      sizeof(uint64_t)));
+		if (!properties.prop_values_ptr)
+			goto err_allocs;
+	}
+
+	if (drmIoctl(fd, DRM_IOCTL_MODE_OBJ_GETPROPERTIES, &properties))
+		goto err_allocs;
+
+	if (count < properties.count_props) {
+		drmFree(U642VOID(properties.props_ptr));
+		drmFree(U642VOID(properties.prop_values_ptr));
+		goto retry;
+	}
+	count = properties.count_props;
+
+	ret = drmMalloc(sizeof(*ret));
+	if (!ret)
+		goto err_allocs;
+
+	ret->count_props = count;
+	ret->props = drmAllocCpy(U642VOID(properties.props_ptr),
+				 count, sizeof(uint32_t));
+	ret->prop_values = drmAllocCpy(U642VOID(properties.prop_values_ptr),
+				       count, sizeof(uint64_t));
+	if (ret->count_props && (!ret->props || !ret->prop_values)) {
+		drmFree(ret->props);
+		drmFree(ret->prop_values);
+		drmFree(ret);
+		ret = NULL;
+	}
+
+err_allocs:
+	drmFree(U642VOID(properties.props_ptr));
+	drmFree(U642VOID(properties.prop_values_ptr));
+	return ret;
+}
+
+void drmModeFreeObjectProperties(drmModeObjectPropertiesPtr ptr)
+{
+	if (!ptr)
+		return;
+	drmFree(ptr->props);
+	drmFree(ptr->prop_values);
+	drmFree(ptr);
+}
+
+int drmModeObjectSetProperty(int fd, uint32_t object_id, uint32_t object_type,
+			     uint32_t property_id, uint64_t value)
+{
+	struct drm_mode_obj_set_property prop;
+
+	memclear(prop);
+	prop.value = value;
+	prop.prop_id = property_id;
+	prop.obj_id = object_id;
+	prop.obj_type = object_type;
+
+	return DRM_IOCTL(fd, DRM_IOCTL_MODE_OBJ_SETPROPERTY, &prop);
+}
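+
+/*
+ * Usage sketch: dumping the properties of a plane by name, combining
+ * drmModeObjectGetProperties() with drmModeGetProperty().  Assumes
+ * DRM_MODE_OBJECT_PLANE from drm_mode.h and a valid plane_id.
+ *
+ *   drmModeObjectPropertiesPtr props =
+ *       drmModeObjectGetProperties(fd, plane_id, DRM_MODE_OBJECT_PLANE);
+ *   if (props) {
+ *       uint32_t i;
+ *       for (i = 0; i < props->count_props; i++) {
+ *           drmModePropertyPtr p = drmModeGetProperty(fd, props->props[i]);
+ *           if (!p)
+ *               continue;
+ *           printf("%s = %llu\n", p->name,
+ *                  (unsigned long long)props->prop_values[i]);
+ *           drmModeFreeProperty(p);
+ *       }
+ *       drmModeFreeObjectProperties(props);
+ *   }
+ */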
diff --git a/icd/intel/kmd/libdrm/xf86drmMode.h b/icd/intel/kmd/libdrm/xf86drmMode.h
new file mode 100644
index 0000000..2d30184
--- /dev/null
+++ b/icd/intel/kmd/libdrm/xf86drmMode.h
@@ -0,0 +1,478 @@
+/*
+ * \file xf86drmMode.h
+ * Header for DRM modesetting interface.
+ *
+ * \author Jakob Bornecrantz <wallbraker@gmail.com>
+ *
+ * \par Acknowledgements:
+ * Feb 2007, Dave Airlie <airlied@linux.ie>
+ */
+
+/*
+ * Copyright (c) 2007-2008 Tungsten Graphics, Inc., Cedar Park, Texas.
+ * Copyright (c) 2007-2008 Dave Airlie <airlied@linux.ie>
+ * Copyright (c) 2007-2008 Jakob Bornecrantz <wallbraker@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _XF86DRMMODE_H_
+#define _XF86DRMMODE_H_
+
+#if defined(__cplusplus) || defined(c_plusplus)
+extern "C" {
+#endif
+
+#include <drm.h>
+
+/*
+ * This is the interface for modesetting for drm.
+ *
+ * In order to use this interface you must include either <stdint.h> or another
+ * header defining uint32_t, int32_t and uint16_t.
+ *
+ * It aims to provide a randr1.2 compatible interface for modesetting in the
+ * kernel; the interface is also meant to be used by libraries like EGL.
+ *
+ * More information can be found in randrproto.txt which can be found here:
+ * http://gitweb.freedesktop.org/?p=xorg/proto/randrproto.git
+ *
+ * There are some major differences to be noted. Unlike the randr1.2 proto you
+ * need to create the memory object of the framebuffer yourself with the ttm
+ * buffer object interface. This object needs to be pinned.
+ */
+
+/*
+ * If we pick up an old version of drm.h which doesn't include drm_mode.h,
+ * we should redefine the defines here. This is so that builds don't break
+ * with a new libdrm on old kernels.
+ */
+#ifndef _DRM_MODE_H
+
+#define DRM_DISPLAY_INFO_LEN    32
+#define DRM_CONNECTOR_NAME_LEN  32
+#define DRM_DISPLAY_MODE_LEN    32
+#define DRM_PROP_NAME_LEN       32
+
+#define DRM_MODE_TYPE_BUILTIN   (1<<0)
+#define DRM_MODE_TYPE_CLOCK_C   ((1<<1) | DRM_MODE_TYPE_BUILTIN)
+#define DRM_MODE_TYPE_CRTC_C    ((1<<2) | DRM_MODE_TYPE_BUILTIN)
+#define DRM_MODE_TYPE_PREFERRED (1<<3)
+#define DRM_MODE_TYPE_DEFAULT   (1<<4)
+#define DRM_MODE_TYPE_USERDEF   (1<<5)
+#define DRM_MODE_TYPE_DRIVER    (1<<6)
+
+/* Video mode flags */
+/* bit compatible with the xorg definitions. */
+#define DRM_MODE_FLAG_PHSYNC			(1<<0)
+#define DRM_MODE_FLAG_NHSYNC			(1<<1)
+#define DRM_MODE_FLAG_PVSYNC			(1<<2)
+#define DRM_MODE_FLAG_NVSYNC			(1<<3)
+#define DRM_MODE_FLAG_INTERLACE			(1<<4)
+#define DRM_MODE_FLAG_DBLSCAN			(1<<5)
+#define DRM_MODE_FLAG_CSYNC			(1<<6)
+#define DRM_MODE_FLAG_PCSYNC			(1<<7)
+#define DRM_MODE_FLAG_NCSYNC			(1<<8)
+#define DRM_MODE_FLAG_HSKEW			(1<<9) /* hskew provided */
+#define DRM_MODE_FLAG_BCAST			(1<<10)
+#define DRM_MODE_FLAG_PIXMUX			(1<<11)
+#define DRM_MODE_FLAG_DBLCLK			(1<<12)
+#define DRM_MODE_FLAG_CLKDIV2			(1<<13)
+#define DRM_MODE_FLAG_3D_MASK			(0x1f<<14)
+#define  DRM_MODE_FLAG_3D_NONE			(0<<14)
+#define  DRM_MODE_FLAG_3D_FRAME_PACKING		(1<<14)
+#define  DRM_MODE_FLAG_3D_FIELD_ALTERNATIVE	(2<<14)
+#define  DRM_MODE_FLAG_3D_LINE_ALTERNATIVE	(3<<14)
+#define  DRM_MODE_FLAG_3D_SIDE_BY_SIDE_FULL	(4<<14)
+#define  DRM_MODE_FLAG_3D_L_DEPTH		(5<<14)
+#define  DRM_MODE_FLAG_3D_L_DEPTH_GFX_GFX_DEPTH	(6<<14)
+#define  DRM_MODE_FLAG_3D_TOP_AND_BOTTOM	(7<<14)
+#define  DRM_MODE_FLAG_3D_SIDE_BY_SIDE_HALF	(8<<14)
+
+/* DPMS flags */
+/* bit compatible with the xorg definitions. */
+#define DRM_MODE_DPMS_ON        0
+#define DRM_MODE_DPMS_STANDBY   1
+#define DRM_MODE_DPMS_SUSPEND   2
+#define DRM_MODE_DPMS_OFF       3
+
+/* Scaling mode options */
+#define DRM_MODE_SCALE_NON_GPU          0
+#define DRM_MODE_SCALE_FULLSCREEN       1
+#define DRM_MODE_SCALE_NO_SCALE         2
+#define DRM_MODE_SCALE_ASPECT           3
+
+/* Dithering mode options */
+#define DRM_MODE_DITHERING_OFF  0
+#define DRM_MODE_DITHERING_ON   1
+
+#define DRM_MODE_ENCODER_NONE   0
+#define DRM_MODE_ENCODER_DAC    1
+#define DRM_MODE_ENCODER_TMDS   2
+#define DRM_MODE_ENCODER_LVDS   3
+#define DRM_MODE_ENCODER_TVDAC  4
+#define DRM_MODE_ENCODER_VIRTUAL 5
+#define DRM_MODE_ENCODER_DSI	6
+
+#define DRM_MODE_SUBCONNECTOR_Automatic 0
+#define DRM_MODE_SUBCONNECTOR_Unknown   0
+#define DRM_MODE_SUBCONNECTOR_DVID      3
+#define DRM_MODE_SUBCONNECTOR_DVIA      4
+#define DRM_MODE_SUBCONNECTOR_Composite 5
+#define DRM_MODE_SUBCONNECTOR_SVIDEO    6
+#define DRM_MODE_SUBCONNECTOR_Component 8
+#define DRM_MODE_SUBCONNECTOR_SCART     9
+
+#define DRM_MODE_CONNECTOR_Unknown      0
+#define DRM_MODE_CONNECTOR_VGA          1
+#define DRM_MODE_CONNECTOR_DVII         2
+#define DRM_MODE_CONNECTOR_DVID         3
+#define DRM_MODE_CONNECTOR_DVIA         4
+#define DRM_MODE_CONNECTOR_Composite    5
+#define DRM_MODE_CONNECTOR_SVIDEO       6
+#define DRM_MODE_CONNECTOR_LVDS         7
+#define DRM_MODE_CONNECTOR_Component    8
+#define DRM_MODE_CONNECTOR_9PinDIN      9
+#define DRM_MODE_CONNECTOR_DisplayPort  10
+#define DRM_MODE_CONNECTOR_HDMIA        11
+#define DRM_MODE_CONNECTOR_HDMIB        12
+#define DRM_MODE_CONNECTOR_TV		13
+#define DRM_MODE_CONNECTOR_eDP		14
+#define DRM_MODE_CONNECTOR_VIRTUAL      15
+#define DRM_MODE_CONNECTOR_DSI          16
+
+#define DRM_MODE_PROP_PENDING   (1<<0)
+#define DRM_MODE_PROP_RANGE     (1<<1)
+#define DRM_MODE_PROP_IMMUTABLE (1<<2)
+#define DRM_MODE_PROP_ENUM      (1<<3) /* enumerated type with text strings */
+#define DRM_MODE_PROP_BLOB      (1<<4)
+
+#define DRM_MODE_CURSOR_BO      (1<<0)
+#define DRM_MODE_CURSOR_MOVE    (1<<1)
+
+#endif /* _DRM_MODE_H */
+
+
+/*
+ * Feature defines
+ *
+ * Just because these are defined doesn't mean that the kernel
+ * can do that feature; it's just for new code vs. old libdrm.
+ */
+#define DRM_MODE_FEATURE_KMS		1
+#define DRM_MODE_FEATURE_DIRTYFB	1
+
+
+typedef struct _drmModeRes {
+
+	int count_fbs;
+	uint32_t *fbs;
+
+	int count_crtcs;
+	uint32_t *crtcs;
+
+	int count_connectors;
+	uint32_t *connectors;
+
+	int count_encoders;
+	uint32_t *encoders;
+
+	uint32_t min_width, max_width;
+	uint32_t min_height, max_height;
+} drmModeRes, *drmModeResPtr;
+
+typedef struct _drmModeModeInfo {
+	uint32_t clock;
+	uint16_t hdisplay, hsync_start, hsync_end, htotal, hskew;
+	uint16_t vdisplay, vsync_start, vsync_end, vtotal, vscan;
+
+	uint32_t vrefresh;
+
+	uint32_t flags;
+	uint32_t type;
+	char name[DRM_DISPLAY_MODE_LEN];
+} drmModeModeInfo, *drmModeModeInfoPtr;
+
+typedef struct _drmModeFB {
+	uint32_t fb_id;
+	uint32_t width, height;
+	uint32_t pitch;
+	uint32_t bpp;
+	uint32_t depth;
+	/* driver specific handle */
+	uint32_t handle;
+} drmModeFB, *drmModeFBPtr;
+
+typedef struct drm_clip_rect drmModeClip, *drmModeClipPtr;
+
+typedef struct _drmModePropertyBlob {
+	uint32_t id;
+	uint32_t length;
+	void *data;
+} drmModePropertyBlobRes, *drmModePropertyBlobPtr;
+
+typedef struct _drmModeProperty {
+	uint32_t prop_id;
+	uint32_t flags;
+	char name[DRM_PROP_NAME_LEN];
+	int count_values;
+	uint64_t *values; /* store the blob lengths */
+	int count_enums;
+	struct drm_mode_property_enum *enums;
+	int count_blobs;
+	uint32_t *blob_ids; /* store the blob IDs */
+} drmModePropertyRes, *drmModePropertyPtr;
+
+static __inline int drm_property_type_is(drmModePropertyPtr property,
+		uint32_t type)
+{
+	/* instanceof for props.. handles extended type vs original types: */
+	if (property->flags & DRM_MODE_PROP_EXTENDED_TYPE)
+		return (property->flags & DRM_MODE_PROP_EXTENDED_TYPE) == type;
+	return property->flags & type;
+}
+
+typedef struct _drmModeCrtc {
+	uint32_t crtc_id;
+	uint32_t buffer_id; /**< FB id to connect to, 0 = disconnect */
+
+	uint32_t x, y; /**< Position on the framebuffer */
+	uint32_t width, height;
+	int mode_valid;
+	drmModeModeInfo mode;
+
+	int gamma_size; /**< Number of gamma stops */
+
+} drmModeCrtc, *drmModeCrtcPtr;
+
+typedef struct _drmModeEncoder {
+	uint32_t encoder_id;
+	uint32_t encoder_type;
+	uint32_t crtc_id;
+	uint32_t possible_crtcs;
+	uint32_t possible_clones;
+} drmModeEncoder, *drmModeEncoderPtr;
+
+typedef enum {
+	DRM_MODE_CONNECTED         = 1,
+	DRM_MODE_DISCONNECTED      = 2,
+	DRM_MODE_UNKNOWNCONNECTION = 3
+} drmModeConnection;
+
+typedef enum {
+	DRM_MODE_SUBPIXEL_UNKNOWN        = 1,
+	DRM_MODE_SUBPIXEL_HORIZONTAL_RGB = 2,
+	DRM_MODE_SUBPIXEL_HORIZONTAL_BGR = 3,
+	DRM_MODE_SUBPIXEL_VERTICAL_RGB   = 4,
+	DRM_MODE_SUBPIXEL_VERTICAL_BGR   = 5,
+	DRM_MODE_SUBPIXEL_NONE           = 6
+} drmModeSubPixel;
+
+typedef struct _drmModeConnector {
+	uint32_t connector_id;
+	uint32_t encoder_id; /**< Encoder currently connected to */
+	uint32_t connector_type;
+	uint32_t connector_type_id;
+	drmModeConnection connection;
+	uint32_t mmWidth, mmHeight; /**< WxH in millimeters */
+	drmModeSubPixel subpixel;
+
+	int count_modes;
+	drmModeModeInfoPtr modes;
+
+	int count_props;
+	uint32_t *props; /**< List of property ids */
+	uint64_t *prop_values; /**< List of property values */
+
+	int count_encoders;
+	uint32_t *encoders; /**< List of encoder ids */
+} drmModeConnector, *drmModeConnectorPtr;
+
+#define DRM_PLANE_TYPE_OVERLAY 0
+#define DRM_PLANE_TYPE_PRIMARY 1
+#define DRM_PLANE_TYPE_CURSOR  2
+
+typedef struct _drmModeObjectProperties {
+	uint32_t count_props;
+	uint32_t *props;
+	uint64_t *prop_values;
+} drmModeObjectProperties, *drmModeObjectPropertiesPtr;
+
+typedef struct _drmModePlane {
+	uint32_t count_formats;
+	uint32_t *formats;
+	uint32_t plane_id;
+
+	uint32_t crtc_id;
+	uint32_t fb_id;
+
+	uint32_t crtc_x, crtc_y;
+	uint32_t x, y;
+
+	uint32_t possible_crtcs;
+	uint32_t gamma_size;
+} drmModePlane, *drmModePlanePtr;
+
+typedef struct _drmModePlaneRes {
+	uint32_t count_planes;
+	uint32_t *planes;
+} drmModePlaneRes, *drmModePlaneResPtr;
+
+extern void drmModeFreeModeInfo( drmModeModeInfoPtr ptr );
+extern void drmModeFreeResources( drmModeResPtr ptr );
+extern void drmModeFreeFB( drmModeFBPtr ptr );
+extern void drmModeFreeCrtc( drmModeCrtcPtr ptr );
+extern void drmModeFreeConnector( drmModeConnectorPtr ptr );
+extern void drmModeFreeEncoder( drmModeEncoderPtr ptr );
+extern void drmModeFreePlane( drmModePlanePtr ptr );
+extern void drmModeFreePlaneResources(drmModePlaneResPtr ptr);
+
+/**
+ * Retrieves all of the resources associated with a card.
+ */
+extern drmModeResPtr drmModeGetResources(int fd);
+
+/*
+ * FrameBuffer manipulation.
+ */
+
+/**
+ * Retrieve information about framebuffer bufferId.
+ */
+extern drmModeFBPtr drmModeGetFB(int fd, uint32_t bufferId);
+
+/**
+ * Creates a new framebuffer with a buffer object as its scanout buffer.
+ */
+extern int drmModeAddFB(int fd, uint32_t width, uint32_t height, uint8_t depth,
+			uint8_t bpp, uint32_t pitch, uint32_t bo_handle,
+			uint32_t *buf_id);
+/* ...with a specific pixel format */
+extern int drmModeAddFB2(int fd, uint32_t width, uint32_t height,
+			 uint32_t pixel_format, uint32_t bo_handles[4],
+			 uint32_t pitches[4], uint32_t offsets[4],
+			 uint32_t *buf_id, uint32_t flags);
+/**
+ * Destroys the given framebuffer.
+ */
+extern int drmModeRmFB(int fd, uint32_t bufferId);
+
+/**
+ * Mark a region of a framebuffer as dirty.
+ */
+extern int drmModeDirtyFB(int fd, uint32_t bufferId,
+			  drmModeClipPtr clips, uint32_t num_clips);
+
+
+/*
+ * Crtc functions
+ */
+
+/**
+ * Retrieve information about the CRTC crtcId.
+ */
+extern drmModeCrtcPtr drmModeGetCrtc(int fd, uint32_t crtcId);
+
+/**
+ * Set the mode on the CRTC crtcId with the given mode.
+ */
+int drmModeSetCrtc(int fd, uint32_t crtcId, uint32_t bufferId,
+                   uint32_t x, uint32_t y, uint32_t *connectors, int count,
+		   drmModeModeInfoPtr mode);
+
+/*
+ * Cursor functions
+ */
+
+/**
+ * Set the cursor on crtc
+ */
+int drmModeSetCursor(int fd, uint32_t crtcId, uint32_t bo_handle, uint32_t width, uint32_t height);
+
+int drmModeSetCursor2(int fd, uint32_t crtcId, uint32_t bo_handle, uint32_t width, uint32_t height, int32_t hot_x, int32_t hot_y);
+/**
+ * Move the cursor on crtc
+ */
+int drmModeMoveCursor(int fd, uint32_t crtcId, int x, int y);
+
+/**
+ * Encoder functions
+ */
+drmModeEncoderPtr drmModeGetEncoder(int fd, uint32_t encoder_id);
+
+/*
+ * Connector manipulation
+ */
+
+/**
+ * Retrieve information about the connector connectorId.
+ */
+extern drmModeConnectorPtr drmModeGetConnector(int fd,
+		uint32_t connectorId);
+
+/**
+ * Attaches the given mode to a connector.
+ */
+extern int drmModeAttachMode(int fd, uint32_t connectorId, drmModeModeInfoPtr mode_info);
+
+/**
+ * Detaches a mode from the connector.
+ * The mode must be unused by the connector.
+ */
+extern int drmModeDetachMode(int fd, uint32_t connectorId, drmModeModeInfoPtr mode_info);
+
+extern drmModePropertyPtr drmModeGetProperty(int fd, uint32_t propertyId);
+extern void drmModeFreeProperty(drmModePropertyPtr ptr);
+
+extern drmModePropertyBlobPtr drmModeGetPropertyBlob(int fd, uint32_t blob_id);
+extern void drmModeFreePropertyBlob(drmModePropertyBlobPtr ptr);
+extern int drmModeConnectorSetProperty(int fd, uint32_t connector_id, uint32_t property_id,
+				    uint64_t value);
+extern int drmCheckModesettingSupported(const char *busid);
+
+extern int drmModeCrtcSetGamma(int fd, uint32_t crtc_id, uint32_t size,
+			       uint16_t *red, uint16_t *green, uint16_t *blue);
+extern int drmModeCrtcGetGamma(int fd, uint32_t crtc_id, uint32_t size,
+			       uint16_t *red, uint16_t *green, uint16_t *blue);
+extern int drmModePageFlip(int fd, uint32_t crtc_id, uint32_t fb_id,
+			   uint32_t flags, void *user_data);
+
+extern drmModePlaneResPtr drmModeGetPlaneResources(int fd);
+extern drmModePlanePtr drmModeGetPlane(int fd, uint32_t plane_id);
+extern int drmModeSetPlane(int fd, uint32_t plane_id, uint32_t crtc_id,
+			   uint32_t fb_id, uint32_t flags,
+			   int32_t crtc_x, int32_t crtc_y,
+			   uint32_t crtc_w, uint32_t crtc_h,
+			   uint32_t src_x, uint32_t src_y,
+			   uint32_t src_w, uint32_t src_h);
+
+extern drmModeObjectPropertiesPtr drmModeObjectGetProperties(int fd,
+							uint32_t object_id,
+							uint32_t object_type);
+extern void drmModeFreeObjectProperties(drmModeObjectPropertiesPtr ptr);
+extern int drmModeObjectSetProperty(int fd, uint32_t object_id,
+				    uint32_t object_type, uint32_t property_id,
+				    uint64_t value);
+
+#if defined(__cplusplus) || defined(c_plusplus)
+}
+#endif
+
+#endif
diff --git a/icd/intel/kmd/libdrm/xf86drmRandom.c b/icd/intel/kmd/libdrm/xf86drmRandom.c
new file mode 100644
index 0000000..81f0301
--- /dev/null
+++ b/icd/intel/kmd/libdrm/xf86drmRandom.c
@@ -0,0 +1,137 @@
+/* xf86drmRandom.c -- "Minimal Standard" PRNG Implementation
+ * Created: Mon Apr 19 08:28:13 1999 by faith@precisioninsight.com
+ *
+ * Copyright 1999 Precision Insight, Inc., Cedar Park, Texas.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ * 
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ * 
+ * Authors: Rickard E. (Rik) Faith <faith@valinux.com>
+ *
+ * DESCRIPTION
+ *
+ * This file contains a simple, straightforward implementation of the Park
+ * & Miller "Minimal Standard" PRNG [PM88, PMS93], which is a Lehmer
+ * multiplicative linear congruential generator (MLCG) with a period of
+ * 2^31-1.
+ *
+ * This implementation is intended to provide a reliable, portable PRNG
+ * that is suitable for testing a hash table implementation and for
+ * implementing skip lists.
+ *
+ * FUTURE ENHANCEMENTS
+ *
+ * If initial seeds are not selected randomly, two instances of the PRNG
+ * can be correlated.  [Knuth81, pp. 32-33] describes a shuffling technique
+ * that can eliminate this problem.
+ *
+ * If PRNGs are used for simulation, the period of the current
+ * implementation may be too short.  [LE88] discusses methods of combining
+ * MLCGs to produce much longer periods, and suggests some alternative
+ * values for A and M.  [LE90 and Sch92] also provide information on
+ * long-period PRNGs.
+ *
+ * REFERENCES
+ *
+ * [Knuth81] Donald E. Knuth. The Art of Computer Programming.  Volume 2:
+ * Seminumerical Algorithms.  Reading, Massachusetts: Addison-Wesley, 1981.
+ *
+ * [LE88] Pierre L'Ecuyer. "Efficient and Portable Combined Random Number
+ * Generators".  CACM 31(6), June 1988, pp. 742-774.
+ *
+ * [LE90] Pierre L'Ecuyer. "Random Numbers for Simulation". CACM 33(10),
+ * October 1990, pp. 85-97.
+ *
+ * [PM88] Stephen K. Park and Keith W. Miller. "Random Number Generators:
+ * Good Ones are Hard to Find". CACM 31(10), October 1988, pp. 1192-1201.
+ *
+ * [Sch92] Bruce Schneier. "Pseudo-Random Sequence Generator for 32-Bit
+ * CPUs".  Dr. Dobb's Journal 17(2), February 1992, pp. 34, 37-38, 40.
+ *
+ * [PMS93] Stephen K. Park, Keith W. Miller, and Paul K. Stockmeyer.  In
+ * "Technical Correspondence: Remarks on Choosing and Implementing Random
+ * Number Generators". CACM 36(7), July 1993, pp. 105-110.
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "xf86drm.h"
+#include "xf86drmRandom.h"
+
+#define RANDOM_MAGIC 0xfeedbeef
+
+void *drmRandomCreate(unsigned long seed)
+{
+    RandomState  *state;
+
+    state           = drmMalloc(sizeof(*state));
+    if (!state) return NULL;
+    state->magic    = RANDOM_MAGIC;
+#if 0
+				/* Park & Miller, October 1988 */
+    state->a        = 16807;
+    state->m        = 2147483647;
+    state->check    = 1043618065; /* After 10000 iterations */
+#else
+				/* Park, Miller, and Stockmeyer, July 1993 */
+    state->a        = 48271;
+    state->m        = 2147483647;
+    state->check    = 399268537; /* After 10000 iterations */
+#endif
+    state->q        = state->m / state->a;
+    state->r        = state->m % state->a;
+
+    state->seed     = seed;
+				/* Check for illegal boundary conditions,
+                                   and choose closest legal value. */
+    if (state->seed <= 0)        state->seed = 1;
+    if (state->seed >= state->m) state->seed = state->m - 1;
+
+    return state;
+}
+
+int drmRandomDestroy(void *state)
+{
+    drmFree(state);
+    return 0;
+}
+
+unsigned long drmRandom(void *state)
+{
+    RandomState   *s = (RandomState *)state;
+    unsigned long hi;
+    unsigned long lo;
+
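+    /* Schrage's decomposition: computes (a * seed) mod m without
+     * overflowing unsigned long, using m = a*q + r with q = m/a, r = m%a. */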
+    hi      = s->seed / s->q;
+    lo      = s->seed % s->q;
+    s->seed = s->a * lo - s->r * hi;
+    if ((s->a * lo) <= (s->r * hi)) s->seed += s->m;
+
+    return s->seed;
+}
+
+double drmRandomDouble(void *state)
+{
+    RandomState *s = (RandomState *)state;
+    
+    return (double)drmRandom(state)/(double)s->m;
+}
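+
+/*
+ * Usage sketch: the check value stored at creation allows a quick
+ * self-test of the generator.  With the PMS93 constants and a seed of 1,
+ * the 10000th value drawn should equal 399268537 (state->check):
+ *
+ *   void *state = drmRandomCreate(1);
+ *   unsigned long i, value = 0;
+ *   for (i = 0; i < 10000; i++)
+ *       value = drmRandom(state);
+ *   drmRandomDestroy(state);
+ */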
diff --git a/icd/intel/kmd/libdrm/xf86drmRandom.h b/icd/intel/kmd/libdrm/xf86drmRandom.h
new file mode 100644
index 0000000..43b730c
--- /dev/null
+++ b/icd/intel/kmd/libdrm/xf86drmRandom.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright 1999 Precision Insight, Inc., Cedar Park, Texas.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Rickard E. (Rik) Faith <faith@valinux.com>
+ */
+
+typedef struct RandomState {
+    unsigned long magic;
+    unsigned long a;
+    unsigned long m;
+    unsigned long q;		/* m div a */
+    unsigned long r;		/* m mod a */
+    unsigned long check;
+    unsigned long seed;
+} RandomState;
diff --git a/icd/intel/kmd/winsys.h b/icd/intel/kmd/winsys.h
new file mode 100644
index 0000000..098b53d
--- /dev/null
+++ b/icd/intel/kmd/winsys.h
@@ -0,0 +1,319 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2012-2014 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chia-I Wu <olv@lunarg.com>
+ */
+
+#ifndef INTEL_WINSYS_H
+#define INTEL_WINSYS_H
+
+#include <stddef.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+/* this is compatible with i915_drm.h's definitions */
+enum intel_ring_type {
+   INTEL_RING_RENDER    = 1,
+   INTEL_RING_BSD       = 2,
+   INTEL_RING_BLT       = 3,
+   INTEL_RING_VEBOX     = 4,
+};
+
+/* this is compatible with i915_drm.h's definitions */
+enum intel_exec_flag {
+   INTEL_EXEC_GEN7_SOL_RESET  = 1 << 8,
+};
+
+/* this is compatible with i915_drm.h's definitions */
+enum intel_reloc_flag {
+   INTEL_RELOC_FENCE          = 1 << 0,
+   INTEL_RELOC_GGTT           = 1 << 1,
+   INTEL_RELOC_WRITE          = 1 << 2,
+};
+
+/* this is compatible with i915_drm.h's definitions */
+enum intel_tiling_mode {
+   INTEL_TILING_NONE = 0,
+   INTEL_TILING_X    = 1,
+   INTEL_TILING_Y    = 2,
+};
+
+enum intel_winsys_handle_type {
+   INTEL_WINSYS_HANDLE_SHARED,
+   INTEL_WINSYS_HANDLE_KMS,
+   INTEL_WINSYS_HANDLE_FD,
+};
+
+struct icd_instance;
+struct intel_winsys;
+struct intel_bo;
+
+struct intel_winsys_info {
+   int devid;
+
+   /* the sizes of the aperture in bytes */
+   size_t aperture_total;
+   size_t aperture_mappable;
+
+   bool has_llc;
+   bool has_address_swizzling;
+   bool has_logical_context;
+   bool has_ppgtt;
+
+   /* valid registers for intel_winsys_read_reg() */
+   bool has_timestamp;
+
+   /* valid flags for intel_winsys_submit_bo() */
+   bool has_gen7_sol_reset;
+};
+
+struct intel_winsys_handle {
+    enum intel_winsys_handle_type type;
+    unsigned handle;
+    unsigned stride;
+};
+
+struct intel_winsys *
+intel_winsys_create_for_fd(const struct icd_instance *instance, int fd);
+
+void
+intel_winsys_destroy(struct intel_winsys *winsys);
+
+const struct intel_winsys_info *
+intel_winsys_get_info(const struct intel_winsys *winsys);
+
+/**
+ * Read a register.  Only registers that are considered safe, such as
+ *
+ *   TIMESTAMP (0x2358)
+ *
+ * can be read.
+ */
+int
+intel_winsys_read_reg(struct intel_winsys *winsys,
+                      uint32_t reg, uint64_t *val);
+
+/**
+ * Return the numbers of submissions lost due to GPU reset.
+ *
+ * \param active_lost      Number of lost active/guilty submissions
+ * \param pending_lost     Number of lost pending/innocent submissions
+ */
+int
+intel_winsys_get_reset_stats(struct intel_winsys *winsys,
+                             uint32_t *active_lost,
+                             uint32_t *pending_lost);
+/**
+ * Allocate a buffer object.
+ *
+ * \param name             Informative description of the bo.
+ * \param size             Size of the bo.
+ * \param cpu_init         Will be initialized by CPU.
+ */
+struct intel_bo *
+intel_winsys_alloc_bo(struct intel_winsys *winsys,
+                      const char *name,
+                      unsigned long size,
+                      bool cpu_init);
+
+/**
+ * Create a bo from a winsys handle.
+ */
+struct intel_bo *
+intel_winsys_import_handle(struct intel_winsys *winsys,
+                           const char *name,
+                           const struct intel_winsys_handle *handle,
+                           unsigned long height,
+                           enum intel_tiling_mode *tiling,
+                           unsigned long *pitch);
+
+/**
+ * Export \p bo as a winsys handle for inter-process sharing.  \p tiling and
+ * \p pitch must match those set by \p intel_bo_set_tiling().
+ */
+int
+intel_winsys_export_handle(struct intel_winsys *winsys,
+                           struct intel_bo *bo,
+                           enum intel_tiling_mode tiling,
+                           unsigned long pitch,
+                           unsigned long height,
+                           struct intel_winsys_handle *handle);
+
+/**
+ * Return true when buffer objects directly specified in \p bo_array, and
+ * those indirectly referenced by them, can fit in the aperture space.
+ */
+bool
+intel_winsys_can_submit_bo(struct intel_winsys *winsys,
+                           struct intel_bo **bo_array,
+                           int count);
+
+/**
+ * Submit \p bo for execution.
+ *
+ * \p bo and all bos referenced by \p bo will be considered busy until all
+ * commands are parsed and executed.  \p ctx is ignored when the bo is not
+ * submitted to the render ring.
+ */
+int
+intel_winsys_submit_bo(struct intel_winsys *winsys,
+                       enum intel_ring_type ring,
+                       struct intel_bo *bo, int used,
+                       unsigned long flags);
+
+/**
+ * Decode the commands contained in \p bo.  For debugging.
+ *
+ * \param bo      Batch buffer to decode.
+ * \param used    Size of the commands in bytes.
+ */
+void
+intel_winsys_decode_bo(struct intel_winsys *winsys,
+                       struct intel_bo *bo, int used);
+
+/**
+ * Increase the reference count of \p bo.  No-op when \p bo is NULL.
+ */
+struct intel_bo *
+intel_bo_ref(struct intel_bo *bo);
+
+/**
+ * Decrease the reference count of \p bo.  When the reference count reaches
+ * zero, \p bo is destroyed.  No-op when \p bo is NULL.
+ */
+void
+intel_bo_unref(struct intel_bo *bo);
+
+/**
+ * Set the tiling of \p bo.  The info is used by GTT mapping and bo export.
+ */
+int
+intel_bo_set_tiling(struct intel_bo *bo,
+                    enum intel_tiling_mode tiling,
+                    unsigned long pitch);
+
+/**
+ * Map \p bo for CPU access.  Recursive mapping is allowed.
+ *
+ * map() maps the backing store into CPU address space, cached.  It will block
+ * if the bo is busy.  This variant allows fastest random reads and writes,
+ * but the caller needs to handle tiling or swizzling manually if the bo is
+ * tiled or swizzled.  If write is enabled and there is no shared last-level
+ * cache (LLC), the CPU cache will be flushed, which is expensive.
+ *
+ * map_gtt() maps the bo for MMIO access, uncached but write-combined.  It
+ * will block if the bo is busy.  This variant promises a reasonable speed for
+ * sequential writes, but reads would be very slow.  Callers always have a
+ * linear view of the bo.
+ *
+ * map_async() and map_gtt_async() work similarly to map() and map_gtt()
+ * respectively, except that they do not block.
+ */
+void *
+intel_bo_map(struct intel_bo *bo, bool write_enable);
+
+void *
+intel_bo_map_async(struct intel_bo *bo);
+
+void *
+intel_bo_map_gtt(struct intel_bo *bo);
+
+void *
+intel_bo_map_gtt_async(struct intel_bo *bo);
+
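+/*
+ * Usage sketch: filling a bo through the cached CPU mapping.  bo, data and
+ * size are placeholders; as documented above, intel_bo_map() may block
+ * until the bo is idle.
+ *
+ *   void *ptr = intel_bo_map(bo, true);
+ *   if (ptr) {
+ *      memcpy(ptr, data, size);
+ *      intel_bo_unmap(bo);
+ *   }
+ */
+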
+/**
+ * Unmap \p bo.
+ */
+void
+intel_bo_unmap(struct intel_bo *bo);
+
+/**
+ * Write data to \p bo.
+ */
+int
+intel_bo_pwrite(struct intel_bo *bo, unsigned long offset,
+                unsigned long size, const void *data);
+
+/**
+ * Read data from the bo.
+ */
+int
+intel_bo_pread(struct intel_bo *bo, unsigned long offset,
+               unsigned long size, void *data);
+
+/**
+ * Add \p target_bo to the relocation list.
+ *
+ * When \p bo is submitted for execution, and if \p target_bo has moved,
+ * the kernel will patch \p bo at \p offset to \p target_bo->offset plus
+ * \p target_offset.
+ *
+ * \p presumed_offset should be written to \p bo at \p offset.
+ */
+int
+intel_bo_add_reloc(struct intel_bo *bo, uint32_t offset,
+                   struct intel_bo *target_bo, uint32_t target_offset,
+                   uint32_t flags, uint64_t *presumed_offset);
+
+/**
+ * Return the current number of relocations.
+ */
+int
+intel_bo_get_reloc_count(struct intel_bo *bo);
+
+/**
+ * Truncate all relocations except the first \p start ones.
+ *
+ * Combined with \p intel_bo_get_reloc_count(), they can be used to undo the
+ * \p intel_bo_add_reloc() calls that were just made.
+ */
+void
+intel_bo_truncate_relocs(struct intel_bo *bo, int start);
+
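+/*
+ * Usage sketch of the undo pattern described above: snapshot the reloc
+ * count, emit relocations, and roll back on failure.  emit_state() is a
+ * placeholder for caller code that calls intel_bo_add_reloc().
+ *
+ *   int saved = intel_bo_get_reloc_count(bo);
+ *   if (emit_state(bo) < 0)
+ *      intel_bo_truncate_relocs(bo, saved);
+ */
+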
+/**
+ * Return true if \p target_bo is on the relocation list of \p bo, or on
+ * the relocation list of some bo that is referenced by \p bo.
+ */
+bool
+intel_bo_has_reloc(struct intel_bo *bo, struct intel_bo *target_bo);
+
+/**
+ * Wait until \p bo is idle, or \p timeout nanoseconds have passed.  A
+ * negative timeout means to wait indefinitely.
+ *
+ * \return 0 only when \p bo is idle
+ */
+int
+intel_bo_wait(struct intel_bo *bo, int64_t timeout);
+
+/**
+ * Return true if \p bo is busy.
+ */
+static inline bool
+intel_bo_is_busy(struct intel_bo *bo)
+{
+   return (intel_bo_wait(bo, 0) != 0);
+}
+
+#endif /* INTEL_WINSYS_H */
diff --git a/icd/intel/kmd/winsys_drm.c b/icd/intel/kmd/winsys_drm.c
new file mode 100644
index 0000000..4a9ebc7
--- /dev/null
+++ b/icd/intel/kmd/winsys_drm.c
@@ -0,0 +1,603 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2012-2014 LunarG, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chia-I Wu <olv@lunarg.com>
+ */
+
+#include <string.h>
+#include <stdlib.h>
+#include <limits.h>
+#include <errno.h>
+#ifndef ETIME
+#define ETIME ETIMEDOUT
+#endif
+#include <assert.h>
+
+#include <xf86drm.h>
+#include <i915_drm.h>
+#include <intel_bufmgr.h>
+
+#include "icd-instance.h"
+#include "icd-utils.h"
+#include "winsys.h"
+
+struct intel_winsys {
+   const struct icd_instance *instance;
+   int fd;
+   drm_intel_bufmgr *bufmgr;
+   struct intel_winsys_info info;
+
+   drm_intel_context *ctx;
+};
+
+static drm_intel_bo *
+gem_bo(const struct intel_bo *bo)
+{
+   return (drm_intel_bo *) bo;
+}
+
+static bool
+get_param(struct intel_winsys *winsys, int param, int *value)
+{
+   struct drm_i915_getparam gp;
+   int err;
+
+   *value = 0;
+
+   memset(&gp, 0, sizeof(gp));
+   gp.param = param;
+   gp.value = value;
+
+   err = drmCommandWriteRead(winsys->fd, DRM_I915_GETPARAM, &gp, sizeof(gp));
+   if (err) {
+      *value = 0;
+      return false;
+   }
+
+   return true;
+}
+
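+/*
+ * Detect bit-6 address swizzling by allocating a small X-tiled bo and
+ * asking the kernel which swizzle mode was actually assigned.
+ */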
+static bool
+test_address_swizzling(struct intel_winsys *winsys)
+{
+   drm_intel_bo *bo;
+   uint32_t tiling = I915_TILING_X, swizzle;
+   unsigned long pitch;
+
+   bo = drm_intel_bo_alloc_tiled(winsys->bufmgr,
+         "address swizzling test", 64, 64, 4, &tiling, &pitch, 0);
+   if (bo) {
+      drm_intel_bo_get_tiling(bo, &tiling, &swizzle);
+      drm_intel_bo_unreference(bo);
+   }
+   else {
+      swizzle = I915_BIT_6_SWIZZLE_NONE;
+   }
+
+   return (swizzle != I915_BIT_6_SWIZZLE_NONE);
+}
+
+static bool
+test_reg_read(struct intel_winsys *winsys, uint32_t reg)
+{
+   uint64_t dummy;
+
+   return !drm_intel_reg_read(winsys->bufmgr, reg, &dummy);
+}
+
+static bool
+probe_winsys(struct intel_winsys *winsys)
+{
+   struct intel_winsys_info *info = &winsys->info;
+   int val;
+
+   /*
+    * When we need the Nth vertex from a user vertex buffer, and the vertex is
+    * uploaded to, say, the beginning of a bo, we want the first vertex in the
+    * bo to be fetched.  One way to do this is to set the base address of the
+    * vertex buffer to
+    *
+    *   bo->offset64 + (vb->buffer_offset - vb->stride * N).
+    *
+    * The second term may be negative, and we need kernel support to do that.
+    *
+    * This check is taken from the classic driver.  u_vbuf_upload_buffers()
+    * guarantees the term is never negative, but it is good to require a
+    * recent kernel.
+    */
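+   /*
+    * Illustrative numbers: with vb->stride = 16 and vb->buffer_offset = 0,
+    * vertex 2 yields a delta of -32, which requires a kernel with
+    * I915_PARAM_HAS_RELAXED_DELTA support.
+    */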
+   get_param(winsys, I915_PARAM_HAS_RELAXED_DELTA, &val);
+   if (!val) {
+      return false;
+   }
+
+   info->devid = drm_intel_bufmgr_gem_get_devid(winsys->bufmgr);
+
+   if (drm_intel_get_aperture_sizes(winsys->fd,
+               &info->aperture_mappable, &info->aperture_total)) {
+       return false;
+   }
+
+   get_param(winsys, I915_PARAM_HAS_LLC, &val);
+   info->has_llc = val;
+   info->has_address_swizzling = test_address_swizzling(winsys);
+
+   winsys->ctx = drm_intel_gem_context_create(winsys->bufmgr);
+   if (!winsys->ctx)
+      return false;
+
+   info->has_logical_context = (winsys->ctx != NULL);
+
+   get_param(winsys, I915_PARAM_HAS_ALIASING_PPGTT, &val);
+   info->has_ppgtt = val;
+
+   /* test TIMESTAMP read */
+   info->has_timestamp = test_reg_read(winsys, 0x2358);
+
+   get_param(winsys, I915_PARAM_HAS_GEN7_SOL_RESET, &val);
+   info->has_gen7_sol_reset = val;
+
+   return true;
+}
+
+struct intel_winsys *
+intel_winsys_create_for_fd(const struct icd_instance *instance, int fd)
+{
+   /* a large batch (600 KiB) so that we can have enough relocs per bo */
+   const int batch_size = sizeof(uint32_t) * 150 * 1024;
+   struct intel_winsys *winsys;
+
+   winsys = icd_instance_alloc(instance, sizeof(*winsys), sizeof(int),
+           VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
+   if (!winsys)
+      return NULL;
+
+   memset(winsys, 0, sizeof(*winsys));
+
+   winsys->instance = instance;
+   winsys->fd = fd;
+
+   winsys->bufmgr = drm_intel_bufmgr_gem_init(winsys->fd, batch_size);
+   if (!winsys->bufmgr) {
+      icd_instance_free(instance, winsys);
+      return NULL;
+   }
+
+   if (!probe_winsys(winsys)) {
+      drm_intel_bufmgr_destroy(winsys->bufmgr);
+      icd_instance_free(instance, winsys);
+      return NULL;
+   }
+
+   /*
+    * No need to implicitly set up a fence register for each non-linear reloc
+    * entry.  INTEL_RELOC_FENCE will be set on reloc entries that need them.
+    */
+   drm_intel_bufmgr_gem_enable_fenced_relocs(winsys->bufmgr);
+
+   drm_intel_bufmgr_gem_enable_reuse(winsys->bufmgr);
+   drm_intel_bufmgr_gem_set_vma_cache_size(winsys->bufmgr, -1);
+
+   return winsys;
+}
+
+void
+intel_winsys_destroy(struct intel_winsys *winsys)
+{
+   drm_intel_gem_context_destroy(winsys->ctx);
+   drm_intel_bufmgr_destroy(winsys->bufmgr);
+   icd_instance_free(winsys->instance, winsys);
+}
+
+const struct intel_winsys_info *
+intel_winsys_get_info(const struct intel_winsys *winsys)
+{
+   return &winsys->info;
+}
+
+int
+intel_winsys_read_reg(struct intel_winsys *winsys,
+                      uint32_t reg, uint64_t *val)
+{
+   return drm_intel_reg_read(winsys->bufmgr, reg, val);
+}
+
+int
+intel_winsys_get_reset_stats(struct intel_winsys *winsys,
+                             uint32_t *active_lost,
+                             uint32_t *pending_lost)
+{
+   uint32_t reset_count;
+
+   return drm_intel_get_reset_stats(winsys->ctx,
+         &reset_count, active_lost, pending_lost);
+}
+
+struct intel_bo *
+intel_winsys_alloc_bo(struct intel_winsys *winsys,
+                      const char *name,
+                      unsigned long size,
+                      bool cpu_init)
+{
+   const unsigned int alignment = 4096; /* always page-aligned */
+   drm_intel_bo *bo;
+
+   if (cpu_init) {
+      bo = drm_intel_bo_alloc(winsys->bufmgr, name, size, alignment);
+   } else {
+      bo = drm_intel_bo_alloc_for_render(winsys->bufmgr,
+            name, size, alignment);
+   }
+
+   return (struct intel_bo *) bo;
+}
+
+struct intel_bo *
+intel_winsys_import_handle(struct intel_winsys *winsys,
+                           const char *name,
+                           const struct intel_winsys_handle *handle,
+                           unsigned long height,
+                           enum intel_tiling_mode *tiling,
+                           unsigned long *pitch)
+{
+   uint32_t real_tiling, swizzle;
+   drm_intel_bo *bo;
+   int err;
+
+   switch (handle->type) {
+   case INTEL_WINSYS_HANDLE_SHARED:
+      {
+         const uint32_t gem_name = handle->handle;
+         bo = drm_intel_bo_gem_create_from_name(winsys->bufmgr,
+               name, gem_name);
+      }
+      break;
+   case INTEL_WINSYS_HANDLE_FD:
+      {
+         const int fd = (int) handle->handle;
+         bo = drm_intel_bo_gem_create_from_prime(winsys->bufmgr,
+               fd, height * handle->stride);
+      }
+      break;
+   default:
+      bo = NULL;
+      break;
+   }
+
+   if (!bo)
+      return NULL;
+
+   err = drm_intel_bo_get_tiling(bo, &real_tiling, &swizzle);
+   if (err) {
+      drm_intel_bo_unreference(bo);
+      return NULL;
+   }
+
+   *tiling = real_tiling;
+   *pitch = handle->stride;
+
+   return (struct intel_bo *) bo;
+}
+
+int
+intel_winsys_export_handle(struct intel_winsys *winsys,
+                           struct intel_bo *bo,
+                           enum intel_tiling_mode tiling,
+                           unsigned long pitch,
+                           unsigned long height,
+                           struct intel_winsys_handle *handle)
+{
+   int err = 0;
+
+   switch (handle->type) {
+   case INTEL_WINSYS_HANDLE_SHARED:
+      {
+         uint32_t name;
+
+         err = drm_intel_bo_flink(gem_bo(bo), &name);
+         if (!err)
+            handle->handle = name;
+      }
+      break;
+   case INTEL_WINSYS_HANDLE_KMS:
+      handle->handle = gem_bo(bo)->handle;
+      break;
+   case INTEL_WINSYS_HANDLE_FD:
+      {
+         int fd;
+
+         err = drm_intel_bo_gem_export_to_prime(gem_bo(bo), &fd);
+         if (!err)
+            handle->handle = fd;
+      }
+      break;
+   default:
+      err = -EINVAL;
+      break;
+   }
+
+   if (err)
+      return err;
+
+   handle->stride = pitch;
+
+   return 0;
+}
+
+bool
+intel_winsys_can_submit_bo(struct intel_winsys *winsys,
+                           struct intel_bo **bo_array,
+                           int count)
+{
+   return !drm_intel_bufmgr_check_aperture_space((drm_intel_bo **) bo_array,
+                                                 count);
+}
+
+int
+intel_winsys_submit_bo(struct intel_winsys *winsys,
+                       enum intel_ring_type ring,
+                       struct intel_bo *bo, int used,
+                       unsigned long flags)
+{
+   const unsigned long exec_flags = (unsigned long) ring | flags;
+   drm_intel_context *ctx;
+
+   /* logical contexts are only available for the render ring */
+   ctx = (ring == INTEL_RING_RENDER) ? winsys->ctx : NULL;
+
+   if (ctx) {
+      return drm_intel_gem_bo_context_exec(gem_bo(bo),
+            ctx, used, exec_flags);
+   }
+   else {
+      return drm_intel_bo_mrb_exec(gem_bo(bo),
+            used, NULL, 0, 0, exec_flags);
+   }
+}
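+
+/*
+ * Illustrative call (assuming `batch` holds `used` bytes of valid commands):
+ *
+ *   intel_winsys_submit_bo(winsys, INTEL_RING_RENDER, batch, used, 0);
+ *
+ * With INTEL_RING_RENDER, the submission goes through the logical context.
+ */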
+
+void
+intel_winsys_decode_bo(struct intel_winsys *winsys,
+                       struct intel_bo *bo, int used)
+{
+   struct drm_intel_decode *decode;
+   void *ptr;
+
+   ptr = intel_bo_map(bo, false);
+   if (!ptr) {
+      return;
+   }
+
+   decode = drm_intel_decode_context_alloc(winsys->info.devid);
+   if (!decode) {
+      intel_bo_unmap(bo);
+      return;
+   }
+
+   drm_intel_decode_set_output_file(decode, stderr);
+
+   /* in dwords */
+   used /= 4;
+
+   drm_intel_decode_set_batch_pointer(decode,
+         ptr, gem_bo(bo)->offset64, used);
+
+   drm_intel_decode(decode);
+   free(decode);
+   intel_bo_unmap(bo);
+}
+
+struct intel_bo *
+intel_bo_ref(struct intel_bo *bo)
+{
+   if (bo)
+      drm_intel_bo_reference(gem_bo(bo));
+
+   return bo;
+}
+
+void
+intel_bo_unref(struct intel_bo *bo)
+{
+   if (bo)
+      drm_intel_bo_unreference(gem_bo(bo));
+}
+
+int
+intel_bo_set_tiling(struct intel_bo *bo,
+                    enum intel_tiling_mode tiling,
+                    unsigned long pitch)
+{
+   uint32_t real_tiling = tiling;
+   int err;
+
+   switch (tiling) {
+   case INTEL_TILING_X:
+      if (pitch % 512)
+         return -1;
+      break;
+   case INTEL_TILING_Y:
+      if (pitch % 128)
+         return -1;
+      break;
+   default:
+      break;
+   }
+
+   err = drm_intel_bo_set_tiling(gem_bo(bo), &real_tiling, pitch);
+   if (err || real_tiling != tiling) {
+      assert(!"tiling mismatch");
+      return -1;
+   }
+
+   return 0;
+}
+
+void *
+intel_bo_map(struct intel_bo *bo, bool write_enable)
+{
+   int err;
+
+   err = drm_intel_bo_map(gem_bo(bo), write_enable);
+   if (err) {
+      return NULL;
+   }
+
+   return gem_bo(bo)->virtual;
+}
+
+void *
+intel_bo_map_async(struct intel_bo *bo)
+{
+   int err;
+
+   err = drm_intel_gem_bo_map_unsynchronized_non_gtt(gem_bo(bo));
+   if (err) {
+      return NULL;
+   }
+
+   return gem_bo(bo)->virtual;
+}
+
+void *
+intel_bo_map_gtt(struct intel_bo *bo)
+{
+   int err;
+
+   err = drm_intel_gem_bo_map_gtt(gem_bo(bo));
+   if (err) {
+      return NULL;
+   }
+
+   return gem_bo(bo)->virtual;
+}
+
+void *
+intel_bo_map_gtt_async(struct intel_bo *bo)
+{
+   int err;
+
+   err = drm_intel_gem_bo_map_unsynchronized(gem_bo(bo));
+   if (err) {
+      return NULL;
+   }
+
+   return gem_bo(bo)->virtual;
+}
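+
+/*
+ * Illustrative usage of the map family (`data` and `size` are assumed to be
+ * provided by the caller):
+ *
+ *   void *ptr = intel_bo_map(bo, true);
+ *   if (ptr) {
+ *      memcpy(ptr, data, size);
+ *      intel_bo_unmap(bo);
+ *   }
+ */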
+
+void
+intel_bo_unmap(struct intel_bo *bo)
+{
+   int err U_ASSERT_ONLY;
+
+   err = drm_intel_bo_unmap(gem_bo(bo));
+   assert(!err);
+}
+
+int
+intel_bo_pwrite(struct intel_bo *bo, unsigned long offset,
+                unsigned long size, const void *data)
+{
+   return drm_intel_bo_subdata(gem_bo(bo), offset, size, data);
+}
+
+int
+intel_bo_pread(struct intel_bo *bo, unsigned long offset,
+               unsigned long size, void *data)
+{
+   return drm_intel_bo_get_subdata(gem_bo(bo), offset, size, data);
+}
+
+int
+intel_bo_add_reloc(struct intel_bo *bo, uint32_t offset,
+                   struct intel_bo *target_bo, uint32_t target_offset,
+                   uint32_t flags, uint64_t *presumed_offset)
+{
+   uint32_t read_domains, write_domain;
+   int err;
+
+   if (flags & INTEL_RELOC_WRITE) {
+      /*
+       * Because of the translation to domains, INTEL_RELOC_GGTT should only
+       * be set on GEN6 when the bo is written by MI_* or PIPE_CONTROL.  The
+       * kernel will translate it back to INTEL_RELOC_GGTT.
+       */
+      write_domain = (flags & INTEL_RELOC_GGTT) ?
+         I915_GEM_DOMAIN_INSTRUCTION : I915_GEM_DOMAIN_RENDER;
+      read_domains = write_domain;
+   } else {
+      write_domain = 0;
+      read_domains = I915_GEM_DOMAIN_RENDER |
+                     I915_GEM_DOMAIN_SAMPLER |
+                     I915_GEM_DOMAIN_INSTRUCTION |
+                     I915_GEM_DOMAIN_VERTEX;
+   }
+
+   if (flags & INTEL_RELOC_FENCE) {
+      err = drm_intel_bo_emit_reloc_fence(gem_bo(bo), offset,
+            gem_bo(target_bo), target_offset,
+            read_domains, write_domain);
+   } else {
+      err = drm_intel_bo_emit_reloc(gem_bo(bo), offset,
+            gem_bo(target_bo), target_offset,
+            read_domains, write_domain);
+   }
+
+   *presumed_offset = gem_bo(target_bo)->offset64 + target_offset;
+
+   return err;
+}
+
+int
+intel_bo_get_reloc_count(struct intel_bo *bo)
+{
+   return drm_intel_gem_bo_get_reloc_count(gem_bo(bo));
+}
+
+void
+intel_bo_truncate_relocs(struct intel_bo *bo, int start)
+{
+   drm_intel_gem_bo_clear_relocs(gem_bo(bo), start);
+}
+
+bool
+intel_bo_has_reloc(struct intel_bo *bo, struct intel_bo *target_bo)
+{
+   return drm_intel_bo_references(gem_bo(bo), gem_bo(target_bo));
+}
+
+int
+intel_bo_wait(struct intel_bo *bo, int64_t timeout)
+{
+   int err = 0;
+
+   if (timeout >= 0)
+       err = drm_intel_gem_bo_wait(gem_bo(bo), timeout);
+   else
+       drm_intel_bo_wait_rendering(gem_bo(bo));
+
+   /* consider the bo idle on errors */
+   if (err && err != -ETIME)
+      err = 0;
+
+   return err;
+}
diff --git a/icd/intel/layout.c b/icd/intel/layout.c
new file mode 100644
index 0000000..a2b681b
--- /dev/null
+++ b/icd/intel/layout.c
@@ -0,0 +1,1403 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#include "dev.h"
+#include "format.h"
+#include "gpu.h"
+#include "layout.h"
+
+enum {
+   LAYOUT_TILING_NONE = 1 << GEN6_TILING_NONE,
+   LAYOUT_TILING_X = 1 << GEN6_TILING_X,
+   LAYOUT_TILING_Y = 1 << GEN6_TILING_Y,
+   LAYOUT_TILING_W = 1 << GEN8_TILING_W,
+
+   LAYOUT_TILING_ALL = (LAYOUT_TILING_NONE |
+                        LAYOUT_TILING_X |
+                        LAYOUT_TILING_Y |
+                        LAYOUT_TILING_W)
+};
+
+struct intel_layout_params {
+   struct intel_dev *dev;
+
+   const struct intel_gpu *gpu;
+   const VkImageCreateInfo *info;
+   bool scanout;
+
+   bool compressed;
+
+   unsigned h0, h1;
+   unsigned max_x, max_y;
+};
+
+static void
+layout_get_slice_size(const struct intel_layout *layout,
+                      const struct intel_layout_params *params,
+                      unsigned level, unsigned *width, unsigned *height)
+{
+   const VkImageCreateInfo *info = params->info;
+   unsigned w, h;
+
+   w = u_minify(layout->width0, level);
+   h = u_minify(layout->height0, level);
+
+   /*
+    * From the Sandy Bridge PRM, volume 1 part 1, page 114:
+    *
+    *     "The dimensions of the mip maps are first determined by applying the
+    *      sizing algorithm presented in Non-Power-of-Two Mipmaps above. Then,
+    *      if necessary, they are padded out to compression block boundaries."
+    */
+   w = u_align(w, layout->block_width);
+   h = u_align(h, layout->block_height);
+
+   /*
+    * From the Sandy Bridge PRM, volume 1 part 1, page 111:
+    *
+    *     "If the surface is multisampled (4x), these values must be adjusted
+    *      as follows before proceeding:
+    *
+    *        W_L = ceiling(W_L / 2) * 4
+    *        H_L = ceiling(H_L / 2) * 4"
+    *
+    * From the Ivy Bridge PRM, volume 1 part 1, page 108:
+    *
+    *     "If the surface is multisampled and it is a depth or stencil surface
+    *      or Multisampled Surface StorageFormat in SURFACE_STATE is
+    *      MSFMT_DEPTH_STENCIL, W_L and H_L must be adjusted as follows before
+    *      proceeding:
+    *
+    *        #samples  W_L =                    H_L =
+    *        2         ceiling(W_L / 2) * 4     H_L [no adjustment]
+    *        4         ceiling(W_L / 2) * 4     ceiling(H_L / 2) * 4
+    *        8         ceiling(W_L / 2) * 8     ceiling(H_L / 2) * 4
+    *        16        ceiling(W_L / 2) * 8     ceiling(H_L / 2) * 8"
+    *
+    * For interleaved samples (4x), where pixels
+    *
+    *   (x, y  ) (x+1, y  )
+    *   (x, y+1) (x+1, y+1)
+    *
+    * would be occupied by
+    *
+    *   (x, y  , si0) (x+1, y  , si0) (x, y  , si1) (x+1, y  , si1)
+    *   (x, y+1, si0) (x+1, y+1, si0) (x, y+1, si1) (x+1, y+1, si1)
+    *   (x, y  , si2) (x+1, y  , si2) (x, y  , si3) (x+1, y  , si3)
+    *   (x, y+1, si2) (x+1, y+1, si2) (x, y+1, si3) (x+1, y+1, si3)
+    *
+    * Thus the need to
+    *
+    *   w = align(w, 2) * 2;
+    *   h = align(h, 2) * 2;
+    */
+   if (layout->interleaved_samples) {
+      switch (info->samples) {
+      case VK_SAMPLE_COUNT_1_BIT:
+         break;
+      case VK_SAMPLE_COUNT_2_BIT:
+         w = u_align(w, 2) * 2;
+         break;
+      case VK_SAMPLE_COUNT_4_BIT:
+         w = u_align(w, 2) * 2;
+         h = u_align(h, 2) * 2;
+         break;
+      case VK_SAMPLE_COUNT_8_BIT:
+         w = u_align(w, 2) * 4;
+         h = u_align(h, 2) * 2;
+         break;
+      case VK_SAMPLE_COUNT_16_BIT:
+         w = u_align(w, 2) * 4;
+         h = u_align(h, 2) * 4;
+         break;
+      default:
+         assert(!"unsupported sample count");
+         break;
+      }
+   }
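+
+   /*
+    * Worked example (illustrative numbers): at 4x with w = 5 and h = 3, the
+    * adjustment gives w = u_align(5, 2) * 2 = 12 and
+    * h = u_align(3, 2) * 2 = 8.
+    */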
+
+   /*
+    * From the Ivy Bridge PRM, volume 1 part 1, page 108:
+    *
+    *     "For separate stencil buffer, the width must be mutiplied by 2 and
+    *      height divided by 2..."
+    *
+    * To make things easier (for transfer), we will just double the stencil
+    * stride in 3DSTATE_STENCIL_BUFFER.
+    */
+   w = u_align(w, layout->align_i);
+   h = u_align(h, layout->align_j);
+
+   *width = w;
+   *height = h;
+}
+
+static unsigned
+layout_get_num_layers(const struct intel_layout *layout,
+                      const struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+   unsigned num_layers = info->arrayLayers;
+
+   /* samples of the same index are stored in a layer */
+   if (info->samples != VK_SAMPLE_COUNT_1_BIT && !layout->interleaved_samples)
+      num_layers *= (uint32_t) info->samples;
+
+   return num_layers;
+}
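+
+/*
+ * For example (illustrative): a 2-layer 4x image with non-interleaved
+ * samples stores each sample index in its own layer, so the bo holds
+ * 2 * 4 = 8 layers.
+ */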
+
+static void
+layout_init_layer_height(struct intel_layout *layout,
+                         struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+   unsigned num_layers;
+
+   if (layout->walk != INTEL_LAYOUT_WALK_LAYER)
+      return;
+
+   num_layers = layout_get_num_layers(layout, params);
+   if (num_layers <= 1)
+      return;
+
+   /*
+    * From the Sandy Bridge PRM, volume 1 part 1, page 115:
+    *
+    *     "The following equation is used for surface formats other than
+    *      compressed textures:
+    *
+    *        QPitch = (h0 + h1 + 11j)"
+    *
+    *     "The equation for compressed textures (BC* and FXT1 surface formats)
+    *      follows:
+    *
+    *        QPitch = (h0 + h1 + 11j) / 4"
+    *
+    *     "[DevSNB] Errata: Sampler MSAA Qpitch will be 4 greater than the
+    *      value calculated in the equation above, for every other odd Surface
+    *      Height starting from 1 i.e. 1,5,9,13"
+    *
+    * From the Ivy Bridge PRM, volume 1 part 1, page 111-112:
+    *
+    *     "If Surface Array Spacing is set to ARYSPC_FULL (note that the depth
+    *      buffer and stencil buffer have an implied value of ARYSPC_FULL):
+    *
+    *        QPitch = (h0 + h1 + 12j)
+    *        QPitch = (h0 + h1 + 12j) / 4 (compressed)
+    *
+    *      (There are many typos or missing words here...)"
+    *
+    * To access the N-th slice, an offset of (Stride * QPitch * N) is added to
+    * the base address.  The PRM divides QPitch by 4 for compressed formats
+    * because the block height for those formats are 4, and it wants QPitch to
+    * mean the number of memory rows, as opposed to texel rows, between
+    * slices.  Since we use texel rows everywhere, we do not need to divide
+    * QPitch by 4.
+    */
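+   /*
+    * Worked example (illustrative numbers): on GEN7 with h0 = 64, h1 = 32,
+    * and align_j = 4, layer_height = 64 + 32 + 12 * 4 = 144 texel rows
+    * between slices.
+    */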
+   layout->layer_height = params->h0 + params->h1 +
+      ((intel_gpu_gen(params->gpu) >= INTEL_GEN(7)) ? 12 : 11) * layout->align_j;
+
+   if (intel_gpu_gen(params->gpu) == INTEL_GEN(6) &&
+       info->samples != VK_SAMPLE_COUNT_1_BIT &&
+       layout->height0 % 4 == 1)
+      layout->layer_height += 4;
+
+   params->max_y += layout->layer_height * (num_layers - 1);
+}
+
+static void
+layout_init_lods(struct intel_layout *layout,
+                 struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+   unsigned cur_x, cur_y;
+   unsigned lv;
+
+   cur_x = 0;
+   cur_y = 0;
+   for (lv = 0; lv < info->mipLevels; lv++) {
+      unsigned lod_w, lod_h;
+
+      layout_get_slice_size(layout, params, lv, &lod_w, &lod_h);
+
+      layout->lods[lv].x = cur_x;
+      layout->lods[lv].y = cur_y;
+      layout->lods[lv].slice_width = lod_w;
+      layout->lods[lv].slice_height = lod_h;
+
+      switch (layout->walk) {
+      case INTEL_LAYOUT_WALK_LOD:
+         lod_h *= layout_get_num_layers(layout, params);
+         if (lv == 1)
+            cur_x += lod_w;
+         else
+            cur_y += lod_h;
+
+         /* every LOD begins at tile boundaries */
+         if (info->mipLevels > 1) {
+            assert(layout->format == VK_FORMAT_S8_UINT);
+            cur_x = u_align(cur_x, 64);
+            cur_y = u_align(cur_y, 64);
+         }
+         break;
+      case INTEL_LAYOUT_WALK_LAYER:
+         /* MIPLAYOUT_BELOW */
+         if (lv == 1)
+            cur_x += lod_w;
+         else
+            cur_y += lod_h;
+         break;
+      case INTEL_LAYOUT_WALK_3D:
+         {
+            const unsigned num_slices = u_minify(info->extent.depth, lv);
+            const unsigned num_slices_per_row = 1 << lv;
+            const unsigned num_rows =
+               (num_slices + num_slices_per_row - 1) / num_slices_per_row;
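+
+            /*
+             * Illustrative: with extent.depth = 8, LOD 1 has 4 slices packed
+             * 2 per row (1 << 1), giving num_rows = 2.
+             */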
+
+            lod_w *= num_slices_per_row;
+            lod_h *= num_rows;
+
+            cur_y += lod_h;
+         }
+         break;
+      }
+
+      if (params->max_x < layout->lods[lv].x + lod_w)
+         params->max_x = layout->lods[lv].x + lod_w;
+      if (params->max_y < layout->lods[lv].y + lod_h)
+         params->max_y = layout->lods[lv].y + lod_h;
+   }
+
+   if (layout->walk == INTEL_LAYOUT_WALK_LAYER) {
+      params->h0 = layout->lods[0].slice_height;
+
+      if (info->mipLevels > 1)
+         params->h1 = layout->lods[1].slice_height;
+      else
+         layout_get_slice_size(layout, params, 1, &cur_x, &params->h1);
+   }
+}
+
+static void
+layout_init_alignments(struct intel_layout *layout,
+                       struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+
+   /*
+    * From the Sandy Bridge PRM, volume 1 part 1, page 113:
+    *
+    *     "surface format           align_i     align_j
+    *      YUV 4:2:2 formats        4           *see below
+    *      BC1-5                    4           4
+    *      FXT1                     8           4
+    *      all other formats        4           *see below"
+    *
+    *     "- align_j = 4 for any depth buffer
+    *      - align_j = 2 for separate stencil buffer
+    *      - align_j = 4 for any render target surface is multisampled (4x)
+    *      - align_j = 4 for any render target surface with Surface Vertical
+    *        Alignment = VALIGN_4
+    *      - align_j = 2 for any render target surface with Surface Vertical
+    *        Alignment = VALIGN_2
+    *      - align_j = 2 for all other render target surface
+    *      - align_j = 2 for any sampling engine surface with Surface Vertical
+    *        Alignment = VALIGN_2
+    *      - align_j = 4 for any sampling engine surface with Surface Vertical
+    *        Alignment = VALIGN_4"
+    *
+    * From the Sandy Bridge PRM, volume 4 part 1, page 86:
+    *
+    *     "This field (Surface Vertical Alignment) must be set to VALIGN_2 if
+    *      the Surface Format is 96 bits per element (BPE)."
+    *
+    * They can be rephrased as
+    *
+    *                                  align_i        align_j
+    *   compressed formats             block width    block height
+    *   PIPE_FORMAT_S8_UINT            4              2
+    *   other depth/stencil formats    4              4
+    *   4x multisampled                4              4
+    *   bpp 96                         4              2
+    *   others                         4              2 or 4
+    */
+
+   /*
+    * From the Ivy Bridge PRM, volume 1 part 1, page 110:
+    *
+    *     "surface defined by      surface format     align_i     align_j
+    *      3DSTATE_DEPTH_BUFFER    D16_UNORM          8           4
+    *                              not D16_UNORM      4           4
+    *      3DSTATE_STENCIL_BUFFER  N/A                8           8
+    *      SURFACE_STATE           BC*, ETC*, EAC*    4           4
+    *                              FXT1               8           4
+    *                              all others         (set by SURFACE_STATE)"
+    *
+    * From the Ivy Bridge PRM, volume 4 part 1, page 63:
+    *
+    *     "- This field (Surface Vertical Aligment) is intended to be set to
+    *        VALIGN_4 if the surface was rendered as a depth buffer, for a
+    *        multisampled (4x) render target, or for a multisampled (8x)
+    *        render target, since these surfaces support only alignment of 4.
+    *      - Use of VALIGN_4 for other surfaces is supported, but uses more
+    *        memory.
+    *      - This field must be set to VALIGN_4 for all tiled Y Render Target
+    *        surfaces.
+    *      - Value of 1 is not supported for format YCRCB_NORMAL (0x182),
+    *        YCRCB_SWAPUVY (0x183), YCRCB_SWAPUV (0x18f), YCRCB_SWAPY (0x190)
+    *      - If Number of Multisamples is not MULTISAMPLECOUNT_1, this field
+    *        must be set to VALIGN_4."
+    *      - VALIGN_4 is not supported for surface format R32G32B32_FLOAT."
+    *
+    *     "- This field (Surface Horizontal Aligment) is intended to be set to
+    *        HALIGN_8 only if the surface was rendered as a depth buffer with
+    *        Z16 format or a stencil buffer, since these surfaces support only
+    *        alignment of 8.
+    *      - Use of HALIGN_8 for other surfaces is supported, but uses more
+    *        memory.
+    *      - This field must be set to HALIGN_4 if the Surface Format is BC*.
+    *      - This field must be set to HALIGN_8 if the Surface Format is
+    *        FXT1."
+    *
+    * They can be rephrased as
+    *
+    *                                  align_i        align_j
+    *  compressed formats              block width    block height
+    *  PIPE_FORMAT_Z16_UNORM           8              4
+    *  PIPE_FORMAT_S8_UINT             8              8
+    *  other depth/stencil formats     4              4
+    *  2x or 4x multisampled           4 or 8         4
+    *  tiled Y                         4 or 8         4 (if rt)
+    *  PIPE_FORMAT_R32G32B32_FLOAT     4 or 8         2
+    *  others                          4 or 8         2 or 4
+    */
+
+   if (params->compressed) {
+      /* this happens to be the case */
+      layout->align_i = layout->block_width;
+      layout->align_j = layout->block_height;
+   } else if (info->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) {
+      if (intel_gpu_gen(params->gpu) >= INTEL_GEN(7)) {
+         switch (layout->format) {
+         case VK_FORMAT_D16_UNORM:
+            layout->align_i = 8;
+            layout->align_j = 4;
+            break;
+         case VK_FORMAT_S8_UINT:
+            layout->align_i = 8;
+            layout->align_j = 8;
+            break;
+         default:
+            layout->align_i = 4;
+            layout->align_j = 4;
+            break;
+         }
+      } else {
+         switch (layout->format) {
+         case VK_FORMAT_S8_UINT:
+            layout->align_i = 4;
+            layout->align_j = 2;
+            break;
+         default:
+            layout->align_i = 4;
+            layout->align_j = 4;
+            break;
+         }
+      }
+   } else {
+      const bool valign_4 =
+         (info->samples != VK_SAMPLE_COUNT_1_BIT) ||
+         (intel_gpu_gen(params->gpu) >= INTEL_GEN(8)) ||
+         (intel_gpu_gen(params->gpu) >= INTEL_GEN(7) &&
+          layout->tiling == GEN6_TILING_Y &&
+          (info->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT));
+
+      if (intel_gpu_gen(params->gpu) >= INTEL_GEN(7) &&
+          intel_gpu_gen(params->gpu) <= INTEL_GEN(7.5) && valign_4)
+         assert(layout->format != VK_FORMAT_R32G32B32_SFLOAT);
+
+      layout->align_i = 4;
+      layout->align_j = (valign_4) ? 4 : 2;
+   }
+
+   /*
+    * The fact that align_i and align_j are multiples of the block width and
+    * height respectively is what makes the size of the bo a multiple of the
+    * block size, makes slices start at block boundaries, and makes many of
+    * the computations work.
+    */
+   assert(layout->align_i % layout->block_width == 0);
+   assert(layout->align_j % layout->block_height == 0);
+
+   /* make sure u_align() works */
+   assert(u_is_pow2(layout->align_i) &&
+          u_is_pow2(layout->align_j));
+   assert(u_is_pow2(layout->block_width) &&
+          u_is_pow2(layout->block_height));
+}
+
+static unsigned
+layout_get_valid_tilings(const struct intel_layout *layout,
+                         const struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+   const VkFormat format = layout->format;
+   unsigned valid_tilings = LAYOUT_TILING_ALL;
+
+   /*
+    * From the Sandy Bridge PRM, volume 1 part 2, page 32:
+    *
+    *     "Display/Overlay   Y-Major not supported.
+    *                        X-Major required for Async Flips"
+    */
+   if (params->scanout)
+       valid_tilings &= LAYOUT_TILING_X;
+
+   if (info->tiling == VK_IMAGE_TILING_LINEAR)
+       valid_tilings &= LAYOUT_TILING_NONE;
+
+   /*
+    * From the Sandy Bridge PRM, volume 2 part 1, page 318:
+    *
+    *     "[DevSNB+]: This field (Tiled Surface) must be set to TRUE. Linear
+    *      Depth Buffer is not supported."
+    *
+    *     "The Depth Buffer, if tiled, must use Y-Major tiling."
+    *
+    * From the Sandy Bridge PRM, volume 1 part 2, page 22:
+    *
+    *     "W-Major Tile Format is used for separate stencil."
+    */
+   if (info->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) {
+      switch (format) {
+      case VK_FORMAT_S8_UINT:
+         valid_tilings &= LAYOUT_TILING_W;
+         break;
+      default:
+         valid_tilings &= LAYOUT_TILING_Y;
+         break;
+      }
+   }
+
+   if (info->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) {
+      /*
+       * From the Sandy Bridge PRM, volume 1 part 2, page 32:
+       *
+       *     "NOTE: 128BPE Format Color buffer ( render target ) MUST be
+       *      either TileX or Linear."
+       *
+       * From the Haswell PRM, volume 5, page 32:
+       *
+       *     "NOTE: 128 BPP format color buffer (render target) supports
+       *      Linear, TiledX and TiledY."
+       */
+      if (intel_gpu_gen(params->gpu) < INTEL_GEN(7.5) && layout->block_size == 16)
+         valid_tilings &= ~LAYOUT_TILING_Y;
+
+      /*
+       * From the Ivy Bridge PRM, volume 4 part 1, page 63:
+       *
+       *     "This field (Surface Vertical Aligment) must be set to VALIGN_4
+       *      for all tiled Y Render Target surfaces."
+       *
+       *     "VALIGN_4 is not supported for surface format R32G32B32_FLOAT."
+       */
+      if (intel_gpu_gen(params->gpu) >= INTEL_GEN(7) &&
+          intel_gpu_gen(params->gpu) <= INTEL_GEN(7.5) &&
+          layout->format == VK_FORMAT_R32G32B32_SFLOAT)
+         valid_tilings &= ~LAYOUT_TILING_Y;
+
+      valid_tilings &= ~LAYOUT_TILING_W;
+   }
+
+   if (info->usage & VK_IMAGE_USAGE_SAMPLED_BIT) {
+      if (intel_gpu_gen(params->gpu) < INTEL_GEN(8))
+         valid_tilings &= ~LAYOUT_TILING_W;
+   }
+
+   /* no conflicting binding flags */
+   assert(valid_tilings);
+
+   return valid_tilings;
+}
+
+static void
+layout_init_tiling(struct intel_layout *layout,
+                   struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+   unsigned preferred_tilings;
+
+   layout->valid_tilings = layout_get_valid_tilings(layout, params);
+
+   preferred_tilings = layout->valid_tilings;
+
+   /* no fencing nor BLT support */
+   if (preferred_tilings & ~LAYOUT_TILING_W)
+      preferred_tilings &= ~LAYOUT_TILING_W;
+
+   if (info->usage & (VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT |
+                      VK_IMAGE_USAGE_SAMPLED_BIT)) {
+      /*
+       * heuristically set a minimum width/height for enabling tiling
+       */
+      if (layout->width0 < 64 && (preferred_tilings & ~LAYOUT_TILING_X))
+         preferred_tilings &= ~LAYOUT_TILING_X;
+
+      if ((layout->width0 < 32 || layout->height0 < 16) &&
+          (layout->width0 < 16 || layout->height0 < 32) &&
+          (preferred_tilings & ~LAYOUT_TILING_Y))
+         preferred_tilings &= ~LAYOUT_TILING_Y;
+   } else {
+      /* force linear if we are not sure where the texture is bound to */
+      if (preferred_tilings & LAYOUT_TILING_NONE)
+         preferred_tilings &= LAYOUT_TILING_NONE;
+   }
+
+   /* prefer tiled over linear */
+   if (preferred_tilings & LAYOUT_TILING_Y)
+      layout->tiling = GEN6_TILING_Y;
+   else if (preferred_tilings & LAYOUT_TILING_X)
+      layout->tiling = GEN6_TILING_X;
+   else if (preferred_tilings & LAYOUT_TILING_W)
+      layout->tiling = GEN8_TILING_W;
+   else
+      layout->tiling = GEN6_TILING_NONE;
+}
+
+static void
+layout_init_walk_gen7(struct intel_layout *layout,
+                              struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+
+   /*
+    * It is not explicitly stated, but render targets are expected to be
+    * UMS/CMS (samples non-interleaved) and depth/stencil buffers are expected
+    * to be IMS (samples interleaved).
+    *
+    * See "Multisampled Surface Storage Format" field of SURFACE_STATE.
+    */
+   if (info->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) {
+      /*
+       * From the Ivy Bridge PRM, volume 1 part 1, page 111:
+       *
+       *     "note that the depth buffer and stencil buffer have an implied
+       *      value of ARYSPC_FULL"
+       */
+      layout->walk = (info->imageType == VK_IMAGE_TYPE_3D) ?
+         INTEL_LAYOUT_WALK_3D : INTEL_LAYOUT_WALK_LAYER;
+
+      layout->interleaved_samples = true;
+   } else {
+      /*
+       * From the Ivy Bridge PRM, volume 4 part 1, page 66:
+       *
+       *     "If Multisampled Surface Storage Format is MSFMT_MSS and Number
+       *      of Multisamples is not MULTISAMPLECOUNT_1, this field (Surface
+       *      Array Spacing) must be set to ARYSPC_LOD0."
+       *
+       * As multisampled resources are not mipmapped, we never use
+       * ARYSPC_FULL for them.
+       */
+      if (info->samples != VK_SAMPLE_COUNT_1_BIT)
+         assert(info->mipLevels == 1);
+
+      layout->walk =
+         (info->imageType == VK_IMAGE_TYPE_3D) ? INTEL_LAYOUT_WALK_3D :
+         (info->mipLevels > 1) ? INTEL_LAYOUT_WALK_LAYER :
+         INTEL_LAYOUT_WALK_LOD;
+
+      layout->interleaved_samples = false;
+   }
+}
+
+static void
+layout_init_walk_gen6(struct intel_layout *layout,
+                      struct intel_layout_params *params)
+{
+   /*
+    * From the Sandy Bridge PRM, volume 1 part 1, page 115:
+    *
+    *     "The separate stencil buffer does not support mip mapping, thus the
+    *      storage for LODs other than LOD 0 is not needed. The following
+    *      QPitch equation applies only to the separate stencil buffer:
+    *
+    *        QPitch = h_0"
+    *
+    * GEN6 does not support compact spacing otherwise.
+    */
+   layout->walk =
+      (params->info->imageType == VK_IMAGE_TYPE_3D) ? INTEL_LAYOUT_WALK_3D :
+      (layout->format == VK_FORMAT_S8_UINT) ? INTEL_LAYOUT_WALK_LOD :
+      INTEL_LAYOUT_WALK_LAYER;
+
+   /* GEN6 supports only interleaved samples */
+   layout->interleaved_samples = true;
+}
+
+static void
+layout_init_walk(struct intel_layout *layout,
+                 struct intel_layout_params *params)
+{
+   if (intel_gpu_gen(params->gpu) >= INTEL_GEN(7))
+      layout_init_walk_gen7(layout, params);
+   else
+      layout_init_walk_gen6(layout, params);
+}
+
+static void
+layout_init_size_and_format(struct intel_layout *layout,
+                            struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+   VkFormat format = info->format;
+   bool require_separate_stencil = false;
+
+   layout->width0 = info->extent.width;
+   layout->height0 = info->extent.height;
+
+   /*
+    * From the Sandy Bridge PRM, volume 2 part 1, page 317:
+    *
+    *     "This field (Separate Stencil Buffer Enable) must be set to the same
+    *      value (enabled or disabled) as Hierarchical Depth Buffer Enable."
+    *
+    * GEN7+ requires separate stencil buffers.
+    */
+   if (info->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) {
+      if (intel_gpu_gen(params->gpu) >= INTEL_GEN(7))
+         require_separate_stencil = true;
+      else
+         require_separate_stencil = (layout->aux == INTEL_LAYOUT_AUX_HIZ);
+   }
+
+   switch (format) {
+   case VK_FORMAT_D24_UNORM_S8_UINT:
+      if (require_separate_stencil) {
+         format = VK_FORMAT_X8_D24_UNORM_PACK32;
+         layout->separate_stencil = true;
+      }
+      break;
+   case VK_FORMAT_D32_SFLOAT_S8_UINT:
+      if (require_separate_stencil) {
+         format = VK_FORMAT_D32_SFLOAT;
+         layout->separate_stencil = true;
+      }
+      break;
+   default:
+      break;
+   }
+
+   layout->format = format;
+   layout->block_width = icd_format_get_block_width(format);
+   layout->block_height = layout->block_width;
+   layout->block_size = icd_format_get_size(format);
+
+   params->compressed = icd_format_is_compressed(format);
+}
+
+static bool
+layout_want_mcs(struct intel_layout *layout,
+                struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+   bool want_mcs = false;
+
+   /* MCS is for RT on GEN7+ */
+   if (intel_gpu_gen(params->gpu) < INTEL_GEN(7))
+      return false;
+
+   if (info->imageType != VK_IMAGE_TYPE_2D ||
+       !(info->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT))
+      return false;
+
+   /*
+    * From the Ivy Bridge PRM, volume 4 part 1, page 77:
+    *
+    *     "For Render Target and Sampling Engine Surfaces:If the surface is
+    *      multisampled (Number of Multisamples any value other than
+    *      MULTISAMPLECOUNT_1), this field (MCS Enable) must be enabled."
+    *
+    *     "This field must be set to 0 for all SINT MSRTs when all RT channels
+    *      are not written"
+    */
+   if (info->samples != VK_SAMPLE_COUNT_1_BIT &&
+       !icd_format_is_int(info->format)) {
+      want_mcs = true;
+   } else if (info->samples == VK_SAMPLE_COUNT_1_BIT) {
+      /*
+       * From the Ivy Bridge PRM, volume 2 part 1, page 326:
+       *
+       *     "When MCS is buffer is used for color clear of non-multisampler
+       *      render target, the following restrictions apply.
+       *      - Support is limited to tiled render targets.
+       *      - Support is for non-mip-mapped and non-array surface types
+       *        only.
+       *      - Clear is supported only on the full RT; i.e., no partial clear
+       *        or overlapping clears.
+       *      - MCS buffer for non-MSRT is supported only for RT formats
+       *        32bpp, 64bpp and 128bpp.
+       *      ..."
+       */
+      if (layout->tiling != GEN6_TILING_NONE &&
+          info->mipLevels == 1 && info->arrayLayers == 1) {
+         switch (layout->block_size) {
+         case 4:
+         case 8:
+         case 16:
+            want_mcs = true;
+            break;
+         default:
+            break;
+         }
+      }
+   }
+
+   return want_mcs;
+}
+
+static bool
+layout_want_hiz(const struct intel_layout *layout,
+                const struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+
+   if (intel_debug & INTEL_DEBUG_NOHIZ)
+       return false;
+
+   if (!(info->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT))
+      return false;
+
+   if (!intel_format_has_depth(params->gpu, info->format))
+      return false;
+
+   /*
+    * HiZ implies separate stencil on Gen6.  We do not want to copy stencils
+    * values between combined and separate stencil buffers when HiZ is enabled
+    * or disabled.
+    */
+   if (intel_gpu_gen(params->gpu) == INTEL_GEN(6))
+       return false;
+
+   return true;
+}
+
+static void
+layout_init_aux(struct intel_layout *layout,
+                struct intel_layout_params *params)
+{
+   if (layout_want_hiz(layout, params))
+      layout->aux = INTEL_LAYOUT_AUX_HIZ;
+   else if (layout_want_mcs(layout, params))
+      layout->aux = INTEL_LAYOUT_AUX_MCS;
+}
+
+static void
+layout_align(struct intel_layout *layout, struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+   int align_w = 1, align_h = 1, pad_h = 0;
+
+   /*
+    * From the Sandy Bridge PRM, volume 1 part 1, page 118:
+    *
+    *     "To determine the necessary padding on the bottom and right side of
+    *      the surface, refer to the table in Section 7.18.3.4 for the i and j
+    *      parameters for the surface format in use. The surface must then be
+    *      extended to the next multiple of the alignment unit size in each
+    *      dimension, and all texels contained in this extended surface must
+    *      have valid GTT entries."
+    *
+    *     "For cube surfaces, an additional two rows of padding are required
+    *      at the bottom of the surface. This must be ensured regardless of
+    *      whether the surface is stored tiled or linear.  This is due to the
+    *      potential rotation of cache line orientation from memory to cache."
+    *
+    *     "For compressed textures (BC* and FXT1 surface formats), padding at
+    *      the bottom of the surface is to an even compressed row, which is
+    *      equal to a multiple of 8 uncompressed texel rows. Thus, for padding
+    *      purposes, these surfaces behave as if j = 8 only for surface
+    *      padding purposes. The value of 4 for j still applies for mip level
+    *      alignment and QPitch calculation."
+    */
+   if (info->usage & VK_IMAGE_USAGE_SAMPLED_BIT) {
+      if (align_w < layout->align_i)
+          align_w = layout->align_i;
+      if (align_h < layout->align_j)
+          align_h = layout->align_j;
+
+      /* in case it is used as a cube */
+      if (info->imageType == VK_IMAGE_TYPE_2D)
+         pad_h += 2;
+
+      if (params->compressed && align_h < layout->align_j * 2)
+         align_h = layout->align_j * 2;
+   }
+
+   /*
+    * From the Sandy Bridge PRM, volume 1 part 1, page 118:
+    *
+    *     "If the surface contains an odd number of rows of data, a final row
+    *      below the surface must be allocated."
+    */
+   if ((info->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) && align_h < 2)
+      align_h = 2;
+
+   /*
+    * Depth Buffer Clear/Resolve works in 8x4 sample blocks.  In
+    * intel_texture_can_enable_hiz(), we always return true for the first slice.
+    * To avoid out-of-bound access, we have to pad.
+    */
+   if (layout->aux == INTEL_LAYOUT_AUX_HIZ &&
+       info->mipLevels == 1 &&
+       info->arrayLayers == 1 &&
+       info->extent.depth == 1) {
+      if (align_w < 8)
+          align_w = 8;
+      if (align_h < 4)
+          align_h = 4;
+   }
+
+   params->max_x = u_align(params->max_x, align_w);
+   params->max_y = u_align(params->max_y + pad_h, align_h);
+}
+
+/* note that this may force the texture to be linear */
+static void
+layout_calculate_bo_size(struct intel_layout *layout,
+                         struct intel_layout_params *params)
+{
+   assert(params->max_x % layout->block_width == 0);
+   assert(params->max_y % layout->block_height == 0);
+   assert(layout->layer_height % layout->block_height == 0);
+
+   layout->bo_stride =
+      (params->max_x / layout->block_width) * layout->block_size;
+   layout->bo_height = params->max_y / layout->block_height;
+
+   while (true) {
+      unsigned w = layout->bo_stride, h = layout->bo_height;
+      unsigned align_w, align_h;
+
+      /*
+       * From the Haswell PRM, volume 5, page 163:
+       *
+       *     "For linear surfaces, additional padding of 64 bytes is required
+       *      at the bottom of the surface. This is in addition to the padding
+       *      required above."
+       */
+      if (intel_gpu_gen(params->gpu) >= INTEL_GEN(7.5) &&
+          (params->info->usage & VK_IMAGE_USAGE_SAMPLED_BIT) &&
+          layout->tiling == GEN6_TILING_NONE)
+         h += (64 + layout->bo_stride - 1) / layout->bo_stride;
+
+      /*
+       * From the Sandy Bridge PRM, volume 4 part 1, page 81:
+       *
+       *     "- For linear render target surfaces, the pitch must be a
+       *        multiple of the element size for non-YUV surface formats.
+       *        Pitch must be a multiple of 2 * element size for YUV surface
+       *        formats.
+       *      - For other linear surfaces, the pitch can be any multiple of
+       *        bytes.
+       *      - For tiled surfaces, the pitch must be a multiple of the tile
+       *        width."
+       *
+       * Different requirements may exist when the bo is used in different
+       * places, but our alignments here should be good enough that we do not
+       * need to check layout->info->usage.
+       */
+      switch (layout->tiling) {
+      case GEN6_TILING_X:
+         align_w = 512;
+         align_h = 8;
+         break;
+      case GEN6_TILING_Y:
+         align_w = 128;
+         align_h = 32;
+         break;
+      case GEN8_TILING_W:
+         /*
+          * From the Sandy Bridge PRM, volume 1 part 2, page 22:
+          *
+          *     "A 4KB tile is subdivided into 8-high by 8-wide array of
+          *      Blocks for W-Major Tiles (W Tiles). Each Block is 8 rows by 8
+          *      bytes."
+          */
+         align_w = 64;
+         align_h = 64;
+         break;
+      default:
+         assert(layout->tiling == GEN6_TILING_NONE);
+         /* some good enough values */
+         align_w = 64;
+         align_h = 2;
+         break;
+      }
+
+      w = u_align(w, align_w);
+      h = u_align(h, align_h);
+
+      /* make sure the bo is mappable */
+      if (layout->tiling != GEN6_TILING_NONE) {
+         /*
+          * Usually only the first 256MB of the GTT is mappable.
+          *
+          * See also how intel_context::max_gtt_map_object_size is calculated.
+          */
+         const size_t mappable_gtt_size = 256 * 1024 * 1024;
+
+         /*
+          * Be conservative.  We may be able to switch from VALIGN_4 to
+          * VALIGN_2 if the layout was Y-tiled, but let's keep it simple.
+          */
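+         /*
+          * Illustrative check: with a 16384-byte stride, 256MB / 16384 / 4 =
+          * 4096, so a tiled surface taller than 4096 rows falls back to
+          * linear here (when linear is among the valid tilings).
+          */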
+         if (mappable_gtt_size / w / 4 < h) {
+            if (layout->valid_tilings & LAYOUT_TILING_NONE) {
+               layout->tiling = GEN6_TILING_NONE;
+               /* MCS support for non-MSRTs is limited to tiled RTs */
+               if (layout->aux == INTEL_LAYOUT_AUX_MCS &&
+                   params->info->samples == VK_SAMPLE_COUNT_1_BIT)
+                  layout->aux = INTEL_LAYOUT_AUX_NONE;
+
+               continue;
+            } else {
+               /* mapping will fail */
+            }
+         }
+      }
+
+      layout->bo_stride = w;
+      layout->bo_height = h;
+      break;
+   }
+}
+
+static void
+layout_calculate_hiz_size(struct intel_layout *layout,
+                          struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+   const unsigned hz_align_j = 8;
+   enum intel_layout_walk_type hz_walk;
+   unsigned hz_width, hz_height, lv;
+   unsigned hz_clear_w, hz_clear_h;
+
+   assert(layout->aux == INTEL_LAYOUT_AUX_HIZ);
+
+   assert(layout->walk == INTEL_LAYOUT_WALK_LAYER ||
+          layout->walk == INTEL_LAYOUT_WALK_3D);
+
+   /*
+    * From the Sandy Bridge PRM, volume 2 part 1, page 312:
+    *
+    *     "The hierarchical depth buffer does not support the LOD field, it is
+    *      assumed by hardware to be zero. A separate hierarchical depth
+    *      buffer is required for each LOD used, and the corresponding
+    *      buffer's state delivered to hardware each time a new depth buffer
+    *      state with modified LOD is delivered."
+    *
+    * We will put all LODs in a single bo with INTEL_LAYOUT_WALK_LOD.
+    */
+   if (intel_gpu_gen(params->gpu) >= INTEL_GEN(7))
+      hz_walk = layout->walk;
+   else
+      hz_walk = INTEL_LAYOUT_WALK_LOD;
+
+   /*
+    * See the Sandy Bridge PRM, volume 2 part 1, page 312, and the Ivy Bridge
+    * PRM, volume 2 part 1, page 312-313.
+    *
+    * It seems the HiZ buffer is aligned to 8x8, with every two rows packed
+    * into a memory row.
+    */
+   switch (hz_walk) {
+   case INTEL_LAYOUT_WALK_LOD:
+      {
+         unsigned lod_tx[INTEL_LAYOUT_MAX_LEVELS];
+         unsigned lod_ty[INTEL_LAYOUT_MAX_LEVELS];
+         unsigned cur_tx, cur_ty;
+
+         /* figure out the tile offsets of LODs */
+         hz_width = 0;
+         hz_height = 0;
+         cur_tx = 0;
+         cur_ty = 0;
+         for (lv = 0; lv < info->mipLevels; lv++) {
+            unsigned tw, th;
+
+            lod_tx[lv] = cur_tx;
+            lod_ty[lv] = cur_ty;
+
+            tw = u_align(layout->lods[lv].slice_width, 16);
+            th = u_align(layout->lods[lv].slice_height, hz_align_j) *
+               info->arrayLayers / 2;
+            /* convert to Y-tiles */
+            tw = u_align(tw, 128) / 128;
+            th = u_align(th, 32) / 32;
+
+            if (hz_width < cur_tx + tw)
+               hz_width = cur_tx + tw;
+            if (hz_height < cur_ty + th)
+               hz_height = cur_ty + th;
+
+            if (lv == 1)
+               cur_tx += tw;
+            else
+               cur_ty += th;
+         }
+
+         /* convert tile offsets to memory offsets */
+         for (lv = 0; lv < info->mipLevels; lv++) {
+            layout->aux_offsets[lv] =
+               (lod_ty[lv] * hz_width + lod_tx[lv]) * 4096;
+         }
+         hz_width *= 128;
+         hz_height *= 32;
+      }
+      break;
+   case INTEL_LAYOUT_WALK_LAYER:
+      {
+         const unsigned h0 = u_align(params->h0, hz_align_j);
+         const unsigned h1 = u_align(params->h1, hz_align_j);
+         const unsigned htail =
+            ((intel_gpu_gen(params->gpu) >= INTEL_GEN(7)) ? 12 : 11) * hz_align_j;
+         const unsigned hz_qpitch = h0 + h1 + htail;
+
+         hz_width = u_align(layout->lods[0].slice_width, 16);
+
+         hz_height = hz_qpitch * info->arrayLayers / 2;
+         if (intel_gpu_gen(params->gpu) >= INTEL_GEN(7))
+            hz_height = u_align(hz_height, 8);
+
+         layout->aux_layer_height = hz_qpitch;
+      }
+      break;
+   case INTEL_LAYOUT_WALK_3D:
+      hz_width = u_align(layout->lods[0].slice_width, 16);
+
+      hz_height = 0;
+      for (lv = 0; lv < info->mipLevels; lv++) {
+         const unsigned h = u_align(layout->lods[lv].slice_height, hz_align_j);
+         /* according to the formula, slices are packed together vertically */
+         hz_height += h * u_minify(info->extent.depth, lv);
+      }
+      hz_height /= 2;
+      break;
+   default:
+      assert(!"unknown layout walk");
+      hz_width = 0;
+      hz_height = 0;
+      break;
+   }
+
+   /*
+    * In hiz_align_fb(), we will align the LODs to 8x4 sample blocks.
+    * Experiments on Haswell show that aligning the RECTLIST primitive and
+    * 3DSTATE_DRAWING_RECTANGLE alone are not enough.  The LOD sizes must be
+    * aligned.
+    */
+   hz_clear_w = 8;
+   hz_clear_h = 4;
+   switch (info->samples) {
+   case VK_SAMPLE_COUNT_1_BIT:
+   default:
+      break;
+   case VK_SAMPLE_COUNT_2_BIT:
+      hz_clear_w /= 2;
+      break;
+   case VK_SAMPLE_COUNT_4_BIT:
+      hz_clear_w /= 2;
+      hz_clear_h /= 2;
+      break;
+   case VK_SAMPLE_COUNT_8_BIT:
+      hz_clear_w /= 4;
+      hz_clear_h /= 2;
+      break;
+   case VK_SAMPLE_COUNT_16_BIT:
+      hz_clear_w /= 4;
+      hz_clear_h /= 4;
+      break;
+   }
+
+   for (lv = 0; lv < info->mipLevels; lv++) {
+      if (u_minify(layout->width0, lv) % hz_clear_w ||
+          u_minify(layout->height0, lv) % hz_clear_h)
+         break;
+      layout->aux_enables |= 1 << lv;
+   }
+
+   /* we padded to allow this in layout_align() */
+   if (info->mipLevels == 1 && info->arrayLayers == 1 && info->extent.depth == 1)
+      layout->aux_enables |= 0x1;
+
+   /* align to Y-tile */
+   layout->aux_stride = u_align(hz_width, 128);
+   layout->aux_height = u_align(hz_height, 32);
+}
+
+static void
+layout_calculate_mcs_size(struct intel_layout *layout,
+                          struct intel_layout_params *params)
+{
+   const VkImageCreateInfo *info = params->info;
+   int mcs_width, mcs_height, mcs_cpp;
+   int downscale_x, downscale_y;
+
+   assert(layout->aux == INTEL_LAYOUT_AUX_MCS);
+
+   if (info->samples != VK_SAMPLE_COUNT_1_BIT) {
+      /*
+       * From the Ivy Bridge PRM, volume 2 part 1, page 326, the clear
+       * rectangle is scaled down by 8x2 for 4X MSAA and 2x2 for 8X MSAA.  The
+       * scale-down is likely needed because the clear rectangle is used to
+       * clear the MCS instead of the RT.
+       *
+       * For 8X MSAA, we need 32 bits in MCS for every pixel in the RT.  The
+       * 2x2 factor could come from that the hardware writes 128 bits (an
+       * OWord) at a time, and the OWord in MCS maps to a 2x2 pixel block in
+       * the RT.  For 4X MSAA, we need 8 bits in MCS for every pixel in the
+       * RT.  Similarly, we could reason that an OWord in 4X MCS maps to an 8x2
+       * pixel block in the RT.
+       */
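+      /*
+       * Illustrative sizing: for a 100x100 4x RT, downscale is 8x2, so below
+       * mcs_width = u_align(100, 16) = 112 and
+       * mcs_height = u_align(100, 4) = 100.
+       */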
+      switch (info->samples) {
+      case VK_SAMPLE_COUNT_2_BIT:
+      case VK_SAMPLE_COUNT_4_BIT:
+         downscale_x = 8;
+         downscale_y = 2;
+         mcs_cpp = 1;
+         break;
+      case VK_SAMPLE_COUNT_8_BIT:
+         downscale_x = 2;
+         downscale_y = 2;
+         mcs_cpp = 4;
+         break;
+      case VK_SAMPLE_COUNT_16_BIT:
+         downscale_x = 2;
+         downscale_y = 1;
+         mcs_cpp = 8;
+         break;
+      default:
+         assert(!"unsupported sample count");
+         return;
+         break;
+      }
+
+      /*
+       * It also appears that the 2x2 subspans generated by the scaled-down
+       * clear rectangle cannot be masked.  The scale-down clear rectangle
+       * thus must be aligned to 2x2, and we need to pad.
+       */
+      mcs_width = u_align(layout->width0, downscale_x * 2);
+      mcs_height = u_align(layout->height0, downscale_y * 2);
+   } else {
+      /*
+       * From the Ivy Bridge PRM, volume 2 part 1, page 327:
+       *
+       *     "              Pixels  Lines
+       *      TiledY RT CL
+       *          bpp
+       *          32          8        4
+       *          64          4        4
+       *          128         2        4
+       *
+       *      TiledX RT CL
+       *          bpp
+       *          32          16       2
+       *          64          8        2
+       *          128         4        2"
+       *
+       * This table and the two following tables define the RT alignments, the
+       * clear rectangle alignments, and the clear rectangle scale factors.
+       * Viewing the RT alignments as the sizes of 128-byte blocks, we can see
+       * that the clear rectangle alignments are 16x32 blocks, and the clear
+       * rectangle scale factors are 8x16 blocks.
+       *
+       * For non-MSAA RT, we need 1 bit in MCS for every 128-byte block in the
+       * RT.  Similar to the MSAA cases, we can argue that an OWord maps to
+       * 8x16 blocks.
+       *
+       * One problem with this reasoning is that a Y-tile in MCS has 8x32
+       * OWords and maps to 64x512 128-byte blocks.  This differs from i965,
+       * which says that a Y-tile maps to 128x256 blocks (\see
+       * intel_get_non_msrt_mcs_alignment).  It does not really change
+       * anything except for the size of the allocated MCS.  Let's see if we
+       * hit out-of-bounds accesses.
+       */
+      switch (layout->tiling) {
+      case GEN6_TILING_X:
+         downscale_x = 64 / layout->block_size;
+         downscale_y = 2;
+         break;
+      case GEN6_TILING_Y:
+         downscale_x = 32 / layout->block_size;
+         downscale_y = 4;
+         break;
+      default:
+         assert(!"unsupported tiling mode");
+         return;
+      }
+
+      downscale_x *= 8;
+      downscale_y *= 16;
+
+      /*
+       * From the Haswell PRM, volume 7, page 652:
+       *
+       *     "Clear rectangle must be aligned to two times the number of
+       *      pixels in the table shown below due to 16X16 hashing across the
+       *      slice."
+       *
+       * The scaled-down clear rectangle must be aligned to 4x4 instead of
+       * 2x2, and we need to pad.
+       */
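+      /*
+       * Worked example (hypothetical sizes): a 4096x4096 Y-tiled 32-bpp RT
+       * has block_size 4, so downscale_x = (32 / 4) * 8 = 64 and
+       * downscale_y = 4 * 16 = 64, giving
+       * mcs_width = mcs_height = u_align(4096, 256) / 64 = 64 and an
+       * aux_stride of u_align(64 * 16, 128) = 1024 bytes.
+       */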
+      mcs_width = u_align(layout->width0, downscale_x * 4) / downscale_x;
+      mcs_height = u_align(layout->height0, downscale_y * 4) / downscale_y;
+      mcs_cpp = 16; /* an OWord */
+   }
+
+   layout->aux_enables = (1 << info->mipLevels) - 1;
+   /* align to Y-tile */
+   layout->aux_stride = u_align(mcs_width * mcs_cpp, 128);
+   layout->aux_height = u_align(mcs_height, 32);
+}
+
+/**
+ * Initialize the layout.  Callers should zero-initialize \p layout first.
+ */
+void intel_layout_init(struct intel_layout *layout,
+                       struct intel_dev *dev,
+                       const VkImageCreateInfo *info,
+                       bool scanout)
+{
+   struct intel_layout_params params;
+
+   memset(&params, 0, sizeof(params));
+   params.dev = dev;
+   params.gpu = dev->gpu;
+   params.info = info;
+   params.scanout = scanout;
+
+   /* note that there are dependencies between these functions */
+   layout_init_aux(layout, &params);
+   layout_init_size_and_format(layout, &params);
+   layout_init_walk(layout, &params);
+   layout_init_tiling(layout, &params);
+   layout_init_alignments(layout, &params);
+   layout_init_lods(layout, &params);
+   layout_init_layer_height(layout, &params);
+
+   layout_align(layout, &params);
+   layout_calculate_bo_size(layout, &params);
+
+   switch (layout->aux) {
+   case INTEL_LAYOUT_AUX_HIZ:
+      layout_calculate_hiz_size(layout, &params);
+      break;
+   case INTEL_LAYOUT_AUX_MCS:
+      layout_calculate_mcs_size(layout, &params);
+      break;
+   default:
+      break;
+   }
+}
+
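+/*
+ * Usage sketch (hypothetical caller): a zeroed layout is initialized from
+ * the image create info, and the results size the backing bo:
+ *
+ *    struct intel_layout layout;
+ *    memset(&layout, 0, sizeof(layout));
+ *    intel_layout_init(&layout, dev, info, false);
+ *    bo_size = layout.bo_stride * layout.bo_height;
+ *    aux_size = layout.aux_stride * layout.aux_height;
+ */
+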
+/**
+ * Return the offset (in bytes) to a slice within the bo.
+ *
+ * The returned offset is aligned to tile size.  Since slices are not
+ * guaranteed to start at tile boundaries, the X and Y offsets (in pixels)
+ * from the tile origin to the slice are also returned.  X offset is always a
+ * multiple of 4 and Y offset is always a multiple of 2.
+ */
+unsigned
+intel_layout_get_slice_tile_offset(const struct intel_layout *layout,
+                                   unsigned level, unsigned slice,
+                                   unsigned *x_offset, unsigned *y_offset)
+{
+   unsigned tile_w, tile_h, tile_size, row_size;
+   unsigned tile_offset, x, y;
+
+   /* see the Sandy Bridge PRM, volume 1 part 2, page 24 */
+
+   switch (layout->tiling) {
+   case GEN6_TILING_NONE:
+      tile_w = 1;
+      tile_h = 1;
+      break;
+   case GEN6_TILING_X:
+      tile_w = 512;
+      tile_h = 8;
+      break;
+   case GEN6_TILING_Y:
+      tile_w = 128;
+      tile_h = 32;
+      break;
+   case GEN8_TILING_W:
+      tile_w = 64;
+      tile_h = 64;
+      break;
+   default:
+      assert(!"unknown tiling");
+      tile_w = 1;
+      tile_h = 1;
+      break;
+   }
+
+   tile_size = tile_w * tile_h;
+   row_size = layout->bo_stride * tile_h;
+
+   intel_layout_get_slice_pos(layout, level, slice, &x, &y);
+   /* in bytes */
+   intel_layout_pos_to_mem(layout, x, y, &x, &y);
+   tile_offset = row_size * (y / tile_h) + tile_size * (x / tile_w);
+
+   /*
+    * Since layout->bo_stride is a multiple of tile_w, tile_offset should be
+    * aligned at this point.
+    */
+   assert(tile_offset % tile_size == 0);
+
+   /*
+    * because of the possible values of align_i and align_j in
+    * tex_layout_init_alignments(), x_offset is guaranteed to be a multiple of
+    * 4 and y_offset is guaranteed to be a multiple of 2.
+    */
+   if (x_offset) {
+      /* in pixels */
+      x = (x % tile_w) / layout->block_size * layout->block_width;
+      assert(x % 4 == 0);
+
+      *x_offset = x;
+   }
+
+   if (y_offset) {
+      /* in pixels */
+      y = (y % tile_h) * layout->block_height;
+      assert(y % 2 == 0);
+
+      *y_offset = y;
+   }
+
+   return tile_offset;
+}
diff --git a/icd/intel/layout.h b/icd/intel/layout.h
new file mode 100644
index 0000000..da193ce
--- /dev/null
+++ b/icd/intel/layout.h
@@ -0,0 +1,290 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ *
+ */
+
+#ifndef LAYOUT_H
+#define LAYOUT_H
+
+#include "genhw/genhw.h"
+#include "intel.h"
+
+#define INTEL_LAYOUT_MAX_LEVELS 16
+
+struct intel_dev;
+
+enum intel_layout_walk_type {
+   /*
+    * Array layers of an LOD are packed together vertically.  This maps to
+    * ARYSPC_LOD0 for non-mipmapped 2D textures, and is extended to support
+    * mipmapped stencil textures and HiZ on GEN6.
+    */
+   INTEL_LAYOUT_WALK_LOD,
+
+   /*
+    * LODs of an array layer are packed together.  This maps to ARYSPC_FULL
+    * and is used for mipmapped 2D textures.
+    */
+   INTEL_LAYOUT_WALK_LAYER,
+
+   /*
+    * 3D slices of an LOD are packed together, horizontally with wrapping.
+    * Used for 3D textures.
+    */
+   INTEL_LAYOUT_WALK_3D,
+};
+
+enum intel_layout_aux_type {
+   INTEL_LAYOUT_AUX_NONE,
+   INTEL_LAYOUT_AUX_HIZ,
+   INTEL_LAYOUT_AUX_MCS,
+};
+
+struct intel_layout_lod {
+   /* physical position */
+   unsigned x;
+   unsigned y;
+
+   /*
+    * Physical size of an LOD slice.  There may be multiple slices when the
+    * walk type is not INTEL_LAYOUT_WALK_LAYER.
+    */
+   unsigned slice_width;
+   unsigned slice_height;
+};
+
+/**
+ * Texture layout.
+ */
+struct intel_layout {
+   enum intel_layout_aux_type aux;
+
+   /* physical width0, height0, and format */
+   unsigned width0;
+   unsigned height0;
+   VkFormat format;
+   bool separate_stencil;
+
+   /*
+    * width, height, and size of pixel blocks, for conversion between 2D
+    * coordinates and memory offsets
+    */
+   unsigned block_width;
+   unsigned block_height;
+   unsigned block_size;
+
+   enum intel_layout_walk_type walk;
+   bool interleaved_samples;
+
+   /* bitmask of valid tiling modes */
+   unsigned valid_tilings;
+   enum gen_surface_tiling tiling;
+
+   /* mipmap alignments */
+   unsigned align_i;
+   unsigned align_j;
+
+   struct intel_layout_lod lods[INTEL_LAYOUT_MAX_LEVELS];
+
+   /* physical height of layers for INTEL_LAYOUT_WALK_LAYER */
+   unsigned layer_height;
+
+   /* distance in bytes between two pixel block rows */
+   unsigned bo_stride;
+   /* number of pixel block rows */
+   unsigned bo_height;
+
+   /* bitmask of levels that can use aux */
+   unsigned aux_enables;
+   unsigned aux_offsets[INTEL_LAYOUT_MAX_LEVELS];
+   unsigned aux_layer_height;
+   unsigned aux_stride;
+   unsigned aux_height;
+};
+
+void intel_layout_init(struct intel_layout *layout,
+                       struct intel_dev *dev,
+                       const VkImageCreateInfo *info,
+                       bool scanout);
+
+/**
+ * Convert from pixel position to 2D memory offset.
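+ *
+ * E.g. for a 4x4 compressed-block format with block_size 8 (hypothetical
+ * values), pixel position (8, 4) maps to memory offset (16, 1).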
+ */
+static inline void
+intel_layout_pos_to_mem(const struct intel_layout *layout,
+                        unsigned pos_x, unsigned pos_y,
+                        unsigned *mem_x, unsigned *mem_y)
+{
+   assert(pos_x % layout->block_width == 0);
+   assert(pos_y % layout->block_height == 0);
+
+   *mem_x = pos_x / layout->block_width * layout->block_size;
+   *mem_y = pos_y / layout->block_height;
+}
+
+/**
+ * Convert from 2D memory offset to linear offset.
+ */
+static inline unsigned
+intel_layout_mem_to_linear(const struct intel_layout *layout,
+                           unsigned mem_x, unsigned mem_y)
+{
+   return mem_y * layout->bo_stride + mem_x;
+}
+
+/**
+ * Convert from 2D memory offset to raw offset.
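+ *
+ * E.g. with Y tiling (tile_w = 128, tile_h = 32), the tile-aligned offset
+ * (mem_x, mem_y) = (128, 32) yields 32 * bo_stride + 128 * 32 bytes.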
+ */
+static inline unsigned
+intel_layout_mem_to_raw(const struct intel_layout *layout,
+                        unsigned mem_x, unsigned mem_y)
+{
+   unsigned tile_w U_ASSERT_ONLY;
+   unsigned tile_h;
+
+   switch (layout->tiling) {
+   case GEN6_TILING_NONE:
+      tile_w = 1;
+      tile_h = 1;
+      break;
+   case GEN6_TILING_X:
+      tile_w = 512;
+      tile_h = 8;
+      break;
+   case GEN6_TILING_Y:
+      tile_w = 128;
+      tile_h = 32;
+      break;
+   case GEN8_TILING_W:
+      tile_w = 64;
+      tile_h = 64;
+      break;
+   default:
+      assert(!"unknown tiling");
+      tile_w = 1;
+      tile_h = 1;
+      break;
+   }
+
+   assert(mem_x % tile_w == 0);
+   assert(mem_y % tile_h == 0);
+
+   return mem_y * layout->bo_stride + mem_x * tile_h;
+}
+
+/**
+ * Return the stride, in bytes, between slices within a level.
+ */
+static inline unsigned
+intel_layout_get_slice_stride(const struct intel_layout *layout, unsigned level)
+{
+   unsigned h;
+
+   switch (layout->walk) {
+   case INTEL_LAYOUT_WALK_LOD:
+      h = layout->lods[level].slice_height;
+      break;
+   case INTEL_LAYOUT_WALK_LAYER:
+      h = layout->layer_height;
+      break;
+   case INTEL_LAYOUT_WALK_3D:
+      if (level == 0) {
+         h = layout->lods[0].slice_height;
+         break;
+      }
+      /* fall through */
+   default:
+      assert(!"no single stride to walk across slices");
+      h = 0;
+      break;
+   }
+
+   assert(h % layout->block_height == 0);
+
+   return (h / layout->block_height) * layout->bo_stride;
+}
+
+/**
+ * Return the physical size, in bytes, of a slice in a level.
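+ *
+ * E.g. a 256x128 slice of a 4-byte, 1x1-block format (hypothetical values)
+ * occupies (256 / 1 * 4) * (128 / 1) = 131072 bytes.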
+ */
+static inline unsigned
+intel_layout_get_slice_size(const struct intel_layout *layout, unsigned level)
+{
+   const unsigned w = layout->lods[level].slice_width;
+   const unsigned h = layout->lods[level].slice_height;
+
+   assert(w % layout->block_width == 0);
+   assert(h % layout->block_height == 0);
+
+   return (w / layout->block_width * layout->block_size) *
+      (h / layout->block_height);
+}
+
+/**
+ * Return the pixel position of a slice.
+ */
+static inline void
+intel_layout_get_slice_pos(const struct intel_layout *layout,
+                           unsigned level, unsigned slice,
+                           unsigned *x, unsigned *y)
+{
+   switch (layout->walk) {
+   case INTEL_LAYOUT_WALK_LOD:
+      *x = layout->lods[level].x;
+      *y = layout->lods[level].y + layout->lods[level].slice_height * slice;
+      break;
+   case INTEL_LAYOUT_WALK_LAYER:
+      *x = layout->lods[level].x;
+      *y = layout->lods[level].y + layout->layer_height * slice;
+      break;
+   case INTEL_LAYOUT_WALK_3D:
+      {
+         /* slices are packed horizontally with wrapping */
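+         /*
+          * At level N there are 2^N slices per row; e.g. slice 5 at
+          * level 1 lands at (sx, sy) = (1, 2).
+          */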
+         const unsigned sx = slice & ((1 << level) - 1);
+         const unsigned sy = slice >> level;
+
+         *x = layout->lods[level].x + layout->lods[level].slice_width * sx;
+         *y = layout->lods[level].y + layout->lods[level].slice_height * sy;
+
+         /* should not overlap with the next level */
+         if (level + 1 < ARRAY_SIZE(layout->lods) &&
+             layout->lods[level + 1].y) {
+            assert(*y + layout->lods[level].slice_height <=
+                  layout->lods[level + 1].y);
+         }
+         break;
+      }
+   default:
+      assert(!"unknown layout walk type");
+      *x = 0;
+      *y = 0;
+      break;
+   }
+
+   /* should not exceed the bo size */
+   assert(*y + layout->lods[level].slice_height <=
+         layout->bo_height * layout->block_height);
+}
+
+unsigned
+intel_layout_get_slice_tile_offset(const struct intel_layout *layout,
+                                   unsigned level, unsigned slice,
+                                   unsigned *x_offset, unsigned *y_offset);
+
+#endif /* LAYOUT_H */
diff --git a/icd/intel/mem.c b/icd/intel/mem.c
new file mode 100644
index 0000000..f0c1cca
--- /dev/null
+++ b/icd/intel/mem.c
@@ -0,0 +1,129 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Mike Stroyan <mike@LunarG.com>
+ * Author: Tony Barbour <tony@LunarG.com>
+ *
+ */
+
+#include "dev.h"
+#include "mem.h"
+
+VkResult intel_mem_alloc(struct intel_dev *dev,
+                           const VkMemoryAllocateInfo *info,
+                           struct intel_mem **mem_ret)
+{
+    struct intel_mem *mem;
+
+    /* ignore any IMAGE_INFO and BUFFER_INFO usage: they don't alter allocations */
+
+    mem = (struct intel_mem *) intel_base_create(&dev->base.handle,
+            sizeof(*mem), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_MEMORY_EXT, info, 0);
+    if (!mem)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    mem->bo = intel_winsys_alloc_bo(dev->winsys,
+            "vk-gpu-memory", info->allocationSize, 0);
+    if (!mem->bo) {
+        intel_mem_free(mem);
+        return VK_ERROR_OUT_OF_DEVICE_MEMORY;
+    }
+
+    mem->size = info->allocationSize;
+
+    *mem_ret = mem;
+
+    return VK_SUCCESS;
+}
+
+void intel_mem_free(struct intel_mem *mem)
+{
+    intel_bo_unref(mem->bo);
+
+    intel_base_destroy(&mem->base);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkAllocateMemory(
+    VkDevice                                    device,
+    const VkMemoryAllocateInfo*                 pAllocateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkDeviceMemory*                             pMemory)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_mem_alloc(dev, pAllocateInfo, (struct intel_mem **) pMemory);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkFreeMemory(
+    VkDevice                                  device,
+    VkDeviceMemory                            mem_,
+    const VkAllocationCallbacks*              pAllocator)
+{
+    struct intel_mem *mem = intel_mem(mem_);
+
+    intel_mem_free(mem);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkMapMemory(
+    VkDevice                                  device,
+    VkDeviceMemory                            mem_,
+    VkDeviceSize                              offset,
+    VkDeviceSize                              size,
+    VkFlags                                   flags,
+    void**                                    ppData)
+{
+    struct intel_mem *mem = intel_mem(mem_);
+    void *ptr = intel_mem_map(mem, flags);
+
+    /* do not write to *ppData when the mapping failed */
+    if (!ptr)
+        return VK_ERROR_MEMORY_MAP_FAILED;
+
+    *ppData = (uint8_t *) ptr + offset;
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkUnmapMemory(
+    VkDevice                                    device,
+    VkDeviceMemory                              mem_)
+{
+    struct intel_mem *mem = intel_mem(mem_);
+
+    intel_mem_unmap(mem);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkFlushMappedMemoryRanges(
+    VkDevice                                  device,
+    uint32_t                                  memoryRangeCount,
+    const VkMappedMemoryRange*                pMemoryRanges)
+{
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkInvalidateMappedMemoryRanges(
+    VkDevice                                  device,
+    uint32_t                                  memoryRangeCount,
+    const VkMappedMemoryRange*                pMemoryRanges)
+{
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetDeviceMemoryCommitment(
+    VkDevice                                  device,
+    VkDeviceMemory                            memory,
+    VkDeviceSize*                             pCommittedMemoryInBytes)
+{
+}
diff --git a/icd/intel/mem.h b/icd/intel/mem.h
new file mode 100644
index 0000000..b2c4c81
--- /dev/null
+++ b/icd/intel/mem.h
@@ -0,0 +1,66 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ *
+ */
+
+#ifndef MEM_H
+#define MEM_H
+
+#include "kmd/winsys.h"
+#include "intel.h"
+#include "obj.h"
+
+struct intel_mem {
+    struct intel_base base;
+
+    struct intel_bo *bo;
+    VkDeviceSize size;
+};
+
+VkResult intel_mem_alloc(struct intel_dev *dev,
+                           const VkMemoryAllocateInfo *info,
+                           struct intel_mem **mem_ret);
+void intel_mem_free(struct intel_mem *mem);
+
+static inline void *intel_mem_map(struct intel_mem *mem, VkFlags flags)
+{
+    return intel_bo_map_async(mem->bo);
+}
+
+static inline void *intel_mem_map_sync(struct intel_mem *mem, bool rw)
+{
+    return intel_bo_map(mem->bo, rw);
+}
+
+static inline void intel_mem_unmap(struct intel_mem *mem)
+{
+    intel_bo_unmap(mem->bo);
+}
+
+static inline bool intel_mem_is_busy(struct intel_mem *mem)
+{
+    return intel_bo_is_busy(mem->bo);
+}
+
+static inline struct intel_mem *intel_mem(VkDeviceMemory mem)
+{
+    return *(struct intel_mem **) &mem;
+}
+
+#endif /* MEM_H */
diff --git a/icd/intel/obj.c b/icd/intel/obj.c
new file mode 100644
index 0000000..4771e13
--- /dev/null
+++ b/icd/intel/obj.c
@@ -0,0 +1,363 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ *
+ */
+
+#include "dev.h"
+#include "gpu.h"
+#include "mem.h"
+#include "obj.h"
+
+VkResult intel_base_get_memory_requirements(struct intel_base *base, VkMemoryRequirements* pRequirements)
+{
+    memset(pRequirements, 0, sizeof(VkMemoryRequirements));
+    pRequirements->memoryTypeBits = (1 << INTEL_MEMORY_TYPE_COUNT) - 1;
+
+    return VK_SUCCESS;
+}
+
+static bool base_dbg_copy_create_info(const struct intel_handle *handle,
+                                      struct intel_base_dbg *dbg,
+                                      const void *create_info)
+{
+    const union {
+        const void *ptr;
+        const struct {
+            VkStructureType struct_type;
+            void *next;
+        } *header;
+    } info = { .ptr = create_info };
+    size_t shallow_copy = 0;
+
+    if (!create_info)
+        return true;
+
+    switch (dbg->type) {
+    case VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_MEMORY_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_EVENT_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_EVENT_CREATE_INFO);
+        shallow_copy = sizeof(VkEventCreateInfo);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_FENCE_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_FENCE_CREATE_INFO);
+        shallow_copy = sizeof(VkFenceCreateInfo);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_QUERY_POOL_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO);
+        shallow_copy = sizeof(VkQueryPoolCreateInfo);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_BUFFER_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO);
+        shallow_copy = sizeof(VkBufferCreateInfo);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_BUFFER_VIEW_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_BUFFER_VIEW_CREATE_INFO);
+        shallow_copy = sizeof(VkBufferViewCreateInfo);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO);
+        shallow_copy = sizeof(VkImageCreateInfo);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_VIEW_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO);
+        shallow_copy = sizeof(VkImageViewCreateInfo);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_SAMPLER_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO);
+        shallow_copy = sizeof(VkSamplerCreateInfo);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_SET_EXT:
+        /* no create info */
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_POOL_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO);
+        shallow_copy = sizeof(VkCommandPoolCreateInfo);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO);
+        shallow_copy = sizeof(VkCommandBufferAllocateInfo);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_FRAMEBUFFER_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO);
+        shallow_copy = sizeof(VkFramebufferCreateInfo);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_RENDER_PASS_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO);
+        shallow_copy = sizeof(VkRenderPassCreateInfo);
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_SET_LAYOUT_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO);
+        /* TODO: deep-copy the bindings; nothing is copied for now */
+        shallow_copy = 0;
+        break;
+    case VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_POOL_EXT:
+        assert(info.header->struct_type == VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO);
+        shallow_copy = sizeof(VkDescriptorPoolCreateInfo);
+        break;
+    default:
+        assert(!"unknown dbg object type");
+        return false;
+    }
+
+    if (shallow_copy) {
+        dbg->create_info = intel_alloc(handle, shallow_copy, sizeof(int),
+                VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+        if (!dbg->create_info)
+            return false;
+
+        memcpy(dbg->create_info, create_info, shallow_copy);
+        dbg->create_info_size = shallow_copy;
+    } else if (info.header->struct_type ==
+            VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO) {
+        const VkMemoryAllocateInfo *src = info.ptr;
+        VkMemoryAllocateInfo *dst;
+        const size_t size = sizeof(*src);
+
+        dbg->create_info_size = size;
+        dst = intel_alloc(handle, size, sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+        if (!dst)
+            return false;
+        memcpy(dst, src, size);
+
+        dbg->create_info = dst;
+    } else if (info.header->struct_type ==
+            VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO) {
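+        /*
+         * VkDeviceCreateInfo is deep-copied into a single allocation laid
+         * out as: the struct itself, the queue create infos, their
+         * pQueuePriorities arrays, the ppEnabledExtensionNames pointer
+         * array, and finally the extension name strings.  The pointer
+         * fixups below advance d across these regions in the same order
+         * the sizes are summed here.
+         */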
+        const VkDeviceCreateInfo *src = info.ptr;
+        VkDeviceCreateInfo *dst;
+        uint8_t *d;
+        size_t size;
+
+        size = sizeof(*src);
+        dbg->create_info_size = size;
+
+        size += sizeof(src->pQueueCreateInfos[0]) *
+            src->queueCreateInfoCount;
+        for (uint32_t i = 0; i < src->queueCreateInfoCount; i++) {
+            size += src->pQueueCreateInfos[i].queueCount * sizeof(float);
+        }
+        size += sizeof(src->ppEnabledExtensionNames[0]) *
+            src->enabledExtensionCount;
+        for (uint32_t i = 0; i < src->enabledExtensionCount; i++) {
+            size += strlen(src->ppEnabledExtensionNames[i]) + 1;
+        }
+
+        dst = intel_alloc(handle, size, sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+        if (!dst)
+            return false;
+
+        memcpy(dst, src, sizeof(*src));
+
+        d = (uint8_t *) dst;
+        d += sizeof(*src);
+
+        size = sizeof(src->pQueueCreateInfos[0]) * src->queueCreateInfoCount;
+        memcpy(d, src->pQueueCreateInfos, size);
+        dst->pQueueCreateInfos = (const VkDeviceQueueCreateInfo *) d;
+        d += size;
+        for (uint32_t i = 0; i < src->queueCreateInfoCount; i++) {
+            size = sizeof(float) *
+                dst->pQueueCreateInfos[i].queueCount;
+            memcpy(d, src->pQueueCreateInfos[i].pQueuePriorities, size);
+            *((float **) &dst->pQueueCreateInfos[i].pQueuePriorities) = (float *) d;
+            d += size;
+        }
+
+        size = sizeof(src->ppEnabledExtensionNames[0]) *
+            src->enabledExtensionCount;
+        dst->ppEnabledExtensionNames = (const char **) d;
+        memcpy(d, src->ppEnabledExtensionNames, size);
+        d += size;
+        for (uint32_t i = 0; i < src->enabledExtensionCount; i++) {
+            char **ptr = (char **) &dst->ppEnabledExtensionNames[i];
+            strcpy((char *) d, src->ppEnabledExtensionNames[i]);
+            *ptr = (char *) d;
+            d += strlen(src->ppEnabledExtensionNames[i]) + 1;
+        }
+
+        dbg->create_info = dst;
+    } else if (info.header->struct_type == VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO) {
+        /* TODO: what do we want to copy here? */
+    }
+
+    return true;
+}
+
+/**
+ * Create an intel_base_dbg.  A buffer of dbg_size bytes is allocated and
+ * zeroed; when dbg_size is zero, it defaults to sizeof(struct intel_base_dbg).
+ */
+struct intel_base_dbg *intel_base_dbg_create(const struct intel_handle *handle,
+                                             VkDebugReportObjectTypeEXT type,
+                                             const void *create_info,
+                                             size_t dbg_size)
+{
+    struct intel_base_dbg *dbg;
+
+    if (!dbg_size)
+        dbg_size = sizeof(*dbg);
+
+    assert(dbg_size >= sizeof(*dbg));
+
+    dbg = intel_alloc(handle, dbg_size, sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!dbg)
+        return NULL;
+
+    memset(dbg, 0, dbg_size);
+
+    dbg->type = type;
+
+    if (!base_dbg_copy_create_info(handle, dbg, create_info)) {
+        intel_free(handle, dbg);
+        return NULL;
+    }
+
+    return dbg;
+}
+
+void intel_base_dbg_destroy(const struct intel_handle *handle,
+                            struct intel_base_dbg *dbg)
+{
+    if (dbg->tag)
+        intel_free(handle, dbg->tag);
+
+    if (dbg->create_info)
+        intel_free(handle, dbg->create_info);
+
+    intel_free(handle, dbg);
+}
+
+/**
+ * Create an intel_base.  obj_size and dbg_size specify the real sizes of the
+ * object and the debug metadata; both allocations are zeroed.
+ */
+struct intel_base *intel_base_create(const struct intel_handle *handle,
+                                     size_t obj_size, bool debug,
+                                     VkDebugReportObjectTypeEXT type,
+                                     const void *create_info,
+                                     size_t dbg_size)
+{
+    struct intel_base *base;
+
+    if (!obj_size)
+        obj_size = sizeof(*base);
+
+    assert(obj_size >= sizeof(*base));
+
+    base = intel_alloc(handle, obj_size, sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!base)
+        return NULL;
+
+    memset(base, 0, obj_size);
+    intel_handle_init(&base->handle, type, handle->instance);
+
+    if (debug) {
+        base->dbg = intel_base_dbg_create(&base->handle,
+                type, create_info, dbg_size);
+        if (!base->dbg) {
+            intel_free(handle, base);
+            return NULL;
+        }
+    }
+
+    base->get_memory_requirements = intel_base_get_memory_requirements;
+
+    return base;
+}
+
+void intel_base_destroy(struct intel_base *base)
+{
+    if (base->dbg)
+        intel_base_dbg_destroy(&base->handle, base->dbg);
+    intel_free(base, base);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetBufferMemoryRequirements(
+    VkDevice                                    device,
+    VkBuffer                                    buffer,
+    VkMemoryRequirements*                       pRequirements)
+{
+    struct intel_base *base = intel_base(buffer);
+
+    base->get_memory_requirements(base, pRequirements);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetImageMemoryRequirements(
+    VkDevice                                    device,
+    VkImage                                     image,
+    VkMemoryRequirements*                       pRequirements)
+{
+    struct intel_base *base = intel_base(image);
+
+    base->get_memory_requirements(base, pRequirements);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkBindBufferMemory(
+    VkDevice                                    device,
+    VkBuffer                                    buffer,
+    VkDeviceMemory                              mem_,
+    VkDeviceSize                                memoryOffset)
+{
+    struct intel_obj *obj = intel_obj(buffer);
+    struct intel_mem *mem = intel_mem(mem_);
+
+    intel_obj_bind_mem(obj, mem, memoryOffset);
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkBindImageMemory(
+    VkDevice                                    device,
+    VkImage                                     image,
+    VkDeviceMemory                              mem_,
+    VkDeviceSize                                memoryOffset)
+{
+    struct intel_obj *obj = intel_obj(image);
+    struct intel_mem *mem = intel_mem(mem_);
+
+    intel_obj_bind_mem(obj, mem, memoryOffset);
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkQueueBindSparse(
+    VkQueue                                     queue,
+    uint32_t                                    bindInfoCount,
+    const VkBindSparseInfo*                     pBindInfo,
+    VkFence                                     fence)
+{
+    assert(0 && "vkQueueBindSparse not supported");
+    return VK_SUCCESS;
+}
+
diff --git a/icd/intel/obj.h b/icd/intel/obj.h
new file mode 100644
index 0000000..60cc571
--- /dev/null
+++ b/icd/intel/obj.h
@@ -0,0 +1,94 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ *
+ */
+
+#ifndef OBJ_H
+#define OBJ_H
+
+#include "intel.h"
+
+struct intel_dev;
+struct intel_mem;
+
+struct intel_base_dbg {
+    VkDebugReportObjectTypeEXT type;
+
+    void *create_info;
+    size_t create_info_size;
+
+    void *tag;
+    size_t tag_size;
+};
+
+struct intel_base {
+    struct intel_handle handle;
+
+    struct intel_base_dbg *dbg;
+
+    VkResult (*get_memory_requirements)(struct intel_base *base,
+                         VkMemoryRequirements *data);
+};
+
+struct intel_obj {
+    struct intel_base base;
+
+    void (*destroy)(struct intel_obj *obj);
+
+    /* for memory binding */
+    struct intel_mem *mem;
+    size_t offset;
+};
+
+static inline struct intel_base *intel_base(void *base)
+{
+    return (struct intel_base *) base;
+}
+
+static inline struct intel_obj *intel_obj(void *obj)
+{
+    return (struct intel_obj *) obj;
+}
+
+static inline void intel_obj_bind_mem(struct intel_obj *obj,
+                                      struct intel_mem *mem,
+                                      VkDeviceSize offset)
+{
+    obj->mem = mem;
+    obj->offset = offset;
+}
+
+VkResult intel_base_get_memory_requirements(struct intel_base *base, VkMemoryRequirements *data);
+
+struct intel_base_dbg *intel_base_dbg_create(const struct intel_handle *handle,
+                                             VkDebugReportObjectTypeEXT type,
+                                             const void *create_info,
+                                             size_t dbg_size);
+void intel_base_dbg_destroy(const struct intel_handle *handle,
+                            struct intel_base_dbg *dbg);
+
+struct intel_base *intel_base_create(const struct intel_handle *handle,
+                                     size_t obj_size, bool debug,
+                                     VkDebugReportObjectTypeEXT type,
+                                     const void *create_info,
+                                     size_t dbg_size);
+void intel_base_destroy(struct intel_base *base);
+
+#endif /* OBJ_H */
diff --git a/icd/intel/pipeline.c b/icd/intel/pipeline.c
new file mode 100644
index 0000000..a3eca19
--- /dev/null
+++ b/icd/intel/pipeline.c
@@ -0,0 +1,1456 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ * Author: GregF <greg@LunarG.com>
+ * Author: Tony Barbour <tony@LunarG.com>
+ *
+ */
+
+#include "genhw/genhw.h"
+#include "compiler/pipeline/pipeline_compiler_interface.h"
+#include "cmd.h"
+#include "format.h"
+#include "shader.h"
+#include "pipeline.h"
+#include "mem.h"
+
+static int translate_blend_func(VkBlendOp func)
+{
+   switch (func) {
+   case VK_BLEND_OP_ADD:                return GEN6_BLENDFUNCTION_ADD;
+   case VK_BLEND_OP_SUBTRACT:           return GEN6_BLENDFUNCTION_SUBTRACT;
+   case VK_BLEND_OP_REVERSE_SUBTRACT:   return GEN6_BLENDFUNCTION_REVERSE_SUBTRACT;
+   case VK_BLEND_OP_MIN:                return GEN6_BLENDFUNCTION_MIN;
+   case VK_BLEND_OP_MAX:                return GEN6_BLENDFUNCTION_MAX;
+   default:
+      assert(!"unknown blend func");
+      return GEN6_BLENDFUNCTION_ADD;
+   }
+}
+
+static int translate_blend(VkBlendFactor blend)
+{
+   switch (blend) {
+   case VK_BLEND_FACTOR_ZERO:                     return GEN6_BLENDFACTOR_ZERO;
+   case VK_BLEND_FACTOR_ONE:                      return GEN6_BLENDFACTOR_ONE;
+   case VK_BLEND_FACTOR_SRC_COLOR:                return GEN6_BLENDFACTOR_SRC_COLOR;
+   case VK_BLEND_FACTOR_ONE_MINUS_SRC_COLOR:      return GEN6_BLENDFACTOR_INV_SRC_COLOR;
+   case VK_BLEND_FACTOR_DST_COLOR:                return GEN6_BLENDFACTOR_DST_COLOR;
+   case VK_BLEND_FACTOR_ONE_MINUS_DST_COLOR:      return GEN6_BLENDFACTOR_INV_DST_COLOR;
+   case VK_BLEND_FACTOR_SRC_ALPHA:                return GEN6_BLENDFACTOR_SRC_ALPHA;
+   case VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA:      return GEN6_BLENDFACTOR_INV_SRC_ALPHA;
+   case VK_BLEND_FACTOR_DST_ALPHA:                return GEN6_BLENDFACTOR_DST_ALPHA;
+   case VK_BLEND_FACTOR_ONE_MINUS_DST_ALPHA:      return GEN6_BLENDFACTOR_INV_DST_ALPHA;
+   case VK_BLEND_FACTOR_CONSTANT_COLOR:           return GEN6_BLENDFACTOR_CONST_COLOR;
+   case VK_BLEND_FACTOR_ONE_MINUS_CONSTANT_COLOR: return GEN6_BLENDFACTOR_INV_CONST_COLOR;
+   case VK_BLEND_FACTOR_CONSTANT_ALPHA:           return GEN6_BLENDFACTOR_CONST_ALPHA;
+   case VK_BLEND_FACTOR_ONE_MINUS_CONSTANT_ALPHA: return GEN6_BLENDFACTOR_INV_CONST_ALPHA;
+   case VK_BLEND_FACTOR_SRC_ALPHA_SATURATE:       return GEN6_BLENDFACTOR_SRC_ALPHA_SATURATE;
+   case VK_BLEND_FACTOR_SRC1_COLOR:               return GEN6_BLENDFACTOR_SRC1_COLOR;
+   case VK_BLEND_FACTOR_ONE_MINUS_SRC1_COLOR:     return GEN6_BLENDFACTOR_INV_SRC1_COLOR;
+   case VK_BLEND_FACTOR_SRC1_ALPHA:               return GEN6_BLENDFACTOR_SRC1_ALPHA;
+   case VK_BLEND_FACTOR_ONE_MINUS_SRC1_ALPHA:     return GEN6_BLENDFACTOR_INV_SRC1_ALPHA;
+   default:
+      assert(!"unknown blend factor");
+      return GEN6_BLENDFACTOR_ONE;
+   }
+}
+
+static int translate_compare_func(VkCompareOp func)
+{
+    switch (func) {
+    case VK_COMPARE_OP_NEVER:            return GEN6_COMPAREFUNCTION_NEVER;
+    case VK_COMPARE_OP_LESS:             return GEN6_COMPAREFUNCTION_LESS;
+    case VK_COMPARE_OP_EQUAL:            return GEN6_COMPAREFUNCTION_EQUAL;
+    case VK_COMPARE_OP_LESS_OR_EQUAL:    return GEN6_COMPAREFUNCTION_LEQUAL;
+    case VK_COMPARE_OP_GREATER:          return GEN6_COMPAREFUNCTION_GREATER;
+    case VK_COMPARE_OP_NOT_EQUAL:        return GEN6_COMPAREFUNCTION_NOTEQUAL;
+    case VK_COMPARE_OP_GREATER_OR_EQUAL: return GEN6_COMPAREFUNCTION_GEQUAL;
+    case VK_COMPARE_OP_ALWAYS:           return GEN6_COMPAREFUNCTION_ALWAYS;
+    default:
+      assert(!"unknown compare_func");
+      return GEN6_COMPAREFUNCTION_NEVER;
+    }
+}
+
+static int translate_stencil_op(VkStencilOp op)
+{
+    switch (op) {
+    case VK_STENCIL_OP_KEEP:       return GEN6_STENCILOP_KEEP;
+    case VK_STENCIL_OP_ZERO:       return GEN6_STENCILOP_ZERO;
+    case VK_STENCIL_OP_REPLACE:    return GEN6_STENCILOP_REPLACE;
+    case VK_STENCIL_OP_INCREMENT_AND_CLAMP:  return GEN6_STENCILOP_INCRSAT;
+    case VK_STENCIL_OP_DECREMENT_AND_CLAMP:  return GEN6_STENCILOP_DECRSAT;
+    case VK_STENCIL_OP_INVERT:     return GEN6_STENCILOP_INVERT;
+    case VK_STENCIL_OP_INCREMENT_AND_WRAP:   return GEN6_STENCILOP_INCR;
+    case VK_STENCIL_OP_DECREMENT_AND_WRAP:   return GEN6_STENCILOP_DECR;
+    default:
+      assert(!"unknown stencil op");
+      return GEN6_STENCILOP_KEEP;
+    }
+}
+
+static int translate_sample_count(VkSampleCountFlagBits samples)
+{
+    switch (samples) {
+    case VK_SAMPLE_COUNT_1_BIT:     return 1;
+    case VK_SAMPLE_COUNT_2_BIT:     return 2;
+    case VK_SAMPLE_COUNT_4_BIT:     return 4;
+    case VK_SAMPLE_COUNT_8_BIT:     return 8;
+    case VK_SAMPLE_COUNT_16_BIT:    return 16;
+    case VK_SAMPLE_COUNT_32_BIT:    return 32;
+    case VK_SAMPLE_COUNT_64_BIT:    return 64;
+    default:
+      assert(!"unknown sample count");
+      return 1;
+    }
+}
+
+struct intel_pipeline_create_info {
+    VkFlags                                use_pipeline_dynamic_state;
+    VkGraphicsPipelineCreateInfo           graphics;
+    VkPipelineVertexInputStateCreateInfo   vi;
+    VkPipelineInputAssemblyStateCreateInfo ia;
+    VkPipelineDepthStencilStateCreateInfo  db;
+    VkPipelineColorBlendStateCreateInfo    cb;
+    VkPipelineRasterizationStateCreateInfo rs;
+    VkPipelineTessellationStateCreateInfo  tess;
+    VkPipelineMultisampleStateCreateInfo   ms;
+    VkPipelineViewportStateCreateInfo      vp;
+
+    VkComputePipelineCreateInfo            compute;
+
+    VkPipelineShaderStageCreateInfo        vs;
+    VkPipelineShaderStageCreateInfo        tcs;
+    VkPipelineShaderStageCreateInfo        tes;
+    VkPipelineShaderStageCreateInfo        gs;
+    VkPipelineShaderStageCreateInfo        fs;
+};
+
+/* in S1.3 */
+struct intel_pipeline_sample_position {
+    int8_t x, y;
+};
+
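+/*
+ * Each S1.3 sample position is biased by 8 into a 4-bit unsigned field;
+ * e.g. the default 2X position (-4, -4) packs to (4 << 4) | 4 = 0x44.
+ */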
+static uint8_t pack_sample_position(const struct intel_dev *dev,
+                                    const struct intel_pipeline_sample_position *pos)
+{
+    return (pos->x + 8) << 4 | (pos->y + 8);
+}
+
+void intel_pipeline_init_default_sample_patterns(const struct intel_dev *dev,
+                                                 uint8_t *pat_1x, uint8_t *pat_2x,
+                                                 uint8_t *pat_4x, uint8_t *pat_8x,
+                                                 uint8_t *pat_16x)
+{
+    static const struct intel_pipeline_sample_position default_1x[1] = {
+        {  0,  0 },
+    };
+    static const struct intel_pipeline_sample_position default_2x[2] = {
+        { -4, -4 },
+        {  4,  4 },
+    };
+    static const struct intel_pipeline_sample_position default_4x[4] = {
+        { -2, -6 },
+        {  6, -2 },
+        { -6,  2 },
+        {  2,  6 },
+    };
+    static const struct intel_pipeline_sample_position default_8x[8] = {
+        { -1,  1 },
+        {  1,  5 },
+        {  3, -5 },
+        {  5,  3 },
+        { -7, -1 },
+        { -3, -7 },
+        {  7, -3 },
+        { -5,  7 },
+    };
+    static const struct intel_pipeline_sample_position default_16x[16] = {
+        {  0,  2 },
+        {  3,  0 },
+        { -3, -2 },
+        { -2, -4 },
+        {  4,  3 },
+        {  5,  1 },
+        {  6, -1 },
+        {  2, -6 },
+        { -4,  5 },
+        { -5, -5 },
+        { -1, -7 },
+        {  7, -3 },
+        { -7,  4 },
+        {  1, -8 },
+        { -6,  6 },
+        { -8,  7 },
+    };
+    int i;
+
+    pat_1x[0] = pack_sample_position(dev, default_1x);
+    for (i = 0; i < 2; i++)
+        pat_2x[i] = pack_sample_position(dev, &default_2x[i]);
+    for (i = 0; i < 4; i++)
+        pat_4x[i] = pack_sample_position(dev, &default_4x[i]);
+    for (i = 0; i < 8; i++)
+        pat_8x[i] = pack_sample_position(dev, &default_8x[i]);
+    for (i = 0; i < 16; i++)
+        pat_16x[i] = pack_sample_position(dev, &default_16x[i]);
+}
+
+struct intel_pipeline_shader *intel_pipeline_shader_create_meta(struct intel_dev *dev,
+                                                                enum intel_dev_meta_shader id)
+{
+    struct intel_pipeline_shader *sh;
+    VkResult ret;
+
+    sh = intel_alloc(dev, sizeof(*sh), sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_DEVICE);
+    if (!sh)
+        return NULL;
+    memset(sh, 0, sizeof(*sh));
+
+    ret = intel_pipeline_shader_compile_meta(sh, dev->gpu, id);
+    if (ret != VK_SUCCESS) {
+        intel_free(dev, sh);
+        return NULL;
+    }
+
+    switch (id) {
+    case INTEL_DEV_META_VS_FILL_MEM:
+    case INTEL_DEV_META_VS_COPY_MEM:
+    case INTEL_DEV_META_VS_COPY_MEM_UNALIGNED:
+        sh->max_threads = intel_gpu_get_max_threads(dev->gpu,
+                VK_SHADER_STAGE_VERTEX_BIT);
+        break;
+    default:
+        sh->max_threads = intel_gpu_get_max_threads(dev->gpu,
+                VK_SHADER_STAGE_FRAGMENT_BIT);
+        break;
+    }
+
+    return sh;
+}
+
+void intel_pipeline_shader_destroy(struct intel_dev *dev,
+                                   struct intel_pipeline_shader *sh)
+{
+    intel_pipeline_shader_cleanup(sh, dev->gpu);
+    intel_free(dev, sh);
+}
+
+static VkResult pipeline_build_shader(struct intel_pipeline *pipeline,
+                                        const VkPipelineShaderStageCreateInfo *sh_info,
+                                        struct intel_pipeline_shader *sh)
+{
+    struct intel_shader_module *mod =
+        intel_shader_module(sh_info->module);
+    const struct intel_ir *ir =
+        intel_shader_module_get_ir(mod, sh_info->stage);
+    VkResult ret;
+
+    if (!ir)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    ret = intel_pipeline_shader_compile(sh,
+            pipeline->dev->gpu, pipeline->pipeline_layout, sh_info, ir);
+
+    if (ret != VK_SUCCESS)
+        return ret;
+
+    sh->max_threads =
+        intel_gpu_get_max_threads(pipeline->dev->gpu, sh_info->stage);
+
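+    /*
+     * Each stage's scratch space starts at a 1KB-aligned offset past what
+     * earlier stages reserved.  E.g. (hypothetical numbers) with 1500
+     * bytes already in use and 2048 bytes needed here, scratch_offset
+     * becomes 2048 and scratch_size grows to 4096.
+     */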
+    /* 1KB aligned */
+    sh->scratch_offset = u_align(pipeline->scratch_size, 1024);
+    pipeline->scratch_size = sh->scratch_offset +
+        sh->per_thread_scratch_size * sh->max_threads;
+
+    pipeline->active_shaders |= sh_info->stage;
+
+    return VK_SUCCESS;
+}
+
+static VkResult pipeline_build_shaders(struct intel_pipeline *pipeline,
+                                         const struct intel_pipeline_create_info *info)
+{
+    VkResult ret = VK_SUCCESS;
+
+    if (ret == VK_SUCCESS && info->vs.module)
+        ret = pipeline_build_shader(pipeline, &info->vs, &pipeline->vs);
+    if (ret == VK_SUCCESS && info->tcs.module)
+        ret = pipeline_build_shader(pipeline, &info->tcs, &pipeline->tcs);
+    if (ret == VK_SUCCESS && info->tes.module)
+        ret = pipeline_build_shader(pipeline, &info->tes, &pipeline->tes);
+    if (ret == VK_SUCCESS && info->gs.module)
+        ret = pipeline_build_shader(pipeline, &info->gs, &pipeline->gs);
+    if (ret == VK_SUCCESS && info->fs.module)
+        ret = pipeline_build_shader(pipeline, &info->fs, &pipeline->fs);
+
+    if (ret == VK_SUCCESS && info->compute.stage.module) {
+        ret = pipeline_build_shader(pipeline,
+                &info->compute.stage, &pipeline->cs);
+    }
+
+    return ret;
+}
+
+static uint32_t *pipeline_cmd_ptr(struct intel_pipeline *pipeline, int cmd_len)
+{
+    uint32_t *ptr;
+
+    assert(pipeline->cmd_len + cmd_len < INTEL_PSO_CMD_ENTRIES);
+    ptr = &pipeline->cmds[pipeline->cmd_len];
+    pipeline->cmd_len += cmd_len;
+    return ptr;
+}
+
+static VkResult pipeline_build_ia(struct intel_pipeline *pipeline,
+                                    const struct intel_pipeline_create_info* info)
+{
+    pipeline->topology = info->ia.topology;
+    pipeline->disable_vs_cache = false;
+
+    switch (info->ia.topology) {
+    case VK_PRIMITIVE_TOPOLOGY_POINT_LIST:
+        pipeline->prim_type = GEN6_3DPRIM_POINTLIST;
+        break;
+    case VK_PRIMITIVE_TOPOLOGY_LINE_LIST:
+        pipeline->prim_type = GEN6_3DPRIM_LINELIST;
+        break;
+    case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
+        pipeline->prim_type = GEN6_3DPRIM_LINESTRIP;
+        break;
+    case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST:
+        pipeline->prim_type = GEN6_3DPRIM_TRILIST;
+        break;
+    case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
+        pipeline->prim_type = GEN6_3DPRIM_TRISTRIP;
+        break;
+    case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
+        pipeline->prim_type = GEN6_3DPRIM_TRIFAN;
+        break;
+    case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
+        pipeline->prim_type = GEN6_3DPRIM_LINELIST_ADJ;
+        break;
+    case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
+        pipeline->prim_type = GEN6_3DPRIM_LINESTRIP_ADJ;
+        break;
+    case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
+        pipeline->prim_type = GEN6_3DPRIM_TRILIST_ADJ;
+        break;
+    case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
+        pipeline->prim_type = GEN6_3DPRIM_TRISTRIP_ADJ;
+        break;
+    case VK_PRIMITIVE_TOPOLOGY_PATCH_LIST:
+        pipeline->prim_type = GEN7_3DPRIM_PATCHLIST_1 +
+            info->tess.patchControlPoints - 1;
+        break;
+    default:
+        assert(!"unsupported primitive topology");
+        break;
+    }
+
+    if (info->ia.primitiveRestartEnable) {
+        pipeline->primitive_restart = true;
+        pipeline->primitive_restart_index = 0;
+    } else {
+        pipeline->primitive_restart = false;
+    }
+
+    return VK_SUCCESS;
+}
+
+static VkResult pipeline_build_rs_state(struct intel_pipeline *pipeline,
+                                          const struct intel_pipeline_create_info* info)
+{
+    const VkPipelineRasterizationStateCreateInfo *rs_state = &info->rs;
+    bool ccw;
+
+    pipeline->depthClipEnable = !rs_state->depthClampEnable;
+    pipeline->rasterizerDiscardEnable = rs_state->rasterizerDiscardEnable;
+    pipeline->depthBiasEnable = rs_state->depthBiasEnable;
+
+    switch (rs_state->polygonMode) {
+    case VK_POLYGON_MODE_POINT:
+        pipeline->cmd_sf_fill |= GEN7_SF_DW1_FRONTFACE_POINT |
+                              GEN7_SF_DW1_BACKFACE_POINT;
+        break;
+    case VK_POLYGON_MODE_LINE:
+        pipeline->cmd_sf_fill |= GEN7_SF_DW1_FRONTFACE_WIREFRAME |
+                              GEN7_SF_DW1_BACKFACE_WIREFRAME;
+        break;
+    case VK_POLYGON_MODE_FILL:
+    default:
+        pipeline->cmd_sf_fill |= GEN7_SF_DW1_FRONTFACE_SOLID |
+                              GEN7_SF_DW1_BACKFACE_SOLID;
+        break;
+    }
+
+    ccw = (rs_state->frontFace == VK_FRONT_FACE_COUNTER_CLOCKWISE);
+
+    /* flip the winding order: the hardware default front face is clockwise */
+    if (ccw) {
+        pipeline->cmd_sf_fill |= GEN7_SF_DW1_FRONTWINDING_CCW;
+        pipeline->cmd_clip_cull |= GEN7_CLIP_DW1_FRONTWINDING_CCW;
+    }
+
+    switch (rs_state->cullMode) {
+    case VK_CULL_MODE_NONE:
+    default:
+        pipeline->cmd_sf_cull |= GEN7_SF_DW2_CULLMODE_NONE;
+        pipeline->cmd_clip_cull |= GEN7_CLIP_DW1_CULLMODE_NONE;
+        break;
+    case VK_CULL_MODE_FRONT_BIT:
+        pipeline->cmd_sf_cull |= GEN7_SF_DW2_CULLMODE_FRONT;
+        pipeline->cmd_clip_cull |= GEN7_CLIP_DW1_CULLMODE_FRONT;
+        break;
+    case VK_CULL_MODE_BACK_BIT:
+        pipeline->cmd_sf_cull |= GEN7_SF_DW2_CULLMODE_BACK;
+        pipeline->cmd_clip_cull |= GEN7_CLIP_DW1_CULLMODE_BACK;
+        break;
+    case VK_CULL_MODE_FRONT_AND_BACK:
+        pipeline->cmd_sf_cull |= GEN7_SF_DW2_CULLMODE_BOTH;
+        pipeline->cmd_clip_cull |= GEN7_CLIP_DW1_CULLMODE_BOTH;
+        break;
+    }
+
+    /* only GEN7+ needs cull mode in 3DSTATE_CLIP */
+    if (intel_gpu_gen(pipeline->dev->gpu) == INTEL_GEN(6))
+        pipeline->cmd_clip_cull = 0;
+
+    return VK_SUCCESS;
+}
+
+static void pipeline_destroy(struct intel_obj *obj)
+{
+    struct intel_pipeline *pipeline = intel_pipeline_from_obj(obj);
+
+    if (pipeline->active_shaders & SHADER_VERTEX_FLAG) {
+        intel_pipeline_shader_cleanup(&pipeline->vs, pipeline->dev->gpu);
+    }
+
+    if (pipeline->active_shaders & SHADER_TESS_CONTROL_FLAG) {
+        intel_pipeline_shader_cleanup(&pipeline->tcs, pipeline->dev->gpu);
+    }
+
+    if (pipeline->active_shaders & SHADER_TESS_EVAL_FLAG) {
+        intel_pipeline_shader_cleanup(&pipeline->tes, pipeline->dev->gpu);
+    }
+
+    if (pipeline->active_shaders & SHADER_GEOMETRY_FLAG) {
+        intel_pipeline_shader_cleanup(&pipeline->gs, pipeline->dev->gpu);
+    }
+
+    if (pipeline->active_shaders & SHADER_FRAGMENT_FLAG) {
+        intel_pipeline_shader_cleanup(&pipeline->fs, pipeline->dev->gpu);
+    }
+
+    if (pipeline->active_shaders & SHADER_COMPUTE_FLAG) {
+        intel_pipeline_shader_cleanup(&pipeline->cs, pipeline->dev->gpu);
+    }
+
+    intel_base_destroy(&pipeline->obj.base);
+}
+
+static void pipeline_build_urb_alloc_gen6(struct intel_pipeline *pipeline,
+                                          const struct intel_pipeline_create_info *info)
+{
+    const struct intel_gpu *gpu = pipeline->dev->gpu;
+    const int urb_size = ((gpu->gt == 2) ? 64 : 32) * 1024;
+    const struct intel_pipeline_shader *vs = &pipeline->vs;
+    const struct intel_pipeline_shader *gs = &pipeline->gs;
+    int vs_entry_size, gs_entry_size;
+    int vs_size, gs_size;
+
+    INTEL_GPU_ASSERT(gpu, 6, 6);
+
+    vs_entry_size = ((vs->in_count >= vs->out_count) ?
+        vs->in_count : vs->out_count);
+    gs_entry_size = (gs) ? gs->out_count : 0;
+
+    /* in bytes */
+    vs_entry_size *= sizeof(float) * 4;
+    gs_entry_size *= sizeof(float) * 4;
+
+    if (pipeline->active_shaders & SHADER_GEOMETRY_FLAG) {
+        vs_size = urb_size / 2;
+        gs_size = vs_size;
+    } else {
+        vs_size = urb_size;
+        gs_size = 0;
+    }
+
+    /* 3DSTATE_URB */
+    {
+        const uint8_t cmd_len = 3;
+        const uint32_t dw0 = GEN6_RENDER_CMD(3D, 3DSTATE_URB) |
+                             (cmd_len - 2);
+        int vs_alloc_size, gs_alloc_size;
+        int vs_entry_count, gs_entry_count;
+        uint32_t *dw;
+
+        /* in 1024-bit rows */
+        vs_alloc_size = (vs_entry_size + 128 - 1) / 128;
+        gs_alloc_size = (gs_entry_size + 128 - 1) / 128;
+
+        /* valid range is [1, 5] */
+        if (!vs_alloc_size)
+            vs_alloc_size = 1;
+        if (!gs_alloc_size)
+            gs_alloc_size = 1;
+        assert(vs_alloc_size <= 5 && gs_alloc_size <= 5);
+
+        /* valid range is [24, 256], multiples of 4 */
+        vs_entry_count = (vs_size / 128 / vs_alloc_size) & ~3;
+        if (vs_entry_count > 256)
+            vs_entry_count = 256;
+        assert(vs_entry_count >= 24);
+
+        /* valid range is [0, 256], multiples of 4 */
+        gs_entry_count = (gs_size / 128 / gs_alloc_size) & ~3;
+        if (gs_entry_count > 256)
+            gs_entry_count = 256;
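+
+        /*
+         * E.g. (hypothetical): on GT2 with no GS, vs_size is 64KB; a
+         * 128-byte VS entry gives vs_alloc_size = 1 and
+         * 65536 / 128 = 512 entries, which the clamps above reduce to the
+         * 256-entry maximum.
+         */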
+
+        dw = pipeline_cmd_ptr(pipeline, cmd_len);
+
+        dw[0] = dw0;
+        dw[1] = (vs_alloc_size - 1) << GEN6_URB_DW1_VS_ENTRY_SIZE__SHIFT |
+                vs_entry_count << GEN6_URB_DW1_VS_ENTRY_COUNT__SHIFT;
+        dw[2] = gs_entry_count << GEN6_URB_DW2_GS_ENTRY_COUNT__SHIFT |
+                (gs_alloc_size - 1) << GEN6_URB_DW2_GS_ENTRY_SIZE__SHIFT;
+    }
+}
+
+static void pipeline_build_urb_alloc_gen7(struct intel_pipeline *pipeline,
+                                          const struct intel_pipeline_create_info *info)
+{
+    const struct intel_gpu *gpu = pipeline->dev->gpu;
+    const int urb_size = ((gpu->gt == 3) ? 512 :
+                          (gpu->gt == 2) ? 256 : 128) * 1024;
+    const struct intel_pipeline_shader *vs = &pipeline->vs;
+    const struct intel_pipeline_shader *gs = &pipeline->gs;
+    /* some space is reserved for PCBs (push constant buffers) */
+    int urb_offset = ((gpu->gt == 3) ? 32 : 16) * 1024;
+    int vs_entry_size, gs_entry_size;
+    int vs_size, gs_size;
+
+    INTEL_GPU_ASSERT(gpu, 7, 7.5);
+
+    vs_entry_size = ((vs->in_count >= vs->out_count) ?
+        vs->in_count : vs->out_count);
+    gs_entry_size = (gs) ? gs->out_count : 0;
+
+    /* in bytes */
+    vs_entry_size *= sizeof(float) * 4;
+    gs_entry_size *= sizeof(float) * 4;
+
+    if (pipeline->active_shaders & SHADER_GEOMETRY_FLAG) {
+        vs_size = (urb_size - urb_offset) / 2;
+        gs_size = vs_size;
+    } else {
+        vs_size = urb_size - urb_offset;
+        gs_size = 0;
+    }
+
+    /* 3DSTATE_URB_* */
+    {
+        const uint8_t cmd_len = 2;
+        int vs_alloc_size, gs_alloc_size;
+        int vs_entry_count, gs_entry_count;
+        uint32_t *dw;
+
+        /* in 512-bit rows */
+        vs_alloc_size = (vs_entry_size + 64 - 1) / 64;
+        gs_alloc_size = (gs_entry_size + 64 - 1) / 64;
+
+        if (!vs_alloc_size)
+            vs_alloc_size = 1;
+        if (!gs_alloc_size)
+            gs_alloc_size = 1;
+
+        /* avoid performance decrease due to banking */
+        if (vs_alloc_size == 5)
+            vs_alloc_size = 6;
+
+        /* in multiples of 8 */
+        vs_entry_count = (vs_size / 64 / vs_alloc_size) & ~7;
+        assert(vs_entry_count >= 32);
+
+        gs_entry_count = (gs_size / 64 / gs_alloc_size) & ~7;
+
+        if (intel_gpu_gen(gpu) >= INTEL_GEN(7.5)) {
+            const int max_vs_entry_count =
+                (gpu->gt >= 2) ? 1664 : 640;
+            const int max_gs_entry_count =
+                (gpu->gt >= 2) ? 640 : 256;
+            if (vs_entry_count >= max_vs_entry_count)
+                vs_entry_count = max_vs_entry_count;
+            if (gs_entry_count >= max_gs_entry_count)
+                gs_entry_count = max_gs_entry_count;
+        } else {
+            const int max_vs_entry_count =
+                (gpu->gt == 2) ? 704 : 512;
+            const int max_gs_entry_count =
+                (gpu->gt == 2) ? 320 : 192;
+            if (vs_entry_count >= max_vs_entry_count)
+                vs_entry_count = max_vs_entry_count;
+            if (gs_entry_count >= max_gs_entry_count)
+                gs_entry_count = max_gs_entry_count;
+        }
+
+        dw = pipeline_cmd_ptr(pipeline, cmd_len * 4); /* VS, GS, HS, DS packets */
+        dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_URB_VS) | (cmd_len - 2);
+        dw[1] = (urb_offset / 8192) << GEN7_URB_DW1_OFFSET__SHIFT |
+                (vs_alloc_size - 1) << GEN7_URB_DW1_ENTRY_SIZE__SHIFT |
+                vs_entry_count;
+
+        dw += 2;
+        if (gs_size)
+            urb_offset += vs_size;
+        dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_URB_GS) | (cmd_len - 2);
+        dw[1] = (urb_offset / 8192) << GEN7_URB_DW1_OFFSET__SHIFT |
+                (gs_alloc_size - 1) << GEN7_URB_DW1_ENTRY_SIZE__SHIFT |
+                gs_entry_count;
+
+        dw += 2;
+        dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_URB_HS) | (cmd_len - 2);
+        dw[1] = (urb_offset / 8192) << GEN7_URB_DW1_OFFSET__SHIFT;
+
+        dw += 2;
+        dw[0] = GEN7_RENDER_CMD(3D, 3DSTATE_URB_DS) | (cmd_len - 2);
+        dw[1] = (urb_offset / 8192) << GEN7_URB_DW1_OFFSET__SHIFT;
+    }
+}
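+/*
+ * Worked example for the gen7 allocation above (illustrative): on GT2
+ * (256 KB URB, 16 KB PCB reservation), a 272-byte VS entry needs
+ * ceil(272 / 64) = 5 rows, bumped to 6 to avoid the banking penalty; with
+ * no GS, vs_entry_count = (245760 / 64 / 6) & ~7 = 640, under both the
+ * gen7 and gen7.5 GT2 limits.
+ */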
+
+static void pipeline_build_vertex_elements(struct intel_pipeline *pipeline,
+                                           const struct intel_pipeline_create_info *info)
+{
+    const struct intel_pipeline_shader *vs = &pipeline->vs;
+    uint8_t cmd_len;
+    uint32_t *dw;
+    uint32_t i, j;
+    uint32_t attr_count;
+    uint32_t attrs_processed;
+    int comps[4];
+
+    INTEL_GPU_ASSERT(pipeline->dev->gpu, 6, 7.5);
+
+    attr_count = u_popcountll(vs->inputs_read);
+    cmd_len = 1 + 2 * attr_count;
+    if (vs->uses & (INTEL_SHADER_USE_VID | INTEL_SHADER_USE_IID))
+        cmd_len += 2;
+
+    if (cmd_len == 1)
+        return;
+
+    dw = pipeline_cmd_ptr(pipeline, cmd_len);
+
+    dw[0] = GEN6_RENDER_CMD(3D, 3DSTATE_VERTEX_ELEMENTS) |
+            (cmd_len - 2);
+    dw++;
+
+    /* VERTEX_ELEMENT_STATE */
+    for (i = 0, attrs_processed = 0; attrs_processed < attr_count; i++) {
+        VkVertexInputAttributeDescription *attr = NULL;
+
+        /*
+         * The compiler will pack the shader references and then
+         * indicate which locations are used via the bitmask in
+         * vs->inputs_read.
+         */
+        if (!(vs->inputs_read & (1ULL << i))) {
+            continue;
+        }
+
+        /*
+         * For each bit set in the vs->inputs_read we'll need
+         * to find the corresponding attribute record and then
+         * set up the next HW vertex element based on that attribute.
+         */
+        for (j = 0; j < info->vi.vertexAttributeDescriptionCount; j++) {
+            if (info->vi.pVertexAttributeDescriptions[j].location == i) {
+                attr = (VkVertexInputAttributeDescription *) &info->vi.pVertexAttributeDescriptions[j];
+                attrs_processed++;
+                break;
+            }
+        }
+        assert(attr != NULL);
+
+        const int format =
+            intel_format_translate_color(pipeline->dev->gpu, attr->format);
+
+        comps[0] = GEN6_VFCOMP_STORE_0;
+        comps[1] = GEN6_VFCOMP_STORE_0;
+        comps[2] = GEN6_VFCOMP_STORE_0;
+        comps[3] = icd_format_is_int(attr->format) ?
+            GEN6_VFCOMP_STORE_1_INT : GEN6_VFCOMP_STORE_1_FP;
+
+        switch (icd_format_get_channel_count(attr->format)) {
+        case 4: comps[3] = GEN6_VFCOMP_STORE_SRC; /* fall through */
+        case 3: comps[2] = GEN6_VFCOMP_STORE_SRC; /* fall through */
+        case 2: comps[1] = GEN6_VFCOMP_STORE_SRC; /* fall through */
+        case 1: comps[0] = GEN6_VFCOMP_STORE_SRC; break;
+        default:
+            break;
+        }
+
+        assert(attr->offset <= 2047); /* 11-bit source element offset */
+
+        dw[0] = attr->binding << GEN6_VE_DW0_VB_INDEX__SHIFT |
+                GEN6_VE_DW0_VALID |
+                format << GEN6_VE_DW0_FORMAT__SHIFT |
+                attr->offset;
+
+        dw[1] = comps[0] << GEN6_VE_DW1_COMP0__SHIFT |
+                comps[1] << GEN6_VE_DW1_COMP1__SHIFT |
+                comps[2] << GEN6_VE_DW1_COMP2__SHIFT |
+                comps[3] << GEN6_VE_DW1_COMP3__SHIFT;
+
+        dw += 2;
+    }
+
+    if (vs->uses & (INTEL_SHADER_USE_VID | INTEL_SHADER_USE_IID)) {
+        comps[0] = (vs->uses & INTEL_SHADER_USE_VID) ?
+            GEN6_VFCOMP_STORE_VID : GEN6_VFCOMP_STORE_0;
+        comps[1] = (vs->uses & INTEL_SHADER_USE_IID) ?
+            GEN6_VFCOMP_STORE_IID : GEN6_VFCOMP_NOSTORE;
+        comps[2] = GEN6_VFCOMP_NOSTORE;
+        comps[3] = GEN6_VFCOMP_NOSTORE;
+
+        dw[0] = GEN6_VE_DW0_VALID;
+        dw[1] = comps[0] << GEN6_VE_DW1_COMP0__SHIFT |
+                comps[1] << GEN6_VE_DW1_COMP1__SHIFT |
+                comps[2] << GEN6_VE_DW1_COMP2__SHIFT |
+                comps[3] << GEN6_VE_DW1_COMP3__SHIFT;
+
+        dw += 2;
+    }
+}
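+/*
+ * Example for pipeline_build_vertex_elements above (illustrative): a
+ * VK_FORMAT_R32G32B32_SFLOAT attribute has three channels, so the switch
+ * selects STORE_SRC for .x/.y/.z and leaves comps[3] as STORE_1_FP,
+ * synthesizing the conventional w = 1.0 for the missing component.
+ */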
+
+static void pipeline_build_fragment_SBE(struct intel_pipeline *pipeline,
+                                        const struct intel_pipeline_create_info *info)
+{
+    const struct intel_pipeline_shader *fs = &pipeline->fs;
+    uint8_t cmd_len;
+    uint32_t *body;
+    uint32_t attr_skip, attr_count;
+    uint32_t vue_offset, vue_len;
+    uint32_t i;
+
+    // If GS is active, use its outputs
+    const struct intel_pipeline_shader *src =
+            (pipeline->active_shaders & SHADER_GEOMETRY_FLAG)
+                    ? &pipeline->gs
+                    : &pipeline->vs;
+
+    INTEL_GPU_ASSERT(pipeline->dev->gpu, 6, 7.5);
+
+    cmd_len = 14;
+
+    if (intel_gpu_gen(pipeline->dev->gpu) >= INTEL_GEN(7))
+        body = pipeline_cmd_ptr(pipeline, cmd_len);
+    else
+        body = pipeline->cmd_3dstate_sbe;
+
+    assert(!fs->reads_user_clip || src->enable_user_clip);
+    attr_skip = src->outputs_offset;
+    if (src->enable_user_clip != fs->reads_user_clip) {
+        attr_skip += 2;
+    }
+    assert(src->out_count >= attr_skip);
+    attr_count = src->out_count - attr_skip;
+
+    // LUNARG TODO: We currently handle only 16 attrs;
+    // ultimately, we need to handle 32
+    assert(fs->in_count <= 16);
+    assert(attr_count <= 16);
+
+    vue_offset = attr_skip / 2;
+    vue_len = (attr_count + 1) / 2;
+    if (!vue_len)
+        vue_len = 1;
+
+    body[0] = GEN7_RENDER_CMD(3D, 3DSTATE_SBE) |
+            (cmd_len - 2);
+
+    // LUNARG TODO: If the attrs the FS needs are exactly what the VS
+    // writes, we don't need to enable swizzling at all, improving
+    // performance. Even when we do swizzle, performance improves if
+    // vue_len covers only the values the FS reads:
+    // vue_len = ceil((max_vs_out + 1) / 2)
+
+    body[1] = GEN7_SBE_DW1_ATTR_SWIZZLE_ENABLE |
+          fs->in_count << GEN7_SBE_DW1_ATTR_COUNT__SHIFT |
+          vue_len << GEN7_SBE_DW1_URB_READ_LEN__SHIFT |
+          vue_offset << GEN7_SBE_DW1_URB_READ_OFFSET__SHIFT;
+
+    /* Vulkan default is point origin upper left */
+    body[1] |= GEN7_SBE_DW1_POINT_SPRITE_TEXCOORD_UPPERLEFT;
+
+    /* fs->in_count <= 16 is asserted above; a fixed-size array avoids a
+     * zero-length VLA when the FS reads no inputs */
+    uint16_t src_slot[16];
+    int32_t fs_in = 0;
+    int32_t src_out = -(vue_offset * 2 - src->outputs_offset);
+    for (i = 0; i < 64; i++) {
+        bool srcWrites = src->outputs_written & (1ULL << i);
+        bool fsReads   = fs->inputs_read      & (1ULL << i);
+
+        if (fsReads) {
+            assert(src_out >= 0);
+            assert(fs_in < fs->in_count);
+            src_slot[fs_in] = src_out;
+
+            if (!srcWrites) {
+                // If the upstream shader (VS or GS) did not write this
+                // input, we cannot program the SBE to read it.  Our choices
+                // are to let it read junk from a GRF, or get zero.  We're
+                // choosing zero.
+                if (i >= fs->generic_input_start) {
+                    src_slot[fs_in] = GEN8_SBE_SWIZ_CONST_0000 |
+                                     GEN8_SBE_SWIZ_OVERRIDE_X |
+                                     GEN8_SBE_SWIZ_OVERRIDE_Y |
+                                     GEN8_SBE_SWIZ_OVERRIDE_Z |
+                                     GEN8_SBE_SWIZ_OVERRIDE_W;
+                }
+            }
+
+            fs_in += 1;
+        }
+        if (srcWrites) {
+            src_out += 1;
+        }
+    }
+
+    for (i = 0; i < 8; i++) {
+        uint16_t hi, lo;
+
+        /* pack two source-slot selections per DWORD */
+        if (i * 2 + 1 < fs->in_count) {
+            lo = src_slot[i * 2];
+            hi = src_slot[i * 2 + 1];
+        } else if (i * 2 < fs->in_count) {
+            lo = src_slot[i * 2];
+            hi = 0;
+        } else {
+            hi = 0;
+            lo = 0;
+        }
+
+        body[2 + i] = hi << GEN8_SBE_SWIZ_HIGH__SHIFT | lo;
+    }
+
+    if (info->ia.topology == VK_PRIMITIVE_TOPOLOGY_POINT_LIST)
+        body[10] = fs->point_sprite_enables;
+    else
+        body[10] = 0;
+
+    body[11] = 0; /* constant interpolation enables */
+    body[12] = 0; /* WrapShortest enables */
+    body[13] = 0;
+}
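+/*
+ * Example for the swizzle packing above (illustrative): with fs->in_count
+ * = 3 and src_slot = {0, 1, 2}, body[2] holds slot 1 in its high half and
+ * slot 0 in its low half, body[3] holds slot 2 with a zero high half, and
+ * body[4..9] are zero.
+ */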
+
+static void pipeline_build_gs(struct intel_pipeline *pipeline,
+                              const struct intel_pipeline_create_info *info)
+{
+    // gen7_emit_3DSTATE_GS done by cmd_pipeline
+}
+
+static void pipeline_build_hs(struct intel_pipeline *pipeline,
+                              const struct intel_pipeline_create_info *info)
+{
+    const uint8_t cmd_len = 7;
+    const uint32_t dw0 = GEN7_RENDER_CMD(3D, 3DSTATE_HS) | (cmd_len - 2);
+    uint32_t *dw;
+
+    INTEL_GPU_ASSERT(pipeline->dev->gpu, 7, 7.5);
+
+    dw = pipeline_cmd_ptr(pipeline, cmd_len);
+    dw[0] = dw0;
+    dw[1] = 0;
+    dw[2] = 0;
+    dw[3] = 0;
+    dw[4] = 0;
+    dw[5] = 0;
+    dw[6] = 0;
+}
+
+static void pipeline_build_te(struct intel_pipeline *pipeline,
+                              const struct intel_pipeline_create_info *info)
+{
+    const uint8_t cmd_len = 4;
+    const uint32_t dw0 = GEN7_RENDER_CMD(3D, 3DSTATE_TE) | (cmd_len - 2);
+    uint32_t *dw;
+
+    INTEL_GPU_ASSERT(pipeline->dev->gpu, 7, 7.5);
+
+    dw = pipeline_cmd_ptr(pipeline, cmd_len);
+    dw[0] = dw0;
+    dw[1] = 0;
+    dw[2] = 0;
+    dw[3] = 0;
+}
+
+static void pipeline_build_ds(struct intel_pipeline *pipeline,
+                              const struct intel_pipeline_create_info *info)
+{
+    const uint8_t cmd_len = 6;
+    const uint32_t dw0 = GEN7_RENDER_CMD(3D, 3DSTATE_DS) | (cmd_len - 2);
+    uint32_t *dw;
+
+    INTEL_GPU_ASSERT(pipeline->dev->gpu, 7, 7.5);
+
+    dw = pipeline_cmd_ptr(pipeline, cmd_len);
+    dw[0] = dw0;
+    dw[1] = 0;
+    dw[2] = 0;
+    dw[3] = 0;
+    dw[4] = 0;
+    dw[5] = 0;
+}
+
+static void pipeline_build_depth_stencil(struct intel_pipeline *pipeline,
+                                         const struct intel_pipeline_create_info *info)
+{
+    pipeline->cmd_depth_stencil = 0;
+
+    if (info->db.stencilTestEnable) {
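+        /*
+         * Hand-packed DEPTH_STENCIL_STATE DW0: bit 31 enables the stencil
+         * test, bit 15 enables the back-face (double-sided) state, and the
+         * 3-bit compare/op fields sit at the shift positions used below.
+         */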
+        pipeline->cmd_depth_stencil = 1 << 31 |
+               translate_compare_func(info->db.front.compareOp) << 28 |
+               translate_stencil_op(info->db.front.failOp) << 25 |
+               translate_stencil_op(info->db.front.depthFailOp) << 22 |
+               translate_stencil_op(info->db.front.passOp) << 19 |
+               1 << 15 |
+               translate_compare_func(info->db.back.compareOp) << 12 |
+               translate_stencil_op(info->db.back.failOp) << 9 |
+               translate_stencil_op(info->db.back.depthFailOp) << 6 |
+               translate_stencil_op(info->db.back.passOp) << 3;
+    }
+
+    pipeline->stencilTestEnable = info->db.stencilTestEnable;
+
+    /*
+     * From the Sandy Bridge PRM, volume 2 part 1, page 360:
+     *
+     *     "Enabling the Depth Test function without defining a Depth Buffer is
+     *      UNDEFINED."
+     *
+     * From the Sandy Bridge PRM, volume 2 part 1, page 375:
+     *
+     *     "A Depth Buffer must be defined before enabling writes to it, or
+     *      operation is UNDEFINED."
+     *
+     * TODO We do not check these yet.
+     */
+    if (info->db.depthTestEnable) {
+       pipeline->cmd_depth_test = GEN6_ZS_DW2_DEPTH_TEST_ENABLE |
+               translate_compare_func(info->db.depthCompareOp) << 27;
+    } else {
+       pipeline->cmd_depth_test = GEN6_COMPAREFUNCTION_ALWAYS << 27;
+    }
+
+    if (info->db.depthWriteEnable)
+       pipeline->cmd_depth_test |= GEN6_ZS_DW2_DEPTH_WRITE_ENABLE;
+}
+
+static void pipeline_build_msaa(struct intel_pipeline *pipeline,
+                                const struct intel_pipeline_create_info *info)
+{
+    uint32_t cmd, cmd_len;
+    uint32_t *dw;
+
+    INTEL_GPU_ASSERT(pipeline->dev->gpu, 6, 7.5);
+
+    pipeline->sample_count =
+        translate_sample_count(info->ms.rasterizationSamples);
+
+    pipeline->alphaToCoverageEnable = info->ms.alphaToCoverageEnable;
+    pipeline->alphaToOneEnable = info->ms.alphaToOneEnable;
+
+    /* 3DSTATE_SAMPLE_MASK */
+    cmd = GEN6_RENDER_CMD(3D, 3DSTATE_SAMPLE_MASK);
+    cmd_len = 2;
+
+    dw = pipeline_cmd_ptr(pipeline, cmd_len);
+    dw[0] = cmd | (cmd_len - 2);
+    if (info->ms.pSampleMask) {
+        /* "Bit B of mask word M corresponds to sample 32*M + B."
+         * "The array is sized to a length of ceil(rasterizationSamples / 32) words."
+         * "If pSampleMask is NULL, it is treated as if the mask has all bits enabled,"
+         * "i.e. no coverage is removed from primitives."
+         */
+        assert(pipeline->sample_count / 32 == 0);
+        dw[1] = *info->ms.pSampleMask & ((1 << pipeline->sample_count) - 1);
+    } else {
+        dw[1] = (1 << pipeline->sample_count) - 1;
+    }
+
+    pipeline->cmd_sample_mask = dw[1];
+}
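+/*
+ * Example for the sample-mask packing above (illustrative, assuming
+ * translate_sample_count() returns the numeric sample count): with 4x MSAA
+ * and pSampleMask[0] = 0xFFFFFFFF, dw[1] = 0xFFFFFFFF & ((1 << 4) - 1) =
+ * 0xF, keeping all four samples enabled.
+ */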
+
+static void pipeline_build_cb(struct intel_pipeline *pipeline,
+                              const struct intel_pipeline_create_info *info)
+{
+    uint32_t i;
+
+    INTEL_GPU_ASSERT(pipeline->dev->gpu, 6, 7.5);
+    STATIC_ASSERT(ARRAY_SIZE(pipeline->cmd_cb) >= INTEL_MAX_RENDER_TARGETS*2);
+    assert(info->cb.attachmentCount <= INTEL_MAX_RENDER_TARGETS);
+
+    uint32_t *dw = pipeline->cmd_cb;
+
+    for (i = 0; i < info->cb.attachmentCount; i++) {
+        const VkPipelineColorBlendAttachmentState *att = &info->cb.pAttachments[i];
+        uint32_t dw0, dw1;
+
+        dw0 = 0;
+        dw1 = GEN6_RT_DW1_COLORCLAMP_RTFORMAT |
+              GEN6_RT_DW1_PRE_BLEND_CLAMP |
+              GEN6_RT_DW1_POST_BLEND_CLAMP;
+
+        if (att->blendEnable) {
+            dw0 = 1 << 31 |
+                    translate_blend_func(att->alphaBlendOp) << 26 |
+                    translate_blend(att->srcAlphaBlendFactor) << 20 |
+                    translate_blend(att->dstAlphaBlendFactor) << 15 |
+                    translate_blend_func(att->colorBlendOp) << 11 |
+                    translate_blend(att->srcColorBlendFactor) << 5 |
+                    translate_blend(att->dstColorBlendFactor);
+
+            if (att->alphaBlendOp != att->colorBlendOp ||
+                att->srcAlphaBlendFactor != att->srcColorBlendFactor ||
+                att->dstAlphaBlendFactor != att->dstColorBlendFactor)
+                dw0 |= 1 << 30;
+
+            pipeline->dual_source_blend_enable = icd_pipeline_cb_att_needs_dual_source_blending(att);
+        }
+
+        if (info->cb.logicOpEnable && info->cb.logicOp != VK_LOGIC_OP_COPY) {
+            int logicop;
+
+            switch (info->cb.logicOp) {
+            case VK_LOGIC_OP_CLEAR:            logicop = GEN6_LOGICOP_CLEAR; break;
+            case VK_LOGIC_OP_AND:              logicop = GEN6_LOGICOP_AND; break;
+            case VK_LOGIC_OP_AND_REVERSE:      logicop = GEN6_LOGICOP_AND_REVERSE; break;
+            case VK_LOGIC_OP_AND_INVERTED:     logicop = GEN6_LOGICOP_AND_INVERTED; break;
+            case VK_LOGIC_OP_NO_OP:            logicop = GEN6_LOGICOP_NOOP; break;
+            case VK_LOGIC_OP_XOR:              logicop = GEN6_LOGICOP_XOR; break;
+            case VK_LOGIC_OP_OR:               logicop = GEN6_LOGICOP_OR; break;
+            case VK_LOGIC_OP_NOR:              logicop = GEN6_LOGICOP_NOR; break;
+            case VK_LOGIC_OP_EQUIVALENT:       logicop = GEN6_LOGICOP_EQUIV; break;
+            case VK_LOGIC_OP_INVERT:           logicop = GEN6_LOGICOP_INVERT; break;
+            case VK_LOGIC_OP_OR_REVERSE:       logicop = GEN6_LOGICOP_OR_REVERSE; break;
+            case VK_LOGIC_OP_COPY_INVERTED:    logicop = GEN6_LOGICOP_COPY_INVERTED; break;
+            case VK_LOGIC_OP_OR_INVERTED:      logicop = GEN6_LOGICOP_OR_INVERTED; break;
+            case VK_LOGIC_OP_NAND:             logicop = GEN6_LOGICOP_NAND; break;
+            case VK_LOGIC_OP_SET:              logicop = GEN6_LOGICOP_SET; break;
+            default:
+                assert(!"unknown logic op");
+                logicop = GEN6_LOGICOP_CLEAR;
+                break;
+            }
+
+            dw1 |= GEN6_RT_DW1_LOGICOP_ENABLE |
+                   logicop << GEN6_RT_DW1_LOGICOP_FUNC__SHIFT;
+        }
+
+        if (!(att->colorWriteMask & 0x1))
+            dw1 |= GEN6_RT_DW1_WRITE_DISABLE_R;
+        if (!(att->colorWriteMask & 0x2))
+            dw1 |= GEN6_RT_DW1_WRITE_DISABLE_G;
+        if (!(att->colorWriteMask & 0x4))
+            dw1 |= GEN6_RT_DW1_WRITE_DISABLE_B;
+        if (!(att->colorWriteMask & 0x8))
+            dw1 |= GEN6_RT_DW1_WRITE_DISABLE_A;
+
+        dw[2 * i] = dw0;
+        dw[2 * i + 1] = dw1;
+    }
+
+    /* disable color writes for the remaining, unused render targets */
+    for (i = info->cb.attachmentCount; i < INTEL_MAX_RENDER_TARGETS; i++) {
+        dw[2 * i] = 0;
+        dw[2 * i + 1] = GEN6_RT_DW1_COLORCLAMP_RTFORMAT |
+                GEN6_RT_DW1_PRE_BLEND_CLAMP |
+                GEN6_RT_DW1_POST_BLEND_CLAMP |
+                GEN6_RT_DW1_WRITE_DISABLE_R |
+                GEN6_RT_DW1_WRITE_DISABLE_G |
+                GEN6_RT_DW1_WRITE_DISABLE_B |
+                GEN6_RT_DW1_WRITE_DISABLE_A;
+    }
+}
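+/*
+ * Example for the blend packing above (illustrative): classic alpha
+ * blending (srcColor = SRC_ALPHA, dstColor = ONE_MINUS_SRC_ALPHA) with
+ * identical alpha ops and factors sets bit 31 (blend enable) but leaves
+ * bit 30 (independent alpha blend) clear, since the color and alpha paths
+ * agree.
+ */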
+
+static void pipeline_build_state(struct intel_pipeline *pipeline,
+                                 const struct intel_pipeline_create_info *info)
+{
+    if (info->use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_VIEWPORT) {
+        pipeline->state.viewport.viewport_count = info->vp.viewportCount;
+        memcpy(pipeline->state.viewport.viewports, info->vp.pViewports, info->vp.viewportCount * sizeof(VkViewport));
+    }
+    if (info->use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_SCISSOR) {
+        pipeline->state.viewport.scissor_count = info->vp.scissorCount;
+        memcpy(pipeline->state.viewport.scissors, info->vp.pScissors, info->vp.scissorCount * sizeof(VkRect2D));
+    }
+    if (info->use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_LINE_WIDTH) {
+        pipeline->state.line_width.line_width = info->rs.lineWidth;
+    }
+    if (info->use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_DEPTH_BIAS) {
+        pipeline->state.depth_bias.depth_bias = info->rs.depthBiasConstantFactor;
+        pipeline->state.depth_bias.depth_bias_clamp = info->rs.depthBiasClamp;
+        pipeline->state.depth_bias.slope_scaled_depth_bias = info->rs.depthBiasSlopeFactor;
+    }
+    if (info->use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_BLEND_CONSTANTS) {
+        pipeline->state.blend.blend_const[0] = info->cb.blendConstants[0];
+        pipeline->state.blend.blend_const[1] = info->cb.blendConstants[1];
+        pipeline->state.blend.blend_const[2] = info->cb.blendConstants[2];
+        pipeline->state.blend.blend_const[3] = info->cb.blendConstants[3];
+    }
+    if (info->use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_DEPTH_BOUNDS) {
+        pipeline->state.depth_bounds.min_depth_bounds = info->db.minDepthBounds;
+        pipeline->state.depth_bounds.max_depth_bounds = info->db.maxDepthBounds;
+    }
+    if (info->use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_STENCIL_COMPARE_MASK) {
+        pipeline->state.stencil.front.stencil_compare_mask = info->db.front.compareMask;
+        pipeline->state.stencil.back.stencil_compare_mask = info->db.back.compareMask;
+    }
+    if (info->use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_STENCIL_WRITE_MASK) {
+        pipeline->state.stencil.front.stencil_write_mask = info->db.front.writeMask;
+        pipeline->state.stencil.back.stencil_write_mask = info->db.back.writeMask;
+    }
+    if (info->use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_STENCIL_REFERENCE) {
+        pipeline->state.stencil.front.stencil_reference = info->db.front.reference;
+        pipeline->state.stencil.back.stencil_reference = info->db.back.reference;
+    }
+
+    pipeline->state.use_pipeline_dynamic_state = info->use_pipeline_dynamic_state;
+}
+
+static VkResult pipeline_build_all(struct intel_pipeline *pipeline,
+                                   const struct intel_pipeline_create_info *info)
+{
+    VkResult ret;
+
+    pipeline_build_state(pipeline, info);
+
+    ret = pipeline_build_shaders(pipeline, info);
+    if (ret != VK_SUCCESS)
+        return ret;
+
+    /* TODOVV: Move test to validation layer
+     *  This particular test is based on a limit imposed by
+     *  INTEL_MAX_VERTEX_BINDING_COUNT, which should be migrated
+     *  to API-defined maxVertexInputBindings setting and then
+     *  this check can be in DeviceLimits layer
+     */
+    if (info->vi.vertexBindingDescriptionCount > ARRAY_SIZE(pipeline->vb) ||
+        info->vi.vertexAttributeDescriptionCount > ARRAY_SIZE(pipeline->vb)) {
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    pipeline->vb_count = info->vi.vertexBindingDescriptionCount;
+    memcpy(pipeline->vb, info->vi.pVertexBindingDescriptions,
+            sizeof(pipeline->vb[0]) * pipeline->vb_count);
+
+    pipeline_build_vertex_elements(pipeline, info);
+    pipeline_build_fragment_SBE(pipeline, info);
+    pipeline_build_msaa(pipeline, info);
+    pipeline_build_depth_stencil(pipeline, info);
+
+    if (intel_gpu_gen(pipeline->dev->gpu) >= INTEL_GEN(7)) {
+        pipeline_build_urb_alloc_gen7(pipeline, info);
+        pipeline_build_gs(pipeline, info);
+        pipeline_build_hs(pipeline, info);
+        pipeline_build_te(pipeline, info);
+        pipeline_build_ds(pipeline, info);
+
+        pipeline->wa_flags = INTEL_CMD_WA_GEN6_PRE_DEPTH_STALL_WRITE |
+                             INTEL_CMD_WA_GEN6_PRE_COMMAND_SCOREBOARD_STALL |
+                             INTEL_CMD_WA_GEN7_PRE_VS_DEPTH_STALL_WRITE |
+                             INTEL_CMD_WA_GEN7_POST_COMMAND_CS_STALL |
+                             INTEL_CMD_WA_GEN7_POST_COMMAND_DEPTH_STALL;
+    } else {
+        pipeline_build_urb_alloc_gen6(pipeline, info);
+
+        pipeline->wa_flags = INTEL_CMD_WA_GEN6_PRE_DEPTH_STALL_WRITE |
+                             INTEL_CMD_WA_GEN6_PRE_COMMAND_SCOREBOARD_STALL;
+    }
+
+    ret = pipeline_build_ia(pipeline, info);
+
+    if (ret == VK_SUCCESS)
+        ret = pipeline_build_rs_state(pipeline, info);
+
+    if (ret == VK_SUCCESS) {
+        pipeline_build_cb(pipeline, info);
+        pipeline->cb_state = info->cb;
+        pipeline->tess_state = info->tess;
+    }
+
+    return ret;
+}
+
+static VkResult pipeline_create_info_init(struct intel_pipeline_create_info  *info,
+                                          const VkGraphicsPipelineCreateInfo *vkinfo)
+{
+    memset(info, 0, sizeof(*info));
+
+    /*
+     * Set safe defaults in case the app does not provide all of the
+     * necessary create infos.
+     */
+    info->ms.rasterizationSamples = VK_SAMPLE_COUNT_1_BIT;
+    info->ms.pSampleMask = NULL;
+
+    memcpy(&info->graphics, vkinfo, sizeof (info->graphics));
+
+    void *dst = NULL;
+    for (uint32_t i = 0; i < vkinfo->stageCount; i++) {
+        const VkPipelineShaderStageCreateInfo *thisStage = &vkinfo->pStages[i];
+        switch (thisStage->stage) {
+            case VK_SHADER_STAGE_VERTEX_BIT:
+                dst = &info->vs;
+                break;
+            case VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT:
+                dst = &info->tcs;
+                break;
+            case VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT:
+                dst = &info->tes;
+                break;
+            case VK_SHADER_STAGE_GEOMETRY_BIT:
+                dst = &info->gs;
+                break;
+            case VK_SHADER_STAGE_FRAGMENT_BIT:
+                dst = &info->fs;
+                break;
+            case VK_SHADER_STAGE_COMPUTE_BIT:
+                dst = &info->compute;
+                break;
+            default:
+                assert(!"unsupported shader stage");
+                dst = NULL;
+                break;
+        }
+        if (dst)
+            memcpy(dst, thisStage, sizeof(VkPipelineShaderStageCreateInfo));
+    }
+
+    if (vkinfo->pVertexInputState != NULL) {
+        memcpy(&info->vi, vkinfo->pVertexInputState, sizeof (info->vi));
+    }
+    if (vkinfo->pInputAssemblyState != NULL) {
+        memcpy(&info->ia, vkinfo->pInputAssemblyState, sizeof (info->ia));
+    }
+    if (vkinfo->pDepthStencilState != NULL) {
+        memcpy(&info->db, vkinfo->pDepthStencilState, sizeof (info->db));
+    }
+    if (vkinfo->pColorBlendState != NULL) {
+        memcpy(&info->cb, vkinfo->pColorBlendState, sizeof (info->cb));
+    }
+    if (vkinfo->pRasterizationState != NULL) {
+        memcpy(&info->rs, vkinfo->pRasterizationState, sizeof (info->rs));
+    }
+    if (vkinfo->pTessellationState != NULL) {
+        memcpy(&info->tess, vkinfo->pTessellationState, sizeof (info->tess));
+    }
+    if (vkinfo->pMultisampleState != NULL) {
+        memcpy(&info->ms, vkinfo->pMultisampleState, sizeof (info->ms));
+    }
+    if (vkinfo->pViewportState != NULL) {
+        memcpy(&info->vp, vkinfo->pViewportState, sizeof (info->vp));
+    }
+
+    /* by default, take all dynamic state from the pipeline */
+    info->use_pipeline_dynamic_state = INTEL_USE_PIPELINE_DYNAMIC_VIEWPORT |
+                                       INTEL_USE_PIPELINE_DYNAMIC_SCISSOR |
+                                       INTEL_USE_PIPELINE_DYNAMIC_BLEND_CONSTANTS |
+                                       INTEL_USE_PIPELINE_DYNAMIC_DEPTH_BIAS |
+                                       INTEL_USE_PIPELINE_DYNAMIC_DEPTH_BOUNDS |
+                                       INTEL_USE_PIPELINE_DYNAMIC_LINE_WIDTH |
+                                       INTEL_USE_PIPELINE_DYNAMIC_STENCIL_COMPARE_MASK |
+                                       INTEL_USE_PIPELINE_DYNAMIC_STENCIL_REFERENCE |
+                                       INTEL_USE_PIPELINE_DYNAMIC_STENCIL_WRITE_MASK;
+    if (vkinfo->pDynamicState != NULL) {
+        for (uint32_t i = 0; i < vkinfo->pDynamicState->dynamicStateCount; i++) {
+            /* Mark dynamic state indicated by app as not using pipeline state */
+            switch (vkinfo->pDynamicState->pDynamicStates[i]) {
+            case VK_DYNAMIC_STATE_VIEWPORT:
+                info->use_pipeline_dynamic_state &= ~INTEL_USE_PIPELINE_DYNAMIC_VIEWPORT;
+                break;
+            case VK_DYNAMIC_STATE_SCISSOR:
+                info->use_pipeline_dynamic_state &= ~INTEL_USE_PIPELINE_DYNAMIC_SCISSOR;
+                break;
+            case VK_DYNAMIC_STATE_LINE_WIDTH:
+                info->use_pipeline_dynamic_state &= ~INTEL_USE_PIPELINE_DYNAMIC_LINE_WIDTH;
+                break;
+            case VK_DYNAMIC_STATE_DEPTH_BIAS:
+                info->use_pipeline_dynamic_state &= ~INTEL_USE_PIPELINE_DYNAMIC_DEPTH_BIAS;
+                break;
+            case VK_DYNAMIC_STATE_BLEND_CONSTANTS:
+                info->use_pipeline_dynamic_state &= ~INTEL_USE_PIPELINE_DYNAMIC_BLEND_CONSTANTS;
+                break;
+            case VK_DYNAMIC_STATE_DEPTH_BOUNDS:
+                info->use_pipeline_dynamic_state &= ~INTEL_USE_PIPELINE_DYNAMIC_DEPTH_BOUNDS;
+                break;
+            case VK_DYNAMIC_STATE_STENCIL_COMPARE_MASK:
+                info->use_pipeline_dynamic_state &= ~INTEL_USE_PIPELINE_DYNAMIC_STENCIL_COMPARE_MASK;
+                break;
+            case VK_DYNAMIC_STATE_STENCIL_WRITE_MASK:
+                info->use_pipeline_dynamic_state &= ~INTEL_USE_PIPELINE_DYNAMIC_STENCIL_WRITE_MASK;
+                break;
+            case VK_DYNAMIC_STATE_STENCIL_REFERENCE:
+                info->use_pipeline_dynamic_state &= ~INTEL_USE_PIPELINE_DYNAMIC_STENCIL_REFERENCE;
+                break;
+            default:
+                assert(!"Invalid dynamic state");
+                break;
+            }
+        }
+    }
+
+    return VK_SUCCESS;
+}
+
+static VkResult graphics_pipeline_create(struct intel_dev *dev,
+                                         const VkGraphicsPipelineCreateInfo *info_,
+                                         struct intel_pipeline **pipeline_ret)
+{
+    struct intel_pipeline_create_info info;
+    struct intel_pipeline *pipeline;
+    VkResult ret;
+
+    ret = pipeline_create_info_init(&info, info_);
+
+    if (ret != VK_SUCCESS)
+        return ret;
+
+    pipeline = (struct intel_pipeline *) intel_base_create(&dev->base.handle,
+                        sizeof (*pipeline), dev->base.dbg,
+                        VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_EXT, info_, 0);
+    if (!pipeline)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    pipeline->dev = dev;
+    pipeline->pipeline_layout = intel_pipeline_layout(info.graphics.layout);
+
+    pipeline->obj.destroy = pipeline_destroy;
+
+    ret = pipeline_build_all(pipeline, &info);
+    if (ret != VK_SUCCESS) {
+        pipeline_destroy(&pipeline->obj);
+        return ret;
+    }
+
+    VkMemoryAllocateInfo mem_reqs;
+    mem_reqs.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
+    mem_reqs.pNext = NULL;
+    mem_reqs.allocationSize = pipeline->scratch_size;
+    mem_reqs.memoryTypeIndex = 0;
+    intel_mem_alloc(dev, &mem_reqs, &pipeline->obj.mem);
+
+    *pipeline_ret = pipeline;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreatePipelineCache(
+    VkDevice                                    device,
+    const VkPipelineCacheCreateInfo*            pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkPipelineCache*                            pPipelineCache)
+{
+    // non-dispatchable objects only need to be 64 bits currently
+    *((uint64_t *) pPipelineCache) = 1;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyPipelineCache(
+    VkDevice                                    device,
+    VkPipelineCache                             pipelineCache,
+    const VkAllocationCallbacks*                pAllocator)
+{
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetPipelineCacheData(
+    VkDevice                                    device,
+    VkPipelineCache                             pipelineCache,
+    size_t*                                     pDataSize,
+    void*                                       pData)
+{
+    return VK_ERROR_VALIDATION_FAILED_EXT;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkMergePipelineCaches(
+    VkDevice                                    device,
+    VkPipelineCache                             dstCache,
+    uint32_t                                    srcCacheCount,
+    const VkPipelineCache*                      pSrcCaches)
+{
+    return VK_ERROR_VALIDATION_FAILED_EXT;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateGraphicsPipelines(
+    VkDevice                                  device,
+    VkPipelineCache                           pipelineCache,
+    uint32_t                                  createInfoCount,
+    const VkGraphicsPipelineCreateInfo*       pCreateInfos,
+    const VkAllocationCallbacks*              pAllocator,
+    VkPipeline*                               pPipelines)
+{
+    struct intel_dev *dev = intel_dev(device);
+    uint32_t i;
+    VkResult res = VK_SUCCESS;
+    bool one_succeeded = false;
+
+    for (i = 0; i < createInfoCount; i++) {
+        res = graphics_pipeline_create(dev, &(pCreateInfos[i]),
+            (struct intel_pipeline **) &(pPipelines[i]));
+        // return a NULL handle for unsuccessful creates
+        if (res != VK_SUCCESS)
+            pPipelines[i] = VK_NULL_HANDLE;
+        else
+            one_succeeded = true;
+    }
+    // return VK_SUCCESS if any of the creates succeeded
+    if (one_succeeded)
+        return VK_SUCCESS;
+    else
+        return res;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateComputePipelines(
+    VkDevice                                  device,
+    VkPipelineCache                           pipelineCache,
+    uint32_t                                  createInfoCount,
+    const VkComputePipelineCreateInfo*        pCreateInfos,
+    const VkAllocationCallbacks*              pAllocator,
+    VkPipeline*                               pPipelines)
+{
+    return VK_ERROR_VALIDATION_FAILED_EXT;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyPipeline(
+    VkDevice                                device,
+    VkPipeline                              pipeline,
+    const VkAllocationCallbacks*            pAllocator)
+{
+    struct intel_obj *obj = intel_obj(pipeline);
+
+    intel_mem_free(obj->mem);
+    obj->destroy(obj);
+}
diff --git a/icd/intel/pipeline.h b/icd/intel/pipeline.h
new file mode 100644
index 0000000..ecfcfcd
--- /dev/null
+++ b/icd/intel/pipeline.h
@@ -0,0 +1,282 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ * Author: Tony Barbour <tony@LunarG.com>
+ *
+ */
+
+#ifndef PIPELINE_H
+#define PIPELINE_H
+
+#include "intel.h"
+#include "obj.h"
+#include "desc.h"
+#include "dev.h"
+#include "state.h"
+
+enum intel_pipeline_shader_use {
+    INTEL_SHADER_USE_VID                = (1 << 0),
+    INTEL_SHADER_USE_IID                = (1 << 1),
+
+    INTEL_SHADER_USE_KILL               = (1 << 2),
+    INTEL_SHADER_USE_DEPTH              = (1 << 3),
+    INTEL_SHADER_USE_W                  = (1 << 4),
+};
+
+/* This order must match Pixel Shader Computed Depth Mode in 3DSTATE_WM */
+enum intel_computed_depth_mode {
+    INTEL_COMPUTED_DEPTH_MODE_NONE,
+    INTEL_COMPUTED_DEPTH_MODE_ON,
+    INTEL_COMPUTED_DEPTH_MODE_ON_GE,
+    INTEL_COMPUTED_DEPTH_MODE_ON_LE
+};
+
+enum intel_pipeline_rmap_slot_type {
+    INTEL_PIPELINE_RMAP_UNUSED,
+    INTEL_PIPELINE_RMAP_RT,
+    INTEL_PIPELINE_RMAP_SURFACE,
+    INTEL_PIPELINE_RMAP_SAMPLER,
+};
+
+struct intel_pipeline_rmap_slot {
+    enum intel_pipeline_rmap_slot_type type;
+
+    uint32_t index; /* in the render target array or layout chain */
+    union {
+        struct {
+            struct intel_desc_offset offset;
+            int dynamic_offset_index;
+        } surface;
+        struct intel_desc_offset sampler;
+    } u;
+};
+
+/**
+ * Shader resource mapping.
+ */
+struct intel_pipeline_rmap {
+    /* this is not an intel_obj */
+
+    uint32_t rt_count;
+    uint32_t texture_resource_count;
+    uint32_t resource_count;
+    uint32_t uav_count;
+    uint32_t sampler_count;
+
+    /*
+     * rt_count slots +
+     * resource_count slots +
+     * uav_count slots +
+     * sampler_count slots
+     */
+    struct intel_pipeline_rmap_slot *slots;
+    uint32_t slot_count;
+};
+
+#define SHADER_VERTEX_FLAG            VK_SHADER_STAGE_VERTEX_BIT
+#define SHADER_TESS_CONTROL_FLAG      VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT
+#define SHADER_TESS_EVAL_FLAG         VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT
+#define SHADER_GEOMETRY_FLAG          VK_SHADER_STAGE_GEOMETRY_BIT
+#define SHADER_FRAGMENT_FLAG          VK_SHADER_STAGE_FRAGMENT_BIT
+#define SHADER_COMPUTE_FLAG           VK_SHADER_STAGE_COMPUTE_BIT
+
+struct intel_pipeline_shader {
+    /* this is not an intel_obj */
+
+    void *pCode;
+    uint32_t codeSize;
+
+    /*
+     * must grab everything we need from shader object as that
+     * can go away after the pipeline is created
+     */
+    VkFlags uses;
+    uint64_t inputs_read;
+    uint64_t outputs_written;
+    uint32_t outputs_offset;
+    uint32_t generic_input_start;
+
+    VkBool32 enable_user_clip;
+    VkBool32 reads_user_clip;
+
+    uint32_t in_count;
+    uint32_t out_count;
+
+    uint32_t sampler_count;
+    uint32_t surface_count;
+
+    uint32_t ubo_start;
+
+    // geometry shader specific
+    uint32_t output_size_hwords;
+    uint32_t output_topology;
+    uint32_t control_data_header_size_hwords;
+    uint32_t control_data_format;
+    VkBool32 include_primitive_id;
+    int32_t  invocations;
+    VkBool32 dual_instanced_dispatch;
+    VkBool32 discard_adj;
+
+    uint32_t urb_grf_start;
+    uint32_t urb_grf_start_16;
+
+    /* If present, where does the SIMD16 kernel start? */
+    uint32_t offset_16;
+
+    VkFlags barycentric_interps;
+    VkFlags point_sprite_enables;
+
+    VkDeviceSize per_thread_scratch_size;
+
+    enum intel_computed_depth_mode computed_depth_mode;
+
+    struct intel_pipeline_rmap *rmap;
+
+    /* these are set up by the driver */
+    uint32_t max_threads;
+    VkDeviceSize scratch_offset;
+};
+
+/*
+ * On GEN6, there are
+ *
+ *  - 3DSTATE_URB (3)
+ *  - 3DSTATE_VERTEX_ELEMENTS (1+2*INTEL_MAX_VERTEX_ELEMENT_COUNT)
+ *  - 3DSTATE_SAMPLE_MASK (2)
+ *
+ * On GEN7, there are
+ *
+ *  - 3DSTATE_URB_x (2*4)
+ *  - 3DSTATE_VERTEX_ELEMENTS (1+2*INTEL_MAX_VERTEX_ELEMENT_COUNT)
+ *  - 3DSTATE_SBE (14)
+ *  - 3DSTATE_HS (7)
+ *  - 3DSTATE_TE (4)
+ *  - 3DSTATE_DS (6)
+ *  - 3DSTATE_SAMPLE_MASK (2)
+ */
+#define INTEL_PSO_CMD_ENTRIES   128
+
+/**
+ * 3D pipeline.
+ */
+struct intel_pipeline {
+    struct intel_obj obj;
+
+    struct intel_dev *dev;
+
+    const struct intel_pipeline_layout *pipeline_layout;
+
+    VkVertexInputBindingDescription vb[INTEL_MAX_VERTEX_BINDING_COUNT];
+    uint32_t vb_count;
+
+    /* VkPipelineIaStateCreateInfo */
+    VkPrimitiveTopology topology;
+    int prim_type;
+    bool disable_vs_cache;
+    bool primitive_restart;
+    uint32_t primitive_restart_index;
+
+    // TODO: This should probably be Intel HW state, not VK state.
+    /* Depth Buffer format */
+    VkFormat db_format;
+
+    VkPipelineColorBlendStateCreateInfo cb_state;
+
+    // VkPipelineRsStateCreateInfo rs_state;
+    bool depthClipEnable;
+    bool rasterizerDiscardEnable;
+    bool depthBiasEnable;
+
+    bool alphaToCoverageEnable;
+    bool alphaToOneEnable;
+
+    VkPipelineTessellationStateCreateInfo tess_state;
+
+    uint32_t active_shaders;
+    struct intel_pipeline_shader vs;
+    struct intel_pipeline_shader tcs;
+    struct intel_pipeline_shader tes;
+    struct intel_pipeline_shader gs;
+    struct intel_pipeline_shader fs;
+    struct intel_pipeline_shader cs;
+    VkDeviceSize scratch_size;
+
+    uint32_t wa_flags;
+
+    uint32_t cmds[INTEL_PSO_CMD_ENTRIES];
+    uint32_t cmd_len;
+
+    bool dual_source_blend_enable;
+
+    /* The following are only partial HW commands that will need
+     * more processing before sending to the HW
+     */
+    // VkPipelineDsStateCreateInfo ds_state
+    bool stencilTestEnable;
+
+    /* Dynamic state specified at PSO create time */
+    struct {
+        VkFlags use_pipeline_dynamic_state;
+        struct intel_dynamic_viewport viewport;
+        struct intel_dynamic_line_width line_width;
+        struct intel_dynamic_depth_bias depth_bias;
+        struct intel_dynamic_blend blend;
+        struct intel_dynamic_depth_bounds depth_bounds;
+        struct intel_dynamic_stencil stencil;
+    } state;
+
+    uint32_t cmd_depth_stencil;
+    uint32_t cmd_depth_test;
+
+    uint32_t cmd_sf_fill;
+    uint32_t cmd_clip_cull;
+    uint32_t cmd_sf_cull;
+    uint32_t cmd_cb[2 * INTEL_MAX_RENDER_TARGETS];
+    uint32_t sample_count;
+    uint32_t cmd_sample_mask;
+
+    uint32_t cmd_3dstate_sbe[14];
+};
+
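+/*
+ * VkPipeline is a non-dispatchable 64-bit handle into which the driver
+ * stores an intel_pipeline pointer (see vkCreateGraphicsPipelines);
+ * reinterpret the handle bits to recover it.
+ */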
+static inline struct intel_pipeline *intel_pipeline(VkPipeline pipeline)
+{
+    return *(struct intel_pipeline **) &pipeline;
+}
+
+static inline struct intel_pipeline *intel_pipeline_from_base(struct intel_base *base)
+{
+    return (struct intel_pipeline *) base;
+}
+
+static inline struct intel_pipeline *intel_pipeline_from_obj(struct intel_obj *obj)
+{
+    return intel_pipeline_from_base(&obj->base);
+}
+
+struct intel_pipeline_shader *intel_pipeline_shader_create_meta(struct intel_dev *dev,
+                                                                enum intel_dev_meta_shader id);
+void intel_pipeline_shader_destroy(struct intel_dev *dev,
+                                   struct intel_pipeline_shader *sh);
+
+void intel_pipeline_init_default_sample_patterns(const struct intel_dev *dev,
+                                                 uint8_t *pat_1x, uint8_t *pat_2x,
+                                                 uint8_t *pat_4x, uint8_t *pat_8x,
+                                                 uint8_t *pat_16x);
+
+#endif /* PIPELINE_H */
diff --git a/icd/intel/query.c b/icd/intel/query.c
new file mode 100644
index 0000000..8a6508b
--- /dev/null
+++ b/icd/intel/query.c
@@ -0,0 +1,253 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ *
+ */
+
+#include "dev.h"
+#include "mem.h"
+#include "query.h"
+#include "genhw/genhw.h"
+
+static void query_destroy(struct intel_obj *obj)
+{
+    struct intel_query *query = intel_query_from_obj(obj);
+
+    intel_mem_free(obj->mem);
+    intel_query_destroy(query);
+}
+
+static void query_init_pipeline_statistics(
+        struct intel_dev *dev,
+        const VkQueryPoolCreateInfo *info,
+        struct intel_query *query)
+{
+    /*
+     * Note: order defined by Vulkan spec.
+     */
+    const uint32_t regs[][2] = {
+        {VK_QUERY_PIPELINE_STATISTIC_INPUT_ASSEMBLY_PRIMITIVES_BIT, GEN6_REG_IA_PRIMITIVES_COUNT},
+        {VK_QUERY_PIPELINE_STATISTIC_VERTEX_SHADER_INVOCATIONS_BIT, GEN6_REG_VS_INVOCATION_COUNT},
+        {VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_INVOCATIONS_BIT, GEN6_REG_GS_INVOCATION_COUNT},
+        {VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_PRIMITIVES_BIT, GEN6_REG_GS_PRIMITIVES_COUNT},
+        {VK_QUERY_PIPELINE_STATISTIC_CLIPPING_INVOCATIONS_BIT, GEN6_REG_CL_INVOCATION_COUNT},
+        {VK_QUERY_PIPELINE_STATISTIC_CLIPPING_PRIMITIVES_BIT, GEN6_REG_CL_PRIMITIVES_COUNT},
+        {VK_QUERY_PIPELINE_STATISTIC_FRAGMENT_SHADER_INVOCATIONS_BIT, GEN6_REG_PS_INVOCATION_COUNT},
+        {VK_QUERY_PIPELINE_STATISTIC_TESSELLATION_CONTROL_SHADER_PATCHES_BIT, (intel_gpu_gen(dev->gpu) >= INTEL_GEN(7)) ? GEN7_REG_HS_INVOCATION_COUNT : 0},
+        {VK_QUERY_PIPELINE_STATISTIC_TESSELLATION_EVALUATION_SHADER_INVOCATIONS_BIT, (intel_gpu_gen(dev->gpu) >= INTEL_GEN(7)) ? GEN7_REG_DS_INVOCATION_COUNT : 0},
+        {VK_QUERY_PIPELINE_STATISTIC_COMPUTE_SHADER_INVOCATIONS_BIT, 0}
+    };
+    STATIC_ASSERT(ARRAY_SIZE(regs) < 32);
+    uint32_t i;
+    uint32_t reg_count = 0;
+
+    /*
+     * Only query registers indicated via pipeline statistics flags.
+     * If HW does not support a flag, fill value with 0.
+     */
+    for (i = 0; i < ARRAY_SIZE(regs); i++) {
+        if ((regs[i][0] & info->pipelineStatistics)) {
+            query->regs[reg_count] = regs[i][1];
+            reg_count++;
+        }
+    }
+
+    query->reg_count = reg_count;
+    query->slot_stride = u_align(reg_count * sizeof(uint64_t) * 2, 64);
+}
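+/*
+ * Example for the stride above (illustrative): with three statistics
+ * selected, each slot holds three before/after register pairs
+ * (3 * 8 * 2 = 48 bytes), padded by u_align() to a 64-byte stride.
+ */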
+
+VkResult intel_query_create(struct intel_dev *dev,
+                            const VkQueryPoolCreateInfo *info,
+                            struct intel_query **query_ret)
+{
+    struct intel_query *query;
+
+    query = (struct intel_query *) intel_base_create(&dev->base.handle,
+            sizeof(*query), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_QUERY_POOL_EXT,
+            info, 0);
+    if (!query)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    query->type = info->queryType;
+    query->slot_count = info->queryCount;
+
+    /*
+     * For each query type, the GPU will be asked to write the values of some
+     * registers to a buffer before and after a sequence of commands.  We will
+     * compare the differences to get the query results.
+     */
+    switch (info->queryType) {
+    case VK_QUERY_TYPE_OCCLUSION:
+        query->slot_stride = u_align(sizeof(uint64_t) * 2, 64);
+        break;
+    case VK_QUERY_TYPE_PIPELINE_STATISTICS:
+        query_init_pipeline_statistics(dev, info, query);
+        break;
+    case VK_QUERY_TYPE_TIMESTAMP:
+        query->slot_stride = u_align(sizeof(uint64_t), 64);
+        break;
+    default:
+        assert(!"unknown query type");
+        break;
+    }
+
+    VkMemoryAllocateInfo mem_reqs;
+    mem_reqs.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
+    mem_reqs.pNext = NULL;
+    mem_reqs.allocationSize = query->slot_stride * query->slot_count;
+    mem_reqs.memoryTypeIndex = 0;
+    intel_mem_alloc(dev, &mem_reqs, &query->obj.mem);
+
+    query->obj.destroy = query_destroy;
+
+    *query_ret = query;
+
+    return VK_SUCCESS;
+}
+
+void intel_query_destroy(struct intel_query *query)
+{
+    intel_base_destroy(&query->obj.base);
+}
+
+static void
+query_process_occlusion(const struct intel_query *query,
+                        uint32_t count, const uint8_t *raw,
+                        uint64_t *results)
+{
+    uint32_t i;
+
+    for (i = 0; i < count; i++) {
+        const uint64_t *pair = (const uint64_t *) raw;
+
+        results[i] = pair[1] - pair[0]; /* end minus begin counter value */
+        raw += query->slot_stride;
+    }
+}
+
+static void
+query_process_pipeline_statistics(const struct intel_query *query,
+                                  uint32_t count, const uint8_t *raw,
+                                  void *results)
+{
+    const uint32_t num_regs = query->reg_count;
+    uint32_t i, j;
+
+    for (i = 0; i < count; i++) {
+        const uint64_t *before = (const uint64_t *) raw;
+        const uint64_t *after = before + num_regs;
+        /* results are packed as num_regs 64-bit values per slot */
+        uint64_t *dst = (uint64_t *) results + i * num_regs;
+
+        for (j = 0; j < num_regs; j++)
+            dst[j] = after[j] - before[j];
+
+        raw += query->slot_stride;
+    }
+}
+
+static void
+query_process_timestamp(const struct intel_query *query,
+                        uint32_t count, const uint8_t *raw,
+                        uint64_t *results)
+{
+    uint32_t i;
+
+    for (i = 0; i < count; i++) {
+        const uint64_t *ts = (const uint64_t *) raw;
+
+        results[i] = *ts;
+        raw += query->slot_stride;
+    }
+}
+
+VkResult intel_query_get_results(struct intel_query *query,
+                                 uint32_t slot_start, uint32_t slot_count,
+                                 void *results)
+{
+    const uint8_t *ptr;
+
+    if (intel_mem_is_busy(query->obj.mem))
+        return VK_NOT_READY;
+
+    ptr = (const uint8_t *) intel_mem_map_sync(query->obj.mem, false);
+    if (!ptr)
+        return VK_ERROR_MEMORY_MAP_FAILED;
+
+    ptr += query->obj.offset + query->slot_stride * slot_start;
+
+    switch (query->type) {
+    case VK_QUERY_TYPE_OCCLUSION:
+        query_process_occlusion(query, slot_count, ptr, results);
+        break;
+    case VK_QUERY_TYPE_PIPELINE_STATISTICS:
+        query_process_pipeline_statistics(query, slot_count, ptr, results);
+        break;
+    case VK_QUERY_TYPE_TIMESTAMP:
+        query_process_timestamp(query, slot_count, ptr, results);
+        break;
+    default:
+        assert(0);
+        break;
+    }
+
+    intel_mem_unmap(query->obj.mem);
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateQueryPool(
+    VkDevice                                    device,
+    const VkQueryPoolCreateInfo*                pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkQueryPool*                                pQueryPool)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_query_create(dev, pCreateInfo,
+            (struct intel_query **) pQueryPool);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyQueryPool(
+    VkDevice                                    device,
+    VkQueryPool                                 queryPool,
+    const VkAllocationCallbacks*                pAllocator)
+{
+    struct intel_obj *obj = intel_obj(queryPool);
+
+    obj->destroy(obj);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetQueryPoolResults(
+    VkDevice                                    device,
+    VkQueryPool                                 queryPool,
+    uint32_t                                    firstQuery,
+    uint32_t                                    queryCount,
+    size_t                                      dataSize,
+    void*                                       pData,
+    size_t                                      stride,
+    VkQueryResultFlags                          flags)
+{
+    struct intel_query *query = intel_query(queryPool);
+
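+    /*
+     * Note: dataSize, stride, and flags are currently ignored; results are
+     * written back-to-back as 64-bit values per slot.
+     */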
+    if (pData)
+        return intel_query_get_results(query, firstQuery, queryCount, pData);
+    else
+        return VK_SUCCESS;
+}
diff --git a/icd/intel/query.h b/icd/intel/query.h
new file mode 100644
index 0000000..cfe14e7
--- /dev/null
+++ b/icd/intel/query.h
@@ -0,0 +1,62 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ *
+ */
+
+#ifndef QUERY_H
+#define QUERY_H
+
+#include "intel.h"
+#include "obj.h"
+
+struct intel_query {
+    struct intel_obj obj;
+
+    VkQueryType type;
+    uint32_t reg_count;
+    uint32_t slot_stride;
+    uint32_t slot_count;
+    uint32_t regs[32];
+};
+
+static inline struct intel_query *intel_query(VkQueryPool pool)
+{
+    return *(struct intel_query **) &pool;
+}
+
+static inline struct intel_query *intel_query_from_base(struct intel_base *base)
+{
+    return (struct intel_query *) base;
+}
+
+static inline struct intel_query *intel_query_from_obj(struct intel_obj *obj)
+{
+    return intel_query_from_base(&obj->base);
+}
+
+VkResult intel_query_create(struct intel_dev *dev,
+                            const VkQueryPoolCreateInfo *info,
+                            struct intel_query **query_ret);
+void intel_query_destroy(struct intel_query *query);
+
+VkResult intel_query_get_results(struct intel_query *query,
+                                 uint32_t slot_start, uint32_t slot_count,
+                                 void *results);
+
+#endif /* QUERY_H */
diff --git a/icd/intel/queue.c b/icd/intel/queue.c
new file mode 100644
index 0000000..6e485a8
--- /dev/null
+++ b/icd/intel/queue.c
@@ -0,0 +1,464 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ *
+ */
+
+#include "genhw/genhw.h"
+#include "kmd/winsys.h"
+#include "cmd.h"
+#include "dev.h"
+#include "fence.h"
+#include "queue.h"
+
+static void semaphore_destroy(struct intel_obj *obj)
+{
+    struct intel_semaphore *semaphore = intel_semaphore_from_obj(obj);
+
+    intel_semaphore_destroy(semaphore);
+}
+
+VkResult intel_semaphore_create(struct intel_dev *dev,
+                                const VkSemaphoreCreateInfo *info,
+                                struct intel_semaphore **semaphore_ret)
+{
+    struct intel_semaphore *semaphore;
+    semaphore = (struct intel_semaphore *) intel_base_create(&dev->base.handle,
+            sizeof(*semaphore), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_SEMAPHORE_EXT, info, 0);
+
+    if (!semaphore)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    semaphore->references = 0;
+    *semaphore_ret = semaphore;
+    semaphore->obj.destroy = semaphore_destroy;
+
+    return VK_SUCCESS;
+}
+
+void intel_semaphore_destroy(struct intel_semaphore *semaphore)
+{
+    intel_base_destroy(&semaphore->obj.base);
+}
+
+static void queue_submit_hang(struct intel_queue *queue,
+                              struct intel_cmd *cmd,
+                              uint32_t active_lost,
+                              uint32_t pending_lost)
+{
+    intel_cmd_decode(cmd, true);
+
+    intel_dev_log(queue->dev, VK_DEBUG_REPORT_ERROR_BIT_EXT,
+                  &cmd->obj.base, 0, 0,
+                  "GPU hanged with %d/%d active/pending command buffers lost",
+                  active_lost, pending_lost);
+}
+
+static VkResult queue_submit_bo(struct intel_queue *queue,
+                                struct intel_bo *bo,
+                                VkDeviceSize used)
+{
+    struct intel_winsys *winsys = queue->dev->winsys;
+    int err;
+
+    if (intel_debug & INTEL_DEBUG_NOHW)
+        err = 0;
+    else
+        err = intel_winsys_submit_bo(winsys, queue->ring, bo, used, 0);
+
+    return (err) ? VK_ERROR_DEVICE_LOST : VK_SUCCESS;
+}
+
+static struct intel_bo *queue_create_bo(struct intel_queue *queue,
+                                        VkDeviceSize size,
+                                        const void *cmd,
+                                        size_t cmd_len)
+{
+    struct intel_bo *bo;
+    void *ptr;
+
+    bo = intel_winsys_alloc_bo(queue->dev->winsys,
+            "queue bo", size, true);
+    if (!bo)
+        return NULL;
+
+    if (!cmd_len)
+        return bo;
+
+    ptr = intel_bo_map(bo, true);
+    if (!ptr) {
+        intel_bo_unref(bo);
+        return NULL;
+    }
+
+    memcpy(ptr, cmd, cmd_len);
+    intel_bo_unmap(bo);
+
+    return bo;
+}
+
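+/*
+ * Switch the HW between the 3D and media pipelines by submitting a tiny
+ * batch containing only PIPELINE_SELECT and MI_BATCH_BUFFER_END; the batch
+ * bo is built once and cached on the queue for reuse.
+ */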
+static VkResult queue_select_pipeline(struct intel_queue *queue,
+                                      int pipeline_select)
+{
+    uint32_t pipeline_select_cmd[] = {
+        GEN6_RENDER_CMD(SINGLE_DW, PIPELINE_SELECT),
+        GEN6_MI_CMD(MI_BATCH_BUFFER_END),
+    };
+    struct intel_bo *bo = NULL;
+    VkResult ret;
+
+    if (queue->ring != INTEL_RING_RENDER ||
+        queue->last_pipeline_select == pipeline_select)
+        return VK_SUCCESS;
+
+    switch (pipeline_select) {
+    case GEN6_PIPELINE_SELECT_DW0_SELECT_3D:
+        bo = queue->select_graphics_bo;
+        break;
+    case GEN6_PIPELINE_SELECT_DW0_SELECT_MEDIA:
+        bo = queue->select_compute_bo;
+        break;
+    default:
+        assert(0 && "Invalid pipeline select");
+        break;
+    }
+
+    if (!bo) {
+        pipeline_select_cmd[0] |= pipeline_select;
+        bo = queue_create_bo(queue, sizeof(pipeline_select_cmd),
+                pipeline_select_cmd, sizeof(pipeline_select_cmd));
+        if (!bo)
+            return VK_ERROR_OUT_OF_DEVICE_MEMORY;
+
+        switch (pipeline_select) {
+        case GEN6_PIPELINE_SELECT_DW0_SELECT_3D:
+            queue->select_graphics_bo = bo;
+            break;
+        case GEN6_PIPELINE_SELECT_DW0_SELECT_MEDIA:
+            queue->select_compute_bo = bo;
+            break;
+        default:
+            break;
+        }
+    }
+
+    ret = queue_submit_bo(queue, bo, sizeof(pipeline_select_cmd));
+    if (ret == VK_SUCCESS)
+        queue->last_pipeline_select = pipeline_select;
+
+    return ret;
+}
+
+static VkResult queue_init_hw_and_atomic_bo(struct intel_queue *queue)
+{
+    const uint32_t ctx_init_cmd[] = {
+        /* STATE_SIP */
+        GEN6_RENDER_CMD(COMMON, STATE_SIP),
+        0,
+        /* PIPELINE_SELECT */
+        GEN6_RENDER_CMD(SINGLE_DW, PIPELINE_SELECT) |
+            GEN6_PIPELINE_SELECT_DW0_SELECT_3D,
+        /* 3DSTATE_VF_STATISTICS */
+        GEN6_RENDER_CMD(SINGLE_DW, 3DSTATE_VF_STATISTICS),
+        /* end */
+        GEN6_MI_CMD(MI_BATCH_BUFFER_END),
+        GEN6_MI_CMD(MI_NOOP),
+    };
+    struct intel_bo *bo;
+    VkResult ret;
+
+    if (queue->ring != INTEL_RING_RENDER) {
+        queue->last_pipeline_select = -1;
+        queue->atomic_bo = queue_create_bo(queue,
+                sizeof(uint32_t) * INTEL_QUEUE_ATOMIC_COUNTER_COUNT,
+                NULL, 0);
+        return (queue->atomic_bo) ? VK_SUCCESS : VK_ERROR_OUT_OF_DEVICE_MEMORY;
+    }
+
+    bo = queue_create_bo(queue,
+            sizeof(uint32_t) * INTEL_QUEUE_ATOMIC_COUNTER_COUNT,
+            ctx_init_cmd, sizeof(ctx_init_cmd));
+    if (!bo)
+        return VK_ERROR_OUT_OF_DEVICE_MEMORY;
+
+    ret = queue_submit_bo(queue, bo, sizeof(ctx_init_cmd));
+    if (ret != VK_SUCCESS) {
+        intel_bo_unref(bo);
+        return ret;
+    }
+
+    queue->last_pipeline_select = GEN6_PIPELINE_SELECT_DW0_SELECT_3D;
+    /* reuse the submitted init batch as the atomic counter BO */
+    queue->atomic_bo = bo;
+
+    return VK_SUCCESS;
+}
+
+static VkResult queue_submit_cmd_prepare(struct intel_queue *queue,
+                                           struct intel_cmd *cmd)
+{
+    if (unlikely(cmd->result != VK_SUCCESS || !cmd->primary)) {
+        intel_dev_log(cmd->dev, VK_DEBUG_REPORT_ERROR_BIT_EXT,
+                      &cmd->obj.base, 0, 0,
+                      "invalid command buffer submitted");
+    }
+
+    return queue_select_pipeline(queue, cmd->pipeline_select);
+}
+
+static VkResult queue_submit_cmd_debug(struct intel_queue *queue,
+                                         struct intel_cmd *cmd)
+{
+    uint32_t active[2], pending[2];
+    struct intel_bo *bo;
+    VkDeviceSize used;
+    VkResult ret;
+
+    ret = queue_submit_cmd_prepare(queue, cmd);
+    if (ret != VK_SUCCESS)
+        return ret;
+
+    if (intel_debug & INTEL_DEBUG_HANG) {
+        intel_winsys_get_reset_stats(queue->dev->winsys,
+                &active[0], &pending[0]);
+    }
+
+    bo = intel_cmd_get_batch(cmd, &used);
+    ret = queue_submit_bo(queue, bo, used);
+    if (ret != VK_SUCCESS)
+        return ret;
+
+    if (intel_debug & INTEL_DEBUG_HANG) {
+        intel_bo_wait(bo, -1);
+        intel_winsys_get_reset_stats(queue->dev->winsys,
+                &active[1], &pending[1]);
+
+        if (active[0] != active[1] || pending[0] != pending[1]) {
+            queue_submit_hang(queue, cmd, active[1] - active[0],
+                    pending[1] - pending[0]);
+        }
+    }
+
+    if (intel_debug & INTEL_DEBUG_BATCH)
+        intel_cmd_decode(cmd, false);
+
+    return VK_SUCCESS;
+}
+
+static VkResult queue_submit_cmd(struct intel_queue *queue,
+                                   struct intel_cmd *cmd)
+{
+    struct intel_bo *bo;
+    VkDeviceSize used;
+    VkResult ret;
+
+    ret = queue_submit_cmd_prepare(queue, cmd);
+    if (ret == VK_SUCCESS) {
+        bo = intel_cmd_get_batch(cmd, &used);
+        ret = queue_submit_bo(queue, bo, used);
+    }
+
+    return ret;
+}
+
+VkResult intel_queue_create(struct intel_dev *dev,
+                            enum intel_gpu_engine_type engine,
+                            struct intel_queue **queue_ret)
+{
+    struct intel_queue *queue;
+    enum intel_ring_type ring;
+    VkFenceCreateInfo fence_info;
+    VkResult ret;
+
+    switch (engine) {
+    case INTEL_GPU_ENGINE_3D:
+        ring = INTEL_RING_RENDER;
+        break;
+    default:
+        intel_dev_log(dev, VK_DEBUG_REPORT_ERROR_BIT_EXT,
+                      &dev->base, 0, 0,
+                      "invalid engine type");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    queue = (struct intel_queue *) intel_base_create(&dev->base.handle,
+            sizeof(*queue), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_QUEUE_EXT, NULL, 0);
+    if (!queue)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    queue->dev = dev;
+    queue->ring = ring;
+
+    if (queue_init_hw_and_atomic_bo(queue) != VK_SUCCESS) {
+        intel_queue_destroy(queue);
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    memset(&fence_info, 0, sizeof(fence_info));
+    fence_info.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
+    ret = intel_fence_create(dev, &fence_info, &queue->fence);
+    if (ret != VK_SUCCESS) {
+        intel_queue_destroy(queue);
+        return ret;
+    }
+
+    *queue_ret = queue;
+
+    return VK_SUCCESS;
+}
+
+void intel_queue_destroy(struct intel_queue *queue)
+{
+    if (queue->fence)
+        intel_fence_destroy(queue->fence);
+
+    intel_bo_unref(queue->atomic_bo);
+    intel_bo_unref(queue->select_graphics_bo);
+    intel_bo_unref(queue->select_compute_bo);
+
+    intel_base_destroy(&queue->base);
+}
+
+VkResult intel_queue_wait(struct intel_queue *queue, int64_t timeout)
+{
+    /* return VK_SUCCESS instead of VK_ERROR_UNAVAILABLE */
+    if (!queue->fence->seqno_bo)
+        return VK_SUCCESS;
+
+    return intel_fence_wait(queue->fence, timeout);
+}
+
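+/*
+ * Semaphores are only simulated by this driver (see the note in
+ * vkCreateSemaphore below): signaling bumps a host-side reference count,
+ * while waiting simply drains the whole queue.  A minimal sketch of the
+ * simulated protocol:
+ *
+ *    intel_signal_queue_semaphore(queue, sem);  // after submitting work
+ *    intel_wait_queue_semaphore(queue, sem);    // blocks via queue idle
+ */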
+static void intel_wait_queue_semaphore(struct intel_queue *queue, struct intel_semaphore *semaphore)
+{
+    intel_queue_wait(queue, -1);
+    semaphore->references--;
+}
+
+static void intel_signal_queue_semaphore(struct intel_queue *queue, struct intel_semaphore *semaphore)
+{
+    semaphore->references++;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkQueueWaitIdle(
+    VkQueue                                   queue_)
+{
+    struct intel_queue *queue = intel_queue(queue_);
+
+    return intel_queue_wait(queue, -1);
+}
+
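+/*
+ * For each VkSubmitInfo: wait on the wait semaphores (a queue drain in this
+ * simulation), submit the command buffers, stamp the queue fence with the
+ * last batch, copy it into the caller's fence if one was given, and finally
+ * signal the signal semaphores.
+ */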
+VKAPI_ATTR VkResult VKAPI_CALL vkQueueSubmit(
+    VkQueue                                   queue_,
+    uint32_t                                  submitCount,
+    const VkSubmitInfo*                       pSubmits,
+    VkFence                                   fence_)
+{
+    struct intel_queue *queue = intel_queue(queue_);
+    VkResult ret = VK_SUCCESS;
+    struct intel_cmd *last_cmd;
+    uint32_t i;
+
+    for (uint32_t submit_idx = 0; submit_idx < submitCount; submit_idx++) {
+
+        const VkSubmitInfo *submit = &pSubmits[submit_idx];
+
+        for (i = 0; i < submit->waitSemaphoreCount; i++) {
+            struct intel_semaphore *pSemaphore = intel_semaphore(submit->pWaitSemaphores[i]);
+            intel_wait_queue_semaphore(queue, pSemaphore);
+        }
+
+        if (unlikely(intel_debug)) {
+            for (i = 0; i < submit->commandBufferCount; i++) {
+                struct intel_cmd *cmd = intel_cmd(submit->pCommandBuffers[i]);
+                ret = queue_submit_cmd_debug(queue, cmd);
+                if (ret != VK_SUCCESS)
+                    break;
+            }
+        } else {
+            for (i = 0; i < submit->commandBufferCount; i++) {
+                struct intel_cmd *cmd = intel_cmd(submit->pCommandBuffers[i]);
+                ret = queue_submit_cmd(queue, cmd);
+                if (ret != VK_SUCCESS)
+                    break;
+            }
+        }
+
+        /* no cmd submitted */
+        if (i == 0)
+            return ret;
+
+        last_cmd = intel_cmd(submit->pCommandBuffers[i - 1]);
+
+        if (ret == VK_SUCCESS) {
+            intel_fence_set_seqno(queue->fence,
+                    intel_cmd_get_batch(last_cmd, NULL));
+
+            if (fence_ != VK_NULL_HANDLE) {
+                struct intel_fence *fence = intel_fence(fence_);
+                intel_fence_copy(fence, queue->fence);
+            }
+        } else {
+            struct intel_bo *last_bo;
+
+            /* unbusy submitted BOs */
+            last_bo = intel_cmd_get_batch(last_cmd, NULL);
+            intel_bo_wait(last_bo, -1);
+        }
+
+        for (i = 0; i < submit->signalSemaphoreCount; i++) {
+            struct intel_semaphore *pSemaphore = intel_semaphore(submit->pSignalSemaphores[i]);
+            intel_signal_queue_semaphore(queue, pSemaphore);
+        }
+    }
+
+    return ret;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateSemaphore(
+    VkDevice                                device,
+    const VkSemaphoreCreateInfo            *pCreateInfo,
+    const VkAllocationCallbacks            *pAllocator,
+    VkSemaphore                            *pSemaphore)
+{
+    /*
+     * We want to find an unused semaphore register and initialize it.  Signal
+     * will increment the register.  Wait will atomically decrement it and
+     * block if the value is zero, or a large constant N if we do not want to
+     * go negative.
+     *
+     * XXX However, MI_SEMAPHORE_MBOX does not seem to have the flexibility.
+     */
+
+    /* TODO: fully support semaphores (in the meantime, simulate them) */
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_semaphore_create(dev, pCreateInfo,
+           (struct intel_semaphore **) pSemaphore);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroySemaphore(
+    VkDevice                                    device,
+    VkSemaphore                                 semaphore,
+    const VkAllocationCallbacks*                pAllocator)
+{
+    struct intel_obj *obj = intel_obj(semaphore);
+    obj->destroy(obj);
+}
diff --git a/icd/intel/queue.h b/icd/intel/queue.h
new file mode 100644
index 0000000..afb129c
--- /dev/null
+++ b/icd/intel/queue.h
@@ -0,0 +1,85 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ *
+ */
+
+#ifndef QUEUE_H
+#define QUEUE_H
+
+#include "kmd/winsys.h"
+#include "intel.h"
+#include "gpu.h"
+#include "obj.h"
+
+#define INTEL_QUEUE_ATOMIC_COUNTER_COUNT 1024
+
+struct intel_dev;
+struct intel_fence;
+
+struct intel_semaphore {
+    struct intel_obj obj;
+
+    int references;
+};
+
+static inline struct intel_semaphore *intel_semaphore(VkSemaphore semaphore)
+{
+    return *(struct intel_semaphore **) &semaphore;
+}
+
+static inline struct intel_semaphore *intel_semaphore_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_semaphore *) obj;
+}
+
+void intel_semaphore_destroy(struct intel_semaphore *semaphore);
+
+struct intel_queue {
+    struct intel_base base;
+
+    struct intel_dev *dev;
+    enum intel_ring_type ring;
+
+    struct intel_bo *atomic_bo;
+    struct intel_bo *select_graphics_bo;
+    struct intel_bo *select_compute_bo;
+
+    int last_pipeline_select;
+
+    struct intel_fence *fence;
+};
+
+static inline struct intel_queue *intel_queue(VkQueue queue)
+{
+    return (struct intel_queue *) queue;
+}
+
+VkResult intel_queue_create(struct intel_dev *dev,
+                              enum intel_gpu_engine_type engine,
+                              struct intel_queue **queue_ret);
+void intel_queue_destroy(struct intel_queue *queue);
+
+VkResult intel_queue_wait(struct intel_queue *queue, int64_t timeout);
+
+VkResult intel_semaphore_create(struct intel_dev *dev,
+                                const VkSemaphoreCreateInfo *info,
+                                struct intel_semaphore **semaphore_ret);
+
+#endif /* QUEUE_H */
diff --git a/icd/intel/sampler.c b/icd/intel/sampler.c
new file mode 100644
index 0000000..11239e4
--- /dev/null
+++ b/icd/intel/sampler.c
@@ -0,0 +1,413 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Tony Barbour <tony@LunarG.com>
+ *
+ */
+
+#include "genhw/genhw.h"
+#include "dev.h"
+#include "sampler.h"
+
+/**
+ * Translate a pipe texture filter to the matching hardware mapfilter.
+ */
+static int translate_tex_filter(VkFilter filter)
+{
+   switch (filter) {
+   case VK_FILTER_NEAREST: return GEN6_MAPFILTER_NEAREST;
+   case VK_FILTER_LINEAR:  return GEN6_MAPFILTER_LINEAR;
+   default:
+      assert(!"unknown tex filter");
+      return GEN6_MAPFILTER_NEAREST;
+   }
+}
+
+static int translate_tex_mipmap_mode(VkSamplerMipmapMode mode)
+{
+   switch (mode) {
+   case VK_SAMPLER_MIPMAP_MODE_NEAREST: return GEN6_MIPFILTER_NEAREST;
+   case VK_SAMPLER_MIPMAP_MODE_LINEAR:  return GEN6_MIPFILTER_LINEAR;
+   default:
+      assert(!"unknown tex mipmap mode");
+      return GEN6_MIPFILTER_NONE;
+   }
+}
+
+static int translate_tex_addr(VkSamplerAddressMode addr)
+{
+   switch (addr) {
+   case VK_SAMPLER_ADDRESS_MODE_REPEAT:                 return GEN6_TEXCOORDMODE_WRAP;
+   case VK_SAMPLER_ADDRESS_MODE_MIRRORED_REPEAT:        return GEN6_TEXCOORDMODE_MIRROR;
+   case VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE:          return GEN6_TEXCOORDMODE_CLAMP;
+   case VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER:        return GEN6_TEXCOORDMODE_CLAMP_BORDER;
+   case VK_SAMPLER_ADDRESS_MODE_MIRROR_CLAMP_TO_EDGE:   return GEN6_TEXCOORDMODE_MIRROR_ONCE;
+   default:
+      assert(!"unknown tex address");
+      return GEN6_TEXCOORDMODE_WRAP;
+   }
+}
+
+static int translate_compare_func(VkCompareOp func)
+{
+    switch (func) {
+    case VK_COMPARE_OP_NEVER:            return GEN6_COMPAREFUNCTION_NEVER;
+    case VK_COMPARE_OP_LESS:             return GEN6_COMPAREFUNCTION_LESS;
+    case VK_COMPARE_OP_EQUAL:            return GEN6_COMPAREFUNCTION_EQUAL;
+    case VK_COMPARE_OP_LESS_OR_EQUAL:    return GEN6_COMPAREFUNCTION_LEQUAL;
+    case VK_COMPARE_OP_GREATER:          return GEN6_COMPAREFUNCTION_GREATER;
+    case VK_COMPARE_OP_NOT_EQUAL:        return GEN6_COMPAREFUNCTION_NOTEQUAL;
+    case VK_COMPARE_OP_GREATER_OR_EQUAL: return GEN6_COMPAREFUNCTION_GEQUAL;
+    case VK_COMPARE_OP_ALWAYS:           return GEN6_COMPAREFUNCTION_ALWAYS;
+    default:
+        assert(!"unknown compare_func");
+        return GEN6_COMPAREFUNCTION_NEVER;
+    }
+}
+
+static void translate_border_color(VkBorderColor type, float rgba[4])
+{
+    switch (type) {
+    case VK_BORDER_COLOR_INT_OPAQUE_WHITE:
+    case VK_BORDER_COLOR_FLOAT_OPAQUE_WHITE:
+        rgba[0] = 1.0;
+        rgba[1] = 1.0;
+        rgba[2] = 1.0;
+        rgba[3] = 1.0;
+        break;
+    case VK_BORDER_COLOR_INT_TRANSPARENT_BLACK:
+    case VK_BORDER_COLOR_FLOAT_TRANSPARENT_BLACK:
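+    /* unrecognized border color types fall back to transparent black */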
+    default:
+        rgba[0] = 0.0;
+        rgba[1] = 0.0;
+        rgba[2] = 0.0;
+        rgba[3] = 0.0;
+        break;
+    case VK_BORDER_COLOR_INT_OPAQUE_BLACK:
+    case VK_BORDER_COLOR_FLOAT_OPAQUE_BLACK:
+        rgba[0] = 0.0;
+        rgba[1] = 0.0;
+        rgba[2] = 0.0;
+        rgba[3] = 1.0;
+        break;
+    }
+}
+
+static void
+sampler_border_color_state_gen6(const struct intel_gpu *gpu,
+                                const float color[4],
+                                uint32_t dw[12])
+{
+   float rgba[4] = { color[0], color[1], color[2], color[3] };
+
+   INTEL_GPU_ASSERT(gpu, 6, 6);
+
+   /*
+    * This state is not documented in the Sandy Bridge PRM, but in the
+    * Ironlake PRM.  SNORM8 seems to be in DW11 instead of DW1.
+    */
+
+   /* IEEE_FP */
+   dw[1] = u_fui(rgba[0]);
+   dw[2] = u_fui(rgba[1]);
+   dw[3] = u_fui(rgba[2]);
+   dw[4] = u_fui(rgba[3]);
+
+   /* FLOAT_16 */
+   dw[5] = u_float_to_half(rgba[0]) |
+           u_float_to_half(rgba[1]) << 16;
+   dw[6] = u_float_to_half(rgba[2]) |
+           u_float_to_half(rgba[3]) << 16;
+
+   /* clamp to [-1.0f, 1.0f] */
+   rgba[0] = U_CLAMP(rgba[0], -1.0f, 1.0f);
+   rgba[1] = U_CLAMP(rgba[1], -1.0f, 1.0f);
+   rgba[2] = U_CLAMP(rgba[2], -1.0f, 1.0f);
+   rgba[3] = U_CLAMP(rgba[3], -1.0f, 1.0f);
+
+   /* SNORM16 (mask off sign-extended bits so the fields do not overlap) */
+   dw[9] =  ((int16_t) u_iround(rgba[0] * 32767.0f) & 0xffffu) |
+            ((int16_t) u_iround(rgba[1] * 32767.0f) & 0xffffu) << 16;
+   dw[10] = ((int16_t) u_iround(rgba[2] * 32767.0f) & 0xffffu) |
+            ((int16_t) u_iround(rgba[3] * 32767.0f) & 0xffffu) << 16;
+
+   /* SNORM8, masked likewise */
+   dw[11] = ((int8_t) u_iround(rgba[0] * 127.0f) & 0xffu) |
+            ((int8_t) u_iround(rgba[1] * 127.0f) & 0xffu) << 8 |
+            ((int8_t) u_iround(rgba[2] * 127.0f) & 0xffu) << 16 |
+            ((int8_t) u_iround(rgba[3] * 127.0f) & 0xffu) << 24;
+
+   /* clamp to [0.0f, 1.0f] */
+   rgba[0] = U_CLAMP(rgba[0], 0.0f, 1.0f);
+   rgba[1] = U_CLAMP(rgba[1], 0.0f, 1.0f);
+   rgba[2] = U_CLAMP(rgba[2], 0.0f, 1.0f);
+   rgba[3] = U_CLAMP(rgba[3], 0.0f, 1.0f);
+
+   /* UNORM8 */
+   dw[0] = (uint8_t) u_iround(rgba[0] * 255.0f) |
+           (uint8_t) u_iround(rgba[1] * 255.0f) << 8 |
+           (uint8_t) u_iround(rgba[2] * 255.0f) << 16 |
+           (uint8_t) u_iround(rgba[3] * 255.0f) << 24;
+
+   /* UNORM16 */
+   dw[7] = (uint16_t) u_iround(rgba[0] * 65535.0f) |
+           (uint16_t) u_iround(rgba[1] * 65535.0f) << 16;
+   dw[8] = (uint16_t) u_iround(rgba[2] * 65535.0f) |
+           (uint16_t) u_iround(rgba[3] * 65535.0f) << 16;
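+
+   /*
+    * Worked example (illustrative): for opaque white (1.0, 1.0, 1.0, 1.0)
+    * this packing yields dw[0] = 0xffffffff (UNORM8), dw[1]..dw[4] =
+    * 0x3f800000 (IEEE 1.0f), dw[5] = dw[6] = 0x3c003c00 (FLOAT_16),
+    * dw[7] = dw[8] = 0xffffffff (UNORM16), dw[9] = dw[10] = 0x7fff7fff
+    * (SNORM16) and dw[11] = 0x7f7f7f7f (SNORM8).
+    */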
+}
+
+static void
+sampler_init(struct intel_sampler *sampler,
+             const struct intel_gpu *gpu,
+             const VkSamplerCreateInfo *info)
+{
+   int mip_filter, min_filter, mag_filter, max_aniso;
+   int lod_bias, max_lod, min_lod;
+   int wrap_s, wrap_t, wrap_r;
+   uint32_t dw0, dw1, dw3;
+   float border_color[4];
+
+   INTEL_GPU_ASSERT(gpu, 6, 7.5);
+   STATIC_ASSERT(ARRAY_SIZE(sampler->cmd) >= 15);
+
+   mip_filter = translate_tex_mipmap_mode(info->mipmapMode);
+   min_filter = translate_tex_filter(info->minFilter);
+   mag_filter = translate_tex_filter(info->magFilter);
+
+   if (info->anisotropyEnable == VK_FALSE) {
+        max_aniso = 1;
+   } else {
+        if (info->maxAnisotropy >= 2 && info->maxAnisotropy <= 16)
+           max_aniso = info->maxAnisotropy / 2 - 1;
+        else if (info->maxAnisotropy > 16)
+           max_aniso = GEN6_ANISORATIO_16;
+        else
+           max_aniso = GEN6_ANISORATIO_2;
+   }
+
+   /*
+    * Here is how the hardware calculates per-pixel LOD, from my reading of the
+    * PRMs:
+    *
+    *  1) LOD is set to log2(ratio of texels to pixels) if not specified in
+    *     other ways.  The number of texels is measured using level
+    *     SurfMinLod.
+    *  2) Bias is added to LOD.
+    *  3) LOD is clamped to [MinLod, MaxLod], and the clamped value is
+    *     compared with Base to determine whether magnification or
+    *     minification is needed.  (if preclamp is disabled, LOD is compared
+    *     with Base before clamping)
+    *  4) If magnification is needed, or no mipmapping is requested, LOD is
+    *     set to floor(MinLod).
+    *  5) LOD is clamped to [0, MIPCnt], and SurfMinLod is added to LOD.
+    *
+    * With the Gallium interface, Base is always zero and
+    * pipe_sampler_view::u.tex.first_level specifies SurfMinLod.
+    */
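+
+   /*
+    * A rough host-side sketch of those steps (illustrative pseudo-C only;
+    * the hardware does this per pixel):
+    *
+    *    lod = log2f(texels_per_pixel);             // 1) initial LOD
+    *    lod += bias;                               // 2) bias
+    *    lod = U_CLAMP(lod, min_lod, max_lod);      // 3) clamp, compare with
+    *    if (lod <= base || !mipmapping)            //    Base: magnifying?
+    *       lod = floorf(min_lod);                  // 4) magnification
+    *    lod = U_CLAMP(lod, 0, mip_cnt) + surf_min_lod;  // 5) final clamp
+    */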
+   if (intel_gpu_gen(gpu) >= INTEL_GEN(7)) {
+      const float scale = 256.0f;
+
+      /* [-16.0, 16.0) in S4.8 */
+      lod_bias = (int)
+         (U_CLAMP(info->mipLodBias, -16.0f, 15.9f) * scale);
+      lod_bias &= 0x1fff;
+
+      /* [0.0, 14.0] in U4.8 */
+      max_lod = (int) (U_CLAMP(info->maxLod, 0.0f, 14.0f) * scale);
+      min_lod = (int) (U_CLAMP(info->minLod, 0.0f, 14.0f) * scale);
+   }
+   else {
+      const float scale = 64.0f;
+
+      /* [-16.0, 16.0) in S4.6 */
+      lod_bias = (int)
+         (U_CLAMP(info->mipLodBias, -16.0f, 15.9f) * scale);
+      lod_bias &= 0x7ff;
+
+      /* [0.0, 13.0] in U4.6 */
+      max_lod = (int) (U_CLAMP(info->maxLod, 0.0f, 13.0f) * scale);
+      min_lod = (int) (U_CLAMP(info->minLod, 0.0f, 13.0f) * scale);
+   }
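+
+   /*
+    * Example (illustrative): with gen7's S4.8 encoding above, a mipLodBias
+    * of -2.5f scales to -640, and (-640 & 0x1fff) == 0x1d80 is the 13-bit
+    * two's-complement pattern programmed into SAMPLER_STATE.
+    */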
+
+   /*
+    * We want LOD to be clamped to determine magnification/minification, and
+    * get set to zero when it is magnification or when mipmapping is disabled.
+    * The hardware would set LOD to floor(MinLod) and that is a problem when
+    * MinLod is greater than or equal to 1.0f.
+    *
+    * With Base being zero, it is always minification when MinLod is non-zero.
+    * To achieve our goal, we just need to set MinLod to zero and set
+    * MagFilter to MinFilter when mipmapping is disabled.
+    */
+
+   /* determine wrap s/t/r */
+   wrap_s = translate_tex_addr(info->addressModeU);
+   wrap_t = translate_tex_addr(info->addressModeV);
+   wrap_r = translate_tex_addr(info->addressModeW);
+
+   translate_border_color(info->borderColor, border_color);
+
+   if (intel_gpu_gen(gpu) >= INTEL_GEN(7)) {
+      dw0 = 1 << 28 |
+            mip_filter << 20 |
+            lod_bias << 1;
+
+      if (info->maxAnisotropy > 1 && info->anisotropyEnable == VK_TRUE) {
+         dw0 |= GEN6_MAPFILTER_ANISOTROPIC << 17 |
+                GEN6_MAPFILTER_ANISOTROPIC << 14 |
+                1;
+      } else {
+         dw0 |= mag_filter << 17 |
+                min_filter << 14;
+      }
+
+      dw1 = min_lod << 20 |
+            max_lod << 8;
+
+      dw1 |= translate_compare_func(info->compareOp) << 1;
+
+      dw3 = max_aniso << 19;
+
+      /* round the coordinates for linear filtering */
+      if (min_filter != GEN6_MAPFILTER_NEAREST) {
+         dw3 |= (GEN6_SAMPLER_DW3_U_MIN_ROUND |
+                 GEN6_SAMPLER_DW3_V_MIN_ROUND |
+                 GEN6_SAMPLER_DW3_R_MIN_ROUND);
+      }
+      if (mag_filter != GEN6_MAPFILTER_NEAREST) {
+         dw3 |= (GEN6_SAMPLER_DW3_U_MAG_ROUND |
+                 GEN6_SAMPLER_DW3_V_MAG_ROUND |
+                 GEN6_SAMPLER_DW3_R_MAG_ROUND);
+      }
+
+      dw3 |= wrap_s << 6 |
+             wrap_t << 3 |
+             wrap_r;
+
+      if (info->unnormalizedCoordinates)
+          dw3 |= GEN7_SAMPLER_DW3_NON_NORMALIZED_COORD;
+
+      sampler->cmd[0] = dw0;
+      sampler->cmd[1] = dw1;
+      sampler->cmd[2] = dw3;
+
+      memcpy(&sampler->cmd[3], &border_color, sizeof(border_color));
+   }
+   else {
+      dw0 = 1 << 28 |
+            mip_filter << 20 |
+            lod_bias << 3;
+
+      dw0 |= translate_compare_func(info->compareOp);
+
+      if (info->maxAnisotropy > 1 && info->anisotropyEnable == VK_TRUE) {
+         dw0 |= GEN6_MAPFILTER_ANISOTROPIC << 17 |
+                GEN6_MAPFILTER_ANISOTROPIC << 14;
+      }
+      else {
+         dw0 |= (min_filter != mag_filter) << 27 |
+                mag_filter << 17 |
+                min_filter << 14;
+      }
+
+      dw1 = min_lod << 22 |
+            max_lod << 12;
+
+      dw1 |= wrap_s << 6 |
+             wrap_t << 3 |
+             wrap_r;
+
+      dw3 = max_aniso << 19;
+
+      /* round the coordinates for linear filtering */
+      if (min_filter != GEN6_MAPFILTER_NEAREST) {
+         dw3 |= (GEN6_SAMPLER_DW3_U_MIN_ROUND |
+                 GEN6_SAMPLER_DW3_V_MIN_ROUND |
+                 GEN6_SAMPLER_DW3_R_MIN_ROUND);
+      }
+      if (mag_filter != GEN6_MAPFILTER_NEAREST) {
+         dw3 |= (GEN6_SAMPLER_DW3_U_MAG_ROUND |
+                 GEN6_SAMPLER_DW3_V_MAG_ROUND |
+                 GEN6_SAMPLER_DW3_R_MAG_ROUND);
+      }
+
+      if (info->unnormalizedCoordinates)
+          dw3 |= GEN6_SAMPLER_DW3_NON_NORMALIZED_COORD;
+
+      sampler->cmd[0] = dw0;
+      sampler->cmd[1] = dw1;
+      sampler->cmd[2] = dw3;
+
+      sampler_border_color_state_gen6(gpu, border_color, &sampler->cmd[3]);
+   }
+}
+
+static void sampler_destroy(struct intel_obj *obj)
+{
+    struct intel_sampler *sampler = intel_sampler_from_obj(obj);
+
+    intel_sampler_destroy(sampler);
+}
+
+VkResult intel_sampler_create(struct intel_dev *dev,
+                                const VkSamplerCreateInfo *info,
+                                struct intel_sampler **sampler_ret)
+{
+    struct intel_sampler *sampler;
+
+    sampler = (struct intel_sampler *) intel_base_create(&dev->base.handle,
+            sizeof(*sampler), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_SAMPLER_EXT, info, 0);
+    if (!sampler)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    sampler->obj.destroy = sampler_destroy;
+
+    sampler_init(sampler, dev->gpu, info);
+
+    *sampler_ret = sampler;
+
+    return VK_SUCCESS;
+}
+
+void intel_sampler_destroy(struct intel_sampler *sampler)
+{
+    intel_base_destroy(&sampler->obj.base);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateSampler(
+    VkDevice                                    device,
+    const VkSamplerCreateInfo*                  pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkSampler*                                  pSampler)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_sampler_create(dev, pCreateInfo,
+            (struct intel_sampler **) pSampler);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroySampler(
+    VkDevice                                    device,
+    VkSampler                                   sampler,
+    const VkAllocationCallbacks*                pAllocator)
+{
+    struct intel_obj *obj = intel_obj(sampler);
+
+    obj->destroy(obj);
+}
diff --git a/icd/intel/sampler.h b/icd/intel/sampler.h
new file mode 100644
index 0000000..33f3882
--- /dev/null
+++ b/icd/intel/sampler.h
@@ -0,0 +1,53 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ *
+ */
+
+#ifndef SAMPLER_H
+#define SAMPLER_H
+
+#include "intel.h"
+#include "obj.h"
+
+struct intel_sampler {
+    struct intel_obj obj;
+
+    /*
+     * SAMPLER_STATE
+     * SAMPLER_BORDER_COLOR_STATE
+     */
+    uint32_t cmd[15];
+};
+
+static inline struct intel_sampler *intel_sampler(VkSampler sampler)
+{
+    return *(struct intel_sampler **) &sampler;
+}
+
+static inline struct intel_sampler *intel_sampler_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_sampler *) obj;
+}
+
+VkResult intel_sampler_create(struct intel_dev *dev,
+                                const VkSamplerCreateInfo *info,
+                                struct intel_sampler **sampler_ret);
+void intel_sampler_destroy(struct intel_sampler *sampler);
+
+#endif /* SAMPLER_H */
diff --git a/icd/intel/shader.c b/icd/intel/shader.c
new file mode 100644
index 0000000..85866ba
--- /dev/null
+++ b/icd/intel/shader.c
@@ -0,0 +1,133 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chris Forbes <chrisf@ijw.co.nz>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ * Author: Tony Barbour <tony@LunarG.com>
+ *
+ */
+
+#include "dev.h"
+#include "shader.h"
+#include "compiler/shader/compiler_interface.h"
+
+static void shader_module_destroy(struct intel_obj *obj)
+{
+    struct intel_shader_module *sm = intel_shader_module_from_obj(obj);
+
+    if (sm->vs)
+        shader_destroy_ir(sm->vs);
+    if (sm->tcs)
+        shader_destroy_ir(sm->tcs);
+    if (sm->tes)
+        shader_destroy_ir(sm->tes);
+    if (sm->gs)
+        shader_destroy_ir(sm->gs);
+    if (sm->fs)
+        shader_destroy_ir(sm->fs);
+    if (sm->cs)
+        shader_destroy_ir(sm->cs);
+
+    free(sm->code);
+    sm->code = NULL;
+
+    intel_base_destroy(&sm->obj.base);
+}
+
+const struct intel_ir *intel_shader_module_get_ir(struct intel_shader_module *sm,
+                                                  VkShaderStageFlagBits stage)
+{
+    struct intel_ir **ir;
+
+    switch (stage) {
+    case VK_SHADER_STAGE_VERTEX_BIT:
+        ir = &sm->vs;
+        break;
+    case VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT:
+        ir = &sm->tcs;
+        break;
+    case VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT:
+        ir = &sm->tes;
+        break;
+    case VK_SHADER_STAGE_GEOMETRY_BIT:
+        ir = &sm->gs;
+        break;
+    case VK_SHADER_STAGE_FRAGMENT_BIT:
+        ir = &sm->fs;
+        break;
+    case VK_SHADER_STAGE_COMPUTE_BIT:
+        ir = &sm->cs;
+        break;
+    default:
+        assert(!"unsupported shader stage");
+        return NULL;
+    }
+
+    shader_create_ir_with_lock(sm->gpu, sm->code, sm->code_size, stage, ir);
+
+    return *ir;
+}
+
+static VkResult shader_module_create(struct intel_dev *dev,
+                                     const VkShaderModuleCreateInfo *info,
+                                     struct intel_shader_module **sm_ret)
+{
+    struct intel_shader_module *sm;
+
+    sm = (struct intel_shader_module *) intel_base_create(&dev->base.handle,
+            sizeof(*sm), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_SHADER_MODULE_EXT, info, 0);
+    if (!sm)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    sm->gpu = dev->gpu;
+    sm->code_size = info->codeSize;
+    sm->code = malloc(info->codeSize);
+    if (!sm->code) {
+        shader_module_destroy(&sm->obj);
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    }
+
+    memcpy(sm->code, info->pCode, info->codeSize);
+    sm->obj.destroy = shader_module_destroy;
+
+    *sm_ret = sm;
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateShaderModule(
+    VkDevice                                    device,
+    const VkShaderModuleCreateInfo*             pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkShaderModule*                             pShaderModule)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return shader_module_create(dev, pCreateInfo, (struct intel_shader_module **) pShaderModule);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyShaderModule(
+    VkDevice                                    device,
+    VkShaderModule                              shaderModule,
+    const VkAllocationCallbacks*                pAllocator)
+{
+    struct intel_obj *obj = intel_obj(shaderModule);
+
+    obj->destroy(obj);
+}
diff --git a/icd/intel/shader.h b/icd/intel/shader.h
new file mode 100644
index 0000000..0a9c32f
--- /dev/null
+++ b/icd/intel/shader.h
@@ -0,0 +1,61 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chris Forbes <chrisf@ijw.co.nz>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ *
+ */
+
+#ifndef SHADER_H
+#define SHADER_H
+
+#include "intel.h"
+#include "obj.h"
+
+struct intel_ir;
+
+struct intel_shader_module {
+    struct intel_obj obj;
+
+    const struct intel_gpu *gpu;
+    /* content is just a copy of the SPIRV image */
+    uint32_t code_size;
+    void *code;
+
+    /* simple cache */
+    struct intel_ir *vs;
+    struct intel_ir *tcs;
+    struct intel_ir *tes;
+    struct intel_ir *gs;
+    struct intel_ir *fs;
+    struct intel_ir *cs;
+};
+
+static inline struct intel_shader_module *intel_shader_module(VkShaderModule shaderModule)
+{
+    return *(struct intel_shader_module **) &shaderModule;
+}
+
+static inline struct intel_shader_module *intel_shader_module_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_shader_module *)obj;
+}
+
+const struct intel_ir *intel_shader_module_get_ir(struct intel_shader_module *sm,
+                                                  VkShaderStageFlagBits stage);
+
+#endif /* SHADER_H */
diff --git a/icd/intel/state.c b/icd/intel/state.c
new file mode 100644
index 0000000..75fe6ad
--- /dev/null
+++ b/icd/intel/state.c
@@ -0,0 +1,272 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Cody Northrop <cody@lunarg.com>
+ *
+ */
+
+#include "genhw/genhw.h"
+#include "dev.h"
+#include "state.h"
+#include "cmd.h"
+
+void intel_set_viewport(struct intel_cmd *cmd, uint32_t first, uint32_t count, const VkViewport *viewports)
+{
+    cmd->bind.state.viewport.viewport_count = count;
+    memcpy(cmd->bind.state.viewport.viewports + first, viewports, count * sizeof(VkViewport));
+}
+
+void intel_set_scissor(struct intel_cmd *cmd, uint32_t first, uint32_t count, const VkRect2D *scissors)
+{
+    cmd->bind.state.viewport.scissor_count = count;
+    memcpy(cmd->bind.state.viewport.scissors + first, scissors, count * sizeof(VkRect2D));
+}
+
+void intel_set_line_width(struct intel_cmd *cmd, float line_width)
+{
+    cmd->bind.state.line_width.line_width = line_width;
+}
+
+void intel_set_depth_bias(
+    struct intel_cmd                   *cmd,
+    float                               depthBiasConstantFactor,
+    float                               depthBiasClamp,
+    float                               depthBiasSlopeFactor)
+{
+    cmd->bind.state.depth_bias.depth_bias = depthBiasConstantFactor;
+    cmd->bind.state.depth_bias.depth_bias_clamp = depthBiasClamp;
+    cmd->bind.state.depth_bias.slope_scaled_depth_bias = depthBiasSlopeFactor;
+}
+
+void intel_set_blend_constants(
+    struct intel_cmd                   *cmd,
+    const float                         constants[4])
+{
+    cmd->bind.state.blend.blend_const[0] = constants[0];
+    cmd->bind.state.blend.blend_const[1] = constants[1];
+    cmd->bind.state.blend.blend_const[2] = constants[2];
+    cmd->bind.state.blend.blend_const[3] = constants[3];
+}
+
+void intel_set_depth_bounds(
+    struct intel_cmd                   *cmd,
+    float                               minDepthBounds,
+    float                               maxDepthBounds)
+{
+    /*
+     * The Sandy Bridge PRM notes on Stencil Test Enable (volume 2 part 1,
+     * pages 359 and 370) are quoted in intel_set_stencil_compare_mask()
+     * below; we do not check those restrictions here yet either.
+     */
+    cmd->bind.state.depth_bounds.min_depth_bounds = minDepthBounds;
+    cmd->bind.state.depth_bounds.max_depth_bounds = maxDepthBounds;
+}
+
+void intel_set_stencil_compare_mask(
+    struct intel_cmd                   *cmd,
+    VkStencilFaceFlags                  faceMask,
+    uint32_t                            compareMask)
+{
+    /*
+     * TODO: enable back-facing stencil state.  Some plumbing is needed
+     * before info_back can be fully supported; in the meantime, record
+     * that back-facing state has been submitted.
+     */
+
+    /*
+     * From the Sandy Bridge PRM, volume 2 part 1, page 359:
+     *
+     *     "If the Depth Buffer is either undefined or does not have a surface
+     *      format of D32_FLOAT_S8X24_UINT or D24_UNORM_S8_UINT and separate
+     *      stencil buffer is disabled, Stencil Test Enable must be DISABLED"
+     *
+     * From the Sandy Bridge PRM, volume 2 part 1, page 370:
+     *
+     *     "This field (Stencil Test Enable) cannot be enabled if
+     *      Surface Format in 3DSTATE_DEPTH_BUFFER is set to D16_UNORM."
+     *
+     * TODO We do not check these yet.
+     */
+    if (faceMask & VK_STENCIL_FACE_FRONT_BIT) {
+        cmd->bind.state.stencil.front.stencil_compare_mask = compareMask;
+    }
+    if (faceMask & VK_STENCIL_FACE_BACK_BIT) {
+        cmd->bind.state.stencil.back.stencil_compare_mask = compareMask;
+    }
+}
+
+void intel_set_stencil_write_mask(
+    struct intel_cmd                   *cmd,
+    VkStencilFaceFlags                  faceMask,
+    uint32_t                            writeMask)
+{
+    if (faceMask & VK_STENCIL_FACE_FRONT_BIT) {
+        cmd->bind.state.stencil.front.stencil_write_mask = writeMask;
+    }
+    if (faceMask & VK_STENCIL_FACE_BACK_BIT) {
+        cmd->bind.state.stencil.back.stencil_write_mask = writeMask;
+    }
+}
+
+void intel_set_stencil_reference(
+    struct intel_cmd                   *cmd,
+    VkStencilFaceFlags                  faceMask,
+    uint32_t                            reference)
+{
+    if (faceMask & VK_STENCIL_FACE_FRONT_BIT) {
+        cmd->bind.state.stencil.front.stencil_reference = reference;
+    }
+    if (faceMask & VK_STENCIL_FACE_BACK_BIT) {
+        cmd->bind.state.stencil.back.stencil_reference = reference;
+    }
+}
+
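+/*
+ * Each vkCmdSet* entrypoint below is ignored when the corresponding
+ * INTEL_USE_PIPELINE_DYNAMIC_* bit is set (the state then comes from the
+ * bound pipeline); otherwise the value is recorded into cmd->bind.state.
+ */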
+VKAPI_ATTR void VKAPI_CALL vkCmdSetViewport(
+    VkCommandBuffer                         commandBuffer,
+    uint32_t                                firstViewport,
+    uint32_t                                viewportCount,
+    const VkViewport*                       pViewports)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    if (cmd->bind.state.use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_VIEWPORT) {
+        return;
+    }
+
+    intel_set_viewport(cmd, firstViewport, viewportCount, pViewports);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetScissor(
+    VkCommandBuffer                         commandBuffer,
+    uint32_t                                firstScissor,
+    uint32_t                                scissorCount,
+    const VkRect2D*                         pScissors)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    if (cmd->bind.state.use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_SCISSOR) {
+        return;
+    }
+
+    intel_set_scissor(cmd, firstScissor, scissorCount, pScissors);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetLineWidth(
+    VkCommandBuffer                         commandBuffer,
+    float                                   lineWidth)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    if (cmd->bind.state.use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_LINE_WIDTH) {
+        return;
+    }
+
+    intel_set_line_width(cmd, lineWidth);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetDepthBias(
+    VkCommandBuffer                         commandBuffer,
+    float                                   depthBiasConstantFactor,
+    float                                   depthBiasClamp,
+    float                                   depthBiasSlopeFactor)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    if (cmd->bind.state.use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_DEPTH_BIAS) {
+        return;
+    }
+
+    intel_set_depth_bias(cmd, depthBiasConstantFactor, depthBiasClamp, depthBiasSlopeFactor);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetBlendConstants(
+    VkCommandBuffer                         commandBuffer,
+    const float                             blendConstants[4])
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    if (cmd->bind.state.use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_BLEND_CONSTANTS) {
+        return;
+    }
+
+    intel_set_blend_constants(cmd, blendConstants);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetDepthBounds(
+    VkCommandBuffer                         commandBuffer,
+    float                                   minDepthBounds,
+    float                                   maxDepthBounds)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    if (cmd->bind.state.use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_DEPTH_BOUNDS) {
+        return;
+    }
+
+    intel_set_depth_bounds(cmd, minDepthBounds, maxDepthBounds);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetStencilCompareMask(
+    VkCommandBuffer                         commandBuffer,
+    VkStencilFaceFlags                      faceMask,
+    uint32_t                                compareMask)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    if (cmd->bind.state.use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_STENCIL_COMPARE_MASK) {
+        return;
+    }
+
+    intel_set_stencil_compare_mask(cmd, faceMask, compareMask);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetStencilWriteMask(
+    VkCommandBuffer                         commandBuffer,
+    VkStencilFaceFlags                      faceMask,
+    uint32_t                                writeMask)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    if (cmd->bind.state.use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_STENCIL_WRITE_MASK) {
+        return;
+    }
+
+    intel_set_stencil_write_mask(cmd, faceMask, writeMask);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetStencilReference(
+    VkCommandBuffer                         commandBuffer,
+    VkStencilFaceFlags                      faceMask,
+    uint32_t                                reference)
+{
+    struct intel_cmd *cmd = intel_cmd(commandBuffer);
+
+    if (cmd->bind.state.use_pipeline_dynamic_state & INTEL_USE_PIPELINE_DYNAMIC_STENCIL_REFERENCE) {
+        return;
+    }
+
+    intel_set_stencil_reference(cmd, faceMask, reference);
+}
diff --git a/icd/intel/state.h b/icd/intel/state.h
new file mode 100644
index 0000000..76fb987
--- /dev/null
+++ b/icd/intel/state.h
@@ -0,0 +1,104 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Cody Northrop <cody@lunarg.com>
+ *
+ */
+
+#ifndef STATE_H
+#define STATE_H
+
+#include "intel.h"
+#include "obj.h"
+
+struct intel_dynamic_viewport {
+    uint32_t first_viewport;
+    uint32_t first_scissor;
+    uint32_t viewport_count;
+    uint32_t scissor_count;
+    VkViewport viewports[INTEL_MAX_VIEWPORTS];
+    VkRect2D scissors[INTEL_MAX_VIEWPORTS];
+    /* SF_CLIP_VIEWPORTs, CC_VIEWPORTs, and SCISSOR_RECTs */
+    uint32_t cmd[INTEL_MAX_VIEWPORTS * (16 /* viewport */ + 2 /* cc */ + 2 /* scissor */)];
+    uint32_t cmd_len;
+    uint32_t cmd_clip_pos;
+    uint32_t cmd_cc_pos;
+    uint32_t cmd_scissor_rect_pos;
+};
+
+struct intel_dynamic_line_width {
+    float line_width;
+};
+
+struct intel_dynamic_depth_bias {
+    float depth_bias;
+    float depth_bias_clamp;
+    float slope_scaled_depth_bias;
+};
+
+struct intel_dynamic_blend {
+    float blend_const[4];
+};
+
+struct intel_dynamic_depth_bounds {
+    float min_depth_bounds;
+    float max_depth_bounds;
+};
+
+struct intel_dynamic_stencil_face {
+    uint32_t stencil_compare_mask;
+    uint32_t stencil_write_mask;
+    uint32_t stencil_reference;
+};
+
+struct intel_dynamic_stencil {
+    struct intel_dynamic_stencil_face front;
+    /* TODO: enable back facing stencil state */
+    struct intel_dynamic_stencil_face back;
+};
+
+struct intel_cmd;
+void intel_set_viewport(struct intel_cmd *cmd, uint32_t first, uint32_t count, const VkViewport *viewports);
+void intel_set_scissor(struct intel_cmd *cmd, uint32_t first, uint32_t count, const VkRect2D *scissors);
+void intel_set_line_width(struct intel_cmd *cmd, float line_width);
+void intel_set_depth_bias(
+    struct intel_cmd                   *cmd,
+    float                               depthBiasConstantFactor,
+    float                               depthBiasClamp,
+    float                               depthBiasSlopeFactor);
+void intel_set_blend_constants(
+    struct intel_cmd                   *cmd,
+    const float                         constants[4]);
+void intel_set_depth_bounds(
+    struct intel_cmd                   *cmd,
+    float                               minDepthBounds,
+    float                               maxDepthBounds);
+void intel_set_stencil_compare_mask(
+    struct intel_cmd                   *cmd,
+    VkStencilFaceFlags                  faceMask,
+    uint32_t                            compareMask);
+void intel_set_stencil_write_mask(
+    struct intel_cmd                   *cmd,
+    VkStencilFaceFlags                  faceMask,
+    uint32_t                            writeMask);
+void intel_set_stencil_reference(
+    struct intel_cmd                   *cmd,
+    VkStencilFaceFlags                  faceMask,
+    uint32_t                            reference);
+
+#endif /* STATE_H */
diff --git a/icd/intel/view.c b/icd/intel/view.c
new file mode 100644
index 0000000..e145606
--- /dev/null
+++ b/icd/intel/view.c
@@ -0,0 +1,1396 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#include "genhw/genhw.h"
+#include "kmd/winsys.h"
+#include "buf.h"
+#include "dev.h"
+#include "format.h"
+#include "gpu.h"
+#include "img.h"
+#include "mem.h"
+#include "view.h"
+
+static void surface_state_null_gen7(const struct intel_gpu *gpu,
+                                    uint32_t dw[8])
+{
+   INTEL_GPU_ASSERT(gpu, 7, 7.5);
+
+   /*
+    * From the Ivy Bridge PRM, volume 4 part 1, page 62:
+    *
+    *     "A null surface is used in instances where an actual surface is not
+    *      bound. When a write message is generated to a null surface, no
+    *      actual surface is written to. When a read message (including any
+    *      sampling engine message) is generated to a null surface, the result
+    *      is all zeros.  Note that a null surface type is allowed to be used
+    *      with all messages, even if it is not specifically indicated as
+    *      supported. All of the remaining fields in surface state are ignored
+    *      for null surfaces, with the following exceptions:
+    *
+    *      * Width, Height, Depth, LOD, and Render Target View Extent fields
+    *        must match the depth buffer's corresponding state for all render
+    *        target surfaces, including null.
+    *      * All sampling engine and data port messages support null surfaces
+    *        with the above behavior, even if not mentioned as specifically
+    *        supported, except for the following:
+    *        * Data Port Media Block Read/Write messages.
+    *      * The Surface Type of a surface used as a render target (accessed
+    *        via the Data Port's Render Target Write message) must be the same
+    *        as the Surface Type of all other render targets and of the depth
+    *        buffer (defined in 3DSTATE_DEPTH_BUFFER), unless either the depth
+    *        buffer or render targets are SURFTYPE_NULL."
+    *
+    * From the Ivy Bridge PRM, volume 4 part 1, page 65:
+    *
+    *     "If Surface Type is SURFTYPE_NULL, this field (Tiled Surface) must be
+    *      true"
+    */
+
+   dw[0] = GEN6_SURFTYPE_NULL << GEN7_SURFACE_DW0_TYPE__SHIFT |
+           GEN6_FORMAT_B8G8R8A8_UNORM << GEN7_SURFACE_DW0_FORMAT__SHIFT |
+           GEN6_TILING_X << 13;
+
+   dw[1] = 0;
+   dw[2] = 0;
+   dw[3] = 0;
+   dw[4] = 0;
+   dw[5] = 0;
+   dw[6] = 0;
+   dw[7] = 0;
+}
+
+static void surface_state_buf_gen7(const struct intel_gpu *gpu,
+                                   unsigned offset, unsigned size,
+                                   unsigned struct_size,
+                                   VkFormat elem_format,
+                                   bool is_rt, bool render_cache_rw,
+                                   uint32_t dw[8])
+{
+   const bool typed = !icd_format_is_undef(elem_format);
+   const bool structured = (!typed && struct_size > 1);
+   const int elem_size = (typed) ?
+      icd_format_get_size(elem_format) : 1;
+   int width, height, depth, pitch;
+   int surface_type, surface_format, num_entries;
+
+   INTEL_GPU_ASSERT(gpu, 7, 7.5);
+
+   surface_type = (structured) ? GEN7_SURFTYPE_STRBUF : GEN6_SURFTYPE_BUFFER;
+
+   surface_format = (typed) ?
+      intel_format_translate_color(gpu, elem_format) : GEN6_FORMAT_RAW;
+
+   /*
+    * It's possible that the buffer view being used is smaller than
+    * the format element size (required to be 16 for non-fragment shaders).
+    * Make certain that size is at least struct_size to keep the HW happy.
+    */
+   if (size < struct_size) {
+       size = struct_size;
+   }
+
+   num_entries = size / struct_size;
+   /* see if there is enough space to fit another element */
+   if (size % struct_size >= elem_size && !structured)
+      num_entries++;
+
+   /*
+    * From the Ivy Bridge PRM, volume 4 part 1, page 67:
+    *
+    *     "For SURFTYPE_BUFFER render targets, this field (Surface Base
+    *      Address) specifies the base address of first element of the
+    *      surface. The surface is interpreted as a simple array of that
+    *      single element type. The address must be naturally-aligned to the
+    *      element size (e.g., a buffer containing R32G32B32A32_FLOAT elements
+    *      must be 16-byte aligned)
+    *
+    *      For SURFTYPE_BUFFER non-rendertarget surfaces, this field specifies
+    *      the base address of the first element of the surface, computed in
+    *      software by adding the surface base address to the byte offset of
+    *      the element in the buffer."
+    */
+   if (is_rt)
+      assert(offset % elem_size == 0);
+
+   /*
+    * From the Ivy Bridge PRM, volume 4 part 1, page 68:
+    *
+    *     "For typed buffer and structured buffer surfaces, the number of
+    *      entries in the buffer ranges from 1 to 2^27.  For raw buffer
+    *      surfaces, the number of entries in the buffer is the number of
+    *      bytes which can range from 1 to 2^30."
+    */
+   assert(num_entries >= 1 &&
+          num_entries <= 1 << ((typed || structured) ? 27 : 30));
+
+   /*
+    * From the Ivy Bridge PRM, volume 4 part 1, page 69:
+    *
+    *     "For SURFTYPE_BUFFER: The low two bits of this field (Width) must be
+    *      11 if the Surface Format is RAW (the size of the buffer must be a
+    *      multiple of 4 bytes)."
+    *
+    * From the Ivy Bridge PRM, volume 4 part 1, page 70:
+    *
+    *     "For surfaces of type SURFTYPE_BUFFER and SURFTYPE_STRBUF, this
+    *      field (Surface Pitch) indicates the size of the structure."
+    *
+    *     "For linear surfaces with Surface Type of SURFTYPE_STRBUF, the pitch
+    *      must be a multiple of 4 bytes."
+    */
+   if (structured)
+      assert(struct_size % 4 == 0);
+   else if (!typed)
+      assert(num_entries % 4 == 0);
+
+   pitch = struct_size;
+
+   pitch--;
+   num_entries--;
+   /* bits [6:0] */
+   width  = (num_entries & 0x0000007f);
+   /* bits [20:7] */
+   height = (num_entries & 0x001fff80) >> 7;
+   /* bits [30:21] */
+   depth  = (num_entries & 0x7fe00000) >> 21;
+   /* limit to [26:21] */
+   if (typed || structured)
+      depth &= 0x3f;
+
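+   /*
+    * Example (illustrative): a 1000-byte RAW buffer gives num_entries =
+    * 1000, so the encoded value 999 (0x3e7) splits into width = 0x67
+    * (bits 6:0) and height = 0x7 (bits 20:7).
+    */
+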
+   dw[0] = surface_type << GEN7_SURFACE_DW0_TYPE__SHIFT |
+           surface_format << GEN7_SURFACE_DW0_FORMAT__SHIFT;
+   if (render_cache_rw)
+      dw[0] |= GEN7_SURFACE_DW0_RENDER_CACHE_RW;
+
+   dw[1] = offset;
+
+   dw[2] = height << GEN7_SURFACE_DW2_HEIGHT__SHIFT |
+           width << GEN7_SURFACE_DW2_WIDTH__SHIFT;
+
+   dw[3] = depth << GEN7_SURFACE_DW3_DEPTH__SHIFT |
+           pitch;
+
+   dw[4] = 0;
+   dw[5] = GEN7_MOCS_L3_WB << GEN7_SURFACE_DW5_MOCS__SHIFT;
+
+   dw[6] = 0;
+   dw[7] = 0;
+
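+   /* Haswell (gen7.5) adds shader channel selects; program an identity
+    * swizzle. */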
+   if (intel_gpu_gen(gpu) >= INTEL_GEN(7.5)) {
+      dw[7] |= GEN75_SCS_RED   << GEN75_SURFACE_DW7_SCS_R__SHIFT |
+               GEN75_SCS_GREEN << GEN75_SURFACE_DW7_SCS_G__SHIFT |
+               GEN75_SCS_BLUE  << GEN75_SURFACE_DW7_SCS_B__SHIFT |
+               GEN75_SCS_ALPHA << GEN75_SURFACE_DW7_SCS_A__SHIFT;
+   }
+}
+
+static int img_type_to_view_type(VkImageType type, unsigned first_layer, unsigned num_layers)
+{
+    if (first_layer == 0 && num_layers == 1) {
+        switch (type) {
+        case VK_IMAGE_TYPE_1D:   return VK_IMAGE_VIEW_TYPE_1D;
+        case VK_IMAGE_TYPE_2D:   return VK_IMAGE_VIEW_TYPE_2D;
+        case VK_IMAGE_TYPE_3D:   return VK_IMAGE_VIEW_TYPE_3D;
+        default: assert(!"unknown img type"); return VK_IMAGE_VIEW_TYPE_1D;
+        }
+    } else {
+        switch (type) {
+        case VK_IMAGE_TYPE_1D:   return VK_IMAGE_VIEW_TYPE_1D_ARRAY;
+        case VK_IMAGE_TYPE_2D:   return VK_IMAGE_VIEW_TYPE_2D_ARRAY;
+        case VK_IMAGE_TYPE_3D:   return VK_IMAGE_VIEW_TYPE_3D;
+        default: assert(!"unknown img type"); return VK_IMAGE_VIEW_TYPE_1D_ARRAY;
+        }
+    }
+}
+
+static int view_type_to_surface_type(VkImageViewType type)
+{
+    switch (type) {
+    case VK_IMAGE_VIEW_TYPE_1D:         return GEN6_SURFTYPE_1D;
+    case VK_IMAGE_VIEW_TYPE_1D_ARRAY:   return GEN6_SURFTYPE_1D;
+    case VK_IMAGE_VIEW_TYPE_2D:         return GEN6_SURFTYPE_2D;
+    case VK_IMAGE_VIEW_TYPE_2D_ARRAY:   return GEN6_SURFTYPE_2D;
+    case VK_IMAGE_VIEW_TYPE_3D:         return GEN6_SURFTYPE_3D;
+    case VK_IMAGE_VIEW_TYPE_CUBE:       return GEN6_SURFTYPE_CUBE;
+    case VK_IMAGE_VIEW_TYPE_CUBE_ARRAY: return GEN6_SURFTYPE_CUBE;
+    default: assert(!"unknown view type"); return GEN6_SURFTYPE_NULL;
+    }
+}
+
+static int channel_swizzle_to_scs(VkComponentSwizzle swizzle)
+{
+    switch (swizzle) {
+    case VK_COMPONENT_SWIZZLE_ZERO:  return GEN75_SCS_ZERO;
+    case VK_COMPONENT_SWIZZLE_ONE:   return GEN75_SCS_ONE;
+    case VK_COMPONENT_SWIZZLE_R:     return GEN75_SCS_RED;
+    case VK_COMPONENT_SWIZZLE_G:     return GEN75_SCS_GREEN;
+    case VK_COMPONENT_SWIZZLE_B:     return GEN75_SCS_BLUE;
+    case VK_COMPONENT_SWIZZLE_A:     return GEN75_SCS_ALPHA;
+    default: assert(!"unknown swizzle"); return GEN75_SCS_ZERO;
+    }
+}
+
+static void surface_state_tex_gen7(const struct intel_gpu *gpu,
+                                   const struct intel_img *img,
+                                   VkImageViewType type,
+                                   VkFormat format,
+                                   unsigned first_level,
+                                   unsigned num_levels,
+                                   unsigned first_layer,
+                                   unsigned num_layers,
+                                   VkComponentMapping swizzles,
+                                   bool is_rt,
+                                   uint32_t dw[8])
+{
+   int surface_type, surface_format;
+   int width, height, depth, pitch, lod;
+
+   INTEL_GPU_ASSERT(gpu, 7, 7.5);
+
+   surface_type = view_type_to_surface_type(type);
+   assert(surface_type != GEN6_SURFTYPE_BUFFER);
+
+   surface_format = intel_format_translate_color(gpu, format);
+   assert(surface_format >= 0);
+
+   width = img->layout.width0;
+   height = img->layout.height0;
+   depth = (type == VK_IMAGE_VIEW_TYPE_3D) ?
+      img->depth : num_layers;
+   pitch = img->layout.bo_stride;
+
+   if (surface_type == GEN6_SURFTYPE_CUBE) {
+      /*
+       * From the Ivy Bridge PRM, volume 4 part 1, page 70:
+       *
+       *     "For SURFTYPE_CUBE:For Sampling Engine Surfaces, the range of
+       *      this field is [0,340], indicating the number of cube array
+       *      elements (equal to the number of underlying 2D array elements
+       *      divided by 6). For other surfaces, this field must be zero."
+       *
+       * When is_rt is true, we treat the texture as a 2D one to avoid the
+       * restriction.
+       */
+      if (is_rt) {
+         surface_type = GEN6_SURFTYPE_2D;
+      }
+      else {
+         assert(num_layers % 6 == 0);
+         depth = num_layers / 6;
+      }
+   }
+
+   /* sanity check the size */
+   assert(width >= 1 && height >= 1 && depth >= 1 && pitch >= 1);
+   assert(first_layer < 2048 && num_layers <= 2048);
+   switch (surface_type) {
+   case GEN6_SURFTYPE_1D:
+      assert(width <= 16384 && height == 1 && depth <= 2048);
+      break;
+   case GEN6_SURFTYPE_2D:
+      assert(width <= 16384 && height <= 16384 && depth <= 2048);
+      break;
+   case GEN6_SURFTYPE_3D:
+      assert(width <= 2048 && height <= 2048 && depth <= 2048);
+      if (!is_rt)
+         assert(first_layer == 0);
+      break;
+   case GEN6_SURFTYPE_CUBE:
+      assert(width <= 16384 && height <= 16384 && depth <= 86);
+      assert(width == height);
+      if (is_rt)
+         assert(first_layer == 0);
+      break;
+   default:
+      assert(!"unexpected surface type");
+      break;
+   }
+
+   if (is_rt) {
+      assert(num_levels == 1);
+      lod = first_level;
+   }
+   else {
+      lod = num_levels - 1;
+   }
+
+   /*
+    * From the Ivy Bridge PRM, volume 4 part 1, page 68:
+    *
+    *     "The Base Address for linear render target surfaces and surfaces
+    *      accessed with the typed surface read/write data port messages must
+    *      be element-size aligned, for non-YUV surface formats, or a multiple
+    *      of 2 element-sizes for YUV surface formats.  Other linear surfaces
+    *      have no alignment requirements (byte alignment is sufficient)."
+    *
+    * From the Ivy Bridge PRM, volume 4 part 1, page 70:
+    *
+    *     "For linear render target surfaces and surfaces accessed with the
+    *      typed data port messages, the pitch must be a multiple of the
+    *      element size for non-YUV surface formats. Pitch must be a multiple
+    *      of 2 * element size for YUV surface formats. For linear surfaces
+    *      with Surface Type of SURFTYPE_STRBUF, the pitch must be a multiple
+    *      of 4 bytes. For other linear surfaces, the pitch can be any multiple
+    *      of bytes."
+    *
+    * From the Ivy Bridge PRM, volume 4 part 1, page 74:
+    *
+    *     "For linear surfaces, this field (X Offset) must be zero."
+    */
+   if (img->layout.tiling == GEN6_TILING_NONE) {
+      if (is_rt) {
+         const int elem_size U_ASSERT_ONLY = icd_format_get_size(format);
+         assert(pitch % elem_size == 0);
+      }
+   }
+
+   assert(img->layout.tiling != GEN8_TILING_W);
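+   /* the GEN6_TILING_* encoding maps directly onto DW0's tiling field at bit 13 */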
+   dw[0] = surface_type << GEN7_SURFACE_DW0_TYPE__SHIFT |
+           surface_format << GEN7_SURFACE_DW0_FORMAT__SHIFT |
+           img->layout.tiling << 13;
+
+   /*
+    * From the Ivy Bridge PRM, volume 4 part 1, page 63:
+    *
+    *     "If this field (Surface Array) is enabled, the Surface Type must be
+    *      SURFTYPE_1D, SURFTYPE_2D, or SURFTYPE_CUBE. If this field is
+    *      disabled and Surface Type is SURFTYPE_1D, SURFTYPE_2D, or
+    *      SURFTYPE_CUBE, the Depth field must be set to zero."
+    *
+    * For non-3D sampler surfaces, resinfo (the sampler message) always
+    * returns zero for the number of layers when this field is not set.
+    */
+   if (surface_type != GEN6_SURFTYPE_3D) {
+      if (num_layers > 1)
+         dw[0] |= GEN7_SURFACE_DW0_IS_ARRAY;
+      else
+         assert(depth == 1);
+   }
+
+   assert(img->layout.align_i == 4 || img->layout.align_i == 8);
+   assert(img->layout.align_j == 2 || img->layout.align_j == 4);
+
+   if (img->layout.align_j == 4)
+      dw[0] |= GEN7_SURFACE_DW0_VALIGN_4;
+
+   if (img->layout.align_i == 8)
+      dw[0] |= GEN7_SURFACE_DW0_HALIGN_8;
+
+   if (img->layout.walk == INTEL_LAYOUT_WALK_LOD)
+      dw[0] |= GEN7_SURFACE_DW0_ARYSPC_LOD0;
+   else
+      dw[0] |= GEN7_SURFACE_DW0_ARYSPC_FULL;
+
+   if (is_rt)
+      dw[0] |= GEN7_SURFACE_DW0_RENDER_CACHE_RW;
+
+   if (surface_type == GEN6_SURFTYPE_CUBE && !is_rt)
+      dw[0] |= GEN7_SURFACE_DW0_CUBE_FACE_ENABLES__MASK;
+
+   dw[1] = 0;
+
+   dw[2] = (height - 1) << GEN7_SURFACE_DW2_HEIGHT__SHIFT |
+           (width - 1) << GEN7_SURFACE_DW2_WIDTH__SHIFT;
+
+   dw[3] = (depth - 1) << GEN7_SURFACE_DW3_DEPTH__SHIFT |
+           (pitch - 1);
+
+   dw[4] = first_layer << 18 |
+           (num_layers - 1) << 7;
+
+   /*
+    * MSFMT_MSS means the samples are not interleaved and MSFMT_DEPTH_STENCIL
+    * means the samples are interleaved.  The layouts are the same when the
+    * number of samples is 1.
+    */
+   if (img->layout.interleaved_samples && img->sample_count > 1) {
+      assert(!is_rt);
+      dw[4] |= GEN7_SURFACE_DW4_MSFMT_DEPTH_STENCIL;
+   }
+   else {
+      dw[4] |= GEN7_SURFACE_DW4_MSFMT_MSS;
+   }
+
+   if (img->sample_count > 4)
+      dw[4] |= GEN7_SURFACE_DW4_MULTISAMPLECOUNT_8;
+   else if (img->sample_count > 2)
+      dw[4] |= GEN7_SURFACE_DW4_MULTISAMPLECOUNT_4;
+   else
+      dw[4] |= GEN7_SURFACE_DW4_MULTISAMPLECOUNT_1;
+
+   dw[5] = GEN7_MOCS_L3_WB << GEN7_SURFACE_DW5_MOCS__SHIFT |
+           (first_level) << GEN7_SURFACE_DW5_MIN_LOD__SHIFT |
+           lod;
+
+   dw[6] = 0;
+   dw[7] = 0;
+
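+   /*
+    * Gen7.5 adds shader channel selects; IDENTITY resolves to the
+    * channel's own select.  Earlier gens lack SCS, so only identity
+    * swizzles can be honored here.
+    */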
+   if (intel_gpu_gen(gpu) >= INTEL_GEN(7.5)) {
+      dw[7] |=
+         channel_swizzle_to_scs((swizzles.r == VK_COMPONENT_SWIZZLE_IDENTITY) ? VK_COMPONENT_SWIZZLE_R : swizzles.r) << GEN75_SURFACE_DW7_SCS_R__SHIFT |
+         channel_swizzle_to_scs((swizzles.g == VK_COMPONENT_SWIZZLE_IDENTITY) ? VK_COMPONENT_SWIZZLE_G : swizzles.g) << GEN75_SURFACE_DW7_SCS_G__SHIFT |
+         channel_swizzle_to_scs((swizzles.b == VK_COMPONENT_SWIZZLE_IDENTITY) ? VK_COMPONENT_SWIZZLE_B : swizzles.b) << GEN75_SURFACE_DW7_SCS_B__SHIFT |
+         channel_swizzle_to_scs((swizzles.a == VK_COMPONENT_SWIZZLE_IDENTITY) ? VK_COMPONENT_SWIZZLE_A : swizzles.a) << GEN75_SURFACE_DW7_SCS_A__SHIFT;
+   } else {
+         assert(((swizzles.r == VK_COMPONENT_SWIZZLE_R) || (swizzles.r == VK_COMPONENT_SWIZZLE_IDENTITY)) &&
+                ((swizzles.g == VK_COMPONENT_SWIZZLE_G) || (swizzles.g == VK_COMPONENT_SWIZZLE_IDENTITY)) &&
+                ((swizzles.b == VK_COMPONENT_SWIZZLE_B) || (swizzles.b == VK_COMPONENT_SWIZZLE_IDENTITY)) &&
+                ((swizzles.a == VK_COMPONENT_SWIZZLE_A) || (swizzles.a == VK_COMPONENT_SWIZZLE_IDENTITY)));
+   }
+}
+
+static void surface_state_null_gen6(const struct intel_gpu *gpu,
+                                    uint32_t dw[6])
+{
+   INTEL_GPU_ASSERT(gpu, 6, 6);
+
+   /*
+    * From the Sandy Bridge PRM, volume 4 part 1, page 71:
+    *
+    *     "A null surface will be used in instances where an actual surface is
+    *      not bound. When a write message is generated to a null surface, no
+    *      actual surface is written to. When a read message (including any
+    *      sampling engine message) is generated to a null surface, the result
+    *      is all zeros. Note that a null surface type is allowed to be used
+    *      with all messages, even if it is not specifically indicated as
+    *      supported. All of the remaining fields in surface state are ignored
+    *      for null surfaces, with the following exceptions:
+    *
+    *        * [DevSNB+]: Width, Height, Depth, and LOD fields must match the
+    *          depth buffer's corresponding state for all render target
+    *          surfaces, including null.
+    *        * Surface Format must be R8G8B8A8_UNORM."
+    *
+    * From the Sandy Bridge PRM, volume 4 part 1, page 82:
+    *
+    *     "If Surface Type is SURFTYPE_NULL, this field (Tiled Surface) must be
+    *      true"
+    */
+
+   dw[0] = GEN6_SURFTYPE_NULL << GEN6_SURFACE_DW0_TYPE__SHIFT |
+           GEN6_FORMAT_B8G8R8A8_UNORM << GEN6_SURFACE_DW0_FORMAT__SHIFT;
+
+   dw[1] = 0;
+   dw[2] = 0;
+   dw[3] = GEN6_TILING_X;
+   dw[4] = 0;
+   dw[5] = 0;
+}
+
+static void surface_state_buf_gen6(const struct intel_gpu *gpu,
+                                   unsigned offset, unsigned size,
+                                   unsigned struct_size,
+                                   VkFormat elem_format,
+                                   bool is_rt, bool render_cache_rw,
+                                   uint32_t dw[6])
+{
+   const bool typed = !icd_format_is_undef(elem_format);
+   const int elem_size = icd_format_get_size(elem_format);
+   int width, height, depth, pitch;
+   int surface_format, num_entries;
+
+   INTEL_GPU_ASSERT(gpu, 6, 6);
+
+   /*
+    * For SURFTYPE_BUFFER, a SURFACE_STATE specifies an element of a
+    * structure in a buffer.
+    */
+
+   surface_format = (typed) ?
+       intel_format_translate_color(gpu, elem_format) : GEN6_FORMAT_RAW;
+
+   num_entries = size / struct_size;
+   /* see if there is enough space to fit another element */
+   if (size % struct_size >= elem_size)
+      num_entries++;
+
+   /*
+    * From the Sandy Bridge PRM, volume 4 part 1, page 76:
+    *
+    *     "For SURFTYPE_BUFFER render targets, this field (Surface Base
+    *      Address) specifies the base address of first element of the
+    *      surface. The surface is interpreted as a simple array of that
+    *      single element type. The address must be naturally-aligned to the
+    *      element size (e.g., a buffer containing R32G32B32A32_FLOAT elements
+    *      must be 16-byte aligned).
+    *
+    *      For SURFTYPE_BUFFER non-rendertarget surfaces, this field specifies
+    *      the base address of the first element of the surface, computed in
+    *      software by adding the surface base address to the byte offset of
+    *      the element in the buffer."
+    */
+   if (is_rt)
+      assert(offset % elem_size == 0);
+
+   /*
+    * From the Sandy Bridge PRM, volume 4 part 1, page 77:
+    *
+    *     "For buffer surfaces, the number of entries in the buffer ranges
+    *      from 1 to 2^27."
+    */
+   assert(num_entries >= 1 && num_entries <= 1 << 27);
+
+   /*
+    * From the Sandy Bridge PRM, volume 4 part 1, page 81:
+    *
+    *     "For surfaces of type SURFTYPE_BUFFER, this field (Surface Pitch)
+    *      indicates the size of the structure."
+    */
+   pitch = struct_size;
+
+   pitch--;
+   num_entries--;
+   /* bits [6:0] */
+   width  = (num_entries & 0x0000007f);
+   /* bits [19:7] */
+   height = (num_entries & 0x000fff80) >> 7;
+   /* bits [26:20] */
+   depth  = (num_entries & 0x07f00000) >> 20;
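+   /* the 7 + 13 + 7 width/height/depth bits encode up to 2^27 entries */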
+
+   dw[0] = GEN6_SURFTYPE_BUFFER << GEN6_SURFACE_DW0_TYPE__SHIFT |
+           surface_format << GEN6_SURFACE_DW0_FORMAT__SHIFT;
+   if (render_cache_rw)
+      dw[0] |= GEN6_SURFACE_DW0_RENDER_CACHE_RW;
+
+   dw[1] = offset;
+
+   dw[2] = height << GEN6_SURFACE_DW2_HEIGHT__SHIFT |
+           width << GEN6_SURFACE_DW2_WIDTH__SHIFT;
+
+   dw[3] = depth << GEN6_SURFACE_DW3_DEPTH__SHIFT |
+           pitch << GEN6_SURFACE_DW3_PITCH__SHIFT;
+
+   dw[4] = 0;
+   dw[5] = 0;
+}
+
+static void surface_state_tex_gen6(const struct intel_gpu *gpu,
+                                   const struct intel_img *img,
+                                   VkImageViewType type,
+                                   VkFormat format,
+                                   unsigned first_level,
+                                   unsigned num_levels,
+                                   unsigned first_layer,
+                                   unsigned num_layers,
+                                   bool is_rt,
+                                   uint32_t dw[6])
+{
+   int surface_type, surface_format;
+   int width, height, depth, pitch, lod;
+
+   INTEL_GPU_ASSERT(gpu, 6, 6);
+
+   surface_type = view_type_to_surface_type(type);
+   assert(surface_type != GEN6_SURFTYPE_BUFFER);
+
+   surface_format = intel_format_translate_color(gpu, format);
+   assert(surface_format >= 0);
+
+   width = img->layout.width0;
+   height = img->layout.height0;
+   depth = (type == VK_IMAGE_VIEW_TYPE_3D) ?
+      img->depth : num_layers;
+   pitch = img->layout.bo_stride;
+
+   if (surface_type == GEN6_SURFTYPE_CUBE) {
+      /*
+       * From the Sandy Bridge PRM, volume 4 part 1, page 81:
+       *
+       *     "For SURFTYPE_CUBE: [DevSNB+]: for Sampling Engine Surfaces, the
+       *      range of this field (Depth) is [0,84], indicating the number of
+       *      cube array elements (equal to the number of underlying 2D array
+       *      elements divided by 6). For other surfaces, this field must be
+       *      zero."
+       *
+       * When is_rt is true, we treat the texture as a 2D one to avoid the
+       * restriction.
+       */
+      if (is_rt) {
+         surface_type = GEN6_SURFTYPE_2D;
+      }
+      else {
+         assert(num_layers % 6 == 0);
+         depth = num_layers / 6;
+      }
+   }
+
+   /* sanity check the size */
+   assert(width >= 1 && height >= 1 && depth >= 1 && pitch >= 1);
+   switch (surface_type) {
+   case GEN6_SURFTYPE_1D:
+      assert(width <= 8192 && height == 1 && depth <= 512);
+      assert(first_layer < 512 && num_layers <= 512);
+      break;
+   case GEN6_SURFTYPE_2D:
+      assert(width <= 8192 && height <= 8192 && depth <= 512);
+      assert(first_layer < 512 && num_layers <= 512);
+      break;
+   case GEN6_SURFTYPE_3D:
+      assert(width <= 2048 && height <= 2048 && depth <= 2048);
+      assert(first_layer < 2048 && num_layers <= 512);
+      if (!is_rt)
+         assert(first_layer == 0);
+      break;
+   case GEN6_SURFTYPE_CUBE:
+      assert(width <= 8192 && height <= 8192 && depth <= 85);
+      assert(width == height);
+      assert(first_layer < 512 && num_layers <= 512);
+      if (is_rt)
+         assert(first_layer == 0);
+      break;
+   default:
+      assert(!"unexpected surface type");
+      break;
+   }
+
+   /* non-full array spacing is supported only on GEN7+ */
+   assert(img->layout.walk != INTEL_LAYOUT_WALK_LOD);
+   /* non-interleaved samples are supported only on GEN7+ */
+   if (img->sample_count > 1)
+      assert(img->layout.interleaved_samples);
+
+   if (is_rt) {
+      assert(num_levels == 1);
+      lod = first_level;
+   }
+   else {
+      lod = num_levels - 1;
+   }
+
+   /*
+    * From the Sandy Bridge PRM, volume 4 part 1, page 76:
+    *
+    *     "Linear render target surface base addresses must be element-size
+    *      aligned, for non-YUV surface formats, or a multiple of 2
+    *      element-sizes for YUV surface formats. Other linear surfaces have
+    *      no alignment requirements (byte alignment is sufficient.)"
+    *
+    * From the Sandy Bridge PRM, volume 4 part 1, page 81:
+    *
+    *     "For linear render target surfaces, the pitch must be a multiple
+    *      of the element size for non-YUV surface formats. Pitch must be a
+    *      multiple of 2 * element size for YUV surface formats."
+    *
+    * From the Sandy Bridge PRM, volume 4 part 1, page 86:
+    *
+    *     "For linear surfaces, this field (X Offset) must be zero"
+    */
+   if (img->layout.tiling == GEN6_TILING_NONE) {
+      if (is_rt) {
+         const int elem_size U_ASSERT_ONLY = icd_format_get_size(format);
+         assert(pitch % elem_size == 0);
+      }
+   }
+
+   dw[0] = surface_type << GEN6_SURFACE_DW0_TYPE__SHIFT |
+           surface_format << GEN6_SURFACE_DW0_FORMAT__SHIFT |
+           GEN6_SURFACE_DW0_MIPLAYOUT_BELOW;
+
+   if (surface_type == GEN6_SURFTYPE_CUBE && !is_rt) {
+      dw[0] |= 1 << 9 |
+               GEN6_SURFACE_DW0_CUBE_FACE_ENABLES__MASK;
+   }
+
+   if (is_rt)
+      dw[0] |= GEN6_SURFACE_DW0_RENDER_CACHE_RW;
+
+   dw[1] = 0;
+
+   dw[2] = (height - 1) << GEN6_SURFACE_DW2_HEIGHT__SHIFT |
+           (width - 1) << GEN6_SURFACE_DW2_WIDTH__SHIFT |
+           lod << GEN6_SURFACE_DW2_MIP_COUNT_LOD__SHIFT;
+
+   assert(img->layout.tiling != GEN8_TILING_W);
+   dw[3] = (depth - 1) << GEN6_SURFACE_DW3_DEPTH__SHIFT |
+           (pitch - 1) << GEN6_SURFACE_DW3_PITCH__SHIFT |
+           img->layout.tiling;
+
+   dw[4] = first_level << GEN6_SURFACE_DW4_MIN_LOD__SHIFT |
+           first_layer << 17 |
+           (num_layers - 1) << 8 |
+           ((img->sample_count > 1) ? GEN6_SURFACE_DW4_MULTISAMPLECOUNT_4 :
+                                      GEN6_SURFACE_DW4_MULTISAMPLECOUNT_1);
+
+   dw[5] = 0;
+
+   assert(img->layout.align_j == 2 || img->layout.align_j == 4);
+   if (img->layout.align_j == 4)
+      dw[5] |= GEN6_SURFACE_DW5_VALIGN_4;
+}
+
+struct ds_surface_info {
+   int surface_type;
+   int format;
+
+   struct {
+      unsigned stride;
+      unsigned offset;
+   } zs, stencil, hiz;
+
+   unsigned width, height, depth;
+   unsigned lod, first_layer, num_layers;
+};
+
+static void
+ds_init_info_null(const struct intel_gpu *gpu,
+                  struct ds_surface_info *info)
+{
+   INTEL_GPU_ASSERT(gpu, 6, 7.5);
+
+   memset(info, 0, sizeof(*info));
+
+   info->surface_type = GEN6_SURFTYPE_NULL;
+   info->format = GEN6_ZFORMAT_D32_FLOAT;
+   info->width = 1;
+   info->height = 1;
+   info->depth = 1;
+   info->num_layers = 1;
+}
+
+static void
+ds_init_info(const struct intel_gpu *gpu,
+             const struct intel_img *img,
+             VkImageViewType view_type,
+             VkFormat format, unsigned level,
+             unsigned first_layer, unsigned num_layers,
+             struct ds_surface_info *info)
+{
+   bool separate_stencil;
+
+   INTEL_GPU_ASSERT(gpu, 6, 7.5);
+
+   memset(info, 0, sizeof(*info));
+
+   info->surface_type = view_type_to_surface_type(view_type);
+
+   if (info->surface_type == GEN6_SURFTYPE_CUBE) {
+      /*
+       * From the Sandy Bridge PRM, volume 2 part 1, page 325-326:
+       *
+       *     "For Other Surfaces (Cube Surfaces):
+       *      This field (Minimum Array Element) is ignored."
+       *
+       *     "For Other Surfaces (Cube Surfaces):
+       *      This field (Render Target View Extent) is ignored."
+       *
+       * As such, we cannot set first_layer and num_layers on cube surfaces.
+       * To work around that, treat it as a 2D surface.
+       */
+      info->surface_type = GEN6_SURFTYPE_2D;
+   }
+
+   if (intel_gpu_gen(gpu) >= INTEL_GEN(7)) {
+      separate_stencil = true;
+   }
+   else {
+      /*
+       * From the Sandy Bridge PRM, volume 2 part 1, page 317:
+       *
+       *     "This field (Separate Stencil Buffer Enable) must be set to the
+       *      same value (enabled or disabled) as Hierarchical Depth Buffer
+       *      Enable."
+       */
+      separate_stencil = intel_img_can_enable_hiz(img, level);
+   }
+
+   /*
+    * From the Sandy Bridge PRM, volume 2 part 1, page 317:
+    *
+    *     "If this field (Hierarchical Depth Buffer Enable) is enabled, the
+    *      Surface Format of the depth buffer cannot be
+    *      D32_FLOAT_S8X24_UINT or D24_UNORM_S8_UINT. Use of stencil
+    *      requires the separate stencil buffer."
+    *
+    * From the Ironlake PRM, volume 2 part 1, page 330:
+    *
+    *     "If this field (Separate Stencil Buffer Enable) is disabled, the
+    *      Surface Format of the depth buffer cannot be D24_UNORM_X8_UINT."
+    *
+    * There is no similar restriction for GEN6.  But when D24_UNORM_X8_UINT
+    * is indeed used, the depth values output by the fragment shaders will
+    * be different when read back.
+    *
+    * As for GEN7+, separate_stencil is always true.
+    */
+   switch (format) {
+   case VK_FORMAT_D16_UNORM:
+      info->format = GEN6_ZFORMAT_D16_UNORM;
+      break;
+   case VK_FORMAT_D32_SFLOAT:
+      info->format = GEN6_ZFORMAT_D32_FLOAT;
+      break;
+   case VK_FORMAT_D32_SFLOAT_S8_UINT:
+      info->format = (separate_stencil) ?
+         GEN6_ZFORMAT_D32_FLOAT :
+         GEN6_ZFORMAT_D32_FLOAT_S8X24_UINT;
+      break;
+   case VK_FORMAT_S8_UINT:
+      if (separate_stencil) {
+         info->format = GEN6_ZFORMAT_D32_FLOAT;
+         break;
+      }
+      /* fall through */
+   default:
+      assert(!"unsupported depth/stencil format");
+      ds_init_info_null(gpu, info);
+      return;
+   }
+
+   if (format != VK_FORMAT_S8_UINT)
+      info->zs.stride = img->layout.bo_stride;
+
+   if (img->s8_layout) {
+      /*
+       * From the Sandy Bridge PRM, volume 2 part 1, page 329:
+       *
+       *     "The pitch must be set to 2x the value computed based on width,
+       *       as the stencil buffer is stored with two rows interleaved."
+       *
+       * According to the classic driver, we need to do the same for GEN7+
+       * even though the Ivy Bridge PRM does not say anything about it.
+       */
+      info->stencil.stride = img->s8_layout->bo_stride * 2;
+
+      if (intel_gpu_gen(gpu) == INTEL_GEN(6)) {
+         unsigned x, y;
+
+         assert(img->s8_layout->walk == INTEL_LAYOUT_WALK_LOD);
+
+         /* offset to the level */
+         intel_layout_get_slice_pos(img->s8_layout, level, 0, &x, &y);
+         intel_layout_pos_to_mem(img->s8_layout, x, y, &x, &y);
+         info->stencil.offset = intel_layout_mem_to_raw(img->s8_layout, x, y);
+      }
+   } else if (format == VK_FORMAT_S8_UINT) {
+      info->stencil.stride = img->layout.bo_stride * 2;
+   }
+
+   if (intel_img_can_enable_hiz(img, level)) {
+      info->hiz.stride = img->layout.aux_stride;
+
+      /* offset to the level */
+      if (intel_gpu_gen(gpu) == INTEL_GEN(6))
+          info->hiz.offset = img->layout.aux_offsets[level];
+   }
+
+   info->width = img->layout.width0;
+   info->height = img->layout.height0;
+   info->depth = (img->type == VK_IMAGE_TYPE_3D) ?
+      img->depth : num_layers;
+
+   info->lod = level;
+   info->first_layer = first_layer;
+   info->num_layers = num_layers;
+}
+
+static void att_view_init_for_ds(struct intel_att_view *view,
+                                 const struct intel_gpu *gpu,
+                                 const struct intel_img *img,
+                                 VkImageViewType view_type,
+                                 VkFormat format, unsigned level,
+                                 unsigned first_layer, unsigned num_layers)
+{
+   const int max_2d_size U_ASSERT_ONLY =
+       (intel_gpu_gen(gpu) >= INTEL_GEN(7)) ? 16384 : 8192;
+   const int max_array_size U_ASSERT_ONLY =
+       (intel_gpu_gen(gpu) >= INTEL_GEN(7)) ? 2048 : 512;
+   struct ds_surface_info info;
+   uint32_t dw1, dw2, dw3, dw4, dw5, dw6;
+   uint32_t *dw;
+
+   INTEL_GPU_ASSERT(gpu, 6, 7.5);
+
+   if (img) {
+      ds_init_info(gpu, img, view_type, format, level,
+              first_layer, num_layers, &info);
+   }
+   else {
+      ds_init_info_null(gpu, &info);
+   }
+
+   switch (info.surface_type) {
+   case GEN6_SURFTYPE_NULL:
+      break;
+   case GEN6_SURFTYPE_1D:
+      assert(info.width <= max_2d_size && info.height == 1 &&
+             info.depth <= max_array_size);
+      assert(info.first_layer < max_array_size - 1 &&
+             info.num_layers <= max_array_size);
+      break;
+   case GEN6_SURFTYPE_2D:
+      assert(info.width <= max_2d_size && info.height <= max_2d_size &&
+             info.depth <= max_array_size);
+      assert(info.first_layer < max_array_size - 1 &&
+             info.num_layers <= max_array_size);
+      break;
+   case GEN6_SURFTYPE_3D:
+      assert(info.width <= 2048 && info.height <= 2048 && info.depth <= 2048);
+      assert(info.first_layer < 2048 && info.num_layers <= max_array_size);
+      break;
+   case GEN6_SURFTYPE_CUBE:
+      assert(info.width <= max_2d_size && info.height <= max_2d_size &&
+             info.depth == 1);
+      assert(info.first_layer == 0 && info.num_layers == 1);
+      assert(info.width == info.height);
+      break;
+   default:
+      assert(!"unexpected depth surface type");
+      break;
+   }
+
+   dw1 = info.surface_type << 29 |
+         info.format << 18;
+
+   if (info.zs.stride) {
+      /* required for GEN6+ */
+      assert(info.zs.stride > 0 && info.zs.stride < 128 * 1024 &&
+            info.zs.stride % 128 == 0);
+      assert(info.width <= info.zs.stride);
+
+      dw1 |= (info.zs.stride - 1);
+   }
+
+   dw2 = 0;
+
+   if (intel_gpu_gen(gpu) >= INTEL_GEN(7)) {
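+      /* enable each of the depth, stencil, and HiZ buffers that is present */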
+      if (info.zs.stride)
+         dw1 |= 1 << 28;
+
+      if (info.stencil.stride)
+         dw1 |= 1 << 27;
+
+      if (info.hiz.stride)
+         dw1 |= 1 << 22;
+
+      dw3 = (info.height - 1) << 18 |
+            (info.width - 1) << 4 |
+            info.lod;
+
+      dw4 = (info.depth - 1) << 21 |
+            info.first_layer << 10 |
+            GEN7_MOCS_L3_WB;
+
+      dw5 = 0;
+
+      dw6 = (info.num_layers - 1) << 21;
+   }
+   else {
+      /* always Y-tiled */
+      dw1 |= 1 << 27 |
+             1 << 26;
+
+      if (info.hiz.stride) {
+         dw1 |= 1 << 22 |
+                1 << 21;
+      }
+
+      dw3 = (info.height - 1) << 19 |
+            (info.width - 1) << 6 |
+            info.lod << 2 |
+            GEN6_DEPTH_DW3_MIPLAYOUT_BELOW;
+
+      dw4 = (info.depth - 1) << 21 |
+            info.first_layer << 10 |
+            (info.num_layers - 1) << 1;
+
+      dw5 = 0;
+
+      dw6 = 0;
+   }
+
+   STATIC_ASSERT(ARRAY_SIZE(view->att_cmd) >= 10);
+   dw = view->att_cmd;
+
+   dw[0] = dw1;
+   dw[1] = dw2;
+   dw[2] = dw3;
+   dw[3] = dw4;
+   dw[4] = dw5;
+   dw[5] = dw6;
+
+   /* separate stencil */
+   if (info.stencil.stride) {
+      assert(info.stencil.stride > 0 && info.stencil.stride < 128 * 1024 &&
+             info.stencil.stride % 128 == 0);
+
+      dw[6] = info.stencil.stride - 1;
+      dw[7] = img->s8_offset;
+
+      if (intel_gpu_gen(gpu) >= INTEL_GEN(7))
+         dw[6] |= GEN7_MOCS_L3_WB << GEN6_STENCIL_DW1_MOCS__SHIFT;
+      if (intel_gpu_gen(gpu) >= INTEL_GEN(7.5))
+         dw[6] |= GEN75_STENCIL_DW1_STENCIL_BUFFER_ENABLE;
+   }
+   else {
+      dw[6] = 0;
+      dw[7] = 0;
+   }
+
+   /* hiz */
+   if (info.hiz.stride) {
+      dw[8] = info.hiz.stride - 1;
+      dw[9] = img->aux_offset;
+
+      if (intel_gpu_gen(gpu) >= INTEL_GEN(7))
+         dw[8] |= GEN7_MOCS_L3_WB << GEN6_HIZ_DW1_MOCS__SHIFT;
+   }
+   else {
+      dw[8] = 0;
+      dw[9] = 0;
+   }
+
+   view->has_stencil = info.stencil.stride;
+   view->has_hiz = info.hiz.stride;
+}
+
+static const VkComponentMapping identity_channel_mapping = {
+    .r = VK_COMPONENT_SWIZZLE_R,
+    .g = VK_COMPONENT_SWIZZLE_G,
+    .b = VK_COMPONENT_SWIZZLE_B,
+    .a = VK_COMPONENT_SWIZZLE_A,
+};
+
+static void att_view_init_for_rt(struct intel_att_view *view,
+                                 const struct intel_gpu *gpu,
+                                 const struct intel_img *img,
+                                 VkImageViewType view_type,
+                                 VkFormat format, unsigned level,
+                                 unsigned first_layer, unsigned num_layers)
+{
+    if (intel_gpu_gen(gpu) >= INTEL_GEN(7)) {
+        surface_state_tex_gen7(gpu, img, view_type, format,
+                level, 1, first_layer, num_layers,
+                identity_channel_mapping, true, view->att_cmd);
+    } else {
+        surface_state_tex_gen6(gpu, img, view_type, format,
+                level, 1, first_layer, num_layers,
+                true, view->att_cmd);
+    }
+
+    view->is_rt = true;
+}
+
+static void att_view_init_for_input(struct intel_att_view *view,
+                                    const struct intel_gpu *gpu,
+                                    const struct intel_img *img,
+                                    VkImageViewType view_type,
+                                    VkFormat format, unsigned level,
+                                    unsigned first_layer, unsigned num_layers)
+{
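+    /*
+     * The texture paths below are disabled, so input attachment views
+     * always receive a null SURFACE_STATE.
+     */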
+    if (intel_gpu_gen(gpu) >= INTEL_GEN(7)) {
+        if (false) {
+            surface_state_tex_gen7(gpu, img, view_type, format,
+                    level, 1, first_layer, num_layers,
+                    identity_channel_mapping, false, view->cmd);
+        } else {
+            surface_state_null_gen7(gpu, view->cmd);
+        }
+
+        view->cmd_len = 8;
+    } else {
+        if (false) {
+            surface_state_tex_gen6(gpu, img, view_type, format,
+                    level, 1, first_layer, num_layers, false, view->cmd);
+        } else {
+            surface_state_null_gen6(gpu, view->cmd);
+        }
+
+        view->cmd_len = 6;
+    }
+}
+
+void intel_null_view_init(struct intel_null_view *view,
+                          struct intel_dev *dev)
+{
+    if (intel_gpu_gen(dev->gpu) >= INTEL_GEN(7)) {
+        surface_state_null_gen7(dev->gpu, view->cmd);
+        view->cmd_len = 8;
+    } else {
+        surface_state_null_gen6(dev->gpu, view->cmd);
+        view->cmd_len = 6;
+    }
+}
+
+static void buf_view_destroy(struct intel_obj *obj)
+{
+    struct intel_buf_view *view = intel_buf_view_from_obj(obj);
+
+    intel_buf_view_destroy(view);
+}
+
+void intel_buf_view_init(const struct intel_dev *dev,
+                         const VkBufferViewCreateInfo *info,
+                         struct intel_buf_view *view,
+                         bool raw)
+{
+    struct intel_buf *buf = intel_buf(info->buffer);
+    /* TODO: Is transfer destination the only shader write operation? */
+    const bool will_write = buf ?
+        (buf->usage & (VK_BUFFER_USAGE_STORAGE_TEXEL_BUFFER_BIT |
+                       VK_BUFFER_USAGE_STORAGE_BUFFER_BIT)) : false;
+    VkFormat format;
+    VkDeviceSize stride;
+    uint32_t *cmd;
+    int i;
+
+    view->obj.destroy = buf_view_destroy;
+
+    view->buf = buf;
+
+    /*
+     * The compiler expects uniform buffers to have pitch of
+     * 4 for fragment shaders, but 16 for other stages.  The format
+     * must be VK_FORMAT_R32G32B32A32_SFLOAT.
+     */
+    if (raw) {
+        format = VK_FORMAT_R32G32B32A32_SFLOAT;
+        stride = 16;
+    } else {
+        format = info->format;
+        stride = icd_format_get_size(format);
+    }
+    cmd = view->cmd;
+
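+    /*
+     * Pass 0 builds view->cmd.  Raw views take a second pass to build
+     * view->fs_cmd with the fragment-shader pitch of 4; non-raw views
+     * copy cmd into fs_cmd and stop after one pass.
+     */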
+    for (i = 0; i < 2; i++) {
+        if (intel_gpu_gen(dev->gpu) >= INTEL_GEN(7)) {
+            surface_state_buf_gen7(dev->gpu, info->offset,
+                    info->range, stride, format,
+                    will_write, will_write, cmd);
+            view->cmd_len = 8;
+        } else {
+            surface_state_buf_gen6(dev->gpu, info->offset,
+                    info->range, stride, format,
+                    will_write, will_write, cmd);
+            view->cmd_len = 6;
+        }
+
+        /* switch to view->fs_cmd */
+        if (raw) {
+            cmd = view->fs_cmd;
+            stride = 4;
+        } else {
+            memcpy(view->fs_cmd, view->cmd, sizeof(uint32_t) * view->cmd_len);
+            break;
+        }
+    }
+}
+
+VkResult intel_buf_view_create(struct intel_dev *dev,
+                               const VkBufferViewCreateInfo *info,
+                               struct intel_buf_view **view_ret)
+{
+    struct intel_buf_view *view;
+
+    view = (struct intel_buf_view *) intel_base_create(&dev->base.handle,
+            sizeof(*view), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_BUFFER_VIEW_EXT,
+            info, 0);
+    if (!view)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    intel_buf_view_init(dev, info, view, false);
+
+    *view_ret = view;
+
+    return VK_SUCCESS;
+}
+
+void intel_buf_view_destroy(struct intel_buf_view *view)
+{
+    intel_base_destroy(&view->obj.base);
+}
+
+static void img_view_destroy(struct intel_obj *obj)
+{
+    struct intel_img_view *view = intel_img_view_from_obj(obj);
+
+    intel_img_view_destroy(view);
+}
+
+void intel_img_view_init(struct intel_dev *dev,
+                         const VkImageViewCreateInfo *info,
+                         struct intel_img_view *view)
+{
+    VkComponentMapping state_swizzles;
+    uint32_t mip_levels, array_size;
+    struct intel_img *img = intel_img(info->image);
+
+    mip_levels = info->subresourceRange.levelCount;
+    if (mip_levels > img->mip_levels - info->subresourceRange.baseMipLevel)
+        mip_levels = img->mip_levels - info->subresourceRange.baseMipLevel;
+
+    array_size = info->subresourceRange.layerCount;
+    if (array_size > img->array_size - info->subresourceRange.baseArrayLayer)
+        array_size = img->array_size - info->subresourceRange.baseArrayLayer;
+
+    view->obj.destroy = img_view_destroy;
+
+    view->img = img;
+
+    if (!(img->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT)) {
+        if (intel_gpu_gen(dev->gpu) >= INTEL_GEN(7.5)) {
+            state_swizzles = info->components;
+            view->shader_swizzles.r = VK_COMPONENT_SWIZZLE_R;
+            view->shader_swizzles.g = VK_COMPONENT_SWIZZLE_G;
+            view->shader_swizzles.b = VK_COMPONENT_SWIZZLE_B;
+            view->shader_swizzles.a = VK_COMPONENT_SWIZZLE_A;
+        } else {
+            state_swizzles.r = VK_COMPONENT_SWIZZLE_R;
+            state_swizzles.g = VK_COMPONENT_SWIZZLE_G;
+            state_swizzles.b = VK_COMPONENT_SWIZZLE_B;
+            state_swizzles.a = VK_COMPONENT_SWIZZLE_A;
+            view->shader_swizzles = info->components;
+        }
+
+        /* the compiler does not honor shader_swizzles; warn when a
+         * non-identity swizzle would be silently dropped */
+        if (view->shader_swizzles.r != VK_COMPONENT_SWIZZLE_R ||
+            view->shader_swizzles.g != VK_COMPONENT_SWIZZLE_G ||
+            view->shader_swizzles.b != VK_COMPONENT_SWIZZLE_B ||
+            view->shader_swizzles.a != VK_COMPONENT_SWIZZLE_A) {
+            intel_dev_log(dev, VK_DEBUG_REPORT_WARNING_BIT_EXT,
+                          (struct intel_base*)view, 0, 0,
+                          "image data swizzling is ignored");
+        }
+
+        if (intel_gpu_gen(dev->gpu) >= INTEL_GEN(7)) {
+            surface_state_tex_gen7(dev->gpu, img, info->viewType, info->format,
+                    info->subresourceRange.baseMipLevel, mip_levels,
+                    info->subresourceRange.baseArrayLayer, array_size,
+                    state_swizzles, false, view->cmd);
+            view->cmd_len = 8;
+        } else {
+            surface_state_tex_gen6(dev->gpu, img, info->viewType, info->format,
+                    info->subresourceRange.baseMipLevel, mip_levels,
+                    info->subresourceRange.baseArrayLayer, array_size,
+                    false, view->cmd);
+            view->cmd_len = 6;
+        }
+    }
+}
+
+VkResult intel_img_view_create(struct intel_dev *dev,
+                               const VkImageViewCreateInfo *info,
+                               struct intel_img_view **view_ret)
+{
+    struct intel_img_view *view;
+
+    view = (struct intel_img_view *) intel_base_create(&dev->base.handle,
+            sizeof(*view), dev->base.dbg, VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_VIEW_EXT, info, 0);
+    if (!view)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    intel_img_view_init(dev, info, view);
+
+    /* Initialize attachment view info in case it's needed */
+    intel_att_view_init(dev, info, &view->att_view);
+
+    *view_ret = view;
+
+    return VK_SUCCESS;
+}
+
+void intel_img_view_destroy(struct intel_img_view *view)
+{
+    intel_base_destroy(&view->obj.base);
+}
+
+void intel_att_view_init(struct intel_dev *dev,
+                         const VkImageViewCreateInfo *info,
+                         struct intel_att_view *att_view)
+{
+    struct intel_img *img = intel_img(info->image);
+    VkImageViewType view_type;
+
+    att_view->img = img;
+
+    att_view->mipLevel = info->subresourceRange.baseMipLevel;
+    att_view->baseArrayLayer = info->subresourceRange.baseArrayLayer;
+    att_view->array_size = info->subresourceRange.layerCount;
+
+    view_type = img_type_to_view_type(img->type,
+                                      info->subresourceRange.baseArrayLayer,
+                                      info->subresourceRange.layerCount);
+
+    att_view_init_for_input(att_view, dev->gpu, img, view_type, info->format,
+                            info->subresourceRange.baseMipLevel,
+                            info->subresourceRange.baseArrayLayer,
+                            info->subresourceRange.layerCount);
+
+    if (img->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) {
+        att_view_init_for_ds(att_view, dev->gpu, img, view_type, img->layout.format,
+                             info->subresourceRange.baseMipLevel,
+                             info->subresourceRange.baseArrayLayer,
+                             info->subresourceRange.layerCount);
+    } else {
+        att_view_init_for_rt(att_view, dev->gpu, img, view_type, info->format,
+                             info->subresourceRange.baseMipLevel,
+                             info->subresourceRange.baseArrayLayer,
+                             info->subresourceRange.layerCount);
+    }
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateBufferView(
+    VkDevice                            device,
+    const VkBufferViewCreateInfo*       pCreateInfo,
+    const VkAllocationCallbacks*        pAllocator,
+    VkBufferView*                       pView)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_buf_view_create(dev, pCreateInfo,
+            (struct intel_buf_view **) pView);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyBufferView(
+    VkDevice                            device,
+    VkBufferView                        bufferView,
+    const VkAllocationCallbacks*        pAllocator)
+{
+    struct intel_obj *obj = intel_obj(bufferView);
+
+    obj->destroy(obj);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateImageView(
+    VkDevice                            device,
+    const VkImageViewCreateInfo*        pCreateInfo,
+    const VkAllocationCallbacks*        pAllocator,
+    VkImageView*                        pView)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    return intel_img_view_create(dev, pCreateInfo,
+            (struct intel_img_view **) pView);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyImageView(
+    VkDevice                            device,
+    VkImageView                         imageView,
+    const VkAllocationCallbacks*        pAllocator)
+{
+    struct intel_obj *obj = intel_obj(imageView);
+
+    obj->destroy(obj);
+}
+
diff --git a/icd/intel/view.h b/icd/intel/view.h
new file mode 100644
index 0000000..4dfe7e5
--- /dev/null
+++ b/icd/intel/view.h
@@ -0,0 +1,135 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ *
+ */
+
+#ifndef VIEW_H
+#define VIEW_H
+
+#include "obj.h"
+#include "intel.h"
+
+struct intel_img;
+struct intel_mem;
+
+struct intel_null_view {
+    /* this is not an intel_obj */
+
+    /* SURFACE_STATE */
+    uint32_t cmd[8];
+    uint32_t cmd_len;
+};
+
+struct intel_buf_view {
+    struct intel_obj obj;
+
+    struct intel_buf *buf;
+
+    /* SURFACE_STATE */
+    uint32_t cmd[8];
+    uint32_t fs_cmd[8];
+    uint32_t cmd_len;
+};
+
+struct intel_att_view {
+    struct intel_img *img;
+
+    uint32_t mipLevel;
+    uint32_t baseArrayLayer;
+    uint32_t array_size;
+
+    /* SURFACE_STATE for readback */
+    uint32_t cmd[8];
+    uint32_t cmd_len;
+
+    /*
+     * SURFACE_STATE when is_rt is true.  Otherwise,
+     *
+     * 3DSTATE_DEPTH_BUFFER
+     * 3DSTATE_STENCIL_BUFFER
+     * 3DSTATE_HIER_DEPTH_BUFFER
+     */
+    uint32_t att_cmd[10];
+    bool is_rt;
+    bool has_stencil;
+    bool has_hiz;
+};
+
+struct intel_img_view {
+    struct intel_obj obj;
+
+    struct intel_img *img;
+
+    VkComponentMapping shader_swizzles;
+
+    /* SURFACE_STATE */
+    uint32_t cmd[8];
+    uint32_t cmd_len;
+
+    struct intel_att_view att_view;
+};
+
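+/* view handles hold the driver object pointer; these helpers recover it */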
+static inline struct intel_buf_view *intel_buf_view(VkBufferView view)
+{
+    return *(struct intel_buf_view **) &view;
+}
+
+static inline struct intel_buf_view *intel_buf_view_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_buf_view *) obj;
+}
+
+static inline struct intel_img_view *intel_img_view(VkImageView view)
+{
+    return *(struct intel_img_view **) &view;
+}
+
+static inline struct intel_img_view *intel_img_view_from_obj(struct intel_obj *obj)
+{
+    return (struct intel_img_view *) obj;
+}
+
+void intel_null_view_init(struct intel_null_view *view,
+                          struct intel_dev *dev);
+
+void intel_buf_view_init(const struct intel_dev *dev,
+                         const VkBufferViewCreateInfo *info,
+                         struct intel_buf_view *view,
+                         bool raw);
+
+VkResult intel_buf_view_create(struct intel_dev *dev,
+                               const VkBufferViewCreateInfo *info,
+                               struct intel_buf_view **view_ret);
+
+void intel_buf_view_destroy(struct intel_buf_view *view);
+
+void intel_img_view_init(struct intel_dev *dev, const VkImageViewCreateInfo *info,
+                         struct intel_img_view *view);
+
+VkResult intel_img_view_create(struct intel_dev *dev,
+                               const VkImageViewCreateInfo *info,
+                               struct intel_img_view **view_ret);
+void intel_img_view_destroy(struct intel_img_view *view);
+
+void intel_att_view_init(struct intel_dev *dev,
+                         const VkImageViewCreateInfo *info,
+                         struct intel_att_view *att_view);
+void intel_att_view_destroy(struct intel_att_view *view);
+
+#endif /* VIEW_H */
diff --git a/icd/intel/wsi.h b/icd/intel/wsi.h
new file mode 100644
index 0000000..d697d2a
--- /dev/null
+++ b/icd/intel/wsi.h
@@ -0,0 +1,45 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#ifndef WSI_H
+#define WSI_H
+
+#include "intel.h"
+
+struct intel_fence;
+struct intel_gpu;
+struct intel_img;
+
+VkResult intel_wsi_gpu_get_properties(struct intel_gpu *gpu,
+                                      VkPhysicalDeviceProperties *pProperties);
+void intel_wsi_gpu_cleanup(struct intel_gpu *gpu);
+
+VkResult intel_wsi_img_init(struct intel_img *img);
+void intel_wsi_img_cleanup(struct intel_img *img);
+
+VkResult intel_wsi_fence_init(struct intel_fence *fence);
+void intel_wsi_fence_cleanup(struct intel_fence *fence);
+void intel_wsi_fence_copy(struct intel_fence *fence,
+                          const struct intel_fence *src);
+VkResult intel_wsi_fence_wait(struct intel_fence *fence,
+                              int64_t timeout_ns);
+
+#endif /* WSI_H */
diff --git a/icd/intel/wsi_null.c b/icd/intel/wsi_null.c
new file mode 100644
index 0000000..d3e1a05
--- /dev/null
+++ b/icd/intel/wsi_null.c
@@ -0,0 +1,55 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olv@lunarg.com>
+ *
+ */
+
+#include "wsi.h"
+
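+/* no-op WSI stubs for builds without a windowing backend (see wsi_x11.c) */
+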
+void intel_wsi_gpu_cleanup(struct intel_gpu *gpu)
+{
+}
+
+VkResult intel_wsi_img_init(struct intel_img *img)
+{
+    return VK_SUCCESS;
+}
+
+void intel_wsi_img_cleanup(struct intel_img *img)
+{
+}
+
+VkResult intel_wsi_fence_init(struct intel_fence *fence)
+{
+    return VK_SUCCESS;
+}
+
+void intel_wsi_fence_cleanup(struct intel_fence *fence)
+{
+}
+
+void intel_wsi_fence_copy(struct intel_fence *fence,
+                          const struct intel_fence *src)
+{
+}
+
+VkResult intel_wsi_fence_wait(struct intel_fence *fence,
+                              int64_t timeout_ns)
+{
+    return VK_SUCCESS;
+}
diff --git a/icd/intel/wsi_x11.c b/icd/intel/wsi_x11.c
new file mode 100644
index 0000000..469f7dc
--- /dev/null
+++ b/icd/intel/wsi_x11.c
@@ -0,0 +1,1340 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Chia-I Wu <olvaffe@gmail.com>
+ * Author: Chia-I Wu <olv@lunarg.com>
+ * Author: Ian Elliott <ian@LunarG.com>
+ * Author: Ian Elliott <ianelliott@google.com>
+ * Author: Mike Stroyan <mike@LunarG.com>
+ * Author: Norbert Nopper <Norbert.Nopper@freescale.com> (xlib port)
+ *
+ */
+
+#define _GNU_SOURCE 1
+#include <time.h>
+#include <poll.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <xcb/xcb.h>
+#include <xcb/dri3.h>
+#include <xcb/present.h>
+
+#include <X11/Xlib-xcb.h>
+
+#include "kmd/winsys.h"
+#include "kmd/libdrm/xf86drmMode.h"
+#include "dev.h"
+#include "fence.h"
+#include "gpu.h"
+#include "img.h"
+#include "mem.h"
+#include "queue.h"
+#include "wsi.h"
+
+struct intel_x11_display {
+    struct intel_handle handle;
+
+    int fd;
+    uint32_t connector_id;
+
+    char name[32];
+    VkExtent2D physical_dimension;
+    VkExtent2D physical_resolution;
+
+    drmModeModeInfoPtr modes;
+    uint32_t mode_count;
+};
+
+typedef enum intel_x11_swap_chain_image_state_
+{
+    INTEL_SC_STATE_UNUSED = 0,
+    INTEL_SC_STATE_APP_OWNED = 1,
+    INTEL_SC_STATE_QUEUED_FOR_PRESENT = 2,
+    INTEL_SC_STATE_DISPLAYED = 3,
+} intel_x11_swap_chain_image_state;
+
+// This struct corresponds to the VkSwapchainKHR object:
+struct intel_x11_swap_chain {
+    struct intel_handle handle;
+
+    xcb_connection_t *c;
+    xcb_window_t window;
+    uint32_t width;  // To compare with XCB_PRESENT_EVENT_CONFIGURE_NOTIFY's
+    uint32_t height; // ditto
+    bool out_of_date;
+    bool being_deleted;
+    bool force_copy;
+    VkPresentModeKHR present_mode;
+
+    int dri3_major, dri3_minor;
+    int present_major, present_minor;
+
+    xcb_special_event_t *present_special_event;
+
+    struct intel_img **persistent_images;
+    uint32_t persistent_image_count;
+    intel_x11_swap_chain_image_state *image_state;
+    uint32_t *present_queue;
+    uint32_t present_queue_length;
+    uint32_t present_serial;
+
+    struct {
+        uint32_t serial;
+    } local;
+
+    struct {
+        uint32_t serial;
+        uint64_t msc;
+    } remote;
+};
+
+struct intel_x11_img_data {
+    struct intel_mem *mem;
+    int prime_fd;
+    uint32_t pixmap;
+};
+
+struct intel_x11_fence_data {
+    struct intel_x11_swap_chain *swap_chain;
+    uint32_t serial;
+};
+
+/* these are what DDX expects */
+static const VkFormat x11_presentable_formats[] = {
+    VK_FORMAT_B8G8R8A8_UNORM,
+    VK_FORMAT_B8G8R8A8_SRGB,
+    VK_FORMAT_B5G6R5_UNORM_PACK16,
+};
+
+// Note: The following function is only needed if an application uses this ICD
+// directly, without the common Vulkan loader:
+//
+// Create a VkSurfaceKHR object for XCB window connections:
+static VkResult x11_xcb_surface_create(struct intel_instance *instance,
+                                   const VkXcbSurfaceCreateInfoKHR* pCreateInfo,
+                                   VkIcdSurfaceXcb **pSurface)
+{
+    VkIcdSurfaceXcb *surface;
+
+    surface = intel_alloc(instance, sizeof(*surface), sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!surface)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    memset(surface, 0, sizeof(*surface));
+// TBD: Do we need to do this (it doesn't fit with what we want for the API)?
+//    intel_handle_init(&surface, VK_DEBUG_REPORT_OBJECT_TYPE_SURFACE_KHR_EXT, instance);
+
+    surface->base.platform = VK_ICD_WSI_PLATFORM_XCB;
+    surface->connection = pCreateInfo->connection;
+    surface->window = pCreateInfo->window;
+
+    *pSurface = surface;
+
+    return VK_SUCCESS;
+}
+
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+// Note: The following function is only needed if an application uses this ICD
+// directly, without the common Vulkan loader:
+//
+// Create a VkSurfaceKHR object for XLIB window connections:
+static VkResult x11_xlib_surface_create(struct intel_instance *instance,
+                                   const VkXlibSurfaceCreateInfoKHR* pCreateInfo,
+                                   VkIcdSurfaceXlib **pSurface)
+{
+    VkIcdSurfaceXlib *surface;
+
+    surface = intel_alloc(instance, sizeof(*surface), sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!surface)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    memset(surface, 0, sizeof(*surface));
+// TBD: Do we need to do this (it doesn't fit with what we want for the API)?
+//    intel_handle_init(&surface, VK_DEBUG_REPORT_OBJECT_TYPE_SURFACE_KHR_EXT, instance);
+
+    surface->base.platform = VK_ICD_WSI_PLATFORM_XLIB;
+    surface->dpy = pCreateInfo->dpy;
+    surface->window = pCreateInfo->window;
+
+    *pSurface = surface;
+
+    return VK_SUCCESS;
+}
+
+static inline VkIcdSurfaceXlib *x11_xlib_surface(VkSurfaceKHR surface)
+{
+    VkIcdSurfaceXlib *pSurface = (VkIcdSurfaceXlib *) surface;
+    if (pSurface->base.platform == VK_ICD_WSI_PLATFORM_XLIB)
+    {
+        return pSurface;
+    }
+
+    return NULL;
+}
+
+#endif // VK_USE_PLATFORM_XLIB_KHR
+
+static void x11_surface_destroy(VkSurfaceKHR surface)
+{
+    intel_free(surface, surface);
+}
+
+static inline VkIcdSurfaceXcb *x11_xcb_surface(VkSurfaceKHR surface)
+{
+    VkIcdSurfaceXcb *pSurface = (VkIcdSurfaceXcb *) surface;
+    if (pSurface->base.platform == VK_ICD_WSI_PLATFORM_XCB)
+    {
+        return pSurface;
+    }
+
+    return NULL;
+}
+
+static inline struct intel_x11_swap_chain *x11_swap_chain(VkSwapchainKHR sc)
+{
+    return (struct intel_x11_swap_chain *) sc;
+}
+
+static bool x11_is_format_presentable(const struct intel_dev *dev,
+                                      VkFormat format)
+{
+    uint32_t i;
+
+    for (i = 0; i < ARRAY_SIZE(x11_presentable_formats); i++) {
+        if (x11_presentable_formats[i] == format)
+            return true;
+    }
+
+    return false;
+}
+
+static int x11_export_prime_fd(struct intel_dev *dev,
+                               struct intel_bo *bo,
+                               const struct intel_layout *layout)
+{
+    struct intel_winsys_handle export;
+    enum intel_tiling_mode tiling;
+
+    export.type = INTEL_WINSYS_HANDLE_FD;
+
+    switch (layout->tiling) {
+    case GEN6_TILING_X:
+        tiling = INTEL_TILING_X;
+        break;
+    case GEN6_TILING_Y:
+        tiling = INTEL_TILING_Y;
+        break;
+    default:
+        assert(layout->tiling == GEN6_TILING_NONE);
+        tiling = INTEL_TILING_NONE;
+        break;
+    }
+
+    if (intel_bo_set_tiling(bo, tiling, layout->bo_stride))
+        return -1;
+
+    if (intel_winsys_export_handle(dev->winsys, bo, tiling,
+                layout->bo_stride, layout->bo_height, &export))
+        return -1;
+
+    return (int) export.handle;
+}
+
+/**
+ * Return true if fd points to the primary or render node of the GPU.
+ */
+static bool x11_gpu_match_fd(const struct intel_gpu *gpu, int fd)
+{
+    struct stat fd_stat, gpu_stat;
+
+    if (fstat(fd, &fd_stat))
+        return false;
+
+    /* is it the primary node? */
+    if (!stat(gpu->primary_node, &gpu_stat) &&
+        !memcmp(&fd_stat, &gpu_stat, sizeof(fd_stat)))
+        return true;
+
+    /* is it the render node? */
+    if (gpu->render_node && !stat(gpu->render_node, &gpu_stat) &&
+        !memcmp(&fd_stat, &gpu_stat, sizeof(fd_stat)))
+        return true;
+
+    return false;
+}
+
+/**
+ * Return the depth of \p drawable.
+ */
+static int x11_get_drawable_depth(xcb_connection_t *c,
+                                  xcb_drawable_t drawable)
+{
+    xcb_get_geometry_cookie_t cookie;
+    xcb_get_geometry_reply_t *reply;
+    uint8_t depth;
+
+    cookie = xcb_get_geometry(c, drawable);
+    reply = xcb_get_geometry_reply(c, cookie, NULL);
+
+    if (reply) {
+        depth = reply->depth;
+        free(reply);
+    } else {
+        depth = 0;
+    }
+
+    return depth;
+}
+
+static VkResult x11_get_surface_capabilities(
+    VkSurfaceKHR surface,
+    VkSurfaceCapabilitiesKHR *pSurfaceProperties)
+{
+    const VkIcdSurfaceXcb *s_xcb = x11_xcb_surface(surface);
+
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+    const VkIcdSurfaceXlib *s_xlib = x11_xlib_surface(surface);
+    assert(s_xcb || s_xlib);
+#endif // VK_USE_PLATFORM_XLIB_KHR
+
+    xcb_connection_t *c = 0;
+    xcb_window_t window = 0;
+    if (s_xcb)
+    {
+        c = s_xcb->connection;
+        window = s_xcb->window;
+    }
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+    else if (s_xlib)
+    {
+        c = XGetXCBConnection(s_xlib->dpy);
+        window = (xcb_window_t)s_xlib->window;
+    }
+#endif // VK_USE_PLATFORM_XLIB_KHR
+
+    xcb_get_geometry_cookie_t cookie;
+    xcb_get_geometry_reply_t *reply;
+
+    cookie = xcb_get_geometry(c, window);
+    reply = xcb_get_geometry_reply(c, cookie, NULL);
+
+    if (reply) {
+        pSurfaceProperties->currentExtent.width = reply->width;
+        pSurfaceProperties->currentExtent.height = reply->height;
+        free(reply);
+    } else {
+        pSurfaceProperties->currentExtent.width = 0;
+        pSurfaceProperties->currentExtent.height = 0;
+    }
+
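+    // Require at least two images (double buffering); maxImageCount == 0
+    // means there is no upper limit on the number of swapchain images.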
+    pSurfaceProperties->minImageCount = 2;
+    pSurfaceProperties->maxImageCount = 0;
+
+    pSurfaceProperties->minImageExtent.width =
+        pSurfaceProperties->currentExtent.width;
+    pSurfaceProperties->minImageExtent.height =
+        pSurfaceProperties->currentExtent.height;
+    pSurfaceProperties->maxImageExtent.width =
+        pSurfaceProperties->currentExtent.width;
+    pSurfaceProperties->maxImageExtent.height =
+        pSurfaceProperties->currentExtent.height;
+    pSurfaceProperties->maxImageArrayLayers = 1;
+    pSurfaceProperties->supportedTransforms = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
+    pSurfaceProperties->currentTransform = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
+    pSurfaceProperties->supportedCompositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
+    pSurfaceProperties->supportedUsageFlags =
+        VK_IMAGE_USAGE_TRANSFER_SRC_BIT |
+        VK_IMAGE_USAGE_TRANSFER_DST_BIT |
+        VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
+
+    return VK_SUCCESS;
+}
+
+/**
+ * Return true if DRI3 and Present are supported by the server.
+ */
+static bool x11_is_dri3_and_present_supported(xcb_connection_t *c)
+{
+    const xcb_query_extension_reply_t *ext;
+
+    xcb_prefetch_extension_data(c, &xcb_dri3_id);
+    xcb_prefetch_extension_data(c, &xcb_present_id);
+
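+    // Prefetching both extension records first lets the two QueryExtension
+    // round trips overlap; each xcb_get_extension_data() call below then
+    // blocks only on a reply that is already in flight.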
+    ext = xcb_get_extension_data(c, &xcb_dri3_id);
+    if (!ext || !ext->present)
+        return false;
+
+    ext = xcb_get_extension_data(c, &xcb_present_id);
+    if (!ext || !ext->present)
+        return false;
+
+    return true;
+}
+
+/**
+ * Send a DRI3Open to get the server GPU fd.
+ */
+static int x11_dri3_open(xcb_connection_t *c,
+                         xcb_drawable_t drawable,
+                         xcb_randr_provider_t provider)
+{
+    xcb_dri3_open_cookie_t cookie;
+    xcb_dri3_open_reply_t *reply;
+    int fd;
+
+    cookie = xcb_dri3_open(c, drawable, provider);
+    reply = xcb_dri3_open_reply(c, cookie, NULL);
+    if (!reply)
+        return -1;
+
+    fd = (reply->nfd == 1) ? xcb_dri3_open_reply_fds(c, reply)[0] : -1;
+    free(reply);
+
+    return fd;
+}
+
+/**
+ * Send a DRI3PixmapFromBuffer to create a pixmap from \p prime_fd.
+ */
+static xcb_pixmap_t x11_dri3_pixmap_from_buffer(xcb_connection_t *c,
+                                                xcb_drawable_t drawable,
+                                                uint8_t depth, int prime_fd,
+                                                const struct intel_layout *layout)
+{
+    xcb_pixmap_t pixmap;
+
+    pixmap = xcb_generate_id(c);
+
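+    /* args: total size in bytes, width, height, stride, depth,
+     * bits per pixel, and the exported PRIME fd */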
+    xcb_dri3_pixmap_from_buffer(c, pixmap, drawable,
+            layout->bo_stride * layout->bo_height,
+            layout->width0, layout->height0,
+            layout->bo_stride, depth,
+            layout->block_size * 8, prime_fd);
+
+    return pixmap;
+}
+
+/**
+ * Send DRI3QueryVersion and PresentQueryVersion to query extension versions.
+ */
+static bool x11_swap_chain_dri3_and_present_query_version(struct intel_x11_swap_chain *sc)
+{
+    xcb_dri3_query_version_cookie_t dri3_cookie;
+    xcb_dri3_query_version_reply_t *dri3_reply;
+    xcb_present_query_version_cookie_t present_cookie;
+    xcb_present_query_version_reply_t *present_reply;
+
+    dri3_cookie = xcb_dri3_query_version(sc->c,
+            XCB_DRI3_MAJOR_VERSION, XCB_DRI3_MINOR_VERSION);
+    present_cookie = xcb_present_query_version(sc->c,
+            XCB_PRESENT_MAJOR_VERSION, XCB_PRESENT_MINOR_VERSION);
+
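+    // Both requests are issued before either reply is read, so the two
+    // round trips to the server overlap.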
+    dri3_reply = xcb_dri3_query_version_reply(sc->c, dri3_cookie, NULL);
+    if (!dri3_reply)
+        return false;
+
+    sc->dri3_major = dri3_reply->major_version;
+    sc->dri3_minor = dri3_reply->minor_version;
+    free(dri3_reply);
+
+    present_reply = xcb_present_query_version_reply(sc->c, present_cookie, NULL);
+    if (!present_reply)
+        return false;
+
+    sc->present_major = present_reply->major_version;
+    sc->present_minor = present_reply->minor_version;
+    free(present_reply);
+
+    return true;
+}
+
+/**
+ * Send a PresentSelectInput to select interested events.
+ */
+static bool x11_swap_chain_present_select_input(struct intel_x11_swap_chain *sc, struct intel_x11_swap_chain *old_sc)
+{
+    xcb_void_cookie_t cookie;
+    xcb_generic_error_t *error;
+    xcb_present_event_t present_special_event_id;
+
+    if (old_sc && (old_sc->c == sc->c) && (old_sc->window == sc->window)) {
+        // Reuse the old swapchain's special-event queue; once selected,
+        // Present event delivery for a window can never be stopped.
+        sc->present_special_event = old_sc->present_special_event;
+        old_sc->present_special_event = NULL;
+        return true;
+    }
+    /* create the event queue */
+    present_special_event_id = xcb_generate_id(sc->c);
+    sc->present_special_event = xcb_register_for_special_xge(sc->c,
+            &xcb_present_id, present_special_event_id, NULL);
+
+    cookie = xcb_present_select_input_checked(sc->c,
+            present_special_event_id, sc->window,
+            XCB_PRESENT_EVENT_MASK_COMPLETE_NOTIFY |
+            XCB_PRESENT_EVENT_MASK_CONFIGURE_NOTIFY);
+
+    error = xcb_request_check(sc->c, cookie);
+    if (error) {
+        free(error);
+        return false;
+    }
+
+    return true;
+}
+
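+/**
+ * Create one presentable image: allocate the image and its backing
+ * memory, export the bo as a PRIME fd, wrap that fd in a DRI3 pixmap,
+ * and bind the memory to the image.
+ */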
+static struct intel_img *x11_swap_chain_create_persistent_image(struct intel_x11_swap_chain *sc,
+                                                                struct intel_dev *dev,
+                                                                const VkImageCreateInfo *img_info)
+{
+    struct intel_img *img;
+    struct intel_mem *mem;
+    struct intel_x11_img_data *data;
+    VkMemoryAllocateInfo mem_info;
+    int prime_fd;
+    xcb_pixmap_t pixmap;
+    VkResult ret;
+
+    ret = intel_img_create(dev, img_info, NULL, true, &img);
+    if (ret != VK_SUCCESS)
+        return NULL;
+
+    memset(&mem_info, 0, sizeof(mem_info));
+    mem_info.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
+    mem_info.allocationSize = img->total_size;
+    mem_info.memoryTypeIndex = 0;
+
+    ret = intel_mem_alloc(dev, &mem_info, &mem);
+    if (ret != VK_SUCCESS) {
+        intel_img_destroy(img);
+        return NULL;
+    }
+
+    prime_fd = x11_export_prime_fd(dev, mem->bo, &img->layout);
+    if (prime_fd < 0) {
+        intel_mem_free(mem);
+        intel_img_destroy(img);
+        return NULL;
+    }
+
+    pixmap = x11_dri3_pixmap_from_buffer(sc->c, sc->window,
+            x11_get_drawable_depth(sc->c, sc->window),
+            prime_fd, &img->layout);
+
+    data = (struct intel_x11_img_data *) img->wsi_data;
+    data->mem = mem;
+    data->prime_fd = prime_fd;
+    data->pixmap = pixmap;
+
+    intel_obj_bind_mem(&img->obj, mem, 0);
+
+    return img;
+}
+
+static bool x11_swap_chain_create_persistent_images(struct intel_x11_swap_chain *sc,
+                                                    struct intel_dev *dev,
+                                                    const VkSwapchainCreateInfoKHR *info)
+{
+    struct intel_img **images;
+    intel_x11_swap_chain_image_state *image_state;
+    uint32_t *present_queue;
+    VkImageCreateInfo img_info;
+    uint32_t i;
+
+    images = intel_alloc(sc, sizeof(*images) * info->minImageCount,
+            sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!images)
+        return false;
+    image_state = intel_alloc(
+            sc, sizeof(intel_x11_swap_chain_image_state) * info->minImageCount,
+            sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!image_state) {
+        // No images have been created yet; just release the array:
+        intel_free(sc, images);
+        return false;
+    }
+    present_queue = intel_alloc(
+            sc, sizeof(uint32_t) * info->minImageCount,
+            sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!present_queue) {
+        intel_free(sc, images);
+        intel_free(sc, image_state);
+        return false;
+    }
+
+    memset(&img_info, 0, sizeof(img_info));
+    img_info.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
+    img_info.imageType = VK_IMAGE_TYPE_2D;
+    img_info.format = info->imageFormat;
+    img_info.extent.width = info->imageExtent.width;
+    img_info.extent.height = info->imageExtent.height;
+    img_info.extent.depth = 1;
+    img_info.mipLevels = 1;
+    img_info.arrayLayers = info->imageArrayLayers;
+    img_info.samples = VK_SAMPLE_COUNT_1_BIT;
+    img_info.tiling = VK_IMAGE_TILING_OPTIMAL;
+    img_info.usage = info->imageUsage;
+    img_info.flags = 0;
+
+    for (i = 0; i < info->minImageCount; i++) {
+        images[i] = x11_swap_chain_create_persistent_image(sc,
+                dev, &img_info);
+        if (!images[i])
+            break;
+        image_state[i] = INTEL_SC_STATE_UNUSED;
+    }
+
+    if (i < info->minImageCount) {
+        uint32_t j;
+        for (j = 0; j < i; j++)
+            intel_img_destroy(images[j]);
+
+        // Both arrays are non-NULL here (allocation failures return above):
+        intel_free(sc, images);
+        intel_free(sc, image_state);
+        intel_free(sc, present_queue);
+
+        return false;
+    }
+
+    sc->persistent_images = images;
+    sc->persistent_image_count = info->minImageCount;
+    sc->image_state = image_state;
+    sc->present_queue = present_queue;
+    sc->present_queue_length = 0;
+
+    return true;
+}
+
+/**
+ * Send a PresentPixmap.
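+ * Each request is tagged with an incrementing local serial; the matching
+ * PresentCompleteNotify echoes it back, which is what
+ * x11_swap_chain_wait() waits on.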
+ */
+static VkResult x11_swap_chain_present_pixmap(struct intel_x11_swap_chain *sc,
+                                              struct intel_img *img)
+{
+    struct intel_x11_img_data *data =
+        (struct intel_x11_img_data *) img->wsi_data;
+    uint32_t options = XCB_PRESENT_OPTION_NONE;
+    uint32_t target_msc, divisor, remainder;
+    xcb_void_cookie_t cookie;
+    xcb_generic_error_t *err;
+
+    target_msc = 0;
+    divisor = 1;
+    remainder = 0;
+    if (sc->present_mode == VK_PRESENT_MODE_IMMEDIATE_KHR) {
+        options |= XCB_PRESENT_OPTION_ASYNC;
+    }
+
+    if (sc->force_copy)
+        options |= XCB_PRESENT_OPTION_COPY;
+
+    cookie = xcb_present_pixmap_checked(sc->c,
+            sc->window,
+            data->pixmap,
+            ++sc->local.serial,
+            0, /* valid-area */
+            0, /* update-area */
+            0, /* x-off */
+            0, /* y-off */
+            0, /* crtc */
+            0, /* wait-fence */
+            0, /* idle-fence */
+            options,
+            target_msc,
+            divisor,
+            remainder,
+            0, NULL);
+
+    err = xcb_request_check(sc->c, cookie);
+    /* TODOVV: Can this be validated? */
+    if (err) {
+        free(err);
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    return VK_SUCCESS;
+}
+
+/**
+ * Handle a Present event.
+ */
+static void x11_swap_chain_present_event(struct intel_x11_swap_chain *sc,
+                                         const xcb_present_generic_event_t *ev)
+{
+    union {
+        const xcb_present_generic_event_t *ev;
+        const xcb_present_complete_notify_event_t *complete;
+        const xcb_present_configure_notify_event_t *configure;
+    } u = { .ev = ev };
+
+    switch (u.ev->evtype) {
+    case XCB_PRESENT_COMPLETE_NOTIFY:
+        sc->remote.serial = u.complete->serial;
+        sc->remote.msc = u.complete->msc;
+        assert(sc->present_queue_length > 0);
+        if (sc->image_state[sc->present_queue[0]] == INTEL_SC_STATE_DISPLAYED) {
+            // Remove the previously-displayed image from the present queue:
+            sc->image_state[sc->present_queue[0]] = INTEL_SC_STATE_UNUSED;
+            sc->present_queue_length--;
+            for (int j = 0; j < sc->present_queue_length; j++) {
+                sc->present_queue[j] = sc->present_queue[j+1];
+            }
+        }
+        assert(sc->present_queue_length > 0);
+        assert(sc->image_state[sc->present_queue[0]] == INTEL_SC_STATE_QUEUED_FOR_PRESENT);
+        sc->image_state[sc->present_queue[0]] = INTEL_SC_STATE_DISPLAYED;
+        break;
+    case XCB_PRESENT_EVENT_CONFIGURE_NOTIFY:
+        if ((u.configure->width != sc->width) ||
+            (u.configure->height != sc->height)) {
+            // The swapchain is now considered "out of date" with the window:
+            sc->out_of_date = true;
+        }
+        break;
+    default:
+        break;
+    }
+}
+
+/*
+ * Wait for an event on a swap chain.
+ * Uses polling because xcb_wait_for_special_event won't time out.
+ */
+static VkResult x11_swap_chain_wait(struct intel_x11_swap_chain *sc,
+                                    uint32_t serial, int64_t timeout)
+{
+    struct timespec current_time; // current time for planning wait
+    struct timespec stop_time;  // time when timeout will elapse
+    bool wait;
+    // Don't wait on destroyed swap chain
+    if (sc->present_special_event == NULL) {
+        return VK_SUCCESS;
+    }
+    if (timeout == 0) {
+        wait = false;
+    } else {
+        wait = true;
+        clock_gettime(CLOCK_MONOTONIC, &current_time);
+        if (timeout == -1) {
+            // wait approximately forever
+            stop_time.tv_nsec = current_time.tv_nsec;
+            stop_time.tv_sec = current_time.tv_sec + 10*365*24*60*60;
+        } else {
+            stop_time.tv_nsec = current_time.tv_nsec + (timeout % 1000000000);
+            stop_time.tv_sec = current_time.tv_sec + (timeout / 1000000000);
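+            // e.g. timeout = 2500000000 ns splits into +2 s and +500000000 ns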
+            // Carry overflow from tv_nsec to tv_sec
+            while (stop_time.tv_nsec >= 1000000000) {
+                stop_time.tv_sec += 1;
+                stop_time.tv_nsec -= 1000000000;
+            }
+        }
+    }
+
+    while (sc->remote.serial < serial) {
+        xcb_present_generic_event_t *ev;
+        xcb_intern_atom_reply_t *reply;
+
+        ev = (xcb_present_generic_event_t *)
+            xcb_poll_for_special_event(sc->c, sc->present_special_event);
+        if (wait) {
+            int poll_timeout;
+            struct pollfd fds;
+
+            fds.fd = xcb_get_file_descriptor(sc->c);
+            fds.events = POLLIN;
+
+            while (!ev) {
+                clock_gettime(CLOCK_MONOTONIC, &current_time);
+                if (current_time.tv_sec > stop_time.tv_sec ||
+                    (current_time.tv_sec == stop_time.tv_sec && current_time.tv_nsec > stop_time.tv_nsec)) {
+                    // Time has run out
+                    return VK_TIMEOUT;
+                }
+                poll_timeout = 1000/60; // milliseconds for 60 Hz
+                if (current_time.tv_sec >= stop_time.tv_sec-1) { // Remaining timeout may be under 1/60 second.
+                    int remaining_timeout;
+                    remaining_timeout = 1000 * (stop_time.tv_sec - current_time.tv_sec) + (stop_time.tv_nsec - current_time.tv_nsec) / 1000000;
+                    if (poll_timeout > remaining_timeout) {
+                        poll_timeout = remaining_timeout; // milliseconds for remainder of timeout
+                    }
+                }
+
+                // Wait for input on the xcb connection, or for the timeout.
+                // Events may arrive and be queued before poll() is entered;
+                // timing out ensures we still notice them.
+                poll(&fds, 1, poll_timeout);
+
+                // Another thread may have handled events and updated sc->remote.serial
+                if (sc->remote.serial >= serial) {
+                    return VK_SUCCESS;
+                }
+
+                // Issue a throwaway round trip (xcb_intern_atom) to force xcb
+                // to actually read new packets from the socket; calling
+                // xcb_poll_for_special_event alone does not.
+                reply = xcb_intern_atom_reply(sc->c, xcb_intern_atom(sc->c, 1, 1, "a"), NULL);
+                if (reply) {
+                    free(reply);
+                }
+
+                ev = (xcb_present_generic_event_t *)
+                    xcb_poll_for_special_event(sc->c, sc->present_special_event);
+            }
+        } else {
+            if (!ev)
+                return VK_NOT_READY;
+        }
+
+        x11_swap_chain_present_event(sc, ev);
+
+        free(ev);
+    }
+
+    return VK_SUCCESS;
+}
+
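+/*
+ * Destruction is split in two: _begin releases the images and the
+ * bookkeeping arrays, while _end (below) would free the swap chain itself
+ * but currently defers, since in-flight fences still point at it.
+ */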
+static void x11_swap_chain_destroy_begin(struct intel_x11_swap_chain *sc)
+{
+    if (sc->persistent_images) {
+        uint32_t i;
+
+        for (i = 0; i < sc->persistent_image_count; i++)
+            intel_img_destroy(sc->persistent_images[i]);
+        intel_free(sc, sc->persistent_images);
+    }
+
+    if (sc->image_state) {
+        intel_free(sc, sc->image_state);
+    }
+
+    if (sc->present_queue) {
+        intel_free(sc, sc->present_queue);
+    }
+
+    // Don't unregister because another swap chain may be using this event queue.
+    //if (sc->present_special_event)
+    //    xcb_unregister_for_special_event(sc->c, sc->present_special_event);
+}
+
+static void x11_swap_chain_destroy_end(struct intel_x11_swap_chain *sc)
+{
+    // Leave memory around for now because fences use it without reference count.
+    //intel_free(sc, sc);
+}
+
+static VkResult x11_swap_chain_create(struct intel_dev *dev,
+                                      const VkSwapchainCreateInfoKHR *info,
+                                      struct intel_x11_swap_chain **sc_ret)
+{
+    const xcb_randr_provider_t provider = 0;
+
+    const VkIcdSurfaceXcb *s_xcb = x11_xcb_surface(info->surface);
+
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+    const VkIcdSurfaceXlib *s_xlib = x11_xlib_surface(info->surface);
+    assert(s_xcb || s_xlib);
+#endif // VK_USE_PLATFORM_XLIB_KHR
+
+    xcb_connection_t *c = 0;
+    xcb_window_t window = 0;
+    if (s_xcb)
+    {
+        c = s_xcb->connection;
+        window = s_xcb->window;
+    }
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+    else if (s_xlib)
+    {
+        c = XGetXCBConnection(s_xlib->dpy);
+        window = (xcb_window_t)s_xlib->window;
+    }
+#endif // VK_USE_PLATFORM_XLIB_KHR
+
+    struct intel_x11_swap_chain *sc;
+    int fd;
+
+    /* TODOVV: Add test to validation layer */
+    if (!x11_is_format_presentable(dev, info->imageFormat)) {
+        intel_dev_log(dev, VK_DEBUG_REPORT_ERROR_BIT_EXT,
+                      &dev->base, 0, 0, "invalid presentable image format");
+//        return VK_ERROR_INVALID_VALUE;
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    /* TODOVV: Can we add test to validation layer? */
+    if (!x11_is_dri3_and_present_supported(c)) {
+//        return VK_ERROR_INVALID_VALUE;
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    /* TODOVV: Can we add test to validation layer? */
+    fd = x11_dri3_open(c, window, provider);
+    if (fd < 0 || !x11_gpu_match_fd(dev->gpu, fd)) {
+        if (fd >= 0)
+            close(fd);
+//        return VK_ERROR_INVALID_VALUE;
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    close(fd);
+
+    sc = intel_alloc(dev, sizeof(*sc), sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!sc)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    memset(sc, 0, sizeof(*sc));
+    intel_handle_init(&sc->handle, VK_DEBUG_REPORT_OBJECT_TYPE_SWAPCHAIN_KHR_EXT, dev->base.handle.instance);
+
+    sc->c = c;
+    sc->window = window;
+
+    sc->present_mode = info->presentMode;
+    // Record the swapchain's width and height, so that we can determine when
+    // it is "out of date" w.r.t. the window:
+    sc->width = info->imageExtent.width;
+    sc->height = info->imageExtent.height;
+    sc->out_of_date = false;
+    sc->being_deleted = false;
+    /* always copy rather than flip, for now */
+    sc->force_copy = true;
+
+    if (!x11_swap_chain_dri3_and_present_query_version(sc) ||
+        !x11_swap_chain_present_select_input(sc, x11_swap_chain(info->oldSwapchain)) ||
+        !x11_swap_chain_create_persistent_images(sc, dev, info)) {
+        x11_swap_chain_destroy_begin(sc);
+        x11_swap_chain_destroy_end(sc);
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    *sc_ret = sc;
+
+    return VK_SUCCESS;
+}
+
+static void x11_display_destroy(struct intel_x11_display *dpy)
+{
+    intel_free(dpy, dpy->modes);
+    intel_free(dpy, dpy);
+}
+
+void intel_wsi_gpu_cleanup(struct intel_gpu *gpu)
+{
+    if (gpu->displays) {
+        uint32_t i;
+
+        for (i = 0; i < gpu->display_count; i++) {
+            struct intel_x11_display *dpy =
+                (struct intel_x11_display *) gpu->displays[i];
+            x11_display_destroy(dpy);
+        }
+        intel_free(gpu, gpu->displays);
+    }
+}
+
+VkResult intel_wsi_img_init(struct intel_img *img)
+{
+    struct intel_x11_img_data *data;
+
+    data = intel_alloc(img, sizeof(*data), sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!data)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    memset(data, 0, sizeof(*data));
+
+    assert(!img->wsi_data);
+    img->wsi_data = data;
+
+    return VK_SUCCESS;
+}
+
+void intel_wsi_img_cleanup(struct intel_img *img)
+{
+    struct intel_x11_img_data *data =
+        (struct intel_x11_img_data *) img->wsi_data;
+
+    if (data->mem) {
+        close(data->prime_fd);
+        intel_mem_free(data->mem);
+    }
+
+    intel_free(img, img->wsi_data);
+}
+
+VkResult intel_wsi_fence_init(struct intel_fence *fence)
+{
+    struct intel_x11_fence_data *data;
+
+    data = intel_alloc(fence, sizeof(*data), sizeof(int), VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+    if (!data)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    memset(data, 0, sizeof(*data));
+
+    assert(!fence->wsi_data);
+    fence->wsi_data = data;
+
+    return VK_SUCCESS;
+}
+
+void intel_wsi_fence_cleanup(struct intel_fence *fence)
+{
+    intel_free(fence, fence->wsi_data);
+}
+
+void intel_wsi_fence_copy(struct intel_fence *fence,
+                          const struct intel_fence *src)
+{
+    if (!fence->wsi_data) {
+        return;
+    }
+    memcpy(fence->wsi_data, src->wsi_data,
+            sizeof(struct intel_x11_fence_data));
+}
+
+VkResult intel_wsi_fence_wait(struct intel_fence *fence,
+                              int64_t timeout_ns)
+{
+    struct intel_x11_fence_data *data =
+        (struct intel_x11_fence_data *) fence->wsi_data;
+
+    if (!data)
+        return VK_SUCCESS;
+
+    if (!data->swap_chain)
+        return VK_SUCCESS;
+
+    return x11_swap_chain_wait(data->swap_chain, data->serial, timeout_ns);
+}
+
+
+// Note: The following function is only needed if an application uses this ICD
+// directly, without the common Vulkan loader:
+//
+// Create a VkSurfaceKHR object for XCB window connections:
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateXcbSurfaceKHR(
+    VkInstance                              instance,
+    const VkXcbSurfaceCreateInfoKHR*        pCreateInfo,
+    const VkAllocationCallbacks*            pAllocator,
+    VkSurfaceKHR*                           pSurface)
+{
+    return x11_xcb_surface_create((struct intel_instance *) instance,
+                              pCreateInfo,
+                              (VkIcdSurfaceXcb **) pSurface);
+}
+
+VKAPI_ATTR VkBool32 VKAPI_CALL vkGetPhysicalDeviceXcbPresentationSupportKHR(
+    VkPhysicalDevice                            physicalDevice,
+    uint32_t                                    queueFamilyIndex,
+    xcb_connection_t*                           connection,
+    xcb_visualid_t                              visual_id)
+{
+    // Just make sure we have a non-zero connection:
+    if (connection) {
+        return VK_TRUE;
+    } else {
+        return VK_FALSE;
+    }
+}
+
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+// Note: The following function is only needed if an application uses this ICD
+// directly, without the common Vulkan loader:
+//
+// Create a VkSurfaceKHR object for XLIB window connections:
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateXlibSurfaceKHR(
+    VkInstance                              instance,
+    const VkXlibSurfaceCreateInfoKHR*       pCreateInfo,
+    const VkAllocationCallbacks*            pAllocator,
+    VkSurfaceKHR*                           pSurface)
+{
+    return x11_xlib_surface_create((struct intel_instance *) instance,
+                              pCreateInfo,
+                              (VkIcdSurfaceXlib **) pSurface);
+}
+
+VKAPI_ATTR VkBool32 VKAPI_CALL vkGetPhysicalDeviceXlibPresentationSupportKHR(
+    VkPhysicalDevice                            physicalDevice,
+    uint32_t                                    queueFamilyIndex,
+    Display*                                    dpy,
+    VisualID                                    visualID)
+{
+    // Just make sure we have a non-zero display:
+    if (dpy) {
+        return VK_TRUE;
+    } else {
+        return VK_FALSE;
+    }
+}
+#endif // VK_USE_PLATFORM_XLIB_KHR
+
+VKAPI_ATTR void VKAPI_CALL vkDestroySurfaceKHR(
+    VkInstance                               instance,
+    VkSurfaceKHR                             surface,
+    const VkAllocationCallbacks*             pAllocator)
+{
+    x11_surface_destroy(surface);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceSurfaceSupportKHR(
+    VkPhysicalDevice                        physicalDevice,
+    uint32_t                                queueFamilyIndex,
+    VkSurfaceKHR                            surface,
+    VkBool32*                               pSupported)
+{
+    const VkIcdSurfaceXcb *s_xcb = x11_xcb_surface(surface);
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+    const VkIcdSurfaceXlib *s_xlib = x11_xlib_surface(surface);
+    assert(s_xcb || s_xlib);
+#endif // VK_USE_PLATFORM_XLIB_KHR
+
+    *pSupported = VK_FALSE;
+
+    if (s_xcb) {
+        // Just make sure we have a non-zero connection:
+        if (s_xcb->connection) {
+            *pSupported = VK_TRUE;
+        }
+    }
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+    else if (s_xlib) {
+        // Just make sure we have a non-zero display:
+        if (s_xlib->dpy) {
+            *pSupported = VK_TRUE;
+        }
+    }
+#endif // VK_USE_PLATFORM_XLIB_KHR
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceSurfaceCapabilitiesKHR(
+    VkPhysicalDevice                        physicalDevice,
+    VkSurfaceKHR                            surface,
+    VkSurfaceCapabilitiesKHR*               pSurfaceCapabilities)
+{
+    return x11_get_surface_capabilities(surface, pSurfaceCapabilities);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceSurfaceFormatsKHR(
+    VkPhysicalDevice                        physicalDevice,
+    VkSurfaceKHR                            surface,
+    uint32_t*                               pSurfaceFormatCount,
+    VkSurfaceFormatKHR*                     pSurfaceFormats)
+{
+    VkResult ret = VK_SUCCESS;
+
+    if (pSurfaceFormats) {
+        uint32_t i;
+        // Never write more entries than we actually have:
+        if (*pSurfaceFormatCount > ARRAY_SIZE(x11_presentable_formats))
+            *pSurfaceFormatCount = ARRAY_SIZE(x11_presentable_formats);
+        for (i = 0; i < *pSurfaceFormatCount; i++) {
+            pSurfaceFormats[i].format = x11_presentable_formats[i];
+            pSurfaceFormats[i].colorSpace = VK_COLORSPACE_SRGB_NONLINEAR_KHR;
+        }
+    } else {
+        *pSurfaceFormatCount = ARRAY_SIZE(x11_presentable_formats);
+    }
+
+    return ret;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceSurfacePresentModesKHR(
+    VkPhysicalDevice                        physicalDevice,
+    VkSurfaceKHR                            surface,
+    uint32_t*                               pPresentModeCount,
+    VkPresentModeKHR*                       pPresentModes)
+{
+    VkResult ret = VK_SUCCESS;
+
+    if (pPresentModes) {
+        // Write at most the number of modes we support (2):
+        if (*pPresentModeCount > 0)
+            pPresentModes[0] = VK_PRESENT_MODE_IMMEDIATE_KHR;
+        if (*pPresentModeCount > 1)
+            pPresentModes[1] = VK_PRESENT_MODE_FIFO_KHR;
+    } else {
+        *pPresentModeCount = 2;
+    }
+
+    return ret;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateSwapchainKHR(
+    VkDevice                                device,
+    const VkSwapchainCreateInfoKHR*         pCreateInfo,
+    const VkAllocationCallbacks*            pAllocator,
+    VkSwapchainKHR*                         pSwapchain)
+{
+    struct intel_dev *dev = intel_dev(device);
+
+    if (pCreateInfo->oldSwapchain) {
+        // TODO: Eventually do more than simply destroying the oldSwapchain
+        // up front (but that is all we do for now):
+        struct intel_x11_swap_chain *sc =
+            x11_swap_chain(pCreateInfo->oldSwapchain);
+
+        sc->being_deleted = true;
+        x11_swap_chain_destroy_begin(sc);
+    }
+
+    return x11_swap_chain_create(dev, pCreateInfo,
+            (struct intel_x11_swap_chain **) pSwapchain);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroySwapchainKHR(
+    VkDevice                                 device,
+    VkSwapchainKHR                           swapchain,
+    const VkAllocationCallbacks*             pAllocator)
+{
+    struct intel_x11_swap_chain *sc = x11_swap_chain(swapchain);
+
+    if (!sc->being_deleted) {
+        x11_swap_chain_destroy_begin(sc);
+    }
+    x11_swap_chain_destroy_end(sc);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetSwapchainImagesKHR(
+    VkDevice                                 device,
+    VkSwapchainKHR                           swapchain,
+    uint32_t*                                pCount,
+    VkImage*                                 pSwapchainImages)
+{
+    struct intel_x11_swap_chain *sc = x11_swap_chain(swapchain);
+    VkResult ret = VK_SUCCESS;
+
+    // TODOVV: Move this check to a validation layer (i.e. the driver should
+    // assume the correct data type, and not check):
+    if (!pCount) {
+//        return VK_ERROR_INVALID_POINTER;
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    if (pSwapchainImages) {
+        uint32_t i;
+        for (i = 0; i < *pCount; i++) {
+            pSwapchainImages[i] = (VkImage) sc->persistent_images[i];
+        }
+    } else {
+        *pCount = sc->persistent_image_count;
+    }
+
+    return ret;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkAcquireNextImageKHR(
+    VkDevice                                 device,
+    VkSwapchainKHR                           swapchain,
+    uint64_t                                 timeout,
+    VkSemaphore                              semaphore,
+    VkFence                                  fence,
+    uint32_t*                                pImageIndex)
+{
+    struct intel_x11_swap_chain *sc = x11_swap_chain(swapchain);
+    VkResult ret = VK_SUCCESS;
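+    // Note: the semaphore and fence parameters are ignored by this ICD.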
+
+    if (sc->out_of_date) {
+        // The window was resized, and the swapchain must be re-created:
+        return VK_ERROR_OUT_OF_DATE_KHR;
+    }
+
+    // Find an unused image to return:
+    for (int i = 0; i < sc->persistent_image_count; i++) {
+        if (sc->image_state[i] == INTEL_SC_STATE_UNUSED) {
+            sc->image_state[i] = INTEL_SC_STATE_APP_OWNED;
+            *pImageIndex = i;
+            return ret;
+        }
+    }
+
+    // If no image is ready, wait for a present to finish
+    ret = x11_swap_chain_wait(sc, sc->remote.serial+1, timeout);
+    if (ret != VK_SUCCESS) {
+        return ret;
+    }
+
+    // Find an unused image to return:
+    for (int i = 0; i < sc->persistent_image_count; i++) {
+        if (sc->image_state[i] == INTEL_SC_STATE_UNUSED) {
+            sc->image_state[i] = INTEL_SC_STATE_APP_OWNED;
+            *pImageIndex = i;
+            return ret;
+        }
+    }
+    // NOTE: We should never get here; if we somehow do, fail loudly:
+    assert(0);
+    return VK_ERROR_VALIDATION_FAILED_EXT;
+}
+
+
+VKAPI_ATTR VkResult VKAPI_CALL vkQueuePresentKHR(
+    VkQueue                                  queue_,
+    const VkPresentInfoKHR*                  pPresentInfo)
+{
+    struct intel_queue *queue = intel_queue(queue_);
+    uint32_t i;
+    uint32_t num_swapchains = pPresentInfo->swapchainCount;
+    VkResult rtn = VK_SUCCESS;
+
+    // Wait for the queue to go idle before the out-of-band xcb present operation.
+    const VkResult r = intel_queue_wait(queue, -1);
+    (void) r;
+
+    for (i = 0; i < num_swapchains; i++) {
+        struct intel_x11_swap_chain *sc =
+            x11_swap_chain(pPresentInfo->pSwapchains[i]);
+        struct intel_img *img = 
+            sc->persistent_images[pPresentInfo->pImageIndices[i]];
+        struct intel_x11_fence_data *data =
+            (struct intel_x11_fence_data *) queue->fence->wsi_data;
+        VkResult ret;
+
+        if (sc->out_of_date) {
+            // The window was resized, and the swapchain must be re-created:
+            rtn = VK_ERROR_OUT_OF_DATE_KHR;
+            // TODO: Potentially change this to match the result of Bug 14952
+            // (which deals with some of the swapchains being out-of-date, but
+            // not all of them).  For now, just present the swapchains that
+            // aren't out-of-date, and skip the ones that are out-of-date:
+            continue;
+        }
+
+        ret = x11_swap_chain_present_pixmap(sc, img);
+        if (ret != VK_SUCCESS) {
+            return ret;
+        }
+
+        // Record the state change for this image, and add this image to the
+        // present queue for the swap chain:
+        sc->image_state[pPresentInfo->pImageIndices[i]] =
+            INTEL_SC_STATE_QUEUED_FOR_PRESENT;
+        sc->present_queue[sc->present_queue_length++] =
+            pPresentInfo->pImageIndices[i];
+
+        data->swap_chain = sc;
+        data->serial = sc->local.serial;
+        sc->present_serial = sc->local.serial;
+        intel_fence_set_seqno(queue->fence, img->obj.mem->bo);
+    }
+
+    return rtn;
+}
diff --git a/icd/nulldrv/CMakeLists.txt b/icd/nulldrv/CMakeLists.txt
new file mode 100644
index 0000000..b3c4b15
--- /dev/null
+++ b/icd/nulldrv/CMakeLists.txt
@@ -0,0 +1,41 @@
+# Create the nulldrv Vulkan DRI library
+
+if (WIN32)
+    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D_CRT_SECURE_NO_WARNINGS")
+else()
+    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS}")
+endif()
+
+add_custom_command(OUTPUT nulldrv_gpa.c
+    COMMAND ${PYTHON_CMD} ${PROJECT_SOURCE_DIR}/vk-vtgenerate.py Xcb icd-get-proc-addr > nulldrv_gpa.c
+    DEPENDS ${PROJECT_SOURCE_DIR}/vk-vtgenerate.py ${PROJECT_SOURCE_DIR}/vulkan.py)
+
+set(sources
+    nulldrv.c
+    nulldrv_gpa.c
+    )
+
+set(definitions "")
+
+set(libraries
+    icd)
+
+add_library(VK_nulldrv SHARED ${sources})
+target_compile_definitions(VK_nulldrv PRIVATE ${definitions})
+target_include_directories(VK_nulldrv PRIVATE ${include_dirs})
+target_link_libraries(VK_nulldrv ${libraries})
+
+if (WIN32)
+    # Add in the DLL "map" file for vkGetProcAddr()
+    set_target_properties(VK_nulldrv PROPERTIES
+        LINK_FLAGS "/DEF:${PROJECT_SOURCE_DIR}/icd/nulldrv/VK_nulldrv.def")
+else()
+    set_target_properties(VK_nulldrv PROPERTIES
+        LINK_FLAGS "-Wl,-Bsymbolic")
+    if (NOT (CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_CURRENT_BINARY_DIR))
+        add_custom_target(nulldrv_icd-json ALL
+            COMMAND ln -sf ${CMAKE_CURRENT_SOURCE_DIR}/nulldrv_icd.json
+            VERBATIM
+            )
+    endif()
+endif()
diff --git a/icd/nulldrv/README.md b/icd/nulldrv/README.md
new file mode 100644
index 0000000..448d517
--- /dev/null
+++ b/icd/nulldrv/README.md
@@ -0,0 +1,3 @@
+# Null VK Driver
+
+This directory provides a null VK driver: it accepts Vulkan entry points but performs no real rendering work.
diff --git a/icd/nulldrv/VK_nulldrv.def b/icd/nulldrv/VK_nulldrv.def
new file mode 100644
index 0000000..fc21d05
--- /dev/null
+++ b/icd/nulldrv/VK_nulldrv.def
@@ -0,0 +1,30 @@
+;;;; Begin Copyright Notice ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;
+; Copyright (C) 2015-2016 Valve Corporation
+; Copyright (C) 2015-2016 LunarG, Inc.
+;
+; Licensed under the Apache License, Version 2.0 (the "License");
+; you may not use this file except in compliance with the License.
+; You may obtain a copy of the License at
+;
+;     http://www.apache.org/licenses/LICENSE-2.0
+;
+; Unless required by applicable law or agreed to in writing, software
+; distributed under the License is distributed on an "AS IS" BASIS,
+; WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+; See the License for the specific language governing permissions and
+; limitations under the License.
+;;;;  End Copyright Notice ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+; The following is required on Windows, for exporting symbols from the DLL
+
+LIBRARY VK_nulldrv
+EXPORTS
+   vkCreateInstance
+   vkDestroyInstance
+   vkEnumeratePhysicalDevices
+   vkEnumerateInstanceExtensionProperties
+   xcbCreateWindow
+   xcbDestroyWindow
+   xcbGetMessage
+   xcbQueuePresent
diff --git a/icd/nulldrv/nulldrv.c b/icd/nulldrv/nulldrv.c
new file mode 100644
index 0000000..fd62612
--- /dev/null
+++ b/icd/nulldrv/nulldrv.c
@@ -0,0 +1,2235 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: David Pinedo <david@lunarg.com>
+ * Author: Ian Elliott <ian@LunarG.com>
+ * Author: Tony Barbour <tony@LunarG.com>
+ *
+ */
+
+#include "nulldrv.h"
+#include <stdio.h>
+
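+/* Flip this to "#if 1" to log every nulldrv entry point as it is called: */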
+#if 0
+#define NULLDRV_LOG_FUNC \
+    do { \
+        fflush(stdout); \
+        fflush(stderr); \
+        printf("null driver: %s\n", __FUNCTION__); \
+        fflush(stdout); \
+    } while (0)
+#else
+    #define NULLDRV_LOG_FUNC do { } while (0)
+#endif
+
+static const VkExtensionProperties nulldrv_instance_extensions[NULLDRV_INST_EXT_COUNT] = {
+    {
+        .extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
+        .specVersion = VK_KHR_SURFACE_SPEC_VERSION,
+    },
+    {
+        .extensionName = VK_KHR_XCB_SURFACE_EXTENSION_NAME,
+        .specVersion = VK_KHR_XCB_SURFACE_SPEC_VERSION,
+    },
+};
+
+const VkExtensionProperties nulldrv_device_exts[NULLDRV_DEV_EXT_COUNT] = {
+    {
+        .extensionName = VK_KHR_SWAPCHAIN_EXTENSION_NAME,
+        .specVersion = VK_KHR_SWAPCHAIN_SPEC_VERSION,
+    }
+};
+
+static struct nulldrv_base *nulldrv_base(void* base)
+{
+    return (struct nulldrv_base *) base;
+}
+
+static struct nulldrv_base *nulldrv_base_create(
+        struct nulldrv_dev *dev,
+        size_t obj_size,
+        VkDebugReportObjectTypeEXT type)
+{
+    struct nulldrv_base *base;
+
+    if (!obj_size)
+        obj_size = sizeof(*base);
+
+    assert(obj_size >= sizeof(*base));
+
+    base = (struct nulldrv_base*)malloc(obj_size);
+    if (!base)
+        return NULL;
+
+    memset(base, 0, obj_size);
+
+    // Initialize pointer to loader's dispatch table with ICD_LOADER_MAGIC
+    set_loader_magic_value(base);
+
+    if (dev == NULL) {
+        /*
+         * dev is NULL when we are creating the base device object
+         * Set dev now so that debug setup happens correctly
+         */
+        dev = (struct nulldrv_dev *) base;
+    }
+
+
+    base->get_memory_requirements = NULL;
+
+    return base;
+}
+
+static VkResult nulldrv_gpu_add(int devid, const char *primary_node,
+                         const char *render_node, struct nulldrv_gpu **gpu_ret)
+{
+    struct nulldrv_gpu *gpu;
+
+    gpu = malloc(sizeof(*gpu));
+    if (!gpu)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    memset(gpu, 0, sizeof(*gpu));
+
+    // Initialize pointer to loader's dispatch table with ICD_LOADER_MAGIC
+    set_loader_magic_value(gpu);
+
+    *gpu_ret = gpu;
+
+    return VK_SUCCESS;
+}
+
+static VkResult nulldrv_queue_create(struct nulldrv_dev *dev,
+                              uint32_t node_index,
+                              struct nulldrv_queue **queue_ret)
+{
+    struct nulldrv_queue *queue;
+
+    queue = (struct nulldrv_queue *) nulldrv_base_create(dev, sizeof(*queue),
+            VK_DEBUG_REPORT_OBJECT_TYPE_QUEUE_EXT);
+    if (!queue)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    queue->dev = dev;
+
+    *queue_ret = queue;
+
+    return VK_SUCCESS;
+}
+
+static VkResult dev_create_queues(struct nulldrv_dev *dev,
+                                  const VkDeviceQueueCreateInfo *queues,
+                                  uint32_t count)
+{
+    uint32_t i;
+
+    for (i = 0; i < count; i++) {
+        const VkDeviceQueueCreateInfo *q = &queues[i];
+        VkResult ret = VK_SUCCESS;
+
+        if (q->queueCount == 1 && !dev->queues[q->queueFamilyIndex]) {
+            ret = nulldrv_queue_create(dev, q->queueFamilyIndex,
+                    &dev->queues[q->queueFamilyIndex]);
+        }
+
+        if (ret != VK_SUCCESS) {
+            return ret;
+        }
+    }
+
+    return VK_SUCCESS;
+}
+
+static enum nulldrv_dev_ext_type nulldrv_gpu_lookup_extension(
+        const struct nulldrv_gpu *gpu,
+        const char* extensionName)
+{
+    enum nulldrv_dev_ext_type type;
+
+    for (type = 0; type < ARRAY_SIZE(nulldrv_device_exts); type++) {
+        if (strcmp(nulldrv_device_exts[type].extensionName, extensionName) == 0)
+            break;
+    }
+
+    assert(type < NULLDRV_DEV_EXT_COUNT || type == NULLDRV_DEV_EXT_INVALID);
+
+    return type;
+}
+
+static VkResult nulldrv_desc_ooxx_create(struct nulldrv_dev *dev,
+                                  struct nulldrv_desc_ooxx **ooxx_ret)
+{
+    struct nulldrv_desc_ooxx *ooxx;
+
+    ooxx = malloc(sizeof(*ooxx));
+    if (!ooxx) 
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    memset(ooxx, 0, sizeof(*ooxx));
+
+    ooxx->surface_desc_size = 0;
+    ooxx->sampler_desc_size = 0;
+
+    *ooxx_ret = ooxx; 
+
+    return VK_SUCCESS;
+}
+
+static VkResult nulldrv_dev_create(struct nulldrv_gpu *gpu,
+                            const VkDeviceCreateInfo *info,
+                            struct nulldrv_dev **dev_ret)
+{
+    struct nulldrv_dev *dev;
+    uint32_t i;
+    VkResult ret;
+
+    dev = (struct nulldrv_dev *) nulldrv_base_create(NULL, sizeof(*dev),
+            VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    if (!dev)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    for (i = 0; i < info->enabledExtensionCount; i++) {
+        const enum nulldrv_dev_ext_type ext = nulldrv_gpu_lookup_extension(
+                    gpu, info->ppEnabledExtensionNames[i]);
+
+        if (ext == NULLDRV_DEV_EXT_INVALID)
+            return VK_ERROR_EXTENSION_NOT_PRESENT;
+
+        dev->exts[ext] = true;
+    }
+
+    ret = nulldrv_desc_ooxx_create(dev, &dev->desc_ooxx);
+    if (ret != VK_SUCCESS) {
+        return ret;
+    }
+
+    ret = dev_create_queues(dev, info->pQueueCreateInfos,
+            info->queueCreateInfoCount);
+    if (ret != VK_SUCCESS) {
+        return ret;
+    }
+
+    *dev_ret = dev;
+
+    return VK_SUCCESS;
+}
+
+static struct nulldrv_gpu *nulldrv_gpu(VkPhysicalDevice gpu)
+{
+    return (struct nulldrv_gpu *) gpu;
+}
+
+static VkResult nulldrv_fence_create(struct nulldrv_dev *dev,
+                              const VkFenceCreateInfo *info,
+                              struct nulldrv_fence **fence_ret)
+{
+    struct nulldrv_fence *fence;
+
+    fence = (struct nulldrv_fence *) nulldrv_base_create(dev, sizeof(*fence),
+            VK_DEBUG_REPORT_OBJECT_TYPE_FENCE_EXT);
+    if (!fence)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    *fence_ret = fence;
+
+    return VK_SUCCESS;
+}
+
+static struct nulldrv_dev *nulldrv_dev(VkDevice dev)
+{
+    return (struct nulldrv_dev *) dev;
+}
+
+static struct nulldrv_img *nulldrv_img_from_base(struct nulldrv_base *base)
+{
+    return (struct nulldrv_img *) base;
+}
+
+
+static VkResult img_get_memory_requirements(struct nulldrv_base *base,
+                               VkMemoryRequirements *pRequirements)
+{
+    struct nulldrv_img *img = nulldrv_img_from_base(base);
+    VkResult ret = VK_SUCCESS;
+
+    pRequirements->size = img->total_size;
+    pRequirements->alignment = 4096;
+    pRequirements->memoryTypeBits = ~0u;        /* can use any memory type */
+
+    return ret;
+}
+
+static VkResult nulldrv_img_create(struct nulldrv_dev *dev,
+                            const VkImageCreateInfo *info,
+                            bool scanout,
+                            struct nulldrv_img **img_ret)
+{
+    struct nulldrv_img *img;
+
+    img = (struct nulldrv_img *) nulldrv_base_create(dev, sizeof(*img),
+            VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_EXT);
+    if (!img)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    img->type = info->imageType;
+    img->depth = info->extent.depth;
+    img->mip_levels = info->mipLevels;
+    img->array_size = info->arrayLayers;
+    img->usage = info->usage;
+    img->samples = info->samples;
+
+    img->obj.base.get_memory_requirements = img_get_memory_requirements;
+
+    *img_ret = img;
+
+    return VK_SUCCESS;
+}
+
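+/* Non-dispatchable handles are 64-bit values that this driver stores as
+ * object pointers, so unwrap them by type-punning the handle: */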
+static struct nulldrv_img *nulldrv_img(VkImage image)
+{
+    return *(struct nulldrv_img **) &image;
+}
+
+static VkResult nulldrv_mem_alloc(struct nulldrv_dev *dev,
+                           const VkMemoryAllocateInfo *info,
+                           struct nulldrv_mem **mem_ret)
+{
+    struct nulldrv_mem *mem;
+
+    mem = (struct nulldrv_mem *) nulldrv_base_create(dev, sizeof(*mem),
+            VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_MEMORY_EXT);
+    if (!mem)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
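+    /* nulldrv backs "device" memory with a plain host allocation: */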
+    mem->bo = malloc(info->allocationSize);
+    if (!mem->bo) {
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    }
+
+    mem->size = info->allocationSize;
+
+    *mem_ret = mem;
+
+    return VK_SUCCESS;
+}
+
+static VkResult nulldrv_sampler_create(struct nulldrv_dev *dev,
+                                const VkSamplerCreateInfo *info,
+                                struct nulldrv_sampler **sampler_ret)
+{
+    struct nulldrv_sampler *sampler;
+
+    sampler = (struct nulldrv_sampler *) nulldrv_base_create(dev,
+            sizeof(*sampler), VK_DEBUG_REPORT_OBJECT_TYPE_SAMPLER_EXT);
+    if (!sampler)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    *sampler_ret = sampler;
+
+    return VK_SUCCESS;
+}
+
+static VkResult nulldrv_img_view_create(struct nulldrv_dev *dev,
+                                 const VkImageViewCreateInfo *info,
+                                 struct nulldrv_img_view **view_ret)
+{
+    struct nulldrv_img *img = nulldrv_img(info->image);
+    struct nulldrv_img_view *view;
+
+    view = (struct nulldrv_img_view *) nulldrv_base_create(dev, sizeof(*view),
+            VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_VIEW_EXT);
+    if (!view)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    view->img = img;
+
+    view->cmd_len = 8;
+
+    *view_ret = view;
+
+    return VK_SUCCESS;
+}
+
+static void *nulldrv_mem_map(struct nulldrv_mem *mem, VkFlags flags)
+{
+    return mem->bo;
+}
+
+static struct nulldrv_mem *nulldrv_mem(VkDeviceMemory mem)
+{
+    return *(struct nulldrv_mem **) &mem;
+}
+
+static struct nulldrv_buf *nulldrv_buf_from_base(struct nulldrv_base *base)
+{
+    return (struct nulldrv_buf *) base;
+}
+
+static VkResult buf_get_memory_requirements(struct nulldrv_base *base,
+                                VkMemoryRequirements* pMemoryRequirements)
+{
+    struct nulldrv_buf *buf = nulldrv_buf_from_base(base);
+
+    if (pMemoryRequirements == NULL)
+        return VK_SUCCESS;
+
+    pMemoryRequirements->size = buf->size;
+    pMemoryRequirements->alignment = 4096;
+    pMemoryRequirements->memoryTypeBits = 1;    /* nulldrv only has one memory type */
+
+    return VK_SUCCESS;
+}
+
+static VkResult nulldrv_buf_create(struct nulldrv_dev *dev,
+                            const VkBufferCreateInfo *info,
+                            struct nulldrv_buf **buf_ret)
+{
+    struct nulldrv_buf *buf;
+
+    buf = (struct nulldrv_buf *) nulldrv_base_create(dev, sizeof(*buf),
+            VK_DEBUG_REPORT_OBJECT_TYPE_BUFFER_EXT);
+    if (!buf)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    buf->size = info->size;
+    buf->usage = info->usage;
+
+    buf->obj.base.get_memory_requirements = buf_get_memory_requirements;
+
+    *buf_ret = buf;
+
+    return VK_SUCCESS;
+}
+
+static VkResult nulldrv_desc_layout_create(struct nulldrv_dev *dev,
+                                    const VkDescriptorSetLayoutCreateInfo *info,
+                                    struct nulldrv_desc_layout **layout_ret)
+{
+    struct nulldrv_desc_layout *layout;
+
+    layout = (struct nulldrv_desc_layout *)
+        nulldrv_base_create(dev, sizeof(*layout),
+                VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_SET_LAYOUT_EXT);
+    if (!layout)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    *layout_ret = layout;
+
+    return VK_SUCCESS;
+}
+
+static VkResult nulldrv_pipeline_layout_create(struct nulldrv_dev *dev,
+                                    const VkPipelineLayoutCreateInfo* pCreateInfo,
+                                    struct nulldrv_pipeline_layout **pipeline_layout_ret)
+{
+    struct nulldrv_pipeline_layout *pipeline_layout;
+
+    pipeline_layout = (struct nulldrv_pipeline_layout *)
+        nulldrv_base_create(dev, sizeof(*pipeline_layout),
+                VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_LAYOUT_EXT);
+    if (!pipeline_layout)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    *pipeline_layout_ret = pipeline_layout;
+
+    return VK_SUCCESS;
+}
+
+static struct nulldrv_desc_layout *nulldrv_desc_layout(const VkDescriptorSetLayout layout)
+{
+    return *(struct nulldrv_desc_layout **) &layout;
+}
+
+static VkResult graphics_pipeline_create(struct nulldrv_dev *dev,
+                                           const VkGraphicsPipelineCreateInfo *info_,
+                                           struct nulldrv_pipeline **pipeline_ret)
+{
+    struct nulldrv_pipeline *pipeline;
+
+    pipeline = (struct nulldrv_pipeline *)
+        nulldrv_base_create(dev, sizeof(*pipeline), 
+                VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_EXT);
+    if (!pipeline)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    *pipeline_ret = pipeline;
+
+    return VK_SUCCESS;
+}
+
+static VkResult nulldrv_cmd_create(struct nulldrv_dev *dev,
+                            const VkCommandBufferAllocateInfo *info,
+                            struct nulldrv_cmd **cmd_ret)
+{
+    struct nulldrv_cmd *cmd;
+    uint32_t num_allocated = 0;
+
+    for (uint32_t i = 0; i < info->commandBufferCount; i++) {
+        cmd = (struct nulldrv_cmd *) nulldrv_base_create(dev, sizeof(*cmd),
+                VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+        if (!cmd) {
+            for (uint32_t j = 0; j < num_allocated; j++) {
+                free(cmd_ret[j]);
+            }
+            return VK_ERROR_OUT_OF_HOST_MEMORY;
+        }
+        num_allocated++;
+        cmd_ret[i] = cmd;
+    }
+
+    return VK_SUCCESS;
+}
+
+static VkResult nulldrv_desc_pool_create(struct nulldrv_dev *dev,
+                                    const VkDescriptorPoolCreateInfo *info,
+                                    struct nulldrv_desc_pool **pool_ret)
+{
+    struct nulldrv_desc_pool *pool;
+
+    pool = (struct nulldrv_desc_pool *)
+        nulldrv_base_create(dev, sizeof(*pool),
+                VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_POOL_EXT);
+    if (!pool)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    pool->dev = dev;
+
+    *pool_ret = pool;
+
+    return VK_SUCCESS;
+}
+
+static VkResult nulldrv_desc_set_create(struct nulldrv_dev *dev,
+                                 struct nulldrv_desc_pool *pool,
+                                 const struct nulldrv_desc_layout *layout,
+                                 struct nulldrv_desc_set **set_ret)
+{
+    struct nulldrv_desc_set *set;
+
+    set = (struct nulldrv_desc_set *)
+        nulldrv_base_create(dev, sizeof(*set), 
+                VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_SET_EXT);
+    if (!set)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    set->ooxx = dev->desc_ooxx;
+    set->layout = layout;
+    *set_ret = set;
+
+    return VK_SUCCESS;
+}
+
+static struct nulldrv_desc_pool *nulldrv_desc_pool(VkDescriptorPool pool)
+{
+    return *(struct nulldrv_desc_pool **) &pool;
+}
+
+static VkResult nulldrv_fb_create(struct nulldrv_dev *dev,
+                           const VkFramebufferCreateInfo* info,
+                           struct nulldrv_framebuffer ** fb_ret)
+{
+
+    struct nulldrv_framebuffer *fb;
+    fb = (struct nulldrv_framebuffer *) nulldrv_base_create(dev, sizeof(*fb),
+            VK_DEBUG_REPORT_OBJECT_TYPE_FRAMEBUFFER_EXT);
+    if (!fb)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    *fb_ret = fb;
+
+    return VK_SUCCESS;
+
+}
+
+static VkResult nulldrv_render_pass_create(struct nulldrv_dev *dev,
+                           const VkRenderPassCreateInfo* info,
+                           struct nulldrv_render_pass** rp_ret)
+{
+    struct nulldrv_render_pass *rp;
+    rp = (struct nulldrv_render_pass *) nulldrv_base_create(dev, sizeof(*rp),
+            VK_DEBUG_REPORT_OBJECT_TYPE_RENDER_PASS_EXT);
+    if (!rp)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    *rp_ret = rp;
+
+    return VK_SUCCESS;
+}
+
+static struct nulldrv_buf *nulldrv_buf(VkBuffer buf)
+{
+    return *(struct nulldrv_buf **) &buf;
+}
+
+static VkResult nulldrv_buf_view_create(struct nulldrv_dev *dev,
+                                 const VkBufferViewCreateInfo *info,
+                                 struct nulldrv_buf_view **view_ret)
+{
+    struct nulldrv_buf *buf = nulldrv_buf(info->buffer);
+    struct nulldrv_buf_view *view;
+
+    view = (struct nulldrv_buf_view *) nulldrv_base_create(dev, sizeof(*view),
+            VK_DEBUG_REPORT_OBJECT_TYPE_BUFFER_VIEW_EXT);
+    if (!view)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    view->buf = buf;
+
+    *view_ret = view;
+
+    return VK_SUCCESS;
+}
+
+
+//*********************************************
+// Driver entry points
+//*********************************************
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateBuffer(
+    VkDevice                                  device,
+    const VkBufferCreateInfo*               pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkBuffer*                                 pBuffer)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_buf_create(dev, pCreateInfo, (struct nulldrv_buf **) pBuffer);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyBuffer(
+    VkDevice                                  device,
+    VkBuffer                                  buffer,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateCommandPool(
+    VkDevice                                    device,
+    const VkCommandPoolCreateInfo*                  pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkCommandPool*                                  pCommandPool)
+{
+    NULLDRV_LOG_FUNC;
+    static VkCommandPool pool = (VkCommandPool)1;
+    *pCommandPool = pool;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyCommandPool(
+    VkDevice                                    device,
+    VkCommandPool                                   commandPool,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkResetCommandPool(
+    VkDevice                                    device,
+    VkCommandPool                                   commandPool,
+    VkCommandPoolResetFlags                         flags)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkAllocateCommandBuffers(
+    VkDevice                                  device,
+    const VkCommandBufferAllocateInfo*               pAllocateInfo,
+    VkCommandBuffer*                              pCommandBuffers)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_cmd_create(dev, pAllocateInfo,
+            (struct nulldrv_cmd **) pCommandBuffers);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkFreeCommandBuffers(
+    VkDevice                                  device,
+    VkCommandPool                                 commandPool,
+    uint32_t                                  commandBufferCount,
+    const VkCommandBuffer*                        pCommandBuffers)
+{
+    NULLDRV_LOG_FUNC;
+    for (uint32_t i = 0; i < commandBufferCount; i++) {
+        free(pCommandBuffers[i]);
+    }
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkBeginCommandBuffer(
+    VkCommandBuffer                              commandBuffer,
+    const VkCommandBufferBeginInfo            *info)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEndCommandBuffer(
+    VkCommandBuffer                              commandBuffer)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkResetCommandBuffer(
+    VkCommandBuffer                              commandBuffer,
+    VkCommandBufferResetFlags flags)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+static const VkFormat nulldrv_presentable_formats[] = {
+    VK_FORMAT_B8G8R8A8_UNORM,
+};
+
+VKAPI_ATTR void VKAPI_CALL vkDestroySurfaceKHR(
+    VkInstance                                   instance,
+    VkSurfaceKHR                                 surface,
+    const VkAllocationCallbacks*                pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceSurfaceSupportKHR(
+    VkPhysicalDevice                        physicalDevice,
+    uint32_t                                queueFamilyIndex,
+    VkSurfaceKHR                            surface,
+    VkBool32*                               pSupported)
+{
+    NULLDRV_LOG_FUNC;
+
+    *pSupported = VK_TRUE;
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceSurfaceCapabilitiesKHR(
+    VkPhysicalDevice                         physicalDevice,
+    VkSurfaceKHR                             surface,
+    VkSurfaceCapabilitiesKHR*                pSurfaceCapabilities)
+{
+    NULLDRV_LOG_FUNC;
+
+    pSurfaceCapabilities->minImageCount = 1;
+    pSurfaceCapabilities->maxImageCount = 0;  /* any number of images! */
+    pSurfaceCapabilities->currentExtent.width = (uint32_t) -1;   /* 0xFFFFFFFF: extent is determined by the swapchain */
+    pSurfaceCapabilities->currentExtent.height = (uint32_t) -1;
+    pSurfaceCapabilities->minImageExtent.width = 1;
+    pSurfaceCapabilities->minImageExtent.height = 1;
+    pSurfaceCapabilities->maxImageExtent.width = 1024;
+    pSurfaceCapabilities->maxImageExtent.height = 1024;
+    pSurfaceCapabilities->maxImageArrayLayers = 1;
+    pSurfaceCapabilities->supportedTransforms = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
+    pSurfaceCapabilities->currentTransform = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
+    pSurfaceCapabilities->supportedCompositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
+    pSurfaceCapabilities->supportedUsageFlags = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceSurfaceFormatsKHR(
+    VkPhysicalDevice                         physicalDevice,
+    VkSurfaceKHR                             surface,
+    uint32_t*                                pSurfaceFormatCount,
+    VkSurfaceFormatKHR*                      pSurfaceFormats)
+{
+    NULLDRV_LOG_FUNC;
+
+    if (pSurfaceFormats) {
+        uint32_t i;
+        /* Clamp to what the driver advertises so an oversized request
+         * cannot read past nulldrv_presentable_formats. */
+        if (*pSurfaceFormatCount > ARRAY_SIZE(nulldrv_presentable_formats))
+            *pSurfaceFormatCount = ARRAY_SIZE(nulldrv_presentable_formats);
+        for (i = 0; i < *pSurfaceFormatCount; i++) {
+            pSurfaceFormats[i].format = nulldrv_presentable_formats[i];
+            pSurfaceFormats[i].colorSpace = VK_COLORSPACE_SRGB_NONLINEAR_KHR;
+        }
+    } else {
+        *pSurfaceFormatCount = ARRAY_SIZE(nulldrv_presentable_formats);
+    }
+
+    return VK_SUCCESS;
+}
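+
+/*
+ * Illustrative only (not part of this patch): callers drive the enumerators
+ * above with the standard two-call idiom -- query the count with a NULL
+ * output pointer, then fill:
+ *
+ *     uint32_t count = 0;
+ *     vkGetPhysicalDeviceSurfaceFormatsKHR(gpu, surface, &count, NULL);
+ *     VkSurfaceFormatKHR *formats = malloc(count * sizeof(*formats));
+ *     vkGetPhysicalDeviceSurfaceFormatsKHR(gpu, surface, &count, formats);
+ *     // ... use formats[0..count-1], then free(formats)
+ *
+ * where gpu and surface are a hypothetical VkPhysicalDevice and
+ * VkSurfaceKHR already owned by the caller.
+ */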
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceSurfacePresentModesKHR(
+    VkPhysicalDevice                         physicalDevice,
+    VkSurfaceKHR                             surface,
+    uint32_t*                                pPresentModeCount,
+    VkPresentModeKHR*                        pPresentModes)
+{
+    NULLDRV_LOG_FUNC;
+
+    if (pPresentModes) {
+        /* Write at most as many modes as the caller asked for. */
+        if (*pPresentModeCount >= 1)
+            pPresentModes[0] = VK_PRESENT_MODE_IMMEDIATE_KHR;
+        if (*pPresentModeCount >= 2)
+            pPresentModes[1] = VK_PRESENT_MODE_FIFO_KHR;
+    } else {
+        *pPresentModeCount = 2;
+    }
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateSwapchainKHR(
+    VkDevice                                device,
+    const VkSwapchainCreateInfoKHR*         pCreateInfo,
+    const VkAllocationCallbacks*            pAllocator,
+    VkSwapchainKHR*                         pSwapchain)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+    struct nulldrv_swap_chain *sc;
+
+    sc = (struct nulldrv_swap_chain *) nulldrv_base_create(dev, sizeof(*sc),
+            VK_DEBUG_REPORT_OBJECT_TYPE_SWAPCHAIN_KHR_EXT);
+    if (!sc) {
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    }
+    sc->dev = dev;
+
+    *pSwapchain = (VkSwapchainKHR) sc;
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroySwapchainKHR(
+    VkDevice                                device,
+    VkSwapchainKHR                          swapchain,
+    const VkAllocationCallbacks*            pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_swap_chain *sc = *(struct nulldrv_swap_chain **) &swapchain;
+
+    free(sc);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetSwapchainImagesKHR(
+    VkDevice                                 device,
+    VkSwapchainKHR                           swapchain,
+    uint32_t*                                pSwapchainImageCount,
+    VkImage*                                 pSwapchainImages)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_swap_chain *sc = *(struct nulldrv_swap_chain **) &swapchain;
+    struct nulldrv_dev *dev = sc->dev;
+    VkResult ret = VK_SUCCESS;
+
+    *pSwapchainImageCount = 2;
+    if (pSwapchainImages) {
+        uint32_t i;
+        for (i = 0; i < 2; i++) {
+                struct nulldrv_img *img;
+
+                img = (struct nulldrv_img *) nulldrv_base_create(dev,
+                        sizeof(*img),
+                        VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_EXT);
+                if (!img)
+                    return VK_ERROR_OUT_OF_HOST_MEMORY;
+            pSwapchainImages[i] = (VkImage) img;
+        }
+    }
+
+    return ret;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkAcquireNextImageKHR(
+    VkDevice                                 device,
+    VkSwapchainKHR                           swapchain,
+    uint64_t                                 timeout,
+    VkSemaphore                              semaphore,
+    VkFence                                  fence,
+    uint32_t*                                pImageIndex)
+{
+    NULLDRV_LOG_FUNC;
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkQueuePresentKHR(
+    VkQueue                                  queue_,
+    const VkPresentInfoKHR*                  pPresentInfo)
+{
+    NULLDRV_LOG_FUNC;
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdCopyBuffer(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  srcBuffer,
+    VkBuffer                                  dstBuffer,
+    uint32_t                                    regionCount,
+    const VkBufferCopy*                      pRegions)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdCopyImage(
+    VkCommandBuffer                              commandBuffer,
+    VkImage                                   srcImage,
+    VkImageLayout                            srcImageLayout,
+    VkImage                                   dstImage,
+    VkImageLayout                            dstImageLayout,
+    uint32_t                                    regionCount,
+    const VkImageCopy*                       pRegions)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBlitImage(
+    VkCommandBuffer                              commandBuffer,
+    VkImage                                  srcImage,
+    VkImageLayout                            srcImageLayout,
+    VkImage                                  dstImage,
+    VkImageLayout                            dstImageLayout,
+    uint32_t                                 regionCount,
+    const VkImageBlit*                       pRegions,
+    VkFilter                                 filter)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdCopyBufferToImage(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  srcBuffer,
+    VkImage                                   dstImage,
+    VkImageLayout                            dstImageLayout,
+    uint32_t                                    regionCount,
+    const VkBufferImageCopy*                pRegions)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdCopyImageToBuffer(
+    VkCommandBuffer                              commandBuffer,
+    VkImage                                   srcImage,
+    VkImageLayout                            srcImageLayout,
+    VkBuffer                                  dstBuffer,
+    uint32_t                                    regionCount,
+    const VkBufferImageCopy*                pRegions)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdUpdateBuffer(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  dstBuffer,
+    VkDeviceSize                                dstOffset,
+    VkDeviceSize                                dataSize,
+    const void*                                 pData)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdFillBuffer(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  dstBuffer,
+    VkDeviceSize                                dstOffset,
+    VkDeviceSize                                size,
+    uint32_t                                    data)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdClearDepthStencilImage(
+    VkCommandBuffer                                 commandBuffer,
+    VkImage                                     image,
+    VkImageLayout                               imageLayout,
+    const VkClearDepthStencilValue*             pDepthStencil,
+    uint32_t                                    rangeCount,
+    const VkImageSubresourceRange*              pRanges)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdClearAttachments(
+    VkCommandBuffer                                 commandBuffer,
+    uint32_t                                    attachmentCount,
+    const VkClearAttachment*                    pAttachments,
+    uint32_t                                    rectCount,
+    const VkClearRect*                          pRects)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdClearColorImage(
+    VkCommandBuffer                         commandBuffer,
+    VkImage                             image,
+    VkImageLayout                       imageLayout,
+    const VkClearColorValue            *pColor,
+    uint32_t                            rangeCount,
+    const VkImageSubresourceRange*      pRanges)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdClearDepthStencil(
+    VkCommandBuffer                              commandBuffer,
+    VkImage                                   image,
+    VkImageLayout                            imageLayout,
+    float                                       depth,
+    uint32_t                                    stencil,
+    uint32_t                                    rangeCount,
+    const VkImageSubresourceRange*          pRanges)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdResolveImage(
+    VkCommandBuffer                              commandBuffer,
+    VkImage                                   srcImage,
+    VkImageLayout                            srcImageLayout,
+    VkImage                                   dstImage,
+    VkImageLayout                            dstImageLayout,
+    uint32_t                                    regionCount,
+    const VkImageResolve*                    pRegions)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBeginQuery(
+    VkCommandBuffer                              commandBuffer,
+    VkQueryPool                              queryPool,
+    uint32_t                                    slot,
+    VkFlags                                   flags)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdEndQuery(
+    VkCommandBuffer                              commandBuffer,
+    VkQueryPool                              queryPool,
+    uint32_t                                    slot)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdResetQueryPool(
+    VkCommandBuffer                              commandBuffer,
+    VkQueryPool                              queryPool,
+    uint32_t                                    firstQuery,
+    uint32_t                                    queryCount)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetEvent(
+    VkCommandBuffer                              commandBuffer,
+    VkEvent                                  event_,
+    VkPipelineStageFlags                     stageMask)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdResetEvent(
+    VkCommandBuffer                              commandBuffer,
+    VkEvent                                  event_,
+    VkPipelineStageFlags                     stageMask)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdCopyQueryPoolResults(
+    VkCommandBuffer                                 commandBuffer,
+    VkQueryPool                                 queryPool,
+    uint32_t                                    firstQuery,
+    uint32_t                                    queryCount,
+    VkBuffer                                    dstBuffer,
+    VkDeviceSize                                dstOffset,
+    VkDeviceSize                                destStride,
+    VkFlags                                     flags)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdWriteTimestamp(
+    VkCommandBuffer                              commandBuffer,
+    VkPipelineStageFlagBits                     pipelineStage,
+    VkQueryPool                                 queryPool,
+    uint32_t                                    slot)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBindPipeline(
+    VkCommandBuffer                              commandBuffer,
+    VkPipelineBindPoint                      pipelineBindPoint,
+    VkPipeline                               pipeline)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetViewport(VkCommandBuffer commandBuffer, uint32_t firstViewport, uint32_t viewportCount, const VkViewport* pViewports)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetScissor(VkCommandBuffer commandBuffer, uint32_t firstScissor, uint32_t scissorCount, const VkRect2D* pScissors)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetLineWidth(VkCommandBuffer commandBuffer, float lineWidth)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetDepthBias(VkCommandBuffer commandBuffer, float depthBiasConstantFactor, float depthBiasClamp, float depthBiasSlopeFactor)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetBlendConstants(VkCommandBuffer commandBuffer, const float blendConstants[4])
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetDepthBounds(VkCommandBuffer commandBuffer, float minDepthBounds, float maxDepthBounds)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetStencilCompareMask(VkCommandBuffer commandBuffer, VkStencilFaceFlags faceMask, uint32_t compareMask)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetStencilWriteMask(VkCommandBuffer commandBuffer, VkStencilFaceFlags faceMask, uint32_t writeMask)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdSetStencilReference(VkCommandBuffer commandBuffer, VkStencilFaceFlags faceMask, uint32_t reference)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBindDescriptorSets(
+    VkCommandBuffer                              commandBuffer,
+    VkPipelineBindPoint                     pipelineBindPoint,
+    VkPipelineLayout                        layout,
+    uint32_t                                firstSet,
+    uint32_t                                descriptorSetCount,
+    const VkDescriptorSet*                  pDescriptorSets,
+    uint32_t                                dynamicOffsetCount,
+    const uint32_t*                         pDynamicOffsets)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBindVertexBuffers(
+    VkCommandBuffer                                 commandBuffer,
+    uint32_t                                        firstBinding,
+    uint32_t                                        bindingCount,
+    const VkBuffer*                                 pBuffers,
+    const VkDeviceSize*                             pOffsets)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBindIndexBuffer(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  buffer,
+    VkDeviceSize                                offset,
+    VkIndexType                              indexType)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdDraw(
+    VkCommandBuffer                                 commandBuffer,
+    uint32_t                                    vertexCount,
+    uint32_t                                    instanceCount,
+    uint32_t                                    firstVertex,
+    uint32_t                                    firstInstance)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdDrawIndexed(
+    VkCommandBuffer                              commandBuffer,
+    uint32_t                                    indexCount,
+    uint32_t                                    instanceCount,
+    uint32_t                                    firstIndex,
+    int32_t                                     vertexOffset,
+    uint32_t                                    firstInstance)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdDrawIndirect(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  buffer,
+    VkDeviceSize                                offset,
+    uint32_t                                    drawCount,
+    uint32_t                                    stride)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdDrawIndexedIndirect(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  buffer,
+    VkDeviceSize                                offset,
+    uint32_t                                    drawCount,
+    uint32_t                                    stride)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdDispatch(
+    VkCommandBuffer                              commandBuffer,
+    uint32_t                                    x,
+    uint32_t                                    y,
+    uint32_t                                    z)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdDispatchIndirect(
+    VkCommandBuffer                              commandBuffer,
+    VkBuffer                                  buffer,
+    VkDeviceSize                                offset)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdWaitEvents(
+    VkCommandBuffer                             commandBuffer,
+    uint32_t                                    eventCount,
+    const VkEvent*                              pEvents,
+    VkPipelineStageFlags                        srcStageMask,
+    VkPipelineStageFlags                        dstStageMask,
+    uint32_t                                    memoryBarrierCount,
+    const VkMemoryBarrier*                      pMemoryBarriers,
+    uint32_t                                    bufferMemoryBarrierCount,
+    const VkBufferMemoryBarrier*                pBufferMemoryBarriers,
+    uint32_t                                    imageMemoryBarrierCount,
+    const VkImageMemoryBarrier*                 pImageMemoryBarriers)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdPipelineBarrier(
+    VkCommandBuffer                             commandBuffer,
+    VkPipelineStageFlags                        srcStageMask,
+    VkPipelineStageFlags                        dstStageMask,
+    VkDependencyFlags                           dependencyFlags,
+    uint32_t                                    memoryBarrierCount,
+    const VkMemoryBarrier*                      pMemoryBarriers,
+    uint32_t                                    bufferMemoryBarrierCount,
+    const VkBufferMemoryBarrier*                pBufferMemoryBarriers,
+    uint32_t                                    imageMemoryBarrierCount,
+    const VkImageMemoryBarrier*                 pImageMemoryBarriers)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateDevice(
+    VkPhysicalDevice                            gpu_,
+    const VkDeviceCreateInfo*               pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkDevice*                                 pDevice)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_gpu *gpu = nulldrv_gpu(gpu_);
+    return nulldrv_dev_create(gpu, pCreateInfo, (struct nulldrv_dev**)pDevice);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyDevice(
+    VkDevice                                  device,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetDeviceQueue(
+    VkDevice                                  device,
+    uint32_t                                    queueNodeIndex,
+    uint32_t                                    queueIndex,
+    VkQueue*                                  pQueue)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+    *pQueue = (VkQueue) dev->queues[0];
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkDeviceWaitIdle(
+    VkDevice                                  device)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateEvent(
+    VkDevice                                  device,
+    const VkEventCreateInfo*                pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkEvent*                                  pEvent)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyEvent(
+    VkDevice                                  device,
+    VkEvent                                   event,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetEventStatus(
+    VkDevice                                  device,
+    VkEvent                                   event_)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkSetEvent(
+    VkDevice                                  device,
+    VkEvent                                   event_)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkResetEvent(
+    VkDevice                                  device,
+    VkEvent                                   event_)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateFence(
+    VkDevice                                  device,
+    const VkFenceCreateInfo*                pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkFence*                                  pFence)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_fence_create(dev, pCreateInfo,
+            (struct nulldrv_fence **) pFence);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyFence(
+    VkDevice                                  device,
+    VkFence                                  fence,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetFenceStatus(
+    VkDevice                                  device,
+    VkFence                                   fence_)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkResetFences(
+    VkDevice                    device,
+    uint32_t                    fenceCount,
+    const VkFence*              pFences)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkWaitForFences(
+    VkDevice                                  device,
+    uint32_t                                    fenceCount,
+    const VkFence*                            pFences,
+    VkBool32                                    waitAll,
+    uint64_t                                    timeout)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetPhysicalDeviceProperties(
+    VkPhysicalDevice                             gpu_,
+    VkPhysicalDeviceProperties*                  pProperties)
+{
+    NULLDRV_LOG_FUNC;
+
+    pProperties->apiVersion = VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION);
+    pProperties->driverVersion = 0; // zeros are appropriate for the null driver
+    pProperties->vendorID = 0;
+    pProperties->deviceID = 0;
+    pProperties->deviceType = VK_PHYSICAL_DEVICE_TYPE_OTHER;
+    /* strncpy with strlen("nulldrv") would drop the terminator; sized by the
+     * destination, the name is terminated and zero-padded. */
+    strncpy(pProperties->deviceName, "nulldrv", sizeof(pProperties->deviceName));
+
+    /* TODO: fill out limits */
+    memset(&pProperties->limits, 0, sizeof(VkPhysicalDeviceLimits));
+    memset(&pProperties->sparseProperties, 0, sizeof(VkPhysicalDeviceSparseProperties));
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetPhysicalDeviceFeatures(
+    VkPhysicalDevice                          physicalDevice,
+    VkPhysicalDeviceFeatures*                 pFeatures)
+{
+    NULLDRV_LOG_FUNC;
+
+    /* nulldrv "implements" all vulkan features -- by doing nothing */
+    memset(pFeatures, VK_TRUE, sizeof(*pFeatures));
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetPhysicalDeviceFormatProperties(
+    VkPhysicalDevice                          physicalDevice,
+    VkFormat                                  format,
+    VkFormatProperties*                       pFormatInfo)
+{
+    NULLDRV_LOG_FUNC;
+
+    pFormatInfo->linearTilingFeatures = VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT |
+        VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT;
+    pFormatInfo->optimalTilingFeatures = pFormatInfo->linearTilingFeatures;
+    pFormatInfo->bufferFeatures = 0;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetPhysicalDeviceQueueFamilyProperties(
+    VkPhysicalDevice                             gpu_,
+    uint32_t*                                    pQueueFamilyPropertyCount,
+    VkQueueFamilyProperties*                     pProperties)
+{
+    NULLDRV_LOG_FUNC;
+    if (pProperties == NULL) {
+        *pQueueFamilyPropertyCount = 1;
+        return;
+    }
+    *pQueueFamilyPropertyCount = 1;
+    pProperties->queueFlags = VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_SPARSE_BINDING_BIT;
+    pProperties->queueCount = 1;
+    pProperties->timestampValidBits = 0;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetPhysicalDeviceMemoryProperties(
+    VkPhysicalDevice gpu_,
+    VkPhysicalDeviceMemoryProperties* pProperties)
+{
+    // The null driver pretends to have a single memory type (and a single heap).
+    pProperties->memoryTypeCount = 1;
+    pProperties->memoryHeapCount = 1;
+    pProperties->memoryTypes[0].heapIndex = 0;
+    pProperties->memoryTypes[0].propertyFlags =
+        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
+        VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
+        VK_MEMORY_PROPERTY_HOST_CACHED_BIT |
+        VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT;
+    pProperties->memoryHeaps[0].flags = 0;      /* not device local */
+    pProperties->memoryHeaps[0].size = 0;       /* it's just malloc-backed memory */
+}
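+
+/*
+ * Illustrative only (not part of this patch): a minimal sketch of how an
+ * application would pick a memory type from the single type advertised
+ * above (gpu is a hypothetical VkPhysicalDevice):
+ *
+ *     VkPhysicalDeviceMemoryProperties props;
+ *     vkGetPhysicalDeviceMemoryProperties(gpu, &props);
+ *     uint32_t type = UINT32_MAX;
+ *     for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
+ *         if (props.memoryTypes[i].propertyFlags &
+ *                 VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) {
+ *             type = i;
+ *             break;
+ *         }
+ *     }
+ *     // against this driver, type resolves to 0
+ */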
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateDeviceLayerProperties(
+        VkPhysicalDevice                            physicalDevice,
+        uint32_t*                                   pPropertyCount,
+        VkLayerProperties*                          pProperties)
+{
+    *pPropertyCount = 0;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateInstanceExtensionProperties(
+    const char*                                 pLayerName,
+    uint32_t*                                   pPropertyCount,
+    VkExtensionProperties*                      pProperties)
+{
+    uint32_t copy_size;
+
+    if (pProperties == NULL) {
+        *pPropertyCount = NULLDRV_INST_EXT_COUNT;
+        return VK_SUCCESS;
+    }
+
+    copy_size = *pPropertyCount < NULLDRV_INST_EXT_COUNT ? *pPropertyCount : NULLDRV_INST_EXT_COUNT;
+    memcpy(pProperties, nulldrv_instance_extensions, copy_size * sizeof(VkExtensionProperties));
+    *pPropertyCount = copy_size;
+    if (copy_size < NULLDRV_INST_EXT_COUNT) {
+        return VK_INCOMPLETE;
+    }
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateInstanceLayerProperties(
+        uint32_t*                                   pPropertyCount,
+        VkLayerProperties*                          pProperties)
+{
+    *pPropertyCount = 0;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateDeviceExtensionProperties(
+    VkPhysicalDevice                            physicalDevice,
+    const char*                                 pLayerName,
+    uint32_t*                                   pPropertyCount,
+    VkExtensionProperties*                      pProperties)
+{
+    uint32_t copy_size;
+    uint32_t extension_count = ARRAY_SIZE(nulldrv_device_exts);
+
+    if (pProperties == NULL) {
+        *pPropertyCount = extension_count;
+        return VK_SUCCESS;
+    }
+
+    copy_size = *pPropertyCount < extension_count ? *pPropertyCount : extension_count;
+    memcpy(pProperties, nulldrv_device_exts, copy_size * sizeof(VkExtensionProperties));
+    *pPropertyCount = copy_size;
+    if (copy_size < extension_count) {
+        return VK_INCOMPLETE;
+    }
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateImage(
+    VkDevice                                  device,
+    const VkImageCreateInfo*                pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkImage*                                  pImage)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_img_create(dev, pCreateInfo, false,
+            (struct nulldrv_img **) pImage);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyImage(
+    VkDevice                                  device,
+    VkImage                                   image,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetImageSubresourceLayout(
+    VkDevice                                    device,
+    VkImage                                     image,
+    const VkImageSubresource*                   pSubresource,
+    VkSubresourceLayout*                         pLayout)
+{
+    NULLDRV_LOG_FUNC;
+
+    pLayout->offset = 0;
+    pLayout->size = 1;
+    pLayout->rowPitch = 4;
+    pLayout->depthPitch = 4;
+    pLayout->arrayPitch = 4;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkAllocateMemory(
+    VkDevice                                  device,
+    const VkMemoryAllocateInfo*                pAllocateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkDeviceMemory*                             pMemory)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_mem_alloc(dev, pAllocateInfo, (struct nulldrv_mem **) pMemory);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkFreeMemory(
+    VkDevice                                    device,
+    VkDeviceMemory                              mem_,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkMapMemory(
+    VkDevice                                    device,
+    VkDeviceMemory                              mem_,
+    VkDeviceSize                                offset,
+    VkDeviceSize                                size,
+    VkFlags                                     flags,
+    void**                                      ppData)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_mem *mem = nulldrv_mem(mem_);
+    void *ptr = nulldrv_mem_map(mem, flags);
+
+    *ppData = ptr;
+
+    return (ptr) ? VK_SUCCESS : VK_ERROR_MEMORY_MAP_FAILED;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkUnmapMemory(
+    VkDevice                                    device,
+    VkDeviceMemory                              mem_)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkFlushMappedMemoryRanges(
+    VkDevice                                  device,
+    uint32_t                                  memoryRangeCount,
+    const VkMappedMemoryRange*                pMemoryRanges)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkInvalidateMappedMemoryRanges(
+    VkDevice                                  device,
+    uint32_t                                  memoryRangeCount,
+    const VkMappedMemoryRange*                pMemoryRanges)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetDeviceMemoryCommitment(
+    VkDevice                                  device,
+    VkDeviceMemory                            memory,
+    VkDeviceSize*                             pCommittedMemoryInBytes)
+{
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateInstance(
+    const VkInstanceCreateInfo*             pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkInstance*                               pInstance)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_instance *inst;
+
+    inst = (struct nulldrv_instance *) nulldrv_base_create(NULL, sizeof(*inst),
+                VK_DEBUG_REPORT_OBJECT_TYPE_INSTANCE_EXT);
+    if (!inst)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    inst->obj.base.get_memory_requirements = NULL;
+
+    *pInstance = (VkInstance) inst;
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyInstance(
+    VkInstance                                instance,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEnumeratePhysicalDevices(
+    VkInstance                                instance,
+    uint32_t*                                   pGpuCount,
+    VkPhysicalDevice*                           pGpus)
+{
+    NULLDRV_LOG_FUNC;
+    VkResult ret;
+    struct nulldrv_gpu *gpu;
+    *pGpuCount = 1;
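+    /* Note: a fresh gpu object is allocated on every call and never freed;
+     * acceptable for a null driver, but a real ICD would cache it. */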
+    ret = nulldrv_gpu_add(0, 0, 0, &gpu);
+    if (ret == VK_SUCCESS && pGpus)
+        pGpus[0] = (VkPhysicalDevice) gpu;
+    return ret;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateLayers(
+    VkPhysicalDevice                            gpu,
+    size_t                                      maxStringSize,
+    size_t*                                     pLayerCount,
+    char* const*                                pOutLayers,
+    void*                                       pReserved)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
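+
+/* Note: vkEnumerateLayers looks like a leftover from a pre-1.0 API draft;
+ * it is not a Vulkan 1.0 entry point. */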
+
+VKAPI_ATTR void VKAPI_CALL vkGetBufferMemoryRequirements(
+    VkDevice                                    device,
+    VkBuffer                                    buffer,
+    VkMemoryRequirements*                       pMemoryRequirements)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_base *base = nulldrv_base((void*)buffer);
+
+    base->get_memory_requirements(base, pMemoryRequirements);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetImageMemoryRequirements(
+    VkDevice                                    device,
+    VkImage                                     image,
+    VkMemoryRequirements*                       pMemoryRequirements)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_base *base = nulldrv_base((void*)image);
+
+    base->get_memory_requirements(base, pMemoryRequirements);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkBindBufferMemory(
+    VkDevice                                    device,
+    VkBuffer                                    buffer,
+    VkDeviceMemory                              mem_,
+    VkDeviceSize                                memoryOffset)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkBindImageMemory(
+    VkDevice                                    device,
+    VkImage                                     image,
+    VkDeviceMemory                              mem_,
+    VkDeviceSize                                memoryOffset)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetImageSparseMemoryRequirements(
+    VkDevice                                    device,
+    VkImage                                     image,
+    uint32_t*                                   pSparseMemoryRequirementCount,
+    VkSparseImageMemoryRequirements*            pSparseMemoryRequirements)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetPhysicalDeviceSparseImageFormatProperties(
+    VkPhysicalDevice                            physicalDevice,
+    VkFormat                                    format,
+    VkImageType                                 type,
+    VkSampleCountFlagBits                       samples,
+    VkImageUsageFlags                           usage,
+    VkImageTiling                               tiling,
+    uint32_t*                                   pPropertyCount,
+    VkSparseImageFormatProperties*              pProperties)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkQueueBindSparse(
+    VkQueue                                     queue,
+    uint32_t                                    bindInfoCount,
+    const VkBindSparseInfo*                     pBindInfo,
+    VkFence                                     fence)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreatePipelineCache(
+    VkDevice                                    device,
+    const VkPipelineCacheCreateInfo*            pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkPipelineCache*                            pPipelineCache)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+    struct nulldrv_pipeline_cache *pipeline_cache;
+
+    pipeline_cache = (struct nulldrv_pipeline_cache *) nulldrv_base_create(dev,
+            sizeof(*pipeline_cache), VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_CACHE_EXT);
+    if (!pipeline_cache)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    *pPipelineCache = (VkPipelineCache) pipeline_cache;
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyPipeline(
+    VkDevice                                  device,
+    VkPipeline                                pipeline,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyPipelineCache(
+    VkDevice                                    device,
+    VkPipelineCache                             pipelineCache,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetPipelineCacheData(
+    VkDevice                                    device,
+    VkPipelineCache                             pipelineCache,
+    size_t*                                     pDataSize,
+    void*                                       pData)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_ERROR_INITIALIZATION_FAILED;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkMergePipelineCaches(
+    VkDevice                                    device,
+    VkPipelineCache                             dstCache,
+    uint32_t                                    srcCacheCount,
+    const VkPipelineCache*                      pSrcCaches)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_ERROR_INITIALIZATION_FAILED;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateGraphicsPipelines(
+    VkDevice                                  device,
+    VkPipelineCache                           pipelineCache,
+    uint32_t                                  createInfoCount,
+    const VkGraphicsPipelineCreateInfo*    pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkPipeline*                               pPipeline)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return graphics_pipeline_create(dev, pCreateInfo,
+            (struct nulldrv_pipeline **) pPipeline);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateComputePipelines(
+    VkDevice                                  device,
+    VkPipelineCache                           pipelineCache,
+    uint32_t                                  createInfoCount,
+    const VkComputePipelineCreateInfo*     pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkPipeline*                               pPipeline)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateQueryPool(
+    VkDevice                                  device,
+    const VkQueryPoolCreateInfo*           pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkQueryPool*                             pQueryPool)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyQueryPool(
+    VkDevice                                  device,
+    VkQueryPool                               queryPool,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetQueryPoolResults(
+    VkDevice                                    device,
+    VkQueryPool                                 queryPool,
+    uint32_t                                    firstQuery,
+    uint32_t                                    queryCount,
+    size_t                                      dataSize,
+    void*                                       pData,
+    size_t                                      stride,
+    VkQueryResultFlags                          flags)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkQueueWaitIdle(
+    VkQueue                                   queue_)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkQueueSubmit(
+    VkQueue                                   queue_,
+    uint32_t                                  submitCount,
+    const VkSubmitInfo*                       pSubmits,
+    VkFence                                   fence_)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateSemaphore(
+    VkDevice                                  device,
+    const VkSemaphoreCreateInfo*            pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkSemaphore*                              pSemaphore)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroySemaphore(
+    VkDevice                                  device,
+    VkSemaphore                               semaphore,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateSampler(
+    VkDevice                                  device,
+    const VkSamplerCreateInfo*              pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkSampler*                                pSampler)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_sampler_create(dev, pCreateInfo,
+            (struct nulldrv_sampler **) pSampler);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroySampler(
+    VkDevice                                  device,
+    VkSampler                                 sampler,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateShaderModule(
+    VkDevice                                    device,
+    const VkShaderModuleCreateInfo*             pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkShaderModule*                             pShaderModule)
+{
+    NULLDRV_LOG_FUNC;
+
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+    struct nulldrv_shader_module *shader_module;
+
+    shader_module = (struct nulldrv_shader_module *) nulldrv_base_create(dev,
+            sizeof(*shader_module), VK_DEBUG_REPORT_OBJECT_TYPE_SHADER_MODULE_EXT);
+    if (!shader_module)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+    *pShaderModule = (VkShaderModule) shader_module;
+
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyShaderModule(
+    VkDevice                                    device,
+    VkShaderModule                              shaderModule,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateBufferView(
+    VkDevice                                  device,
+    const VkBufferViewCreateInfo*          pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkBufferView*                            pView)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_buf_view_create(dev, pCreateInfo,
+            (struct nulldrv_buf_view **) pView);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyBufferView(
+    VkDevice                                  device,
+    VkBufferView                              bufferView,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateImageView(
+    VkDevice                                  device,
+    const VkImageViewCreateInfo*           pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkImageView*                             pView)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_img_view_create(dev, pCreateInfo,
+            (struct nulldrv_img_view **) pView);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyImageView(
+    VkDevice                                  device,
+    VkImageView                               imageView,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateDescriptorSetLayout(
+    VkDevice                                   device,
+    const VkDescriptorSetLayoutCreateInfo* pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkDescriptorSetLayout*                   pSetLayout)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_desc_layout_create(dev, pCreateInfo,
+            (struct nulldrv_desc_layout **) pSetLayout);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyDescriptorSetLayout(
+    VkDevice                                  device,
+    VkDescriptorSetLayout                     descriptorSetLayout,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL  vkCreatePipelineLayout(
+    VkDevice                                device,
+    const VkPipelineLayoutCreateInfo*       pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkPipelineLayout*                       pPipelineLayout)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_pipeline_layout_create(dev,
+            pCreateInfo,
+            (struct nulldrv_pipeline_layout **) pPipelineLayout);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyPipelineLayout(
+    VkDevice                                  device,
+    VkPipelineLayout                          pipelineLayout,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateDescriptorPool(
+    VkDevice                                    device,
+    const VkDescriptorPoolCreateInfo*           pCreateInfo,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkDescriptorPool*                           pDescriptorPool)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_desc_pool_create(dev, pCreateInfo,
+            (struct nulldrv_desc_pool **) pDescriptorPool);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyDescriptorPool(
+    VkDevice                                  device,
+    VkDescriptorPool                          descriptorPool,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkResetDescriptorPool(
+    VkDevice                                device,
+    VkDescriptorPool                        descriptorPool,
+    VkDescriptorPoolResetFlags              flags)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkAllocateDescriptorSets(
+    VkDevice                                device,
+    const VkDescriptorSetAllocateInfo*      pAllocateInfo,
+    VkDescriptorSet*                        pDescriptorSets)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_desc_pool *pool = nulldrv_desc_pool(pAllocateInfo->descriptorPool);
+    struct nulldrv_dev *dev = pool->dev;
+    VkResult ret = VK_SUCCESS;
+    uint32_t i;
+
+    for (i = 0; i < pAllocateInfo->descriptorSetCount; i++) {
+        const struct nulldrv_desc_layout *layout =
+            nulldrv_desc_layout(pAllocateInfo->pSetLayouts[i]);
+
+        ret = nulldrv_desc_set_create(dev, pool, layout,
+                (struct nulldrv_desc_set **) &pDescriptorSets[i]);
+        if (ret != VK_SUCCESS)
+            break;
+    }
+
+    return ret;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkFreeDescriptorSets(
+    VkDevice                                    device,
+    VkDescriptorPool                            descriptorPool,
+    uint32_t                                    descriptorSetCount,
+    const VkDescriptorSet*                      pDescriptorSets)
+{
+    NULLDRV_LOG_FUNC;
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkUpdateDescriptorSets(
+    VkDevice                                    device,
+    uint32_t                                    descriptorWriteCount,
+    const VkWriteDescriptorSet*                 pDescriptorWrites,
+    uint32_t                                    descriptorCopyCount,
+    const VkCopyDescriptorSet*                  pDescriptorCopies)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateFramebuffer(
+    VkDevice                                  device,
+    const VkFramebufferCreateInfo*          info,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkFramebuffer*                            fb_ret)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_fb_create(dev, info, (struct nulldrv_framebuffer **) fb_ret);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyFramebuffer(
+    VkDevice                                  device,
+    VkFramebuffer                             framebuffer,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkCreateRenderPass(
+    VkDevice                                  device,
+    const VkRenderPassCreateInfo*          info,
+    const VkAllocationCallbacks*                     pAllocator,
+    VkRenderPass*                            rp_ret)
+{
+    NULLDRV_LOG_FUNC;
+    struct nulldrv_dev *dev = nulldrv_dev(device);
+
+    return nulldrv_render_pass_create(dev, info, (struct nulldrv_render_pass **) rp_ret);
+}
+
+VKAPI_ATTR void VKAPI_CALL vkDestroyRenderPass(
+    VkDevice                                  device,
+    VkRenderPass                              renderPass,
+    const VkAllocationCallbacks*                     pAllocator)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdPushConstants(
+    VkCommandBuffer                                 commandBuffer,
+    VkPipelineLayout                            layout,
+    VkShaderStageFlags                          stageFlags,
+    uint32_t                                    offset,
+    uint32_t                                    size,
+    const void*                                 pValues)
+{
+    /* TODO: Implement */
+}
+
+VKAPI_ATTR void VKAPI_CALL vkGetRenderAreaGranularity(
+    VkDevice                                    device,
+    VkRenderPass                                renderPass,
+    VkExtent2D*                                 pGranularity)
+{
+    pGranularity->height = 1;
+    pGranularity->width = 1;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdBeginRenderPass(
+    VkCommandBuffer                                 commandBuffer,
+    const VkRenderPassBeginInfo*                pRenderPassBegin,
+    VkSubpassContents                        contents)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdNextSubpass(
+    VkCommandBuffer                                 commandBuffer,
+    VkSubpassContents                        contents)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdEndRenderPass(
+    VkCommandBuffer                              commandBuffer)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+VKAPI_ATTR void VKAPI_CALL vkCmdExecuteCommands(
+    VkCommandBuffer                                 commandBuffer,
+    uint32_t                                    commandBufferCount,
+    const VkCommandBuffer*                          pCommandBuffers)
+{
+    NULLDRV_LOG_FUNC;
+}
+
+void* xcbCreateWindow(
+    uint16_t         width,
+    uint16_t         height)
+{
+    static uint32_t  window;  // Kludge to the max
+    NULLDRV_LOG_FUNC;
+    return &window;
+}
+
+// May not be needed if we stub out stuff in tri.c
+void xcbDestroyWindow()
+{
+    NULLDRV_LOG_FUNC;
+}
+
+int xcbGetMessage(void *msg)
+{
+    NULLDRV_LOG_FUNC;
+    return 0;
+}
+
+VkResult xcbQueuePresent(void *queue, void *image, void* fence)
+{
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceImageFormatProperties(
+    VkPhysicalDevice                            physicalDevice,
+    VkFormat                                    format,
+    VkImageType                                 type,
+    VkImageTiling                               tiling,
+    VkImageUsageFlags                           usage,
+    VkImageCreateFlags                          flags,
+    VkImageFormatProperties*                    pImageFormatProperties)
+{
+    pImageFormatProperties->maxExtent.width = 1024;
+    pImageFormatProperties->maxExtent.height = 1024;
+    pImageFormatProperties->maxExtent.depth = 1024;
+    pImageFormatProperties->maxMipLevels = 10;
+    pImageFormatProperties->maxArrayLayers = 1024;
+    pImageFormatProperties->sampleCounts = VK_SAMPLE_COUNT_1_BIT;
+    pImageFormatProperties->maxResourceSize = 1024*1024*1024;
+
+    return VK_SUCCESS;
+}
diff --git a/icd/nulldrv/nulldrv.h b/icd/nulldrv/nulldrv.h
new file mode 100644
index 0000000..0a8cd9b
--- /dev/null
+++ b/icd/nulldrv/nulldrv.h
@@ -0,0 +1,225 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: David Pinedo <david@lunarg.com>
+ */
+
+#ifndef NULLDRV_H
+#define NULLDRV_H
+#include <stdlib.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <string.h>
+#include <assert.h>
+
+#include <vulkan/vulkan.h>
+#include <vulkan/vk_icd.h>
+
+#include "icd.h"
+
+#include "icd-format.h"
+#include "icd-utils.h"
+
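+/*
+ * Common header embedded in every object this driver creates. loader_data
+ * is reserved for the Vulkan loader: for dispatchable objects the loader
+ * stores its dispatch pointer in the first field, so it must stay first
+ * and the driver must not touch it.
+ */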
+struct nulldrv_base {
+    void *loader_data;
+    uint32_t magic;
+    VkResult (*get_memory_requirements)(struct nulldrv_base *base, VkMemoryRequirements *pMemoryRequirements);
+};
+
+struct nulldrv_obj {
+    struct nulldrv_base base;
+};
+
+enum nulldrv_dev_ext_type {
+   NULLDRV_DEV_EXT_KHR_SWAPCHAIN,
+   NULLDRV_DEV_EXT_COUNT,
+   NULLDRV_DEV_EXT_INVALID = NULLDRV_DEV_EXT_COUNT,
+};
+
+
+enum nulldrv_inst_ext_type {
+   NULLDRV_INST_EXT_KHR_SURFACE,
+   NULLDRV_INST_EXT_KHR_XCB_SURFACE,
+   NULLDRV_INST_EXT_COUNT,
+   NULLDRV_INST_EXT_INVALID = NULLDRV_INST_EXT_COUNT,
+};
+
+
+struct nulldrv_instance {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_gpu {
+    void *loader_data;
+};
+
+struct nulldrv_dev {
+     struct nulldrv_base base;
+     bool exts[NULLDRV_DEV_EXT_COUNT];
+     struct nulldrv_desc_ooxx *desc_ooxx;
+     struct nulldrv_queue *queues[1];
+};
+
+struct nulldrv_desc_ooxx {
+    uint32_t surface_desc_size;
+    uint32_t sampler_desc_size;
+};
+
+
+struct nulldrv_queue {
+    struct nulldrv_base base;
+    struct nulldrv_dev *dev;
+};
+
+struct nulldrv_rt_view {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_fence {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_img {
+    struct nulldrv_obj obj;
+    VkImageType type;
+    int32_t depth;
+    uint32_t mip_levels;
+    uint32_t array_size;
+    VkFlags usage;
+    VkSampleCountFlagBits samples;
+    size_t total_size;
+};
+
+struct nulldrv_mem {
+    struct nulldrv_base base;
+    struct nulldrv_bo *bo;
+    VkDeviceSize size;
+};
+
+struct nulldrv_sampler {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_shader_module {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_pipeline_cache {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_img_view {
+    struct nulldrv_obj obj;
+    struct nulldrv_img *img;
+    float min_lod;
+    uint32_t cmd_len;
+};
+
+struct nulldrv_buf {
+    struct nulldrv_obj obj;
+    VkDeviceSize size;
+    VkFlags usage;
+};
+
+struct nulldrv_desc_layout {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_pipeline_layout {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_shader {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_pipeline {
+    struct nulldrv_obj obj;
+    struct nulldrv_dev *dev;
+};
+
+struct nulldrv_dynamic_vp {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_dynamic_line_width {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_dynamic_depth_bias {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_dynamic_blend {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_dynamic_depth_bounds {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_dynamic_stencil {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_cmd {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_desc_pool {
+    struct nulldrv_obj obj;
+    struct nulldrv_dev *dev;
+};
+
+struct nulldrv_desc_set {
+    struct nulldrv_obj obj;
+    struct nulldrv_desc_ooxx *ooxx;
+    const struct nulldrv_desc_layout *layout;
+};
+
+struct nulldrv_framebuffer {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_render_pass {
+    struct nulldrv_obj obj;
+};
+
+struct nulldrv_buf_view {
+    struct nulldrv_obj obj;
+
+    struct nulldrv_buf *buf;
+
+    /* SURFACE_STATE */
+    uint32_t cmd[8];
+    uint32_t fs_cmd[8];
+    uint32_t cmd_len;
+};
+
+struct nulldrv_display {
+    struct nulldrv_base base;
+    struct nulldrv_dev *dev;
+};
+
+struct nulldrv_swap_chain {
+    struct nulldrv_base base;
+    struct nulldrv_dev *dev;
+};
+
+#endif /* NULLDRV_H */
diff --git a/icd/nulldrv/nulldrv_icd.json b/icd/nulldrv/nulldrv_icd.json
new file mode 100644
index 0000000..6d4c8e2
--- /dev/null
+++ b/icd/nulldrv/nulldrv_icd.json
@@ -0,0 +1,7 @@
+{
+    "file_format_version": "1.0.0",
+    "ICD": {
+        "library_path": "./libVK_nulldrv.so",
+        "api_version": "1.0.21"
+    }
+}
diff --git a/jsoncpp_revision b/jsoncpp_revision
new file mode 100644
index 0000000..f2f09db
--- /dev/null
+++ b/jsoncpp_revision
@@ -0,0 +1 @@
+d8cd848ede1071a25846cd90b4fddf269d868ff1
\ No newline at end of file
diff --git a/layers/vk_layer_config.cpp b/layers/vk_layer_config.cpp
index d8fe87d..f5adb8c 100644
--- a/layers/vk_layer_config.cpp
+++ b/layers/vk_layer_config.cpp
@@ -43,6 +43,7 @@
 
   private:
     bool m_fileIsParsed;
+    std::string m_fileName;
     std::map<std::string, std::string> m_valueMap;
 
     void parseFile(const char *filename);
@@ -134,6 +135,13 @@
 // Constructor for ConfigFile. Initialize layers to log error messages to stdout by default. If a vk_layer_settings file is present,
 // its settings will override the defaults.
 ConfigFile::ConfigFile() : m_fileIsParsed(false) {
+
+#ifdef ANDROID
+    m_fileName = "/sdcard/Android/vk_layer_settings.txt";
+#else
+    m_fileName = "vk_layer_settings.txt";
+#endif
+
     m_valueMap["lunarg_core_validation.report_flags"] = "error";
     m_valueMap["lunarg_image.report_flags"] = "error";
     m_valueMap["lunarg_object_tracker.report_flags"] = "error";
diff --git a/layersvt/CMakeLists.txt b/layersvt/CMakeLists.txt
new file mode 100644
index 0000000..5fcd80b
--- /dev/null
+++ b/layersvt/CMakeLists.txt
@@ -0,0 +1,201 @@
+cmake_minimum_required (VERSION 2.8.11)
+
+macro(run_vk_helper subcmd)
+    add_custom_command(OUTPUT ${ARGN}
+        COMMAND ${PYTHON_CMD} ${PROJECT_SOURCE_DIR}/vk_helper.py --${subcmd} ${PROJECT_SOURCE_DIR}/include/vulkan/vulkan.h --abs_out_dir ${CMAKE_CURRENT_BINARY_DIR}
+        DEPENDS ${PROJECT_SOURCE_DIR}/vk_helper.py ${PROJECT_SOURCE_DIR}/include/vulkan/vulkan.h
+    )
+endmacro()
+
+## VulkanTools has its own layer generator script
+macro(run_vk_vtlayer_generate subcmd output)
+    add_custom_command(OUTPUT ${output}
+        COMMAND ${PYTHON_CMD} ${PROJECT_SOURCE_DIR}/vk-vtlayer-generate.py ${DisplayServer} ${subcmd} ${PROJECT_SOURCE_DIR}/include/vulkan/vulkan.h > ${output}
+        DEPENDS ${PROJECT_SOURCE_DIR}/vk-vtlayer-generate.py ${PROJECT_SOURCE_DIR}/include/vulkan/vulkan.h ${PROJECT_SOURCE_DIR}/vk_helper_api_dump.py
+    )
+endmacro()
+
+macro(run_vk_layer_xml_generate subcmd output)
+    add_custom_command(OUTPUT ${output}
+        COMMAND ${PYTHON_CMD} ${PROJECT_SOURCE_DIR}/vt_genvk.py -registry ${PROJECT_SOURCE_DIR}/vk.xml ${output}
+        DEPENDS ${PROJECT_SOURCE_DIR}/vk.xml ${PROJECT_SOURCE_DIR}/generator.py ${PROJECT_SOURCE_DIR}/vt_genvk.py ${PROJECT_SOURCE_DIR}/reg.py
+    )
+endmacro()
+
+set(LAYER_JSON_FILES
+    VkLayer_api_dump
+    VkLayer_basic
+    VkLayer_generic
+    VkLayer_multi
+    VkLayer_screenshot
+    )
+
+set(VK_LAYER_RPATH /usr/lib/x86_64-linux-gnu/vulkan/layer:/usr/lib/i386-linux-gnu/vulkan/layer)
+set(CMAKE_INSTALL_RPATH ${VK_LAYER_RPATH})
+
+if (WIN32)
+    if (NOT (CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_CURRENT_BINARY_DIR))
+        if (CMAKE_GENERATOR MATCHES "^Visual Studio.*")
+            foreach (config_file ${LAYER_JSON_FILES})
+                FILE(TO_NATIVE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/windows/${config_file}.json src_json)
+                FILE(TO_NATIVE_PATH ${CMAKE_CURRENT_BINARY_DIR}/$<CONFIGURATION>/${config_file}.json dst_json)
+                add_custom_target(${config_file}-json ALL
+                    COMMAND copy ${src_json} ${dst_json}
+                    VERBATIM
+                    )
+                add_dependencies(${config_file}-json ${config_file})
+            endforeach(config_file)
+        else()
+            foreach (config_file ${LAYER_JSON_FILES})
+                FILE(TO_NATIVE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/windows/${config_file}.json src_json)
+                FILE(TO_NATIVE_PATH ${CMAKE_CURRENT_BINARY_DIR}/${config_file}.json dst_json)
+                add_custom_target(${config_file}-json ALL
+                    COMMAND copy ${src_json} ${dst_json}
+                    VERBATIM
+                    )
+                add_dependencies(${config_file}-json ${config_file})
+            endforeach(config_file)
+        endif()
+    endif()
+else()
+    # extra setup for out-of-tree builds
+    if (NOT (CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_CURRENT_BINARY_DIR))
+        foreach (config_file ${LAYER_JSON_FILES})
+            add_custom_target(${config_file}-json ALL
+                COMMAND ln -sf ${CMAKE_CURRENT_SOURCE_DIR}/linux/${config_file}.json
+                VERBATIM
+                )
+             add_dependencies(${config_file}-json ${config_file})
+        endforeach(config_file)
+    endif()
+endif()
+
+if (WIN32)
+    macro(add_vk_layer target)
+    add_custom_command(OUTPUT VkLayer_${target}.def
+        COMMAND ${PYTHON_CMD} ${PROJECT_SOURCE_DIR}/vk-generate.py ${DisplayServer} win-def-file VkLayer_${target} layer > VkLayer_${target}.def
+        DEPENDS ${PROJECT_SOURCE_DIR}/vk-generate.py ${PROJECT_SOURCE_DIR}/vulkan.py
+    )
+    add_library(VkLayer_${target} SHARED ${ARGN} VkLayer_${target}.def)
+    target_link_libraries(VkLayer_${target} VkLayer_utilsvt)
+    add_dependencies(VkLayer_${target} generate_vt_helpers)
+    endmacro()
+else()
+    macro(add_vk_layer target)
+    add_library(VkLayer_${target} SHARED ${ARGN})
+    target_link_libraries(VkLayer_${target} VkLayer_utilsvt)
+    add_dependencies(VkLayer_${target} generate_vt_helpers)
+    set_target_properties(VkLayer_${target} PROPERTIES LINK_FLAGS "-Wl,-Bsymbolic")
+    install(TARGETS VkLayer_${target} DESTINATION ${PROJECT_BINARY_DIR}/install_staging)
+    endmacro()
+endif()
+
+include_directories(
+    ${CMAKE_CURRENT_SOURCE_DIR}
+    ${CMAKE_CURRENT_SOURCE_DIR}/../loader
+    ${CMAKE_CURRENT_SOURCE_DIR}/../layers
+    ${CMAKE_CURRENT_SOURCE_DIR}/../include/vulkan
+    ${CMAKE_CURRENT_BINARY_DIR}
+    ${PROJECT_SOURCE_DIR}/../glslang/SPIRV
+)
+
+if (WIN32)
+    set (CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -D_CRT_SECURE_NO_WARNINGS")
+    set (CMAKE_C_FLAGS_RELEASE   "${CMAKE_C_FLAGS_RELEASE} -D_CRT_SECURE_NO_WARNINGS")
+
+    # For VS 2015 (compiler version 1900), core_validation.cpp fails to compile with a
+    # "too many sections" error unless either optimizations are enabled or the /bigobj
+    # compilation option is set. Since optimizations are enabled in a release build, this
+    # only affects the debug build. For now, enable /bigobj mode for all debug layer files.
+    # An alternative for the future is to split large source files into multiple files,
+    # which would also alleviate the compilation error.
+    if (MSVC AND NOT (MSVC_VERSION LESS 1900))
+        set (CMAKE_CXX_FLAGS_DEBUG   "${CMAKE_CXX_FLAGS_DEBUG} -D_CRT_SECURE_NO_WARNINGS /bigobj")
+        set (CMAKE_C_FLAGS_DEBUG     "${CMAKE_C_FLAGS_DEBUG} -D_CRT_SECURE_NO_WARNINGS /bigobj")
+    else()
+        set (CMAKE_CXX_FLAGS_DEBUG   "${CMAKE_CXX_FLAGS_DEBUG} -D_CRT_SECURE_NO_WARNINGS")
+        set (CMAKE_C_FLAGS_DEBUG     "${CMAKE_C_FLAGS_DEBUG} -D_CRT_SECURE_NO_WARNINGS")
+    endif()
+else()
+    set (CMAKE_CXX_FLAGS "-std=c++11")
+    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wpointer-arith")
+    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wpointer-arith")
+endif()
+
+add_custom_command(OUTPUT vk_dispatch_table_helper.h
+    COMMAND ${PYTHON_CMD} ${PROJECT_SOURCE_DIR}/vk-generate.py ${DisplayServer} dispatch-table-ops layer > vk_dispatch_table_helper.h
+    DEPENDS ${PROJECT_SOURCE_DIR}/vk-generate.py ${PROJECT_SOURCE_DIR}/vulkan.py)
+
+run_vk_helper(gen_enum_string_helper vk_enum_string_helper.h)
+run_vk_helper(gen_struct_wrappers
+    vk_struct_string_helper.h
+    vk_struct_string_helper_cpp.h
+    vk_struct_string_helper_no_addr.h
+    vk_struct_string_helper_no_addr_cpp.h
+    vk_struct_size_helper.h
+    vk_struct_size_helper.c
+    vk_struct_wrappers.h
+    vk_struct_wrappers.cpp
+    vk_safe_struct.h
+# Don't list vk_safe_struct.cpp as OUTPUT here, to avoid duplicate builds.
+# If it were listed, its use in add_library would cause it to be generated
+# independently of the custom target generate_vt_helpers, which breaks
+# parallel builds.
+#   vk_safe_struct.cpp
+)
+
+# Let gen_struct_wrappers really create vk_safe_struct.cpp
+add_custom_command(OUTPUT vk_safe_struct.cpp
+    COMMAND echo defer making vk_safe_struct.cpp
+)
+
+set_source_files_properties(
+    vk_struct_string_helper.h
+    vk_struct_string_helper_cpp.h
+    vk_struct_string_helper_no_addr.h
+    vk_struct_string_helper_no_addr_cpp.h
+    vk_struct_size_helper.h
+    vk_struct_size_helper.c
+    vk_struct_wrappers.h
+    vk_struct_wrappers.cpp
+    vk_safe_struct.h
+    vk_safe_struct.cpp
+    PROPERTIES GENERATED TRUE)
+
+add_custom_target(generate_vt_helpers DEPENDS
+    vk_dispatch_table_helper.h
+    vk_enum_string_helper.h
+    vk_struct_string_helper.h
+    vk_struct_string_helper_no_addr.h
+    vk_struct_string_helper_cpp.h
+    vk_struct_string_helper_no_addr_cpp.h
+    vk_struct_size_helper.h
+    vk_struct_size_helper.c
+    vk_struct_wrappers.h
+    vk_struct_wrappers.cpp
+    vk_safe_struct.h
+    vk_safe_struct.cpp
+)
+
+#VulkanTools layers
+run_vk_vtlayer_generate(generic generic_layer.cpp)
+run_vk_layer_xml_generate(ApiDump api_dump.cpp)
+run_vk_layer_xml_generate(ApiDump api_dump_text.h)
+
+# Layer Utils Library
+# For Windows, we use a static lib because the Windows loader has a fairly restrictive loader search
+# path that can't be easily modified to point it to the same directory that contains the layers.
+if (WIN32)
+    add_library(VkLayer_utilsvt STATIC ../layers/vk_layer_config.cpp ../layers/vk_layer_extension_utils.cpp ../layers/vk_layer_utils.cpp)
+else()
+    add_library(VkLayer_utilsvt SHARED ../layers/vk_layer_config.cpp ../layers/vk_layer_extension_utils.cpp ../layers/vk_layer_utils.cpp)
+    install(TARGETS VkLayer_utilsvt DESTINATION ${PROJECT_BINARY_DIR}/install_staging)
+endif()
+
+# VulkanTools layers
+add_vk_layer(basic basic.cpp ../layers/vk_layer_table.cpp)
+add_vk_layer(multi multi.cpp ../layers/vk_layer_table.cpp)
+# generated
+add_vk_layer(generic generic_layer.cpp ../layers/vk_layer_table.cpp)
+add_vk_layer(api_dump api_dump.cpp api_dump_text.h ../layers/vk_layer_table.cpp)
+add_vk_layer(screenshot screenshot.cpp ../layers/vk_layer_table.cpp)
+
diff --git a/layersvt/README.md b/layersvt/README.md
new file mode 100644
index 0000000..940f7ee
--- /dev/null
+++ b/layersvt/README.md
@@ -0,0 +1,79 @@
+# Layer Description and Status
+
+## Overview
+
+Layer libraries can be written to intercept or hook VK entry points for various
+debug and validation purposes. One or more VK entry points can be defined in your layer
+library. Undefined entry points in the layer library will be passed on to the next layer,
+which may be the driver. Multiple layer libraries can be chained (actually a hierarchy) together.
+vkEnumerateInstanceLayerProperties and vkEnumerateDeviceLayerProperties can be called to list the
+available layers and their properties. Layers can intercept Vulkan instance-level entry points,
+in which case they are called instance layers. Layers can intercept device-level entry points,
+in which case they are called device layers. Instance-level entry points are those with VkInstance
+or VkPhysicalDevice as the first parameter; device-level entry points are those with VkDevice,
+VkCommandBuffer, or VkQueue as the first parameter. Layers that intercept both instance- and
+device-level entry points are called global layers. vkGetInstanceProcAddr and vkGetDeviceProcAddr
+are used internally by the layers and loader to initialize dispatch tables. Instance layers are
+activated at vkCreateInstance time; device layers are activated at vkCreateDevice time. Layers can
+also be activated via environment variables (VK_INSTANCE_LAYERS or VK_DEVICE_LAYERS).
+
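+The interception pattern, in condensed form (a hedged sketch, not a complete layer:
+my_CmdDraw is a hypothetical hook, device_dispatch_table comes from vk_layer_table.h, and
+a real layer must also intercept vkCreateInstance/vkCreateDevice to build its dispatch
+tables, as layersvt/basic.cpp does):
+
+    // Intercept one entry point; forward everything else through the next layer's table.
+    VKAPI_ATTR void VKAPI_CALL my_CmdDraw(VkCommandBuffer cb, uint32_t vertexCount,
+                                          uint32_t instanceCount, uint32_t firstVertex,
+                                          uint32_t firstInstance) {
+        // ... layer work here (logging, validation, capture) ...
+        device_dispatch_table(cb)->CmdDraw(cb, vertexCount, instanceCount,
+                                           firstVertex, firstInstance); // call down the chain
+    }
+
+    VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL
+    vkGetDeviceProcAddr(VkDevice device, const char *pName) {
+        if (!strcmp(pName, "vkCmdDraw"))
+            return (PFN_vkVoidFunction)my_CmdDraw; // hooked entry point
+        return device_dispatch_table(device)->GetDeviceProcAddr(device, pName); // pass through
+    }
+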
+### Layer library example code
+
+Note that some layers are code-generated and will therefore exist in the build directory, (build_dir)/layersvt.
+
+- include/vulkan/vk_layer.h - header file for layer code.
+
+### Templates
+layersvt/basic.cpp (name=VK_LAYER_LUNARG_basic) is a simple example wrapping a few entry points. It demonstrates these layer features:
+- Multiple dispatch tables for supporting multiple GPUs.
+- An example layer extension function.
+- A layer extension advertised by vkGetXXXExtension().
+
+layersvt/multi.cpp (name=VK_LAYER_LUNARG_multi1:VK_LAYER_LUNARG_multi2) is a simple example showing multiple layers per library.
+
+(build_dir)/layersvt/generic_layer.cpp (name=VK_LAYER_LUNARG_generic) - an auto-generated example wrapping all VK entry points.
+
+### Print API Calls and Parameter Values
+(build_dir)/layersvt/api_dump.cpp (name=VK_LAYER_LUNARG_api_dump) - prints out API calls along with parameter values.
+
+## Using Layers
+
+1. Build the VK loader (and, if needed, an ICD) using the normal steps (cmake and make).
+2. Place libVkLayer_<name>.so in the same directory as your VK test or app:
+
+    cp build/layersvt/libVkLayer_basic.so build/layersvt/libVkLayer_generic.so build/tests
+
+    This is required for the loader to be able to scan and enumerate your library.
+    Alternatively, use the VK\_LAYER\_PATH environment variable to specify where the layer libraries reside.
+
+3. Create a vk_layer_settings.txt file in the same directory to specify how your layers should behave.
+
+    Model it after the following example: [*vk_layer_settings.txt*](vk_layer_settings.txt). A minimal sample also appears after this list.
+
+4. Specify which layers to activate, either programmatically via vkCreateInstance and/or vkCreateDevice or with environment variables:
+
+    export VK\_INSTANCE\_LAYERS=VK_LAYER_LUNARG_basic:VK_LAYER_LUNARG_generic
+    export VK\_DEVICE\_LAYERS=VK_LAYER_LUNARG_basic:VK_LAYER_LUNARG_generic
+    cd build/tests; ./vkinfo
+
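+A minimal vk_layer_settings.txt sketch (these option names appear in api_dump.h in this
+directory; the values shown are illustrative):
+
+    # Send api_dump output to a file instead of stdout
+    lunarg_api_dump.file = TRUE
+    lunarg_api_dump.log_filename = vk_apidump.txt
+    # Flush the output stream after every API call
+    lunarg_api_dump.flush = TRUE
+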
+## Tips for writing new layers
+
+1. Must implement vkGetInstanceProcAddr() (aka GIPA) and vkGetDeviceProcAddr() (aka GDPA).
+2. Must have a local dispatch table to call the next layer (see vk_layer.h).
+3. Must have a layer manifest file for each layer library so the loader can find the layer properties (see loader/README.md).
+4. The next layer's GIPA/GDPA can be found in the wrapped instance or device object.
+5. The loader calls a layer's GIPA/GDPA first, so initialization should occur there.
+6. All entry points can be wrapped, but they will only be called after the layer is activated
+    via the first vkCreateInstance or vkCreateDevice.
+7. Entry point names can be anything, as determined by the layer's vkGetXXXXXProcAddr
+    implementation; the exceptions are vkGetInstanceProcAddr and vkGetDeviceProcAddr,
+    which must have the correct names since the loader calls these entry points directly.
+8. Entry point names must be exported to the OS's dynamic loader with VK\_LAYER\_EXPORT.
+9. The layer naming convention is camel case, matching the name in the library: libVkLayer_<name>.so.
+10. For multiple layers in one library, the manifest file can specify each layer.
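+
+As a sketch for tip 3, a minimal manifest modeled on linux/VkLayer_generic.json in this
+directory (field values are illustrative, not normative):
+
+    {
+        "file_format_version" : "1.0.0",
+        "layer" : {
+            "name": "VK_LAYER_LUNARG_generic",
+            "type": "GLOBAL",
+            "library_path": "./libVkLayer_generic.so",
+            "api_version": "1.0.33",
+            "implementation_version": "1",
+            "description": "LunarG Sample Layer"
+        }
+    }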
+
+## Status
+
+
+### Current known issues
diff --git a/layersvt/api_dump.h b/layersvt/api_dump.h
new file mode 100644
index 0000000..a50b287
--- /dev/null
+++ b/layersvt/api_dump.h
@@ -0,0 +1,386 @@
+/* Copyright (c) 2015-2016 The Khronos Group Inc.
+ * Copyright (c) 2015-2016 Valve Corporation
+ * Copyright (c) 2015-2016 LunarG, Inc.
+ * Copyright (C) 2015-2016 Google Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Lenny Komow <lenny@lunarg.com>
+ */
+
+#pragma once
+
+#define NOMINMAX
+
+#include "vk_loader_platform.h"
+#include "vulkan/vk_layer.h"
+#include "vk_layer_config.h"
+#include "vk_layer_table.h"
+#include "vk_layer_extension_utils.h"
+#include "vk_layer_utils.h"
+
+#include <algorithm>
+#include <fstream>
+#include <iomanip>
+#include <iostream>
+#include <ostream>
+#include <sstream>
+#include <string.h>
+#include <string>
+#include <type_traits>
+
+enum class ApiDumpFormat {
+    Text,
+};
+
+class ApiDumpSettings {
+  public:
+    ApiDumpSettings() {
+        // Get the output file settings and create a stream for it
+        const char *file_option = getLayerOption("lunarg_api_dump.file");
+        if (file_option != NULL && strcmp(file_option, "TRUE") == 0) {
+            use_cout = false;
+            const char *filename_option =
+                getLayerOption("lunarg_api_dump.log_filename");
+            if (filename_option != NULL && strcmp(filename_option, "") != 0)
+                output_stream.open(filename_option,
+                                   std::ofstream::out | std::ofstream::trunc);
+            else
+                output_stream.open("vk_apidump.txt",
+                                   std::ofstream::out | std::ofstream::trunc);
+        } else {
+            use_cout = true;
+        }
+
+        output_format = ApiDumpFormat::Text;
+
+        // Get the remaining settings
+        show_params = readBoolOption("lunarg_api_dump.detailed", true);
+        show_address = !readBoolOption("lunarg_api_dump.no_addr", false);
+        should_flush = readBoolOption("lunarg_api_dump.flush", true);
+        indent_size =
+            std::max(readIntOption("lunarg_api_dump.indent_size", 4), 0);
+        show_type = readBoolOption("lunarg_api_dump.show_types", true);
+        name_size = std::max(readIntOption("lunarg_api_dump.name_size", 32), 0);
+        type_size = std::max(readIntOption("lunarg_api_dump.type_size", 0), 0);
+        use_spaces = readBoolOption("lunarg_api_dump.use_spaces", true);
+        show_shader = readBoolOption("lunarg_api_dump.show_shader", false);
+    }
+
+    ~ApiDumpSettings() {
+        if (!use_cout)
+            output_stream.close();
+    }
+
+    inline ApiDumpFormat format() const { return output_format; }
+
+    std::ostream &formatNameType(std::ostream &stream, int indents,
+                                 const char *name, const char *type) const {
+        stream << indentation(indents) << name << ": ";
+
+        if (use_spaces)
+            stream << spaces(name_size - (int)strlen(name) - 2);
+        else
+            stream << tabs((name_size - (int)strlen(name) - 3 + indent_size) /
+                           indent_size);
+
+        if (show_type && use_spaces)
+            stream << type << spaces(type_size - (int)strlen(type));
+        else if (show_type && !use_spaces)
+            stream << type
+                   << tabs((type_size - (int)strlen(type) - 1 + indent_size) /
+                           indent_size);
+
+        return stream << " = ";
+    }
+
+    inline const char *indentation(int indents) const {
+        if (use_spaces)
+            return spaces(indents * indent_size);
+        else
+            return tabs(indents);
+    }
+
+    inline bool shouldFlush() const { return should_flush; }
+
+    inline bool showAddress() const { return show_address; }
+
+    inline bool showParams() const { return show_params; }
+
+    inline bool showShader() const { return show_shader; }
+
+    inline std::ostream &stream() const {
+        return use_cout ? std::cout : *(std::ofstream *)&output_stream;
+    }
+
+  private:
+    inline static bool readBoolOption(const char *option, bool default_value) {
+        const char *string_option = getLayerOption(option);
+        if (string_option != NULL && strcmp(string_option, "TRUE") == 0)
+            return true;
+        else if (string_option != NULL && strcmp(string_option, "FALSE") == 0)
+            return false;
+        else
+            return default_value;
+    }
+
+    inline static int readIntOption(const char *option, int default_value) {
+        const char *string_option = getLayerOption(option);
+        int value;
+        // Guard against a missing option before handing the string to sscanf
+        if (string_option == NULL || sscanf(string_option, "%d", &value) != 1) {
+            return default_value;
+        } else {
+            return value;
+        }
+    }
+
+    inline static const char *spaces(int count) {
+        // Clamp to the buffer length so the returned pointer stays in bounds
+        return SPACES + (MAX_SPACES - std::min(std::max(count, 0), MAX_SPACES));
+    }
+
+    inline static const char *tabs(int count) {
+        return TABS + (MAX_TABS - std::min(std::max(count, 0), MAX_TABS));
+    }
+
+    bool use_cout;
+    std::ofstream output_stream;
+    ApiDumpFormat output_format;
+    bool show_params;
+    bool show_address;
+    bool should_flush;
+
+    bool show_type;
+    int indent_size;
+    int name_size;
+    int type_size;
+    bool use_spaces;
+    bool show_shader;
+
+    static const char *const SPACES;
+    static const int MAX_SPACES = 72;
+    static const char *const TABS;
+    static const int MAX_TABS = 18;
+};
+
+const char *const ApiDumpSettings::SPACES =
+    "                                                                        ";
+const char *const ApiDumpSettings::TABS =
+    "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t";
+
+class ApiDumpInstance {
+  public:
+    inline ApiDumpInstance()
+        : dump_settings(NULL), frame_count(0), thread_count(0) {
+        loader_platform_thread_create_mutex(&output_mutex);
+        loader_platform_thread_create_mutex(&frame_mutex);
+        loader_platform_thread_create_mutex(&thread_mutex);
+    }
+
+    inline ~ApiDumpInstance() {
+        if (dump_settings != NULL)
+            delete dump_settings;
+
+        loader_platform_thread_delete_mutex(&thread_mutex);
+        loader_platform_thread_delete_mutex(&frame_mutex);
+        loader_platform_thread_delete_mutex(&output_mutex);
+    }
+
+    inline uint64_t frameCount() {
+        loader_platform_thread_lock_mutex(&frame_mutex);
+        uint64_t count = frame_count;
+        loader_platform_thread_unlock_mutex(&frame_mutex);
+        return count;
+    }
+
+    inline void nextFrame() {
+        loader_platform_thread_lock_mutex(&frame_mutex);
+        ++frame_count;
+        loader_platform_thread_unlock_mutex(&frame_mutex);
+    }
+
+    inline loader_platform_thread_mutex *outputMutex() { return &output_mutex; }
+
+    inline const ApiDumpSettings &settings() {
+        if (dump_settings == NULL)
+            dump_settings = new ApiDumpSettings();
+
+        return *dump_settings;
+    }
+
+    uint32_t threadID() {
+        loader_platform_thread_id id = loader_platform_get_thread_id();
+        loader_platform_thread_lock_mutex(&thread_mutex);
+        for (uint32_t i = 0; i < thread_count; ++i) {
+            if (thread_map[i] == id) {
+                loader_platform_thread_unlock_mutex(&thread_mutex);
+                return i;
+            }
+        }
+
+        uint32_t new_index = thread_count;
+        assert(thread_count < MAX_THREADS); // check before writing into thread_map
+        thread_map[thread_count++] = id;
+        loader_platform_thread_unlock_mutex(&thread_mutex);
+        return new_index;
+    }
+
+    static inline ApiDumpInstance &current() { return current_instance; }
+
+  private:
+    static ApiDumpInstance current_instance;
+
+    ApiDumpSettings *dump_settings;
+    loader_platform_thread_mutex output_mutex;
+    loader_platform_thread_mutex frame_mutex;
+    uint64_t frame_count;
+
+    static const size_t MAX_THREADS = 513;
+    loader_platform_thread_mutex thread_mutex;
+    loader_platform_thread_id thread_map[MAX_THREADS];
+    uint32_t thread_count;
+};
+
+ApiDumpInstance ApiDumpInstance::current_instance;
+
+//==================================== Text Backend Helpers ======================================//
+
+template <typename T>
+inline void
+dump_text_array(const T *array, size_t len, const ApiDumpSettings &settings,
+                const char *type_string, const char *child_type,
+                const char *name, int indents,
+                std::ostream &(*dump)(const T, const ApiDumpSettings &, int)) {
+    settings.formatNameType(settings.stream(), indents, name, type_string);
+    if (array == NULL) {
+        settings.stream() << "NULL\n";
+        return;
+    }
+    if (settings.showAddress())
+        settings.stream() << (void *)array << "\n";
+    else
+        settings.stream() << "address\n";
+
+    for (size_t i = 0; i < len && array != NULL; ++i) {
+        std::stringstream stream;
+        stream << name << '[' << i << ']';
+        std::string indexName = stream.str();
+        dump_text_value(array[i], settings, child_type, indexName.c_str(),
+                        indents + 1, dump);
+    }
+}
+
+template <typename T>
+inline void dump_text_array(
+    const T *array, size_t len, const ApiDumpSettings &settings,
+    const char *type_string, const char *child_type, const char *name,
+    int indents,
+    std::ostream &(*dump)(const T &, const ApiDumpSettings &, int)) {
+    settings.formatNameType(settings.stream(), indents, name, type_string);
+    if (array == NULL) {
+        settings.stream() << "NULL\n";
+        return;
+    }
+    if (settings.showAddress())
+        settings.stream() << (void *)array << "\n";
+    else
+        settings.stream() << "address\n";
+
+    for (size_t i = 0; i < len && array != NULL; ++i) {
+        std::stringstream stream;
+        stream << name << '[' << i << ']';
+        std::string indexName = stream.str();
+        dump_text_value(array[i], settings, child_type, indexName.c_str(),
+                        indents + 1, dump);
+    }
+}
+
+template <typename T>
+inline void dump_text_pointer(
+    const T *pointer, const ApiDumpSettings &settings, const char *type_string,
+    const char *name, int indents,
+    std::ostream &(*dump)(const T, const ApiDumpSettings &, int)) {
+    if (pointer == NULL) {
+        settings.formatNameType(settings.stream(), indents, name, type_string);
+        settings.stream() << "NULL\n";
+    } else {
+        dump_text_value(*pointer, settings, type_string, name, indents, dump);
+    }
+}
+
+template <typename T>
+inline void dump_text_pointer(
+    const T *pointer, const ApiDumpSettings &settings, const char *type_string,
+    const char *name, int indents,
+    std::ostream &(*dump)(const T &, const ApiDumpSettings &, int)) {
+    if (pointer == NULL) {
+        settings.formatNameType(settings.stream(), indents, name, type_string);
+        settings.stream() << "NULL\n";
+    } else {
+        dump_text_value(*pointer, settings, type_string, name, indents, dump);
+    }
+}
+
+template <typename T>
+inline void
+dump_text_value(const T object, const ApiDumpSettings &settings,
+                const char *type_string, const char *name, int indents,
+                std::ostream &(*dump)(const T, const ApiDumpSettings &, int)) {
+    settings.formatNameType(settings.stream(), indents, name, type_string);
+    dump(object, settings, indents) << "\n";
+}
+
+template <typename T>
+inline void dump_text_value(
+    const T &object, const ApiDumpSettings &settings, const char *type_string,
+    const char *name, int indents,
+    std::ostream &(*dump)(const T &, const ApiDumpSettings &, int)) {
+    settings.formatNameType(settings.stream(), indents, name, type_string);
+    dump(object, settings, indents);
+}
+
+inline void dump_text_special(
+    const char *text, const ApiDumpSettings &settings, const char *type_string,
+    const char *name, int indents) {
+    settings.formatNameType(settings.stream(), indents, name, type_string);
+    settings.stream() << text << "\n";
+}
+
+inline bool dump_text_bitmaskOption(const std::string &option,
+                                    std::ostream &stream, bool isFirst) {
+    if (isFirst)
+        stream << " (";
+    else
+        stream << " | ";
+    stream << option;
+    return false;
+}
+
+inline std::ostream &dump_text_cstring(const char *object,
+                                       const ApiDumpSettings &settings,
+                                       int indents) {
+    if (object == NULL)
+        return settings.stream() << "NULL";
+    else
+        return settings.stream() << "\"" << object << "\"";
+}
+
+inline std::ostream &dump_text_void(const void *object,
+                                    const ApiDumpSettings &settings,
+                                    int indents) {
+    if (object == NULL)
+        return settings.stream() << "NULL";
+    else if (settings.showAddress())
+        return settings.stream() << object;
+    else
+        return settings.stream() << "address";
+}
diff --git a/layersvt/basic.cpp b/layersvt/basic.cpp
new file mode 100644
index 0000000..43fb774
--- /dev/null
+++ b/layersvt/basic.cpp
@@ -0,0 +1,268 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * Copyright (C) 2015 Google Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: David Pinedo <david@lunarg.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ */
+#include <string.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <unordered_map>
+#include "vk_dispatch_table_helper.h"
+#include "vulkan/vk_layer.h"
+#include "vk_layer_table.h"
+
+static std::unordered_map<dispatch_key, VkInstance> basic_instance_map;
+
+typedef VkResult(VKAPI_PTR *PFN_vkLayerBasicEXT)(VkDevice device);
+static PFN_vkLayerBasicEXT pfn_layer_extension;
+
+VKAPI_ATTR VkResult VKAPI_CALL basic_LayerBasicEXT(VkDevice device) {
+    printf("In vkLayerBasicEXT() call w/ device: %p\n", (void *)device);
+    if (pfn_layer_extension) {
+        printf("In vkLayerBasicEXT() call down chain\n");
+        return pfn_layer_extension(device);
+    }
+    printf("vkLayerBasicEXT returning SUCCESS\n");
+    return VK_SUCCESS;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+basic_CreateInstance(const VkInstanceCreateInfo *pCreateInfo, const VkAllocationCallbacks *pAllocator, VkInstance *pInstance) {
+    VkLayerInstanceCreateInfo *chain_info = get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);
+
+    assert(chain_info->u.pLayerInfo);
+    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr = chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;
+    PFN_vkCreateInstance fpCreateInstance = (PFN_vkCreateInstance)fpGetInstanceProcAddr(NULL, "vkCreateInstance");
+    if (fpCreateInstance == NULL) {
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    // Advance the link info for the next element on the chain
+    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;
+
+    VkResult result = fpCreateInstance(pCreateInfo, pAllocator, pInstance);
+    if (result != VK_SUCCESS)
+        return result;
+
+    basic_instance_map[get_dispatch_key(*pInstance)] = *pInstance;
+    initInstanceTable(*pInstance, fpGetInstanceProcAddr);
+
+    return result;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+basic_EnumeratePhysicalDevices(VkInstance instance, uint32_t *pPhysicalDeviceCount, VkPhysicalDevice *pPhysicalDevices) {
+    printf("At start of wrapped vkEnumeratePhysicalDevices() call w/ inst: %p\n", (void *)instance);
+    VkResult result = instance_dispatch_table(instance)->EnumeratePhysicalDevices(instance, pPhysicalDeviceCount, pPhysicalDevices);
+    printf("Completed wrapped vkEnumeratePhysicalDevices() call w/ count %u\n", *pPhysicalDeviceCount);
+    return result;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL basic_CreateDevice(VkPhysicalDevice physicalDevice,
+                                                  const VkDeviceCreateInfo *pCreateInfo,
+                                                  const VkAllocationCallbacks *pAllocator, VkDevice *pDevice) {
+    printf("VK_LAYER_LUNARG_Basic: At start of vkCreateDevice() call w/ gpu: %p\n", (void *)physicalDevice);
+
+    VkLayerDeviceCreateInfo *chain_info = get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);
+
+    assert(chain_info->u.pLayerInfo);
+    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr = chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;
+    PFN_vkGetDeviceProcAddr fpGetDeviceProcAddr = chain_info->u.pLayerInfo->pfnNextGetDeviceProcAddr;
+    VkInstance instance = basic_instance_map[get_dispatch_key(physicalDevice)];
+    PFN_vkCreateDevice fpCreateDevice = (PFN_vkCreateDevice)fpGetInstanceProcAddr(instance, "vkCreateDevice");
+    if (fpCreateDevice == NULL) {
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    // Advance the link info for the next element on the chain
+    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;
+
+    VkResult result = fpCreateDevice(physicalDevice, pCreateInfo, pAllocator, pDevice);
+    if (result != VK_SUCCESS) {
+        return result;
+    }
+
+    initDeviceTable(*pDevice, fpGetDeviceProcAddr);
+
+    pfn_layer_extension = (PFN_vkLayerBasicEXT)fpGetDeviceProcAddr(*pDevice, "vkLayerBasicEXT");
+    printf("VK_LAYER_LUNARG_Basic: Completed vkCreateDevice() call w/ pDevice, Device %p: %p\n", (void *)pDevice, (void *)*pDevice);
+    return result;
+}
+
+/* hook DestroyDevice to remove tableMap entry */
+VKAPI_ATTR void VKAPI_CALL basic_DestroyDevice(VkDevice device, const VkAllocationCallbacks *pAllocator) {
+    dispatch_key key = get_dispatch_key(device);
+    device_dispatch_table(device)->DestroyDevice(device, pAllocator);
+    destroy_device_dispatch_table(key);
+}
+
+/* hook DestroyInstance to remove tableInstanceMap entry */
+VKAPI_ATTR void VKAPI_CALL basic_DestroyInstance(VkInstance instance, const VkAllocationCallbacks *pAllocator) {
+    dispatch_key key = get_dispatch_key(instance);
+    instance_dispatch_table(instance)->DestroyInstance(instance, pAllocator);
+    destroy_instance_dispatch_table(key);
+    basic_instance_map.erase(key);
+}
+
+VKAPI_ATTR void VKAPI_CALL
+basic_GetPhysicalDeviceFormatProperties(VkPhysicalDevice gpu, VkFormat format, VkFormatProperties *pFormatInfo) {
+    printf("At start of wrapped vkGetPhysicalDeviceFormatProperties() call w/ gpu: %p\n", (void *)gpu);
+    instance_dispatch_table(gpu)->GetPhysicalDeviceFormatProperties(gpu, format, pFormatInfo);
+    printf("Completed wrapped vkGetPhysicalDeviceFormatProperties() call w/ gpu: %p\n", (void *)gpu);
+}
+
+static const VkLayerProperties basic_LayerProps = {
+    "VK_LAYER_LUNARG_basic",
+    VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION), // specVersion
+    1,              // implementationVersion
+    "LunarG Sample Layer",
+};
+
+static const VkExtensionProperties basic_physicaldevice_extensions[] = {{
+    "vkLayerBasicEXT", 1,
+}};
+
+template<typename T>
+VkResult EnumerateProperties(uint32_t src_count, const T *src_props, uint32_t *dst_count, T *dst_props) {
+    if (!dst_props || !src_props) {
+        *dst_count = src_count;
+        return VK_SUCCESS;
+    }
+
+    uint32_t copy_count = (*dst_count < src_count) ? *dst_count : src_count;
+    memcpy(dst_props, src_props, sizeof(T) * copy_count);
+    *dst_count = copy_count;
+
+    return (copy_count == src_count) ? VK_SUCCESS : VK_INCOMPLETE;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+basic_EnumerateInstanceLayerProperties(uint32_t *pCount, VkLayerProperties *pProperties) {
+    return EnumerateProperties(1, &basic_LayerProps, pCount, pProperties);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+basic_EnumerateDeviceLayerProperties(VkPhysicalDevice physicalDevice, uint32_t *pCount, VkLayerProperties *pProperties) {
+    return EnumerateProperties(1, &basic_LayerProps, pCount, pProperties);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+basic_EnumerateInstanceExtensionProperties(const char *pLayerName, uint32_t *pCount, VkExtensionProperties *pProperties) {
+    if (pLayerName && !strcmp(pLayerName, basic_LayerProps.layerName))
+        return EnumerateProperties<VkExtensionProperties>(0, NULL, pCount, pProperties);
+
+    return VK_ERROR_LAYER_NOT_PRESENT;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL basic_EnumerateDeviceExtensionProperties(VkPhysicalDevice physicalDevice,
+                                                                        const char *pLayerName, uint32_t *pCount,
+                                                                        VkExtensionProperties *pProperties) {
+    if (pLayerName && !strcmp(pLayerName, basic_LayerProps.layerName)) {
+        uint32_t count = sizeof(basic_physicaldevice_extensions) /
+            sizeof(basic_physicaldevice_extensions[0]);
+        return EnumerateProperties(count, basic_physicaldevice_extensions, pCount, pProperties);
+    }
+
+    return instance_dispatch_table(physicalDevice)
+        ->EnumerateDeviceExtensionProperties(physicalDevice, pLayerName, pCount, pProperties);
+}
+
+VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL basic_GetDeviceProcAddr(VkDevice device, const char *pName) {
+    if (!strcmp("vkGetDeviceProcAddr", pName))
+        return (PFN_vkVoidFunction)basic_GetDeviceProcAddr;
+    if (!strcmp("vkDestroyDevice", pName))
+        return (PFN_vkVoidFunction)basic_DestroyDevice;
+    if (!strcmp("vkLayerBasicEXT", pName))
+        return (PFN_vkVoidFunction)basic_LayerBasicEXT;
+
+    if (device == NULL)
+        return NULL;
+
+    if (device_dispatch_table(device)->GetDeviceProcAddr == NULL)
+        return NULL;
+    return device_dispatch_table(device)->GetDeviceProcAddr(device, pName);
+}
+
+VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL basic_GetInstanceProcAddr(VkInstance instance, const char *pName) {
+    if (!strcmp("vkEnumerateInstanceLayerProperties", pName))
+        return (PFN_vkVoidFunction)basic_EnumerateInstanceLayerProperties;
+    if (!strcmp("vkEnumerateDeviceLayerProperties", pName))
+        return (PFN_vkVoidFunction)basic_EnumerateDeviceLayerProperties;
+    if (!strcmp("vkEnumerateInstanceExtensionProperties", pName))
+        return (PFN_vkVoidFunction)basic_EnumerateInstanceExtensionProperties;
+    if (!strcmp("vkEnumerateDeviceExtensionProperties", pName))
+        return (PFN_vkVoidFunction)basic_EnumerateDeviceExtensionProperties;
+    if (!strcmp("vkGetInstanceProcAddr", pName))
+        return (PFN_vkVoidFunction)basic_GetInstanceProcAddr;
+    if (!strcmp("vkGetPhysicalDeviceFormatProperties", pName))
+        return (PFN_vkVoidFunction)basic_GetPhysicalDeviceFormatProperties;
+    if (!strcmp("vkCreateInstance", pName))
+        return (PFN_vkVoidFunction)basic_CreateInstance;
+    if (!strcmp("vkDestroyInstance", pName))
+        return (PFN_vkVoidFunction)basic_DestroyInstance;
+    if (!strcmp("vkCreateDevice", pName))
+        return (PFN_vkVoidFunction)basic_CreateDevice;
+    if (!strcmp("vkEnumeratePhysicalDevices", pName))
+        return (PFN_vkVoidFunction)basic_EnumeratePhysicalDevices;
+
+    assert(instance);
+
+    PFN_vkVoidFunction proc = basic_GetDeviceProcAddr(VK_NULL_HANDLE, pName);
+    if (proc)
+        return proc;
+
+    if (instance_dispatch_table(instance)->GetInstanceProcAddr == NULL)
+        return NULL;
+    return instance_dispatch_table(instance)->GetInstanceProcAddr(instance, pName);
+}
+
+// loader-layer interface v0, just wrappers since there is only a layer
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL
+vkEnumerateInstanceLayerProperties(uint32_t *pCount, VkLayerProperties *pProperties) {
+    return basic_EnumerateInstanceLayerProperties(pCount, pProperties);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL
+vkEnumerateDeviceLayerProperties(VkPhysicalDevice physicalDevice, uint32_t *pCount, VkLayerProperties *pProperties) {
+    // the layer command handles VK_NULL_HANDLE just fine internally
+    assert(physicalDevice == VK_NULL_HANDLE);
+    return basic_EnumerateDeviceLayerProperties(VK_NULL_HANDLE, pCount, pProperties);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL
+vkEnumerateInstanceExtensionProperties(const char *pLayerName, uint32_t *pCount, VkExtensionProperties *pProperties) {
+    return basic_EnumerateInstanceExtensionProperties(pLayerName, pCount, pProperties);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateDeviceExtensionProperties(VkPhysicalDevice physicalDevice,
+                                                                                    const char *pLayerName, uint32_t *pCount,
+                                                                                    VkExtensionProperties *pProperties) {
+    // the layer command handles VK_NULL_HANDLE just fine internally
+    assert(physicalDevice == VK_NULL_HANDLE);
+    return basic_EnumerateDeviceExtensionProperties(VK_NULL_HANDLE, pLayerName, pCount, pProperties);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL vkGetDeviceProcAddr(VkDevice dev, const char *funcName) {
+    return basic_GetDeviceProcAddr(dev, funcName);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL vkGetInstanceProcAddr(VkInstance instance, const char *funcName) {
+    return basic_GetInstanceProcAddr(instance, funcName);
+}
diff --git a/layersvt/generic.h b/layersvt/generic.h
new file mode 100644
index 0000000..6854fba
--- /dev/null
+++ b/layersvt/generic.h
@@ -0,0 +1,98 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * Copyright (C) 2015 Google Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ *
+ */
+
+#ifndef GENERIC_H
+#define GENERIC_H
+#include <string.h>
+#include <unordered_map>
+#include <vector>
+
+#include "vulkan/vk_layer.h"
+
+/*
+ * This file contains static functions for the generated layer generic
+ */
+
+// The following is for logging error messages:
+struct layer_data {
+    debug_report_data *report_data;
+    std::vector<VkDebugReportCallbackEXT> logging_callback;
+
+    layer_data() : report_data(nullptr), logging_callback() {};
+};
+
+static const VkLayerProperties globalLayerProps[] = {{
+    "VK_LAYER_LUNARG_generic",
+    VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION), // specVersion
+    1, "layer: generic",
+}};
+
+static const VkLayerProperties deviceLayerProps[] = {{
+    "VK_LAYER_LUNARG_generic",
+    VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION), // specVersion
+    1, "layer: generic",
+}};
+
+struct devExts {
+    bool wsi_enabled;
+};
+struct instExts {
+    bool wsi_enabled;
+};
+static std::unordered_map<void *, struct devExts> deviceExtMap;
+static std::unordered_map<void *, struct instExts> instanceExtMap;
+
+static void createDeviceRegisterExtensions(const VkDeviceCreateInfo *pCreateInfo, VkDevice device) {
+    uint32_t i;
+    VkLayerDispatchTable *pDisp = device_dispatch_table(device);
+    PFN_vkGetDeviceProcAddr gpa = pDisp->GetDeviceProcAddr;
+    pDisp->CreateSwapchainKHR = (PFN_vkCreateSwapchainKHR)gpa(device, "vkCreateSwapchainKHR");
+    pDisp->DestroySwapchainKHR = (PFN_vkDestroySwapchainKHR)gpa(device, "vkDestroySwapchainKHR");
+    pDisp->GetSwapchainImagesKHR = (PFN_vkGetSwapchainImagesKHR)gpa(device, "vkGetSwapchainImagesKHR");
+    pDisp->AcquireNextImageKHR = (PFN_vkAcquireNextImageKHR)gpa(device, "vkAcquireNextImageKHR");
+    pDisp->QueuePresentKHR = (PFN_vkQueuePresentKHR)gpa(device, "vkQueuePresentKHR");
+
+    deviceExtMap[pDisp].wsi_enabled = false;
+    for (i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
+        if (strcmp(pCreateInfo->ppEnabledExtensionNames[i], VK_KHR_SWAPCHAIN_EXTENSION_NAME) == 0)
+            deviceExtMap[pDisp].wsi_enabled = true;
+    }
+}
+
+static void createInstanceRegisterExtensions(const VkInstanceCreateInfo *pCreateInfo, VkInstance instance) {
+    uint32_t i;
+    VkLayerInstanceDispatchTable *pDisp = instance_dispatch_table(instance);
+    PFN_vkGetInstanceProcAddr gpa = pDisp->GetInstanceProcAddr;
+
+    pDisp->DestroySurfaceKHR = (PFN_vkDestroySurfaceKHR)gpa(instance, "vkDestroySurfaceKHR");
+    pDisp->GetPhysicalDeviceSurfaceSupportKHR =
+        (PFN_vkGetPhysicalDeviceSurfaceSupportKHR)gpa(instance, "vkGetPhysicalDeviceSurfaceSupportKHR");
+    pDisp->GetPhysicalDeviceSurfaceCapabilitiesKHR =
+        (PFN_vkGetPhysicalDeviceSurfaceCapabilitiesKHR)gpa(instance, "vkGetPhysicalDeviceSurfaceCapabilitiesKHR");
+    pDisp->GetPhysicalDeviceSurfaceFormatsKHR =
+        (PFN_vkGetPhysicalDeviceSurfaceFormatsKHR)gpa(instance, "vkGetPhysicalDeviceSurfaceFormatsKHR");
+    pDisp->GetPhysicalDeviceSurfacePresentModesKHR =
+        (PFN_vkGetPhysicalDeviceSurfacePresentModesKHR)gpa(instance, "vkGetPhysicalDeviceSurfacePresentModesKHR");
+    instanceExtMap[pDisp].wsi_enabled = false;
+    for (i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
+        if (strcmp(pCreateInfo->ppEnabledExtensionNames[i], VK_KHR_SURFACE_EXTENSION_NAME) == 0)
+            instanceExtMap[pDisp].wsi_enabled = true;
+    }
+}
+#endif // GENERIC_H
diff --git a/layersvt/linux/VkLayer_api_dump.json b/layersvt/linux/VkLayer_api_dump.json
new file mode 100644
index 0000000..a5a01b2
--- /dev/null
+++ b/layersvt/linux/VkLayer_api_dump.json
@@ -0,0 +1,11 @@
+{
+    "file_format_version" : "1.0.0",
+    "layer" : {
+        "name": "VK_LAYER_LUNARG_api_dump",
+        "type": "GLOBAL",
+        "library_path": "./libVkLayer_api_dump.so",
+        "api_version": "1.0.33",
+        "implementation_version": "2",
+        "description": "LunarG debug layer"
+    }
+}
diff --git a/layersvt/linux/VkLayer_basic.json b/layersvt/linux/VkLayer_basic.json
new file mode 100644
index 0000000..0c379dd
--- /dev/null
+++ b/layersvt/linux/VkLayer_basic.json
@@ -0,0 +1,18 @@
+{
+    "file_format_version" : "1.0.0",
+    "layer" : {
+        "name": "VK_LAYER_LUNARG_basic",
+        "type": "GLOBAL",
+        "library_path": "./libVkLayer_basic.so",
+        "api_version": "1.0.33",
+        "implementation_version": "1",
+        "description": "LunarG Sample Layer",
+        "device_extensions": [
+             {
+                 "name": "VK_LUNARG_LayerExtension1",
+                 "spec_version": "0",
+                 "entrypoints": ["vkLayerExtension1"]
+             }
+         ]
+    }
+}
diff --git a/layersvt/linux/VkLayer_basic_implicit.json b/layersvt/linux/VkLayer_basic_implicit.json
new file mode 100644
index 0000000..d2d83eb
--- /dev/null
+++ b/layersvt/linux/VkLayer_basic_implicit.json
@@ -0,0 +1,20 @@
+{
+    "file_format_version" : "1.0.0",
+    "layer" : {
+        "name": "VK_LAYER_LUNARG_basic",
+        "type": "GLOBAL",
+        "library_path": "/etc/vulkan/implicit_layer.d/libVkLayer_basic.so",
+        "api_version": "1.0.33",
+        "implementation_version": "1",
+        "description": "LunarG Sample Layer",
+        "device_extensions": [
+             {
+                 "name": "VK_LUNARG_LayerExtension1",
+                 "spec_version": "0",
+                 "entrypoints": ["vkLayerExtension1"]
+             }
+        ],
+        "disable_environment": { "DISABLE_LAYER_BASIC_1": "1"},
+        "enable_environment": { "ENABLE_LAYER_BASIC_1": "134"}
+    }
+}
diff --git a/layersvt/linux/VkLayer_generic.json b/layersvt/linux/VkLayer_generic.json
new file mode 100644
index 0000000..f8c4878
--- /dev/null
+++ b/layersvt/linux/VkLayer_generic.json
@@ -0,0 +1,11 @@
+{
+    "file_format_version" : "1.0.0",
+    "layer" : {
+        "name": "VK_LAYER_LUNARG_generic",
+        "type": "GLOBAL",
+        "library_path": "./libVkLayer_generic.so",
+        "api_version": "1.0.33",
+        "implementation_version": "1",
+        "description": "LunarG Sample Layer"
+    }
+}
diff --git a/layersvt/linux/VkLayer_multi.json b/layersvt/linux/VkLayer_multi.json
new file mode 100644
index 0000000..5495f54
--- /dev/null
+++ b/layersvt/linux/VkLayer_multi.json
@@ -0,0 +1,28 @@
+{
+    "file_format_version" : "1.0.1",
+    "layers" : [
+        {
+            "name": "VK_LAYER_LUNARG_multi1",
+            "type": "INSTANCE",
+            "library_path": "./libVkLayer_multi.so",
+            "api_version": "1.0.33",
+            "implementation_version": "1",
+            "description": "LunarG Sample multiple layer per library",
+            "functions" : {
+              "vkGetInstanceProcAddr" : "VK_LAYER_LUNARG_multi1GetInstanceProcAddr",
+              "vkGetDeviceProcAddr" : "VK_LAYER_LUNARG_multi1GetDeviceProcAddr"
+            }
+        },
+        {
+            "name": "VK_LAYER_LUNARG_multi2",
+            "type": "INSTANCE",
+            "library_path": "./libVkLayer_multi.so",
+            "api_version": "1.0.33",
+            "implementation_version": "1",
+            "description": "LunarG Sample multiple layer per library",
+            "functions" : {
+              "vkGetInstanceProcAddr" : "VK_LAYER_LUNARG_multi2GetInstanceProcAddr"
+            }
+        }
+    ]
+}
diff --git a/layersvt/linux/VkLayer_screenshot.json b/layersvt/linux/VkLayer_screenshot.json
new file mode 100644
index 0000000..8a6e392
--- /dev/null
+++ b/layersvt/linux/VkLayer_screenshot.json
@@ -0,0 +1,11 @@
+{
+    "file_format_version" : "1.0.0",
+    "layer" : {
+        "name": "VK_LAYER_LUNARG_screenshot",
+        "type": "GLOBAL",
+        "library_path": "./libVkLayer_screenshot.so",
+        "api_version": "1.0.33",
+        "implementation_version": "1",
+        "description": "LunarG image capture layer"
+    }
+}
diff --git a/layersvt/multi.cpp b/layersvt/multi.cpp
new file mode 100644
index 0000000..318b87d
--- /dev/null
+++ b/layersvt/multi.cpp
@@ -0,0 +1,480 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ *
+ */
+
+#include <string.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <unordered_map>
+#include "vk_loader_platform.h"
+#include "vulkan/vk_layer.h"
+#include "vk_layer_table.h"
+
+template<typename T>
+VkResult EnumerateProperties(uint32_t src_count, const T *src_props, uint32_t *dst_count, T *dst_props) {
+    if (!dst_props || !src_props) {
+        *dst_count = src_count;
+        return VK_SUCCESS;
+    }
+
+    uint32_t copy_count = (*dst_count < src_count) ? *dst_count : src_count;
+    memcpy(dst_props, src_props, sizeof(T) * copy_count);
+    *dst_count = copy_count;
+
+    return (copy_count == src_count) ? VK_SUCCESS : VK_INCOMPLETE;
+}
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum {
+    LAYER_MULTI1,
+    LAYER_MULTI2,
+    LAYER_COUNT,
+};
+
+static const VkLayerProperties all_layer_props[LAYER_COUNT] = {
+    {
+        "VK_LAYER_LUNARG_multi1",
+        VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION), // specVersion
+        1,              // implementationVersion
+        "LunarG Sample multiple layer per library",
+    },
+    {
+        "VK_LAYER_LUNARG_multi2",
+        VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION), // specVersion
+        1,              // implementationVersion
+        "LunarG Sample multiple layer per library",
+    },
+};
+
+static const struct {
+    uint32_t count;
+    const VkExtensionProperties* extensions;
+} all_extension_props[LAYER_COUNT] = {
+    { 0, NULL, },
+    { 0, NULL, },
+};
+
+static device_table_map multi1_device_table_map;
+static instance_table_map multi1_instance_table_map;
+static std::unordered_map<dispatch_key, VkInstance> multi1_instance_map;
+/******************************** Layer multi1 functions **************************/
+VKAPI_ATTR VkResult VKAPI_CALL
+multi1CreateInstance(const VkInstanceCreateInfo *pCreateInfo, const VkAllocationCallbacks *pAllocator, VkInstance *pInstance) {
+    VkLayerInstanceCreateInfo *chain_info = get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);
+
+    assert(chain_info->u.pLayerInfo);
+    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr = chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;
+    PFN_vkCreateInstance fpCreateInstance = (PFN_vkCreateInstance)fpGetInstanceProcAddr(NULL, "vkCreateInstance");
+    if (fpCreateInstance == NULL) {
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    // Advance the link info for the next element on the chain
+    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;
+
+    VkResult result = fpCreateInstance(pCreateInfo, pAllocator, pInstance);
+    if (result != VK_SUCCESS)
+        return result;
+
+    multi1_instance_map[get_dispatch_key(*pInstance)] = *pInstance;
+    initInstanceTable(*pInstance, fpGetInstanceProcAddr, multi1_instance_table_map);
+
+    return result;
+}
+
+/* hook DestroyInstance to remove tableInstanceMap entry */
+VKAPI_ATTR void VKAPI_CALL multi1DestroyInstance(VkInstance instance, const VkAllocationCallbacks *pAllocator) {
+    VkLayerInstanceDispatchTable *pDisp = get_dispatch_table(multi1_instance_table_map, instance);
+    dispatch_key key = get_dispatch_key(instance);
+
+    printf("At start of wrapped multi1 vkDestroyInstance()\n");
+    pDisp->DestroyInstance(instance, pAllocator);
+    multi1_instance_table_map.erase(key);
+    multi1_instance_map.erase(key);
+    printf("Completed multi1 layer vkDestroyInstance()\n");
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL multi1CreateDevice(VkPhysicalDevice physicalDevice,
+                                                                  const VkDeviceCreateInfo *pCreateInfo,
+                                                                  const VkAllocationCallbacks *pAllocator, VkDevice *pDevice) {
+    printf("At start of multi1 layer vkCreateDevice()\n");
+
+    VkLayerDeviceCreateInfo *chain_info = get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);
+
+    assert(chain_info->u.pLayerInfo);
+    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr = chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;
+    PFN_vkGetDeviceProcAddr fpGetDeviceProcAddr = chain_info->u.pLayerInfo->pfnNextGetDeviceProcAddr;
+    VkInstance instance = multi1_instance_map[get_dispatch_key(physicalDevice)];
+    PFN_vkCreateDevice fpCreateDevice = (PFN_vkCreateDevice)fpGetInstanceProcAddr(instance, "vkCreateDevice");
+    if (fpCreateDevice == NULL) {
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    // Advance the link info for the next element on the chain
+    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;
+
+    VkResult result = fpCreateDevice(physicalDevice, pCreateInfo, pAllocator, pDevice);
+    if (result != VK_SUCCESS) {
+        return result;
+    }
+
+    initDeviceTable(*pDevice, fpGetDeviceProcAddr, multi1_device_table_map);
+
+    printf("Completed multi1 layer vkCreateDevice() call");
+    return result;
+}
+
+/* hook DestroyDevice to remove tableMap entry */
+VKAPI_ATTR void VKAPI_CALL multi1DestroyDevice(VkDevice device, const VkAllocationCallbacks *pAllocator) {
+    VkLayerDispatchTable *pDisp = get_dispatch_table(multi1_device_table_map, device);
+    dispatch_key key = get_dispatch_key(device);
+
+    printf("At start of multi1 layer vkDestroyDevice()\n");
+    pDisp->DestroyDevice(device, pAllocator);
+    multi1_device_table_map.erase(key);
+    printf("Completed multi1 layer vkDestroyDevice()\n");
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL multi1CreateSampler(VkDevice device, const VkSamplerCreateInfo *pCreateInfo,
+                                                                   const VkAllocationCallbacks *pAllocator, VkSampler *pSampler) {
+    VkLayerDispatchTable *pDisp = get_dispatch_table(multi1_device_table_map, device);
+
+    printf("At start of multi1 layer vkCreateSampler()\n");
+    VkResult result = pDisp->CreateSampler(device, pCreateInfo, pAllocator, pSampler);
+    printf("Completed multi1 layer vkCreateSampler()\n");
+    return result;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+multi1CreateGraphicsPipelines(VkDevice device, VkPipelineCache pipelineCache, uint32_t count,
+                              const VkGraphicsPipelineCreateInfo *pCreateInfos, const VkAllocationCallbacks *pAllocator,
+                              VkPipeline *pPipelines) {
+    VkLayerDispatchTable *pDisp = get_dispatch_table(multi1_device_table_map, device);
+
+    printf("At start of multi1 layer vkCreateGraphicsPipeline()\n");
+    VkResult result = pDisp->CreateGraphicsPipelines(device, pipelineCache, count, pCreateInfos, pAllocator, pPipelines);
+    printf("Completed multi1 layer vkCreateGraphicsPipeline()\n");
+    return result;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+multi1EnumerateInstanceLayerProperties(uint32_t *pCount, VkLayerProperties *pProperties)
+{
+    return EnumerateProperties(1, &all_layer_props[LAYER_MULTI1], pCount, pProperties);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+multi1EnumerateDeviceLayerProperties(VkPhysicalDevice physicalDevice, uint32_t *pCount, VkLayerProperties *pProperties)
+{
+    return EnumerateProperties(1, &all_layer_props[LAYER_MULTI1], pCount, pProperties);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+multi1EnumerateInstanceExtensionProperties(const char *pLayerName, uint32_t *pCount, VkExtensionProperties *pProperties)
+{
+    if (pLayerName && !strcmp(pLayerName, all_layer_props[LAYER_MULTI1].layerName)) {
+        return EnumerateProperties(all_extension_props[LAYER_MULTI1].count,
+                all_extension_props[LAYER_MULTI1].extensions, pCount, pProperties);
+    }
+
+    return VK_ERROR_LAYER_NOT_PRESENT;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+multi1EnumerateDeviceExtensionProperties(VkPhysicalDevice physicalDevice,
+                                         const char *pLayerName, uint32_t *pCount,
+                                         VkExtensionProperties *pProperties)
+{
+    if (pLayerName && !strcmp(pLayerName, all_layer_props[LAYER_MULTI1].layerName)) {
+        return EnumerateProperties(all_extension_props[LAYER_MULTI1].count,
+                all_extension_props[LAYER_MULTI1].extensions, pCount, pProperties);
+    }
+
+    VkLayerInstanceDispatchTable *pDisp = get_dispatch_table(multi1_instance_table_map, physicalDevice);
+    return pDisp->EnumerateDeviceExtensionProperties(physicalDevice, pLayerName, pCount, pProperties);
+}
+
+VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL multi1GetDeviceProcAddr(VkDevice device, const char *pName) {
+    if (!strcmp(pName, "multi1GetDeviceProcAddr") || !strcmp(pName, "vkGetDeviceProcAddr"))
+        return (PFN_vkVoidFunction)multi1GetDeviceProcAddr;
+    if (!strcmp("vkDestroyDevice", pName))
+        return (PFN_vkVoidFunction)multi1DestroyDevice;
+    if (!strcmp("vkCreateSampler", pName))
+        return (PFN_vkVoidFunction)multi1CreateSampler;
+    if (!strcmp("vkCreateGraphicsPipelines", pName))
+        return (PFN_vkVoidFunction)multi1CreateGraphicsPipelines;
+
+    if (device == NULL)
+        return NULL;
+
+    VkLayerDispatchTable *pTable = get_dispatch_table(multi1_device_table_map, device);
+    if (pTable->GetDeviceProcAddr == NULL)
+        return NULL;
+    return pTable->GetDeviceProcAddr(device, pName);
+}
+
+VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL multi1GetInstanceProcAddr(VkInstance instance, const char *pName) {
+    if (!strcmp("vkEnumerateInstanceLayerProperties", pName))
+        return (PFN_vkVoidFunction)multi1EnumerateInstanceLayerProperties;
+    if (!strcmp("vkEnumerateDeviceLayerProperties", pName))
+        return (PFN_vkVoidFunction)multi1EnumerateDeviceLayerProperties;
+    if (!strcmp("vkEnumerateInstanceExtensionProperties", pName))
+        return (PFN_vkVoidFunction)multi1EnumerateInstanceExtensionProperties;
+    if (!strcmp("vkEnumerateDeviceExtensionProperties", pName))
+        return (PFN_vkVoidFunction)multi1EnumerateDeviceExtensionProperties;
+    if (!strcmp(pName, "multi1GetInstanceProcAddr") || !strcmp(pName, "vkGetInsatnceProcAddr"))
+        return (PFN_vkVoidFunction)multi1GetInstanceProcAddr;
+    if (!strcmp("vkCreateInstance", pName))
+        return (PFN_vkVoidFunction)multi1CreateInstance;
+    if (!strcmp("vkCreateDevice", pName))
+        return (PFN_vkVoidFunction)multi1CreateDevice;
+    if (!strcmp("vkDestroyInstance", pName))
+        return (PFN_vkVoidFunction)multi1DestroyInstance;
+
+    assert(instance);
+
+    PFN_vkVoidFunction proc = multi1GetDeviceProcAddr(VK_NULL_HANDLE, pName);
+    if (proc)
+        return proc;
+
+    VkLayerInstanceDispatchTable *pTable = get_dispatch_table(multi1_instance_table_map, instance);
+    if (pTable->GetInstanceProcAddr == NULL)
+        return NULL;
+    return pTable->GetInstanceProcAddr(instance, pName);
+}
+
+static instance_table_map multi2_instance_table_map;
+/******************************** Layer multi2 functions **************************/
+VKAPI_ATTR VkResult VKAPI_CALL
+multi2CreateInstance(const VkInstanceCreateInfo *pCreateInfo, const VkAllocationCallbacks *pAllocator, VkInstance *pInstance) {
+    VkLayerInstanceCreateInfo *chain_info = get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);
+
+    assert(chain_info->u.pLayerInfo);
+    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr = chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;
+    PFN_vkCreateInstance fpCreateInstance = (PFN_vkCreateInstance)fpGetInstanceProcAddr(NULL, "vkCreateInstance");
+    if (fpCreateInstance == NULL) {
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    // Advance the link info for the next element on the chain
+    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;
+
+    VkResult result = fpCreateInstance(pCreateInfo, pAllocator, pInstance);
+    if (result != VK_SUCCESS)
+        return result;
+
+    initInstanceTable(*pInstance, fpGetInstanceProcAddr, multi2_instance_table_map);
+
+    return result;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+multi2EnumeratePhysicalDevices(VkInstance instance, uint32_t *pPhysicalDeviceCount, VkPhysicalDevice *pPhysicalDevices) {
+    VkLayerInstanceDispatchTable *pDisp = get_dispatch_table(multi2_instance_table_map, instance);
+
+    printf("At start of wrapped multi2 vkEnumeratePhysicalDevices()\n");
+    VkResult result = pDisp->EnumeratePhysicalDevices(instance, pPhysicalDeviceCount, pPhysicalDevices);
+    printf("Completed multi2 layer vkEnumeratePhysicalDevices()\n");
+    return result;
+}
+
+VKAPI_ATTR void VKAPI_CALL
+multi2GetPhysicalDeviceProperties(VkPhysicalDevice physicalDevice, VkPhysicalDeviceProperties *pProperties) {
+    VkLayerInstanceDispatchTable *pDisp = get_dispatch_table(multi2_instance_table_map, physicalDevice);
+    printf("At start of wrapped multi2 vkGetPhysicalDeviceProperties()\n");
+    pDisp->GetPhysicalDeviceProperties(physicalDevice, pProperties);
+    printf("Completed multi2 layer vkGetPhysicalDeviceProperties()\n");
+}
+
+VKAPI_ATTR void VKAPI_CALL
+multi2GetPhysicalDeviceFeatures(VkPhysicalDevice physicalDevice, VkPhysicalDeviceFeatures *pFeatures) {
+    VkLayerInstanceDispatchTable *pDisp = get_dispatch_table(multi2_instance_table_map, physicalDevice);
+    printf("At start of wrapped multi2 vkGetPhysicalDeviceFeatures()\n");
+    pDisp->GetPhysicalDeviceFeatures(physicalDevice, pFeatures);
+    printf("Completed multi2 layer vkGetPhysicalDeviceFeatures()\n");
+}
+
+/* hook DestroyInstance to remove tableInstanceMap entry */
+VKAPI_ATTR void VKAPI_CALL multi2DestroyInstance(VkInstance instance, const VkAllocationCallbacks *pAllocator) {
+    VkLayerInstanceDispatchTable *pDisp = get_dispatch_table(multi2_instance_table_map, instance);
+    dispatch_key key = get_dispatch_key(instance);
+
+    printf("At start of wrapped multi2 vkDestroyInstance()\n");
+    pDisp->DestroyInstance(instance, pAllocator);
+    multi2_instance_table_map.erase(key);
+    printf("Completed multi2 layer vkDestroyInstance()\n");
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+multi2EnumerateInstanceLayerProperties(uint32_t *pCount, VkLayerProperties *pProperties)
+{
+    return EnumerateProperties(1, &all_layer_props[LAYER_MULTI2], pCount, pProperties);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+multi2EnumerateDeviceLayerProperties(VkPhysicalDevice physicalDevice, uint32_t *pCount, VkLayerProperties *pProperties)
+{
+    return EnumerateProperties(1, &all_layer_props[LAYER_MULTI2], pCount, pProperties);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+multi2EnumerateInstanceExtensionProperties(const char *pLayerName, uint32_t *pCount, VkExtensionProperties *pProperties)
+{
+    if (pLayerName && !strcmp(pLayerName, all_layer_props[LAYER_MULTI2].layerName)) {
+        return EnumerateProperties(all_extension_props[LAYER_MULTI2].count,
+                all_extension_props[LAYER_MULTI2].extensions, pCount, pProperties);
+    }
+
+    return VK_ERROR_LAYER_NOT_PRESENT;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+multi2EnumerateDeviceExtensionProperties(VkPhysicalDevice physicalDevice,
+                                         const char *pLayerName, uint32_t *pCount,
+                                         VkExtensionProperties *pProperties)
+{
+    if (pLayerName && !strcmp(pLayerName, all_layer_props[LAYER_MULTI2].layerName)) {
+        return EnumerateProperties(all_extension_props[LAYER_MULTI2].count,
+                all_extension_props[LAYER_MULTI2].extensions, pCount, pProperties);
+    }
+
+    VkLayerInstanceDispatchTable *pDisp = get_dispatch_table(multi2_instance_table_map, physicalDevice);
+    return pDisp->EnumerateDeviceExtensionProperties(physicalDevice, pLayerName, pCount, pProperties);
+}
+
+VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL multi2GetInstanceProcAddr(VkInstance inst, const char *pName) {
+    if (!strcmp("vkEnumerateInstanceLayerProperties", pName))
+        return (PFN_vkVoidFunction)multi2EnumerateInstanceLayerProperties;
+    if (!strcmp("vkEnumerateDeviceLayerProperties", pName))
+        return (PFN_vkVoidFunction)multi2EnumerateDeviceLayerProperties;
+    if (!strcmp("vkEnumerateInstanceExtensionProperties", pName))
+        return (PFN_vkVoidFunction)multi2EnumerateInstanceExtensionProperties;
+    if (!strcmp("vkEnumerateDeviceExtensionProperties", pName))
+        return (PFN_vkVoidFunction)multi2EnumerateDeviceExtensionProperties;
+    if (!strcmp("vkCreateInstance", pName))
+        return (PFN_vkVoidFunction)multi2CreateInstance;
+    if (!strcmp(pName, "multi2GetInstanceProcAddr") || !strcmp(pName, "vkGetInstanceProcAddr"))
+        return (PFN_vkVoidFunction)multi2GetInstanceProcAddr;
+    if (!strcmp("vkEnumeratePhysicalDevices", pName))
+        return (PFN_vkVoidFunction)multi2EnumeratePhysicalDevices;
+    if (!strcmp("GetPhysicalDeviceProperties", pName))
+        return (PFN_vkVoidFunction)multi2GetPhysicalDeviceProperties;
+    if (!strcmp("GetPhysicalDeviceFeatures", pName))
+        return (PFN_vkVoidFunction)multi2GetPhysicalDeviceFeatures;
+    if (!strcmp("vkDestroyInstance", pName))
+        return (PFN_vkVoidFunction)multi2DestroyInstance;
+
+    if (inst == NULL)
+        return NULL;
+
+    VkLayerInstanceDispatchTable *pTable = get_dispatch_table(multi2_instance_table_map, inst);
+    if (pTable->GetInstanceProcAddr == NULL)
+        return NULL;
+    return pTable->GetInstanceProcAddr(inst, pName);
+}
+
+// loader-layer interface v0
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL
+vkEnumerateInstanceLayerProperties(uint32_t *pCount, VkLayerProperties *pProperties)
+{
+    return EnumerateProperties(LAYER_COUNT, all_layer_props, pCount, pProperties);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL
+vkEnumerateDeviceLayerProperties(VkPhysicalDevice physicalDevice, uint32_t *pCount, VkLayerProperties *pProperties)
+{
+    // multi2 does not intercept any device-level command
+    return multi1EnumerateDeviceLayerProperties(physicalDevice, pCount, pProperties);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL
+vkEnumerateInstanceExtensionProperties(const char *pLayerName, uint32_t *pCount, VkExtensionProperties *pProperties)
+{
+    assert(pLayerName);
+
+    for (int i = 0; i < LAYER_COUNT; i++) {
+        if (!strcmp(pLayerName, all_layer_props[i].layerName)) {
+            return EnumerateProperties(all_extension_props[i].count,
+                    all_extension_props[i].extensions, pCount, pProperties);
+        }
+    }
+
+    assert(!"unreachable");
+    *pCount = 0;
+    return VK_SUCCESS;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL
+vkEnumerateDeviceExtensionProperties(VkPhysicalDevice physicalDevice,
+                                     const char *pLayerName, uint32_t *pCount,
+                                     VkExtensionProperties *pProperties)
+{
+    assert(physicalDevice == VK_NULL_HANDLE && pLayerName);
+
+    for (int i = 0; i < LAYER_COUNT; i++) {
+        if (!strcmp(pLayerName, all_layer_props[i].layerName)) {
+            return EnumerateProperties(all_extension_props[i].count,
+                    all_extension_props[i].extensions, pCount, pProperties);
+        }
+    }
+
+    assert(!"unreachable");
+    *pCount = 0;
+    return VK_SUCCESS;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL VK_LAYER_LUNARG_multi1GetDeviceProcAddr(VkDevice device, const char *pName) {
+    return multi1GetDeviceProcAddr(device, pName);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL VK_LAYER_LUNARG_multi1GetInstanceProcAddr(VkInstance instance, const char *pName) {
+    if (!strcmp(pName, "vkEnumerateInstanceLayerProperties"))
+        return reinterpret_cast<PFN_vkVoidFunction>(vkEnumerateInstanceLayerProperties);
+    if (!strcmp(pName, "vkEnumerateDeviceLayerProperties"))
+        return reinterpret_cast<PFN_vkVoidFunction>(vkEnumerateDeviceLayerProperties);
+    if (!strcmp(pName, "vkEnumerateInstanceExtensionProperties"))
+        return reinterpret_cast<PFN_vkVoidFunction>(vkEnumerateInstanceExtensionProperties);
+    if (!strcmp(pName, "vkGetInstanceProcAddr"))
+        return reinterpret_cast<PFN_vkVoidFunction>(VK_LAYER_LUNARG_multi1GetInstanceProcAddr);
+
+    return multi1GetInstanceProcAddr(instance, pName);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL VK_LAYER_LUNARG_multi2GetInstanceProcAddr(VkInstance instance, const char *pName) {
+    if (!strcmp(pName, "vkEnumerateInstanceLayerProperties"))
+        return reinterpret_cast<PFN_vkVoidFunction>(vkEnumerateInstanceLayerProperties);
+    if (!strcmp(pName, "vkEnumerateDeviceLayerProperties"))
+        return reinterpret_cast<PFN_vkVoidFunction>(vkEnumerateDeviceLayerProperties);
+    if (!strcmp(pName, "vkEnumerateInstanceExtensionProperties"))
+        return reinterpret_cast<PFN_vkVoidFunction>(vkEnumerateInstanceExtensionProperties);
+    if (!strcmp(pName, "vkGetInstanceProcAddr"))
+        return reinterpret_cast<PFN_vkVoidFunction>(VK_LAYER_LUNARG_multi2GetInstanceProcAddr);
+
+    return multi2GetInstanceProcAddr(instance, pName);
+}
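+
+// These uniquely named exports exist because two layers share one library:
+// a single plain vkGetInstanceProcAddr export could not tell them apart.  A
+// layer manifest would direct the loader to them via its "functions" section,
+// sketched roughly as:
+//
+//     "functions": {
+//         "vkGetInstanceProcAddr": "VK_LAYER_LUNARG_multi1GetInstanceProcAddr",
+//         "vkGetDeviceProcAddr":   "VK_LAYER_LUNARG_multi1GetDeviceProcAddr"
+//     }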
+
+#ifdef __cplusplus
+} // extern "C"
+#endif
diff --git a/layersvt/screenshot.cpp b/layersvt/screenshot.cpp
new file mode 100644
index 0000000..e02b2dd
--- /dev/null
+++ b/layersvt/screenshot.cpp
@@ -0,0 +1,1305 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Cody Northrop <cody@lunarg.com>
+ * Author: David Pinedo <david@lunarg.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ */
+
+#include <inttypes.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <assert.h>
+#include <unordered_map>
+#include <iostream>
+#include <algorithm>
+#include <list>
+#include <map>
+#include <set>
+#include <vector>
+#include <fstream>
+
+using namespace std;
+
+#include "vk_dispatch_table_helper.h"
+#include "vk_struct_string_helper_cpp.h"
+#include "vk_layer_config.h"
+#include "vk_layer_table.h"
+#include "vk_layer_extension_utils.h"
+#include "vk_layer_utils.h"
+
+#ifdef ANDROID
+
+#include <android/log.h>
+#include <sys/system_properties.h>
+
+static char android_env[64] = {};
+const char* env_var = "debug.vulkan.screenshot";
+
+char* android_exec(const char* cmd) {
+    FILE* pipe = popen(cmd, "r");
+    if (pipe != nullptr) {
+        fgets(android_env, 64, pipe);
+        pclose(pipe);
+    }
+
+    // Only if the value is set will we get a string back
+    if (strlen(android_env) > 0) {
+        __android_log_print(ANDROID_LOG_INFO, "screenshot", "Vulkan screenshot layer capturing: %s", android_env);
+        return android_env;
+    }
+
+    return nullptr;
+}
+
+char* android_getenv(const char *key)
+{
+    std::string command("getprop ");
+    command += key;
+    return android_exec(command.c_str());
+}
+
+static inline char *local_getenv(const char *name) {
+    return android_getenv(name);
+}
+
+static inline void local_free_getenv(const char *val) {}
+
+#elif defined(__linux__)
+
+const char* env_var = "_VK_SCREENSHOT";
+
+static inline char *local_getenv(const char *name) { return getenv(name); }
+
+static inline void local_free_getenv(const char *val) {}
+
+#elif defined(_WIN32)
+
+const char* env_var = "_VK_SCREENSHOT";
+
+static inline char *local_getenv(const char *name) {
+    char *retVal;
+    DWORD valSize;
+
+    valSize = GetEnvironmentVariableA(name, NULL, 0);
+
+    // valSize DOES include the null terminator, so for any set variable it
+    // will always be at least 1. If it's 0, the variable wasn't set.
+    if (valSize == 0)
+        return NULL;
+
+    // TODO/FIXME: This should use the application's memory allocation callbacks.
+    retVal = (char *)malloc(valSize);
+
+    GetEnvironmentVariableA(name, retVal, valSize);
+
+    return retVal;
+}
+
+static inline void local_free_getenv(const char *val) { free((void *)val); }
+#endif
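+
+// With the helpers above, the layer reads a comma-separated list of frame
+// numbers from the platform-specific variable, e.g. (usage sketches):
+//
+//     Linux:   export _VK_SCREENSHOT=1,5,9
+//     Android: adb shell setprop debug.vulkan.screenshot 1,5,9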
+
+namespace screenshot {
+
+static int globalLockInitialized = 0;
+static loader_platform_thread_mutex globalLock;
+
+// unordered map: associates a swap chain with a device, image extent, format,
+// and list of images
+typedef struct {
+    VkDevice device;
+    VkExtent2D imageExtent;
+    VkFormat format;
+    VkImage *imageList;
+} SwapchainMapStruct;
+static unordered_map<VkSwapchainKHR, SwapchainMapStruct *> swapchainMap;
+
+// unordered map: associates an image with a device, image extent, and format
+typedef struct {
+    VkDevice device;
+    VkExtent2D imageExtent;
+    VkFormat format;
+} ImageMapStruct;
+static unordered_map<VkImage, ImageMapStruct *> imageMap;
+
+// unordered map: associates a device with a queue, commandPool, and physical
+// device; also contains per-device info, including the dispatch table
+typedef struct {
+    VkLayerDispatchTable *device_dispatch_table;
+    bool wsi_enabled;
+    VkQueue queue;
+    VkCommandPool commandPool;
+    VkPhysicalDevice physicalDevice;
+    PFN_vkSetDeviceLoaderData pfn_dev_init;
+} DeviceMapStruct;
+static unordered_map<VkDevice, DeviceMapStruct *> deviceMap;
+
+// unordered map: associates a physical device with an instance
+typedef struct { VkInstance instance; } PhysDeviceMapStruct;
+static unordered_map<VkPhysicalDevice, PhysDeviceMapStruct *> physDeviceMap;
+
+// set: frames at which to take screenshots, stored without duplicates.
+static set<int> screenshotFrames;
+
+// Flag indicating we have queried env_var
+static bool screenshotEnvQueried = false;
+
+static bool
+memory_type_from_properties(VkPhysicalDeviceMemoryProperties *memory_properties,
+                            uint32_t typeBits, VkFlags requirements_mask,
+                            uint32_t *typeIndex) {
+    // Search memtypes to find first index with those properties
+    for (uint32_t i = 0; i < 32; i++) {
+        if ((typeBits & 1) == 1) {
+            // Type is available, does it match user properties?
+            if ((memory_properties->memoryTypes[i].propertyFlags &
+                 requirements_mask) == requirements_mask) {
+                *typeIndex = i;
+                return true;
+            }
+        }
+        typeBits >>= 1;
+    }
+    // No memory types matched, return failure
+    return false;
+}
+
+static DeviceMapStruct *get_dev_info(VkDevice dev) {
+    auto it = deviceMap.find(dev);
+    if (it == deviceMap.end())
+        return NULL;
+    else
+        return it->second;
+}
+
+static void init_screenshot() {
+    if (!globalLockInitialized) {
+        // TODO/TBD: Need to delete this mutex sometime.  How???  One
+        // suggestion is to call this during vkCreateInstance(), and then we
+        // can clean it up during vkDestroyInstance().  However, that requires
+        // that the layer have per-instance locks.  We need to come back and
+        // address this soon.
+        loader_platform_thread_create_mutex(&globalLock);
+        globalLockInitialized = 1;
+    }
+}
+
+// Track allocated resources in writePPM()
+// and clean them up when they go out of scope.
+struct WritePPMCleanupData {
+    VkDevice device;
+    VkLayerDispatchTable *pTableDevice;
+    VkImage image2;
+    VkImage image3;
+    VkDeviceMemory mem2;
+    VkDeviceMemory mem3;
+    bool mem2mapped;
+    bool mem3mapped;
+    VkCommandBuffer commandBuffer;
+    VkCommandPool commandPool;
+    ~WritePPMCleanupData();
+};
+
+WritePPMCleanupData::~WritePPMCleanupData() {
+    if (mem2mapped)
+        pTableDevice->UnmapMemory(device, mem2);
+    if (mem2)
+        pTableDevice->FreeMemory(device, mem2, NULL);
+    if (image2)
+        pTableDevice->DestroyImage(device, image2, NULL);
+
+    if (mem3mapped)
+        pTableDevice->UnmapMemory(device, mem3);
+    if (mem3)
+        pTableDevice->FreeMemory(device, mem3, NULL);
+    if (image3)
+        pTableDevice->DestroyImage(device, image3, NULL);
+
+    if (commandBuffer)
+        pTableDevice->FreeCommandBuffers(device, commandPool, 1,
+                                         &commandBuffer);
+}
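+
+// The destructor above guarantees that the resources collected in
+// WritePPMCleanupData are released on every early-return path out of
+// writePPM() below (RAII).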
+
+// Save an image to a PPM image file.
+//
+// This function issues commands to copy/convert the swapchain image
+// from whatever compatible format the swapchain image uses
+// to a single format (VK_FORMAT_R8G8B8A8_UNORM) so that the converted
+// result can be easily written to a PPM file.
+//
+// Error handling: If there is a problem, this function should silently
+// fail without affecting the Present operation going on in the caller.
+// The numerous debug asserts are there to catch programming errors and are
+// not expected to fire.  Recovery and cleanup are implemented for image
+// memory allocation failures.
+// (TODO) It would be nice to pass any failure info to DebugReport or something.
+static void writePPM(const char *filename, VkImage image1) {
+
+    VkResult err;
+    bool pass;
+
+    // Bail immediately if we can't find the image.
+    if (imageMap.empty() || imageMap.find(image1) == imageMap.end())
+        return;
+
+    // Collect object info from maps.  This info is generally recorded
+    // by the other functions hooked in this layer.
+    VkDevice device = imageMap[image1]->device;
+    VkPhysicalDevice physicalDevice = deviceMap[device]->physicalDevice;
+    VkInstance instance = physDeviceMap[physicalDevice]->instance;
+    VkQueue queue = deviceMap[device]->queue;
+    DeviceMapStruct *devMap = get_dev_info(device);
+    if (NULL == devMap) {
+        assert(0);
+        return;
+    }
+    VkLayerDispatchTable *pTableDevice = devMap->device_dispatch_table;
+    VkLayerDispatchTable *pTableQueue =
+        get_dev_info(static_cast<VkDevice>(static_cast<void *>(queue)))
+            ->device_dispatch_table;
+    VkLayerInstanceDispatchTable *pInstanceTable;
+    pInstanceTable = instance_dispatch_table(instance);
+
+    // Gather incoming image info and check image format for compatibility with
+    // the target format.
+    // This function supports both 24-bit and 32-bit swapchain images.
+    VkFormat const target32bitFormat = VK_FORMAT_R8G8B8A8_UNORM;
+    VkFormat const target24bitFormat = VK_FORMAT_R8G8B8_UNORM;
+    uint32_t const width = imageMap[image1]->imageExtent.width;
+    uint32_t const height = imageMap[image1]->imageExtent.height;
+    VkFormat const format = imageMap[image1]->format;
+    uint32_t const numChannels = vk_format_get_channel_count(format);
+    if ((vk_format_get_compatibility_class(target24bitFormat) !=
+         vk_format_get_compatibility_class(format)) &&
+        (vk_format_get_compatibility_class(target32bitFormat) !=
+         vk_format_get_compatibility_class(format))) {
+        assert(0);
+        return;
+    }
+    if ((3 != numChannels) && (4 != numChannels)) {
+        assert(0);
+        return;
+    }
+
+    // General Approach
+    //
+    // The idea here is to copy/convert the swapchain image into another image
+    // that can be mapped and read by the CPU to produce a PPM file.
+    // The image must be untiled and converted to a specific format for easy
+    // parsing.  The memory for the final image must be host-visible.
+    // Note that in Vulkan, a BLIT operation must be used to perform a format
+    // conversion.
+    //
+    // Devices vary in their ability to blit to/from linear and optimal tiling.
+    // So we must query the device properties to get this information.
+    //
+    // If the device cannot BLIT to a LINEAR image, then the operation takes
+    // an extra copy step:
+    // 1) BLIT the swapchain image (image1) to a temp image (image2) that is
+    // created with TILING_OPTIMAL.
+    // 2) COPY image2 to another temp image (image3) that is created with
+    // TILING_LINEAR.
+    // 3) Map image3 and write the PPM file.
+    //
+    // If the device can BLIT to a LINEAR image, then:
+    // 1) BLIT the swapchain image (image1) to a temp image (image2) that is
+    // created with TILING_LINEAR.
+    // 2) Map image 2 and write the PPM file.
+    //
+    // There seems to be no way to tell if the swapchain image (image1) is tiled
+    // or not.  We therefore assume that the BLIT operation can always read from
+    // both linear and optimal tiled (swapchain) images.
+    // There is therefore no point in looking at the BLIT_SRC properties.
+    //
+    // There is also the optimization where the incoming and target formats are
+    // the same.  In this case, just do a COPY.
+
+    VkFormatProperties targetFormatProps;
+    pInstanceTable->GetPhysicalDeviceFormatProperties(
+        physicalDevice,
+        (3 == numChannels) ? target24bitFormat : target32bitFormat,
+        &targetFormatProps);
+    bool need2steps = false;
+    bool copyOnly = false;
+    if ((target24bitFormat == format) || (target32bitFormat == format)) {
+        copyOnly = true;
+    } else {
+        bool const bltLinear = targetFormatProps.linearTilingFeatures &
+                                       VK_FORMAT_FEATURE_BLIT_DST_BIT
+                                   ? true
+                                   : false;
+        bool const bltOptimal = targetFormatProps.optimalTilingFeatures &
+                                        VK_FORMAT_FEATURE_BLIT_DST_BIT
+                                    ? true
+                                    : false;
+        if (!bltLinear && !bltOptimal) {
+            // Cannot blit to either target tiling type.  It should be pretty
+            // unlikely to have a device that cannot blit to either type.
+            // But punt by just doing a copy and possibly have the wrong
+            // colors.  This should be quite rare.
+            copyOnly = true;
+        } else if (!bltLinear && bltOptimal) {
+            // Cannot blit to a linear target but can blit to optimal, so a
+            // copy after the blit is needed.
+            need2steps = true;
+        }
+        // Else bltLinear is available and only 1 step is needed.
+    }
+
+    // Put resources that need to be cleaned up in a struct with a destructor
+    // so that things get cleaned up when this function is exited.
+    WritePPMCleanupData data = {};
+    data.device = device;
+    data.pTableDevice = pTableDevice;
+
+    // Set up the image creation info for both the blit and copy images, in case
+    // both are needed.
+    VkImageCreateInfo imgCreateInfo2 = {
+        VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
+        NULL,
+        0,
+        VK_IMAGE_TYPE_2D,
+        VK_FORMAT_R8G8B8A8_UNORM,
+        {width, height, 1},
+        1,
+        1,
+        VK_SAMPLE_COUNT_1_BIT,
+        VK_IMAGE_TILING_LINEAR,
+        VK_IMAGE_USAGE_TRANSFER_DST_BIT,
+        VK_SHARING_MODE_EXCLUSIVE,
+        0,
+        NULL,
+        VK_IMAGE_LAYOUT_UNDEFINED,
+    };
+    VkImageCreateInfo imgCreateInfo3 = imgCreateInfo2;
+
+    // If we need both images, set up image2 to be read/write and tiled.
+    if (need2steps) {
+        imgCreateInfo2.tiling = VK_IMAGE_TILING_OPTIMAL;
+        imgCreateInfo2.usage =
+            VK_IMAGE_USAGE_TRANSFER_SRC_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT;
+    }
+
+    VkMemoryAllocateInfo memAllocInfo = {
+        VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO, NULL,
+        0, // allocationSize, queried later
+        0  // memoryTypeIndex, queried later
+    };
+    VkMemoryRequirements memRequirements;
+    VkPhysicalDeviceMemoryProperties memoryProperties;
+
+    // Create image2 and allocate its memory.  It could be the intermediate or
+    // final image.
+    err =
+        pTableDevice->CreateImage(device, &imgCreateInfo2, NULL, &data.image2);
+    assert(!err);
+    if (VK_SUCCESS != err)
+        return;
+    pTableDevice->GetImageMemoryRequirements(device, data.image2,
+                                             &memRequirements);
+    memAllocInfo.allocationSize = memRequirements.size;
+    pInstanceTable->GetPhysicalDeviceMemoryProperties(physicalDevice,
+                                                      &memoryProperties);
+    pass = memory_type_from_properties(
+        &memoryProperties, memRequirements.memoryTypeBits,
+        need2steps ? VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
+                   : VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT,
+        &memAllocInfo.memoryTypeIndex);
+    assert(pass);
+    err = pTableDevice->AllocateMemory(device, &memAllocInfo, NULL, &data.mem2);
+    assert(!err);
+    if (VK_SUCCESS != err)
+        return;
+    err = pTableQueue->BindImageMemory(device, data.image2, data.mem2, 0);
+    assert(!err);
+    if (VK_SUCCESS != err)
+        return;
+
+    // Create image3 and allocate its memory, if needed.
+    if (need2steps) {
+        err = pTableDevice->CreateImage(device, &imgCreateInfo3, NULL,
+                                        &data.image3);
+        assert(!err);
+        if (VK_SUCCESS != err)
+            return;
+        pTableDevice->GetImageMemoryRequirements(device, data.image3,
+                                                 &memRequirements);
+        memAllocInfo.allocationSize = memRequirements.size;
+        pInstanceTable->GetPhysicalDeviceMemoryProperties(physicalDevice,
+                                                          &memoryProperties);
+        pass = memory_type_from_properties(
+            &memoryProperties, memRequirements.memoryTypeBits,
+            VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, &memAllocInfo.memoryTypeIndex);
+        assert(pass);
+        err = pTableDevice->AllocateMemory(device, &memAllocInfo, NULL,
+                                           &data.mem3);
+        assert(!err);
+        if (VK_SUCCESS != err)
+            return;
+        err = pTableQueue->BindImageMemory(device, data.image3, data.mem3, 0);
+        assert(!err);
+        if (VK_SUCCESS != err)
+            return;
+    }
+
+    // Set up the command buffer.  We get a command buffer from a pool we saved
+    // in a hooked function, which would be the application's pool.
+    const VkCommandBufferAllocateInfo allocCommandBufferInfo = {
+        VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO, NULL,
+        deviceMap[device]->commandPool, VK_COMMAND_BUFFER_LEVEL_PRIMARY, 1};
+    data.commandPool = deviceMap[device]->commandPool;
+    err = pTableDevice->AllocateCommandBuffers(device, &allocCommandBufferInfo,
+                                               &data.commandBuffer);
+    assert(!err);
+    if (VK_SUCCESS != err)
+        return;
+
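+    // A VkCommandBuffer is dispatchable like a VkDevice, so its dispatch key
+    // can be used to look the device info back up; stash the type-punned
+    // handle in deviceMap so get_dev_info() works on it below.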
+    VkDevice cmdBuf =
+        static_cast<VkDevice>(static_cast<void *>(data.commandBuffer));
+    deviceMap.emplace(cmdBuf, devMap);
+    VkLayerDispatchTable *pTableCommandBuffer;
+    pTableCommandBuffer = get_dev_info(cmdBuf)->device_dispatch_table;
+
+    // We have just created a dispatchable object, but the dispatch table has
+    // not been placed in the object yet.  When a "normal" application creates
+    // a command buffer, the dispatch table is installed by the top-level api
+    // binding (trampoline.c). But here, we have to do it ourselves.
+    if (!devMap->pfn_dev_init) {
+        *((const void **)data.commandBuffer) = *(void **)device;
+    } else {
+        err = devMap->pfn_dev_init(device, (void *)data.commandBuffer);
+        assert(!err);
+    }
+
+    const VkCommandBufferBeginInfo commandBufferBeginInfo = {
+        VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO, NULL,
+        VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,
+    };
+    err = pTableCommandBuffer->BeginCommandBuffer(data.commandBuffer,
+                                                  &commandBufferBeginInfo);
+    assert(!err);
+
+    // This barrier is used to transition from/to the present layout.
+    VkImageMemoryBarrier presentMemoryBarrier = {
+        VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
+        NULL,
+        VK_ACCESS_TRANSFER_WRITE_BIT,
+        VK_ACCESS_TRANSFER_READ_BIT,
+        VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
+        VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
+        VK_QUEUE_FAMILY_IGNORED,
+        VK_QUEUE_FAMILY_IGNORED,
+        image1,
+        {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1}};
+
+    // This barrier is used to transition from a newly-created layout to a
+    // blit or copy destination layout.
+    VkImageMemoryBarrier destMemoryBarrier = {
+        VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
+        NULL,
+        0,
+        VK_ACCESS_TRANSFER_WRITE_BIT,
+        VK_IMAGE_LAYOUT_UNDEFINED,
+        VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
+        VK_QUEUE_FAMILY_IGNORED,
+        VK_QUEUE_FAMILY_IGNORED,
+        data.image2,
+        {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1}};
+
+    // This barrier is used to transition a dest layout to general layout.
+    VkImageMemoryBarrier generalMemoryBarrier = {
+        VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
+        NULL,
+        VK_ACCESS_TRANSFER_WRITE_BIT,
+        VK_ACCESS_TRANSFER_READ_BIT,
+        VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
+        VK_IMAGE_LAYOUT_GENERAL,
+        VK_QUEUE_FAMILY_IGNORED,
+        VK_QUEUE_FAMILY_IGNORED,
+        data.image2,
+        {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1}};
+
+    VkPipelineStageFlags srcStages = VK_PIPELINE_STAGE_TRANSFER_BIT;
+    VkPipelineStageFlags dstStages = VK_PIPELINE_STAGE_TRANSFER_BIT;
+
+    // The source image needs to be transitioned from present to transfer
+    // source.
+    pTableCommandBuffer->CmdPipelineBarrier(data.commandBuffer, srcStages,
+                                            dstStages, 0, 0, NULL, 0, NULL, 1,
+                                            &presentMemoryBarrier);
+
+    // image2 needs to be transitioned from its undefined state to transfer
+    // destination.
+    pTableCommandBuffer->CmdPipelineBarrier(data.commandBuffer, srcStages,
+                                            dstStages, 0, 0, NULL, 0, NULL, 1,
+                                            &destMemoryBarrier);
+
+    const VkImageCopy imageCopyRegion = {{VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1},
+                                         {0, 0, 0},
+                                         {VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1},
+                                         {0, 0, 0},
+                                         {width, height, 1}};
+
+    if (copyOnly) {
+        pTableCommandBuffer->CmdCopyImage(
+            data.commandBuffer, image1, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
+            data.image2, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1,
+            &imageCopyRegion);
+    } else {
+        VkImageBlit imageBlitRegion = {};
+        imageBlitRegion.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+        imageBlitRegion.srcSubresource.baseArrayLayer = 0;
+        imageBlitRegion.srcSubresource.layerCount = 1;
+        imageBlitRegion.srcSubresource.mipLevel = 0;
+        imageBlitRegion.srcOffsets[1].x = width;
+        imageBlitRegion.srcOffsets[1].y = height;
+        imageBlitRegion.srcOffsets[1].z = 1;
+        imageBlitRegion.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+        imageBlitRegion.dstSubresource.baseArrayLayer = 0;
+        imageBlitRegion.dstSubresource.layerCount = 1;
+        imageBlitRegion.dstSubresource.mipLevel = 0;
+        imageBlitRegion.dstOffsets[1].x = width;
+        imageBlitRegion.dstOffsets[1].y = height;
+        imageBlitRegion.dstOffsets[1].z = 1;
+
+        pTableCommandBuffer->CmdBlitImage(
+            data.commandBuffer, image1, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
+            data.image2, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1,
+            &imageBlitRegion, VK_FILTER_NEAREST);
+        if (need2steps) {
+            // image 3 needs to be transitioned from its undefined state to a
+            // transfer destination.
+            destMemoryBarrier.image = data.image3;
+            pTableCommandBuffer->CmdPipelineBarrier(
+                data.commandBuffer, srcStages, dstStages, 0, 0, NULL, 0, NULL,
+                1, &destMemoryBarrier);
+
+            // Transition image2 so that it can be read for the upcoming copy to
+            // image 3.
+            destMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
+            destMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
+            destMemoryBarrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
+            destMemoryBarrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
+            destMemoryBarrier.image = data.image2;
+            pTableCommandBuffer->CmdPipelineBarrier(
+                data.commandBuffer, srcStages, dstStages, 0, 0, NULL, 0, NULL,
+                1, &destMemoryBarrier);
+
+            // This step essentially untiles the image.
+            pTableCommandBuffer->CmdCopyImage(
+                data.commandBuffer, data.image2,
+                VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, data.image3,
+                VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &imageCopyRegion);
+            generalMemoryBarrier.image = data.image3;
+        }
+    }
+
+    // The destination needs to be transitioned from the optimal copy format to
+    // the format we can read with the CPU.
+    pTableCommandBuffer->CmdPipelineBarrier(data.commandBuffer, srcStages,
+                                            dstStages, 0, 0, NULL, 0, NULL, 1,
+                                            &generalMemoryBarrier);
+
+    // Restore the swap chain image layout to what it was before.
+    // This may not be strictly needed, but it is generally good to restore
+    // things to original state.
+    presentMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
+    presentMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
+    presentMemoryBarrier.srcAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
+    presentMemoryBarrier.dstAccessMask = 0;
+    pTableCommandBuffer->CmdPipelineBarrier(data.commandBuffer, srcStages,
+                                            dstStages, 0, 0, NULL, 0, NULL, 1,
+                                            &presentMemoryBarrier);
+
+    err = pTableCommandBuffer->EndCommandBuffer(data.commandBuffer);
+    assert(!err);
+
+    VkFence nullFence = {VK_NULL_HANDLE};
+    VkSubmitInfo submitInfo;
+    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
+    submitInfo.pNext = NULL;
+    submitInfo.waitSemaphoreCount = 0;
+    submitInfo.pWaitSemaphores = NULL;
+    submitInfo.pWaitDstStageMask = NULL;
+    submitInfo.commandBufferCount = 1;
+    submitInfo.pCommandBuffers = &data.commandBuffer;
+    submitInfo.signalSemaphoreCount = 0;
+    submitInfo.pSignalSemaphores = NULL;
+
+    err = pTableQueue->QueueSubmit(queue, 1, &submitInfo, nullFence);
+    assert(!err);
+
+    err = pTableQueue->QueueWaitIdle(queue);
+    assert(!err);
+
+    err = pTableDevice->DeviceWaitIdle(device);
+    assert(!err);
+
+    // Map the final image so that the CPU can read it.
+    const VkImageSubresource sr = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 0};
+    VkSubresourceLayout srLayout;
+    const char *ptr;
+    if (!need2steps) {
+        pTableDevice->GetImageSubresourceLayout(device, data.image2, &sr,
+                                                &srLayout);
+        err = pTableDevice->MapMemory(device, data.mem2, 0, VK_WHOLE_SIZE, 0,
+                                      (void **)&ptr);
+        assert(!err);
+        if (VK_SUCCESS != err)
+            return;
+        data.mem2mapped = true;
+    } else {
+        pTableDevice->GetImageSubresourceLayout(device, data.image3, &sr,
+                                                &srLayout);
+        err = pTableDevice->MapMemory(device, data.mem3, 0, VK_WHOLE_SIZE, 0,
+                                      (void **)&ptr);
+        assert(!err);
+        if (VK_SUCCESS != err)
+            return;
+        data.mem3mapped = true;
+    }
+
+    // Write the data to a PPM file.
+    ofstream file(filename, ios::binary);
+    assert(file.is_open());
+
+    if (!file.is_open()) {
+#ifdef ANDROID
+        __android_log_print(ANDROID_LOG_DEBUG, "screenshot", "Failed to open output file: %s.  Be sure to grant read and write permissions.", filename);
+#endif
+        return;
+    }
+
+    file << "P6\n";
+    file << width << "\n";
+    file << height << "\n";
+    file << 255 << "\n";
+
+    ptr += srLayout.offset;
+    if (3 == numChannels) {
+        for (uint32_t y = 0; y < height; y++) {
+            file.write(ptr, 3 * width);
+            ptr += srLayout.rowPitch;
+        }
+    } else if (4 == numChannels) {
+        for (uint32_t y = 0; y < height; y++) {
+            const unsigned int *row = (const unsigned int *)ptr;
+            for (uint32_t x = 0; x < width; x++) {
+                file.write((char *)row, 3);
+                row++;
+            }
+            ptr += srLayout.rowPitch;
+        }
+    }
+    file.close();
+
+    // Clean up handled by ~WritePPMCleanupData()
+}
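+
+// Note: the output is binary PPM ("P6"): a short text header followed by rows
+// of 3-byte RGB pixels, so most image viewers open it directly, and common
+// tools can convert it (e.g. ImageMagick's `convert frame.ppm frame.png`).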
+
+VKAPI_ATTR VkResult VKAPI_CALL
+CreateInstance(const VkInstanceCreateInfo *pCreateInfo,
+               const VkAllocationCallbacks *pAllocator, VkInstance *pInstance) {
+    VkLayerInstanceCreateInfo *chain_info =
+        get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);
+
+    assert(chain_info->u.pLayerInfo);
+    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr =
+        chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;
+    assert(fpGetInstanceProcAddr);
+    PFN_vkCreateInstance fpCreateInstance =
+        (PFN_vkCreateInstance)fpGetInstanceProcAddr(VK_NULL_HANDLE,
+                                                    "vkCreateInstance");
+    if (fpCreateInstance == NULL) {
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    // Advance the link info for the next element on the chain
+    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;
+
+    VkResult result = fpCreateInstance(pCreateInfo, pAllocator, pInstance);
+    if (result != VK_SUCCESS)
+        return result;
+
+    initInstanceTable(*pInstance, fpGetInstanceProcAddr);
+
+    init_screenshot();
+
+    return result;
+}
+
+// TODO: hook DestroyInstance to clean up
+
+static void
+createDeviceRegisterExtensions(const VkDeviceCreateInfo *pCreateInfo,
+                               VkDevice device) {
+    uint32_t i;
+    DeviceMapStruct *devMap = get_dev_info(device);
+    VkLayerDispatchTable *pDisp = devMap->device_dispatch_table;
+    PFN_vkGetDeviceProcAddr gpa = pDisp->GetDeviceProcAddr;
+    pDisp->CreateSwapchainKHR =
+        (PFN_vkCreateSwapchainKHR)gpa(device, "vkCreateSwapchainKHR");
+    pDisp->GetSwapchainImagesKHR =
+        (PFN_vkGetSwapchainImagesKHR)gpa(device, "vkGetSwapchainImagesKHR");
+    pDisp->AcquireNextImageKHR =
+        (PFN_vkAcquireNextImageKHR)gpa(device, "vkAcquireNextImageKHR");
+    pDisp->QueuePresentKHR =
+        (PFN_vkQueuePresentKHR)gpa(device, "vkQueuePresentKHR");
+    devMap->wsi_enabled = false;
+    for (i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
+        if (strcmp(pCreateInfo->ppEnabledExtensionNames[i],
+                   VK_KHR_SWAPCHAIN_EXTENSION_NAME) == 0)
+            devMap->wsi_enabled = true;
+    }
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+CreateDevice(VkPhysicalDevice gpu, const VkDeviceCreateInfo *pCreateInfo,
+             const VkAllocationCallbacks *pAllocator, VkDevice *pDevice) {
+    VkLayerDeviceCreateInfo *chain_info =
+        get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);
+
+    assert(chain_info->u.pLayerInfo);
+    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr =
+        chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;
+    PFN_vkGetDeviceProcAddr fpGetDeviceProcAddr =
+        chain_info->u.pLayerInfo->pfnNextGetDeviceProcAddr;
+    VkInstance instance = physDeviceMap[gpu]->instance;
+    PFN_vkCreateDevice fpCreateDevice =
+        (PFN_vkCreateDevice)fpGetInstanceProcAddr(instance, "vkCreateDevice");
+    if (fpCreateDevice == NULL) {
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    // Advance the link info for the next element on the chain
+    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;
+
+    VkResult result = fpCreateDevice(gpu, pCreateInfo, pAllocator, pDevice);
+    if (result != VK_SUCCESS) {
+        return result;
+    }
+
+    assert(deviceMap.find(*pDevice) == deviceMap.end());
+    DeviceMapStruct *deviceMapElem = new DeviceMapStruct;
+    deviceMap[*pDevice] = deviceMapElem;
+
+    // Setup device dispatch table
+    deviceMapElem->device_dispatch_table = new VkLayerDispatchTable;
+    layer_init_device_dispatch_table(
+        *pDevice, deviceMapElem->device_dispatch_table, fpGetDeviceProcAddr);
+
+    createDeviceRegisterExtensions(pCreateInfo, *pDevice);
+    // Create a mapping from a device to a physicalDevice
+    deviceMapElem->physicalDevice = gpu;
+
+    // store the loader callback for initializing created dispatchable objects
+    chain_info = get_chain_info(pCreateInfo, VK_LOADER_DATA_CALLBACK);
+    if (chain_info) {
+        deviceMapElem->pfn_dev_init = chain_info->u.pfnSetDeviceLoaderData;
+    } else {
+        deviceMapElem->pfn_dev_init = NULL;
+    }
+    return result;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+EnumeratePhysicalDevices(VkInstance instance, uint32_t *pPhysicalDeviceCount,
+                         VkPhysicalDevice *pPhysicalDevices) {
+    VkResult result;
+
+    VkLayerInstanceDispatchTable *pTable = instance_dispatch_table(instance);
+    result = pTable->EnumeratePhysicalDevices(instance, pPhysicalDeviceCount,
+                                              pPhysicalDevices);
+    if (result == VK_SUCCESS && *pPhysicalDeviceCount > 0 && pPhysicalDevices) {
+        for (uint32_t i = 0; i < *pPhysicalDeviceCount; i++) {
+            // Create a mapping from a physicalDevice to an instance
+            if (physDeviceMap[pPhysicalDevices[i]] == NULL) {
+                PhysDeviceMapStruct *physDeviceMapElem =
+                    new PhysDeviceMapStruct;
+                physDeviceMap[pPhysicalDevices[i]] = physDeviceMapElem;
+            }
+            physDeviceMap[pPhysicalDevices[i]]->instance = instance;
+        }
+    }
+    return result;
+}
+
+VKAPI_ATTR void VKAPI_CALL
+DestroyDevice(VkDevice device, const VkAllocationCallbacks *pAllocator) {
+    DeviceMapStruct *devMap = get_dev_info(device);
+    assert(devMap);
+    VkLayerDispatchTable *pDisp = devMap->device_dispatch_table;
+    pDisp->DestroyDevice(device, pAllocator);
+
+    loader_platform_thread_lock_mutex(&globalLock);
+    delete pDisp;
+    delete devMap;
+
+    deviceMap.erase(device);
+    loader_platform_thread_unlock_mutex(&globalLock);
+}
+
+VKAPI_ATTR void VKAPI_CALL GetDeviceQueue(VkDevice device,
+                                          uint32_t queueNodeIndex,
+                                          uint32_t queueIndex,
+                                          VkQueue *pQueue) {
+    DeviceMapStruct *devMap = get_dev_info(device);
+    assert(devMap);
+    VkLayerDispatchTable *pDisp = devMap->device_dispatch_table;
+    pDisp->GetDeviceQueue(device, queueNodeIndex, queueIndex, pQueue);
+
+    // Save the device queue in a map if we are taking screenshots.
+    loader_platform_thread_lock_mutex(&globalLock);
+    if (screenshotEnvQueried && screenshotFrames.empty()) {
+        // No screenshots in the list to take
+        loader_platform_thread_unlock_mutex(&globalLock);
+        return;
+    }
+
+    VkDevice que = static_cast<VkDevice>(static_cast<void *>(*pQueue));
+    deviceMap.emplace(que, devMap);
+
+    // Create a mapping from a device to a queue
+    devMap->queue = *pQueue;
+    loader_platform_thread_unlock_mutex(&globalLock);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL CreateCommandPool(
+    VkDevice device, const VkCommandPoolCreateInfo *pCreateInfo,
+    const VkAllocationCallbacks *pAllocator, VkCommandPool *pCommandPool) {
+    DeviceMapStruct *devMap = get_dev_info(device);
+    assert(devMap);
+    VkLayerDispatchTable *pDisp = devMap->device_dispatch_table;
+    VkResult result =
+        pDisp->CreateCommandPool(device, pCreateInfo, pAllocator, pCommandPool);
+
+    // Save the command pool in a map if we are taking screenshots.
+    loader_platform_thread_lock_mutex(&globalLock);
+    if (screenshotEnvQueried && screenshotFrames.empty()) {
+        // No screenshots in the list to take
+        loader_platform_thread_unlock_mutex(&globalLock);
+        return result;
+    }
+
+    // Create a mapping from a device to a commandPool
+    devMap->commandPool = *pCommandPool;
+    loader_platform_thread_unlock_mutex(&globalLock);
+    return result;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL CreateSwapchainKHR(
+    VkDevice device, const VkSwapchainCreateInfoKHR *pCreateInfo,
+    const VkAllocationCallbacks *pAllocator, VkSwapchainKHR *pSwapchain) {
+
+    DeviceMapStruct *devMap = get_dev_info(device);
+    assert(devMap);
+    VkLayerDispatchTable *pDisp = devMap->device_dispatch_table;
+
+    // This layer does an image copy later on, and the copy command expects the
+    // transfer src bit to be on.
+    VkSwapchainCreateInfoKHR myCreateInfo = *pCreateInfo;
+    myCreateInfo.imageUsage |= VK_IMAGE_USAGE_TRANSFER_SRC_BIT;
+    VkResult result = pDisp->CreateSwapchainKHR(device, &myCreateInfo,
+                                                pAllocator, pSwapchain);
+
+    // Save the swapchain in a map if we are taking screenshots.
+    loader_platform_thread_lock_mutex(&globalLock);
+    if (screenshotEnvQueried && screenshotFrames.empty()) {
+        // No screenshots in the list to take
+        loader_platform_thread_unlock_mutex(&globalLock);
+        return result;
+    }
+
+    if (result == VK_SUCCESS) {
+        // Create a mapping for a swapchain to a device, image extent, and
+        // format
+        SwapchainMapStruct *swapchainMapElem = new SwapchainMapStruct;
+        swapchainMapElem->device = device;
+        swapchainMapElem->imageExtent = pCreateInfo->imageExtent;
+        swapchainMapElem->format = pCreateInfo->imageFormat;
+        swapchainMap.insert(make_pair(*pSwapchain, swapchainMapElem));
+
+        // Create a mapping for the swapchain object into the dispatch table
+        // TODO: is this needed?
+        // screenshot_device_table_map.emplace((void *)pSwapchain, pTable);
+    }
+    loader_platform_thread_unlock_mutex(&globalLock);
+
+    return result;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+GetSwapchainImagesKHR(VkDevice device, VkSwapchainKHR swapchain,
+                      uint32_t *pCount, VkImage *pSwapchainImages) {
+    DeviceMapStruct *devMap = get_dev_info(device);
+    assert(devMap);
+    VkLayerDispatchTable *pDisp = devMap->device_dispatch_table;
+    VkResult result = pDisp->GetSwapchainImagesKHR(device, swapchain, pCount,
+                                                   pSwapchainImages);
+
+    // Save the swapchain images in a map if we are taking screenshots
+    loader_platform_thread_lock_mutex(&globalLock);
+    if (screenshotEnvQueried && screenshotFrames.empty()) {
+        // No screenshots in the list to take
+        loader_platform_thread_unlock_mutex(&globalLock);
+        return result;
+    }
+
+    if (result == VK_SUCCESS && pSwapchainImages && !swapchainMap.empty() &&
+        swapchainMap.find(swapchain) != swapchainMap.end()) {
+        unsigned i;
+
+        for (i = 0; i < *pCount; i++) {
+            // Create a mapping for an image to a device, image extent, and
+            // format
+            if (imageMap[pSwapchainImages[i]] == NULL) {
+                ImageMapStruct *imageMapElem = new ImageMapStruct;
+                imageMap[pSwapchainImages[i]] = imageMapElem;
+            }
+            imageMap[pSwapchainImages[i]]->device =
+                swapchainMap[swapchain]->device;
+            imageMap[pSwapchainImages[i]]->imageExtent =
+                swapchainMap[swapchain]->imageExtent;
+            imageMap[pSwapchainImages[i]]->format =
+                swapchainMap[swapchain]->format;
+        }
+
+        // Add list of images to swapchain to image map
+        SwapchainMapStruct *swapchainMapElem = swapchainMap[swapchain];
+        if (i >= 1 && swapchainMapElem) {
+            VkImage *imageList = new VkImage[i];
+            swapchainMapElem->imageList = imageList;
+            for (unsigned j = 0; j < i; j++) {
+                swapchainMapElem->imageList[j] = pSwapchainImages[j];
+            }
+        }
+    }
+    loader_platform_thread_unlock_mutex(&globalLock);
+    return result;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+QueuePresentKHR(VkQueue queue, const VkPresentInfoKHR *pPresentInfo) {
+    static int frameNumber = 0;
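+    // Debugging aid left in place: flush output at frame 10; the
+    // commented-out null-pointer write can be re-enabled to force a crash
+    // there under a debugger.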
+    if (frameNumber == 10) {
+        fflush(stdout); /* *((int*)0)=0; */
+    }
+    DeviceMapStruct *devMap = get_dev_info((VkDevice)queue);
+    assert(devMap);
+    VkLayerDispatchTable *pDisp = devMap->device_dispatch_table;
+    VkResult result = pDisp->QueuePresentKHR(queue, pPresentInfo);
+    loader_platform_thread_lock_mutex(&globalLock);
+
+    if (!screenshotEnvQueried) {
+        const char *_vk_screenshot = local_getenv(env_var);
+        if (_vk_screenshot && *_vk_screenshot) {
+            string spec(_vk_screenshot), word;
+            size_t start = 0, comma = 0;
+
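+            // The env var holds a comma-separated list of frame numbers,
+            // e.g. "1,4,10"; words that do not start with a digit are
+            // ignored by the loop below.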
+            while (start < spec.size()) {
+                int frameToAdd;
+                comma = spec.find(',', start);
+                if (comma == string::npos)
+                    word = string(spec, start);
+                else
+                    word = string(spec, start, comma - start);
+                frameToAdd = atoi(word.c_str());
+                // Add the frame number to the set, but only if the word
+                // starts with a digit; set insertion already ignores
+                // duplicates
+                if (*(word.c_str()) >= '0' && *(word.c_str()) <= '9') {
+                    screenshotFrames.insert(frameToAdd);
+                }
+                if (comma == string::npos)
+                    break;
+                start = comma + 1;
+            }
+        }
+        local_free_getenv(_vk_screenshot);
+        screenshotEnvQueried = true;
+    }
+
+    if (result == VK_SUCCESS && !screenshotFrames.empty()) {
+        set<int>::iterator it;
+        it = screenshotFrames.find(frameNumber);
+        if (it != screenshotFrames.end()) {
+            string fileName;
+
+#ifdef ANDROID
+            // std::to_string is not supported currently
+            char buffer[64];
+            snprintf(buffer, sizeof(buffer), "/sdcard/Android/%d", frameNumber);
+            std::string base(buffer);
+            fileName = base + ".ppm";
+#else
+            fileName = to_string(frameNumber) + ".ppm";
+#endif
+
+            VkImage image;
+            VkSwapchainKHR swapchain;
+            // We'll dump only one image: the first
+            swapchain = pPresentInfo->pSwapchains[0];
+            image = swapchainMap[swapchain]
+                        ->imageList[pPresentInfo->pImageIndices[0]];
+            writePPM(fileName.c_str(), image);
+            screenshotFrames.erase(it);
+
+            if (screenshotFrames.empty()) {
+                // Free all our maps since we are done with them.
+                for (auto it = swapchainMap.begin(); it != swapchainMap.end();
+                     it++) {
+                    SwapchainMapStruct *swapchainMapElem = it->second;
+                    delete swapchainMapElem;
+                }
+                for (auto it = imageMap.begin(); it != imageMap.end(); it++) {
+                    ImageMapStruct *imageMapElem = it->second;
+                    delete imageMapElem;
+                }
+                for (auto it = physDeviceMap.begin(); it != physDeviceMap.end();
+                     it++) {
+                    PhysDeviceMapStruct *physDeviceMapElem = it->second;
+                    delete physDeviceMapElem;
+                }
+                swapchainMap.clear();
+                imageMap.clear();
+                physDeviceMap.clear();
+            }
+        }
+    }
+    frameNumber++;
+    loader_platform_thread_unlock_mutex(&globalLock);
+    return result;
+}
+
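+// Layer metadata reported through the enumeration entrypoints below:
+// layerName, specVersion, implementationVersion, description.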
+static const VkLayerProperties global_layer = {
+    "VK_LAYER_LUNARG_screenshot", VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION), 1,
+    "Layer: screenshot",
+};
+
+VKAPI_ATTR VkResult VKAPI_CALL EnumerateInstanceLayerProperties(
+    uint32_t *pCount, VkLayerProperties *pProperties) {
+    return util_GetLayerProperties(1, &global_layer, pCount, pProperties);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL EnumerateDeviceLayerProperties(
+    VkPhysicalDevice physicalDevice, uint32_t *pCount,
+    VkLayerProperties *pProperties) {
+    return util_GetLayerProperties(1, &global_layer, pCount, pProperties);
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL
+EnumerateInstanceExtensionProperties(const char *pLayerName, uint32_t *pCount,
+                                     VkExtensionProperties *pProperties) {
+    if (pLayerName && !strcmp(pLayerName, global_layer.layerName))
+        return util_GetExtensionProperties(0, NULL, pCount, pProperties);
+
+    return VK_ERROR_LAYER_NOT_PRESENT;
+}
+
+VKAPI_ATTR VkResult VKAPI_CALL EnumerateDeviceExtensionProperties(
+    VkPhysicalDevice physicalDevice, const char *pLayerName, uint32_t *pCount,
+    VkExtensionProperties *pProperties) {
+    if (pLayerName && !strcmp(pLayerName, global_layer.layerName))
+        return util_GetExtensionProperties(0, NULL, pCount, pProperties);
+
+    assert(physicalDevice);
+
+    VkLayerInstanceDispatchTable *pTable =
+        instance_dispatch_table(physicalDevice);
+    return pTable->EnumerateDeviceExtensionProperties(
+        physicalDevice, pLayerName, pCount, pProperties);
+}
+
+static PFN_vkVoidFunction intercept_core_instance_command(const char *name);
+
+static PFN_vkVoidFunction intercept_core_device_command(const char *name);
+
+static PFN_vkVoidFunction intercept_khr_swapchain_command(const char *name,
+                                                          VkDevice dev);
+
+VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL
+GetDeviceProcAddr(VkDevice dev, const char *funcName) {
+    PFN_vkVoidFunction proc = intercept_core_device_command(funcName);
+    if (proc)
+        return proc;
+
+    if (dev == NULL) {
+        return NULL;
+    }
+
+    proc = intercept_khr_swapchain_command(funcName, dev);
+    if (proc)
+        return proc;
+
+    DeviceMapStruct *devMap = get_dev_info(dev);
+    assert(devMap);
+    VkLayerDispatchTable *pDisp = devMap->device_dispatch_table;
+
+    if (pDisp->GetDeviceProcAddr == NULL)
+        return NULL;
+    return pDisp->GetDeviceProcAddr(dev, funcName);
+}
+
+VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL
+GetInstanceProcAddr(VkInstance instance, const char *funcName) {
+    PFN_vkVoidFunction proc = intercept_core_instance_command(funcName);
+    if (proc)
+        return proc;
+
+    assert(instance);
+
+    proc = intercept_core_device_command(funcName);
+    if (!proc)
+        proc = intercept_khr_swapchain_command(funcName, VK_NULL_HANDLE);
+    if (proc)
+        return proc;
+
+    VkLayerInstanceDispatchTable *pTable = instance_dispatch_table(instance);
+    if (pTable->GetInstanceProcAddr == NULL)
+        return NULL;
+    return pTable->GetInstanceProcAddr(instance, funcName);
+}
+
+static PFN_vkVoidFunction intercept_core_instance_command(const char *name) {
+
+    static const struct {
+        const char *name;
+        PFN_vkVoidFunction proc;
+    } core_instance_commands[] = {
+        {"vkGetInstanceProcAddr",
+         reinterpret_cast<PFN_vkVoidFunction>(GetInstanceProcAddr)},
+        {"vkCreateInstance",
+         reinterpret_cast<PFN_vkVoidFunction>(CreateInstance)},
+        {"vkCreateDevice", reinterpret_cast<PFN_vkVoidFunction>(CreateDevice)},
+        {"vkEnumeratePhysicalDevices",
+         reinterpret_cast<PFN_vkVoidFunction>(EnumeratePhysicalDevices)},
+        {"vkEnumerateInstanceLayerProperties",
+         reinterpret_cast<PFN_vkVoidFunction>(
+             EnumerateInstanceLayerProperties)},
+        {"vkEnumerateDeviceLayerProperties",
+         reinterpret_cast<PFN_vkVoidFunction>(EnumerateDeviceLayerProperties)},
+        {"vkEnumerateInstanceExtensionProperties",
+         reinterpret_cast<PFN_vkVoidFunction>(
+             EnumerateInstanceExtensionProperties)},
+        {"vkEnumerateDeviceExtensionProperties",
+         reinterpret_cast<PFN_vkVoidFunction>(
+             EnumerateDeviceExtensionProperties)}};
+
+    for (size_t i = 0; i < ARRAY_SIZE(core_instance_commands); i++) {
+        if (!strcmp(core_instance_commands[i].name, name))
+            return core_instance_commands[i].proc;
+    }
+
+    return nullptr;
+}
+
+static PFN_vkVoidFunction intercept_core_device_command(const char *name) {
+    static const struct {
+        const char *name;
+        PFN_vkVoidFunction proc;
+    } core_device_commands[] = {
+        {"vkGetDeviceProcAddr",
+         reinterpret_cast<PFN_vkVoidFunction>(GetDeviceProcAddr)},
+        {"vkGetDeviceQueue",
+         reinterpret_cast<PFN_vkVoidFunction>(GetDeviceQueue)},
+        {"vkCreateCommandPool",
+         reinterpret_cast<PFN_vkVoidFunction>(CreateCommandPool)},
+        {"vkDestroyDevice",
+         reinterpret_cast<PFN_vkVoidFunction>(DestroyDevice)},
+    };
+
+    for (size_t i = 0; i < ARRAY_SIZE(core_device_commands); i++) {
+        if (!strcmp(core_device_commands[i].name, name))
+            return core_device_commands[i].proc;
+    }
+
+    return nullptr;
+}
+
+static PFN_vkVoidFunction intercept_khr_swapchain_command(const char *name,
+                                                          VkDevice dev) {
+    static const struct {
+        const char *name;
+        PFN_vkVoidFunction proc;
+    } khr_swapchain_commands[] = {
+        {"vkCreateSwapchainKHR",
+         reinterpret_cast<PFN_vkVoidFunction>(CreateSwapchainKHR)},
+        {"vkGetSwapchainImagesKHR",
+         reinterpret_cast<PFN_vkVoidFunction>(GetSwapchainImagesKHR)},
+        {"vkQueuePresentKHR",
+         reinterpret_cast<PFN_vkVoidFunction>(QueuePresentKHR)},
+    };
+
+    if (dev) {
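+        // Expose the swapchain entrypoints only when this device enabled
+        // WSI (wsi_enabled); otherwise let the next layer in the chain
+        // handle them.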
+        DeviceMapStruct *devMap = get_dev_info(dev);
+        if (!devMap->wsi_enabled)
+            return nullptr;
+    }
+
+    for (size_t i = 0; i < ARRAY_SIZE(khr_swapchain_commands); i++) {
+        if (!strcmp(khr_swapchain_commands[i].name, name))
+            return khr_swapchain_commands[i].proc;
+    }
+
+    return nullptr;
+}
+
+} // namespace screenshot
+
+// loader-layer interface v0, just wrappers since there is only a layer
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL
+vkEnumerateInstanceLayerProperties(uint32_t *pCount,
+                                   VkLayerProperties *pProperties) {
+    return screenshot::EnumerateInstanceLayerProperties(pCount, pProperties);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateDeviceLayerProperties(
+    VkPhysicalDevice physicalDevice, uint32_t *pCount,
+    VkLayerProperties *pProperties) {
+    // the layer command handles VK_NULL_HANDLE just fine internally
+    assert(physicalDevice == VK_NULL_HANDLE);
+    return screenshot::EnumerateDeviceLayerProperties(VK_NULL_HANDLE, pCount,
+                                                      pProperties);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL
+vkEnumerateInstanceExtensionProperties(const char *pLayerName, uint32_t *pCount,
+                                       VkExtensionProperties *pProperties) {
+    return screenshot::EnumerateInstanceExtensionProperties(pLayerName, pCount,
+                                                            pProperties);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL
+vkEnumerateDeviceExtensionProperties(VkPhysicalDevice physicalDevice,
+                                     const char *pLayerName, uint32_t *pCount,
+                                     VkExtensionProperties *pProperties) {
+    // the layer command handles VK_NULL_HANDLE just fine internally
+    assert(physicalDevice == VK_NULL_HANDLE);
+    return screenshot::EnumerateDeviceExtensionProperties(
+        VK_NULL_HANDLE, pLayerName, pCount, pProperties);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL
+vkGetDeviceProcAddr(VkDevice dev, const char *funcName) {
+    return screenshot::GetDeviceProcAddr(dev, funcName);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL
+vkGetInstanceProcAddr(VkInstance instance, const char *funcName) {
+    return screenshot::GetInstanceProcAddr(instance, funcName);
+}
diff --git a/layersvt/vk_layer_settings.txt b/layersvt/vk_layer_settings.txt
new file mode 100644
index 0000000..abf18fd
--- /dev/null
+++ b/layersvt/vk_layer_settings.txt
@@ -0,0 +1,78 @@
+################################################################################
+#
+#  This file contains per-layer settings that configure layer behavior at
+#  execution time. Comments in this file are denoted with the "#" char.
+#  Settings lines are of the form:
+#      "<LayerIdentifier>.<SettingName> = <SettingValue>"
+#
+#  <LayerIdentifier> is typically the official layer name, minus the VK_LAYER
+#  prefix, lowercased -- e.g., for VK_LAYER_LUNARG_api_dump, the
+#  layer identifier is 'lunarg_api_dump'.
+#
+################################################################################
+################################################################################
+#  VK_LAYER_LUNARG_api_dump Settings:
+#  ==================================
+#
+#    DETAILED:
+#    =========
+#    <LayerIdentifier>.detailed : Setting this to TRUE causes parameter details
+#    to be dumped in addition to API calls.
+#
+#    NO_ADDR:
+#    ========
+#    <LayerIdentifier>.no_addr : Setting this to TRUE causes "address" to be
+#    dumped in place of hex addresses.
+#
+#    FILE:
+#    =====
+#    <LayerIdentifier>.file : Setting this to TRUE indicates that output
+#    should be written to a file instead of STDOUT.
+#
+#    LOG_FILENAME:
+#    =============
+#    <LayerIdentifier>.log_filename : Specifies the file to dump to when
+#    "file = TRUE".  The default is "vk_apidump.txt".
+#
+#    FLUSH:
+#    ======
+#    <LayerIdentifier>.flush : Setting this to TRUE causes output to be
+#    flushed after each API call is written
+#
+#   INDENT_SIZE
+#   ==============
+#   <LayerIdentifier>.indent_size : Specifies the number of spaces that a tab
+#   is equal to
+#
+#   SHOW_TYPES
+#   ==============
+#   <LayerIdentifier>.show_types : Setting this to TRUE causes types to be
+#   dumped in addition to values
+#
+#   NAME_SIZE
+#   ==============
+#   <LayerIdentifier>.name_size : The number of characters the name of a
+#   variable should consume, assuming more are not required
+#
+#   TYPE_SIZE
+#   ==============
+#   <LayerIdentifier>.type_size : The number of characters the type of a
+#   variable should consume, assuming more are not required
+#
+#   USE_SPACES
+#   ==============
+#   <LayerIdentifier>.use_spaces : Setting this to TRUE causes all tab
+#   characters to be replaced with spaces
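+#
+#   SHOW_SHADER
+#   ==============
+#   <LayerIdentifier>.show_shader : Set below but not documented above;
+#   judging by the name, TRUE presumably causes shader contents to be
+#   dumped as well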
+
+#  VK_LAYER_LUNARG_api_dump Settings
+lunarg_api_dump.detailed = TRUE
+lunarg_api_dump.no_addr = FALSE
+lunarg_api_dump.file = FALSE
+lunarg_api_dump.log_filename = vk_apidump.txt
+lunarg_api_dump.flush = TRUE
+lunarg_api_dump.indent_size = 4
+lunarg_api_dump.show_types = TRUE
+lunarg_api_dump.name_size = 32
+lunarg_api_dump.type_size = 0
+lunarg_api_dump.use_spaces = TRUE
+lunarg_api_dump.show_shader = FALSE
diff --git a/layersvt/windows/VkLayer_api_dump.json b/layersvt/windows/VkLayer_api_dump.json
new file mode 100644
index 0000000..6d7021f
--- /dev/null
+++ b/layersvt/windows/VkLayer_api_dump.json
@@ -0,0 +1,11 @@
+{
+    "file_format_version" : "1.0.0",
+    "layer" : {
+        "name": "VK_LAYER_LUNARG_api_dump",
+        "type": "GLOBAL",
+        "library_path": ".\\VkLayer_api_dump.dll",
+        "api_version": "1.0.33",
+        "implementation_version": "2",
+        "description": "LunarG debug layer"
+    }
+}
diff --git a/layersvt/windows/VkLayer_basic.json b/layersvt/windows/VkLayer_basic.json
new file mode 100644
index 0000000..d59a1ba
--- /dev/null
+++ b/layersvt/windows/VkLayer_basic.json
@@ -0,0 +1,18 @@
+{
+    "file_format_version" : "1.0.0",
+    "layer" : {
+        "name": "VK_LAYER_LUNARG_basic",
+        "type": "GLOBAL",
+        "library_path": ".\\VkLayer_basic.dll",
+        "api_version": "1.0.33",
+        "implementation_version": "1",
+        "description": "LunarG Sample Layer",
+        "device_extensions": [
+             {
+                 "name": "VK_LUNARG_LayerExtension1",
+                 "spec_version": "0",
+                 "entrypoints": ["vkLayerExtension1"]
+             }
+         ]
+    }
+}
diff --git a/layersvt/windows/VkLayer_generic.json b/layersvt/windows/VkLayer_generic.json
new file mode 100644
index 0000000..3aa8ed4
--- /dev/null
+++ b/layersvt/windows/VkLayer_generic.json
@@ -0,0 +1,11 @@
+{
+    "file_format_version" : "1.0.0",
+    "layer" : {
+        "name": "VK_LAYER_LUNARG_generic",
+        "type": "GLOBAL",
+        "library_path": ".\\VkLayer_generic.dll",
+        "api_version": "1.0.33",
+        "implementation_version": "1",
+        "description": "LunarG sample layer"
+    }
+}
diff --git a/layersvt/windows/VkLayer_multi.json b/layersvt/windows/VkLayer_multi.json
new file mode 100644
index 0000000..3b1fb2f
--- /dev/null
+++ b/layersvt/windows/VkLayer_multi.json
@@ -0,0 +1,28 @@
+{
+    "file_format_version" : "1.0.1",
+    "layers" : [
+        {
+            "name": "VK_LAYER_LUNARG_multi1",
+            "type": "INSTANCE",
+            "library_path": ".\\VkLayer_multi.dll",
+            "api_version": "1.0.33",
+            "implementation_version": "1",
+            "description": "LunarG Sample multiple layer per library",
+            "functions" : {
+              "vkGetInstanceProcAddr" : "VK_LAYER_LUNARG_multi1GetInstanceProcAddr",
+              "vkGetDeviceProcAddr" : "VK_LAYER_LUNARG_multi1GetDeviceProcAddr"
+            }
+        },
+        {
+            "name": "VK_LAYER_LUNARG_multi2",
+            "type": "INSTANCE",
+            "library_path": ".\\VkLayer_multi.dll",
+            "api_version": "1.0.33",
+            "implementation_version": "1",
+            "description": "LunarG Sample multiple layer per library",
+            "functions" : {
+              "vkGetInstanceProcAddr" : "VK_LAYER_LUNARG_multi2GetInstanceProcAddr"
+            }
+        }
+    ]
+}
diff --git a/layersvt/windows/VkLayer_screenshot.json b/layersvt/windows/VkLayer_screenshot.json
new file mode 100644
index 0000000..22e767f
--- /dev/null
+++ b/layersvt/windows/VkLayer_screenshot.json
@@ -0,0 +1,11 @@
+{
+    "file_format_version" : "1.0.0",
+    "layer" : {
+        "name": "VK_LAYER_LUNARG_screenshot",
+        "type": "GLOBAL",
+        "library_path": ".\\VkLayer_screenshot.dll",
+        "api_version": "1.0.33",
+        "implementation_version": "1",
+        "description": "LunarG image capture layer"
+    }
+}
diff --git a/libs/vkjson/CMakeLists.txt b/libs/vkjson/CMakeLists.txt
index 2e79d91..8c218b1 100644
--- a/libs/vkjson/CMakeLists.txt
+++ b/libs/vkjson/CMakeLists.txt
@@ -1,23 +1,16 @@
 # Copyright 2015 Google Inc. All rights reserved.
 #
-# Permission is hereby granted, free of charge, to any person obtaining a
-# copy of this software and/or associated documentation files (the
-# "Materials"), to deal in the Materials without restriction, including
-# without limitation the rights to use, copy, modify, merge, publish,
-# distribute, sublicense, and/or sell copies of the Materials, and to
-# permit persons to whom the Materials are furnished to do so, subject to
-# the following conditions:
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
 #
-# The above copyright notice and this permission notice shall be included
-# in all copies or substantial portions of the Materials.
+#     http://www.apache.org/licenses/LICENSE-2.0
 #
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-# THE SOFTWARE.
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
 include_directories(
 	${CMAKE_CURRENT_SOURCE_DIR}
diff --git a/tests/0001-vktrace-In-vktracereplay-test-change-layer-folder-to.patch b/tests/0001-vktrace-In-vktracereplay-test-change-layer-folder-to.patch
new file mode 100644
index 0000000..08440c4
--- /dev/null
+++ b/tests/0001-vktrace-In-vktracereplay-test-change-layer-folder-to.patch
@@ -0,0 +1,33 @@
+From 001884e2c4ac53067a158cc7e014c7eb670a9485 Mon Sep 17 00:00:00 2001
+From: Arda Coskunses <arda@lunarg.com>
+Date: Tue, 27 Sep 2016 17:09:40 -0600
+Subject: [PATCH] vktrace: In vktracereplay test change layer folder to
+ layersvt
+
+Change-Id: Id7736608b7da0bcd61bfe2c1bc3f023b582fbf03
+---
+ tests/_vktracereplay.ps1 | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+diff --git a/tests/_vktracereplay.ps1 b/tests/_vktracereplay.ps1
+index ce98bff..fee27d7 100644
+--- a/tests/_vktracereplay.ps1
++++ b/tests/_vktracereplay.ps1
+@@ -29,10 +29,10 @@ cp ..\..\demos\$dPath\cube.exe .
+ cp ..\..\demos\*.ppm .

+ cp ..\..\demos\*.spv .

+ cp ..\..\loader\$dPath\vulkan-1.dll .

+-cp ..\..\layers\$dPath\VkLayer_screenshot.dll .

+-cp ..\..\layers\$dPath\VkLayer_screenshot.json .

+-cp ..\..\layers\$dPath\VkLayer_vktrace_layer.dll .

+-cp ..\..\layers\$dPath\VkLayer_vktrace_layer.json .

++cp ..\..\layersvt\$dPath\VkLayer_screenshot.dll .

++cp ..\..\layersvt\$dPath\VkLayer_screenshot.json .

++cp ..\..\layersvt\$dPath\VkLayer_vktrace_layer.dll .

++cp ..\..\layersvt\$dPath\VkLayer_vktrace_layer.json .

+ 

+ # Change PATH to the temp directory

+ $oldpath = $Env:PATH

+-- 
+2.7.4
+
diff --git a/tests/0001-vktrace-test-disable-mean-calc.patch b/tests/0001-vktrace-test-disable-mean-calc.patch
new file mode 100644
index 0000000..ae3c295
--- /dev/null
+++ b/tests/0001-vktrace-test-disable-mean-calc.patch
@@ -0,0 +1,88 @@
+From 140ed5b0db38bf3766cfccbb54b055870a8e662c Mon Sep 17 00:00:00 2001
+From: Arda Coskunses <arda@lunarg.com>
+Date: Wed, 28 Sep 2016 16:16:30 -0600
+Subject: [PATCH] vktrace: test disable mean calc
+
+Change-Id: I6d182f66ae17eaca0bb7c22d8934ce481563053a
+---
+ tests/_vktracereplay.ps1 | 64 ++++++++++++++++++++++++------------------------
+ 1 file changed, 32 insertions(+), 32 deletions(-)
+
+diff --git a/tests/_vktracereplay.ps1 b/tests/_vktracereplay.ps1
+index fee27d7..71273d0 100644
+--- a/tests/_vktracereplay.ps1
++++ b/tests/_vktracereplay.ps1
+@@ -75,38 +75,38 @@ if ($exitstatus -eq 0) {
+ }

+ 

+ # check the average pixel value of each screenshot to ensure something plausible was written

+-if ($exitstatus -eq 0) {

+-    $trace_mean = (convert 1-trace.ppm -format "%[mean]" info:)

+-    $replay_mean = (convert 1-replay.ppm -format "%[mean]" info:)

+-    $version = (identify -version)

+-

+-    # normalize the values so we can support Q8 and Q16 imagemagick installations

+-    if ($version -match "Q8") {

+-        $trace_mean = $trace_mean   / 255 # 2^8-1

+-        $replay_mean = $replay_mean / 255 # 2^8-1

+-    } else {

+-        $trace_mean = $trace_mean   / 65535 # 2^16-1

+-        $replay_mean = $replay_mean / 65535 # 2^16-1

+-    }

+-

+-    # if either screenshot is too bright or too dark, it either failed, or is a bad test

+-    if (($trace_mean -lt 0.10) -or ($trace_mean -gt 0.90)){

+-        echo ''

+-        echo 'Trace screenshot failed mean check, must be in range [0.1, 0.9]'

+-        write-host 'Detected mean:' $trace_mean

+-        echo ''

+-        write-host -background black -foreground red "[  FAILED  ] "  -nonewline;

+-        $exitstatus = 1

+-    }

+-    if (($replay_mean -lt 0.10) -or ($replay_mean -gt 0.90)){

+-        echo ''

+-        echo 'Replay screenshot failed mean check, must be in range [0.1, 0.9]'

+-        write-host 'Detected mean:' $replay_mean

+-        echo ''

+-        write-host -background black -foreground red "[  FAILED  ] "  -nonewline;

+-        $exitstatus = 1

+-    }

+-}

++#if ($exitstatus -eq 0) {

++#    $trace_mean = (convert 1-trace.ppm -format "%[mean]" info:)

++#    $replay_mean = (convert 1-replay.ppm -format "%[mean]" info:)

++#    $version = (identify -version)

++#

++#    # normalize the values so we can support Q8 and Q16 imagemagick installations

++#    if ($version -match "Q8") {

++#        $trace_mean = $trace_mean   / 255 # 2^8-1

++#        $replay_mean = $replay_mean / 255 # 2^8-1

++#    } else {

++#        $trace_mean = $trace_mean   / 65535 # 2^16-1

++#        $replay_mean = $replay_mean / 65535 # 2^16-1

++#    }

++#

++#    # if either screenshot is too bright or too dark, it either failed, or is a bad test

++#    if (($trace_mean -lt 0.10) -or ($trace_mean -gt 0.90)){

++#        echo ''

++#        echo 'Trace screenshot failed mean check, must be in range [0.1, 0.9]'

++#        write-host 'Detected mean:' $trace_mean

++#        echo ''

++#        write-host -background black -foreground red "[  FAILED  ] "  -nonewline;

++#        $exitstatus = 1

++#    }

++#    if (($replay_mean -lt 0.10) -or ($replay_mean -gt 0.90)){

++#        echo ''

++#        echo 'Replay screenshot failed mean check, must be in range [0.1, 0.9]'

++#        write-host 'Detected mean:' $replay_mean

++#        echo ''

++#        write-host -background black -foreground red "[  FAILED  ] "  -nonewline;

++#        $exitstatus = 1

++#    }

++#}

+ 

+ # if we passed all the checks, the test is good

+ if ($exitstatus -eq 0) {

+-- 
+2.7.4
+
diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt
index b1b83a5..3ac2493 100644
--- a/tests/CMakeLists.txt
+++ b/tests/CMakeLists.txt
@@ -20,7 +20,7 @@
 set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake")
 
 if(WIN32)
-    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_CRT_SECURE_NO_WARNINGS -D_USE_MATH_DEFINES")
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_CRT_SECURE_NO_WARNINGS -D_USE_MATH_DEFINES -DMAGICKCORE_HDRI_ENABLE=0 -DMAGICKCORE_QUANTUM_DEPTH=16")
 
     # If MSVC, disable some signed/unsigned mismatch warnings.
     if (MSVC)
@@ -28,7 +28,7 @@
     endif()
 
 else()
-    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -DMAGICKCORE_HDRI_ENABLE=0 -DMAGICKCORE_QUANTUM_DEPTH=16")
 endif()
 
 set (LIBGLM_INCLUDE_DIR ${PROJECT_SOURCE_DIR}/libs)
@@ -44,6 +44,7 @@
     "${PROJECT_SOURCE_DIR}/tests/gtest-1.7.0/include"
     "${PROJECT_SOURCE_DIR}/icd/common"
     "${PROJECT_SOURCE_DIR}/layers"
+    ${XCB_INCLUDE_DIRS}
     ${GLSLANG_SPIRV_INCLUDE_DIR}
     ${LIBGLM_INCLUDE_DIR}
     )
@@ -57,25 +58,31 @@
             COMMAND ln -sf ${CMAKE_CURRENT_SOURCE_DIR}/run_loader_tests.sh
             COMMAND ln -sf ${CMAKE_CURRENT_SOURCE_DIR}/run_extra_loader_tests.sh
             COMMAND ln -sf ${CMAKE_CURRENT_SOURCE_DIR}/vkvalidatelayerdoc.sh
+            # Files unique to VulkanTools go below this line
+            COMMAND ln -sf ${CMAKE_CURRENT_SOURCE_DIR}/vktracereplay.sh
             VERBATIM
             )
     endif()
 else()
     if (NOT (CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_CURRENT_BINARY_DIR))
         FILE(TO_NATIVE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/_run_all_tests.ps1 RUN_ALL)
-        FILE(TO_NATIVE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/_vkvalidatelayerdoc.ps1 VALIDATE_DOC)
+        FILE(TO_NATIVE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/_vktracereplay.ps1 VKTRACEREPLAY)
+        FILE(TO_NATIVE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/_vkvalidatelayerdoc.ps1 VKVALIDATELAYERDOC)
         add_custom_target(binary-dir-symlinks ALL
-            COMMAND ${CMAKE_COMMAND} -E copy_if_different ${RUN_ALL} run_all_tests.ps1
-            COMMAND ${CMAKE_COMMAND} -E copy_if_different ${VALIDATE_DOC} vkvalidatelayerdoc.ps1
+            COMMAND ${CMAKE_COMMAND} -E copy_if_different ${RUN_ALL} _run_all_tests.ps1
+            COMMAND ${CMAKE_COMMAND} -E copy_if_different ${VKTRACEREPLAY} _vktracereplay.ps1
+            COMMAND ${CMAKE_COMMAND} -E copy_if_different ${VKVALIDATELAYERDOC} _vkvalidatelayerdoc.ps1
             VERBATIM
             )
     endif()
 endif()
 
 if(WIN32)
-   set (LIBVK "${API_LOWERCASE}-${MAJOR}")
+   set (LIBVK "vulkan-${MAJOR}")
+   set (TEST_LIBRARIES ${GLSLANG_LIBRARIES})
 elseif(UNIX)
-   set (LIBVK "${API_LOWERCASE}")
+   set (LIBVK "vulkan")
+   set (TEST_LIBRARIES ${GLSLANG_LIBRARIES} ${XCB_LIBRARIES} ${X11_LIBRARIES})
 else()
 endif()
 
@@ -83,22 +90,13 @@
 set_target_properties(vk_layer_validation_tests
    PROPERTIES
    COMPILE_DEFINITIONS "GTEST_LINKED_AS_SHARED_LIBRARY=1")
-if(NOT WIN32)
-    if (BUILD_WSI_XCB_SUPPORT OR BUILD_WSI_XLIB_SUPPORT)
-        target_link_libraries(vk_layer_validation_tests ${LIBVK} ${XCB_LIBRARIES} ${X11_LIBRARIES} gtest gtest_main VkLayer_utils ${GLSLANG_LIBRARIES})
-    else()
-        target_link_libraries(vk_layer_validation_tests ${LIBVK} gtest gtest_main VkLayer_utils ${GLSLANG_LIBRARIES})
-    endif()
-endif()
-if(WIN32)
-   target_link_libraries(vk_layer_validation_tests ${LIBVK} gtest gtest_main VkLayer_utils ${GLSLANG_LIBRARIES})
-endif()
+target_link_libraries(vk_layer_validation_tests ${LIBVK} gtest gtest_main VkLayer_utils ${TEST_LIBRARIES})
 
 add_executable(vk_loader_validation_tests loader_validation_tests.cpp ${COMMON_CPP})
 set_target_properties(vk_loader_validation_tests
    PROPERTIES
    COMPILE_DEFINITIONS "GTEST_LINKED_AS_SHARED_LIBRARY=1")
-target_link_libraries(vk_loader_validation_tests ${LIBVK} gtest gtest_main VkLayer_utils ${GLSLANG_LIBRARIES})
+target_link_libraries(vk_loader_validation_tests ${LIBVK} gtest gtest_main VkLayer_utils ${TEST_LIBRARIES})
 
 add_subdirectory(gtest-1.7.0)
 add_subdirectory(layers)
diff --git a/tests/_vktracereplay.ps1 b/tests/_vktracereplay.ps1
new file mode 100644
index 0000000..71273d0
--- /dev/null
+++ b/tests/_vktracereplay.ps1
@@ -0,0 +1,126 @@
+# Powershell script for running the vktrace trace/replay auto test

+# To run this test:

+#    cd <this-dir>

+#    powershell C:\src\LoaderAndValidationLayers\vktracereplay.ps1 [-Debug]

+$exitstatus = 0

+

+if ($args[0] -eq "-Debug") {

+    $dPath = "Debug"

+} else {

+    $dPath = "Release"

+}

+

+write-host -background black -foreground green "[  RUN     ] " -nonewline

+write-host "vktracereplay.ps1: Vktrace trace/replay"

+

+# Create a temp directory to run the test in

+

+if (Test-Path .\vktracereplay_tmp) {

+    rm -recurse -force .\vktracereplay_tmp  > $null 2> $null

+}

+new-item vktracereplay_tmp -itemtype directory > $null 2> $null

+

+# Copy everything we need into the temp directory, so we

+# can make sure we are using the correct dll and exe files

+cd vktracereplay_tmp

+cp ..\..\vktrace\$dPath\vkreplay.exe .

+cp ..\..\vktrace\$dPath\vktrace.exe .

+cp ..\..\demos\$dPath\cube.exe .

+cp ..\..\demos\*.ppm .

+cp ..\..\demos\*.spv .

+cp ..\..\loader\$dPath\vulkan-1.dll .

+cp ..\..\layersvt\$dPath\VkLayer_screenshot.dll .

+cp ..\..\layersvt\$dPath\VkLayer_screenshot.json .

+cp ..\..\layersvt\$dPath\VkLayer_vktrace_layer.dll .

+cp ..\..\layersvt\$dPath\VkLayer_vktrace_layer.json .

+

+# Change PATH to the temp directory

+$oldpath = $Env:PATH

+$Env:PATH = $pwd

+

+# Set up some modified env vars

+$Env:VK_LAYER_PATH = $pwd

+

+# Do a trace and replay
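+# (-o names the output trace file, -s 1 captures a screenshot of frame 1,
+#  -p/-a give the target program and its arguments)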

+& vktrace -o c01.vktrace -s 1 -p cube -a "--c 10" > trace.sout 2> trace.serr

+rename-item -path 1.ppm -newname 1-trace.ppm

+& vkreplay  -s 1 -t  c01.vktrace > replay.sout 2> replay.serr

+rename-item -path 1.ppm -newname 1-replay.ppm

+

+# Force a failure - for testing this script

+#cp vulkan.dll 1-replay.ppm

+#rm 1-trace.ppm

+#rm 1-replay.ppm

+

+# Restore PATH

+$Env:PATH = $oldpath

+

+if ($exitstatus -eq 0) {

+   # Check that two screenshots were created

+   if (!(Test-Path 1-trace.ppm) -or !(Test-Path 1-replay.ppm)) {

+           echo 'Trace or replay screenshot does not exist'

+           write-host -background black -foreground red "[  FAILED  ] "  -nonewline;

+           $exitstatus = 1

+   }

+}

+

+if ($exitstatus -eq 0) {

+    # ensure the trace and replay snapshots are identical

+    fc.exe /b 1-trace.ppm 1-replay.ppm > $null

+    if (!(Test-Path 1-trace.ppm) -or !(Test-Path 1-replay.ppm) -or $LastExitCode -eq 1) {

+         echo 'Trace files do not match'

+         write-host -background black -foreground red "[  FAILED  ] "  -nonewline;

+         $exitstatus = 1

+    }

+}

+

+# check the average pixel value of each screenshot to ensure something plausible was written

+#if ($exitstatus -eq 0) {

+#    $trace_mean = (convert 1-trace.ppm -format "%[mean]" info:)

+#    $replay_mean = (convert 1-replay.ppm -format "%[mean]" info:)

+#    $version = (identify -version)

+#

+#    # normalize the values so we can support Q8 and Q16 imagemagick installations

+#    if ($version -match "Q8") {

+#        $trace_mean = $trace_mean   / 255 # 2^8-1

+#        $replay_mean = $replay_mean / 255 # 2^8-1

+#    } else {

+#        $trace_mean = $trace_mean   / 65535 # 2^16-1

+#        $replay_mean = $replay_mean / 65535 # 2^16-1

+#    }

+#

+#    # if either screenshot is too bright or too dark, it either failed, or is a bad test

+#    if (($trace_mean -lt 0.10) -or ($trace_mean -gt 0.90)){

+#        echo ''

+#        echo 'Trace screenshot failed mean check, must be in range [0.1, 0.9]'

+#        write-host 'Detected mean:' $trace_mean

+#        echo ''

+#        write-host -background black -foreground red "[  FAILED  ] "  -nonewline;

+#        $exitstatus = 1

+#    }

+#    if (($replay_mean -lt 0.10) -or ($replay_mean -gt 0.90)){

+#        echo ''

+#        echo 'Replay screenshot failed mean check, must be in range [0.1, 0.9]'

+#        write-host 'Detected mean:' $replay_mean

+#        echo ''

+#        write-host -background black -foreground red "[  FAILED  ] "  -nonewline;

+#        $exitstatus = 1

+#    }

+#}

+

+# if we passed all the checks, the test is good

+if ($exitstatus -eq 0) {

+   write-host -background black -foreground green "[  PASSED  ] " -nonewline;

+}

+

+write-host "vktracereplay.ps1: Vktrace trace/replay"

+write-host

+if ($exitstatus) {

+    echo '1 FAILED TEST'

+}

+

+# cleanup

+cd ..

+rm -recurse -force vktracereplay_tmp  > $null 2> $null

+Remove-Item Env:\VK_LAYER_PATH

+exit $exitstatus

diff --git a/tests/icd-spv.h b/tests/icd-spv.h
deleted file mode 100644
index 2275220..0000000
--- a/tests/icd-spv.h
+++ /dev/null
@@ -1,29 +0,0 @@
-/*
- * Copyright (c) 2015-2016 The Khronos Group Inc.
- * Copyright (c) 2015-2016 Valve Corporation
- * Copyright (c) 2015-2016 LunarG, Inc.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Author: Cody Northrop <cody@lunarg.com>
- */
-
-#ifndef ICD_SPV_H
-#define ICD_SPV_H
-
-#include <stdint.h>
-
-#define ICD_SPV_MAGIC   0x07230203
-#define ICD_SPV_VERSION 99
-
-struct icd_spv_header {
-    uint32_t magic;
-    uint32_t version;
-    uint32_t gen_magic;  // Generator's magic number
-};
-
-#endif /* ICD_SPV_H */
diff --git a/tests/layer_validation_tests.cpp b/tests/layer_validation_tests.cpp
index d3dc8c2..9e836c2 100644
--- a/tests/layer_validation_tests.cpp
+++ b/tests/layer_validation_tests.cpp
@@ -51,6 +51,7 @@
 #define SHADER_CHECKER_TESTS 1
 #define DEVICE_LIMITS_TESTS 1
 #define IMAGE_TESTS 1
+#define API_DUMP_TESTS 0
 
 //--------------------------------------------------------------------------------------
 // Mesh and VertexFormat Data
@@ -316,6 +317,7 @@
   protected:
     ErrorMonitor *m_errorMonitor;
     bool m_enableWSI;
+    bool m_enableApiDump;
 
     virtual void SetUp() {
         std::vector<const char *> instance_layer_names;
@@ -323,6 +325,11 @@
         std::vector<const char *> device_extension_names;
 
         instance_extension_names.push_back(VK_EXT_DEBUG_REPORT_EXTENSION_NAME);
+
+        if (m_enableApiDump) {
+            instance_layer_names.push_back("VK_LAYER_LUNARG_api_dump");
+        }
+
         /*
          * Since CreateDbgMsgCallback is an instance level extension call
          * any extension / layer that utilizes that feature also needs
@@ -380,7 +387,10 @@
         delete m_errorMonitor;
     }
 
-    VkLayerTest() { m_enableWSI = false; }
+    VkLayerTest() {
+        m_enableWSI = false;
+        m_enableApiDump = false;
+    }
 };
 
 VkResult VkLayerTest::BeginCommandBuffer(VkCommandBufferObj &commandBuffer) {
@@ -582,6 +592,15 @@
     VkWsiEnabledLayerTest() { m_enableWSI = true; }
 };
 
+class VkApiDumpEnabledLayerTest : public VkLayerTest {
+  public:
+protected:
+    VkApiDumpEnabledLayerTest() {
+        m_enableApiDump = true;
+    }
+};
+
+
 class VkBufferTest {
   public:
     enum eTestEnFlags {
@@ -795,6 +814,7 @@
 };
 
 uint32_t VkVerticesObj::BindIdGenerator;
+
 // ********************************************************************************************************************
 // ********************************************************************************************************************
 // ********************************************************************************************************************
@@ -19847,6 +19867,17 @@
 }
 #endif
 
+#if API_DUMP_TESTS
+TEST_F(VkApiDumpEnabledLayerTest, TestApiDump) {
+
+    // This test invokes the framework only, dumping commands.
+    // Ideally we would have a test harness that automatically verifies
+    // we can dump the entire API.
+    // This test just checks for a pulse, using visual inspection.
+    TEST_DESCRIPTION("Empty test with ApiDump enabled.");
+}
+#endif
+
 int main(int argc, char **argv) {
     int result;
 
diff --git a/tests/run_all_tests.sh b/tests/run_all_tests.sh
index 8cf691e..430840a 100755
--- a/tests/run_all_tests.sh
+++ b/tests/run_all_tests.sh
@@ -16,3 +16,6 @@
 # catch the errors that they are supposed to by intentionally doing things
 # that are wrong
 ./vk_layer_validation_tests
+
+# vktracereplay.sh tests vktrace trace and replay
+./vktracereplay.sh
diff --git a/tests/vktestframeworkandroid.h b/tests/vktestframeworkandroid.h
index 9d16097..922c452 100644
--- a/tests/vktestframeworkandroid.h
+++ b/tests/vktestframeworkandroid.h
@@ -4,6 +4,9 @@
 //  Copyright (c) 2015-2016 Valve Corporation
 //  Copyright (c) 2015-2016 LunarG, Inc.
 //  Copyright (c) 2015-2016 Google, Inc.
+//  Copyright (C) 2015-2016 Valve Corporation
+//  Copyright (C) 2015-2016 LunarG, Inc.
+//  Copyright (C) 2015 Google, Inc.
 //
 // Licensed under the Apache License, Version 2.0 (the "License");
 // you may not use this file except in compliance with the License.
diff --git a/tests/vktracereplay.sh b/tests/vktracereplay.sh
new file mode 100755
index 0000000..cec9a5d
--- /dev/null
+++ b/tests/vktracereplay.sh
@@ -0,0 +1,49 @@
+#!/bin/bash
+#set -x
+if [ -t 1 ] ; then
+    RED='\033[0;31m'
+    GREEN='\033[0;32m'
+    NC='\033[0m' # No Color
+else
+    RED=''
+    GREEN=''
+    NC=''
+fi
+
+printf "$GREEN[ RUN      ]$NC $0\n"
+
+export LD_LIBRARY_PATH=${PWD}/../loader:${LD_LIBRARY_PATH}
+export VK_LAYER_PATH=${PWD}/../layersvt
+
+function trace_replay {
+	PGM=$1
+	VKTRACE=${PWD}/../vktrace/vktrace
+	VKREPLAY=${PWD}/../vktrace/vkreplay
+	APPDIR=${PWD}/../demos
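+	# -s 1 makes both trace and replay write frame 1 as 1.ppm; the two
+	# screenshots are then compared with cmp below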
+	printf "$GREEN[ TRACE    ]$NC ${PGM}\n"
+	${VKTRACE}	--Program ${APPDIR}/${PGM} \
+			--Arguments "--c 100" \
+			--WorkingDir ${APPDIR} \
+			--OutputTrace ${PGM}.vktrace \
+			-s 1
+	printf "$GREEN[ REPLAY   ]$NC ${PGM}\n"
+	${VKREPLAY}	--TraceFile ${PGM}.vktrace \
+			-s 1
+	rm -f ${PGM}.vktrace
+	cmp -s 1.ppm ${APPDIR}/1.ppm
+	RES=$?
+	rm 1.ppm ${APPDIR}/1.ppm
+	if [ $RES -eq 0 ] ; then
+	   printf "$GREEN[  PASSED  ]$NC ${PGM}\n"
+	else
+	   printf "$RED[  FAILED  ]$NC screenshot file compare failed\n"
+	   printf "$RED[  FAILED  ]$NC ${PGM}\n"
+	   printf "TEST FAILED\n"
+	   exit 1
+	fi
+}
+
+trace_replay cube
+
+exit 0
+
diff --git a/update_external_sources.bat b/update_external_sources.bat
index 65d151f..7ac5090 100644
--- a/update_external_sources.bat
+++ b/update_external_sources.bat
@@ -16,6 +16,7 @@
 set BASE_DIR="%BUILD_DIR%external"
 set GLSLANG_DIR=%BASE_DIR%\glslang
 set SPIRV_TOOLS_DIR=%BASE_DIR%\spirv-tools
+set JSONCPP_DIR=%BASE_DIR%\jsoncpp
 
 REM // ======== Parameter parsing ======== //
 
@@ -25,16 +26,20 @@
       echo Available options:
       echo   --sync-glslang      just pull glslang_revision
       echo   --sync-spirv-tools  just pull spirv-tools_revision
+      echo   --sync-jsoncpp      just pull jsoncpp HEAD
       echo   --build-glslang     pulls glslang_revision, configures CMake, builds Release and Debug
       echo   --build-spirv-tools pulls spirv-tools_revision, configures CMake, builds Release and Debug
-      echo   --all               sync and build glslang, LunarGLASS, spirv-tools
+      echo   --build-jsoncpp     pulls jsoncpp HEAD, configures CMake, builds Release and Debug
+      echo   --all               sync and build glslang, LunarGLASS, spirv-tools, and jsoncpp
       goto:finish
    )
 
    set sync-glslang=0
    set sync-spirv-tools=0
+   set sync-jsoncpp=0
    set build-glslang=0
    set build-spirv-tools=0
+   set build-jsoncpp=0
    set check-glslang-build-dependencies=0
 
    :parameterLoop
@@ -53,6 +58,12 @@
          goto:parameterLoop
       )
 
+      if "%1" == "--sync-jsoncpp" (
+         set sync-jsoncpp=1
+         shift
+         goto:parameterLoop
+      )
+
       if "%1" == "--build-glslang" (
          set sync-glslang=1
          set check-glslang-build-dependencies=1
@@ -70,11 +81,20 @@
          goto:parameterLoop
       )
 
+      if "%1" == "--build-jsoncpp" (
+         set sync-jsoncpp=1
+         set build-jsoncpp=1
+         shift
+         goto:parameterLoop
+      )
+
       if "%1" == "--all" (
          set sync-glslang=1
          set sync-spirv-tools=1
+         set sync-jsoncpp=1
          set build-glslang=1
          set build-spirv-tools=1
+         set build-jsoncpp=1
          set check-glslang-build-dependencies=1
          shift
          goto:parameterLoop
@@ -138,10 +158,11 @@
 set /p GLSLANG_REVISION= < glslang_revision
 set /p SPIRV_TOOLS_REVISION= < spirv-tools_revision
 set /p SPIRV_HEADERS_REVISION= < spirv-headers_revision
+set /p JSONCPP_REVISION= < jsoncpp_revision
 echo GLSLANG_REVISION=%GLSLANG_REVISION%
 echo SPIRV_TOOLS_REVISION=%SPIRV_TOOLS_REVISION%
 echo SPIRV_HEADERS_REVISION=%SPIRV_HEADERS_REVISION%
-
+echo JSONCPP_REVISION=%JSONCPP_REVISION%
 
 echo Creating and/or updating glslang, spirv-tools in %BASE_DIR%
 
@@ -170,6 +191,19 @@
    if %errorCode% neq 0 (goto:error)
 )
 
+if %sync-jsoncpp% equ 1 (
+   if exist %JSONCPP_DIR% (
+      rd /S /Q %JSONCPP_DIR%
+   )
+   if %errorlevel% neq 0 (goto:error)
+   if not exist %JSONCPP_DIR% (
+      call:create_jsoncpp
+   )
+   if %errorCode% neq 0 (goto:error)
+   call:update_jsoncpp
+   if %errorCode% neq 0 (goto:error)
+)
+
 if %build-glslang% equ 1 (
    call:build_glslang
    if %errorCode% neq 0 (goto:error)
@@ -180,6 +214,11 @@
    if %errorCode% neq 0 (goto:error)
 )
 
+if %build-jsoncpp% equ 1 (
+   call:build_jsoncpp
+   if %errorCode% neq 0 (goto:error)
+)
+
 echo.
 echo Exiting
 goto:finish
@@ -194,8 +233,6 @@
 endlocal
 goto:eof
 
-
-
 REM // ======== Functions ======== //
 
 :create_glslang
@@ -398,3 +435,103 @@
       set errorCode=1
    )
 goto:eof
+
+:create_jsoncpp
+   echo.
+   echo Creating local jsoncpp repository %JSONCPP_DIR%
+   mkdir %JSONCPP_DIR%
+   cd %JSONCPP_DIR%
+   git clone https://github.com/open-source-parsers/jsoncpp.git .
+   git checkout %JSONCPP_REVISION%
+   if not exist %JSONCPP_DIR%\include\json\json.h (
+      echo jsoncpp source download failed!
+      set errorCode=1
+   )
+goto:eof
+
+:update_jsoncpp
+   echo.
+   echo Updating %JSONCPP_DIR%
+   cd %JSONCPP_DIR%
+   git fetch --all
+   git checkout %JSONCPP_REVISION%
+goto:eof
+
+:build_jsoncpp
+   echo.
+   echo Building %JSONCPP_DIR%
+   cd  %JSONCPP_DIR%
+   python amalgamate.py
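+   REM amalgamate.py generates single-file jsoncpp sources (dist\json\json.h
+   REM and dist\jsoncpp.cpp) that via later compiles in directly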
+   
+   if not exist %JSONCPP_DIR%\dist\json\json.h (
+      echo.
+      echo JsonCPP Amalgamation failed to generate %JSONCPP_DIR%\dist\json\json.h
+      set errorCode=1
+   )
+
+REM    REM Cleanup any old directories lying around.
+REM    if exist build32 (
+REM       rmdir /s /q build32
+REM    )
+REM    if exist build (
+REM       rmdir /s /q build
+REM    )
+REM 
+REM    echo Making 32-bit jsoncpp
+REM    echo *************************
+REM    mkdir build32
+REM    set JSONCPP_BUILD_DIR=%JSONCPP_DIR%\build32
+REM    cd %JSONCPP_BUILD_DIR%
+REM 
+REM    echo Generating 32-bit JsonCPP CMake files for Visual Studio %VS_VERSION%
+REM    cmake -G "Visual Studio %VS_VERSION%" .. -DMSVC_RUNTIME=static
+REM 
+REM    echo Building 32-bit JsonCPP: MSBuild ALL_BUILD.vcxproj /p:Platform=x86 /p:Configuration=Debug
+REM    msbuild ALL_BUILD.vcxproj /p:Platform=x86 /p:Configuration=Debug /verbosity:quiet
+REM 
+REM    REM Check for existence of one lib, even though we should check for all results
+REM    if not exist %JSONCPP_BUILD_DIR%\src\lib_json\Debug\jsoncpp.lib (
+REM       echo.
+REM       echo jsoncpp 32-bit Debug build failed!
+REM       set errorCode=1
+REM    )
+REM    echo B Building 32-bit JsonCPP: MSBuild ALL_BUILD.vcxproj /p:Platform=x86 /p:Configuration=Release
+REM    msbuild ALL_BUILD.vcxproj /p:Platform=x86 /p:Configuration=Release /verbosity:quiet
+REM 
+REM    REM Check for existence of one lib, even though we should check for all results
+REM    if not exist %JSONCPP_BUILD_DIR%\src\lib_json\Release\jsoncpp.lib (
+REM       echo.
+REM       echo jsoncpp 32-bit Release build failed!
+REM       set errorCode=1
+REM    )
+REM 
+REM    cd ..
+REM 
+REM    echo Making 64-bit jsoncpp
+REM    echo *************************
+REM    mkdir build
+REM    set JSONCPP_BUILD_DIR=%JSONCPP_DIR%\build
+REM    cd %JSONCPP_BUILD_DIR%
+REM 
+REM    echo Generating 64-bit JsonCPP CMake files for Visual Studio %VS_VERSION%
+REM    cmake -G "Visual Studio %VS_VERSION% Win64" .. -DMSVC_RUNTIME=static
+REM 
+REM    echo Building 64-bit JsonCPP: MSBuild ALL_BUILD.vcxproj /p:Platform=x64 /p:Configuration=Debug
+REM    msbuild ALL_BUILD.vcxproj /p:Platform=x64 /p:Configuration=Debug /verbosity:quiet
+REM 
+REM    REM Check for existence of one lib, even though we should check for all results
+REM    if not exist %JSONCPP_BUILD_DIR%\src\lib_json\Debug\jsoncpp.lib (
+REM       echo.
+REM       echo jsoncpp 64-bit Debug build failed!
+REM       set errorCode=1
+REM    )
+REM    echo Building 64-bit JsonCPP: MSBuild ALL_BUILD.vcxproj /p:Platform=x64 /p:Configuration=Release
+REM    msbuild ALL_BUILD.vcxproj /p:Platform=x64 /p:Configuration=Release /verbosity:quiet
+REM 
+REM    REM Check for existence of one lib, even though we should check for all results
+REM    if not exist %JSONCPP_BUILD_DIR%\src\lib_json\Release\jsoncpp.lib (
+REM       echo.
+REM       echo jsoncpp 64-bit Release build failed!
+REM       set errorCode=1
+REM    )
+goto:eof
\ No newline at end of file
diff --git a/update_external_sources.sh b/update_external_sources.sh
index 6f87903..56231a3 100755
--- a/update_external_sources.sh
+++ b/update_external_sources.sh
@@ -6,9 +6,14 @@
 GLSLANG_REVISION=$(cat "${PWD}"/glslang_revision)
 SPIRV_TOOLS_REVISION=$(cat "${PWD}"/spirv-tools_revision)
 SPIRV_HEADERS_REVISION=$(cat "${PWD}"/spirv-headers_revision)
+JSONCPP_REVISION=$(cat "${PWD}"/jsoncpp_revision)
 echo "GLSLANG_REVISION=${GLSLANG_REVISION}"
 echo "SPIRV_TOOLS_REVISION=${SPIRV_TOOLS_REVISION}"
 echo "SPIRV_HEADERS_REVISION=${SPIRV_HEADERS_REVISION}"
+echo "JSONCPP_REVISION=${JSONCPP_REVISION}"
+
+LUNARGLASS_REVISION=$(cat "${PWD}"/LunarGLASS_revision)
+echo "LUNARGLASS_REVISION=$LUNARGLASS_REVISION"
 
 BUILDDIR=$PWD
 BASEDIR=$BUILDDIR/external
@@ -68,6 +73,55 @@
    make install
 }
 
+function create_LunarGLASS () {
+   rm -rf "${BASEDIR}"/LunarGLASS
+   echo "Creating local LunarGLASS repository (${BASEDIR}/LunarGLASS)."
+   mkdir -p "${BASEDIR}"/LunarGLASS
+   cd "${BASEDIR}"/LunarGLASS
+   git clone https://github.com/LunarG/LunarGLASS.git .
+   mkdir -p Core/LLVM
+   cd Core/LLVM 
+   wget http://llvm.org/releases/3.4/llvm-3.4.src.tar.gz
+   tar --gzip -xf llvm-3.4.src.tar.gz
+   git checkout -f $LUNARGLASS_REVISION . # put back the LunarGLASS versions of some LLVM files
+   # copy overlay files
+   cd "${BASEDIR}"/LunarGLASS
+   cp -R "${BUILDDIR}"/LunarGLASS/* .
+}
+
+function update_LunarGLASS () {
+   echo "Updating ${BASEDIR}/LunarGLASS"
+   cd "${BASEDIR}"/LunarGLASS
+   git fetch
+   git checkout -f $LUNARGLASS_REVISION .
+   # Figure out how to do this with git
+   #git checkout $LUNARGLASS_REVISION |& tee gitout
+   #if grep --quiet LLVM gitout ; then
+   #   rm -rf $BASEDIR/LunarGLASS/Core/LLVM/llvm-3.4/build
+   #fi
+   #rm -rf gitout
+ 
+   # copy overlay files
+   cp -R "${BUILDDIR}"/LunarGLASS/* .
+}
+
+function build_LunarGLASS () {
+   echo "Building ${BASEDIR}/LunarGLASS"
+   cd "${BASEDIR}"/LunarGLASS/Core/LLVM/llvm-3.4
+   if [ ! -d "${BASEDIR}/LunarGLASS/Core/LLVM/llvm-3.4/build" ]; then
+      mkdir -p build
+      cd build
+      ../configure --enable-terminfo=no --enable-curses=no
+      REQUIRES_RTTI=1 make -j $(nproc) && make install DESTDIR=`pwd`/install
+   fi
+   cd "${BASEDIR}"/LunarGLASS
+   mkdir -p build
+   cd build
+   cmake -D CMAKE_BUILD_TYPE=Release ..
+   make
+   make install
+}
+
 function build_spirv-tools () {
    echo "Building ${BASEDIR}/spirv-tools"
    cd "${BASEDIR}"/spirv-tools
@@ -77,17 +131,42 @@
    make -j $(nproc)
 }
 
-# If any options are provided, just compile those tools
-# If no options are provided, build everything
+function create_jsoncpp () {
+   rm -rf ${BASEDIR}/jsoncpp
+   echo "Creating local jsoncpp repository (${BASEDIR}/jsoncpp)."
+   mkdir -p ${BASEDIR}/jsoncpp
+   cd ${BASEDIR}/jsoncpp
+   git clone https://github.com/open-source-parsers/jsoncpp.git .
+   git checkout ${JSONCPP_REVISION}
+}
+
+function update_jsoncpp () {
+   echo "Updating ${BASEDIR}/jsoncpp"
+   cd ${BASEDIR}/jsoncpp
+   git fetch --all
+   git checkout ${JSONCPP_REVISION}
+}
+
+function build_jsoncpp () {
+   echo "Building ${BASEDIR}/jsoncpp"
+   cd ${BASEDIR}/jsoncpp
+   python amalgamate.py
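+   # amalgamate.py produces the single-file jsoncpp sources under dist/
+   # (jsoncpp.cpp and json/json.h) that via compiles in directly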
+}
+
 INCLUDE_GLSLANG=false
 INCLUDE_SPIRV_TOOLS=false
+INCLUDE_LUNARGLASS=false
+INCLUDE_JSONCPP=false
 
 if [ "$#" == 0 ]; then
-  echo "Building glslang, spirv-tools"
+  # If no options are provided, build everything
+  echo "Building glslang, spirv-tools, LunarGLASS, and jsoncpp"
   INCLUDE_GLSLANG=true
   INCLUDE_SPIRV_TOOLS=true
+  INCLUDE_LUNARGLASS=true
+  INCLUDE_JSONCPP=true
 else
-  # Parse options
+  # If any options are provided, just compile those tools
   while [[ $# > 0 ]]
   do
     option="$1"
@@ -98,16 +177,32 @@
         INCLUDE_GLSLANG=true
         echo "Building glslang ($option)"
         ;;
+
         # options to specify build of spirv-tools components
         -s|--spirv-tools)
         INCLUDE_SPIRV_TOOLS=true
         echo "Building spirv-tools ($option)"
         ;;
+
+        # options to specify build of LunarGLASS components
+        -l|--LunarGLASS)
+        INCLUDE_LUNARGLASS=true
+        echo "Building LunarGLASS ($option)"
+        ;;
+
+        # options to specify build of jsoncpp components
+        -j|--jsoncpp)
+        INCLUDE_JSONCPP=true
+        echo "Building jsoncpp ($option)"
+        ;;
+
         *)
         echo "Unrecognized option: $option"
         echo "Try the following:"
         echo " -g | --glslang      # enable glslang"
         echo " -s | --spirv-tools  # enable spirv-tools"
+        echo " -l | --LunarGLASS   # enable LunarGLASS"
+        echo " -j | --jsoncpp      # enable jsoncpp"
         exit 1
         ;;
     esac
@@ -123,7 +218,6 @@
   build_glslang
 fi
 
-
 if [ ${INCLUDE_SPIRV_TOOLS} == "true" ]; then
     if [ ! -d "${BASEDIR}/spirv-tools" -o ! -d "${BASEDIR}/spirv-tools/.git" ]; then
        create_spirv-tools
@@ -131,3 +225,19 @@
     update_spirv-tools
     build_spirv-tools
 fi
+
+if [ $INCLUDE_LUNARGLASS == "true" ]; then
+    if [ ! -d "${BASEDIR}/LunarGLASS" -o ! -d "${BASEDIR}/LunarGLASS/.git" ]; then
+       create_LunarGLASS
+    fi
+    update_LunarGLASS
+    build_LunarGLASS
+fi
+
+if [ ${INCLUDE_JSONCPP} == "true" ]; then
+    if [ ! -d "${BASEDIR}/jsoncpp" -o ! -d "${BASEDIR}/jsoncpp/.git" ]; then
+       create_jsoncpp
+    fi
+    update_jsoncpp
+    build_jsoncpp
+fi
diff --git a/via/CMakeLists.txt b/via/CMakeLists.txt
new file mode 100644
index 0000000..7d4f410
--- /dev/null
+++ b/via/CMakeLists.txt
@@ -0,0 +1,64 @@
+file(GLOB TEXTURES
+   "${PROJECT_SOURCE_DIR}/via/images/*"
+)
+file(COPY ${TEXTURES} DESTINATION ${CMAKE_BINARY_DIR}/via/images)
+
+
+if(WIN32)
+    set (LIBRARIES "vulkan-${MAJOR}")
+
+    # For Windows, since 32-bit and 64-bit items can co-exist, we build each in its own build directory.
+    # 32-bit target data goes in build32, and 64-bit target data goes into build.  So, include/link the
+    # appropriate data at build time.
+    if (CMAKE_CL_64)
+        set (BUILDTGT_DIR build)
+    else ()
+        set (BUILDTGT_DIR build32)
+    endif()
+
+    # Use static MSVCRT libraries
+    foreach(configuration in CMAKE_C_FLAGS_DEBUG CMAKE_C_FLAGS_MINSIZEREL CMAKE_C_FLAGS_RELEASE CMAKE_C_FLAGS_RELWITHDEBINFO
+                             CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELEASE CMAKE_CXX_FLAGS_RELWITHDEBINFO)
+        if(${configuration} MATCHES "/MD")
+            string(REGEX REPLACE "/MD" "/MT" ${configuration} "${${configuration}}")
+        endif()
+    endforeach()
+
+    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D_CRT_SECURE_NO_WARNINGS -D_USE_MATH_DEFINES")
+    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_CRT_SECURE_NO_WARNINGS -D_USE_MATH_DEFINES")
+
+else()
+
+    if(UNIX)
+        set (LIBRARIES "vulkan")
+    endif()
+
+    if (BUILD_WSI_XCB_SUPPORT)
+        find_package(XCB REQUIRED)
+
+        include_directories(${XCB_INCLUDE_DIRS})
+        link_libraries(${XCB_LIBRARIES})
+    endif()
+    if (BUILD_WSI_XLIB_SUPPORT)
+        find_package(X11 REQUIRED)
+
+        include_directories(${X11_INCLUDE_DIRS})
+        link_libraries(${X11_LIBRARIES})
+    endif()
+    if (BUILD_WSI_WAYLAND_SUPPORT)
+        find_package(Wayland REQUIRED)
+
+        include_directories(${WAYLAND_CLIENT_INCLUDE_DIR})
+        link_libraries(${WAYLAND_CLIENT_LIBRARIES})
+    endif()
+
+    link_libraries(vulkan m)
+
+endif()
+
+add_executable(via via.cpp ${JSONCPP_SOURCE_DIR}/jsoncpp.cpp)
+target_include_directories(via PUBLIC ${JSONCPP_INCLUDE_DIR})
+target_link_libraries(via ${LIBRARIES})
+if(WIN32)
+    target_link_libraries(via version)
+endif()
diff --git a/via/README.md b/via/README.md
new file mode 100644
index 0000000..21662d4
--- /dev/null
+++ b/via/README.md
@@ -0,0 +1,217 @@
+# ![LunarG's Vulkan Installation Analyzer (VIA)](images/lunarg_via_title.png)
+This document is an overview of how to use the [LunarG Vulkan Installation Analyzer (VIA)](https://vulkan.lunarg.com/doc/sdk/latest/windows/via.html).
+VIA is a tool that can:
+ 1. Determine the state of Vulkan components on your system
+ 2. Validate that your Vulkan Loader and drivers are installed properly
+ 3. Capture your system state in a form that can be used as an attachment when submitting bugs
+
+ This document describes where to find the VIA source, how to build and run it, and how to understand the resulting command-line output.
+
+<BR />
+
+
+## Building
+Many components of the LunarG Vulkan SDK are Open Source, including VIA.  VIA is currently part of the LunarG
+[VulkanTools](https://github.com/LunarG/VulkanTools) GitHub repository.
+
+**Windows Note:** VIA is already pre-built as part of the LunarG Windows Vulkan SDK, but should you wish to build a
+debug version or find the source, this section should provide you with the information you need.  Otherwise, simply
+skip down to the "Running" section below.
+
+#### Building VIA in VulkanTools
+Because it is part of a group of tools, you build it from the top folder by
+following the instructions in the [BUILDVT.md](https://github.com/LunarG/VulkanTools/blob/master/BUILDVT.md)
+file at the top of the source tree.
+
+#### Building VIA in the Linux Vulkan SDK
+The source for VIA can also be found in the LunarG Linux [Vulkan SDK](https://vulkan.lunarg.com/sdk/home) in the "source/via" directory.
+ 1. Download and install the Linux SDK
+ 2. Run "source setup-env.sh" in the SDK root directory
+ 3. Run "./build_tools.sh"
+
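+For example, the last two steps combined, run from the SDK root directory:
+```
+source setup-env.sh
+./build_tools.sh
+```
+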
+<BR />
+
+## Running
+You will find the VIA binary in a different location depending on which OS you are using and whether you built it yourself or installed it as part of the SDK.  The following information explains where to find the proper executable.
+
+Please note that if you are trying to diagnose a troublesome application, the **best way** to run VIA to assist in diagnosis is to change to the location of the application, and run via in that folder locally (by typing in a relative or absolute path to the via executable).
+
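+For example (hypothetical paths), if the troublesome application lives in
+`/home/me/games/mygame`, you would change to that directory and launch VIA
+from there:
+```
+cd /home/me/games/mygame
+/home/me/VulkanSDK/x86_64/bin/via
+```
+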
+#### In the Windows Vulkan SDK
+VIA is installed into your start menu as part of the Windows Vulkan SDK.  Simply open your start menu, search for the "Vulkan SDK" and click "via".  This will output the resulting via.html directly to your desktop.
+
+If you need to run via from the command-line, you will find it in your SDK folder (defined by the environment variable "VULKAN_SDK") under the "Bin" folder for 64-bit executables, and "Bin32" folder for 32-bit executables.  From there, simply run:
+```
+via.exe
+```
+
+#### In the Linux Vulkan SDK
+Once built, VIA can be found in the x86_64/bin directory.  You can simply execute it from there using:
+
+```
+via
+```
+
+<BR />
+
+
+#### If Built from VulkanTools
+Go into the folder where you generated the build files in the build step above.
+
+**Linux**
+
+Simply run:
+```
+via
+```
+
+
+**Windows**
+
+If you are building from VulkanTools on Windows, there is one additional required step before you can run VIA properly.  This is because the CMake scripts currently copy an important folder one level too low in the build tree.  The first time you run VIA, you will have to copy this folder into the appropriate location.
+
+Steps to run the first time:
+ 1. Go into the folder you built, and then go into the "via" sub-folder.
+ 2. In this folder you should see an "images" folder.  Copy this into either or both of your "Debug" or "Release" folders.
+ 3. Go into the "Debug" or "Release" folder (whichever you want to work with).
+ 4. Run:
+```
+via.exe
+```
+
+After the first time, you just need to go into the folder and re-run "via.exe".
+
+<BR />
+
+### Resulting Output
+VIA outputs two things:
+ - Command-line output indicating the overall status
+ - An HTML file (called via.html) containing the details, which is written to one of two locations:
+  1. If the current directory is writable, the HTML is placed in that location.
+  2. Otherwise, it is saved to your home folder, except for the Windows Start Menu short-cut, which writes the file to your desktop.
+
+Your home folder is the following location (based on your OS):
+ - Windows: Wherever your environment variables %HOMEDRIVE%\%HOMEPATH% point to.
+ - Linux: It will be placed in your home folder ("~/.").
+
+<BR />
+
+#### Additional command-line arguments
+There are additional command-line parameters which can be used.  These simply augment existing behavior and do not capture any more information.
+The available command-line arguments are:
+
+##### --unique_output
+The --unique_output argument, if provided, will cause the output html to be generated with a date/time suffix.  This will allow you to perform
+multiple state captures on your system without accidentally erasing previous results.  The new file has the following format:
+
+_via_YYYY_MM_DD_HH_MM.html_
+
+where YYYY, MM, DD, HH, and MM are the numeric year, month, day, hour, and minute of the capture.
+
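+For example (illustrative date), running:
+```
+via --unique_output
+```
+on January 5th, 2017 at 2:30 PM would produce _via_2017_01_05_14_30.html_.
+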
+##### --output_path
+The --output_path argument allows the user to specify a location for the output html file. For
+example, if the user runs `via --output_path /home/me/Documents`, then the output file will be
+`/home/me/Documents/via.html`.
+
+<BR />
+
+## Common Command-Line Outputs
+
+#### "SUCCESS: Validation completed properly"
+
+##### Problem:
+LunarG's VIA could detect no problems with your setup.
+
+##### Possible Reason:
+Your system is likely set up properly.  If you have trouble running Vulkan from another location, it could be that your environment variables aren't set up properly.
+
+##### Next Step:
+Re-run VIA from the location your Vulkan application/game is supposed to run.
+
+
+#### "ERROR: Failed to find Vulkan Driver JSON in registry"
+
+##### Problem:
+This is a Windows-specific error that indicates that no Vulkan Driver JSON files were referenced in the appropriate place in the Windows Registry.
+
+##### Possible Reason:
+This can indicate that a Vulkan driver failed to install properly.
+
+##### Next Step:
+You should follow up with your graphics driver vendor.  See more details below.
+
+
+#### "ERROR: Failed to find Vulkan Driver JSON"
+
+##### Problem:
+The Vulkan loader on your system failed to find any Vulkan Driver JSON files in the appropriate places.
+
+##### Possible Reason:
+This can indicate that a Vulkan driver failed to install properly.
+
+##### Next Step:
+You should follow up with your graphics driver vendor.  See more details below.
+
+
+#### "ERROR: Failed to find Vulkan Driver Lib"
+
+##### Problem:
+The Vulkan loader on your system found at least one Vulkan Driver JSON file, but the Driver library referenced in that file appears invalid.
+
+##### Possible Reason:
+This can indicate that a Vulkan driver failed to install properly.
+
+##### Next Step:
+You should follow up with your graphics driver vendor.  See more details below.
+
+
+#### "ERROR: Vulkan failed to find a compatible driver"
+
+##### Problem:
+All the components appeared to be in place to allow Vulkan to load (from an external standpoint), but the loader still failed to find
+a Vulkan compatible driver.
+
+##### Possible Reason:
+This can indicate that either a Vulkan driver failed to install properly, or the run-time is failing for some reason.
+
+##### Next Step:
+First, attempt to re-install the run-time by visiting [LunarXchange](https://vulkan.lunarg.com/), and install the latest Vulkan SDK.
+If that does not work, try to see if there is a newer Vulkan driver available and install that.  If that still fails, file an issue on
+LunarXchange.
+
+
+#### [WINDOWS] Dialog box pops up indicating "vulkan-1.dll is missing from your computer."
+
+##### Problem:
+The Vulkan loader "vulkan-1.dll" couldn't be found on your system.  This file is typically installed with some Vulkan driver installs,
+some Vulkan-capable games, or the LunarG Vulkan SDK.
+
+##### Possible Reason:
+The last Vulkan Runtime install that executed on your system failed to behave properly.  Or, you have never installed a Vulkan loader
+on your system.
+
+##### Next Step:
+To remedy this, visit [LunarXchange](https://vulkan.lunarg.com/), and install the latest Vulkan SDK.  If that does not work, file an
+Issue to have one of their engineers assist you.
+
+
+#### [LINUX] Message indicating "error while loading shared libraries: libvulkan.so.1"
+
+##### Problem:
+The Vulkan loader "libvulkan.so.1" couldn't be found on your system.  This file is typically installed with some Vulkan driver installs,
+some Vulkan-capable games, or the LunarG Vulkan SDK.
+
+##### Possible Reason:
+The last Vulkan Runtime install that executed on your system failed to behave properly.  Or, you have never installed a Vulkan loader on your system.
+
+<BR />
+
+## Vulkan Graphics Driver Problems
+If the problem is possibly related to your Graphics Driver, it could be for several reasons:
+ 1. The hardware you have doesn't support Vulkan.
+ 2. Your hardware supports Vulkan, but you haven't yet installed a driver with Vulkan support.
+ 3. There is no Vulkan driver with support for the OS on which you are currently running.
+    - Sometimes, the company may provide Vulkan support for some devices on one Operating System (say Windows), while Vulkan support on other operating systems is still being completed.
+ 4. Everything supports Vulkan, but the driver failed to install properly.
+
+Before contacting your graphics driver vendor, it would help to verify that the driver installed for your hardware and
+operating system **does** support Vulkan.
diff --git a/via/images/bg-starfield.jpg b/via/images/bg-starfield.jpg
new file mode 100644
index 0000000..946be81
--- /dev/null
+++ b/via/images/bg-starfield.jpg
Binary files differ
diff --git a/via/images/lunarg_via.png b/via/images/lunarg_via.png
new file mode 100644
index 0000000..21778e7
--- /dev/null
+++ b/via/images/lunarg_via.png
Binary files differ
diff --git a/via/images/lunarg_via_title.png b/via/images/lunarg_via_title.png
new file mode 100644
index 0000000..e498b19
--- /dev/null
+++ b/via/images/lunarg_via_title.png
Binary files differ
diff --git a/via/via.cpp b/via/via.cpp
new file mode 100644
index 0000000..7ec16e7
--- /dev/null
+++ b/via/via.cpp
@@ -0,0 +1,4560 @@
+/*
+ * Copyright (c) 2016 Valve Corporation
+ * Copyright (c) 2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Mark Young <marky@lunarg.com>
+ */
+
+#include <cstring>
+#include <exception>
+#include <fstream>
+#include <iostream>
+#include <sstream>
+#include <string>
+#include <vector>
+#include <time.h>
+#include <inttypes.h>
+
+const char APP_VERSION[] = "Version 1.0";
+#define MAX_STRING_LENGTH 1024
+
+#ifdef _WIN32
+#pragma warning(disable : 4996)
+#include "shlwapi.h"
+#else
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/statvfs.h>
+#include <sys/utsname.h>
+#include <dirent.h>
+#include <unistd.h>
+#endif
+
+#include <json/json.h>
+
+#include <vulkan/vulkan.h>
+
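+// Older MSVC compilers (before VS2015) do not provide a C99-conformant
+// snprintf, so fall back to the closest secure-CRT equivalent.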
+#if (defined(_MSC_VER) && _MSC_VER < 1900 /*vs2015*/) || defined MINGW_HAS_SECURE_API
+#include <basetsd.h>
+#define snprintf sprintf_s
+#endif
+
+enum ElementAlign { ALIGN_LEFT = 0, ALIGN_CENTER, ALIGN_RIGHT };
+
+struct PhysicalDeviceInfo {
+    VkPhysicalDevice vulkan_phys_dev;
+    std::vector<VkQueueFamilyProperties> queue_fam_props;
+};
+
+struct GlobalItems {
+    std::ofstream html_file_stream;
+    bool sdk_found;
+    std::string sdk_path;
+    VkInstance instance;
+    std::vector<PhysicalDeviceInfo> phys_devices;
+    std::vector<VkDevice> log_devices;
+    uint32_t cur_table;
+    std::string exe_directory;
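+    // Toggled as table rows are written so styling alternates odd/even.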
+    bool is_odd_row;
+
+#ifdef _WIN32
+    bool is_wow64;
+#endif
+};
+
+// Create a global variable used to store the global settings
+GlobalItems global_items = {};
+
+// Error messages thrown by the application
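+// (A plain -1 is thrown directly for usage errors and other generic
+// failures that have no dedicated code below.)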
+enum ErrorResults {
+    VULKAN_CANT_FIND_DRIVER = -2,
+    MISSING_DRIVER_REGISTRY = -3,
+    MISSING_DRIVER_JSON = -4,
+    MISSING_DRIVER_LIB = -5,
+};
+
+// Structure used to store name/value pairs read from the
+// Vulkan layer settings file (if one exists).
+struct SettingPair {
+    std::string name;
+    std::string value;
+};
+
+void StartOutput(std::string title);
+void EndOutput();
+void PrintSystemInfo(void);
+void PrintVulkanInfo(void);
+void PrintDriverInfo(void);
+void PrintRunTimeInfo(void);
+void PrintSDKInfo(void);
+void PrintExplicitLayerJsonInfo(const char *layer_json_filename,
+                                Json::Value root, uint32_t num_cols);
+void PrintImplicitLayerJsonInfo(const char *layer_json_filename,
+                                Json::Value root);
+void PrintLayerInfo(void);
+void PrintLayerSettingsFileInfo(void);
+void PrintTestResults(void);
+std::string TrimWhitespace(const std::string &str,
+                           const std::string &whitespace = " \t\n\r");
+
+int main(int argc, char **argv) {
+    int err_val = 0;
+    try {
+        time_t time_raw_format;
+        struct tm *ptr_time;
+        char html_file_name[MAX_STRING_LENGTH];
+        char full_file[MAX_STRING_LENGTH];
+        char temp[MAX_STRING_LENGTH];
+        const char* output_path = NULL;
+        bool generate_unique_file = false;
+
+        // Check and handle command-line arguments
+        if (argc > 1) {
+            for (int iii = 1; iii < argc; iii++) {
+                if (0 == strcmp("--unique_output", argv[iii])) {
+                    generate_unique_file = true;
+                } else if (0 == strcmp("--output_path", argv[iii]) &&
+                           argc > (iii + 1)) {
+                    output_path = argv[iii + 1];
+                    ++iii;
+                } else {
+                    std::cout << "Usage of via.exe:" << std::endl
+                              << "    via.exe [--unique_output] "
+                              "[--output_path <path>]" << std::endl
+                              << "          [--unique_output] Optional "
+                              "parameter to generate a unique html"
+                              << std::endl << "                            "
+                              "output file in the form "
+                              "\'via_YYYY_MM_DD_HH_MM.html\'"
+                              << std::endl << "          [--output_path <path>"
+                              "] Optional parameter to generate the output at"
+                              << std::endl << "                               "
+                              "  a given path" << std::endl;
+                    throw(-1);
+                }
+            }
+        }
+        
+        // If the user wants a specific output path, write it to the buffer
+        // and then continue writing the rest of the name below
+        size_t file_name_offset = 0;
+        if (output_path != NULL) {
+            file_name_offset = strlen(output_path) + 1;
+            strncpy(html_file_name, output_path, MAX_STRING_LENGTH - 1);
+#ifdef _WIN32
+            strncpy(html_file_name + file_name_offset - 1, "\\",
+                    MAX_STRING_LENGTH - file_name_offset);
+#else
+            strncpy(html_file_name + file_name_offset - 1, "/",
+                    MAX_STRING_LENGTH - file_name_offset);
+#endif
+        }
+
+        // If the user wants a unique file, generate a file with the current
+        // time and date incorporated into it.
+        if (generate_unique_file) {
+            time(&time_raw_format);
+            ptr_time = localtime(&time_raw_format);
+            if (strftime(html_file_name + file_name_offset,
+                         MAX_STRING_LENGTH - 1 - file_name_offset,
+                         "via_%Y_%m_%d_%H_%M.html", ptr_time) == 0) {
+                std::cerr << "Couldn't prepare formatted string" << std::endl;
+                throw(-1);
+            }
+        } else {
+            strncpy(html_file_name + file_name_offset, "via.html",
+                    MAX_STRING_LENGTH - 1 - file_name_offset);
+        }
+
+        // Write the output file to the current executing directory, or, if
+        // that fails, write it out to the user's home folder.
+        global_items.html_file_stream.open(html_file_name);
+        if (global_items.html_file_stream.fail()) {
+#ifdef _WIN32
+            char home_drive[32];
+            if (0 != GetEnvironmentVariableA("HOMEDRIVE", home_drive, 31) ||
+                0 != GetEnvironmentVariableA("HOMEPATH", temp,
+                                             MAX_STRING_LENGTH - 1)) {
+                std::cerr << "Error failed to get either HOMEDRIVE or HOMEPATH "
+                             "from environment settings!"
+                          << std::endl;
+                throw(-1);
+            }
+            snprintf(full_file, MAX_STRING_LENGTH - 1, "%s%s\\%s", home_drive,
+                     temp, html_file_name);
+#else
+            // "~" is not expanded when opening a file stream, so resolve
+            // the home folder from the HOME environment variable instead.
+            const char *home_env = getenv("HOME");
+            snprintf(full_file, MAX_STRING_LENGTH - 1, "%s/%s",
+                     (home_env != NULL) ? home_env : ".", html_file_name);
+#endif
+            global_items.html_file_stream.open(full_file);
+            if (global_items.html_file_stream.fail()) {
+                std::cerr << "Error failed opening html file stream to "
+                             "either current"
+                             " folder as "
+                          << html_file_name << " or home folder as "
+                          << full_file << std::endl;
+                throw(-1);
+            }
+        }
+
+        global_items.cur_table = 0;
+
+// Determine where we are executing from.
+#ifdef _WIN32
+        int bytes = GetModuleFileName(NULL, temp, MAX_STRING_LENGTH - 1);
+        if (0 < bytes) {
+            std::string exe_location = temp;
+            global_items.exe_directory =
+                exe_location.substr(0, exe_location.rfind("\\"));
+
+            size_t index = 0;
+            while (true) {
+                index = global_items.exe_directory.find("\\", index);
+                if (index == std::string::npos) {
+                    break;
+                }
+                global_items.exe_directory.replace(index, 1, "/");
+                index++;
+            }
+        } else {
+            global_items.exe_directory = "";
+        }
+
+#elif __GNUC__
+        ssize_t len = ::readlink("/proc/self/exe", temp, MAX_STRING_LENGTH - 1);
+        if (0 < len) {
+            std::string exe_location = temp;
+            global_items.exe_directory =
+                exe_location.substr(0, exe_location.rfind("/"));
+        } else {
+            global_items.exe_directory = "";
+        }
+#endif
+
+        StartOutput("LunarG VIA");
+
+        PrintSystemInfo();
+        PrintVulkanInfo();
+        PrintTestResults();
+        EndOutput();
+    } catch (int e) {
+        // Print out a useful message for any common errors.
+        switch (e) {
+        case MISSING_DRIVER_REGISTRY:
+            std::cout << "ERROR: Failed to find Vulkan Driver JSON in registry"
+                      << std::endl;
+            break;
+        case MISSING_DRIVER_JSON:
+            std::cout << "ERROR: Failed to find Vulkan Driver JSON"
+                      << std::endl;
+            break;
+        case MISSING_DRIVER_LIB:
+            std::cout << "ERROR: Failed to find Vulkan Driver Lib" << std::endl;
+            break;
+        case VULKAN_CANT_FIND_DRIVER:
+            std::cout << "ERROR: Vulkan failed to find a compatible driver"
+                      << std::endl;
+            break;
+        default:
+            std::cout << "ERROR: Uknown failure occurred.  Refer to HTML for "
+                         "more info"
+                      << std::endl;
+            break;
+        }
+        err_val = e;
+    }
+    global_items.html_file_stream.close();
+
+    if (err_val == 0) {
+        std::cout << "SUCCESS: Validation completed properly" << std::endl;
+    }
+    return err_val;
+}
+
+// Output helper functions:
+//=============================
+
+// Start writing to the HTML file by creating the appropriate
+// header information including the appropriate CSS and JavaScript
+// items.
+void StartOutput(std::string output) {
+    global_items.html_file_stream << "<!DOCTYPE html>" << std::endl;
+    global_items.html_file_stream << "<HTML lang=\"en\" xml:lang=\"en\" "
+                                     "xmlns=\"http://www.w3.org/1999/xhtml\">"
+                                  << std::endl;
+    global_items.html_file_stream << std::endl
+                                  << "<HEAD>" << std::endl
+                                  << "    <TITLE>" << output << "</TITLE>"
+                                  << std::endl;
+
+    global_items.html_file_stream
+        << "    <META charset=\"UTF-8\">" << std::endl
+        << "    <style media=\"screen\" type=\"text/css\">" << std::endl
+        << "        html {" << std::endl
+        // By defining the color first, this won't override the background image
+        // (unless the images aren't there).
+        << "            background-color: #0b1e48;" << std::endl
+        // The following changes try to load the text image twice (locally, then
+        // off the web) followed by the background image twice (locally, then
+        // off the web).  The background color will only show if both background
+        // image loads fail.  In this way, a user will see their local copy on
+        // their machine, while a person they share it with will see the web
+        // images (or the background color).
+        << "            background-image: url(\"file:///"
+        << global_items.exe_directory << "/images/lunarg_via.png\"), "
+        << "url(\"https://vulkan.lunarg.com/img/lunarg_via.png\"), "
+           "url(\"file:///"
+        << global_items.exe_directory << "/images/bg-starfield.jpg\"), "
+        << "url(\"https://vulkan.lunarg.com/img/bg-starfield.jpg\");"
+        << std::endl
+        << "            background-position: center top, center top, center, "
+           "center;"
+        << std::endl
+        << "            -webkit-background-size: auto, auto, cover, cover;"
+        << std::endl
+        << "            -moz-background-size: auto, auto, cover, cover;"
+        << std::endl
+        << "            -o-background-size: auto, auto, cover, cover;"
+        << std::endl
+        << "            background-size: auto, auto, cover, cover;" << std::endl
+        << "            background-attachment: scroll, scroll, fixed, fixed;"
+        << std::endl
+        << "            background-repeat: no-repeat, no-repeat, no-repeat, "
+           "no-repeat;"
+        << std::endl
+        << "        }" << std::endl
+        // h1.section is used for section headers, and h1.version is used to
+        // print out the application version text (which shows up just under
+        // the title).
+        << "        h1.section {" << std::endl
+        << "            font-family: sans-serif;" << std::endl
+        << "            font-size: 35px;" << std::endl
+        << "            color: #FFFFFF;" << std::endl
+        << "        }" << std::endl
+        << "        h1.version {" << std::endl
+        << "            font-family: sans-serif;" << std::endl
+        << "            font-size: 25px;" << std::endl
+        << "            color: #FFFFFF;" << std::endl
+        << "        }" << std::endl
+        << "        table {" << std::endl
+        << "            min-width: 600px;" << std::endl
+        << "            width: 70%;" << std::endl
+        << "            border-collapse: collapse;" << std::endl
+        << "            border-color: grey;" << std::endl
+        << "            font-family: sans-serif;" << std::endl
+        << "        }" << std::endl
+        << "        td.header {" << std::endl
+        << "            padding: 18px;" << std::endl
+        << "            border: 1px solid #ccc;" << std::endl
+        << "            font-size: 18px;" << std::endl
+        << "            color: #fff;" << std::endl
+        << "        }" << std::endl
+        << "        td.odd {" << std::endl
+        << "            padding: 10px;" << std::endl
+        << "            border: 1px solid #ccc;" << std::endl
+        << "            font-size: 16px;" << std::endl
+        << "            color: rgb(255, 255, 255);" << std::endl
+        << "        }" << std::endl
+        << "        td.even {" << std::endl
+        << "            padding: 10px;" << std::endl
+        << "            border: 1px solid #ccc;" << std::endl
+        << "            font-size: 16px;" << std::endl
+        << "            color: rgb(220, 220, 220);" << std::endl
+        << "        }" << std::endl
+        << "        tr.header {" << std::endl
+        << "            background-color: rgba(255,255,255,0.5);" << std::endl
+        << "        }" << std::endl
+        << "        tr.odd {" << std::endl
+        << "            background-color: rgba(0,0,0,0.6);" << std::endl
+        << "        }" << std::endl
+        << "        tr.even {" << std::endl
+        << "            background-color: rgba(0,0,0,0.7);" << std::endl
+        << "        }" << std::endl
+        << "    </style>" << std::endl
+        << "    <script src=\"https://ajax.googleapis.com/ajax/libs/jquery/"
+        << "2.2.4/jquery.min.js\"></script>" << std::endl
+        << "    <script type=\"text/javascript\">" << std::endl
+        << "        $( document ).ready(function() {" << std::endl
+        << "            $('table tr:not(.header)').hide();" << std::endl
+        << "            $('.header').click(function() {" << std::endl
+        << "                "
+           "$(this).nextUntil('tr.header').slideToggle(300);"
+        << std::endl
+        << "            });" << std::endl
+        << "        });" << std::endl
+        << "    </script>" << std::endl
+        << "</HEAD>" << std::endl
+        << std::endl
+        << "<BODY>" << std::endl
+        << std::endl;
+    // We need space from the top for the VIA texture
+    for (uint32_t space = 0; space < 15; space++) {
+        global_items.html_file_stream << "    <BR />" << std::endl;
+    }
+    // All the silly "&nbsp;" are to make sure the version lines up directly
+    // under the VIA portion of the logo.
+    global_items.html_file_stream << "    <H1 class=\"version\"><center>";
+    for (uint32_t space = 0; space < 65; space++) {
+        global_items.html_file_stream << "&nbsp;";
+    }
+    global_items.html_file_stream << APP_VERSION << "</center></h1>"
+                                  << std::endl
+                                  << "    <BR />" << std::endl
+                                  << "    <BR />" << std::endl;
+}
+
+// Close out writing to the HTML file.
+void EndOutput() {
+    global_items.html_file_stream << "</BODY>" << std::endl
+                                  << std::endl
+                                  << "</HTML>" << std::endl;
+}
+
+void BeginSection(std::string section_str) {
+    global_items.html_file_stream << "    <H1 class=\"section\"><center>"
+                                  << section_str << "</center></h1>"
+                                  << std::endl;
+}
+
+void EndSection() {
+    global_items.html_file_stream << "    <BR/>" << std::endl
+                                  << "    <BR/>" << std::endl;
+}
+
+void PrintStandardText(std::string section) {
+    global_items.html_file_stream << "    <H2><font color=\"White\">" << section
+                                  << "</font></H2>" << std::endl;
+}
+
+void PrintBeginTable(const char *table_name, uint32_t num_cols) {
+
+    global_items.html_file_stream
+        << "    <table align=\"center\">" << std::endl
+        << "        <tr class=\"header\">" << std::endl
+        << "            <td colspan=\"" << num_cols << "\" class=\"header\">"
+        << table_name << "</td>" << std::endl
+        << "         </tr>" << std::endl;
+
+    global_items.is_odd_row = true;
+}
+
+void PrintBeginTableRow() {
+    std::string class_str = "";
+    if (global_items.is_odd_row) {
+        class_str = " class=\"odd\"";
+    } else {
+        class_str = " class=\"even\"";
+    }
+    global_items.html_file_stream << "        <tr" << class_str << ">"
+                                  << std::endl;
+}
+
+void PrintTableElement(std::string element, ElementAlign align = ALIGN_LEFT) {
+    std::string align_str = "";
+    std::string class_str = "";
+    if (align == ALIGN_RIGHT) {
+        align_str = " align=\"right\"";
+    }
+    if (global_items.is_odd_row) {
+        class_str = " class=\"odd\"";
+    } else {
+        class_str = " class=\"even\"";
+    }
+    global_items.html_file_stream << "            <td" << align_str << class_str
+                                  << ">" << element << "</td>" << std::endl;
+}
+
+void PrintEndTableRow() {
+    global_items.html_file_stream << "        </tr>" << std::endl;
+    global_items.is_odd_row = !global_items.is_odd_row;
+}
+
+void PrintEndTable() {
+    global_items.html_file_stream << "    </table>" << std::endl;
+}
+
+// Generate the full library location for a file based on the location of
+// the JSON file referencing it, and the library location contained in that
+// JSON file.
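+// Illustrative example (hypothetical paths): a JSON file at
+// "C:\VulkanSDK\Config\nv.json" with a relative "library_path" of
+// ".\nv-vk64.dll" resolves to "C:\VulkanSDK\Config\nv-vk64.dll"; an
+// absolute library path is copied through unchanged.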
+bool GenerateLibraryPath(const char *json_location, const char *library_info,
+                         const uint32_t max_length, char *library_location) {
+    bool success = false;
+    char final_path[MAX_STRING_LENGTH];
+    char *working_string_ptr;
+    uint32_t len =
+        (max_length > MAX_STRING_LENGTH) ? MAX_STRING_LENGTH : max_length;
+
+    if (NULL == json_location || NULL == library_info ||
+        NULL == library_location) {
+        goto out;
+    }
+
+    // Remove json file from json path to get just the file base location
+    strncpy(final_path, json_location, len);
+    working_string_ptr = strrchr(final_path, '\\');
+    if (working_string_ptr == NULL) {
+        working_string_ptr = strrchr(final_path, '/');
+    }
+    if (working_string_ptr != NULL) {
+        working_string_ptr++;
+        *working_string_ptr = '\0';
+    }
+
+    // Determine if the library is relative or absolute
+    if (library_info[0] == '\\' || library_info[0] == '/' ||
+        library_info[1] == ':') {
+        // Absolute path
+        strncpy(library_location, library_info, len);
+        success = true;
+    } else {
+        uint32_t i = 0;
+        // Relative path, so we need to use the JSON's location
+        while (library_info[i] == '.' && library_info[i + 1] == '.' &&
+               (library_info[i + 2] == '\\' || library_info[i + 2] == '/')) {
+            i += 3;
+            // Go up a folder in the json path
+            working_string_ptr = strrchr(final_path, '\\');
+            if (working_string_ptr == NULL) {
+                working_string_ptr = strrchr(final_path, '/');
+            }
+            if (working_string_ptr != NULL) {
+                working_string_ptr++;
+                *working_string_ptr = '\0';
+            }
+        }
+        while (library_info[i] == '.' &&
+               (library_info[i + 1] == '\\' || library_info[i + 1] == '/')) {
+            i += 2;
+        }
+        strncpy(library_location, final_path, MAX_STRING_LENGTH - 1);
+        strncat(library_location, &library_info[i], len);
+        success = true;
+    }
+
+out:
+    return success;
+}
+
+#ifdef _WIN32
+// Registry utility functions to simplify reading data from the
+// Windows registry.
+
+const char g_uninstall_reg_path[] =
+    "SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall";
+
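+// Illustrative usage (this same call pattern appears in PrintSystemInfo
+// below):
+//     char product_name[MAX_STRING_LENGTH];
+//     if (ReadRegKeyString(HKEY_LOCAL_MACHINE,
+//                          "Software\\Microsoft\\Windows NT\\CurrentVersion",
+//                          "ProductName", MAX_STRING_LENGTH - 1,
+//                          product_name)) {
+//         // product_name now holds, e.g., "Windows 10 Pro".
+//     }
+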
+bool ReadRegKeyString(HKEY regFolder, const char *keyPath,
+                      const char *valueName, const int maxLength,
+                      char *retString) {
+    bool retVal = false;
+    DWORD bufLen = maxLength;
+    DWORD keyFlags = KEY_READ;
+    HKEY hKey;
+    LONG lret;
+
+    if (global_items.is_wow64) {
+        keyFlags |= KEY_WOW64_64KEY;
+    }
+
+    *retString = '\0';
+    lret = RegOpenKeyExA(regFolder, keyPath, 0, keyFlags, &hKey);
+    if (lret == ERROR_SUCCESS) {
+        lret = RegQueryValueExA(hKey, valueName, NULL, NULL, (BYTE *)retString,
+                                &bufLen);
+        if (lret == ERROR_SUCCESS) {
+            retVal = true;
+        }
+        RegCloseKey(hKey);
+    }
+
+    return retVal;
+}
+
+bool WriteRegKeyString(HKEY regFolder, const char *keyPath, char *valueName,
+                       char *valueValue) {
+    bool retVal = false;
+    DWORD keyFlags = KEY_WRITE;
+    HKEY hKey;
+    LONG lret;
+
+    if (global_items.is_wow64) {
+        keyFlags |= KEY_WOW64_64KEY;
+    }
+
+    lret = RegOpenKeyExA(regFolder, keyPath, 0, keyFlags, &hKey);
+    if (lret == ERROR_SUCCESS) {
+        lret = RegSetKeyValueA(hKey, NULL, valueName, REG_SZ,
+                               (BYTE *)valueValue, (DWORD)(strlen(valueValue)));
+        if (lret == ERROR_SUCCESS) {
+            retVal = true;
+        }
+        RegCloseKey(hKey);
+    }
+
+    return retVal;
+}
+
+bool DeleteRegKeyString(HKEY regFolder, const char *keyPath, char *valueName) {
+    bool retVal = false;
+    DWORD keyFlags = KEY_WRITE;
+    HKEY hKey;
+    LONG lret;
+
+    if (global_items.is_wow64) {
+        keyFlags |= KEY_WOW64_64KEY;
+    }
+
+    lret = RegOpenKeyExA(regFolder, keyPath, 0, keyFlags, &hKey);
+    if (lret == ERROR_SUCCESS) {
+        lret = RegDeleteKeyValueA(hKey, NULL, valueName);
+        if (lret == ERROR_SUCCESS) {
+            retVal = true;
+        }
+        RegCloseKey(hKey);
+    }
+
+    return retVal;
+}
+
+bool ReadRegKeyDword(HKEY regFolder, const char *keyPath, const char *valueName,
+                     unsigned int *returnInt) {
+    bool retVal = false;
+    DWORD bufLen = sizeof(DWORD);
+    DWORD keyFlags = KEY_READ;
+    HKEY hKey;
+    LONG lret;
+
+    if (global_items.is_wow64) {
+        keyFlags |= KEY_WOW64_64KEY;
+    }
+
+    *returnInt = 0;
+    lret = RegOpenKeyExA(regFolder, keyPath, 0, keyFlags, &hKey);
+    if (lret == ERROR_SUCCESS) {
+        lret = RegQueryValueExA(hKey, valueName, NULL, NULL, (BYTE *)returnInt,
+                                &bufLen);
+        if (lret == ERROR_SUCCESS) {
+            retVal = true;
+        }
+        RegCloseKey(hKey);
+    }
+
+    return retVal;
+}
+
+bool FindNextRegKey(HKEY regFolder, const char *keyPath, const char *keySearch,
+                    const int itemIndex, const int maxLength, char *retString) {
+    bool retVal = false;
+    DWORD bufLen = MAX_STRING_LENGTH - 1;
+    DWORD keyFlags = KEY_ENUMERATE_SUB_KEYS | KEY_QUERY_VALUE;
+    HKEY hKey;
+    LONG lret;
+    int itemCount = 0;
+
+    if (global_items.is_wow64) {
+        keyFlags |= KEY_WOW64_64KEY;
+    }
+
+    *retString = '\0';
+    lret = RegOpenKeyExA(regFolder, keyPath, 0, keyFlags, &hKey);
+    if (lret == ERROR_SUCCESS) {
+        DWORD index = 0;
+        char keyName[MAX_STRING_LENGTH];
+
+        do {
+            lret = RegEnumKeyExA(hKey, index, keyName, &bufLen, NULL, NULL,
+                                 NULL, NULL);
+            if (ERROR_SUCCESS != lret) {
+                break;
+            }
+            if (strlen(keySearch) == 0 || NULL != strstr(keyName, keySearch)) {
+                if (itemIndex == itemCount) {
+                    strncpy_s(retString, maxLength, keyName, bufLen);
+                    retVal = true;
+                    break;
+                } else {
+                    itemCount++;
+                }
+            }
+            bufLen = MAX_STRING_LENGTH - 1;
+            ++index;
+        } while (true);
+    }
+
+    return retVal;
+}
+
+bool FindNextRegValue(HKEY regFolder, const char *keyPath,
+                      const char *valueSearch, const int startIndex,
+                      const int maxLength, char *retString, uint32_t *retValue) {
+    bool retVal = false;
+    DWORD bufLen = MAX_STRING_LENGTH - 1;
+    DWORD keyFlags = KEY_ENUMERATE_SUB_KEYS | KEY_QUERY_VALUE;
+    HKEY hKey;
+    LONG lret;
+
+    if (global_items.is_wow64) {
+        keyFlags |= KEY_WOW64_64KEY;
+    }
+
+    *retValue = 0;
+    *retString = '\0';
+    lret = RegOpenKeyExA(regFolder, keyPath, 0, keyFlags, &hKey);
+    if (lret == ERROR_SUCCESS) {
+        DWORD index = startIndex;
+        char valueName[MAX_STRING_LENGTH];
+
+        do {
+            DWORD type;
+            DWORD value;
+            DWORD len = sizeof(value); // in/out: size of the value data buffer
+            lret = RegEnumValueA(hKey, index, valueName, &bufLen, NULL, &type,
+                                 (LPBYTE)&value, &len);
+            if (ERROR_SUCCESS != lret) {
+                break;
+            }
+            if (type == REG_DWORD) {
+                *retValue = value;
+            }
+            if (strlen(valueSearch) == 0 ||
+                NULL != strstr(valueName, valueSearch)) {
+                strncpy_s(retString, maxLength, valueName, bufLen);
+                retVal = true;
+                break;
+            }
+            bufLen = MAX_STRING_LENGTH - 1;
+            ++index;
+        } while (true);
+    }
+
+    return retVal;
+}
+
+// Registry prototypes for Windows
+bool ReadRegKeyDword(HKEY regFolder, const char *keyPath, const char *valueName,
+                     unsigned int *returnInt);
+bool ReadRegKeyString(HKEY regFolder, const char *keyPath,
+                      const char *valueName, const int maxLength,
+                      char *retString);
+bool FindNextRegKey(HKEY regFolder, const char *keyPath, const char *keySearch,
+                    const int startIndex, const int maxLength, char *retString);
+bool FindNextRegValue(HKEY regFolder, const char *keyPath,
+                      const char *valueSearch, const int startIndex,
+                      const int maxLength, char *retString, uint32_t *retValue);
+bool WriteRegKeyString(HKEY regFolder, const char *keyPath, char *valueName,
+                       char *valueValue);
+bool DeleteRegKeyString(HKEY regFolder, const char *keyPath, char *valueName);
+
+// Functionality to determine if this 32-bit process is running on Windows 64.
+//
+void IsWow64() {
+    typedef BOOL(WINAPI * LPFN_ISWOW64PROCESS)(HANDLE, PBOOL);
+
+    // IsWow64Process is not available on all supported versions of Windows.
+    // Use GetModuleHandle to get a handle to the DLL that contains the function
+    // and GetProcAddress to get a pointer to the function if available.
+
+    LPFN_ISWOW64PROCESS fnIsWow64Process = (LPFN_ISWOW64PROCESS)GetProcAddress(
+        GetModuleHandle(TEXT("kernel32")), "IsWow64Process");
+
+    if (NULL != fnIsWow64Process) {
+        BOOL isWOW = FALSE;
+        if (!fnIsWow64Process(GetCurrentProcess(), &isWOW)) {
+            printf("Error : Failed to determine properly if on Win64!");
+        }
+
+        if (isWOW == TRUE) {
+            global_items.is_wow64 = true;
+        }
+    }
+}
+
+// Run the test in the specified directory with the corresponding
+// command-line arguments.
+// Returns 0 on no error, 1 if test file wasn't found, and -1
+// on any other errors.
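+// Illustrative (hypothetical) call:
+//     RunTestInDirectory("C:\\VulkanSDK\\Bin", "cube.exe", "cube.exe --c 50");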
+int RunTestInDirectory(std::string path, std::string test,
+                       std::string cmd_line) {
+    int err_code = -1;
+    char orig_dir[MAX_STRING_LENGTH];
+    orig_dir[0] = '\0';
+    if (0 != GetCurrentDirectoryA(MAX_STRING_LENGTH - 1, orig_dir) &&
+        TRUE == SetCurrentDirectoryA(path.c_str())) {
+        if (TRUE == PathFileExists(test.c_str())) {
+            err_code = system(cmd_line.c_str());
+        } else {
+            err_code = 1;
+        }
+        SetCurrentDirectoryA(orig_dir);
+    }
+    return err_code;
+}
+
+// Print out any information about the current system that we can
+// capture to ease in debugging/investigation at a later time.
+void PrintSystemInfo(void) {
+    OSVERSIONINFOEX os_info;
+    SYSTEM_INFO sys_info;
+    MEMORYSTATUSEX mem_stat;
+    DWORD ser_ver = 0;
+    DWORD sect_per_cluster = 0;
+    DWORD bytes_per_sect = 0;
+    DWORD num_free_cluster = 0;
+    DWORD total_num_cluster = 0;
+    char system_root_dir[MAX_STRING_LENGTH];
+    char generic_string[MAX_STRING_LENGTH];
+    char output_string[MAX_STRING_LENGTH];
+    char os_size[32];
+    std::string cur_directory;
+    std::string exe_directory;
+
+    // Determine if this 32-bit process is on Win64.
+    IsWow64();
+
+#if _WIN64
+    strncpy(os_size, " 64-bit", 31);
+#else
+    strncpy(os_size, " 32-bit", 31);
+#endif
+
+    BeginSection("System Info");
+
+    // Environment section has information about the OS and the
+    // execution environment.
+    PrintBeginTable("Environment", 3);
+
+    ZeroMemory(&sys_info, sizeof(SYSTEM_INFO));
+    GetSystemInfo(&sys_info);
+
+    ZeroMemory(&os_info, sizeof(OSVERSIONINFOEX));
+    os_info.dwOSVersionInfoSize = sizeof(OSVERSIONINFOEX);
+
+    ZeroMemory(&mem_stat, sizeof(MEMORYSTATUSEX));
+    mem_stat.dwLength = sizeof(MEMORYSTATUSEX);
+
+    // Since this is Windows #ifdef code, determine the version of Windows
+    // that the application is running on.  It's not trivial and has to
+    // refer to items queried in the above structures as well as the
+    // Windows registry.
+    if (TRUE == GetVersionEx((LPOSVERSIONINFO)(&os_info))) {
+        switch (os_info.dwMajorVersion) {
+        case 10:
+            if (os_info.wProductType == VER_NT_WORKSTATION) {
+                if (ReadRegKeyString(
+                        HKEY_LOCAL_MACHINE,
+                        "Software\\Microsoft\\Windows NT\\CurrentVersion",
+                        "ProductName", MAX_STRING_LENGTH - 1, generic_string)) {
+                    PrintBeginTableRow();
+                    PrintTableElement("Windows");
+                    PrintTableElement(generic_string);
+                    PrintTableElement(os_size);
+                    PrintEndTableRow();
+
+                    if (ReadRegKeyString(
+                            HKEY_LOCAL_MACHINE,
+                            "Software\\Microsoft\\Windows NT\\CurrentVersion",
+                            "CurrentBuild", MAX_STRING_LENGTH - 1,
+                            output_string)) {
+                        PrintBeginTableRow();
+                        PrintTableElement("");
+                        PrintTableElement("Build");
+                        PrintTableElement(output_string);
+                        PrintEndTableRow();
+                        if (ReadRegKeyString(
+                                HKEY_LOCAL_MACHINE, "Software\\Microsoft\\Windo"
+                                                    "ws NT\\CurrentVersion",
+                                "BuildBranch", MAX_STRING_LENGTH - 1,
+                                output_string)) {
+                            PrintBeginTableRow();
+                            PrintTableElement("");
+                            PrintTableElement("Branch");
+                            PrintTableElement(output_string);
+                            PrintEndTableRow();
+                        }
+                    }
+                } else {
+                    PrintBeginTableRow();
+                    PrintTableElement("Windows");
+                    PrintTableElement("Windows 10 (or newer)");
+                    PrintTableElement(os_size);
+                    PrintEndTableRow();
+                }
+            } else {
+                PrintBeginTableRow();
+                PrintTableElement("Windows");
+                PrintTableElement("Windows Server 2016 (or newer)");
+                PrintTableElement(os_size);
+                PrintEndTableRow();
+            }
+            break;
+        case 6:
+            switch (os_info.dwMinorVersion) {
+            case 3:
+                if (os_info.wProductType == VER_NT_WORKSTATION) {
+                    if (ReadRegKeyString(
+                            HKEY_LOCAL_MACHINE,
+                            "Software\\Microsoft\\Windows NT\\CurrentVersion",
+                            "ProductName", MAX_STRING_LENGTH - 1,
+                            generic_string)) {
+                        PrintBeginTableRow();
+                        PrintTableElement("Windows");
+                        PrintTableElement(generic_string);
+                        PrintTableElement(os_size);
+                        PrintEndTableRow();
+
+                        if (ReadRegKeyString(
+                                HKEY_LOCAL_MACHINE, "Software\\Microsoft\\Windo"
+                                                    "ws NT\\CurrentVersion",
+                                "CurrentBuild", MAX_STRING_LENGTH - 1,
+                                output_string)) {
+                            PrintBeginTableRow();
+                            PrintTableElement("");
+                            PrintTableElement("Build");
+                            PrintTableElement(output_string);
+                            PrintEndTableRow();
+
+                            if (ReadRegKeyString(HKEY_LOCAL_MACHINE,
+                                                 "Software\\Microsoft\\Windo"
+                                                 "ws NT\\CurrentVersion",
+                                                 "BuildBranch",
+                                                 MAX_STRING_LENGTH - 1,
+                                                 output_string)) {
+                                PrintBeginTableRow();
+                                PrintTableElement("");
+                                PrintTableElement("Branch");
+                                PrintTableElement(output_string);
+                                PrintEndTableRow();
+                            }
+                        }
+                    }
+                } else {
+                    PrintBeginTableRow();
+                    PrintTableElement("Windows");
+                    PrintTableElement("Windows Server 2012 R2 (or newer)");
+                    PrintTableElement(os_size);
+                    PrintEndTableRow();
+                }
+                break;
+            case 2:
+                if (os_info.wProductType == VER_NT_WORKSTATION) {
+                    if (ReadRegKeyString(
+                            HKEY_LOCAL_MACHINE,
+                            "Software\\Microsoft\\Windows NT\\CurrentVersion",
+                            "ProductName", MAX_STRING_LENGTH - 1,
+                            generic_string)) {
+                        PrintBeginTableRow();
+                        PrintTableElement("Windows");
+                        PrintTableElement(generic_string);
+                        PrintTableElement(os_size);
+                        PrintEndTableRow();
+
+                        if (ReadRegKeyString(
+                                HKEY_LOCAL_MACHINE, "Software\\Microsoft\\Windo"
+                                                    "ws NT\\CurrentVersion",
+                                "CurrentBuild", MAX_STRING_LENGTH - 1,
+                                output_string)) {
+                            PrintBeginTableRow();
+                            PrintTableElement("");
+                            PrintTableElement("Build");
+                            PrintTableElement(output_string);
+                            PrintEndTableRow();
+                            if (ReadRegKeyString(HKEY_LOCAL_MACHINE,
+                                                 "Software\\Microsoft\\Windo"
+                                                 "ws NT\\CurrentVersion",
+                                                 "BuildBranch",
+                                                 MAX_STRING_LENGTH - 1,
+                                                 output_string)) {
+                                PrintBeginTableRow();
+                                PrintTableElement("");
+                                PrintTableElement("Branch");
+                                PrintTableElement(output_string);
+                                PrintEndTableRow();
+                            }
+                        }
+                    }
+                } else {
+                    PrintBeginTableRow();
+                    PrintTableElement("Windows");
+                    PrintTableElement("Windows Server 2012 (or newer)");
+                    PrintTableElement(os_size);
+                    PrintEndTableRow();
+                }
+                break;
+            case 1:
+                if (os_info.wProductType == VER_NT_WORKSTATION) {
+                    PrintBeginTableRow();
+                    PrintTableElement("Windows");
+                    PrintTableElement("Windows 7 (or newer)");
+                    PrintTableElement(os_size);
+                    PrintEndTableRow();
+                } else {
+                    PrintBeginTableRow();
+                    PrintTableElement("Windows");
+                    PrintTableElement("Windows Server 2008 R2 (or newer)");
+                    PrintTableElement(os_size);
+                    PrintEndTableRow();
+                }
+                break;
+            default:
+                if (os_info.wProductType == VER_NT_WORKSTATION) {
+                    PrintBeginTableRow();
+                    PrintTableElement("Windows");
+                    PrintTableElement("Windows Vista (or newer)");
+                    PrintTableElement(os_size);
+                    PrintEndTableRow();
+                } else {
+                    PrintBeginTableRow();
+                    PrintTableElement("Windows");
+                    PrintTableElement("Windows Server 2008 (or newer)");
+                    PrintTableElement(os_size);
+                    PrintEndTableRow();
+                }
+                break;
+            }
+            break;
+        case 5:
+            ser_ver = GetSystemMetrics(SM_SERVERR2);
+            switch (os_info.dwMinorVersion) {
+            case 2:
+                if ((os_info.wProductType == VER_NT_WORKSTATION) &&
+                    (sys_info.wProcessorArchitecture ==
+                     PROCESSOR_ARCHITECTURE_AMD64)) {
+                    strncpy(generic_string, "Windows XP Professional x64",
+                            MAX_STRING_LENGTH - 1);
+                } else if (os_info.wSuiteMask & VER_SUITE_WH_SERVER) {
+                    strncpy(generic_string, "Windows Home Server",
+                            MAX_STRING_LENGTH - 1);
+                } else if (ser_ver != 0) {
+                    strncpy(generic_string, "Windows Server 2003 R2",
+                            MAX_STRING_LENGTH - 1);
+                } else {
+                    strncpy(generic_string, "Windows Server 2003",
+                            MAX_STRING_LENGTH - 1);
+                }
+                PrintBeginTableRow();
+                PrintTableElement("Windows");
+                PrintTableElement(generic_string);
+                PrintTableElement(os_size);
+                PrintEndTableRow();
+                break;
+            case 1:
+                PrintBeginTableRow();
+                PrintTableElement("Windows");
+                PrintTableElement("Windows XP");
+                PrintTableElement(os_size);
+                PrintEndTableRow();
+                break;
+            case 0:
+                PrintBeginTableRow();
+                PrintTableElement("Windows");
+                PrintTableElement("Windows 2000");
+                PrintTableElement(os_size);
+                PrintEndTableRow();
+                break;
+            default:
+                PrintBeginTableRow();
+                PrintTableElement("Windows");
+                PrintTableElement("Unknown Windows OS");
+                PrintTableElement(os_size);
+                PrintEndTableRow();
+                break;
+            }
+            break;
+        }
+    } else {
+        PrintBeginTableRow();
+        PrintTableElement("Windows");
+        PrintTableElement("Error retrieving Windows Version");
+        PrintTableElement("");
+        PrintEndTableRow();
+        throw(-1);
+    }
+
+    if (0 != GetEnvironmentVariableA("SYSTEMROOT", system_root_dir,
+                                     MAX_STRING_LENGTH - 1)) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("System Root");
+        PrintTableElement(system_root_dir);
+        PrintEndTableRow();
+    }
+    if (0 != GetEnvironmentVariableA("PROGRAMDATA", generic_string,
+                                     MAX_STRING_LENGTH - 1)) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Program Data");
+        PrintTableElement(generic_string);
+        PrintEndTableRow();
+    }
+    if (0 != GetEnvironmentVariableA("PROGRAMFILES", generic_string,
+                                     MAX_STRING_LENGTH - 1)) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Program Files");
+        PrintTableElement(generic_string);
+        PrintEndTableRow();
+    }
+    if (0 != GetEnvironmentVariableA("PROGRAMFILES(X86)", generic_string,
+                                     MAX_STRING_LENGTH - 1)) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Program Files (x86)");
+        PrintTableElement(generic_string);
+        PrintEndTableRow();
+    }
+    if (0 != GetEnvironmentVariableA("TEMP", generic_string,
+                                     MAX_STRING_LENGTH - 1)) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("TEMP");
+        PrintTableElement(generic_string);
+        PrintEndTableRow();
+    }
+    if (0 !=
+        GetEnvironmentVariableA("TMP", generic_string, MAX_STRING_LENGTH - 1)) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("TMP");
+        PrintTableElement(generic_string);
+        PrintEndTableRow();
+    }
+
+    PrintEndTable();
+
+    // Output whatever generic hardware information we can find out about the
+    // system.  Including how much memory and disk space is available.
+    PrintBeginTable("Hardware", 3);
+
+    snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u",
+             sys_info.dwNumberOfProcessors);
+    PrintBeginTableRow();
+    PrintTableElement("CPUs");
+    PrintTableElement("Number of Logical Cores");
+    PrintTableElement(generic_string);
+    PrintEndTableRow();
+
+    switch (sys_info.wProcessorArchitecture) {
+    case PROCESSOR_ARCHITECTURE_AMD64:
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Type");
+        PrintTableElement("x86_64");
+        PrintEndTableRow();
+        break;
+    case PROCESSOR_ARCHITECTURE_ARM:
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Type");
+        PrintTableElement("ARM");
+        PrintEndTableRow();
+        break;
+    case PROCESSOR_ARCHITECTURE_IA64:
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Type");
+        PrintTableElement("IA64");
+        PrintEndTableRow();
+        break;
+    case PROCESSOR_ARCHITECTURE_INTEL:
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Type");
+        PrintTableElement("x86");
+        PrintEndTableRow();
+        break;
+    default:
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Type");
+        PrintTableElement("Unknown");
+        PrintEndTableRow();
+        break;
+    }
+
+    if (TRUE == GlobalMemoryStatusEx(&mem_stat)) {
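+        // Successive right-shifts pick a human-readable unit: a non-zero
+        // value after >>40 means TB, >>30 GB, >>20 MB, >>10 KB.  The same
+        // scheme is used for the disk-space figures below.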
+        if ((mem_stat.ullTotalPhys >> 40) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u TB",
+                     static_cast<uint32_t>(mem_stat.ullTotalPhys >> 40));
+            PrintBeginTableRow();
+            PrintTableElement("Memory");
+            PrintTableElement("Physical");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        } else if ((mem_stat.ullTotalPhys >> 30) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u GB",
+                     static_cast<uint32_t>(mem_stat.ullTotalPhys >> 30));
+            PrintBeginTableRow();
+            PrintTableElement("Memory");
+            PrintTableElement("Physical");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        } else if ((mem_stat.ullTotalPhys >> 20) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u MB",
+                     static_cast<uint32_t>(mem_stat.ullTotalPhys >> 20));
+            PrintBeginTableRow();
+            PrintTableElement("Memory");
+            PrintTableElement("Physical");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        } else if ((mem_stat.ullTotalPhys >> 10) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u KB",
+                     static_cast<uint32_t>(mem_stat.ullTotalPhys >> 10));
+            PrintBeginTableRow();
+            PrintTableElement("Memory");
+            PrintTableElement("Physical");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        } else {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u bytes",
+                     static_cast<uint32_t>(mem_stat.ullTotalPhys));
+            PrintBeginTableRow();
+            PrintTableElement("Memory");
+            PrintTableElement("Physical");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        }
+    }
+
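+    // GetDiskFreeSpaceA reports cluster geometry for the current drive;
+    // bytes = bytes-per-sector * sectors-per-cluster * cluster count.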
+    if (TRUE == GetDiskFreeSpaceA(NULL, &sect_per_cluster, &bytes_per_sect,
+                                  &num_free_cluster, &total_num_cluster)) {
+        uint64_t bytes_free = (uint64_t)bytes_per_sect *
+                              (uint64_t)sect_per_cluster *
+                              (uint64_t)num_free_cluster;
+        uint64_t bytes_total = (uint64_t)bytes_per_sect *
+                               (uint64_t)sect_per_cluster *
+                               (uint64_t)total_num_cluster;
+        double perc_free = (double)bytes_free / (double)bytes_total;
+        if ((bytes_total >> 40) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u TB",
+                     static_cast<uint32_t>(bytes_total >> 40));
+            PrintBeginTableRow();
+            PrintTableElement("Disk Space");
+            PrintTableElement("Total");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        } else if ((bytes_total >> 30) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u GB",
+                     static_cast<uint32_t>(bytes_total >> 30));
+            PrintBeginTableRow();
+            PrintTableElement("Disk Space");
+            PrintTableElement("Total");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        } else if ((bytes_total >> 20) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u MB",
+                     static_cast<uint32_t>(bytes_total >> 20));
+            PrintBeginTableRow();
+            PrintTableElement("Disk Space");
+            PrintTableElement("Total");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        } else if ((bytes_total >> 10) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u KB",
+                     static_cast<uint32_t>(bytes_total >> 10));
+            PrintBeginTableRow();
+            PrintTableElement("Disk Space");
+            PrintTableElement("Total");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        }
+        snprintf(output_string, MAX_STRING_LENGTH - 1, "%4.2f%%",
+                 (static_cast<float>(perc_free) * 100.f));
+        if ((bytes_free >> 40) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u TB",
+                     static_cast<uint32_t>(bytes_free >> 40));
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Free");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Free Perc");
+            PrintTableElement(output_string);
+            PrintEndTableRow();
+        } else if ((bytes_free >> 30) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u GB",
+                     static_cast<uint32_t>(bytes_free >> 30));
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Free");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Free Perc");
+            PrintTableElement(output_string);
+            PrintEndTableRow();
+        } else if ((bytes_free >> 20) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u MB",
+                     static_cast<uint32_t>(bytes_free >> 20));
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Free");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Free Perc");
+            PrintTableElement(output_string);
+            PrintEndTableRow();
+        } else if ((bytes_free >> 10) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u KB",
+                     static_cast<uint32_t>(bytes_free >> 10));
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Free");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Free Perc");
+            PrintTableElement(output_string);
+            PrintEndTableRow();
+        }
+    }
+
+    PrintEndTable();
+
+    // Print out information about this executable.
+    PrintBeginTable("Executable", 2);
+
+    PrintBeginTableRow();
+    PrintTableElement("Exe Directory");
+    PrintTableElement(global_items.exe_directory);
+    PrintEndTableRow();
+
+    if (0 != GetCurrentDirectoryA(MAX_STRING_LENGTH - 1, generic_string)) {
+        cur_directory = generic_string;
+        PrintBeginTableRow();
+        PrintTableElement("Current Directory");
+        PrintTableElement(generic_string);
+        PrintEndTableRow();
+    } else {
+        cur_directory = "";
+    }
+
+    PrintBeginTableRow();
+    PrintTableElement("Vulkan API Version");
+    uint32_t major = VK_VERSION_MAJOR(VK_API_VERSION_1_0);
+    uint32_t minor = VK_VERSION_MINOR(VK_API_VERSION_1_0);
+    uint32_t patch = VK_VERSION_PATCH(VK_HEADER_VERSION);
+    snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d.%d.%d", major, minor,
+             patch);
+    PrintTableElement(generic_string);
+    PrintEndTableRow();
+
+    PrintBeginTableRow();
+    PrintTableElement("Byte Format");
+#if _WIN64 || __x86_64__ || __ppc64__
+    PrintTableElement("64-bit");
+#else
+    PrintTableElement("32-bit");
+#endif
+    PrintEndTableRow();
+
+    PrintEndTable();
+
+    // Now print out the remaining system info.
+    PrintDriverInfo();
+    PrintRunTimeInfo();
+    PrintSDKInfo();
+    PrintLayerInfo();
+    PrintLayerSettingsFileInfo();
+    EndSection();
+}
+
+// Determine what version an executable or library file is.
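+// The version is read from the file's Win32 version resource:
+// GetFileVersionInfoSize/GetFileVersionInfo fetch the resource, and
+// VerQueryValue("\\") returns the root VS_FIXEDFILEINFO block, whose
+// signature is always 0xfeef04bd.  The four 16-bit halves of
+// dwFileVersionMS/dwFileVersionLS form the dotted version string.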
+bool GetFileVersion(const char *filename, const uint32_t max_len,
+                    char *version_string) {
+    DWORD ver_handle;
+    UINT size = 0;
+    LPBYTE buffer = NULL;
+    DWORD ver_size = GetFileVersionInfoSize(filename, &ver_handle);
+    bool success = false;
+
+    if (ver_size > 0) {
+        LPSTR ver_data = (LPSTR)malloc(sizeof(char) * ver_size);
+        if (NULL == ver_data) {
+            return false;
+        }
+
+        if (GetFileVersionInfo(filename, ver_handle, ver_size, ver_data)) {
+            if (VerQueryValue(ver_data, "\\", (VOID FAR * FAR *)&buffer,
+                              &size)) {
+                if (size) {
+                    VS_FIXEDFILEINFO *ver_info = (VS_FIXEDFILEINFO *)buffer;
+                    if (ver_info->dwSignature == 0xfeef04bd) {
+                        snprintf(version_string, max_len, "%d.%d.%d.%d",
+                                 (ver_info->dwFileVersionMS >> 16) & 0xffff,
+                                 (ver_info->dwFileVersionMS >> 0) & 0xffff,
+                                 (ver_info->dwFileVersionLS >> 16) & 0xffff,
+                                 (ver_info->dwFileVersionLS >> 0) & 0xffff);
+                        success = true;
+                    }
+                }
+            }
+        }
+        free(ver_data);
+    }
+
+    return success;
+}
+
+// Print out the information for every driver in the appropriate
+// Windows registry location and its corresponding JSON file.
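+// Installed drivers advertise themselves to the Vulkan loader by adding a
+// registry value under SOFTWARE\Khronos\Vulkan\Drivers whose name is the
+// path to the driver's manifest (JSON) file.  A 32-bit process on 64-bit
+// Windows reads the mirrored WOW6432Node key instead.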
+void PrintDriverInfo(void) {
+    bool failed = false;
+    const char vulkan_reg_base[] = "SOFTWARE\\Khronos\\Vulkan";
+    const char vulkan_reg_base_wow64[] =
+        "SOFTWARE\\WOW6432Node\\Khronos\\Vulkan";
+    char reg_key_loc[MAX_STRING_LENGTH];
+    char cur_vulkan_driver_json[MAX_STRING_LENGTH];
+    char generic_string[MAX_STRING_LENGTH];
+    char full_driver_path[MAX_STRING_LENGTH];
+    char system_path[MAX_STRING_LENGTH];
+    // Sized to match the MAX_STRING_LENGTH - 1 bound passed to snprintf.
+    char count_str[MAX_STRING_LENGTH];
+    uint32_t i = 0;
+    uint32_t j = 0;
+    bool found_registry = false;
+    bool found_json = false;
+    bool found_lib = false;
+
+    GetEnvironmentVariableA("SYSTEMROOT", generic_string, MAX_STRING_LENGTH);
+#if _WIN64 || __x86_64__ || __ppc64__
+    snprintf(system_path, MAX_STRING_LENGTH - 1, "%s\\system32\\",
+        generic_string);
+    snprintf(reg_key_loc, MAX_STRING_LENGTH - 1, "%s\\Drivers",
+             vulkan_reg_base);
+#else
+    if (global_items.is_wow64) {
+        snprintf(system_path, MAX_STRING_LENGTH - 1, "%s\\sysWOW64\\",
+            generic_string);
+        snprintf(reg_key_loc, MAX_STRING_LENGTH - 1, "%s\\Drivers",
+                 vulkan_reg_base_wow64);
+    } else {
+        snprintf(system_path, MAX_STRING_LENGTH - 1, "%s\\system32\\",
+            generic_string);
+        snprintf(reg_key_loc, MAX_STRING_LENGTH - 1, "%s\\Drivers",
+                 vulkan_reg_base);
+    }
+#endif
+
+    PrintBeginTable("Vulkan Driver Info", 3);
+    PrintBeginTableRow();
+    PrintTableElement("Registry Location");
+    PrintTableElement(reg_key_loc);
+    PrintTableElement("");
+    PrintEndTableRow();
+
+    // Find the registry settings indicating the location of the driver
+    // JSON files.
+    uint32_t returned_value = 0;
+    while (FindNextRegValue(HKEY_LOCAL_MACHINE, reg_key_loc, "", i,
+                            MAX_STRING_LENGTH - 1, cur_vulkan_driver_json,
+                            &returned_value)) {
+
+        found_registry = true;
+
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "Driver %d", i++);
+
+        PrintBeginTableRow();
+        PrintTableElement(generic_string, ALIGN_RIGHT);
+        PrintTableElement(cur_vulkan_driver_json);
+
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "0x%08x",
+                 returned_value);
+        PrintTableElement(generic_string);
+        PrintEndTableRow();
+
+        // Parse the driver JSON file.
+        std::ifstream *stream = NULL;
+        stream = new std::ifstream(cur_vulkan_driver_json, std::ifstream::in);
+        if (nullptr == stream || stream->fail()) {
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Error reading JSON file");
+            PrintTableElement(cur_vulkan_driver_json);
+            PrintEndTableRow();
+
+            failed = true;
+            continue;
+        } else {
+            Json::Value root = Json::nullValue;
+            Json::Reader reader;
+            if (!reader.parse(*stream, root, false) || root.isNull()) {
+                PrintBeginTableRow();
+                PrintTableElement("");
+                PrintTableElement("Error reading JSON file");
+                PrintTableElement(reader.getFormattedErrorMessages());
+                PrintEndTableRow();
+
+                failed = true;
+                stream->close();
+                delete stream;
+                continue;
+            } else {
+                PrintBeginTableRow();
+                PrintTableElement("");
+                PrintTableElement("JSON File Version");
+                if (!root["file_format_version"].isNull()) {
+                    PrintTableElement(root["file_format_version"].asString());
+                } else {
+                    PrintTableElement("MISSING!");
+                }
+                PrintEndTableRow();
+
+                if (!root["ICD"].isNull()) {
+                    found_json = true;
+
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement("API Version");
+                    if (!root["ICD"]["api_version"].isNull()) {
+                        PrintTableElement(
+                            root["ICD"]["api_version"].asString());
+                    } else {
+                        PrintTableElement("MISSING!");
+                    }
+                    PrintEndTableRow();
+
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement("Library Path");
+                    if (!root["ICD"]["library_path"].isNull()) {
+                        std::string driver_name =
+                            root["ICD"]["library_path"].asString();
+                        // system_path already ends in a backslash.
+                        std::string system_name = system_path + driver_name;
+                        PrintTableElement(driver_name);
+                        PrintEndTableRow();
+
+                        if (GenerateLibraryPath(
+                                cur_vulkan_driver_json,
+                                driver_name.c_str(),
+                                MAX_STRING_LENGTH - 1, full_driver_path)) {
+
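+                            // Prefer the file the JSON points at; fall back
+                            // to the copy in the Windows system folder.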
+                            if (GetFileVersion(full_driver_path,
+                                               MAX_STRING_LENGTH - 1,
+                                               generic_string)) {
+
+                                PrintBeginTableRow();
+                                PrintTableElement("");
+                                PrintTableElement("Library File Version");
+                                PrintTableElement(generic_string);
+                                PrintEndTableRow();
+
+                                found_lib = true;
+                            } else if (GetFileVersion(system_name.c_str(),
+                                                      MAX_STRING_LENGTH - 1,
+                                                      generic_string)) {
+
+                                PrintBeginTableRow();
+                                PrintTableElement("");
+                                PrintTableElement("Library File Version");
+                                PrintTableElement(generic_string);
+                                PrintEndTableRow();
+
+                                found_lib = true;
+                            } else {
+                                snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                                         "Failed to find driver %s "
+                                         "or %s referenced by JSON %s",
+                                         root["ICD"]["library_path"]
+                                             .asString()
+                                             .c_str(),
+                                         full_driver_path,
+                                         cur_vulkan_driver_json);
+                                PrintBeginTableRow();
+                                PrintTableElement("");
+                                PrintTableElement("");
+                                PrintTableElement(generic_string);
+                                PrintEndTableRow();
+                            }
+                        } else {
+                            snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                                     "Failed to find driver %s "
+                                     "referenced by JSON %s",
+                                     full_driver_path, cur_vulkan_driver_json);
+                            PrintBeginTableRow();
+                            PrintTableElement("");
+                            PrintTableElement("");
+                            PrintTableElement(generic_string);
+                            PrintEndTableRow();
+                        }
+                    } else {
+                        PrintTableElement("MISSING!");
+                        PrintEndTableRow();
+                    }
+
+                    j = 0;
+                    Json::Value dev_exts = root["ICD"]["device_extensions"];
+                    if (!dev_exts.isNull() && dev_exts.isArray()) {
+                        snprintf(count_str, MAX_STRING_LENGTH - 1, "%d",
+                                 dev_exts.size());
+                        PrintBeginTableRow();
+                        PrintTableElement("");
+                        PrintTableElement("Device Extensions");
+                        PrintTableElement(count_str);
+                        PrintEndTableRow();
+
+                        for (Json::ValueIterator dev_ext_it = dev_exts.begin();
+                             dev_ext_it != dev_exts.end(); dev_ext_it++) {
+                            Json::Value dev_ext = (*dev_ext_it);
+                            Json::Value dev_ext_name = dev_ext["name"];
+                            if (!dev_ext_name.isNull()) {
+                                snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                                         "[%d]", j++);
+
+                                PrintBeginTableRow();
+                                PrintTableElement("");
+                                PrintTableElement(generic_string, ALIGN_RIGHT);
+                                PrintTableElement(dev_ext_name.asString());
+                                PrintEndTableRow();
+                            }
+                        }
+                    }
+                    Json::Value inst_exts = root["ICD"]["instance_extensions"];
+                    j = 0;
+                    if (!inst_exts.isNull() && inst_exts.isArray()) {
+                        snprintf(count_str, MAX_STRING_LENGTH - 1, "%d",
+                                 inst_exts.size());
+                        PrintBeginTableRow();
+                        PrintTableElement("");
+                        PrintTableElement("Instance Extensions");
+                        PrintTableElement(count_str);
+                        PrintEndTableRow();
+
+                        for (Json::ValueIterator inst_ext_it =
+                                 inst_exts.begin();
+                             inst_ext_it != inst_exts.end(); inst_ext_it++) {
+                            Json::Value inst_ext = (*inst_ext_it);
+                            Json::Value inst_ext_name = inst_ext["name"];
+                            if (!inst_ext_name.isNull()) {
+                                snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                                         "[%d]", j++);
+
+                                PrintBeginTableRow();
+                                PrintTableElement("");
+                                PrintTableElement(generic_string, ALIGN_RIGHT);
+                                PrintTableElement(inst_ext_name.asString());
+                                PrintEndTableRow();
+                            }
+                        }
+                    }
+                } else {
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement("ICD Section");
+                    PrintTableElement("MISSING!");
+                    PrintEndTableRow();
+                }
+            }
+
+            stream->close();
+            delete stream;
+            stream = NULL;
+        }
+    }
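+    // Classify the failure: no registry entries, entries without a usable
+    // JSON, or JSONs whose referenced library could not be found.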
+    if (!found_registry || !found_json || !found_lib) {
+        failed = true;
+    }
+
+    PrintEndTable();
+
+    if (failed) {
+        if (!found_registry) {
+            throw MISSING_DRIVER_REGISTRY;
+        } else if (!found_json) {
+            throw MISSING_DRIVER_JSON;
+        } else if (!found_lib) {
+            throw MISSING_DRIVER_LIB;
+        } else {
+            throw(-1);
+        }
+    }
+}
+
+// Print out whatever Vulkan runtime information we can gather from the system
+// via registry, standard system paths, etc.
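+// On Windows the runtime is the vulkan-1.dll loader.  It is reported three
+// ways: from VulkanRT uninstall entries in the registry, from a scan of the
+// system folder, and from the copy "where vulkan-1.dll" resolves on PATH.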
+void PrintRunTimeInfo(void) {
+    char generic_string[MAX_STRING_LENGTH];
+    char count_string[MAX_STRING_LENGTH];
+    char version_string[MAX_STRING_LENGTH];
+    char output_string[MAX_STRING_LENGTH];
+    char dll_search[MAX_STRING_LENGTH];
+    char dll_prefix[MAX_STRING_LENGTH];
+    uint32_t i = 0;
+    uint32_t install_count = 0;
+    FILE *fp = NULL;
+
+    PrintBeginTable("Vulkan Runtimes", 3);
+
+    PrintBeginTableRow();
+    PrintTableElement("Runtimes In Registry");
+    PrintTableElement(g_uninstall_reg_path);
+    PrintTableElement("");
+    PrintEndTableRow();
+
+    // Find all Vulkan Runtime keys in the registry, and loop through each.
+    while (FindNextRegKey(HKEY_LOCAL_MACHINE, g_uninstall_reg_path, "VulkanRT",
+                          i, MAX_STRING_LENGTH - 1, output_string)) {
+        snprintf(count_string, MAX_STRING_LENGTH - 1, "[%d]", i++);
+
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "%s\\%s",
+                 g_uninstall_reg_path, output_string);
+
+        // Get the version from the registry; fall back to the key name
+        // if DisplayVersion is not present.
+        if (!ReadRegKeyString(HKEY_LOCAL_MACHINE, generic_string,
+                              "DisplayVersion", MAX_STRING_LENGTH - 1,
+                              version_string)) {
+            strncpy(version_string, output_string, MAX_STRING_LENGTH - 1);
+        }
+
+        // Get the install count for this runtime from the registry
+        if (ReadRegKeyDword(HKEY_LOCAL_MACHINE, generic_string, "InstallCount",
+                            &install_count)) {
+            snprintf(output_string, MAX_STRING_LENGTH - 1,
+                     "%s  [Install Count = %d]", version_string, install_count);
+        } else {
+            snprintf(output_string, MAX_STRING_LENGTH - 1, "%s",
+                     version_string);
+        }
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement(count_string, ALIGN_RIGHT);
+        PrintTableElement(output_string);
+        PrintEndTableRow();
+    }
+
+    i = 0;
+    GetEnvironmentVariableA("SYSTEMROOT", generic_string, MAX_STRING_LENGTH);
+#if _WIN64 || __x86_64__ || __ppc64__
+    snprintf(dll_prefix, MAX_STRING_LENGTH - 1, "%s\\system32\\",
+             generic_string);
+#else
+    if (global_items.is_wow64) {
+        snprintf(dll_prefix, MAX_STRING_LENGTH - 1, "%s\\sysWOW64\\",
+                 generic_string);
+    } else {
+        snprintf(dll_prefix, MAX_STRING_LENGTH - 1, "%s\\system32\\",
+                 generic_string);
+    }
+#endif
+
+    PrintBeginTableRow();
+    PrintTableElement("Runtimes in System Folder");
+    PrintTableElement(dll_prefix);
+    PrintTableElement("");
+    PrintEndTableRow();
+
+    strncpy(dll_search, dll_prefix, MAX_STRING_LENGTH - 1);
+    dll_search[MAX_STRING_LENGTH - 1] = '\0';
+    // strncat's size argument is the remaining space, not the total size.
+    strncat(dll_search, "Vulkan-*.dll",
+            MAX_STRING_LENGTH - strlen(dll_search) - 1);
+
+    WIN32_FIND_DATAA ffd;
+    HANDLE hFind = FindFirstFileA(dll_search, &ffd);
+    if (hFind != INVALID_HANDLE_VALUE) {
+        do {
+            if (0 == (ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
+                snprintf(count_string, MAX_STRING_LENGTH - 1, "DLL %d", i++);
+
+                PrintBeginTableRow();
+                PrintTableElement(count_string, ALIGN_RIGHT);
+                PrintTableElement(ffd.cFileName);
+
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "%s%s",
+                         dll_prefix, ffd.cFileName);
+                if (GetFileVersion(generic_string, MAX_STRING_LENGTH - 1,
+                                   version_string)) {
+                    snprintf(output_string, MAX_STRING_LENGTH - 1, "Version %s",
+                             version_string);
+                    PrintTableElement(output_string);
+                } else {
+                    PrintTableElement("");
+                }
+                PrintEndTableRow();
+            }
+        } while (FindNextFileA(hFind, &ffd) != 0);
+        FindClose(hFind);
+    }
+
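+    // "where" lists matches in search order; the first line it prints is
+    // the vulkan-1.dll an application would actually load, so capture it
+    // through a temporary file.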
+    PrintBeginTableRow();
+    PrintTableElement("Runtime Used by App");
+    if (!system("where vulkan-1.dll > where_vulkan")) {
+        fp = fopen("where_vulkan", "rt");
+        if (NULL != fp) {
+            if (NULL != fgets(generic_string, MAX_STRING_LENGTH - 1, fp)) {
+                int i = (int)strlen(generic_string) - 1;
+                while (generic_string[i] == '\n' || generic_string[i] == '\r' ||
+                       generic_string[i] == '\t' || generic_string[i] == ' ') {
+                    generic_string[i] = '\0';
+                    i--;
+                }
+
+                if (GetFileVersion(generic_string, MAX_STRING_LENGTH - 1,
+                                   version_string)) {
+                    PrintTableElement(generic_string);
+                    PrintTableElement(version_string);
+                } else {
+                    PrintTableElement(generic_string);
+                    PrintTableElement("");
+                }
+            }
+            fclose(fp);
+        }
+        DeleteFileA("where_vulkan");
+    } else {
+        PrintTableElement("Unknown");
+        PrintTableElement("Unknown");
+    }
+    PrintEndTableRow();
+
+    PrintEndTable();
+}
+
+// Print out information on whatever LunarG Vulkan SDKs we can find on
+// the system using the registry, and environmental variables.  This
+// includes listing what layers are available from the SDK.
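+// SDK installs are found from uninstall registry keys whose names begin
+// with "VulkanSDK", then from the VK_SDK_PATH or VULKAN_SDK environment
+// variables; the SDK's explicit layers are then read from the registry.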
+void PrintSDKInfo(void) {
+    const char vulkan_reg_base[] = "SOFTWARE\\Khronos\\Vulkan";
+    const char vulkan_reg_base_wow64[] =
+        "SOFTWARE\\WOW6432Node\\Khronos\\Vulkan";
+    char generic_string[MAX_STRING_LENGTH];
+    char count_string[MAX_STRING_LENGTH];
+    char output_string[MAX_STRING_LENGTH];
+    char cur_vulkan_layer_json[MAX_STRING_LENGTH];
+    // Initialized in case neither VK_SDK_PATH nor VULKAN_SDK is set.
+    char sdk_env_dir[MAX_STRING_LENGTH] = "";
+    char reg_key_loc[MAX_STRING_LENGTH];
+    uint32_t i = 0;
+    bool found = false;
+    bool failed = false;
+
+    PrintBeginTable("LunarG Vulkan SDKs", 3);
+    PrintBeginTableRow();
+    PrintTableElement("SDKs Found In Registry");
+    PrintTableElement(g_uninstall_reg_path);
+    PrintTableElement("");
+    PrintEndTableRow();
+
+    while (FindNextRegKey(HKEY_LOCAL_MACHINE, g_uninstall_reg_path, "VulkanSDK",
+                          i, MAX_STRING_LENGTH, output_string)) {
+        found = true;
+        snprintf(count_string, MAX_STRING_LENGTH - 1, "[%d]", i++);
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "%s\\%s",
+                 g_uninstall_reg_path, output_string);
+        // InstallDir replaces the key name in output_string when present.
+        ReadRegKeyString(HKEY_LOCAL_MACHINE, generic_string, "InstallDir",
+                         MAX_STRING_LENGTH, output_string);
+
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement(count_string, ALIGN_RIGHT);
+        PrintTableElement(output_string);
+        PrintEndTableRow();
+    }
+    if (!found) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("NONE FOUND", ALIGN_RIGHT);
+        PrintTableElement("");
+        PrintEndTableRow();
+    }
+
+    if (0 != GetEnvironmentVariableA("VK_SDK_PATH", sdk_env_dir,
+                                     MAX_STRING_LENGTH - 1)) {
+        PrintBeginTableRow();
+        PrintTableElement("VK_SDK_PATH");
+        global_items.sdk_found = true;
+        global_items.sdk_path = sdk_env_dir;
+        PrintTableElement(sdk_env_dir);
+        PrintTableElement("");
+        PrintEndTableRow();
+    } else if (0 != GetEnvironmentVariableA("VULKAN_SDK", sdk_env_dir,
+                                            MAX_STRING_LENGTH - 1)) {
+        PrintBeginTableRow();
+        PrintTableElement("VULKAN_SDK");
+        global_items.sdk_found = true;
+        global_items.sdk_path = sdk_env_dir;
+        PrintTableElement(sdk_env_dir);
+        PrintTableElement("");
+        PrintEndTableRow();
+    } else {
+        PrintBeginTableRow();
+        PrintTableElement("VK_SDK_PATH");
+        PrintTableElement("No installed SDK");
+        PrintTableElement("");
+        PrintEndTableRow();
+    }
+
+#if _WIN64 || __x86_64__ || __ppc64__
+    snprintf(reg_key_loc, MAX_STRING_LENGTH - 1, "%s\\ExplicitLayers",
+             vulkan_reg_base);
+#else
+    if (global_items.is_wow64) {
+        snprintf(reg_key_loc, MAX_STRING_LENGTH - 1, "%s\\ExplicitLayers",
+                 vulkan_reg_base_wow64);
+    } else {
+        snprintf(reg_key_loc, MAX_STRING_LENGTH - 1, "%s\\ExplicitLayers",
+                 vulkan_reg_base);
+    }
+#endif
+
+    PrintBeginTableRow();
+    PrintTableElement("SDK Explicit Layers");
+    PrintTableElement(reg_key_loc);
+    PrintTableElement("");
+    PrintEndTableRow();
+
+    found = false;
+    i = 0;
+    uint32_t returned_value = 0;
+    while (FindNextRegValue(HKEY_LOCAL_MACHINE, reg_key_loc, "", i,
+                            MAX_STRING_LENGTH, cur_vulkan_layer_json,
+                            &returned_value)) {
+        found = true;
+
+        // Create a short json file name so we don't use up too much space
+        snprintf(output_string, MAX_STRING_LENGTH - 1, ".%s",
+                 &cur_vulkan_layer_json[strlen(sdk_env_dir)]);
+
+        snprintf(count_string, MAX_STRING_LENGTH - 1, "[%d]", i++);
+        PrintBeginTableRow();
+        PrintTableElement(count_string, ALIGN_RIGHT);
+        PrintTableElement(output_string);
+
+        snprintf(output_string, MAX_STRING_LENGTH - 1, "0x%08x", returned_value);
+        PrintTableElement(output_string);
+        PrintEndTableRow();
+
+        std::ifstream *stream = NULL;
+        stream = new std::ifstream(cur_vulkan_layer_json, std::ifstream::in);
+        if (nullptr == stream || stream->fail()) {
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("ERROR reading JSON file!");
+            PrintTableElement("");
+            PrintEndTableRow();
+            failed = true;
+        } else {
+            Json::Value root = Json::nullValue;
+            Json::Reader reader;
+            if (!reader.parse(*stream, root, false) || root.isNull()) {
+                // Report to the user the failure and their locations in the
+                // document.
+                PrintBeginTableRow();
+                PrintTableElement("");
+                PrintTableElement("ERROR parsing JSON file!");
+                PrintTableElement(reader.getFormattedErrorMessages());
+                PrintEndTableRow();
+                failed = true;
+            } else {
+                PrintExplicitLayerJsonInfo(cur_vulkan_layer_json, root, 3);
+            }
+
+            stream->close();
+            delete stream;
+            stream = NULL;
+        }
+    }
+    if (!found) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("NONE FOUND", ALIGN_RIGHT);
+        PrintTableElement("");
+        PrintEndTableRow();
+    }
+
+    PrintEndTable();
+
+    if (failed) {
+        throw(-1);
+    }
+}
+
+// Print out whatever layers we can find out from the Windows'
+// registry and other environmental variables that may be used
+// to point the Vulkan loader at a layer path.
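+// Implicit layers are enabled for every application, so they are dumped
+// first from the ImplicitLayers registry key; explicit layers are then
+// dumped from any folder named in VK_LAYER_PATH.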
+void PrintLayerInfo(void) {
+    const char vulkan_reg_base[] = "SOFTWARE\\Khronos\\Vulkan";
+    const char vulkan_reg_base_wow64[] =
+        "SOFTWARE\\WOW6432Node\\Khronos\\Vulkan";
+    char vulkan_impl_layer_reg_key[MAX_STRING_LENGTH];
+    char cur_vulkan_layer_json[MAX_STRING_LENGTH];
+    char generic_string[MAX_STRING_LENGTH];
+    char full_layer_path[MAX_STRING_LENGTH];
+    char env_value[MAX_STRING_LENGTH];
+    uint32_t i = 0;
+    bool failed = false;
+
+// Dump implicit layer information first.
+#if _WIN64 || __x86_64__ || __ppc64__
+    snprintf(vulkan_impl_layer_reg_key, MAX_STRING_LENGTH - 1,
+             "%s\\ImplicitLayers", vulkan_reg_base);
+#else
+    if (global_items.is_wow64) {
+        snprintf(vulkan_impl_layer_reg_key, MAX_STRING_LENGTH - 1,
+                 "%s\\ImplicitLayers", vulkan_reg_base_wow64);
+    } else {
+        snprintf(vulkan_impl_layer_reg_key, MAX_STRING_LENGTH - 1,
+                 "%s\\ImplicitLayers", vulkan_reg_base);
+    }
+#endif
+
+    PrintBeginTable("Implicit Layers", 4);
+    PrintBeginTableRow();
+    PrintTableElement("Registry");
+    PrintTableElement(vulkan_impl_layer_reg_key);
+    PrintTableElement("");
+    PrintTableElement("");
+    PrintEndTableRow();
+
+    // For each implicit layer listed in the registry, find its JSON and
+    // print out the useful information stored in it.
+    uint32_t returned_value = 0;
+    while (FindNextRegValue(HKEY_LOCAL_MACHINE, vulkan_impl_layer_reg_key, "",
+                            i, MAX_STRING_LENGTH, cur_vulkan_layer_json,
+                            &returned_value)) {
+
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]", i++);
+
+        PrintBeginTableRow();
+        PrintTableElement(generic_string, ALIGN_RIGHT);
+        PrintTableElement(cur_vulkan_layer_json);
+        PrintTableElement("");
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "0x%08x", returned_value);
+        PrintTableElement(generic_string);
+        PrintEndTableRow();
+
+        std::ifstream *stream = NULL;
+        stream = new std::ifstream(cur_vulkan_layer_json, std::ifstream::in);
+        if (nullptr == stream || stream->fail()) {
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("ERROR reading JSON file!");
+            PrintTableElement("");
+            PrintEndTableRow();
+            failed = true;
+        } else {
+            Json::Value root = Json::nullValue;
+            Json::Reader reader;
+            if (!reader.parse(*stream, root, false) || root.isNull()) {
+                // Report to the user the failure and their locations in the
+                // document.
+                PrintBeginTableRow();
+                PrintTableElement("");
+                PrintTableElement("ERROR parsing JSON file!");
+                PrintTableElement(reader.getFormattedErrorMessages());
+                PrintEndTableRow();
+                failed = true;
+            } else {
+                PrintImplicitLayerJsonInfo(cur_vulkan_layer_json, root);
+            }
+
+            stream->close();
+            delete stream;
+            stream = NULL;
+        }
+    }
+    PrintEndTable();
+
+    // If the user's system has VK_LAYER_PATH set, dump out the layer
+    // information found in that folder.  This is important because if
+    // a user is having problems with the layers, they may be using
+    // non-standard layers.
+    if (0 != GetEnvironmentVariableA("VK_LAYER_PATH", env_value,
+                                     MAX_STRING_LENGTH - 1)) {
+        WIN32_FIND_DATAA ffd;
+        HANDLE hFind;
+
+        PrintBeginTable("VK_LAYER_PATH Explicit Layers", 3);
+        PrintBeginTableRow();
+        PrintTableElement("VK_LAYER_PATH");
+        PrintTableElement(env_value);
+        PrintTableElement("");
+        PrintEndTableRow();
+
+        // Look for any JSON files in that folder.
+        snprintf(full_layer_path, MAX_STRING_LENGTH - 1, "%s\\*.json",
+                 env_value);
+        i = 0;
+        hFind = FindFirstFileA(full_layer_path, &ffd);
+        if (hFind != INVALID_HANDLE_VALUE) {
+            do {
+                if (0 == (ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
+                    snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]",
+                             i++);
+                    snprintf(cur_vulkan_layer_json, MAX_STRING_LENGTH - 1,
+                             "%s\\%s", env_value, ffd.cFileName);
+
+                    PrintBeginTableRow();
+                    PrintTableElement(generic_string, ALIGN_RIGHT);
+                    PrintTableElement(ffd.cFileName);
+                    PrintTableElement("");
+                    PrintEndTableRow();
+
+                    std::ifstream *stream = NULL;
+                    stream = new std::ifstream(cur_vulkan_layer_json,
+                                               std::ifstream::in);
+                    if (nullptr == stream || stream->fail()) {
+                        PrintBeginTableRow();
+                        PrintTableElement("");
+                        PrintTableElement("ERROR reading JSON file!");
+                        PrintTableElement("");
+                        PrintEndTableRow();
+                        failed = true;
+                    } else {
+                        Json::Value root = Json::nullValue;
+                        Json::Reader reader;
+                        if (!reader.parse(*stream, root, false) ||
+                            root.isNull()) {
+                            // Report to the user the failure and their
+                            // locations in the document.
+                            PrintBeginTableRow();
+                            PrintTableElement("");
+                            PrintTableElement("ERROR parsing JSON file!");
+                            PrintTableElement(
+                                reader.getFormattedErrorMessages());
+                            PrintEndTableRow();
+                            failed = true;
+                        } else {
+                            PrintExplicitLayerJsonInfo(cur_vulkan_layer_json,
+                                                       root, 3);
+                        }
+
+                        stream->close();
+                        delete stream;
+                        stream = NULL;
+                    }
+                }
+            } while (FindNextFileA(hFind, &ffd) != 0);
+
+            FindClose(hFind);
+        }
+
+        PrintEndTable();
+    }
+
+    if (failed) {
+        throw(-1);
+    }
+}
+
+#elif __GNUC__
+
+// Utility function to determine if a driver may exist in the folder.
+bool CheckDriver(std::string &folder_loc, std::string &object_name) {
+    bool success = false;
+    std::string full_name = folder_loc;
+    if (folder_loc.empty() || folder_loc[folder_loc.size() - 1] != '/') {
+        full_name += "/";
+    }
+    full_name += object_name;
+    if (access(full_name.c_str(), R_OK) != -1) {
+        success = true;
+    }
+    return success;
+}
+
+// Pointer to a function used to validate that the system object is found
+typedef bool (*PFN_CheckIfValid)(std::string &folder_loc,
+                                 std::string &object_name);
+
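+// Search the standard library folders first (/usr/lib, the multiarch and
+// lib64/lib32 variants, and their /usr/local equivalents), then every
+// folder listed in LD_LIBRARY_PATH, calling func on each candidate.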
+bool FindLinuxSystemObject(std::string object_name, PFN_CheckIfValid func,
+                           bool break_on_first) {
+    bool found_one = false;
+    std::string path_to_check;
+    char *env_value = getenv("LD_LIBRARY_PATH");
+
+    for (uint32_t iii = 0; iii < 5; iii++) {
+        switch (iii) {
+        case 0:
+            path_to_check = "/usr/lib";
+            break;
+        case 1:
+#if __x86_64__ || __ppc64__
+            path_to_check = "/usr/lib/x86_64-linux-gnu";
+#else
+            path_to_check = "/usr/lib/i386-linux-gnu";
+#endif
+            break;
+        case 2:
+#if __x86_64__ || __ppc64__
+            path_to_check = "/usr/lib64";
+#else
+            path_to_check = "/usr/lib32";
+#endif
+            break;
+        case 3:
+            path_to_check = "/usr/local/lib";
+            break;
+        case 4:
+#if __x86_64__ || __ppc64__
+            path_to_check = "/usr/local/lib64";
+#else
+            path_to_check = "/usr/local/lib32";
+#endif
+            break;
+        default:
+            continue;
+        }
+
+        if (func(path_to_check, object_name)) {
+            // We found one runtime, clear any failures
+            found_one = true;
+            if (break_on_first) {
+                goto out;
+            }
+        }
+    }
+
+    // LD_LIBRARY_PATH may have multiple folders listed in it (colon
+    // ':' delimited)
+    if (env_value != NULL) {
+        // strtok modifies its input, so tokenize a copy rather than the
+        // buffer getenv returned.
+        char *env_copy = strdup(env_value);
+        char *tok = (NULL != env_copy) ? strtok(env_copy, ":") : NULL;
+        while (tok != NULL) {
+            path_to_check = tok;
+            if (func(path_to_check, object_name)) {
+                // We found one runtime, clear any failures
+                found_one = true;
+            }
+            tok = strtok(NULL, ":");
+        }
+        free(env_copy);
+    }
+
+out:
+    return found_one;
+}
+
+// Print out any information about the current system that we can
+// capture to ease in debugging/investigation at a later time.
+void PrintSystemInfo(void) {
+    FILE *fp;
+    char path[1035];
+    char generic_string[MAX_STRING_LENGTH];
+    utsname buffer;
+    struct statvfs fs_stats;
+    int num_cpus;
+    uint64_t memory;
+    char *env_value;
+    bool failed = false;
+    std::string cur_directory;
+    std::string exe_directory;
+    std::string desktop_session;
+
+    BeginSection("System Info");
+
+    // Environment section has information about the OS and the
+    // execution environment.
+    PrintBeginTable("Environment", 3);
+
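+    // The distro name comes from the PRETTY_NAME line of /etc/os-release,
+    // with surrounding quotes and whitespace trimmed off.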
+    fp = popen("cat /etc/os-release", "r");
+    if (fp == NULL) {
+        PrintBeginTableRow();
+        PrintTableElement("ERROR");
+        PrintTableElement("Failed to cat /etc/os-release");
+        PrintTableElement("");
+        PrintEndTableRow();
+        failed = true;
+    } else {
+        // Read the output a line at a time - output it.
+        while (fgets(path, sizeof(path) - 1, fp) != NULL) {
+            if (NULL != strstr(path, "PRETTY_NAME")) {
+                uint32_t index;
+                index = strlen(path) - 1;
+                while (path[index] == ' ' || path[index] == '\t' ||
+                       path[index] == '\r' || path[index] == '\n' ||
+                       path[index] == '\"') {
+                    path[index] = '\0';
+                    index = strlen(path) - 1;
+                }
+                index = 13;
+                while (path[index] == ' ' || path[index] == '\t' ||
+                       path[index] == '\"') {
+                    index++;
+                }
+                PrintBeginTableRow();
+                PrintTableElement("Linux");
+                PrintTableElement("");
+                PrintTableElement("");
+                PrintEndTableRow();
+                PrintBeginTableRow();
+                PrintTableElement("");
+                PrintTableElement("Distro");
+                PrintTableElement(&path[index]);
+                PrintEndTableRow();
+                break;
+            }
+        }
+        pclose(fp);
+    }
+
+    errno = 0;
+    if (uname(&buffer) != 0) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("ERROR");
+        PrintTableElement("Failed to query uname");
+        PrintEndTableRow();
+        failed = true;
+    } else {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Kernel Build");
+        PrintTableElement(buffer.release);
+        PrintEndTableRow();
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Machine Target");
+        PrintTableElement(buffer.machine);
+        PrintEndTableRow();
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Version");
+        PrintTableElement(buffer.version);
+        PrintEndTableRow();
+    }
+
+    env_value = getenv("DESKTOP_SESSION");
+    if (env_value != NULL) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("DESKTOP_SESSION");
+        PrintTableElement(env_value);
+        PrintEndTableRow();
+
+        desktop_session = env_value;
+    }
+    env_value = getenv("LD_LIBRARY_PATH");
+    if (env_value != NULL) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("LD_LIBRARY_PATH");
+        PrintTableElement(env_value);
+        PrintEndTableRow();
+    }
+    env_value = getenv("GDK_BACKEND");
+    if (env_value != NULL) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("GDK_BACKEND");
+        PrintTableElement(env_value);
+        PrintEndTableRow();
+    }
+    env_value = getenv("DISPLAY");
+    if (env_value != NULL) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("DISPLAY");
+        PrintTableElement(env_value);
+        PrintEndTableRow();
+    }
+    env_value = getenv("WAYLAND_DISPLAY");
+    if (env_value != NULL) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("WAYLAND_DISPLAY");
+        PrintTableElement(env_value);
+        PrintEndTableRow();
+    }
+    env_value = getenv("MIR_SOCKET");
+    if (env_value != NULL) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("MIR_SOCKET");
+        PrintTableElement(env_value);
+        PrintEndTableRow();
+    }
+
+    PrintEndTable();
+
+    // Output whatever generic hardware information we can find out about the
+    // system, including how much memory and disk space is available.
+    PrintBeginTable("Hardware", 3);
+
+    num_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+    snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d", num_cpus);
+
+    PrintBeginTableRow();
+    PrintTableElement("CPUs");
+    PrintTableElement(generic_string);
+    PrintTableElement("");
+    PrintEndTableRow();
+
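+    // sysconf reports page count and page size; shift right by 10 to get
+    // KB, then scale to the largest unit that is still non-zero.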
+    memory = (sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGE_SIZE)) >> 10;
+    if ((memory >> 10) > 0) {
+        memory >>= 10;
+        if ((memory >> 20) > 0) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u TB",
+                     static_cast<uint32_t>(memory >> 20));
+        } else if ((memory >> 10) > 0) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u GB",
+                     static_cast<uint32_t>(memory >> 10));
+        } else {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u MB",
+                     static_cast<uint32_t>(memory));
+        }
+    } else {
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u KB",
+                 static_cast<uint32_t>(memory));
+    }
+    PrintBeginTableRow();
+    PrintTableElement("Memory");
+    PrintTableElement("Physical");
+    PrintTableElement(generic_string);
+    PrintEndTableRow();
+
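+    // statvfs on /etc/os-release reports stats for whichever filesystem
+    // holds that file (normally the root filesystem).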
+    if (0 == statvfs("/etc/os-release", &fs_stats)) {
+        // f_bavail counts the blocks available to unprivileged users, in
+        // units of the fragment size f_frsize.
+        uint64_t bytes_free =
+            (uint64_t)fs_stats.f_frsize * (uint64_t)fs_stats.f_bavail;
+        if ((bytes_free >> 40) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u TB",
+                     static_cast<uint32_t>(bytes_free >> 40));
+            PrintBeginTableRow();
+            PrintTableElement("Disk Space");
+            PrintTableElement("Free");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        } else if ((bytes_free >> 30) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u GB",
+                     static_cast<uint32_t>(bytes_free >> 30));
+            PrintBeginTableRow();
+            PrintTableElement("Disk Space");
+            PrintTableElement("Free");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        } else if ((bytes_free >> 20) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u MB",
+                     static_cast<uint32_t>(bytes_free >> 20));
+            PrintBeginTableRow();
+            PrintTableElement("Disk Space");
+            PrintTableElement("Free");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        } else if ((bytes_free >> 10) > 0x0ULL) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u KB",
+                     static_cast<uint32_t>(bytes_free >> 10));
+            PrintBeginTableRow();
+            PrintTableElement("Disk Space");
+            PrintTableElement("Free");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        } else {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%u bytes",
+                     static_cast<uint32_t>(bytes_free));
+            PrintBeginTableRow();
+            PrintTableElement("Disk Space");
+            PrintTableElement("Free");
+            PrintTableElement(generic_string);
+            PrintEndTableRow();
+        }
+    }
+    PrintEndTable();
+
+    // Print out information about this executable.
+    PrintBeginTable("Executable", 2);
+
+    PrintBeginTableRow();
+    PrintTableElement("Exe Directory");
+    PrintTableElement(global_items.exe_directory);
+    PrintEndTableRow();
+
+    if (getcwd(generic_string, MAX_STRING_LENGTH - 1) != NULL) {
+        cur_directory = generic_string;
+        PrintBeginTableRow();
+        PrintTableElement("Current Directory");
+        PrintTableElement(cur_directory);
+        PrintEndTableRow();
+    } else {
+        cur_directory = "";
+    }
+
+    PrintBeginTableRow();
+    PrintTableElement("App Version");
+    PrintTableElement(APP_VERSION);
+    PrintEndTableRow();
+
+    uint32_t major = VK_VERSION_MAJOR(VK_API_VERSION_1_0);
+    uint32_t minor = VK_VERSION_MINOR(VK_API_VERSION_1_0);
+    uint32_t patch = VK_VERSION_PATCH(VK_HEADER_VERSION);
+    snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d.%d.%d", major, minor,
+             patch);
+
+    PrintBeginTableRow();
+    PrintTableElement("Vulkan API Version");
+    PrintTableElement(generic_string);
+    PrintEndTableRow();
+
+    PrintBeginTableRow();
+    PrintTableElement("Byte Format");
+#if __x86_64__ || __ppc64__
+    PrintTableElement("64-bit");
+#else
+    PrintTableElement("32-bit");
+#endif
+    PrintEndTableRow();
+
+    PrintEndTable();
+
+    // Print out the rest of the useful system information.
+    PrintDriverInfo();
+    PrintRunTimeInfo();
+    PrintSDKInfo();
+    PrintLayerInfo();
+    PrintLayerSettingsFileInfo();
+    EndSection();
+
+    if (failed) {
+        throw(-1);
+    }
+}
+
+// Print out the information for every driver JSON in the appropriate
+// system folders.
+void PrintDriverInfo(void) {
+    bool failed = false;
+    char generic_string[MAX_STRING_LENGTH];
+    char full_driver_path[MAX_STRING_LENGTH];
+    uint32_t i = 0;
+    uint32_t j = 0;
+    bool found_json = false;
+    bool found_lib = false;
+
+    PrintBeginTable("Vulkan Driver Info", 3);
+
+    // Driver manifest JSONs can live in several folders, plus any path
+    // named by VK_DRIVERS_PATH.  Check each in turn.
+    for (uint32_t dir = 0; dir < 5; dir++) {
+        std::string cur_driver_path;
+        std::string cur_driver_json;
+        switch (dir) {
+        case 0:
+            cur_driver_path = "/etc/vulkan/icd.d";
+            break;
+        case 1:
+            cur_driver_path = "/usr/share/vulkan/icd.d";
+            break;
+        case 2:
+            cur_driver_path = "/usr/local/etc/vulkan/icd.d";
+            break;
+        case 3:
+            cur_driver_path = "/usr/local/share/vulkan/icd.d";
+            break;
+        case 4: {
+            char *env_value = getenv("VK_DRIVERS_PATH");
+            if (NULL == env_value) {
+                continue;
+            }
+            cur_driver_path = env_value;
+            break;
+        }
+        default:
+            continue;
+        }
+
+        PrintBeginTableRow();
+        PrintTableElement(cur_driver_path.c_str());
+        PrintTableElement("");
+        PrintTableElement("");
+        PrintEndTableRow();
+
+        // Loop through each JSON file found in the current
+        // location.
+        DIR *layer_dir = opendir(cur_driver_path.c_str());
+        if (NULL == layer_dir) {
+            continue;
+        }
+        dirent *cur_ent;
+        i = 0;
+        while ((cur_ent = readdir(layer_dir)) != NULL) {
+            if (NULL != strstr(cur_ent->d_name, ".json")) {
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]", i++);
+                cur_driver_json = cur_driver_path;
+                cur_driver_json += "/";
+                cur_driver_json += cur_ent->d_name;
+
+                PrintBeginTableRow();
+                PrintTableElement(generic_string, ALIGN_RIGHT);
+                PrintTableElement(cur_ent->d_name);
+                PrintTableElement("");
+                PrintEndTableRow();
+
+                bool found_lib_this_time = false;
+                std::ifstream *stream = NULL;
+                stream = new std::ifstream(cur_driver_json.c_str(),
+                                           std::ifstream::in);
+                if (nullptr == stream || stream->fail()) {
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement("Error reading JSON file");
+                    PrintTableElement(cur_driver_json);
+                    PrintEndTableRow();
+
+                    failed = true;
+                    continue;
+                } else {
+                    Json::Value root = Json::nullValue;
+                    Json::Reader reader;
+                    if (!reader.parse(*stream, root, false) || root.isNull()) {
+                        PrintBeginTableRow();
+                        PrintTableElement("");
+                        PrintTableElement("Error reading JSON file");
+                        PrintTableElement(reader.getFormattedErrorMessages());
+                        PrintEndTableRow();
+
+                        failed = true;
+                        stream->close();
+                        delete stream;
+                        continue;
+                    } else {
+                        PrintBeginTableRow();
+                        PrintTableElement("");
+                        PrintTableElement("JSON File Version");
+                        if (!root["file_format_version"].isNull()) {
+                            PrintTableElement(
+                                root["file_format_version"].asString());
+                        } else {
+                            PrintTableElement("MISSING!");
+                        }
+                        PrintEndTableRow();
+
+                        if (!root["ICD"].isNull()) {
+                            found_json = true;
+
+                            PrintBeginTableRow();
+                            PrintTableElement("");
+                            PrintTableElement("API Version");
+                            if (!root["ICD"]["api_version"].isNull()) {
+                                PrintTableElement(
+                                    root["ICD"]["api_version"].asString());
+                            } else {
+                                PrintTableElement("MISSING!");
+                            }
+                            PrintEndTableRow();
+
+                            PrintBeginTableRow();
+                            PrintTableElement("");
+                            PrintTableElement("Library Path");
+                            if (!root["ICD"]["library_path"].isNull()) {
+                                std::string driver_name = root["ICD"]["library_path"].asString();
+                                PrintTableElement(driver_name);
+                                PrintEndTableRow();
+
+                                if (GenerateLibraryPath(
+                                        cur_driver_json.c_str(),
+                                        driver_name.c_str(),
+                                        MAX_STRING_LENGTH, full_driver_path)) {
+                                    // First try the generated path.
+                                    if (access(full_driver_path, R_OK) != -1) {
+                                        found_lib_this_time = true;
+                                    } else if (driver_name.find("/") == std::string::npos) {
+                                        if (FindLinuxSystemObject(driver_name,
+                                                                  CheckDriver,
+                                                                  true)) {
+                                            found_lib_this_time = true;
+                                        }
+                                    }
+                                    if (!found_lib_this_time) {
+                                        snprintf(generic_string,
+                                                 MAX_STRING_LENGTH - 1,
+                                                 "Failed to find driver %s "
+                                                 "referenced by JSON %s",
+                                                 full_driver_path,
+                                                 cur_driver_json.c_str());
+                                    } else {
+                                        found_lib = true;
+                                    }
+                                } else {
+                                    snprintf(generic_string,
+                                             MAX_STRING_LENGTH - 1,
+                                             "Failed to find driver %s "
+                                             "referenced by JSON %s",
+                                             full_driver_path,
+                                             cur_driver_json.c_str());
+                                    PrintBeginTableRow();
+                                    PrintTableElement("");
+                                    PrintTableElement("");
+                                    PrintTableElement(generic_string);
+                                    PrintEndTableRow();
+                                }
+                            } else {
+                                PrintTableElement("MISSING!");
+                                PrintEndTableRow();
+                            }
+
+                            char count_str[MAX_STRING_LENGTH];
+                            j = 0;
+                            Json::Value dev_exts =
+                                root["ICD"]["device_extensions"];
+                            if (!dev_exts.isNull() && dev_exts.isArray()) {
+                                snprintf(count_str, MAX_STRING_LENGTH - 1, "%d",
+                                         dev_exts.size());
+                                PrintBeginTableRow();
+                                PrintTableElement("");
+                                PrintTableElement("Device Extensions");
+                                PrintTableElement(count_str);
+                                PrintEndTableRow();
+
+                                for (Json::ValueIterator dev_ext_it =
+                                         dev_exts.begin();
+                                     dev_ext_it != dev_exts.end();
+                                     dev_ext_it++) {
+                                    Json::Value dev_ext = (*dev_ext_it);
+                                    Json::Value dev_ext_name = dev_ext["name"];
+                                    if (!dev_ext_name.isNull()) {
+                                        snprintf(generic_string,
+                                                 MAX_STRING_LENGTH - 1, "[%d]",
+                                                 j++);
+
+                                        PrintBeginTableRow();
+                                        PrintTableElement("");
+                                        PrintTableElement(generic_string,
+                                                          ALIGN_RIGHT);
+                                        PrintTableElement(
+                                            dev_ext_name.asString());
+                                        PrintEndTableRow();
+                                    }
+                                }
+                            }
+                            Json::Value inst_exts =
+                                root["ICD"]["instance_extensions"];
+                            j = 0;
+                            if (!inst_exts.isNull() && inst_exts.isArray()) {
+                                snprintf(count_str, MAX_STRING_LENGTH - 1, "%d",
+                                         inst_exts.size());
+                                PrintBeginTableRow();
+                                PrintTableElement("");
+                                PrintTableElement("Instance Extensions");
+                                PrintTableElement(count_str);
+                                PrintEndTableRow();
+
+                                for (Json::ValueIterator inst_ext_it =
+                                         inst_exts.begin();
+                                     inst_ext_it != inst_exts.end();
+                                     inst_ext_it++) {
+                                    Json::Value inst_ext = (*inst_ext_it);
+                                    Json::Value inst_ext_name =
+                                        inst_ext["name"];
+                                    if (!inst_ext_name.isNull()) {
+                                        snprintf(generic_string,
+                                                 MAX_STRING_LENGTH - 1, "[%d]",
+                                                 j++);
+
+                                        PrintBeginTableRow();
+                                        PrintTableElement("");
+                                        PrintTableElement(generic_string,
+                                                          ALIGN_RIGHT);
+                                        PrintTableElement(
+                                            inst_ext_name.asString());
+                                        PrintEndTableRow();
+                                    }
+                                }
+                            }
+                        } else {
+                            PrintBeginTableRow();
+                            PrintTableElement("");
+                            PrintTableElement("ICD Section");
+                            PrintTableElement("MISSING!");
+                            PrintEndTableRow();
+                        }
+                    }
+
+                    stream->close();
+                    delete stream;
+                    stream = NULL;
+                }
+            }
+        }
+    }
+    if (!found_json || !found_lib) {
+        failed = true;
+    }
+    PrintEndTable();
+    if (failed) {
+        if (!found_json) {
+            throw MISSING_DRIVER_JSON;
+        } else if (!found_lib) {
+            throw MISSING_DRIVER_LIB;
+        } else {
+            throw(-1);
+        }
+    }
+}
+
+// Print out all the runtime files found in a given location.  This way we
+// capture the full state of the system.
+bool PrintRuntimesInFolder(std::string &folder_loc, std::string &object_name,
+                           bool print_header = true) {
+    DIR *runtime_dir;
+    bool success = false;
+    bool failed = false;
+
+    runtime_dir = opendir(folder_loc.c_str());
+    if (NULL != runtime_dir) {
+        bool file_found = false;
+        FILE *pfp;
+        uint32_t i = 0;
+        dirent *cur_ent;
+        std::string command_str;
+        std::stringstream generic_str;
+        char path[1035];
+
+        if (print_header) {
+            PrintBeginTableRow();
+            PrintTableElement(folder_loc, ALIGN_RIGHT);
+            PrintTableElement("");
+            PrintTableElement("");
+            PrintEndTableRow();
+        }
+
+        while ((cur_ent = readdir(runtime_dir)) != NULL) {
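+            // Match names like "libvulkan.so.1": the object name plus a
+            // single version digit (strlen("libvulkan.so.1") == 14).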
+            if (NULL != strstr(cur_ent->d_name, object_name.c_str()) &&
+                strlen(cur_ent->d_name) == 14) {
+
+                // Get the source of this symbolic link
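+                // GNU stat's %N format prints the quoted file name and,
+                // for a symlink, appends " -> 'target'"; the output is
+                // split on that arrow below.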
+                command_str = "stat -c%N ";
+                command_str += folder_loc;
+                command_str += "/";
+                command_str += cur_ent->d_name;
+                pfp = popen(command_str.c_str(), "r");
+
+                generic_str.str("");  // Reset so indices don't accumulate
+                                      // across loop iterations.
+                generic_str << "[" << i++ << "]";
+
+                PrintBeginTableRow();
+                PrintTableElement(generic_str.str(), ALIGN_RIGHT);
+
+                file_found = true;
+
+                if (pfp == NULL) {
+                    PrintTableElement(cur_ent->d_name);
+                    PrintTableElement("Failed to retrieve symbolic link");
+                    PrintEndTableRow();
+                    failed = true;
+                } else {
+                    if (NULL != fgets(path, sizeof(path) - 1, pfp)) {
+                        std::string cmd = path;
+                        size_t arrow_loc = cmd.find("->");
+                        if (arrow_loc == std::string::npos) {
+                            std::string trimmed_path =
+                                TrimWhitespace(path, " \t\n\r\'\"");
+
+                            PrintTableElement(trimmed_path);
+                            PrintTableElement("");
+                        } else {
+                            std::string before_arrow = cmd.substr(0, arrow_loc);
+                            std::string trim_before =
+                                TrimWhitespace(before_arrow, " \t\n\r\'\"");
+                            std::string after_arrow =
+                                cmd.substr(arrow_loc + 2, std::string::npos);
+                            std::string trim_after =
+                                TrimWhitespace(after_arrow, " \t\n\r\'\"");
+                            PrintTableElement(trim_before);
+                            PrintTableElement(trim_after);
+                        }
+                    } else {
+                        PrintTableElement(cur_ent->d_name);
+                        PrintTableElement("Failed to retrieve symbolic link");
+                    }
+
+                    PrintEndTableRow();
+
+                    pclose(pfp);
+                }
+            }
+        }
+        if (!file_found) {
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("No libvulkan.so files found");
+            PrintTableElement("");
+            PrintEndTableRow();
+        }
+        closedir(runtime_dir);
+
+        success = !failed;
+    } else {
+        PrintBeginTableRow();
+        PrintTableElement(folder_loc, ALIGN_RIGHT);
+        PrintTableElement("No such folder");
+        PrintTableElement("");
+        PrintEndTableRow();
+    }
+
+    return success;
+}
+
+// Utility function to determine if a runtime exists in the folder
+bool CheckRuntime(std::string &folder_loc, std::string &object_name) {
+    return PrintRuntimesInFolder(folder_loc, object_name);
+}
+
+// Print out whatever Vulkan runtime information we can gather from the
+// standard system paths, etc.
+void PrintRunTimeInfo(void) {
+    const char vulkan_so_prefix[] = "libvulkan.so.";
+    char path[1035];
+    char generic_string[MAX_STRING_LENGTH];
+    char buff[PATH_MAX];
+    FILE *pfp;
+    bool failed = false;
+    PrintBeginTable("Vulkan Runtimes", 3);
+
+    PrintBeginTableRow();
+    PrintTableElement("Possible Runtime Folders");
+    PrintTableElement("");
+    PrintTableElement("");
+    PrintEndTableRow();
+
+    if (!FindLinuxSystemObject(vulkan_so_prefix, CheckRuntime, false)) {
+        failed = true;
+    }
+
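+    // Resolve this executable's path from /proc/self/exe, then run ldd
+    // on it to report which libvulkan.so the dynamic linker actually
+    // selects for via.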
+    ssize_t len = ::readlink("/proc/self/exe", buff, sizeof(buff) - 1);
+    if (len != -1) {
+        buff[len] = '\0';
+
+        std::string runtime_dir_id = "Runtime Folder Used By via";
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "ldd %s", buff);
+        pfp = popen(generic_string, "r");
+        if (pfp == NULL) {
+            PrintBeginTableRow();
+            PrintTableElement(runtime_dir_id);
+            PrintTableElement("Failed to query via library info");
+            PrintTableElement("");
+            PrintEndTableRow();
+            failed = true;
+        } else {
+            bool found = false;
+            while (fgets(path, sizeof(path) - 1, pfp) != NULL) {
+                if (NULL != strstr(path, vulkan_so_prefix)) {
+                    std::string cmd = path;
+                    size_t arrow_loc = cmd.find("=>");
+                    if (arrow_loc == std::string::npos) {
+                        std::string trimmed_path =
+                            TrimWhitespace(path, " \t\n\r\'\"");
+                        PrintBeginTableRow();
+                        PrintTableElement(runtime_dir_id);
+                        PrintTableElement(trimmed_path);
+                        PrintTableElement("");
+                        PrintEndTableRow();
+                    } else {
+                        std::string after_arrow = cmd.substr(arrow_loc + 2);
+                        std::string before_slash =
+                            after_arrow.substr(0, after_arrow.rfind("/"));
+                        std::string trimmed =
+                            TrimWhitespace(before_slash, " \t\n\r\'\"");
+
+                        PrintBeginTableRow();
+                        PrintTableElement(runtime_dir_id);
+                        PrintTableElement(trimmed);
+                        PrintTableElement("");
+                        PrintEndTableRow();
+
+                        std::string find_so = vulkan_so_prefix;
+                        if (!PrintRuntimesInFolder(trimmed, find_so, false)) {
+                            failed = true;
+                        } else {
+                            // We found at least one runtime, so clear any
+                            // earlier failures.
+                            failed = false;
+                        }
+                    }
+                    found = !failed;
+                    break;
+                }
+            }
+            if (!found) {
+                PrintBeginTableRow();
+                PrintTableElement(runtime_dir_id);
+                PrintTableElement(
+                    "Failed to find Vulkan SO used for via");
+                PrintTableElement("");
+                PrintEndTableRow();
+            }
+            pclose(pfp);
+        }
+    }
+
+    PrintEndTable();
+
+    if (failed) {
+        throw(-1);
+    }
+}
+
+// Print out the explicit layers that are stored in any of the standard
+// locations.
+bool PrintExplicitLayersInFolder(std::string &id, std::string &folder_loc) {
+    DIR *layer_dir;
+    bool success = false;
+
+    layer_dir = opendir(folder_loc.c_str());
+    if (NULL != layer_dir) {
+        dirent *cur_ent;
+        std::string cur_layer;
+        char generic_string[MAX_STRING_LENGTH];
+        uint32_t i = 0;
+        bool failed = false;
+        bool found_json = false;
+
+        PrintBeginTableRow();
+        PrintTableElement(id);
+        PrintTableElement(folder_loc);
+        PrintTableElement("");
+        PrintEndTableRow();
+
+        // Loop through each JSON in a given folder
+        while ((cur_ent = readdir(layer_dir)) != NULL) {
+            if (NULL != strstr(cur_ent->d_name, ".json")) {
+                found_json = true;
+
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]", i++);
+                cur_layer = folder_loc;
+                cur_layer += "/";
+                cur_layer += cur_ent->d_name;
+
+                // Parse the JSON file
+                std::ifstream *stream = NULL;
+                stream = new std::ifstream(cur_layer, std::ifstream::in);
+                if (nullptr == stream || stream->fail()) {
+                    PrintBeginTableRow();
+                    PrintTableElement(generic_string, ALIGN_RIGHT);
+                    PrintTableElement(cur_ent->d_name);
+                    PrintTableElement("ERROR reading JSON file!");
+                    PrintEndTableRow();
+                    failed = true;
+                } else {
+                    Json::Value root = Json::nullValue;
+                    Json::Reader reader;
+                    if (!reader.parse(*stream, root, false) || root.isNull()) {
+                        // Report the failure and its location in the
+                        // document to the user.
+                        PrintBeginTableRow();
+                        PrintTableElement(generic_string, ALIGN_RIGHT);
+                        PrintTableElement(cur_ent->d_name);
+                        PrintTableElement(reader.getFormattedErrorMessages());
+                        PrintEndTableRow();
+                        failed = true;
+                    } else {
+                        PrintBeginTableRow();
+                        PrintTableElement(generic_string, ALIGN_RIGHT);
+                        PrintTableElement(cur_ent->d_name);
+                        PrintTableElement("");
+                        PrintEndTableRow();
+
+                        // Dump out the standard explicit layer information.
+                        PrintExplicitLayerJsonInfo(cur_layer.c_str(), root, 3);
+                    }
+
+                    stream->close();
+                    delete stream;
+                    stream = NULL;
+                }
+            }
+        }
+        if (!found_json) {
+            PrintBeginTableRow();
+            PrintTableElement(id);
+            PrintTableElement(folder_loc);
+            PrintTableElement("No JSON files found");
+            PrintEndTableRow();
+        }
+        closedir(layer_dir);
+
+        success = !failed;
+    } else {
+        PrintBeginTableRow();
+        PrintTableElement(id);
+        PrintTableElement(folder_loc);
+        PrintTableElement("No such folder");
+        PrintEndTableRow();
+
+        // This isn't a failure; the folder simply doesn't exist.
+        success = true;
+    }
+
+    return success;
+}
+
+// Print out information on whatever LunarG Vulkan SDKs we can find on
+// the system using the standard locations and environment variables.
+// This includes listing what layers are available from the SDK.
+void PrintSDKInfo(void) {
+    bool failed = false;
+    bool sdk_exists = false;
+    std::string sdk_path;
+    std::string sdk_env_name;
+    DIR *sdk_dir;
+    char *env_value;
+
+    PrintBeginTable("LunarG Vulkan SDKs", 3);
+
+    for (uint32_t dir = 0; dir < 2; dir++) {
+        switch (dir) {
+        case 0:
+            sdk_env_name = "VK_SDK_PATH";
+            env_value = getenv(sdk_env_name.c_str());
+            if (env_value == NULL) {
+                continue;
+            }
+            sdk_path = env_value;
+            break;
+        case 1:
+            sdk_env_name = "VULKAN_SDK";
+            env_value = getenv(sdk_env_name.c_str());
+            if (env_value == NULL) {
+                continue;
+            }
+            sdk_path = env_value;
+            break;
+        default:
+            failed = true;
+            continue;
+        }
+
+        std::string explicit_layer_path = sdk_path;
+        explicit_layer_path += "/etc/explicit_layer.d";
+
+        sdk_dir = opendir(explicit_layer_path.c_str());
+        if (NULL != sdk_dir) {
+            // Nothing needs to be read here; the folder's existence is
+            // what tells us an SDK is installed at this location.
+            closedir(sdk_dir);
+
+            if (!PrintExplicitLayersInFolder(sdk_env_name,
+                                             explicit_layer_path)) {
+                failed = true;
+            }
+
+            global_items.sdk_found = true;
+            global_items.sdk_path = sdk_path;
+            sdk_exists = true;
+        }
+    }
+
+    if (!sdk_exists) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("No installed SDKs found");
+        PrintTableElement("");
+        PrintEndTableRow();
+    }
+
+    PrintEndTable();
+
+    if (failed) {
+        throw(-1);
+    }
+}
+
+// Print out whatever layers we can find from other environment
+// variables that may be used to point the Vulkan loader at a layer path.
+void PrintLayerInfo(void) {
+    uint32_t i = 0;
+    char generic_string[MAX_STRING_LENGTH];
+    bool failed = false;
+    char cur_vulkan_layer_json[MAX_STRING_LENGTH];
+    DIR *layer_dir;
+    dirent *cur_ent;
+    const char implicit_layer_dir[] = "/etc/vulkan/implicit_layer.d";
+    const char explicit_layer_dir[] = "/etc/vulkan/explicit_layer.d";
+    std::string layer_path;
+
+    // Dump out implicit layer information first
+    PrintBeginTable("Implicit Layers", 3);
+    PrintBeginTableRow();
+    PrintTableElement("Location");
+    PrintTableElement(implicit_layer_dir);
+    PrintTableElement("");
+    PrintEndTableRow();
+
+    layer_dir = opendir(implicit_layer_dir);
+    if (NULL != layer_dir) {
+        while ((cur_ent = readdir(layer_dir)) != NULL) {
+            if (NULL != strstr(cur_ent->d_name, ".json")) {
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]", i++);
+                snprintf(cur_vulkan_layer_json, MAX_STRING_LENGTH - 1, "%s/%s",
+                         implicit_layer_dir, cur_ent->d_name);
+
+                PrintBeginTableRow();
+                PrintTableElement(generic_string, ALIGN_RIGHT);
+                PrintTableElement(cur_ent->d_name);
+                PrintTableElement("");
+                PrintEndTableRow();
+
+                std::ifstream *stream = NULL;
+                stream =
+                    new std::ifstream(cur_vulkan_layer_json, std::ifstream::in);
+                if (nullptr == stream || stream->fail()) {
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement("ERROR reading JSON file!");
+                    PrintTableElement("");
+                    PrintEndTableRow();
+                    failed = true;
+                } else {
+                    Json::Value root = Json::nullValue;
+                    Json::Reader reader;
+                    if (!reader.parse(*stream, root, false) || root.isNull()) {
+                        // Report the failure and its location in the
+                        // document to the user.
+                        PrintBeginTableRow();
+                        PrintTableElement("");
+                        PrintTableElement("ERROR parsing JSON file!");
+                        PrintTableElement(reader.getFormattedErrorMessages());
+                        PrintEndTableRow();
+                        failed = true;
+                    } else {
+                        PrintExplicitLayerJsonInfo(cur_vulkan_layer_json, root,
+                                                   3);
+                    }
+
+                    stream->close();
+                    delete stream;
+                    stream = NULL;
+                }
+            }
+        }
+        closedir(layer_dir);
+    } else {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Directory does not exist");
+        PrintTableElement("");
+        PrintEndTableRow();
+    }
+    PrintEndTable();
+
+    // Dump out any explicit layer information.
+    PrintBeginTable("Explicit Layers", 3);
+
+    std::string explicit_layer_id = "Global path";
+    std::string explicit_layer_path = explicit_layer_dir;
+
+    if (!PrintExplicitLayersInFolder(explicit_layer_id, explicit_layer_path)) {
+        failed = true;
+    }
+
+    explicit_layer_id = "VK_LAYER_PATH";
+    char *env_value = getenv("VK_LAYER_PATH");
+    if (NULL != env_value) {
+        explicit_layer_path = env_value;
+        if (!PrintExplicitLayersInFolder(explicit_layer_id,
+                                         explicit_layer_path)) {
+            failed = true;
+        }
+    }
+
+    PrintEndTable();
+
+    if (failed) {
+        throw(-1);
+    }
+}
+
+// Run the test in the specified directory with the corresponding
+// command-line arguments.
+// Returns 0 on no error, 1 if test file wasn't found, and -1
+// on any other errors.
+int RunTestInDirectory(std::string path, std::string test,
+                       std::string cmd_line) {
+    char orig_dir[MAX_STRING_LENGTH];
+    int err_code = -1;
+    orig_dir[0] = '\0';
+    if (NULL != getcwd(orig_dir, MAX_STRING_LENGTH - 1)) {
+        int err = chdir(path.c_str());
+        if (-1 != err) {
+            if (-1 != access(test.c_str(), X_OK)) {
+                printf("cmd_line - %s\n", cmd_line.c_str());
+                err_code = system(cmd_line.c_str());
+            } else {
+                // Can't run it because it's either not there or not
+                // actually executable, so just return a separate error code.
+                err_code = 1;
+            }
+        }
+        chdir(orig_dir);
+    }
+    return err_code;
+}
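+// Example (hypothetical SDK path and demo arguments):
+//     RunTestInDirectory("/usr/local/VulkanSDK/bin", "cube",
+//                        "./cube --c 100");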
+
+#endif
+
+// Following functions should be OS agnostic:
+//==========================================
+
+// Trim any whitespace preceding or following the actual
+// content inside of a string.  The characters treated as
+// whitespace are passed in as the second parameter.
+std::string TrimWhitespace(const std::string &str,
+                           const std::string &whitespace) {
+    const auto strBegin = str.find_first_not_of(whitespace);
+    if (strBegin == std::string::npos) {
+        return ""; // no content
+    }
+
+    const auto strEnd = str.find_last_not_of(whitespace);
+    const auto strRange = strEnd - strBegin + 1;
+
+    return str.substr(strBegin, strRange);
+}
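+// For example, TrimWhitespace("  foo bar \n", " \t\n\r") returns
+// "foo bar", and a string made up entirely of whitespace yields "".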
+
+// Print any information found in the vk_layer_settings.txt file being
+// used.  If the VK_LAYER_SETTINGS_PATH environment variable is set, the
+// file is loaded from that folder; otherwise the current folder is used.
+void PrintLayerSettingsFileInfo(void) {
+    bool failed = false;
+    char *settings_path = NULL;
+    std::string settings_file;
+    std::map<std::string, std::vector<SettingPair>> settings;
+
+    PrintBeginTable("Layer Settings File", 4);
+
+// If the settings path environment variable is set, use that.
+#ifdef _WIN32
+    char generic_string[MAX_STRING_LENGTH];
+    if (0 != GetEnvironmentVariableA("VK_LAYER_SETTINGS_PATH", generic_string,
+                                     MAX_STRING_LENGTH - 1)) {
+        settings_path = generic_string;
+        settings_file = settings_path;
+        settings_file += '\\';
+    }
+#else
+    settings_path = getenv("VK_LAYER_SETTINGS_PATH");
+    if (NULL != settings_path) {
+        settings_file = settings_path;
+        settings_file += '/';
+    }
+#endif
+    settings_file += "vk_layer_settings.txt";
+
+    PrintBeginTableRow();
+    PrintTableElement("VK_LAYER_SETTINGS_PATH");
+    if (NULL != settings_path) {
+        PrintTableElement(settings_path);
+    } else {
+        PrintTableElement("Not Defined");
+    }
+    PrintTableElement("");
+    PrintTableElement("");
+    PrintEndTableRow();
+
+    // Load the file from the appropriate location
+    PrintBeginTableRow();
+    PrintTableElement("Settings File");
+    PrintTableElement("vk_layer_settings.txt");
+    std::ifstream *settings_stream =
+        new std::ifstream(settings_file, std::ifstream::in);
+    if (nullptr == settings_stream || settings_stream->fail()) {
+        // No file was found.  This is NOT an error.
+        PrintTableElement("Not Found");
+        PrintTableElement("");
+        PrintEndTableRow();
+    } else {
+        // We found a file, so parse it.
+        PrintTableElement("Found");
+        PrintTableElement("");
+        PrintEndTableRow();
+
+        // The settings file is a text file where:
+        //  - # indicates a comment
+        //  - Settings are stored in the fashion:
+        //        <layer_name>.<setting> = <value>
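+        //   For example (hypothetical layer and setting names):
+        //        lunarg_core_validation.report_flags = warn,error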
+        while (settings_stream->good()) {
+            std::string cur_line;
+            getline(*settings_stream, cur_line);
+            std::string trimmed_line = TrimWhitespace(cur_line);
+
+            // Skip blank and comment lines
+            if (trimmed_line.length() == 0 || trimmed_line.c_str()[0] == '#') {
+                continue;
+            }
+
+            // If no equal, treat as unknown
+            size_t equal_loc = trimmed_line.find("=");
+            if (equal_loc == std::string::npos) {
+                continue;
+            }
+
+            SettingPair new_pair;
+
+            std::string before_equal = trimmed_line.substr(0, equal_loc);
+            std::string after_equal =
+                trimmed_line.substr(equal_loc + 1, std::string::npos);
+            new_pair.value = TrimWhitespace(after_equal);
+
+            std::string trimmed_setting = TrimWhitespace(before_equal);
+
+            // Look for period
+            std::string setting_layer = "--None--";
+            std::string setting_name = "";
+            size_t period_loc = trimmed_setting.find(".");
+            if (period_loc == std::string::npos) {
+                setting_name = trimmed_setting;
+            } else {
+                setting_layer = trimmed_setting.substr(0, period_loc);
+                setting_name =
+                    trimmed_setting.substr(period_loc + 1, std::string::npos);
+            }
+            new_pair.name = setting_name;
+
+            // Add items to settings map for now
+            if (settings.find(setting_layer) == settings.end()) {
+                // Not found
+                std::vector<SettingPair> new_vector;
+                new_vector.push_back(new_pair);
+                settings[setting_layer] = new_vector;
+            } else {
+                // Already exists
+                std::vector<SettingPair> &cur_vector = settings[setting_layer];
+                cur_vector.push_back(new_pair);
+            }
+        }
+
+        // Now that all items have been grouped appropriately in the
+        // settings map, print them out.
+        for (auto layer_iter = settings.begin(); layer_iter != settings.end();
+             layer_iter++) {
+            std::vector<SettingPair> &cur_vector = layer_iter->second;
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement(layer_iter->first, ALIGN_RIGHT);
+            PrintTableElement("");
+            PrintTableElement("");
+            PrintEndTableRow();
+            for (uint32_t cur_item = 0; cur_item < cur_vector.size();
+                 cur_item++) {
+                PrintBeginTableRow();
+                PrintTableElement("");
+                PrintTableElement("");
+                PrintTableElement(cur_vector[cur_item].name);
+                PrintTableElement(cur_vector[cur_item].value);
+                PrintEndTableRow();
+            }
+        }
+
+        settings_stream->close();
+        delete settings_stream;
+    }
+    PrintEndTable();
+
+    if (failed) {
+        throw(-1);
+    }
+}
+
+// Print out the information stored in an explicit layer's JSON file.
+void PrintExplicitLayerJsonInfo(const char *layer_json_filename,
+                                Json::Value root, uint32_t num_cols) {
+    char generic_string[MAX_STRING_LENGTH];
+    uint32_t cur_col;
+    uint32_t ext;
+    if (!root["layer"].isNull()) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Name");
+        if (!root["layer"]["name"].isNull()) {
+            PrintTableElement(root["layer"]["name"].asString());
+        } else {
+            PrintTableElement("MISSING!");
+        }
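+        // Pad each row out to num_cols so the output stays aligned when
+        // embedded in wider tables (implicit layers print with 4 columns,
+        // explicit layer listings with 3); this pattern repeats below.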
+        cur_col = 3;
+        while (num_cols > cur_col) {
+            PrintTableElement("");
+            cur_col++;
+        }
+        PrintEndTableRow();
+
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Description");
+        if (!root["layer"]["description"].isNull()) {
+            PrintTableElement(root["layer"]["description"].asString());
+        } else {
+            PrintTableElement("MISSING!");
+        }
+        cur_col = 3;
+        while (num_cols > cur_col) {
+            PrintTableElement("");
+            cur_col++;
+        }
+        PrintEndTableRow();
+
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("API Version");
+        if (!root["layer"]["api_version"].isNull()) {
+            PrintTableElement(root["layer"]["api_version"].asString());
+        } else {
+            PrintTableElement("MISSING!");
+        }
+        cur_col = 3;
+        while (num_cols > cur_col) {
+            PrintTableElement("");
+            cur_col++;
+        }
+        PrintEndTableRow();
+
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("JSON File Version");
+        if (!root["file_format_version"].isNull()) {
+            PrintTableElement(root["file_format_version"].asString());
+        } else {
+            PrintTableElement("MISSING!");
+        }
+        cur_col = 3;
+        while (num_cols > cur_col) {
+            PrintTableElement("");
+            cur_col++;
+        }
+        PrintEndTableRow();
+
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Library Path");
+        if (!root["layer"]["library_path"].isNull()) {
+            PrintTableElement(root["layer"]["library_path"].asString());
+            cur_col = 3;
+            while (num_cols > cur_col) {
+                PrintTableElement("");
+                cur_col++;
+            }
+            PrintEndTableRow();
+
+#ifdef _WIN32
+            // On Windows, we can query the file version, so do so.
+            char full_layer_path[MAX_STRING_LENGTH];
+            if (GenerateLibraryPath(
+                    layer_json_filename,
+                    root["layer"]["library_path"].asString().c_str(),
+                    MAX_STRING_LENGTH, full_layer_path) &&
+                GetFileVersion(full_layer_path, MAX_STRING_LENGTH,
+                               generic_string)) {
+                PrintBeginTableRow();
+                PrintTableElement("");
+                PrintTableElement("Layer File Version");
+                PrintTableElement(generic_string);
+                cur_col = 3;
+                while (num_cols > cur_col) {
+                    PrintTableElement("");
+                    cur_col++;
+                }
+                PrintEndTableRow();
+            }
+#endif
+        } else {
+            PrintTableElement("MISSING!");
+            cur_col = 3;
+            while (num_cols > cur_col) {
+                PrintTableElement("");
+                cur_col++;
+            }
+            PrintEndTableRow();
+        }
+
+        char count_str[MAX_STRING_LENGTH];
+        Json::Value dev_exts = root["layer"]["device_extensions"];
+        ext = 0;
+        if (!dev_exts.isNull() && dev_exts.isArray()) {
+            snprintf(count_str, MAX_STRING_LENGTH - 1, "%d", dev_exts.size());
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Device Extensions");
+            PrintTableElement(count_str);
+            cur_col = 3;
+            while (num_cols > cur_col) {
+                PrintTableElement("");
+                cur_col++;
+            }
+            PrintEndTableRow();
+
+            for (Json::ValueIterator dev_ext_it = dev_exts.begin();
+                 dev_ext_it != dev_exts.end(); dev_ext_it++) {
+                Json::Value dev_ext = (*dev_ext_it);
+                Json::Value dev_ext_name = dev_ext["name"];
+                if (!dev_ext_name.isNull()) {
+                    snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]",
+                             ext++);
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement(generic_string, ALIGN_RIGHT);
+                    PrintTableElement(dev_ext_name.asString());
+                    cur_col = 3;
+                    while (num_cols > cur_col) {
+                        PrintTableElement("");
+                        cur_col++;
+                    }
+                    PrintEndTableRow();
+                }
+            }
+        }
+        Json::Value inst_exts = root["layer"]["instance_extensions"];
+        ext = 0;
+        if (!inst_exts.isNull() && inst_exts.isArray()) {
+            snprintf(count_str, MAX_STRING_LENGTH - 1, "%d", inst_exts.size());
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Instance Extensions");
+            PrintTableElement(count_str);
+            cur_col = 3;
+            while (num_cols > cur_col) {
+                PrintTableElement("");
+                cur_col++;
+            }
+            PrintEndTableRow();
+
+            for (Json::ValueIterator inst_ext_it = inst_exts.begin();
+                 inst_ext_it != inst_exts.end(); inst_ext_it++) {
+                Json::Value inst_ext = (*inst_ext_it);
+                Json::Value inst_ext_name = inst_ext["name"];
+                if (!inst_ext_name.isNull()) {
+                    snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]",
+                             ext++);
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement(generic_string, ALIGN_RIGHT);
+                    PrintTableElement(inst_ext_name.asString());
+                    cur_col = 3;
+                    while (num_cols > cur_col) {
+                        PrintTableElement("");
+                        cur_col++;
+                    }
+                    PrintEndTableRow();
+                }
+            }
+        }
+    } else {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Layer Section");
+        PrintTableElement("MISSING!");
+        cur_col = 3;
+        while (num_cols > cur_col) {
+            PrintTableElement("");
+            cur_col++;
+        }
+        PrintEndTableRow();
+    }
+}
+
+// Print out the information about an implicit layer stored in
+// its JSON file.  For the most part it is similar to an explicit
+// layer, so we re-use that code.  However, implicit layers have a
+// disable_environment variable that can be used to turn the layer
+// off, and some also have an enable_environment variable so that
+// they are disabled by default and only enabled when that variable
+// is set.
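+// In a layer manifest these typically appear as (hypothetical
+// variable names):
+//     "enable_environment": { "ENABLE_MY_LAYER": "1" },
+//     "disable_environment": { "DISABLE_MY_LAYER": "1" }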
+void PrintImplicitLayerJsonInfo(const char *layer_json_filename,
+                                Json::Value root) {
+    bool enabled = true;
+    std::string enable_env_variable = "--NONE--";
+    bool enable_var_set = false;
+    char enable_env_value[16];
+    std::string disable_env_variable = "--NONE--";
+    bool disable_var_set = false;
+    char disable_env_value[16];
+
+    PrintExplicitLayerJsonInfo(layer_json_filename, root, 4);
+
+    Json::Value enable = root["layer"]["enable_environment"];
+    if (!enable.isNull()) {
+        for (Json::Value::iterator en_iter = enable.begin();
+             en_iter != enable.end(); en_iter++) {
+            if (en_iter.key().isNull()) {
+                continue;
+            }
+            enable_env_variable = en_iter.key().asString();
+            // If an enable variable exists, the layer is disabled by
+            // default and only enabled when that variable is set.
+            enabled = false;
+#ifdef _WIN32
+            if (0 != GetEnvironmentVariableA(enable_env_variable.c_str(),
+                                             enable_env_value, 15)) {
+#else
+            char *enable_env = getenv(enable_env_variable.c_str());
+            if (NULL != enable_env) {
+                strncpy(enable_env_value, enable_env, 15);
+                enable_env_value[15] = '\0';
+#endif
+                if (atoi(enable_env_value) != 0) {
+                    enable_var_set = true;
+                    enabled = true;
+                }
+            }
+            break;
+        }
+    }
+    Json::Value disable = root["layer"]["disable_environment"];
+    if (!disable.isNull()) {
+        for (Json::Value::iterator dis_iter = disable.begin();
+             dis_iter != disable.end(); dis_iter++) {
+            if (dis_iter.key().isNull()) {
+                continue;
+            }
+            disable_env_variable = dis_iter.key().asString();
+#ifdef _WIN32
+            if (0 != GetEnvironmentVariableA(disable_env_variable.c_str(),
+                                             disable_env_value, 15)) {
+#else
+            char *disable_env = getenv(disable_env_variable.c_str());
+            if (NULL != disable_env) {
+                strncpy(disable_env_value, disable_env, 15);
+                disable_env_value[15] = '\0';
+#endif
+                if (atoi(disable_env_value) > 0) {
+                    disable_var_set = true;
+                    enabled = false;
+                }
+            }
+            break;
+        }
+    }
+
+    // Print the overall state (ENABLED or DISABLED) so we can
+    // quickly determine if this layer is being used.
+    PrintBeginTableRow();
+    PrintTableElement("");
+    PrintTableElement("Enabled State");
+    PrintTableElement(enabled ? "ENABLED" : "DISABLED");
+    PrintTableElement("");
+    PrintEndTableRow();
+    PrintBeginTableRow();
+    PrintTableElement("");
+    PrintTableElement("Enable Env Var", ALIGN_RIGHT);
+    PrintTableElement(enable_env_variable);
+    if (enable_var_set) {
+        PrintTableElement(enable_env_value);
+    } else {
+        PrintTableElement("Not Defined");
+    }
+    PrintEndTableRow();
+    PrintBeginTableRow();
+    PrintTableElement("");
+    PrintTableElement("Disable Env Var", ALIGN_RIGHT);
+    PrintTableElement(disable_env_variable);
+    if (disable_var_set) {
+        PrintTableElement(disable_env_value);
+    } else {
+        PrintTableElement("Not Defined");
+    }
+    PrintEndTableRow();
+}
+
+// Perform Vulkan commands to find out what extensions are available
+// to a Vulkan Instance, and attempt to create one.
+void PrintInstanceInfo(void) {
+    VkApplicationInfo app_info;
+    VkInstanceCreateInfo inst_info;
+    uint32_t ext_count;
+    std::vector<VkExtensionProperties> ext_props;
+    VkResult status;
+    char generic_string[MAX_STRING_LENGTH];
+
+    memset(&app_info, 0, sizeof(VkApplicationInfo));
+    app_info.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
+    app_info.pNext = NULL;
+    app_info.pApplicationName = "via";
+    app_info.applicationVersion = 1;
+    app_info.pEngineName = "via";
+    app_info.engineVersion = 1;
+    app_info.apiVersion = VK_API_VERSION_1_0;
+
+    memset(&inst_info, 0, sizeof(VkInstanceCreateInfo));
+    inst_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
+    inst_info.pNext = NULL;
+    inst_info.pApplicationInfo = &app_info;
+    inst_info.enabledLayerCount = 0;
+    inst_info.ppEnabledLayerNames = NULL;
+    inst_info.enabledExtensionCount = 0;
+    inst_info.ppEnabledExtensionNames = NULL;
+
+    PrintBeginTable("Instance", 3);
+
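+    // Standard Vulkan two-call idiom: query the extension count first,
+    // then size the buffer and fetch the actual properties.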
+    PrintBeginTableRow();
+    PrintTableElement("vkEnumerateInstanceExtensionProperties");
+    status = vkEnumerateInstanceExtensionProperties(NULL, &ext_count, NULL);
+    if (status) {
+        snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                 "ERROR: Failed to determine num inst extensions - %d", status);
+        PrintTableElement(generic_string);
+        PrintTableElement("");
+        PrintEndTableRow();
+    } else {
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d extensions found",
+                 ext_count);
+        PrintTableElement(generic_string);
+        PrintTableElement("");
+        PrintEndTableRow();
+
+        ext_props.resize(ext_count);
+        status = vkEnumerateInstanceExtensionProperties(NULL, &ext_count,
+                                                        ext_props.data());
+        if (status) {
+            PrintBeginTableRow();
+            PrintTableElement("");
+            snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                     "ERROR: Failed to enumerate inst extensions - %d", status);
+            PrintTableElement(generic_string);
+            PrintTableElement("");
+            PrintEndTableRow();
+        } else {
+            for (uint32_t iii = 0; iii < ext_count; iii++) {
+                PrintBeginTableRow();
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]", iii);
+                PrintTableElement(generic_string, ALIGN_RIGHT);
+                PrintTableElement(ext_props[iii].extensionName);
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "Spec Vers %d",
+                         ext_props[iii].specVersion);
+                PrintTableElement(generic_string);
+                PrintEndTableRow();
+            }
+        }
+    }
+
+    PrintBeginTableRow();
+    PrintTableElement("vkCreateInstance");
+    status = vkCreateInstance(&inst_info, NULL, &global_items.instance);
+    if (status == VK_ERROR_INCOMPATIBLE_DRIVER) {
+        PrintTableElement("ERROR: Incompatible Driver");
+    } else if (status == VK_ERROR_OUT_OF_HOST_MEMORY) {
+        PrintTableElement("ERROR: Out of memory");
+    } else if (status) {
+        snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                 "ERROR: Failed to create - %d", status);
+        PrintTableElement(generic_string);
+    } else {
+        PrintTableElement("SUCCESSFUL");
+    }
+    PrintTableElement("");
+    PrintEndTableRow();
+    PrintEndTable();
+    if (VK_SUCCESS != status) {
+        throw(-1);
+    }
+}
+
+// Print out any information we can find out about physical devices using
+// the Vulkan commands.  There should be one for each Vulkan capable device
+// on the system.
+void PrintPhysDevInfo(void) {
+    VkPhysicalDeviceProperties props;
+    std::vector<VkPhysicalDevice> phys_devices;
+    VkResult status;
+    char generic_string[MAX_STRING_LENGTH];
+    uint32_t gpu_count = 0;
+    uint32_t iii;
+    uint32_t jjj;
+    bool failed = false;
+
+    PrintBeginTable("Physical Devices", 4);
+
+    PrintBeginTableRow();
+    PrintTableElement("vkEnumeratePhysicalDevices");
+    status =
+        vkEnumeratePhysicalDevices(global_items.instance, &gpu_count, NULL);
+    if (status) {
+        snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                 "ERROR: Failed to query - %d", status);
+        PrintTableElement(generic_string);
+        failed = true;
+    } else {
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d", gpu_count);
+        PrintTableElement(generic_string);
+    }
+    PrintTableElement("");
+    PrintTableElement("");
+    PrintEndTableRow();
+
+    // If we failed here, the rest will have issues, so, unlike everywhere
+    // else, we bail out immediately.
+    if (failed) {
+        throw VULKAN_CANT_FIND_DRIVER;
+    }
+
+    phys_devices.resize(gpu_count);
+    global_items.phys_devices.resize(gpu_count);
+    status = vkEnumeratePhysicalDevices(global_items.instance, &gpu_count,
+                                        phys_devices.data());
+    if (VK_SUCCESS != status) {
+        PrintBeginTableRow();
+        PrintTableElement("");
+        PrintTableElement("Failed to enumerate physical devices!");
+        PrintTableElement("");
+        PrintEndTableRow();
+        failed = true;
+    }
+    for (iii = 0; iii < gpu_count; iii++) {
+        global_items.phys_devices[iii].vulkan_phys_dev = phys_devices[iii];
+
+        PrintBeginTableRow();
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]", iii);
+        PrintTableElement(generic_string, ALIGN_RIGHT);
+        if (status) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                     "ERROR: Failed to query - %d", status);
+            PrintTableElement(generic_string);
+            PrintTableElement("");
+            PrintTableElement("");
+            PrintEndTableRow();
+        } else {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%p",
+                     (void *)phys_devices[iii]);
+            PrintTableElement(generic_string);
+            PrintTableElement("");
+            PrintTableElement("");
+            PrintEndTableRow();
+
+            vkGetPhysicalDeviceProperties(phys_devices[iii], &props);
+
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Vendor");
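+            // props.vendorID is normally the PCI vendor ID; map the
+            // common ones to a human-readable vendor name.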
+            switch (props.vendorID) {
+            case 0x8086:
+            case 0x8087:
+                snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                         "Intel [0x%04x]", props.vendorID);
+                break;
+            case 0x1002:
+            case 0x1022:
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "AMD [0x%04x]",
+                         props.vendorID);
+                break;
+            case 0x10DE:
+                snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                         "Nvidia [0x%04x]", props.vendorID);
+                break;
+            case 0x1EB5:
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "ARM [0x%04x]",
+                         props.vendorID);
+                break;
+            case 0x5143:
+                snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                         "Qualcomm [0x%04x]", props.vendorID);
+                break;
+            case 0x1099:
+            case 0x10C3:
+            case 0x1249:
+            case 0x4E8:
+                snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                         "Samsung [0x%04x]", props.vendorID);
+                break;
+            default:
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "0x%04x",
+                         props.vendorID);
+                break;
+            }
+            PrintTableElement(generic_string);
+            PrintTableElement("");
+            PrintEndTableRow();
+
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Device Name");
+            PrintTableElement(props.deviceName);
+            PrintTableElement("");
+            PrintEndTableRow();
+
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Device ID");
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "0x%x",
+                     props.deviceID);
+            PrintTableElement(generic_string);
+            PrintTableElement("");
+            PrintEndTableRow();
+
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Device Type");
+            switch (props.deviceType) {
+            case VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU:
+                PrintTableElement("Integrated GPU");
+                break;
+            case VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU:
+                PrintTableElement("Discrete GPU");
+                break;
+            case VK_PHYSICAL_DEVICE_TYPE_VIRTUAL_GPU:
+                PrintTableElement("Virtual GPU");
+                break;
+            case VK_PHYSICAL_DEVICE_TYPE_CPU:
+                PrintTableElement("CPU");
+                break;
+            case VK_PHYSICAL_DEVICE_TYPE_OTHER:
+                PrintTableElement("Other");
+                break;
+            default:
+                PrintTableElement("INVALID!");
+                break;
+            }
+            PrintTableElement("");
+            PrintEndTableRow();
+
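+            // driverVersion is a packed 32-bit value.  The VK_VERSION_*
+            // macros decode it the same way as apiVersion, although the
+            // spec leaves driverVersion's exact encoding to the vendor.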
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Driver Version");
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d.%d.%d",
+                     VK_VERSION_MAJOR(props.driverVersion),
+                     VK_VERSION_MINOR(props.driverVersion),
+                     VK_VERSION_PATCH(props.driverVersion));
+            PrintTableElement(generic_string);
+            PrintTableElement("");
+            PrintEndTableRow();
+
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("API Version");
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d.%d.%d",
+                     VK_VERSION_MAJOR(props.apiVersion),
+                     VK_VERSION_MINOR(props.apiVersion),
+                     VK_VERSION_PATCH(props.apiVersion));
+            PrintTableElement(generic_string);
+            PrintTableElement("");
+            PrintEndTableRow();
+
+            uint32_t queue_fam_count;
+            vkGetPhysicalDeviceQueueFamilyProperties(phys_devices[iii],
+                                                     &queue_fam_count, NULL);
+            if (queue_fam_count > 0) {
+                PrintBeginTableRow();
+                PrintTableElement("");
+                PrintTableElement("Queue Families");
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d",
+                         queue_fam_count);
+                PrintTableElement(generic_string);
+                PrintTableElement("");
+                PrintEndTableRow();
+
+                global_items.phys_devices[iii].queue_fam_props.resize(
+                    queue_fam_count);
+                vkGetPhysicalDeviceQueueFamilyProperties(
+                    phys_devices[iii], &queue_fam_count,
+                    global_items.phys_devices[iii].queue_fam_props.data());
+                for (jjj = 0; jjj < queue_fam_count; jjj++) {
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]",
+                             jjj);
+                    PrintTableElement(generic_string, ALIGN_RIGHT);
+                    PrintTableElement("Queue Count");
+                    snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d",
+                             global_items.phys_devices[iii]
+                                 .queue_fam_props[jjj]
+                                 .queueCount);
+                    PrintTableElement(generic_string);
+                    PrintEndTableRow();
+
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement("");
+                    PrintTableElement("Queue Flags");
+                    generic_string[0] = '\0';
+                    bool prev_set = false;
+                    if (global_items.phys_devices[iii]
+                            .queue_fam_props[jjj]
+                            .queueFlags &
+                        VK_QUEUE_GRAPHICS_BIT) {
+                        strncat(generic_string, "GRAPHICS",
+                                MAX_STRING_LENGTH - 1);
+                        prev_set = true;
+                    }
+                    if (global_items.phys_devices[iii]
+                            .queue_fam_props[jjj]
+                            .queueFlags &
+                        VK_QUEUE_COMPUTE_BIT) {
+                        if (prev_set) {
+                            strncat(generic_string, " | ",
+                                    MAX_STRING_LENGTH - 1);
+                        }
+                        strncat(generic_string, "COMPUTE",
+                                MAX_STRING_LENGTH - 1);
+                        prev_set = true;
+                    }
+                    if (global_items.phys_devices[iii]
+                            .queue_fam_props[jjj]
+                            .queueFlags &
+                        VK_QUEUE_TRANSFER_BIT) {
+                        if (prev_set) {
+                            strncat(generic_string, " | ",
+                                    MAX_STRING_LENGTH - 1);
+                        }
+                        strncat(generic_string, "TRANSFER",
+                                MAX_STRING_LENGTH - 1);
+                        prev_set = true;
+                    }
+                    if (global_items.phys_devices[iii]
+                            .queue_fam_props[jjj]
+                            .queueFlags &
+                        VK_QUEUE_SPARSE_BINDING_BIT) {
+                        if (prev_set) {
+                            strncat(generic_string, " | ",
+                                    MAX_STRING_LENGTH - 1);
+                        }
+                        strncat(generic_string, "SPARSE_BINDING",
+                                MAX_STRING_LENGTH - 1);
+                        prev_set = true;
+                    }
+                    if (!prev_set) {
+                        strncat(generic_string, "--NONE--",
+                                MAX_STRING_LENGTH - 1);
+                    }
+                    PrintTableElement(generic_string);
+                    PrintEndTableRow();
+
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement("");
+                    PrintTableElement("Timestamp Valid Bits");
+                    snprintf(generic_string, MAX_STRING_LENGTH - 1, "0x%x",
+                             global_items.phys_devices[iii]
+                                 .queue_fam_props[jjj]
+                                 .timestampValidBits);
+                    PrintTableElement(generic_string);
+                    PrintEndTableRow();
+
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement("");
+                    PrintTableElement("Image Granularity");
+                    PrintTableElement("");
+                    PrintEndTableRow();
+
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement("");
+                    PrintTableElement("Width", ALIGN_RIGHT);
+                    snprintf(generic_string, MAX_STRING_LENGTH - 1, "0x%x",
+                             global_items.phys_devices[iii]
+                                 .queue_fam_props[jjj]
+                                 .minImageTransferGranularity.width);
+                    PrintTableElement(generic_string);
+                    PrintEndTableRow();
+
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement("");
+                    PrintTableElement("Height", ALIGN_RIGHT);
+                    snprintf(generic_string, MAX_STRING_LENGTH - 1, "0x%x",
+                             global_items.phys_devices[iii]
+                                 .queue_fam_props[jjj]
+                                 .minImageTransferGranularity.height);
+                    PrintTableElement(generic_string);
+                    PrintEndTableRow();
+
+                    PrintBeginTableRow();
+                    PrintTableElement("");
+                    PrintTableElement("");
+                    PrintTableElement("Depth", ALIGN_RIGHT);
+                    snprintf(generic_string, MAX_STRING_LENGTH - 1, "0x%x",
+                             global_items.phys_devices[iii]
+                                 .queue_fam_props[jjj]
+                                 .minImageTransferGranularity.depth);
+                    PrintTableElement(generic_string);
+                    PrintEndTableRow();
+                }
+            } else {
+                PrintBeginTableRow();
+                PrintTableElement("");
+                PrintTableElement("vkGetPhysicalDeviceQueueFamilyProperties");
+                PrintTableElement("FAILED: Returned 0!");
+                PrintTableElement("");
+                PrintEndTableRow();
+            }
+
+            VkPhysicalDeviceMemoryProperties memory_props;
+            vkGetPhysicalDeviceMemoryProperties(phys_devices[iii],
+                                                &memory_props);
+
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Memory Heaps");
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d",
+                     memory_props.memoryHeapCount);
+            PrintTableElement(generic_string);
+            PrintTableElement("");
+            PrintEndTableRow();
+
+            for (jjj = 0; jjj < memory_props.memoryHeapCount; jjj++) {
+                PrintBeginTableRow();
+                PrintTableElement("");
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]", jjj);
+                PrintTableElement(generic_string, ALIGN_RIGHT);
+                PrintTableElement("Property Flags");
+                generic_string[0] = '\0';
+                bool prev_set = false;
+                if (memory_props.memoryHeaps[jjj].flags &
+                    VK_MEMORY_HEAP_DEVICE_LOCAL_BIT) {
+                    strncat(generic_string, "DEVICE_LOCAL",
+                            MAX_STRING_LENGTH - 1);
+                    prev_set = true;
+                }
+                if (!prev_set) {
+                    strncat(generic_string, "--NONE--", MAX_STRING_LENGTH - 1);
+                }
+                PrintTableElement(generic_string);
+                PrintEndTableRow();
+
+                PrintBeginTableRow();
+                PrintTableElement("");
+                PrintTableElement("");
+                PrintTableElement("Heap Size");
+                snprintf(
+                    generic_string, MAX_STRING_LENGTH - 1, "%" PRIu64 "",
+                    static_cast<uint64_t>(memory_props.memoryHeaps[jjj].size));
+                PrintTableElement(generic_string);
+                PrintEndTableRow();
+            }
+
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Memory Types");
+            snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d",
+                     memory_props.memoryTypeCount);
+            PrintTableElement(generic_string);
+            PrintTableElement("");
+            PrintEndTableRow();
+
+            for (jjj = 0; jjj < memory_props.memoryTypeCount; jjj++) {
+                PrintBeginTableRow();
+                PrintTableElement("");
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]", jjj);
+                PrintTableElement(generic_string, ALIGN_RIGHT);
+                PrintTableElement("Property Flags");
+                generic_string[0] = '\0';
+                bool prev_set = false;
+                if (memory_props.memoryTypes[jjj].propertyFlags &
+                    VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT) {
+                    strncat(generic_string, "DEVICE_LOCAL",
+                            MAX_STRING_LENGTH - 1);
+                    prev_set = true;
+                }
+                if (memory_props.memoryTypes[jjj].propertyFlags &
+                    VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) {
+                    if (prev_set) {
+                        strncat(generic_string, " | ", MAX_STRING_LENGTH - 1);
+                    }
+                    strncat(generic_string, "HOST_VISIBLE",
+                            MAX_STRING_LENGTH - 1);
+                    prev_set = true;
+                }
+                if (memory_props.memoryTypes[jjj].propertyFlags &
+                    VK_MEMORY_PROPERTY_HOST_COHERENT_BIT) {
+                    if (prev_set) {
+                        strncat(generic_string, " | ", MAX_STRING_LENGTH - 1);
+                    }
+                    strncat(generic_string, "HOST_COHERENT",
+                            MAX_STRING_LENGTH - 1);
+                    prev_set = true;
+                }
+                if (memory_props.memoryTypes[jjj].propertyFlags &
+                    VK_MEMORY_PROPERTY_HOST_CACHED_BIT) {
+                    if (prev_set) {
+                        strncat(generic_string, " | ", MAX_STRING_LENGTH - 1);
+                    }
+                    strncat(generic_string, "HOST_CACHED",
+                            MAX_STRING_LENGTH - 1);
+                    prev_set = true;
+                }
+                if (memory_props.memoryTypes[jjj].propertyFlags &
+                    VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT) {
+                    if (prev_set) {
+                        strncat(generic_string, " | ", MAX_STRING_LENGTH - 1);
+                    }
+                    strncat(generic_string, "LAZILY_ALLOC",
+                            MAX_STRING_LENGTH - 1);
+                    prev_set = true;
+                }
+                if (!prev_set) {
+                    strncat(generic_string, "--NONE--", MAX_STRING_LENGTH - 1);
+                }
+                PrintTableElement(generic_string);
+                PrintEndTableRow();
+
+                PrintBeginTableRow();
+                PrintTableElement("");
+                PrintTableElement("");
+                PrintTableElement("Heap Index");
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d",
+                         memory_props.memoryTypes[jjj].heapIndex);
+                PrintTableElement(generic_string);
+                PrintEndTableRow();
+            }
+
+            uint32_t num_ext_props;
+            std::vector<VkExtensionProperties> ext_props;
+
+            PrintBeginTableRow();
+            PrintTableElement("");
+            PrintTableElement("Device Extensions");
+            status = vkEnumerateDeviceExtensionProperties(
+                phys_devices[iii], NULL, &num_ext_props, NULL);
+            if (VK_SUCCESS != status) {
+                PrintTableElement("FAILED querying number of extensions");
+                PrintTableElement("");
+                PrintEndTableRow();
+
+                failed = true;
+            } else {
+                snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d",
+                         num_ext_props);
+                PrintTableElement(generic_string);
+                ext_props.resize(num_ext_props);
+                status = vkEnumerateDeviceExtensionProperties(
+                    phys_devices[iii], NULL, &num_ext_props, ext_props.data());
+                if (VK_SUCCESS != status) {
+                    PrintTableElement("FAILED querying actual extension info");
+                    PrintEndTableRow();
+
+                    failed = true;
+                } else {
+                    PrintTableElement("");
+                    PrintEndTableRow();
+
+                    for (jjj = 0; jjj < num_ext_props; jjj++) {
+                        PrintBeginTableRow();
+                        PrintTableElement("");
+                        snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]",
+                                 jjj);
+                        PrintTableElement(generic_string, ALIGN_RIGHT);
+                        PrintTableElement(ext_props[jjj].extensionName);
+                        snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                                 "Spec Vers %d", ext_props[jjj].specVersion);
+                        PrintTableElement(generic_string);
+                        PrintEndTableRow();
+                    }
+                }
+            }
+        }
+    }
+
+    PrintEndTable();
+
+    if (failed) {
+        throw VULKAN_CANT_FIND_DRIVER;
+    }
+}
+
+// Using the previously determined information, attempt to create a logical
+// device for each physical device we found.
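+// NOTE: Only a single queue from the first graphics-capable queue family is
+// requested per device; that is enough to verify that vkCreateDevice
+// succeeds end-to-end through the loader.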
+void PrintLogicalDeviceInfo(void) {
+    VkDeviceCreateInfo device_create_info;
+    VkDeviceQueueCreateInfo queue_create_info;
+    VkResult status = VK_SUCCESS;
+    uint32_t dev_count =
+        static_cast<uint32_t>(global_items.phys_devices.size());
+    char generic_string[MAX_STRING_LENGTH];
+    bool failed = false;
+
+    PrintBeginTable("Logical Devices", 3);
+
+    PrintBeginTableRow();
+    PrintTableElement("vkCreateDevice");
+    snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d", dev_count);
+    PrintTableElement(generic_string);
+    PrintTableElement("");
+    PrintEndTableRow();
+
+    global_items.log_devices.resize(dev_count);
+    for (uint32_t dev = 0; dev < dev_count; dev++) {
+        memset(&device_create_info, 0, sizeof(VkDeviceCreateInfo));
+        device_create_info.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
+        device_create_info.pNext = NULL;
+        device_create_info.queueCreateInfoCount = 1;
+        device_create_info.pQueueCreateInfos = NULL;
+        device_create_info.enabledLayerCount = 0;
+        device_create_info.ppEnabledLayerNames = NULL;
+        device_create_info.enabledExtensionCount = 0;
+        device_create_info.ppEnabledExtensionNames = NULL;
+
+        memset(&queue_create_info, 0, sizeof(VkDeviceQueueCreateInfo));
+        float queue_priority = 0;
+        queue_create_info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
+        queue_create_info.pNext = NULL;
+        queue_create_info.queueCount = 1;
+        queue_create_info.pQueuePriorities = &queue_priority;
+
+        for (uint32_t queue = 0;
+             queue < global_items.phys_devices[dev].queue_fam_props.size();
+             queue++) {
+            if (0 != (global_items.phys_devices[dev]
+                          .queue_fam_props[queue]
+                          .queueFlags &
+                      VK_QUEUE_GRAPHICS_BIT)) {
+                queue_create_info.queueFamilyIndex = queue;
+                break;
+            }
+        }
+        device_create_info.pQueueCreateInfos = &queue_create_info;
+
+        PrintBeginTableRow();
+        PrintTableElement("");
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]", dev);
+        PrintTableElement(generic_string);
+
+        status = vkCreateDevice(global_items.phys_devices[dev].vulkan_phys_dev,
+                                &device_create_info, NULL,
+                                &global_items.log_devices[dev]);
+        if (VK_ERROR_INCOMPATIBLE_DRIVER == status) {
+            PrintTableElement("FAILED: Incompatible Driver");
+            failed = true;
+        } else if (VK_ERROR_OUT_OF_HOST_MEMORY == status) {
+            PrintTableElement("FAILED: Out of Host Memory");
+            failed = true;
+        } else if (VK_SUCCESS != status) {
+            snprintf(generic_string, MAX_STRING_LENGTH - 1,
+                     "FAILED: VkResult code = 0x%x", status);
+            PrintTableElement(generic_string);
+            failed = true;
+        } else {
+            PrintTableElement("SUCCESSFUL");
+        }
+
+        PrintEndTableRow();
+    }
+
+    PrintEndTable();
+    if (failed) {
+        throw(-1);
+    }
+}
+
+// Clean up all the Vulkan items we previously created and report
+// any problems encountered along the way.
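+// Child objects (the logical devices) must be destroyed before the
+// instance itself, so vkDestroyDevice runs first below.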
+void PrintCleanupInfo(void) {
+    char generic_string[MAX_STRING_LENGTH];
+    uint32_t dev_count =
+        static_cast<uint32_t>(global_items.phys_devices.size());
+
+    PrintBeginTable("Cleanup", 3);
+
+    PrintBeginTableRow();
+    PrintTableElement("vkDestroyDevice");
+    snprintf(generic_string, MAX_STRING_LENGTH - 1, "%d", dev_count);
+    PrintTableElement(generic_string);
+    PrintTableElement("");
+    PrintEndTableRow();
+    for (uint32_t dev = 0; dev < dev_count; dev++) {
+        vkDestroyDevice(global_items.log_devices[dev], NULL);
+        PrintBeginTableRow();
+        PrintTableElement("");
+        snprintf(generic_string, MAX_STRING_LENGTH - 1, "[%d]", dev);
+        PrintTableElement(generic_string, ALIGN_RIGHT);
+        PrintTableElement("SUCCESSFUL");
+        PrintEndTableRow();
+    }
+
+    PrintBeginTableRow();
+    PrintTableElement("vkDestroyInstance");
+    vkDestroyInstance(global_items.instance, NULL);
+    PrintTableElement("SUCCESSFUL");
+    PrintTableElement("");
+    PrintEndTableRow();
+
+    PrintEndTable();
+}
+
+// Run any external tests we can find, and print the results of those
+// tests.
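+// Currently the only external test is the SDK's cube demo, run once for
+// 100 frames ("--c 100") and once more with validation enabled
+// ("--validate").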
+void PrintTestResults(void) {
+    bool failed = false;
+
+    BeginSection("External Tests");
+    if (global_items.sdk_found) {
+        std::string cube_exe;
+        std::string full_cmd;
+        std::string path = global_items.sdk_path;
+
+#ifdef _WIN32
+        cube_exe = "cube.exe";
+
+#if _WIN64
+        path += "\\Bin";
+#else
+        path += "\\Bin32";
+#endif
+#else // Linux
+        cube_exe = "./cube";
+        path += "/../examples/build";
+#endif
+        full_cmd = cube_exe;
+        full_cmd += " --c 100";
+
+        PrintBeginTable("Cube", 2);
+
+        PrintBeginTableRow();
+        PrintTableElement(full_cmd);
+        int test_result = RunTestInDirectory(path, cube_exe, full_cmd);
+        if (test_result == 0) {
+            PrintTableElement("SUCCESSFUL");
+        } else if (test_result == 1) {
+            PrintTableElement("Not Found");
+        } else {
+            PrintTableElement("FAILED!");
+            failed = true;
+        }
+        PrintEndTableRow();
+
+        full_cmd += " --validate";
+
+        PrintBeginTableRow();
+        PrintTableElement(full_cmd);
+        test_result = RunTestInDirectory(path, cube_exe, full_cmd);
+        if (test_result == 0) {
+            PrintTableElement("SUCCESSFUL");
+        } else if (test_result == 1) {
+            PrintTableElement("Not Found");
+        } else {
+            PrintTableElement("FAILED!");
+            failed = true;
+        }
+        PrintEndTableRow();
+
+        PrintEndTable();
+    } else {
+        PrintStandardText("No SDK found by VIA, skipping test section");
+    }
+    EndSection();
+
+    if (failed) {
+        throw(-1);
+    }
+}
+
+// Print information on any Vulkan commands we can (or can't) execute.
+void PrintVulkanInfo(void) {
+    BeginSection("Vulkan API Calls");
+
+    PrintInstanceInfo();
+    PrintPhysDevInfo();
+    PrintLogicalDeviceInfo();
+    PrintCleanupInfo();
+
+    EndSection();
+}
diff --git a/vk-vtgenerate.py b/vk-vtgenerate.py
new file mode 100755
index 0000000..89d9e3c
--- /dev/null
+++ b/vk-vtgenerate.py
@@ -0,0 +1,223 @@
+#!/usr/bin/env python3
+#
+# Copyright (C) 2015-2016 Valve Corporation
+# Copyright (c) 2015-2016 LunarG, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Author: Chia-I Wu <olv@lunarg.com>
+# Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+# Author: Jon Ashburn <jon@lunarg.com>
+
+import sys
+import vulkan
+
+def generate_get_proc_addr_check(name):
+    return "    if (!%s || %s[0] != 'v' || %s[1] != 'k')\n" \
+           "        return NULL;" % ((name,) * 3)
+
+class Subcommand(object):
+    def __init__(self, argv):
+        self.argv = argv
+        self.headers = vulkan.headers
+        self.protos = vulkan.protos
+        self.extensions = vulkan.extensions
+
+    def run(self):
+        print(self.generate())
+
+    def generate(self):
+        copyright = self.generate_copyright()
+        header = self.generate_header()
+        body = self.generate_body()
+        footer = self.generate_footer()
+
+        contents = []
+        if copyright:
+            contents.append(copyright)
+        if header:
+            contents.append(header)
+        if body:
+            contents.append(body)
+        if footer:
+            contents.append(footer)
+
+        return "\n\n".join(contents)
+
+    def generate_copyright(self):
+        return """/* THIS FILE IS GENERATED BY vk-vtgenerate.py.  DO NOT EDIT. */
+
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (c) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ */"""
+
+    def generate_header(self):
+        return "\n".join(["#include <" + h + ">" for h in self.headers])
+
+    def generate_body(self):
+        pass
+
+    def generate_footer(self):
+        pass
+
+class IcdDummyEntrypointsSubcommand(Subcommand):
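+    # Emits a no-op stub for every known entrypoint so a skeleton ICD links
+    # cleanly; see the exclusions list below for the WSI surface/display
+    # calls that are deliberately skipped.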
+    def run(self):
+        if len(self.argv) == 1:
+            self.prefix = self.argv[0]
+            self.qual = "static"
+        else:
+            self.prefix = "vk"
+            self.qual = ""
+
+        super().run()
+
+    def generate_header(self):
+        return "#include \"icd.h\""
+
+    def _generate_stub_decl(self, proto):
+        if proto.name == "GetInstanceProcAddr":
+            return proto.c_pretty_decl(self.prefix + "_icd" + proto.name, attr="ICD_EXPORT VKAPI")
+        else:
+            return proto.c_pretty_decl(self.prefix + proto.name, attr="VKAPI")
+
+    # WSI surface/display entrypoints that the dummy ICD skips; also used
+    # by IcdGetProcAddrSubcommand below.
+    exclusions = [ 'CreateAndroidSurfaceKHR', 'CreateWaylandSurfaceKHR', 'CreateMirSurfaceKHR',
+                   'GetPhysicalDeviceWaylandPresentationSupportKHR', 'GetPhysicalDeviceMirPresentationSupportKHR',
+                   'GetPhysicalDeviceDisplayPropertiesKHR', 'GetPhysicalDeviceDisplayPlanePropertiesKHR',
+                   'GetDisplayPlaneSupportedDisplaysKHR', 'GetDisplayModePropertiesKHR',
+                   'CreateDisplayModeKHR', 'GetDisplayPlaneCapabilitiesKHR', 'CreateDisplayPlaneSurfaceKHR',
+                   'CreateSharedSwapchainsKHR']
+
+    def _generate_stubs(self):
+        stubs = []
+        for proto in self.protos:
+            if proto.name in self.exclusions:
+                continue
+            decl = self._generate_stub_decl(proto)
+            if proto.ret != "void":
+                stmt = "    return VK_ERROR_UNKNOWN;\n"
+            else:
+                stmt = ""
+
+            stubs.append("%s %s\n{\n%s}" % (self.qual, decl, stmt))
+
+        return "\n\n".join(stubs)
+
+    def generate_body(self):
+        return self._generate_stubs()
+
+class IcdGetProcAddrSubcommand(IcdDummyEntrypointsSubcommand):
+    def generate_header(self):
+        return "\n".join(["#include <string.h>", "#include \"icd.h\""])
+
+    def generate_body(self):
+        for proto in self.protos:
+            if proto.name == "GetDeviceProcAddr":
+                gpa_proto = proto
+            if proto.name == "GetInstanceProcAddr":
+                gpa_instance_proto = proto
+
+        gpa_instance_decl = self._generate_stub_decl(gpa_instance_proto)
+        gpa_decl = self._generate_stub_decl(gpa_proto)
+        gpa_pname = gpa_proto.params[-1].name
+
+        lookups = []
+
+        for ext in self.extensions:
+            if ext.ifdef:
+                lookups.append("#ifdef %s" % ext.ifdef)
+            for proto in ext.protos:
+                if proto.name in self.exclusions:
+                    continue
+                lookups.append("if (!strcmp(%s, \"%s\"))" %
+                        (gpa_pname, proto.name))
+                if proto.name != "GetInstanceProcAddr":
+                    lookups.append("    return (%s) %s%s;" %
+                        (gpa_proto.ret, self.prefix, proto.name))
+                else:
+                    lookups.append("    return (%s) %s%s;" %
+                        (gpa_proto.ret, self.prefix, "_icdGetInstanceProcAddr"))
+            if ext.ifdef:
+                lookups.append("#endif /* %s */" % ext.ifdef)
+
+        body = []
+        body.append("%s %s" % (self.qual, gpa_instance_decl))
+        body.append("{")
+        body.append(generate_get_proc_addr_check(gpa_pname))
+        body.append("")
+        body.append("    %s += 2;" % gpa_pname)
+        body.append("    %s" % "\n    ".join(lookups))
+        body.append("")
+        body.append("    return NULL;")
+        body.append("}")
+        body.append("")
+
+        body.append("%s %s" % (self.qual, gpa_decl))
+        body.append("{")
+        body.append(generate_get_proc_addr_check(gpa_pname))
+        body.append("")
+        body.append("    %s += 2;" % gpa_pname)
+        body.append("    %s" % "\n    ".join(lookups))
+        body.append("")
+        body.append("    return NULL;")
+        body.append("}")
+
+        return "\n".join(body)
+
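+# Example invocation (assuming the sibling 'vulkan.py' module is importable;
+# the output file name here is only illustrative):
+#     python3 vk-vtgenerate.py Xcb icd-dummy-entrypoints > icd_stubs.c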
+def main():
+
+    wsi = {
+            "Win32",
+            "Android",
+            "Xcb",
+            "Xlib",
+            "Wayland",
+            "Mir"
+    }
+
+    subcommands = {
+            "icd-dummy-entrypoints": IcdDummyEntrypointsSubcommand,
+            "icd-get-proc-addr": IcdGetProcAddrSubcommand
+    }
+
+    if len(sys.argv) < 3 or sys.argv[1] not in wsi or sys.argv[2] not in subcommands:
+        print("Usage: %s <wsi> <subcommand> [options]" % sys.argv[0])
+        print()
+        print("Available subcommands are: %s" % " ".join(subcommands))
+        exit(1)
+
+    subcmd = subcommands[sys.argv[2]](sys.argv[3:])
+    subcmd.run()
+
+if __name__ == "__main__":
+    main()
diff --git a/vk-vtlayer-generate.py b/vk-vtlayer-generate.py
new file mode 100755
index 0000000..5e3873a
--- /dev/null
+++ b/vk-vtlayer-generate.py
@@ -0,0 +1,1367 @@
+#!/usr/bin/env python3
+#
+# VK
+#
+# Copyright (c) 2015-2016 Valve Corporation
+# Copyright (c) 2015-2016 LunarG, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Author: Tobin Ehlis <tobin@lunarg.com>
+# Author: Courtney Goeltzenleuchter <courtney@lunarg.com>
+# Author: Jon Ashburn <jon@lunarg.com>
+# Author: Mark Lobodzinski <mark@lunarg.com>
+# Author: Mike Stroyan <stroyan@lunarg.com>
+# Author: Tony Barbour <tony@LunarG.com>
+# Author: Chia-I Wu <olv@lunarg.com>
+
+import sys
+import os
+import re
+
+import vulkan
+import vk_helper_api_dump
+from source_line_info import sourcelineinfo
+from collections import defaultdict
+
+def proto_is_global(proto):
+    global_function_names = [
+        "CreateInstance",
+        "EnumerateInstanceLayerProperties",
+        "EnumerateInstanceExtensionProperties",
+        "EnumerateDeviceLayerProperties",
+        "EnumerateDeviceExtensionProperties",
+        "CreateXcbSurfaceKHR",
+        "GetPhysicalDeviceXcbPresentationSupportKHR",
+        "CreateXlibSurfaceKHR",
+        "GetPhysicalDeviceXlibPresentationSupportKHR",
+        "CreateWaylandSurfaceKHR",
+        "GetPhysicalDeviceWaylandPresentationSupportKHR",
+        "CreateMirSurfaceKHR",
+        "GetPhysicalDeviceMirPresentationSupportKHR",
+        "CreateAndroidSurfaceKHR",
+        "CreateWin32SurfaceKHR",
+        "GetPhysicalDeviceWin32PresentationSupportKHR"
+    ]
+    return (proto.params[0].ty in ("VkInstance", "VkPhysicalDevice") or
+            proto.name in global_function_names)
+
+def wsi_name(ext_name):
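+    # e.g. wsi_name('CreateXcbSurfaceKHR') returns 'XCB', which feeds the
+    # VK_USE_PLATFORM_*_KHR guards built by wsi_ifdef()/wsi_endif() below.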
+    wsi_prefix = ""
+    if 'Xcb' in ext_name:
+        wsi_prefix = 'XCB'
+    elif 'Xlib' in ext_name:
+        wsi_prefix = 'XLIB'
+    elif 'Win32' in ext_name:
+        wsi_prefix = 'WIN32'
+    elif 'Mir' in ext_name:
+        wsi_prefix = 'MIR'
+    elif 'Wayland' in ext_name:
+        wsi_prefix = 'WAYLAND'
+    elif 'Android' in ext_name:
+        wsi_prefix = 'ANDROID'
+    else:
+        wsi_prefix = ''
+    return wsi_prefix
+
+def wsi_ifdef(ext_name):
+    wsi_prefix = wsi_name(ext_name)
+    if not wsi_prefix:
+        return ''
+    else:
+        return "#ifdef VK_USE_PLATFORM_%s_KHR" % wsi_prefix
+
+def wsi_endif(ext_name):
+    wsi_prefix = wsi_name(ext_name)
+    if not wsi_prefix:
+        return ''
+    else:
+        return "#endif  // VK_USE_PLATFORM_%s_KHR" % wsi_prefix
+
+def generate_get_proc_addr_check(name):
+    return "    if (!%s || %s[0] != 'v' || %s[1] != 'k')\n" \
+           "        return NULL;" % ((name,) * 3)
+
+# Walk the complete struct chain and collect any new object uses (including
+# those in nested structs) into the returned dict
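+# For illustration (member names hypothetical): a struct with a
+# 'VkBuffer buffer' member and a nested struct member might yield
+#     {'buffer': 'VkBuffer', 'subInfo': {'image': 'VkImage'}}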
+def gather_object_uses_in_struct(obj_list, struct_type):
+    struct_uses = {}
+    if vk_helper_api_dump.typedef_rev_dict[struct_type] in vk_helper_api_dump.struct_dict:
+        struct_type = vk_helper_api_dump.typedef_rev_dict[struct_type]
+        # Parse elements of this struct param to identify objects and/or arrays of objects
+        for m in sorted(vk_helper_api_dump.struct_dict[struct_type]):
+            array_len = "%s" % (str(vk_helper_api_dump.struct_dict[struct_type][m]['array_size']))
+            base_type = vk_helper_api_dump.struct_dict[struct_type][m]['type']
+            mem_name = vk_helper_api_dump.struct_dict[struct_type][m]['name']
+            if array_len != '0':
+                mem_name = "%s[%s]" % (mem_name, array_len)
+            if base_type in obj_list:
+                struct_uses[mem_name] = base_type
+            elif vk_helper_api_dump.is_type(base_type, 'struct'):
+                sub_uses = gather_object_uses_in_struct(obj_list, base_type)
+                if len(sub_uses) > 0:
+                    struct_uses[mem_name] = sub_uses
+    return struct_uses
+
+class Subcommand(object):
+    def __init__(self, argv):
+        self.argv = argv
+        self.headers = vulkan.headers
+        self.protos = vulkan.protos
+        self.no_addr = False
+        self.layer_name = ""
+        self.lineinfo = sourcelineinfo()
+        self.wsi = sys.argv[1]
+
+    def run(self):
+        print(self.generate())
+
+    def generate(self):
+        copyright = self.generate_copyright()
+        header = self.generate_header()
+        body = self.generate_body()
+        footer = self.generate_footer()
+
+        contents = []
+        if copyright:
+            contents.append(copyright)
+        if header:
+            contents.append(header)
+        if body:
+            contents.append(body)
+        if footer:
+            contents.append(footer)
+
+        return "\n\n".join(contents)
+
+    def generate_copyright(self):
+        return """/* THIS FILE IS GENERATED.  DO NOT EDIT. */
+
+/*
+ * Copyright (c) 2015-2016 Valve Corporation
+ * Copyright (c) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Tobin Ehlis <tobin@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@lunarg.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ * Author: Mike Stroyan <mike@lunarg.com>
+ * Author: Tony Barbour <tony@LunarG.com>
+ */"""
+
+    def generate_header(self):
+        return "\n".join(["#include <" + h + ">" for h in self.headers])
+
+    def generate_body(self):
+        pass
+
+    def generate_footer(self):
+        pass
+
+    # Return the printf format specifier, the expression to print for this
+    # parameter, and any dereference prefix it needs.
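+    # e.g. a VkBool32 parameter named 'enabled' yields ("%u", "enabled", ''),
+    # while a non-dispatchable handle yields ("0x%p", "HandleCast(name)", '').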
+    def _get_printf_params(self, vk_type, name, output_param, count):
+        # TODO : Need ENUM and STRUCT checks here
+        if vk_helper_api_dump.is_type(vk_type, 'enum'):
+            return ("%s", "string_%s(%s)" % (vk_type.replace('const ', '').strip('*'), name), '')
+        if "char*" == vk_type:
+            return ("%s", name, '')
+        if "uint64" in vk_type:
+            if '*' in vk_type:
+                return ("0x%p", "*%s" % name, '*')
+            return ("0x%p", name, '')
+        if vk_type.strip('*').replace('const ', '') in vulkan.object_non_dispatch_list:
+            #TODO : cast these types differently for 32 bit windows builds
+            if '*' in vk_type:
+                return ("0x%p", "HandleCast(*%s)" % name, '*')
+            return ("0x%p", "HandleCast(%s)" % name, '')
+        if vk_type.strip('*').replace('const ', '') in vulkan.object_dispatch_list:
+            if '*' in vk_type:
+                return ("0x%p", "HandleCast(*%s)" % name, '*')
+            return ("0x%p", "HandleCast(%s)" % name, '')
+        if "size" in vk_type.lower() or "mask" in vk_type.lower():
+            if '*' in vk_type:
+                return ("0x%p", "*%s" % name, '*')
+            return ("0x%p", "%s" % name, '')
+        if "float" in vk_type:
+            if '[' in vk_type: # handle array, currently hard-coded to 4 (TODO: Make this dynamic)
+                return ("[%i, %i, %i, %i]", '"[" << %s[0] << "," << %s[1] << "," << %s[2] << "," << %s[3] << "]"' % (name, name, name, name), '')
+            return ("%f", name, '')
+        if "bool" in vk_type.lower() or 'xcb_randr_crtc_t' in vk_type:
+            return ("%u", name, '')
+        if True in [t in vk_type.lower() for t in ["int", "flags", "mask", "xcb_window_t"]]:
+            if '[' in vk_type: # handle array, currently hard-coded to 4 (TODO: Make this dynamic)
+                return ("[%i, %i, %i, %i]", "%s[0] << %s[1] << %s[2] << %s[3]" % (name, name, name, name), '')
+            if '*' in vk_type:
+                return ("0x%p", "*%s" % name, '*')
+            return ("0x%p", name, '')
+        # TODO : This is special-cased as there's only one "format" param currently and it's nice to expand it
+        if "VkFormat" == vk_type:
+            return ("0x%p", "HandleCast(&%s)" % name, '&')
+        if output_param:
+            if 1 == vk_type.count('*'):
+                return ("0x%p", "*%s" % name, '*')
+            else:
+                return ("0x%p", "HandleCast(*%s)" % name, '*')
+        return ("0x%p", "HandleCast(%s)" % name, '')
+
+    def _gen_create_msg_callback(self):
+        r_body = []
+        r_body.append('%s' % self.lineinfo.get())
+        r_body.append('VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateDebugReportCallbackEXT(')
+        r_body.append('        VkInstance                                   instance,')
+        r_body.append('        const VkDebugReportCallbackCreateInfoEXT*    pCreateInfo,')
+        r_body.append('        const VkAllocationCallbacks*                 pAllocator,')
+        r_body.append('        VkDebugReportCallbackEXT*                    pCallback)')
+        r_body.append('{')
+        # Switch to this code section for the new per-instance storage and debug callbacks
+        if self.layer_name in ['object_tracker', 'threading', 'unique_objects']:
+            r_body.append('    VkLayerInstanceDispatchTable *pInstanceTable = get_dispatch_table(%s_instance_table_map, instance);' % self.layer_name )
+            r_body.append('    VkResult result = pInstanceTable->CreateDebugReportCallbackEXT(instance, pCreateInfo, pAllocator, pCallback);')
+            r_body.append('    if (VK_SUCCESS == result) {')
+            r_body.append('        layer_data *my_data = get_my_data_ptr(get_dispatch_key(instance), layer_data_map);')
+            r_body.append('        result = layer_create_msg_callback(my_data->report_data,')
+            r_body.append('                                           false,')
+            r_body.append('                                           pCreateInfo,')
+            r_body.append('                                           pAllocator,')
+            r_body.append('                                           pCallback);')
+            r_body.append('    }')
+            r_body.append('    return result;')
+        else:
+            r_body.append('    VkResult result = instance_dispatch_table(instance)->CreateDebugReportCallbackEXT(instance, pCreateInfo, pAllocator, pCallback);')
+            r_body.append('    if (VK_SUCCESS == result) {')
+            r_body.append('        layer_data *my_data = get_my_data_ptr(get_dispatch_key(instance), layer_data_map);')
+            r_body.append('        result = layer_create_msg_callback(my_data->report_data, false, pCreateInfo, pAllocator, pCallback);')
+            r_body.append('    }')
+            r_body.append('    return result;')
+        r_body.append('}')
+        return "\n".join(r_body)
+
+    def _gen_destroy_msg_callback(self):
+        r_body = []
+        r_body.append('%s' % self.lineinfo.get())
+        r_body.append('VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkDestroyDebugReportCallbackEXT(VkInstance instance, VkDebugReportCallbackEXT msgCallback, const VkAllocationCallbacks *pAllocator)')
+        r_body.append('{')
+        # Switch to this code section for the new per-instance storage and debug callbacks
+        if self.layer_name in ['object_tracker', 'threading', 'unique_objects']:
+            r_body.append('    VkLayerInstanceDispatchTable *pInstanceTable = get_dispatch_table(%s_instance_table_map, instance);' % self.layer_name )
+        else:
+            r_body.append('    VkLayerInstanceDispatchTable *pInstanceTable = instance_dispatch_table(instance);')
+        r_body.append('    pInstanceTable->DestroyDebugReportCallbackEXT(instance, msgCallback, pAllocator);')
+        r_body.append('    layer_data *my_data = get_my_data_ptr(get_dispatch_key(instance), layer_data_map);')
+        r_body.append('    layer_destroy_msg_callback(my_data->report_data, msgCallback, pAllocator);')
+        r_body.append('}')
+        return "\n".join(r_body)
+
+    def _gen_debug_report_msg(self):
+        r_body = []
+        r_body.append('%s' % self.lineinfo.get())
+        r_body.append('VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkDebugReportMessageEXT(VkInstance instance, VkDebugReportFlagsEXT    flags, VkDebugReportObjectTypeEXT objType, uint64_t object, size_t location, int32_t msgCode, const char *pLayerPrefix, const char *pMsg)')
+        r_body.append('{')
+        # Switch to this code section for the new per-instance storage and debug callbacks
+        if self.layer_name == 'object_tracker' or self.layer_name == 'threading':
+            r_body.append('    VkLayerInstanceDispatchTable *pInstanceTable = get_dispatch_table(%s_instance_table_map, instance);' % self.layer_name )
+        else:
+            r_body.append('    VkLayerInstanceDispatchTable *pInstanceTable = instance_dispatch_table(instance);')
+        r_body.append('    pInstanceTable->DebugReportMessageEXT(instance, flags, objType, object, location, msgCode, pLayerPrefix, pMsg);')
+        r_body.append('}')
+        return "\n".join(r_body)
+
+    def _gen_layer_get_global_extension_props(self, layer="generic"):
+        ggep_body = []
+        # generated layers do not provide any global extensions
+        ggep_body.append('%s' % self.lineinfo.get())
+
+        ggep_body.append('')
+        if self.layer_name == 'object_tracker' or self.layer_name == 'threading':
+            ggep_body.append('static const VkExtensionProperties instance_extensions[] = {')
+            ggep_body.append('    {')
+            ggep_body.append('        VK_EXT_DEBUG_REPORT_EXTENSION_NAME,')
+            ggep_body.append('        VK_EXT_DEBUG_REPORT_SPEC_VERSION')
+            ggep_body.append('    }')
+            ggep_body.append('};')
+        ggep_body.append('VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateInstanceExtensionProperties(const char *pLayerName, uint32_t *pCount,  VkExtensionProperties* pProperties)')
+        ggep_body.append('{')
+        if self.layer_name == 'object_tracker' or self.layer_name == 'threading':
+            ggep_body.append('    return util_GetExtensionProperties(1, instance_extensions, pCount, pProperties);')
+        else:
+            ggep_body.append('    return util_GetExtensionProperties(0, NULL, pCount, pProperties);')
+        ggep_body.append('}')
+        return "\n".join(ggep_body)
+
+    def _gen_layer_get_global_layer_props(self, layer="generic"):
+        ggep_body = []
+        if layer == 'generic':
+            # Do nothing, extension definition part of generic.h
+            ggep_body.append('%s' % self.lineinfo.get())
+        else:
+            layer_name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', layer)
+            layer_name = re.sub('([a-z0-9])([A-Z])', r'\1_\2', layer_name).lower()
+            ggep_body.append('%s' % self.lineinfo.get())
+            ggep_body.append('static const VkLayerProperties globalLayerProps[] = {')
+            ggep_body.append('    {')
+            ggep_body.append('        "VK_LAYER_LUNARG_%s",' % layer_name)
+            ggep_body.append('        VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION), // specVersion')
+            ggep_body.append('        VK_MAKE_VERSION(0, 1, 0), // implementationVersion')
+            ggep_body.append('        "layer: %s",' % layer)
+            ggep_body.append('    }')
+            ggep_body.append('};')
+        ggep_body.append('')
+        ggep_body.append('%s' % self.lineinfo.get())
+        ggep_body.append('')
+        ggep_body.append('VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateInstanceLayerProperties(uint32_t *pCount,  VkLayerProperties* pProperties)')
+        ggep_body.append('{')
+        ggep_body.append('    return util_GetLayerProperties(ARRAY_SIZE(globalLayerProps), globalLayerProps, pCount, pProperties);')
+        ggep_body.append('}')
+        return "\n".join(ggep_body)
+
+    def _gen_layer_get_physical_device_layer_props(self, layer="generic"):
+        gpdlp_body = []
+        if layer == 'generic':
+            # Do nothing, extension definition part of generic.h
+            gpdlp_body.append('%s' % self.lineinfo.get())
+        else:
+            gpdlp_body.append('%s' % self.lineinfo.get())
+            gpdlp_body.append('static const VkLayerProperties deviceLayerProps[] = {')
+            gpdlp_body.append('    {')
+            gpdlp_body.append('        "VK_LAYER_LUNARG_%s",' % layer)
+            gpdlp_body.append('        VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION),')
+            gpdlp_body.append('        VK_MAKE_VERSION(0, 1, 0),')
+            gpdlp_body.append('        "layer: %s",' % layer)
+            gpdlp_body.append('    }')
+            gpdlp_body.append('};')
+        gpdlp_body.append('VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateDeviceLayerProperties(VkPhysicalDevice physicalDevice, uint32_t *pCount, VkLayerProperties* pProperties)')
+        gpdlp_body.append('{')
+        gpdlp_body.append('    return util_GetLayerProperties(ARRAY_SIZE(deviceLayerProps), deviceLayerProps, pCount, pProperties);')
+        gpdlp_body.append('}')
+        gpdlp_body.append('')
+        return "\n".join(gpdlp_body)
+
+    def _generate_dispatch_entrypoints(self, qual=""):
+        if qual:
+            qual += " "
+
+        funcs = []
+        intercepted = []
+        for proto in self.protos:
+            if proto.name == "GetDeviceProcAddr" or proto.name == "GetInstanceProcAddr":
+                continue
+            else:
+                intercept = self.generate_intercept(proto, qual)
+                if intercept is None:
+                    # fill in default intercept for certain entrypoints
+                    if 'CreateDebugReportCallbackEXT' == proto.name:
+                        intercept = self._gen_layer_dbg_create_msg_callback()
+                    elif 'DestroyDebugReportCallbackEXT' == proto.name:
+                        intercept = self._gen_layer_dbg_destroy_msg_callback()
+                    elif 'DebugReportMessageEXT' == proto.name:
+                        intercept = self._gen_debug_report_msg()
+                    elif 'CreateDevice' == proto.name:
+                        funcs.append('/* CreateDevice HERE */')
+                    elif 'EnumerateInstanceExtensionProperties' == proto.name:
+                        intercept = self._gen_layer_get_global_extension_props(self.layer_name)
+                    elif 'EnumerateInstanceLayerProperties' == proto.name:
+                        intercept = self._gen_layer_get_global_layer_props(self.layer_name)
+                    elif 'EnumerateDeviceLayerProperties' == proto.name:
+                        intercept = self._gen_layer_get_physical_device_layer_props(self.layer_name)
+
+                if intercept is not None:
+                    funcs.append(intercept)
+                    if not "KHR" in proto.name:
+                        intercepted.append(proto)
+
+        prefix="vk"
+        lookups = []
+        for proto in intercepted:
+            lookups.append("if (!strcmp(name, \"%s\"))" % proto.name)
+            lookups.append("    return (PFN_vkVoidFunction) %s%s;" %
+                    (prefix, proto.name))
+
+        # add customized layer_intercept_proc
+        body = []
+        body.append('%s' % self.lineinfo.get())
+        body.append("static inline PFN_vkVoidFunction layer_intercept_proc(const char *name)")
+        body.append("{")
+        body.append(generate_get_proc_addr_check("name"))
+        body.append("")
+        body.append("    name += 2;")
+        body.append("    %s" % "\n    ".join(lookups))
+        body.append("")
+        body.append("    return NULL;")
+        body.append("}")
+        # add layer_intercept_instance_proc
+        lookups = []
+        for proto in self.protos:
+            if not proto_is_global(proto):
+                continue
+
+            if proto not in intercepted:
+                continue
+            if proto.name == "CreateInstance":
+                continue
+            if proto.name == "CreateDevice":
+                continue
+            lookups.append("if (!strcmp(name, \"%s\"))" % proto.name)
+            lookups.append("    return (PFN_vkVoidFunction) %s%s;" % (prefix, proto.name))
+
+        body.append("static inline PFN_vkVoidFunction layer_intercept_instance_proc(const char *name)")
+        body.append("{")
+        body.append(generate_get_proc_addr_check("name"))
+        body.append("")
+        body.append("    name += 2;")
+        body.append("    %s" % "\n    ".join(lookups))
+        body.append("")
+        body.append("    return NULL;")
+        body.append("}")
+
+        funcs.append("\n".join(body))
+        return "\n\n".join(funcs)
+
+    def _generate_extensions(self):
+        exts = []
+        exts.append('%s' % self.lineinfo.get())
+        exts.append(self._gen_create_msg_callback())
+        exts.append(self._gen_destroy_msg_callback())
+        exts.append(self._gen_debug_report_msg())
+        return "\n".join(exts)
+
+    def _generate_layer_gpa_function(self, extensions=(), instance_extensions=()):
+        func_body = []
+        # New style of GPA functions for the new layer_data/layer_logging changes
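+        # The generated vkGet*ProcAddr implementations resolve names in order:
+        # the layer's own hooks, then enabled-extension entrypoints, and
+        # finally the next layer/ICD via the dispatch table.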
+        if self.layer_name in ['object_tracker', 'threading', 'unique_objects']:
+            func_body.append("VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL vkGetDeviceProcAddr(VkDevice device, const char* funcName)\n"
+                             "{\n"
+                             "    PFN_vkVoidFunction addr;\n"
+                             "    if (!strcmp(\"vkGetDeviceProcAddr\", funcName)) {\n"
+                             "        return (PFN_vkVoidFunction) vkGetDeviceProcAddr;\n"
+                             "    }\n\n"
+                             "    addr = layer_intercept_proc(funcName);\n"
+                             "    if (addr)\n"
+                             "        return addr;\n"
+                             "    if (device == VK_NULL_HANDLE) {\n"
+                             "        return NULL;\n"
+                             "    }\n")
+            if 0 != len(extensions):
+                func_body.append('%s' % self.lineinfo.get())
+                func_body.append('    layer_data *my_device_data = get_my_data_ptr(get_dispatch_key(device), layer_data_map);')
+                for (ext_enable, ext_list) in extensions:
+                    extra_space = ""
+                    if 0 != len(ext_enable):
+                        func_body.append('    if (my_device_data->%s) {' % ext_enable)
+                        extra_space = "    "
+                    for ext_name in ext_list:
+                        func_body.append('    %sif (!strcmp("%s", funcName))\n'
+                                         '        %sreturn reinterpret_cast<PFN_vkVoidFunction>(%s);' % (extra_space, ext_name, extra_space, ext_name))
+                    if 0 != len(ext_enable):
+                        func_body.append('    }\n')
+            func_body.append("\n    if (get_dispatch_table(%s_device_table_map, device)->GetDeviceProcAddr == NULL)\n"
+                             "        return NULL;\n"
+                             "    return get_dispatch_table(%s_device_table_map, device)->GetDeviceProcAddr(device, funcName);\n"
+                             "}\n" % (self.layer_name, self.layer_name))
+            func_body.append("VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL vkGetInstanceProcAddr(VkInstance instance, const char* funcName)\n"
+                             "{\n"
+                             "    PFN_vkVoidFunction addr;\n"
+                             "    if (!strcmp(funcName, \"vkGetInstanceProcAddr\"))\n"
+                             "        return (PFN_vkVoidFunction) vkGetInstanceProcAddr;\n"
+                             "    if (!strcmp(funcName, \"vkCreateInstance\"))\n"
+                             "        return (PFN_vkVoidFunction) vkCreateInstance;\n"
+                             "    if (!strcmp(funcName, \"vkCreateDevice\"))\n"
+                             "        return (PFN_vkVoidFunction) vkCreateDevice;\n"
+                             "    addr = layer_intercept_instance_proc(funcName);\n"
+                             "    if (addr) {\n"
+                             "        return addr;"
+                             "    }\n"
+                             "    if (instance == VK_NULL_HANDLE) {\n"
+                             "        return NULL;\n"
+                             "    }\n"
+                             )
+
+            table_declared = False
+            if 0 != len(instance_extensions):
+                for (ext_enable, ext_list) in instance_extensions:
+                    extra_space = ""
+                    if 0 != len(ext_enable):
+                        if ext_enable == 'msg_callback_get_proc_addr':
+                            func_body.append("    layer_data *my_data = get_my_data_ptr(get_dispatch_key(instance), layer_data_map);\n"
+                                     "    addr = debug_report_get_instance_proc_addr(my_data->report_data, funcName);\n"
+                                     "    if (addr) {\n"
+                                     "        return addr;\n"
+                                     "    }\n")
+                        else:
+                            if not table_declared:
+                                func_body.append("    VkLayerInstanceDispatchTable* pTable = get_dispatch_table(%s_instance_table_map, instance);" % self.layer_name)
+                                table_declared = True
+                            func_body.append('    if (instanceExtMap.size() != 0 && instanceExtMap[pTable].%s)' % ext_enable)
+                            func_body.append('    {')
+                            extra_space = "    "
+                            for ext_name in ext_list:
+                                if wsi_name(ext_name):
+                                    func_body.append('%s' % wsi_ifdef(ext_name))
+                                func_body.append('    %sif (!strcmp("%s", funcName))\n'
+                                                 '            return reinterpret_cast<PFN_vkVoidFunction>(%s);' % (extra_space, ext_name, ext_name))
+                                if wsi_name(ext_name):
+                                    func_body.append('%s' % wsi_endif(ext_name))
+                            func_body.append('    }\n')
+
+            func_body.append("    if (get_dispatch_table(%s_instance_table_map, instance)->GetInstanceProcAddr == NULL) {\n"
+                             "        return NULL;\n"
+                             "    }\n"
+                             "    return get_dispatch_table(%s_instance_table_map, instance)->GetInstanceProcAddr(instance, funcName);\n"
+                             "}\n" % (self.layer_name, self.layer_name))
+            return "\n".join(func_body)
+        else:
+            func_body.append('%s' % self.lineinfo.get())
+            func_body.append("VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL vkGetDeviceProcAddr(VkDevice device, const char* funcName)\n"
+                             "{\n"
+                             "    PFN_vkVoidFunction addr;\n")
+            if self.layer_name == 'generic':
+                func_body.append("\n"
+                             "    if (!strcmp(\"vkGetDeviceProcAddr\", funcName)) {\n"
+                             "        return (PFN_vkVoidFunction) vkGetDeviceProcAddr;\n"
+                             "    }\n\n"
+                             "    addr = layer_intercept_proc(funcName);\n"
+                             "    if (addr)\n"
+                             "        return addr;")
+            else:
+                func_body.append("\n"
+                             "    loader_platform_thread_once(&initOnce, init%s);\n\n"
+                             "    if (!strcmp(\"vkGetDeviceProcAddr\", funcName)) {\n"
+                             "        return (PFN_vkVoidFunction) vkGetDeviceProcAddr;\n"
+                             "    }\n\n"
+                             "    addr = layer_intercept_proc(funcName);\n"
+                             "    if (addr)\n"
+                             "        return addr;" % self.layer_name)
+            func_body.append("    if (device == VK_NULL_HANDLE) {\n"
+                             "        return NULL;\n"
+                             "    }\n")
+            func_body.append('')
+            func_body.append('    VkLayerDispatchTable *pDisp =  device_dispatch_table(device);')
+            if 0 != len(extensions):
+                extra_space = ""
+                for (ext_enable, ext_list) in extensions:
+                    if 0 != len(ext_enable):
+                        func_body.append('    if (deviceExtMap.size() != 0 && deviceExtMap[pDisp].%s)' % ext_enable)
+                        func_body.append('    {')
+                        extra_space = "    "
+                    for ext_name in ext_list:
+                        func_body.append('    %sif (!strcmp("%s", funcName))\n'
+                                         '            return reinterpret_cast<PFN_vkVoidFunction>(%s);' % (extra_space, ext_name, ext_name))
+                    if 0 != len(ext_enable):
+                        func_body.append('    }')
+            func_body.append('%s' % self.lineinfo.get())
+            func_body.append("    {\n"
+                             "        if (pDisp->GetDeviceProcAddr == NULL)\n"
+                             "            return NULL;\n"
+                             "        return pDisp->GetDeviceProcAddr(device, funcName);\n"
+                             "    }\n"
+                             "}\n")
+            func_body.append('%s' % self.lineinfo.get())
+            func_body.append("VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL vkGetInstanceProcAddr(VkInstance instance, const char* funcName)\n"
+                             "{\n"
+                             "    PFN_vkVoidFunction addr;\n"
+                             "    if (!strcmp(funcName, \"vkGetInstanceProcAddr\"))\n"
+                             "        return (PFN_vkVoidFunction) vkGetInstanceProcAddr;\n"
+                             "    if (!strcmp(funcName, \"vkCreateInstance\"))\n"
+                             "        return (PFN_vkVoidFunction) vkCreateInstance;\n"
+                             "    if (!strcmp(funcName, \"vkCreateDevice\"))\n"
+                             "        return (PFN_vkVoidFunction) vkCreateDevice;\n"
+                             )
+            if self.layer_name == 'generic':
+                func_body.append("\n"
+                             "    addr = layer_intercept_instance_proc(funcName);\n"
+                             "    if (addr)\n"
+                             "        return addr;")
+            else:
+                func_body.append(
+                             "    loader_platform_thread_once(&initOnce, init%s);\n\n"
+                             "    addr = layer_intercept_instance_proc(funcName);\n"
+                             "    if (addr)\n"
+                             "        return addr;" % self.layer_name)
+            func_body.append("    if (instance == VK_NULL_HANDLE) {\n"
+                             "        return NULL;\n"
+                             "    }\n")
+            func_body.append("")
+            func_body.append("    VkLayerInstanceDispatchTable* pTable = instance_dispatch_table(instance);\n")
+            if 0 != len(instance_extensions):
+                extra_space = ""
+                for (ext_enable, ext_list) in instance_extensions:
+                    if 0 != len(ext_enable):
+                        if ext_enable == 'msg_callback_get_proc_addr':
+                            func_body.append("    layer_data *my_data = get_my_data_ptr(get_dispatch_key(instance), layer_data_map);\n"
+                                     "    addr = debug_report_get_instance_proc_addr(my_data->report_data, funcName);\n"
+                                     "    if (addr) {\n"
+                                     "        return addr;\n"
+                                     "    }\n")
+                        else:
+                            func_body.append('    if (instanceExtMap.size() != 0 && instanceExtMap[pTable].%s)' % ext_enable)
+                            func_body.append('    {')
+                            extra_space = "    "
+                            for ext_name in ext_list:
+                                if wsi_name(ext_name):
+                                    func_body.append('%s' % wsi_ifdef(ext_name))
+                                func_body.append('    %sif (!strcmp("%s", funcName))\n'
+                                         '            return reinterpret_cast<PFN_vkVoidFunction>(%s);' % (extra_space, ext_name, ext_name))
+                                if wsi_name(ext_name):
+                                    func_body.append('%s' % wsi_endif(ext_name))
+                            if 0 != len(ext_enable):
+                                func_body.append('    }\n')
+
+            func_body.append("    if (pTable->GetInstanceProcAddr == NULL)\n"
+                             "        return NULL;\n"
+                             "    return pTable->GetInstanceProcAddr(instance, funcName);\n"
+                             "}\n")
+            return "\n".join(func_body)
+
+    def _generate_layer_initialization(self, init_opts=False, prefix='vk', lockname=None, condname=None):
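+        # A rough sketch (illustrative) of the emitted function for
+        # layer_name 'generic' with init_opts=True and lockname=None:
+        #   static void init_generic(layer_data *my_data, const VkAllocationCallbacks *pAllocator)
+        #   {
+        #       layer_debug_actions(my_data->report_data, my_data->logging_callback, pAllocator, "lunarg_generic");
+        #   }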
+        func_body = ["#include \"vk_dispatch_table_helper.h\""]
+        func_body.append('%s' % self.lineinfo.get())
+        func_body.append('static void init_%s(layer_data *my_data, const VkAllocationCallbacks *pAllocator)\n'
+                         '{\n' % self.layer_name)
+        if init_opts:
+            func_body.append('%s' % self.lineinfo.get())
+            func_body.append('')
+            func_body.append('    layer_debug_actions(my_data->report_data, my_data->logging_callback, pAllocator, "lunarg_%s");' % self.layer_name)
+            func_body.append('')
+        if lockname is not None:
+            func_body.append('%s' % self.lineinfo.get())
+            func_body.append("    if (!%sLockInitialized)" % lockname)
+            func_body.append("    {")
+            func_body.append("        // TODO/TBD: Need to delete this mutex sometime.  How???")
+            func_body.append("        loader_platform_thread_create_mutex(&%sLock);" % lockname)
+            if condname is not None:
+                func_body.append("        loader_platform_thread_init_cond(&%sCond);" % condname)
+            func_body.append("        %sLockInitialized = 1;" % lockname)
+            func_body.append("    }")
+        func_body.append("}\n")
+        func_body.append('')
+        return "\n".join(func_body)
+
+class LayerFuncsSubcommand(Subcommand):
+    def generate_header(self):
+        return '#include <vulkan/vk_layer.h>\n#include "loader.h"'
+
+    def generate_body(self):
+        return self._generate_dispatch_entrypoints("static")
+
+class GenericLayerSubcommand(Subcommand):
+    def generate_header(self):
+        gen_header = []
+        gen_header.append('%s' % self.lineinfo.get())
+        gen_header.append('#include <stdio.h>')
+        gen_header.append('#include <stdlib.h>')
+        gen_header.append('#include <string.h>')
+        gen_header.append('#include <unordered_map>')
+        gen_header.append('#include "vk_loader_platform.h"')
+        gen_header.append('#include "vulkan/vk_layer.h"')
+        gen_header.append('#include "vk_layer_config.h"')
+        gen_header.append('#include "vk_layer_logging.h"')
+        gen_header.append('#include "vk_layer_table.h"')
+        gen_header.append('#include "vk_layer_extension_utils.h"')
+        gen_header.append('#include "vk_layer_utils.h"')
+        gen_header.append('')
+        gen_header.append('#include "generic.h"')
+        gen_header.append('')
+        gen_header.append('%s' % self.lineinfo.get())
+        gen_header.append('#define LAYER_EXT_ARRAY_SIZE 1')
+        gen_header.append('#define LAYER_DEV_EXT_ARRAY_SIZE 1')
+        gen_header.append('//static LOADER_PLATFORM_THREAD_ONCE_DECLARATION(initOnce);')
+        gen_header.append('static std::unordered_map<void *, layer_data *> layer_data_map;\n')
+        gen_header.append('template layer_data *get_my_data_ptr<layer_data>(')
+        gen_header.append('        void *data_key,')
+        gen_header.append('        std::unordered_map<void *, layer_data *> &data_map);\n')
+        gen_header.append('')
+        return "\n".join(gen_header)
+
+    def generate_intercept(self, proto, qual):
+        if proto.name in [ 'EnumerateInstanceLayerProperties', 'EnumerateInstanceExtensionProperties', 'EnumerateDeviceLayerProperties', 'EnumerateDeviceExtensionProperties' ]:
+            # use default version
+            return None
+        decl = proto.c_func(prefix="vk", attr="VKAPI")
+        ret_val = ''
+        stmt = ''
+        funcs = []
+        table = ''
+        if proto_is_global(proto):
+            table = 'Instance'
+        if proto.ret != "void":
+            funcs.append('%s' % self.lineinfo.get())
+            ret_val = "%s result = " % proto.ret
+            stmt = "    return result;\n"
+        if proto.name == "CreateDevice":
+            funcs.append('%s' % self.lineinfo.get())
+            funcs.append('%s%s\n'
+                     '{\n'
+                     '    layer_data *my_instance_data = get_my_data_ptr(get_dispatch_key(physicalDevice), layer_data_map);\n'
+                     '    char str[1024];\n'
+                     '    sprintf(str, "At start of Generic layered %s\\n");\n'
+                     '    log_msg(my_instance_data->report_data, VK_DEBUG_REPORT_INFORMATION_BIT_EXT, VK_DEBUG_REPORT_OBJECT_TYPE_PHYSICAL_DEVICE_EXT,'
+                     '            (uint64_t)physicalDevice, __LINE__, 0, (char *) "generic", "%%s", (char *) str);\n'
+                     '    VkLayerDeviceCreateInfo *chain_info = get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);\n'
+                     '    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr = chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;\n'
+                     '    PFN_vkGetDeviceProcAddr fpGetDeviceProcAddr = chain_info->u.pLayerInfo->pfnNextGetDeviceProcAddr;\n'
+                     '    PFN_vkCreateDevice fpCreateDevice = (PFN_vkCreateDevice) fpGetInstanceProcAddr(NULL, "vkCreateDevice");\n'
+                     '    if (fpCreateDevice == NULL) {\n'
+                     '        return VK_ERROR_INITIALIZATION_FAILED;\n'
+                     '    }\n'
+                     '    // Advance the link info for the next element on the chain\n'
+                     '    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;\n'
+                     '    VkResult result = fpCreateDevice(physicalDevice, pCreateInfo, pAllocator, pDevice);\n'
+                     '    if (result != VK_SUCCESS) {\n'
+                     '        return result;\n'
+                     '    }\n'
+                     '    layer_data *my_device_data = get_my_data_ptr(get_dispatch_key(*pDevice), layer_data_map);\n'
+                     '    initDeviceTable(*pDevice, fpGetDeviceProcAddr);\n'
+                     '    my_device_data->report_data = layer_debug_report_create_device(my_instance_data->report_data, *pDevice);\n'
+                     '    createDeviceRegisterExtensions(pCreateInfo, *pDevice);\n'
+                     '    sprintf(str, "Completed generic layered %s\\n");\n'
+                     '    log_msg(my_device_data->report_data, VK_DEBUG_REPORT_INFORMATION_BIT_EXT, VK_DEBUG_REPORT_OBJECT_TYPE_PHYSICAL_DEVICE_EXT, (uint64_t)physicalDevice, __LINE__, 0, (char *) "generic", "%%s", (char *) str);\n'
+                     '    %s'
+                     '}' % (qual, decl, proto.name, proto.name, stmt))
+        elif proto.name == "DestroyDevice":
+            funcs.append('%s' % self.lineinfo.get())
+            funcs.append('%s%s\n'
+                         '{\n'
+                         '    dispatch_key key = get_dispatch_key(device);\n'
+                         '    VkLayerDispatchTable *pDisp = device_dispatch_table(device);\n'
+                         '    pDisp->DestroyDevice(device, pAllocator);\n'
+                         '    deviceExtMap.erase(pDisp);\n'
+                         '    destroy_device_dispatch_table(key);\n'
+                         '}\n' % (qual, decl))
+        elif proto.name == "DestroyInstance":
+            funcs.append('%s' % self.lineinfo.get())
+            funcs.append('%s%s\n'
+                         '{\n'
+                         '    dispatch_key key = get_dispatch_key(instance);\n'
+                         '    VkLayerInstanceDispatchTable *pDisp = instance_dispatch_table(instance);\n'
+                         '    pDisp->DestroyInstance(instance, pAllocator);\n'
+                         '    // Clean up logging callback, if any\n'
+                         '    layer_data *my_data = get_my_data_ptr(key, layer_data_map);\n'
+                         '    while (my_data->logging_callback.size() > 0) {\n'
+                         '        VkDebugReportCallbackEXT callback = my_data->logging_callback.back();\n'
+                         '        layer_destroy_msg_callback(my_data->report_data, callback, pAllocator);\n'
+                         '        my_data->logging_callback.pop_back();\n'
+                         '    }\n\n'
+                         '    layer_debug_report_destroy_instance(my_data->report_data);\n'
+                         '    layer_data_map.erase(key);\n'
+                         '    instanceExtMap.erase(pDisp);\n'
+                         '    destroy_instance_dispatch_table(key);\n'
+                         '}\n' % (qual, decl))
+        elif proto.name == "CreateInstance":
+            funcs.append('%s' % self.lineinfo.get())
+            # CreateInstance needs to use the second param instead of the first to set the correct dispatch table
+            funcs.append('%s%s\n'
+                         '{\n'
+                         '    char str[1024];\n'
+                         '    VkLayerInstanceCreateInfo *chain_info = get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);\n'
+                         '    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr = chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;\n'
+                         '    PFN_vkCreateInstance fpCreateInstance = (PFN_vkCreateInstance) fpGetInstanceProcAddr(NULL, "vkCreateInstance");\n'
+                         '    if (fpCreateInstance == NULL) {\n'
+                         '        return VK_ERROR_INITIALIZATION_FAILED;\n'
+                         '    }\n'
+                         '    // Advance the link info for the next element on the chain\n'
+                         '    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;\n'
+                         '    VkResult result = fpCreateInstance(pCreateInfo, pAllocator, pInstance);\n'
+                         '    if (result != VK_SUCCESS) {\n'
+                         '        return result;\n'
+                         '    }\n'
+                         '    VkLayerInstanceDispatchTable *pTable = initInstanceTable(*pInstance, fpGetInstanceProcAddr);\n'
+                         '    createInstanceRegisterExtensions(pCreateInfo, *pInstance);\n'
+                         '    layer_data *my_data = get_my_data_ptr(get_dispatch_key(*pInstance), layer_data_map);\n'
+                         '    my_data->report_data = debug_report_create_instance(\n'
+                         '            pTable,\n'
+                         '            *pInstance,\n'
+                         '            pCreateInfo->enabledExtensionCount,\n'
+                         '            pCreateInfo->ppEnabledExtensionNames);\n'
+                         '    init_generic(my_data, pAllocator);\n'
+                         '    sprintf(str, "Completed generic layered %s\\n");\n'
+                         '    log_msg(my_data->report_data, VK_DEBUG_REPORT_INFORMATION_BIT_EXT, VK_DEBUG_REPORT_OBJECT_TYPE_INSTANCE_EXT, (uint64_t)*pInstance, __LINE__, 0, (char *) "generic", "%%s", (char *) str);\n'
+                         '    return result;\n'
+                         '}\n' % (qual, decl, proto.name))
+        else:
+            if wsi_name(proto.name):
+                funcs.append('%s' % wsi_ifdef(proto.name))
+            funcs.append('%s' % self.lineinfo.get())
+            dispatch_param = proto.params[0].name
+            # Must use 'instance' table for these APIs, 'device' table otherwise
+            table_type = ""
+            if proto_is_global(proto):
+                table_type = "instance"
+            else:
+                table_type = "device"
+            funcs.append('%s%s\n'
+                     '{\n'
+                     '    %s%s_dispatch_table(%s)->%s;\n'
+                     '%s'
+                     '}' % (qual, decl, ret_val, table_type, dispatch_param, proto.c_call(), stmt))
+            if wsi_name(proto.name):
+                funcs.append('%s' % wsi_endif(proto.name))
+        return "\n\n".join(funcs)
+
+    def generate_body(self):
+        self.layer_name = "generic"
+        instance_extensions=[('msg_callback_get_proc_addr', []),
+                     ('wsi_enabled',
+                     ['vkGetPhysicalDeviceSurfaceSupportKHR',
+                      'vkGetPhysicalDeviceSurfaceCapabilitiesKHR',
+                      'vkGetPhysicalDeviceSurfaceFormatsKHR',
+                      'vkGetPhysicalDeviceSurfacePresentModesKHR'])]
+        extensions=[('wsi_enabled',
+                     ['vkCreateSwapchainKHR',
+                      'vkDestroySwapchainKHR', 'vkGetSwapchainImagesKHR',
+                      'vkAcquireNextImageKHR', 'vkQueuePresentKHR'])]
+        body = [self._generate_layer_initialization(True),
+                self._generate_dispatch_entrypoints("VK_LAYER_EXPORT"),
+                self._gen_create_msg_callback(),
+                self._gen_destroy_msg_callback(),
+                self._gen_debug_report_msg(),
+                self._generate_layer_gpa_function(extensions, instance_extensions)]
+
+        return "\n\n".join(body)
+
+class ApiDumpSubcommand(Subcommand):
+    def generate_header(self):
+        header_txt = []
+        header_txt.append('%s' % self.lineinfo.get())
+        header_txt.append('#include <fstream>')
+        header_txt.append('#include <iostream>')
+        header_txt.append('#include <string>')
+        header_txt.append('#include <string.h>')
+        header_txt.append('')
+        header_txt.append('#include "vk_loader_platform.h"')
+        header_txt.append('#include "vulkan/vk_layer.h"')
+        header_txt.append('#include "vk_api_dump_helper_cpp.h"')
+        header_txt.append('#include "vk_layer_table.h"')
+        header_txt.append('#include "vk_layer_extension_utils.h"')
+        header_txt.append('#include "vk_layer_config.h"')
+        header_txt.append('#include "vk_layer_utils.h"')
+        header_txt.append('#include <unordered_map>')
+        header_txt.append('#include "api_dump.h"')
+        header_txt.append('')
+        header_txt.append('static std::ofstream fileStream;')
+        header_txt.append('')
+        header_txt.append('#ifdef ANDROID')
+        header_txt.append('static std::string fileName = "/sdcard/Android/vk_apidump.txt";')
+        header_txt.append('#else')
+        header_txt.append('static std::string fileName = "vk_apidump.txt";')
+        header_txt.append('#endif')
+        header_txt.append('')
+        header_txt.append('std::ostream* outputStream = NULL;')
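+        # The generated ConfigureOutputStream() below routes dump output either to the
+        # file named by fileName (default vk_apidump.txt, overridable via the
+        # lunarg_api_dump.log_filename option; the value "stdout" selects std::cout)
+        # or to std::cout directly, and toggles stdio synchronization to honor the
+        # flush-after-write setting.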
+        header_txt.append('void ConfigureOutputStream(bool writeToFile, bool flushAfterWrite)')
+        header_txt.append('{')
+        header_txt.append('    if(writeToFile)')
+        header_txt.append('    {')
+        header_txt.append('        if (fileName == "stdout")')
+        header_txt.append('        {')
+        header_txt.append('            outputStream = &std::cout;')
+        header_txt.append('            (*outputStream) << std::endl << "api_dump output filename \'stdout\' specified. Writing to STDOUT instead of a file." << std::endl << std::endl;')
+        header_txt.append('        } else {')
+        header_txt.append('            fileStream.open(fileName);')
+        header_txt.append('            if ((fileStream.rdstate() & fileStream.failbit) != 0) {')
+        header_txt.append('                outputStream = &std::cout;')
+        header_txt.append('                (*outputStream) << std::endl << "api_dump ERROR: Bad output filename specified: " << fileName << ". Writing to STDOUT instead" << std::endl << std::endl;')
+        header_txt.append('            }')
+        header_txt.append('            else')
+        header_txt.append('                outputStream = &fileStream;')
+        header_txt.append('        }')
+        header_txt.append('    }')
+        header_txt.append('    else')
+        header_txt.append('    {')
+        header_txt.append('        outputStream = &std::cout;')
+        header_txt.append('    }')
+        header_txt.append('')
+        header_txt.append('    if(flushAfterWrite)')
+        header_txt.append('    {')
+        header_txt.append('        outputStream->sync_with_stdio(true);')
+        header_txt.append('    }')
+        header_txt.append('    else')
+        header_txt.append('    {')
+        header_txt.append('        outputStream->sync_with_stdio(false);')
+        header_txt.append('    }')
+        header_txt.append('')
+        header_txt.append('}')
+        header_txt.append('')
+        header_txt.append('%s' % self.lineinfo.get())
+        header_txt.append('static bool g_ApiDumpDetailed = true;')
+        header_txt.append('static uint64_t g_frameCounter = 0;')
+        header_txt.append('')
+        header_txt.append('static LOADER_PLATFORM_THREAD_ONCE_DECLARATION(initOnce);')
+        header_txt.append('')
+        header_txt.append('static int printLockInitialized = 0;')
+        header_txt.append('static loader_platform_thread_mutex printLock;')
+        header_txt.append('')
+        header_txt.append('%s' % self.lineinfo.get())
+        header_txt.append('#define LAYER_EXT_ARRAY_SIZE 1')
+        header_txt.append('#define LAYER_DEV_EXT_ARRAY_SIZE 1')
+        header_txt.append('#define MAX_TID 513')
+        header_txt.append('static loader_platform_thread_id tidMapping[MAX_TID] = {0};')
+        header_txt.append('static uint32_t maxTID = 0;')
+        header_txt.append('// Map actual TID to an index value and return that index')
+        header_txt.append('//  This keeps TIDs in range from 0-MAX_TID and simplifies compares between runs')
+        header_txt.append('static uint32_t getTIDIndex() {')
+        header_txt.append('    loader_platform_thread_id tid = loader_platform_get_thread_id();')
+        header_txt.append('    for (uint32_t i = 0; i < maxTID; i++) {')
+        header_txt.append('        if (tid == tidMapping[i])')
+        header_txt.append('            return i;')
+        header_txt.append('    }')
+        header_txt.append("    // Don't yet have mapping, set it and return newly set index")
+        header_txt.append('    uint32_t retVal = (uint32_t) maxTID;')
+        header_txt.append('    tidMapping[maxTID++] = tid;')
+        header_txt.append('    assert(maxTID < MAX_TID);')
+        header_txt.append('    return retVal;')
+        header_txt.append('}')
+        header_txt.append('')
+        return "\n".join(header_txt)
+
+    def generate_init(self):
+        func_body = []
+        func_body.append('%s' % self.lineinfo.get())
+        func_body.append('#include "vk_dispatch_table_helper.h"')
+        func_body.append('#include "vk_layer_config.h"')
+        func_body.append('')
+        func_body.append('static void init%s(void)' % self.layer_name)
+        func_body.append('{')
+        func_body.append('    using namespace StreamControl;')
+        func_body.append('    using namespace std;')
+        func_body.append('')
+        func_body.append('    char const*const logName = getLayerOption("lunarg_api_dump.log_filename");')
+        func_body.append('    if(logName != NULL)')
+        func_body.append('    {')
+        func_body.append('        fileName = logName;')
+        func_body.append('    }')
+        func_body.append('')
+        func_body.append('    char const*const detailedStr = getLayerOption("lunarg_api_dump.detailed");')
+        func_body.append('    if(detailedStr != NULL)')
+        func_body.append('    {')
+        func_body.append('        if(strcmp(detailedStr, "TRUE") == 0)')
+        func_body.append('        {')
+        func_body.append('            g_ApiDumpDetailed = true;')
+        func_body.append('        }')
+        func_body.append('        else if(strcmp(detailedStr, "FALSE") == 0)')
+        func_body.append('        {')
+        func_body.append('            g_ApiDumpDetailed = false;')
+        func_body.append('        }')
+        func_body.append('    }')
+        func_body.append('')
+        func_body.append('    char const*const writeToFileStr = getLayerOption("lunarg_api_dump.file");')
+        func_body.append('    bool writeToFile = false;')
+        func_body.append('    if(writeToFileStr != NULL)')
+        func_body.append('    {')
+        func_body.append('        if(strcmp(writeToFileStr, "TRUE") == 0)')
+        func_body.append('        {')
+        func_body.append('            writeToFile = true;')
+        func_body.append('        }')
+        func_body.append('        else if(strcmp(writeToFileStr, "FALSE") == 0)')
+        func_body.append('        {')
+        func_body.append('            writeToFile = false;')
+        func_body.append('        }')
+        func_body.append('    }')
+        func_body.append('')
+        func_body.append('%s' % self.lineinfo.get())
+        func_body.append('    char const*const noAddrStr = getLayerOption("lunarg_api_dump.no_addr");')
+        func_body.append('    if(noAddrStr != NULL)')
+        func_body.append('    {')
+        func_body.append('        if(strcmp(noAddrStr, "FALSE") == 0)')
+        func_body.append('        {')
+        func_body.append('            StreamControl::writeAddress = true;')
+        func_body.append('        }')
+        func_body.append('        else if(strcmp(noAddrStr, "TRUE") == 0)')
+        func_body.append('        {')
+        func_body.append('            StreamControl::writeAddress = false;')
+        func_body.append('        }')
+        func_body.append('    }')
+        func_body.append('')
+        func_body.append('    char const*const flushAfterWriteStr = getLayerOption("lunarg_api_dump.flush");')
+        func_body.append('    bool flushAfterWrite = false;')
+        func_body.append('    if(flushAfterWriteStr != NULL)')
+        func_body.append('    {')
+        func_body.append('        if(strcmp(flushAfterWriteStr, "TRUE") == 0)')
+        func_body.append('        {')
+        func_body.append('            flushAfterWrite = true;')
+        func_body.append('        }')
+        func_body.append('        else if(strcmp(flushAfterWriteStr, "FALSE") == 0)')
+        func_body.append('        {')
+        func_body.append('            flushAfterWrite = false;')
+        func_body.append('        }')
+        func_body.append('    }')
+        func_body.append('')
+        func_body.append('%s' % self.lineinfo.get())
+        func_body.append('    ConfigureOutputStream(writeToFile, flushAfterWrite);')
+        func_body.append('')
+        func_body.append('    if (!printLockInitialized)')
+        func_body.append('    {')
+        func_body.append('        // TODO/TBD: Need to delete this mutex sometime.  How???')
+        func_body.append('        loader_platform_thread_create_mutex(&printLock);')
+        func_body.append('        printLockInitialized = 1;')
+        func_body.append('    }')
+        func_body.append('}')
+        func_body.append('')
+        return "\n".join(func_body)
+
+    def generate_intercept(self, proto, qual):
+        if proto.name in [ 'EnumerateInstanceLayerProperties','EnumerateInstanceExtensionProperties','EnumerateDeviceLayerProperties','EnumerateDeviceExtensionProperties']:
+            return None
+        decl = proto.c_func(prefix="vk", attr="VKAPI")
+        ret_val = ''
+        stmt = ''
+        funcs = []
+        sp_param_dict = {} # Store 'index' for a struct param to print, or the name of the bound "Count" param for an array to print
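+        # e.g. for vkEnumeratePhysicalDevices(instance, pPhysicalDeviceCount, pPhysicalDevices),
+        # the pPhysicalDevices param gets keyed to its '*pPhysicalDeviceCount' binding
+        # by the prev_count_name matching further down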
+        create_params = 0 # Num of params at end of function that are created and returned as output values
+        if 'AllocateDescriptorSets' in proto.name:
+            create_params = -1
+        elif 'Create' in proto.name or 'Alloc' in proto.name or 'MapMemory' in proto.name:
+            create_params = -1
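+        # e.g. for vkCreateDevice the trailing pDevice param is then treated as an
+        # output value rather than printed as an input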
+        if proto.ret != "void":
+            ret_val = "%s result = " % proto.ret
+            stmt = "    return result;\n"
+        f_open = 'loader_platform_thread_lock_mutex(&printLock);\n    '
+        log_func = '%s\n' % self.lineinfo.get()
+        log_func += '    if (StreamControl::writeAddress == true) {'
+        log_func += '\n        (*outputStream) << "t{" << getTIDIndex() << "} f{" << g_frameCounter << "} vk%s(' % proto.name
+        log_func_no_addr = '\n        (*outputStream) << "t{" << getTIDIndex() << "} f{" << g_frameCounter << "} vk%s(' % proto.name
+        f_close = '\n    loader_platform_thread_unlock_mutex(&printLock);'
+        pindex = 0
+        prev_count_name = ''
+        for p in proto.params:
+            cp = False
+            if 0 != create_params:
+                # If this is any of the N last params of the func, treat as output
+                for y in range(-1, create_params-1, -1):
+                    if p.name == proto.params[y].name:
+                        cp = True
+            (pft, pfi, cast) = self._get_printf_params(p.ty, p.name, cp, count=4)
+
+            if p.name == "pSwapchain" or p.name == "pSwapchainImages":
+                log_func += '%s = 0x" << nouppercase <<  hex << HandleCast(%s) << dec << ", ' % (p.name, p.name)
+            elif p.name == "swapchain":
+                log_func += '%s%s = 0x" << nouppercase <<  hex << HandleCast(%s) << dec << ", ' % (cast, p.name, p.name)
+            elif p.name == "visual_id":
+                log_func += '%s = 0x" << nouppercase <<  hex << %s << dec << ", ' % (p.name, p.name)
+            elif '0x' in pft:
+                if "*" in cast:
+                    log_func += '" << (%s ? \"%s\" : \"\") << "%s = 0x" << nouppercase <<  hex << (%s ? %s : 0) << dec << ", ' % (p.name, cast, p.name, p.name, pfi)
+                else:
+                    log_func += '%s%s = 0x" << nouppercase <<  hex << %s << dec << ", ' % (cast, p.name, pfi)
+            else:
+                log_func += '%s%s = " << %s << ", ' % (cast, p.name, pfi)
+            if "%p" in pft:
+                log_func_no_addr += '%s = address, ' % (p.name)
+            else:
+                log_func_no_addr += '%s%s = " << %s << ", ' % (cast, p.name, pfi)
+            if prev_count_name != '' and (prev_count_name.replace('Count', '')[1:] in p.name):
+                sp_param_dict[pindex] = prev_count_name
+                prev_count_name = ''
+            elif vk_helper_api_dump.is_type(p.ty.strip('*').replace('const ', ''), 'struct'):
+                sp_param_dict[pindex] = 'index'
+            if p.name.endswith('Count'):
+                if '*' in p.ty:
+                    prev_count_name = "*%s" % p.name
+                else:
+                    prev_count_name = p.name
+            else:
+                prev_count_name = ''
+            pindex += 1
+        log_func = log_func.strip(', ')
+        log_func_no_addr = log_func_no_addr.strip(', ')
+        if proto.ret == "VkResult":
+            log_func += ') = " << string_VkResult((VkResult)result) << endl'
+            log_func_no_addr += ') = " << string_VkResult((VkResult)result) << endl'
+        elif proto.ret == "void*":
+            log_func += ') = " << HandleCast(result) << endl'
+            log_func_no_addr += ') = " << HandleCast(result) << endl'
+        else:
+            log_func += ')\\n"'
+            log_func_no_addr += ')\\n"'
+        log_func += ';'
+        log_func_no_addr += ';'
+        log_func += '\n    }\n    else {%s\n    }' % log_func_no_addr
+        log_func += '\n%s' % self.lineinfo.get()
+        # log_func += '\n// Proto %s has param_dict: %s' % (proto.name, sp_param_dict)
+        if len(sp_param_dict) > 0:
+            indent = '    '
+            log_func += '\n%sif (g_ApiDumpDetailed) {' % indent
+            indent += '    '
+            i_decl = False
+            log_func += '\n%s' % self.lineinfo.get()
+            for sp_index in sp_param_dict:
+                # log_func += '\n// sp_index: %s' % str(sp_index)
+                if 'index' == sp_param_dict[sp_index]:
+                    if proto.params[sp_index].ty.strip('*').replace('const ', '') in vk_helper_api_dump.opaque_types:
+                        continue
+                    local_name = proto.params[sp_index].name
+                    if '*' not in proto.params[sp_index].ty:
+                        local_name = '&%s' % proto.params[sp_index].name
+                    log_func += '\n%s' % self.lineinfo.get()
+                    cis_print_func = 'vk_print_%s' % (proto.params[sp_index].ty.replace('const ', '').strip('*').lower())
+                    log_func += '\n%sif (%s) {' % (indent, local_name)
+                    indent += '    '
+                    log_func += '\n%sstring tmp_str = %s(%s, "    ");' % (indent, cis_print_func, local_name)
+                    log_func += '\n%s(*outputStream) << "   %s:\\n" << tmp_str << endl;' % (indent, local_name)
+                    indent = indent[4:]
+                    log_func += '\n%s}' % (indent)
+                else: # We have a count value stored to iterate over an array
+                    print_cast = ''
+                    print_func = ''
+                    if vk_helper_api_dump.is_type(proto.params[sp_index].ty.strip('*').replace('const ', ''), 'struct'):
+                        print_cast = '&'
+                        print_func = 'vk_print_%s' % proto.params[sp_index].ty.replace('const ', '').strip('*').lower()
+                    else:
+                        print_cast = ''
+                        print_func = 'string_convert_helper'
+                        #cis_print_func = 'tmp_str = string_convert_helper(HandleCast(%s[i]), "    ");' % proto.params[sp_index].name
+                    if not i_decl:
+                        log_func += '\n%suint32_t i;' % (indent)
+                        i_decl = True
+                    log_func += '\n%sif (%s) {' % (indent, proto.params[sp_index].name)
+                    indent += '    '
+                    log_func += '\n%s' % self.lineinfo.get()
+                    log_func += '\n%sfor (i = 0; i < %s; i++) {' % (indent, sp_param_dict[sp_index])
+                    indent += '    '
+                    if not proto.params[sp_index].ty.strip('*').replace('const ', '') in vk_helper_api_dump.opaque_types:
+                        cis_print_func = 'string tmp_str = %s(%s%s[i], "    ");' % (print_func, print_cast, proto.params[sp_index].name)
+                        log_func += '\n%sif (StreamControl::writeAddress == true) {' % (indent)
+                        indent += '    '
+                        log_func += '\n%s%s' % (indent, cis_print_func)
+                        log_func += '\n%s(*outputStream) << "   %s[" << i << "]:\\n" << tmp_str << endl;' % (indent, proto.params[sp_index].name)
+                    else:
+                        log_func += '\n%sif (StreamControl::writeAddress == true) {' % (indent)
+                        indent += '    '
+                        log_func += '\n%s(*outputStream) << "   %s[" << i << "] = 0x" << nouppercase << hex << HandleCast(%s[i]) << dec << endl;' % (indent, proto.params[sp_index].name, proto.params[sp_index].name)
+                    indent = indent[4:]
+                    log_func += '\n%s}' % (indent)
+                    indent = indent[4:]
+                    log_func += '\n%s}' % (indent)
+                    indent = indent[4:]
+                    log_func += '\n%s}' % (indent)
+            indent = indent[4:]
+            log_func += '\n%s}' % (indent)
+        table_type = ''
+        if proto_is_global(proto):
+            table_type = 'instance'
+        else:
+            table_type = 'device'
+        dispatch_param = proto.params[0].name
+
+        if proto.name == "CreateInstance":
+            dispatch_param = '*' + proto.params[1].name
+            funcs.append('%s%s\n'
+                     '{\n'
+                     '    using namespace StreamControl;\n'
+                     '    using namespace std;\n'
+                     '    loader_platform_thread_once(&initOnce, initapi_dump);\n'
+                     '    VkLayerInstanceCreateInfo *chain_info = get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);\n'
+                     '    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr = chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;\n'
+                     '    PFN_vkCreateInstance fpCreateInstance = (PFN_vkCreateInstance) fpGetInstanceProcAddr(NULL, "vkCreateInstance");\n'
+                     '    if (fpCreateInstance == NULL) {\n'
+                     '        (*outputStream) << "t{" << getTIDIndex() << "} " << g_frameCounter << " vkCreateInstance FAILED TO FIND PROC ADDRESS" << endl;\n'
+                     '        return VK_ERROR_INITIALIZATION_FAILED;\n'
+                     '    }\n'
+                     '    // Advance the link info for the next element on the chain\n'
+                     '    assert(chain_info->u.pLayerInfo);\n'
+                     '    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;\n'
+                     '    VkResult result = fpCreateInstance(pCreateInfo, pAllocator, pInstance);\n'
+                     '    if (result == VK_SUCCESS) {\n'
+                     '        initInstanceTable(*pInstance, fpGetInstanceProcAddr);\n'
+                     '        createInstanceRegisterExtensions(pCreateInfo, *pInstance);\n'
+                     '    }\n'
+                     '    %s%s%s\n'
+                     '%s'
+                     '}\n' % (qual, decl, f_open, log_func, f_close, stmt))
+
+        elif proto.name == "CreateDevice":
+            funcs.append('%s\n' % self.lineinfo.get())
+            funcs.append('%s%s\n'
+                     '{\n'
+                     '    using namespace StreamControl;\n'
+                     '    using namespace std;\n'
+                     '    %sexplicit_CreateDevice(physicalDevice, pCreateInfo, pAllocator, pDevice);\n'
+                     '    %s%s%s\n'
+                     '%s'
+                     '}' % (qual, decl, ret_val, f_open, log_func, f_close, stmt))
+        elif proto.name == "DestroyDevice":
+            funcs.append('%s%s\n'
+                 '{\n'
+                 '    using namespace StreamControl;\n'
+                 '    using namespace std;\n'
+                 '    dispatch_key key = get_dispatch_key(device);\n'
+                 '    VkLayerDispatchTable *pDisp  = %s_dispatch_table(%s);\n'
+                 '    %spDisp->%s;\n'
+                 '    deviceExtMap.erase(pDisp);\n'
+                 '    destroy_device_dispatch_table(key);\n'
+                 '    %s%s%s\n'
+                 '%s'
+                 '}' % (qual, decl, table_type, dispatch_param, ret_val, proto.c_call(), f_open, log_func, f_close, stmt))
+        elif proto.name == "DestroyInstance":
+            funcs.append('%s%s\n'
+                 '{\n'
+                 '    using namespace StreamControl;\n'
+                 '    using namespace std;\n'
+                 '    dispatch_key key = get_dispatch_key(instance);\n'
+                 '    VkLayerInstanceDispatchTable *pDisp  = %s_dispatch_table(%s);\n'
+                 '    %spDisp->%s;\n'
+                 '    instanceExtMap.erase(pDisp);\n'
+                 '    destroy_instance_dispatch_table(key);\n'
+                 '    %s%s%s\n'
+                 '%s'
+                 '}' % (qual, decl, table_type, dispatch_param, ret_val, proto.c_call(), f_open, log_func, f_close, stmt))
+        elif proto.name == "QueuePresentKHR":
+            funcs.append('%s%s\n'
+                 '{\n'
+                 '    using namespace StreamControl;\n'
+                 '    using namespace std;\n'
+                 '    VkLayerDispatchTable *pDisp  = %s_dispatch_table(%s);\n'
+                 '    %spDisp->%s;\n'
+                 '    %s%s%s\n'
+                 '    ++g_frameCounter;\n'
+                 '%s'
+                 '}' % (qual, decl, table_type, dispatch_param, ret_val, proto.c_call(), f_open, log_func, f_close, stmt))
+        else:
+            if wsi_name(decl):
+                funcs.append('%s' % wsi_ifdef(decl))
+            funcs.append('%s%s\n'
+                     '{\n'
+                     '    using namespace StreamControl;\n'
+                     '    using namespace std;\n'
+                     '    %s%s_dispatch_table(%s)->%s;\n'
+                     '    %s%s%s\n'
+                     '%s'
+                     '}' % (qual, decl, ret_val, table_type, dispatch_param, proto.c_call(), f_open, log_func, f_close, stmt))
+            if wsi_name(decl):
+                funcs.append('%s' % wsi_endif(decl))
+        return "\n\n".join(funcs)
+
+    def generate_body(self):
+        self.layer_name = "api_dump"
+        if self.wsi == 'Win32':
+            instance_extensions=[('wsi_enabled',
+                                  ['vkDestroySurfaceKHR',
+                                   'vkGetPhysicalDeviceSurfaceSupportKHR',
+                                   'vkGetPhysicalDeviceSurfaceCapabilitiesKHR',
+                                   'vkGetPhysicalDeviceSurfaceFormatsKHR',
+                                   'vkGetPhysicalDeviceSurfacePresentModesKHR',
+                                   'vkCreateWin32SurfaceKHR',
+                                   'vkGetPhysicalDeviceWin32PresentationSupportKHR'])]
+        elif self.wsi == 'Android':
+            instance_extensions=[('wsi_enabled',
+                                  ['vkDestroySurfaceKHR',
+                                   'vkGetPhysicalDeviceSurfaceSupportKHR',
+                                   'vkGetPhysicalDeviceSurfaceCapabilitiesKHR',
+                                   'vkGetPhysicalDeviceSurfaceFormatsKHR',
+                                   'vkGetPhysicalDeviceSurfacePresentModesKHR',
+                                   'vkCreateAndroidSurfaceKHR'])]
+        elif self.wsi == 'Xcb':
+            instance_extensions=[('wsi_enabled',
+                                  ['vkDestroySurfaceKHR',
+                                   'vkGetPhysicalDeviceSurfaceSupportKHR',
+                                   'vkGetPhysicalDeviceSurfaceCapabilitiesKHR',
+                                   'vkGetPhysicalDeviceSurfaceFormatsKHR',
+                                   'vkGetPhysicalDeviceSurfacePresentModesKHR',
+                                   'vkCreateXcbSurfaceKHR',
+                                   'vkGetPhysicalDeviceXcbPresentationSupportKHR'])]
+        elif self.wsi == 'Xlib':
+            instance_extensions=[('wsi_enabled',
+                                  ['vkDestroySurfaceKHR',
+                                   'vkGetPhysicalDeviceSurfaceSupportKHR',
+                                   'vkGetPhysicalDeviceSurfaceCapabilitiesKHR',
+                                   'vkGetPhysicalDeviceSurfaceFormatsKHR',
+                                   'vkGetPhysicalDeviceSurfacePresentModesKHR',
+                                   'vkCreateXlibSurfaceKHR',
+                                   'vkGetPhysicalDeviceXlibPresentationSupportKHR'])]
+        elif self.wsi == 'Wayland':
+            instance_extensions=[('wsi_enabled',
+                                  ['vkDestroySurfaceKHR',
+                                   'vkGetPhysicalDeviceSurfaceSupportKHR',
+                                   'vkGetPhysicalDeviceSurfaceCapabilitiesKHR',
+                                   'vkGetPhysicalDeviceSurfaceFormatsKHR',
+                                   'vkGetPhysicalDeviceSurfacePresentModesKHR',
+                                   'vkCreateWaylandSurfaceKHR',
+                                   'vkGetPhysicalDeviceWaylandPresentationSupportKHR'])]
+        elif self.wsi == 'Mir':
+            instance_extensions=[('wsi_enabled',
+                                  ['vkDestroySurfaceKHR',
+                                   'vkGetPhysicalDeviceSurfaceSupportKHR',
+                                   'vkGetPhysicalDeviceSurfaceCapabilitiesKHR',
+                                   'vkGetPhysicalDeviceSurfaceFormatsKHR',
+                                   'vkGetPhysicalDeviceSurfacePresentModesKHR',
+                                   'vkCreateMirSurfaceKHR',
+                                   'vkGetPhysicalDeviceMirPresentationSupportKHR'])]
+        else:
+            sys.exit("Error: Invalid DisplayServer: %s" % self.wsi)
+
+        extensions=[('wsi_enabled',
+                     ['vkCreateSwapchainKHR',
+                      'vkDestroySwapchainKHR', 'vkGetSwapchainImagesKHR',
+                      'vkAcquireNextImageKHR', 'vkQueuePresentKHR'])]
+        body = [self.generate_init(),
+                self._generate_dispatch_entrypoints("VK_LAYER_EXPORT"),
+                self._generate_layer_gpa_function(extensions, instance_extensions)]
+        return "\n\n".join(body)
+
+def main():
+
+    wsi = {
+            "Win32",
+            "Android",
+            "Xcb",
+            "Xlib",
+            "Wayland",
+            "Mir",
+    }
+
+    subcommands = {
+            "layer-funcs" : LayerFuncsSubcommand,
+            "generic" : GenericLayerSubcommand,
+            "api_dump" : ApiDumpSubcommand,
+    }
+
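+    # Illustrative command line (header path is a placeholder):
+    #   <this script> Xcb api_dump vulkan/vulkan.h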
+    if len(sys.argv) < 4 or sys.argv[1] not in wsi or sys.argv[2] not in subcommands or not os.path.exists(sys.argv[3]):
+        print("Usage: %s <wsi> <subcommand> <input_header> [options]" % sys.argv[0])
+        print()
+        print("Available subcommands are: %s" % " ".join(subcommands))
+        print("Available wsi (displayservers) are: %s" % " ".join(wsi))
+        sys.exit(1)
+
+    hfp = vk_helper_api_dump.HeaderFileParser(sys.argv[3])
+    hfp.parse()
+    vk_helper_api_dump.enum_val_dict = hfp.get_enum_val_dict()
+    vk_helper_api_dump.enum_type_dict = hfp.get_enum_type_dict()
+    vk_helper_api_dump.struct_dict = hfp.get_struct_dict()
+    vk_helper_api_dump.opaque_types = hfp.get_opaque_types()
+    vk_helper_api_dump.typedef_fwd_dict = hfp.get_typedef_fwd_dict()
+    vk_helper_api_dump.typedef_rev_dict = hfp.get_typedef_rev_dict()
+    vk_helper_api_dump.types_dict = hfp.get_types_dict()
+
+    subcmd = subcommands[sys.argv[2]](sys.argv[3:])
+    subcmd.run()
+
+if __name__ == "__main__":
+    main()
diff --git a/vk_helper_api_dump.py b/vk_helper_api_dump.py
new file mode 100644
index 0000000..c1b9cda
--- /dev/null
+++ b/vk_helper_api_dump.py
@@ -0,0 +1,2272 @@
+#!/usr/bin/env python3
+#
+# Copyright (c) 2015-2016 The Khronos Group Inc.
+# Copyright (c) 2015-2016 Valve Corporation
+# Copyright (c) 2015-2016 LunarG, Inc.
+# Copyright (c) 2015-2016 Google Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+# Author: Tobin Ehlis <tobin@lunarg.com>
+
+import argparse
+import os
+import sys
+import re
+import vulkan
+from source_line_info import sourcelineinfo
+
+# vk_helper_api_dump.py overview
+# This script generates code based on a Vulkan input header.
+#  It generates wrapper functions that can be used to display
+#  structs in a human-readable text format, as well as utility functions
+#  to print enum values as strings
+
+def handle_args():
+    parser = argparse.ArgumentParser(description='Generate helper code for printing Vulkan structs and enum values from an input header.')
+    parser.add_argument('input_file', help='The input header file from which code will be generated.')
+    parser.add_argument('--rel_out_dir', required=False, default='vktrace_gen', help='Path relative to exec path to write output files. Will be created if needed.')
+    parser.add_argument('--abs_out_dir', required=False, default=None, help='Absolute path to write output files. Will be created if needed.')
+    parser.add_argument('--gen_enum_string_helper', required=False, action='store_true', default=False, help='Enable generation of helper header file to print string versions of enums.')
+    parser.add_argument('--gen_struct_wrappers', required=False, action='store_true', default=False, help='Enable generation of struct wrapper classes.')
+    parser.add_argument('--gen_struct_sizes', required=False, action='store_true', default=False, help='Enable generation of struct sizes.')
+    parser.add_argument('--gen_cmake', required=False, action='store_true', default=False, help='Enable generation of cmake file for generated code.')
+    parser.add_argument('--gen_graphviz', required=False, action='store_true', default=False, help='Enable generation of graphviz dot file.')
+    #parser.add_argument('--test', action='store_true', default=False, help='Run simple test.')
+    return parser.parse_args()
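+
+# Example invocation (illustrative; paths are placeholders):
+#   python3 vk_helper_api_dump.py vulkan/vulkan.h --abs_out_dir /tmp/gen \
+#       --gen_enum_string_helper --gen_struct_wrappers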
+
+# TODO : Ideally these data structs would be opaque to user and could be wrapped
+#   in their own class(es) to make them friendly for data look-up
+# Dicts for Data storage
+# enum_val_dict[value_name] = dict keys are the string enum value names for all enums
+#    |-------->['type'] = the type of enum class into which the value falls
+#    |-------->['val'] = the value assigned to this particular value_name
+#    '-------->['unique'] = bool designating if this enum 'val' is unique within this enum 'type'
+enum_val_dict = {}
+# enum_type_dict['type'] = the type or class of the enum
+#  '----->['val_name1', 'val_name2'...] = each type references a list of val_names within this type
+enum_type_dict = {}
+# struct_dict['struct_basename'] = the base (non-typedef'd) name of the struct
+#  |->[<member_num>] = members are stored via their integer placement in the struct
+#  |    |->['name'] = string name of this struct member
+# ...   |->['full_type'] = complete type qualifier for this member
+#       |->['type'] = base type for this member
+#       |->['ptr'] = bool indicating if this member is ptr
+#       |->['const'] = bool indicating if this member is const
+#       |->['struct'] = bool indicating if this member is a struct type
+#       |->['array'] = bool indicating if this member is an array
+#       |->['dyn_array'] = bool indicating if member is a dynamically sized array
+#       '->['array_size'] = For dyn_array, member name used to size array, else int size for static array
+struct_dict = {}
+struct_order_list = [] # struct names in order they're declared
+# Store struct names that require #ifdef guards
+#  dict stores struct and enum definitions that are guarded by ifdef as the key
+#  and the txt of the ifdef is the value
+ifdef_dict = {}
+# typedef_fwd_dict stores mapping from orig_type_name -> new_type_name
+typedef_fwd_dict = {}
+# typedef_rev_dict stores mapping from new_type_name -> orig_type_name
+typedef_rev_dict = {} # store new_name -> orig_name mapping
+# types_dict['id_name'] = maps an identifier name to its type
+#              '---->'type' = currently either 'struct' or 'enum'
+types_dict = {}   # store orig_name -> type mapping
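+# Illustrative example for a hypothetical enum 'VkFoo' with values VK_FOO_A/VK_FOO_B:
+#   enum_val_dict['VK_FOO_A'] = {'type': 'VkFoo', 'val': '0', 'unique': True}
+#   enum_type_dict['VkFoo'] = ['VK_FOO_A', 'VK_FOO_B']
+#   types_dict['VkFoo'] = 'enum'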
+
+
+# Class that parses the header file and generates data structures that can
+#  then be used for other tasks
+class HeaderFileParser:
+    def __init__(self, header_file=None):
+        self.header_file = header_file
+        # store header data in various formats, see above for more info
+        self.enum_val_dict = {}
+        self.enum_type_dict = {}
+        self.struct_dict = {}
+        self.typedef_fwd_dict = {}
+        self.typedef_rev_dict = {}
+        self.types_dict = {}
+        self.last_struct_count_name = ''
+        self.opaque_types = []
+
+    def setHeaderFile(self, header_file):
+        self.header_file = header_file
+
+    def get_enum_val_dict(self):
+        return self.enum_val_dict
+
+    def get_enum_type_dict(self):
+        return self.enum_type_dict
+
+    def get_struct_dict(self):
+        return self.struct_dict
+
+    def get_opaque_types(self):
+        return self.opaque_types
+
+    def get_typedef_fwd_dict(self):
+        return self.typedef_fwd_dict
+
+    def get_typedef_rev_dict(self):
+        return self.typedef_rev_dict
+
+    def get_types_dict(self):
+        return self.types_dict
+
+    # Parse header file into data structures
+    def parse(self):
+        # parse through the file, identifying different sections
+        parse_enum = False
+        parse_struct = False
+        member_num = 0
+        # TODO : Comment parsing is very fragile but handles 2 known files
+        block_comment = False
+        pointer_to_function_arg_list = False
+        prev_count_name = ''
+        ifdef_txt = ''
+        ifdef_active = 0
+        exclude_struct_list = ['VkPlatformHandleXcbKHR', 'VkPlatformHandleX11KHR']
+        with open(self.header_file) as f:
+            line_count = 0
+            for line in f:
+                line_count += 1
+                if any(ifd_txt in line for ifd_txt in ['#ifdef ', '#ifndef ']):
+                    ifdef_txt = line.split()[1]
+                    ifdef_active += 1
+                    continue
+                if ifdef_active != 0 and '#endif' in line:
+                    ifdef_active -= 1
+                if block_comment:
+                    if '*/' in line:
+                        block_comment = False
+                    continue
+                if '/*' in line:
+                    if '*/' in line: # single line block comment
+                        continue
+                    block_comment = True
+                elif 0 == len(line.split()):
+                    #print("Skipping empty line")
+                    continue
+                elif line.split()[0].strip().startswith("//"):
+                    #print("Skipping commented line %s" % line)
+                    continue
+                elif 'typedef enum' in line:
+                    (ty_txt, en_txt, base_type) = line.strip().split(None, 2)
+                    #print("Found ENUM type %s" % base_type)
+                    if '{' == base_type:
+                        base_type = 'tmp_enum'
+                    parse_enum = True
+                    default_enum_val = 0
+                    self.types_dict[base_type] = 'enum'
+                elif 'typedef struct' in line or 'typedef union' in line:
+                    if any(ex_type in line for ex_type in exclude_struct_list):
+                        continue
+
+                    (ty_txt, st_txt, base_type) = line.strip().split(None, 2)
+                    if ' ' in base_type:
+                        (ignored, base_type) = base_type.strip().split(None, 1)
+
+                    #print("Found STRUCT type: %s" % base_type)
+                    # Note:  This really needs to be updated to handle one line struct definition, like
+                    #        typedef struct obj##_T { uint64_t handle; } obj;
+                    if '{' == base_type or ' ' not in base_type:
+                        base_type = 'tmp_struct'
+                        parse_struct = True
+                        self.types_dict[base_type] = 'struct'
+#                elif 'typedef union' in line:
+#                    (ty_txt, st_txt, base_type) = line.strip().split(None, 2)
+#                    print("Found UNION type: %s" % base_type)
+#                    parse_struct = True
+#                    self.types_dict[base_type] = 'struct'
+                elif '}' in line and (parse_enum or parse_struct):
+                    if len(line.split()) > 1: # deals with embedded union in one struct
+                        parse_enum = False
+                        parse_struct = False
+                        self.last_struct_count_name = ''
+                        member_num = 0
+                        (cur_char, targ_type) = line.strip().split(None, 1)
+                        if 'tmp_struct' == base_type:
+                            base_type = targ_type.strip(';')
+                            if any(ex_type in base_type for ex_type in exclude_struct_list):
+                                del self.struct_dict['tmp_struct']
+                                continue
+                            #print("Found Actual Struct type %s" % base_type)
+                            self.struct_dict[base_type] = self.struct_dict['tmp_struct']
+                            self.struct_dict.pop('tmp_struct', 0)
+                            struct_order_list.append(base_type)
+                            self.types_dict[base_type] = 'struct'
+                            self.types_dict.pop('tmp_struct', 0)
+                        elif 'tmp_enum' == base_type:
+                            base_type = targ_type.strip(';')
+                            #print("Found Actual ENUM type %s" % base_type)
+                            for n in self.enum_val_dict:
+                                if 'tmp_enum' == self.enum_val_dict[n]['type']:
+                                    self.enum_val_dict[n]['type'] = base_type
+#                            self.enum_val_dict[base_type] = self.enum_val_dict['tmp_enum']
+#                            self.enum_val_dict.pop('tmp_enum', 0)
+                            self.enum_type_dict[base_type] = self.enum_type_dict['tmp_enum']
+                            self.enum_type_dict.pop('tmp_enum', 0)
+                            self.types_dict[base_type] = 'enum'
+                            self.types_dict.pop('tmp_enum', 0)
+                        if ifdef_active:
+                            ifdef_dict[base_type] = ifdef_txt
+                        self.typedef_fwd_dict[base_type] = targ_type.strip(';')
+                        self.typedef_rev_dict[targ_type.strip(';')] = base_type
+                        #print("fwd_dict: %s = %s" % (base_type, targ_type))
+                elif parse_enum:
+                    #if 'VK_MAX_ENUM' not in line and '{' not in line:
+                    if not any(ens in line for ens in ['{', '_MAX_ENUM', '_BEGIN_RANGE', '_END_RANGE', '_NUM = ', '_ENUM_RANGE']):
+                        self._add_enum(line, base_type, default_enum_val)
+                        default_enum_val += 1
+                elif parse_struct:
+                    if ';' in line:
+                        self._add_struct(line, base_type, member_num)
+                        member_num += 1
+                elif '#define' not in line and ('VK_DEFINE_HANDLE' in line or 'VK_DEFINE_NON_DISPATCHABLE_HANDLE' in line):
+                    base_type = line.replace('VK_DEFINE_NON_DISPATCHABLE_HANDLE', '').replace('VK_DEFINE_HANDLE', '').strip('()\n ')
+                    self.types_dict[base_type] = 'struct'
+                    struct_order_list.append(base_type)
+                    self.struct_dict[base_type] = {}
+                    self.opaque_types.append(base_type)
+                    if ifdef_active:
+                        ifdef_dict[base_type] = ifdef_txt
+                    self.typedef_fwd_dict[base_type] = base_type
+                    self.typedef_rev_dict[base_type] = base_type
+                elif 'PFN_vk' in line and ';' not in line:
+                    pointer_to_function_arg_list = True
+                elif pointer_to_function_arg_list:
+                    if ';' in line:
+                        pointer_to_function_arg_list = False
+                #elif not ifdef_active and not any(token in line for token in ['#include', '#define', '#endif', 'extern', '{', 'PFN_vk', 'uint_34', 'uint_64', 'typedef VkFlags ']):
+                    #print(" *** Warning line #%d: %s ignored" % (line_count, line.strip('\n')))
+
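+    # Illustrative sketch of the handle branch above (VkInstance is a real
+    # dispatchable handle; the bookkeeping shown is approximate): a header line
+    #     VK_DEFINE_HANDLE(VkInstance)
+    # is stripped down to base_type 'VkInstance', which is then recorded as an
+    # opaque, member-less struct:
+    #     self.types_dict['VkInstance'] = 'struct'
+    #     self.struct_dict['VkInstance'] = {}
+    #     self.opaque_types -> [..., 'VkInstance']
+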
+    # populate enum dicts based on enum lines
+    def _add_enum(self, line_txt, enum_type, def_enum_val):
+        #print("Parsing enum line %s" % line_txt)
+        if '=' in line_txt:
+            (enum_name, eq_char, enum_val) = line_txt.split(None, 2)
+        else:
+            enum_name = line_txt.split(',')[0]
+            enum_val = str(def_enum_val)
+        self.enum_val_dict[enum_name] = {}
+        self.enum_val_dict[enum_name]['type'] = enum_type
+        # strip the trailing comma and any comment; the extra split handles values that have a comment but no comma
+        enum_val = enum_val.strip().split(',', 1)[0]
+        self.enum_val_dict[enum_name]['val'] = enum_val.split()[0]
+        # Perform conversion of VK_BIT macro
+        if 'VK_BIT' in self.enum_val_dict[enum_name]['val']:
+            vk_bit_val = self.enum_val_dict[enum_name]['val']
+            bit_shift = int(vk_bit_val[vk_bit_val.find('(')+1:vk_bit_val.find(')')], 0)
+            self.enum_val_dict[enum_name]['val'] = str(1 << bit_shift)
+        else:
+            # account for negative values surrounded by parens
+            self.enum_val_dict[enum_name]['val'] = self.enum_val_dict[enum_name]['val'].strip(')').replace('-(', '-')
+        # Try to cast to int to determine if enum value is unique
+        try:
+            #print("ENUM val:", self.enum_val_dict[enum_name]['val'])
+            int(self.enum_val_dict[enum_name]['val'], 0)
+            self.enum_val_dict[enum_name]['unique'] = True
+            #print("ENUM has num value")
+        except ValueError:
+            self.enum_val_dict[enum_name]['unique'] = False
+            #print("ENUM is not a number value")
+        # Update enum_type_dict as well
+        if enum_type not in self.enum_type_dict:
+            self.enum_type_dict[enum_type] = []
+        self.enum_type_dict[enum_type].append(enum_name)
+
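+    # Worked example of the VK_BIT conversion above (hypothetical enum line):
+    #     VK_SAMPLE_COUNT_4_BIT = VK_BIT(2),
+    # parses the shift amount 2 from between the parens and stores
+    #     enum_val_dict['VK_SAMPLE_COUNT_4_BIT']['val'] == str(1 << 2) == '4'
+    # with 'unique' set True since '4' casts cleanly to int.
+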
+    # Return True if struct member is a dynamic array
+    # RULES : These are a bit quirky and based on the current API
+    # NOTE : Changes in the API spec may cause these rules to change
+    #  1. There must be a previous uint var w/ 'count' in its name in the struct
+    #  2. The dynamic array must have 'const' and '*' qualifiers
+    #  3a. The name of the dynamic array must end in 's', OR
+    #  3b. The name of the count var, minus 'count', must be contained in the name of the dynamic array
+    def _is_dynamic_array(self, full_type, name):
+        exceptions = ['pEnabledFeatures', 'pWaitDstStageMask', 'pSampleMask']
+        if name in exceptions:
+            return False
+        if '' != self.last_struct_count_name:
+            if 'const' in full_type and '*' in full_type:
+                if name.endswith('s') or self.last_struct_count_name.lower().replace('count', '') in name.lower():
+                    return True
+
+                # VkWriteDescriptorSet
+                if self.last_struct_count_name == "descriptorCount":
+                    return True
+
+        return False
+
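+    # Illustrative application of the rules above, using the real
+    # VkRenderPassCreateInfo member pair:
+    #     uint32_t                         attachmentCount;  // sets last_struct_count_name
+    #     const VkAttachmentDescription*   pAttachments;     // const, '*', name ends in 's'
+    # so _is_dynamic_array('const VkAttachmentDescription*', 'pAttachments')
+    # returns True and the member is sized by 'attachmentCount'.
+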
+    # populate struct dicts based on struct lines
+    # TODO : Handle ":" bitfield, "**" ptr->ptr and "const type*const*"
+    def _add_struct(self, line_txt, struct_type, num):
+        #print("Parsing struct line %s" % line_txt)
+        if '{' == struct_type:
+            print("Parsing struct '{' w/ line %s" % line_txt)
+        if struct_type not in self.struct_dict:
+            self.struct_dict[struct_type] = {}
+        members = line_txt.strip().split(';', 1)[0] # first strip semicolon & comments
+        # TODO : Handle bitfields more correctly
+        members = members.strip().split(':', 1)[0] # strip bitfield element
+        (member_type, member_name) = members.rsplit(None, 1)
+        # Store counts to help recognize and size dynamic arrays
+        if 'count' in member_name.lower() and 'samplecount' != member_name.lower() and 'uint' in member_type:
+            self.last_struct_count_name = member_name
+        self.struct_dict[struct_type][num] = {}
+        self.struct_dict[struct_type][num]['full_type'] = member_type
+        self.struct_dict[struct_type][num]['dyn_array'] = False
+
+        if '*' in member_type:
+            self.struct_dict[struct_type][num]['ptr'] = True
+            # TODO : Need more general purpose way here to reduce down to basic type
+            member_type = member_type.replace(' const*', '')
+            member_type = member_type.strip('*')
+        else:
+            self.struct_dict[struct_type][num]['ptr'] = False
+        if 'const' in member_type:
+            self.struct_dict[struct_type][num]['const'] = True
+            member_type = member_type.replace('const', '').strip()
+        else:
+            self.struct_dict[struct_type][num]['const'] = False
+        # TODO : There is a bug here where it seems that at the time we do this check,
+        #    the data is not in the types or typedef_rev_dict, so we never pass this if check
+        if is_type(member_type, 'struct'):
+            self.struct_dict[struct_type][num]['struct'] = True
+        else:
+            self.struct_dict[struct_type][num]['struct'] = False
+        self.struct_dict[struct_type][num]['type'] = member_type
+        if '[' in member_name:
+            (member_name, array_size) = member_name.split('[', 1)
+            #if 'char' in member_type:
+            #    self.struct_dict[struct_type][num]['array'] = False
+            #    self.struct_dict[struct_type][num]['array_size'] = 0
+            #    self.struct_dict[struct_type][num]['ptr'] = True
+            #else:
+            self.struct_dict[struct_type][num]['array'] = True
+            self.struct_dict[struct_type][num]['array_size'] = array_size.strip(']')
+        elif self._is_dynamic_array(self.struct_dict[struct_type][num]['full_type'], member_name):
+            #print("Found dynamic array %s of size %s" % (member_name, self.last_struct_count_name))
+            self.struct_dict[struct_type][num]['array'] = True
+            self.struct_dict[struct_type][num]['dyn_array'] = True
+            self.struct_dict[struct_type][num]['array_size'] = self.last_struct_count_name
+        elif 'array' not in self.struct_dict[struct_type][num]:
+            self.struct_dict[struct_type][num]['array'] = False
+            self.struct_dict[struct_type][num]['array_size'] = 0
+        self.struct_dict[struct_type][num]['name'] = member_name
+
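+    # Sketch of the record produced above (the member index and exact dict
+    # contents are illustrative): parsing the VkDeviceCreateInfo line
+    #     const VkDeviceQueueCreateInfo* pQueueCreateInfos;
+    # right after its queueCreateInfoCount member yields roughly:
+    #     {'full_type': 'const VkDeviceQueueCreateInfo*', 'ptr': True,
+    #      'const': True, 'type': 'VkDeviceQueueCreateInfo',
+    #      'array': True, 'dyn_array': True,
+    #      'array_size': 'queueCreateInfoCount', 'name': 'pQueueCreateInfos'}
+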
+# check if given identifier is of specified type_to_check
+def is_type(identifier, type_to_check):
+    if identifier in types_dict and type_to_check == types_dict[identifier]:
+        return True
+    if identifier in typedef_rev_dict:
+        new_id = typedef_rev_dict[identifier]
+        if new_id in types_dict and type_to_check == types_dict[new_id]:
+            return True
+    return False
+
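+# Usage sketch (assumes the module-level dicts were populated by a prior parse
+# of the header; the identifiers are just examples):
+#     is_type('VkApplicationInfo', 'struct')  # True once parsed as a struct
+#     is_type('VkFormat', 'enum')             # True once parsed as an enum
+#     is_type('VkFormat', 'struct')           # False
+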
+# This is a validation function to verify that we can reproduce the original structs
+def recreate_structs():
+    for struct_name in struct_dict:
+        sys.stdout.write("typedef struct %s\n{\n" % struct_name)
+        for mem_num in sorted(struct_dict[struct_name]):
+            sys.stdout.write("    ")
+            if struct_dict[struct_name][mem_num]['const']:
+                sys.stdout.write("const ")
+            #if struct_dict[struct_name][mem_num]['struct']:
+            #    sys.stdout.write("struct ")
+            sys.stdout.write (struct_dict[struct_name][mem_num]['type'])
+            if struct_dict[struct_name][mem_num]['ptr']:
+                sys.stdout.write("*")
+            sys.stdout.write(" ")
+            sys.stdout.write(struct_dict[struct_name][mem_num]['name'])
+            if struct_dict[struct_name][mem_num]['array']:
+                sys.stdout.write("[")
+                sys.stdout.write(struct_dict[struct_name][mem_num]['array_size'])
+                sys.stdout.write("]")
+            sys.stdout.write(";\n")
+        sys.stdout.write("} ")
+        sys.stdout.write(typedef_fwd_dict[struct_name])
+        sys.stdout.write(";\n\n")
+
+#
+# TODO: Fix construction of struct name
+def get_struct_name_from_struct_type(struct_type):
+    # Note: All struct types are now camel-case
+    # Debug Report has an inconsistency - so need special case.
+    if ("VK_STRUCTURE_TYPE_DEBUG_REPORT_CREATE_INFO_EXT" == struct_type):
+        return "VkDebugReportCallbackCreateInfoEXT"
+    caps_struct_name = struct_type.replace("_STRUCTURE_TYPE", "")
+    char_idx = 0
+    struct_name = ''
+    for char in caps_struct_name:
+        if (0 == char_idx) or (caps_struct_name[char_idx-1] == '_'):
+            struct_name += caps_struct_name[char_idx]
+        elif (caps_struct_name[char_idx] == '_'):
+            pass
+        else:
+            struct_name += caps_struct_name[char_idx].lower()
+        char_idx += 1
+
+    return struct_name
+
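+# Worked example of the conversion above, using a real sType value:
+#     get_struct_name_from_struct_type('VK_STRUCTURE_TYPE_APPLICATION_INFO')
+# first drops '_STRUCTURE_TYPE' to get 'VK_APPLICATION_INFO', then keeps the
+# character after each '_' upper-case and lower-cases the rest, returning
+# 'VkApplicationInfo'.
+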
+# Emit an ifdef if incoming func matches a platform identifier
+def add_platform_wrapper_entry(entries, func):
+    if (re.match(r'.*Xlib.*', func)):
+        entries.append("#ifdef VK_USE_PLATFORM_XLIB_KHR")
+    if (re.match(r'.*Xcb.*', func)):
+        entries.append("#ifdef VK_USE_PLATFORM_XCB_KHR")
+    if (re.match(r'.*Wayland.*', func)):
+        entries.append("#ifdef VK_USE_PLATFORM_WAYLAND_KHR")
+    if (re.match(r'.*Mir.*', func)):
+        entries.append("#ifdef VK_USE_PLATFORM_MIR_KHR")
+    if (re.match(r'.*Android.*', func)):
+        entries.append("#ifdef VK_USE_PLATFORM_ANDROID_KHR")
+    if (re.match(r'.*Win32.*', func)):
+        entries.append("#ifdef VK_USE_PLATFORM_WIN32_KHR")
+
+# Emit an endif if incoming func matches a platform identifier
+def add_platform_wrapper_exit(entries, func):
+    if (re.match(r'.*Xlib.*', func)):
+        entries.append("#endif //VK_USE_PLATFORM_XLIB_KHR")
+    if (re.match(r'.*Xcb.*', func)):
+        entries.append("#endif //VK_USE_PLATFORM_XCB_KHR")
+    if (re.match(r'.*Wayland.*', func)):
+        entries.append("#endif //VK_USE_PLATFORM_WAYLAND_KHR")
+    if (re.match(r'.*Mir.*', func)):
+        entries.append("#endif //VK_USE_PLATFORM_MIR_KHR")
+    if (re.match(r'.*Android.*', func)):
+        entries.append("#endif //VK_USE_PLATFORM_ANDROID_KHR")
+    if (re.match(r'.*Win32.*', func)):
+        entries.append("#endif //VK_USE_PLATFORM_WIN32_KHR")
+
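+# Usage sketch for the pair of helpers above (the function name is a real
+# Win32-only entry point; output shown approximately):
+#     lines = []
+#     add_platform_wrapper_entry(lines, 'vkCreateWin32SurfaceKHR')
+#     lines.append('/* generated body */')
+#     add_platform_wrapper_exit(lines, 'vkCreateWin32SurfaceKHR')
+#     # lines == ['#ifdef VK_USE_PLATFORM_WIN32_KHR', '/* generated body */',
+#     #           '#endif //VK_USE_PLATFORM_WIN32_KHR']
+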
+# class for writing common file elements
+# Here's how this class lays out a file:
+#  COPYRIGHT
+#  HEADER
+#  BODY
+#  FOOTER
+#
+# For each of these sections, there's a "set*" function
+# The class as a whole has a generate function which will write each section in order
+class CommonFileGen:
+    def __init__(self, filename=None, copyright_txt="", header_txt="", body_txt="", footer_txt=""):
+        self.filename = filename
+        self.contents = {'copyright': copyright_txt, 'header': header_txt, 'body': body_txt, 'footer': footer_txt}
+        # TODO : Set a default copyright & footer at least
+
+    def setFilename(self, filename):
+        self.filename = filename
+
+    def setCopyright(self, c):
+        self.contents['copyright'] = c
+
+    def setHeader(self, h):
+        self.contents['header'] = h
+
+    def setBody(self, b):
+        self.contents['body'] = b
+
+    def setFooter(self, f):
+        self.contents['footer'] = f
+
+    def generate(self):
+        #print("Generate to file %s" % self.filename)
+        with open(self.filename, "w") as f:
+            f.write(self.contents['copyright'])
+            f.write(self.contents['header'])
+            f.write(self.contents['body'])
+            f.write(self.contents['footer'])
+
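+# Minimal usage sketch for CommonFileGen (the file name and section text are
+# hypothetical):
+#     gen = CommonFileGen("example_generated.h")
+#     gen.setCopyright("/* copyright */\n")
+#     gen.setHeader("#pragma once\n")
+#     gen.setBody("/* declarations */\n")
+#     gen.setFooter("/* footer */\n")
+#     gen.generate()  # writes the four sections, in order, to example_generated.h
+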
+# class for writing a wrapper class for structures
+# The wrapper class wraps the structs and includes utility functions for
+#  setting/getting member values and displaying the struct data in various formats
+class StructWrapperGen:
+    def __init__(self, in_struct_dict, in_opaque_types, prefix, out_dir):
+        self.struct_dict = in_struct_dict
+        self.opaque_types = in_opaque_types
+        self.include_headers = []
+        self.lineinfo = sourcelineinfo()
+        self.api = prefix
+        if prefix.lower() == "vulkan":
+            self.api_prefix = "vk"
+        else:
+            self.api_prefix = prefix
+        self.header_filename = os.path.join(out_dir, self.api_prefix+"_struct_wrappers.h")
+        self.class_filename = os.path.join(out_dir, self.api_prefix+"_struct_wrappers.cpp")
+        self.safe_struct_header_filename = os.path.join(out_dir, self.api_prefix+"_safe_struct.h")
+        self.safe_struct_source_filename = os.path.join(out_dir, self.api_prefix+"_safe_struct.cpp")
+        self.string_helper_filename = os.path.join(out_dir, self.api_prefix+"_struct_string_helper.h")
+        self.string_helper_no_addr_filename = os.path.join(out_dir, self.api_prefix+"_struct_string_helper_no_addr.h")
+        self.string_helper_cpp_filename = os.path.join(out_dir, self.api_prefix+"_api_dump_helper_cpp.h")
+        self.string_helper_no_addr_cpp_filename = os.path.join(out_dir, self.api_prefix+"_struct_string_helper_no_addr_cpp.h")
+        self.validate_helper_filename = os.path.join(out_dir, self.api_prefix+"_struct_validate_helper.h")
+        self.no_addr = False
+        # Safe Struct (ss) header and source files
+        self.ssh = CommonFileGen(self.safe_struct_header_filename)
+        self.sss = CommonFileGen(self.safe_struct_source_filename)
+        self.hfg = CommonFileGen(self.header_filename)
+        self.cfg = CommonFileGen(self.class_filename)
+        self.shg = CommonFileGen(self.string_helper_filename)
+        self.shcppg = CommonFileGen(self.string_helper_cpp_filename)
+        self.vhg = CommonFileGen(self.validate_helper_filename)
+        self.size_helper_filename = os.path.join(out_dir, self.api_prefix+"_struct_size_helper.h")
+        self.size_helper_c_filename = os.path.join(out_dir, self.api_prefix+"_struct_size_helper.c")
+        self.size_helper_gen = CommonFileGen(self.size_helper_filename)
+        self.size_helper_c_gen = CommonFileGen(self.size_helper_c_filename)
+        #print(self.header_filename)
+        self.header_txt = ""
+        self.definition_txt = ""
+
+    def set_include_headers(self, include_headers):
+        self.include_headers = include_headers
+
+    def set_no_addr(self, no_addr):
+        self.no_addr = no_addr
+        if self.no_addr:
+            self.shg = CommonFileGen(self.string_helper_no_addr_filename)
+            self.shcppg = CommonFileGen(self.string_helper_no_addr_cpp_filename)
+        else:
+            self.shg = CommonFileGen(self.string_helper_filename)
+            self.shcppg = CommonFileGen(self.string_helper_cpp_filename)
+
+    # Return class name for given struct name
+    def get_class_name(self, struct_name):
+        class_name = struct_name.strip('_').lower() + "_struct_wrapper"
+        return class_name
+
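+    # e.g. get_class_name('VkImageCreateInfo') -> 'vkimagecreateinfo_struct_wrapper'
+    # (the struct name is simply lower-cased and given a fixed suffix)
+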
+    def get_file_list(self):
+        return [os.path.basename(self.header_filename), os.path.basename(self.class_filename), os.path.basename(self.string_helper_filename)]
+
+    # Generate class header file
+    def generateHeader(self):
+        self.hfg.setCopyright(self._generateCopyright())
+        self.hfg.setHeader(self._generateHeader())
+        self.hfg.setBody(self._generateClassDeclaration())
+        self.hfg.setFooter(self._generateFooter())
+        self.hfg.generate()
+
+    # Generate class definition
+    def generateBody(self):
+        self.cfg.setCopyright(self._generateCopyright())
+        self.cfg.setHeader(self._generateCppHeader())
+        self.cfg.setBody(self._generateClassDefinition())
+        self.cfg.setFooter(self._generateFooter())
+        self.cfg.generate()
+
+    # Safe Structs are versions of Vulkan structs with non-const safe ptrs
+    #  that make shadowing structures and clean-up of shadowed structures very simple
+    def generateSafeStructHeader(self):
+        self.ssh.setCopyright(self._generateCopyright())
+        self.ssh.setHeader(self._generateSafeStructHeader())
+        self.ssh.setBody(self._generateSafeStructDecls())
+        self.ssh.generate()
+
+    def generateSafeStructs(self):
+        self.sss.setCopyright(self._generateCopyright())
+        self.sss.setHeader(self._generateSafeStructSourceHeader())
+        self.sss.setBody(self._generateSafeStructSource())
+        self.sss.generate()
+
+    # Generate c-style .h file that contains functions for printing structs
+    def generateStringHelper(self):
+        print("Generating struct string helper")
+        self.shg.setCopyright(self._generateCopyright())
+        self.shg.setHeader(self._generateStringHelperHeader())
+        self.shg.setBody(self._generateStringHelperFunctions())
+        self.shg.generate()
+
+    # Generate cpp-style .h file that contains functions for printing structs
+    def generateStringHelperCpp(self):
+        print("Generating struct string helper cpp")
+        self.shcppg.setCopyright(self._generateCopyright())
+        self.shcppg.setHeader(self._generateStringHelperHeaderCpp())
+        self.shcppg.setBody(self._generateStringHelperFunctionsCpp())
+        self.shcppg.generate()
+
+    # Generate c-style .h file that contains functions for printing structs
+    def generateValidateHelper(self):
+        print("Generating struct validate helper")
+        self.vhg.setCopyright(self._generateCopyright())
+        self.vhg.setHeader(self._generateValidateHelperHeader())
+        self.vhg.setBody(self._generateValidateHelperFunctions())
+        self.vhg.generate()
+
+    def generateSizeHelper(self):
+        print("Generating struct size helper")
+        self.size_helper_gen.setCopyright(self._generateCopyright())
+        self.size_helper_gen.setHeader(self._generateSizeHelperHeader())
+        self.size_helper_gen.setBody(self._generateSizeHelperFunctions())
+        self.size_helper_gen.generate()
+
+    def generateSizeHelperC(self):
+        print("Generating struct size helper c")
+        self.size_helper_c_gen.setCopyright(self._generateCopyright())
+        self.size_helper_c_gen.setHeader(self._generateSizeHelperHeaderC())
+        self.size_helper_c_gen.setBody(self._generateSizeHelperFunctionsC())
+        self.size_helper_c_gen.generate()
+
+    def _generateCopyright(self):
+        copyright = []
+        copyright.append('/* THIS FILE IS GENERATED.  DO NOT EDIT. */')
+        copyright.append('')
+        copyright.append('/*')
+        copyright.append(' * Vulkan')
+        copyright.append(' *')
+        copyright.append(' * Copyright (c) 2015-2016 The Khronos Group Inc.')
+        copyright.append(' * Copyright (c) 2015-2016 Valve Corporation.')
+        copyright.append(' * Copyright (c) 2015-2016 LunarG, Inc.')
+        copyright.append(' * Copyright (c) 2015-2016 Google Inc.')
+        copyright.append(' *')
+        copyright.append(' * Licensed under the Apache License, Version 2.0 (the "License");')
+        copyright.append(' * you may not use this file except in compliance with the License.')
+        copyright.append(' * You may obtain a copy of the License at')
+        copyright.append(' *')
+        copyright.append(' *     http://www.apache.org/licenses/LICENSE-2.0')
+        copyright.append(' *')
+        copyright.append(' * Unless required by applicable law or agreed to in writing, software')
+        copyright.append(' * distributed under the License is distributed on an "AS IS" BASIS,')
+        copyright.append(' * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.')
+        copyright.append(' * See the License for the specific language governing permissions and')
+        copyright.append(' * limitations under the License.')
+        copyright.append(' *')
+        copyright.append(' * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>')
+        copyright.append(' * Author: Tobin Ehlis <tobin@lunarg.com>')
+        copyright.append(' */')
+        copyright.append('')
+        return "\n".join(copyright)
+
+    def _generateCppHeader(self):
+        header = []
+        header.append("//#includes, #defines, globals and such...\n")
+        header.append("#include <stdio.h>\n#include <%s>\n#include <%s_enum_string_helper.h>\n" % (os.path.basename(self.header_filename), self.api_prefix))
+        return "".join(header)
+
+    def _generateClassDefinition(self):
+        class_def = []
+        if 'vk' == self.api:
+            class_def.append(self._generateDynamicPrintFunctions())
+        for s in sorted(self.struct_dict):
+            class_def.append("\n// %s class definition" % self.get_class_name(s))
+            class_def.append(self._generateConstructorDefinitions(s))
+            class_def.append(self._generateDestructorDefinitions(s))
+            class_def.append(self._generateDisplayDefinitions(s))
+        return "\n".join(class_def)
+
+    def _generateConstructorDefinitions(self, s):
+        con_defs = []
+        con_defs.append("%s::%s() : m_struct(), m_indent(0), m_dummy_prefix('\\0'), m_origStructAddr(NULL) {}" % (self.get_class_name(s), self.get_class_name(s)))
+        # TODO : This is a shallow copy of ptrs
+        con_defs.append("%s::%s(%s* pInStruct) : m_indent(0), m_dummy_prefix('\\0')\n{\n    m_struct = *pInStruct;\n    m_origStructAddr = pInStruct;\n}" % (self.get_class_name(s), self.get_class_name(s), typedef_fwd_dict[s]))
+        con_defs.append("%s::%s(const %s* pInStruct) : m_indent(0), m_dummy_prefix('\\0')\n{\n    m_struct = *pInStruct;\n    m_origStructAddr = pInStruct;\n}" % (self.get_class_name(s), self.get_class_name(s), typedef_fwd_dict[s]))
+        return "\n".join(con_defs)
+
+    def _generateDestructorDefinitions(self, s):
+        return "%s::~%s() {}" % (self.get_class_name(s), self.get_class_name(s))
+
+    def _generateDynamicPrintFunctions(self):
+        dp_funcs = []
+        dp_funcs.append("\nvoid dynamic_display_full_txt(const void* pStruct, uint32_t indent)\n{\n    // Cast to APP_INFO ptr initially just to pull sType off struct")
+        dp_funcs.append("    VkStructureType sType = ((VkApplicationInfo*)pStruct)->sType;\n")
+        dp_funcs.append("    switch (sType)\n    {")
+        for e in enum_type_dict:
+            class_num = 0
+            if "StructureType" in e:
+                for v in sorted(enum_type_dict[e]):
+                    struct_name = get_struct_name_from_struct_type(v)
+                    if struct_name not in self.struct_dict:
+                        continue
+
+                    class_name = self.get_class_name(struct_name)
+                    instance_name = "swc%i" % class_num
+                    dp_funcs.append("        case %s:\n        {" % (v))
+                    dp_funcs.append("            %s %s((%s*)pStruct);" % (class_name, instance_name, struct_name))
+                    dp_funcs.append("            %s.set_indent(indent);" % (instance_name))
+                    dp_funcs.append("            %s.display_full_txt();" % (instance_name))
+                    dp_funcs.append("        }")
+                    dp_funcs.append("        break;")
+                    class_num += 1
+                dp_funcs.append("    }")
+        dp_funcs.append("}\n")
+        return "\n".join(dp_funcs)
+
+    def _get_func_name(self, struct, mid_str):
+        return "%s_%s_%s" % (self.api_prefix, mid_str, struct.lower().strip("_"))
+
+    def _get_sh_func_name(self, struct):
+        return self._get_func_name(struct, 'print')
+
+    def _get_vh_func_name(self, struct):
+        return self._get_func_name(struct, 'validate')
+
+    def _get_size_helper_func_name(self, struct):
+        return self._get_func_name(struct, 'size')
+
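+    # Naming sketch: with the default 'vk' prefix these helpers produce, e.g.,
+    #     self._get_sh_func_name('VkImageCreateInfo')          -> 'vk_print_vkimagecreateinfo'
+    #     self._get_vh_func_name('VkImageCreateInfo')          -> 'vk_validate_vkimagecreateinfo'
+    #     self._get_size_helper_func_name('VkImageCreateInfo') -> 'vk_size_vkimagecreateinfo'
+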
+    # Return elements to create formatted string for given struct member
+    def _get_struct_print_formatted(self, struct_member, pre_var_name="prefix", postfix = "\\n", struct_var_name="pStruct", struct_ptr=True, print_array=False):
+        struct_op = "->"
+        if not struct_ptr:
+            struct_op = "."
+        member_name = struct_member['name']
+        print_type = "p"
+        cast_type = ""
+        member_post = ""
+        array_index = ""
+        member_print_post = ""
+        print_delimiter = "%"
+        if struct_member['array'] and 'char' in struct_member['type'].lower(): # just print char array as string
+            if member_name.startswith('pp'): # TODO : Only printing first element of dynamic array of char* for now
+                member_post = "[0]"
+            print_type = "s"
+            print_array = False
+        elif struct_member['array'] and not print_array:
+            # Just print base address of array when not full print_array
+            print_delimiter = "0x%"
+            cast_type = "(void*)"
+        elif is_type(struct_member['type'], 'enum'):
+            cast_type = "string_%s" % struct_member['type']
+            if struct_member['ptr']:
+                struct_var_name = "*" + struct_var_name
+                print_delimiter = "0x%"
+            print_type = "s"
+        elif is_type(struct_member['type'], 'struct'): # print struct address for now
+            print_delimiter = "0x%"
+            cast_type = "(void*)"
+            if not struct_member['ptr']:
+                cast_type = "(void*)&"
+        elif 'bool' in struct_member['type'].lower():
+            print_type = "s"
+            member_post = ' ? "TRUE" : "FALSE"'
+        elif 'float' in struct_member['type']:
+            print_type = "f"
+        elif 'uint64' in struct_member['type'] or 'gpusize' in struct_member['type'].lower():
+            print_type = '" PRId64 "'
+        elif 'uint8' in struct_member['type']:
+            print_type = "hu"
+        elif 'size' in struct_member['type'].lower():
+            print_type = '" PRINTF_SIZE_T_SPECIFIER "'
+            print_delimiter = ""
+        elif any(ui_str in struct_member['type'].lower() for ui_str in ['uint', 'flags', 'samplemask']):
+            print_type = "u"
+        elif 'int' in struct_member['type']:
+            print_type = "i"
+        elif struct_member['ptr']:
+            print_delimiter = "0x%"
+        else:
+            #print("Unhandled struct type: %s" % struct_member['type'])
+            print_delimiter = "0x%"
+            cast_type = "(void*)"
+        if print_array and struct_member['array']:
+            member_print_post = "[%u]"
+            array_index = " i,"
+            member_post = "[i]"
+        print_out = "%%s%s%s = %s%s%s" % (member_name, member_print_post, print_delimiter, print_type, postfix) # section of print that goes inside of quotes
+        print_arg = ", %s,%s %s(%s%s%s)%s" % (pre_var_name, array_index, cast_type, struct_var_name, struct_op, member_name, member_post) # section of print passed to portion in quotes
+        if self.no_addr and "p" == print_type:
+            print_out = "%%s%s%s = addr\\n" % (member_name, member_print_post) # section of print that goes inside of quotes
+            print_arg = ", %s" % (pre_var_name)
+        return (print_out, print_arg)
+
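+    # Worked example for the branches above (illustrative member record):
+    #     {'name': 'flags', 'type': 'VkImageCreateFlags',
+    #      'full_type': 'VkImageCreateFlags', 'array': False, 'ptr': False}
+    # matches the flags case, so print_type is 'u' and the returned pair is roughly
+    #     ('%sflags = %u\\n', ', prefix, (pStruct->flags)')
+    # i.e. a fragment for the snprintf format string plus its argument list.
+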
+    def _generateStringHelperFunctions(self):
+        sh_funcs = []
+        # We do two passes; the first pass just generates prototypes for all the functions
+        for s in sorted(self.struct_dict):
+            sh_funcs.append('char* %s(const %s* pStruct, const char* prefix);' % (self._get_sh_func_name(s), typedef_fwd_dict[s]))
+        sh_funcs.append('')
+        sh_funcs.append('#if defined(_WIN32)')
+        sh_funcs.append('// Microsoft did not implement C99 in Visual Studio, but started adding it with')
+        sh_funcs.append('// VS2013.  However, VS2013 still did not have snprintf().  The following is a')
+        sh_funcs.append('// work-around.')
+        sh_funcs.append('#define snprintf _snprintf')
+        sh_funcs.append('#endif // _WIN32\n')
+        for s in sorted(self.struct_dict):
+            p_out = ""
+            p_args = ""
+            stp_list = [] # stp == "struct to print": the members of this struct that should be printed as structs
+            # This pre-pass flags embedded structs and pNext
+            for m in sorted(self.struct_dict[s]):
+                if 'pNext' == self.struct_dict[s][m]['name'] or is_type(self.struct_dict[s][m]['type'], 'struct'):
+                    stp_list.append(self.struct_dict[s][m])
+            sh_funcs.append('char* %s(const %s* pStruct, const char* prefix)\n{\n    char* str;' % (self._get_sh_func_name(s), typedef_fwd_dict[s]))
+            sh_funcs.append("    size_t len;")
+            num_stps = len(stp_list)
+            total_strlen_str = ''
+            if 0 != num_stps:
+                sh_funcs.append("    char* tmpStr;")
+                sh_funcs.append('    char* extra_indent = (char*)malloc(strlen(prefix) + 3);')
+                sh_funcs.append('    strcpy(extra_indent, "  ");')
+                sh_funcs.append('    strncat(extra_indent, prefix, strlen(prefix));')
+                sh_funcs.append('    char* stp_strs[%i];' % num_stps)
+                for index in range(num_stps):
+                    # If it's an array, print all of the elements
+                    # If it's a ptr, print thing it's pointing to
+                    # Non-ptr struct case. Print the struct using its address
+                    struct_deref = '&'
+                    if 1 < stp_list[index]['full_type'].count('*'):
+                        struct_deref = ''
+                    if (stp_list[index]['ptr']):
+                        sh_funcs.append('    if (pStruct->%s) {' % stp_list[index]['name'])
+                        if 'pNext' == stp_list[index]['name']:
+                            sh_funcs.append('        tmpStr = dynamic_display((void*)pStruct->pNext, prefix);')
+                            sh_funcs.append('        len = 256+strlen(tmpStr);')
+                            sh_funcs.append('        stp_strs[%i] = (char*)malloc(len);' % index)
+                            if self.no_addr:
+                                sh_funcs.append('        snprintf(stp_strs[%i], len, " %%spNext (addr)\\n%%s", prefix, tmpStr);' % index)
+                            else:
+                                sh_funcs.append('        snprintf(stp_strs[%i], len, " %%spNext (0x%%p)\\n%%s", prefix, (void*)pStruct->pNext, tmpStr);' % index)
+                            sh_funcs.append('        free(tmpStr);')
+                        else:
+                            if stp_list[index]['name'] in ['pImageViews', 'pBufferViews']:
+                                # TODO : This is a quick hack to handle these arrays of ptrs
+                                sh_funcs.append('        tmpStr = %s(&pStruct->%s[0], extra_indent);' % (self._get_sh_func_name(stp_list[index]['type']), stp_list[index]['name']))
+                            else:
+                                sh_funcs.append('        tmpStr = %s(pStruct->%s, extra_indent);' % (self._get_sh_func_name(stp_list[index]['type']), stp_list[index]['name']))
+                            sh_funcs.append('        len = 256+strlen(tmpStr)+strlen(prefix);')
+                            sh_funcs.append('        stp_strs[%i] = (char*)malloc(len);' % (index))
+                            if self.no_addr:
+                                sh_funcs.append('        snprintf(stp_strs[%i], len, " %%s%s (addr)\\n%%s", prefix, tmpStr);' % (index, stp_list[index]['name']))
+                            else:
+                                sh_funcs.append('        snprintf(stp_strs[%i], len, " %%s%s (0x%%p)\\n%%s", prefix, (void*)pStruct->%s, tmpStr);' % (index, stp_list[index]['name'], stp_list[index]['name']))
+                        sh_funcs.append('    }')
+                        sh_funcs.append("    else\n        stp_strs[%i] = \"\";" % (index))
+                    elif stp_list[index]['array']:
+                        sh_funcs.append('    tmpStr = %s(&pStruct->%s[0], extra_indent);' % (self._get_sh_func_name(stp_list[index]['type']), stp_list[index]['name']))
+                        sh_funcs.append('    len = 256+strlen(tmpStr);')
+                        sh_funcs.append('    stp_strs[%i] = (char*)malloc(len);' % (index))
+                        if self.no_addr:
+                            sh_funcs.append('    snprintf(stp_strs[%i], len, " %%s%s[0] (addr)\\n%%s", prefix, tmpStr);' % (index, stp_list[index]['name']))
+                        else:
+                            sh_funcs.append('    snprintf(stp_strs[%i], len, " %%s%s[0] (0x%%p)\\n%%s", prefix, (void*)&pStruct->%s[0], tmpStr);' % (index, stp_list[index]['name'], stp_list[index]['name']))
+                    else:
+                        sh_funcs.append('    tmpStr = %s(&pStruct->%s, extra_indent);' % (self._get_sh_func_name(stp_list[index]['type']), stp_list[index]['name']))
+                        sh_funcs.append('    len = 256+strlen(tmpStr);')
+                        sh_funcs.append('    stp_strs[%i] = (char*)malloc(len);' % (index))
+                        if self.no_addr:
+                            sh_funcs.append('    snprintf(stp_strs[%i], len, " %%s%s (addr)\\n%%s", prefix, tmpStr);' % (index, stp_list[index]['name']))
+                        else:
+                            sh_funcs.append('    snprintf(stp_strs[%i], len, " %%s%s (0x%%p)\\n%%s", prefix, (void*)&pStruct->%s, tmpStr);' % (index, stp_list[index]['name'], stp_list[index]['name']))
+                    total_strlen_str += 'strlen(stp_strs[%i]) + ' % index
+            sh_funcs.append('    len = %ssizeof(char)*1024;' % (total_strlen_str))
+            sh_funcs.append('    str = (char*)malloc(len);')
+            sh_funcs.append('    snprintf(str, len, "')
+            for m in sorted(self.struct_dict[s]):
+                (p_out1, p_args1) = self._get_struct_print_formatted(self.struct_dict[s][m])
+                p_out += p_out1
+                p_args += p_args1
+            p_out += '"'
+            p_args += ");"
+            sh_funcs[-1] = '%s%s%s' % (sh_funcs[-1], p_out, p_args)
+            if 0 != num_stps:
+                sh_funcs.append('    for (int32_t stp_index = %i; stp_index >= 0; stp_index--) {' % (num_stps-1))
+                sh_funcs.append('        if (0 < strlen(stp_strs[stp_index])) {')
+                sh_funcs.append('            strncat(str, stp_strs[stp_index], strlen(stp_strs[stp_index]));')
+                sh_funcs.append('            free(stp_strs[stp_index]);')
+                sh_funcs.append('        }')
+                sh_funcs.append('    }')
+                sh_funcs.append('    free(extra_indent);')
+            sh_funcs.append("    return str;\n}")
+        # Add function to dynamically print out unknown struct
+        sh_funcs.append("char* dynamic_display(const void* pStruct, const char* prefix)\n{")
+        sh_funcs.append("    // Cast to APP_INFO ptr initially just to pull sType off struct")
+        sh_funcs.append("    if (pStruct == NULL) {")
+        sh_funcs.append("        return NULL;")
+        sh_funcs.append("    }")
+        sh_funcs.append("    VkStructureType sType = ((VkApplicationInfo*)pStruct)->sType;")
+        sh_funcs.append('    char indent[100];\n    strcpy(indent, "    ");\n    strcat(indent, prefix);')
+        sh_funcs.append("    switch (sType)\n    {")
+        for e in enum_type_dict:
+            if "StructureType" in e:
+                for v in sorted(enum_type_dict[e]):
+                    struct_name = get_struct_name_from_struct_type(v)
+                    if struct_name not in self.struct_dict:
+                        continue
+                    print_func_name = self._get_sh_func_name(struct_name)
+                    sh_funcs.append('        case %s:\n        {' % (v))
+                    sh_funcs.append('            return %s((%s*)pStruct, indent);' % (print_func_name, struct_name))
+                    sh_funcs.append('        }')
+                    sh_funcs.append('        break;')
+                sh_funcs.append("        default:")
+                sh_funcs.append("        return NULL;")
+                sh_funcs.append("    }")
+        sh_funcs.append("}")
+        return "\n".join(sh_funcs)
+
+    def _generateStringHelperFunctionsCpp(self):
+        # declare str & tmp str
+        # declare array of stringstreams for every struct ptr in current struct
+        # declare array of stringstreams for every non-string element in current struct
+        # For every struct ptr, if non-Null, then set its string, else set to NULL str
+        # For every non-string element, set its string stream
+        # create and return final string
+        sh_funcs = []
+        # First generate prototypes for every struct
+        lineinfo = sourcelineinfo()
+        sh_funcs.append('%s' % lineinfo.get())
+        for s in sorted(self.struct_dict):
+            # Wrap this in platform check since it may contain undefined structs or functions
+            if not self.struct_dict[s]:
+                continue
+            add_platform_wrapper_entry(sh_funcs, typedef_fwd_dict[s])
+            sh_funcs.append('std::string %s(const %s* pStruct, const std::string prefix);' % (self._get_sh_func_name(s), typedef_fwd_dict[s]))
+            add_platform_wrapper_exit(sh_funcs, typedef_fwd_dict[s])
+
+        sh_funcs.append('\n')
+        sh_funcs.append('%s' % lineinfo.get())
+        for s in sorted(self.struct_dict):
+            if not self.struct_dict[s]:
+                continue
+            num_non_enum_elems = [(is_type(self.struct_dict[s][elem]['type'], 'enum') and not self.struct_dict[s][elem]['ptr']) for elem in self.struct_dict[s]].count(False)
+            stp_list = [] # stp == "struct to print": the members of this struct that should be printed as structs
+            # This pre-pass flags embedded structs and pNext
+            for m in sorted(self.struct_dict[s]):
+                if 'pNext' == self.struct_dict[s][m]['name'] or is_type(self.struct_dict[s][m]['type'], 'struct') or self.struct_dict[s][m]['array']:
+                    # TODO: This is a tmp workaround
+                    if 'ppActiveLayerNames' not in self.struct_dict[s][m]['name']:
+                        stp_list.append(self.struct_dict[s][m])
+            sh_funcs.append('%s' % lineinfo.get())
+
+            # Wrap this in platform check since it may contain undefined structs or functions
+            add_platform_wrapper_entry(sh_funcs, typedef_fwd_dict[s])
+
+            sh_funcs.append('std::string %s(const %s* pStruct, const std::string prefix)\n{' % (self._get_sh_func_name(s), typedef_fwd_dict[s]))
+            sh_funcs.append('%s' % lineinfo.get())
+            indent = '    '
+            sh_funcs.append('%susing namespace StreamControl;' % (indent))
+            sh_funcs.append('%susing namespace std;' % (indent))
+            sh_funcs.append('%sstring final_str;' % (indent))
+            sh_funcs.append('%sstring tmp_str;' % (indent))
+            sh_funcs.append('%sstring extra_indent = "  " + prefix;' % (indent))
+            if (0 != num_non_enum_elems):
+                sh_funcs.append('%sstringstream ss[%u];' % (indent, num_non_enum_elems))
+            num_stps = len(stp_list)
+            # First generate code for any embedded structs or arrays
+            if 0 < num_stps:
+                sh_funcs.append('%sstring stp_strs[%u];' % (indent, num_stps))
+                idx_ss_decl = False # Make sure to only decl this once
+                for index in range(num_stps):
+                    if (stp_list[index]['array'] and 'char' == stp_list[index]['type']) or \
+                        (not stp_list[index]['ptr'] and stp_list[index]['type'] in self.opaque_types):
+                        continue
+                    if stp_list[index]['array'] and 'char' != stp_list[index]['type']:
+                        sh_funcs.append('%s' % lineinfo.get())
+                        if stp_list[index]['dyn_array']:
+                            sh_funcs.append('%s' % lineinfo.get())
+                            array_count = 'pStruct->%s' % (stp_list[index]['array_size'])
+                        else:
+                            sh_funcs.append('%s' % lineinfo.get())
+                            array_count = '%s' % (stp_list[index]['array_size'])
+                        sh_funcs.append('%s' % lineinfo.get())
+                        sh_funcs.append('%sstp_strs[%u] = "";' % (indent, index))
+                        if not idx_ss_decl:
+                            sh_funcs.append('%sstringstream index_ss;' % (indent))
+                            idx_ss_decl = True
+                        if (stp_list[index]['name'] == 'pQueueFamilyIndices'):
+                            if (typedef_fwd_dict[s] == 'VkSwapchainCreateInfoKHR'):
+                                sh_funcs.append('%sif (pStruct->imageSharingMode == VK_SHARING_MODE_CONCURRENT) {' % (indent))
+                            else:
+                                sh_funcs.append('%sif (pStruct->sharingMode == VK_SHARING_MODE_CONCURRENT) {' % (indent))
+                            indent += '    '
+                        if (stp_list[index]['name'] == 'pImageInfo'):
+                            sh_funcs.append('%sif ((pStruct->descriptorType == VK_DESCRIPTOR_TYPE_SAMPLER)                ||' % (indent))
+                            sh_funcs.append('%s    (pStruct->descriptorType == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER) ||' % (indent))
+                            sh_funcs.append('%s    (pStruct->descriptorType == VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE)          ||' % (indent))
+                            sh_funcs.append('%s    (pStruct->descriptorType == VK_DESCRIPTOR_TYPE_STORAGE_IMAGE))           {' % (indent))
+                            indent += '    '
+                        elif (stp_list[index]['name'] == 'pBufferInfo'):
+                            sh_funcs.append('%sif ((pStruct->descriptorType == VK_DESCRIPTOR_TYPE_STORAGE_BUFFER)         ||' % (indent))
+                            sh_funcs.append('%s    (pStruct->descriptorType == VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER)         ||' % (indent))
+                            sh_funcs.append('%s    (pStruct->descriptorType == VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC) ||' % (indent))
+                            sh_funcs.append('%s    (pStruct->descriptorType == VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC))  {' % (indent))
+                            indent += '    '
+                        elif (stp_list[index]['name'] == 'pTexelBufferView'):
+                            sh_funcs.append('%sif ((pStruct->descriptorType == VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER) ||' % (indent))
+                            sh_funcs.append('%s    (pStruct->descriptorType == VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER))  {' % (indent))
+                            indent += '    '
+                        if stp_list[index]['dyn_array']:
+                            sh_funcs.append('%sif (pStruct->%s) {' % (indent, stp_list[index]['name']))
+                            indent += '    '
+                        sh_funcs.append('%sfor (uint32_t i = 0; i < %s; i++) {' % (indent, array_count))
+                        indent += '    '
+                        sh_funcs.append('%sindex_ss.str("");' % (indent))
+                        sh_funcs.append('%sindex_ss << i;' % (indent))
+                        if is_type(stp_list[index]['type'], 'enum'):
+                            sh_funcs.append('%s' % lineinfo.get())
+                            #value_print = 'string_%s(%spStruct->%s)' % (self.struct_dict[s][m]['type'], deref, self.struct_dict[s][m]['name'])
+                            sh_funcs.append('%sss[%u] << string_%s(pStruct->%s[i]);' % (indent, index, stp_list[index]['type'], stp_list[index]['name']))
+                            sh_funcs.append('%sstp_strs[%u] += " " + prefix + "%s[" + index_ss.str() + "] = " + ss[%u].str() + "\\n";' % (indent, index, stp_list[index]['name'], index))
+                        elif is_type(stp_list[index]['type'], 'struct'):
+                            sh_funcs.append('%s' % lineinfo.get())
+                            if stp_list[index]['type'] not in self.opaque_types:
+                                sh_funcs.append('%stmp_str = %s(&pStruct->%s[i], extra_indent);' % (indent, self._get_sh_func_name(stp_list[index]['type']), stp_list[index]['name']))
+                                sh_funcs.append('%s' % lineinfo.get())
+                                sh_funcs.append('%sstp_strs[%u] += " " + prefix + "%s[" + index_ss.str() + "]:\\n" + tmp_str;' % (indent, index, stp_list[index]['name']))
+                            else:
+                                sh_funcs.append('%s' % lineinfo.get())
+                                sh_funcs.append('%sss[%u] << "0x" << hex << nouppercase << HandleCast(pStruct->%s[i]) << dec;' % (indent, index, stp_list[index]['name']))
+                                sh_funcs.append('%sstp_strs[%u] += " " + prefix + "%s[" + index_ss.str() + "] = " + ss[%u].str() + "\\n";' % (indent, index, stp_list[index]['name'], index))
+                        else:
+                            sh_funcs.append('%s' % lineinfo.get())
+                            if stp_list[index]['ptr'] or 'UUID' in stp_list[index]['name']:
+                                if 'uint8' in stp_list[index]['type']:
+                                    sh_funcs.append('%sss[%u] << "0x" << hex << nouppercase << static_cast<const unsigned>(pStruct->%s[i]) << dec;' % (indent, index, stp_list[index]['name']))
+                                else:
+                                    sh_funcs.append('%sss[%u] << "0x" << hex << nouppercase << pStruct->%s[i] << dec;' % (indent, index, stp_list[index]['name']))
+                            else:
+                                sh_funcs.append('%sss[%u] << pStruct->%s[i];' % (indent, index, stp_list[index]['name']))
+                            if stp_list[index]['type'] in vulkan.core.objects:
+                                sh_funcs.append('%sstp_strs[%u] += " " + prefix + "%s[" + index_ss.str() + "].handle = " + ss[%u].str() + "\\n";' % (indent, index, stp_list[index]['name'], index))
+                            else:
+                                sh_funcs.append('%sstp_strs[%u] += " " + prefix + "%s[" + index_ss.str() + "] = " + ss[%u].str() + "\\n";' % (indent, index, stp_list[index]['name'], index))
+                        sh_funcs.append('%s' % lineinfo.get())
+                        sh_funcs.append('%sss[%u].str("");' % (indent, index))
+                        indent = indent[4:]
+                        sh_funcs.append('%s}' % (indent))
+                        if stp_list[index]['dyn_array']:
+                            indent = indent[4:]
+                            sh_funcs.append('%s}' % (indent))
+                        #endif
+                        if (stp_list[index]['name'] == 'pQueueFamilyIndices') or (stp_list[index]['name'] == 'pImageInfo') or (stp_list[index]['name'] == 'pBufferInfo') or (stp_list[index]['name'] == 'pTexelBufferView'):
+                            indent = indent[4:]
+                            sh_funcs.append('%s}' % (indent))
+                    elif (stp_list[index]['ptr']):
+                        sh_funcs.append('%s' % lineinfo.get())
+                        sh_funcs.append('%sif (pStruct->%s) {' % (indent, stp_list[index]['name']))
+                        indent += '    '
+                        if 'pNext' == stp_list[index]['name']:
+                            sh_funcs.append('%s' % lineinfo.get())
+                            sh_funcs.append('        tmp_str = dynamic_display((void*)pStruct->pNext, prefix);')
+                        else:
+                            if stp_list[index]['name'] in ['pImageViews', 'pBufferViews']:
+                                # TODO : This is a quick hack to handle these arrays of ptrs
+                                sh_funcs.append('%s' % lineinfo.get())
+                                sh_funcs.append('        tmp_str = %s(&pStruct->%s[0], extra_indent);' % (self._get_sh_func_name(stp_list[index]['type']), stp_list[index]['name']))
+                            else:
+                                sh_funcs.append('%s' % lineinfo.get())
+                                sh_funcs.append('        tmp_str = %s(pStruct->%s, extra_indent);' % (self._get_sh_func_name(stp_list[index]['type']), stp_list[index]['name']))
+                        sh_funcs.append('%s' % lineinfo.get())
+                        sh_funcs.append('        stp_strs[%u] = " " + prefix + "%s:\\n" + tmp_str;' % (index, stp_list[index]['name']))
+                        sh_funcs.append('        ss[%u].str("");' % (index))
+                        sh_funcs.append('    }')
+                        sh_funcs.append('    else')
+                        sh_funcs.append('        stp_strs[%u] = "";' % index)
+                        indent = indent[4:]
+                    else:
+                        sh_funcs.append('%s' % lineinfo.get())
+                        if stp_list[index]['type'] not in self.opaque_types:
+                            sh_funcs.append('    tmp_str = %s(&pStruct->%s, extra_indent);' % (self._get_sh_func_name(stp_list[index]['type']), stp_list[index]['name']))
+                            if self.no_addr:
+                                sh_funcs.append('%s' % lineinfo.get())
+                                sh_funcs.append('    stp_strs[%u] = " " + prefix + "%s:\\n" + tmp_str;' % (index, stp_list[index]['name']))
+                            else:
+                                sh_funcs.append('%s' % lineinfo.get())
+                                sh_funcs.append('    stp_strs[%u] = " " + prefix + "%s:" + "\\n" + tmp_str;' % (index, stp_list[index]['name']))
+                        else:
+                            sh_funcs.append('    ss[%u] << "0x" << hex << nouppercase << pStruct->%s << dec;' % (index, stp_list[index]['name']))
+            # Now print one-line info for all data members
+            index = 0
+            final_str = []
+            for m in sorted(self.struct_dict[s]):
+                if not is_type(self.struct_dict[s][m]['type'], 'enum'):
+                    if is_type(self.struct_dict[s][m]['type'], 'struct') and not self.struct_dict[s][m]['ptr']:
+                        if self.no_addr:
+                            sh_funcs.append('%s' % lineinfo.get())
+                            sh_funcs.append('    ss[%u].str("addr");' % (index))
+                        elif self.struct_dict[s][m]['type'] in self.opaque_types:
+                            sh_funcs.append('%s' % lineinfo.get())
+                            sh_funcs.append('    ss[%u] << "0x" << hex << nouppercase << HandleCast(pStruct->%s) << dec;' % (index, self.struct_dict[s][m]['name']))
+                        else:
+                            sh_funcs.append('%s' % lineinfo.get())
+                            sh_funcs.append('    ss[%u] << "0x" << hex << nouppercase << HandleCast(&pStruct->%s) << dec;' % (index, self.struct_dict[s][m]['name']))
+                    elif self.struct_dict[s][m]['array']:
+                        if 'char' == self.struct_dict[s][m]['full_type'] or \
+                        (1 == self.struct_dict[s][m]['full_type'].count('*') and 'char' in self.struct_dict[s][m]['type']):
+                                sh_funcs.append('%s' % lineinfo.get())
+                                sh_funcs.append('    ss[%u] << pStruct->%s;' % (index, self.struct_dict[s][m]['name']))
+                        else:
+                            sh_funcs.append('%s' % lineinfo.get())
+                            sh_funcs.append('    ss[%u] << "0x" << hex << nouppercase << HandleCast(pStruct->%s) << dec;' % (index, self.struct_dict[s][m]['name']))
+                    elif 'bool' in self.struct_dict[s][m]['type'].lower():
+                        sh_funcs.append('%s' % lineinfo.get())
+                        sh_funcs.append('    ss[%u].str(pStruct->%s ? "TRUE" : "FALSE");' % (index, self.struct_dict[s][m]['name']))
+                    elif 'uint8' in self.struct_dict[s][m]['type'].lower():
+                        sh_funcs.append('%s' % lineinfo.get())
+                        sh_funcs.append('    ss[%u] << pStruct->%s;' % (index, self.struct_dict[s][m]['name']))
+                    elif 'void' in self.struct_dict[s][m]['type'].lower() and self.struct_dict[s][m]['ptr']:
+                        sh_funcs.append('%s' % lineinfo.get())
+                        sh_funcs.append('    if (StreamControl::writeAddress)')
+                        sh_funcs.append('        ss[%u] << "0x" << hex << nouppercase << HandleCast(pStruct->%s) << dec;' % (index, self.struct_dict[s][m]['name']))
+                        sh_funcs.append('    else')
+                        sh_funcs.append('        ss[%u].str("address");' % (index))
+                    elif 'char' in self.struct_dict[s][m]['type'].lower() and self.struct_dict[s][m]['ptr']:
+                        sh_funcs.append('%s' % lineinfo.get())
+                        sh_funcs.append('    if (pStruct->%s != NULL) {' % self.struct_dict[s][m]['name'])
+                        sh_funcs.append('        ss[%u] << pStruct->%s;' % (index, self.struct_dict[s][m]['name']))
+                        sh_funcs.append('    } else {')
+                        sh_funcs.append('        ss[%u] << "";' % index)
+                        sh_funcs.append('    }')
+                    else:
+                        hex_label_list = ["flag", "bit", "offset", "count", "pfn", "size", "handle", "buffer", "object", "mask", "index"]
+                        if self.struct_dict[s][m]['ptr'] or 'HWND' in self.struct_dict[s][m]['type'] or 'HINSTANCE' in self.struct_dict[s][m]['type']:
+                            sh_funcs.append('%s' % lineinfo.get())
+                            sh_funcs.append('    ss[%u] << "0x" << hex << nouppercase << HandleCast(pStruct->%s) << dec;' % (index, self.struct_dict[s][m]['name']))
+                        elif not 'float'  in self.struct_dict[s][m]['type'] and \
+                             (any (x in self.struct_dict[s][m]['name'].lower() for x in hex_label_list) or \
+                             any (x in self.struct_dict[s][m]['type'].lower() for x in hex_label_list) \
+                             ):
+                            sh_funcs.append('%s: NB: Edit here to choose hex vs dec output by variable name' % lineinfo.get())
+                            sh_funcs.append('    ss[%u] << "0x" << hex << nouppercase << pStruct->%s << dec;' % (index, self.struct_dict[s][m]['name']))
+                        else:
+                            sh_funcs.append('%s: NB Edit this section to choose hex vs dec output by variable name' % lineinfo.get())
+                            sh_funcs.append('    ss[%u] << pStruct->%s;' % (index, self.struct_dict[s][m]['name']))
+                    value_print = 'ss[%u].str()' % index
+                    index += 1
+                else:
+                    # For a non-empty array of enums, just print the address w/ a note that the array will be displayed below
+                    if self.struct_dict[s][m]['ptr']:
+                        sh_funcs.append('%s' % lineinfo.get())
+                        sh_funcs.append('    if (pStruct->%s)' % (self.struct_dict[s][m]['name']))
+                        sh_funcs.append('        ss[%u] << "0x" << hex << nouppercase << pStruct->%s << dec << " (See individual array values below)";' % (index, self.struct_dict[s][m]['name']))
+                        sh_funcs.append('    else')
+                        sh_funcs.append('        ss[%u].str("NULL");' % (index))
+                        value_print = 'ss[%u].str()' % index
+                        index += 1
+                    # For single enum just print the string representation
+                    else:
+                        value_print = 'string_%s(pStruct->%s)' % (self.struct_dict[s][m]['type'], self.struct_dict[s][m]['name'])
+                final_str.append('+ prefix + "%s = " + %s + "\\n"' % (self.struct_dict[s][m]['name'], value_print))
+            if 0 != num_stps: # Append data for any embedded structs
+                final_str.append("+ %s" % " + ".join(['stp_strs[%u]' % n for n in reversed(range(num_stps))]))
+            sh_funcs.append('%s' % lineinfo.get())
+            for final_str_part in final_str:
+                sh_funcs.append('    final_str = final_str %s;' % final_str_part)
+            sh_funcs.append('    return final_str;\n}')
+
+            # End of platform wrapped section
+            add_platform_wrapper_exit(sh_funcs, typedef_fwd_dict[s])
+
+        # Add function to return a string value for input void*
+        sh_funcs.append('%s' % lineinfo.get())
+        sh_funcs.append("std::string string_convert_helper(const void* toString, const std::string prefix)\n{")
+        sh_funcs.append("    using namespace StreamControl;")
+        sh_funcs.append("    using namespace std;")
+        sh_funcs.append("    stringstream ss;")
+        sh_funcs.append('    ss << toString;')
+        sh_funcs.append('    string final_str = prefix + ss.str();')
+        sh_funcs.append("    return final_str;")
+        sh_funcs.append("}")
+        sh_funcs.append('%s' % lineinfo.get())
+        # Add function to return a string value for input uint64_t
+        sh_funcs.append("std::string string_convert_helper(const uint64_t toString, const std::string prefix)\n{")
+        sh_funcs.append("    using namespace StreamControl;")
+        sh_funcs.append("    using namespace std;")
+        sh_funcs.append("    stringstream ss;")
+        sh_funcs.append('    ss << toString;')
+        sh_funcs.append('    string final_str = prefix + ss.str();')
+        sh_funcs.append("    return final_str;")
+        sh_funcs.append("}")
+        sh_funcs.append('%s' % lineinfo.get())
+        # Add function to return a string value for an input VkSurfaceFormatKHR
+        sh_funcs.append("std::string string_convert_helper(VkSurfaceFormatKHR toString, const std::string prefix)\n{")
+        sh_funcs.append("    using namespace std;")
+        sh_funcs.append('    string final_str = prefix + "format = " + string_VkFormat(toString.format) + "format = " + string_VkColorSpaceKHR(toString.colorSpace);')
+        sh_funcs.append("    return final_str;")
+        sh_funcs.append("}")
+        sh_funcs.append('%s' % lineinfo.get())
+        # Add function to dynamically print out unknown struct
+        sh_funcs.append("std::string dynamic_display(const void* pStruct, const std::string prefix)\n{")
+        sh_funcs.append("    using namespace std;")
+        sh_funcs.append("    // Cast to APP_INFO ptr initially just to pull sType off struct")
+        sh_funcs.append("    if (pStruct == NULL) {\n")
+        sh_funcs.append("        return string();")
+        sh_funcs.append("    }\n")
+        sh_funcs.append("    VkStructureType sType = ((VkApplicationInfo*)pStruct)->sType;")
+        sh_funcs.append('    string indent = "    ";')
+        sh_funcs.append('    indent += prefix;')
+        sh_funcs.append("    switch (sType)\n    {")
+        for e in enum_type_dict:
+            if "StructureType" in e:
+                for v in sorted(enum_type_dict[e]):
+                    struct_name = get_struct_name_from_struct_type(v)
+                    if struct_name not in self.struct_dict:
+                        continue
+                    print_func_name = self._get_sh_func_name(struct_name)
+                    #sh_funcs.append('string %s(const %s* pStruct, const string prefix);' % (self._get_sh_func_name(s), typedef_fwd_dict[s]))
+                    sh_funcs.append('        case %s:\n        {' % (v))
+                    sh_funcs.append('            return %s((%s*)pStruct, indent);' % (print_func_name, struct_name))
+                    sh_funcs.append('        }')
+                    sh_funcs.append('        break;')
+                sh_funcs.append("        default:")
+                sh_funcs.append("        return string();")
+        sh_funcs.append('%s' % lineinfo.get())
+        sh_funcs.append("    }")
+        sh_funcs.append("}")
+        return "\n".join(sh_funcs)
+
+    def _genStructMemberPrint(self, member, s, array, struct_array):
+        (p_out, p_arg) = self._get_struct_print_formatted(self.struct_dict[s][member], pre_var_name="&m_dummy_prefix", struct_var_name="m_struct", struct_ptr=False, print_array=True)
+        extra_indent = ""
+        if array:
+            extra_indent = "    "
+        if is_type(self.struct_dict[s][member]['type'], 'struct'): # print struct address for now
+            struct_array.insert(0, self.struct_dict[s][member])
+        elif self.struct_dict[s][member]['ptr']:
+            # Special case for void* named "pNext"
+            if "void" in self.struct_dict[s][member]['type'] and "pNext" == self.struct_dict[s][member]['name']:
+                struct_array.insert(0, self.struct_dict[s][member])
+        return ('    %sprintf("%%*s    %s", m_indent, ""%s);' % (extra_indent, p_out, p_arg), struct_array)
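+    # For example (hedged; the exact conversion specifier comes from
+    # _get_struct_print_formatted, which is defined elsewhere), a uint32_t
+    # member named "width" would come back as something like:
+    #     printf("%*s    width = %u\n", m_indent, "", m_struct.width);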
+
+    def _generateDisplayDefinitions(self, s):
+        disp_def = []
+        struct_array = []
+        # Single-line struct print function
+        disp_def.append("// Output 'structname = struct_address' on a single line")
+        disp_def.append("void %s::display_single_txt()\n{" % self.get_class_name(s))
+        disp_def.append('    printf(" %%*s%s = 0x%%p", m_indent, "", (void*)m_origStructAddr);' % typedef_fwd_dict[s])
+        disp_def.append("}\n")
+        # Private helper function to print struct members
+        disp_def.append("// Private helper function that displays the members of the wrapped struct")
+        disp_def.append("void %s::display_struct_members()\n{" % self.get_class_name(s))
+        i_declared = False
+        for member in sorted(self.struct_dict[s]):
+            # TODO : Need to display each member based on its type
+            # TODO : Need to handle pNext which are structs, but of void* type
+            #   Can grab struct type off of header of struct pointed to
+            # TODO : Handle Arrays
+            if self.struct_dict[s][member]['array']:
+                # Create for loop to print each element of array
+                if not i_declared:
+                    disp_def.append('    uint32_t i;')
+                    i_declared = True
+                disp_def.append('    for (i = 0; i<%s; i++) {' % self.struct_dict[s][member]['array_size'])
+                (return_str, struct_array) = self._genStructMemberPrint(member, s, True, struct_array)
+                disp_def.append(return_str)
+                disp_def.append('    }')
+            else:
+                (return_str, struct_array) = self._genStructMemberPrint(member, s, False, struct_array)
+                disp_def.append(return_str)
+        disp_def.append("}\n")
+        i_declared = False
+        # Basic print function to display struct members
+        disp_def.append("// Output all struct elements, each on their own line")
+        disp_def.append("void %s::display_txt()\n{" % self.get_class_name(s))
+        disp_def.append('    printf("%%*s%s struct contents at 0x%%p:\\n", m_indent, "", (void*)m_origStructAddr);' % typedef_fwd_dict[s])
+        disp_def.append('    this->display_struct_members();')
+        disp_def.append("}\n")
+        # Advanced print function to display current struct and contents of any pointed-to structs
+        disp_def.append("// Output all struct elements, and for any structs pointed to, print complete contents")
+        disp_def.append("void %s::display_full_txt()\n{" % self.get_class_name(s))
+        disp_def.append('    printf("%%*s%s struct contents at 0x%%p:\\n", m_indent, "", (void*)m_origStructAddr);' % typedef_fwd_dict[s])
+        disp_def.append('    this->display_struct_members();')
+        class_num = 0
+        # TODO : Need to handle arrays of structs here
+        for ms in struct_array:
+            swc_name = "class%s" % str(class_num)
+            if ms['array']:
+                if not i_declared:
+                    disp_def.append('    uint32_t i;')
+                    i_declared = True
+                disp_def.append('    for (i = 0; i<%s; i++) {' % ms['array_size'])
+                #disp_def.append("        if (m_struct.%s[i]) {" % (ms['name']))
+                disp_def.append("            %s %s(&(m_struct.%s[i]));" % (self.get_class_name(ms['type']), swc_name, ms['name']))
+                disp_def.append("            %s.set_indent(m_indent + 4);" % (swc_name))
+                disp_def.append("            %s.display_full_txt();" % (swc_name))
+                #disp_def.append('        }')
+                disp_def.append('    }')
+            elif 'pNext' == ms['name']:
+                # Need some code trickery here
+                #  I'm thinking have a generated function that takes pNext ptr value
+                #  then it checks sType and in large switch statement creates appropriate
+                #  wrapper class type and then prints contents
+                disp_def.append("    if (m_struct.%s) {" % (ms['name']))
+                #disp_def.append('        printf("%*s    This is where we would call dynamic print function\\n", m_indent, "");')
+                disp_def.append('        dynamic_display_full_txt(m_struct.%s, m_indent);' % (ms['name']))
+                disp_def.append("    }")
+            else:
+                if ms['ptr']:
+                    disp_def.append("    if (m_struct.%s) {" % (ms['name']))
+                    disp_def.append("        %s %s(m_struct.%s);" % (self.get_class_name(ms['type']), swc_name, ms['name']))
+                else:
+                    disp_def.append("    if (&m_struct.%s) {" % (ms['name']))
+                    disp_def.append("        %s %s(&m_struct.%s);" % (self.get_class_name(ms['type']), swc_name, ms['name']))
+                disp_def.append("        %s.set_indent(m_indent + 4);" % (swc_name))
+                disp_def.append("        %s.display_full_txt();\n    }" % (swc_name))
+            class_num += 1
+        disp_def.append("}\n")
+        return "\n".join(disp_def)
+
+    def _generateStringHelperHeader(self):
+        header = []
+        header.append("//#includes, #defines, globals and such...\n")
+        for f in self.include_headers:
+            if 'vk_enum_string_helper' not in f:
+                header.append("#include <%s>\n" % f)
+        header.append('#include "vk_enum_string_helper.h"\n\n// Function Prototypes\n')
+        header.append("char* dynamic_display(const void* pStruct, const char* prefix);\n")
+        return "".join(header)
+
+    def _generateStringHelperHeaderCpp(self):
+        header = []
+        header.append("//#includes, #defines, globals and such...\n")
+        for f in self.include_headers:
+            if 'vk_enum_string_helper' not in f:
+                header.append("#include <%s>\n" % f)
+        header.append('#include "vk_enum_string_helper.h"\n')
+        header.append('namespace StreamControl\n')
+        header.append('{\n')
+        header.append('bool writeAddress = true;\n')
+        header.append('template <typename T>\n')
+        header.append('std::ostream& operator<< (std::ostream &out, T const* pointer)\n')
+        header.append('{\n')
+        header.append('    if(writeAddress)\n')
+        header.append('    {\n')
+        header.append('        out.operator<<(pointer);\n')
+        header.append('    }\n')
+        header.append('    else\n')
+        header.append('    {\n')
+        header.append('        std::operator<<(out, "address");\n')
+        header.append('    }\n')
+        header.append('    return out;\n')
+        header.append('}\n')
+        header.append('template <typename HandleType> uint64_t HandleCast(HandleType * handle)\n')
+        header.append('{\n')
+        header.append('    return reinterpret_cast<uint64_t>(handle);\n')
+        header.append('}\n')
+        header.append('uint64_t HandleCast(uint64_t handle)\n')
+        header.append('{\n')
+        header.append('    return handle;\n')
+        header.append('}\n')
+        header.append('std::ostream& operator<<(std::ostream &out, char const*const s)\n')
+        header.append('{\n')
+        header.append('    return std::operator<<(out, s);\n')
+        header.append('}\n')
+        header.append('}\n')
+        header.append('\n')
+        header.append("std::string dynamic_display(const void* pStruct, const std::string prefix);\n")
+        return "".join(header)
+
+    def _generateValidateHelperFunctions(self):
+        sh_funcs = []
+        # We do two passes: the first pass just generates prototypes for all of the functions
+        for s in sorted(self.struct_dict):
+
+            # Wrap this in platform check since it may contain undefined structs or functions
+            add_platform_wrapper_entry(sh_funcs, typedef_fwd_dict[s])
+            sh_funcs.append('uint32_t %s(const %s* pStruct);' % (self._get_vh_func_name(s), typedef_fwd_dict[s]))
+            add_platform_wrapper_exit(sh_funcs, typedef_fwd_dict[s])
+
+        sh_funcs.append('\n')
+        for s in sorted(self.struct_dict):
+
+            # Wrap this in platform check since it may contain undefined structs or functions
+            add_platform_wrapper_entry(sh_funcs, typedef_fwd_dict[s])
+
+            sh_funcs.append('uint32_t %s(const %s* pStruct)\n{' % (self._get_vh_func_name(s), typedef_fwd_dict[s]))
+            for m in sorted(self.struct_dict[s]):
+                # TODO : Need to handle arrays of enums like in VkRenderPassCreateInfo struct
+                if is_type(self.struct_dict[s][m]['type'], 'enum') and not self.struct_dict[s][m]['ptr']:
+                    sh_funcs.append('    if (!validate_%s(pStruct->%s))\n        return 0;' % (self.struct_dict[s][m]['type'], self.struct_dict[s][m]['name']))
+                # TODO : Need a little refinement to this code to make sure type of struct matches expected input (ptr, const...)
+                if is_type(self.struct_dict[s][m]['type'], 'struct'):
+                    if (self.struct_dict[s][m]['ptr']):
+                        sh_funcs.append('    if (pStruct->%s && !%s((const %s*)pStruct->%s))\n        return 0;' % (self.struct_dict[s][m]['name'], self._get_vh_func_name(self.struct_dict[s][m]['type']), self.struct_dict[s][m]['type'], self.struct_dict[s][m]['name']))
+                    else:
+                        sh_funcs.append('    if (!%s((const %s*)&pStruct->%s))\n        return 0;' % (self._get_vh_func_name(self.struct_dict[s][m]['type']), self.struct_dict[s][m]['type'], self.struct_dict[s][m]['name']))
+            sh_funcs.append("    return 1;\n}")
+
+            # End of platform wrapped section
+            add_platform_wrapper_exit(sh_funcs, typedef_fwd_dict[s])
+
+        return "\n".join(sh_funcs)
+
+    def _generateValidateHelperHeader(self):
+        header = []
+        header.append("//#includes, #defines, globals and such...\n")
+        for f in self.include_headers:
+            if 'vk_enum_validate_helper' not in f:
+                header.append("#include <%s>\n" % f)
+        header.append('#include "vk_enum_validate_helper.h"\n\n// Function Prototypes\n')
+        #header.append("char* dynamic_display(const void* pStruct, const char* prefix);\n")
+        return "".join(header)
+
+    def _generateSizeHelperFunctions(self):
+        sh_funcs = []
+        # just generates prototypes for all the functions
+        for s in sorted(self.struct_dict):
+
+            # Wrap this in platform check since it may contain undefined structs or functions
+            add_platform_wrapper_entry(sh_funcs, typedef_fwd_dict[s])
+            sh_funcs.append('size_t %s(const %s* pStruct);' % (self._get_size_helper_func_name(s), typedef_fwd_dict[s]))
+            add_platform_wrapper_exit(sh_funcs, typedef_fwd_dict[s])
+
+        return "\n".join(sh_funcs)
+
+
+    def _generateSizeHelperFunctionsC(self):
+        sh_funcs = []
+        # generate function definitions
+        for s in sorted(self.struct_dict):
+
+            # Wrap this in platform check since it may contain undefined structs or functions
+            add_platform_wrapper_entry(sh_funcs, typedef_fwd_dict[s])
+
+            skip_list = [] # Used when struct elements need to be skipped because size already accounted for
+            sh_funcs.append('size_t %s(const %s* pStruct)\n{' % (self._get_size_helper_func_name(s), typedef_fwd_dict[s]))
+            indent = '    '
+            sh_funcs.append('%ssize_t structSize = 0;' % (indent))
+            sh_funcs.append('%sif (pStruct) {' % (indent))
+            indent = '        '
+            sh_funcs.append('%sstructSize = sizeof(%s);' % (indent, typedef_fwd_dict[s]))
+            i_decl = False
+            for m in sorted(self.struct_dict[s]):
+                if m in skip_list:
+                    continue
+                if self.struct_dict[s][m]['dyn_array']:
+                    if self.struct_dict[s][m]['full_type'].count('*') > 1:
+                        if not is_type(self.struct_dict[s][m]['type'], 'struct') and not 'char' in self.struct_dict[s][m]['type'].lower():
+                            if 'ppMemoryBarriers' == self.struct_dict[s][m]['name']:
+                                # TODO : For now be conservative and consider all memBarrier ptrs as largest possible struct
+                                sh_funcs.append('%sstructSize += pStruct->%s*(sizeof(%s*) + sizeof(VkImageMemoryBarrier));' % (indent, self.struct_dict[s][m]['array_size'], self.struct_dict[s][m]['type']))
+                            else:
+                                sh_funcs.append('%sstructSize += pStruct->%s*(sizeof(%s*) + sizeof(%s));' % (indent, self.struct_dict[s][m]['array_size'], self.struct_dict[s][m]['type'], self.struct_dict[s][m]['type']))
+                        else: # This is an array of char* or array of struct ptrs
+                            if not i_decl:
+                                sh_funcs.append('%suint32_t i = 0;' % (indent))
+                                i_decl = True
+                            sh_funcs.append('%sfor (i = 0; i < pStruct->%s; i++) {' % (indent, self.struct_dict[s][m]['array_size']))
+                            indent = '            '
+                            if is_type(self.struct_dict[s][m]['type'], 'struct'):
+                                sh_funcs.append('%sstructSize += (sizeof(%s*) + %s(pStruct->%s[i]));' % (indent, self.struct_dict[s][m]['type'], self._get_size_helper_func_name(self.struct_dict[s][m]['type']), self.struct_dict[s][m]['name']))
+                            else:
+                                sh_funcs.append('%sstructSize += (sizeof(char*) + (sizeof(char) * (1 + strlen(pStruct->%s[i]))));' % (indent, self.struct_dict[s][m]['name']))
+                            indent = '        '
+                            sh_funcs.append('%s}' % (indent))
+                    else:
+                        if is_type(self.struct_dict[s][m]['type'], 'struct'):
+                            if not i_decl:
+                                sh_funcs.append('%suint32_t i = 0;' % (indent))
+                                i_decl = True
+                            sh_funcs.append('%sfor (i = 0; i < pStruct->%s; i++) {' % (indent, self.struct_dict[s][m]['array_size']))
+                            indent = '            '
+                            sh_funcs.append('%sstructSize += %s(&pStruct->%s[i]);' % (indent, self._get_size_helper_func_name(self.struct_dict[s][m]['type']), self.struct_dict[s][m]['name']))
+                            indent = '        '
+                            sh_funcs.append('%s}' % (indent))
+                        else:
+                            sh_funcs.append('%sstructSize += pStruct->%s*sizeof(%s);' % (indent, self.struct_dict[s][m]['array_size'], self.struct_dict[s][m]['type']))
+                elif self.struct_dict[s][m]['ptr'] and 'pNext' != self.struct_dict[s][m]['name'] and 'dpy' != self.struct_dict[s][m]['name']:
+                    if 'char' in self.struct_dict[s][m]['type'].lower():
+                        sh_funcs.append('%sstructSize += (pStruct->%s != NULL) ? sizeof(%s)*(1+strlen(pStruct->%s)) : 0;' % (indent, self.struct_dict[s][m]['name'], self.struct_dict[s][m]['type'], self.struct_dict[s][m]['name']))
+                    elif is_type(self.struct_dict[s][m]['type'], 'struct'):
+                        sh_funcs.append('%sstructSize += %s(pStruct->%s);' % (indent, self._get_size_helper_func_name(self.struct_dict[s][m]['type']), self.struct_dict[s][m]['name']))
+                    elif 'void' not in self.struct_dict[s][m]['type'].lower():
+                        if (self.struct_dict[s][m]['type'] != 'xcb_connection_t'):
+                            sh_funcs.append('%sstructSize += sizeof(%s);' % (indent, self.struct_dict[s][m]['type']))
+                elif 'size_t' == self.struct_dict[s][m]['type'].lower():
+                    sh_funcs.append('%sstructSize += pStruct->%s;' % (indent, self.struct_dict[s][m]['name']))
+                    skip_list.append(m+1)
+            indent = '    '
+            sh_funcs.append('%s}' % (indent))
+            sh_funcs.append("%sreturn structSize;\n}" % (indent))
+
+            # End of platform wrapped section
+            add_platform_wrapper_exit(sh_funcs, typedef_fwd_dict[s])
+
+        # Now generate generic functions to loop over entire struct chain (or just handle single generic structs)
+        if '_debug_' not in self.header_filename:
+            for follow_chain in [True, False]:
+                sh_funcs.append('%s' % self.lineinfo.get())
+                if follow_chain:
+                    sh_funcs.append('size_t get_struct_chain_size(const void* pStruct)\n{')
+                else:
+                    sh_funcs.append('size_t get_dynamic_struct_size(const void* pStruct)\n{')
+                indent = '    '
+                sh_funcs.append('%s// Just use VkApplicationInfo as struct until actual type is resolved' % (indent))
+                sh_funcs.append('%sVkApplicationInfo* pNext = (VkApplicationInfo*)pStruct;' % (indent))
+                sh_funcs.append('%ssize_t structSize = 0;' % (indent))
+                if follow_chain:
+                    sh_funcs.append('%swhile (pNext) {' % (indent))
+                    indent = '        '
+                sh_funcs.append('%sswitch (pNext->sType) {' % (indent))
+                indent += '    '
+                for e in enum_type_dict:
+                    if 'StructureType' in e:
+                        for v in sorted(enum_type_dict[e]):
+                            struct_name = get_struct_name_from_struct_type(v)
+                            if struct_name not in self.struct_dict:
+                                continue
+
+                            sh_funcs.append('%scase %s:' % (indent, v))
+                            sh_funcs.append('%s{' % (indent))
+                            indent += '    '
+                            sh_funcs.append('%sstructSize += %s((%s*)pNext);' % (indent, self._get_size_helper_func_name(struct_name), struct_name))
+                            sh_funcs.append('%sbreak;' % (indent))
+                            indent = indent[:-4]
+                            sh_funcs.append('%s}' % (indent))
+                        sh_funcs.append('%sdefault:' % (indent))
+                        indent += '    '
+                        sh_funcs.append('%sassert(0);' % (indent))
+                        sh_funcs.append('%sstructSize += 0;' % (indent))
+                        indent = indent[:-4]
+                indent = indent[:-4]
+                sh_funcs.append('%s}' % (indent))
+                if follow_chain:
+                    sh_funcs.append('%spNext = (VkApplicationInfo*)pNext->pNext;' % (indent))
+                    indent = indent[:-4]
+                    sh_funcs.append('%s}' % (indent))
+                sh_funcs.append('%sreturn structSize;\n}' % indent)
+        return "\n".join(sh_funcs)
+
+    def _generateSizeHelperHeader(self):
+        header = []
+        header.append("//#includes, #defines, globals and such...\n")
+        for f in self.include_headers:
+            header.append("#include <%s>\n" % f)
+        header.append('\n// Function Prototypes\n')
+        header.append("size_t get_struct_chain_size(const void* pStruct);\n")
+        header.append("size_t get_dynamic_struct_size(const void* pStruct);\n")
+        return "".join(header)
+
+    def _generateSizeHelperHeaderC(self):
+        header = []
+        header.append('#include "vk_struct_size_helper.h"')
+        header.append('#include <string.h>')
+        header.append('#include <assert.h>')
+        header.append('\n// Function definitions\n')
+        return "\n".join(header)
+
+
+    def _generateHeader(self):
+        header = []
+        header.append("//#includes, #defines, globals and such...\n")
+        for f in self.include_headers:
+            header.append("#include <%s>\n" % f)
+        return "".join(header)
+
+    # Declarations
+    def _generateConstructorDeclarations(self, s):
+        constructors = []
+        class_name = self.get_class_name(s)
+        constructors.append("    %s();\n" % class_name)
+        constructors.append("    %s(%s* pInStruct);\n" % (class_name, typedef_fwd_dict[s]))
+        constructors.append("    %s(const %s* pInStruct);\n" % (class_name, typedef_fwd_dict[s]))
+        return "".join(constructors)
+
+    def _generateDestructorDeclarations(self, s):
+        return "    virtual ~%s();\n" % self.get_class_name(s)
+
+    def _generateDisplayDeclarations(self, s):
+        return "    void display_txt();\n    void display_single_txt();\n    void display_full_txt();\n"
+
+    def _generateGetSetDeclarations(self, s):
+        get_set = []
+        get_set.append("    void set_indent(uint32_t indent) { m_indent = indent; }\n")
+        for member in sorted(self.struct_dict[s]):
+            # TODO : Skipping array set/get funcs for now
+            if self.struct_dict[s][member]['array']:
+                continue
+            get_set.append("    %s get_%s() { return m_struct.%s; }\n" % (self.struct_dict[s][member]['full_type'], self.struct_dict[s][member]['name'], self.struct_dict[s][member]['name']))
+            if not self.struct_dict[s][member]['const']:
+                get_set.append("    void set_%s(%s inValue) { m_struct.%s = inValue; }\n" % (self.struct_dict[s][member]['name'], self.struct_dict[s][member]['full_type'], self.struct_dict[s][member]['name']))
+        return "".join(get_set)
+
+    def _generatePrivateMembers(self, s):
+        priv = []
+        priv.append("\nprivate:\n")
+        priv.append("    %s m_struct;\n" % typedef_fwd_dict[s])
+        priv.append("    const %s* m_origStructAddr;\n" % typedef_fwd_dict[s])
+        priv.append("    uint32_t m_indent;\n")
+        priv.append("    const char m_dummy_prefix;\n")
+        priv.append("    void display_struct_members();\n")
+        return "".join(priv)
+
+    def _generateClassDeclaration(self):
+        class_decl = []
+        for s in sorted(self.struct_dict):
+            class_decl.append("\n//class declaration")
+            class_decl.append("class %s\n{\npublic:" % self.get_class_name(s))
+            class_decl.append(self._generateConstructorDeclarations(s))
+            class_decl.append(self._generateDestructorDeclarations(s))
+            class_decl.append(self._generateDisplayDeclarations(s))
+            class_decl.append(self._generateGetSetDeclarations(s))
+            class_decl.append(self._generatePrivateMembers(s))
+            class_decl.append("};\n")
+        return "\n".join(class_decl)
+
+    def _generateFooter(self):
+        return "\n//any footer info for class\n"
+
+    def _getSafeStructName(self, struct):
+        return "safe_%s" % (struct)
+
+    # If struct has sType or ptr members, generate safe type
+    def _hasSafeStruct(self, s):
+        exceptions = ['VkPhysicalDeviceFeatures', 'VkPipelineColorBlendStateCreateInfo', 'VkDebugMarkerMarkerInfoEXT']
+        if s in exceptions or not self.struct_dict[s]:
+            return False
+        if 'sType' == self.struct_dict[s][0]['name']:
+            return True
+        for m in self.struct_dict[s]:
+            if self.struct_dict[s][m]['ptr']:
+                return True
+        return False
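+    # For example, VkBufferCreateInfo qualifies (it has an sType member and a
+    # pQueueFamilyIndices pointer), while a plain POD struct with no sType and
+    # no pointer members, or anything on the exceptions list, does not.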
+
+    def _generateSafeStructHeader(self):
+        header = []
+        header.append("//#includes, #defines, globals and such...\n")
+        header.append('#pragma once\n')
+        header.append('#include "vulkan/vulkan.h"')
+        return "".join(header)
+
+    # If given ty is in obj list, or is a struct that contains anything in obj list, return True
+    def _typeHasObject(self, ty, obj):
+        if ty in obj:
+            return True
+        if is_type(ty, 'struct'):
+            for m in self.struct_dict[ty]:
+                if self.struct_dict[ty][m]['type'] in obj:
+                    return True
+        return False
+
+    def _generateSafeStructDecls(self):
+        ss_decls = []
+        for s in struct_order_list:
+            if not self._hasSafeStruct(s):
+                continue
+            if s in ifdef_dict:
+                ss_decls.append('#ifdef %s' % ifdef_dict[s])
+            ss_name = self._getSafeStructName(s)
+            ss_decls.append("\nstruct %s {" % (ss_name))
+            for m in sorted(self.struct_dict[s]):
+                m_type = self.struct_dict[s][m]['type']
+                if is_type(m_type, 'struct') and self._hasSafeStruct(m_type):
+                    m_type = self._getSafeStructName(m_type)
+                if self.struct_dict[s][m]['array_size'] != 0 and not self.struct_dict[s][m]['dyn_array']:
+                    ss_decls.append("    %s %s[%s];" % (m_type, self.struct_dict[s][m]['name'], self.struct_dict[s][m]['array_size']))
+                elif self.struct_dict[s][m]['ptr'] and 'safe_' not in m_type and not self._typeHasObject(m_type, vulkan.object_non_dispatch_list): # We'll never overwrite char* so it can remain const
+                    ss_decls.append("    %s %s;" % (self.struct_dict[s][m]['full_type'], self.struct_dict[s][m]['name']))
+                elif self.struct_dict[s][m]['array']:
+                    ss_decls.append("    %s* %s;" % (m_type, self.struct_dict[s][m]['name']))
+                elif self.struct_dict[s][m]['ptr']:
+                    ss_decls.append("    %s* %s;" % (m_type, self.struct_dict[s][m]['name']))
+                else:
+                    ss_decls.append("    %s %s;" % (m_type, self.struct_dict[s][m]['name']))
+            ss_decls.append("    %s(const %s* pInStruct);" % (ss_name, s))
+            ss_decls.append("    %s(const %s& src);" % (ss_name, ss_name)) # Copy constructor
+            ss_decls.append("    %s();" % (ss_name)) # Default constructor
+            ss_decls.append("    ~%s();" % (ss_name))
+            ss_decls.append("    void initialize(const %s* pInStruct);" % (s))
+            ss_decls.append("    void initialize(const %s* src);" % (ss_name))
+            ss_decls.append("    %s *ptr() { return reinterpret_cast<%s *>(this); }" % (s, s))
+            ss_decls.append("    %s const *ptr() const { return reinterpret_cast<%s const *>(this); }" % (s, s))
+            ss_decls.append("};")
+            if s in ifdef_dict:
+                ss_decls.append('#endif')
+        return "\n".join(ss_decls)
+
+    def _generateSafeStructSourceHeader(self):
+        header = []
+        header.append("//#includes, #defines, globals and such...\n")
+        header.append('#include "vk_safe_struct.h"\n#include <string.h>\n\n')
+        return "".join(header)
+
+    def _generateSafeStructSource(self):
+        ss_src = []
+        for s in struct_order_list:
+            if not self._hasSafeStruct(s):
+                continue
+            if s in ifdef_dict:
+                ss_src.append('#ifdef %s' % ifdef_dict[s])
+            ss_name = self._getSafeStructName(s)
+            init_list = '' # list of members in struct constructor initializer
+            init_func_txt = '' # Txt for initialize() function that takes struct ptr and inits members
+            construct_txt = '' # Body of constructor, as well as body of initialize() func following init_func_txt
+            destruct_txt = ''
+            # VkWriteDescriptorSet is a special case because its pointers may be non-null but ignored
+            # TODO : This is ugly, figure out better way to do this
+            custom_construct_txt = {'VkWriteDescriptorSet' :
+                                    '    switch (descriptorType) {\n'
+                                    '        case VK_DESCRIPTOR_TYPE_SAMPLER:\n'
+                                    '        case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:\n'
+                                    '        case VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE:\n'
+                                    '        case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE:\n'
+                                    '        case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT:\n'
+                                    '        if (descriptorCount && pInStruct->pImageInfo) {\n'
+                                    '            pImageInfo = new VkDescriptorImageInfo[descriptorCount];\n'
+                                    '            for (uint32_t i=0; i<descriptorCount; ++i) {\n'
+                                    '                pImageInfo[i] = pInStruct->pImageInfo[i];\n'
+                                    '            }\n'
+                                    '        }\n'
+                                    '        break;\n'
+                                    '        case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER:\n'
+                                    '        case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER:\n'
+                                    '        case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:\n'
+                                    '        case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:\n'
+                                    '        if (descriptorCount && pInStruct->pBufferInfo) {\n'
+                                    '            pBufferInfo = new VkDescriptorBufferInfo[descriptorCount];\n'
+                                    '            for (uint32_t i=0; i<descriptorCount; ++i) {\n'
+                                    '                pBufferInfo[i] = pInStruct->pBufferInfo[i];\n'
+                                    '            }\n'
+                                    '        }\n'
+                                    '        break;\n'
+                                    '        case VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER:\n'
+                                    '        case VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER:\n'
+                                    '        if (descriptorCount && pInStruct->pTexelBufferView) {\n'
+                                    '            pTexelBufferView = new VkBufferView[descriptorCount];\n'
+                                    '            for (uint32_t i=0; i<descriptorCount; ++i) {\n'
+                                    '                pTexelBufferView[i] = pInStruct->pTexelBufferView[i];\n'
+                                    '            }\n'
+                                    '        }\n'
+                                    '        break;\n'
+                                    '        default:\n'
+                                    '        break;\n'
+                                    '    }\n'}
+            for m in self.struct_dict[s]:
+                m_name = self.struct_dict[s][m]['name']
+                m_type = self.struct_dict[s][m]['type']
+                if is_type(m_type, 'struct') and self._hasSafeStruct(m_type):
+                    m_type = self._getSafeStructName(m_type)
+                if self.struct_dict[s][m]['ptr'] and 'safe_' not in m_type and not self._typeHasObject(m_type, vulkan.object_non_dispatch_list):
+                    # Ptr types w/o a safe_struct, for non-null case need to allocate new ptr and copy data in
+                    if 'KHR' in ss_name or m_type in ['void', 'char']:
+                        # For these exceptions just copy initial value over for now
+                        init_list += '\n\t%s(pInStruct->%s),' % (m_name, m_name)
+                        init_func_txt += '    %s = pInStruct->%s;\n' % (m_name, m_name)
+                    else:
+                        init_list += '\n\t%s(nullptr),' % (m_name)
+                        init_func_txt += '    %s = nullptr;\n' % (m_name)
+                        if 'pNext' != m_name and 'void' not in m_type:
+                            if not self.struct_dict[s][m]['array']:
+                                construct_txt += '    if (pInStruct->%s) {\n' % (m_name)
+                                construct_txt += '        %s = new %s(*pInStruct->%s);\n' % (m_name, m_type, m_name)
+                                construct_txt += '    }\n'
+                                destruct_txt += '    if (%s)\n' % (m_name)
+                                destruct_txt += '        delete %s;\n' % (m_name)
+                            else: # new array and then init each element
+                                construct_txt += '    if (pInStruct->%s) {\n' % (m_name)
+                                construct_txt += '        %s = new %s[pInStruct->%s];\n' % (m_name, m_type, self.struct_dict[s][m]['array_size'])
+                                #construct_txt += '        std::copy (pInStruct->%s, pInStruct->%s+pInStruct->%s, %s);\n' % (m_name, m_name, self.struct_dict[s][m]['array_size'], m_name)
+                                construct_txt += '        memcpy ((void *)%s, (void *)pInStruct->%s, sizeof(%s)*pInStruct->%s);\n' % (m_name, m_name, m_type, self.struct_dict[s][m]['array_size'])
+                                construct_txt += '    }\n'
+                                destruct_txt += '    if (%s)\n' % (m_name)
+                                destruct_txt += '        delete[] %s;\n' % (m_name)
+                elif self.struct_dict[s][m]['array']:
+                    # Init array ptr to NULL
+                    init_list += '\n\t%s(NULL),' % (m_name)
+                    init_func_txt += '    %s = NULL;\n' % (m_name)
+                    array_element = 'pInStruct->%s[i]' % (m_name)
+                    if is_type(self.struct_dict[s][m]['type'], 'struct') and self._hasSafeStruct(self.struct_dict[s][m]['type']):
+                        array_element = '%s(&pInStruct->%s[i])' % (self._getSafeStructName(self.struct_dict[s][m]['type']), m_name)
+                    construct_txt += '    if (%s && pInStruct->%s) {\n' % (self.struct_dict[s][m]['array_size'], m_name)
+                    construct_txt += '        %s = new %s[%s];\n' % (m_name, m_type, self.struct_dict[s][m]['array_size'])
+                    destruct_txt += '    if (%s)\n' % (m_name)
+                    destruct_txt += '        delete[] %s;\n' % (m_name)
+                    construct_txt += '        for (uint32_t i=0; i<%s; ++i) {\n' % (self.struct_dict[s][m]['array_size'])
+                    if 'safe_' in m_type:
+                        construct_txt += '            %s[i].initialize(&pInStruct->%s[i]);\n' % (m_name, m_name)
+                    else:
+                        construct_txt += '            %s[i] = %s;\n' % (m_name, array_element)
+                    construct_txt += '        }\n'
+                    construct_txt += '    }\n'
+                elif self.struct_dict[s][m]['ptr']:
+                    construct_txt += '    if (pInStruct->%s)\n' % (m_name)
+                    construct_txt += '        %s = new %s(pInStruct->%s);\n' % (m_name, m_type, m_name)
+                    construct_txt += '    else\n'
+                    construct_txt += '        %s = NULL;\n' % (m_name)
+                    destruct_txt += '    if (%s)\n' % (m_name)
+                    destruct_txt += '        delete %s;\n' % (m_name)
+                elif 'safe_' in m_type: # inline struct, need to pass in reference for constructor
+                    init_list += '\n\t%s(&pInStruct->%s),' % (m_name, m_name)
+                    init_func_txt += '        %s.initialize(&pInStruct->%s);\n' % (m_name, m_name)
+                else:
+                    init_list += '\n\t%s(pInStruct->%s),' % (m_name, m_name)
+                    init_func_txt += '    %s = pInStruct->%s;\n' % (m_name, m_name)
+            if '' != init_list:
+                init_list = init_list[:-1] # hack off final comma
+            if s in custom_construct_txt:
+                construct_txt = custom_construct_txt[s]
+            ss_src.append("\n%s::%s(const %s* pInStruct) : %s\n{\n%s}" % (ss_name, ss_name, s, init_list, construct_txt))
+            ss_src.append("\n%s::%s() {}" % (ss_name, ss_name))
+            # Create slight variation of init and construct txt for copy constructor that takes a src object reference vs. struct ptr
+            copy_construct_init = init_func_txt.replace('pInStruct->', 'src.')
+            copy_construct_txt = construct_txt.replace(' (pInStruct->', ' (src.') # Exclude 'if' blocks from next line
+            copy_construct_txt = copy_construct_txt.replace('(pInStruct->', '(*src.') # Pass object to copy constructors
+            copy_construct_txt = copy_construct_txt.replace('pInStruct->', 'src.') # Modify remaining struct refs for src object
+            ss_src.append("\n%s::%s(const %s& src)\n{\n%s%s}" % (ss_name, ss_name, ss_name, copy_construct_init, copy_construct_txt)) # Copy constructor
+            ss_src.append("\n%s::~%s()\n{\n%s}" % (ss_name, ss_name, destruct_txt))
+            ss_src.append("\nvoid %s::initialize(const %s* pInStruct)\n{\n%s%s}" % (ss_name, s, init_func_txt, construct_txt))
+            # Copy initializer uses same txt as copy constructor but has a ptr and not a reference
+            init_copy = copy_construct_init.replace('src.', 'src->')
+            init_construct = copy_construct_txt.replace('src.', 'src->')
+            ss_src.append("\nvoid %s::initialize(const %s* src)\n{\n%s%s}" % (ss_name, ss_name, init_copy, init_construct))
+            if s in ifdef_dict:
+                ss_src.append('#endif')
+        return "\n".join(ss_src)
+
+class EnumCodeGen:
+    def __init__(self, enum_type_dict=None, enum_val_dict=None, typedef_fwd_dict=None, in_file=None, out_sh_file=None, out_vh_file=None):
+        self.et_dict = enum_type_dict
+        self.ev_dict = enum_val_dict
+        self.tf_dict = typedef_fwd_dict
+        self.in_file = in_file
+        self.out_sh_file = out_sh_file
+        self.eshfg = CommonFileGen(self.out_sh_file)
+        self.out_vh_file = out_vh_file
+        self.evhfg = CommonFileGen(self.out_vh_file)
+
+    def generateStringHelper(self):
+        self.eshfg.setHeader(self._generateSHHeader())
+        self.eshfg.setBody(self._generateSHBody())
+        self.eshfg.generate()
+
+    def generateEnumValidate(self):
+        self.evhfg.setHeader(self._generateSHHeader())
+        self.evhfg.setBody(self._generateVHBody())
+        self.evhfg.generate()
+
+    def _generateVHBody(self):
+        body = []
+        for bet in sorted(self.et_dict):
+            fet = self.tf_dict[bet]
+            body.append("static inline uint32_t validate_%s(%s input_value)\n{" % (fet, fet))
+            # TODO : This is not ideal, but allows for flag combinations. Need more rigorous validation of realistic flag combinations
+            if 'flagbits' in bet.lower():
+                body.append('    if (input_value > (%s))' % (' | '.join(self.et_dict[bet])))
+                body.append('        return 0;')
+                body.append('    return 1;')
+                body.append('}\n\n')
+            else:
+                body.append('    switch ((%s)input_value)\n    {' % (fet))
+                for e in sorted(self.et_dict[bet]):
+                    if (self.ev_dict[e]['unique']):
+                        body.append('        case %s:' % (e))
+                body.append('            return 1;\n        default:\n            return 0;\n    }\n}\n\n')
+        return "\n".join(body)
+
+    def _generateSHBody(self):
+        body = []
+        # bet == base_enum_type, fet == final_enum_type
+        for bet in sorted(self.et_dict):
+            fet = self.tf_dict[bet]
+            body.append("static inline const char* string_%s(%s input_value)\n{\n    switch ((%s)input_value)\n    {" % (fet, fet, fet))
+            for e in sorted(self.et_dict[bet]):
+                if (self.ev_dict[e]['unique']):
+                    body.append('        case %s:\n            return "%s";' % (e, e))
+            body.append('        default:\n            return "Unhandled %s";\n    }\n}\n\n' % (fet))
+        return "\n".join(body)
+
+    def _generateSHHeader(self):
+        header = []
+        header.append('#pragma once\n')
+        header.append('#ifdef _WIN32\n')
+        header.append('#pragma warning( disable : 4065 )\n')
+        header.append('#endif\n')
+        header.append('#include <vulkan/%s>\n\n\n' % self.in_file)
+        return "\n".join(header)
+
+
+class CMakeGen:
+    def __init__(self, struct_wrapper=None, out_dir=None):
+        self.sw = struct_wrapper
+        self.include_headers = []
+        self.add_lib_file_list = self.sw.get_file_list()
+        self.out_dir = out_dir
+        self.out_file = os.path.join(self.out_dir, "CMakeLists.txt")
+        self.cmg = CommonFileGen(self.out_file)
+
+    def generate(self):
+        self.cmg.setBody(self._generateBody())
+        self.cmg.generate()
+
+    def _generateBody(self):
+        body = []
+        body.append("project(%s)" % os.path.basename(self.out_dir))
+        body.append("cmake_minimum_required(VERSION 2.8)\n")
+        body.append("add_library(${PROJECT_NAME} %s)\n" % " ".join(self.add_lib_file_list))
+        body.append('set(COMPILE_FLAGS "-fpermissive")')
+        body.append('set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${COMPILE_FLAGS}")\n')
+        body.append("include_directories(${SRC_DIR}/thirdparty/${GEN_API}/inc/)\n")
+        body.append("target_include_directories (%s PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})\n" % os.path.basename(self.out_dir))
+        return "\n".join(body)
+
+class GraphVizGen:
+    def __init__(self, struct_dict, prefix, out_dir):
+        self.struct_dict = struct_dict
+        self.api = prefix
+        if prefix == "vulkan":
+            self.api_prefix = "vk"
+        else:
+            self.api_prefix = prefix
+        self.out_file = os.path.join(out_dir, self.api_prefix+"_struct_graphviz_helper.h")
+        self.gvg = CommonFileGen(self.out_file)
+
+    def generate(self):
+        self.gvg.setCopyright("//This is the copyright\n")
+        self.gvg.setHeader(self._generateHeader())
+        self.gvg.setBody(self._generateBody())
+        #self.gvg.setFooter('}')
+        self.gvg.generate()
+
+    def set_include_headers(self, include_headers):
+        self.include_headers = include_headers
+
+    def _generateHeader(self):
+        header = []
+        header.append("//#includes, #defines, globals and such...\n")
+        for f in self.include_headers:
+            if 'vk_enum_string_helper' not in f:
+                header.append("#include <%s>\n" % f)
+        #header.append('#include "vk_enum_string_helper.h"\n\n// Function Prototypes\n')
+        header.append("\nchar* dynamic_gv_display(const void* pStruct, const char* prefix);\n")
+        return "".join(header)
+
+    def _get_gv_func_name(self, struct):
+        return "%s_gv_print_%s" % (self.api_prefix, struct.lower().strip("_"))
+
+    # Return elements to create formatted string for given struct member
+    def _get_struct_gv_print_formatted(self, struct_member, pre_var_name="", postfix = "\\n", struct_var_name="pStruct", struct_ptr=True, print_array=False, port_label=""):
+        struct_op = "->"
+        pre_var_name = '"%s "' % struct_member['full_type']
+        if not struct_ptr:
+            struct_op = "."
+        member_name = struct_member['name']
+        print_type = "p"
+        cast_type = ""
+        member_post = ""
+        array_index = ""
+        member_print_post = ""
+        print_delimiter = "%"
+        if struct_member['array'] and 'char' in struct_member['type'].lower(): # just print char array as string
+            print_type = "p"
+            print_array = False
+        elif struct_member['array'] and not print_array:
+            # Just print base address of array when not full print_array
+            cast_type = "(void*)"
+        elif is_type(struct_member['type'], 'enum'):
+            if struct_member['ptr']:
+                struct_var_name = "*" + struct_var_name
+                print_delimiter = "0x%"
+            cast_type = "string_%s" % struct_member['type']
+            print_type = "s"
+        elif is_type(struct_member['type'], 'struct'): # print struct address for now
+            cast_type = "(void*)"
+            print_delimiter = "0x%"
+            if not struct_member['ptr']:
+                cast_type = "(void*)&"
+        elif 'bool' in struct_member['type'].lower():
+            print_type = "s"
+            member_post = ' ? "TRUE" : "FALSE"'
+        elif 'float' in struct_member['type']:
+            print_type = "f"
+        elif 'uint64' in struct_member['type'] or 'gpusize' in struct_member['type'].lower():
+            print_type = '" PRId64 "'
+        elif 'uint8' in struct_member['type']:
+            print_type = "hu"
+        elif 'size' in struct_member['type'].lower():
+            print_type = '" PRINTF_SIZE_T_SPECIFIER "'
+            print_delimiter = ""
+        elif True in [ui_str.lower() in struct_member['type'].lower() for ui_str in ['uint', 'flags', 'samplemask']]:
+            print_type = "u"
+        elif 'int' in struct_member['type']:
+            print_type = "i"
+        elif struct_member['ptr']:
+            print_delimiter = "0x%"
+        else:
+            #print("Unhandled struct type: %s" % struct_member['type'])
+            print_delimiter = "0x%"
+            cast_type = "(void*)"
+        if print_array and struct_member['array']:
+            member_print_post = "[%u]"
+            array_index = " i,"
+            member_post = "[i]"
+        print_out = "<TR><TD>%%s%s%s</TD><TD%s>%s%s%s</TD></TR>" % (member_name, member_print_post, port_label, print_delimiter, print_type, postfix) # section of print that goes inside of quotes
+        print_arg = ", %s,%s %s(%s%s%s)%s\n" % (pre_var_name, array_index, cast_type, struct_var_name, struct_op, member_name, member_post) # section of print passed to portion in quotes
+        return (print_out, print_arg)
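+    # For a uint32_t member named "width" this returns roughly (illustrative):
+    #     print_out: '<TR><TD>%swidth</TD><TD>%u\n</TD></TR>'
+    #     print_arg: ', "uint32_t ", (pStruct->width)\n'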
+
+    def _generateBody(self):
+        gv_funcs = []
+        array_func_list = [] # structs for which we'll generate an array version of their print function
+        array_func_list.append('vkbufferviewattachinfo')
+        array_func_list.append('vkimageviewattachinfo')
+        array_func_list.append('vksamplerimageviewinfo')
+        array_func_list.append('vkdescriptortypecount')
+        # For first pass, generate prototype
+        for s in sorted(self.struct_dict):
+            gv_funcs.append('char* %s(const %s* pStruct, const char* myNodeName);\n' % (self._get_gv_func_name(s), typedef_fwd_dict[s]))
+            if s.lower().strip("_") in array_func_list:
+                if s.lower().strip("_") in ['vkbufferviewattachinfo', 'vkimageviewattachinfo']:
+                    gv_funcs.append('char* %s_array(uint32_t count, const %s* const* pStruct, const char* myNodeName);\n' % (self._get_gv_func_name(s), typedef_fwd_dict[s]))
+                else:
+                    gv_funcs.append('char* %s_array(uint32_t count, const %s* pStruct, const char* myNodeName);\n' % (self._get_gv_func_name(s), typedef_fwd_dict[s]))
+        gv_funcs.append('\n')
+        for s in sorted(self.struct_dict):
+            p_out = ""
+            p_args = ""
+            stp_list = [] # stp == "struct to print": structs in this API call that should be printed as structs
+            # The fields below are a hacky way to get port labels into the GV output. TODO : Clean this up!
+            pl_dict = {}
+            struct_num = 0
+            # This isn't great but this pre-pass flags structs w/ pNext and other struct ptrs
+            for m in sorted(self.struct_dict[s]):
+                if 'pNext' == self.struct_dict[s][m]['name'] or is_type(self.struct_dict[s][m]['type'], 'struct'):
+                    stp_list.append(self.struct_dict[s][m])
+                    if 'pNext' == self.struct_dict[s][m]['name']:
+                        pl_dict[m] = ' PORT=\\"pNext\\"'
+                    else:
+                        pl_dict[m] = ' PORT=\\"struct%i\\"' % struct_num
+                    struct_num += 1
+            gv_funcs.append('char* %s(const %s* pStruct, const char* myNodeName)\n{\n    char* str;\n' % (self._get_gv_func_name(s), typedef_fwd_dict[s]))
+            num_stps = len(stp_list)
+            total_strlen_str = ''
+            if 0 != num_stps:
+                gv_funcs.append("    char* tmpStr;\n")
+                gv_funcs.append("    char nodeName[100];\n")
+                gv_funcs.append('    char* stp_strs[%i];\n' % num_stps)
+                for index in range(num_stps):
+                    if (stp_list[index]['ptr']):
+                        if 'pDescriptorInfo' == stp_list[index]['name']:
+                            gv_funcs.append('    if (pStruct->pDescriptorInfo && (0 != pStruct->descriptorCount)) {\n')
+                        else:
+                            gv_funcs.append('    if (pStruct->%s) {\n' % stp_list[index]['name'])
+                        if 'pNext' == stp_list[index]['name']:
+                            gv_funcs.append('        sprintf(nodeName, "pNext_0x%p", (void*)pStruct->pNext);\n')
+                            gv_funcs.append('        tmpStr = dynamic_gv_display((void*)pStruct->pNext, nodeName);\n')
+                            gv_funcs.append('        stp_strs[%i] = (char*)malloc(256+strlen(tmpStr)+strlen(nodeName)+strlen(myNodeName));\n' % index)
+                            gv_funcs.append('        sprintf(stp_strs[%i], "%%s\\n\\"%%s\\":pNext -> \\"%%s\\" [];\\n", tmpStr, myNodeName, nodeName);\n' % index)
+                            gv_funcs.append('        free(tmpStr);\n')
+                        else:
+                            gv_funcs.append('        sprintf(nodeName, "%s_0x%%p", (void*)pStruct->%s);\n' % (stp_list[index]['name'], stp_list[index]['name']))
+                            if stp_list[index]['name'] in ['pTypeCount', 'pSamplerImageViews']:
+                                gv_funcs.append('        tmpStr = %s_array(pStruct->count, pStruct->%s, nodeName);\n' % (self._get_gv_func_name(stp_list[index]['type']), stp_list[index]['name']))
+                            else:
+                                gv_funcs.append('        tmpStr = %s(pStruct->%s, nodeName);\n' % (self._get_gv_func_name(stp_list[index]['type']), stp_list[index]['name']))
+                            gv_funcs.append('        stp_strs[%i] = (char*)malloc(256+strlen(tmpStr)+strlen(nodeName)+strlen(myNodeName));\n' % (index))
+                            gv_funcs.append('        sprintf(stp_strs[%i], "%%s\\n\\"%%s\\":struct%i -> \\"%%s\\" [];\\n", tmpStr, myNodeName, nodeName);\n' % (index, index))
+                        gv_funcs.append('    }\n')
+                        gv_funcs.append("    else\n        stp_strs[%i] = \"\";\n" % (index))
+                    elif stp_list[index]['array']: # TODO : For now just printing first element of array
+                        gv_funcs.append('    sprintf(nodeName, "%s_0x%%p", (void*)&pStruct->%s[0]);\n' % (stp_list[index]['name'], stp_list[index]['name']))
+                        gv_funcs.append('    tmpStr = %s(&pStruct->%s[0], nodeName);\n' % (self._get_gv_func_name(stp_list[index]['type']), stp_list[index]['name']))
+                        gv_funcs.append('    stp_strs[%i] = (char*)malloc(256+strlen(tmpStr)+strlen(nodeName)+strlen(myNodeName));\n' % (index))
+                        gv_funcs.append('    sprintf(stp_strs[%i], "%%s\\n\\"%%s\\":struct%i -> \\"%%s\\" [];\\n", tmpStr, myNodeName, nodeName);\n' % (index, index))
+                    else:
+                        gv_funcs.append('    sprintf(nodeName, "%s_0x%%p", (void*)&pStruct->%s);\n' % (stp_list[index]['name'], stp_list[index]['name']))
+                        gv_funcs.append('    tmpStr = %s(&pStruct->%s, nodeName);\n' % (self._get_gv_func_name(stp_list[index]['type']), stp_list[index]['name']))
+                        gv_funcs.append('    stp_strs[%i] = (char*)malloc(256+strlen(tmpStr)+strlen(nodeName)+strlen(myNodeName));\n' % (index))
+                        gv_funcs.append('    sprintf(stp_strs[%i], "%%s\\n\\"%%s\\":struct%i -> \\"%%s\\" [];\\n", tmpStr, myNodeName, nodeName);\n' % (index, index))
+                    total_strlen_str += 'strlen(stp_strs[%i]) + ' % index
+            gv_funcs.append('    str = (char*)malloc(%ssizeof(char)*2048);\n' % (total_strlen_str))
+            gv_funcs.append('    sprintf(str, "\\"%s\\" [\\nlabel = <<TABLE BORDER=\\"0\\" CELLBORDER=\\"1\\" CELLSPACING=\\"0\\"><TR><TD COLSPAN=\\"2\\">%s (0x%p)</TD></TR>')
+            p_args = ", myNodeName, myNodeName, pStruct"
+            for m in sorted(self.struct_dict[s]):
+                plabel = ""
+                if m in pl_dict:
+                    plabel = pl_dict[m]
+                (p_out1, p_args1) = self._get_struct_gv_print_formatted(self.struct_dict[s][m], port_label=plabel)
+                p_out += p_out1
+                p_args += p_args1
+            p_out += '</TABLE>>\\n];\\n\\n"'
+            p_args += ");\n"
+            gv_funcs.append(p_out)
+            gv_funcs.append(p_args)
+            if 0 != num_stps:
+                gv_funcs.append('    for (int32_t stp_index = %i; stp_index >= 0; stp_index--) {\n' % (num_stps-1))
+                gv_funcs.append('        if (0 < strlen(stp_strs[stp_index])) {\n')
+                gv_funcs.append('            strncat(str, stp_strs[stp_index], strlen(stp_strs[stp_index]));\n')
+                gv_funcs.append('            free(stp_strs[stp_index]);\n')
+                gv_funcs.append('        }\n')
+                gv_funcs.append('    }\n')
+            gv_funcs.append("    return str;\n}\n")
+            if s.lower().strip("_") in array_func_list:
+                ptr_array = False
+                if s.lower().strip("_") in ['vkbufferviewattachinfo', 'vkimageviewattachinfo']:
+                    ptr_array = True
+                    gv_funcs.append('char* %s_array(uint32_t count, const %s* const* pStruct, const char* myNodeName)\n{\n    char* str;\n    char tmpStr[1024];\n' % (self._get_gv_func_name(s), typedef_fwd_dict[s]))
+                else:
+                    gv_funcs.append('char* %s_array(uint32_t count, const %s* pStruct, const char* myNodeName)\n{\n    char* str;\n    char tmpStr[1024];\n' % (self._get_gv_func_name(s), typedef_fwd_dict[s]))
+                gv_funcs.append('    str = (char*)malloc(sizeof(char)*1024*count);\n')
+                gv_funcs.append('    sprintf(str, "\\"%s\\" [\\nlabel = <<TABLE BORDER=\\"0\\" CELLBORDER=\\"1\\" CELLSPACING=\\"0\\"><TR><TD COLSPAN=\\"3\\">%s (0x%p)</TD></TR>", myNodeName, myNodeName, pStruct);\n')
+                gv_funcs.append('    for (uint32_t i=0; i < count; i++) {\n')
+                gv_funcs.append('        sprintf(tmpStr, "');
+                p_args = ""
+                p_out = ""
+                for m in sorted(self.struct_dict[s]):
+                    plabel = ""
+                    (p_out1, p_args1) = self._get_struct_gv_print_formatted(self.struct_dict[s][m], port_label=plabel)
+                    if 0 == m: # Add array index notation at end of first row
+                        p_out1 = '%s<TD ROWSPAN=\\"%i\\" PORT=\\"slot%%u\\">%%u</TD></TR>' % (p_out1[:-5], len(self.struct_dict[s]))
+                        p_args1 += ', i, i'
+                    p_out += p_out1
+                    p_args += p_args1
+                p_out += '"'
+                p_args += ");\n"
+                if ptr_array:
+                    p_args = p_args.replace('->', '[i]->')
+                else:
+                    p_args = p_args.replace('->', '[i].')
+                gv_funcs.append(p_out)
+                gv_funcs.append(p_args)
+                gv_funcs.append('        strncat(str, tmpStr, strlen(tmpStr));\n')
+                gv_funcs.append('    }\n')
+                gv_funcs.append('    strncat(str, "</TABLE>>\\n];\\n\\n", 20);\n')
+                gv_funcs.append('    return str;\n}\n')
+        # Add function to dynamically print out unknown struct
+        gv_funcs.append("char* dynamic_gv_display(const void* pStruct, const char* nodeName)\n{\n")
+        gv_funcs.append("    // Cast to APP_INFO ptr initially just to pull sType off struct\n")
+        gv_funcs.append("    VkStructureType sType = ((VkApplicationInfo*)pStruct)->sType;\n")
+        gv_funcs.append("    switch (sType)\n    {\n")
+        for e in enum_type_dict:
+            if "StructureType" in e:
+                for v in sorted(enum_type_dict[e]):
+                    struct_name = get_struct_name_from_struct_type(v)
+                    if struct_name not in self.struct_dict:
+                        continue
+
+                    print_func_name = self._get_gv_func_name(struct_name)
+                    # TODO : Hand-coded fixes for some exceptions
+                    #if 'VkPipelineCbStateCreateInfo' in struct_name:
+                    #    struct_name = 'VK_PIPELINE_CB_STATE'
+                    if 'VkSemaphoreCreateInfo' in struct_name:
+                        struct_name = 'VkSemaphoreCreateInfo'
+                        print_func_name = self._get_gv_func_name(struct_name)
+                    elif 'VkSemaphoreOpenInfo' in struct_name:
+                        struct_name = 'VkSemaphoreOpenInfo'
+                        print_func_name = self._get_gv_func_name(struct_name)
+                    gv_funcs.append('        case %s:\n' % (v))
+                    gv_funcs.append('            return %s((%s*)pStruct, nodeName);\n' % (print_func_name, struct_name))
+                    #gv_funcs.append('        }\n')
+                    #gv_funcs.append('        break;\n')
+                gv_funcs.append("        default:\n")
+                gv_funcs.append("        return NULL;\n")
+                gv_funcs.append("    }\n")
+        gv_funcs.append("}")
+        return "".join(gv_funcs)
+
+
+#    def _generateHeader(self):
+#        hdr = []
+#        hdr.append('digraph g {\ngraph [\nrankdir = "LR"\n];')
+#        hdr.append('node [\nfontsize = "16"\nshape = "plaintext"\n];')
+#        hdr.append('edge [\n];\n')
+#        return "\n".join(hdr)
+#
+#    def _generateBody(self):
+#        body = []
+#        for s in sorted(self.struct_dict):
+#            field_num = 1
+#            body.append('"%s" [\nlabel = <<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0"> <TR><TD COLSPAN="2" PORT="f0">%s</TD></TR>' % (s, typedef_fwd_dict[s]))
+#            for m in sorted(self.struct_dict[s]):
+#                body.append('<TR><TD PORT="f%i">%s</TD><TD PORT="f%i">%s</TD></TR>' % (field_num, self.struct_dict[s][m]['full_type'], field_num+1, self.struct_dict[s][m]['name']))
+#                field_num += 2
+#            body.append('</TABLE>>\n];\n')
+#        return "".join(body)
+
+def main(argv=None):
+    opts = handle_args()
+    # Parse input file and fill out global dicts
+    hfp = HeaderFileParser(opts.input_file)
+    hfp.parse()
+    # TODO : Don't want these to be global, see note at top about wrapper classes
+    global enum_val_dict
+    global enum_type_dict
+    global struct_dict
+    global opaque_types
+    global typedef_fwd_dict
+    global typedef_rev_dict
+    global types_dict
+    enum_val_dict = hfp.get_enum_val_dict()
+    enum_type_dict = hfp.get_enum_type_dict()
+    struct_dict = hfp.get_struct_dict()
+    opaque_types = hfp.get_opaque_types()
+    # TODO : Would like to validate struct data here to verify that all of the bools for struct members are correct at this point
+    typedef_fwd_dict = hfp.get_typedef_fwd_dict()
+    typedef_rev_dict = hfp.get_typedef_rev_dict()
+    types_dict = hfp.get_types_dict()
+    #print(enum_val_dict)
+    #print(typedef_dict)
+    #print(struct_dict)
+    input_header = os.path.basename(opts.input_file)
+    if 'vulkan.h' == input_header:
+        input_header = "vulkan/vulkan.h"
+
+    prefix = os.path.basename(opts.input_file).strip(".h")
+    if prefix == "vulkan":
+        prefix = "vk"
+    if (opts.abs_out_dir is not None):
+        enum_sh_filename = os.path.join(opts.abs_out_dir, prefix+"_enum_string_helper.h")
+    else:
+        enum_sh_filename = os.path.join(os.getcwd(), opts.rel_out_dir, prefix+"_enum_string_helper.h")
+    enum_sh_filename = os.path.abspath(enum_sh_filename)
+    if not os.path.exists(os.path.dirname(enum_sh_filename)):
+        print("Creating output dir %s" % os.path.dirname(enum_sh_filename))
+        os.mkdir(os.path.dirname(enum_sh_filename))
+    if opts.gen_enum_string_helper:
+        print("Generating enum string helper to %s" % enum_sh_filename)
+        enum_vh_filename = os.path.join(os.path.dirname(enum_sh_filename), prefix+"_enum_validate_helper.h")
+        print("Generating enum validate helper to %s" % enum_vh_filename)
+        eg = EnumCodeGen(enum_type_dict, enum_val_dict, typedef_fwd_dict, os.path.basename(opts.input_file), enum_sh_filename, enum_vh_filename)
+        eg.generateStringHelper()
+        eg.generateEnumValidate()
+    #for struct in struct_dict:
+    #print(struct)
+    if opts.gen_struct_wrappers:
+        sw = StructWrapperGen(struct_dict, opaque_types, os.path.splitext(os.path.basename(opts.input_file))[0], os.path.dirname(enum_sh_filename))
+        #print(sw.get_class_name(struct))
+        sw.set_include_headers([input_header,os.path.basename(enum_sh_filename),"stdint.h","stdio.h","stdlib.h","iostream","sstream","string"])
+        sw.set_no_addr(False)
+        # NB: this api dump variant of gen_struct_wrappers only generates an api_dump
+        # version of the vk_struct_helper_cpp.h file (vk_api_dump_helper_cpp.h); the
+        # early return below intentionally skips the remaining generators.
+        sw.generateStringHelperCpp()
+        return
+        sw.set_include_headers([input_header,os.path.basename(enum_sh_filename),"stdint.h","cinttypes", "stdio.h","stdlib.h"])
+        print("Generating struct wrapper header to %s" % sw.header_filename)
+        sw.generateHeader()
+        print("Generating struct wrapper class to %s" % sw.class_filename)
+        sw.generateBody()
+        sw.generateStringHelper()
+        sw.generateValidateHelper()
+        # Generate a 2nd helper file that excludes addrs
+        sw.set_no_addr(True)
+        sw.generateStringHelper()
+        sw.set_no_addr(False)
+        sw.set_include_headers([input_header,os.path.basename(enum_sh_filename),"stdint.h","stdio.h","stdlib.h","iostream","sstream","string"])
+        sw.set_no_addr(True)
+        sw.generateStringHelperCpp()
+        sw.set_no_addr(False)
+        sw.generateStringHelperCpp()
+        sw.set_include_headers(["stdio.h", "stdlib.h", input_header])
+        sw.generateSizeHelper()
+        sw.generateSizeHelperC()
+        sw.generateSafeStructHeader()
+        sw.generateSafeStructs()
+    if opts.gen_struct_sizes:
+        st = StructWrapperGen(struct_dict, os.path.splitext(os.path.basename(opts.input_file))[0], os.path.dirname(enum_sh_filename))
+        st.set_include_headers(["stdio.h", "stdlib.h", input_header])
+        st.generateSizeHelper()
+        st.generateSizeHelperC()
+    if opts.gen_cmake:
+        cmg = CMakeGen(sw, os.path.dirname(enum_sh_filename))
+        cmg.generate()
+    if opts.gen_graphviz:
+        gv = GraphVizGen(struct_dict, os.path.splitext(os.path.basename(opts.input_file))[0], os.path.dirname(enum_sh_filename))
+        gv.set_include_headers([input_header,os.path.basename(enum_sh_filename),"stdint.h","stdio.h","stdlib.h", "cinttypes"])
+        gv.generate()
+    print("DONE!")
+    #print(typedef_rev_dict)
+    #print(types_dict)
+    #recreate_structs()
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/vktrace/.gitignore b/vktrace/.gitignore
new file mode 100644
index 0000000..73c53a6
--- /dev/null
+++ b/vktrace/.gitignore
@@ -0,0 +1,27 @@
+CMakeCache.txt
+CMakeFiles/
+cmake_install.cmake
+Makefile
+__pycache__
+VKConfig.h
+icd/common/libicd.a
+icd/intel/intel_gpa.c
+_out64
+out32/*
+out64/*
+demos/Debug/*
+demos/tri.dir/Debug/*
+demos/tri/Debug/*
+demos/Win32/Debug/*
+demos/xcb_nvidia.dir/*
+libs/Win32/Debug/*
+*.pyc
+*.vcproj
+*.sln
+*.suo
+*.vcxproj
+*.sdf
+*.filters
+build
+dbuild
+src/vktrace_layer/codegen/*
diff --git a/vktrace/CMakeLists.txt b/vktrace/CMakeLists.txt
new file mode 100644
index 0000000..59dd995
--- /dev/null
+++ b/vktrace/CMakeLists.txt
@@ -0,0 +1,97 @@
+PROJECT(vktrace_project)
+
+option(BUILD_VKTRACEVIEWER "Build VkTraceViewer" ON)
+
+if (BUILD_VKTRACEVIEWER)
+    # We need CMake version 3.0+ in order to "find_package(Qt5)":
+    cmake_minimum_required(VERSION 3.0)
+else ()
+    cmake_minimum_required(VERSION 2.8.11)
+endif()
+
+set(SRC_DIR "${CMAKE_CURRENT_SOURCE_DIR}/src")
+
+#set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${SRC_DIR}/cmake/Modules/")
+#set(CMAKE_EXTERNAL_PATH "${SRC_DIR}/../../external")
+
+if (WIN32)
+    # TODO: s/CMAKE_PREFIX_PATH/CMAKE_EXTERNAL_WINDOWS_PATH/g
+#    set(CMAKE_PREFIX_PATH "${CMAKE_EXTERNAL_PATH}/windows")
+    set(WIN32_PTHREADS_PATH "${SRC_DIR}/thirdparty/pthreads.2")
+    set(WIN32_PTHREADS_INCLUDE_PATH "${WIN32_PTHREADS_PATH}/include")
+endif()
+
+set(PYTHON_EXECUTABLE ${PYTHON_CMD})
+find_package(PythonInterp)
+
+if (NOT PYTHONINTERP_FOUND)
+    message(FATAL_ERROR "Missing PythonInterp. Install python interpreter 2.7 (on linux use cmd: sudo apt-get install python2.7)")
+endif()
+
+#search for QT only if BUILD_VKTRACEVIEWER is ON
+if(BUILD_VKTRACEVIEWER)
+    find_package(Qt5 COMPONENTS Widgets Gui Core Svg)
+    if (NOT Qt5_FOUND)
+        if (WIN32)
+            message(WARNING "Qt5 dev libraries not found, vktraceviewer will not be built.\nTo enable build of vktraceviewer, set env var Qt5_Dir to\nC:\\Qt\\5.3\\msvc2013_64\\lib\\cmake\\Qt5 or C:\\Qt\\5.3\\msvc2013\\lib\\cmake\\Qt5")
+        else()
+            message(WARNING "Qt5 dev libraries not found, vktraceviewer will not be built.\nTo enable build of vktraceviewer, install package qt5-default.")
+        endif()
+    endif()
+endif()
+
+include_directories(
+	${CMAKE_CURRENT_SOURCE_DIR}/../include/vulkan
+)
+
+message("")
+message("cmake options:")
+message("  -DCMAKE_BUILD_TYPE='${CMAKE_BUILD_TYPE}': Build debug or release. (Debug|Release)")
+message("  -DCMAKE_VERBOSE='${CMAKE_VERBOSE}': Spew cmake project options. (On|Off)")
+message("  -DBUILD_X64='${BUILD_X64}': Build 32 or 64-bit. (On|Off)")
+message("")
+
+#
+#  Components to build
+#
+set(VKTRACE_VULKAN_DIR ${CMAKE_CURRENT_SOURCE_DIR})
+
+add_subdirectory(src/vktrace_common)
+add_subdirectory(src/vktrace_trace)
+
+option(BUILD_VKTRACE_LAYER "Build vktrace_layer" ON)
+if(BUILD_VKTRACE_LAYER)
+    add_subdirectory(src/vktrace_layer)
+endif()
+option(BUILD_VKTRACE_REPLAY "Build vktrace_replay" ON)
+if(BUILD_VKTRACE_REPLAY)
+    add_subdirectory(src/vktrace_replay)
+endif()
+
+# Only build vktraceviewer if Qt5 is available
+if (Qt5_FOUND AND BUILD_VKTRACEVIEWER)
+    add_subdirectory(src/vktrace_viewer)
+endif()
+
+# Use a macro from Stack Overflow (link below) to get all the extension subdirectories present in the source tree
+# http://stackoverflow.com/questions/7787823/cmake-how-to-get-the-name-of-all-subdirectories-of-a-directory
+MACRO(SUBDIRLIST result curdir)
+  FILE(GLOB children RELATIVE ${curdir} ${curdir}/*)
+  SET(dirlist "")
+  FOREACH(child ${children})
+    IF(IS_DIRECTORY ${curdir}/${child})
+        LIST(APPEND dirlist ${child})
+    ENDIF()
+  ENDFOREACH()
+  SET(${result} ${dirlist})
+ENDMACRO()
+
+# now generate the list and add each of the subdirectories
+SUBDIRLIST(SUBDIRS ${SRC_DIR}/vktrace_extensions)
+message("Adding extensions: '${SUBDIRS}'")
+FOREACH(subdir ${SUBDIRS})
+    add_subdirectory(${SRC_DIR}/vktrace_extensions/${subdir})
+ENDFOREACH()
+
+
+
diff --git a/vktrace/README.md b/vktrace/README.md
new file mode 100644
index 0000000..a3f7447
--- /dev/null
+++ b/vktrace/README.md
@@ -0,0 +1,109 @@
+Vktrace Trace and Replay Tool
+=============================
+
+Vktrace is a Vulkan API tracer for graphics applications.
+
+##Using Vktrace on Linux##
+Vktrace builds two binaries with associated Vulkan libraries: a tracer with a
+Vulkan tracing library, and a replayer. The tracing library is a Vulkan layer library.
+
+###Running Vktrace tracer as standalone server on Linux###
+The Vktrace tracer program can run as a server. The app/game to be traced is then
+launched separately with the Vktrace tracer library preloaded. To run
+Vktrace as a server, omit the "-p" option.
+```
+cd <vktrace build directory>
+./vktrace <options>
+```
+Example to trace the spinning cube demo:
+```
+export VK_ICD_FILENAMES=/home/jon/LoaderAndValidationLayers/main_icd.json
+export LD_LIBRARY_PATH=/home/jon/LoaderAndValidationLayers/dbuild/loader
+./vktrace -o vktrace_cube.vktrace
+```
+
+In a separate terminal run your app, the cube demo in this example:
+```
+cd /home/jon/LoaderAndValidationLayers/dbuild/demos
+export VK_ICD_FILENAMES=/home/jon/LoaderAndValidationLayers/dbuild/icd/intel/intel_icd.json
+export LD_LIBRARY_PATH=/home/jon/LoaderAndValidationLayers/dbuild/loader
+VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_vktrace VK_DEVICE_LAYERS=VK_LAYER_LUNARG_vktrace ./cube
+```
+
+The trace file is written to "vktrace_cube<number>.vktrace".
+As the app is rerun, the Vktrace tracer server increments the output file
+number for each successive run of the app.
+
+One can also set VKTRACE_LIB_IPADDR to the IP address of a remote system. The
+tracer inserted into an app will then send the trace packets to that remote
+system rather than the local system. In this case, the remote system should be
+running the trace server.
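+For example (illustrative IP address; the trace server must already be running on
+the remote system, and the other environment settings shown above still apply):
+```
+export VKTRACE_LIB_IPADDR=192.168.0.2
+VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_vktrace VK_DEVICE_LAYERS=VK_LAYER_LUNARG_vktrace ./cube
+```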
+
+###Running the Vktrace tracer and launching the app/game from the tracer on Linux###
+The Vktrace tracer program launches the app/game you specify and then traces it.
+To launch the app/game from the Vktrace tracer, use the "-p" option.
+```
+cd <vktrace build dir>
+./vktrace -p <path to app to launch>  <more options>
+```
+Example to trace the spinning cube demo from the sample implementation:
+```
+export VK_ICD_FILENAMES=/home/jon/LoaderAndValidationLayers/main_icd.json
+export LD_LIBRARY_PATH=/home/jon/LoaderAndValidationLayers/dbuild/loader
+./vktrace -p /home/jon/LoaderAndValidationLayers/dbuild/demos/cube -o vktrace_cube.vktrace -w /home/jon/LoaderAndValidationLayers/dbuild/demos
+```
+The trace file is written to "vktrace_cube.vktrace".
+
+###Running replayer on Linux###
+The Vktrace replayer takes a trace file and launches a Vulkan session based on
+the trace file.
+```
+cd <vktrace build dir>
+export LD_LIBRARY_PATH=<path to libvulkan.so>
+./vkreplay <options> -t trace_filename
+```
+Example to replay the trace file captured above:
+```
+export VK_ICD_FILENAMES=/home/jon/LoaderAndValidationLayers/main_icd.json
+export LD_LIBRARY_PATH=/home/jon/LoaderAndValidationLayers/dbuild:/home/jon/LoaderAndValidationLayers/dbuild/loader
+./vkreplay -t vktrace_cube.vktrace
+```
+
+##Using Vktrace on Windows##
+Vktrace builds two binaries with associated Vulkan libraries: a tracer with a
+Vulkan tracing library, and a replayer. The tracing library is a Vulkan layer library.
+
+
+###Running the Vktrace tracer and launching the app/game from the tracer on Windows###
+The Vktrace tracer program launches the app/game you specify and then traces it.
+To launch the app/game from the Vktrace tracer, use the "-p" option.
+Also, you may need to copy the Vulkan.dll library into the directories of Vktrace
+and of the app/game (while we continue to put Windows support into place).
+```
+cd <vktrace build dir>
+./vktrace -p <path-to-app-to-launch> -w <working-dir-path-of-app>  <more options>
+```
+Example to trace the spinning cube demo (Note: see other files for how to configure your ICD):
+```
+cd C:\Users\developer\Vktrace\_out64\Debug
+
+vktrace -p C:\Users\developer\LoaderAndValidationLayers\_out64\demos\cube.exe
+        -w C:\Users\developer\LoaderAndValidationLayers\_out64\demos
+        -o vktrace_cube.vktrace
+```
+The trace file is written to "vktrace_cube.vktrace".
+
+###Running replayer on Windows###
+The Vktrace replayer takes a trace file and launches a Vulkan session based on
+the trace file.
+```
+cd <vktrace build dir>
+vkreplay <options> -t trace_filename
+```
+Example to replay the trace file captured above:
+```
+cd C:\Users\developer\Vktrace\_out64\Debug
+vkreplay -t vktrace_cube.vktrace
+```
+##Building Vktrace##
+Vktrace is built as part of the top-level VulkanTools build. Follow the
+build directions for the top-level VulkanTools project in BUILDVT.md. Vktrace binaries and
+libraries will be placed in <build_dir>.
diff --git a/vktrace/TODO.md b/vktrace/TODO.md
new file mode 100644
index 0000000..dda62f4
--- /dev/null
+++ b/vktrace/TODO.md
@@ -0,0 +1,76 @@
+Here is a list of all supported features in VkTrace, followed by a TODO list of features that we'd like to add soon. We've also listed "like to have" features that we don't have a short-term plan to implement.
+
+Feel free to vote items up in the lists, attempt to implement them yourself, or add to the lists!
+
+As you complete an item, please copy/paste it into the SUPPORTED FEATURES section.
+
+**SUPPORTED FEATURES IN DEBUGGER**
+* Generating & loading traces
+* Replay traces within the UI w/ pause, continue, stop ability
+  * Auto-pause on Validation Layer Messages (info, warnings, and/or errors), controlled by settings
+  * Single-step the replay
+  * Timeline pointer is updated in real time to track the currently replayed API call
+  * Run the replay in a separate thread from the UI
+  * Pop-out replay window to be floating so it can replay at larger dimensions
+* Timeline shows CPU time of each API call
+  * A separate timeline is shown for each thread referenced in the trace file
+  * Tooltips display the API call index and entrypoint name and parameters
+  * Clicking a call causes the API Call Tree to highlight that call
+* API entrypoints names & parameters displayed in UI
+* Tracing and replay standard output gets directed to Output window
+* Plugin-based UI allows for extensibility to other APIs
+* Search API Call Tree
+  * Search result navigation
+* API Call Tree Enhancements:
+  * Draw call navigation buttons
+  * Draw calls are shown in bold font
+  * "Run to here" context menu option to control where Replayer pauses
+* Group API Calls by:
+  * Frame boundary
+  * Thread Id
+* Export API Calls as Text file
+* Settings dialog
+
+**TODO LIST IN DEBUGGER**
+* Hide / show columns on API Call Tree
+* State dependency graph at selected API Call
+* Group API Calls by:
+  * API-specific debug groups
+  * Command Buffer Submission
+  * Render vs State calls
+* Saving 'session' data:
+  * Recently loaded traces
+* Capture state from replay
+* Rewind the replay
+* Custom viewers of each state type
+* Per API entrypoint call stacks
+* Collect and display machine information
+* 64-bit build supports 32-bit trace files
+* Timeline enhancements:
+  * Pan & Zoom
+* Optimize trace file loading by memory-mapping the file
+
+**SUPPORTED FEATURES IN TRACING/REPLAYING COMMAND LINE TOOLS AND LIBRARIES**
+* Command line Tracer app (vktrace) which launches game/app with tracing library(ies) inserted and writes trace packets to a file
+* Command line Tracer server which collects tracing packets over a socket connection and writes them to a file
+* Vulkan tracer library supports multithreaded Vulkan apps
+* Command line Replayer app (vkreplay) replays a Vulkan trace file with Window display on Linux
+
+**TODO LIST IN TRACING/REPLAYING COMMAND LINE TOOLS AND LIBRARIES**
+* Optimize replay speed by using hash maps for opaque handles
+* Handle XGL persistently CPU-mapped buffers during tracing, rather than relying on updating data at unmap time
+* Optimize Replayer speed by memory-mapping the file and/or reading file in a separate thread
+* Looping in Replayer over arbitrary frames or calls
+* Looping in Replayer with state restoration at beginning of loop
+* Replayer window display of Vulkan on Windows OS
+* Command line tool to display trace file in human readable format
+* Command line tool for editing trace files in human readable format
+* Replayer supports multithreading
+* 64-bit build supports 32-bit trace files
+* Cross-platform XGL tracing and replay support with differing GPUs
+
+**LIKE TO HAVE FUTURE FEATURE IDEAS**
+* Export trace file into *.cpp/h files that compile into a runnable application
+* Editing, adding, removal of API calls
+* Shader editing
+* Hyperlink API Call Tree to state-specific windows
diff --git a/vktrace/src/LICENSE b/vktrace/src/LICENSE
new file mode 100644
index 0000000..1d79dd8
--- /dev/null
+++ b/vktrace/src/LICENSE
@@ -0,0 +1,17 @@
+Copyright (C) 2014-2016 Valve Corporation
+Copyright (C) 2014-2016 LunarG, Inc.
+
+All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
diff --git a/vktrace/src/build_options.cmake b/vktrace/src/build_options.cmake
new file mode 100644
index 0000000..bde0c04
--- /dev/null
+++ b/vktrace/src/build_options.cmake
@@ -0,0 +1,515 @@
+#
+# cmake -DCMAKE_BUILD_TYPE=Debug ..
+#
+#   http://www.cmake.org/Wiki/CMake_FAQ
+#   http://www.cmake.org/Wiki/CMake_Useful_Variables
+#   http://clang.llvm.org/docs/LanguageExtensions.html
+#
+#
+cmake_minimum_required(VERSION 2.8)
+
+set(BUILD_X64 "" CACHE STRING "whether to perform 64-bit build: ON or OFF overrides default detection")
+
+option(CMAKE_VERBOSE "Verbose CMake" FALSE)
+if (CMAKE_VERBOSE)
+    SET(CMAKE_VERBOSE_MAKEFILE ON)
+endif()
+
+# With gcc48: http://indico.cern.ch/getFile.py/access?contribId=1&resId=0&materialId=slides&confId=230762
+
+option(WITH_HARDENING "Enable hardening: Compile-time protection against static sized buffer overflows" OFF)
+
+# Unless user specifies BUILD_X64 explicitly, assume native target
+if (BUILD_X64 STREQUAL "")
+  if (CMAKE_SIZEOF_VOID_P EQUAL 8)
+    set(BUILD_X64 "TRUE")
+  else()
+    set(BUILD_X64 "FALSE")
+  endif()
+endif()
+
+# Generate the bitness suffix to use, making sure to include the existing suffix (.exe)
+# for platforms that need it (i.e., Windows)
+if (BUILD_X64)
+    set(CMAKE_EXECUTABLE_SUFFIX "${CMAKE_EXECUTABLE_SUFFIX}")
+    set(CMAKE_SHARED_LIBRARY_SUFFIX "${CMAKE_SHARED_LIBRARY_SUFFIX}")
+else()
+    # Don't add the 32 for Windows because it goes in a different location.
+    if (MSVC)
+        set(CMAKE_EXECUTABLE_SUFFIX "${CMAKE_EXECUTABLE_SUFFIX}")
+        set(CMAKE_SHARED_LIBRARY_SUFFIX "${CMAKE_SHARED_LIBRARY_SUFFIX}")
+    else()
+        set(CMAKE_EXECUTABLE_SUFFIX "32${CMAKE_EXECUTABLE_SUFFIX}")
+        set(CMAKE_SHARED_LIBRARY_SUFFIX "32${CMAKE_SHARED_LIBRARY_SUFFIX}")
+    endif()
+endif()
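+# For example, a 32-bit non-MSVC build turns an executable "foo" into "foo32"
+# and a shared library "libfoo.so" into "libfoo32.so"; 64-bit and MSVC builds
+# keep the default names.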
+
+# Default to release build
+if (NOT CMAKE_BUILD_TYPE)
+    set(CMAKE_BUILD_TYPE Release)
+endif()
+
+# Make sure we're using 64-bit versions of stat, fopen, etc.
+# Large File Support extensions:
+#   http://www.gnu.org/software/libc/manual/html_node/Feature-Test-Macros.html#Feature-Test-Macros
+add_definitions(-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES)
+
+# support for inttypes.h macros
+add_definitions(-D__STDC_LIMIT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_CONSTANT_MACROS)
+
+if(MSVC)
+    set(CMAKE_CXX_FLAGS_LIST "/W3 /D_CRT_SECURE_NO_WARNINGS=1 /DWIN32 /D_WIN32")
+    set(CMAKE_CXX_FLAGS_RELEASE_LIST "/O2 /DNDEBUG")
+    set(CMAKE_CXX_FLAGS_DEBUG_LIST "/Od /D_DEBUG")
+else()
+    set(CMAKE_CXX_FLAGS_LIST "-g -Wall -Wextra")
+    set(CMAKE_CXX_FLAGS_RELEASE_LIST "-g -O2 -DNDEBUG")
+    set(CMAKE_CXX_FLAGS_DEBUG_LIST "-g -O0 -D_DEBUG")
+endif()
+
+
+if (${CMAKE_C_COMPILER_ID} STREQUAL "Clang")
+
+  # clang doesn't print colored diagnostics when invoked from Ninja
+  if (UNIX AND CMAKE_GENERATOR STREQUAL "Ninja")
+      add_definitions ("-fcolor-diagnostics")
+  endif()
+
+  if (CLANG_EVERYTHING)
+      set(CMAKE_CXX_FLAGS_LIST ${CMAKE_CXX_FLAGS_LIST}
+          # "-pedantic"             # Warn on language extensions
+          "-Weverything"            # Enable all warnings
+          "-fdiagnostics-show-category=name"
+          "-Wno-unused-macros"
+          "-Wno-padded"
+          "-Wno-variadic-macros"
+          "-Wno-missing-variable-declarations"
+          "-Wno-missing-prototypes"
+          "-Wno-sign-conversion"
+          "-Wno-conversion"
+          "-Wno-cast-align"
+          "-Wno-exit-time-destructors"
+          "-Wno-documentation-deprecated-sync"
+          "-Wno-documentation-unknown-command"
+
+          # TODO: Would be great to start enabling some of these warnings...
+          "-Wno-undefined-reinterpret-cast"
+          "-Wno-incompatible-pointer-types-discards-qualifiers"
+          "-Wno-float-equal"
+          "-Wno-unreachable-code"
+          "-Wno-weak-vtables"
+          "-Wno-extra-semi"
+          "-Wno-disabled-macro-expansion"
+          "-Wno-format-nonliteral"
+          "-Wno-packed"
+          "-Wno-c++11-long-long"
+          "-Wno-c++11-extensions"
+          "-Wno-gnu-anonymous-struct"
+          "-Wno-gnu-zero-variadic-macro-arguments"
+          "-Wno-nested-anon-types"
+          "-Wno-gnu-redeclared-enum"
+
+          "-Wno-pedantic"
+          "-Wno-header-hygiene"
+          "-Wno-covered-switch-default"
+          "-Wno-duplicate-enum"
+          "-Wno-switch-enum"
+          "-Wno-extra-tokens"
+
+          # Added because SDL2 headers have a ton of Doxygen warnings currently.
+          "-Wno-documentation"
+
+          )
+  endif()
+
+endif()
+
+
+if ((NOT MSVC) AND (NOT BUILD_X64) AND (CMAKE_SIZEOF_VOID_P EQUAL 8))
+    set(CMAKE_CXX_FLAGS_LIST "${CMAKE_CXX_FLAGS_LIST} -m32")
+    set(CMAKE_EXE_LINK_FLAGS_LIST "${CMAKE_EXE_LINK_FLAGS_LIST} -m32")
+    set(CMAKE_SHARED_LINK_FLAGS_LIST "${CMAKE_SHARED_LINK_FLAGS_LIST} -m32")
+
+    set_property(GLOBAL PROPERTY FIND_LIBRARY_USE_LIB64_PATHS OFF)
+    set(CMAKE_SYSTEM_LIBRARY_PATH /lib32 /usr/lib32 /usr/lib/i386-linux-gnu /usr/local/lib32)
+    set(CMAKE_IGNORE_PATH /lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib64 /usr/local/lib)
+endif()
+
+function(add_compiler_flag flag)
+    set(CMAKE_C_FLAGS    "${CMAKE_C_FLAGS}   ${flag}" PARENT_SCOPE)
+    set(CMAKE_CXX_FLAGS  "${CMAKE_CXX_FLAGS} ${flag}" PARENT_SCOPE)
+endfunction()
+
+function(add_c_compiler_flag flag)
+    set(CMAKE_C_FLAGS    "${CMAKE_C_FLAGS}   ${flag}" PARENT_SCOPE)
+endfunction()
+
+function(add_cpp_compiler_flag flag)
+    set(CMAKE_CXX_FLAGS  "${CMAKE_CXX_FLAGS} ${flag}" PARENT_SCOPE)
+endfunction()
+
+function(add_compiler_flag_debug flag)
+    set(CMAKE_C_FLAGS_DEBUG    "${CMAKE_C_FLAGS_DEBUG}   ${flag}" PARENT_SCOPE)
+    set(CMAKE_CXX_FLAGS_DEBUG  "${CMAKE_CXX_FLAGS_DEBUG} ${flag}" PARENT_SCOPE)
+endfunction()
+
+function(add_compiler_flag_release flag)
+    set(CMAKE_C_FLAGS_RELEASE    "${CMAKE_C_FLAGS_RELEASE}   ${flag}" PARENT_SCOPE)
+    set(CMAKE_CXX_FLAGS_RELEASE  "${CMAKE_CXX_FLAGS_RELEASE} ${flag}" PARENT_SCOPE)
+endfunction()
+
+
+function(add_linker_flag flag)
+    set(CMAKE_EXE_LINKER_FLAGS   "${CMAKE_EXE_LINKER_FLAGS} ${flag}" PARENT_SCOPE)
+endfunction()
+
+function(add_shared_linker_flag flag)
+    set(CMAKE_SHARED_LINKER_FLAGS   "${CMAKE_SHARED_LINKER_FLAGS} ${flag}" PARENT_SCOPE)
+endfunction()
+
+#
+# To show the include files as you're building, do this:
+#    add_compiler_flag("-H")
+# For Visual Studio, the equivalent is /showIncludes.
+#
+
+# stack-protector-strong: http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00974.html
+## -fstack-protector-strong
+# Compile with the option "-fstack-usage" and a file .su will be generated with stack
+# information for each function.
+## -fstack-usage
+
+# For more info on -fno-strict-aliasing: "Just Say No to strict aliasing optimizations in C": http://nothings.org/
+# The Linux kernel is compiled with -fno-strict-aliasing: https://lkml.org/lkml/2003/2/26/158 or http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg01647.html
+
+### TODO: see if sse is generated with these instructions and clang:
+## -march=corei7 -msse -mfpmath=sse
+
+if (${CMAKE_C_COMPILER_ID} STREQUAL "Clang")
+   if ( NOT BUILD_X64 )
+      # Fix startup crash in dlopen_notify_callback (called indirectly from our dlopen() function) when tracing glxspheres on my AMD dev box (x86 release only)
+      # Also fixes tracing Q3 Arena using release tracer
+      # Clang is generating sse2 code even when it shouldn't be:
+      #  http://lists.cs.uiuc.edu/pipermail/cfe-dev/2012-March/020310.html
+      set(MARCH_STR "-march=i586")
+   endif()
+endif()
+
+if(MSVC)
+    set(CMAKE_CXX_FLAGS_LIST
+        ${CMAKE_CXX_FLAGS_LIST}
+        "/EHsc" # Need exceptions
+    )
+else()
+    set(CMAKE_CXX_FLAGS_LIST
+        ${CMAKE_CXX_FLAGS_LIST}
+        "-fno-omit-frame-pointer"
+        ${MARCH_STR}
+        # "-msse2 -mfpmath=sse" # To build with SSE instruction sets
+        "-Wno-unused-parameter -Wno-unused-function"
+        "-fno-strict-aliasing" # DO NOT remove this, we have lots of code that will fail in obscure ways otherwise because it was developed with MSVC first.
+        "-fno-math-errno"
+    	  "-fvisibility=hidden"
+        # "-fno-exceptions" # Exceptions are enabled by default for c++ files, disabled for c files.
+    )
+endif()
+
+if (CMAKE_COMPILER_IS_GNUCC)
+    execute_process(COMMAND ${CMAKE_C_COMPILER} ${CMAKE_C_COMPILER_ARG1} -dumpversion OUTPUT_VARIABLE GCC_VERSION)
+    string(REGEX MATCHALL "[0-9]+" GCC_VERSION_COMPONENTS ${GCC_VERSION})
+    list(GET GCC_VERSION_COMPONENTS 0 GCC_MAJOR)
+    list(GET GCC_VERSION_COMPONENTS 1 GCC_MINOR)
+    # message(STATUS "Detected GCC v ${GCC_MAJOR} . ${GCC_MINOR}")
+endif()
+
+if (GCC_VERSION VERSION_GREATER 4.8 OR GCC_VERSION VERSION_EQUAL 4.8)
+    set(CMAKE_CXX_FLAGS_LIST ${CMAKE_CXX_FLAGS_LIST}
+        "-Wno-unused-local-typedefs"
+    )
+endif()
+
+if (MSVC)
+else()
+    if (WITH_HARDENING)
+        # http://gcc.gnu.org/ml/gcc-patches/2004-09/msg02055.html
+        add_definitions(-D_FORTIFY_SOURCE=2 -fpic)
+        if (${CMAKE_C_COMPILER_ID} STREQUAL "GNU")
+            # During program load, several ELF memory sections need to be written to by the
+            # linker, but can be turned read-only before turning over control to the
+            # program. This prevents some GOT (and .dtors) overwrite attacks, but at least
+            # the part of the GOT used by the dynamic linker (.got.plt) is still vulnerable.
+            add_definitions(-pie -z now -z relro)
+        endif()
+    endif()
+endif()
+
+if (NOT MSVC)
+    if(APPLE)
+        set(CMAKE_EXE_LINK_FLAGS_LIST "-Wl,-undefined,error")
+    else()
+        set(CMAKE_EXE_LINK_FLAGS_LIST "-Wl,--no-undefined")
+    endif()
+endif()
+
+# Compiler flags
+string(REPLACE ";" " " CMAKE_C_FLAGS              "${CMAKE_CXX_FLAGS_LIST}")
+string(REPLACE ";" " " CMAKE_C_FLAGS_RELEASE      "${CMAKE_CXX_FLAGS_RELEASE_LIST}")
+string(REPLACE ";" " " CMAKE_C_FLAGS_DEBUG        "${CMAKE_CXX_FLAGS_DEBUG_LIST}")
+
+string(REPLACE ";" " " CMAKE_CXX_FLAGS            "${CMAKE_CXX_FLAGS_LIST}")
+string(REPLACE ";" " " CMAKE_CXX_FLAGS_RELEASE    "${CMAKE_CXX_FLAGS_RELEASE_LIST}")
+string(REPLACE ";" " " CMAKE_CXX_FLAGS_DEBUG      "${CMAKE_CXX_FLAGS_DEBUG_LIST}")
+
+# Linker flags (exe)
+string(REPLACE ";" " " CMAKE_EXE_LINKER_FLAGS     "${CMAKE_EXE_LINK_FLAGS_LIST}")
+# Linker flags (shared)
+string(REPLACE ";" " " CMAKE_SHARED_LINKER_FLAGS  "${CMAKE_SHARED_LINK_FLAGS_LIST}")
+
+set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/../../)
+set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/../../)
+
+
+function(build_options_finalize)
+    if (CMAKE_VERBOSE)
+        message("  CMAKE_PROJECT_NAME: ${CMAKE_PROJECT_NAME}")
+        message("  PROJECT_NAME: ${PROJECT_NAME}")
+        message("  BUILD_X64: ${BUILD_X64}")
+        message("  BUILD_TYPE: ${CMAKE_BUILD_TYPE}")
+        message("  CMAKE_BINARY_DIR: ${CMAKE_BINARY_DIR}")
+        message("  PROJECT_BINARY_DIR: ${PROJECT_BINARY_DIR}")
+        message("  CMAKE_SOURCE_DIR: ${CMAKE_SOURCE_DIR}")
+        message("  PROJECT_SOURCE_DIR: ${PROJECT_SOURCE_DIR}")
+        message("  CMAKE_CURRENT_LIST_FILE: ${CMAKE_CURRENT_LIST_FILE}")
+        message("  CXX_FLAGS: ${CMAKE_CXX_FLAGS}")
+        message("  CXX_FLAGS_RELEASE: ${CMAKE_CXX_FLAGS_RELEASE}")
+        message("  CXX_FLAGS_DEBUG: ${CMAKE_CXX_FLAGS_DEBUG}")
+        message("  EXE_LINKER_FLAGS: ${CMAKE_EXE_LINKER_FLAGS}")
+        message("  SHARED_LINKER_FLAGS: ${CMAKE_SHARED_LINKER_FLAGS}")
+        message("  SHARED_LIBRARY_C_FLAGS: ${CMAKE_SHARED_LIBRARY_C_FLAGS}")
+        message("  SHARED_LIBRARY_CXX_FLAGS: ${CMAKE_SHARED_LIBRARY_CXX_FLAGS}")
+        message("  SHARED_LIBRARY_LINK_CXX_FLAGS: ${CMAKE_SHARED_LIBRARY_LINK_CXX_FLAGS}")
+        message("  SHARED_LIBRARY_LINK_C_FLAGS: ${CMAKE_SHARED_LIBRARY_LINK_C_FLAGS}")
+        message("  CMAKE_C_COMPILER: ${CMAKE_C_COMPILER}")
+        message("  CMAKE_CXX_COMPILER: ${CMAKE_CXX_COMPILER}")
+        message("  CMAKE_C_COMPILER_ID: ${CMAKE_C_COMPILER_ID}")
+        message("  CMAKE_EXECUTABLE_SUFFIX: ${CMAKE_EXECUTABLE_SUFFIX}")
+        message("")
+    endif()
+endfunction()
+
+function(require_pthreads)
+    find_package(Threads)
+    if (NOT CMAKE_USE_PTHREADS_INIT AND NOT WIN32_PTHREADS_INCLUDE_PATH)
+        message(FATAL_ERROR "pthread not found")
+    endif()
+
+    if (MSVC)
+        include_directories("${WIN32_PTHREADS_INCLUDE_PATH}")
+        if (BUILD_X64)
+            set(PTHREAD_SRC_LIB "${WIN32_PTHREADS_PATH}/lib/x64/pthreadVC2.lib" PARENT_SCOPE)
+            set(PTHREAD_SRC_DLL "${WIN32_PTHREADS_PATH}/dll/x64/pthreadVC2.dll" PARENT_SCOPE)
+        else()
+            set(PTHREAD_SRC_LIB "${WIN32_PTHREADS_PATH}/lib/x86/pthreadVC2.lib" PARENT_SCOPE)
+            set(PTHREAD_SRC_DLL "${WIN32_PTHREADS_PATH}/dll/x86/pthreadVC2.dll" PARENT_SCOPE)
+        endif()
+
+    else()
+        # Export the variable to the parent scope so the linker knows where to find the library.
+        set(CMAKE_THREAD_LIBS_INIT ${CMAKE_THREAD_LIBS_INIT} PARENT_SCOPE)
+    endif()
+endfunction()
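+# Callers are expected to invoke require_pthreads() and then link
+# ${CMAKE_THREAD_LIBS_INIT} (or ${PTHREAD_SRC_LIB} on MSVC) into their targets.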
+
+function(require_libjpegturbo)
+    find_library(LibJpegTurbo_LIBRARY
+        NAMES libturbojpeg.so libturbojpeg.so.0 libturbojpeg.dylib
+        PATHS /opt/libjpeg-turbo/lib 
+    )
+
+    # On platforms that find this, the include files will have also been installed to the system
+    # so we don't need extra include dirs.
+    if (LibJpegTurbo_LIBRARY)
+        set(LibJpegTurbo_INCLUDE "" PARENT_SCOPE)
+    else()
+        if (BUILD_X64)
+            set(BITS_STRING "x64")
+        else()
+            set(BITS_STRING "x86")
+        endif()
+        set(LibJpegTurbo_INCLUDE "${CMAKE_PREFIX_PATH}/libjpeg-turbo-2.1.3/include" PARENT_SCOPE)
+        set(LibJpegTurbo_LIBRARY "${CMAKE_PREFIX_PATH}/libjpeg-turbo-2.1.3/lib_${BITS_STRING}/turbojpeg.lib" PARENT_SCOPE)
+    endif()
+endfunction()
+
+function(require_sdl2)
+    if (${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+        include(FindPkgConfig)
+        pkg_search_module(PC_SDL2 REQUIRED sdl2)
+
+        find_path(SDL2_INCLUDE SDL.h
+            DOC "SDL2 Include Path"
+	    HINTS ${PC_SDL2_INCLUDEDIR} ${PC_SDL2_INCLUDE_DIRS} )
+
+        find_library(SDL2_LIBRARY SDL2
+            DOC "SDL2 Library"
+	    HINTS ${PC_SDL2_LIBDIR} ${PC_SDL2_LIBRARY_DIRS} )
+    elseif (MSVC)
+        set(SDL2Root "${CMAKE_EXTERNAL_PATH}/SDL")
+
+        set(SDL2_INCLUDE "${SDL2Root}/include" CACHE PATH "SDL2 Include Path")
+        set(SDL2_LIBRARY "${CMAKE_LIBRARY_OUTPUT_DIRECTORY}/${CMAKE_CFG_INTDIR}/SDL2.lib" CACHE FILEPATH "SDL2 Library")
+
+        # Only want to include this once.
+        # This has to go into properties because it needs to persist across the entire cmake run.
+        get_property(SDL_PROJECT_ALREADY_INCLUDED 
+            GLOBAL 
+            PROPERTY SDL_PROJECT_INCLUDED
+            )
+
+        if (NOT SDL_PROJECT_ALREADY_INCLUDED)
+            INCLUDE_EXTERNAL_MSPROJECT(SDL "${SDL2Root}/VisualC/SDL/SDL_VS2013.vcxproj")
+            set_property(GLOBAL
+                PROPERTY SDL_PROJECT_INCLUDED "TRUE"
+                )
+            message("Including SDL_VS2013.vcxproj for you!")
+        endif()
+
+    else()
+        message(FATAL_ERROR "Need to deal with SDL on non-Windows platforms")
+    endif()
+endfunction()
+
+function(require_m)
+    if (MSVC)
+	set(M_LIBRARY "winmm.lib" PARENT_SCOPE)
+    else()
+	set(M_LIBRARY "m" PARENT_SCOPE)
+    endif()
+endfunction()
+
+function(require_gl)
+    if (${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+	include(FindPkgConfig)
+	pkg_search_module(PC_GL QUIET gl)
+
+	find_path(GL_INCLUDE GL/gl.h
+	    DOC "OpenGL Include Path"
+	    HINTS ${PC_GL_INCLUDEDIR} ${PC_GL_INCLUDE_DIRS} )
+
+	find_library(GL_LIBRARY GL
+	    DOC "OpenGL Library"
+	    HINTS ${PC_GL_LIBDIR} ${PC_GL_LIBRARY_DIRS} )
+    elseif (MSVC)
+	set(GL_INCLUDE ""	      CACHE PATH "OpenGL Include Path")
+	set(GL_LIBRARY "opengl32.lib" CACHE FILEPATH "OpenGL Library")
+    else()
+	set(GL_INCLUDE ""   CACHE PATH "OpenGL Include Path")
+	set(GL_LIBRARY "GL" CACHE FILEPATH "OpenGL Library")
+    endif()
+endfunction()
+
+function(require_glu)
+    if (${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+	include(FindPkgConfig)
+	pkg_search_module(PC_GLU QUIET glu)
+
+	find_path(GLU_INCLUDE GL/glu.h
+	    DOC "GLU Include Path"
+	    HINTS ${PC_GLU_INCLUDEDIR} ${PC_GLU_INCLUDE_DIRS} )
+
+	find_library(GLU_LIBRARY GLU
+	    DOC "GLU Library"
+	    HINTS ${PC_GLU_LIBDIR} ${PC_GLU_LIBRARY_DIRS} )
+    elseif (MSVC)
+	set(GLU_INCLUDE ""	    CACHE PATH "GLU Include Path")
+	set(GLU_LIBRARY "glu32.lib" CACHE FILEPATH "GLU Library")
+    else()
+	set(GLU_INCLUDE ""    CACHE PATH "GLU Include Path")
+	set(GLU_LIBRARY "GLU" CACHE FILEPATH "GLU Library")
+    endif()
+endfunction()
+
+function(request_backtrace)
+    if (NOT MSVC)
+        set( LibBackTrace_INCLUDE "${SRC_DIR}/libbacktrace" PARENT_SCOPE )
+        set( LibBackTrace_LIBRARY "backtracevogl" PARENT_SCOPE )
+    else()
+        set( LibBackTrace_INCLUDE "" PARENT_SCOPE )
+        set( LibBackTrace_LIBRARY "" PARENT_SCOPE )
+    endif()
+endfunction()
+
+# What compiler toolchain are we building on?
+if (${CMAKE_C_COMPILER_ID} STREQUAL "GNU")
+    add_compiler_flag("-DCOMPILER_GCC=1")
+    add_compiler_flag("-DCOMPILER_GCCLIKE=1")
+elseif (${CMAKE_C_COMPILER_ID} STREQUAL "mingw")
+    add_compiler_flag("-DCOMPILER_MINGW=1")
+    add_compiler_flag("-DCOMPILER_GCCLIKE=1")
+elseif (MSVC)
+    add_compiler_flag("-DCOMPILER_MSVC=1")
+elseif (${CMAKE_C_COMPILER_ID} STREQUAL "Clang")
+    add_compiler_flag("-DCOMPILER_CLANG=1")
+    add_compiler_flag("-DCOMPILER_GCCLIKE=1")
+else()
+    message("Compiler is ${CMAKE_C_COMPILER_ID}")
+    message(FATAL_ERROR "Compiler unset, build will fail--stopping at CMake time.")
+endif()
+
+# Platform specific library defines.
+if (WIN32)
+    set( LIBRT "" )
+    set( LIBDL "" )
+
+elseif (UNIX)
+    set( LIBRT rt )
+    set( LIBDL dl )
+else()
+    message(FATAL_ERROR "Need to determine what the library name for 'rt' is for non-windows, non-unix platforms (or if it's even needed).")
+endif()
+
+# What OS will we be running on?
+if (${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
+    add_compiler_flag("-DPLATFORM_OSX=1")
+    add_compiler_flag("-DPLATFORM_POSIX=1")
+elseif (${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+    add_compiler_flag("-DPLATFORM_LINUX=1")
+    add_compiler_flag("-DPLATFORM_POSIX=1")
+elseif (${CMAKE_SYSTEM_NAME} MATCHES "Windows")
+    add_compiler_flag("-DPLATFORM_WINDOWS=1")
+else()
+    message(FATAL_ERROR "Platform unset, build will fail--stopping at CMake time.")
+endif()
+
+# What bittedness are we building?
+if (BUILD_X64)
+    add_compiler_flag("-DPLATFORM_64BIT=1")
+else()
+    add_compiler_flag("-DPLATFORM_32BIT=1")
+endif()
+
+# Compiler flags for windows.
+if (MSVC)
+    # Multithreaded compilation is a big time saver.
+    add_compiler_flag("/MP")
+
+    # In debug, we use the DLL debug runtime.
+    add_compiler_flag_debug("/MDd")
+
+    # And in release we use the DLL release runtime
+    add_compiler_flag_release("/MD")
+
+    # x64 doesn't ever support /ZI, only /Zi.
+    if (BUILD_X64)
+      add_compiler_flag("/Zi")
+    else()
+
+      # In debug, get debug information suitable for Edit and Continue
+      add_compiler_flag_debug("/ZI")
+
+      # In release, still generate debug information (because not having it is dumb)
+      add_compiler_flag_release("/Zi")
+    endif()
+
+    # And tell the linker to always generate the file for us.
+    add_linker_flag("/DEBUG")
+endif()
diff --git a/vktrace/src/vktrace_common/CMakeLists.txt b/vktrace/src/vktrace_common/CMakeLists.txt
new file mode 100644
index 0000000..0c7afa0
--- /dev/null
+++ b/vktrace/src/vktrace_common/CMakeLists.txt
@@ -0,0 +1,57 @@
+project(vktrace_common)
+cmake_minimum_required(VERSION 2.8)
+
+include(${SRC_DIR}/build_options.cmake)
+
+include_directories(
+    ${SRC_DIR}/vktrace_common
+    ${SRC_DIR}/thirdparty
+)
+
+if (${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+    require_pthreads()
+endif()
+
+set(SRC_LIST
+    ${SRC_LIST}
+    vktrace_filelike.c
+    vktrace_interconnect.c
+    vktrace_platform.c
+    vktrace_process.c
+    vktrace_settings.c
+    vktrace_tracelog.c
+    vktrace_trace_packet_utils.c
+)
+
+# vktrace_pageguard_memorycopy.cpp is listed only in CXX_SRC_LIST below so it is
+# compiled once, as C++, and is not marked LANGUAGE C via SRC_LIST.
+
+set (CXX_SRC_LIST
+     vktrace_pageguard_memorycopy.cpp
+)
+
+set_source_files_properties( ${SRC_LIST} PROPERTIES LANGUAGE C)
+set_source_files_properties( ${CXX_SRC_LIST} PROPERTIES LANGUAGE CXX)
+
+file( GLOB_RECURSE HDRS *.h *.inl )  # "[h|inl]" would be a glob character set, not alternation
+
+if (NOT MSVC)
+    add_c_compiler_flag("-fPIC")
+    add_cpp_compiler_flag("-fPIC -std=c++11")
+endif()
+
+add_library(${PROJECT_NAME} STATIC ${SRC_LIST} ${CXX_SRC_LIST} ${HDRS})
+
+
+if (${CMAKE_SYSTEM_NAME} MATCHES "Windows")
+target_link_libraries(${PROJECT_NAME}
+    Rpcrt4.lib
+)
+elseif (${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+target_link_libraries(${PROJECT_NAME}
+    dl
+    pthread
+)
+endif (${CMAKE_SYSTEM_NAME} MATCHES "Windows")
+
+build_options_finalize()
+
+set_target_properties(vktrace_common PROPERTIES LINKER_LANGUAGE C)
diff --git a/vktrace/src/vktrace_common/vktrace_common.h b/vktrace/src/vktrace_common/vktrace_common.h
new file mode 100644
index 0000000..a77bf02
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_common.h
@@ -0,0 +1,62 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ **************************************************************************/
+
+#pragma once
+
+#include <assert.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+#include "vktrace_platform.h"
+
+#include "vktrace_memory.h"
+#include "vktrace_tracelog.h"
+
+#ifndef STRINGIFY
+#define STRINGIFY(x) #x
+#endif
+
+#if defined(WIN32)
+
+#define VKTRACER_EXPORT __declspec(dllexport)
+#define VKTRACER_STDCALL __stdcall
+#define VKTRACER_CDECL __cdecl
+#define VKTRACER_EXIT void __cdecl
+#define VKTRACER_ENTRY void
+#define VKTRACER_LEAVE void
+
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+
+#define VKTRACER_EXPORT __attribute__ ((visibility ("default")))
+#define VKTRACER_STDCALL
+#define VKTRACER_CDECL
+#define VKTRACER_EXIT void
+#define VKTRACER_ENTRY void __attribute__ ((constructor))
+#define VKTRACER_LEAVE void __attribute__ ((destructor))
+
+#endif
+
+#define ROUNDUP_TO_2(_len)  ((((_len)+ 1)>>1)<<1)
+#define ROUNDUP_TO_4(_len)  ((((_len)+ 3)>>2)<<2)
+#define ROUNDUP_TO_8(_len)  ((((_len)+ 7)>>3)<<3)
+#define ROUNDUP_TO_16(_len) ((((_len)+15)>>4)<<4)
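+// e.g. ROUNDUP_TO_4(5) == 8 and ROUNDUP_TO_4(8) == 8; already-aligned values are unchanged.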
diff --git a/vktrace/src/vktrace_common/vktrace_filelike.c b/vktrace/src/vktrace_common/vktrace_filelike.c
new file mode 100644
index 0000000..9391837
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_filelike.c
@@ -0,0 +1,180 @@
+/*
+ * Copyright (c) 2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2014-2016 Valve Corporation. All rights reserved.
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ * 
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */
+
+#include "vktrace_filelike.h"
+#include "vktrace_common.h"
+#include "vktrace_interconnect.h"
+#include <assert.h>
+#include <stdlib.h>
+
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+Checkpoint* vktrace_Checkpoint_create(const char* _str)
+{
+    Checkpoint* pCheckpoint = VKTRACE_NEW(Checkpoint);
+    pCheckpoint->mToken = _str;
+    pCheckpoint->mTokenLength = strlen(_str) + 1;
+    return pCheckpoint;
+}
+
+// ------------------------------------------------------------------------------------------------
+void vktrace_Checkpoint_write(Checkpoint* pCheckpoint, FileLike* _out)
+{
+    vktrace_FileLike_Write(_out, pCheckpoint->mToken, pCheckpoint->mTokenLength);
+}
+
+// ------------------------------------------------------------------------------------------------
+BOOL vktrace_Checkpoint_read(Checkpoint* pCheckpoint, FileLike* _in)
+{
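+    // Tokens shorter than 64 bytes are compared via a stack buffer; longer
+    // tokens fall back to an exact-size heap allocation.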
+    if (pCheckpoint->mTokenLength < 64) {
+        char buffer[64];
+        vktrace_FileLike_Read(_in, buffer, pCheckpoint->mTokenLength);
+        if (strcmp(buffer, pCheckpoint->mToken) != 0) {
+            return FALSE;
+        }
+    } else {
+        char* buffer = VKTRACE_NEW_ARRAY(char, pCheckpoint->mTokenLength);
+        vktrace_FileLike_Read(_in, buffer, pCheckpoint->mTokenLength);
+        if (strcmp(buffer, pCheckpoint->mToken) != 0) {
+            VKTRACE_DELETE(buffer);
+            return FALSE;
+        }
+        VKTRACE_DELETE(buffer);
+    }
+    return TRUE;
+}
+
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+FileLike* vktrace_FileLike_create_file(FILE* fp)
+{
+    FileLike* pFile = NULL;
+    if (fp != NULL)
+    {
+        pFile = VKTRACE_NEW(FileLike);
+        pFile->mMode = File;
+        pFile->mFile = fp;
+        pFile->mMessageStream = NULL;
+    }
+    return pFile;
+}
+
+// ------------------------------------------------------------------------------------------------
+FileLike* vktrace_FileLike_create_msg(MessageStream* _msgStream)
+{
+    FileLike* pFile = NULL;
+    if (_msgStream != NULL)
+    {
+        pFile = VKTRACE_NEW(FileLike);
+        pFile->mMode = Socket;
+        pFile->mFile = NULL;
+        pFile->mMessageStream = _msgStream;
+    }
+    return pFile;
+}
+
+// ------------------------------------------------------------------------------------------------
+size_t vktrace_FileLike_Read(FileLike* pFileLike, void* _bytes, size_t _len)
+{
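+    // Reads are size-prefixed: first pull the payload length from the stream,
+    // then read min(_len, length) payload bytes into _bytes.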
+    size_t minSize = 0;
+    size_t bytesInStream = 0;
+    if (vktrace_FileLike_ReadRaw(pFileLike, &bytesInStream, sizeof(bytesInStream)) == FALSE)
+        return 0;
+
+    minSize = (_len < bytesInStream) ? _len : bytesInStream;
+    if (bytesInStream > 0) {
+        assert(_len >= bytesInStream);
+        if (vktrace_FileLike_ReadRaw(pFileLike, _bytes, minSize) == FALSE)
+            return 0;
+    }
+
+    return minSize;
+}
+
+// ------------------------------------------------------------------------------------------------
+BOOL vktrace_FileLike_ReadRaw(FileLike* pFileLike, void* _bytes, size_t _len)
+{
+    BOOL result = TRUE;
+    assert((pFileLike->mFile != 0) ^ (pFileLike->mMessageStream != 0));
+
+    switch(pFileLike->mMode) {
+    case File:
+        {
+            if (1 != fread(_bytes, _len, 1, pFileLike->mFile))
+            {
+                if (ferror(pFileLike->mFile) != 0)
+                {
+                    perror("fread error");
+                }
+                else if (feof(pFileLike->mFile) != 0)
+                {
+                    vktrace_LogVerbose("Reached end of file.");
+                }
+                result = FALSE;
+            } 
+            break;
+        }
+    case Socket:
+        {
+            result = vktrace_MessageStream_BlockingRecv(pFileLike->mMessageStream, _bytes, _len);
+            break;
+        }
+
+        default: 
+            assert(!"Invalid mode in FileLike_ReadRaw");
+            result = FALSE;
+    }
+    return result;
+}
+
+void vktrace_FileLike_Write(FileLike* pFileLike, const void* _bytes, size_t _len)
+{
+    vktrace_FileLike_WriteRaw(pFileLike, &_len, sizeof(_len));
+    if (_len) {
+        vktrace_FileLike_WriteRaw(pFileLike, _bytes, _len);
+    }
+}
+
+// ------------------------------------------------------------------------------------------------
+BOOL vktrace_FileLike_WriteRaw(FileLike* pFile, const void* _bytes, size_t _len)
+{
+    BOOL result = TRUE;
+    assert((pFile->mFile != 0) ^ (pFile->mMessageStream != 0));
+    switch (pFile->mMode)
+    {
+        case File:
+            if (1 != fwrite(_bytes, _len, 1, pFile->mFile))
+            {
+                result = FALSE;
+            }
+            break;
+        case Socket:
+            result = vktrace_MessageStream_Send(pFile->mMessageStream, _bytes, _len);
+            break;
+        default:
+            assert(!"Invalid mode in FileLike_WriteRaw");
+            result = FALSE;
+            break;
+    }
+    return result;
+}
diff --git a/vktrace/src/vktrace_common/vktrace_filelike.h b/vktrace/src/vktrace_common/vktrace_filelike.h
new file mode 100644
index 0000000..7926e14
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_filelike.h
@@ -0,0 +1,85 @@
+/*
+ * Copyright (c) 2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2014-2016 Valve Corporation. All rights reserved.
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ * 
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */
+
+#pragma once
+
+#ifdef WIN32
+#include <WinSock2.h>
+#include <WS2tcpip.h>
+#pragma comment (lib, "Ws2_32.lib")
+#endif
+
+#include "vktrace_common.h"
+#include "vktrace_interconnect.h"
+
+typedef struct MessageStream MessageStream;
+
+typedef struct FileLike FileLike;
+struct FileLike
+{
+    enum { File, Socket } mMode;
+    FILE* mFile;
+    MessageStream* mMessageStream;
+};
+
+// For creating checkpoints (consistency checks) in the various streams we're interacting with.
+typedef struct Checkpoint
+{
+    const char* mToken;
+    size_t mTokenLength;
+} Checkpoint;
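+// Illustrative use (a sketch): the writer emits the token at a known point with
+// vktrace_Checkpoint_write(); the reader calls vktrace_Checkpoint_read() at the
+// same point and treats FALSE as stream corruption.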
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+Checkpoint* vktrace_Checkpoint_create(const char* _str);
+void vktrace_Checkpoint_write(Checkpoint* pCheckpoint, FileLike* _out);
+BOOL vktrace_Checkpoint_read(Checkpoint* pCheckpoint, FileLike* _in);
+
+// An interface for interacting with sockets, files, and memory streams through a
+// common file-like interface. It is deliberately simple--it doesn't support rewinding
+// or anything fancy, just FIFO reads and writes.
+
+// create a filelike interface for file streaming
+FileLike* vktrace_FileLike_create_file(FILE* fp);
+
+// create a filelike interface for network streaming
+FileLike* vktrace_FileLike_create_msg(MessageStream* _msgStream);
+
+// read a size and then a buffer of that size
+size_t vktrace_FileLike_Read(FileLike* pFileLike, void* _bytes, size_t _len);
+
+// Normally, Read expects the size to live in the stream prefixing the data to be read.
+// With ReadRaw, no size is expected first, and the bytes are directly read.
+BOOL vktrace_FileLike_ReadRaw(FileLike* pFileLike, void* _bytes, size_t _len);
+
+// write _len and then the buffer of size _len
+void vktrace_FileLike_Write(FileLike* pFileLike, const void* _bytes, size_t _len);
+
+// Normally, Write outputs the _len to the stream first--with WriteRaw the bytes are simply written, 
+// no size parameter first.
+BOOL vktrace_FileLike_WriteRaw(FileLike* pFile, const void* _bytes, size_t _len);
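+
+// Usage sketch of the size-prefixed protocol (illustrative only; fp, msg, and buf are
+// hypothetical caller-provided variables):
+//
+//     FileLike* fl = vktrace_FileLike_create_file(fp);
+//     vktrace_FileLike_Write(fl, msg, strlen(msg) + 1);          // writes the length, then the bytes
+//     ...
+//     char buf[64];
+//     size_t got = vktrace_FileLike_Read(fl, buf, sizeof(buf));  // reads the length, then that many bytes
+//     VKTRACE_DELETE(fl);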
+
+#ifdef __cplusplus
+}
+#endif
diff --git a/vktrace/src/vktrace_common/vktrace_interconnect.c b/vktrace/src/vktrace_common/vktrace_interconnect.c
new file mode 100644
index 0000000..0019fb2
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_interconnect.c
@@ -0,0 +1,531 @@
+/*
+ * Copyright (c) 2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2014-2016 Valve Corporation. All rights reserved.
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */
+
+#include "vktrace_interconnect.h"
+#include "vktrace_common.h"
+
+#include "vktrace_filelike.h"
+
+#if defined(ANDROID)
+#include <sys/un.h>
+#endif
+
+const size_t kSendBufferSize = 1024 * 1024;
+
+MessageStream* gMessageStream = NULL;
+static VKTRACE_CRITICAL_SECTION gSendLock;
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+// private functions
+BOOL vktrace_MessageStream_SetupSocket(MessageStream* pStream);
+BOOL vktrace_MessageStream_SetupHostSocket(MessageStream* pStream);
+BOOL vktrace_MessageStream_SetupClientSocket(MessageStream* pStream);
+BOOL vktrace_MessageStream_Handshake(MessageStream* pStream);
+BOOL vktrace_MessageStream_ReallySend(MessageStream* pStream, const void* _bytes, size_t _size, BOOL _optional);
+void vktrace_MessageStream_FlushSendBuffer(MessageStream* pStream, BOOL _optional);
+
+// public functions
+MessageStream* vktrace_MessageStream_create_port_string(BOOL _isHost, const char* _address, const char* _port)
+{
+    MessageStream* pStream;
+    // make sure the strings (including their terminators) fit in the destination buffers that store them
+    assert(strlen(_address) + 1 <= 64);
+    assert(strlen(_port) + 1 <= 8);
+
+    pStream = VKTRACE_NEW(MessageStream);
+    memcpy(pStream->mAddress, _address, strlen(_address) + 1);
+    memcpy(pStream->mPort, _port, strlen(_port) + 1);
+
+    pStream->mErrorNum = 0;
+    memset(pStream->mSmallBuffer, 0, 64);
+    pStream->mHost = _isHost;
+    pStream->mHostAddressInfo = NULL;
+    pStream->mNextPacketId = 0;
+    pStream->mSocket = INVALID_SOCKET;
+    pStream->mSendBuffer = NULL;
+
+    if (vktrace_MessageStream_SetupSocket(pStream) == FALSE)
+    {
+        VKTRACE_DELETE(pStream);
+        pStream = NULL;
+    }
+
+    return pStream;
+}
+
+MessageStream* vktrace_MessageStream_create(BOOL _isHost, const char* _address, unsigned int _port)
+{
+    char portBuf[32];
+    memset(portBuf, 0, 32 * sizeof(char));
+    sprintf(portBuf, "%u", _port);
+    return vktrace_MessageStream_create_port_string(_isHost, _address, portBuf);
+}
+
+void vktrace_MessageStream_destroy(MessageStream** ppStream)
+{
+    if ((*ppStream)->mSendBuffer != NULL) {
+        // Try to get our data out.
+        vktrace_MessageStream_FlushSendBuffer(*ppStream, TRUE);
+        vktrace_SimpleBuffer_destroy(&(*ppStream)->mSendBuffer);
+    }
+
+    if ((*ppStream)->mHostAddressInfo != NULL)
+    {
+        freeaddrinfo((*ppStream)->mHostAddressInfo);
+        (*ppStream)->mHostAddressInfo = NULL;
+    }
+
+    vktrace_LogDebug("Destroyed socket connection.");
+#if defined(WIN32)
+    WSACleanup();
+#endif
+    VKTRACE_DELETE(*ppStream);
+    (*ppStream) = NULL;
+}
+
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+// private function implementations
+BOOL vktrace_MessageStream_SetupSocket(MessageStream* pStream)
+{
+    BOOL result = TRUE;
+#if defined(WIN32)
+    WSADATA wsaData;
+
+    if (WSAStartup(MAKEWORD(2, 2), &wsaData) != NO_ERROR) {
+        result = FALSE;
+    }
+    else
+#endif
+    {
+        if (pStream->mHost) {
+            result = vktrace_MessageStream_SetupHostSocket(pStream);
+        } else {
+            result = vktrace_MessageStream_SetupClientSocket(pStream);
+        }
+    }
+    return result;
+}
+
+BOOL vktrace_MessageStream_SetupHostSocket(MessageStream* pStream)
+{
+    int hr = 0;
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    int yes = 1;
+#endif
+    struct addrinfo hostAddrInfo = { 0 };
+    SOCKET listenSocket;
+
+    vktrace_create_critical_section(&gSendLock);
+    hostAddrInfo.ai_family = AF_INET;
+    hostAddrInfo.ai_socktype = SOCK_STREAM;
+    hostAddrInfo.ai_protocol = IPPROTO_TCP;
+    hostAddrInfo.ai_flags = AI_PASSIVE;
+
+    hr = getaddrinfo(NULL, pStream->mPort, &hostAddrInfo, &pStream->mHostAddressInfo);
+    if (hr != 0) {
+        vktrace_LogError("Host: Failed getaddrinfo.");
+        return FALSE;
+    }
+
+    listenSocket = socket(pStream->mHostAddressInfo->ai_family, pStream->mHostAddressInfo->ai_socktype, pStream->mHostAddressInfo->ai_protocol);
+    if (listenSocket == INVALID_SOCKET) {
+        // TODO: Figure out errors
+        vktrace_LogError("Host: Failed creating a listen socket.");
+        freeaddrinfo(pStream->mHostAddressInfo);
+        pStream->mHostAddressInfo = NULL;
+        return FALSE;
+    }
+
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    setsockopt(listenSocket, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
+#endif
+    hr = bind(listenSocket, pStream->mHostAddressInfo->ai_addr, (int)pStream->mHostAddressInfo->ai_addrlen);
+    if (hr == SOCKET_ERROR) {
+        vktrace_LogError("Host: Failed binding socket err=%d.", VKTRACE_WSAGetLastError());
+        freeaddrinfo(pStream->mHostAddressInfo);
+        pStream->mHostAddressInfo = NULL;
+        closesocket(listenSocket);
+        return FALSE;
+    }
+
+    // Done with this.
+    freeaddrinfo(pStream->mHostAddressInfo);
+    pStream->mHostAddressInfo = NULL;
+
+    hr = listen(listenSocket, 1);
+    if (hr == SOCKET_ERROR) {
+        vktrace_LogError("Host: Failed listening on socket err=%d.");
+        closesocket(listenSocket);
+        return FALSE;
+    }
+
+    // Now wait for an actual connection.
+    vktrace_LogVerbose("Listening for connections on port %s.", pStream->mPort);
+    pStream->mSocket = accept(listenSocket, NULL, NULL);
+    closesocket(listenSocket);
+
+    if (pStream->mSocket == INVALID_SOCKET) {
+        vktrace_LogError("Host: Failed accepting socket connection.");
+        return FALSE;
+    }
+
+    vktrace_LogVerbose("Connected on port %s.", pStream->mPort);
+    if (vktrace_MessageStream_Handshake(pStream))
+    {
+        // TODO: The SendBuffer can cause big delays in sending messages back to the client.
+        // We haven't verified if this improves performance in real applications,
+        // so disable it for now.
+        //pStream->mSendBuffer = vktrace_SimpleBuffer_create(kSendBufferSize);
+        pStream->mSendBuffer = NULL;
+    }
+    else
+    {
+        vktrace_LogError("vktrace_MessageStream_SetupHostSocket failed handshake.");
+    }
+    return TRUE;
+}
+
+// ------------------------------------------------------------------------------------------------
+BOOL vktrace_MessageStream_SetupClientSocket(MessageStream* pStream)
+{
+    int hr = 0;
+    unsigned int attempt = 0;
+    BOOL bConnected = FALSE;
+    struct addrinfo hostAddrInfo = { 0 },
+        *currentAttempt = NULL;
+    vktrace_create_critical_section(&gSendLock);
+
+#if defined(ANDROID)
+
+    struct sockaddr_un addr;
+    socklen_t namelen;
+
+    // Copy the string in after a leading null byte, i.e. "\0vktrace", to use the abstract socket namespace.
+    memset(&addr, 0, sizeof(addr));
+    strcpy(addr.sun_path + 1, pStream->mPort);
+    addr.sun_family = AF_UNIX;
+    namelen = sizeof(addr.sun_family) + strlen(pStream->mPort) + 1;
+
+    pStream->mSocket = socket(AF_UNIX, SOCK_STREAM, 0);
+    hr = connect(pStream->mSocket, (struct sockaddr *) &addr, namelen);
+
+    if (hr == SOCKET_ERROR)
+    {
+        vktrace_LogError("Client: Failed connect to abstract socket.");
+        closesocket(pStream->mSocket);
+        pStream->mSocket = INVALID_SOCKET;
+    }
+
+#else
+
+    hostAddrInfo.ai_family = AF_UNSPEC;
+    hostAddrInfo.ai_socktype = SOCK_STREAM;
+    hostAddrInfo.ai_protocol = IPPROTO_TCP;
+
+    hr = getaddrinfo(pStream->mAddress, pStream->mPort, &hostAddrInfo, &pStream->mHostAddressInfo);
+    if (hr != 0) {
+        vktrace_LogError("Client: Failed getaddrinfo result=%d.", hr);
+        return FALSE;
+    }
+
+    // make several attempts to connect before bailing out
+    for (attempt = 0; attempt < 10 && !bConnected; attempt++)
+    {
+        for (currentAttempt = pStream->mHostAddressInfo; currentAttempt != NULL; currentAttempt = currentAttempt->ai_next)
+        {
+            pStream->mSocket = socket(currentAttempt->ai_family, currentAttempt->ai_socktype, currentAttempt->ai_protocol);
+
+            hr = connect(pStream->mSocket, currentAttempt->ai_addr, (int)currentAttempt->ai_addrlen);
+            if (hr == SOCKET_ERROR)
+            {
+                vktrace_LogVerbose("Client: Failed connect. Possibly non-fatal.");
+                closesocket(pStream->mSocket);
+                pStream->mSocket = INVALID_SOCKET;
+                continue;
+            }
+
+            bConnected = TRUE;
+            break;
+        }
+
+        if (!bConnected)
+        {
+            Sleep(1);
+            vktrace_LogVerbose("Client: Connect attempt %u on port %s failed, trying again.", attempt, pStream->mPort);
+        }
+        else
+        {
+            vktrace_LogVerbose("Client: Connected to port %s successfully.", pStream->mPort);
+        }
+    }
+
+    freeaddrinfo(pStream->mHostAddressInfo);
+    pStream->mHostAddressInfo = NULL;
+
+#endif
+
+    if (pStream->mSocket == INVALID_SOCKET) {
+        vktrace_LogError("Client: Couldn't find any connections.");
+        return FALSE;
+    }
+
+    if (!vktrace_MessageStream_Handshake(pStream))
+    {
+        vktrace_LogError("Client: Failed handshake with host.");
+        return FALSE;
+    }
+    return TRUE;
+}
+
+// ------------------------------------------------------------------------------------------------
+BOOL vktrace_MessageStream_Handshake(MessageStream* pStream)
+{
+    BOOL result = TRUE;
+    FileLike* fileLike = vktrace_FileLike_create_msg(pStream);
+    Checkpoint* syn = vktrace_Checkpoint_create("It's a trap!");
+    Checkpoint* ack = vktrace_Checkpoint_create(" - Admiral Ackbar");
+
+    if (pStream->mHost) {
+        vktrace_Checkpoint_write(syn, fileLike);
+        result = vktrace_Checkpoint_read(ack, fileLike);
+    } else {
+        if (vktrace_Checkpoint_read(syn, fileLike))
+        {
+            vktrace_Checkpoint_write(ack, fileLike);
+        }
+        else
+        {
+            result = FALSE;
+        }
+    }
+
+    // Turn on non-blocking modes for sockets now.
+    if (result)
+    {
+#if defined(WIN32)
+        u_long asyncMode = 1;
+        ioctlsocket(pStream->mSocket, FIONBIO, &asyncMode);
+#else
+        fcntl(pStream->mSocket, F_SETFL, O_NONBLOCK);
+#endif
+    }
+
+    VKTRACE_DELETE(syn);
+    VKTRACE_DELETE(ack);
+    VKTRACE_DELETE(fileLike);
+
+    return result;
+}
+
+// ------------------------------------------------------------------------------------------------
+void vktrace_MessageStream_FlushSendBuffer(MessageStream* pStream, BOOL _optional)
+{
+    size_t bufferedByteSize = 0;
+    const void* bufferBytes = vktrace_SimpleBuffer_GetBytes(pStream->mSendBuffer, &bufferedByteSize);
+    if (bufferedByteSize > 0) {
+        // TODO use return value from ReallySend
+        vktrace_MessageStream_ReallySend(pStream, bufferBytes, bufferedByteSize, _optional);
+        vktrace_SimpleBuffer_EmptyBuffer(pStream->mSendBuffer);
+    }
+}
+
+// ------------------------------------------------------------------------------------------------
+BOOL vktrace_MessageStream_BufferedSend(MessageStream* pStream, const void* _bytes, size_t _size, BOOL _optional)
+{
+    BOOL result = TRUE;
+    if (pStream->mSendBuffer == NULL) {
+        result = vktrace_MessageStream_ReallySend(pStream, _bytes, _size, _optional);
+    }
+    else
+    {
+        if (!vktrace_SimpleBuffer_WouldOverflow(pStream->mSendBuffer, _size)) {
+            result = vktrace_SimpleBuffer_AddBytes(pStream->mSendBuffer, _bytes, _size);
+        } else {
+            // Time to flush the cache.
+            vktrace_MessageStream_FlushSendBuffer(pStream, FALSE);
+
+            // Check to see if the packet is larger than the send buffer 
+            if (vktrace_SimpleBuffer_WouldOverflow(pStream->mSendBuffer, _size)) { 
+                result = vktrace_MessageStream_ReallySend(pStream, _bytes, _size, _optional); 
+            } else { 
+                result = vktrace_SimpleBuffer_AddBytes(pStream->mSendBuffer, _bytes, _size);
+            }
+        }
+    }
+    return result;
+}
+
+// ------------------------------------------------------------------------------------------------
+BOOL vktrace_MessageStream_Send(MessageStream* pStream, const void* _bytes, size_t _len)
+{
+    return vktrace_MessageStream_BufferedSend(pStream, _bytes, _len, FALSE);
+}
+
+// ------------------------------------------------------------------------------------------------
+BOOL vktrace_MessageStream_ReallySend(MessageStream* pStream, const void* _bytes, size_t _size, BOOL _optional)
+{
+    size_t bytesSent = 0;
+    assert(_size > 0);
+
+    vktrace_enter_critical_section(&gSendLock);
+    do {
+        int sentThisTime = send(pStream->mSocket, (const char*)_bytes + bytesSent, (int)_size - (int)bytesSent, 0);
+        if (sentThisTime == SOCKET_ERROR) {
+            int socketError = VKTRACE_WSAGetLastError();
+            if (socketError == WSAEWOULDBLOCK) {
+                // Try again. Don't sleep, because that nukes performance from orbit.
+                continue;
+            }
+
+            if (!_optional) {
+                vktrace_leave_critical_section(&gSendLock);
+                return FALSE;
+            } 
+        }
+        if (sentThisTime == 0) {
+            if (!_optional) {
+                vktrace_leave_critical_section(&gSendLock);
+                return FALSE;
+            }
+            vktrace_LogDebug("Send on socket 0 bytes, totalbytes sent so far %u.", bytesSent);
+            break;
+        }
+
+        bytesSent += sentThisTime;
+
+    } while (bytesSent < _size);
+    vktrace_leave_critical_section(&gSendLock);
+    return TRUE;
+}
+
+// ------------------------------------------------------------------------------------------------
+BOOL vktrace_MessageStream_Recv(MessageStream* pStream, void* _out, size_t _len)
+{
+    unsigned int totalDataRead = 0;
+    do {
+        int dataRead = recv(pStream->mSocket, ((char*)_out) + totalDataRead, (int)_len - totalDataRead, 0);
+        if (dataRead == SOCKET_ERROR) {
+            pStream->mErrorNum = VKTRACE_WSAGetLastError();
+            if (pStream->mErrorNum == WSAEWOULDBLOCK || pStream->mErrorNum == EAGAIN) {
+                if (totalDataRead == 0) {
+                    return FALSE;
+                } else {
+                    // I don't do partial reads--once I start receiving I wait for everything.
+                    vktrace_LogDebug("Sleep on partial socket recv (%u bytes / %u), error num %d.", totalDataRead, _len, pStream->mErrorNum);
+                    Sleep(1);
+                }
+                // I've split these into two blocks because one of them is expected and the other isn't.
+            } else if (pStream->mErrorNum == WSAECONNRESET) {
+                // The remote client disconnected, probably not an issue.
+                vktrace_LogDebug("Connection was reset by client.");
+                return FALSE;
+            } else {
+                // Some other wonky network error--place a breakpoint here.
+                vktrace_LogError("Unexpected error (%d) while receiving message stream.", pStream->mErrorNum);
+                return FALSE;
+            }
+        } else {
+            totalDataRead += dataRead;
+        }
+    } while (totalDataRead < _len);
+
+    return TRUE;
+}
+
+// ------------------------------------------------------------------------------------------------
+BOOL vktrace_MessageStream_BlockingRecv(MessageStream* pStream, void* _outBuffer, size_t _len)
+{
+    while (!vktrace_MessageStream_Recv(pStream, _outBuffer, _len)) {
+        Sleep(1);
+    }
+    return TRUE;
+}
+
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+SimpleBuffer* vktrace_SimpleBuffer_create(size_t _bufferSize)
+{
+    SimpleBuffer* pBuffer = VKTRACE_NEW(SimpleBuffer);
+    pBuffer->mBuffer = (unsigned char*)vktrace_malloc(_bufferSize);
+    if (pBuffer->mBuffer == NULL)
+    {
+        VKTRACE_DELETE(pBuffer);
+        return NULL;
+    }
+
+    pBuffer->mEnd = 0;
+    pBuffer->mSize = _bufferSize;
+
+    return pBuffer;
+}
+
+void vktrace_SimpleBuffer_destroy(SimpleBuffer** ppBuffer)
+{
+    vktrace_free((*ppBuffer)->mBuffer);
+    VKTRACE_DELETE(*ppBuffer);
+}
+
+BOOL vktrace_SimpleBuffer_AddBytes(SimpleBuffer* pBuffer, const void* _bytes, size_t _size)
+{
+    if (vktrace_SimpleBuffer_WouldOverflow(pBuffer, _size))
+    { 
+        return FALSE;
+    }
+
+    memcpy((unsigned char*)pBuffer->mBuffer + pBuffer->mEnd, _bytes, _size);
+    pBuffer->mEnd += _size;
+
+    return TRUE;
+}
+
+void vktrace_SimpleBuffer_EmptyBuffer(SimpleBuffer* pBuffer)
+{
+    pBuffer->mEnd = 0;
+}
+
+BOOL vktrace_SimpleBuffer_WouldOverflow(SimpleBuffer* pBuffer, size_t _requestedSize)
+{
+    return pBuffer->mEnd + _requestedSize > pBuffer->mSize;
+}
+
+const void* vktrace_SimpleBuffer_GetBytes(SimpleBuffer* pBuffer, size_t* _outByteCount)
+{
+    (*_outByteCount) = pBuffer->mEnd; 
+    return pBuffer->mBuffer; 
+}
+
+//// ------------------------------------------------------------------------------------------------
+//void RemoteCommand::Read(FileLike* _fileLike)
+//{
+//    unsigned int myCommand = 0;
+//    _fileLike->Read(&myCommand);
+//    mRemoteCommandType = (EnumRemoteCommand)myCommand;
+//}
+//
+//// ------------------------------------------------------------------------------------------------
+//void RemoteCommand::Write(FileLike* _fileLike) const
+//{
+//    _fileLike->Write((unsigned int)mRemoteCommandType);
+//}
diff --git a/vktrace/src/vktrace_common/vktrace_interconnect.h b/vktrace/src/vktrace_common/vktrace_interconnect.h
new file mode 100644
index 0000000..fc9ad3d
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_interconnect.h
@@ -0,0 +1,105 @@
+/*
+ * Copyright (c) 2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2014-2016 Valve Corporation. All rights reserved.
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */
+
+#pragma once
+
+#include <errno.h>
+#include "vktrace_common.h"
+
+#if defined(PLATFORM_POSIX)
+    #include <arpa/inet.h>
+    #include <netdb.h>
+    #include <netinet/in.h>
+    #include <netinet/tcp.h>
+    #include <sys/socket.h>
+    #define SOCKET int
+    #define INVALID_SOCKET -1 // socket() returns -1 on failure on POSIX; 0 is a valid descriptor
+    #define SOCKET_ERROR -1
+    #define closesocket close
+    #define VKTRACE_WSAGetLastError() errno
+    #define WSAEWOULDBLOCK EWOULDBLOCK
+    #define WSAEAGAIN EAGAIN
+    #define WSAECONNRESET ECONNRESET
+#elif defined(WIN32)
+    #include <WinSock2.h>
+    #include <WS2tcpip.h>
+    #pragma comment (lib, "Ws2_32.lib")
+    #define VKTRACE_WSAGetLastError() WSAGetLastError()
+#endif
+
+static const unsigned int VKTRACE_BASE_PORT = 34199;
+struct SSerializeDataPacket;
+
+struct SimpleBuffer;
+
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+typedef struct MessageStream
+{
+    SOCKET mSocket;
+    struct addrinfo* mHostAddressInfo;
+    size_t mNextPacketId;
+    struct SimpleBuffer* mSendBuffer;
+
+    // Used if someone asks for a receive of a small string.
+    char mSmallBuffer[64];
+
+    char mAddress[64];
+
+    char mPort[8];
+
+    BOOL mHost;
+    int mErrorNum;
+} MessageStream;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+MessageStream* vktrace_MessageStream_create_port_string(BOOL _isHost, const char* _address, const char* _port);
+MessageStream* vktrace_MessageStream_create(BOOL _isHost, const char* _address, unsigned int _port);
+void vktrace_MessageStream_destroy(MessageStream** ppStream);
+BOOL vktrace_MessageStream_BufferedSend(MessageStream* pStream, const void* _bytes, size_t _size, BOOL _optional);
+BOOL vktrace_MessageStream_Send(MessageStream* pStream, const void* _bytes, size_t _len);
+
+BOOL vktrace_MessageStream_Recv(MessageStream* pStream, void* _out, size_t _len);
+BOOL vktrace_MessageStream_BlockingRecv(MessageStream* pStream, void* _outBuffer, size_t _len);
+
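+// Usage sketch for a client-side stream (illustrative only; the address is arbitrary):
+//
+//     MessageStream* stream = vktrace_MessageStream_create(FALSE /*client*/, "127.0.0.1", VKTRACE_BASE_PORT);
+//     if (stream != NULL)
+//     {
+//         uint32_t value = 42;
+//         vktrace_MessageStream_Send(stream, &value, sizeof(value));
+//         vktrace_MessageStream_destroy(&stream);
+//     }
+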
+extern MessageStream* gMessageStream;
+#ifdef __cplusplus
+}
+#endif
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------------------------
+typedef struct SimpleBuffer
+{
+    void* mBuffer;
+    size_t mEnd;
+    size_t mSize;
+} SimpleBuffer;
+
+SimpleBuffer* vktrace_SimpleBuffer_create(size_t _bufferSize);
+void vktrace_SimpleBuffer_destroy(SimpleBuffer** ppBuffer);
+BOOL vktrace_SimpleBuffer_AddBytes(SimpleBuffer* pBuffer, const void* _bytes, size_t _size);
+void vktrace_SimpleBuffer_EmptyBuffer(SimpleBuffer* pBuffer);
+BOOL vktrace_SimpleBuffer_WouldOverflow(SimpleBuffer* pBuffer, size_t _requestedSize);
+const void* vktrace_SimpleBuffer_GetBytes(SimpleBuffer* pBuffer, size_t* _outByteCount);
diff --git a/vktrace/src/vktrace_common/vktrace_memory.h b/vktrace/src/vktrace_common/vktrace_memory.h
new file mode 100644
index 0000000..29c3f75
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_memory.h
@@ -0,0 +1,154 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ **************************************************************************/
+#pragma once
+
+#include <stdlib.h>
+#include <string.h>
+#include <stdarg.h>
+
+#define VKTRACE_NEW(type) (type*)vktrace_malloc(sizeof(type))
+#define VKTRACE_NEW_ARRAY(type, count) (type*)vktrace_malloc(sizeof(type)*count)
+#define VKTRACE_DELETE(ptr) vktrace_free(ptr)
+#define VKTRACE_REALLOC(ptr, size) vktrace_realloc(ptr, size)
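+
+// Usage sketch for the allocation helpers (illustrative only; MyType is a hypothetical struct):
+//
+//     MyType* obj = VKTRACE_NEW(MyType);          // malloc(sizeof(MyType))
+//     char* buf = VKTRACE_NEW_ARRAY(char, 64);    // malloc(64)
+//     buf = VKTRACE_REALLOC(buf, 128);
+//     VKTRACE_DELETE(buf);
+//     VKTRACE_DELETE(obj);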
+
+static void* vktrace_malloc(size_t size)
+{
+    void* pMemory;
+    if (size == 0)
+        return NULL;
+
+    pMemory = malloc(size);
+
+    return pMemory;
+}
+
+static void vktrace_free(void* ptr)
+{
+    free(ptr);
+}
+
+static void* vktrace_realloc(void* ptr, size_t size)
+{
+    void *pMemory;
+    if (size == 0)
+        return NULL;
+
+    pMemory = realloc(ptr, size);
+    return pMemory;
+}
+
+static char* vktrace_allocate_and_copy(const char* _src)
+{
+    if (_src == NULL)
+    {
+        return NULL;
+    }
+    else
+    {
+        size_t bufferSize = 1 + strlen(_src);
+
+        char* retVal = VKTRACE_NEW_ARRAY(char, bufferSize);
+#ifdef WIN32
+        strcpy_s(retVal, bufferSize, _src);
+#else // linux
+        strncpy(retVal, _src, bufferSize);
+#endif
+
+        return retVal;
+    }
+}
+
+static char* vktrace_allocate_and_copy_n(const char* _src, int _count)
+{
+    size_t bufferSize = 1 + _count;
+
+    char* retVal = VKTRACE_NEW_ARRAY(char, bufferSize);
+
+#ifdef WIN32
+    strncpy_s(retVal, bufferSize, _src, _count);
+#else // linux
+    strncpy(retVal, _src, _count);
+    retVal[_count] = '\0';
+#endif
+
+    return retVal;
+}
+
+static char* vktrace_copy_and_append(const char* pBaseString, const char* pSeparator, const char* pAppendString)
+{
+    size_t baseSize = (pBaseString != NULL) ? strlen(pBaseString) : 0;
+    size_t separatorSize = ((pAppendString != NULL) && strlen(pAppendString) && (pSeparator != NULL)) ?
+                           strlen(pSeparator) : 0;
+    size_t appendSize = (pAppendString != NULL) ? strlen(pAppendString) : 0;
+    size_t bufferSize = baseSize + separatorSize + appendSize + 1;
+    char* retVal = VKTRACE_NEW_ARRAY(char, bufferSize);
+    if (retVal != NULL)
+    {
+#ifdef WIN32
+        strncpy_s(retVal, bufferSize, pBaseString, baseSize);
+        strncpy_s(&retVal[baseSize], bufferSize-baseSize, pSeparator, separatorSize);
+        strncpy_s(&retVal[baseSize+separatorSize], bufferSize-baseSize-separatorSize, pAppendString, appendSize);
+#else // linux
+        strncpy(retVal, pBaseString, baseSize);
+        strncpy(&retVal[baseSize], pSeparator, separatorSize);
+        strncpy(&retVal[baseSize+separatorSize], pAppendString, appendSize);
+#endif
+        retVal[bufferSize-1] = '\0';
+    }
+    return retVal;
+}
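+
+// Usage sketch (illustrative only; the literal strings are arbitrary):
+//
+//     char* path = vktrace_copy_and_append("/usr/local", "/", "bin");
+//     // path now holds "/usr/local/bin"; release it with vktrace_free(path).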
+
+static char* vktrace_copy_and_append_args(const char* pBaseString, const char* pSeparator, const char* pAppendFormat, va_list args)
+{
+    size_t baseSize = (pBaseString != NULL) ? strlen(pBaseString) : 0;
+    size_t separatorSize = (pSeparator != NULL) ? strlen(pSeparator) : 0;
+    size_t appendSize = 0;
+    size_t bufferSize = 0;
+    char* retVal = NULL;
+
+#if defined(WIN32)
+    appendSize = _vscprintf(pAppendFormat, args);
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    va_list argcopy;
+    va_copy(argcopy, args);
+    appendSize = vsnprintf(NULL, 0, pAppendFormat, argcopy);
+    va_end(argcopy);
+#endif
+
+    bufferSize = baseSize + separatorSize + appendSize + 1;
+    retVal = VKTRACE_NEW_ARRAY(char, bufferSize);
+    if (retVal != NULL)
+    {
+#ifdef WIN32
+        strncpy_s(retVal, bufferSize, pBaseString, baseSize);
+        strncpy_s(&retVal[baseSize], bufferSize-baseSize, pSeparator, separatorSize);
+        _vsnprintf_s(&retVal[baseSize+separatorSize], bufferSize-baseSize-separatorSize, appendSize, pAppendFormat, args);
+#else // linux
+        strncpy(retVal, pBaseString, baseSize);
+        strncpy(&retVal[baseSize], pSeparator, separatorSize);
+        vsnprintf(&retVal[baseSize+separatorSize], appendSize, pAppendFormat, args);
+#endif
+    }
+    return retVal;
+}
+
diff --git a/vktrace/src/vktrace_common/vktrace_multiplatform.h b/vktrace/src/vktrace_common/vktrace_multiplatform.h
new file mode 100644
index 0000000..2665438
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_multiplatform.h
@@ -0,0 +1,119 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: David Pinedo<david@lunarg.com>
+ **************************************************************************/
+#pragma once
+
+#include "vktrace_trace_packet_identifiers.h"
+#include "vktrace_filelike.h"
+#include "vktrace_memory.h"
+#include "vktrace_process.h"
+#include <stdbool.h>
+#include "vulkan/vk_icd.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+// Define types needed for cross-platform vkreplay.
+// Unfortunately, some of these are duplicated from vulkan.h
+// and platform-specific header files. Haven't figured out how
+// to avoid this.
+#if !defined(VK_USE_PLATFORM_XCB_KHR)
+typedef VkFlags VkXcbSurfaceCreateFlagsKHR;
+typedef struct xcb_connection_t xcb_connection_t;
+typedef uint32_t xcb_window_t;
+typedef uint32_t xcb_visualid_t;
+typedef struct VkXcbSurfaceCreateInfoKHR {
+    VkStructureType               sType;
+    const void*                   pNext;
+    VkXcbSurfaceCreateFlagsKHR    flags;
+    xcb_connection_t*             connection;
+    xcb_window_t                  window;
+} VkXcbSurfaceCreateInfoKHR;
+typedef VkResult (VKAPI_PTR *PFN_vkCreateXcbSurfaceKHR)(VkInstance instance, const VkXcbSurfaceCreateInfoKHR* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkSurfaceKHR* pSurface);
+typedef VkBool32 (VKAPI_PTR *PFN_vkGetPhysicalDeviceXcbPresentationSupportKHR)(VkPhysicalDevice physicalDevice, uint32_t queueFamilyIndex, xcb_connection_t* connection, xcb_visualid_t visual_id);
+typedef struct {
+    VkIcdSurfaceBase base;
+    xcb_connection_t *connection;
+    xcb_window_t window;
+} VkIcdSurfaceXcb;
+#endif
+
+#if !defined(VK_USE_PLATFORM_XLIB_KHR)
+typedef VkFlags VkXlibSurfaceCreateFlagsKHR;
+struct _XDisplay;
+typedef struct _XDisplay Display;
+typedef uint32_t CARD32;
+typedef CARD32 XID;
+typedef XID Window;
+typedef CARD32 VisualID;
+typedef struct VkXlibSurfaceCreateInfoKHR {
+    VkStructureType                sType;
+    const void*                    pNext;
+    VkXlibSurfaceCreateFlagsKHR    flags;
+    Display*                       dpy;
+    Window                         window;
+} VkXlibSurfaceCreateInfoKHR;
+typedef VkResult (VKAPI_PTR *PFN_vkCreateXlibSurfaceKHR)(VkInstance instance, const VkXlibSurfaceCreateInfoKHR* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkSurfaceKHR* pSurface);
+typedef VkBool32 (VKAPI_PTR *PFN_vkGetPhysicalDeviceXlibPresentationSupportKHR)(VkPhysicalDevice physicalDevice, uint32_t queueFamilyIndex, Display* dpy, VisualID visualID);
+typedef struct {
+    VkIcdSurfaceBase base;
+    Display *dpy;
+    Window window;
+} VkIcdSurfaceXlib;
+#endif
+
+#if !defined(VK_USE_PLATFORM_ANDROID_KHR)
+typedef VkFlags VkAndroidSurfaceCreateFlagsKHR;
+typedef uint32_t* ANativeWindow;
+typedef struct VkAndroidSurfaceCreateInfoKHR {
+    VkStructureType                   sType;
+    const void*                       pNext;
+    VkAndroidSurfaceCreateFlagsKHR    flags;
+    ANativeWindow*                    window;
+} VkAndroidSurfaceCreateInfoKHR;
+typedef VkResult (VKAPI_PTR *PFN_vkCreateAndroidSurfaceKHR)(VkInstance instance, const VkAndroidSurfaceCreateInfoKHR* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkSurfaceKHR* pSurface);
+typedef struct {
+  ANativeWindow* window;
+} VkIcdSurfaceAndroid;
+#endif
+
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+typedef void* HINSTANCE;
+typedef void* HWND;
+typedef void* HANDLE;
+typedef VkFlags VkWin32SurfaceCreateFlagsKHR;
+typedef struct VkWin32SurfaceCreateInfoKHR {
+    VkStructureType                sType;
+    const void*                    pNext;
+    VkWin32SurfaceCreateFlagsKHR   flags;
+    HINSTANCE                      hinstance;
+    HWND                           window;
+} VkWin32SurfaceCreateInfoKHR;
+typedef VkResult (VKAPI_PTR *PFN_vkCreateWin32SurfaceKHR)(VkInstance instance, const VkWin32SurfaceCreateInfoKHR* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkSurfaceKHR* pSurface);
+typedef VkBool32 (VKAPI_PTR *PFN_vkGetPhysicalDeviceWin32PresentationSupportKHR)(VkPhysicalDevice physicalDevice, uint32_t queueFamilyIndex);
+
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
diff --git a/vktrace/src/vktrace_common/vktrace_pageguard_memorycopy.cpp b/vktrace/src/vktrace_common/vktrace_pageguard_memorycopy.cpp
new file mode 100644
index 0000000..1668564
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_pageguard_memorycopy.cpp
@@ -0,0 +1,495 @@
+/*
+* Copyright (c) 2016 Advanced Micro Devices, Inc. All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#include <assert.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include "vktrace_pageguard_memorycopy.h"
+
+#define OPTIMIZATION_FUNCTION_IMPLEMENTATION
+
+
+using namespace std;
+
+static const size_t SIZE_LIMIT_TO_USE_OPTIMIZATION = 1*1024*1024; // Fall back to plain memcpy below this size.
+                                                                  // Multithreaded memcpy carries overhead that a single-threaded
+                                                                  // copy avoids: switching between threads, plus synchronization
+                                                                  // and communication such as semaphore waits and posts. When that
+                                                                  // cost exceeds the benefit of splitting the copy, plain memcpy is
+                                                                  // faster. The 1 MB threshold is a rough estimate of the break-even point.
+
+
+bool vktrace_sem_create(vktrace_sem_id *sem_id, uint32_t initvalue)
+{
+    bool sem_create_ok = false;
+#if defined(USE_PAGEGUARD_SPEEDUP)
+#if defined(WIN32)
+    static const uint32_t maxValue = 0x40000; // POSIX sem_init has no maximum-count parameter, but Windows requires one.
+                                              // We don't depend on the limit here, so pick a value large enough never to matter.
+    HANDLE sid = CreateSemaphore(NULL, initvalue, maxValue, NULL);
+    if (sid != NULL) // CreateSemaphore returns NULL on failure, not INVALID_HANDLE_VALUE
+    {
+        *sem_id = sid;
+        sem_create_ok = true;
+    }
+#else
+    sem_t *sem = new sem_t;
+    if(sem != nullptr)
+    {
+        if (sem_init(sem, 0, initvalue) == 0)
+        {
+            sem_create_ok = true;
+            *sem_id = sem;
+        }
+    }
+#endif
+#endif // USE_PAGEGUARD_SPEEDUP
+    return sem_create_ok;
+}
+
+void vktrace_sem_delete(vktrace_sem_id sid)
+{
+#if defined(USE_PAGEGUARD_SPEEDUP)
+#if defined(WIN32)
+    CloseHandle(sid);
+#else
+    sem_destroy(sid); // pairs with sem_init (sem_close is for named semaphores)
+    delete (sem_t *)sid;
+#endif
+#endif // USE_PAGEGUARD_SPEEDUP
+}
+
+void vktrace_sem_wait(vktrace_sem_id sid)
+{
+#if defined(USE_PAGEGUARD_SPEEDUP)
+#if defined(WIN32)
+    WaitForSingleObject(sid, INFINITE);
+#else
+    sem_wait(sid);
+#endif
+#endif // USE_PAGEGUARD_SPEEDUP
+}
+
+void vktrace_sem_post(vktrace_sem_id sid)
+{
+#if defined(USE_PAGEGUARD_SPEEDUP)
+#if defined(WIN32)
+    ReleaseSemaphore(sid, 1, NULL);
+#else
+    sem_post(sid);
+#endif
+#endif // USE_PAGEGUARD_SPEEDUP
+}
+
+#if defined(PAGEGUARD_MEMCPY_USE_PPL_LIB)
+
+#if defined(WIN32)
+#define PARALLEL_INVOKE_NUM   10 // Maximum number of tasks handed to parallel_invoke; how many threads actually run them is up to the runtime's scheduler.
+extern "C" void *vktrace_pageguard_memcpy(void * destination, const void * source, size_t size)
+{
+    void *pRet = NULL;
+    if (size < SIZE_LIMIT_TO_USE_OPTIMIZATION)
+    {
+        pRet = memcpy(destination, source, size);
+    }
+    else
+    {
+        pRet = destination;
+        void* ptr_parallel[PARALLEL_INVOKE_NUM];
+        void* pBuffer_parallel[PARALLEL_INVOKE_NUM];
+        size_t size_parallel[PARALLEL_INVOKE_NUM];
+
+        size_t stepsize = size / PARALLEL_INVOKE_NUM;
+        for (int i = 0; i < PARALLEL_INVOKE_NUM; i++)
+        {
+            ptr_parallel[i] = (char *)destination + i*stepsize;
+            pBuffer_parallel[i] = (char *)source + i*stepsize;
+            if ((i + 1) == PARALLEL_INVOKE_NUM)
+            {
+                // The last unit also picks up the remainder left by the integer division.
+                size_parallel[i] = size - (PARALLEL_INVOKE_NUM-1)*stepsize;
+            }
+            else
+            {
+                size_parallel[i] = stepsize;
+            }
+        }
+        parallel_invoke(
+            [&ptr_parallel, &pBuffer_parallel, &size_parallel] { memcpy(ptr_parallel[9], pBuffer_parallel[9], size_parallel[9]); },
+            [&ptr_parallel, &pBuffer_parallel, &size_parallel] { memcpy(ptr_parallel[8], pBuffer_parallel[8], size_parallel[8]); },
+            [&ptr_parallel, &pBuffer_parallel, &size_parallel] { memcpy(ptr_parallel[7], pBuffer_parallel[7], size_parallel[7]); },
+            [&ptr_parallel, &pBuffer_parallel, &size_parallel] { memcpy(ptr_parallel[6], pBuffer_parallel[6], size_parallel[6]); },
+            [&ptr_parallel, &pBuffer_parallel, &size_parallel] { memcpy(ptr_parallel[5], pBuffer_parallel[5], size_parallel[5]); },
+            [&ptr_parallel, &pBuffer_parallel, &size_parallel] { memcpy(ptr_parallel[4], pBuffer_parallel[4], size_parallel[4]); },
+            [&ptr_parallel, &pBuffer_parallel, &size_parallel] { memcpy(ptr_parallel[3], pBuffer_parallel[3], size_parallel[3]); },
+            [&ptr_parallel, &pBuffer_parallel, &size_parallel] { memcpy(ptr_parallel[2], pBuffer_parallel[2], size_parallel[2]); },
+            [&ptr_parallel, &pBuffer_parallel, &size_parallel] { memcpy(ptr_parallel[1], pBuffer_parallel[1], size_parallel[1]); },
+            [&ptr_parallel, &pBuffer_parallel, &size_parallel] { memcpy(ptr_parallel[0], pBuffer_parallel[0], size_parallel[0]); }
+        );
+    }
+    return pRet;
+}
+#else // PAGEGUARD_MEMCPY_USE_PPL_LIB is defined but this is not WIN32 (Linux)
+extern "C" void *vktrace_pageguard_memcpy(void * destination, const void * source, size_t size)
+{
+    return memcpy(destination, source, (size_t)size);
+}
+#endif
+
+#else // !defined(PAGEGUARD_MEMCPY_USE_PPL_LIB): use the cross-platform multithreaded memcpy, which does not require PPL
+
+
+typedef void(*vktrace_pageguard_ptr_task_unit_function)(void *pTaskUnitParaInput);
+
+typedef struct
+{
+    void * src, *dest;
+    size_t size;
+} vktrace_pageguard_task_unit_parameters;
+
+typedef struct
+{
+    int index;
+    vktrace_pageguard_task_unit_parameters *ptask_units;
+    int amount;
+    vktrace_sem_id sem_id_access;
+} vktrace_pageguard_task_queue;
+
+typedef struct
+{
+    int index;
+    vktrace_pageguard_thread_id thread_id;
+    vktrace_sem_id sem_id_task_start;
+    vktrace_sem_id sem_id_task_end;
+    //vktrace_pageguard_task_unit_parameters *ptask_para;
+} vktrace_pageguard_task_control_block;
+
+
+
+#if defined(WIN32)
+  typedef uint32_t(*vktrace_pageguard_thread_function_ptr)(void *parameters);
+#else
+  typedef void *(*vktrace_pageguard_thread_function_ptr)(void *parameters);
+#endif
+bool vktrace_pageguard_create_thread(vktrace_pageguard_thread_id *ptid, vktrace_pageguard_thread_function_ptr pfunc, vktrace_pageguard_task_control_block *ptaskpara)
+{
+    bool create_thread_ok = false;
+#if defined(WIN32)
+    DWORD dwThreadID;
+    HANDLE thread_handle;
+    thread_handle = CreateThread(NULL, 0,
+        (LPTHREAD_START_ROUTINE)pfunc, ptaskpara,
+        0, &dwThreadID);
+
+    if (thread_handle != NULL) // CreateThread returns NULL on failure, not INVALID_HANDLE_VALUE
+    {
+        *ptid = thread_handle;
+        create_thread_ok = true;
+    }
+
+#else
+    pthread_t thread;
+    int oldtype, restored;
+    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &oldtype);
+    if (pthread_create(&thread, NULL, pfunc, (void *)ptaskpara) == 0)
+    {
+        *ptid = thread;
+        create_thread_ok = true;
+    }
+    pthread_setcanceltype(oldtype, &state);
+#endif
+    return create_thread_ok;
+}
+
+
+void vktrace_pageguard_join_thread(vktrace_pageguard_thread_id tid)
+{
+#if defined(WIN32)
+    WaitForSingleObject((HANDLE)tid, INFINITE);
+#else
+    pthread_join((pthread_t)tid, NULL);
+#endif
+}
+
+void vktrace_pageguard_delete_thread(vktrace_pageguard_thread_id tid)
+{
+#if defined(WIN32)
+    DWORD  dwExitCode=0;
+    TerminateThread((HANDLE)tid, dwExitCode);
+#else
+    pthread_cancel((pthread_t)tid);
+#endif
+}
+
+int vktrace_pageguard_get_cpu_core_count()
+{
+    int iret = 4;
+#if defined(WIN32)
+    SYSTEM_INFO sSysInfo;
+    GetSystemInfo(&sSysInfo);
+    iret = sSysInfo.dwNumberOfProcessors;
+#else
+    iret=sysconf(_SC_NPROCESSORS_ONLN);
+#endif
+    return iret;
+}
+
+vktrace_pageguard_task_queue *vktrace_pageguard_get_task_queue()
+{
+    static vktrace_pageguard_task_queue *pvktrace_pageguard_task_queue_multi_thread_memcpy = nullptr;
+    if (!pvktrace_pageguard_task_queue_multi_thread_memcpy)
+    {
+        pvktrace_pageguard_task_queue_multi_thread_memcpy = new vktrace_pageguard_task_queue;
+        memset( reinterpret_cast<void *>(pvktrace_pageguard_task_queue_multi_thread_memcpy), 0, sizeof(vktrace_pageguard_task_queue));
+        vktrace_sem_create(&pvktrace_pageguard_task_queue_multi_thread_memcpy->sem_id_access, 1);
+    }
+    return pvktrace_pageguard_task_queue_multi_thread_memcpy;
+}
+
+vktrace_pageguard_task_unit_parameters *vktrace_pageguard_get_task_unit_parameters()
+{
+    vktrace_pageguard_task_unit_parameters *pret = nullptr;
+    vktrace_pageguard_task_queue *ptaskqueue = vktrace_pageguard_get_task_queue();
+    vktrace_sem_wait(ptaskqueue->sem_id_access);
+    if (ptaskqueue->index<ptaskqueue->amount)
+    {
+        pret = &ptaskqueue->ptask_units[ptaskqueue->index];
+        ptaskqueue->index++;
+    }
+    vktrace_sem_post(ptaskqueue->sem_id_access);
+    return pret;
+}
+
+vktrace_pageguard_task_control_block *vktrace_pageguard_get_task_control_block()
+{
+    static vktrace_pageguard_task_control_block *ptask_control_block = nullptr;
+    if (!ptask_control_block)
+    {
+        int thread_number = vktrace_pageguard_get_cpu_core_count();
+        ptask_control_block = reinterpret_cast<vktrace_pageguard_task_control_block *>(new uint8_t[thread_number*sizeof(vktrace_pageguard_task_control_block)]);
+        memset((void *)ptask_control_block, 0, thread_number*sizeof(vktrace_pageguard_task_control_block));
+    }
+    return ptask_control_block;
+}
+
+uint32_t vktrace_pageguard_thread_function(void *ptcbpara)
+{
+    vktrace_pageguard_task_control_block * ptasktcb = reinterpret_cast<vktrace_pageguard_task_control_block *>(ptcbpara);
+    vktrace_pageguard_task_unit_parameters *parameters;
+    bool stop_loop = false;
+    while (1)
+    {
+        vktrace_sem_wait(ptasktcb->sem_id_task_start);
+        stop_loop = false;
+        while (!stop_loop)
+        {
+            parameters = vktrace_pageguard_get_task_unit_parameters();
+            if (parameters != nullptr)
+            {
+                memcpy(parameters->dest, parameters->src, parameters->size);
+            }
+            else
+            {
+                stop_loop = true;
+            }
+        }
+        vktrace_sem_post(ptasktcb->sem_id_task_end);
+    }
+}
+
+bool vktrace_pageguard_init_multi_threads_memcpy_custom(vktrace_pageguard_thread_function_ptr pfunc)
+{
+    bool init_multi_threads_memcpy_custom_ok = false, success_sem_start = false, success_sem_end = false, success_thread = false;
+    vktrace_pageguard_task_control_block *ptcb = vktrace_pageguard_get_task_control_block();
+    int thread_number = vktrace_pageguard_get_cpu_core_count();
+    for (int i = 0; i < thread_number; i++)
+    {
+        success_sem_start = vktrace_sem_create(&ptcb[i].sem_id_task_start,0);
+        success_sem_end = vktrace_sem_create(&ptcb[i].sem_id_task_end,0);
+        ptcb[i].index = i;
+        success_thread = vktrace_pageguard_create_thread(&ptcb[i].thread_id, (vktrace_pageguard_thread_function_ptr)pfunc, &ptcb[i]);
+        if (success_sem_start&&success_sem_end &&success_thread)
+        {
+            init_multi_threads_memcpy_custom_ok = true;
+        }
+        else
+        {
+            init_multi_threads_memcpy_custom_ok = false;
+            break;
+        }
+    }
+    return init_multi_threads_memcpy_custom_ok;
+}
+
+static vktrace_sem_id glocal_sem_id;
+static bool glocal_sem_id_create_success = vktrace_sem_create(&glocal_sem_id, 1);
+
+int vktrace_pageguard_ref_count(bool release)
+{
+  static int ref_count = 0;
+  int curr_ref_count;
+  if (!release)
+  {
+      vktrace_sem_wait(glocal_sem_id);
+      curr_ref_count = ref_count;
+      ++ref_count;
+      vktrace_sem_post(glocal_sem_id);
+      return curr_ref_count;
+  }
+  else
+  {
+     vktrace_sem_wait(glocal_sem_id);
+     --ref_count;
+     curr_ref_count = ref_count;
+     vktrace_sem_post(glocal_sem_id);
+     return curr_ref_count;
+  }
+}
+
+extern "C" BOOL vktrace_pageguard_init_multi_threads_memcpy()
+{
+    int refnum = vktrace_pageguard_ref_count(false);
+    BOOL init_multi_threads_memcpy_ok = TRUE;
+    vktrace_pageguard_thread_function_ptr pfunc = (vktrace_pageguard_thread_function_ptr)vktrace_pageguard_thread_function;
+    if (!refnum)
+    {
+        init_multi_threads_memcpy_ok = vktrace_pageguard_init_multi_threads_memcpy_custom(pfunc);
+    }
+    return init_multi_threads_memcpy_ok;
+}
+
+void vktrace_pageguard_delete_task_control_block()
+{
+    delete[] vktrace_pageguard_get_task_control_block();
+}
+
+void vktrace_pageguard_delete_task_queue()
+{
+    delete vktrace_pageguard_get_task_queue();
+}
+
+extern "C" void vktrace_pageguard_done_multi_threads_memcpy()
+{
+    int refnum = vktrace_pageguard_ref_count(true);
+    if (!refnum)
+    {
+        vktrace_pageguard_task_control_block *task_control_block = vktrace_pageguard_get_task_control_block();
+        if (task_control_block != nullptr)
+        {
+            int thread_number = vktrace_pageguard_get_cpu_core_count();
+
+            for (int i = 0; i < thread_number; i++)
+            {
+                vktrace_sem_delete(task_control_block[i].sem_id_task_start);
+                vktrace_sem_delete(task_control_block[i].sem_id_task_end);
+                vktrace_pageguard_delete_thread(task_control_block[i].thread_id);
+            }
+            vktrace_pageguard_delete_task_control_block();
+            vktrace_sem_delete(vktrace_pageguard_get_task_queue()->sem_id_access);
+            vktrace_pageguard_delete_task_queue();
+        }
+    }
+}
+
+// The caller must keep the units array alive until vktrace_pageguard_multi_threads_memcpy_run has finished.
+void vktrace_pageguard_set_task_queue(vktrace_pageguard_task_unit_parameters *units, int unitamount)
+{
+    vktrace_sem_wait(glocal_sem_id);
+    vktrace_pageguard_task_queue *pqueue = vktrace_pageguard_get_task_queue();
+    pqueue->amount = unitamount;
+    pqueue->ptask_units = units;
+    pqueue->index = 0;
+}
+
+void vktrace_pageguard_clear_task_queue()
+{
+    vktrace_pageguard_task_queue *pqueue = vktrace_pageguard_get_task_queue();
+    pqueue->amount = 0;
+    pqueue->ptask_units = nullptr;
+    pqueue->index = 0;
+    vktrace_sem_post(glocal_sem_id);
+}
+
+// The steps for using the multithreaded copy:
+// <1> vktrace_pageguard_init_multi_threads_memcpy
+//     call once at app startup
+// <2> vktrace_pageguard_set_task_queue
+// <3> vktrace_pageguard_multi_threads_memcpy_run
+// <4> vktrace_pageguard_done_multi_threads_memcpy
+//     call once at app shutdown
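+//
+// An illustrative end-to-end sequence (a sketch, not taken from a real caller; dst, src,
+// and nbytes are hypothetical):
+//
+//     vktrace_pageguard_init_multi_threads_memcpy();   // app startup
+//     vktrace_pageguard_memcpy(dst, src, nbytes);      // internally performs steps <2> and <3> for large copies
+//     vktrace_pageguard_done_multi_threads_memcpy();   // app shutdown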
+void vktrace_pageguard_multi_threads_memcpy_run()
+{
+    vktrace_pageguard_task_control_block *ptcb = vktrace_pageguard_get_task_control_block();
+    int thread_number = vktrace_pageguard_get_cpu_core_count();
+
+    for (int i = 0; i < thread_number; i++)
+    {
+        vktrace_sem_post(ptcb[i].sem_id_task_start);
+    }
+
+    for (int i = 0; i < thread_number; i++)
+    {
+        vktrace_sem_wait(ptcb[i].sem_id_task_end);
+    }
+
+}
+
+void vktrace_pageguard_memcpy_multithread(void *dest, const void *src, size_t n)
+{
+    static const size_t PAGEGUARD_MEMCPY_MULTITHREAD_UNIT_SIZE = 0x10000;
+    int thread_number = vktrace_pageguard_get_cpu_core_count();
+
+    //taskunitamount should be >=thread_number, but should not >= a value which make the unit too small and the cost of switch thread > memcpy that unit, on the other side, too small is also not best if consider last task will determine the memcpy speed.
+    int taskunitamount = n / PAGEGUARD_MEMCPY_MULTITHREAD_UNIT_SIZE;
+    if (taskunitamount < thread_number)
+    {
+        taskunitamount = thread_number;
+    }
+    size_t size_per_unit = n / taskunitamount, size_left = n%taskunitamount, size;
+    vktrace_pageguard_task_unit_parameters *units = reinterpret_cast<vktrace_pageguard_task_unit_parameters *>(new uint8_t[taskunitamount*sizeof(vktrace_pageguard_task_unit_parameters)]);
+    assert(units);
+    for (int i = 0; i < taskunitamount; i++)
+    {
+        size = size_per_unit;
+        if((i + 1) == taskunitamount)
+        {
+            size += size_left;
+        }
+        units[i].src = (void *)((uint8_t *)src + i*size_per_unit);
+        units[i].dest = (void *)((uint8_t *)dest + i*size_per_unit);
+        units[i].size = size;
+    }
+    vktrace_pageguard_set_task_queue(units, taskunitamount);
+    vktrace_pageguard_multi_threads_memcpy_run();
+    delete[] units;
+    vktrace_pageguard_clear_task_queue();
+}
+
+extern "C" void *vktrace_pageguard_memcpy(void * destination, const void * source, size_t size)
+{
+    void *pRet = NULL;
+    if (size < SIZE_LIMIT_TO_USE_OPTIMIZATION)
+    {
+        pRet = memcpy(destination, source, (size_t)size);
+    }
+    else
+    {
+        pRet = destination;
+        vktrace_pageguard_memcpy_multithread(destination, source, (size_t)size);
+    }
+    return pRet;
+}
+#endif
\ No newline at end of file
diff --git a/vktrace/src/vktrace_common/vktrace_pageguard_memorycopy.h b/vktrace/src/vktrace_common/vktrace_pageguard_memorycopy.h
new file mode 100644
index 0000000..17ac3dd
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_pageguard_memorycopy.h
@@ -0,0 +1,70 @@
+/*
+* Copyright (c) 2016 Advanced Micro Devices, Inc. All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <stdio.h>
+#include <stdlib.h>
+
+// Use PPL's parallel_invoke (Windows only for now, though PPLX exists for Linux), or leave this
+// undefined to use the cross-platform multithreaded memcpy that does not depend on PPL.
+//#define PAGEGUARD_MEMCPY_USE_PPL_LIB
+#include "vktrace_platform.h"
+
+// The page guard method is only available on Windows for now, but most of the page guard source
+// targets cross-platform use, except for the pageguard handler and its set/reset/clear methods.
+// Once more platforms provide a pageguard handler (including our own implementation), the
+// condition for #define USE_PAGEGUARD_SPEEDUP can be changed to enable it there.
+
+#if defined(WIN32)
+    #define USE_PAGEGUARD_SPEEDUP
+    #if defined(PAGEGUARD_MEMCPY_USE_PPL_LIB)
+        #include <ppl.h>
+        using namespace concurrency;
+    #endif
+#else
+    #include <semaphore.h>
+    #include <pthread.h>
+#endif
+
+
+typedef struct __PageGuardChangedBlockInfo
+{
+    uint32_t offset;
+    uint32_t length;
+    uint32_t reserve0;
+    uint32_t reserve1;
+} PageGuardChangedBlockInfo, *pPageGuardChangedBlockInfo;
+
+#if defined(WIN32)
+typedef HANDLE vktrace_pageguard_thread_id;
+typedef HANDLE vktrace_sem_id;
+#else
+typedef pthread_t vktrace_pageguard_thread_id;
+typedef sem_t* vktrace_sem_id;
+#endif
+
+#ifdef __cplusplus
+bool vktrace_sem_create(vktrace_sem_id *sem_id, uint32_t initvalue);
+void vktrace_sem_delete(vktrace_sem_id sid);
+void vktrace_sem_wait(vktrace_sem_id sid);
+void vktrace_sem_post(vktrace_sem_id sid);
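+
+// Usage sketch for the semaphore helpers (illustrative only):
+//
+//     vktrace_sem_id sem;
+//     if (vktrace_sem_create(&sem, 1))
+//     {
+//         vktrace_sem_wait(sem);   // enter the protected region
+//         vktrace_sem_post(sem);   // leave the protected region
+//         vktrace_sem_delete(sem);
+//     }
+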
+void vktrace_pageguard_memcpy_multithread(void *dest, const void *src, size_t n);
+extern "C" void *vktrace_pageguard_memcpy(void * destination, const void * source, size_t size);
+#else
+void *vktrace_pageguard_memcpy(void * destination, const void * source, size_t size);
+#endif
+
+
+
+#define PAGEGUARD_SPECIAL_FORMAT_PACKET_FOR_VKFLUSHMAPPEDMEMORYRANGES 0x00000001
diff --git a/vktrace/src/vktrace_common/vktrace_platform.c b/vktrace/src/vktrace_common/vktrace_platform.c
new file mode 100644
index 0000000..6bb9154
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_platform.c
@@ -0,0 +1,484 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ **************************************************************************/
+#include "vktrace_platform.h"
+
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+#include "vktrace_common.h"
+#include <pthread.h>
+#endif
+
+#if defined(PLATFORM_OSX)
+#include <libproc.h>
+#endif
+
+vktrace_process_id vktrace_get_pid()
+{
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    return getpid();
+#elif defined(WIN32)
+    return GetCurrentProcessId();
+#endif
+}
+
+char* vktrace_platform_get_current_executable_directory()
+{
+    char* exePath = (char*)vktrace_malloc(_MAX_PATH);
+#if defined(WIN32)
+    DWORD s = GetModuleFileName(NULL, exePath, MAX_PATH);
+#elif defined(PLATFORM_LINUX)
+    ssize_t s = readlink("/proc/self/exe", exePath, _MAX_PATH);
+    if (s >= 0)
+    {
+        exePath[s] = '\0';
+    }
+    else
+    {
+        exePath[0] = '\0';
+    }
+#elif defined(PLATFORM_OSX)
+    ssize_t s = proc_pidpath(getpid(), exePath, _MAX_PATH);
+    if (s >= 0)
+    {
+        exePath[s] = '\0';
+    }
+    else
+    {
+        exePath[0] = '\0';
+    }
+#endif
+
+    while (s > 0)
+    {
+        if (exePath[s] == '/' || exePath[s] == '\\')
+        {
+            // NULL this location and break so that the shortened string can be returned.
+            exePath[s] = '\0';
+            break;
+        }
+
+        --s;
+    }
+
+    if (s <= 0)
+    {
+        assert(!"Unexpected path returned in vktrace_platform_get_current_executable_directory");
+        vktrace_free(exePath);
+        exePath = NULL;
+    }
+
+    return exePath;
+}
+
+BOOL vktrace_is_loaded_into_vktrace()
+{
+    char exePath[_MAX_PATH];
+
+#if defined(WIN32)
+    char* substr = ((sizeof(void*) == 4)? "vktrace32.exe" : "vktrace.exe");
+    GetModuleFileName(NULL, exePath, MAX_PATH);
+#elif defined(PLATFORM_LINUX)
+    char* substr = ((sizeof(void*) == 4)? "vktrace32" : "vktrace");
+    ssize_t s = readlink("/proc/self/exe", exePath, _MAX_PATH);
+    if (s >= 0)
+    {
+        exePath[s] = '\0';
+    }
+    else
+    {
+        exePath[0] = '\0';
+    }
+#elif defined(PLATFORM_OSX)
+    char* substr = ((sizeof(void*) == 4)? "vktrace32" : "vktrace");
+    ssize_t s = proc_pidpath(getpid(), exePath, _MAX_PATH);
+    if (s >= 0)
+    {
+        exePath[s] = '\0';
+    }
+    else
+    {
+        exePath[0] = '\0';
+    }
+#endif
+    return (strstr(exePath, substr) != NULL);
+}
+
+BOOL vktrace_platform_get_next_lib_sym(void * *ppFunc, const char * name)
+{
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    if ((*ppFunc = dlsym(RTLD_NEXT, name)) == NULL) {
+         vktrace_LogError("dlsym: failed to find symbol %s %s", name, dlerror());
+         return FALSE;
+    }
+#elif defined(WIN32)
+    vktrace_LogError("unimplemented");
+    assert(0);
+    return FALSE;
+#endif
+   return TRUE;
+}
+
+vktrace_thread_id vktrace_platform_get_thread_id()
+{
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    //return (vktrace_thread_id)syscall(SYS_gettid);
+    return (vktrace_thread_id)pthread_self();
+#elif defined(WIN32)
+    return GetCurrentThreadId();
+#endif
+}
+
+char *vktrace_get_global_var(const char *name)
+{
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    return getenv(name);
+#else
+    // TODO: add code for reading from Windows registry
+    // For now we just return the result from getenv
+    return getenv(name);
+#endif
+}
+
+void vktrace_set_global_var(const char *name, const char *val)
+{
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    setenv(name, val, 1);
+#else
+    // TODO add code for writing to Windows registry
+    // For now we just do _putenv_s
+    _putenv_s(name, val);
+#endif
+}
+
+size_t vktrace_platform_rand_s(uint32_t* out_array, size_t out_array_length)
+{
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    static __thread unsigned int s_seed = 0;
+    size_t i = 0;
+
+    if (s_seed == 0)
+    {
+        // Try to seed rand_r() with /dev/urandom.
+        size_t nbytes = 0;
+        int fd = open("/dev/urandom", O_RDONLY);
+        if (fd != -1)
+        {
+            nbytes = read(fd, &s_seed, sizeof(s_seed));
+            close(fd);
+        }
+
+        // If that didn't work, fallback to time and thread id.
+        if (nbytes != sizeof(s_seed))
+        {
+            struct timeval time;
+            gettimeofday(&time, NULL);
+            s_seed = vktrace_platform_get_thread_id() ^ ((time.tv_sec * 1000) + (time.tv_usec / 1000));
+        }
+    }
+
+    for (i = 0; i < out_array_length; ++i)
+    {
+        out_array[i] = rand_r(&s_seed);
+    }
+
+    return out_array_length;
+#elif defined(WIN32)
+    //VKTRACE_ASSUME(sizeof(uint32_t) == sizeof(unsigned int));
+
+    size_t ret_values = 0;
+    for (ret_values = 0; ret_values < out_array_length; ++ret_values)
+    {
+        if (FAILED(rand_s(&out_array[ret_values])))
+            return ret_values;
+    }
+
+    return ret_values;
+#endif
+}
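As a usage illustration of the random-data helper, a caller could combine two words into a 64-bit identifier. The wrapper function below is hypothetical:

```c
#include <stdint.h>
#include "vktrace_platform.h"

/* Hypothetical caller: build a 64-bit id from two random 32-bit words. */
static uint64_t make_random_id(void)
{
    uint32_t words[2] = { 0, 0 };
    if (vktrace_platform_rand_s(words, 2) != 2)
        return 0; /* random source unavailable; caller decides a fallback */
    return ((uint64_t)words[0] << 32) | words[1];
}
```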
+
+void * vktrace_platform_open_library(const char* libPath)
+{
+#if defined(WIN32)
+    return LoadLibrary(libPath);
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    return dlopen(libPath, RTLD_LAZY);
+#endif
+}
+
+void * vktrace_platform_get_library_entrypoint(void * libHandle, const char *name)
+{
+    // Get func ptr to library entrypoint. We don't log an error if
+    // we don't find the entrypoint, because cross-platform support
+    // causes vkreplay to query the address of all api entrypoints,
+    // even the wsi-specific ones.
+#ifdef WIN32
+    FARPROC proc = GetProcAddress((HMODULE)libHandle, name);
+#else
+    void * proc = dlsym(libHandle, name);
+#endif
+    return proc;
+}
+
+void vktrace_platform_close_library(void* pLibrary)
+{
+#if defined(WIN32)
+    FreeLibrary((HMODULE)pLibrary);
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    dlclose(pLibrary);
+#endif
+}
+
+void vktrace_platform_full_path(const char* partPath, unsigned long bytes, char* buffer)
+{
+    assert(buffer != NULL);
+#if defined(WIN32)
+    GetFullPathName(partPath, bytes, buffer, NULL);
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    char *ptr = realpath(partPath, buffer);
+    (void) ptr;
+#endif
+}
+
+char* vktrace_platform_extract_path(char* _path)
+{
+    // These functions actually work on const strings, but the C decl version exposed by the macro 
+    // takes non-const TCHAR*.
+    char* pDir;
+    size_t newLen;
+    char* pathSepBack = strrchr(_path, '\\');
+    char* pathSepFor = strrchr(_path, '/');
+    char* lastPathSep = pathSepBack > pathSepFor ? pathSepBack : pathSepFor;
+
+    if (lastPathSep == NULL)
+    {
+        return vktrace_allocate_and_copy(".\\");
+    }
+
+    pDir = VKTRACE_NEW_ARRAY(char, strlen(_path) + 1);
+    newLen = strlen(_path) - strlen(lastPathSep);
+    strncpy(pDir, _path, newLen);
+    pDir[newLen] = '\0';
+    return pDir;
+}
+
+// The following linux paths are based on:
+// standards.freedesktop.org/basedir-spec/basedir-spec-0.8.html
+char* vktrace_platform_get_settings_path()
+{
+#if defined(__linux__)
+    char* xdgConfigHome = getenv("XDG_CONFIG_HOME");
+    if (xdgConfigHome != NULL && strlen(xdgConfigHome) > 0)
+    {
+        return vktrace_copy_and_append(xdgConfigHome, VKTRACE_PATH_SEPARATOR, "vktrace");
+    }
+    else
+    {
+        return vktrace_copy_and_append(getenv("HOME"), VKTRACE_PATH_SEPARATOR, ".config/vktrace");
+    }
+#elif defined(WIN32)
+    DWORD reqLength = GetEnvironmentVariable("localappdata", NULL, 0);
+    TCHAR* localAppData = VKTRACE_NEW_ARRAY(TCHAR, reqLength);
+    GetEnvironmentVariable("localappdata", localAppData, reqLength);
+    TCHAR* localVktraceData = vktrace_copy_and_append(localAppData, VKTRACE_PATH_SEPARATOR, "vktrace");
+    VKTRACE_DELETE(localAppData);
+    return localVktraceData;
+#else
+    assert(!"not implemented");
+#endif
+}
+
+char* vktrace_platform_get_data_path()
+{
+#if defined(__linux__)
+    char* xdgDataHome = getenv("XDG_DATA_HOME");
+    if (xdgDataHome != NULL && strlen(xdgDataHome) > 0)
+    {
+        return vktrace_copy_and_append(xdgDataHome, VKTRACE_PATH_SEPARATOR, "vktrace");
+    }
+    else
+    {
+        return vktrace_copy_and_append(getenv("HOME"), VKTRACE_PATH_SEPARATOR, ".local/share/vktrace");
+    }
+#elif defined(WIN32)
+    DWORD reqLength = GetEnvironmentVariable("localappdata", NULL, 0);
+    TCHAR* localAppData = VKTRACE_NEW_ARRAY(TCHAR, reqLength);
+    GetEnvironmentVariable("localappdata", localAppData, reqLength);
+    TCHAR* localVktraceData = vktrace_copy_and_append(localAppData, VKTRACE_PATH_SEPARATOR, "vktrace");
+    VKTRACE_DELETE(localAppData);
+    return localVktraceData;
+#else
+    assert(!"not implemented");
+#endif
+}
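A quick illustration of what callers can expect from these two helpers; the wrapper name is hypothetical, and freeing with VKTRACE_DELETE assumes it is the counterpart of VKTRACE_NEW_ARRAY / vktrace_copy_and_append used above:

```c
#include <stdio.h>
#include "vktrace_platform.h"

/* With HOME=/home/me and the XDG variables unset, this prints
 *   /home/me/.config/vktrace
 *   /home/me/.local/share/vktrace
 * With XDG_CONFIG_HOME=/cfg, the first line becomes /cfg/vktrace. */
static void print_vktrace_paths(void)
{
    char* cfg  = vktrace_platform_get_settings_path();
    char* data = vktrace_platform_get_data_path();
    printf("%s\n%s\n", cfg, data);
    VKTRACE_DELETE(cfg);   /* both strings are heap-allocated; the caller frees them */
    VKTRACE_DELETE(data);
}
```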
+
+
+vktrace_thread vktrace_platform_create_thread(VKTRACE_THREAD_ROUTINE_RETURN_TYPE(*start_routine)(LPVOID), void* args)
+{
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    vktrace_thread thread = 0;
+    if(pthread_create(&thread, NULL, (void *(*) (void*)) start_routine, args) != 0)
+    {
+        vktrace_LogError("Failed to create thread");
+    }
+    return thread;
+#elif defined(WIN32)
+    return CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)start_routine, args, 0, NULL);
+#endif
+}
+
+void vktrace_platform_resume_thread(vktrace_thread* pThread)
+{
+    assert(pThread != NULL);
+#if defined(PLATFORM_LINUX)
+    assert(!"Add code to resume threads on Linux");
+#elif defined(PLATFORM_OSX)
+    assert(!"Add code to resume threads on macOS");
+#elif defined(WIN32)
+    if (*pThread != NULL)
+        ResumeThread(*pThread);
+#endif
+}
+
+void vktrace_platform_sync_wait_for_thread(vktrace_thread* pThread)
+{
+    assert(pThread != NULL);
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    if (pthread_join(*pThread, NULL) != 0)
+#else
+    if (WaitForSingleObject(*pThread, INFINITE) != WAIT_OBJECT_0)
+#endif
+    {
+        vktrace_LogError("Error occurred while waiting for thread to end.");
+    }
+}
+
+void vktrace_platform_delete_thread(vktrace_thread* pThread)
+{
+    assert(pThread != NULL);
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    // Don't have to do anything!
+#elif defined(WIN32)
+    CloseHandle(*pThread);
+    *pThread = NULL;
+#endif
+}
+
+#if defined(WIN32)
+void vktrace_platform_thread_once(void *ctl, BOOL (CALLBACK * func) (PINIT_ONCE, PVOID, PVOID *))
+{
+    assert(func != NULL);
+    assert(ctl != NULL);
+    InitOnceExecuteOnce((PINIT_ONCE) ctl, (PINIT_ONCE_FN) func, NULL, NULL);
+}
+#else
+void vktrace_platform_thread_once(void *ctl, void (* func) (void))
+{
+    assert(func != NULL);
+    assert(ctl != NULL);
+    pthread_once((pthread_once_t *) ctl, func);
+}
+#endif
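A sketch of driving this wrapper portably; the control object and init routine below are illustrative, not part of the patch:

```c
#include "vktrace_platform.h"

#if defined(WIN32)
static INIT_ONCE s_initOnce = INIT_ONCE_STATIC_INIT;
static BOOL CALLBACK do_init(PINIT_ONCE once, PVOID param, PVOID* ctx)
{
    /* one-time setup goes here */
    return TRUE;
}
#else
static pthread_once_t s_initOnce = PTHREAD_ONCE_INIT;
static void do_init(void)
{
    /* one-time setup goes here */
}
#endif

void ensure_initialized(void)
{
    /* Safe to call from any number of threads; do_init runs exactly once. */
    vktrace_platform_thread_once(&s_initOnce, do_init);
}
```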
+
+void vktrace_create_critical_section(VKTRACE_CRITICAL_SECTION* pCriticalSection)
+{
+#if defined(WIN32)
+    InitializeCriticalSection(pCriticalSection);
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    pthread_mutex_init(pCriticalSection, NULL);
+#endif
+}
+
+void vktrace_enter_critical_section(VKTRACE_CRITICAL_SECTION* pCriticalSection)
+{
+#if defined(WIN32)
+    EnterCriticalSection(pCriticalSection);
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    pthread_mutex_lock(pCriticalSection);
+#endif
+}
+
+void vktrace_leave_critical_section(VKTRACE_CRITICAL_SECTION* pCriticalSection)
+{
+#if defined(WIN32)
+    LeaveCriticalSection(pCriticalSection);
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    pthread_mutex_unlock(pCriticalSection);
+#endif
+}
+
+void vktrace_delete_critical_section(VKTRACE_CRITICAL_SECTION* pCriticalSection)
+{
+#if defined(WIN32)
+    DeleteCriticalSection(pCriticalSection);
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    pthread_mutex_destroy(pCriticalSection);
+#endif
+}
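Typical usage pairs create/delete with enter/leave around a shared resource. The file-writing example below is illustrative:

```c
#include <stdio.h>
#include "vktrace_platform.h"

/* Call vktrace_create_critical_section(&s_fileLock) once at startup and
 * vktrace_delete_critical_section(&s_fileLock) at shutdown. */
static VKTRACE_CRITICAL_SECTION s_fileLock;

static void locked_write(FILE* f, const void* buf, size_t len)
{
    vktrace_enter_critical_section(&s_fileLock);
    fwrite(buf, 1, len, f);
    vktrace_leave_critical_section(&s_fileLock);
}
```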
+
+BOOL vktrace_platform_remote_load_library(vktrace_process_handle pProcessHandle, const char* dllPath, vktrace_thread* pTracingThread, char ** ldPreload)
+{
+    if (dllPath == NULL)
+        return TRUE;
+#if defined(WIN32)
+    SIZE_T bytesWritten = 0;
+    void* targetProcessMem = NULL;
+    vktrace_thread thread = NULL;
+    size_t byteCount = sizeof(char) * (strlen(dllPath) + 1);
+    targetProcessMem = VirtualAllocEx(pProcessHandle, 0, byteCount, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
+    if (!targetProcessMem)
+    {
+        vktrace_LogError("Failed to inject ourselves into target process--couldn't allocate process memory.");
+        return FALSE;
+    }
+
+    if (!WriteProcessMemory(pProcessHandle, targetProcessMem, dllPath, byteCount, &bytesWritten))
+    {
+        vktrace_LogError("Failed to inject ourselves into target process--couldn't write inception DLL name into process.");
+        return FALSE;
+    }
+
+    thread = CreateRemoteThread(pProcessHandle, NULL, 0, (LPTHREAD_START_ROUTINE)LoadLibrary, targetProcessMem, 0, NULL);
+    if (thread == NULL)
+    {
+        vktrace_LogError("Failed to inject ourselves into target process--couldn't spawn thread.");
+        return FALSE;
+    }
+    assert(pTracingThread != NULL);
+    *pTracingThread = thread;
+
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    char *tmp;
+    if (ldPreload == NULL)
+        return TRUE;
+    if (*ldPreload == NULL)
+    {
+        tmp = vktrace_copy_and_append("LD_PRELOAD", "=", dllPath);
+    }
+    else
+    {
+        tmp = vktrace_copy_and_append(*ldPreload, " ", dllPath);
+        VKTRACE_DELETE((void*)*ldPreload);
+    }
+    *ldPreload = tmp;
+#endif
+
+    return TRUE;
+}
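An illustrative driver for this function; the library path and local names are assumptions:

```c
#include "vktrace_platform.h"

static BOOL inject_tracer(vktrace_process_handle h)
{
    char*          ldPreload = NULL;
    vktrace_thread t         = VKTRACE_NULL_THREAD;

    if (!vktrace_platform_remote_load_library(h, "libvulkan_trace.so", &t, &ldPreload))
        return FALSE;
    /* On Windows, t is the remote LoadLibrary thread; on Linux/macOS,
     * ldPreload now holds "LD_PRELOAD=libvulkan_trace.so" for the child's
     * environment and must eventually be freed with VKTRACE_DELETE. */
    return TRUE;
}
```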
diff --git a/vktrace/src/vktrace_common/vktrace_platform.h b/vktrace/src/vktrace_common/vktrace_platform.h
new file mode 100644
index 0000000..9c690f1
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_platform.h
@@ -0,0 +1,177 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ * Author: David Pinedo <david@lunarg.com>
+ **************************************************************************/
+#pragma once
+
+
+#if defined(PLATFORM_LINUX)
+#define _GNU_SOURCE 1
+#include <unistd.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <pthread.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/prctl.h>
+#include <dlfcn.h>
+#include <signal.h>
+#include "wintypes.h"
+#define APIENTRY
+#define Sleep(n) usleep((n) * 1000)
+#define VKTRACE_WINAPI
+typedef pthread_t vktrace_thread;
+typedef pid_t vktrace_process_handle;
+typedef pid_t vktrace_thread_id;
+typedef pid_t vktrace_process_id;
+typedef unsigned int VKTRACE_THREAD_ROUTINE_RETURN_TYPE;
+typedef pthread_mutex_t VKTRACE_CRITICAL_SECTION;
+#define VKTRACE_NULL_THREAD 0
+#define _MAX_PATH PATH_MAX
+#define VKTRACE_PATH_SEPARATOR "/"
+#define VKTRACE_LIST_SEPARATOR ":"
+#define VKTRACE_THREAD_LOCAL __thread
+
+#elif defined(WIN32)
+#define _CRT_RAND_S
+// The following line is needed to use the C++ std::min() or std::max():
+#define NOMINMAX
+#include <Windows.h>
+#include <tchar.h>
+#define VKTRACE_WINAPI WINAPI
+typedef HANDLE vktrace_thread;
+typedef HANDLE vktrace_process_handle;
+typedef DWORD vktrace_thread_id;
+typedef DWORD vktrace_process_id;
+typedef DWORD VKTRACE_THREAD_ROUTINE_RETURN_TYPE;
+typedef CRITICAL_SECTION VKTRACE_CRITICAL_SECTION;
+#define VKTRACE_NULL_THREAD NULL
+#define VKTRACE_PATH_SEPARATOR "\\"
+#define VKTRACE_LIST_SEPARATOR ";"
+#define VKTRACE_THREAD_LOCAL __declspec(thread)
+#if !defined(__cplusplus)
+#define inline _inline
+#endif
+#elif defined(PLATFORM_OSX)
+
+#define _GNU_SOURCE 1
+#include <unistd.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <pthread.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+//#include <sys/prctl.h>
+#include <dlfcn.h>
+#include <signal.h>
+#include "wintypes.h"
+#define APIENTRY
+#define Sleep(n) usleep((n) * 1000)
+#define VKTRACE_WINAPI
+typedef pthread_t vktrace_thread;
+typedef pid_t vktrace_process_handle;
+typedef pid_t vktrace_thread_id;
+typedef pid_t vktrace_process_id;
+typedef unsigned int VKTRACE_THREAD_ROUTINE_RETURN_TYPE;
+typedef pthread_mutex_t VKTRACE_CRITICAL_SECTION;
+#define VKTRACE_NULL_THREAD 0
+#define _MAX_PATH PATH_MAX
+#define VKTRACE_PATH_SEPARATOR "/"
+#define VKTRACE_LIST_SEPARATOR ":"
+#define VKTRACE_THREAD_LOCAL __thread
+
+#endif
+
+#if defined(WIN32)
+#include "vktrace_common.h"
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+// return the process ID of current process
+vktrace_process_id vktrace_get_pid();
+
+// Get the directory containing the currently running executable.
+// The string returned must be freed by the caller.
+char* vktrace_platform_get_current_executable_directory();
+
+// Determine if the current process is vktrace[32|64]
+BOOL vktrace_is_loaded_into_vktrace();
+
+// Get the thread id for this thread.
+vktrace_thread_id vktrace_platform_get_thread_id();
+
+// Get the Registry or Environment variable
+char *vktrace_get_global_var(const char *);
+
+// Set the Registry or Environment variable
+void vktrace_set_global_var(const char *, const char *);
+
+// Provides out_array_length uint32s of random data from a secure service
+size_t vktrace_platform_rand_s(uint32_t* out_array, size_t out_array_length);
+
+// Alternatives to loading libraries, getting proc addresses, etc
+void * vktrace_platform_open_library(const char* libPath);
+void * vktrace_platform_get_library_entrypoint(void * libHandle, const char *name);
+void vktrace_platform_close_library(void* plibrary);
+BOOL vktrace_platform_get_next_lib_sym(void * *ppFunc, const char * name);
+
+// Converts the partial path into a full path and writes it into the supplied buffer.
+// Note the resulting string may not point to an existing file.
+void vktrace_platform_full_path(const char* partPath, unsigned long bytes, char* buffer);
+
+// returns a newly allocated string which contains just the directory structure of the supplied file path.
+char* vktrace_platform_extract_path(char* _path);
+
+// returns platform specific path for settings / configuration files
+char* vktrace_platform_get_settings_path();
+
+// returns platform specific path for all data files
+char* vktrace_platform_get_data_path();
+
+vktrace_thread vktrace_platform_create_thread(VKTRACE_THREAD_ROUTINE_RETURN_TYPE(*start_routine)(LPVOID), void* args);
+void vktrace_platform_resume_thread(vktrace_thread* pThread);
+void vktrace_platform_sync_wait_for_thread(vktrace_thread* pThread);
+void vktrace_platform_delete_thread(vktrace_thread* pThread);
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+void vktrace_platform_thread_once(void *ctl, void (* func) (void));
+#elif defined(WIN32)
+void vktrace_platform_thread_once(void *ctl, BOOL (CALLBACK * func) (_Inout_ PINIT_ONCE initOnce, _Inout_opt_ PVOID param, _Out_opt_ PVOID *lpContext));
+#endif
+
+void vktrace_create_critical_section(VKTRACE_CRITICAL_SECTION* pCriticalSection);
+void vktrace_enter_critical_section(VKTRACE_CRITICAL_SECTION* pCriticalSection);
+void vktrace_leave_critical_section(VKTRACE_CRITICAL_SECTION* pCriticalSection);
+void vktrace_delete_critical_section(VKTRACE_CRITICAL_SECTION* pCriticalSection);
+
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+#define VKTRACE_LIBRARY_NAME(projname) (sizeof(void*) == 4)? "lib"#projname"32.so" : "lib"#projname".so"
+#endif
+#if defined(WIN32)
+#define VKTRACE_LIBRARY_NAME(projname) (sizeof(void*) == 4)? #projname"32.dll" : #projname".dll"
+#endif
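The macro picks the bitness-appropriate file name at compile time. A hedged usage sketch; the "InitTracer" entry point name is a placeholder:

```c
#include "vktrace_platform.h"

static void open_tracer_example(void)
{
    /* On 64-bit Linux this opens "libvulkan_trace.so";
     * on 32-bit Windows it opens "vulkan_trace32.dll". */
    void* lib = vktrace_platform_open_library(VKTRACE_LIBRARY_NAME(vulkan_trace));
    if (lib != NULL)
    {
        void* entry = vktrace_platform_get_library_entrypoint(lib, "InitTracer");
        (void)entry; /* placeholder entry point name, for illustration only */
        vktrace_platform_close_library(lib);
    }
}
```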
+
+BOOL vktrace_platform_remote_load_library(vktrace_process_handle pProcessHandle, const char* dllPath, vktrace_thread* pTracingThread, char **ldPreload);
+
+#ifdef __cplusplus
+}
+#endif
diff --git a/vktrace/src/vktrace_common/vktrace_process.c b/vktrace/src/vktrace_common/vktrace_process.c
new file mode 100644
index 0000000..c3105cb
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_process.c
@@ -0,0 +1,138 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ * Author: David Pinedo <david@lunarg.com>
+ **************************************************************************/
+#include "vktrace_process.h"
+
+
+BOOL vktrace_process_spawn(vktrace_process_info* pInfo)
+{
+    assert(pInfo != NULL);
+
+#if defined(WIN32)
+    {
+    unsigned long processCreateFlags = CREATE_DEFAULT_ERROR_MODE | CREATE_SUSPENDED;
+    char fullExePath[_MAX_PATH];
+    PROCESS_INFORMATION processInformation;
+    STARTUPINFO si = { 0 };
+    si.cb = sizeof(si);
+
+    memset(&processInformation, 0, sizeof(PROCESS_INFORMATION));
+    memset(fullExePath, 0, sizeof(char)*_MAX_PATH);
+    fullExePath[0] = 0;
+
+    SetLastError(0);
+    SearchPath(NULL, pInfo->exeName, ".exe", ARRAYSIZE(fullExePath), fullExePath, NULL);
+
+    if (!CreateProcess(fullExePath, pInfo->fullProcessCmdLine, NULL, NULL, TRUE,
+        processCreateFlags, NULL, pInfo->workingDirectory,
+        &si, &processInformation))
+    {
+        vktrace_LogError("Failed to inject ourselves into target process--couldn't spawn '%s'.", fullExePath);
+        return FALSE;
+    }
+
+    pInfo->hProcess = processInformation.hProcess;
+    pInfo->hThread = processInformation.hThread;
+    pInfo->processId = processInformation.dwProcessId;
+    // TODO : Do we need to do anything with processInformation.dwThreadId?
+    }
+#elif defined(PLATFORM_LINUX)
+    pInfo->processId = fork();
+    if (pInfo->processId == -1)
+    {
+        vktrace_LogError("Failed to spawn process.");
+        return FALSE;
+    }
+    else if (pInfo->processId == 0)
+    {
+        // Inside new process
+        char *args[128];
+        const char delim[] = " \t";
+        unsigned int idx;
+
+        // Change the process name so that the tracer DLLs will behave as expected when loaded.
+        // NOTE: prctl(PR_SET_NAME) silently truncates the name to 15 characters.
+        const char * tmpProcName = "vktraceChildProcess";
+        prctl(PR_SET_NAME, (unsigned long)tmpProcName, 0, 0, 0);
+
+        // Change working directory
+        if (chdir(pInfo->workingDirectory) == -1)
+        {
+            vktrace_LogError("Failed to set working directory.");
+        }
+
+        args[0] = pInfo->exeName;
+        args[127] = NULL;
+        idx = 1;
+        args[idx] = strtok(pInfo->processArgs, delim);
+        while (args[idx] != NULL && idx < 126)   // keep args[127] as the NULL terminator
+        {
+            idx++;
+            args[idx] = strtok(NULL, delim);
+        }
+        vktrace_LogDebug("exec process=%s argc=%u\n", pInfo->exeName, idx);
+#if 0  //uncomment to print out the list of env vars
+        char *env = environ[0];
+        idx = 0;
+        while (env && strlen(env)) {
+            if (strstr(env, "VK") || strstr(env, "LD"))
+                vktrace_LogDebug("env[%d] = %s", idx++, env);
+            else
+                idx++;
+            env = environ[idx];
+        }
+#endif
+        if (execv(pInfo->exeName, args) < 0)
+        {
+            vktrace_LogError("Failed to spawn process.");
+            return FALSE;
+        }
+    }
+#endif
+
+    return TRUE;
+}
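A hedged sketch of how a launcher might populate the structure before calling this; the paths and arguments are assumptions, and the string helpers are assumed to come from the vktrace common headers pulled in via vktrace_process.h:

```c
#include <string.h>
#include "vktrace_process.h"

static BOOL spawn_example(void)
{
    vktrace_process_info info;
    memset(&info, 0, sizeof(info));

    /* All strings are heap copies so vktrace_process_info_delete can free them. */
    info.exeName            = vktrace_allocate_and_copy("/usr/bin/cube");
    info.processArgs        = vktrace_allocate_and_copy("--validate");
    info.fullProcessCmdLine = vktrace_copy_and_append(info.exeName, " ", info.processArgs);
    info.workingDirectory   = vktrace_allocate_and_copy(".");

    return vktrace_process_spawn(&info);
}
```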
+
+void vktrace_process_info_delete(vktrace_process_info* pInfo)
+{
+    if (pInfo->pCaptureThreads != NULL)
+    {
+        vktrace_platform_delete_thread(&(pInfo->pCaptureThreads[0].recordingThread));
+        VKTRACE_DELETE(pInfo->pCaptureThreads);
+    }
+
+#ifdef WIN32
+    vktrace_platform_delete_thread(&(pInfo->watchdogThread));
+#endif
+
+    VKTRACE_DELETE(pInfo->traceFilename);
+    VKTRACE_DELETE(pInfo->workingDirectory);
+    VKTRACE_DELETE(pInfo->processArgs);
+    VKTRACE_DELETE(pInfo->fullProcessCmdLine);
+    VKTRACE_DELETE(pInfo->exeName);
+
+    if (pInfo->pTraceFile != NULL)
+    {
+        fclose(pInfo->pTraceFile);
+    }
+    vktrace_delete_critical_section(&(pInfo->traceFileCriticalSection));
+}
diff --git a/vktrace/src/vktrace_common/vktrace_process.h b/vktrace/src/vktrace_common/vktrace_process.h
new file mode 100644
index 0000000..fb15715
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_process.h
@@ -0,0 +1,70 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ **************************************************************************/
+#pragma once
+
+#include "vktrace_platform.h"
+#include "vktrace_trace_packet_identifiers.h"
+
+typedef struct vktrace_process_capture_trace_thread_info vktrace_process_capture_trace_thread_info;
+
+typedef struct vktrace_process_info
+{
+    char* exeName;
+    char* processArgs;
+    char* fullProcessCmdLine;
+    char* workingDirectory;
+    char* traceFilename;
+    FILE* pTraceFile;
+
+    // vktrace's thread id
+    vktrace_thread_id parentThreadId;
+
+    VKTRACE_CRITICAL_SECTION traceFileCriticalSection;
+
+    volatile BOOL serverRequestsTermination;
+
+    vktrace_process_capture_trace_thread_info* pCaptureThreads;
+
+    // process id, handle, and main thread
+    vktrace_process_id processId;
+    vktrace_process_handle hProcess;
+    vktrace_thread hThread;
+    vktrace_thread watchdogThread;
+} vktrace_process_info;
+
+
+typedef struct vktrace_process_tracer_dll
+{
+    char * dllPath;
+    BOOL bLoaded;
+    VKTRACE_TRACER_ID tid;
+} vktrace_process_tracer_dll;
+
+struct vktrace_process_capture_trace_thread_info
+{
+    vktrace_thread recordingThread;
+    vktrace_process_info* pProcessInfo;
+    VKTRACE_TRACER_ID tracerId;
+};
+
+BOOL vktrace_process_spawn(vktrace_process_info* pInfo);
+void vktrace_process_info_delete(vktrace_process_info* pInfo);
diff --git a/vktrace/src/vktrace_common/vktrace_settings.c b/vktrace/src/vktrace_common/vktrace_settings.c
new file mode 100644
index 0000000..4476e17
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_settings.c
@@ -0,0 +1,780 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ **************************************************************************/
+
+#include "vktrace_settings.h"
+
+// ------------------------------------------------------------------------------------------------
+void vktrace_SettingInfo_print(const vktrace_SettingInfo* pSetting)
+{
+    if (pSetting->bPrintInHelp)
+    {
+        char * pStrParams;
+        char tmpStr[100];
+        if (pSetting->type == VKTRACE_SETTING_STRING)
+        {
+            pStrParams = "<string>";
+        } else if (pSetting->type == VKTRACE_SETTING_BOOL) {
+            pStrParams = "<BOOL>";
+        } else if (pSetting->type == VKTRACE_SETTING_UINT) {
+            pStrParams = "<uint>";
+        } else if (pSetting->type == VKTRACE_SETTING_INT) {
+            pStrParams = "<int>";
+        } else {
+            pStrParams = "< ??? >";
+        }
+#if defined(WIN32)
+        _snprintf_s(tmpStr, sizeof(tmpStr), _TRUNCATE, "-%s,--%s %s",
+                    pSetting->pShortName, pSetting->pLongName, pStrParams);
+#else
+        snprintf(tmpStr, sizeof(tmpStr), "-%s, --%s %s",
+                 pSetting->pShortName, pSetting->pLongName, pStrParams);
+#endif
+        printf("    %-33s  %s\n", tmpStr, pSetting->pDesc);
+    }
+}
+
+// ------------------------------------------------------------------------------------------------
+void vktrace_SettingGroup_print(const vktrace_SettingGroup* pSettingGroup)
+{
+    unsigned int i;
+    printf("%s available options:\n", pSettingGroup->pName);
+
+    for (i = 0; i < pSettingGroup->numSettings; i++)
+    {
+        vktrace_SettingInfo_print(&(pSettingGroup->pSettings[i]));
+    }
+}
+
+// ------------------------------------------------------------------------------------------------
+BOOL vktrace_SettingInfo_parse_value(vktrace_SettingInfo* pSetting, const char* arg)
+{
+    switch(pSetting->type)
+    {
+    case VKTRACE_SETTING_STRING:
+        {
+            vktrace_free(*pSetting->Data.ppChar);
+            *pSetting->Data.ppChar = vktrace_allocate_and_copy(arg);
+        }
+        break;
+    case VKTRACE_SETTING_BOOL:
+        {
+            BOOL bTrue = FALSE;
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+            bTrue = (strncasecmp(arg, "true", 4) == 0);
+#elif defined(PLATFORM_WINDOWS)
+            bTrue = (_strnicmp(arg, "true", 4) == 0);
+#endif
+            *pSetting->Data.pBool = bTrue;
+        }
+        break;
+    case VKTRACE_SETTING_UINT:
+        {
+            if (sscanf(arg, "%u", pSetting->Data.pUint) != 1)
+            {
+                vktrace_LogWarning("Invalid unsigned int setting: '%s'. Resetting to default value instead.", arg);
+                *(pSetting->Data.pUint) = *(pSetting->Default.pUint);
+            }
+        }
+        break;
+    case VKTRACE_SETTING_INT:
+        {
+            if (sscanf(arg, "%d", pSetting->Data.pInt) != 1)
+            {
+                vktrace_LogWarning("Invalid int setting: '%s'. Resetting to default value instead.", arg);
+                *(pSetting->Data.pInt) = *(pSetting->Default.pInt);
+            }
+        }
+        break;
+    default:
+        vktrace_LogError("Unhandled setting type (%d).", pSetting->type);
+        return FALSE;
+    }
+
+    return TRUE;
+}
+
+// ------------------------------------------------------------------------------------------------
+char* vktrace_SettingInfo_stringify_value(vktrace_SettingInfo* pSetting)
+{
+    switch(pSetting->type)
+    {
+    case VKTRACE_SETTING_STRING:
+        {
+            return vktrace_allocate_and_copy(*pSetting->Data.ppChar);
+        }
+        break;
+    case VKTRACE_SETTING_BOOL:
+        {
+            return (*pSetting->Data.pBool ? vktrace_allocate_and_copy("TRUE") : vktrace_allocate_and_copy("FALSE"));
+        }
+        break;
+    case VKTRACE_SETTING_UINT:
+        {
+            char value[100];
+            memset(value, 0, 100);
+            sprintf(value, "%u", *pSetting->Data.pUint);
+            return vktrace_allocate_and_copy(value);
+        }
+        break;
+    case VKTRACE_SETTING_INT:
+        {
+            char value[100];
+            memset(value, 0, 100);
+            sprintf(value, "%d", *pSetting->Data.pInt);
+            return vktrace_allocate_and_copy(value);
+        }
+        break;
+    default:
+        assert(!"Unhandled setting type");
+        break;
+    }
+    return vktrace_allocate_and_copy("<unhandled setting type>");
+}
+
+// ------------------------------------------------------------------------------------------------
+void vktrace_SettingGroup_reset_defaults(vktrace_SettingGroup* pSettingGroup)
+{
+    if (pSettingGroup != NULL)
+    {
+        unsigned int u;
+        for (u = 0; u < pSettingGroup->numSettings; u++)
+        {
+            vktrace_SettingInfo_reset_default(&pSettingGroup->pSettings[u]);
+        }
+    }
+}
+
+// ------------------------------------------------------------------------------------------------
+void vktrace_SettingInfo_reset_default(vktrace_SettingInfo* pSetting)
+{
+    assert(pSetting != NULL);
+    switch(pSetting->type)
+    {
+    case VKTRACE_SETTING_STRING:
+        if (*pSetting->Data.ppChar != NULL)
+        {
+            vktrace_free(*pSetting->Data.ppChar);
+        }
+
+        if (pSetting->Default.ppChar == NULL)
+        {
+            *pSetting->Data.ppChar = NULL;
+        }
+        else
+        {
+            *pSetting->Data.ppChar = vktrace_allocate_and_copy(*pSetting->Default.ppChar);
+        }
+        break;
+    case VKTRACE_SETTING_BOOL:
+        *pSetting->Data.pBool = *pSetting->Default.pBool;
+        break;
+    case VKTRACE_SETTING_UINT:
+        *pSetting->Data.pUint = *pSetting->Default.pUint;
+        break;
+    case VKTRACE_SETTING_INT:
+        *pSetting->Data.pInt = *pSetting->Default.pInt;
+        break;
+    default:
+        assert(!"Unhandled VKTRACE_SETTING_TYPE");
+        break;
+    }
+}
+
+// ------------------------------------------------------------------------------------------------
+void vktrace_SettingGroup_merge(vktrace_SettingGroup* pSrc, vktrace_SettingGroup** ppDestGroups, unsigned int* pNumDestGroups)
+{
+    unsigned int g;
+    vktrace_SettingGroup* pDestGroup = NULL;
+    assert(pSrc != NULL);
+    assert(ppDestGroups != NULL);
+    assert(pNumDestGroups != NULL);
+
+    for (g = 0; g < *pNumDestGroups; g++)
+    {
+        if (strcmp(pSrc->pName, (*ppDestGroups)[g].pName) == 0)
+        {
+            // group exists, store the pointer
+            pDestGroup = &(*ppDestGroups)[g];
+            break;
+        }
+    }
+
+    if (pDestGroup == NULL)
+    {
+        // need to replicate pSrc into ppDestGroups
+        pDestGroup = vktrace_SettingGroup_Create(vktrace_allocate_and_copy(pSrc->pName), ppDestGroups, pNumDestGroups);
+        assert(pDestGroup != NULL);
+    }
+
+    if (pDestGroup != NULL)
+    {
+        // now add all the settings!
+        unsigned int srcIndex;
+        for (srcIndex = 0; srcIndex < pSrc->numSettings; srcIndex++)
+        {
+            // search for pre-existing setting in the dest group
+            unsigned int destIndex;
+            BOOL bFound = FALSE;
+            for (destIndex = 0; destIndex < pDestGroup->numSettings; destIndex++)
+            {
+                if (strcmp(pDestGroup->pSettings[destIndex].pLongName, pSrc->pSettings[srcIndex].pLongName) == 0)
+                {
+                    bFound = TRUE;
+                    break;
+                }
+            }
+
+            if (bFound == FALSE)
+            {
+                vktrace_SettingGroup_Add_Info(&pSrc->pSettings[srcIndex], pDestGroup);
+            }
+        }
+    }
+}
+
+// ------------------------------------------------------------------------------------------------
+void vktrace_SettingGroup_Add_Info(vktrace_SettingInfo* pSrcInfo, vktrace_SettingGroup* pDestGroup)
+{
+    assert(pSrcInfo != NULL);
+    assert(pDestGroup != NULL);
+    if (pDestGroup != NULL)
+    {
+        // create a SettingInfo to store the copied information
+        vktrace_SettingInfo info;
+        vktrace_SettingInfo* pTmp;
+        memset(&info, 0, sizeof(vktrace_SettingInfo));
+
+        // copy necessary buffers so that deletion works correctly
+        info.pShortName = pSrcInfo->pShortName;
+        info.pLongName = vktrace_allocate_and_copy(pSrcInfo->pLongName);
+        info.type = VKTRACE_SETTING_STRING;
+        info.Data.ppChar = vktrace_malloc(sizeof(char**));
+        *info.Data.ppChar = vktrace_SettingInfo_stringify_value(pSrcInfo);
+
+        // add it to the current group
+        pTmp = pDestGroup->pSettings;
+        pDestGroup->numSettings += 1;
+        pDestGroup->pSettings = VKTRACE_NEW_ARRAY(vktrace_SettingInfo, pDestGroup->numSettings);
+        if (pDestGroup->pSettings == NULL)
+        {
+            // failed to allocate new info array
+            // restore original
+            pDestGroup->numSettings -= 1;
+            pDestGroup->pSettings = pTmp;
+        }
+        else
+        {
+            if (pTmp != NULL)
+            {
+                // numSettings was already incremented, so copy only the original entries
+                memcpy(pDestGroup->pSettings, pTmp, (pDestGroup->numSettings - 1) * sizeof(vktrace_SettingInfo));
+            }
+
+            pDestGroup->pSettings[pDestGroup->numSettings - 1] = info;
+        }
+    }
+}
+
+// ------------------------------------------------------------------------------------------------
+vktrace_SettingGroup* vktrace_SettingGroup_Create(const char* pGroupName, vktrace_SettingGroup** ppSettingGroups, unsigned int* pNumSettingGroups)
+{
+    vktrace_SettingGroup* pNewGroup = NULL;
+    vktrace_SettingGroup* pTmp = *ppSettingGroups;
+    unsigned int lastIndex = *pNumSettingGroups;
+
+    (*pNumSettingGroups) += 1;
+
+    *ppSettingGroups = VKTRACE_NEW_ARRAY(vktrace_SettingGroup, *pNumSettingGroups);
+    if (*ppSettingGroups == NULL)
+    {
+        // out of memory!
+        // Don't create the new group, and restore the list to its original state
+        (*pNumSettingGroups) -= 1;
+        *ppSettingGroups = pTmp;
+    }
+    else
+    {
+        // copy old settings to new ones
+        memcpy(*ppSettingGroups, pTmp, lastIndex * sizeof(vktrace_SettingGroup));
+
+        // clean up old array
+        VKTRACE_DELETE(pTmp);
+
+        // initialize new group
+        memset(&(*ppSettingGroups)[lastIndex], 0, sizeof(vktrace_SettingGroup));
+
+        // name the new group
+        pNewGroup = &(*ppSettingGroups)[lastIndex];
+        pNewGroup->pName = pGroupName;
+    }
+
+    return pNewGroup;
+}
+
+// ------------------------------------------------------------------------------------------------
+void vktrace_SettingGroup_update(vktrace_SettingGroup* pSrc, vktrace_SettingGroup* pDestGroups, unsigned int numDestGroups)
+{
+    unsigned int i;
+    vktrace_SettingGroup* pGroup;
+    for (i = 0; i < numDestGroups; i++)
+    {
+        pGroup = &pDestGroups[i];
+        if (strcmp(pSrc->pName, pGroup->pName) == 0)
+        {
+            vktrace_SettingGroup_Apply_Overrides(pGroup, pSrc, 1);
+            break;
+        }
+    }
+}
+
+// ------------------------------------------------------------------------------------------------
+int vktrace_SettingGroup_Load_from_file(FILE* pFile, vktrace_SettingGroup** ppSettingGroups, unsigned int* pNumSettingGroups)
+{
+    int retVal = 0;
+    char* line = VKTRACE_NEW_ARRAY(char, 1024);
+
+    assert(pFile != NULL);
+    assert(ppSettingGroups != NULL);
+    assert(pNumSettingGroups != NULL);
+    *pNumSettingGroups = 0;
+
+    if (line == NULL)
+    {
+        vktrace_LogError("Out of memory while reading settings file.");
+        retVal = -1;
+    }
+    else
+    {
+        vktrace_SettingGroup* pCurGroup = NULL;
+        while (feof(pFile) == 0 && ferror(pFile) == 0)
+        {
+            char* lineStart;
+            char* pOpenBracket;
+            char* pCloseBracket;
+            // Don't overwrite the buffer pointer: fgets returns NULL at EOF,
+            // which would otherwise leak the allocation freed below.
+            if (fgets(line, 1024, pFile) == NULL)
+            {
+                break;
+            }
+
+            // if line ends with a newline, then replace it with a NULL
+            if (line[strlen(line)-1] == '\n')
+            {
+                line[strlen(line)-1] = '\0';
+            }
+
+            // remove any leading whitespace
+            lineStart = line;
+            while (*lineStart == ' ') { ++lineStart; }
+
+            // skip empty lines
+            if (strlen(lineStart) == 0)
+            {
+                continue;
+            }
+
+            // if the line starts with "#" or "//", then consider it a comment and ignore it.
+            // if the first 'word' is only "-- " then the remainder of the line is for application arguments
+            // else first 'word' in line should be a long setting name and the rest of line is value for setting
+            if (lineStart[0] == '#' || (lineStart[0] == '/' && lineStart[1] == '/'))
+            {
+                // it's a comment; continue to the next loop iteration
+                continue;
+            }
+
+            pOpenBracket = strchr(lineStart, '[');
+            pCloseBracket = strchr(lineStart, ']');
+            if (pOpenBracket != NULL && pCloseBracket != NULL)
+            {
+                // a group was found!
+                unsigned int i;
+                char* pGroupName = vktrace_allocate_and_copy_n(pOpenBracket + 1,
+                                                           (int) (pCloseBracket - pOpenBracket - 1));
+
+                // Check to see if we already have this group
+                pCurGroup = NULL;
+                for (i = 0; i < *pNumSettingGroups; i++)
+                {
+                    if (strcmp((*ppSettingGroups)[i].pName, pGroupName) == 0)
+                    {
+                        // we already have this group!
+                        pCurGroup = &(*ppSettingGroups)[i];
+                        break;
+                    }
+                }
+
+                if (pCurGroup == NULL)
+                {
+                    // Need to grow our list of groups!
+                    pCurGroup = vktrace_SettingGroup_Create(pGroupName, ppSettingGroups, pNumSettingGroups);
+                }
+            }
+            else
+            {
+                char* pTokName = strtok(lineStart, "=");
+                char* pTokValue = strtok(NULL, "=");
+                if (pTokName != NULL && pTokValue != NULL)
+                {
+                    // A setting name and value were found!
+                    char* pValueStart = pTokValue;
+                    char* pTmpEndName = pTokName;
+
+                    assert(pCurGroup != NULL);
+                    if (pCurGroup != NULL)
+                    {
+                        // create a SettingInfo to store this information
+                        vktrace_SettingInfo info;
+                        vktrace_SettingInfo* pTmp;
+                        memset(&info, 0, sizeof(vktrace_SettingInfo));
+
+                        // trim trailing whitespace by turning it into a null char
+                        while (*pTmpEndName != '\0')
+                        {
+                            if (*pTmpEndName == ' ')
+                            {
+                                *pTmpEndName = '\0';
+                                break;
+                            }
+                            else
+                            {
+                                ++pTmpEndName;
+                            }
+                        }
+
+                        info.pLongName = vktrace_allocate_and_copy(pTokName);
+                        info.type = VKTRACE_SETTING_STRING;
+
+                        // remove leading whitespace from value
+                        while (*pValueStart == ' ') { ++pValueStart; }
+                        info.Data.ppChar = vktrace_malloc(sizeof(char**));
+                        *info.Data.ppChar = vktrace_allocate_and_copy(pValueStart);
+
+                        // add it to the current group
+                        pTmp = pCurGroup->pSettings;
+                        pCurGroup->numSettings += 1;
+                        pCurGroup->pSettings = VKTRACE_NEW_ARRAY(vktrace_SettingInfo, pCurGroup->numSettings);
+                        if (pCurGroup->pSettings == NULL)
+                        {
+                            // failed to allocate new info array
+                            // restore original
+                            pCurGroup->numSettings -= 1;
+                            pCurGroup->pSettings = pTmp;
+                        }
+                        else
+                        {
+                            if (pTmp != NULL)
+                            {
+                                // numSettings was already incremented, so copy only the original entries
+                                memcpy(pCurGroup->pSettings, pTmp, (pCurGroup->numSettings - 1) * sizeof(vktrace_SettingInfo));
+                            }
+
+                            pCurGroup->pSettings[pCurGroup->numSettings - 1] = info;
+                        }
+                    }
+                }
+                else
+                {
+                    vktrace_LogWarning("Could not parse a line in settings file: '%s'.", line);
+                }
+            }
+        }
+    }
+
+    VKTRACE_DELETE(line);
+
+    return retVal;
+}
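For reference, a minimal file that this parser accepts could look like the following; the group and setting names are illustrative:

```
# vktrace settings (comments start with '#' or '//')
[vktrace]
   program = /usr/bin/cube
   output_trace = cube.vktrace

[vkreplay]
   loop_count = 1
```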
+
+// ------------------------------------------------------------------------------------------------
+void vktrace_SettingGroup_Delete_Loaded(vktrace_SettingGroup** ppSettingGroups, unsigned int* pNumSettingGroups)
+{
+    unsigned int g;
+    unsigned int s;
+    assert(ppSettingGroups != NULL);
+    assert(*ppSettingGroups != NULL);
+    assert(pNumSettingGroups != NULL);
+
+    for (g = 0; g < *pNumSettingGroups; g++)
+    {
+        vktrace_SettingGroup* pGroup = &(*ppSettingGroups)[g];
+        vktrace_free((void*)pGroup->pName);
+        pGroup->pName = NULL;
+
+        for (s = 0; s < pGroup->numSettings; s++)
+        {
+            vktrace_free((void*)pGroup->pSettings[s].pLongName);
+            pGroup->pSettings[s].pLongName = NULL;
+            vktrace_free(*pGroup->pSettings[s].Data.ppChar);
+            vktrace_free(pGroup->pSettings[s].Data.ppChar);
+        }
+
+        VKTRACE_DELETE(pGroup->pSettings);
+        pGroup->pSettings = NULL;
+    }
+
+    VKTRACE_DELETE(*ppSettingGroups);
+    *ppSettingGroups = NULL;
+    *pNumSettingGroups = 0;
+}
+
+// ------------------------------------------------------------------------------------------------
+void vktrace_SettingGroup_Apply_Overrides(vktrace_SettingGroup* pSettingGroup, vktrace_SettingGroup* pOverrideGroups, unsigned int numOverrideGroups)
+{
+    unsigned int overrideGroupIndex;
+    assert(pSettingGroup != NULL);
+    assert(pOverrideGroups != NULL);
+
+    // only override matching group (based on name)
+    for (overrideGroupIndex = 0; overrideGroupIndex < numOverrideGroups; overrideGroupIndex++)
+    {
+        if (strcmp(pSettingGroup->pName, pOverrideGroups[overrideGroupIndex].pName) == 0)
+        {
+            unsigned int overrideSettingIndex;
+            vktrace_SettingGroup* pOverride = &pOverrideGroups[overrideGroupIndex];
+
+            for (overrideSettingIndex = 0; overrideSettingIndex < pOverride->numSettings; overrideSettingIndex++)
+            {
+                unsigned int baseSettingIndex;
+                vktrace_SettingInfo* pOverrideSetting = &pOverride->pSettings[overrideSettingIndex];
+
+                // override matching settings based on long name
+                for (baseSettingIndex = 0; baseSettingIndex < pSettingGroup->numSettings; baseSettingIndex++)
+                {
+                    if (strcmp(pSettingGroup->pSettings[baseSettingIndex].pLongName, pOverrideSetting->pLongName) == 0)
+                    {
+                        char* pTmp = vktrace_SettingInfo_stringify_value(pOverrideSetting);
+                        if (vktrace_SettingInfo_parse_value(&pSettingGroup->pSettings[baseSettingIndex], pTmp) == FALSE)
+                        {
+                            vktrace_LogWarning("Failed to parse override value.");
+                        }
+                        vktrace_free(pTmp);
+                        break;
+                    }
+                }
+            }
+            break;
+        }
+    }
+}
+
+//-----------------------------------------------------------------------------
+BOOL vktrace_SettingGroup_save(vktrace_SettingGroup* pSettingGroup, unsigned int numSettingGroups, FILE* pSettingsFile)
+{
+    BOOL retVal = TRUE;
+
+    if (pSettingGroup == NULL)
+    {
+        vktrace_LogError("Cannot save a null group of settings.");
+        retVal = FALSE;
+    }
+
+    if (pSettingsFile == NULL)
+    {
+        vktrace_LogError("Cannot save an unnamed settings file.");
+        retVal = FALSE;
+    }
+
+    if (retVal == TRUE)
+    {
+        unsigned int g;
+        unsigned int index;
+
+        for (g = 0; g < numSettingGroups; g++)
+        {
+            // group name
+            fputs("[", pSettingsFile);
+            fputs(pSettingGroup[g].pName, pSettingsFile);
+            fputs("]\n", pSettingsFile);
+
+            // settings
+            for (index = 0; index < pSettingGroup[g].numSettings; index++)
+            {
+                char* value = NULL;
+                fputs("   ", pSettingsFile);
+                fputs(pSettingGroup[g].pSettings[index].pLongName, pSettingsFile);
+                fputs(" = ", pSettingsFile);
+                value = vktrace_SettingInfo_stringify_value(&pSettingGroup[g].pSettings[index]);
+                if (value != NULL)
+                {
+                    fputs(value, pSettingsFile);
+                    vktrace_free(value);
+                }
+                else
+                {
+                    fputs("", pSettingsFile);
+                }
+                fputs("\n", pSettingsFile);
+            }
+
+            fputs("\n", pSettingsFile);
+        }
+    }
+
+    return retVal;
+}
+
+//-----------------------------------------------------------------------------
+int vktrace_SettingGroup_init_from_cmdline(vktrace_SettingGroup* pSettingGroup, int argc, char* argv[], char** ppOut_remaining_args)
+{
+    int i = 0;
+
+    if (pSettingGroup != NULL)
+    {
+        vktrace_SettingInfo* pSettings = pSettingGroup->pSettings;
+        unsigned int num_settings = pSettingGroup->numSettings;
+
+        // update settings based on command line options
+        for (i = 1; i < argc; )
+        {
+            unsigned int settingIndex;
+            int consumed = 0;
+            char* curArg = argv[i];
+
+            // if the arg is only "--" then all following args are for the application;
+            // if the arg starts with "-" then it is referring to a short name;
+            // if the arg starts with "--" then it is referring to a long name.
+            if (strcmp("--", curArg) == 0 && ppOut_remaining_args != NULL)
+            {
+                // all remaining args are for the application
+
+                // increment past the current arg
+                i += 1;
+                consumed++;
+                for (; i < argc; i++)
+                {
+                    if (*ppOut_remaining_args == NULL || strlen(*ppOut_remaining_args) == 0)
+                    {
+                        *ppOut_remaining_args = vktrace_allocate_and_copy(argv[i]);
+                    }
+                    else
+                    {
+                        *ppOut_remaining_args = vktrace_copy_and_append(*ppOut_remaining_args, " ", argv[i]);
+                    }
+                    consumed++;
+                }
+            }
+            else
+            {
+                for (settingIndex = 0; settingIndex < num_settings; settingIndex++)
+                {
+                    const char* pSettingName = NULL;
+                    curArg = argv[i];
+                    if (strncmp("--", curArg, 2) == 0)
+                    {
+                        // long option name
+                        pSettingName = pSettings[settingIndex].pLongName;
+                        curArg += 2;
+                    }
+                    else if (strncmp("-", curArg, 1) == 0)
+                    {
+                        // short option name
+                        pSettingName = pSettings[settingIndex].pShortName;
+                        curArg += 1;
+                    }
+
+                    if (pSettingName != NULL && strcmp(curArg, pSettingName) == 0)
+                    {
+                        if (i+1 < argc &&
+                            vktrace_SettingInfo_parse_value(&pSettings[settingIndex], argv[i+1]))
+                        {
+                            consumed += 2;
+                        }
+                        break;
+                    }
+                }
+            }
+
+            if (consumed == 0)
+            {
+                vktrace_SettingGroup_print(pSettingGroup);
+                vktrace_SettingGroup_delete(pSettingGroup);
+                return -1;
+            }
+
+            i += consumed;
+        }
+    }
+
+    return 0;
+}
+
+// ------------------------------------------------------------------------------------------------
+int vktrace_SettingGroup_init(vktrace_SettingGroup* pSettingGroup, FILE* pSettingsFile, int argc, char* argv[], const char** ppOut_remaining_args)
+{
+    if (pSettingGroup == NULL)
+    {
+        assert(!"No need to call vktrace_SettingGroup_init if the application has no settings");
+        return 0;
+    }
+
+    if (argc == 2 && (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-h") == 0))
+    {
+        vktrace_SettingGroup_print(pSettingGroup);
+        return -1;
+    }
+
+    // Initially, set all options to their defaults
+    vktrace_SettingGroup_reset_defaults(pSettingGroup);
+
+    // Secondly set options based on settings file
+    if (pSettingsFile != NULL)
+    {
+        vktrace_SettingGroup* pGroups = NULL;
+        unsigned int numGroups = 0;
+        if (vktrace_SettingGroup_Load_from_file(pSettingsFile, &pGroups, &numGroups) == -1)
+        {
+            vktrace_SettingGroup_print(pSettingGroup);
+            return -1;
+        }
+
+        vktrace_SettingGroup_Apply_Overrides(pSettingGroup, pGroups, numGroups);
+
+        vktrace_SettingGroup_Delete_Loaded(&pGroups, &numGroups);
+    }
+
+    // Thirdly set options based on cmd line args
+    if (vktrace_SettingGroup_init_from_cmdline(pSettingGroup, argc, argv, (char **)ppOut_remaining_args) == -1)
+    {
+        return -1;
+    }
+
+    return 0;
+}
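Putting the pieces together, an application might declare its settings and run the three-stage initialization (defaults, then settings file, then command line) like this; all names and defaults below are illustrative:

```c
#include "vktrace_settings.h"

static char* g_output         = NULL;
static char* g_outputDefault  = NULL;
static BOOL  g_verbose        = FALSE;
static BOOL  g_verboseDefault = FALSE;

static vktrace_SettingInfo g_infos[] = {
    { "o", "output",  VKTRACE_SETTING_STRING,
      { .ppChar = &g_output },  { .ppChar = &g_outputDefault },  TRUE, "Output trace file." },
    { "v", "verbose", VKTRACE_SETTING_BOOL,
      { .pBool = &g_verbose },  { .pBool = &g_verboseDefault },  TRUE, "Enable verbose logging." },
};

static vktrace_SettingGroup g_group = { "example", 2, g_infos };

int main(int argc, char* argv[])
{
    const char* remaining = NULL;
    /* No settings file in this sketch, so pass NULL for pSettingsFile. */
    if (vktrace_SettingGroup_init(&g_group, NULL, argc, argv, &remaining) == -1)
        return 1; /* help was printed or an argument failed to parse */
    /* ... use g_output / g_verbose ... */
    return 0;
}
```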
+
+// ------------------------------------------------------------------------------------------------
+void vktrace_SettingGroup_delete(vktrace_SettingGroup* pSettingGroup)
+{
+    if (pSettingGroup != NULL)
+    {
+        unsigned int i;
+
+        // need to delete all strings
+        for (i = 0; i < pSettingGroup->numSettings; i++)
+        {
+            if (pSettingGroup->pSettings[i].type == VKTRACE_SETTING_STRING)
+            {
+                if (*(pSettingGroup->pSettings[i].Data.ppChar) != NULL)
+                {
+                    vktrace_free(*pSettingGroup->pSettings[i].Data.ppChar);
+                    *pSettingGroup->pSettings[i].Data.ppChar = NULL;
+                }
+            }
+        }
+    }
+}
diff --git a/vktrace/src/vktrace_common/vktrace_settings.h b/vktrace/src/vktrace_common/vktrace_settings.h
new file mode 100644
index 0000000..7b7a719
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_settings.h
@@ -0,0 +1,99 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ **************************************************************************/
+#pragma once
+
+#include "vktrace_common.h"
+
+typedef enum VKTRACE_SETTING_TYPE
+{
+    VKTRACE_SETTING_STRING,
+    VKTRACE_SETTING_BOOL,
+    VKTRACE_SETTING_UINT,
+    VKTRACE_SETTING_INT
+} VKTRACE_SETTING_TYPE;
+
+// ------------------------------------------------------------------------------------------------
+typedef struct vktrace_SettingInfo
+{
+    const char* pShortName;
+    const char* pLongName;
+    VKTRACE_SETTING_TYPE type;
+    union Data
+    {
+        void* pVoid;
+        char** ppChar;
+        BOOL* pBool;
+        unsigned int* pUint;
+        int* pInt;
+    } Data;
+    union Default
+    {
+        void* pVoid;
+        char** ppChar;
+        BOOL* pBool;
+        unsigned int* pUint;
+        int* pInt;
+    } Default;
+    BOOL bPrintInHelp;
+    const char* pDesc;
+} vktrace_SettingInfo;
+
+typedef struct vktrace_SettingGroup
+{
+    const char* pName;
+    unsigned int numSettings;
+    vktrace_SettingInfo* pSettings;
+} vktrace_SettingGroup;
+
+int vktrace_SettingGroup_init(vktrace_SettingGroup* pSettingGroup, FILE *pSettingsFile, int argc, char* argv[], const char** ppOut_remaining_args);
+BOOL vktrace_SettingGroup_save(vktrace_SettingGroup* pSettingGroup, unsigned int numSettingGroups, FILE* pSettingsFile);
+void vktrace_SettingGroup_delete(vktrace_SettingGroup* pSettingGroup);
+void vktrace_SettingGroup_reset_defaults(vktrace_SettingGroup* pSettingGroup);
+
+// Adds pSrc group to ppDestGroups if the named group is not already there,
+// or adds missing settings from pSrc into the existing group in ppDestGroups.
+// pNumDestGroups is updated if pSrc is added to ppDestGroups.
+void vktrace_SettingGroup_merge(vktrace_SettingGroup* pSrc, vktrace_SettingGroup** ppDestGroups, unsigned int* pNumDestGroups);
+
+// Updates DestGroups with values from Src
+void vktrace_SettingGroup_update(vktrace_SettingGroup* pSrc, vktrace_SettingGroup* pDestGroups, unsigned int numDestGroups);
+
+// Creates a new named group at the end of the ppSettingGroups array, and updates pNumSettingGroups.
+vktrace_SettingGroup* vktrace_SettingGroup_Create(const char* pGroupName, vktrace_SettingGroup** ppSettingGroups, unsigned int* pNumSettingGroups);
+
+// Adds a STRING settingInfo to pDestGroup which holds a copy of pSrcInfo, but with a stringified value.
+// Storing the value as a string lets the destination group own (and later free)
+// its copy without needing to know the original setting's type.
+void vktrace_SettingGroup_Add_Info(vktrace_SettingInfo* pSrcInfo, vktrace_SettingGroup* pDestGroup);
+
+int vktrace_SettingGroup_Load_from_file(FILE* pFile, vktrace_SettingGroup** ppSettingGroups, unsigned int* pNumSettingGroups);
+void vktrace_SettingGroup_Delete_Loaded(vktrace_SettingGroup** ppSettingGroups, unsigned int* pNumSettingGroups);
+
+void vktrace_SettingGroup_Apply_Overrides(vktrace_SettingGroup* pSettingGroup, vktrace_SettingGroup* pOverrideGroups, unsigned int numOverrideGroups);
+
+int vktrace_SettingGroup_init_from_cmdline(vktrace_SettingGroup* pSettingGroup, int argc, char* argv[], char** ppOut_remaining_args);
+
+void vktrace_SettingGroup_print(const vktrace_SettingGroup* pSettingGroup);
+void vktrace_SettingInfo_print(const vktrace_SettingInfo* pSetting);
+
+char* vktrace_SettingInfo_stringify_value(vktrace_SettingInfo* pSetting);
+BOOL vktrace_SettingInfo_parse_value(vktrace_SettingInfo* pSetting, const char* arg);
+void vktrace_SettingInfo_reset_default(vktrace_SettingInfo* pSetting);
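+
+// A minimal usage sketch (illustrative only; the tool name, variables, and the
+// setting shown here are hypothetical):
+//
+//     static BOOL g_verbose = FALSE;
+//     static BOOL g_verboseDefault = FALSE;
+//     static vktrace_SettingInfo s_settings[] = {
+//         { "v", "verbose", VKTRACE_SETTING_BOOL,
+//           { &g_verbose }, { &g_verboseDefault },
+//           TRUE, "Print verbose messages." },
+//     };
+//     static vktrace_SettingGroup s_group = { "my_tool", 1, s_settings };
+//
+//     // Defaults are applied first, then the settings file, then the
+//     // command line (see vktrace_SettingGroup_init above).
+//     if (vktrace_SettingGroup_init(&s_group, NULL, argc, argv, NULL) != 0)
+//         return 1;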
diff --git a/vktrace/src/vktrace_common/vktrace_trace_packet_identifiers.h b/vktrace/src/vktrace_common/vktrace_trace_packet_identifiers.h
new file mode 100644
index 0000000..c057d1b
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_trace_packet_identifiers.h
@@ -0,0 +1,166 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ **************************************************************************/
+#pragma once
+
+#include "vktrace_common.h"
+
+#define VKTRACE_TRACE_FILE_VERSION_2 0x0002
+#define VKTRACE_TRACE_FILE_VERSION_3 0x0003
+#define VKTRACE_TRACE_FILE_VERSION_4 0x0004
+#define VKTRACE_TRACE_FILE_VERSION_5 0x0005
+#define VKTRACE_TRACE_FILE_VERSION VKTRACE_TRACE_FILE_VERSION_5
+#define VKTRACE_TRACE_FILE_VERSION_MINIMUM_COMPATIBLE VKTRACE_TRACE_FILE_VERSION_5
+
+#define VKTRACE_MAX_TRACER_ID_ARRAY_SIZE 14
+
+typedef enum VKTRACE_TRACER_ID
+{
+    VKTRACE_TID_RESERVED = 0,
+    VKTRACE_TID_GL_FPS,
+    VKTRACE_TID_VULKAN
+    // Max enum must be less than VKTRACE_MAX_TRACER_ID_ARRAY_SIZE
+} VKTRACE_TRACER_ID;
+
+typedef struct VKTRACE_TRACER_REPLAYER_INFO
+{
+    VKTRACE_TRACER_ID tracerId;
+    BOOL needsReplayer;
+    const char* const replayerLibraryName;
+    const char* const debuggerLibraryname;
+} VKTRACE_TRACER_REPLAYER_INFO;
+
+// The index here should match the value of the VKTRACE_TRACER_ID
+static const VKTRACE_TRACER_REPLAYER_INFO gs_tracerReplayerInfo[VKTRACE_MAX_TRACER_ID_ARRAY_SIZE] = {
+    {VKTRACE_TID_RESERVED, FALSE, "", ""},
+    {VKTRACE_TID_GL_FPS, FALSE, "", ""},
+    {VKTRACE_TID_VULKAN, TRUE, VKTRACE_LIBRARY_NAME(vulkan_replay), VKTRACE_LIBRARY_NAME(vktraceviewer_vk)},
+    {VKTRACE_TID_RESERVED, FALSE, "", ""}, // this can be updated as new tracers are added
+    {VKTRACE_TID_RESERVED, FALSE, "", ""}, // this can be updated as new tracers are added
+    {VKTRACE_TID_RESERVED, FALSE, "", ""}, // this can be updated as new tracers are added
+    {VKTRACE_TID_RESERVED, FALSE, "", ""}, // this can be updated as new tracers are added
+    {VKTRACE_TID_RESERVED, FALSE, "", ""}, // this can be updated as new tracers are added
+    {VKTRACE_TID_RESERVED, FALSE, "", ""}, // this can be updated as new tracers are added
+    {VKTRACE_TID_RESERVED, FALSE, "", ""}, // this can be updated as new tracers are added
+    {VKTRACE_TID_RESERVED, FALSE, "", ""}, // this can be updated as new tracers are added
+    {VKTRACE_TID_RESERVED, FALSE, "", ""}, // this can be updated as new tracers are added
+    {VKTRACE_TID_RESERVED, FALSE, "", ""}, // this can be updated as new tracers are added
+    {VKTRACE_TID_RESERVED, FALSE, "", ""}, // this can be updated as new tracers are added
+};
+
+typedef enum _VKTRACE_TRACE_PACKET_ID
+{
+    VKTRACE_TPI_MESSAGE,
+    VKTRACE_TPI_MARKER_CHECKPOINT,
+    VKTRACE_TPI_MARKER_API_BOUNDARY,
+    VKTRACE_TPI_MARKER_API_GROUP_BEGIN,
+    VKTRACE_TPI_MARKER_API_GROUP_END,
+    VKTRACE_TPI_MARKER_TERMINATE_PROCESS,
+    VKTRACE_TPI_BEGIN_API_HERE // this enum should always be the last in the list. Feel free to insert new ID above this one.
+} VKTRACE_TRACE_PACKET_ID;
+
+typedef struct {
+    uint8_t id;
+    uint8_t is_64_bit;
+} vktrace_tracer_info;
+
+#if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__) ) || defined(_M_X64) || defined(__ia64) || defined (_M_IA64) || defined(__aarch64__) || defined(__powerpc64__)
+
+typedef struct {
+    uint16_t trace_file_version;
+    uint32_t uuid[4];
+    uint64_t first_packet_offset;   // will be size of header including size of tracer_id_array and state_snapshot_path/binary
+    uint8_t tracer_count;           // number of tracers referenced in this trace file
+    vktrace_tracer_info tracer_id_array[VKTRACE_MAX_TRACER_ID_ARRAY_SIZE]; // array of tracer_ids and values which are referenced in the trace file
+    uint64_t trace_start_time;
+} vktrace_trace_file_header;
+
+typedef struct {
+    uint64_t size; // total size, including extra data, needed to get to the next packet_header
+    uint64_t global_packet_index;
+    uint8_t tracer_id; // TODO: need to uniquely identify tracers in a way that is known by the replayer
+    uint16_t packet_id; // VKTRACE_TRACE_PACKET_ID (or one of the api-specific IDs)
+    uint32_t thread_id;
+    uint64_t vktrace_begin_time; // start of measuring vktrace's overhead related to this packet
+    uint64_t entrypoint_begin_time;
+    uint64_t entrypoint_end_time;
+    uint64_t vktrace_end_time; // end of measuring vktrace's overhead related to this packet
+    uint64_t next_buffers_offset; // used for tracking the addition of buffers to the trace packet
+    uintptr_t pBody; // points to the body of the packet
+} vktrace_trace_packet_header;
+
+#else
+
+// vktrace_trace_file_header and vktrace_trace_packet_header are written and
+// read by both host and target CPU, which can be different architectures.
+// Specifically for Android, x86 and arm32 align 64-bit integers differently,
+// so we must require 8 byte alignment.
+// These changes are guarded by a 64-bit check above for now.
+
+#if defined(_WIN32) || defined(_WIN64)
+#define ALIGN8 __declspec(align(8))
+#else
+#define ALIGN8 __attribute__((aligned(8)))
+#endif
+
+typedef struct {
+    uint16_t trace_file_version;
+    uint32_t uuid[4];
+    ALIGN8 uint64_t first_packet_offset;   // will be size of header including size of tracer_id_array and state_snapshot_path/binary
+    uint8_t tracer_count;           // number of tracers referenced in this trace file
+    vktrace_tracer_info tracer_id_array[VKTRACE_MAX_TRACER_ID_ARRAY_SIZE]; // array of tracer_ids and values which are referenced in the trace file
+    ALIGN8 uint64_t trace_start_time;
+} vktrace_trace_file_header;
+
+typedef struct {
+    ALIGN8 uint64_t size; // total size, including extra data, needed to get to the next packet_header
+    ALIGN8 uint64_t global_packet_index;
+    uint8_t tracer_id; // TODO: need to uniquely identify tracers in a way that is known by the replayer
+    uint16_t packet_id; // VKTRACE_TRACE_PACKET_ID (or one of the api-specific IDs)
+    uint32_t thread_id;
+    ALIGN8 uint64_t vktrace_begin_time; // start of measuring vktrace's overhead related to this packet
+    ALIGN8 uint64_t entrypoint_begin_time;
+    ALIGN8 uint64_t entrypoint_end_time;
+    ALIGN8 uint64_t vktrace_end_time; // end of measuring vktrace's overhead related to this packet
+    ALIGN8 uint64_t next_buffers_offset; // used for tracking the addition of buffers to the trace packet
+    uintptr_t pBody; // points to the body of the packet
+} vktrace_trace_packet_header;
+
+#endif // 64-bit
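+
+// Informative sketch of the on-disk layout these structs imply (derived from
+// the field comments above, not a separate specification):
+//
+//     vktrace_trace_file_header    <- first_packet_offset counts this header,
+//                                     including tracer_id_array
+//     vktrace_trace_packet_header  <- packet 0; header->size spans the header,
+//     <body + appended buffers>       the body, and all appended buffers
+//     vktrace_trace_packet_header  <- packet 1, at offset (packet 0) + size
+//     ...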
+
+typedef struct {
+    vktrace_trace_packet_header* pHeader;
+    VktraceLogLevel type;
+    uint32_t length;
+    char* message;
+} vktrace_trace_packet_message;
+
+typedef struct {
+    vktrace_trace_packet_header* pHeader;
+    unsigned int length;
+    char* label;
+} vktrace_trace_packet_marker_checkpoint;
+
+typedef vktrace_trace_packet_marker_checkpoint vktrace_trace_packet_marker_api_boundary;
+typedef vktrace_trace_packet_marker_checkpoint vktrace_trace_packet_marker_api_group_begin;
+typedef vktrace_trace_packet_marker_checkpoint vktrace_trace_packet_marker_api_group_end;
+
+typedef VKTRACE_TRACER_ID (VKTRACER_CDECL *funcptr_VKTRACE_GetTracerId)();
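+
+// A sketch of how a replayer might consult the table above (illustrative;
+// pHeader stands for any vktrace_trace_packet_header read from a trace):
+//
+//     VKTRACE_TRACER_ID id = (VKTRACE_TRACER_ID)pHeader->tracer_id;
+//     if (id < VKTRACE_MAX_TRACER_ID_ARRAY_SIZE &&
+//         gs_tracerReplayerInfo[id].needsReplayer)
+//     {
+//         const char* lib = gs_tracerReplayerInfo[id].replayerLibraryName;
+//         // load 'lib' and hand packets with this tracer_id to it
+//     }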
diff --git a/vktrace/src/vktrace_common/vktrace_trace_packet_utils.c b/vktrace/src/vktrace_common/vktrace_trace_packet_utils.c
new file mode 100644
index 0000000..53d2b0f
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_trace_packet_utils.c
@@ -0,0 +1,300 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ **************************************************************************/
+#include "vktrace_trace_packet_utils.h"
+#include "vktrace_interconnect.h"
+#include "vktrace_filelike.h"
+#include "vktrace_pageguard_memorycopy.h"
+
+#ifdef WIN32
+#include <rpc.h>
+#pragma comment (lib, "Rpcrt4.lib")
+#endif
+
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+#include <fcntl.h>
+#include <time.h>
+#endif
+
+#if defined(PLATFORM_OSX)
+#include <mach/clock.h>
+#include <mach/mach.h>
+#endif
+
+#include "vktrace_pageguard_memorycopy.h"
+
+static uint64_t g_packet_index = 0;
+
+void vktrace_gen_uuid(uint32_t* pUuid)
+{
+    uint32_t buf[] = { 0xABCDEF, 0x12345678, 0xFFFECABC, 0xABCDDEF0 };
+    vktrace_platform_rand_s(buf, sizeof(buf)/sizeof(uint32_t));
+
+    pUuid[0] = buf[0];
+    pUuid[1] = buf[1];
+    pUuid[2] = buf[2];
+    pUuid[3] = buf[3];
+}
+
+#if defined(PLATFORM_LINUX)
+uint64_t vktrace_get_time()
+{
+    struct timespec time;
+    clock_gettime(CLOCK_MONOTONIC, &time);
+    return ((uint64_t)time.tv_sec * 1000000000) + time.tv_nsec;
+}
+#elif defined(PLATFORM_OSX)
+uint64_t vktrace_get_time()
+{
+    clock_serv_t cclock;
+    mach_timespec_t mts;
+    host_get_clock_service(mach_host_self(), CALENDAR_CLOCK, &cclock);
+    clock_get_time(cclock, &mts);
+    mach_port_deallocate(mach_task_self(), cclock);
+
+    return ((uint64_t)mts.tv_sec * 1000000000) + mts.tv_nsec;
+}
+#elif defined(PLATFORM_WINDOWS)
+uint64_t vktrace_get_time()
+{
+    // Should really avoid using RDTSC here since for RDTSC to be
+    // accurate, the process needs to stay on the same CPU and the CPU
+    // needs to stay at the same clock rate, which isn't always the case
+    // with today's power managed CPUs.
+    // But if all that is OK, the following one-liner could be used instead
+    // of the rest of this function.
+    //
+    // return __rdtsc();
+    //
+    LARGE_INTEGER count;
+    static LARGE_INTEGER start, freq;
+    if (0 == start.QuadPart) {
+        QueryPerformanceFrequency(&freq);
+        QueryPerformanceCounter(&start);
+    }
+    QueryPerformanceCounter(&count);
+    // Using a relative (from start) count here postpones overflow as we convert to ns.
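+    // (Assuming, for illustration, a 10 MHz QueryPerformanceFrequency: the
+    // product delta * 1e9 overflows a signed 64-bit value once delta exceeds
+    // about 2^63 / 1e9 = 9.2e9 ticks, i.e. roughly 15 minutes after 'start',
+    // whereas an absolute count would already include time since boot.)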
+    return (uint64_t)(((count.QuadPart - start.QuadPart) * 1000000000) / freq.QuadPart);
+}
+#else
+uint64_t vktrace_get_time()
+{
+    return 0;
+}
+#endif
+
+//=============================================================================
+// trace file header
+
+vktrace_trace_file_header* vktrace_create_trace_file_header()
+{
+    vktrace_trace_file_header* pHeader = VKTRACE_NEW(vktrace_trace_file_header);
+    memset(pHeader, 0, sizeof(vktrace_trace_file_header));
+    pHeader->trace_file_version = VKTRACE_TRACE_FILE_VERSION;
+    vktrace_gen_uuid(pHeader->uuid);
+    pHeader->trace_start_time = vktrace_get_time();
+
+    return pHeader;
+}
+
+void vktrace_delete_trace_file_header(vktrace_trace_file_header** ppHeader)
+{
+    vktrace_free(*ppHeader);
+    *ppHeader = NULL;
+}
+
+//=============================================================================
+// Methods for creating, populating, and writing trace packets
+
+vktrace_trace_packet_header* vktrace_create_trace_packet(uint8_t tracer_id, uint16_t packet_id, uint64_t packet_size, uint64_t additional_buffers_size)
+{
+    // Always allocate at least enough space for the packet header
+    uint64_t total_packet_size = sizeof(vktrace_trace_packet_header) + packet_size + additional_buffers_size;
+    void* pMemory = vktrace_malloc((size_t)total_packet_size);
+    memset(pMemory, 0, (size_t)total_packet_size);
+
+    vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)pMemory;
+    pHeader->size = total_packet_size;
+    pHeader->global_packet_index = g_packet_index++;
+    pHeader->tracer_id = tracer_id;
+    pHeader->thread_id = vktrace_platform_get_thread_id();
+    pHeader->packet_id = packet_id;
+    if (pHeader->vktrace_begin_time == 0)
+        pHeader->vktrace_begin_time = vktrace_get_time();
+    pHeader->entrypoint_begin_time = pHeader->vktrace_begin_time;
+    pHeader->entrypoint_end_time = 0;
+    pHeader->vktrace_end_time = 0;
+    pHeader->next_buffers_offset = sizeof(vktrace_trace_packet_header) + packet_size; // initial offset is from start of header to after the packet body
+    if (total_packet_size > sizeof(vktrace_trace_packet_header))
+    {
+        pHeader->pBody = (uintptr_t)(((char*)pMemory) + sizeof(vktrace_trace_packet_header));
+    }
+    return pHeader;
+}
+
+void vktrace_delete_trace_packet(vktrace_trace_packet_header** ppHeader)
+{
+    if (ppHeader == NULL)
+        return;
+    if (*ppHeader == NULL)
+        return;
+
+    VKTRACE_DELETE(*ppHeader);
+    *ppHeader = NULL;
+}
+
+void* vktrace_trace_packet_get_new_buffer_address(vktrace_trace_packet_header* pHeader, uint64_t byteCount)
+{
+    void* pBufferStart;
+    assert(byteCount > 0);
+    assert((byteCount&0x3) == 0);  // All buffer sizes should be multiple of 4 so structs in packet are kept aligned
+    assert(pHeader->size >= pHeader->next_buffers_offset + byteCount);
+    if (pHeader->size < pHeader->next_buffers_offset + byteCount || byteCount == 0)
+    {
+        // not enough memory left in packet to hold buffer
+        // or request is for 0 bytes
+        return NULL;
+    }
+
+    pBufferStart = (void*)((char*)pHeader + pHeader->next_buffers_offset);
+    pHeader->next_buffers_offset += byteCount;
+    return pBufferStart;
+}
+
+void vktrace_add_buffer_to_trace_packet(vktrace_trace_packet_header* pHeader, void** ptr_address, uint64_t size, const void* pBuffer)
+{
+
+    // Make sure we have valid pointers and sizes. All pointers and sizes must be 4 byte aligned.
+    assert(ptr_address != NULL);
+    assert((size&0x3) == 0);
+
+    if (pBuffer == NULL || size == 0)
+    {
+        *ptr_address = NULL;
+    }
+    else
+    {
+        // set ptr to the location of the added buffer
+        *ptr_address = vktrace_trace_packet_get_new_buffer_address(pHeader, size);
+
+        // address of buffer in packet adding must be 4 byte aligned
+        assert(((uint64_t)*ptr_address&0x3) == 0);
+
+        // copy buffer to the location
+#ifdef WIN32
+        vktrace_pageguard_memcpy(*ptr_address, pBuffer, (size_t)size);
+#else
+        memcpy(*ptr_address, pBuffer, (size_t)size);
+#endif
+    }
+}
+
+void vktrace_finalize_buffer_address(vktrace_trace_packet_header* pHeader, void** ptr_address)
+{
+    assert(ptr_address != NULL);
+
+    if (*ptr_address != NULL)
+    {
+        // turn ptr into an offset from the packet body
+        uint64_t offset = (uint64_t)*ptr_address - (uint64_t) (pHeader->pBody);
+        *ptr_address = (void*)offset;
+    }
+}
+
+void vktrace_set_packet_entrypoint_end_time(vktrace_trace_packet_header* pHeader)
+{
+    pHeader->entrypoint_end_time = vktrace_get_time();
+}
+
+void vktrace_finalize_trace_packet(vktrace_trace_packet_header* pHeader)
+{
+    if (pHeader->entrypoint_end_time == 0)
+    {
+        vktrace_set_packet_entrypoint_end_time(pHeader);
+    }
+    pHeader->vktrace_end_time = vktrace_get_time();
+}
+
+void vktrace_write_trace_packet(const vktrace_trace_packet_header* pHeader, FileLike* pFile)
+{
+    static int errorCount = 0;
+    BOOL res = vktrace_FileLike_WriteRaw(pFile, pHeader, (size_t)pHeader->size);
+    if (!res && pHeader->packet_id != VKTRACE_TPI_MARKER_TERMINATE_PROCESS && errorCount < 10)
+    {
+        errorCount++;
+        vktrace_LogError("Failed to send trace packet index %u packetId %u size %u.", pHeader->global_packet_index, pHeader->packet_id, pHeader->size);
+    }
+}
+
+//=============================================================================
+// Methods for reading and interpreting trace packets
+
+vktrace_trace_packet_header* vktrace_read_trace_packet(FileLike* pFile)
+{
+    // read size
+    // allocate space
+    // offset to after size
+    // read the rest of the packet
+    uint64_t total_packet_size = 0;
+    void* pMemory;
+    vktrace_trace_packet_header* pHeader;
+
+    if (vktrace_FileLike_ReadRaw(pFile, &total_packet_size, sizeof(uint64_t)) == FALSE)
+    {
+        return NULL;
+    }
+
+    // allocate space
+    pMemory = vktrace_malloc((size_t)total_packet_size);
+    pHeader = (vktrace_trace_packet_header*)pMemory;
+
+    if (pHeader != NULL)
+    {
+        pHeader->size = total_packet_size;
+        if (vktrace_FileLike_ReadRaw(pFile, (char*)pHeader + sizeof(uint64_t), (size_t)total_packet_size - sizeof(uint64_t)) == FALSE)
+        {
+            vktrace_LogError("Failed to read trace packet with size of %llu.", (unsigned long long)total_packet_size);
+            vktrace_free(pMemory);
+            return NULL;
+        }
+
+        pHeader->pBody = (uintptr_t)pHeader + sizeof(vktrace_trace_packet_header);
+    }
+    else {
+        vktrace_LogError("Malloc failed in vktrace_read_trace_packet of size %u.", total_packet_size);
+    }
+
+    return pHeader;
+}
+
+void* vktrace_trace_packet_interpret_buffer_pointer(vktrace_trace_packet_header* pHeader, intptr_t ptr_variable)
+{
+    // the pointer variable actually contains a byte offset from the packet body to the start of the buffer.
+    uint64_t offset = ptr_variable;
+    void* buffer_location;
+
+    // if the offset is 0, then we know the pointer to the buffer was NULL, so no buffer exists and we return NULL.
+    if (offset == 0)
+        return NULL;
+
+    buffer_location = (char*)(pHeader->pBody) + offset;
+    return buffer_location;
+}
diff --git a/vktrace/src/vktrace_common/vktrace_trace_packet_utils.h b/vktrace/src/vktrace_common/vktrace_trace_packet_utils.h
new file mode 100644
index 0000000..80a2524
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_trace_packet_utils.h
@@ -0,0 +1,153 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ **************************************************************************/
+#pragma once
+
+#include "vktrace_multiplatform.h"
+#include "vktrace_trace_packet_identifiers.h"
+#include "vktrace_filelike.h"
+#include "vktrace_memory.h"
+#include "vktrace_process.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// pUuid is expected to be an array of 4 unsigned ints
+void vktrace_gen_uuid(uint32_t* pUuid);
+
+uint64_t vktrace_get_time();
+
+//=============================================================================
+// trace file header
+
+// there is a file header at the start of every trace file
+vktrace_trace_file_header* vktrace_create_trace_file_header();
+
+// deletes the trace file header and sets pointer to NULL
+void vktrace_delete_trace_file_header(vktrace_trace_file_header** ppHeader);
+
+static FILE* vktrace_write_trace_file_header(vktrace_process_info* pProcInfo)
+{
+    FILE* tracefp = NULL;
+    vktrace_trace_file_header* pHeader = NULL;
+    size_t items_written = 0;
+    assert(pProcInfo != NULL);
+
+    // open trace file
+    tracefp = fopen(pProcInfo->traceFilename, "wb");
+    if (tracefp == NULL)
+    {
+        vktrace_LogError("Cannot open trace file for writing %s.", pProcInfo->traceFilename);
+        return tracefp;
+    }
+
+    // populate header information
+    pHeader = vktrace_create_trace_file_header();
+    pHeader->first_packet_offset = sizeof(vktrace_trace_file_header);
+    pHeader->tracer_count = 1;
+
+
+    pHeader->tracer_id_array[0].id = pProcInfo->pCaptureThreads[0].tracerId;
+    pHeader->tracer_id_array[0].is_64_bit = (sizeof(intptr_t) == 8) ? 1 : 0;
+
+    // create critical section
+    vktrace_create_critical_section(&pProcInfo->traceFileCriticalSection);
+
+    // write header into file
+    vktrace_enter_critical_section(&pProcInfo->traceFileCriticalSection);
+    items_written = fwrite(pHeader, sizeof(vktrace_trace_file_header), 1, tracefp);
+    vktrace_leave_critical_section(&pProcInfo->traceFileCriticalSection);
+    if (items_written != 1)
+    {
+        vktrace_LogError("Failed to write trace file header.");
+        vktrace_delete_critical_section(&pProcInfo->traceFileCriticalSection);
+        fclose(tracefp);
+        return NULL;
+    }
+    vktrace_delete_trace_file_header(&pHeader);
+    return tracefp;
+}
+
+
+//=============================================================================
+// trace packets
+// There is a trace_packet_header before every trace_packet_body.
+// Additional buffers will come after the trace_packet_body.
+
+//=============================================================================
+// Methods for creating, populating, and writing trace packets
+
+// \param packet_size should be the size of the packet-specific struct (the packet body).
+// \param additional_buffers_size should be the total bytes of any buffers that will be appended to the packet.
+//        The size of the header is added automatically within the function.
+vktrace_trace_packet_header* vktrace_create_trace_packet(uint8_t tracer_id, uint16_t packet_id, uint64_t packet_size, uint64_t additional_buffers_size);
+
+// deletes a trace packet and sets pointer to NULL
+void vktrace_delete_trace_packet(vktrace_trace_packet_header** ppHeader);
+
+// gets the next address available to write a buffer into the packet
+void* vktrace_trace_packet_get_new_buffer_address(vktrace_trace_packet_header* pHeader, uint64_t byteCount);
+
+// copies buffer data into a trace packet at the specified offset (from the end of the header).
+// it is up to the caller to ensure that buffers do not overlap.
+void vktrace_add_buffer_to_trace_packet(vktrace_trace_packet_header* pHeader, void** ptr_address, uint64_t size, const void* pBuffer);
+
+// converts buffer pointers into byte offsets so that the pointers can be interpreted after being read into memory
+void vktrace_finalize_buffer_address(vktrace_trace_packet_header* pHeader, void** ptr_address);
+
+// sets entrypoint end time
+void vktrace_set_packet_entrypoint_end_time(vktrace_trace_packet_header* pHeader);
+
+//void initialize_trace_packet_header(vktrace_trace_packet_header* pHeader, uint8_t tracer_id, uint16_t packet_id, uint64_t total_packet_size);
+void vktrace_finalize_trace_packet(vktrace_trace_packet_header* pHeader);
+
+// Write the trace packet to the filelike thing.
+// This has no knowledge of the details of the packet other than its size.
+void vktrace_write_trace_packet(const vktrace_trace_packet_header* pHeader, FileLike* pFile);
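+
+// Typical write-side sequence (an illustrative sketch; MyPacket, packet_id,
+// pData, and dataSize are hypothetical stand-ins for a generated packet
+// struct and its payload):
+//
+//     vktrace_trace_packet_header* pHeader =
+//         vktrace_create_trace_packet(VKTRACE_TID_VULKAN, packet_id,
+//                                     sizeof(MyPacket), dataSize);
+//     MyPacket* pPacket = (MyPacket*)pHeader->pBody;
+//     vktrace_add_buffer_to_trace_packet(pHeader, (void**)&pPacket->pData,
+//                                        dataSize, pData);
+//     vktrace_finalize_buffer_address(pHeader, (void**)&pPacket->pData);
+//     vktrace_finalize_trace_packet(pHeader);
+//     vktrace_write_trace_packet(pHeader, vktrace_trace_get_trace_file());
+//     vktrace_delete_trace_packet(&pHeader);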
+
+//=============================================================================
+// Methods for reading and interpreting trace packets
+
+// Reads in the trace packet header, the body of the packet, and additional buffers
+vktrace_trace_packet_header* vktrace_read_trace_packet(FileLike* pFile);
+
+// converts a pointer variable that currently holds a byte offset back into a pointer to the actual buffer location
+void* vktrace_trace_packet_interpret_buffer_pointer(vktrace_trace_packet_header* pHeader, intptr_t ptr_variable);
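+
+// Typical read-side loop (an illustrative sketch):
+//
+//     vktrace_trace_packet_header* pHeader;
+//     while ((pHeader = vktrace_read_trace_packet(pFile)) != NULL)
+//     {
+//         if (pHeader->packet_id == VKTRACE_TPI_MESSAGE)
+//         {
+//             vktrace_trace_packet_message* pMsg =
+//                 vktrace_interpret_body_as_trace_packet_message(pHeader);
+//             // pMsg->message now points into the packet's own storage
+//         }
+//         vktrace_delete_trace_packet(&pHeader);
+//     }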
+
+//=============================================================================
+// trace packet message
+// Interpreting a trace_packet_message should be done only when:
+// 1) a trace packet is first created and most of its contents are still empty, or
+// 2) immediately after the packet was read from the trace file.
+// All other conversions of the trace packet body from the header should
+// be performed using a C-style cast.
+static vktrace_trace_packet_message* vktrace_interpret_body_as_trace_packet_message(vktrace_trace_packet_header* pHeader)
+{
+    vktrace_trace_packet_message* pPacket = (vktrace_trace_packet_message*)pHeader->pBody;
+    // update pointers
+    pPacket->pHeader = pHeader;
+    pPacket->message = (char*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->message);
+    return pPacket;
+}
+
+#ifdef __cplusplus
+}
+#endif
diff --git a/vktrace/src/vktrace_common/vktrace_tracelog.c b/vktrace/src/vktrace_common/vktrace_tracelog.c
new file mode 100644
index 0000000..cea1a24
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_tracelog.c
@@ -0,0 +1,211 @@
+/*
+ * Copyright (c) 2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2014-2016 Valve Corporation. All rights reserved.
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */
+
+#include "vktrace_platform.h"
+
+#include "vktrace_tracelog.h"
+#include "vktrace_trace_packet_utils.h"
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdarg.h>
+
+#ifdef ANDROID
+#include <android/log.h>
+#endif
+
+// filelike thing that is used for outputting trace messages
+static FileLike* g_pFileOut = NULL;
+
+VKTRACE_TRACER_ID g_tracelog_tracer_id = VKTRACE_TID_RESERVED;
+
+void vktrace_trace_set_trace_file(FileLike* pFileLike)
+{
+    g_pFileOut = pFileLike;
+}
+
+// Initialized to 0; updated with the version read from the trace file and
+// then used for version checks.
+static uint32_t g_trace_version_num = 0;
+
+void vktrace_set_trace_version(uint32_t version)
+{
+    g_trace_version_num = version;
+}
+
+BOOL vktrace_check_min_version(uint32_t version)
+{
+    return (g_trace_version_num >= version) ? TRUE : FALSE;
+}
+
+FileLike* vktrace_trace_get_trace_file()
+{
+    return g_pFileOut;
+}
+
+void vktrace_tracelog_set_tracer_id(uint8_t tracerId)
+{
+    g_tracelog_tracer_id = (VKTRACE_TRACER_ID)tracerId;
+}
+
+VKTRACE_REPORT_CALLBACK_FUNCTION s_reportFunc;
+VktraceLogLevel s_logLevel = VKTRACE_LOG_ERROR;
+
+const char* vktrace_LogLevelToString(VktraceLogLevel level)
+{
+    switch(level)
+    {
+    case VKTRACE_LOG_NONE: return "Quiet";
+    case VKTRACE_LOG_DEBUG: return "Debug";
+    case VKTRACE_LOG_ERROR: return "Errors";
+    case VKTRACE_LOG_WARNING: return "Warnings";
+    case VKTRACE_LOG_VERBOSE: return "Info";
+    default:
+        return "Unknown";
+    }
+}
+
+const char* vktrace_LogLevelToShortString(VktraceLogLevel level)
+{
+    switch(level)
+    {
+    case VKTRACE_LOG_NONE: return "Quiet";
+    case VKTRACE_LOG_DEBUG: return "Debug";
+    case VKTRACE_LOG_ERROR: return "Errors";
+    case VKTRACE_LOG_WARNING: return "Warnings";
+    case VKTRACE_LOG_VERBOSE: return "Info";
+    default:
+        return "Unknown";
+    }
+}
+
+
+// For use by both tools and libraries.
+void vktrace_LogSetCallback(VKTRACE_REPORT_CALLBACK_FUNCTION pCallback)
+{
+    s_reportFunc = pCallback;
+}
+
+void vktrace_LogSetLevel(VktraceLogLevel level)
+{
+    vktrace_LogDebug("Log Level = %u (%s)", level, vktrace_LogLevelToString(level));
+    s_logLevel = level;
+}
+
+BOOL vktrace_LogIsLogging(VktraceLogLevel level)
+{
+    return (level <= s_logLevel) ? TRUE : FALSE;
+}
+
+void LogGuts(VktraceLogLevel level, const char* fmt, va_list args)
+{
+    static VKTRACE_THREAD_LOCAL BOOL logging = FALSE;
+    int requiredLength;
+    char* message;
+
+    // Don't recursively log problems found during logging
+    if (logging)
+    {
+        return;
+    }
+    logging = TRUE;
+
+    // Measure the formatted length first so the message buffer can be sized exactly.
+#if defined(WIN32)
+    requiredLength = _vscprintf(fmt, args) + 1;
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    {
+        va_list argcopy;
+        va_copy(argcopy, args);
+        requiredLength = vsnprintf(NULL, 0, fmt, argcopy) + 1;
+        va_end(argcopy);
+    }
+#endif
+
+    message = (char*)vktrace_malloc(requiredLength);
+#if defined(WIN32)
+    _vsnprintf_s(message, requiredLength, requiredLength - 1, fmt, args);
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    vsnprintf(message, requiredLength, fmt, args);
+#endif
+
+    if (s_reportFunc != NULL)
+    {
+        s_reportFunc(level, message);
+    }
+    else
+    {
+#ifdef ANDROID
+        __android_log_print(ANDROID_LOG_INFO, "vktrace", "%s: %s\n", vktrace_LogLevelToString(level), message);
+#else
+        printf("%s: %s\n", vktrace_LogLevelToString(level), message);
+#endif
+    }
+
+    vktrace_free(message);
+    logging = FALSE;
+}
+
+void vktrace_LogAlways(const char* format, ...)
+{
+    va_list args;
+    va_start(args, format);
+    LogGuts(VKTRACE_LOG_VERBOSE, format, args);
+    va_end(args);
+}
+
+void vktrace_LogDebug(const char* format, ...)
+{
+#if defined(_DEBUG)
+    if (vktrace_LogIsLogging(VKTRACE_LOG_DEBUG))
+    {
+        va_list args;
+        va_start(args, format);
+        LogGuts(VKTRACE_LOG_DEBUG, format, args);
+        va_end(args);
+    }
+#endif
+}
+
+void vktrace_LogError(const char* format, ...)
+{
+    if (vktrace_LogIsLogging(VKTRACE_LOG_ERROR))
+    {
+        va_list args;
+        va_start(args, format);
+        LogGuts(VKTRACE_LOG_ERROR, format, args);
+        va_end(args);
+    }
+}
+
+void vktrace_LogWarning(const char* format, ...)
+{
+    if (vktrace_LogIsLogging(VKTRACE_LOG_WARNING))
+    {
+        va_list args;
+        va_start(args, format);
+        LogGuts(VKTRACE_LOG_WARNING, format, args);
+        va_end(args);
+    }
+}
+
+void vktrace_LogVerbose(const char* format, ...)
+{
+    if (vktrace_LogIsLogging(VKTRACE_LOG_VERBOSE))
+    {
+        va_list args;
+        va_start(args, format);
+        LogGuts(VKTRACE_LOG_VERBOSE, format, args);
+        va_end(args);
+    }
+}
diff --git a/vktrace/src/vktrace_common/vktrace_tracelog.h b/vktrace/src/vktrace_common/vktrace_tracelog.h
new file mode 100644
index 0000000..a84776f
--- /dev/null
+++ b/vktrace/src/vktrace_common/vktrace_tracelog.h
@@ -0,0 +1,83 @@
+/*
+ * Copyright (c) 2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2014-2016 Valve Corporation. All rights reserved.
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */
+
+#pragma once
+
+#include <stdint.h>
+
+typedef struct FileLike FileLike;
+
+typedef enum {
+    VKTRACE_LOG_NONE = 0,
+    VKTRACE_LOG_ERROR,
+    VKTRACE_LOG_WARNING,
+    VKTRACE_LOG_VERBOSE,
+    VKTRACE_LOG_DEBUG
+} VktraceLogLevel;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+const char* vktrace_LogLevelToString(VktraceLogLevel level);
+const char* vktrace_LogLevelToShortString(VktraceLogLevel level);
+
+void vktrace_trace_set_trace_file(FileLike* pFileLike);
+FileLike* vktrace_trace_get_trace_file();
+void vktrace_tracelog_set_tracer_id(uint8_t tracerId);
+
+void vktrace_set_trace_version(uint32_t version);
+BOOL vktrace_check_min_version(uint32_t version);
+
+// Logging is done by reporting the messages back to a callback.
+// Plugins should register a callback from the parent tool;
+// Tools should register their own callback so that they can output messages as desired.
+typedef void (*VKTRACE_REPORT_CALLBACK_FUNCTION)(VktraceLogLevel level, const char* pMsg);
+extern VKTRACE_REPORT_CALLBACK_FUNCTION s_reportFunc;
+extern VktraceLogLevel s_logLevel;
+
+void vktrace_LogSetCallback(VKTRACE_REPORT_CALLBACK_FUNCTION pCallback);
+void vktrace_LogSetLevel(VktraceLogLevel level);
+
+// Allows checking whether a level is being logged so that expensive
+// string-building can be skipped when the message would not be reported.
+BOOL vktrace_LogIsLogging(VktraceLogLevel level);
+
+// Always log the message, no matter what the ReportingLevel is.
+void vktrace_LogAlways(const char* format, ...);
+
+// Log debug information that is primarily helpful for Vktrace developers.
+// These messages are compiled in only for _DEBUG builds, and are emitted
+// only when the log level includes VKTRACE_LOG_DEBUG.
+void vktrace_LogDebug(const char* format, ...);
+
+// Log an error message.
+void vktrace_LogError(const char* format, ...);
+
+// Log a warning.
+void vktrace_LogWarning(const char* format, ...);
+
+// Log any misc information that might help a user understand what is going on.
+void vktrace_LogVerbose(const char* format, ...);
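+
+// A sketch of a tool wiring up logging (illustrative; my_report is a
+// hypothetical callback):
+//
+//     static void my_report(VktraceLogLevel level, const char* pMsg)
+//     {
+//         fprintf(stderr, "[%s] %s\n",
+//                 vktrace_LogLevelToShortString(level), pMsg);
+//     }
+//     ...
+//     vktrace_LogSetCallback(my_report);
+//     vktrace_LogSetLevel(VKTRACE_LOG_WARNING);
+//     // Guard expensive message construction:
+//     if (vktrace_LogIsLogging(VKTRACE_LOG_DEBUG)) { /* build string */ }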
+
+#ifdef __cplusplus
+}
+#endif
diff --git a/vktrace/src/vktrace_common/wintypes.h b/vktrace/src/vktrace_common/wintypes.h
new file mode 100644
index 0000000..d4cbd52
--- /dev/null
+++ b/vktrace/src/vktrace_common/wintypes.h
@@ -0,0 +1,67 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ **************************************************************************/
+#pragma once
+
+#if  defined(__linux__) || defined(__APPLE__)
+#include <stdint.h>
+#include <stddef.h>
+typedef void * LPVOID;
+typedef void * PVOID;
+typedef void VOID;
+typedef char CHAR;
+typedef char TCHAR;
+typedef long LONG;
+typedef unsigned long ULONG;
+typedef int BOOL;
+typedef size_t SIZE_T;
+typedef unsigned long DWORD;
+typedef unsigned char BYTE;
+typedef unsigned char *PBYTE;
+typedef unsigned short USHORT;
+typedef unsigned char UCHAR;
+typedef unsigned short WORD;
+typedef uintptr_t DWORD_PTR;    // pointer-sized unsigned integer, matching Windows semantics (not a pointer to DWORD)
+typedef DWORD *PDWORD;
+typedef DWORD_PTR *PDWORD_PTR;
+typedef int32_t INT32;
+typedef int64_t LONG64;
+typedef uint64_t ULONG64;
+typedef const char * PCSTR;
+typedef const wchar_t * PCWSTR;
+#ifndef MAX_PATH
+#include <limits.h>
+#ifndef PATH_MAX
+#define MAX_PATH 4096
+#else
+#define MAX_PATH PATH_MAX
+#endif
+#endif
+#ifndef TRUE
+#define TRUE 1
+#endif
+#ifndef FALSE
+#define FALSE 0
+#endif
+
+#elif defined(_WIN32)
+#include <windows.h>
+#endif
+
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/CMakeLists.txt b/vktrace/src/vktrace_extensions/vktracevulkan/CMakeLists.txt
new file mode 100644
index 0000000..a242c26
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/CMakeLists.txt
@@ -0,0 +1,79 @@
+PROJECT(vktracevulkan)
+cmake_minimum_required(VERSION 2.8)
+
+#include(FindPkgConfig)
+
+if (${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+    set(PYTHON_CMD "python3")
+else()
+    set(PYTHON_CMD "py")
+endif()
+
+
+# If Vktrace is being built as part of a vulkan driver build, then use that target instead of the locally committed binary.
+#if (TARGET vulkan)
+#    message(STATUS "Using external Vulkan header and library.")
+#    set(VKTRACE_VULKAN_LIB vulkan)
+    set(VKTRACE_VULKAN_DRIVER_DIR ${CMAKE_SOURCE_DIR})
+    set(VKTRACE_VULKAN_INCLUDE_DIR ${CMAKE_SOURCE_DIR}/include/vulkan)
+    set(VKTRACE_VULKAN_HEADER ${CMAKE_SOURCE_DIR}/include/vulkan/vulkan.h)
+    #set(VKTRACE_VULKAN_LUNARG_DEBUG_MARKER_HEADER ${VKTRACE_VULKAN_INCLUDE_DIR}/vk_lunarg_debug_marker.h)
+#else()
+    # Use a locally committed vulkan header and binary
+#    message(STATUS "Using Vktrace-supplied Vulkan header and library.")
+#    set(VKTRACE_VULKAN_DRIVER_DIR ${CMAKE_CURRENT_SOURCE_DIR}/vulkan)
+#    set(VKTRACE_VULKAN_INCLUDE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/vulkan/include)
+#    set(VKTRACE_VULKAN_HEADER ${VKTRACE_VULKAN_INCLUDE_DIR}/vulkan/vulkan.h)
+#    set(VKTRACE_VULKAN_DEBUG_REPORT_LUNARG_HEADER ${VKTRACE_VULKAN_INCLUDE_DIR}/vk_debug_report_lunarg.h)
+
+if (${CMAKE_SYSTEM_NAME} MATCHES "Windows")
+   if (CMAKE_GENERATOR MATCHES "^Visual Studio.*")
+      set(VKTRACE_VULKAN_LIB
+         ${CMAKE_BINARY_DIR}/loader/${CMAKE_CFG_INTDIR}/vulkan-1.lib
+         )
+   else()
+      set(VKTRACE_VULKAN_LIB
+         ${CMAKE_BINARY_DIR}/loader/vulkan-1.lib
+         )
+   endif()
+endif()
+
+if (${CMAKE_SYSTEM_NAME} MATCHES "Linux" OR
+    ${CMAKE_SYSTEM_NAME} MATCHES "Darwin" )
+    set(VKTRACE_VULKAN_LIB
+        ${CMAKE_BINARY_DIR}/loader/libvulkan.so
+    )
+
+endif()
+
+message(STATUS "VKTRACE_VULKAN_LIB = " ${VKTRACE_VULKAN_LIB})
+#message(STATUS "VKTRACE_VULKAN_DRIVER_DIR = " ${VKTRACE_VULKAN_DRIVER_DIR})
+#message(STATUS "VKTRACE_VULKAN_HEADER = " ${VKTRACE_VULKAN_HEADER})
+
+# Run a codegen script to generate utilities that are vulkan-specific, dependent on the vulkan header files, and may be shared by the tracer, replayer, or debugger.
+# Generally, these are likely to be things that SHOULD be provided by the vulkan SDK.
+set(VKTRACE_VULKAN_CODEGEN_UTILS "vulkan/codegen_utils")
+file(MAKE_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/${VKTRACE_VULKAN_CODEGEN_UTILS})
+
+# generate files for vulkan.h
+execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DRIVER_DIR}/vk_helper.py --gen_struct_sizes ${VKTRACE_VULKAN_HEADER} --abs_out_dir ${CMAKE_CURRENT_SOURCE_DIR}/${VKTRACE_VULKAN_CODEGEN_UTILS})
+execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DRIVER_DIR}/vk_helper.py --gen_enum_string_helper ${VKTRACE_VULKAN_HEADER} --abs_out_dir ${CMAKE_CURRENT_SOURCE_DIR}/${VKTRACE_VULKAN_CODEGEN_UTILS})
+
+# generate files for vk_lunarg_debug_marker.h
+#execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DRIVER_DIR}/vk_helper.py --gen_struct_sizes ${VKTRACE_VULKAN_LUNARG_DEBUG_MARKER_HEADER} --abs_out_dir ${CMAKE_CURRENT_SOURCE_DIR}/${VKTRACE_VULKAN_CODEGEN_UTILS})
+#execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DRIVER_DIR}/vk_helper.py --gen_enum_string_helper ${VKTRACE_VULKAN_LUNARG_DEBUG_MARKER_HEADER} --abs_out_dir ${CMAKE_CURRENT_SOURCE_DIR}/${VKTRACE_VULKAN_CODEGEN_UTILS})
+
+# Run a codegen script to generate vktrace-specific vulkan utils
+set(CODEGEN_VKTRACE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/codegen_vktrace_utils")
+file(MAKE_DIRECTORY ${CODEGEN_VKTRACE_DIR})
+execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DIR}/vktrace_generate.py AllPlatforms vktrace-packet-id VK_VERSION_1_0 OUTPUT_FILE ${CODEGEN_VKTRACE_DIR}/vktrace_vk_packet_id.h)
+execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DIR}/vktrace_generate.py AllPlatforms vktrace-core-trace-packets VK_VERSION_1_0 OUTPUT_FILE ${CODEGEN_VKTRACE_DIR}/vktrace_vk_vk_packets.h)
+execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DIR}/vktrace_generate.py AllPlatforms vktrace-ext-trace-packets vk_lunarg_debug_marker OUTPUT_FILE ${CODEGEN_VKTRACE_DIR}/vktrace_vk_vk_lunarg_debug_marker_packets.h)
+
+# Directories which actually contain vulkan-specific vktrace plugins.
+add_subdirectory(vkreplay/)
+# Only build vktraceviewer when Qt5 is available
+if (Qt5_FOUND AND BUILD_VKTRACEVIEWER)
+    add_subdirectory(vktraceviewer/)
+endif()
+
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/layers/layers_config.cpp b/vktrace/src/vktrace_extensions/vktracevulkan/layers/layers_config.cpp
new file mode 100644
index 0000000..e67fbf2
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/layers/layers_config.cpp
@@ -0,0 +1,174 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ **************************************************************************/
+#include <fstream>
+#include <string>
+#include <map>
+#include <string.h>
+#include <vkLayer.h>
+#include "loader_platform.h"
+#include "layers_config.h"
+// The following is #included again to catch certain OS-specific functions
+// being used:
+#include "loader_platform.h"
+
+#define MAX_CHARS_PER_LINE 4096
+
+class ConfigFile
+{
+public:
+    ConfigFile();
+    ~ConfigFile();
+
+    const char *getOption(const std::string &_option);
+    void setOption(const std::string &_option, const std::string &_val);
+
+private:
+    bool m_fileIsParsed;
+    std::map<std::string, std::string> m_valueMap;
+
+    void parseFile(const char *filename);
+};
+
+static ConfigFile g_configFileObj;
+
+static unsigned int convertStringEnumVal(const char *_enum)
+{
+    // only handles single enum values
+    if (!strcmp(_enum, "VK_DBG_LAYER_ACTION_IGNORE"))
+        return VK_DBG_LAYER_ACTION_IGNORE;
+    else if (!strcmp(_enum, "VK_DBG_LAYER_ACTION_CALLBACK"))
+        return VK_DBG_LAYER_ACTION_CALLBACK;
+    else if (!strcmp(_enum, "VK_DBG_LAYER_ACTION_LOG_MSG"))
+        return VK_DBG_LAYER_ACTION_LOG_MSG;
+    else if (!strcmp(_enum, "VK_DBG_LAYER_ACTION_BREAK"))
+        return VK_DBG_LAYER_ACTION_BREAK;
+    else if (!strcmp(_enum, "VK_DBG_LAYER_LEVEL_INFO"))
+        return VK_DBG_LAYER_LEVEL_INFO;
+    else if (!strcmp(_enum, "VK_DBG_LAYER_LEVEL_WARN"))
+        return VK_DBG_LAYER_LEVEL_WARN;
+    else if (!strcmp(_enum, "VK_DBG_LAYER_LEVEL_PERF_WARN"))
+        return VK_DBG_LAYER_LEVEL_PERF_WARN;
+    else if (!strcmp(_enum, "VK_DBG_LAYER_LEVEL_ERROR"))
+        return VK_DBG_LAYER_LEVEL_ERROR;
+    else if (!strcmp(_enum, "VK_DBG_LAYER_LEVEL_NONE"))
+        return VK_DBG_LAYER_LEVEL_NONE;
+    return 0;
+}
+const char *getLayerOption(const char *_option)
+{
+    return g_configFileObj.getOption(_option);
+}
+
+bool getLayerOptionEnum(const char *_option, uint32_t *optionDefault)
+{
+    bool res;
+    const char *option = (g_configFileObj.getOption(_option));
+    if (option != NULL) {
+        *optionDefault = convertStringEnumVal(option);
+        res = false;
+    } else {
+        res = true;
+    }
+    return res;
+}
+
+void setLayerOptionEnum(const char *_option, const char *_valEnum)
+{
+    unsigned int val = convertStringEnumVal(_valEnum);
+    char strVal[24];
+    snprintf(strVal, 24, "%u", val);
+    g_configFileObj.setOption(_option, strVal);
+}
+
+void setLayerOption(const char *_option, const char *_val)
+{
+    g_configFileObj.setOption(_option, _val);
+}
+
+ConfigFile::ConfigFile() : m_fileIsParsed(false)
+{
+}
+
+ConfigFile::~ConfigFile()
+{
+}
+
+const char *ConfigFile::getOption(const std::string &_option)
+{
+    std::map<std::string, std::string>::const_iterator it;
+    if (!m_fileIsParsed)
+    {
+        parseFile("vk_layer_settings.txt");
+    }
+
+    if ((it = m_valueMap.find(_option)) == m_valueMap.end())
+        return NULL;
+    else
+        return it->second.c_str();
+}
+
+void ConfigFile::setOption(const std::string &_option, const std::string &_val)
+{
+    if (!m_fileIsParsed)
+    {
+        parseFile("vk_layer_settings.txt");
+    }
+
+    m_valueMap[_option] = _val;
+}
+
+void ConfigFile::parseFile(const char *filename)
+{
+    std::ifstream file;
+    char buf[MAX_CHARS_PER_LINE];
+
+    m_fileIsParsed = true;
+    m_valueMap.clear();
+
+    file.open(filename);
+    if (!file.good())
+        return;
+
+    // read "option = value" pairs line by line
+    while (file.getline(buf, MAX_CHARS_PER_LINE))
+    {
+        char option[512];
+        char value[512];
+
+        char *pComment;
+
+        //discard any comments delimited by '#' in the line
+        pComment = strchr(buf, '#');
+        if (pComment)
+            *pComment = '\0';
+
+        if (sscanf(buf, " %511[^\n\t =] = %511[^\n \t]", option, value) == 2)
+        {
+            std::string optStr(option);
+            std::string valStr(value);
+            m_valueMap[optStr] = valStr;
+        }
+    }
+}
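+
+// Example of a vk_layer_settings.txt fragment that parseFile accepts
+// (the option names are hypothetical; the values are the enums recognized
+// by convertStringEnumVal above):
+//
+//     # comments run from '#' to end of line
+//     MyLayerReportLevel = VK_DBG_LAYER_LEVEL_WARN
+//     MyLayerDebugAction = VK_DBG_LAYER_ACTION_LOG_MSG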
+
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/layers/layers_config.h b/vktrace/src/vktrace_extensions/vktracevulkan/layers/layers_config.h
new file mode 100644
index 0000000..7add0f3
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/layers/layers_config.h
@@ -0,0 +1,35 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ **************************************************************************/
+#pragma once
+#include <stdbool.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+const char *getLayerOption(const char *_option);
+bool getLayerOptionEnum(const char *_option, uint32_t *optionDefault);
+
+void setLayerOption(const char *_option, const char *_val);
+void setLayerOptionEnum(const char *_option, const char *_valEnum);
+#ifdef __cplusplus
+}
+#endif
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/layers/vktrace_snapshot.c b/vktrace/src/vktrace_extensions/vktracevulkan/layers/vktrace_snapshot.c
new file mode 100644
index 0000000..5ca6229
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/layers/vktrace_snapshot.c
@@ -0,0 +1,1884 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include "loader_platform.h"
+#include "vktrace_snapshot.h"
+#include "vk_struct_string_helper.h"
+
+#define LAYER_NAME_STR "VktraceSnapshot"
+#define LAYER_ABBREV_STR "VKTRACESnap"
+
+static VkLayerDispatchTable nextTable;
+static VkBaseLayerObject *pCurObj;
+
+// The following is #included again to catch certain OS-specific functions being used:
+#include "loader_platform.h"
+#include "layers_config.h"
+#include "layers_msg.h"
+
+static LOADER_PLATFORM_THREAD_ONCE_DECLARATION(tabOnce);
+static int objLockInitialized = 0;
+static loader_platform_thread_mutex objLock;
+
+// The 'masterSnapshot' which gets the delta merged into it when 'GetSnapshot()' is called.
+static VKTRACE_VK_SNAPSHOT s_snapshot = {0};
+
+// The 'deltaSnapshot' which tracks all object creation and deletion.
+static VKTRACE_VK_SNAPSHOT s_delta = {0};
+
+
+//=============================================================================
+// Helper structure for a VKTRACE vulkan snapshot.
+// These can probably be auto-generated at some point.
+//=============================================================================
+
+void vktrace_vk_malloc_and_copy(void** ppDest, size_t size, const void* pSrc)
+{
+    *ppDest = malloc(size);
+    memcpy(*ppDest, pSrc, size);
+}
+
+VkDeviceCreateInfo* vktrace_deepcopy_VkDeviceCreateInfo(const VkDeviceCreateInfo* pSrcCreateInfo)
+{
+    VkDeviceCreateInfo* pDestCreateInfo;
+
+    // NOTE: partially duplicated code from add_VkDeviceCreateInfo_to_packet(...)
+    {
+        uint32_t i;
+        vktrace_vk_malloc_and_copy((void**)&pDestCreateInfo, sizeof(VkDeviceCreateInfo), pSrcCreateInfo);
+        vktrace_vk_malloc_and_copy((void**)&pDestCreateInfo->pQueueCreateInfos, pSrcCreateInfo->queueCreateInfoCount*sizeof(VkDeviceQueueCreateInfo), pSrcCreateInfo->pQueueCreateInfos);
+        for (i = 0; i < pSrcCreateInfo->queueCreateInfoCount; i++) {
+            vktrace_vk_malloc_and_copy((void**)&pDestCreateInfo->pQueueCreateInfos[i].pQueuePriorities,
+                                       pSrcCreateInfo->pQueueCreateInfos[i].queueCount*sizeof(float),
+                                       pSrcCreateInfo->pQueueCreateInfos[i].pQueuePriorities);
+        }
+
+        if (pSrcCreateInfo->enabledExtensionCount > 0)
+        {
+            vktrace_vk_malloc_and_copy((void**)&pDestCreateInfo->ppEnabledExtensionNames, pSrcCreateInfo->enabledExtensionCount * sizeof(char *), pSrcCreateInfo->ppEnabledExtensionNames);
+            for (i = 0; i < pSrcCreateInfo->enabledExtensionCount; i++)
+            {
+                vktrace_vk_malloc_and_copy((void**)&pDestCreateInfo->ppEnabledExtensionNames[i], ROUNDUP_TO_4(strlen(pSrcCreateInfo->ppEnabledExtensionNames[i]) + 1), pSrcCreateInfo->ppEnabledExtensionNames[i]);
+            }
+        }
+        VkLayerCreateInfo *pSrcNext = ( VkLayerCreateInfo *) pSrcCreateInfo->pNext;
+        VkLayerCreateInfo **ppDstNext = ( VkLayerCreateInfo **) &pDestCreateInfo->pNext;
+        while (pSrcNext != NULL)
+        {
+            if ((pSrcNext->sType == VK_STRUCTURE_TYPE_LAYER_CREATE_INFO) && pSrcNext->enabledLayerCount > 0)
+            {
+                vktrace_vk_malloc_and_copy((void**)ppDstNext, sizeof(VkLayerCreateInfo), pSrcNext);
+                vktrace_vk_malloc_and_copy((void**)&(*ppDstNext)->ppActiveLayerNames, pSrcNext->enabledLayerCount * sizeof(char*), pSrcNext->ppActiveLayerNames);
+                for (i = 0; i < pSrcNext->enabledLayerCount; i++)
+                {
+                    vktrace_vk_malloc_and_copy((void**)&(*ppDstNext)->ppActiveLayerNames[i], ROUNDUP_TO_4(strlen(pSrcNext->ppActiveLayerNames[i]) + 1), pSrcNext->ppActiveLayerNames[i]);
+                }
+
+                ppDstNext = (VkLayerCreateInfo**) &(*ppDstNext)->pNext;
+            }
+            pSrcNext = (VkLayerCreateInfo*) pSrcNext->pNext;
+        }
+    }
+
+    return pDestCreateInfo;
+}
+
+void vktrace_deepfree_VkDeviceCreateInfo(VkDeviceCreateInfo* pCreateInfo)
+{
+    uint32_t i;
+    if (pCreateInfo->pQueueCreateInfos != NULL)
+    {
+        // free the per-queue priority arrays allocated by the deepcopy before freeing the array itself
+        for (i = 0; i < pCreateInfo->queueCreateInfoCount; i++)
+        {
+            free((void*)pCreateInfo->pQueueCreateInfos[i].pQueuePriorities);
+        }
+        free((void*)pCreateInfo->pQueueCreateInfos);
+    }
+
+    if (pCreateInfo->ppEnabledExtensionNames != NULL)
+    {
+        for (i = 0; i < pCreateInfo->enabledExtensionCount; i++)
+        {
+            free((void*)pCreateInfo->ppEnabledExtensionNames[i]);
+        }
+        free((void*)pCreateInfo->ppEnabledExtensionNames);
+    }
+
+    VkLayerCreateInfo *pSrcNext = (VkLayerCreateInfo*)pCreateInfo->pNext;
+    while (pSrcNext != NULL)
+    {
+        VkLayerCreateInfo* pTmp = (VkLayerCreateInfo*)pSrcNext->pNext;
+        if ((pSrcNext->sType == VK_STRUCTURE_TYPE_LAYER_CREATE_INFO) && pSrcNext->enabledLayerCount > 0)
+        {
+            for (i = 0; i < pSrcNext->enabledLayerCount; i++)
+            {
+                free((void*)pSrcNext->ppActiveLayerNames[i]);
+            }
+
+            free((void*)pSrcNext->ppActiveLayerNames);
+            free(pSrcNext);
+        }
+        pSrcNext = pTmp;
+    }
+
+    free(pCreateInfo);
+}
+
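+// Capture the arguments of a vkCreateDevice() call, deep-copying pCreateInfo
+// so the snapshot does not depend on memory owned by the application.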
+void vktrace_vk_snapshot_copy_createdevice_params(VKTRACE_VK_SNAPSHOT_CREATEDEVICE_PARAMS* pDest, VkPhysicalDevice physicalDevice, const VkDeviceCreateInfo* pCreateInfo, VkDevice* pDevice)
+{
+    pDest->physicalDevice = physicalDevice;
+
+    pDest->pCreateInfo = vktrace_deepcopy_VkDeviceCreateInfo(pCreateInfo);
+
+    pDest->pDevice = (VkDevice*)malloc(sizeof(VkDevice));
+    *pDest->pDevice = *pDevice;
+}
+
+void vktrace_vk_snapshot_destroy_createdevice_params(VKTRACE_VK_SNAPSHOT_CREATEDEVICE_PARAMS* pSrc)
+{
+    memset(&pSrc->physicalDevice, 0, sizeof(VkPhysicalDevice));
+
+    vktrace_deepfree_VkDeviceCreateInfo(pSrc->pCreateInfo);
+    pSrc->pCreateInfo = NULL;
+
+    free(pSrc->pDevice);
+    pSrc->pDevice = NULL;
+}
+
+
+
+// Add a new node to the global and per-type object lists, then return it so the caller can populate the object information.
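+// Each node is linked onto two lists at once: the global list via pNextGlobal
+// and the list for its object type via pNextObj.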
+static VKTRACE_VK_SNAPSHOT_LL_NODE* snapshot_insert_object(VKTRACE_VK_SNAPSHOT* pSnapshot, VkObject object, VkObjectType type)
+{
+    // Create a new node
+    VKTRACE_VK_SNAPSHOT_LL_NODE* pNewObjNode = (VKTRACE_VK_SNAPSHOT_LL_NODE*)malloc(sizeof(VKTRACE_VK_SNAPSHOT_LL_NODE));
+    memset(pNewObjNode, 0, sizeof(VKTRACE_VK_SNAPSHOT_LL_NODE));
+    pNewObjNode->obj.object = object;
+    pNewObjNode->obj.objType = type;
+    pNewObjNode->obj.status = OBJSTATUS_NONE;
+
+    // insert at front of global list
+    pNewObjNode->pNextGlobal = pSnapshot->pGlobalObjs;
+    pSnapshot->pGlobalObjs = pNewObjNode;
+
+    // insert at front of object list
+    pNewObjNode->pNextObj = pSnapshot->pObjectHead[type];
+    pSnapshot->pObjectHead[type] = pNewObjNode;
+
+    // increment count
+    pSnapshot->globalObjCount++;
+    pSnapshot->numObjs[type]++;
+
+    return pNewObjNode;
+}
+
+// This is just a helper function for snapshot_remove_object(..); it is not intended to be called directly.
+static void snapshot_remove_obj_type(VKTRACE_VK_SNAPSHOT* pSnapshot, VkObject object, VkObjectType objType) {
+    VKTRACE_VK_SNAPSHOT_LL_NODE *pTrav = pSnapshot->pObjectHead[objType];
+    VKTRACE_VK_SNAPSHOT_LL_NODE *pPrev = pSnapshot->pObjectHead[objType];
+    while (pTrav) {
+        if (pTrav->obj.object == object) {
+            pPrev->pNextObj = pTrav->pNextObj;
+            // update HEAD of Obj list as needed
+            if (pSnapshot->pObjectHead[objType] == pTrav)
+            {
+                pSnapshot->pObjectHead[objType] = pTrav->pNextObj;
+            }
+            assert(pSnapshot->numObjs[objType] > 0);
+            pSnapshot->numObjs[objType]--;
+            return;
+        }
+        pPrev = pTrav;
+        pTrav = pTrav->pNextObj;
+    }
+    char str[1024];
+    sprintf(str, "OBJ INTERNAL ERROR : Obj %p was in global list but not in %s list", (void*)object, string_VK_OBJECT_TYPE(objType));
+    layerCbMsg(VK_DBG_MSG_ERROR, VK_VALIDATION_LEVEL_0, object, 0, VKTRACESNAPSHOT_INTERNAL_ERROR, LAYER_ABBREV_STR, str);
+}
+
+// Search the global list to find the object.
+// If found:
+//   - remove the object from its obj_type list using snapshot_remove_obj_type(),
+//   - remove the object from the global list,
+//   - return the node.
+// Else:
+//   - report that we never saw the object get created,
+//   - return NULL.
+static VKTRACE_VK_SNAPSHOT_LL_NODE* snapshot_remove_object(VKTRACE_VK_SNAPSHOT* pSnapshot, VkObject object)
+{
+    VKTRACE_VK_SNAPSHOT_LL_NODE *pTrav = pSnapshot->pGlobalObjs;
+    VKTRACE_VK_SNAPSHOT_LL_NODE *pPrev = pSnapshot->pGlobalObjs;
+    while (pTrav)
+    {
+        if (pTrav->obj.object == object)
+        {
+            snapshot_remove_obj_type(pSnapshot, object, pTrav->obj.objType);
+            pPrev->pNextGlobal = pTrav->pNextGlobal;
+            // update HEAD of global list if needed
+            if (pSnapshot->pGlobalObjs == pTrav)
+            {
+                pSnapshot->pGlobalObjs = pTrav->pNextGlobal;
+            }
+            assert(pSnapshot->globalObjCount > 0);
+            pSnapshot->globalObjCount--;
+            return pTrav;
+        }
+        pPrev = pTrav;
+        pTrav = pTrav->pNextGlobal;
+    }
+
+    // Object not found.
+    char str[1024];
+    sprintf(str, "Object %p was not found in the created object list. It should be added as a deleted object.", (void*)object);
+    layerCbMsg(VK_DBG_MSG_UNKNOWN, VK_VALIDATION_LEVEL_0, object, 0, VKTRACESNAPSHOT_UNKNOWN_OBJECT, LAYER_ABBREV_STR, str);
+    return NULL;
+}
+
+// Add a new deleted object node to the list
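+// (The delta keeps deleted objects so it can later be merged against a full
+// snapshot; see the TODO in ll_increment_use_count().)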
+static void snapshot_insert_deleted_object(VKTRACE_VK_SNAPSHOT* pSnapshot, VkObject object, VkObjectType type)
+{
+    // Create a new node
+    VKTRACE_VK_SNAPSHOT_DELETED_OBJ_NODE* pNewObjNode = (VKTRACE_VK_SNAPSHOT_DELETED_OBJ_NODE*)malloc(sizeof(VKTRACE_VK_SNAPSHOT_DELETED_OBJ_NODE));
+    memset(pNewObjNode, 0, sizeof(VKTRACE_VK_SNAPSHOT_DELETED_OBJ_NODE));
+    pNewObjNode->objType = type;
+    pNewObjNode->object = object;
+
+    // insert at front of list
+    pNewObjNode->pNextObj = pSnapshot->pDeltaDeletedObjects;
+    pSnapshot->pDeltaDeletedObjects = pNewObjNode;
+
+    // increment count
+    pSnapshot->deltaDeletedObjectCount++;
+}
+
+// Note: the parameters after pSnapshot match the order of vkCreateDevice(..)
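+// The device node is also threaded onto pSnapshot->pDevices, reusing the
+// node's pNextObj link, and its pStruct member holds the deep-copied
+// vkCreateDevice() parameters.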
+static void snapshot_insert_device(VKTRACE_VK_SNAPSHOT* pSnapshot, VkPhysicalDevice physicalDevice, const VkDeviceCreateInfo* pCreateInfo, VkDevice* pDevice)
+{
+    VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(pSnapshot, *pDevice, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    pNode->obj.pStruct = malloc(sizeof(VKTRACE_VK_SNAPSHOT_DEVICE_NODE));
+
+    VKTRACE_VK_SNAPSHOT_DEVICE_NODE* pDevNode = (VKTRACE_VK_SNAPSHOT_DEVICE_NODE*)pNode->obj.pStruct;
+    vktrace_vk_snapshot_copy_createdevice_params(&pDevNode->params, physicalDevice, pCreateInfo, pDevice);
+
+    // insert at front of device list
+    pNode->pNextObj = pSnapshot->pDevices;
+    pSnapshot->pDevices = pNode;
+
+    // increment count
+    pSnapshot->deviceCount++;
+}
+
+static void snapshot_remove_device(VKTRACE_VK_SNAPSHOT* pSnapshot, VkDevice device)
+{
+    VKTRACE_VK_SNAPSHOT_LL_NODE* pFoundObject = snapshot_remove_object(pSnapshot, device);
+
+    if (pFoundObject != NULL)
+    {
+        VKTRACE_VK_SNAPSHOT_LL_NODE *pTrav = pSnapshot->pDevices;
+        VKTRACE_VK_SNAPSHOT_LL_NODE *pPrev = pSnapshot->pDevices;
+        while (pTrav != NULL)
+        {
+            if (pTrav->obj.object == device)
+            {
+                pPrev->pNextObj = pTrav->pNextObj;
+                // update HEAD of Obj list as needed
+                if (pSnapshot->pDevices == pTrav)
+                    pSnapshot->pDevices = pTrav->pNextObj;
+
+                // delete the object
+                if (pTrav->obj.pStruct != NULL)
+                {
+                    VKTRACE_VK_SNAPSHOT_DEVICE_NODE* pDevNode = (VKTRACE_VK_SNAPSHOT_DEVICE_NODE*)pTrav->obj.pStruct;
+                    vktrace_vk_snapshot_destroy_createdevice_params(&pDevNode->params);
+                    free(pDevNode);
+                }
+                free(pTrav);
+
+                if (pSnapshot->deviceCount > 0)
+                {
+                    pSnapshot->deviceCount--;
+                }
+                else
+                {
+                    // TODO: Callback WARNING that too many devices were deleted
+                    assert(!"DeviceCount <= 0 means that too many devices were deleted.");
+                }
+                return;
+            }
+            pPrev = pTrav;
+            pTrav = pTrav->pNextObj;
+        }
+    }
+
+    // If the code got here, the device wasn't tracked in the devices list,
+    // so record it on the deleted-objects list instead.
+    snapshot_insert_deleted_object(pSnapshot, device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+}
+
+// Traverse global list and return type for given object
+static VkObjectType ll_get_obj_type(VkObject object) {
+    VKTRACE_VK_SNAPSHOT_LL_NODE *pTrav = s_delta.pGlobalObjs;
+    while (pTrav) {
+        if (pTrav->obj.object == object)
+            return pTrav->obj.objType;
+        pTrav = pTrav->pNextGlobal;
+    }
+    char str[1024];
+    sprintf(str, "Attempting look-up on obj %p but it is NOT in the global list!", (void*)object);
+    layerCbMsg(VK_DBG_MSG_ERROR, VK_VALIDATION_LEVEL_0, object, 0, VKTRACESNAPSHOT_MISSING_OBJECT, LAYER_ABBREV_STR, str);
+    return (VkObjectType)-1;
+}
+
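+// Bump the use count on an object, looked up on its per-type list.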
+static void ll_increment_use_count(VkObject object, VkObjectType objType) {
+    VKTRACE_VK_SNAPSHOT_LL_NODE *pTrav = s_delta.pObjectHead[objType];
+    while (pTrav) {
+        if (pTrav->obj.object == object) {
+            pTrav->obj.numUses++;
+            return;
+        }
+        pTrav = pTrav->pNextObj;
+    }
+
+    // If we do not find obj, insert it and then increment count
+    // TODO: we can't just create the object, because we don't know what it was created with.
+    // Instead, we need to make a list of referenced objects. When the delta is merged with a snapshot, we'll need
+    // to confirm that the referenced objects actually exist in the snapshot; otherwise I guess the merge should fail.
+    char str[1024];
+    sprintf(str, "Unable to increment count for obj %p, will add to list as %s type and increment count", (void*)object, string_VK_OBJECT_TYPE(objType));
+    layerCbMsg(VK_DBG_MSG_WARNING, VK_VALIDATION_LEVEL_0, object, 0, VKTRACESNAPSHOT_UNKNOWN_OBJECT, LAYER_ABBREV_STR, str);
+
+//    ll_insert_obj(pObj, objType);
+//    ll_increment_use_count(pObj, objType);
+}
+
+// Set selected flag state for an object node
+static void set_status(VkObject object, VkObjectType objType, OBJECT_STATUS status_flag) {
+    if ((void*)object != NULL) {
+        VKTRACE_VK_SNAPSHOT_LL_NODE *pTrav = s_delta.pObjectHead[objType];
+        while (pTrav) {
+            if (pTrav->obj.object == object) {
+                pTrav->obj.status |= status_flag;
+                return;
+            }
+            pTrav = pTrav->pNextObj;
+        }
+
+        // If we do not find it print an error
+        char str[1024];
+        sprintf(str, "Unable to set status for non-existent object %p of %s type", (void*)object, string_VK_OBJECT_TYPE(objType));
+        layerCbMsg(VK_DBG_MSG_ERROR, VK_VALIDATION_LEVEL_0, object, 0, VKTRACESNAPSHOT_UNKNOWN_OBJECT, LAYER_ABBREV_STR, str);
+    }
+}
+
+// Track selected state for a command buffer object node.
+// Note: stateBindPoint is not used yet; currently this only verifies that the
+// command buffer exists in the snapshot.
+static void track_object_status(VkObject object, VkStateBindPoint stateBindPoint) {
+    VKTRACE_VK_SNAPSHOT_LL_NODE *pTrav = s_delta.pObjectHead[VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT];
+
+    while (pTrav) {
+        if (pTrav->obj.object == object) {
+            return;
+        }
+        pTrav = pTrav->pNextObj;
+    }
+
+    // If we do not find it print an error
+    char str[1024];
+    sprintf(str, "Unable to track status for non-existent Command Buffer object %p", (void*)object);
+    layerCbMsg(VK_DBG_MSG_ERROR, VK_VALIDATION_LEVEL_0, object, 0, VKTRACESNAPSHOT_UNKNOWN_OBJECT, LAYER_ABBREV_STR, str);
+}
+
+// Reset selected flag state for an object node
+static void reset_status(VkObject object, VkObjectType objType, OBJECT_STATUS status_flag) {
+    VKTRACE_VK_SNAPSHOT_LL_NODE *pTrav = s_delta.pObjectHead[objType];
+    while (pTrav) {
+        if (pTrav->obj.object == object) {
+            pTrav->obj.status &= ~status_flag;
+            return;
+        }
+        pTrav = pTrav->pNextObj;
+    }
+
+    // If we do not find it print an error
+    char str[1024];
+    sprintf(str, "Unable to reset status for non-existent object %p of %s type", (void*)object, string_VK_OBJECT_TYPE(objType));
+    layerCbMsg(VK_DBG_MSG_ERROR, VK_VALIDATION_LEVEL_0, object, 0, VKTRACESNAPSHOT_UNKNOWN_OBJECT, LAYER_ABBREV_STR, str);
+}
+
+#include "vk_dispatch_table_helper.h"
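+// One-time layer initialization: read the layer's reporting options, open the
+// optional log file, build the dispatch table that forwards calls to the next
+// layer/ICD, and create the lock that guards the snapshot lists.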
+static void initVktraceSnapshot(void)
+{
+    const char *strOpt;
+    // initialize VktraceSnapshot options
+    getLayerOptionEnum(LAYER_NAME_STR "ReportLevel", (uint32_t *) &g_reportingLevel);
+    g_actionIsDefault = getLayerOptionEnum(LAYER_NAME_STR "DebugAction", (uint32_t *) &g_debugAction);
+
+    if (g_debugAction & VK_DBG_LAYER_ACTION_LOG_MSG)
+    {
+        strOpt = getLayerOption(LAYER_NAME_STR "LogFilename");
+        if (strOpt)
+        {
+            g_logFile = fopen(strOpt, "w");
+        }
+        if (g_logFile == NULL)
+            g_logFile = stdout;
+    }
+
+    PFN_vkGetProcAddr fpNextGPA;
+    fpNextGPA = pCurObj->pGPA;
+    assert(fpNextGPA);
+
+    layer_init_device_dispatch_table(&nextTable, fpNextGPA, (VkPhysicalDevice) pCurObj->nextObject);
+    if (!objLockInitialized)
+    {
+        // TODO/TBD: Need to delete this mutex sometime.  How???
+        loader_platform_thread_create_mutex(&objLock);
+        objLockInitialized = 1;
+    }
+}
+
+//=============================================================================
+// Vulkan entrypoints
+//=============================================================================
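+// Each entrypoint below follows a common pattern: objLock is held only while
+// the snapshot's lists are being touched, the call is forwarded down the
+// chain via nextTable, and create calls record the new object only when the
+// downstream call returns VK_SUCCESS. (Note that set_status()/reset_status()
+// are currently invoked without holding objLock.)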
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateInstance(const VkInstanceCreateInfo* pCreateInfo, VkInstance* pInstance)
+{
+    VkResult result = nextTable.CreateInstance(pCreateInfo, pInstance);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        snapshot_insert_object(&s_delta, *pInstance, VK_DEBUG_REPORT_OBJECT_TYPE_INSTANCE_EXT);
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkDestroyInstance(VkInstance instance)
+{
+    nextTable.DestroyInstance(instance);
+    loader_platform_thread_lock_mutex(&objLock);
+    snapshot_remove_object(&s_delta, instance);
+    loader_platform_thread_unlock_mutex(&objLock);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEnumeratePhysicalDevices(VkInstance instance, uint32_t* pPhysicalDeviceCount, VkPhysicalDevice* pPhysicalDevices)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(instance, VK_DEBUG_REPORT_OBJECT_TYPE_INSTANCE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.EnumeratePhysicalDevices(instance, pPhysicalDeviceCount, pPhysicalDevices);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceInfo(VkPhysicalDevice gpu, VkPhysicalDeviceInfoType infoType, size_t* pDataSize, void* pData)
+{
+    VkBaseLayerObject* gpuw = (VkBaseLayerObject *) gpu;
+    pCurObj = gpuw;
+    loader_platform_thread_once(&tabOnce, initVktraceSnapshot);
+    VkResult result = nextTable.GetPhysicalDeviceInfo((VkPhysicalDevice)gpuw->nextObject, infoType, pDataSize, pData);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateDevice(VkPhysicalDevice gpu, const VkDeviceCreateInfo* pCreateInfo, VkDevice* pDevice)
+{
+    VkBaseLayerObject* gpuw = (VkBaseLayerObject *) gpu;
+    pCurObj = gpuw;
+    loader_platform_thread_once(&tabOnce, initVktraceSnapshot);
+    VkResult result = nextTable.CreateDevice((VkPhysicalDevice)gpuw->nextObject, pCreateInfo, pDevice);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        snapshot_insert_device(&s_delta, gpu, pCreateInfo, pDevice);
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkDestroyDevice(VkDevice device)
+{
+    nextTable.DestroyDevice(device);
+    loader_platform_thread_lock_mutex(&objLock);
+    snapshot_remove_device(&s_delta, device);
+    loader_platform_thread_unlock_mutex(&objLock);
+
+    // Report any remaining objects in LL
+    VKTRACE_VK_SNAPSHOT_LL_NODE *pTrav = s_delta.pGlobalObjs;
+    while (pTrav != NULL)
+    {
+//        if (pTrav->obj.objType == VK_DEBUG_REPORT_OBJECT_TYPE_SWAP_CHAIN_IMAGE_WSI ||
+//            pTrav->obj.objType == VK_DEBUG_REPORT_OBJECT_TYPE_SWAP_CHAIN_MEMORY_WSI)
+//        {
+//            VKTRACE_VK_SNAPSHOT_LL_NODE *pDel = pTrav;
+//            pTrav = pTrav->pNextGlobal;
+//            snapshot_remove_object(&s_delta, (void*)(pDel->obj.pVkObject));
+//        } else {
+            char str[1024];
+            sprintf(str, "OBJ ERROR : %s object %p has not been destroyed (was used %lu times).", string_VK_OBJECT_TYPE(pTrav->obj.objType), (void*)pTrav->obj.object, pTrav->obj.numUses);
+            layerCbMsg(VK_DBG_MSG_ERROR, VK_VALIDATION_LEVEL_0, device, 0, VKTRACESNAPSHOT_OBJECT_LEAK, LAYER_ABBREV_STR, str);
+            pTrav = pTrav->pNextGlobal;
+//        }
+    }
+}
+
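+// With a NULL physicalDevice, the loader is asking this layer to enumerate
+// itself; otherwise the call is forwarded down the chain.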
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateLayers(VkPhysicalDevice physicalDevice, size_t maxStringSize, size_t* pOutLayerCount, char* const* pOutLayers, void* pReserved)
+{
+    if ((void*)physicalDevice != NULL)
+    {
+        VkBaseLayerObject* gpuw = (VkBaseLayerObject *) physicalDevice;
+        loader_platform_thread_lock_mutex(&objLock);
+        ll_increment_use_count(physicalDevice, VK_DEBUG_REPORT_OBJECT_TYPE_PHYSICAL_DEVICE_EXT);
+        loader_platform_thread_unlock_mutex(&objLock);
+        pCurObj = gpuw;
+        loader_platform_thread_once(&tabOnce, initVktraceSnapshot);
+        VkResult result = nextTable.EnumerateLayers((VkPhysicalDevice)gpuw->nextObject, maxStringSize, pOutLayerCount, pOutLayers, pReserved);
+        return result;
+    } else {
+        if (pOutLayerCount == NULL || pOutLayers == NULL || pOutLayers[0] == NULL)
+        {
+            return VK_ERROR_INVALID_POINTER;
+        }
+        // This layer compatible with all devices
+        *pOutLayerCount = 1;
+        strncpy((char *) pOutLayers[0], LAYER_NAME_STR, maxStringSize);
+        return VK_SUCCESS;
+    }
+}
+struct extProps {
+    uint32_t version;
+    const char * const name;
+};
+
+#define VKTRACE_SNAPSHOT_LAYER_EXT_ARRAY_SIZE 1
+static const struct extProps mtExts[VKTRACE_SNAPSHOT_LAYER_EXT_ARRAY_SIZE] = {
+    // TODO what is the version?
+    { 0x10, LAYER_NAME_STR }
+};
+
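+// Standard two-call idiom: callers first query the required *pDataSize with
+// pData == NULL, then call again with a buffer of that size.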
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkGetGlobalExtensionInfo(
+                                               VkExtensionInfoType infoType,
+                                               uint32_t extensionIndex,
+                                               size_t*  pDataSize,
+                                               void*    pData)
+{
+    /* This entrypoint is NOT going to init its own dispatch table since the loader calls here early */
+    VkExtensionProperties *ext_props;
+    uint32_t *count;
+
+    if (pDataSize == NULL)
+        return VK_ERROR_INVALID_POINTER;
+
+    switch (infoType) {
+        case VK_EXTENSION_INFO_TYPE_COUNT:
+            *pDataSize = sizeof(uint32_t);
+            if (pData == NULL)
+                return VK_SUCCESS;
+            count = (uint32_t *) pData;
+            *count = VKTRACE_SNAPSHOT_LAYER_EXT_ARRAY_SIZE;
+            break;
+        case VK_EXTENSION_INFO_TYPE_PROPERTIES:
+            *pDataSize = sizeof(VkExtensionProperties);
+            if (pData == NULL)
+                return VK_SUCCESS;
+            if (extensionIndex >= VKTRACE_SNAPSHOT_LAYER_EXT_ARRAY_SIZE)
+                return VK_ERROR_INVALID_VALUE;
+            ext_props = (VkExtensionProperties *) pData;
+            ext_props->version = mtExts[extensionIndex].version;
+            strncpy(ext_props->extensionName, mtExts[extensionIndex].name,
+                                        VK_MAX_EXTENSION_NAME_SIZE);
+            ext_props->extensionName[VK_MAX_EXTENSION_NAME_SIZE - 1] = '\0';
+            break;
+        default:
+            return VK_ERROR_INVALID_VALUE;
+    }
+
+    return VK_SUCCESS;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkGetDeviceQueue(VkDevice device, uint32_t queueNodeIndex, uint32_t queueIndex, VkQueue* pQueue)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.GetDeviceQueue(device, queueNodeIndex, queueIndex, pQueue);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkQueueSubmit(VkQueue queue, uint32_t commandBufferCount, const VkCommandBuffer* pCommandBuffers, VkFence fence)
+{
+    set_status(fence, VK_DEBUG_REPORT_OBJECT_TYPE_FENCE_EXT, OBJSTATUS_FENCE_IS_SUBMITTED);
+    VkResult result = nextTable.QueueSubmit(queue, commandBufferCount, pCommandBuffers, fence);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkQueueWaitIdle(VkQueue queue)
+{
+    VkResult result = nextTable.QueueWaitIdle(queue);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkDeviceWaitIdle(VkDevice device)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.DeviceWaitIdle(device);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkAllocateMemory(VkDevice device, const VkMemoryAllocateInfo* pAllocateInfo, VkDeviceMemory* pMemory)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.AllocateMemory(device, pAllocateInfo, pMemory);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pMemory, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_MEMORY_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkFreeMemory(VkDevice device, VkDeviceMemory mem)
+{
+    nextTable.FreeMemory(device, mem);
+    loader_platform_thread_lock_mutex(&objLock);
+    snapshot_remove_object(&s_delta, mem);
+    loader_platform_thread_unlock_mutex(&objLock);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkSetMemoryPriority(VkDevice device, VkDeviceMemory mem, VkMemoryPriority priority)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(mem, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_MEMORY_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.SetMemoryPriority(device, mem, priority);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkMapMemory(VkDevice device, VkDeviceMemory mem, VkDeviceSize offset, VkDeviceSize size, VkMemoryMapFlags flags, void** ppData)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(mem, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_MEMORY_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    set_status(mem, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_MEMORY_EXT, OBJSTATUS_GPU_MEM_MAPPED);
+    VkResult result = nextTable.MapMemory(device, mem, offset, size, flags, ppData);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkUnmapMemory(VkDevice device, VkDeviceMemory mem)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(mem, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_MEMORY_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    reset_status(mem, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_MEMORY_EXT, OBJSTATUS_GPU_MEM_MAPPED);
+    nextTable.UnmapMemory(device, mem);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkPinSystemMemory(VkDevice device, const void* pSysMem, size_t memSize, VkDeviceMemory* pMemory)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.PinSystemMemory(device, pSysMem, memSize, pMemory);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkGetMultiDeviceCompatibility(VkPhysicalDevice physicalDevice0, VkPhysicalDevice physicalDevice1, VkPhysicalDeviceCompatibilityInfo* pInfo)
+{
+    VkBaseLayerObject* gpuw = (VkBaseLayerObject *) physicalDevice0;
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(physicalDevice0, VK_DEBUG_REPORT_OBJECT_TYPE_PHYSICAL_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    pCurObj = gpuw;
+    loader_platform_thread_once(&tabOnce, initVktraceSnapshot);
+    VkResult result = nextTable.GetMultiDeviceCompatibility((VkPhysicalDevice)gpuw->nextObject, physicalDevice1, pInfo);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkOpenSharedMemory(VkDevice device, const VkMemoryOpenInfo* pOpenInfo, VkDeviceMemory* pMemory)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.OpenSharedMemory(device, pOpenInfo, pMemory);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkOpenSharedSemaphore(VkDevice device, const VkSemaphoreOpenInfo* pOpenInfo, VkSemaphore* pSemaphore)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.OpenSharedSemaphore(device, pOpenInfo, pSemaphore);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkOpenPeerMemory(VkDevice device, const VkPeerMemoryOpenInfo* pOpenInfo, VkDeviceMemory* pMemory)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.OpenPeerMemory(device, pOpenInfo, pMemory);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkOpenPeerImage(VkDevice device, const VkPeerImageOpenInfo* pOpenInfo, VkImage* pImage, VkDeviceMemory* pMemory)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.OpenPeerImage(device, pOpenInfo, pImage, pMemory);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkDestroyObject(VkDevice device, VkObjectType objType, VkObject object)
+{
+    nextTable.DestroyObject(device, objType, object);
+    loader_platform_thread_lock_mutex(&objLock);
+    snapshot_remove_object(&s_delta, object);
+    loader_platform_thread_unlock_mutex(&objLock);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkGetObjectInfo(VkDevice device, VkObjectType objType, VkObject object, VkObjectInfoType infoType, size_t* pDataSize, void* pData)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(object, ll_get_obj_type(object));
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.GetObjectInfo(device, objType, object, infoType, pDataSize, pData);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkQueueBindObjectMemory(VkQueue queue, VkObjectType objType, VkObject object, uint32_t allocationIdx, VkDeviceMemory mem, VkDeviceSize offset)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(object, ll_get_obj_type(object));
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.QueueBindObjectMemory(queue, objType, object, allocationIdx, mem, offset);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkQueueBindObjectMemoryRange(VkQueue queue, VkObjectType objType, VkObject object, uint32_t allocationIdx, VkDeviceSize rangeOffset, VkDeviceSize rangeSize, VkDeviceMemory mem, VkDeviceSize memoryOffset)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(object, ll_get_obj_type(object));
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.QueueBindObjectMemoryRange(queue, objType, object, allocationIdx, rangeOffset, rangeSize, mem, memoryOffset);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkQueueBindImageMemoryRange(VkQueue queue, VkImage image, uint32_t allocationIdx, const VkImageMemoryBindInfo* pBindInfo, VkDeviceMemory mem, VkDeviceSize memoryOffset)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(image, VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.QueueBindImageMemoryRange(queue, image, allocationIdx, pBindInfo, mem, memoryOffset);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateFence(VkDevice device, const VkFenceCreateInfo* pCreateInfo, VkFence* pFence)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateFence(device, pCreateInfo, pFence);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pFence, VK_DEBUG_REPORT_OBJECT_TYPE_FENCE_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkGetFenceStatus(VkDevice device, VkFence fence)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(fence, VK_DEBUG_REPORT_OBJECT_TYPE_FENCE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    // Warn if submitted_flag is not set
+    VkResult result = nextTable.GetFenceStatus(device, fence);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkWaitForFences(VkDevice device, uint32_t fenceCount, const VkFence* pFences, bool32_t waitAll, uint64_t timeout)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.WaitForFences(device, fenceCount, pFences, waitAll, timeout);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateSemaphore(VkDevice device, const VkSemaphoreCreateInfo* pCreateInfo, VkSemaphore* pSemaphore)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateSemaphore(device, pCreateInfo, pSemaphore);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pSemaphore, VK_DEBUG_REPORT_OBJECT_TYPE_SEMAPHORE_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateEvent(VkDevice device, const VkEventCreateInfo* pCreateInfo, VkEvent* pEvent)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateEvent(device, pCreateInfo, pEvent);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pEvent, VK_DEBUG_REPORT_OBJECT_TYPE_EVENT_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkGetEventStatus(VkDevice device, VkEvent event)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(event, VK_DEBUG_REPORT_OBJECT_TYPE_EVENT_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.GetEventStatus(device, event);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkSetEvent(VkDevice device, VkEvent event)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(event, VK_DEBUG_REPORT_OBJECT_TYPE_EVENT_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.SetEvent(device, event);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkResetEvent(VkDevice device, VkEvent event)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(event, VK_DEBUG_REPORT_OBJECT_TYPE_EVENT_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.ResetEvent(device, event);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateQueryPool(VkDevice device, const VkQueryPoolCreateInfo* pCreateInfo, VkQueryPool* pQueryPool)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateQueryPool(device, pCreateInfo, pQueryPool);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pQueryPool, VK_DEBUG_REPORT_OBJECT_TYPE_QUERY_POOL_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkGetQueryPoolResults(VkDevice device, VkQueryPool queryPool, uint32_t firstQuery, uint32_t queryCount, size_t* pDataSize, void* pData, VkQueryResultFlags flags)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(queryPool, VK_DEBUG_REPORT_OBJECT_TYPE_QUERY_POOL_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.GetQueryPoolResults(device, queryPool, firstQuery, queryCount, pDataSize, pData, flags);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkGetFormatInfo(VkDevice device, VkFormat format, VkFormatInfoType infoType, size_t* pDataSize, void* pData)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.GetFormatInfo(device, format, infoType, pDataSize, pData);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateBuffer(VkDevice device, const VkBufferCreateInfo* pCreateInfo, VkBuffer* pBuffer)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateBuffer(device, pCreateInfo, pBuffer);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_BUFFER_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateBufferView(VkDevice device, const VkBufferViewCreateInfo* pCreateInfo, VkBufferView* pView)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateBufferView(device, pCreateInfo, pView);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pView, VK_DEBUG_REPORT_OBJECT_TYPE_BUFFER_VIEW_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateImage(VkDevice device, const VkImageCreateInfo* pCreateInfo, VkImage* pImage)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateImage(device, pCreateInfo, pImage);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pImage, VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkGetImageSubresourceInfo(VkDevice device, VkImage image, const VkImageSubresource* pSubresource, VkSubresourceInfoType infoType, size_t* pDataSize, void* pData)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(image, VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.GetImageSubresourceInfo(device, image, pSubresource, infoType, pDataSize, pData);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateImageView(VkDevice device, const VkImageViewCreateInfo* pCreateInfo, VkImageView* pView)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateImageView(device, pCreateInfo, pView);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pView, VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_VIEW_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateColorAttachmentView(VkDevice device, const VkColorAttachmentViewCreateInfo* pCreateInfo, VkColorAttachmentView* pView)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateColorAttachmentView(device, pCreateInfo, pView);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pView, VK_DEBUG_REPORT_OBJECT_TYPE_COLOR_ATTACHMENT_VIEW);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateDepthStencilView(VkDevice device, const VkDepthStencilViewCreateInfo* pCreateInfo, VkDepthStencilView* pView)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateDepthStencilView(device, pCreateInfo, pView);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pView, VK_DEBUG_REPORT_OBJECT_TYPE_DEPTH_STENCIL_VIEW);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateGraphicsPipeline(VkDevice device, const VkGraphicsPipelineCreateInfo* pCreateInfo, VkPipeline* pPipeline)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateGraphicsPipeline(device, pCreateInfo, pPipeline);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pPipeline, VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateComputePipeline(VkDevice device, const VkComputePipelineCreateInfo* pCreateInfo, VkPipeline* pPipeline)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateComputePipeline(device, pCreateInfo, pPipeline);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pPipeline, VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkStorePipeline(VkDevice device, VkPipeline pipeline, size_t* pDataSize, void* pData)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(pipeline, VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.StorePipeline(device, pipeline, pDataSize, pData);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkLoadPipeline(VkDevice device, size_t dataSize, const void* pData, VkPipeline* pPipeline)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.LoadPipeline(device, dataSize, pData, pPipeline);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateSampler(VkDevice device, const VkSamplerCreateInfo* pCreateInfo, VkSampler* pSampler)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateSampler(device, pCreateInfo, pSampler);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pSampler, VK_DEBUG_REPORT_OBJECT_TYPE_SAMPLER_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateDescriptorSetLayout( VkDevice device, const VkDescriptorSetLayoutCreateInfo* pCreateInfo, VkDescriptorSetLayout* pSetLayout)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateDescriptorSetLayout(device, pCreateInfo, pSetLayout);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pSetLayout, VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_SET_LAYOUT_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkBeginDescriptorPoolUpdate(VkDevice device, VkDescriptorUpdateMode updateMode)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.BeginDescriptorPoolUpdate(device, updateMode);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEndDescriptorPoolUpdate(VkDevice device, VkCommandBuffer cmd)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.EndDescriptorPoolUpdate(device, cmd);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateDescriptorPool(VkDevice device, VkDescriptorPoolUsage poolUsage, uint32_t maxSets, const VkDescriptorPoolCreateInfo* pCreateInfo, VkDescriptorPool* pDescriptorPool)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateDescriptorPool(device, poolUsage, maxSets, pCreateInfo, pDescriptorPool);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pDescriptorPool, VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_POOL_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkResetDescriptorPool(VkDevice device, VkDescriptorPool descriptorPool)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(descriptorPool, VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_POOL_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.ResetDescriptorPool(device, descriptorPool);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkAllocateDescriptorSets(VkDevice device, VkDescriptorPool descriptorPool, VkDescriptorSetUsage setUsage, uint32_t count, const VkDescriptorSetLayout* pSetLayouts, VkDescriptorSet* pDescriptorSets, uint32_t* pCount)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(descriptorPool, VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_POOL_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.AllocateDescriptorSets(device, descriptorPool, setUsage, count, pSetLayouts, pDescriptorSets, pCount);
+    if (result == VK_SUCCESS)
+    {
+        for (uint32_t i = 0; i < *pCount; i++) {
+            loader_platform_thread_lock_mutex(&objLock);
+            VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, pDescriptorSets[i], VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_SET_EXT);
+            pNode->obj.pStruct = NULL;
+            loader_platform_thread_unlock_mutex(&objLock);
+        }
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkClearDescriptorSets(VkDevice device, VkDescriptorPool descriptorPool, uint32_t count, const VkDescriptorSet* pDescriptorSets)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(descriptorPool, VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_POOL_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.ClearDescriptorSets(device, descriptorPool, count, pDescriptorSets);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkUpdateDescriptors(VkDevice device, VkDescriptorSet descriptorSet, uint32_t updateCount, const void** ppUpdateArray)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(descriptorSet, VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_SET_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.UpdateDescriptors(device, descriptorSet, updateCount, ppUpdateArray);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkAllocateCommandBuffers(VkDevice device, const VkCommandBufferAllocateInfo* pCreateInfo, VkCommandBuffer* pCommandBuffer)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.AllocateCommandBuffers(device, pCreateInfo, pCommandBuffer);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pCommandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkBeginCommandBuffer(VkCommandBuffer commandBuffer, const VkCommandBufferBeginInfo* pBeginInfo)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.BeginCommandBuffer(commandBuffer, pBeginInfo);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEndCommandBuffer(VkCommandBuffer commandBuffer)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.EndCommandBuffer(commandBuffer);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkResetCommandBuffer(VkCommandBuffer commandBuffer)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.ResetCommandBuffer(commandBuffer);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdBindPipeline(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipelineBindPoint, VkPipeline pipeline)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdBindPipeline(commandBuffer, pipelineBindPoint, pipeline);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdBindDescriptorSets(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipelineBindPoint, uint32_t firstSet, uint32_t setCount, const VkDescriptorSet* pDescriptorSets, uint32_t dynamicOffsetCount, const uint32_t* pDynamicOffsets)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdBindDescriptorSets(commandBuffer, pipelineBindPoint, firstSet, setCount, pDescriptorSets, dynamicOffsetCount, pDynamicOffsets);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdBindVertexBuffers(
+    VkCommandBuffer                                 commandBuffer,
+    uint32_t                                        firstBinding,
+    uint32_t                                        bindingCount,
+    const VkBuffer*                                 pBuffers,
+    const VkDeviceSize*                             pOffsets)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdBindVertexBuffers(commandBuffer, firstBinding, bindingCount, pBuffers, pOffsets);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdBindIndexBuffer(VkCommandBuffer commandBuffer, VkBuffer buffer, VkDeviceSize offset, VkIndexType indexType)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdBindIndexBuffer(commandBuffer, buffer, offset, indexType);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdDraw(VkCommandBuffer commandBuffer, uint32_t firstVertex, uint32_t vertexCount, uint32_t firstInstance, uint32_t instanceCount)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdDraw(commandBuffer, firstVertex, vertexCount, firstInstance, instanceCount);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdDrawIndexed(VkCommandBuffer commandBuffer, uint32_t indexCount, uint32_t instanceCount, uint32_t firstIndex, int32_t vertexOffset, uint32_t firstInstance)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdDrawIndexed(commandBuffer, indexCount, instanceCount, firstIndex, vertexOffset, firstInstance);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdDrawIndirect(VkCommandBuffer commandBuffer, VkBuffer buffer, VkDeviceSize offset, uint32_t count, uint32_t stride)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdDrawIndirect(commandBuffer, buffer, offset, count, stride);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdDrawIndexedIndirect(VkCommandBuffer commandBuffer, VkBuffer buffer, VkDeviceSize offset, uint32_t count, uint32_t stride)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdDrawIndexedIndirect(commandBuffer, buffer, offset, count, stride);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdDispatch(VkCommandBuffer commandBuffer, uint32_t x, uint32_t y, uint32_t z)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdDispatch(commandBuffer, x, y, z);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdDispatchIndirect(VkCommandBuffer commandBuffer, VkBuffer buffer, VkDeviceSize offset)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdDispatchIndirect(commandBuffer, buffer, offset);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdCopyBuffer(VkCommandBuffer commandBuffer, VkBuffer srcBuffer, VkBuffer dstBuffer, uint32_t regionCount, const VkBufferCopy* pRegions)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, regionCount, pRegions);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdCopyImage(VkCommandBuffer commandBuffer, VkImage srcImage, VkImageLayout srcImageLayout, VkImage dstImage, VkImageLayout dstImageLayout, uint32_t regionCount, const VkImageCopy* pRegions)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdCopyImage(commandBuffer, srcImage, srcImageLayout, dstImage, dstImageLayout, regionCount, pRegions);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdCopyBufferToImage(VkCommandBuffer commandBuffer, VkBuffer srcBuffer, VkImage dstImage, VkImageLayout dstImageLayout, uint32_t regionCount, const VkBufferImageCopy* pRegions)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdCopyBufferToImage(commandBuffer, srcBuffer, dstImage, dstImageLayout, regionCount, pRegions);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdCopyImageToBuffer(VkCommandBuffer commandBuffer, VkImage srcImage, VkImageLayout srcImageLayout, VkBuffer dstBuffer, uint32_t regionCount, const VkBufferImageCopy* pRegions)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdCopyImageToBuffer(commandBuffer, srcImage, srcImageLayout, dstBuffer, regionCount, pRegions);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdCloneImageData(VkCommandBuffer commandBuffer, VkImage srcImage, VkImageLayout srcImageLayout, VkImage dstImage, VkImageLayout dstImageLayout)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdCloneImageData(commandBuffer, srcImage, srcImageLayout, dstImage, dstImageLayout);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdUpdateBuffer(VkCommandBuffer commandBuffer, VkBuffer dstBuffer, VkDeviceSize dstOffset, VkDeviceSize dataSize, const uint32_t* pData)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdUpdateBuffer(commandBuffer, dstBuffer, dstOffset, dataSize, pData);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdFillBuffer(VkCommandBuffer commandBuffer, VkBuffer dstBuffer, VkDeviceSize dstOffset, VkDeviceSize size, uint32_t data)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdFillBuffer(commandBuffer, dstBuffer, dstOffset, size, data);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdClearColorImage(VkCommandBuffer commandBuffer, VkImage image, VkImageLayout imageLayout, VkClearColor color, uint32_t rangeCount, const VkImageSubresourceRange* pRanges)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdClearColorImage(commandBuffer, image, imageLayout, color, rangeCount, pRanges);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdClearDepthStencil(VkCommandBuffer commandBuffer, VkImage image, VkImageLayout imageLayout, float depth, uint32_t stencil, uint32_t rangeCount, const VkImageSubresourceRange* pRanges)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdClearDepthStencil(commandBuffer, image, imageLayout, depth, stencil, rangeCount, pRanges);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdResolveImage(VkCommandBuffer commandBuffer, VkImage srcImage, VkImageLayout srcImageLayout, VkImage dstImage, VkImageLayout dstImageLayout, uint32_t rectCount, const VkImageResolve* pRects)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdResolveImage(commandBuffer, srcImage, srcImageLayout, dstImage, dstImageLayout, rectCount, pRects);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdSetEvent(VkCommandBuffer commandBuffer, VkEvent event, VkPipeEvent pipeEvent)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdSetEvent(commandBuffer, event, pipeEvent);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdResetEvent(VkCommandBuffer commandBuffer, VkEvent event, VkPipeEvent pipeEvent)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdResetEvent(commandBuffer, event, pipeEvent);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdWaitEvents(
+        VkCommandBuffer                                 commandBuffer,
+        VkWaitEvent                                 waitEvent,
+        uint32_t                                    eventCount,
+        const VkEvent*                              pEvents,
+        uint32_t                                    memoryBarrierCount,
+        const void**                                ppMemoryBarriers)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdWaitEvents(commandBuffer, waitEvent, eventCount, pEvents, memoryBarrierCount, ppMemoryBarriers);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdPipelineBarrier(
+        VkCommandBuffer                                 commandBuffer,
+        VkWaitEvent                                 waitEvent,
+        uint32_t                                    pipeEventCount,
+        const VkPipeEvent*                          pPipeEvents,
+        uint32_t                                    memoryBarrierCount,
+        const void**                                ppMemoryBarriers)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdPipelineBarrier(commandBuffer, waitEvent, pipeEventCount, pPipeEvents, memoryBarrierCount, ppMemoryBarriers);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdBeginQuery(VkCommandBuffer commandBuffer, VkQueryPool queryPool, uint32_t slot, VkFlags flags)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdBeginQuery(commandBuffer, queryPool, slot, flags);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdEndQuery(VkCommandBuffer commandBuffer, VkQueryPool queryPool, uint32_t slot)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdEndQuery(commandBuffer, queryPool, slot);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdResetQueryPool(VkCommandBuffer commandBuffer, VkQueryPool queryPool, uint32_t firstQuery, uint32_t queryCount)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdResetQueryPool(commandBuffer, queryPool, firstQuery, queryCount);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdWriteTimestamp(VkCommandBuffer commandBuffer, VkPipelineStageFlagBits pipelineStage, VkQueryPool queryPool, uint32_t slot)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdWriteTimestamp(commandBuffer, pipelineStage, queryPool, slot);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdInitAtomicCounters(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipelineBindPoint, uint32_t startCounter, uint32_t counterCount, const uint32_t* pData)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdInitAtomicCounters(commandBuffer, pipelineBindPoint, startCounter, counterCount, pData);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdLoadAtomicCounters(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipelineBindPoint, uint32_t startCounter, uint32_t counterCount, VkBuffer srcBuffer, VkDeviceSize srcOffset)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdLoadAtomicCounters(commandBuffer, pipelineBindPoint, startCounter, counterCount, srcBuffer, srcOffset);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdSaveAtomicCounters(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipelineBindPoint, uint32_t startCounter, uint32_t counterCount, VkBuffer dstBuffer, VkDeviceSize dstOffset)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdSaveAtomicCounters(commandBuffer, pipelineBindPoint, startCounter, counterCount, dstBuffer, dstOffset);
+}
+
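+// Object-creating entry points additionally record the new handle in the
+// delta snapshot, but only when the downstream call reports VK_SUCCESS.
+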
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateFramebuffer(VkDevice device, const VkFramebufferCreateInfo* pCreateInfo, VkFramebuffer* pFramebuffer)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateFramebuffer(device, pCreateInfo, pFramebuffer);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pFramebuffer, VK_DEBUG_REPORT_OBJECT_TYPE_FRAMEBUFFER_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkCreateRenderPass(VkDevice device, const VkRenderPassCreateInfo* pCreateInfo, VkRenderPass* pRenderPass)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateRenderPass(device, pCreateInfo, pRenderPass);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pRenderPass, VK_DEBUG_REPORT_OBJECT_TYPE_RENDER_PASS_EXT);
+        pNode->obj.pStruct = NULL;
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdBeginRenderPass(VkCommandBuffer commandBuffer, const VkRenderPassBegin *pRenderPassBegin)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdBeginRenderPass(commandBuffer, pRenderPassBegin);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdEndRenderPass(VkCommandBuffer commandBuffer, VkRenderPass renderPass)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdEndRenderPass(commandBuffer, renderPass);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkDbgSetValidationLevel(VkDevice device, VkValidationLevel validationLevel)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.DbgSetValidationLevel(device, validationLevel);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkDbgRegisterMsgCallback(VkInstance instance, VK_DBG_MSG_CALLBACK_FUNCTION pfnMsgCallback, void* pUserData)
+{
+    // This layer intercepts callbacks
+    VK_LAYER_DBG_FUNCTION_NODE *pNewDbgFuncNode = (VK_LAYER_DBG_FUNCTION_NODE*)malloc(sizeof(VK_LAYER_DBG_FUNCTION_NODE));
+    if (pNewDbgFuncNode == NULL)
+        return VK_ERROR_OUT_OF_HOST_MEMORY;
+    pNewDbgFuncNode->pfnMsgCallback = pfnMsgCallback;
+    pNewDbgFuncNode->pUserData = pUserData;
+    pNewDbgFuncNode->pNext = g_pDbgFunctionHead;
+    g_pDbgFunctionHead = pNewDbgFuncNode;
+    // Force callbacks if the debug action has not been changed from its initial default value
+    if (g_actionIsDefault) {
+        g_debugAction = VK_DBG_LAYER_ACTION_CALLBACK;
+    }
+    VkResult result = nextTable.DbgRegisterMsgCallback(instance, pfnMsgCallback, pUserData);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkDbgUnregisterMsgCallback(VkInstance instance, VK_DBG_MSG_CALLBACK_FUNCTION pfnMsgCallback)
+{
+    VK_LAYER_DBG_FUNCTION_NODE *pTrav = g_pDbgFunctionHead;
+    VK_LAYER_DBG_FUNCTION_NODE *pPrev = pTrav;
+    while (pTrav) {
+        if (pTrav->pfnMsgCallback == pfnMsgCallback) {
+            pPrev->pNext = pTrav->pNext;
+            if (g_pDbgFunctionHead == pTrav)
+                g_pDbgFunctionHead = pTrav->pNext;
+            free(pTrav);
+            break;
+        }
+        pPrev = pTrav;
+        pTrav = pTrav->pNext;
+    }
+    if (g_pDbgFunctionHead == NULL)
+    {
+        if (g_actionIsDefault)
+            g_debugAction = VK_DBG_LAYER_ACTION_LOG_MSG;
+        else
+            g_debugAction &= ~VK_DBG_LAYER_ACTION_CALLBACK;
+    }
+    VkResult result = nextTable.DbgUnregisterMsgCallback(instance, pfnMsgCallback);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkDbgSetMessageFilter(VkDevice device, int32_t msgCode, VK_DBG_MSG_FILTER filter)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.DbgSetMessageFilter(device, msgCode, filter);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkDbgSetObjectTag(VkDevice device, VkObject object, size_t tagSize, const void* pTag)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(object, ll_get_obj_type(object));
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.DbgSetObjectTag(device, object, tagSize, pTag);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkDbgSetGlobalOption(VkInstance instance, VK_DBG_GLOBAL_OPTION dbgOption, size_t dataSize, const void* pData)
+{
+    VkResult result = nextTable.DbgSetGlobalOption(instance, dbgOption, dataSize, pData);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkDbgSetDeviceOption(VkDevice device, VK_DBG_DEVICE_OPTION dbgOption, size_t dataSize, const void* pData)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.DbgSetDeviceOption(device, dbgOption, dataSize, pData);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdDbgMarkerBegin(VkCommandBuffer commandBuffer, const char* pMarker)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdDbgMarkerBegin(commandBuffer, pMarker);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR void VKAPI_CALL vkCmdDbgMarkerEnd(VkCommandBuffer commandBuffer)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(commandBuffer, VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    nextTable.CmdDbgMarkerEnd(commandBuffer);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL xglGetDisplayInfoWSI(VkDisplayWSI display, VkDisplayInfoTypeWSI infoType, size_t* pDataSize, void* pData)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(display, VK_DEBUG_REPORT_OBJECT_TYPE_DISPLAY_WSI);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.GetDisplayInfoWSI(display, infoType, pDataSize, pData);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL xglCreateSwapChainWSI(VkDevice device, const VkSwapChainCreateInfoWSI* pCreateInfo, VkSwapChainWSI* pSwapChain)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(device, VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.CreateSwapChainWSI(device, pCreateInfo, pSwapChain);
+    if (result == VK_SUCCESS)
+    {
+        loader_platform_thread_lock_mutex(&objLock);
+
+#if 0
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pNode = snapshot_insert_object(&s_delta, *pImage, VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_EXT);
+        pNode->obj.pStruct = NULL;
+
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pMemNode = snapshot_insert_object(&s_delta, *pMemory, VK_DEBUG_REPORT_OBJECT_TYPE_PRESENTABLE_IMAGE_MEMORY);
+        pMemNode->obj.pStruct = NULL;
+#else
+        snapshot_insert_object(&s_delta, *pSwapChain, VK_DEBUG_REPORT_OBJECT_TYPE_SWAP_CHAIN_WSI);
+#endif
+
+        loader_platform_thread_unlock_mutex(&objLock);
+    }
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL xglDestroySwapChainWSI(VkSwapChainWSI swapChain)
+{
+    VkResult result = nextTable.DestroySwapChainWSI(swapChain);
+    loader_platform_thread_lock_mutex(&objLock);
+    snapshot_remove_object(&s_delta, swapChain);
+    loader_platform_thread_unlock_mutex(&objLock);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL xglGetSwapChainInfoWSI(VkSwapChainWSI swapChain, VkSwapChainInfoTypeWSI infoType, size_t* pDataSize, void* pData)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(swapChain, VK_DEBUG_REPORT_OBJECT_TYPE_SWAP_CHAIN_WSI);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.GetSwapChainInfoWSI(swapChain, infoType, pDataSize, pData);
+    return result;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL xglQueuePresentWSI(VkQueue queue, const VkPresentInfoWSI* pPresentInfo)
+{
+    loader_platform_thread_lock_mutex(&objLock);
+    ll_increment_use_count(queue, VK_DEBUG_REPORT_OBJECT_TYPE_QUEUE_EXT);
+    loader_platform_thread_unlock_mutex(&objLock);
+    VkResult result = nextTable.QueuePresentWSI(queue, pPresentInfo);
+    return result;
+}
+
+//=================================================================================================
+// Exported methods
+//=================================================================================================
+void vktraceSnapshotStartTracking(void)
+{
+    assert(!"Not Implemented");
+}
+
+//=================================================================================================
+VKTRACE_VK_SNAPSHOT vktraceSnapshotGetDelta(void)
+{
+    // copy the delta by merging it into an empty snapshot
+    VKTRACE_VK_SNAPSHOT empty;
+    memset(&empty, 0, sizeof(VKTRACE_VK_SNAPSHOT));
+
+    return vktraceSnapshotMerge(&s_delta, &empty);
+}
+
+//=================================================================================================
+VKTRACE_VK_SNAPSHOT vktraceSnapshotGetSnapshot(void)
+{
+    // copy the master snapshot by merging it into an empty snapshot
+    VKTRACE_VK_SNAPSHOT empty;
+    memset(&empty, 0, sizeof(VKTRACE_VK_SNAPSHOT));
+
+    return vktraceSnapshotMerge(&s_snapshot, &empty);
+}
+
+//=================================================================================================
+void vktraceSnapshotPrintDelta(void)
+{
+    char str[2048];
+    VKTRACE_VK_SNAPSHOT_LL_NODE* pTrav = s_delta.pGlobalObjs;
+    sprintf(str, "==== DELTA SNAPSHOT contains %lu objects, %lu devices, and %lu deleted objects", s_delta.globalObjCount, s_delta.deviceCount, s_delta.deltaDeletedObjectCount);
+    layerCbMsg(VK_DBG_MSG_UNKNOWN, VK_VALIDATION_LEVEL_0, (VkObject)NULL, 0, VKTRACESNAPSHOT_SNAPSHOT_DATA, LAYER_ABBREV_STR, str);
+
+    // print all objects
+    if (s_delta.globalObjCount > 0)
+    {
+        sprintf(str, "======== DELTA SNAPSHOT Created Objects:");
+        layerCbMsg(VK_DBG_MSG_UNKNOWN, VK_VALIDATION_LEVEL_0, pTrav->obj.object, 0, VKTRACESNAPSHOT_SNAPSHOT_DATA, LAYER_ABBREV_STR, str);
+        while (pTrav != NULL)
+        {
+            sprintf(str, "\t%s obj %p", string_VK_OBJECT_TYPE(pTrav->obj.objType), (void*)pTrav->obj.object);
+            layerCbMsg(VK_DBG_MSG_UNKNOWN, VK_VALIDATION_LEVEL_0, pTrav->obj.object, 0, VKTRACESNAPSHOT_SNAPSHOT_DATA, LAYER_ABBREV_STR, str);
+            pTrav = pTrav->pNextGlobal;
+        }
+    }
+
+    // print devices
+    if (s_delta.deviceCount > 0)
+    {
+        VKTRACE_VK_SNAPSHOT_LL_NODE* pDeviceNode = s_delta.pDevices;
+        sprintf(str, "======== DELTA SNAPSHOT Devices:");
+        layerCbMsg(VK_DBG_MSG_UNKNOWN, VK_VALIDATION_LEVEL_0, (VkObject)NULL, 0, VKTRACESNAPSHOT_SNAPSHOT_DATA, LAYER_ABBREV_STR, str);
+        while (pDeviceNode != NULL)
+        {
+            VKTRACE_VK_SNAPSHOT_DEVICE_NODE* pDev = (VKTRACE_VK_SNAPSHOT_DEVICE_NODE*)pDeviceNode->obj.pStruct;
+            char * createInfoStr = vk_print_vkdevicecreateinfo(pDev->params.pCreateInfo, "\t\t");
+            sprintf(str, "\t%s obj %p:\n%s", string_VK_OBJECT_TYPE(VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT), (void*)pDeviceNode->obj.object, createInfoStr);
+            layerCbMsg(VK_DBG_MSG_UNKNOWN, VK_VALIDATION_LEVEL_0, pDeviceNode->obj.object, 0, VKTRACESNAPSHOT_SNAPSHOT_DATA, LAYER_ABBREV_STR, str);
+            pDeviceNode = pDeviceNode->pNextObj;
+        }
+    }
+
+    // print deleted objects
+    if (s_delta.deltaDeletedObjectCount > 0)
+    {
+        VKTRACE_VK_SNAPSHOT_DELETED_OBJ_NODE* pDelObjNode = s_delta.pDeltaDeletedObjects;
+        sprintf(str, "======== DELTA SNAPSHOT Deleted Objects:");
+        layerCbMsg(VK_DBG_MSG_UNKNOWN, VK_VALIDATION_LEVEL_0, (VkObject)NULL, 0, VKTRACESNAPSHOT_SNAPSHOT_DATA, LAYER_ABBREV_STR, str);
+        while (pDelObjNode != NULL)
+        {
+            sprintf(str, "         %s obj %p", string_VK_OBJECT_TYPE(pDelObjNode->objType), (void*)pDelObjNode->object);
+            layerCbMsg(VK_DBG_MSG_UNKNOWN, VK_VALIDATION_LEVEL_0, pDelObjNode->object, 0, VKTRACESNAPSHOT_SNAPSHOT_DATA, LAYER_ABBREV_STR, str);
+            pDelObjNode = pDelObjNode->pNextObj;
+        }
+    }
+}
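+
+// Shape of the emitted report (illustrative; handles and counts will vary):
+//   ==== DELTA SNAPSHOT contains 2 objects, 1 devices, and 0 deleted objects
+//   ======== DELTA SNAPSHOT Created Objects:
+//       DEVICE obj 0x7f32a4001230
+//       COMMAND_BUFFER obj 0x7f32a4004560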
+
+void vktraceSnapshotStopTracking(void)
+{
+    assert(!"Not Implemented");
+}
+
+void vktraceSnapshotClear(void)
+{
+    assert(!"Not Implemented");
+}
+
+VKTRACE_VK_SNAPSHOT vktraceSnapshotMerge(const VKTRACE_VK_SNAPSHOT* const pDelta, const VKTRACE_VK_SNAPSHOT* const pSnapshot)
+{
+    assert(!"Not Implemented");
+    // Return a zeroed snapshot so this non-void function is well-defined if asserts are compiled out
+    VKTRACE_VK_SNAPSHOT empty;
+    memset(&empty, 0, sizeof(VKTRACE_VK_SNAPSHOT));
+    return empty;
+}
+
+//=============================================================================
+// Old Exported methods
+//=============================================================================
+uint64_t vktraceSnapshotGetObjectCount(VkObjectType type)
+{
+    uint64_t retVal = /*(type == VK_DEBUG_REPORT_OBJECT_TYPE_ANY) ? s_delta.globalObjCount :*/ s_delta.numObjs[type];
+    return retVal;
+}
+
+VkResult vktraceSnapshotGetObjects(VkObjectType type, uint64_t objCount, VKTRACE_VK_SNAPSHOT_OBJECT_NODE *pObjNodeArray)
+{
+    // This bool flags if we're pulling all objs or just a single class of objs
+    bool32_t bAllObjs = false; /*(type == VK_DEBUG_REPORT_OBJECT_TYPE_ANY);*/
+    // Check the count first thing
+    uint64_t maxObjCount = (bAllObjs) ? s_delta.globalObjCount : s_delta.numObjs[type];
+    if (objCount > maxObjCount) {
+        char str[1024];
+        sprintf(str, "OBJ ERROR : Received objTrackGetObjects() request for %lu objs, but there are only %lu objs of type %s", objCount, maxObjCount, string_VK_OBJECT_TYPE(type));
+        layerCbMsg(VK_DBG_MSG_ERROR, VK_VALIDATION_LEVEL_0, 0, 0, VKTRACESNAPSHOT_OBJCOUNT_MAX_EXCEEDED, LAYER_ABBREV_STR, str);
+        return VK_ERROR_INVALID_VALUE;
+    }
+
+    VKTRACE_VK_SNAPSHOT_LL_NODE* pTrav = (bAllObjs) ? s_delta.pGlobalObjs : s_delta.pObjectHead[type];
+
+    for (uint64_t i = 0; i < objCount; i++) {
+        if (!pTrav) {
+            char str[1024];
+            sprintf(str, "OBJ INTERNAL ERROR : Ran out of %s objs! Should have %lu, but only copied %lu and not the requested %lu.", string_VK_OBJECT_TYPE(type), maxObjCount, i, objCount);
+            layerCbMsg(VK_DBG_MSG_ERROR, VK_VALIDATION_LEVEL_0, 0, 0, VKTRACESNAPSHOT_INTERNAL_ERROR, LAYER_ABBREV_STR, str);
+            return VK_ERROR_UNKNOWN;
+        }
+        memcpy(&pObjNodeArray[i], pTrav, sizeof(VKTRACE_VK_SNAPSHOT_OBJECT_NODE));
+        pTrav = (bAllObjs) ? pTrav->pNextGlobal : pTrav->pNextObj;
+    }
+    return VK_SUCCESS;
+}
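+
+// Caller pattern (a sketch; error handling and the C++ cast on malloc elided):
+// size the output array with vktraceSnapshotGetObjectCount() first, since
+// requesting more nodes than exist is reported as an error.
+//
+//     uint64_t n = vktraceSnapshotGetObjectCount(type);
+//     VKTRACE_VK_SNAPSHOT_OBJECT_NODE* pNodes = malloc(n * sizeof(*pNodes));
+//     if (pNodes != NULL && vktraceSnapshotGetObjects(type, n, pNodes) == VK_SUCCESS) {
+//         /* ... inspect pNodes[0..n-1] ... */
+//     }
+//     free(pNodes);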
+
+void vktraceSnapshotPrintObjects(void)
+{
+    vktraceSnapshotPrintDelta();
+}
+
+#include "vk_generic_intercept_proc_helper.h"
+VK_LAYER_EXPORT VKAPI_ATTR void* VKAPI_CALL vkGetProcAddr(VkPhysicalDevice physicalDevice, const char* funcName)
+{
+    VkBaseLayerObject* gpuw = (VkBaseLayerObject *) physicalDevice;
+    if (gpuw == NULL)
+        return NULL;
+    pCurObj = gpuw;
+    loader_platform_thread_once(&tabOnce, initVktraceSnapshot);
+
+    // TODO: This needs to be changed, need only the entry points this layer intercepts
+    //addr = layer_intercept_proc(funcName);
+    //if (addr)
+    //    return addr;
+    //else
+    if (!strncmp("vktraceSnapshotGetObjectCount", funcName, sizeof("vktraceSnapshotGetObjectCount")))
+        return vktraceSnapshotGetObjectCount;
+    else if (!strncmp("vktraceSnapshotGetObjects", funcName, sizeof("vktraceSnapshotGetObjects")))
+        return vktraceSnapshotGetObjects;
+    else if (!strncmp("vktraceSnapshotPrintObjects", funcName, sizeof("vktraceSnapshotPrintObjects")))
+        return vktraceSnapshotPrintObjects;
+    else if (!strncmp("vktraceSnapshotStartTracking", funcName, sizeof("vktraceSnapshotStartTracking")))
+        return vktraceSnapshotStartTracking;
+    else if (!strncmp("vktraceSnapshotGetDelta", funcName, sizeof("vktraceSnapshotGetDelta")))
+        return vktraceSnapshotGetDelta;
+    else if (!strncmp("vktraceSnapshotGetSnapshot", funcName, sizeof("vktraceSnapshotGetSnapshot")))
+        return vktraceSnapshotGetSnapshot;
+    else if (!strncmp("vktraceSnapshotPrintDelta", funcName, sizeof("vktraceSnapshotPrintDelta")))
+        return vktraceSnapshotPrintDelta;
+    else if (!strncmp("vktraceSnapshotStopTracking", funcName, sizeof("vktraceSnapshotStopTracking")))
+        return vktraceSnapshotStopTracking;
+    else if (!strncmp("vktraceSnapshotClear", funcName, sizeof("vktraceSnapshotClear")))
+        return vktraceSnapshotClear;
+    else if (!strncmp("vktraceSnapshotMerge", funcName, sizeof("vktraceSnapshotMerge")))
+        return vktraceSnapshotMerge;
+    else {
+        if (gpuw->pGPA == NULL)
+            return NULL;
+        return gpuw->pGPA((VkPhysicalDevice)gpuw->nextObject, funcName);
+    }
+}
+
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/layers/vktrace_snapshot.h b/vktrace/src/vktrace_extensions/vktracevulkan/layers/vktrace_snapshot.h
new file mode 100644
index 0000000..7b101e3
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/layers/vktrace_snapshot.h
@@ -0,0 +1,267 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */
+
+#include "vkLayer.h"
+#include "vulkan/vulkan.h"
+// VkTrace Snapshot ERROR codes
+typedef enum _VKTRACE_SNAPSHOT_ERROR
+{
+    VKTRACESNAPSHOT_NONE,                              // Used for INFO & other non-error messages
+    VKTRACESNAPSHOT_UNKNOWN_OBJECT,                    // Updating uses of object that's not in global object list
+    VKTRACESNAPSHOT_INTERNAL_ERROR,                    // Bug with data tracking within the layer
+    VKTRACESNAPSHOT_DESTROY_OBJECT_FAILED,             // Couldn't find object to be destroyed
+    VKTRACESNAPSHOT_MISSING_OBJECT,                    // Attempted look-up on object that isn't in global object list
+    VKTRACESNAPSHOT_OBJECT_LEAK,                       // OBJECT was not correctly freed/destroyed
+    VKTRACESNAPSHOT_OBJCOUNT_MAX_EXCEEDED,             // Request for Object data in excess of max obj count
+    VKTRACESNAPSHOT_INVALID_FENCE,                     // Requested status of unsubmitted fence object
+    VKTRACESNAPSHOT_VIEWPORT_NOT_BOUND,                // Draw submitted with no viewport state object bound
+    VKTRACESNAPSHOT_RASTER_NOT_BOUND,                  // Draw submitted with no raster state object bound
+    VKTRACESNAPSHOT_COLOR_BLEND_NOT_BOUND,             // Draw submitted with no color blend state object bound
+    VKTRACESNAPSHOT_DEPTH_STENCIL_NOT_BOUND,           // Draw submitted with no depth-stencil state object bound
+    VKTRACESNAPSHOT_GPU_MEM_MAPPED,                    // Mem object ref'd in cmd buff is still mapped
+    VKTRACESNAPSHOT_GETGPUINFO_NOT_CALLED,             // Gpu Information has not been requested before drawing
+    VKTRACESNAPSHOT_MEMREFCOUNT_MAX_EXCEEDED,          // Number of QueueSubmit memory references exceeds GPU maximum
+    VKTRACESNAPSHOT_SNAPSHOT_DATA,                     // Message being printed is actually snapshot data
+} VKTRACE_SNAPSHOT_ERROR;
+
+// Object Status -- used to track state of individual objects
+typedef enum _OBJECT_STATUS
+{
+    OBJSTATUS_NONE                              = 0x00000000, // No status is set
+    OBJSTATUS_FENCE_IS_SUBMITTED                = 0x00000001, // Fence has been submitted
+    OBJSTATUS_VIEWPORT_BOUND                    = 0x00000002, // Viewport state object has been bound
+    OBJSTATUS_RASTER_BOUND                      = 0x00000004, // Raster state object has been bound
+    OBJSTATUS_COLOR_BLEND_BOUND                 = 0x00000008, // Color blend state object has been bound
+    OBJSTATUS_DEPTH_STENCIL_BOUND               = 0x00000010, // Depth-stencil state object has been bound
+    OBJSTATUS_GPU_MEM_MAPPED                    = 0x00000020, // Memory object is currently mapped
+} OBJECT_STATUS;
+
+static const char* string_VK_OBJECT_TYPE(VkDebugReportObjectTypeEXT type) {
+    switch ((unsigned int)type)
+    {
+        case VK_DEBUG_REPORT_OBJECT_TYPE_INSTANCE_EXT:
+            return "INSTANCE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_PHYSICAL_DEVICE_EXT:
+            return "PHYSICAL_DEVICE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_EXT:
+            return "DEVICE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_QUEUE_EXT:
+            return "QUEUE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_BUFFER_EXT:
+            return "COMMAND_BUFFER";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_DEVICE_MEMORY_EXT:
+            return "DEVICE_MEMORY";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_BUFFER_EXT:
+            return "BUFFER";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_BUFFER_VIEW_EXT:
+            return "BUFFER_VIEW";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_EXT:
+            return "IMAGE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_VIEW_EXT:
+            return "IMAGE_VIEW";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_ATTACHMENT_VIEW_EXT:
+            return "ATTACHMENT_VIEW";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_SHADER:
+            return "SHADER";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_EXT:
+            return "PIPELINE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_LAYOUT_EXT:
+            return "PIPELINE_LAYOUT";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_SAMPLER_EXT:
+            return "SAMPLER";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_SET_EXT:
+            return "DESCRIPTOR_SET";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_SET_LAYOUT_EXT:
+            return "DESCRIPTOR_SET_LAYOUT";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_DESCRIPTOR_POOL_EXT:
+            return "DESCRIPTOR_POOL";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_DYNAMIC_VIEWPORT_STATE:
+            return "DYNAMIC_VIEWPORT_STATE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_DYNAMIC_RASTER_STATE:
+            return "DYNAMIC_RASTER_STATE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_DYNAMIC_COLOR_BLEND_STATE:
+            return "DYNAMIC_COLOR_BLEND_STATE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_DYNAMIC_DEPTH_STENCIL_STATE:
+            return "DYNAMIC_DEPTH_STENCIL_STATE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_FENCE_EXT:
+            return "FENCE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_SEMAPHORE_EXT:
+            return "SEMAPHORE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_EVENT_EXT:
+            return "EVENT";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_QUERY_POOL_EXT:
+            return "QUERY_POOL";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_FRAMEBUFFER_EXT:
+            return "FRAMEBUFFER";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_RENDER_PASS_EXT:
+            return "RENDER_PASS";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_CACHE_EXT:
+            return "PIPELINE_CACHE";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_SWAP_CHAIN_WSI:
+            return "SWAP_CHAIN_WSI";
+        case VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_POOL_EXT:
+            return "COMMAND_POOL";
+        default:
+            return "UNKNOWN";
+    }
+}
+
+//=============================================================================
+// Helper structure for a VKTRACE vulkan snapshot.
+// These can probably be auto-generated at some point.
+//=============================================================================
+
+void vktrace_vk_malloc_and_copy(void** ppDest, size_t size, const void* pSrc);
+
+typedef struct _VKTRACE_VK_SNAPSHOT_CREATEDEVICE_PARAMS
+{
+    VkPhysicalDevice physicalDevice;
+    VkDeviceCreateInfo* pCreateInfo;
+    VkDevice* pDevice;
+} VKTRACE_VK_SNAPSHOT_CREATEDEVICE_PARAMS;
+
+VkDeviceCreateInfo* vktrace_deepcopy_xgl_device_create_info(const VkDeviceCreateInfo* pSrcCreateInfo);
+void vktrace_deepfree_xgl_device_create_info(VkDeviceCreateInfo* pCreateInfo);
+void vktrace_vk_snapshot_copy_createdevice_params(VKTRACE_VK_SNAPSHOT_CREATEDEVICE_PARAMS* pDest, VkPhysicalDevice physicalDevice, const VkDeviceCreateInfo* pCreateInfo, VkDevice* pDevice);
+void vktrace_vk_snapshot_destroy_createdevice_params(VKTRACE_VK_SNAPSHOT_CREATEDEVICE_PARAMS* pSrc);
+
+//=============================================================================
+// VkTrace Snapshot helper structs
+//=============================================================================
+
+// Node that stores information about an object
+typedef struct _VKTRACE_VK_SNAPSHOT_OBJECT_NODE {
+    VkObject        object;
+    VkObjectType    objType;
+    uint64_t        numUses;
+    OBJECT_STATUS   status;
+    void*           pStruct;    //!< optionally points to a type-specific struct (e.g., VKTRACE_VK_SNAPSHOT_DEVICE_NODE)
+} VKTRACE_VK_SNAPSHOT_OBJECT_NODE;
+
+// Node that stores information about a VkDevice
+typedef struct _VKTRACE_VK_SNAPSHOT_DEVICE_NODE {
+    // This object
+    VkDevice device;
+
+    // CreateDevice parameters
+    VKTRACE_VK_SNAPSHOT_CREATEDEVICE_PARAMS params;
+
+    // Other information a device needs to store.
+    // TODO: anything?
+} VKTRACE_VK_SNAPSHOT_DEVICE_NODE;
+
+// Linked-List node that stores information about an object
+// We maintain a "Global" list which links every object and a
+//  per-Object list which just links objects of a given type
+// The object node has both pointers so the actual nodes are shared between the two lists
+typedef struct _VKTRACE_VK_SNAPSHOT_LL_NODE {
+    struct _VKTRACE_VK_SNAPSHOT_LL_NODE *pNextObj;
+    struct _VKTRACE_VK_SNAPSHOT_LL_NODE *pNextGlobal;
+    VKTRACE_VK_SNAPSHOT_OBJECT_NODE obj;
+} VKTRACE_VK_SNAPSHOT_LL_NODE;
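+
+// For illustration (a sketch, not part of the header): because every node
+// carries both links, the same nodes can be walked two ways, assuming a
+// populated snapshot 'snap' and an object type 'type':
+//
+//     VKTRACE_VK_SNAPSHOT_LL_NODE* p;
+//     for (p = snap.pGlobalObjs; p != NULL; p = p->pNextGlobal)
+//         { /* visits every tracked object exactly once */ }
+//     for (p = snap.pObjectHead[type]; p != NULL; p = p->pNextObj)
+//         { /* visits only the objects of 'type' */ }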
+
+// Linked-List node identifying an object that has been deleted but whose
+// creation the delta snapshot never saw.
+typedef struct _VKTRACE_VK_SNAPSHOT_DELETED_OBJ_NODE {
+    struct _VKTRACE_VK_SNAPSHOT_DELETED_OBJ_NODE* pNextObj;
+    VkObject object;
+    VkObjectType objType;
+} VKTRACE_VK_SNAPSHOT_DELETED_OBJ_NODE;
+
+//=============================================================================
+// Main structure for a VKTRACE vulkan snapshot.
+//=============================================================================
+typedef struct _VKTRACE_VK_SNAPSHOT {
+    // Stores a list of all the objects known by this snapshot.
+    // This may be used as a shortcut to more easily find objects.
+    uint64_t globalObjCount;
+    VKTRACE_VK_SNAPSHOT_LL_NODE* pGlobalObjs;
+
+    // TEMPORARY: Keep track of all objects of each type
+    uint64_t numObjs[VK_NUM_OBJECT_TYPE];
+    VKTRACE_VK_SNAPSHOT_LL_NODE *pObjectHead[VK_NUM_OBJECT_TYPE];
+
+    // List of created devices and a [potentially] hierarchical tree of the
+    // objects created on each. This is used to represent object ownership.
+    uint64_t deviceCount;
+    VKTRACE_VK_SNAPSHOT_LL_NODE* pDevices;
+
+    // This is used to support snapshot deltas.
+    uint64_t deltaDeletedObjectCount;
+    VKTRACE_VK_SNAPSHOT_DELETED_OBJ_NODE* pDeltaDeletedObjects;
+} VKTRACE_VK_SNAPSHOT;
+
+//=============================================================================
+// prototype for extension functions
+//=============================================================================
+// The snapshot functionality should work similar to a stopwatch.
+// 1) 'StartTracking()' is like starting the stopwatch. This causes the snapshot
+//    to start tracking the creation of objects and state. In general, this
+//    should happen at the very beginning, to track all objects. During this
+//    tracking time, all creations and deletions are tracked on the
+//    'deltaSnapshot'.
+//    NOTE: This entrypoint currently does nothing, as tracking is implied
+//          by enabling the layer.
+// 2) 'GetDelta()' is analogous to looking at the stopwatch and seeing the
+//    current lap time - a copy of the 'deltaSnapshot' is returned to the
+//    caller, but nothing changes within the snapshot layer. All creations
+//    and deletions continue to be applied to the 'deltaSnapshot'.
+//    NOTE: This will involve a deep copy of the delta, so there may be a
+//          performance hit.
+// 3) 'GetSnapshot()' is similar to hitting the 'Lap' button on a stopwatch.
+//    The 'deltaSnapshot' is merged into the 'masterSnapshot', the 'deltaSnapshot'
+//    is cleared, and the 'masterSnapshot' is returned. All creations and
+//    deletions continue to be applied to the 'deltaSnapshot'.
+//    NOTE: This will involve a deep copy of the snapshot, so there may be a
+//          performance hit.
+// 4) 'PrintDelta()' will cause the delta to be output by the layer's msgCallback.
+// 5) Steps 2, 3, and 4 can happen as often as needed.
+// 6) 'StopTracking()' is like stopping the stopwatch.
+//    NOTE: This entrypoint currently does nothing, as tracking is implied
+//          by disabling the layer.
+// 7) 'Clear()' will clear the 'deltaSnapshot' and the 'masterSnapshot'.
+//=============================================================================
+
+void vktraceSnapshotStartTracking(void);
+VKTRACE_VK_SNAPSHOT vktraceSnapshotGetDelta(void);
+VKTRACE_VK_SNAPSHOT vktraceSnapshotGetSnapshot(void);
+void vktraceSnapshotPrintDelta(void);
+void vktraceSnapshotStopTracking(void);
+void vktraceSnapshotClear(void);
+
+// utility
+// merge a delta into a snapshot and return the updated snapshot
+VKTRACE_VK_SNAPSHOT vktraceSnapshotMerge(const VKTRACE_VK_SNAPSHOT * const pDelta, const VKTRACE_VK_SNAPSHOT * const pSnapshot);
+
+uint64_t vktraceSnapshotGetObjectCount(VkObjectType type);
+VkResult vktraceSnapshotGetObjects(VkObjectType type, uint64_t objCount, VKTRACE_VK_SNAPSHOT_OBJECT_NODE* pObjNodeArray);
+void vktraceSnapshotPrintObjects(void);
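+
+// Usage sketch (illustrative only, not part of the API): a typical session,
+// assuming the layer is enabled and the entry points were resolved through
+// vkGetProcAddr:
+//
+//     vktraceSnapshotStartTracking();               // see NOTE above: implied by enabling the layer
+//     /* ... issue Vulkan calls ... */
+//     VKTRACE_VK_SNAPSHOT delta = vktraceSnapshotGetDelta();     // peek at the current "lap time"
+//     vktraceSnapshotPrintDelta();                  // dump the delta via the msgCallback
+//     VKTRACE_VK_SNAPSHOT master = vktraceSnapshotGetSnapshot(); // "Lap": merge delta into master
+//     vktraceSnapshotStopTracking();                // see NOTE above: implied by disabling the layer
+//     vktraceSnapshotClear();                       // reset both snapshots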
+
+// Func ptr typedefs
+typedef uint64_t (*VKTRACESNAPSHOT_GET_OBJECT_COUNT)(VkObjectType);
+typedef VkResult (*VKTRACESNAPSHOT_GET_OBJECTS)(VkObjectType, uint64_t, VKTRACE_VK_SNAPSHOT_OBJECT_NODE*);
+typedef void (*VKTRACESNAPSHOT_PRINT_OBJECTS)(void);
+typedef void (*VKTRACESNAPSHOT_START_TRACKING)(void);
+typedef VKTRACE_VK_SNAPSHOT (*VKTRACESNAPSHOT_GET_DELTA)(void);
+typedef VKTRACE_VK_SNAPSHOT (*VKTRACESNAPSHOT_GET_SNAPSHOT)(void);
+typedef void (*VKTRACESNAPSHOT_PRINT_DELTA)(void);
+typedef void (*VKTRACESNAPSHOT_STOP_TRACKING)(void);
+typedef void (*VKTRACESNAPSHOT_CLEAR)(void);
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/CMakeLists.txt b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/CMakeLists.txt
new file mode 100644
index 0000000..79aa53e
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/CMakeLists.txt
@@ -0,0 +1,73 @@
+cmake_minimum_required(VERSION 2.8)
+
+project(vulkan_replay)
+
+include("${SRC_DIR}/build_options.cmake")
+
+file(MAKE_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/codegen)
+execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DIR}/vktrace_generate.py AllPlatforms vktrace-replay-vk-funcs     VK_VERSION_1_0 OUTPUT_FILE ${CMAKE_CURRENT_SOURCE_DIR}/codegen/vkreplay_vk_func_ptrs.h)
+execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DIR}/vktrace_generate.py AllPlatforms vktrace-replay-c            VK_VERSION_1_0 OUTPUT_FILE ${CMAKE_CURRENT_SOURCE_DIR}/codegen/vkreplay_vk_replay_gen.cpp)
+execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DIR}/vktrace_generate.py AllPlatforms vktrace-replay-obj-mapper-h VK_VERSION_1_0 OUTPUT_FILE ${CMAKE_CURRENT_SOURCE_DIR}/codegen/vkreplay_vk_objmapper.h)
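+# For reference, each execute_process above is roughly equivalent to running
+# the generator by hand (assuming python is on PATH), e.g.:
+#   python vktrace_generate.py AllPlatforms vktrace-replay-c VK_VERSION_1_0 > codegen/vkreplay_vk_replay_gen.cpp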
+
+if (${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
+set(OS_REPLAYER_LIBS
+    xcb
+)
+endif()
+
+if (${CMAKE_SYSTEM_NAME} MATCHES "Windows" OR
+    ${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
+set(OS_REPLAYER_LIBS  )
+endif()
+
+set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
+
+set(SRC_LIST
+    ${SRC_LIST}
+    vkreplay.cpp
+    vkreplay_settings.cpp
+    vkreplay_vkreplay.cpp
+    vkreplay_vkdisplay.cpp
+    codegen/vkreplay_vk_replay_gen.cpp
+)
+
+set (HDR_LIST
+    vkreplay.h
+    vkreplay_settings.h
+    vkreplay_vkdisplay.h
+    vkreplay_vkreplay.h
+    codegen/vkreplay_vk_func_ptrs.h
+    codegen/vkreplay_vk_objmapper.h
+    ${CMAKE_CURRENT_SOURCE_DIR}/../vulkan/codegen_utils/vk_enum_string_helper.h
+    ${CODEGEN_VKTRACE_DIR}/vktrace_vk_packet_id.h
+    ${CODEGEN_VKTRACE_DIR}/vktrace_vk_vk_packets.h
+)
+
+include_directories(
+    codegen
+    ${SRC_DIR}/vktrace_replay
+    ${SRC_DIR}/vktrace_common
+    ${SRC_DIR}/thirdparty
+    ${CMAKE_CURRENT_SOURCE_DIR}
+    ${CODEGEN_VKTRACE_DIR}
+    ${VKTRACE_VULKAN_INCLUDE_DIR}
+    ${CMAKE_CURRENT_SOURCE_DIR}/../vulkan/codegen_utils
+)
+# needed for vktraceviewer_vk library which is shared
+if (NOT MSVC)
+    add_compiler_flag("-fPIC")
+endif()
+
+add_library(${PROJECT_NAME} STATIC ${SRC_LIST} ${HDR_LIST})
+
+add_dependencies(${PROJECT_NAME} "vulkan-${MAJOR}")
+
+target_link_libraries(${PROJECT_NAME} 
+    ${OS_REPLAYER_LIBS}
+    ${VKTRACE_VULKAN_LIB}
+    vktrace_common
+)
+
+build_options_finalize()
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay.cpp b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay.cpp
new file mode 100644
index 0000000..6c4d391
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay.cpp
@@ -0,0 +1,199 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation, Inc.
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ **************************************************************************/
+#include <inttypes.h>
+#include "vkreplay.h"
+#include "vkreplay_vkreplay.h"
+#include "vktrace_vk_packet_id.h"
+#include "vktrace_tracelog.h"
+
+static vkreplayer_settings s_defaultVkReplaySettings = { NULL, 1, -1, -1, NULL, NULL };
+
+vkReplay* g_pReplayer = NULL;
+VKTRACE_CRITICAL_SECTION g_handlerLock;
+PFN_vkDebugReportCallbackEXT g_fpDbgMsgCallback;
+vktrace_replay::VKTRACE_DBG_MSG_CALLBACK_FUNCTION g_fpVktraceCallback = NULL;
+
+static VKAPI_ATTR VkBool32 VKAPI_CALL vkErrorHandler(
+                                VkFlags             msgFlags,
+                                VkDebugReportObjectTypeEXT     objType,
+                                uint64_t            srcObjectHandle,
+                                size_t              location,
+                                int32_t             msgCode,
+                                const char*         pLayerPrefix,
+                                const char*         pMsg,
+                                void*               pUserData)
+{
+    VkBool32 bail = false;
+
+    vktrace_enter_critical_section(&g_handlerLock);
+    if ((msgFlags & VK_DEBUG_REPORT_ERROR_BIT_EXT) == VK_DEBUG_REPORT_ERROR_BIT_EXT)
+    {
+        vktrace_LogError("MsgFlags %d with object %#" PRIxLEAST64 ", location %zu returned msgCode %d and msg %s",
+                     msgFlags, srcObjectHandle, location, msgCode, (char *) pMsg);
+        g_pReplayer->push_validation_msg(msgFlags, objType, srcObjectHandle, location, msgCode, pLayerPrefix, pMsg, pUserData);
+        if (g_fpVktraceCallback != NULL)
+        {
+            g_fpVktraceCallback(vktrace_replay::VKTRACE_DBG_MSG_ERROR, pMsg);
+        }
+        /* TODO: bailing out of the call chain due to this error should allow
+         * the app to continue in some fashion.
+         * Is that needed here?
+         */
+        bail = true;
+    }
+    else if ((msgFlags & VK_DEBUG_REPORT_WARNING_BIT_EXT) == VK_DEBUG_REPORT_WARNING_BIT_EXT ||
+             (msgFlags & VK_DEBUG_REPORT_PERFORMANCE_WARNING_BIT_EXT) == VK_DEBUG_REPORT_PERFORMANCE_WARNING_BIT_EXT)
+    {
+        if (g_fpVktraceCallback != NULL)
+        {
+            g_fpVktraceCallback(vktrace_replay::VKTRACE_DBG_MSG_WARNING, pMsg);
+        }
+    }
+    else
+    {
+        if (g_fpVktraceCallback != NULL)
+        {
+            g_fpVktraceCallback(vktrace_replay::VKTRACE_DBG_MSG_INFO, pMsg);
+        }
+    }
+    vktrace_leave_critical_section(&g_handlerLock);
+
+    return bail;
+}
+
+void VkReplaySetLogCallback(VKTRACE_REPORT_CALLBACK_FUNCTION pCallback)
+{
+}
+
+void VkReplaySetLogLevel(VktraceLogLevel level)
+{
+}
+
+void VkReplayRegisterDbgMsgCallback(vktrace_replay::VKTRACE_DBG_MSG_CALLBACK_FUNCTION pCallback)
+{
+    g_fpVktraceCallback = pCallback;
+}
+
+vktrace_SettingGroup* VKTRACER_CDECL VkReplayGetSettings()
+{
+    static BOOL bFirstTime = TRUE;
+    if (bFirstTime == TRUE)
+    {
+        vktrace_SettingGroup_reset_defaults(&g_vkReplaySettingGroup);
+        bFirstTime = FALSE;
+    }
+
+    return &g_vkReplaySettingGroup;
+}
+
+void VKTRACER_CDECL VkReplayUpdateFromSettings(vktrace_SettingGroup* pSettingGroups, unsigned int numSettingGroups)
+{
+    vktrace_SettingGroup_Apply_Overrides(&g_vkReplaySettingGroup, pSettingGroups, numSettingGroups);
+}
+
+int VKTRACER_CDECL VkReplayInitialize(vktrace_replay::ReplayDisplay* pDisplay, vkreplayer_settings *pReplaySettings)
+{
+    try
+    {
+        if (pReplaySettings == NULL)
+        {
+            g_pReplayer = new vkReplay(&s_defaultVkReplaySettings);
+        }
+        else
+        {
+            g_pReplayer = new vkReplay(pReplaySettings);
+        }
+    }
+    catch (int e)
+    {
+        vktrace_LogError("Failed to create vkReplay, probably out of memory. Error %d", e);
+        return -1;
+    }
+
+    vktrace_create_critical_section(&g_handlerLock);
+    g_fpDbgMsgCallback = vkErrorHandler;
+    int result = g_pReplayer->init(*pDisplay);
+    return result;
+}
+
+void VKTRACER_CDECL VkReplayDeinitialize()
+{
+    if (g_pReplayer != NULL)
+    {
+        delete g_pReplayer;
+        g_pReplayer = NULL;
+    }
+    vktrace_delete_critical_section(&g_handlerLock);
+}
+
+vktrace_trace_packet_header* VKTRACER_CDECL VkReplayInterpret(vktrace_trace_packet_header* pPacket)
+{
+    // Attempt to interpret the packet as a Vulkan packet
+    vktrace_trace_packet_header* pInterpretedHeader = interpret_trace_packet_vk(pPacket);
+    if (pInterpretedHeader == NULL)
+    {
+        vktrace_LogError("Unrecognized Vulkan packet_id: %u", pPacket->packet_id);
+    }
+
+    return pInterpretedHeader;
+}
+
+vktrace_replay::VKTRACE_REPLAY_RESULT VKTRACER_CDECL VkReplayReplay(vktrace_trace_packet_header* pPacket)
+{
+    vktrace_replay::VKTRACE_REPLAY_RESULT result = vktrace_replay::VKTRACE_REPLAY_ERROR;
+    if (g_pReplayer != NULL)
+    {
+        result = g_pReplayer->replay(pPacket);
+
+        if (result == vktrace_replay::VKTRACE_REPLAY_SUCCESS)
+            result = g_pReplayer->pop_validation_msgs();
+    }
+    return result;
+}
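+
+// Typical driver loop (a sketch; the real loop lives in the vktrace replay
+// harness, so the packet-reading step here is illustrative):
+//
+//     vktrace_trace_packet_header* pHdr = /* read next packet from the trace file */;
+//     vktrace_trace_packet_header* pVk  = VkReplayInterpret(pHdr);
+//     if (pVk != NULL)
+//         (void) VkReplayReplay(pVk);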
+
+int VKTRACER_CDECL VkReplayDump()
+{
+    if (g_pReplayer != NULL)
+    {
+        g_pReplayer->dump_validation_data();
+        return 0;
+    }
+    return -1;
+}
+
+int VKTRACER_CDECL VkReplayGetFrameNumber()
+{
+    if (g_pReplayer != NULL)
+    {
+        return g_pReplayer->get_frame_number();
+    }
+    return -1;
+}
+
+void VKTRACER_CDECL VkReplayResetFrameNumber()
+{
+    if (g_pReplayer != NULL)
+    {
+        g_pReplayer->reset_frame_number();
+    }
+}
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay.h b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay.h
new file mode 100644
index 0000000..aa49311
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay.h
@@ -0,0 +1,41 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation, Inc.
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ **************************************************************************/
+#pragma once
+#include "vkreplay_window.h"
+#include "vkreplay_factory.h"
+#include "vkreplay_settings.h"
+
+
+extern void VkReplaySetLogCallback(VKTRACE_REPORT_CALLBACK_FUNCTION pCallback);
+extern void VkReplaySetLogLevel(VktraceLogLevel level);
+extern void VkReplayRegisterDbgMsgCallback(vktrace_replay::VKTRACE_DBG_MSG_CALLBACK_FUNCTION pCallback);
+extern vktrace_SettingGroup* VKTRACER_CDECL VkReplayGetSettings();
+extern void VKTRACER_CDECL VkReplayUpdateFromSettings(vktrace_SettingGroup* pSettingGroups, unsigned int numSettingGroups);
+extern int VKTRACER_CDECL VkReplayInitialize(vktrace_replay::ReplayDisplay* pDisplay, vkreplayer_settings *pReplaySettings);
+extern void VKTRACER_CDECL VkReplayDeinitialize();
+extern vktrace_trace_packet_header* VKTRACER_CDECL VkReplayInterpret(vktrace_trace_packet_header* pPacket);
+extern vktrace_replay::VKTRACE_REPLAY_RESULT VKTRACER_CDECL VkReplayReplay(vktrace_trace_packet_header* pPacket);
+extern int VKTRACER_CDECL VkReplayDump();
+extern int VKTRACER_CDECL VkReplayGetFrameNumber();
+extern void VKTRACER_CDECL VkReplayResetFrameNumber();
+
+extern PFN_vkDebugReportCallbackEXT g_fpDbgMsgCallback;
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_settings.cpp b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_settings.cpp
new file mode 100644
index 0000000..ecd9845
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_settings.cpp
@@ -0,0 +1,57 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation, Inc.
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ **************************************************************************/
+#include "vulkan/vk_layer.h"
+
+#include "vkreplay_settings.h"
+// declared as extern in header
+vkreplayer_settings g_vkReplaySettings;
+
+static vkreplayer_settings s_defaultVkReplaySettings = { NULL, 1, -1, -1, NULL, NULL };
+
+vktrace_SettingInfo g_vk_settings_info[] =
+{
+    { "t", "TraceFile", VKTRACE_SETTING_STRING, { &g_vkReplaySettings.pTraceFilePath }, { &s_defaultVkReplaySettings.pTraceFilePath }, TRUE, "The trace file to replay." },
+    { "l", "NumLoops", VKTRACE_SETTING_UINT, { &g_vkReplaySettings.numLoops }, { &s_defaultVkReplaySettings.numLoops }, TRUE, "The number of times to replay the trace file or loop range." },
+    { "lsf", "LoopStartFrame", VKTRACE_SETTING_INT, { &g_vkReplaySettings.loopStartFrame }, { &s_defaultVkReplaySettings.loopStartFrame }, TRUE, "The start frame number of the loop range." },
+    { "lef", "LoopEndFrame", VKTRACE_SETTING_INT, { &g_vkReplaySettings.loopEndFrame }, { &s_defaultVkReplaySettings.loopEndFrame }, TRUE, "The end frame number of the loop range." },
+    { "s", "Screenshot", VKTRACE_SETTING_STRING, { &g_vkReplaySettings.screenshotList }, { &s_defaultVkReplaySettings.screenshotList }, TRUE, "Comma separated list of frames to take a take snapshots of" },
+};
+
+vktrace_SettingGroup g_vkReplaySettingGroup =
+{
+    "vkreplay_vk",
+    sizeof(g_vk_settings_info) / sizeof(g_vk_settings_info[0]),
+    &g_vk_settings_info[0]
+};
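+
+// For illustration, the short/long names above surface as replayer options
+// roughly like the following (exact syntax belongs to the vkreplay_main
+// harness, so treat this as an assumption):
+//   vkreplay -t app.vktrace -l 3 -lsf 10 -lef 20 -s "1,5,9"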
+
+void apply_layerSettings_overrides()
+{
+#if 0
+    setLayerOptionEnum("DrawStateReportFlags", g_vkReplaySettings.drawStateReportFlags);
+    setLayerOptionEnum("DrawStateDebugAction", g_vkReplaySettings.drawStateDebugAction);
+    setLayerOptionEnum("MemTrackerReportFlags", g_vkReplaySettings.memTrackerReportFlags);
+    setLayerOptionEnum("MemTrackerDebugAction", g_vkReplaySettings.memTrackerDebugAction);
+    setLayerOptionEnum("ObjectTrackerReportFlags", g_vkReplaySettings.objectTrackerReportFlags);
+    setLayerOptionEnum("ObjectTrackerDebugAction", g_vkReplaySettings.objectTrackerDebugAction);
+#endif
+}
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_settings.h b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_settings.h
new file mode 100644
index 0000000..003d6c0
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_settings.h
@@ -0,0 +1,39 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation, Inc.
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ **************************************************************************/
+#ifndef VKREPLAY_VK_SETTINGS_H
+#define VKREPLAY_VK_SETTINGS_H
+
+extern "C"
+{
+#include "vktrace_settings.h"
+#include "vkreplay_main.h"
+}
+
+#include <vulkan/vulkan.h>
+
+extern vkreplayer_settings g_vkReplaySettings;
+extern vktrace_SettingGroup g_vkReplaySettingGroup;
+
+void apply_layerSettings_overrides();
+
+#endif // VKREPLAY_VK_SETTINGS_H
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkdisplay.cpp b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkdisplay.cpp
new file mode 100644
index 0000000..97e7b16
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkdisplay.cpp
@@ -0,0 +1,308 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ */
+
+#include "vkreplay_vkreplay.h"
+#include "vk_icd.h"
+#define APP_NAME "vkreplay_vk"
+#define IDI_ICON 101
+
+vkDisplay::vkDisplay()
+    : m_initedVK(false),
+    m_windowWidth(0),
+    m_windowHeight(0),
+    m_frameNumber(0)
+{
+#if defined(PLATFORM_LINUX)
+#if defined(ANDROID)
+    memset(&m_surface, 0, sizeof(VkIcdSurfaceAndroid));
+    m_window = 0;
+#else
+    memset(&m_surface, 0, sizeof(VkIcdSurfaceXcb));
+    m_pXcbConnection = NULL;
+    m_pXcbScreen = NULL;
+    m_XcbWindow = 0;
+#endif
+#elif defined(WIN32)
+    memset(&m_surface, 0, sizeof(VkIcdSurfaceWin32));
+    m_windowHandle = NULL;
+    m_connection = NULL;
+#endif
+}
+
+vkDisplay::~vkDisplay()
+{
+#if defined(PLATFORM_LINUX) && !defined(ANDROID)
+    if (m_XcbWindow != 0)
+    {
+        xcb_destroy_window(m_pXcbConnection, m_XcbWindow);
+    }
+    if (m_pXcbConnection != NULL)
+    {
+        xcb_disconnect(m_pXcbConnection);
+    }
+#endif
+}
+
+VkResult vkDisplay::init_vk(unsigned int gpu_idx)
+{
+#if 0
+    VkApplicationInfo appInfo = {};
+    appInfo.pApplicationName = APP_NAME;
+    appInfo.pEngineName = "";
+    appInfo.apiVersion = VK_API_VERSION;
+    VkResult res = vkInitAndEnumerateGpus(&appInfo, NULL, VK_MAX_PHYSICAL_GPUS, &m_gpuCount, m_gpus);
+    if ( res == VK_SUCCESS ) {
+        // retrieve the GPU information for all GPUs
+        for( uint32_t gpu = 0; gpu < m_gpuCount; gpu++)
+        {
+            size_t gpuInfoSize = sizeof(m_gpuProps[0]);
+
+            // get the GPU physical properties:
+            res = vkGetGpuInfo( m_gpus[gpu], VK_INFO_TYPE_PHYSICAL_GPU_PROPERTIES, &gpuInfoSize, &m_gpuProps[gpu]);
+            if (res != VK_SUCCESS)
+                vktrace_LogWarning("Failed to retrieve properties for gpu[%d] result %d", gpu, res);
+        }
+        res = VK_SUCCESS;
+    } else if ((gpu_idx + 1) > m_gpuCount) {
+        vktrace_LogError("vkInitAndEnumerate number of gpus does not include requested index: num %d, requested %d", m_gpuCount, gpu_idx);
+        return -1;
+    } else {
+        vktrace_LogError("vkInitAndEnumerate failed");
+        return res;
+    }
+    // TODO add multi-gpu support always use gpu[gpu_idx] for now
+    // get all extensions supported by this device gpu[gpu_idx]
+    // first check if extensions are available and save a list of them
+    bool foundWSIExt = false;
+    for( int ext = 0; ext < sizeof( extensions ) / sizeof( extensions[0] ); ext++)
+    {
+        res = vkGetExtensionSupport( m_gpus[gpu_idx], extensions[ext] );
+        if (res == VK_SUCCESS) {
+            m_extensions.push_back((char *) extensions[ext]);
+            if (!strcmp(extensions[ext], "VK_WSI_WINDOWS"))
+                foundWSIExt = true;
+        }
+    }
+    if (!foundWSIExt) {
+        vktrace_LogError("VK_WSI_WINDOWS extension not supported by gpu[%d]", gpu_idx);
+        return VK_ERROR_INCOMPATIBLE_DEVICE;
+    }
+    // TODO generalize this: use one universal queue for now
+    VkDeviceQueueCreateInfo dqci = {};
+    dqci.queueCount = 1;
+    dqci.queueType = VK_QUEUE_UNIVERSAL;
+    std::vector<float> queue_priorities (dqci.queueCount, 0.0);
+    dqci.pQueuePriorities = queue_priorities.data();
+    // create the device enabling validation level 4
+    const char * const * extensionNames = &m_extensions[0];
+    VkDeviceCreateInfo info = {};
+    info.queueCreateInfoCount = 1;
+    info.pQueueCreateInfos = &dqci;
+    info.enabledExtensionCount = static_cast <uint32_t> (m_extensions.size());
+    info.ppEnabledExtensionNames = extensionNames;
+    info.flags = VK_DEVICE_CREATE_VALIDATION;
+    info.maxValidationLevel = VK_VALIDATION_LEVEL_4;
+    bool32_t vkTrue = VK_TRUE;
+    res = vkDbgSetGlobalOption( VK_DBG_OPTION_BREAK_ON_ERROR, sizeof( vkTrue ), &vkTrue );
+    if (res != VK_SUCCESS)
+        vktrace_LogWarning("Could not set debug option break on error");
+    res = vkCreateDevice( m_gpus[0], &info, &m_dev[gpu_idx]);
+    return res;
+#else
+    return VK_ERROR_INITIALIZATION_FAILED;
+#endif
+}
+
+int vkDisplay::init(const unsigned int gpu_idx)
+{
+    //m_gpuIdx = gpu_idx;
+#if 0
+    VkResult result = init_vk(gpu_idx);
+    if (result != VK_SUCCESS) {
+        vktrace_LogError("could not init vulkan library");
+        return -1;
+    } else {
+        m_initedVK = true;
+    }
+#endif
+#if defined(PLATFORM_LINUX) && !defined(ANDROID)
+    const xcb_setup_t *setup;
+    xcb_screen_iterator_t iter;
+    int scr;
+    m_pXcbConnection = xcb_connect(NULL, &scr);
+    setup = xcb_get_setup(m_pXcbConnection);
+    iter = xcb_setup_roots_iterator(setup);
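+    // xcb_connect() returned the preferred screen number in scr; advance the
+    // root iterator that many times to land on that screen.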
+    while (scr-- > 0)
+        xcb_screen_next(&iter);
+    m_pXcbScreen = iter.data;
+#endif
+    return 0;
+}
+
+#if defined(WIN32)
+LRESULT WINAPI WindowProcVk( HWND window, unsigned int msg, WPARAM wp, LPARAM lp)
+{
+    switch(msg)
+    {
+        case WM_CLOSE:
+            DestroyWindow( window);
+            // fall-thru
+        case WM_DESTROY:
+            PostQuitMessage(0) ;
+            return 0L ;
+        default:
+            return DefWindowProc( window, msg, wp, lp ) ;
+    }
+}
+#endif
+
+int vkDisplay::set_window(vktrace_window_handle hWindow, unsigned int width, unsigned int height)
+{
+#if defined(PLATFORM_LINUX)
+#if defined(ANDROID)
+    m_window = hWindow;
+    m_surface.window = hWindow;
+#else
+    m_XcbWindow = hWindow;
+#endif
+#elif defined(WIN32)
+    m_windowHandle = hWindow;
+#endif
+    m_windowWidth = width;
+    m_windowHeight = height;
+    return 0;
+}
+
+int vkDisplay::create_window(const unsigned int width, const unsigned int height)
+{
+#if defined(PLATFORM_LINUX)
+#if defined(ANDROID)
+    return 0;
+#else
+
+    uint32_t value_mask, value_list[32];
+    m_XcbWindow = xcb_generate_id(m_pXcbConnection);
+
+    value_mask = XCB_CW_BACK_PIXEL | XCB_CW_EVENT_MASK;
+    value_list[0] = m_pXcbScreen->black_pixel;
+    value_list[1] = XCB_EVENT_MASK_KEY_RELEASE |
+                    XCB_EVENT_MASK_EXPOSURE;
+
+    xcb_create_window(m_pXcbConnection,
+            XCB_COPY_FROM_PARENT,
+            m_XcbWindow, m_pXcbScreen->root,
+            0, 0, width, height, 0,
+            XCB_WINDOW_CLASS_INPUT_OUTPUT,
+            m_pXcbScreen->root_visual,
+            value_mask, value_list);
+
+    xcb_map_window(m_pXcbConnection, m_XcbWindow);
+    xcb_flush(m_pXcbConnection);
+    // TODO : Not sure of best place to put this, but I have all the info I need here so just setting it all here for now
+    //m_XcbPlatformHandle.connection = m_pXcbConnection;
+    //m_XcbPlatformHandle.root = m_pXcbScreen->root;
+    m_surface.base.platform = VK_ICD_WSI_PLATFORM_XCB;
+    m_surface.connection = m_pXcbConnection;
+    m_surface.window = m_XcbWindow;
+    return 0;
+#endif
+#elif defined(WIN32)
+    // Register Window class
+    WNDCLASSEX wcex = {};
+    m_connection = GetModuleHandle(0);
+    wcex.cbSize = sizeof( WNDCLASSEX);
+    wcex.style = CS_HREDRAW | CS_VREDRAW;
+    wcex.lpfnWndProc = WindowProcVk;
+    wcex.cbClsExtra = 0;
+    wcex.cbWndExtra = 0;
+    wcex.hInstance = m_connection;
+    wcex.hIcon = LoadIcon(wcex.hInstance, MAKEINTRESOURCE( IDI_ICON));
+    wcex.hCursor = LoadCursor( NULL, IDC_ARROW);
+    wcex.hbrBackground = ( HBRUSH )( COLOR_WINDOW + 1);
+    wcex.lpszMenuName = NULL;
+    wcex.lpszClassName = APP_NAME;
+    wcex.hIconSm = LoadIcon( wcex.hInstance, MAKEINTRESOURCE( IDI_ICON));
+    if( !RegisterClassEx( &wcex))
+    {
+        vktrace_LogError("Failed to register windows class");
+        return -1;
+    }
+
+    // create the window
+    RECT wr = {0,0,width,height};
+    AdjustWindowRect(&wr, WS_OVERLAPPEDWINDOW, FALSE);
+    m_windowHandle = CreateWindow(APP_NAME, APP_NAME, WS_OVERLAPPEDWINDOW,
+                                  0, 0, wr.right-wr.left, wr.bottom-wr.top,
+                                  NULL, NULL, wcex.hInstance, NULL);
+
+    if (m_windowHandle)
+    {
+        ShowWindow( m_windowHandle, SW_SHOWDEFAULT);
+        m_windowWidth = width;
+        m_windowHeight = height;
+    } else {
+        vktrace_LogError("Failed to create window");
+        return -1;
+    }
+    // TODO : Not sure of best place to put this, but I have all the info I need here so just setting it all here for now
+    m_surface.base.platform = VK_ICD_WSI_PLATFORM_WIN32;
+    m_surface.hinstance = wcex.hInstance;
+    m_surface.hwnd = m_windowHandle;
+    return 0;
+#endif
+}
+
+void vkDisplay::resize_window(const unsigned int width, const unsigned int height)
+{
+#if defined(PLATFORM_LINUX)
+#if defined(ANDROID)
+    m_windowWidth = width;
+    m_windowHeight = height;
+#else
+    if (width != m_windowWidth || height != m_windowHeight)
+    {
+        uint32_t values[2];
+        values[0] = width;
+        values[1] = height;
+        xcb_configure_window(m_pXcbConnection, m_XcbWindow, XCB_CONFIG_WINDOW_WIDTH | XCB_CONFIG_WINDOW_HEIGHT, values);
+        xcb_flush(m_pXcbConnection);
+        m_windowWidth = width;
+        m_windowHeight = height;
+    }
+#endif
+#elif defined(WIN32)
+    if (width != m_windowWidth || height != m_windowHeight)
+    {
+        RECT wr = {0, 0, width, height};
+        AdjustWindowRect(&wr, WS_OVERLAPPEDWINDOW, FALSE);
+        SetWindowPos(get_window_handle(), HWND_TOP, 0, 0, wr.right-wr.left, wr.bottom-wr.top, SWP_NOMOVE);
+        m_windowWidth = width;
+        m_windowHeight = height;
+    }
+#endif
+}
+
+void vkDisplay::process_event()
+{
+}
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkdisplay.h b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkdisplay.h
new file mode 100644
index 0000000..00daf95
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkdisplay.h
@@ -0,0 +1,88 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ */
+
+#pragma once
+
+#include "vkreplay_vkreplay.h"
+#include "vk_icd.h"
+
+class vkDisplay: public vktrace_replay::ReplayDisplayImp {
+friend class vkReplay;
+public:
+    vkDisplay();
+    ~vkDisplay();
+    int init(const unsigned int gpu_idx);
+    int set_window(vktrace_window_handle hWindow, unsigned int width, unsigned int height);
+    int create_window(const unsigned int width, const unsigned int height);
+    void resize_window(const unsigned int width, const unsigned int height);
+    void process_event();
+    VkSurfaceKHR get_surface() { return (VkSurfaceKHR) &m_surface; }
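+    // Note: this cast relies on the loader convention that a VkSurfaceKHR is a
+    // pointer to a VkIcdSurface* struct whose first member is VkIcdSurfaceBase
+    // (see vk_icd.h), so the replayer can hand out its own surface directly.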
+    // VK_DEVICE get_device() { return m_dev[m_gpuIdx];}
+#if defined(PLATFORM_LINUX)
+#if defined(ANDROID)
+    ANativeWindow* get_window_handle() { return m_window; }
+#else
+    xcb_window_t get_window_handle() { return m_XcbWindow; }
+    xcb_connection_t* get_connection_handle() { return m_pXcbConnection; }
+    xcb_screen_t* get_screen_handle() { return m_pXcbScreen; }
+#endif
+#elif defined(WIN32)
+    HWND get_window_handle() { return m_windowHandle; }
+    HINSTANCE get_connection_handle() { return m_connection; }
+#endif
+private:
+    VkResult init_vk(const unsigned int gpu_idx);
+    bool m_initedVK;
+#if defined(PLATFORM_LINUX)
+#if defined(ANDROID)
+    VkIcdSurfaceAndroid m_surface;
+    ANativeWindow* m_window;
+#else
+    VkIcdSurfaceXcb m_surface;
+    xcb_connection_t *m_pXcbConnection;
+    xcb_screen_t *m_pXcbScreen;
+    xcb_window_t m_XcbWindow;
+    //VkPlatformHandleXcbKHR m_XcbPlatformHandle;
+#endif
+#elif defined(WIN32)
+    VkIcdSurfaceWin32 m_surface;
+    HWND m_windowHandle;
+    HINSTANCE m_connection;
+#endif
+    unsigned int m_windowWidth;
+    unsigned int m_windowHeight;
+    unsigned int m_frameNumber;
+    std::vector<VkExtent2D> imageExtents;
+    std::vector<VkImage> imageHandles;
+    std::vector<VkDeviceMemory> imageMemory;
+    std::vector<VkDevice> imageDevice;
+#if 0
+    VK_DEVICE m_dev[VK_MAX_PHYSICAL_GPUS];
+    uint32_t m_gpuCount;
+    unsigned int m_gpuIdx;
+    VK_PHYSICAL_GPU m_gpus[VK_MAX_PHYSICAL_GPUS];
+    VK_PHYSICAL_GPU_PROPERTIES m_gpuProps[VK_MAX_PHYSICAL_GPUS];
+#endif
+    std::vector<char *>m_extensions;
+};
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkreplay.cpp b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkreplay.cpp
new file mode 100644
index 0000000..e3795d7
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkreplay.cpp
@@ -0,0 +1,3450 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ * Author: Tobin Ehlis <tobin@lunarg.com>
+ */
+
+#include "vulkan/vulkan.h"
+#include "vkreplay_vkreplay.h"
+#include "vkreplay.h"
+#include "vkreplay_settings.h"
+
+#include <algorithm>
+
+#include "vktrace_vk_vk_packets.h"
+#include "vk_enum_string_helper.h"
+
+using namespace std;
+#include "vktrace_pageguard_memorycopy.h"
+
+vkreplayer_settings *g_pReplaySettings;
+
+vkReplay::vkReplay(vkreplayer_settings *pReplaySettings)
+{
+    g_pReplaySettings = pReplaySettings;
+    m_display = new vkDisplay();
+    m_pDSDump = NULL;
+    m_pCBDump = NULL;
+//    m_pVktraceSnapshotPrint = NULL;
+    m_objMapper.m_adjustForGPU = false;
+
+    m_frameNumber = 0;
+}
+
+vkReplay::~vkReplay()
+{
+    delete m_display;
+    vktrace_platform_close_library(m_vkFuncs.m_libHandle);
+}
+
+int vkReplay::init(vktrace_replay::ReplayDisplay & disp)
+{
+    int err;
+#if defined(PLATFORM_LINUX)
+    void *handle = dlopen("libvulkan.so", RTLD_LAZY);
+#else
+    HMODULE handle = LoadLibrary("vulkan-1.dll");
+#endif
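+    // The Vulkan loader is opened at runtime rather than linked against, so a
+    // missing loader surfaces as a replay error instead of a process load
+    // failure; entry points are resolved from this handle in init_funcs().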
+
+    if (handle == NULL) {
+        vktrace_LogError("Failed to open vulkan library.");
+        return -1;
+    }
+    m_vkFuncs.init_funcs(handle);
+    disp.set_implementation(m_display);
+    if ((err = m_display->init(disp.get_gpu())) != 0) {
+        vktrace_LogError("Failed to init vulkan display.");
+        return err;
+    }
+    if (disp.get_window_handle() == 0)
+    {
+        if ((err = m_display->create_window(disp.get_width(), disp.get_height())) != 0) {
+            vktrace_LogError("Failed to create Window");
+            return err;
+        }
+    }
+    else
+    {
+        if ((err = m_display->set_window(disp.get_window_handle(), disp.get_width(), disp.get_height())) != 0)
+        {
+            vktrace_LogError("Failed to set Window");
+            return err;
+        }
+    }
+    return 0;
+}
+
+vktrace_replay::VKTRACE_REPLAY_RESULT vkReplay::handle_replay_errors(const char* entrypointName, const VkResult resCall, const VkResult resTrace, const vktrace_replay::VKTRACE_REPLAY_RESULT resIn)
+{
+    vktrace_replay::VKTRACE_REPLAY_RESULT res = resIn;
+    if (resCall == VK_ERROR_DEVICE_LOST)
+    {
+        vktrace_LogError("API call %s returned VK_ERROR_DEVICE_LOST. vkreplay cannot continue, exiting.", entrypointName);
+        exit(1);
+    }
+    if (resCall != resTrace) {
+        vktrace_LogError("Return value %s from API call (%s) does not match return value from trace file %s.",
+                string_VkResult((VkResult)resCall), entrypointName, string_VkResult((VkResult)resTrace));
+        res = vktrace_replay::VKTRACE_REPLAY_BAD_RETURN;
+    }
+    if (resCall != VK_SUCCESS  && resCall != VK_NOT_READY) {
+        vktrace_LogWarning("API call (%s) returned failed result %s", entrypointName, string_VkResult(resCall));
+    }
+    return res;
+}
+void vkReplay::push_validation_msg(VkFlags msgFlags, VkDebugReportObjectTypeEXT objType, uint64_t srcObjectHandle, size_t location, int32_t msgCode, const char* pLayerPrefix, const char* pMsg, const void* pUserData)
+{
+    struct ValidationMsg msgObj;
+    msgObj.msgFlags = msgFlags;
+    msgObj.objType = objType;
+    msgObj.srcObjectHandle = srcObjectHandle;
+    msgObj.location = location;
+    strncpy(msgObj.layerPrefix, pLayerPrefix, 256);
+    msgObj.layerPrefix[255] = '\0';
+    msgObj.msgCode = msgCode;
+    strncpy(msgObj.msg, pMsg, 256);
+    msgObj.msg[255] = '\0';
+    msgObj.pUserData = (void *) pUserData;
+    m_validationMsgs.push_back(msgObj);
+}
+
+vktrace_replay::VKTRACE_REPLAY_RESULT vkReplay::pop_validation_msgs()
+{
+    if (m_validationMsgs.size() == 0)
+        return vktrace_replay::VKTRACE_REPLAY_SUCCESS;
+    m_validationMsgs.clear();
+    return vktrace_replay::VKTRACE_REPLAY_VALIDATION_ERROR;
+}
+
+int vkReplay::dump_validation_data()
+{
+    if (m_pDSDump && m_pCBDump)
+    {
+        m_pDSDump((char *) "pipeline_dump.dot");
+        m_pCBDump((char *) "cb_dump.dot");
+    }
+//    if (m_pVktraceSnapshotPrint != NULL)
+//    {
+//        m_pVktraceSnapshotPrint();
+//    }
+    return 0;
+}
+
+VkResult vkReplay::manually_replay_vkCreateInstance(packet_vkCreateInstance* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    VkInstanceCreateInfo *pCreateInfo;
+    char **ppEnabledLayerNames = NULL, **saved_ppLayers = NULL;
+    if (!m_display->m_initedVK)
+    {
+        VkInstance inst;
+
+        const char strScreenShot[] = "VK_LAYER_LUNARG_screenshot";
+        pCreateInfo = (VkInstanceCreateInfo *) pPacket->pCreateInfo;
+        if (g_pReplaySettings->screenshotList != NULL) {
+            // enable screenshot layer if it is available and not already in list
+            bool found_ss = false;
+            for (uint32_t i = 0; i < pCreateInfo->enabledLayerCount; i++) {
+                if (!strcmp(pCreateInfo->ppEnabledLayerNames[i], strScreenShot)) {
+                    found_ss = true;
+                    break;
+                }
+            }
+            if (!found_ss) {
+                uint32_t count;
+
+                // query to find if ScreenShot layer is available
+                m_vkFuncs.real_vkEnumerateInstanceLayerProperties(&count, NULL);
+                VkLayerProperties *props = (VkLayerProperties *) vktrace_malloc(count * sizeof (VkLayerProperties));
+                if (props && count > 0)
+                    m_vkFuncs.real_vkEnumerateInstanceLayerProperties(&count, props);
+                for (uint32_t i = 0; i < count; i++) {
+                    if (!strcmp(props[i].layerName, strScreenShot)) {
+                        found_ss = true;
+                        break;
+                    }
+                }
+                if (found_ss) {
+                    // screenshot layer is available so enable it
+                    ppEnabledLayerNames = (char **) vktrace_malloc((pCreateInfo->enabledLayerCount + 1) * sizeof (char *));
+                    if (ppEnabledLayerNames) {
+                        for (uint32_t i = 0; i < pCreateInfo->enabledLayerCount; i++) {
+                            ppEnabledLayerNames[i] = (char *) pCreateInfo->ppEnabledLayerNames[i];
+                        }
+                        ppEnabledLayerNames[pCreateInfo->enabledLayerCount] = (char *) vktrace_malloc(strlen(strScreenShot) + 1);
+                        strcpy(ppEnabledLayerNames[pCreateInfo->enabledLayerCount++], strScreenShot);
+                        saved_ppLayers = (char **) pCreateInfo->ppEnabledLayerNames;
+                        pCreateInfo->ppEnabledLayerNames = ppEnabledLayerNames;
+                    }
+                }
+                vktrace_free(props);
+            }
+        }
+
+        char **saved_ppExtensions = (char **)pCreateInfo->ppEnabledExtensionNames;
+        int savedExtensionCount = pCreateInfo->enabledExtensionCount;
+        vector<const char *> extension_names;
+        vector<string> outlist;
+
+#if defined(PLATFORM_LINUX)
+#if !defined(ANDROID)
+        extension_names.push_back(VK_KHR_XCB_SURFACE_EXTENSION_NAME);
+        outlist.push_back("VK_KHR_win32_surface");
+#else
+        extension_names.push_back(VK_KHR_ANDROID_SURFACE_EXTENSION_NAME);
+        outlist.push_back("VK_KHR_win32_surface");
+        outlist.push_back("VK_KHR_xlib_surface");
+        outlist.push_back("VK_KHR_xcb_surface");
+        outlist.push_back("VK_KHR_wayland_surface");
+        outlist.push_back("VK_KHR_mir_surface");
+#endif //ANDROID
+#else
+        extension_names.push_back(VK_KHR_WIN32_SURFACE_EXTENSION_NAME);
+        outlist.push_back("VK_KHR_xlib_surface");
+        outlist.push_back("VK_KHR_xcb_surface");
+        outlist.push_back("VK_KHR_wayland_surface");
+        outlist.push_back("VK_KHR_mir_surface");
+#endif
+
+        for (uint32_t i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
+            if ( find(outlist.begin(), outlist.end(), pCreateInfo->ppEnabledExtensionNames[i]) == outlist.end() ) {
+                extension_names.push_back(pCreateInfo->ppEnabledExtensionNames[i]);
+            }
+        }
+        pCreateInfo->ppEnabledExtensionNames = extension_names.data();
+        pCreateInfo->enabledExtensionCount = (uint32_t)extension_names.size();
+
+        replayResult = m_vkFuncs.real_vkCreateInstance(pPacket->pCreateInfo, NULL, &inst);
+
+        pCreateInfo->ppEnabledExtensionNames = saved_ppExtensions;
+        pCreateInfo->enabledExtensionCount = savedExtensionCount;
+
+        if (ppEnabledLayerNames) {
+            // restore the packets CreateInfo struct
+            vktrace_free(ppEnabledLayerNames[pCreateInfo->enabledLayerCount - 1]);
+            vktrace_free(ppEnabledLayerNames);
+            pCreateInfo->ppEnabledLayerNames = saved_ppLayers;
+        }
+
+        if (replayResult == VK_SUCCESS) {
+            m_objMapper.add_to_instances_map(*(pPacket->pInstance), inst);
+        }
+    }
+    return replayResult;
+}
+
+bool vkReplay::getQueueFamilyIdx(VkPhysicalDevice tracePhysicalDevice,
+                                 VkPhysicalDevice replayPhysicalDevice,
+                                 uint32_t traceIdx,
+                                 uint32_t* pReplayIdx)
+{
+    if (traceIdx == VK_QUEUE_FAMILY_IGNORED)
+    {
+        *pReplayIdx = VK_QUEUE_FAMILY_IGNORED;
+        return true;
+    }
+
+    if (traceQueueFamilyProperties.find(tracePhysicalDevice) == traceQueueFamilyProperties.end() ||
+        replayQueueFamilyProperties.find(replayPhysicalDevice) == replayQueueFamilyProperties.end())
+    {
+        goto fail;
+    }
+
+    if (min(traceQueueFamilyProperties[tracePhysicalDevice].count, replayQueueFamilyProperties[replayPhysicalDevice].count) == 0)
+    {
+        goto fail;
+    }
+
+    if (replayQueueFamilyProperties[replayPhysicalDevice].count == 1)
+    {
+        *pReplayIdx = 0;
+        return true;
+    }
+
+    for (uint32_t i = 0; i < min(traceQueueFamilyProperties[tracePhysicalDevice].count, replayQueueFamilyProperties[replayPhysicalDevice].count); i++)
+    {
+        if (traceQueueFamilyProperties[tracePhysicalDevice].queueFamilyProperties[traceIdx].queueFlags == replayQueueFamilyProperties[replayPhysicalDevice].queueFamilyProperties[i].queueFlags)
+        {
+            *pReplayIdx = i;
+            return true;
+        }
+    }
+
+    // Didn't find an exact match, search for a superset
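+    // flags == (flags & other) holds exactly when every queue flag of the
+    // trace family is also present in the replay family; e.g. a traced
+    // GRAPHICS|COMPUTE family can map onto a GRAPHICS|COMPUTE|TRANSFER one.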
+    for (uint32_t i = 0; i < min(traceQueueFamilyProperties[tracePhysicalDevice].count, replayQueueFamilyProperties[replayPhysicalDevice].count); i++)
+    {
+        if (traceQueueFamilyProperties[tracePhysicalDevice].queueFamilyProperties[traceIdx].queueFlags ==
+            (traceQueueFamilyProperties[tracePhysicalDevice].queueFamilyProperties[traceIdx].queueFlags & replayQueueFamilyProperties[replayPhysicalDevice].queueFamilyProperties[i].queueFlags))
+        {
+            *pReplayIdx = i;
+            return true;
+        }
+    }
+
+fail:
+    vktrace_LogError("Cannot determine queue family index - has vkGetPhysicalDeviceQueueFamilyProperties been called?");
+    // Didn't find a match
+    return false;
+}
+
+bool vkReplay::getQueueFamilyIdx(VkDevice traceDevice,
+                                 VkDevice replayDevice,
+                                 uint32_t traceIdx,
+                                 uint32_t* pReplayIdx)
+{
+    VkPhysicalDevice tracePhysicalDevice;
+    VkPhysicalDevice replayPhysicalDevice;
+
+    if (tracePhysicalDevices.find(traceDevice) == tracePhysicalDevices.end() ||
+        replayPhysicalDevices.find(replayDevice) == replayPhysicalDevices.end())
+    {
+        vktrace_LogWarning("Cannot determine queue family index - has vkGetPhysicalDeviceQueueFamilyProperties been called?");
+        return false;
+    }
+
+    tracePhysicalDevice = tracePhysicalDevices[traceDevice];
+    replayPhysicalDevice = replayPhysicalDevices[replayDevice];
+
+    return getQueueFamilyIdx(tracePhysicalDevice,
+                             replayPhysicalDevice,
+                             traceIdx,
+                             pReplayIdx);
+}
+
+VkResult vkReplay::manually_replay_vkCreateDevice(packet_vkCreateDevice* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    if (!m_display->m_initedVK)
+    {
+        VkDevice device;
+        VkPhysicalDevice remappedPhysicalDevice = m_objMapper.remap_physicaldevices(pPacket->physicalDevice);
+        VkDeviceCreateInfo *pCreateInfo;
+        char **ppEnabledLayerNames = NULL, **saved_ppLayers = NULL;
+        if (remappedPhysicalDevice == VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkCreateDevice() due to invalid remapped VkPhysicalDevice.");
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+        const char strScreenShot[] = "VK_LAYER_LUNARG_screenshot";
+        //char *strScreenShotEnv = vktrace_get_global_var("_VK_SCREENSHOT");
+
+        pCreateInfo = (VkDeviceCreateInfo *) pPacket->pCreateInfo;
+        if (g_pReplaySettings->screenshotList != NULL) {
+            // enable screenshot layer if it is available and not already in list
+            bool found_ss = false;
+            for (uint32_t i = 0; i < pCreateInfo->enabledLayerCount; i++) {
+                if (!strcmp(pCreateInfo->ppEnabledLayerNames[i], strScreenShot)) {
+                    found_ss = true;
+                    break;
+                }
+            }
+            if (!found_ss) {
+                uint32_t count;
+
+                // query to find if ScreenShot layer is available
+                m_vkFuncs.real_vkEnumerateDeviceLayerProperties(remappedPhysicalDevice, &count, NULL);
+                VkLayerProperties *props = (VkLayerProperties *) vktrace_malloc(count * sizeof (VkLayerProperties));
+                if (props && count > 0)
+                    m_vkFuncs.real_vkEnumerateDeviceLayerProperties(remappedPhysicalDevice, &count, props);
+                for (uint32_t i = 0; i < count; i++) {
+                    if (!strcmp(props[i].layerName, strScreenShot)) {
+                        found_ss = true;
+                        break;
+                    }
+                }
+                if (found_ss) {
+                    // screenshot layer is available so enable it
+                    ppEnabledLayerNames = (char **) vktrace_malloc((pCreateInfo->enabledLayerCount+1) * sizeof(char *));
+                    for (uint32_t i = 0; i < pCreateInfo->enabledLayerCount && ppEnabledLayerNames; i++)
+                    {
+                        ppEnabledLayerNames[i] = (char *) pCreateInfo->ppEnabledLayerNames[i];
+                    }
+                    ppEnabledLayerNames[pCreateInfo->enabledLayerCount] = (char *) vktrace_malloc(strlen(strScreenShot) + 1);
+                    strcpy(ppEnabledLayerNames[pCreateInfo->enabledLayerCount++], strScreenShot);
+                    saved_ppLayers = (char **) pCreateInfo->ppEnabledLayerNames;
+                    pCreateInfo->ppEnabledLayerNames = ppEnabledLayerNames;
+                }
+                vktrace_free(props);
+            }
+        }
+
+        // Convert all instances of queueFamilyIndex in structure
+        for (uint32_t i = 0; i < pPacket->pCreateInfo->queueCreateInfoCount; i++) {
+            uint32_t replayIdx;
+            if (pPacket->pCreateInfo->pQueueCreateInfos &&
+                getQueueFamilyIdx(pPacket->physicalDevice,
+                                  remappedPhysicalDevice,
+                                  pPacket->pCreateInfo->pQueueCreateInfos->queueFamilyIndex,
+                                  &replayIdx))
+            {
+                *((uint32_t *)&pPacket->pCreateInfo->pQueueCreateInfos->queueFamilyIndex) = replayIdx;
+            }
+            else
+            {
+                vktrace_LogError("vkCreateDevice failed, bad queueFamilyIndex");
+                return VK_ERROR_VALIDATION_FAILED_EXT;
+            }
+        }
+
+        replayResult = m_vkFuncs.real_vkCreateDevice(remappedPhysicalDevice, pPacket->pCreateInfo, NULL, &device);
+        if (ppEnabledLayerNames)
+        {
+            // restore the packets CreateInfo struct
+            vktrace_free(ppEnabledLayerNames[pCreateInfo->enabledLayerCount-1]);
+            vktrace_free(ppEnabledLayerNames);
+            pCreateInfo->ppEnabledLayerNames = saved_ppLayers;
+        }
+        if (replayResult == VK_SUCCESS)
+        {
+            m_objMapper.add_to_devices_map(*(pPacket->pDevice), device);
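+            // Remember the trace<->replay physical device pairing so later
+            // getQueueFamilyIdx() calls can translate queue family indices.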
+            tracePhysicalDevices[*(pPacket->pDevice)] = pPacket->physicalDevice;
+            replayPhysicalDevices[device] = remappedPhysicalDevice;
+        }
+    }
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkCreateBuffer(packet_vkCreateBuffer* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    bufferObj local_bufferObj;
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    // Convert queueFamilyIndices
+    if (pPacket->pCreateInfo)
+    {
+        for (uint32_t i = 0; i < pPacket->pCreateInfo->queueFamilyIndexCount; i++)
+        {
+            uint32_t replayIdx;
+            if (pPacket->pCreateInfo->pQueueFamilyIndices &&
+                getQueueFamilyIdx(pPacket->device,
+                                  remappedDevice,
+                                  pPacket->pCreateInfo->pQueueFamilyIndices[i],
+                                  &replayIdx))
+            {
+                *((uint32_t*)&pPacket->pCreateInfo->pQueueFamilyIndices[i]) = replayIdx;
+            } else {
+                vktrace_LogError("vkCreateBuffer failed, bad queueFamilyIndex");
+                return VK_ERROR_VALIDATION_FAILED_EXT;
+            }
+        }
+    }
+
+    replayResult = m_vkFuncs.real_vkCreateBuffer(remappedDevice, pPacket->pCreateInfo, NULL, &local_bufferObj.replayBuffer);
+    if (replayResult == VK_SUCCESS)
+    {
+        traceBufferToDevice[*pPacket->pBuffer] = pPacket->device;
+        replayBufferToDevice[local_bufferObj.replayBuffer] = remappedDevice;
+        m_objMapper.add_to_buffers_map(*(pPacket->pBuffer), local_bufferObj);
+    }
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkCreateImage(packet_vkCreateImage* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    imageObj local_imageObj;
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    // Convert queueFamilyIndices
+    if (pPacket->pCreateInfo)
+    {
+        for (uint32_t i = 0; i < pPacket->pCreateInfo->queueFamilyIndexCount; i++)
+        {
+            uint32_t replayIdx;
+            if (pPacket->pCreateInfo->pQueueFamilyIndices &&
+                getQueueFamilyIdx(pPacket->device,
+                                  remappedDevice,
+                                  pPacket->pCreateInfo->pQueueFamilyIndices[i],
+                                  &replayIdx))
+            {
+                *((uint32_t*)&pPacket->pCreateInfo->pQueueFamilyIndices[i]) = replayIdx;
+            } else {
+                vktrace_LogError("vkCreateImage failed, bad queueFamilyIndex");
+                return VK_ERROR_VALIDATION_FAILED_EXT;
+            }
+        }
+    }
+
+    replayResult = m_vkFuncs.real_vkCreateImage(remappedDevice, pPacket->pCreateInfo, NULL, &local_imageObj.replayImage);
+    if (replayResult == VK_SUCCESS)
+    {
+        traceImageToDevice[*pPacket->pImage] = pPacket->device;
+        replayImageToDevice[local_imageObj.replayImage] = remappedDevice;
+        m_objMapper.add_to_images_map(*(pPacket->pImage), local_imageObj);
+    }
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkCreateCommandPool(packet_vkCreateCommandPool* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    VkCommandPool local_pCommandPool;
+    VkDevice remappeddevice = m_objMapper.remap_devices(pPacket->device);
+    if (pPacket->device != VK_NULL_HANDLE && remappeddevice == VK_NULL_HANDLE)
+    {
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    // No need to remap pAllocator
+
+    // Convert queueFamilyIndex
+    if (pPacket->pCreateInfo)
+    {
+        uint32_t replayIdx;
+        if (getQueueFamilyIdx(pPacket->device,
+                              remappeddevice,
+                              pPacket->pCreateInfo->queueFamilyIndex,
+                              &replayIdx))
+        {
+            *((uint32_t*)&pPacket->pCreateInfo->queueFamilyIndex) = replayIdx;
+        } else {
+            vktrace_LogError("vkCreateCommandPool failed, bad queueFamilyIndex");
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+    }
+
+    replayResult = m_vkFuncs.real_vkCreateCommandPool(remappeddevice, pPacket->pCreateInfo, pPacket->pAllocator, &local_pCommandPool);
+    if (replayResult == VK_SUCCESS)
+    {
+        m_objMapper.add_to_commandpools_map(*(pPacket->pCommandPool), local_pCommandPool);
+    }
+    return replayResult;
+}
+
+
+VkResult vkReplay::manually_replay_vkEnumeratePhysicalDevices(packet_vkEnumeratePhysicalDevices* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    if (!m_display->m_initedVK)
+    {
+        uint32_t deviceCount = *(pPacket->pPhysicalDeviceCount);
+        VkPhysicalDevice *pDevices = pPacket->pPhysicalDevices;
+
+        VkInstance remappedInstance = m_objMapper.remap_instances(pPacket->instance);
+        if (remappedInstance == VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkEnumeratePhysicalDevices() due to invalid remapped VkInstance.");
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+        if (pPacket->pPhysicalDevices != NULL)
+            pDevices = VKTRACE_NEW_ARRAY(VkPhysicalDevice, deviceCount);
+        replayResult = m_vkFuncs.real_vkEnumeratePhysicalDevices(remappedInstance, &deviceCount, pDevices);
+
+        //TODO handle different number of physical devices in trace versus replay
+        if (deviceCount != *(pPacket->pPhysicalDeviceCount))
+        {
+            vktrace_LogWarning("Number of physical devices mismatched in replay %u versus trace %u.", deviceCount, *(pPacket->pPhysicalDeviceCount));
+        }
+        else if (deviceCount == 0)
+        {
+             vktrace_LogError("vkEnumeratePhysicalDevices number of gpus is zero.");
+        }
+        else if (pDevices != NULL)
+        {
+            vktrace_LogVerbose("Enumerated %d physical devices in the system.", deviceCount);
+        }
+        // TODO handle enumeration results in a different order from trace to replay
+        for (uint32_t i = 0; i < deviceCount; i++)
+        {
+            if (pDevices != NULL)
+            {
+                m_objMapper.add_to_physicaldevices_map(pPacket->pPhysicalDevices[i], pDevices[i]);
+            }
+        }
+        VKTRACE_DELETE(pDevices);
+    }
+    return replayResult;
+}
+
+// TODO138 : Some of these functions have been renamed/changed in v138, need to scrub them and update as appropriate
+//VkResult vkReplay::manually_replay_vkGetPhysicalDeviceInfo(packet_vkGetPhysicalDeviceInfo* pPacket)
+//{
+//    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+//
+//    if (!m_display->m_initedVK)
+//    {
+//        VkPhysicalDevice remappedPhysicalDevice = m_objMapper.remap(pPacket->physicalDevice);
+//        if (remappedPhysicalDevice == VK_NULL_HANDLE)
+//            return VK_ERROR_VALIDATION_FAILED_EXT;
+//
+//        switch (pPacket->infoType) {
+//        case VK_PHYSICAL_DEVICE_INFO_TYPE_PROPERTIES:
+//        {
+//            VkPhysicalDeviceProperties deviceProps;
+//            size_t dataSize = sizeof(VkPhysicalDeviceProperties);
+//            replayResult = m_vkFuncs.real_vkGetPhysicalDeviceInfo(remappedPhysicalDevice, pPacket->infoType, &dataSize,
+//                            (pPacket->pData == NULL) ? NULL : &deviceProps);
+//            if (pPacket->pData != NULL)
+//            {
+//                vktrace_LogVerbose("Replay Physical Device Properties");
+//                vktrace_LogVerbose("Vendor ID %x, Device ID %x, name %s", deviceProps.vendorId, deviceProps.deviceId, deviceProps.deviceName);
+//                vktrace_LogVerbose("API version %u, Driver version %u, gpu Type %u", deviceProps.apiVersion, deviceProps.driverVersion, deviceProps.deviceType);
+//                vktrace_LogVerbose("Max Descriptor Sets: %u", deviceProps.maxDescriptorSets);
+//                vktrace_LogVerbose("Max Bound Descriptor Sets: %u", deviceProps.maxBoundDescriptorSets);
+//                vktrace_LogVerbose("Max Thread Group Size: %u", deviceProps.maxThreadGroupSize);
+//                vktrace_LogVerbose("Max Color Attachments: %u", deviceProps.maxColorAttachments);
+//                vktrace_LogVerbose("Max Inline Memory Update Size: %llu", deviceProps.maxInlineMemoryUpdateSize);
+//            }
+//            break;
+//        }
+//        case VK_PHYSICAL_DEVICE_INFO_TYPE_PERFORMANCE:
+//        {
+//            VkPhysicalDevicePerformance devicePerfs;
+//            size_t dataSize = sizeof(VkPhysicalDevicePerformance);
+//            replayResult = m_vkFuncs.real_vkGetPhysicalDeviceInfo(remappedPhysicalDevice, pPacket->infoType, &dataSize,
+//                            (pPacket->pData == NULL) ? NULL : &devicePerfs);
+//            if (pPacket->pData != NULL)
+//            {
+//                vktrace_LogVerbose("Replay Physical Device Performance");
+//                vktrace_LogVerbose("Max device clock %f, max shader ALUs/clock %f, max texel fetches/clock %f", devicePerfs.maxDeviceClock, devicePerfs.aluPerClock, devicePerfs.texPerClock);
+//                vktrace_LogVerbose("Max primitives/clock %f, Max pixels/clock %f",devicePerfs.primsPerClock, devicePerfs.pixelsPerClock);
+//            }
+//            break;
+//        }
+//        case VK_PHYSICAL_DEVICE_INFO_TYPE_QUEUE_PROPERTIES:
+//        {
+//            VkPhysicalDeviceQueueProperties *pGpuQueue, *pQ;
+//            size_t dataSize = sizeof(VkPhysicalDeviceQueueProperties);
+//            size_t numQueues = 1;
+//            assert(pPacket->pDataSize);
+//            if ((*(pPacket->pDataSize) % dataSize) != 0)
+//                vktrace_LogWarning("vkGetPhysicalDeviceInfo() for QUEUE_PROPERTIES not an integral data size assuming 1");
+//            else
+//                numQueues = *(pPacket->pDataSize) / dataSize;
+//            dataSize = numQueues * dataSize;
+//            pQ = static_cast < VkPhysicalDeviceQueueProperties *> (vktrace_malloc(dataSize));
+//            pGpuQueue = pQ;
+//            replayResult = m_vkFuncs.real_vkGetPhysicalDeviceInfo(remappedPhysicalDevice, pPacket->infoType, &dataSize,
+//                            (pPacket->pData == NULL) ? NULL : pGpuQueue);
+//            if (pPacket->pData != NULL)
+//            {
+//                for (unsigned int i = 0; i < numQueues; i++)
+//                {
+//                    vktrace_LogVerbose("Replay Physical Device Queue Property for index %d, flags %u.", i, pGpuQueue->queueFlags);
+//                    vktrace_LogVerbose("Max available count %u, max atomic counters %u, supports timestamps %u.",pGpuQueue->queueCount, pGpuQueue->maxAtomicCounters, pGpuQueue->supportsTimestamps);
+//                    pGpuQueue++;
+//                }
+//            }
+//            vktrace_free(pQ);
+//            break;
+//        }
+//        default:
+//        {
+//            size_t size = 0;
+//            void* pData = NULL;
+//            if (pPacket->pData != NULL && pPacket->pDataSize != NULL)
+//            {
+//                size = *pPacket->pDataSize;
+//                pData = vktrace_malloc(*pPacket->pDataSize);
+//            }
+//            replayResult = m_vkFuncs.real_vkGetPhysicalDeviceInfo(remappedPhysicalDevice, pPacket->infoType, &size, pData);
+//            if (replayResult == VK_SUCCESS)
+//            {
+///*                // TODO : We could pull this out into its own case of switch, and also may want to perform some
+////                //   validation between the trace values and replay values
+//                else*/ if (size != *pPacket->pDataSize && pData != NULL)
+//                {
+//                    vktrace_LogWarning("vkGetPhysicalDeviceInfo returned a differing data size: replay (%d bytes) vs trace (%d bytes)", size, *pPacket->pDataSize);
+//                }
+//                else if (pData != NULL && memcmp(pData, pPacket->pData, size) != 0)
+//                {
+//                    vktrace_LogWarning("vkGetPhysicalDeviceInfo returned differing data contents than the trace file contained.");
+//                }
+//            }
+//            vktrace_free(pData);
+//            break;
+//        }
+//        };
+//    }
+//    return replayResult;
+//}
+
+//VkResult vkReplay::manually_replay_vkGetGlobalExtensionInfo(packet_vkGetGlobalExtensionInfo* pPacket)
+//{
+//    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+//
+//    if (!m_display->m_initedVK) {
+//        replayResult = m_vkFuncs.real_vkGetGlobalExtensionInfo(pPacket->infoType, pPacket->extensionIndex, pPacket->pDataSize, pPacket->pData);
+//// TODO: Confirm that replay'd properties match with traced properties to ensure compatibility.
+////        if (replayResult == VK_SUCCESS) {
+////            for (unsigned int ext = 0; ext < sizeof(g_extensions) / sizeof(g_extensions[0]); ext++)
+////            {
+////                if (!strncmp(g_extensions[ext], pPacket->pExtName, strlen(g_extensions[ext]))) {
+////                    bool extInList = false;
+////                    for (unsigned int j = 0; j < m_display->m_extensions.size(); ++j) {
+////                        if (!strncmp(m_display->m_extensions[j], g_extensions[ext], strlen(g_extensions[ext])))
+////                            extInList = true;
+////                        break;
+////                    }
+////                    if (!extInList)
+////                        m_display->m_extensions.push_back((char *) g_extensions[ext]);
+////                    break;
+////                }
+////            }
+////        }
+//    }
+//    return replayResult;
+//}
+
+//VkResult vkReplay::manually_replay_vkGetPhysicalDeviceExtensionInfo(packet_vkGetPhysicalDeviceExtensionInfo* pPacket)
+//{
+//    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+//
+//    if (!m_display->m_initedVK) {
+//        VkPhysicalDevice remappedPhysicalDevice = m_objMapper.remap(pPacket->physicalDevice);
+//        if (remappedPhysicalDevice == VK_NULL_HANDLE)
+//            return VK_ERROR_VALIDATION_FAILED_EXT;
+//
+//        replayResult = m_vkFuncs.real_vkGetPhysicalDeviceExtensionInfo(remappedPhysicalDevice, pPacket->infoType, pPacket->extensionIndex, pPacket->pDataSize, pPacket->pData);
+//// TODO: Confirm that replay'd properties match with traced properties to ensure compatibility.
+////        if (replayResult == VK_SUCCESS) {
+////            for (unsigned int ext = 0; ext < sizeof(g_extensions) / sizeof(g_extensions[0]); ext++)
+////            {
+////                if (!strncmp(g_extensions[ext], pPacket->pExtName, strlen(g_extensions[ext]))) {
+////                    bool extInList = false;
+////                    for (unsigned int j = 0; j < m_display->m_extensions.size(); ++j) {
+////                        if (!strncmp(m_display->m_extensions[j], g_extensions[ext], strlen(g_extensions[ext])))
+////                            extInList = true;
+////                        break;
+////                    }
+////                    if (!extInList)
+////                        m_display->m_extensions.push_back((char *) g_extensions[ext]);
+////                    break;
+////                }
+////            }
+////        }
+//    }
+//    return replayResult;
+//}
+
+//VkResult vkReplay::manually_replay_vkGetSwapchainInfoWSI(packet_vkGetSwapchainInfoWSI* pPacket)
+//{
+//    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+//
+//    size_t dataSize = *pPacket->pDataSize;
+//    void* pData = vktrace_malloc(dataSize);
+//    VkSwapchainWSI remappedSwapchain = m_objMapper.remap_swapchainwsis(pPacket->swapchain);
+//    if (remappedSwapchain == VK_NULL_HANDLE)
+//    {
+//        vktrace_LogError("Skipping vkGetSwapchainInfoWSI() due to invalid remapped VkSwapchainWSI.");
+//        return VK_ERROR_VALIDATION_FAILED_EXT;
+//    }
+//    replayResult = m_vkFuncs.real_vkGetSwapchainInfoWSI(remappedSwapchain, pPacket->infoType, &dataSize, pData);
+//    if (replayResult == VK_SUCCESS)
+//    {
+//        if (dataSize != *pPacket->pDataSize)
+//        {
+//            vktrace_LogWarning("SwapchainInfo dataSize differs between trace (%d bytes) and replay (%d bytes)", *pPacket->pDataSize, dataSize);
+//        }
+//        if (pPacket->infoType == VK_SWAP_CHAIN_INFO_TYPE_IMAGES_WSI)
+//        {
+//            VkSwapchainImageInfoWSI* pImageInfoReplay = (VkSwapchainImageInfoWSI*)pData;
+//            VkSwapchainImageInfoWSI* pImageInfoTrace = (VkSwapchainImageInfoWSI*)pPacket->pData;
+//            size_t imageCountReplay = dataSize / sizeof(VkSwapchainImageInfoWSI);
+//            size_t imageCountTrace = *pPacket->pDataSize / sizeof(VkSwapchainImageInfoWSI);
+//            for (size_t i = 0; i < imageCountReplay && i < imageCountTrace; i++)
+//            {
+//                imageObj imgObj;
+//                imgObj.replayImage = pImageInfoReplay[i].image;
+//                m_objMapper.add_to_map(&pImageInfoTrace[i].image, &imgObj);
+//
+//                gpuMemObj memObj;
+//                memObj.replayGpuMem = pImageInfoReplay[i].memory;
+//                m_objMapper.add_to_map(&pImageInfoTrace[i].memory, &memObj);
+//            }
+//        }
+//    }
+//    vktrace_free(pData);
+//    return replayResult;
+//}
+
+VkResult vkReplay::manually_replay_vkQueueSubmit(packet_vkQueueSubmit* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkQueue remappedQueue = m_objMapper.remap_queues(pPacket->queue);
+    if (remappedQueue == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkQueueSubmit() due to invalid remapped VkQueue.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkFence remappedFence = m_objMapper.remap_fences(pPacket->fence);
+    if (pPacket->fence != VK_NULL_HANDLE && remappedFence == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkQueueSubmit() due to invalid remapped VkPhysicalDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkSubmitInfo *remappedSubmits = VKTRACE_NEW_ARRAY(VkSubmitInfo, pPacket->submitCount);
+    VkCommandBuffer *pRemappedBuffers = NULL;
+    VkSemaphore *pRemappedWaitSems = NULL, *pRemappedSignalSems = NULL;
+    for (uint32_t submit_idx = 0; submit_idx < pPacket->submitCount; submit_idx++) {
+        const VkSubmitInfo *submit = &pPacket->pSubmits[submit_idx];
+        VkSubmitInfo *remappedSubmit = &remappedSubmits[submit_idx];
+        memset(remappedSubmit, 0, sizeof(VkSubmitInfo));
+        remappedSubmit->sType = submit->sType;
+        remappedSubmit->pNext = submit->pNext;
+        remappedSubmit->pWaitDstStageMask = submit->pWaitDstStageMask;
+        // Remap Semaphores & CommandBuffers for this submit
+        uint32_t i = 0;
+        if (submit->pCommandBuffers != NULL) {
+            pRemappedBuffers = VKTRACE_NEW_ARRAY( VkCommandBuffer, submit->commandBufferCount);
+            remappedSubmit->pCommandBuffers = pRemappedBuffers;
+            remappedSubmit->commandBufferCount = submit->commandBufferCount;
+            for (i = 0; i < submit->commandBufferCount; i++) {
+                *(pRemappedBuffers + i) = m_objMapper.remap_commandbuffers(*(submit->pCommandBuffers + i));
+                if (*(pRemappedBuffers + i) == VK_NULL_HANDLE) {
+                    vktrace_LogError("Skipping vkQueueSubmit() due to invalid remapped VkCommandBuffer.");
+                    VKTRACE_DELETE(remappedSubmits);
+                    VKTRACE_DELETE(pRemappedBuffers);
+                    return replayResult;
+                }
+            }
+        }
+        if (submit->pWaitSemaphores != NULL) {
+            pRemappedWaitSems = VKTRACE_NEW_ARRAY(VkSemaphore, submit->waitSemaphoreCount);
+            remappedSubmit->pWaitSemaphores = pRemappedWaitSems;
+            remappedSubmit->waitSemaphoreCount = submit->waitSemaphoreCount;
+            for (i = 0; i < submit->waitSemaphoreCount; i++) {
+                (*(pRemappedWaitSems + i)) = m_objMapper.remap_semaphores((*(submit->pWaitSemaphores + i)));
+                if (*(pRemappedWaitSems + i) == VK_NULL_HANDLE) {
+                    vktrace_LogError("Skipping vkQueueSubmit() due to invalid remapped wait VkSemaphore.");
+                    VKTRACE_DELETE(remappedSubmits);
+                    VKTRACE_DELETE(pRemappedBuffers);
+                    VKTRACE_DELETE(pRemappedWaitSems);
+                    return replayResult;
+                }
+            }
+        }
+        if (submit->pSignalSemaphores != NULL) {
+            pRemappedSignalSems = VKTRACE_NEW_ARRAY(VkSemaphore, submit->signalSemaphoreCount);
+            remappedSubmit->pSignalSemaphores = pRemappedSignalSems;
+            remappedSubmit->signalSemaphoreCount = submit->signalSemaphoreCount;
+            for (i = 0; i < submit->signalSemaphoreCount; i++) {
+                (*(pRemappedSignalSems + i)) = m_objMapper.remap_semaphores((*(submit->pSignalSemaphores + i)));
+                if (*(pRemappedSignalSems + i) == VK_NULL_HANDLE) {
+                    vktrace_LogError("Skipping vkQueueSubmit() due to invalid remapped signal VkSemaphore.");
+                    VKTRACE_DELETE(remappedSubmits);
+                    VKTRACE_DELETE(pRemappedBuffers);
+                    VKTRACE_DELETE(pRemappedWaitSems);
+                    VKTRACE_DELETE(pRemappedSignalSems);
+                    return replayResult;
+                }
+            }
+        }
+    }
+    replayResult = m_vkFuncs.real_vkQueueSubmit(remappedQueue,
+                                                pPacket->submitCount,
+                                                remappedSubmits,
+                                                remappedFence);
+    VKTRACE_DELETE(pRemappedBuffers);
+    VKTRACE_DELETE(pRemappedWaitSems);
+    VKTRACE_DELETE(pRemappedSignalSems);
+    VKTRACE_DELETE(remappedSubmits);
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkQueueBindSparse(packet_vkQueueBindSparse* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkQueue remappedQueue = m_objMapper.remap_queues(pPacket->queue);
+    if (remappedQueue == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped VkQueue.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkFence remappedFence = m_objMapper.remap_fences(pPacket->fence);
+    if (pPacket->fence != VK_NULL_HANDLE && remappedFence == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped VkPhysicalDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkBindSparseInfo* remappedBindSparseInfos = VKTRACE_NEW_ARRAY(VkBindSparseInfo, pPacket->bindInfoCount);
+    VkSparseImageMemoryBind *pRemappedImageMemories = NULL;
+    VkSparseMemoryBind *pRemappedBufferMemories = NULL;
+    VkSparseMemoryBind *pRemappedImageOpaqueMemories = NULL;
+    VkSemaphore *pRemappedWaitSems = NULL;
+    VkSemaphore *pRemappedSignalSems = NULL;
+    VkSparseImageMemoryBindInfo* sIMBinf = NULL;
+    VkSparseBufferMemoryBindInfo* sBMBinf = NULL;
+    VkSparseImageOpaqueMemoryBindInfo* sIMOBinf = NULL;
+
+    memcpy((void*)remappedBindSparseInfos, (void*)(pPacket->pBindInfo), sizeof(VkBindSparseInfo)*pPacket->bindInfoCount);
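+
+    // Pointers embedded in the trace packet are stored as offsets into the
+    // packet's buffer; vktrace_trace_packet_interpret_buffer_pointer() below
+    // converts each one back into a usable address before its handles are
+    // remapped to replay-side objects.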
+
+    for (uint32_t bindInfo_idx = 0; bindInfo_idx < pPacket->bindInfoCount; bindInfo_idx++) {
+        if (remappedBindSparseInfos[bindInfo_idx].pBufferBinds)
+        {
+            sBMBinf = VKTRACE_NEW_ARRAY(VkSparseBufferMemoryBindInfo, remappedBindSparseInfos[bindInfo_idx].bufferBindCount);
+            remappedBindSparseInfos[bindInfo_idx].pBufferBinds = (const VkSparseBufferMemoryBindInfo*)(vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)remappedBindSparseInfos[bindInfo_idx].pBufferBinds));
+            memcpy((void*)sBMBinf, (void*)remappedBindSparseInfos[bindInfo_idx].pBufferBinds, sizeof(VkSparseBufferMemoryBindInfo)*remappedBindSparseInfos[bindInfo_idx].bufferBindCount);
+
+            sBMBinf->buffer = m_objMapper.remap_buffers(sBMBinf->buffer);
+
+            if (sBMBinf->buffer == VK_NULL_HANDLE)
+            {
+                vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped VkBuffer.");
+                goto FAILURE;
+            }
+
+            if (sBMBinf->bindCount > 0 && sBMBinf->pBinds)
+            {
+                pRemappedBufferMemories = (VkSparseMemoryBind*)(vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)remappedBindSparseInfos[bindInfo_idx].pBufferBinds->pBinds));
+            }
+
+            for (uint32_t bindCountIdx = 0; bindCountIdx < sBMBinf->bindCount; bindCountIdx++)
+            {
+                // Guard the map lookup: dereferencing find() on a missing key is undefined behavior.
+                if (m_objMapper.m_devicememorys.find(pRemappedBufferMemories[bindCountIdx].memory) == m_objMapper.m_devicememorys.end())
+                {
+                    vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped VkDeviceMemory.");
+                    goto FAILURE;
+                }
+                gpuMemObj local_mem = m_objMapper.m_devicememorys.find(pRemappedBufferMemories[bindCountIdx].memory)->second;
+                VkDeviceMemory replay_mem = m_objMapper.remap_devicememorys(pRemappedBufferMemories[bindCountIdx].memory);
+
+                if (replay_mem == VK_NULL_HANDLE || local_mem.pGpuMem == NULL)
+                {
+                    vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped VkDeviceMemory.");
+                    goto FAILURE;
+                }
+                pRemappedBufferMemories[bindCountIdx].memory = replay_mem;
+            }
+            sBMBinf->pBinds = pRemappedBufferMemories;
+            remappedBindSparseInfos[bindInfo_idx].pBufferBinds = sBMBinf;
+        }
+
+        if (remappedBindSparseInfos[bindInfo_idx].pImageBinds)
+        {
+            sIMBinf = VKTRACE_NEW_ARRAY(VkSparseImageMemoryBindInfo, remappedBindSparseInfos[bindInfo_idx].imageBindCount);
+            remappedBindSparseInfos[bindInfo_idx].pImageBinds = (const VkSparseImageMemoryBindInfo*)(vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)remappedBindSparseInfos[bindInfo_idx].pImageBinds));
+            memcpy((void*)sIMBinf, (void*)remappedBindSparseInfos[bindInfo_idx].pImageBinds, sizeof(VkSparseImageMemoryBindInfo)*remappedBindSparseInfos[bindInfo_idx].imageBindCount);
+
+            sIMBinf->image = m_objMapper.remap_images(sIMBinf->image);
+
+            if(sIMBinf->image == VK_NULL_HANDLE)
+            {
+                vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped VkImage.");
+                goto FAILURE;
+            }
+
+            if (sIMBinf->bindCount > 0 && sIMBinf->pBinds)
+            {
+                pRemappedImageMemories = (VkSparseImageMemoryBind*)(vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)remappedBindSparseInfos[bindInfo_idx].pImageBinds->pBinds));
+            }
+            for (uint32_t bindCountIdx = 0; bindCountIdx < sIMBinf->bindCount; bindCountIdx++)
+            {
+                // Guard the map lookup: dereferencing find() on a missing key is undefined behavior.
+                if (m_objMapper.m_devicememorys.find(pRemappedImageMemories[bindCountIdx].memory) == m_objMapper.m_devicememorys.end())
+                {
+                    vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped VkDeviceMemory.");
+                    goto FAILURE;
+                }
+                gpuMemObj local_mem = m_objMapper.m_devicememorys.find(pRemappedImageMemories[bindCountIdx].memory)->second;
+                VkDeviceMemory replay_mem = m_objMapper.remap_devicememorys(pRemappedImageMemories[bindCountIdx].memory);
+
+                if (replay_mem == VK_NULL_HANDLE || local_mem.pGpuMem == NULL)
+                {
+                    vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped VkDeviceMemory.");
+                    goto FAILURE;
+                }
+                pRemappedImageMemories[bindCountIdx].memory = replay_mem;
+            }
+            sIMBinf->pBinds = pRemappedImageMemories;
+            remappedBindSparseInfos[bindInfo_idx].pImageBinds = sIMBinf;
+        }
+
+        if (remappedBindSparseInfos[bindInfo_idx].pImageOpaqueBinds)
+        {
+            sIMOBinf = VKTRACE_NEW_ARRAY(VkSparseImageOpaqueMemoryBindInfo, remappedBindSparseInfos[bindInfo_idx].imageOpaqueBindCount);
+            remappedBindSparseInfos[bindInfo_idx].pImageOpaqueBinds = (const VkSparseImageOpaqueMemoryBindInfo*)(vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)remappedBindSparseInfos[bindInfo_idx].pImageOpaqueBinds));
+            memcpy((void*)sIMOBinf, (void*)remappedBindSparseInfos[bindInfo_idx].pImageOpaqueBinds, sizeof(VkSparseImageOpaqueMemoryBindInfo)*remappedBindSparseInfos[bindInfo_idx].imageOpaqueBindCount);
+
+            sIMOBinf->image = m_objMapper.remap_images(sIMOBinf->image);
+
+            if (sIMOBinf->image == VK_NULL_HANDLE)
+            {
+                vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped VkImage.");
+                goto FAILURE;
+            }
+
+            if (sIMOBinf->bindCount > 0 && sIMOBinf->pBinds)
+            {
+                pRemappedImageOpaqueMemories = (VkSparseMemoryBind*)(vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)remappedBindSparseInfos[bindInfo_idx].pImageOpaqueBinds->pBinds));
+            }
+            for (uint32_t bindCountIdx = 0; bindCountIdx < sIMOBinf->bindCount; bindCountIdx++)
+            {
+                // Guard the map lookup: dereferencing find() on a missing key is undefined behavior.
+                if (m_objMapper.m_devicememorys.find(pRemappedImageOpaqueMemories[bindCountIdx].memory) == m_objMapper.m_devicememorys.end())
+                {
+                    vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped VkDeviceMemory.");
+                    goto FAILURE;
+                }
+                gpuMemObj local_mem = m_objMapper.m_devicememorys.find(pRemappedImageOpaqueMemories[bindCountIdx].memory)->second;
+                VkDeviceMemory replay_mem = m_objMapper.remap_devicememorys(pRemappedImageOpaqueMemories[bindCountIdx].memory);
+
+                if (replay_mem == VK_NULL_HANDLE || local_mem.pGpuMem == NULL)
+                {
+                    vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped VkDeviceMemory.");
+                    goto FAILURE;
+                }
+                pRemappedImageOpaqueMemories[bindCountIdx].memory = replay_mem;
+            }
+            sIMOBinf->pBinds = pRemappedImageOpaqueMemories;
+            remappedBindSparseInfos[bindInfo_idx].pImageOpaqueBinds = sIMOBinf;
+        }
+
+        if (remappedBindSparseInfos[bindInfo_idx].pWaitSemaphores != NULL) {
+            // Save the trace-time semaphore array before overwriting the pointer with the remapped array.
+            const VkSemaphore* pOrigWaitSems = remappedBindSparseInfos[bindInfo_idx].pWaitSemaphores;
+            pRemappedWaitSems = VKTRACE_NEW_ARRAY(VkSemaphore, remappedBindSparseInfos[bindInfo_idx].waitSemaphoreCount);
+            remappedBindSparseInfos[bindInfo_idx].pWaitSemaphores = pRemappedWaitSems;
+            for (uint32_t i = 0; i < remappedBindSparseInfos[bindInfo_idx].waitSemaphoreCount; i++) {
+                pRemappedWaitSems[i] = m_objMapper.remap_semaphores(pOrigWaitSems[i]);
+                if (pRemappedWaitSems[i] == VK_NULL_HANDLE) {
+                    vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped wait VkSemaphore.");
+                    goto FAILURE;
+                }
+            }
+        }
+        if (remappedBindSparseInfos[bindInfo_idx].pSignalSemaphores != NULL) {
+            // Save the trace-time semaphore array before overwriting the pointer with the remapped array.
+            const VkSemaphore* pOrigSignalSems = remappedBindSparseInfos[bindInfo_idx].pSignalSemaphores;
+            pRemappedSignalSems = VKTRACE_NEW_ARRAY(VkSemaphore, remappedBindSparseInfos[bindInfo_idx].signalSemaphoreCount);
+            remappedBindSparseInfos[bindInfo_idx].pSignalSemaphores = pRemappedSignalSems;
+            for (uint32_t i = 0; i < remappedBindSparseInfos[bindInfo_idx].signalSemaphoreCount; i++) {
+                pRemappedSignalSems[i] = m_objMapper.remap_semaphores(pOrigSignalSems[i]);
+                if (pRemappedSignalSems[i] == VK_NULL_HANDLE) {
+                    vktrace_LogError("Skipping vkQueueBindSparse() due to invalid remapped signal VkSemaphore.");
+                    goto FAILURE;
+                }
+            }
+        }
+    }
+
+    replayResult = m_vkFuncs.real_vkQueueBindSparse(remappedQueue,
+        pPacket->bindInfoCount,
+        remappedBindSparseInfos,
+        remappedFence);
+
+FAILURE:
+    VKTRACE_DELETE(remappedBindSparseInfos);
+    VKTRACE_DELETE(sIMBinf);
+    VKTRACE_DELETE(sBMBinf);
+    VKTRACE_DELETE(sIMOBinf);
+    VKTRACE_DELETE(pRemappedSignalSems);
+    VKTRACE_DELETE(pRemappedWaitSems);
+    return replayResult;
+}
+
+
+//VkResult vkReplay::manually_replay_vkGetObjectInfo(packet_vkGetObjectInfo* pPacket)
+//{
+//    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+//
+//    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+//    if (remappedDevice == VK_NULL_HANDLE)
+//    {
+//        vktrace_LogError("Skipping vkGetObjectInfo() due to invalid remapped VkDevice.");
+//        return VK_ERROR_VALIDATION_FAILED_EXT;
+//    }
+//
+//    VkObject remappedObject = m_objMapper.remap(pPacket->object, pPacket->objType);
+//    if (remappedObject == VK_NULL_HANDLE)
+//    {
+//        vktrace_LogError("Skipping vkGetObjectInfo() due to invalid remapped VkObject.");
+//        return VK_ERROR_VALIDATION_FAILED_EXT;
+//    }
+//
+//    size_t size = 0;
+//    void* pData = NULL;
+//    if (pPacket->pData != NULL && pPacket->pDataSize != NULL)
+//    {
+//        size = *pPacket->pDataSize;
+//        pData = vktrace_malloc(*pPacket->pDataSize);
+//        memcpy(pData, pPacket->pData, *pPacket->pDataSize);
+//    }
+//    // TODO only search for object once rather than at remap() and init_objMemXXX()
+//    replayResult = m_vkFuncs.real_vkGetObjectInfo(remappedDevice, pPacket->objType, remappedObject, pPacket->infoType, &size, pData);
+//    if (replayResult == VK_SUCCESS)
+//    {
+//        if (size != *pPacket->pDataSize && pData != NULL)
+//        {
+//            vktrace_LogWarning("vkGetObjectInfo returned a differing data size: replay (%d bytes) vs trace (%d bytes).", size, *pPacket->pDataSize);
+//        }
+//        else if (pData != NULL)
+//        {
+//            switch (pPacket->infoType)
+//            {
+//                case VK_OBJECT_INFO_TYPE_MEMORY_REQUIREMENTS:
+//                {
+//                    VkMemoryRequirements *traceReqs = (VkMemoryRequirements *) pPacket->pData;
+//                    VkMemoryRequirements *replayReqs = (VkMemoryRequirements *) pData;
+//                    size_t num = size / sizeof(VkMemoryRequirements);
+//                    for (unsigned int i = 0; i < num; i++)
+//                    {
+//                        if (traceReqs->size != replayReqs->size)
+//                            vktrace_LogWarning("vkGetObjectInfo(INFO_TYPE_MEMORY_REQUIREMENTS) mismatch: trace size %u, replay size %u.", traceReqs->size, replayReqs->size);
+//                        if (traceReqs->alignment != replayReqs->alignment)
+//                            vktrace_LogWarning("vkGetObjectInfo(INFO_TYPE_MEMORY_REQUIREMENTS) mismatch: trace alignment %u, replay alignment %u.", traceReqs->alignment, replayReqs->alignment);
+//                        if (traceReqs->granularity != replayReqs->granularity)
+//                            vktrace_LogWarning("vkGetObjectInfo(INFO_TYPE_MEMORY_REQUIREMENTS) mismatch: trace granularity %u, replay granularity %u.", traceReqs->granularity, replayReqs->granularity);
+//                        if (traceReqs->memPropsAllowed != replayReqs->memPropsAllowed)
+//                            vktrace_LogWarning("vkGetObjectInfo(INFO_TYPE_MEMORY_REQUIREMENTS) mismatch: trace memPropsAllowed %u, replay memPropsAllowed %u.", traceReqs->memPropsAllowed, replayReqs->memPropsAllowed);
+//                        if (traceReqs->memPropsRequired != replayReqs->memPropsRequired)
+//                            vktrace_LogWarning("vkGetObjectInfo(INFO_TYPE_MEMORY_REQUIREMENTS) mismatch: trace memPropsRequired %u, replay memPropsRequired %u.", traceReqs->memPropsRequired, replayReqs->memPropsRequired);
+//                        traceReqs++;
+//                        replayReqs++;
+//                    }
+//                    if (m_objMapper.m_adjustForGPU)
+//                        m_objMapper.init_objMemReqs(pPacket->object, replayReqs - num, num);
+//                    break;
+//                }
+//                default:
+//                    if (memcmp(pData, pPacket->pData, size) != 0)
+//                        vktrace_LogWarning("vkGetObjectInfo() mismatch between trace and replay *pData contents (*pDataSize %u).", size);
+//            }
+//        }
+//    }
+//    vktrace_free(pData);
+//    return replayResult;
+//}
+
+//VkResult vkReplay::manually_replay_vkGetImageSubresourceInfo(packet_vkGetImageSubresourceInfo* pPacket)
+//{
+//    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+//
+//    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+//    if (remappedDevice == VK_NULL_HANDLE)
+//    {
+//        vktrace_LogError("Skipping vkGetImageSubresourceInfo() due to invalid remapped VkDevice.");
+//        return VK_ERROR_VALIDATION_FAILED_EXT;
+//    }
+//
+//    VkImage remappedImage = m_objMapper.remap(pPacket->image);
+//    if (remappedImage == VK_NULL_HANDLE)
+//    {
+//        vktrace_LogError("Skipping vkGetImageSubresourceInfo() due to invalid remapped VkImage.");
+//        return VK_ERROR_VALIDATION_FAILED_EXT;
+//    }
+//
+//    size_t size = 0;
+//    void* pData = NULL;
+//    if (pPacket->pData != NULL && pPacket->pDataSize != NULL)
+//    {
+//        size = *pPacket->pDataSize;
+//        pData = vktrace_malloc(*pPacket->pDataSize);
+//    }
+//    replayResult = m_vkFuncs.real_vkGetImageSubresourceInfo(remappedDevice, remappedImage, pPacket->pSubresource, pPacket->infoType, &size, pData);
+//    if (replayResult == VK_SUCCESS)
+//    {
+//        if (size != *pPacket->pDataSize && pData != NULL)
+//        {
+//            vktrace_LogWarning("vkGetImageSubresourceInfo returned a differing data size: replay (%d bytes) vs trace (%d bytes).", size, *pPacket->pDataSize);
+//        }
+//        else if (pData != NULL && memcmp(pData, pPacket->pData, size) != 0)
+//        {
+//            vktrace_LogWarning("vkGetImageSubresourceInfo returned differing data contents than the trace file contained.");
+//        }
+//    }
+//    vktrace_free(pData);
+//    return replayResult;
+//}
+
+void vkReplay::manually_replay_vkUpdateDescriptorSets(packet_vkUpdateDescriptorSets* pPacket)
+{
+    // We have to remap handles that are internal to the structures, so make local copies of the
+    // write/copy arrays and of each per-descriptor info array, remap the handles in the copies,
+    // and free the copies after the call.
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkUpdateDescriptorSets() due to invalid remapped VkDevice.");
+        return;
+    }
+
+    VkWriteDescriptorSet* pRemappedWrites = VKTRACE_NEW_ARRAY(VkWriteDescriptorSet, pPacket->descriptorWriteCount);
+    memcpy(pRemappedWrites, pPacket->pDescriptorWrites, pPacket->descriptorWriteCount * sizeof(VkWriteDescriptorSet));
+
+    VkCopyDescriptorSet* pRemappedCopies = VKTRACE_NEW_ARRAY(VkCopyDescriptorSet, pPacket->descriptorCopyCount);
+    memcpy(pRemappedCopies, pPacket->pDescriptorCopies, pPacket->descriptorCopyCount * sizeof(VkCopyDescriptorSet));
+
+    bool errorBadRemap = false;
+
+    for (uint32_t i = 0; i < pPacket->descriptorWriteCount && !errorBadRemap; i++)
+    {
+        pRemappedWrites[i].dstSet = m_objMapper.remap_descriptorsets(pPacket->pDescriptorWrites[i].dstSet);
+        if (pRemappedWrites[i].dstSet == VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkUpdateDescriptorSets() due to invalid remapped write VkDescriptorSet.");
+            errorBadRemap = true;
+            break;
+        }
+
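+        // Each descriptor type updates a different union member (pImageInfo, pBufferInfo, or
+        // pTexelBufferView); copy and remap only the array that applies to this write.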
+        switch (pPacket->pDescriptorWrites[i].descriptorType) {
+        case VK_DESCRIPTOR_TYPE_SAMPLER:
+            pRemappedWrites[i].pImageInfo = VKTRACE_NEW_ARRAY(VkDescriptorImageInfo, pPacket->pDescriptorWrites[i].descriptorCount);
+            memcpy((void*)pRemappedWrites[i].pImageInfo, pPacket->pDescriptorWrites[i].pImageInfo, pPacket->pDescriptorWrites[i].descriptorCount * sizeof(VkDescriptorImageInfo));
+            for (uint32_t j = 0; j < pPacket->pDescriptorWrites[i].descriptorCount; j++)
+            {
+                if (pPacket->pDescriptorWrites[i].pImageInfo[j].sampler != VK_NULL_HANDLE)
+                {
+                    const_cast<VkDescriptorImageInfo*>(pRemappedWrites[i].pImageInfo)[j].sampler = m_objMapper.remap_samplers(pPacket->pDescriptorWrites[i].pImageInfo[j].sampler);
+                    if (pRemappedWrites[i].pImageInfo[j].sampler == VK_NULL_HANDLE)
+                    {
+                        vktrace_LogError("Skipping vkUpdateDescriptorSets() due to invalid remapped VkSampler.");
+                        errorBadRemap = true;
+                        break;
+                    }
+                }
+            }
+            break;
+        case VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE:
+        case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE:
+        case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT:
+            pRemappedWrites[i].pImageInfo = VKTRACE_NEW_ARRAY(VkDescriptorImageInfo, pPacket->pDescriptorWrites[i].descriptorCount);
+            memcpy((void*)pRemappedWrites[i].pImageInfo, pPacket->pDescriptorWrites[i].pImageInfo, pPacket->pDescriptorWrites[i].descriptorCount * sizeof(VkDescriptorImageInfo));
+            for (uint32_t j = 0; j < pPacket->pDescriptorWrites[i].descriptorCount; j++)
+            {
+                if (pPacket->pDescriptorWrites[i].pImageInfo[j].imageView != VK_NULL_HANDLE)
+                {
+                    const_cast<VkDescriptorImageInfo*>(pRemappedWrites[i].pImageInfo)[j].imageView = m_objMapper.remap_imageviews(pPacket->pDescriptorWrites[i].pImageInfo[j].imageView);
+                    if (pRemappedWrites[i].pImageInfo[j].imageView == VK_NULL_HANDLE)
+                    {
+                        vktrace_LogError("Skipping vkUpdateDescriptorSets() due to invalid remapped VkImageView.");
+                        errorBadRemap = true;
+                        break;
+                    }
+                }
+            }
+            break;
+        case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:
+            pRemappedWrites[i].pImageInfo = VKTRACE_NEW_ARRAY(VkDescriptorImageInfo, pPacket->pDescriptorWrites[i].descriptorCount);
+            memcpy((void*)pRemappedWrites[i].pImageInfo, pPacket->pDescriptorWrites[i].pImageInfo, pPacket->pDescriptorWrites[i].descriptorCount * sizeof(VkDescriptorImageInfo));
+            for (uint32_t j = 0; j < pPacket->pDescriptorWrites[i].descriptorCount; j++)
+            {
+                if (pPacket->pDescriptorWrites[i].pImageInfo[j].sampler != VK_NULL_HANDLE)
+                {
+                    const_cast<VkDescriptorImageInfo*>(pRemappedWrites[i].pImageInfo)[j].sampler = m_objMapper.remap_samplers(pPacket->pDescriptorWrites[i].pImageInfo[j].sampler);
+                    if (pRemappedWrites[i].pImageInfo[j].sampler == VK_NULL_HANDLE)
+                    {
+                        vktrace_LogError("Skipping vkUpdateDescriptorSets() due to invalid remapped VkSampler.");
+                        errorBadRemap = true;
+                        break;
+                    }
+                }
+                if (pPacket->pDescriptorWrites[i].pImageInfo[j].imageView != VK_NULL_HANDLE)
+                {
+                    const_cast<VkDescriptorImageInfo*>(pRemappedWrites[i].pImageInfo)[j].imageView = m_objMapper.remap_imageviews(pPacket->pDescriptorWrites[i].pImageInfo[j].imageView);
+                    if (pRemappedWrites[i].pImageInfo[j].imageView == VK_NULL_HANDLE)
+                    {
+                        vktrace_LogError("Skipping vkUpdateDescriptorSets() due to invalid remapped VkImageView.");
+                        errorBadRemap = true;
+                        break;
+                    }
+                }
+            }
+            break;
+        case VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER:
+        case VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER:
+            pRemappedWrites[i].pTexelBufferView = VKTRACE_NEW_ARRAY(VkBufferView, pPacket->pDescriptorWrites[i].descriptorCount);
+            memcpy((void*)pRemappedWrites[i].pTexelBufferView, pPacket->pDescriptorWrites[i].pTexelBufferView, pPacket->pDescriptorWrites[i].descriptorCount * sizeof(VkBufferView));
+            for (uint32_t j = 0; j < pPacket->pDescriptorWrites[i].descriptorCount; j++)
+            {
+                if (pPacket->pDescriptorWrites[i].pTexelBufferView[j] != VK_NULL_HANDLE)
+                {
+                    const_cast<VkBufferView*>(pRemappedWrites[i].pTexelBufferView)[j] = m_objMapper.remap_bufferviews(pPacket->pDescriptorWrites[i].pTexelBufferView[j]);
+                    if (pRemappedWrites[i].pTexelBufferView[j] == VK_NULL_HANDLE)
+                    {
+                        vktrace_LogError("Skipping vkUpdateDescriptorSets() due to invalid remapped VkBufferView.");
+                        errorBadRemap = true;
+                        break;
+                    }
+                }
+            }
+            break;
+        case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER:
+        case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER:
+        case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
+        case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:
+            pRemappedWrites[i].pBufferInfo = VKTRACE_NEW_ARRAY(VkDescriptorBufferInfo, pPacket->pDescriptorWrites[i].descriptorCount);
+            memcpy((void*)pRemappedWrites[i].pBufferInfo, pPacket->pDescriptorWrites[i].pBufferInfo, pPacket->pDescriptorWrites[i].descriptorCount * sizeof(VkDescriptorBufferInfo));
+            for (uint32_t j = 0; j < pPacket->pDescriptorWrites[i].descriptorCount; j++)
+            {
+                if (pPacket->pDescriptorWrites[i].pBufferInfo[j].buffer != VK_NULL_HANDLE)
+                {
+                    const_cast<VkDescriptorBufferInfo*>(pRemappedWrites[i].pBufferInfo)[j].buffer = m_objMapper.remap_buffers(pPacket->pDescriptorWrites[i].pBufferInfo[j].buffer);
+                    if (pRemappedWrites[i].pBufferInfo[j].buffer == VK_NULL_HANDLE)
+                    {
+                        vktrace_LogError("Skipping vkUpdateDescriptorSets() due to invalid remapped VkBufferView.");
+                        errorBadRemap = true;
+                        break;
+                    }
+                }
+            }
+            break;
+        default:
+            /* Nothing to do, already copied the constant values into the new descriptor info */
+            break;
+        }
+    }
+
+    for (uint32_t i = 0; i < pPacket->descriptorCopyCount && !errorBadRemap; i++)
+    {
+        pRemappedCopies[i].dstSet = m_objMapper.remap_descriptorsets(pPacket->pDescriptorCopies[i].dstSet);
+        if (pRemappedCopies[i].dstSet == VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkUpdateDescriptorSets() due to invalid remapped destination VkDescriptorSet.");
+            errorBadRemap = true;
+            break;
+        }
+
+        pRemappedCopies[i].srcSet = m_objMapper.remap_descriptorsets(pPacket->pDescriptorCopies[i].srcSet);
+        if (pRemappedCopies[i].srcSet == VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkUpdateDescriptorSets() due to invalid remapped source VkDescriptorSet.");
+            errorBadRemap = true;
+            break;
+        }
+    }
+
+    if (!errorBadRemap)
+    {
+        // Only call the real function when every remap succeeded; either way, the local copies are freed below.
+        m_vkFuncs.real_vkUpdateDescriptorSets(remappedDevice, pPacket->descriptorWriteCount, pRemappedWrites, pPacket->descriptorCopyCount, pRemappedCopies);
+    }
+
+    for (uint32_t d = 0; d < pPacket->descriptorWriteCount; d++)
+    {
+        if (pRemappedWrites[d].pImageInfo != NULL) {
+            VKTRACE_DELETE((void*)pRemappedWrites[d].pImageInfo);
+            pRemappedWrites[d].pImageInfo = NULL;
+        }
+        if (pRemappedWrites[d].pBufferInfo != NULL) {
+            VKTRACE_DELETE((void*)pRemappedWrites[d].pBufferInfo);
+            pRemappedWrites[d].pBufferInfo = NULL;
+        }
+        if (pRemappedWrites[d].pTexelBufferView != NULL) {
+            VKTRACE_DELETE((void*)pRemappedWrites[d].pTexelBufferView);
+            pRemappedWrites[d].pTexelBufferView = NULL;
+        }
+    }
+    VKTRACE_DELETE(pRemappedWrites);
+    VKTRACE_DELETE(pRemappedCopies);
+}
+
+VkResult vkReplay::manually_replay_vkCreateDescriptorSetLayout(packet_vkCreateDescriptorSetLayout* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateDescriptorSetLayout() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
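+    // Remap immutable sampler handles in place inside the packet's create info; note that the
+    // packet is not restored afterward, so this mutates the interpreted trace data.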
+    VkDescriptorSetLayoutCreateInfo *pInfo = (VkDescriptorSetLayoutCreateInfo*) pPacket->pCreateInfo;
+    if (pInfo != NULL)
+    {
+        if (pInfo->pBindings != NULL)
+        {
+            for (unsigned int i = 0; i < pInfo->bindingCount; i++)
+            {
+                VkDescriptorSetLayoutBinding *pBindings = (VkDescriptorSetLayoutBinding *) &pInfo->pBindings[i];
+                if (pBindings->pImmutableSamplers != NULL)
+                {
+                    for (unsigned int j = 0; j < pBindings->descriptorCount; j++)
+                    {
+                        VkSampler* pSampler = (VkSampler*)&pBindings->pImmutableSamplers[j];
+                        *pSampler = m_objMapper.remap_samplers(pBindings->pImmutableSamplers[j]);
+                        if (*pSampler == VK_NULL_HANDLE)
+                        {
+                            vktrace_LogError("Skipping vkCreateDescriptorSetLayout() due to invalid remapped VkSampler.");
+                            return VK_ERROR_VALIDATION_FAILED_EXT;
+                        }
+                    }
+                }
+            }
+        }
+    }
+    VkDescriptorSetLayout setLayout;
+    replayResult = m_vkFuncs.real_vkCreateDescriptorSetLayout(remappedDevice, pPacket->pCreateInfo, NULL, &setLayout);
+    if (replayResult == VK_SUCCESS)
+    {
+        m_objMapper.add_to_descriptorsetlayouts_map(*(pPacket->pSetLayout), setLayout);
+    }
+    return replayResult;
+}
+
+void vkReplay::manually_replay_vkDestroyDescriptorSetLayout(packet_vkDestroyDescriptorSetLayout* pPacket)
+{
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE) {
+        vktrace_LogError("Skipping vkDestroyDescriptorSetLayout() due to invalid remapped VkDevice.");
+        return;
+    }
+
+    m_vkFuncs.real_vkDestroyDescriptorSetLayout(remappedDevice, pPacket->descriptorSetLayout, NULL);
+    m_objMapper.rm_from_descriptorsetlayouts_map(pPacket->descriptorSetLayout);
+}
+
+VkResult vkReplay::manually_replay_vkAllocateDescriptorSets(packet_vkAllocateDescriptorSets* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkAllocateDescriptorSets() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkDescriptorPool remappedPool = m_objMapper.remap_descriptorpools(pPacket->pAllocateInfo->descriptorPool);
+    if (remappedPool == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkAllocateDescriptorSets() due to invalid remapped VkDescriptorPool.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkDescriptorSetLayout* pRemappedSetLayouts = VKTRACE_NEW_ARRAY(VkDescriptorSetLayout, pPacket->pAllocateInfo->descriptorSetCount);
+
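+    // Rebuild the allocate info locally so the call below uses the remapped pool and set layouts.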
+    VkDescriptorSetAllocateInfo allocateInfo;
+    allocateInfo.pNext = NULL;
+    allocateInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
+    allocateInfo.descriptorPool = remappedPool;
+    allocateInfo.descriptorSetCount = pPacket->pAllocateInfo->descriptorSetCount;
+    allocateInfo.pSetLayouts = pRemappedSetLayouts;
+
+    for (uint32_t i = 0; i < allocateInfo.descriptorSetCount; i++)
+    {
+        pRemappedSetLayouts[i] = m_objMapper.remap_descriptorsetlayouts(pPacket->pAllocateInfo->pSetLayouts[i]);
+        if (pRemappedSetLayouts[i] == VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkAllocateDescriptorSets() due to invalid remapped VkDescriptorSetLayout.");
+            VKTRACE_DELETE(pRemappedSetLayouts);
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+    }
+
+    VkDescriptorSet* pDescriptorSets = VKTRACE_NEW_ARRAY(VkDescriptorSet, allocateInfo.descriptorSetCount);
+    replayResult = m_vkFuncs.real_vkAllocateDescriptorSets(
+                       remappedDevice,
+                       &allocateInfo,
+                       pDescriptorSets);
+    if(replayResult == VK_SUCCESS)
+    {
+        for(uint32_t i = 0; i < pPacket->pAllocateInfo->descriptorSetCount; ++i)
+        {
+           m_objMapper.add_to_descriptorsets_map(pPacket->pDescriptorSets[i], pDescriptorSets[i]);
+        }
+    }
+
+    VKTRACE_DELETE(pRemappedSetLayouts);
+    VKTRACE_DELETE(pDescriptorSets);
+
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkFreeDescriptorSets(packet_vkFreeDescriptorSets* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkFreeDescriptorSets() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkDescriptorPool remappedDescriptorPool;
+    remappedDescriptorPool = m_objMapper.remap_descriptorpools(pPacket->descriptorPool);
+    if (remappedDescriptorPool == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkFreeDescriptorSets() due to invalid remapped VkDescriptorPool.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkDescriptorSet* localDSs = VKTRACE_NEW_ARRAY(VkDescriptorSet, pPacket->descriptorSetCount);
+    uint32_t i;
+    for (i = 0; i < pPacket->descriptorSetCount; ++i) {
+        localDSs[i] = m_objMapper.remap_descriptorsets(pPacket->pDescriptorSets[i]);
+        if (localDSs[i] == VK_NULL_HANDLE && pPacket->pDescriptorSets[i] != VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkFreeDescriptorSets() due to invalid remapped VkDescriptorSet.");
+            VKTRACE_DELETE(localDSs);
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+    }
+
+    replayResult = m_vkFuncs.real_vkFreeDescriptorSets(remappedDevice, remappedDescriptorPool, pPacket->descriptorSetCount, localDSs);
+    if(replayResult == VK_SUCCESS)
+    {
+        for (i = 0; i < pPacket->descriptorSetCount; ++i) {
+           m_objMapper.rm_from_descriptorsets_map(pPacket->pDescriptorSets[i]);
+        }
+    }
+    VKTRACE_DELETE(localDSs);
+    return replayResult;
+}
+
+void vkReplay::manually_replay_vkCmdBindDescriptorSets(packet_vkCmdBindDescriptorSets* pPacket)
+{
+    VkCommandBuffer remappedCommandBuffer = m_objMapper.remap_commandbuffers(pPacket->commandBuffer);
+    if (remappedCommandBuffer == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCmdBindDescriptorSets() due to invalid remapped VkCommandBuffer.");
+        return;
+    }
+
+    VkPipelineLayout remappedLayout = m_objMapper.remap_pipelinelayouts(pPacket->layout);
+    if (remappedLayout == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCmdBindDescriptorSets() due to invalid remapped VkPipelineLayout.");
+        return;
+    }
+
+    VkDescriptorSet* pRemappedSets = (VkDescriptorSet *) vktrace_malloc(sizeof(VkDescriptorSet) * pPacket->descriptorSetCount);
+    if (pRemappedSets == NULL)
+    {
+        vktrace_LogError("Replay of CmdBindDescriptorSets out of memory.");
+        return;
+    }
+
+    for (uint32_t idx = 0; idx < pPacket->descriptorSetCount && pPacket->pDescriptorSets != NULL; idx++)
+    {
+        pRemappedSets[idx] = m_objMapper.remap_descriptorsets(pPacket->pDescriptorSets[idx]);
+        if (pRemappedSets[idx] == VK_NULL_HANDLE && pPacket->pDescriptorSets[idx] != VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkCmdBindDescriptorSets() due to invalid remapped VkDescriptorSet.");
+            vktrace_free(pRemappedSets);
+            return;
+        }
+    }
+
+    m_vkFuncs.real_vkCmdBindDescriptorSets(remappedCommandBuffer, pPacket->pipelineBindPoint, remappedLayout, pPacket->firstSet, pPacket->descriptorSetCount, pRemappedSets, pPacket->dynamicOffsetCount, pPacket->pDynamicOffsets);
+    vktrace_free(pRemappedSets);
+    return;
+}
+
+void vkReplay::manually_replay_vkCmdBindVertexBuffers(packet_vkCmdBindVertexBuffers* pPacket)
+{
+    VkCommandBuffer remappedCommandBuffer = m_objMapper.remap_commandbuffers(pPacket->commandBuffer);
+    if (remappedCommandBuffer == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCmdBindVertexBuffers() due to invalid remapped VkCommandBuffer.");
+        return;
+    }
+
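+    // Remap the buffer handles in place inside the packet, saving the trace-time handles so
+    // they can be restored after the call.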
+    VkBuffer *pSaveBuff = VKTRACE_NEW_ARRAY(VkBuffer, pPacket->bindingCount);
+    if (pSaveBuff == NULL && pPacket->bindingCount > 0)
+    {
+        vktrace_LogError("Replay of CmdBindVertexBuffers out of memory.");
+        return;
+    }
+    uint32_t i = 0;
+    if (pPacket->pBuffers != NULL) {
+        for (i = 0; i < pPacket->bindingCount; i++)
+        {
+            VkBuffer *pBuff = (VkBuffer*) &(pPacket->pBuffers[i]);
+            pSaveBuff[i] = pPacket->pBuffers[i];
+            *pBuff = m_objMapper.remap_buffers(pPacket->pBuffers[i]);
+            if (*pBuff == VK_NULL_HANDLE && pPacket->pBuffers[i] != VK_NULL_HANDLE)
+            {
+                vktrace_LogError("Skipping vkCmdBindVertexBuffers() due to invalid remapped VkBuffer.");
+                VKTRACE_DELETE(pSaveBuff);
+                return;
+            }
+        }
+    }
+    m_vkFuncs.real_vkCmdBindVertexBuffers(remappedCommandBuffer, pPacket->firstBinding, pPacket->bindingCount, pPacket->pBuffers, pPacket->pOffsets);
+    for (uint32_t k = 0; k < i; k++)
+    {
+        VkBuffer *pBuff = (VkBuffer*) &(pPacket->pBuffers[k]);
+        *pBuff = pSaveBuff[k];
+    }
+    VKTRACE_DELETE(pSaveBuff);
+    return;
+}
+
+//VkResult vkReplay::manually_replay_vkCreateGraphicsPipeline(packet_vkCreateGraphicsPipeline* pPacket)
+//{
+//    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+//
+//    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+//    if (remappedDevice == VK_NULL_HANDLE)
+//    {
+//        vktrace_LogError("Skipping vkCreateGraphicsPipeline() due to invalid remapped VkDevice.");
+//        return VK_ERROR_VALIDATION_FAILED_EXT;
+//    }
+//
+//    // remap shaders from each stage
+//    VkPipelineShaderStageCreateInfo* pRemappedStages = VKTRACE_NEW_ARRAY(VkPipelineShaderStageCreateInfo, pPacket->pCreateInfo->stageCount);
+//    memcpy(pRemappedStages, pPacket->pCreateInfo->pStages, sizeof(VkPipelineShaderStageCreateInfo) * pPacket->pCreateInfo->stageCount);
+//    for (uint32_t i = 0; i < pPacket->pCreateInfo->stageCount; i++)
+//    {
+//        pRemappedStages[i].shader = m_objMapper.remap(pRemappedStages[i].shader);
+//        if (pRemappedStages[i].shader == VK_NULL_HANDLE)
+//        {
+//            vktrace_LogError("Skipping vkCreateGraphicsPipeline() due to invalid remapped VkShader.");
+//            return VK_ERROR_VALIDATION_FAILED_EXT;
+//        }
+//    }
+//
+//    VkGraphicsPipelineCreateInfo createInfo = {
+//        .sType = pPacket->pCreateInfo->sType,
+//        .pNext = pPacket->pCreateInfo->pNext,
+//        .stageCount = pPacket->pCreateInfo->stageCount,
+//        .pStages = pRemappedStages,
+//        .pVertexInputState = pPacket->pCreateInfo->pVertexInputState,
+//        .pIaState = pPacket->pCreateInfo->pIaState,
+//        .pTessState = pPacket->pCreateInfo->pTessState,
+//        .pVpState = pPacket->pCreateInfo->pVpState,
+//        .pRsState = pPacket->pCreateInfo->pRsState,
+//        .pMsState = pPacket->pCreateInfo->pMsState,
+//        .pDsState = pPacket->pCreateInfo->pDsState,
+//        .pCbState = pPacket->pCreateInfo->pCbState,
+//        .flags = pPacket->pCreateInfo->flags,
+//        .layout = m_objMapper.remap(pPacket->pCreateInfo->layout)
+//    };
+//
+//    VkPipeline pipeline;
+//    replayResult = m_vkFuncs.real_vkCreateGraphicsPipeline(remappedDevice, &createInfo, &pipeline);
+//    if (replayResult == VK_SUCCESS)
+//    {
+//        m_objMapper.add_to_map(pPacket->pPipeline, &pipeline);
+//    }
+//
+//    VKTRACE_DELETE(pRemappedStages);
+//
+//    return replayResult;
+//}
+
+VkResult vkReplay::manually_replay_vkGetPipelineCacheData(packet_vkGetPipelineCacheData* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    size_t dataSize;
+    VkDevice remappeddevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappeddevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetPipelineCacheData() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkPipelineCache remappedpipelineCache = m_objMapper.remap_pipelinecaches(pPacket->pipelineCache);
+    if (pPacket->pipelineCache != VK_NULL_HANDLE && remappedpipelineCache == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetPipelineCacheData() due to invalid remapped VkPipelineCache.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    // Since the returned data size may not equal the size of the buffer in the trace packet, allocate a local buffer as needed.
+    replayResult = m_vkFuncs.real_vkGetPipelineCacheData(remappeddevice, remappedpipelineCache, &dataSize, NULL);
+    if (replayResult != VK_SUCCESS)
+        return replayResult;
+    if (pPacket->pData) {
+        uint8_t *pData = VKTRACE_NEW_ARRAY(uint8_t, dataSize);
+        replayResult = m_vkFuncs.real_vkGetPipelineCacheData(remappeddevice, remappedpipelineCache, &dataSize, pData);
+        VKTRACE_DELETE(pData);
+    }
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkCreateComputePipelines(packet_vkCreateComputePipelines* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    VkDevice remappeddevice = m_objMapper.remap_devices(pPacket->device);
+    uint32_t i;
+
+    if (pPacket->device != VK_NULL_HANDLE && remappeddevice == VK_NULL_HANDLE)
+    {
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkPipelineCache pipelineCache;
+    pipelineCache = m_objMapper.remap_pipelinecaches(pPacket->pipelineCache);
+
+    VkComputePipelineCreateInfo* pLocalCIs = VKTRACE_NEW_ARRAY(VkComputePipelineCreateInfo, pPacket->createInfoCount);
+    memcpy((void*)pLocalCIs, (void*)(pPacket->pCreateInfos), sizeof(VkComputePipelineCreateInfo)*pPacket->createInfoCount);
+
+    // Fix up stage sub-elements
+    for (i = 0; i<pPacket->createInfoCount; i++)
+    {
+        pLocalCIs[i].stage.module = m_objMapper.remap_shadermodules(pLocalCIs[i].stage.module);
+
+        if (pLocalCIs[i].stage.pName)
+            pLocalCIs[i].stage.pName = (const char*)(vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)pLocalCIs[i].stage.pName));
+
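+        // pSpecializationInfo and its interior pointers live in the trace buffer; deep-copy the
+        // struct and re-interpret pMapEntries/pData against the packet before use.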
+        if (pLocalCIs[i].stage.pSpecializationInfo)
+        {
+            VkSpecializationInfo* si = VKTRACE_NEW(VkSpecializationInfo);
+            pLocalCIs[i].stage.pSpecializationInfo = (const VkSpecializationInfo*)(vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)pLocalCIs[i].stage.pSpecializationInfo));
+            memcpy((void*)si, (void*)(pLocalCIs[i].stage.pSpecializationInfo), sizeof(VkSpecializationInfo));
+
+            if (si->mapEntryCount > 0 && si->pMapEntries)
+                si->pMapEntries = (const VkSpecializationMapEntry*)(vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)pLocalCIs[i].stage.pSpecializationInfo->pMapEntries));
+            if (si->dataSize > 0 && si->pData)
+                si->pData = (const void*)(vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)si->pData));
+            pLocalCIs[i].stage.pSpecializationInfo = si;
+        }
+
+        pLocalCIs[i].layout = m_objMapper.remap_pipelinelayouts(pLocalCIs[i].layout);
+        pLocalCIs[i].basePipelineHandle = m_objMapper.remap_pipelines(pLocalCIs[i].basePipelineHandle);
+    }
+
+    VkPipeline *local_pPipelines = VKTRACE_NEW_ARRAY(VkPipeline, pPacket->createInfoCount);
+
+    replayResult = m_vkFuncs.real_vkCreateComputePipelines(remappeddevice, pipelineCache, pPacket->createInfoCount, pLocalCIs, NULL, local_pPipelines);
+
+    if (replayResult == VK_SUCCESS)
+    {
+        for (i = 0; i < pPacket->createInfoCount; i++) {
+            m_objMapper.add_to_pipelines_map(pPacket->pPipelines[i], local_pPipelines[i]);
+        }
+    }
+
+    for (i = 0; i<pPacket->createInfoCount; i++)
+        if (pLocalCIs[i].stage.pSpecializationInfo)
+            VKTRACE_DELETE((void *)pLocalCIs[i].stage.pSpecializationInfo);
+    VKTRACE_DELETE(pLocalCIs);
+    VKTRACE_DELETE(local_pPipelines);
+
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkCreateGraphicsPipelines(packet_vkCreateGraphicsPipelines* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateGraphicsPipelines() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    // remap shaders from each stage
+    VkPipelineShaderStageCreateInfo* pRemappedStages = VKTRACE_NEW_ARRAY(VkPipelineShaderStageCreateInfo, pPacket->pCreateInfos->stageCount);
+    memcpy(pRemappedStages, pPacket->pCreateInfos->pStages, sizeof(VkPipelineShaderStageCreateInfo) * pPacket->pCreateInfos->stageCount);
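+    // Note: a single stage array, sized and copied from the first create info, is reused for
+    // every element of pCreateInfos below; traces with createInfoCount > 1 would remap these
+    // modules more than once.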
+
+    VkGraphicsPipelineCreateInfo* pLocalCIs = VKTRACE_NEW_ARRAY(VkGraphicsPipelineCreateInfo, pPacket->createInfoCount);
+    uint32_t i, j;
+    for (i=0; i<pPacket->createInfoCount; i++) {
+        memcpy((void*)&(pLocalCIs[i]), (void*)&(pPacket->pCreateInfos[i]), sizeof(VkGraphicsPipelineCreateInfo));
+        for (j=0; j < pPacket->pCreateInfos[i].stageCount; j++)
+        {
+            pRemappedStages[j].module = m_objMapper.remap_shadermodules(pRemappedStages[j].module);
+            if (pRemappedStages[j].module == VK_NULL_HANDLE)
+            {
+                vktrace_LogError("Skipping vkCreateGraphicsPipelines() due to invalid remapped VkShaderModule.");
+                VKTRACE_DELETE(pRemappedStages);
+                VKTRACE_DELETE(pLocalCIs);
+                return VK_ERROR_VALIDATION_FAILED_EXT;
+            }
+        }
+        VkPipelineShaderStageCreateInfo** ppSSCI = (VkPipelineShaderStageCreateInfo**)&(pLocalCIs[i].pStages);
+        *ppSSCI = pRemappedStages;
+
+        pLocalCIs[i].layout = m_objMapper.remap_pipelinelayouts(pPacket->pCreateInfos[i].layout);
+        if (pLocalCIs[i].layout == VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkCreateGraphicsPipelines() due to invalid remapped VkPipelineLayout.");
+            VKTRACE_DELETE(pRemappedStages);
+            VKTRACE_DELETE(pLocalCIs);
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+
+        pLocalCIs[i].renderPass = m_objMapper.remap_renderpasss(pPacket->pCreateInfos[i].renderPass);
+        if (pLocalCIs[i].renderPass == VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkCreateGraphicsPipelines() due to invalid remapped VkRenderPass.");
+            VKTRACE_DELETE(pRemappedStages);
+            VKTRACE_DELETE(pLocalCIs);
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+
+        pLocalCIs[i].basePipelineHandle = m_objMapper.remap_pipelines(pPacket->pCreateInfos[i].basePipelineHandle);
+        if (pLocalCIs[i].basePipelineHandle == VK_NULL_HANDLE && pPacket->pCreateInfos[i].basePipelineHandle != VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkCreateGraphicsPipelines() due to invalid remapped VkPipeline.");
+            VKTRACE_DELETE(pRemappedStages);
+            VKTRACE_DELETE(pLocalCIs);
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+
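+        // pViewports/pScissors and pSampleMask are nested trace-file pointers; re-interpret
+        // them against the packet buffer before handing the create info to the driver.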
+        ((VkPipelineViewportStateCreateInfo*)pLocalCIs[i].pViewportState)->pViewports =
+                (VkViewport*)vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)pPacket->pCreateInfos[i].pViewportState->pViewports);
+        ((VkPipelineViewportStateCreateInfo*)pLocalCIs[i].pViewportState)->pScissors =
+                (VkRect2D*)vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)pPacket->pCreateInfos[i].pViewportState->pScissors);
+
+        ((VkPipelineMultisampleStateCreateInfo*)pLocalCIs[i].pMultisampleState)->pSampleMask =
+                (VkSampleMask*)vktrace_trace_packet_interpret_buffer_pointer(pPacket->header, (intptr_t)pPacket->pCreateInfos[i].pMultisampleState->pSampleMask);
+    }
+
+    VkPipelineCache remappedPipelineCache;
+    remappedPipelineCache = m_objMapper.remap_pipelinecaches(pPacket->pipelineCache);
+    if (remappedPipelineCache == VK_NULL_HANDLE && pPacket->pipelineCache != VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateGraphicsPipelines() due to invalid remapped VkPipelineCache.");
+        VKTRACE_DELETE(pRemappedStages);
+        VKTRACE_DELETE(pLocalCIs);
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    uint32_t createInfoCount = pPacket->createInfoCount;
+    VkPipeline *local_pPipelines = VKTRACE_NEW_ARRAY(VkPipeline, pPacket->createInfoCount);
+
+    replayResult = m_vkFuncs.real_vkCreateGraphicsPipelines(remappedDevice, remappedPipelineCache, createInfoCount, pLocalCIs, NULL, local_pPipelines);
+
+    if (replayResult == VK_SUCCESS)
+    {
+        for (i = 0; i < pPacket->createInfoCount; i++) {
+            m_objMapper.add_to_pipelines_map(pPacket->pPipelines[i], local_pPipelines[i]);
+        }
+    }
+
+    VKTRACE_DELETE(pRemappedStages);
+    VKTRACE_DELETE(pLocalCIs);
+    VKTRACE_DELETE(local_pPipelines);
+
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkCreatePipelineLayout(packet_vkCreatePipelineLayout* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+
+    // array to store the original trace-time layouts, so that we can remap them inside the packet and then
+    // restore them after replaying the API call.
+    VkDescriptorSetLayout* pSaveLayouts = NULL;
+    if (pPacket->pCreateInfo->setLayoutCount > 0) {
+        pSaveLayouts = (VkDescriptorSetLayout*) vktrace_malloc(sizeof(VkDescriptorSetLayout) * pPacket->pCreateInfo->setLayoutCount);
+        if (!pSaveLayouts) {
+            vktrace_LogError("Replay of CreatePipelineLayout out of memory.");
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+    }
+    uint32_t i = 0;
+    for (i = 0; (i < pPacket->pCreateInfo->setLayoutCount) && (pPacket->pCreateInfo->pSetLayouts != NULL); i++) {
+        VkDescriptorSetLayout* pSL = (VkDescriptorSetLayout*) &(pPacket->pCreateInfo->pSetLayouts[i]);
+        pSaveLayouts[i] = pPacket->pCreateInfo->pSetLayouts[i];
+        *pSL = m_objMapper.remap_descriptorsetlayouts(pPacket->pCreateInfo->pSetLayouts[i]);
+    }
+    VkPipelineLayout localPipelineLayout;
+    replayResult = m_vkFuncs.real_vkCreatePipelineLayout(remappedDevice, pPacket->pCreateInfo, NULL, &localPipelineLayout);
+    if (replayResult == VK_SUCCESS)
+    {
+        m_objMapper.add_to_pipelinelayouts_map(*(pPacket->pPipelineLayout), localPipelineLayout);
+    }
+    // restore packet to contain the original Set Layouts before being remapped.
+    for (uint32_t k = 0; k < i; k++) {
+        VkDescriptorSetLayout* pSL = (VkDescriptorSetLayout*) &(pPacket->pCreateInfo->pSetLayouts[k]);
+        *pSL = pSaveLayouts[k];
+    }
+    if (pSaveLayouts != NULL) {
+        vktrace_free(pSaveLayouts);
+    }
+    return replayResult;
+}
+
+void vkReplay::manually_replay_vkCmdWaitEvents(packet_vkCmdWaitEvents* pPacket)
+{
+    VkDevice traceDevice;
+    VkDevice replayDevice;
+    uint32_t srcReplayIdx, dstReplayIdx;
+    VkCommandBuffer remappedCommandBuffer = m_objMapper.remap_commandbuffers(pPacket->commandBuffer);
+    if (remappedCommandBuffer == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCmdWaitEvents() due to invalid remapped VkCommandBuffer.");
+        return;
+    }
+
+    VkEvent* saveEvent = VKTRACE_NEW_ARRAY(VkEvent, pPacket->eventCount);
+    uint32_t idx = 0;
+    uint32_t numRemapBuf = 0;
+    uint32_t numRemapImg = 0;
+    for (idx = 0; idx < pPacket->eventCount; idx++)
+    {
+        VkEvent *pEvent = (VkEvent *) &(pPacket->pEvents[idx]);
+        saveEvent[idx] = pPacket->pEvents[idx];
+        *pEvent = m_objMapper.remap_events(pPacket->pEvents[idx]);
+        if (*pEvent == VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkCmdWaitEvents() due to invalid remapped VkEvent.");
+            VKTRACE_DELETE(saveEvent);
+            return;
+        }
+    }
+
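+    // For each buffer barrier: remap the buffer handle in place and translate the trace-time
+    // queue family indices to their replay-time equivalents.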
+    VkBuffer* saveBuf = VKTRACE_NEW_ARRAY(VkBuffer, pPacket->bufferMemoryBarrierCount);
+    for (idx = 0; idx < pPacket->bufferMemoryBarrierCount; idx++)
+    {
+        VkBufferMemoryBarrier *pNextBuf = (VkBufferMemoryBarrier *)& (pPacket->pBufferMemoryBarriers[idx]);
+        saveBuf[numRemapBuf++] = pNextBuf->buffer;
+        traceDevice = traceBufferToDevice[pNextBuf->buffer];
+        pNextBuf->buffer = m_objMapper.remap_buffers(pNextBuf->buffer);
+        if (pNextBuf->buffer == VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkCmdWaitEvents() due to invalid remapped VkBuffer.");
+            VKTRACE_DELETE(saveEvent);
+            VKTRACE_DELETE(saveBuf);
+            return;
+        }
+        replayDevice = replayBufferToDevice[pNextBuf->buffer];
+        if (getQueueFamilyIdx(traceDevice,
+                              replayDevice,
+                              pPacket->pBufferMemoryBarriers[idx].srcQueueFamilyIndex,
+                              &srcReplayIdx) &&
+            getQueueFamilyIdx(traceDevice,
+                              replayDevice,
+                              pPacket->pBufferMemoryBarriers[idx].dstQueueFamilyIndex,
+                              &dstReplayIdx))
+        {
+            *((uint32_t *)&pPacket->pBufferMemoryBarriers[idx].srcQueueFamilyIndex) = srcReplayIdx;
+            *((uint32_t *)&pPacket->pBufferMemoryBarriers[idx].srcQueueFamilyIndex) = dstReplayIdx;
+        } else {
+            vktrace_LogError("vkCmdWaitEvents failed, bad srcQueueFamilyIndex");
+            VKTRACE_DELETE(saveEvent);
+            VKTRACE_DELETE(saveBuf);
+            return;
+        }
+    }
+    VkImage* saveImg = VKTRACE_NEW_ARRAY(VkImage, pPacket->imageMemoryBarrierCount);
+    for (idx = 0; idx < pPacket->imageMemoryBarrierCount; idx++)
+    {
+        VkImageMemoryBarrier *pNextImg = (VkImageMemoryBarrier *) &(pPacket->pImageMemoryBarriers[idx]);
+        saveImg[numRemapImg++] = pNextImg->image;
+        traceDevice = traceImageToDevice[pNextImg->image];
+        pNextImg->image = m_objMapper.remap_images(pNextImg->image);
+        if (pNextImg->image == VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkCmdWaitEvents() due to invalid remapped VkImage.");
+            VKTRACE_DELETE(saveEvent);
+            VKTRACE_DELETE(saveBuf);
+            VKTRACE_DELETE(saveImg);
+            return;
+        }
+        replayDevice = replayImageToDevice[pNextImg->image];
+        if (getQueueFamilyIdx(traceDevice,
+                              replayDevice,
+                              pPacket->pImageMemoryBarriers[idx].srcQueueFamilyIndex,
+                              &srcReplayIdx) &&
+            getQueueFamilyIdx(traceDevice,
+                              replayDevice,
+                              pPacket->pImageMemoryBarriers[idx].dstQueueFamilyIndex,
+                              &dstReplayIdx))
+        {
+            *((uint32_t *)&pPacket->pImageMemoryBarriers[idx].srcQueueFamilyIndex) = srcReplayIdx;
+            *((uint32_t *)&pPacket->pImageMemoryBarriers[idx].srcQueueFamilyIndex) = dstReplayIdx;
+        } else {
+            vktrace_LogError("vkCmdWaitEvents failed, bad srcQueueFamilyIndex");
+            VKTRACE_DELETE(saveEvent);
+            VKTRACE_DELETE(saveBuf);
+            VKTRACE_DELETE(saveImg);
+            return;
+        }
+    }
+    m_vkFuncs.real_vkCmdWaitEvents(remappedCommandBuffer, pPacket->eventCount, pPacket->pEvents, pPacket->srcStageMask, pPacket->dstStageMask, pPacket->memoryBarrierCount, pPacket->pMemoryBarriers, pPacket->bufferMemoryBarrierCount, pPacket->pBufferMemoryBarriers, pPacket->imageMemoryBarrierCount, pPacket->pImageMemoryBarriers);
+
+    for (idx = 0; idx < pPacket->bufferMemoryBarrierCount; idx++) {
+        VkBufferMemoryBarrier *pNextBuf = (VkBufferMemoryBarrier *) &(pPacket->pBufferMemoryBarriers[idx]);
+        pNextBuf->buffer = saveBuf[idx];
+    }
+    for (idx = 0; idx < pPacket->imageMemoryBarrierCount; idx++) {
+        VkImageMemoryBarrier *pNextImg = (VkImageMemoryBarrier *) &(pPacket->pImageMemoryBarriers[idx]);
+        pNextImg->image = saveImg[idx];
+    }
+    for (idx = 0; idx < pPacket->eventCount; idx++) {
+        VkEvent *pEvent = (VkEvent *) &(pPacket->pEvents[idx]);
+        *pEvent = saveEvent[idx];
+    }
+    VKTRACE_DELETE(saveEvent);
+    VKTRACE_DELETE(saveBuf);
+    VKTRACE_DELETE(saveImg);
+    return;
+}
+
+void vkReplay::manually_replay_vkCmdPipelineBarrier(packet_vkCmdPipelineBarrier* pPacket)
+{
+    VkDevice traceDevice;
+    VkDevice replayDevice;
+    uint32_t srcReplayIdx, dstReplayIdx;
+    VkCommandBuffer remappedCommandBuffer = m_objMapper.remap_commandbuffers(pPacket->commandBuffer);
+    if (remappedCommandBuffer == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCmdPipelineBarrier() due to invalid remapped VkCommandBuffer.");
+        return;
+    }
+
+    uint32_t idx = 0;
+    uint32_t numRemapBuf = 0;
+    uint32_t numRemapImg = 0;
+    VkBuffer* saveBuf = VKTRACE_NEW_ARRAY(VkBuffer, pPacket->bufferMemoryBarrierCount);
+    VkImage* saveImg = VKTRACE_NEW_ARRAY(VkImage, pPacket->imageMemoryBarrierCount);
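+    // As in vkCmdWaitEvents: remap barrier handles in place and translate queue family indices,
+    // restoring the trace-time handles after the call.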
+    for (idx = 0; idx < pPacket->bufferMemoryBarrierCount; idx++)
+    {
+        VkBufferMemoryBarrier *pNextBuf = (VkBufferMemoryBarrier *) &(pPacket->pBufferMemoryBarriers[idx]);
+        saveBuf[numRemapBuf++] = pNextBuf->buffer;
+        traceDevice = traceBufferToDevice[pNextBuf->buffer];
+        pNextBuf->buffer = m_objMapper.remap_buffers(pNextBuf->buffer);
+        if (pNextBuf->buffer == VK_NULL_HANDLE && saveBuf[numRemapBuf - 1] != VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkCmdPipelineBarrier() due to invalid remapped VkBuffer.");
+            VKTRACE_DELETE(saveBuf);
+            VKTRACE_DELETE(saveImg);
+            return;
+        }
+        replayDevice = replayBufferToDevice[pNextBuf->buffer];
+        if (getQueueFamilyIdx(traceDevice,
+                              replayDevice,
+                              pPacket->pBufferMemoryBarriers[idx].srcQueueFamilyIndex,
+                              &srcReplayIdx) &&
+            getQueueFamilyIdx(traceDevice,
+                              replayDevice,
+                              pPacket->pBufferMemoryBarriers[idx].dstQueueFamilyIndex,
+                              &dstReplayIdx))
+        {
+            *((uint32_t *)&pPacket->pBufferMemoryBarriers[idx].srcQueueFamilyIndex) = srcReplayIdx;
+            *((uint32_t *)&pPacket->pBufferMemoryBarriers[idx].srcQueueFamilyIndex) = dstReplayIdx;
+        } else {
+            vktrace_LogError("vkCmdPipelineBarrier failed, bad srcQueueFamilyIndex");
+            VKTRACE_DELETE(saveBuf);
+            VKTRACE_DELETE(saveImg);
+            return;
+        }
+    }
+    for (idx = 0; idx < pPacket->imageMemoryBarrierCount; idx++)
+    {
+        VkImageMemoryBarrier *pNextImg = (VkImageMemoryBarrier *) &(pPacket->pImageMemoryBarriers[idx]);
+        saveImg[numRemapImg++] = pNextImg->image;
+        traceDevice = traceImageToDevice[pNextImg->image];
+        if (traceDevice == NULL)
+            vktrace_LogError("DEBUG: traceDevice is NULL");
+        pNextImg->image = m_objMapper.remap_images(pNextImg->image);
+        if (pNextImg->image == VK_NULL_HANDLE && saveImg[numRemapImg - 1] != VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkCmdPipelineBarrier() due to invalid remapped VkImage.");
+            VKTRACE_DELETE(saveBuf);
+            VKTRACE_DELETE(saveImg);
+            return;
+        }
+        replayDevice = replayImageToDevice[pNextImg->image];
+        if (getQueueFamilyIdx(traceDevice,
+                              replayDevice,
+                              pPacket->pImageMemoryBarriers[idx].srcQueueFamilyIndex,
+                              &srcReplayIdx) &&
+            getQueueFamilyIdx(traceDevice,
+                              replayDevice,
+                              pPacket->pImageMemoryBarriers[idx].dstQueueFamilyIndex,
+                              &dstReplayIdx))
+        {
+            *((uint32_t *)&pPacket->pImageMemoryBarriers[idx].srcQueueFamilyIndex) = srcReplayIdx;
+            *((uint32_t *)&pPacket->pImageMemoryBarriers[idx].srcQueueFamilyIndex) = dstReplayIdx;
+        } else {
+            vktrace_LogError("vkCmdPipelineBarrier failed, bad srcQueueFamilyIndex");
+            VKTRACE_DELETE(saveBuf);
+            VKTRACE_DELETE(saveImg);
+            return;
+        }
+    }
+    m_vkFuncs.real_vkCmdPipelineBarrier(remappedCommandBuffer, pPacket->srcStageMask, pPacket->dstStageMask, pPacket->dependencyFlags, pPacket->memoryBarrierCount, pPacket->pMemoryBarriers, pPacket->bufferMemoryBarrierCount, pPacket->pBufferMemoryBarriers, pPacket->imageMemoryBarrierCount, pPacket->pImageMemoryBarriers);
+
+    for (idx = 0; idx < pPacket->bufferMemoryBarrierCount; idx++) {
+        VkBufferMemoryBarrier *pNextBuf = (VkBufferMemoryBarrier *) &(pPacket->pBufferMemoryBarriers[idx]);
+        pNextBuf->buffer = saveBuf[idx];
+    }
+    for (idx = 0; idx < pPacket->imageMemoryBarrierCount; idx++) {
+        VkImageMemoryBarrier *pNextImg = (VkImageMemoryBarrier *) &(pPacket->pImageMemoryBarriers[idx]);
+        pNextImg->image = saveImg[idx];
+    }
+    VKTRACE_DELETE(saveBuf);
+    VKTRACE_DELETE(saveImg);
+    return;
+}
+
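+// Note: several replay functions below follow a common save/remap/restore pattern:
+// trace-time handles in the packet are saved, overwritten with their remapped
+// replay-time equivalents for the real API call, then restored afterwards,
+// presumably so the packet stays intact if it is replayed again (e.g. in looping replays).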
+VkResult vkReplay::manually_replay_vkCreateFramebuffer(packet_vkCreateFramebuffer* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateFramebuffer() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkFramebufferCreateInfo *pInfo = (VkFramebufferCreateInfo *) pPacket->pCreateInfo;
+    VkImageView *pAttachments = NULL, *pSavedAttachments = (VkImageView*)pInfo->pAttachments;
+    bool allocatedAttachments = false;
+    if (pSavedAttachments != NULL)
+    {
+        allocatedAttachments = true;
+        pAttachments = VKTRACE_NEW_ARRAY(VkImageView, pInfo->attachmentCount);
+        memcpy(pAttachments, pSavedAttachments, sizeof(VkImageView) * pInfo->attachmentCount);
+        for (uint32_t i = 0; i < pInfo->attachmentCount; i++)
+        {
+            pAttachments[i] = m_objMapper.remap_imageviews(pInfo->pAttachments[i]);
+            if (pAttachments[i] == VK_NULL_HANDLE && pInfo->pAttachments[i] != VK_NULL_HANDLE)
+            {
+                vktrace_LogError("Skipping vkCreateFramebuffer() due to invalid remapped VkImageView.");
+                VKTRACE_DELETE(pAttachments);
+                return VK_ERROR_VALIDATION_FAILED_EXT;
+            }
+        }
+        pInfo->pAttachments = pAttachments;
+    }
+    VkRenderPass savedRP = pPacket->pCreateInfo->renderPass;
+    pInfo->renderPass = m_objMapper.remap_renderpasss(pPacket->pCreateInfo->renderPass);
+    if (pInfo->renderPass == VK_NULL_HANDLE && pPacket->pCreateInfo->renderPass != VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateFramebuffer() due to invalid remapped VkRenderPass.");
+        if (allocatedAttachments)
+        {
+            VKTRACE_DELETE(pAttachments);
+        }
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkFramebuffer local_framebuffer;
+    replayResult = m_vkFuncs.real_vkCreateFramebuffer(remappedDevice, pPacket->pCreateInfo, NULL, &local_framebuffer);
+    pInfo->pAttachments = pSavedAttachments;
+    pInfo->renderPass = savedRP;
+    if (replayResult == VK_SUCCESS)
+    {
+        m_objMapper.add_to_framebuffers_map(*(pPacket->pFramebuffer), local_framebuffer);
+    }
+    if (allocatedAttachments)
+    {
+        VKTRACE_DELETE((void*)pAttachments);
+    }
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkCreateRenderPass(packet_vkCreateRenderPass* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateRenderPass() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkRenderPass local_renderpass;
+    replayResult = m_vkFuncs.real_vkCreateRenderPass(remappedDevice, pPacket->pCreateInfo, NULL, &local_renderpass);
+    if (replayResult == VK_SUCCESS)
+    {
+        m_objMapper.add_to_renderpasss_map(*(pPacket->pRenderPass), local_renderpass);
+    }
+    return replayResult;
+}
+
+void vkReplay::manually_replay_vkCmdBeginRenderPass(packet_vkCmdBeginRenderPass* pPacket)
+{
+    VkCommandBuffer remappedCommandBuffer = m_objMapper.remap_commandbuffers(pPacket->commandBuffer);
+    if (remappedCommandBuffer == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCmdBeginRenderPass() due to invalid remapped VkCommandBuffer.");
+        return;
+    }
+    VkRenderPassBeginInfo local_renderPassBeginInfo;
+    memcpy((void*)&local_renderPassBeginInfo, (void*)pPacket->pRenderPassBegin, sizeof(VkRenderPassBeginInfo));
+    local_renderPassBeginInfo.pClearValues = (const VkClearValue*)pPacket->pRenderPassBegin->pClearValues;
+    local_renderPassBeginInfo.framebuffer = m_objMapper.remap_framebuffers(pPacket->pRenderPassBegin->framebuffer);
+    if (local_renderPassBeginInfo.framebuffer == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCmdBeginRenderPass() due to invalid remapped VkFramebuffer.");
+        return;
+    }
+    local_renderPassBeginInfo.renderPass = m_objMapper.remap_renderpasss(pPacket->pRenderPassBegin->renderPass);
+    if (local_renderPassBeginInfo.renderPass == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCmdBeginRenderPass() due to invalid remapped VkRenderPass.");
+        return;
+    }
+    m_vkFuncs.real_vkCmdBeginRenderPass(remappedCommandBuffer, &local_renderPassBeginInfo, pPacket->contents);
+    return;
+}
+
+VkResult vkReplay::manually_replay_vkBeginCommandBuffer(packet_vkBeginCommandBuffer* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkCommandBuffer remappedCommandBuffer = m_objMapper.remap_commandbuffers(pPacket->commandBuffer);
+    if (remappedCommandBuffer == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkBeginCommandBuffer() due to invalid remapped VkCommandBuffer.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkCommandBufferBeginInfo* pInfo = (VkCommandBufferBeginInfo*)pPacket->pBeginInfo;
+    VkCommandBufferInheritanceInfo *pHinfo = (VkCommandBufferInheritanceInfo *) ((pInfo) ? pInfo->pInheritanceInfo : NULL);
+    // Save the original RP & FB, then overwrite packet with remapped values
+    VkRenderPass savedRP, *pRP;
+    VkFramebuffer savedFB, *pFB;
+    if (pInfo != NULL && pHinfo != NULL)
+    {
+        savedRP = pHinfo->renderPass;
+        savedFB = pHinfo->framebuffer;
+        pRP = &(pHinfo->renderPass);
+        pFB = &(pHinfo->framebuffer);
+        *pRP = m_objMapper.remap_renderpasss(savedRP);
+        *pFB = m_objMapper.remap_framebuffers(savedFB);
+    }
+    replayResult = m_vkFuncs.real_vkBeginCommandBuffer(remappedCommandBuffer, pPacket->pBeginInfo);
+    if (pInfo != NULL && pHinfo != NULL) {
+        pHinfo->renderPass = savedRP;
+        pHinfo->framebuffer = savedFB;
+    }
+    return replayResult;
+}
+
+// TODO138 : Can we kill this?
+//VkResult vkReplay::manually_replay_vkStorePipeline(packet_vkStorePipeline* pPacket)
+//{
+//    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+//
+//    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+//    if (remappedDevice == VK_NULL_HANDLE)
+//    {
+//        vktrace_LogError("Skipping vkStorePipeline() due to invalid remapped VkDevice.");
+//        return VK_ERROR_VALIDATION_FAILED_EXT;
+//    }
+//
+//    VkPipeline remappedPipeline = m_objMapper.remap(pPacket->pipeline);
+//    if (remappedPipeline == VK_NULL_HANDLE)
+//    {
+//        vktrace_LogError("Skipping vkStorePipeline() due to invalid remapped VkPipeline.");
+//        return VK_ERROR_VALIDATION_FAILED_EXT;
+//    }
+//
+//    size_t size = 0;
+//    void* pData = NULL;
+//    if (pPacket->pData != NULL && pPacket->pDataSize != NULL)
+//    {
+//        size = *pPacket->pDataSize;
+//        pData = vktrace_malloc(*pPacket->pDataSize);
+//    }
+//    replayResult = m_vkFuncs.real_vkStorePipeline(remappedDevice, remappedPipeline, &size, pData);
+//    if (replayResult == VK_SUCCESS)
+//    {
+//        if (size != *pPacket->pDataSize && pData != NULL)
+//        {
+//            vktrace_LogWarning("vkStorePipeline returned a differing data size: replay (%d bytes) vs trace (%d bytes).", size, *pPacket->pDataSize);
+//        }
+//        else if (pData != NULL && memcmp(pData, pPacket->pData, size) != 0)
+//        {
+//            vktrace_LogWarning("vkStorePipeline returned differing data contents than the trace file contained.");
+//        }
+//    }
+//    vktrace_free(pData);
+//    return replayResult;
+//}
+
+// TODO138 : This needs to be broken out into separate functions for each non-disp object
+//VkResult vkReplay::manually_replay_vkDestroy<Object>(packet_vkDestroyObject* pPacket)
+//{
+//    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+//
+//    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+//    if (remappedDevice == VK_NULL_HANDLE)
+//    {
+//        vktrace_LogError("Skipping vkDestroy() due to invalid remapped VkDevice.");
+//        return VK_ERROR_VALIDATION_FAILED_EXT;
+//    }
+//
+//    uint64_t remapHandle = m_objMapper.remap_<OBJECT_TYPE_HERE>(pPacket->object, pPacket->objType);
+//    <VkObject> object;
+//    object.handle = remapHandle;
+//    if (object != 0)
+//        replayResult = m_vkFuncs.real_vkDestroy<Object>(remappedDevice, object);
+//    if (replayResult == VK_SUCCESS)
+//        m_objMapper.rm_from_<OBJECT_TYPE_HERE>_map(pPacket->object.handle);
+//    return replayResult;
+//}
+
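+// The timeout is chosen from the traced result: if the trace recorded VK_SUCCESS,
+// wait as long as necessary (UINT64_MAX) so replay does not fail spuriously; if it
+// recorded VK_TIMEOUT, poll with a zero timeout so replay does not block; otherwise
+// reuse the traced timeout value.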
+VkResult vkReplay::manually_replay_vkWaitForFences(packet_vkWaitForFences* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    uint32_t i;
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkWaitForFences() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkFence *pFence = VKTRACE_NEW_ARRAY(VkFence, pPacket->fenceCount);
+    for (i = 0; i < pPacket->fenceCount; i++)
+    {
+        (*(pFence + i)) = m_objMapper.remap_fences((*(pPacket->pFences + i)));
+        if (*(pFence + i) == VK_NULL_HANDLE)
+        {
+            vktrace_LogError("Skipping vkWaitForFences() due to invalid remapped VkFence.");
+            VKTRACE_DELETE(pFence);
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+    }
+    if (pPacket->result == VK_SUCCESS)
+    {
+        replayResult = m_vkFuncs.real_vkWaitForFences(remappedDevice, pPacket->fenceCount, pFence, pPacket->waitAll, UINT64_MAX);  // wait as long as necessary
+    }
+    else
+    {
+        if (pPacket->result == VK_TIMEOUT)
+        {
+            replayResult = m_vkFuncs.real_vkWaitForFences(remappedDevice, pPacket->fenceCount, pFence, pPacket->waitAll, 0);
+        }
+        else
+        {
+            replayResult = m_vkFuncs.real_vkWaitForFences(remappedDevice, pPacket->fenceCount, pFence, pPacket->waitAll, pPacket->timeout);
+        }
+    }
+    VKTRACE_DELETE(pFence);
+    return replayResult;
+}
+
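+// Translates a trace-time memoryTypeIndex into a replay-time index, since memory
+// type layouts differ between the capture and replay GPUs. Strategy, as implemented
+// below: (1) find a replay type whose propertyFlags match the trace type exactly;
+// (2) else find one whose flags are a superset of the trace flags; (3) else fall
+// back to a type with both HOST_VISIBLE and HOST_COHERENT set. If several replay
+// types carry the matched flags, the trace index is reused with a warning.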
+bool vkReplay::getMemoryTypeIdx(VkDevice traceDevice,
+                                VkDevice replayDevice,
+                                uint32_t traceIdx,
+                                uint32_t* pReplayIdx)
+{
+    VkPhysicalDevice tracePhysicalDevice;
+    VkPhysicalDevice replayPhysicalDevice;
+    bool foundMatch = false;
+    uint32_t i,j;
+
+    if (tracePhysicalDevices.find(traceDevice) == tracePhysicalDevices.end() ||
+        replayPhysicalDevices.find(replayDevice) == replayPhysicalDevices.end())
+    {
+        goto fail;
+    }
+
+    tracePhysicalDevice = tracePhysicalDevices[traceDevice];
+    replayPhysicalDevice = replayPhysicalDevices[replayDevice];
+
+    if (min(traceMemoryProperties[tracePhysicalDevice].memoryTypeCount, replayMemoryProperties[replayPhysicalDevice].memoryTypeCount) == 0)
+    {
+        goto fail;
+    }
+
+    for (i = 0; i < min(traceMemoryProperties[tracePhysicalDevice].memoryTypeCount, replayMemoryProperties[replayPhysicalDevice].memoryTypeCount); i++)
+    {
+        if (traceMemoryProperties[tracePhysicalDevice].memoryTypes[traceIdx].propertyFlags == replayMemoryProperties[replayPhysicalDevice].memoryTypes[i].propertyFlags)
+        {
+            *pReplayIdx = i;
+            foundMatch = true;
+            break;
+        }
+    }
+
+    if (!foundMatch)
+    {
+        // Didn't find an exact match, search for a superset
+        for (i = 0; i < min(traceMemoryProperties[tracePhysicalDevice].memoryTypeCount, replayMemoryProperties[replayPhysicalDevice].memoryTypeCount); i++)
+        {
+            if (traceMemoryProperties[tracePhysicalDevice].memoryTypes[traceIdx].propertyFlags ==
+                (traceMemoryProperties[tracePhysicalDevice].memoryTypes[traceIdx].propertyFlags & replayMemoryProperties[replayPhysicalDevice].memoryTypes[i].propertyFlags))
+            {
+                *pReplayIdx = i;
+                foundMatch = true;
+                break;
+            }
+        }
+    }
+
+    if (!foundMatch)
+    {
+        // Didn't find a superset, search for mem type with both HOST_VISIBLE and HOST_COHERENT set
+        for (i = 0; i < replayMemoryProperties[replayPhysicalDevice].memoryTypeCount; i++)
+        {
+            if ((replayMemoryProperties[replayPhysicalDevice].memoryTypes[i].propertyFlags & (VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT|VK_MEMORY_PROPERTY_HOST_COHERENT_BIT)) ==
+                (VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT|VK_MEMORY_PROPERTY_HOST_COHERENT_BIT))
+            {
+                *pReplayIdx = i;
+                foundMatch = true;
+                break;
+            }
+        }
+    }
+
+    if (foundMatch)
+    {
+        // Check to see if there are other replayMemoryProperties identical to the one that matched.
+        // If there are, print a warning and use the index from the trace file.
+        for (j = i+1; j < replayMemoryProperties[replayPhysicalDevice].memoryTypeCount; j++)
+        {
+            if (replayMemoryProperties[replayPhysicalDevice].memoryTypes[i].propertyFlags == replayMemoryProperties[replayPhysicalDevice].memoryTypes[j].propertyFlags)
+            {
+                vktrace_LogWarning("memoryTypes propertyFlags identical in two or more entries, using idx %d from trace", traceIdx);
+                *pReplayIdx = traceIdx;
+                return true;
+            }
+        }
+        return true;
+    }
+
+fail:
+    // Didn't find a match
+    vktrace_LogError("Cannot determine memory type during vkAllocateMemory - vkGetPhysicalDeviceMemoryProperties should be called before vkAllocateMemory.");
+    return false;
+}
+
+
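+// Allocation replays remap the memoryTypeIndex through getMemoryTypeIdx() above,
+// because the index recorded in the trace is only meaningful on the capture GPU.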
+VkResult vkReplay::manually_replay_vkAllocateMemory(packet_vkAllocateMemory* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    gpuMemObj local_mem;
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkAllocateMemory() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    if (pPacket->pAllocateInfo->pNext)
+    {
+        VkDedicatedAllocationMemoryAllocateInfoNV* x = (VkDedicatedAllocationMemoryAllocateInfoNV*)(pPacket->pAllocateInfo->pNext);
+
+        if (x->sType == VK_STRUCTURE_TYPE_DEDICATED_ALLOCATION_MEMORY_ALLOCATE_INFO_NV)
+        {
+            x->image = m_objMapper.remap_images(x->image);
+            x->buffer = m_objMapper.remap_buffers(x->buffer);
+        }
+    }
+
+    if (!m_objMapper.m_adjustForGPU)
+    {
+        uint32_t replayIdx;
+        if (getMemoryTypeIdx(pPacket->device, remappedDevice, pPacket->pAllocateInfo->memoryTypeIndex, &replayIdx))
+        {
+            *((uint32_t*)&pPacket->pAllocateInfo->memoryTypeIndex) = replayIdx;
+            replayResult = m_vkFuncs.real_vkAllocateMemory(remappedDevice, pPacket->pAllocateInfo, NULL, &local_mem.replayGpuMem);
+        } else {
+            vktrace_LogError("vkAllocateMemory() failed, couldn't find memory type for memoryTypeIndex");
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+    }
+    if (replayResult == VK_SUCCESS || m_objMapper.m_adjustForGPU)
+    {
+        local_mem.pGpuMem = new gpuMemory;
+        if (local_mem.pGpuMem)
+            local_mem.pGpuMem->setAllocInfo(pPacket->pAllocateInfo, m_objMapper.m_adjustForGPU);
+        m_objMapper.add_to_devicememorys_map(*(pPacket->pMemory), local_mem);
+    }
+    return replayResult;
+}
+
+void vkReplay::manually_replay_vkFreeMemory(packet_vkFreeMemory* pPacket)
+{
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkFreeMemory() due to invalid remapped VkDevice.");
+        return;
+    }
+
+    gpuMemObj local_mem;
+    local_mem = m_objMapper.m_devicememorys.find(pPacket->memory)->second;
+    // TODO: how/when to free a pendingAlloc that did not use an existing gpuMemObj
+    m_vkFuncs.real_vkFreeMemory(remappedDevice, local_mem.replayGpuMem, NULL);
+    delete local_mem.pGpuMem;
+    m_objMapper.rm_from_devicememorys_map(pPacket->memory);
+}
+
+VkResult vkReplay::manually_replay_vkMapMemory(packet_vkMapMemory* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkMapMemory() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    gpuMemObj local_mem = m_objMapper.m_devicememorys.find(pPacket->memory)->second;
+    void* pData;
+    if (!local_mem.pGpuMem->isPendingAlloc())
+    {
+        replayResult = m_vkFuncs.real_vkMapMemory(remappedDevice, local_mem.replayGpuMem, pPacket->offset, pPacket->size, pPacket->flags, &pData);
+        if (replayResult == VK_SUCCESS)
+        {
+            if (local_mem.pGpuMem)
+            {
+                local_mem.pGpuMem->setMemoryMapRange(pData, (size_t)pPacket->size, (size_t)pPacket->offset, false);
+            }
+        }
+    }
+    else
+    {
+        if (local_mem.pGpuMem)
+        {
+            local_mem.pGpuMem->setMemoryMapRange(NULL, (size_t)pPacket->size, (size_t)pPacket->offset, true);
+        }
+    }
+    return replayResult;
+}
+
+void vkReplay::manually_replay_vkUnmapMemory(packet_vkUnmapMemory* pPacket)
+{
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE) {
+        vktrace_LogError("Skipping vkUnmapMemory() due to invalid remapped VkDevice.");
+        return;
+    }
+
+    gpuMemObj local_mem = m_objMapper.m_devicememorys.find(pPacket->memory)->second;
+    if (!local_mem.pGpuMem->isPendingAlloc())
+    {
+        if (local_mem.pGpuMem)
+        {
+            if (pPacket->pData)
+                local_mem.pGpuMem->copyMappingData(pPacket->pData, true, 0, 0);  // copies data from packet into memory buffer
+        }
+        m_vkFuncs.real_vkUnmapMemory(remappedDevice, local_mem.replayGpuMem);
+    }
+    else
+    {
+        if (local_mem.pGpuMem)
+        {
+            unsigned char *pBuf = (unsigned char *) vktrace_malloc(local_mem.pGpuMem->getMemoryMapSize());
+            if (!pBuf)
+            {
+                vktrace_LogError("vkUnmapMemory() malloc failed.");
+                return;
+            }
+            local_mem.pGpuMem->setMemoryDataAddr(pBuf);
+            local_mem.pGpuMem->copyMappingData(pPacket->pData, true, 0, 0);
+        }
+    }
+}
+
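+// An OPT (page-guard) "refresh-all" packet is flagged by a marker bit stored in
+// reserve0 of the first PageGuardChangedBlockInfo entry of the packed data.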
+BOOL isvkFlushMappedMemoryRangesSpecial(PBYTE pOPTPackageData)
+{
+    BOOL bRet = FALSE;
+    PageGuardChangedBlockInfo *pChangedInfoArray = (PageGuardChangedBlockInfo *)pOPTPackageData;
+    if (((uint64_t)pChangedInfoArray[0].reserve0) & PAGEGUARD_SPECIAL_FORMAT_PACKET_FOR_VKFLUSHMAPPEDMEMORYRANGES) // TODO: revisit for 32-bit builds
+    {
+        bRet = TRUE;
+    }
+    return bRet;
+}
+
+// After the OPT speed-up, the format of this packet differs from the old format:
+// the packet now includes only the changed blocks (pages).
+VkResult vkReplay::manually_replay_vkFlushMappedMemoryRanges(packet_vkFlushMappedMemoryRanges* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkFlushMappedMemoryRanges() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkMappedMemoryRange* localRanges = VKTRACE_NEW_ARRAY(VkMappedMemoryRange, pPacket->memoryRangeCount);
+    memcpy(localRanges, pPacket->pMemoryRanges, sizeof(VkMappedMemoryRange) * (pPacket->memoryRangeCount));
+
+    gpuMemObj* pLocalMems = VKTRACE_NEW_ARRAY(gpuMemObj, pPacket->memoryRangeCount);
+    for (uint32_t i = 0; i < pPacket->memoryRangeCount; i++)
+    {
+        pLocalMems[i] = m_objMapper.m_devicememorys.find(pPacket->pMemoryRanges[i].memory)->second;
+        localRanges[i].memory = m_objMapper.remap_devicememorys(pPacket->pMemoryRanges[i].memory);
+        if (localRanges[i].memory == VK_NULL_HANDLE || pLocalMems[i].pGpuMem == NULL)
+        {
+            vktrace_LogError("Skipping vkFlushMappedMemoryRanges() due to invalid remapped VkDeviceMemory.");
+            VKTRACE_DELETE(localRanges);
+            VKTRACE_DELETE(pLocalMems);
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+
+        if (!pLocalMems[i].pGpuMem->isPendingAlloc())
+        {
+            if (pPacket->pMemoryRanges[i].size != 0)
+            {
+#ifdef USE_PAGEGUARD_SPEEDUP
+                if(vktrace_check_min_version(VKTRACE_TRACE_FILE_VERSION_5))
+                    pLocalMems[i].pGpuMem->copyMappingDataPageGuard(pPacket->ppData[i]);
+                else
+                    pLocalMems[i].pGpuMem->copyMappingData(pPacket->ppData[i], false, (size_t)pPacket->pMemoryRanges[i].size, (size_t)pPacket->pMemoryRanges[i].offset);
+#else
+                pLocalMems[i].pGpuMem->copyMappingData(pPacket->ppData[i], false, (size_t)pPacket->pMemoryRanges[i].size, (size_t)pPacket->pMemoryRanges[i].offset);
+#endif
+            }
+        }
+        else
+        {
+            unsigned char *pBuf = (unsigned char *) vktrace_malloc(pLocalMems[i].pGpuMem->getMemoryMapSize());
+            if (!pBuf)
+            {
+                vktrace_LogError("vkFlushMappedMemoryRanges() malloc failed.");
+                VKTRACE_DELETE(localRanges);
+                VKTRACE_DELETE(pLocalMems);
+                return VK_ERROR_OUT_OF_HOST_MEMORY;
+            }
+            pLocalMems[i].pGpuMem->setMemoryDataAddr(pBuf);
+#ifdef USE_PAGEGUARD_SPEEDUP
+            if(vktrace_check_min_version(VKTRACE_TRACE_FILE_VERSION_5))
+                pLocalMems[i].pGpuMem->copyMappingDataPageGuard(pPacket->ppData[i]);
+            else
+                pLocalMems[i].pGpuMem->copyMappingData(pPacket->ppData[i], false, (size_t)pPacket->pMemoryRanges[i].size, (size_t)pPacket->pMemoryRanges[i].offset);
+#else
+            pLocalMems[i].pGpuMem->copyMappingData(pPacket->ppData[i], false, (size_t)pPacket->pMemoryRanges[i].size, (size_t)pPacket->pMemoryRanges[i].offset);
+#endif
+        }
+    }
+
+#ifdef USE_PAGEGUARD_SPEEDUP
+    replayResult = pPacket->result;  // If this is an OPT refresh-all packet, skip the real API call and return the traced result to avoid spurious error messages.
+    if (!vktrace_check_min_version(VKTRACE_TRACE_FILE_VERSION_5) || !isvkFlushMappedMemoryRangesSpecial((PBYTE)pPacket->ppData[0]))
+#endif
+    {
+        replayResult = m_vkFuncs.real_vkFlushMappedMemoryRanges(remappedDevice, pPacket->memoryRangeCount, localRanges);
+    }
+
+    VKTRACE_DELETE(localRanges);
+    VKTRACE_DELETE(pLocalMems);
+
+    return replayResult;
+}
+
+// vkInvalidateMappedMemoryRanges and vkFlushMappedMemoryRanges are similar, but
+// keep them separate until the functionality is fully tested.
+VkResult vkReplay::manually_replay_vkInvalidateMappedMemoryRanges(packet_vkInvalidateMappedMemoryRanges* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkInvalidateMappedMemoryRanges() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkMappedMemoryRange* localRanges = VKTRACE_NEW_ARRAY(VkMappedMemoryRange, pPacket->memoryRangeCount);
+    memcpy(localRanges, pPacket->pMemoryRanges, sizeof(VkMappedMemoryRange) * (pPacket->memoryRangeCount));
+
+    gpuMemObj* pLocalMems = VKTRACE_NEW_ARRAY(gpuMemObj, pPacket->memoryRangeCount);
+    for (uint32_t i = 0; i < pPacket->memoryRangeCount; i++)
+    {
+        pLocalMems[i] = m_objMapper.m_devicememorys.find(pPacket->pMemoryRanges[i].memory)->second;
+        localRanges[i].memory = m_objMapper.remap_devicememorys(pPacket->pMemoryRanges[i].memory);
+        if (localRanges[i].memory == VK_NULL_HANDLE || pLocalMems[i].pGpuMem == NULL)
+        {
+            vktrace_LogError("Skipping vkInvalidateMappedMemoryRanges() due to invalid remapped VkDeviceMemory.");
+            VKTRACE_DELETE(localRanges);
+            VKTRACE_DELETE(pLocalMems);
+            return VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+
+        if (!pLocalMems[i].pGpuMem->isPendingAlloc())
+        {
+            if (pPacket->pMemoryRanges[i].size != 0)
+            {
+                pLocalMems[i].pGpuMem->copyMappingData(pPacket->ppData[i], false, (size_t)pPacket->pMemoryRanges[i].size, (size_t)pPacket->pMemoryRanges[i].offset);
+            }
+        }
+        else
+        {
+            unsigned char *pBuf = (unsigned char *) vktrace_malloc(pLocalMems[i].pGpuMem->getMemoryMapSize());
+            if (!pBuf)
+            {
+                vktrace_LogError("vkInvalidateMappedMemoryRanges() malloc failed.");
+                VKTRACE_DELETE(localRanges);
+                VKTRACE_DELETE(pLocalMems);
+                return VK_ERROR_OUT_OF_HOST_MEMORY;
+            }
+            pLocalMems[i].pGpuMem->setMemoryDataAddr(pBuf);
+            pLocalMems[i].pGpuMem->copyMappingData(pPacket->ppData[i], false, (size_t)pPacket->pMemoryRanges[i].size, (size_t)pPacket->pMemoryRanges[i].offset);
+        }
+    }
+
+    replayResult = m_vkFuncs.real_vkInvalidateMappedMemoryRanges(remappedDevice, pPacket->memoryRangeCount, localRanges);
+
+    VKTRACE_DELETE(localRanges);
+    VKTRACE_DELETE(pLocalMems);
+
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkGetPhysicalDeviceSurfaceSupportKHR(packet_vkGetPhysicalDeviceSurfaceSupportKHR* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkPhysicalDevice remappedphysicalDevice = m_objMapper.remap_physicaldevices(pPacket->physicalDevice);
+    if (remappedphysicalDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetPhysicalDeviceSurfaceSupportKHR() due to invalid remapped VkPhysicalDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkSurfaceKHR remappedSurfaceKHR = m_objMapper.remap_surfacekhrs(pPacket->surface);
+    if (remappedSurfaceKHR == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetPhysicalDeviceSurfaceSupportKHR() due to invalid remapped VkSurfaceKHR.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    replayResult = m_vkFuncs.real_vkGetPhysicalDeviceSurfaceSupportKHR(remappedphysicalDevice, pPacket->queueFamilyIndex, remappedSurfaceKHR, pPacket->pSupported);
+
+    return replayResult;
+}
+
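+// Both the trace-time and replay-time memory properties are cached here; they are
+// the inputs getMemoryTypeIdx() uses to translate memory type indices.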
+void vkReplay::manually_replay_vkGetPhysicalDeviceMemoryProperties(packet_vkGetPhysicalDeviceMemoryProperties* pPacket)
+{
+    VkPhysicalDevice remappedphysicalDevice = m_objMapper.remap_physicaldevices(pPacket->physicalDevice);
+    if (remappedphysicalDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetPhysicalDeviceMemoryProperties() due to invalid remapped VkPhysicalDevice.");
+        return;
+    }
+
+    traceMemoryProperties[pPacket->physicalDevice] = *(pPacket->pMemoryProperties);
+    m_vkFuncs.real_vkGetPhysicalDeviceMemoryProperties(remappedphysicalDevice, pPacket->pMemoryProperties);
+    replayMemoryProperties[remappedphysicalDevice] = *(pPacket->pMemoryProperties);
+    return;
+}
+
+void vkReplay::manually_replay_vkGetPhysicalDeviceQueueFamilyProperties(packet_vkGetPhysicalDeviceQueueFamilyProperties* pPacket)
+{
+    VkPhysicalDevice remappedphysicalDevice = m_objMapper.remap_physicaldevices(pPacket->physicalDevice);
+    if (pPacket->physicalDevice != VK_NULL_HANDLE && remappedphysicalDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetPhysicalDeviceQueueFamilyProperties() due to invalid remapped VkPhysicalDevice.");
+        return;
+    }
+
+    // If we haven't previously allocated queueFamilyProperties for the trace physical device, allocate it.
+    // If we previously allocated queueFamilyProperities for the trace physical device and the size of this
+    // query is larger than what we saved last time, then free the last properties map and allocate a new map.
+    if (traceQueueFamilyProperties.find(pPacket->physicalDevice) == traceQueueFamilyProperties.end() ||
+        *pPacket->pQueueFamilyPropertyCount > traceQueueFamilyProperties[pPacket->physicalDevice].count)
+    {
+        if (traceQueueFamilyProperties.find(pPacket->physicalDevice) != traceQueueFamilyProperties.end())
+        {
+            free(traceQueueFamilyProperties[pPacket->physicalDevice].queueFamilyProperties);
+            traceQueueFamilyProperties.erase(pPacket->physicalDevice);
+        }
+        if (pPacket->pQueueFamilyProperties)
+        {
+            traceQueueFamilyProperties[pPacket->physicalDevice].queueFamilyProperties =
+                (VkQueueFamilyProperties*)malloc(*pPacket->pQueueFamilyPropertyCount * sizeof(VkQueueFamilyProperties));
+            memcpy(traceQueueFamilyProperties[pPacket->physicalDevice].queueFamilyProperties,
+                   pPacket->pQueueFamilyProperties,
+                   *pPacket->pQueueFamilyPropertyCount * sizeof(VkQueueFamilyProperties));
+            traceQueueFamilyProperties[pPacket->physicalDevice].count = *pPacket->pQueueFamilyPropertyCount;
+        }
+    }
+
+    m_vkFuncs.real_vkGetPhysicalDeviceQueueFamilyProperties(remappedphysicalDevice, pPacket->pQueueFamilyPropertyCount, pPacket->pQueueFamilyProperties);
+
+    // If we haven't previously allocated queueFamilyProperties for the replay physical device, allocate it.
+    // If we previously allocated queueFamilyProperities for the replay physical device and the size of this
+    // query is larger than what we saved last time, then free the last properties map and allocate a new map.
+    if (replayQueueFamilyProperties.find(remappedphysicalDevice) == replayQueueFamilyProperties.end() ||
+        *pPacket->pQueueFamilyPropertyCount > replayQueueFamilyProperties[remappedphysicalDevice].count)
+    {
+        if (replayQueueFamilyProperties.find(remappedphysicalDevice) != replayQueueFamilyProperties.end())
+        {
+            free(replayQueueFamilyProperties[remappedphysicalDevice].queueFamilyProperties);
+            replayQueueFamilyProperties.erase(remappedphysicalDevice);
+        }
+        if (pPacket->pQueueFamilyProperties)
+        {
+            replayQueueFamilyProperties[remappedphysicalDevice].queueFamilyProperties =
+                (VkQueueFamilyProperties*)malloc(*pPacket->pQueueFamilyPropertyCount * sizeof(VkQueueFamilyProperties));
+            memcpy(replayQueueFamilyProperties[remappedphysicalDevice].queueFamilyProperties,
+                   pPacket->pQueueFamilyProperties,
+                   *pPacket->pQueueFamilyPropertyCount * sizeof(VkQueueFamilyProperties));
+            replayQueueFamilyProperties[remappedphysicalDevice].count = *pPacket->pQueueFamilyPropertyCount;
+        }
+    }
+
+    return;
+}
+
+VkResult vkReplay::manually_replay_vkGetPhysicalDeviceSurfaceCapabilitiesKHR(packet_vkGetPhysicalDeviceSurfaceCapabilitiesKHR* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkPhysicalDevice remappedphysicalDevice = m_objMapper.remap_physicaldevices(pPacket->physicalDevice);
+    if (remappedphysicalDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetPhysicalDeviceSurfaceCapabilitiesKHR() due to invalid remapped VkPhysicalDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkSurfaceKHR remappedSurfaceKHR = m_objMapper.remap_surfacekhrs(pPacket->surface);
+    if (remappedSurfaceKHR == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetPhysicalDeviceSurfaceCapabilitiesKHR() due to invalid remapped VkSurfaceKHR.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    m_display->resize_window(pPacket->pSurfaceCapabilities->currentExtent.width, pPacket->pSurfaceCapabilities->currentExtent.height);
+
+    replayResult = m_vkFuncs.real_vkGetPhysicalDeviceSurfaceCapabilitiesKHR(remappedphysicalDevice, remappedSurfaceKHR, pPacket->pSurfaceCapabilities);
+
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkGetPhysicalDeviceSurfaceFormatsKHR(packet_vkGetPhysicalDeviceSurfaceFormatsKHR* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkPhysicalDevice remappedphysicalDevice = m_objMapper.remap_physicaldevices(pPacket->physicalDevice);
+    if (remappedphysicalDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetPhysicalDeviceSurfaceFormatsKHR() due to invalid remapped VkPhysicalDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkSurfaceKHR remappedSurfaceKHR = m_objMapper.remap_surfacekhrs(pPacket->surface);
+    if (remappedSurfaceKHR == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetPhysicalDeviceSurfaceFormatsKHR() due to invalid remapped VkSurfaceKHR.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    replayResult = m_vkFuncs.real_vkGetPhysicalDeviceSurfaceFormatsKHR(remappedphysicalDevice, remappedSurfaceKHR, pPacket->pSurfaceFormatCount, pPacket->pSurfaceFormats);
+
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkGetPhysicalDeviceSurfacePresentModesKHR(packet_vkGetPhysicalDeviceSurfacePresentModesKHR* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+
+    VkPhysicalDevice remappedphysicalDevice = m_objMapper.remap_physicaldevices(pPacket->physicalDevice);
+    if (remappedphysicalDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetPhysicalDeviceSurfacePresentModesKHR() due to invalid remapped VkPhysicalDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkSurfaceKHR remappedSurfaceKHR = m_objMapper.remap_surfacekhrs(pPacket->surface);
+    if (remappedSurfaceKHR == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetPhysicalDeviceSurfacePresentModesKHR() due to invalid remapped VkSurfaceKHR.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    replayResult = m_vkFuncs.real_vkGetPhysicalDeviceSurfacePresentModesKHR(remappedphysicalDevice, remappedSurfaceKHR, pPacket->pPresentModeCount, pPacket->pPresentModes);
+
+    return replayResult;
+}
+
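+// Swapchain creation remaps oldSwapchain and surface, translates the queue family
+// indices, and validates the traced imageFormat against the formats the replay
+// surface actually supports, substituting the first supported format if needed.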
+VkResult vkReplay::manually_replay_vkCreateSwapchainKHR(packet_vkCreateSwapchainKHR* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    VkSwapchainKHR local_pSwapchain;
+    VkSwapchainKHR save_oldSwapchain, *pSC;
+    VkSurfaceKHR save_surface;
+    pSC = (VkSwapchainKHR *) &pPacket->pCreateInfo->oldSwapchain;
+    VkDevice remappeddevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappeddevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateSwapchainKHR() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    save_oldSwapchain = pPacket->pCreateInfo->oldSwapchain;
+    (*pSC) = m_objMapper.remap_swapchainkhrs(save_oldSwapchain);
+    if ((*pSC) == VK_NULL_HANDLE && save_oldSwapchain != VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateSwapchainKHR() due to invalid remapped VkSwapchainKHR.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    save_surface = pPacket->pCreateInfo->surface;
+    VkSurfaceKHR *pSurf = (VkSurfaceKHR *) &(pPacket->pCreateInfo->surface);
+    *pSurf = m_objMapper.remap_surfacekhrs(*pSurf);
+    if (*pSurf == VK_NULL_HANDLE && pPacket->pCreateInfo->surface != VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateSwapchainKHR() due to invalid remapped VkSurfaceKHR.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    m_display->resize_window(pPacket->pCreateInfo->imageExtent.width, pPacket->pCreateInfo->imageExtent.height);
+
+    // Convert queueFamilyIndices
+    if (pPacket->pCreateInfo)
+    {
+        for (uint32_t i = 0; i < pPacket->pCreateInfo->queueFamilyIndexCount; i++)
+        {
+            uint32_t replayIdx;
+            if (pPacket->pCreateInfo->pQueueFamilyIndices &&
+                getQueueFamilyIdx(pPacket->device,
+                                  remappeddevice,
+                                  pPacket->pCreateInfo->pQueueFamilyIndices[i],
+                                  &replayIdx))
+            {
+                *((uint32_t*)&pPacket->pCreateInfo->pQueueFamilyIndices[i]) = replayIdx;
+            }
+            else {
+                vktrace_LogError("VkSwapchainCreateInfoKHR, bad queueFamilyIndex");
+                return VK_ERROR_VALIDATION_FAILED_EXT;
+            }
+        }
+    }
+
+    // Get the list of VkFormats that are supported:
+    VkPhysicalDevice remappedPhysicalDevice = replayPhysicalDevices[remappeddevice];
+    uint32_t formatCount;
+    VkResult res;
+    // Note that pPacket->pCreateInfo->surface has been remapped above
+    res = vkGetPhysicalDeviceSurfaceFormatsKHR(remappedPhysicalDevice, pPacket->pCreateInfo->surface,
+                                               &formatCount, NULL);
+    assert(!res);
+    VkSurfaceFormatKHR *surfFormats =
+        (VkSurfaceFormatKHR *)malloc(formatCount * sizeof(VkSurfaceFormatKHR));
+    assert(surfFormats);
+    res = vkGetPhysicalDeviceSurfaceFormatsKHR(remappedPhysicalDevice, pPacket->pCreateInfo->surface,
+                                               &formatCount, surfFormats);
+    assert(!res);
+    // If the format list includes just one entry of VK_FORMAT_UNDEFINED,
+    // the surface has no preferred format.  Otherwise, at least one
+    // supported format will be returned.
+    if (!(formatCount == 1 && surfFormats[0].format == VK_FORMAT_UNDEFINED)) {
+        bool found = false;
+        for (uint32_t i = 0; i < formatCount; i++) {
+            if (pPacket->pCreateInfo->imageFormat == surfFormats[i].format) {
+                found = true;
+                break;
+            }
+        }
+        if (!found) {
+            vktrace_LogWarning("Format %d is not supported for presentable images, using format %d",
+                               pPacket->pCreateInfo->imageFormat, surfFormats[0].format);
+            VkFormat *pFormat = (VkFormat *) &(pPacket->pCreateInfo->imageFormat);
+            *pFormat = surfFormats[0].format;
+        }
+    }
+    free(surfFormats);
+
+    replayResult = m_vkFuncs.real_vkCreateSwapchainKHR(remappeddevice, pPacket->pCreateInfo, pPacket->pAllocator, &local_pSwapchain);
+    if (replayResult == VK_SUCCESS)
+    {
+        m_objMapper.add_to_swapchainkhrs_map(*(pPacket->pSwapchain), local_pSwapchain);
+    }
+
+    (*pSC) = save_oldSwapchain;
+    *pSurf = save_surface;
+    return replayResult;
+}
+
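+// The trace-time swapchain images are saved before the call so that, once the
+// replay driver returns the real image handles, each trace image can be mapped to
+// its replay counterpart (and to its device, for barrier remapping). A fixed cap
+// of 128 swapchain images is assumed here.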
+VkResult vkReplay::manually_replay_vkGetSwapchainImagesKHR(packet_vkGetSwapchainImagesKHR* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    VkDevice remappeddevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappeddevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetSwapchainImagesKHR() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkSwapchainKHR remappedswapchain;
+    remappedswapchain = m_objMapper.remap_swapchainkhrs(pPacket->swapchain);
+    if (remappedswapchain == VK_NULL_HANDLE && pPacket->swapchain != VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkGetSwapchainImagesKHR() due to invalid remapped VkSwapchainKHR.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkImage packetImage[128] = {0};
+    uint32_t numImages = 0;
+    if (pPacket->pSwapchainImages != NULL) {
+        // Need to store the images and then add to map after we get actual image handles back
+        VkImage* pPacketImages = (VkImage*)pPacket->pSwapchainImages;
+        numImages = *(pPacket->pSwapchainImageCount);
+        for (uint32_t i = 0; i < numImages; i++) {
+            packetImage[i] = pPacketImages[i];
+            traceImageToDevice[packetImage[i]] = pPacket->device;
+        }
+    }
+
+    replayResult = m_vkFuncs.real_vkGetSwapchainImagesKHR(remappeddevice, remappedswapchain, pPacket->pSwapchainImageCount, pPacket->pSwapchainImages);
+    if (replayResult == VK_SUCCESS)
+    {
+        if (numImages != 0) {
+            VkImage* pReplayImages = (VkImage*)pPacket->pSwapchainImages;
+            for (uint32_t i = 0; i < numImages; i++) {
+                imageObj local_imageObj;
+                local_imageObj.replayImage = pReplayImages[i];
+                m_objMapper.add_to_images_map(packetImage[i], local_imageObj);
+                replayImageToDevice[pReplayImages[i]] = remappeddevice;
+            }
+        }
+    }
+    return replayResult;
+}
+
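+// Small presents use fixed stack arrays (up to 5 swapchains/semaphores); larger
+// counts fall back to heap allocations, which every exit path below must free.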
+VkResult vkReplay::manually_replay_vkQueuePresentKHR(packet_vkQueuePresentKHR* pPacket)
+{
+    VkResult replayResult = VK_SUCCESS;
+    VkQueue remappedQueue = m_objMapper.remap_queues(pPacket->queue);
+    if (remappedQueue == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkQueuePresentKHR() due to invalid remapped VkQueue.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    VkSemaphore localSemaphores[5];
+    VkSwapchainKHR localSwapchains[5];
+    VkResult localResults[5];
+    VkSemaphore *pRemappedWaitSems = localSemaphores;
+    VkSwapchainKHR *pRemappedSwapchains = localSwapchains;
+    VkResult *pResults = localResults;
+    VkPresentInfoKHR present;
+    uint32_t i;
+    uint32_t remappedImageIndex = UINT32_MAX;  // function scope: present.pImageIndices points at it
+
+    if (pPacket->pPresentInfo->swapchainCount > 5) {
+        pRemappedSwapchains = VKTRACE_NEW_ARRAY(VkSwapchainKHR, pPacket->pPresentInfo->swapchainCount);
+    }
+
+    if (pPacket->pPresentInfo->swapchainCount > 5 && pPacket->pPresentInfo->pResults != NULL) {
+        pResults = VKTRACE_NEW_ARRAY(VkResult, pPacket->pPresentInfo->swapchainCount);
+    }
+
+    if (pPacket->pPresentInfo->waitSemaphoreCount > 5) {
+        pRemappedWaitSems = VKTRACE_NEW_ARRAY(VkSemaphore, pPacket->pPresentInfo->waitSemaphoreCount);
+    }
+
+    if (pRemappedSwapchains == NULL || pRemappedWaitSems == NULL || pResults == NULL)
+    {
+        replayResult = VK_ERROR_OUT_OF_HOST_MEMORY;
+    }
+
+    if (replayResult == VK_SUCCESS) {
+        for (i=0; i<pPacket->pPresentInfo->swapchainCount; i++) {
+            pRemappedSwapchains[i] = m_objMapper.remap_swapchainkhrs(pPacket->pPresentInfo->pSwapchains[i]);
+            if (pRemappedSwapchains[i] == VK_NULL_HANDLE)
+            {
+                vktrace_LogError("Skipping vkQueuePresentKHR() due to invalid remapped VkSwapchainKHR.");
+                if (pRemappedWaitSems != NULL && pRemappedWaitSems != localSemaphores) {
+                    VKTRACE_DELETE(pRemappedWaitSems);
+                }
+                if (pResults != NULL && pResults != localResults) {
+                    VKTRACE_DELETE(pResults);
+                }
+                if (pRemappedSwapchains != NULL && pRemappedSwapchains != localSwapchains) {
+                    VKTRACE_DELETE(pRemappedSwapchains);
+                }
+                return VK_ERROR_VALIDATION_FAILED_EXT;
+            }
+        }
+
+        assert(pPacket->pPresentInfo->swapchainCount == 1 && "Multiple swapchain images not supported yet");
+        // remappedImageIndex must outlive this block: present.pImageIndices points at it
+        // and is dereferenced by the real vkQueuePresentKHR call below.
+        remappedImageIndex = m_objMapper.remap_pImageIndex(*pPacket->pPresentInfo->pImageIndices);
+        if (remappedImageIndex == UINT32_MAX)
+        {
+            vktrace_LogError("Skipping vkQueuePresentKHR() due to invalid remapped pImageIndices.");
+            // Fall through to the shared cleanup below so the remapped arrays are freed.
+            replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+        }
+
+        present.sType = pPacket->pPresentInfo->sType;
+        present.pNext = pPacket->pPresentInfo->pNext;
+        present.swapchainCount = pPacket->pPresentInfo->swapchainCount;
+        present.pSwapchains = pRemappedSwapchains;
+        present.pImageIndices = &remappedImageIndex;
+        present.waitSemaphoreCount = pPacket->pPresentInfo->waitSemaphoreCount;
+        present.pWaitSemaphores = NULL;
+        if (present.waitSemaphoreCount != 0) {
+            present.pWaitSemaphores = pRemappedWaitSems;
+            for (i = 0; i < pPacket->pPresentInfo->waitSemaphoreCount; i++) {
+                (*(pRemappedWaitSems + i)) = m_objMapper.remap_semaphores((*(pPacket->pPresentInfo->pWaitSemaphores + i)));
+                if (*(pRemappedWaitSems + i) == VK_NULL_HANDLE)
+                {
+                    vktrace_LogError("Skipping vkQueuePresentKHR() due to invalid remapped wait VkSemaphore.");
+                    if (pRemappedWaitSems != NULL && pRemappedWaitSems != localSemaphores) {
+                        VKTRACE_DELETE(pRemappedWaitSems);
+                    }
+                    if (pResults != NULL && pResults != localResults) {
+                        VKTRACE_DELETE(pResults);
+                    }
+                    if (pRemappedSwapchains != NULL && pRemappedSwapchains != localSwapchains) {
+                        VKTRACE_DELETE(pRemappedSwapchains);
+                    }
+                    return VK_ERROR_VALIDATION_FAILED_EXT;
+                }
+            }
+        }
+        present.pResults = NULL;
+    }
+
+    if (replayResult == VK_SUCCESS) {
+        // If the application requested per-swapchain results, set up to get the results from the replay.
+        if (pPacket->pPresentInfo->pResults != NULL) {
+            present.pResults = pResults;
+        }
+
+        replayResult = m_vkFuncs.real_vkQueuePresentKHR(remappedQueue, &present);
+
+        m_frameNumber++;
+
+        // Compare the results from the trace file with those just received from the replay.  Report any differences.
+        if (present.pResults != NULL) {
+            for (i=0; i<pPacket->pPresentInfo->swapchainCount; i++) {
+                if (present.pResults[i] != pPacket->pPresentInfo->pResults[i]) {
+                    vktrace_LogError("Return value %s from API call (vkQueuePresentKHR) does not match return value from trace file %s for swapchain %d.",
+                                     string_VkResult(present.pResults[i]), string_VkResult(pPacket->pPresentInfo->pResults[i]), i);
+                }
+            }
+        }
+    }
+
+    if (pRemappedWaitSems != NULL && pRemappedWaitSems != localSemaphores) {
+        VKTRACE_DELETE(pRemappedWaitSems);
+    }
+    if (pResults != NULL && pResults != localResults) {
+        VKTRACE_DELETE(pResults);
+    }
+    if (pRemappedSwapchains != NULL && pRemappedSwapchains != localSwapchains) {
+        VKTRACE_DELETE(pRemappedSwapchains);
+    }
+
+    return replayResult;
+}
+
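+// Surface creation ignores the traced connection/window and instead targets the
+// replay window owned by m_display, building a platform-appropriate create info.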
+VkResult vkReplay::manually_replay_vkCreateXcbSurfaceKHR(packet_vkCreateXcbSurfaceKHR* pPacket)
+{
+    VkResult replayResult = VK_SUCCESS;
+    VkSurfaceKHR local_pSurface = VK_NULL_HANDLE;
+    VkInstance remappedInstance = m_objMapper.remap_instances(pPacket->instance);
+    if (remappedInstance == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateXcbSurfaceKHR() due to invalid remapped VkInstance.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+#if defined(PLATFORM_LINUX) && !defined(ANDROID)
+    VkIcdSurfaceXcb *pSurf = (VkIcdSurfaceXcb *) m_display->get_surface();
+    VkXcbSurfaceCreateInfoKHR createInfo;
+    createInfo.sType = VK_STRUCTURE_TYPE_XCB_SURFACE_CREATE_INFO_KHR;
+    createInfo.pNext = pPacket->pCreateInfo->pNext;
+    createInfo.flags = pPacket->pCreateInfo->flags;
+    createInfo.connection = pSurf->connection;
+    createInfo.window = pSurf->window;
+    replayResult = m_vkFuncs.real_vkCreateXcbSurfaceKHR(remappedInstance, &createInfo, pPacket->pAllocator, &local_pSurface);
+#elif defined WIN32
+    VkIcdSurfaceWin32 *pSurf = (VkIcdSurfaceWin32 *) m_display->get_surface();
+    VkWin32SurfaceCreateInfoKHR createInfo;
+    createInfo.sType = VK_STRUCTURE_TYPE_WIN32_SURFACE_CREATE_INFO_KHR;
+    createInfo.pNext = pPacket->pCreateInfo->pNext;
+    createInfo.flags = pPacket->pCreateInfo->flags;
+    createInfo.hinstance = pSurf->hinstance;
+    createInfo.hwnd = pSurf->hwnd;
+    replayResult = m_vkFuncs.real_vkCreateWin32SurfaceKHR(remappedInstance, &createInfo, pPacket->pAllocator, &local_pSurface);
+#else
+    vktrace_LogError("manually_replay_vkCreateXcbSurfaceKHR not implemented on this vkreplay platform");
+    replayResult = VK_ERROR_FEATURE_NOT_PRESENT;
+#endif
+
+    if (replayResult == VK_SUCCESS) {
+        m_objMapper.add_to_surfacekhrs_map(*(pPacket->pSurface), local_pSurface);
+    }
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkCreateXlibSurfaceKHR(packet_vkCreateXlibSurfaceKHR* pPacket)
+{
+    VkResult replayResult = VK_SUCCESS;
+    VkSurfaceKHR local_pSurface = VK_NULL_HANDLE;
+    VkInstance remappedinstance = m_objMapper.remap_instances(pPacket->instance);
+
+    if (pPacket->instance != VK_NULL_HANDLE && remappedinstance == VK_NULL_HANDLE) {
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+#if defined PLATFORM_LINUX && defined VK_USE_PLATFORM_XLIB_KHR
+    VkIcdSurfaceXlib *pSurf = (VkIcdSurfaceXlib *) m_display->get_surface();
+    VkXlibSurfaceCreateInfoKHR createInfo;
+    createInfo.sType = VK_STRUCTURE_TYPE_XLIB_SURFACE_CREATE_INFO_KHR;
+    createInfo.pNext = pPacket->pCreateInfo->pNext;
+    createInfo.flags = pPacket->pCreateInfo->flags;
+    createInfo.dpy = pSurf->dpy;
+    createInfo.window = pSurf->window;
+    replayResult = m_vkFuncs.real_vkCreateXlibSurfaceKHR(remappedinstance, &createInfo, pPacket->pAllocator, &local_pSurface);
+#elif defined PLATFORM_LINUX && defined VK_USE_PLATFORM_XCB_KHR
+    VkIcdSurfaceXcb *pSurf = (VkIcdSurfaceXcb *) m_display->get_surface();
+    VkXcbSurfaceCreateInfoKHR createInfo;
+    createInfo.sType = VK_STRUCTURE_TYPE_XCB_SURFACE_CREATE_INFO_KHR;
+    createInfo.pNext = pPacket->pCreateInfo->pNext;
+    createInfo.flags = pPacket->pCreateInfo->flags;
+    createInfo.connection = pSurf->connection;
+    createInfo.window = pSurf->window;
+    replayResult = m_vkFuncs.real_vkCreateXcbSurfaceKHR(remappedinstance, &createInfo, pPacket->pAllocator, &local_pSurface);
+#elif defined PLATFORM_LINUX && defined VK_USE_PLATFORM_ANDROID_KHR
+    // TODO
+#elif defined PLATFORM_LINUX
+#error manually_replay_vkCreateXlibSurfaceKHR on PLATFORM_LINUX requires one of VK_USE_PLATFORM_XLIB_KHR or VK_USE_PLATFORM_XCB_KHR or VK_USE_PLATFORM_ANDROID_KHR
+#elif defined WIN32
+    VkIcdSurfaceWin32 *pSurf = (VkIcdSurfaceWin32 *) m_display->get_surface();
+    VkWin32SurfaceCreateInfoKHR createInfo;
+    createInfo.sType = VK_STRUCTURE_TYPE_WIN32_SURFACE_CREATE_INFO_KHR;
+    createInfo.pNext = pPacket->pCreateInfo->pNext;
+    createInfo.flags = pPacket->pCreateInfo->flags;
+    createInfo.hinstance = pSurf->hinstance;
+    createInfo.hwnd = pSurf->hwnd;
+    replayResult = m_vkFuncs.real_vkCreateWin32SurfaceKHR(remappedinstance, &createInfo, pPacket->pAllocator, &local_pSurface);
+#else
+    vktrace_LogError("manually_replay_vkCreateXlibSurfaceKHR not implemented on this playback platform");
+    replayResult = VK_ERROR_FEATURE_NOT_PRESENT;
+#endif
+    if (replayResult == VK_SUCCESS) {
+        m_objMapper.add_to_surfacekhrs_map(*(pPacket->pSurface), local_pSurface);
+    }
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkCreateWin32SurfaceKHR(packet_vkCreateWin32SurfaceKHR* pPacket)
+{
+    VkResult replayResult = VK_SUCCESS;
+    VkSurfaceKHR local_pSurface = VK_NULL_HANDLE;
+    VkInstance remappedInstance = m_objMapper.remap_instances(pPacket->instance);
+    if (remappedInstance == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateWin32SurfaceKHR() due to invalid remapped VkInstance.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+#if defined WIN32
+    VkIcdSurfaceWin32 *pSurf = (VkIcdSurfaceWin32 *) m_display->get_surface();
+    VkWin32SurfaceCreateInfoKHR createInfo;
+    createInfo.sType = VK_STRUCTURE_TYPE_WIN32_SURFACE_CREATE_INFO_KHR;
+    createInfo.pNext = pPacket->pCreateInfo->pNext;
+    createInfo.flags = pPacket->pCreateInfo->flags;
+    createInfo.hinstance = pSurf->hinstance;
+    createInfo.hwnd = pSurf->hwnd;
+    replayResult = m_vkFuncs.real_vkCreateWin32SurfaceKHR(remappedInstance, &createInfo, pPacket->pAllocator, &local_pSurface);
+#elif defined(PLATFORM_LINUX) && !defined(ANDROID)
+    VkIcdSurfaceXcb *pSurf = (VkIcdSurfaceXcb *) m_display->get_surface();
+    VkXcbSurfaceCreateInfoKHR createInfo;
+    createInfo.sType = VK_STRUCTURE_TYPE_XCB_SURFACE_CREATE_INFO_KHR;
+    createInfo.pNext = pPacket->pCreateInfo->pNext;
+    createInfo.flags = pPacket->pCreateInfo->flags;
+    createInfo.connection = pSurf->connection;
+    createInfo.window = pSurf->window;
+    replayResult = m_vkFuncs.real_vkCreateXcbSurfaceKHR(remappedInstance, &createInfo, pPacket->pAllocator, &local_pSurface);
+#else
+    vktrace_LogError("manually_replay_vkCreateWin32SurfaceKHR not implemented on this playback platform");
+    replayResult = VK_ERROR_FEATURE_NOT_PRESENT;
+#endif
+    if (replayResult == VK_SUCCESS) {
+        m_objMapper.add_to_surfacekhrs_map(*(pPacket->pSurface), local_pSurface);
+    }
+    return replayResult;
+}
+
+VkResult vkReplay::manually_replay_vkCreateAndroidSurfaceKHR(packet_vkCreateAndroidSurfaceKHR* pPacket)
+{
+    VkResult replayResult = VK_SUCCESS;
+    VkSurfaceKHR local_pSurface = VK_NULL_HANDLE;
+    VkInstance remappedInstance = m_objMapper.remap_instances(pPacket->instance);
+    if (remappedInstance == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateAndroidSurfaceKHR() due to invalid remapped VkInstance.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+#if defined WIN32
+    VkIcdSurfaceWin32 *pSurf = (VkIcdSurfaceWin32 *) m_display->get_surface();
+    VkWin32SurfaceCreateInfoKHR createInfo;
+    createInfo.sType = pPacket->pCreateInfo->sType;
+    createInfo.pNext = pPacket->pCreateInfo->pNext;
+    createInfo.flags = pPacket->pCreateInfo->flags;
+    createInfo.hinstance = pSurf->hinstance;
+    createInfo.hwnd = pSurf->hwnd;
+    replayResult = m_vkFuncs.real_vkCreateWin32SurfaceKHR(remappedInstance, &createInfo, pPacket->pAllocator, &local_pSurface);
+#elif defined(PLATFORM_LINUX)
+#if !defined(ANDROID)
+    VkIcdSurfaceXcb *pSurf = (VkIcdSurfaceXcb *) m_display->get_surface();
+    VkXcbSurfaceCreateInfoKHR createInfo;
+    createInfo.sType = pPacket->pCreateInfo->sType;
+    createInfo.pNext = pPacket->pCreateInfo->pNext;
+    createInfo.flags = pPacket->pCreateInfo->flags;
+    createInfo.connection = pSurf->connection;
+    createInfo.window = pSurf->window;
+    replayResult = m_vkFuncs.real_vkCreateXcbSurfaceKHR(remappedInstance, &createInfo, pPacket->pAllocator, &local_pSurface);
+#else
+    VkIcdSurfaceAndroid *pSurf = (VkIcdSurfaceAndroid *) m_display->get_surface();
+    VkAndroidSurfaceCreateInfoKHR createInfo;
+    createInfo.sType = pPacket->pCreateInfo->sType;
+    createInfo.pNext = pPacket->pCreateInfo->pNext;
+    createInfo.flags = pPacket->pCreateInfo->flags;
+    createInfo.window = pSurf->window;
+    replayResult = m_vkFuncs.real_vkCreateAndroidSurfaceKHR(remappedInstance, &createInfo, pPacket->pAllocator, &local_pSurface); 
+#endif // ANDROID
+#else
+    vktrace_LogError("manually_replay_vkCreateAndroidSurfaceKHR not implemented on this playback platform");
+    replayResult = VK_ERROR_FEATURE_NOT_PRESENT;
+#endif
+    if (replayResult == VK_SUCCESS) {
+        m_objMapper.add_to_surfacekhrs_map(*(pPacket->pSurface), local_pSurface);
+    }
+    return replayResult;
+}
+
+VkResult  vkReplay::manually_replay_vkCreateDebugReportCallbackEXT(packet_vkCreateDebugReportCallbackEXT* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    VkDebugReportCallbackEXT local_msgCallback;
+    VkInstance remappedInstance = m_objMapper.remap_instances(pPacket->instance);
+    if (remappedInstance == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkCreateDebugReportCallbackEXT() due to invalid remapped VkInstance.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    if (!g_fpDbgMsgCallback || !m_vkFuncs.real_vkCreateDebugReportCallbackEXT) {
+        // Just eat this call, as we don't have a local callback function defined.
+        return VK_SUCCESS;
+    } else
+    {
+        VkDebugReportCallbackCreateInfoEXT dbgCreateInfo;
+        memset(&dbgCreateInfo, 0, sizeof(dbgCreateInfo));
+        dbgCreateInfo.sType = VK_STRUCTURE_TYPE_DEBUG_REPORT_CREATE_INFO_EXT;
+        dbgCreateInfo.flags = pPacket->pCreateInfo->flags;
+        dbgCreateInfo.pfnCallback = g_fpDbgMsgCallback;
+        dbgCreateInfo.pUserData = NULL;
+        replayResult = m_vkFuncs.real_vkCreateDebugReportCallbackEXT(remappedInstance, &dbgCreateInfo, NULL, &local_msgCallback);
+        if (replayResult == VK_SUCCESS)
+        {
+            m_objMapper.add_to_debugreportcallbackexts_map(*(pPacket->pCallback), local_msgCallback);
+        }
+    }
+    return replayResult;
+}
+
+void vkReplay::manually_replay_vkDestroyDebugReportCallbackEXT(packet_vkDestroyDebugReportCallbackEXT* pPacket)
+{
+    VkInstance remappedInstance = m_objMapper.remap_instances(pPacket->instance);
+    if (remappedInstance == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkDestroyDebugReportCallbackEXT() due to invalid remapped VkInstance.");
+        return;
+    }
+
+    VkDebugReportCallbackEXT remappedMsgCallback;
+    remappedMsgCallback = m_objMapper.remap_debugreportcallbackexts(pPacket->callback);
+    if (remappedMsgCallback == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkDestroyDebugReportCallbackEXT() due to invalid remapped VkDebugReportCallbackEXT.");
+        return;
+    }
+
+    if (!g_fpDbgMsgCallback)
+    {
+        // Silently ignore this call; no local callback function is defined.
+        return;
+    } else
+    {
+        m_vkFuncs.real_vkDestroyDebugReportCallbackEXT(remappedInstance, remappedMsgCallback, NULL);
+    }
+}
+
+VkResult vkReplay::manually_replay_vkAllocateCommandBuffers(packet_vkAllocateCommandBuffers* pPacket)
+{
+    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;
+    VkDevice remappedDevice = m_objMapper.remap_devices(pPacket->device);
+    if (remappedDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkAllocateCommandBuffers() due to invalid remapped VkDevice.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
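+    // pPacket carries trace-time handles: temporarily swap in the remapped
+    // replay-time command pool, then restore the original after the call.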
+    VkCommandPool local_CommandPool = pPacket->pAllocateInfo->commandPool;
+    ((VkCommandBufferAllocateInfo *) pPacket->pAllocateInfo)->commandPool = m_objMapper.remap_commandpools(pPacket->pAllocateInfo->commandPool);
+    if (pPacket->pAllocateInfo->commandPool == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Skipping vkAllocateCommandBuffers() due to invalid remapped VkCommandPool.");
+        return VK_ERROR_VALIDATION_FAILED_EXT;
+    }
+
+    // Allocate after the pool check so the early return above cannot leak this array.
+    VkCommandBuffer *local_pCommandBuffers = new VkCommandBuffer[pPacket->pAllocateInfo->commandBufferCount];
+
+    replayResult = m_vkFuncs.real_vkAllocateCommandBuffers(remappedDevice, pPacket->pAllocateInfo, local_pCommandBuffers);
+    ((VkCommandBufferAllocateInfo *) pPacket->pAllocateInfo)->commandPool = local_CommandPool;
+
+    if (replayResult == VK_SUCCESS)
+    {
+        for (uint32_t i = 0; i < pPacket->pAllocateInfo->commandBufferCount; i++) {
+            m_objMapper.add_to_commandbuffers_map(pPacket->pCommandBuffers[i], local_pCommandBuffers[i]);
+        }
+    }
+    delete[] local_pCommandBuffers;
+    return replayResult;
+}
+
+VkBool32 vkReplay::manually_replay_vkGetPhysicalDeviceXcbPresentationSupportKHR(packet_vkGetPhysicalDeviceXcbPresentationSupportKHR* pPacket)
+{
+    VkPhysicalDevice remappedphysicalDevice = m_objMapper.remap_physicaldevices(pPacket->physicalDevice);
+    if (remappedphysicalDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Error detected in vkGetPhysicalDeviceXcbPresentationSupportKHR() due to invalid remapped VkPhysicalDevice.");
+        return VK_FALSE;
+    }
+
+#if defined(PLATFORM_LINUX) && !defined(ANDROID)
+    VkIcdSurfaceXcb *pSurf = (VkIcdSurfaceXcb *) m_display->get_surface();
+    m_display->get_window_handle();
+    return (m_vkFuncs.real_vkGetPhysicalDeviceXcbPresentationSupportKHR(remappedphysicalDevice, pPacket->queueFamilyIndex, pSurf->connection, m_display->get_screen_handle()->root_visual));
+#elif defined WIN32
+    return (m_vkFuncs.real_vkGetPhysicalDeviceWin32PresentationSupportKHR(remappedphysicalDevice, pPacket->queueFamilyIndex));
+#else
+    vktrace_LogError("manually_replay_vkGetPhysicalDeviceXcbPresentationSupportKHR not implemented on this playback platform");
+    return VK_FALSE;
+#endif
+}
+
+VkBool32 vkReplay::manually_replay_vkGetPhysicalDeviceXlibPresentationSupportKHR(packet_vkGetPhysicalDeviceXlibPresentationSupportKHR* pPacket)
+{
+    VkPhysicalDevice remappedphysicalDevice = m_objMapper.remap_physicaldevices(pPacket->physicalDevice);
+    if (remappedphysicalDevice == VK_NULL_HANDLE)
+    {
+        vktrace_LogError("Error detected in vkGetPhysicalDeviceXcbPresentationSupportKHR() due to invalid remapped VkPhysicalDevice.");
+        return VK_FALSE;
+    }
+
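+    // Answer with the playback platform's own WSI support query, whatever
+    // windowing system the trace was captured on.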
+#if defined PLATFORM_LINUX && defined VK_USE_PLATFORM_XLIB_KHR
+    VkIcdSurfaceXlib *pSurf = (VkIcdSurfaceXlib *) m_display->get_surface();
+    m_display->get_window_handle();
+    return (m_vkFuncs.real_vkGetPhysicalDeviceXlibPresentationSupportKHR(remappedphysicalDevice, pPacket->queueFamilyIndex, pSurf->dpy, m_display->get_screen_handle()->root_visual));
+#elif defined PLATFORM_LINUX && defined VK_USE_PLATFORM_XCB_KHR
+    VkIcdSurfaceXcb *pSurf = (VkIcdSurfaceXcb *) m_display->get_surface();
+    m_display->get_window_handle();
+    return (m_vkFuncs.real_vkGetPhysicalDeviceXcbPresentationSupportKHR(remappedphysicalDevice, pPacket->queueFamilyIndex, pSurf->connection, m_display->get_screen_handle()->root_visual));
+#elif defined PLATFORM_LINUX && defined VK_USE_PLATFORM_ANDROID_KHR
+    // The Xlib query is not defined on Android; report presentation support.
+    return VK_TRUE;
+#elif defined PLATFORM_LINUX
+#error manually_replay_vkGetPhysicalDeviceXlibPresentationSupportKHR on PLATFORM_LINUX requires one of VK_USE_PLATFORM_XLIB_KHR or VK_USE_PLATFORM_XCB_KHR or VK_USE_PLATFORM_ANDROID_KHR
+#elif defined WIN32
+    return (m_vkFuncs.real_vkGetPhysicalDeviceWin32PresentationSupportKHR(remappedphysicalDevice, pPacket->queueFamilyIndex));
+#else
+    vktrace_LogError("manually_replay_vkGetPhysicalDeviceXlibPresentationSupportKHR not implemented on this playback platform");
+    return VK_FALSE;
+#endif
+}
+
+VkBool32 vkReplay::manually_replay_vkGetPhysicalDeviceWin32PresentationSupportKHR(packet_vkGetPhysicalDeviceWin32PresentationSupportKHR* pPacket)
+{
+    VkPhysicalDevice remappedphysicalDevice = m_objMapper.remap_physicaldevices(pPacket->physicalDevice);
+    if (pPacket->physicalDevice != VK_NULL_HANDLE && remappedphysicalDevice == VK_NULL_HANDLE)
+    {
+        return VK_FALSE;
+    }
+
+#if defined WIN32
+    return (m_vkFuncs.real_vkGetPhysicalDeviceWin32PresentationSupportKHR(remappedphysicalDevice, pPacket->queueFamilyIndex));
+#elif defined(PLATFORM_LINUX) && !defined(ANDROID)
+    VkIcdSurfaceXcb *pSurf = (VkIcdSurfaceXcb *) m_display->get_surface();
+    m_display->get_window_handle();
+    return (m_vkFuncs.real_vkGetPhysicalDeviceXcbPresentationSupportKHR(remappedphysicalDevice, pPacket->queueFamilyIndex, pSurf->connection, m_display->get_screen_handle()->root_visual));
+#else
+    vktrace_LogError("manually_replay_vkGetPhysicalDeviceWin32PresentationSupportKHR not implemented on this playback platform");
+    return VK_FALSE;
+#endif
+}
+
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkreplay.h b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkreplay.h
new file mode 100644
index 0000000..fdb79f4
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vkreplay/vkreplay_vkreplay.h
@@ -0,0 +1,217 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ * Author: Courtney Goeltzenleuchter <courtney@LunarG.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ * Author: Tobin Ehlis <tobin@lunarg.com>
+ */
+
+#pragma once
+
+#include <set>
+#include <map>
+#include <vector>
+#include <string>
+#if defined(PLATFORM_LINUX)
+#if defined(ANDROID)
+#include <android_native_app_glue.h>
+#else
+#include <xcb/xcb.h>
+#endif // ANDROID
+#endif
+#include "vktrace_multiplatform.h"
+#include "vkreplay_window.h"
+#include "vkreplay_factory.h"
+#include "vktrace_trace_packet_identifiers.h"
+#include <unordered_map>
+
+extern "C" {
+#include "vktrace_vk_vk_packets.h"
+
+// TODO138 : Need to add packets files for new wsi headers
+}
+
+#include "vulkan/vulkan.h"
+
+#include "vkreplay_vkdisplay.h"
+#include "vkreplay_vk_func_ptrs.h"
+#include "vkreplay_vk_objmapper.h"
+
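+// Compares an entrypoint's replay-time result against the traced result and
+// folds any mismatch into the running replay status.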
+#define CHECK_RETURN_VALUE(entrypoint) returnValue = handle_replay_errors(#entrypoint, replayResult, pPacket->result, returnValue);
+
+class vkReplay {
+public:
+    ~vkReplay();
+    vkReplay(vkreplayer_settings *pReplaySettings);
+
+    int init(vktrace_replay::ReplayDisplay & disp);
+    vkDisplay * get_display() {return m_display;}
+    vktrace_replay::VKTRACE_REPLAY_RESULT replay(vktrace_trace_packet_header *packet);
+    vktrace_replay::VKTRACE_REPLAY_RESULT handle_replay_errors(const char* entrypointName, const VkResult resCall, const VkResult resTrace, const vktrace_replay::VKTRACE_REPLAY_RESULT resIn);
+
+    void push_validation_msg(VkFlags msgFlags, VkDebugReportObjectTypeEXT objType, uint64_t srcObjectHandle, size_t location, int32_t msgCode, const char* pLayerPrefix, const char* pMsg, const void* pUserData);
+    vktrace_replay::VKTRACE_REPLAY_RESULT pop_validation_msgs();
+    int dump_validation_data();
+    int get_frame_number() { return m_frameNumber; }
+    void reset_frame_number() { m_frameNumber = 0; }
+private:
+    struct vkFuncs m_vkFuncs;
+    vkReplayObjMapper m_objMapper;
+    void (*m_pDSDump)(char*);
+    void (*m_pCBDump)(char*);
+    //VKTRACESNAPSHOT_PRINT_OBJECTS m_pVktraceSnapshotPrint;
+    vkDisplay *m_display;
+
+    int m_frameNumber;
+
+    struct ValidationMsg {
+        VkFlags msgFlags;
+        VkDebugReportObjectTypeEXT objType;
+        uint64_t srcObjectHandle;
+        size_t location;
+        int32_t msgCode;
+        char layerPrefix[256];
+        char msg[256];
+        void* pUserData;
+    };
+
+    VkDebugReportCallbackEXT m_dbgMsgCallbackObj;
+
+    std::vector<struct ValidationMsg> m_validationMsgs;
+    std::vector<int> m_screenshotFrames;
+    VkResult manually_replay_vkCreateInstance(packet_vkCreateInstance* pPacket);
+    VkResult manually_replay_vkCreateDevice(packet_vkCreateDevice* pPacket);
+    VkResult manually_replay_vkCreateBuffer(packet_vkCreateBuffer* pPacket);
+    VkResult manually_replay_vkCreateImage(packet_vkCreateImage* pPacket);
+    VkResult manually_replay_vkCreateCommandPool(packet_vkCreateCommandPool* pPacket);
+    VkResult manually_replay_vkEnumeratePhysicalDevices(packet_vkEnumeratePhysicalDevices* pPacket);
+    // TODO138 : Many new functions in API now that we need to assess if manual code needed
+    //VkResult manually_replay_vkGetPhysicalDeviceInfo(packet_vkGetPhysicalDeviceInfo* pPacket);
+    //VkResult manually_replay_vkGetGlobalExtensionInfo(packet_vkGetGlobalExtensionInfo* pPacket);
+    //VkResult manually_replay_vkGetPhysicalDeviceExtensionInfo(packet_vkGetPhysicalDeviceExtensionInfo* pPacket);
+    VkResult manually_replay_vkQueueSubmit(packet_vkQueueSubmit* pPacket);
+    VkResult manually_replay_vkQueueBindSparse(packet_vkQueueBindSparse* pPacket);
+    //VkResult manually_replay_vkGetObjectInfo(packet_vkGetObjectInfo* pPacket);
+    //VkResult manually_replay_vkGetImageSubresourceInfo(packet_vkGetImageSubresourceInfo* pPacket);
+    void manually_replay_vkUpdateDescriptorSets(packet_vkUpdateDescriptorSets* pPacket);
+    VkResult manually_replay_vkCreateDescriptorSetLayout(packet_vkCreateDescriptorSetLayout* pPacket);
+    void manually_replay_vkDestroyDescriptorSetLayout(packet_vkDestroyDescriptorSetLayout* pPacket);
+    VkResult manually_replay_vkAllocateDescriptorSets(packet_vkAllocateDescriptorSets* pPacket);
+    VkResult manually_replay_vkFreeDescriptorSets(packet_vkFreeDescriptorSets* pPacket);
+    void manually_replay_vkCmdBindDescriptorSets(packet_vkCmdBindDescriptorSets* pPacket);
+    void manually_replay_vkCmdBindVertexBuffers(packet_vkCmdBindVertexBuffers* pPacket);
+    VkResult manually_replay_vkGetPipelineCacheData(packet_vkGetPipelineCacheData* pPacket);
+    VkResult manually_replay_vkCreateGraphicsPipelines(packet_vkCreateGraphicsPipelines* pPacket);
+    VkResult manually_replay_vkCreateComputePipelines(packet_vkCreateComputePipelines* pPacket);
+    VkResult manually_replay_vkCreatePipelineLayout(packet_vkCreatePipelineLayout* pPacket);
+    void manually_replay_vkCmdWaitEvents(packet_vkCmdWaitEvents* pPacket);
+    void manually_replay_vkCmdPipelineBarrier(packet_vkCmdPipelineBarrier* pPacket);
+    VkResult manually_replay_vkCreateFramebuffer(packet_vkCreateFramebuffer* pPacket);
+    VkResult manually_replay_vkCreateRenderPass(packet_vkCreateRenderPass* pPacket);
+    void manually_replay_vkCmdBeginRenderPass(packet_vkCmdBeginRenderPass* pPacket);
+    VkResult manually_replay_vkBeginCommandBuffer(packet_vkBeginCommandBuffer* pPacket);
+    VkResult manually_replay_vkAllocateCommandBuffers(packet_vkAllocateCommandBuffers* pPacket);
+    VkResult manually_replay_vkWaitForFences(packet_vkWaitForFences* pPacket);
+    VkResult manually_replay_vkAllocateMemory(packet_vkAllocateMemory* pPacket);
+    void manually_replay_vkFreeMemory(packet_vkFreeMemory* pPacket);
+    VkResult manually_replay_vkMapMemory(packet_vkMapMemory* pPacket);
+    void manually_replay_vkUnmapMemory(packet_vkUnmapMemory* pPacket);
+    VkResult manually_replay_vkFlushMappedMemoryRanges(packet_vkFlushMappedMemoryRanges* pPacket);
+    VkResult manually_replay_vkInvalidateMappedMemoryRanges(packet_vkInvalidateMappedMemoryRanges* pPacket);
+    void manually_replay_vkGetPhysicalDeviceMemoryProperties(packet_vkGetPhysicalDeviceMemoryProperties* pPacket);
+    void manually_replay_vkGetPhysicalDeviceQueueFamilyProperties(packet_vkGetPhysicalDeviceQueueFamilyProperties* pPacket);
+    VkResult manually_replay_vkGetPhysicalDeviceSurfaceSupportKHR(packet_vkGetPhysicalDeviceSurfaceSupportKHR* pPacket);
+    VkResult manually_replay_vkGetPhysicalDeviceSurfaceCapabilitiesKHR(packet_vkGetPhysicalDeviceSurfaceCapabilitiesKHR* pPacket);
+    VkResult manually_replay_vkGetPhysicalDeviceSurfaceFormatsKHR(packet_vkGetPhysicalDeviceSurfaceFormatsKHR* pPacket);
+    VkResult manually_replay_vkGetPhysicalDeviceSurfacePresentModesKHR(packet_vkGetPhysicalDeviceSurfacePresentModesKHR* pPacket);
+    VkResult manually_replay_vkCreateSwapchainKHR(packet_vkCreateSwapchainKHR* pPacket);
+    VkResult manually_replay_vkGetSwapchainImagesKHR(packet_vkGetSwapchainImagesKHR* pPacket);
+    VkResult manually_replay_vkQueuePresentKHR(packet_vkQueuePresentKHR* pPacket);
+    VkResult manually_replay_vkCreateXcbSurfaceKHR(packet_vkCreateXcbSurfaceKHR* pPacket);
+    VkBool32 manually_replay_vkGetPhysicalDeviceXcbPresentationSupportKHR(packet_vkGetPhysicalDeviceXcbPresentationSupportKHR* pPacket);
+    VkResult manually_replay_vkCreateXlibSurfaceKHR(packet_vkCreateXlibSurfaceKHR* pPacket);
+    VkBool32 manually_replay_vkGetPhysicalDeviceXlibPresentationSupportKHR(packet_vkGetPhysicalDeviceXlibPresentationSupportKHR* pPacket);
+    VkResult manually_replay_vkCreateWin32SurfaceKHR(packet_vkCreateWin32SurfaceKHR* pPacket);
+    VkBool32 manually_replay_vkGetPhysicalDeviceWin32PresentationSupportKHR(packet_vkGetPhysicalDeviceWin32PresentationSupportKHR* pPacket);
+    VkResult manually_replay_vkCreateAndroidSurfaceKHR(packet_vkCreateAndroidSurfaceKHR* pPacket);
+    VkResult manually_replay_vkCreateDebugReportCallbackEXT(packet_vkCreateDebugReportCallbackEXT* pPacket);
+    void manually_replay_vkDestroyDebugReportCallbackEXT(packet_vkDestroyDebugReportCallbackEXT* pPacket);
+
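+    // Parses a comma-separated list of frame numbers, e.g. "1,15,360", into
+    // m_screenshotFrames. Input is assumed to be well-formed (atoi is used).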
+    void process_screenshot_list(const char *list)
+    {
+        std::string spec(list), word;
+        size_t start = 0, comma = 0;
+
+        while (start < spec.size()) {
+            comma = spec.find(',', start);
+
+            if (comma == std::string::npos)
+                word = std::string(spec, start);
+            else
+                word = std::string(spec, start, comma - start);
+
+            m_screenshotFrames.push_back(atoi(word.c_str()));
+            if (comma == std::string::npos)
+                break;
+
+            start = comma + 1;
+        }
+    }
+
+    struct QueueFamilyProperties {
+        uint32_t count;
+        VkQueueFamilyProperties* queueFamilyProperties;
+    };
+
+    // Map VkPhysicalDevice to QueueFamilyProperties (and ultimately queue indices)
+    std::unordered_map<VkPhysicalDevice, struct QueueFamilyProperties> traceQueueFamilyProperties;
+    std::unordered_map<VkPhysicalDevice, struct QueueFamilyProperties> replayQueueFamilyProperties;
+
+    // Map VkDevice to a VkPhysicalDevice
+    std::unordered_map<VkDevice, VkPhysicalDevice> tracePhysicalDevices;
+    std::unordered_map<VkDevice, VkPhysicalDevice> replayPhysicalDevices;
+
+    // Map VkBuffer to VkDevice, so we can search for the VkDevice used to create a buffer
+    std::unordered_map<VkBuffer, VkDevice> traceBufferToDevice;
+    std::unordered_map<VkBuffer, VkDevice> replayBufferToDevice;
+
+    // Map VkImage to VkDevice, so we can search for the VkDevice used to create an image
+    std::unordered_map<VkImage, VkDevice> traceImageToDevice;
+    std::unordered_map<VkImage, VkDevice> replayImageToDevice;
+
+    // Map VkPhysicalDevice to VkPhysicalDeviceMemoryProperties
+    std::unordered_map<VkPhysicalDevice, VkPhysicalDeviceMemoryProperties> traceMemoryProperties;
+    std::unordered_map<VkPhysicalDevice, VkPhysicalDeviceMemoryProperties> replayMemoryProperties;
+
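+    // Translate a trace-time memory type or queue family index into the
+    // equivalent index on the replay device, using the tables above.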
+    bool getMemoryTypeIdx(VkDevice traceDevice,
+                          VkDevice replayDevice,
+                          uint32_t traceIdx,
+                          uint32_t* pReplayIdx);
+
+    bool getQueueFamilyIdx(VkPhysicalDevice tracePhysicalDevice,
+                           VkPhysicalDevice replayPhysicalDevice,
+                           uint32_t traceIdx,
+                           uint32_t* pReplayIdx);
+    bool getQueueFamilyIdx(VkDevice traceDevice,
+        VkDevice replayDevice,
+        uint32_t traceIdx,
+        uint32_t* pReplayIdx);
+
+};
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/CMakeLists.txt b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/CMakeLists.txt
new file mode 100644
index 0000000..0825ad9
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/CMakeLists.txt
@@ -0,0 +1,83 @@
+cmake_minimum_required(VERSION 2.8)
+
+project(vktraceviewer_vk)
+
+include("${SRC_DIR}/build_options.cmake")
+
+find_package(Qt5 COMPONENTS Widgets Gui Core Svg QUIET)
+
+if(NOT Qt5_FOUND)
+# After Qt5.6 is installed, you may need to add the following to the cmake command line:
+# -DCMAKE_PREFIX_PATH=C:\\Qt\\5.6\\msvc2015_64\\
+message(WARNING "vktraceviewer_vk will be excluded because Qt5 was not found.")
+else()
+
+set(SRC_LIST
+    ${SRC_LIST}
+    vktraceviewer_vk.cpp
+    vktraceviewer_vk_settings.cpp
+    vktraceviewer_vk_qcontroller.cpp
+    vktraceviewer_vk_qfile_model.cpp
+    vktraceviewer_vk_qgroupframesproxymodel.cpp
+    ${SRC_DIR}/vktrace_viewer/vktraceviewer_QReplayWorker.cpp
+    ${SRC_DIR}/vktrace_replay/vkreplay_factory.cpp
+)
+
+# This should only contain headers that define a QOBJECT
+# Typically that means just headers for UI objects
+set(UI_HEADER_LIST
+    vktraceviewer_vk_qcontroller.h
+    vktraceviewer_vk_qfile_model.h
+    vktraceviewer_vk_qgroupframesproxymodel.h
+    ${SRC_DIR}/vktrace_viewer/vktraceviewer_qgroupthreadsproxymodel.h
+    ${SRC_DIR}/vktrace_viewer/vktraceviewer_qimageviewer.h
+    ${SRC_DIR}/vktrace_viewer/vktraceviewer_qsvgviewer.h
+    ${SRC_DIR}/vktrace_viewer/vktraceviewer_QTraceFileModel.h
+    ${SRC_DIR}/vktrace_viewer/vktraceviewer_QReplayWidget.h
+    ${SRC_DIR}/vktrace_viewer/vktraceviewer_QReplayWorker.h
+)
+
+set(HDR_LIST
+    vktraceviewer_vk_settings.h
+    vktraceviewer_vk_qgroupframesproxymodel.h
+    ${SRC_DIR}/vktrace_viewer/vktraceviewer_qgroupthreadsproxymodel.h
+    ${SRC_DIR}/vktrace_viewer/vktraceviewer_controller.h
+    ${SRC_DIR}/vktrace_replay/vkreplay_factory.h
+    ${CMAKE_CURRENT_SOURCE_DIR}/../vulkan/codegen_utils/vk_enum_string_helper.h
+    ${CODEGEN_VKTRACE_DIR}/vktrace_vk_packet_id.h
+    ${CODEGEN_VKTRACE_DIR}/vktrace_vk_vk_packets.h
+)
+
+include_directories(
+    ${CODEGEN_VKTRACE_DIR}
+    ${SRC_DIR}/vktrace_common
+    ${SRC_DIR}/vktrace_viewer
+    ${SRC_DIR}/vktrace_replay
+    ${SRC_DIR}/thirdparty
+    ${CMAKE_CURRENT_SOURCE_DIR}/../vkreplay
+    ${CMAKE_CURRENT_SOURCE_DIR}/../vulkan/codegen_utils
+    ${VKTRACE_VULKAN_DIR}/${CODEGEN_VKTRACE_DIR}
+    ${VKTRACE_VULKAN_INCLUDE_DIR}
+    ${Qt5Widgets_INCLUDE_DIRS}
+)
+
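+# Run Qt's moc over headers declaring QObject-derived classes so their
+# signal/slot metadata is compiled into this library.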
+QT5_WRAP_CPP(QT_GEN_HEADER_MOC_LIST ${UI_HEADER_LIST})
+
+set (CMAKE_RUNTIME_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/../../../../")
+
+add_library(${PROJECT_NAME} STATIC ${SRC_LIST} ${HDR_LIST}
+    ${QT_GEN_HEADER_MOC_LIST}
+)
+
+target_link_libraries(${PROJECT_NAME}
+    Qt5::Widgets
+    Qt5::Core
+    Qt5::Svg
+    ${VKTRACE_VULKAN_LIB}
+    vktrace_common
+    vulkan_replay
+)
+
+build_options_finalize()
+
+endif(NOT Qt5_FOUND)
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk.cpp b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk.cpp
new file mode 100644
index 0000000..19fa6ae
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk.cpp
@@ -0,0 +1,42 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#include "vktraceviewer_vk_qcontroller.h"
+#include "vktraceviewer_controller.h"
+
+extern "C"
+{
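+// Exported C entry points through which the vktraceviewer application creates
+// and destroys this plugin's controller.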
+VKTRACER_EXPORT vktraceviewer_QController* VKTRACER_CDECL vtvCreateQController()
+{
+    vktraceviewer_vk_QController* pController = new vktraceviewer_vk_QController();
+
+    return (vktraceviewer_QController*) pController;
+}
+
+VKTRACER_EXPORT void VKTRACER_CDECL vtvDeleteQController(vktraceviewer_QController** ppController)
+{
+    if (ppController != NULL && *ppController != NULL)
+    {
+        delete *ppController;
+        *ppController = NULL;
+    }
+}
+
+}
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qcontroller.cpp b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qcontroller.cpp
new file mode 100644
index 0000000..1bcd7c2
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qcontroller.cpp
@@ -0,0 +1,365 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+**************************************************************************/
+#include "vktraceviewer_vk_settings.h"
+#include "vktraceviewer_vk_qcontroller.h"
+
+extern "C" {
+#include "vktrace_trace_packet_utils.h"
+#include "vktrace_vk_packet_id.h"
+}
+
+#include <assert.h>
+#include <QFileInfo>
+#include <QWidget>
+#include <QToolButton>
+#include <QCoreApplication>
+#include <QProcess>
+
+#include "vktraceviewer_view.h"
+#include "vkreplay_seq.h"
+
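+// The logging callback must be a free function, so keep a file-scope pointer
+// to the active controller for routing messages back to it.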
+static vktraceviewer_vk_QController* s_pController;
+
+void controllerLoggingCallback(VktraceLogLevel level, const char* pMessage)
+{
+    if (s_pController != NULL)
+    {
+        s_pController->OnOutputMessage(level, pMessage);
+    }
+}
+
+vktraceviewer_vk_QController::vktraceviewer_vk_QController()
+    : m_pView(NULL),
+      m_pTraceFileInfo(NULL),
+      m_pDrawStateDiagram(NULL),
+      m_pCommandBuffersDiagram(NULL),
+      m_pReplayWidget(NULL),
+      m_pTraceFileModel(NULL)
+{
+    s_pController = this;
+    vktrace_LogSetCallback(controllerLoggingCallback);
+    vktrace_LogSetLevel(VKTRACE_LOG_ERROR);
+    initialize_default_settings();
+    vktrace_SettingGroup_reset_defaults(&g_vkTraceViewerSettingGroup);
+}
+
+vktraceviewer_vk_QController::~vktraceviewer_vk_QController()
+{
+}
+
+vktrace_trace_packet_header* vktraceviewer_vk_QController::InterpretTracePacket(vktrace_trace_packet_header* pHeader)
+{
+    // Attempt to interpret the packet as a Vulkan packet
+    vktrace_trace_packet_header* pInterpretedHeader = interpret_trace_packet_vk(pHeader);
+    if (pInterpretedHeader == NULL)
+    {
+        vktrace_LogError("Unrecognized Vulkan packet id: %u.", pHeader->packet_id);
+    }
+    else if (pInterpretedHeader->packet_id == VKTRACE_TPI_VK_vkApiVersion)
+    {
+        packet_vkApiVersion* pPacket = (packet_vkApiVersion*)pInterpretedHeader->pBody;
+        if (pPacket->version != VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION))
+        {
+            vktrace_LogError("Trace file is from Vulkan version 0x%x (%u.%u.%u), but the VkTraceViewer plugin only supports version 0x%x (%u.%u.%u).", pPacket->version, VK_VERSION_MAJOR(pPacket->version), VK_VERSION_MINOR(pPacket->version), VK_VERSION_PATCH(pPacket->version), VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION), 1, 0, VK_HEADER_VERSION);
+            pInterpretedHeader = NULL;
+        }
+    }
+
+    return pInterpretedHeader;
+}
+
+bool vktraceviewer_vk_QController::LoadTraceFile(vktraceviewer_trace_file_info* pTraceFileInfo, vktraceviewer_view* pView)
+{
+    assert(pTraceFileInfo != NULL);
+    assert(pView != NULL);
+    setView(pView);
+    m_pTraceFileInfo = pTraceFileInfo;
+
+    assert(m_pReplayWidget == NULL);
+    m_pReplayWidget = new vktraceviewer_QReplayWidget(&m_replayWorker);
+    if (m_pReplayWidget != NULL)
+    {
+        // load available replayers
+        if (!m_replayWorker.load_replayers(pTraceFileInfo, m_pReplayWidget->GetReplayWindow(),
+            g_vkTraceViewerSettings.replay_window_width,
+            g_vkTraceViewerSettings.replay_window_height,
+            g_vkTraceViewerSettings.separate_replay_window))
+        {
+            emit OutputMessage(VKTRACE_LOG_ERROR, "Failed to load necessary replayers.");
+            delete m_pReplayWidget;
+            m_pReplayWidget = NULL;
+        }
+        else
+        {
+            m_pView->add_custom_state_viewer(m_pReplayWidget, "Replayer", true);
+            m_pReplayWidget->setEnabled(true);
+            connect(m_pReplayWidget, SIGNAL(ReplayStarted()), this, SLOT(onReplayStarted()));
+            connect(m_pReplayWidget, SIGNAL(ReplayPaused(uint64_t)), this, SLOT(onReplayPaused(uint64_t)));
+            connect(m_pReplayWidget, SIGNAL(ReplayContinued()), this, SLOT(onReplayContinued()));
+            connect(m_pReplayWidget, SIGNAL(ReplayStopped(uint64_t)), this, SLOT(onReplayStopped(uint64_t)));
+            connect(m_pReplayWidget, SIGNAL(ReplayFinished(uint64_t)), this, SLOT(onReplayFinished(uint64_t)));
+            connect(m_pReplayWidget, SIGNAL(ReplayProgressUpdate(uint64_t)), this, SLOT(onReplayProgressUpdate(uint64_t)));
+
+            connect(m_pReplayWidget, SIGNAL(OutputMessage(VktraceLogLevel, const QString&)), this, SIGNAL(OutputMessage(VktraceLogLevel, const QString&)));
+            connect(m_pReplayWidget, SIGNAL(OutputMessage(VktraceLogLevel, uint64_t, const QString&)), this, SIGNAL(OutputMessage(VktraceLogLevel, uint64_t, const QString&)));
+        }
+    }
+
+    assert(m_pTraceFileModel == NULL);
+    m_pTraceFileModel = new vktraceviewer_vk_QFileModel(NULL, pTraceFileInfo);
+    updateCallTreeBasedOnSettings();
+
+    deleteStateDumps();
+
+    return true;
+}
+
+const char* vktraceviewer_vk_QController::GetPacketIdString(uint16_t packetId)
+{
+    return vktrace_vk_packet_id_name((VKTRACE_TRACE_PACKET_ID_VK)packetId);
+}
+
+void vktraceviewer_vk_QController::updateCallTreeBasedOnSettings()
+{
+    if (m_pTraceFileModel == NULL)
+    {
+        return;
+    }
+
+    if (g_vkTraceViewerSettings.groupByFrame)
+    {
+        if (m_groupByFramesProxy.sourceModel() != m_pTraceFileModel)
+        {
+            m_groupByFramesProxy.setSourceModel(m_pTraceFileModel);
+        }
+        m_pView->set_calltree_model(m_pTraceFileModel, &m_groupByFramesProxy);
+    }
+    else if (g_vkTraceViewerSettings.groupByThread)
+    {
+        if (m_groupByThreadsProxy.sourceModel() != m_pTraceFileModel)
+        {
+            m_groupByThreadsProxy.setSourceModel(m_pTraceFileModel);
+        }
+        m_pView->set_calltree_model(m_pTraceFileModel, &m_groupByThreadsProxy);
+    }
+    else
+    {
+        m_pView->set_calltree_model(m_pTraceFileModel, NULL);
+    }
+}
+
+void vktraceviewer_vk_QController::deleteStateDumps() const
+{
+    QFile::remove("pipeline_dump.dot");
+    QFile::remove("pipeline_dump.svg");
+    QFile::remove("cb_dump.dot");
+    QFile::remove("cb_dump.svg");
+}
+
+void vktraceviewer_vk_QController::setStateWidgetsEnabled(bool bEnabled)
+{
+    if(m_pDrawStateDiagram != NULL)
+    {
+        m_pView->enable_custom_state_viewer(m_pDrawStateDiagram, bEnabled);
+    }
+
+    if(m_pCommandBuffersDiagram != NULL)
+    {
+        m_pView->enable_custom_state_viewer(m_pCommandBuffersDiagram, bEnabled);
+    }
+}
+
+void vktraceviewer_vk_QController::onReplayStarted()
+{
+    emit OutputMessage(VKTRACE_LOG_VERBOSE, "Replay Started");
+    deleteStateDumps();
+    setStateWidgetsEnabled(false);
+    m_pView->on_replay_state_changed(true);
+}
+
+void vktraceviewer_vk_QController::onReplayPaused(uint64_t packetIndex)
+{
+    emit OutputMessage(VKTRACE_LOG_VERBOSE, packetIndex, "Replay Paused");
+    m_pView->on_replay_state_changed(false);
+
+    // When paused, the replay will 'continue' from the last packet,
+    // so select that call to indicate to the user where the pause occurred.
+    m_pView->select_call_at_packet_index(packetIndex);
+
+    // Dump state data from the replayer
+    vktrace_replay::vktrace_trace_packet_replay_library* pVkReplayer = m_replayWorker.getReplayer(VKTRACE_TID_VULKAN);
+    if (pVkReplayer != NULL)
+    {
+        int err;
+        err = pVkReplayer->Dump();
+        if (err)
+        {
+            emit OutputMessage(VKTRACE_LOG_WARNING, packetIndex, "Replayer couldn't output state data.");
+        }
+    }
+
+    // Now try to load known state data.
+
+    // Convert dot files to svg format
+#if defined(PLATFORM_LINUX)
+    if (QFile::exists("/usr/bin/dot"))
+    {
+        QProcess process;
+        process.start("/usr/bin/dot pipeline_dump.dot -Tsvg -o pipeline_dump.svg");
+        process.waitForFinished(-1);
+        process.start("/usr/bin/dot cb_dump.dot -Tsvg -o cb_dump.svg");
+        process.waitForFinished(-1);
+    }
+    else
+    {
+        emit OutputMessage(VKTRACE_LOG_ERROR, packetIndex, "DOT not found, unable to generate state diagrams.");
+    }
+#else
+    emit OutputMessage(VKTRACE_LOG_ERROR, packetIndex, "State diagram generation via DOT is not supported on this platform.");
+#endif
+
+    if (QFile::exists("pipeline_dump.svg"))
+    {
+        if (m_pDrawStateDiagram == NULL)
+        {
+            m_pDrawStateDiagram = new vktraceviewer_qsvgviewer();
+            m_pView->add_custom_state_viewer(m_pDrawStateDiagram, tr("Draw State"), false);
+            m_pView->enable_custom_state_viewer(m_pDrawStateDiagram, false);
+        }
+
+        if (m_pDrawStateDiagram != NULL && m_pDrawStateDiagram->load(tr("pipeline_dump.svg")))
+        {
+            m_pView->enable_custom_state_viewer(m_pDrawStateDiagram, true);
+        }
+
+    }
+
+    if (QFile::exists("cb_dump.svg"))
+    {
+        if (m_pCommandBuffersDiagram == NULL)
+        {
+            m_pCommandBuffersDiagram = new vktraceviewer_qsvgviewer();
+            m_pView->add_custom_state_viewer(m_pCommandBuffersDiagram, tr("Command Buffers"), false);
+            m_pView->enable_custom_state_viewer(m_pCommandBuffersDiagram, false);
+        }
+
+        if (m_pCommandBuffersDiagram != NULL && m_pCommandBuffersDiagram->load(tr("cb_dump.svg")))
+        {
+            m_pView->enable_custom_state_viewer(m_pCommandBuffersDiagram, true);
+        }
+    }
+}
+
+void vktraceviewer_vk_QController::onReplayContinued()
+{
+    emit OutputMessage(VKTRACE_LOG_VERBOSE, "Replay Continued");
+    deleteStateDumps();
+    setStateWidgetsEnabled(false);
+    m_pView->on_replay_state_changed(true);
+}
+
+void vktraceviewer_vk_QController::onReplayStopped(uint64_t packetIndex)
+{
+    emit OutputMessage(VKTRACE_LOG_VERBOSE, packetIndex, "Replay Stopped");
+    m_pView->on_replay_state_changed(false);
+    setStateWidgetsEnabled(false);
+
+    // Stopping the replay means that it will 'play' or 'step' from the beginning,
+    // so select the first packet index to indicate to the user what stopping replay does.
+    m_pView->select_call_at_packet_index(0);
+}
+
+void vktraceviewer_vk_QController::onReplayProgressUpdate(uint64_t packetArrayIndex)
+{
+    m_pView->highlight_timeline_item(packetArrayIndex, true, true);
+}
+
+void vktraceviewer_vk_QController::onReplayFinished(uint64_t packetIndex)
+{
+    emit OutputMessage(VKTRACE_LOG_VERBOSE, packetIndex, "Replay Finished");
+    m_pView->on_replay_state_changed(false);
+    setStateWidgetsEnabled(false);
+
+    // The replay has completed, so highlight the final packet index.
+    m_pView->select_call_at_packet_index(packetIndex);
+}
+
+void vktraceviewer_vk_QController::OnOutputMessage(VktraceLogLevel level, const QString& msg)
+{
+    emit OutputMessage(level, msg);
+}
+
+vktrace_SettingGroup* vktraceviewer_vk_QController::GetSettings()
+{
+    return &g_vkTraceViewerSettingGroup;
+}
+
+void vktraceviewer_vk_QController::UpdateFromSettings(vktrace_SettingGroup *pGroups, unsigned int numGroups)
+{
+    vktrace_SettingGroup_Apply_Overrides(&g_vkTraceViewerSettingGroup, pGroups, numGroups);
+
+    m_replayWorker.setPrintReplayMessages(g_vkTraceViewerSettings.printReplayInfoMsgs,
+        g_vkTraceViewerSettings.printReplayWarningMsgs,
+        g_vkTraceViewerSettings.printReplayErrorMsgs);
+
+    m_replayWorker.setPauseOnReplayMessages(g_vkTraceViewerSettings.pauseOnReplayInfo,
+        g_vkTraceViewerSettings.pauseOnReplayWarning,
+        g_vkTraceViewerSettings.pauseOnReplayError);
+
+    m_replayWorker.onSettingsUpdated(pGroups, numGroups);
+
+    updateCallTreeBasedOnSettings();
+}
+
+void vktraceviewer_vk_QController::UnloadTraceFile(void)
+{
+    if (m_pView != NULL)
+    {
+        m_pView->set_calltree_model(NULL, NULL);
+        m_pView = NULL;
+    }
+
+    if (m_pTraceFileModel != NULL)
+    {
+        delete m_pTraceFileModel;
+        m_pTraceFileModel = NULL;
+    }
+
+    if (m_pReplayWidget != NULL)
+    {
+        delete m_pReplayWidget;
+        m_pReplayWidget = NULL;
+    }
+
+    if (m_pDrawStateDiagram != NULL)
+    {
+        delete m_pDrawStateDiagram;
+        m_pDrawStateDiagram = NULL;
+    }
+
+    if (m_pCommandBuffersDiagram != NULL)
+    {
+        delete m_pCommandBuffersDiagram;
+        m_pCommandBuffersDiagram = NULL;
+    }
+
+    m_replayWorker.unloadReplayers();
+}
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qcontroller.h b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qcontroller.h
new file mode 100644
index 0000000..c0a30d5
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qcontroller.h
@@ -0,0 +1,87 @@
+/**************************************************************************
+ *
+ * Copyright 2014 Valve Software. All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ *************************************************************************/
+#ifndef VKTRACEVIEWER_VK_QCONTROLLER_H
+#define VKTRACEVIEWER_VK_QCONTROLLER_H
+
+#include "vktrace_trace_packet_identifiers.h"
+#include "vktraceviewer_vk_qgroupframesproxymodel.h"
+#include "vktraceviewer_qgroupthreadsproxymodel.h"
+#include "vktraceviewer_qsvgviewer.h"
+#include "vktraceviewer_QReplayWidget.h"
+#include "vktraceviewer_QReplayWorker.h"
+#include "vktraceviewer_vk_qfile_model.h"
+#include "vktraceviewer_controller.h"
+#include <QLabel>
+#include <QScrollArea>
+
+
+class vktraceviewer_vk_QController : public vktraceviewer_QController
+{
+    Q_OBJECT
+public:
+    vktraceviewer_vk_QController();
+    virtual ~vktraceviewer_vk_QController();
+
+    virtual vktrace_trace_packet_header* InterpretTracePacket(vktrace_trace_packet_header* pHeader);
+    virtual bool LoadTraceFile(vktraceviewer_trace_file_info* pTraceFileInfo, vktraceviewer_view* pView);
+    virtual void UnloadTraceFile(void);
+
+    void setView(vktraceviewer_view* pView)
+    {
+        m_pView = pView;
+        m_replayWorker.setView(pView);
+    }
+
+    virtual vktrace_SettingGroup* GetSettings();
+    virtual void UpdateFromSettings(vktrace_SettingGroup *pGroups, unsigned int numGroups);
+
+    virtual const char* GetPacketIdString(uint16_t packetId);
+
+public slots:
+    void OnOutputMessage(VktraceLogLevel level, const QString& msg);
+
+signals:
+    // Inherited from glvdebug_QController
+    void OutputMessage(VktraceLogLevel level, const QString& message);
+    void OutputMessage(VktraceLogLevel level, uint64_t packetIndex, const QString& message);
+
+protected slots:
+    void onReplayStarted();
+    void onReplayPaused(uint64_t packetIndex);
+    void onReplayContinued();
+    void onReplayStopped(uint64_t packetIndex);
+    void onReplayFinished(uint64_t packetIndex);
+    void onReplayProgressUpdate(uint64_t packetArrayIndex);
+
+private:
+    vktraceviewer_view* m_pView;
+    vktraceviewer_trace_file_info* m_pTraceFileInfo;
+    vktraceviewer_QReplayWorker m_replayWorker;
+    vktraceviewer_qsvgviewer* m_pDrawStateDiagram;
+    vktraceviewer_qsvgviewer* m_pCommandBuffersDiagram;
+    vktraceviewer_QReplayWidget* m_pReplayWidget;
+    vktraceviewer_vk_QFileModel* m_pTraceFileModel;
+    vktraceviewer_vk_QGroupFramesProxyModel m_groupByFramesProxy;
+    vktraceviewer_QGroupThreadsProxyModel m_groupByThreadsProxy;
+
+    void setStateWidgetsEnabled(bool bEnabled);
+    void updateCallTreeBasedOnSettings();
+    void deleteStateDumps() const;
+};
+
+#endif // VKTRACEVIEWER_VK_QCONTROLLER_H
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qfile_model.cpp b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qfile_model.cpp
new file mode 100644
index 0000000..2761416
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qfile_model.cpp
@@ -0,0 +1,95 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+
+#include "vktraceviewer_vk_qfile_model.h"
+extern "C" {
+#include "vktrace_trace_packet_utils.h"
+#include "vktrace_vk_packet_id.h"
+}
+
+vktraceviewer_vk_QFileModel::vktraceviewer_vk_QFileModel(QObject* parent, vktraceviewer_trace_file_info* pTraceFileInfo)
+        : vktraceviewer_QTraceFileModel(parent, pTraceFileInfo)
+{
+}
+
+vktraceviewer_vk_QFileModel::~vktraceviewer_vk_QFileModel()
+{
+}
+
+QString vktraceviewer_vk_QFileModel::get_packet_string(const vktrace_trace_packet_header* pHeader) const
+{
+    if (pHeader->packet_id < VKTRACE_TPI_BEGIN_API_HERE)
+    {
+        return vktraceviewer_QTraceFileModel::get_packet_string(pHeader);
+    }
+    else
+    {
+        QString packetString = vktrace_stringify_vk_packet_id((const enum VKTRACE_TRACE_PACKET_ID_VK) pHeader->packet_id, pHeader);
+        return packetString;
+    }
+}
+
+QString vktraceviewer_vk_QFileModel::get_packet_string_multiline(const vktrace_trace_packet_header* pHeader) const
+{
+    if (pHeader->packet_id < VKTRACE_TPI_BEGIN_API_HERE)
+    {
+        return vktraceviewer_QTraceFileModel::get_packet_string_multiline(pHeader);
+    }
+    else
+    {
+        QString packetString = vktrace_stringify_vk_packet_id((const enum VKTRACE_TRACE_PACKET_ID_VK) pHeader->packet_id, pHeader);
+        return packetString;
+    }
+}
+
+bool vktraceviewer_vk_QFileModel::isDrawCall(const VKTRACE_TRACE_PACKET_ID packetId) const
+{
+    // TODO : Update this based on latest API updates
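+    // Note that copy/clear/fill/resolve commands are also classified as draw
+    // calls here, presumably so transfer work is highlighted in the viewer.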
+    bool isDraw = false;
+    switch((VKTRACE_TRACE_PACKET_ID_VK)packetId)
+    {
+        case VKTRACE_TPI_VK_vkCmdDraw:
+        case VKTRACE_TPI_VK_vkCmdDrawIndexed:
+        case VKTRACE_TPI_VK_vkCmdDrawIndirect:
+        case VKTRACE_TPI_VK_vkCmdDrawIndexedIndirect:
+        case VKTRACE_TPI_VK_vkCmdDispatch:
+        case VKTRACE_TPI_VK_vkCmdDispatchIndirect:
+        case VKTRACE_TPI_VK_vkCmdCopyBuffer:
+        case VKTRACE_TPI_VK_vkCmdCopyImage:
+        case VKTRACE_TPI_VK_vkCmdCopyBufferToImage:
+        case VKTRACE_TPI_VK_vkCmdCopyImageToBuffer:
+        case VKTRACE_TPI_VK_vkCmdUpdateBuffer:
+        case VKTRACE_TPI_VK_vkCmdFillBuffer:
+        case VKTRACE_TPI_VK_vkCmdClearColorImage:
+        case VKTRACE_TPI_VK_vkCmdClearDepthStencilImage:
+        case VKTRACE_TPI_VK_vkCmdClearAttachments:
+        case VKTRACE_TPI_VK_vkCmdResolveImage:
+        {
+            isDraw = true;
+            break;
+        }
+        default:
+        {
+            isDraw = false;
+        }
+    }
+    return isDraw;
+}
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qfile_model.h b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qfile_model.h
new file mode 100644
index 0000000..47b4992
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qfile_model.h
@@ -0,0 +1,40 @@
+/**************************************************************************
+ *
+ * Copyright 2014 Lunarg, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ **************************************************************************/
+#ifndef VKTRACEVIEWER_VK_QFILE_MODEL_H_
+#define VKTRACEVIEWER_VK_QFILE_MODEL_H_
+
+#include "vktrace_trace_packet_identifiers.h"
+#include "vktraceviewer_QTraceFileModel.h"
+#include <QObject>
+
+class vktraceviewer_vk_QFileModel : public vktraceviewer_QTraceFileModel
+{
+    Q_OBJECT
+public:
+    vktraceviewer_vk_QFileModel(QObject * parent, vktraceviewer_trace_file_info *);
+    virtual ~vktraceviewer_vk_QFileModel();
+
+    virtual QString get_packet_string(const vktrace_trace_packet_header* pHeader) const;
+    virtual QString get_packet_string_multiline(const vktrace_trace_packet_header* pHeader) const;
+
+    virtual bool isDrawCall(const VKTRACE_TRACE_PACKET_ID packetId) const;
+
+};
+
+#endif //VKTRACEVIEWER_VK_QFILE_MODEL_H_
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qgroupframesproxymodel.cpp b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qgroupframesproxymodel.cpp
new file mode 100644
index 0000000..cb72e36
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qgroupframesproxymodel.cpp
@@ -0,0 +1,57 @@
+/**************************************************************************
+ *
+ * Copyright 2016 Valve Corporation
+ * Copyright (C) 2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ **************************************************************************/
+#include "vktraceviewer_vk_qgroupframesproxymodel.h"
+
+extern "C" {
+#include "vktrace_vk_packet_id.h"
+}
+
+void vktraceviewer_vk_QGroupFramesProxyModel::buildGroups()
+{
+    m_mapSourceRowToProxyGroupRow.clear();
+    m_frameList.clear();
+    m_curFrameCount = 0;
+
+    if (sourceModel() != NULL)
+    {
+        FrameInfo* pCurFrame = addNewFrame();
+        m_mapSourceRowToProxyGroupRow.reserve(sourceModel()->rowCount());
+        for (int srcRow = 0; srcRow < sourceModel()->rowCount(); srcRow++)
+        {
+            int proxyRow = pCurFrame->mapChildRowToSourceRow.count();
+
+            // Map the source row to its corresponding row in the proxy group.
+            m_mapSourceRowToProxyGroupRow.append(proxyRow);
+
+            // Add this source row to the current proxy group.
+            pCurFrame->mapChildRowToSourceRow.append(srcRow);
+
+            // If this packet is a frame boundary (vkQueuePresentKHR), start a
+            // new frame so subsequent calls are grouped under it.
+            QModelIndex tmpIndex = sourceModel()->index(srcRow, 0);
+            assert(tmpIndex.isValid());
+            vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)tmpIndex.internalPointer();
+            if (pHeader != NULL && pHeader->tracer_id == VKTRACE_TID_VULKAN && pHeader->packet_id == VKTRACE_TPI_VK_vkQueuePresentKHR)
+            {
+                pCurFrame = addNewFrame();
+            }
+        } // end for each source row
+    }
+}
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qgroupframesproxymodel.h b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qgroupframesproxymodel.h
new file mode 100644
index 0000000..c8c7ce9
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_qgroupframesproxymodel.h
@@ -0,0 +1,327 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#ifndef VKTRACEVIEWER_VK_QGROUPFRAMESPROXYMODEL_H
+#define VKTRACEVIEWER_VK_QGROUPFRAMESPROXYMODEL_H
+
+
+#include "vktraceviewer_QTraceFileModel.h"
+#include <QAbstractProxyModel>
+#include <QStandardItem>
+
+#include <QDebug>
+
+struct FrameInfo
+{
+    int frameIndex;
+    QPersistentModelIndex modelIndex;
+    QList<int> mapChildRowToSourceRow;
+};
+
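+// Proxy model that groups the flat list of API-call rows under per-frame
+// parent nodes; buildGroups() starts a new frame at each vkQueuePresentKHR
+// packet.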
+class vktraceviewer_vk_QGroupFramesProxyModel : public QAbstractProxyModel
+{
+    Q_OBJECT
+public:
+    vktraceviewer_vk_QGroupFramesProxyModel(QObject *parent = 0)
+        : QAbstractProxyModel(parent),
+          m_curFrameCount(0)
+    {
+        buildGroups();
+    }
+
+    virtual ~vktraceviewer_vk_QGroupFramesProxyModel()
+    {
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual void setSourceModel(QAbstractItemModel *sourceModel)
+    {
+        if (sourceModel != NULL && !sourceModel->inherits("vktraceviewer_QTraceFileModel"))
+        {
+            assert(!"Setting QGroupFramesProxyModel to have a sourceModel that doesn't inherit from QTraceFileModel.");
+            sourceModel = NULL;
+        }
+
+        QAbstractProxyModel::setSourceModel(sourceModel);
+        buildGroups();
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual int rowCount(const QModelIndex &parent) const
+    {
+        if (!parent.isValid())
+        {
+            return m_frameList.count();
+        }
+        else if (isFrame(parent))
+        {
+            // this is a frame.
+            // A frame knows how many children it has!
+            return m_frameList[parent.row()].mapChildRowToSourceRow.count();
+        }
+        else
+        {
+            // ask the source
+            return sourceModel()->rowCount(mapToSource(parent));
+        }
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual bool hasChildren(const QModelIndex &parent) const
+    {
+        if (!parent.isValid())
+        {
+            return true;
+        }
+        else if (isFrame(parent))
+        {
+            return m_frameList[parent.row()].mapChildRowToSourceRow.count() > 0;
+        }
+        return false;
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual QVariant data( const QModelIndex &index, int role ) const
+    {
+        if (!index.isValid())
+        {
+            return QVariant();
+        }
+
+        if (!isFrame(index))
+        {
+            return mapToSource(index).data(role);
+        }
+
+        if (role == Qt::DisplayRole)
+        {
+            if (index.column() == 0)
+            {
+                return QVariant(QString("Frame %1").arg(m_frameList[index.row()].frameIndex));
+            }
+            else
+            {
+                return QVariant(QString(""));
+            }
+        }
+
+        return QVariant();
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual Qt::ItemFlags flags(const QModelIndex &index) const
+    {
+        return Qt::ItemIsEnabled | Qt::ItemIsSelectable;
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual int columnCount(const QModelIndex &parent) const
+    {
+        return sourceModel()->columnCount();
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual QVariant headerData(int section, Qt::Orientation orientation, int role) const
+    {
+        return sourceModel()->headerData(section, orientation, role);
+    }
+
+    //---------------------------------------------------------------------------------------------
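+    // Index scheme: frame rows carry a pointer to their FrameInfo as the
+    // internalPointer, while API-call rows carry the parent frame's index as
+    // the internalId; isFrame() relies on this distinction.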
+    QModelIndex index(int row, int column, const QModelIndex &parent = QModelIndex()) const
+    {
+        if (!parent.isValid())
+        {
+            // if parent is not valid, then this row and column is referencing Frame data
+            if (row < m_frameList.count())
+            {
+                return createIndex(row, column, (FrameInfo*)&m_frameList[row]);
+            }
+
+            return QModelIndex();
+        }
+        else if (isFrame(parent))
+        {
+            // the parent is a frame, so this row and column reference a source cell
+            const FrameInfo* pFrame = (const FrameInfo*)&m_frameList[parent.row()];
+            assert(pFrame->frameIndex == parent.row());
+            return createIndex(row, column, pFrame->frameIndex);
+        }
+
+        return QModelIndex();
+    }
+
+    //---------------------------------------------------------------------------------------------
+    QModelIndex parent(const QModelIndex &child) const
+    {
+        if (!child.isValid())
+            return QModelIndex();
+
+        if (isFrame(child))
+        {
+            // frames don't have a parent (ie, they are at the root level)
+            return QModelIndex();
+        }
+        else
+        {
+            // The child is a proxy of the source model,
+            // so the parent is its frame's modelIndex.
+            quintptr frameIndex = child.internalId();
+            const FrameInfo* pFrame = (const FrameInfo*)&m_frameList[frameIndex];
+            return pFrame->modelIndex;
+        }
+    }
+
+    //---------------------------------------------------------------------------------------------
+    // sibling(..) needed to be implemented here because the inherited implementation looks at the
+    // sourceModel to get the new index, which results in bugs because it returns a source model
+    // sibling of a proxy model index. The new implementation keeps everything in proxy model space.
+    QModelIndex sibling(int row, int column, const QModelIndex &idx) const
+    {
+        return index(row, column, parent(idx));
+    }
+
+    //---------------------------------------------------------------------------------------------
+    QModelIndex mapToSource(const QModelIndex &proxyIndex) const
+    {
+        if (!proxyIndex.isValid())
+            return QModelIndex();
+
+        if (isFrame(proxyIndex))
+        {
+            // frames can't get mapped to the source
+            return QModelIndex();
+        }
+        else
+        {
+            quintptr frameIndex = proxyIndex.internalId();
+            const FrameInfo* pFrame = (const FrameInfo*)&m_frameList[frameIndex];
+            assert(pFrame->frameIndex == (int)frameIndex);
+            if (proxyIndex.row() < pFrame->mapChildRowToSourceRow.count())
+            {
+                int srcRow = pFrame->mapChildRowToSourceRow[proxyIndex.row()];
+                int srcCol = proxyIndex.column();
+
+                // By using a default srcParent we only get top-level indices,
+                // i.e., hierarchical source models are not supported.
+                return sourceModel()->index(srcRow, srcCol, QModelIndex());
+            }
+            else
+            {
+                // This unexpected case has been seen when scrolling quickly.
+                // UPDATE: it appears to be fixed; the cause was calling .next() too many times on an iterator.
+                return QModelIndex();
+            }
+        }
+    }
+
+    //---------------------------------------------------------------------------------------------
+    QModelIndex mapFromSource(const QModelIndex &sourceIndex) const
+    {
+        if (!sourceIndex.isValid())
+            return QModelIndex();
+
+        int srcRow = sourceIndex.row();
+        int proxyRow = m_mapSourceRowToProxyGroupRow[srcRow];
+
+        // figure out which frame has the srcRow as a child
+        const FrameInfo* pProxyGroup = NULL;
+        QListIterator<FrameInfo> frameIter(m_frameList);
+        while (frameIter.hasNext())
+        {
+            const FrameInfo* pFrame = &frameIter.next();
+            if (pFrame->mapChildRowToSourceRow.contains(srcRow))
+            {
+                pProxyGroup = pFrame;
+                break;
+            }
+        }
+
+        // if no frame claims this source row, bail out rather than dereference NULL
+        if (pProxyGroup == NULL)
+        {
+            return QModelIndex();
+        }
+
+        return createIndex(proxyRow, sourceIndex.column(), pProxyGroup->frameIndex);
+    }
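+    // Example: if source row 10 is grouped under frame 2, the index returned above
+    // has row m_mapSourceRowToProxyGroupRow[10] and internalId 2.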
+
+    //---------------------------------------------------------------------------------------------
+    virtual QModelIndexList match(const QModelIndex &start, int role, const QVariant &value, int hits, Qt::MatchFlags flags) const
+    {
+        QModelIndexList results = sourceModel()->match(start, role, value, hits, flags);
+
+        for (int i = 0; i < results.count(); i++)
+        {
+            results[i] = mapFromSource(results[i]);
+        }
+
+        return results;
+    }
+
+    //---------------------------------------------------------------------------------------------
+private:
+    QList<FrameInfo> m_frameList;
+    QList<int> m_mapSourceRowToProxyGroupRow;
+    int m_curFrameCount;
+
+    //---------------------------------------------------------------------------------------------
+    bool isFrame(const QModelIndex &proxyIndex) const
+    {
+        // API Calls use the frame number as the index's internalId
+        int id = (int)proxyIndex.internalId();
+        if (id >= 0 && id < m_frameList.count())
+        {
+            // this is an api call
+            return false;
+        }
+
+        // do some validation on the modelIndex
+        FrameInfo* pFI = (FrameInfo*)proxyIndex.internalPointer();
+        if (pFI != NULL &&
+            pFI->frameIndex == proxyIndex.row() &&
+            proxyIndex.row() < m_frameList.count())
+        {
+            return true;
+        }
+
+        return false;
+    }
+
+    //---------------------------------------------------------------------------------------------
+    FrameInfo* addNewFrame()
+    {
+        // create frame info
+        FrameInfo info;
+        m_frameList.append(info);
+        FrameInfo* pFrame = &m_frameList[m_curFrameCount];
+
+        pFrame->frameIndex = m_curFrameCount;
+
+        // create proxy model index for frame node
+        pFrame->modelIndex = createIndex(m_curFrameCount, 0, pFrame);
+
+        // increment frame count
+        m_curFrameCount++;
+
+        return pFrame;
+    }
+
+    //---------------------------------------------------------------------------------------------
+    void buildGroups();
+
+};
+
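+// Typical wiring (sketch; the proxy class name is assumed from this header's guard,
+// while setSourceModel() and setModel() are standard Qt model/view APIs):
+//   vktraceviewer_QGroupFramesProxyModel proxy;
+//   proxy.setSourceModel(pSourceApiCallModel);
+//   pTreeView->setModel(&proxy);
+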
+#endif // VKTRACEVIEWER_VK_QGROUPFRAMESPROXYMODEL_H
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_settings.cpp b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_settings.cpp
new file mode 100644
index 0000000..48ee550
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_settings.cpp
@@ -0,0 +1,63 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+
+#include "vktraceviewer_vk_settings.h"
+
+// declared as extern in header
+vktraceviewer_vk_settings g_vkTraceViewerSettings;
+static vktraceviewer_vk_settings s_defaultVkSettings;
+
+vktrace_SettingInfo g_vk_settings[] =
+{
+    { "ri", "PrintReplayInfoMsgs", VKTRACE_SETTING_BOOL, &g_vkTraceViewerSettings.printReplayInfoMsgs, &s_defaultVkSettings.printReplayInfoMsgs, TRUE, "Print info messages reported when replaying trace file."},
+    { "rw", "PrintReplayWarningMsgs", VKTRACE_SETTING_BOOL, &g_vkTraceViewerSettings.printReplayWarningMsgs, &s_defaultVkSettings.printReplayWarningMsgs, TRUE, "Print warning messages reported when replaying trace file."},
+    { "re", "PrintReplayErrorMsgs", VKTRACE_SETTING_BOOL, &g_vkTraceViewerSettings.printReplayErrorMsgs, &s_defaultVkSettings.printReplayErrorMsgs, TRUE, "Print error messages reported when replaying trace file."},
+    { "pi", "PauseOnReplayInfo", VKTRACE_SETTING_BOOL, &g_vkTraceViewerSettings.pauseOnReplayInfo, &s_defaultVkSettings.pauseOnReplayInfo, TRUE, "Pause replay if an info message is reported."},
+    { "pw", "PauseOnReplayWarning", VKTRACE_SETTING_BOOL, &g_vkTraceViewerSettings.pauseOnReplayWarning, &s_defaultVkSettings.pauseOnReplayWarning, TRUE, "Pause replay if a warning message is reported."},
+    { "pe", "PauseOnReplayError", VKTRACE_SETTING_BOOL, &g_vkTraceViewerSettings.pauseOnReplayError, &s_defaultVkSettings.pauseOnReplayError, TRUE, "Pause replay if an error message is reported."},
+    { "gf", "GroupByFrame", VKTRACE_SETTING_BOOL, &g_vkTraceViewerSettings.groupByFrame, &s_defaultVkSettings.groupByFrame, TRUE, "Group API calls by frame."},
+    { "gt", "GroupByThread", VKTRACE_SETTING_BOOL, &g_vkTraceViewerSettings.groupByThread, &s_defaultVkSettings.groupByThread, TRUE, "Group API calls by the CPU thread Id on which they executed."},
+    { "rw", "ReplayWindowWidth", VKTRACE_SETTING_INT, &g_vkTraceViewerSettings.replay_window_width, &s_defaultVkSettings.replay_window_width, TRUE, "Width of replay window on startup."},
+    { "rh", "ReplayWindowHeight", VKTRACE_SETTING_INT, &g_vkTraceViewerSettings.replay_window_height, &s_defaultVkSettings.replay_window_height, TRUE, "Height of replay window on startup."},
+    { "sr", "SeparateReplayWindow", VKTRACE_SETTING_BOOL, &g_vkTraceViewerSettings.separate_replay_window, &s_defaultVkSettings.separate_replay_window, TRUE, "Use a separate replay window."},
+};
+
+vktrace_SettingGroup g_vkTraceViewerSettingGroup =
+{
+    "vktraceviewer_vk",
+    sizeof(g_vk_settings) / sizeof(g_vk_settings[0]),
+    &g_vk_settings[0]
+};
+
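+// Example invocation using the short names above (hypothetical; the exact option
+// syntax depends on the parser in vktrace_settings.h):
+//   vktraceviewer -gf 1 -ww 1280 -rh 720
+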
+void initialize_default_settings()
+{
+    s_defaultVkSettings.printReplayInfoMsgs = FALSE;
+    s_defaultVkSettings.printReplayWarningMsgs = TRUE;
+    s_defaultVkSettings.printReplayErrorMsgs = TRUE;
+    s_defaultVkSettings.pauseOnReplayInfo = FALSE;
+    s_defaultVkSettings.pauseOnReplayWarning = FALSE;
+    s_defaultVkSettings.pauseOnReplayError = TRUE;
+    s_defaultVkSettings.groupByFrame = FALSE;
+    s_defaultVkSettings.groupByThread = FALSE;
+    s_defaultVkSettings.replay_window_width = 1024;
+    s_defaultVkSettings.replay_window_height = 768;
+    s_defaultVkSettings.separate_replay_window = FALSE;
+}
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_settings.h b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_settings.h
new file mode 100644
index 0000000..b46681f
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vktraceviewer/vktraceviewer_vk_settings.h
@@ -0,0 +1,49 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#ifndef VKTRACEVIEWER_VK_SETTINGS_H
+#define VKTRACEVIEWER_VK_SETTINGS_H
+
+extern "C" {
+#include "vktrace_settings.h"
+}
+
+typedef struct vktraceviewer_vk_settings
+{
+    BOOL printReplayInfoMsgs;
+    BOOL printReplayWarningMsgs;
+    BOOL printReplayErrorMsgs;
+    BOOL pauseOnReplayInfo;
+    BOOL pauseOnReplayWarning;
+    BOOL pauseOnReplayError;
+    BOOL groupByFrame;
+    BOOL groupByThread;
+    int replay_window_width;
+    int replay_window_height;
+    BOOL separate_replay_window;
+} vktraceviewer_vk_settings;
+
+extern vktraceviewer_vk_settings g_vkTraceViewerSettings;
+extern vktrace_SettingGroup g_vkTraceViewerSettingGroup;
+
+void initialize_default_settings();
+
+#endif // VKTRACEVIEWER_VK_SETTINGS_H
+
diff --git a/vktrace/src/vktrace_extensions/vktracevulkan/vulkan/__init__.py b/vktrace/src/vktrace_extensions/vktracevulkan/vulkan/__init__.py
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/vktrace/src/vktrace_extensions/vktracevulkan/vulkan/__init__.py
diff --git a/vktrace/src/vktrace_layer/CMakeLists.txt b/vktrace/src/vktrace_layer/CMakeLists.txt
new file mode 100644
index 0000000..966b28e
--- /dev/null
+++ b/vktrace/src/vktrace_layer/CMakeLists.txt
@@ -0,0 +1,146 @@
+cmake_minimum_required(VERSION 2.8)
+
+if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
+   if (BUILD_WSI_XCB_SUPPORT)
+       set(ENV{VULKAN_WSI} "Xcb")
+   elseif (BUILD_WSI_XLIB_SUPPORT)
+       set(ENV{VULKAN_WSI} "Xlib")
+   elseif (BUILD_WSI_WAYLAND_SUPPORT)
+       set(ENV{VULKAN_WSI} "Wayland")
+   else()
+       # Mir WSI Case
+       set(ENV{VULKAN_WSI} "Mir")
+   endif()
+endif()
+
+project(VkLayer_vktrace_layer)
+
+include("${SRC_DIR}/build_options.cmake")
+
+file(MAKE_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/codegen)
+execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DIR}/vktrace_generate.py AllPlatforms vktrace-trace-h      vk_version_1_0 OUTPUT_FILE ${CMAKE_CURRENT_SOURCE_DIR}/codegen/vktrace_vk_vk.h)
+execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DIR}/vktrace_generate.py AllPlatforms vktrace-trace-c      vk_version_1_0 OUTPUT_FILE ${CMAKE_CURRENT_SOURCE_DIR}/codegen/vktrace_vk_vk.cpp)
+#execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DIR}/vktrace_generate.py AllPlatforms vktrace-ext-trace-h vk_lunarg_debug_marker OUTPUT_FILE ${CMAKE_CURRENT_SOURCE_DIR}/codegen/vktrace_vk_vk_lunarg_debug_marker.h)
+#execute_process(COMMAND ${PYTHON_EXECUTABLE} ${VKTRACE_VULKAN_DIR}/vktrace_generate.py AllPlatforms vktrace-ext-trace-c vk_lunarg_debug_marker OUTPUT_FILE ${CMAKE_CURRENT_SOURCE_DIR}/codegen/vktrace_vk_vk_lunarg_debug_marker.cpp)
+
+if (WIN32)
+    # Put VkLayer_vktrace_layer.dll in the same directory as vktrace.exe
+    # so that vktrace.exe can find VkLayer_vktrace_layer.dll.
+    set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/../../../layersvt/)
+endif()
+
+set (CODEGEN_UTILS_DIR  ${CMAKE_CURRENT_SOURCE_DIR}/../vktrace_extensions/vktracevulkan/vulkan/codegen_utils)
+set(CODEGEN_VKTRACE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../vktrace_extensions/vktracevulkan/codegen_vktrace_utils")
+
+set(SRC_LIST
+    ${SRC_LIST}
+    vktrace_lib.c
+    vktrace_lib_pagestatusarray.cpp
+    vktrace_lib_pageguardmappedmemory.cpp
+    vktrace_lib_pageguardcapture.cpp
+    vktrace_lib_pageguard.cpp
+    vktrace_lib_trace.cpp
+    vktrace_vk_exts.cpp
+    codegen/vktrace_vk_vk.cpp
+    ${CODEGEN_UTILS_DIR}/vk_struct_size_helper.c
+)
+#    codegen/vktrace_vk_vk_lunarg_debug_marker.cpp
+#    ${CODEGEN_UTILS_DIR}/vk_lunarg_debug_marker_struct_size_helper.c
+
+set_source_files_properties( ${SRC_LIST} PROPERTIES LANGUAGE CXX)
+
+set (HDR_LIST
+    vktrace_lib_helpers.h
+    vktrace_lib_pagestatusarray.h
+    vktrace_lib_pageguardmappedmemory.h
+    vktrace_lib_pageguardcapture.h
+    vktrace_lib_pageguard.h
+    vktrace_vk_exts.h
+    vk_dispatch_table_helper.h
+    codegen/vktrace_vk_vk.h
+    ${CODEGEN_VKTRACE_DIR}/vktrace_vk_packet_id.h
+    ${CODEGEN_VKTRACE_DIR}/vktrace_vk_vk_packets.h
+    ${CODEGEN_UTILS_DIR}/vk_struct_size_helper.h
+)
+
+# def file - needed for 32-bit Windows builds so that vk function names aren't decorated/mangled
+if (WIN32)
+    if (NOT(CMAKE_GENERATOR MATCHES "Win64"))
+        set (HDR_LIST ${HDR_LIST} VkLayer_vktrace_layer.def)
+    endif()
+endif()
+
+
+#    codegen/vktrace_vk_vk_lunarg_debug_marker.h
+#    ${CODEGEN_VKTRACE_DIR}/vktrace_vk_vk_lunarg_debug_marker_packets.h
+#    ${CODEGEN_UTILS_DIR}/vk_lunarg_debug_marker_struct_size_helper.h
+
+add_custom_command(OUTPUT vk_dispatch_table_helper.h
+    COMMAND ${PYTHON_CMD} ${CMAKE_SOURCE_DIR}/vk-generate.py ${DisplayServer} dispatch-table-ops layer > vk_dispatch_table_helper.h
+    DEPENDS ${CMAKE_SOURCE_DIR}/vk-generate.py ${CMAKE_SOURCE_DIR}/vulkan.py)
+add_custom_target(generate_vktrace_layer_helpers DEPENDS
+    vk_dispatch_table_helper.h)
+
+include_directories(
+    codegen
+    ${SRC_DIR}/vktrace_common
+    ${SRC_DIR}/vktrace_trace
+    ${CMAKE_CURRENT_SOURCE_DIR}
+    ${CODEGEN_VKTRACE_DIR}
+    ${VKTRACE_VULKAN_INCLUDE_DIR}
+    ${CODEGEN_UTILS_DIR}
+    ${CMAKE_CURRENT_BINARY_DIR}/../../../layersvt
+)
+# copy/link layer json file into build/layersvt directory
+if (NOT WIN32)
+    # extra setup for out-of-tree builds
+    if (NOT (CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_CURRENT_BINARY_DIR))
+        add_custom_target(vktrace_layer-json ALL
+            COMMAND ln -sf ${CMAKE_CURRENT_SOURCE_DIR}/linux/VkLayer_vktrace_layer.json ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}../layersvt/
+            VERBATIM
+            )
+    endif()
+else()
+    if (NOT (CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_CURRENT_BINARY_DIR))
+        FILE(TO_NATIVE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/windows/VkLayer_vktrace_layer.json src_json)
+        if (CMAKE_GENERATOR MATCHES "^Visual Studio.*")
+            FILE(TO_NATIVE_PATH ${PROJECT_BINARY_DIR}/../../../layersvt/$<CONFIGURATION>/VkLayer_vktrace_layer.json dst_json)
+        else()
+            FILE(TO_NATIVE_PATH ${PROJECT_BINARY_DIR}/../../../layersvt/VkLayer_vktrace_layer.json dst_json)
+        endif()
+
+        add_custom_target(vktrace_layer-json ALL
+            COMMAND copy ${src_json} ${dst_json}
+            VERBATIM
+            )
+        add_dependencies(vktrace_layer-json VkLayer_vktrace_layer)
+    endif()
+endif()
+
+add_library(${PROJECT_NAME} SHARED ${SRC_LIST} ${HDR_LIST})
+
+add_dependencies(${PROJECT_NAME} generate_vktrace_layer_helpers)
+
+if (${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
+    set(OS_TRACER_LIBS
+        -shared
+        -ldl
+    )
+endif()
+
+if (${CMAKE_SYSTEM_NAME} MATCHES "Windows")
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
+    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS}")
+    set(OS_TRACER_LIBS)
+endif()
+
+target_link_libraries(${PROJECT_NAME}
+    vktrace_common
+    ${VKTRACE_VULKAN_LIB}
+    ${OS_TRACER_LIBS}
+)
+
+build_options_finalize()
+
+set_target_properties(VkLayer_vktrace_layer PROPERTIES LINKER_LANGUAGE C)
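+
+# For reference, the execute_process() codegen steps above are equivalent to running
+# (paths shown informally; see the actual commands near the top of this file):
+#   python vktrace_generate.py AllPlatforms vktrace-trace-h vk_version_1_0 > codegen/vktrace_vk_vk.h
+#   python vktrace_generate.py AllPlatforms vktrace-trace-c vk_version_1_0 > codegen/vktrace_vk_vk.cpp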
diff --git a/vktrace/src/vktrace_layer/VkLayer_vktrace_layer.def b/vktrace/src/vktrace_layer/VkLayer_vktrace_layer.def
new file mode 100644
index 0000000..0c2d702
--- /dev/null
+++ b/vktrace/src/vktrace_layer/VkLayer_vktrace_layer.def
@@ -0,0 +1,181 @@
+;;;; Begin Copyright Notice ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+; Vulkan
+;
+; Copyright (c) 2015-2016 The Khronos Group Inc.
+; Copyright (c) 2015-2016 Valve Corporation
+; Copyright (c) 2015-2016 LunarG, Inc.
+;
+; Licensed under the Apache License, Version 2.0 (the "License");
+; you may not use this file except in compliance with the License.
+; You may obtain a copy of the License at
+;
+;     http://www.apache.org/licenses/LICENSE-2.0
+;
+; Unless required by applicable law or agreed to in writing, software
+; distributed under the License is distributed on an "AS IS" BASIS,
+; WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+; See the License for the specific language governing permissions and
+; limitations under the License.
+;
+;  Author: David Pinedo <david@lunarg.com>
+;;;;  End Copyright Notice ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+
+LIBRARY VkLayer_vktrace_layer
+EXPORTS
+__HOOKED_vkAcquireNextImageKHR
+__HOOKED_vkAllocateCommandBuffers
+__HOOKED_vkAllocateDescriptorSets
+__HOOKED_vkAllocateMemory
+__HOOKED_vkBeginCommandBuffer
+__HOOKED_vkBindBufferMemory
+__HOOKED_vkBindImageMemory
+__HOOKED_vkCmdBeginQuery
+__HOOKED_vkCmdBeginRenderPass
+__HOOKED_vkCmdBindDescriptorSets
+__HOOKED_vkCmdBindIndexBuffer
+__HOOKED_vkCmdBindPipeline
+__HOOKED_vkCmdBindVertexBuffers
+__HOOKED_vkCmdBlitImage
+__HOOKED_vkCmdClearAttachments
+__HOOKED_vkCmdClearColorImage
+__HOOKED_vkCmdClearDepthStencilImage
+__HOOKED_vkCmdCopyBuffer
+__HOOKED_vkCmdCopyBufferToImage
+__HOOKED_vkCmdCopyImage
+__HOOKED_vkCmdCopyImageToBuffer
+__HOOKED_vkCmdCopyQueryPoolResults
+__HOOKED_vkCmdDispatch
+__HOOKED_vkCmdDispatchIndirect
+__HOOKED_vkCmdDraw
+__HOOKED_vkCmdDrawIndexed
+__HOOKED_vkCmdDrawIndexedIndirect
+__HOOKED_vkCmdDrawIndirect
+__HOOKED_vkCmdEndQuery
+__HOOKED_vkCmdEndRenderPass
+__HOOKED_vkCmdExecuteCommands
+__HOOKED_vkCmdFillBuffer
+__HOOKED_vkCmdNextSubpass
+__HOOKED_vkCmdPipelineBarrier
+__HOOKED_vkCmdPushConstants
+__HOOKED_vkCmdResetEvent
+__HOOKED_vkCmdResetQueryPool
+__HOOKED_vkCmdResolveImage
+__HOOKED_vkCmdSetBlendConstants
+__HOOKED_vkCmdSetDepthBias
+__HOOKED_vkCmdSetDepthBounds
+__HOOKED_vkCmdSetEvent
+__HOOKED_vkCmdSetLineWidth
+__HOOKED_vkCmdSetScissor
+__HOOKED_vkCmdSetStencilCompareMask
+__HOOKED_vkCmdSetStencilReference
+__HOOKED_vkCmdSetStencilWriteMask
+__HOOKED_vkCmdSetViewport
+__HOOKED_vkCmdUpdateBuffer
+__HOOKED_vkCmdWaitEvents
+__HOOKED_vkCmdWriteTimestamp
+__HOOKED_vkCreateBuffer
+__HOOKED_vkCreateBufferView
+__HOOKED_vkCreateCommandPool
+__HOOKED_vkCreateComputePipelines
+__HOOKED_vkCreateDebugReportCallbackEXT
+__HOOKED_vkCreateDescriptorPool
+__HOOKED_vkCreateDescriptorSetLayout
+__HOOKED_vkCreateDevice
+__HOOKED_vkCreateEvent
+__HOOKED_vkCreateFence
+__HOOKED_vkCreateFramebuffer
+__HOOKED_vkCreateGraphicsPipelines
+__HOOKED_vkCreateImage
+__HOOKED_vkCreateImageView
+__HOOKED_vkCreateInstance
+__HOOKED_vkCreatePipelineCache
+__HOOKED_vkCreatePipelineLayout
+__HOOKED_vkCreateQueryPool
+__HOOKED_vkCreateRenderPass
+__HOOKED_vkCreateSampler
+__HOOKED_vkCreateSemaphore
+__HOOKED_vkCreateShaderModule
+__HOOKED_vkCreateSwapchainKHR
+__HOOKED_vkCreateWin32SurfaceKHR
+__HOOKED_vkDebugReportMessageEXT
+__HOOKED_vkDestroyBuffer
+__HOOKED_vkDestroyBufferView
+__HOOKED_vkDestroyCommandPool
+__HOOKED_vkDestroyDebugReportCallbackEXT
+__HOOKED_vkDestroyDescriptorPool
+__HOOKED_vkDestroyDescriptorSetLayout
+__HOOKED_vkDestroyDevice
+__HOOKED_vkDestroyEvent
+__HOOKED_vkDestroyFence
+__HOOKED_vkDestroyFramebuffer
+__HOOKED_vkDestroyImage
+__HOOKED_vkDestroyImageView
+__HOOKED_vkDestroyInstance
+__HOOKED_vkDestroyPipeline
+__HOOKED_vkDestroyPipelineCache
+__HOOKED_vkDestroyPipelineLayout
+__HOOKED_vkDestroyQueryPool
+__HOOKED_vkDestroyRenderPass
+__HOOKED_vkDestroySampler
+__HOOKED_vkDestroySemaphore
+__HOOKED_vkDestroyShaderModule
+__HOOKED_vkDestroySurfaceKHR
+__HOOKED_vkDestroySwapchainKHR
+__HOOKED_vkDeviceWaitIdle
+__HOOKED_vkEndCommandBuffer
+vkEnumerateDeviceExtensionProperties
+__HOOKED_vkEnumerateDeviceExtensionProperties
+vkEnumerateDeviceLayerProperties
+__HOOKED_vkEnumerateDeviceLayerProperties
+vkEnumerateInstanceLayerProperties
+vkEnumerateInstanceExtensionProperties
+__HOOKED_vkEnumeratePhysicalDevices
+__HOOKED_vkFlushMappedMemoryRanges
+__HOOKED_vkFreeCommandBuffers
+__HOOKED_vkFreeDescriptorSets
+__HOOKED_vkFreeMemory
+__HOOKED_vkGetBufferMemoryRequirements
+__HOOKED_vkGetDeviceMemoryCommitment
+VK_LAYER_LUNARG_vktraceGetDeviceProcAddr
+__HOOKED_vkGetDeviceProcAddr
+__HOOKED_vkGetDeviceQueue
+__HOOKED_vkGetEventStatus
+__HOOKED_vkGetFenceStatus
+__HOOKED_vkGetImageMemoryRequirements
+__HOOKED_vkGetImageSparseMemoryRequirements
+__HOOKED_vkGetImageSubresourceLayout
+VK_LAYER_LUNARG_vktraceGetInstanceProcAddr
+__HOOKED_vkGetInstanceProcAddr
+__HOOKED_vkGetPhysicalDeviceFeatures
+__HOOKED_vkGetPhysicalDeviceFormatProperties
+__HOOKED_vkGetPhysicalDeviceImageFormatProperties
+__HOOKED_vkGetPhysicalDeviceMemoryProperties
+__HOOKED_vkGetPhysicalDeviceProperties
+__HOOKED_vkGetPhysicalDeviceQueueFamilyProperties
+__HOOKED_vkGetPhysicalDeviceSparseImageFormatProperties
+__HOOKED_vkGetPhysicalDeviceSurfaceCapabilitiesKHR
+__HOOKED_vkGetPhysicalDeviceSurfaceFormatsKHR
+__HOOKED_vkGetPhysicalDeviceSurfacePresentModesKHR
+__HOOKED_vkGetPhysicalDeviceSurfaceSupportKHR
+__HOOKED_vkGetPhysicalDeviceWin32PresentationSupportKHR
+__HOOKED_vkGetPipelineCacheData
+__HOOKED_vkGetQueryPoolResults
+__HOOKED_vkGetRenderAreaGranularity
+__HOOKED_vkGetSwapchainImagesKHR
+__HOOKED_vkInvalidateMappedMemoryRanges
+__HOOKED_vkMapMemory
+__HOOKED_vkMergePipelineCaches
+__HOOKED_vkQueueBindSparse
+__HOOKED_vkQueuePresentKHR
+__HOOKED_vkQueueSubmit
+__HOOKED_vkQueueWaitIdle
+__HOOKED_vkResetCommandBuffer
+__HOOKED_vkResetCommandPool
+__HOOKED_vkResetDescriptorPool
+__HOOKED_vkResetEvent
+__HOOKED_vkResetFences
+__HOOKED_vkSetEvent
+__HOOKED_vkUnmapMemory
+__HOOKED_vkUpdateDescriptorSets
+__HOOKED_vkWaitForFences
diff --git a/vktrace/src/vktrace_layer/linux/VkLayer_vktrace_layer.json b/vktrace/src/vktrace_layer/linux/VkLayer_vktrace_layer.json
new file mode 100644
index 0000000..daf651d
--- /dev/null
+++ b/vktrace/src/vktrace_layer/linux/VkLayer_vktrace_layer.json
@@ -0,0 +1,15 @@
+{
+    "file_format_version" : "1.0.0",
+    "layer" : {
+        "name": "VK_LAYER_LUNARG_vktrace",
+        "type": "GLOBAL",
+        "library_path": "../vktrace/libVkLayer_vktrace_layer.so",
+        "api_version": "1.0.33",
+        "implementation_version": "1",
+        "description": "Vktrace tracing library",
+        "functions" : {
+          "vkGetInstanceProcAddr" : "VK_LAYER_LUNARG_vktraceGetInstanceProcAddr",
+          "vkGetDeviceProcAddr" : "VK_LAYER_LUNARG_vktraceGetDeviceProcAddr"
+        }
+    }
+}
diff --git a/vktrace/src/vktrace_layer/vktrace_lib.c b/vktrace/src/vktrace_layer/vktrace_lib.c
new file mode 100644
index 0000000..3921603
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_lib.c
@@ -0,0 +1,226 @@
+/*
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */
+
+#include "vktrace_common.h"
+#include "vktrace_filelike.h"
+#include "vktrace_interconnect.h"
+#include "vktrace_vk_vk.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Environment variable helpers
+// These are needed because on Windows getenv reads the CRT's cached copy of the
+// environment, which can be stale or missing inside a DLL; use the Win32 API instead.
+
+#if defined(_WIN32)
+static inline char *vktrace_layer_getenv(const char *name)
+{
+    char *retVal;
+    DWORD valSize;
+    valSize = GetEnvironmentVariableA(name, NULL, 0);
+    // valSize DOES include the null terminator, so for any set variable it
+    // will always be at least 1. If it's 0, the variable wasn't set.
+    if (valSize == 0)
+        return NULL;
+    retVal = (char *)malloc(valSize);
+    GetEnvironmentVariableA(name, retVal, valSize);
+    return retVal;
+}
+
+static inline void vktrace_layer_free_getenv(const char *val)
+{
+    free((void *)val);
+}
+#else
+static inline char *vktrace_layer_getenv(const char *name)
+{
+    return getenv(name);
+}
+
+static inline void vktrace_layer_free_getenv(const char *val) { }
+#endif
+
+VKTRACER_LEAVE _Unload(void);
+
+#ifdef PLATFORM_LINUX
+static void vktrace_sighandler(int signum, siginfo_t *info, void *ptr)
+{
+    vktrace_LogVerbose("vktrace_lib library handling signal %d.", signum);
+    _Unload();
+    kill(0, signum);
+}
+#endif
+
+VKTRACER_EXIT TrapExit(void)
+{
+    vktrace_LogVerbose("vktrace_lib TrapExit.");
+}
+
+void loggingCallback(VktraceLogLevel level, const char* pMessage)
+{
+    switch(level)
+    {
+    case VKTRACE_LOG_DEBUG: printf("vktrace debug: %s\n", pMessage); break;
+    case VKTRACE_LOG_ERROR: printf("vktrace error: %s\n", pMessage); break;
+    case VKTRACE_LOG_WARNING: printf("vktrace warning: %s\n", pMessage); break;
+    case VKTRACE_LOG_VERBOSE: printf("vktrace info: %s\n", pMessage); break;
+    default:
+        printf("%s\n", pMessage); break;
+    }
+    fflush(stdout);
+
+    if (vktrace_trace_get_trace_file() != NULL)
+    {
+        uint32_t requiredLength = (uint32_t) ROUNDUP_TO_4(strlen(pMessage) + 1);
+        vktrace_trace_packet_header* pHeader = vktrace_create_trace_packet(VKTRACE_TID_VULKAN, VKTRACE_TPI_MESSAGE, sizeof(vktrace_trace_packet_message), requiredLength);
+        vktrace_trace_packet_message* pPacket = vktrace_interpret_body_as_trace_packet_message(pHeader);
+        pPacket->type = level;
+        pPacket->length = requiredLength;
+
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&pPacket->message, requiredLength, pMessage);
+        vktrace_finalize_buffer_address(pHeader, (void**)&pPacket->message);
+        vktrace_set_packet_entrypoint_end_time(pHeader);
+        vktrace_finalize_trace_packet(pHeader);
+
+        vktrace_write_trace_packet(pHeader, vktrace_trace_get_trace_file());
+        vktrace_delete_trace_packet(&pHeader);
+    }
+
+#if defined(WIN32)
+#if _DEBUG
+    OutputDebugString(pMessage);
+#endif
+#endif
+
+#if defined(ANDROID)
+#include <android/log.h>
+    switch(level)
+    {
+    case VKTRACE_LOG_DEBUG:   __android_log_print(ANDROID_LOG_DEBUG,   "vktrace", "%s", pMessage); break;
+    case VKTRACE_LOG_ERROR:   __android_log_print(ANDROID_LOG_ERROR,   "vktrace", "%s", pMessage); break;
+    case VKTRACE_LOG_WARNING: __android_log_print(ANDROID_LOG_WARNING, "vktrace", "%s", pMessage); break;
+    case VKTRACE_LOG_VERBOSE: __android_log_print(ANDROID_LOG_INFO,    "vktrace", "%s", pMessage); break;
+    default:
+        __android_log_print(ANDROID_LOG_INFO, "vktrace", "%s", pMessage); break;
+    }
+#endif
+}
+
+extern
+VKTRACER_ENTRY _Load(void)
+{
+    // only do the hooking and networking if the tracer is NOT loaded by vktrace
+    if (vktrace_is_loaded_into_vktrace() == FALSE)
+    {
+        char *verbosity;
+        vktrace_LogSetCallback(loggingCallback);
+        verbosity = vktrace_layer_getenv("_VK_TRACE_VERBOSITY");
+        if (verbosity && !strcmp(verbosity, "quiet"))
+            vktrace_LogSetLevel(VKTRACE_LOG_NONE);
+        else if (verbosity && !strcmp(verbosity, "warnings"))
+            vktrace_LogSetLevel(VKTRACE_LOG_WARNING);
+        else if (verbosity && !strcmp(verbosity, "full"))
+            vktrace_LogSetLevel(VKTRACE_LOG_VERBOSE);
+#ifdef _DEBUG
+        else if (verbosity && !strcmp(verbosity, "debug"))
+            vktrace_LogSetLevel(VKTRACE_LOG_DEBUG);
+#endif
+        else
+            // Either verbosity=="errors", or it wasn't specified
+            vktrace_LogSetLevel(VKTRACE_LOG_ERROR);
+
+        vktrace_layer_free_getenv(verbosity);
+
+        vktrace_LogVerbose("vktrace_lib library loaded into PID %d", vktrace_get_pid());
+        atexit(TrapExit);
+
+        // If you need to debug startup, build with debugStartup set to true, then
+        // attach a debugger and change it to false to continue.
+    #ifdef _DEBUG
+        {
+            volatile bool debugStartup = false; // volatile so the spin-wait isn't optimized away
+            while (debugStartup);
+        }
+    #endif
+#ifdef PLATFORM_LINUX
+        struct sigaction act;
+        memset(&act, 0 , sizeof(act));
+        act.sa_sigaction = vktrace_sighandler;
+        act.sa_flags = SA_SIGINFO | SA_RESETHAND;
+        sigaction(SIGINT, &act, NULL);
+        sigaction(SIGTERM, &act, NULL);
+        sigaction(SIGABRT, &act, NULL);
+#endif
+    }
+}
+
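+// Example (hypothetical app name; the variable and accepted values come from _Load above):
+//   _VK_TRACE_VERBOSITY=full ./my_vulkan_app
+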
+VKTRACER_LEAVE _Unload(void)
+{
+    // only do the hooking and networking if the tracer is NOT loaded by vktrace
+    if (vktrace_is_loaded_into_vktrace() == FALSE)
+    {
+        if (vktrace_trace_get_trace_file() != NULL) {
+            vktrace_trace_packet_header* pHeader = vktrace_create_trace_packet(VKTRACE_TID_VULKAN, VKTRACE_TPI_MARKER_TERMINATE_PROCESS, 0, 0);
+            vktrace_finalize_trace_packet(pHeader);
+            vktrace_write_trace_packet(pHeader, vktrace_trace_get_trace_file());
+            vktrace_delete_trace_packet(&pHeader);
+            vktrace_free(vktrace_trace_get_trace_file());
+            vktrace_trace_set_trace_file(NULL);
+        }
+        if (gMessageStream != NULL)
+        {
+            vktrace_MessageStream_destroy(&gMessageStream);
+        }
+        vktrace_LogVerbose("vktrace_lib library unloaded from PID %d", vktrace_get_pid());
+    }
+}
+
+#if defined(WIN32)
+BOOL APIENTRY DllMain( HMODULE hModule,
+                       DWORD  ul_reason_for_call,
+                       LPVOID lpReserved
+                     )
+{
+    (void)hModule;    // unreferenced parameters
+    (void)lpReserved;
+
+    switch (ul_reason_for_call)
+    {
+    case DLL_PROCESS_ATTACH:
+    {
+        _Load();
+        break;
+    }
+    case DLL_PROCESS_DETACH:
+    {
+        _Unload();
+        break;
+    }
+    default:
+        break;
+    }
+    return TRUE;
+}
+#endif
+#ifdef __cplusplus
+}
+#endif
diff --git a/vktrace/src/vktrace_layer/vktrace_lib_helpers.h b/vktrace/src/vktrace_layer/vktrace_lib_helpers.h
new file mode 100644
index 0000000..988cfac
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_lib_helpers.h
@@ -0,0 +1,429 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Tobin Ehlis <tobin@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */
+#pragma once
+#include <unordered_map>
+#include "vktrace_vk_vk.h"
+#include "vulkan/vk_layer.h"
+#include "vktrace_platform.h"
+
+#include "vk_struct_size_helper.h"
+
+// Support for shadowing CPU mapped memory
+// TODO: handle multiple mapped ranges properly rather than using a fixed array
+typedef struct _VKAllocInfo {
+    VkDeviceSize   totalSize;
+    VkDeviceSize   rangeSize;
+    VkDeviceSize   rangeOffset;
+    BOOL           didFlush;
+    VkDeviceMemory handle;
+    uint8_t        *pData;
+    BOOL           valid;
+} VKAllocInfo;
+
+typedef struct _VKMemInfo {
+    unsigned int numEntrys;
+    VKAllocInfo *pEntrys;
+    VKAllocInfo *pLastMapped;
+    unsigned int capacity;
+} VKMemInfo;
+
+typedef struct _layer_device_data {
+    VkLayerDispatchTable devTable;
+    bool KHRDeviceSwapchainEnabled;
+} layer_device_data;
+typedef struct _layer_instance_data {
+    VkLayerInstanceDispatchTable instTable;
+    bool LunargDebugReportEnabled;
+    bool KHRSurfaceEnabled;
+    bool KHRXcbSurfaceEnabled;
+    bool KHRXlibSurfaceEnabled;
+    bool KHRWaylandSurfaceEnabled;
+    bool KHRMirSurfaceEnabled;
+    bool KHRWin32SurfaceEnabled;
+    bool KHRAndroidSurfaceEnabled;
+} layer_instance_data;
+
+// defined in manually written file: vktrace_lib_trace.c
+extern VKMemInfo g_memInfo;
+extern VKTRACE_CRITICAL_SECTION g_memInfoLock;
+extern std::unordered_map<void *, layer_device_data *> g_deviceDataMap;
+extern std::unordered_map<void *, layer_instance_data *> g_instanceDataMap;
+
+typedef void *dispatch_key;
+inline dispatch_key get_dispatch_key(const void* object)
+{
+    return (dispatch_key) *(VkLayerDispatchTable **) object;
+}
+
+layer_instance_data *mid(void *object);
+layer_device_data *mdd(void* object);
+
+static void init_mem_info_entrys(VKAllocInfo *ptr, const unsigned int num)
+{
+    unsigned int i;
+    for (i = 0; i < num; i++)
+    {
+        VKAllocInfo *entry = ptr + i;
+        entry->pData = NULL;
+        entry->totalSize = 0;
+        entry->rangeSize = 0;
+        entry->rangeOffset = 0;
+        entry->didFlush = FALSE;
+        memset(&entry->handle, 0, sizeof(VkDeviceMemory));
+        entry->valid = FALSE;
+    }
+}
+
+// caller must hold the g_memInfoLock
+static void init_mem_info()
+{
+    g_memInfo.numEntrys = 0;
+    g_memInfo.capacity = 4096;
+    g_memInfo.pLastMapped = NULL;
+
+    g_memInfo.pEntrys = VKTRACE_NEW_ARRAY(VKAllocInfo, g_memInfo.capacity);
+
+    if (g_memInfo.pEntrys == NULL)
+        vktrace_LogError("init_mem_info()  malloc failed.");
+    else
+        init_mem_info_entrys(g_memInfo.pEntrys, g_memInfo.capacity);
+}
+
+// caller must hold the g_memInfoLock
+static void delete_mem_info()
+{
+    VKTRACE_DELETE(g_memInfo.pEntrys);
+    g_memInfo.pEntrys = NULL;
+    g_memInfo.numEntrys = 0;
+    g_memInfo.capacity = 0;
+    g_memInfo.pLastMapped = NULL;
+}
+
+// caller must hold the g_memInfoLock
+static VKAllocInfo * get_mem_info_entry()
+{
+    unsigned int i;
+    VKAllocInfo *entry;
+    if (g_memInfo.numEntrys > g_memInfo.capacity)
+    {
+        vktrace_LogError("get_mem_info_entry() bad internal state numEntrys %u.", g_memInfo.numEntrys);
+        return NULL;
+    }
+
+    entry = g_memInfo.pEntrys;
+    for (i = 0; i < g_memInfo.numEntrys; i++)
+    {
+        if ((entry + i)->valid == FALSE)
+            return entry + i;
+    }
+    if (g_memInfo.numEntrys == g_memInfo.capacity)
+    {  // grow the array 2x
+        g_memInfo.capacity *= 2;
+        g_memInfo.pEntrys = (VKAllocInfo *) VKTRACE_REALLOC(g_memInfo.pEntrys, g_memInfo.capacity * sizeof(VKAllocInfo));
+        if (g_memInfo.pEntrys == NULL)
+            vktrace_LogError("get_mem_info_entry() realloc failed.");
+        vktrace_LogDebug("realloc memInfo from %u to %u", g_memInfo.capacity /2, g_memInfo.capacity);
+        //init the newly added entrys
+        init_mem_info_entrys(g_memInfo.pEntrys + g_memInfo.capacity / 2, g_memInfo.capacity / 2);
+    }
+
+    assert(g_memInfo.numEntrys < g_memInfo.capacity);
+    entry = g_memInfo.pEntrys + g_memInfo.numEntrys;
+    g_memInfo.numEntrys++;
+    assert(entry->valid == FALSE);
+    return entry;
+}
+
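+// Example: with the initial capacity of 4096 (see init_mem_info), the first overflow
+// grows the array to 8192 entries and initializes entries 4096..8191.
+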
+// caller must hold the g_memInfoLock
+static VKAllocInfo * find_mem_info_entry(const VkDeviceMemory handle)
+{
+    VKAllocInfo *entry;
+    unsigned int i;
+    entry = g_memInfo.pEntrys;
+    if (g_memInfo.pLastMapped && g_memInfo.pLastMapped->handle == handle && g_memInfo.pLastMapped->valid)
+    {
+        return g_memInfo.pLastMapped;
+    }
+    for (i = 0; i < g_memInfo.numEntrys; i++)
+    {
+        if ((entry + i)->valid && (handle == (entry + i)->handle))
+        {
+            return entry + i;
+        }
+    }
+
+    return NULL;
+}
+
+static VKAllocInfo * find_mem_info_entry_lock(const VkDeviceMemory handle)
+{
+    VKAllocInfo *res;
+    vktrace_enter_critical_section(&g_memInfoLock);
+    res = find_mem_info_entry(handle);
+    vktrace_leave_critical_section(&g_memInfoLock);
+    return res;
+}
+
+static void add_new_handle_to_mem_info(const VkDeviceMemory handle, VkDeviceSize size, void *pData)
+{
+    VKAllocInfo *entry;
+
+    vktrace_enter_critical_section(&g_memInfoLock);
+    if (g_memInfo.capacity == 0)
+        init_mem_info();
+
+    entry = get_mem_info_entry();
+    if (entry)
+    {
+        entry->valid = TRUE;
+        entry->handle = handle;
+        entry->totalSize = size;
+        entry->rangeSize = 0;
+        entry->rangeOffset = 0;
+        entry->didFlush = FALSE;
+        entry->pData = (uint8_t *) pData;   // NOTE: VKFreeMemory will free this mem, so no malloc()
+    }
+    vktrace_leave_critical_section(&g_memInfoLock);
+}
+
+static void add_data_to_mem_info(const VkDeviceMemory handle, VkDeviceSize rangeSize, VkDeviceSize rangeOffset, void *pData)
+{
+    VKAllocInfo *entry;
+
+    vktrace_enter_critical_section(&g_memInfoLock);
+    entry = find_mem_info_entry(handle);
+    if (entry)
+    {
+        entry->pData = (uint8_t *)pData;
+        if (rangeSize == VK_WHOLE_SIZE)
+            entry->rangeSize = entry->totalSize - rangeOffset;
+        else
+            entry->rangeSize = rangeSize;
+        entry->rangeOffset = rangeOffset;
+        assert(entry->totalSize >= entry->rangeSize + rangeOffset);
+    }
+    g_memInfo.pLastMapped = entry;
+    vktrace_leave_critical_section(&g_memInfoLock);
+}
+
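+// Example: mapping at rangeOffset = 4096 with rangeSize = VK_WHOLE_SIZE on a
+// 16384-byte allocation records rangeSize = 16384 - 4096 = 12288.
+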
+static void rm_handle_from_mem_info(const VkDeviceMemory handle)
+{
+    VKAllocInfo *entry;
+
+    vktrace_enter_critical_section(&g_memInfoLock);
+    entry = find_mem_info_entry(handle);
+    if (entry)
+    {
+        entry->valid = FALSE;
+        entry->pData = NULL;
+        entry->totalSize = 0;
+        entry->rangeSize = 0;
+        entry->rangeOffset = 0;
+        entry->didFlush = FALSE;
+        memset(&entry->handle, 0, sizeof(VkDeviceMemory));
+
+        if (entry == g_memInfo.pLastMapped)
+            g_memInfo.pLastMapped = NULL;
+        // adjust numEntrys to be last valid entry in list
+        do {
+            entry =  g_memInfo.pEntrys + g_memInfo.numEntrys - 1;
+            if (entry->valid == FALSE)
+                g_memInfo.numEntrys--;
+        } while ((entry->valid == FALSE) && (g_memInfo.numEntrys > 0));
+        if (g_memInfo.numEntrys == 0)
+            delete_mem_info();
+    }
+    vktrace_leave_critical_section(&g_memInfoLock);
+}
+
+static void add_alloc_memory_to_trace_packet(vktrace_trace_packet_header* pHeader, void** ppOut, const void* pIn)
+{
+    while (pIn)
+    {
+        switch (((VkApplicationInfo *)pIn)->sType)
+        {
+        case VK_STRUCTURE_TYPE_DEDICATED_ALLOCATION_MEMORY_ALLOCATE_INFO_NV:
+            vktrace_add_buffer_to_trace_packet(pHeader, ppOut, sizeof(VkDedicatedAllocationMemoryAllocateInfoNV), pIn);
+            vktrace_finalize_buffer_address(pHeader, ppOut);
+            break;
+        case VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO_NV:
+            vktrace_add_buffer_to_trace_packet(pHeader, ppOut, sizeof(VkExportMemoryAllocateInfoNV), pIn);
+            vktrace_finalize_buffer_address(pHeader, ppOut);
+            break;
+#ifdef VK_USE_PLATFORM_WIN32_KHR
+        case VK_STRUCTURE_TYPE_EXPORT_MEMORY_WIN32_HANDLE_INFO_NV:
+            vktrace_add_buffer_to_trace_packet(pHeader, ppOut, sizeof(VkExportMemoryWin32HandleInfoNV), pIn);
+            vktrace_finalize_buffer_address(pHeader, ppOut);
+            break;
+
+        case VK_STRUCTURE_TYPE_IMPORT_MEMORY_WIN32_HANDLE_INFO_NV:
+            vktrace_add_buffer_to_trace_packet(pHeader, ppOut, sizeof(VkImportMemoryWin32HandleInfoNV), pIn);
+            vktrace_finalize_buffer_address(pHeader, ppOut);
+            break;
+#endif
+        default:
+            vktrace_LogError("vkAllocateMemory: unrecognize pAllocate pNext list structure");
+            break;
+        }
+        pIn = ((VkApplicationInfo *)pIn)->pNext;
+    }
+}
+
+static void add_VkPipelineShaderStageCreateInfo_to_trace_packet(vktrace_trace_packet_header* pHeader, VkPipelineShaderStageCreateInfo* packetShader, const VkPipelineShaderStageCreateInfo* paramShader)
+{
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&packetShader->pName, ROUNDUP_TO_4(strlen(paramShader->pName) + 1), paramShader->pName);
+    vktrace_finalize_buffer_address(pHeader, (void**)&packetShader->pName);
+
+    // Specialization info
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&packetShader->pSpecializationInfo, sizeof(VkSpecializationInfo), paramShader->pSpecializationInfo);
+    if (packetShader->pSpecializationInfo != NULL)
+    {
+        if (paramShader->pSpecializationInfo != NULL) {
+            vktrace_add_buffer_to_trace_packet(pHeader, (void**)&packetShader->pSpecializationInfo->pMapEntries, sizeof(VkSpecializationMapEntry) * paramShader->pSpecializationInfo->mapEntryCount, paramShader->pSpecializationInfo->pMapEntries);
+            vktrace_add_buffer_to_trace_packet(pHeader, (void**)&packetShader->pSpecializationInfo->pData, paramShader->pSpecializationInfo->dataSize, paramShader->pSpecializationInfo->pData);
+            vktrace_finalize_buffer_address(pHeader, (void**)&packetShader->pSpecializationInfo->pMapEntries);
+            vktrace_finalize_buffer_address(pHeader, (void**)&packetShader->pSpecializationInfo->pData);
+        }
+    }
+    vktrace_finalize_buffer_address(pHeader, (void**)&packetShader->pSpecializationInfo);
+}
+
+static void add_create_ds_layout_to_trace_packet(vktrace_trace_packet_header* pHeader, const VkDescriptorSetLayoutCreateInfo** ppOut, const VkDescriptorSetLayoutCreateInfo* pIn)
+{
+    uint32_t i;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)(ppOut), sizeof(VkDescriptorSetLayoutCreateInfo), pIn);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&((*ppOut)->pBindings), sizeof(VkDescriptorSetLayoutBinding) * pIn->bindingCount, pIn->pBindings);
+    for (i = 0; i < pIn->bindingCount; i++) {
+        if (pIn->pBindings[i].pImmutableSamplers != NULL &&
+                (pIn->pBindings[i].descriptorType == VK_DESCRIPTOR_TYPE_SAMPLER ||
+                 pIn->pBindings[i].descriptorType == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER)) {
+            vktrace_add_buffer_to_trace_packet(pHeader, (void**)&((*ppOut)->pBindings[i].pImmutableSamplers),
+                                               sizeof(VkSampler) * pIn->pBindings[i].descriptorCount,
+                                               pIn->pBindings[i].pImmutableSamplers);
+            vktrace_finalize_buffer_address(pHeader, (void**)&((*ppOut)->pBindings[i].pImmutableSamplers));
+        }
+    }
+    vktrace_finalize_buffer_address(pHeader, (void**)&((*ppOut)->pBindings));
+    vktrace_finalize_buffer_address(pHeader, (void**)(ppOut));
+    return;
+}
+
+static void add_VkGraphicsPipelineCreateInfos_to_trace_packet(vktrace_trace_packet_header* pHeader, VkGraphicsPipelineCreateInfo* pPacket, const VkGraphicsPipelineCreateInfo* pParam, uint32_t count)
+{
+    if (pParam != NULL)
+    {
+        uint32_t i;
+        uint32_t j;
+
+        for (i = 0; i < count; i++) {
+            // shader stages array
+            vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pStages), sizeof(VkPipelineShaderStageCreateInfo) * pParam[i].stageCount, pParam[i].pStages);
+            for (j = 0; j < pParam[i].stageCount; j++)
+            {
+                add_VkPipelineShaderStageCreateInfo_to_trace_packet(pHeader, (VkPipelineShaderStageCreateInfo*)&pPacket->pStages[j], &pParam[i].pStages[j]); // index pParam by i to match the outer loop
+            }
+            vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pStages));
+
+            // Vertex Input State
+            if (pParam[i].pVertexInputState) {
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pVertexInputState), sizeof(VkPipelineVertexInputStateCreateInfo), pParam[i].pVertexInputState);
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pVertexInputState->pVertexBindingDescriptions), pParam[i].pVertexInputState->vertexBindingDescriptionCount * sizeof(VkVertexInputBindingDescription), pParam[i].pVertexInputState->pVertexBindingDescriptions);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pVertexInputState->pVertexBindingDescriptions));
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pVertexInputState->pVertexAttributeDescriptions), pParam[i].pVertexInputState->vertexAttributeDescriptionCount * sizeof(VkVertexInputAttributeDescription), pParam[i].pVertexInputState->pVertexAttributeDescriptions);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pVertexInputState->pVertexAttributeDescriptions));
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pVertexInputState));
+            }
+            // Input Assembly State
+            if (pParam[i].pInputAssemblyState) {
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pInputAssemblyState), sizeof(VkPipelineInputAssemblyStateCreateInfo), pParam[i].pInputAssemblyState);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pInputAssemblyState));
+            }
+            // Tesselation State
+            if (pParam[i].pTessellationState) {
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pTessellationState), sizeof(VkPipelineTessellationStateCreateInfo), pParam[i].pTessellationState);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pTessellationState));
+            }
+            // Viewport State
+            if (pParam[i].pViewportState) {
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pViewportState), sizeof(VkPipelineViewportStateCreateInfo), pParam[i].pViewportState);
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pViewportState->pViewports), sizeof(VkViewport) * pParam[i].pViewportState->viewportCount,
+                                                   pParam[i].pViewportState->pViewports);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pViewportState->pViewports));
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pViewportState->pScissors), sizeof(VkRect2D) * pParam[i].pViewportState->scissorCount,
+                                                   pParam[i].pViewportState->pScissors);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pViewportState->pScissors));
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pViewportState));
+            }
+
+            // Raster State
+            if (pParam[i].pRasterizationState) {
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pRasterizationState), sizeof(VkPipelineRasterizationStateCreateInfo), pParam[i].pRasterizationState);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pRasterizationState));
+            }
+            // MultiSample State
+            if (pParam[i].pMultisampleState) {
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pMultisampleState), sizeof(VkPipelineMultisampleStateCreateInfo), pParam[i].pMultisampleState);
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pMultisampleState->pSampleMask), sizeof(VkSampleMask), pParam[i].pMultisampleState->pSampleMask);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pMultisampleState->pSampleMask));
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pMultisampleState));
+            }
+
+            // DepthStencil State
+            if (pParam[i].pDepthStencilState) {
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDepthStencilState), sizeof(VkPipelineDepthStencilStateCreateInfo), pParam[i].pDepthStencilState);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDepthStencilState));
+            }
+
+            // ColorBlend State
+            if (pParam[i].pColorBlendState) {
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pColorBlendState), sizeof(VkPipelineColorBlendStateCreateInfo), pParam[i].pColorBlendState);
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pColorBlendState->pAttachments), pParam[i].pColorBlendState->attachmentCount * sizeof(VkPipelineColorBlendAttachmentState), pParam[i].pColorBlendState->pAttachments);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pColorBlendState->pAttachments));
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pColorBlendState));
+            }
+
+            // DynamicState
+            if (pParam[i].pDynamicState) {
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDynamicState), sizeof(VkPipelineDynamicStateCreateInfo), pParam[i].pDynamicState);
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDynamicState->pDynamicStates), pParam[i].pDynamicState->dynamicStateCount * sizeof(VkDynamicState), pParam[i].pDynamicState->pDynamicStates);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDynamicState->pDynamicStates));
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDynamicState));
+            }
+        }
+    }
+    return;
+}
+
+static void add_VkComputePipelineCreateInfos_to_trace_packet(vktrace_trace_packet_header* pHeader, VkComputePipelineCreateInfo* pPacket, const VkComputePipelineCreateInfo* pParam, uint32_t count)
+{
+    if (pParam != NULL)
+    {
+        uint32_t i;
+
+        for (i = 0; i < count; i++) {
+            // shader stage
+            add_VkPipelineShaderStageCreateInfo_to_trace_packet(pHeader, (VkPipelineShaderStageCreateInfo*)&pPacket->stage, &pParam[i].stage);
+        }
+    }
+    return;
+}
diff --git a/vktrace/src/vktrace_layer/vktrace_lib_pageguard.cpp b/vktrace/src/vktrace_layer/vktrace_lib_pageguard.cpp
new file mode 100644
index 0000000..6d70b46
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_lib_pageguard.cpp
@@ -0,0 +1,397 @@
+/*
+* Copyright (c) 2016 Advanced Micro Devices, Inc. All rights reserved.
+* Copyright (C) 2015-2016 LunarG, Inc.
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+
+#include "vktrace_pageguard_memorycopy.h"
+#include "vktrace_lib_pagestatusarray.h"
+#include "vktrace_lib_pageguardmappedmemory.h"
+#include "vktrace_lib_pageguardcapture.h"
+#include "vktrace_lib_pageguard.h"
+
+static const bool PAGEGUARD_PAGEGUARD_ENABLE_DEFAULT = false;  // default is off; to capture a target app with pageguard enabled, run "set VKTRACE_PAGEGUARD=1" on the command line first.
+
+static const VkDeviceSize PAGEGUARD_TARGET_RANGE_SIZE_DEFAULT = 2; // covers all reasonable mapped-memory sizes; mapped memory may be smaller than one page, and handling for sizes < 1 page is already in place.
+                                                                   // other values tried: 32 * 1024 * 1024 (32M) and 64M, which made DOOM4 capture very slowly.
+static const VkDeviceSize PAGEGUARD_PAGEGUARD_TARGET_RANGE_SIZE_MIN = 1; // already tested: 2, 2M, 4M, 32M, 64M; the common page size is 4K, so only a range size of 2 covers small mapped memory.
+
+static vktrace_sem_id ref_amount_sem_id; // TODO: if vktrace gains cross-platform lib/DLL load and unload hooks, this semaphore could be created and destroyed there; for now it lives until process exit.
+static bool ref_amount_sem_id_create_success = vktrace_sem_create(&ref_amount_sem_id, 1);
+
+static vktrace_sem_id map_lock_sem_id;
+static bool map_lock_sem_id_create_success = vktrace_sem_create(&map_lock_sem_id, 1);
+
+void pageguardEnter()
+{
+    vktrace_sem_wait(map_lock_sem_id);
+}
+void pageguardExit()
+{
+    vktrace_sem_post(map_lock_sem_id);
+}
+#if defined(WIN32) //page guard solution for windows
+
+
+VkDeviceSize& ref_target_range_size()
+{
+    static VkDeviceSize OPTTargetRangeSize = PAGEGUARD_TARGET_RANGE_SIZE_DEFAULT;
+    return OPTTargetRangeSize;
+}
+
+void set_pageguard_target_range_size(VkDeviceSize newrangesize)
+{
+    VkDeviceSize& refTargetRangeSize = ref_target_range_size();
+
+    refTargetRangeSize = newrangesize;
+}
+
+LONG WINAPI PageGuardExceptionHandler(PEXCEPTION_POINTERS ExceptionInfo);
+PVOID OPTHandler = nullptr; // used to remove the page guard exception handler
+uint32_t OPTHandlerRefAmount = 0; // with persistent mapping in a multi-threaded environment, map and unmap may overlap; the handler must only be removed after every persistent mapping has been unmapped.
+
+// returns whether pageguard is enabled;
+// if it is, also checks whether the target range size needs updating. Page guard only
+// applies to persistently mapped memory whose size is >= the target range size.
+bool getPageGuardEnableFlag()
+{
+    static bool EnablePageGuard = PAGEGUARD_PAGEGUARD_ENABLE_DEFAULT;
+    static bool FirstTimeRun = true;
+    if (FirstTimeRun)
+    {
+        FirstTimeRun = false;
+        const char *env_pageguard = vktrace_get_global_var(PAGEGUARD_PAGEGUARD_ENABLE_ENV);
+        if (env_pageguard)
+        {
+            int envvalue;
+            if (sscanf(env_pageguard, "%d", &envvalue) == 1)
+            {
+                if (envvalue)
+                {
+                    EnablePageGuard = true;
+                    const char *env_target_range_size = vktrace_get_global_var(PAGEGUARD_PAGEGUARD_TARGET_RANGE_SIZE_ENV);
+                    if (env_target_range_size)
+                    {
+                        VkDeviceSize rangesize;
+                        if (sscanf(env_target_range_size, "%llu", &rangesize) == 1)
+                        {
+                            if (rangesize > PAGEGUARD_PAGEGUARD_TARGET_RANGE_SIZE_MIN)
+                            {
+                                set_pageguard_target_range_size(rangesize);
+                            }
+                        }
+                    }
+                }
+                else
+                {
+                    EnablePageGuard = false;
+                }
+            }
+        }
+    }
+    return EnablePageGuard;
+}
+
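+// Example (Windows cmd; the first variable name comes from the note at the top of
+// this file, the second is assumed from the *_TARGET_RANGE_SIZE_ENV macro and may differ):
+//   set VKTRACE_PAGEGUARD=1
+//   set VKTRACE_PAGEGUARD_TARGET_RANGE_SIZE=2
+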
+void setPageGuardExceptionHandler()
+{
+    vktrace_sem_wait(ref_amount_sem_id);
+    if (!OPTHandler)
+    {
+        OPTHandler = AddVectoredExceptionHandler(1, PageGuardExceptionHandler);
+        OPTHandlerRefAmount = 1;
+    }
+    else
+    {
+        OPTHandlerRefAmount++;
+    }
+    vktrace_sem_post(ref_amount_sem_id);
+}
+
+void removePageGuardExceptionHandler()
+{
+    vktrace_sem_wait(ref_amount_sem_id);
+    if (OPTHandler)
+    {
+        if (OPTHandlerRefAmount)
+        {
+            OPTHandlerRefAmount--;
+        }
+        if (!OPTHandlerRefAmount)
+        {
+            RemoveVectoredExceptionHandler(OPTHandler);
+            OPTHandler = nullptr;
+        }
+    }
+    vktrace_sem_post(ref_amount_sem_id);
+}
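+
+//Usage sketch for the pair above (mirroring PageGuardMappedMemory::vkMapMemoryPageGuardHandle
+//and vkUnmapMemoryPageGuardHandle later in this change): each guarded map installs the
+//vectored handler once and bumps the refcount, each unmap drops it, so the handler stays
+//registered exactly while at least one guarded mapping exists:
+//
+//    setPageGuardExceptionHandler();     //on vkMapMemory of a guarded range
+//    ...                                 //guarded pages may fault here
+//    removePageGuardExceptionHandler();  //on vkUnmapMemory of that range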
+
+size_t pageguardGetAdjustedSize(size_t size)
+{
+    size_t pagesize = pageguardGetSystemPageSize();
+    if (size % pagesize)
+    {
+        size = size - (size % pagesize) + pagesize;
+    }
+    return size;
+}
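+
+//Example: with a 4096-byte system page size, pageguardGetAdjustedSize(5000)
+//returns 8192, while pageguardGetAdjustedSize(4096) returns 4096 unchanged.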
+
+//page guard only works on virtual memory pages, so the block is allocated with VirtualAlloc at page granularity; plain heap memory gives no page-level protection control.
+void* pageguardAllocateMemory(size_t size)
+{
+    void* pMemory = nullptr;
+
+    if (size != 0)
+    {
+        pMemory = (PBYTE)VirtualAlloc(nullptr, pageguardGetAdjustedSize(size), MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
+    }
+
+    return pMemory;
+}
+
+void pageguardFreeMemory(void* pMemory)
+{
+    if (pMemory)
+    {
+        VirtualFree(pMemory, 0, MEM_RELEASE);
+    }
+}
+
+DWORD pageguardGetSystemPageSize()
+{
+#if defined(PLATFORM_LINUX)
+    return 0x1000;
+#elif defined(WIN32)
+    SYSTEM_INFO sSysInfo;
+    GetSystemInfo(&sSysInfo);
+    return sSysInfo.dwPageSize;
+#endif
+}
+
+void setFlagTovkFlushMappedMemoryRangesSpecial(PBYTE pOPTPackageData)
+{
+    PageGuardChangedBlockInfo *pChangedInfoArray = (PageGuardChangedBlockInfo *)pOPTPackageData;
+    pChangedInfoArray[0].reserve0 = pChangedInfoArray[0].reserve0 | PAGEGUARD_SPECIAL_FORMAT_PACKET_FOR_VKFLUSHMAPPEDMEMORYRANGES;
+}
+
+PageGuardCapture& getPageGuardControlInstance()
+{
+    static PageGuardCapture OPTControl;
+    return OPTControl;
+}
+
+void flushTargetChangedMappedMemory(LPPageGuardMappedMemory TargetMappedMemory, vkFlushMappedMemoryRangesFunc pFunc, VkMappedMemoryRange*  pMemoryRanges)
+{
+    bool newMemoryRangesInside = (pMemoryRanges == nullptr);
+    if (newMemoryRangesInside)
+    {
+        pMemoryRanges = new VkMappedMemoryRange[1];
+        assert(pMemoryRanges);
+    }
+    pMemoryRanges[0].memory = TargetMappedMemory->getMappedMemory();
+    pMemoryRanges[0].offset = TargetMappedMemory->getMappedOffset();
+    pMemoryRanges[0].pNext = nullptr;
+    pMemoryRanges[0].size = TargetMappedMemory->getMappedSize();
+    pMemoryRanges[0].sType = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;
+    (*pFunc)(TargetMappedMemory->getMappedDevice(), 1, pMemoryRanges);
+    if (newMemoryRangesInside)
+    {
+        delete[] pMemoryRanges;
+    }
+}
+
+void flushAllChangedMappedMemory(vkFlushMappedMemoryRangesFunc pFunc)
+{
+    LPPageGuardMappedMemory pMappedMemoryTemp;
+    uint64_t amount = getPageGuardControlInstance().getMapMemory().size();
+    if (amount)
+    {
+        int i = 0;
+        VkMappedMemoryRange* pMemoryRanges = new VkMappedMemoryRange[1]; //a single range is reused for each mapped memory object (rather than allocating one per entry)
+        for (std::unordered_map< VkDeviceMemory, PageGuardMappedMemory >::iterator it = getPageGuardControlInstance().getMapMemory().begin(); it != getPageGuardControlInstance().getMapMemory().end(); it++)
+        {
+            pMappedMemoryTemp = &(it->second);
+            flushTargetChangedMappedMemory(pMappedMemoryTemp, pFunc, pMemoryRanges);
+            i++;
+        }
+        delete[] pMemoryRanges;
+    }
+}
+
+void resetAllReadFlagAndPageGuard()
+{
+    LPPageGuardMappedMemory pMappedMemoryTemp;
+    uint64_t amount = getPageGuardControlInstance().getMapMemory().size();
+    if (amount)
+    {
+        int i = 0;
+        for (std::unordered_map< VkDeviceMemory, PageGuardMappedMemory >::iterator it = getPageGuardControlInstance().getMapMemory().begin(); it != getPageGuardControlInstance().getMapMemory().end(); it++)
+        {
+            pMappedMemoryTemp = &(it->second);
+            pMappedMemoryTemp->resetMemoryObjectAllReadFlagAndPageGuard();
+            i++;
+        }
+    }
+}
+
+//page guard handler
+LONG WINAPI PageGuardExceptionHandler(PEXCEPTION_POINTERS ExceptionInfo)
+{
+    LONG resultCode = EXCEPTION_CONTINUE_SEARCH;
+    pageguardEnter();
+    if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION)
+    {
+        VkDeviceSize OffsetOfAddr;
+        PBYTE pBlock;
+        VkDeviceSize BlockSize;
+        PBYTE addr = reinterpret_cast<PBYTE>(ExceptionInfo->ExceptionRecord->ExceptionInformation[1]);
+        bool bWrite = ExceptionInfo->ExceptionRecord->ExceptionInformation[0];
+        LPPageGuardMappedMemory pMappedMem = getPageGuardControlInstance().findMappedMemoryObject(addr, &OffsetOfAddr, &pBlock, &BlockSize);
+        if (pMappedMem)
+        {
+            uint64_t index = pMappedMem->getIndexOfChangedBlockByAddr(addr);
+            if (bWrite)
+            {
+                pMappedMem->setMappedBlockChanged(index, true, BLOCK_FLAG_ARRAY_CHANGED);
+                resultCode = EXCEPTION_CONTINUE_EXECUTION;
+            }
+            else
+            {
+
+#ifndef PAGEGUARD_ADD_PAGEGUARD_ON_REAL_MAPPED_MEMORY
+                vktrace_pageguard_memcpy(pBlock, pMappedMem->getRealMappedDataPointer() + OffsetOfAddr - OffsetOfAddr % BlockSize, pMappedMem->getMappedBlockSize(index));
+                pMappedMem->setMappedBlockChanged(index, true, BLOCK_FLAG_ARRAY_READ);
+
+                resultCode = EXCEPTION_CONTINUE_EXECUTION;
+#else
+                pMappedMem->setMappedBlockChanged(index, true);
+                resultCode = EXCEPTION_CONTINUE_EXECUTION;
+#endif
+            }
+        }
+    }
+    pageguardExit();
+    return resultCode;
+}
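+
+//Note: Windows PAGE_GUARD protection is one-shot; the first access clears the
+//guard and raises STATUS_GUARD_PAGE_VIOLATION, so after a fault is handled the
+//block must be re-armed explicitly, as the reset functions in this change do:
+//
+//    DWORD oldProt;
+//    VirtualProtect(pBlock, (SIZE_T)BlockSize, PAGE_READWRITE | PAGE_GUARD, &oldProt);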
+
+//The function source code is adapted from __HOOKED_vkFlushMappedMemoryRanges.
+//For coherently mapped memory, this function dumps the changed data into the trace without calling the real API, so playback can simulate the target application's writes.
+VkResult vkFlushMappedMemoryRangesWithoutAPICall(
+    VkDevice device,
+    uint32_t memoryRangeCount,
+    const VkMappedMemoryRange* pMemoryRanges)
+{
+    VkResult result = VK_SUCCESS;
+    vktrace_trace_packet_header* pHeader;
+    size_t rangesSize = 0;
+    size_t dataSize = 0;
+    uint32_t iter;
+    packet_vkFlushMappedMemoryRanges* pPacket = nullptr;
+
+#ifdef USE_PAGEGUARD_SPEEDUP
+    PBYTE *ppPackageData = new PBYTE[memoryRangeCount];
+    bool bOPTTargetWithoutChange = getPageGuardControlInstance().vkFlushMappedMemoryRangesPageGuardHandle(device, memoryRangeCount, pMemoryRanges, ppPackageData); //the packet is not needed if none of the ranges' data changed
+    if (bOPTTargetWithoutChange)
+    {
+        return result;
+    }
+#endif
+
+    // find out how much memory is in the ranges
+    for (iter = 0; iter < memoryRangeCount; iter++)
+    {
+        VkMappedMemoryRange* pRange = (VkMappedMemoryRange*)&pMemoryRanges[iter];
+        rangesSize += vk_size_vkmappedmemoryrange(pRange);
+        dataSize += (size_t)pRange->size;
+    }
+#ifdef USE_PAGEGUARD_SPEEDUP
+    dataSize = getPageGuardControlInstance().getALLChangedPackageSizeInMappedMemory(device, memoryRangeCount, pMemoryRanges, ppPackageData);
+#endif
+    CREATE_TRACE_PACKET(vkFlushMappedMemoryRanges, rangesSize + sizeof(void*)*memoryRangeCount + dataSize);
+    pPacket = interpret_body_as_vkFlushMappedMemoryRanges(pHeader);
+
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pMemoryRanges), rangesSize, pMemoryRanges);
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pMemoryRanges));
+
+    // insert into packet the data that was written by CPU between the vkMapMemory call and here
+    // create a temporary local ppData array and add it to the packet (to reserve the space for the array)
+    void** ppTmpData = (void **)malloc(memoryRangeCount * sizeof(void*));
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->ppData), sizeof(void*)*memoryRangeCount, ppTmpData);
+    free(ppTmpData);
+
+    // now the actual memory
+    vktrace_enter_critical_section(&g_memInfoLock);
+    for (iter = 0; iter < memoryRangeCount; iter++)
+    {
+        VkMappedMemoryRange* pRange = (VkMappedMemoryRange*)&pMemoryRanges[iter];
+        VKAllocInfo* pEntry = find_mem_info_entry(pRange->memory);
+
+        if (pEntry != nullptr)
+        {
+            assert(pEntry->handle == pRange->memory);
+            assert(pEntry->totalSize >= (pRange->size + pRange->offset));
+            assert(pEntry->totalSize >= pRange->size);
+            assert(pRange->offset >= pEntry->rangeOffset && (pRange->offset + pRange->size) <= (pEntry->rangeOffset + pEntry->rangeSize));
+#ifdef USE_PAGEGUARD_SPEEDUP
+            LPPageGuardMappedMemory pOPTMemoryTemp = getPageGuardControlInstance().findMappedMemoryObject(device, pRange);
+            VkDeviceSize OPTPackageSizeTemp = 0;
+            if (pOPTMemoryTemp)
+            {
+                PBYTE pOPTDataTemp = pOPTMemoryTemp->getChangedDataPackage(&OPTPackageSizeTemp);
+                setFlagTovkFlushMappedMemoryRangesSpecial(pOPTDataTemp);
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->ppData[iter]), OPTPackageSizeTemp, pOPTDataTemp);
+                pOPTMemoryTemp->clearChangedDataPackage();
+                pOPTMemoryTemp->resetMemoryObjectAllChangedFlagAndPageGuard();
+            }
+            else
+            {
+                PBYTE pOPTDataTemp = getPageGuardControlInstance().getChangedDataPackageOutOfMap(ppPackageData, iter, &OPTPackageSizeTemp);
+                setFlagTovkFlushMappedMemoryRangesSpecial(pOPTDataTemp);
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->ppData[iter]), OPTPackageSizeTemp, pOPTDataTemp);
+                getPageGuardControlInstance().clearChangedDataPackageOutOfMap(ppPackageData, iter);
+            }
+#else
+            vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->ppData[iter]), pRange->size, pEntry->pData + pRange->offset);
+#endif
+            vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->ppData[iter]));
+            pEntry->didFlush = true;
+        }
+        else
+        {
+            vktrace_LogError("Failed to copy app memory into trace packet (idx = %u) on vkFlushedMappedMemoryRanges", pHeader->global_packet_index);
+        }
+    }
+#ifdef USE_PAGEGUARD_SPEEDUP
+    delete[] ppPackageData;
+#endif
+    vktrace_leave_critical_section(&g_memInfoLock);
+
+    // now finalize the ppData array since it is done being updated
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->ppData));
+
+    //result = mdd(device)->devTable.FlushMappedMemoryRanges(device, memoryRangeCount, pMemoryRanges);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket->device = device;
+    pPacket->memoryRangeCount = memoryRangeCount;
+    pPacket->result = result;
+
+    FINISH_TRACE_PACKET();
+    return result;
+}
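+
+//Presumed call site (the hooked vkQueueSubmit, which is outside this file): for
+//coherently mapped memory the layer can dump all pending writes into the trace
+//without invoking the driver, since the function above matches the
+//vkFlushMappedMemoryRangesFunc signature:
+//
+//    flushAllChangedMappedMemory(vkFlushMappedMemoryRangesWithoutAPICall);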
+//OPT end
+#else
+
+#undef USE_PAGEGUARD_SPEEDUP
+
+#endif//page guard solution for windows
diff --git a/vktrace/src/vktrace_layer/vktrace_lib_pageguard.h b/vktrace/src/vktrace_layer/vktrace_lib_pageguard.h
new file mode 100644
index 0000000..db1d09d
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_lib_pageguard.h
@@ -0,0 +1,108 @@
+/*
+* Copyright (c) 2016 Advanced Micro Devices, Inc. All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+//  Optimization that uses page guards to speed up capture and to capture persistently mapped memory
+//
+//  Background:
+//
+//     Capturing DOOM4 with vktrace used to be extremely slow: a capture from launch to the game menu took over half a day and produced a 900G trace.
+//     The cause is that DOOM frequently updates a large mapped memory block (over 67M), and vktrace copied the whole block to disk every time DOOM called vkFlushMappedMemoryRanges.
+//     Using page guards to record which pages of the block actually changed, and saving only those pages, reduces the capture time to around 15 minutes and the trace file size to around 40G;
+//     playback of such a trace takes around 7 minutes (on a Win10/AMD Fury/32G RAM/i5 system).
+//
+//     The page guard method has also been verified for capture/playback of the SaschaWillems (July 20) sample applications. Test system: CPU: AMD FX-9590, GPU: R9 200 Series, RAM: 16 GB, Driver: Crimson Public Driver 16.7.3.
+//
+//  Page guard process:
+//
+//     1. The target app maps a memory object and gets a mapped memory pointer (call it pMemReal).
+//
+//     2. The capturer allocates a duplicate block of at least the same size (call it pMemCopy) and copies all data from pMemReal to pMemCopy.
+//
+//     3. The capturer arms a page guard on every page of pMemCopy, then returns pMemCopy (not pMemReal) to the target app.
+//
+//     4. When the target app writes through pMemCopy, the page guard fires and the handler records which page has changed.
+//
+//     5. When the target app reads through pMemCopy, the page guard fires and the handler records which page the app wants to read, then copies only that page from the real mapped memory into the copy, so after the page guard exception the app reads the right data.
+//
+//     6. When the target app performs CPU->GPU synchronization (vkFlushMappedMemoryRanges and queue submits), the capturer saves all changed pages, copies them back from pMemCopy to pMemReal, and re-arms every page guard that a write had triggered.
+//
+//     7. When the target app performs GPU->CPU synchronization (vkInvalidateMappedMemoryRanges, including a vkQueueSubmit that precedes such synchronization), the capturer clears the whole read array and re-arms every page guard that a read had triggered.
+//
+//     8. When the target app unmaps the memory, the capturer saves all changed pages to disk, copies them back from pMemCopy to pMemReal, then removes all page guards and frees pMemCopy.
+//
+//  Known limitations:
+//
+//     1. For a page of mapped memory that the target app reads before writing, a synchronization such as vkFlushMappedMemoryRanges that flushes the copied memory (pMemCopy) to the real memory (pMemReal) will not record that page as a changed page.
+//
+//     2. If one thread accesses a page just after another thread has finished copying that page's data to the real mapped memory but has not yet re-armed that page's guard, the access will not be caught and the page will not be recorded as changed.
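+//
+//  Lifecycle sketch (illustrative and simplified; pData, vertices, vertexBytes and range
+//  are illustrative app-side names, and the real hooks are the __HOOKED_vk* entry points,
+//  which are not part of this header):
+//
+//     void* pData;
+//     vkMapMemory(device, memory, offset, size, 0, &pData);
+//     //layer: getPageGuardControlInstance().vkMapMemoryPageGuardHandle(device, memory, offset, size, 0, &pData);
+//     //       after this, pData points at the guarded copy (pMemCopy)
+//     memcpy(pData, vertices, vertexBytes);          //guard fires, touched pages are marked changed
+//     vkFlushMappedMemoryRanges(device, 1, &range);  //layer: only the changed pages go into the trace packet
+//     vkUnmapMemory(device, memory);                 //layer: save remaining changes, remove the guards, free pMemCopy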
+
+#pragma once
+
+#include <stdbool.h>
+#include <unordered_map>
+#include "vulkan/vulkan.h"
+#include "vktrace_platform.h"
+#include "vktrace_common.h"
+
+#include "vktrace_interconnect.h"
+#include "vktrace_filelike.h"
+#include "vktrace_trace_packet_utils.h"
+#include "vktrace_vk_exts.h"
+#include <stdio.h>
+
+#include "vktrace_pageguard_memorycopy.h"
+
+
+#define PAGEGUARD_PAGEGUARD_ENABLE_ENV "VKTRACE_PAGEGUARD"
+#define PAGEGUARD_PAGEGUARD_TARGET_RANGE_SIZE_ENV "VKTRACE_PAGEGUARDTARGETSIZE"
+
+
+#if defined(WIN32) /// page guard solution for windows
+
+VkDeviceSize& ref_target_range_size();
+bool getPageGuardEnableFlag();
+void setPageGuardExceptionHandler();
+void removePageGuardExceptionHandler();
+size_t pageguardGetAdjustedSize(size_t size);
+void* pageguardAllocateMemory(size_t size);
+void pageguardFreeMemory(void* pMemory);
+DWORD pageguardGetSystemPageSize();
+
+void pageguardEnter();
+void pageguardExit();
+
+
+void setFlagTovkFlushMappedMemoryRangesSpecial(PBYTE pOPTPackageData);
+
+
+void flushAllChangedMappedMemory(vkFlushMappedMemoryRangesFunc pFunc);
+
+void flushTargetChangedMappedMemory(LPPageGuardMappedMemory TargetMappedMemory, vkFlushMappedMemoryRangesFunc pFunc, VkMappedMemoryRange*  pMemoryRanges);
+
+void resetAllReadFlagAndPageGuard();
+
+LONG WINAPI PageGuardExceptionHandler(PEXCEPTION_POINTERS ExceptionInfo);
+
+VkResult vkFlushMappedMemoryRangesWithoutAPICall(VkDevice device, uint32_t memoryRangeCount, const VkMappedMemoryRange* pMemoryRanges);
+
+PageGuardCapture& getPageGuardControlInstance();
+
+//page guard for windows end
+#else
+
+#undef USE_PAGEGUARD_SPEEDUP
+
+#endif//page guard solution
diff --git a/vktrace/src/vktrace_layer/vktrace_lib_pageguardcapture.cpp b/vktrace/src/vktrace_layer/vktrace_lib_pageguardcapture.cpp
new file mode 100644
index 0000000..4884d9a
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_lib_pageguardcapture.cpp
@@ -0,0 +1,369 @@
+/*
+* Copyright (c) 2016 Advanced Micro Devices, Inc. All rights reserved.
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+//OPT: Optimization that uses page guards to speed up capture
+//     Capturing DOOM4 with vktrace used to be extremely slow: over half a day and a 900G trace from launch to the game menu.
+//     The cause is that DOOM frequently updates a large mapped memory block (over 67M), which vktrace copied to disk on every vkFlushMappedMemoryRanges call.
+//     Recording which pages changed via page guards and saving only those pages reduces the capture time to around 15 minutes and the trace file size to around 40G;
+//     playback of such a trace takes around 7 minutes (on a Win10/AMD Fury/32G RAM/i5 system).
+
+#include "vktrace_pageguard_memorycopy.h"
+#include "vktrace_lib_pagestatusarray.h"
+#include "vktrace_lib_pageguardmappedmemory.h"
+#include "vktrace_lib_pageguardcapture.h"
+#include "vktrace_lib_pageguard.h"
+
+#if defined(WIN32) //page guard solution for windows
+
+
+PageGuardCapture::PageGuardCapture()
+{
+    EmptyChangedInfoArray.offset = 0;
+    EmptyChangedInfoArray.length = 0;
+}
+
+std::unordered_map< VkDeviceMemory, PageGuardMappedMemory >& PageGuardCapture::getMapMemory()
+{
+    return MapMemory;
+}
+
+void PageGuardCapture::vkMapMemoryPageGuardHandle(
+    VkDevice device,
+    VkDeviceMemory memory,
+    VkDeviceSize offset,
+    VkDeviceSize size,
+    VkFlags flags,
+    void** ppData)
+{
+    PageGuardMappedMemory OPTmappedmem;
+    if (getPageGuardEnableFlag())
+    {
+#ifdef PAGEGUARD_TARGET_RANGE_SIZE_CONTROL
+        if ((size >= ref_target_range_size()) && (size != -1))
+#endif
+        {
+            OPTmappedmem.vkMapMemoryPageGuardHandle(device, memory, offset, size, flags, ppData);
+            MapMemory[memory] = OPTmappedmem;
+        }
+    }
+    MapMemoryPtr[memory] = (PBYTE)(*ppData);
+    MapMemoryOffset[memory] = offset;
+}
+
+void PageGuardCapture::vkUnmapMemoryPageGuardHandle(VkDevice device, VkDeviceMemory memory, void** MappedData, vkFlushMappedMemoryRangesFunc pFunc)
+{
+    LPPageGuardMappedMemory lpOPTMemoryTemp = findMappedMemoryObject(device, memory);
+    if (lpOPTMemoryTemp)
+    {
+        lpOPTMemoryTemp->vkUnmapMemoryPageGuardHandle(device, memory, MappedData);
+        MapMemory.erase(memory);
+    }
+    MapMemoryPtr.erase(memory);
+    MapMemoryOffset.erase(memory);
+}
+
+void* PageGuardCapture::getMappedMemoryPointer(VkDevice device, VkDeviceMemory memory)
+{
+    return MapMemoryPtr[memory];
+}
+
+VkDeviceSize PageGuardCapture::getMappedMemoryOffset(VkDevice device, VkDeviceMemory memory)
+{
+    return MapMemoryOffset[memory];
+}
+
+//return: true if every range targets page-guarded mapped memory and no data changed at all;
+//PBYTE *ppPackageDataforOutOfMap must be an array of memoryRangeCount elements
+bool PageGuardCapture::vkFlushMappedMemoryRangesPageGuardHandle(
+    VkDevice device,
+    uint32_t memoryRangeCount,
+    const VkMappedMemoryRange* pMemoryRanges,
+    PBYTE *ppPackageDataforOutOfMap)
+{
+    bool handleSuccessfully = false, bChanged = false;
+    std::unordered_map< VkDeviceMemory, PageGuardMappedMemory >::const_iterator mappedmem_it;
+    for (uint32_t i = 0; i < memoryRangeCount; i++)
+    {
+        VkMappedMemoryRange* pRange = (VkMappedMemoryRange*)&pMemoryRanges[i];
+        size_t rangesSize = (size_t)pRange->size;
+
+        ppPackageDataforOutOfMap[i] = nullptr;
+        LPPageGuardMappedMemory lpOPTMemoryTemp = findMappedMemoryObject(device, pRange->memory);
+
+        if (lpOPTMemoryTemp)
+        {
+            if (pRange->size == VK_WHOLE_SIZE)
+            {
+                pRange->size = lpOPTMemoryTemp->getMappedSize() - pRange->offset;
+            }
+            if (lpOPTMemoryTemp->vkFlushMappedMemoryRangePageGuardHandle(device, pRange->memory, pRange->offset, pRange->size, nullptr, nullptr, nullptr))
+            {
+                bChanged = true;
+            }
+        }
+        else
+        {
+            bChanged = true;
+            VkDeviceSize RealRangeSize = pRange->size;
+            if (RealRangeSize == VK_WHOLE_SIZE)
+            {
+                RealRangeSize = pRange->size; //TODO: replace with the real remaining size of the mapping; for now this branch only provides a breakpoint location
+            }
+            ppPackageDataforOutOfMap[i] = (PBYTE)pageguardAllocateMemory(RealRangeSize + 2 * sizeof(PageGuardChangedBlockInfo));
+            PageGuardChangedBlockInfo *pInfoTemp = (PageGuardChangedBlockInfo *)ppPackageDataforOutOfMap[i];
+            pInfoTemp[0].offset = 1;
+            pInfoTemp[0].length = (DWORD)RealRangeSize;
+            pInfoTemp[0].reserve0 = 0;
+            pInfoTemp[0].reserve1 = 0;
+            pInfoTemp[1].offset = pRange->offset - getMappedMemoryOffset(device, pRange->memory);
+            pInfoTemp[1].length = (DWORD)RealRangeSize;
+            pInfoTemp[1].reserve0 = 0;
+            pInfoTemp[1].reserve1 = 0;
+            PBYTE pDataInPackage = (PBYTE)(pInfoTemp + 2);
+            void* pDataMapped = getMappedMemoryPointer(device, pRange->memory);
+            vktrace_pageguard_memcpy(pDataInPackage, reinterpret_cast<PBYTE>(pDataMapped) + pInfoTemp[1].offset, RealRangeSize);
+        }
+    }
+    if (!bChanged)
+    {
+        handleSuccessfully = true;
+    }
+    return handleSuccessfully;
+}
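+
+//Out-of-map package layout built above, one pseudo-block covering the whole range:
+//
+//    info[0] = { offset: 1 (block count), length: range size }
+//    info[1] = { offset: range offset within the mapping, length: range size }
+//    followed by the range's raw data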
+
+LPPageGuardMappedMemory PageGuardCapture::findMappedMemoryObject(VkDevice device, VkDeviceMemory memory)
+{
+    LPPageGuardMappedMemory pMappedMemoryObject = nullptr;
+    std::unordered_map< VkDeviceMemory, PageGuardMappedMemory >::const_iterator mappedmem_it;
+    mappedmem_it = MapMemory.find(memory);
+    if (mappedmem_it != MapMemory.end())
+    {
+        pMappedMemoryObject = ((PageGuardMappedMemory *)&(mappedmem_it->second));
+        if (pMappedMemoryObject->MappedDevice != device)
+        {
+            pMappedMemoryObject = nullptr;
+        }
+    }
+    return pMappedMemoryObject;
+}
+
+LPPageGuardMappedMemory PageGuardCapture::findMappedMemoryObject(PBYTE addr, VkDeviceSize *pOffsetOfAddr, PBYTE *ppBlock, VkDeviceSize *pBlockSize)
+{
+    LPPageGuardMappedMemory pMappedMemoryObject = nullptr;
+    LPPageGuardMappedMemory pMappedMemoryTemp;
+    PBYTE RealMappedMemoryAddr = nullptr;
+    PBYTE pBlock = nullptr;
+    VkDeviceSize OffsetOfAddr = 0, BlockSize = 0;
+
+    for (std::unordered_map< VkDeviceMemory, PageGuardMappedMemory >::iterator it = MapMemory.begin(); it != MapMemory.end(); it++)
+    {
+        pMappedMemoryTemp = &(it->second);
+        if ((addr >= pMappedMemoryTemp->pMappedData) && (addr < (pMappedMemoryTemp->pMappedData + pMappedMemoryTemp->MappedSize)))
+        {
+            pMappedMemoryObject = pMappedMemoryTemp;
+
+            OffsetOfAddr = (VkDeviceSize)(addr - pMappedMemoryTemp->pMappedData);
+            BlockSize = pMappedMemoryTemp->PageGuardSize;
+            pBlock = addr - OffsetOfAddr % BlockSize;
+            if (ppBlock)
+            {
+                *ppBlock = pBlock;
+            }
+            if (pBlockSize)
+            {
+                *pBlockSize = BlockSize;
+            }
+            if (pOffsetOfAddr)
+            {
+                *pOffsetOfAddr = OffsetOfAddr;
+            }
+        }
+    }
+    return pMappedMemoryObject;
+}
+
+LPPageGuardMappedMemory PageGuardCapture::findMappedMemoryObject(VkDevice device, const VkMappedMemoryRange* pMemoryRange)
+{
+    LPPageGuardMappedMemory pMappedMemoryObject = findMappedMemoryObject(device, pMemoryRange->memory);
+    return pMappedMemoryObject;
+}
+
+//get the total size of all changed-data packages for the ranges in pMemoryRanges
+VkDeviceSize PageGuardCapture::getALLChangedPackageSizeInMappedMemory(VkDevice device, uint32_t memoryRangeCount, const VkMappedMemoryRange* pMemoryRanges, PBYTE *ppPackageDataforOutOfMap)
+{
+    VkDeviceSize allChangedPackageSize = 0, PackageSize = 0;
+    LPPageGuardMappedMemory pMappedMemoryTemp;
+    for (uint32_t i = 0; i < memoryRangeCount; i++)
+    {
+        pMappedMemoryTemp = findMappedMemoryObject(device, pMemoryRanges + i);
+        if (pMappedMemoryTemp)
+        {
+            pMappedMemoryTemp->getChangedDataPackage(&PackageSize);
+        }
+        else
+        {
+            PageGuardChangedBlockInfo *pInfoTemp = (PageGuardChangedBlockInfo *)ppPackageDataforOutOfMap[i];
+            PackageSize = pInfoTemp->length + 2 * sizeof(PageGuardChangedBlockInfo);
+        }
+        allChangedPackageSize += PackageSize;
+    }
+    return allChangedPackageSize;
+}
+
+//get pointer and size of the changed-data package for an out-of-map range;
+PBYTE PageGuardCapture::getChangedDataPackageOutOfMap(PBYTE *ppPackageDataforOutOfMap, DWORD dwRangeIndex, VkDeviceSize  *pSize)
+{
+    PBYTE pDataPackage = (PBYTE)ppPackageDataforOutOfMap[dwRangeIndex];
+    PageGuardChangedBlockInfo *pInfo = (PageGuardChangedBlockInfo *)pDataPackage;
+    if (pSize)
+    {
+        *pSize = sizeof(PageGuardChangedBlockInfo) * 2 + pInfo->length;
+    }
+    return pDataPackage;
+}
+
+void PageGuardCapture::clearChangedDataPackageOutOfMap(PBYTE *ppPackageDataforOutOfMap, DWORD dwRangeIndex)
+{
+    pageguardFreeMemory(ppPackageDataforOutOfMap[dwRangeIndex]);
+    ppPackageDataforOutOfMap[dwRangeIndex] = nullptr;
+}
+
+bool PageGuardCapture::isHostWriteFlagSetInMemoryBarriers(uint32_t  memoryBarrierCount, const VkMemoryBarrier*  pMemoryBarriers)
+{
+    bool flagSet = false;
+    if ((memoryBarrierCount != 0) && (pMemoryBarriers))
+    {
+        for (uint32_t i = 0; i < memoryBarrierCount; i++)
+        {
+            if (pMemoryBarriers[i].srcAccessMask&VK_ACCESS_HOST_WRITE_BIT)
+            {
+                flagSet = true;
+            }
+        }
+    }
+    return flagSet;
+}
+
+bool PageGuardCapture::isHostWriteFlagSetInBufferMemoryBarrier(uint32_t  memoryBarrierCount, const VkBufferMemoryBarrier*  pMemoryBarriers)
+{
+    bool flagSet = false;
+    if ((memoryBarrierCount != 0) && (pMemoryBarriers))
+    {
+        for (uint32_t i = 0; i < memoryBarrierCount; i++)
+        {
+            if (pMemoryBarriers[i].srcAccessMask&VK_ACCESS_HOST_WRITE_BIT)
+            {
+                flagSet = true;
+            }
+        }
+    }
+    return flagSet;
+}
+
+bool PageGuardCapture::isHostWriteFlagSetInImageMemoryBarrier(uint32_t  memoryBarrierCount, const VkImageMemoryBarrier*  pMemoryBarriers)
+{
+    bool flagSet = false;
+    if ((memoryBarrierCount != 0) && (pMemoryBarriers))
+    {
+        for (uint32_t i = 0; i < memoryBarrierCount; i++)
+        {
+            if (pMemoryBarriers[i].srcAccessMask&VK_ACCESS_HOST_WRITE_BIT)
+            {
+                flagSet = true;
+            }
+        }
+    }
+    return flagSet;
+}
+
+bool PageGuardCapture::isHostWriteFlagSet(VkPipelineStageFlags srcStageMask, VkPipelineStageFlags dstStageMask, VkDependencyFlags  dependencyFlags,
+    uint32_t   memoryBarrierCount, const VkMemoryBarrier*   pMemoryBarriers,
+    uint32_t  bufferMemoryBarrierCount, const VkBufferMemoryBarrier*  pBufferMemoryBarriers,
+    uint32_t  imageMemoryBarrierCount, const VkImageMemoryBarrier*  pImageMemoryBarriers)
+{
+    bool flagSet = false, bWrite = isHostWriteFlagSetInMemoryBarriers(memoryBarrierCount, pMemoryBarriers) ||
+        isHostWriteFlagSetInBufferMemoryBarrier(bufferMemoryBarrierCount, pBufferMemoryBarriers) ||
+        isHostWriteFlagSetInImageMemoryBarrier(imageMemoryBarrierCount, pImageMemoryBarriers);
+    if (bWrite || (srcStageMask&VK_PIPELINE_STAGE_HOST_BIT))
+    {
+        flagSet = true;
+    }
+    return flagSet;
+}
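+
+//Example of a barrier the predicate above matches (illustrative values):
+//
+//    VkMemoryBarrier barrier = {};
+//    barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
+//    barrier.srcAccessMask = VK_ACCESS_HOST_WRITE_BIT;
+//    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
+//    //getPageGuardControlInstance().isHostWriteFlagSet(VK_PIPELINE_STAGE_HOST_BIT,
+//    //    VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, 0, 1, &barrier, 0, nullptr, 0, nullptr)
+//    //returns true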
+
+bool PageGuardCapture::isReadyForHostReadInMemoryBarriers(uint32_t  memoryBarrierCount, const VkMemoryBarrier*  pMemoryBarriers)
+{
+    bool isReady = false;
+    if ((memoryBarrierCount != 0) && (pMemoryBarriers))
+    {
+        for (uint32_t i = 0; i < memoryBarrierCount; i++)
+        {
+            if (pMemoryBarriers[i].dstAccessMask&VK_ACCESS_HOST_READ_BIT)
+            {
+                isReady = true;
+            }
+        }
+    }
+    return isReady;
+}
+
+bool PageGuardCapture::isReadyForHostReadInBufferMemoryBarrier(uint32_t  memoryBarrierCount, const VkBufferMemoryBarrier*  pMemoryBarriers)
+{
+    bool isReady = false;
+    if ((memoryBarrierCount != 0) && (pMemoryBarriers))
+    {
+        for (uint32_t i = 0; i < memoryBarrierCount; i++)
+        {
+            if (pMemoryBarriers[i].dstAccessMask&VK_ACCESS_HOST_READ_BIT)
+            {
+                isReady = true;
+            }
+        }
+    }
+    return isReady;
+}
+
+bool PageGuardCapture::isReadyForHostReadInImageMemoryBarrier(uint32_t  memoryBarrierCount, const VkImageMemoryBarrier*  pMemoryBarriers)
+{
+    bool isReady = false;
+    if ((memoryBarrierCount != 0) && (pMemoryBarriers))
+    {
+        for (uint32_t i = 0; i < memoryBarrierCount; i++)
+        {
+            if ((pMemoryBarriers[i].dstAccessMask&VK_ACCESS_HOST_READ_BIT))
+            {
+                isReady = true;
+            }
+        }
+    }
+    return isReady;
+}
+
+bool PageGuardCapture::isReadyForHostRead(VkPipelineStageFlags srcStageMask, VkPipelineStageFlags dstStageMask, VkDependencyFlags  dependencyFlags,
+    uint32_t   memoryBarrierCount, const VkMemoryBarrier*   pMemoryBarriers,
+    uint32_t  bufferMemoryBarrierCount, const VkBufferMemoryBarrier*  pBufferMemoryBarriers,
+    uint32_t  imageMemoryBarrierCount, const VkImageMemoryBarrier*  pImageMemoryBarriers)
+{
+    bool isReady = false, bRead = isReadyForHostReadInMemoryBarriers(memoryBarrierCount, pMemoryBarriers) ||
+        isReadyForHostReadInBufferMemoryBarrier(bufferMemoryBarrierCount, pBufferMemoryBarriers) ||
+        isReadyForHostReadInImageMemoryBarrier(imageMemoryBarrierCount, pImageMemoryBarriers);
+    if (bRead || (dstStageMask&VK_PIPELINE_STAGE_HOST_BIT))
+    {
+        isReady = true;
+    }
+    return isReady;
+}
+
+#endif//page guard solution for windows
diff --git a/vktrace/src/vktrace_layer/vktrace_lib_pageguardcapture.h b/vktrace/src/vktrace_layer/vktrace_lib_pageguardcapture.h
new file mode 100644
index 0000000..411993f
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_lib_pageguardcapture.h
@@ -0,0 +1,111 @@
+/*
+* Copyright (c) 2016 Advanced Micro Devices, Inc. All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#pragma once
+
+#include <stdbool.h>
+#include <unordered_map>
+#include "vulkan/vulkan.h"
+#include "vktrace_platform.h"
+#include "vktrace_common.h"
+
+#include "vktrace_interconnect.h"
+#include "vktrace_filelike.h"
+#include "vktrace_trace_packet_utils.h"
+#include "vktrace_vk_exts.h"
+#include <stdio.h>
+
+#include "vktrace_pageguard_memorycopy.h"
+#include "vktrace_lib_pageguardmappedmemory.h"
+
+#if defined(WIN32) /// page guard solution for windows
+
+#define PAGEGUARD_TARGET_RANGE_SIZE_CONTROL
+
+//PAGEGUARD_ADD_PAGEGUARD_ON_REAL_MAPPED_MEMORY is a compile-time flag that puts the page guards on the real mapped memory.
+//If the flag is commented out, page guards are added to a copy of the mapped memory; with the flag defined, they are added to
+//the real mapped memory itself.
+//On some hardware, guarding the real mapped memory (rather than a copy) may not be allowed, so only enable this flag when you
+//are sure page guards work on the real mapping for that hardware. Guarding a copy is always allowed, but requires synchronization between the mapped memory and its copy.
+
+//#define PAGEGUARD_ADD_PAGEGUARD_ON_REAL_MAPPED_MEMORY
+
+typedef VkResult(*vkFlushMappedMemoryRangesFunc)(VkDevice device, uint32_t memoryRangeCount, const VkMappedMemoryRange*  pMemoryRanges);
+
+typedef class PageGuardCapture
+{
+private:
+    PageGuardChangedBlockInfo EmptyChangedInfoArray;
+    std::unordered_map< VkDeviceMemory, PageGuardMappedMemory > MapMemory;
+    std::unordered_map< VkDeviceMemory, PBYTE > MapMemoryPtr;
+    std::unordered_map< VkDeviceMemory, VkDeviceSize > MapMemoryOffset;
+public:
+
+    PageGuardCapture();
+
+    std::unordered_map< VkDeviceMemory, PageGuardMappedMemory >& getMapMemory();
+
+    void vkMapMemoryPageGuardHandle(VkDevice device, VkDeviceMemory memory, VkDeviceSize offset, VkDeviceSize size, VkFlags flags, void** ppData);
+
+    void vkUnmapMemoryPageGuardHandle(VkDevice device, VkDeviceMemory memory, void** MappedData, vkFlushMappedMemoryRangesFunc pFunc);
+
+    void* getMappedMemoryPointer(VkDevice device, VkDeviceMemory memory);
+
+    VkDeviceSize getMappedMemoryOffset(VkDevice device, VkDeviceMemory memory);
+
+    /// return: true if every range targets page-guarded mapped memory and no data changed at all;
+    /// PBYTE *ppPackageDataforOutOfMap must be an array of memoryRangeCount elements
+    bool vkFlushMappedMemoryRangesPageGuardHandle(VkDevice device, uint32_t memoryRangeCount, const VkMappedMemoryRange* pMemoryRanges, PBYTE *ppPackageDataforOutOfMap);
+
+    LPPageGuardMappedMemory findMappedMemoryObject(VkDevice device, VkDeviceMemory memory);
+
+    LPPageGuardMappedMemory findMappedMemoryObject(PBYTE addr, VkDeviceSize *pOffsetOfAddr = nullptr, PBYTE *ppBlock = nullptr, VkDeviceSize *pBlockSize = nullptr);
+
+    LPPageGuardMappedMemory findMappedMemoryObject(VkDevice device, const VkMappedMemoryRange* pMemoryRange);
+
+    /// get the total size of all changed-data packages for the ranges in pMemoryRanges
+    VkDeviceSize getALLChangedPackageSizeInMappedMemory(VkDevice device, uint32_t memoryRangeCount, const VkMappedMemoryRange* pMemoryRanges, PBYTE *ppPackageDataforOutOfMap);
+
+    /// get pointer and size of the changed-data package for an out-of-map range;
+    PBYTE getChangedDataPackageOutOfMap(PBYTE *ppPackageDataforOutOfMap, DWORD dwRangeIndex, VkDeviceSize  *pSize);
+
+    void clearChangedDataPackageOutOfMap(PBYTE *ppPackageDataforOutOfMap, DWORD dwRangeIndex);
+
+    bool isHostWriteFlagSetInMemoryBarriers(uint32_t  memoryBarrierCount, const VkMemoryBarrier*  pMemoryBarriers);
+
+    bool isHostWriteFlagSetInBufferMemoryBarrier(uint32_t  memoryBarrierCount, const VkBufferMemoryBarrier*  pMemoryBarriers);
+
+    bool isHostWriteFlagSetInImageMemoryBarrier(uint32_t  memoryBarrierCount, const VkImageMemoryBarrier*  pMemoryBarriers);
+
+    bool isHostWriteFlagSet(VkPipelineStageFlags srcStageMask, VkPipelineStageFlags dstStageMask, VkDependencyFlags  dependencyFlags,
+        uint32_t   memoryBarrierCount, const VkMemoryBarrier*   pMemoryBarriers,
+        uint32_t  bufferMemoryBarrierCount, const VkBufferMemoryBarrier*  pBufferMemoryBarriers,
+        uint32_t  imageMemoryBarrierCount, const VkImageMemoryBarrier*  pImageMemoryBarriers);
+
+    bool isReadyForHostReadInMemoryBarriers(uint32_t  memoryBarrierCount, const VkMemoryBarrier*  pMemoryBarriers);
+
+    bool isReadyForHostReadInBufferMemoryBarrier(uint32_t  memoryBarrierCount, const VkBufferMemoryBarrier*  pMemoryBarriers);
+
+    bool isReadyForHostReadInImageMemoryBarrier(uint32_t  memoryBarrierCount, const VkImageMemoryBarrier*  pMemoryBarriers);
+
+    bool isReadyForHostRead(VkPipelineStageFlags srcStageMask, VkPipelineStageFlags dstStageMask, VkDependencyFlags  dependencyFlags,
+        uint32_t   memoryBarrierCount, const VkMemoryBarrier*   pMemoryBarriers,
+        uint32_t  bufferMemoryBarrierCount, const VkBufferMemoryBarrier*  pBufferMemoryBarriers,
+        uint32_t  imageMemoryBarrierCount, const VkImageMemoryBarrier*  pImageMemoryBarriers);
+
+} PageGuardCapture;
+//page guard for windows end
+
+#endif//page guard solution
diff --git a/vktrace/src/vktrace_layer/vktrace_lib_pageguardmappedmemory.cpp b/vktrace/src/vktrace_layer/vktrace_lib_pageguardmappedmemory.cpp
new file mode 100644
index 0000000..2cf2c3a
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_lib_pageguardmappedmemory.cpp
@@ -0,0 +1,518 @@
+/*
+* Copyright (c) 2016 Advanced Micro Devices, Inc. All rights reserved.
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+//OPT: Optimization that uses page guards to speed up capture
+//     Capturing DOOM4 with vktrace used to be extremely slow: over half a day and a 900G trace from launch to the game menu.
+//     The cause is that DOOM frequently updates a large mapped memory block (over 67M), which vktrace copied to disk on every vkFlushMappedMemoryRanges call.
+//     Recording which pages changed via page guards and saving only those pages reduces the capture time to around 15 minutes and the trace file size to around 40G;
+//     playback of such a trace takes around 7 minutes (on a Win10/AMD Fury/32G RAM/i5 system).
+
+#include "vktrace_pageguard_memorycopy.h"
+#include "vktrace_lib_pagestatusarray.h"
+#include "vktrace_lib_pageguardmappedmemory.h"
+#include "vktrace_lib_pageguardcapture.h"
+#include "vktrace_lib_pageguard.h"
+
+#if defined(WIN32) //page guard solution for windows
+
+VkDevice& PageGuardMappedMemory::getMappedDevice()
+{
+    return MappedDevice;
+}
+
+VkDeviceMemory& PageGuardMappedMemory::getMappedMemory()
+{
+    return MappedMemory;
+}
+
+VkDeviceSize& PageGuardMappedMemory::getMappedOffset()
+{
+    return MappedOffset;
+}
+
+PBYTE& PageGuardMappedMemory::getRealMappedDataPointer()
+{
+    return pRealMappedData;
+}
+
+VkDeviceSize& PageGuardMappedMemory::getMappedSize()
+{
+    return MappedSize;
+}
+
+PageGuardMappedMemory::PageGuardMappedMemory()
+    :MappedDevice(nullptr),
+    MappedMemory((VkDeviceMemory)nullptr),
+    pRealMappedData(nullptr),
+    pChangedDataPackage(nullptr),
+    pMappedData(nullptr),
+    MappedSize(0),
+    pPageStatus(nullptr),
+    BlockConflictError(false),
+    PageGuardSize(pageguardGetSystemPageSize()),
+    PageSizeLeft(0),
+    PageGuardAmount(0)
+{
+}
+
+PageGuardMappedMemory::~PageGuardMappedMemory()
+{
+}
+
+bool PageGuardMappedMemory::isUseCopyForRealMappedMemory()
+{
+    bool useCopyForRealMappedMemory = false;
+    if (pRealMappedData)
+    {
+        useCopyForRealMappedMemory = true;
+    }
+    return useCopyForRealMappedMemory;
+}
+
+bool PageGuardMappedMemory::getChangedRangeByIndex(uint64_t index, PBYTE *pAddress, VkDeviceSize *pBlockSize)
+{
+    bool isValidResult = false;
+    if (index < PageGuardAmount)
+    {
+        isValidResult = true;
+        if ((index + 1) == PageGuardAmount)
+        {
+            if (pAddress)
+            {
+                *pAddress = pMappedData + index*PageGuardSize;
+            }
+            if (pBlockSize)
+            {
+                *pBlockSize = (SIZE_T)(PageSizeLeft ? PageSizeLeft : PageGuardSize);
+            }
+        }
+        else
+        {
+            if (pAddress)
+            {
+                *pAddress = pMappedData + index*PageGuardSize;
+            }
+            if (pBlockSize)
+            {
+                *pBlockSize = (SIZE_T)PageGuardSize;
+            }
+        }
+    }
+    return isValidResult;
+}
+
+//if the return value is (uint64_t)-1, addr lies outside the page-guarded range.
+uint64_t PageGuardMappedMemory::getIndexOfChangedBlockByAddr(PBYTE addr)
+{
+    int64_t addrOffset = addr - pMappedData;
+    uint64_t indexOfChangedBlockByAddr = -1;
+    if ((addrOffset >= 0) && (addrOffset < MappedSize))
+    {
+        indexOfChangedBlockByAddr = addrOffset / PageGuardSize;
+    }
+    else
+    {
+        assert(false);
+    }
+    return indexOfChangedBlockByAddr;
+}
+
+void PageGuardMappedMemory::setMappedBlockChanged(uint64_t index, bool changed, int which)
+{
+    if (index < PageGuardAmount)
+    {
+        switch (which)
+        {
+        case BLOCK_FLAG_ARRAY_CHANGED:
+            pPageStatus->setBlockChangedArray(index, changed);
+            break;
+
+        case BLOCK_FLAG_ARRAY_CHANGED_SNAPSHOT:
+            pPageStatus->setBlockChangedArraySnapshot(index, changed);
+            break;
+
+        case BLOCK_FLAG_ARRAY_READ_SNAPSHOT:
+            pPageStatus->setBlockReadArraySnapshot(index, changed);
+            break;
+
+        default://BLOCK_FLAG_ARRAY_READ
+            pPageStatus->setBlockReadArray(index, changed);
+            break;
+        }
+    }
+}
+
+bool PageGuardMappedMemory::isMappedBlockChanged(uint64_t index, int which)
+{
+    bool mappedBlockChanged = false;
+    if (index < PageGuardAmount)
+    {
+        switch (which)
+        {
+        case BLOCK_FLAG_ARRAY_CHANGED:
+            mappedBlockChanged = pPageStatus->getBlockChangedArray(index);
+            break;
+
+        case BLOCK_FLAG_ARRAY_CHANGED_SNAPSHOT:
+            mappedBlockChanged = pPageStatus->getBlockChangedArraySnapshot(index);
+            break;
+
+        case BLOCK_FLAG_ARRAY_READ_SNAPSHOT:
+            mappedBlockChanged = pPageStatus->getBlockReadArraySnapshot(index);
+            break;
+
+        case BLOCK_FLAG_ARRAY_READ:
+            mappedBlockChanged = pPageStatus->getBlockReadArray(index);
+            break;
+
+        default:
+            assert(false);
+            break;
+        }
+    }
+    return mappedBlockChanged;
+}
+
+uint64_t PageGuardMappedMemory::getMappedBlockSize(uint64_t index)
+{
+    uint64_t mappedBlockSize = PageGuardSize;
+    if ((index + 1) == PageGuardAmount)
+    {
+        if (PageSizeLeft)
+        {
+            mappedBlockSize = PageSizeLeft;
+        }
+    }
+    return mappedBlockSize;
+}
+
+uint64_t PageGuardMappedMemory::getMappedBlockOffset(uint64_t index)
+{
+    uint64_t mappedBlockOffset = 0;
+    if (index < PageGuardAmount)
+    {
+        mappedBlockOffset = index*PageGuardSize;
+    }
+    return mappedBlockOffset;
+}
+
+bool PageGuardMappedMemory::isNoMappedBlockChanged()
+{
+    bool noMappedBlockChanged = true;
+    for (uint64_t i = 0; i < PageGuardAmount; i++)
+    {
+        if (isMappedBlockChanged(i, BLOCK_FLAG_ARRAY_CHANGED))
+        {
+            noMappedBlockChanged = false;
+            break;
+        }
+    }
+    return noMappedBlockChanged;
+}
+
+void PageGuardMappedMemory::resetMemoryObjectAllChangedFlagAndPageGuard()
+{
+    DWORD oldProt;
+    for (uint64_t i = 0; i < PageGuardAmount; i++)
+    {
+        if (isMappedBlockChanged(i, BLOCK_FLAG_ARRAY_CHANGED_SNAPSHOT))
+        {
+            setMappedBlockChanged(i, false, BLOCK_FLAG_ARRAY_CHANGED_SNAPSHOT);
+            VirtualProtect(pMappedData + i*PageGuardSize, (SIZE_T)getMappedBlockSize(i), PAGE_READWRITE | PAGE_GUARD, &oldProt);
+        }
+    }
+}
+
+void PageGuardMappedMemory::resetMemoryObjectAllReadFlagAndPageGuard()
+{
+    DWORD oldProt;
+    backupBlockReadArraySnapshot();
+    for (uint64_t i = 0; i < PageGuardAmount; i++)
+    {
+        if (isMappedBlockChanged(i, BLOCK_FLAG_ARRAY_READ_SNAPSHOT))
+        {
+            setMappedBlockChanged(i, false, BLOCK_FLAG_ARRAY_READ_SNAPSHOT);
+            VirtualProtect(pMappedData + i*PageGuardSize, (SIZE_T)getMappedBlockSize(i), PAGE_READWRITE | PAGE_GUARD, &oldProt);
+        }
+    }
+}
+
+bool PageGuardMappedMemory::setAllPageGuardAndFlag(bool bSetPageGuard, bool bSetBlockChanged)
+{
+    bool setSuccessfully = true;
+    DWORD oldProt, dwErr;
+    DWORD dwMemSetting = bSetPageGuard ? (PAGE_READWRITE | PAGE_GUARD) : PAGE_READWRITE;
+
+    for (uint64_t i = 0; i < PageGuardAmount; i++)
+    {
+        setMappedBlockChanged(i, bSetBlockChanged, BLOCK_FLAG_ARRAY_CHANGED);
+        if (!VirtualProtect(pMappedData + i*PageGuardSize, (SIZE_T)getMappedBlockSize(i), dwMemSetting, &oldProt))
+        {
+            dwErr = GetLastError();
+            setSuccessfully = false;
+        }
+        else
+        {
+            dwErr = GetLastError();
+        }
+    }
+    return setSuccessfully;
+}
+
+bool PageGuardMappedMemory::vkMapMemoryPageGuardHandle(VkDevice device, VkDeviceMemory memory, VkDeviceSize offset, VkDeviceSize size, VkFlags flags, void** ppData)
+{
+    bool handleSuccessfully = true;
+    MappedDevice = device;
+    MappedMemory = memory;
+    MappedOffset = offset;
+#ifndef PAGEGUARD_ADD_PAGEGUARD_ON_REAL_MAPPED_MEMORY
+    pRealMappedData = (PBYTE)*ppData;
+    pMappedData = (PBYTE)pageguardAllocateMemory(size);
+    vktrace_pageguard_memcpy(pMappedData, pRealMappedData, size);
+    *ppData = pMappedData;
+#else
+    pMappedData = (PBYTE)*ppData;
+#endif
+    MappedSize = size;
+
+    setPageGuardExceptionHandler();
+
+    PageSizeLeft = size % PageGuardSize;
+    PageGuardAmount = size / PageGuardSize;
+    if (PageSizeLeft != 0)
+    {
+        PageGuardAmount++;
+    }
+    pPageStatus = new PageStatusArray(PageGuardAmount);
+    assert(pPageStatus);
+    if (!setAllPageGuardAndFlag(true, false))
+    {
+        handleSuccessfully = false;
+    }
+    return handleSuccessfully;
+}
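+
+//Example: with size = 10000 and PageGuardSize = 4096, the code above yields
+//PageSizeLeft = 1808 and PageGuardAmount = 3 (two full pages plus one
+//1808-byte tail block).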
+
+void PageGuardMappedMemory::vkUnmapMemoryPageGuardHandle(VkDevice device, VkDeviceMemory memory, void** MappedData)
+{
+    if ((memory == MappedMemory) && (device == MappedDevice))
+    {
+        setAllPageGuardAndFlag(false, false);
+        removePageGuardExceptionHandler();
+        clearChangedDataPackage();
+#ifndef PAGEGUARD_ADD_PAGEGUARD_ON_REAL_MAPPED_MEMORY
+        if (MappedData == nullptr)
+        {
+            vktrace_pageguard_memcpy(pRealMappedData, pMappedData, MappedSize);
+            pageguardFreeMemory(pMappedData);
+        }
+        else
+        {
+            vktrace_pageguard_memcpy(pRealMappedData, pMappedData, MappedSize);
+            *MappedData = pMappedData;
+        }
+        pRealMappedData = nullptr;
+        pMappedData = nullptr;
+#else
+        pMappedData = nullptr;
+        if (MappedData != nullptr)
+        {
+            *MappedData = nullptr;
+        }
+#endif
+        delete pPageStatus;
+        pPageStatus = nullptr;
+        MappedMemory =(VkDeviceMemory)nullptr;
+        MappedSize = 0;
+    }
+}
+
+void PageGuardMappedMemory::backupBlockChangedArraySnapshot()
+{
+    pPageStatus->backupChangedArray();
+}
+
+void PageGuardMappedMemory::backupBlockReadArraySnapshot()
+{
+    pPageStatus->backupReadArray();
+}
+
+DWORD PageGuardMappedMemory::getChangedBlockAmount(int useWhich)
+{
+    DWORD dwAmount = 0;
+    for (int i = 0; i < PageGuardAmount; i++)
+    {
+        if (isMappedBlockChanged(i, useWhich))
+        {
+            dwAmount++;
+        }
+    }
+    return dwAmount;
+}
+
+//does RangeLimit fully or partly cover Range
+bool PageGuardMappedMemory::isRangeIncluded(VkDeviceSize RangeOffsetLimit, VkDeviceSize RangeSizeLimit, VkDeviceSize RangeOffset, VkDeviceSize RangeSize)
+{
+    bool rangeIncluded = false;
+    if ((RangeOffsetLimit <= RangeOffset) &&
+        ((RangeOffsetLimit + RangeSizeLimit)>RangeOffset))
+    {
+        rangeIncluded = true;
+    }
+    else
+    {
+        if ((RangeOffsetLimit<(RangeOffset + RangeSize)) &&
+            ((RangeOffsetLimit + RangeSizeLimit) >= (RangeOffset + RangeSize)))
+        {
+            rangeIncluded = true;
+        }
+
+    }
+    return rangeIncluded;
+}
+
+//for output:
+//if pData != nullptr, pData + DataOffset is the head address of an array of PageGuardChangedBlockInfo; element [0] holds the changed block amount (offset) and the total size of all changed blocks (length), followed by block1 offset, block1 size, ...;
+//              each block offset is that changed block's offset from the head of the mapped memory, and the info array is followed by the changed blocks' data
+//
+//if pData == nullptr, only the sizes are computed
+//DWORD *pdwSaveSize, the total size of all changed blocks
+//DWORD *pInfoSize, the size of the PageGuardChangedBlockInfo array
+//VkDeviceSize RangeOffset, RangeSize, only consider blocks inside the range that starts at RangeOffset with size RangeSize; if RangeOffset < 0, consider the whole mapped memory
+//return the amount of changed blocks.
+DWORD PageGuardMappedMemory::getChangedBlockInfo(VkDeviceSize RangeOffset, VkDeviceSize RangeSize, DWORD *pdwSaveSize, DWORD *pInfoSize, PBYTE pData, DWORD DataOffset, int useWhich)
+{
+    DWORD dwAmount = getChangedBlockAmount(useWhich), dwIndex = 0, offset = 0;
+    DWORD infosize = sizeof(PageGuardChangedBlockInfo)*(dwAmount + 1), SaveSize = 0, CurrentBlockSize = 0;
+    PBYTE pChangedData;
+    PageGuardChangedBlockInfo *pChangedInfoArray = (PageGuardChangedBlockInfo *)(pData ? (pData + DataOffset) : nullptr);
+
+    if (pInfoSize)
+    {
+        *pInfoSize = infosize;
+    }
+    for (int i = 0; i < PageGuardAmount; i++)
+    {
+        CurrentBlockSize = getMappedBlockSize(i);
+        offset = getMappedBlockOffset(i);
+        if (isMappedBlockChanged(i, useWhich))
+        {
+            if (pChangedInfoArray)
+            {
+                pChangedInfoArray[dwIndex + 1].offset = offset;
+                pChangedInfoArray[dwIndex + 1].length = CurrentBlockSize;
+                pChangedInfoArray[dwIndex + 1].reserve0 = 0;
+                pChangedInfoArray[dwIndex + 1].reserve1 = 0;
+                pChangedData = pData + DataOffset + infosize + SaveSize;
+                vktrace_pageguard_memcpy(pChangedData, pMappedData + offset, CurrentBlockSize);
+            }
+            SaveSize += CurrentBlockSize;
+            dwIndex++;
+        }
+    }
+    if (pChangedInfoArray)
+    {
+        pChangedInfoArray[0].offset = dwAmount;
+        pChangedInfoArray[0].length = SaveSize;
+    }
+    if (pdwSaveSize)
+    {
+        *pdwSaveSize = SaveSize;
+    }
+    return dwAmount;
+}
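+
+//Changed-data package layout produced above (illustrative, two changed blocks):
+//
+//    PageGuardChangedBlockInfo[0] : offset = 2 (changed block count), length = total changed bytes
+//    PageGuardChangedBlockInfo[1] : offset/length of the first changed block
+//    PageGuardChangedBlockInfo[2] : offset/length of the second changed block
+//    <first block's data><second block's data>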
+
+//return: whether the memory has already changed;
+//        even if nothing in the memory changed, memory is still allocated for an info array containing a single PageGuardChangedBlockInfo whose offset and length are both 0;
+// VkDeviceSize       *pChangedSize, the total size of all changed data blocks
+// VkDeviceSize       *pDataPackageSize, the size of the ChangedDataPackage
+// PBYTE              *ppChangedDataPackage, an array of PageGuardChangedBlockInfo followed by all changed data; element [0]'s offset is the changed block amount and its length is the total size of those blocks (not including the info array), then block1 offset, block1 size, ...;
+//                     the needed memory is allocated inside this method and *ppChangedDataPackage receives the pointer; if ppChangedDataPackage == nullptr, only the sizes are calculated
+bool PageGuardMappedMemory::vkFlushMappedMemoryRangePageGuardHandle(
+    VkDevice           device,
+    VkDeviceMemory     memory,
+    VkDeviceSize       offset,
+    VkDeviceSize       size,
+    VkDeviceSize       *pChangedSize,
+    VkDeviceSize       *pDataPackageSize,
+    PBYTE              *ppChangedDataPackage)
+{
+    bool handleSuccessfully = false;
+    DWORD dwSaveSize, InfoSize;
+    backupBlockChangedArraySnapshot();
+    DWORD dwAmount = getChangedBlockInfo(offset, size, &dwSaveSize, &InfoSize, nullptr, 0, BLOCK_FLAG_ARRAY_CHANGED_SNAPSHOT); //get the info size and size of changed blocks
+    if ((dwSaveSize != 0))
+    {
+        handleSuccessfully = true;
+    }
+    if (pChangedSize)
+    {
+        *pChangedSize = dwSaveSize;
+    }
+    if (pDataPackageSize)
+    {
+        *pDataPackageSize = dwSaveSize + InfoSize;
+    }
+    pChangedDataPackage = (PBYTE)pageguardAllocateMemory(dwSaveSize + InfoSize);
+    getChangedBlockInfo(offset, size, &dwSaveSize, &InfoSize, pChangedDataPackage, 0, BLOCK_FLAG_ARRAY_CHANGED_SNAPSHOT);
+
+    //if use copy of real mapped memory, need copy back to real mapped memory
+#ifndef PAGEGUARD_ADD_PAGEGUARD_ON_REAL_MAPPED_MEMORY
+    PageGuardChangedBlockInfo *pChangedInfoArray = (PageGuardChangedBlockInfo *)pChangedDataPackage;
+    if (pChangedInfoArray[0].length)
+    {
+        PBYTE pChangedData = (PBYTE)pChangedDataPackage + sizeof(PageGuardChangedBlockInfo)*(pChangedInfoArray[0].offset + 1);
+        DWORD CurrentOffset = 0;
+        for (DWORD i = 0; i < pChangedInfoArray[0].offset; i++)
+        {
+            vktrace_pageguard_memcpy(pRealMappedData + pChangedInfoArray[i + 1].offset, pChangedData + CurrentOffset, (size_t)pChangedInfoArray[i + 1].length);
+            CurrentOffset += pChangedInfoArray[i + 1].length;
+        }
+    }
+#endif
+
+    if (ppChangedDataPackage)
+    {
+        //register the changed package
+        *ppChangedDataPackage = pChangedDataPackage;
+    }
+    return handleSuccessfully;
+}
+
+void PageGuardMappedMemory::clearChangedDataPackage()
+{
+    if (pChangedDataPackage)
+    {
+        pageguardFreeMemory(pChangedDataPackage);
+        pChangedDataPackage = nullptr;
+    }
+}
+
+//get pointer and size of the changed-data package;
+PBYTE PageGuardMappedMemory::getChangedDataPackage(VkDeviceSize  *pSize)
+{
+    PBYTE pResultDataPackage = nullptr;
+    if (pChangedDataPackage)
+    {
+        pResultDataPackage = pChangedDataPackage;
+        PageGuardChangedBlockInfo *pChangedInfoArray = reinterpret_cast<PageGuardChangedBlockInfo *>(pChangedDataPackage);
+        if (pSize)
+        {
+            *pSize = sizeof(PageGuardChangedBlockInfo)*(pChangedInfoArray[0].offset + 1) + pChangedInfoArray[0].length;
+        }
+    }
+    return pResultDataPackage;
+}
+
+#endif//page guard solution for windows
diff --git a/vktrace/src/vktrace_layer/vktrace_lib_pageguardmappedmemory.h b/vktrace/src/vktrace_layer/vktrace_lib_pageguardmappedmemory.h
new file mode 100644
index 0000000..0e091cc
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_lib_pageguardmappedmemory.h
@@ -0,0 +1,134 @@
+/*
+* Copyright (c) 2016 Advanced Micro Devices, Inc. All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#pragma once
+
+#include <stdbool.h>
+#include <unordered_map>
+#include "vulkan/vulkan.h"
+#include "vktrace_platform.h"
+#include "vktrace_common.h"
+
+#include "vktrace_interconnect.h"
+#include "vktrace_filelike.h"
+#include "vktrace_trace_packet_utils.h"
+#include "vktrace_vk_exts.h"
+#include <stdio.h>
+
+#include "vktrace_pageguard_memorycopy.h"
+#include "vktrace_lib_pagestatusarray.h"
+
+#if defined(WIN32) /// page guard solution for windows
+
+typedef class PageGuardMappedMemory
+{
+    friend class PageGuardCapture;
+
+private:
+    VkDevice MappedDevice;
+    VkDeviceMemory MappedMemory;
+    VkDeviceSize MappedOffset;
+    PBYTE pMappedData; /// points to the mapped memory seen by the app; if pRealMappedData == nullptr, pMappedData points to the real mapped memory,
+    /// if pRealMappedData != nullptr, pMappedData points to a copy of the real mapped memory to which page guards can be added
+    PBYTE pRealMappedData; /// points to the real mapped memory in the app process
+    PBYTE pChangedDataPackage; /// if not nullptr, points to a package holding the changed-info array and the changed data blocks; allocated by this class
+    VkDeviceSize MappedSize; /// the size of the mapped range
+
+    VkDeviceSize PageGuardSize; /// size of one page-guard block
+
+protected:
+    PageStatusArray *pPageStatus;
+    bool BlockConflictError; /// records whether any block has been both read and written by the host
+    VkDeviceSize PageSizeLeft;
+    uint64_t PageGuardAmount;
+
+public:
+    PageGuardMappedMemory();
+    ~PageGuardMappedMemory();
+
+    VkDevice& getMappedDevice();
+
+    VkDeviceMemory& getMappedMemory();
+
+    VkDeviceSize& getMappedOffset();
+
+    PBYTE& getRealMappedDataPointer(); /// get pointer to real mapped memory in app process
+
+    VkDeviceSize& getMappedSize(); /// get the size of range
+
+    bool isUseCopyForRealMappedMemory();
+
+    /// get the start address and size of the block located by the given index
+    bool getChangedRangeByIndex(uint64_t index, PBYTE *paddr, VkDeviceSize *pBlockSize);
+
+    /// get the index of the block which contains addr (note: a negative return was meant to signal that addr is outside the page guard, but the return type is unsigned, so callers cannot test for < 0)
+    uint64_t getIndexOfChangedBlockByAddr(PBYTE addr);
+
+    void setMappedBlockChanged(uint64_t index, bool bChanged, int useWhich);
+
+    bool isMappedBlockChanged(uint64_t index, int useWhich);
+
+    uint64_t getMappedBlockSize(uint64_t index);
+
+    uint64_t getMappedBlockOffset(uint64_t index);
+
+    bool isNoMappedBlockChanged();
+
+    void resetMemoryObjectAllChangedFlagAndPageGuard();
+
+    void resetMemoryObjectAllReadFlagAndPageGuard();
+
+    bool setAllPageGuardAndFlag(bool bSetPageGuard, bool bSetBlockChanged);
+
+    bool vkMapMemoryPageGuardHandle(VkDevice device, VkDeviceMemory memory, VkDeviceSize offset, VkDeviceSize size, VkFlags flags, void** ppData);
+
+    void vkUnmapMemoryPageGuardHandle(VkDevice device, VkDeviceMemory memory, void** MappedData);
+
+    void backupBlockChangedArraySnapshot();
+
+    void backupBlockReadArraySnapshot();
+
+    DWORD getChangedBlockAmount(int useWhich);
+
+    /// does the limit range [RangeOffsetLimit, RangeOffsetLimit+RangeSizeLimit) fully or partly cover the given range
+    bool isRangeIncluded(VkDeviceSize RangeOffsetLimit, VkDeviceSize RangeSizeLimit, VkDeviceSize RangeOffset, VkDeviceSize RangeSize);
+
+    /// for output:
+    /// if pData!=nullptr, pData + DataOffset is the start address of an array of PageGuardChangedBlockInfo;
+    ///     element [0] holds the changed-block count in its offset field and the total size of all changed blocks in its length field,
+    ///     followed by {block1 offset, block1 size}, {block2 offset, block2 size}, ...;
+    ///     each block offset is relative to the start of the mapped memory, and the array is followed by the changed blocks' data
+    /// if pData==nullptr, only the sizes are computed
+    /// DWORD *pdwSaveSize: the size of all changed blocks
+    /// DWORD *pInfoSize: the size of the PageGuardChangedBlockInfo array
+    /// VkDeviceSize RangeOffset, RangeSize: only consider blocks inside the range starting at RangeOffset with size RangeSize; if RangeOffset<0, consider the whole mapped memory
+    /// return the number of changed blocks.
+    DWORD getChangedBlockInfo(VkDeviceSize RangeOffset, VkDeviceSize RangeSize, DWORD *pdwSaveSize, DWORD *pInfoSize, PBYTE pData, DWORD DataOffset, int useWhich = BLOCK_FLAG_ARRAY_CHANGED);
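+    /// As an illustration (the values here are hypothetical): if two 0x1000-byte blocks at
+    /// mapped-memory offsets 0x0 and 0x3000 have changed, the output at pData + DataOffset is
+    ///     info[0] = { offset = 2 (block count), length = 0x2000 (total changed bytes) }
+    ///     info[1] = { offset = 0x0,    length = 0x1000 }
+    ///     info[2] = { offset = 0x3000, length = 0x1000 }
+    /// followed by the 0x2000 bytes of changed data, block by block.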
+
+    /// return: whether the memory has changed;
+    ///         even if the memory is unchanged, it still allocates an info array holding a single PageGuardChangedBlockInfo whose offset and length are both 0;
+    /// VkDeviceSize       *pChangedSize: the total size of all changed data blocks
+    /// VkDeviceSize       *pDataPackageSize: the size of the ChangedDataPackage
+    /// PBYTE              *ppChangedDataPackage: an array of PageGuardChangedBlockInfo followed by all changed data; element [0]'s offset is the block count and its length is the total size of those blocks (not including this info array), then {block1 offset, block1 size}, ...;
+    ///                     the method allocates the needed memory and returns the pointer through ppChangedDataPackage; if ppChangedDataPackage==nullptr, only the sizes are calculated
+    bool vkFlushMappedMemoryRangePageGuardHandle(VkDevice device, VkDeviceMemory memory, VkDeviceSize offset, VkDeviceSize size, VkDeviceSize  *pChangedSize, VkDeviceSize  *pDataPackageSize, PBYTE   *ppChangedDataPackage);
+
+    void clearChangedDataPackage();
+
+    /// get pointer and size of the changed data package (OPTChangedDataPackage)
+    PBYTE getChangedDataPackage(VkDeviceSize  *pSize);
+} PageGuardMappedMemory, *LPPageGuardMappedMemory;
+
+//page guard for windows end
+#endif//page guard solution
diff --git a/vktrace/src/vktrace_layer/vktrace_lib_pagestatusarray.cpp b/vktrace/src/vktrace_layer/vktrace_lib_pagestatusarray.cpp
new file mode 100644
index 0000000..547c84d
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_lib_pagestatusarray.cpp
@@ -0,0 +1,163 @@
+/*
+* Copyright (c) 2016 Advanced Micro Devices, Inc. All rights reserved.
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+//OPT: Optimization that uses page guards to speed up capture.
+//     Capturing DOOM4 with vktrace used to be extremely slow: it took over half a day and over 900G of trace to get from startup to the game menu.
+//     The cause is that DOOM frequently updates a large mapped memory block (over 67M), and vktrace copied the whole block to the hard drive every time DOOM called vkFlushMappedMemoryRanges.
+//     Here we use page guards to record which pages of the large memory block have changed and save only those pages. This reduces capture time to around 15 minutes with a trace file of around 40G;
+//     playback of such a trace takes around 7 minutes (on a Win10/AMD Fury/32G RAM/i5 system).
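+// A minimal sketch of the underlying Win32 mechanism (an assumption here; the actual
+// implementation lives in the pageguard capture sources): each tracked page is armed with
+// VirtualProtect(pPage, pageSize, PAGE_READWRITE | PAGE_GUARD, &oldProt). The first touch of
+// an armed page raises STATUS_GUARD_PAGE_VIOLATION, which Windows delivers to a vectored
+// exception handler (LONG CALLBACK handler(PEXCEPTION_POINTERS)) and then automatically
+// removes the guard, so the handler only has to mark that page dirty and let the write proceed.
+// Only pages marked dirty this way need to be serialized on the next flush.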
+#include <atomic>
+#include "vktrace_lib_pagestatusarray.h"
+
+#if defined(WIN32) //page guard solution for windows
+
+const uint64_t PageStatusArray::PAGE_FLAG_AMOUNT_PER_BYTE = 1;
+const uint64_t PageStatusArray::PAGE_NUMBER_FROM_BIT_SHIFT = 0;
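+// PAGE_FLAG_AMOUNT_PER_BYTE flags are packed into each status byte, and
+// PAGE_NUMBER_FROM_BIT_SHIFT converts a page index to a byte index. With the current
+// values every page gets its own byte, so for page i the accessors below compute:
+//     byte = i >> 0;        // byte index equals page index
+//     bit  = 1 << (i % 1);  // always bit 0
+// A denser layout of 8 flags per byte would use PAGE_FLAG_AMOUNT_PER_BYTE = 8 and
+// PAGE_NUMBER_FROM_BIT_SHIFT = 3.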
+
+PageStatusArray::PageStatusArray(uint64_t pageCount)
+{
+    ByteCount = (pageCount % PAGE_FLAG_AMOUNT_PER_BYTE) ? (pageCount / PAGE_FLAG_AMOUNT_PER_BYTE) + 1 : pageCount / PAGE_FLAG_AMOUNT_PER_BYTE;
+
+    pChangedArray[0] = new uint8_t[ByteCount];
+    assert(pChangedArray[0]);
+
+    pChangedArray[1] = new uint8_t[ByteCount];
+    assert(pChangedArray[1]);
+
+    pReadArray[0] = new uint8_t[ByteCount];
+    assert(pReadArray[0]);
+
+    pReadArray[1] = new uint8_t[ByteCount];
+    assert(pReadArray[1]);
+
+    activeChangesArray = pChangedArray[0];
+    capturedChangesArray = pChangedArray[1];
+    activeReadArray = pReadArray[0];
+    capturedReadArray = pReadArray[1];
+
+    clearAll();
+}
+
+PageStatusArray::~PageStatusArray()
+{
+    delete[] pChangedArray[0];
+    delete[] pChangedArray[1];
+    delete[] pReadArray[0];
+    delete[] pReadArray[1];
+}
+
+void PageStatusArray::toggleChangedArray()
+{
+    //TODO use atomic exchange
+    uint8_t *tempArray = activeChangesArray;
+    activeChangesArray = capturedChangesArray;
+    capturedChangesArray = tempArray;
+}
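+// One possible lock-free form of the swap above, assuming the two pointers were changed
+// to std::atomic<uint8_t *> members (<atomic> is already included for this purpose):
+//     capturedChangesArray = activeChangesArray.exchange(capturedChangesArray);
+// exchange() stores the captured pointer into the active slot and returns the previous
+// active pointer in one atomic step, so the guard handler never sees a torn swap.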
+
+void PageStatusArray::toggleReadArray()
+{
+    //TODO use atomic exchange
+    uint8_t *tempArray = activeReadArray;
+    activeReadArray = capturedReadArray;
+    capturedReadArray = tempArray;
+}
+
+bool PageStatusArray::getBlockChangedArray(uint64_t index)
+{
+    return activeChangesArray[index >> PAGE_NUMBER_FROM_BIT_SHIFT] & (1 << (index % PAGE_FLAG_AMOUNT_PER_BYTE));
+}
+
+bool PageStatusArray::getBlockChangedArraySnapshot(uint64_t index)
+{
+    return capturedChangesArray[index >> PAGE_NUMBER_FROM_BIT_SHIFT] & (1 << (index % PAGE_FLAG_AMOUNT_PER_BYTE));
+}
+
+bool PageStatusArray::getBlockReadArray(uint64_t index)
+{
+    return activeReadArray[index >> PAGE_NUMBER_FROM_BIT_SHIFT] & (1 << (index % PAGE_FLAG_AMOUNT_PER_BYTE));
+}
+
+bool PageStatusArray::getBlockReadArraySnapshot(uint64_t index)
+{
+    return capturedReadArray[index >> PAGE_NUMBER_FROM_BIT_SHIFT] & (1 << (index % PAGE_FLAG_AMOUNT_PER_BYTE));
+}
+
+void PageStatusArray::setBlockChangedArray(uint64_t index, bool changed)
+{
+    if (changed)
+    {
+        activeChangesArray[index >> PAGE_NUMBER_FROM_BIT_SHIFT] |= (1 << (index % PAGE_FLAG_AMOUNT_PER_BYTE));
+    }
+    else
+    {
+        activeChangesArray[index >> PAGE_NUMBER_FROM_BIT_SHIFT] &= ~((uint8_t)(1 << (index % PAGE_FLAG_AMOUNT_PER_BYTE)));
+    }
+}
+
+void PageStatusArray::setBlockChangedArraySnapshot(uint64_t index, bool changed)
+{
+    if (changed)
+    {
+        capturedChangesArray[index >> PAGE_NUMBER_FROM_BIT_SHIFT] |= (1 << (index % PAGE_FLAG_AMOUNT_PER_BYTE));
+    }
+    else
+    {
+        capturedChangesArray[index >> PAGE_NUMBER_FROM_BIT_SHIFT] &= ~((uint8_t)(1 << (index % PAGE_FLAG_AMOUNT_PER_BYTE)));
+    }
+}
+
+void PageStatusArray::setBlockReadArray(uint64_t index, bool changed)
+{
+    if (changed)
+    {
+        activeReadArray[index >> PAGE_NUMBER_FROM_BIT_SHIFT] |= (1 << (index % PAGE_FLAG_AMOUNT_PER_BYTE));
+    }
+    else
+    {
+        activeReadArray[index >> PAGE_NUMBER_FROM_BIT_SHIFT] &= ~((uint8_t)(1 << (index % PAGE_FLAG_AMOUNT_PER_BYTE)));
+    }
+}
+
+void PageStatusArray::setBlockReadArraySnapshot(uint64_t index, bool changed)
+{
+    if (changed)
+    {
+        capturedReadArray[index >> PAGE_NUMBER_FROM_BIT_SHIFT] |= (1 << (index % PAGE_FLAG_AMOUNT_PER_BYTE));
+    }
+    else
+    {
+        capturedReadArray[index >> PAGE_NUMBER_FROM_BIT_SHIFT] &= ~((uint8_t)(1 << (index % PAGE_FLAG_AMOUNT_PER_BYTE)));
+    }
+}
+
+void PageStatusArray::backupChangedArray()
+{
+    toggleChangedArray();
+}
+
+void PageStatusArray::backupReadArray()
+{
+    toggleReadArray();
+}
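+// A sketch of the intended snapshot protocol (the real sequencing lives in
+// PageGuardMappedMemory): the guard handler keeps updating the active arrays while a
+// flush is being captured, so the flush first swaps in a stable snapshot:
+//     pPageStatus->backupChangedArray();                 // freeze the current flags
+//     // ... read getBlockChangedArraySnapshot(i) while serializing each block ...
+//     // ... setBlockChangedArraySnapshot(i, false) as blocks are saved ...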
+
+void PageStatusArray::clearAll()
+{
+    memset(activeChangesArray, 0, ByteCount);
+    memset(capturedChangesArray, 0, ByteCount);
+    memset(activeReadArray, 0, ByteCount);
+    memset(capturedReadArray, 0, ByteCount);
+}
+
+#endif//page guard solution for windows
diff --git a/vktrace/src/vktrace_layer/vktrace_lib_pagestatusarray.h b/vktrace/src/vktrace_layer/vktrace_lib_pagestatusarray.h
new file mode 100644
index 0000000..14c9c9f
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_lib_pagestatusarray.h
@@ -0,0 +1,71 @@
+/*
+* Copyright (c) 2016 Advanced Micro Devices, Inc. All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#pragma once
+
+#include <stdbool.h>
+#include <unordered_map>
+#include "vulkan/vulkan.h"
+#include "vktrace_platform.h"
+#include "vktrace_common.h"
+
+#include "vktrace_interconnect.h"
+#include "vktrace_filelike.h"
+#include "vktrace_trace_packet_utils.h"
+#include "vktrace_vk_exts.h"
+#include <stdio.h>
+
+
+
+#if defined(WIN32) /// page guard solution for windows
+
+static const int BLOCK_FLAG_ARRAY_CHANGED = 0;
+static const int BLOCK_FLAG_ARRAY_CHANGED_SNAPSHOT = 1;
+static const int BLOCK_FLAG_ARRAY_READ = 2;
+static const int BLOCK_FLAG_ARRAY_READ_SNAPSHOT = 3;
+
+typedef class PageStatusArray
+{
+public:
+    PageStatusArray(uint64_t pageCount);
+    ~PageStatusArray();
+
+    void toggleChangedArray();
+    void toggleReadArray();
+
+    bool getBlockChangedArray(uint64_t index);
+    bool getBlockChangedArraySnapshot(uint64_t index);
+    bool getBlockReadArray(uint64_t index);
+    bool getBlockReadArraySnapshot(uint64_t index);
+    void setBlockChangedArray(uint64_t index, bool changed);
+    void setBlockChangedArraySnapshot(uint64_t index, bool changed);
+    void setBlockReadArray(uint64_t index, bool changed);
+    void setBlockReadArraySnapshot(uint64_t index, bool changed);
+    void backupChangedArray();
+    void backupReadArray();
+    void clearAll();
+private:
+    const static uint64_t PAGE_FLAG_AMOUNT_PER_BYTE;
+    const static uint64_t PAGE_NUMBER_FROM_BIT_SHIFT;
+    uint64_t ByteCount;
+    uint8_t *activeChangesArray;
+    uint8_t *capturedChangesArray;
+    uint8_t *activeReadArray;
+    uint8_t *capturedReadArray;
+    uint8_t *pChangedArray[2]; /// two arrays: one lets the page guard handler record which blocks have changed since vkMapMemory or the last vkFlushMappedMemoryRanges; the other is used to flush the data and reset the page guard
+    uint8_t *pReadArray[2]; /// two arrays: one lets the page guard handler record which blocks the host has read since vkMapMemory, the last vkInvalidateMappedMemoryRanges, or a vkCmdPipelineBarrier with specific parameters; the other is used to reset the page guard
+} PageStatusArray;
+
+#endif//page guard solution
diff --git a/vktrace/src/vktrace_layer/vktrace_lib_trace.cpp b/vktrace/src/vktrace_layer/vktrace_lib_trace.cpp
new file mode 100644
index 0000000..9f652c2
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_lib_trace.cpp
@@ -0,0 +1,2455 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Tobin Ehlis <tobin@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ * Author: Mark Lobodzinski <mark@lunarg.com>
+ */
+#include <stdbool.h>
+#include <unordered_map>
+#include "vktrace_vk_vk.h"
+#include "vulkan/vulkan.h"
+#include "vulkan/vk_layer.h"
+#include "vktrace_platform.h"
+#include "vk_dispatch_table_helper.h"
+#include "vktrace_common.h"
+#include "vktrace_lib_helpers.h"
+
+#include "vktrace_interconnect.h"
+#include "vktrace_filelike.h"
+#include "vktrace_trace_packet_utils.h"
+#include "vktrace_vk_exts.h"
+#include <stdio.h>
+
+#include "vktrace_pageguard_memorycopy.h"
+#include "vktrace_lib_pagestatusarray.h"
+#include "vktrace_lib_pageguardmappedmemory.h"
+#include "vktrace_lib_pageguardcapture.h"
+#include "vktrace_lib_pageguard.h"
+
+// declared as extern in vktrace_lib_helpers.h
+VKTRACE_CRITICAL_SECTION g_memInfoLock;
+VKMemInfo g_memInfo = {0, NULL, NULL, 0};
+
+
+std::unordered_map<void *, layer_device_data *> g_deviceDataMap;
+std::unordered_map<void *, layer_instance_data *> g_instanceDataMap;
+
+
+layer_instance_data *mid(void *object)
+{
+    dispatch_key key = get_dispatch_key(object);
+    std::unordered_map<void *, layer_instance_data *>::const_iterator got;
+    got = g_instanceDataMap.find(key);
+    assert(got != g_instanceDataMap.end());
+    return got->second;
+}
+
+layer_device_data *mdd(void* object)
+{
+    dispatch_key key = get_dispatch_key(object);
+    std::unordered_map<void *, layer_device_data *>::const_iterator got;
+    got = g_deviceDataMap.find(key);
+    assert(got != g_deviceDataMap.end());
+    return got->second;
+}
+
+static layer_instance_data *initInstanceData(
+                                    VkInstance instance,
+                                    const PFN_vkGetInstanceProcAddr gpa,
+                                    std::unordered_map<void *, layer_instance_data *> &map)
+{
+    layer_instance_data *pTable;
+    assert(instance);
+    dispatch_key key = get_dispatch_key(instance);
+
+    std::unordered_map<void *, layer_instance_data *>::const_iterator it = map.find(key);
+    if (it == map.end())
+    {
+        pTable =  new layer_instance_data();
+        map[key] = pTable;
+    } else
+    {
+        return it->second;
+    }
+
+    // TODO: Convert to new init method
+    layer_init_instance_dispatch_table(instance, &pTable->instTable, gpa);
+
+    return pTable;
+}
+
+static layer_device_data *initDeviceData(
+        VkDevice device,
+        const PFN_vkGetDeviceProcAddr gpa,
+        std::unordered_map<void *, layer_device_data *> &map)
+{
+    layer_device_data *pTable;
+    dispatch_key key = get_dispatch_key(device);
+
+    std::unordered_map<void *, layer_device_data *>::const_iterator it = map.find(key);
+    if (it == map.end())
+    {
+        pTable =  new layer_device_data();
+        map[key] = pTable;
+    } else
+    {
+        return it->second;
+    }
+
+    layer_init_device_dispatch_table(device, &pTable->devTable, gpa);
+
+    return pTable;
+}
+
+/*
+ * This function will return the pNext pointer of any
+ * CreateInfo extensions that are not loader extensions.
+ * This is used to skip past the loader extensions prepended
+ * to the list during CreateInstance and CreateDevice.
+ */
+void *strip_create_extensions(const void *pNext)
+{
+    VkLayerInstanceCreateInfo *create_info = (VkLayerInstanceCreateInfo *) pNext;
+
+    while (create_info && (create_info->sType == VK_STRUCTURE_TYPE_LOADER_INSTANCE_CREATE_INFO ||
+           create_info->sType == VK_STRUCTURE_TYPE_LOADER_DEVICE_CREATE_INFO)) {
+        create_info = (VkLayerInstanceCreateInfo *) create_info->pNext;
+    }
+
+    return create_info;
+}
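+// For example (a hypothetical chain): if pNext links {loader link info} -> {loader device
+// info} -> {app extension struct}, this returns the pointer to {app extension struct}.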
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkAllocateMemory(
+    VkDevice device,
+    const VkMemoryAllocateInfo* pAllocateInfo,
+    const VkAllocationCallbacks* pAllocator,
+    VkDeviceMemory* pMemory)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkAllocateMemory* pPacket = NULL;
+    CREATE_TRACE_PACKET(vkAllocateMemory, get_struct_chain_size((void*)pAllocateInfo) + sizeof(VkAllocationCallbacks) + sizeof(VkDeviceMemory));
+    result = mdd(device)->devTable.AllocateMemory(device, pAllocateInfo, pAllocator, pMemory);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkAllocateMemory(pHeader);
+    pPacket->device = device;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocateInfo), sizeof(VkMemoryAllocateInfo), pAllocateInfo);
+    add_alloc_memory_to_trace_packet(pHeader, (void**)&(pPacket->pAllocateInfo->pNext), pAllocateInfo->pNext);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pMemory), sizeof(VkDeviceMemory), pMemory);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocateInfo));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pMemory));
+    FINISH_TRACE_PACKET();
+    // begin custom code
+    add_new_handle_to_mem_info(*pMemory, pAllocateInfo->allocationSize, NULL);
+    // end custom code
+    return result;
+}
+
+
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkMapMemory(
+    VkDevice device,
+    VkDeviceMemory memory,
+    VkDeviceSize offset,
+    VkDeviceSize size,
+    VkFlags flags,
+    void** ppData)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkMapMemory* pPacket = NULL;
+    VKAllocInfo *entry;
+    CREATE_TRACE_PACKET(vkMapMemory, sizeof(void*));
+    result = mdd(device)->devTable.MapMemory(device, memory, offset, size, flags, ppData);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    entry = find_mem_info_entry(memory);
+
+    // For vktrace usage, clamp the memory size to the total size less offset if VK_WHOLE_SIZE is specified.
+    if (size == VK_WHOLE_SIZE) {
+        size = entry->totalSize - offset;
+    }
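+    // e.g. for a 64KB allocation mapped at offset 16KB, VK_WHOLE_SIZE becomes
+    // 65536 - 16384 = 49152 bytes for the trace bookkeeping below.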
+#ifdef USE_PAGEGUARD_SPEEDUP
+    pageguardEnter();
+    getPageGuardControlInstance().vkMapMemoryPageGuardHandle(device, memory, offset, size, flags, ppData);
+    pageguardExit();
+#endif
+    pPacket = interpret_body_as_vkMapMemory(pHeader);
+    pPacket->device = device;
+    pPacket->memory = memory;
+    pPacket->offset = offset;
+    pPacket->size = size;
+    pPacket->flags = flags;
+    if (ppData != NULL)
+    {
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->ppData), sizeof(void*), *ppData);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->ppData));
+        add_data_to_mem_info(memory, size, offset, *ppData);
+    }
+    pPacket->result = result;
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR void VKAPI_CALL __HOOKED_vkUnmapMemory(
+    VkDevice device,
+    VkDeviceMemory memory)
+{
+    vktrace_trace_packet_header* pHeader;
+    packet_vkUnmapMemory* pPacket;
+    VKAllocInfo *entry;
+    size_t siz = 0;
+#ifdef USE_PAGEGUARD_SPEEDUP
+    void *PageGuardMappedData;
+    pageguardEnter();
+    getPageGuardControlInstance().vkUnmapMemoryPageGuardHandle(device, memory, &PageGuardMappedData, &vkFlushMappedMemoryRangesWithoutAPICall);
+    pageguardExit();
+#endif
+    uint64_t trace_begin_time = vktrace_get_time();
+
+    // insert into the packet the data that the CPU wrote between the vkMapMemory call and here;
+    // note this must happen before the real vkUnmapMemory() call, or it may fault
+    vktrace_enter_critical_section(&g_memInfoLock);
+    entry = find_mem_info_entry(memory);
+    if (entry && entry->pData != NULL)
+    {
+        if (!entry->didFlush)
+        {
+            // no flush was recorded, so save the whole mapped range
+            siz = (size_t)entry->rangeSize;
+        }
+    }
+    CREATE_TRACE_PACKET(vkUnmapMemory, siz);
+    pHeader->vktrace_begin_time = trace_begin_time;
+    pPacket = interpret_body_as_vkUnmapMemory(pHeader);
+    if (siz)
+    {
+        assert(entry->handle == memory);
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**) &(pPacket->pData), siz, entry->pData);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pData));
+        entry->pData = NULL;
+    }
+    vktrace_leave_critical_section(&g_memInfoLock);
+    pHeader->entrypoint_begin_time = vktrace_get_time();
+    mdd(device)->devTable.UnmapMemory(device, memory);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket->device = device;
+    pPacket->memory = memory;
+    FINISH_TRACE_PACKET();
+#ifdef USE_PAGEGUARD_SPEEDUP
+    pageguardEnter();
+    if (PageGuardMappedData != nullptr)
+    {
+        pageguardFreeMemory(PageGuardMappedData);
+    }
+    pageguardExit();
+#endif
+}
+
+VKTRACER_EXPORT VKAPI_ATTR void VKAPI_CALL __HOOKED_vkFreeMemory(
+    VkDevice device,
+    VkDeviceMemory memory,
+    const VkAllocationCallbacks* pAllocator)
+{
+    vktrace_trace_packet_header* pHeader;
+    packet_vkFreeMemory* pPacket = NULL;
+    CREATE_TRACE_PACKET(vkFreeMemory, sizeof(VkAllocationCallbacks));
+    mdd(device)->devTable.FreeMemory(device, memory, pAllocator);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkFreeMemory(pHeader);
+    pPacket->device = device;
+    pPacket->memory = memory;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    FINISH_TRACE_PACKET();
+    // begin custom code
+    rm_handle_from_mem_info(memory);
+    // end custom code
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkInvalidateMappedMemoryRanges(
+    VkDevice device,
+    uint32_t memoryRangeCount,
+    const VkMappedMemoryRange* pMemoryRanges)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    size_t rangesSize = 0;
+    size_t dataSize = 0;
+    uint32_t iter;
+    packet_vkInvalidateMappedMemoryRanges* pPacket = NULL;
+    uint64_t trace_begin_time = vktrace_get_time();
+
+#ifdef USE_PAGEGUARD_SPEEDUP
+    pageguardEnter();
+    resetAllReadFlagAndPageGuard();
+    pageguardExit();
+#endif
+
+    // find out how much memory is in the ranges
+    for (iter = 0; iter < memoryRangeCount; iter++)
+    {
+        VkMappedMemoryRange* pRange = (VkMappedMemoryRange*)&pMemoryRanges[iter];
+        rangesSize += vk_size_vkmappedmemoryrange(pRange);
+        dataSize += (size_t)pRange->size;
+    }
+
+    CREATE_TRACE_PACKET(vkInvalidateMappedMemoryRanges, rangesSize + sizeof(void*)*memoryRangeCount + dataSize);
+    pHeader->vktrace_begin_time = trace_begin_time;
+    pPacket = interpret_body_as_vkInvalidateMappedMemoryRanges(pHeader);
+
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**) &(pPacket->pMemoryRanges), rangesSize, pMemoryRanges);
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pMemoryRanges));
+
+    // insert into the packet the data that the CPU wrote between the vkMapMemory call and here
+    // create a temporary local ppData array and add it to the packet (to reserve the space for the array)
+    void** ppTmpData = (void **) malloc(memoryRangeCount * sizeof(void*));
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**) &(pPacket->ppData), sizeof(void*)*memoryRangeCount, ppTmpData);
+    free(ppTmpData);
+
+    // now the actual memory
+    vktrace_enter_critical_section(&g_memInfoLock);
+    for (iter = 0; iter < memoryRangeCount; iter++)
+    {
+        VkMappedMemoryRange* pRange = (VkMappedMemoryRange*)&pMemoryRanges[iter];
+        VKAllocInfo* pEntry = find_mem_info_entry(pRange->memory);
+
+        if (pEntry != NULL)
+        {
+            assert(pEntry->handle == pRange->memory);
+            assert(pEntry->totalSize >= (pRange->size + pRange->offset));
+            assert(pEntry->totalSize >= pRange->size);
+            assert(pRange->offset >= pEntry->rangeOffset && (pRange->offset + pRange->size) <= (pEntry->rangeOffset + pEntry->rangeSize));
+            vktrace_add_buffer_to_trace_packet(pHeader, (void**) &(pPacket->ppData[iter]), pRange->size, pEntry->pData + pRange->offset);
+            vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->ppData[iter]));
+            pEntry->didFlush = TRUE; // do we also need a didInvalidate flag?
+        }
+        else
+        {
+             vktrace_LogError("Failed to copy app memory into trace packet (idx = %u) on vkInvalidateMappedMemoryRanges", pHeader->global_packet_index);
+        }
+    }
+    vktrace_leave_critical_section(&g_memInfoLock);
+
+    // now finalize the ppData array since it is done being updated
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->ppData));
+
+    pHeader->entrypoint_begin_time = vktrace_get_time();
+    result = mdd(device)->devTable.InvalidateMappedMemoryRanges(device, memoryRangeCount, pMemoryRanges);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket->device = device;
+    pPacket->memoryRangeCount = memoryRangeCount;
+    pPacket->result = result;
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkFlushMappedMemoryRanges(
+    VkDevice device,
+    uint32_t memoryRangeCount,
+    const VkMappedMemoryRange* pMemoryRanges)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    size_t rangesSize = 0;
+    size_t dataSize = 0;
+    uint32_t iter;
+    packet_vkFlushMappedMemoryRanges* pPacket = NULL;
+#ifdef USE_PAGEGUARD_SPEEDUP
+    pageguardEnter();
+    PBYTE *ppPackageData = new PBYTE[memoryRangeCount];
+    getPageGuardControlInstance().vkFlushMappedMemoryRangesPageGuardHandle(device, memoryRangeCount, pMemoryRanges, ppPackageData); // no package is needed if none of the ranges' data has changed
+#endif
+
+    uint64_t trace_begin_time = vktrace_get_time();
+
+    // find out how much memory is in the ranges
+    for (iter = 0; iter < memoryRangeCount; iter++)
+    {
+        VkMappedMemoryRange* pRange = (VkMappedMemoryRange*)&pMemoryRanges[iter];
+        rangesSize += vk_size_vkmappedmemoryrange(pRange);
+        dataSize += ROUNDUP_TO_4(((size_t)pRange->size));
+    }
+#ifdef USE_PAGEGUARD_SPEEDUP
+    dataSize = ROUNDUP_TO_4(getPageGuardControlInstance().getALLChangedPackageSizeInMappedMemory(device, memoryRangeCount, pMemoryRanges, ppPackageData));
+#endif
+
+    CREATE_TRACE_PACKET(vkFlushMappedMemoryRanges, rangesSize + sizeof(void*)*memoryRangeCount + dataSize);
+    pHeader->vktrace_begin_time = trace_begin_time;
+    pPacket = interpret_body_as_vkFlushMappedMemoryRanges(pHeader);
+
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**) &(pPacket->pMemoryRanges), rangesSize, pMemoryRanges);
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pMemoryRanges));
+
+    // insert into the packet the data that the CPU wrote between the vkMapMemory call and here
+    // create a temporary local ppData array and add it to the packet (to reserve the space for the array)
+    void** ppTmpData = (void **) malloc(memoryRangeCount * sizeof(void*));
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**) &(pPacket->ppData), sizeof(void*)*memoryRangeCount, ppTmpData);
+    free(ppTmpData);
+
+    // now the actual memory
+    vktrace_enter_critical_section(&g_memInfoLock);
+    for (iter = 0; iter < memoryRangeCount; iter++)
+    {
+        VkMappedMemoryRange* pRange = (VkMappedMemoryRange*)&pMemoryRanges[iter];
+        VKAllocInfo* pEntry = find_mem_info_entry(pRange->memory);
+
+        if (pEntry != NULL)
+        {
+            assert(pEntry->handle == pRange->memory);
+            assert(pEntry->totalSize >= (pRange->size + pRange->offset));
+            assert(pEntry->totalSize >= pRange->size);
+            assert(pRange->offset >= pEntry->rangeOffset && (pRange->offset + pRange->size) <= (pEntry->rangeOffset + pEntry->rangeSize));
+#ifdef USE_PAGEGUARD_SPEEDUP
+            LPPageGuardMappedMemory pOPTMemoryTemp = getPageGuardControlInstance().findMappedMemoryObject(device, pRange);
+            VkDeviceSize OPTPackageSizeTemp = 0;
+            if (pOPTMemoryTemp)
+            {
+                PBYTE pOPTDataTemp = pOPTMemoryTemp->getChangedDataPackage(&OPTPackageSizeTemp);
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->ppData[iter]), ROUNDUP_TO_4(OPTPackageSizeTemp), pOPTDataTemp);
+                pOPTMemoryTemp->clearChangedDataPackage();
+                pOPTMemoryTemp->resetMemoryObjectAllChangedFlagAndPageGuard();
+            }
+            else
+            {
+                PBYTE pOPTDataTemp = getPageGuardControlInstance().getChangedDataPackageOutOfMap(ppPackageData, iter, &OPTPackageSizeTemp);
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->ppData[iter]), ROUNDUP_TO_4(OPTPackageSizeTemp), pOPTDataTemp);
+                getPageGuardControlInstance().clearChangedDataPackageOutOfMap(ppPackageData, iter);
+            }
+#else
+            vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->ppData[iter]), ROUNDUP_TO_4(pRange->size), pEntry->pData + pRange->offset);
+#endif
+            vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->ppData[iter]));
+            pEntry->didFlush = TRUE;
+        }
+        else
+        {
+             vktrace_LogError("Failed to copy app memory into trace packet (idx = %u) on vkFlushMappedMemoryRanges", pHeader->global_packet_index);
+        }
+    }
+#ifdef USE_PAGEGUARD_SPEEDUP
+    delete[] ppPackageData;
+#endif
+    vktrace_leave_critical_section(&g_memInfoLock);
+
+    // now finalize the ppData array since it is done being updated
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->ppData));
+
+    pHeader->entrypoint_begin_time = vktrace_get_time();
+    result = mdd(device)->devTable.FlushMappedMemoryRanges(device, memoryRangeCount, pMemoryRanges);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket->device = device;
+    pPacket->memoryRangeCount = memoryRangeCount;
+    pPacket->result = result;
+    FINISH_TRACE_PACKET();
+#ifdef USE_PAGEGUARD_SPEEDUP
+    pageguardExit();
+#endif
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkAllocateCommandBuffers(
+    VkDevice device,
+    const VkCommandBufferAllocateInfo* pAllocateInfo,
+    VkCommandBuffer* pCommandBuffers)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkAllocateCommandBuffers* pPacket = NULL;
+    CREATE_TRACE_PACKET(vkAllocateCommandBuffers, get_struct_chain_size((void*)pAllocateInfo) + sizeof(VkCommandBuffer) * pAllocateInfo->commandBufferCount);
+    result = mdd(device)->devTable.AllocateCommandBuffers(device, pAllocateInfo, pCommandBuffers);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkAllocateCommandBuffers(pHeader);
+    pPacket->device = device;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocateInfo), sizeof(VkCommandBufferAllocateInfo), pAllocateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCommandBuffers), sizeof(VkCommandBuffer) * pAllocateInfo->commandBufferCount, pCommandBuffers);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocateInfo));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCommandBuffers));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkBeginCommandBuffer(
+    VkCommandBuffer commandBuffer,
+    const VkCommandBufferBeginInfo* pBeginInfo)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkBeginCommandBuffer* pPacket = NULL;
+    CREATE_TRACE_PACKET(vkBeginCommandBuffer, get_struct_chain_size((void*)pBeginInfo));
+    result = mdd(commandBuffer)->devTable.BeginCommandBuffer(commandBuffer, pBeginInfo);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkBeginCommandBuffer(pHeader);
+    pPacket->commandBuffer = commandBuffer;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pBeginInfo), sizeof(VkCommandBufferBeginInfo), pBeginInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pBeginInfo->pInheritanceInfo), sizeof(VkCommandBufferInheritanceInfo), pBeginInfo->pInheritanceInfo);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pBeginInfo->pInheritanceInfo));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pBeginInfo));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateDescriptorPool(
+    VkDevice device,
+    const VkDescriptorPoolCreateInfo* pCreateInfo,
+    const VkAllocationCallbacks* pAllocator,
+    VkDescriptorPool* pDescriptorPool)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreateDescriptorPool* pPacket = NULL;
+    // begin custom code (needs to use get_struct_chain_size)
+    CREATE_TRACE_PACKET(vkCreateDescriptorPool,  get_struct_chain_size((void*)pCreateInfo) + sizeof(VkAllocationCallbacks) + sizeof(VkDescriptorPool));
+    // end custom code
+    result = mdd(device)->devTable.CreateDescriptorPool(device, pCreateInfo, pAllocator, pDescriptorPool);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkCreateDescriptorPool(pHeader);
+    pPacket->device = device;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkDescriptorPoolCreateInfo), pCreateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pPoolSizes), pCreateInfo->poolSizeCount * sizeof(VkDescriptorPoolSize), pCreateInfo->pPoolSizes);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDescriptorPool), sizeof(VkDescriptorPool), pDescriptorPool);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pPoolSizes));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDescriptorPool));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VkLayerDeviceCreateInfo *get_chain_info(const VkDeviceCreateInfo *pCreateInfo, VkLayerFunction func)
+{
+    VkLayerDeviceCreateInfo *chain_info = (VkLayerDeviceCreateInfo *) pCreateInfo->pNext;
+    while (chain_info && !(chain_info->sType == VK_STRUCTURE_TYPE_LOADER_DEVICE_CREATE_INFO
+           && chain_info->function == func)) {
+        chain_info = (VkLayerDeviceCreateInfo *) chain_info->pNext;
+    }
+    assert(chain_info != NULL);
+    return chain_info;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateDevice(
+    VkPhysicalDevice physicalDevice,
+    const VkDeviceCreateInfo* pCreateInfo,
+    const VkAllocationCallbacks* pAllocator,
+    VkDevice* pDevice)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreateDevice* pPacket = NULL;
+    uint32_t i;
+
+    VkLayerDeviceCreateInfo *chain_info = get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);
+
+    assert(chain_info->u.pLayerInfo);
+    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr = chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;
+    assert(fpGetInstanceProcAddr);
+    PFN_vkGetDeviceProcAddr fpGetDeviceProcAddr = chain_info->u.pLayerInfo->pfnNextGetDeviceProcAddr;
+    assert(fpGetDeviceProcAddr);
+    PFN_vkCreateDevice fpCreateDevice = (PFN_vkCreateDevice) fpGetInstanceProcAddr(NULL, "vkCreateDevice");
+    if (fpCreateDevice == NULL) {
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    // Advance the link info for the next element on the chain
+    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;
+
+    result = fpCreateDevice(physicalDevice, pCreateInfo, pAllocator, pDevice);
+    if (result != VK_SUCCESS) {
+        return result;
+    }
+ 
+    initDeviceData(*pDevice, fpGetDeviceProcAddr, g_deviceDataMap);
+    // Setup device dispatch table for extensions
+    ext_init_create_device(mdd(*pDevice), *pDevice, fpGetDeviceProcAddr, pCreateInfo->enabledExtensionCount, pCreateInfo->ppEnabledExtensionNames);
+
+    // remove the loader extended createInfo structure
+    VkDeviceCreateInfo localCreateInfo;
+    memcpy(&localCreateInfo, pCreateInfo, sizeof(localCreateInfo));
+    for (i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
+        char **ppName = (char **) &localCreateInfo.ppEnabledExtensionNames[i];
+        *ppName = (char *) pCreateInfo->ppEnabledExtensionNames[i];
+    }
+    for (i = 0; i < pCreateInfo->enabledLayerCount; i++) {
+        char **ppName = (char **) &localCreateInfo.ppEnabledLayerNames[i];
+        *ppName = (char *) pCreateInfo->ppEnabledLayerNames[i];
+    }
+    localCreateInfo.pNext = strip_create_extensions(pCreateInfo->pNext);
+
+    CREATE_TRACE_PACKET(vkCreateDevice, get_struct_chain_size((void*)&localCreateInfo) + sizeof(VkAllocationCallbacks) + sizeof(VkDevice));
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkCreateDevice(pHeader);
+    pPacket->physicalDevice = physicalDevice;
+    add_VkDeviceCreateInfo_to_packet(pHeader, (VkDeviceCreateInfo**) &(pPacket->pCreateInfo), &localCreateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDevice), sizeof(VkDevice), pDevice);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDevice));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateFramebuffer(
+    VkDevice device,
+    const VkFramebufferCreateInfo* pCreateInfo,
+    const VkAllocationCallbacks* pAllocator,
+    VkFramebuffer* pFramebuffer)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreateFramebuffer* pPacket = NULL;
+    // begin custom code
+    uint32_t attachmentCount = (pCreateInfo != NULL && pCreateInfo->pAttachments != NULL) ? pCreateInfo->attachmentCount : 0;
+    CREATE_TRACE_PACKET(vkCreateFramebuffer, get_struct_chain_size((void*)pCreateInfo) + sizeof(VkAllocationCallbacks) + sizeof(VkFramebuffer));
+    // end custom code
+    result = mdd(device)->devTable.CreateFramebuffer(device, pCreateInfo, pAllocator, pFramebuffer);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkCreateFramebuffer(pHeader);
+    pPacket->device = device;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkFramebufferCreateInfo), pCreateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pAttachments), attachmentCount * sizeof(VkImageView), pCreateInfo->pAttachments);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pFramebuffer), sizeof(VkFramebuffer), pFramebuffer);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pAttachments));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pFramebuffer));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VkLayerInstanceCreateInfo *get_chain_info(const VkInstanceCreateInfo *pCreateInfo, VkLayerFunction func)
+{
+    VkLayerInstanceCreateInfo *chain_info = (VkLayerInstanceCreateInfo *) pCreateInfo->pNext;
+    while (chain_info && ((chain_info->sType != VK_STRUCTURE_TYPE_LOADER_INSTANCE_CREATE_INFO)
+           || (chain_info->function != func))) {
+        chain_info = (VkLayerInstanceCreateInfo *) chain_info->pNext;
+    }
+    assert(chain_info != NULL);
+    return chain_info;
+}
+
+#if defined(USE_PAGEGUARD_SPEEDUP) && !defined(PAGEGUARD_MEMCPY_USE_PPL_LIB)
+extern "C" BOOL vktrace_pageguard_init_multi_threads_memcpy();
+extern "C" void vktrace_pageguard_done_multi_threads_memcpy();
+#endif
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateInstance(
+    const VkInstanceCreateInfo* pCreateInfo,
+    const VkAllocationCallbacks* pAllocator,
+    VkInstance* pInstance)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreateInstance* pPacket = NULL;
+    uint64_t startTime;
+    uint64_t endTime;
+    uint32_t i;
+    uint64_t vktraceStartTime = vktrace_get_time();
+    SEND_ENTRYPOINT_ID(vkCreateInstance);
+    startTime = vktrace_get_time();
+
+#if defined(USE_PAGEGUARD_SPEEDUP) && !defined(PAGEGUARD_MEMCPY_USE_PPL_LIB)
+    vktrace_pageguard_init_multi_threads_memcpy();
+#endif
+
+    VkLayerInstanceCreateInfo *chain_info = get_chain_info(pCreateInfo, VK_LAYER_LINK_INFO);
+
+    assert(chain_info->u.pLayerInfo);
+    PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr = chain_info->u.pLayerInfo->pfnNextGetInstanceProcAddr;
+    assert(fpGetInstanceProcAddr);
+    PFN_vkCreateInstance fpCreateInstance = (PFN_vkCreateInstance) fpGetInstanceProcAddr(NULL, "vkCreateInstance");
+    if (fpCreateInstance == NULL) {
+        return VK_ERROR_INITIALIZATION_FAILED;
+    }
+
+    // Advance the link info for the next element on the chain
+    chain_info->u.pLayerInfo = chain_info->u.pLayerInfo->pNext;
+
+    result = fpCreateInstance(pCreateInfo, pAllocator, pInstance);
+    if (result != VK_SUCCESS) {
+        return result;
+    }
+    endTime = vktrace_get_time();
+
+    initInstanceData(*pInstance, fpGetInstanceProcAddr, g_instanceDataMap);
+    ext_init_create_instance(mid(*pInstance), *pInstance, pCreateInfo->enabledExtensionCount, pCreateInfo->ppEnabledExtensionNames);
+
+    // remove the loader extended createInfo structure
+    VkInstanceCreateInfo localCreateInfo;
+    memcpy(&localCreateInfo, pCreateInfo, sizeof(localCreateInfo));
+
+    // Alloc space to copy pointers
+    if (localCreateInfo.enabledLayerCount > 0)
+        localCreateInfo.ppEnabledLayerNames = (const char* const*) malloc(localCreateInfo.enabledLayerCount * sizeof(char*));
+    if (localCreateInfo.enabledExtensionCount > 0)
+        localCreateInfo.ppEnabledExtensionNames = (const char* const*) malloc(localCreateInfo.enabledExtensionCount * sizeof(char*));
+
+    for (i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
+        char **ppName = (char **) &localCreateInfo.ppEnabledExtensionNames[i];
+        *ppName = (char *) pCreateInfo->ppEnabledExtensionNames[i];
+    }
+
+    // If app requests vktrace layer, don't record that in the trace
+    char **ppName = (char **) &localCreateInfo.ppEnabledLayerNames[0];
+    for (i = 0 ; i < pCreateInfo->enabledLayerCount; i++) {
+        if (strcmp("VK_LAYER_LUNARG_vktrace", pCreateInfo->ppEnabledLayerNames[i]) == 0) {
+            // Decrement the enabled layer count and skip copying the pointer
+            localCreateInfo.enabledLayerCount--;
+        } else {
+            // Copy pointer and increment write pointer for everything else
+            *ppName++ = (char *) pCreateInfo->ppEnabledLayerNames[i];
+        }
+    }
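+    // e.g. if the app enabled { "VK_LAYER_LUNARG_standard_validation", "VK_LAYER_LUNARG_vktrace" }
+    // (a hypothetical list), the trace records enabledLayerCount == 1 with
+    // ppEnabledLayerNames == { "VK_LAYER_LUNARG_standard_validation" }.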
+
+    //localCreateInfo.pNext = strip_create_extensions(pCreateInfo->pNext);
+    // The pNext pointer isn't getting marshalled into the trace buffer properly anyway, so
+    // set it to NULL so that replay does not trip over it.
+    localCreateInfo.pNext = NULL;
+    CREATE_TRACE_PACKET(vkCreateInstance, sizeof(VkInstance) + get_struct_chain_size((void*)&localCreateInfo) + sizeof(VkAllocationCallbacks));
+    pHeader->vktrace_begin_time = vktraceStartTime;
+    pHeader->entrypoint_begin_time = startTime;
+    pHeader->entrypoint_end_time = endTime;
+    pPacket = interpret_body_as_vkCreateInstance(pHeader);
+
+    add_VkInstanceCreateInfo_to_packet(pHeader, (VkInstanceCreateInfo**)&(pPacket->pCreateInfo), (VkInstanceCreateInfo*) &localCreateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pInstance), sizeof(VkInstance), pInstance);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pInstance));
+    FINISH_TRACE_PACKET();
+
+    if (localCreateInfo.enabledLayerCount > 0)
+        free((void *)localCreateInfo.ppEnabledLayerNames);
+    if (localCreateInfo.enabledExtensionCount > 0)
+        free((void *)localCreateInfo.ppEnabledExtensionNames);
+
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR void VKAPI_CALL __HOOKED_vkDestroyInstance(
+    VkInstance instance,
+    const VkAllocationCallbacks* pAllocator)
+{
+    vktrace_trace_packet_header* pHeader;
+    packet_vkDestroyInstance* pPacket = NULL;
+    dispatch_key key = get_dispatch_key(instance);
+    CREATE_TRACE_PACKET(vkDestroyInstance, sizeof(VkAllocationCallbacks));
+    mid(instance)->instTable.DestroyInstance(instance, pAllocator);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkDestroyInstance(pHeader);
+    pPacket->instance = instance;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    FINISH_TRACE_PACKET();
+    g_instanceDataMap.erase(key);
+#if defined(USE_PAGEGUARD_SPEEDUP) && !defined(PAGEGUARD_MEMCPY_USE_PPL_LIB)
+    vktrace_pageguard_done_multi_threads_memcpy();
+#endif
+}
+
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateRenderPass(
+    VkDevice device,
+    const VkRenderPassCreateInfo* pCreateInfo,
+    const VkAllocationCallbacks* pAllocator,
+    VkRenderPass* pRenderPass)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreateRenderPass* pPacket = NULL;
+    // begin custom code (get_struct_chain_size)
+    uint32_t attachmentCount = (pCreateInfo != NULL && (pCreateInfo->pAttachments != NULL)) ? pCreateInfo->attachmentCount : 0;
+    uint32_t dependencyCount = (pCreateInfo != NULL && (pCreateInfo->pDependencies != NULL)) ? pCreateInfo->dependencyCount : 0;
+    uint32_t subpassCount = (pCreateInfo != NULL && (pCreateInfo->pSubpasses != NULL)) ? pCreateInfo->subpassCount : 0;
+    CREATE_TRACE_PACKET(vkCreateRenderPass, get_struct_chain_size((void*)pCreateInfo) + sizeof(VkAllocationCallbacks) + sizeof(VkRenderPass));
+    // end custom code
+    result = mdd(device)->devTable.CreateRenderPass(device, pCreateInfo, pAllocator, pRenderPass);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkCreateRenderPass(pHeader);
+    pPacket->device = device;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkRenderPassCreateInfo), pCreateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pAttachments), attachmentCount * sizeof(VkAttachmentDescription), pCreateInfo->pAttachments);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pDependencies), dependencyCount * sizeof(VkSubpassDependency), pCreateInfo->pDependencies);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pSubpasses), subpassCount * sizeof(VkSubpassDescription), pCreateInfo->pSubpasses);
+    uint32_t i;
+    for (i=0; i < pPacket->pCreateInfo->subpassCount; i++) {
+        VkSubpassDescription *pSubpass = (VkSubpassDescription *) &pPacket->pCreateInfo->pSubpasses[i];
+        const VkSubpassDescription *pSp = &pCreateInfo->pSubpasses[i];
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pSubpass->pInputAttachments), pSubpass->inputAttachmentCount * sizeof(VkAttachmentReference), pSp->pInputAttachments);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pSubpass->pInputAttachments));
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pSubpass->pColorAttachments), pSubpass->colorAttachmentCount * sizeof(VkAttachmentReference), pSp->pColorAttachments);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pSubpass->pColorAttachments));
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pSubpass->pResolveAttachments), pSubpass->colorAttachmentCount * sizeof(VkAttachmentReference), pSp->pResolveAttachments);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pSubpass->pResolveAttachments));
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pSubpass->pDepthStencilAttachment), 1 * sizeof(VkAttachmentReference), pSp->pDepthStencilAttachment);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pSubpass->pDepthStencilAttachment));
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pSubpass->pPreserveAttachments), pSubpass->preserveAttachmentCount * sizeof(VkAttachmentReference), pSp->pPreserveAttachments);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pSubpass->pPreserveAttachments));
+    }
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pRenderPass), sizeof(VkRenderPass), pRenderPass);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pAttachments));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pDependencies));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pSubpasses));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pRenderPass));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkEnumerateDeviceExtensionProperties(
+    VkPhysicalDevice physicalDevice,
+    const char* pLayerName,
+    uint32_t* pPropertyCount,
+    VkExtensionProperties* pProperties)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkEnumerateDeviceExtensionProperties* pPacket = NULL;
+    uint64_t startTime;
+    uint64_t endTime;
+    uint64_t vktraceStartTime = vktrace_get_time();
+    startTime = vktrace_get_time();
+    // Only call down chain if querying ICD rather than layer device extensions
+    if (pLayerName == NULL)
+        result = mid(physicalDevice)->instTable.EnumerateDeviceExtensionProperties(physicalDevice, NULL, pPropertyCount, pProperties);
+    else
+    {
+        *pPropertyCount = 0;
+        return VK_SUCCESS;
+    }
+    endTime = vktrace_get_time();
+    CREATE_TRACE_PACKET(vkEnumerateDeviceExtensionProperties, ((pLayerName != NULL) ? ROUNDUP_TO_4(strlen(pLayerName) + 1) : 0) + sizeof(uint32_t) + (*pPropertyCount * sizeof(VkExtensionProperties)));
+    pHeader->vktrace_begin_time = vktraceStartTime;
+    pHeader->entrypoint_begin_time = startTime;
+    pHeader->entrypoint_end_time = endTime;
+    pPacket = interpret_body_as_vkEnumerateDeviceExtensionProperties(pHeader);
+    pPacket->physicalDevice = physicalDevice;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pLayerName), ((pLayerName != NULL) ? ROUNDUP_TO_4(strlen(pLayerName) + 1) : 0), pLayerName);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPropertyCount), sizeof(uint32_t), pPropertyCount);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pProperties), *pPropertyCount * sizeof(VkExtensionProperties), pProperties);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pLayerName));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPropertyCount));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pProperties));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkEnumerateDeviceLayerProperties(
+    VkPhysicalDevice physicalDevice,
+    uint32_t* pPropertyCount,
+    VkLayerProperties* pProperties)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkEnumerateDeviceLayerProperties* pPacket = NULL;
+    uint64_t startTime;
+    uint64_t endTime;
+    uint64_t vktraceStartTime = vktrace_get_time();
+    startTime = vktrace_get_time();
+    result = mid(physicalDevice)->instTable.EnumerateDeviceLayerProperties(physicalDevice, pPropertyCount, pProperties);
+    endTime = vktrace_get_time();
+    CREATE_TRACE_PACKET(vkEnumerateDeviceLayerProperties, sizeof(uint32_t) + (*pPropertyCount * sizeof(VkLayerProperties)));
+    pHeader->vktrace_begin_time = vktraceStartTime;
+    pHeader->entrypoint_begin_time = startTime;
+    pHeader->entrypoint_end_time = endTime;
+    pPacket = interpret_body_as_vkEnumerateDeviceLayerProperties(pHeader);
+    pPacket->physicalDevice = physicalDevice;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPropertyCount), sizeof(uint32_t), pPropertyCount);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pProperties), *pPropertyCount * sizeof(VkLayerProperties), pProperties);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPropertyCount));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pProperties));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+// TODO: This should be pretty easy to fit into codegen; we don't need to make the call prior to creating the packet,
+//  we just need to account for "count" queue properties.
+VKTRACER_EXPORT VKAPI_ATTR void VKAPI_CALL __HOOKED_vkGetPhysicalDeviceQueueFamilyProperties(
+    VkPhysicalDevice physicalDevice,
+    uint32_t* pQueueFamilyPropertyCount,
+    VkQueueFamilyProperties* pQueueFamilyProperties)
+{
+    vktrace_trace_packet_header* pHeader;
+    packet_vkGetPhysicalDeviceQueueFamilyProperties* pPacket = NULL;
+    uint64_t startTime;
+    uint64_t endTime;
+    uint64_t vktraceStartTime = vktrace_get_time();
+    startTime = vktrace_get_time();
+    mid(physicalDevice)->instTable.GetPhysicalDeviceQueueFamilyProperties(physicalDevice, pQueueFamilyPropertyCount, pQueueFamilyProperties);
+    endTime = vktrace_get_time();
+    CREATE_TRACE_PACKET(vkGetPhysicalDeviceQueueFamilyProperties, sizeof(uint32_t) + *pQueueFamilyPropertyCount * sizeof(VkQueueFamilyProperties));
+    pHeader->vktrace_begin_time = vktraceStartTime;
+    pHeader->entrypoint_begin_time = startTime;
+    pHeader->entrypoint_end_time = endTime;
+    pPacket = interpret_body_as_vkGetPhysicalDeviceQueueFamilyProperties(pHeader);
+    pPacket->physicalDevice = physicalDevice;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pQueueFamilyPropertyCount), sizeof(uint32_t), pQueueFamilyPropertyCount);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pQueueFamilyProperties), *pQueueFamilyPropertyCount * sizeof(VkQueueFamilyProperties), pQueueFamilyProperties);
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pQueueFamilyPropertyCount));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pQueueFamilyProperties));
+    FINISH_TRACE_PACKET();
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkEnumeratePhysicalDevices(
+    VkInstance instance,
+    uint32_t* pPhysicalDeviceCount,
+    VkPhysicalDevice* pPhysicalDevices)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkEnumeratePhysicalDevices* pPacket = NULL;
+    uint64_t startTime;
+    uint64_t endTime;
+    uint64_t vktraceStartTime = vktrace_get_time();
+    // TODO: make sure this can handle being called twice with pPhysicalDevices == NULL
+    SEND_ENTRYPOINT_ID(vkEnumeratePhysicalDevices);
+    startTime = vktrace_get_time();
+    result = mid(instance)->instTable.EnumeratePhysicalDevices(instance, pPhysicalDeviceCount, pPhysicalDevices);
+    endTime = vktrace_get_time();
+    CREATE_TRACE_PACKET(vkEnumeratePhysicalDevices, sizeof(uint32_t) + ((pPhysicalDevices && pPhysicalDeviceCount) ? *pPhysicalDeviceCount * sizeof(VkPhysicalDevice) : 0));
+    pHeader->vktrace_begin_time = vktraceStartTime;
+    pHeader->entrypoint_begin_time = startTime;
+    pHeader->entrypoint_end_time = endTime;
+    pPacket = interpret_body_as_vkEnumeratePhysicalDevices(pHeader);
+    pPacket->instance = instance;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPhysicalDeviceCount), sizeof(uint32_t), pPhysicalDeviceCount);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPhysicalDevices), (pPhysicalDevices && pPhysicalDeviceCount) ? *pPhysicalDeviceCount * sizeof(VkPhysicalDevice) : 0, pPhysicalDevices);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPhysicalDeviceCount));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPhysicalDevices));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkGetQueryPoolResults(
+    VkDevice device,
+    VkQueryPool queryPool,
+    uint32_t firstQuery,
+    uint32_t queryCount,
+    size_t dataSize,
+    void* pData,
+    VkDeviceSize stride,
+    VkQueryResultFlags flags)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkGetQueryPoolResults* pPacket = NULL;
+    uint64_t startTime;
+    uint64_t endTime;
+    uint64_t vktraceStartTime = vktrace_get_time();
+    startTime = vktrace_get_time();
+    result = mdd(device)->devTable.GetQueryPoolResults(device, queryPool, firstQuery, queryCount, dataSize, pData, stride, flags);
+    endTime = vktrace_get_time();
+    CREATE_TRACE_PACKET(vkGetQueryPoolResults, dataSize);
+    pHeader->vktrace_begin_time = vktraceStartTime;
+    pHeader->entrypoint_begin_time = startTime;
+    pHeader->entrypoint_end_time = endTime;
+    pPacket = interpret_body_as_vkGetQueryPoolResults(pHeader);
+    pPacket->device = device;
+    pPacket->queryPool = queryPool;
+    pPacket->firstQuery = firstQuery;
+    pPacket->queryCount = queryCount;
+    pPacket->dataSize = dataSize;
+    pPacket->stride = stride;
+    pPacket->flags = flags;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pData), dataSize, pData);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pData));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkAllocateDescriptorSets(
+    VkDevice device,
+    const VkDescriptorSetAllocateInfo* pAllocateInfo,
+    VkDescriptorSet* pDescriptorSets)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkAllocateDescriptorSets* pPacket = NULL;
+    uint64_t startTime;
+    uint64_t endTime;
+    uint64_t vktraceStartTime = vktrace_get_time();
+    SEND_ENTRYPOINT_ID(vkAllocateDescriptorSets);
+    startTime = vktrace_get_time();
+    result = mdd(device)->devTable.AllocateDescriptorSets(device, pAllocateInfo, pDescriptorSets);
+    endTime = vktrace_get_time();
+    CREATE_TRACE_PACKET(vkAllocateDescriptorSets, vk_size_vkdescriptorsetallocateinfo(pAllocateInfo) + (pAllocateInfo->descriptorSetCount * sizeof(VkDescriptorSet)));
+    pHeader->vktrace_begin_time = vktraceStartTime;
+    pHeader->entrypoint_begin_time = startTime;
+    pHeader->entrypoint_end_time = endTime;
+    pPacket = interpret_body_as_vkAllocateDescriptorSets(pHeader);
+    pPacket->device = device;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocateInfo), sizeof(VkDescriptorSetAllocateInfo), pAllocateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocateInfo->pSetLayouts), pPacket->pAllocateInfo->descriptorSetCount * sizeof(VkDescriptorSetLayout), pAllocateInfo->pSetLayouts);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDescriptorSets), pPacket->pAllocateInfo->descriptorSetCount * sizeof(VkDescriptorSet), pDescriptorSets);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocateInfo->pSetLayouts));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDescriptorSets));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocateInfo));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+// Manually written because it needs to use get_struct_chain_size and allocate some extra pointers,
+// and because it must append the array of pointers and sub-buffers (see comments in function).
+VKTRACER_EXPORT VKAPI_ATTR void VKAPI_CALL __HOOKED_vkUpdateDescriptorSets(
+    VkDevice device,
+    uint32_t descriptorWriteCount,
+    const VkWriteDescriptorSet* pDescriptorWrites,
+    uint32_t descriptorCopyCount,
+    const VkCopyDescriptorSet* pDescriptorCopies )
+{
+    vktrace_trace_packet_header* pHeader;
+    packet_vkUpdateDescriptorSets* pPacket = NULL;
+    // begin custom code
+    size_t arrayByteCount = 0;
+    size_t i;
+
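+    // Size the packet by walking each write/copy struct, including any pNext chain, via get_struct_chain_size.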
+    for (i = 0; i < descriptorWriteCount; i++)
+    {
+        arrayByteCount += get_struct_chain_size(&pDescriptorWrites[i]);
+    }
+
+    for (i = 0; i < descriptorCopyCount; i++)
+    {
+        arrayByteCount += get_struct_chain_size(&pDescriptorCopies[i]);
+    }
+
+    CREATE_TRACE_PACKET(vkUpdateDescriptorSets, arrayByteCount);
+    // end custom code
+
+    mdd(device)->devTable.UpdateDescriptorSets(device, descriptorWriteCount, pDescriptorWrites, descriptorCopyCount, pDescriptorCopies);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkUpdateDescriptorSets(pHeader);
+    pPacket->device = device;
+    pPacket->descriptorWriteCount = descriptorWriteCount;
+    // begin custom code
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDescriptorWrites), descriptorWriteCount * sizeof(VkWriteDescriptorSet), pDescriptorWrites);
+    for (i = 0; i < descriptorWriteCount; i++)
+    {
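+        // Only the pointer that matches descriptorType is meaningful per the Vulkan spec, so serialize just that array.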
+        switch (pPacket->pDescriptorWrites[i].descriptorType) {
+        case VK_DESCRIPTOR_TYPE_SAMPLER:
+        case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:
+        case VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE:
+        case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE:
+        case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT:
+            {
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDescriptorWrites[i].pImageInfo),
+                                                   pDescriptorWrites[i].descriptorCount * sizeof(VkDescriptorImageInfo),
+                                                   pDescriptorWrites[i].pImageInfo);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDescriptorWrites[i].pImageInfo));
+            }
+            break;
+        case VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER:
+        case VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER:
+            {
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDescriptorWrites[i].pTexelBufferView),
+                                                   pDescriptorWrites[i].descriptorCount * sizeof(VkBufferView),
+                                                   pDescriptorWrites[i].pTexelBufferView);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDescriptorWrites[i].pTexelBufferView));
+            }
+            break;
+        case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER:
+        case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER:
+        case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
+        case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:
+            {
+                vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDescriptorWrites[i].pBufferInfo),
+                                                   pDescriptorWrites[i].descriptorCount * sizeof(VkDescriptorBufferInfo),
+                                                   pDescriptorWrites[i].pBufferInfo);
+                vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDescriptorWrites[i].pBufferInfo));
+            }
+            break;
+        default:
+            break;
+        }
+    }
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDescriptorWrites));
+
+    pPacket->descriptorCopyCount = descriptorCopyCount;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDescriptorCopies), descriptorCopyCount * sizeof(VkCopyDescriptorSet), pDescriptorCopies);
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDescriptorCopies));
+    // end custom code
+    FINISH_TRACE_PACKET();
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkQueueSubmit(
+    VkQueue queue,
+    uint32_t submitCount,
+    const VkSubmitInfo* pSubmits,
+    VkFence fence)
+{
+#ifdef USE_PAGEGUARD_SPEEDUP
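+    // Flush any app-dirtied mapped memory into the trace before the submit so replay sees current buffer contents.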
+    pageguardEnter();
+    flushAllChangedMappedMemory(&vkFlushMappedMemoryRangesWithoutAPICall);
+    resetAllReadFlagAndPageGuard();
+    pageguardExit();
+#endif
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkQueueSubmit* pPacket = NULL;
+    size_t arrayByteCount = 0;
+    uint32_t i = 0;
+    for (i=0; i<submitCount; ++i) {
+        arrayByteCount += vk_size_vksubmitinfo(&pSubmits[i]);
+    }
+    CREATE_TRACE_PACKET(vkQueueSubmit, arrayByteCount);
+    result = mdd(queue)->devTable.QueueSubmit(queue, submitCount, pSubmits, fence);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkQueueSubmit(pHeader);
+    pPacket->queue = queue;
+    pPacket->submitCount = submitCount;
+    pPacket->fence = fence;
+    pPacket->result = result;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSubmits), submitCount*sizeof(VkSubmitInfo), pSubmits);
+    for (i=0; i<submitCount; ++i) {
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSubmits[i].pCommandBuffers), pPacket->pSubmits[i].commandBufferCount * sizeof(VkCommandBuffer), pSubmits[i].pCommandBuffers);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSubmits[i].pCommandBuffers));
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSubmits[i].pWaitSemaphores), pPacket->pSubmits[i].waitSemaphoreCount * sizeof(VkSemaphore), pSubmits[i].pWaitSemaphores);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSubmits[i].pWaitSemaphores));
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSubmits[i].pSignalSemaphores), pPacket->pSubmits[i].signalSemaphoreCount * sizeof(VkSemaphore), pSubmits[i].pSignalSemaphores);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSubmits[i].pSignalSemaphores));
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSubmits[i].pWaitDstStageMask), sizeof(VkPipelineStageFlags), pSubmits[i].pWaitDstStageMask);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSubmits[i].pWaitDstStageMask));
+    }
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSubmits));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkQueueBindSparse(
+    VkQueue queue,
+    uint32_t bindInfoCount,
+    const VkBindSparseInfo* pBindInfo,
+    VkFence fence)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkQueueBindSparse* pPacket = NULL;
+    size_t arrayByteCount = 0;
+    uint32_t i = 0;
+
+    for (i = 0; i<bindInfoCount; ++i) {
+        arrayByteCount += vk_size_vkbindsparseinfo(&pBindInfo[i]);
+    }
+
+    CREATE_TRACE_PACKET(vkQueueBindSparse, arrayByteCount + 2 * sizeof(VkDeviceMemory));
+    result = mdd(queue)->devTable.QueueBindSparse(queue, bindInfoCount, pBindInfo, fence);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkQueueBindSparse(pHeader);
+    pPacket->queue = queue;
+    pPacket->bindInfoCount = bindInfoCount;
+    pPacket->fence = fence;
+    pPacket->result = result;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pBindInfo), bindInfoCount * sizeof(VkBindSparseInfo), pBindInfo);
+
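+    // Each VkBindSparseInfo carries nested bind-info arrays, and each of those carries its own
+    // pBinds array, so both levels are added to the packet and finalized individually.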
+    for (i = 0; i<bindInfoCount; ++i) {
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pBindInfo[i].pBufferBinds), pPacket->pBindInfo[i].bufferBindCount * sizeof(VkSparseBufferMemoryBindInfo), pBindInfo[i].pBufferBinds);
+        for (uint32_t j = 0; j < pPacket->pBindInfo[i].bufferBindCount; j++) {
+            VkSparseBufferMemoryBindInfo *pSparseBufferMemoryBindInfo = (VkSparseBufferMemoryBindInfo *)&pPacket->pBindInfo[i].pBufferBinds[j];
+            const VkSparseBufferMemoryBindInfo *pSparseBufMemBndInf = &pBindInfo[i].pBufferBinds[j];
+            vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pSparseBufferMemoryBindInfo->pBinds), pSparseBufferMemoryBindInfo->bindCount * sizeof(VkSparseMemoryBind), pSparseBufMemBndInf->pBinds);
+            vktrace_finalize_buffer_address(pHeader, (void**)&(pSparseBufferMemoryBindInfo->pBinds));
+        }
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pBindInfo[i].pBufferBinds));
+
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pBindInfo[i].pImageBinds), pPacket->pBindInfo[i].imageBindCount * sizeof(VkSparseImageMemoryBindInfo), pBindInfo[i].pImageBinds);
+        for (uint32_t j = 0; j < pPacket->pBindInfo[i].imageBindCount; j++) {
+            VkSparseImageMemoryBindInfo *pSparseImageMemoryBindInfo = (VkSparseImageMemoryBindInfo *)&pPacket->pBindInfo[i].pImageBinds[j];
+            const VkSparseImageMemoryBindInfo *pSparseImgMemBndInf = &pBindInfo[i].pImageBinds[j];
+            vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pSparseImageMemoryBindInfo->pBinds), pSparseImageMemoryBindInfo->bindCount * sizeof(VkSparseImageMemoryBind), pSparseImgMemBndInf->pBinds);
+            vktrace_finalize_buffer_address(pHeader, (void**)&(pSparseImageMemoryBindInfo->pBinds));
+        }
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pBindInfo[i].pImageBinds));
+
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pBindInfo[i].pImageOpaqueBinds), pPacket->pBindInfo[i].imageOpaqueBindCount * sizeof(VkSparseImageOpaqueMemoryBindInfo), pBindInfo[i].pImageOpaqueBinds);
+        for (uint32_t j = 0; j < pPacket->pBindInfo[i].imageOpaqueBindCount; j++) {
+            VkSparseImageOpaqueMemoryBindInfo *pSparseImageOpaqueMemoryBindInfo = (VkSparseImageOpaqueMemoryBindInfo *)&pPacket->pBindInfo[i].pImageOpaqueBinds[j];
+            const VkSparseImageOpaqueMemoryBindInfo *pSparseImgOpqMemBndInf = &pBindInfo[i].pImageOpaqueBinds[j];
+            vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pSparseImageOpaqueMemoryBindInfo->pBinds), pSparseImageOpaqueMemoryBindInfo->bindCount * sizeof(VkSparseMemoryBind), pSparseImgOpqMemBndInf->pBinds);
+            vktrace_finalize_buffer_address(pHeader, (void**)&(pSparseImageOpaqueMemoryBindInfo->pBinds));
+        }
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pBindInfo[i].pImageOpaqueBinds));
+
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pBindInfo[i].pWaitSemaphores), pPacket->pBindInfo[i].waitSemaphoreCount * sizeof(VkSemaphore), pBindInfo[i].pWaitSemaphores);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pBindInfo[i].pWaitSemaphores));
+
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pBindInfo[i].pSignalSemaphores), pPacket->pBindInfo[i].signalSemaphoreCount * sizeof(VkSemaphore), pBindInfo[i].pSignalSemaphores);
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pBindInfo[i].pSignalSemaphores));
+    }
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pBindInfo));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR void VKAPI_CALL __HOOKED_vkCmdWaitEvents(
+    VkCommandBuffer                             commandBuffer,
+    uint32_t                                    eventCount,
+    const VkEvent*                              pEvents,
+    VkPipelineStageFlags                        srcStageMask,
+    VkPipelineStageFlags                        dstStageMask,
+    uint32_t                                    memoryBarrierCount,
+    const VkMemoryBarrier*                      pMemoryBarriers,
+    uint32_t                                    bufferMemoryBarrierCount,
+    const VkBufferMemoryBarrier*                pBufferMemoryBarriers,
+    uint32_t                                    imageMemoryBarrierCount,
+    const VkImageMemoryBarrier*                 pImageMemoryBarriers)
+{
+    vktrace_trace_packet_header* pHeader;
+    packet_vkCmdWaitEvents* pPacket = NULL;
+    size_t customSize;
+    customSize = (eventCount * sizeof(VkEvent)) + (memoryBarrierCount * sizeof(VkMemoryBarrier)) +
+            (bufferMemoryBarrierCount * sizeof(VkBufferMemoryBarrier)) +
+            (imageMemoryBarrierCount * sizeof(VkImageMemoryBarrier));
+    CREATE_TRACE_PACKET(vkCmdWaitEvents, customSize);
+    mdd(commandBuffer)->devTable.CmdWaitEvents(commandBuffer, eventCount, pEvents, srcStageMask, dstStageMask,
+                                    memoryBarrierCount, pMemoryBarriers,
+                                    bufferMemoryBarrierCount, pBufferMemoryBarriers,
+                                    imageMemoryBarrierCount, pImageMemoryBarriers);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkCmdWaitEvents(pHeader);
+    pPacket->commandBuffer = commandBuffer;
+    pPacket->eventCount = eventCount;
+    pPacket->srcStageMask = srcStageMask;
+    pPacket->dstStageMask = dstStageMask;
+    pPacket->memoryBarrierCount = memoryBarrierCount;
+    pPacket->bufferMemoryBarrierCount = bufferMemoryBarrierCount;
+    pPacket->imageMemoryBarrierCount = imageMemoryBarrierCount;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pEvents), eventCount * sizeof(VkEvent), pEvents);
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pEvents));
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pMemoryBarriers), memoryBarrierCount * sizeof(VkMemoryBarrier), pMemoryBarriers);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pBufferMemoryBarriers), bufferMemoryBarrierCount * sizeof(VkBufferMemoryBarrier), pBufferMemoryBarriers);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pImageMemoryBarriers), imageMemoryBarrierCount * sizeof(VkImageMemoryBarrier), pImageMemoryBarriers);
+
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pMemoryBarriers));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pBufferMemoryBarriers));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pImageMemoryBarriers));
+    FINISH_TRACE_PACKET();
+}
+
+VKTRACER_EXPORT VKAPI_ATTR void VKAPI_CALL __HOOKED_vkCmdPipelineBarrier(
+    VkCommandBuffer                             commandBuffer,
+    VkPipelineStageFlags                        srcStageMask,
+    VkPipelineStageFlags                        dstStageMask,
+    VkDependencyFlags                           dependencyFlags,
+    uint32_t                                    memoryBarrierCount,
+    const VkMemoryBarrier*                      pMemoryBarriers,
+    uint32_t                                    bufferMemoryBarrierCount,
+    const VkBufferMemoryBarrier*                pBufferMemoryBarriers,
+    uint32_t                                    imageMemoryBarrierCount,
+    const VkImageMemoryBarrier*                 pImageMemoryBarriers)
+{
+    vktrace_trace_packet_header* pHeader;
+    packet_vkCmdPipelineBarrier* pPacket = NULL;
+    size_t customSize;
+    customSize = (memoryBarrierCount * sizeof(VkMemoryBarrier)) +
+            (bufferMemoryBarrierCount * sizeof(VkBufferMemoryBarrier)) +
+            (imageMemoryBarrierCount * sizeof(VkImageMemoryBarrier));
+    CREATE_TRACE_PACKET(vkCmdPipelineBarrier, customSize);
+    mdd(commandBuffer)->devTable.CmdPipelineBarrier(commandBuffer, srcStageMask, dstStageMask, dependencyFlags, memoryBarrierCount, pMemoryBarriers, bufferMemoryBarrierCount, pBufferMemoryBarriers, imageMemoryBarrierCount, pImageMemoryBarriers);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkCmdPipelineBarrier(pHeader);
+    pPacket->commandBuffer = commandBuffer;
+    pPacket->srcStageMask = srcStageMask;
+    pPacket->dstStageMask = dstStageMask;
+    pPacket->dependencyFlags = dependencyFlags;
+    pPacket->memoryBarrierCount = memoryBarrierCount;
+    pPacket->bufferMemoryBarrierCount = bufferMemoryBarrierCount;
+    pPacket->imageMemoryBarrierCount = imageMemoryBarrierCount;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pMemoryBarriers), memoryBarrierCount * sizeof(VkMemoryBarrier), pMemoryBarriers);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pBufferMemoryBarriers), bufferMemoryBarrierCount * sizeof(VkBufferMemoryBarrier), pBufferMemoryBarriers);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pImageMemoryBarriers), imageMemoryBarrierCount * sizeof(VkImageMemoryBarrier), pImageMemoryBarriers);
+
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pMemoryBarriers));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pBufferMemoryBarriers));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pImageMemoryBarriers));
+    FINISH_TRACE_PACKET();
+}
+
+VKTRACER_EXPORT VKAPI_ATTR void VKAPI_CALL __HOOKED_vkCmdPushConstants(
+    VkCommandBuffer commandBuffer,
+    VkPipelineLayout layout,
+    VkShaderStageFlags stageFlags,
+    uint32_t offset,
+    uint32_t size,
+    const void* pValues)
+{
+    vktrace_trace_packet_header* pHeader;
+    packet_vkCmdPushConstants* pPacket = NULL;
+    CREATE_TRACE_PACKET(vkCmdPushConstants, size);
+    mdd(commandBuffer)->devTable.CmdPushConstants(commandBuffer, layout, stageFlags, offset, size, pValues);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkCmdPushConstants(pHeader);
+    pPacket->commandBuffer = commandBuffer;
+    pPacket->layout = layout;
+    pPacket->stageFlags = stageFlags;
+    pPacket->offset = offset;
+    pPacket->size = size;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pValues), size, pValues);
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pValues));
+    FINISH_TRACE_PACKET();
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkGetPipelineCacheData(
+    VkDevice device,
+    VkPipelineCache pipelineCache,
+    size_t* pDataSize,
+    void* pData)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkGetPipelineCacheData* pPacket = NULL;
+    uint64_t startTime;
+    uint64_t endTime;
+    uint64_t vktraceStartTime = vktrace_get_time();
+    startTime = vktrace_get_time();
+    result = mdd(device)->devTable.GetPipelineCacheData(device, pipelineCache, pDataSize, pData);
+    endTime = vktrace_get_time();
+    assert(pDataSize);
+    CREATE_TRACE_PACKET(vkGetPipelineCacheData, sizeof(size_t) + ROUNDUP_TO_4(*pDataSize));
+    pHeader->vktrace_begin_time = vktraceStartTime;
+    pHeader->entrypoint_begin_time = startTime;
+    pHeader->entrypoint_end_time = endTime;
+    pPacket = interpret_body_as_vkGetPipelineCacheData(pHeader);
+    pPacket->device = device;
+    pPacket->pipelineCache = pipelineCache;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDataSize), sizeof(size_t), pDataSize);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pData), ROUNDUP_TO_4(*pDataSize), pData);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDataSize));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pData));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateGraphicsPipelines(
+    VkDevice device,
+    VkPipelineCache pipelineCache,
+    uint32_t createInfoCount,
+    const VkGraphicsPipelineCreateInfo* pCreateInfos,
+    const VkAllocationCallbacks* pAllocator,
+    VkPipeline* pPipelines)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreateGraphicsPipelines* pPacket = NULL;
+    size_t total_size = 0;
+    uint32_t i;
+    for (i = 0; i < createInfoCount; i++) {
+        total_size += get_struct_chain_size((void*)&pCreateInfos[i]);
+    }
+    CREATE_TRACE_PACKET(vkCreateGraphicsPipelines, total_size + sizeof(VkAllocationCallbacks) + createInfoCount*sizeof(VkPipeline));
+    result = mdd(device)->devTable.CreateGraphicsPipelines(device, pipelineCache, createInfoCount, pCreateInfos, pAllocator, pPipelines);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkCreateGraphicsPipelines(pHeader);
+    pPacket->device = device;
+    pPacket->pipelineCache = pipelineCache;
+    pPacket->createInfoCount = createInfoCount;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfos), createInfoCount*sizeof(VkGraphicsPipelineCreateInfo), pCreateInfos);
+    add_VkGraphicsPipelineCreateInfos_to_trace_packet(pHeader, (VkGraphicsPipelineCreateInfo*)pPacket->pCreateInfos, pCreateInfos, createInfoCount);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPipelines), createInfoCount*sizeof(VkPipeline), pPipelines);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfos));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPipelines));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
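+// Conservatively computes the extra bytes each compute-pipeline create info needs in the packet:
+// the shader stage struct, its entry-point name, and any specialization map entries and data.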
+uint64_t getVkComputePipelineCreateInfosAdditionalSize(uint32_t createInfoCount, const VkComputePipelineCreateInfo* pCreateInfos)
+{
+    uint64_t uiRet = 0;
+    VkPipelineShaderStageCreateInfo* packetShader;
+    for (uint32_t i = 0; i < createInfoCount; i++)
+    {
+        uiRet += sizeof(VkPipelineShaderStageCreateInfo);
+        packetShader = (VkPipelineShaderStageCreateInfo*)&pCreateInfos[i].stage;
+        uiRet += strlen(packetShader->pName) + 1;
+        uiRet += sizeof(VkSpecializationInfo);
+        if (packetShader->pSpecializationInfo != NULL)
+        {
+            uiRet += sizeof(VkSpecializationMapEntry) * packetShader->pSpecializationInfo->mapEntryCount;
+            uiRet += packetShader->pSpecializationInfo->dataSize;
+        }
+    }
+    return uiRet;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateComputePipelines(
+    VkDevice device,
+    VkPipelineCache pipelineCache,
+    uint32_t createInfoCount,
+    const VkComputePipelineCreateInfo* pCreateInfos,
+    const VkAllocationCallbacks* pAllocator,
+    VkPipeline* pPipelines)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreateComputePipelines* pPacket = NULL;
+    // Packet size: the create-info array itself, the per-stage additions computed by the helper above,
+    // the allocator, and the output pipeline handles.
+    CREATE_TRACE_PACKET(vkCreateComputePipelines, createInfoCount*sizeof(VkComputePipelineCreateInfo) + getVkComputePipelineCreateInfosAdditionalSize(createInfoCount, pCreateInfos) + sizeof(VkAllocationCallbacks) + createInfoCount*sizeof(VkPipeline));
+
+    result = mdd(device)->devTable.CreateComputePipelines(device, pipelineCache, createInfoCount, pCreateInfos, pAllocator, pPipelines);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkCreateComputePipelines(pHeader);
+    pPacket->device = device;
+    pPacket->pipelineCache = pipelineCache;
+    pPacket->createInfoCount = createInfoCount;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfos), createInfoCount*sizeof(VkComputePipelineCreateInfo), pCreateInfos);
+    add_VkComputePipelineCreateInfos_to_trace_packet(pHeader, (VkComputePipelineCreateInfo*)pPacket->pCreateInfos, pCreateInfos, createInfoCount);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPipelines), createInfoCount*sizeof(VkPipeline), pPipelines);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfos));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPipelines));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreatePipelineCache(
+    VkDevice device,
+    const VkPipelineCacheCreateInfo* pCreateInfo,
+    const VkAllocationCallbacks* pAllocator,
+    VkPipelineCache* pPipelineCache)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreatePipelineCache* pPacket = NULL;
+    // Need to round up the size when we create the packet because pCreateInfo->initialDataSize may not be a multiple of 4
+    CREATE_TRACE_PACKET(vkCreatePipelineCache, ROUNDUP_TO_4(get_struct_chain_size((void*)pCreateInfo) + sizeof(VkAllocationCallbacks) + sizeof(VkPipelineCache)));
+    result = mdd(device)->devTable.CreatePipelineCache(device, pCreateInfo, pAllocator, pPipelineCache);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkCreatePipelineCache(pHeader);
+    pPacket->device = device;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkPipelineCacheCreateInfo), pCreateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pInitialData), ROUNDUP_TO_4(pPacket->pCreateInfo->initialDataSize), pCreateInfo->pInitialData);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPipelineCache), sizeof(VkPipelineCache), pPipelineCache);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pInitialData));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPipelineCache));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR void VKAPI_CALL __HOOKED_vkCmdBeginRenderPass(
+    VkCommandBuffer commandBuffer,
+    const VkRenderPassBeginInfo* pRenderPassBegin,
+    VkSubpassContents contents)
+{
+    vktrace_trace_packet_header* pHeader;
+    packet_vkCmdBeginRenderPass* pPacket = NULL;
+    size_t clearValueSize = sizeof(VkClearValue) * pRenderPassBegin->clearValueCount;
+    CREATE_TRACE_PACKET(vkCmdBeginRenderPass, sizeof(VkRenderPassBeginInfo) + clearValueSize);
+    mdd(commandBuffer)->devTable.CmdBeginRenderPass(commandBuffer, pRenderPassBegin, contents);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkCmdBeginRenderPass(pHeader);
+    pPacket->commandBuffer = commandBuffer;
+    pPacket->contents = contents;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pRenderPassBegin), sizeof(VkRenderPassBeginInfo), pRenderPassBegin);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pRenderPassBegin->pClearValues), clearValueSize, pRenderPassBegin->pClearValues);
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pRenderPassBegin->pClearValues));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pRenderPassBegin));
+    FINISH_TRACE_PACKET();
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkFreeDescriptorSets(
+    VkDevice device,
+    VkDescriptorPool descriptorPool,
+    uint32_t descriptorSetCount,
+    const VkDescriptorSet* pDescriptorSets)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkFreeDescriptorSets* pPacket = NULL;
+    CREATE_TRACE_PACKET(vkFreeDescriptorSets, descriptorSetCount*sizeof(VkDescriptorSet));
+    result = mdd(device)->devTable.FreeDescriptorSets(device, descriptorPool, descriptorSetCount, pDescriptorSets);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkFreeDescriptorSets(pHeader);
+    pPacket->device = device;
+    pPacket->descriptorPool = descriptorPool;
+    pPacket->descriptorSetCount = descriptorSetCount;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDescriptorSets), descriptorSetCount*sizeof(VkDescriptorSet), pDescriptorSets);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDescriptorSets));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkGetPhysicalDeviceSurfaceCapabilitiesKHR(
+    VkPhysicalDevice physicalDevice,
+    VkSurfaceKHR surface,
+    VkSurfaceCapabilitiesKHR* pSurfaceCapabilities)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkGetPhysicalDeviceSurfaceCapabilitiesKHR* pPacket = NULL;
+    CREATE_TRACE_PACKET(vkGetPhysicalDeviceSurfaceCapabilitiesKHR, sizeof(VkSurfaceCapabilitiesKHR));
+    result = mid(physicalDevice)->instTable.GetPhysicalDeviceSurfaceCapabilitiesKHR(physicalDevice, surface, pSurfaceCapabilities);
+    pPacket = interpret_body_as_vkGetPhysicalDeviceSurfaceCapabilitiesKHR(pHeader);
+    pPacket->physicalDevice = physicalDevice;
+    pPacket->surface = surface;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSurfaceCapabilities), sizeof(VkSurfaceCapabilitiesKHR), pSurfaceCapabilities);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSurfaceCapabilities));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkGetPhysicalDeviceSurfaceFormatsKHR(
+    VkPhysicalDevice physicalDevice,
+    VkSurfaceKHR surface,
+    uint32_t* pSurfaceFormatCount,
+    VkSurfaceFormatKHR* pSurfaceFormats)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    size_t _dataSize;
+    packet_vkGetPhysicalDeviceSurfaceFormatsKHR* pPacket = NULL;
+    uint64_t startTime;
+    uint64_t endTime;
+    uint64_t vktraceStartTime = vktrace_get_time();
+    startTime = vktrace_get_time();
+    result = mid(physicalDevice)->instTable.GetPhysicalDeviceSurfaceFormatsKHR(physicalDevice, surface, pSurfaceFormatCount, pSurfaceFormats);
+    endTime = vktrace_get_time();
+    _dataSize = (pSurfaceFormatCount == NULL || pSurfaceFormats == NULL) ? 0 : (*pSurfaceFormatCount * sizeof(VkSurfaceFormatKHR));
+    CREATE_TRACE_PACKET(vkGetPhysicalDeviceSurfaceFormatsKHR, sizeof(uint32_t) + _dataSize);
+    pHeader->vktrace_begin_time = vktraceStartTime;
+    pHeader->entrypoint_begin_time = startTime;
+    pHeader->entrypoint_end_time = endTime;
+    pPacket = interpret_body_as_vkGetPhysicalDeviceSurfaceFormatsKHR(pHeader);
+    pPacket->physicalDevice = physicalDevice;
+    pPacket->surface = surface;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSurfaceFormatCount), sizeof(uint32_t), pSurfaceFormatCount);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSurfaceFormats), _dataSize, pSurfaceFormats);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSurfaceFormatCount));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSurfaceFormats));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkGetPhysicalDeviceSurfacePresentModesKHR(
+    VkPhysicalDevice physicalDevice,
+    VkSurfaceKHR surface,
+    uint32_t* pPresentModeCount,
+    VkPresentModeKHR* pPresentModes)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    size_t _dataSize;
+    packet_vkGetPhysicalDeviceSurfacePresentModesKHR* pPacket = NULL;
+    uint64_t startTime;
+    uint64_t endTime;
+    uint64_t vktraceStartTime = vktrace_get_time();
+    startTime = vktrace_get_time();
+    result = mid(physicalDevice)->instTable.GetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, pPresentModeCount, pPresentModes);
+    endTime = vktrace_get_time();
+    _dataSize = (pPresentModeCount == NULL || pPresentModes == NULL) ? 0 : (*pPresentModeCount * sizeof(VkPresentModeKHR));
+    CREATE_TRACE_PACKET(vkGetPhysicalDeviceSurfacePresentModesKHR, sizeof(uint32_t) + _dataSize);
+    pHeader->vktrace_begin_time = vktraceStartTime;
+    pHeader->entrypoint_begin_time = startTime;
+    pHeader->entrypoint_end_time = endTime;
+    pPacket = interpret_body_as_vkGetPhysicalDeviceSurfacePresentModesKHR(pHeader);
+    pPacket->physicalDevice = physicalDevice;
+    pPacket->surface = surface;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPresentModeCount), sizeof(uint32_t), pPresentModeCount);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPresentModes), _dataSize, pPresentModes);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPresentModeCount));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPresentModes));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateSwapchainKHR(
+    VkDevice device,
+    const VkSwapchainCreateInfoKHR* pCreateInfo,
+    const VkAllocationCallbacks* pAllocator,
+    VkSwapchainKHR* pSwapchain)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreateSwapchainKHR* pPacket = NULL;
+    CREATE_TRACE_PACKET(vkCreateSwapchainKHR, vk_size_vkswapchaincreateinfokhr(pCreateInfo) + sizeof(VkSwapchainKHR) + sizeof(VkAllocationCallbacks));
+    result = mdd(device)->devTable.CreateSwapchainKHR(device, pCreateInfo, pAllocator, pSwapchain);
+    pPacket = interpret_body_as_vkCreateSwapchainKHR(pHeader);
+    pPacket->device = device;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkSwapchainCreateInfoKHR), pCreateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSwapchain), sizeof(VkSwapchainKHR), pSwapchain);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pQueueFamilyIndices), pCreateInfo->queueFamilyIndexCount * sizeof(uint32_t), pCreateInfo->pQueueFamilyIndices);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pQueueFamilyIndices));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSwapchain));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkGetSwapchainImagesKHR(
+    VkDevice device,
+    VkSwapchainKHR swapchain,
+    uint32_t* pSwapchainImageCount,
+    VkImage* pSwapchainImages)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    size_t _dataSize;
+    packet_vkGetSwapchainImagesKHR* pPacket = NULL;
+    uint64_t startTime;
+    uint64_t endTime;
+    uint64_t vktraceStartTime = vktrace_get_time();
+    startTime = vktrace_get_time();
+    result = mdd(device)->devTable.GetSwapchainImagesKHR(device, swapchain, pSwapchainImageCount, pSwapchainImages);
+    endTime = vktrace_get_time();
+    _dataSize = (pSwapchainImageCount == NULL || pSwapchainImages == NULL) ? 0 : (*pSwapchainImageCount * sizeof(VkImage));
+    CREATE_TRACE_PACKET(vkGetSwapchainImagesKHR, sizeof(uint32_t) + _dataSize);
+    pHeader->vktrace_begin_time = vktraceStartTime;
+    pHeader->entrypoint_begin_time = startTime;
+    pHeader->entrypoint_end_time = endTime;
+    pPacket = interpret_body_as_vkGetSwapchainImagesKHR(pHeader);
+    pPacket->device = device;
+    pPacket->swapchain = swapchain;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSwapchainImageCount), sizeof(uint32_t), pSwapchainImageCount);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSwapchainImages), _dataSize, pSwapchainImages);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSwapchainImageCount));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSwapchainImages));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkQueuePresentKHR(
+    VkQueue queue,
+    const VkPresentInfoKHR* pPresentInfo)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkQueuePresentKHR* pPacket = NULL;
+    size_t swapchainSize = pPresentInfo->swapchainCount * sizeof(VkSwapchainKHR);
+    size_t indexSize = pPresentInfo->swapchainCount * sizeof(uint32_t);
+    size_t semaSize = pPresentInfo->waitSemaphoreCount * sizeof(VkSemaphore);
+    size_t resultsSize = pPresentInfo->swapchainCount * sizeof(VkResult);
+    size_t totalSize = sizeof(VkPresentInfoKHR) + swapchainSize + indexSize + semaSize;
+    if (pPresentInfo->pResults != NULL) {
+        totalSize += resultsSize;
+    }
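+    // pResults is optional; size and copy it only when the app asked for per-swapchain results.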
+    CREATE_TRACE_PACKET(vkQueuePresentKHR, totalSize);
+    result = mdd(queue)->devTable.QueuePresentKHR(queue, pPresentInfo);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkQueuePresentKHR(pHeader);
+    pPacket->queue = queue;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPresentInfo), sizeof(VkPresentInfoKHR), pPresentInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPresentInfo->pSwapchains), swapchainSize, pPresentInfo->pSwapchains);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPresentInfo->pImageIndices), indexSize, pPresentInfo->pImageIndices);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPresentInfo->pWaitSemaphores), semaSize, pPresentInfo->pWaitSemaphores);
+    if (pPresentInfo->pResults != NULL) {
+        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pPresentInfo->pResults), resultsSize, pPresentInfo->pResults);
+    }
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPresentInfo->pImageIndices));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPresentInfo->pSwapchains));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPresentInfo->pWaitSemaphores));
+    if (pPresentInfo->pResults != NULL) {
+        vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPresentInfo->pResults));
+    }
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pPresentInfo));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+/* TODO: these can probably be moved into codegen */
+#ifdef VK_USE_PLATFORM_WIN32_KHR
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateWin32SurfaceKHR(
+    VkInstance                                  instance,
+    const VkWin32SurfaceCreateInfoKHR*          pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkSurfaceKHR*                               pSurface)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreateWin32SurfaceKHR* pPacket = NULL;
+    // don't bother copying the actual Win32 hinstance and hwnd into the trace packet; vkreplay has to use its own anyway
+    CREATE_TRACE_PACKET(vkCreateWin32SurfaceKHR, sizeof(VkSurfaceKHR) + sizeof(VkAllocationCallbacks) + sizeof(VkWin32SurfaceCreateInfoKHR));
+    result = mid(instance)->instTable.CreateWin32SurfaceKHR(instance, pCreateInfo, pAllocator, pSurface);
+    pPacket = interpret_body_as_vkCreateWin32SurfaceKHR(pHeader);
+    pPacket->instance = instance;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkWin32SurfaceCreateInfoKHR), pCreateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSurface), sizeof(VkSurfaceKHR), pSurface);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSurface));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkBool32 VKAPI_CALL __HOOKED_vkGetPhysicalDeviceWin32PresentationSupportKHR(
+    VkPhysicalDevice                            physicalDevice,
+    uint32_t                                    queueFamilyIndex)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkBool32 result;
+    packet_vkGetPhysicalDeviceWin32PresentationSupportKHR* pPacket = NULL;
+    CREATE_TRACE_PACKET(vkGetPhysicalDeviceWin32PresentationSupportKHR, 0);
+    result = mid(physicalDevice)->instTable.GetPhysicalDeviceWin32PresentationSupportKHR(physicalDevice, queueFamilyIndex);
+    pPacket = interpret_body_as_vkGetPhysicalDeviceWin32PresentationSupportKHR(pHeader);
+    pPacket->physicalDevice = physicalDevice;
+    pPacket->queueFamilyIndex = queueFamilyIndex;
+    pPacket->result = result;
+    FINISH_TRACE_PACKET();
+    return result;
+}
+#endif
+#ifdef VK_USE_PLATFORM_XCB_KHR
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateXcbSurfaceKHR(
+    VkInstance                                  instance,
+    const VkXcbSurfaceCreateInfoKHR*            pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkSurfaceKHR*                               pSurface)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreateXcbSurfaceKHR* pPacket = NULL;
+    // don't bother copying the actual xcb window and connection into the trace packet; vkreplay has to use its own anyway
+    CREATE_TRACE_PACKET(vkCreateXcbSurfaceKHR, sizeof(VkSurfaceKHR) + sizeof(VkAllocationCallbacks) + sizeof(VkXcbSurfaceCreateInfoKHR));
+    result = mid(instance)->instTable.CreateXcbSurfaceKHR(instance, pCreateInfo, pAllocator, pSurface);
+    pPacket = interpret_body_as_vkCreateXcbSurfaceKHR(pHeader);
+    pPacket->instance = instance;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkXcbSurfaceCreateInfoKHR), pCreateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSurface), sizeof(VkSurfaceKHR), pSurface);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSurface));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkBool32 VKAPI_CALL __HOOKED_vkGetPhysicalDeviceXcbPresentationSupportKHR(
+    VkPhysicalDevice                            physicalDevice,
+    uint32_t                                    queueFamilyIndex,
+    xcb_connection_t*                           connection,
+    xcb_visualid_t                              visual_id)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkBool32 result;
+    packet_vkGetPhysicalDeviceXcbPresentationSupportKHR* pPacket = NULL;
+    // don't bother copying the actual xcb visual_id and connection into the trace packet; vkreplay has to use its own anyway
+    CREATE_TRACE_PACKET(vkGetPhysicalDeviceXcbPresentationSupportKHR, 0);
+    result = mid(physicalDevice)->instTable.GetPhysicalDeviceXcbPresentationSupportKHR(physicalDevice, queueFamilyIndex, connection, visual_id);
+    pPacket = interpret_body_as_vkGetPhysicalDeviceXcbPresentationSupportKHR(pHeader);
+    pPacket->physicalDevice = physicalDevice;
+    pPacket->connection = connection;
+    pPacket->queueFamilyIndex = queueFamilyIndex;
+    pPacket->visual_id = visual_id;
+    pPacket->result = result;
+    FINISH_TRACE_PACKET();
+    return result;
+}
+#endif
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateXlibSurfaceKHR(
+    VkInstance                                  instance,
+    const VkXlibSurfaceCreateInfoKHR*           pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkSurfaceKHR*                               pSurface)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreateXlibSurfaceKHR* pPacket = NULL;
+    // don't bother copying the actual Xlib window and display into the trace packet; vkreplay has to use its own anyway
+    CREATE_TRACE_PACKET(vkCreateXlibSurfaceKHR, sizeof(VkSurfaceKHR) + sizeof(VkAllocationCallbacks) + sizeof(VkXlibSurfaceCreateInfoKHR));
+    result = mid(instance)->instTable.CreateXlibSurfaceKHR(instance, pCreateInfo, pAllocator, pSurface);
+    pPacket = interpret_body_as_vkCreateXlibSurfaceKHR(pHeader);
+    pPacket->instance = instance;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkXlibSurfaceCreateInfoKHR), pCreateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSurface), sizeof(VkSurfaceKHR), pSurface);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSurface));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+
+VKTRACER_EXPORT VKAPI_ATTR VkBool32 VKAPI_CALL __HOOKED_vkGetPhysicalDeviceXlibPresentationSupportKHR(
+    VkPhysicalDevice                            physicalDevice,
+    uint32_t                                    queueFamilyIndex,
+    Display*                                    dpy,
+    VisualID                                    visualID)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkBool32 result;
+    packet_vkGetPhysicalDeviceXlibPresentationSupportKHR* pPacket = NULL;
+    // don't bother copying the actual Xlib visual ID and display into the trace packet; vkreplay has to use its own anyway
+    CREATE_TRACE_PACKET(vkGetPhysicalDeviceXlibPresentationSupportKHR, 0);
+    result = mid(physicalDevice)->instTable.GetPhysicalDeviceXlibPresentationSupportKHR(physicalDevice, queueFamilyIndex, dpy, visualID);
+    pPacket = interpret_body_as_vkGetPhysicalDeviceXlibPresentationSupportKHR(pHeader);
+    pPacket->physicalDevice = physicalDevice;
+    pPacket->dpy = dpy;
+    pPacket->queueFamilyIndex = queueFamilyIndex;
+    pPacket->visualID = visualID;
+    pPacket->result = result;
+    FINISH_TRACE_PACKET();
+    return result;
+}
+#endif
+#ifdef VK_USE_PLATFORM_ANDROID_KHR
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateAndroidSurfaceKHR(
+    VkInstance                                  instance,
+    const VkAndroidSurfaceCreateInfoKHR*        pCreateInfo,
+    const VkAllocationCallbacks*                pAllocator,
+    VkSurfaceKHR*                               pSurface)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkCreateAndroidSurfaceKHR* pPacket = NULL;
+    // don't bother copying the actual native window into the trace packet; vkreplay has to use its own anyway
+    CREATE_TRACE_PACKET(vkCreateAndroidSurfaceKHR, sizeof(VkSurfaceKHR) + sizeof(VkAllocationCallbacks) + sizeof(VkAndroidSurfaceCreateInfoKHR));
+    result = mid(instance)->instTable.CreateAndroidSurfaceKHR(instance, pCreateInfo, pAllocator, pSurface);
+    pPacket = interpret_body_as_vkCreateAndroidSurfaceKHR(pHeader);
+    pPacket->instance = instance;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkAndroidSurfaceCreateInfoKHR), pCreateInfo);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSurface), sizeof(VkSurfaceKHR), pSurface);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocator));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSurface));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+#endif
+
+// TODO: Wayland and Mir support
+
+/* TODO: Probably want to make this manual to get the result of the boolean and then check it on replay
+VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkGetPhysicalDeviceSurfaceSupportKHR(
+    VkPhysicalDevice physicalDevice,
+    uint32_t queueFamilyIndex,
+    const VkSurfaceDescriptionKHR* pSurfaceDescription,
+    VkBool32* pSupported)
+{
+    vktrace_trace_packet_header* pHeader;
+    VkResult result;
+    packet_vkGetPhysicalDeviceSurfaceSupportKHR* pPacket = NULL;
+    CREATE_TRACE_PACKET(vkGetPhysicalDeviceSurfaceSupportKHR, sizeof(VkSurfaceDescriptionKHR) + sizeof(VkBool32));
+    result = mid(physicalDevice)->instTable.GetPhysicalDeviceSurfaceSupportKHR(physicalDevice, queueFamilyIndex, pSurfaceDescription, pSupported);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkGetPhysicalDeviceSurfaceSupportKHR(pHeader);
+    pPacket->physicalDevice = physicalDevice;
+    pPacket->queueFamilyIndex = queueFamilyIndex;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSurfaceDescription), sizeof(VkSurfaceDescriptionKHR), pSurfaceDescription);
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSupported), sizeof(VkBool32), pSupported);
+    pPacket->result = result;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSurfaceDescription));
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pSupported));
+    FINISH_TRACE_PACKET();
+    return result;
+}
+*/
+
+static inline PFN_vkVoidFunction layer_intercept_proc(const char *name)
+{
+    if (!name || name[0] != 'v' || name[1] != 'k')
+        return NULL;
+
+    name += 2;
+
+    if (!strcmp(name, "CreateDevice"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateDevice;
+    if (!strcmp(name, "DestroyDevice"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyDevice;
+    if (!strcmp(name, "GetDeviceQueue"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetDeviceQueue;
+    if (!strcmp(name, "QueueSubmit"))
+        return (PFN_vkVoidFunction) __HOOKED_vkQueueSubmit;
+    if (!strcmp(name, "QueueWaitIdle"))
+        return (PFN_vkVoidFunction) __HOOKED_vkQueueWaitIdle;
+    if (!strcmp(name, "DeviceWaitIdle"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDeviceWaitIdle;
+    if (!strcmp(name, "AllocateMemory"))
+        return (PFN_vkVoidFunction) __HOOKED_vkAllocateMemory;
+    if (!strcmp(name, "FreeMemory"))
+        return (PFN_vkVoidFunction) __HOOKED_vkFreeMemory;
+    if (!strcmp(name, "MapMemory"))
+        return (PFN_vkVoidFunction) __HOOKED_vkMapMemory;
+    if (!strcmp(name, "UnmapMemory"))
+        return (PFN_vkVoidFunction) __HOOKED_vkUnmapMemory;
+    if (!strcmp(name, "FlushMappedMemoryRanges"))
+        return (PFN_vkVoidFunction) __HOOKED_vkFlushMappedMemoryRanges;
+    if (!strcmp(name, "InvalidateMappedMemoryRanges"))
+        return (PFN_vkVoidFunction) __HOOKED_vkInvalidateMappedMemoryRanges;
+    if (!strcmp(name, "GetDeviceMemoryCommitment"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetDeviceMemoryCommitment;
+    if (!strcmp(name, "BindBufferMemory"))
+        return (PFN_vkVoidFunction) __HOOKED_vkBindBufferMemory;
+    if (!strcmp(name, "BindImageMemory"))
+        return (PFN_vkVoidFunction) __HOOKED_vkBindImageMemory;
+    if (!strcmp(name, "GetBufferMemoryRequirements"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetBufferMemoryRequirements;
+    if (!strcmp(name, "GetImageMemoryRequirements"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetImageMemoryRequirements;
+    if (!strcmp(name, "GetImageSparseMemoryRequirements"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetImageSparseMemoryRequirements;
+    if (!strcmp(name, "GetPhysicalDeviceSparseImageFormatProperties"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceSparseImageFormatProperties;
+    if (!strcmp(name, "QueueBindSparse"))
+        return (PFN_vkVoidFunction) __HOOKED_vkQueueBindSparse;
+    if (!strcmp(name, "CreateFence"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateFence;
+    if (!strcmp(name, "DestroyFence"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyFence;
+    if (!strcmp(name, "ResetFences"))
+        return (PFN_vkVoidFunction) __HOOKED_vkResetFences;
+    if (!strcmp(name, "GetFenceStatus"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetFenceStatus;
+    if (!strcmp(name, "WaitForFences"))
+        return (PFN_vkVoidFunction) __HOOKED_vkWaitForFences;
+    if (!strcmp(name, "CreateSemaphore"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateSemaphore;
+    if (!strcmp(name, "DestroySemaphore"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroySemaphore;
+    if (!strcmp(name, "CreateEvent"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateEvent;
+    if (!strcmp(name, "DestroyEvent"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyEvent;
+    if (!strcmp(name, "GetEventStatus"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetEventStatus;
+    if (!strcmp(name, "SetEvent"))
+        return (PFN_vkVoidFunction) __HOOKED_vkSetEvent;
+    if (!strcmp(name, "ResetEvent"))
+        return (PFN_vkVoidFunction) __HOOKED_vkResetEvent;
+    if (!strcmp(name, "CreateQueryPool"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateQueryPool;
+    if (!strcmp(name, "DestroyQueryPool"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyQueryPool;
+    if (!strcmp(name, "GetQueryPoolResults"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetQueryPoolResults;
+    if (!strcmp(name, "CreateBuffer"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateBuffer;
+    if (!strcmp(name, "DestroyBuffer"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyBuffer;
+    if (!strcmp(name, "CreateBufferView"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateBufferView;
+    if (!strcmp(name, "DestroyBufferView"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyBufferView;
+    if (!strcmp(name, "CreateImage"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateImage;
+    if (!strcmp(name, "DestroyImage"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyImage;
+    if (!strcmp(name, "GetImageSubresourceLayout"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetImageSubresourceLayout;
+    if (!strcmp(name, "CreateImageView"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateImageView;
+    if (!strcmp(name, "DestroyImageView"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyImageView;
+    if (!strcmp(name, "CreateShaderModule"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateShaderModule;
+    if (!strcmp(name, "DestroyShaderModule"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyShaderModule;
+    if (!strcmp(name, "CreatePipelineCache"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreatePipelineCache;
+    if (!strcmp(name, "DestroyPipelineCache"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyPipelineCache;
+    if (!strcmp(name, "GetPipelineCacheData"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetPipelineCacheData;
+    if (!strcmp(name, "MergePipelineCaches"))
+        return (PFN_vkVoidFunction) __HOOKED_vkMergePipelineCaches;
+    if (!strcmp(name, "CreateGraphicsPipelines"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateGraphicsPipelines;
+    if (!strcmp(name, "CreateComputePipelines"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateComputePipelines;
+    if (!strcmp(name, "DestroyPipeline"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyPipeline;
+    if (!strcmp(name, "CreatePipelineLayout"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreatePipelineLayout;
+    if (!strcmp(name, "DestroyPipelineLayout"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyPipelineLayout;
+    if (!strcmp(name, "CreateSampler"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateSampler;
+    if (!strcmp(name, "DestroySampler"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroySampler;
+    if (!strcmp(name, "CreateDescriptorSetLayout"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateDescriptorSetLayout;
+    if (!strcmp(name, "DestroyDescriptorSetLayout"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyDescriptorSetLayout;
+    if (!strcmp(name, "CreateDescriptorPool"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateDescriptorPool;
+    if (!strcmp(name, "DestroyDescriptorPool"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyDescriptorPool;
+    if (!strcmp(name, "ResetDescriptorPool"))
+        return (PFN_vkVoidFunction) __HOOKED_vkResetDescriptorPool;
+    if (!strcmp(name, "AllocateDescriptorSets"))
+        return (PFN_vkVoidFunction) __HOOKED_vkAllocateDescriptorSets;
+    if (!strcmp(name, "FreeDescriptorSets"))
+        return (PFN_vkVoidFunction) __HOOKED_vkFreeDescriptorSets;
+    if (!strcmp(name, "UpdateDescriptorSets"))
+        return (PFN_vkVoidFunction) __HOOKED_vkUpdateDescriptorSets;
+    if (!strcmp(name, "CreateCommandPool"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateCommandPool;
+    if (!strcmp(name, "DestroyCommandPool"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyCommandPool;
+    if (!strcmp(name, "ResetCommandPool"))
+        return (PFN_vkVoidFunction) __HOOKED_vkResetCommandPool;
+    if (!strcmp(name, "AllocateCommandBuffers"))
+        return (PFN_vkVoidFunction) __HOOKED_vkAllocateCommandBuffers;
+    if (!strcmp(name, "FreeCommandBuffers"))
+        return (PFN_vkVoidFunction) __HOOKED_vkFreeCommandBuffers;
+    if (!strcmp(name, "BeginCommandBuffer"))
+        return (PFN_vkVoidFunction) __HOOKED_vkBeginCommandBuffer;
+    if (!strcmp(name, "EndCommandBuffer"))
+        return (PFN_vkVoidFunction) __HOOKED_vkEndCommandBuffer;
+    if (!strcmp(name, "ResetCommandBuffer"))
+        return (PFN_vkVoidFunction) __HOOKED_vkResetCommandBuffer;
+    if (!strcmp(name, "CmdBindPipeline"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdBindPipeline;
+    if (!strcmp(name, "CmdSetViewport"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdSetViewport;
+    if (!strcmp(name, "CmdSetScissor"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdSetScissor;
+    if (!strcmp(name, "CmdSetLineWidth"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdSetLineWidth;
+    if (!strcmp(name, "CmdSetDepthBias"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdSetDepthBias;
+    if (!strcmp(name, "CmdSetBlendConstants"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdSetBlendConstants;
+    if (!strcmp(name, "CmdSetDepthBounds"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdSetDepthBounds;
+    if (!strcmp(name, "CmdSetStencilCompareMask"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdSetStencilCompareMask;
+    if (!strcmp(name, "CmdSetStencilWriteMask"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdSetStencilWriteMask;
+    if (!strcmp(name, "CmdSetStencilReference"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdSetStencilReference;
+    if (!strcmp(name, "CmdBindDescriptorSets"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdBindDescriptorSets;
+    if (!strcmp(name, "CmdBindIndexBuffer"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdBindIndexBuffer;
+    if (!strcmp(name, "CmdBindVertexBuffers"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdBindVertexBuffers;
+    if (!strcmp(name, "CmdDraw"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdDraw;
+    if (!strcmp(name, "CmdDrawIndexed"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdDrawIndexed;
+    if (!strcmp(name, "CmdDrawIndirect"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdDrawIndirect;
+    if (!strcmp(name, "CmdDrawIndexedIndirect"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdDrawIndexedIndirect;
+    if (!strcmp(name, "CmdDispatch"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdDispatch;
+    if (!strcmp(name, "CmdDispatchIndirect"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdDispatchIndirect;
+    if (!strcmp(name, "CmdCopyBuffer"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdCopyBuffer;
+    if (!strcmp(name, "CmdCopyImage"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdCopyImage;
+    if (!strcmp(name, "CmdBlitImage"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdBlitImage;
+    if (!strcmp(name, "CmdCopyBufferToImage"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdCopyBufferToImage;
+    if (!strcmp(name, "CmdCopyImageToBuffer"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdCopyImageToBuffer;
+    if (!strcmp(name, "CmdUpdateBuffer"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdUpdateBuffer;
+    if (!strcmp(name, "CmdFillBuffer"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdFillBuffer;
+    if (!strcmp(name, "CmdClearColorImage"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdClearColorImage;
+    if (!strcmp(name, "CmdClearDepthStencilImage"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdClearDepthStencilImage;
+    if (!strcmp(name, "CmdClearAttachments"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdClearAttachments;
+    if (!strcmp(name, "CmdResolveImage"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdResolveImage;
+    if (!strcmp(name, "CmdSetEvent"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdSetEvent;
+    if (!strcmp(name, "CmdResetEvent"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdResetEvent;
+    if (!strcmp(name, "CmdWaitEvents"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdWaitEvents;
+    if (!strcmp(name, "CmdPipelineBarrier"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdPipelineBarrier;
+    if (!strcmp(name, "CmdBeginQuery"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdBeginQuery;
+    if (!strcmp(name, "CmdEndQuery"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdEndQuery;
+    if (!strcmp(name, "CmdResetQueryPool"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdResetQueryPool;
+    if (!strcmp(name, "CmdWriteTimestamp"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdWriteTimestamp;
+    if (!strcmp(name, "CmdCopyQueryPoolResults"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdCopyQueryPoolResults;
+    if (!strcmp(name, "CreateFramebuffer"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateFramebuffer;
+    if (!strcmp(name, "DestroyFramebuffer"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyFramebuffer;
+    if (!strcmp(name, "CreateRenderPass"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateRenderPass;
+    if (!strcmp(name, "DestroyRenderPass"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyRenderPass;
+    if (!strcmp(name, "GetRenderAreaGranularity"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetRenderAreaGranularity;
+    if (!strcmp(name, "CmdBeginRenderPass"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdBeginRenderPass;
+    if (!strcmp(name, "CmdNextSubpass"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdNextSubpass;
+    if (!strcmp(name, "CmdPushConstants"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdPushConstants;
+    if (!strcmp(name, "CmdEndRenderPass"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdEndRenderPass;
+    if (!strcmp(name, "CmdExecuteCommands"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCmdExecuteCommands;
+
+    return NULL;
+}
+
+static inline PFN_vkVoidFunction layer_intercept_instance_proc(const char *name)
+{
+    if (!name || name[0] != 'v' || name[1] != 'k')
+        return NULL;
+
+    name += 2;
+    if (!strcmp(name, "CreateDevice"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateDevice;
+    if (!strcmp(name, "CreateInstance"))
+        return (PFN_vkVoidFunction) __HOOKED_vkCreateInstance;
+    if (!strcmp(name, "DestroyInstance"))
+        return (PFN_vkVoidFunction) __HOOKED_vkDestroyInstance;
+    if (!strcmp(name, "EnumeratePhysicalDevices"))
+        return (PFN_vkVoidFunction) __HOOKED_vkEnumeratePhysicalDevices;
+    if (!strcmp(name, "GetPhysicalDeviceFeatures"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceFeatures;
+    if (!strcmp(name, "GetPhysicalDeviceFormatProperties"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceFormatProperties;
+    if (!strcmp(name, "GetPhysicalDeviceImageFormatProperties"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceImageFormatProperties;
+    if (!strcmp(name, "GetPhysicalDeviceSparseImageFormatProperties"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceSparseImageFormatProperties;
+    if (!strcmp(name, "GetPhysicalDeviceProperties"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceProperties;
+    if (!strcmp(name, "GetPhysicalDeviceQueueFamilyProperties"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceQueueFamilyProperties;
+    if (!strcmp(name, "GetPhysicalDeviceMemoryProperties"))
+        return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceMemoryProperties;
+    if (!strcmp(name, "EnumerateDeviceLayerProperties"))
+        return (PFN_vkVoidFunction) __HOOKED_vkEnumerateDeviceLayerProperties;
+    if (!strcmp(name, "EnumerateDeviceExtensionProperties"))
+        return (PFN_vkVoidFunction) __HOOKED_vkEnumerateDeviceExtensionProperties;
+
+    return NULL;
+}
+
+/**
+ * We want trace packets created for app-initiated calls to GetDeviceProcAddr,
+ * but not for loader-initiated calls to GDPA, so two versions of GDPA are needed.
+ */
+VKTRACER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL vktraceGetDeviceProcAddr(VkDevice device, const char* funcName)
+{
+    vktrace_trace_packet_header *pHeader;
+    PFN_vkVoidFunction addr;
+    packet_vkGetDeviceProcAddr* pPacket = NULL;
+    CREATE_TRACE_PACKET(vkGetDeviceProcAddr, ((funcName != NULL) ? ROUNDUP_TO_4(strlen(funcName) + 1) : 0));
+    addr = __HOOKED_vkGetDeviceProcAddr(device, funcName);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkGetDeviceProcAddr(pHeader);
+    pPacket->device = device;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pName), ((funcName != NULL) ? ROUNDUP_TO_4(strlen(funcName) + 1) : 0), funcName);
+    pPacket->result = addr;
+    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pName));
+    FINISH_TRACE_PACKET();
+    return addr;
+}
+
+/* GDPA with no trace packet creation */
+VKTRACER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL __HOOKED_vkGetDeviceProcAddr(VkDevice device, const char* funcName)
+{
+    if (!strcmp("vkGetDeviceProcAddr", funcName)) {
+        if (gMessageStream != NULL) {
+            return (PFN_vkVoidFunction) vktraceGetDeviceProcAddr;
+        } else {
+            return (PFN_vkVoidFunction) __HOOKED_vkGetDeviceProcAddr;
+        }
+    }
+
+    layer_device_data  *devData = mdd(device);
+    if (gMessageStream != NULL) {
+
+        PFN_vkVoidFunction addr;
+        addr = layer_intercept_proc(funcName);
+        if (addr)
+            return addr;
+
+        if (devData->KHRDeviceSwapchainEnabled)
+        {
+            if (!strcmp("vkCreateSwapchainKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkCreateSwapchainKHR;
+            if (!strcmp("vkDestroySwapchainKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkDestroySwapchainKHR;
+            if (!strcmp("vkGetSwapchainImagesKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkGetSwapchainImagesKHR;
+            if (!strcmp("vkAcquireNextImageKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkAcquireNextImageKHR;
+            if (!strcmp("vkQueuePresentKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkQueuePresentKHR;
+        }
+    }
+
+    if (device == VK_NULL_HANDLE) {
+        return NULL;
+    }
+
+    VkLayerDispatchTable *pDisp =  &devData->devTable;
+    if (pDisp->GetDeviceProcAddr == NULL)
+        return NULL;
+    return pDisp->GetDeviceProcAddr(device, funcName);
+}
+
+/**
+ * We want trace packets created for app-initiated calls to GetInstanceProcAddr,
+ * but not for loader-initiated calls to GIPA, so two versions of GIPA are needed.
+ */
+VKTRACER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL vktraceGetInstanceProcAddr(VkInstance instance, const char* funcName)
+{
+    vktrace_trace_packet_header* pHeader;
+    PFN_vkVoidFunction addr;
+    packet_vkGetInstanceProcAddr* pPacket = NULL;
+    //assert(strcmp("vkGetInstanceProcAddr", funcName));
+    CREATE_TRACE_PACKET(vkGetInstanceProcAddr, ((funcName != NULL) ? ROUNDUP_TO_4(strlen(funcName) + 1) : 0));
+    addr = __HOOKED_vkGetInstanceProcAddr(instance, funcName);
+    vktrace_set_packet_entrypoint_end_time(pHeader);
+    pPacket = interpret_body_as_vkGetInstanceProcAddr(pHeader);
+    pPacket->instance = instance;
+    vktrace_add_buffer_to_trace_packet(pHeader, (void**) &(pPacket->pName), ((funcName != NULL) ? ROUNDUP_TO_4(strlen(funcName) + 1) : 0), funcName);
+    pPacket->result = addr;
+    vktrace_finalize_buffer_address(pHeader, (void**) &(pPacket->pName));
+    FINISH_TRACE_PACKET();
+    return addr;
+}
+
+/* GIPA with no trace packet creation */
+VKTRACER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL __HOOKED_vkGetInstanceProcAddr(VkInstance instance, const char* funcName)
+{
+    PFN_vkVoidFunction addr;
+    layer_instance_data  *instData;
+
+    vktrace_platform_thread_once((void*) &gInitOnce, InitTracer);
+    if (!strcmp("vkGetInstanceProcAddr", funcName)) {
+        if (gMessageStream != NULL) {
+            return (PFN_vkVoidFunction) vktraceGetInstanceProcAddr;
+        } else {
+            return (PFN_vkVoidFunction) __HOOKED_vkGetInstanceProcAddr;
+        }
+    }
+
+    if (gMessageStream != NULL) {
+        addr = layer_intercept_instance_proc(funcName);
+        if (addr)
+            return addr;
+
+        if (instance == VK_NULL_HANDLE) {
+            return NULL;
+        }
+
+        instData = mid(instance);
+        if (instData->LunargDebugReportEnabled)
+        {
+            if (!strcmp("vkCreateDebugReportCallbackEXT", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkCreateDebugReportCallbackEXT;
+            if (!strcmp("vkDestroyDebugReportCallbackEXT", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkDestroyDebugReportCallbackEXT;
+
+        }
+        if (instData->KHRSurfaceEnabled)
+        {
+            if (!strcmp("vkGetPhysicalDeviceSurfaceSupportKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceSurfaceSupportKHR;
+            if (!strcmp("vkDestroySurfaceKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkDestroySurfaceKHR;
+            if (!strcmp("vkGetPhysicalDeviceSurfaceCapabilitiesKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceSurfaceCapabilitiesKHR;
+            if (!strcmp("vkGetPhysicalDeviceSurfaceFormatsKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceSurfaceFormatsKHR;
+            if (!strcmp("vkGetPhysicalDeviceSurfacePresentModesKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceSurfacePresentModesKHR;
+        }
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+        if (instData->KHRXlibSurfaceEnabled)
+        {
+            if (!strcmp("vkCreateXlibSurfaceKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkCreateXlibSurfaceKHR;
+            if (!strcmp("vkGetPhysicalDeviceXlibPresentationSupportKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceXlibPresentationSupportKHR;
+        }
+#endif
+#ifdef VK_USE_PLATFORM_XCB_KHR
+        if (instData->KHRXcbSurfaceEnabled)
+        {
+            if (!strcmp("vkCreateXcbSurfaceKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkCreateXcbSurfaceKHR;
+            if (!strcmp("vkGetPhysicalDeviceXcbPresentationSupportKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceXcbPresentationSupportKHR;
+        }
+#endif
+#ifdef VK_USE_PLATFORM_WAYLAND_KHR
+        if (instData->KHRWaylandSurfaceEnabled)
+        {
+            if (!strcmp("vkCreateWaylandSurfaceKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkCreateWaylandSurfaceKHR;
+            if (!strcmp("vkGetPhysicalDeviceWaylandPresentationSupportKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceWaylandPresentationSupportKHR;
+        }
+#endif
+#ifdef VK_USE_PLATFORM_MIR_KHR
+        if (instData->KHRMirSurfaceEnabled)
+        {
+            if (!strcmp("vkCreateMirSurfaceKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkCreateMirSurfaceKHR;
+            if (!strcmp("vkGetPhysicalDeviceMirPresentationSupportKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceMirPresentationSupportKHR;
+        }
+#endif
+#ifdef VK_USE_PLATFORM_WIN32_KHR
+        if (instData->KHRWin32SurfaceEnabled)
+        {
+            if (!strcmp("vkCreateWin32SurfaceKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkCreateWin32SurfaceKHR;
+            if (!strcmp("vkGetPhysicalDeviceWin32PresentationSupportKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkGetPhysicalDeviceWin32PresentationSupportKHR;
+        }
+#endif
+#ifdef VK_USE_PLATFORM_ANDROID_KHR
+        if (instData->KHRAndroidSurfaceEnabled)
+        {
+            if (!strcmp("vkCreateAndroidSurfaceKHR", funcName))
+                return (PFN_vkVoidFunction) __HOOKED_vkCreateAndroidSurfaceKHR;
+        }
+#endif
+    } else {
+        if (instance == VK_NULL_HANDLE) {
+            return NULL;
+        }
+        instData = mid(instance);
+    }
+    VkLayerInstanceDispatchTable* pTable = &instData->instTable;
+    if (pTable->GetInstanceProcAddr == NULL)
+        return NULL;
+
+    return pTable->GetInstanceProcAddr(instance, funcName);
+}
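+
+// Illustrative flow, assuming tracing is active (gMessageStream != NULL): an
+// application call such as
+//     PFN_vkVoidFunction fp = vkGetInstanceProcAddr(instance, "vkGetInstanceProcAddr");
+// hands the app vktraceGetInstanceProcAddr, so later app-initiated GIPA calls are
+// themselves recorded as trace packets, while loader-initiated lookups go through
+// __HOOKED_vkGetInstanceProcAddr and record nothing for the lookup itself.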
+
+static const VkLayerProperties layerProps = {
+    "VK_LAYER_LUNARG_vktrace",
+    VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION),
+    1, "LunarG tracing layer",
+};
+
+template<typename T>
+VkResult EnumerateProperties(uint32_t src_count, const T *src_props, uint32_t *dst_count, T *dst_props) {
+    if (!dst_props || !src_props) {
+        *dst_count = src_count;
+        return VK_SUCCESS;
+    }
+
+    uint32_t copy_count = (*dst_count < src_count) ? *dst_count : src_count;
+    memcpy(dst_props, src_props, sizeof(T) * copy_count);
+    *dst_count = copy_count;
+
+    return (copy_count == src_count) ? VK_SUCCESS : VK_INCOMPLETE;
+}
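+
+// Typical two-call idiom served by EnumerateProperties (usage sketch):
+//     uint32_t count = 0;
+//     vkEnumerateInstanceLayerProperties(&count, NULL);    // query: count becomes 1
+//     VkLayerProperties props;
+//     vkEnumerateInstanceLayerProperties(&count, &props);  // copy: returns VK_SUCCESS
+// If the caller's count is smaller than the source count, only that many entries
+// are copied and VK_INCOMPLETE is returned.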
+
+// LoaderLayerInterface V0
+// https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/blob/master/loader/LoaderAndLayerInterface.md
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateInstanceLayerProperties(uint32_t *pPropertyCount, VkLayerProperties *pProperties) {
+    return EnumerateProperties(1, &layerProps, pPropertyCount, pProperties);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateInstanceExtensionProperties(const char *pLayerName, uint32_t *pPropertyCount, VkExtensionProperties *pProperties) {
+    if (pLayerName && !strcmp(pLayerName, layerProps.layerName))
+        return EnumerateProperties(0, (VkExtensionProperties*)nullptr, pPropertyCount, pProperties);
+
+    return VK_ERROR_LAYER_NOT_PRESENT;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateDeviceLayerProperties(VkPhysicalDevice physicalDevice, uint32_t *pPropertyCount, VkLayerProperties *pProperties) {
+    return EnumerateProperties(1, &layerProps, pPropertyCount, pProperties);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL vkEnumerateDeviceExtensionProperties(const char *pLayerName, uint32_t *pPropertyCount, VkExtensionProperties *pProperties) {
+    if (pLayerName && !strcmp(pLayerName, layerProps.layerName))
+        return EnumerateProperties(0, (VkExtensionProperties*)nullptr, pPropertyCount, pProperties);
+
+    return VK_ERROR_LAYER_NOT_PRESENT;
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL VK_LAYER_LUNARG_vktraceGetInstanceProcAddr(VkInstance instance, const char* funcName) {
+    return __HOOKED_vkGetInstanceProcAddr(instance, funcName);
+}
+
+VK_LAYER_EXPORT VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL VK_LAYER_LUNARG_vktraceGetDeviceProcAddr(VkDevice device, const char* funcName) {
+    return __HOOKED_vkGetDeviceProcAddr(device, funcName);
+}
diff --git a/vktrace/src/vktrace_layer/vktrace_vk_exts.cpp b/vktrace/src/vktrace_layer/vktrace_vk_exts.cpp
new file mode 100644
index 0000000..d63fa89
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_vk_exts.cpp
@@ -0,0 +1,131 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ */
+#include "vktrace_lib_helpers.h"
+
+#include "vulkan/vk_layer.h"
+
+void ext_init_create_instance(
+        layer_instance_data             *instData,
+        VkInstance                      inst,
+        uint32_t                        extension_count,
+        const char*const*               ppEnabledExtensions)    // extension names to be enabled
+{
+    PFN_vkGetInstanceProcAddr gpa = instData->instTable.GetInstanceProcAddr;
+
+    instData->instTable.CreateDebugReportCallbackEXT = (PFN_vkCreateDebugReportCallbackEXT) gpa(inst, "vkCreateDebugReportCallbackEXT");
+    instData->instTable.DestroyDebugReportCallbackEXT = (PFN_vkDestroyDebugReportCallbackEXT) gpa(inst, "vkDestroyDebugReportCallbackEXT");
+    instData->instTable.DebugReportMessageEXT = (PFN_vkDebugReportMessageEXT) gpa(inst, "vkDebugReportMessageEXT");
+    instData->instTable.GetPhysicalDeviceSurfaceSupportKHR = (PFN_vkGetPhysicalDeviceSurfaceSupportKHR) gpa(inst, "vkGetPhysicalDeviceSurfaceSupportKHR");
+    instData->instTable.DestroySurfaceKHR = (PFN_vkDestroySurfaceKHR) gpa(inst, "vkDestroySurfaceKHR");
+    instData->instTable.GetPhysicalDeviceSurfaceCapabilitiesKHR = (PFN_vkGetPhysicalDeviceSurfaceCapabilitiesKHR) gpa(inst, "vkGetPhysicalDeviceSurfaceCapabilitiesKHR");
+    instData->instTable.GetPhysicalDeviceSurfaceFormatsKHR = (PFN_vkGetPhysicalDeviceSurfaceFormatsKHR) gpa(inst, "vkGetPhysicalDeviceSurfaceFormatsKHR");
+    instData->instTable.GetPhysicalDeviceSurfacePresentModesKHR = (PFN_vkGetPhysicalDeviceSurfacePresentModesKHR) gpa(inst, "vkGetPhysicalDeviceSurfacePresentModesKHR");
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+    instData->instTable.CreateXlibSurfaceKHR = (PFN_vkCreateXlibSurfaceKHR) gpa(inst, "vkCreateXlibSurfaceKHR");
+    instData->instTable.GetPhysicalDeviceXlibPresentationSupportKHR = (PFN_vkGetPhysicalDeviceXlibPresentationSupportKHR) gpa(inst, "vkGetPhysicalDeviceXlibPresentationSupportKHR");
+#endif
+#ifdef VK_USE_PLATFORM_XCB_KHR
+    instData->instTable.CreateXcbSurfaceKHR = (PFN_vkCreateXcbSurfaceKHR) gpa(inst, "vkCreateXcbSurfaceKHR");
+    instData->instTable.GetPhysicalDeviceXcbPresentationSupportKHR = (PFN_vkGetPhysicalDeviceXcbPresentationSupportKHR) gpa(inst, "vkGetPhysicalDeviceXcbPresentationSupportKHR");
+#endif
+#ifdef VK_USE_PLATFORM_MIR_KHR
+    instData->instTable.CreateMirSurfaceKHR = (PFN_vkCreateMirSurfaceKHR) gpa(inst, "vkCreateMirSurfaceKHR");
+    instData->instTable.GetPhysicalDeviceMirPresentationSupportKHR = (PFN_vkGetPhysicalDeviceMirPresentationSupportKHR) gpa(inst, "vkGetPhysicalDeviceMirPresentationSupportKHR");
+#endif
+#ifdef VK_USE_PLATFORM_WAYLAND_KHR
+    instData->instTable.CreateWaylandSurfaceKHR = (PFN_vkCreateWaylandSurfaceKHR) gpa(inst, "vkCreateWaylandSurfaceKHR");
+    instData->instTable.GetPhysicalDeviceWaylandPresentationSupportKHR = (PFN_vkGetPhysicalDeviceWaylandPresentationSupportKHR) gpa(inst, "vkGetPhysicalDeviceWaylandPresentationSupportKHR");
+#endif
+#ifdef VK_USE_PLATFORM_WIN32_KHR
+    instData->instTable.CreateWin32SurfaceKHR = (PFN_vkCreateWin32SurfaceKHR) gpa(inst, "vkCreateWin32SurfaceKHR");
+    instData->instTable.GetPhysicalDeviceWin32PresentationSupportKHR = (PFN_vkGetPhysicalDeviceWin32PresentationSupportKHR) gpa(inst, "vkGetPhysicalDeviceWin32PresentationSupportKHR");
+#endif
+#ifdef VK_USE_PLATFORM_ANDROID_KHR
+    instData->instTable.CreateAndroidSurfaceKHR = (PFN_vkCreateAndroidSurfaceKHR) gpa(inst, "vkCreateAndroidSurfaceKHR");
+#endif
+    instData->LunargDebugReportEnabled = false;
+    instData->KHRSurfaceEnabled = false;
+    instData->KHRXlibSurfaceEnabled = false;
+    instData->KHRXcbSurfaceEnabled = false;
+    instData->KHRWaylandSurfaceEnabled = false;
+    instData->KHRMirSurfaceEnabled = false;
+    instData->KHRWin32SurfaceEnabled = false;
+    instData->KHRAndroidSurfaceEnabled = false;
+    for (uint32_t i = 0; i < extension_count; i++) {
+        if (strcmp(ppEnabledExtensions[i], VK_EXT_DEBUG_REPORT_EXTENSION_NAME) == 0) {
+            instData->LunargDebugReportEnabled = true;
+        }
+        if (strcmp(ppEnabledExtensions[i], VK_KHR_SURFACE_EXTENSION_NAME) == 0) {
+            instData->KHRSurfaceEnabled = true;
+        }
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+        if (strcmp(ppEnabledExtensions[i], VK_KHR_XLIB_SURFACE_EXTENSION_NAME) == 0) {
+            instData->KHRXlibSurfaceEnabled = true;
+        }
+#endif
+#ifdef VK_USE_PLATFORM_XCB_KHR
+        if (strcmp(ppEnabledExtensions[i], VK_KHR_XCB_SURFACE_EXTENSION_NAME) == 0) {
+            instData->KHRXcbSurfaceEnabled = true;
+        }
+#endif
+#ifdef VK_USE_PLATFORM_MIR_KHR
+        if (strcmp(ppEnabledExtensions[i], VK_KHR_MIR_SURFACE_EXTENSION_NAME) == 0) {
+            instData->KHRMirSurfaceEnabled = true;
+        }
+#endif
+#ifdef VK_USE_PLATFORM_WAYLAND_KHR
+        if (strcmp(ppEnabledExtensions[i], VK_KHR_WAYLAND_SURFACE_EXTENSION_NAME) == 0) {
+            instData->KHRWaylandSurfaceEnabled = true;
+        }
+#endif
+#ifdef VK_USE_PLATFORM_WIN32_KHR
+        if (strcmp(ppEnabledExtensions[i], VK_KHR_WIN32_SURFACE_EXTENSION_NAME) == 0) {
+            instData->KHRWin32SurfaceEnabled = true;
+        }
+#endif
+#ifdef VK_USE_PLATFORM_ANDROID_KHR
+        if (strcmp(ppEnabledExtensions[i], VK_KHR_ANDROID_SURFACE_EXTENSION_NAME) == 0) {
+            instData->KHRAndroidSurfaceEnabled = true;
+        }
+#endif
+    }
+}
+
+void ext_init_create_device(
+        layer_device_data               *devData,
+        VkDevice                        dev,
+        PFN_vkGetDeviceProcAddr         gpa,
+        uint32_t                        extension_count,
+        const char*const*               ppEnabledExtensions)  // extension names to be enabled
+{
+    devData->devTable.CreateSwapchainKHR = (PFN_vkCreateSwapchainKHR) gpa(dev, "vkCreateSwapchainKHR");
+    devData->devTable.DestroySwapchainKHR = (PFN_vkDestroySwapchainKHR) gpa(dev, "vkDestroySwapchainKHR");
+    devData->devTable.GetSwapchainImagesKHR = (PFN_vkGetSwapchainImagesKHR) gpa(dev, "vkGetSwapchainImagesKHR");
+    devData->devTable.AcquireNextImageKHR = (PFN_vkAcquireNextImageKHR) gpa(dev, "vkAcquireNextImageKHR");
+    devData->devTable.QueuePresentKHR = (PFN_vkQueuePresentKHR) gpa(dev, "vkQueuePresentKHR");
+    
+    devData->KHRDeviceSwapchainEnabled = false;
+    for (uint32_t i = 0; i < extension_count; i++) {
+        if (strcmp(ppEnabledExtensions[i], VK_KHR_SWAPCHAIN_EXTENSION_NAME) == 0) {
+            devData->KHRDeviceSwapchainEnabled = true;
+        }
+    }
+}
diff --git a/vktrace/src/vktrace_layer/vktrace_vk_exts.h b/vktrace/src/vktrace_layer/vktrace_vk_exts.h
new file mode 100644
index 0000000..b4a7500
--- /dev/null
+++ b/vktrace/src/vktrace_layer/vktrace_vk_exts.h
@@ -0,0 +1,37 @@
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ */
+#pragma once
+
+#include "vulkan/vk_layer.h"
+#include "vktrace_lib_helpers.h"
+
+void ext_init_create_instance(
+        layer_instance_data             *instData,
+        VkInstance                      inst,
+        uint32_t                        extension_count,
+        const char*const*               ppEnabledExtensions);
+
+void ext_init_create_device(
+        layer_device_data               *devData,
+        VkDevice                        dev,
+        PFN_vkGetDeviceProcAddr         gpa,
+        uint32_t                        extension_count,
+        const char*const*               ppEnabledExtensions);
diff --git a/vktrace/src/vktrace_layer/windows/VkLayer_vktrace_layer.json b/vktrace/src/vktrace_layer/windows/VkLayer_vktrace_layer.json
new file mode 100644
index 0000000..1de3f90
--- /dev/null
+++ b/vktrace/src/vktrace_layer/windows/VkLayer_vktrace_layer.json
@@ -0,0 +1,15 @@
+{
+    "file_format_version" : "1.0.0",
+    "layer" : {
+        "name": "VK_LAYER_LUNARG_vktrace",
+        "type": "GLOBAL",
+        "library_path": ".\\VkLayer_vktrace_layer.dll",
+        "api_version": "1.0.33",
+        "implementation_version": "1",
+        "description": "Vktrace tracing library",
+        "functions" : {
+          "vkGetInstanceProcAddr" : "__HOOKED_vkGetInstanceProcAddr",
+          "vkGetDeviceProcAddr" : "__HOOKED_vkGetDeviceProcAddr"
+        }
+    }
+}
diff --git a/vktrace/src/vktrace_replay/CMakeLists.txt b/vktrace/src/vktrace_replay/CMakeLists.txt
new file mode 100644
index 0000000..c8d3e63
--- /dev/null
+++ b/vktrace/src/vktrace_replay/CMakeLists.txt
@@ -0,0 +1,33 @@
+cmake_minimum_required(VERSION 2.8)
+project(vkreplay)
+
+include("${SRC_DIR}/build_options.cmake")
+
+set(SRC_LIST
+    ${SRC_LIST}
+    vkreplay_factory.h
+    vkreplay_seq.h
+    vkreplay_window.h
+    vkreplay_main.cpp
+    vkreplay_seq.cpp
+    vkreplay_factory.cpp
+)
+
+include_directories(
+    ${SRC_DIR}/vktrace_replay
+    ${SRC_DIR}/vktrace_common
+    ${SRC_DIR}/thirdparty
+    ${CMAKE_CURRENT_SOURCE_DIR}/../vktrace_extensions/vktracevulkan/vkreplay/
+)
+
+set (LIBRARIES vktrace_common vulkan_replay)
+
+add_executable(${PROJECT_NAME} ${SRC_LIST})
+
+add_dependencies(${PROJECT_NAME} "vulkan-${MAJOR}")
+
+target_link_libraries(${PROJECT_NAME}
+    ${LIBRARIES}
+)
+
+build_options_finalize()
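+
+# Note (assumption): SRC_DIR and MAJOR are defined by the enclosing vktrace build
+# (see build_options.cmake) before this directory is added.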
diff --git a/vktrace/src/vktrace_replay/vkreplay_factory.cpp b/vktrace/src/vktrace_replay/vkreplay_factory.cpp
new file mode 100644
index 0000000..edf4586
--- /dev/null
+++ b/vktrace/src/vktrace_replay/vkreplay_factory.cpp
@@ -0,0 +1,147 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ **************************************************************************/
+
+#include "vkreplay_factory.h"
+#include "vktrace_trace_packet_identifiers.h"
+#include "vkreplay.h"
+
+namespace vktrace_replay {
+
+vktrace_trace_packet_replay_library* ReplayFactory::Create(uint8_t tracerId)
+{
+    vktrace_trace_packet_replay_library* pReplayer = NULL;
+    //void* pLibrary = NULL;
+
+    const VKTRACE_TRACER_REPLAYER_INFO* pReplayerInfo = &(gs_tracerReplayerInfo[tracerId]);
+
+    if (pReplayerInfo->tracerId != tracerId)
+    {
+        vktrace_LogError("Replayer info for TracerId (%d) failed consistency check.", tracerId);
+        assert(!"TracerId in VKTRACE_TRACER_REPLAYER_INFO does not match the requested tracerId. The array needs to be corrected.");
+    }
+
+    // Vulkan library is built into replayer executable
+    if (tracerId == VKTRACE_TID_VULKAN) {
+        pReplayer = VKTRACE_NEW(vktrace_trace_packet_replay_library);
+        if (pReplayer == NULL)
+        {
+            vktrace_LogError("Failed to allocate replayer library.");
+        }
+        else
+        {
+            pReplayer->pLibrary = NULL;
+
+            pReplayer->SetLogCallback = VkReplaySetLogCallback;
+            pReplayer->SetLogLevel = VkReplaySetLogLevel;
+
+            pReplayer->RegisterDbgMsgCallback = VkReplayRegisterDbgMsgCallback;
+            pReplayer->GetSettings = VkReplayGetSettings;
+            pReplayer->UpdateFromSettings = VkReplayUpdateFromSettings;
+            pReplayer->Initialize = VkReplayInitialize;
+            pReplayer->Deinitialize = VkReplayDeinitialize;
+            pReplayer->Interpret = VkReplayInterpret;
+            pReplayer->Replay = VkReplayReplay;
+            pReplayer->Dump = VkReplayDump;
+            pReplayer->GetFrameNumber = VkReplayGetFrameNumber;
+            pReplayer->ResetFrameNumber = VkReplayResetFrameNumber;
+        }
+
+    }
+    
+//    if (pReplayerInfo->needsReplayer == TRUE)
+//    {
+//        pLibrary = vktrace_platform_open_library(pReplayerInfo->replayerLibraryName);
+//        if (pLibrary == NULL)
+//        {
+//            vktrace_LogError("Failed to load replayer '%s.", pReplayerInfo->replayerLibraryName);
+//#if defined(PLATFORM_LINUX)
+//            char* error = dlerror();
+//            vktrace_LogError(error);
+//#endif
+//        }
+//    }
+//    else
+//    {
+//        vktrace_LogError("A replayer was requested for TracerId (%d), but it does not require a replayer.", tracerId);
+//        assert(!"Invalid TracerId supplied to ReplayFactory");
+//    }
+//
+//    if (pLibrary != NULL)
+//    {
+//        pReplayer = VKTRACE_NEW(vktrace_trace_packet_replay_library);
+//        if (pReplayer == NULL)
+//        {
+//            vktrace_LogError("Failed to allocate replayer library.");
+//            vktrace_platform_close_library(pLibrary);
+//        }
+//        else
+//        {
+//            pReplayer->pLibrary = pLibrary;
+//
+//            pReplayer->SetLogCallback = (funcptr_vkreplayer_setlogcallback)vktrace_platform_get_library_entrypoint(pLibrary, "SetLogCallback");
+//            pReplayer->SetLogLevel = (funcptr_vkreplayer_setloglevel)vktrace_platform_get_library_entrypoint(pLibrary, "SetLogLevel");
+//
+//            pReplayer->RegisterDbgMsgCallback = (funcptr_vkreplayer_registerdbgmsgcallback)vktrace_platform_get_library_entrypoint(pLibrary, "RegisterDbgMsgCallback");
+//            pReplayer->GetSettings = (funcptr_vkreplayer_getSettings)vktrace_platform_get_library_entrypoint(pLibrary, "GetSettings");
+//            pReplayer->UpdateFromSettings = (funcptr_vkreplayer_updatefromsettings)vktrace_platform_get_library_entrypoint(pLibrary, "UpdateFromSettings");
+//            pReplayer->Initialize = (funcptr_vkreplayer_initialize)vktrace_platform_get_library_entrypoint(pLibrary, "Initialize");
+//            pReplayer->Deinitialize = (funcptr_vkreplayer_deinitialize)vktrace_platform_get_library_entrypoint(pLibrary, "Deinitialize");
+//            pReplayer->Interpret = (funcptr_vkreplayer_interpret)vktrace_platform_get_library_entrypoint(pLibrary, "Interpret");
+//            pReplayer->Replay = (funcptr_vkreplayer_replay)vktrace_platform_get_library_entrypoint(pLibrary, "Replay");
+//            pReplayer->Dump = (funcptr_vkreplayer_dump)vktrace_platform_get_library_entrypoint(pLibrary, "Dump");
+//
+//            if (pReplayer->SetLogCallback == NULL ||
+//                pReplayer->SetLogLevel == NULL ||
+//                pReplayer->RegisterDbgMsgCallback == NULL ||
+//                pReplayer->GetSettings == NULL ||
+//                pReplayer->UpdateFromSettings == NULL ||
+//                pReplayer->Initialize == NULL ||
+//                pReplayer->Deinitialize == NULL ||
+//                pReplayer->Interpret == NULL ||
+//                pReplayer->Replay == NULL ||
+//                pReplayer->Dump == NULL)
+//            {
+//                VKTRACE_DELETE(pReplayer);
+//                vktrace_platform_close_library(pLibrary);
+//                pReplayer = NULL;
+//            }
+//        }
+//    }
+
+    return pReplayer;
+}
+
+void ReplayFactory::Destroy(vktrace_trace_packet_replay_library** ppReplayer)
+{
+    assert (ppReplayer != NULL);
+    assert (*ppReplayer != NULL);
+    if ((*ppReplayer)->pLibrary != NULL)
+    {
+        vktrace_platform_close_library((*ppReplayer)->pLibrary);
+    }
+    VKTRACE_DELETE(*ppReplayer);
+    *ppReplayer = NULL;
+}
+
+
+} // namespace vktrace_replay
diff --git a/vktrace/src/vktrace_replay/vkreplay_factory.h b/vktrace/src/vktrace_replay/vkreplay_factory.h
new file mode 100644
index 0000000..bd4d9f0
--- /dev/null
+++ b/vktrace/src/vktrace_replay/vkreplay_factory.h
@@ -0,0 +1,98 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ **************************************************************************/
+#pragma once
+
+extern "C" {
+#include "vktrace_common.h"
+#include "vktrace_settings.h"
+#include "vktrace_trace_packet_identifiers.h"
+}
+#include "vkreplay_window.h"
+#include "vkreplay_main.h"
+
+namespace vktrace_replay {
+
+enum VKTRACE_REPLAY_RESULT
+{
+    VKTRACE_REPLAY_SUCCESS = 0,
+    VKTRACE_REPLAY_ERROR,          // internal error unrelated to the specific packet
+    VKTRACE_REPLAY_INVALID_ID,     // packet_id invalid
+    VKTRACE_REPLAY_BAD_RETURN,     // replay return value != trace return value
+    VKTRACE_REPLAY_CALL_ERROR,     // replaying call caused an error
+    VKTRACE_REPLAY_INVALID_PARAMS, // trace file parameters are invalid
+    VKTRACE_REPLAY_VALIDATION_ERROR // callback Msg error from validation layer
+};
+
+enum VKTRACE_DBG_MSG_TYPE
+{
+    VKTRACE_DBG_MSG_INFO = 0,
+    VKTRACE_DBG_MSG_WARNING,
+    VKTRACE_DBG_MSG_ERROR
+};
+
+// callback signature
+typedef void (*VKTRACE_DBG_MSG_CALLBACK_FUNCTION)(VKTRACE_DBG_MSG_TYPE msgType, const char* pMsg);
+
+// entrypoints that must be exposed by each replayer library
+extern "C"
+{
+// entrypoints
+
+typedef void (VKTRACER_CDECL *funcptr_vkreplayer_setloglevel)(VktraceLogLevel level);
+typedef void (VKTRACER_CDECL *funcptr_vkreplayer_setlogcallback)(VKTRACE_REPORT_CALLBACK_FUNCTION pCallback);
+
+typedef void (VKTRACER_CDECL *funcptr_vkreplayer_registerdbgmsgcallback)(VKTRACE_DBG_MSG_CALLBACK_FUNCTION pCallback);
+typedef vktrace_SettingGroup* (VKTRACER_CDECL *funcptr_vkreplayer_getSettings)();
+typedef void (VKTRACER_CDECL *funcptr_vkreplayer_updatefromsettings)(vktrace_SettingGroup* pSettingGroups, unsigned int numSettingGroups);
+typedef int (VKTRACER_CDECL *funcptr_vkreplayer_initialize)(vktrace_replay::ReplayDisplay* pDisplay, vkreplayer_settings* pReplaySettings);
+typedef void (VKTRACER_CDECL *funcptr_vkreplayer_deinitialize)();
+typedef vktrace_trace_packet_header* (VKTRACER_CDECL *funcptr_vkreplayer_interpret)(vktrace_trace_packet_header* pPacket);
+typedef vktrace_replay::VKTRACE_REPLAY_RESULT (VKTRACER_CDECL *funcptr_vkreplayer_replay)(vktrace_trace_packet_header* pPacket);
+typedef int (VKTRACER_CDECL *funcptr_vkreplayer_dump)();
+typedef int (VKTRACER_CDECL *funcptr_vkreplayer_getframenumber)();
+typedef void (VKTRACER_CDECL *funcptr_vkreplayer_resetframenumber)();
+}
+
+struct vktrace_trace_packet_replay_library
+{
+    void* pLibrary;
+    funcptr_vkreplayer_setloglevel SetLogLevel;
+    funcptr_vkreplayer_setlogcallback SetLogCallback;
+
+    funcptr_vkreplayer_registerdbgmsgcallback RegisterDbgMsgCallback;
+    funcptr_vkreplayer_getSettings GetSettings;
+    funcptr_vkreplayer_updatefromsettings UpdateFromSettings;
+    funcptr_vkreplayer_initialize Initialize;
+    funcptr_vkreplayer_deinitialize Deinitialize;
+    funcptr_vkreplayer_interpret Interpret;
+    funcptr_vkreplayer_replay Replay;
+    funcptr_vkreplayer_dump Dump;
+    funcptr_vkreplayer_getframenumber GetFrameNumber;
+    funcptr_vkreplayer_resetframenumber ResetFrameNumber;
+};
+
+class ReplayFactory {
+public:
+    vktrace_trace_packet_replay_library *Create(uint8_t tracerId);
+    void Destroy(vktrace_trace_packet_replay_library** ppReplayer);
+};
+} // namespace vktrace_replay
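+
+// Usage sketch (hypothetical caller):
+//     vktrace_replay::ReplayFactory factory;
+//     vktrace_trace_packet_replay_library* pReplayer = factory.Create(VKTRACE_TID_VULKAN);
+//     if (pReplayer != NULL) {
+//         // ... SetLogCallback / Initialize / Interpret / Replay ...
+//         factory.Destroy(&pReplayer);  // closes pLibrary if set and NULLs the pointer
+//     }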
diff --git a/vktrace/src/vktrace_replay/vkreplay_main.cpp b/vktrace/src/vktrace_replay/vkreplay_main.cpp
new file mode 100644
index 0000000..80a5544
--- /dev/null
+++ b/vktrace/src/vktrace_replay/vkreplay_main.cpp
@@ -0,0 +1,546 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ * Author: David Pinedo <david@lunarg.com>
+ **************************************************************************/
+
+#include <stdio.h>
+#include <string>
+#if defined(ANDROID)
+#include <vector>
+#include <sstream>
+#include <android/log.h>
+#include <android_native_app_glue.h>
+#endif
+#include "vktrace_common.h"
+#include "vktrace_tracelog.h"
+#include "vktrace_filelike.h"
+#include "vktrace_trace_packet_utils.h"
+#include "vkreplay_main.h"
+#include "vkreplay_factory.h"
+#include "vkreplay_seq.h"
+#include "vkreplay_window.h"
+
+vkreplayer_settings replaySettings = { NULL, 1, -1, -1, NULL, NULL };
+
+vktrace_SettingInfo g_settings_info[] =
+{
+    { "t", "TraceFile", VKTRACE_SETTING_STRING, { &replaySettings.pTraceFilePath }, { &replaySettings.pTraceFilePath }, TRUE, "The trace file to replay."},
+    { "l", "NumLoops", VKTRACE_SETTING_UINT, { &replaySettings.numLoops }, { &replaySettings.numLoops }, TRUE, "The number of times to replay the trace file or loop range." },
+    { "lsf", "LoopStartFrame", VKTRACE_SETTING_INT, { &replaySettings.loopStartFrame }, { &replaySettings.loopStartFrame }, TRUE, "The start frame number of the loop range." },
+    { "lef", "LoopEndFrame", VKTRACE_SETTING_INT, { &replaySettings.loopEndFrame }, { &replaySettings.loopEndFrame }, TRUE, "The end frame number of the loop range." },
+    { "s", "Screenshot", VKTRACE_SETTING_STRING, { &replaySettings.screenshotList }, { &replaySettings.screenshotList }, TRUE, "Comma separated list of frames to take a snapshot of."},
+#if _DEBUG
+    { "v", "Verbosity", VKTRACE_SETTING_STRING, { &replaySettings.verbosity }, { &replaySettings.verbosity }, TRUE, "Verbosity mode. Modes are \"quiet\", \"errors\", \"warnings\", \"full\", \"debug\"."},
+#else
+    { "v", "Verbosity", VKTRACE_SETTING_STRING, { &replaySettings.verbosity }, { &replaySettings.verbosity }, TRUE, "Verbosity mode. Modes are \"quiet\", \"errors\", \"warnings\", \"full\"."},
+#endif
+};
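+
+// Example invocation (sketch, assuming the usual "-<short name> <value>" syntax):
+//     vkreplay -t app.vktrace -l 2 -lsf 10 -lef 100 -s "1,5" -v full
+// replays app.vktrace twice, loops frames 10..100, takes screenshots of frames 1
+// and 5, and logs at full verbosity.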
+
+vktrace_SettingGroup g_replaySettingGroup =
+{
+    "vkreplay",
+    sizeof(g_settings_info) / sizeof(g_settings_info[0]),
+    &g_settings_info[0]
+};
+
+namespace vktrace_replay {
+int main_loop(Sequencer &seq, vktrace_trace_packet_replay_library *replayerArray[], vkreplayer_settings settings)
+{
+    int err = 0;
+    vktrace_trace_packet_header *packet;
+    unsigned int res;
+    vktrace_trace_packet_replay_library *replayer = NULL;
+    vktrace_trace_packet_message* msgPacket;
+    struct seqBookmark startingPacket;
+
+    bool trace_running = true;
+    int prevFrameNumber = -1;
+
+    // record the location of looping start packet
+    seq.record_bookmark();
+    seq.get_bookmark(startingPacket);
+    while (settings.numLoops > 0)
+    {
+        while ((packet = seq.get_next_packet()) != NULL && trace_running)
+        {
+            switch (packet->packet_id) {
+                case VKTRACE_TPI_MESSAGE:
+                    msgPacket = vktrace_interpret_body_as_trace_packet_message(packet);
+                    vktrace_LogAlways("Packet %lu: Traced Message (%s): %s", packet->global_packet_index, vktrace_LogLevelToShortString(msgPacket->type), msgPacket->message);
+                    break;
+                case VKTRACE_TPI_MARKER_CHECKPOINT:
+                    break;
+                case VKTRACE_TPI_MARKER_API_BOUNDARY:
+                    break;
+                case VKTRACE_TPI_MARKER_API_GROUP_BEGIN:
+                    break;
+                case VKTRACE_TPI_MARKER_API_GROUP_END:
+                    break;
+                case VKTRACE_TPI_MARKER_TERMINATE_PROCESS:
+                    break;
+                //TODO processing code for all the above cases
+                default:
+                {
+                    if (packet->tracer_id >= VKTRACE_MAX_TRACER_ID_ARRAY_SIZE  || packet->tracer_id == VKTRACE_TID_RESERVED) {
+                        vktrace_LogError("Tracer_id from packet num packet %d invalid.", packet->packet_id);
+                        continue;
+                    }
+                    replayer = replayerArray[packet->tracer_id];
+                    if (replayer == NULL) {
+                        vktrace_LogWarning("Tracer_id %d has no valid replayer.", packet->tracer_id);
+                        continue;
+                    }
+                    if (packet->packet_id >= VKTRACE_TPI_BEGIN_API_HERE)
+                    {
+                        // replay the API packet
+                        res = replayer->Replay(replayer->Interpret(packet));
+                        if (res != VKTRACE_REPLAY_SUCCESS)
+                        {
+                           vktrace_LogError("Failed to replay packet_id %d, with global_packet_index %d.", packet->packet_id, packet->global_packet_index);
+                           static BOOL QuitOnAnyError=FALSE;
+                           if(QuitOnAnyError)
+                           {
+                              return -1;
+                           }
+                        }
+
+                        // frame control logic
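+                        // e.g. with loopStartFrame 10 and loopEndFrame 100, the bookmark
+                        // is re-recorded at frame 10 and replay stops at frame 100 on each loop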
+                        int frameNumber = replayer->GetFrameNumber();
+                        if (prevFrameNumber != frameNumber)
+                        {
+                            prevFrameNumber = frameNumber;
+
+                            if (frameNumber == settings.loopStartFrame)
+                            {
+                                // record the location of looping start packet
+                                seq.record_bookmark();
+                                seq.get_bookmark(startingPacket);
+                            }
+
+                            if (frameNumber == settings.loopEndFrame)
+                            {
+                                trace_running = false;
+                            }
+                        }
+
+                    } else {
+                        vktrace_LogError("Bad packet type id=%d, index=%d.", packet->packet_id, packet->global_packet_index);
+                        return -1;
+                    }
+                }
+            }
+        }
+        settings.numLoops--;
+        // If screenshots are enabled, capture them on the first cycle only,
+        // since every subsequent cycle must render the same frames.
+        if (replaySettings.screenshotList != NULL)
+        {
+            vktrace_free((char*)replaySettings.screenshotList);
+            replaySettings.screenshotList = NULL; // avoid double-free on later loop iterations
+        }
+        seq.set_bookmark(startingPacket);
+        trace_running = true;
+        if (replayer != NULL)
+        {
+            replayer->ResetFrameNumber();
+        }
+    }
+    return err;
+}
+} // namespace vktrace_replay
+
+using namespace vktrace_replay;
+
+void loggingCallback(VktraceLogLevel level, const char* pMessage)
+{
+    if (level == VKTRACE_LOG_NONE)
+        return;
+
+#if defined(ANDROID)
+    switch(level)
+    {
+    case VKTRACE_LOG_DEBUG: __android_log_print(ANDROID_LOG_DEBUG, "vkreplay", "%s", pMessage); break;
+    case VKTRACE_LOG_ERROR: __android_log_print(ANDROID_LOG_ERROR, "vkreplay", "%s", pMessage); break;
+    case VKTRACE_LOG_WARNING: __android_log_print(ANDROID_LOG_WARN, "vkreplay", "%s", pMessage); break;
+    case VKTRACE_LOG_VERBOSE: __android_log_print(ANDROID_LOG_VERBOSE, "vkreplay", "%s", pMessage); break;
+    default:
+        __android_log_print(ANDROID_LOG_INFO, "vkreplay", "%s", pMessage); break;
+    }
+#else
+    switch(level)
+    {
+    case VKTRACE_LOG_DEBUG: printf("vkreplay debug: %s\n", pMessage); break;
+    case VKTRACE_LOG_ERROR: printf("vkreplay error: %s\n", pMessage); break;
+    case VKTRACE_LOG_WARNING: printf("vkreplay warning: %s\n", pMessage); break;
+    case VKTRACE_LOG_VERBOSE: printf("vkreplay info: %s\n", pMessage); break;
+    default:
+        printf("%s\n", pMessage); break;
+    }
+    fflush(stdout);
+
+#if defined(_DEBUG)
+#if defined(WIN32)
+    OutputDebugString(pMessage);
+#endif
+#endif
+#endif // ANDROID
+}
+
+int vkreplay_main(int argc, char **argv, vktrace_window_handle window = 0)
+{
+    int err = 0;
+    vktrace_SettingGroup* pAllSettings = NULL;
+    unsigned int numAllSettings = 0;
+
+    // Default verbosity level
+    vktrace_LogSetCallback(loggingCallback);
+    vktrace_LogSetLevel(VKTRACE_LOG_ERROR);
+
+    // apply settings from cmd-line args
+    if (vktrace_SettingGroup_init_from_cmdline(&g_replaySettingGroup, argc, argv, &replaySettings.pTraceFilePath) != 0)
+    {
+        // invalid options specified
+        if (pAllSettings != NULL)
+        {
+            vktrace_SettingGroup_Delete_Loaded(&pAllSettings, &numAllSettings);
+        }
+        return 1;
+    }
+
+    // merge settings so that new settings will get written into the settings file
+    vktrace_SettingGroup_merge(&g_replaySettingGroup, &pAllSettings, &numAllSettings);
+
+    // Set verbosity level
+    if (replaySettings.verbosity == NULL || !strcmp(replaySettings.verbosity, "errors"))
+        replaySettings.verbosity = "errors";
+    else if (!strcmp(replaySettings.verbosity, "quiet"))
+        vktrace_LogSetLevel(VKTRACE_LOG_NONE);
+    else if (!strcmp(replaySettings.verbosity, "warnings"))
+        vktrace_LogSetLevel(VKTRACE_LOG_WARNING);
+    else if (!strcmp(replaySettings.verbosity, "full"))
+        vktrace_LogSetLevel(VKTRACE_LOG_VERBOSE);
+#if _DEBUG
+    else if (!strcmp(replaySettings.verbosity, "debug"))
+        vktrace_LogSetLevel(VKTRACE_LOG_DEBUG);
+#endif
+    else
+    {
+        vktrace_SettingGroup_print(&g_replaySettingGroup);
+        return 1;
+    }
+
+    // Set up environment for screenshot
+    if (replaySettings.screenshotList != NULL)
+    {
+        // Set env var that communicates list to ScreenShot layer
+        vktrace_set_global_var("_VK_SCREENSHOT", replaySettings.screenshotList);
+
+    }
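+    // For example (illustrative value), a screenshot list of "1,5,10" makes the
+    // ScreenShot layer capture frames 1, 5, and 10 during replay.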
+
+    // open trace file and read in header
+    const char* pTraceFile = replaySettings.pTraceFilePath;
+    vktrace_trace_file_header fileHeader;
+    FILE *tracefp;
+
+    if (pTraceFile != NULL && strlen(pTraceFile) > 0)
+    {
+        tracefp = fopen(pTraceFile, "rb");
+        if (tracefp == NULL)
+        {
+            vktrace_LogError("Cannot open trace file: '%s'.", pTraceFile);
+            return 1;
+        }
+    }
+    else
+    {
+        vktrace_LogError("No trace file specified.");
+        vktrace_SettingGroup_print(&g_replaySettingGroup);
+        if (pAllSettings != NULL)
+        {
+            vktrace_SettingGroup_Delete_Loaded(&pAllSettings, &numAllSettings);
+        }
+        return 1;
+    }
+
+    FileLike* traceFile = vktrace_FileLike_create_file(tracefp);
+    if (vktrace_FileLike_ReadRaw(traceFile, &fileHeader, sizeof(fileHeader)) == false)
+    {
+        vktrace_LogError("Unable to read header from file.");
+        if (pAllSettings != NULL)
+        {
+            vktrace_SettingGroup_Delete_Loaded(&pAllSettings, &numAllSettings);
+        }
+        VKTRACE_DELETE(traceFile);
+        return 1;
+    }
+
+    //set global version num
+    vktrace_set_trace_version(fileHeader.trace_file_version);
+
+    // Make sure trace file version is supported
+    if (fileHeader.trace_file_version < VKTRACE_TRACE_FILE_VERSION_MINIMUM_COMPATIBLE)
+    {
+        vktrace_LogError("Trace file version %u is older than minimum compatible version (%u).\nYou'll need to make a new trace file, or use an older replayer.", fileHeader.trace_file_version, VKTRACE_TRACE_FILE_VERSION_MINIMUM_COMPATIBLE);
+    }
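+    // Note: replay continues even when this check fails; the log message above is
+    // the only diagnostic for an incompatible trace file version.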
+
+    // load any API specific driver libraries and init replayer objects
+    uint8_t tidApi = VKTRACE_TID_RESERVED;
+    vktrace_trace_packet_replay_library* replayer[VKTRACE_MAX_TRACER_ID_ARRAY_SIZE];
+    ReplayFactory makeReplayer;
+
+    // Create the window. The initial size is 100x100; it will later be resized to
+    // the size used by the traced app during playback of the swapchain functions.
+#if defined(ANDROID)
+    vktrace_replay::ReplayDisplay disp(window, 100, 100);
+#else
+    vktrace_replay::ReplayDisplay disp(100, 100, 0, false);
+#endif
+    //**********************************************************
+#if _DEBUG
+    static BOOL debugStartup = FALSE; // set to TRUE (or flip in a debugger) to spin here until a debugger attaches and clears it
+    while (debugStartup);
+#endif
+    //***********************************************************
+
+    for (int i = 0; i < VKTRACE_MAX_TRACER_ID_ARRAY_SIZE; i++)
+    {
+        replayer[i] = NULL;
+    }
+
+    for (int i = 0; i < fileHeader.tracer_count; i++)
+    {
+        uint8_t tracerId = fileHeader.tracer_id_array[i].id;
+        tidApi = tracerId;
+
+        const VKTRACE_TRACER_REPLAYER_INFO* pReplayerInfo = &(gs_tracerReplayerInfo[tracerId]);
+
+        if (pReplayerInfo->tracerId != tracerId)
+        {
+            vktrace_LogError("Replayer info for TracerId (%d) failed consistency check.", tracerId);
+            assert(!"TracerId in VKTRACE_TRACER_REPLAYER_INFO does not match the requested tracerId. The array needs to be corrected.");
+        }
+        else if (pReplayerInfo->needsReplayer == TRUE)
+        {
+            // Have our factory create the necessary replayer
+            replayer[tracerId] = makeReplayer.Create(tracerId);
+
+            if (replayer[tracerId] == NULL)
+            {
+                // replayer failed to be created
+                if (pAllSettings != NULL)
+                {
+                    vktrace_SettingGroup_Delete_Loaded(&pAllSettings, &numAllSettings);
+                }
+                return 1;
+            }
+
+            // merge the replayer's settings into the list of all settings so that we can output a comprehensive settings file later on.
+            vktrace_SettingGroup_merge(replayer[tracerId]->GetSettings(), &pAllSettings, &numAllSettings);
+
+            // update the replayer with the loaded settings
+            replayer[tracerId]->UpdateFromSettings(pAllSettings, numAllSettings);
+
+            // Initialize the replayer
+            err = replayer[tracerId]->Initialize(&disp, &replaySettings);
+            if (err) {
+                vktrace_LogError("Couldn't Initialize replayer for TracerId %d.", tracerId);
+                if (pAllSettings != NULL)
+                {
+                    vktrace_SettingGroup_Delete_Loaded(&pAllSettings, &numAllSettings);
+                }
+                return err;
+            }
+        }
+    }
+
+    if (tidApi == VKTRACE_TID_RESERVED) {
+        vktrace_LogError("No API specified in tracefile for replaying.");
+        if (pAllSettings != NULL)
+        {
+            vktrace_SettingGroup_Delete_Loaded(&pAllSettings, &numAllSettings);
+        }
+        return -1;
+    }
+ 
+    // main loop
+    Sequencer sequencer(traceFile);
+    err = vktrace_replay::main_loop(sequencer, replayer, replaySettings);
+
+    for (int i = 0; i < VKTRACE_MAX_TRACER_ID_ARRAY_SIZE; i++)
+    {
+        if (replayer[i] != NULL)
+        {
+            replayer[i]->Deinitialize();
+            makeReplayer.Destroy(&replayer[i]);
+        }
+    }
+
+    if (pAllSettings != NULL)
+    {
+        vktrace_SettingGroup_Delete_Loaded(&pAllSettings, &numAllSettings);
+    }
+    return err;
+}
+
+#if defined(ANDROID)
+static bool initialized = false;
+static bool active = false;
+
+// Convert Intents to argv
+// Ported from Hologram sample, only difference is flexible key
+std::vector<std::string> get_args(android_app &app, const char* intent_extra_data_key)
+{
+    std::vector<std::string> args;
+    JavaVM &vm = *app.activity->vm;
+    JNIEnv *p_env;
+    if (vm.AttachCurrentThread(&p_env, nullptr) != JNI_OK)
+        return args;
+
+    JNIEnv &env = *p_env;
+    jobject activity = app.activity->clazz;
+    jmethodID get_intent_method = env.GetMethodID(env.GetObjectClass(activity),
+            "getIntent", "()Landroid/content/Intent;");
+    jobject intent = env.CallObjectMethod(activity, get_intent_method);
+    jmethodID get_string_extra_method = env.GetMethodID(env.GetObjectClass(intent),
+            "getStringExtra", "(Ljava/lang/String;)Ljava/lang/String;");
+    jvalue get_string_extra_args;
+    get_string_extra_args.l = env.NewStringUTF(intent_extra_data_key);
+    jstring extra_str = static_cast<jstring>(env.CallObjectMethodA(intent,
+            get_string_extra_method, &get_string_extra_args));
+
+    std::string args_str;
+    if (extra_str) {
+        const char *extra_utf = env.GetStringUTFChars(extra_str, nullptr);
+        args_str = extra_utf;
+        env.ReleaseStringUTFChars(extra_str, extra_utf);
+        env.DeleteLocalRef(extra_str);
+    }
+
+    env.DeleteLocalRef(get_string_extra_args.l);
+    env.DeleteLocalRef(intent);
+    vm.DetachCurrentThread();
+
+    // split args_str
+    std::stringstream ss(args_str);
+    std::string arg;
+    while (std::getline(ss, arg, ' ')) {
+        if (!arg.empty())
+            args.push_back(arg);
+    }
+
+    return args;
+}
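+
+// Illustrative only: with the "args" key used below, arguments can be supplied
+// through an Intent extra from adb (package/activity names are hypothetical):
+//     adb shell am start -n com.example.vkreplay/android.app.NativeActivity \
+//         --es args "-v\ full\ -t\ /sdcard/cube0.vktrace"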
+
+static int32_t processInput(struct android_app* app, AInputEvent* event) {
+    return 0;
+}
+
+static void processCommand(struct android_app* app, int32_t cmd) {
+    switch(cmd) {
+        case APP_CMD_INIT_WINDOW: {
+            if (app->window) {
+                initialized = true;
+            }
+            break;
+        }
+        case APP_CMD_GAINED_FOCUS: {
+            active = true;
+            break;
+        }
+        case APP_CMD_LOST_FOCUS: {
+            active = false;
+            break;
+        }
+    }
+}
+
+// Start with carbon copy of main() and convert it to support Android, then diff them and move common code to helpers.
+void android_main(struct android_app *app)
+{
+    app_dummy();
+
+    const char* appTag = "vkreplay";
+
+    int vulkanSupport = InitVulkan();
+    if (vulkanSupport == 0) {
+        __android_log_print(ANDROID_LOG_ERROR, appTag, "No Vulkan support found");
+        return;
+    }
+
+    app->onAppCmd = processCommand;
+    app->onInputEvent = processInput;
+
+    while(1) {
+        int events;
+        struct android_poll_source* source;
+        while (ALooper_pollAll(active ? 0 : -1, NULL, &events, (void**)&source) >= 0) {
+            if (source) {
+                source->process(app, source);
+            }
+
+            if (app->destroyRequested != 0) {
+                // anything to clean up?
+                return;
+            }
+        }
+
+        if (initialized && active) {
+            // Parse Intents into argc, argv
+            // Use the following key to send arguments to vkreplay, i.e.
+            // --es args "-v\ debug\ -t\ /sdcard/cube0.vktrace"
+            const char key[] = "args";
+            std::vector<std::string> args = get_args(*app, key);
+
+            int argc = static_cast<int>(args.size()) + 1;
+
+            char** argv = (char**) malloc(argc * sizeof(char*));
+            argv[0] = (char*)"vkreplay";
+            for (size_t i = 0; i < args.size(); i++)
+                argv[i + 1] = (char*) args[i].c_str();
+
+
+            __android_log_print(ANDROID_LOG_INFO, appTag, "argc = %i", argc);
+            for (int i = 0; i < argc; i++)
+                __android_log_print(ANDROID_LOG_INFO, appTag, "argv[%i] = %s", i, argv[i]);
+
+            // sleep to allow attaching debugger
+            //sleep(10);
+
+            // Call into common code
+            int err = vkreplay_main(argc, argv, app->window);
+            __android_log_print(ANDROID_LOG_DEBUG, appTag, "vkreplay_main returned %i", err);
+
+            ANativeActivity_finish(app->activity);
+            free(argv);
+
+            return;
+        }
+    }
+}
+
+#else // ANDROID
+
+extern "C"
+int main(int argc, char **argv)
+{
+    return vkreplay_main(argc, argv);
+}
+
+#endif
diff --git a/vktrace/src/vktrace_replay/vkreplay_main.h b/vktrace/src/vktrace_replay/vkreplay_main.h
new file mode 100644
index 0000000..a63e1af
--- /dev/null
+++ b/vktrace/src/vktrace_replay/vkreplay_main.h
@@ -0,0 +1,35 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Tony Barbour <tony@lunarg.com>
+ **************************************************************************/
+#ifndef VKREPLAY__MAIN_H
+#define VKREPLAY__MAIN_H
+
+typedef struct vkreplayer_settings
+{
+    char* pTraceFilePath;
+    unsigned int numLoops;
+    int loopStartFrame;
+    int loopEndFrame;
+    const char* screenshotList;
+    const char* verbosity;
+} vkreplayer_settings;
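+
+// Minimal usage sketch (field values are illustrative, not defaults):
+//     vkreplayer_settings s = { "cube0.vktrace", 1, 0, -1, NULL, "errors" };
+// replays the trace once; loopStartFrame/loopEndFrame bound the replayed range.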
+
+#endif // VKREPLAY__MAIN_H
diff --git a/vktrace/src/vktrace_replay/vkreplay_seq.cpp b/vktrace/src/vktrace_replay/vkreplay_seq.cpp
new file mode 100644
index 0000000..106f019
--- /dev/null
+++ b/vktrace/src/vktrace_replay/vkreplay_seq.cpp
@@ -0,0 +1,52 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ **************************************************************************/
+#include "vkreplay_seq.h"
+
+extern "C" {
+#include "vktrace_trace_packet_utils.h"
+}
+
+namespace vktrace_replay {
+
+vktrace_trace_packet_header * Sequencer::get_next_packet()
+{
+    vktrace_free(m_lastPacket);
+    m_lastPacket = NULL; // avoid a dangling pointer if we return early below
+    if (!m_pFile)
+        return NULL;
+    m_lastPacket = vktrace_read_trace_packet(m_pFile);
+    return m_lastPacket;
+}
+
+void Sequencer::get_bookmark(seqBookmark &bookmark) {
+    bookmark.file_offset = m_bookmark.file_offset;
+}
+
+
+void Sequencer::set_bookmark(const seqBookmark &bookmark) {
+    fseek(m_pFile->mFile, m_bookmark.file_offset, SEEK_SET);
+}
+
+void Sequencer::record_bookmark()
+{
+    m_bookmark.file_offset = ftell(m_pFile->mFile);
+}
+
+} /* namespace vktrace_replay */
diff --git a/vktrace/src/vktrace_replay/vkreplay_seq.h b/vktrace/src/vktrace_replay/vkreplay_seq.h
new file mode 100644
index 0000000..67f20d1
--- /dev/null
+++ b/vktrace/src/vktrace_replay/vkreplay_seq.h
@@ -0,0 +1,71 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ **************************************************************************/
+#pragma once
+
+extern "C" {
+#include "vktrace_filelike.h"
+#include "vktrace_trace_packet_identifiers.h"
+}
+
+/* Class to handle fetching and sequencing packets from a tracefile.
+ * Contains no knowledge of type of tracer needed to process packet.
+ * Requires low level file/stream reading/seeking support. */
+namespace vktrace_replay {
+
+
+struct seqBookmark
+{
+    long file_offset; // byte offset into the trace file, as returned by ftell()
+};
+
+
+// replay Sequencer interface
+class AbstractSequencer
+{
+public:
+    virtual ~AbstractSequencer() {}
+    virtual vktrace_trace_packet_header *get_next_packet() = 0;
+    virtual void get_bookmark(seqBookmark &bookmark) = 0;
+    virtual void set_bookmark(const seqBookmark &bookmark) = 0;
+};
+
+class Sequencer: public AbstractSequencer
+{
+
+public:
+    Sequencer(FileLike* pFile) : m_lastPacket(NULL), m_pFile(pFile) {}
+    ~Sequencer() { vktrace_free(m_lastPacket); } // packets are allocated via vktrace_malloc, so release them with vktrace_free rather than delete
+    
+    vktrace_trace_packet_header *get_next_packet();
+    void get_bookmark(seqBookmark &bookmark);
+    void set_bookmark(const seqBookmark &bookmark);
+    void record_bookmark();
+    
+private:
+    vktrace_trace_packet_header *m_lastPacket;
+    seqBookmark m_bookmark;
+    FileLike *m_pFile;
+    
+};
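+
+// Typical bookmark usage (sketch, mirroring the loop logic in vkreplay_main):
+//     seqBookmark mark;
+//     seq.record_bookmark();                 // remember the current file offset
+//     seq.get_bookmark(mark);                // copy it into a local bookmark
+//     while (vktrace_trace_packet_header* p = seq.get_next_packet()) { /* replay p */ }
+//     seq.set_bookmark(mark);                // seek back to replay the span again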
+
+} /* namespace vktrace_replay */
+
+
diff --git a/vktrace/src/vktrace_replay/vkreplay_window.h b/vktrace/src/vktrace_replay/vkreplay_window.h
new file mode 100644
index 0000000..7232296
--- /dev/null
+++ b/vktrace/src/vktrace_replay/vkreplay_window.h
@@ -0,0 +1,141 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ **************************************************************************/
+#pragma once
+
+extern "C"{
+#include "vktrace_platform.h"
+}
+
+#if defined(PLATFORM_LINUX) || defined(XCB_NVIDIA)
+#if defined(ANDROID)
+#include <android_native_app_glue.h>
+typedef ANativeWindow* vktrace_window_handle;
+#else
+#include <xcb/xcb.h>
+typedef xcb_window_t vktrace_window_handle;
+#endif
+#elif defined(WIN32)
+typedef HWND vktrace_window_handle;
+#endif
+
+/* Classes that abstract the display and the initialization of the rendering API
+ * used to present framebuffers into a window on the screen, or else fullscreen.
+ * Uses the Bridge design pattern.
+ */
+namespace vktrace_replay {
+
+class ReplayDisplayImp {
+public:
+    virtual ~ReplayDisplayImp() {}
+    virtual int init(const unsigned int gpu_idx) = 0;
+    virtual int set_window(vktrace_window_handle hWindow, unsigned int width, unsigned int height) = 0;
+    virtual int create_window(const unsigned int width, const unsigned int height) = 0;
+    virtual void process_event() = 0;
+};
+
+class ReplayDisplay {
+public:
+    ReplayDisplay()
+        : m_imp(NULL),
+        m_width(0),
+        m_height(0),
+        m_gpu(0),
+        m_fullscreen(false),
+        m_hWindow(0)
+    {
+
+    }
+
+    ReplayDisplay(const unsigned int width, const unsigned int height, const unsigned int gpu, const bool fullscreen) :
+        m_imp(NULL),
+        m_width(width),
+        m_height(height),
+        m_gpu(gpu),
+        m_fullscreen(fullscreen),
+        m_hWindow(0)
+    {
+    }
+
+    ReplayDisplay(vktrace_window_handle hWindow, unsigned int width, unsigned int height) :
+        m_imp(NULL),
+        m_width(width),
+        m_height(height),
+        m_gpu(0),
+        m_fullscreen(false),
+        m_hWindow(hWindow)
+    {
+    }
+
+    virtual ~ReplayDisplay()
+    {
+    }
+
+    void set_implementation(ReplayDisplayImp & disp)
+    {
+        m_imp = & disp;
+    }
+    void set_implementation(ReplayDisplayImp * disp)
+    {
+        m_imp = disp;
+    }
+    int init()
+    {
+        if (m_imp)
+            return m_imp->init(m_gpu);
+        else
+            return -1;
+    }
+    void process_event()
+    {
+        if(m_imp)
+            m_imp->process_event();
+    }
+    unsigned int get_gpu()
+    {
+        return m_gpu;
+    }
+    unsigned int get_width()
+    {
+        return m_width;
+    }
+    unsigned int get_height()
+    {
+        return m_height;
+    }
+    bool get_fullscreen()
+    {
+        return m_fullscreen;
+    }
+    vktrace_window_handle get_window_handle()
+    {
+        return m_hWindow;
+    }
+
+private:
+    ReplayDisplayImp *m_imp;
+    unsigned int m_width;
+    unsigned int m_height;
+    unsigned int m_gpu;
+    bool m_fullscreen;
+    vktrace_window_handle m_hWindow;
+};
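+
+// Bridge-pattern usage sketch; the concrete ReplayDisplayImp named here is
+// hypothetical (real implementations live with each API-specific replayer):
+//     vktrace_replay::ReplayDisplay disp(800, 600, 0, false);
+//     SomeXcbDisplayImp imp;
+//     disp.set_implementation(&imp);
+//     if (disp.init() < 0) { /* no implementation was set, or init failed */ }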
+
+}   // namespace vktrace_replay
diff --git a/vktrace/src/vktrace_trace/CMakeLists.txt b/vktrace/src/vktrace_trace/CMakeLists.txt
new file mode 100644
index 0000000..63ba576
--- /dev/null
+++ b/vktrace/src/vktrace_trace/CMakeLists.txt
@@ -0,0 +1,27 @@
+cmake_minimum_required(VERSION 2.8)
+project(vktrace)
+
+include("${SRC_DIR}/build_options.cmake")
+
+include_directories(${CMAKE_CURRENT_BINARY_DIR})
+
+set(SRC_LIST
+    ${SRC_LIST}
+    vktrace.cpp
+    vktrace_process.h
+    vktrace_process.cpp
+)
+
+include_directories(
+    ${SRC_DIR}
+    ${SRC_DIR}/vktrace_common
+    ${SRC_DIR}/vktrace_trace
+)
+
+add_executable(${PROJECT_NAME} ${SRC_LIST})
+
+target_link_libraries(${PROJECT_NAME}
+    vktrace_common
+)
+
+build_options_finalize()
diff --git a/vktrace/src/vktrace_trace/vktrace.cpp b/vktrace/src/vktrace_trace/vktrace.cpp
new file mode 100644
index 0000000..ad60466
--- /dev/null
+++ b/vktrace/src/vktrace_trace/vktrace.cpp
@@ -0,0 +1,359 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ **************************************************************************/
+#include "vktrace.h"
+
+#include "vktrace_process.h"
+
+extern "C" {
+#include "vktrace_common.h"
+#include "vktrace_filelike.h"
+#include "vktrace_interconnect.h"
+#include "vktrace_trace_packet_identifiers.h"
+#include "vktrace_trace_packet_utils.h"
+}
+
+#include <sys/types.h>
+#include <sys/stat.h>
+
+vktrace_settings g_settings;
+vktrace_settings g_default_settings;
+
+vktrace_SettingInfo g_settings_info[] =
+{
+    // common command options
+    { "p", "Program", VKTRACE_SETTING_STRING, { &g_settings.program }, { &g_default_settings.program }, TRUE, "The program to trace."},
+    { "a", "Arguments", VKTRACE_SETTING_STRING, { &g_settings.arguments }, { &g_default_settings.arguments }, TRUE, "Cmd-line arguments to pass to trace program."},
+    { "w", "WorkingDir", VKTRACE_SETTING_STRING, { &g_settings.working_dir }, { &g_default_settings.working_dir }, TRUE, "The program's working directory."},
+    { "o", "OutputTrace", VKTRACE_SETTING_STRING, { &g_settings.output_trace }, { &g_default_settings.output_trace }, TRUE, "Path to the generated output trace file."},
+    { "s", "ScreenShot", VKTRACE_SETTING_STRING, { &g_settings.screenshotList }, { &g_default_settings.screenshotList }, TRUE, "Comma separated list of frames to take a snapshot of."},
+    { "ptm", "PrintTraceMessages", VKTRACE_SETTING_BOOL, { &g_settings.print_trace_messages }, { &g_default_settings.print_trace_messages }, TRUE, "Print trace messages to vktrace console."},
+#if _DEBUG
+    { "v", "Verbosity", VKTRACE_SETTING_STRING, { &g_settings.verbosity }, { &g_default_settings.verbosity }, TRUE, "Verbosity mode. Modes are \"quiet\", \"errors\", \"warnings\", \"full\", \"debug\"."},
+#else
+    { "v", "Verbosity", VKTRACE_SETTING_STRING, { &g_settings.verbosity }, { &g_default_settings.verbosity }, TRUE, "Verbosity mode. Modes are \"quiet\", \"errors\", \"warnings\", \"full\"."},
+#endif
+
+    //{ "z", "pauze", VKTRACE_SETTING_BOOL, &g_settings.pause, &g_default_settings.pause, TRUE, "Wait for a key at startup (so a debugger can be attached)" },
+};
+
+vktrace_SettingGroup g_settingGroup =
+{
+    "vktrace",
+    sizeof(g_settings_info) / sizeof(g_settings_info[0]),
+    &g_settings_info[0]
+};
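+
+// Example command line these settings correspond to (program path and frame
+// list are illustrative):
+//     vktrace -p /path/to/app -o app.vktrace -s "1,5,10" -v full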
+
+// ------------------------------------------------------------------------------------------------
+#if defined(WIN32)
+void MessageLoop()
+{
+    MSG msg = { 0 };
+    bool quit = false;
+    while (!quit)
+    {
+        if (GetMessage(&msg, NULL, 0, 0) == FALSE)
+        {
+            quit = true;
+        }
+        else
+        {
+            quit = (msg.message == VKTRACE_WM_COMPLETE);
+        }
+    }
+}
+#endif
+
+int PrepareTracers(vktrace_process_capture_trace_thread_info** ppTracerInfo)
+{
+    unsigned int num_tracers = 1;
+
+    assert(ppTracerInfo != NULL && *ppTracerInfo == NULL);
+    *ppTracerInfo = VKTRACE_NEW_ARRAY(vktrace_process_capture_trace_thread_info, num_tracers);
+    memset(*ppTracerInfo, 0, sizeof(vktrace_process_capture_trace_thread_info) * num_tracers);
+
+    // we only support Vulkan tracer
+    (*ppTracerInfo)[0].tracerId = VKTRACE_TID_VULKAN;
+
+    return num_tracers;
+}
+
+bool InjectTracersIntoProcess(vktrace_process_info* pInfo)
+{
+    bool bRecordingThreadsCreated = true;
+    vktrace_thread tracingThread;
+    if (vktrace_platform_remote_load_library(pInfo->hProcess, NULL, &tracingThread, NULL)) {
+        // prepare data for capture threads
+        pInfo->pCaptureThreads[0].pProcessInfo = pInfo;
+        pInfo->pCaptureThreads[0].recordingThread = VKTRACE_NULL_THREAD;
+
+        // create thread to record trace packets from the tracer
+        pInfo->pCaptureThreads[0].recordingThread = vktrace_platform_create_thread(Process_RunRecordTraceThread, &(pInfo->pCaptureThreads[0]));
+        if (pInfo->pCaptureThreads[0].recordingThread == VKTRACE_NULL_THREAD) {
+            vktrace_LogError("Failed to create trace recording thread.");
+            bRecordingThreadsCreated = false;
+        }
+
+    } else {
+        // failed to inject a DLL
+        bRecordingThreadsCreated = false;
+    }
+    return bRecordingThreadsCreated;
+}
+
+void loggingCallback(VktraceLogLevel level, const char* pMessage)
+{
+    if (level == VKTRACE_LOG_NONE)
+        return;
+
+    switch(level)
+    {
+    case VKTRACE_LOG_DEBUG: printf("vktrace debug: %s\n", pMessage); break;
+    case VKTRACE_LOG_ERROR: printf("vktrace error: %s\n", pMessage); break;
+    case VKTRACE_LOG_WARNING: printf("vktrace warning: %s\n", pMessage); break;
+    case VKTRACE_LOG_VERBOSE: printf("vktrace info: %s\n", pMessage); break;
+    default:
+        printf("%s\n", pMessage); break;
+    }
+    fflush(stdout);
+
+#if defined(WIN32)
+#if _DEBUG
+    OutputDebugString(pMessage);
+#endif
+#endif
+}
+
+// ------------------------------------------------------------------------------------------------
+int main(int argc, char* argv[])
+{
+    memset(&g_settings, 0, sizeof(vktrace_settings));
+
+    vktrace_LogSetCallback(loggingCallback);
+    vktrace_LogSetLevel(VKTRACE_LOG_ERROR);
+
+    // setup defaults
+    memset(&g_default_settings, 0, sizeof(vktrace_settings));
+    g_default_settings.output_trace = vktrace_allocate_and_copy("vktrace_out.vktrace");
+    g_default_settings.verbosity = "errors";
+    g_default_settings.screenshotList = NULL;
+
+    if (vktrace_SettingGroup_init(&g_settingGroup, NULL, argc, argv, &g_settings.arguments) != 0)
+    {
+        // invalid cmd-line parameters
+        vktrace_SettingGroup_delete(&g_settingGroup);
+        vktrace_free(g_default_settings.output_trace);
+        return -1;
+    }
+    else
+    {
+        // Validate vktrace inputs
+        BOOL validArgs = TRUE;
+
+        if (g_settings.output_trace == NULL || strlen (g_settings.output_trace) == 0)
+        {
+            validArgs = FALSE;
+        }
+
+        if (strcmp(g_settings.verbosity, "quiet") == 0)
+            vktrace_LogSetLevel(VKTRACE_LOG_NONE);
+        else if (strcmp(g_settings.verbosity, "errors") == 0)
+            vktrace_LogSetLevel(VKTRACE_LOG_ERROR);
+        else if (strcmp(g_settings.verbosity, "warnings") == 0)
+            vktrace_LogSetLevel(VKTRACE_LOG_WARNING);
+        else if (strcmp(g_settings.verbosity, "full") == 0)
+            vktrace_LogSetLevel(VKTRACE_LOG_VERBOSE);
+#if _DEBUG
+        else if (strcmp(g_settings.verbosity, "debug") == 0)
+            vktrace_LogSetLevel(VKTRACE_LOG_DEBUG);
+#endif
+        else
+        {
+            vktrace_LogSetLevel(VKTRACE_LOG_ERROR);
+            validArgs = FALSE;
+        }
+        vktrace_set_global_var("_VK_TRACE_VERBOSITY", g_settings.verbosity);
+
+        if (validArgs == FALSE)
+        {
+            vktrace_SettingGroup_print(&g_settingGroup);
+            return -1;
+        }
+
+        if (g_settings.program == NULL || strlen(g_settings.program) == 0)
+        {
+            vktrace_LogWarning("No program (-p) parameter found: Will run vktrace as server.");
+            printf("Running vktrace as server...\n");
+            fflush(stdout);
+            g_settings.arguments = NULL;
+        }
+        else
+        {
+            if (g_settings.working_dir == NULL || strlen(g_settings.working_dir) == 0)
+            {
+                CHAR* buf = VKTRACE_NEW_ARRAY(CHAR, 4096);
+                vktrace_LogVerbose("No working directory (-w) parameter found: Assuming executable's path as working directory.");
+                vktrace_platform_full_path(g_settings.program, 4096, buf);
+                g_settings.working_dir = vktrace_platform_extract_path(buf);
+                VKTRACE_DELETE(buf);
+            }
+
+            vktrace_LogVerbose("Running vktrace as parent process will spawn child process: %s", g_settings.program);
+            if (g_settings.arguments != NULL && strlen(g_settings.arguments) > 0)
+            {
+                vktrace_LogVerbose("Args to be passed to child process: '%s'", g_settings.arguments);
+            }
+        }
+    }
+
+    if (g_settings.screenshotList)
+    {
+        // Export list to screenshot layer
+        vktrace_set_global_var("_VK_SCREENSHOT", g_settings.screenshotList);
+    }
+    else
+    {
+        vktrace_set_global_var("_VK_SCREENSHOT","");
+    }
+
+
+    unsigned int serverIndex = 0;
+    do {
+        // Create and start the process or run in server mode
+
+        BOOL procStarted = TRUE;
+        vktrace_process_info procInfo;
+        memset(&procInfo, 0, sizeof(vktrace_process_info));
+        if (g_settings.program != NULL)
+        {
+            procInfo.exeName = vktrace_allocate_and_copy(g_settings.program);
+            procInfo.processArgs = vktrace_allocate_and_copy(g_settings.arguments);
+            procInfo.fullProcessCmdLine = vktrace_copy_and_append(g_settings.program, " ", g_settings.arguments);
+            procInfo.workingDirectory = vktrace_allocate_and_copy(g_settings.working_dir);
+            procInfo.traceFilename = vktrace_allocate_and_copy(g_settings.output_trace);
+        } else
+        {
+            const char *pExtension = strrchr(g_settings.output_trace, '.');
+            char *basename = vktrace_allocate_and_copy_n(g_settings.output_trace, (int) ((pExtension == NULL) ? strlen(g_settings.output_trace) : pExtension - g_settings.output_trace));
+            char num[16];
+#ifdef PLATFORM_LINUX
+            snprintf(num, 16, "%u", serverIndex);
+#elif defined(WIN32)
+            _snprintf_s(num, 16, _TRUNCATE, "%u", serverIndex);
+#endif
+            procInfo.traceFilename = vktrace_copy_and_append(basename, num, pExtension);
+        }
+
+        procInfo.parentThreadId = vktrace_platform_get_thread_id();
+
+        // set up the tracer; only the Vulkan tracer is supported
+        PrepareTracers(&procInfo.pCaptureThreads);
+
+        if (g_settings.program != NULL)
+        {
+            char *instEnv = vktrace_get_global_var("VK_INSTANCE_LAYERS");
+            // Add ScreenShot layer if enabled
+            if (g_settings.screenshotList && (!instEnv || !strstr(instEnv, "VK_LAYER_LUNARG_screenshot")))
+            {
+                if (!instEnv || strlen(instEnv)  == 0)
+                    vktrace_set_global_var("VK_INSTANCE_LAYERS", "VK_LAYER_LUNARG_screenshot");
+                else
+                {
+                    char *newEnv = vktrace_copy_and_append(instEnv, VKTRACE_LIST_SEPARATOR, "VK_LAYER_LUNARG_screenshot");
+                    vktrace_set_global_var("VK_INSTANCE_LAYERS", newEnv);
+                }
+                instEnv = vktrace_get_global_var("VK_INSTANCE_LAYERS");
+            }
+            char *devEnv = vktrace_get_global_var("VK_DEVICE_LAYERS");
+            if (g_settings.screenshotList && (!devEnv || !strstr(devEnv, "VK_LAYER_LUNARG_screenshot")))
+            {
+                if (!devEnv || strlen(devEnv) == 0)
+                    vktrace_set_global_var("VK_DEVICE_LAYERS", "VK_LAYER_LUNARG_screenshot");
+                else
+                {
+                    char *newEnv = vktrace_copy_and_append(devEnv, VKTRACE_LIST_SEPARATOR, "VK_LAYER_LUNARG_screenshot");
+                    vktrace_set_global_var("VK_DEVICE_LAYERS", newEnv);
+                }
+                devEnv = vktrace_get_global_var("VK_DEVICE_LAYERS");
+            }
+            // Add vktrace_layer enable env var if needed
+            if (!instEnv || strlen(instEnv) == 0)
+            {
+                vktrace_set_global_var("VK_INSTANCE_LAYERS", "VK_LAYER_LUNARG_vktrace");
+            }
+            else if (instEnv != strstr(instEnv, "VK_LAYER_LUNARG_vktrace"))
+            {
+                char *newEnv = vktrace_copy_and_append("VK_LAYER_LUNARG_vktrace", VKTRACE_LIST_SEPARATOR, instEnv);
+                vktrace_set_global_var("VK_INSTANCE_LAYERS", newEnv);
+            }
+            if (!devEnv || strlen(devEnv) == 0)
+            {
+                vktrace_set_global_var("VK_DEVICE_LAYERS", "VK_LAYER_LUNARG_vktrace");
+            }
+            else if (devEnv != strstr(devEnv, "VK_LAYER_LUNARG_vktrace"))
+            {
+                char *newEnv = vktrace_copy_and_append("VK_LAYER_LUNARG_vktrace", VKTRACE_LIST_SEPARATOR, devEnv);
+                vktrace_set_global_var("VK_DEVICE_LAYERS", newEnv);
+            }
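+            // With no pre-existing layer env vars and screenshots enabled, the
+            // child process ends up with, e.g.:
+            //     VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_vktrace<sep>VK_LAYER_LUNARG_screenshot
+            // where <sep> is the platform's VKTRACE_LIST_SEPARATOR.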
+            // call CreateProcess to launch the application
+            procStarted = vktrace_process_spawn(&procInfo);
+        }
+        if (procStarted == FALSE)
+        {
+            vktrace_LogError("Failed to setup remote process.");
+        }
+        else
+        {
+            if (InjectTracersIntoProcess(&procInfo) == FALSE)
+            {
+                vktrace_LogError("Failed to setup tracer communication threads.");
+                return -1;
+            }
+
+            // create watchdog thread to monitor existence of remote process
+            if (g_settings.program != NULL)
+                procInfo.watchdogThread = vktrace_platform_create_thread(Process_RunWatchdogThread, &procInfo);
+
+#if defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+            // Sync wait for local threads and remote process to complete.
+
+            vktrace_platform_sync_wait_for_thread(&(procInfo.pCaptureThreads[0].recordingThread));
+
+            if (g_settings.program != NULL)
+                vktrace_platform_sync_wait_for_thread(&procInfo.watchdogThread);
+#else
+            vktrace_platform_resume_thread(&procInfo.hThread);
+
+            // Now into the main message loop, listen for hotkeys to send over.
+            MessageLoop();
+#endif
+        }
+
+        vktrace_process_info_delete(&procInfo);
+        serverIndex++;
+    } while (g_settings.program == NULL);
+
+    vktrace_SettingGroup_delete(&g_settingGroup);
+    vktrace_free(g_default_settings.output_trace);
+
+    return 0;
+}
+
diff --git a/vktrace/src/vktrace_trace/vktrace.h b/vktrace/src/vktrace_trace/vktrace.h
new file mode 100644
index 0000000..870abb6
--- /dev/null
+++ b/vktrace/src/vktrace_trace/vktrace.h
@@ -0,0 +1,46 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ **************************************************************************/
+#pragma once
+
+extern "C" {
+#include "vktrace_settings.h"
+}
+
+
+#if defined(WIN32)
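+// Custom message posted by the watchdog thread (see vktrace_process.cpp) to the
+// parent thread's message loop when the traced process has terminated.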
+#define VKTRACE_WM_COMPLETE (WM_USER + 0)
+#endif
+
+//----------------------------------------------------------------------------------------------------------------------
+// globals
+//----------------------------------------------------------------------------------------------------------------------
+typedef struct vktrace_settings
+{
+    const char* program;
+    const char* arguments;
+    const char* working_dir;
+    char* output_trace;
+    BOOL print_trace_messages;
+    const char* screenshotList;
+    const char *verbosity;
+} vktrace_settings;
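+
+// Fields map one-to-one onto the command-line options declared in vktrace.cpp:
+// program (-p), arguments (-a), working_dir (-w), output_trace (-o),
+// screenshotList (-s), print_trace_messages (-ptm), verbosity (-v).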
+
+extern vktrace_settings g_settings;
diff --git a/vktrace/src/vktrace_trace/vktrace_process.cpp b/vktrace/src/vktrace_trace/vktrace_process.cpp
new file mode 100644
index 0000000..295ab2c
--- /dev/null
+++ b/vktrace/src/vktrace_trace/vktrace_process.cpp
@@ -0,0 +1,198 @@
+/*
+ * Copyright (c) 2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2014-2016 Valve Corporation. All rights reserved.
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */
+
+#include <string>
+#include "vktrace_process.h"
+#include "vktrace.h"
+
+#if defined(PLATFORM_LINUX)
+#include <sys/prctl.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#endif
+
+#if defined(PLATFORM_OSX)
+#include <sys/types.h>
+#include <sys/wait.h>
+#endif
+
+extern "C" {
+#include "vktrace_filelike.h"
+#include "vktrace_interconnect.h"
+#include "vktrace_trace_packet_utils.h"
+}
+
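+// Poll interval in milliseconds for the watchdog's WaitForSingleObject() check
+// on the child process (Windows path below).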
+const unsigned long kWatchDogPollTime = 250;
+
+#if defined(WIN32)
+void SafeCloseHandle(HANDLE& _handle)
+{
+    if (_handle) {
+        CloseHandle(_handle);
+        _handle = NULL;
+    }
+}
+#endif
+
+// ------------------------------------------------------------------------------------------------
+VKTRACE_THREAD_ROUTINE_RETURN_TYPE Process_RunWatchdogThread(LPVOID _procInfoPtr)
+{
+    vktrace_process_info* pProcInfo = (vktrace_process_info*)_procInfoPtr;
+
+#if defined(WIN32)
+
+    while (WaitForSingleObject(pProcInfo->hProcess, kWatchDogPollTime) == WAIT_TIMEOUT)
+    {
+        if (pProcInfo->serverRequestsTermination)
+        {
+            vktrace_LogVerbose("Vktrace has requested exit.");
+            return 0;
+        }
+    }
+
+    vktrace_LogVerbose("Child process has terminated.");
+
+    PostThreadMessage(pProcInfo->parentThreadId, VKTRACE_WM_COMPLETE, 0, 0);
+    pProcInfo->serverRequestsTermination = TRUE;
+    
+#elif defined(PLATFORM_LINUX) || defined(PLATFORM_OSX)
+    int status = 0;
+    int options = 0;
+    while (waitpid(pProcInfo->processId, &status, options) != -1)
+    {
+        if (WIFEXITED(status))
+        {
+            vktrace_LogVerbose("Child process exited.");
+            break;
+        }
+        else if (WIFSIGNALED(status))
+        {
+            // WCOREDUMP(status) is only meaningful once WIFSIGNALED(status) is true
+            if (WCOREDUMP(status))
+                vktrace_LogError("Child process crashed.");
+            else
+                vktrace_LogVerbose("Child process was signaled.");
+            break;
+        }
+        else if (WIFSTOPPED(status))
+            vktrace_LogVerbose("Child process was stopped.");
+        else if (WIFCONTINUED(status))
+            vktrace_LogVerbose("Child process was continued.");
+    }
+#endif
+    return 0;
+}
+
+// ------------------------------------------------------------------------------------------------
+VKTRACE_THREAD_ROUTINE_RETURN_TYPE Process_RunRecordTraceThread(LPVOID _threadInfo)
+{
+    vktrace_process_capture_trace_thread_info* pInfo = (vktrace_process_capture_trace_thread_info*)_threadInfo;
+
+    MessageStream* pMessageStream = vktrace_MessageStream_create(TRUE, "", VKTRACE_BASE_PORT + pInfo->tracerId);
+    if (pMessageStream == NULL)
+    {
+        vktrace_LogError("Thread_CaptureTrace() cannot create message stream.");
+        return 1;
+    }
+
+    // create trace file
+    pInfo->pProcessInfo->pTraceFile = vktrace_write_trace_file_header(pInfo->pProcessInfo);
+
+    if (pInfo->pProcessInfo->pTraceFile == NULL) {
+        // writing trace file generated an error, no sense in continuing.
+        vktrace_LogError("Error cannot create trace file and write header.");
+        vktrace_process_info_delete(pInfo->pProcessInfo);
+        return 1;
+    }
+
+    FileLike* fileLikeSocket = vktrace_FileLike_create_msg(pMessageStream);
+    unsigned int total_packet_count = 0;
+    vktrace_trace_packet_header* pHeader = NULL;
+    size_t bytes_written;
+
+    while (pInfo->pProcessInfo->serverRequestsTermination == FALSE)
+    {
+        // get a packet
+        //vktrace_LogDebug("Waiting for a packet...");
+
+        // read entire packet in
+        pHeader = vktrace_read_trace_packet(fileLikeSocket);
+        ++total_packet_count;
+        if (pHeader == NULL)
+        {
+            if (pMessageStream->mErrorNum == WSAECONNRESET)
+            {
+                vktrace_LogError("Network Connection Reset");
+            }
+            else
+            {
+                vktrace_LogError("Network Connection Failed");
+            }
+            break;
+        }
+
+        //vktrace_LogDebug("Received packet id: %hu", pHeader->packet_id);
+        
+        if (pHeader->pBody == (uintptr_t) NULL)
+        {
+            vktrace_LogWarning("Received empty packet body for id: %hu", pHeader->packet_id);
+        }
+        else
+        {
+            // handle special case packets
+            if (pHeader->packet_id == VKTRACE_TPI_MESSAGE)
+            {
+                if (g_settings.print_trace_messages == TRUE)
+                {
+                    vktrace_trace_packet_message* pPacket = vktrace_interpret_body_as_trace_packet_message(pHeader);
+                    vktrace_LogAlways("Packet %lu: Traced Message (%s): %s", pHeader->global_packet_index, vktrace_LogLevelToShortString(pPacket->type), pPacket->message);
+                    vktrace_finalize_buffer_address(pHeader, (void **) &(pPacket->message));
+                }
+            }
+
+            if (pHeader->packet_id == VKTRACE_TPI_MARKER_TERMINATE_PROCESS)
+            {
+                pInfo->pProcessInfo->serverRequestsTermination = true;
+                vktrace_delete_trace_packet(&pHeader);
+                vktrace_LogVerbose("Thread_CaptureTrace is exiting.");
+                break;
+            }
+
+            if (pInfo->pProcessInfo->pTraceFile != NULL)
+            {
+                vktrace_enter_critical_section(&pInfo->pProcessInfo->traceFileCriticalSection);
+                bytes_written = fwrite(pHeader, 1, (size_t)pHeader->size, pInfo->pProcessInfo->pTraceFile);
+                fflush(pInfo->pProcessInfo->pTraceFile);
+                vktrace_leave_critical_section(&pInfo->pProcessInfo->traceFileCriticalSection);
+                if (bytes_written != pHeader->size)
+                {
+                    vktrace_LogError("Failed to write the packet for packet_id = %hu", pHeader->packet_id);
+                }
+            }
+        }
+
+        // clean up
+        vktrace_delete_trace_packet(&pHeader);
+    }
+
+    VKTRACE_DELETE(fileLikeSocket);
+    vktrace_MessageStream_destroy(&pMessageStream);
+
+    return 0;
+}
diff --git a/vktrace/src/vktrace_trace/vktrace_process.h b/vktrace/src/vktrace_trace/vktrace_process.h
new file mode 100644
index 0000000..bc7a176
--- /dev/null
+++ b/vktrace/src/vktrace_trace/vktrace_process.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (c) 2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2014-2016 Valve Corporation. All rights reserved.
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */
+
+#pragma once
+
+extern "C" {
+#include "vktrace_common.h"
+#include "vktrace_process.h"
+#include "vktrace_interconnect.h"
+}
+
+VKTRACE_THREAD_ROUTINE_RETURN_TYPE Process_RunRecordTraceThread(LPVOID);
+
+VKTRACE_THREAD_ROUTINE_RETURN_TYPE Process_RunWatchdogThread(LPVOID);
diff --git a/vktrace/src/vktrace_viewer/CMakeLists.txt b/vktrace/src/vktrace_viewer/CMakeLists.txt
new file mode 100644
index 0000000..dfe2a48
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/CMakeLists.txt
@@ -0,0 +1,190 @@
+cmake_minimum_required(VERSION 3.0)
+project(vktraceviewer)
+
+include("${SRC_DIR}/build_options.cmake")
+
+set(CMAKE_PREFIX_PATH ${CMAKE_PREFIX_PATH} "${SRC_DIR}/cmake/Modules/")
+
+# we want cmake to link the Qt libs into the binary
+# This policy was introduced in 2.8.11, so on newer versions, use the OLD policy to maintain consistent behavior
+if (POLICY CMP0020)
+   cmake_policy(SET CMP0020 OLD)
+endif()
+
+find_package(Qt5 COMPONENTS Widgets Gui Core Svg QUIET)
+
+if(NOT Qt5_FOUND)
+# After Qt5.6 is installed, you may need to add the following to the cmake command line:
+# -DCMAKE_PREFIX_PATH=C:\\Qt\\5.6\\msvc2015_64\\
+message(WARNING "WARNING: vktraceviewer will be excluded because Qt5 was not found.")
+else()
+
+find_package(Threads REQUIRED)
+find_package(X11 REQUIRED)
+
+require_pthreads()
+
+include_directories(
+    ${SRC_DIR}
+    ${SRC_DIR}/vktrace_common
+    ${SRC_DIR}/vktrace_replay
+    ${SRC_DIR}/vktrace_extensions/vktracevulkan/vkreplay
+    ${SRC_DIR}/vktrace_extensions/vktracevulkan/vktraceviewer
+    ${SRC_DIR}/vktrace_viewer
+    ${CMAKE_CURRENT_BINARY_DIR}
+    ${Qt5Widgets_INCLUDE_DIRS}
+)
+
+set(SRC_LIST
+    main.cpp
+    vktraceviewer.cpp
+    vktraceviewer_settings.cpp
+    vktraceviewer_output.cpp
+    vktraceviewer_trace_file_utils.cpp
+    vktraceviewer_qgeneratetracedialog.cpp
+    vktraceviewer_qsettingsdialog.cpp
+    vktraceviewer_qtimelineview.cpp
+    vktraceviewer_qtracefileloader.cpp
+    vktraceviewer_QReplayWorker.cpp
+    vktraceviewer_controller_factory.cpp
+    ${SRC_DIR}/vktrace_replay/vkreplay_seq.cpp
+    ${SRC_DIR}/vktrace_replay/vkreplay_factory.cpp
+   )
+
+# This should only contain headers that define a QOBJECT
+# Typically that means just headers for UI objects
+set(UI_HEADER_LIST
+    vktraceviewer.h
+    vktraceviewer_qgeneratetracedialog.h
+    vktraceviewer_qsettingsdialog.h
+    vktraceviewer_qtimelineview.h
+    vktraceviewer_qimageviewer.h
+    vktraceviewer_qsvgviewer.h
+    vktraceviewer_QReplayWidget.h
+    vktraceviewer_QReplayWorker.h
+    vktraceviewer_QTraceFileModel.h
+    vktraceviewer_qtracefileloader.h
+   )
+
+# This list is for all headers
+set(HEADER_LIST
+    vktraceviewer.h
+    vktraceviewer_settings.h
+    vktraceviewer_output.h
+    vktraceviewer_controller.h
+    vktraceviewer_controller_factory.h
+    vktraceviewer_qgeneratetracedialog.h
+    vktraceviewer_qsettingsdialog.h
+    vktraceviewer_qtimelineview.h
+    vktraceviewer_qimageviewer.h
+    vktraceviewer_qsvgviewer.h
+    vktraceviewer_QReplayWidget.h
+    vktraceviewer_QReplayWorker.h
+    vktraceviewer_qtracefileloader.h
+    vktraceviewer_QTraceFileModel.h
+    vktraceviewer_trace_file_utils.h
+    vktraceviewer_view.h
+   )
+
+set(FORM_LIST
+    vktraceviewer.ui
+   )
+
+set(RESOURCE_LIST
+   )
+
+QT5_WRAP_CPP(QT_GEN_HEADER_MOC_LIST ${UI_HEADER_LIST})
+QT5_WRAP_UI(QT_GEN_FORM_HEADER_LIST ${FORM_LIST})
+QT5_ADD_RESOURCES(QT_GEN_RESOURCE_RCC_LIST ${RESOURCE_LIST})
+
+if ("${CMAKE_C_COMPILER_ID}" STREQUAL "Clang")
+    add_compiler_flag("-Wno-global-constructors -Wno-used-but-marked-unused")
+endif()
+
+# Platform specific compile flags.
+if (NOT MSVC)
+    add_compiler_flag("-fPIC")
+endif()
+
+add_executable(${PROJECT_NAME} ${SRC_LIST} ${HEADER_LIST}
+    ${QT_GEN_HEADER_MOC_LIST}
+    ${QT_GEN_FORM_HEADER_LIST}
+    ${QT_GEN_RESOURCE_RCC_LIST}
+   )
+
+if (TARGET SDL)
+    add_dependencies(${PROJECT_NAME} SDL)
+endif ()
+
+add_dependencies(${PROJECT_NAME} vktraceviewer_vk)
+
+target_link_libraries(${PROJECT_NAME}
+    Qt5::Widgets
+    Qt5::Core
+    Qt5::Svg
+    vktrace_common
+    vulkan_replay
+    vktraceviewer_vk
+    ${CMAKE_DL_LIBS}
+    ${X11_X11_LIB}
+)
+
+if (MSVC)
+  # copy the debug and release dlls for Qt5Widgets, Qt5Core, and Qt5Gui
+  set (COPY_DEST ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/${CMAKE_CFG_INTDIR})
+
+  get_property(qt5_qmake_executable TARGET ${Qt5Core_QMAKE_EXECUTABLE} PROPERTY IMPORT_LOCATION)
+  execute_process(COMMAND ${qt5_qmake_executable} -query QT_INSTALL_PLUGINS OUTPUT_VARIABLE QT_INSTALL_PLUGINS OUTPUT_STRIP_TRAILING_WHITESPACE)
+  execute_process(COMMAND ${qt5_qmake_executable} -query QT_INSTALL_BINS OUTPUT_VARIABLE QT_INSTALL_BINS OUTPUT_STRIP_TRAILING_WHITESPACE)
+
+  # There are also several other files that need to be copied or created
+  FILE(WRITE ${CMAKE_CURRENT_LIST_DIR}/qt.conf "[Paths]\nPlugins=.")
+  if(EXISTS "${QT_INSTALL_BINS}/icudt54.dll")
+    add_custom_target(copy_deps_qt5 ALL
+                      ${CMAKE_COMMAND} -E make_directory "${COPY_DEST}/platforms"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_CURRENT_LIST_DIR}/qt.conf" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Widgets.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Widgetsd.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Core.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Cored.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Gui.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Guid.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Svgd.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Svg.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/libGLESv2d.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/libGLESv2.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/libEGLd.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/libEGL.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_PLUGINS}/platforms/qwindows.dll" "${COPY_DEST}/platforms"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_PLUGINS}/platforms/qwindowsd.dll"  "${COPY_DEST}/platforms"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/icudt54.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/icuin54.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/icuuc54.dll" "${COPY_DEST}"
+                      COMMENT "Copying vktraceviewer Qt5 dependencies to ${COPY_DEST}"
+                      VERBATIM)
+  else()
+    add_custom_target(copy_deps_qt5 ALL
+                      ${CMAKE_COMMAND} -E make_directory "${COPY_DEST}/platforms"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_CURRENT_LIST_DIR}/qt.conf" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Widgets.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Widgetsd.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Core.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Cored.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Gui.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Guid.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Svgd.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/Qt5Svg.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/libGLESv2d.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/libGLESv2.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/libEGLd.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_BINS}/libEGL.dll" "${COPY_DEST}"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_PLUGINS}/platforms/qwindows.dll" "${COPY_DEST}/platforms"
+                      COMMAND ${CMAKE_COMMAND} -E copy_if_different "${QT_INSTALL_PLUGINS}/platforms/qwindowsd.dll"  "${COPY_DEST}/platforms"
+                      COMMENT "Copying vktraceviewer Qt5 dependencies to ${COPY_DEST}"
+                      VERBATIM)
+  endif()
+
+endif()
+
+build_options_finalize()
+endif(NOT Qt5_FOUND)
diff --git a/vktrace/src/vktrace_viewer/main.cpp b/vktrace/src/vktrace_viewer/main.cpp
new file mode 100644
index 0000000..f756163
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/main.cpp
@@ -0,0 +1,51 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+
+#include <QApplication>
+
+#include "vktraceviewer.h"
+#include "vktraceviewer_settings.h"
+
+int main(int argc, char *argv[])
+{
+    // Initialize QApplication before initializing settings
+    QApplication a(argc, argv);
+
+    // initialize settings
+    if (vktraceviewer_initialize_settings(argc, argv) == false)
+    {
+        return -1;
+    }
+
+    vktraceviewer w;
+    w.show();
+
+    if (g_settings.trace_file_to_open != NULL && strlen(g_settings.trace_file_to_open) > 0)
+    {
+        w.open_trace_file_threaded(QString(g_settings.trace_file_to_open));
+    }
+
+    int result = a.exec();
+
+    vktrace_SettingGroup_Delete_Loaded(&g_pAllSettings, &g_numAllSettings);
+
+    return result;
+}
diff --git a/vktrace/src/vktrace_viewer/qt.conf b/vktrace/src/vktrace_viewer/qt.conf
new file mode 100644
index 0000000..3b74f3a
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/qt.conf
@@ -0,0 +1,2 @@
+[Paths]
+Plugins=.
\ No newline at end of file
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer.cpp b/vktrace/src/vktrace_viewer/vktraceviewer.cpp
new file mode 100644
index 0000000..1e63198
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer.cpp
@@ -0,0 +1,1292 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+
+#include <assert.h>
+#include <QDebug>
+#include <QFileDialog>
+#include <QMoveEvent>
+#include <QPalette>
+#include <QProcess>
+#include <QToolButton>
+#include <QStandardPaths>
+#include <QMessageBox>
+#include <QCoreApplication>
+#include <QGraphicsBlurEffect>
+#include <QAbstractProxyModel>
+#include <qwindow.h>
+
+#include "ui_vktraceviewer.h"
+#include "vktraceviewer.h"
+#include "vktraceviewer_settings.h"
+#include "vktraceviewer_output.h"
+
+#include "vktraceviewer_controller_factory.h"
+#include "vktraceviewer_qgeneratetracedialog.h"
+#include "vktraceviewer_qtracefileloader.h"
+
+#include "vkreplay_main.h"
+//----------------------------------------------------------------------------------------------------------------------
+// globals
+//----------------------------------------------------------------------------------------------------------------------
+static QString g_PROJECT_NAME = "VkTrace Viewer";
+
+//-----------------------------------------------------------------------------
+void loggingCallback(VktraceLogLevel level, const char* pMessage)
+{
+    switch(level)
+    {
+    case VKTRACE_LOG_ERROR:
+        gs_OUTPUT.error(-1, QString(pMessage));
+        break;
+    case VKTRACE_LOG_WARNING:
+        gs_OUTPUT.warning(-1, QString(pMessage));
+        break;
+    case VKTRACE_LOG_DEBUG:
+    case VKTRACE_LOG_VERBOSE:
+    default:
+        gs_OUTPUT.message(-1, QString(pMessage));
+        break;
+    }
+
+#if defined(WIN32)
+#if _DEBUG
+    OutputDebugString(pMessage);
+#endif
+#endif
+}
+
+//-----------------------------------------------------------------------------
+
+vktraceviewer::vktraceviewer(QWidget *parent)
+    : QMainWindow(parent),
+      ui(new Ui::vktraceviewer),
+      m_settingsDialog(this),
+      m_pTraceFileModel(NULL),
+      m_pProxyModel(NULL),
+      m_pController(NULL),
+      m_pGenerateTraceButton(NULL),
+      m_pTimeline(NULL),
+      m_pGenerateTraceDialog(NULL),
+      m_bDelayUpdateUIForContext(false),
+      m_bGeneratingTrace(false)
+{
+    ui->setupUi(this);
+    qRegisterMetaType<uint64_t>("uint64_t");
+    qRegisterMetaType<VktraceLogLevel>("VktraceLogLevel");
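+    // Register these metatypes so they can be passed through queued (cross-thread)
+    // signal/slot connections, e.g. messages emitted from the trace-loader thread.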
+
+    m_pTraceStatsTab = new QWidget();
+    m_pTraceStatsTab->setObjectName(QStringLiteral("m_pTraceStatsTab"));
+    m_pTraceStatsTabLayout = new QGridLayout(m_pTraceStatsTab);
+    m_pTraceStatsTabLayout->setSpacing(6);
+    m_pTraceStatsTabLayout->setContentsMargins(11,11,11,11);
+    m_pTraceStatsTabLayout->setObjectName(QStringLiteral("m_pTraceStatsTabLayout"));
+    m_pTraceStatsTabText = new QTextBrowser(m_pTraceStatsTab);
+    m_pTraceStatsTabText->setObjectName(QStringLiteral("m_pTraceStatsTabText"));
+    m_pTraceStatsTabText->setLineWrapMode(QTextEdit::NoWrap);
+    m_pTraceStatsTabLayout->addWidget(m_pTraceStatsTabText, 0, 0, 1, 1);
+
+    QFont font("monospace", 10);
+    m_pTraceStatsTabText->setFont(font);
+    ui->outputTextBrowser->setFont(font);
+
+    // Hide unused, default tab
+    ui->stateTabWidget->removeTab(0);
+
+    memset(&m_traceFileInfo, 0, sizeof(vktraceviewer_trace_file_info));
+
+    m_settingsDialog.resize(g_settings.settings_dialog_width, g_settings.settings_dialog_height);
+    connect(&m_settingsDialog, SIGNAL(SaveSettings(vktrace_SettingGroup*, unsigned int)), this, SLOT(on_settingsSaved(vktrace_SettingGroup*, unsigned int)));
+    connect(&m_settingsDialog, SIGNAL(Resized(unsigned int, unsigned int)), this, SLOT(on_settingsDialogResized(unsigned int, unsigned int)));
+
+    this->move(g_settings.window_position_left, g_settings.window_position_top);
+    this->resize(g_settings.window_size_width, g_settings.window_size_height);
+
+    connect(ui->outputTextBrowser, SIGNAL(anchorClicked(const QUrl&)), this, SLOT(on_hyperlinkClicked(const QUrl&)));
+
+    // setup Output Window
+    vktraceviewer_output_init(ui->outputTextBrowser);
+    vktrace_LogSetCallback(loggingCallback);
+    vktrace_LogSetLevel(VKTRACE_LOG_ERROR);
+    vktrace_LogAlways("Welcome to VkTraceViewer!");
+
+    // cache the original background color of the search text box
+    m_searchTextboxBackgroundColor = ui->searchTextBox->palette().base().color();
+
+    // add buttons to toolbar
+    m_pGenerateTraceButton = new QToolButton(ui->mainToolBar);
+    m_pGenerateTraceButton->setText("Generate Trace...");
+    m_pGenerateTraceButton->setEnabled(true);
+    connect(m_pGenerateTraceButton, SIGNAL(clicked()), this, SLOT(prompt_generate_trace()));
+
+    ui->mainToolBar->addWidget(m_pGenerateTraceButton);
+
+    ui->treeView->setModel(NULL);
+    ui->treeView->setContextMenuPolicy(Qt::ActionsContextMenu);
+    ui->treeView->setUniformRowHeights(true);
+
+    // setup timeline
+    m_pTimeline = new vktraceviewer_QTimelineView();
+    if (m_pTimeline != NULL)
+    {
+        m_pTimeline->setMinimumHeight(100);
+        connect(m_pTimeline, SIGNAL(clicked(const QModelIndex &)), this, SLOT(slot_timeline_clicked(const QModelIndex &)));
+        ui->timelineLayout->addWidget(m_pTimeline);
+        ui->timelineLayout->removeWidget(ui->timelineViewPlaceholder);
+        delete ui->timelineViewPlaceholder;
+        ui->timelineViewPlaceholder = NULL;
+    }
+
+    m_pGenerateTraceDialog = new vktraceviewer_QGenerateTraceDialog(this);
+    connect(m_pGenerateTraceDialog, SIGNAL(OutputMessage(VktraceLogLevel, const QString&)), this, SLOT(OnOutputMessage(VktraceLogLevel, const QString&)));
+
+    reset_tracefile_ui();
+
+    // for now, remove these widgets since they are not used
+    ui->bottomTabWidget->removeTab(ui->bottomTabWidget->indexOf(ui->machineInfoTab));
+    ui->bottomTabWidget->removeTab(ui->bottomTabWidget->indexOf(ui->callStackTab));
+}
+
+vktraceviewer::~vktraceviewer()
+{
+    close_trace_file();
+
+    if (m_pTimeline != NULL)
+    {
+        delete m_pTimeline;
+        m_pTimeline = NULL;
+    }
+
+    reset_view();
+
+    delete ui;
+    vktraceviewer_output_deinit();
+}
+
+void vktraceviewer::moveEvent(QMoveEvent *pEvent)
+{
+    g_settings.window_position_left = pEvent->pos().x();
+    g_settings.window_position_top = pEvent->pos().y();
+
+    vktraceviewer_settings_updated();
+}
+
+void vktraceviewer::closeEvent (QCloseEvent *pEvent)
+{
+    vktraceviewer_save_settings();
+    close_trace_file();
+    pEvent->accept();
+}
+
+void vktraceviewer::resizeEvent(QResizeEvent *pEvent)
+{
+    g_settings.window_size_width = pEvent->size().width();
+    g_settings.window_size_height = pEvent->size().height();
+
+    vktraceviewer_settings_updated();
+}
+
+int vktraceviewer::add_custom_state_viewer(QWidget* pWidget, const QString& title, bool bBringToFront)
+{
+    int tabIndex = ui->stateTabWidget->addTab(pWidget, title);
+
+    if (bBringToFront)
+    {
+        ui->stateTabWidget->setCurrentWidget(pWidget);
+    }
+
+    return tabIndex;
+}
+
+void vktraceviewer::remove_custom_state_viewer(int const tabIndex)
+{
+    ui->stateTabWidget->removeTab(tabIndex);
+}
+
+void vktraceviewer::enable_custom_state_viewer(QWidget* pWidget, bool bEnabled)
+{
+    ui->stateTabWidget->setTabEnabled(ui->stateTabWidget->indexOf(pWidget), bEnabled);
+}
+
+QToolButton* vktraceviewer::add_toolbar_button(const QString& title, bool bEnabled)
+{
+    QToolButton* pButton = new QToolButton(ui->mainToolBar);
+    pButton->setText(title);
+    pButton->setEnabled(bEnabled);
+    ui->mainToolBar->addWidget(pButton);
+    return pButton;
+}
+
+void vktraceviewer::set_calltree_model(vktraceviewer_QTraceFileModel* pTraceFileModel, QAbstractProxyModel* pModel)
+{
+    if (m_pTraceFileModel == pTraceFileModel && pModel == m_pProxyModel)
+    {
+        // Setting model and proxy to the same thing they are already set to, so there's nothing to do!
+        return;
+    }
+
+    m_pTraceFileModel = pTraceFileModel;
+    m_pProxyModel = pModel;
+
+    if (m_pTimeline != NULL)
+    {
+        m_pTimeline->setModel(pTraceFileModel);
+    }
+
+    if (pModel == NULL)
+    {
+        ui->treeView->setModel(pTraceFileModel);
+    }
+    else
+    {
+        ui->treeView->setModel(pModel);
+    }
+
+    // initially show all columns before hiding others
+    int columns = ui->treeView->header()->count();
+    for (int i = 0; i < columns; i++)
+    {
+        ui->treeView->showColumn(i);
+    }
+
+    // hide columns that are not very important right now
+    ui->treeView->hideColumn(vktraceviewer_QTraceFileModel::Column_TracerId);
+    ui->treeView->hideColumn(vktraceviewer_QTraceFileModel::Column_BeginTime);
+    ui->treeView->hideColumn(vktraceviewer_QTraceFileModel::Column_EndTime);
+    ui->treeView->hideColumn(vktraceviewer_QTraceFileModel::Column_PacketSize);
+
+    int width = ui->treeView->geometry().width();
+    int firstEqualWidthColumnIndex = 0;
+    float fSharedEqualWidthPct = 1.0;
+
+    if (pModel != NULL && pModel->inherits("vktraceviewer_QGroupThreadsProxyModel"))
+    {
+        ui->treeView->hideColumn(vktraceviewer_QTraceFileModel::Column_EntrypointName);
+        ui->treeView->hideColumn(vktraceviewer_QTraceFileModel::Column_ThreadId);
+
+        ui->treeView->setColumnWidth(vktraceviewer_QTraceFileModel::Column_PacketIndex, width * 0.05);
+        ui->treeView->setColumnWidth(vktraceviewer_QTraceFileModel::Column_CpuDuration, width * 0.08);
+        firstEqualWidthColumnIndex = m_pTraceFileModel->columnCount();
+        fSharedEqualWidthPct = 1.0f - 0.05f - 0.08f;
+    }
+    else
+    {
+        // entrypoint names get the most space
+        ui->treeView->setColumnWidth(vktraceviewer_QTraceFileModel::Column_EntrypointName, width * 0.55);
+        firstEqualWidthColumnIndex = 1;
+        fSharedEqualWidthPct = 1.0f - 0.55f;
+    }
+
+    // the remaining space is divided among visible columns
+    int visibleColumns = 0;
+    for (int i = firstEqualWidthColumnIndex; i < columns; i++)
+    {
+        if (!ui->treeView->isColumnHidden(i))
+        {
+            visibleColumns++;
+        }
+    }
+
+    int scrollbarWidth = qApp->style()->pixelMetric(QStyle::PM_ScrollBarExtent);
+    int columnWidths = (width-scrollbarWidth) * (fSharedEqualWidthPct / visibleColumns);
+
+    for (int i = firstEqualWidthColumnIndex; i < columns; i++)
+    {
+        if (!ui->treeView->isColumnHidden(i))
+        {
+            ui->treeView->setColumnWidth(i, columnWidths);
+        }
+    }
+}
+
+void vktraceviewer::add_calltree_contextmenu_item(QAction* pAction)
+{
+    ui->treeView->addAction(pAction);
+}
+
+int indexOfColumn(QAbstractItemModel* pModel, const QString &text)
+{
+    for (int i = 0; i < pModel->columnCount(); i++)
+    {
+        if (pModel->headerData(i, Qt::Horizontal, Qt::DisplayRole).toString() == text)
+        {
+            return i;
+        }
+    }
+    return -1;
+}
+
+void vktraceviewer::select_call_at_packet_index(unsigned long long packetIndex)
+{
+    if (m_pTraceFileModel != NULL)
+    {
+        QApplication::setOverrideCursor(Qt::WaitCursor);
+
+        QModelIndex start = m_pTraceFileModel->index(0, vktraceviewer_QTraceFileModel::Column_PacketIndex);
+
+        QModelIndexList matches = m_pTraceFileModel->match(start, Qt::DisplayRole, QVariant(packetIndex), 1, Qt::MatchFixedString | Qt::MatchRecursive | Qt::MatchWrap);
+        if (matches.count() > 0)
+        {
+            // for some reason, we need to recreate the index such that the index and parent both are for column 0
+            QModelIndex updatedMatch = m_pTraceFileModel->index(matches[0].row(), 0, m_pTraceFileModel->index(matches[0].parent().row(), 0));
+
+            selectApicallModelIndex(updatedMatch, true, true);
+            ui->treeView->setFocus();
+
+            if (m_pTimeline != NULL)
+            {
+                m_pTimeline->setCurrentIndex(m_pTraceFileModel->index(matches[0].row(), vktraceviewer_QTraceFileModel::Column_EntrypointName, QModelIndex()));
+            }
+        }
+
+        QApplication::restoreOverrideCursor();
+    }
+}
+
+void vktraceviewer::highlight_timeline_item(unsigned long long packetArrayIndex, bool bScrollTo, bool bSelect)
+{
+    if (m_pTraceFileModel != NULL)
+    {
+        QModelIndex location = m_pTraceFileModel->index(packetArrayIndex, 0);
+
+        if (m_pTimeline != NULL && m_pTimeline->currentIndex() != location)
+        {
+            // scroll to the index
+            if (bScrollTo)
+            {
+                m_pTimeline->scrollTo(location);
+            }
+
+            // select the index
+            if (bSelect)
+            {
+                m_pTimeline->setCurrentIndex(location);
+            }
+        }
+    }
+}
+
+void vktraceviewer::on_replay_state_changed(bool bReplayInProgress)
+{
+    bool bEnableUi = !bReplayInProgress;
+    ui->treeView->setEnabled(bEnableUi);
+    this->m_pGenerateTraceButton->setEnabled(bEnableUi);
+    ui->nextDrawcallButton->setEnabled(bEnableUi);
+    ui->prevDrawcallButton->setEnabled(bEnableUi);
+    ui->searchNextButton->setEnabled(bEnableUi);
+    ui->searchPrevButton->setEnabled(bEnableUi);
+    ui->searchTextBox->setEnabled(bEnableUi);
+    if (m_pTimeline != NULL)
+    {
+        m_pTimeline->setEnabled(bEnableUi);
+    }
+}
+
+unsigned long long vktraceviewer::get_current_packet_index()
+{
+    QModelIndex currentIndex = ui->treeView->currentIndex();
+    QModelIndex col0Index = currentIndex.sibling(currentIndex.row(), 0);
+    QModelIndex index = mapTreeIndexToModel(col0Index);
+
+    unsigned long long packetIndex = 0;
+    if (index.isValid())
+    {
+        vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)index.internalPointer();
+        if (pHeader != NULL)
+        {
+            packetIndex = pHeader->global_packet_index;
+        }
+    }
+    return packetIndex;
+}
+
+void vktraceviewer::reset_view()
+{
+    int count = ui->stateTabWidget->count();
+    while (count > 0)
+    {
+        delete ui->stateTabWidget->widget(0);
+        count = ui->stateTabWidget->count();
+    }
+}
+
+void vktraceviewer::LogAlways(const QString& message)
+{
+    OnOutputMessage(VKTRACE_LOG_VERBOSE, message);
+}
+
+void vktraceviewer::LogWarning(const QString& message)
+{
+    OnOutputMessage(VKTRACE_LOG_WARNING, message);
+}
+
+void vktraceviewer::LogError(const QString& message)
+{
+    OnOutputMessage(VKTRACE_LOG_ERROR, message);
+}
+
+void vktraceviewer::add_setting_group(vktrace_SettingGroup* pGroup)
+{
+    vktrace_SettingGroup_merge(pGroup, &g_pAllSettings, &g_numAllSettings);
+}
+
+unsigned int vktraceviewer::get_global_settings(vktrace_SettingGroup** ppGroups)
+{
+    if (ppGroups != NULL)
+    {
+        *ppGroups = g_pAllSettings;
+    }
+
+    return g_numAllSettings;
+}
+
+bool vktraceviewer::prompt_load_new_trace(const QString& tracefile)
+{
+    int ret = QMessageBox::warning(this, tr(g_PROJECT_NAME.toStdString().c_str()), tr("Would you like to load the new trace file?"),
+                                  QMessageBox::Yes | QMessageBox::No, QMessageBox::Yes);
+
+    if (ret == QMessageBox::Yes)
+    {
+    //    // save current session if there is one
+    //    if (m_openFilename.size() > 0 && m_pTraceReader != NULL && m_pApiCallTreeModel != NULL)
+    //    {
+    //        save_session_to_disk(get_sessionfile_path(m_openFilename, *m_pTraceReader), m_openFilename, m_pTraceReader, m_pApiCallTreeModel);
+    //    }
+
+        // try to open the new file
+        open_trace_file_threaded(tracefile);
+        return true;
+    }
+
+    return false;
+}
+
+void vktraceviewer::on_actionE_xit_triggered()
+{
+    close();
+}
+
+void vktraceviewer::on_action_Open_triggered()
+{
+    QString fileName = QFileDialog::getOpenFileName(this, tr("Open File"), QString(),
+                                                    tr("vktrace Binary Files (*.vktrace *.*)"));
+
+    if (!fileName.isEmpty())
+    {
+        open_trace_file_threaded(fileName);
+    }
+}
+
+typedef struct {
+    uint64_t totalCpuExecutionTime;
+    uint64_t totalTraceOverhead;
+    uint32_t totalCallCount;
+} vtvApiUsageStats;
+
+void vktraceviewer::GenerateTraceFileStats()
+{
+    // process trace file to extract some API usage stats
+    // (NOTE: this could happen in a background thread)
+    ui->bottomTabWidget->addTab(m_pTraceStatsTab, "Trace Stats");
+
+    QString statText;
+    m_pTraceStatsTabText->setText(statText);
+
+    vtvApiUsageStats tmpNewStat;
+    tmpNewStat.totalCallCount = 1;
+    tmpNewStat.totalCpuExecutionTime = 0;
+    tmpNewStat.totalTraceOverhead = 0;
+
+    vtvApiUsageStats totalStats;
+    totalStats.totalCallCount = 0;
+    totalStats.totalCpuExecutionTime = 0;
+    totalStats.totalTraceOverhead = 0;
+
+    uint64_t totalTraceTime = 0;
+
+    if (m_traceFileInfo.packetCount > 0)
+    {
+        uint64_t start = m_traceFileInfo.pPacketOffsets[0].pHeader->entrypoint_begin_time;
+        uint64_t end = m_traceFileInfo.pPacketOffsets[m_traceFileInfo.packetCount-1].pHeader->entrypoint_end_time;
+        totalTraceTime = end-start;
+    }
+
+    QMap<uint16_t, vtvApiUsageStats> statMap;
+    for (uint64_t i = 0; i < m_traceFileInfo.packetCount; i++)
+    {
+        vktrace_trace_packet_header* pHeader = m_traceFileInfo.pPacketOffsets[i].pHeader;
+        if (pHeader->packet_id >= VKTRACE_TPI_BEGIN_API_HERE)
+        {
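+            // entrypoint_begin/end_time bracket the real API call, while vktrace_begin/end_time
+            // also include the time spent recording the packet; the difference is pure tracing overhead.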
+            totalStats.totalCallCount++;
+            totalStats.totalCpuExecutionTime += (pHeader->entrypoint_end_time - pHeader->entrypoint_begin_time);
+            totalStats.totalTraceOverhead += ((pHeader->vktrace_end_time - pHeader->vktrace_begin_time) - (pHeader->entrypoint_end_time - pHeader->entrypoint_begin_time));
+            if (statMap.contains(pHeader->packet_id))
+            {
+                statMap[pHeader->packet_id].totalCpuExecutionTime += (pHeader->entrypoint_end_time - pHeader->entrypoint_begin_time);
+                statMap[pHeader->packet_id].totalTraceOverhead += ((pHeader->vktrace_end_time - pHeader->vktrace_begin_time) - (pHeader->entrypoint_end_time - pHeader->entrypoint_begin_time));
+                statMap[pHeader->packet_id].totalCallCount++;
+            }
+            else
+            {
+                tmpNewStat.totalCpuExecutionTime = (pHeader->entrypoint_end_time - pHeader->entrypoint_begin_time);
+                tmpNewStat.totalTraceOverhead = ((pHeader->vktrace_end_time - pHeader->vktrace_begin_time) - (pHeader->entrypoint_end_time - pHeader->entrypoint_begin_time));
+                statMap.insert(pHeader->packet_id, tmpNewStat);
+            }
+        }
+    }
+
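+    // Decompose total wall-clock time: totalTraceTime = appTime + driverTime + traceOverhead,
+    // where appDriverTime folds together everything except vktrace's own recording cost.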
+    uint64_t appTime = totalTraceTime - totalStats.totalCpuExecutionTime - totalStats.totalTraceOverhead;
+    uint64_t appDriverTime = totalTraceTime - totalStats.totalTraceOverhead;
+
+    statText += "<table>";
+    statText += QString("<tr><td>Total Trace Time:</td><td>%1 ns</td></tr>").arg(totalTraceTime);
+    statText += QString("<tr><td>Total App+Driver Time:</td><td>%1 ns (%2%)</td></tr>").arg(appDriverTime).arg(100*(float)appDriverTime/(float)totalTraceTime, 0, 'f', 1);
+    statText += QString("<tr><td>Total Driver Time:</td><td>%1 ns (%2%)</td></tr>").arg(totalStats.totalCpuExecutionTime).arg(100*(float)totalStats.totalCpuExecutionTime/(float)totalTraceTime, 0, 'f', 1);
+    statText += QString("<tr><td>Total App Time:</td><td>%1 ns (%2%)</td></tr>").arg(appTime).arg(100*(float)appTime/(float)totalTraceTime, 0, 'f', 1);
+    statText += QString("<tr><td>Total VkTrace Overhead:</td><td>%1 ns (%2%)</td></tr>").arg(totalStats.totalTraceOverhead).arg(100*(float)totalStats.totalTraceOverhead/(float)totalTraceTime, 0, 'f', 1);;
+    statText += QString("<tr><td>Total API Calls:</td><td>%1</td></tr>").arg(totalStats.totalCallCount);
+    statText += QString("<tr><td>Total Entrypoints Called:</td><td>%1</td></tr>").arg(statMap.count());
+    statText += "</table><br/>";
+
+    statText += "<table><thead><tr><th align='left'>Entrypoint</th><th align='right'># Calls (%Total)</th><th align='right'>Driver Time (%Total %AppDr %Driver)</th><th align='right'>VkTrace Overhead (%Total %vktrace)</th></tr></thead><tbody>";
+
+    for (QMap<uint16_t, vtvApiUsageStats>::iterator i = statMap.begin(); i != statMap.end(); i++)
+    {
+        const vtvApiUsageStats stat = i.value();
+        const char* entrypoint = m_pController->GetPacketIdString(i.key());
+        if (entrypoint == NULL)
+        {
+            // Instead of printing entrypoint name, just print i
+            statText += QString("<tr><td>%1</td>").arg(i.key());
+        }
+        else
+        {
+            statText += QString("<tr><td>%1</td>").arg(entrypoint);
+        }
+
+        // Note: because this is output as HTML, consecutive spaces are collapsed, so '?' is used
+        // to pad the numbers and is replaced with "&nbsp;" below so that the numbers align.
+        // Also, this text field uses a fixed-width font.
+        statText += QString("<td align='right'>%1 (%2%)</td>").arg(stat.totalCallCount).arg(100*(float)stat.totalCallCount/(float)totalStats.totalCallCount, 5, 'f', 1, '?');
+        statText += QString("<td align='right'>%1 ns (%2% %3%??%4%)</td>").arg(stat.totalCpuExecutionTime).arg(100*(float)stat.totalCpuExecutionTime/(float)totalTraceTime, 5, 'f', 1, '?').arg(100*(float)stat.totalCpuExecutionTime/(float)appDriverTime, 5, 'f', 1, '?').arg(100*(float)stat.totalCpuExecutionTime/(float)totalStats.totalCpuExecutionTime, 5, 'f', 1, '?');
+        statText += QString("<td align='right'>%1 ns (%2% %3%)</td></tr>").arg(stat.totalTraceOverhead).arg(100*(float)stat.totalTraceOverhead/(float)totalTraceTime, 5, 'f', 1, '?').arg(100*(float)stat.totalTraceOverhead/(float)totalStats.totalTraceOverhead, 5, 'f', 1, '?');
+
+        statText.replace('?', "&nbsp;");
+    }
+
+    statText += "</tbody></table>";
+    m_pTraceStatsTabText->setHtml(statText);
+}
+
+void vktraceviewer::onTraceFileLoaded(bool bSuccess, const vktraceviewer_trace_file_info& fileInfo, const QString& controllerFilename)
+{
+    QApplication::restoreOverrideCursor();
+
+    if (fileInfo.packetCount == 0)
+    {
+        LogWarning("The trace file has 0 packets.");
+    }
+    else if (fileInfo.pPacketOffsets == NULL)
+    {
+        LogError("No packet offsets read from trace file.");
+        bSuccess = false;
+    }
+
+    if (!bSuccess)
+    {
+        LogAlways("...FAILED!");
+        QMessageBox::critical(this, tr("Error"), tr("Could not open trace file."));
+        close_trace_file();
+
+        if (m_bGeneratingTrace)
+        {
+            // if the user was generating a trace file, but the trace failed to load,
+            // then re-spawn the generate trace dialog.
+            prompt_generate_trace();
+        }
+    }
+    else
+    {
+        m_traceFileInfo = fileInfo;
+
+        setWindowTitle(QString(m_traceFileInfo.filename) + " - " + g_PROJECT_NAME);
+        LogAlways("...success!");
+
+        // update settings to reflect the currently open file
+        g_settings.trace_file_to_open = vktrace_allocate_and_copy(m_traceFileInfo.filename);
+        vktraceviewer_settings_updated();
+
+#ifndef USE_STATIC_CONTROLLER_LIBRARY
+        if (!controllerFilename.isEmpty())
+        {
+            m_pController = m_controllerFactory.Load(controllerFilename.toStdString().c_str());
+        }
+#else
+        m_pController = vtvCreateQController();
+#endif
+
+        if (m_pController != NULL)
+        {
+            connect(m_pController, SIGNAL(OutputMessage(VktraceLogLevel, const QString&)), this, SLOT(OnOutputMessage(VktraceLogLevel, const QString&)));
+            connect(m_pController, SIGNAL(OutputMessage(VktraceLogLevel, uint64_t, const QString&)), this, SLOT(OnOutputMessage(VktraceLogLevel, uint64_t, const QString&)));
+
+            // Merge in settings from the controller.
+            // This won't replace settings that may have already been loaded from disk.
+            vktrace_SettingGroup_merge(m_pController->GetSettings(), &g_pAllSettings, &g_numAllSettings);
+
+            // now update the controller with the loaded settings
+            m_pController->UpdateFromSettings(g_pAllSettings, g_numAllSettings);
+
+            //// trace file was loaded, now attempt to open additional session data
+            //if (load_or_create_session(filename.c_str(), m_pTraceReader) == false)
+            //{
+            //    // failing to load session data is not critical, but may result in unexpected behavior at times.
+            //    vktraceviewer_output_error("VkTraceViewer was unable to create a session folder to save viewing information. Functionality may be limited.");
+            //}
+
+            // Update the UI with the controller
+            m_pController->LoadTraceFile(&m_traceFileInfo, this);
+        }
+
+        // update toolbar
+        ui->searchTextBox->setEnabled(true);
+        ui->searchPrevButton->setEnabled(true);
+        ui->searchNextButton->setEnabled(true);
+
+        ui->action_Close->setEnabled(true);
+        ui->actionExport_API_Calls->setEnabled(true);
+
+        ui->prevDrawcallButton->setEnabled(true);
+        ui->nextDrawcallButton->setEnabled(true);
+
+        // reset flag indicating that the ui may have been generating a trace file.
+        m_bGeneratingTrace = false;
+
+        GenerateTraceFileStats();
+    }
+}
+
+void vktraceviewer::on_action_Close_triggered()
+{
+    close_trace_file();
+}
+
+void vktraceviewer::close_trace_file()
+{
+    if (m_pController != NULL)
+    {
+        ui->bottomTabWidget->removeTab(ui->bottomTabWidget->indexOf(m_pTraceStatsTab));
+        m_pController->UnloadTraceFile();
+#ifndef USE_STATIC_CONTROLLER_LIBRARY
+        m_controllerFactory.Unload(&m_pController);
+#else
+        vtvDeleteQController(&m_pController);
+#endif
+    }
+
+    if (m_pTimeline != NULL && m_pTimeline->model() != NULL)
+    {
+        m_pTimeline->setModel(NULL);
+        m_pTimeline->repaint();
+    }
+
+    if (m_traceFileInfo.packetCount > 0)
+    {
+        for (unsigned int i = 0; i < m_traceFileInfo.packetCount; i++)
+        {
+            if (m_traceFileInfo.pPacketOffsets[i].pHeader != NULL)
+            {
+                vktrace_free(m_traceFileInfo.pPacketOffsets[i].pHeader);
+                m_traceFileInfo.pPacketOffsets[i].pHeader = NULL;
+            }
+        }
+
+        VKTRACE_DELETE(m_traceFileInfo.pPacketOffsets);
+        m_traceFileInfo.pPacketOffsets = NULL;
+        m_traceFileInfo.packetCount = 0;
+    }
+
+    if (m_traceFileInfo.pFile != NULL)
+    {
+        fclose(m_traceFileInfo.pFile);
+        m_traceFileInfo.pFile = NULL;
+    }
+
+    if (m_traceFileInfo.filename != NULL)
+    {
+        vktrace_free(m_traceFileInfo.filename);
+        m_traceFileInfo.filename = NULL;
+
+        LogAlways("Closing trace file.");
+        LogAlways("-------------------");
+
+        // update settings
+        if (g_settings.trace_file_to_open != NULL)
+        {
+            vktrace_free(g_settings.trace_file_to_open);
+            g_settings.trace_file_to_open = NULL;
+            vktraceviewer_settings_updated();
+        }
+    }
+
+    setWindowTitle(g_PROJECT_NAME);
+
+    reset_tracefile_ui();
+}
+
+void vktraceviewer::on_actionExport_API_Calls_triggered()
+{
+    QString suggestedName(m_traceFileInfo.filename);
+
+    int lastIndex = suggestedName.lastIndexOf('.');
+    if (lastIndex != -1)
+    {
+        suggestedName = suggestedName.remove(lastIndex, suggestedName.size() - lastIndex);
+    }
+    suggestedName += "-ApiCalls.txt";
+
+    QString fileName = QFileDialog::getSaveFileName(this, tr("Export API Calls"), suggestedName, tr("Text (*.txt)"));
+
+    if (!fileName.isEmpty())
+    {
+        FILE* pFile = fopen(fileName.toStdString().c_str(), "w");
+        if (pFile == NULL)
+        {
+            LogError("Failed to open file for write. Can't export API calls.");
+            return;
+        }
+
+        // iterate through every packet
+        for (unsigned int i = 0; i < m_traceFileInfo.packetCount; i++)
+        {
+            vktraceviewer_trace_file_packet_offsets* pOffsets = &m_traceFileInfo.pPacketOffsets[i];
+            vktrace_trace_packet_header* pHeader = pOffsets->pHeader;
+            assert(pHeader != NULL);
+            QString string = m_pTraceFileModel->get_packet_string(pHeader);
+
+            // output packet string
+            fprintf(pFile, "%s\n", string.toStdString().c_str());
+        }
+
+        fclose(pFile);
+    }
+}
+
+void vktraceviewer::on_actionEdit_triggered()
+{
+    // make sure dialog is at the size specified by the settings
+    m_settingsDialog.resize(g_settings.settings_dialog_width, g_settings.settings_dialog_height);
+
+    // set the groups so that the dialog is displaying the most recent information
+    m_settingsDialog.setGroups(g_pAllSettings, g_numAllSettings);
+
+    // execute the dialog
+    m_settingsDialog.exec();
+}
+
+void vktraceviewer::on_settingsDialogResized(unsigned int width, unsigned int height)
+{
+    // the dialog was resized, so update the settings
+    g_settings.settings_dialog_width = width;
+    g_settings.settings_dialog_height = height;
+
+    // Update the setting groups with the new values.
+    vktraceviewer_settings_updated();
+
+    // re-set the groups so that the dialog is displaying the most recent information.
+    m_settingsDialog.setGroups(g_pAllSettings, g_numAllSettings);
+}
+
+void vktraceviewer::on_settingsSaved(vktrace_SettingGroup* pUpdatedSettings, unsigned int numGroups)
+{
+    // pUpdatedSettings is already pointing to the same location as g_pAllSettings
+    g_numAllSettings = numGroups;
+
+    // apply updated settings to the settingGroup so that the UI will respond to the changes
+    vktrace_SettingGroup_Apply_Overrides(&g_settingGroup, pUpdatedSettings, numGroups);
+
+    if (m_pController != NULL)
+    {
+        m_pController->UpdateFromSettings(pUpdatedSettings, numGroups);
+    }
+
+    vktraceviewer_save_settings();
+
+    // react to changes in settings
+    this->move(g_settings.window_position_left, g_settings.window_position_top);
+    this->resize(g_settings.window_size_width, g_settings.window_size_height);
+}
+
+void vktraceviewer::open_trace_file_threaded(const QString& filename)
+{
+    // close any existing trace
+    close_trace_file();
+
+    LogAlways("*********************");
+    LogAlways("Opening trace file...");
+    LogAlways(filename);
+
+    QApplication::setOverrideCursor(Qt::WaitCursor);
+
+    vktraceviewer_QTraceFileLoader* pTraceLoader = new vktraceviewer_QTraceFileLoader();
+    m_traceLoaderThread.setObjectName("TraceLoaderThread");
+    pTraceLoader->moveToThread(&m_traceLoaderThread);
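+    // Worker-object pattern: the loader runs on m_traceLoaderThread, and queued
+    // connections marshal its progress and results back onto the GUI thread.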
+
+    connect(pTraceLoader, SIGNAL(OutputMessage(VktraceLogLevel, uint64_t, const QString&)), this, SLOT(OnOutputMessage(VktraceLogLevel, uint64_t, const QString&)), Qt::QueuedConnection);
+    connect(pTraceLoader, SIGNAL(OutputMessage(VktraceLogLevel, const QString&)), this, SLOT(OnOutputMessage(VktraceLogLevel, const QString&)), Qt::QueuedConnection);
+
+    connect(this, SIGNAL(LoadTraceFile(const QString&)), pTraceLoader, SLOT(loadTraceFile(QString)), Qt::QueuedConnection);
+
+    connect(pTraceLoader, SIGNAL(TraceFileLoaded(bool, vktraceviewer_trace_file_info, const QString&)), this, SLOT(onTraceFileLoaded(bool, vktraceviewer_trace_file_info, const QString&)));
+    connect(pTraceLoader, SIGNAL(Finished()), &m_traceLoaderThread, SLOT(quit()));
+    connect(pTraceLoader, SIGNAL(Finished()), pTraceLoader, SLOT(deleteLater()));
+
+    m_traceLoaderThread.start();
+
+    // Signal the loader to start
+    emit LoadTraceFile(filename);
+}
+
+void vktraceviewer::reset_tracefile_ui()
+{
+    ui->action_Close->setEnabled(false);
+    ui->actionExport_API_Calls->setEnabled(false);
+
+    ui->prevDrawcallButton->setEnabled(false);
+    ui->nextDrawcallButton->setEnabled(false);
+    ui->searchTextBox->clear();
+    ui->searchTextBox->setEnabled(false);
+    ui->searchPrevButton->setEnabled(false);
+    ui->searchNextButton->setEnabled(false);
+
+    //VKTRACEVIEWER_DISABLE_BOTTOM_TAB(ui->machineInfoTab);
+    //VKTRACEVIEWER_DISABLE_BOTTOM_TAB(ui->callStackTab);
+}
+
+void vktraceviewer::on_treeView_clicked(const QModelIndex &index)
+{
+    QModelIndex col0Index = index.sibling(index.row(), vktraceviewer_QTraceFileModel::Column_EntrypointName);
+    QModelIndex srcIndex = mapTreeIndexToModel(col0Index);
+    if (srcIndex.isValid())
+    {
+        selectApicallModelIndex(srcIndex, true, true);
+    }
+}
+
+void vktraceviewer::on_hyperlinkClicked(const QUrl& link)
+{
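+    // Output-window links use the form "packet#<index>": the file name selects the
+    // action and the URL fragment carries the packet index to jump to.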
+    if (link.fileName() == "packet")
+    {
+        if (link.hasFragment())
+        {
+            qulonglong index = link.fragment().toULongLong();
+            this->select_call_at_packet_index(index);
+        }
+    }
+}
+
+void vktraceviewer::slot_timeline_clicked(const QModelIndex &index)
+{
+    selectApicallModelIndex(index, true, true);
+}
+
+void vktraceviewer::selectApicallModelIndex(QModelIndex index, bool scrollTo, bool select)
+{
+    // make sure the index is visible in tree view
+    QModelIndex treeIndex = mapTreeIndexFromModel(index);
+
+    if (ui->treeView->currentIndex() != treeIndex && ui->treeView->isEnabled())
+    {
+        QModelIndex parentIndex = treeIndex.parent();
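+        // expand every collapsed ancestor so the target row can actually be shown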
+        while (parentIndex.isValid())
+        {
+            if (ui->treeView->isExpanded(parentIndex) == false)
+            {
+                ui->treeView->expand(parentIndex);
+            }
+            parentIndex = parentIndex.parent();
+        }
+
+        // scroll to the index
+        if (scrollTo)
+        {
+            ui->treeView->scrollTo(treeIndex);
+        }
+
+        // select the index
+        if (select)
+        {
+            ui->treeView->setCurrentIndex(treeIndex);
+        }
+    }
+
+    if (m_pTimeline != NULL && m_pTimeline->currentIndex() != index)
+    {
+        // scroll to the index
+        if (scrollTo)
+        {
+            m_pTimeline->scrollTo(index);
+        }
+
+        // select the index
+        if (select)
+        {
+            m_pTimeline->setCurrentIndex(index);
+        }
+    }
+}
+
+void vktraceviewer::on_searchTextBox_textChanged(const QString &searchText)
+{
+    QPalette palette(ui->searchTextBox->palette());
+    palette.setColor(QPalette::Base, m_searchTextboxBackgroundColor);
+    ui->searchTextBox->setPalette(palette);
+
+    if (m_pTraceFileModel != NULL)
+    {
+        m_pTraceFileModel->set_highlight_search_string(searchText);
+    }
+
+    // need to briefly give the treeview focus so that it properly redraws and highlights the matching rows
+    // then return focus to the search textbox so that typed keys are not lost
+    ui->treeView->setFocus();
+    ui->searchTextBox->setFocus();
+}
+
+void vktraceviewer::on_searchNextButton_clicked()
+{
+    if (m_pTraceFileModel != NULL)
+    {
+        QModelIndex index = ui->treeView->currentIndex();
+        if (!index.isValid())
+        {
+            // If there was no valid current index, then get the first index in the trace file model.
+            index = m_pTraceFileModel->index(0, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+        }
+
+        // Need to make sure this index is in model-space
+        if (index.model() == m_pProxyModel)
+        {
+            index = mapTreeIndexToModel(index);
+        }
+
+        // get the next item in the list
+        // TODO: this means that we won't be able to hit "Next" and select the first item in the trace file.
+        // However, if we move this into the loop below, then hitting "Next" will always result in finding the same item.
+        index = index.sibling(index.row()+1, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+
+        // Can't get column count from the TreeView, so need to ask both the model and proxy if it exists.
+        int columnCount = m_pTraceFileModel->columnCount(index);
+        if (m_pProxyModel != NULL)
+        {
+            columnCount = m_pProxyModel->columnCount(index);
+        }
+
+        while (index.isValid())
+        {
+            for (int column = 0; column < columnCount; column++)
+            {
+                if (ui->treeView->isColumnHidden(column))
+                    continue;
+
+                QModelIndex treeIndex = mapTreeIndexFromModel(m_pTraceFileModel->index(index.row(), column, index.parent()));
+                if (treeIndex.data(Qt::DisplayRole).toString().contains(ui->searchTextBox->text(), Qt::CaseInsensitive))
+                {
+                    // Get the first column so that it can be selected
+                    QModelIndex srcIndex = m_pTraceFileModel->index(index.row(), vktraceviewer_QTraceFileModel::Column_EntrypointName, index.parent());
+                    selectApicallModelIndex(srcIndex, true, true);
+                    ui->treeView->setFocus();
+                    return;
+                }
+            }
+
+            // wasn't found in that row, so check the next one
+            index = index.sibling(index.row()+1, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+        }
+    }
+}
+
+void vktraceviewer::on_searchPrevButton_clicked()
+{
+    if (m_pTraceFileModel != NULL)
+    {
+        QModelIndex index = ui->treeView->currentIndex();
+        if (!index.isValid())
+        {
+            // If there was no valid current index, then get the first index in the trace file model.
+            index = m_pTraceFileModel->index(0, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+        }
+
+        // Need to make sure this index is in model-space
+        if (index.model() == m_pProxyModel)
+        {
+            index = mapTreeIndexToModel(index);
+        }
+
+        // get the previous item in the list
+        // TODO: this means that we won't be able to hit "Prev" and select the first item in the trace file.
+        // However, if we move this into the loop below, then hitting "Prev" will always result in finding the same item.
+        index = index.sibling(index.row()-1, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+
+        // Can't get column count from the TreeView, so need to ask both the model and proxy if it exists.
+        int columnCount = m_pTraceFileModel->columnCount(index);
+        if (m_pProxyModel != NULL)
+        {
+            columnCount = m_pProxyModel->columnCount(index);
+        }
+
+        while (index.isValid())
+        {
+            for (int column = 0; column < columnCount; column++)
+            {
+                if (ui->treeView->isColumnHidden(column))
+                    continue;
+
+                QModelIndex treeIndex = mapTreeIndexFromModel(m_pTraceFileModel->index(index.row(), column, index.parent()));
+                if (treeIndex.data(Qt::DisplayRole).toString().contains(ui->searchTextBox->text(), Qt::CaseInsensitive))
+                {
+                    // Get the first column so that it can be selected
+                    QModelIndex srcIndex = m_pTraceFileModel->index(index.row(), vktraceviewer_QTraceFileModel::Column_EntrypointName, index.parent());
+                    selectApicallModelIndex(srcIndex, true, true);
+                    ui->treeView->setFocus();
+                    return;
+                }
+            }
+
+            // wasn't found in that row, so check the previous one
+            index = index.sibling(index.row()-1, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+        }
+    }
+}
+
+void vktraceviewer::on_prevDrawcallButton_clicked()
+{
+    if (m_pTraceFileModel != NULL)
+    {
+        QModelIndex index = ui->treeView->currentIndex();
+
+        index = mapTreeIndexToModel(index);
+
+        QModelIndex indexAbove = index.sibling(index.row()-1, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+        while (indexAbove.isValid())
+        {
+            vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)indexAbove.internalPointer();
+            if (pHeader != NULL && m_pTraceFileModel->isDrawCall((VKTRACE_TRACE_PACKET_ID)pHeader->packet_id))
+            {
+                selectApicallModelIndex(indexAbove, true, true);
+                ui->treeView->setFocus();
+                return;
+            }
+
+            // that row is not a draw call, so check the prev one
+            indexAbove = indexAbove.sibling(indexAbove.row()-1, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+        }
+    }
+}
+
+QModelIndex vktraceviewer::mapTreeIndexToModel(const QModelIndex& treeIndex) const
+{
+    if (m_pProxyModel != NULL)
+    {
+        return m_pProxyModel->mapToSource(treeIndex);
+    }
+
+    return treeIndex;
+}
+
+QModelIndex vktraceviewer::mapTreeIndexFromModel(const QModelIndex& modelIndex) const
+{
+    if (m_pProxyModel != NULL)
+    {
+        return m_pProxyModel->mapFromSource(modelIndex);
+    }
+
+    return modelIndex;
+}
+
+void vktraceviewer::on_nextDrawcallButton_clicked()
+{
+    if (m_pTraceFileModel != NULL)
+    {
+        QModelIndex index = ui->treeView->currentIndex();
+
+        index = mapTreeIndexToModel(index);
+
+        QModelIndex indexBelow = index.sibling(index.row()+1, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+        while (indexBelow.isValid())
+        {
+            vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)indexBelow.internalPointer();
+            if (pHeader != NULL && m_pTraceFileModel->isDrawCall((VKTRACE_TRACE_PACKET_ID)pHeader->packet_id))
+            {
+                selectApicallModelIndex(indexBelow, true, true);
+                ui->treeView->setFocus();
+                return;
+            }
+
+            // that row is not a draw call, so check the next one
+            indexBelow = indexBelow.sibling(indexBelow.row()+1, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+        }
+    }
+}
+
+void vktraceviewer::on_searchTextBox_returnPressed()
+{
+    if (m_pTraceFileModel != NULL)
+    {
+        QModelIndex index = ui->treeView->indexBelow(ui->treeView->currentIndex());
+        bool bFound = false;
+
+        // search down from the current index
+        while (index.isValid())
+        {
+            for (int column = 0; column < m_pTraceFileModel->columnCount(index); column++)
+            {
+                if (m_pTraceFileModel->data(m_pTraceFileModel->index(index.row(), column, index.parent()), Qt::DisplayRole).toString().contains(ui->searchTextBox->text(), Qt::CaseInsensitive))
+                {
+                    bFound = true;
+                    break;
+                }
+            }
+
+            if (bFound)
+            {
+                break;
+            }
+            else
+            {
+                // wasn't found in that row, so check the next one
+                index = ui->treeView->indexBelow(index);
+            }
+        }
+
+        // if not found yet, then search from the root down to the current node
+        if (!bFound)
+        {
+            index = m_pTraceFileModel->index(0, 0);
+
+            while (index.isValid() && index != ui->treeView->currentIndex())
+            {
+                for (int column = 0; column < m_pTraceFileModel->columnCount(index); column++)
+                {
+                    if (m_pTraceFileModel->data(m_pTraceFileModel->index(index.row(), column, index.parent()), Qt::DisplayRole).toString().contains(ui->searchTextBox->text(), Qt::CaseInsensitive))
+                    {
+                        bFound = true;
+                        break;
+                    }
+                }
+
+                if (bFound)
+                {
+                    break;
+                }
+                else
+                {
+                    // wasn't found in that row, so check the next one
+                    index = ui->treeView->indexBelow(index);
+                }
+            }
+        }
+
+        if (bFound && index.isValid())
+        {
+            // a valid item was found, scroll to it and select it
+            selectApicallModelIndex(index, true, true);
+            ui->searchTextBox->setFocus();
+        }
+        else
+        {
+            // no items were found, so set the textbox background to red (it will get cleared to the original color if the user edits the search text)
+            QPalette palette(ui->searchTextBox->palette());
+            palette.setColor(QPalette::Base, Qt::red);
+            ui->searchTextBox->setPalette(palette);
+        }
+    }
+}
+
+void vktraceviewer::on_contextComboBox_currentIndexChanged(int index)
+{
+    (void)index; // slot is currently a no-op; the cast silences the unused-parameter warning
+}
+
+void vktraceviewer::prompt_generate_trace()
+{
+    m_bGeneratingTrace = true;
+
+    bool bShowDialog = true;
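+    // Keep re-showing the dialog until tracing produces a file, or the user cancels.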
+    while (bShowDialog)
+    {
+        int code = m_pGenerateTraceDialog->exec();
+        if (code != vktraceviewer_QGenerateTraceDialog::Succeeded)
+        {
+            m_bGeneratingTrace = false;
+            bShowDialog = false;
+        }
+        else
+        {
+            QFileInfo fileInfo(m_pGenerateTraceDialog->get_trace_file_path());
+            if (code == vktraceviewer_QGenerateTraceDialog::Succeeded &&
+                fileInfo.exists())
+            {
+                bShowDialog = false;
+                if (prompt_load_new_trace(fileInfo.canonicalFilePath()) == false)
+                {
+                    // The user decided not to load the trace file, so clear the generatingTrace flag.
+                    m_bGeneratingTrace = false;
+                }
+            }
+            else
+            {
+                LogError("Failed to trace the application.");
+                QMessageBox::critical(this, "Error", "Failed to trace application.");
+            }
+        }
+    }
+}
+
+void vktraceviewer::OnOutputMessage(VktraceLogLevel level, const QString &message)
+{
+    OnOutputMessage(level, -1, message);
+}
+
+void vktraceviewer::OnOutputMessage(VktraceLogLevel level, uint64_t packetIndex, const QString &message)
+{
+    switch(level)
+    {
+        case VKTRACE_LOG_ERROR:
+        {
+            gs_OUTPUT.error(packetIndex, message);
+            break;
+        }
+        case VKTRACE_LOG_WARNING:
+        {
+            gs_OUTPUT.warning(packetIndex, message);
+            break;
+        }
+        case VKTRACE_LOG_DEBUG:
+        case VKTRACE_LOG_VERBOSE:
+        default:
+        {
+            gs_OUTPUT.message(packetIndex, message);
+            break;
+        }
+    }
+}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer.h b/vktrace/src/vktrace_viewer/vktraceviewer.h
new file mode 100644
index 0000000..6823bb6
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer.h
@@ -0,0 +1,176 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+
+#ifndef VKTRACEVIEWER_H
+#define VKTRACEVIEWER_H
+
+#include <QMainWindow>
+#include <QString>
+#include <QThread>
+
+#include "vkreplay_seq.h"
+#include "vkreplay_factory.h"
+
+#include "vktraceviewer_view.h"
+#include "vktraceviewer_trace_file_utils.h"
+#include "vktraceviewer_qtimelineview.h"
+#include "vktraceviewer_controller_factory.h"
+#include "vktraceviewer_controller.h"
+#include "vktraceviewer_QTraceFileModel.h"
+#include "vktraceviewer_qgeneratetracedialog.h"
+#include "vktraceviewer_qsettingsdialog.h"
+#include "vktraceviewer_settings.h"
+
+namespace Ui
+{
+    class vktraceviewer;
+}
+
+class QGridLayout;
+class QModelIndex;
+class QProcess;
+class QProcessEnvironment;
+class QToolButton;
+
+class vktraceviewer : public QMainWindow, public vktraceviewer_view
+{
+    Q_OBJECT
+
+    enum Prompt_Result
+    {
+        vktraceviewer_prompt_error = -1,
+        vktraceviewer_prompt_cancelled = 0,
+        vktraceviewer_prompt_success = 1
+    };
+
+public:
+    explicit vktraceviewer(QWidget *parent = 0);
+    ~vktraceviewer();
+
+    void open_trace_file_threaded(const QString &filename);
+    void close_trace_file();
+
+    // Implementation of vktraceviewer_view
+    virtual void reset_view();
+
+    void LogAlways(const QString &message);
+    void LogWarning(const QString& message);
+    void LogError(const QString& message);
+    virtual void add_setting_group(vktrace_SettingGroup* pGroup);
+    virtual unsigned int get_global_settings(vktrace_SettingGroup** ppGroups);
+    virtual int add_custom_state_viewer(QWidget* pWidget, const QString& title, bool bBringToFront = false);
+    virtual void remove_custom_state_viewer(int tabIndex);
+    virtual void enable_custom_state_viewer(QWidget* pWidget, bool bEnabled);
+    virtual QToolButton* add_toolbar_button(const QString& title, bool bEnabled);
+    virtual void add_calltree_contextmenu_item(QAction* pAction);
+    virtual void set_calltree_model(vktraceviewer_QTraceFileModel* pTraceFileModel, QAbstractProxyModel *pModel);
+    virtual void select_call_at_packet_index(unsigned long long packetIndex);
+    virtual void highlight_timeline_item(unsigned long long packetArrayIndex, bool bScrollTo, bool bSelect);
+    virtual void on_replay_state_changed(bool bReplayInProgress);
+    virtual unsigned long long get_current_packet_index();
+
+protected:
+    // re-implemented from QMainWindow
+    virtual void moveEvent(QMoveEvent *pEvent);
+    virtual void closeEvent(QCloseEvent *pEvent);
+    virtual void resizeEvent(QResizeEvent *pEvent);
+
+signals:
+    void LoadTraceFile(const QString& filename);
+
+public slots:
+
+private slots:
+    void on_action_Open_triggered();
+    void on_action_Close_triggered();
+    void on_actionE_xit_triggered();
+    void on_actionExport_API_Calls_triggered();
+    void on_actionEdit_triggered();
+
+    void on_settingsDialogResized(unsigned int width, unsigned int height);
+    void on_settingsSaved(vktrace_SettingGroup* pUpdatedSettings, unsigned int numGroups);
+
+    void onTraceFileLoaded(bool bSuccess, const vktraceviewer_trace_file_info& fileInfo, const QString& controllerFilename);
+
+    void on_treeView_clicked(const QModelIndex &index);
+    void slot_timeline_clicked(const QModelIndex &index);
+
+    void on_searchTextBox_textChanged(const QString &searchText);
+    void on_searchNextButton_clicked();
+    void on_searchPrevButton_clicked();
+    void on_prevDrawcallButton_clicked();
+    void on_nextDrawcallButton_clicked();
+
+    void on_searchTextBox_returnPressed();
+
+    void on_contextComboBox_currentIndexChanged(int index);
+
+    void prompt_generate_trace();
+
+    void OnOutputMessage(VktraceLogLevel level, uint64_t packetIndex, const QString& message);
+    void OnOutputMessage(VktraceLogLevel level, const QString& message);
+
+    void on_hyperlinkClicked(const QUrl& link);
+
+private:
+    Ui::vktraceviewer *ui;
+    QWidget* m_pTraceStatsTab;
+    QGridLayout* m_pTraceStatsTabLayout;
+    QTextBrowser* m_pTraceStatsTabText;
+
+    QThread m_traceLoaderThread;
+    vktraceviewer_QSettingsDialog m_settingsDialog;
+
+    // Returns true if the user chose to load the file.
+    // Returns false if the user decided NOT to load the file.
+    bool prompt_load_new_trace(const QString &tracefile);
+
+    // Scan the trace file packets to extract API usage stats
+    void GenerateTraceFileStats();
+
+    void reset_tracefile_ui();
+
+    void selectApicallModelIndex(QModelIndex index, bool scrollTo, bool select);
+    QModelIndex mapTreeIndexToModel(const QModelIndex& treeIndex) const;
+    QModelIndex mapTreeIndexFromModel(const QModelIndex& modelIndex) const;
+
+    static float u64ToFloat(uint64_t value);
+    void build_timeline_model();
+
+    vktraceviewer_trace_file_info m_traceFileInfo;
+    vktraceviewer_QTraceFileModel* m_pTraceFileModel;
+    QAbstractProxyModel* m_pProxyModel;
+
+    vktraceviewer_controller_factory m_controllerFactory;
+    vktraceviewer_QController* m_pController;
+
+    QToolButton *m_pGenerateTraceButton;
+
+    vktraceviewer_QTimelineView* m_pTimeline;
+    vktraceviewer_QGenerateTraceDialog* m_pGenerateTraceDialog;
+
+    QColor m_searchTextboxBackgroundColor;
+    bool m_bDelayUpdateUIForContext;
+    bool m_bGeneratingTrace;
+};
+
+#endif // VKTRACEVIEWER_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer.ui b/vktrace/src/vktrace_viewer/vktraceviewer.ui
new file mode 100644
index 0000000..fedc983
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer.ui
@@ -0,0 +1,373 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<ui version="4.0">
+ <class>vktraceviewer</class>
+ <widget class="QMainWindow" name="vktraceviewer">
+  <property name="geometry">
+   <rect>
+    <x>0</x>
+    <y>0</y>
+    <width>1025</width>
+    <height>768</height>
+   </rect>
+  </property>
+  <property name="windowTitle">
+   <string>VkTrace Viewer</string>
+  </property>
+  <widget class="QWidget" name="centralWidget">
+   <layout class="QGridLayout" name="gridLayout_3">
+    <item row="0" column="0">
+     <widget class="QSplitter" name="splitter_3">
+      <property name="sizePolicy">
+       <sizepolicy hsizetype="Preferred" vsizetype="Expanding">
+        <horstretch>0</horstretch>
+        <verstretch>1</verstretch>
+       </sizepolicy>
+      </property>
+      <property name="orientation">
+       <enum>Qt::Vertical</enum>
+      </property>
+      <widget class="QSplitter" name="splitter_2">
+       <property name="sizePolicy">
+        <sizepolicy hsizetype="Preferred" vsizetype="Expanding">
+         <horstretch>0</horstretch>
+         <verstretch>5</verstretch>
+        </sizepolicy>
+       </property>
+       <property name="orientation">
+        <enum>Qt::Vertical</enum>
+       </property>
+       <widget class="QWidget" name="timelineLayoutWidget">
+        <layout class="QVBoxLayout" name="timelineLayout" stretch="0">
+         <property name="spacing">
+          <number>6</number>
+         </property>
+         <property name="sizeConstraint">
+          <enum>QLayout::SetDefaultConstraint</enum>
+         </property>
+         <property name="bottomMargin">
+          <number>0</number>
+         </property>
+         <item>
+          <widget class="QGraphicsView" name="timelineViewPlaceholder"/>
+         </item>
+        </layout>
+       </widget>
+       <widget class="QSplitter" name="splitter">
+        <property name="sizePolicy">
+         <sizepolicy hsizetype="Expanding" vsizetype="Expanding">
+          <horstretch>0</horstretch>
+          <verstretch>9</verstretch>
+         </sizepolicy>
+        </property>
+        <property name="orientation">
+         <enum>Qt::Horizontal</enum>
+        </property>
+        <widget class="QWidget" name="traceNavigationLayoutWidget">
+         <layout class="QHBoxLayout" name="horizontalLayout" stretch="0">
+          <item>
+           <layout class="QVBoxLayout" name="traceNavigationLayout" stretch="0,0">
+            <item>
+             <layout class="QHBoxLayout" name="horizontalLayout_4">
+              <item>
+               <widget class="QPushButton" name="prevDrawcallButton">
+                <property name="maximumSize">
+                 <size>
+                  <width>65</width>
+                  <height>16777215</height>
+                 </size>
+                </property>
+                <property name="toolTip">
+                 <string>Jump to previous draw call</string>
+                </property>
+                <property name="text">
+                 <string>Prev DC</string>
+                </property>
+               </widget>
+              </item>
+              <item>
+               <widget class="QPushButton" name="nextDrawcallButton">
+                <property name="maximumSize">
+                 <size>
+                  <width>65</width>
+                  <height>16777215</height>
+                 </size>
+                </property>
+                <property name="toolTip">
+                 <string>Jump to next draw call</string>
+                </property>
+                <property name="text">
+                 <string>Next DC</string>
+                </property>
+               </widget>
+              </item>
+              <item>
+               <spacer name="horizontalSpacer">
+                <property name="orientation">
+                 <enum>Qt::Horizontal</enum>
+                </property>
+                <property name="sizeHint" stdset="0">
+                 <size>
+                  <width>40</width>
+                  <height>20</height>
+                 </size>
+                </property>
+               </spacer>
+              </item>
+              <item>
+               <widget class="QLineEdit" name="searchTextBox">
+                <property name="maximumSize">
+                 <size>
+                  <width>200</width>
+                  <height>16777215</height>
+                 </size>
+                </property>
+                <property name="placeholderText">
+                 <string>Search</string>
+                </property>
+               </widget>
+              </item>
+              <item>
+               <widget class="QPushButton" name="searchPrevButton">
+                <property name="maximumSize">
+                 <size>
+                  <width>40</width>
+                  <height>16777215</height>
+                 </size>
+                </property>
+                <property name="toolTip">
+                 <string>Jump to previous occurrence</string>
+                </property>
+                <property name="text">
+                 <string>Prev</string>
+                </property>
+               </widget>
+              </item>
+              <item>
+               <widget class="QPushButton" name="searchNextButton">
+                <property name="maximumSize">
+                 <size>
+                  <width>40</width>
+                  <height>16777215</height>
+                 </size>
+                </property>
+                <property name="toolTip">
+                 <string>Jump to next occurrence</string>
+                </property>
+                <property name="text">
+                 <string>Next</string>
+                </property>
+               </widget>
+              </item>
+             </layout>
+            </item>
+            <item>
+             <widget class="QTreeView" name="treeView">
+              <property name="sizePolicy">
+               <sizepolicy hsizetype="Expanding" vsizetype="Expanding">
+                <horstretch>1</horstretch>
+                <verstretch>0</verstretch>
+               </sizepolicy>
+              </property>
+             </widget>
+            </item>
+           </layout>
+          </item>
+         </layout>
+        </widget>
+        <widget class="QWidget" name="snapshotLayoutWidget">
+         <layout class="QVBoxLayout" name="snapshotLayout">
+          <item>
+           <widget class="QComboBox" name="contextComboBox"/>
+          </item>
+          <item>
+           <widget class="QTabWidget" name="stateTabWidget">
+            <property name="sizePolicy">
+             <sizepolicy hsizetype="Expanding" vsizetype="Expanding">
+              <horstretch>2</horstretch>
+              <verstretch>0</verstretch>
+             </sizepolicy>
+            </property>
+            <property name="tabShape">
+             <enum>QTabWidget::Rounded</enum>
+            </property>
+            <property name="currentIndex">
+             <number>0</number>
+            </property>
+            <widget class="QWidget" name="stateTab">
+             <attribute name="title">
+              <string>State</string>
+             </attribute>
+             <layout class="QVBoxLayout" name="verticalLayout_3">
+              <item>
+               <widget class="QTreeView" name="stateTreeView">
+                <property name="sizePolicy">
+                 <sizepolicy hsizetype="Expanding" vsizetype="Expanding">
+                  <horstretch>0</horstretch>
+                  <verstretch>3</verstretch>
+                 </sizepolicy>
+                </property>
+               </widget>
+              </item>
+             </layout>
+            </widget>
+           </widget>
+          </item>
+         </layout>
+        </widget>
+       </widget>
+      </widget>
+      <widget class="QTabWidget" name="bottomTabWidget">
+       <property name="minimumSize">
+        <size>
+         <width>0</width>
+         <height>100</height>
+        </size>
+       </property>
+       <property name="baseSize">
+        <size>
+         <width>0</width>
+         <height>100</height>
+        </size>
+       </property>
+       <property name="tabPosition">
+        <enum>QTabWidget::North</enum>
+       </property>
+       <property name="tabShape">
+        <enum>QTabWidget::Rounded</enum>
+       </property>
+       <property name="currentIndex">
+        <number>0</number>
+       </property>
+       <widget class="QWidget" name="outputTab">
+        <attribute name="title">
+         <string>Output</string>
+        </attribute>
+        <layout class="QGridLayout" name="gridLayout_2">
+         <item row="0" column="0">
+          <widget class="QTextBrowser" name="outputTextBrowser">
+           <property name="readOnly">
+            <bool>true</bool>
+           </property>
+           <property name="openLinks">
+            <bool>false</bool>
+           </property>
+          </widget>
+         </item>
+        </layout>
+       </widget>
+       <widget class="QWidget" name="machineInfoTab">
+        <attribute name="title">
+         <string>Machine Info</string>
+        </attribute>
+        <layout class="QGridLayout" name="gridLayout_4">
+         <item row="0" column="0">
+          <widget class="QTextBrowser" name="machineInfoText">
+           <property name="lineWrapMode">
+            <enum>QTextEdit::NoWrap</enum>
+           </property>
+          </widget>
+         </item>
+        </layout>
+       </widget>
+       <widget class="QWidget" name="callStackTab">
+        <attribute name="title">
+         <string>Call Stack</string>
+        </attribute>
+        <layout class="QGridLayout" name="gridLayout">
+         <item row="0" column="0">
+          <widget class="QTextBrowser" name="backtraceText">
+           <property name="minimumSize">
+            <size>
+             <width>0</width>
+             <height>100</height>
+            </size>
+           </property>
+           <property name="baseSize">
+            <size>
+             <width>0</width>
+             <height>100</height>
+            </size>
+           </property>
+           <property name="lineWrapMode">
+            <enum>QTextEdit::NoWrap</enum>
+           </property>
+          </widget>
+         </item>
+        </layout>
+       </widget>
+      </widget>
+     </widget>
+    </item>
+   </layout>
+  </widget>
+  <widget class="QMenuBar" name="menuBar">
+   <property name="geometry">
+    <rect>
+     <x>0</x>
+     <y>0</y>
+     <width>1025</width>
+     <height>25</height>
+    </rect>
+   </property>
+   <widget class="QMenu" name="menuFile">
+    <property name="title">
+     <string>&amp;File</string>
+    </property>
+    <addaction name="action_Open"/>
+    <addaction name="action_Close"/>
+    <addaction name="separator"/>
+    <addaction name="actionExport_API_Calls"/>
+    <addaction name="separator"/>
+    <addaction name="actionE_xit"/>
+   </widget>
+   <widget class="QMenu" name="menuSettings">
+    <property name="title">
+     <string>Settings</string>
+    </property>
+    <addaction name="actionEdit"/>
+   </widget>
+   <addaction name="menuFile"/>
+   <addaction name="menuSettings"/>
+  </widget>
+  <widget class="QToolBar" name="mainToolBar">
+   <attribute name="toolBarArea">
+    <enum>TopToolBarArea</enum>
+   </attribute>
+   <attribute name="toolBarBreak">
+    <bool>false</bool>
+   </attribute>
+  </widget>
+  <widget class="QStatusBar" name="statusBar"/>
+  <action name="action_Open">
+   <property name="text">
+    <string>&amp;Open Trace...</string>
+   </property>
+   <property name="shortcut">
+    <string>Ctrl+O</string>
+   </property>
+  </action>
+  <action name="actionE_xit">
+   <property name="text">
+    <string>E&amp;xit</string>
+   </property>
+  </action>
+  <action name="action_Close">
+   <property name="text">
+    <string>Close Trace</string>
+   </property>
+  </action>
+  <action name="actionExport_API_Calls">
+   <property name="text">
+    <string>Export API Calls...</string>
+   </property>
+  </action>
+  <action name="actionEdit">
+   <property name="text">
+    <string>Edit...</string>
+   </property>
+  </action>
+ </widget>
+ <layoutdefault spacing="6" margin="11"/>
+ <resources/>
+ <connections/>
+</ui>
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_QReplayWidget.h b/vktrace/src/vktrace_viewer/vktraceviewer_QReplayWidget.h
new file mode 100644
index 0000000..af047cd
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_QReplayWidget.h
@@ -0,0 +1,262 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#ifndef _VKTRACEVIEWER_QREPLAYWIDGET_H_
+#define _VKTRACEVIEWER_QREPLAYWIDGET_H_
+
+#include <QWidget>
+#include <QThread>
+#include <QToolBar>
+#include <QToolButton>
+#include <QVBoxLayout>
+#include <QCheckBox>
+
+#include "vktraceviewer_QReplayWorker.h"
+
+class vktraceviewer_QReplayWidget : public QWidget
+{
+    Q_OBJECT
+public:
+    explicit vktraceviewer_QReplayWidget(vktraceviewer_QReplayWorker* pWorker, QWidget *parent = 0)
+        : QWidget(parent),
+          m_pWorker(pWorker)
+    {
+        QVBoxLayout* pLayout = new QVBoxLayout(this);
+        setLayout(pLayout);
+
+        m_pToolBar = new QToolBar("ReplayToolbar", this);
+        pLayout->addWidget(m_pToolBar);
+
+        m_pPlayButton = new QToolButton(m_pToolBar);
+        m_pPlayButton->setText("Play");
+        m_pPlayButton->setEnabled(true);
+        m_pToolBar->addWidget(m_pPlayButton);
+        connect(m_pPlayButton, SIGNAL(clicked()), this, SLOT(onPlayButtonClicked()));
+
+        m_pStepButton = new QToolButton(m_pToolBar);
+        m_pStepButton->setText("Step");
+        m_pStepButton->setEnabled(true);
+        m_pToolBar->addWidget(m_pStepButton);
+        connect(m_pStepButton, SIGNAL(clicked()), this, SLOT(onStepButtonClicked()));
+
+        m_pPauseButton = new QToolButton(m_pToolBar);
+        m_pPauseButton->setText("Pause");
+        m_pPauseButton->setEnabled(false);
+        m_pToolBar->addWidget(m_pPauseButton);
+        connect(m_pPauseButton, SIGNAL(clicked()), this, SLOT(onPauseButtonClicked()));
+
+        m_pContinueButton = new QToolButton(m_pToolBar);
+        m_pContinueButton->setText("Continue");
+        m_pContinueButton->setEnabled(false);
+        m_pToolBar->addWidget(m_pContinueButton);
+        connect(m_pContinueButton, SIGNAL(clicked()), this, SLOT(onContinueButtonClicked()));
+
+        m_pStopButton = new QToolButton(m_pToolBar);
+        m_pStopButton->setText("Stop");
+        m_pStopButton->setEnabled(false);
+        m_pToolBar->addWidget(m_pStopButton);
+        connect(m_pStopButton, SIGNAL(clicked()), this, SLOT(onStopButtonClicked()));
+
+        m_pDetachCheckBox = new QCheckBox(m_pToolBar);
+        m_pDetachCheckBox->setText("Detach");
+        m_pDetachCheckBox->setEnabled(true);
+        m_pToolBar->addWidget(m_pDetachCheckBox);
+        connect(m_pDetachCheckBox, SIGNAL(clicked(bool)), this, SLOT(onDetachCheckBoxClicked(bool)));
+
+        m_pReplayWindow = new QWidget(this);
+        pLayout->addWidget(m_pReplayWindow);
+
+        // connect worker signals to widget actions
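+        // Queued connections copy their arguments through Qt's meta-type
+        // system, so uint64_t must be registered before the cross-thread
+        // signal/slot connections below are made.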
+        qRegisterMetaType<uint64_t>("uint64_t");
+        m_replayThread.setObjectName("ReplayThread");
+        m_pWorker->moveToThread(&m_replayThread);
+        m_replayThread.start();
+
+        // The Pause and Stop buttons use direct connections so that they take effect immediately rather than waiting in the event queue.
+        // Queued connections are used whenever the replay will be advanced from a stopped state,
+        // and for all the signals FROM the worker object, since it lives on a different thread.
+        connect(this, SIGNAL(PlayButtonClicked()), m_pWorker, SLOT(StartReplay()), Qt::QueuedConnection);
+        connect(this, SIGNAL(StepButtonClicked()), m_pWorker, SLOT(StepReplay()), Qt::QueuedConnection);
+        connect(this, SIGNAL(PauseButtonClicked()), m_pWorker, SLOT(PauseReplay()), Qt::DirectConnection);
+        connect(this, SIGNAL(ContinueButtonClicked()), m_pWorker, SLOT(ContinueReplay()), Qt::QueuedConnection);
+        connect(this, SIGNAL(StopButtonClicked()), m_pWorker, SLOT(StopReplay()), Qt::DirectConnection);
+        connect(this, SIGNAL(DetachCheckBoxClicked(bool)), m_pWorker, SLOT(DetachReplay(bool)), Qt::QueuedConnection);
+
+        connect(m_pWorker, SIGNAL(ReplayStarted()), this, SLOT(slotReplayStarted()), Qt::QueuedConnection);
+        connect(m_pWorker, SIGNAL(ReplayPaused(uint64_t)), this, SLOT(slotReplayPaused(uint64_t)), Qt::QueuedConnection);
+        connect(m_pWorker, SIGNAL(ReplayContinued()), this, SLOT(slotReplayContinued()), Qt::QueuedConnection);
+        connect(m_pWorker, SIGNAL(ReplayStopped(uint64_t)), this, SLOT(slotReplayStopped(uint64_t)), Qt::QueuedConnection);
+        connect(m_pWorker, SIGNAL(ReplayFinished(uint64_t)), this, SLOT(slotReplayFinished(uint64_t)), Qt::QueuedConnection);
+        connect(m_pWorker, SIGNAL(ReplayProgressUpdate(uint64_t)), this, SIGNAL(ReplayProgressUpdate(uint64_t)), Qt::QueuedConnection);
+
+        connect(m_pWorker, SIGNAL(OutputMessage(VktraceLogLevel, uint64_t, const QString&)), this, SLOT(OnOutputMessage(VktraceLogLevel, uint64_t, const QString&)), Qt::QueuedConnection);
+    }
+
+    virtual ~vktraceviewer_QReplayWidget()
+    {
+        m_replayThread.quit();
+        m_replayThread.wait();
+    }
+
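+    // Returning NULL from paintEngine() opts this widget out of Qt's own
+    // painting, a common idiom for widgets that host natively rendered
+    // content such as the replay window.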
+    virtual QPaintEngine* paintEngine() const
+    {
+        return NULL;
+    }
+
+    QWidget* GetReplayWindow() const
+    {
+        return m_pReplayWindow;
+    }
+
+signals:
+    void PlayButtonClicked();
+    void StepButtonClicked();
+    void PauseButtonClicked();
+    void ContinueButtonClicked();
+    void StopButtonClicked();
+    void DetachCheckBoxClicked(bool checked);
+
+    void ReplayStarted();
+    void ReplayPaused(uint64_t packetIndex);
+    void ReplayContinued();
+    void ReplayStopped(uint64_t packetIndex);
+    void ReplayFinished(uint64_t packetIndex);
+    void ReplayProgressUpdate(uint64_t packetIndex);
+
+    void OutputMessage(VktraceLogLevel level, const QString& msg);
+    void OutputMessage(VktraceLogLevel level, uint64_t packetIndex, const QString& msg);
+
+private slots:
+
+    void slotReplayStarted()
+    {
+        m_pPlayButton->setEnabled(false);
+        m_pStepButton->setEnabled(false);
+        m_pPauseButton->setEnabled(true);
+        m_pContinueButton->setEnabled(false);
+        m_pStopButton->setEnabled(true);
+        m_pDetachCheckBox->setEnabled(false);
+
+        emit ReplayStarted();
+    }
+
+    void slotReplayPaused(uint64_t packetIndex)
+    {
+        m_pPlayButton->setEnabled(false);
+        m_pStepButton->setEnabled(true);
+        m_pPauseButton->setEnabled(false);
+        m_pContinueButton->setEnabled(true);
+        m_pStopButton->setEnabled(true);
+        m_pDetachCheckBox->setEnabled(false);
+
+        emit ReplayPaused(packetIndex);
+    }
+
+    void slotReplayContinued()
+    {
+        m_pPlayButton->setEnabled(false);
+        m_pStepButton->setEnabled(false);
+        m_pPauseButton->setEnabled(true);
+        m_pContinueButton->setEnabled(false);
+        m_pStopButton->setEnabled(true);
+        m_pDetachCheckBox->setEnabled(false);
+
+        emit ReplayContinued();
+    }
+
+    void slotReplayStopped(uint64_t packetIndex)
+    {
+        m_pPlayButton->setEnabled(true);
+        m_pStepButton->setEnabled(true);
+        m_pPauseButton->setEnabled(false);
+        m_pContinueButton->setEnabled(false);
+        m_pStopButton->setEnabled(false);
+        m_pDetachCheckBox->setEnabled(true);
+
+        emit ReplayStopped(packetIndex);
+    }
+
+    void slotReplayFinished(uint64_t packetIndex)
+    {
+        m_pPlayButton->setEnabled(true);
+        m_pStepButton->setEnabled(true);
+        m_pPauseButton->setEnabled(false);
+        m_pContinueButton->setEnabled(false);
+        m_pStopButton->setEnabled(false);
+        m_pDetachCheckBox->setEnabled(true);
+
+        emit ReplayFinished(packetIndex);
+    }
+
+    void OnOutputMessage(VktraceLogLevel level, uint64_t packetIndex, const QString& msg)
+    {
+        emit OutputMessage(level, packetIndex, msg);
+    }
+
+public slots:
+    void onPlayButtonClicked()
+    {
+        emit PlayButtonClicked();
+    }
+
+    void onStepButtonClicked()
+    {
+        emit StepButtonClicked();
+    }
+
+    void onPauseButtonClicked()
+    {
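+        // Disable the controls right away; slotReplayPaused() will re-enable
+        // the appropriate ones once the worker has actually paused.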
+        m_pPlayButton->setEnabled(false);
+        m_pPauseButton->setEnabled(false);
+        m_pContinueButton->setEnabled(false);
+        m_pStopButton->setEnabled(false);
+
+        emit PauseButtonClicked();
+    }
+
+    void onContinueButtonClicked()
+    {
+        emit ContinueButtonClicked();
+    }
+
+    void onStopButtonClicked()
+    {
+        emit StopButtonClicked();
+    }
+
+    void onDetachCheckBoxClicked(bool checked)
+    {
+        emit DetachCheckBoxClicked(checked);
+    }
+
+private:
+    vktraceviewer_QReplayWorker* m_pWorker;
+    QWidget* m_pReplayWindow;
+    QToolBar* m_pToolBar;
+    QToolButton* m_pPlayButton;
+    QToolButton* m_pStepButton;
+    QToolButton* m_pPauseButton;
+    QToolButton* m_pContinueButton;
+    QToolButton* m_pStopButton;
+    QCheckBox* m_pDetachCheckBox;
+    QThread m_replayThread;
+};
+
+#endif //_VKTRACEVIEWER_QREPLAYWIDGET_H_
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_QReplayWorker.cpp b/vktrace/src/vktrace_viewer/vktraceviewer_QReplayWorker.cpp
new file mode 100644
index 0000000..bc2fcf9
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_QReplayWorker.cpp
@@ -0,0 +1,580 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+
+#include "vktraceviewer_QReplayWorker.h"
+#include <QAction>
+#include <QCoreApplication>
+#include "vktraceviewer_trace_file_utils.h"
+
+vktraceviewer_QReplayWorker* g_pWorker;
+static uint64_t s_currentReplayPacket = 0;
+
+static void dbg_msg_callback(vktrace_replay::VKTRACE_DBG_MSG_TYPE msgType, uint64_t packetIndex, const char* pMsg);
+
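+// Adapts the replayer's logging callback to the viewer's debug-message path.
+// Both DEBUG and VERBOSE map to INFO because the viewer only distinguishes
+// info, warning, and error severities.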
+void replayWorkerLoggingCallback(VktraceLogLevel level, const char* pMessage)
+{
+    if (g_pWorker != NULL)
+    {
+        switch(level)
+        {
+        case VKTRACE_LOG_DEBUG:
+            dbg_msg_callback(vktrace_replay::VKTRACE_DBG_MSG_INFO, s_currentReplayPacket, pMessage);
+            break;
+        case VKTRACE_LOG_ERROR:
+            dbg_msg_callback(vktrace_replay::VKTRACE_DBG_MSG_ERROR, s_currentReplayPacket, pMessage);
+            break;
+        case VKTRACE_LOG_WARNING:
+            dbg_msg_callback(vktrace_replay::VKTRACE_DBG_MSG_WARNING, s_currentReplayPacket, pMessage);
+            break;
+        case VKTRACE_LOG_VERBOSE:
+            dbg_msg_callback(vktrace_replay::VKTRACE_DBG_MSG_INFO, s_currentReplayPacket, pMessage);
+            break;
+        default:
+            break;
+        }
+    }
+
+#if defined(WIN32)
+#if _DEBUG
+    OutputDebugString(pMessage);
+#endif
+#endif
+}
+
+vktraceviewer_QReplayWorker::vktraceviewer_QReplayWorker()
+    : QObject(NULL),
+      m_bPauseReplay(false),
+      m_bStopReplay(false),
+      m_bReplayInProgress(false),
+      m_pView(NULL),
+      m_pTraceFileInfo(NULL),
+      m_currentReplayPacketIndex(0),
+      m_pActionRunToHere(NULL),
+      m_pauseAtPacketIndex((uint64_t)-1),
+      m_pReplayWindow(NULL),
+      m_pReplayWindowWidth(0),
+      m_pReplayWindowHeight(0),
+      m_bPrintReplayInfoMessages(TRUE),
+      m_bPrintReplayWarningMessages(TRUE),
+      m_bPrintReplayErrorMessages(TRUE),
+      m_bPauseOnReplayInfoMessages(FALSE),
+      m_bPauseOnReplayWarningMessages(FALSE),
+      m_bPauseOnReplayErrorMessages(FALSE)
+{
+    memset(m_pReplayers, 0, sizeof(vktrace_replay::vktrace_trace_packet_replay_library*) * VKTRACE_MAX_TRACER_ID_ARRAY_SIZE);
+    g_pWorker = this;
+}
+
+vktraceviewer_QReplayWorker::~vktraceviewer_QReplayWorker()
+{
+    setView(NULL);
+    g_pWorker = NULL;
+
+    if (m_pActionRunToHere != NULL)
+    {
+        disconnect(m_pActionRunToHere, SIGNAL(triggered()), this, SLOT(onPlayToHere()));
+        delete m_pActionRunToHere;
+        m_pActionRunToHere = NULL;
+    }
+}
+
+void vktraceviewer_QReplayWorker::setPrintReplayMessages(BOOL bPrintInfo, BOOL bPrintWarning, BOOL bPrintError)
+{
+    m_bPrintReplayInfoMessages = bPrintInfo;
+    m_bPrintReplayWarningMessages = bPrintWarning;
+    m_bPrintReplayErrorMessages = bPrintError;
+}
+
+void vktraceviewer_QReplayWorker::setPauseOnReplayMessages(BOOL bPauseOnInfo, BOOL bPauseOnWarning, BOOL bPauseOnError)
+{
+    m_bPauseOnReplayInfoMessages = bPauseOnInfo;
+    m_bPauseOnReplayWarningMessages = bPauseOnWarning;
+    m_bPauseOnReplayErrorMessages = bPauseOnError;
+}
+
+BOOL vktraceviewer_QReplayWorker::PrintReplayInfoMsgs()
+{
+    return m_bPrintReplayInfoMessages;
+}
+
+BOOL vktraceviewer_QReplayWorker::PrintReplayWarningMsgs()
+{
+    return m_bPrintReplayWarningMessages;
+}
+
+BOOL vktraceviewer_QReplayWorker::PrintReplayErrorMsgs()
+{
+    return m_bPrintReplayErrorMessages;
+}
+
+BOOL vktraceviewer_QReplayWorker::PauseOnReplayInfoMsg()
+{
+    return m_bPauseOnReplayInfoMessages;
+}
+
+BOOL vktraceviewer_QReplayWorker::PauseOnReplayWarningMsg()
+{
+    return m_bPauseOnReplayWarningMessages;
+}
+
+BOOL vktraceviewer_QReplayWorker::PauseOnReplayErrorMsg()
+{
+    return m_bPauseOnReplayErrorMessages;
+}
+
+void vktraceviewer_QReplayWorker::setView(vktraceviewer_view* pView)
+{
+    m_pView = pView;
+}
+
+bool vktraceviewer_QReplayWorker::load_replayers(vktraceviewer_trace_file_info* pTraceFileInfo,
+    QWidget* pReplayWindow, int const replayWindowWidth,
+    int const replayWindowHeight, bool const separateReplayWindow)
+{
+    // Get window handle of the widget to replay into.
+    assert(pReplayWindow != NULL);
+    assert(replayWindowWidth > 0);
+    assert(replayWindowHeight > 0);
+
+    m_pReplayWindow = pReplayWindow;
+    m_pReplayWindowWidth = replayWindowWidth;
+    m_pReplayWindowHeight = replayWindowHeight;
+
+    m_pTraceFileInfo = pTraceFileInfo;
+
+    // TODO: Get the width and height from the replayer. We can't do this yet
+    // because the replayer doesn't know the render target's size.
+
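+    // winId() forces the widget to become a native window, yielding a handle
+    // the replayer can render into directly.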
+    WId hWindow = pReplayWindow->winId();
+
+    // load any API specific driver libraries and init replayer objects
+    uint8_t tidApi = VKTRACE_TID_RESERVED;
+    bool bReplayerLoaded = false;
+
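+    // A separate replay window gets its own OS-created display; otherwise
+    // the replayer renders into the widget's native window handle.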
+    vktrace_replay::ReplayDisplay disp;
+    if(separateReplayWindow)
+    {
+        disp = vktrace_replay::ReplayDisplay(replayWindowWidth, replayWindowHeight, 0, false);
+    }
+    else
+    {
+        disp = vktrace_replay::ReplayDisplay((vktrace_window_handle)hWindow, replayWindowWidth, replayWindowHeight);
+    }
+
+    for (int i = 0; i < VKTRACE_MAX_TRACER_ID_ARRAY_SIZE; i++)
+    {
+        m_pReplayers[i] = NULL;
+    }
+
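+    // Create one replayer library per tracer id recorded in the trace file
+    // header, configuring each with the display and the current settings.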
+    for (int i = 0; i < pTraceFileInfo->header.tracer_count; i++)
+    {
+        uint8_t tracerId = pTraceFileInfo->header.tracer_id_array[i].id;
+        tidApi = tracerId;
+
+        const VKTRACE_TRACER_REPLAYER_INFO* pReplayerInfo = &(gs_tracerReplayerInfo[tracerId]);
+
+        if (pReplayerInfo->tracerId != tracerId)
+        {
+            emit OutputMessage(VKTRACE_LOG_ERROR, QString("Replayer info for TracerId (%1) failed consistency check.").arg(tracerId));
+            assert(!"TracerId in VKTRACE_TRACER_REPLAYER_INFO does not match the requested tracerId. The array needs to be corrected.");
+        }
+        else if (pReplayerInfo->needsReplayer == TRUE)
+        {
+            // Have our factory create the necessary replayer
+            m_pReplayers[tracerId] = m_replayerFactory.Create(tracerId);
+
+            if (m_pReplayers[tracerId] == NULL)
+            {
+                // replayer failed to be created
+                emit OutputMessage(VKTRACE_LOG_ERROR, QString("Couldn't create replayer for TracerId %1.").arg(tracerId));
+                bReplayerLoaded = false;
+            }
+            else
+            {
+                m_pReplayers[tracerId]->SetLogCallback(replayWorkerLoggingCallback);
+                m_pReplayers[tracerId]->SetLogLevel(VKTRACE_LOG_ERROR);
+                m_pReplayers[tracerId]->RegisterDbgMsgCallback((vktrace_replay::VKTRACE_DBG_MSG_CALLBACK_FUNCTION)&dbg_msg_callback);
+
+                // get settings from the replayer
+                m_pView->add_setting_group(m_pReplayers[tracerId]->GetSettings());
+
+                // update replayer with updated state
+                vktrace_SettingGroup* pGlobalSettings = NULL;
+                unsigned int numGlobalSettings = m_pView->get_global_settings(&pGlobalSettings);
+                m_pReplayers[tracerId]->UpdateFromSettings(pGlobalSettings, numGlobalSettings);
+
+                // Initialize the replayer
+                int err = m_pReplayers[tracerId]->Initialize(&disp, NULL);
+                if (err) {
+                    emit OutputMessage(VKTRACE_LOG_ERROR, QString("Couldn't Initialize replayer for TracerId %1.").arg(tracerId));
+                    return false;
+                }
+
+                bReplayerLoaded = true;
+            }
+        }
+    }
+
+    if (tidApi == VKTRACE_TID_RESERVED)
+    {
+        emit OutputMessage(VKTRACE_LOG_ERROR, QString("No API specified in tracefile for replaying."));
+        return false;
+    }
+
+    if (bReplayerLoaded)
+    {
+        m_pActionRunToHere = new QAction("Play to here", NULL);
+        connect(m_pActionRunToHere, SIGNAL(triggered()), this, SLOT(onPlayToHere()));
+        m_pView->add_calltree_contextmenu_item(m_pActionRunToHere);
+    }
+
+    return bReplayerLoaded;
+}
+
+void vktraceviewer_QReplayWorker::unloadReplayers()
+{
+    m_pTraceFileInfo = NULL;
+
+    // Clean up replayers
+    if (m_pReplayers != NULL)
+    {
+        for (int i = 0; i < VKTRACE_MAX_TRACER_ID_ARRAY_SIZE; i++)
+        {
+            if (m_pReplayers[i] != NULL)
+            {
+                m_pReplayers[i]->Deinitialize();
+                m_replayerFactory.Destroy(&m_pReplayers[i]);
+            }
+        }
+    }
+}
+
+void vktraceviewer_QReplayWorker::playCurrentTraceFile(uint64_t startPacketIndex)
+{
+    vktraceviewer_trace_file_info* pTraceFileInfo = m_pTraceFileInfo;
+    vktraceviewer_trace_file_packet_offsets* pCurPacket = NULL;
+    unsigned int res = vktrace_replay::VKTRACE_REPLAY_ERROR;
+    vktrace_replay::vktrace_trace_packet_replay_library *replayer;
+
+    m_bReplayInProgress = true;
+
+    for (uint64_t i = startPacketIndex; i < pTraceFileInfo->packetCount; i++)
+    {
+        m_currentReplayPacketIndex = i;
+        emit ReplayProgressUpdate(m_currentReplayPacketIndex);
+
+        pCurPacket = &pTraceFileInfo->pPacketOffsets[i];
+        s_currentReplayPacket = pCurPacket->pHeader->global_packet_index;
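+        // Dispatch on packet type: vktrace bookkeeping packets (messages and
+        // markers) are handled inline, while API packets are forwarded to the
+        // replayer registered for the packet's tracer_id.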
+        switch (pCurPacket->pHeader->packet_id) {
+            case VKTRACE_TPI_MESSAGE:
+            {
+                vktrace_trace_packet_message* msgPacket;
+                msgPacket = (vktrace_trace_packet_message*)pCurPacket->pHeader;
+                replayWorkerLoggingCallback(msgPacket->type, msgPacket->message);
+                break;
+            }
+            case VKTRACE_TPI_MARKER_CHECKPOINT:
+                break;
+            case VKTRACE_TPI_MARKER_API_BOUNDARY:
+                break;
+            case VKTRACE_TPI_MARKER_API_GROUP_BEGIN:
+                break;
+            case VKTRACE_TPI_MARKER_API_GROUP_END:
+                break;
+            case VKTRACE_TPI_MARKER_TERMINATE_PROCESS:
+                break;
+            //TODO processing code for all the above cases
+            default:
+            {
+                if (pCurPacket->pHeader->tracer_id >= VKTRACE_MAX_TRACER_ID_ARRAY_SIZE  || pCurPacket->pHeader->tracer_id == VKTRACE_TID_RESERVED)
+                {
+                    replayWorkerLoggingCallback(VKTRACE_LOG_WARNING, QString("Tracer_id from packet num packet %1 invalid.").arg(pCurPacket->pHeader->packet_id).toStdString().c_str());
+                    continue;
+                }
+                replayer = m_pReplayers[pCurPacket->pHeader->tracer_id];
+                if (replayer == NULL) {
+                    replayWorkerLoggingCallback(VKTRACE_LOG_WARNING, QString("Tracer_id %1 has no valid replayer.").arg(pCurPacket->pHeader->tracer_id).toStdString().c_str());
+                    continue;
+                }
+                if (pCurPacket->pHeader->packet_id >= VKTRACE_TPI_BEGIN_API_HERE)
+                {
+                    // replay the API packet
+                    try
+                    {
+                        res = replayer->Replay(pCurPacket->pHeader);
+                    }
+                    catch (std::exception& e)
+                    {
+                        replayWorkerLoggingCallback(VKTRACE_LOG_ERROR, QString("Caught std::exception while replaying packet %1: %2").arg(pCurPacket->pHeader->global_packet_index).arg(e.what()).toStdString().c_str());
+                    }
+                    catch (int i)
+                    {
+                        replayWorkerLoggingCallback(VKTRACE_LOG_ERROR, QString("Caught int exception: %1").arg(i).toStdString().c_str());
+                    }
+                    catch (...)
+                    {
+                        replayWorkerLoggingCallback(VKTRACE_LOG_ERROR, "Caught unknown exception.");
+                    }
+
+                    if (res == vktrace_replay::VKTRACE_REPLAY_ERROR ||
+                        res == vktrace_replay::VKTRACE_REPLAY_INVALID_ID ||
+                        res == vktrace_replay::VKTRACE_REPLAY_CALL_ERROR)
+                    {
+                        replayWorkerLoggingCallback(VKTRACE_LOG_ERROR, QString("Failed to replay packet %1.").arg(pCurPacket->pHeader->global_packet_index).toStdString().c_str());
+                    }
+                    else if (res == vktrace_replay::VKTRACE_REPLAY_BAD_RETURN)
+                    {
+                        replayWorkerLoggingCallback(VKTRACE_LOG_WARNING, QString("Replay of packet %1 has diverged from trace due to a different return value.").arg(pCurPacket->pHeader->global_packet_index).toStdString().c_str());
+                    }
+                    else if (res == vktrace_replay::VKTRACE_REPLAY_INVALID_PARAMS ||
+                             res == vktrace_replay::VKTRACE_REPLAY_VALIDATION_ERROR)
+                    {
+                        // validation layer should have reported these if the user wanted them, so don't print any additional warnings here.
+                    }
+                    else if (res != vktrace_replay::VKTRACE_REPLAY_SUCCESS)
+                    {
+                        replayWorkerLoggingCallback(VKTRACE_LOG_ERROR, QString("Unknown error caused by packet %1.").arg(pCurPacket->pHeader->global_packet_index).toStdString().c_str());
+                    }
+                }
+                else
+                {
+                    replayWorkerLoggingCallback(VKTRACE_LOG_ERROR, QString("Bad packet type id=%1, index=%2.").arg(pCurPacket->pHeader->packet_id).arg(pCurPacket->pHeader->global_packet_index).toStdString().c_str());
+                }
+            }
+        }
+
+        // Process events and pause or stop if needed
+        if (m_bPauseReplay || m_pauseAtPacketIndex == pCurPacket->pHeader->global_packet_index)
+        {
+            if (m_pauseAtPacketIndex == pCurPacket->pHeader->global_packet_index)
+            {
+                // reset
+                m_pauseAtPacketIndex = (uint64_t)-1;
+            }
+
+            m_bReplayInProgress = false;
+            doReplayPaused(pCurPacket->pHeader->global_packet_index);
+            return;
+        }
+
+        if (m_bStopReplay)
+        {
+            m_bReplayInProgress = false;
+            doReplayStopped(pCurPacket->pHeader->global_packet_index);
+            return;
+        }
+    }
+
+    m_bReplayInProgress = false;
+
+    // pCurPacket is still NULL if startPacketIndex was past the end of the
+    // file, so guard the dereference.
+    doReplayFinished(pCurPacket != NULL ? pCurPacket->pHeader->global_packet_index : startPacketIndex);
+}
+
+void vktraceviewer_QReplayWorker::onPlayToHere()
+{
+    m_pauseAtPacketIndex = m_pView->get_current_packet_index();
+    if (m_pauseAtPacketIndex <= m_currentReplayPacketIndex || m_currentReplayPacketIndex == 0)
+    {
+        // pause location is behind the current replay position, so restart the replay.
+        StartReplay();
+    }
+    else
+    {
+        // pause location is ahead of current replay position, so continue the replay.
+        ContinueReplay();
+    }
+}
+
+void vktraceviewer_QReplayWorker::StartReplay()
+{
+    // Starting the replay can happen immediately.
+    emit ReplayStarted();
+
+    // Reset some flags and play the replay from the beginning
+    m_bPauseReplay = false;
+    m_bStopReplay = false;
+    playCurrentTraceFile(0);
+}
+
+void vktraceviewer_QReplayWorker::StepReplay()
+{
+    // Stepping the replay can happen immediately.
+    emit ReplayContinued();
+
+    // Set the pause flag so that the replay will stop after replaying the next packet.
+    m_bPauseReplay = true;
+    m_bStopReplay = false;
+    playCurrentTraceFile(m_currentReplayPacketIndex+1);
+}
+
+void vktraceviewer_QReplayWorker::PauseReplay()
+{
+    // Pausing the replay happens asynchronously:
+    // set the pause flag and the replay will react to it as soon as it
+    // can, calling doReplayPaused() once it has paused.
+    m_bPauseReplay = true;
+}
+
+void vktraceviewer_QReplayWorker::ContinueReplay()
+{
+    // Continuing the replay can happen immediately.
+    emit ReplayContinued();
+
+    // clear the pause and stop flags and continue the replay from the next packet
+    m_bPauseReplay = false;
+    m_bStopReplay = false;
+    playCurrentTraceFile(m_currentReplayPacketIndex+1);
+}
+
+void vktraceviewer_QReplayWorker::StopReplay()
+{
+    if (m_bReplayInProgress)
+    {
+        // If a replay is in progress, stopping happens asynchronously:
+        // set the stop flag and the replay will react to it as soon as it
+        // can, calling doReplayStopped() once it has stopped.
+        m_bStopReplay = true;
+    }
+    else
+    {
+        // Replay not being in progress means either:
+        // 1) the replay was never started (in which case the stop button is disabled and we can't get here), or
+        // 2) the replay is currently paused, so perform the same actions as if the replay itself had detected that it should stop.
+        uint64_t packetIndex = this->m_pTraceFileInfo->pPacketOffsets[m_currentReplayPacketIndex].pHeader->global_packet_index;
+        doReplayStopped(packetIndex);
+    }
+}
+
+void vktraceviewer_QReplayWorker::onSettingsUpdated(vktrace_SettingGroup* pGroups, unsigned int numGroups)
+{
+    if (m_pReplayers != NULL)
+    {
+        for (unsigned int tracerId = 0; tracerId < VKTRACE_MAX_TRACER_ID_ARRAY_SIZE; tracerId++)
+        {
+            if (m_pReplayers[tracerId] != NULL)
+            {
+                // now update the replayer with the loaded settings
+                m_pReplayers[tracerId]->UpdateFromSettings(pGroups, numGroups);
+            }
+        }
+    }
+}
+
+vktrace_replay::vktrace_trace_packet_replay_library* vktraceviewer_QReplayWorker::getReplayer(VKTRACE_TRACER_ID tracerId)
+{
+    if (tracerId < 0 || tracerId >= VKTRACE_MAX_TRACER_ID_ARRAY_SIZE)
+    {
+        return NULL;
+    }
+
+    return m_pReplayers[tracerId];
+}
+
+void vktraceviewer_QReplayWorker::DetachReplay(bool detach)
+{
+    for (int i = 0; i < VKTRACE_MAX_TRACER_ID_ARRAY_SIZE; i++)
+    {
+        if(m_pReplayers[i] != NULL)
+        {
+            m_pReplayers[i]->Deinitialize();
+
+            vktrace_replay::ReplayDisplay disp;
+            if(detach)
+            {
+                disp = vktrace_replay::ReplayDisplay(m_pReplayWindowWidth, m_pReplayWindowHeight, 0, false);
+            }
+            else
+            {
+                WId hWindow = m_pReplayWindow->winId();
+                disp = vktrace_replay::ReplayDisplay((vktrace_window_handle)hWindow, m_pReplayWindowWidth, m_pReplayWindowHeight);
+            }
+
+            int err = m_pReplayers[i]->Initialize(&disp, NULL);
+            assert(err == 0);
+        }
+    }
+}
+
+void vktraceviewer_QReplayWorker::doReplayPaused(uint64_t packetIndex)
+{
+    emit ReplayPaused(packetIndex);
+}
+
+void vktraceviewer_QReplayWorker::doReplayStopped(uint64_t packetIndex)
+{
+    emit ReplayStopped(packetIndex);
+
+    // Replay will start again from the beginning, so setup for that now.
+    m_currentReplayPacketIndex = 0;
+}
+
+void vktraceviewer_QReplayWorker::doReplayFinished(uint64_t packetIndex)
+{
+    // Indicate that the replay finished at the particular packet.
+    emit ReplayFinished(packetIndex);
+
+    // Replay will start again from the beginning, so setup for that now.
+    m_currentReplayPacketIndex = 0;
+}
+
+//=============================================================================
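+// Routes replayer debug messages to the worker, honoring the user's
+// print-message and pause-on-message settings for each severity.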
+void dbg_msg_callback(vktrace_replay::VKTRACE_DBG_MSG_TYPE msgType, uint64_t packetIndex, const char* pMsg)
+{
+    if (g_pWorker != NULL)
+    {
+        if (msgType == vktrace_replay::VKTRACE_DBG_MSG_ERROR)
+        {
+            if (g_pWorker->PrintReplayErrorMsgs())
+            {
+                g_pWorker->OutputMessage(VKTRACE_LOG_ERROR, packetIndex, QString(pMsg));
+            }
+            if (g_pWorker->PauseOnReplayErrorMsg())
+            {
+                g_pWorker->PauseReplay();
+            }
+        }
+        else if (msgType == vktrace_replay::VKTRACE_DBG_MSG_WARNING)
+        {
+            if (g_pWorker->PrintReplayWarningMsgs())
+            {
+                g_pWorker->OutputMessage(VKTRACE_LOG_WARNING, packetIndex, QString(pMsg));
+            }
+            if (g_pWorker->PauseOnReplayWarningMsg())
+            {
+                g_pWorker->PauseReplay();
+            }
+        }
+        else
+        {
+            if (g_pWorker->PrintReplayInfoMsgs())
+            {
+                g_pWorker->OutputMessage(VKTRACE_LOG_VERBOSE, packetIndex, QString(pMsg));
+            }
+            if (g_pWorker->PauseOnReplayInfoMsg())
+            {
+                g_pWorker->PauseReplay();
+            }
+        }
+    }
+}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_QReplayWorker.h b/vktrace/src/vktrace_viewer/vktraceviewer_QReplayWorker.h
new file mode 100644
index 0000000..81d64e7
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_QReplayWorker.h
@@ -0,0 +1,113 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#ifndef VKTRACEVIEWER_QREPLAYWORKER_H
+#define VKTRACEVIEWER_QREPLAYWORKER_H
+
+#include <QObject>
+#include "vktraceviewer_view.h"
+#include "vkreplay_factory.h"
+
+class vktraceviewer_QReplayWorker : public QObject
+{
+    Q_OBJECT
+public:
+    vktraceviewer_QReplayWorker();
+    virtual ~vktraceviewer_QReplayWorker();
+
+    void setPrintReplayMessages(BOOL bPrintInfo, BOOL bPrintWarning, BOOL bPrintError);
+    void setPauseOnReplayMessages(BOOL bPauseOnInfo, BOOL bPauseOnWarning, BOOL bPauseOnError);
+
+    BOOL PrintReplayInfoMsgs();
+    BOOL PrintReplayWarningMsgs();
+    BOOL PrintReplayErrorMsgs();
+
+    BOOL PauseOnReplayInfoMsg();
+    BOOL PauseOnReplayWarningMsg();
+    BOOL PauseOnReplayErrorMsg();
+
+    void setView(vktraceviewer_view* pView);
+
+    bool load_replayers(vktraceviewer_trace_file_info* pTraceFileInfo,
+        QWidget* pReplayWindow, int const replayWindowWidth,
+        int const replayWindowHeight, bool const separateReplayWindow);
+
+    void unloadReplayers();
+
+protected slots:
+    virtual void playCurrentTraceFile(uint64_t startPacketIndex);
+    virtual void onPlayToHere();
+
+public slots:
+    void StartReplay();
+    void StepReplay();
+    void PauseReplay();
+    void ContinueReplay();
+    void StopReplay();
+
+    void onSettingsUpdated(vktrace_SettingGroup* pGroups, unsigned int numGroups);
+
+    vktrace_replay::vktrace_trace_packet_replay_library* getReplayer(VKTRACE_TRACER_ID tracerId);
+
+    void DetachReplay(bool detach);
+
+signals:
+    void ReplayStarted();
+    void ReplayPaused(uint64_t packetIndex);
+    void ReplayContinued();
+    void ReplayStopped(uint64_t packetIndex);
+    void ReplayFinished(uint64_t packetIndex);
+
+    void ReplayProgressUpdate(uint64_t packetIndex);
+
+    void OutputMessage(VktraceLogLevel level, const QString& msg);
+    void OutputMessage(VktraceLogLevel level, uint64_t packetIndex, const QString& msg);
+
+private:
+    volatile bool m_bPauseReplay;
+    volatile bool m_bStopReplay;
+    volatile bool m_bReplayInProgress;
+    vktraceviewer_view* m_pView;
+    vktraceviewer_trace_file_info* m_pTraceFileInfo;
+    uint64_t m_currentReplayPacketIndex;
+    QAction* m_pActionRunToHere;
+    uint64_t m_pauseAtPacketIndex;
+
+    QWidget* m_pReplayWindow;
+    int m_pReplayWindowWidth;
+    int m_pReplayWindowHeight;
+
+    BOOL m_bPrintReplayInfoMessages;
+    BOOL m_bPrintReplayWarningMessages;
+    BOOL m_bPrintReplayErrorMessages;
+
+    BOOL m_bPauseOnReplayInfoMessages;
+    BOOL m_bPauseOnReplayWarningMessages;
+    BOOL m_bPauseOnReplayErrorMessages;
+
+    vktrace_replay::ReplayFactory m_replayerFactory;
+    vktrace_replay::vktrace_trace_packet_replay_library* m_pReplayers[VKTRACE_MAX_TRACER_ID_ARRAY_SIZE];
+
+    void doReplayPaused(uint64_t packetIndex);
+    void doReplayStopped(uint64_t packetIndex);
+    void doReplayFinished(uint64_t packetIndex);
+};
+
+#endif // VKTRACEVIEWER_QREPLAYWORKER_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_QTraceFileModel.h b/vktrace/src/vktrace_viewer/vktraceviewer_QTraceFileModel.h
new file mode 100644
index 0000000..83dd170
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_QTraceFileModel.h
@@ -0,0 +1,303 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#pragma once
+
+#include <QColor>
+#include <QFont>
+#include <QSize>
+#include <qabstractitemmodel.h>
+#include "vktraceviewer_trace_file_utils.h"
+
+class vktraceviewer_QTraceFileModel : public QAbstractItemModel
+{
+    Q_OBJECT
+public:
+    vktraceviewer_QTraceFileModel(QObject* parent, vktraceviewer_trace_file_info* pTraceFileInfo)
+        : QAbstractItemModel(parent)
+    {
+        m_pTraceFileInfo = pTraceFileInfo;
+    }
+
+    virtual ~vktraceviewer_QTraceFileModel()
+    {
+    }
+
+    virtual bool isDrawCall(const VKTRACE_TRACE_PACKET_ID packetId) const
+    {
+        return false;
+    }
+
+    virtual QString get_packet_string(const vktrace_trace_packet_header* pHeader) const
+    {
+        switch (pHeader->packet_id)
+        {
+            case VKTRACE_TPI_MESSAGE:
+            {
+                vktrace_trace_packet_message* pPacket = (vktrace_trace_packet_message*)pHeader->pBody;
+                return QString(pPacket->message);
+            }
+            case VKTRACE_TPI_MARKER_CHECKPOINT:
+            case VKTRACE_TPI_MARKER_API_BOUNDARY:
+            case VKTRACE_TPI_MARKER_API_GROUP_BEGIN:
+            case VKTRACE_TPI_MARKER_API_GROUP_END:
+            case VKTRACE_TPI_MARKER_TERMINATE_PROCESS:
+            default:
+            {
+                return QString("%1").arg(pHeader->packet_id);
+            }
+        }
+    }
+
+    virtual QString get_packet_string_multiline(const vktrace_trace_packet_header* pHeader) const
+    {
+        // Default implementation is naive and just returns the single-line string.
+        return get_packet_string(pHeader);
+    }
+
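+    // The model is flat: every packet in the trace file is a top-level row,
+    // and child indices report zero rows.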
+    int rowCount(const QModelIndex &parent = QModelIndex()) const
+    {
+        if (parent.column() > 0)
+        {
+            return 0;
+        }
+
+        int rowCount = 0;
+        if (m_pTraceFileInfo != NULL)
+        {
+            rowCount = m_pTraceFileInfo->packetCount;
+        }
+
+        if (parent.isValid())
+        {
+            // there is a valid parent, so this is a child node, which has no rows
+            rowCount = 0;
+        }
+
+        return rowCount;
+    }
+
+    enum Columns
+    {
+        Column_EntrypointName,
+        Column_TracerId,
+        Column_PacketIndex,
+        Column_ThreadId,
+        Column_BeginTime,
+        Column_EndTime,
+        Column_PacketSize,
+        Column_CpuDuration,
+        cNumColumns
+    };
+
+    int columnCount(const QModelIndex &parent = QModelIndex()) const
+    {
+        return cNumColumns;
+    }
+
+    QVariant data(const QModelIndex &index, int role = Qt::DisplayRole) const
+    {
+        if (m_pTraceFileInfo == NULL)
+        {
+            return QVariant();
+        }
+
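+        // Draw-call packets are shown in bold; isDrawCall() is a virtual hook
+        // that API-specific models can override.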
+        if (role == Qt::FontRole)
+        {
+            vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)this->index(index.row(), Column_EntrypointName, index.parent()).internalPointer();
+            if (isDrawCall((VKTRACE_TRACE_PACKET_ID)pHeader->packet_id))
+            {
+                QFont font;
+                font.setBold(true);
+                return font;
+            }
+        }
+
+        if (role == Qt::SizeHintRole)
+        {
+            return QSize(20, 20);
+        }
+
+        if (role == Qt::BackgroundRole && !m_searchString.isEmpty())
+        {
+            QVariant cellData = data(index, Qt::DisplayRole);
+            QString string = cellData.toString();
+            if (string.contains(m_searchString, Qt::CaseInsensitive))
+            {
+                return QColor(Qt::yellow);
+            }
+        }
+
+        if (role == Qt::DisplayRole)
+        {
+            switch (index.column())
+            {
+                case Column_EntrypointName:
+                {
+                    vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)index.internalPointer();
+                    QString apiStr = this->get_packet_string(pHeader);
+                    return apiStr;
+                }
+                case Column_TracerId:
+                    return QVariant(*(uint8_t*)index.internalPointer());
+                case Column_PacketIndex:
+                    // global_packet_index is 64-bit; reading it as a uint32_t would truncate it.
+                    return QVariant(*(unsigned long long*)index.internalPointer());
+                case Column_ThreadId:
+                    return QVariant(*(uint32_t*)index.internalPointer());
+                case Column_BeginTime:
+                case Column_EndTime:
+                case Column_PacketSize:
+                    return QVariant(*(unsigned long long*)index.internalPointer());
+                case Column_CpuDuration:
+                {
+                    vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)index.internalPointer();
+                    uint64_t duration = pHeader->entrypoint_end_time - pHeader->entrypoint_begin_time;
+                    return QVariant((unsigned int)duration);
+                }
+            }
+        }
+
+        if (role == Qt::ToolTipRole && index.column() == Column_EntrypointName)
+        {
+            vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)index.internalPointer();
+            QString tip;
+            tip += "<html><table>";
+#if defined(_DEBUG)
+            tip += "<tr><td><b>Packet header (pHeader->):</b></td><td/></tr>";
+            tip += QString("<tr><td>size</td><td>= %1 bytes</td></tr>").arg(pHeader->size);
+            tip += QString("<tr><td>global_packet_index</td><td>= %1</td></tr>").arg(pHeader->global_packet_index);
+            tip += QString("<tr><td>tracer_id</td><td>= %1</td></tr>").arg(pHeader->tracer_id);
+            tip += QString("<tr><td>packet_id</td><td>= %1</td></tr>").arg(pHeader->packet_id);
+            tip += QString("<tr><td>thread_id</td><td>= %1</td></tr>").arg(pHeader->thread_id);
+            tip += QString("<tr><td>vktrace_begin_time</td><td>= %1</td></tr>").arg(pHeader->vktrace_begin_time);
+            tip += QString("<tr><td>entrypoint_begin_time</td><td>= %1</td></tr>").arg(pHeader->entrypoint_begin_time);
+            tip += QString("<tr><td>entrypoint_end_time</td><td>= %1 (%2)</td></tr>").arg(pHeader->entrypoint_end_time).arg(pHeader->entrypoint_end_time - pHeader->entrypoint_begin_time);
+            tip += QString("<tr><td>vktrace_end_time</td><td>= %1 (%2)</td></tr>").arg(pHeader->vktrace_end_time).arg(pHeader->vktrace_end_time - pHeader->vktrace_begin_time);
+            tip += QString("<tr><td>next_buffers_offset</td><td>= %1</td></tr>").arg(pHeader->next_buffers_offset);
+            tip += QString("<tr><td>pBody</td><td>= %1</td></tr>").arg(pHeader->pBody);
+            tip += "<br>";
+#endif
+            tip += "<tr><td><b>";
+            QString multiline = this->get_packet_string_multiline(pHeader);
+            // only replaces the first '('
+            multiline.replace(multiline.indexOf("("), 1, "</b>(</td><td/></tr><tr><td>");
+            multiline.replace(", ", ", </td></tr><tr><td>");
+            multiline.replace(" = ", "</td><td>= ");
+
+            // only replaces the final ')'
+            multiline.replace(multiline.lastIndexOf(")"), 1, "</td></tr><tr><td>)</td><td>");
+            tip += multiline;
+            tip += "</td></tr></table>";
+            tip += "</html>";
+            return tip;
+        }
+
+        return QVariant();
+    }
+
+    QModelIndex index(int row, int column, const QModelIndex& parent = QModelIndex()) const
+    {
+        if (m_pTraceFileInfo == NULL || m_pTraceFileInfo->packetCount == 0)
+        {
+            return createIndex(row, column);
+        }
+
+        if ((uint64_t)row >= m_pTraceFileInfo->packetCount)
+        {
+            return QModelIndex();
+        }
+
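+        // Stash a pointer to the column's backing field inside the
+        // QModelIndex so data() can read it back without re-deriving the
+        // packet header.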
+        vktrace_trace_packet_header* pHeader = m_pTraceFileInfo->pPacketOffsets[row].pHeader;
+        void* pData = NULL;
+        switch (column)
+        {
+        case Column_EntrypointName:
+            pData = pHeader;
+            break;
+        case Column_TracerId:
+            pData = &pHeader->tracer_id;
+            break;
+        case Column_PacketIndex:
+            pData = &pHeader->global_packet_index;
+            break;
+        case Column_ThreadId:
+            pData = &pHeader->thread_id;
+            break;
+        case Column_BeginTime:
+            pData = &pHeader->entrypoint_begin_time;
+            break;
+        case Column_EndTime:
+            pData = &pHeader->entrypoint_end_time;
+            break;
+        case Column_PacketSize:
+            pData = &pHeader->size;
+            break;
+        case Column_CpuDuration:
+            pData = pHeader;
+            break;
+        }
+
+        return createIndex(row, column, pData);
+    }
+
+    QModelIndex parent(const QModelIndex& index) const
+    {
+        return QModelIndex();
+    }
+
+    QVariant headerData(int section, Qt::Orientation orientation, int role) const
+    {
+        if (role == Qt::DisplayRole)
+        {
+            if (orientation == Qt::Horizontal)
+            {
+                switch (section)
+                {
+                case Column_EntrypointName:
+                    return QString("API Call");
+                case Column_TracerId:
+                    return QString("Tracer ID");
+                case Column_PacketIndex:
+                    return QString("Index");
+                case Column_ThreadId:
+                    return QString("Thread ID");
+                case Column_BeginTime:
+                    return QString("Start Time");
+                case Column_EndTime:
+                    return QString("End Time");
+                case Column_PacketSize:
+                    return QString("Size (bytes)");
+                case Column_CpuDuration:
+                    return QString("Duration");
+                }
+            }
+        }
+        return QVariant();
+    }
+
+    void set_highlight_search_string(const QString& searchString)
+    {
+        m_searchString = searchString;
+    }
+
+private:
+    vktraceviewer_trace_file_info* m_pTraceFileInfo;
+    QString m_searchString;
+
+};
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_apicallitem.h b/vktrace/src/vktrace_viewer/vktraceviewer_apicallitem.h
new file mode 100644
index 0000000..3527954
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_apicallitem.h
@@ -0,0 +1,165 @@
+/**************************************************************************
+ *
+ * Copyright 2013-2014 RAD Game Tools and Valve Software
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ **************************************************************************/
+
+#ifndef GLVDEBUG_APICALLITEM_H
+#define GLVDEBUG_APICALLITEM_H
+
+//#include "glvdebug_snapshotitem.h"
+#include "glv_trace_packet_identifiers.h"
+
+// predeclared classes
+class glvdebug_apiCallTreeItem;
+//class glvdebug_frameItem;
+//class glvdebug_groupItem;
+//class vogl_trace_packet;
+
+class glvdebug_apiCallItem //: public vogleditor_snapshotItem
+{
+public:
+    glvdebug_apiCallItem(glvdebug_apiCallTreeItem* pParent, vktrace_trace_packet_header* pTracePacket)
+        //: m_pParentFrame(pFrame),
+        //  m_glPacket(glPacket),
+          //m_pTracePacket(pTracePacket),
+          //m_globalCallIndex(glPacket.m_call_counter),
+          //m_begin_rdtsc(glPacket.m_packet_begin_rdtsc),
+          //m_end_rdtsc(glPacket.m_packet_end_rdtsc),
+          //m_backtrace_hash_index(glPacket.m_backtrace_hash_index)
+    {
+        // Both parameters are consumed only by the commented-out vogl port;
+        // reference them to avoid unused-parameter warnings.
+        (void)pParent;
+        (void)pTracePacket;
+        //if (m_end_rdtsc < m_begin_rdtsc)
+        //{
+        //    m_end_rdtsc = m_begin_rdtsc + 1;
+        //}
+    }
+
+    ~glvdebug_apiCallItem()
+    {
+        //if (m_pTracePacket != NULL)
+        //{
+        //    vogl_delete(m_pTracePacket);
+        //    m_pTracePacket = NULL;
+        //}
+    }
+
+//    inline vogleditor_frameItem *frame() const
+//    {
+//        return m_pParentFrame;
+//    }
+//
+//    inline vogleditor_groupItem *group() const
+//    {
+//        return m_pParentGroup;
+//    }
+//
+//    inline uint64_t globalCallIndex() const
+//    {
+//        return m_globalCallIndex;
+//    }
+//
+//    inline uint64_t startTime() const
+//    {
+//        return m_begin_rdtsc;
+//    }
+//
+//    inline uint64_t endTime() const
+//    {
+//        return m_end_rdtsc;
+//    }
+//
+//    inline uint64_t duration() const
+//    {
+//        return endTime() - startTime();
+//    }
+//
+//    const vogl_trace_gl_entrypoint_packet *getGLPacket() const
+//    {
+//        return &m_glPacket;
+//    }
+//
+//    vogl_trace_packet *getTracePacket()
+//    {
+//        return m_pTracePacket;
+//    }
+//
+//    inline uint64_t backtraceHashIndex() const
+//    {
+//        return m_backtrace_hash_index;
+//    }
+//
+//    // Returns the api function call and its args as a string
+//    QString apiFunctionCall()
+//    {
+//        const gl_entrypoint_desc_t &entrypoint_desc = g_vogl_entrypoint_descs[m_pTracePacket->get_entrypoint_id()];
+//
+//        QString funcCall = entrypoint_desc.m_pName;
+//
+//        // format parameters
+//        funcCall.append("( ");
+//        dynamic_string paramStr;
+//        for (uint param_index = 0; param_index < m_pTracePacket->total_params(); param_index++)
+//        {
+//            if (param_index != 0)
+//                funcCall.append(", ");
+//
+//            paramStr.clear();
+//            m_pTracePacket->pretty_print_param(paramStr, param_index, false);
+//
+//            funcCall.append(paramStr.c_str());
+//        }
+//        funcCall.append(" )");
+//
+//        if (m_pTracePacket->has_return_value())
+//        {
+//            funcCall.append(" = ");
+//            paramStr.clear();
+//            m_pTracePacket->pretty_print_return_value(paramStr, false);
+//            funcCall.append(paramStr.c_str());
+//        }
+//        return funcCall;
+//    }
+//
+//    // Returns the string argument of an apicall in apiFunctionCall() output format
+//    //
+//    // TODO: (as needed) Add logic to return which string (argument count) from
+//    //                   a multi-string argument list (or all as a QStringList)
+//    QString stringArg()
+//    {
+//        QString apiCall = apiFunctionCall();
+//
+//        QString sec, name;
+//        int start = 1;
+//        while (!(sec = apiCall.section('\'', start, start)).isEmpty())
+//        {
+//            name.append(sec);
+//            start += 2;
+//        }
+//        return name;
+//    }
+//
+//private:
+//    glvdebug_apiCallTreeItem *m_pParentFrame;
+//    glvdebug_groupItem *m_pParentGroup;
+////    const vogl_trace_gl_entrypoint_packet m_glPacket;
+////    vogl_trace_packet *m_pTracePacket;
+//
+//    uint64_t m_globalCallIndex;
+//    uint64_t m_begin_rdtsc;
+//    uint64_t m_end_rdtsc;
+//    uint64_t m_backtrace_hash_index;
+};
+
+#endif // GLVDEBUG_APICALLITEM_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_apicalltreeitem.cpp b/vktrace/src/vktrace_viewer/vktraceviewer_apicalltreeitem.cpp
new file mode 100644
index 0000000..959cf7b
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_apicalltreeitem.cpp
@@ -0,0 +1,419 @@
+/**************************************************************************
+ *
+ * Copyright 2013-2014 RAD Game Tools and Valve Software
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ **************************************************************************/
+
+#include <QColor>
+#include <QIcon>
+
+#include "glvdebug_apicalltreeitem.h"
+#include "glvdebug_groupitem.h"
+#include "glvdebug_qapicalltreemodel.h"
+#include "glvdebug_frameitem.h"
+
+//#include "vogl_common.h"
+//#include "vogl_trace_file_reader.h"
+//#include "vogl_trace_packet.h"
+//#include "vogl_trace_stream_types.h"
+//#include "glvdebug_gl_state_snapshot.h"
+//#include "glvdebug_settings.h"
+
+// Constructor for root node
+glvdebug_apiCallTreeItem::glvdebug_apiCallTreeItem(int columnCount, glvdebug_QApiCallTreeModel *pModel)
+    : m_parentItem(NULL),
+      //m_pApiCallItem(NULL),
+      //m_pGroupItem(NULL),
+      //m_pFrameItem(NULL),
+      m_pModel(pModel),
+      m_localRowIndex(0),
+      m_columnCount(columnCount)
+{
+    m_columnData = new QVariant[m_columnCount];
+    //m_columnData[VOGL_ACTC_APICALL] = "API Call";
+    //m_columnData[VOGL_ACTC_INDEX] = "Index";
+    //m_columnData[VOGL_ACTC_FLAGS] = "";
+    //m_columnData[VOGL_ACTC_GLCONTEXT] = "GL Context";
+    ////m_ColumnTitles[VOGL_ACTC_BEGINTIME] = "Begin Time";
+    ////m_ColumnTitles[VOGL_ACTC_ENDTIME] = "End Time";
+    //m_columnData[VOGL_ACTC_DURATION] = "Duration (ns)";
+    m_columnData[0] = "API Call";
+    m_columnData[1] = "Index";
+    m_columnData[2] = "";
+    m_columnData[3] = "GL Context";
+    m_columnData[4] = "Duration (ns)";
+}
+//
+//// Constructor for frame nodes
+//glvdebug_apiCallTreeItem::glvdebug_apiCallTreeItem(glvdebug_frameItem *frameItem, glvdebug_apiCallTreeItem *parent)
+//    : m_parentItem(parent),
+//      m_pApiCallItem(NULL),
+//      m_pGroupItem(NULL),
+//      m_pFrameItem(frameItem),
+//      m_pModel(NULL),
+//      m_localRowIndex(0)
+//{
+//    if (frameItem != NULL)
+//    {
+//        QString tmp;
+//        tmp.sprintf("Frame %llu", frameItem->frameNumber());
+//        m_columnData[VOGL_ACTC_APICALL] = tmp;
+//    }
+//
+//    if (m_parentItem != NULL)
+//    {
+//        m_pModel = m_parentItem->m_pModel;
+//    }
+//}
+//
+//// Constructor for group nodes
+//glvdebug_apiCallTreeItem::glvdebug_apiCallTreeItem(glvdebug_groupItem *groupItem, glvdebug_apiCallTreeItem *parent)
+//    : m_parentItem(parent),
+//      m_pApiCallItem(NULL),
+//      m_pGroupItem(groupItem),
+//      m_pFrameItem(NULL),
+//      m_pModel(NULL),
+//      m_localRowIndex(0)
+//{
+//    m_columnData[VOGL_ACTC_APICALL] = cTREEITEM_STATECHANGES;
+//    if (m_parentItem != NULL)
+//    {
+//        m_pModel = m_parentItem->m_pModel;
+//    }
+//}
+
+// Constructor for apiCall nodes
+glvdebug_apiCallTreeItem::glvdebug_apiCallTreeItem(glvdebug_apiCallItem *apiCallItem)
+    : m_parentItem(NULL),
+      //m_pApiCallItem(apiCallItem),
+      //m_pGroupItem(NULL),
+      //m_pFrameItem(NULL),
+      m_pModel(NULL),
+      m_localRowIndex(0)
+{
+    (void)apiCallItem; // consumed only by the commented-out vogl port below
+    //m_columnData[VOGL_ACTC_APICALL] = apiCallItem->apiFunctionCall();
+
+    //if (apiCallItem != NULL)
+    //{
+    //    m_columnData[VOGL_ACTC_INDEX] = (qulonglong)apiCallItem->globalCallIndex();
+    //    m_columnData[VOGL_ACTC_FLAGS] = "";
+    //    dynamic_string strContext;
+    //    m_columnData[VOGL_ACTC_GLCONTEXT] = strContext.format("0x%" PRIx64, apiCallItem->getGLPacket()->m_context_handle).c_str();
+    //    //m_columnData[VOGL_ACTC_BEGINTIME] = apiCallItem->startTime();
+    //    //m_columnData[VOGL_ACTC_ENDTIME] = apiCallItem->endTime();
+    //    m_columnData[VOGL_ACTC_DURATION] = (qulonglong)apiCallItem->duration();
+    //}
+
+    //if (m_parentItem != NULL)
+    //{
+    //    m_pModel = m_parentItem->m_pModel;
+    //}
+
+    m_columnCount = m_pModel->columnCount();
+    m_columnData = new QVariant[m_columnCount];
+    //m_columnData[VOGL_ACTC_APICALL] = "API Call";
+    //m_columnData[VOGL_ACTC_INDEX] = "Index";
+    //m_columnData[VOGL_ACTC_FLAGS] = "";
+    //m_columnData[VOGL_ACTC_GLCONTEXT] = "GL Context";
+    ////m_ColumnTitles[VOGL_ACTC_BEGINTIME] = "Begin Time";
+    ////m_ColumnTitles[VOGL_ACTC_ENDTIME] = "End Time";
+    //m_columnData[VOGL_ACTC_DURATION] = "Duration (ns)";
+    for (int i = 0; i < m_columnCount; i++)
+    {
+        m_columnData[i] = "data";
+    }
+}
+
+glvdebug_apiCallTreeItem::~glvdebug_apiCallTreeItem()
+{
+    delete [] m_columnData;
+    //if (m_pFrameItem != NULL)
+    //{
+    //    vogl_delete(m_pFrameItem);
+    //    m_pFrameItem = NULL;
+    //}
+
+    //if (m_pGroupItem != NULL)
+    //{
+    //    vogl_delete(m_pGroupItem);
+    //    m_pGroupItem = NULL;
+    //}
+
+    //if (m_pApiCallItem != NULL)
+    //{
+    //    vogl_delete(m_pApiCallItem);
+    //    m_pApiCallItem = NULL;
+    //}
+
+    //for (int i = 0; i < m_childItems.size(); i++)
+    //{
+    //    vogl_delete(m_childItems[i]);
+    //    m_childItems[i] = NULL;
+    //}
+    m_childItems.clear();
+}
+
+void glvdebug_apiCallTreeItem::setParent(glvdebug_apiCallTreeItem* pParent)
+{
+    m_parentItem = pParent;
+    if (m_parentItem != NULL)
+    {
+        m_pModel = m_parentItem->m_pModel;
+    }
+}
+glvdebug_apiCallTreeItem *glvdebug_apiCallTreeItem::parent() const
+{
+    return m_parentItem;
+}
+//bool glvdebug_apiCallTreeItem::isApiCall() const
+//{
+//    return m_pApiCallItem != NULL;
+//}
+//bool glvdebug_apiCallTreeItem::isGroup() const
+//{
+//    return (g_settings.groups_state_render() && (m_pGroupItem != NULL));
+//}
+//bool glvdebug_apiCallTreeItem::isFrame() const
+//{
+//    return m_pFrameItem != NULL;
+//}
+//bool glvdebug_apiCallTreeItem::isRoot() const
+//{
+//    return !(isApiCall() | isGroup() | isFrame());
+//}
+
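+// Children are appended in trace order; each child caches its row within the
+// parent so row() stays O(1).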
+void glvdebug_apiCallTreeItem::appendChild(glvdebug_apiCallTreeItem *pChild)
+{
+    pChild->m_localRowIndex = m_childItems.size();
+    pChild->setParent(this);
+    m_childItems.append(pChild);
+}
+
+void glvdebug_apiCallTreeItem::popChild()
+{
+    m_childItems.removeLast();
+}
+
+int glvdebug_apiCallTreeItem::childCount() const
+{
+    return m_childItems.size();
+}
+
+glvdebug_apiCallTreeItem *glvdebug_apiCallTreeItem::child(int index) const
+{
+    if (index < 0 || index >= childCount())
+    {
+        return NULL;
+    }
+
+    return m_childItems[index];
+}
+
+glvdebug_apiCallItem *glvdebug_apiCallTreeItem::apiCallItem() const
+{
+    // The backing m_pApiCallItem member is commented out until the per-item
+    // data is ported, so report "no item" instead of referencing it.
+    return NULL;
+}
+
+glvdebug_groupItem *glvdebug_apiCallTreeItem::groupItem() const
+{
+    return NULL;
+}
+
+glvdebug_frameItem *glvdebug_apiCallTreeItem::frameItem() const
+{
+    return NULL;
+}
+//
+//uint64_t glvdebug_apiCallTreeItem::startTime() const
+//{
+//    uint64_t startTime = 0;
+//
+//    if (m_pApiCallItem)
+//    {
+//        startTime = m_pApiCallItem->startTime();
+//    }
+//    else if (m_pGroupItem)
+//    {
+//        startTime = m_pGroupItem->startTime();
+//    }
+//    else if (m_pFrameItem)
+//    {
+//        startTime = m_pFrameItem->startTime();
+//    }
+//    else // root
+//    {
+//        startTime = child(0)->startTime();
+//    }
+//    return startTime;
+//}
+//
+//uint64_t glvdebug_apiCallTreeItem::endTime() const
+//{
+//    uint64_t endTime = 0;
+//
+//    if (m_pApiCallItem)
+//    {
+//        endTime = m_pApiCallItem->endTime();
+//    }
+//    else if (m_pGroupItem)
+//    {
+//        endTime = m_pGroupItem->endTime();
+//    }
+//    else if (m_pFrameItem)
+//    {
+//        endTime = m_pFrameItem->endTime();
+//    }
+//    else // root
+//    {
+//        endTime = child(childCount() - 1)->endTime();
+//    }
+//    return endTime;
+//}
+//
+//uint64_t glvdebug_apiCallTreeItem::duration() const
+//{
+//    return endTime() - startTime();
+//}
+
+//void glvdebug_apiCallTreeItem::set_snapshot(glvdebug_gl_state_snapshot *pSnapshot)
+//{
+//    if (m_pFrameItem)
+//    {
+//        m_pFrameItem->set_snapshot(pSnapshot);
+//    }
+//
+//    if (m_pApiCallItem)
+//    {
+//        m_pApiCallItem->set_snapshot(pSnapshot);
+//    }
+//}
+//
+//bool glvdebug_apiCallTreeItem::has_snapshot() const
+//{
+//    bool bHasSnapshot = false;
+//    if (m_pFrameItem)
+//    {
+//        bHasSnapshot = m_pFrameItem->has_snapshot();
+//    }
+//
+//    if (m_pApiCallItem)
+//    {
+//        bHasSnapshot = m_pApiCallItem->has_snapshot();
+//    }
+//    return bHasSnapshot;
+//}
+//
+//glvdebug_gl_state_snapshot *glvdebug_apiCallTreeItem::get_snapshot() const
+//{
+//    glvdebug_gl_state_snapshot *pSnapshot = NULL;
+//    if (m_pFrameItem)
+//    {
+//        pSnapshot = m_pFrameItem->get_snapshot();
+//    }
+//
+//    if (m_pApiCallItem)
+//    {
+//        pSnapshot = m_pApiCallItem->get_snapshot();
+//    }
+//    return pSnapshot;
+//}
+
+int glvdebug_apiCallTreeItem::columnCount() const
+{
+    int count = 0;
+    if (m_parentItem == NULL)
+    {
+        count = m_columnCount;
+    }
+    else
+    {
+        // The return value was previously discarded here, leaving count at 0
+        // for every non-root item.
+        count = m_pModel->columnCount();
+    }
+
+    return count;
+}
+
+QVariant glvdebug_apiCallTreeItem::columnData(int column, int role) const
+{
+    if (column >= m_columnCount)
+    {
+        assert(!"Unexpected column data being requested");
+        return QVariant();
+    }
+
+    if (role == Qt::DecorationRole)
+    {
+        //// handle flags
+        //if (column == VOGL_ACTC_FLAGS)
+        //{
+        //    if (has_snapshot())
+        //    {
+        //        if (get_snapshot()->is_outdated())
+        //        {
+        //            // snapshot was dirtied due to an earlier edit
+        //            return QColor(200, 0, 0);
+        //        }
+        //        else if (get_snapshot()->is_edited())
+        //        {
+        //            // snapshot has been edited
+        //            return QColor(200, 102, 0);
+        //        }
+        //        else
+        //        {
+        //            // snapshot is good
+        //            return QColor(0, 0, 255);
+        //        }
+        //    }
+        //    else if (frameItem() != NULL && frameItem()->get_screenshot_filename().size() > 0)
+        //    {
+        //        return QIcon(frameItem()->get_screenshot_filename().c_str());
+        //    }
+        //}
+    }
+
+    if (role == Qt::DisplayRole)
+    {
+        return m_columnData[column];
+    }
+
+    return QVariant();
+}
+
+//void glvdebug_apiCallTreeItem::setApiCallColumnData(QString name)
+//{
+//    setColumnData(QVariant(name), VOGL_ACTC_APICALL);
+//}
+
+//void glvdebug_apiCallTreeItem::setColumnData(QVariant data, int column)
+//{
+//    m_columnData[column] = data;
+//}
+
+//QString glvdebug_apiCallTreeItem::apiCallColumnData() const
+//{
+//    return (columnData(VOGL_ACTC_APICALL, Qt::DisplayRole)).toString();
+//}
+//
+//QString glvdebug_apiCallTreeItem::apiCallStringArg() const
+//{
+//    return isApiCall() ? apiCallItem()->stringArg() : QString();
+//}
+
+int glvdebug_apiCallTreeItem::row() const
+{
+    // note, this is just the row within the current level of the hierarchy
+    return m_localRowIndex;
+}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_apicalltreeitem.h b/vktrace/src/vktrace_viewer/vktraceviewer_apicalltreeitem.h
new file mode 100644
index 0000000..af29593
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_apicalltreeitem.h
@@ -0,0 +1,97 @@
+/**************************************************************************
+ *
+ * Copyright 2013-2014 RAD Game Tools and Valve Software
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ **************************************************************************/
+
+#ifndef GLVDEBUG_APICALLTREEITEM_H
+#define GLVDEBUG_APICALLTREEITEM_H
+
+#include <QList>
+#include <QVariant>
+
+#include <stdint.h> // uint64_t (avoid redefining a standard type)
+
+class glvdebug_frameItem;
+class glvdebug_groupItem;
+class glvdebug_apiCallItem;
+
+class glvdebug_QApiCallTreeModel;
+
+
+const QString cTREEITEM_STATECHANGES("State changes");
+// TODO: Maybe think about a more unique name so as not to be confused with,
+//       e.g., a marker_push entrypoint that has also been named "Render"
+const QString cTREEITEM_RENDER("Render");
+
+class glvdebug_apiCallTreeItem
+{
+public:
+    // Constructor for the root node
+    glvdebug_apiCallTreeItem(int columnCount, glvdebug_QApiCallTreeModel *pModel);
+
+    //// Constructor for frame nodes
+    //glvdebug_apiCallTreeItem(glvdebug_frameItem *frameItem, glvdebug_apiCallTreeItem *parent);
+
+    //// Constructor for group nodes
+    //glvdebug_apiCallTreeItem(glvdebug_groupItem *groupItem, glvdebug_apiCallTreeItem *parent);
+
+    // Constructor for apiCall nodes
+    glvdebug_apiCallTreeItem(glvdebug_apiCallItem *apiCallItem);
+
+    ~glvdebug_apiCallTreeItem();
+
+    void setParent(glvdebug_apiCallTreeItem* pParent);
+    glvdebug_apiCallTreeItem *parent() const;
+
+    void appendChild(glvdebug_apiCallTreeItem *pChild);
+    void popChild();
+
+    int childCount() const;
+
+    glvdebug_apiCallTreeItem *child(int index) const;
+
+    glvdebug_apiCallItem *apiCallItem() const;
+    glvdebug_groupItem *groupItem() const;
+    glvdebug_frameItem *frameItem() const;
+    
+    int columnCount() const;
+
+    QVariant columnData(int column, int role) const;
+
+    int row() const;
+
+    //bool isApiCall() const;
+    //bool isGroup() const;
+    //bool isFrame() const;
+    //bool isRoot() const;
+
+private:
+//    void setColumnData(QVariant data, int column);
+
+private:
+    QList<glvdebug_apiCallTreeItem *> m_childItems;
+    QVariant* m_columnData;
+    int m_columnCount;
+    glvdebug_apiCallTreeItem *m_parentItem;
+    //glvdebug_apiCallItem *m_pApiCallItem;
+    //glvdebug_groupItem *m_pGroupItem;
+    //glvdebug_frameItem *m_pFrameItem;
+    glvdebug_QApiCallTreeModel *m_pModel;
+    int m_localRowIndex;
+};
+
+#endif // GLVDEBUG_APICALLTREEITEM_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_controller.h b/vktrace/src/vktrace_viewer/vktraceviewer_controller.h
new file mode 100644
index 0000000..753831d
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_controller.h
@@ -0,0 +1,56 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#pragma once
+#include "vktraceviewer_trace_file_utils.h"
+#include "vktraceviewer_view.h"
+#include "vktrace_settings.h"
+
+#include <QObject>
+
+class vktraceviewer_QController : public QObject
+{
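+    // Note: this interface is not moc-processed (no Q_OBJECT), so the
+    // signals below describe the contract that moc-processed concrete
+    // controllers must provide for emission to link.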
+public:
+    vktraceviewer_QController() {}
+    virtual ~vktraceviewer_QController() {}
+
+    virtual const char* GetPacketIdString(uint16_t packetId) = 0;
+    virtual vktrace_SettingGroup* GetSettings() = 0;
+    virtual void UpdateFromSettings(vktrace_SettingGroup* pSettingGroups, unsigned int numSettingGroups) = 0;
+    virtual vktrace_trace_packet_header* InterpretTracePacket(vktrace_trace_packet_header* pHeader) = 0;
+    virtual bool LoadTraceFile(vktraceviewer_trace_file_info* pTraceFileInfo, vktraceviewer_view* pView) = 0;
+    virtual void UnloadTraceFile(void) = 0;
+
+public slots:
+
+signals:
+    void OutputMessage(VktraceLogLevel level, const QString& message);
+    void OutputMessage(VktraceLogLevel level, uint64_t packetIndex, const QString& message);
+};
+
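+// Controllers are discovered at runtime: the viewer opens the controller
+// library and resolves the two entrypoints below by name (see
+// vktraceviewer_controller_factory::Load).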
+extern "C"
+{
+VKTRACER_EXPORT vktraceviewer_QController* VKTRACER_CDECL vtvCreateQController(void);
+VKTRACER_EXPORT void VKTRACER_CDECL vtvDeleteQController(vktraceviewer_QController* pController); // must match funcptr_vktraceviewer_DeleteQController below
+
+// entrypoints that must be exposed by each controller library
+typedef vktraceviewer_QController* (VKTRACER_CDECL *funcptr_vktraceviewer_CreateQController)(void);
+typedef void (VKTRACER_CDECL *funcptr_vktraceviewer_DeleteQController)(vktraceviewer_QController* pController);
+}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_controller_factory.cpp b/vktrace/src/vktrace_viewer/vktraceviewer_controller_factory.cpp
new file mode 100644
index 0000000..265b2c2
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_controller_factory.cpp
@@ -0,0 +1,92 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#include "vktraceviewer_controller_factory.h"
+#include "vktrace_platform.h"
+
+vktraceviewer_controller_factory::vktraceviewer_controller_factory()
+{
+}
+
+vktraceviewer_controller_factory::~vktraceviewer_controller_factory()
+{
+}
+
+vktraceviewer_QController *vktraceviewer_controller_factory::Load(const char* filename)
+{
+    void* pLibrary = vktrace_platform_open_library(filename);
+    if (pLibrary == NULL)
+    {
+        vktrace_LogError("Failed to load controller '%s'", filename);
+#if defined(PLATFORM_LINUX)
+        char* error = dlerror();
+        vktrace_LogError("Due to: %s", error);
+#endif
+        return NULL;
+    }
+
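+    // Resolve both required entrypoints up front; the controller is only
+    // usable if it can be both created and destroyed through its exports.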
+    vktraceviewer_QController* pController = NULL;
+    funcptr_vktraceviewer_CreateQController CreateQController = (funcptr_vktraceviewer_CreateQController)vktrace_platform_get_library_entrypoint(pLibrary, "vtvCreateQController");
+    funcptr_vktraceviewer_DeleteQController DeleteQController = (funcptr_vktraceviewer_DeleteQController)vktrace_platform_get_library_entrypoint(pLibrary, "vtvDeleteQController");
+    if (CreateQController == NULL)
+    {
+        vktrace_LogError("Controller '%s' is missing entrypoint 'vtvCreateQController'.\n", filename);
+    }
+    if (DeleteQController == NULL)
+    {
+        vktrace_LogError("Controller '%s' is missing entrypoint 'vtvDeleteQController'.\n", filename);
+    }
+
+    if (CreateQController != NULL &&
+        DeleteQController != NULL)
+    {
+        pController = CreateQController();
+    }
+
+    if (pController != NULL)
+    {
+        m_controllerToLibraryMap[pController] = pLibrary;
+    }
+
+    return pController;
+}
+
+void vktraceviewer_controller_factory::Unload(vktraceviewer_QController** ppController)
+{
+    assert(ppController != NULL);
+    assert(*ppController != NULL);
+
+    void* pLibrary = m_controllerToLibraryMap[*ppController];
+    if (pLibrary == NULL)
+    {
+        vktrace_LogError("NULL Library encountered while unloading controller.");
+    }
+    else
+    {
+        funcptr_vktraceviewer_DeleteQController DeleteQController = (funcptr_vktraceviewer_DeleteQController)vktrace_platform_get_library_entrypoint(pLibrary, "vtvDeleteQController");
+        if (DeleteQController != NULL)
+        {
+            DeleteQController(*ppController);
+            *ppController = NULL;
+        }
+
+        vktrace_platform_close_library(pLibrary);
+    }
+}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_controller_factory.h b/vktrace/src/vktrace_viewer/vktraceviewer_controller_factory.h
new file mode 100644
index 0000000..cd5242a
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_controller_factory.h
@@ -0,0 +1,46 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#ifndef VKTRACEVIEWER_CONTROLLER_FACTORY_H
+#define VKTRACEVIEWER_CONTROLLER_FACTORY_H
+
+#include <QMap>
+
+extern "C" {
+#include "vktrace_common.h"
+#include "vktrace_trace_packet_identifiers.h"
+}
+
+#include "vktraceviewer_controller.h"
+
+class vktraceviewer_controller_factory
+{
+public:
+    vktraceviewer_controller_factory();
+    ~vktraceviewer_controller_factory();
+
+    vktraceviewer_QController* Load(const char* filename);
+    void Unload(vktraceviewer_QController** ppController);
+
+private:
+    QMap<vktraceviewer_QController*, void*> m_controllerToLibraryMap;
+};
+
+#endif // VKTRACEVIEWER_CONTROLLER_FACTORY_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_frameitem.h b/vktrace/src/vktrace_viewer/vktraceviewer_frameitem.h
new file mode 100644
index 0000000..853f011
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_frameitem.h
@@ -0,0 +1,137 @@
+/**************************************************************************
+ *
+ * Copyright 2013-2014 RAD Game Tools and Valve Software
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ **************************************************************************/
+
+#ifndef VOGLEDITOR_FRAMEITEM_H
+#define VOGLEDITOR_FRAMEITEM_H
+
+#include <QList>
+
+// external class (could be predeclared if
+// definitions were moved to a .cpp file)
+#include "vktraceviewer_apicallitem.h"
+
+// only pointers are stored below, so a forward declaration suffices
+class vogleditor_groupItem;
+
+class vogleditor_frameItem //: public vogleditor_snapshotItem
+{
+public:
+    vogleditor_frameItem(uint64_t frameNumber)
+        : m_frameNumber(frameNumber)
+    {
+    }
+
+    ~vogleditor_frameItem()
+    {
+        m_apiCallList.clear();
+        m_groupList.clear();
+    }
+
+    //inline uint64_t frameNumber() const
+    //{
+    //    return m_frameNumber;
+    //}
+
+    //void appendGroup(vogleditor_groupItem *pItem)
+    //{
+    //    m_groupList.append(pItem);
+    //}
+
+    //vogleditor_apiCallItem *popApiCall()
+    //{
+    //    return m_apiCallList.takeLast();
+    //}
+
+    //vogleditor_groupItem *popGroup()
+    //{
+    //    return m_groupList.takeLast();
+    //}
+
+    //void appendCall(vogleditor_apiCallItem *pItem)
+    //{
+    //    m_apiCallList.append(pItem);
+    //}
+
+    //inline int callCount() const
+    //{
+    //    return m_apiCallList.size();
+    //}
+
+    //vogleditor_apiCallItem *call(int index) const
+    //{
+    //    if (index < 0 || index > callCount())
+    //    {
+    //        return NULL;
+    //    }
+
+    //    return m_apiCallList[index];
+    //}
+
+    //bool getStartEndTimes(uint64_t &start, uint64_t &end) const
+    //{
+    //    if (callCount() == 0)
+    //    {
+    //        return false;
+    //    }
+
+    //    start = startTime();
+    //    end = endTime();
+    //    return true;
+    //}
+
+    //uint64_t startTime() const
+    //{
+    //    return apiCallStartTime(0);
+    //}
+
+    //uint64_t endTime() const
+    //{
+    //    return apiCallEndTime(callCount() - 1);
+    //}
+
+    //uint64_t apiCallStartTime(uint index) const
+    //{
+    //    return m_apiCallList[index]->startTime();
+    //}
+
+    //uint64_t apiCallEndTime(uint index) const
+    //{
+    //    return m_apiCallList[index]->endTime();
+    //}
+
+    //uint64_t duration() const
+    //{
+    //    return (endTime() - startTime());
+    //}
+
+    //void set_screenshot_filename(const std::string &filename)
+    //{
+    //    m_screenshot_filename = filename;
+    //}
+
+    //const std::string &get_screenshot_filename() const
+    //{
+    //    return m_screenshot_filename;
+    //}
+
+private:
+    uint64_t m_frameNumber;
+    QList<glvdebug_apiCallItem *> m_apiCallList;
+    QList<vogleditor_groupItem *> m_groupList;
+
+    //std::string m_screenshot_filename;
+};
+
+#endif // VOGLEDITOR_FRAMEITEM_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_groupitem.h b/vktrace/src/vktrace_viewer/vktraceviewer_groupitem.h
new file mode 100644
index 0000000..1f2109a
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_groupitem.h
@@ -0,0 +1,92 @@
+/**************************************************************************
+ *
+ * Copyright 2013-2014 RAD Game Tools and Valve Software
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ **************************************************************************/
+
+#pragma once
+
+#include <QList>
+//#include "glvdebug_snapshotitem.h"
+#include "glvdebug_apicallitem.h"
+
+class vogleditor_frameItem;
+
+class vogleditor_groupItem //: public vogleditor_snapshotItem
+{
+public:
+    vogleditor_groupItem(vogleditor_frameItem *pFrameItem)
+        : m_pParentFrame(pFrameItem)
+    {
+    }
+
+    ~vogleditor_groupItem()
+    {
+        m_apiCallList.clear();
+    }
+
+    void appendCall(glvdebug_apiCallItem *pItem)
+    {
+        m_apiCallList.append(pItem);
+    }
+
+    inline int callCount() const
+    {
+        return m_apiCallList.size();
+    }
+
+    glvdebug_apiCallItem *call(int index) const
+    {
+        if (index < 0 || index > callCount())
+        {
+            return NULL;
+        }
+        return m_apiCallList[index];
+    }
+
+    //inline uint64_t firstApiCallIndex() const
+    //{
+    //    return apiCallIndex(0);
+    //}
+
+    //inline uint64_t apiCallIndex(int index = 0) const
+    //{
+    //    if (vogleditor_apiCallItem *apiCallItem = call(index))
+    //    {
+    //        return apiCallItem->globalCallIndex();
+    //    }
+    //    return uint64_t(-1); // (-1 index won't be found)
+    //}
+
+    //inline uint64_t startTime() const
+    //{
+    //    return m_apiCallList[0]->startTime();
+    //}
+
+    //inline uint64_t endTime() const
+    //{
+    //    return m_apiCallList[callCount() - 1]->endTime();
+    //}
+
+    //inline uint64_t duration() const
+    //{
+    //    return endTime() - startTime();
+    //}
+
+private:
+    vogleditor_frameItem *m_pParentFrame;
+    QList<glvdebug_apiCallItem *> m_apiCallList;
+};
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_output.cpp b/vktrace/src/vktrace_viewer/vktraceviewer_output.cpp
new file mode 100644
index 0000000..308183f
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_output.cpp
@@ -0,0 +1,109 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#include "vktraceviewer_output.h"
+#include <QTextEdit>
+
+vktraceviewer_output gs_OUTPUT;
+
+vktraceviewer_output::vktraceviewer_output()
+{
+}
+
+vktraceviewer_output::~vktraceviewer_output()
+{
+}
+
+void vktraceviewer_output::init(QTextBrowser *pTextEdit)
+{
+    m_pTextEdit = pTextEdit;
+}
+
+QString vktraceviewer_output::convertToHtml(QString message)
+{
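+    // Qt rich-text widgets ignore plain newlines, so convert them to <br>
+    // (dropping a trailing newline first so append() adds no empty line).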
+    QString result;
+    if (message.endsWith("\n"))
+    {
+        message.chop(1);
+    }
+    result = message.replace("\n", "<br>");
+    return result;
+}
+
+void vktraceviewer_output::moveCursorToEnd()
+{
+    QTextCursor cursor = m_pTextEdit->textCursor();
+    cursor.movePosition(QTextCursor::End, QTextCursor::MoveAnchor);
+    m_pTextEdit->setTextCursor(cursor);
+}
+
+void vktraceviewer_output::message(uint64_t packetIndex, const QString& message)
+{
+    if (m_pTextEdit != NULL)
+    {
+        QString msg;
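+        // (uint64_t)-1 is the "no packet" sentinel forwarded by the
+        // index-less overloads in vktraceviewer_output.h; only real indices
+        // get a clickable packet link.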
+        if (packetIndex == (uint64_t)-1)
+        {
+            msg = message;
+        }
+        else
+        {
+            msg = QString("(<a href='packet#%1'>%1</a>): %2 ").arg(packetIndex).arg(message);
+        }
+        moveCursorToEnd();
+        m_pTextEdit->append(msg);
+    }
+}
+
+void vktraceviewer_output::warning(uint64_t packetIndex, const QString& warning)
+{
+    if (m_pTextEdit != NULL)
+    {
+        QString msg;
+        if (packetIndex == (uint64_t)-1)
+        {
+            msg = QString("<font color='red'>Warning: %1</font> ").arg(warning);
+        }
+        else
+        {
+            msg = QString("<font color='red'>(<a href='packet#%1'>%1</a>) Warning: %2</font> ").arg(packetIndex).arg(warning);
+        }
+        moveCursorToEnd();
+        m_pTextEdit->append(msg);
+    }
+}
+
+void vktraceviewer_output::error(uint64_t packetIndex, const QString& error)
+{
+    if (m_pTextEdit != NULL)
+    {
+        QString msg;
+        if (packetIndex == (uint64_t)-1)
+        {
+            msg = QString("<font color='red'><b>Error: %1</b></font> ").arg(convertToHtml(error));
+        }
+        else
+        {
+            msg = QString("<font color='red'><b>(<a href='packet#%1'>%1</a>) Error: %2</b></font> ").arg(packetIndex).arg(convertToHtml(error));
+        }
+        moveCursorToEnd();
+        m_pTextEdit->append(msg);
+    }
+}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_output.h b/vktrace/src/vktrace_viewer/vktraceviewer_output.h
new file mode 100644
index 0000000..416f665
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_output.h
@@ -0,0 +1,66 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#ifndef VKTRACEVIEWER_OUTPUT_H
+#define VKTRACEVIEWER_OUTPUT_H
+
+#include <QString>
+#include <QTextBrowser>
+extern "C"
+{
+#include "vktrace_platform.h"
+#include "vktrace_tracelog.h"
+}
+
+class QTextEdit;
+
+class vktraceviewer_output
+{
+public:
+    vktraceviewer_output();
+    ~vktraceviewer_output();
+
+    void init(QTextBrowser* pTextEdit);
+
+    void message(uint64_t packetIndex, const QString& message);
+    void warning(uint64_t packetIndex, const QString& warning);
+    void error(uint64_t packetIndex, const QString& error);
+
+private:
+    QString convertToHtml(QString message);
+    void moveCursorToEnd();
+    QTextBrowser* m_pTextEdit;
+};
+
+extern vktraceviewer_output gs_OUTPUT;
+
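+// Convenience wrappers around the global instance; the overloads without a
+// packetIndex forward (uint64_t)-1 as the "no packet" sentinel.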
+inline void vktraceviewer_output_init(QTextBrowser* pTextEdit) { gs_OUTPUT.init(pTextEdit); }
+
+inline void vktraceviewer_output_message(uint64_t packetIndex, const QString& message) { gs_OUTPUT.message(packetIndex, message); }
+inline void vktraceviewer_output_message(const QString& message) { gs_OUTPUT.message(-1, message); }
+
+inline void vktraceviewer_output_warning(uint64_t packetIndex, const QString& warning) { gs_OUTPUT.warning(packetIndex, warning); }
+inline void vktraceviewer_output_warning(const QString& warning) { gs_OUTPUT.warning(-1, warning); }
+
+inline void vktraceviewer_output_error(uint64_t packetIndex, const QString& error) { gs_OUTPUT.error(packetIndex, error); }
+inline void vktraceviewer_output_error(const QString& error) { gs_OUTPUT.error(-1, error); }
+inline void vktraceviewer_output_deinit() { gs_OUTPUT.init(0); }
+
+#endif // VKTRACEVIEWER_OUTPUT_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qapicalltreemodel.cpp b/vktrace/src/vktrace_viewer/vktraceviewer_qapicalltreemodel.cpp
new file mode 100644
index 0000000..92ad805
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qapicalltreemodel.cpp
@@ -0,0 +1,439 @@
+/**************************************************************************
+ *
+ * Copyright 2013-2014 RAD Game Tools and Valve Software
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ **************************************************************************/
+
+#include <QColor>
+#include <QFont>
+#include <QLocale>
+
+#include "glvdebug_qapicalltreemodel.h"
+#include "glv_common.h"
+#include "glv_trace_packet_identifiers.h"
+
+#include "glvdebug_apicalltreeitem.h"
+//#include "glvdebug_frameitem.h"
+//#include "glvdebug_groupitem.h"
+//#include "glvdebug_apicallitem.h"
+#include "glvdebug_output.h"
+#include "glvdebug_settings.h"
+
+glvdebug_QApiCallTreeModel::glvdebug_QApiCallTreeModel(int columnCount, QObject *parent)
+    : QAbstractItemModel(parent),
+      m_columnCount(columnCount)
+{
+    m_rootItem = new glvdebug_apiCallTreeItem(columnCount, this);
+}
+
+glvdebug_QApiCallTreeModel::~glvdebug_QApiCallTreeModel()
+{
+    if (m_rootItem != NULL)
+    {
+        delete m_rootItem;
+        m_rootItem = NULL;
+    }
+
+    m_itemList.clear();
+}
+
+
+QModelIndex glvdebug_QApiCallTreeModel::index(int row, int column, const QModelIndex &parent) const
+{
+    if (!hasIndex(row, column, parent))
+        return QModelIndex();
+
+    glvdebug_apiCallTreeItem *parentItem;
+
+    if (!parent.isValid())
+        parentItem = m_rootItem;
+    else
+        parentItem = static_cast<glvdebug_apiCallTreeItem *>(parent.internalPointer());
+
+    glvdebug_apiCallTreeItem *childItem = parentItem->child(row);
+    if (childItem)
+        return createIndex(row, column, childItem);
+    else
+        return QModelIndex();
+}
+//
+//QModelIndex glvdebug_QApiCallTreeModel::indexOf(const glvdebug_apiCallTreeItem *pItem) const
+//{
+//    if (pItem != NULL)
+//        return createIndex(pItem->row(), /*VOGL_ACTC_APICALL*/ 0, (void *)pItem);
+//    else
+//        return QModelIndex();
+//}
+
+QModelIndex glvdebug_QApiCallTreeModel::parent(const QModelIndex &index) const
+{
+    if (!index.isValid())
+        return QModelIndex();
+
+    glvdebug_apiCallTreeItem *childItem = static_cast<glvdebug_apiCallTreeItem *>(index.internalPointer());
+    if (childItem == m_rootItem)
+        return QModelIndex();
+
+    glvdebug_apiCallTreeItem *parentItem = childItem->parent();
+
+    if (parentItem == m_rootItem || parentItem == NULL)
+        return QModelIndex();
+
+    return createIndex(parentItem->row(), /*VOGL_ACTC_APICALL*/ 0, parentItem);
+}
+
+int glvdebug_QApiCallTreeModel::rowCount(const QModelIndex &parent) const
+{
+    glvdebug_apiCallTreeItem *parentItem;
+    if (parent.column() > 0)
+        return 0;
+
+    if (!parent.isValid())
+        parentItem = m_rootItem;
+    else
+        parentItem = static_cast<glvdebug_apiCallTreeItem *>(parent.internalPointer());
+
+    return parentItem->childCount();
+}
+
+int glvdebug_QApiCallTreeModel::columnCount(const QModelIndex &parent) const
+{
+    (void)parent; // the column count is fixed across the whole tree
+    return m_columnCount;
+}
+
+QVariant glvdebug_QApiCallTreeModel::data(const QModelIndex &index, int role) const
+{
+    if (!index.isValid())
+        return QVariant();
+
+    glvdebug_apiCallTreeItem *pItem = static_cast<glvdebug_apiCallTreeItem *>(index.internalPointer());
+
+    if (pItem == NULL)
+    {
+        return QVariant();
+    }
+
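+    // Only plain column data is active for now; the vogl-era styling (bold
+    // draw calls, search highlighting) is kept below for the port.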
+    //// make draw call rows appear in bold
+    //if (role == Qt::FontRole && pItem->apiCallItem() != NULL && vogl_is_frame_buffer_write_entrypoint((gl_entrypoint_id_t)pItem->apiCallItem()->getGLPacket()->m_entrypoint_id))
+    //{
+    //    QFont font;
+    //    font.setBold(true);
+    //    return font;
+    //}
+
+    //// highlight the API call cell if it has a substring which matches the searchString
+    //if (role == Qt::BackgroundRole && index.column() == VOGL_ACTC_APICALL)
+    //{
+    //    if (!m_searchString.isEmpty())
+    //    {
+    //        QVariant data = pItem->columnData(VOGL_ACTC_APICALL, Qt::DisplayRole);
+    //        QString string = data.toString();
+    //        if (string.contains(m_searchString, Qt::CaseInsensitive))
+    //        {
+    //            return QColor(Qt::yellow);
+    //        }
+    //    }
+    //}
+
+    return pItem->columnData(index.column(), role);
+}
+
+Qt::ItemFlags glvdebug_QApiCallTreeModel::flags(const QModelIndex &index) const
+{
+    if (!index.isValid())
+        return Qt::NoItemFlags;
+
+    return Qt::ItemIsEnabled | Qt::ItemIsSelectable;
+}
+
+QVariant glvdebug_QApiCallTreeModel::headerData(int section, Qt::Orientation orientation,
+                                                  int role) const
+{
+    if (orientation == Qt::Horizontal && role == Qt::DisplayRole)
+        return m_rootItem->columnData(section, role);
+
+    return QVariant();
+}
+
+//void glvdebug_QApiCallTreeModel::set_highlight_search_string(const QString searchString)
+//{
+//    m_searchString = searchString;
+//}
+//
+//QModelIndex glvdebug_QApiCallTreeModel::find_prev_search_result(glvdebug_apiCallTreeItem *start, const QString searchText)
+//{
+//    QLinkedListIterator<glvdebug_apiCallTreeItem *> iter(m_itemList);
+//
+//    if (start != NULL)
+//    {
+//        if (iter.findNext(start) == false)
+//        {
+//            // the object wasn't found in the list, so return a default (invalid) item
+//            return QModelIndex();
+//        }
+//
+//        // need to back up past the current item
+//        iter.previous();
+//    }
+//    else
+//    {
+//        // set the iterator to the back so that searching starts from the end of the list
+//        iter.toBack();
+//    }
+//
+//    // now the iterator is pointing to the desired start object in the list,
+//    // continually check the prev item and find one with a snapshot
+//    glvdebug_apiCallTreeItem *pFound = NULL;
+//    while (iter.hasPrevious())
+//    {
+//        glvdebug_apiCallTreeItem *pItem = iter.peekPrevious();
+//        QVariant data = pItem->columnData(VOGL_ACTC_APICALL, Qt::DisplayRole);
+//        QString string = data.toString();
+//        if (string.contains(searchText, Qt::CaseInsensitive))
+//        {
+//            pFound = pItem;
+//            break;
+//        }
+//
+//        iter.previous();
+//    }
+//
+//    return indexOf(pFound);
+//}
+//
+//QModelIndex glvdebug_QApiCallTreeModel::find_next_search_result(glvdebug_apiCallTreeItem *start, const QString searchText)
+//{
+//    QLinkedListIterator<glvdebug_apiCallTreeItem *> iter(m_itemList);
+//
+//    if (start != NULL)
+//    {
+//        if (iter.findNext(start) == false)
+//        {
+//            // the object wasn't found in the list, so return a default (invalid) item
+//            return QModelIndex();
+//        }
+//    }
+//
+//    // now the iterator is pointing to the desired start object in the list,
+//    // continually check the next item and find one with a snapshot
+//    glvdebug_apiCallTreeItem *pFound = NULL;
+//    while (iter.hasNext())
+//    {
+//        glvdebug_apiCallTreeItem *pItem = iter.peekNext();
+//        QVariant data = pItem->columnData(VOGL_ACTC_APICALL, Qt::DisplayRole);
+//        QString string = data.toString();
+//        if (string.contains(searchText, Qt::CaseInsensitive))
+//        {
+//            pFound = pItem;
+//            break;
+//        }
+//
+//        iter.next();
+//    }
+//
+//    return indexOf(pFound);
+//}
+//
+//glvdebug_apiCallTreeItem *glvdebug_QApiCallTreeModel::find_prev_snapshot(glvdebug_apiCallTreeItem *start)
+//{
+//    QLinkedListIterator<glvdebug_apiCallTreeItem *> iter(m_itemList);
+//
+//    if (start != NULL)
+//    {
+//        if (iter.findNext(start) == false)
+//        {
+//            // the object wasn't found in the list
+//            return NULL;
+//        }
+//
+//        // need to back up past the current item
+//        iter.previous();
+//    }
+//    else
+//    {
+//        // set the iterator to the back so that searching starts from the end of the list
+//        iter.toBack();
+//    }
+//
+//    // now the iterator is pointing to the desired start object in the list,
+//    // continually check the prev item and find one with a snapshot
+//    glvdebug_apiCallTreeItem *pFound = NULL;
+//    while (iter.hasPrevious())
+//    {
+//        if (iter.peekPrevious()->has_snapshot())
+//        {
+//            pFound = iter.peekPrevious();
+//            break;
+//        }
+//
+//        iter.previous();
+//    }
+//
+//    return pFound;
+//}
+//
+//glvdebug_apiCallTreeItem *glvdebug_QApiCallTreeModel::find_next_snapshot(glvdebug_apiCallTreeItem *start)
+//{
+//    QLinkedListIterator<glvdebug_apiCallTreeItem *> iter(m_itemList);
+//
+//    // if start is NULL, then search will begin from top, otherwise it will begin from the start item and search onwards
+//    if (start != NULL)
+//    {
+//        if (iter.findNext(start) == false)
+//        {
+//            // the object wasn't found in the list
+//            return NULL;
+//        }
+//    }
+//
+//    // now the iterator is pointing to the desired start object in the list,
+//    // continually check the next item and find one with a snapshot
+//    glvdebug_apiCallTreeItem *pFound = NULL;
+//    while (iter.hasNext())
+//    {
+//        if (iter.peekNext()->has_snapshot())
+//        {
+//            pFound = iter.peekNext();
+//            break;
+//        }
+//
+//        iter.next();
+//    }
+//
+//    return pFound;
+//}
+//
+//glvdebug_apiCallTreeItem *glvdebug_QApiCallTreeModel::find_prev_drawcall(glvdebug_apiCallTreeItem *start)
+//{
+//    QLinkedListIterator<glvdebug_apiCallTreeItem *> iter(m_itemList);
+//
+//    if (start != NULL)
+//    {
+//        if (iter.findNext(start) == false)
+//        {
+//            // the object wasn't found in the list
+//            return NULL;
+//        }
+//
+//        // need to back up past the current item
+//        iter.previous();
+//    }
+//    else
+//    {
+//        // set the iterator to the back so that searching starts from the end of the list
+//        iter.toBack();
+//    }
+//
+//    // now the iterator is pointing to the desired start object in the list,
+//    // continually check the prev item and find one with a snapshot
+//    glvdebug_apiCallTreeItem *pFound = NULL;
+//    while (iter.hasPrevious())
+//    {
+//        glvdebug_apiCallTreeItem *pItem = iter.peekPrevious();
+//        if (pItem->apiCallItem() != NULL)
+//        {
+//            gl_entrypoint_id_t entrypointId = pItem->apiCallItem()->getTracePacket()->get_entrypoint_id();
+//            if (vogl_is_frame_buffer_write_entrypoint(entrypointId))
+//            {
+//                pFound = iter.peekPrevious();
+//                break;
+//            }
+//        }
+//
+//        iter.previous();
+//    }
+//
+//    return pFound;
+//}
+//
+//glvdebug_apiCallTreeItem *glvdebug_QApiCallTreeModel::find_next_drawcall(glvdebug_apiCallTreeItem *start)
+//{
+//    QLinkedListIterator<glvdebug_apiCallTreeItem *> iter(m_itemList);
+//
+//    if (iter.findNext(start) == false)
+//    {
+//        // the object wasn't found in the list
+//        return NULL;
+//    }
+//
+//    // now the iterator is pointing to the desired start object in the list,
+//    // continually check the next item and find one with a snapshot
+//    glvdebug_apiCallTreeItem *pFound = NULL;
+//    while (iter.hasNext())
+//    {
+//        glvdebug_apiCallTreeItem *pItem = iter.peekNext();
+//        if (pItem->apiCallItem() != NULL)
+//        {
+//            gl_entrypoint_id_t entrypointId = pItem->apiCallItem()->getTracePacket()->get_entrypoint_id();
+//            if (vogl_is_frame_buffer_write_entrypoint(entrypointId))
+//            {
+//                pFound = iter.peekNext();
+//                break;
+//            }
+//        }
+//
+//        iter.next();
+//    }
+//
+//    return pFound;
+//}
+//
+//glvdebug_apiCallTreeItem *glvdebug_QApiCallTreeModel::find_call_number(unsigned int callNumber)
+//{
+//    QLinkedListIterator<glvdebug_apiCallTreeItem *> iter(m_itemList);
+//
+//    glvdebug_apiCallTreeItem *pFound = NULL;
+//    while (iter.hasNext())
+//    {
+//        glvdebug_apiCallTreeItem *pItem = iter.peekNext();
+//        if (pItem->apiCallItem() != NULL)
+//        {
+//            if (pItem->apiCallItem()->globalCallIndex() == callNumber)
+//            {
+//                pFound = iter.peekNext();
+//                break;
+//            }
+//        }
+//
+//        iter.next();
+//    }
+//
+//    return pFound;
+//}
+//
+//glvdebug_apiCallTreeItem *glvdebug_QApiCallTreeModel::find_frame_number(unsigned int frameNumber)
+//{
+//    QLinkedListIterator<glvdebug_apiCallTreeItem *> iter(m_itemList);
+//
+//    glvdebug_apiCallTreeItem *pFound = NULL;
+//    while (iter.hasNext())
+//    {
+//        glvdebug_apiCallTreeItem *pItem = iter.peekNext();
+//        if (pItem->frameItem() != NULL)
+//        {
+//            if (pItem->frameItem()->frameNumber() == frameNumber)
+//            {
+//                pFound = iter.peekNext();
+//                break;
+//            }
+//        }
+//
+//        iter.next();
+//    }
+//
+//    return pFound;
+//}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qapicalltreemodel.h b/vktrace/src/vktrace_viewer/vktraceviewer_qapicalltreemodel.h
new file mode 100644
index 0000000..67f1d11
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qapicalltreemodel.h
@@ -0,0 +1,103 @@
+/**************************************************************************
+ *
+ * Copyright 2013-2014 RAD Game Tools and Valve Software
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ **************************************************************************/
+
+#ifndef GLVDEBUG_QAPICALLTREEMODEL_H
+#define GLVDEBUG_QAPICALLTREEMODEL_H
+
+#include <QAbstractItemModel>
+#include <QLinkedList>
+
+#include "glv_common.h"
+
+
+class QVariant;
+class glvdebug_apiCallTreeItem;
+//class glvdebug_groupItem;
+//class glvdebug_frameItem;
+class glvdebug_apiCallItem;
+
+class glvdebug_QApiCallTreeModel : public QAbstractItemModel
+{
+    Q_OBJECT
+
+public:
+    glvdebug_QApiCallTreeModel(int columnCount, QObject *parent = 0);
+    ~glvdebug_QApiCallTreeModel();
+
+    // required to implement
+    virtual QModelIndex index(int row, int column, const QModelIndex &parent = QModelIndex()) const;
+    virtual QModelIndex parent(const QModelIndex &index) const;
+    virtual QVariant data(const QModelIndex &index, int role) const;
+    virtual int rowCount(const QModelIndex &parent = QModelIndex()) const;
+    virtual int columnCount(const QModelIndex &parent = QModelIndex()) const;
+
+    //void appendChild(glvdebug_apiCallTreeItem* pItem)
+    //{
+    //    m_rootItem->appendChild(pItem);
+    //}
+
+    //virtual Qt::ItemFlags flags(const QModelIndex &index) const;
+    //virtual QVariant headerData(int section, Qt::Orientation orientation, int role = Qt::DisplayRole) const;
+
+    //QModelIndex indexOf(const glvdebug_apiCallTreeItem *pItem) const;
+
+    //glvdebug_apiCallTreeItem *root() const
+    //{
+    //    return m_rootItem;
+    //}
+
+    //glvdebug_apiCallTreeItem *create_group(glvdebug_frameItem *pFrameObj,
+    //                                         glvdebug_groupItem *&pGroupObj,
+    //                                         glvdebug_apiCallTreeItem *pParentNode);
+    //void set_highlight_search_string(const QString searchString);
+    //QModelIndex find_prev_search_result(glvdebug_apiCallTreeItem *start, const QString searchText);
+    //QModelIndex find_next_search_result(glvdebug_apiCallTreeItem *start, const QString searchText);
+
+    //glvdebug_apiCallTreeItem *find_prev_snapshot(glvdebug_apiCallTreeItem *start);
+    //glvdebug_apiCallTreeItem *find_next_snapshot(glvdebug_apiCallTreeItem *start);
+
+    //glvdebug_apiCallTreeItem *find_prev_drawcall(glvdebug_apiCallTreeItem *start);
+    //glvdebug_apiCallTreeItem *find_next_drawcall(glvdebug_apiCallTreeItem *start);
+
+    //glvdebug_apiCallTreeItem *find_call_number(unsigned int callNumber);
+    //glvdebug_apiCallTreeItem *find_frame_number(unsigned int frameNumber);
+
+signals:
+
+public slots:
+
+private:
+    //gl_entrypoint_id_t itemApiCallId(glvdebug_apiCallTreeItem *apiCall) const;
+    //gl_entrypoint_id_t lastItemApiCallId() const;
+
+    //bool processMarkerPushEntrypoint(gl_entrypoint_id_t id);
+    //bool processMarkerPopEntrypoint(gl_entrypoint_id_t id);
+    //bool processStartNestedEntrypoint(gl_entrypoint_id_t id);
+    //bool processEndNestedEntrypoint(gl_entrypoint_id_t id);
+    //bool processFrameBufferWriteEntrypoint(gl_entrypoint_id_t id);
+
+private:
+    int m_columnCount;
+    glvdebug_apiCallTreeItem *m_rootItem;
+    QLinkedList<glvdebug_apiCallTreeItem *> m_itemList;
+//    QString m_searchString;
+};
+
+#endif // GLVDEBUG_QAPICALLTREEMODEL_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qgeneratetracedialog.cpp b/vktrace/src/vktrace_viewer/vktraceviewer_qgeneratetracedialog.cpp
new file mode 100644
index 0000000..61d8c99
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qgeneratetracedialog.cpp
@@ -0,0 +1,475 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#include "vktraceviewer_qgeneratetracedialog.h"
+#include "vktraceviewer_settings.h"
+#include <QDir>
+#include <QFileDialog>
+#include <QGridLayout>
+#include <QProcessEnvironment>
+
+vktraceviewer_QGenerateTraceDialog::vktraceviewer_QGenerateTraceDialog(QWidget *parent)
+    : QDialog(parent),
+      m_pGenerateTraceProcess(NULL)
+{
+    setMinimumWidth(500);
+    setWindowTitle("Generate Trace File");
+    m_pGridLayout = new QGridLayout(this);
+    m_pGridLayout->setObjectName("m_pGridLayout");
+    m_pGridLayout->setHorizontalSpacing(2);
+    m_pGridLayout->setVerticalSpacing(1);
+
+    m_pApplicationLabel = new QLabel(this);
+    m_pApplicationLabel->setObjectName(QStringLiteral("m_pApplicationLabel"));
+    QSizePolicy sizePolicy1(QSizePolicy::Preferred, QSizePolicy::Preferred);
+    sizePolicy1.setHorizontalStretch(0);
+    sizePolicy1.setVerticalStretch(0);
+    sizePolicy1.setHeightForWidth(m_pApplicationLabel->sizePolicy().hasHeightForWidth());
+    m_pApplicationLabel->setSizePolicy(sizePolicy1);
+    m_pApplicationLabel->setTextFormat(Qt::AutoText);
+    m_pApplicationLabel->setAlignment(Qt::AlignRight|Qt::AlignTrailing|Qt::AlignVCenter);
+    m_pApplicationLabel->setText(QApplication::translate("vktraceviewer_QGenerateTraceDialog", "<span style=\"color: red;\">*</span>Application to trace:", 0));
+
+    m_pGridLayout->addWidget(m_pApplicationLabel, 0, 0, 1, 1);
+
+    m_pApplicationLineEdit = new QLineEdit(this);
+    m_pApplicationLineEdit->setObjectName(QStringLiteral("m_pApplicationLineEdit"));
+    QSizePolicy sizePolicy2(QSizePolicy::Expanding, QSizePolicy::Fixed);
+    sizePolicy2.setHorizontalStretch(1);
+    sizePolicy2.setVerticalStretch(0);
+    sizePolicy2.setHeightForWidth(m_pApplicationLineEdit->sizePolicy().hasHeightForWidth());
+    m_pApplicationLineEdit->setSizePolicy(sizePolicy2);
+
+    m_pGridLayout->addWidget(m_pApplicationLineEdit, 0, 1, 1, 1);
+
+    m_pFindApplicationButton = new QPushButton(this);
+    m_pFindApplicationButton->setObjectName(QStringLiteral("m_pFindApplicationButton"));
+    QSizePolicy sizePolicy(QSizePolicy::Fixed, QSizePolicy::Fixed);
+    sizePolicy.setHorizontalStretch(0);
+    sizePolicy.setVerticalStretch(0);
+    sizePolicy.setHeightForWidth(m_pFindApplicationButton->sizePolicy().hasHeightForWidth());
+    m_pFindApplicationButton->setSizePolicy(sizePolicy);
+    m_pFindApplicationButton->setMaximumSize(QSize(20, 16777215));
+    m_pFindApplicationButton->setText(QApplication::translate("vktraceviewer_QGenerateTraceDialog", "...", 0));
+
+    m_pGridLayout->addWidget(m_pFindApplicationButton, 0, 2, 1, 1);
+
+    m_pArgumentsLabel = new QLabel(this);
+    m_pArgumentsLabel->setObjectName(QStringLiteral("m_pArgumentsLabel"));
+    sizePolicy1.setHeightForWidth(m_pArgumentsLabel->sizePolicy().hasHeightForWidth());
+    m_pArgumentsLabel->setSizePolicy(sizePolicy1);
+    m_pArgumentsLabel->setAlignment(Qt::AlignRight|Qt::AlignTrailing|Qt::AlignVCenter);
+    m_pArgumentsLabel->setText(QApplication::translate("vktraceviewer_QGenerateTraceDialog", "Application arguments:", 0));
+
+    m_pGridLayout->addWidget(m_pArgumentsLabel, 1, 0, 1, 1);
+
+    m_pArgumentsLineEdit = new QLineEdit(this);
+    m_pArgumentsLineEdit->setObjectName(QStringLiteral("m_pArgumentsLineEdit"));
+    QSizePolicy sizePolicy3(QSizePolicy::Expanding, QSizePolicy::Fixed);
+    sizePolicy3.setHorizontalStretch(0);
+    sizePolicy3.setVerticalStretch(0);
+    sizePolicy3.setHeightForWidth(m_pArgumentsLineEdit->sizePolicy().hasHeightForWidth());
+    m_pArgumentsLineEdit->setSizePolicy(sizePolicy3);
+
+    m_pGridLayout->addWidget(m_pArgumentsLineEdit, 1, 1, 1, 2);
+
+    m_pWorkingDirLabel = new QLabel(this);
+    m_pWorkingDirLabel->setObjectName(QStringLiteral("m_pWorkingDirLabel"));
+    sizePolicy1.setHeightForWidth(m_pWorkingDirLabel->sizePolicy().hasHeightForWidth());
+    m_pWorkingDirLabel->setSizePolicy(sizePolicy1);
+    m_pWorkingDirLabel->setAlignment(Qt::AlignRight|Qt::AlignTrailing|Qt::AlignVCenter);
+    m_pWorkingDirLabel->setText(QApplication::translate("vktraceviewer_QGenerateTraceDialog", "Working directory:", 0));
+
+    m_pGridLayout->addWidget(m_pWorkingDirLabel, 2, 0, 1, 1);
+
+    m_pWorkingDirLineEdit = new QLineEdit(this);
+    m_pWorkingDirLineEdit->setObjectName(QStringLiteral("m_pWorkingDirLineEdit"));
+    sizePolicy3.setHeightForWidth(m_pWorkingDirLineEdit->sizePolicy().hasHeightForWidth());
+    m_pWorkingDirLineEdit->setSizePolicy(sizePolicy3);
+
+    m_pGridLayout->addWidget(m_pWorkingDirLineEdit, 2, 1, 1, 2);
+
+    m_pVkLayerPathLabel = new QLabel(this);
+    m_pVkLayerPathLabel->setObjectName(QStringLiteral("m_pVkLayerPathLabel"));
+    m_pVkLayerPathLabel->setAlignment(Qt::AlignRight|Qt::AlignTrailing|Qt::AlignVCenter);
+    m_pVkLayerPathLabel->setText(QApplication::translate("vktraceviewer_QGenerateTraceDialog", "<span style=\"color: red;\">*</span>VK_LAYER_PATH:", 0));
+    m_pVkLayerPathLabel->setDisabled(true);
+
+    m_pGridLayout->addWidget(m_pVkLayerPathLabel, 3, 0, 1, 1);
+
+    m_pVkLayerPathLineEdit = new QLineEdit(this);
+    m_pVkLayerPathLineEdit->setObjectName(QStringLiteral("m_pVkLayerPathLineEdit"));
+    m_pVkLayerPathLineEdit->setText(QString());
+
+    m_pGridLayout->addWidget(m_pVkLayerPathLineEdit, 3, 1, 1, 1);
+
+    m_pVkLayerPathButton = new QPushButton(this);
+    m_pVkLayerPathButton->setObjectName(QStringLiteral("m_pVkLayerPathButton"));
+    sizePolicy.setHeightForWidth(m_pVkLayerPathButton->sizePolicy().hasHeightForWidth());
+    m_pVkLayerPathButton->setSizePolicy(sizePolicy);
+    m_pVkLayerPathButton->setMinimumSize(QSize(0, 0));
+    m_pVkLayerPathButton->setMaximumSize(QSize(20, 16777215));
+    m_pVkLayerPathButton->setText(QApplication::translate("vktraceviewer_QGenerateTraceDialog", "...", 0));
+
+    m_pGridLayout->addWidget(m_pVkLayerPathButton, 3, 2, 1, 1);
+
+    m_pTracefileLabel = new QLabel(this);
+    m_pTracefileLabel->setObjectName(QStringLiteral("m_pTracefileLabel"));
+    m_pTracefileLabel->setAlignment(Qt::AlignRight|Qt::AlignTrailing|Qt::AlignVCenter);
+    m_pTracefileLabel->setText(QApplication::translate("vktraceviewer_QGenerateTraceDialog", "<span style=\"color: red;\">*</span>Output trace file:", 0));
+
+    m_pGridLayout->addWidget(m_pTracefileLabel, 4, 0, 1, 1);
+
+    m_pTraceFileLineEdit = new QLineEdit(this);
+    m_pTraceFileLineEdit->setObjectName(QStringLiteral("m_pTraceFileLineEdit"));
+    m_pTraceFileLineEdit->setText(QString());
+
+    m_pGridLayout->addWidget(m_pTraceFileLineEdit, 4, 1, 1, 1);
+
+    m_pFindTraceFileButton = new QPushButton(this);
+    m_pFindTraceFileButton->setObjectName(QStringLiteral("m_pFindTraceFileButton"));
+    sizePolicy.setHeightForWidth(m_pFindTraceFileButton->sizePolicy().hasHeightForWidth());
+    m_pFindTraceFileButton->setSizePolicy(sizePolicy);
+    m_pFindTraceFileButton->setMinimumSize(QSize(0, 0));
+    m_pFindTraceFileButton->setMaximumSize(QSize(20, 16777215));
+    m_pFindTraceFileButton->setText(QApplication::translate("vktraceviewer_QGenerateTraceDialog", "...", 0));
+
+    m_pGridLayout->addWidget(m_pFindTraceFileButton, 4, 2, 1, 1);
+
+    m_pButtonFrame = new QFrame(this);
+    m_pButtonFrame->setObjectName(QStringLiteral("m_pButtonFrame"));
+    m_pButtonFrame->setFrameShape(QFrame::NoFrame);
+    m_pButtonFrame->setFrameShadow(QFrame::Raised);
+
+    m_pButtonHorizontalLayout = new QHBoxLayout(m_pButtonFrame);
+    m_pButtonHorizontalLayout->setObjectName(QStringLiteral("m_pButtonHorizontalLayout"));
+    m_pButtonHorizontalLayout->setContentsMargins(0, 0, 0, 0);
+    m_pButtonHSpacer = new QSpacerItem(40, 20, QSizePolicy::Expanding, QSizePolicy::Minimum);
+
+    m_pButtonHorizontalLayout->addItem(m_pButtonHSpacer);
+
+    m_pCancelButton = new QPushButton(m_pButtonFrame);
+    m_pCancelButton->setObjectName(QStringLiteral("m_pCancelButton"));
+    m_pCancelButton->setText("Cancel");
+    m_pCancelButton->setText(QApplication::translate("vktraceviewer_QGenerateTraceDialog", "Cancel", 0));
+
+    m_pButtonHorizontalLayout->addWidget(m_pCancelButton);
+
+    m_pOkButton = new QPushButton(m_pButtonFrame);
+    m_pOkButton->setObjectName(QStringLiteral("m_pOkButton"));
+    m_pOkButton->setEnabled(false);
+    m_pOkButton->setText(QApplication::translate("vktraceviewer_QGenerateTraceDialog", "OK", 0));
+
+    m_pButtonHorizontalLayout->addWidget(m_pOkButton);
+
+    m_pButtonHSpacer2 = new QSpacerItem(40, 20, QSizePolicy::Expanding, QSizePolicy::Minimum);
+
+    m_pButtonHorizontalLayout->addItem(m_pButtonHSpacer2);
+
+    m_pGridLayout->addWidget(m_pButtonFrame, 5, 0, 1, 2);
+
+    QWidget::setTabOrder(m_pApplicationLineEdit, m_pFindApplicationButton);
+    QWidget::setTabOrder(m_pFindApplicationButton, m_pArgumentsLineEdit);
+    QWidget::setTabOrder(m_pArgumentsLineEdit, m_pTraceFileLineEdit);
+    QWidget::setTabOrder(m_pTraceFileLineEdit, m_pFindTraceFileButton);
+    QWidget::setTabOrder(m_pFindTraceFileButton, m_pOkButton);
+    QWidget::setTabOrder(m_pOkButton, m_pCancelButton);
+    QWidget::setTabOrder(m_pCancelButton, m_pApplicationLineEdit);
+
+    connect(m_pCancelButton, SIGNAL(clicked()), this, SLOT(reject()));
+    connect(m_pOkButton, SIGNAL(clicked()), this, SLOT(accept()));
+    connect(m_pFindApplicationButton, SIGNAL(clicked()), this, SLOT(on_findApplicationButton_clicked()));
+    connect(m_pVkLayerPathButton, SIGNAL(clicked()), this, SLOT(on_vkLayerPathButton_clicked()));
+    connect(m_pFindTraceFileButton, SIGNAL(clicked()), this, SLOT(on_findTraceFileButton_clicked()));
+    connect(m_pApplicationLineEdit, SIGNAL(textChanged(QString)), SLOT(on_applicationLineEdit_textChanged(QString)));
+    connect(m_pVkLayerPathLineEdit, SIGNAL(textChanged(QString)), SLOT(on_vkLayerPathLineEdit_textChanged(QString)));
+    connect(m_pTraceFileLineEdit, SIGNAL(textChanged(QString)), SLOT(on_traceFileLineEdit_textChanged(QString)));
+}
+
+vktraceviewer_QGenerateTraceDialog::~vktraceviewer_QGenerateTraceDialog()
+{
+}
+
+int vktraceviewer_QGenerateTraceDialog::exec()
+{
+    if (g_settings.gentrace_application != NULL)
+    {
+        m_pApplicationLineEdit->setText(QString(g_settings.gentrace_application));
+    }
+    if (g_settings.gentrace_arguments != NULL)
+    {
+        m_pArgumentsLineEdit->setText(QString(g_settings.gentrace_arguments));
+    }
+    if (g_settings.gentrace_working_dir != NULL)
+    {
+        m_pWorkingDirLineEdit->setText(QString(g_settings.gentrace_working_dir));
+    }
+
+    QProcessEnvironment environment = QProcessEnvironment::systemEnvironment();
+    if (environment.contains("VK_LAYER_PATH"))
+    {
+        m_pVkLayerPathLineEdit->setText(QString(environment.value("VK_LAYER_PATH")));
+    }
+    else
+    {
+        if (g_settings.gentrace_vk_layer_path != NULL)
+        {
+            m_pVkLayerPathLineEdit->setText(QString(g_settings.gentrace_vk_layer_path));
+        }
+    }
+    if (g_settings.gentrace_output_file != NULL)
+    {
+        m_pTraceFileLineEdit->setText(QString(g_settings.gentrace_output_file));
+    }
+
+    int result = QDialog::exec();
+
+    if (result == QDialog::Accepted)
+    {
+        bool bSuccess = launch_application_to_generate_trace();
+
+        if (!bSuccess)
+        {
+            result = vktraceviewer_QGenerateTraceDialog::Failed;
+        }
+    }
+
+    return result;
+}
+
+QString vktraceviewer_QGenerateTraceDialog::get_trace_file_path()
+{
+    return m_pTraceFileLineEdit->text();
+}
+
+void vktraceviewer_QGenerateTraceDialog::on_applicationLineEdit_textChanged(const QString &text)
+{
+    check_inputs();
+}
+
+void vktraceviewer_QGenerateTraceDialog::on_vkLayerPathLineEdit_textChanged(const QString &text)
+{
+    check_inputs();
+}
+
+void vktraceviewer_QGenerateTraceDialog::on_traceFileLineEdit_textChanged(const QString &text)
+{
+    check_inputs();
+}
+
+void vktraceviewer_QGenerateTraceDialog::check_inputs()
+{
+    bool applicationFileEntered = m_pApplicationLineEdit->text().size() != 0;
+    bool traceFileEntered = m_pTraceFileLineEdit->text().size() != 0;
+    bool vkLayerPathEntered = m_pVkLayerPathLineEdit->text().size() != 0;
+    m_pOkButton->setEnabled(applicationFileEntered && traceFileEntered && vkLayerPathEntered);
+}
+
+void vktraceviewer_QGenerateTraceDialog::on_vkLayerPathButton_clicked()
+{
+    // open file dialog
+    QString suggestedName = m_pVkLayerPathLineEdit->text();
+    if (suggestedName.isEmpty())
+    {
+        suggestedName = QCoreApplication::applicationDirPath();
+    }
+
+    QString selectedName = QFileDialog::getExistingDirectory(this, tr("Find VK_LAYER_PATH Directory"), suggestedName, 0);
+    if (!selectedName.isEmpty())
+    {
+        m_pVkLayerPathLineEdit->setText(selectedName);
+    }
+}
+
+void vktraceviewer_QGenerateTraceDialog::on_findApplicationButton_clicked()
+{
+    // open file dialog
+    QString suggestedName = m_pApplicationLineEdit->text();
+    QString selectedName = QFileDialog::getOpenFileName(this, tr("Find Application to Trace"), suggestedName, "");
+    if (!selectedName.isEmpty())
+    {
+        m_pApplicationLineEdit->setText(selectedName);
+    }
+}
+
+void vktraceviewer_QGenerateTraceDialog::on_findTraceFileButton_clicked()
+{
+    // open file dialog
+    QString suggestedName = m_pTraceFileLineEdit->text();
+    QString selectedName = QFileDialog::getSaveFileName(this, tr("Output Trace File"), suggestedName, tr("vktrace file (*.vktrace *.*)"));
+    if (!selectedName.isEmpty())
+    {
+        m_pTraceFileLineEdit->setText(selectedName);
+    }
+}
+
+bool vktraceviewer_QGenerateTraceDialog::launch_application_to_generate_trace()
+{
+    QString application = m_pApplicationLineEdit->text();
+    QString arguments = m_pArgumentsLineEdit->text();
+    QString workingDirectory = m_pWorkingDirLineEdit->text();
+    QString vkLayerPath = m_pVkLayerPathLineEdit->text();
+    QString outputTraceFile = get_trace_file_path();
+
+    // update settings
+    if (g_settings.gentrace_application != NULL)
+    {
+        vktrace_free(g_settings.gentrace_application);
+    }
+    g_settings.gentrace_application = vktrace_allocate_and_copy(application.toStdString().c_str());
+
+    if (g_settings.gentrace_arguments != NULL)
+    {
+        vktrace_free(g_settings.gentrace_arguments);
+    }
+    g_settings.gentrace_arguments = vktrace_allocate_and_copy(arguments.toStdString().c_str());
+
+    if (g_settings.gentrace_working_dir != NULL)
+    {
+        vktrace_free(g_settings.gentrace_working_dir);
+    }
+    g_settings.gentrace_working_dir = vktrace_allocate_and_copy(workingDirectory.toStdString().c_str());
+
+    if (g_settings.gentrace_vk_layer_path != NULL)
+    {
+        vktrace_free(g_settings.gentrace_vk_layer_path);
+    }
+    g_settings.gentrace_vk_layer_path = vktrace_allocate_and_copy(vkLayerPath.toStdString().c_str());
+
+    if (g_settings.gentrace_output_file != NULL)
+    {
+        vktrace_free(g_settings.gentrace_output_file);
+    }
+    g_settings.gentrace_output_file = vktrace_allocate_and_copy(outputTraceFile.toStdString().c_str());
+    vktraceviewer_settings_updated();
+
+    QProcessEnvironment environment = QProcessEnvironment::systemEnvironment();
+    environment.insert("VK_LAYER_PATH", vkLayerPath);
+
+    m_pGenerateTraceProcess = new QProcess();
+    connect(m_pGenerateTraceProcess, SIGNAL(readyReadStandardOutput()), this, SLOT(on_readStandardOutput()));
+    connect(m_pGenerateTraceProcess, SIGNAL(readyReadStandardError()), this, SLOT(on_readStandardError()));
+
+    emit OutputMessage(VKTRACE_LOG_VERBOSE, QString("Tracing application: %1").arg(application));
+
+    // backup existing environment
+    QProcessEnvironment tmpEnv = m_pGenerateTraceProcess->processEnvironment();
+    m_pGenerateTraceProcess->setProcessEnvironment(environment);
+
+    // vktrace lives alongside this executable; 32-bit builds use "vktrace32".
+    QString vktraceExecutable = QCoreApplication::applicationDirPath() + "/vktrace";
+
+#if !defined(PLATFORM_64BIT)
+    vktraceExecutable += "32";
+#endif
+
+#if defined(PLATFORM_WINDOWS)
+    vktraceExecutable += ".exe";
+#endif
+
+    QString cmdline = vktraceExecutable + " -p \"" + application + "\" -o \"" + outputTraceFile + "\"";
+
+    if (!workingDirectory.isEmpty())
+    {
+        cmdline += " -w \"" + workingDirectory + "\"";
+    }
+
+    if (!arguments.isEmpty())
+    {
+        cmdline += " -- " + arguments;
+    }
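+
+    // The assembled command has the form (paths illustrative):
+    //   vktrace -p "<application>" -o "<output file>" [-w "<working dir>"] [-- <arguments>]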
+
+    bool bCompleted = false;
+    m_pGenerateTraceProcess->start(cmdline);
+    if (m_pGenerateTraceProcess->waitForStarted() == false)
+    {
+        emit OutputMessage(VKTRACE_LOG_ERROR, "Application could not be executed.");
+    }
+    else
+    {
+        // This is a bad idea as it will wait forever,
+        // but if the trace is taking forever then we have bigger problems.
+        if (m_pGenerateTraceProcess->waitForFinished(-1))
+        {
+            emit OutputMessage(VKTRACE_LOG_VERBOSE, "Trace Completed!");
+        }
+        int procRetValue = m_pGenerateTraceProcess->exitCode();
+        if (procRetValue == -2)
+        {
+            // proc failed to start
+            emit OutputMessage(VKTRACE_LOG_ERROR, "Application could not be executed.");
+        }
+        else if (procRetValue == -1)
+        {
+            // proc crashed
+            emit OutputMessage(VKTRACE_LOG_ERROR, "Application aborted unexpectedly.");
+        }
+        else if (procRetValue == 0)
+        {
+            // success
+            bCompleted = true;
+        }
+        else
+        {
+            // some other return value
+            bCompleted = false;
+        }
+    }
+
+    // restore previous environment
+    m_pGenerateTraceProcess->setProcessEnvironment(tmpEnv);
+
+    if (m_pGenerateTraceProcess != NULL)
+    {
+        delete m_pGenerateTraceProcess;
+        m_pGenerateTraceProcess = NULL;
+    }
+
+    return bCompleted;
+}
+
+void vktraceviewer_QGenerateTraceDialog::on_readStandardOutput()
+{
+    m_pGenerateTraceProcess->setReadChannel(QProcess::StandardOutput);
+    while (m_pGenerateTraceProcess->canReadLine())
+    {
+        QByteArray output = m_pGenerateTraceProcess->readLine();
+        if (output.endsWith("\n"))
+        {
+            output.remove(output.size() - 1, 1);
+        }
+        emit OutputMessage(VKTRACE_LOG_VERBOSE, output.constData());
+    }
+}
+
+void vktraceviewer_QGenerateTraceDialog::on_readStandardError()
+{
+    m_pGenerateTraceProcess->setReadChannel(QProcess::StandardError);
+    while (m_pGenerateTraceProcess->canReadLine())
+    {
+        QByteArray output = m_pGenerateTraceProcess->readLine();
+        if (output.endsWith("\n"))
+        {
+            output.remove(output.size() - 1, 1);
+        }
+        emit OutputMessage(VKTRACE_LOG_ERROR, output.constData());
+    }
+}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qgeneratetracedialog.h b/vktrace/src/vktrace_viewer/vktraceviewer_qgeneratetracedialog.h
new file mode 100644
index 0000000..61a22be
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qgeneratetracedialog.h
@@ -0,0 +1,104 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#ifndef VKTRACEVIEWER_QGENERATETRACEDIALOG_H
+#define VKTRACEVIEWER_QGENERATETRACEDIALOG_H
+
+extern "C"
+{
+#include "vktrace_platform.h"
+#include "vktrace_tracelog.h"
+}
+
+#include <QDialog>
+#include <QProcessEnvironment>
+
+#include <QVariant>
+#include <QAction>
+#include <QApplication>
+#include <QButtonGroup>
+#include <QCheckBox>
+#include <QFrame>
+#include <QGridLayout>
+#include <QHBoxLayout>
+#include <QHeaderView>
+#include <QLabel>
+#include <QLineEdit>
+#include <QPushButton>
+#include <QSpacerItem>
+
+class vktraceviewer_QGenerateTraceDialog : public QDialog
+{
+    Q_OBJECT
+public:
+    explicit vktraceviewer_QGenerateTraceDialog(QWidget *parent = 0);
+    virtual ~vktraceviewer_QGenerateTraceDialog();
+
+    virtual int exec();
+
+    enum DialogCode {Cancelled, Succeeded, Failed};
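+    // Cancelled and Succeeded line up with QDialog::Rejected (0) and
+    // QDialog::Accepted (1), so the value returned by QDialog::exec() stays
+    // meaningful; Failed extends it for a trace run that could not complete.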
+
+    QString get_trace_file_path();
+
+signals:
+    void OutputMessage(VktraceLogLevel level, const QString& message);
+
+public slots:
+
+private slots:
+    void on_applicationLineEdit_textChanged(const QString &text);
+    void on_traceFileLineEdit_textChanged(const QString &text);
+    void on_vkLayerPathLineEdit_textChanged(const QString &text);
+    void on_findApplicationButton_clicked();
+    void on_vkLayerPathButton_clicked();
+    void on_findTraceFileButton_clicked();
+
+    void on_readStandardOutput();
+    void on_readStandardError();
+private:
+    bool launch_application_to_generate_trace();
+
+    void check_inputs();
+
+    QProcess *m_pGenerateTraceProcess;
+    QGridLayout *m_pGridLayout;
+    QLabel *m_pApplicationLabel;
+    QLineEdit *m_pApplicationLineEdit;
+    QPushButton *m_pFindApplicationButton;
+    QLabel *m_pArgumentsLabel;
+    QLineEdit *m_pArgumentsLineEdit;
+    QLabel *m_pWorkingDirLabel;
+    QLineEdit *m_pWorkingDirLineEdit;
+    QLabel *m_pVkLayerPathLabel;
+    QLineEdit *m_pVkLayerPathLineEdit;
+    QPushButton *m_pVkLayerPathButton;
+    QLabel *m_pTracefileLabel;
+    QLineEdit *m_pTraceFileLineEdit;
+    QPushButton *m_pFindTraceFileButton;
+    QFrame *m_pButtonFrame;
+    QHBoxLayout *m_pButtonHorizontalLayout;
+    QSpacerItem *m_pButtonHSpacer;
+    QPushButton *m_pCancelButton;
+    QPushButton *m_pOkButton;
+    QSpacerItem *m_pButtonHSpacer2;
+};
+
+#endif // VKTRACEVIEWER_QGENERATETRACEDIALOG_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qgroupthreadsproxymodel.h b/vktrace/src/vktrace_viewer/vktraceviewer_qgroupthreadsproxymodel.h
new file mode 100644
index 0000000..9b727f7
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qgroupthreadsproxymodel.h
@@ -0,0 +1,235 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#ifndef VKTRACEVIEWER_QGROUPTHREADSPROXYMODEL_H
+#define VKTRACEVIEWER_QGROUPTHREADSPROXYMODEL_H
+
+#include "vktraceviewer_QTraceFileModel.h"
+#include <QAbstractProxyModel>
+#include <QStandardItem>
+#include <QList>
+#include <QDebug>
+
+struct GroupInfo
+{
+    int groupIndex;
+    uint32_t threadId;
+    QPersistentModelIndex modelIndex;
+    QList<QPersistentModelIndex> children;
+};
+
+class vktraceviewer_QGroupThreadsProxyModel : public QAbstractProxyModel
+{
+    Q_OBJECT
+public:
+    vktraceviewer_QGroupThreadsProxyModel(QObject *parent = 0)
+        : QAbstractProxyModel(parent)
+    {
+        buildGroups(NULL);
+    }
+
+    virtual ~vktraceviewer_QGroupThreadsProxyModel()
+    {
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual void setSourceModel(QAbstractItemModel *sourceModel)
+    {
+        QAbstractProxyModel::setSourceModel(sourceModel);
+
+        if (sourceModel->inherits("vktraceviewer_QTraceFileModel"))
+        {
+            vktraceviewer_QTraceFileModel* pTFM = static_cast<vktraceviewer_QTraceFileModel*>(sourceModel);
+            buildGroups(pTFM);
+        }
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual int rowCount(const QModelIndex &parent) const
+    {
+        // ask the source
+        return sourceModel()->rowCount(mapToSource(parent));
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual bool hasChildren(const QModelIndex &parent) const
+    {
+        if (!parent.isValid())
+        {
+            return true;
+        }
+
+        return false;
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual QVariant data(const QModelIndex &index, int role) const
+    {
+        if (!index.isValid())
+        {
+            return QVariant();
+        }
+
+        return mapToSource(index).data(role);
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual Qt::ItemFlags flags(const QModelIndex &index) const
+    {
+        return Qt::ItemIsEnabled | Qt::ItemIsSelectable;
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual int columnCount(const QModelIndex &parent) const
+    {
+        return sourceModel()->columnCount() + m_uniqueThreadIdMapToColumn.count();
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual QVariant headerData(int section, Qt::Orientation orientation, int role) const
+    {
+        if (!isThreadColumn(section))
+        {
+            return sourceModel()->headerData(section, orientation, role);
+        }
+        else
+        {
+            if (role == Qt::DisplayRole)
+            {
+                int threadIndex = getThreadColumnIndex(section);
+                return QString("Thread %1").arg(m_uniqueThreadIdMapToColumn.key(threadIndex));
+            }
+        }
+
+        return QVariant();
+    }
+
+    //---------------------------------------------------------------------------------------------
+    QModelIndex index(int row, int column, const QModelIndex &parent = QModelIndex()) const
+    {
+        if (!hasIndex(row, column, parent))
+            return QModelIndex();
+
+        return createIndex(row, column);
+    }
+
+    //---------------------------------------------------------------------------------------------
+    QModelIndex parent(const QModelIndex &child) const
+    {
+        if (!child.isValid())
+            return QModelIndex();
+
+        return QModelIndex();
+    }
+
+    //---------------------------------------------------------------------------------------------
+    QModelIndex mapToSource(const QModelIndex &proxyIndex) const
+    {
+        if (!proxyIndex.isValid())
+            return QModelIndex();
+
+        QModelIndex result;
+        if (!isThreadColumn(proxyIndex.column()))
+        {
+            // This is a source-model column rather than one of the appended
+            // thread columns, so it maps straight through to the same cell in the source.
+            result = sourceModel()->index(proxyIndex.row(), proxyIndex.column());
+        }
+        else
+        {
+            int threadIndex = getThreadColumnIndex(proxyIndex.column());
+            if (m_packetIndexToColumn[proxyIndex.row()] == threadIndex)
+            {
+                return sourceModel()->index(proxyIndex.row(), vktraceviewer_QTraceFileModel::Column_EntrypointName);
+            }
+        }
+
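+        // A thread-column cell whose packet ran on a different thread falls
+        // through with an invalid 'result' and is rendered empty.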
+        return result;
+    }
+
+    //---------------------------------------------------------------------------------------------
+    QModelIndex mapFromSource(const QModelIndex &sourceIndex) const
+    {
+        if (!sourceIndex.isValid())
+            return QModelIndex();
+
+        return createIndex(sourceIndex.row(), sourceIndex.column());
+    }
+
+    //---------------------------------------------------------------------------------------------
+    virtual QModelIndexList match(const QModelIndex &start, int role, const QVariant &value, int hits, Qt::MatchFlags flags) const
+    {
+        QModelIndexList results = sourceModel()->match(start, role, value, hits, flags);
+
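+        // Delegate the search to the source model, then translate each hit
+        // back into proxy coordinates.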
+        for (int i = 0; i < results.count(); i++)
+        {
+            results[i] = mapFromSource(results[i]);
+        }
+
+        return results;
+    }
+
+    //---------------------------------------------------------------------------------------------
+private:
+    QMap<uint32_t, int> m_uniqueThreadIdMapToColumn;
+
+    // Each entry in the list corresponds to a packet index;
+    // the int stored in the list indicates which column the API call belongs in.
+    QList<int> m_packetIndexToColumn;
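+    //
+    // Example: packets on thread IDs [12, 34, 12, 56] produce
+    //   m_uniqueThreadIdMapToColumn = {12: 0, 34: 1, 56: 2}
+    //   m_packetIndexToColumn      = [0, 1, 0, 2]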
+
+    //---------------------------------------------------------------------------------------------
+    bool isThreadColumn(int columnIndex) const
+    {
+        return (columnIndex >= sourceModel()->columnCount());
+    }
+
+    //---------------------------------------------------------------------------------------------
+    int getThreadColumnIndex(int proxyColumnIndex) const
+    {
+        return proxyColumnIndex - sourceModel()->columnCount();
+    }
+
+    //---------------------------------------------------------------------------------------------
+    void buildGroups(vktraceviewer_QTraceFileModel* pTFM)
+    {
+        m_uniqueThreadIdMapToColumn.clear();
+        m_packetIndexToColumn.clear();
+
+        if (pTFM != NULL)
+        {
+            // Determine how many additional columns are needed by counting the number of distinct thread IDs in use.
+            for (int i = 0; i < pTFM->rowCount(); i++)
+            {
+                vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)pTFM->index(i, 0).internalPointer();
+                if (pHeader != NULL)
+                {
+                    if (!m_uniqueThreadIdMapToColumn.contains(pHeader->thread_id))
+                    {
+                        int columnIndex = m_uniqueThreadIdMapToColumn.count();
+                        m_uniqueThreadIdMapToColumn.insert(pHeader->thread_id, columnIndex);
+                    }
+
+                    m_packetIndexToColumn.append(m_uniqueThreadIdMapToColumn[pHeader->thread_id]);
+                }
+            }
+        }
+    }
+};
+
+#endif // VKTRACEVIEWER_QGROUPTHREADSPROXYMODEL_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qimageviewer.h b/vktrace/src/vktrace_viewer/vktraceviewer_qimageviewer.h
new file mode 100644
index 0000000..9f35d32
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qimageviewer.h
@@ -0,0 +1,211 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+
+#ifndef _VKTRACEVIEWER_QIMAGEVIEWER_H_
+#define _VKTRACEVIEWER_QIMAGEVIEWER_H_
+
+#include <cassert>
+
+#include <QFileInfo>
+#include <QLabel>
+#include <QScrollArea>
+#include <QScrollBar>
+#include <QWheelEvent>
+
+// Pretend an image is a QScrollArea with some special event handling.
+class vktraceviewer_qimageviewer : public QScrollArea
+{
+    Q_OBJECT
+public:
+    explicit vktraceviewer_qimageviewer(QWidget* parent = 0)
+        : QScrollArea(parent),
+          m_pImageLabel(NULL),
+          m_pPanStart(0, 0),
+          m_pPan(false),
+          m_pAutoFit(false)
+    {
+        // Create a basic image viewer using a QLabel to display the image.
+        m_pImageLabel = new QLabel;
+        assert(m_pImageLabel != NULL);
+
+        m_pImageLabel->setBackgroundRole(QPalette::Base);
+        m_pImageLabel->setSizePolicy(QSizePolicy::Ignored, QSizePolicy::Ignored);
+        m_pImageLabel->setScaledContents(true);
+
+        // The QLabel is embedded in a QScrollArea so the image can be panned.
+        this->setBackgroundRole(QPalette::Dark);
+        this->setWidget(m_pImageLabel);
+    }
+
+    virtual ~vktraceviewer_qimageviewer()
+    {
+        if (m_pImageLabel != NULL)
+        {
+            delete m_pImageLabel;
+            m_pImageLabel = NULL;
+        }
+    }
+
+    void mouseMoveEvent(QMouseEvent* event)
+    {
+        if(m_pPan)
+        {
+            this->horizontalScrollBar()->setValue(this->horizontalScrollBar()->value() -
+                (event->x() - m_pPanStart.x()));
+            this->verticalScrollBar()->setValue(this->verticalScrollBar()->value() -
+                (event->y() - m_pPanStart.y()));
+            m_pPanStart = event->pos();
+
+            event->accept();
+            return;
+        }
+
+        event->ignore();
+    }
+
+    void mousePressEvent(QMouseEvent* event)
+    {
+        if(event->button() == Qt::MiddleButton)
+        {
+            m_pPan = true;
+            setCursor(Qt::ClosedHandCursor);
+            m_pPanStart = event->pos();
+
+            event->accept();
+            return;
+        }
+
+        event->ignore();
+    }
+
+    void mouseReleaseEvent(QMouseEvent* event)
+    {
+        if(event->button() == Qt::MiddleButton)
+        {
+            m_pPan = false;
+            setCursor(Qt::ArrowCursor);
+
+            event->accept();
+            return;
+        }
+
+        event->ignore();
+    }
+
+    void resizeEvent(QResizeEvent* event)
+    {
+        QSize const size = computeMinimumSize();
+        m_pImageLabel->setMinimumSize(size);
+
+        if(m_pAutoFit)
+        {
+            m_pImageLabel->resize(size);
+        }
+
+        event->accept();
+    }
+
+    void wheelEvent(QWheelEvent* event)
+    {
+        if(event->orientation() == Qt::Vertical)
+        {
+            // Stop automatically resizing the image when zoom is requested.
+            m_pAutoFit = false;
+
+            // Compute the scaling factor.
+            int const numDegrees = event->delta() / 8;
+            int const numSteps = numDegrees / 15;
+            double const factor = 1.0 + 0.1 * numSteps;
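+            // (One standard wheel notch: delta = 120 -> 15 degrees -> 1 step -> factor 1.1.)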
+
+            m_pImageLabel->resize(m_pImageLabel->size() * factor);
+
+            zoomScrollBar(this->horizontalScrollBar(), factor);
+            zoomScrollBar(this->verticalScrollBar(), factor);
+
+            event->accept();
+            return;
+        }
+
+        event->ignore();
+    }
+
+    bool loadImage(QString const& fileName)
+    {
+        QFileInfo fileInfo(fileName);
+        if(!fileInfo.exists() || !fileInfo.isFile())
+        {
+            return false;
+        }
+
+        QImage image(fileName);
+        if(image.isNull())
+        {
+            return false;
+        }
+
+        m_pImageLabel->setPixmap(QPixmap::fromImage(image));
+        m_pImageLabel->adjustSize();
+        m_pImageLabel->setMaximumSize(image.size());
+
+        // Resize the image to the scroll area.
+        m_pAutoFit = true;
+
+        return true;
+    }
+
+private:
+    QSize computeMinimumSize() const
+    {
+        if(m_pImageLabel->pixmap() == NULL)
+        {
+            return QSize(0, 0);
+        }
+
+        // Note: cast to double first; width() and height() both return int.
+        double const aspect = static_cast<double>(m_pImageLabel->pixmap()->width()) /
+            m_pImageLabel->pixmap()->height();
+        if(aspect > 1.0)
+        {
+            int const minWidth = this->width() - 2 * this->frameWidth();
+            int const minHeight = minWidth * 1.0 / aspect;
+            return QSize(minWidth, minHeight);
+        }
+        else
+        {
+            int const minHeight = this->height() - 2 * this->frameWidth();
+            int const minWidth = minHeight * aspect;
+            return QSize(minWidth, minHeight);
+        }
+    }
+
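+    // Scale a scroll bar's position so the content point at the center of the
+    // viewport stays centered after the image is resized by 'factor'.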
+    void zoomScrollBar(QScrollBar* scrollBar, double const& factor)
+    {
+        int const value = static_cast<int>(factor * scrollBar->value() +
+            ((factor - 1.0) * scrollBar->pageStep() / 2));
+        scrollBar->setValue(value);
+    }
+
+    QLabel* m_pImageLabel;
+    QPoint m_pPanStart;
+    bool m_pPan;
+    bool m_pAutoFit;
+};
+
+#endif //_VKTRACEVIEWER_QIMAGEVIEWER_H_
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qsettingsdialog.cpp b/vktrace/src/vktrace_viewer/vktraceviewer_qsettingsdialog.cpp
new file mode 100644
index 0000000..d30972b
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qsettingsdialog.cpp
@@ -0,0 +1,168 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#include "vktraceviewer_qsettingsdialog.h"
+#include "vktraceviewer_settings.h"
+
+#include <QDialogButtonBox>
+#include <QCheckBox>
+#include <QGridLayout>
+#include <QGroupBox>
+#include <QHeaderView>
+#include <QLabel>
+#include <QLineEdit>
+#include <QPushButton>
+#include <QResizeEvent>
+#include <QTableWidget>
+#include <QVBoxLayout>
+
+Q_DECLARE_METATYPE(vktrace_SettingInfo*);
+
+vktraceviewer_QSettingsDialog::vktraceviewer_QSettingsDialog(QWidget *parent)
+    : QDialog(parent),
+      m_pSettingGroups(NULL),
+      m_numSettingGroups(0)
+{
+    this->setWindowTitle("Settings");
+
+    QVBoxLayout* pLayout = new QVBoxLayout(this);
+    this->setLayout(pLayout);
+
+    m_pTabWidget = new QTabWidget(this);
+    pLayout->addWidget(m_pTabWidget);
+
+    QDialogButtonBox* pButtonBox = new QDialogButtonBox(/*QDialogButtonBox::Save | QDialogButtonBox::Cancel*/);
+    pButtonBox->addButton("OK", QDialogButtonBox::RejectRole);
+    pButtonBox->addButton("Save && Apply", QDialogButtonBox::AcceptRole);
+    pLayout->addWidget(pButtonBox);
+    connect(pButtonBox, SIGNAL(accepted()), this, SLOT(acceptCB()));
+    connect(pButtonBox, SIGNAL(rejected()), this, SLOT(cancelCB()));
+}
+
+vktraceviewer_QSettingsDialog::~vktraceviewer_QSettingsDialog()
+{
+    removeTabs();
+}
+
+void vktraceviewer_QSettingsDialog::removeTabs()
+{
+    if (m_pTabWidget == NULL)
+    {
+        return;
+    }
+
+    while (m_pTabWidget->count() > 0)
+    {
+        m_pTabWidget->removeTab(0);
+    }
+}
+
+void vktraceviewer_QSettingsDialog::setGroups(vktrace_SettingGroup* pSettingGroups, unsigned int numGroups)
+{
+    removeTabs();
+
+    m_pSettingGroups = pSettingGroups;
+    m_numSettingGroups = numGroups;
+
+    // add tabs to display other groups of settings
+    for (unsigned int i = 0; i < m_numSettingGroups; i++)
+    {
+        this->add_tab(&m_pSettingGroups[i]);
+    }
+}
+
+void vktraceviewer_QSettingsDialog::acceptCB()
+{
+    save();
+}
+
+void vktraceviewer_QSettingsDialog::cancelCB()
+{
+    reject();
+}
+
+void vktraceviewer_QSettingsDialog::resizeEvent(QResizeEvent *pEvent)
+{
+    emit Resized(pEvent->size().width(), pEvent->size().height());
+}
+
+void vktraceviewer_QSettingsDialog::save()
+{
+    // save vktraceviewer settings
+
+    emit SaveSettings(m_pSettingGroups, m_numSettingGroups);
+    accept();
+}
+
+void vktraceviewer_QSettingsDialog::add_tab(vktrace_SettingGroup* pGroup)
+{
+    QWidget* pTab = new QWidget(m_pTabWidget);
+    m_pTabWidget->addTab(pTab, pGroup->pName);
+    QHBoxLayout* pLayout = new QHBoxLayout(pTab);
+    pTab->setLayout(pLayout);
+
+    QTableWidget* pTable = new QTableWidget(pGroup->numSettings, 2, pTab);
+
+    pLayout->addWidget(pTable, 1);
+
+    QStringList headers;
+    headers << "Name" << "Value";
+    pTable->setHorizontalHeaderLabels(headers);
+    pTable->horizontalHeader()->setSectionResizeMode(0, QHeaderView::ResizeToContents);
+    pTable->horizontalHeader()->setSectionResizeMode(1, QHeaderView::Stretch);
+
+    connect(pTable, SIGNAL(itemChanged(QTableWidgetItem *)), this, SLOT(settingEdited(QTableWidgetItem *)));
+    int row = 0;
+    for (unsigned int i = 0; i < pGroup->numSettings; i++)
+    {
+        QTableWidgetItem *nameItem = new QTableWidgetItem(pGroup->pSettings[i].pLongName);
+        nameItem->setData(Qt::UserRole, QVariant::fromValue(&pGroup->pSettings[i]));
+        pTable->setItem(row, 0, nameItem);
+
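+        // NOTE: vktrace_SettingInfo_stringify_value() returns an allocated string
+        // that is never freed here, as the variable name admits.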
+        char* pLeakedMem = vktrace_SettingInfo_stringify_value(&pGroup->pSettings[i]);
+        QTableWidgetItem *valueItem = new QTableWidgetItem(pLeakedMem);
+        valueItem->setData(Qt::UserRole, QVariant::fromValue(&pGroup->pSettings[i]));
+        pTable->setItem(row, 1, valueItem);
+
+        ++row;
+    }
+}
+
+void vktraceviewer_QSettingsDialog::settingEdited(QTableWidgetItem *pItem)
+{
+    vktrace_SettingInfo* pSetting = pItem->data(Qt::UserRole).value<vktrace_SettingInfo*>();
+
+    if (pSetting != NULL)
+    {
+        if (pItem->column() == 0)
+        {
+            vktrace_free((void*)pSetting->pLongName);
+            pSetting->pLongName = vktrace_allocate_and_copy(pItem->text().toStdString().c_str());
+        }
+        else if (pItem->column() == 1)
+        {
+            vktrace_SettingInfo_parse_value(pSetting, pItem->text().toStdString().c_str());
+        }
+        else
+        {
+            // invalid column
+        }
+    }
+}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qsettingsdialog.h b/vktrace/src/vktrace_viewer/vktraceviewer_qsettingsdialog.h
new file mode 100644
index 0000000..ca637d0
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qsettingsdialog.h
@@ -0,0 +1,65 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#ifndef _VKTRACEVIEWER_QSETTINGSDIALOG_H_
+#define _VKTRACEVIEWER_QSETTINGSDIALOG_H_
+
+#include <QDialog>
+#include <QTabWidget>
+#include <QTableWidgetItem>
+#include "vktraceviewer_settings.h"
+
+Q_DECLARE_METATYPE(vktrace_SettingGroup)
+
+class vktraceviewer_QSettingsDialog : public QDialog
+{
+    Q_OBJECT
+
+public:
+    vktraceviewer_QSettingsDialog(QWidget* parent = 0);
+    ~vktraceviewer_QSettingsDialog();
+
+    void setGroups(vktrace_SettingGroup* pSettingGroups, unsigned int numGroups);
+
+    void save();
+
+private:
+    vktrace_SettingGroup* m_pSettingGroups;
+    unsigned int m_numSettingGroups;
+
+    QTabWidget* m_pTabWidget;
+    void add_tab(vktrace_SettingGroup* pGroup);
+    virtual void resizeEvent(QResizeEvent *pEvent);
+    void removeTabs();
+
+signals:
+    void SaveSettings(vktrace_SettingGroup* pUpdatedSettingGroups, unsigned int numGroups);
+    void Resized(unsigned int width, unsigned int height);
+
+private slots:
+    void acceptCB();
+    void cancelCB();
+
+    void settingEdited(QTableWidgetItem *pItem);
+
+};
+
+#endif // _VKTRACEVIEWER_QSETTINGSDIALOG_H_
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qsvgviewer.h b/vktrace/src/vktrace_viewer/vktraceviewer_qsvgviewer.h
new file mode 100644
index 0000000..09ac674
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qsvgviewer.h
@@ -0,0 +1,135 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+
+#ifndef _VKTRACEVIEWER_QSVGVIEWER_H_
+#define _VKTRACEVIEWER_QSVGVIEWER_H_
+
+#include <QFileInfo>
+#include <QGraphicsSvgItem>
+#include <QGraphicsView>
+#include <QWheelEvent>
+
+class vktraceviewer_qsvgviewer : public QGraphicsView
+{
+    Q_OBJECT
+public:
+    vktraceviewer_qsvgviewer(QWidget* parent = 0) :
+        QGraphicsView(parent),
+        disabledScene(NULL),
+        enabledScene(NULL),
+        autoFit(false)
+    {
+        // The destructor for QGraphicsScene will be called when this QGraphicsView is
+        // destroyed.
+        enabledScene = new QGraphicsScene(this);
+        disabledScene = new QGraphicsScene(this);
+        this->setScene(disabledScene);
+
+        // Anchor the point under the mouse during view transformations.
+        this->setTransformationAnchor(AnchorUnderMouse);
+
+        // Enable drag scrolling with the left mouse button.
+        this->setDragMode(ScrollHandDrag);
+
+        // Always update the entire viewport. Don't waste time trying to figure out
+        // which items need to be updated since there is only one.
+        this->setViewportUpdateMode(FullViewportUpdate);
+    }
+
+    void changeEvent(QEvent* event)
+    {
+        switch(event->type())
+        {
+            case QEvent::EnabledChange:
+                if(this->isEnabled())
+                {
+                    this->setScene(enabledScene);
+                }
+                else
+                {
+                    this->setScene(disabledScene);
+                }
+                break;
+            default:
+                break;
+        }
+    }
+
+    void paintEvent(QPaintEvent* event)
+    {
+        // Resize the scene to fit the widget. This is deferred until the first paint
+        // event (when the widget size is known).
+        if(autoFit)
+        {
+            this->fitInView(enabledScene->itemsBoundingRect(), Qt::KeepAspectRatio);
+            autoFit = false;
+        }
+
+        QGraphicsView::paintEvent(event);
+    }
+
+    void wheelEvent(QWheelEvent* event)
+    {
+        if(event->orientation() == Qt::Vertical)
+        {
+            // The delta value is in units of eighths of a degree.
+            qreal const degrees = event->delta() / 8.0;
+
+            // According to Qt documentation, mice have steps of 15-degrees.
+            qreal const steps = degrees / 15.0;
+
+            qreal factor = 1.0 + 0.1 * steps;
+
+            this->scale(factor, factor);
+
+            event->accept();
+        }
+    }
+
+    bool load(QString const& fileName)
+    {
+        QFileInfo fileInfo(fileName);
+        if(!fileInfo.exists() || !fileInfo.isFile())
+        {
+            return false;
+        }
+
+        this->resetTransform();
+
+        enabledScene->clear();
+
+        // The destructor for QGraphicsSvgItem will be called when the scene is cleared.
+        // This occurs when a SVG is loaded or when the QGraphicsScene is destroyed.
+        enabledScene->addItem(new QGraphicsSvgItem(fileName));
+
+        autoFit = true;
+
+        return true;
+    }
+
+private:
+    QGraphicsScene* disabledScene;
+    QGraphicsScene* enabledScene;
+
+    bool autoFit;
+};
+
+#endif // _VKTRACEVIEWER_QSVGVIEWER_H_
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qtimelineview.cpp b/vktrace/src/vktrace_viewer/vktraceviewer_qtimelineview.cpp
new file mode 100644
index 0000000..2995da0
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qtimelineview.cpp
@@ -0,0 +1,761 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+
+#include <QApplication>
+#include <QPainter>
+#include <QPaintEvent>
+#include <QToolTip>
+#ifdef _WIN32
+// The following line allows Visual Studio to provide the M_PI_2 constant:
+#define _USE_MATH_DEFINES
+#endif
+#include <limits.h>     // CHAR_BIT, used by u64ToFloat below
+#include <algorithm>    // std::min
+#include <assert.h>
+#include <math.h>
+#include "vktraceviewer_qtimelineview.h"
+#include "vktraceviewer_QTraceFileModel.h"
+
+// helper
+float u64ToFloat(uint64_t value)
+{
+    // taken from: http://stackoverflow.com/questions/4400747/converting-from-unsigned-long-long-to-float-with-round-to-nearest-even
+    const int mask_bit_count = 31;
+
+    // How many bits are needed?
+    int b = sizeof(uint64_t) * CHAR_BIT - 1;
+    for (; b >= 0; --b)
+    {
+        if (value & (1ull << b))
+        {
+            break;
+        }
+    }
+
+    // If there are few enough significant bits, a plain cast suffices.
+    if (b < mask_bit_count)
+    {
+        return static_cast<float>(value);
+    }
+
+    // Save off the low-order useless bits:
+    uint64_t low_bits = value & ((1ull << (b - mask_bit_count)) - 1);
+
+    // Now mask away those useless low bits:
+    value &= ~((1ull << (b - mask_bit_count)) - 1);
+
+    // Finally, decide how to round the new LSB:
+    if (low_bits > ((1ull << (b - mask_bit_count)) / 2ull))
+    {
+        // Round up.
+        value |= (1ull << (b - mask_bit_count));
+    }
+    else
+    {
+        // Round down.
+        value &= ~(1ull << (b - mask_bit_count));
+    }
+
+    return static_cast<float>(value);
+}
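+
+// Worked example (illustrative): for value = (1ull << 40) + 300, the highest
+// set bit is b = 40, so the low b - mask_bit_count = 9 bits are masked off.
+// low_bits = 300 > 2^9 / 2 = 256, so bit 9 is rounded up and the function
+// returns float(2^40 + 512). Smaller values (highest set bit below bit 31)
+// take the early cast path instead.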
+
+//=============================================================================
+vktraceviewer_QTimelineItemDelegate::vktraceviewer_QTimelineItemDelegate(QObject *parent)
+    : QAbstractItemDelegate(parent)
+{
+    assert(parent != NULL);
+}
+
+vktraceviewer_QTimelineItemDelegate::~vktraceviewer_QTimelineItemDelegate()
+{
+
+}
+
+void vktraceviewer_QTimelineItemDelegate::paint(QPainter *painter, const QStyleOptionViewItem &option, const QModelIndex &index) const
+{
+    vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)index.internalPointer();
+
+    if (pHeader->entrypoint_end_time <= pHeader->entrypoint_begin_time)
+    {
+        return;
+    }
+
+    painter->save();
+    {
+        vktraceviewer_QTimelineView* pTimeline = (vktraceviewer_QTimelineView*)parent();
+        if (pTimeline != NULL)
+        {
+            QRectF rect = option.rect;
+
+            if (rect.width() == 0)
+            {
+                rect.setWidth(1);
+            }
+
+            float duration = u64ToFloat(pHeader->entrypoint_end_time - pHeader->entrypoint_begin_time);
+            float durationRatio = duration / pTimeline->getMaxItemDuration();
+            int intensity = std::min(255, (int)(durationRatio * 255.0f));
+            QColor color(intensity, 255-intensity, 0);
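+
+            // Illustrative example: an item lasting half the longest duration
+            // gets intensity 127, a yellowish color halfway between green
+            // (fast, intensity 0) and red (slow, intensity 255).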
+
+            // Add a gradient to the items to better distinguish the end of one item from the beginning of the next.
+            QLinearGradient linearGrad(rect.center(), rect.bottomRight());
+            linearGrad.setColorAt(0, color);
+            linearGrad.setColorAt(1, color.darker(150));
+
+            painter->setBrush(linearGrad);
+            painter->setPen(Qt::NoPen);
+
+            painter->drawRect(rect);
+
+            if (rect.width() >= 2)
+            {
+                // draw shadow and highlight around the item
+                painter->setPen(color.darker(175));
+                painter->drawLine(rect.right()-1, rect.top(), rect.right()-1, rect.bottom()-1);
+                painter->drawLine(rect.right()-1, rect.bottom()-1, rect.left(), rect.bottom()-1);
+
+                painter->setPen(color.lighter());
+                painter->drawLine(rect.left(), rect.bottom()-1, rect.left(), rect.top());
+                painter->drawLine(rect.left(), rect.top(), rect.right()-1, rect.top());
+            }
+        }
+    }
+
+    painter->restore();
+}
+
+QSize vktraceviewer_QTimelineItemDelegate::sizeHint( const QStyleOptionViewItem &option, const QModelIndex &index) const
+{
+    QSize size;
+
+    vktraceviewer_QTimelineView* pTimeline = (vktraceviewer_QTimelineView*)parent();
+    if (pTimeline != NULL)
+    {
+        QRectF rect = pTimeline->visualRect(index);
+
+        size = rect.toRect().size();
+    }
+    return size;
+}
+
+//=============================================================================
+vktraceviewer_QTimelineView::vktraceviewer_QTimelineView(QWidget *parent) :
+    QAbstractItemView(parent),
+    m_maxItemDuration(0),
+    m_maxZoom(0.001f),
+    m_threadHeight(0),
+    m_hashIsDirty(true),
+    m_margin(10),
+    m_pPixmap(NULL),
+    m_itemDelegate(this)
+{
+    horizontalScrollBar()->setRange(0,0);
+    verticalScrollBar()->setRange(0,0);
+
+    m_background = QBrush(QColor(200,200,200));
+    m_trianglePen = QPen(Qt::darkGray);
+    m_trianglePen.setWidth(1);
+    m_textPen = QPen(Qt::white);
+    m_textFont.setPixelSize(50);
+
+    // Allows tracking the mouse position when it is over the timeline
+    setMouseTracking(true);
+
+    m_durationToViewportScale = 1;
+    m_zoomFactor = 1;
+    m_lineLength = 1;
+    m_scrollBarWidth = QApplication::style()->pixelMetric(QStyle::PM_ScrollBarExtent);
+}
+
+//-----------------------------------------------------------------------------
+vktraceviewer_QTimelineView::~vktraceviewer_QTimelineView()
+{
+    m_threadIdList.clear();
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTimelineView::setModel(QAbstractItemModel* pModel)
+{
+    QAbstractItemView::setModel(pModel);
+    m_hashIsDirty = true;
+    setItemDelegate(&m_itemDelegate);
+
+    m_threadIdList.clear();
+    m_threadMask.clear();
+    m_maxItemDuration = 0;
+    m_rawStartTime = 0;
+    m_rawEndTime = 0;
+
+    deletePixmap();
+
+    // Gather some stats from the model
+    if (model() == NULL)
+    {
+        horizontalScrollBar()->setRange(0,0);
+        verticalScrollBar()->setRange(0,0);
+        return;
+    }
+
+    int numRows = model()->rowCount();
+    for (int i = 0; i < numRows; i++)
+    {
+        // Count number of unique thread Ids
+        QModelIndex item = model()->index(i, vktraceviewer_QTraceFileModel::Column_ThreadId);
+        if (item.isValid())
+        {
+            uint32_t threadId = item.data().toUInt();
+            if (!m_threadIdList.contains(threadId))
+            {
+                m_threadIdList.append(threadId);
+                m_threadMask.insert(threadId, QVector<int>());
+                m_threadArea.append(QRect());
+            }
+        }
+
+        // Find duration of longest item
+        item = model()->index(i, vktraceviewer_QTraceFileModel::Column_CpuDuration);
+        if (item.isValid())
+        {
+            float duration = item.data().toFloat();
+            if (m_maxItemDuration < duration)
+            {
+                m_maxItemDuration = duration;
+            }
+        }
+    }
+
+    // Get start time
+    QModelIndex start = model()->index(0, vktraceviewer_QTraceFileModel::Column_BeginTime);
+    if (start.isValid())
+    {
+        m_rawStartTime = start.data().toULongLong();
+    }
+
+    // Get end time
+    QModelIndex end = model()->index(numRows - 1, vktraceviewer_QTraceFileModel::Column_EndTime);
+    if (end.isValid())
+    {
+        m_rawEndTime = end.data().toULongLong();
+    }
+
+    // The duration-to-viewport scale should map the entire timeline into the current viewport width.
+    m_lineLength = m_rawEndTime - m_rawStartTime;
+
+    int initialTimelineWidth = viewport()->width() - 2*m_margin - m_scrollBarWidth;
+    m_durationToViewportScale = (float)initialTimelineWidth / u64ToFloat(m_lineLength);
+
+    m_zoomFactor = m_durationToViewportScale;
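+
+    // Illustrative example (assuming nanosecond timestamps): a 10-second
+    // trace gives m_lineLength == 1e10; with a 1000-pixel timeline area the
+    // scale is 1e-7 pixels per time unit, so the whole trace fits the view.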
+
+    verticalScrollBar()->setMaximum(1000);
+    verticalScrollBar()->setValue(0);
+    verticalScrollBar()->setPageStep(1);
+    verticalScrollBar()->setSingleStep(1);
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTimelineView::calculateRectsIfNecessary()
+{
+    if (!m_hashIsDirty)
+    {
+        return;
+    }
+
+    if (model() == NULL)
+    {
+        return;
+    }
+
+    int itemHeight = m_threadHeight * 0.4;
+
+    for (int threadIndex = 0; threadIndex < m_threadIdList.size(); threadIndex++)
+    {
+        int top = (m_threadHeight * threadIndex) + (m_threadHeight * 0.5) - itemHeight/2;
+        this->m_threadArea[threadIndex] = QRect(0, top, viewport()->width(), itemHeight);
+    }
+
+    int numRows = model()->rowCount();
+    for (int row = 0; row < numRows; row++)
+    {
+        QRectF rect;
+        QModelIndex item = model()->index(row, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+
+        vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)item.internalPointer();
+
+        // Only compute a rect for items with a positive duration.
+        if (pHeader->entrypoint_end_time > pHeader->entrypoint_begin_time)
+        {
+            int threadIndex = m_threadIdList.indexOf(pHeader->thread_id);
+            int topOffset = (m_threadHeight * threadIndex) + (m_threadHeight * 0.5);
+
+            uint64_t duration = pHeader->entrypoint_end_time - pHeader->entrypoint_begin_time;
+
+            float leftOffset = u64ToFloat(pHeader->entrypoint_begin_time - m_rawStartTime);
+            float width = u64ToFloat(duration);
+
+            // create the rect that represents this item
+            rect.setLeft(leftOffset);
+            rect.setTop(topOffset - (itemHeight/2));
+            rect.setWidth(width);
+            rect.setHeight(itemHeight);
+        }
+
+        m_rowToRectMap[row] = rect;
+    }
+
+    m_hashIsDirty = false;
+    viewport()->update();
+}
+
+//-----------------------------------------------------------------------------
+QRectF vktraceviewer_QTimelineView::itemRect(const QModelIndex &item) const
+{
+    QRectF rect;
+    if (item.isValid())
+    {
+        rect = m_rowToRectMap.value(item.row());
+    }
+    return rect;
+}
+
+//-----------------------------------------------------------------------------
+bool vktraceviewer_QTimelineView::event(QEvent * e)
+{
+    if (e->type() == QEvent::ToolTip)
+    {
+        QHelpEvent* pHelp = static_cast<QHelpEvent*>(e);
+        QModelIndex index = indexAt(pHelp->pos());
+        if (index.isValid())
+        {
+            vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)index.internalPointer();
+            QToolTip::showText(pHelp->globalPos(), QString("Call %1:\n%2").arg(pHeader->global_packet_index).arg(index.data().toString()));
+            return true;
+        }
+        else
+        {
+            QToolTip::hideText();
+        }
+    }
+
+    return QAbstractItemView::event(e);
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTimelineView::resizeEvent(QResizeEvent *event)
+{
+    m_hashIsDirty = true;
+    deletePixmap();
+
+    // The duration to viewport scale should allow us to map the entire timeline into the current window width.
+    if (m_lineLength > 0)
+    {
+        // Calculate zoom ratio prior to the resize
+        float ratio = m_zoomFactor / m_durationToViewportScale;
+
+        // Adjust scale that fits the timeline duration to the viewport area
+        int timelineViewportWidth = viewport()->width() - 2*m_margin - m_scrollBarWidth;
+        m_durationToViewportScale = (float)timelineViewportWidth / u64ToFloat(m_lineLength);
+
+        // Adjust the zoom factor based on the new duration to viewport scale and the previous ratio
+        m_zoomFactor = m_durationToViewportScale * ratio;
+
+        // Adjust horizontal scroll bar to maintain current view as best as possible
+        float hRatio = (float)horizontalScrollBar()->value() / qMax(1.0f,(float)horizontalScrollBar()->maximum());
+        updateGeometries();
+        horizontalScrollBar()->setValue(hRatio * horizontalScrollBar()->maximum());
+    }
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTimelineView::scrollContentsBy(int dx, int dy)
+{
+    deletePixmap();
+
+    if (dy != 0)
+    {
+        QPoint pos = m_mousePosition;
+        int focusX = pos.x();
+        if (pos.isNull())
+        {
+            // If the mouse position is null (i.e., the mouse is not in the
+            // viewport), zooming will happen around the center of the timeline.
+            focusX = (viewport()->width() - m_scrollBarWidth) / 2;
+        }
+
+        int x = focusX + horizontalScrollBar()->value();
+        float wx = (float)x / m_zoomFactor;
+
+        // The zoom factor is a smoothed interpolation (based on the vertical scroll bar value and max)
+        // between the m_durationToViewportScale (which fits the entire timeline into the viewport width)
+        // and m_maxZoom (which is initialized to 1/1000)
+        float zoomRatio = (float)verticalScrollBar()->value() / (float)verticalScrollBar()->maximum();
+        zoomRatio = 1-cos(zoomRatio*M_PI_2);
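+        // Illustrative mapping: 1 - cos(r * pi/2) eases from 0 to 1; a scroll
+        // bar halfway along (r = 0.5) yields ~0.29 rather than 0.5, making
+        // early zoom steps gentle and later ones more aggressive.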
+        float diff = m_maxZoom - m_durationToViewportScale;
+        m_zoomFactor = m_durationToViewportScale + zoomRatio * diff;
+        updateGeometries();
+
+        int newValue = (wx * m_zoomFactor) - focusX;
+
+        horizontalScrollBar()->setValue(newValue);
+    }
+
+    viewport()->update();
+//    scrollDirtyRegion(dx, 0);
+//    viewport()->scroll(dx, 0);
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTimelineView::mousePressEvent(QMouseEvent * event)
+{
+    QAbstractItemView::mousePressEvent(event);
+    QModelIndex index = indexAt(event->pos());
+    if (index.isValid())
+    {
+        setCurrentIndex(index);
+    }
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTimelineView::mouseMoveEvent(QMouseEvent * event)
+{
+    // Since we are tracking the mouse position, we really should pass the event
+    // to our base class. Unfortunately that noticeably hurts performance, so
+    // don't do it for now.
+    //QAbstractItemView::mouseMoveEvent(event);
+
+    if (event->pos().x() > viewport()->width() - m_margin)
+    {
+        // The mouse position was within m_margin of the vertical scroll bar (used for zooming)
+        // Make this a null point so that zooming will happen on the center of the timeline
+        // rather than the very edge where the mouse cursor was last seen.
+        m_mousePosition = QPoint();
+    }
+    else
+    {
+        m_mousePosition = event->pos();
+    }
+    event->accept();
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTimelineView::updateGeometries()
+{
+    uint64_t duration = m_rawEndTime - m_rawStartTime;
+    int hMax = scaleDurationHorizontally(duration) + 2*m_margin - viewport()->width();
+    horizontalScrollBar()->setRange(0, qMax(0, hMax));
+    horizontalScrollBar()->setPageStep(viewport()->width());
+    horizontalScrollBar()->setSingleStep(1);
+}
+
+//-----------------------------------------------------------------------------
+QRect vktraceviewer_QTimelineView::visualRect(const QModelIndex &index) const
+{
+    QRectF rectf = viewportRect(index);
+    return rectf.toRect();
+}
+
+//-----------------------------------------------------------------------------
+QRectF vktraceviewer_QTimelineView::viewportRect(const QModelIndex &index) const
+{
+    QRectF rectf = itemRect(index);
+    if (rectf.isValid())
+    {
+        rectf.moveLeft( rectf.left() * m_zoomFactor);
+        rectf.setWidth( rectf.width() * m_zoomFactor);
+
+        rectf.adjust(-horizontalScrollBar()->value(), 0,
+                     -horizontalScrollBar()->value(), 0);
+
+        // the margin is added last so that it is NOT scaled
+        rectf.adjust(m_margin, 0,
+                     m_margin, 0);
+    }
+
+    return rectf;
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTimelineView::scrollTo(const QModelIndex &index, ScrollHint hint/* = EnsureVisible*/)
+{
+    if (!index.isValid())
+    {
+        return;
+    }
+
+    QRect viewRect = viewport()->rect();
+    QRect itemRect = visualRect(index);
+
+    // If the item is off either edge of the view, center it horizontally.
+    if (itemRect.left() < viewRect.left() || itemRect.right() > viewRect.right())
+    {
+        horizontalScrollBar()->setValue(horizontalScrollBar()->value() + itemRect.center().x() - viewport()->width()/2);
+    }
+
+    if (!viewRect.contains(itemRect))
+    {
+        deletePixmap();
+    }
+    viewport()->update();
+}
+
+//-----------------------------------------------------------------------------
+QModelIndex vktraceviewer_QTimelineView::indexAt(const QPoint &point) const
+{
+    if (model() == NULL)
+        return QModelIndex();
+
+    float wy = (float)point.y();
+
+    // Early out if the point is not in the areas covered by timeline items
+    bool inThreadArea = false;
+    for (int i = 0; i < m_threadArea.size(); i++)
+    {
+        if (wy >= m_threadArea[i].top() && wy <= m_threadArea[i].bottom())
+        {
+            inThreadArea = true;
+            break;
+        }
+    }
+
+    if (inThreadArea == false)
+    {
+        // point is outside the areas that timeline items are drawn to.
+        return QModelIndex();
+    }
+
+    // Transform the view coordinates into content widget coordinates.
+    int x = point.x() - m_margin + horizontalScrollBar()->value();
+    float wx = (float)x / m_zoomFactor;
+
+    QHashIterator<int, QRectF> iter(m_rowToRectMap);
+    while (iter.hasNext())
+    {
+        iter.next();
+        QRectF value = iter.value();
+        if (value.contains(wx, wy))
+        {
+            return model()->index(iter.key(), vktraceviewer_QTraceFileModel::Column_EntrypointName);
+        }
+    }
+
+    return QModelIndex();
+}
+
+//-----------------------------------------------------------------------------
+QRegion vktraceviewer_QTimelineView::itemRegion(const QModelIndex &index) const
+{
+    if (!index.isValid())
+        return QRegion();
+
+    return QRegion(viewportRect(index).toRect());
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTimelineView::paintEvent(QPaintEvent *event)
+{
+    QPainter painter(viewport());
+    paint(&painter, event);
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTimelineView::drawBaseTimelines(QPainter* painter, const QRect& rect, const QList<uint32_t> &threadList)
+{
+    int numThreads = threadList.count();
+
+    painter->save();
+    QFont font = painter->font();
+    int fontHeight = qMin((int)(m_threadHeight * 0.3), font.pointSize());
+    font.setPointSize(fontHeight);
+    painter->setFont(font);
+
+    for (int i = 0; i < numThreads; i++)
+    {
+        int threadTop = (i*m_threadHeight);
+
+        painter->drawText(0, threadTop + fontHeight, QString("Thread %1").arg(threadList[i]));
+
+        // draw the timeline in the middle of this thread's area
+        int lineStart = m_margin - horizontalOffset();
+        int lineEnd = lineStart + scaleDurationHorizontally(m_lineLength);
+        int lineY = threadTop + m_threadHeight/2;
+        painter->drawLine(lineStart, lineY, lineEnd, lineY);
+    }
+    painter->restore();
+}
+
+//-----------------------------------------------------------------------------
+QList<uint32_t> vktraceviewer_QTimelineView::getModelThreadList() const
+{
+    return m_threadIdList;
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTimelineView::paint(QPainter *painter, QPaintEvent *event)
+{
+    m_threadHeight = event->rect().height();
+    if (m_threadIdList.count() > 0)
+    {
+        m_threadHeight /= m_threadIdList.count();
+    }
+
+    int arrowHeight = 12;
+    int arrowTop = 2;
+    int arrowHalfWidth = 4;
+
+    QPolygon triangle(3);
+    triangle.setPoint(0, 0, arrowTop);
+    triangle.setPoint(1, -arrowHalfWidth, arrowTop+arrowHeight);
+    triangle.setPoint(2, arrowHalfWidth, arrowTop+arrowHeight);
+
+    QList<uint32_t> threadList = getModelThreadList();
+
+    calculateRectsIfNecessary();
+
+    if (m_pPixmap == NULL)
+    {
+        int pixmapHeight = event->rect().height();
+        int pixmapWidth = event->rect().width();
+
+        m_pPixmap = new QPixmap(pixmapWidth, pixmapHeight);
+
+        for (int t = 0; t < m_threadIdList.size(); t++)
+        {
+            m_threadMask[m_threadIdList[t]] = QVector<int>(pixmapWidth, 0);
+        }
+
+        QPainter pixmapPainter(m_pPixmap);
+
+        // fill entire background with background color
+        pixmapPainter.fillRect(event->rect(), m_background);
+        drawBaseTimelines(&pixmapPainter, event->rect(), threadList);
+
+        if (model() != NULL)
+        {
+            int numRows = model()->rowCount();
+
+            for (int r = 0; r < numRows; r++)
+            {
+                QModelIndex index = model()->index(r, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+
+                drawTimelineItem(&pixmapPainter, index);
+            }
+        }
+    }
+    painter->drawPixmap(event->rect(), *m_pPixmap, m_pPixmap->rect());
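+
+    // The fully rendered timeline is cached in m_pPixmap; scrolls, resizes,
+    // and model changes call deletePixmap() so the cache is rebuilt on the
+    // next paint instead of redrawing every item each frame.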
+
+    if (model() == NULL)
+    {
+        return;
+    }
+
+    // draw current api call marker
+    int currentIndexRow = currentIndex().row();
+    if (currentIndexRow >= 0)
+    {
+        // Overlay a black rectangle around the current item.
+        // For more information on how rects are drawn as outlines,
+        // see here: http://qt-project.org/doc/qt-4.8/qrectf.html#rendering
+        int penWidth = 2;
+        int penWidthHalf = 1;
+        QPen blackPen(Qt::black);
+        blackPen.setWidth(penWidth);
+        blackPen.setJoinStyle(Qt::MiterJoin);
+        painter->setPen(blackPen);
+
+        // Don't fill the rectangle
+        painter->setBrush(Qt::NoBrush);
+
+        QModelIndex index = model()->index(currentIndexRow, vktraceviewer_QTraceFileModel::Column_EntrypointName);
+        QRectF rect = visualRect(index);
+        rect.adjust(-penWidthHalf, -penWidthHalf, penWidthHalf, penWidthHalf);
+        painter->drawRect(rect);
+
+        // Draw marker underneath the current rect
+        painter->save();
+        QPainter::RenderHints hints = painter->renderHints();
+        painter->setRenderHints(QPainter::Antialiasing);
+        painter->setPen(m_trianglePen);
+        painter->setBrush(QColor(Qt::yellow));
+        painter->translate(rect.center().x(), rect.bottom());
+        painter->drawPolygon(triangle);
+        painter->setRenderHints(hints, false);
+        painter->restore();
+    }
+}
+
+//-----------------------------------------------------------------------------
+float vktraceviewer_QTimelineView::scaleDurationHorizontally(uint64_t value) const
+{
+    float scaled = u64ToFloat(value) * m_zoomFactor;
+
+    return scaled;
+}
+
+//-----------------------------------------------------------------------------
+float vktraceviewer_QTimelineView::scalePositionHorizontally(uint64_t value) const
+{
+    uint64_t shiftedValue = value - m_rawStartTime;
+    float offset = scaleDurationHorizontally(shiftedValue);
+
+    return offset;
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTimelineView::drawTimelineItem(QPainter* painter, const QModelIndex &index)
+{
+    QRectF rect = viewportRect(index);
+
+    // don't draw if the rect is outside the viewport
+    if (!rect.isValid() ||
+        rect.bottom() < 0 ||
+        rect.y() > viewport()->height() ||
+        rect.x() > viewport()->width() ||
+        rect.right() < 0)
+    {
+        return;
+    }
+
+    QStyleOptionViewItem option = viewOptions();
+    option.rect = rect.toRect();
+    if (selectionModel()->isSelected(index))
+        option.state |= QStyle::State_Selected;
+    if (currentIndex() == index)
+        option.state |= QStyle::State_HasFocus;
+
+    // Check the mask to determine if this item should be drawn, or if something has already covered its pixels.
+    vktrace_trace_packet_header* pHeader = (vktrace_trace_packet_header*)index.internalPointer();
+    QVector<int>& mask = m_threadMask[pHeader->thread_id];
+    bool drawItem = false;
+    int x = option.rect.x();
+    int right = qMin( qMax(x, option.rect.right()), viewport()->width()-1);
+    for (int pixel = qMax(0, x); pixel <= right; pixel++)
+    {
+        if (mask[pixel] == 0)
+        {
+            drawItem = true;
+            mask[pixel] = 1;
+        }
+    }
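+
+    // Illustrative example: if a previous item already marked mask pixels
+    // 100-120 and this item spans pixels 110-115, no pixel is still zero, so
+    // drawItem remains false and the fully occluded item is skipped.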
+
+    // draw item if it should be visible
+    if (drawItem)
+    {
+        itemDelegate()->paint(painter, option, index);
+    }
+}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qtimelineview.h b/vktrace/src/vktrace_viewer/vktraceviewer_qtimelineview.h
new file mode 100644
index 0000000..ec1f176
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qtimelineview.h
@@ -0,0 +1,171 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+
+#ifndef VKTRACEVIEWER_QTIMELINEVIEW_H
+#define VKTRACEVIEWER_QTIMELINEVIEW_H
+
+#include <stdint.h>
+#include "vktrace_trace_packet_identifiers.h"
+
+#include <QWidget>
+
+QT_BEGIN_NAMESPACE
+class QPainter;
+class QPaintEvent;
+QT_END_NAMESPACE
+
+#include <QAbstractItemView>
+#include <QBrush>
+#include <QFont>
+#include <QPen>
+#include <QScrollBar>
+
+class vktraceviewer_QTimelineItemDelegate : public QAbstractItemDelegate
+{
+    Q_OBJECT
+public:
+    vktraceviewer_QTimelineItemDelegate(QObject *parent = 0);
+    virtual ~vktraceviewer_QTimelineItemDelegate();
+
+    virtual void paint(QPainter *painter, const QStyleOptionViewItem &option, const QModelIndex &index) const;
+    virtual QSize sizeHint(const QStyleOptionViewItem &option, const QModelIndex &index) const;
+};
+
+// Implementation of the QTimelineView has benefited greatly from the following site:
+// http://www.informit.com/articles/article.aspx?p=1613548
+class vktraceviewer_QTimelineView : public QAbstractItemView
+{
+    Q_OBJECT
+public:
+    explicit vktraceviewer_QTimelineView(QWidget *parent = 0);
+    virtual ~vktraceviewer_QTimelineView();
+
+    virtual void setModel(QAbstractItemModel* pModel);
+
+    // Begin public virtual functions of QAbstractItemView
+    virtual QRect visualRect(const QModelIndex &index) const;
+    virtual void scrollTo(const QModelIndex &index, ScrollHint hint = EnsureVisible);
+    virtual QModelIndex indexAt(const QPoint &point) const;
+    // End public virtual functions of QAbstractItemView
+
+    QList<uint32_t> getModelThreadList() const;
+    QRectF itemRect(const QModelIndex &item) const;
+    float getMaxItemDuration() const
+    {
+        return m_maxItemDuration;
+    }
+
+    void deletePixmap()
+    {
+        if (m_pPixmap != NULL)
+        {
+            delete m_pPixmap;
+            m_pPixmap = NULL;
+        }
+    }
+
+private:
+    QBrush m_background;
+    QPen m_trianglePen;
+    QPen m_textPen;
+    QFont m_textFont;
+
+    // Timeline data and view state
+    QList<uint32_t> m_threadIdList;
+    QHash< uint32_t, QVector<int> > m_threadMask;
+    QList<QRect> m_threadArea;
+    float m_maxItemDuration;
+    uint64_t m_rawStartTime;
+    uint64_t m_rawEndTime;
+    uint64_t m_lineLength;
+    float m_durationToViewportScale;
+    float m_zoomFactor;
+    float m_maxZoom;
+    int m_threadHeight;
+    QHash<int, QRectF> m_rowToRectMap;
+    bool m_hashIsDirty;
+    int m_margin;
+    int m_scrollBarWidth;
+    QPoint m_mousePosition;
+
+    QPixmap *m_pPixmap;
+    vktraceviewer_QTimelineItemDelegate m_itemDelegate;
+
+    void calculateRectsIfNecessary();
+    void drawBaseTimelines(QPainter *painter, const QRect &rect, const QList<uint32_t> &threadList);
+    void drawTimelineItem(QPainter* painter, const QModelIndex &index);
+
+    QRectF viewportRect(const QModelIndex &index) const;
+    float scaleDurationHorizontally(uint64_t value) const;
+    float scalePositionHorizontally(uint64_t value) const;
+
+    // Begin private virtual functions of QAbstractItemView
+    virtual QRegion itemRegion(const QModelIndex &index) const;
+    // End private virtual functions of QAbstractItemView
+
+protected:
+    void paintEvent(QPaintEvent *event);
+    void paint(QPainter *painter, QPaintEvent *event);
+
+    virtual bool event(QEvent * e);
+    virtual void resizeEvent(QResizeEvent *event);
+    virtual void mousePressEvent(QMouseEvent * event);
+    virtual void mouseMoveEvent(QMouseEvent * event);
+    virtual void scrollContentsBy(int dx, int dy);
+
+    // Begin protected virtual functions of QAbstractItemView
+    virtual QModelIndex moveCursor(CursorAction cursorAction,
+                                   Qt::KeyboardModifiers modifiers)
+    {
+        return QModelIndex();
+    }
+
+    virtual int horizontalOffset() const
+    {
+        return horizontalScrollBar()->value();
+    }
+    virtual int verticalOffset() const
+    {
+        return verticalScrollBar()->value();
+    }
+
+    virtual bool isIndexHidden(const QModelIndex &index) const
+    {
+        return false;
+    }
+
+    virtual void setSelection(const QRect &rect, QItemSelectionModel::SelectionFlags command) {}
+    virtual QRegion visualRegionForSelection(const QItemSelection &selection) const
+    {
+        return QRegion();
+    }
+    // End protected virtual functions of QAbstractItemView
+
+protected slots:
+    virtual void updateGeometries();
+
+signals:
+
+public slots:
+};
+
+#endif // VKTRACEVIEWER_QTIMELINEVIEW_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qtracefileloader.cpp b/vktrace/src/vktrace_viewer/vktraceviewer_qtracefileloader.cpp
new file mode 100644
index 0000000..b380d19
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qtracefileloader.cpp
@@ -0,0 +1,284 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+
+#include "vktraceviewer_qtracefileloader.h"
+#include "vktraceviewer_controller_factory.h"
+
+extern "C" {
+#include "vktrace_trace_packet_utils.h"
+}
+
+vktraceviewer_QTraceFileLoader::vktraceviewer_QTraceFileLoader()
+    : QObject(NULL)
+{
+    qRegisterMetaType<vktraceviewer_trace_file_info>("vktraceviewer_trace_file_info");
+}
+
+vktraceviewer_QTraceFileLoader::~vktraceviewer_QTraceFileLoader()
+{
+}
+
+//-----------------------------------------------------------------------------
+void vktraceviewer_QTraceFileLoader::loadTraceFile(const QString& filename)
+{
+    // open trace file and read in header
+    memset(&m_traceFileInfo, 0, sizeof(vktraceviewer_trace_file_info));
+    m_traceFileInfo.pFile = fopen(filename.toStdString().c_str(), "rb");
+
+    bool bOpened = (m_traceFileInfo.pFile != NULL);
+    if (!bOpened)
+    {
+        emit OutputMessage(VKTRACE_LOG_ERROR, "Unable to open file.");
+    }
+    else
+    {
+        m_traceFileInfo.filename = vktrace_allocate_and_copy(filename.toStdString().c_str());
+        if (populate_trace_file_info(&m_traceFileInfo) == FALSE)
+        {
+            emit OutputMessage(VKTRACE_LOG_ERROR, "Unable to populate trace file info from file.");
+            bOpened = false;
+        }
+        else
+        {
+            // Make sure trace file version is supported
+            if (m_traceFileInfo.header.trace_file_version < VKTRACE_TRACE_FILE_VERSION_MINIMUM_COMPATIBLE)
+            {
+                emit OutputMessage(VKTRACE_LOG_ERROR, QString("Trace file version %1 is older than minimum compatible version (%2).\nYou'll need to make a new trace file, or use an older replayer.").arg(m_traceFileInfo.header.trace_file_version).arg(VKTRACE_TRACE_FILE_VERSION_MINIMUM_COMPATIBLE));
+                bOpened = false;
+            }
+
+#ifdef USE_STATIC_CONTROLLER_LIBRARY
+            m_pController = vtvCreateQController();
+            if (bOpened)
+#else
+            if (!load_controllers(&m_traceFileInfo))
+            {
+                emit OutputMessage(VKTRACE_LOG_ERROR, "Failed to load necessary debug controllers.");
+                bOpened = false;
+            }
+            else if (bOpened)
+#endif
+            {
+                connect(m_pController, SIGNAL(OutputMessage(VktraceLogLevel, const QString&)), this, SIGNAL(OutputMessage(VktraceLogLevel, const QString&)));
+                connect(m_pController, SIGNAL(OutputMessage(VktraceLogLevel, uint64_t, const QString&)), this, SIGNAL(OutputMessage(VktraceLogLevel, uint64_t, const QString&)));
+
+                // interpret the trace file packets
+                for (uint64_t i = 0; i < m_traceFileInfo.packetCount; i++)
+                {
+                    vktraceviewer_trace_file_packet_offsets* pOffsets = &m_traceFileInfo.pPacketOffsets[i];
+                    switch (pOffsets->pHeader->packet_id) {
+                        case VKTRACE_TPI_MESSAGE:
+                            m_traceFileInfo.pPacketOffsets[i].pHeader = vktrace_interpret_body_as_trace_packet_message(pOffsets->pHeader)->pHeader;
+                            break;
+                        case VKTRACE_TPI_MARKER_CHECKPOINT:
+                            break;
+                        case VKTRACE_TPI_MARKER_API_BOUNDARY:
+                            break;
+                        case VKTRACE_TPI_MARKER_API_GROUP_BEGIN:
+                            break;
+                        case VKTRACE_TPI_MARKER_API_GROUP_END:
+                            break;
+                        case VKTRACE_TPI_MARKER_TERMINATE_PROCESS:
+                            break;
+                        // TODO: Add processing code for the marker cases above.
+                        default:
+                        {
+                            vktrace_trace_packet_header* pHeader = m_pController->InterpretTracePacket(pOffsets->pHeader);
+                            if (pHeader == NULL)
+                            {
+                                bOpened = false;
+                                emit OutputMessage(VKTRACE_LOG_ERROR, QString("Unrecognized packet type: %1").arg(pOffsets->pHeader->packet_id));
+                                m_traceFileInfo.pPacketOffsets[i].pHeader = NULL;
+                                break;
+                            }
+                            m_traceFileInfo.pPacketOffsets[i].pHeader = pHeader;
+                        }
+                    }
+
+                    // break from loop if there is an error
+                    if (bOpened == false)
+                    {
+                        break;
+                    }
+                }
+
+#ifdef USE_STATIC_CONTROLLER_LIBRARY
+            vtvDeleteQController(&m_pController);
+#else
+            m_controllerFactory.Unload(&m_pController);
+#endif
+            }
+        }
+
+        // TODO: We don't really want to close the trace file yet.
+        // I think we want to keep it open so that we can dynamically read from it.
+        // BUT we definitely don't want it to get locked open, so we need a smart
+        // way to open and close it when reading.
+        fclose(m_traceFileInfo.pFile);
+        m_traceFileInfo.pFile = NULL;
+    }
+
+    // populate the UI based on trace file info
+    emit TraceFileLoaded(bOpened, m_traceFileInfo, m_controllerFilename);
+
+    emit Finished();
+}
+
+//-----------------------------------------------------------------------------
+bool vktraceviewer_QTraceFileLoader::load_controllers(vktraceviewer_trace_file_info* pTraceFileInfo)
+{
+    if (pTraceFileInfo->header.tracer_count == 0)
+    {
+        emit OutputMessage(VKTRACE_LOG_ERROR, "No API specified in tracefile for replaying.");
+        return false;
+    }
+
+    for (int i = 0; i < pTraceFileInfo->header.tracer_count; i++)
+    {
+        uint8_t tracerId = pTraceFileInfo->header.tracer_id_array[i].id;
+
+        const VKTRACE_TRACER_REPLAYER_INFO* pReplayerInfo = &(gs_tracerReplayerInfo[tracerId]);
+
+        if (pReplayerInfo->tracerId != tracerId)
+        {
+            emit OutputMessage(VKTRACE_LOG_ERROR, QString("Replayer info for TracerId (%1) failed consistency check.").arg(tracerId));
+            assert(!"TracerId in VKTRACE_TRACER_REPLAYER_INFO does not match the requested tracerId. The array needs to be corrected.");
+        }
+        else if (strlen(pReplayerInfo->debuggerLibraryname) != 0)
+        {
+            // Have our factory create the necessary controller
+            emit OutputMessage(VKTRACE_LOG_VERBOSE, QString("Loading controller: %1").arg(pReplayerInfo->debuggerLibraryname));
+
+            m_pController = m_controllerFactory.Load(pReplayerInfo->debuggerLibraryname);
+
+            if (m_pController != NULL)
+            {
+                m_controllerFilename = QString(pReplayerInfo->debuggerLibraryname);
+                // Only one controller needs to be loaded, so break from loop
+                break;
+            }
+            else
+            {
+                // controller failed to be created
+                emit OutputMessage(VKTRACE_LOG_ERROR, QString("Unable to load controller for TracerId %1.").arg(tracerId));
+            }
+        }
+    }
+
+    return m_pController != NULL;
+}
+
+//-----------------------------------------------------------------------------
+bool vktraceviewer_QTraceFileLoader::populate_trace_file_info(vktraceviewer_trace_file_info* pTraceFileInfo)
+{
+    assert(pTraceFileInfo != NULL);
+    assert(pTraceFileInfo->pFile != NULL);
+
+    // read trace file header
+    if (1 != fread(&(pTraceFileInfo->header), sizeof(vktrace_trace_file_header), 1, pTraceFileInfo->pFile))
+    {
+        emit OutputMessage(VKTRACE_LOG_ERROR, "Unable to read header from file.");
+        return false;
+    }
+
+    // Find out how many trace packets there are.
+
+    // Seek to first packet
+    long first_offset = pTraceFileInfo->header.first_packet_offset;
+    int seekResult = fseek(pTraceFileInfo->pFile, first_offset, SEEK_SET);
+    if (seekResult != 0)
+    {
+        emit OutputMessage(VKTRACE_LOG_WARNING, "Failed to seek to the first packet offset in the trace file.");
+    }
+
+    uint64_t fileOffset = pTraceFileInfo->header.first_packet_offset;
+    uint64_t packetSize = 0;
+    while(1 == fread(&packetSize, sizeof(uint64_t), 1, pTraceFileInfo->pFile))
+    {
+        // success!
+        pTraceFileInfo->packetCount++;
+        fileOffset += packetSize;
+
+        fseek(pTraceFileInfo->pFile, fileOffset, SEEK_SET);
+    }
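+
+    // Assumed on-disk layout walked by the loop above:
+    // [file header][packet 0][packet 1]..., where each packet begins with a
+    // uint64_t size covering the whole packet (header plus body), so seeking
+    // ahead by packetSize lands on the next packet's size field.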
+
+    if (pTraceFileInfo->packetCount == 0)
+    {
+        if (ferror(pTraceFileInfo->pFile) != 0)
+        {
+            perror("File Read error:");
+            emit OutputMessage(VKTRACE_LOG_ERROR, "There was an error reading the trace file.");
+            return false;
+        }
+        else if (feof(pTraceFileInfo->pFile) != 0)
+        {
+            emit OutputMessage(VKTRACE_LOG_WARNING, "Reached the end of the file.");
+        }
+        emit OutputMessage(VKTRACE_LOG_WARNING, "There are no trace packets in this trace file.");
+        pTraceFileInfo->pPacketOffsets = NULL;
+    }
+    else
+    {
+        pTraceFileInfo->pPacketOffsets = VKTRACE_NEW_ARRAY(vktraceviewer_trace_file_packet_offsets, pTraceFileInfo->packetCount);
+
+        // rewind to first packet and this time, populate the packet offsets
+        if (fseek(pTraceFileInfo->pFile, first_offset, SEEK_SET) != 0)
+        {
+            emit OutputMessage(VKTRACE_LOG_ERROR, "Unable to rewind trace file to gather packet offsets.");
+            return false;
+        }
+
+        unsigned int packetIndex = 0;
+        fileOffset = first_offset;
+        while(1 == fread(&packetSize, sizeof(uint64_t), 1, pTraceFileInfo->pFile))
+        {
+            // the fread confirms that this packet exists
+            // NOTE: We do not actually read the entire packet into memory right now.
+            pTraceFileInfo->pPacketOffsets[packetIndex].fileOffset = fileOffset;
+
+            // rewind slightly
+            fseek(pTraceFileInfo->pFile, -1*(long)sizeof(uint64_t), SEEK_CUR);
+
+            // allocate space for the packet and read it in
+            pTraceFileInfo->pPacketOffsets[packetIndex].pHeader = (vktrace_trace_packet_header*)vktrace_malloc(packetSize);
+            if (1 != fread(pTraceFileInfo->pPacketOffsets[packetIndex].pHeader, packetSize, 1, pTraceFileInfo->pFile))
+            {
+                emit OutputMessage(VKTRACE_LOG_ERROR, "Unable to read in a trace packet.");
+                return false;
+            }
+
+            // adjust pointer to body of the packet
+            pTraceFileInfo->pPacketOffsets[packetIndex].pHeader->pBody = (uintptr_t)pTraceFileInfo->pPacketOffsets[packetIndex].pHeader + sizeof(vktrace_trace_packet_header);
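+            // (The header and body were read as one contiguous allocation, so
+            // pBody is simply the address just past the fixed-size header.)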
+
+            // now seek to what should be the next packet
+            fileOffset += packetSize;
+            packetIndex++;
+        }
+
+        if (fseek(pTraceFileInfo->pFile, first_offset, SEEK_SET) != 0)
+        {
+            emit OutputMessage(VKTRACE_LOG_ERROR, "Unable to rewind trace file to restore position.");
+            return false;
+        }
+    }
+
+    return true;
+}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_qtracefileloader.h b/vktrace/src/vktrace_viewer/vktraceviewer_qtracefileloader.h
new file mode 100644
index 0000000..809eb46
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_qtracefileloader.h
@@ -0,0 +1,59 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#ifndef VKTRACEVIEWER_QTRACEFILELOADER_H
+#define VKTRACEVIEWER_QTRACEFILELOADER_H
+
+#include <QObject>
+#include "vktraceviewer_controller_factory.h"
+#include "vktraceviewer_controller.h"
+
+#define USE_STATIC_CONTROLLER_LIBRARY 1
+class vktraceviewer_QTraceFileLoader : public QObject
+{
+    Q_OBJECT
+public:
+    vktraceviewer_QTraceFileLoader();
+    virtual ~vktraceviewer_QTraceFileLoader();
+
+public slots:
+    void loadTraceFile(const QString& filename);
+
+signals:
+    void OutputMessage(VktraceLogLevel level, uint64_t packetIndex, const QString& message);
+    void OutputMessage(VktraceLogLevel level, const QString& message);
+
+    void TraceFileLoaded(bool bSuccess, const vktraceviewer_trace_file_info& fileInfo, const QString& controllerFilename);
+
+    void Finished();
+
+private:
+    vktraceviewer_trace_file_info m_traceFileInfo;
+    vktraceviewer_controller_factory m_controllerFactory;
+    vktraceviewer_QController* m_pController;
+    QString m_controllerFilename;
+
+    bool load_controllers(vktraceviewer_trace_file_info* pTraceFileInfo);
+
+    bool populate_trace_file_info(vktraceviewer_trace_file_info* pTraceFileInfo);
+
+};
+
+#endif // VKTRACEVIEWER_QTRACEFILELOADER_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_settings.cpp b/vktrace/src/vktrace_viewer/vktraceviewer_settings.cpp
new file mode 100644
index 0000000..1b03567
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_settings.cpp
@@ -0,0 +1,202 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#include "vktraceviewer_settings.h"
+#include "vktraceviewer_output.h"
+#include <assert.h>
+
+#include <QCoreApplication>
+#include <QDir>
+
+extern "C" {
+#include "vktrace_settings.h"
+}
+
+static const unsigned int VKTRACEVIEWER_SETTINGS_FILE_FORMAT_VERSION_1 = 1;
+static const unsigned int VKTRACEVIEWER_SETTINGS_FILE_FORMAT_VERSION = VKTRACEVIEWER_SETTINGS_FILE_FORMAT_VERSION_1;
+
+static const char *s_SETTINGS_FILE = "vktraceviewer_settings.txt";
+
+// declared as extern in header
+vktraceviewer_settings g_settings;
+static vktraceviewer_settings s_default_settings;
+vktrace_SettingGroup* g_pAllSettings = NULL;
+unsigned int g_numAllSettings = 0;
+
+vktrace_SettingInfo g_settings_info[] =
+{
+    { "o", "OpenTraceFile", VKTRACE_SETTING_STRING, &g_settings.trace_file_to_open, &s_default_settings.trace_file_to_open, TRUE, "Load the specified trace file when vktraceviewer is opened."},
+    { "wl", "WindowLeft", VKTRACE_SETTING_INT, &g_settings.window_position_left, &s_default_settings.window_position_left, TRUE, "Left coordinate of VkTraceViewer window on startup."},
+    { "wt", "WindowTop", VKTRACE_SETTING_INT, &g_settings.window_position_top, &s_default_settings.window_position_top, TRUE, "Top coordinate of VkTraceViewer window on startup."},
+    { "ww", "WindowWidth", VKTRACE_SETTING_INT, &g_settings.window_size_width, &s_default_settings.window_size_width, TRUE, "Width of VkTraceViewer window on startup."},
+    { "wh", "WindowHeight", VKTRACE_SETTING_INT, &g_settings.window_size_height, &s_default_settings.window_size_height, TRUE, "Height of VkTraceViewer window on startup."},
+
+    { "", "GenTraceApplication", VKTRACE_SETTING_STRING, &g_settings.gentrace_application, &s_default_settings.gentrace_application, FALSE, "The most recent application path in the 'Generate Trace' dialog."},
+    { "", "GenTraceArguments", VKTRACE_SETTING_STRING, &g_settings.gentrace_arguments, &s_default_settings.gentrace_arguments, FALSE, "The most recent application arguments in the 'Generate Trace' dialog."},
+    { "", "GenTraceWorkingDir", VKTRACE_SETTING_STRING, &g_settings.gentrace_working_dir, &s_default_settings.gentrace_working_dir, FALSE, "The most recent working directory in the 'Generate Trace' dialog."},
+    { "", "GenTraceVkLayerPath", VKTRACE_SETTING_STRING, &g_settings.gentrace_vk_layer_path, &s_default_settings.gentrace_vk_layer_path, FALSE, "The most recent VK_LAYER_PATH used in the 'Generate Trace' dialog."},
+    { "", "GenTraceOutputFile", VKTRACE_SETTING_STRING, &g_settings.gentrace_output_file, &s_default_settings.gentrace_output_file, FALSE, "The most recent output trace file in the 'Generate Trace' dialog."},
+
+    { "", "SettingsDialogWidth", VKTRACE_SETTING_INT, &g_settings.settings_dialog_width, &s_default_settings.settings_dialog_width, TRUE, "Width of VkTraceViewer settings dialog when opened."},
+    { "", "SettingsDialogHeight", VKTRACE_SETTING_INT, &g_settings.settings_dialog_height, &s_default_settings.settings_dialog_height, TRUE, "Height of VkTraceViewer settings dialog when opened."},
+
+    //{ "tltps", "trim_large_trace_prompt_size", VKTRACE_SETTING_UINT, &g_settings.trim_large_trace_prompt_size, &s_default_settings.trim_large_trace_prompt_size, TRUE, "The number of frames in a trace file at which the user should be prompted to trim the trace before loading."},
+    //{ "gsr", "group_state_render", VKTRACE_SETTING_BOOL, &g_settings.groups_state_render, &s_default_settings.groups_state_render, TRUE, "Path to the dynamic tracer library to be injected, may use [0-15]."},
+    //{ "gppm", "groups_push_pop_markers", VKTRACE_SETTING_BOOL, &g_settings.groups_push_pop_markers, &s_default_settings.groups_push_pop_markers, TRUE, "Path to the dynamic tracer library to be injected, may use [0-15]."},
+    //{ "gnc", "groups_nested_calls", VKTRACE_SETTING_BOOL, &g_settings.groups_nested_calls, &s_default_settings.groups_nested_calls, TRUE, "Path to the dynamic tracer library to be injected, may use [0-15]."},
+};
+
+vktrace_SettingGroup g_settingGroup =
+{
+    "vktraceviewer",
+    sizeof(g_settings_info) / sizeof(g_settings_info[0]),
+    &g_settings_info[0]
+};
+
+QString vktraceviewer_get_settings_file_path()
+{
+    QString result = vktraceviewer_get_settings_directory() + "/" + QString(s_SETTINGS_FILE);
+    return result;
+}
+
+QString vktraceviewer_get_settings_directory()
+{
+    char * pSettingsPath = vktrace_platform_get_settings_path();
+    QString result = QString(pSettingsPath);
+    vktrace_free(pSettingsPath);
+    return result;
+}
+
+QString vktraceviewer_get_sessions_directory()
+{
+    char * pDataPath = vktrace_platform_get_data_path();
+    QString result = QString(pDataPath) + "/sessions/";
+    vktrace_free(pDataPath);
+    return result;
+}
+
+bool vktraceviewer_initialize_settings(int argc, char* argv[])
+{
+    bool bSuccess = true;
+
+    // First, set up the default values.
+    memset(&s_default_settings, 0, sizeof(vktraceviewer_settings));
+
+    s_default_settings.trace_file_to_open = NULL;
+    s_default_settings.window_position_left = 0;
+    s_default_settings.window_position_top = 0;
+    s_default_settings.window_size_width = 1024;
+    s_default_settings.window_size_height = 768;
+
+    s_default_settings.gentrace_application = NULL;
+    s_default_settings.gentrace_arguments = NULL;
+    s_default_settings.gentrace_working_dir = NULL;
+    s_default_settings.gentrace_vk_layer_path = NULL;
+    s_default_settings.gentrace_output_file = NULL;
+
+    // This seems to be a reasonable default size for the dialog.
+    s_default_settings.settings_dialog_width = 600;
+    s_default_settings.settings_dialog_height = 600;
+
+    // initialize settings as defaults
+    g_settings = s_default_settings;
+
+    QString settingsFilePath = vktraceviewer_get_settings_file_path();
+    FILE* pFile = fopen(settingsFilePath.toStdString().c_str(), "r");
+    if (pFile == NULL)
+    {
+        vktrace_LogWarning("Unable to open settings file: '%s'.", settingsFilePath.toStdString().c_str());
+    }
+
+    // Second, apply options from the settings file.
+    if (pFile != NULL)
+    {
+        g_pAllSettings = NULL;
+        g_numAllSettings = 0;
+        if (vktrace_SettingGroup_Load_from_file(pFile, &g_pAllSettings, &g_numAllSettings) == -1)
+        {
+            vktrace_SettingGroup_print(&g_settingGroup);
+            return false;
+        }
+
+        if (g_pAllSettings != NULL && g_numAllSettings > 0)
+        {
+            vktrace_SettingGroup_Apply_Overrides(&g_settingGroup, g_pAllSettings, g_numAllSettings);
+        }
+
+        fclose(pFile);
+        pFile = NULL;
+    }
+
+    // Finally, apply command-line arguments, which override the settings file.
+    if (vktrace_SettingGroup_init_from_cmdline(&g_settingGroup, argc, argv, &g_settings.trace_file_to_open) != 0)
+    {
+        // invalid options specified
+        bSuccess = false;
+    }
+
+    // Merge known vktraceviewer settings into the loaded settings.
+    // This ensures that new known settings are added to the settings dialog
+    // and will be re-written to the settings file upon saving.
+    vktrace_SettingGroup_merge(&g_settingGroup, &g_pAllSettings, &g_numAllSettings);
+
+    // This would be a good place to validate any "required" settings, but right now there aren't any!
+
+    if (bSuccess == false)
+    {
+        vktrace_SettingGroup_print(&g_settingGroup);
+        vktrace_SettingGroup_delete(&g_settingGroup);
+    }
+
+    return bSuccess;
+}
+
+void vktraceviewer_settings_updated()
+{
+    vktrace_SettingGroup_update(&g_settingGroup, g_pAllSettings, g_numAllSettings);
+}
+
+void vktraceviewer_save_settings()
+{
+    QDir settingsDir(vktraceviewer_get_settings_directory());
+    if (settingsDir.mkpath(".") == false)
+    {
+        QString error = "Failed to create " + settingsDir.path();
+        vktraceviewer_output_error(error);
+    }
+
+    QString filepath = vktraceviewer_get_settings_file_path();
+    FILE* pSettingsFile = fopen(filepath.toStdString().c_str(), "w");
+    if (pSettingsFile == NULL)
+    {
+        QString error = "Failed to open settings file for writing: " + filepath;
+        vktraceviewer_output_error(error);
+    }
+    else
+    {
+        if (vktrace_SettingGroup_save(g_pAllSettings, g_numAllSettings, pSettingsFile) == FALSE)
+        {
+            QString error = "Failed to save settings file: " + filepath;
+            vktraceviewer_output_error(error);
+        }
+
+        fclose(pSettingsFile);
+    }
+}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_settings.h b/vktrace/src/vktrace_viewer/vktraceviewer_settings.h
new file mode 100644
index 0000000..7bf9fec
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_settings.h
@@ -0,0 +1,67 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#ifndef VKTRACEVIEWER_SETTINGS_H
+#define VKTRACEVIEWER_SETTINGS_H
+
+extern "C" {
+#include "vktrace_settings.h"
+}
+
+#include <QString>
+
+extern vktrace_SettingGroup* g_pAllSettings;
+extern unsigned int g_numAllSettings;
+
+typedef struct vktraceviewer_settings
+{
+    char * trace_file_to_open;
+    int window_position_left;
+    int window_position_top;
+    int window_size_width;
+    int window_size_height;
+    char * gentrace_application;
+    char * gentrace_arguments;
+    char * gentrace_working_dir;
+    char * gentrace_vk_layer_path;
+    char * gentrace_output_file;
+    int settings_dialog_width;
+    int settings_dialog_height;
+    //unsigned int trim_large_trace_prompt_size;
+
+    //bool groups_state_render;
+    //bool groups_push_pop_markers;
+    //bool groups_nested_calls;
+} vktraceviewer_settings;
+
+extern vktraceviewer_settings g_settings;
+extern vktrace_SettingGroup g_settingGroup;
+
+bool vktraceviewer_initialize_settings(int argc, char* argv[]);
+
+void vktraceviewer_settings_updated();
+
+void vktraceviewer_save_settings();
+
+QString vktraceviewer_get_settings_file_path();
+QString vktraceviewer_get_settings_directory();
+QString vktraceviewer_get_sessions_directory();
+
+#endif // VKTRACEVIEWER_SETTINGS_H
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_trace_file_utils.cpp b/vktrace/src/vktrace_viewer/vktraceviewer_trace_file_utils.cpp
new file mode 100644
index 0000000..ac987a3
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_trace_file_utils.cpp
@@ -0,0 +1,118 @@
+/**************************************************************************
+ *
+ * Copyright 2014-2016 Valve Corporation
+ * Copyright (C) 2014-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#include "vktraceviewer_trace_file_utils.h"
+#include "vktrace_memory.h"
+
+BOOL vktraceviewer_populate_trace_file_info(vktraceviewer_trace_file_info* pTraceFileInfo)
+{
+    assert(pTraceFileInfo != NULL);
+    assert(pTraceFileInfo->pFile != NULL);
+
+    // read trace file header
+    if (1 != fread(&(pTraceFileInfo->header), sizeof(vktrace_trace_file_header), 1, pTraceFileInfo->pFile))
+    {
+        vktraceviewer_output_error("Unable to read header from file.");
+        return FALSE;
+    }
+
+    // Find out how many trace packets there are.
+
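+    // Each packet on disk begins with a uint64_t size field that covers the
+    // entire packet (header and body), so the file can be walked by reading
+    // the size at each packet start and seeking forward by that amount.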
+    // Seek to first packet
+    long first_offset = pTraceFileInfo->header.first_packet_offset;
+    int seekResult = fseek(pTraceFileInfo->pFile, first_offset, SEEK_SET);
+    if (seekResult != 0)
+    {
+        vktraceviewer_output_warning("Failed to seek to the first packet offset in the trace file.");
+    }
+
+    uint64_t fileOffset = pTraceFileInfo->header.first_packet_offset;
+    uint64_t packetSize = 0;
+    while(1 == fread(&packetSize, sizeof(uint64_t), 1, pTraceFileInfo->pFile))
+    {
+        // a size field was read successfully, so another packet exists
+        pTraceFileInfo->packetCount++;
+        fileOffset += packetSize;
+
+        fseek(pTraceFileInfo->pFile, fileOffset, SEEK_SET);
+    }
+
+    if (pTraceFileInfo->packetCount == 0)
+    {
+        if (ferror(pTraceFileInfo->pFile) != 0)
+        {
+            perror("File Read error:");
+            vktraceviewer_output_warning("There was an error reading the trace file.");
+            return FALSE;
+        }
+        else if (feof(pTraceFileInfo->pFile) != 0)
+        {
+            vktraceviewer_output_warning("Reached the end of the file.");
+        }
+        vktraceviewer_output_warning("There are no trace packets in this trace file.");
+        pTraceFileInfo->pPacketOffsets = NULL;
+    }
+    else
+    {
+        pTraceFileInfo->pPacketOffsets = VKTRACE_NEW_ARRAY(vktraceviewer_trace_file_packet_offsets, pTraceFileInfo->packetCount);
+
+        // rewind to first packet and this time, populate the packet offsets
+        if (fseek(pTraceFileInfo->pFile, first_offset, SEEK_SET) != 0)
+        {
+            vktraceviewer_output_error("Unable to rewind trace file to gather packet offsets.");
+            return FALSE;
+        }
+
+        unsigned int packetIndex = 0;
+        fileOffset = first_offset;
+        while(1 == fread(&packetSize, sizeof(uint64_t), 1, pTraceFileInfo->pFile))
+        {
+            // the fread confirms that this packet exists
+            // NOTE: We do not actually read the entire packet into memory right now.
+            pTraceFileInfo->pPacketOffsets[packetIndex].fileOffset = fileOffset;
+
+            // rewind past the size field we just read so the whole packet
+            // (header plus body) can be read in a single fread below
+            fseek(pTraceFileInfo->pFile, -1*(long)sizeof(uint64_t), SEEK_CUR);
+
+            // allocate space for the packet and read it in
+            pTraceFileInfo->pPacketOffsets[packetIndex].pHeader = (vktrace_trace_packet_header*)vktrace_malloc(packetSize);
+            if (1 != fread(pTraceFileInfo->pPacketOffsets[packetIndex].pHeader, packetSize, 1, pTraceFileInfo->pFile))
+            {
+                vktraceviewer_output_error("Unable to read in a trace packet.");
+                return FALSE;
+            }
+
+            // adjust pointer to body of the packet
+            pTraceFileInfo->pPacketOffsets[packetIndex].pHeader->pBody = (uintptr_t)pTraceFileInfo->pPacketOffsets[packetIndex].pHeader + sizeof(vktrace_trace_packet_header);
+
+            // the fread above left the file position at the start of the next
+            // packet; just advance our bookkeeping offset
+            fileOffset += packetSize;
+            packetIndex++;
+        }
+
+        if (fseek(pTraceFileInfo->pFile, first_offset, SEEK_SET) != 0)
+        {
+            vktraceviewer_output_error("Unable to rewind trace file to restore position.");
+            return FALSE;
+        }
+    }
+
+    return TRUE;
+}
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_trace_file_utils.h b/vktrace/src/vktrace_viewer/vktraceviewer_trace_file_utils.h
new file mode 100644
index 0000000..85f48c2
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_trace_file_utils.h
@@ -0,0 +1,61 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#ifndef VKTRACEVIEWER_TRACE_FILE_UTILS_H_
+#define VKTRACEVIEWER_TRACE_FILE_UTILS_H_
+
+//#include <string>
+#include <QString>
+
+extern "C" {
+#include "vktrace_trace_packet_identifiers.h"
+}
+#include "vktraceviewer_output.h"
+
+struct vktraceviewer_trace_file_packet_offsets
+{
+    // the file offset to this particular packet (64-bit to match the offsets
+    // computed while scanning the trace file)
+    uint64_t fileOffset;
+
+    // Pointer to the packet header if it's been read from disk
+    vktrace_trace_packet_header* pHeader;
+};
+
+struct vktraceviewer_trace_file_info
+{
+    // the trace file name & path
+    char* filename;
+
+    // the trace file
+    FILE* pFile;
+
+    // copy of the trace file header
+    vktrace_trace_file_header header;
+
+    // number of packets in the file; also the element count of the pPacketOffsets array
+    uint64_t packetCount;
+
+    // array of packet offsets
+    vktraceviewer_trace_file_packet_offsets* pPacketOffsets;
+};
+
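+// Reads the trace file header and builds the pPacketOffsets array for an
+// already-opened trace file. A minimal usage sketch follows (error handling
+// elided; "app.vktrace" is a placeholder path and the caller owns pFile):
+//
+//     vktraceviewer_trace_file_info info;
+//     memset(&info, 0, sizeof(info));
+//     info.pFile = fopen("app.vktrace", "rb");
+//     if (info.pFile != NULL && vktraceviewer_populate_trace_file_info(&info) == TRUE)
+//     {
+//         // info.packetCount packets; headers at info.pPacketOffsets[i].pHeader
+//     }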
+BOOL vktraceviewer_populate_trace_file_info(vktraceviewer_trace_file_info* pTraceFileInfo);
+
+#endif //VKTRACEVIEWER_TRACE_FILE_UTILS_H_
diff --git a/vktrace/src/vktrace_viewer/vktraceviewer_view.h b/vktrace/src/vktrace_viewer/vktraceviewer_view.h
new file mode 100644
index 0000000..da1300d
--- /dev/null
+++ b/vktrace/src/vktrace_viewer/vktraceviewer_view.h
@@ -0,0 +1,59 @@
+/**************************************************************************
+ *
+ * Copyright 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ * All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Peter Lohrmann <peterl@valvesoftware.com> <plohrmann@gmail.com>
+ **************************************************************************/
+#pragma once
+
+#include "vktraceviewer_QTraceFileModel.h"
+
+struct vktraceviewer_trace_file_info;
+struct vktrace_SettingGroup;
+class QWidget;
+class QToolButton;
+class QAction;
+class QAbstractProxyModel;
+
+class vktraceviewer_view
+{
+public:
+    virtual void reset_view() = 0;
+
+//    virtual void output_message(uint64_t packetIndex, QString message) = 0;
+//    virtual void output_warning(uint64_t packetIndex, QString message) = 0;
+//    virtual void output_error(uint64_t packetIndex, QString message) = 0;
+
+    virtual void add_setting_group(vktrace_SettingGroup* pGroup) = 0;
+    virtual unsigned int get_global_settings(vktrace_SettingGroup** ppGroups) = 0;
+
+    virtual void set_calltree_model(vktraceviewer_QTraceFileModel* pTraceFileModel, QAbstractProxyModel *pModel) = 0;
+    virtual void add_calltree_contextmenu_item(QAction* pAction) = 0;
+    virtual void select_call_at_packet_index(unsigned long long packetIndex) = 0;
+    virtual void highlight_timeline_item(unsigned long long packetArrayIndex, bool bScrollTo, bool bSelect) = 0;
+
+    // \return tab index of state viewer
+    virtual int add_custom_state_viewer(QWidget* pWidget, const QString& title, bool bBringToFront = false) = 0;
+    virtual void remove_custom_state_viewer(int tabIndex) = 0;
+    virtual void enable_custom_state_viewer(QWidget* pWidget, bool bEnabled) = 0;
+
+    virtual QToolButton* add_toolbar_button(const QString& title, bool bEnabled) = 0;
+
+    virtual void on_replay_state_changed(bool bReplayInProgress) = 0;
+
+    virtual unsigned long long get_current_packet_index() = 0;
+};
diff --git a/vktrace/vktrace_generate.py b/vktrace/vktrace_generate.py
new file mode 100755
index 0000000..0f91c24
--- /dev/null
+++ b/vktrace/vktrace_generate.py
@@ -0,0 +1,2354 @@
+#!/usr/bin/env python3
+#
+# Vulkan
+#
+# Copyright (C) 2015-2016 Valve Corporation
+# Copyright (C) 2015-2016 LunarG, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Author: Jon Ashburn <jon@lunarg.com>
+# Author: Tobin Ehlis <tobin@lunarg.com>
+# Author: Peter Lohrmann <peterl@valvesoftware.com>
+#
+
+import os, sys
+
+
+# add the main repo directory so vulkan.py can be imported; this needs to be an absolute path.
+vktrace_scripts_path = os.path.dirname(os.path.abspath(__file__))
+main_path = os.path.abspath(vktrace_scripts_path + "/../")
+sys.path.append(main_path)
+from source_line_info import sourcelineinfo
+
+import vulkan
+
+# vulkan.py doesn't include all the extensions (debug_report missing)
+headers = []
+objects = []
+protos = []
+proto_exclusions = [ 'CreateWaylandSurfaceKHR', 'CreateMirSurfaceKHR',
+                     'GetPhysicalDeviceWaylandPresentationSupportKHR', 'GetPhysicalDeviceMirPresentationSupportKHR',
+                     'GetPhysicalDeviceDisplayPropertiesKHR', 'GetPhysicalDeviceDisplayPlanePropertiesKHR',
+                     'GetDisplayPlaneSupportedDisplaysKHR', 'GetDisplayModePropertiesKHR',
+                     'CreateDisplayModeKHR', 'GetDisplayPlaneCapabilitiesKHR', 'CreateDisplayPlaneSurfaceKHR']
+
+for ext in vulkan.extensions_all:
+    headers.extend(ext.headers)
+    objects.extend(ext.objects)
+    protos.extend(ext.protos)
+
+# Map parameter names that need remapping to their types
+additional_remap_dict = {}
+additional_remap_dict['pImageIndex'] = "uint32_t"
+
+class Subcommand(object):
+    def __init__(self, argv):
+        self.argv = argv
+        self.extensionName = argv
+        # callers pass the extension name string as the sole argument
+        self.headers = headers
+        self.objects = objects
+        self.protos = protos
+        self.lineinfo = sourcelineinfo()
+
+    def run(self):
+        print(self.generate(self.extensionName))
+
+    def generate(self, extensionName):
+        copyright = self.generate_copyright()
+        header = self.generate_header(extensionName)
+        body = self.generate_body()
+        footer = self.generate_footer()
+        contents = []
+        if copyright:
+            contents.append(copyright)
+        if header:
+            contents.append(header)
+        if body:
+            contents.append(body)
+        if footer:
+            contents.append(footer)
+
+        return "\n\n".join(contents)
+
+    def generate_copyright(self):
+        return """/* THIS FILE IS GENERATED.  DO NOT EDIT. */
+
+/*
+ *
+ * Copyright (C) 2015-2016 Valve Corporation
+ * Copyright (C) 2015-2016 LunarG, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * Author: Jon Ashburn <jon@lunarg.com>
+ * Author: Tobin Ehlis <tobin@lunarg.com>
+ * Author: Peter Lohrmann <peterl@valvesoftware.com>
+ */"""
+
+    def generate_header(self, extensionName):
+        return "\n".join(["#include <" + h + ">" for h in self.headers])
+
+    def generate_body(self):
+        pass
+
+    def generate_footer(self):
+        pass
+
+    def _generate_trace_func_protos(self):
+        func_protos = []
+        func_protos.append('#ifdef __cplusplus')
+        func_protos.append('extern"C" {')
+        func_protos.append('#endif')
+        func_protos.append('// Hooked function prototypes\n')
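+        # Each hooked entrypoint gets a prototype of roughly this shape
+        # (the exact text comes from vulkan.py's Proto.c_func):
+        #   VKTRACER_EXPORT VkResult VKAPI __HOOKED_vkCreateFence(VkDevice device, ...);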
+        for ext in vulkan.extensions_all:
+            if ext.ifdef:
+                func_protos.append('#ifdef %s' % ext.ifdef)
+            for proto in ext.protos:
+                if proto.name not in proto_exclusions:
+                    func_protos.append('VKTRACER_EXPORT %s;' % proto.c_func(prefix="__HOOKED_vk", attr="VKAPI"))
+
+                # LoaderLayerInterface V0
+                if proto.name in [ 'GetInstanceProcAddr', 'GetDeviceProcAddr']:
+                    func_protos.append('VK_LAYER_EXPORT %s;' % proto.c_func(prefix="VK_LAYER_LUNARG_vktrace", attr="VKAPI"))
+                if proto.name in [ 'EnumerateInstanceLayerProperties', 'EnumerateInstanceExtensionProperties',
+                                   'EnumerateDeviceLayerProperties', 'EnumerateDeviceExtensionProperties' ]:
+                    func_protos.append('VK_LAYER_EXPORT %s;' % proto.c_func(prefix="vk", attr="VKAPI"))
+
+            if ext.ifdef:
+                func_protos.append('#endif /* %s */' % ext.ifdef)
+
+        func_protos.append('#ifdef __cplusplus')
+        func_protos.append('}')
+        func_protos.append('#endif')
+        return "\n".join(func_protos)
+
+    def _generate_trace_func_protos_ext(self, extensionName):
+        func_protos = []
+        func_protos.append('// Hooked function prototypes\n')
+        for ext in vulkan.extensions_all:
+            if (extensionName.lower() == ext.name.lower()):
+                if ext.ifdef:
+                    func_protos.append('#ifdef %s' % ext.ifdef)
+                for proto in ext.protos:
+                    if proto.name not in proto_exclusions:
+                        func_protos.append('VKTRACER_EXPORT %s;' % proto.c_func(prefix="__HOOKED_vk", attr="VKAPI"))
+
+                    # LoaderLayerInterface V0
+                    if proto.name in [ 'GetInstanceProcAddr', 'GetDeviceProcAddr']:
+                        func_protos.append('VK_LAYER_EXPORT %s;' % proto.c_func(prefix="VK_LAYER_LUNARG_vktrace", attr="VKAPI"))
+                    if proto.name in [ 'EnumerateInstanceLayerProperties', 'EnumerateInstanceExtensionProperties',
+                                       'EnumerateDeviceLayerProperties', 'EnumerateDeviceExtensionProperties' ]:
+                        func_protos.append('VK_LAYER_EXPORT %s;' % proto.c_func(prefix="vk", attr="VKAPI"))
+
+                if ext.ifdef:
+                    func_protos.append('#endif /* %s */' % ext.ifdef)
+
+        return "\n".join(func_protos)
+
+    # Return the printf '%' qualifier, the input expression for that qualifier, and any dereference prefix
+    def _get_printf_params(self, vk_type, name, output_param):
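+        # e.g. _get_printf_params("uint32_t*", "pCount", False)
+        #      -> ("%u", "(pCount == NULL) ? 0 : *(pCount)", "*")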
+        deref = ""
+        # TODO : Need ENUM and STRUCT checks here
+        if "VkImageLayout" in vk_type:
+            return ("%s", "string_%s(%s)" % (vk_type.replace('const ', '').strip('*'), name), deref)
+        if "VkClearColor" in vk_type:
+            return ("%p", "(void*)&%s" % name, deref)
+        if "_type" in vk_type.lower(): # TODO : This should be generic ENUM check
+            return ("%s", "string_%s(%s)" % (vk_type.replace('const ', '').strip('*'), name), deref)
+        if "char*" in vk_type:
+            return ("\\\"%s\\\"", name, "*")
+        if "uint64_t" in vk_type:
+            if '*' in vk_type:
+                return ("%\" PRIu64 \"",  "(%s == NULL) ? 0 : *(%s)" % (name, name), "*")
+            return ("%\" PRIu64 \"", name, deref)
+        if "uint32_t" in vk_type:
+            if '*' in vk_type:
+                return ("%u",  "(%s == NULL) ? 0 : *(%s)" % (name, name), "*")
+            return ("%u", name, deref)
+        if "xcb_visualid_t" in vk_type:
+            if '*' in vk_type:
+                return ("%u",  "(%s == NULL) ? 0 : *(%s)" % (name, name), "*")
+            return ("%u", name, deref)
+        if "VisualID" in vk_type:
+            return ("%\" PRIu64 \"", "(uint64_t)%s" % name, deref)
+        if "VkBool32" in vk_type:
+            if '*' in vk_type:
+                return ("%s",  "(*%s == VK_TRUE) ? \"VK_TRUE\" : \"VK_FALSE\"" % (name), "*")
+            return ("%s", "(%s == VK_TRUE) ? \"VK_TRUE\" : \"VK_FALSE\"" %(name), deref)
+        if "size_t" in vk_type:
+            if '*' in vk_type:
+                return ("\" VK_SIZE_T_SPECIFIER \"", "(%s == NULL) ? 0 : *(%s)" % (name, name), "*")
+            return ("\" VK_SIZE_T_SPECIFIER \"", name, deref)
+        if "float" in vk_type:
+            if '[' in vk_type: # handle array, currently hard-coded to 4 (TODO: Make this dynamic)
+                return ("[%f, %f, %f, %f]", "%s[0], %s[1], %s[2], %s[3]" % (name, name, name, name), deref)
+            return ("%f", name, deref)
+        if "bool" in vk_type or 'xcb_randr_crtc_t' in vk_type:
+            return ("%u", name, deref)
+        if True in [t in vk_type.lower() for t in ["int", "flags", "mask", "xcb_window_t"]]:
+            if '[' in vk_type: # handle array, currently hard-coded to 4 (TODO: Make this dynamic)
+                return ("[%i, %i, %i, %i]", "%s[0], %s[1], %s[2], %s[3]" % (name, name, name, name), deref)
+            if '*' in vk_type:
+                return ("%i", "(%s == NULL) ? 0 : *(%s)" % (name, name), "*")
+            return ("%i", name, deref)
+        if output_param:
+            return ("%p {%\" PRIx64 \"}", "(void*)%s, (%s == NULL) ? 0 : (uint64_t)*(%s)" % (name, name, name), deref)
+        return ("%p", "(void*)(%s)" % name, deref)
+
+    def _generate_init_funcs(self):
+        init_tracer = []
+        init_tracer.append('void send_vk_api_version_packet()\n{')
+        init_tracer.append('    packet_vkApiVersion* pPacket;')
+        init_tracer.append('    vktrace_trace_packet_header* pHeader;')
+        init_tracer.append('    pHeader = vktrace_create_trace_packet(VKTRACE_TID_VULKAN, VKTRACE_TPI_VK_vkApiVersion, sizeof(packet_vkApiVersion), 0);')
+        init_tracer.append('    pPacket = interpret_body_as_vkApiVersion(pHeader);')
+        init_tracer.append('    pPacket->version = VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION);')
+        init_tracer.append('    vktrace_set_packet_entrypoint_end_time(pHeader);')
+        init_tracer.append('    FINISH_TRACE_PACKET();\n}\n')
+
+        init_tracer.append('extern VKTRACE_CRITICAL_SECTION g_memInfoLock;')
+
+        init_tracer.append('#ifdef WIN32\n')
+        init_tracer.append('BOOL CALLBACK InitTracer(_Inout_ PINIT_ONCE initOnce, _Inout_opt_ PVOID param, _Out_opt_ PVOID *lpContext)\n{')
+        init_tracer.append('#elif defined(PLATFORM_LINUX)\n')
+        init_tracer.append('void InitTracer(void)\n{')
+        init_tracer.append('#endif\n')
+        init_tracer.append('#if defined(ANDROID)')
+        init_tracer.append('    // On Android, we can use an abstract socket to fit permissions model')
+        init_tracer.append('    const char *ipAddr = "localabstract";')
+        init_tracer.append('    const char *ipPort = "vktrace";')
+        init_tracer.append('    gMessageStream = vktrace_MessageStream_create_port_string(FALSE, ipAddr, ipPort);')
+        init_tracer.append('#else')
+        init_tracer.append('    const char *ipAddr = vktrace_get_global_var("VKTRACE_LIB_IPADDR");')
+        init_tracer.append('    if (ipAddr == NULL)')
+        init_tracer.append('        ipAddr = "127.0.0.1";')
+        init_tracer.append('    gMessageStream = vktrace_MessageStream_create(FALSE, ipAddr, VKTRACE_BASE_PORT + VKTRACE_TID_VULKAN);')
+        init_tracer.append('#endif')
+        init_tracer.append('    vktrace_trace_set_trace_file(vktrace_FileLike_create_msg(gMessageStream));')
+        init_tracer.append('    vktrace_tracelog_set_tracer_id(VKTRACE_TID_VULKAN);')
+        init_tracer.append('    vktrace_create_critical_section(&g_memInfoLock);')
+        init_tracer.append('    if (gMessageStream != NULL)')
+        init_tracer.append('        send_vk_api_version_packet();\n')
+        init_tracer.append('#ifdef WIN32\n')
+        init_tracer.append('    return true;\n}\n')
+        init_tracer.append('#elif defined(PLATFORM_LINUX)\n')
+        init_tracer.append('    return;\n}\n')
+        init_tracer.append('#endif\n')
+        return "\n".join(init_tracer)
+
+    # Take a list of params and return a list of dicts w/ ptr param details
+    def _get_packet_ptr_param_list(self, params):
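+        # Each returned dict carries the code strings to emit plus the param's position, e.g.
+        #   {'add_txt': 'vktrace_add_buffer_to_trace_packet(...)',
+        #    'finalize_txt': 'vktrace_finalize_buffer_address(...)',
+        #    'index': <position of the param in the prototype>}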
+        ptr_param_list = []
+        # TODO : This is a slightly nicer way to handle custom cases than the initial code;
+        #   it could still be generalized further to eliminate more custom code.
+        #   The big remaining case is ptrs to structs with embedded data that must be accounted for in the packet.
+        custom_ptr_dict = {'VkDeviceCreateInfo': {'add_txt': 'add_VkDeviceCreateInfo_to_packet(pHeader, (VkDeviceCreateInfo**) &(pPacket->pCreateInfo), pCreateInfo)',
+                                                  'finalize_txt': ''},
+                           'VkApplicationInfo': {'add_txt': 'add_VkApplicationInfo_to_packet(pHeader, (VkApplicationInfo**)&(pPacket->pApplicationInfo), pApplicationInfo)',
+                                                 'finalize_txt': ''},
+                           'VkPhysicalDevice': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pGpus), *pGpuCount*sizeof(VkPhysicalDevice), pGpus)',
+                                                'finalize_txt': 'default'},
+                           'VkImageCreateInfo': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkImageCreateInfo), pCreateInfo);\n'
+						                                    '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pQueueFamilyIndices), sizeof(uint32_t) * pCreateInfo->queueFamilyIndexCount, pCreateInfo->pQueueFamilyIndices)',
+                                               'finalize_txt': 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pQueueFamilyIndices));\n'
+											                   '    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo))'},
+                           'VkBufferCreateInfo': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkBufferCreateInfo), pCreateInfo);\n'
+                                                            '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pQueueFamilyIndices), sizeof(uint32_t) * pCreateInfo->queueFamilyIndexCount, pCreateInfo->pQueueFamilyIndices)',
+                                               'finalize_txt': 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pQueueFamilyIndices));\n'
+                                                               '    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo))'},
+                           'pDataSize': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDataSize), sizeof(size_t), &_dataSize)',
+                                         'finalize_txt': 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pDataSize))'},
+                           'pData': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pData), _dataSize, pData)',
+                                     'finalize_txt': 'default'},
+                           'pName': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pName), ((pName != NULL) ? ROUNDUP_TO_4(strlen(pName) + 1) : 0), pName)',
+                                     'finalize_txt': 'default'},
+                           'pMarker': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pMarker), ((pMarker != NULL) ? ROUNDUP_TO_4(strlen(pMarker) + 1) : 0), pMarker)',
+                                       'finalize_txt': 'default'},
+                           'pExtName': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pExtName), ((pExtName != NULL) ? ROUNDUP_TO_4(strlen(pExtName) + 1) : 0), pExtName)',
+                                        'finalize_txt': 'default'},
+                           'pDescriptorSets': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pDescriptorSets), customSize, pDescriptorSets)',
+                                               'finalize_txt': 'default'},
+                           'pSparseMemoryRequirements': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pSparseMemoryRequirements), (*pSparseMemoryRequirementCount) * sizeof(VkSparseImageMemoryRequirements), pSparseMemoryRequirements)',
+                                               'finalize_txt': 'default'},
+                           'pAllocator': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocator), sizeof(VkAllocationCallbacks), NULL)',
+                                          'finalize_txt': 'default'},
+                           'VkSparseImageFormatProperties': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pProperties), (*pPropertyCount) * sizeof(VkSparseImageFormatProperties), pProperties)',
+                                               'finalize_txt': 'default'},
+                           'VkSparseMemoryBindInfo': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pBindInfo), numBindings * sizeof(VkSparseMemoryBindInfo), pBindInfo)',
+                                               'finalize_txt': 'default'},
+                           'VkSparseImageMemoryBindInfo': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pBindInfo), numBindings * sizeof(VkSparseImageMemoryBindInfo), pBindInfo)',
+                                               'finalize_txt': 'default'},
+                           'VkFramebufferCreateInfo': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkFramebufferCreateInfo), pCreateInfo);\n'
+                                                                  '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pColorAttachments), colorCount * sizeof(VkColorAttachmentBindInfo), pCreateInfo->pColorAttachments);\n'
+                                                                  '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pDepthStencilAttachment), dsSize, pCreateInfo->pDepthStencilAttachment)',
+                                                  'finalize_txt': 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pColorAttachments));\n'
+                                                                  '    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pDepthStencilAttachment));\n'
+                                                                  '    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo))'},
+                           'VkRenderPassCreateInfo': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkRenderPassCreateInfo), pCreateInfo);\n'
+                                                                 '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pColorFormats), colorCount * sizeof(VkFormat), pCreateInfo->pColorFormats);\n'
+                                                                 '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pColorLayouts), colorCount * sizeof(VkImageLayout), pCreateInfo->pColorLayouts);\n'
+                                                                 '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pColorLoadOps), colorCount * sizeof(VkAttachmentLoadOp), pCreateInfo->pColorLoadOps);\n'
+                                                                 '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pColorStoreOps), colorCount * sizeof(VkAttachmentStoreOp), pCreateInfo->pColorStoreOps);\n'
+                                                                 '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pColorLoadClearValues), colorCount * sizeof(VkClearColor), pCreateInfo->pColorLoadClearValues)',
+                                                 'finalize_txt': 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pColorFormats));\n'
+                                                                 '    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pColorLayouts));\n'
+                                                                 '    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pColorLoadOps));\n'
+                                                                 '    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pColorStoreOps));\n'
+                                                                 '    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pColorLoadClearValues));\n'
+                                                                 '    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo))'},
+                           'VkPipelineLayoutCreateInfo': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkPipelineLayoutCreateInfo), pCreateInfo);\n'
+                                                                     '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pSetLayouts), pCreateInfo->setLayoutCount * sizeof(VkDescriptorSetLayout), pCreateInfo->pSetLayouts);'
+                                                                     '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pPushConstantRanges), pCreateInfo->pushConstantRangeCount * sizeof(VkPushConstantRange), pCreateInfo->pPushConstantRanges);',
+                                                     'finalize_txt': 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pSetLayouts));\n'
+                                                                     'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pPushConstantRanges));\n'
+                                                                     'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo))'},
+                           'VkMemoryAllocateInfo': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pAllocateInfo), sizeof(VkMemoryAllocateInfo), pAllocateInfo);\n'
+                                                            '    add_alloc_memory_to_trace_packet(pHeader, (void**)&(pPacket->pAllocateInfo->pNext), pAllocateInfo->pNext)',
+                                            'finalize_txt': 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pAllocateInfo))'},
+#                          'VkGraphicsPipelineCreateInfo': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfos), count*sizeof(VkGraphicsPipelineCreateInfo), pCreateInfos);\n'
+#                                                                      '    add_VkGraphicsPipelineCreateInfos_to_trace_packet(pHeader, (VkGraphicsPipelineCreateInfo*)pPacket->pCreateInfos, pCreateInfos, count)',
+#                                                      'finalize_txt': 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfos))'},
+#                          'VkComputePipelineCreateInfo': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfos), count*sizeof(VkComputePipelineCreateInfo), pCreateInfos);\n'
+#                                                                      '    add_VkComputePipelineCreateInfos_to_trace_packet(pHeader, (VkComputePipelineCreateInfo*)pPacket->pCreateInfos, pCreateInfos, count)',
+#                                                      'finalize_txt': 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfos))'},
+                           'VkDescriptorPoolCreateInfo': {'add_txt': 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkDescriptorPoolCreateInfo), pCreateInfo);\n'
+                                                                     '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pPoolSizes), pCreateInfo->poolSizeCount * sizeof(VkDescriptorPoolSize), pCreateInfo->pPoolSizes)',
+                                                     'finalize_txt': 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pPoolSizes));\n'
+                                                                     '    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo))'},
+                           'VkDescriptorSetLayoutCreateInfo': {'add_txt': 'add_create_ds_layout_to_trace_packet(pHeader, &pPacket->pCreateInfo, pCreateInfo)',
+                                                          'finalize_txt': '// pCreateInfo finalized in add_create_ds_layout_to_trace_packet'},
+                           'VkSwapchainCreateInfoKHR': {'add_txt':      'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkSwapchainCreateInfoKHR), pCreateInfo);\n'
+                                                                        '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pQueueFamilyIndices), pPacket->pCreateInfo->queueFamilyIndexCount * sizeof(uint32_t), pCreateInfo->pQueueFamilyIndices)',
+                                                        'finalize_txt': 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pQueueFamilyIndices));\n'
+                                                                        '    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo))'},
+                           'VkShaderModuleCreateInfo': {'add_txt':      'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo), sizeof(VkShaderModuleCreateInfo), pCreateInfo);\n'
+                                                                        '    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->pCreateInfo->pCode), pPacket->pCreateInfo->codeSize, pCreateInfo->pCode)',
+                                                        'finalize_txt': 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo->pCode));\n'
+                                                                        '    vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->pCreateInfo))'},
+                          }
+
+        for p in params:
+            pp_dict = {}
+            if '*' in p.ty and p.name not in ['pTag', 'pUserData']:
+                if 'const' in p.ty.lower() and 'count' in params[params.index(p)-1].name.lower():
+                    pp_dict['add_txt'] = 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->%s), %s*sizeof(%s), %s)' % (p.name, params[params.index(p)-1].name, p.ty.strip('*').replace('const ', ''), p.name)
+                elif 'pOffsets' == p.name: # TODO : This is a custom case for BindVertexBuffers last param, need to clean this up
+                    pp_dict['add_txt'] = 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->%s), %s*sizeof(%s), %s)' % (p.name, params[params.index(p)-2].name, p.ty.strip('*').replace('const ', ''), p.name)
+                elif p.ty.strip('*').replace('const ', '') in custom_ptr_dict:
+                    pp_dict['add_txt'] = custom_ptr_dict[p.ty.strip('*').replace('const ', '')]['add_txt']
+                    pp_dict['finalize_txt'] = custom_ptr_dict[p.ty.strip('*').replace('const ', '')]['finalize_txt']
+                elif p.name in custom_ptr_dict:
+                    pp_dict['add_txt'] = custom_ptr_dict[p.name]['add_txt']
+                    pp_dict['finalize_txt'] = custom_ptr_dict[p.name]['finalize_txt']
+                    # TODO : This is a custom hack to account for two pData params that use a dataSize param for sizing
+                    if 'pData' == p.name and 'dataSize' == params[params.index(p)-1].name:
+                        pp_dict['add_txt'] = pp_dict['add_txt'].replace('_dataSize', 'dataSize')
+                elif 'void' in p.ty and (p.name == 'pData' or p.name == 'pValues'):
+                    pp_dict['add_txt'] = '//TODO FIXME vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->%s), sizeof(%s), %s)' % (p.name, p.ty.strip('*').replace('const ', ''), p.name)
+                    pp_dict['finalize_txt'] = '//TODO FIXME vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->%s))' % (p.name)
+                else:
+                    pp_dict['add_txt'] = 'vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(pPacket->%s), sizeof(%s), %s)' % (p.name, p.ty.strip('*').replace('const ', ''), p.name)
+                if 'finalize_txt' not in pp_dict or 'default' == pp_dict['finalize_txt']:
+                    pp_dict['finalize_txt'] = 'vktrace_finalize_buffer_address(pHeader, (void**)&(pPacket->%s))' % (p.name)
+                pp_dict['index'] = params.index(p)
+                ptr_param_list.append(pp_dict)
+        return ptr_param_list
+
+    # Take a list of params and return a list of packet size elements
+    def _get_packet_size(self, extensionName, params):
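+        # e.g. for vkCreateSampler (core 1.0) this yields roughly
+        #   ['get_struct_chain_size((void*)pCreateInfo)', 'sizeof(VkAllocationCallbacks)', 'sizeof(VkSampler)']
+        # which the caller joins with ' + ' to form the CREATE_TRACE_PACKET size argument.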
+        ps = [] # List of elements to be added together to account for packet size for given params
+        skip_list = [] # store params that are already accounted for so we don't count them twice
+        # Dict of specific params with unique custom sizes
+        # TODO: Now using bitfields for all stages, need pSetBindPoints to accommodate that.
+        custom_size_dict = {'pSetBindPoints': '(VK_SHADER_STAGE_COMPUTE * sizeof(uint32_t))', # Accounting for largest possible array
+                            'VkSwapchainCreateInfoKHR' : 'vk_size_vkswapchaincreateinfokhr(pCreateInfo)',
+                            }
+        size_func_suffix = ''
+        if extensionName.lower() != "vk_version_1_0":
+            size_func_suffix = '_%s' % extensionName.lower()
+        for p in params:
+            # First handle custom cases
+            if p.name in ['pCreateInfo', 'pSetLayoutInfoList', 'pBeginInfo', 'pAllocateInfo'] and 'khr' not in p.ty.lower() and 'lunarg' not in p.ty.lower() and 'ext' not in p.ty.lower():
+                ps.append('get_struct_chain_size%s((void*)%s)' % (size_func_suffix, p.name))
+                skip_list.append(p.name)
+            elif p.name in custom_size_dict:
+                ps.append(custom_size_dict[p.name])
+                skip_list.append(p.name)
+            elif p.ty.strip('*').replace('const ', '') in custom_size_dict:
+                tmp_ty = p.ty.strip('*').replace('const ', '')
+                ps.append(custom_size_dict[tmp_ty])
+                skip_list.append(p.name)
+            # Skip any params already handled
+            if p.name in skip_list:
+                continue
+            # Now check to identify dynamic arrays which depend on two params
+            if 'count' in p.name.lower():
+                next_idx = params.index(p)+1
+                # If next element is a const *, then multiply count and array type
+                if next_idx < len(params) and '*' in params[next_idx].ty and 'const' in params[next_idx].ty.lower():
+                    if '*' in p.ty:
+                        ps.append('*%s*sizeof(%s)' % (p.name, params[next_idx].ty.strip('*').replace('const ', '')))
+                    else:
+                        ps.append('%s*sizeof(%s)' % (p.name, params[next_idx].ty.strip('*').replace('const ', '')))
+                    skip_list.append(params[next_idx].name)
+                if 'bindingCount' == p.name: # TODO : This is a custom case for CmdBindVertexBuffers, needs cleanup
+                    ps.append('%s*sizeof(%s)' % (p.name, params[next_idx+1].ty.strip('*').replace('const ', '')))
+                    skip_list.append(params[next_idx+1].name)
+                elif '*' in p.ty: # Not a custom array size we're aware of, but ptr so need to account for its size
+                    ps.append('sizeof(%s)' % (p.ty.strip('*').replace('const ', '')))
+            elif '*' in p.ty and p.name not in ['pSysMem', 'pReserved']:
+                if 'pData' == p.name:
+                    if 'dataSize' == params[params.index(p)-1].name:
+                        ps.append('dataSize')
+                    elif 'counterCount' == params[params.index(p)-1].name:
+                        ps.append('sizeof(%s)' % p.ty.strip('*').replace('const ', ''))
+                    else:
+                        #ps.append('((pDataSize != NULL && pData != NULL) ? *pDataSize : 0)')
+                        ps.append('sizeof(void*)')
+                elif '**' in p.ty and 'void' in p.ty:
+                    ps.append('sizeof(void*)')
+                elif 'void' in p.ty:
+                    ps.append('sizeof(%s)' % p.name)
+                elif 'char' in p.ty:
+                    ps.append('((%s != NULL) ? ROUNDUP_TO_4(strlen(%s) + 1) : 0)' % (p.name, p.name))
+                elif 'pDataSize' in p.name:
+                    ps.append('((pDataSize != NULL) ? sizeof(size_t) : 0)')
+                elif 'IMAGE_SUBRESOURCE' in p.ty and 'pSubresource' == p.name:
+                    ps.append('((pSubresource != NULL) ? sizeof(VkImage_SUBRESOURCE) : 0)')
+                else:
+                    ps.append('sizeof(%s)' % (p.ty.strip('*').replace('const ', '')))
+        return ps
+
+    # Generate functions used to trace API calls and store the input and result data into a packet
+    # General flow of the emitted code, with optional items flagged with "?":
+    #   ? Result decl
+    #   Packet struct decl
+    #   ? Special case : set up the call to the real function first and do custom API call-time tracking
+    #   CREATE_PACKET
+    #   call real entrypoint and get return value (if there is one)
+    #   Assign packet values
+    #   FINISH packet
+    #   return result if needed
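+    # For a simple entrypoint such as vkCreateFence the emitted C is roughly
+    # (illustrative sketch, not verbatim generator output):
+    #   VKTRACER_EXPORT VKAPI_ATTR VkResult VKAPI_CALL __HOOKED_vkCreateFence(VkDevice device, const VkFenceCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkFence* pFence)
+    #   {
+    #       vktrace_trace_packet_header* pHeader;
+    #       VkResult result;
+    #       packet_vkCreateFence* pPacket = NULL;
+    #       CREATE_TRACE_PACKET(vkCreateFence, get_struct_chain_size((void*)pCreateInfo) + sizeof(VkAllocationCallbacks) + sizeof(VkFence));
+    #       result = mdd(device)->devTable.CreateFence(device, pCreateInfo, pAllocator, pFence);
+    #       vktrace_set_packet_entrypoint_end_time(pHeader);
+    #       pPacket = interpret_body_as_vkCreateFence(pHeader);
+    #       ... // copy params into pPacket, finalize buffers
+    #       FINISH_TRACE_PACKET();
+    #       return result;
+    #   }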
+    def _generate_trace_funcs(self, extensionName):
+        func_body = []
+        manually_written_hooked_funcs = ['AllocateCommandBuffers',
+                                         'AllocateMemory',
+                                         'AllocateDescriptorSets',
+                                         'BeginCommandBuffer',
+                                         'CreateDescriptorPool',
+                                         'CreateDevice',
+                                         'CreateFramebuffer',
+                                         'CreateInstance',
+                                         'CreatePipelineCache',
+                                         'CreateRenderPass',
+                                         'GetPipelineCacheData',
+                                         'CreateGraphicsPipelines',
+                                         'CreateComputePipelines',
+                                         'CmdPipelineBarrier',
+                                         'CmdWaitEvents',
+                                         'CmdBeginRenderPass',
+                                         'CmdPushConstants',
+                                         'DestroyInstance',
+                                         'EnumeratePhysicalDevices',
+                                         'FreeMemory',
+                                         'FreeDescriptorSets',
+                                         'QueueSubmit',
+                                         'QueueBindSparse',
+                                         'FlushMappedMemoryRanges',
+                                         'InvalidateMappedMemoryRanges',
+                                         'GetDeviceProcAddr',
+                                         'GetInstanceProcAddr',
+                                         'EnumerateInstanceExtensionProperties',
+                                         'EnumerateDeviceExtensionProperties',
+                                         'EnumerateInstanceLayerProperties',
+                                         'EnumerateDeviceLayerProperties',
+                                         'GetPhysicalDeviceQueueFamilyProperties',
+                                         'GetQueryPoolResults',
+                                         'MapMemory',
+                                         'UnmapMemory',
+                                         'UpdateDescriptorSets',
+                                         'GetPhysicalDeviceSurfaceCapabilitiesKHR',
+                                         'GetPhysicalDeviceSurfaceFormatsKHR',
+                                         'GetPhysicalDeviceSurfacePresentModesKHR',
+                                         'CreateSwapchainKHR',
+                                         'GetSwapchainImagesKHR',
+                                         'QueuePresentKHR',
+                                         # TODO: add Wayland, Mir
+                                         'CreateXcbSurfaceKHR',
+                                         'CreateXlibSurfaceKHR',
+                                         'GetPhysicalDeviceXcbPresentationSupportKHR',
+                                         'GetPhysicalDeviceXlibPresentationSupportKHR',
+                                         'CreateWin32SurfaceKHR',
+                                         'GetPhysicalDeviceWin32PresentationSupportKHR',
+                                         'CreateAndroidSurfaceKHR',
+                                         ]
+
+        # validate the manually_written_hooked_funcs list
+        protoFuncs = [proto.name for proto in self.protos]
+        wsi_platform_manual_funcs = ['CreateWin32SurfaceKHR', 'CreateXcbSurfaceKHR', 'CreateXlibSurfaceKHR', 'CreateAndroidSurfaceKHR',
+                                     'GetPhysicalDeviceXcbPresentationSupportKHR','GetPhysicalDeviceXlibPresentationSupportKHR', 'GetPhysicalDeviceWin32PresentationSupportKHR']
+        for func in manually_written_hooked_funcs:
+            if (func not in protoFuncs) and (func not in wsi_platform_manual_funcs):
+                sys.exit("Entry '%s' in manually_written_hooked_funcs list is not in the vulkan function prototypes" % func)
+
+        # process each of the entrypoint prototypes
+        approved_ext = ['vk_khr_surface', 'vk_khr_swapchain', 'vk_khr_win32_surface', 'vk_khr_xcb_surface', 'vk_ext_debug_report']
+        for ext in vulkan.extensions_all:
+            if (ext.name.lower() == extensionName.lower()) or ((extensionName.lower() == 'vk_version_1_0') and (ext.name.lower() in approved_ext)):
+                for proto in ext.protos:
+                    if proto.name in manually_written_hooked_funcs:
+                        func_body.append( '// __HOOKED_vk%s is manually written. Look in vktrace_lib_trace.cpp\n' % proto.name)
+                    elif proto.name not in proto_exclusions:
+                        raw_packet_update_list = [] # non-ptr elements placed directly into packet
+                        ptr_packet_update_list = [] # ptr elements to be updated into packet
+                        return_txt = ''
+                        packet_size = []
+                        in_data_size = False # flag when we need to capture local input size variable for in/out size
+                        func_body.append('%s' % self.lineinfo.get())
+                        func_body.append('VKTRACER_EXPORT VKAPI_ATTR %s VKAPI_CALL __HOOKED_vk%s(' % (proto.ret, proto.name))
+                        for p in proto.params: # TODO : For all of the ptr types, check them for NULL and return 0 if NULL
+                            func_body.append('    %s,' % p.c())
+                            if '*' in p.ty and p.name not in ['pSysMem', 'pReserved']:
+                                if 'pDataSize' in p.name:
+                                    in_data_size = True
+                            elif 'pfnMsgCallback' == p.name:
+                                raw_packet_update_list.append('    PFN_vkDebugReportCallbackEXT* pNonConstCallback = (PFN_vkDebugReportCallbackEXT*)&pPacket->pfnMsgCallback;')
+                                raw_packet_update_list.append('    *pNonConstCallback = pfnMsgCallback;')
+                            elif '[' in p.ty:
+                                raw_packet_update_list.append('    memcpy((void *) pPacket->%s, %s, sizeof(pPacket->%s));' % (p.name, p.name, p.name))
+                            else:
+                                raw_packet_update_list.append('    pPacket->%s = %s;' % (p.name, p.name))
+                        # Get list of packet size modifiers due to ptr params
+                        packet_size = self._get_packet_size(extensionName, proto.params)
+                        ptr_packet_update_list = self._get_packet_ptr_param_list(proto.params)
+                        func_body[-1] = func_body[-1].replace(',', ')')
+                        # End of function declaration portion, begin function body
+                        func_body.append('{\n    vktrace_trace_packet_header* pHeader;')
+                        if 'void' not in proto.ret or '*' in proto.ret:
+                            func_body.append('    %s result;' % proto.ret)
+                            return_txt = 'result = '
+                        if in_data_size:
+                            func_body.append('    size_t _dataSize;')
+                        func_body.append('    packet_vk%s* pPacket = NULL;' % proto.name)
+                        if proto.name == "DestroyInstance" or proto.name == "DestroyDevice":
+                            func_body.append('    dispatch_key key = get_dispatch_key(%s);' % proto.params[0].name)
+
+                        if (0 == len(packet_size)):
+                            func_body.append('    CREATE_TRACE_PACKET(vk%s, 0);' % (proto.name))
+                        else:
+                            func_body.append('    CREATE_TRACE_PACKET(vk%s, %s);' % (proto.name, ' + '.join(packet_size)))
+
+                        # call down the layer chain and get return value (if there is one)
+                        # Note: this logic doesn't work for CreateInstance or CreateDevice but those are handwritten
+                        if extensionName == 'vk_lunarg_debug_marker':
+                            table_txt = 'mdd(%s)->debugMarkerTable' % proto.params[0].name
+                        elif proto.params[0].ty in ['VkInstance', 'VkPhysicalDevice']:
+                            table_txt = 'mid(%s)->instTable' % proto.params[0].name
+                        else:
+                            table_txt = 'mdd(%s)->devTable' % proto.params[0].name
+                        func_body.append('    %s%s.%s;' % (return_txt, table_txt, proto.c_call()))
+                        func_body.append('    vktrace_set_packet_entrypoint_end_time(pHeader);')
+
+                        if in_data_size:
+                            func_body.append('    _dataSize = (pDataSize == NULL || pData == NULL) ? 0 : *pDataSize;')
+                        func_body.append('    pPacket = interpret_body_as_vk%s(pHeader);' % proto.name)
+                        func_body.append('\n'.join(raw_packet_update_list))
+                        for pp_dict in ptr_packet_update_list: #buff_ptr_indices:
+                            func_body.append('    %s;' % (pp_dict['add_txt']))
+                        if 'void' not in proto.ret or '*' in proto.ret:
+                            func_body.append('    pPacket->result = result;')
+                        for pp_dict in ptr_packet_update_list:
+                            if ('DeviceCreateInfo' not in proto.params[pp_dict['index']].ty):
+                                func_body.append('    %s;' % (pp_dict['finalize_txt']))
+                        # All buffers should be finalized by now, and the trace packet can be finished (which sends it over the socket)
+                        func_body.append('    FINISH_TRACE_PACKET();')
+                        if proto.name == "DestroyInstance":
+                            func_body.append('    g_instanceDataMap.erase(key);')
+                        elif proto.name == "DestroyDevice":
+                            func_body.append('    g_deviceDataMap.erase(key);')
+
+                        # return result if needed
+                        if 'void' not in proto.ret or '*' in proto.ret:
+                            func_body.append('    return result;')
+                        func_body.append('}\n')
+        return "\n".join(func_body)
+
+    def _generate_packet_id_enum(self):
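+        # Emits, roughly:
+        #   enum VKTRACE_TRACE_PACKET_ID_VK
+        #   {
+        #       VKTRACE_TPI_VK_vkApiVersion = VKTRACE_TPI_BEGIN_API_HERE,
+        #       VKTRACE_TPI_VK_vkCreateInstance,
+        #       ...
+        #   };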
+        pid_enum = []
+        pid_enum.append('enum VKTRACE_TRACE_PACKET_ID_VK')
+        pid_enum.append('{')
+        first_func = True
+        for proto in self.protos:
+            if proto.name in proto_exclusions:
+                continue
+            if first_func:
+                first_func = False
+                pid_enum.append('    VKTRACE_TPI_VK_vkApiVersion = VKTRACE_TPI_BEGIN_API_HERE,')
+                pid_enum.append('    VKTRACE_TPI_VK_vk%s,' % proto.name)
+            else:
+                pid_enum.append('    VKTRACE_TPI_VK_vk%s,' % proto.name)
+        pid_enum.append('};\n')
+        return "\n".join(pid_enum)
+
+    def _generate_packet_id_name_func(self):
+        func_body = []
+        func_body.append('static const char *vktrace_vk_packet_id_name(const enum VKTRACE_TRACE_PACKET_ID_VK id)')
+        func_body.append('{')
+        func_body.append('    switch(id) {')
+        func_body.append('    case VKTRACE_TPI_VK_vkApiVersion:')
+        func_body.append('    {')
+        func_body.append('        return "vkApiVersion";')
+        func_body.append('    }')
+        for proto in self.protos:
+            if proto.name in proto_exclusions:
+                continue
+            func_body.append('    case VKTRACE_TPI_VK_vk%s:' % proto.name)
+            func_body.append('    {')
+            func_body.append('        return "vk%s";' % proto.name)
+            func_body.append('    }')
+        func_body.append('    default:')
+        func_body.append('        return NULL;')
+        func_body.append('    }')
+        func_body.append('}\n')
+        return "\n".join(func_body)
+
+    def _generate_stringify_func(self):
+        func_body = []
+        func_body.append('static const char *vktrace_stringify_vk_packet_id(const enum VKTRACE_TRACE_PACKET_ID_VK id, const vktrace_trace_packet_header* pHeader)')
+        func_body.append('{')
+        func_body.append('    static char str[1024];')
+        func_body.append('    switch(id) {')
+        func_body.append('    case VKTRACE_TPI_VK_vkApiVersion:')
+        func_body.append('    {')
+        func_body.append('        packet_vkApiVersion* pPacket = (packet_vkApiVersion*)(pHeader->pBody);')
+        func_body.append('        snprintf(str, 1024, "vkApiVersion = 0x%x", pPacket->version);')
+        func_body.append('        return str;')
+        func_body.append('    }')
+        for proto in self.protos:
+            if proto.name in proto_exclusions:
+                continue
+            func_body.append('    case VKTRACE_TPI_VK_vk%s:' % proto.name)
+            func_body.append('    {')
+            func_str = 'vk%s(' % proto.name
+            print_vals = ''
+            create_func = ('Create' in proto.name or 'Alloc' in proto.name or 'MapMemory' in proto.name)
+            for p in proto.params:
+                last_param = (p.name == proto.params[-1].name)
+                if last_param and create_func: # last param of create func
+                    (pft, pfi, ptr) = self._get_printf_params(p.ty,'pPacket->%s' % p.name, True)
+                else:
+                    (pft, pfi, ptr) = self._get_printf_params(p.ty, 'pPacket->%s' % p.name, False)
+                if last_param:
+                    func_str += '%s%s = %s)' % (ptr, p.name, pft)
+                    print_vals += ', %s' % (pfi)
+                else:
+                    func_str += '%s%s = %s, ' % (ptr, p.name, pft)
+                    print_vals += ', %s' % (pfi)
+            func_body.append('        packet_vk%s* pPacket = (packet_vk%s*)(pHeader->pBody);' % (proto.name, proto.name))
+            func_body.append('        snprintf(str, 1024, "%s"%s);' % (func_str, print_vals))
+            func_body.append('        return str;')
+            func_body.append('    }')
+        func_body.append('    default:')
+        func_body.append('        return NULL;')
+        func_body.append('    }')
+        func_body.append('}\n')
+        return "\n".join(func_body)
+
+    def _generate_interp_func(self):
+        interp_func_body = []
+        interp_func_body.append('%s' % self.lineinfo.get())
+        interp_func_body.append('static vktrace_trace_packet_header* interpret_trace_packet_vk(vktrace_trace_packet_header* pHeader)')
+        interp_func_body.append('{')
+        interp_func_body.append('    if (pHeader == NULL)')
+        interp_func_body.append('    {')
+        interp_func_body.append('        return NULL;')
+        interp_func_body.append('    }')
+        interp_func_body.append('    switch (pHeader->packet_id)')
+        interp_func_body.append('    {')
+        interp_func_body.append('        case VKTRACE_TPI_VK_vkApiVersion:')
+        interp_func_body.append('        {')
+        interp_func_body.append('            return interpret_body_as_vkApiVersion(pHeader)->header;')
+        interp_func_body.append('        }')
+        for proto in self.protos:
+            if proto.name in proto_exclusions:
+                continue
+
+            interp_func_body.append('        case VKTRACE_TPI_VK_vk%s:\n        {' % proto.name)
+            header_prefix = 'h'
+            if 'Dbg' in proto.name:
+                header_prefix = 'pH'
+            interp_func_body.append('%s' % self.lineinfo.get())
+            interp_func_body.append('            return interpret_body_as_vk%s(pHeader)->%seader;\n        }' % (proto.name, header_prefix))
+        interp_func_body.append('        default:')
+        interp_func_body.append('            return NULL;')
+        interp_func_body.append('    }')
+        interp_func_body.append('    return NULL;')
+        interp_func_body.append('}')
+        return "\n".join(interp_func_body)
+
+    def _generate_struct_util_funcs(self):
+        lineinfo = self.lineinfo
+        pid_enum = []
+        pid_enum.append('%s' % lineinfo.get())
+        pid_enum.append('//=============================================================================')
+        pid_enum.append('static void add_VkApplicationInfo_to_packet(vktrace_trace_packet_header*  pHeader, VkApplicationInfo** ppStruct, const VkApplicationInfo *pInStruct)')
+        pid_enum.append('{')
+        pid_enum.append('    vktrace_add_buffer_to_trace_packet(pHeader, (void**)ppStruct, sizeof(VkApplicationInfo), pInStruct);')
+        pid_enum.append('    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&((*ppStruct)->pApplicationName), (pInStruct->pApplicationName != NULL) ? ROUNDUP_TO_4(strlen(pInStruct->pApplicationName) + 1) : 0, pInStruct->pApplicationName);')
+        pid_enum.append('    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&((*ppStruct)->pEngineName), (pInStruct->pEngineName != NULL) ? ROUNDUP_TO_4(strlen(pInStruct->pEngineName) + 1) : 0, pInStruct->pEngineName);')
+        pid_enum.append('    vktrace_finalize_buffer_address(pHeader, (void**)&((*ppStruct)->pApplicationName));')
+        pid_enum.append('    vktrace_finalize_buffer_address(pHeader, (void**)&((*ppStruct)->pEngineName));')
+        pid_enum.append('    vktrace_finalize_buffer_address(pHeader, (void**)ppStruct);')
+        pid_enum.append('}\n')
+        pid_enum.append('%s' % lineinfo.get())
+        pid_enum.append('static void add_VkInstanceCreateInfo_to_packet(vktrace_trace_packet_header* pHeader, VkInstanceCreateInfo** ppStruct, VkInstanceCreateInfo *pInStruct)')
+        pid_enum.append('{')
+        pid_enum.append('    vktrace_add_buffer_to_trace_packet(pHeader, (void**)ppStruct, sizeof(VkInstanceCreateInfo), pInStruct);')
+        pid_enum.append('    if (pInStruct->pApplicationInfo) add_VkApplicationInfo_to_packet(pHeader, (VkApplicationInfo**)&((*ppStruct)->pApplicationInfo), pInStruct->pApplicationInfo);')
+        # TODO138 : This is an initial pass at getting the extension/layer arrays correct, needs to be validated.
+        pid_enum.append('    uint32_t i, siz = 0;')
+        pid_enum.append('    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&((*ppStruct)->ppEnabledLayerNames), pInStruct->enabledLayerCount * sizeof(char*), pInStruct->ppEnabledLayerNames);')
+        pid_enum.append('    if (pInStruct->enabledLayerCount > 0)')
+        pid_enum.append('    {')
+        pid_enum.append('        for (i = 0; i < pInStruct->enabledLayerCount; i++) {')
+        pid_enum.append('            siz = (uint32_t) ROUNDUP_TO_4(1 + strlen(pInStruct->ppEnabledLayerNames[i]));')
+        pid_enum.append('            vktrace_add_buffer_to_trace_packet(pHeader, (void**)(&(*ppStruct)->ppEnabledLayerNames[i]), siz, pInStruct->ppEnabledLayerNames[i]);')
+        pid_enum.append('            vktrace_finalize_buffer_address(pHeader, (void **)&(*ppStruct)->ppEnabledLayerNames[i]);')
+        pid_enum.append('        }')
+        pid_enum.append('    }')
+        pid_enum.append('    vktrace_finalize_buffer_address(pHeader, (void **)&(*ppStruct)->ppEnabledLayerNames);')
+        pid_enum.append('    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&((*ppStruct)->ppEnabledExtensionNames), pInStruct->enabledExtensionCount * sizeof(char*), pInStruct->ppEnabledExtensionNames);')
+        pid_enum.append('    if (pInStruct->enabledExtensionCount > 0)')
+        pid_enum.append('    {')
+        pid_enum.append('        for (i = 0; i < pInStruct->enabledExtensionCount; i++) {')
+        pid_enum.append('            siz = (uint32_t) ROUNDUP_TO_4(1 + strlen(pInStruct->ppEnabledExtensionNames[i]));')
+        pid_enum.append('            vktrace_add_buffer_to_trace_packet(pHeader, (void**)(&(*ppStruct)->ppEnabledExtensionNames[i]), siz, pInStruct->ppEnabledExtensionNames[i]);')
+        pid_enum.append('            vktrace_finalize_buffer_address(pHeader, (void **)&(*ppStruct)->ppEnabledExtensionNames[i]);')
+        pid_enum.append('        }')
+        pid_enum.append('    }')
+        pid_enum.append('    vktrace_finalize_buffer_address(pHeader, (void **)&(*ppStruct)->ppEnabledExtensionNames);')
+        pid_enum.append('    vktrace_finalize_buffer_address(pHeader, (void**)ppStruct);')
+        pid_enum.append('}\n')
+        pid_enum.append('%s' % lineinfo.get())
+        pid_enum.append('static void add_VkDeviceCreateInfo_to_packet(vktrace_trace_packet_header*  pHeader, VkDeviceCreateInfo** ppStruct, const VkDeviceCreateInfo *pInStruct)')
+        pid_enum.append('{')
+        pid_enum.append('    vktrace_add_buffer_to_trace_packet(pHeader, (void**)ppStruct, sizeof(VkDeviceCreateInfo), pInStruct);')
+        pid_enum.append('    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(*ppStruct)->pQueueCreateInfos, pInStruct->queueCreateInfoCount*sizeof(VkDeviceQueueCreateInfo), pInStruct->pQueueCreateInfos);')
+        pid_enum.append('    for (uint32_t i = 0; i < pInStruct->queueCreateInfoCount; i++) {')
+        pid_enum.append('        vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(*ppStruct)->pQueueCreateInfos[i].pQueuePriorities,')
+        pid_enum.append('                                   pInStruct->pQueueCreateInfos[i].queueCount*sizeof(float),')
+        pid_enum.append('                                   pInStruct->pQueueCreateInfos[i].pQueuePriorities);')
+        pid_enum.append('        vktrace_finalize_buffer_address(pHeader, (void**)&(*ppStruct)->pQueueCreateInfos[i].pQueuePriorities);')
+        pid_enum.append('    }')
+        pid_enum.append('    vktrace_finalize_buffer_address(pHeader, (void**)&(*ppStruct)->pQueueCreateInfos);')
+        # TODO138 : This is an initial pass at getting the extension/layer arrays correct, needs to be validated.
+        pid_enum.append('    uint32_t i, siz = 0;')
+        pid_enum.append('    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&((*ppStruct)->ppEnabledLayerNames), pInStruct->enabledLayerCount * sizeof(char*), pInStruct->ppEnabledLayerNames);')
+        pid_enum.append('    if (pInStruct->enabledLayerCount > 0)')
+        pid_enum.append('    {')
+        pid_enum.append('        for (i = 0; i < pInStruct->enabledLayerCount; i++) {')
+        pid_enum.append('            siz = (uint32_t) ROUNDUP_TO_4(1 + strlen(pInStruct->ppEnabledLayerNames[i]));')
+        pid_enum.append('            vktrace_add_buffer_to_trace_packet(pHeader, (void**)(&(*ppStruct)->ppEnabledLayerNames[i]), siz, pInStruct->ppEnabledLayerNames[i]);')
+        pid_enum.append('            vktrace_finalize_buffer_address(pHeader, (void **)&(*ppStruct)->ppEnabledLayerNames[i]);')
+        pid_enum.append('        }')
+        pid_enum.append('    }')
+        pid_enum.append('    vktrace_finalize_buffer_address(pHeader, (void **)&(*ppStruct)->ppEnabledLayerNames);')
+        pid_enum.append('    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&((*ppStruct)->ppEnabledExtensionNames), pInStruct->enabledExtensionCount * sizeof(char*), pInStruct->ppEnabledExtensionNames);')
+        pid_enum.append('    if (pInStruct->enabledExtensionCount > 0)')
+        pid_enum.append('    {')
+        pid_enum.append('        for (i = 0; i < pInStruct->enabledExtensionCount; i++) {')
+        pid_enum.append('            siz = (uint32_t) ROUNDUP_TO_4(1 + strlen(pInStruct->ppEnabledExtensionNames[i]));')
+        pid_enum.append('            vktrace_add_buffer_to_trace_packet(pHeader, (void**)(&(*ppStruct)->ppEnabledExtensionNames[i]), siz, pInStruct->ppEnabledExtensionNames[i]);')
+        pid_enum.append('            vktrace_finalize_buffer_address(pHeader, (void **)&(*ppStruct)->ppEnabledExtensionNames[i]);')
+        pid_enum.append('        }')
+        pid_enum.append('    }')
+        pid_enum.append('    vktrace_finalize_buffer_address(pHeader, (void **)&(*ppStruct)->ppEnabledExtensionNames);')
+        pid_enum.append('    vktrace_add_buffer_to_trace_packet(pHeader, (void**)&(*ppStruct)->pEnabledFeatures, sizeof(VkPhysicalDeviceFeatures), pInStruct->pEnabledFeatures);')
+        pid_enum.append('    vktrace_finalize_buffer_address(pHeader, (void**)&(*ppStruct)->pEnabledFeatures);')
+        pid_enum.append('    vktrace_finalize_buffer_address(pHeader, (void**)ppStruct);')
+        pid_enum.append('}\n')
+        pid_enum.append('%s' % lineinfo.get())
+        pid_enum.append('//=============================================================================\n')
+        pid_enum.append('static VkInstanceCreateInfo* interpret_VkInstanceCreateInfo(vktrace_trace_packet_header*  pHeader, intptr_t ptr_variable)')
+        pid_enum.append('{')
+        pid_enum.append('    VkInstanceCreateInfo* pVkInstanceCreateInfo = (VkInstanceCreateInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)ptr_variable);\n')
+        pid_enum.append('    uint32_t i;')
+        pid_enum.append('    if (pVkInstanceCreateInfo != NULL)')
+        pid_enum.append('    {')
+        pid_enum.append('        pVkInstanceCreateInfo->pApplicationInfo = (VkApplicationInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkInstanceCreateInfo->pApplicationInfo);')
+        pid_enum.append('        VkApplicationInfo** ppApplicationInfo = (VkApplicationInfo**) &pVkInstanceCreateInfo->pApplicationInfo;')
+        pid_enum.append('        if (pVkInstanceCreateInfo->pApplicationInfo)')
+        pid_enum.append('        {')
+        pid_enum.append('            (*ppApplicationInfo)->pApplicationName = (const char*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkInstanceCreateInfo->pApplicationInfo->pApplicationName);')
+        pid_enum.append('            (*ppApplicationInfo)->pEngineName = (const char*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkInstanceCreateInfo->pApplicationInfo->pEngineName);')
+        pid_enum.append('        }')
+        pid_enum.append('        if (pVkInstanceCreateInfo->enabledLayerCount > 0)')
+        pid_enum.append('        {')
+        pid_enum.append('            pVkInstanceCreateInfo->ppEnabledLayerNames = (const char* const*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkInstanceCreateInfo->ppEnabledLayerNames);')
+        pid_enum.append('            for (i = 0; i < pVkInstanceCreateInfo->enabledLayerCount; i++) {')
+        pid_enum.append('                char** ppTmp = (char**)&pVkInstanceCreateInfo->ppEnabledLayerNames[i];')
+        pid_enum.append('                *ppTmp = (char*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkInstanceCreateInfo->ppEnabledLayerNames[i]);')
+        pid_enum.append('            }')
+        pid_enum.append('        }')
+        pid_enum.append('        if (pVkInstanceCreateInfo->enabledExtensionCount > 0)')
+        pid_enum.append('        {')
+        pid_enum.append('            pVkInstanceCreateInfo->ppEnabledExtensionNames = (const char* const*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkInstanceCreateInfo->ppEnabledExtensionNames);')
+        pid_enum.append('            for (i = 0; i < pVkInstanceCreateInfo->enabledExtensionCount; i++) {')
+        pid_enum.append('                char** ppTmp = (char**)&pVkInstanceCreateInfo->ppEnabledExtensionNames[i];')
+        pid_enum.append('                *ppTmp = (char*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkInstanceCreateInfo->ppEnabledExtensionNames[i]);')
+        pid_enum.append('            }')
+        pid_enum.append('        }')
+        pid_enum.append('    }\n')
+        pid_enum.append('    return pVkInstanceCreateInfo;')
+        pid_enum.append('}\n')
+        pid_enum.append('%s' % lineinfo.get())
+        pid_enum.append('static VkDeviceCreateInfo* interpret_VkDeviceCreateInfo(vktrace_trace_packet_header*  pHeader, intptr_t ptr_variable)')
+        pid_enum.append('{')
+        pid_enum.append('    VkDeviceCreateInfo* pVkDeviceCreateInfo = (VkDeviceCreateInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)ptr_variable);\n')
+        pid_enum.append('    uint32_t i;')
+        pid_enum.append('    if (pVkDeviceCreateInfo != NULL)')
+        pid_enum.append('    {')
+        pid_enum.append('        if (pVkDeviceCreateInfo->queueCreateInfoCount > 0)')
+        pid_enum.append('        {')
+        pid_enum.append('            pVkDeviceCreateInfo->pQueueCreateInfos = (const VkDeviceQueueCreateInfo *)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkDeviceCreateInfo->pQueueCreateInfos);')
+        pid_enum.append('            for (i = 0; i < pVkDeviceCreateInfo->queueCreateInfoCount; i++) {')
+        pid_enum.append('                float** ppQueuePriority = (float**)&pVkDeviceCreateInfo->pQueueCreateInfos[i].pQueuePriorities;')
+        pid_enum.append('                *ppQueuePriority = (float*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkDeviceCreateInfo->pQueueCreateInfos[i].pQueuePriorities);')
+        pid_enum.append('            }')
+        pid_enum.append('        }')
+        pid_enum.append('        if (pVkDeviceCreateInfo->enabledLayerCount > 0)')
+        pid_enum.append('        {')
+        pid_enum.append('            pVkDeviceCreateInfo->ppEnabledLayerNames = (const char* const*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkDeviceCreateInfo->ppEnabledLayerNames);')
+        pid_enum.append('            for (i = 0; i < pVkDeviceCreateInfo->enabledLayerCount; i++) {')
+        pid_enum.append('                char** ppTmp = (char**)&pVkDeviceCreateInfo->ppEnabledLayerNames[i];')
+        pid_enum.append('                *ppTmp = (char*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkDeviceCreateInfo->ppEnabledLayerNames[i]);')
+        pid_enum.append('            }')
+        pid_enum.append('        }')
+        pid_enum.append('        if (pVkDeviceCreateInfo->enabledExtensionCount > 0)')
+        pid_enum.append('        {')
+        pid_enum.append('            pVkDeviceCreateInfo->ppEnabledExtensionNames = (const char* const*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkDeviceCreateInfo->ppEnabledExtensionNames);')
+        pid_enum.append('            for (i = 0; i < pVkDeviceCreateInfo->enabledExtensionCount; i++) {')
+        pid_enum.append('                char** ppTmp = (char**)&pVkDeviceCreateInfo->ppEnabledExtensionNames[i];')
+        pid_enum.append('                *ppTmp = (char*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkDeviceCreateInfo->ppEnabledExtensionNames[i]);')
+        pid_enum.append('            }')
+        pid_enum.append('        }')
+        pid_enum.append('        pVkDeviceCreateInfo->pEnabledFeatures = (const VkPhysicalDeviceFeatures*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pVkDeviceCreateInfo->pEnabledFeatures);\n')
+        pid_enum.append('    }\n')
+        pid_enum.append('    return pVkDeviceCreateInfo;')
+        pid_enum.append('}\n')
+        pid_enum.append('%s' % lineinfo.get())
+        pid_enum.append('static void interpret_VkPipelineShaderStageCreateInfo(vktrace_trace_packet_header*  pHeader, VkPipelineShaderStageCreateInfo* pShader)')
+        pid_enum.append('{')
+        pid_enum.append('    if (pShader != NULL)')
+        pid_enum.append('    {')
+        pid_enum.append('        pShader->pName = (const char*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pShader->pName);')
+        pid_enum.append('        // specialization info')
+        pid_enum.append('        pShader->pSpecializationInfo = (const VkSpecializationInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pShader->pSpecializationInfo);')
+        pid_enum.append('        if (pShader->pSpecializationInfo != NULL)')
+        pid_enum.append('        {')
+        pid_enum.append('            VkSpecializationInfo* pInfo = (VkSpecializationInfo*)pShader->pSpecializationInfo;')
+        pid_enum.append('            pInfo->pMapEntries = (const VkSpecializationMapEntry*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pShader->pSpecializationInfo->pMapEntries);')
+        pid_enum.append('            pInfo->pData = (const void*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pShader->pSpecializationInfo->pData);')
+        pid_enum.append('        }')
+        pid_enum.append('    }')
+        pid_enum.append('}\n')
+        pid_enum.append('//=============================================================================')
+        return "\n".join(pid_enum)
+
+    # Interpret functions used on replay to read in packets and interpret their contents
+    #  This code gets generated into the vktrace_vk_vk_packets.h file
+    def _generate_interp_funcs(self):
+        # Custom txt for a given function and parameter. First check if the param is NULL, then insert the txt if it is not
+        # First some common code used by both CmdWaitEvents & CmdPipelineBarrier
+        mem_barrier_interp = ['uint32_t i = 0;\n',
+                              'for (i = 0; i < pPacket->memoryBarrierCount; i++) {\n',
+                              '    void** ppMB = (void**)&(pPacket->ppMemoryBarriers[i]);\n',
+                              '    *ppMB = vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->ppMemoryBarriers[i]);\n',
+                              '    //VkMemoryBarrier* pBarr = (VkMemoryBarrier*)pPacket->ppMemoryBarriers[i];\n',
+                              '    // TODO : Could fix up the pNext ptrs here if they were finalized and if we cared by switching on Barrier type and remapping\n',
+                              '}']
+        create_rp_interp = ['VkRenderPassCreateInfo* pInfo = (VkRenderPassCreateInfo*)pPacket->pCreateInfo;\n',
+                            'uint32_t i = 0;\n',
+                            'VkAttachmentDescription **ppAD = (VkAttachmentDescription **)&(pInfo->pAttachments);\n',
+                            '*ppAD = (VkAttachmentDescription*) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pInfo->pAttachments);\n',
+                            'VkSubpassDescription** ppSP = (VkSubpassDescription**)&(pInfo->pSubpasses);\n',
+                            '*ppSP = (VkSubpassDescription*) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pInfo->pSubpasses);\n',
+                            'for (i=0; i<pInfo->subpassCount; i++) {\n',
+                            '    VkAttachmentReference** pAR = (VkAttachmentReference**)&(pInfo->pSubpasses[i].pInputAttachments);\n',
+                            '    *pAR = (VkAttachmentReference*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pInfo->pSubpasses[i].pInputAttachments);\n',
+                            '    pAR = (VkAttachmentReference**)&(pInfo->pSubpasses[i].pColorAttachments);\n',
+                            '    *pAR = (VkAttachmentReference*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pInfo->pSubpasses[i].pColorAttachments);\n',
+                            '    pAR = (VkAttachmentReference**)&(pInfo->pSubpasses[i].pResolveAttachments);\n',
+                            '    *pAR = (VkAttachmentReference*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pInfo->pSubpasses[i].pResolveAttachments);\n',
+                            '    pAR = (VkAttachmentReference**)&(pInfo->pSubpasses[i].pDepthStencilAttachment);\n',
+                            '    *pAR = (VkAttachmentReference*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pInfo->pSubpasses[i].pDepthStencilAttachment);\n',
+                            '    pAR = (VkAttachmentReference**)&(pInfo->pSubpasses[i].pPreserveAttachments);\n',
+                            '    *pAR = (VkAttachmentReference*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pInfo->pSubpasses[i].pPreserveAttachments);\n',
+                            '}\n',
+                            'VkSubpassDependency** ppSD = (VkSubpassDependency**)&(pInfo->pDependencies);\n',
+                            '*ppSD = (VkSubpassDependency*) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pInfo->pDependencies);\n']
+        create_gfx_pipe = ['uint32_t i;\n',
+                           'uint32_t j;\n',
+                           'for (i=0; i<pPacket->createInfoCount; i++) {\n',
+                            'if (pPacket->pCreateInfos[i].sType == VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO) {\n',
+                            '// need to make a non-const pointer to the pointer so that we can properly change the original pointer to the interpreted one\n',
+                            'VkGraphicsPipelineCreateInfo* pNonConst = (VkGraphicsPipelineCreateInfo*)&(pPacket->pCreateInfos[i]);\n',
+                            '// shader stages array\n',
+                            'pNonConst->pStages = (VkPipelineShaderStageCreateInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pStages);\n',
+                            'for (j = 0; j < pPacket->pCreateInfos[i].stageCount; j++)\n',
+                            '{\n',
+                            '    interpret_VkPipelineShaderStageCreateInfo(pHeader, (VkPipelineShaderStageCreateInfo*)&pPacket->pCreateInfos[i].pStages[j]);\n',
+                            '}\n',
+                            '// Vertex Input State\n',
+                            'pNonConst->pVertexInputState = (VkPipelineVertexInputStateCreateInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pVertexInputState);\n',
+                            'VkPipelineVertexInputStateCreateInfo* pNonConstVIState = (VkPipelineVertexInputStateCreateInfo*)pNonConst->pVertexInputState;\n',
+                            'if (pNonConstVIState) {\n',
+                            '    pNonConstVIState->pVertexBindingDescriptions = (const VkVertexInputBindingDescription*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pVertexInputState->pVertexBindingDescriptions);\n',
+                            '    pNonConstVIState->pVertexAttributeDescriptions = (const VkVertexInputAttributeDescription*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pVertexInputState->pVertexAttributeDescriptions);\n',
+                            '}\n',
+                            '// Input Assembly State\n',
+                            'pNonConst->pInputAssemblyState = (const VkPipelineInputAssemblyStateCreateInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pInputAssemblyState);\n',
+                            '// Tessellation State\n',
+                            'pNonConst->pTessellationState = (const VkPipelineTessellationStateCreateInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pTessellationState);\n',
+                            '// Viewport State\n',
+                            'pNonConst->pViewportState = (const VkPipelineViewportStateCreateInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pViewportState);\n',
+                            '// Raster State\n',
+                            'pNonConst->pRasterizationState = (const VkPipelineRasterizationStateCreateInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pRasterizationState);\n',
+                            '// MultiSample State\n',
+                            'pNonConst->pMultisampleState = (const VkPipelineMultisampleStateCreateInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pMultisampleState);\n',
+                            '// DepthStencil State\n',
+                            'pNonConst->pDepthStencilState = (const VkPipelineDepthStencilStateCreateInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pDepthStencilState);\n',
+                            '// DynamicState State\n',
+                            'pNonConst->pDynamicState = (const VkPipelineDynamicStateCreateInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pDynamicState);\n',
+                            'VkPipelineDynamicStateCreateInfo* pNonConstDyState = (VkPipelineDynamicStateCreateInfo*)pNonConst->pDynamicState;\n',
+                            'if (pNonConstDyState) {\n',
+                            '    pNonConstDyState->pDynamicStates = (const VkDynamicState*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pDynamicState->pDynamicStates);\n',
+                            '}\n',
+
+                            '// ColorBuffer State\n',
+                            'pNonConst->pColorBlendState = (const VkPipelineColorBlendStateCreateInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pColorBlendState);\n',
+                            'VkPipelineColorBlendStateCreateInfo* pNonConstCbState = (VkPipelineColorBlendStateCreateInfo*)pNonConst->pColorBlendState;\n',
+                            'if (pNonConstCbState)\n',
+                            '    pNonConstCbState->pAttachments = (const VkPipelineColorBlendAttachmentState*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfos[i].pColorBlendState->pAttachments);\n',
+                            '} else {\n',
+                            '    // This is unexpected.\n',
+                            '    vktrace_LogError("CreateGraphicsPipelines must have CreateInfo sType of VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO.");\n',
+                            '    pPacket->header = NULL;\n',
+                            '}\n',
+                            '}\n']
+        # TODO : This code is now too large and complex, need to make codegen smarter for pointers embedded in struct params to handle those cases automatically
+        # TODO138 : Just ripped out a bunch of custom code here that was out of date. Need to scrub these functions and verify they're correct
+        custom_case_dict = { #'CreateFramebuffer' : {'param': 'pCreateInfo', 'txt': ['VkFramebufferCreateInfo* pInfo = (VkFramebufferCreateInfo*)pPacket->pCreateInfo;\n',
+                             #                       'pInfo->pColorAttachments = (VkColorAttachmentBindInfo*) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfo->pColorAttachments);\n',
+                             #                       'pInfo->pDepthStencilAttachment = (VkDepthStencilBindInfo*) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfo->pDepthStencilAttachment);\n']},
+                             'CreateRenderPass' : {'param': 'pCreateInfo', 'txt': create_rp_interp},
+                             'CreatePipelineCache' : {'param': 'pCreateInfo', 'txt': [
+                                                       '((VkPipelineCacheCreateInfo *)pPacket->pCreateInfo)->pInitialData = (const void*) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfo->pInitialData);\n']},
+                             'CreatePipelineLayout' : {'param': 'pCreateInfo', 'txt': ['VkPipelineLayoutCreateInfo* pInfo = (VkPipelineLayoutCreateInfo*)pPacket->pCreateInfo;\n',
+                                                       'pInfo->pSetLayouts = (VkDescriptorSetLayout*) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfo->pSetLayouts);\n',
+                                                       'pInfo->pPushConstantRanges = (VkPushConstantRange*) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfo->pPushConstantRanges);\n']},
+                             'CreateDescriptorPool' : {'param': 'pCreateInfo', 'txt': ['VkDescriptorPoolCreateInfo* pInfo = (VkDescriptorPoolCreateInfo*)pPacket->pCreateInfo;\n',
+                                                       'pInfo->pPoolSizes = (VkDescriptorPoolSize*) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfo->pPoolSizes);\n']},
+                             'CmdWaitEvents' : {'param': 'ppMemoryBarriers', 'txt': mem_barrier_interp},
+                             'CmdPipelineBarrier' : {'param': 'ppMemoryBarriers', 'txt': mem_barrier_interp},
+                             'CreateDescriptorSetLayout' : {'param': 'pCreateInfo', 'txt': ['if (pPacket->pCreateInfo->sType == VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO) {\n',
+                                                                                         '    VkDescriptorSetLayoutCreateInfo* pNext = (VkDescriptorSetLayoutCreateInfo*)pPacket->pCreateInfo;\n',
+                                                                                         '    do\n','    {\n',
+                                                                                         '        // need to make a non-const pointer to the pointer so that we can properly change the original pointer to the interpreted one\n',
+                                                                                         '        void** ppNextVoidPtr = (void**)&(pNext->pNext);\n',
+                                                                                         '        *ppNextVoidPtr = (void*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pNext->pNext);\n',
+                                                                                         '        switch(pNext->sType)\n', '        {\n',
+                                                                                         '            case VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO:\n',
+                                                                                         '            {\n' ,
+                                                                                         '                unsigned int i = 0;\n',
+                                                                                         '                pNext->pBindings = (VkDescriptorSetLayoutBinding*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pNext->pBindings);\n',
+                                                                                         '                for (i = 0; i < pNext->bindingCount; i++)\n','                {\n',
+                                                                                         '                    VkSampler** ppSamplers = (VkSampler**)&(pNext->pBindings[i].pImmutableSamplers);\n',
+                                                                                         '                    *ppSamplers = (VkSampler*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pNext->pBindings[i].pImmutableSamplers);\n',
+                                                                                         '                }\n',
+                                                                                         '                break;\n',
+                                                                                         '            }\n',
+                                                                                         '            default:\n',
+                                                                                         '            {\n',
+                                                                                         '                vktrace_LogError("Encountered an unexpected type in descriptor set layout create list.");\n',
+                                                                                         '                pPacket->header = NULL;\n',
+                                                                                         '                pNext->pNext = NULL;\n',
+                                                                                         '            }\n',
+                                                                                         '        }\n',
+                                                                                         '        pNext = (VkDescriptorSetLayoutCreateInfo*)pNext->pNext;\n',
+                                                                                         '     }  while (NULL != pNext);\n',
+                                                                                         '} else {\n',
+                                                                                         '     // This is unexpected.\n',
+                                                                                         '     vktrace_LogError("CreateDescriptorSetLayout must have pCreateInfo->sType of VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO.");\n',
+                                                                                         '     pPacket->header = NULL;\n',
+                                                                                         '}']},
+                             'BeginCommandBuffer' : {'param': 'pBeginInfo', 'txt': ['VkCommandBufferBeginInfo* pInfo = (VkCommandBufferBeginInfo*) pPacket->pBeginInfo;\n',
+                                                                                    'pInfo->pInheritanceInfo = (VkCommandBufferInheritanceInfo*) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pBeginInfo->pInheritanceInfo);\n']},
+                             'AllocateMemory' : {'param': 'pAllocateInfo', 'txt': ['if (pPacket->pAllocateInfo->sType == VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO) {\n',
+                                                                                         '    VkMemoryAllocateInfo** ppNext = (VkMemoryAllocateInfo**) &(pPacket->pAllocateInfo->pNext);\n',
+                                                                                         '    *ppNext = (VkMemoryAllocateInfo*) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pAllocateInfo->pNext);\n',
+                                                                                         '} else {\n',
+                                                                                         '    // This is unexpected.\n',
+                                                                                         '    vktrace_LogError("AllocateMemory must have AllocInfo sType of VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO.");\n',
+                                                                                         '    pPacket->header = NULL;\n',
+                                                                                         '}']},
+                             'AllocateDescriptorSets' : {'param': 'pAllocateInfo', 'txt':
+                                                                               ['VkDescriptorSetLayout **ppDescSetLayout = (VkDescriptorSetLayout **) &pPacket->pAllocateInfo->pSetLayouts;\n',
+                                                                                '*ppDescSetLayout = (VkDescriptorSetLayout *) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)(pPacket->pAllocateInfo->pSetLayouts));']},
+                             'UpdateDescriptorSets' : {'param': 'pDescriptorWrites', 'txt':
+                                                                               [ 'uint32_t i;\n',
+                                                                                 'for (i = 0; i < pPacket->descriptorWriteCount; i++) {\n',
+                                                                                 '    switch (pPacket->pDescriptorWrites[i].descriptorType) {',
+                                                                                 '    case VK_DESCRIPTOR_TYPE_SAMPLER:',
+                                                                                 '    case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:',
+                                                                                 '    case VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE:',
+                                                                                 '    case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE:',
+                                                                                 '    case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT:',
+                                                                                 '        {',
+                                                                                 '            VkDescriptorImageInfo** ppImageInfo = (VkDescriptorImageInfo**)&pPacket->pDescriptorWrites[i].pImageInfo;\n',
+                                                                                 '            *ppImageInfo = (VkDescriptorImageInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pDescriptorWrites[i].pImageInfo);\n',
+                                                                                 '        }',
+                                                                                 '        break;',
+                                                                                 '    case VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER:',
+                                                                                 '    case VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER:',
+                                                                                 '        {',
+                                                                                 '            VkBufferView** ppTexelBufferView = (VkBufferView**)&pPacket->pDescriptorWrites[i].pTexelBufferView;\n',
+                                                                                 '            *ppTexelBufferView = (VkBufferView*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pDescriptorWrites[i].pTexelBufferView);\n',
+                                                                                 '        }',
+                                                                                 '        break;',
+                                                                                 '    case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER:',
+                                                                                 '    case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER:',
+                                                                                 '    case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:',
+                                                                                 '    case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:',
+                                                                                 '        {',
+                                                                                 '            VkDescriptorBufferInfo** ppBufferInfo = (VkDescriptorBufferInfo**)&pPacket->pDescriptorWrites[i].pBufferInfo;\n',
+                                                                                 '            *ppBufferInfo = (VkDescriptorBufferInfo*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pDescriptorWrites[i].pBufferInfo);\n',
+                                                                                 '        }',
+                                                                                 '        break;',
+                                                                                 '    default:',
+                                                                                 '        break;',
+                                                                                 '    }',
+                                                                                 '}'
+                                                                               ]},
+                             'QueueSubmit' : {'param': 'pSubmits', 'txt':
+                                                                               [ 'uint32_t i;\n',
+                                                                                 'for (i = 0; i < pPacket->submitCount; i++) {\n',
+                                                                                 '   VkCommandBuffer** ppCBs = (VkCommandBuffer**)&pPacket->pSubmits[i].pCommandBuffers;\n',
+                                                                                 '   *ppCBs = (VkCommandBuffer*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pSubmits[i].pCommandBuffers);\n',
+                                                                                 '   VkSemaphore** ppSems = (VkSemaphore**)&pPacket->pSubmits[i].pWaitSemaphores;\n',
+                                                                                 '   *ppSems = (VkSemaphore*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pSubmits[i].pWaitSemaphores);\n',
+                                                                                 '   ppSems = (VkSemaphore**)&pPacket->pSubmits[i].pSignalSemaphores;\n',
+                                                                                 '   *ppSems = (VkSemaphore*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pSubmits[i].pSignalSemaphores);\n',
+                                                                                 '   VkPipelineStageFlags** ppStageMask = (VkPipelineStageFlags**)&pPacket->pSubmits[i].pWaitDstStageMask;\n',
+                                                                                 '   *ppStageMask = (VkPipelineStageFlags*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pSubmits[i].pWaitDstStageMask);\n',
+                                                                                 '}'
+                                                                               ]},
+                             'CreateGraphicsPipelines' : {'param': 'pCreateInfos', 'txt': create_gfx_pipe},
+                             'CreateComputePipeline' : {'param': 'pCreateInfo', 'txt': ['interpret_VkPipelineShaderStageCreateInfo(pHeader, (VkPipelineShaderStageCreateInfo*)(&pPacket->pCreateInfo->cs));']},
+                             'CreateFramebuffer' : {'param': 'pCreateInfo', 'txt': ['VkImageView** ppAV = (VkImageView**)&(pPacket->pCreateInfo->pAttachments);\n',
+                                                                                    '*ppAV = (VkImageView*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)(pPacket->pCreateInfo->pAttachments));']},
+                             'CmdBeginRenderPass' : {'param': 'pRenderPassBegin', 'txt': ['VkClearValue** ppCV = (VkClearValue**)&(pPacket->pRenderPassBegin->pClearValues);\n',
+                                                                                          '*ppCV = (VkClearValue*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)(pPacket->pRenderPassBegin->pClearValues));']},
+                             'CreateShaderModule' : {'param': 'pCreateInfo', 'txt': ['void** ppCode = (void**)&(pPacket->pCreateInfo->pCode);\n',
+                                                                                     '*ppCode = (void*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfo->pCode);']},
+                             'CreateImage' : {'param': 'pCreateInfo', 'txt': ['uint32_t** ppQueueFamilyIndices = (uint32_t**)&(pPacket->pCreateInfo->pQueueFamilyIndices);\n',
+                                                                              '*ppQueueFamilyIndices = (uint32_t*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfo->pQueueFamilyIndices);']},
+                             'CreateBuffer' : {'param': 'pCreateInfo', 'txt': ['uint32_t** ppQueueFamilyIndices = (uint32_t**)&(pPacket->pCreateInfo->pQueueFamilyIndices);\n',
+                                                                              '*ppQueueFamilyIndices = (uint32_t*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->pCreateInfo->pQueueFamilyIndices);']},
+                             'FlushMappedMemoryRanges' : {'param': 'ppData', 'txt': ['uint32_t i = 0;\n',
+                                                                                     'for (i = 0; i < pPacket->memoryRangeCount; i++)\n',
+                                                                                     '{\n',
+                                                                                     '    pPacket->ppData[i] = (void*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->ppData[i]);\n',
+                                                                                     '}']},
+                             'InvalidateMappedMemoryRanges' : {'param': 'ppData', 'txt': ['uint32_t i = 0;\n',
+                                                                                     'for (i = 0; i < pPacket->memoryRangeCount; i++)\n',
+                                                                                     '{\n',
+                                                                                     '    pPacket->ppData[i] = (void*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->ppData[i]);\n',
+                                                                                     '}']},
+                             'QueuePresentKHR' : {'param': 'pPresentInfo', 'txt': ['VkSwapchainKHR **ppSC = (VkSwapchainKHR **)& pPacket->pPresentInfo->pSwapchains;\n',
+                                                                                   '*ppSC = (VkSwapchainKHR*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)(pPacket->pPresentInfo->pSwapchains));\n',
+                                                                                   'VkSemaphore **ppS = (VkSemaphore **) &pPacket->pPresentInfo->pWaitSemaphores;\n',
+                                                                                   '*ppS = (VkSemaphore *) vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)(pPacket->pPresentInfo->pWaitSemaphores));\n',
+                                                                                   'uint32_t **ppII = (uint32_t **) &pPacket->pPresentInfo->pImageIndices;\n',
+                                                                                   '*ppII = (uint32_t*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)(pPacket->pPresentInfo->pImageIndices));\n',
+                                                                                   'if (pPacket->pPresentInfo->pResults != NULL) {\n',
+                                                                                   '    VkResult **ppR = (VkResult **) &pPacket->pPresentInfo->pResults;\n',
+                                                                                   '    *ppR = (VkResult*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)(pPacket->pPresentInfo->pResults));\n',
+                                                                                   '}']},
+                             'CreateSwapchainKHR' : {'param': 'pCreateInfo', 'txt': ['uint32_t **ppQFI = (uint32_t**)&pPacket->pCreateInfo->pQueueFamilyIndices;\n',
+                                                     '(*ppQFI) = (uint32_t*)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)(pPacket->pCreateInfo->pQueueFamilyIndices));']},
+
+        }
+        if_body = []
+        if_body.append('typedef struct packet_vkApiVersion {')
+        if_body.append('    vktrace_trace_packet_header* header;')
+        if_body.append('    uint32_t version;')
+        if_body.append('} packet_vkApiVersion;\n')
+        if_body.append('static packet_vkApiVersion* interpret_body_as_vkApiVersion(vktrace_trace_packet_header* pHeader)')
+        if_body.append('{')
+        if_body.append('    packet_vkApiVersion* pPacket = (packet_vkApiVersion*)pHeader->pBody;')
+        if_body.append('    pPacket->header = pHeader;')
+        if_body.append('    return pPacket;')
+        if_body.append('}\n')
+        for proto in self.protos:
+            if proto.name not in proto_exclusions:
+                if 'UnmapMemory' == proto.name:
+                    proto.params.append(vulkan.Param("void*", "pData"))
+                elif 'FlushMappedMemoryRanges' == proto.name:
+                    proto.params.append(vulkan.Param("void**", "ppData"))
+                elif 'InvalidateMappedMemoryRanges' == proto.name:
+                    proto.params.append(vulkan.Param("void**", "ppData"))
+                if_body.append('%s' % self.lineinfo.get())
+                if_body.append('typedef struct packet_vk%s {' % proto.name)
+                if_body.append('    vktrace_trace_packet_header* header;')
+                for p in proto.params:
+                    if_body.append('    %s;' % p.c())
+                if 'void' != proto.ret:
+                    if_body.append('    %s result;' % proto.ret)
+                if_body.append('} packet_vk%s;\n' % proto.name)
+                if_body.append('static packet_vk%s* interpret_body_as_vk%s(vktrace_trace_packet_header* pHeader)' % (proto.name, proto.name))
+                if_body.append('{')
+                if_body.append('    packet_vk%s* pPacket = (packet_vk%s*)pHeader->pBody;' % (proto.name, proto.name))
+                if_body.append('    pPacket->header = pHeader;')
+                for p in proto.params:
+                    if '*' in p.ty:
+                        if 'DeviceCreateInfo' in p.ty:
+                            if_body.append('    pPacket->%s = interpret_VkDeviceCreateInfo(pHeader, (intptr_t)pPacket->%s);' % (p.name, p.name))
+                        elif 'InstanceCreateInfo' in p.ty:
+                            if_body.append('    pPacket->%s = interpret_VkInstanceCreateInfo(pHeader, (intptr_t)pPacket->%s);' % (p.name, p.name))
+                        else:
+                            if_body.append('    pPacket->%s = (%s)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->%s);' % (p.name, p.ty, p.name))
+                        # TODO : Generalize this custom code to kill dict data struct above.
+                        #  Really the point of this block is to catch params w/ embedded ptrs to structs and chains of structs
+                        if proto.name in custom_case_dict and p.name == custom_case_dict[proto.name]['param']:
+                            if_body.append('    if (pPacket->%s != NULL)' % custom_case_dict[proto.name]['param'])
+                            if_body.append('    {')
+                            if_body.append('        %s' % "        ".join(custom_case_dict[proto.name]['txt']))
+                            if_body.append('    }')
+                if_body.append('    return pPacket;')
+                if_body.append('}\n')
+        return "\n".join(if_body)
+
+    def _generate_interp_funcs_ext(self, extensionName):
+        if_body = []
+        custom_case_dict = { }
+        for ext in vulkan.extensions_all:
+            if ext.name.lower() == extensionName.lower():
+                if ext.ifdef:
+                    if_body.append('#ifdef %s' % ext.ifdef)
+                for proto in ext.protos:
+                    if_body.append('typedef struct packet_vk%s {' % proto.name)
+                    if_body.append('    vktrace_trace_packet_header* pHeader;')
+                    for p in proto.params:
+                        if_body.append('    %s;' % p.c())
+                    if 'void' != proto.ret:
+                        if_body.append('    %s result;' % proto.ret)
+                    if_body.append('} packet_vk%s;\n' % proto.name)
+                    if_body.append('static packet_vk%s* interpret_body_as_vk%s(vktrace_trace_packet_header* pHeader)' % (proto.name, proto.name))
+                    if_body.append('{')
+                    if_body.append('    packet_vk%s* pPacket = (packet_vk%s*)pHeader->pBody;' % (proto.name, proto.name))
+                    if_body.append('    pPacket->pHeader = pHeader;')
+                    for p in proto.params:
+                        if '*' in p.ty:
+                            if_body.append('    pPacket->%s = (%s)vktrace_trace_packet_interpret_buffer_pointer(pHeader, (intptr_t)pPacket->%s);' % (p.name, p.ty, p.name))
+                            # TODO : Generalize this custom code so the custom_case_dict above can be removed.
+                            # The point of this block is to catch params with embedded pointers to structs and chains of structs.
+                            if proto.name in custom_case_dict and p.name == custom_case_dict[proto.name]['param']:
+                                if_body.append('    if (pPacket->%s != NULL)' % custom_case_dict[proto.name]['param'])
+                                if_body.append('    {')
+                                if_body.append('        %s' % "        ".join(custom_case_dict[proto.name]['txt']))
+                                if_body.append('    }')
+                    if_body.append('    return pPacket;')
+                    if_body.append('}\n')
+                if ext.ifdef:
+                    if_body.append('#endif /* %s */' % ext.ifdef)
+        return "\n".join(if_body)
+
+    def _generate_replay_func_ptrs(self):
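+        # Emits the vkFuncs struct: one typedef'd function pointer per
+        # non-excluded entrypoint plus the real_vk* member that init_funcs()
+        # later fills in. For example, the generated C for vkQueueWaitIdle
+        # looks roughly like:
+        #     typedef VkResult( VKAPI_PTR * type_vkQueueWaitIdle)(
+        #         VkQueue queue);
+        #     type_vkQueueWaitIdle real_vkQueueWaitIdle;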
+        xf_body = []
+        xf_body.append('struct vkFuncs {')
+        xf_body.append('    void init_funcs(void * libHandle);')
+        xf_body.append('    void *m_libHandle;\n')
+        for ext in vulkan.extensions_all:
+            if ext.ifdef:
+                xf_body.append('#ifdef %s' % ext.ifdef)
+            for proto in ext.protos:
+                if proto.name in proto_exclusions:
+                    continue
+
+                xf_body.append('    typedef %s( VKAPI_PTR * type_vk%s)(' % (proto.ret, proto.name))
+                for p in proto.params:
+                    xf_body.append('        %s,' % p.c())
+                xf_body[-1] = xf_body[-1][:-1] + ');'  # swap the trailing comma of the last param for ');'
+                xf_body.append('    type_vk%s real_vk%s;' % (proto.name, proto.name))
+            if ext.ifdef:
+                xf_body.append('#endif /* %s */' % ext.ifdef)
+        xf_body.append('};')
+        return "\n".join(xf_body)
+
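+    # The helpers below emit the per-object-type bookkeeping of the replay
+    # object mapper: a trace-handle -> replay-handle std::map plus
+    # add/remove/remap accessors. For example, _remap_decl('VkFence', 'm_fences')
+    # produces roughly:
+    #     VkFence remap_fences(const VkFence& value)
+    #     {
+    #         if (value == 0) { return 0; }
+    #         std::map<VkFence, VkFence>::const_iterator q = m_fences.find(value);
+    #         if (q == m_fences.end()) { vktrace_LogError("Failed to remap VkFence."); return VK_NULL_HANDLE; }
+    #         return q->second;
+    #     }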
+    def _map_decl(self, type1, type2, name):
+        return '    std::map<%s, %s> %s;' % (type1, type2, name)
+
+    def _add_to_map_decl(self, type1, type2, name):
+        txt = '    void add_to_%s_map(%s pTraceVal, %s pReplayVal)\n    {\n' % (name[2:], type1, type2)
+        #TODO138 : These checks need to vary between disp & non-disp objects
+        #txt += '        assert(pTraceVal != 0);\n'
+        #txt += '        assert(pReplayVal != 0);\n'
+        txt += '        %s[pTraceVal] = pReplayVal;\n    }\n' % name
+        return txt
+
+    def _rm_from_map_decl(self, ty, name):
+        txt = '    void rm_from_%s_map(const %s& key)\n    {\n' % (name[2:], ty)
+        txt += '        %s.erase(key);\n    }\n' % name
+        return txt
+
+    def _remap_decl(self, ty, name):
+        txt = '    %s remap_%s(const %s& value)\n    {\n' % (ty, name[2:], ty)
+        txt += '        if (value == 0) { return 0; }\n'
+        txt += '        std::map<%s, %s>::const_iterator q = %s.find(value);\n' % (ty, ty, name)
+        txt += '        if (q == %s.end()) { vktrace_LogError("Failed to remap %s."); return VK_NULL_HANDLE; }\n' % (name, ty)
+        txt += '        return q->second;\n    }\n'
+        return txt
+
+    def _generate_replay_objMemory_funcs(self):
+        rof_body = []
+        # Custom code for memory mapping functions for app writes into mapped memory
+        rof_body.append('// memory mapping functions for app writes into mapped memory')
+        rof_body.append('    bool isPendingAlloc()')
+        rof_body.append('    {')
+        rof_body.append('        return m_pendingAlloc;')
+        rof_body.append('    }')
+        rof_body.append('')
+        rof_body.append('    void setAllocInfo(const VkMemoryAllocateInfo *info, const bool pending)')
+        rof_body.append('    {')
+        rof_body.append('        m_pendingAlloc = pending;')
+        rof_body.append('        m_allocInfo = *info;')
+        rof_body.append('    }')
+        rof_body.append('')
+        rof_body.append('    void setMemoryDataAddr(void *pBuf)')
+        rof_body.append('    {')
+        rof_body.append('        if (m_mapRange.empty())')
+        rof_body.append('        {')
+        rof_body.append('            vktrace_LogError("gpuMemory::setMemoryDataAddr() m_mapRange is empty.");')
+        rof_body.append('            return;')
+        rof_body.append('        }')
+        # Take a reference so the assignment below updates the stored MapRange
+        rof_body.append('        MapRange& mr = m_mapRange.back();')
+        rof_body.append('        if (mr.pData != NULL)')
+        rof_body.append('            vktrace_LogWarning("gpuMemory::setMemoryDataAddr() data already mapped; overwriting old mapping.");')
+        rof_body.append('        else if (pBuf == NULL)')
+        rof_body.append('            vktrace_LogWarning("gpuMemory::setMemoryDataAddr() adding NULL pointer.");')
+        rof_body.append('        mr.pData = (uint8_t *) pBuf;')
+        rof_body.append('    }')
+        rof_body.append('')
+        # add for page guard optimization
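+        # copyMappingDataPageGuard applies only the blocks the page-guard
+        # tracer flagged as dirty: pSrcData begins with a
+        # PageGuardChangedBlockInfo array (entry 0 carries the block count in
+        # 'offset' and the total changed length in 'length'), followed by the
+        # packed bytes of each changed block.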
+        rof_body.append('    void copyMappingDataPageGuard(const void* pSrcData)')
+        rof_body.append('    {')
+        rof_body.append('        if (m_mapRange.empty())')
+        rof_body.append('        {')
+        rof_body.append('            vktrace_LogError("gpuMemory::copyMappingDataPageGuard() m_mapRange is empty.");')
+        rof_body.append('            return;')
+        rof_body.append('        }')
+        rof_body.append('        MapRange mr = m_mapRange.back();')
+        rof_body.append('        if (!pSrcData || !mr.pData)')
+        rof_body.append('        {')
+        rof_body.append('            if (!pSrcData)')
+        rof_body.append('                vktrace_LogError("gpuMemory::copyMappingDataPageGuard() null src pointer.");')
+        rof_body.append('            else')
+        rof_body.append('                vktrace_LogError("gpuMemory::copyMappingDataPageGuard() null dest pointer totalSize=%u.", (unsigned int)m_allocInfo.allocationSize);')
+        rof_body.append('            m_mapRange.pop_back();')
+        rof_body.append('            return;')
+        rof_body.append('        }')
+        rof_body.append('')
+        rof_body.append('        PageGuardChangedBlockInfo *pChangedInfoArray = (PageGuardChangedBlockInfo *)pSrcData;')
+        rof_body.append('        if (pChangedInfoArray[0].length)')
+        rof_body.append('        {')
+        rof_body.append('            PBYTE pChangedData = (PBYTE)(pSrcData)+sizeof(PageGuardChangedBlockInfo)*(pChangedInfoArray[0].offset + 1);')
+        rof_body.append('            DWORD CurrentOffset = 0;')
+        rof_body.append('            for (DWORD i = 0; i < pChangedInfoArray[0].offset; i++)')
+        rof_body.append('            {')
+        rof_body.append('                if ((size_t)pChangedInfoArray[i + 1].length)')
+        rof_body.append('                {')
+        rof_body.append('                    memcpy(mr.pData +  (size_t)pChangedInfoArray[i + 1].offset, pChangedData + CurrentOffset, (size_t)pChangedInfoArray[i + 1].length);')
+        rof_body.append('                }')
+        rof_body.append('                CurrentOffset += pChangedInfoArray[i + 1].length;')
+        rof_body.append('            }')
+        rof_body.append('        }')
+        rof_body.append('    }')
+        rof_body.append('')
+        #  add for page guard optimization end
+        rof_body.append('    void setMemoryMapRange(void *pBuf, const size_t size, const size_t offset, const bool pending)')
+        rof_body.append('    {')
+        rof_body.append('        MapRange mr;')
+        rof_body.append('        mr.pData = (uint8_t *) pBuf;')
+        rof_body.append('        if (size == 0)')
+        rof_body.append('            mr.size = (size_t)m_allocInfo.allocationSize - offset;')
+        rof_body.append('        else')
+        rof_body.append('            mr.size = size;')
+        rof_body.append('        mr.offset = offset;')
+        rof_body.append('        mr.pending = pending;')
+        rof_body.append('        m_mapRange.push_back(mr);')
+        rof_body.append('        assert((size_t)m_allocInfo.allocationSize >= (size + offset));')
+        rof_body.append('    }')
+        rof_body.append('')
+        rof_body.append('    void copyMappingData(const void* pSrcData, bool entire_map, size_t size, size_t offset)')
+        rof_body.append('    {')
+        rof_body.append('        if (m_mapRange.empty())')
+        rof_body.append('        {')
+        rof_body.append('            vktrace_LogError("gpuMemory::copyMappingData() m_mapRange is empty.");')
+        rof_body.append('            return;')
+        rof_body.append('        }')
+        rof_body.append('        MapRange mr = m_mapRange.back();')
+        rof_body.append('        if (!pSrcData || !mr.pData)')
+        rof_body.append('        {')
+        rof_body.append('            if (!pSrcData)')
+        rof_body.append('                vktrace_LogError("gpuMemory::copyMappingData() null src pointer.");')
+        rof_body.append('            else')
+        rof_body.append('                vktrace_LogError("gpuMemory::copyMappingData() null dest pointer totalSize=%u.", (unsigned int)m_allocInfo.allocationSize);')
+        rof_body.append('            m_mapRange.pop_back();')
+        rof_body.append('            return;')
+        rof_body.append('        }')
+        rof_body.append('        if (entire_map)')
+        rof_body.append('        {')
+        rof_body.append('            size = mr.size;')
+        rof_body.append('            offset = 0;   // pointer to mapped buffer is from offset 0')
+        rof_body.append('        }')
+        rof_body.append('        else')
+        rof_body.append('        {')
+        rof_body.append('            assert(offset >= mr.offset);')
+        rof_body.append('            assert(size <= mr.size && (size + offset) <= (size_t)m_allocInfo.allocationSize);')
+        rof_body.append('        }')
+        rof_body.append('        memcpy(mr.pData + offset, pSrcData, size);')
+        rof_body.append('        if (!mr.pending && entire_map)')
+        rof_body.append('            m_mapRange.pop_back();')
+        rof_body.append('    }')
+        rof_body.append('')
+        rof_body.append('    size_t getMemoryMapSize()')
+        rof_body.append('    {')
+        rof_body.append('        return (!m_mapRange.empty()) ? m_mapRange.back().size : 0;')
+        rof_body.append('    }\n')
+        return "\n".join(rof_body)
+
+    def _generate_replay_objmapper_class(self):
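+        # Emits the objMemory/gpuMemory helper classes and the
+        # vkReplayObjMapper class; the mapper owns one std::map per Vulkan
+        # object type, translating handles recorded in the trace into the
+        # handles created at replay time.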
+        # Create dict mapping member var names to VK type (e.g. 'm_imageViews' : 'VkImage_VIEW')
+        obj_map_dict = {}
+        for obj in vulkan.object_type_list:
+            # Skip entries without the 'Vk' prefix rather than reusing a stale mem_var
+            if obj.startswith('Vk'):
+                mem_var = obj.replace('Vk', '').lower()
+                mem_var_list = mem_var.split('_')
+                mem_var = 'm_%s%ss' % (mem_var_list[0], "".join([m.title() for m in mem_var_list[1:]]))
+                obj_map_dict[mem_var] = obj
+        rc_body = []
+        rc_body.append('#define VKTRACE_VK_DEBUG_REPORT_OBJECT_TYPE_UNKNOWN (VkObjectType)-1')
+        rc_body.append('')
+        rc_body.append('typedef struct _VKAllocInfo {')
+        rc_body.append('    VkDeviceSize size;')
+        rc_body.append('    uint8_t *pData;')
+        rc_body.append('    bool rangeUpdated;')
+        rc_body.append('} VKAllocInfo;')
+        rc_body.append('')
+        rc_body.append('class objMemory {')
+        rc_body.append('public:')
+        rc_body.append('    objMemory() : m_numAllocations(0), m_pMemReqs(NULL) {}')
+        rc_body.append('    ~objMemory() { free(m_pMemReqs);}')
+        rc_body.append('    void setCount(const uint32_t num)')
+        rc_body.append('    {')
+        rc_body.append('        m_numAllocations = num;')
+        rc_body.append('    }\n')
+        rc_body.append('    void setReqs(const VkMemoryRequirements *pReqs, const uint32_t num)')
+        rc_body.append('    {')
+        rc_body.append('        if (m_numAllocations != num && m_numAllocations != 0)')
+        rc_body.append('            vktrace_LogError("objMemory::setReqs, internal mismatch on number of allocations.");')
+        rc_body.append('        if (m_pMemReqs == NULL && pReqs != NULL)')
+        rc_body.append('        {')
+        rc_body.append('            m_pMemReqs = (VkMemoryRequirements *) vktrace_malloc(num * sizeof(VkMemoryRequirements));')
+        rc_body.append('            if (m_pMemReqs == NULL)')
+        rc_body.append('            {')
+        rc_body.append('                vktrace_LogError("objMemory::setReqs out of memory.");')
+        rc_body.append('                return;')
+        rc_body.append('            }')
+        rc_body.append('            memcpy(m_pMemReqs, pReqs, num * sizeof(VkMemoryRequirements));')
+        rc_body.append('        }')
+        rc_body.append('    }\n')
+        rc_body.append('private:')
+        rc_body.append('    uint32_t m_numAllocations;')
+        rc_body.append('    VkMemoryRequirements *m_pMemReqs;')
+        rc_body.append('};')
+        rc_body.append('')
+        rc_body.append('class gpuMemory {')
+        rc_body.append('public:')
+        rc_body.append('    gpuMemory() : m_pendingAlloc(false) {m_allocInfo.allocationSize = 0;}')
+        rc_body.append('    ~gpuMemory() {}')
+        rc_body.append(self._generate_replay_objMemory_funcs())
+        rc_body.append('private:')
+        rc_body.append('    bool m_pendingAlloc;')
+        rc_body.append('    struct MapRange {')
+        rc_body.append('        bool pending;')
+        rc_body.append('        size_t size;')
+        rc_body.append('        size_t offset;')
+        rc_body.append('        uint8_t* pData;')
+        rc_body.append('    };')
+        rc_body.append('    std::vector<MapRange> m_mapRange;')
+        rc_body.append('    VkMemoryAllocateInfo m_allocInfo;')
+        rc_body.append('};')
+        rc_body.append('')
+        rc_body.append('typedef struct _imageObj {')
+        rc_body.append('     objMemory imageMem;')
+        rc_body.append('     VkImage replayImage;')
+        rc_body.append(' } imageObj;')
+        rc_body.append('')
+        rc_body.append('typedef struct _bufferObj {')
+        rc_body.append('     objMemory bufferMem;')
+        rc_body.append('     VkBuffer replayBuffer;')
+        rc_body.append(' } bufferObj;')
+        rc_body.append('')
+        rc_body.append('typedef struct _gpuMemObj {')
+        rc_body.append('     gpuMemory *pGpuMem;')
+        rc_body.append('     VkDeviceMemory replayGpuMem;')
+        rc_body.append(' } gpuMemObj;')
+        rc_body.append('')
+        rc_body.append('')
+        rc_body.append('class vkReplayObjMapper {')
+        rc_body.append('public:')
+        rc_body.append('    vkReplayObjMapper() {}')
+        rc_body.append('    ~vkReplayObjMapper() {}')
+        rc_body.append('')
+        rc_body.append('    bool m_adjustForGPU; // true if replay adjusts behavior based on GPU')
+        # Memory-object bookkeeping for when the replay GPU's memory requirements differ from the trace GPU's
+        rc_body.append('    void init_objMemCount(const uint64_t handle, const VkDebugReportObjectTypeEXT objectType, const uint32_t &num)\n    {')
+        rc_body.append('    switch (objectType) {')
+        rc_body.append('        case VK_DEBUG_REPORT_OBJECT_TYPE_BUFFER_EXT:')
+        rc_body.append('        {')
+        rc_body.append('            std::map<VkBuffer, bufferObj>::iterator it = m_buffers.find((VkBuffer) handle);')
+        rc_body.append('            if (it != m_buffers.end()) {')
+        rc_body.append('                objMemory obj = it->second.bufferMem;')
+        rc_body.append('                obj.setCount(num);')
+        rc_body.append('                return;')
+        rc_body.append('            }')
+        rc_body.append('            break;')
+        rc_body.append('        }')
+        rc_body.append('        case VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_EXT:')
+        rc_body.append('        {')
+        rc_body.append('            std::map<VkImage, imageObj>::iterator it = m_images.find((VkImage) handle);')
+        rc_body.append('            if (it != m_images.end()) {')
+        rc_body.append('                objMemory obj = it->second.imageMem;')
+        rc_body.append('                obj.setCount(num);')
+        rc_body.append('                return;')
+        rc_body.append('            }')
+        rc_body.append('            break;')
+        rc_body.append('        }')
+        rc_body.append('        default:')
+        rc_body.append('            break;')
+        rc_body.append('    }')
+        rc_body.append('    return;')
+        rc_body.append('    }\n')
+        rc_body.append('    void init_objMemReqs(const uint64_t handle, const VkDebugReportObjectTypeEXT objectType, const VkMemoryRequirements *pMemReqs, const unsigned int num)\n    {')
+        rc_body.append('    switch (objectType) {')
+        rc_body.append('        case VK_DEBUG_REPORT_OBJECT_TYPE_BUFFER_EXT:')
+        rc_body.append('        {')
+        rc_body.append('            std::map<VkBuffer, bufferObj>::iterator it = m_buffers.find((VkBuffer) handle);')
+        rc_body.append('            if (it != m_buffers.end()) {')
+        rc_body.append('                objMemory obj = it->second.bufferMem;')
+        rc_body.append('                obj.setReqs(pMemReqs, num);')
+        rc_body.append('                return;')
+        rc_body.append('            }')
+        rc_body.append('            break;')
+        rc_body.append('        }')
+        rc_body.append('        case VK_DEBUG_REPORT_OBJECT_TYPE_IMAGE_EXT:')
+        rc_body.append('        {')
+        rc_body.append('            std::map<VkImage, imageObj>::iterator it = m_images.find((VkImage) handle);')
+        rc_body.append('            if (it != m_images.end()) {')
+        rc_body.append('                objMemory obj = it->second.imageMem;')
+        rc_body.append('                obj.setReqs(pMemReqs, num);')
+        rc_body.append('                return;')
+        rc_body.append('            }')
+        rc_body.append('            break;')
+        rc_body.append('        }')
+        rc_body.append('        default:')
+        rc_body.append('            break;')
+        rc_body.append('    }')
+        rc_body.append('    return;')
+        rc_body.append('    }')
+        rc_body.append('')
+        rc_body.append('    void clear_all_map_handles()\n    {')
+        for var in sorted(obj_map_dict):
+            rc_body.append('        %s.clear();' % var)
+        for var in additional_remap_dict:
+            rc_body.append('        m_%s.clear();' % var)
+        rc_body.append('    }\n')
+        for var in sorted(obj_map_dict):
+            if obj_map_dict[var] == 'VkImage':
+                rc_body.append(self._map_decl('VkImage', 'imageObj', var))
+                rc_body.append(self._add_to_map_decl('VkImage', 'imageObj', var))
+                rc_body.append(self._rm_from_map_decl('VkImage', var))
+                rc_body.append('    VkImage remap_images(const VkImage& value)')
+                rc_body.append('    {')
+                rc_body.append('        if (value == 0) { return 0; }')
+                rc_body.append('')
+                rc_body.append('        std::map<VkImage, imageObj>::const_iterator q = m_images.find(value);')
+                rc_body.append('        if (q == m_images.end()) { vktrace_LogError("Failed to remap VkImage."); return VK_NULL_HANDLE; }\n')
+                rc_body.append('        return q->second.replayImage;')
+                rc_body.append('    }\n')
+            elif obj_map_dict[var] == 'VkBuffer':
+                rc_body.append(self._map_decl('VkBuffer', 'bufferObj', var))
+                rc_body.append(self._add_to_map_decl('VkBuffer', 'bufferObj', var))
+                rc_body.append(self._rm_from_map_decl('VkBuffer', var))
+                rc_body.append('    VkBuffer remap_buffers(const VkBuffer& value)')
+                rc_body.append('    {')
+                rc_body.append('        if (value == 0) { return 0; }')
+                rc_body.append('')
+                rc_body.append('        std::map<VkBuffer, bufferObj>::const_iterator q = m_buffers.find(value);')
+                rc_body.append('        if (q == m_buffers.end()) { vktrace_LogError("Failed to remap VkBuffer."); return VK_NULL_HANDLE; }\n')
+                rc_body.append('        return q->second.replayBuffer;')
+                rc_body.append('    }\n')
+            elif obj_map_dict[var] == 'VkDeviceMemory':
+                rc_body.append(self._map_decl('VkDeviceMemory', 'gpuMemObj', var))
+                rc_body.append(self._add_to_map_decl('VkDeviceMemory', 'gpuMemObj', var))
+                rc_body.append(self._rm_from_map_decl('VkDeviceMemory', var))
+                rc_body.append('    VkDeviceMemory remap_devicememorys(const VkDeviceMemory& value)')
+                rc_body.append('    {')
+                rc_body.append('        if (value == 0) { return 0; }')
+                rc_body.append('')
+                rc_body.append('        std::map<VkDeviceMemory, gpuMemObj>::const_iterator q = m_devicememorys.find(value);')
+                rc_body.append('        if (q == m_devicememorys.end()) { vktrace_LogError("Failed to remap VkDeviceMemory."); return VK_NULL_HANDLE; }')
+                rc_body.append('        return q->second.replayGpuMem;')
+                rc_body.append('    }\n')
+            else:
+                rc_body.append(self._map_decl(obj_map_dict[var], obj_map_dict[var], var))
+                rc_body.append(self._add_to_map_decl(obj_map_dict[var], obj_map_dict[var], var))
+                rc_body.append(self._rm_from_map_decl(obj_map_dict[var], var))
+                rc_body.append(self._remap_decl(obj_map_dict[var], var))
+        for var in additional_remap_dict:
+            rc_body.append('    std::map<%s, %s> m_%s;' % (additional_remap_dict[var], additional_remap_dict[var], var))
+            rc_body.append('    void add_to_%s_map(%s traceVal, %s replayVal)' % (var, additional_remap_dict[var], additional_remap_dict[var]))
+            rc_body.append('    {')
+            rc_body.append('        m_%s[traceVal] = replayVal;' % var)
+            rc_body.append('    }')
+            rc_body.append('')
+            rc_body.append('    void rm_from_%s_map(const %s& key)' % (var, additional_remap_dict[var]))
+            rc_body.append('    {')
+            rc_body.append('        m_%s.erase(key);' % var)
+            rc_body.append('    }')
+            rc_body.append('')
+            rc_body.append('    %s remap_%s(const %s& value)' % (additional_remap_dict[var], var, additional_remap_dict[var]))
+            rc_body.append('    {')
+            rc_body.append('        std::map<%s, %s>::const_iterator q = m_%s.find(value);' % (additional_remap_dict[var], additional_remap_dict[var], var))
+            rc_body.append('        if (q == m_%s.end()) { vktrace_LogError("Failed to remap %s."); return UINT32_MAX; }' % (var, var))
+            rc_body.append('        return q->second;')
+            rc_body.append('    }')
+
+        # VkDynamicStateObject code
+# TODO138 : Each dynamic state object is now unique so need to make sure their re-mapping is being handled correctly
+#        state_obj_remap_types = vulkan.object_dynamic_state_list
+#        state_obj_bindings = vulkan.object_dynamic_state_bind_point_list
+#        rc_body.append('    VkDynamicStateObject remap(const VkDynamicStateObject& state, const VkStateBindPoint& bindPoint)\n    {')
+#        rc_body.append('        VkDynamicStateObject obj;')
+#        index = 0
+#        while index < len(state_obj_remap_types):
+#            obj = state_obj_remap_types[index]
+#            type = state_obj_bindings[index]
+#            rc_body.append('        if (bindPoint == %s) {' % type)
+#            rc_body.append('            if ((obj = remap(static_cast <%s> (state))) != VK_NULL_HANDLE)' % obj.type)
+#            rc_body.append('                return obj;')
+#            rc_body.append('        }')
+#            index += 1
+#        for obj in state_obj_remap_types:
+#            rc_body.append('//        if ((obj = remap(static_cast <%s> (state))) != VK_NULL_HANDLE)' % obj.type)
+#            rc_body.append('//            return obj;')
+#        rc_body.append('        vktrace_LogWarning("Failed to remap VkDynamicStateObject.");')
+#        rc_body.append('        return VK_NULL_HANDLE;\n    }')
+#        rc_body.append('    void rm_from_map(const VkDynamicStateObject& state)\n    {')
+#        for obj in state_obj_remap_types:
+#            rc_body.append('        rm_from_map(static_cast <%s> (state));' % obj.type)
+#        rc_body.append('    }')
+#        rc_body.append('')
+        rc_body.append('};')
+        return "\n".join(rc_body)
+
+    def _generate_replay_init_funcs(self):
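+        # Emits vkFuncs::init_funcs(), which resolves every real_vk* pointer
+        # from the loader library handle, e.g. the generated line for
+        # vkCreateDevice:
+        #     real_vkCreateDevice = (type_vkCreateDevice)(vktrace_platform_get_library_entrypoint(handle, "vkCreateDevice"));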
+        rif_body = []
+        rif_body.append('void vkFuncs::init_funcs(void * handle)\n{\n    m_libHandle = handle;')
+        for ext in vulkan.extensions_all:
+            if ext.ifdef:
+                rif_body.append('#ifdef %s' % ext.ifdef)
+            for proto in ext.protos:
+                if proto.name in proto_exclusions:
+                    continue
+                if 'DebugReport' not in proto.name:
+                    rif_body.append('    real_vk%s = (type_vk%s)(vktrace_platform_get_library_entrypoint(handle, "vk%s"));' % (proto.name, proto.name, proto.name))
+                else: # These func ptrs get assigned at GetProcAddr time
+                    rif_body.append('    real_vk%s = (type_vk%s)NULL;' % (proto.name, proto.name))
+            if ext.ifdef:
+                rif_body.append('#endif /* %s */' % ext.ifdef)
+        rif_body.append('}')
+        return "\n".join(rif_body)
+
+    def _remap_packet_param(self, funcName, paramType, paramName, lastName):
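+        # Emits replay-side code that swaps a traced handle parameter for its
+        # remapped replay-time handle, in one of three shapes: a dereference of
+        # a non-const out-pointer, a loop over a const handle array (sized by
+        # the preceding count parameter, passed in as lastName), or a plain
+        # by-value remap such as:
+        #     VkDevice remappeddevice = m_objMapper.remap_devices(pPacket->device);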
+        remap_list = vulkan.object_type_list
+        param_exclude_list = ['pDescriptorSets', 'pFences']
+        cleanParamType = paramType.strip('*').replace('const ', '')
+        for obj in remap_list:
+            if obj == cleanParamType and paramName not in param_exclude_list:
+                objectTypeRemapParam = ''
+                if 'VkDynamicStateObject' == cleanParamType:
+                    objectTypeRemapParam = ', pPacket->stateBindPoint'
+                elif 'object' == paramName:
+                    if 'DbgSetObjectTag' == funcName:
+                        objectTypeRemapParam = ', VKTRACE_VK_DEBUG_REPORT_OBJECT_TYPE_UNKNOWN'
+                    else:
+                        objectTypeRemapParam = ', pPacket->objType'
+                elif 'srcObject' == paramName and 'Callback' in funcName:
+                    objectTypeRemapParam = ', pPacket->objType'
+                pArray = ''
+                if '*' in paramType:
+                    if 'const' not in paramType:
+                        result = '        %s remapped%s = m_objMapper.remap_%ss(*pPacket->%s%s);\n' % (cleanParamType, paramName, paramName.lower(), paramName, objectTypeRemapParam)
+                        result += '        if (pPacket->%s != VK_NULL_HANDLE && remapped%s == VK_NULL_HANDLE)\n' % (paramName, paramName)
+                        result += '        {\n'
+                        result += '            vktrace_LogError("Error detected in %s() due to invalid remapped %s.");\n' % (funcName, cleanParamType)
+                        result += '            return vktrace_replay::VKTRACE_REPLAY_ERROR;\n'
+                        result += '        }\n'
+                        return result
+                    else:
+                        if lastName == '':
+                            return '            // pPacket->%s should have been remapped with special case code' % (paramName)
+                        pArray = '[pPacket->%s]' % lastName
+                        result = '            %s *remapped%s = new %s%s;\n' % (cleanParamType, paramName, cleanParamType, pArray)
+                        result += '%s\n' % self.lineinfo.get()
+                        result += '            for (uint32_t i = 0; i < pPacket->%s; i++) {\n' % lastName
+                        result += '                remapped%s[i] = m_objMapper.remap_%ss(pPacket->%s[i]%s);\n' % (paramName, cleanParamType.lower()[2:], paramName, objectTypeRemapParam)
+                        result += '                if (pPacket->%s[i] != VK_NULL_HANDLE && remapped%s[i] == VK_NULL_HANDLE)\n' % (paramName, paramName)
+                        result += '                {\n'
+                        result += '                    vktrace_LogError("Error detected in %s() due to invalid remapped %s.");\n' % (funcName, cleanParamType)
+                        result += '                    return vktrace_replay::VKTRACE_REPLAY_ERROR;\n'
+                        result += '                }\n'
+                        result += '            }\n'
+                        return result
+
+                result = '            %s remapped%s = m_objMapper.remap_%ss(pPacket->%s%s);\n' % (paramType, paramName, cleanParamType.lower()[2:], paramName, objectTypeRemapParam)
+                result += '%s\n' % self.lineinfo.get()
+                result += '            if (pPacket->%s != VK_NULL_HANDLE && remapped%s == VK_NULL_HANDLE)\n' % (paramName, paramName)
+                result += '            {\n'
+                result += '                vktrace_LogError("Error detected in %s() due to invalid remapped %s.");\n' % (funcName, cleanParamType)
+                result += '                return vktrace_replay::VKTRACE_REPLAY_ERROR;\n'
+                result += '            }\n'
+                return result
+        return '            // No need to remap %s' % (paramName)
+
+    def _get_packet_param(self, funcName, paramType, paramName):
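+        # Returns the argument expression to use when building the real_vk*
+        # call: the 'remappedFoo' local that _remap_packet_param declared for
+        # handle params, or the raw 'pPacket->foo' field otherwise.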
+        # list of types that require remapping
+        remap_list = vulkan.object_type_list
+        param_exclude_list = ['pDescriptorSets', 'pFences']
+        cleanParamType = paramType.strip('*').replace('const ', '')
+        for obj in remap_list:
+            if obj == cleanParamType and paramName not in param_exclude_list:
+                # _remap_packet_param already emitted the remap and its
+                # validation; here we just reference the local it declared.
+                return 'remapped%s' % (paramName)
+        return 'pPacket->%s' % (paramName)
+
+    def _gen_replay_create_instance(self):
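+        # Fully custom body for vkCreateInstance: after a successful manual
+        # replay, remap the new VkInstance and, if the ICD exposes it, register
+        # the replayer's debug-report callback on that instance.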
+        cb_body = []
+        cb_body.append('            replayResult = manually_replay_vkCreateInstance(pPacket);')
+        cb_body.append('            CHECK_RETURN_VALUE(vkCreateInstance);')
+        cb_body.append('            if (replayResult == VK_SUCCESS) {')
+        cb_body.append('                VkInstance remappedInstance = m_objMapper.remap_instances(*pPacket->pInstance);')
+        cb_body.append('                if (remappedInstance == VK_NULL_HANDLE) {')
+        cb_body.append('                    vktrace_LogError("Error detected in vkCreateInstance() due to invalid remapped VkInstance.");')
+        cb_body.append('                    returnValue = vktrace_replay::VKTRACE_REPLAY_ERROR;')
+        cb_body.append('                    break;')
+        cb_body.append('                }')
+        cb_body.append('                VkFlags reportFlags = VK_DEBUG_REPORT_INFORMATION_BIT_EXT | VK_DEBUG_REPORT_WARNING_BIT_EXT | VK_DEBUG_REPORT_PERFORMANCE_WARNING_BIT_EXT | VK_DEBUG_REPORT_ERROR_BIT_EXT | VK_DEBUG_REPORT_DEBUG_BIT_EXT;')
+        cb_body.append('                PFN_vkCreateDebugReportCallbackEXT callback = (PFN_vkCreateDebugReportCallbackEXT)vkGetInstanceProcAddr(remappedInstance, "vkCreateDebugReportCallbackEXT");')
+        cb_body.append('                if (callback != NULL) {')
+        cb_body.append('                    VkDebugReportCallbackCreateInfoEXT dbgCreateInfo;')
+        cb_body.append('                    memset(&dbgCreateInfo, 0, sizeof(dbgCreateInfo));')
+        cb_body.append('                    dbgCreateInfo.sType = VK_STRUCTURE_TYPE_DEBUG_REPORT_CALLBACK_CREATE_INFO_EXT;')
+        cb_body.append('                    dbgCreateInfo.flags = reportFlags;')
+        cb_body.append('                    dbgCreateInfo.pfnCallback = g_fpDbgMsgCallback;')
+        cb_body.append('                    dbgCreateInfo.pUserData = NULL;')
+        cb_body.append('                    if (callback(remappedInstance, &dbgCreateInfo, NULL, &m_dbgMsgCallbackObj) != VK_SUCCESS) {')
+        cb_body.append('                        vktrace_LogWarning("Failed to register vulkan callback for replayer error handling.");')
+        cb_body.append('                        returnValue = vktrace_replay::VKTRACE_REPLAY_ERROR;')
+        cb_body.append('                        break;')
+        cb_body.append('                    }')
+        cb_body.append('                }')
+        cb_body.append('            }')
+        return "\n".join(cb_body)
+
+    # These are customized because they are the only entry points returning VkBool32
+    def _gen_replay_GetPhysicalDeviceXcbPresentationSupportKHR(self):
+        cb_body = []
+        cb_body.append('            VkBool32 rval = manually_replay_vkGetPhysicalDeviceXcbPresentationSupportKHR(pPacket);')
+        cb_body.append('            if (rval != pPacket->result)')
+        cb_body.append('            {')
+        cb_body.append('                vktrace_LogError("Return value %d from API call (vkGetPhysicalDeviceXcbPresentationSupportKHR) does not match return value from trace file %d.",')
+        cb_body.append('                                 rval, pPacket->result);')
+        cb_body.append('                returnValue = vktrace_replay::VKTRACE_REPLAY_BAD_RETURN;')
+        cb_body.append('            }')
+        return "\n".join(cb_body)
+
+    def _gen_replay_GetPhysicalDeviceXlibPresentationSupportKHR(self):
+        cb_body = []
+        cb_body.append('            VkBool32 rval = manually_replay_vkGetPhysicalDeviceXlibPresentationSupportKHR(pPacket);')
+        cb_body.append('            if (rval != pPacket->result)')
+        cb_body.append('            {')
+        cb_body.append('                vktrace_LogError("Return value %d from API call (vkGetPhysicalDeviceXlibPresentationSupportKHR) does not match return value from trace file %d.",')
+        cb_body.append('                                 rval, pPacket->result);')
+        cb_body.append('                returnValue = vktrace_replay::VKTRACE_REPLAY_BAD_RETURN;')
+        cb_body.append('            }')
+        return "\n".join(cb_body)
+
+    def _gen_replay_GetPhysicalDeviceWin32PresentationSupportKHR(self):
+        cb_body = []
+        cb_body.append('            VkBool32 rval = manually_replay_vkGetPhysicalDeviceWin32PresentationSupportKHR(pPacket);')
+        cb_body.append('            if (rval != pPacket->result)')
+        cb_body.append('            {')
+        cb_body.append('                vktrace_LogError("Return value %d from API call (vkGetPhysicalDeviceWin32PresentationSupportKHR) does not match return value from trace file %d.",')
+        cb_body.append('                                 rval, pPacket->result);')
+        cb_body.append('                returnValue = vktrace_replay::VKTRACE_REPLAY_BAD_RETURN;')
+        cb_body.append('            }')
+        return "\n".join(cb_body)
+
+    # Generate main replay case statements where actual replay API call is dispatched based on input packet data
+    def _generate_replay(self):
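+        # The generated vkReplay::replay() is a single switch on
+        # packet->packet_id: each case interprets the packet body, remaps
+        # handles, then either calls a manually_replay_vk* routine (listed
+        # below) or invokes the real_vk* entrypoint directly and checks the
+        # returned VkResult.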
+        manually_replay_funcs = ['AllocateMemory',
+                                 'BeginCommandBuffer',
+                                 'CreateDescriptorSetLayout',
+                                 'CreateDevice',
+                                 'CreateBuffer',
+                                 'CreateImage',
+                                 'CreateCommandPool',
+                                 'CreateFramebuffer',
+                                 'GetPipelineCacheData',
+                                 'CreateGraphicsPipelines',
+                                 'CreateComputePipelines',
+                                 #'CreateInstance',
+                                 'CreatePipelineLayout',
+                                 'CreateRenderPass',
+                                 'CmdBeginRenderPass',
+                                 'CmdBindDescriptorSets',
+                                 'CmdBindVertexBuffers',
+                                 'CmdPipelineBarrier',
+                                 'QueuePresentKHR',
+                                 'CmdWaitEvents',
+                                 #'DestroyObject',
+                                 'EnumeratePhysicalDevices',
+                                 'FreeMemory',
+                                 'FreeDescriptorSets',
+                                 'FlushMappedMemoryRanges',
+                                 'InvalidateMappedMemoryRanges',
+                                 #'GetGlobalExtensionInfo',
+                                 #'GetImageSubresourceInfo',
+                                 #'GetObjectInfo',
+                                 #'GetPhysicalDeviceExtensionInfo',
+                                 'GetPhysicalDeviceMemoryProperties',
+                                 'GetPhysicalDeviceQueueFamilyProperties',
+                                 'GetPhysicalDeviceSurfaceSupportKHR',
+                                 'GetPhysicalDeviceSurfaceCapabilitiesKHR',
+                                 'GetPhysicalDeviceSurfaceFormatsKHR',
+                                 'GetPhysicalDeviceSurfacePresentModesKHR',
+                                 'CreateSwapchainKHR',
+                                 'GetSwapchainImagesKHR',
+                                 'CreateXcbSurfaceKHR',
+                                 'CreateXlibSurfaceKHR',
+                                 'GetPhysicalDeviceXcbPresentationSupportKHR',
+                                 'GetPhysicalDeviceXlibPresentationSupportKHR',
+                                 'CreateWin32SurfaceKHR',
+                                 'GetPhysicalDeviceWin32PresentationSupportKHR',
+                                 'CreateAndroidSurfaceKHR',
+                                 # TODO: Wayland and Mir (Xcb/Xlib/Win32/Android are handled above)
+                                 #'GetPhysicalDeviceInfo',
+                                 'MapMemory',
+                                 'QueueSubmit',
+                                 'QueueBindSparse',
+                                 #'StorePipeline',
+                                 'UnmapMemory',
+                                 'UpdateDescriptorSets',
+                                 'WaitForFences',
+                                 'CreateDebugReportCallbackEXT',
+                                 'DestroyDebugReportCallbackEXT',
+                                 'AllocateCommandBuffers',
+                                 ]
+
+        # validate the manually_replay_funcs list
+        protoFuncs = [proto.name for proto in self.protos]
+        wsi_platform_manual_funcs = ['CreateWin32SurfaceKHR', 'CreateXcbSurfaceKHR', 'CreateXlibSurfaceKHR', 'CreateAndroidSurfaceKHR']
+
+        for func in manually_replay_funcs:
+            if (func not in protoFuncs) and (func not in wsi_platform_manual_funcs):
+                sys.exit("Entry '%s' in manually_replay_funcs list is not in the vulkan function prototypes" % func)
+
+        # map protos to custom functions if body is fully custom
+        custom_body_dict = {'CreateInstance': self._gen_replay_create_instance,
+                            'GetPhysicalDeviceXcbPresentationSupportKHR': self._gen_replay_GetPhysicalDeviceXcbPresentationSupportKHR,
+                            'GetPhysicalDeviceXlibPresentationSupportKHR': self._gen_replay_GetPhysicalDeviceXlibPresentationSupportKHR,
+                            'GetPhysicalDeviceWin32PresentationSupportKHR': self._gen_replay_GetPhysicalDeviceWin32PresentationSupportKHR }
+        # multi-gpu Open funcs w/ list of local params to create
+        custom_open_params = {'OpenSharedMemory': (-1,),
+                              'OpenSharedSemaphore': (-1,),
+                              'OpenPeerMemory': (-1,),
+                              'OpenPeerImage': (-1, -2,)}
+        # Functions that create views are unique from other create functions
+        create_view_list = ['CreateBufferView', 'CreateImageView', 'CreateComputePipeline']
+        # Functions to treat as "Create' that don't have 'Create' in the name
+        special_create_list = ['LoadPipeline', 'LoadPipelineDerivative', 'AllocateMemory', 'GetDeviceQueue', 'PinSystemMemory', 'AllocateDescriptorSets', 'AcquireNextImageKHR']
+        # A couple of funcs use do-while loops
+        do_while_dict = {'GetFenceStatus': 'replayResult != pPacket->result && pPacket->result == VK_SUCCESS', 'GetEventStatus': '(pPacket->result == VK_EVENT_SET || pPacket->result == VK_EVENT_RESET) && replayResult != pPacket->result', 'GetQueryPoolResults': 'pPacket->result == VK_SUCCESS && replayResult != pPacket->result'}
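+        # The do-while condition re-issues a status query until the replayed
+        # result matches what the trace recorded (e.g. keep polling
+        # vkGetFenceStatus while the trace says VK_SUCCESS but replay has not
+        # yet reached it).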
+        rbody = []
+        rbody.append('%s' % self.lineinfo.get())
+        rbody.append('vktrace_replay::VKTRACE_REPLAY_RESULT vkReplay::replay(vktrace_trace_packet_header *packet)')
+        rbody.append('{')
+        rbody.append('    vktrace_replay::VKTRACE_REPLAY_RESULT returnValue = vktrace_replay::VKTRACE_REPLAY_SUCCESS;')
+        rbody.append('    VkResult replayResult = VK_ERROR_VALIDATION_FAILED_EXT;')
+        rbody.append('    switch (packet->packet_id)')
+        rbody.append('    {')
+        rbody.append('        case VKTRACE_TPI_VK_vkApiVersion:')
+        rbody.append('        {')
+        rbody.append('            packet_vkApiVersion* pPacket = (packet_vkApiVersion*)(packet->pBody);')
+        rbody.append('            if (VK_VERSION_MAJOR(pPacket->version) != 1 || VK_VERSION_MINOR (pPacket->version) != 0)')
+        rbody.append('            {')
+        rbody.append('                vktrace_LogError("Trace file is from Vulkan version 0x%x (%u.%u.%u), but the vktrace plugin only supports version 0x%x (%u.%u.%u).", pPacket->version, (pPacket->version & 0xFFC00000) >> 22, (pPacket->version & 0x003FF000) >> 12, (pPacket->version & 0x00000FFF), VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION), ((VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION)) & 0xFFC00000) >> 22, ((VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION)) & 0x003FF000) >> 12, ((VK_MAKE_VERSION(1, 0, VK_HEADER_VERSION)) & 0x00000FFF));')
+        rbody.append('                returnValue = vktrace_replay::VKTRACE_REPLAY_ERROR;')
+        rbody.append('            }')
+        rbody.append('            break;')
+        rbody.append('        }')
+        for proto in self.protos:
+            if proto.name in proto_exclusions:
+                continue
+
+            # TODO : This is an O(N^2) way of finding whether this proto is guarded by an ifdef.
+            # If the concept of an ifdef field is ok, rewrite the outer loop to already have the ext.ifdef value ready:
+            # for ext in vulkan.extensions_all:
+            #     if ext.ifdef: if_body.append('#ifdef') # wrap all the protos in a single #ifdef block instead of repeating #ifdef for each proto
+            #     for proto in ext.protos:
+            proto_ext_ifdef = None
+            for ext in vulkan.extensions_all:
+                if ext.ifdef:
+                    for ext_proto in ext.protos:
+                        if proto.name == ext_proto.name:
+                            proto_ext_ifdef = ext.ifdef
+            if proto_ext_ifdef:
+                rbody.append('#ifdef %s' % proto_ext_ifdef)
+
+            ret_value = False
+            create_view = False
+            create_func = False
+            # TODO : How to handle void* return of GetProcAddr?
+            # TODO : Make sure vkDestroy* functions really do clean up the object maps
+            if ('void' not in proto.ret.lower()) and ('size_t' not in proto.ret) and (proto.name not in custom_body_dict):
+                ret_value = True
+            if proto.name in create_view_list:
+                create_view = True
+            elif 'Create' in proto.name or proto.name in special_create_list:
+                create_func = True
+            rbody.append('        case VKTRACE_TPI_VK_vk%s:' % proto.name)
+            rbody.append('        {')
+            rbody.append('            packet_vk%s* pPacket = (packet_vk%s*)(packet->pBody);' % (proto.name, proto.name))
+            if proto.name in manually_replay_funcs:
+                if ret_value:
+                    rbody.append('            replayResult = manually_replay_vk%s(pPacket);' % proto.name)
+                else:
+                    rbody.append('            manually_replay_vk%s(pPacket);' % proto.name)
+            elif proto.name in custom_body_dict:
+                rbody.append(custom_body_dict[proto.name]())
+            else:
+                if proto.name in custom_open_params:
+                    for pidx in custom_open_params[proto.name]:
+                        rbody.append('            %s local_%s;' % (proto.params[pidx].ty.replace('const ', '').strip('*'), proto.params[pidx].name))
+                elif create_view:
+                    rbody.append('            %s createInfo;' % (proto.params[1].ty.strip('*').replace('const ', '')))
+                    rbody.append('            memcpy(&createInfo, pPacket->pCreateInfo, sizeof(%s));' % (proto.params[1].ty.strip('*').replace('const ', '')))
+                    if 'CreateComputePipeline' == proto.name:
+                        rbody.append('            createInfo.cs.shader = m_objMapper.remap_shaders(pPacket->pCreateInfo->cs.shader);')
+                        rbody.append('            if (createInfo.cs.shader == VK_NULL_HANDLE && pPacket->pCreateInfo->cs.shader != VK_NULL_HANDLE)')
+                        rbody.append('            {')
+                        rbody.append('                vktrace_LogError("Error detected in vkCreateComputePipelines() due to invalid remapped VkShader.");')
+                        rbody.append('                return vktrace_replay::VKTRACE_REPLAY_ERROR;')
+                        rbody.append('            }')
+                    elif 'CreateBufferView' == proto.name:
+                        rbody.append('            createInfo.buffer = m_objMapper.remap_buffers(pPacket->pCreateInfo->buffer);')
+                        rbody.append('            if (createInfo.buffer == VK_NULL_HANDLE && pPacket->pCreateInfo->buffer != VK_NULL_HANDLE)')
+                        rbody.append('            {')
+                        rbody.append('                vktrace_LogError("Error detected in vkCreateBufferView() due to invalid remapped VkBuffer.");')
+                        rbody.append('                return vktrace_replay::VKTRACE_REPLAY_ERROR;')
+                        rbody.append('            }')
+                    else:
+                        rbody.append('            createInfo.image = m_objMapper.remap_images(pPacket->pCreateInfo->image);')
+                        rbody.append('            if (createInfo.image == VK_NULL_HANDLE && pPacket->pCreateInfo->image != VK_NULL_HANDLE)')
+                        rbody.append('            {')
+                        rbody.append('                vktrace_LogError("Error detected in vkCreateImageView() due to invalid remapped VkImage.");')
+                        rbody.append('                return vktrace_replay::VKTRACE_REPLAY_ERROR;')
+                        rbody.append('            }')
+                    rbody.append('            %s local_%s;' % (proto.params[-1].ty.strip('*').replace('const ', ''), proto.params[-1].name))
+                elif create_func: # Declare local var to store created handle into
+                    if 'AllocateDescriptorSets' == proto.name:
+                        p_ty = proto.params[-1].ty.strip('*').replace('const ', '')
+                        rbody.append('            %s* local_%s = (%s*)malloc(pPacket->pAllocateInfo->descriptorSetCount * sizeof(%s));' % (p_ty, proto.params[-1].name, p_ty, p_ty))
+                        rbody.append('            VkDescriptorSetLayout* local_pSetLayouts = (VkDescriptorSetLayout*)malloc(pPacket->pAllocateInfo->descriptorSetCount * sizeof(VkDescriptorSetLayout));')
+                        rbody.append('            VkDescriptorSetAllocateInfo local_AllocInfo, *local_pAllocateInfo = &local_AllocInfo;')
+                        rbody.append('            VkDescriptorPool local_descPool;')
+                        rbody.append('            local_descPool = m_objMapper.remap_descriptorpools(pPacket->pAllocateInfo->descriptorPool);')
+                        rbody.append('            if (local_descPool == VK_NULL_HANDLE)')
+                        rbody.append('            {')
+                        rbody.append('                vktrace_LogError("Error detected in vkAllocateDescriptorSets() due to invalid remapped VkDescriptorPool.");')
+                        rbody.append('                return vktrace_replay::VKTRACE_REPLAY_ERROR;')
+                        rbody.append('            }')
+                        rbody.append('            for (uint32_t i = 0; i < pPacket->pAllocateInfo->descriptorSetCount; i++)')
+                        rbody.append('            {')
+                        rbody.append('                local_pSetLayouts[i] = m_objMapper.remap_descriptorsetlayouts(pPacket->%s->pSetLayouts[i]);' % (proto.params[-2].name))
+                        rbody.append('                if (local_pSetLayouts[i] == VK_NULL_HANDLE)')
+                        rbody.append('                {')
+                        rbody.append('                    vktrace_LogError("Error detected in vkAllocateDescriptorSets() due to invalid remapped VkDescriptorSetLayout.");')
+                        rbody.append('                    return vktrace_replay::VKTRACE_REPLAY_ERROR;')
+                        rbody.append('                }')
+                        rbody.append('            }')
+                        rbody.append('            memcpy(local_pAllocateInfo, pPacket->pAllocateInfo, sizeof(VkDescriptorSetAllocateInfo));')
+                        rbody.append('            local_pAllocateInfo->pSetLayouts = local_pSetLayouts;')
+                        rbody.append('            local_pAllocateInfo->descriptorPool = local_descPool;')
+                    else:
+                        rbody.append('            %s local_%s;' % (proto.params[-1].ty.strip('*').replace('const ', ''), proto.params[-1].name))
+                elif proto.name == 'ResetFences':
+                    rbody.append('            VkFence* fences = VKTRACE_NEW_ARRAY(VkFence, pPacket->fenceCount);')
+                    rbody.append('            for (uint32_t i = 0; i < pPacket->fenceCount; i++)')
+                    rbody.append('            {')
+                    rbody.append('                fences[i] = m_objMapper.remap_fences(pPacket->%s[i]);' % (proto.params[-1].name))
+                    rbody.append('                if (fences[i] == VK_NULL_HANDLE)')
+                    rbody.append('                {')
+                    rbody.append('                    vktrace_LogError("Error detected in vkResetFences() due to invalid remapped VkFence.");')
+                    rbody.append('                    return vktrace_replay::VKTRACE_REPLAY_ERROR;')
+                    rbody.append('                }')
+                    rbody.append('            }')
+                elif proto.name in do_while_dict:
+                    rbody.append('            do {')
+                last_name = ''
+                for p in proto.params:
+                    if create_func or create_view:
+                        if p.name != proto.params[-1].name:
+                            rbody.append(self._remap_packet_param(proto.name, p.ty, p.name, last_name))
+                    else:
+                        rbody.append(self._remap_packet_param(proto.name, p.ty, p.name, last_name))
+                    last_name = p.name
+
+                if proto.name == 'DestroyInstance':
+                    rbody.append('            if (m_vkFuncs.real_vkDestroyDebugReportCallbackEXT != NULL)')
+                    rbody.append('            {')
+                    rbody.append('                m_vkFuncs.real_vkDestroyDebugReportCallbackEXT(remappedinstance, m_dbgMsgCallbackObj, pPacket->pAllocator);')
+                    rbody.append('            }')
+                # TODO: need a better way to indicate which extensions should be mapped to which Get*ProcAddr
+                elif proto.name == 'GetInstanceProcAddr':
+                    for iExt in vulkan.extensions_all:
+                        if iExt.ifdef:
+                            rbody.append('#ifdef %s' % iExt.ifdef)
+                        for iProto in iExt.protos:
+                            if iProto.name in proto_exclusions:
+                                continue
+                            if 'DebugReport' in iProto.name:
+                                rbody.append('            if (strcmp(pPacket->pName, "vk%s") == 0) {' % (iProto.name))
+                                rbody.append('               m_vkFuncs.real_vk%s = (PFN_vk%s)vk%s(remappedinstance, pPacket->pName);' % (iProto.name, iProto.name, proto.name))
+                                rbody.append('            }')
+                            elif (iProto.params[0].ty == 'VkInstance' or iProto.params[0].ty != 'VkPhysicalDevice') and 'KHR' in iProto.name:
+                                rbody.append('            if (strcmp(pPacket->pName, "vk%s") == 0) {' % (iProto.name))
+                                rbody.append('               m_vkFuncs.real_vk%s = (PFN_vk%s)vk%s(remappedinstance, pPacket->pName);' % (iProto.name, iProto.name, proto.name))
+                                rbody.append('            }')
+                        if iExt.ifdef:
+                            rbody.append('#endif /* %s */' % iExt.ifdef)
+                elif proto.name == 'GetDeviceProcAddr':
+                    for dProto in self.protos:
+                        if dProto.name in proto_exclusions:
+                            continue
+                        # if 'KHR' in dProto.name:
+                        if 'KHR' in dProto.name and dProto.params[0].ty != 'VkInstance' and dProto.params[0].ty != 'VkPhysicalDevice':
+                            rbody.append('            if (strcmp(pPacket->pName, "vk%s") == 0) {' % (dProto.name))
+                            rbody.append('               m_vkFuncs.real_vk%s = (PFN_vk%s)vk%s(remappeddevice, pPacket->pName);' % (dProto.name, dProto.name, proto.name))
+                            rbody.append('            }')
+
+                # build the call to the "real_" entrypoint
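+                # e.g. for vkQueueWaitIdle this builds:
+                #     replayResult = m_vkFuncs.real_vkQueueWaitIdle(remappedqueue);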
+                rr_string = '            '
+                if ret_value:
+                    if proto.ret != 'VkResult':
+                        ret_value = False
+                    else:
+                        rr_string = '            replayResult = '
+                rr_string += 'm_vkFuncs.real_vk%s(' % proto.name
+                for p in proto.params:
+                    # For last param of Create funcs, pass address of param
+                    if create_func:
+                        if proto.name == 'AllocateDescriptorSets' and ((p.name == proto.params[-2].name) or (p.name == proto.params[-1].name)):
+                            rr_string += 'local_%s, ' % p.name
+                        elif p.name == proto.params[-1].name:
+                            rr_string += '&local_%s, ' % p.name
+                        else:
+                            rr_string += '%s, ' % self._get_packet_param(proto.name, p.ty, p.name)
+                    else:
+                        rr_string += '%s, ' % self._get_packet_param(proto.name, p.ty, p.name)
+                rr_string = '%s);' % rr_string[:-2]
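+                # e.g. for a VkResult entrypoint this produces something like:
+                #   replayResult = m_vkFuncs.real_vkFoo(remappedArg0, pPacket->arg1);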
+                if proto.name in custom_open_params:
+                    rr_list = rr_string.split(', ')
+                    for pidx in custom_open_params[proto.name]:
+                        rr_list[pidx] = '&local_%s' % proto.params[pidx].name
+                    rr_string = ', '.join(rr_list)
+                    rr_string += ');'
+                elif create_view:
+                    rr_list = rr_string.split(', ')
+                    rr_list[-3] = '&createInfo'
+                    rr_list[-2] = 'NULL'
+                    rr_list[-1] = '&local_%s);' % proto.params[-1].name
+                    rr_string = ', '.join(rr_list)
+                    # this is a sneaky shortcut to use generic create code below to add_to_map
+                    create_func = True
+                elif proto.name == 'AllocateDescriptorSets':
+                    rr_string = rr_string.replace('pPacket->pSetLayouts', 'pLocalDescSetLayouts')
+                elif proto.name == 'ResetFences':
+                    rr_string = rr_string.replace('pPacket->pFences', 'fences')
+
+                # insert the real_*(..) call
+                rbody.append(rr_string)
+
+                # handle return values or anything that needs to happen after the real_*(..) call
+                get_ext_layers_proto = ['EnumerateInstanceExtensionProperties', 'EnumerateDeviceExtensionProperties','EnumerateInstanceLayerProperties', 'EnumerateDeviceLayerProperties']
+                if 'DestroyDevice' in proto.name:
+                    rbody.append('            if (replayResult == VK_SUCCESS)')
+                    rbody.append('            {')
+                    rbody.append('                m_pCBDump = NULL;')
+                    rbody.append('                m_pDSDump = NULL;')
+                    #TODO138 : disabling snapshot
+                    #rbody.append('                m_pVktraceSnapshotPrint = NULL;')
+                    rbody.append('                m_objMapper.rm_from_devices_map(pPacket->device);')
+                    rbody.append('                m_display->m_initedVK = false;')
+                    rbody.append('            }')
+                elif proto.name in get_ext_layers_proto:
+                    rbody.append('            if (replayResult == VK_ERROR_LAYER_NOT_PRESENT || replayResult == VK_INCOMPLETE)')
+                    rbody.append('            { // ignore errors caused by trace config != replay config')
+                    rbody.append('                replayResult = VK_SUCCESS;')
+                    rbody.append('            }')
+                elif 'DestroySwapchainKHR' in proto.name:
+                    rbody.append('            if (replayResult == VK_SUCCESS)')
+                    rbody.append('            {')
+                    rbody.append('                m_objMapper.rm_from_swapchainkhrs_map(pPacket->swapchain);')
+                    rbody.append('            }')
+                elif 'AcquireNextImageKHR' in proto.name:
+                    rbody.append('            m_objMapper.add_to_pImageIndex_map(*(pPacket->pImageIndex), local_pImageIndex);')
+                elif 'DestroyInstance' in proto.name:
+                    rbody.append('            if (replayResult == VK_SUCCESS)')
+                    rbody.append('            {')
+                    rbody.append('                // TODO need to handle multiple instances and only clearing maps within an instance.')
+                    rbody.append('                // TODO this only works with a single instance used at any given time.')
+                    rbody.append('                m_objMapper.clear_all_map_handles();')
+                    rbody.append('            }')
+                elif 'MergePipelineCaches' in proto.name:
+                    rbody.append('            delete[] remappedpSrcCaches;')
+                elif 'FreeCommandBuffers' in proto.name:
+                    rbody.append('            delete[] remappedpCommandBuffers;')
+                elif 'CmdExecuteCommands' in proto.name:
+                    rbody.append('            delete[] remappedpCommandBuffers;')
+                elif 'AllocateDescriptorSets' in proto.name:
+                    rbody.append('            if (replayResult == VK_SUCCESS)')
+                    rbody.append('            {')
+                    rbody.append('                for (uint32_t i = 0; i < pPacket->pAllocateInfo->descriptorSetCount; i++) {')
+                    rbody.append('                    m_objMapper.add_to_descriptorsets_map(pPacket->%s[i], local_%s[i]);' % (proto.params[-1].name, proto.params[-1].name))
+                    rbody.append('                }')
+                    rbody.append('            }')
+                    rbody.append('            free(local_pSetLayouts);')
+                    rbody.append('            free(local_pDescriptorSets);')
+                elif proto.name == 'ResetFences':
+                    rbody.append('            VKTRACE_DELETE(fences);')
+                elif create_func: # save handle mapping if create successful
+                    if ret_value:
+                        rbody.append('            if (replayResult == VK_SUCCESS)')
+                        rbody.append('            {')
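+                    # Derive the object-map name from the created handle's type,
+                    # e.g. a last parameter of type 'VkImage*' maps to add_to_images_map().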
+                    clean_type = proto.params[-1].ty.strip('*').replace('const ', '')
+                    rbody.append('                m_objMapper.add_to_%ss_map(*(pPacket->%s), local_%s);' % (clean_type.lower()[2:], proto.params[-1].name, proto.params[-1].name))
+                    if 'AllocateMemory' == proto.name:
+                        rbody.append('                m_objMapper.add_entry_to_mapData(local_%s, pPacket->pAllocateInfo->allocationSize);' % (proto.params[-1].name))
+                    if ret_value:
+                        rbody.append('            }')
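+                # Polled entrypoints: pull the call into a do/while loop and make the
+                # CHECK_RETURN_VALUE emitted below conditional on the traced result.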
+                elif proto.name in do_while_dict:
+                    rbody[-1] = '    %s' % rbody[-1]
+                    rbody.append('            } while (%s);' % do_while_dict[proto.name])
+                    rbody.append('            if (pPacket->result != VK_NOT_READY || replayResult != VK_SUCCESS)')
+            if ret_value:
+                rbody.append('            CHECK_RETURN_VALUE(vk%s);' % proto.name)
+            rbody.append('            break;')
+            rbody.append('        }')
+            if proto_ext_ifdef:
+                rbody.append('#endif /* %s */' % proto_ext_ifdef)
+        rbody.append('        default:')
+        rbody.append('            vktrace_LogWarning("Unrecognized packet_id %u, skipping.", packet->packet_id);')
+        rbody.append('            returnValue = vktrace_replay::VKTRACE_REPLAY_INVALID_ID;')
+        rbody.append('            break;')
+        rbody.append('    }')
+        rbody.append('    return returnValue;')
+        rbody.append('}')
+        return "\n".join(rbody)
+
+class VktraceTraceHeader(Subcommand):
+    def generate_header(self, extensionName):
+        header_txt = []
+        header_txt.append('#include "vktrace_vk_vk_packets.h"')
+        header_txt.append('#include "vktrace_vk_packet_id.h"')
+        header_txt.append('#include "vulkan/vk_layer.h"\n\n')
+        header_txt.append('#ifdef WIN32')
+        header_txt.append('BOOL CALLBACK InitTracer(_Inout_ PINIT_ONCE initOnce, _Inout_opt_ PVOID param, _Out_opt_ PVOID *lpContext);')
+        header_txt.append('extern INIT_ONCE gInitOnce;')
+        header_txt.append('\n#elif defined(PLATFORM_LINUX)')
+        header_txt.append('void InitTracer(void);')
+        header_txt.append('extern pthread_once_t gInitOnce;')
+        header_txt.append('#endif\n')
+        return "\n".join(header_txt)
+
+    def generate_body(self):
+        body = [self._generate_trace_func_protos()]
+
+        return "\n".join(body)
+
+class VktraceTraceC(Subcommand):
+    def generate_header(self, extensionName):
+        header_txt = []
+        header_txt.append('#include "vktrace_platform.h"')
+        header_txt.append('#include "vktrace_common.h"')
+        header_txt.append('#include "vktrace_lib_helpers.h"')
+        header_txt.append('#include "vktrace_vk_vk.h"')
+        #header_txt.append('#include "vktrace_vk_vk_lunarg_debug_marker.h"')
+        header_txt.append('#include "vktrace_interconnect.h"')
+        header_txt.append('#include "vktrace_filelike.h"')
+        header_txt.append('#include "vk_struct_size_helper.h"')
+        header_txt.append('#ifdef PLATFORM_LINUX')
+        header_txt.append('#include <pthread.h>')
+        header_txt.append('#endif')
+        header_txt.append('#include "vktrace_trace_packet_utils.h"')
+        header_txt.append('#include <stdio.h>\n')
+        header_txt.append('#include <string.h>\n')
+        header_txt.append('#ifdef WIN32')
+        header_txt.append('INIT_ONCE gInitOnce = INIT_ONCE_STATIC_INIT;')
+        header_txt.append('#elif defined(PLATFORM_LINUX)')
+        header_txt.append('pthread_once_t gInitOnce = PTHREAD_ONCE_INIT;')
+        header_txt.append('#endif')
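+        # gInitOnce guards one-time tracer initialization (an INIT_ONCE object on
+        # Windows, a pthread_once_t on Linux).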
+        return "\n".join(header_txt)
+
+    def generate_body(self):
+        body = [self._generate_init_funcs(),
+                self._generate_trace_funcs(self.extensionName)]
+
+        return "\n".join(body)
+
+class VktracePacketID(Subcommand):
+    def generate_header(self, extensionName):
+        header_txt = []
+        header_txt.append('#pragma once\n')
+        header_txt.append('#include "vktrace_vk_vk_packets.h"')
+        header_txt.append('#include "vktrace_trace_packet_utils.h"')
+        header_txt.append('#include "vktrace_trace_packet_identifiers.h"')
+        header_txt.append('#include "vktrace_interconnect.h"')
+        header_txt.append("#include <inttypes.h>")
+        #header_txt.append('#include "vktrace_vk_vk_lunarg_debug_marker_packets.h"')
+        header_txt.append('#include "vk_enum_string_helper.h"')
+        header_txt.append('#ifndef _WIN32')
+        header_txt.append(' #pragma GCC diagnostic ignored "-Wwrite-strings"')
+        header_txt.append('#endif')
+        #header_txt.append('#include "vk_struct_string_helper.h"')
+        header_txt.append('#ifndef _WIN32')
+        header_txt.append(' #pragma GCC diagnostic warning "-Wwrite-strings"')
+        header_txt.append('#endif')
+        header_txt.append('#if defined(WIN32)')
+        header_txt.append('#define snprintf _snprintf')
+        header_txt.append('#define VK_SIZE_T_SPECIFIER "%Iu"')
+        header_txt.append('#else')
+        header_txt.append('#define VK_SIZE_T_SPECIFIER "%zu"')
+        header_txt.append('#endif')
+        header_txt.append('#define SEND_ENTRYPOINT_ID(entrypoint) ;')
+        header_txt.append('//#define SEND_ENTRYPOINT_ID(entrypoint) vktrace_TraceInfo(#entrypoint);\n')
+        header_txt.append('#define SEND_ENTRYPOINT_PARAMS(entrypoint, ...) ;')
+        header_txt.append('//#define SEND_ENTRYPOINT_PARAMS(entrypoint, ...) vktrace_TraceInfo(entrypoint, __VA_ARGS__);\n')
+        header_txt.append('#define CREATE_TRACE_PACKET(entrypoint, buffer_bytes_needed) \\')
+        header_txt.append('    pHeader = vktrace_create_trace_packet(VKTRACE_TID_VULKAN, VKTRACE_TPI_VK_##entrypoint, sizeof(packet_##entrypoint), buffer_bytes_needed);\n')
+        header_txt.append('#define FINISH_TRACE_PACKET() \\')
+        header_txt.append('    vktrace_finalize_trace_packet(pHeader); \\')
+        header_txt.append('    vktrace_write_trace_packet(pHeader, vktrace_trace_get_trace_file()); \\')
+        header_txt.append('    vktrace_delete_trace_packet(&pHeader);')
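+        # The generated macros pair up: CREATE_TRACE_PACKET allocates pHeader for an
+        # entrypoint's packet; FINISH_TRACE_PACKET finalizes it, writes it to the
+        # trace file, and frees it.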
+        return "\n".join(header_txt)
+
+    def generate_body(self):
+        body = [self._generate_packet_id_enum(),
+                self._generate_packet_id_name_func(),
+                self._generate_stringify_func(),
+                self._generate_interp_func()]
+
+        return "\n".join(body)
+
+class VktraceCoreTracePackets(Subcommand):
+    def generate_header(self, extensionName):
+        header_txt = []
+        header_txt.append('#pragma once\n')
+        header_txt.append('#include "vulkan/vulkan.h"')
+        header_txt.append('#include "vktrace_trace_packet_utils.h"\n')
+        return "\n".join(header_txt)
+
+    def generate_body(self):
+        body = [self._generate_struct_util_funcs(),
+                self._generate_interp_funcs()]
+
+        return "\n".join(body)
+
+class VktraceExtTraceHeader(Subcommand):
+    def generate_header(self, extensionName):
+        header_txt = []
+        header_txt.append('#pragma once\n')
+        header_txt.append('#include "vulkan/vulkan.h"')
+        header_txt.append('#include "%s.h"' % extensionName.lower())
+        return "\n".join(header_txt)
+
+    def generate_body(self):
+        body = [self._generate_trace_func_protos_ext(self.extensionName)]
+
+        return "\n".join(body)
+
+class VktraceExtTraceC(Subcommand):
+    def generate_header(self, extensionName):
+        header_txt = []
+        header_txt.append('#include "vktrace_vk_packet_id.h"')
+        header_txt.append('#include "vktrace_platform.h"')
+        header_txt.append('#include "vktrace_common.h"')
+        header_txt.append('#include "vktrace_vk_%s.h"' % extensionName.lower())
+        header_txt.append('#include "vktrace_vk_%s_packets.h"' % extensionName.lower())
+        header_txt.append('#include "vk_struct_size_helper.h"')
+        header_txt.append('#include "%s_struct_size_helper.h"' % extensionName.lower())
+        #if extensionName == 'vk_lunarg_debug_marker':
+        #    header_txt.append('#include "vk_debug_marker_layer.h"\n')
+
+        header_txt.append('#include "vktrace_lib_helpers.h"')
+        return "\n".join(header_txt)
+
+    def generate_body(self):
+        body = [self._generate_trace_funcs(self.extensionName)]
+
+        return "\n".join(body)
+
+class VktraceExtTracePackets(Subcommand):
+    def generate_header(self, extensionName):
+        header_txt = []
+        header_txt.append('#pragma once\n')
+        header_txt.append('#include "%s.h"' % extensionName.lower())
+        header_txt.append('#include "vktrace_trace_packet_utils.h"\n')
+        return "\n".join(header_txt)
+
+    def generate_body(self):
+        body = [self._generate_interp_funcs_ext(self.extensionName)]
+
+        return "\n".join(body)
+
+class VktraceReplayVkFuncPtrs(Subcommand):
+    def generate_header(self, extensionName):
+        header_txt = []
+        header_txt.append('#pragma once\n')
+        header_txt.append('#if defined(PLATFORM_LINUX) || defined(XCB_NVIDIA)')
+        header_txt.append('#if !defined(ANDROID)')
+        header_txt.append('#include <xcb/xcb.h>\n')
+        header_txt.append('#endif')
+        header_txt.append('#endif')
+        header_txt.append('#include "vulkan/vulkan.h"')
+        #header_txt.append('#include "vulkan/vk_lunarg_debug_marker.h"')
+
+    def generate_body(self):
+        body = [self._generate_replay_func_ptrs()]
+        return "\n".join(body)
+
+class VktraceReplayObjMapperHeader(Subcommand):
+    def generate_header(self, extensionName):
+        header_txt = []
+        header_txt.append('#pragma once\n')
+        header_txt.append('#include <set>')
+        header_txt.append('#include <map>')
+        header_txt.append('#include <vector>')
+        header_txt.append('#include <string>')
+        header_txt.append('#include "vulkan/vulkan.h"')
+        header_txt.append('#include "vktrace_pageguard_memorycopy.h"')
+        #header_txt.append('#include "vulkan/vk_lunarg_debug_marker.h"')
+        return "\n".join(header_txt)
+
+    def generate_body(self):
+        body = [self._generate_replay_objmapper_class()]
+        return "\n".join(body)
+
+class VktraceReplayC(Subcommand):
+    def generate_header(self, extensionName):
+        header_txt = []
+        header_txt.append('#include "vkreplay_vkreplay.h"\n')
+        header_txt.append('#include "vkreplay.h"\n')
+        header_txt.append('#include "vkreplay_main.h"\n')
+        header_txt.append('#include <algorithm>')
+        header_txt.append('#include <queue>')
+        header_txt.append('\n')
+        header_txt.append('extern "C" {')
+        header_txt.append('#include "vktrace_vk_vk_packets.h"')
+        #header_txt.append('#include "vktrace_vk_vk_lunarg_debug_marker_packets.h"')
+        header_txt.append('#include "vktrace_vk_packet_id.h"')
+        #header_txt.append('#include "vk_enum_string_helper.h"\n}\n')
+
+        return "\n".join(header_txt)
+
+    def generate_body(self):
+        body = [self._generate_replay_init_funcs(),
+                self._generate_replay()]
+        body.append("}")
+        return "\n".join(body)
+
+def main():
+
+    subcommands = {
+            "vktrace-trace-h" : VktraceTraceHeader,
+            "vktrace-trace-c" : VktraceTraceC,
+            "vktrace-packet-id" : VktracePacketID,
+            "vktrace-core-trace-packets" : VktraceCoreTracePackets,
+            "vktrace-ext-trace-h" : VktraceExtTraceHeader,
+            "vktrace-ext-trace-c" : VktraceExtTraceC,
+            "vktrace-ext-trace-packets" : VktraceExtTracePackets,
+            "vktrace-replay-vk-funcs" : VktraceReplayVkFuncPtrs,
+            "vktrace-replay-obj-mapper-h" : VktraceReplayObjMapperHeader,
+            "vktrace-replay-c" : VktraceReplayC,
+    }
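+    # sys.argv[2] selects the subcommand and sys.argv[3] is forwarded as the
+    # extension name (sys.argv[1] is presumably consumed by the Subcommand base).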
+
+    if len(sys.argv) < 4 or sys.argv[2] not in subcommands:
+        print("Available subcommands are: %s" % " ".join(subcommands))
+        exit(1)
+
+    subcmd = subcommands[sys.argv[2]](sys.argv[3])
+    subcmd.run()
+
+if __name__ == "__main__":
+    main()
diff --git a/vt_genvk.py b/vt_genvk.py
new file mode 100644
index 0000000..df35529
--- /dev/null
+++ b/vt_genvk.py
@@ -0,0 +1,260 @@
+#!/usr/bin/env python
+#
+# Copyright (c) 2013-2016 The Khronos Group Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse, cProfile, pdb, string, sys, time
+from reg import *
+from generator import write
+
+#
+# VulkanTools Generator Additions
+from api_dump_generator import ApiDumpGeneratorOptions, ApiDumpOutputGenerator, COMMON_CODEGEN, TEXT_CODEGEN
+
+# Simple timer functions
+startTime = None
+
+def startTimer(timeit):
+    global startTime
+    startTime = time.clock()
+
+def endTimer(timeit, msg):
+    global startTime
+    endTime = time.clock()
+    if (timeit):
+        write(msg, endTime - startTime, file=sys.stderr)
+        startTime = None
+
+# Turn a list of strings into a regexp string matching exactly those strings
+def makeREstring(strings):
+    return '^(' + '|'.join(strings) + ')$'
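+# e.g. makeREstring(['VK_KHR_surface', 'VK_KHR_swapchain']) yields
+# '^(VK_KHR_surface|VK_KHR_swapchain)$'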
+
+# Returns a dictionary of [ generator function, generator options ] indexed
+# by specified short names. The generator options incorporate the following
+# parameters:
+#
+# extensions - list of extension names to include.
+# protect - True if re-inclusion protection should be added to headers
+# directory - path to directory in which to generate the target(s)
+def makeGenOpts(extensions = [], protect = True, directory = '.'):
+    global genOpts
+    genOpts = {}
+
+    # Descriptive names for various regexp patterns used to select
+    # versions and extensions
+    allVersions     = allExtensions = '.*'
+    noVersions      = noExtensions = None
+
+    addExtensions     = makeREstring(extensions)
+    removeExtensions  = makeREstring([])
+
+    # Copyright text prefixing all headers (list of strings).
+    prefixStrings = [
+        '/*',
+        '** Copyright (c) 2015-2016 The Khronos Group Inc.',
+        '**',
+        '** Licensed under the Apache License, Version 2.0 (the "License");',
+        '** you may not use this file except in compliance with the License.',
+        '** You may obtain a copy of the License at',
+        '**',
+        '**     http://www.apache.org/licenses/LICENSE-2.0',
+        '**',
+        '** Unless required by applicable law or agreed to in writing, software',
+        '** distributed under the License is distributed on an "AS IS" BASIS,',
+        '** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.',
+        '** See the License for the specific language governing permissions and',
+        '** limitations under the License.',
+        '*/',
+        ''
+    ]
+
+    # Text specific to Vulkan headers
+    vkPrefixStrings = [
+        '/*',
+        '** This header is generated from the Khronos Vulkan XML API Registry.',
+        '**',
+        '*/',
+        ''
+    ]
+
+    # Defaults for generating re-inclusion protection wrappers (or not)
+    protectFile = protect
+    protectFeature = protect
+    protectProto = protect
+
+    #
+    # VulkanTools Generators
+
+    # Options for the api dump layer
+    genOpts['api_dump.cpp'] = [
+        ApiDumpOutputGenerator,
+        ApiDumpGeneratorOptions(
+            input             = COMMON_CODEGEN,
+            filename          = 'api_dump.cpp',
+            apiname           = 'vulkan',
+            profile           = None,
+            versions          = allVersions,
+            emitversions      = allVersions,
+            defaultExtensions = 'vulkan',
+            addExtensions     = None,
+            removeExtensions  = None,
+            prefixText        = prefixStrings + vkPrefixStrings,
+            genFuncPointers   = True,
+            protectFile       = protectFile,
+            protectFeature    = False,
+            protectProto      = None,
+            protectProtoStr   = 'VK_NO_PROTOTYPES',
+            apicall           = 'VKAPI_ATTR ',
+            apientry          = 'VKAPI_CALL ',
+            apientryp         = 'VKAPI_PTR *',
+            alignFuncParam    = 48)
+        ]
+    genOpts['api_dump_text.h'] = [
+        ApiDumpOutputGenerator,
+        ApiDumpGeneratorOptions(
+            input             = TEXT_CODEGEN,
+            filename          = 'api_dump_text.h',
+            apiname           = 'vulkan',
+            profile           = None,
+            versions          = allVersions,
+            emitversions      = allVersions,
+            defaultExtensions = 'vulkan',
+            addExtensions     = None,
+            removeExtensions  = None,
+            prefixText        = prefixStrings + vkPrefixStrings,
+            genFuncPointers   = True,
+            protectFile       = protectFile,
+            protectFeature    = False,
+            protectProto      = None,
+            protectProtoStr   = 'VK_NO_PROTOTYPES',
+            apicall           = 'VKAPI_ATTR ',
+            apientry          = 'VKAPI_CALL ',
+            apientryp         = 'VKAPI_PTR *',
+            alignFuncParam    = 48)
+    ]
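+    # genOpts now maps each output filename to its [generator class, options]
+    # pair; genTarget() below looks targets up by that key.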
+
+# Generate a target based on the options in the matching genOpts{} object.
+# This is encapsulated in a function so it can be profiled and/or timed.
+# The args parameter is a parsed argument object containing the following
+# fields that are used:
+#   target - target to generate
+#   directory - directory to generate it in
+#   protect - True if re-inclusion wrappers should be created
+#   extensions - list of additional extensions to include in generated
+#   interfaces
+def genTarget(args):
+    global genOpts
+
+    # Create generator options with specified parameters
+    makeGenOpts(extensions = args.extension,
+                protect = args.protect,
+                directory = args.directory)
+
+    if (args.target in genOpts.keys()):
+        createGenerator = genOpts[args.target][0]
+        options = genOpts[args.target][1]
+
+        write('* Building', options.filename, file=sys.stderr)
+
+        startTimer(args.time)
+        gen = createGenerator(errFile=errWarn,
+                              warnFile=errWarn,
+                              diagFile=diag,
+                              registryFile=args.registry)
+        reg.setGenerator(gen)
+        reg.apiGen(options)
+        write('* Generated', options.filename, file=sys.stderr)
+        endTimer(args.time, '* Time to generate ' + options.filename + ' =')
+    else:
+        write('No generator options for unknown target:',
+              args.target, file=sys.stderr)
+
+# -extension name - may be a single extension name, a space-separated list
+# of names, or a regular expression.
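+#
+# Example (illustrative): python vt_genvk.py -registry vk.xml api_dump.cpp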
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+
+    parser.add_argument('-extension', action='append',
+                        default=[],
+                        help='Specify an extension or extensions to add to targets')
+    parser.add_argument('-debug', action='store_true',
+                        help='Enable debugging')
+    parser.add_argument('-dump', action='store_true',
+                        help='Enable dump to stderr')
+    parser.add_argument('-diagfile', action='store',
+                        default=None,
+                        help='Write diagnostics to specified file')
+    parser.add_argument('-errfile', action='store',
+                        default=None,
+                        help='Write errors and warnings to specified file instead of stderr')
+    parser.add_argument('-noprotect', dest='protect', action='store_false',
+                        help='Disable inclusion protection in output headers')
+    parser.add_argument('-profile', action='store_true',
+                        help='Enable profiling')
+    parser.add_argument('-registry', action='store',
+                        default='vk.xml',
+                        help='Use specified registry file instead of vk.xml')
+    parser.add_argument('-time', action='store_true',
+                        help='Enable timing')
+    parser.add_argument('-validate', action='store_true',
+                        help='Enable group validation')
+    parser.add_argument('-o', action='store', dest='directory',
+                        default='.',
+                        help='Create target and related files in specified directory')
+    parser.add_argument('target', metavar='target', nargs='?',
+                        help='Specify target')
+
+    args = parser.parse_args()
+
+    # This splits arguments which are space-separated lists
+    args.extension = [name for arg in args.extension for name in arg.split()]
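+    # e.g. ['VK_KHR_surface VK_KHR_swapchain'] -> ['VK_KHR_surface', 'VK_KHR_swapchain']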
+
+    # Load & parse registry
+    reg = Registry()
+
+    startTimer(args.time)
+    tree = etree.parse(args.registry)
+    endTimer(args.time, '* Time to make ElementTree =')
+
+    startTimer(args.time)
+    reg.loadElementTree(tree)
+    endTimer(args.time, '* Time to parse ElementTree =')
+
+    if (args.validate):
+        reg.validateGroups()
+
+    if (args.dump):
+        write('* Dumping registry to regdump.txt', file=sys.stderr)
+        reg.dumpReg(filehandle = open('regdump.txt','w'))
+
+    # create error/warning & diagnostic files
+    if (args.errfile):
+        errWarn = open(args.errfile, 'w')
+    else:
+        errWarn = sys.stderr
+
+    if (args.diagfile):
+        diag = open(args.diagfile, 'w')
+    else:
+        diag = None
+
+    if (args.debug):
+        pdb.run('genTarget(args)')
+    elif (args.profile):
+        import cProfile, pstats
+        cProfile.run('genTarget(args)', 'profile.txt')
+        p = pstats.Stats('profile.txt')
+        p.strip_dirs().sort_stats('time').print_stats(50)
+    else:
+        genTarget(args)
\ No newline at end of file